Your Text Encoder Can Be An Object-Level Watermarking Controller

Naresh Kumar Devulapally; Mingzhen Huang; Vishal Asnani; Shruti Agarwal; Siwei Lyu; Vishnu Suresh Lokhande

ICCV 2025


Abstract

Invisible watermarking of AI-generated images can help with copyright protection by enabling detection and identification of AI-generated media. In this work, we present a novel approach to watermarking images generated by text-to-image (T2I) Latent Diffusion Models (LDMs). By fine-tuning only the text token embeddings $\mathcal{W}_*$, we enable watermarking of selected objects or parts of the image, offering greater flexibility than traditional full-image watermarking. Our method leverages the text encoder's compatibility across various LDMs, allowing plug-and-play integration with different LDMs. Moreover, introducing the watermark early, at the encoding stage, improves robustness to adversarial perturbations in later stages of the pipeline. Our approach achieves 99% bit accuracy (48 bits) with a $10^5\times$ reduction in model parameters, enabling efficient watermarking.
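The abstract reports results as bit accuracy over a 48-bit message. As a minimal sketch of how such a metric is commonly computed (the function name `bit_accuracy` and the toy message below are illustrative assumptions, not the authors' code or evaluation protocol), bit accuracy is the fraction of decoded bits that match the embedded message:

```python
import random


def bit_accuracy(decoded_bits, message_bits):
    """Return the fraction of positions where the decoded bits match the message."""
    if len(decoded_bits) != len(message_bits):
        raise ValueError("decoded and embedded messages must have the same length")
    if not message_bits:
        raise ValueError("message must contain at least one bit")
    matches = sum(d == m for d, m in zip(decoded_bits, message_bits))
    return matches / len(message_bits)


# Toy example (assumed data): a random 48-bit message and a decoding
# that gets exactly one bit wrong.
rng = random.Random(0)
message = [rng.randint(0, 1) for _ in range(48)]
decoded = list(message)
decoded[5] ^= 1  # flip a single bit to simulate one decoding error

accuracy = bit_accuracy(decoded, message)  # 47 of 48 bits match
```

With a 48-bit message, each wrongly decoded bit costs about 2.1 percentage points, so 99% bit accuracy corresponds to fewer than one erroneous bit per image on average.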



Topics

Deep Learning > Models > Diffusion Models
Computer Vision > Processing > Image Editing

Keywords

adversarial perturbation, image watermarking, latent diffusion, text encoder, token embedding


Related papers

MA-CIR: A Multimodal Arithmetic Benchmark for Composed Image Retrieval 2025

SimMLM: A Simple Framework for Multi-modal Learning with Missing Modality 2025

MonSTeR: a Unified Model for Motion, Scene, Text Retrieval 2025

ASGS: Single-Domain Generalizable Open-Set Object Detection via Adaptive Subgraph Searching 2025

Robust Dataset Condensation using Supervised Contrastive Learning 2025