Diffusion Lens: Interpreting Text Encoders in Text-to-Image Pipelines

Michael Toker; Hadas Orgad; Mor Ventura; Dana Arad; Yonatan Belinkov

2024 ACL ACL 2024

Diffusion Lens: Interpreting Text Encoders in Text-to-Image Pipelines

Abstract

AbstractText-to-image diffusion models (T2I) use a latent representation of a text prompt to guide the image generation process. However, the process by which the encoder produces the text representation is unknown. We propose the Diffusion Lens, a method for analyzing the text encoder of T2I models by generating images from its intermediate representations. Using the Diffusion Lens, we perform an extensive analysis of two recent T2I models. Exploring compound prompts, we find that complex scenes describing multiple objects are composed progressively and more slowly compared to simple scenes; Exploring knowledge retrieval, we find that representation of uncommon concepts require further computation compared to common concepts, and that knowledge retrieval is gradual across layers. Overall, our findings provide valuable insights into the text encoder component in T2I pipelines.

🌉 Interdisciplinary Bridge — Artificial Intelligence and Computer Vision and Deep Learning

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio

Authors

Michael Toker , Hadas Orgad , Mor Ventura , Dana Arad , Yonatan Belinkov

Topics

Artificial Intelligence > Core AI > Interpretability Deep Learning > Models > Diffusion Models Computer Vision > Generation > Image Generation

Keywords

image generation diffusion model latent representation text-to-image diffusion text encoder representational analysis

Download PDF

Related papers

Reinforcement Learning-Driven LLM Agent for Automated Attacks on LLMs 2024

EtymoLink: A Structured English Etymology Dataset 2024

Turkish Delights: A Dataset on Turkish Euphemisms 2024

Subjectivity Detection in English News using Large Language Models 2024

Does DetectGPT Fully Utilize Perturbation? Bridging Selective Perturbation to Fine-tuned Contrastive Learning Detector would be Better 2024