EmotiCrafter: Text-to-Emotional-Image Generation based on Valence-Arousal Model

Shengqi Dang; Yi He; Long Ling; Ziqing Qian; Nanxuan Zhao; Nan Cao

ICCV 2025

Abstract

Recent research shows that emotions can enhance users' cognition and influence information communication. While research on visual emotion analysis is extensive, limited work has been done on helping users generate emotionally rich image content. Existing work on emotional image generation relies on discrete emotion categories, making it challenging to capture complex and subtle emotional nuances accurately. Additionally, these methods struggle to control the specific content of generated images based on text prompts. In this paper, we introduce the task of continuous emotional image content generation (C-EICG) and present EmotiCrafter, a general emotional image generation model that generates images based on free text prompts and Valence-Arousal (V-A) values. It leverages a novel emotion-embedding mapping network to fuse V-A values into textual features, enabling the capture of emotions in alignment with intended input prompts. A novel loss function is also proposed to enhance emotion expression. The experimental results show that our method effectively generates images representing specific emotions with the desired content and outperforms existing techniques.
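To make the idea of fusing V-A values into textual features concrete, here is a minimal toy sketch. The abstract does not describe the architecture of the emotion-embedding mapping network, so everything below is an assumption made for illustration: the function names (`va_to_offset`, `fuse_va`), the linear projection, and the additive fusion are hypothetical stand-ins, not the authors' method. The sketch only shows one simple way a continuous (valence, arousal) pair could shift per-token text features before they condition an image generator.

```python
from typing import List


def va_to_offset(valence: float, arousal: float,
                 w_v: List[float], w_a: List[float]) -> List[float]:
    """Project a (valence, arousal) pair to a feature-sized offset.

    Hypothetical stand-in for a learned mapping network: a single
    linear layer with one weight vector per emotion dimension.
    """
    if len(w_v) != len(w_a):
        raise ValueError("weight vectors must have the same length")
    return [valence * wv + arousal * wa for wv, wa in zip(w_v, w_a)]


def fuse_va(text_features: List[List[float]], valence: float, arousal: float,
            w_v: List[float], w_a: List[float]) -> List[List[float]]:
    """Add the emotion offset to every token feature (residual fusion).

    This additive fusion is an assumption chosen for simplicity.
    """
    offset = va_to_offset(valence, arousal, w_v, w_a)
    fused = []
    for token in text_features:
        if len(token) != len(offset):
            raise ValueError("feature size must match the weight vectors")
        fused.append([t + o for t, o in zip(token, offset)])
    return fused


# Toy data: two tokens with three-dimensional features (made-up numbers).
text_features = [[0.5, -1.0, 2.0], [1.0, 0.0, -0.5]]
w_v = [1.0, 0.0, 0.5]   # hypothetical weights for valence
w_a = [0.0, 2.0, -1.0]  # hypothetical weights for arousal

# With V-A at zero the offset vanishes and the text features are unchanged.
neutral = fuse_va(text_features, 0.0, 0.0, w_v, w_a)

# A non-zero V-A pair shifts every token by the same emotion offset.
shifted = fuse_va(text_features, 2.0, 1.0, w_v, w_a)
```

Because the offset is added as a residual, the prompt's content features are preserved when the emotion input is zero, which loosely mirrors the stated goal of changing the emotion while keeping the prompted content.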

Topics

Deep Learning > Models > Generative Models
Computer Vision > Generation > Image Generation
Natural Language Processing > Generation > Text Generation
Interdisciplinary > Social > Affective Computing

Keywords

image synthesis, text-to-image generation, continuous emotion, emotional image generation, valence-arousal model, emotion embedding

Related papers

MA-CIR: A Multimodal Arithmetic Benchmark for Composed Image Retrieval 2025

SimMLM: A Simple Framework for Multi-modal Learning with Missing Modality 2025

MonSTeR: a Unified Model for Motion, Scene, Text Retrieval 2025

ASGS: Single-Domain Generalizable Open-Set Object Detection via Adaptive Subgraph Searching 2025

Robust Dataset Condensation using Supervised Contrastive Learning 2025