Exploring Artificial Image Generation for Stance Detection

Zhengkang Zhang; Zhongqing Wang; Guodong Zhou

2025 EMNLP EMNLP 2025

Exploring Artificial Image Generation for Stance Detection

Abstract

AbstractStance detection is a task aimed at identifying and analyzing the author’s stance from text. Previous studies have primarily focused on the text, which may not fully capture the implicit stance conveyed by the author. To address this limitation, we propose a novel approach that transforms original texts into artificially generated images and uses the visual representation to enhance stance detection. Our approach first employs a text-to-image model to generate candidate images for each text. These images are carefully crafted to adhere to three specific criteria: textual relevance, target consistency, and stance consistency. Next, we introduce a comprehensive evaluation framework to select the optimal image for each text from its generated candidates. Subsequently, we introduce a multimodal stance detection model that leverages both the original textual content and the generated image to identify the author’s stance. Experiments demonstrate the effectiveness of our approach and highlight the importance of artificially generated images for stance detection.

🌉 Interdisciplinary Bridge — Artificial Intelligence and Computer Vision and Deep Learning and Machine Learning and Natural Language Processing

🧭 Keyword Pioneer — artificial image generation

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio

Authors

Zhengkang Zhang , Zhongqing Wang , Guodong Zhou

Topics

Artificial Intelligence > Core AI > Multimodal Learning Deep Learning > Models > Diffusion Models Computer Vision > Generation > Image Generation Natural Language Processing > Applications > Text Classification Machine Learning > Learning Types > Multimodal Learning Deep Learning > Learning Types > Multi-Modal Learning

Keywords

image generation text classification stance detection multimodal learning text-to-image generation visual representation multimodal stance detection artificial image generation

Download PDF

Related papers

Bit-Flip Error Resilience in LLMs: A Comprehensive Analysis and Defense Framework 2025

VoiceCraft-X: Unifying Multilingual, Voice-Cloning Speech Synthesis and Speech Editing 2025

Model-based Large Language Model Customization as Service 2025

ZoomEye: Enhancing Multimodal LLMs with Human-Like Zooming Capabilities through Tree-Based Image Exploration 2025

SlideCoder: Layout-aware RAG-enhanced Hierarchical Slide Generation from Design 2025