Image Captioning

781 directly classified papers

Papers

ViPCap: Retrieval Text-Based Visual Prompts for Lightweight Image Captioning AAAI 2025

A Diagnostic Framework for Auditing Reference-Free Vision-Language Metrics IJCNLP 2025

Now You See Me: Context-Aware Automatic Audio Description WACV 2025

What Makes for Good Image Captions? EMNLP 2025

WorldCuisines: A Massive-Scale Benchmark for Multilingual and Multicultural Visual Question Answering on Global Cuisines NAACL 2025

IterMeme: Expert-Guided Multimodal LLM for Interactive Meme Creation with Layout-Aware Generation IJCAI 2025

MsRAG: Knowledge Augumented Image Captioning with Object-level Multi-source RAG IJCAI 2025

Image Captioning Evaluation in the Age of Multimodal LLMs: Challenges and Future Perspectives IJCAI 2025

SGDiff: Scene Graph Guided Diffusion Model for Image Collaborative SegCaptioning AAAI 2025

LOTUS: A Leaderboard for Detailed Image Captioning from Quality to Societal Bias and User Preferences ACL 2025

Detective Networks: Enhancing Disaster Recognition in Images Through Attention Shifting using Optimal Masking WACV 2025

Enhancing Large Language Models for Scientific Multimodal Summarization with Multimodal Output COLING 2025

Bridging Semantic and Modality Gaps in Zero-Shot Captioning via Retrieval from Synthetic Data EMNLP 2025

Seeing Beyond: Enhancing Visual Question Answering with Multi-Modal Retrieval COLING 2025

Learning to Describe Implicit Changes: Noise-robust Pre-training for Image Difference Captioning EMNLP 2025

VELA: An LLM-Hybrid-as-a-Judge Approach for Evaluating Long Image Captions EMNLP 2025

Enhancing Large Vision-Language Models with Ultra-Detailed Image Caption Generation EMNLP 2025

RICO: Improving Accuracy and Completeness in Image Recaptioning via Visual Reconstruction EMNLP 2025

SPECS: Specificity-Enhanced CLIP-Score for Long Image Caption Evaluation EMNLP 2025

NarrAD: Automatic Generation of Audio Descriptions for Movies with Rich Narrative Context WACV 2025

Semantic and Expressive Variations in Image Captions Across Languages CVPR 2025

ImageEval 2025: The First Arabic Image Captioning Shared Task EMNLP 2025

CapArena: Benchmarking and Analyzing Detailed Image Captioning in the LLM Era ACL 2025

STING-BEE: Towards Vision-Language Model for Real-World X-ray Baggage Security Inspection CVPR 2025

COVE: COntext and VEracity prediction for out-of-context images NAACL 2025