Image Captioning
781 directly classified papers
Papers per year
2003: 1
2008: 1
2011: 1
2012: 1
2013: 5
2014: 2
2015: 21
2016: 17
2017: 36
2018: 47
2019: 92
2020: 73
2021: 96
2022: 91
2023: 107
2024: 86
2025: 96
2026: 8
Papers
ViPCap: Retrieval Text-Based Visual Prompts for Lightweight Image Captioning
AAAI 2025
A Diagnostic Framework for Auditing Reference-Free Vision-Language Metrics
IJCNLP 2025
Now You See Me: Context-Aware Automatic Audio Description
WACV 2025
What Makes for Good Image Captions?
EMNLP 2025
WorldCuisines: A Massive-Scale Benchmark for Multilingual and Multicultural Visual Question Answering on Global Cuisines
NAACL 2025
IterMeme: Expert-Guided Multimodal LLM for Interactive Meme Creation with Layout-Aware Generation
IJCAI 2025
MsRAG: Knowledge Augumented Image Captioning with Object-level Multi-source RAG
IJCAI 2025
Image Captioning Evaluation in the Age of Multimodal LLMs: Challenges and Future Perspectives
IJCAI 2025
SGDiff: Scene Graph Guided Diffusion Model for Image Collaborative SegCaptioning
AAAI 2025
LOTUS: A Leaderboard for Detailed Image Captioning from Quality to Societal Bias and User Preferences
ACL 2025
Detective Networks: Enhancing Disaster Recognition in Images Through Attention Shifting using Optimal Masking
WACV 2025
Enhancing Large Language Models for Scientific Multimodal Summarization with Multimodal Output
COLING 2025
Bridging Semantic and Modality Gaps in Zero-Shot Captioning via Retrieval from Synthetic Data
EMNLP 2025
Seeing Beyond: Enhancing Visual Question Answering with Multi-Modal Retrieval
COLING 2025
Learning to Describe Implicit Changes: Noise-robust Pre-training for Image Difference Captioning
EMNLP 2025
VELA: An LLM-Hybrid-as-a-Judge Approach for Evaluating Long Image Captions
EMNLP 2025
Enhancing Large Vision-Language Models with Ultra-Detailed Image Caption Generation
EMNLP 2025
RICO: Improving Accuracy and Completeness in Image Recaptioning via Visual Reconstruction
EMNLP 2025
SPECS: Specificity-Enhanced CLIP-Score for Long Image Caption Evaluation
EMNLP 2025
NarrAD: Automatic Generation of Audio Descriptions for Movies with Rich Narrative Context
WACV 2025
Semantic and Expressive Variations in Image Captions Across Languages
CVPR 2025
ImageEval 2025: The First Arabic Image Captioning Shared Task
EMNLP 2025
CapArena: Benchmarking and Analyzing Detailed Image Captioning in the LLM Era
ACL 2025
STING-BEE: Towards Vision-Language Model for Real-World X-ray Baggage Security Inspection
CVPR 2025
COVE: COntext and VEracity prediction for out-of-context images
NAACL 2025