Research Explorer
Image Captioning
781 directly classified papers
Papers per year
2003: 1
2008: 1
2011: 1
2012: 1
2013: 5
2014: 2
2015: 21
2016: 17
2017: 36
2018: 47
2019: 92
2020: 73
2021: 96
2022: 91
2023: 107
2024: 86
2025: 96
2026: 8
Papers
Mitigating Object and Action Hallucinations in Multimodal LLMs via Self-Augmented Contrastive Alignment
WACV 2026
AfriCaption: Establishing a New Paradigm for Image Captioning in African Languages
EACL 2026
MIRAGE: Metadata-guided Image Retrieval and Answer Generation for E-commerce Troubleshooting
EACL 2026
ORCA: Object Recognition and Comprehension for Archiving Marine Species
WACV 2026
Temporal Object Captioning for Street Scene Videos from LiDAR Tracks
WACV 2026
ChartQA-X: Generating Explanations for Visual Chart Reasoning
WACV 2026
LASOR: Towards Clinically Transparent and Explainable Ophthalmic Report Generation via Lesion-Aware Segmentation
WACV 2026
A Woman with a Knife or A Knife with a Woman? Measuring Directional Bias Amplification in Image Captions
WACV 2026
RONA: Pragmatically Diverse Image Captioning with Coherence Relations
NAACL 2025
Engage for All: Making Ordinary Image Descriptions Appealing Again!
ICCV 2025
Cross-modal Clustering-based Retrieval for Scalable and Robust Image Captioning
ACL 2025
Towards Cross-Lingual Explanation of Artwork in Large-scale Vision Language Models
NAACL 2025
ReVision: A Dataset and Baseline VLM for Privacy-Preserving Task-Oriented Visual Instruction Rewriting
IJCNLP 2025
Chain of Functions: A Programmatic Pipeline for Fine-Grained Chart Reasoning Data Generation
IJCNLP 2025
DAMPER: A Dual-Stage Medical Report Generation Framework with Coarse-Grained MeSH Alignment and Fine-Grained Hypergraph Matching
AAAI 2025
DiffTell: A High-Quality Dataset for Describing Image Manipulation Changes
ICCV 2025
DDGIP: Radiology Report Generation Through Disease Description Graph and Informed Prompting
NAACL 2025
Defining and Quantifying Visual Hallucinations in Vision-Language Models
NAACL 2025
ExCap3D: Expressive 3D Scene Understanding via Object Captioning with Varying Detail
ICCV 2025
A Diagnostic Framework for Auditing Reference-Free Vision-Language Metrics
IJCNLP 2025
Integrating Video and Text: A Balanced Approach to Multimodal Summary Generation and Evaluation
IJCNLP 2025
Captions Speak Louder than Images: Generalizing Foundation Models for E-commerce from High-quality Multimodal Instruction Data
IJCNLP 2025
Target-Augmented Shared Fusion-based Multimodal Sarcasm Explanation Generation
NAACL 2025
Caption Generation in Cultural Heritage: Crowdsourced Data and Tuning Multimodal Large Language Models
NAACL 2025
VidCapBench: A Comprehensive Benchmark of Video Captioning for Controllable Text-to-Video Generation
ACL 2025