Image Captioning
781 directly classified papers
Papers per year
2003: 1
2008: 1
2011: 1
2012: 1
2013: 5
2014: 2
2015: 21
2016: 17
2017: 36
2018: 47
2019: 92
2020: 73
2021: 96
2022: 91
2023: 107
2024: 86
2025: 96
2026: 8
Papers
Towards Cross-Lingual Explanation of Artwork in Large-scale Vision Language Models
NAACL 2025
DDGIP: Radiology Report Generation Through Disease Description Graph and Informed Prompting
NAACL 2025
Evaluation of Multilingual Image Captioning: How far can we get with CLIP models?
NAACL 2025
Target-Augmented Shared Fusion-based Multimodal Sarcasm Explanation Generation
NAACL 2025
RONA: Pragmatically Diverse Image Captioning with Coherence Relations
NAACL 2025
Caption Generation in Cultural Heritage: Crowdsourced Data and Tuning Multimodal Large Language Models
NAACL 2025
Defining and Quantifying Visual Hallucinations in Vision-Language Models
NAACL 2025
Chain of Functions: A Programmatic Pipeline for Fine-Grained Chart Reasoning Data Generation
IJCNLP 2025
Captions Speak Louder than Images: Generalizing Foundation Models for E-commerce from High-quality Multimodal Instruction Data
IJCNLP 2025
DAMPER: A Dual-Stage Medical Report Generation Framework with Coarse-Grained MeSH Alignment and Fine-Grained Hypergraph Matching
AAAI 2025
Integrating Video and Text: A Balanced Approach to Multimodal Summary Generation and Evaluation
IJCNLP 2025
RRG-Mamba: Efficient Radiology Report Generation with State Space Model
IJCAI 2025
Cross-modal Clustering-based Retrieval for Scalable and Robust Image Captioning
ACL 2025
Engage for All: Making Ordinary Image Descriptions Appealing Again!
ICCV 2025
ChartCap: Mitigating Hallucination of Dense Chart Captioning
ICCV 2025
ReVision: A Dataset and Baseline VLM for Privacy-Preserving Task-Oriented Visual Instruction Rewriting
IJCNLP 2025
ExCap3D: Expressive 3D Scene Understanding via Object Captioning with Varying Detail
ICCV 2025
DiffTell: A High-Quality Dataset for Describing Image Manipulation Changes
ICCV 2025
A Diagnostic Framework for Auditing Reference-Free Vision-Language Metrics
IJCNLP 2025
Describe Anything: Detailed Localized Image and Video Captioning
ICCV 2025
A Conformal Risk Control Framework for Granular Word Assessment and Uncertainty Calibration of CLIPScore Quality Estimates
ACL 2025
ViPCap: Retrieval Text-Based Visual Prompts for Lightweight Image Captioning
AAAI 2025
LOTUS: A Leaderboard for Detailed Image Captioning from Quality to Societal Bias and User Preferences
ACL 2025
STARS: A Unified Framework for Singing Transcription, Alignment, and Refined Style Annotation
ACL 2025
RICO: Improving Accuracy and Completeness in Image Recaptioning via Visual Reconstruction
EMNLP 2025