Research Explorer
Image Captioning
781 directly classified papers
Papers per year
2003: 1
2008: 1
2011: 1
2012: 1
2013: 5
2014: 2
2015: 21
2016: 17
2017: 36
2018: 47
2019: 92
2020: 73
2021: 96
2022: 91
2023: 107
2024: 86
2025: 96
2026: 8
Papers
Evaluation of Multilingual Image Captioning: How far can we get with CLIP models?
NAACL 2025
VideoGameBunny: Towards Vision Assistants for Video Games
WACV 2025
Chain of Functions: A Programmatic Pipeline for Fine-Grained Chart Reasoning Data Generation
IJCNLP 2025
Captions Speak Louder than Images: Generalizing Foundation Models for E-commerce from High-quality Multimodal Instruction Data
IJCNLP 2025
Integrating Video and Text: A Balanced Approach to Multimodal Summary Generation and Evaluation
IJCNLP 2025
A Conformal Risk Control Framework for Granular Word Assessment and Uncertainty Calibration of CLIPScore Quality Estimates
ACL 2025
ExCap3D: Expressive 3D Scene Understanding via Object Captioning with Varying Detail
ICCV 2025
DDGIP: Radiology Report Generation Through Disease Description Graph and Informed Prompting
NAACL 2025
MICE: Mixture of Image Captioning Experts Augmented e-Commerce Product Attribute Value Extraction
ACL 2025
ChartCap: Mitigating Hallucination of Dense Chart Captioning
ICCV 2025
Towards Cross-Lingual Explanation of Artwork in Large-scale Vision Language Models
NAACL 2025
VidCapBench: A Comprehensive Benchmark of Video Captioning for Controllable Text-to-Video Generation
ACL 2025
Now You See Me: Context-Aware Automatic Audio Description
WACV 2025
ReVision: A Dataset and Baseline VLM for Privacy-Preserving Task-Oriented Visual Instruction Rewriting
IJCNLP 2025
A Diagnostic Framework for Auditing Reference-Free Vision-Language Metrics
IJCNLP 2025
Caption Generation in Cultural Heritage: Crowdsourced Data and Tuning Multimodal Large Language Models
NAACL 2025
ViPCap: Retrieval Text-Based Visual Prompts for Lightweight Image Captioning
AAAI 2025
Enhancing Large Language Models for Scientific Multimodal Summarization with Multimodal Output
COLING 2025
Seeing Beyond: Enhancing Visual Question Answering with Multi-Modal Retrieval
COLING 2025
Target-Augmented Shared Fusion-based Multimodal Sarcasm Explanation Generation
NAACL 2025
Engage for All: Making Ordinary Image Descriptions Appealing Again!
ICCV 2025
LOTUS: A Leaderboard for Detailed Image Captioning from Quality to Societal Bias and User Preferences
ACL 2025
Movie101v2: Improved Movie Narration Benchmark
ACL 2025
STARS: A Unified Framework for Singing Transcription, Alignment, and Refined Style Annotation
ACL 2025
ImageEval 2025: The First Arabic Image Captioning Shared Task
EMNLP 2025