Research Explorer
Image Captioning (Computer Vision › Generation)
781 directly classified papers
Papers per year
2003: 1
2008: 1
2011: 1
2012: 1
2013: 5
2014: 2
2015: 21
2016: 17
2017: 36
2018: 47
2019: 92
2020: 73
2021: 96
2022: 91
2023: 107
2024: 86
2025: 96
2026: 8
Papers
Text-Guided Image Clustering
EACL 2024
Soft Knowledge Prompt: Help External Knowledge Become a Better Teacher to Instruct LLM in Knowledge-based VQA
ACL 2024
FIRE: Food Image to REcipe Generation
WACV 2024
Comprehensive Visual Grounding for Video Description
AAAI 2024
DCU ADAPT at WMT24: English to Low-resource Multi-Modal Translation Task
EMNLP 2024
MIVC: Multiple Instance Visual Component for Visual-Language Models
WACV 2024
Cycle-Consistency Learning for Captioning and Grounding
AAAI 2024
MineObserver 2.0: A Deep Learning & In-Game Framework for Assessing Natural Language Descriptions of Minecraft Imagery
AAAI 2024
Noise-Aware Image Captioning with Progressively Exploring Mismatched Words
AAAI 2024
CLID: Controlled-Length Image Descriptions With Limited Data
WACV 2024
DIUSum: Dynamic Image Utilization for Multimodal Summarization
AAAI 2024
Video Discourse Parsing and Its Application to Multimodal Summarization: A Dataset and Baseline Approaches
EMNLP 2024
Mining Fine-Grained Image-Text Alignment for Zero-Shot Captioning via Text-Only Training
AAAI 2024
Automated Defect Report Generation for Enhanced Industrial Quality Control
AAAI 2024
Chitranuvad: Adapting Multi-lingual LLMs for Multimodal Translation
EMNLP 2024
Altogether: Image Captioning via Re-aligning Alt-text
EMNLP 2024
Visual Language – Let the Product Say What You Want
AAAI 2024
CIC: A Framework for Culturally-Aware Image Captioning
IJCAI 2024
GLaMM: Pixel Grounding Large Multimodal Model
CVPR 2024
KALE: An Artwork Image Captioning System Augmented with Heterogeneous Graph
IJCAI 2024
No Culture Left Behind: ArtELingo-28, a Benchmark of WikiArt with Captions in 28 Languages
EMNLP 2024
Learning by Correction: Efficient Tuning Task for Zero-Shot Generative Vision-Language Reasoning
CVPR 2024
MAFA: Managing False Negatives for Vision-Language Pre-training
CVPR 2024
ImageCaptioner²: Image Captioner for Image Captioning Bias Amplification Assessment
AAAI 2024
Resampled Datasets Are Not Enough: Mitigating Societal Bias Beyond Single Attributes
EMNLP 2024