Research Explorer
Image Captioning
781 directly classified papers
Papers per year
2003: 1
2008: 1
2011: 1
2012: 1
2013: 5
2014: 2
2015: 21
2016: 17
2017: 36
2018: 47
2019: 92
2020: 73
2021: 96
2022: 91
2023: 107
2024: 86
2025: 96
2026: 8
Papers
DiffuVST: Narrating Fictional Scenes with Global-History-Guided Denoising Models
EMNLP 2023
Crossing the Gap: Domain Generalization for Image Captioning
CVPR 2023
Summary-Oriented Vision Modeling for Multimodal Abstractive Summarization
ACL 2023
PR-MCS: Perturbation Robust Metric for MultiLingual Image Captioning
EMNLP 2023
Generating Visual Spatial Description via Holistic 3D Scene Understanding
ACL 2023
MetaReVision: Meta-Learning with Retrieval for Visually Grounded Compositional Concept Acquisition
EMNLP 2023
Zero-shot Visual Question Answering with Language Model Feedback
ACL 2023
HEAR: Hearing Enhanced Audio Response for Video-grounded Dialogue
EMNLP 2023
HAAV: Hierarchical Aggregation of Augmented Views for Image Captioning
CVPR 2023
Query-based Image Captioning from Multi-context 360° Images
EMNLP 2023
RC3: Regularized Contrastive Cross-lingual Cross-modal Pre-training
ACL 2023
Transferring General Multimodal Pretrained Models to Text Recognition
ACL 2023
Visual Storytelling with Question-Answer Plans
EMNLP 2023
Pragmatic Inference with a CLIP Listener for Contrastive Captioning
ACL 2023
GPT-4 as an Effective Zero-Shot Evaluator for Scientific Figure Captions
EMNLP 2023
“Let’s not Quote out of Context”: Unified Vision-Language Pretraining for Context Assisted Image Captioning
ACL 2023
FACTUAL: A Benchmark for Faithful and Consistent Textual Scene Graph Parsing
ACL 2023
VLTinT: Visual-Linguistic Transformer-in-Transformer for Coherent Video Paragraph Captioning
AAAI 2023
Show, Interpret and Tell: Entity-Aware Contextualised Image Captioning in Wikipedia
AAAI 2023
METransformer: Radiology Report Generation by Transformer With Multiple Learnable Expert Tokens
CVPR 2023
Improving multimodal datasets with image captioning
NeurIPS 2023
Controllable Chest X-Ray Report Generation from Longitudinal Representations
EMNLP 2023
Women Wearing Lipstick: Measuring the Bias Between an Object and Its Related Gender
EMNLP 2023
Cross2StrA: Unpaired Cross-lingual Image Captioning with Cross-lingual Cross-modal Structure-pivoted Alignment
ACL 2023
Boosting Radiology Report Generation by Infusing Comparison Prior
ACL 2023