Research Explorer
Image Captioning (Computer Vision › Generation)
781 directly classified papers
Papers per year
2003: 1
2008: 1
2011: 1
2012: 1
2013: 5
2014: 2
2015: 21
2016: 17
2017: 36
2018: 47
2019: 92
2020: 73
2021: 96
2022: 91
2023: 107
2024: 86
2025: 96
2026: 8
Papers
GLaMM: Pixel Grounding Large Multimodal Model
CVPR 2024
Learning by Correction: Efficient Tuning Task for Zero-Shot Generative Vision-Language Reasoning
CVPR 2024
CIC: A Framework for Culturally-Aware Image Captioning
IJCAI 2024
Visual Language – Let the Product Say What You Want
AAAI 2024
Soft Knowledge Prompt: Help External Knowledge Become a Better Teacher to Instruct LLM in Knowledge-based VQA
ACL 2024
Vript: A Video Is Worth Thousands of Words
NeurIPS 2024
Resampled Datasets Are Not Enough: Mitigating Societal Bias Beyond Single Attributes
EMNLP 2024
MAFA: Managing False Negatives for Vision-Language Pre-training
CVPR 2024
LVD-2M: A Long-take Video Dataset with Temporally Dense Captions
NeurIPS 2024
Towards More Unified In-context Visual Understanding
CVPR 2024
On Scaling Up a Multilingual Vision and Language Model
CVPR 2024
Monkey: Image Resolution and Text Label Are Important Things for Large Multi-modal Models
CVPR 2024
Knowledge-Guided Cross-Topic Visual Question Generation
COLING 2024
FLEUR: An Explainable Reference-Free Evaluation Metric for Image Captioning Using a Large Multimodal Model
ACL 2024
Understanding Retrieval Robustness for Retrieval-augmented Image Captioning
ACL 2024
KiUT: Knowledge-Injected U-Transformer for Radiology Report Generation
CVPR 2023
METransformer: Radiology Report Generation by Transformer With Multiple Learnable Expert Tokens
CVPR 2023
ConZIC: Controllable Zero-Shot Image Captioning by Sampling-Based Polishing
CVPR 2023
Semantic-Conditional Diffusion Networks for Image Captioning
CVPR 2023
Generalized Decoding for Pixel, Image, and Language
CVPR 2023
Aesthetically Relevant Image Captioning
AAAI 2023
FAME-ViL: Multi-Tasking Vision-Language Model for Heterogeneous Fashion Tasks
CVPR 2023
SmallCap: Lightweight Image Captioning Prompted With Retrieval Augmentation
CVPR 2023
HAAV: Hierarchical Aggregation of Augmented Views for Image Captioning
CVPR 2023
MetaCLUE: Towards Comprehensive Visual Metaphors Research
CVPR 2023