Image Captioning

781 directly classified papers

Papers

GLaMM: Pixel Grounding Large Multimodal Model CVPR 2024

Learning by Correction: Efficient Tuning Task for Zero-Shot Generative Vision-Language Reasoning CVPR 2024

CIC: A Framework for Culturally-Aware Image Captioning IJCAI 2024

Visual Language – Let the Product Say What You Want AAAI 2024

Soft Knowledge Prompt: Help External Knowledge Become a Better Teacher to Instruct LLM in Knowledge-based VQA ACL 2024

Vript: A Video Is Worth Thousands of Words NeurIPS 2024

Resampled Datasets Are Not Enough: Mitigating Societal Bias Beyond Single Attributes EMNLP 2024

MAFA: Managing False Negatives for Vision-Language Pre-training CVPR 2024

LVD-2M: A Long-take Video Dataset with Temporally Dense Captions NeurIPS 2024

Towards More Unified In-context Visual Understanding CVPR 2024

On Scaling Up a Multilingual Vision and Language Model CVPR 2024

Monkey: Image Resolution and Text Label Are Important Things for Large Multi-modal Models CVPR 2024

Knowledge-Guided Cross-Topic Visual Question Generation COLING 2024

FLEUR: An Explainable Reference-Free Evaluation Metric for Image Captioning Using a Large Multimodal Model ACL 2024

Understanding Retrieval Robustness for Retrieval-augmented Image Captioning ACL 2024

KiUT: Knowledge-Injected U-Transformer for Radiology Report Generation CVPR 2023

METransformer: Radiology Report Generation by Transformer With Multiple Learnable Expert Tokens CVPR 2023

ConZIC: Controllable Zero-Shot Image Captioning by Sampling-Based Polishing CVPR 2023

Semantic-Conditional Diffusion Networks for Image Captioning CVPR 2023

Generalized Decoding for Pixel, Image, and Language CVPR 2023

Aesthetically Relevant Image Captioning AAAI 2023

FAME-ViL: Multi-Tasking Vision-Language Model for Heterogeneous Fashion Tasks CVPR 2023

SmallCap: Lightweight Image Captioning Prompted With Retrieval Augmentation CVPR 2023

HAAV: Hierarchical Aggregation of Augmented Views for Image Captioning CVPR 2023

MetaCLUE: Towards Comprehensive Visual Metaphors Research CVPR 2023