Vision-Language Models

685 directly classified papers

Papers

Large Language Models are Temporal and Causal Reasoners for Video Question Answering EMNLP 2023

Learning the Visualness of Text Using Large Vision-Language Models EMNLP 2023

Violet: A Vision-Language Model for Arabic Image Captioning with Gemini Decoder EMNLP 2023

PR-MCS: Perturbation Robust Metric for MultiLingual Image Captioning EMNLP 2023

Language Guided Visual Question Answering: Elevate Your Multimodal Language Model Using Knowledge-Enriched Prompts EMNLP 2023

VLIS: Unimodal Language Models Guide Multimodal Language Generation EMNLP 2023

Evaluating Object Hallucination in Large Vision-Language Models EMNLP 2023

A Multi-dimensional study on Bias in Vision-Language models ACL 2023

Digging out Discrimination Information from Generated Samples for Robust Visual Question Answering ACL 2023

AltCLIP: Altering the Language Encoder in CLIP for Extended Language Capabilities ACL 2023

Delving into the Openness of CLIP ACL 2023

MMSD2.0: Towards a Reliable Multi-modal Sarcasm Detection System ACL 2023

KU-DMIS-MSRA at RadSum23: Pre-trained Vision-Language Model for Radiology Report Summarization ACL 2023

Medical Visual Textual Entailment for Numerical Understanding of Vision-and-Language Models ACL 2023

Improving the Cross-Lingual Generalisation in Visual Question Answering AAAI 2023

BridgeTower: Building Bridges between Encoders in Vision-Language Representation Learning AAAI 2023

STOA-VLP: Spatial-Temporal Modeling of Object and Action for Video-Language Pre-training AAAI 2023

Exploring CLIP for Assessing the Look and Feel of Images AAAI 2023

CLIP-ReID: Exploiting Vision-Language Model for Image Re-identification without Concrete Text Labels AAAI 2023

Unifying Vision-Language Representation Space with Single-Tower Transformer AAAI 2023

Top-Down Visual Attention From Analysis by Synthesis CVPR 2023

ConStruct-VL: Data-Free Continual Structured VL Concepts Learning CVPR 2023

Referring Image Matting CVPR 2023

MAP: Multimodal Uncertainty-Aware Vision-Language Pre-Training Model CVPR 2023

Grounding Counterfactual Explanation of Image Classifiers to Textual Concept Space CVPR 2023