Multimodal Learning

1257 directly classified papers

Papers

FOAM: A Follower-aware Speaker Model For Vision-and-Language Navigation NAACL 2022

Visual Acoustic Matching CVPR 2022

FashionVLP: Vision Language Transformer for Fashion Retrieval With Feedback CVPR 2022

PointCLIP: Point Cloud Understanding by CLIP CVPR 2022

SEEG: Semantic Energized Co-Speech Gesture Generation CVPR 2022

Vector Quantized Diffusion Model for Text-to-Image Synthesis CVPR 2022

Coreference by Appearance: Visually Grounded Event Coreference Resolution EMNLP 2021

Can images help recognize entities? A study of the role of images for Multimodal NER EMNLP 2021

Capturing Logical Structure of Visually Structured Documents with Multimodal Transition Parser EMNLP 2021

MIRTT: Learning Multimodal Interaction Representations from Trilinear Transformers for Visual Question Answering EMNLP 2021

Visually Grounded Concept Composition EMNLP 2021

Discovering the Unknown Knowns: Turning Implicit Knowledge in the Dataset into Explicit Training Examples for Visual Question Answering EMNLP 2021

Inflate and Shrink: Enriching and Reducing Interactions for Fast Text-Image Retrieval EMNLP 2021

Visually Grounded Reasoning across Languages and Cultures EMNLP 2021

GLAVNet: Global-Local Audio-Visual Cues for Fine-Grained Material Recognition CVPR 2021

Looking Into Your Speech: Learning Cross-Modal Affinity for Audio-Visual Speech Separation CVPR 2021

Seeing Out of the Box: End-to-End Pre-Training for Vision-Language Representation Learning CVPR 2021

TAP: Text-Aware Pre-Training for Text-VQA and Text-Caption CVPR 2021

Look Before You Speak: Visually Contextualized Utterances CVPR 2021

Domain-Robust VQA With Diverse Datasets and Methods but No Target Labels CVPR 2021

Vectorization and Rasterization: Self-Supervised Learning for Sketch and Handwriting CVPR 2021

Discrete-Continuous Action Space Policy Gradient-Based Attention for Image-Text Matching CVPR 2021

Separating Skills and Concepts for Novel Visual Question Answering CVPR 2021

YouRefIt: Embodied Reference Understanding With Language and Gesture ICCV 2021

Towards Domain Invariant Single Image Dehazing AAAI 2021