Multimodal Learning

323 directly classified papers

Papers

Globetrotter: Connecting Languages by Connecting Images CVPR 2022

M4Singer: A Multi-Style, Multi-Singer and Musical Score Provided Mandarin Singing Corpus NeurIPS 2022

SGEITL: Scene Graph Enhanced Image-Text Learning for Visual Commonsense Reasoning AAAI 2022

Multimodal Adversarially Learned Inference with Factorized Discriminators AAAI 2022

UNISON: Unpaired Cross-Lingual Image Captioning AAAI 2022

D-vlog: Multimodal Vlog Dataset for Depression Detection AAAI 2022

Towards Multimodal Vision-Language Models Generating Non-generic Text AAAI 2022

Building Goal-Oriented Dialogue Systems with Situated Visual Context AAAI 2022

Vision-Language Pre-Training for Multimodal Aspect-Based Sentiment Analysis ACL 2022

Things not Written in Text: Exploring Spatial Commonsense from Visual Signals ACL 2022

MSCTD: A Multimodal Sentiment Chat Translation Dataset ACL 2022

Leveraging Unimodal Self-Supervised Learning for Multimodal Audio-Visual Speech Recognition ACL 2022

Finding Structural Knowledge in Multimodal-BERT ACL 2022

What do Models Learn From Training on More Than Text? Measuring Visual Commonsense Knowledge ACL 2022

Vision-Language Pretraining: Current Trends and the Future ACL 2022

DuReader_vis: A Chinese Dataset for Open-domain Document Visual Question Answering ACL 2022

Sentiment Word Aware Multimodal Refinement for Multimodal Sentiment Analysis with ASR Errors ACL 2022

Seeing the advantage: visually grounding word embeddings to better capture human semantic knowledge ACL 2022

Visually Grounded Interpretation of Noun-Noun Compounds in English ACL 2022

Combining Language Models and Linguistic Information to Label Entities in Memes ACL 2022

Detecting the Role of an Entity in Harmful Memes: Techniques and their Limitations ACL 2022

Fine-tuning and Sampling Strategies for Multimodal Role Labeling of Entities under Class Imbalance ACL 2022

How does fake news use a thumbnail? CLIP-based Multimodal Detection on the Unrepresentative News Image ACL 2022

Utilizing Cross-Modal Contrastive Learning to Improve Item Categorization BERT Model ACL 2022

Can Pretrained Language Models Generate Persuasive, Faithful, and Informative Ad Text for Product Descriptions? ACL 2022