Multimodal Learning

323 directly classified papers

Papers

Exploiting Commonsense Knowledge about Objects for Visual Activity Recognition ACL 2023

Putting Natural in Natural Language Processing ACL 2023

Stereotypes and Smut: The (Mis)representation of Non-cisgender Identities by Text-to-Image Models ACL 2023

RC3: Regularized Contrastive Cross-lingual Cross-modal Pre-training ACL 2023

UnLoc: A Unified Framework for Video Localization Tasks ICCV 2023

Multimodal Industrial Anomaly Detection via Hybrid Fusion CVPR 2023

Long-Term Rhythmic Video Soundtracker ICML 2023

GenKIE: Robust Generative Multimodal Document Key Information Extraction EMNLP 2023

MAGVLT: Masked Generative Vision-and-Language Transformer CVPR 2023

Beneath the Surface: Unveiling Harmful Memes with Multimodal Reasoning Distilled from Large Language Models EMNLP 2023

Token Mixing: Parameter-Efficient Transfer Learning from Image-Language to Video-Language AAAI 2023

Sentiment Knowledge Enhanced Self-supervised Learning for Multimodal Sentiment Analysis ACL 2023

MNER-QG: An End-to-End MRC Framework for Multimodal Named Entity Recognition with Query Grounding AAAI 2023

Mutual-Enhanced Incongruity Learning Network for Multi-Modal Sarcasm Detection AAAI 2023

ConKI: Contrastive Knowledge Injection for Multimodal Sentiment Analysis ACL 2023

What Do You MEME? Generating Explanations for Visual Semantic Role Labelling in Memes AAAI 2023

SyncDiffusion: Coherent Montage via Synchronized Joint Diffusions NeurIPS 2023

Towards Unified, Explainable, and Robust Multisensory Perception AAAI 2023

Pick-a-Pic: An Open Dataset of User Preferences for Text-to-Image Generation NeurIPS 2023

Mass-Producing Failures of Multimodal Systems with Language Models NeurIPS 2023

Grounding Answers for Visual Questions Asked by Visually Impaired People CVPR 2022

Visual Emotion Representation Learning via Emotion-Aware Pre-training IJCAI 2022

Are Vision-Language Transformers Learning Multimodal Representations? A Probing Perspective AAAI 2022

How well can Text-to-Image Generative Models understand Ethical Natural Language Interventions? EMNLP 2022

Probing Cross-modal Semantics Alignment Capability from the Textual Perspective EMNLP 2022