Research Explorer
Multimodal Learning
323 directly classified papers
Papers per year
2014: 1
2015: 1
2017: 8
2018: 11
2019: 11
2020: 27
2021: 23
2022: 46
2023: 35
2024: 53
2025: 104
2026: 3
Papers
Exploiting Commonsense Knowledge about Objects for Visual Activity Recognition
ACL 2023
Putting Natural in Natural Language Processing
ACL 2023
Stereotypes and Smut: The (Mis)representation of Non-cisgender Identities by Text-to-Image Models
ACL 2023
RC3: Regularized Contrastive Cross-lingual Cross-modal Pre-training
ACL 2023
UnLoc: A Unified Framework for Video Localization Tasks
ICCV 2023
Multimodal Industrial Anomaly Detection via Hybrid Fusion
CVPR 2023
Long-Term Rhythmic Video Soundtracker
ICML 2023
GenKIE: Robust Generative Multimodal Document Key Information Extraction
EMNLP 2023
MAGVLT: Masked Generative Vision-and-Language Transformer
CVPR 2023
Beneath the Surface: Unveiling Harmful Memes with Multimodal Reasoning Distilled from Large Language Models
EMNLP 2023
Token Mixing: Parameter-Efficient Transfer Learning from Image-Language to Video-Language
AAAI 2023
Sentiment Knowledge Enhanced Self-supervised Learning for Multimodal Sentiment Analysis
ACL 2023
MNER-QG: An End-to-End MRC Framework for Multimodal Named Entity Recognition with Query Grounding
AAAI 2023
Mutual-Enhanced Incongruity Learning Network for Multi-Modal Sarcasm Detection
AAAI 2023
ConKI: Contrastive Knowledge Injection for Multimodal Sentiment Analysis
ACL 2023
What Do You MEME? Generating Explanations for Visual Semantic Role Labelling in Memes
AAAI 2023
SyncDiffusion: Coherent Montage via Synchronized Joint Diffusions
NeurIPS 2023
Towards Unified, Explainable, and Robust Multisensory Perception
AAAI 2023
Pick-a-Pic: An Open Dataset of User Preferences for Text-to-Image Generation
NeurIPS 2023
Mass-Producing Failures of Multimodal Systems with Language Models
NeurIPS 2023
Grounding Answers for Visual Questions Asked by Visually Impaired People
CVPR 2022
Visual Emotion Representation Learning via Emotion-Aware Pre-training
IJCAI 2022
Are Vision-Language Transformers Learning Multimodal Representations? A Probing Perspective
AAAI 2022
How well can Text-to-Image Generative Models understand Ethical Natural Language Interventions?
EMNLP 2022
Probing Cross-modal Semantics Alignment Capability from the Textual Perspective
EMNLP 2022