Research Explorer
Multimodal Learning
323 directly classified papers
Papers per year
2014: 1
2015: 1
2017: 8
2018: 11
2019: 11
2020: 27
2021: 23
2022: 46
2023: 35
2024: 53
2025: 104
2026: 3
Papers
Siamese Learning with Joint Alignment and Regression for Weakly-Supervised Video Paragraph Grounding
CVPR 2024
Language-Driven Anchors for Zero-Shot Adversarial Robustness
CVPR 2024
Boosting Continual Learning of Vision-Language Models via Mixture-of-Experts Adapters
CVPR 2024
cPAPERS: A Dataset of Situated and Multimodal Interactive Conversations in Scientific Papers
NIPS 2024
LoTLIP: Improving Language-Image Pre-training for Long Text Understanding
NIPS 2024
No "Zero-Shot" Without Exponential Data: Pretraining Concept Frequency Determines Multimodal Model Performance
NIPS 2024
II-Bench: An Image Implication Understanding Benchmark for Multimodal Large Language Models
NIPS 2024
Meteor: Mamba-based Traversal of Rationale for Large Language and Vision Models
NIPS 2024
MINT-1T: Scaling Open-Source Multimodal Data by 10x: A Multimodal Dataset with One Trillion Tokens
NIPS 2024
Localize, Understand, Collaborate: Semantic-Aware Dragging via Intention Reasoner
NIPS 2024
Weakly Supervised Video Emotion Detection and Prediction via Cross-Modal Temporal Erasing Network
CVPR 2023
MAP: Multimodal Uncertainty-Aware Vision-Language Pre-Training Model
CVPR 2023
Quantifying & Modeling Multimodal Interactions: An Information Decomposition Framework
NIPS 2023
ImageNetVC: Zero- and Few-Shot Visual Commonsense Evaluation on 1000 ImageNet Categories
EMNLP 2023
Structure and Content-Guided Video Synthesis with Diffusion Models
ICCV 2023
FormNetV2: Multimodal Graph Contrastive Learning for Form Document Information Extraction
ACL 2023
Measuring Progress in Fine-grained Vision-and-Language Understanding
ACL 2023
Tackling Modality Heterogeneity with Multi-View Calibration Network for Multimodal Sentiment Detection
ACL 2023
ShapeTalk: A Language Dataset and Framework for 3D Shape Edits and Deformations
CVPR 2023
AVFormer: Injecting Vision Into Frozen Speech Models for Zero-Shot AV-ASR
CVPR 2023
FOCAL: Contrastive Learning for Multimodal Time-Series Sensing Signals in Factorized Orthogonal Latent Space
NIPS 2023
Tackling Ambiguity with Images: Improved Multimodal Machine Translation and Contrastive Evaluation
ACL 2023
MultiEMO: An Attention-Based Correlation-Aware Multimodal Fusion Framework for Emotion Recognition in Conversations
ACL 2023
Visually-augmented pretrained language models for NLP tasks without images
ACL 2023
Cross-Modal Attribute Insertions for Assessing the Robustness of Vision-and-Language Learning
ACL 2023