Multimodal Learning
323 directly classified papers
Papers per year
2014: 1
2015: 1
2017: 8
2018: 11
2019: 11
2020: 27
2021: 23
2022: 46
2023: 35
2024: 53
2025: 104
2026: 3
Papers
Siamese Learning with Joint Alignment and Regression for Weakly-Supervised Video Paragraph Grounding
CVPR 2024
MeLFusion: Synthesizing Music from Image and Language Cues using Diffusion Models
CVPR 2024
Finding and Editing Multi-Modal Neurons in Pre-Trained Transformers
ACL 2024
MCIL: Multimodal Counterfactual Instance Learning for Low-resource Entity-based Multimodal Information Extraction
COLING 2024
Large Language Models can Share Images, Too!
ACL 2024
Listen Again and Choose the Right Answer: A New Paradigm for Automatic Speech Recognition with Large Language Models
ACL 2024
CLIP-Driven Open-Vocabulary 3D Scene Graph Generation via Cross-Modality Contrastive Learning
CVPR 2024
Difficult Task Yes but Simple Task No: Unveiling the Laziness in Multimodal LLMs
EMNLP 2024
Identifiable Object-Centric Representation Learning via Probabilistic Slot Attention
NeurIPS 2024
Investigating the Role of Instruction Variety and Task Difficulty in Robotic Manipulation Tasks
EMNLP 2024
Large Language Models Are Challenged by Habitat-Centered Reasoning
EMNLP 2024
The Factuality Tax of Diversity-Intervened Text-to-Image Generation: Benchmark and Fact-Augmented Intervention
EMNLP 2024
GraphVis: Boosting LLMs with Visual Knowledge Graph Integration
NeurIPS 2024
GOME: Grounding-based Metaphor Binding With Conceptual Elaboration For Figurative Language Illustration
EMNLP 2024
Extending AZee with Non-manual Gesture Rules for French Sign Language
COLING 2024
InteRead: An Eye Tracking Dataset of Interrupted Reading
COLING 2024
Unveiling the mystery of visual attributes of concrete and abstract concepts: Variability, nearest neighbors, and challenging categories
EMNLP 2024
Visual Fact Checker: Enabling High-Fidelity Detailed Caption Generation
CVPR 2024
PALM: Few-Shot Prompt Learning for Audio Language Models
EMNLP 2024
MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI
CVPR 2024
Selectively Informative Description can Reduce Undesired Embedding Entanglements in Text-to-Image Personalization
CVPR 2024
Summarize the Past to Predict the Future: Natural Language Descriptions of Context Boost Multimodal Object Interaction Anticipation
CVPR 2024
Revisiting Multimodal Transformers for Tabular Data with Text Fields
ACL 2024
Eyes Wide Shut? Exploring the Visual Shortcomings of Multimodal LLMs
CVPR 2024
RWKV-CLIP: A Robust Vision-Language Representation Learner
EMNLP 2024