Research Explorer
Multi-Modal Learning
115 directly classified papers
Papers per year
2015: 1
2016: 1
2017: 1
2018: 1
2019: 1
2020: 3
2021: 3
2022: 7
2023: 5
2024: 35
2025: 57
Papers
VisFinEval: A Scenario-Driven Chinese Multimodal Benchmark for Holistic Financial Understanding
EMNLP 2025
Speech Discrete Tokens or Continuous Features? A Comparative Analysis for Spoken Language Understanding in SpeechLLMs
EMNLP 2025
VoiceBBQ: Investigating Effect of Content and Acoustics in Social Bias of Spoken Language Model
EMNLP 2025
DELOC: Document Element Localizer
EMNLP 2025
EduVidQA: Generating and Evaluating Long-form Answers to Student Questions based on Lecture Videos
EMNLP 2025
OpenS2S: Advancing Fully Open-Source End-to-End Empathetic Large Speech Language Model
EMNLP 2025
Self-Improvement in Multimodal Large Language Models: A Survey
EMNLP 2025
CONSTRUCTURE: Benchmarking CONcept STRUCTUre REasoning for Multimodal Large Language Models
EMNLP 2024
Infrared-LLaVA: Enhancing Understanding of Infrared Images in Multi-Modal Large Language Models
EMNLP 2024
Adversarial Attacks on Parts of Speech: An Empirical Study in Text-to-Image Generation
EMNLP 2024
Jina-ColBERT-v2: A General-Purpose Multilingual Late Interaction Retriever
EMNLP 2024
MLLM-CompBench: A Comparative Reasoning Benchmark for Multimodal LLMs
NIPS 2024
Single Image Unlearning: Efficient Machine Unlearning in Multimodal Large Language Models
NIPS 2024
MoME: Mixture of Multimodal Experts for Generalist Multimodal Large Language Models
NIPS 2024
Speech Translation with Speech Foundation Models and Large Language Models: What is There and What is Missing?
ACL 2024
M3-Embedding: Multi-Linguality, Multi-Functionality, Multi-Granularity Text Embeddings Through Self-Knowledge Distillation
ACL 2024
Video-Language Understanding: A Survey from Model Architecture, Model Training, and Data Perspectives
ACL 2024
MemeMQA: Multimodal Question Answering for Memes via Rationale-Based Inferencing
ACL 2024
SpeechGuard: Exploring the Adversarial Robustness of Multi-modal Large Language Models
ACL 2024
Lumen: Unleashing Versatile Vision-Centric Capabilities of Large Multimodal Models
NIPS 2024
Octopus: A Multi-modal LLM with Parallel Recognition and Sequential Understanding
NIPS 2024
MMoE: Enhancing Multimodal Models with Mixtures of Multimodal Interaction Experts
EMNLP 2024
OmAgent: A Multi-modal Agent Framework for Complex Video Understanding with Task Divide-and-Conquer
EMNLP 2024
Kiss up, Kick down: Exploring Behavioral Changes in Multi-modal Large Language Models with Assigned Visual Personas
EMNLP 2024
UNICORN: A Unified Causal Video-Oriented Language-Modeling Framework for Temporal Video-Language Tasks
EMNLP 2024