Research Explorer
Multimodal NLP
86 directly classified papers
Papers per year
2016: 2
2017: 1
2018: 2
2019: 3
2020: 8
2021: 10
2022: 14
2023: 7
2024: 9
2025: 30
Papers
SketchAgent: Language-Driven Sequential Sketch Generation
CVPR 2025
2M-BELEBELE: Highly Multilingual Speech and American Sign Language Comprehension Dataset
ACL 2025
M2-TabFact: Multi-Document Multi-Modal Fact Verification with Visual and Textual Representations of Tabular Data
ACL 2025
VLog: Video-Language Models by Generative Retrieval of Narration Vocabulary
CVPR 2025
GRAPHGPT-O: Synergistic Multimodal Comprehension and Generation on Graphs
CVPR 2025
MIRe: Enhancing Multimodal Queries Representation via Fusion-Free Modality Interaction for Multimodal Retrieval
ACL 2025
REAL-MM-RAG: A Real-World Multi-Modal Retrieval Benchmark
ACL 2025
Augmenting Multimodal LLMs with Self-Reflective Tokens for Knowledge-based Visual Question Answering
CVPR 2025
Multimodal Coreference Resolution for Chinese Social Media Dialogues: Dataset and Benchmark Approach
ACL 2025
SilVar: Speech-Driven Multimodal Model for Reasoning Visual Question Answering and Object Localization
EMNLP 2025
Cross-Aligned Fusion for Multimodal Understanding
WACV 2025
Deciphering the Complaint Aspects: Towards an Aspect-Based Complaint Identification Model with Video Complaint Dataset in Finance
WACV 2025
UoR-NCL at SemEval-2025 Task 1: Using Generative LLMs and CLIP Models for Multilingual Multimodal Idiomaticity Representation
SemEval 2025
Vision-Language Models Can't See the Obvious
ICCV 2025
AIMA at SemEval-2025 Task 1: Bridging Text and Image for Idiomatic Knowledge Extraction via Mixture of Experts
SemEval 2025
Multi-Schema Proximity Network for Composed Image Retrieval
ICCV 2025
Advancing Sentiment Analysis in Tamil-English Code-Mixed Texts: Challenges and Transformer-Based Solutions
NAACL 2025
VLG-BERT: Towards Better Interpretability in LLMs through Visual and Linguistic Grounding
NAACL 2025
Unveiling the Mist over 3D Vision-Language Understanding: Object-centric Evaluation with Chain-of-Analysis
CVPR 2025
PS3: A Multimodal Transformer Integrating Pathology Reports with Histology Images and Biological Pathways for Cancer Survival Prediction
ICCV 2025
TIP-I2V: A Million-Scale Real Text and Image Prompt Dataset for Image-to-Video Generation
ICCV 2025
Know "No" Better: A Data-Driven Approach for Enhancing Negation Awareness in CLIP
ICCV 2025
From Easy to Hard: The MIR Benchmark for Progressive Interleaved Multi-Image Reasoning
ICCV 2025
Unified Multimodal Understanding via Byte-Pair Visual Encoding
ICCV 2025
Towards Natural Language-Based Document Image Retrieval: New Dataset and Benchmark
CVPR 2025