Multimodal NLP
86 directly classified papers
Papers per year
2016: 2
2017: 1
2018: 2
2019: 3
2020: 8
2021: 10
2022: 14
2023: 7
2024: 9
2025: 30
Papers
SketchAgent: Language-Driven Sequential Sketch Generation
CVPR 2025
VLog: Video-Language Models by Generative Retrieval of Narration Vocabulary
CVPR 2025
Know "No" Better: A Data-Driven Approach for Enhancing Negation Awareness in CLIP
ICCV 2025
From Easy to Hard: The MIR Benchmark for Progressive Interleaved Multi-Image Reasoning
ICCV 2025
Unified Multimodal Understanding via Byte-Pair Visual Encoding
ICCV 2025
Improving Vision-and-Language Reasoning via Spatial Relations Modeling
WACV 2024
CLIP-DIY: CLIP Dense Inference Yields Open-Vocabulary Semantic Segmentation For-Free
WACV 2024
Prompt-Free Diffusion: Taking "Text" out of Text-to-Image Diffusion Models
CVPR 2024
DocLLM: A Layout-Aware Generative Language Model for Multimodal Document Understanding
ACL 2024
MM-SOC: Benchmarking Multimodal Large Language Models in Social Media Platforms
ACL 2024
Multilingual Synopses of Movie Narratives: A Dataset for Vision-Language Story Understanding
EMNLP 2024
Linguistic-Aware Patch Slimming Framework for Fine-grained Cross-Modal Alignment
CVPR 2024
Peacock: A Family of Arabic Multimodal Large Language Models and Benchmarks
ACL 2024
Nebula: A discourse aware Minecraft Builder
EMNLP 2024
Spontaneous gestures encoded by hand positions improve language models: An Information-Theoretic motivated study
ACL 2023
CIF-PT: Bridging Speech and Text Representations for Spoken Language Understanding via Continuous Integrate-and-Fire Pre-Training
ACL 2023
Dynamic Inference With Grounding Based Vision and Language Models
CVPR 2023
Position-Guided Text Prompt for Vision-Language Pre-Training
CVPR 2023
Coupling Artificial Neurons in BERT and Biological Neurons in the Human Brain
AAAI 2023
Clover: Towards a Unified Video-Language Alignment and Fusion Model
CVPR 2023
Grounded and well-rounded: a methodological approach to the study of cross-modal and cross-lingual grounding
EMNLP 2023
Revisiting the "Video" in Video-Language Understanding
CVPR 2022
Bridging between Cognitive Processing Signals and Linguistic Features via a Unified Attentional Network
AAAI 2022
Visual Grounding of Inter-lingual Word-Embeddings
EMNLP 2022
CogBERT: Cognition-Guided Pre-trained Language Models
COLING 2022