Multimodal Learning

85 directly classified papers

Papers

HumorDB: Can AI understand graphical humor? ICCV 2025

IteRPrimE: Zero-shot Referring Image Segmentation with Iterative Grad-CAM Refinement and Primary Word Emphasis AAAI 2025

Multimodal LLMs as Customized Reward Models for Text-to-Image Generation ICCV 2025

Does Your Vision-Language Model Get Lost in the Long Video Sampling Dilemma? ICCV 2025

MHBench: Demystifying Motion Hallucination in VideoLLMs AAAI 2025

MMDocIR: Benchmarking Multimodal Retrieval for Long Documents EMNLP 2025

Exploring Artificial Image Generation for Stance Detection EMNLP 2025

STiL: Semi-supervised Tabular-Image Learning for Comprehensive Task-Relevant Information Exploration in Multimodal Classification CVPR 2025

Zero-shot Multimodal Document Retrieval via Cross-modal Question Generation EMNLP 2025

Streaming VideoLLMs for Real-Time Procedural Video Understanding ICCV 2025

Multimodal Prior Learning with Double Constraint Alignment for Snapshot Spectral Compressive Imaging IJCAI 2025

Unmasking Deceptive Visuals: Benchmarking Multimodal Large Language Models on Misleading Chart Question Answering EMNLP 2025

Make VLM Recognize Visual Hallucination on Cartoon Character Image with Pose Information WACV 2025

Unsupervised Video Highlight Detection by Learning from Audio and Visual Recurrence WACV 2025

Language Model Based Text-to-Audio Generation: Anti-Causally Aligned Collaborative Residual Transformers EMNLP 2025

Forecasting Credit Ratings: A Case Study where Traditional Methods Outperform Generative LLMs COLING 2025

Thought2Text: Text Generation from EEG Signal using Large Language Models (LLMs) NAACL 2025

The American Sign Language Knowledge Graph: Infusing ASL Models with Linguistic Knowledge NAACL 2025

SSNCSE@DravidianLangTech 2025: Multimodal Hate Speech Detection in Dravidian Languages NAACL 2025

Overview of the Shared Task on Multimodal Hate Speech Detection in Dravidian languages: DravidianLangTech@NAACL 2025 NAACL 2025

HerWILL@DravidianLangTech 2025: Ensemble Approach for Misogyny Detection in Memes Using Pre-trained Text and Vision Transformers NAACL 2025

Podcast Outcasts: Understanding Rumble’s Podcast Dynamics NAACL 2025

Sentiment Analysis on Video Transcripts: Comparing the Value of Textual and Multimodal Annotations NAACL 2025

UnifiedVisual: A Framework for Constructing Unified Vision-Language Datasets EMNLP 2025

Debiased Multimodal Understanding for Human Language Sequences AAAI 2025