CognitionCapturer: Decoding Visual Stimuli from Human EEG Signal with Multimodal Information

Kaifan Zhang; Lihuo He; Xin Jiang; Wen Lu; Di Wang; Xinbo Gao

AAAI 2025

Abstract

Electroencephalogram (EEG) signals have attracted significant attention from researchers because they are non-invasive and offer high temporal sensitivity for decoding visual stimuli. However, most recent studies have focused solely on the relationship between EEG and image data pairs, neglecting the valuable "beyond-image-modality" information embedded in EEG signals. As a result, critical multimodal information in EEG is lost. To address this limitation, this paper proposes CognitionCapturer, a unified framework that fully leverages multimodal data to represent EEG signals. Specifically, CognitionCapturer trains a modality expert encoder for each modality to extract cross-modal information from the EEG modality. It then introduces a diffusion prior to map the EEG embedding space to the CLIP embedding space. Using a pretrained generative model, the proposed framework can then reconstruct visual stimuli with high semantic and structural fidelity. Notably, the framework does not require any fine-tuning of the generative models and can be extended to incorporate more modalities. Through extensive experiments, we demonstrate that CognitionCapturer outperforms state-of-the-art methods both qualitatively and quantitatively.
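The abstract does not state the training objective used for the modality expert encoders. As a rough illustration only, the sketch below assumes a CLIP-style symmetric contrastive loss that pulls each EEG embedding toward the embedding of its paired stimulus in a given modality, with one loss term per modality. The function names, the temperature value, and the modality keys ("image", "text") are hypothetical and are not taken from the paper. The diffusion prior and the generative model are not sketched here.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length (left unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def cosine_logits(a, b, temperature):
    """Pairwise cosine similarities between rows of a and b, divided by temperature."""
    a = [l2_normalize(v) for v in a]
    b = [l2_normalize(v) for v in b]
    return [[sum(x * y for x, y in zip(u, w)) / temperature for w in b] for u in a]


def cross_entropy_diagonal(logits):
    """Mean cross-entropy where the correct class for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the max for numerical stability
        log_z = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_z - row[i]
    return total / len(logits)


def symmetric_contrastive_loss(eeg_emb, target_emb, temperature=0.07):
    """Assumed objective: average of EEG-to-target and target-to-EEG cross-entropy."""
    logits = cosine_logits(eeg_emb, target_emb, temperature)
    logits_t = [list(col) for col in zip(*logits)]
    return 0.5 * (cross_entropy_diagonal(logits) + cross_entropy_diagonal(logits_t))


def multimodal_loss(eeg_by_modality, targets_by_modality, temperature=0.07):
    """Sum the per-modality losses, one (hypothetical) expert encoder output each."""
    return sum(
        symmetric_contrastive_loss(eeg_by_modality[m], targets_by_modality[m], temperature)
        for m in targets_by_modality
    )


# Toy usage: two trials, two-dimensional embeddings, two hypothetical modalities.
targets = {"image": [[1.0, 0.0], [0.0, 1.0]], "text": [[0.5, 0.5], [-0.5, 0.5]]}
eeg = {"image": [[0.9, 0.1], [0.2, 0.8]], "text": [[0.6, 0.4], [-0.4, 0.6]]}
loss = multimodal_loss(eeg, targets)
```

Under this assumption, the loss is small when each EEG embedding is most similar to its own paired target and large when pairs are mismatched. A further modality would only add one more key to both dictionaries.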

Topics

Artificial Intelligence > Core AI > Multimodal Learning; Deep Learning > Models > Diffusion Models; Healthcare & Medicine > Research > Biosignal Processing; Deep Learning > Learning Types > Multi-Modal Learning; Artificial Intelligence > Core AI > Brain-Computer Interface

Keywords

multimodal learning; brain-computer interface; EEG signal processing; visual reconstruction; diffusion model; CLIP embedding; EEG signal; visual stimulus reconstruction
