Seeing Beyond the Brain: Conditional Diffusion Model With Sparse Masked Modeling for Vision Decoding

Zijiao Chen; Jiaxin Qing; Tiange Xiang; Wan Lin Yue; Juan Helen Zhou

CVPR 2023

Abstract

Decoding visual stimuli from brain recordings aims to deepen our understanding of the human visual system and build a solid foundation for bridging human and computer vision through the Brain-Computer Interface. However, reconstructing high-quality images with correct semantics from brain recordings is a challenging problem due to the complex underlying representations of brain signals and the scarcity of data annotations. In this work, we present MinD-Vis: Sparse Masked Brain Modeling with Double-Conditioned Latent Diffusion Model for Human Vision Decoding. First, we learn an effective self-supervised representation of fMRI data using masked modeling in a large latent space, inspired by the sparse coding of information in the primary visual cortex. Then, by augmenting a latent diffusion model with double-conditioning, we show that MinD-Vis can reconstruct highly plausible images with semantically matching details from brain recordings using very few paired annotations. We benchmarked our model qualitatively and quantitatively; the experimental results indicate that our method outperformed the state of the art in both semantic mapping (100-way semantic classification) and generation quality (FID) by 66% and 41%, respectively. An exhaustive ablation study was also conducted to analyze our framework.
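The first stage, masked modeling of fMRI data in a large latent space, can be illustrated with an MAE-style sketch. This is a minimal, stdlib-only illustration under stated assumptions, not the authors' implementation: the 1-D voxel layout, the hypothetical helpers `patchify`, `random_mask`, `embed` and `masked_mse`, the patch size of 16, the 0.75 mask ratio, and the 8x embedding-to-patch width ratio are all illustrative choices. The wide embedding stands in for the "large latent space"; in a full model the visible tokens would pass through an encoder, a decoder would predict the masked patches, and the loss would be computed only on masked positions.

```python
import random
from typing import List, Tuple

Patch = List[float]

def patchify(voxels: List[float], patch_size: int) -> List[Patch]:
    """Split a 1-D voxel vector into non-overlapping patches, zero-padding the tail."""
    pad = (-len(voxels)) % patch_size
    padded = voxels + [0.0] * pad
    return [padded[i:i + patch_size] for i in range(0, len(padded), patch_size)]

def random_mask(num_patches: int, mask_ratio: float,
                rng: random.Random) -> Tuple[List[int], List[int]]:
    """MAE-style random masking: returns (visible, masked) patch indices, each sorted."""
    num_masked = int(num_patches * mask_ratio)
    order = list(range(num_patches))
    rng.shuffle(order)
    return sorted(order[num_masked:]), sorted(order[:num_masked])

def embed(patch: Patch, weights: List[List[float]]) -> List[float]:
    """Linear projection of one patch; len(weights) > len(patch) yields a wider token."""
    return [sum(w * x for w, x in zip(row, patch)) for row in weights]

def masked_mse(pred: List[Patch], target: List[Patch], masked: List[int]) -> float:
    """Reconstruction loss averaged over the elements of masked patches only."""
    errs = [(p - t) ** 2 for i in masked for p, t in zip(pred[i], target[i])]
    return sum(errs) / len(errs)

# Hypothetical usage on synthetic data (not real fMRI).
rng = random.Random(0)
PATCH_SIZE = 16
EMBED_DIM = PATCH_SIZE * 8  # assumed "large latent space" width ratio
voxels = [rng.gauss(0.0, 1.0) for _ in range(4500)]
patches = patchify(voxels, PATCH_SIZE)
visible, masked = random_mask(len(patches), mask_ratio=0.75, rng=rng)
weights = [[rng.gauss(0.0, 0.1) for _ in range(PATCH_SIZE)] for _ in range(EMBED_DIM)]
tokens = [embed(patches[i], weights) for i in visible]  # encoder input tokens
print(len(patches), len(visible), len(masked), len(tokens[0]))  # 282 71 211 128
```

Only the visible quarter of the patches is embedded, so the encoder must learn structure that lets a decoder infer the hidden three quarters; a high mask ratio is what makes this pretext task non-trivial.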
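The abstract does not define "double-conditioning" in detail. One reading, offered here as an assumption rather than as the paper's exact design, is that the fMRI representation conditions the denoising network through two paths: cross-attention from image-latent queries to fMRI tokens, and an additive term in the timestep embedding. The single-head, pure-Python sketch below shows both paths; the function names, the mean pooling of condition tokens, and the projection matrices are hypothetical.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]

def matvec(m: Matrix, v: Vector) -> Vector:
    """Matrix-vector product."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def softmax(xs: Vector) -> Vector:
    """Numerically stable softmax."""
    mx = max(xs)
    exps = [math.exp(x - mx) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def cross_attention(queries: List[Vector], cond_tokens: List[Vector],
                    wq: Matrix, wk: Matrix, wv: Matrix) -> List[Vector]:
    """Path 1: each image-latent query attends over the fMRI condition tokens (one head)."""
    keys = [matvec(wk, c) for c in cond_tokens]
    values = [matvec(wv, c) for c in cond_tokens]
    out = []
    for q in queries:
        qp = matvec(wq, q)
        scale = 1.0 / math.sqrt(len(qp))
        attn = softmax([scale * sum(a * b for a, b in zip(qp, k)) for k in keys])
        out.append([sum(w * v[d] for w, v in zip(attn, values))
                    for d in range(len(values[0]))])
    return out

def conditioned_time_embedding(t_emb: Vector, cond_tokens: List[Vector],
                               w_c: Matrix) -> Vector:
    """Path 2: add a projection of the mean-pooled condition to the timestep embedding."""
    dim = len(cond_tokens[0])
    pooled = [sum(tok[d] for tok in cond_tokens) / len(cond_tokens) for d in range(dim)]
    return [a + b for a, b in zip(t_emb, matvec(w_c, pooled))]

# Hypothetical usage with identity projections and toy 2-D vectors.
I2 = [[1.0, 0.0], [0.0, 1.0]]
print(cross_attention([[1.0, 0.0]], [[0.5, 2.0]], I2, I2, I2))       # [[0.5, 2.0]]
print(conditioned_time_embedding([1.0, 1.0], [[1.0, 3.0], [3.0, 5.0]], I2))  # [3.0, 5.0]
```

The two paths play different roles in this reading: cross-attention lets each spatial location of the image latent pick out relevant parts of the brain representation, while the timestep-embedding term injects a global summary that modulates every denoising step.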
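The "100-way semantic classification" result is an n-way top-1 accuracy. A common formulation, assumed here rather than taken from the authors' code, runs a pretrained ImageNet classifier on the generated image, keeps the class the same classifier assigns to the ground-truth stimulus plus n-1 randomly drawn distractor classes, and counts a success when the ground-truth class scores highest among those n candidates; the result is averaged over many random draws. The sketch takes the classifier's scores as given; the name `n_way_top1` and the trial count are hypothetical.

```python
import random
from typing import Sequence

def n_way_top1(scores: Sequence[float], gt_class: int, n: int,
               trials: int, rng: random.Random) -> float:
    """Fraction of trials in which gt_class outscores n-1 random distractor classes.

    scores: classifier outputs for the *generated* image (probabilities or logits;
            only the ranking matters).
    gt_class: class assigned to the ground-truth stimulus image.
    Ties are resolved in favour of gt_class because it is listed first.
    """
    distractors = [c for c in range(len(scores)) if c != gt_class]
    wins = 0
    for _ in range(trials):
        candidates = [gt_class] + rng.sample(distractors, n - 1)
        if max(candidates, key=lambda c: scores[c]) == gt_class:
            wins += 1
    return wins / trials

# Hypothetical usage with a synthetic 1000-class score vector.
rng = random.Random(0)
scores = [rng.random() * 0.01 for _ in range(1000)]
scores[7] = 0.5  # generated image strongly resembles class 7
print(n_way_top1(scores, gt_class=7, n=100, trials=50, rng=rng))  # 1.0
```

Chance level for this metric is 1/n, so a 100-way accuracy should be read against a 1% baseline rather than against the 0.1% chance of plain 1000-class top-1 accuracy.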

Keywords

self-supervised learning, image reconstruction, brain-computer interface, fMRI decoding, diffusion model, latent diffusion, vision decoding, sparse masked modeling
