Zero-Shot Visual Reconstruction from MEG Signals Using Contrastive Learning and Generative Diffusion

Poster No:

774 

Submission Type:

Abstract Submission 

Authors:

Xiuwen Wu1,2, Pan Liao2,3, Bingjiang Lyu3, Jia-hong Gao1,2,3

Institutions:

1Center for Biomedical Imaging, University of Science and Technology of China, Hefei, Anhui, 2Center for MRI Research, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing, China, 3Changping Lab, Beijing, China

First Author:

Xiuwen Wu  
Center for Biomedical Imaging, University of Science and Technology of China|Center for MRI Research, Academy for Advanced Interdisciplinary Studies, Peking University
Hefei, Anhui|Beijing, China

Co-Author(s):

Pan Liao  
Center for MRI Research, Academy for Advanced Interdisciplinary Studies, Peking University|Changping Lab
Beijing, China|Beijing, China
Bingjiang Lyu  
Changping Lab
Beijing, China
Jia-hong Gao  
Center for Biomedical Imaging, University of Science and Technology of China|Center for MRI Research, Academy for Advanced Interdisciplinary Studies, Peking University|Changping Lab
Hefei, Anhui|Beijing, China|Beijing, China

Introduction:

Understanding how the human brain perceives and represents the visual world is a central question in neuroscience, artificial intelligence, and brain-computer interface (BCI) research. Reconstructing visual information from brain activity not only reveals neural mechanisms of human cognition but also lays the foundation for practical BCI applications. Magnetoencephalography (MEG) records dynamic brain activity in response to visual stimuli, while generative diffusion models provide powerful tools for visual reconstruction. In this study, we use contrastive and self-supervised learning to align MEG embeddings with the high-level and low-level feature spaces of a generative diffusion model, enabling high-quality reconstruction of visual stimuli. Our method supports zero-shot visual reconstruction and achieves state-of-the-art (SOTA) performance across multiple reconstruction metrics.

Methods:

To reconstruct visual stimuli from MEG, we used the THINGS-MEG dataset (Hebart et al., 2023), which includes recordings from four participants during a picture-viewing task. To enable a zero-shot reconstruction test, we removed all classes that overlapped between the training and test sets; the adjusted training set thus contains 1,654 classes per participant, while the test set includes 200 classes. In addition, we used resting-state MEG data from another 40 subjects for self-supervised pre-training, preprocessed in the same manner as the THINGS-MEG data.
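As a rough illustration, the zero-shot split can be expressed as follows (a minimal Python sketch; the trial lists, variable names, and loading code are illustrative assumptions, not the original preprocessing pipeline):

    # Build a zero-shot split: any class present in the test set is removed
    # from the training set, so test classes are never seen during training.
    train_classes = {label for _, label in train_trials}   # (epoch, class) pairs
    test_classes = {label for _, label in test_trials}
    overlap = train_classes & test_classes
    train_trials = [(x, y) for (x, y) in train_trials if y not in overlap]
    assert not ({y for _, y in train_trials} & test_classes)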
We used the Versatile Diffusion model for image reconstruction (Xu et al., 2023) and designed low-level and high-level pipelines to align MEG signals with the VAE and CLIP features of the same stimuli in a high-dimensional space. In the low-level pipeline, we first pre-trained a 10-layer Transformer on resting-state MEG data using the Masked Autoencoder (MAE) algorithm (He et al., 2022) and then fine-tuned it on the THINGS-MEG training set. In the high-level pipeline, we used a convolutional neural network (CNN) to compress the MEG data and align it with CLIP embeddings, adopting the InfoNCE loss (Oord et al., 2018) for this alignment. The trained models were then applied to the test set, and their outputs were fed into the Versatile Diffusion model to generate reconstructed images. In brief, self-supervised learning captures the intrinsic properties of MEG data, fine-tuning improves alignment with the generative model's semantic space, and contrastive learning further enhances the model's zero-shot capability by capturing the topological structure of the semantic space.
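For concreteness, the high-level contrastive alignment can be sketched as below (a minimal PyTorch example of the InfoNCE objective; the temperature, dimensions, and variable names are assumptions for illustration, not the authors' implementation):

    import torch
    import torch.nn.functional as F

    def info_nce_loss(meg_emb, clip_emb, temperature=0.07):
        # meg_emb, clip_emb: (batch, dim) embeddings of the same stimuli,
        # produced by the MEG encoder and the CLIP image encoder.
        meg_emb = F.normalize(meg_emb, dim=-1)
        clip_emb = F.normalize(clip_emb, dim=-1)
        logits = meg_emb @ clip_emb.t() / temperature  # (batch, batch) cosine similarities
        targets = torch.arange(meg_emb.size(0), device=meg_emb.device)
        # Matched MEG/CLIP pairs lie on the diagonal; other pairs act as negatives.
        return F.cross_entropy(logits, targets)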

Results:

Our reconstructed images match the ground-truth visual stimuli well in zero-shot experiments (Fig. 1), demonstrating that our method can successfully reconstruct visual stimuli from MEG signals. To evaluate the quality of the reconstructed images, we employed a range of metrics, including PixCorr, SSIM, AlexNet(2), AlexNet(5), Inception, and CLIP-based evaluation scores (Scotti et al., 2024). We compared our method with recent MEG studies on image reconstruction (Benchetrit et al., 2023; Li et al., 2024). The results show that our method achieves SOTA performance on five of the six evaluation metrics (Fig. 2).
Supporting Image: results.png
Supporting Image: normalized_radar_chart_final1.png
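For reference, the two low-level metrics can be computed as in the sketch below (assuming reconstructions and ground-truth stimuli are resized to the same resolution and scaled to [0, 1]; the AlexNet, Inception, and CLIP scores follow the feature-based two-way comparisons in Scotti et al., 2024):

    import numpy as np
    from skimage.metrics import structural_similarity as ssim

    def pixcorr(recon, target):
        # Pearson correlation between flattened pixel values of the two images.
        return np.corrcoef(recon.ravel(), target.ravel())[0, 1]

    def ssim_score(recon, target):
        # Structural similarity, here on 2D grayscale arrays in [0, 1].
        return ssim(recon, target, data_range=1.0)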
 

Conclusions:

In this study, we proposed a novel method for reconstructing visual stimuli from MEG signals by leveraging contrastive learning and a generative diffusion model. Our approach aligns MEG embeddings with both the low-level and high-level feature spaces of the diffusion model, enabling zero-shot, high-quality visual reconstruction. The results demonstrate that our method successfully reconstructs visual stimuli from brain activity, achieving SOTA performance across multiple evaluation metrics. This work highlights the potential of MEG-based visual reconstruction for understanding how the brain perceives and represents the visual world.

Higher Cognitive Functions:

Imagery 1

Modeling and Analysis Methods:

EEG/MEG Modeling and Analysis 2

Keywords:

Computational Neuroscience
Computing
Data analysis
Machine Learning
MEG
Vision

1|2 Indicates the priority used for review

Abstract Information

By submitting your proposal, you grant permission for the Organization for Human Brain Mapping (OHBM) to distribute your work in any format, including video, audio print and electronic text through OHBM OnDemand, social media channels, the OHBM website, or other electronic publications and media.

I accept

The Open Science Special Interest Group (OSSIG) is introducing a reproducibility challenge for OHBM 2025. This new initiative aims to enhance the reproducibility of scientific results and foster collaborations between labs. Teams will consist of a “source” party and a “reproducing” party, and will be evaluated on the success of their replication, the openness of the source work, and additional deliverables. Propose your OHBM abstract(s) as source work for future OHBM meetings by selecting one of the following options:

I am submitting this abstract as an original work to be reproduced. I am available to be the “source party” in an upcoming team and consent to have this work listed on the OSSIG website. I agree to be contacted by OSSIG regarding the challenge and may share data used in this abstract with another team.

Please indicate below if your study was a "resting state" or "task-activation” study.

Task-activation

Healthy subjects only or patients (note that patient studies may also involve healthy subjects):

Healthy subjects

Was this research conducted in the United States?

No

Were any human subjects research approved by the relevant Institutional Review Board or ethics panel? NOTE: Any human subjects studies without IRB approval will be automatically rejected.

Yes

Were any animal research approved by the relevant IACUC or other animal research panel? NOTE: Any animal studies without IACUC approval will be automatically rejected.

Not applicable

Please indicate which methods were used in your research:

MEG

For human MRI, what field strength scanner do you use?

3.0T

Which processing packages did you use for your study?

Free Surfer

Provide references using APA citation style.

Hebart, M. N. (2023). THINGS-data, a multimodal collection of large-scale datasets for investigating object representations in human brain and behavior. eLife, 12, e82580.
Xu, X. (2023). Versatile diffusion: Text, images and variations all in one diffusion model. In Proceedings of the IEEE/CVF International Conference on Computer Vision (pp. 7754-7765).
He, K. (2022). Masked autoencoders are scalable vision learners. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 16000-16009).
Oord, A. V. D. (2018). Representation learning with contrastive predictive coding. arXiv preprint arXiv:1807.03748.
Scotti, P. (2024). Reconstructing the mind's eye: fMRI-to-image with contrastive learning and diffusion priors. Advances in Neural Information Processing Systems, 36.
Benchetrit, Y. (2023). Brain decoding: toward real-time reconstruction of visual perception. arXiv preprint arXiv:2310.19812.
Li, D. (2024). Visual decoding and reconstruction via EEG embeddings with guided diffusion. arXiv preprint arXiv:2403.07721.

UNESCO Institute of Statistics and World Bank Waiver Form

I attest that I currently live, work, or study in a country on the UNESCO Institute of Statistics and World Bank List of Low and Middle Income Countries list provided.

No