Poster No:
1551
Submission Type:
Abstract Submission
Authors:
Ruilin Li1, Zijiao Chen1, Juan Helen Zhou1
Institutions:
1National University of Singapore, Singapore, Singapore
First Author:
Ruilin Li
National University of Singapore
Singapore, Singapore
Co-Author(s):
Zijiao Chen
National University of Singapore
Singapore, Singapore
Introduction:
Reconstructing and interpreting the relationship between specific brain regions and auditory input remains a significant challenge. To address this, we propose Mind-Audio, a novel framework capable of not only recovering perceived audio signals from functional magnetic resonance imaging (fMRI) but also offering detailed insights into the brain regions responsible for auditory processing. Empirical results demonstrate the framework's potential to reconstruct speech signals. Furthermore, interpretation analysis reveals that the brain-to-audio reconstruction process progresses from capturing coarse outlines to finer details.
Methods:
Our framework for fMRI-to-audio reconstruction integrated self-supervised learning (Chen et al., 2023) to encode complex fMRI data with a diffusion model for audio generation. Specifically, we replaced the class embedding in AudioLDM (Liu et al., 2023) with fMRI embeddings and fine-tuned the fMRI encoder alongside the attention layers in AudioLDM's UNet. However, the default conditioning mechanism in AudioLDM, where class embeddings are added to latent images, limits interpretability. To address this, we designed a UNet architecture that incorporates conditioning through cross-attention layers and processes mel-spectrograms in pixel space, akin to DDPM (Ho et al., 2020); we named this model NeuroSynth. This design allows detailed interpretation of the relationship between fMRI signals and the generated mel-spectrograms. To visualize how brain regions influence the generated mel-spectrograms, we upscaled intermediate attention maps to match the original mel-spectrogram dimensions, as illustrated in the sketch below.
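The following is a minimal sketch, not the authors' released code, of how a single-head cross-attention layer can inject fMRI embeddings into a UNet feature map and expose attention maps that are upscaled to the mel-spectrogram size for interpretation; the module names, tensor shapes, and mel-spectrogram dimensions are illustrative assumptions.

```python
# Minimal sketch (assumptions: dim, fmri_dim, mel size; not the original implementation).
import torch
import torch.nn as nn
import torch.nn.functional as F

class FMRICrossAttention(nn.Module):
    def __init__(self, dim=64, fmri_dim=1024):
        super().__init__()
        self.to_q = nn.Linear(dim, dim)             # queries from spectrogram features
        self.to_kv = nn.Linear(fmri_dim, 2 * dim)   # keys/values from fMRI tokens
        self.to_out = nn.Linear(dim, dim)

    def forward(self, x, fmri_tokens, mel_size=(80, 256)):
        # x: (B, C, H, W) UNet features; fmri_tokens: (B, N, fmri_dim)
        B, C, H, W = x.shape
        q = self.to_q(x.flatten(2).transpose(1, 2))                      # (B, H*W, C)
        k, v = self.to_kv(fmri_tokens).chunk(2, dim=-1)                  # (B, N, C) each
        attn = torch.softmax(q @ k.transpose(1, 2) / C ** 0.5, dim=-1)   # (B, H*W, N)
        out = self.to_out(attn @ v).transpose(1, 2).reshape(B, C, H, W)
        # Per-fMRI-token attention maps, upscaled to the mel-spectrogram size
        maps = F.interpolate(attn.transpose(1, 2).reshape(B, -1, H, W),
                             size=mel_size, mode="bilinear", align_corners=False)
        return x + out, maps

# Example shapes: batch of 2, 64-channel 10x32 feature maps, 16 fMRI tokens
layer = FMRICrossAttention()
feats, attn_maps = layer(torch.randn(2, 64, 10, 32), torch.randn(2, 16, 1024))
```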
The models were trained on publicly available fMRI data from three participants in a listening experiment (Tang et al., 2023). Audio stimuli were segmented to align with the fMRI TRs, with each TR corresponding to a 2-second audio segment.
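As a hedged illustration of TR-aligned segmentation (the sampling rate and non-overlapping slicing are assumptions, not details reported in the abstract):

```python
# Minimal sketch: slice a waveform into 2-second segments, one per fMRI TR.
# The 16 kHz sampling rate is an illustrative assumption.
import numpy as np

def segment_audio(waveform: np.ndarray, sr: int = 16000, tr_sec: float = 2.0):
    """Return a list of non-overlapping 2 s segments, one per fMRI TR."""
    seg_len = int(sr * tr_sec)
    n_trs = len(waveform) // seg_len
    return [waveform[i * seg_len:(i + 1) * seg_len] for i in range(n_trs)]

# Example: 10 minutes of audio at 16 kHz -> 300 TR-aligned segments
segments = segment_audio(np.zeros(16000 * 600))
assert len(segments) == 300 and len(segments[0]) == 32000
```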
We evaluated the framework using two categories of metrics. The first was the N-way classification accuracy test (Chen et al., 2023), which assesses the predicted audio tags of reconstructed samples over 100 trials, with averaged results reported. The second included metrics such as Fréchet Audio Distance (FAD) and Kullback–Leibler (KL) divergence, which measure the similarity between generated and ground-truth mel-spectrograms.
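A minimal sketch of an n-way, top-1 identification test of the kind described above, assuming the reconstructed and ground-truth clips are compared via tag-embedding cosine similarity; the embedding source and distractor sampling are assumptions rather than the exact evaluation code.

```python
# Minimal sketch: for each reconstructed clip, compare its embedding against the
# ground-truth embedding plus n-1 random distractors; a trial is correct if the
# ground truth is the nearest neighbour. Results are averaged over repeated trials.
import numpy as np

def n_way_accuracy(recon_emb, gt_emb, n=2, trials=100, seed=0):
    rng = np.random.default_rng(seed)
    n_samples = len(recon_emb)
    correct = 0
    for _ in range(trials):
        for i in range(n_samples):
            distractors = rng.choice(
                [j for j in range(n_samples) if j != i], size=n - 1, replace=False)
            candidates = np.concatenate([[i], distractors])
            # cosine similarity between the reconstruction and each candidate ground truth
            sims = [np.dot(recon_emb[i], gt_emb[c]) /
                    (np.linalg.norm(recon_emb[i]) * np.linalg.norm(gt_emb[c]) + 1e-8)
                    for c in candidates]
            correct += int(np.argmax(sims) == 0)
    return correct / (trials * n_samples)
```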
Results:
Compared to the NeuroSynth baseline trained from scratch, the fine-tuned AudioLDM demonstrated superior performance across all three subjects, as shown in Figure 1. These results indicate that the proposed models have the potential to effectively reconstruct speech signals.
Interpretation analysis, illustrated in Figure 2, revealed that different brain patches contribute to distinct stages of reconstruction: some patches contribute to generating coarse outlines (examples on the left side of Figure 2), while others contribute to refining finer details (examples on the right side of Figure 2).
Conclusions:
The proposed Mind-Audio framework represents a significant advancement in recovering perceived speech signals from fMRI while providing a robust interpretation of auditory processing in the brain. By bridging the gap between auditory stimuli and brain activity, this research paves the way for breakthroughs in non-invasive audio decoding and a deeper understanding of brain-audio interactions.
Modeling and Analysis Methods:
Activation (eg. BOLD task-fMRI)
Methods Development 1
Other Methods 2
Keywords:
Machine Learning
Other - Brain decoding
1|2 Indicates the priority used for review
By submitting your proposal, you grant permission for the Organization for Human Brain Mapping (OHBM) to distribute your work in any format, including video, audio print and electronic text through OHBM OnDemand, social media channels, the OHBM website, or other electronic publications and media.
I accept
The Open Science Special Interest Group (OSSIG) is introducing a reproducibility challenge for OHBM 2025. This new initiative aims to enhance the reproducibility of scientific results and foster collaborations between labs. Teams will consist of a “source” party and a “reproducing” party, and will be evaluated on the success of their replication, the openness of the source work, and additional deliverables. Click here for more information.
Propose your OHBM abstract(s) as source work for future OHBM meetings by selecting one of the following options:
I am submitting this abstract as an original work to be reproduced. I am available to be the “source party” in an upcoming team and consent to have this work listed on the OSSIG website. I agree to be contacted by OSSIG regarding the challenge and may share data used in this abstract with another team.
Please indicate below if your study was a "resting state" or "task-activation” study.
Task-activation
Healthy subjects only or patients (note that patient studies may also involve healthy subjects):
Healthy subjects
Was this research conducted in the United States?
No
Were any human subjects research approved by the relevant Institutional Review Board or ethics panel?
NOTE: Any human subjects studies without IRB approval will be automatically rejected.
Not applicable
Were any animal research approved by the relevant IACUC or other animal research panel?
NOTE: Any animal studies without IACUC approval will be automatically rejected.
Not applicable
Please indicate which methods were used in your research:
Functional MRI
Provide references using APA citation style.
1. Chen, Z., Qing, J., Xiang, T., Yue, W. L., & Zhou, J. H. (2023). Seeing beyond the brain: Conditional diffusion model with sparse masked modeling for vision decoding. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 22710–22720.
2. Ho, J., Jain, A., & Abbeel, P. (2020). Denoising diffusion probabilistic models. Advances in Neural Information Processing Systems, 33, 6840–6851.
3. Liu, H., Chen, Z., Yuan, Y., Mei, X., Liu, X., Mandic, D., Wang, W., & Plumbley, M. D. (2023). AudioLDM: Text-to-audio generation with latent diffusion models. International Conference on Machine Learning (ICML). PMLR.
4. Tang, J., LeBel, A., Jain, S., & Huth, A. G. (2023). Semantic reconstruction of continuous language from non-invasive brain recordings. Nature Neuroscience, 26(5), 858–866.
No