A New Brain Encoding Method with Mixture-of-Experts for Multi-ROI Joint Training

Poster No:

1153 

Submission Type:

Abstract Submission 

Authors:

Yuhui Du1, Yuduo Zhang2, Vince Calhoun3

Institutions:

1Shanxi University, Taiyuan, Shanxi, 2Shanxi University, Taiyuan, Shanxi, 3GSU/GATech/Emory, Atlanta, GA

First Author:

Yuhui Du  
Shanxi University
Taiyuan, Shanxi

Co-Author(s):

Yuduo Zhang  
Shanxi University
Taiyuan, Shanxi
Vince Calhoun  
GSU/GATech/Emory
Atlanta, GA

Introduction:

Understanding how the brain encodes visual information is crucial for advancements in both neuroscience and artificial intelligence. While existing visual encoding models have made significant progress, they often fail to adequately account for the complex interactions between different regions of interest (ROIs) in the brain. Traditional voxel-wise methods treat brain voxels as independent entities, while ROI-based methods, although more effective at capturing spatial redundancy, still fall short in leveraging the interconnectivity between various brain areas. In this study, we introduce a novel approach that employs a Mixture-of-Experts (MoE) mechanism combined with joint training across multiple ROIs to enhance the predictive accuracy and integration of brain responses to natural visual stimuli.

Methods:

We used the Natural Scenes Dataset (NSD) (Allen et al., 2022), which provides high-resolution 7T fMRI scans from multiple participants viewing a large set of natural scene images. Single-trial brain responses to each image were estimated with GLMsingle (Prince et al., 2022), and these responses were mapped onto the cortical surface for further analysis. To extract image features for brain response prediction, we used a pretrained CLIP visual model (Radford et al., 2021), which provides rich representations learned through vision-language pretraining.
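As an illustrative sketch (not the authors' code), multi-layer visual features could be extracted from a pretrained CLIP vision backbone as follows; the specific library (Hugging Face Transformers) and checkpoint (openai/clip-vit-base-patch32) are assumptions, since the abstract does not name them.

# Hedged sketch: multi-layer feature extraction from a pretrained CLIP vision
# backbone. The checkpoint and layer selection are assumptions; the abstract
# only states that multi-layer CLIP visual features are used.
import torch
from PIL import Image
from transformers import CLIPImageProcessor, CLIPVisionModel

processor = CLIPImageProcessor.from_pretrained("openai/clip-vit-base-patch32")
backbone = CLIPVisionModel.from_pretrained("openai/clip-vit-base-patch32").eval()

@torch.no_grad()
def clip_multilayer_features(image: Image.Image) -> list[torch.Tensor]:
    """Return hidden states from every transformer layer for one scene image."""
    inputs = processor(images=image, return_tensors="pt")
    outputs = backbone(**inputs, output_hidden_states=True)
    # hidden_states: tuple of (1, num_tokens, dim) tensors, one per layer
    return list(outputs.hidden_states)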
Our brain encoding model leverages an MoE framework to facilitate joint training across multiple ROIs. The architecture is depicted in Figure 1(A): a dedicated single-ROI encoder processes the visual features for each ROI. Each image is passed through the pretrained CLIP visual model, which yields multi-layer features for each scene. These features then enter a fusion block, where they are weighted by ROI-specific attention mechanisms; the fusion block reduces the feature-map dimensions and creates a dynamic attention map that focuses on critical regions for that ROI.
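A minimal sketch of how such an ROI-specific fusion block might look is given below, assuming a list of CLIP token features of shape (batch, tokens, dim); the learned layer weighting, linear dimensionality reduction, and token-level attention are plausible readings of the description, and all names and sizes are hypothetical.

# Hedged sketch of an ROI-specific fusion block (one instance per ROI).
import torch
import torch.nn as nn

class ROIFusionBlock(nn.Module):
    def __init__(self, num_layers: int, dim: int, reduced_dim: int):
        super().__init__()
        self.layer_logits = nn.Parameter(torch.zeros(num_layers))  # learned per-layer weights
        self.reduce = nn.Linear(dim, reduced_dim)                  # shrink feature dimension
        self.attn_score = nn.Linear(reduced_dim, 1)                # spatial attention over tokens

    def forward(self, layer_feats: list[torch.Tensor]) -> torch.Tensor:
        # Weighted sum over layers, then reduce feature dimensionality
        w = torch.softmax(self.layer_logits, dim=0)                 # (L,)
        fused = sum(wi * f for wi, f in zip(w, layer_feats))        # (B, T, dim)
        fused = self.reduce(fused)                                  # (B, T, reduced_dim)
        # Dynamic attention map over tokens, then pooled ROI feature
        attn = torch.softmax(self.attn_score(fused), dim=1)         # (B, T, 1)
        return (attn * fused).sum(dim=1)                            # (B, reduced_dim)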
Following the individual ROI encoding, the MoE framework enables cross-ROI information integration, as shown in Figure 1(B). ROI-specific routers selectively route each ROI's features to the most relevant experts and weight the expert outputs according to that ROI's needs, allowing efficient multi-ROI encoding. Joint training across multiple ROIs improves the model's generalization and enhances the predictive accuracy of brain responses.
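The following sketch illustrates one way such a multi-ROI MoE could be implemented, with a shared pool of expert MLPs, one softmax router per ROI, and one voxel readout per ROI; the expert count, dense (rather than top-k) routing, and readout design are assumptions rather than details stated in the abstract.

# Hedged sketch of the cross-ROI Mixture-of-Experts layer.
import torch
import torch.nn as nn

class MultiROIMoE(nn.Module):
    def __init__(self, num_rois: int, dim: int, num_experts: int, voxels_per_roi: list[int]):
        super().__init__()
        # Shared pool of expert MLPs used by all ROIs
        self.experts = nn.ModuleList(
            [nn.Sequential(nn.Linear(dim, dim), nn.GELU(), nn.Linear(dim, dim))
             for _ in range(num_experts)]
        )
        # One router per ROI maps the ROI feature to expert mixing weights
        self.routers = nn.ModuleList([nn.Linear(dim, num_experts) for _ in range(num_rois)])
        # One linear readout per ROI predicts that ROI's voxel responses
        self.readouts = nn.ModuleList([nn.Linear(dim, v) for v in voxels_per_roi])

    def forward(self, roi_feats: list[torch.Tensor]) -> list[torch.Tensor]:
        preds = []
        for r, feat in enumerate(roi_feats):                                   # feat: (B, dim)
            gates = torch.softmax(self.routers[r](feat), dim=-1)               # (B, E)
            expert_out = torch.stack([e(feat) for e in self.experts], dim=1)   # (B, E, dim)
            mixed = (gates.unsqueeze(-1) * expert_out).sum(dim=1)              # (B, dim)
            preds.append(self.readouts[r](mixed))                              # (B, voxels_r)
        return preds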
Supporting Image: OHBM25_1.png
 

Results:

As depicted in Figure 2(A), the predictive performance (r) for each voxel of one subject is mapped back onto the cortical surface, thresholded at p < 0.05. Our model demonstrated superior performance, with the highest voxel-wise encoding accuracy (r) reaching 0.889. This analysis highlighted robust predictive capability in the primary visual cortex as well as effective generalization across other visual areas.
Figure 2(B) and Figure 2(C) display the distribution of voxel encoding performance within different ROIs across both hemispheres for subject 1 (sub1), measured as noise-normalized performance. Predictive performance was generally comparable between the left and right hemispheres, with some ROIs reaching an average of up to 80% of the theoretical upper limit (the noise ceiling).
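For reference, a minimal sketch of these evaluation metrics is given below: voxel-wise Pearson r between predicted and measured responses, and a noise-normalized score expressing performance relative to an estimated noise ceiling; the exact noise-ceiling definition used in the study is an assumption here.

# Hedged sketch of the evaluation metrics.
import numpy as np

def voxelwise_pearson_r(pred: np.ndarray, meas: np.ndarray) -> np.ndarray:
    """pred, meas: (n_images, n_voxels). Returns Pearson r per voxel."""
    p = pred - pred.mean(axis=0, keepdims=True)
    m = meas - meas.mean(axis=0, keepdims=True)
    num = (p * m).sum(axis=0)
    den = np.sqrt((p ** 2).sum(axis=0) * (m ** 2).sum(axis=0))
    return num / np.maximum(den, 1e-12)

def noise_normalized(r: np.ndarray, noise_ceiling_r: np.ndarray) -> np.ndarray:
    """Explained variance as a fraction of the noise-ceiling variance (assumed definition)."""
    return np.clip(r, 0, None) ** 2 / np.maximum(noise_ceiling_r ** 2, 1e-12)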
Supporting Image: OHBM25_2.png
 

Conclusions:

This work proposes a novel brain encoding model that leverages an MoE framework and joint training to enhance performance by facilitating information sharing across multiple ROIs. Our method offers several advantages. First, it allows feature processing to be customized through ROI-specific routers and experts, enabling the model to adapt more effectively to the unique characteristics of different brain areas. Second, by jointly training across multiple ROIs, the model can leverage shared information, potentially reducing overfitting and improving generalization across similar visual tasks. We validated our model on the NSD and demonstrated that it outperforms traditional single-ROI training approaches.

Modeling and Analysis Methods:

Activation (eg. BOLD task-fMRI)
Classification and Predictive Modeling 1

Perception, Attention and Motor Behavior:

Perception: Visual 2

Keywords:

Other - Brain encoding, fMRI

1|2Indicates the priority used for review

Abstract Information

By submitting your proposal, you grant permission for the Organization for Human Brain Mapping (OHBM) to distribute your work in any format, including video, audio print and electronic text through OHBM OnDemand, social media channels, the OHBM website, or other electronic publications and media.

I accept

The Open Science Special Interest Group (OSSIG) is introducing a reproducibility challenge for OHBM 2025. This new initiative aims to enhance the reproducibility of scientific results and foster collaborations between labs. Teams will consist of a "source" party and a "reproducing" party, and will be evaluated on the success of their replication, the openness of the source work, and additional deliverables. Propose your OHBM abstract(s) as source work for future OHBM meetings by selecting one of the following options:

I do not want to participate in the reproducibility challenge.

Please indicate below if your study was a "resting state" or "task-activation" study.

Task-activation

Healthy subjects only or patients (note that patient studies may also involve healthy subjects):

Healthy subjects

Was this research conducted in the United States?

No

Were any human subjects research approved by the relevant Institutional Review Board or ethics panel? NOTE: Any human subjects studies without IRB approval will be automatically rejected.

Yes

Were any animal research approved by the relevant IACUC or other animal research panel? NOTE: Any animal studies without IACUC approval will be automatically rejected.

Not applicable

Please indicate which methods were used in your research:

Functional MRI
Computational modeling

For human MRI, what field strength scanner do you use?

7T

Which processing packages did you use for your study?

FreeSurfer

Provide references using APA citation style.

Allen, E. J., St-Yves, G., Wu, Y., Breedlove, J. L., Prince, J. S., Dowdle, L. T., Nau, M., Caron, B., Pestilli, F., & Charest, I. (2022). A massive 7T fMRI dataset to bridge cognitive neuroscience and artificial intelligence. Nature Neuroscience, 25(1), 116–126.
Prince, J. S., Charest, I., Kurzawski, J. W., Pyles, J. A., Tarr, M. J., & Kay, K. N. (2022). Improving the accuracy of single-trial fMRI response estimates using GLMsingle. eLife, 11, e77599. https://doi.org/10.7554/eLife.77599
Radford, A., Kim, J. W., Hallacy, C., Ramesh, A., Goh, G., Agarwal, S., Sastry, G., Askell, A., Mishkin, P., Clark, J., Krueger, G., & Sutskever, I. (2021). Learning Transferable Visual Models From Natural Language Supervision (No. arXiv:2103.00020). arXiv. https://doi.org/10.48550/arXiv.2103.00020

UNESCO Institute of Statistics and World Bank Waiver Form

I attest that I currently live, work, or study in a country on the UNESCO Institute of Statistics and World Bank List of Low and Middle Income Countries list provided.

No