Poster No:
1153
Submission Type:
Abstract Submission
Authors:
Yuhui Du1, Yuduo Zhang2, Vince Calhoun3
Institutions:
1Shanxi University, Taiyuan, Shanxi, 2Shanxi University, Taiyuan, Shanxi, 3GSU/GATech/Emory, Atlanta, GA
First Author:
Yuhui Du
Shanxi University
Taiyuan, Shanxi
Co-Author(s):
Introduction:
Understanding how the brain encodes visual information is crucial for advancements in both neuroscience and artificial intelligence. While existing visual encoding models have made significant progress, they often fail to adequately account for the complex interactions between different regions of interest (ROIs) in the brain. Traditional voxel-wise methods treat brain voxels as independent entities, while ROI-based methods, although more effective at capturing spatial redundancy, still fall short in leveraging the interconnectivity between various brain areas. In this study, we introduce a novel approach that employs a Mixture-of-Experts (MoE) mechanism combined with joint training across multiple ROIs to enhance the predictive accuracy and integration of brain responses to natural visual stimuli.
Methods:
We used the Natural Scenes Dataset (NSD) (Allen 2022), which includes high-resolution 7T fMRI scans from multiple participants viewing a large set of natural scene images. Single-trial brain responses to each image were estimated with GLMsingle (Prince 2022) and mapped onto the cortical surface for further analysis. To extract stimulus features for brain response prediction, we used a pretrained CLIP visual model (Radford 2021), which excels at feature extraction owing to its vision-language pretraining.
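As a rough illustration of the feature-extraction step, the sketch below pulls multi-layer features from a pretrained CLIP vision backbone via the Hugging Face transformers library. The specific checkpoint (openai/clip-vit-base-patch32), the example file name, and the choice to keep every transformer layer are illustrative assumptions rather than details reported here.

```python
import torch
from PIL import Image
from transformers import CLIPVisionModel, CLIPImageProcessor

# Assumed checkpoint; the actual CLIP variant used in the study may differ.
model = CLIPVisionModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPImageProcessor.from_pretrained("openai/clip-vit-base-patch32")
model.eval()

image = Image.open("scene_0001.png").convert("RGB")   # hypothetical NSD stimulus file
inputs = processor(images=image, return_tensors="pt")

with torch.no_grad():
    out = model(**inputs, output_hidden_states=True)

# out.hidden_states is a tuple (embedding output + one tensor per transformer layer),
# each of shape [batch, n_tokens, feat_dim]; stack the layer outputs so a downstream
# fusion block can weight them per ROI.
multi_layer_feats = torch.stack(out.hidden_states[1:], dim=0)  # [n_layers, B, T, D]
```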
Our brain encoding model leverages an MoE framework to facilitate joint training across multiple ROIs. The architecture is depicted in Figure 1(A): a single-ROI encoder processes the visual features of each ROI. Each image is passed through the pretrained CLIP visual model, which provides multi-layer features for each scene. These features enter a fusion block, where they are weighted by ROI-specific attention mechanisms; the fusion block reduces the feature map dimensions and produces a dynamic attention map that focuses on the most informative parts of the image.
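The following PyTorch module is a minimal, hypothetical sketch of such a single-ROI fusion block: ROI-specific softmax weights over CLIP layers, a token-level attention map, dimensionality reduction, and a linear readout to that ROI's voxels. All layer sizes and the exact attention form are assumptions, not the reported architecture.

```python
import torch
import torch.nn as nn

class ROIFusionEncoder(nn.Module):
    """Hypothetical single-ROI encoder: ROI-specific weighting of multi-layer
    CLIP features, a token-level attention map, dimensionality reduction,
    and a linear readout to the ROI's voxels."""

    def __init__(self, n_layers: int, feat_dim: int, hidden_dim: int, n_voxels: int):
        super().__init__()
        # ROI-specific weights over CLIP layers (softmax-normalized in forward)
        self.layer_logits = nn.Parameter(torch.zeros(n_layers))
        # Produces the dynamic attention map over image tokens
        self.token_attn = nn.Linear(feat_dim, 1)
        # Dimensionality reduction and voxel readout
        self.reduce = nn.Linear(feat_dim, hidden_dim)
        self.readout = nn.Linear(hidden_dim, n_voxels)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: [n_layers, batch, n_tokens, feat_dim] from the CLIP backbone
        layer_w = torch.softmax(self.layer_logits, dim=0)            # [L]
        fused = torch.einsum("l,lbtd->btd", layer_w, feats)          # weight layers
        attn = torch.softmax(self.token_attn(fused), dim=1)          # [B, T, 1]
        pooled = (attn * fused).sum(dim=1)                           # [B, D]
        return self.readout(torch.relu(self.reduce(pooled)))         # [B, n_voxels]
```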
Following the individual ROI encoding, the MoE framework is introduced to enable cross-ROI information integration, as shown in Figure 1(B). This framework uses ROI-specific routers that assign different expert outputs according to the requirements of each ROI, so that each ROI is selectively routed to its most relevant experts and multi-ROI encoding remains efficient. Joint training across multiple ROIs improves the generalization of the model and enhances the predictive accuracy of brain responses.
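A minimal sketch of such cross-ROI routing is given below, assuming a shared pool of expert MLPs, one linear router per ROI, and top-k gating; the expert count, top-k value, and layer shapes are illustrative guesses rather than the reported design.

```python
import torch
import torch.nn as nn

class ROIMoE(nn.Module):
    """Hypothetical cross-ROI mixture-of-experts: a shared pool of expert MLPs
    and one router per ROI that gates each ROI's features to its top-k experts."""

    def __init__(self, n_rois: int, dim: int, n_experts: int = 8, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, dim), nn.GELU(), nn.Linear(dim, dim))
            for _ in range(n_experts)
        )
        # ROI-specific routers over the shared expert pool
        self.routers = nn.ModuleList(nn.Linear(dim, n_experts) for _ in range(n_rois))

    def forward(self, x: torch.Tensor, roi_idx: int) -> torch.Tensor:
        # x: [batch, dim] fused features for one ROI
        logits = self.routers[roi_idx](x)                 # [B, n_experts]
        top_w, top_i = logits.topk(self.top_k, dim=-1)    # route to the top-k experts
        top_w = torch.softmax(top_w, dim=-1)
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = top_i[:, k] == e
                if mask.any():
                    out[mask] += top_w[mask, k:k + 1] * expert(x[mask])
        return out
```

Under this reading of joint training, each ROI's prediction loss would be summed into a single objective per batch, so the shared experts and the ROI-specific routers are optimized together and information learned for one ROI can benefit the others.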

Results:
As depicted in Figure 2(A), the predictive performance (r) for each voxel of one subject is mapped back onto the cortical surface, applying a threshold of p < 0.05. Our model demonstrated superior performance, achieving the highest encoding accuracy (r) of 0.889. This analysis not only highlighted robust predictive capabilities in the primary visual cortex but also demonstrated effective generalization across other visual areas.
Figures 2(B) and 2(C) display the distribution of voxel encoding performance within different ROIs across both hemispheres for subject 1 (sub1), as measured by noise-normalized performance. Predictive performance was generally comparable between the left and right hemispheres, with some ROIs reaching an average of up to 80% of the theoretical upper limit.
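For reference, a minimal sketch of the evaluation metrics is shown below: voxel-wise Pearson correlation between predicted and measured responses, and a noise-normalized score expressed as a percentage of a per-voxel noise ceiling. The exact normalization (here, squared r relative to a squared noise-ceiling correlation, e.g., from the NSD noise-ceiling estimates) is an assumption, not the abstract's stated definition.

```python
import numpy as np

def voxelwise_pearson(pred: np.ndarray, true: np.ndarray) -> np.ndarray:
    """Pearson r per voxel; pred and true are [n_trials, n_voxels]."""
    pred_z = (pred - pred.mean(0)) / pred.std(0)
    true_z = (true - true.mean(0)) / true.std(0)
    return (pred_z * true_z).mean(0)

def noise_normalized(r: np.ndarray, noise_ceiling_r: np.ndarray) -> np.ndarray:
    """Percent of the explainable (noise-ceiling) variance captured per voxel."""
    return 100.0 * (r ** 2) / (noise_ceiling_r ** 2)
```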
Conclusions:
This paper proposes a novel brain encoding model that leverages an MoE framework and joint training to enhance performance by facilitating information sharing across multiple ROIs. Our method offers several advantages. First, it allows for the customization of feature processing by employing ROI-specific routers and experts, enabling the model to adapt more effectively to the unique characteristics of different brain areas. Second, by jointly training across multiple ROIs, our model can leverage shared information, potentially reducing overfitting and improving generalization across similar visual tasks. We validated our model using the NSD and demonstrated that our method outperforms traditional single-ROI training approaches.
Modeling and Analysis Methods:
Activation (eg. BOLD task-fMRI)
Classification and Predictive Modeling 1
Perception, Attention and Motor Behavior:
Perception: Visual 2
Keywords:
Other - Brain encoding, fMRI
1|2 Indicates the priority used for review
By submitting your proposal, you grant permission for the Organization for Human Brain Mapping (OHBM) to distribute your work in any format, including video, audio, print and electronic text through OHBM OnDemand, social media channels, the OHBM website, or other electronic publications and media.
I accept
The Open Science Special Interest Group (OSSIG) is introducing a reproducibility challenge for OHBM 2025. This new initiative aims to enhance the reproducibility of scientific results and foster collaborations between labs. Teams will consist of a “source” party and a “reproducing” party, and will be evaluated on the success of their replication, the openness of the source work, and additional deliverables. Click here for more information.
Propose your OHBM abstract(s) as source work for future OHBM meetings by selecting one of the following options:
I do not want to participate in the reproducibility challenge.
Please indicate below if your study was a "resting state" or "task-activation" study.
Task-activation
Healthy subjects only or patients (note that patient studies may also involve healthy subjects):
Healthy subjects
Was this research conducted in the United States?
No
Was any human subjects research approved by the relevant Institutional Review Board or ethics panel?
NOTE: Any human subjects studies without IRB approval will be automatically rejected.
Yes
Was any animal research approved by the relevant IACUC or other animal research panel?
NOTE: Any animal studies without IACUC approval will be automatically rejected.
Not applicable
Please indicate which methods were used in your research:
Functional MRI
Computational modeling
For human MRI, what field strength scanner do you use?
7T
Which processing packages did you use for your study?
FreeSurfer
Provide references using APA citation style.
Allen, E. J., St-Yves, G., Wu, Y., Breedlove, J. L., Prince, J. S., Dowdle, L. T., Nau, M., Caron, B., Pestilli, F., & Charest, I. (2022). A massive 7T fMRI dataset to bridge cognitive neuroscience and artificial intelligence. Nature Neuroscience, 25(1), 116–126.
Prince, J. S., Charest, I., Kurzawski, J. W., Pyles, J. A., Tarr, M. J., & Kay, K. N. (2022). Improving the accuracy of single-trial fMRI response estimates using GLMsingle. eLife, 11, e77599. https://doi.org/10.7554/eLife.77599
Radford, A., Kim, J. W., Hallacy, C., Ramesh, A., Goh, G., Agarwal, S., Sastry, G., Askell, A., Mishkin, P., Clark, J., Krueger, G., & Sutskever, I. (2021). Learning Transferable Visual Models From Natural Language Supervision (No. arXiv:2103.00020). arXiv. https://doi.org/10.48550/arXiv.2103.00020
No