Transformer reveals visual feature dependence in autism using functional magnetic resonance imaging

Poster No:

276 

Submission Type:

Abstract Submission 

Authors:

Jong-eun Lee1, Hyunjin Park1

Institutions:

1Sungkyunkwan University, Suwon-si, Gyeonggi-do

First Author:

Jong-eun Lee  
Sungkyunkwan University
Suwon-si, Gyeonggi-do

Co-Author:

Hyunjin Park  
Sungkyunkwan University
Suwon-si, Gyeonggi-do

Introduction:

Individuals with autism spectrum disorder (ASD) perceive the world differently from neurotypical individuals [1]. Two key theories, Weak Central Coherence [2] and Enhanced Perceptual Functioning [3], attribute this difference to detail-oriented sensory processing in ASD. However, how atypical sensory perception disrupts the brain's hierarchical visual processing in ASD remains unclear. We developed a novel framework combining movie-watching fMRI with a transformer model to investigate this process. By examining brain responses predicted from different transformer layers, we aimed to identify abnormalities in hierarchical visual processing in ASD.

Methods:

We analyzed fMRI data (TR: 0.8 s) from 202 participants, 79 typically developing (TD) and 123 with ASD, collected during a movie-watching experiment [4]. Participants viewed a 10-minute clip from Despicable Me. Data were preprocessed with fMRIPrep 20.1.1 and the Ciftify toolbox [5,6] and parcellated into 360 parcels using a multimodal parcellation atlas [7]. We used the pretrained BridgeTower model [8], a vision-language transformer trained on image-caption pairs: its early layers process language and vision tokens separately, while its later layers integrate the two modalities. We focused our analysis on the cross-modal layers (7-12). Parcel-wise encoding models mapped stimulus features to brain responses. Movie features were resampled to match the fMRI TR, and a finite impulse response model with four delays (2.4, 4.0, 5.6, and 7.2 s) was applied [9]. The mapping was fit with L2-regularized linear regression, with the regularization parameter selected through cross-validation. Encoding performance was evaluated on held-out movie clips as the Pearson correlation between predicted and actual brain responses. Finally, we compared encoding performance across layers using a linear model controlling for age, sex, site, and head motion, which allowed us to assess how the ASD and TD groups encode features across BridgeTower layers.
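A minimal sketch of the parcel-wise encoding step in Python, assuming the layer features have already been extracted from BridgeTower (e.g., via its Hugging Face transformers implementation) and resampled to the TR; the array shapes, alpha grid, and toy data below are illustrative assumptions, not the authors' exact settings:

```python
import numpy as np
from scipy.stats import pearsonr
from sklearn.linear_model import RidgeCV

TR = 0.8                 # repetition time in seconds
DELAYS = [3, 5, 7, 9]    # 2.4, 4.0, 5.6, 7.2 s expressed in TRs

def make_fir(X, delays=DELAYS):
    """Stack time-shifted copies of the features (finite impulse response)."""
    n_t, n_f = X.shape
    out = np.zeros((n_t, n_f * len(delays)))
    for i, d in enumerate(delays):
        out[d:, i * n_f:(i + 1) * n_f] = X[:n_t - d]
    return out

def fit_encoding_model(X_train, Y_train, X_test, Y_test):
    """Ridge regression from delayed features to parcel responses;
    returns one Pearson r per parcel on the held-out clip."""
    model = RidgeCV(alphas=np.logspace(0, 4, 10))  # L2 penalty chosen by CV
    model.fit(make_fir(X_train), Y_train)
    Y_hat = model.predict(make_fir(X_test))
    return np.array([pearsonr(Y_hat[:, p], Y_test[:, p])[0]
                     for p in range(Y_test.shape[1])])

# Toy usage with random stand-ins for layer features and parcel time series
rng = np.random.default_rng(0)
X_tr, Y_tr = rng.normal(size=(600, 50)), rng.normal(size=(600, 360))
X_te, Y_te = rng.normal(size=(150, 50)), rng.normal(size=(150, 360))
r_per_parcel = fit_encoding_model(X_tr, Y_tr, X_te, Y_te)
```

Running this once per cross-modal layer yields the layer-by-parcel performance maps compared in the Results.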

Results:

Fig. 1A shows how whole-brain model performance shifts across BridgeTower layers. Mapping the best-performing layer for each parcel revealed that early layers correspond to low-level sensory regions (e.g., primary visual and motor areas), while later layers align with higher-order regions (e.g., auditory association cortex, posterior cingulate) (Fig. 1B). Significant layer-wise performance differences (p < 0.01, FDR-corrected) were observed in auditory association (F = 9.33), primary visual (F = 4.30), and orbital and polar frontal regions (F = 4.96) (Fig. 1C-D): performance increased across later BridgeTower layers in auditory association areas but decreased in primary visual areas. Group comparisons revealed specific parcels where ASD participants outperformed TD individuals (Fig. 2A). In the MT+ complex and neighboring visual regions, ASD individuals showed superior performance across all layers, whereas in somatosensory and motor regions they outperformed only in later layers. At the section level, ASD participants showed higher performance in higher-order regions such as the temporoparietal-occipital junction (TPOJ), posterior cingulate cortex, and MT+ complex (Fig. 2B), though this difference was not statistically significant in later layers.
Supporting Image: Figure1.png
Supporting Image: Figure2.png
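For illustration, a hedged sketch of the covariate-controlled layer comparison described in Methods, as it might be run for a single parcel; the long-format table, its column names, and the synthetic values are hypothetical stand-ins, and type-II F-tests are one plausible reading of the reported F statistics:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm

rng = np.random.default_rng(0)
n = 202  # 79 TD + 123 ASD

# One row per participant per cross-modal layer (7-12) for a single parcel;
# 'r' would hold the encoding accuracies from the fitted models.
scores = pd.DataFrame({
    "group": np.repeat(rng.choice(["ASD", "TD"], n), 6),
    "layer": np.tile(np.arange(7, 13), n),
    "age":   np.repeat(rng.uniform(6, 21, n), 6),
    "sex":   np.repeat(rng.choice(["F", "M"], n), 6),
    "site":  np.repeat(rng.choice(["A", "B"], n), 6),
    "fd":    np.repeat(rng.uniform(0.05, 0.5, n), 6),  # head motion proxy
    "r":     rng.normal(0.1, 0.05, n * 6),
})

# Linear model of encoding accuracy with group and layer effects,
# controlling for age, sex, site, and head motion.
fit = smf.ols("r ~ C(group) * C(layer) + age + C(sex) + C(site) + fd",
              data=scores).fit()
print(anova_lm(fit, typ=2))  # F-tests for layer, group, and their interaction
```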
 

Conclusions:

This study offers new insights into visual processing mechanisms in ASD by integrating movie-watching fMRI with a multimodal transformer model. While TD individuals exhibited typical hierarchical encoding performance, ASD participants showed enhanced performance in specific brain regions, particularly the MT+ complex, suggesting increased sensitivity to visual features during sensory processing in ASD.

This study was supported by the National Research Foundation (RS-2024-00408040), the Institute for Basic Science (IBS-R015-D1), the AI Graduate School Support Program at Sungkyunkwan University (RS-2019-II190421), the ICT Creative Consilience program (RS-2020-II201821), and the Artificial Intelligence Innovation Hub program (RS-2021-II212068).

Disorders of the Nervous System:

Neurodevelopmental/ Early Life (eg. ADHD, autism) 1

Modeling and Analysis Methods:

Activation (eg. BOLD task-fMRI)

Novel Imaging Acquisition Methods:

BOLD fMRI

Perception, Attention and Motor Behavior:

Perception: Visual 2

Keywords:

Autism
Computational Neuroscience
Modeling

1|2 Indicates the priority used for review

Abstract Information

By submitting your proposal, you grant permission for the Organization for Human Brain Mapping (OHBM) to distribute your work in any format, including video, audio, print, and electronic text through OHBM OnDemand, social media channels, the OHBM website, or other electronic publications and media.

I accept

The Open Science Special Interest Group (OSSIG) is introducing a reproducibility challenge for OHBM 2025. This new initiative aims to enhance the reproducibility of scientific results and foster collaborations between labs. Teams will consist of a “source” party and a “reproducing” party, and will be evaluated on the success of their replication, the openness of the source work, and additional deliverables. Propose your OHBM abstract(s) as source work for future OHBM meetings by selecting one of the following options:

I do not want to participate in the reproducibility challenge.

Please indicate below if your study was a "resting state" or "task-activation" study.

Task-activation

Healthy subjects only or patients (note that patient studies may also involve healthy subjects):

Patients

Was this research conducted in the United States?

No

Was any human subjects research approved by the relevant Institutional Review Board or ethics panel? NOTE: Any human subjects studies without IRB approval will be automatically rejected.

Yes

Was any animal research approved by the relevant IACUC or other animal research panel? NOTE: Any animal studies without IACUC approval will be automatically rejected.

Not applicable

Please indicate which methods were used in your research:

Functional MRI
Computational modeling

For human MRI, what field strength scanner do you use?

3.0T

Which processing packages did you use for your study?

AFNI
SPM
FSL
FreeSurfer

Provide references using APA citation style.

[1] Robertson, C. E., & Baron-Cohen, S. (2017). Sensory perception in autism. Nature Reviews Neuroscience, 18(11), 671-684.
[2] Happé, F., & Frith, U. (2006). The weak coherence account: Detail-focused cognitive style in autism spectrum disorders. Journal of Autism and Developmental Disorders, 36(1), 5-25.
[3] Mottron, L., Dawson, M., Soulières, I., Hubert, B., & Burack, J. (2006). Enhanced perceptual functioning in autism: An update, and eight principles of autistic perception. Journal of Autism and Developmental Disorders, 36(1), 27-43.
[4] Alexander, L. M., Escalera, J., Ai, L., Andreotti, C., Febre, K., Mangone, A., ... & Milham, M. P. (2017). An open resource for transdiagnostic research in pediatric mental health and learning disorders. Scientific Data, 4(1), 1-26.
[5] Esteban, O., Markiewicz, C. J., Blair, R. W., Moodie, C. A., Isik, A. I., Erramuzpe, A., ... & Gorgolewski, K. J. (2019). fMRIPrep: A robust preprocessing pipeline for functional MRI. Nature Methods, 16(1), 111-116.
[6] Dickie, E. W., Anticevic, A., Smith, D. E., Coalson, T. S., Manogaran, M., Calarco, N., ... & Voineskos, A. N. (2019). Ciftify: A framework for surface-based analysis of legacy MR acquisitions. NeuroImage, 197, 818-826.
[7] Glasser, M. F., Coalson, T. S., Robinson, E. C., Hacker, C. D., Harwell, J., Yacoub, E., ... & Van Essen, D. C. (2016). A multi-modal parcellation of human cerebral cortex. Nature, 536(7615), 171-178.
[8] Xu, X., Wu, C., Rosenman, S., Lal, V., Che, W., & Duan, N. (2023). BridgeTower: Building bridges between encoders in vision-language representation learning. Proceedings of the AAAI Conference on Artificial Intelligence, 37(9), 10637-10647.
[9] Huth, A. G., Nishimoto, S., Vu, A. T., & Gallant, J. L. (2012). A continuous semantic space describes the representation of thousands of object and action categories across the human brain. Neuron, 76(6), 1210-1224.

UNESCO Institute of Statistics and World Bank Waiver Form

I attest that I currently live, work, or study in a country on the UNESCO Institute of Statistics and World Bank List of Low and Middle Income Countries list provided.

No