Semantic Prediction and Error in Visual Cortex: Insights from fMRI Vision-Language Data

Poster No:

2066 

Submission Type:

Abstract Submission 

Authors:

Shi Gu1, Shurui Li2, Ruyuan Zhang3, Yuanning Li2

Institutions:

1University of Electronic Science and Technology of China, Chengdu, Sichuan, 2ShanghaiTech University, Shanghai, Shanghai, 3Shanghai Jiao Tong University, Shanghai, Shanghai

First Author:

Shi Gu  
University of Electronic Science and Technology of China
Chengdu, Sichuan

Co-Author(s):

Shurui Li  
ShanghaiTech University
Shanghai, Shanghai
Ruyuan Zhang  
Shanghai Jiao Tong University
Shanghai, Shanghai
Yuanning Li  
ShanghaiTech University
Shanghai, Shanghai

Introduction:

Large-scale fMRI datasets with naturalistic stimuli offer ecologically relevant conditions for studying sensory perception and enable the application of advanced AI models to explore the neural coding of language and visual information. Despite these advancements, most research has focused on isolated visual or language networks, with limited attention to the interaction between vision and language at the semantic level. This study addresses this gap by investigating the neural mechanisms of semantic matching across modalities. Specifically, we employ a paired caption-image semantic matching task to explore how the visual cortex encodes semantic expectations and prediction errors.

Methods:

We collected 320 hours of 3T fMRI data from 8 subjects. Each participant viewed over 4,400 caption-image pairs; each trial presented a text caption followed by a naturalistic image, and participants judged whether the caption and image matched semantically. To investigate the underlying neural coding mechanisms, we built voxel-wise neural encoding models using features from large vision and language models, and used them to characterize the visual cortex's role in semantic expectation and prediction error.

Results:

Our findings reveal that the early visual cortex exhibits reduced activity when presented with semantically expected images compared to unexpected ones. Language model features predicted early visual cortex responses after participants viewed a text caption, indicating that the visual cortex generates semantic expectations from linguistic input. Furthermore, neural activity from V1 to V3 encodes prediction mismatches, transitioning from low-level to high-level prediction errors. The degree of response amplitude reduction correlated with the neural coding of high-level prediction errors, highlighting a gradient of predictive coding in the visual hierarchy.
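The reported link between expectation suppression and prediction-error coding amounts to a voxel-wise correlation between two quantities. A minimal sketch, with entirely synthetic values standing in for the real per-voxel measures:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical per-voxel quantities illustrating the reported analysis:
# pe_score    - how well a voxel encodes prediction error (e.g., the
#               cross-validated r of an error-feature encoding model);
# suppression - response reduction for expected vs. unexpected images.
# Values are synthetic and correlated by construction, for demonstration.
n_voxels = 500
pe_score = rng.uniform(0.0, 0.6, n_voxels)
suppression = 0.5 * pe_score + rng.normal(0.0, 0.1, n_voxels)

# Pearson correlation between expectation suppression and
# prediction-error coding across voxels.
r = np.corrcoef(suppression, pe_score)[0, 1]
print(f"r = {r:.2f}")
```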
Supporting Image: WX20241211-0720392x.png
   ·Neural mechanism: prediction errors

Conclusions:

This study provides evidence of cross-modal semantic expectation and prediction-error coding in the visual cortex using a large fMRI vision-language dataset. Our findings highlight the visual cortex's role in integrating semantic information across modalities and advance our understanding of predictive coding mechanisms in neural systems.

Novel Imaging Acquisition Methods:

BOLD fMRI 2

Perception, Attention and Motor Behavior:

Perception: Visual 1

Keywords:

FUNCTIONAL MRI
Language
Machine Learning
Vision

1|2Indicates the priority used for review

Abstract Information

By submitting your proposal, you grant permission for the Organization for Human Brain Mapping (OHBM) to distribute your work in any format, including video, audio print and electronic text through OHBM OnDemand, social media channels, the OHBM website, or other electronic publications and media.

I accept

The Open Science Special Interest Group (OSSIG) is introducing a reproducibility challenge for OHBM 2025. This new initiative aims to enhance the reproducibility of scientific results and foster collaborations between labs. Teams will consist of a “source” party and a “reproducing” party, and will be evaluated on the success of their replication, the openness of the source work, and additional deliverables. Propose your OHBM abstract(s) as source work for future OHBM meetings by selecting one of the following options:

I am submitting this abstract as an original work to be reproduced. I am available to be the “source party” in an upcoming team and consent to have this work listed on the OSSIG website. I agree to be contacted by OSSIG regarding the challenge and may share data used in this abstract with another team.

Please indicate below if your study was a "resting state" or "task-activation” study.

Task-activation

Healthy subjects only or patients (note that patient studies may also involve healthy subjects):

Healthy subjects

Was this research conducted in the United States?

No

Were any human subjects research approved by the relevant Institutional Review Board or ethics panel? NOTE: Any human subjects studies without IRB approval will be automatically rejected.

Yes

Were any animal research approved by the relevant IACUC or other animal research panel? NOTE: Any animal studies without IACUC approval will be automatically rejected.

Not applicable

Please indicate which methods were used in your research:

Functional MRI
Computational modeling

For human MRI, what field strength scanner do you use?

3.0T

Provide references using APA citation style.

Not applicable

UNESCO Institute of Statistics and World Bank Waiver Form

I attest that I currently live, work, or study in a country on the UNESCO Institute of Statistics and World Bank List of Low and Middle Income Countries list provided.

No