Artificial contextual embeddings align better with non-native than native language processing

Poster No:

1349 

Submission Type:

Abstract Submission 

Authors:

Jianing Zhang1, Lang Qin2, Pan Liao3, Jia-hong Gao1

Institutions:

1Peking University, Beijing, DC, 2Peking University, Beijing, DC, 3Center for MRI Research, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing, DC

First Author:

Jianing Zhang  
Peking University
Beijing, DC

Co-Author(s):

Lang Qin  
Peking University
Beijing, DC
Pan Liao  
Center for MRI Research, Academy for Advanced Interdisciplinary Studies, Peking University
Beijing, DC
Jia-hong Gao  
Peking University
Beijing, DC

Introduction:

Recent advances in artificial intelligence have spurred interest in the parallels between large language models (LLMs) and human neural processing. Although previous research has demonstrated surprising similarities between LLM representations and human brain activity, these studies have focused only on native language processing. It remains unknown whether such human-machine parallels extend to non-native language processing, given the significant differences in the neural mechanisms underpinning the learning and use of native and non-native languages. This study addresses two questions: 1) Can the computational principles shared between LLMs and native language processing be generalized to align LLMs with non-native language processing? 2) Do LLMs comprehend language in a way more akin to native or to non-native speakers?

Methods:

We recruited 8 healthy right-handed participants (21.6 ± 0.96 years old, mean ± SE; 3 females) with no history of psychiatric or neurological disorders. All were native English speakers enrolled in a one-year Mandarin Chinese immersion program at Peking University; 6 were in advanced-level classes (HSK 5-6) and 2 were in intermediate-level classes (HSK 3-4). The experimental procedures were approved by the Peking University Institutional Review Board, and all participants provided written informed consent. Each participant listened to the same Chinese and English auditory materials in two separate magnetoencephalography (MEG) experiments. The stimuli were the Chinese and English versions of the same story from 'The Adventures of Sherlock Holmes'. To ensure that participants' initial Chinese proficiency was closely matched, all Chinese MEG scans were conducted within one and a half months of their arrival in China. Because the story content was the same in both languages, the English and Chinese scans were performed two months apart. Each scan lasted about one hour. We applied the method described in Goldstein et al. (2022) to our MEG data.
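As an illustration of the kind of analysis referred to above, the following is a minimal sketch of a lag-wise encoding model: contextual embeddings for each word are mapped linearly onto the neural signal at a given lag relative to word onset, and the fit is scored by cross-validated correlation. The abstract does not report the regression settings, so the ridge penalty, the 10-fold split, the function names, and the synthetic data are all assumptions made for illustration, not the authors' pipeline.

```python
import numpy as np


def fit_ridge(X, y, alpha=1.0):
    """Closed-form ridge weights; the caller is expected to centre X and y."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_features), X.T @ y)


def encoding_score(embeddings, neural, n_folds=10, alpha=1.0):
    """Cross-validated Pearson r between predicted and observed responses.

    embeddings: (n_words, n_dims) contextual embedding for each word
    neural:     (n_words,) signal sampled at one lag relative to word onset
    """
    n_words = embeddings.shape[0]
    folds = np.array_split(np.arange(n_words), n_folds)
    predicted = np.empty(n_words)
    for test_idx in folds:
        train_mask = np.ones(n_words, dtype=bool)
        train_mask[test_idx] = False
        X_train, y_train = embeddings[train_mask], neural[train_mask]
        # Centre with training statistics only, to avoid leaking test data.
        x_mean, y_mean = X_train.mean(axis=0), y_train.mean()
        w = fit_ridge(X_train - x_mean, y_train - y_mean, alpha)
        predicted[test_idx] = (embeddings[test_idx] - x_mean) @ w + y_mean
    return float(np.corrcoef(predicted, neural)[0, 1])


# Synthetic demonstration (hypothetical data, not the study's recordings):
# the embedding-driven signal is strongest at a lag of 200 ms.
rng = np.random.default_rng(0)
n_words, n_dims = 400, 8
embeddings = rng.standard_normal((n_words, n_dims))
true_w = np.ones(n_dims)
strength_by_lag = {-200: 0.0, 0: 0.5, 200: 1.0, 400: 0.5}

scores = {}
for lag_ms, strength in strength_by_lag.items():
    noise = rng.standard_normal(n_words)
    neural = strength * (embeddings @ true_w) + noise
    scores[lag_ms] = encoding_score(embeddings, neural)

best_lag = max(scores, key=scores.get)
print(scores, best_lag)
```

In a real analysis the same scoring would be repeated per sensor or source, per lag, and per model layer, which yields the layer-wise fits that are compared between languages.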

Results:

The brain regions and responses significantly associated with processing English as a native language and Chinese as a second language were largely consistent with previous literature. When participants processed English (their native language), activation in the left and right hemispheres peaked almost simultaneously. In contrast, when they processed Chinese (a non-native language), neural responses in both hemispheres were slower than for English, and their peak activations were significantly delayed. Compared with native language (English) processing, non-native language (Chinese) processing showed lower temporal synchronization between the left and right hemispheres. We used Qwen2.5-7B, an open-source multilingual LLM, to model the neural activity associated with both native and non-native language processing. For most layers of the LLM, the fit to neural activity during non-native language processing was higher than the fit to neural activity during native language processing. A paired t-test showed significant differences in fit between the two datasets for the later layers (p < 0.05).
Supporting Image: _20241218125246.png
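The layer-wise comparison above can be sketched as a paired t-test on matched encoding scores. The example below assumes the pairing is by participant (one score per participant and language for a given layer), which the abstract does not state explicitly; the score values are hypothetical and the helper name `paired_t` is introduced only for illustration. Rather than computing an exact p-value, it compares the statistic against the standard two-sided critical value for 7 degrees of freedom.

```python
import math
from statistics import mean, stdev


def paired_t(a, b):
    """Return (t, df) for a paired t-test on matched samples a and b."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two matched samples with at least 2 pairs")
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    # t = mean difference divided by its standard error
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t, n - 1


# Hypothetical encoding scores for one late layer, one value per participant.
non_native = [0.12, 0.15, 0.11, 0.14, 0.13, 0.16, 0.12, 0.15]
native = [0.10, 0.12, 0.10, 0.11, 0.12, 0.13, 0.11, 0.12]

t_stat, df = paired_t(non_native, native)

# Two-sided critical value of Student's t at alpha = 0.05 with df = 7.
T_CRIT_DF7 = 2.365
significant = abs(t_stat) > T_CRIT_DF7
print(f"t({df}) = {t_stat:.2f}, significant at 0.05: {significant}")
```

With only 8 participants the test has 7 degrees of freedom, so a fairly large and consistent difference is needed to exceed the critical value; repeating the test across many layers would also call for a multiple-comparison correction.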
 

Conclusions:

Our preliminary results indicate that non-native language processing is slower than native language processing, consistent with previous literature. Lower temporal synchronization between the left and right hemispheres was also found when processing a non-native language (Chinese). Importantly, contextual embeddings derived from an LLM captured the brain's representation of non-native (Chinese) words better than that of native (English) words in the same natural contexts.

Brain Stimulation:

Non-Invasive Stimulation Methods Other

Language:

Language Acquisition 2
Language Comprehension and Semantics

Modeling and Analysis Methods:

Activation (eg. BOLD task-fMRI)
EEG/MEG Modeling and Analysis 1

Keywords:

Acquisition
Cognition
Data analysis
fMRI CONTRAST MECHANISMS
Language
Learning
MEG
Modeling
MRI
Perception

1|2 Indicates the priority used for review

Abstract Information

Please indicate below if your study was a "resting state" or "task-activation” study.

Task-activation

Healthy subjects only or patients (note that patient studies may also involve healthy subjects):

Healthy subjects

Was this research conducted in the United States?

Yes

Are you Internal Review Board (IRB) certified? Please note: Failure to have IRB, if applicable will lead to automatic rejection of abstract.

Not applicable

Were any human subjects research approved by the relevant Institutional Review Board or ethics panel? NOTE: Any human subjects studies without IRB approval will be automatically rejected.

Yes

Were any animal research approved by the relevant IACUC or other animal research panel? NOTE: Any animal studies without IACUC approval will be automatically rejected.

Not applicable

Please indicate which methods were used in your research:

MEG

Provide references using APA citation style.

Caucheteux, C., Gramfort, A., & King, J.-R. (2023). Evidence of a predictive coding hierarchy in the human brain listening to speech. Nature Human Behaviour, 7(3), 430-441.
Mischler, G., et al. (2024). Contextual feature extraction hierarchies converge in large language models and the brain. Nature Machine Intelligence, 1-11.
Goldstein, A., et al. (2022). Shared computational principles for language processing in humans and deep language models. Nature Neuroscience, 25(3), 369-380.
Hahne, A. (2001). What's different in second-language processing? Evidence from event-related brain potentials. Journal of Psycholinguistic Research, 30, 251-266.
Perani, D., & Abutalebi, J. (2005). The neural basis of first and second language processing. Current Opinion in Neurobiology, 15(2), 202-206.
