Mapping individual and shared cortical language representations during real-time natural dialogues

Presented During:

Tuesday, June 25, 2024: 12:00 PM - 1:15 PM
COEX  
Room: Grand Ballroom 101-102  

Poster No:

1052 

Submission Type:

Abstract Submission 

Authors:

Zaid Zada1, Samuel Nastase1, Sebastian Speer1, Laetitia Mwilambwe-Tshilobo1, Lily Tsoi2, Shannon Burns3, Uri Hasson1, Diana Tamir1

Institutions:

1Princeton University, Princeton, NJ, 2Caldwell University, Caldwell, NJ, 3Pomona College, Claremont, CA

First Author:

Zaid Zada  
Princeton University
Princeton, NJ

Co-Author(s):

Samuel Nastase  
Princeton University
Princeton, NJ
Sebastian Speer  
Princeton University
Princeton, NJ
Laetitia Mwilambwe-Tshilobo  
Princeton University
Princeton, NJ
Lily Tsoi  
Caldwell University
Caldwell, NJ
Shannon Burns  
Pomona College
Claremont, CA
Uri Hasson  
Princeton University
Princeton, NJ
Diana Tamir  
Princeton University
Princeton, NJ

Introduction:

How is language encoded in the brain during everyday conversations, and how is that linguistic encoding shared across interlocutors? Typical studies of the neural basis of language present subjects with predetermined, isolated words or sentences (Price, 2010), and consider neither spontaneous language production nor linguistic neural coupling between speakers (Garrod & Pickering, 2004). Here, we aim to address both gaps by mapping the brain areas involved in speech production and comprehension during natural dialogue.

Methods:

We developed a hyperscanning paradigm to collect simultaneous fMRI data from 30 dyads (60 subjects, 41F) as they freely discussed 10 topics across 5 runs (Fig 1A) (Speer et al., 2023). Topics were presented as a starting point, but each dyad was free to pursue the discussion in its own way. Our goal was to characterize the linguistic content encoded in the brain within subjects. To that end, we estimated voxel-wise encoding models to predict held-out BOLD signals during speech production or comprehension from 6 feature spaces: nuisance task structure regressors, mel-spectral features, phonetic articulatory features, head motion parameters, and word embeddings extracted from the GPT-2 language model (Fig 1B) (la Tour et al., 2022). To account for turn-taking during natural conversations, we split all regressors into separate sets for speaking and listening, and fit both submodels jointly using banded ridge regression; this allows the model to learn different weights for each process, and allows us to quantify the relative contribution of each submodel. We then correlated the actual BOLD activity in each voxel for left-out runs with the activity predicted from the language model word embeddings alone, quantifying the extent of linguistic content in the signal separately for production and comprehension time points (Fig 1C).
Supporting Image: fig1.png
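
The turn-split encoding analysis described in the Methods can be illustrated with a minimal sketch on synthetic data. Everything in it is an assumption made for illustration rather than the analysis code used in the study: the function names (split_by_turn, fit_banded_ridge, pearson_per_voxel), the fixed per-band penalties (which in practice would be selected by cross-validation), the random stand-in "embeddings", and the omission of hemodynamic delays and of the other feature spaces.

```python
import numpy as np

def split_by_turn(features, is_speaking):
    """Copy a (time, dim) feature matrix into speaking and listening versions,
    zeroing each copy outside its own turns."""
    mask = is_speaking.astype(float)[:, None]
    return features * mask, features * (1.0 - mask)

def fit_banded_ridge(bands, bold, alphas):
    """Closed-form ridge with a separate penalty for each band (feature space).
    Returns one weight matrix per band, each of shape (band_dim, n_voxels)."""
    X = np.hstack(bands)
    penalty = np.concatenate(
        [np.full(b.shape[1], a) for b, a in zip(bands, alphas)]
    )
    weights = np.linalg.solve(X.T @ X + np.diag(penalty), X.T @ bold)
    edges = np.cumsum([b.shape[1] for b in bands])[:-1]
    return np.split(weights, edges)

def pearson_per_voxel(a, b):
    """Column-wise Pearson correlation between two (time, n_voxels) arrays."""
    a = (a - a.mean(axis=0)) / a.std(axis=0)
    b = (b - b.mean(axis=0)) / b.std(axis=0)
    return (a * b).mean(axis=0)

# Synthetic stand-in data (hypothetical sizes, not the study's dimensions).
rng = np.random.default_rng(0)
n_trs, n_train, emb_dim, n_voxels = 500, 400, 8, 5
embeddings = rng.standard_normal((n_trs, emb_dim))
nuisance = rng.standard_normal((n_trs, 3))
is_speaking = (np.arange(n_trs) // 20) % 2 == 0  # alternating 20-TR turns

prod, comp = split_by_turn(embeddings, is_speaking)
bold = (
    prod @ rng.standard_normal((emb_dim, n_voxels))    # production weights
    + comp @ rng.standard_normal((emb_dim, n_voxels))  # comprehension weights
    + 0.3 * nuisance @ rng.standard_normal((3, n_voxels))
    + 0.5 * rng.standard_normal((n_trs, n_voxels))     # measurement noise
)

# Fit all bands jointly on the training segment.
train, test = slice(0, n_train), slice(n_train, n_trs)
bands = [prod, comp, nuisance]
w_prod, w_comp, w_nuis = fit_banded_ridge(
    [b[train] for b in bands], bold[train], alphas=[1.0, 1.0, 10.0]
)

# Score each embedding band alone, on its own held-out time points.
speak_test = is_speaking[test]
listen_test = ~speak_test
r_prod = pearson_per_voxel(prod[test][speak_test] @ w_prod,
                           bold[test][speak_test])
r_comp = pearson_per_voxel(comp[test][listen_test] @ w_comp,
                           bold[test][listen_test])
```

Because the nuisance band is fit jointly but left out of the held-out prediction, r_prod and r_comp reflect only what the embedding bands explain, which is the sense in which the other variance is "accounted for".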
 

Results:

We found strong encoding performance bilaterally throughout the language network (superior temporal cortex, middle frontal gyrus, inferior frontal gyrus, and angular gyrus) as well as in somatomotor and precuneus areas. When evaluating the feature sets separately, we found that the task structure, spectral, and articulation features all recruited the somatomotor cortex during speech production, while speech comprehension more strongly recruited the superior temporal cortex (Fig 2C). Head motion features predicted a typical halo in the superior area. With that variance accounted for, the language model embeddings best predicted activity in posterior temporal cortex and middle frontal regions (Fig 1A). Surprisingly, we found better overall accuracy in the right hemisphere than in the left, and the right angular gyrus was heavily recruited for production. The superior temporal region showed a ventral accuracy gradient running from comprehension only (orange), to joint processing (white), to production only (blue).

To test for linguistic coupling within dyads, we developed a model-based coupling method in which we correlated the production (or comprehension) model predictions of one subject with the comprehension (or production) responses of their partner. We found that model-based predictions generalized between subjects in several hubs across the language network, from early auditory cortex, to temporal regions, to precuneus (Fig 2B).
Supporting Image: fig2.png
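
The model-based coupling idea can be sketched as follows, again on synthetic data. This is a hedged illustration, not the study's implementation: the function name model_based_coupling, the "fitted" speaker weights (simulated here as true weights plus noise), and the assumption that both partners' data share a voxel-to-voxel correspondence in a common space are all introduced for the example.

```python
import numpy as np

def pearson_per_voxel(a, b):
    """Column-wise Pearson correlation between two (time, n_voxels) arrays."""
    a = (a - a.mean(axis=0)) / a.std(axis=0)
    b = (b - b.mean(axis=0)) / b.std(axis=0)
    return (a * b).mean(axis=0)

def model_based_coupling(speaker_weights, features, listener_bold, speaker_turns):
    """Correlate the speaker's production-model predictions with the
    listener's actual responses, restricted to the speaker's turns."""
    predicted = features[speaker_turns] @ speaker_weights
    return pearson_per_voxel(predicted, listener_bold[speaker_turns])

# Synthetic stand-in data (hypothetical sizes).
rng = np.random.default_rng(1)
n_trs, emb_dim, n_voxels = 300, 6, 4
features = rng.standard_normal((n_trs, emb_dim))
a_is_speaking = (np.arange(n_trs) // 15) % 2 == 0  # alternating 15-TR turns

# Assume a linguistic code shared by both partners.
shared_weights = rng.standard_normal((emb_dim, n_voxels))

# Stand-in for subject A's fitted production weights (true weights plus error).
weights_a = shared_weights + 0.1 * rng.standard_normal((emb_dim, n_voxels))

# Subject B's responses: driven by the shared code only while A is speaking.
bold_b = 0.5 * rng.standard_normal((n_trs, n_voxels))
bold_b[a_is_speaking] += features[a_is_speaking] @ shared_weights

coupling = model_based_coupling(weights_a, features, bold_b, a_is_speaking)
```

Swapping the roles (B's comprehension-model predictions against A's responses while A speaks) gives the reverse direction of the same measure.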
 

Conclusions:

Our findings lay the foundation for assessing model-based, brain-to-brain coupling between speakers and listeners. We showed that cortical language representations during interactive natural dialogues can be predicted by language model embeddings, revealing both shared and selective encoding for speaking and listening. While this concurs with existing literature for language comprehension (Caucheteux & King, 2022; Schrimpf et al., 2021), we also show this in production during dialogue (Goldstein et al., 2023; Yamashita et al., 2023). We further extend the production–perception relationship by formally modeling the shared speaker–listener alignment on linguistic features (Zada et al., 2023).

Emotion, Motivation and Social Neuroscience:

Social Interaction

Language:

Language Comprehension and Semantics 2
Speech Perception
Speech Production 1

Modeling and Analysis Methods:

Classification and Predictive Modeling

Keywords:

Computational Neuroscience
FUNCTIONAL MRI
Language
Perception
Social Interactions

1|2 Indicates the priority used for review

References:

Caucheteux, C., & King, J.-R. (2022). Brains and algorithms partially converge in natural language processing. Communications Biology, 5(1), 134.
Garrod, S., & Pickering, M. J. (2004). Why is conversation so easy? Trends in Cognitive Sciences, 8(1), 8–11.
Goldstein, A., Wang, H., Niekerken, L., Zada, Z., Aubrey, B., Sheffer, T., Nastase, S. A., Gazula, H., Schain, M., Singh, A., Rao, A., Choe, G., Kim, C., Doyle, W., Friedman, D., Devore, S., Dugan, P., Hassidim, A., Brenner, M., … Hasson, U. (2023). Deep speech-to-text models capture the neural basis of spontaneous speech in everyday conversations. In bioRxiv. https://doi.org/10.1101/2023.06.26.546557
la Tour, T. D., Eickenberg, M., Nunez-Elizalde, A. O., & Gallant, J. L. (2022). Feature-space selection with banded ridge regression. In bioRxiv. https://doi.org/10.1101/2022.05.05.490831
Price, C. J. (2010). The anatomy of language: a review of 100 fMRI studies published in 2009. Annals of the New York Academy of Sciences, 1191, 62–88.
Schrimpf, M., Blank, I. A., Tuckute, G., Kauf, C., Hosseini, E. A., Kanwisher, N., Tenenbaum, J. B., & Fedorenko, E. (2021). The neural architecture of language: Integrative modeling converges on predictive processing. Proceedings of the National Academy of Sciences of the United States of America, 118(45). https://doi.org/10.1073/pnas.2105646118
Speer, S., Mwilambwe-Tshilobo, L., Tsoi, L., Burns, S., Falk, E. B., & Tamir, D. (2023). What makes a good conversation? fMRI-hyperscanning shows friends explore and strangers converge. In PsyArXiv. https://doi.org/10.31234/osf.io/jfvpx
Yamashita, M., Kubo, R., & Nishimoto, S. (2023). Cortical representations of languages during natural dialogue. In bioRxiv. https://doi.org/10.1101/2023.08.21.553821
Zada, Z., Goldstein, A., Michelmann, S., Simony, E., Price, A., Hasenfratz, L., Barham, E., Zadbood, A., Doyle, W., Friedman, D., Dugan, P., Melloni, L., Devore, S., Flinker, A., Devinsky, O., Nastase, S. A., & Hasson, U. (2023). A shared linguistic space for transmitting our thoughts from brain to brain in natural conversations. bioRxiv : The Preprint Server for Biology. https://doi.org/10.1101/2023.06.27.546708