Semantic classification guided by acoustic latent space in Stereo-EEG speech decoding

Poster No:

1156 

Submission Type:

Abstract Submission 

Authors:

Yunying Wu1, Feiyan Chen1, Weidong Chen1, Shaomin Zhang1

Institutions:

1Zhejiang University, Hangzhou, China

First Author:

Yunying Wu  
Zhejiang University
Hangzhou, China

Co-Author(s):

Feiyan Chen  
Zhejiang University
Hangzhou, China
Weidong Chen  
Zhejiang University
Hangzhou, China
Shaomin Zhang  
Zhejiang University
Hangzhou, China

Introduction:

The speech brain-computer interface (BCI) is a neural speech prosthetic that decodes brain signals into speech. There are two approaches to speech decoding: continuous decoding, which maps brain signals to continuous acoustic features, and discrete decoding, which maps brain signals to categories representing discrete syllables or words.
Because of limited data and the relatively low dimensionality of its targets, direct discrete decoding cannot fully exploit the relationships among features in the high-dimensional acoustic feature space or their temporal correlations, resulting in poor performance. We assume that the latent feature spaces of discrete and continuous decoding are consistent. Discrete decoding can therefore be carried out on the latent space extracted by a prior continuous decoder, improving decoding performance.
Comparing our method with direct discrete decoding on the sEEG data, the results showed that it improved classification accuracy (by 6.13 percentage points for sub01 and 9.44 percentage points for sub02), and the difference in prediction probabilities on trials correctly predicted by both methods was statistically significant (p < 0.001). This indicates that using high-dimensional targets to extract acoustic features with mutual relations and temporal constraints improves classification performance.
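The paired comparison of prediction probabilities can be illustrated with a minimal sketch. It assumes per-trial softmax probabilities of the true class from both decoders are available as NumPy arrays; the abstract does not specify the statistical test, so a Wilcoxon signed-rank test is used here purely as an illustrative choice, and all names are hypothetical.

import numpy as np
from scipy.stats import wilcoxon

def compare_true_class_probability(p_direct, p_guided, correct_direct, correct_guided):
    """Paired comparison of the probability assigned to the true class,
    restricted to trials that both decoders predicted correctly.
    p_*: (n_trials,) softmax probability of the true class per trial.
    correct_*: (n_trials,) boolean arrays marking correct predictions."""
    both = correct_direct & correct_guided            # trials both methods got right
    stat, p_value = wilcoxon(p_guided[both], p_direct[both])  # illustrative test choice
    return p_value, float(p_guided[both].mean() - p_direct[both].mean())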

Methods:

The experiments were conducted on 2 native Chinese speakers who were implanted with sEEG electrodes as part of clinical therapy for epilepsy. The participants were asked to read 5 vowel letters ('a', 'o', 'i', 'e', 'u') and 10 Arabic numerals (1 (yi), 2 (er), 3 (san), 4 (si), 5 (wu), 6 (liu), 7 (qi), 8 (ba), 9 (jiu), 10 (shi)) shown on a laptop screen, each for 1.5 seconds. sEEG data were recorded at 512 Hz for the first subject and 2048 Hz for the second subject, and audio was recorded at 44,100 Hz.
The sEEG signal was preprocessed by (1) band-pass filtering at 70-170 Hz with notch filtering at 100 Hz and 150 Hz, and (2) downsampling to 100 Hz with mean smoothing over a 50 ms window. The acoustic targets of the audio signal, Mel-frequency cepstral coefficients (MFCC), fundamental frequency (f0), and aperiodicity (AP), were extracted with the pyworld toolkit.
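A minimal sketch of this preprocessing pipeline, assuming scipy for filtering and pyworld for the acoustic targets; filter orders, the smoothing implementation, the decimation step, and the number of cepstral coefficients are illustrative assumptions rather than the authors' exact settings.

import numpy as np
import pyworld as pw
from scipy.ndimage import uniform_filter1d
from scipy.signal import butter, filtfilt, iirnotch

def preprocess_seeg(x, fs, target_fs=100, smooth_ms=50):
    """Band-pass 70-170 Hz, notch 100/150 Hz, 50 ms mean smoothing, downsample to ~100 Hz.
    x: (channels, samples) sEEG array."""
    b, a = butter(4, [70, 170], btype='bandpass', fs=fs)
    x = filtfilt(b, a, x, axis=-1)
    for f_notch in (100, 150):                      # interference inside the 70-170 Hz band
        bn, an = iirnotch(f_notch, Q=30, fs=fs)
        x = filtfilt(bn, an, x, axis=-1)
    win = max(1, int(fs * smooth_ms / 1000))
    x = uniform_filter1d(x, size=win, axis=-1)      # moving-average smoothing
    return x[..., ::int(round(fs / target_fs))]     # simple decimation; exact resampling may differ

def acoustic_targets(wav, sr, n_cep=25):
    """Fundamental frequency, aperiodicity, and a cepstral coding of the spectral envelope."""
    wav = wav.astype(np.float64)
    f0, t = pw.harvest(wav, sr)
    sp = pw.cheaptrick(wav, f0, t, sr)              # spectral envelope
    ap = pw.d4c(wav, f0, t, sr)                     # aperiodicity
    cep = pw.code_spectral_envelope(sp, sr, n_cep)  # MFCC-like cepstral features
    return f0, ap, cep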
A 2D Conv + LSTM model was first trained to fit the acoustic targets (MFCC, f0, AP). A classifier (1D Conv) then mapped the features extracted by the prior model to word categories (Figure 1).
Supporting Image: fig1.png (Model)
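A compact PyTorch sketch of the two-stage model described above: a 2D Conv + LSTM network is first trained to regress the acoustic targets, and a 1D Conv classifier is then trained on the latent features of that pretrained encoder. Layer sizes, kernel sizes, and the acoustic-target dimensionality are illustrative assumptions, not the authors' exact configuration.

import torch
import torch.nn as nn

class ContinuousDecoder(nn.Module):
    """Stage 1: map sEEG (batch, 1, channels, time) to frame-wise acoustic targets."""
    def __init__(self, n_channels, n_acoustic, hidden=128):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=(3, 5), padding=(1, 2)), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=(3, 5), padding=(1, 2)), nn.ReLU())
        self.lstm = nn.LSTM(32 * n_channels, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_acoustic)        # MFCC + f0 + AP per frame

    def features(self, x):
        h = self.conv(x)                                 # (B, 32, C, T)
        h = h.permute(0, 3, 1, 2).flatten(2)             # (B, T, 32*C)
        h, _ = self.lstm(h)
        return h                                         # latent sequence (B, T, hidden)

    def forward(self, x):
        return self.head(self.features(x))

class WordClassifier(nn.Module):
    """Stage 2: 1D Conv over the latent sequence, then word-category logits."""
    def __init__(self, hidden=128, n_classes=15):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(hidden, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1))
        self.fc = nn.Linear(64, n_classes)

    def forward(self, feats):                            # feats: (B, T, hidden)
        h = self.conv(feats.transpose(1, 2)).squeeze(-1)
        return self.fc(h)

# Usage sketch: pretrain the regressor with an MSE loss on the acoustic targets,
# then train the classifier on the encoder's latent features (frozen or fine-tuned).
# decoder = ContinuousDecoder(n_channels=64, n_acoustic=28)   # 28 = 25 cepstra + f0 + AP (illustrative)
# clf = WordClassifier(n_classes=15)
# logits = clf(decoder.features(seeg_batch).detach())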

Results:

For each subject, a model was trained individually using 5-fold cross-validation, with 4 folds used for training and the remaining fold used for testing. All results from the test folds were collected for subsequent analysis. As shown in Table I, direct discrete decoding achieved a mean accuracy of 40.36% for sub01, much higher than the chance level of 6.67%. For sub02, direct discrete decoding reached a mean accuracy of 22.89%, also above chance.
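A sketch of this evaluation protocol under stated assumptions: sklearn's StratifiedKFold is used here although the abstract does not state whether folds were stratified, and train_and_predict is a hypothetical placeholder standing in for the two-stage training described in the Methods.

import numpy as np
from sklearn.model_selection import StratifiedKFold

def cross_validated_accuracy(X, y, train_and_predict, n_classes=15, n_splits=5):
    """Collect test-fold predictions across 5-fold CV and report overall accuracy."""
    skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=0)
    y_pred = np.empty_like(y)
    for train_idx, test_idx in skf.split(X, y):
        # train on 4 folds, predict the held-out fold
        y_pred[test_idx] = train_and_predict(X[train_idx], y[train_idx], X[test_idx])
    accuracy = float((y_pred == y).mean())
    chance = 1.0 / n_classes                     # 15 classes -> 6.67%
    return accuracy, chance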
The guiding effect of continuous decoding was clear. Accuracy for sub01 reached 46.49%, up from 40.36%, and for sub02 it improved from 22.89% to 32.33% (Table I). In the confusion matrices, we found that for sub01, while maintaining the advantage on "u", "3", "7", and "9", the fitting-guided classification improved the accuracy of "a", "i", "o", "1 (yi)", "2 (er)", and "6 (liu)". For sub02, the classification accuracy of "a", "e", "i", "o", "1 (yi)", "2 (er)", "3 (san)", "4 (si)", "6 (liu)", and "9 (jiu)" was improved while that of "u", "4 (si)", and "8 (ba)" was maintained (Fig. 3 B, E).
Supporting Image: results.png (Results)

Conclusions:

In our experiments, discrete decoding improved in performance under the guidance of continuous decoding through its high-dimensional space mapping. There was a significant improvement not only in the number of correctly predicted trials but also in the prediction probability of correct trials.
We also noticed that direct discrete decoding had its own unique set of successfully predicted trials. It is therefore necessary to explore the feature spaces of continuous and discrete decoding in more depth, and to integrate the advantages of both to further improve classification performance.

Language:

Speech Production 2

Modeling and Analysis Methods:

Classification and Predictive Modeling 1

Novel Imaging Acquisition Methods:

Imaging Methods Other

Keywords:

Machine Learning
Other - Speech decoding, Discrete decoding, Continuous decoding, Transfer learning

1|2 Indicates the priority used for review

Abstract Information

By submitting your proposal, you grant permission for the Organization for Human Brain Mapping (OHBM) to distribute your work in any format, including video, audio print and electronic text through OHBM OnDemand, social media channels, the OHBM website, or other electronic publications and media.

I accept

The Open Science Special Interest Group (OSSIG) is introducing a reproducibility challenge for OHBM 2025. This new initiative aims to enhance the reproducibility of scientific results and foster collaborations between labs. Teams will consist of a “source” party and a “reproducing” party, and will be evaluated on the success of their replication, the openness of the source work, and additional deliverables. Click here for more information. Propose your OHBM abstract(s) as source work for future OHBM meetings by selecting one of the following options:

I am submitting this abstract as an original work to be reproduced. I am available to be the “source party” in an upcoming team and consent to have this work listed on the OSSIG website. I agree to be contacted by OSSIG regarding the challenge and may share data used in this abstract with another team.

Please indicate below if your study was a "resting state" or "task-activation" study.

Task-activation

Healthy subjects only or patients (note that patient studies may also involve healthy subjects):

Healthy subjects

Was this research conducted in the United States?

No

Were any human subjects research approved by the relevant Institutional Review Board or ethics panel? NOTE: Any human subjects studies without IRB approval will be automatically rejected.

Yes

Were any animal research approved by the relevant IACUC or other animal research panel? NOTE: Any animal studies without IACUC approval will be automatically rejected.

Not applicable

Please indicate which methods were used in your research:

Other, Please specify  -   sEEG

Which processing packages did you use for your study?

Other, Please list  -   Pytorch, pyworld

Provide references using APA citation style.

[1] Bocquelet, F., et al. (2016). Key considerations in designing a speech brain-computer interface. Journal of Physiology, Paris, 110(4 Pt A), 392-401. doi:10.1016/j.jphysparis.2017.07.002
[2] Anumanchipalli, G. K., et al. (2019). Speech synthesis from neural decoding of spoken sentences. Nature, 568(7753), 493-498. doi:10.1038/s41586-019-1119-1
[3] Verwoert, M., et al. (2022). Dataset of speech production in intracranial electroencephalography. Scientific Data, 9(1), 434. doi:10.1038/s41597-022-01542-9
[4] Schultz, T., et al. (2017). Biosignal-based spoken communication: A survey. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 25(12), 2257-2271.
[5] Angrick, M., et al. (2019). Speech synthesis from ECoG using densely connected 3D convolutional neural networks. Journal of Neural Engineering, 16(3), 036019. doi:10.1088/1741-2552/ab0c59
[6] Herff, C., et al. (2019). Generating natural, intelligible speech from brain activity in motor, premotor, and inferior frontal cortices. Frontiers in Neuroscience, 13, 1267. doi:10.3389/fnins.2019.01267

UNESCO Institute of Statistics and World Bank Waiver Form

I attest that I currently live, work, or study in a country on the UNESCO Institute of Statistics and World Bank List of Low and Middle Income Countries list provided.

No