Presented During:
Thursday, June 27, 2024: 11:30 AM - 12:45 PM
COEX
Room:
Grand Ballroom 104-105
Poster No:
2053
Submission Type:
Abstract Submission
Authors:
Giuseppe Gallitto1, Robert Englert2, Balint Kincses1, Raviteja Kotikalapudi1, Jialin Li1, Kevin Hoffschlag1, Ulrike Bingel1, Tamas Spisak2
Institutions:
1Department of Neurology, University Medicine Essen, Germany, Essen, NRW, 2Department of Diagnostic and Interventional Radiology and Neuroradiology, University Medicine Essen, Essen, NRW
First Author:
Giuseppe Gallitto
Department of Neurology, University Medicine Essen, Germany
Essen, NRW
Co-Author(s):
Robert Englert
Department of Diagnostic and Interventional Radiology and Neuroradiology, University Medicine Essen
Essen, NRW
Balint Kincses
Department of Neurology, University Medicine Essen, Germany
Essen, NRW
Jialin Li
Department of Neurology, University Medicine Essen, Germany
Essen, NRW
Kevin Hoffschlag
Department of Neurology, University Medicine Essen, Germany
Essen, NRW
Ulrike Bingel
Department of Neurology, University Medicine Essen, Germany
Essen, NRW
Tamas Spisak
Department of Diagnostic and Interventional Radiology and Neuroradiology, University Medicine Essen
Essen, NRW
Introduction:
In traditional human neuroimaging experiments, researchers design experimental paradigms with psychological/behavioral validity and infer the corresponding neural correlates. Here, we introduce a novel approach, Reinforcement Learning via Brain Feedback (RLBF), that inverts this direction of inference: it searches for the stimulation or paradigm that maximizes (or minimizes) the response in predefined brain regions or networks (fig. 1). The stimulation/paradigm is found by a reinforcement learning algorithm (Kaelbling et al., 1996) that is rewarded on the basis of real-time fMRI data (Sulzer et al., 2013). Specifically, the reinforcement learning agent manipulates the paradigm space (e.g., via generative AI) to drive neural activity in a specific direction. Rewarded by the measured brain responses, the agent gradually learns to adjust its choices and converges towards an optimal solution. Here, we present the results of a proof-of-concept study that aimed to confirm the viability of the proposed approach with simulated and empirical real-time fMRI data.
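In pseudocode, the closed loop can be sketched as follows; this is a minimal Python sketch in which the `agent`, `paradigm_generator`, `scanner`, and `reward_fn` interfaces are hypothetical placeholders for illustration, not the actual implementation:

```python
# Minimal sketch of the RLBF closed loop. All interfaces (agent, paradigm_generator,
# scanner, reward_fn) are hypothetical placeholders used for illustration only.
def rlbf_loop(agent, paradigm_generator, scanner, reward_fn, n_trials=100):
    """Search the paradigm space for stimulation that maximizes (or minimizes)
    the response of a predefined target region, rewarded by real-time fMRI."""
    for _ in range(n_trials):
        params = agent.select_action()                # e.g., stimulus contrast/frequency
        stimulus = paradigm_generator.render(params)  # build one stimulation block
        bold = scanner.acquire_block(stimulus)        # real-time fMRI data for that block
        reward = reward_fn(bold)                      # response amplitude in the target ROI/network
        agent.update(params, reward)                  # reinforcement learning update
    return agent.best_action()                        # converged stimulation parameters
```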
Methods:
We built a streamlined setup: a soft Q-learner (Haarnoja et al., 2017) with a smooth reward function (fig. 1, "Reinforcement Learning") and simple visual stimulation implementing the paradigm space (fig. 1, "Paradigm Generator"). We presented participants with various versions of a flickering checkerboard, with contrast and frequency as the free parameters of the paradigm space and a contrast of zero corresponding to no stimulation. The reward signal was calculated from responses in the primary visual cortex (fig. 2b), estimated with a linear model fitted to a single block of fMRI data (5 s stimulation, 11 s rest). The hypothesis function was convolved with a conventional double-gamma HRF. The agent's task was to determine the contrast-frequency configuration that maximizes the participant's brain activity in the primary visual cortex.
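As an illustration, the block-wise reward could be computed roughly as below; this is a minimal sketch assuming a mean V1 time series per block and approximate SPM-style double-gamma HRF parameters, not the exact model used in the study:

```python
import numpy as np
from scipy.stats import gamma

def double_gamma_hrf(tr=1.0, length=32.0):
    """Conventional double-gamma HRF sampled at the TR (approximate SPM-style parameters)."""
    t = np.arange(0, length, tr)
    peak = gamma.pdf(t, 6)           # positive response peaking around 5-6 s
    undershoot = gamma.pdf(t, 16)    # late undershoot
    hrf = peak - undershoot / 6.0
    return hrf / hrf.sum()

def block_reward(roi_timeseries, tr=1.0, stim_dur=5.0, block_dur=16.0):
    """Reward = GLM beta of the stimulation regressor fitted to one block
    (5 s stimulation, 11 s rest) of the mean V1 time series (assumed input)."""
    n = int(block_dur / tr)
    boxcar = np.zeros(n)
    boxcar[: int(stim_dur / tr)] = 1.0
    regressor = np.convolve(boxcar, double_gamma_hrf(tr))[:n]   # HRF-convolved hypothesis function
    X = np.column_stack([regressor, np.ones(n)])                # stimulation regressor + intercept
    beta, *_ = np.linalg.lstsq(X, roi_timeseries[:n], rcond=None)
    return beta[0]                                              # activation amplitude as reward
```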
First, we tested our method through simulations with realistic effect-size estimates. The ground-truth optimum was defined as a linear function of contrast and frequency, with maximal activation at maximum contrast and 7 Hz flicker frequency. The agent had 100 trials in which it selected a contrast and frequency value and updated its Q-table using the reward computed from the ground-truth equation with added Gaussian noise. We fine-tuned the hyperparameters using realistic initial conditions (signal-to-noise ratio: 0.5-3.0; Q-table smoothness: 0.5-4.0; soft-Q temperature: 0.2; learning rate: 0.05-0.9). Then, with parameters chosen based on the simulation results, we acquired data from n=5 participants. In the scanner, the checkerboard was presented in 45 blocks with a TR of 1 second (10 minutes in total), and the reinforcement learner was allowed to optimize the visual stimulation based on brain feedback.
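A compact simulation along these lines could look as follows; the grid resolution, the piecewise-linear ground truth peaking at 7 Hz, and the reading of "Q-table smoothness" as Gaussian smoothing of the Q-table are our assumptions for illustration, not the authors' code:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

rng = np.random.default_rng(0)

# Discretized paradigm space (assumed grids; contrast in [0, 1], flicker frequency in Hz).
contrasts = np.linspace(0.0, 1.0, 10)
frequencies = np.linspace(1.0, 15.0, 10)

def ground_truth(c, f, peak_freq=7.0):
    """Simulated response: increases linearly with contrast, peaks at ~7 Hz flicker."""
    return c * (1.0 - abs(f - peak_freq) / peak_freq)

Q = np.zeros((len(contrasts), len(frequencies)))
alpha, temperature, smooth_sigma, snr = 0.3, 0.2, 2.0, 1.0   # example hyperparameters

for trial in range(100):
    # Soft (Boltzmann) action selection over the Q-table.
    p = np.exp(Q / temperature)
    p /= p.sum()
    idx = rng.choice(Q.size, p=p.ravel())
    i, j = np.unravel_index(idx, Q.shape)

    # Noisy reward from the simulated ground truth.
    reward = ground_truth(contrasts[i], frequencies[j]) + rng.normal(0, 1.0 / snr)

    # Tabular update, followed by Gaussian smoothing of the Q-table
    # (our interpretation of the "Q-table smoothness" hyperparameter).
    Q[i, j] += alpha * (reward - Q[i, j])
    Q = gaussian_filter(Q, sigma=smooth_sigma)

best = np.unravel_index(Q.argmax(), Q.shape)
print(f"Estimated optimum: contrast {contrasts[best[0]]:.2f}, frequency {frequencies[best[1]]:.1f} Hz")
```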
Results:
The simulation results show that the proposed implementation provides a robust solution across a relatively wide range of initial conditions and within a small number of trials. Strong Q-table smoothing appears to work well at higher SNRs, while lower SNRs seem to require lower learning rates for optimal training (fig. 2a). The models remain remarkably stable across a wide range of learning rates. Results from the empirical measurements (fig. 2b) are in line with established knowledge about the contrast and frequency dependence of the checkerboard response (Victor et al., 1997) and provide initial confirmation of the feasibility of the proposed approach.
Conclusions:
Here we presented a proof of concept for RLBF, a novel experimental approach that aims to find the optimal stimulation paradigm for modulating individual brain activity in predefined regions or networks. While this proof-of-concept study employed a simplified setup, future work will extend the approach with paradigm spaces constructed by generative AI. By inverting the direction of inference ("brain -> behavior" instead of "behavior -> brain"), the proposed approach may emerge as a novel tool for basic and translational research.
Modeling and Analysis Methods:
Activation (eg. BOLD task-fMRI) 2
Methods Development
Motor Behavior:
Brain Machine Interface 1
Perception, Attention and Motor Behavior:
Perception: Visual
Keywords:
Data analysis
Machine Learning
MRI
Other - Brain Machine Interface
1|2 indicates the priority used for review
References:
Kaelbling, L. P., Littman, M. L., & Moore, A. W. (1996). Reinforcement learning: A survey. Journal of Artificial Intelligence Research, 4, 237-285.
Sulzer, J., Haller, S., Scharnowski, F., Weiskopf, N., Birbaumer, N., Blefari, M. L., ... & Sitaram, R. (2013). Real-time fMRI neurofeedback: Progress and challenges. NeuroImage, 76, 386-399.
Haarnoja, T., Tang, H., Abbeel, P., & Levine, S. (2017). Reinforcement learning with deep energy-based policies. In International Conference on Machine Learning (pp. 1352-1361). PMLR.
Victor, J. D., Conte, M. M., & Purpura, K. P. (1997). Dynamic shifts of the contrast-response function. Visual Neuroscience, 14(3), 577-587.