Electrophysiological Correlates of Reinforcement Learning in the Human Ventral Tegmental Area

Presented During:

Thursday, June 26, 2025: 11:30 AM - 12:45 PM
Brisbane Convention & Exhibition Centre  
Room: M3 (Mezzanine Level)  

Poster No:

626 

Submission Type:

Abstract Submission 

Authors:

Arjun Ramaswamy1, Douglas Steele2, Jonathan Roiser3, Manjit Matharu1, Umesh Vivekananda1, Ludvic Zrinzo1, Vladimir Litvak1

Institutions:

1UCL Queen Square Institute of Neurology, London, London, 2University of Dundee, Dundee, Scotland, 3UCL Institute of Cognitive Neuroscience, London, London

First Author:

Arjun Ramaswamy  
UCL Queen Square Institute of Neurology
London, London

Co-Author(s):

Douglas Steele  
University of Dundee
Dundee, Scotland
Jonathan Roiser  
UCL Institute of Cognitive Neuroscience
London, London
Manjit Matharu  
UCL Queen Square Institute of Neurology
London, London
Umesh Vivekananda  
UCL Queen Square Institute of Neurology
London, London
Ludvic Zrinzo  
UCL Queen Square Institute of Neurology
London, London
Vladimir Litvak  
UCL Queen Square Institute of Neurology
London, London

Introduction:

The Ventral Tegmental Area (VTA) is the main source of dopaminergic input to the human prefrontal cortex. Studies in non-human primates and human fMRI studies suggest that VTA dopaminergic neurons encode reward prediction errors, a signal critical for reinforcement learning. However, direct electrophysiological evidence from the human VTA is still lacking.

Methods:

We recorded VTA Local Field Potentials (LFPs) in 14 patients (9 male; mean age 46 ± 10 years) undergoing Deep Brain Stimulation for chronic cluster headache. The task was a modified version of the Pessiglione task (Pessiglione et al., 2006) with rewarding ("win"), neutral ("look"), and aversive ("loss") outcomes, each linked to a distinct pair of fractal images; in all three conditions the alternative outcome was "nothing". Within each pair, the two images were associated with high (0.7) and low (0.3) probabilities of the respective outcome, with the assignment randomised across participants. Participants aimed to maximise the number of 'vouchers' won through trial and error, although no actual payments were made.
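To make the trial structure concrete, the following is a minimal simulation sketch of one choice in this design. It assumes that index 0 is the high-probability fractal and index 1 the low-probability one (the study randomised this across participants), and the numeric outcome values in `OUTCOME_VALUE` are hypothetical: the abstract does not specify any coding.

```python
import random

# Hypothetical numeric coding of outcomes; the abstract does not specify values.
OUTCOME_VALUE = {"win": 1.0, "look": 0.0, "loss": -1.0}

# Outcome probabilities for the two fractals of a pair (index 0 = high, index 1 = low).
# In the study, which fractal was "high" was randomised across participants.
PAIR_PROBS = (0.7, 0.3)


def simulate_trial(condition: str, choice: int, rng: random.Random) -> tuple[str, float]:
    """Return (outcome_label, outcome_value) after choosing fractal `choice` (0 or 1)."""
    if condition not in OUTCOME_VALUE:
        raise ValueError(f"unknown condition: {condition!r}")
    if choice not in (0, 1):
        raise ValueError("choice must be 0 or 1")
    # The condition-specific outcome occurs with the chosen fractal's probability;
    # otherwise the trial ends with "nothing".
    if rng.random() < PAIR_PROBS[choice]:
        return condition, OUTCOME_VALUE[condition]
    return "nothing", 0.0
```

For example, `simulate_trial("loss", 1, random.Random(1))` returns either `("loss", -1.0)` or `("nothing", 0.0)`, with the loss occurring on roughly 30% of such trials.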
We analysed task choices using a Rescorla-Wagner model with dual learning rates, fitting participant-specific learning-rate and decision-sensitivity parameters.
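A Rescorla-Wagner learner of this kind can be sketched as below. This is not the authors' model code, and two points are assumptions: "dual learning rates" is read here as separate rates for positive and negative prediction errors (`alpha_pos`, `alpha_neg`), and "decision sensitivity" is modelled as the inverse temperature `beta` of a two-option softmax. Only the chosen option's value is updated, and the function returns the negative log-likelihood of a choice sequence within one fractal pair, which could then be minimised per participant.

```python
import math
from typing import Sequence


def p_choose_first(q0: float, q1: float, beta: float) -> float:
    """Two-option softmax: probability of choosing option 0, with inverse temperature beta."""
    z = beta * (q1 - q0)
    # Branch on the sign of z so math.exp never overflows.
    if z >= 0:
        e = math.exp(-z)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(z))


def rw_update(q: float, outcome: float, alpha_pos: float, alpha_neg: float) -> tuple[float, float]:
    """Rescorla-Wagner update with separate rates for positive and negative prediction errors."""
    delta = outcome - q  # reward prediction error
    alpha = alpha_pos if delta > 0 else alpha_neg
    return q + alpha * delta, delta


def rw_negloglik(choices: Sequence[int], outcomes: Sequence[float],
                 alpha_pos: float, alpha_neg: float, beta: float,
                 q_init: float = 0.0) -> float:
    """Negative log-likelihood of a choice sequence (0/1) within one fractal pair."""
    q = [q_init, q_init]
    nll = 0.0
    for c, r in zip(choices, outcomes):
        p0 = p_choose_first(q[0], q[1], beta)
        p = p0 if c == 0 else 1.0 - p0
        nll -= math.log(max(p, 1e-12))  # clip to avoid log(0)
        # Only the chosen option's value is updated.
        q[c], _ = rw_update(q[c], r, alpha_pos, alpha_neg)
    return nll
```

With `beta = 0` every choice has probability 0.5, so the likelihood reduces to that of a random-choice null model; this is a convenient sanity check when fitting.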
VTA-LFPs recorded against a common earlobe reference were converted to a bipolar montage, down-sampled to 300 Hz, high-pass filtered at 0.5 Hz, and epoched around stimulus, button-press, and outcome events. Trials with artefacts were excluded using a z-score threshold of 5. One patient without clear LFP responses was excluded.
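The authors used SPM12; the steps above can be approximated in Python roughly as follows. This is a sketch under several assumptions not stated in the abstract: bipolar channels are differences between adjacent contacts, the high-pass is a zero-phase 4th-order Butterworth, event times are given as sample indices at the new 300 Hz rate, the epoch window (-0.5 to 1.0 s) is hypothetical, and an epoch is rejected if any sample on any channel has |z| > 5 relative to that channel's continuous recording.

```python
from math import gcd

import numpy as np
from scipy.signal import butter, resample_poly, sosfiltfilt


def preprocess(lfp_ref: np.ndarray, fs_in: int, fs_out: int = 300,
               hp_hz: float = 0.5) -> np.ndarray:
    """Referential LFP (n_contacts, n_samples) -> bipolar, resampled, high-passed."""
    # Bipolar montage: adjacent-contact differences cancel the common earlobe reference.
    bip = np.diff(lfp_ref, axis=0)
    # Polyphase resampling (applies its own anti-aliasing filter).
    g = gcd(fs_in, fs_out)
    x = resample_poly(bip, fs_out // g, fs_in // g, axis=-1)
    # Zero-phase 4th-order Butterworth high-pass at hp_hz.
    sos = butter(4, hp_hz, btype="highpass", fs=fs_out, output="sos")
    return sosfiltfilt(sos, x, axis=-1)


def epoch_and_reject(x: np.ndarray, events: list[int], fs: int,
                     tmin: float = -0.5, tmax: float = 1.0, z_thresh: float = 5.0):
    """Cut epochs around event samples; drop epochs where |z| > z_thresh on any channel."""
    z = (x - x.mean(axis=-1, keepdims=True)) / x.std(axis=-1, keepdims=True)
    lo, hi = int(round(tmin * fs)), int(round(tmax * fs))
    epochs, kept = [], []
    for i, ev in enumerate(events):
        a, b = ev + lo, ev + hi
        if a < 0 or b > x.shape[-1]:
            continue  # epoch would run off the recording
        if np.abs(z[:, a:b]).max() > z_thresh:
            continue  # artefact trial
        epochs.append(x[:, a:b])
        kept.append(i)
    if not epochs:
        return np.empty((0, x.shape[0], hi - lo)), kept
    return np.stack(epochs), kept
```

`kept` records which events survived rejection, so that per-trial behavioural labels (trial type, outcome) can be aligned with the retained epochs.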
Parametric statistical analysis was performed with the SPM12 toolbox, using cluster-level correction for multiple comparisons (p < 0.05), and with SPSS. For evoked responses, the data were low-pass filtered at 20 Hz, down-sampled to 60 Hz, averaged across channels, baseline-corrected, and normalised by the standard deviation of the raw LFP. Induced responses up to 100 Hz were also analysed.
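The evoked-response steps can be sketched in the same order, again as a Python analogue rather than the SPM12 pipeline. Assumptions: a zero-phase 4th-order Butterworth low-pass, down-sampling by keeping every 5th sample after that filter, baseline defined as all pre-event samples, and `raw_sd` taken as a single scalar standard deviation of the continuous raw LFP (the abstract does not say whether this is per channel or after averaging).

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt


def evoked_response(epochs: np.ndarray, tmin: float, raw_sd: float,
                    fs: int = 300, fs_out: int = 60, lp_hz: float = 20.0):
    """epochs: (n_trials, n_channels, n_times) at fs, first sample at tmin seconds.

    Returns (times, evoked), with evoked shaped (n_trials, n_times_out).
    """
    if fs % fs_out:
        raise ValueError("fs must be an integer multiple of fs_out")
    # Zero-phase low-pass; it also attenuates content above the new 30 Hz Nyquist.
    sos = butter(4, lp_hz, btype="lowpass", fs=fs, output="sos")
    x = sosfiltfilt(sos, epochs, axis=-1)
    # Down-sample by keeping every step-th sample.
    step = fs // fs_out
    x = x[..., ::step]
    times = tmin + np.arange(x.shape[-1]) * step / fs
    if not (times < 0).any():
        raise ValueError("epochs need a pre-event (t < 0) baseline")
    # Average across bipolar channels.
    x = x.mean(axis=1)
    # Baseline-correct with the mean of all pre-event samples.
    x = x - x[..., times < 0].mean(axis=-1, keepdims=True)
    # Normalise by the standard deviation of the raw (continuous) LFP.
    return times, x / raw_sd
```

Normalising by the raw LFP's standard deviation puts responses from different patients, whose electrode impedances and signal amplitudes differ, on a comparable scale before group statistics.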

Results:

Patients displayed three behavioural strategies: gradual learning that conformed well to the Rescorla-Wagner model, persistent choice (choosing the same option on more than 90% of trials), and random choice (where a null model explained the data better). Clear evoked VTA-LFP responses appeared during stimulus presentation (significant clusters at 100–367 ms and 617–833 ms) and outcome presentation (83–317 ms), but not around button presses. All events elicited significant low-frequency induced responses, likely reflecting the evoked activity. Distinct higher-frequency induced responses appeared only late (>500 ms).
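One way to operationalise the three strategies is sketched below. The 90% persistence criterion comes from the text; everything else is an assumption: choices are coded 0/1, the Rescorla-Wagner fit is summarised by its best negative log-likelihood `rw_best_nll` with three free parameters, the null model picks each option with probability 0.5, and the two models are compared with BIC. The abstract does not state which comparison criterion was used.

```python
import math
from collections import Counter
from typing import Sequence


def classify_strategy(choices: Sequence[int], rw_best_nll: float,
                      n_rw_params: int = 3, persist_thresh: float = 0.9) -> str:
    """Label a participant 'persistent', 'learning' or 'random' (hypothetical criteria)."""
    n = len(choices)
    if n == 0:
        raise ValueError("no choices")
    # Persistent: the most frequent option is chosen on more than persist_thresh of trials.
    top_share = Counter(choices).most_common(1)[0][1] / n
    if top_share > persist_thresh:
        return "persistent"
    # Fitted Rescorla-Wagner model vs parameter-free null (p = 0.5 per choice),
    # using BIC = 2 * NLL + k * ln(n); lower is better.
    bic_rw = 2.0 * rw_best_nll + n_rw_params * math.log(n)
    bic_null = 2.0 * n * math.log(2.0)
    return "learning" if bic_rw < bic_null else "random"
```

Checking persistence first matters: a participant who almost always picks one option can be fitted trivially by a model with extreme parameters, which would otherwise be misread as learning.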

Pooled across trial types, win, loss, and look outcomes differed significantly from 'nothing' between 283 and 500 ms post-outcome (Fig. 1A). A linear mixed-effects model revealed a significant effect of trial type (p = 0.008) on the LFP difference averaged across this window (Fig. 1B), whereas neither the effect of strategy nor the strategy × trial type interaction was significant. Post-hoc t-tests showed that 'win' outcomes elicited larger differences from 'nothing' than 'loss' and 'look' outcomes did; 'loss' and 'look' outcome effects did not differ significantly.
Supporting Image: figures_v1.png
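The window averaging and post-hoc comparisons can be sketched as follows. This covers only the 283–500 ms averaging and paired t-tests between trial types; it does not reproduce the linear mixed-effects model fitted in SPSS. It assumes one outcome-minus-'nothing' difference wave per participant and trial type, sampled at 60 Hz, and reports uncorrected p-values because the abstract does not state a correction for the post-hoc tests.

```python
from itertools import combinations

import numpy as np
from scipy.stats import ttest_rel


def window_mean(diff_waves: np.ndarray, times: np.ndarray,
                t0: float = 0.283, t1: float = 0.500, eps: float = 1e-9) -> np.ndarray:
    """Average difference waves over [t0, t1] seconds (last axis = time)."""
    mask = (times >= t0 - eps) & (times <= t1 + eps)
    return diff_waves[..., mask].mean(axis=-1)


def posthoc_paired(per_type: dict[str, np.ndarray]) -> dict[tuple[str, str], tuple[float, float]]:
    """Paired t-tests between trial types; each array holds one window mean per participant."""
    out = {}
    for a, b in combinations(per_type, 2):
        res = ttest_rel(per_type[a], per_type[b])
        out[(a, b)] = (float(res.statistic), float(res.pvalue))
    return out
```

Pairing by participant removes between-patient differences in overall response amplitude, which is why paired rather than independent-samples tests fit this within-subject design.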
 

Conclusions:

We present the first evidence of human VTA electrophysiological responses in a reinforcement learning paradigm. Responses were significantly greater for reward outcomes than for loss and neutral outcomes, regardless of behavioural strategy. Consistent with prior electrophysiological studies in non-human primates (Schultz, 2019), responses did not differ between loss and neutral outcomes. This suggests that VTA activity is selectively sensitive to rewarding outcomes. Representing the value of the chosen option is essential for computing reward prediction errors, and identifying this neural representation is the focus of our ongoing work.

Brain Stimulation:

Deep Brain Stimulation

Emotion, Motivation and Social Neuroscience:

Reward and Punishment 1

Higher Cognitive Functions:

Decision Making

Modeling and Analysis Methods:

EEG/MEG Modeling and Analysis

Neuroanatomy, Physiology, Metabolism and Neurotransmission:

Subcortical Structures 2

Keywords:

Dopamine
ELECTROPHYSIOLOGY
Sub-Cortical
Other - intracranial recordings, reinforcement learning, reward

1|2 Indicates the priority used for review

Abstract Information

By submitting your proposal, you grant permission for the Organization for Human Brain Mapping (OHBM) to distribute your work in any format, including video, audio, print and electronic text through OHBM OnDemand, social media channels, the OHBM website, or other electronic publications and media.

I accept

The Open Science Special Interest Group (OSSIG) is introducing a reproducibility challenge for OHBM 2025. This new initiative aims to enhance the reproducibility of scientific results and foster collaborations between labs. Teams will consist of a “source” party and a “reproducing” party, and will be evaluated on the success of their replication, the openness of the source work, and additional deliverables. Propose your OHBM abstract(s) as source work for future OHBM meetings by selecting one of the following options:

I do not want to participate in the reproducibility challenge.

Please indicate below if your study was a "resting state" or "task-activation" study.

Task-activation

Healthy subjects only or patients (note that patient studies may also involve healthy subjects):

Patients

Was this research conducted in the United States?

No

Was any human subjects research approved by the relevant Institutional Review Board or ethics panel? NOTE: Any human subjects studies without IRB approval will be automatically rejected.

Yes

Was any animal research approved by the relevant IACUC or other animal research panel? NOTE: Any animal studies without IACUC approval will be automatically rejected.

Not applicable

Please indicate which methods were used in your research:

EEG/ERP
Neurophysiology
Structural MRI
Behavior
Computational modeling

For human MRI, what field strength scanner do you use?

1.5T

Which processing packages did you use for your study?

SPM

Provide references using APA citation style.

Pessiglione, M., Seymour, B., Flandin, G., Dolan, R. J., & Frith, C. D. (2006). Dopamine-dependent prediction errors underpin reward-seeking behaviour in humans. Nature, 442(7106), 1042–1045. https://doi.org/10.1038/nature05051

Schultz, W. (2019). Recent advances in understanding the role of phasic dopamine activity. F1000Research, 8, 1680. https://doi.org/10.12688/f1000research.19793.1

UNESCO Institute of Statistics and World Bank Waiver Form

I attest that I currently live, work, or study in a country on the UNESCO Institute of Statistics and World Bank List of Low and Middle Income Countries list provided.

No