The effects of data leakage on connectome-based machine learning models

Matthew Rosenblatt, Presenter
Yale University
New Haven, CT 
United States
 
Tuesday, Jun 25: 12:00 PM - 1:15 PM
3944 
Oral Sessions 
COEX 
Room: ASEM Ballroom 202 
Understanding individual differences in brain-behavior relationships is a central goal of neuroscience. Accordingly, machine learning approaches that use neuroimaging data, such as functional connectivity, have grown increasingly popular for predicting a wide range of phenotypes. The reproducibility of such studies is hindered by data leakage, in which information about the test data is introduced into the model during training (1). Although leakage is never acceptable practice, it is pervasive, so quantifying its effects in neuroimaging data is important. Here, we evaluate the effects of leakage on functional connectome-based machine learning in four large datasets for the prediction of three phenotypes.
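To make the notion of leakage concrete, the following minimal sketch contrasts a leaky pipeline with a clean one on purely synthetic noise. It assumes one common form of leakage, selecting features on the full dataset before cross-validation, which is chosen here only for illustration and is not necessarily one of the forms evaluated in this work. The data dimensions, the ridge penalty, and the function names (`select_features`, `ridge_fit_predict`, `cross_validated_r`) are all hypothetical.

```python
import numpy as np


def select_features(X, y, k):
    """Return indices of the k features with the largest |Pearson r| with y."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    r = (Xc.T @ yc) / (np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc))
    return np.argsort(-np.abs(r))[:k]


def ridge_fit_predict(X_train, y_train, X_test, alpha=1.0):
    """Fit closed-form ridge regression on the training data, predict the test data."""
    mu_x, mu_y = X_train.mean(axis=0), y_train.mean()
    A = X_train - mu_x
    w = np.linalg.solve(A.T @ A + alpha * np.eye(A.shape[1]), A.T @ (y_train - mu_y))
    return (X_test - mu_x) @ w + mu_y


def cross_validated_r(X, y, k=20, n_folds=5, leaky=False, seed=0):
    """Correlation between observed and cross-validated predicted values."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    y_pred = np.empty_like(y)
    # Leaky variant: features are chosen using ALL subjects, test folds included.
    global_idx = select_features(X, y, k) if leaky else None
    for test in folds:
        train = np.setdiff1d(np.arange(len(y)), test)
        # Clean variant: features are chosen from the training fold only.
        idx = global_idx if leaky else select_features(X[train], y[train], k)
        y_pred[test] = ridge_fit_predict(X[train][:, idx], y[train], X[test][:, idx])
    return np.corrcoef(y, y_pred)[0, 1]


# Synthetic stand-in for connectome data (assumed sizes): 100 subjects,
# 2000 edges, and a phenotype that is unrelated to every feature.
rng = np.random.default_rng(42)
X = rng.standard_normal((100, 2000))
y = rng.standard_normal(100)

r_leaky = cross_validated_r(X, y, leaky=True)
r_clean = cross_validated_r(X, y, leaky=False)
print(f"leaky feature selection: r = {r_leaky:.2f}")
print(f"clean feature selection: r = {r_clean:.2f}")
```

Because the phenotype here is independent of every feature, any apparent predictive performance in the leaky variant comes from test-set information entering feature selection rather than from a real brain-behavior relationship.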