Distribution bias: imbalanced training data induces local brain age prediction shifts

Poster No:

1114 

Submission Type:

Abstract Submission 

Authors:

Maximilian Konowski1, Jan Ernsting1, Nils Winter1, Lukas Fisch1, Carlotta Barkhau1, Jennifer Spanagel1, Udo Dannlowski2, Nils Opel3, Tim Hahn1, Ramona Leenings3

Institutions:

1University of Münster, Münster, Germany, 2Institute for Translational Psychiatry, Münster, Germany, 3Department of Psychiatry and Psychotherapy, Jena, Germany

First Author:

Maximilian Konowski  
University of Münster
Münster, Germany

Co-Author(s):

Jan Ernsting  
University of Münster
Münster, Germany
Nils Winter  
University of Münster
Münster, Germany
Lukas Fisch  
University of Münster
Münster, Germany
Carlotta Barkhau  
University of Münster
Münster, Germany
Jennifer Spanagel  
University of Münster
Münster, Germany
Udo Dannlowski  
Institute for Translational Psychiatry
Münster, Germany
Nils Opel  
Department of Psychiatry and Psychotherapy
Jena, Germany
Tim Hahn  
University of Münster
Münster, Germany
Ramona Leenings  
Department of Psychiatry and Psychotherapy
Jena, Germany

Introduction:

The brain age gap (BAG) has been used as a biomarker of atypical neurodegeneration in MRI and has shown notable relevance in the context of various psychiatric conditions and lifestyle factors (Hahn 2022, Blake 2023). Brain age predictions often exhibit a systematic shift, which is attributed to regression towards the mean and termed age bias. While age bias has been addressed in the literature (Smith 2019, Beheshti 2019), we here demonstrate the influence of an additional factor, which we term distribution bias. Imbalanced age distributions are common in training data and can impair training, as the brain age model develops a disproportionate understanding of neurobiology across the age continuum. Moreover, aging trajectories vary considerably between individuals, and adjacent age groups show overlapping levels of neurodegeneration that the model cannot distinguish. In such ambiguous cases, predicting an over-represented age thus minimizes the expected loss. To assess the potential impact on scientific analyses, we empirically compare differently skewed age distributions in the training sample and show their influence on the resulting predictions using a shared, balanced test set.
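To make the ambiguity argument concrete, consider a hypothetical scan whose anatomy is equally compatible with every age between 60 and 70 years. Under a squared-error loss, the best constant prediction for such a scan is the frequency-weighted mean of the training ages in that window; under an absolute-error loss it is the weighted median. The sketch below uses made-up age counts and a hypothetical helper, `loss_optimal_predictions`; it illustrates the argument only and is not the SFCN training procedure.

```python
import numpy as np


def loss_optimal_predictions(ages, counts):
    """Hypothetical helper: MSE- and MAE-optimal constant predictions for a
    scan the model cannot distinguish among `ages`, given how often each age
    occurred in the training data (`counts`)."""
    ages = np.asarray(ages, dtype=float)
    weights = np.asarray(counts, dtype=float)
    weights = weights / weights.sum()
    # Squared error is minimized by the frequency-weighted mean.
    mse_opt = float(np.sum(weights * ages))
    # Absolute error is minimized by the frequency-weighted median.
    cdf = np.cumsum(weights)
    mae_opt = float(ages[np.searchsorted(cdf, 0.5)])
    return mse_opt, mae_opt


ambiguous_ages = np.arange(60, 71)  # 60..70 years, indistinguishable to the model

balanced = np.full(ambiguous_ages.size, 10)            # made-up: equal counts
skewed_old = np.linspace(2, 30, ambiguous_ages.size)   # made-up: more older subjects

print(loss_optimal_predictions(ambiguous_ages, balanced))
print(loss_optimal_predictions(ambiguous_ages, skewed_old))  # both optima shift above 65
```

To the extent that an input carries no further age information, a model trained with MSE or MAE loss is pushed towards such a statistic of the training labels, so its output follows the training distribution.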

Methods:

Using the Simple Fully Convolutional Network (SFCN) architecture (SFCN-reg, see Leonardsen 2022), we trained brain age models on random subsamples following four age distributions: three imbalanced distributions over-representing a) younger individuals, b) older individuals, or c) younger and older individuals equally, and d) a balanced age distribution. Training data were curated from publicly available datasets (ADNI: Petersen 2010; OASIS-3: Marcus 2007, LaMontagne 2019; OpenNeuro: Markiewicz 2021; curated by Fisch 2023). The age distribution of the available data is shown in Fig. 1. From a total of n=9352 CAT12-preprocessed (Gaser, https://neuro-jena.github.io/cat/) T1-weighted MRI scans of healthy control subjects, we randomly sampled data from different age groups. To obtain robust estimates, we repeated this random sampling 100 times (bootstrapping). In each of the 100 iterations, we selected 7 random subjects from each 1-year age bin to form a balanced test set Xtest (n=482); the training sets were then subsampled at random, stratified for gender and study/scanner, from the remaining data Xtrain (n=8871). Finally, we used each model's predictions on Xtest to evaluate the age distribution of the predictions and a potential drift in the BAGs.
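A single bootstrap iteration of this sampling design might look as follows. This is a sketch under several assumptions, not the authors' pipeline: subject metadata sits in a pandas DataFrame with hypothetical columns `age`, `sex` and `site`; the linear age weightings for scenarios a) to d) are illustrative, since the exact target distributions are not specified here; skew is induced by weighted sampling without replacement, which only approximates a target distribution; and stratification is approximated by drawing the same fraction `frac_train` (a hypothetical parameter) from every sex-by-site stratum. The demo data are synthetic.

```python
import numpy as np
import pandas as pd


def scenario_weights(age, scenario, lo, hi):
    """Hypothetical per-subject sampling weights for scenarios a) to d)."""
    x = (age - lo) / (hi - lo)  # rescale ages to [0, 1]
    if scenario == "a":  # over-represent younger individuals
        return 1.0 - 0.9 * x
    if scenario == "b":  # over-represent older individuals
        return 0.1 + 0.9 * x
    if scenario == "c":  # over-represent both tails equally
        return 0.1 + 0.9 * np.abs(2.0 * x - 1.0)
    return np.ones_like(x)  # d: balanced, no extra age weighting


def split_one_iteration(df, scenario, frac_train, per_bin=7, seed=0):
    """Return (train, test) for one bootstrap iteration (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    ages = df["age"].to_numpy(dtype=float)

    # Balanced test set: up to `per_bin` random subjects from every 1-year bin.
    bins = np.floor(ages).astype(int)
    test_pos = []
    for b in np.unique(bins):
        members = np.flatnonzero(bins == b)
        k = min(per_bin, members.size)
        test_pos.extend(rng.choice(members, size=k, replace=False))
    is_test = np.zeros(len(df), dtype=bool)
    is_test[test_pos] = True
    test, pool = df[is_test], df[~is_test]

    # Skewed training set: weighted draw without replacement, taking the same
    # fraction from each sex-by-site stratum to keep their proportions.
    w = scenario_weights(pool["age"].to_numpy(dtype=float), scenario,
                         ages.min(), ages.max())
    train_pos = []
    for positions in pool.groupby(["sex", "site"]).indices.values():
        k = int(round(frac_train * positions.size))
        if k == 0:
            continue
        p = w[positions] / w[positions].sum()
        train_pos.extend(rng.choice(positions, size=k, replace=False, p=p))
    return pool.iloc[train_pos], test


# Synthetic demo data (not the study data).
demo_rng = np.random.default_rng(42)
n = 2000
demo = pd.DataFrame({
    "age": demo_rng.uniform(20, 90, n),
    "sex": demo_rng.choice(["f", "m"], n),
    "site": demo_rng.choice(["s1", "s2", "s3"], n),
})

for seed in range(3):  # the abstract reports 100 bootstrap iterations
    train, test = split_one_iteration(demo, "b", frac_train=0.5, seed=seed)
    print(seed, len(test), len(train), round(train["age"].mean(), 1))
```

Running the split with seeds 0 to 99 gives 100 bootstrap iterations; in this sketch, an age bin holding fewer than 7 subjects simply contributes all of them to the test set.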
Supporting Image: OHBM_abstract_fig1.png
 

Results:

The overall MAE (in years) is comparable across the models (mean [std]: a = 5.74 [0.59], b = 5.37 [0.33], c = 5.23 [0.31], d = 5.92 [0.36]). In contrast, the local MAE, i.e. the MAE computed per age group, varies greatly along the age continuum (see Fig. 2). All models show regression towards the mean, i.e. age bias: predictions are drawn away from the edges of the age continuum towards its center. Importantly, we find evidence for what we call distribution bias: age predictions are additionally drawn away from age groups under-represented in the training data towards over-represented ones. The local MAE peaks in age groups under-represented in the training data (up to 10 years) and is substantially lower in over-represented age groups.
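Local MAE and the direction of the shift can be read off a per-bin summary of the test-set predictions. The sketch below assumes the common convention BAG = predicted age minus chronological age and a hypothetical bin width of 10 years; the toy predictions are constructed (they are not study results) to shrink towards 55 years, mimicking age bias only. A positive mean BAG in a bin means predictions are pulled upwards, a negative one downwards; distribution bias would appear as an additional pull towards the age groups over-represented in training.

```python
import numpy as np
import pandas as pd


def local_metrics(age_true, age_pred, bin_width=10):
    """Per-age-bin sample size, local MAE and mean BAG (pred - true)."""
    df = pd.DataFrame({"age": age_true, "pred": age_pred})
    df["bag"] = df["pred"] - df["age"]
    df["bin"] = (np.floor(df["age"] / bin_width) * bin_width).astype(int)
    return df.groupby("bin").agg(
        n=("bag", "size"),
        local_mae=("bag", lambda b: b.abs().mean()),
        mean_bag=("bag", "mean"),
    )


# Toy balanced test set: 7 subjects per 1-year age from 20 to 89.
age = np.repeat(np.arange(20, 90), 7).astype(float)
# Toy predictions shrunk towards 55 years (illustrates age bias only).
pred = 55.0 + 0.8 * (age - 55.0)

table = local_metrics(age, pred)
print(table)
print("overall MAE:", round(float(np.abs(pred - age).mean()), 2))
```

A single overall MAE averages over these bins, which is why models with similar overall MAE can still differ strongly in where along the age continuum their errors concentrate.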
Supporting Image: OHBM_abstract_fig2.png
 

Conclusions:

Our findings indicate that, in addition to the well-known age bias, distribution bias distorts the predictions of brain age models. If not accounted for, distribution bias poses an implicit risk to the validity of scientific analyses, particularly when evaluating predictions for age groups that were under-represented during training.

Modeling and Analysis Methods:

Classification and Predictive Modeling 1

Neuroanatomy, Physiology, Metabolism and Neurotransmission:

Neuroanatomy Other 2

Keywords:

Aging
Machine Learning
Modeling
MRI
STRUCTURAL MRI
Other - Brain Age

1|2 Indicates the priority used for review

Abstract Information

By submitting your proposal, you grant permission for the Organization for Human Brain Mapping (OHBM) to distribute your work in any format, including video, audio, print and electronic text through OHBM OnDemand, social media channels, the OHBM website, or other electronic publications and media.

I accept

The Open Science Special Interest Group (OSSIG) is introducing a reproducibility challenge for OHBM 2025. This new initiative aims to enhance the reproducibility of scientific results and foster collaborations between labs. Teams will consist of a “source” party and a “reproducing” party, and will be evaluated on the success of their replication, the openness of the source work, and additional deliverables. Propose your OHBM abstract(s) as source work for future OHBM meetings by selecting one of the following options:

I do not want to participate in the reproducibility challenge.

Please indicate below if your study was a “resting state” or “task-activation” study.

Other

Healthy subjects only or patients (note that patient studies may also involve healthy subjects):

Healthy subjects

Was this research conducted in the United States?

No

Was any human subjects research approved by the relevant Institutional Review Board or ethics panel? NOTE: Any human subjects studies without IRB approval will be automatically rejected.

Yes

Was any animal research approved by the relevant IACUC or other animal research panel? NOTE: Any animal studies without IACUC approval will be automatically rejected.

Not applicable

Please indicate which methods were used in your research:

Structural MRI

For human MRI, what field strength scanner do you use?

3.0T
1.5T

Which processing packages did you use for your study?

Other, please list: CAT12

Provide references using APA citation style.

Beheshti, I. (2019). Bias-adjustment in neuroimaging-based brain age frameworks: A robust scheme. NeuroImage: Clinical, 24, 102063. https://doi.org/10.1016/j.nicl.2019.102063

Blake, K. V. (2023). Advanced brain ageing in adult psychopathology: A systematic review and meta-analysis of structural MRI studies. Journal of Psychiatric Research, 157, 180–191.

Fisch, L. (2023). Deepbet: Fast brain extraction of T1-weighted MRI using convolutional neural networks. arXiv preprint arXiv:2308.07003.

Hahn, T. (2022). An uncertainty-aware, shareable, and transparent neural network architecture for brain-age modeling. Science Advances, 8(1), eabg9471.

LaMontagne, P. J. (2019). OASIS-3: Longitudinal neuroimaging, clinical, and cognitive dataset for normal aging and Alzheimer disease. medRxiv, 2019.12.13.19014902. https://doi.org/10.1101/2019.12.13.19014902

Leonardsen, E. H. (2022). Deep neural networks learn general and clinically relevant representations of the ageing brain. NeuroImage, 256, 119210. https://doi.org/10.1016/j.neuroimage.2022.119210

Marcus, D. S. (2007). Open Access Series of Imaging Studies (OASIS): Cross-sectional MRI data in young, middle aged, nondemented, and demented older adults. Journal of Cognitive Neuroscience, 19(9), 1498–1507.

Markiewicz, C. J. (2021). The OpenNeuro resource for sharing of neuroscience data. eLife, 10, e71774.

Petersen, R. C. (2010). Alzheimer's Disease Neuroimaging Initiative (ADNI): Clinical characterization. Neurology, 74(3), 201–209.

Smith, S. M. (2019). Estimation of brain age delta from brain imaging. NeuroImage, 200, 528–539. https://doi.org/10.1016/j.neuroimage.2019.06.017

UNESCO Institute of Statistics and World Bank Waiver Form

I attest that I currently live, work, or study in a country on the UNESCO Institute of Statistics and World Bank List of Low and Middle Income Countries list provided.

No