Generalizability of Deep Learning in Alzheimer’s Disease and Frontotemporal Dementia Classification

Poster No:

1134 

Submission Type:

Abstract Submission 

Authors:

Myrthe van Haaften1, Kaouther Mouheb1, Tavia Evans1, Henri Vrooman1, Harro Seelaar1, Frank Wolters1, Meike Vernooij1, Esther Bron1

Institutions:

1Erasmus MC University Medical Center, Rotterdam, the Netherlands

First Author:

Myrthe van Haaften, MSc  
Erasmus MC University Medical Center
Rotterdam, the Netherlands

Co-Author(s):

Kaouther Mouheb  
Erasmus MC University Medical Center
Rotterdam, the Netherlands
Tavia Evans  
Erasmus MC University Medical Center
Rotterdam, the Netherlands
Henri Vrooman  
Erasmus MC University Medical Center
Rotterdam, the Netherlands
Harro Seelaar, MD, PhD  
Erasmus MC University Medical Center
Rotterdam, the Netherlands
Frank Wolters  
Erasmus MC University Medical Center
Rotterdam, the Netherlands
Meike Vernooij, MD, PhD  
Erasmus MC University Medical Center
Rotterdam, the Netherlands
Esther Bron, PhD  
Erasmus MC University Medical Center
Rotterdam, the Netherlands

Introduction:

Dementia is a syndrome that can be caused by different underlying diseases, such as Alzheimer's disease (AD) and frontotemporal dementia (FTD). Differentiating between AD and FTD is difficult because their symptoms and neuroimaging patterns overlap and both diseases vary in clinical presentation. Data-driven deep learning models can discover complex patterns in imaging data, which can support the diagnostic process. Several large neuroimaging cohorts of patients with dementia provide the sample size needed to study such models, but the generalizability of models trained on these cohorts to local clinical settings is uncertain. The aim of this study is twofold: first, to develop an MRI-based deep learning model on public AD and FTD datasets and evaluate its generalizability to a local memory clinic (MC) cohort; second, to evaluate the effect of finetuning the model on the local MC cohort.

Methods:

We included T1-weighted brain MRI scans of AD and FTD patients from four cohorts: ADNI, NIFD, NACC (NIA-funded Alzheimer's Disease Research Centers, grant U24 AG072122), and ACE (our in-house MC cohort). AD patients were included only if they were aged up to 75 years, because a first FTD diagnosis is uncommon in older patients and the classification task is therefore less relevant beyond this age. ADNI, NIFD and NACC were combined into the source dataset (N=593 AD, N=192 FTD), whereas ACE was the local MC cohort (N=125 AD, N=113 FTD). Each dataset was randomly split 10 times into training, validation and test sets (source: 80%/10%/10%, ACE: 40%/10%/50%), stratified by clinical diagnosis and, for the source dataset, by original cohort. The MRI scans were processed with a voxel-based morphometry pipeline to construct gray matter density maps, which served as input for AD vs. FTD classification with DenseNet-121, a widely used deep learning model. For each data split, we first trained the model on the source dataset and then finetuned all model layers on ACE. Hyperparameters were tuned on the first split, and the optimal combination was used for all 10 splits. Class weights were used to correct for class imbalance. The models before and after finetuning were tested on the respective source and ACE test sets, and mean performance metrics were computed over the 10 splits.
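
The splitting and class-weighting steps can be sketched in plain Python. This is a minimal illustration, not the study's code: the function names `stratified_split` and `inverse_frequency_weights` are hypothetical, the inverse-frequency weighting scheme is an assumption (the abstract does not state how the class weights were derived), and strata are formed from diagnosis only because per-cohort counts are not reported here.

```python
import random
from collections import Counter, defaultdict


def stratified_split(labels, fractions, seed):
    """Split sample indices into train/val/test lists, stratified by label.

    labels: list of sortable stratum keys, one per sample.
    fractions: (train, val, test) fractions; the test set takes the remainder.
    """
    rng = random.Random(seed)
    by_stratum = defaultdict(list)
    for idx, key in enumerate(labels):
        by_stratum[key].append(idx)

    train, val, test = [], [], []
    for key in sorted(by_stratum):
        members = by_stratum[key]
        rng.shuffle(members)
        n_train = round(fractions[0] * len(members))
        n_val = round(fractions[1] * len(members))
        train += members[:n_train]
        val += members[n_train:n_train + n_val]
        test += members[n_train + n_val:]
    return train, val, test


def inverse_frequency_weights(labels):
    """Assumed scheme: weight each class by n_total / (n_classes * n_class)."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {cls: total / (len(counts) * n) for cls, n in counts.items()}


# Diagnosis counts of the source dataset as reported in the text.
source_labels = ["AD"] * 593 + ["FTD"] * 192

# Ten random 80%/10%/10% splits, one seed per split.
splits = [
    stratified_split(source_labels, (0.8, 0.1, 0.1), seed) for seed in range(10)
]

# Class weights are derived from the training portion of a split.
train_idx, val_idx, test_idx = splits[0]
train_labels = [source_labels[i] for i in train_idx]
class_weights = inverse_frequency_weights(train_labels)
```

With the reported source counts and this rounding, each split contains 474 AD and 154 FTD training scans, which gives weights of roughly 0.66 for AD and 2.04 for FTD. To stratify by original cohort as well, the stratum keys could be (diagnosis, cohort) tuples.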

Results:

The models trained only on the source data (pre-finetuning) showed good overall performance (Fig. 1), with a small drop between the internal source test set (balanced accuracy (BA)=0.87) and the external ACE test set (BA=0.82). Notably, finetuning the models on ACE (post-finetuning) slightly lowered performance instead of improving it (BA=0.80 on ACE). FTD images from NACC were harder to classify (accuracy (ACC)=0.70) than those from NIFD (ACC=0.89) and ACE (ACC=0.77) (Fig. 2a), which might be due to greater data heterogeneity (e.g. in imaging protocol) and a smaller FTD sample size in NACC. We did not observe this effect in AD, however, possibly because the AD sample in NACC was larger than the FTD sample. In ACE, model performance was higher on AD than on FTD (ACC 0.87 vs. 0.77 pre-finetuning, 0.84 vs. 0.77 post-finetuning). For both the pre- and post-finetuning models, performance differed across the AD and FTD subtypes (Fig. 2b). The pre-finetuning model showed a large performance drop for progressive non-fluent aphasia, a language variant of FTD, between the source data and ACE (ACC 0.86 vs. 0.44), which was not resolved by finetuning.
Supporting Image: fig1_overall_performance.png
Supporting Image: fig2_subset_performance.png
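
The relation between the per-class accuracies and the balanced accuracy on ACE can be checked with a short sketch. It assumes that BA is the unweighted mean of the per-class recalls, which is the usual definition but is not spelled out in this abstract; the function name `balanced_accuracy` and the toy labels are illustrative only and are not study data.

```python
def balanced_accuracy(y_true, y_pred):
    """Unweighted mean of per-class recall over the classes present in y_true."""
    recalls = []
    for cls in sorted(set(y_true)):
        preds_for_cls = [p for t, p in zip(y_true, y_pred) if t == cls]
        n_correct = sum(p == cls for p in preds_for_cls)
        recalls.append(n_correct / len(preds_for_cls))
    return sum(recalls) / len(recalls)


# Toy labels (hypothetical): 4 of 5 AD and 3 of 4 FTD cases classified correctly.
y_true = ["AD"] * 5 + ["FTD"] * 4
y_pred = ["AD", "AD", "AD", "AD", "FTD", "FTD", "FTD", "FTD", "AD"]
toy_ba = balanced_accuracy(y_true, y_pred)  # mean of 0.8 and 0.75

# Per-class accuracies on ACE as reported in the text (rounded means over 10 splits).
ba_pre = (0.87 + 0.77) / 2   # pre-finetuning
ba_post = (0.84 + 0.77) / 2  # post-finetuning
```

Under this assumption, the pre-finetuning per-class values average to 0.82, matching the reported BA, and the post-finetuning values average to 0.805, in line with the reported 0.80 given that the inputs are rounded.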
 

Conclusions:

Models developed on ADNI, NIFD and NACC generally showed good generalizability to a local tertiary MC cohort, implying that these datasets are representative at least for specialized referral centers. Finetuning all layers had no beneficial effect on performance, but its added value might be limited by the heterogeneity and small size of the MC validation set. Other finetuning strategies (e.g. tuning only the last layers, few-shot learning) or more extensive hyperparameter tuning could be explored in future work.

Disorders of the Nervous System:

Neurodegenerative/Late Life (e.g. Parkinson’s, Alzheimer’s) 2

Modeling and Analysis Methods:

Classification and Predictive Modeling 1

Novel Imaging Acquisition Methods:

Anatomical MRI

Keywords:

Degenerative Disease
Machine Learning
STRUCTURAL MRI
Other - Dementia Diagnosis

1|2 Indicates the priority used for review

Abstract Information

Please indicate below if your study was a "resting state" or "task-activation" study.

Other

Healthy subjects only or patients (note that patient studies may also involve healthy subjects):

Patients

Was this research conducted in the United States?

No

Was any human subjects research approved by the relevant Institutional Review Board or ethics panel? NOTE: Any human subjects studies without IRB approval will be automatically rejected.

Yes

Was any animal research approved by the relevant IACUC or other animal research panel? NOTE: Any animal studies without IACUC approval will be automatically rejected.

Not applicable

Please indicate which methods were used in your research:

Structural MRI
Computational modeling
