Cluster-aware machine learning of robust brain-behavior associations for precision neuropsychiatry.

Presented During:

Tuesday, June 25, 2024: 12:00 PM - 1:15 PM
COEX  
Room: ASEM Ballroom 202  

Poster No:

1931 

Submission Type:

Abstract Submission 

Authors:

Amanda Buch1, Conor Liston1, Logan Grosenick1

Institutions:

1Weill Cornell Medicine, Cornell University, New York, NY

First Author:

Amanda Buch  
Weill Cornell Medicine, Cornell University
New York, NY

Co-Author(s):

Conor Liston  
Weill Cornell Medicine, Cornell University
New York, NY
Logan Grosenick  
Weill Cornell Medicine, Cornell University
New York, NY

Introduction:

Explainable machine learning of complex multimodal data in neuroscience research is revolutionizing precision neuropsychiatry. Interpretable clustering of patients into distinct subtypes can enhance personalized prognosis, diagnosis, and treatment. However, training on biomedical data poses challenges due to high dimensionality, clustered structure, and limited sample size. To address this, we propose a scalable approach for cluster-aware embedding that incorporates a convex clustering penalty. This approach enables hierarchical clustering within embeddings from principal component analysis (PCA), locally linear embedding (LLE), and canonical correlation analysis (CCA). Our method improves upon existing techniques and offers a modular framework for interpretable biomarker discovery in precision medicine. We apply this approach to identify neurocognitive subtypes in the Adolescent Brain Cognitive Development (ABCD) and Autism Brain Imaging Data Exchange (ABIDE) datasets.

Methods:

Clustering problems are often considered difficult optimization problems. However, by relaxing the hard clustering constraint, they can be reformulated as convex optimization problems ("convex clustering") [3]. Here, we introduce Pathwise Clustered Matrix Factorization (PCMF) and its multi-view extension, which incorporate a convex clustering penalty to make embedding methods cluster-aware. PCMF does not require pre-determining the number of clusters and can generate a dendrogram for subtype discovery in biomedical datasets. It fits a path of solutions along a sequence of penalty parameters and estimates split points based on model fit improvement. We extend our PCMF approach to include canonical correlation analysis (CCA) within clusters, introducing pathwise clustered canonical correlation analysis (P3CA). We applied P3CA to discover neurocognitive subtypes in the Adolescent Brain Cognitive Development (ABCD) [5] and Autism Brain Imaging Data Exchange (ABIDE) [6-7] datasets. In the first case study, we used resting state functional connectivity (RSFC) and behavioral (ADOS-2 and verbal IQ) data from N=299 individuals with autism spectrum disorder (ASD) in the ABIDE dataset. In the second case study, we used RSFC and behavioral (NIH Toolbox) data from N=490 adolescents in the ABCD dataset. We applied stringent preprocessing to both datasets, following or exceeding well-established guidelines [8-9]. Prior to fitting P3CA to the ABIDE and ABCD datasets, we performed robust feature selection as previously described [2]. We evaluated the stability of neurocognitive subtypes by randomly holding out 30% of the data and refitting P3CA.
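For orientation, the convex relaxation referred to above is usually written in the following form, as in the clusterpath formulation of [3]. This is the generic convex clustering objective only; the exact PCMF and P3CA objectives are not stated in this abstract, and the symbols below (data rows x_i, per-sample centroids u_i, pair weights w_ij, penalty strength lambda, norm index q) are notation assumed here for illustration.

```latex
\min_{U \in \mathbb{R}^{n \times p}} \;
  \frac{1}{2} \sum_{i=1}^{n} \lVert x_i - u_i \rVert_2^2
  \;+\; \lambda \sum_{i<j} w_{ij} \, \lVert u_i - u_j \rVert_q ,
  \qquad \lambda \ge 0, \; w_{ij} \ge 0, \; q \in \{1, 2, \infty\}
```

At lambda = 0 every sample keeps its own centroid (u_i = x_i). As lambda grows, the fusion penalty drives centroids to coincide, and samples whose centroids have fused are read as one cluster. Sweeping lambda therefore traces a path of solutions, which is the source of the hierarchy described in the text.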
Supporting Image: OHBM2024-01.png
   ·Figure 1: PCMF for explainable joint PCA and hierarchical clustering.
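The convex clustering idea described in the Methods can be illustrated with a small numerical sketch. The code below is not the authors' PCMF or P3CA implementation. It solves the plain convex clustering problem of [3] with a generic ADMM splitting, uniform pair weights, and an L2 fusion penalty, over a short sequence of penalty values with warm starts. All names (`convex_clustering_path`, `fused_labels`, `gammas`, `rho`, `fuse_tol`) and the toy data are assumptions made for illustration.

```python
from itertools import combinations

import numpy as np


def convex_clustering_path(X, gammas, rho=1.0, n_iter=2000):
    """Fit centroids U for each penalty value in `gammas` (illustrative sketch).

    Minimizes 0.5 * ||X - U||_F^2 + gamma * sum_{i<j} ||u_i - u_j||_2
    by ADMM on the split D @ U = V, warm-starting each fit from the previous one.
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    pairs = list(combinations(range(n), 2))
    # D maps centroids to pairwise differences: (D @ U)[row] = u_i - u_j.
    D = np.zeros((len(pairs), n))
    for row, (i, j) in enumerate(pairs):
        D[row, i], D[row, j] = 1.0, -1.0
    # The U-update solves (I + rho * D^T D) U = X + rho * D^T (V - Z).
    solve_U = np.linalg.inv(np.eye(n) + rho * D.T @ D)

    U = X.copy()
    V = D @ U               # auxiliary copies of the pairwise differences
    Z = np.zeros_like(V)    # scaled dual variables
    path = []
    for gamma in gammas:
        for _ in range(n_iter):
            U = solve_U @ (X + rho * D.T @ (V - Z))
            R = D @ U + Z
            norms = np.linalg.norm(R, axis=1, keepdims=True)
            # Group soft-thresholding: proximal step for the L2 fusion penalty.
            V = np.maximum(0.0, 1.0 - (gamma / rho) / np.maximum(norms, 1e-12)) * R
            Z = Z + D @ U - V
        path.append(U.copy())
    return path


def fused_labels(U, fuse_tol=1e-2):
    """Label samples whose centroids coincide up to a numerical tolerance."""
    n = U.shape[0]
    labels = np.arange(n)
    for i, j in combinations(range(n), 2):
        if np.linalg.norm(U[i] - U[j]) < fuse_tol:
            labels[labels == labels[j]] = labels[i]  # merge the two groups
    return labels


# Toy data (assumed): two tight groups of three points each.
X_toy = np.array([[0.0, 0.0], [0.3, 0.0], [0.0, 0.3],
                  [10.0, 10.0], [10.3, 10.0], [10.0, 10.3]])
gammas_toy = [0.01, 0.5, 3.0]
for gamma, U in zip(gammas_toy, convex_clustering_path(X_toy, gammas_toy)):
    print(gamma, len(np.unique(fused_labels(U))))
```

Reading the number of distinct centroids along the penalty sequence gives a coarse version of the dendrogram mentioned above. The abstract's split-point estimation based on model fit improvement is not reproduced here.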
 

Results:

Applying P3CA to the ABIDE dataset, we found strong differences in associations of ASD subtype embeddings with behavior and RSFC (Fig. 2), consistent with known autism subpopulation differences in how behaviors relate to connectivity from prefrontal cortex to somatosensory cortex, posterior parietal cortex, and middle temporal gyrus [2]. Subject-level P3CA embedding coefficients were robust to data perturbation (cosine similarity: 0.93 ± 0.05 for U estimates; 0.97 ± 0.03 for V estimates). Our method provides clearer cluster separation and improved interpretability. Applying P3CA to the ABCD dataset, we identified cluster-specific brain-behavior embeddings that described two neurocognitive phenotypes (separating high/low crystallized vs. fluid intelligence individuals), which replicated in a second partition of the ABCD dataset.
Supporting Image: OHBM2024-02.png
   ·Figure 2: Fitting the cluster-aware embedding method to neuroimaging and behavioral datasets reveals ASD and neurocognitive subtypes.
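The perturbation check reported above (hold out 30% of the data, refit, compare coefficients by cosine similarity) can be sketched as follows. This is a simplified stand-in, not the authors' procedure: it uses a single ridge-regularized CCA fitted to all samples rather than P3CA, compares only the first pair of canonical weight vectors, and takes the absolute cosine to ignore sign flips. The function names, the `ridge` value, the number of repeats, and the synthetic data are all assumptions.

```python
import numpy as np


def first_canonical_pair(X, Y, ridge=1e-3):
    """Return the first pair of CCA weight vectors (u, v) for views X and Y."""
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    n = X.shape[0]
    Sxx = Xc.T @ Xc / (n - 1) + ridge * np.eye(X.shape[1])
    Syy = Yc.T @ Yc / (n - 1) + ridge * np.eye(Y.shape[1])
    Sxy = Xc.T @ Yc / (n - 1)
    Lx = np.linalg.cholesky(Sxx)  # Sxx = Lx @ Lx.T
    Ly = np.linalg.cholesky(Syy)
    # Whitened cross-covariance: Lx^{-1} @ Sxy @ Ly^{-T}
    M = np.linalg.solve(Lx, Sxy)
    M = np.linalg.solve(Ly, M.T).T
    A, _, Bt = np.linalg.svd(M, full_matrices=False)
    # Map the leading singular vectors back to the original coordinates.
    u = np.linalg.solve(Lx.T, A[:, 0])
    v = np.linalg.solve(Ly.T, Bt[0])
    return u, v


def abs_cosine(a, b):
    """Cosine similarity up to sign."""
    return abs(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))


def holdout_stability(X, Y, holdout=0.3, n_repeats=20, seed=0):
    """Refit on random (1 - holdout) subsets and compare with the full-data fit."""
    rng = np.random.default_rng(seed)
    u_full, v_full = first_canonical_pair(X, Y)
    n = X.shape[0]
    n_keep = int(round((1.0 - holdout) * n))
    sims_u, sims_v = [], []
    for _ in range(n_repeats):
        keep = rng.choice(n, size=n_keep, replace=False)
        u, v = first_canonical_pair(X[keep], Y[keep])
        sims_u.append(abs_cosine(u_full, u))
        sims_v.append(abs_cosine(v_full, v))
    return np.array(sims_u), np.array(sims_v)


# Synthetic two-view data (assumed): one shared latent variable plus noise.
rng = np.random.default_rng(1)
z = rng.normal(size=300)
X_demo = np.outer(z, np.ones(5)) + 0.5 * rng.normal(size=(300, 5))
Y_demo = np.outer(z, np.ones(4)) + 0.5 * rng.normal(size=(300, 4))
sims_u, sims_v = holdout_stability(X_demo, Y_demo)
print(f"U: {sims_u.mean():.2f} +/- {sims_u.std():.2f}")
print(f"V: {sims_v.mean():.2f} +/- {sims_v.std():.2f}")
```

The summary printed at the end has the same mean ± standard deviation form as the values quoted in the Results, but the numbers it produces describe the synthetic data only.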
 

Conclusions:

Overall, our approach provides a flexible and effective method for clustering and subtype discovery in biomedical datasets with neuroimaging and behavioral data. We have introduced an interpretable joint clustering and embedding strategy using a modular convex clustering penalty that can be applied to multimodal datasets. We showcased our method in two case studies using the ABCD and ABIDE datasets, which revealed robust neurocognitive subtypes in both adolescents and individuals with autism spectrum disorder.

Disorders of the Nervous System:

Neurodevelopmental/Early Life (e.g., ADHD, autism) 2

Modeling and Analysis Methods:

Connectivity (e.g., functional, effective, structural)
Methods Development 1
Multivariate Approaches

Keywords:

Autism
Cognition
Computational Neuroscience
Data analysis
Development
Machine Learning
Modeling
Statistical Methods
Other - neuropsychiatric subtypes

1|2 Indicates the priority used for review

References:

[1] Drysdale, Andrew T. et al. (2017). “Resting-state connectivity biomarkers define neurophysiological subtypes of depression.” Nature Medicine 23 (1): 28–38.
[2] Buch, Amanda M. et al. (2023). “Molecular and Network-Level Mechanisms Explaining Individual Differences in Autism Spectrum Disorder.” Nature Neuroscience 26 (4): 650–63.
[3] Hocking, Toby Dylan et al. (2011). “Clusterpath: An Algorithm for Clustering Using Convex Fusion Penalties.” In Proceedings of the 28th International Conference on Machine Learning, 745–52.
[4] Buch, Amanda M. et al. (2022). “Simple and Scalable Algorithms for Cluster-Aware Precision Medicine.” arXiv [cs.LG]. http://arxiv.org/abs/2211.16553.
[5] Volkow, Nora D. et al. (2018). “The Conception of the ABCD Study: From Substance Use to a Broad NIH Collaboration.” Developmental Cognitive Neuroscience 32: 4–7.
[6] Di Martino, A. et al. (2014). “The Autism Brain Imaging Data Exchange: Towards a Large-Scale Evaluation of the Intrinsic Brain Architecture in Autism.” Molecular Psychiatry 19: 659–667.
[7] Di Martino, A. et al. (2017). “Enhancing Studies of the Connectome in Autism Using the Autism Brain Imaging Data Exchange II.” Scientific Data 4: 170010.
[8] Power, J.D. et al. (2012). “Spurious But Systematic Correlations in Functional Connectivity MRI Networks Arise from Subject Motion.” NeuroImage 59: 2142–2154.
[9] Yan, C.G. et al. (2013). “Standardizing the Intrinsic Brain: Towards Robust Measurement of Inter-Individual Variation in 1000 Functional Connectomes.” NeuroImage 80: 246–262.