Poster No:
1083
Submission Type:
Abstract Submission
Authors:
George Hutchings¹, Thomas Nichols¹, Chris Holmes¹, Habib Ganjgahi¹
Institutions:
¹University of Oxford, Oxford, Oxfordshire
Introduction:
There is growing interest in large-scale epidemiological studies, and dimensionality reduction is an important tool for summarising the structure in these datasets. Datasets such as the UK Biobank and the Novartis-Oxford multiple sclerosis dataset (NO.MS, a longitudinal study of over 34,000 subjects) pose several challenges:
- High dimensionality: Both the number of individuals and the number of imaging variables make these datasets extremely high-dimensional, necessitating efficient and scalable methods.
- Mixed data types: The inclusion of binary (e.g. lesion masks), ordinal, count, and other highly non-Gaussian variables creates difficulties for traditional methods like ICA and PCA, which are best suited to continuous (ideally Gaussian) data.
- Unknown number of latent dimensions: The number of latent variables, K, is unknown, which is a problem for methods that require K to be pre-specified or rely on subjective heuristics to choose it.
We propose an efficient data compression method based on Bayesian factor analysis. It handles high-dimensional data and mixed modalities, infers the number of latent variables in a principled manner, and provides an interpretable compression of the data. We further apply this method to binary segmentation masks to uncover meaningful latent dimensions that capture underlying spatial patterns in the data.
Methods:
Key features of our Bayesian model (Fig 1) include:
- Scalability: inference uses an EM-based approach rather than more traditional MCMC methods, which are computationally prohibitive in high dimensions and can suffer from the label-switching problem.
- Interpretability: a sparsity-inducing spike-and-slab prior on the loading matrix promotes interpretable latent variables.
- Mixed data: binary, ordinal, count and other non-Gaussian variables are handled via a semiparametric Gaussian copula, which provides a principled approach to mixed data.
- Unknown K: the latent dimension K is inferred through an Indian buffet process (IBP) prior, which shrinks unimportant latent variables to 0.
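
The features listed above can be illustrated with a small generative simulation. This is a minimal sketch under stated assumptions, not the authors' implementation: the function name, all parameter names and default values are hypothetical; the IBP is replaced by a finite stick-breaking approximation; the spike is a point mass at zero (the spike-and-slab lasso of Ročková & George uses a continuous spike instead); and the copula margins are reduced to simple thresholds that produce binary variables. It only simulates data and does not perform the EM inference.

```python
import numpy as np

def simulate_sparse_copula_factor_data(n=500, p=40, k_max=10, alpha=2.0,
                                       slab_scale=1.0, seed=0):
    """Simulate binary data from a sparse Gaussian copula factor model.

    All names and default values are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)

    # Finite stick-breaking approximation to an IBP prior: the inclusion
    # probabilities theta_k decay with k, so later factors are mostly empty.
    nu = rng.beta(alpha, 1.0, size=k_max)
    theta = np.cumprod(nu)

    # Spike-and-slab loadings: gamma[j, k] = True draws loading (j, k) from
    # the slab (Laplace); otherwise the loading is exactly zero (the spike).
    gamma = rng.random((p, k_max)) < theta
    loadings = gamma * rng.laplace(0.0, slab_scale, size=(p, k_max))

    # Latent Gaussian layer: z_i = B eta_i + eps_i with unit noise variance.
    eta = rng.standard_normal((n, k_max))
    z = eta @ loadings.T + rng.standard_normal((n, p))

    # Copula margins: each observed variable is a monotone transform of its
    # standardised latent variable. For binary data that is a threshold,
    # chosen here so that each variable has prevalence of roughly 0.2.
    z_std = z / np.sqrt((loadings ** 2).sum(axis=1) + 1.0)
    threshold = 0.8416  # approximately the 80th percentile of N(0, 1)
    x = (z_std > threshold).astype(int)
    return x, loadings, theta
```

Calling `simulate_sparse_copula_factor_data()` returns a binary data matrix, the sparse loading matrix that generated it, and the decaying inclusion probabilities; columns of the loading matrix that are entirely zero correspond to latent variables the prior has switched off.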

·Figure 1.
Results:
We applied our Bayesian factor analysis model to 2,000 baseline lesion segmentation masks from NO.MS to uncover latent dimensions (brain regions) with correlated lesion incidence. Using the IBP prior, the model identified 93 latent dimensions (K=93; Fig. 2a). Although the model is given no spatial information, these latent components exhibit clear localization and encompass the majority of regions affected by lesions. The spike-and-slab prior enforces sparsity in the loading matrix, so the latent dimensions remain parsimonious and interpretable. This is evident in Fig. 2a, where the regions are largely noise-free and clearly localized. In contrast, the first component from a naive PCA followed by varimax rotation (Fig. 2b) is not sparse and is difficult to interpret.
Next, we used these latent dimensions as predictors in a Cox proportional hazards model to evaluate their association with the time to confirmed disability worsening, adjusting for baseline covariates. Of the 93 latent dimensions, 8 were significantly associated with confirmed disability worsening (Fig. 2c). In contrast, the traditional volumetric summary measure, Volume of Lesions on T2-weighted MRI (VOLT2), was not significant in this analysis.
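
For reference, a Cox proportional hazards model of this kind is usually written as below. The notation is assumed here rather than taken from the abstract: h_0(t) is the baseline hazard, \eta_{ik} is subject i's score on latent dimension k, w_i collects the baseline covariates, and \gamma_k and \alpha are the regression coefficients. Whether the latent dimensions were entered jointly (as written) or one at a time is not stated.

```latex
h_i(t) = h_0(t)\,\exp\!\Big( \sum_{k=1}^{K} \gamma_k\,\eta_{ik} + \alpha^{\top} w_i \Big)
```

Under this form, a latent dimension k is reported as significantly associated with time to confirmed disability worsening when the null hypothesis \gamma_k = 0 is rejected.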
Finally, we show the top 5 latent dimensions (brain regions) significantly correlated with clinical covariates (Fig. 2d).

·Figure 2.
Conclusions:
We propose a scalable factor analysis method designed to handle the complexities of large-scale epidemiological studies. The method efficiently compresses high-dimensional, mixed-modality data using an approach inspired by probabilistic PCA, whilst inferring K. Applied to MS lesion segmentation data, it identifies spatially localized and interpretable latent dimensions, and reveals regions that are significantly correlated with several clinical covariates.
Modeling and Analysis Methods:
Bayesian Modeling 1
Methods Development 2
Keywords:
Statistical Methods
Other - Dimensionality Reduction
1|2 Indicates the priority used for review
Please indicate below if your study was a "resting state" or "task-activation" study.
Resting state
Healthy subjects only or patients (note that patient studies may also involve healthy subjects):
Patients
Was this research conducted in the United States?
No
Was any human subjects research approved by the relevant Institutional Review Board or ethics panel?
NOTE: Any human subjects studies without IRB approval will be automatically rejected.
Yes
Was any animal research approved by the relevant IACUC or other animal research panel?
NOTE: Any animal studies without IACUC approval will be automatically rejected.
Not applicable
Please indicate which methods were used in your research:
Structural MRI
Provide references using APA citation style.
Hoff, P. D. (2007). Extending the rank likelihood for semiparametric copula estimation. The Annals of Applied Statistics, 1(1), 265-283.
Murray, J. S., Dunson, D. B., Carin, L., & Lucas, J. E. (2013). Bayesian Gaussian copula factor models for mixed data. Journal of the American Statistical Association, 108(502), 656-665.
Ročková, V., & George, E. I. (2016). Fast Bayesian factor analysis via automatic rotations to sparsity. Journal of the American Statistical Association, 111(516), 1608-1622.
Ročková, V., & George, E. I. (2018). The spike-and-slab lasso. Journal of the American Statistical Association, 113(521), 431-444.