High-Dimensional Inference Controlling for UKB Brain Imaging Confounds

Poster No:

1612 

Submission Type:

Abstract Submission 

Authors:

Lav Radosavljevic¹, Stephen Smith¹, Thomas Nichols¹

Institutions:

¹University of Oxford, Oxford, Oxfordshire

First Author:

Lav Radosavljevic  
University of Oxford
Oxford, Oxfordshire

Co-Author(s):

Stephen Smith  
University of Oxford
Oxford, Oxfordshire
Thomas Nichols, PhD  
University of Oxford
Oxford, Oxfordshire

Introduction:

The UK Biobank (UKB) Brain Imaging cohort contains brain imaging data from around 50,000 subjects and has yielded invaluable insight into the links between imaging and non-imaging variables. When establishing such links, it is crucial to control for confounding factors to avoid detecting spurious associations. In principle this is easy: the confounds are simply added to a linear regression. However, there are over 1000 brain imaging confounds, so analyses of small subsets of subjects (fewer subjects than confounds) can be rank-deficient. In this work, we explore a number of methods that perform inference for linear models where the number of confounding covariates exceeds the number of subjects. We evaluate their performance using real UKB brain imaging confounds together with simulated outcomes and exposures. Based on the results of our simulation study, we propose a candidate method for high-dimensional inference on UKB data and apply it to a real-world task: testing the association between Alzheimer's Disease (AD) and ≈4000 imaging-derived phenotypes (IDPs).
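
The rank-deficiency problem can be seen directly. The sketch below assumes hypothetical sizes of 100 subjects and 1000 random stand-in confounds, not UKB data. It builds the full OLS design (intercept, exposure, all confounds) and checks its rank. The rank can be at most the number of subjects, so the model has more parameters than can be estimated and no error degrees of freedom remain.

```python
import numpy as np

# Hypothetical sizes: 100 subjects, 1000 random stand-in confounds.
rng = np.random.default_rng(0)
n_subjects, n_confounds = 100, 1000
confounds = rng.standard_normal((n_subjects, n_confounds))
exposure = rng.standard_normal((n_subjects, 1))

# Full OLS design: intercept + exposure + every confound.
design = np.hstack([np.ones((n_subjects, 1)), exposure, confounds])

rank = np.linalg.matrix_rank(design)
n_params = design.shape[1]
error_df = n_subjects - n_params

print("rank:", rank, "parameters:", n_params)  # rank cannot exceed n_subjects
print("error DF:", error_df)                    # negative: OLS is not identifiable
```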

Methods:

Our baseline method is Ordinary Least Squares (OLS). To reduce the number of confound variables and allow OLS with smaller samples, we apply Principal Component Analysis (PCA) to the confound data. We then include as many principal components (PCs) as possible while retaining at least 20 error degrees of freedom (eDF) in the OLS model. A similar but less conservative method, "OLS-PCA", uses only the PCs needed to explain 95% of the confound variance (again subject to at least 20 eDF). We consider three additional classes of methods for high-dimensional inference:
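
A minimal NumPy sketch of the two PCA-based OLS variants follows. It assumes the OLS model contains an intercept and a single exposure besides the PCs, so eDF = n − (k + 2) for k PCs. All function names are hypothetical, and the data are simulated stand-ins. `n_pcs_max` corresponds to the baseline (as many PCs as the 20-eDF constraint allows). `n_pcs_var` corresponds to "OLS-PCA" (PCs explaining 95% of confound variance, capped by the same constraint).

```python
import numpy as np

def pca_scores(confounds):
    """PC scores and explained-variance ratios of the column-centred confounds."""
    centred = confounds - confounds.mean(axis=0)
    u, s, _ = np.linalg.svd(centred, full_matrices=False)
    var = s ** 2
    return u * s, var / var.sum()

def n_pcs_max(n_subjects, n_other_params=2, min_edf=20):
    # Assumption: the other columns are an intercept and one exposure.
    return max(n_subjects - n_other_params - min_edf, 0)

def n_pcs_var(explained, n_subjects, threshold=0.95, n_other_params=2, min_edf=20):
    # Smallest k whose cumulative explained variance reaches the threshold, capped by eDF.
    k = int(np.searchsorted(np.cumsum(explained), threshold)) + 1
    return min(k, n_pcs_max(n_subjects, n_other_params, min_edf))

def ols_exposure_t(y, exposure, pcs):
    """OLS t-statistic for the exposure, adjusting for the supplied PCs."""
    n = len(y)
    X = np.column_stack([np.ones(n), exposure, pcs])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    edf = n - X.shape[1]
    sigma2 = resid @ resid / edf
    se = np.sqrt(sigma2 * np.linalg.inv(X.T @ X)[1, 1])
    return beta[1] / se, edf

# Usage on simulated stand-in data (100 subjects, 1000 confounds).
rng = np.random.default_rng(1)
n, p = 100, 1000
C = rng.standard_normal((n, p))
x = C[:, 0] + rng.standard_normal(n)
y = 0.5 * x + C[:, 0] + rng.standard_normal(n)

scores, explained = pca_scores(C)
k = n_pcs_max(n)                   # baseline: 100 - 2 - 20 = 78 PCs
t_stat, edf = ols_exposure_t(y, x, scores[:, :k])
k_var = n_pcs_var(explained, n)    # "OLS-PCA" variant
```

Capping the 95%-variance choice by the same eDF rule keeps the OLS-PCA model from ever having fewer than 20 error degrees of freedom.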

1. Ridge Regression, with p-values obtained either via approximate t-tests [2] or permutation testing [4].

2. Variable selection methods: either naive single selection, where confounds are selected for association with the outcome [6], or double selection, where confounds are selected for association with either the outcome or the exposure [1]. We use LASSO or ElasticNet for variable selection.

3. De-sparsified LASSO approaches [5].
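
To illustrate the difference between single and double selection, here is a minimal NumPy sketch of double selection with a plain LASSO, followed by OLS on the selected confounds. It is not the implementation used in this work. The LASSO is a basic coordinate-descent solver. The penalty `c * sd(target) * sqrt(2 log p / n)` is a hypothetical, deliberately simple choice, standing in for the cross-validated or plug-in penalties used in practice. ElasticNet is not shown. Naive single selection would use only `select(confounds, y)`.

```python
import numpy as np

def lasso_cd(X, y, lam, n_iter=100):
    """Coordinate-descent LASSO: minimise (1/(2n))||y - Xb||^2 + lam * ||b||_1."""
    n, p = X.shape
    b = np.zeros(p)
    r = y.copy()                              # current residual y - Xb
    col_sq = (X ** 2).sum(axis=0) / n
    for _ in range(n_iter):
        for j in range(p):
            r += X[:, j] * b[j]               # remove feature j's contribution
            rho = X[:, j] @ r / n
            b[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]
            r -= X[:, j] * b[j]               # add back the updated contribution
    return b

def select(confounds, target, c=1.1):
    """Indices of confounds with non-zero LASSO coefficients for `target`."""
    Z = (confounds - confounds.mean(axis=0)) / confounds.std(axis=0)
    t = target - target.mean()
    n, p = Z.shape
    lam = c * t.std() * np.sqrt(2 * np.log(p) / n)   # hypothetical penalty choice
    return set(np.flatnonzero(lasso_cd(Z, t, lam)))

def double_selection(y, exposure, confounds):
    """OLS of y on exposure plus confounds selected for EITHER y or the exposure."""
    S = sorted(select(confounds, y) | select(confounds, exposure))
    n = len(y)
    X = np.column_stack([np.ones(n), exposure, confounds[:, S]])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (n - X.shape[1])
    se = np.sqrt(sigma2 * np.linalg.inv(X.T @ X)[1, 1])
    return beta[1], beta[1] / se, S

# Usage on simulated stand-in data: confound 0 drives both exposure and outcome.
rng = np.random.default_rng(42)
n, p = 200, 50
C = rng.standard_normal((n, p))
x = C[:, 0] + rng.standard_normal(n)
y = 0.5 * x + C[:, 0] + rng.standard_normal(n)    # true exposure effect 0.5
beta_hat, t_stat, S = double_selection(y, x, C)
```

Selecting on the exposure as well as the outcome protects against dropping a confounder that is only weakly related to the outcome. Such a confounder still biases the exposure effect if it is strongly related to the exposure.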

Our simulation framework controls the level of sparsity by drawing each confound regression coefficient as a signed power of a zero-mean normal variate, sign(z)|z|^δ, for powers δ = 1, 2, 3, 5, 7, 10, and then standardizing the coefficients; larger δ concentrates the effect in fewer confounds. We also vary the number of subjects and the strength of confounding.

The best-performing method in the simulations is then used to test the association between AD and IDPs, adjusting for confounds. The real-data sample consists of 192 subjects: 64 AD patients and 128 age- and sex-matched controls.
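
The coefficient scheme used in the simulations can be sketched as follows. This is a minimal illustration, assuming that "standardized" means rescaling to unit Euclidean norm (the abstract does not specify the scaling). `top_share` is a hypothetical sparsity index: the fraction of the total squared coefficient mass carried by the largest 5% of coefficients. Reusing the same normal draws across δ shows that raising δ concentrates the mass in fewer confounds.

```python
import numpy as np

def confound_coefficients(z, delta):
    """Signed power of zero-mean normal draws, rescaled to unit norm (assumed scaling)."""
    beta = np.sign(z) * np.abs(z) ** delta
    return beta / np.linalg.norm(beta)

def top_share(beta, frac=0.05):
    """Share of sum(beta**2) carried by the largest `frac` of coefficients."""
    sq = np.sort(beta ** 2)[::-1]
    k = max(1, int(frac * len(sq)))
    return sq[:k].sum() / sq.sum()

deltas = (1, 2, 3, 5, 7, 10)
z = np.random.default_rng(0).standard_normal(1000)   # e.g. 1000 confounds
shares = {d: top_share(confound_coefficients(z, d)) for d in deltas}
for d in deltas:
    print(f"delta={d:2d}  top-5% share of squared mass: {shares[d]:.3f}")
```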

Results:

Our extensive simulations show that one de-sparsified LASSO method, which we refer to as DESLA-MP, controls the False Positive Rate (FPR) at all levels of sparsity and confounding and has considerably higher power than OLS. It was the only method we found that kept the FPR controlled in every setting examined. For the selection-based methods, the FPR inflation is likely caused by failure to select weak confounders. Figure 1 shows FPR and power for an example setting with 100 subjects, medium sparsity and strong confounding; a dashed line in the power plot indicates that the method did not control the FPR. Figure 2 shows the Z-scores from the real-data analysis testing the association between AD and IDPs. DESLA-MP yields a number of significant negative associations between AD and regional tissue volumes, whereas the baseline OLS method has much lower sensitivity. The five IDPs that pass the Bonferroni-corrected significance threshold with DESLA-MP all measure the volume of hippocampal regions, which are severely affected by AD [3].
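
For reference, the Bonferroni threshold on the Z-score scale can be computed with the Python standard library. This sketch assumes exactly 4000 two-sided tests at a familywise α of 0.05 (the abstract reports ≈4000 IDPs, so the exact count may differ slightly). Under these assumptions, an IDP is declared significant when |Z| exceeds the threshold.

```python
from statistics import NormalDist

def bonferroni_z(n_tests, alpha=0.05):
    """Two-sided |Z| threshold for familywise level alpha under Bonferroni correction."""
    per_test_alpha = alpha / n_tests
    return NormalDist().inv_cdf(1 - per_test_alpha / 2)

z_crit = bonferroni_z(4000)   # assumption: exactly 4000 IDPs; approximately 4.37
print(f"|Z| threshold: {z_crit:.2f}")
```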
Supporting Image: comb_plot_n_100_delta_2_rhobeta_05.png
Figure 1: FPR and power of each method at n = 100 subjects, medium sparsity and strong confounding. Of the methods with a controlled FPR, DESLA-MP has the highest power.
Supporting Image: DESLA-MP_192.png
Figure 2: Z-score plots of association tests between AD and IDPs, for OLS and DESLA-MP. DESLA-MP has clearly higher sensitivity for IDPs that are expected to be negatively associated with AD.

Conclusions:

Our simulation study, motivated by the UKB, identified a method (DESLA-MP) that controls the FPR when testing associations with IDPs while adjusting for high-dimensional confounds. We have also shown that this method appears to improve sensitivity for AD-IDP associations. The method is applicable in any setting where a large pool of confounds is considered but sample size or other constraints prevent simply adding all confounds to the regression.

Disorders of the Nervous System:

Neurodegenerative/ Late Life (eg. Parkinson’s, Alzheimer’s)

Modeling and Analysis Methods:

Methods Development 2
Multivariate Approaches 1

Keywords:

Data analysis
Multivariate
Statistical Methods

1|2 Indicates the priority used for review

Abstract Information

By submitting your proposal, you grant permission for the Organization for Human Brain Mapping (OHBM) to distribute your work in any format, including video, audio, print and electronic text through OHBM OnDemand, social media channels, the OHBM website, or other electronic publications and media.

I accept

The Open Science Special Interest Group (OSSIG) is introducing a reproducibility challenge for OHBM 2025. This new initiative aims to enhance the reproducibility of scientific results and foster collaborations between labs. Teams will consist of a “source” party and a “reproducing” party, and will be evaluated on the success of their replication, the openness of the source work, and additional deliverables. Propose your OHBM abstract(s) as source work for future OHBM meetings by selecting one of the following options:

I am submitting this abstract as an original work to be reproduced. I am available to be the “source party” in an upcoming team and consent to have this work listed on the OSSIG website. I agree to be contacted by OSSIG regarding the challenge and may share data used in this abstract with another team.

Please indicate below if your study was a "resting state" or "task-activation” study.

Other

Healthy subjects only or patients (note that patient studies may also involve healthy subjects):

Patients

Was this research conducted in the United States?

No

Were any human subjects research approved by the relevant Institutional Review Board or ethics panel? NOTE: Any human subjects studies without IRB approval will be automatically rejected.

Yes

Were any animal research approved by the relevant IACUC or other animal research panel? NOTE: Any animal studies without IACUC approval will be automatically rejected.

Not applicable

Please indicate which methods were used in your research:

Functional MRI
Structural MRI
Diffusion MRI
Computational modeling

For human MRI, what field strength scanner do you use?

3.0T

Provide references using APA citation style.

[1] Belloni, A. (2014). Inference on treatment effects after selection among high-dimensional controls. Review of Economic Studies, 81(2), 608-650.

[2] Cule, E. (2011). Significance testing in ridge regression for genetic data. BMC Bioinformatics, 12, 1-15.

[3] Halliday, G. (2017). Pathology and hippocampal atrophy in Alzheimer's disease. The Lancet Neurology, 16(11), 862-864.

[4] Hemerik, J. (2021). Permutation testing in high-dimensional linear models: an empirical investigation. Journal of Statistical Computation and Simulation, 91(5), 897-914.

[5] Javanmard, A. (2014). Confidence intervals and hypothesis testing for high-dimensional regression. The Journal of Machine Learning Research, 15(1), 2869-2909.

[6] Zhao, S. (2021). In defense of the indefensible: A very naive approach to high-dimensional inference. Statistical Science, 36(4), 562.

UNESCO Institute of Statistics and World Bank Waiver Form

I attest that I currently live, work, or study in a country on the UNESCO Institute of Statistics and World Bank List of Low and Middle Income Countries list provided.

No