Universal Reproducibility Laws for High-Dimensional Neuroimaging Data

Presented During: Poster Session 3
Friday, June 27, 2025: 01:45 PM - 03:45 PM

Presented During: Poster Session 4
Saturday, June 28, 2025: 01:45 PM - 03:45 PM

Poster No:

1574 

Submission Type:

Abstract Submission 

Authors:

Amanda Buch¹, Logan Grosenick¹, Conor Liston¹

Institutions:

¹Weill Cornell Medicine, Cornell University, New York, NY

First Author:

Amanda Buch, Ph.D.  
Weill Cornell Medicine, Cornell University
New York, NY

Co-Author(s):

Logan Grosenick, Ph.D.  
Weill Cornell Medicine, Cornell University
New York, NY
Conor Liston, M.D., Ph.D.  
Weill Cornell Medicine, Cornell University
New York, NY

Introduction:

Reproducibility is a cornerstone of scientific research, yet it remains a significant challenge when working with high-dimensional datasets typical of neuroimaging studies. Such datasets often feature more variables than observations, a regime known as the Large Dimensional Limit (LDL). While LDL is central to fields ranging from neuroimaging to genomics, its impact on reproducibility and statistical power is poorly understood. This knowledge gap has led to controversies in high-profile brain-wide association studies and neuroimaging genomic studies [1], highlighting the urgent need for a theoretical framework.

Methods:

We demonstrate, empirically and theoretically, that reproducibility in LDL problems adheres to "Universal Reproducibility Laws" (URLs), which quantify the sample size required to ensure robust findings for a given signal strength and dimensionality. Building on Random Matrix Theory (RMT) [2–4], our framework identifies critical "reproducibility phase transitions," providing tools for prospective sample size determination and power analyses (Fig. 1). We validate these URLs using neuroimaging and behavioral data from human studies (Figs. 1–2; ABCD functional MRI, CBCL, and NIH Toolbox datasets) and animal studies (not shown), leveraging robust cross-validation and regularization techniques across 13.876 million model fits.
Supporting Image: Buch_OHBM-2025_Fig1-2-01.png
   ·Estimating robust effect sizes and separating signal from noise using URLs.
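
To make the idea of a phase transition concrete, the sketch below uses a textbook RMT result for the single-spike covariance model (the setting reviewed in [2]), not the URLs themselves. It assumes unit-variance white noise, one planted component with population eigenvalue ell > 1, and an aspect ratio gamma = p / n; the function names are illustrative and are not taken from the authors' open-source tools.

```python
import math


def detection_threshold(gamma):
    """Smallest population eigenvalue that separates from the noise bulk.

    gamma is the aspect ratio p / n (variables per observation).
    """
    return 1.0 + math.sqrt(gamma)


def eigenvector_overlap(ell, gamma):
    """Asymptotic squared cosine between the top sample eigenvector and the
    planted direction in the single-spike model (zero below the threshold)."""
    if ell <= detection_threshold(gamma):
        return 0.0
    return (1.0 - gamma / (ell - 1.0) ** 2) / (1.0 + gamma / (ell - 1.0))


def min_samples_for_detection(ell, p):
    """Smallest n for which a spike of strength ell exceeds the threshold,
    i.e. the smallest integer n with ell > 1 + sqrt(p / n)."""
    if ell <= 1.0:
        raise ValueError("a spike must have ell > 1 to be detectable at any n")
    # Start from the closed-form bound n > p / (ell - 1)^2, then step up to
    # guard against floating-point rounding at the boundary.
    n = max(1, math.floor(p / (ell - 1.0) ** 2))
    while ell <= detection_threshold(p / n):
        n += 1
    return n


if __name__ == "__main__":
    p = 1000  # hypothetical number of features
    for ell in (1.5, 2.0, 4.0):
        n_min = min_samples_for_detection(ell, p)
        print(f"ell={ell}: need n >= {n_min} for p={p}")
```

In this toy model the required sample size scales as p / (ell - 1)^2, so stronger signals can be recovered with far fewer observations than variables. How the URLs extend this to cross-validated, regularized models is specified by the authors' framework, not by this sketch.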
 

Results:

In brain-wide association studies (BWAS) relating functional MRI to the CBCL and NIH Toolbox scales in the ABCD dataset, we find that reproducible associations require far fewer samples than previously suggested (Fig. 2). For cognitive abilities, robust reproducibility (p < 0.01) is achieved with as few as 22 subjects using Support Vector Regression (SVR) or 16 using Regularized Canonical Correlation Analysis (RCCA). For psychopathology measures, 550 (SVR) and 230 (RCCA) subjects suffice, with diminishing returns beyond a few hundred samples. This "reproducibility phase transition" aligns with RMT predictions: reproducibility rises rapidly at first, then improves only logarithmically as sample size increases.
Supporting Image: Buch_OHBM-2025_Fig1-2-02.png
   ·BWAS studies show reproducibility phase transitions with smaller sample sizes than previously reported.
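
The qualitative shape described above (a sharp gain in recoverability, then a plateau) can be reproduced in a small simulation. The sketch below is a toy illustration only: it uses synthetic Gaussian data with one planted component rather than ABCD data, and a plain power iteration rather than SVR or RCCA. The function name, the planted-signal strength `ell`, and the sizes in the demo are all assumptions chosen for speed.

```python
import math
import random


def simulate_overlap(n, p, ell, n_iter=60, seed=0):
    """Squared overlap between the leading sample component and a planted
    direction, for n synthetic observations of p variables.

    The planted direction is the first coordinate axis, with variance ell;
    all other coordinates are unit-variance noise.
    """
    rng = random.Random(seed)
    scale = math.sqrt(ell)
    X = []
    for _ in range(n):
        row = [rng.gauss(0.0, 1.0) for _ in range(p)]
        row[0] *= scale  # inflate variance along the planted direction
        X.append(row)

    # Power iteration on X^T X without forming the p x p matrix.
    v = [rng.gauss(0.0, 1.0) for _ in range(p)]
    for _ in range(n_iter):
        proj = [sum(x * y for x, y in zip(row, v)) for row in X]
        w = [0.0] * p
        for row, c in zip(X, proj):
            for j, x in enumerate(row):
                w[j] += c * x
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]

    # v has unit length, so v[0]^2 is the squared cosine with the planted axis.
    return v[0] ** 2


if __name__ == "__main__":
    p, ell = 40, 2.0  # hypothetical dimensionality and signal strength
    for n in (20, 80, 320):
        print(n, round(simulate_overlap(n, p, ell), 3))
```

For these toy settings, the single-spike theory places the transition near n = p / (ell - 1)^2 = 40, so n = 20 falls below it and n = 80 and n = 320 fall above it. Exact values vary with the seed, and the sample sizes reported in the Results come from the authors' analyses, not from this simulation.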
 

Conclusions:

Our findings challenge the prevailing notion that thousands of participants are necessary for BWAS reproducibility, offering a more attainable path forward for studies in precision psychiatry and neuroscience. By mathematically linking multivariate neuroimaging analyses to statistical physics, we present a unified framework for reproducible machine learning in high-dimensional biomedical datasets. Open-source tools accompanying this work empower researchers to design efficient, reproducible studies across disciplines, mitigating barriers posed by recruitment and cost.

Modeling and Analysis Methods:

Methods Development 1
Multivariate Approaches 2

Keywords:

Computational Neuroscience
Data analysis
Experimental Design
FUNCTIONAL MRI
Machine Learning
Psychiatric Disorders
Statistical Methods
Other - Brain-behavior

1|2 Indicates the priority used for review

Abstract Information

Please indicate below if your study was a "resting state" or "task-activation" study.

Resting state

Healthy subjects only or patients (note that patient studies may also involve healthy subjects):

Healthy subjects

Was this research conducted in the United States?

Yes

Are you Institutional Review Board (IRB) certified? Please note: Failure to have IRB certification, if applicable, will lead to automatic rejection of the abstract.

Not applicable

Was any human subjects research approved by the relevant Institutional Review Board or ethics panel? NOTE: Any human subjects studies without IRB approval will be automatically rejected.

Yes

Was any animal research approved by the relevant IACUC or other animal research panel? NOTE: Any animal studies without IACUC approval will be automatically rejected.

Not applicable

Please indicate which methods were used in your research:

Functional MRI
Structural MRI
Behavior
Neuropsychological testing
Computational modeling

For human MRI, what field strength scanner do you use?

3.0T

Which processing packages did you use for your study?

AFNI
FSL

Provide references using APA citation style.

[1] Marek, S., et al. (2022). Reproducible brain-wide association studies require thousands of individuals. Nature, 603, 654–660.
[2] Johnstone, I. M., & Paul, D. (2018). PCA in High Dimensions: An orientation. Proc. IEEE Inst. Electr. Electron. Eng., 106, 1277–1292.
[3] Aparicio, L., et al. (2020). A Random Matrix Theory Approach to Denoise Single-Cell Data. Patterns (N Y), 1, 100035.
[4] Couillet, R., & Liao, Z. (2022). Random Matrix Methods for Machine Learning. Cambridge University Press.