Poster No:
1574
Submission Type:
Abstract Submission
Authors:
Amanda Buch1, Logan Grosenick1, Conor Liston1
Institutions:
1Weill Cornell Medicine, Cornell University, New York, NY
Introduction:
Reproducibility is a cornerstone of scientific research, yet it remains a significant challenge when working with high-dimensional datasets typical of neuroimaging studies. Such datasets often feature more variables than observations, a regime known as the Large Dimensional Limit (LDL). While LDL is central to fields ranging from neuroimaging to genomics, its impact on reproducibility and statistical power is poorly understood. This knowledge gap has led to controversies in high-profile brain-wide association studies and neuroimaging genomic studies [1], highlighting the urgent need for a theoretical framework.
Methods:
We demonstrate, empirically and theoretically, that reproducibility in LDL problems adheres to "Universal Reproducibility Laws" (URLs), which quantify the sample size required to ensure robust findings for a given signal strength and dimensionality. Building on Random Matrix Theory (RMT) [2–4], our framework identifies critical "reproducibility phase transitions," providing tools for prospective sample size determination and power analyses (Fig. 1). We validate these URLs using neuroimaging and behavioral data from human studies (Figs. 1–2; ABCD functional MRI, CBCL, and NIH Toolbox datasets) and animal studies (not shown), applying cross-validation and regularization across 13.876 million model fits.

Fig. 1: Estimating robust effect sizes and separating signal from noise using URLs.
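
To illustrate what a phase transition of this kind looks like, the sketch below uses the textbook rank-one spiked covariance model from the RMT literature reviewed in [2]. It is not the URL framework itself. It assumes p standard-normal features, one of which has variance 1 + theta, and n observations; the function names and the parameter theta are hypothetical and chosen for this example only. In that model the leading sample eigenvector carries no information about the true direction until theta exceeds sqrt(p/n), which gives a minimum sample size of roughly p / theta^2.

```python
import numpy as np


def predicted_overlap(theta, p, n):
    """Asymptotic squared cosine between the leading sample eigenvector and
    the true spike direction (rank-one spiked covariance, spike = 1 + theta)."""
    gamma = p / n  # aspect ratio: variables per observation
    if theta <= np.sqrt(gamma):
        return 0.0  # below the transition: the estimate is pure noise
    return (1.0 - gamma / theta**2) / (1.0 + gamma / theta)


def critical_sample_size(theta, p):
    """Smallest integer n satisfying theta > sqrt(p / n)."""
    return int(np.floor(p / theta**2)) + 1


def simulated_overlap(theta, p, n, rng):
    """Monte Carlo check on synthetic Gaussian data (assumption: the spike
    lies along the first coordinate axis)."""
    X = rng.standard_normal((n, p))
    X[:, 0] *= np.sqrt(1.0 + theta)  # inflate variance along the spike
    # Right singular vectors of X are eigenvectors of the sample covariance.
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    return float(vt[0, 0] ** 2)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    p, theta = 200, 2.0
    print("critical n:", critical_sample_size(theta, p))
    for n in (25, 50, 100, 400, 800):
        print(n, predicted_overlap(theta, p, n),
              simulated_overlap(theta, p, n, rng))
```

Running the loop shows the overlap staying near zero below the critical sample size and rising quickly above it, which is the qualitative behavior the term "phase transition" refers to here.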
Results:
In brain-wide association studies (BWAS) relating functional MRI to the CBCL and NIH Toolbox scales in the ABCD dataset, we find that reproducible associations require far fewer samples than previously suggested (Fig. 2). For cognitive abilities, robust reproducibility (p < 0.01) is achieved with as few as 22 subjects using Support Vector Regression (SVR) or 16 using Regularized Canonical Correlation Analysis (RCCA). For psychopathology measures, 550 (SVR) and 230 (RCCA) subjects suffice, with diminishing returns beyond a few hundred samples. This "reproducibility phase transition" aligns with RMT predictions: reproducibility rises rapidly at first and then improves only logarithmically as sample size increases.

Fig. 2: BWAS show reproducibility phase transitions at smaller sample sizes than previously reported.
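
The shape of such a curve (rapid early gains, then diminishing returns) can be reproduced on synthetic data. The sketch below is a minimal illustration under stated assumptions: it substitutes closed-form ridge regression for the SVR and RCCA models used in this work, it uses simulated Gaussian features rather than ABCD data, and every name and default value (`signal`, `alpha`, `p`, `n_test`) is hypothetical. It will not reproduce the subject counts reported above.

```python
import numpy as np


def ridge_fit(X, y, alpha):
    """Closed-form ridge regression weights (stand-in for SVR/RCCA)."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(p), X.T @ y)


def out_of_sample_r(n_train, p=200, signal=1.0, alpha=10.0,
                    n_test=2000, seed=0):
    """Correlation between predicted and observed outcomes in held-out data."""
    rng = np.random.default_rng(seed)
    w = rng.standard_normal(p)
    w /= np.linalg.norm(w)  # unit-norm true weight vector

    def draw(n):
        X = rng.standard_normal((n, p))
        y = signal * (X @ w) + rng.standard_normal(n)  # unit-variance noise
        return X, y

    X_train, y_train = draw(n_train)
    X_test, y_test = draw(n_test)
    w_hat = ridge_fit(X_train, y_train, alpha)
    return float(np.corrcoef(X_test @ w_hat, y_test)[0, 1])


def learning_curve(sizes, n_rep=20, **kwargs):
    """Mean held-out correlation per training size, averaged over n_rep seeds."""
    return {
        n: float(np.mean([out_of_sample_r(n, seed=s, **kwargs)
                          for s in range(n_rep)]))
        for n in sizes
    }


if __name__ == "__main__":
    for n, r in learning_curve([20, 50, 100, 200, 500, 2000]).items():
        print(f"n_train={n:5d}  mean held-out r={r:.3f}")
```

Held-out correlation is used as the reproducibility measure because it scores each fitted model on data it has not seen, in the same spirit as the cross-validation described in the Methods.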
Conclusions:
Our findings challenge the prevailing notion that thousands of participants are necessary for BWAS reproducibility, offering a more attainable path forward for studies in precision psychiatry and neuroscience. By mathematically linking multivariate neuroimaging analyses to statistical physics, we present a unified framework for reproducible machine learning in high-dimensional biomedical datasets. Open-source tools accompanying this work empower researchers to design efficient, reproducible studies across disciplines, mitigating barriers posed by recruitment and cost.
Modeling and Analysis Methods:
Methods Development 1
Multivariate Approaches 2
Keywords:
Computational Neuroscience
Data analysis
Experimental Design
FUNCTIONAL MRI
Machine Learning
Psychiatric Disorders
Statistical Methods
Other - Brain-behavior
1|2 indicates the priority used for review
Please indicate below if your study was a "resting state" or "task-activation" study.
Resting state
Healthy subjects only or patients (note that patient studies may also involve healthy subjects):
Healthy subjects
Was this research conducted in the United States?
Yes
Are you Internal Review Board (IRB) certified?
Please note: Failure to have IRB approval, if applicable, will lead to automatic rejection of the abstract.
Not applicable
Was any human subjects research approved by the relevant Institutional Review Board or ethics panel?
NOTE: Any human subjects studies without IRB approval will be automatically rejected.
Yes
Was any animal research approved by the relevant IACUC or other animal research panel?
NOTE: Any animal studies without IACUC approval will be automatically rejected.
Not applicable
Please indicate which methods were used in your research:
Functional MRI
Structural MRI
Behavior
Neuropsychological testing
Computational modeling
For human MRI, what field strength scanner do you use?
3.0T
Which processing packages did you use for your study?
AFNI
FSL
References:
[1] Marek, S., Tervo-Clemmens, B., et al. (2022). Reproducible brain-wide association studies require thousands of individuals. Nature, 603, 654–660.
[2] Johnstone, I. M., & Paul, D. (2018). PCA in high dimensions: An orientation. Proceedings of the IEEE, 106, 1277–1292.
[3] Aparicio, L., et al. (2020). A random matrix theory approach to denoise single-cell data. Patterns, 1, 100035.
[4] Couillet, R., & Liao, Z. (2022). Random Matrix Methods for Machine Learning. Cambridge University Press.