Robust Group PCA for Extreme Noise: Subject-Level PCA Done Right

Poster No:

1359 

Submission Type:

Abstract Submission 

Authors:

Samuel Oriola1, Calvin McCurdy1, Bradley Baker2, Vince Calhoun3, Rogers Silva4

Institutions:

1Georgia State University, Atlanta, GA, 2TReNDS, Atlanta, GA, 3GSU/GATech/Emory, Atlanta, GA, 4TReNDS Center, Atlanta, GA

First Author:

Samuel Oriola  
Georgia State University
Atlanta, GA

Co-Author(s):

Calvin McCurdy  
Georgia State University
Atlanta, GA
Bradley Baker  
TReNDS
Atlanta, GA
Vince Calhoun  
GSU/GATech/Emory
Atlanta, GA
Rogers Silva  
TReNDS Center
Atlanta, GA

Introduction:

Functional magnetic resonance imaging (fMRI) captures neural function with high spatial resolution and has driven discoveries in brain connectivity, such as default mode network interactions (Allen, 2011) and dynamic dysconnectivity in schizophrenia (Damaraju, 2014). As fMRI datasets grow, the loss of subject-specific detail in group summaries can bias results, misrepresent individuals, limit replication, and exclude minorities. Group-level analysis is challenging because brains, like faces, differ, and spatial warping is imperfect. Group principal component analysis (GPCA) aggregates individual data, but implementations vary. Tools like FSL (FMRIB, 2024) and GIFT (MIALAB, 2024) support group PCA and have shaped neuroimaging studies for decades. Yet the impact of subject variability on group results has not been fully explored. This work aims to identify strategies that improve group-level representations for robust analyses. We analyze three GPCA methods (Fig. 1.a): 1) simple concatenation (common with FSL), 2) concatenation with variance normalization, and 3) concatenation with PCA whitening (common with GIFT). We test these methods in simulated scenarios to identify the approaches that best reduce group dimensionality while preserving the ground-truth group mean information.
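As a rough illustration of the three concatenation strategies listed above, the following NumPy sketch may help. It is not the FSL or GIFT implementation. The function names, the `mode` labels, and the reading of "variance normalization" as scaling each subject to unit total variance are assumptions made for this example only.

```python
import numpy as np

def n_components_for(s, threshold):
    """Smallest number of PCs whose cumulative variance fraction reaches `threshold`."""
    frac = np.cumsum(s**2) / np.sum(s**2)
    r = int(np.searchsorted(frac, threshold - 1e-12)) + 1
    return min(r, len(s))

def group_pca(subjects, k, mode="concat", threshold=1.0):
    """Top-k group PCs (k x voxels) from a list of (components x voxels) arrays.

    mode="concat":    stack subject data as-is.
    mode="normalize": ASSUMPTION: scale each subject to unit total variance first.
    mode="whiten":    subject-level PCA with whitening, keeping the PCs that
                      retain `threshold` of the variance (1.0 = no reduction).
    """
    blocks = []
    for X in subjects:
        X = X - X.mean(axis=1, keepdims=True)    # center each row across voxels
        if mode == "concat":
            blocks.append(X)
        elif mode == "normalize":
            blocks.append(X / np.linalg.norm(X))  # Frobenius norm = total variance
        elif mode == "whiten":
            _, s, Vt = np.linalg.svd(X, full_matrices=False)
            # Rows of Vt are the whitened PCs (equal variance, up to a constant).
            blocks.append(Vt[:n_components_for(s, threshold)])
        else:
            raise ValueError(f"unknown mode: {mode}")
    # Group-level PCA on the stacked (concatenated) subject blocks.
    _, _, Vt = np.linalg.svd(np.vstack(blocks), full_matrices=False)
    return Vt[:k]
```

In this sketch, a single subject with very high-variance noise can dominate the leading group component under plain concatenation, while the normalized and whitened variants give every subject a comparable weight.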

Methods:

Group PCA extends single-subject PCA to combine spatial maps across individuals, capturing the most prominent features of the group data. To evaluate GPCA performance, we simulated ground-truth spatial maps with 22,341 voxels, 100 subjects, and 50 principal components (PCs) per subject serving as sources (Fig. 1.b). Each subject's data combines 40 components representing signal sources and 10 representing noise. The similarity of signal sources across subjects decreases systematically from high in the first source to very low in the fortieth. We manipulated three key parameters. The first, the proportion of total variance assigned to signal sources (pS), is the ratio of signal variance to total variance in the data; varying it models real-world scenarios in which the balance of signal and noise is unknown. The second, the variance profile (vm), determines how subject contributions are weighted at the group level. The third, the threshold on variance retained after subject-level PCA with whitening, sets the number of subject-specific PCs passed to the group analysis. Together, these parameters simulate diverse conditions for evaluating how well GPCA retains subject-specific variance. By comparing the variance retained by each GPCA approach with that of the ground-truth mean sources, we quantify subject-level variance loss and assess group-level source accuracy. This pragmatic evaluation informs improvements in GPCA methodology.
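A simulation along these lines could be sketched as follows. This is a minimal sketch under stated assumptions, not the generator used in the study. The linear similarity profile (`rho_max` to `rho_min`), the Gaussian sources, the equal split of variance within the signal and noise sets, and all function and parameter names are hypothetical. The variance profile (vm) is not modeled here. Defaults are kept small; the abstract's setting of 100 subjects and 22,341 voxels would need roughly 1 GB in double precision.

```python
import numpy as np

def simulate_subjects(n_subjects=10, n_voxels=2000, n_signal=40, n_noise=10,
                      p_signal=0.8, rho_max=0.95, rho_min=0.05, seed=0):
    """Return (subjects, mean_sources, rho).

    subjects:     list of (n_signal + n_noise) x n_voxels arrays
    mean_sources: ground-truth mean of the signal sources across subjects
    rho:          ASSUMED between-subject similarity of each signal source
    """
    rng = np.random.default_rng(seed)
    # ASSUMPTION: similarity falls linearly from the first to the last source.
    rho = np.linspace(rho_max, rho_min, n_signal)
    shared = rng.standard_normal((n_signal, n_voxels))
    # Scale rows so signal sources carry a fraction p_signal of total variance.
    sig_scale = np.sqrt(p_signal / n_signal)
    noise_scale = np.sqrt((1.0 - p_signal) / n_noise)
    subjects = []
    for _ in range(n_subjects):
        unique = rng.standard_normal((n_signal, n_voxels))
        # Mix shared and subject-specific parts; each row keeps unit variance.
        signal = np.sqrt(rho)[:, None] * shared + np.sqrt(1.0 - rho)[:, None] * unique
        noise = rng.standard_normal((n_noise, n_voxels))
        subjects.append(np.vstack([sig_scale * signal, noise_scale * noise]))
    mean_sources = np.mean([X[:n_signal] for X in subjects], axis=0)
    return subjects, mean_sources, rho
```

Lowering `p_signal` in this sketch moves the simulated data toward the extreme-noise regime discussed in the abstract.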
Supporting Image: OHBM2024_Fig1.png (Methodology)
 

Results:

Fig. 2.a shows that the ground-truth mean captures variability well for sources that are highly similar across subjects (dark green) but poorly for dissimilar sources (light green). Errors are largest for the most dissimilar subjects at the "edges" of the plot, revealing that even the true group mean can bias inferences about atypical subjects. Fig. 2.b shows that GPCA with variance normalization matches ground-truth mean performance, except under high noise. GPCA with subject-level whitening underperforms for variance-retention thresholds ≤0.9. Surprisingly, subject-level whitening without data reduction matches the ground-truth mean and is very robust to noise, though this approach is atypical in practice. These findings suggest that default GPCA toolbox settings may need updating.
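One way to express the comparison between a GPCA subspace and the ground-truth mean sources is the fraction of each mean source's variance that falls inside the subspace. The sketch below assumes that metric; the abstract does not state the exact measure used, and the function name is hypothetical. Inputs are assumed to be centered already, and `basis` is assumed to have orthonormal rows, as returned by an SVD-based PCA.

```python
import numpy as np

def captured_variance(sources, basis):
    """Per-source fraction of variance captured by an orthonormal row basis.

    sources: (n_sources x n_voxels) array, e.g. ground-truth mean sources
    basis:   (k x n_voxels) array with orthonormal rows, e.g. group PCs
    """
    coords = sources @ basis.T               # coordinates within the subspace
    captured = np.sum(coords**2, axis=1)     # energy inside the subspace
    total = np.sum(sources**2, axis=1)       # total energy of each source
    return captured / total
```

A value near 1 means the group subspace retains that source almost entirely; a value near 0 means the source is lost at the group level.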
Supporting Image: OHBM2024_Fig2.png (Results)
 

Conclusions:

Variance normalization matches ground-truth mean performance in most cases. Surprisingly, subject-level PCA with whitening and no data reduction performs equally well, even with extreme noise or as few as two subjects (not shown), effectively denoising the data. This suggests a new guideline for neuroimaging analyses: whiten at the subject level without discarding components before group PCA. Future work will explore the implications for published studies and back-reconstruction techniques, as well as the effect of spatial dependence and larger datasets on GPCA performance.

Modeling and Analysis Methods:

Exploratory Modeling and Artifact Removal 1
Methods Development
Multivariate Approaches
Task-Independent and Resting-State Analysis 2
Other Methods

Keywords:

Computing
Data analysis
FUNCTIONAL MRI
Machine Learning
Modeling
Multivariate
Statistical Methods
Other - ICA; Group PCA

1|2 Indicates the priority used for review

Please indicate below if your study was a "resting state" or "task-activation" study.

Resting state

Healthy subjects only or patients (note that patient studies may also involve healthy subjects):

Healthy subjects

Was this research conducted in the United States?

Yes

Are you Institutional Review Board (IRB) certified? Please note: Failure to have IRB certification, if applicable, will lead to automatic rejection of the abstract.

Not applicable

Was any human subjects research approved by the relevant Institutional Review Board or ethics panel? NOTE: Any human subjects studies without IRB approval will be automatically rejected.

Not applicable

Was any animal research approved by the relevant IACUC or other animal research panel? NOTE: Any animal studies without IACUC approval will be automatically rejected.

Not applicable

Please indicate which methods were used in your research:

Functional MRI

Which processing packages did you use for your study?

FSL
Other, Please list  -   GIFT

Provide references using APA citation style.

1) Allen, E. A. (2011). A baseline for the multivariate comparison of resting-state networks. Frontiers in Systems Neuroscience, 5, 2. https://www.frontiersin.org/journals/systems-neuroscience/articles/10.3389/fnsys.2011.00002/full
2) Damaraju, E. (2014). Dynamic functional connectivity analysis reveals transient states of dysconnectivity in schizophrenia. NeuroImage: Clinical, 5, 298–308. https://doi.org/10.1016/j.nicl.2014.07.003
3) FMRIB. (2024). FMRIB Software Library (FSL). https://www.fmrib.ox.ac.uk/fsl
4) MIALAB. (2024). Group ICA of fMRI Toolbox (GIFT). http://trendscenter.org/trends/software/gift
