Big Data, Small Bias: Harmonizing Structural Connectomes to Mitigate Site Bias in Data Integration

Poster No:

3274 

Submission Type:

Abstract Submission 

Authors:

Rui Shen¹, Drew Parker¹, Andrew Chen², Benjamin Yerys³, Birkan Tunç³, Timothy Roberts³, Russell Shinohara¹, Ragini Verma¹

Institutions:

¹University of Pennsylvania, Philadelphia, PA, ²Medical University of South Carolina, Charleston, SC, ³Children’s Hospital of Philadelphia, Philadelphia, PA

First Author:

Rui Shen  
University of Pennsylvania
Philadelphia, PA

Co-Author(s):

Drew Parker  
University of Pennsylvania
Philadelphia, PA
Andrew Chen  
Medical University of South Carolina
Charleston, SC
Benjamin Yerys  
Children’s Hospital of Philadelphia
Philadelphia, PA
Birkan Tunç  
Children’s Hospital of Philadelphia
Philadelphia, PA
Timothy Roberts  
Children’s Hospital of Philadelphia
Philadelphia, PA
Russell Shinohara  
University of Pennsylvania
Philadelphia, PA
Ragini Verma  
University of Pennsylvania
Philadelphia, PA

Introduction:

Structural connectomes are commonly used to investigate connectivity changes related to various disorders. However, small sample sizes in individual studies and highly heterogeneous disorder-related manifestations underscore the need to pool datasets across multiple studies to identify coherent and generalizable patterns linked to disorders. Combining datasets, however, introduces site bias due to variations in scanner hardware or acquisition protocols. Data harmonization is therefore needed to mitigate site bias while preserving the biological variability associated with participant demographics and disorders. While several paradigms exist for harmonizing normally distributed imaging data, this study represents the first effort to establish a harmonization framework specifically for structural connectomes.

Methods:

Common harmonization methods such as ComBat and CovBat assume normally distributed data and are therefore unsuitable for structural connectomes, where most edges (defined by streamline counts) are highly skewed. We explored several statistical models to develop a framework tailored to structural connectomes. We pooled structural connectomes from 6 datasets and created 4 data configurations by combining the cohorts in various ways. A total of 1503 participants (890 males, 613 females) were included, comprising 1194 neurotypical (NT) participants and 309 participants with autism. Each dataset is detailed in Fig. 1A.
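The skewness issue can be illustrated with a minimal sketch. The data below are synthetic gamma draws standing in for one edge's streamline counts (an assumption, not study data), and `sample_skewness` is a hypothetical helper name. The sketch compares the skewness of the raw values with that of their logarithm.

```python
import numpy as np

def sample_skewness(x):
    """Return the (biased) sample skewness, i.e. the third standardized moment."""
    x = np.asarray(x, dtype=float)
    centered = x - x.mean()
    return float(np.mean(centered**3) / np.mean(centered**2) ** 1.5)

# Synthetic stand-in for one edge's weights across participants (assumption).
rng = np.random.default_rng(0)
edge_values = rng.gamma(shape=1.5, scale=200.0, size=2000)

skew_raw = sample_skewness(edge_values)          # right-skewed raw weights
skew_log = sample_skewness(np.log(edge_values))  # skewness after a log transform

print(f"skewness raw: {skew_raw:.2f}, after log: {skew_log:.2f}")
```

In this synthetic case the log transform reduces the magnitude of the skewness but does not remove it. Real edges may also contain zeros, which would need separate handling before taking logarithms.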

First, we applied a logarithmic transformation to skewed edges before ComBat/CovBat and restored the harmonized weights by exponentiation; we refer to these variants as log-ComBat and log-CovBat. Alternatively, we modeled edge values with a gamma-distributed generalized linear model (gamma-GLM) with a log link, including site, sex, age, and age^2 as covariates.
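A minimal sketch of the gamma-GLM idea for a single edge follows. It is not the authors' implementation. The data are synthetic, the function name `fit_gamma_glm_log_link` is hypothetical, and the harmonization step (dividing each value by the exponentiated site coefficient so that all sites map to a reference site) is an assumption about how the fitted site term might be removed. The fit uses iteratively reweighted least squares, which for a gamma family with log link reduces to repeated ordinary least squares on a working response.

```python
import numpy as np

def fit_gamma_glm_log_link(X, y, n_iter=50, tol=1e-10):
    """Fit a gamma GLM with log link by IRLS and return the coefficients.

    Column 0 of X must be the intercept. For gamma + log link the IRLS
    weights are constant, so each step is a plain least-squares fit.
    """
    beta = np.zeros(X.shape[1])
    beta[0] = np.log(y.mean())               # intercept-only starting point
    for _ in range(n_iter):
        eta = X @ beta
        mu = np.exp(eta)
        z = eta + (y - mu) / mu               # working response
        new_beta, *_ = np.linalg.lstsq(X, z, rcond=None)
        converged = np.max(np.abs(new_beta - beta)) < tol
        beta = new_beta
        if converged:
            break
    return beta

# Synthetic single-edge data with a multiplicative site effect (assumption).
rng = np.random.default_rng(1)
n = 900
site = rng.integers(0, 3, size=n)
sex = rng.integers(0, 2, size=n).astype(float)
age = rng.uniform(6, 40, size=n)
age_z = (age - age.mean()) / age.std()        # standardized for stability
true_site_shift = np.array([0.0, 0.4, -0.3])
log_mu = 5.0 + true_site_shift[site] + 0.1 * sex + 0.2 * age_z - 0.05 * age_z**2
shape = 4.0
y = rng.gamma(shape, np.exp(log_mu) / shape)

# Design matrix: intercept, site dummies (site 0 is the reference),
# sex, age, and age^2.
site_dummies = (site[:, None] == np.arange(1, 3)[None, :]).astype(float)
X = np.column_stack([np.ones(n), site_dummies, sex, age_z, age_z**2])

beta = fit_gamma_glm_log_link(X, y)

# Assumed harmonization step: divide out the estimated site factor.
site_coef = np.concatenate([[0.0], beta[1:3]])
y_harmonized = y / np.exp(site_coef[site])

# Refitting on harmonized values should give site coefficients near zero.
refit = fit_gamma_glm_log_link(X, y_harmonized)
print("site coefficients before:", beta[1:3], "after:", refit[1:3])
```

Because the site term is multiplicative under a log link, the covariate coefficients for sex and age are left unchanged by this division, which is one way of removing site effects while keeping modeled biological variation.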

Fig. 1B gives an overview of the evaluation framework. We considered harmonization successful if it removed site effects at the edge, node, and global levels while preserving biological variability. We used the Kruskal-Wallis test to identify edgewise site effects before and after harmonization. A one-way ANOVA was used to evaluate site effects in 6 global graph measures and 4 nodal measures. Sex, age, and age^2 were controlled for. Moreover, we assessed the replicability of Spearman correlations between age and edge values before and after harmonization. We also evaluated performance in the presence of substantial confounding between age and site. Finally, we assessed the ability of harmonization to enhance the generalizability of machine learning models to new sites and to increase statistical power for detecting group differences between autism and NT.
Supporting Image: Fig1.jpg
   ·Overview of the harmonization and evaluation framework for structural connectomes
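The edgewise site-effect test can be sketched as follows for one edge. This is an illustration on synthetic data, not the study pipeline. The helper names `average_ranks` and `kruskal_wallis_h` are hypothetical, and the closed-form p-value `exp(-H/2)` is valid only for three groups (chi-square with 2 degrees of freedom), which is why the example uses three sites.

```python
import math
import numpy as np

def average_ranks(values):
    """Rank values from 1..N, giving tied values their average rank."""
    values = np.asarray(values, dtype=float)
    order = np.argsort(values, kind="mergesort")
    sorted_vals = values[order]
    ranks = np.empty(len(values), dtype=float)
    start = 0
    while start < len(values):
        stop = start
        while stop + 1 < len(values) and sorted_vals[stop + 1] == sorted_vals[start]:
            stop += 1
        ranks[order[start:stop + 1]] = 0.5 * (start + stop) + 1.0
        start = stop + 1
    return ranks

def kruskal_wallis_h(values, groups):
    """Return the tie-corrected Kruskal-Wallis H statistic."""
    values = np.asarray(values, dtype=float)
    groups = np.asarray(groups)
    n_total = len(values)
    ranks = average_ranks(values)
    rank_term = 0.0
    for g in np.unique(groups):
        r = ranks[groups == g]
        rank_term += r.sum() ** 2 / len(r)
    h = 12.0 / (n_total * (n_total + 1)) * rank_term - 3.0 * (n_total + 1)
    _, counts = np.unique(values, return_counts=True)
    correction = 1.0 - np.sum(counts**3 - counts) / (n_total**3 - n_total)
    return float(h / correction)

# Synthetic edge measured at three sites with multiplicative site shifts.
rng = np.random.default_rng(2)
site = np.repeat([0, 1, 2], 100)
site_factor = np.exp(np.array([0.0, 0.4, -0.3]))[site]
biased_edge = rng.gamma(4.0, 50.0, size=300) * site_factor

h_biased = kruskal_wallis_h(biased_edge, site)
p_biased = math.exp(-h_biased / 2.0)   # chi-square survival function, 2 d.o.f.
print(f"H = {h_biased:.1f}, p = {p_biased:.2e}")
```

In practice the same test would be repeated for every edge before and after harmonization, with a correction for multiple comparisons; the abstract does not state which correction was used.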
 

Results:

We observed striking site effects in edgewise connectivity values before harmonization, which were largely reduced after harmonization. The gamma-GLM outperformed the other methods (Fig. 2A). Specifically, 1035 (70%) edges showed significant site effects before harmonization; none remained significant after gamma-GLM harmonization.
All 6 global measures showed significant site effects before harmonization. After ComBat, site effects on characteristic path length and global efficiency remained significant. All other methods removed site effects on the tested global measures, with gamma-GLM performing best and yielding the smallest effect sizes (Fig. 2B).
The gamma-GLM also performed best in the replicability of age associations at each site, showing an almost flat concordance-at-the-top (CAT) curve near 1 and recovering all correlations between age and edgewise connectivity. In the presence of confounding between age and site, the performance of gamma-GLM remained robust (Fig. 2C).
In two use cases, gamma-GLM harmonization enhanced the generalizability of predictive models to unseen data at new sites and revealed significant differences in structural connectomes between diagnostic groups that were previously undetectable.
Supporting Image: Fig2.jpg
   ·Performance of harmonization models
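For readers unfamiliar with CAT curves, a minimal sketch of the computation is given below. The function name `cat_curve` is hypothetical, and ranking edges by the absolute value of a per-edge age-association score is an assumption about how the rankings are formed. For each k, the curve gives the fraction of the k top-ranked edges in one ranking that are also among the k top-ranked edges in the other.

```python
import numpy as np

def cat_curve(scores_a, scores_b):
    """Concordance at the top for two score vectors over the same items.

    Items are ranked by absolute score (largest first). Entry k-1 of the
    result is |top_k(a) & top_k(b)| / k.
    """
    a = np.abs(np.asarray(scores_a, dtype=float))
    b = np.abs(np.asarray(scores_b, dtype=float))
    order_a = np.argsort(-a, kind="mergesort")
    order_b = np.argsort(-b, kind="mergesort")
    n = len(a)
    curve = np.empty(n)
    for k in range(1, n + 1):
        shared = set(order_a[:k].tolist()) & set(order_b[:k].tolist())
        curve[k - 1] = len(shared) / k
    return curve

# Toy age-association scores for five edges in two settings (made-up values).
reference_scores = [0.60, -0.45, 0.30, 0.10, -0.05]
site_scores = [0.55, -0.50, 0.28, 0.12, -0.02]
print(cat_curve(reference_scores, site_scores))
```

A curve that stays close to 1 for all k means the two rankings agree on which edges are most strongly associated with age; the curve always reaches 1 at k equal to the number of edges.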
 

Conclusions:

We recommend the gamma-GLM to harmonize structural connectomes, as it outperformed other models in reducing site bias while preserving biological integrity.

Modeling and Analysis Methods:

Connectivity (eg. functional, effective, structural) 1
Diffusion MRI Modeling and Analysis
Methods Development 2

Keywords:

Autism
Computational Neuroscience
Data analysis
Machine Learning
Modeling
MRI
Tractography
WHITE MATTER IMAGING - DTI, HARDI, DSI, ETC

1|2 Indicates the priority used for review

Abstract Information

Please indicate below if your study was a "resting state" or "task-activation" study.

Other

Healthy subjects only or patients (note that patient studies may also involve healthy subjects):

Patients

Was this research conducted in the United States?

Yes

Are you Internal Review Board (IRB) certified? Please note: Failure to have IRB, if applicable will lead to automatic rejection of abstract.

Yes, I have IRB or AUCC approval

Was any human subjects research approved by the relevant Institutional Review Board or ethics panel? NOTE: Any human subjects studies without IRB approval will be automatically rejected.

Yes

Was any animal research approved by the relevant IACUC or other animal research panel? NOTE: Any animal studies without IACUC approval will be automatically rejected.

Not applicable

Please indicate which methods were used in your research:

Structural MRI
Diffusion MRI
Computational modeling

For human MRI, what field strength scanner do you use?

3.0T

Which processing packages did you use for your study?

FreeSurfer
Other, please list: MRtrix3
