Poster No:
1077
Submission Type:
Abstract Submission
Authors:
Emma Prevot1, Thomas Nichols1, Chris Holmes1, Habib Ganjgahi1
Institutions:
1University of Oxford, Oxford, Oxfordshire
Introduction:
Magnetic Resonance Imaging (MRI) harmonization is crucial for reducing scanner effects, i.e., non-biological variation introduced by differences in equipment and acquisition protocols, and thereby enables data integration in large multi-site studies. While ComBat [1] is widely used, it has two fundamental limitations: reliance on linear modelling and dependence on scanner IDs. First, brain structures often follow complex, non-linear patterns of maturation and age-related change [2], so harmonization methods that assume a linear model, such as ComBat, are inherently limited. ComBat's simple linear modelling approach may also lead to an excessive false positive rate (Fig. 1b). Second, we have found that scanner IDs alone cannot capture the rich variability in image quality metrics (IQMs), which reflect detailed scanner-specific properties. Studies like Neuroharmony [3] have shown the value of IQMs in harmonization, but still rely on ComBat corrections. Finally, scanner IDs may not be available in anonymized datasets, and ComBat struggles when only a handful of subjects come from a given scanner.
In this abstract, we introduce BARTharm, a novel harmonization method that uses Bayesian Additive Regression Trees (BART) [4] to model biological and scanner effects non-linearly, leveraging IQMs instead of scanner IDs. We demonstrate its advantages over ComBat under various challenging scenarios, such as non-linearity, model misspecification, and strong correlation between biological and scanner effects, using synthetic data and real-world data from the NO.MS Multiple Sclerosis cohort [5,6].
Methods:
BARTharm adjusts the images for site effects using BART, a non-parametric modelling approach that approximates the relationship between covariates and outcome as a sum of regression trees. By combining the predictions of multiple trees, BART creates a flexible ensemble model that captures complex, non-linear relationships [4]. Building on this foundation, we draw inspiration from the Bayesian Causal Forest (BCF) [7] and model the effect of the biological covariates (i.e., the covariates of interest) separately from the effect of the IQMs (see Figure 1). After fitting our model (Fig. 1, Eq. 1-4), we obtain estimates of the scanner effect and biological effect terms. We obtain the clean, harmonized data by removing the estimated scanner effect from the observed outcome (Fig. 1, Eq. 5).
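The equations in Fig. 1 are not reproduced in this text. As a rough sketch consistent with the description above, the model can be thought of as a two-component additive model with a sum-of-trees prior on each component. All symbols here are assumptions introduced for illustration ($y_i$ the observed outcome, $x_i$ the biological covariates, $z_i$ the IQMs, $g$ a single regression tree with structure $T$ and leaf values $M$), and the actual specification in Fig. 1 may differ, for example in its priors or additional terms:

```latex
\begin{align}
y_i &= \mu(x_i) + \tau(z_i) + \varepsilon_i,
      \qquad \varepsilon_i \sim \mathcal{N}(0, \sigma^2), \\
\mu(x) &= \sum_{j=1}^{m_\mu} g\bigl(x;\, T^{\mu}_j, M^{\mu}_j\bigr),
      \qquad
\tau(z) = \sum_{k=1}^{m_\tau} g\bigl(z;\, T^{\tau}_k, M^{\tau}_k\bigr), \\
y_i^{\mathrm{harm}} &= y_i - \hat{\tau}(z_i).
\end{align}
```

Under this reading, $\mu$ plays the role of the biological effect, $\tau$ the scanner effect driven by the IQMs, and the last line corresponds to removing the estimated scanner effect from the observed outcome.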
Results:
In Simulation 1 (Fig. 1b), we simulated data to evaluate BARTharm and ComBat under scenarios with non-linear relationships and complex scanner effects. BARTharm clearly outperformed ComBat, especially in non-linear settings, and the results highlight that scanner IDs alone cannot capture the inherent variability of scanner effects. Simulation 2 (Fig. 2a) tested model misspecification, i.e., fitting both models with only 50% of the biological covariates, and varying correlations between biological and scanner effects. We used real covariates from NO.MS data with simulated scanner IDs and outcomes. Across all experiments, BARTharm more effectively retrieved the true biological signal. On real data (Fig. 2b), we applied BARTharm to the NO.MS dataset, harmonizing 29 brain image-derived phenotypes (IDPs). The IQMs were extracted from the MRI data with MRIQC [8]. Because the NO.MS dataset is anonymized, it does not include scanner IDs, which underscores the advantage of BARTharm's ability to perform harmonization without relying on them. By successfully removing IQM-induced effects, BARTharm led to improved associations between brain IDPs and EDSS (a disability measure) in linear models.
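To make the non-linear scenario concrete, the following toy example contrasts a linear adjustment with a non-linear additive adjustment on synthetic data. It is only an illustration under stated assumptions and does not reproduce the simulations or results reported here: neither ComBat nor BART is implemented, the linear baseline is an ordinary least-squares fit, the non-linear fit uses backfitting with a simple binned-mean smoother as a stand-in for BART, and all names (`age`, `iqm`, `simulate`, `fit_additive`, `binned_smoother`) and the data-generating functions are hypothetical.

```python
import numpy as np


def binned_smoother(x, r, n_bins=20):
    """Piecewise-constant fit: mean of r within quantile bins of x."""
    edges = np.quantile(x, np.linspace(0.0, 1.0, n_bins + 1))
    idx = np.clip(np.searchsorted(edges, x, side="right") - 1, 0, n_bins - 1)
    sums = np.bincount(idx, weights=r, minlength=n_bins)
    counts = np.bincount(idx, minlength=n_bins)
    return (sums / np.maximum(counts, 1))[idx]


def fit_additive(y, x_bio, z_iqm, n_iter=20):
    """Backfit y = intercept + f_bio(x_bio) + f_scan(z_iqm) + noise.

    Stand-in for the two-component tree model; NOT an implementation of BART.
    """
    intercept = y.mean()
    f_bio = np.zeros_like(y)
    f_scan = np.zeros_like(y)
    for _ in range(n_iter):
        # Update each component on the residual left by the other one.
        f_bio = binned_smoother(x_bio, y - intercept - f_scan)
        f_bio -= f_bio.mean()
        f_scan = binned_smoother(z_iqm, y - intercept - f_bio)
        f_scan -= f_scan.mean()
    return intercept, f_bio, f_scan


def simulate(seed=0, n=2000):
    """Return (rmse_linear, rmse_additive) against the scanner-free outcome."""
    rng = np.random.default_rng(seed)
    age = rng.uniform(20.0, 80.0, n)   # hypothetical biological covariate
    iqm = rng.uniform(-1.0, 1.0, n)    # hypothetical image quality metric
    bio = 0.002 * (age - 50.0) ** 2    # assumed non-linear biological signal
    scan = np.sin(3.0 * iqm)           # assumed non-linear scanner effect
    y = bio + scan + rng.normal(0.0, 0.1, n)
    y_clean = y - (scan - scan.mean())  # outcome with true scanner effect removed

    # Linear adjustment: remove the fitted linear IQM term only.
    X = np.column_stack([np.ones(n), age, iqm])
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    y_linear = y - beta[2] * (iqm - iqm.mean())

    # Non-linear additive adjustment: remove the estimated scanner component.
    _, _, f_scan = fit_additive(y, age, iqm)
    y_additive = y - f_scan

    def rmse(a):
        return float(np.sqrt(np.mean((a - y_clean) ** 2)))

    return rmse(y_linear), rmse(y_additive)


if __name__ == "__main__":
    rmse_linear, rmse_additive = simulate()
    print(f"linear adjustment RMSE:   {rmse_linear:.3f}")
    print(f"additive adjustment RMSE: {rmse_additive:.3f}")
```

In this toy setting the linear adjustment can only remove the linear part of the scanner effect, so the remainder stays in the "harmonized" outcome, whereas the additive fit can follow the curve. Both effects are centred before comparison because an additive model identifies each component only up to a constant.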

Conclusions:
We demonstrated how BARTharm outperforms ComBat by effectively capturing non-linear biological signals and removing scanner effects using information-rich IQMs, even in challenging scenarios like model misspecification and strong correlation. BARTharm can also be run voxel-wise with parallel computation, yielding significant improvements over ComBat despite its higher computational cost.
Modeling and Analysis Methods:
Bayesian Modeling 1
Classification and Predictive Modeling
Methods Development 2
Univariate Modeling
Keywords:
Computing
Data analysis
MRI
Statistical Methods
Univariate
Other - Harmonization, ComBat
1|2 Indicates the priority used for review
Please indicate below if your study was a "resting state" or "task-activation" study.
Resting state
Task-activation
Other
Healthy subjects only or patients (note that patient studies may also involve healthy subjects):
Patients
Was this research conducted in the United States?
No
Was any human subjects research approved by the relevant Institutional Review Board or ethics panel?
NOTE: Any human subjects studies without IRB approval will be automatically rejected.
Yes
Was any animal research approved by the relevant IACUC or other animal research panel?
NOTE: Any animal studies without IACUC approval will be automatically rejected.
Not applicable
Please indicate which methods were used in your research:
Structural MRI
Computational modeling
For human MRI, what field strength scanner do you use?
1.5T
3.0T
Which processing packages did you use for your study?
FSL
Provide references using APA citation style.
[1] Fortin, J. P., Cullen, N., Sheline, Y. I., Taylor, W. D., Aselcioglu, I., Cook, P. A., Adams, P., Cooper, C., Fava, M., McGrath, P. J., McInnis, M., Phillips, M. L., Trivedi, M. H., Weissman, M. M., & Shinohara, R. T. (2018). Harmonization of cortical thickness measurements across scanners and sites. NeuroImage, 167, 104–120.
[2] Fjell, A. M., Walhovd, K. B., Westlye, L. T., Østby, Y., Tamnes, C. K., Jernigan, T. L., Gamst, A., & Dale, A. M. (2010). When does brain aging accelerate? Dangers of quadratic fits in cross-sectional studies. NeuroImage, 50(4), 1376–1383.
[3] Garcia-Dias, R., Scarpazza, C., Baecker, L., Vieira, S., Pinaya, W. H. L., Corvin, A., Redolfi, A., Nelson, B., Crespo-Facorro, B., McDonald, C., Tordesillas-Gutiérrez, D., Cannon, D., Mothersill, D., Hernaus, D., Morris, D., Setien-Suero, E., Donohoe, G., Frisoni, G., Tronchin, G., ... Mechelli, A. (2020). Neuroharmony: A new tool for harmonizing volumetric MRI data from unseen scanners. NeuroImage, 220, Article 117127.
[4] Chipman, H. A., George, E. I., & McCulloch, R. E. (2010). BART: Bayesian additive regression trees. The Annals of Applied Statistics, 4(1), 266–298.
[5] Dahlke, F., Arnold, D. L., Aarden, P., Ganjgahi, H., Häring, D. A., Čuklina, J., Nichols, T. E., Gardiner, S., Bermel, R., & Wiendl, H. (2021). Characterization of MS phenotypes across the age span using a novel data set integrating 34 clinical trials (NO.MS cohort): Age is a key contributor to presentation. Multiple Sclerosis, 27(13), 2062–2076.
[6] Mallon, A. M., Häring, D. A., Dahlke, F., Aarden, P., Afyouni, S., Delbarre, D., El Emam, K., Ganjgahi, H., Gardiner, S., Kwok, C. H., West, D. M., Straiton, E., Haemmerle, S., Huffman, A., Hofmann, T., Kelly, L. J., Krusche, P., Laramee, M. C., Lheritier, K., ... Holmes, C. (2021). Advancing data science in drug development through an innovative computational framework for data sharing and statistical analysis. BMC Medical Research Methodology, 21(1), Article 250.
[7] Hahn, P. R., Murray, J. S., & Carvalho, C. M. (2020). Bayesian regression tree models for causal inference: Regularization, confounding, and heterogeneous effects (with discussion). Bayesian Analysis, 15(3), 965–1056.
[8] Esteban, O., Birman, D., Schaer, M., Koyejo, O. O., Poldrack, R. A., & Gorgolewski, K. J. (2017). MRIQC: Advancing the automatic prediction of image quality in MRI from unseen sites. PLOS ONE, 12(9), Article e0184661.