Correct deconfounding can support causal brain-behavioural predictive modeling

Presented During:

Saturday, June 28, 2025: 11:30 AM - 12:45 PM
Brisbane Convention & Exhibition Centre  
Room: P2 (Plaza Level)  

Poster No:

1131 

Submission Type:

Abstract Submission 

Authors:

Vera Komeyer1, Simon Eickhoff2, Charles Rathkopf3, Christian Grefkes4, Kaustubh Patil1, Federico Raimondo1

Institutions:

1Research Center Jülich, Jülich, NRW, 2Research Centre Jülich, Jülich, NRW, 3Research Center Juelich, Juelich, NRW, 4Goethe University Frankfurt and University Hospital Frankfurt, Frankfurt (Main), Hessen

First Author:

Vera Komeyer  
Research Center Jülich
Jülich, NRW

Co-Author(s):

Simon Eickhoff  
Research Centre Jülich
Jülich, NRW
Charles Rathkopf  
Research Center Juelich
Juelich, NRW
Christian Grefkes  
Goethe University Frankfurt and University Hospital Frankfurt
Frankfurt (Main), Hessen
Kaustubh Patil  
Research Center Jülich
Jülich, NRW
Federico Raimondo  
Research Center Jülich
Jülich, NRW

Introduction:

Machine Learning (ML) in neuroscientific research offers opportunities to understand neuronal underpinnings of behaviour in health and disease. While ML applications often aim to advance neuroscientific understanding, they are frequently judged solely based on accuracy, fueling a "performance race" in model development. Problematically, such high accuracies, especially in neuroscience, are often achieved by relying on confounder information. This reliance can exacerbate challenges, including unreliable predictions, non-reproducibility, limited generalizability, and non-interpretability of ML results.
In clinical settings, randomized controlled trials (RCTs) are a well-established tool for mitigating confounding influences and obtaining cause-effect insights. In contrast, ML solutions are typically applied to observational data, which require post-hoc statistical confounder control that treats confounding as a purely associative phenomenon. However, distinguishing confounders from mediators, colliders, or proxies requires an understanding of the directionality of effects, i.e. causal reasoning, to prevent faulty adjustments that may introduce spurious correlations (Hamdan, 2023). Additionally, integrating causal reasoning into neuroscientific ML workflows can facilitate the investigation of brain-behavioural cause-effect relationships, akin to RCTs in clinical studies.
Here, using a brain-behavioural predictive example, we illustrate how to leverage domain knowledge to build a Directed Acyclic Graph (DAG) of causal relations and how this DAG can serve as a basis for confounder adjustment in ML analysis, enabling provisional causal insights (Pearl & Mackenzie, 2018).

Methods:

For the predictive example, we used data from the UK Biobank (UKB) (Miller, 2016) to predict hand grip strength (HGS) from grey matter volume (GMV) (Wagner, 2022) in 41,180 participants. A linear support vector regression model with a heuristically chosen hyperparameter C was employed. Performance was assessed using the mean absolute error (MAE), the coefficient of determination (R²), and Pearson's r. Models were trained using stratified 5-fold cross-validation on 80% of the data, and out-of-sample performance was evaluated on the remaining 20% hold-out test data.
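The evaluation scheme described above can be sketched roughly as follows. This is a minimal illustration on synthetic data, not the analysis code used for the UKB example. It assumes scikit-learn's `LinearSVR`, a fixed placeholder value `C=1.0` (the heuristic used to set C is not specified here), and stratification of the continuous target via quantile bins. The helper names `quantile_bins`, `evaluate`, and `run_example` are hypothetical.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import StratifiedKFold, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVR


def quantile_bins(y, n_bins=10):
    """Bin a continuous target so it can be used for stratified splitting."""
    edges = np.quantile(y, np.linspace(0, 1, n_bins + 1)[1:-1])
    return np.digitize(y, edges)


def evaluate(y_true, y_pred):
    """Return the three metrics named in the text: MAE, R^2 and Pearson's r."""
    return {
        "mae": mean_absolute_error(y_true, y_pred),
        "r2": r2_score(y_true, y_pred),
        "r": np.corrcoef(y_true, y_pred)[0, 1],
    }


def run_example(X, y, C=1.0, seed=0):
    # 80/20 split, stratified on the binned target (assumption).
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.2, stratify=quantile_bins(y), random_state=seed
    )
    # C is a placeholder; the abstract sets it with an unspecified heuristic.
    model = make_pipeline(
        StandardScaler(), LinearSVR(C=C, max_iter=10000, random_state=seed)
    )
    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=seed)
    cv_scores = []
    for tr_idx, va_idx in cv.split(X_tr, quantile_bins(y_tr)):
        model.fit(X_tr[tr_idx], y_tr[tr_idx])
        cv_scores.append(evaluate(y_tr[va_idx], model.predict(X_tr[va_idx])))
    # Refit on all training data, then score once on the hold-out set.
    model.fit(X_tr, y_tr)
    return cv_scores, evaluate(y_te, model.predict(X_te))


# Synthetic stand-in for GMV features and an HGS-like target.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(500, 20))
y_demo = 2.0 * X_demo[:, 0] + rng.normal(scale=0.5, size=500)
cv_scores, test_scores = run_example(X_demo, y_demo)
```

The hold-out set is touched only once, after cross-validation, so the reported out-of-sample metrics are not influenced by choices made during training.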
Integrating causal reasoning into predictive models relies on identifying and adjusting for a correct set of deconfounders. A causal analysis of the cause-effect relationship of interest, here GMV → HGS, forms the core of our proposed 5-step process, which we describe both theoretically and practically using the predictive example.
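How the adjustment itself is carried out is not specified here. One common option, shown below purely as an assumption, is linear confound regression on the features, fitted on training data only and then applied to held-out data so that no test information leaks into the adjustment. The class name `ConfoundRegressor` and the synthetic age and sex variables are hypothetical, and linear removal has known pitfalls of its own (Hamdan, 2023).

```python
import numpy as np


class ConfoundRegressor:
    """Remove the linear contribution of confounds from each feature."""

    def fit(self, X, confounds):
        design = self._design(confounds)
        # Least-squares coefficients, one column per feature.
        self.coef_, *_ = np.linalg.lstsq(design, X, rcond=None)
        return self

    def transform(self, X, confounds):
        # Subtract the confound-predicted part using the training coefficients.
        return X - self._design(confounds) @ self.coef_

    @staticmethod
    def _design(confounds):
        confounds = np.asarray(confounds, dtype=float)
        # Intercept column plus one column per confound.
        return np.column_stack([np.ones(len(confounds)), confounds])


# Synthetic illustration: five features that depend on age and sex.
rng = np.random.default_rng(0)
n = 200
age = rng.uniform(45, 80, n)
sex = rng.integers(0, 2, n)
conf = np.column_stack([age, sex])
X = 0.05 * age[:, None] + 0.8 * sex[:, None] + rng.normal(size=(n, 5))

train = np.arange(n) < 160  # first 80% as training set
reg = ConfoundRegressor().fit(X[train], conf[train])
X_train_clean = reg.transform(X[train], conf[train])
X_test_clean = reg.transform(X[~train], conf[~train])
```

In a cross-validated workflow, `fit` would be called inside each training fold and `transform` applied to the corresponding validation fold.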

Results:

First, we identified known and plausible direct causes of the target (HGS) using prior research and domain expertise. To construct the DAG in a bottom-up fashion (Fig. 1, step 3), we iteratively assessed the causes of each added variable (Fig. 2a). The DAG was considered complete when additional variables added no further information, i.e. when it allowed the identification of a sufficient set of deconfounders that blocks all non-causal pathways (backdoor criterion; Pearl & Mackenzie, 2018), in our example sex hormones and age (Fig. 2b). If crucial deconfounders are unavailable in pre-existing data, conceptual approximators can serve as substitutes, with corresponding adjustments made to the DAG (Fig. 1, step 4), e.g. sex as a biological approximator for unmeasured sex hormones (Fig. 2c). Finally, the statistical relationships of the causally derived (knowledge-based) deconfounders with the features (GMV) and the target (HGS) need to be evaluated (Fig. 1, step 5). Here, sex and age both correlated with HGS and GMV (Fig. 2d). Only variables that are both statistically and causally relevant need to be adjusted for in the predictive model; doing so here resulted in a performance of r=0.06 (Fig. 2e).
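The backdoor check mentioned above can be illustrated with a small pure-Python sketch. The toy graph below is a deliberately simplified assumption containing only GMV, HGS, age, and sex hormones; it is not the DAG shown in Fig. 2. The function names are hypothetical. A candidate set satisfies the backdoor criterion if it contains no descendant of the cause and blocks every path that enters the cause through an incoming arrow.

```python
def descendants(edges, node):
    """All nodes reachable from `node` by following arrows forward."""
    children = {}
    for parent, child in edges:
        children.setdefault(parent, set()).add(child)
    seen, stack = set(), [node]
    while stack:
        for child in children.get(stack.pop(), ()):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


def backdoor_paths(edges, x, y):
    """Simple paths from x to y whose first edge points into x."""
    edge_set = set(edges)
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    paths = []

    def walk(path):
        last = path[-1]
        if last == y:
            paths.append(list(path))
            return
        for nxt in neighbours.get(last, ()):
            if nxt in path:
                continue
            if len(path) == 1 and (nxt, x) not in edge_set:
                continue  # first step must follow an arrow into x backwards
            walk(path + [nxt])

    walk([x])
    return paths


def is_blocked(edges, path, z):
    """True if the adjustment set z blocks the given path."""
    edge_set = set(edges)
    for prev, mid, nxt in zip(path, path[1:], path[2:]):
        collider = (prev, mid) in edge_set and (nxt, mid) in edge_set
        if collider:
            # A collider blocks unless it or one of its descendants is in z.
            if mid not in z and not (descendants(edges, mid) & z):
                return True
        elif mid in z:
            return True
    return False


def satisfies_backdoor(edges, x, y, z):
    z = set(z)
    if z & descendants(edges, x):
        return False  # adjusting for a descendant of the cause is not allowed
    return all(is_blocked(edges, p, z) for p in backdoor_paths(edges, x, y))


# Toy DAG (assumption): edges are (cause, effect) pairs.
toy_dag = [
    ("age", "GMV"), ("age", "HGS"),
    ("sex_hormones", "GMV"), ("sex_hormones", "HGS"),
    ("GMV", "HGS"),
]
full_set_ok = satisfies_backdoor(toy_dag, "GMV", "HGS", {"age", "sex_hormones"})
age_only_ok = satisfies_backdoor(toy_dag, "GMV", "HGS", {"age"})
```

In this toy graph, adjusting for age alone leaves the path through sex hormones open, which is why both variables are needed.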
Supporting Image: Fig01_5step-recipe.png
   ·Fig. 1 - 5-step recipe for confounder adjustment in causal predictive modelling
Supporting Image: Fig02_DAGs-HGS.png
   ·Fig. 2 - Practical illustration of 5-step approach with the GMV->HGS predictive example
 

Conclusions:

By combining causal with associative reasoning and implementing proper deconfounding, ML models can provide provisional causal insights, helping to address biomedical "why" questions. The proposed 5-step approach offers actionable guidance for conducting this causal analysis and integrating it into ML workflows.

Modeling and Analysis Methods:

Classification and Predictive Modeling 1
Methods Development 2
Multivariate Approaches
Other Methods

Motor Behavior:

Motor Behavior Other

Keywords:

Computational Neuroscience
Data analysis
Machine Learning
Modeling
Motor
MRI
Multivariate
Statistical Methods
Workflows
Other - deconfounding

1|2 Indicates the priority used for review

Abstract Information

Please indicate below if your study was a "resting state" or "task-activation" study.

Other

Healthy subjects only or patients (note that patient studies may also involve healthy subjects):

Healthy subjects

Was this research conducted in the United States?

No

Was any human subjects research approved by the relevant Institutional Review Board or ethics panel? NOTE: Any human subjects studies without IRB approval will be automatically rejected.

Yes

Was any animal research approved by the relevant IACUC or other animal research panel? NOTE: Any animal studies without IACUC approval will be automatically rejected.

No

Please indicate which methods were used in your research:

Structural MRI
Behavior

For human MRI, what field strength scanner do you use?

3.0T

Which processing packages did you use for your study?

Other, Please list - Python

Provide references using APA citation style.

1. Hamdan, S., Love, B. C., von Polier, G. G., Weis, S., Schwender, H., Eickhoff, S. B., & Patil, K. R. (2023). Confound-leakage: Confound removal in machine learning leads to leakage. GigaScience, 12.
2. Miller, K. L., Alfaro-Almagro, F., Bangerter, N. K., Thomas, D. L., Yacoub, E., Xu, J., Bartsch, A. J., Jbabdi, S., Sotiropoulos, S. N., Andersson, J. L. R., Griffanti, L., Douaud, G., Okell, T. W., Weale, P., Dragonu, I., Garratt, S., Hudson, S., Collins, R., Jenkinson, M., … Smith, S. M. (2016). Multimodal population brain imaging in the UK Biobank prospective epidemiological study. Nature Neuroscience, 19(11), 1523–1536. https://doi.org/10.1038/nn.4393
3. Pearl, J., & Mackenzie, D. (2018). The book of why: The new science of cause and effect. Basic Books.
4. Wagner, A. S., Waite, L. K., Wierzba, M., Hoffstaedter, F., Waite, A. Q., Poldrack, B., Eickhoff, S. B., & Hanke, M. (2022). FAIRly big: A framework for computationally reproducible processing of large-scale data. Scientific Data, 9(1), 80.
