Systematic review and evaluation of meta-analysis methods for same data meta-analyses in multiverse

Presented During:

Wednesday, June 26, 2024: 11:30 AM - 12:45 PM
COEX  
Room: Hall D 2  

Poster No:

1905 

Submission Type:

Abstract Submission 

Authors:

Jeremy Lefort-Besnard1, Thomas Nichols2, Camille Maumet1

Institutions:

1Inria, Empenn team, Rennes, France, 2University of Oxford, Oxford, United Kingdom

First Author:

Jeremy Lefort-Besnard  
Inria, Empenn team
Rennes, France

Co-Author(s):

Thomas Nichols  
University of Oxford
Oxford, United Kingdom
Camille Maumet  
Inria, Empenn team
Rennes, France

Introduction:

Researchers using task-fMRI data have access to a wide range of analysis tools to model brain activity. This diversity of analytical approaches has been shown to have substantial effects on neuroimaging results (Botvinik-Nezer et al., 2020; Bowring et al., 2018; Carp, 2012; Glatard et al., 2015). Combined with selective reporting, this analytical flexibility can lead to an inflated rate of false positives and contributes to the irreproducibility of neuroimaging findings (Poldrack et al., 2017). Multiverse analyses are a way to systematically explore and integrate pipeline variation on a given dataset. We focus on the setting where multiple statistic maps are produced as an output of a set of analyses. Meta-analysis is a natural approach to extract consensus inferences from these maps, yet the traditional assumption of independence amongst input datasets does not hold. In this work we consider a suite of methods to conduct meta-analysis in the multiverse setting, accounting for inter-pipeline dependence among the results.

Methods:

We propose several same data meta-analysis (SDMA) methods based on the traditional 'Stouffer' fixed-effects meta-analysis (Stouffer, 1949):
-SDMA Stouffer, in which the correlation across pipelines is taken into account,
-Consensus SDMA Stouffer and Consensus Average methods, where the combined inference is calibrated to be as similar to the input pipelines as possible, and
-Generalized Least Squares (GLS) SDMA, where the inter-pipeline correlation is used to find the statistically optimal combination of pipeline results.
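To make the difference between these combinations concrete, here is a minimal NumPy sketch of the conventional Stouffer statistic and of two of the correlation-aware variants listed above. It is an illustration under stated assumptions, not the authors' implementation: the input `z` is assumed to be a K x V array of Z-values (K pipelines, V voxels), the inter-pipeline correlation matrix `q` is assumed to be estimated across voxels with `np.corrcoef`, and the function names are hypothetical. The consensus variants are not sketched because the abstract does not give their form.

```python
import numpy as np


def stouffer(z):
    """Conventional Stouffer statistic: assumes the K pipelines are independent."""
    k = z.shape[0]
    return z.sum(axis=0) / np.sqrt(k)


def sdma_stouffer(z, q):
    """Sum of Z-values rescaled by its standard deviation under correlation q."""
    ones = np.ones(z.shape[0])
    # Var(sum_k Z_k) = 1' Q 1 when each Z_k has unit variance
    return z.sum(axis=0) / np.sqrt(ones @ q @ ones)


def gls_sdma(z, q):
    """GLS combination: weights proportional to Q^{-1} 1, rescaled to unit variance."""
    ones = np.ones(z.shape[0])
    w = np.linalg.solve(q, ones)  # w = Q^{-1} 1
    # Var(w' Z) = 1' Q^{-1} 1 = w' 1
    return (w @ z) / np.sqrt(w @ ones)


# Toy usage on random data (assumption: Q is estimated from the maps themselves)
rng = np.random.default_rng(0)
z_maps = rng.standard_normal((5, 1000))
q_hat = np.corrcoef(z_maps)
z_sdma = sdma_stouffer(z_maps, q_hat)
z_gls = gls_sdma(z_maps, q_hat)
```

When `q` is the identity matrix, all three functions return the same map. When every pair of pipelines shares the same correlation, the GLS weights are equal and the GLS statistic coincides with the SDMA Stouffer statistic.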
The validity of these models was assessed in a set of simulations. Here, we focus on false positive control in two scenarios: 1/ independent pipelines with no significant results (null case), and 2/ correlated pipelines with no significant results (null correlated case). These meta-analysis models were also evaluated on a real-world dataset from NARPS (Botvinik-Nezer et al., 2020), a multiverse analysis with 70 different statistic maps originating from the same data. Finally, given that these SDMA methods assume that the inter-pipeline correlation is the same across the brain, we measured heterogeneity with the Frobenius norm between the inter-pipeline correlation estimated over the whole brain and that estimated within each of several brain regions derived from the AAL atlas.
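The two null scenarios can be illustrated with a small simulation. The sketch below is a simplified stand-in for the simulations described above, not a reproduction of them: it assumes an equicorrelated design (every pair of pipelines shares the same correlation `rho`), a one-sided 5% threshold, and hypothetical values for the number of pipelines and voxels.

```python
import numpy as np

Z_CRIT = 1.6448536269514722  # one-sided 5% threshold of the standard normal


def simulate_null(k, n_voxels, rho, rng):
    """Null Z maps for k pipelines with pairwise correlation rho (assumed design)."""
    shared = rng.standard_normal(n_voxels)       # component common to all pipelines
    unique = rng.standard_normal((k, n_voxels))  # pipeline-specific component
    return np.sqrt(rho) * shared + np.sqrt(1.0 - rho) * unique


def false_positive_rate(z_meta):
    """Share of null voxels declared significant."""
    return float(np.mean(z_meta > Z_CRIT))


rng = np.random.default_rng(0)
k, n_voxels = 20, 100_000
rates = {}
for rho in (0.0, 0.8):  # null case, then null correlated case
    z = simulate_null(k, n_voxels, rho, rng)
    q = np.corrcoef(z)  # inter-pipeline correlation estimated across voxels
    ones = np.ones(k)
    naive = z.sum(axis=0) / np.sqrt(k)              # conventional Stouffer
    sdma = z.sum(axis=0) / np.sqrt(ones @ q @ ones)  # correlation-aware rescaling
    rates[rho] = (false_positive_rate(naive), false_positive_rate(sdma))
    print(f"rho={rho}: Stouffer FPR={rates[rho][0]:.3f}, SDMA Stouffer FPR={rates[rho][1]:.3f}")
```

Under this design the conventional statistic should stay near the nominal 5% rate only when `rho` is 0, while the rescaled statistic should stay near 5% in both cases.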

Results:

Simulation results under the null setting of no effect and independent pipelines show that all the tested meta-analysis estimators are valid (Fig 1 top row). However, when pipelines are correlated, as typically observed in a multiverse setting, only the SDMA estimators gave valid inferences (Fig 1 lower row), while the conventional meta-analysis approach (Stouffer) produced a dramatically inflated number of false positives. On the real-world dataset (Fig 2), the GLS method finds more significant voxels, while the three other methods all have similar sensitivity. Finally, we found that the Frobenius norm was 0.01% across brain regions, supporting the validity of the consistent correlation assumption.
Supporting Image: fig1_ohbm24.png
   ·Figure 1: comparative P-P plots of the meta-analysis estimators in traditional and multiverse settings.
Supporting Image: fig2_ohbm24.png
   ·Figure 2: significant p-values for each meta-analysis estimator.
 

Conclusions:

We compared several methods for combining multiverse results that account for the dependence among inputs. Our findings demonstrate the validity of the SDMA models under inter-pipeline dependence. As expected, the traditional Stouffer method is liberal, while the SDMA methods are all valid and yield different levels of significance. These different levels reflect different types of inference, which practitioners can choose between based on the assumptions of their analyses. As an illustration, Botvinik-Nezer and colleagues (2020) implemented the Consensus Average model to combine inferences, striving to align them closely with each of the input pipelines, under the assumption that all pipelines were equally valid and thus contributed equally relevant information. This assumption may not hold true in other multiverse settings.

Modeling and Analysis Methods:

Methods Development 1

Neuroinformatics and Data Sharing:

Databasing and Data Sharing 2
Workflows

Keywords:

Meta-Analysis
MRI
Open Data
Open-Source Code
Other - Multiverse, Reproducibility, Variability, Open Science

1|2 Indicates the priority used for review

References:

Botvinik-Nezer, R. et al. (2020). Variability in the analysis of a single neuroimaging dataset by many teams.
Bowring, A. et al. (2018). Same data-different software-different results? Analytic variability of group fMRI results. 1–3.
Carp, J. (2012). On the plurality of (methodological) worlds: Estimating the analytic flexibility of fMRI experiments. Frontiers in Neuroscience, 6, 149.
Glatard, T. et al. (2015). Reproducibility of neuroimaging analyses across operating systems. Frontiers in Neuroinformatics, 9, 12.
Poldrack, R. A. et al. (2017). Scanning the horizon: Towards transparent and reproducible neuroimaging research. Nature Reviews Neuroscience, 18(2), 115–126.
Stouffer, S. A. (1949). Adjustment during army life. Studies in social psychology in World War II. Princeton Univ. Press.