Beyond blobology: advances in statistical inference for neuroimaging

Wouter Weeda Organizer
Leiden University
Samuel Davenport Co Organizer
University of California San Diego
La Jolla, CA 
United States
Thomas Nichols Co Organizer
University of Oxford
Oxford 
United Kingdom
Bertrand Thirion Co Organizer
Saturday, Jul 22: 8:00 AM - 5:00 PM
Educational Course - Full Day (8 hours) 
Room: 511CF 
Even long before a dead salmon appeared to show a neural response to emotional cues, multiple comparison correction was a hot topic in brain imaging. Since the inception of statistical parametric mapping, however, the most common methods of statistical inference have not changed significantly, despite recent criticisms. These criticisms revolve around the setting of an arbitrary ‘cluster-forming’ threshold, and the common misinterpretation of significant clusters as ‘fully active’, while the inference only licenses the claim ‘there is at least one active voxel within this cluster’. Furthermore, given the recent increase in large-scale studies, cluster-extent analysis cannot cope properly with these larger numbers of subjects, leading to uninformative analyses. Although these criticisms have been around for quite a while, the new methods addressing them have not been widely adopted by the community.

Recent advances in mathematics and statistics have led to new methods that overcome these criticisms. These methods allow valid inference on an arbitrary number of clusters – giving users a principled way to choose the threshold – and provide information on where activity within a cluster is located. Their flexibility may at first seem strange in practice: they allow an ‘exploratory’ analysis over different thresholds, until one is happy with the results, while still allowing ‘confirmatory’ hypotheses to be tested. In addition, by focusing on effect sizes instead of p-values, they make interpretable analyses of large-scale datasets possible.

The main aim of this full-day educational course is therefore two-fold. First, to give an overview of the latest advances in statistical inference in neuroimaging: we aim to provide a comprehensive overview of the latest methods, reviewing theory and explaining with practical examples how these methods work. Second, to focus on the application of these methods: how to apply them in practice, how to incorporate them in your pipeline, and how to interpret the results.


At the end of the course participants will (i) understand the problems with ‘classical’ inference, (ii) know the recent advances in inference methods, specifically True/False Discovery Proportion (TDP/FDP) based methods, Joint Error Rate control methods, Spatial Confidence Sets, and advanced RFT methods, and (iii) be able to perform these analyses and interpret their outcomes. 

Target Audience

Neuroimaging researchers using (functional) MRI. The course is explicitly aimed at researchers of any level doing (functional) MRI analysis. 


Classical inference and its caveats

At the moment the main method for statistical inference in neuroimaging is still based on classical null-hypothesis testing with clusters as the main object of inference: if a cluster (a blob of connected voxels) is larger than a certain size, the cluster is declared active. The size above which a cluster is deemed significant is usually based on permutations or random field theory. There are two main criticisms of cluster-extent inference: (i) the method depends on an arbitrary ‘cluster-forming’ threshold, with no principled way of choosing it, and (ii) the interpretation of clusters is prone to the spatial specificity paradox: the larger the cluster, the less we can say about the specific location of activity within it. In addition, standard cluster inference cannot handle large numbers of subjects in an interpretable manner. In this talk I will give an overview of classical cluster inference in all its different flavors, how it differs from (or resembles) voxelwise inference, and what the caveats of the method are. 
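The procedure described above can be sketched in a few lines. This is a hedged toy illustration of cluster-extent inference with a sign-flipping permutation null for the maximum cluster size; the data, dimensions, and p<.001 cluster-forming threshold are assumptions for the example, not taken from any study.

```python
import numpy as np
from scipy import ndimage, stats

rng = np.random.default_rng(0)
n_sub, shape = 20, (32, 32)                      # toy data: 20 subjects, one 32x32 slice
data = rng.normal(0.0, 1.0, (n_sub, *shape))
data[:, 10:16, 10:16] += 1.0                     # planted 6x6 block of signal

def t_map(Y):
    """One-sample t-statistic at every voxel."""
    return Y.mean(0) / (Y.std(0, ddof=1) / np.sqrt(len(Y)))

def cluster_sizes(tmap, cft):
    """Label connected suprathreshold voxels and return each cluster's extent."""
    labels, _ = ndimage.label(tmap > cft)
    return np.bincount(labels.ravel())[1:]       # drop the background label 0

cft = stats.t.ppf(0.999, df=n_sub - 1)           # arbitrary p<.001 cluster-forming threshold
sizes = cluster_sizes(t_map(data), cft)

# Permutation null: distribution of the maximum cluster size under sign flips
max_null = []
for _ in range(500):
    flipped = data * rng.choice([-1.0, 1.0], n_sub)[:, None, None]
    s = cluster_sizes(t_map(flipped), cft)
    max_null.append(s.max() if s.size else 0)

crit = np.quantile(max_null, 0.95)               # 5% FWER cluster-size threshold
print(sizes[sizes > crit])                       # extents of the significant clusters
```

Note that the output is only each significant cluster's extent: the inference says nothing about which voxels inside the cluster carry the signal, which is exactly the spatial specificity paradox discussed above.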


Wouter Weeda, Leiden University, Leiden

Cluster failure or power failure? Empirically evaluating sensitivity and specificity of classical fMRI inference

The caveats of cluster-based inference are now becoming more widely appreciated in the field, with the advent of sufficiently advanced computational methods and large datasets to empirically demonstrate their limitations. In this talk, I will describe new advances in empirically measuring error rates (i.e., sensitivity and specificity) for fMRI inference and show how error rates may be lower than desired for typical research goals. I will touch upon a few fundamental problems and solutions (i.e., broad-scale effects, FDR correction). This talk provides a modern introduction to how we can empirically quantify error rates, points to a few simple paths forward, and sets the stage for subsequent talks discussing solutions for fMRI inference. 


Stephanie Noble, PhD, Yale University
Radiology & Biomedical Imaging
New Haven, CT 
United States

False / True Discovery Proportion based methods primer

To solve the spatial specificity paradox and improve cluster inference we need alternative methods that are more quantitative and more flexible. Quantitative methods do not only infer that signal is present in a cluster, but also quantify how widespread that signal is. Flexible methods allow drilling down into subclusters, as well as zooming out to superclusters, to investigate clusters at multiple resolutions simultaneously. Quantitative and flexible cluster inference can be achieved through FDP/TDP based methods. The TDP (True Discovery Proportion) is the fraction of active voxels in a cluster; the FDP (False Discovery Proportion) is its complement, the fraction of inactive voxels. FDP/TDP methods give a simultaneous lower bound for the TDP (upper bound for the FDP) over all subsets of the brain. These methods allow users to explore the brain, find interesting clusters or anatomical regions post hoc, and report the TDP or FDP for these regions. We will discuss some basic TDP/FDP methods and the way they solve the double dipping problem and the spatial specificity paradox, and explain how the TDP/FDP can serve as an alternative to the p-value when reporting on significant clusters. 
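To make the simultaneity concrete, here is a hedged toy sketch on simulated p-values: a minimal implementation of the conservative Simes-based bound (which methods such as ARI sharpen considerably; the function name is mine, not from any package). Because the bound holds for all subsets at once, it stays valid for a data-derived cluster and for any subcluster drilled out of it.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
m, alpha = 10_000, 0.05
z = rng.normal(0.0, 1.0, m)
z[:500] += 3.5                                   # 500 truly active "voxels"
p = stats.norm.sf(z)                             # one-sided p-values

def tdp_lower_bound(p_set, m, alpha):
    """Simes-based simultaneous lower bound on the TDP of any voxel set.

    Valid for every subset at once, so the set may be chosen after seeing
    the data without invalidating the bound (no double dipping)."""
    ps = np.sort(p_set)
    u = np.arange(1, len(ps) + 1)
    # number of true discoveries >= 1 - u + #{p_i <= u*alpha/m}, for every u
    d = np.max(1 - u + np.searchsorted(ps, u * alpha / m, side="right"))
    return max(d, 0) / len(ps)

cluster = np.flatnonzero(p < 1e-4)               # a data-derived "cluster"
sub = cluster[:50]                               # drill down into a subcluster
print(round(tdp_lower_bound(p[cluster], m, alpha), 2),
      round(tdp_lower_bound(p[sub], m, alpha), 2))
```

Both numbers are lower bounds on the fraction of truly active voxels in the respective sets, and reporting them post hoc is legitimate precisely because of the simultaneity.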


Jelle Goeman, Leiden University Medical Center, Leiden, ZH 

Non-parametric templates (theory) and Introducing the Sansouci package and Notip (practical)

Theoretical Session: Non-parametric templates

The weak information conveyed by standard cluster-level inference has motivated the use of post hoc estimates that allow statistically valid estimation of the proportion of activated voxels in clusters. In the context of fMRI data, the All-Resolutions Inference framework provides post hoc estimates of the proportion of activated voxels. However, this method relies on parametric threshold families, which results in conservative inference. In this talk, we will show how to leverage randomization methods to adapt to data characteristics and obtain tighter false discovery control. This leads to Notip, for Non-parametric True Discovery Proportion control: a powerful, non-parametric method that yields statistically valid guarantees on the proportion of activated voxels in data-derived clusters. Numerical experiments demonstrate substantial gains in the number of detections compared with state-of-the-art methods on dozens of fMRI datasets. The conditions under which the proposed method brings benefits are also discussed.
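The calibration idea can be sketched as follows. This is a hand-rolled, hedged illustration of learning the Simes template parameter from sign-flipping permutations on simulated data; it is not the sansouci/Notip API, and all names, sizes, and effect magnitudes are my own assumptions.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n, m, alpha, B = 30, 2_000, 0.05, 200            # subjects, voxels, level, permutations
X = rng.normal(0.0, 1.0, (n, m))
X[:, :100] += 0.7                                # 100 truly active columns

def pvals(Y):
    """One-sided one-sample t-test p-value per column."""
    t = Y.mean(0) / (Y.std(0, ddof=1) / np.sqrt(len(Y)))
    return stats.t.sf(t, df=len(Y) - 1)

# Calibrate the Simes template t_k = lambda * k / m: take lambda as the
# alpha-quantile, over sign flips, of min_k p_(k) * m / k
k = np.arange(1, m + 1)
lam_stats = []
for _ in range(B):
    p_perm = np.sort(pvals(X * rng.choice([-1, 1], (n, 1))))
    lam_stats.append(np.min(p_perm * m / k))
lam = np.quantile(lam_stats, alpha)

def tdp_bound(p_set, lam):
    """Post hoc TDP lower bound for any voxel set, using calibrated thresholds."""
    ps = np.sort(p_set)
    if ps.size == 0:
        return 0.0
    u = np.arange(1, len(ps) + 1)
    d = np.max(1 - u + np.searchsorted(ps, lam * u / m, side="right"))
    return max(d, 0) / len(ps)

p = pvals(X)
print(round(tdp_bound(p[p < 0.001], lam), 2))    # bound for a data-derived set
```

Because lambda is learned from the permutation distribution rather than fixed parametrically, the resulting thresholds adapt to the dependence structure of the data, which is the source of the power gains discussed above.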

Practical Session: Introducing the Sansouci package and Notip.

In this hands-on session we will explore Notip and how it guarantees FDP control, using a Python software suite for brain imaging.



Alexandre Blain, PhD student, Inria
Parietal Team
Palaiseau, Essonne 

Inference in General Linear Models and Generalized Linear Models (theory) and TDP Inference in regression (practical).

Theoretical Session: Inference in General Linear Models and Generalized Linear Models

In this session we will discuss how to extend post hoc inference for the False Discovery Proportion (FDP) to general linear and generalized linear models (GLMs). To do so we shall first give an overview of methods for resampling in linear models and how they can be used to perform multiple testing. We will show how these methods can be generalized to GLMs via sign-flipping of the score contributions. In each case we will show how resampling can be combined with post hoc inference bounds to provide simultaneous asymptotic control of the FDP over all subsets of hypotheses. We will demonstrate that resampling based approaches have a higher power than parametric methods in this context. We will use the HCP data to demonstrate how these methods can be applied in practice.
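As a minimal, hedged sketch of the sign-flipping idea, here is a single-covariate logistic model of my own construction (far simpler than the multiple-testing setting of the talk): the score contributions for testing the slope, computed under an intercept-only null fit, are sign-flipped to build the reference distribution.

```python
import numpy as np
from scipy.special import expit

rng = np.random.default_rng(3)
n = 200
x = rng.normal(size=n)
y = rng.binomial(1, expit(0.5 * x))              # binary outcome, true slope 0.5

# Score contributions for H0: slope = 0 under the intercept-only null model:
# the null fitted mean is just the overall rate, s_i = x_i * (y_i - mu_hat)
mu_hat = y.mean()
s = x * (y - mu_hat)

def stat(v):
    """Standardized score statistic."""
    return abs(v.sum()) / np.sqrt((v ** 2).sum())

t_obs = stat(s)
null = [stat(s * rng.choice([-1, 1], n)) for _ in range(2000)]
p_val = (1 + sum(t >= t_obs for t in null)) / (1 + len(null))
print(round(p_val, 3))
```

In the talk's setting the same flips are applied jointly across many hypotheses, so the resampled statistics preserve their dependence and can be combined with the post hoc bounds of the previous sessions.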

Practical Session: TDP Inference in regression

A practical session demonstrating how to perform resampling in general and generalized linear models, and in particular how to combine this with TDP inference. We will introduce the pyperm Python package and demonstrate how it can be used to perform multiple testing by combining it with the sansouci package from the previous session. Example applications to brain imaging datasets will be included. 


Samuel Davenport, University of California San Diego, La Jolla, CA 
United States

Spatial inference via confidence sets (theory)

With datasets like ABCD and UK Biobank, power is so high for some effects that every voxel/element will be significant even under stringent multiple testing corrections, yet we may still want to ask questions of spatial inference related to practical significance. For example: where is there at least a 1% BOLD change? Where is there a Cohen’s d of 0.1 or larger? In this talk we review methods for confidence sets, a 3D analog of confidence intervals: for each cluster we obtain ‘outer’ and ‘inner’ clusters that provide a notion of spatial confidence on where the true, noise-free signal exceeds the cluster-forming threshold. We will discuss the practical resampling methods used to produce these spatial confidence sets and illustrate the approach with several examples.  
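The inner/outer construction can be illustrated in one dimension. This hedged sketch uses a plain pointwise Gaussian quantile purely for brevity; the published methods instead calibrate the quantile by resampling the supremum of the error field on the estimated boundary, so treat this only as a cartoon of the geometry.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
n, m, c = 100, 400, 0.5                          # subjects, locations, threshold
grid = np.linspace(0, 1, m)
mu = np.exp(-((grid - 0.5) / 0.15) ** 2)         # true smooth signal, peak 1.0
Y = mu + rng.normal(0.0, 1.0, (n, m))

mean = Y.mean(0)
se = Y.std(0, ddof=1) / np.sqrt(n)
q = stats.norm.ppf(0.975)                        # simplification: pointwise quantile

outer = mean > c - q * se                        # plausibly exceeds the threshold c
inner = mean > c + q * se                        # confidently exceeds the threshold c
true_set = mu > c
# Target statement: with high confidence, inner ⊆ {mu > c} ⊆ outer
print(inner.sum(), true_set.sum(), outer.sum())
```

The inner set is the region we can assert exceeds the threshold, the outer set is the region we cannot rule out; the gap between them quantifies spatial uncertainty about the excursion set's boundary.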


Thomas Nichols, University of Oxford, Oxford 
United Kingdom

Spatial Bayesian models

Recent advances in computing power, Bayesian methods and spatial statistics have paved the way for moving beyond massive univariate analysis to models that account for spatial dependence across voxels/vertices. In spatial Bayesian models, a multivariate prior distribution encodes expected similarities in activation patterns between neighboring locations, resulting in higher accuracy and power. A major advantage of these models is that the joint posterior distribution across locations can be used to identify a collection of locations that are jointly activated with some specified posterior probability. This circumvents the need to correct for multiple comparisons and dramatically increases power to detect effects. Power is often sufficiently high that effect sizes (e.g. a 1% signal change) can be considered even in single-subject datasets. I will provide an overview of these models, explain the use of the joint posterior distribution for inference, and illustrate their application to HCP data. 
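A hedged sketch of how a joint posterior yields an excursion set follows. The posterior draws here are simulated stand-ins for draws from a fitted spatial Bayesian model, and the greedy construction is a simple illustration of the principle rather than the algorithm used in practice.

```python
import numpy as np

rng = np.random.default_rng(5)
n_draws, m, thr = 4000, 50, 0.0                  # posterior draws, locations, threshold
true_beta = np.where(np.arange(m) < 15, 1.0, 0.0)
draws = true_beta + rng.normal(0.0, 0.3, (n_draws, m))  # fake posterior draws

marg = (draws > thr).mean(0)                     # marginal P(beta_v > thr) per location
order = np.argsort(-marg)                        # most probable locations first

# Grow the largest set whose members are JOINTLY above thr with prob >= 0.95
active = np.zeros(m, dtype=bool)
joint = np.ones(n_draws, dtype=bool)             # draws where the current set all exceed thr
for v in order:
    cand = joint & (draws[:, v] > thr)
    if cand.mean() >= 0.95:
        active[v] = True
        joint = cand
    else:
        break

print(active.sum())                              # size of the jointly-activated set
```

The key contrast with frequentist correction is that the 95% statement is made once, about the whole set jointly, rather than being assembled from many marginal tests.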


Amanda Mejia, PhD, Indiana University
Department of Statistics
Bloomington, IN 
United States

The All-Resolutions Inference framework

In this session the All-Resolutions Inference (ARI) framework will be explained. ARI allows a flexible and interactive analysis of fMRI results with full family-wise error rate (FWER) control: you can interactively change the size/shape of clusters until you are happy with the resulting TDP, all while retaining full FWER control (i.e. allowing researchers to test confirmatory hypotheses). In this session we will use both this interactive approach and a more data-driven approach that searches for the largest clusters with a given TDP, using an R, Python, or Matlab implementation. 
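The ARI bound can be sketched in numpy as follows, using a naive O(m²) computation of Hommel's h on simulated data (fine at this toy size; the actual R, Python, and Matlab implementations used in the session compute it efficiently, and all names here are my own).

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)
m, alpha = 500, 0.05
z = rng.normal(0.0, 1.0, m)
z[:50] += 4.0                                    # 50 truly active voxels
p = stats.norm.sf(z)

def hommel_h(p, alpha):
    """Largest i such that the i largest p-values all lie above the Simes line."""
    ps = np.sort(p)
    m = len(ps)
    for i in range(m, 0, -1):
        j = np.arange(1, i + 1)
        if np.all(ps[m - i:] > j * alpha / i):
            return i
    return 0

def ari_tdp(p_set, h, alpha):
    """Simultaneous ARI lower bound on the TDP of any voxel set."""
    ps = np.sort(p_set)
    u = np.arange(1, len(ps) + 1)
    d = np.max(1 - u + np.searchsorted(ps, u * alpha / max(h, 1), side="right"))
    return max(d, 0) / len(ps)

h = hommel_h(p, alpha)
cluster = p < 0.001                              # set chosen after seeing the data
print(h, round(ari_tdp(p[cluster], h, alpha), 2))
```

Replacing m by the (smaller) Hommel constant h is what makes ARI sharper than the plain Simes bound, while the closed-testing argument keeps the bound simultaneously valid over all sets, which is what licenses the interactive cluster exploration described above.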


Xu Chen, Leiden University Medical Center, Leiden, ZH 

Spatial inference via confidence sets (practical)

Traditionally, uncertainty estimation in fMRI inference has primarily been concerned with how the signal magnitude at each voxel varies under repeated sampling; very little attention has been given to the variability in signal location. In this session, we provide a practical introduction to spatial confidence regions: regions that act as probabilistic bounds for the location of observed clusters and excursion sets.

The session covers the generation of confidence regions for excursion sets derived from %BOLD maps, standardized (Cohen’s d) effect size images, and conjunctions (overlaps) of both, and will use Jupyter notebooks to demonstrate a Python toolbox for confidence regions on a range of datasets. By the end of this workshop, participants should better understand why spatial confidence regions can be used to quantify uncertainty in fMRI inference and how to apply the method in practice. 


Thomas Maullin-Sapey, University of Oxford
Oxford, Oxfordshire 
United Kingdom