Benchmarking 200+ pairwise similarity measures for resting-state functional connectivity estimation

Submission Type:

Abstract Submission 


Connor Lane1, Florian Rupprecht2, Michael Milham1, Gregory Kiar3


1Child Mind Institute, New York, NY, 2Child Mind Institute, Brooklyn, NY, 3Child Mind Institute, Montreal, Quebec

First Author:

Connor Lane  
Child Mind Institute
New York, NY




A long-term goal in brain imaging is to leverage fMRI to inform clinical practice. The dominant approach for relating fMRI to clinical behavior relies on features based on resting-state functional connectivity [1]. However, the poor reliability of these features is a major barrier to clinical usefulness [2]. Recent efforts to improve functional-connectivity-based prediction have emphasized the importance of choosing a good brain parcellation and downstream prediction algorithm [3, 4]. In this work, we evaluate the measure of functional connectivity itself. Using the Python Toolkit of Statistics for Pairwise Interactions (PySPI) [5], we evaluate 215 measures for computing generalized functional connectomes. Our goal is to determine whether any of these measures should replace Pearson correlation as the standard functional connectivity measure used in behavior prediction.


We included six publicly available datasets with multi-session resting-state fMRI: BNU-2, HNU-1, NYU-1, and NKI test-retest from the CoRR initiative [6]; HCP S1200 test-retest [7]; and MSC [8]. Each dataset comprises 10–61 subjects, each scanned for 2–10 sessions, spanning a range of sites, scanner manufacturers, and protocols. We pre-processed each dataset using two analysis pipelines: fMRIPrep with XCP-D and the Glasser parcellation [9, 10], and C-PAC with the Schaefer-400 parcellation [11]. After pre-processing, we computed generalized functional connectivity matrices using 215 statistics of pairwise interactions (SPIs) from the PySPI toolkit [5]. While PySPI contains 250+ SPIs, the current analysis includes only the computationally tractable methods from the "fast" config. Furthermore, we imposed time and memory constraints (5 min, 16 GB).
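
As an illustration of what a generalized functional connectivity matrix is, the following minimal sketch computes one from parcellated time series using two of the simplest measures named in this abstract, Pearson and Spearman correlation. It is written in plain Python and is not the PySPI implementation; the function names (pearson, ranks, spearman, connectivity_matrix) and the toy data are hypothetical, and the sketch assumes a symmetric measure with a unit diagonal, which does not hold for every SPI.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    cov = sum(a * b for a, b in zip(dx, dy))
    norm = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    return cov / norm


def ranks(x):
    """1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    result = [0.0] * len(x)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and x[order[end + 1]] == x[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for i in order[start:end + 1]:
            result[i] = mean_rank
        start = end + 1
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def connectivity_matrix(timeseries, measure):
    """Apply a symmetric pairwise measure to every pair of parcels.

    timeseries: list of parcels, each a list of samples (assumed layout).
    """
    n = len(timeseries)
    matrix = [[1.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            value = measure(timeseries[i], timeseries[j])
            matrix[i][j] = matrix[j][i] = value
    return matrix


# Toy example: 4 "parcels" with 5 time points each (made-up numbers).
toy = [
    [1, 2, 3, 4, 5],
    [2, 4, 6, 8, 10],
    [1, 4, 9, 16, 25],
    [5, 3, 4, 1, 2],
]
fc_pearson = connectivity_matrix(toy, pearson)
fc_spearman = connectivity_matrix(toy, spearman)
```

Swapping the measure argument is the only change needed to move between connectivity definitions, which is the sense in which the connectome is "generalized" here.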

We evaluated the 215 SPIs in terms of two notions of reliability. First, we computed standard within-subject inter-session reliability using the intraclass correlation coefficient (ICC). This quantity, which we refer to as ICC_ses, measures the reliability of each connectivity edge across sessions for the same subject. Second, to assess how well each SPI captures "meaningful" structure, we also report a quantity we call ICC_sub, which measures the consistency of the overall connectome across subjects.
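
To make the edge-wise reliability computation concrete, the sketch below computes an ICC for a single connectivity edge from a subjects-by-sessions table. The abstract does not state which ICC variant was used, so this assumes the one-way random-effects form, ICC(1,1) = (MSB - MSW) / (MSB + (k - 1) MSW), with a balanced design of k sessions per subject. The function name icc_oneway and the example values are hypothetical.

```python
def icc_oneway(scores):
    """One-way random-effects ICC(1,1) for a balanced design.

    scores: list of subjects, each a list of k session values for one edge.
    Assumption: this ICC variant is chosen for illustration only.
    """
    n = len(scores)          # number of subjects
    k = len(scores[0])       # sessions per subject (assumed equal for all)
    subject_means = [sum(row) / k for row in scores]
    grand_mean = sum(subject_means) / n

    # Between-subject mean square.
    msb = k * sum((m - grand_mean) ** 2 for m in subject_means) / (n - 1)
    # Within-subject mean square.
    msw = sum(
        (value - mean) ** 2
        for row, mean in zip(scores, subject_means)
        for value in row
    ) / (n * (k - 1))

    denominator = msb + (k - 1) * msw
    if denominator == 0:
        # No variance at all: the ICC is undefined.
        return float("nan")
    return (msb - msw) / denominator


# Hypothetical edge strengths for 3 subjects scanned in 2 sessions each.
edge_scores = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
edge_icc = icc_oneway(edge_scores)
```

Under this assumption, values near 1 indicate that between-subject differences dominate session-to-session fluctuation for that edge; a summary over edges would then give one reliability score per SPI.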


Of the 215 SPIs considered, we find that only 102 compute successfully under the imposed time and memory constraints (Figure 1). The Pearson correlation baseline is among the most resource-efficient methods, with some SPIs requiring an order of magnitude more memory and four orders of magnitude more run time.
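
The abstract does not describe how the time and memory constraints were enforced. As one possible sketch, the snippet below measures the wall-clock time and peak traced memory of a single call and checks them against the stated budget after the fact. It uses only the standard library (time.perf_counter and tracemalloc); the names profile_call and within_budget are hypothetical, the 16 GB limit is interpreted as 16 GiB, and tracemalloc only sees allocations made through Python's memory allocators, so this is not a hard limit on a process.

```python
import time
import tracemalloc

TIME_LIMIT_S = 5 * 60                 # 5 min, as stated in the text
MEMORY_LIMIT_BYTES = 16 * 1024 ** 3   # 16 GB, read here as GiB (assumption)


def profile_call(func, *args, **kwargs):
    """Run func and return (result, elapsed seconds, peak traced bytes)."""
    tracemalloc.start()
    start = time.perf_counter()
    try:
        result = func(*args, **kwargs)
    finally:
        elapsed = time.perf_counter() - start
        _, peak = tracemalloc.get_traced_memory()
        tracemalloc.stop()
    return result, elapsed, peak


def within_budget(elapsed, peak,
                  time_limit=TIME_LIMIT_S,
                  memory_limit=MEMORY_LIMIT_BYTES):
    """True if a measured run stayed inside both limits."""
    return elapsed <= time_limit and peak <= memory_limit


# Example: profile a trivial stand-in for an SPI computation.
result, elapsed, peak = profile_call(sum, range(1000))
ok = within_budget(elapsed, peak)
```

A measurement like this can only flag an SPI as over budget once it finishes; aborting a long-running computation would need a separate mechanism such as a subprocess with a timeout.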

Among the 102 SPIs that compute successfully, we observe substantial variability in both inter-session and inter-subject reliability (Figure 2). For both metrics, the Pearson correlation baseline is among the 10 most reliable SPIs, and the numerical differences between the most reliable SPIs are small. Nevertheless, regularized covariance estimation methods (OAS, Ledoit-Wolf) and robust correlation statistics (Spearman, Kendall's tau) appear to hold a small numerical advantage under both reliability metrics. We also observe substantial variability between datasets, which is expected given the variety of data collection conditions. Variation due to pre-processing pipeline is also present, particularly in inter-subject ICC, though to a smaller degree than dataset variation.


In this work, we benchmarked 215 measures for estimating functional connectivity over 6 datasets and 2 pre-processing pipelines. Our initial results suggest that none of the measures considered convincingly outperforms the de facto standard, Pearson correlation, although we do observe weak support for more robust covariance and correlation measures. In continuing work, we plan to add evaluations of downstream behavior prediction performance and to expand the set of evaluated measures.

Modeling and Analysis Methods:

Classification and Predictive Modeling 2
Connectivity (e.g., functional, effective, structural)
fMRI Connectivity and Network Modeling 1
Task-Independent and Resting-State Analysis


Machine Learning
Other - functional connectivity

1|2 Indicates the priority used for review
Supporting Image: fig1_resource_usage.png
Supporting Image: fig2_reliability.png


[1] Chen, J., et al. (2022). Shared and unique brain network features predict cognitive, personality, and mental health scores in the ABCD study. Nature Communications.
[2] Milham, M. P., Vogelstein, J., & Xu, T. (2021). Removing the reliability bottleneck in functional magnetic resonance imaging research to achieve clinical utility. JAMA Psychiatry.
[3] Kong, R., et al. (2021). Individual-specific areal-level parcellations improve functional connectivity prediction of behavior. Cerebral Cortex.
[4] Dadi, K., et al. (2019). Benchmarking functional connectome-based predictive models for resting-state fMRI. NeuroImage.
[5] Cliff, O. M., Bryant, A. G., Lizier, J. T., Tsuchiya, N., & Fulcher, B. D. (2023). Unifying pairwise interactions in complex dynamics. Nature Computational Science.
[6] Zuo, X., et al. (2014). An open science resource for establishing reliability and reproducibility in functional connectomics. Scientific Data.
[7] Van Essen, D. C., et al. (2013). The WU-Minn Human Connectome Project: an overview. NeuroImage.
[8] Gordon, E. M., et al. (2017). Precision functional mapping of individual human brains. Neuron.
[9] Esteban, O., et al. (2019). fMRIPrep: a robust preprocessing pipeline for functional MRI. Nature Methods.
[10] Adebimpe, A., et al. (2023). XCP-D: a robust postprocessing pipeline of fMRI data.
[11] Craddock, C., et al. (2013). Towards automated analysis of connectomes: the configurable pipeline for the analysis of connectomes (C-PAC). Frontiers in Neuroinformatics.