Sample size, variance sources, and group-level covariance analysis

Christian Habeck, Presenter
Columbia University
New York, NY 
United States
 
Monday, Jun 24: 9:00 AM - 10:15 AM
Symposium 
COEX 
Room: Grand Ballroom 103 
Inter-subject covariance analysis is indispensable for the analysis of cross-sectional data sets and has been applied successfully in neuroimaging analytics, with out-of-sample prediction going back to the late 1980s. Inter-subject covariance is at the heart of many applications and has given rise to a multitude of clustering and multivariate-decomposition approaches. Current studies focus on deriving group-level covariance patterns without fitting deeper model architectures that might be prone to overfitting. Since prediction of endpoints such as cognitive performance or diagnostic scores is only one side of the coin in imaging neuroscience, studies have also focused on the precision and accuracy of the derived activation/connectivity patterns. This talk will present results from a variety of data modalities (task-based activation data with subjects numbering in the hundreds, United Kingdom Biobank volumetric data with subjects numbering in the thousands, and simulated synthetic data) that were used to probe the stability of estimated covariance patterns and out-of-sample prediction as a function of sample size. While increasing the sample size resulted in monotonic improvement of pattern stability and out-of-sample endpoint prediction, asymptotic limits were reached for all metrics before the available data were exhausted. This contradicts the assumption that any targeted group-level central tendency in the data becomes ever sharper as a larger number of subjects averages out noise sources. Instead, irreducible heterogeneity imposes limits on analytic frameworks oriented towards group-level covariance patterns. A simple toy model of synthetic covariance data with different noise parameters can elucidate the behaviour of topographic stability and held-out outcome prediction in real-world data as the training sample size is varied.
For medium sample sizes, on the order of hundreds of subjects, inter-subject covariance analysis performs well. For larger datasets, it represents a useful benchmark for more sophisticated deep-learning architectures that allow person-specific variation in topographic patterns.
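The kind of toy model mentioned in the abstract can be illustrated with a minimal sketch. This is not the presenter's model: the generative form (one shared pattern plus independent voxel noise and outcome noise), the parameter values, the use of the first principal component as the group-level pattern, and the function names `simulate`, `first_pc`, `evaluate`, and `sweep` are all assumptions made for illustration. Pattern stability is taken here as the absolute cosine between patterns derived from two independent training samples, and prediction as the correlation between predicted and observed outcomes in held-out subjects.

```python
import numpy as np


def simulate(n, pattern, rng, signal_sd=3.0, noise_sd=1.0, outcome_noise_sd=1.0):
    """Draw n synthetic subjects (assumed model, for illustration only)."""
    scores = rng.standard_normal(n)  # latent subject scores
    noise = noise_sd * rng.standard_normal((n, pattern.size))
    X = signal_sd * np.outer(scores, pattern) + noise  # subjects x voxels
    y = scores + outcome_noise_sd * rng.standard_normal(n)  # noisy outcome
    return X, y


def first_pc(X):
    """First principal component (unit norm) of the subject-centered data."""
    Xc = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return vt[0]


def evaluate(n_train, pattern, rng, n_test=2000):
    """Return (pattern stability, held-out prediction) for one training size."""
    Xa, ya = simulate(n_train, pattern, rng)
    Xb, _ = simulate(n_train, pattern, rng)
    Xt, yt = simulate(n_test, pattern, rng)

    pa, pb = first_pc(Xa), first_pc(Xb)
    # Both patterns have unit norm, so the dot product is the cosine;
    # abs() removes the arbitrary sign of a principal component.
    stability = abs(pa @ pb)

    # Regress the outcome on pattern expression in training sample A,
    # then apply the fitted line to held-out subjects.
    slope, intercept = np.polyfit(Xa @ pa, ya, 1)
    predicted = slope * (Xt @ pa) + intercept
    prediction = np.corrcoef(predicted, yt)[0, 1]
    return stability, prediction


def sweep(sample_sizes, n_voxels=200, n_reps=5, seed=0):
    """Average both metrics over n_reps repetitions per training size."""
    rng = np.random.default_rng(seed)
    pattern = rng.standard_normal(n_voxels)
    pattern /= np.linalg.norm(pattern)  # unit-norm "true" pattern
    results = {}
    for n in sample_sizes:
        runs = [evaluate(n, pattern, rng) for _ in range(n_reps)]
        results[n] = tuple(np.mean(runs, axis=0))
    return results


for n, (stab, pred) in sweep([20, 50, 100, 200, 500, 1000, 2000]).items():
    print(f"n={n:5d}  stability={stab:.3f}  prediction r={pred:.3f}")
```

Under these assumed parameters, even a perfectly recovered pattern yields a held-out correlation of only about 3/sqrt(20), roughly 0.67, because the voxel and outcome noise do not shrink with more training subjects. Adding training subjects improves the pattern estimate but cannot lift prediction past that ceiling, which is the qualitative behaviour the abstract describes. A model with person-specific patterns would be needed to represent heterogeneity more faithfully.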