Nipoppy: a framework for the organization and decentralized processing of neuroimaging-clinical data

Presented During:

Tuesday, June 25, 2024: 12:00 PM - 1:15 PM
COEX  
Room: Grand Ballroom 103  

Poster No:

2256 

Submission Type:

Abstract Submission 

Authors:

Michelle Wang¹, Nikhil Bhagwat¹, Brent McPherson¹, Alyssa Dai¹, Rémi Gau¹, Qing Wang², Jean-Baptiste Poline¹

Institutions:

¹McGill University, Montreal, Canada; ²Shanghai Mental Health Center, Shanghai, China

First Author:

Michelle Wang  
McGill University
Montreal, Canada

Co-Author(s):

Nikhil Bhagwat  
McGill University
Montreal, Canada
Brent McPherson  
McGill University
Montreal, Canada
Alyssa Dai  
McGill University
Montreal, Canada
Rémi Gau  
McGill University
Montreal, Canada
Qing Wang  
Shanghai Mental Health Center
Shanghai, China
Jean-Baptiste Poline  
McGill University
Montreal, Canada

Introduction:

Many existing software platforms for reproducible neuroimaging data processing are centralized (i.e., they require data to be uploaded to a third-party server) [1,2], which is not always possible because of data privacy and ownership concerns. Moreover, processing prospective studies with ongoing data collection is challenging: because different software tools and versions can produce different results [3], new data must be processed in the same way as the data that have already been processed. We introduce Nipoppy, a collaborative and open framework that supports decentralized processing of ongoing studies with neuroimaging and clinical data. Nipoppy aims to facilitate every stage of data organization and processing, to be flexible and extensible enough to handle various types of datasets and pipelines, and to promote methodological transparency and the reusability of neuroimaging-clinical data.

Methods:

Nipoppy specifies a standardized workflow for dataset organization and processing, along with a dataset structure covering tabular data as well as raw and derived imaging data (Fig. 1). The framework is built around two user-provided files: a configuration file for running analyses, and a manifest file listing the participant IDs, visits, and imaging datatypes available in the dataset. A series of automatically generated tabular files records data availability and processing status for each participant-visit pair and each processing pipeline. These files are used to identify new participants and visits that have not yet been processed, so that the pipelines previously used for the dataset can then be run on the new data.
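As a rough illustration of how a manifest and a processing-status file can be combined to find data that still need processing, the Python sketch below compares the two tables for a single pipeline version. It is not Nipoppy's own code or file format: the column names (`participant_id`, `visit`, `datatype`, `pipeline_name`, `pipeline_version`, `status`), the `SUCCESS` status value, the version string, and the `find_pending` helper are all hypothetical.

```python
import csv
import io

# Hypothetical manifest: one row per participant-visit, with available datatypes.
MANIFEST_TSV = """participant_id\tvisit\tdatatype
01\tBL\tanat,func
01\tM12\tanat,func
02\tBL\tanat
02\tM12\tanat
"""

# Hypothetical status file: one row per participant-visit-pipeline-version.
STATUS_TSV = """participant_id\tvisit\tpipeline_name\tpipeline_version\tstatus
01\tBL\tfmriprep\t23.1.3\tSUCCESS
01\tM12\tfmriprep\t23.1.3\tFAIL
02\tBL\tfmriprep\t23.1.3\tSUCCESS
"""


def read_tsv(text):
    """Parse tab-separated text into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


def find_pending(manifest, status, pipeline, version, required_datatype):
    """Return (participant_id, visit) pairs that have the required datatype
    but no successful run of the given pipeline version."""
    done = {
        (row["participant_id"], row["visit"])
        for row in status
        if row["pipeline_name"] == pipeline
        and row["pipeline_version"] == version
        and row["status"] == "SUCCESS"
    }
    pending = []
    for row in manifest:
        key = (row["participant_id"], row["visit"])
        # Skip pairs lacking the input data this pipeline needs.
        if required_datatype in row["datatype"].split(",") and key not in done:
            pending.append(key)
    return pending


if __name__ == "__main__":
    manifest = read_tsv(MANIFEST_TSV)
    status = read_tsv(STATUS_TSV)
    print(find_pending(manifest, status, "fmriprep", "23.1.3", "func"))
```

In this toy data, participant 01 has a failed M12 run and participant 02 has only anatomical data, so only `("01", "M12")` is returned for a `func`-based pipeline. A newly released visit would appear in the manifest without a matching status row and would be selected the same way.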

We provide tools for every step of a neuroimaging data processing workflow, including conversion of raw scanner output to the Brain Imaging Data Structure (BIDS) standard [4], processing of neuroimaging data, tracking of processing completion status, and extraction of imaging-derived phenotypes (IDPs) from pipeline outputs. The Boutiques framework [5] provides a harmonized interface for running pipelines on BIDS data, and users can add their own processing pipelines by creating Boutiques descriptor files for their software. Parameter values for each pipeline and version are stored in Boutiques invocation files, so that data provenance is recorded. Processing status information for all pipelines and versions is combined into a single tabular file, which can be uploaded to a user-friendly web dashboard for interactive visualization of processing progress (https://digest.neurobagel.org/).
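To make the idea of a harmonized interface more concrete, the sketch below shows, in simplified form, how a Boutiques-style command-line template can be filled in from an invocation that maps input IDs to parameter values. The field names follow the Boutiques descriptor schema (`command-line`, `inputs`, `id`, `value-key`, `command-line-flag`, `optional`, `list`), but the fMRIPrep descriptor shown is a hypothetical, heavily trimmed example, and `render_command` is an illustrative helper, not part of Boutiques or Nipoppy. It also assumes each value-key is a separate whitespace-delimited token; the real Boutiques tooling (`bosh`) validates types and handles many more cases.

```python
import json
import shlex

# Hypothetical, trimmed descriptor using Boutiques schema field names.
DESCRIPTOR = {
    "name": "fmriprep",
    "tool-version": "23.1.3",
    "command-line": "fmriprep [BIDS_DIR] [OUTPUT_DIR] [ANALYSIS_LEVEL] [PARTICIPANT_LABEL] [NTHREADS]",
    "inputs": [
        {"id": "bids_dir", "type": "String", "value-key": "[BIDS_DIR]"},
        {"id": "output_dir", "type": "String", "value-key": "[OUTPUT_DIR]"},
        {"id": "analysis_level", "type": "String", "value-key": "[ANALYSIS_LEVEL]"},
        {"id": "participant_label", "type": "String", "list": True, "optional": True,
         "value-key": "[PARTICIPANT_LABEL]", "command-line-flag": "--participant-label"},
        {"id": "nthreads", "type": "Number", "optional": True,
         "value-key": "[NTHREADS]", "command-line-flag": "--nthreads"},
    ],
}

# Hypothetical invocation: input IDs mapped to the values used for one run.
INVOCATION = {
    "bids_dir": "/data/bids",
    "output_dir": "/data/derivatives/fmriprep/23.1.3",
    "analysis_level": "participant",
    "participant_label": ["01", "02"],
}


def render_command(descriptor, invocation):
    """Fill the command-line template, assuming each value-key is its own token."""
    inputs_by_key = {inp["value-key"]: inp for inp in descriptor["inputs"]}
    args = []
    for token in descriptor["command-line"].split():
        inp = inputs_by_key.get(token)
        if inp is None:
            args.append(token)  # literal part of the template, e.g. the executable
            continue
        value = invocation.get(inp["id"])
        if value is None:
            if not inp.get("optional", False):
                raise ValueError(f"missing required input: {inp['id']}")
            continue  # omitted optional inputs contribute no arguments
        if "command-line-flag" in inp:
            args.append(inp["command-line-flag"])
        values = value if isinstance(value, list) else [value]
        args.extend(str(v) for v in values)
    return args


if __name__ == "__main__":
    # The invocation doubles as a provenance record of the parameters used.
    print(json.dumps(INVOCATION, indent=2))
    print(shlex.join(render_command(DESCRIPTOR, INVOCATION)))
```

Returning an argument list rather than a single shell string avoids quoting problems if the command is later passed to `subprocess.run`. Keeping one invocation file per pipeline version next to its outputs records which parameters produced those outputs.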
Supporting Image (Fig. 1): Overview of Nipoppy dataset structure
 

Results:

Nipoppy has been successfully used to process two longitudinal Parkinson's disease cohorts: the Parkinson's Progression Markers Initiative (PPMI) and the Quebec Parkinson Network (QPN) (Fig. 2). Both datasets have so far been processed with fMRIPrep [6] and MRIQC [7], and additional pipelines such as TractoFlow [8] and micapipe [9] are planned for the near future. Both datasets continue to release new data, and Nipoppy is able to process newly added data with the same pipelines and configurations as the existing data. Outputs from each pipeline are organized into clearly labelled directories, which allows tracking to be performed automatically for each pipeline and version. Extractors for common IDPs (e.g., FreeSurfer statistics) are available for the currently integrated pipelines.
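Because outputs are written to a separate directory for each pipeline and version, completion can in principle be tracked by checking for expected output files. The sketch below assumes a hypothetical layout (`<root>/<pipeline>/<version>/sub-<participant>/ses-<visit>/...`), hypothetical expected-file lists and version strings, and hypothetical status labels (`SUCCESS`, `INCOMPLETE`, `UNAVAILABLE`); it is not Nipoppy's actual tracker. Rows for all pipelines and versions are then combined into one long-format TSV table, similar in spirit to the single status file described in the Methods.

```python
import csv
import io
import tempfile
from pathlib import Path

# Hypothetical expected outputs per (pipeline, version). "{p}" and "{v}" are
# replaced with the participant ID and visit; paths are relative to the
# pipeline/version output directory.
EXPECTED_OUTPUTS = {
    ("mriqc", "23.1.0"): [
        "sub-{p}/ses-{v}/anat/sub-{p}_ses-{v}_T1w.json",
    ],
    ("fmriprep", "23.1.3"): [
        "sub-{p}/ses-{v}/anat/sub-{p}_ses-{v}_desc-preproc_T1w.nii.gz",
        "sub-{p}/ses-{v}/func/sub-{p}_ses-{v}_task-rest_desc-confounds_timeseries.tsv",
    ],
}

STATUS_COLUMNS = ["participant_id", "visit", "pipeline_name", "pipeline_version", "status"]


def track(derivatives_root, participant_visits):
    """Return one status row per participant-visit for every pipeline version."""
    rows = []
    for (pipeline, version), templates in EXPECTED_OUTPUTS.items():
        output_dir = Path(derivatives_root) / pipeline / version
        for participant, visit in participant_visits:
            found = [
                (output_dir / template.format(p=participant, v=visit)).exists()
                for template in templates
            ]
            if all(found):
                status = "SUCCESS"
            elif any(found):
                status = "INCOMPLETE"  # some, but not all, expected files exist
            else:
                status = "UNAVAILABLE"  # no expected files found for this pair
            values = [participant, visit, pipeline, version, status]
            rows.append(dict(zip(STATUS_COLUMNS, values)))
    return rows


def to_tsv(rows):
    """Serialize status rows from all pipelines into a single TSV table."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=STATUS_COLUMNS, delimiter="\t", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        # Simulate one finished MRIQC run for participant 01, visit BL.
        done = Path(root, "mriqc", "23.1.0", "sub-01", "ses-BL", "anat", "sub-01_ses-BL_T1w.json")
        done.parent.mkdir(parents=True)
        done.touch()
        print(to_tsv(track(root, [("01", "BL"), ("02", "BL")])))
```

File-existence checks alone cannot distinguish a crashed run from one that has not started; a real tracker would likely also inspect logs or exit codes.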
Supporting Image (Fig. 2): PPMI and QPN processing dashboard
 

Conclusions:

Nipoppy can be used to establish a decentralized data processing network, where research centres or laboratories each process their own datasets following the same general workflow. Efficient data sharing within such a network can be achieved through Nipoppy's data organization standard. Nipoppy also creates files compatible with the Neurobagel ecosystem for distributed dataset harmonization and search (https://neurobagel.org/).

Neuroinformatics and Data Sharing:

Databasing and Data Sharing 2
Workflows 1

Keywords:

Data analysis
Data Organization
MRI
Open-Source Code
Open-Source Software
Workflows

1|2 Indicates the priority used for review

References:

[1] Sherif, T., et al. (2014). CBRAIN: A web-based, distributed computing platform for collaborative neuroimaging research. Frontiers in Neuroinformatics, 8, 54. https://doi.org/10.3389/fninf.2014.00054
[2] Hayashi, S., et al. (2023). brainlife.io: A decentralized and open source cloud platform to support neuroscience research (arXiv:2306.02183). arXiv. https://arxiv.org/abs/2306.02183
[3] Bhagwat, N., et al. (2021). Understanding the impact of preprocessing pipelines on neuroimaging cortical surface analyses. GigaScience, 10(1), giaa155. https://doi.org/10.1093/gigascience/giaa155
[4] Gorgolewski, K. J., et al. (2016). The brain imaging data structure, a format for organizing and describing outputs of neuroimaging experiments. Scientific Data, 3, 160044. https://doi.org/10.1038/sdata.2016.44
[5] Glatard, T., et al. (2018). Boutiques: A flexible framework for automated application integration in computing platforms. GigaScience, 7(5), giy016. https://doi.org/10.1093/gigascience/giy016
[6] Esteban, O., et al. (2019). fMRIPrep: A robust preprocessing pipeline for functional MRI. Nature Methods, 16(1), 111–116. https://doi.org/10.1038/s41592-018-0235-4
[7] Esteban, O., et al. (2017). MRIQC: Advancing the automatic prediction of image quality in MRI from unseen sites. PLOS ONE, 12(9), e0184661. https://doi.org/10.1371/journal.pone.0184661
[8] Theaud, G., et al. (2020). TractoFlow: A robust, efficient and reproducible diffusion MRI pipeline leveraging Nextflow & Singularity. NeuroImage, 218, 116889. https://doi.org/10.1016/j.neuroimage.2020.116889
[9] Cruces, R. R., et al. (2022). Micapipe: A pipeline for multimodal neuroimaging and connectome analysis. NeuroImage, 263, 119612. https://doi.org/10.1016/j.neuroimage.2022.119612