Poster No:
1821
Submission Type:
Abstract Submission
Authors:
Sebastian Urchs1, Alyssa Dai1, Arman Jahanpour1, Michelle Wang1, Mathieu Dugré2, Nikhil Bhagwat1, Brent McPherson1, Sean Hatton3, David Keator4, Jeffrey Grethe3, Satra Ghosh5, David Kennedy6, Yaroslav Halchenko7, Mallar Chakravarty1, Jean-Baptiste Poline1
Institutions:
1McGill University, Montreal, Quebec, 2Concordia University, Montreal, Quebec, 3University of California, San Diego, San Diego, CA, 4Change Your Life Foundation, Costa Mesa, CA, 5Massachusetts Institute of Technology, Cambridge, MA, 6University of Massachusetts Chan Medical School, Worcester, MA, 7Dartmouth College, Hanover, NH
Introduction:
The growth in available neuroimaging data, driven by the success of data platforms like OpenNeuro [1] or the Canadian Open Neuroscience Platform [2], and the adoption of standards for raw data like the Brain Imaging Data Structure (BIDS [3]), create a need for tools to find relevant data across the platforms where they are stored. Building on the harmonization enabled by BIDS and related standards for phenotypic data, like NIDM [4] and CogAtlas [5], Neurobagel is an ecosystem for decentralized data discovery and access. It empowers users to find cohorts of participants across a growing list of open and restricted-access data repositories. Many of the publicly accessible datasets have been processed into analysis-ready data, but there is no easy way to search for subjects with these processed data across platforms. This is all the more true for data that are not openly available, but require some form of controlled access (e.g. as part of a consortium). To address this need, we have expanded Neurobagel to enable discovery of imaging derivatives.
Unlike raw data, derived imaging data are barely standardized: different processing pipelines generate output files with different names and structures. This raises two harmonization challenges. 1) We need a common way to describe whether derivatives were successfully generated, regardless of the pipeline. For this, we rely on the Nipoppy [6] project, which maintains a library of automatic pipeline output trackers that generate a standardized tabular availability record. Because this status file has a simple structure, it can also be created manually without Nipoppy. 2) We need a curated catalog of pipeline names and their versions so that derivative availability information can be harmonized. To our knowledge, no such catalog exists yet. To address this, we are initiating a community-driven open pipeline catalog.
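As a rough illustration of what such a tabular availability record could look like, the following Python sketch writes a small tab-separated status table and reads it back. The column names, status values, and example rows are assumptions made for illustration and may not match the actual Nipoppy status file.

```python
import csv
import io

# Hypothetical column names; the real Nipoppy status file may differ.
COLUMNS = ["participant_id", "session_id", "pipeline_name", "pipeline_version", "status"]

# Illustrative rows: one row per participant, session, and pipeline run.
ROWS = [
    {"participant_id": "sub-01", "session_id": "ses-01",
     "pipeline_name": "freesurfer", "pipeline_version": "7.3.2", "status": "SUCCESS"},
    {"participant_id": "sub-02", "session_id": "ses-01",
     "pipeline_name": "freesurfer", "pipeline_version": "7.3.2", "status": "FAIL"},
    {"participant_id": "sub-01", "session_id": "ses-01",
     "pipeline_name": "fmriprep", "pipeline_version": "23.1.3", "status": "SUCCESS"},
]


def write_status_table(rows):
    """Serialize rows to a tab-separated string with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS, delimiter="\t", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def completed(table_text, pipeline_name):
    """Return sorted participant IDs whose run of the given pipeline succeeded."""
    reader = csv.DictReader(io.StringIO(table_text), delimiter="\t")
    return sorted(
        {row["participant_id"] for row in reader
         if row["pipeline_name"] == pipeline_name and row["status"] == "SUCCESS"}
    )


table = write_status_table(ROWS)
print(completed(table, "freesurfer"))
```

A flat table of this kind is simple enough to be assembled by hand or by a script, which is the property the text above relies on.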
Methods:
We have started a catalog of known, named processing pipelines and their versions as a public GitHub repository [7] that can be easily expanded through pull requests. Each pipeline in the catalog is unambiguously identified by the URL of its definition on GitHub.
The Neurobagel command line interface (CLI) is a containerized Python tool to extract harmonized representations of subjects in local datasets for the purpose of discovery. We have updated the CLI to validate and read Nipoppy status files to extract derivative availability information. The CLI validates a provided status file against a Nipoppy schema to ensure that the structure is correct. It then validates the pipelines and their versions described in the file against the pipeline catalog.
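The two validation stages described above can be sketched as follows. This is a minimal illustration, not the Neurobagel CLI: the required columns, the catalog contents, and all function names are hypothetical, and the real tool validates against a Nipoppy schema rather than a hand-written column check.

```python
# Hypothetical required columns; the real schema may differ.
REQUIRED_COLUMNS = {"participant_id", "session_id", "pipeline_name",
                    "pipeline_version", "status"}

# Hypothetical stand-in for the pipeline catalog: name -> known versions.
CATALOG = {
    "freesurfer": {"7.3.2"},
    "fmriprep": {"23.1.3"},
}


def validate_structure(rows):
    """Stage 1: every row must have a non-empty value for each required column."""
    errors = []
    for index, row in enumerate(rows):
        missing = sorted(col for col in REQUIRED_COLUMNS if not row.get(col))
        if missing:
            errors.append(f"row {index}: missing {', '.join(missing)}")
    return errors


def validate_against_catalog(rows, catalog):
    """Stage 2: every pipeline name and version must be listed in the catalog."""
    errors = []
    for index, row in enumerate(rows):
        name, version = row["pipeline_name"], row["pipeline_version"]
        if name not in catalog:
            errors.append(f"row {index}: unknown pipeline '{name}'")
        elif version not in catalog[name]:
            errors.append(f"row {index}: unknown version '{version}' of '{name}'")
    return errors


def validate(rows, catalog=CATALOG):
    """Run the structural check first; consult the catalog only if it passes."""
    errors = validate_structure(rows)
    return errors or validate_against_catalog(rows, catalog)


good_row = {"participant_id": "sub-01", "session_id": "ses-01",
            "pipeline_name": "freesurfer", "pipeline_version": "7.3.2",
            "status": "SUCCESS"}
print(validate([good_row]))
```

Running the structural check before the catalog lookup means the second stage can assume that the pipeline name and version fields are present.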
Finally, we have updated the decentralized Neurobagel nodes and the graphical query tool to support queries for imaging derivatives across datasets.

Figure: The public Neurobagel query portal shows a result for a derivative query of data processed with FreeSurfer.
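To make the idea of a derivative query across decentralized nodes more concrete, the sketch below sends the same query to several in-memory "nodes" and merges per-dataset participant counts. It does not reproduce the Neurobagel API; the node names, record fields, and function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical in-memory stand-in for records held by separate nodes.
NODES = {
    "node_a": [
        {"dataset": "dataset-1", "participant_id": "sub-01",
         "pipeline_name": "freesurfer", "pipeline_version": "7.3.2"},
        {"dataset": "dataset-1", "participant_id": "sub-02",
         "pipeline_name": "freesurfer", "pipeline_version": "7.3.2"},
        {"dataset": "dataset-1", "participant_id": "sub-02",
         "pipeline_name": "fmriprep", "pipeline_version": "23.1.3"},
    ],
    "node_b": [
        {"dataset": "dataset-2", "participant_id": "sub-01",
         "pipeline_name": "freesurfer", "pipeline_version": "7.3.2"},
        {"dataset": "dataset-2", "participant_id": "sub-03",
         "pipeline_name": "fmriprep", "pipeline_version": "23.1.3"},
    ],
}


def query_node(records, pipeline_name, pipeline_version=None):
    """Count distinct matching participants per dataset within one node."""
    matches = defaultdict(set)
    for record in records:
        if record["pipeline_name"] != pipeline_name:
            continue
        if pipeline_version is not None and record["pipeline_version"] != pipeline_version:
            continue
        matches[record["dataset"]].add(record["participant_id"])
    # Only counts leave the node; participant IDs stay local.
    return {dataset: len(ids) for dataset, ids in matches.items()}


def federated_query(nodes, pipeline_name, pipeline_version=None):
    """Send the same query to every node and merge the aggregated counts."""
    results = {}
    for node_name, records in nodes.items():
        node_counts = query_node(records, pipeline_name, pipeline_version)
        for dataset, count in node_counts.items():
            results[(node_name, dataset)] = count
    return results


print(federated_query(NODES, "freesurfer"))
```

Returning counts rather than participant identifiers mirrors the aggregated results mentioned for restricted-access datasets.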
Results:
As an initial proof of concept, we extracted derivative information for two local datasets (N=4690) that had been processed using three different pipelines and for which Nipoppy status files had already been generated. In addition, we added the extracted derivative information for the Quebec Parkinson Network dataset to a public Neurobagel node, where the dataset can be queried, with results returned in aggregate form [8].
Conclusions:
The ability to query for cohorts of participants with analysis-ready data will help improve the reuse of processed data, particularly for large datasets that take considerable time and energy to process. The workflow presented here to make these data findable can be largely automated, thanks to the growing set of automatic Nipoppy extractors for different pipelines. We therefore plan to invite other data owners to make their derived data findable alongside their raw data. For example, a substantial share of the datasets on OpenNeuro has been processed with fMRIPrep, which would make them a natural addition. In parallel, we will invite the community to expand the initial pipeline catalog to facilitate the standardized description of imaging derivatives.
Neuroinformatics and Data Sharing:
Databasing and Data Sharing 1
Workflows 2
Keywords:
Data analysis
Data Organization
Workflows
Other - Decentralized data
1|2 indicates the priority used for review
Please indicate below if your study was a "resting state" or "task-activation" study.
Other
Healthy subjects only or patients (note that patient studies may also involve healthy subjects):
Patients
Was this research conducted in the United States?
No
Were any human subjects research approved by the relevant Institutional Review Board or ethics panel?
NOTE: Any human subjects studies without IRB approval will be automatically rejected.
Not applicable
Were any animal research approved by the relevant IACUC or other animal research panel?
NOTE: Any animal studies without IACUC approval will be automatically rejected.
Not applicable
Please indicate which methods were used in your research:
Other, please specify: Data harmonization
Which processing packages did you use for your study?
FreeSurfer
References:
1. Markiewicz CJ, Gorgolewski KJ, Feingold F, Blair R, Halchenko YO, Miller E, et al. The OpenNeuro resource for sharing of neuroscience data. Elife. 2021;10. doi:10.7554/eLife.71774
2. Poline J-B, Das S, Glatard T, Madjar C, Dickie EW, Lecours X, et al. Data and Tools Integration in the Canadian Open Neuroscience Platform. Sci Data. 2023;10: 189.
3. Gorgolewski KJ, Auer T, Calhoun VD, Craddock RC, Das S, Duff EP, et al. The brain imaging data structure, a format for organizing and describing outputs of neuroimaging experiments. Sci Data. 2016;3: 160044.
4. Maumet C, Auer T, Bowring A, Chen G, Das S, Flandin G, et al. Sharing brain mapping statistical results with the neuroimaging data model. Sci Data. 2016;3: 160102.
5. Miller E, Seppa C, Kittur A, Sabb F, Poldrack R. The Cognitive Atlas: Employing Interaction Design Processes to Facilitate Collaborative Ontology Creation. Nature Precedings. 2010; 1–1.
6. Bhagwat N, Wang M, McPherson B, Gonzalez Pepe I, Poline J-B. Nipoppy: A lightweight neuroimaging workflow manager. 2023. doi:10.5281/zenodo.8084760
7. pipeline-catalog: A list of pipelines and their versions that are recognized by Nipoppy. GitHub. Available: https://github.com/nipoppy/pipeline-catalog
8. Neurobagel Query Tool - Quebec Parkinson Network. [cited 15 Dec 2024]. Available: https://query.neurobagel.org/?node=Quebec+Parkinson+Network