Integration of the Single-Cell RNA Sequencing scRNAbox Pipeline in CBRAIN

Poster No:

1830 

Submission Type:

Late-Breaking Abstract Submission 

Authors:

Natacha Beck1,2, Michael Fiorini3,2, Rhalena Thomas3,2, Pierre Rioux1,2, Serge Boroday1,2, Darcy Quesnel1,2, Xuan Pham1,2, Reza Adalat1,2, Bryan Caron1,2, Sali Farhan3,2, Alan Evans1,2

Institutions:

1McGill Centre for Integrative Neuroscience (MCIN), Montréal, Québec, Canada, 2Montreal Neurological Institute (MNI), McGill University, Montréal, Québec, Canada, 3Department of Neurology and Neurosurgery, McGill University, Montréal, Québec, Canada

First Author:

Natacha Beck  
McGill Centre for Integrative Neuroscience (MCIN)|Montreal Neurological Institute (MNI), McGill University
Montréal, Québec, Canada|Montréal, Québec, Canada

Co-Author(s):

Michael Fiorini  
Department of Neurology and Neurosurgery, McGill University|Montreal Neurological Institute (MNI), McGill University
Montréal, Québec, Canada|Montréal, Québec, Canada
Rhalena Thomas  
Department of Neurology and Neurosurgery, McGill University|Montreal Neurological Institute (MNI), McGill University
Montréal, Québec, Canada|Montréal, Québec, Canada
Pierre Rioux  
McGill Centre for Integrative Neuroscience (MCIN)|Montreal Neurological Institute (MNI), McGill University
Montréal, Québec, Canada|Montréal, Québec, Canada
Serge Boroday  
McGill Centre for Integrative Neuroscience (MCIN)|Montreal Neurological Institute (MNI), McGill University
Montréal, Québec, Canada|Montréal, Québec, Canada
Darcy Quesnel  
McGill Centre for Integrative Neuroscience (MCIN)|Montreal Neurological Institute (MNI), McGill University
Montréal, Québec, Canada|Montréal, Québec, Canada
Xuan Pham  
McGill Centre for Integrative Neuroscience (MCIN)|Montreal Neurological Institute (MNI), McGill University
Montréal, Québec, Canada|Montréal, Québec, Canada
Reza Adalat  
McGill Centre for Integrative Neuroscience (MCIN)|Montreal Neurological Institute (MNI), McGill University
Montréal, Québec, Canada|Montréal, Québec, Canada
Bryan Caron  
McGill Centre for Integrative Neuroscience (MCIN)|Montreal Neurological Institute (MNI), McGill University
Montréal, Québec, Canada|Montréal, Québec, Canada
Sali Farhan  
Department of Neurology and Neurosurgery, McGill University|Montreal Neurological Institute (MNI), McGill University
Montréal, Québec, Canada|Montréal, Québec, Canada
Alan Evans  
McGill Centre for Integrative Neuroscience (MCIN)|Montreal Neurological Institute (MNI), McGill University
Montréal, Québec, Canada|Montréal, Québec, Canada

Late Breaking Reviewer(s):

Giulia Baracchini  
The University of Sydney
Sydney, New South Wales
Andreia Faria  
Johns Hopkins University
Baltimore, MD
Wei Zhang  
Washington University in St. Louis
Saint Louis, MO

Introduction:

CBRAIN (Sherif et al., 2014) (https://cbrain.ca) is an open-source, web-based, collaborative research software platform designed to address major challenges in big data research. CBRAIN allows scientists to launch large-scale analyses with advanced scientific tools through an easy-to-use web interface, removing the steep learning curve and pitfalls of a complex command-line environment. CBRAIN makes available over 160 pre-configured analysis pipelines for neuroimaging and genetics. These include scientific tools for single-cell transcriptomics such as scRNAbox (Thomas et al., 2024), a single-cell RNA sequencing (scRNAseq) pipeline.

As new scientific tools and pipelines are developed, the capability to integrate them seamlessly into CBRAIN is crucial. The integration process has been streamlined and uses Boutiques (Glatard et al., 2018) to describe a tool's command line and parameters in a JSON descriptor. Boutiques offers multiple parameter validation features and enables clearer form validation messages. From the descriptor, CBRAIN automatically builds the user interface for each tool, allowing users to select application parameters while parameter validation is performed automatically. Containers, namely Docker and Apptainer (Kurtzer et al., 2017), are used to encapsulate the tool.
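To make the descriptor-driven approach concrete, the following is a minimal sketch in Python. It assumes a hypothetical, heavily trimmed descriptor: the tool name, command line, and input ids are invented for illustration and are not taken from the actual scRNAbox descriptor. The field names (`command-line`, `inputs`, `value-key`, `minimum`, `maximum`, `integer`, `optional`) follow the Boutiques schema, but the `validate` and `build_command` helpers are simplified stand-ins, not the Boutiques validator or CBRAIN's own code.

```python
# Hypothetical, trimmed Boutiques-style descriptor. The tool name, command
# line, and input ids are invented for illustration; this is not the real
# scRNAbox descriptor.
DESCRIPTOR = {
    "name": "example_pipeline",
    "tool-version": "0.1",
    "description": "Illustrative descriptor for a step-based pipeline",
    "schema-version": "0.5",
    "command-line": "run_pipeline --step [STEP] --input [INPUT_DIR]",
    "inputs": [
        {"id": "step", "name": "Pipeline step", "type": "Number",
         "integer": True, "minimum": 1, "maximum": 8, "value-key": "[STEP]"},
        {"id": "input_dir", "name": "Input directory", "type": "File",
         "value-key": "[INPUT_DIR]"},
    ],
}


def validate(descriptor, params):
    """Return a list of human-readable errors (empty if params are valid).

    Simplified stand-in for descriptor-based validation: it checks only
    required inputs and numeric constraints.
    """
    errors = []
    for spec in descriptor["inputs"]:
        pid = spec["id"]
        if pid not in params:
            if not spec.get("optional", False):
                errors.append(f"{pid}: required parameter is missing")
            continue
        value = params[pid]
        if spec["type"] == "Number":
            if isinstance(value, bool) or not isinstance(value, (int, float)):
                errors.append(f"{pid}: expected a number")
            elif spec.get("integer") and not float(value).is_integer():
                errors.append(f"{pid}: expected an integer")
            elif "minimum" in spec and value < spec["minimum"]:
                errors.append(f"{pid}: must be at least {spec['minimum']}")
            elif "maximum" in spec and value > spec["maximum"]:
                errors.append(f"{pid}: must be at most {spec['maximum']}")
    return errors


def build_command(descriptor, params):
    """Substitute each input's value-key in the command-line template."""
    command = descriptor["command-line"]
    for spec in descriptor["inputs"]:
        command = command.replace(spec["value-key"], str(params[spec["id"]]))
    return command
```

Because the constraints live in the descriptor rather than in tool-specific code, the same generic validation and form-building logic can serve every integrated tool.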

Occasionally, integrating a tool into CBRAIN via a Boutiques descriptor requires additional programming to define custom modules that are used by the CBRAIN framework. To integrate scRNAbox, two new custom modules were implemented.

Methods:

The scRNAbox pipeline leverages the Seurat framework (Hao et al., 2021) and incorporates eight analytical steps into a comprehensive scRNAseq analysis (Fig. 1). With the integration into CBRAIN, users are able to perform all these steps through the CBRAIN web interface.

To integrate the pipeline into CBRAIN, a wrapper was created (https://doi.org/10.5281/zenodo.14945083) to prepare the input files and to create a config file that summarises the variables and options needed for pipeline execution.

The integration of scRNAbox raised two main issues:
- Input management: scRNAbox mutates the input given to the pipeline, because each step adds its output to the input received from the previous step. This behaviour can cause problems in CBRAIN when multiple tasks run on the same input data: tasks may start from different inputs, or work on data in the cache that has been mutated by other tasks.
- Resource management: each step has different requirements in terms of memory usage and CPU time. For example, Step 2 needs only 16 GB of memory, whereas Step 7 needs up to 150 GB.

Fig. 1 (Fig1.png): Single-cell tool for CBRAIN integration: scRNAbox – scRNAseq Pipeline

Results:

To solve the input management problem, a Boutiques custom module called 'BoutiquesInputCopier' was developed. This module creates a full copy of the input before the tool starts. In the case of scRNAbox, the copy is always performed. For other tool integrations, however, the creator of the Boutiques descriptor can add an option that makes this feature selectable in the form, allowing the user to avoid the extra copy if desired.
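The copy-before-run idea can be sketched as follows. This is an illustration in Python under stated assumptions, not the 'BoutiquesInputCopier' code itself: the function name `prepare_task_input`, the `copy_input` flag, and the directory layout are all hypothetical.

```python
import shutil
from pathlib import Path


def prepare_task_input(source: Path, workdir: Path, copy_input: bool = True) -> Path:
    """Return the directory a task should use as its input.

    With copy_input=True (the behaviour described for scRNAbox), the task
    gets a private full copy, so a tool that writes into its input
    directory cannot alter the shared original. With copy_input=False the
    shared directory is used directly, which is only safe for tools that
    do not mutate their input.
    """
    if not copy_input:
        return source
    workdir.mkdir(parents=True, exist_ok=True)
    private_copy = workdir / source.name
    # copytree raises FileExistsError if the copy already exists, which
    # prevents a task from silently reusing a previously mutated copy.
    shutil.copytree(source, private_copy)
    return private_copy
```

Two tasks launched on the same dataset then each receive their own directory, so output written by one task never appears in the input of the other. The cost is the extra disk space and copy time, which is why making the copy optional is useful for tools that leave their input untouched.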

To address the resource management problem, the 'BoutiquesScrnaboxResourceManager' was developed. This module allows the creator of the Boutiques descriptor to specify the resource requirements for each step of the pipeline. This approach minimizes the resource requests on HPC systems, ensuring efficient use of memory and CPU time.
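A per-step resource lookup of this kind might look like the sketch below. It is written in Python for illustration and is not the 'BoutiquesScrnaboxResourceManager' code. Only the memory figures for Step 2 (16 GB) and Step 7 (150 GB) come from this abstract; the default value for the remaining steps, and all names in the code, are hypothetical placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resources:
    memory_gb: int


NUM_STEPS = 8  # the pipeline has eight analytical steps

# Only the values for steps 2 and 7 are taken from the abstract.
STEP_RESOURCES = {
    2: Resources(memory_gb=16),
    7: Resources(memory_gb=150),
}

# Hypothetical placeholder for steps whose requirements are not stated here.
DEFAULT_RESOURCES = Resources(memory_gb=32)


def resources_for_step(step: int) -> Resources:
    """Return the resources to request from the scheduler for one step."""
    if not 1 <= step <= NUM_STEPS:
        raise ValueError(f"step must be between 1 and {NUM_STEPS}, got {step}")
    return STEP_RESOURCES.get(step, DEFAULT_RESOURCES)


def single_request_for_all_steps() -> Resources:
    """Worst-case request if one allocation had to cover every step."""
    peak = max(resources_for_step(s).memory_gb for s in range(1, NUM_STEPS + 1))
    return Resources(memory_gb=peak)
```

With a single fixed request, every step would have to ask for the 150 GB peak; with a per-step lookup, a Step 2 task asks for only 16 GB, which is the saving the module is designed to achieve.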

Fig. 2 (Fig2.png): scRNAbox pipeline steps as presented through the CBRAIN user interface

Conclusions:

The integration of scRNAbox into CBRAIN demonstrates the flexibility and extensibility of tool integration based on the Boutiques framework. To address the challenges posed by scRNAbox, two modules were created to manage input data and resource allocation, ensuring efficient and reliable execution of the pipeline. Future work can include generalising the 'BoutiquesScrnaboxResourceManager' to accommodate other pipelines.

Genetics:

Genetic Modeling and Analysis Methods 2
Genetics Other

Neuroinformatics and Data Sharing:

Databasing and Data Sharing 1
Workflows
Informatics Other

Keywords:

Data analysis
Open-Source Code
Open-Source Software

1|2 indicates the priority used for review



References:

1. Sherif T, Rioux P, Rousseau M-E, Kassis N, Beck N, Glatard T, Adalat R, Das S, Evans AC. CBRAIN: a web-based, distributed computing platform for collaborative neuroimaging research, Frontiers in Neuroinformatics, Volume 8, May 2014, https://doi.org/10.3389/fninf.2014.00054

2. Glatard T, Kiar G, Aumentado-Armstrong T, Beck N, Bellec P, Bernard R, Bonnet A, Brown ST, Camarasu-Pop S, Cervenansky F, Das S, Ferreira da Silva R, Flandin G, Girard P, Gorgolewski KJ, Guttmann CRG, Hayot-Sasson V, Quirion P-O, Rioux P, Rousseau M-E, Evans AC, Boutiques: a flexible framework to integrate command-line applications in computing platforms, GigaScience, Volume 7, Issue 5, May 2018, giy016, https://doi.org/10.1093/gigascience/giy016

3. Thomas RA, Fiorini MR, Amiri S, et al. ScRNAbox: empowering single-cell RNA sequencing on high performance computing systems, BMC Bioinformatics, Volume 25, 319, 2024, https://doi.org/10.1186/s12859-024-05935-y

4. Hao Y, Hao S, Andersen-Nissen E, Mauck WM, Zheng S, Butler A, Lee MJ, Wilk AJ, Darby C, Zager M, Hoffman P, Stoeckius M, Papalexi E, Mimitou EP, Jain J, Srivastava A, Stuart T, Fleming LM, Yeung B, Rogers AJ, McElrath JM, Blish CA, Gottardo R, Smibert P, Satija R. Integrated analysis of multimodal single-cell data, Cell, Volume 184, 3573–3587, June 2021, https://doi.org/10.1016/j.cell.2021.04.048
