Maximizing scientific efficiency through sustainability, reproducibility, and FAIRness

Muriah Wheelock, Organizer
Washington University in St. Louis
St. Louis, MO 
United States
 
Nikhil Bhagwat, Co-Organizer
McGill University
Montreal, Quebec 
Canada
 
Niousha Dehestani, Co-Organizer
National University of Singapore
Singapore
Singapore
 
Naomi Gaggi, PhD, Co-Organizer
New York University Grossman School of Medicine
Rockaway Park, NY 
United States
 
Tuesday, Jun 24: 9:00 AM - 1:00 PM
1099 
Educational Course - Half Day (4 hours) 
Brisbane Convention & Exhibition Centre 
Room: P2 (Plaza Level) 
Maximizing the efficiency of scientific processes through sustainability, reproducibility, and FAIRness is a timely and significant topic given the increasing complexity and scale of neuroimaging studies. As the field advances, the need for robust systems that ensure reproducibility, transparency, and accessibility of data has never been more urgent. Challenges such as difficulties in data sharing, computational inefficiencies, and a lack of standardization hinder widespread collaboration and the replication of results. At the same time, the environmental impact of the high computational demands of neuroimaging research is a growing concern.

By addressing these challenges head-on, this course provides crucial insights into best practices for building reproducible, sustainable, and open neuroimaging workflows. Participants will gain essential knowledge of version control systems (VCS) and community approaches to tracking changes and ensuring collaboration, and will learn how to create consistent and reproducible processing environments using tools like Neurodesk and BrainLife. The course also covers the harmonization of image processing methods through metadata standards such as BIDS, and sharing tools like Nipoppy and Neurobagel. In addition, the course emphasizes sustainable computing practices, from efficient coding techniques to optimizing high-performance computing systems, reducing both computational cost and carbon footprint.

By the end of the course, participants will have the skills to implement reproducible workflows, adhere to metadata standards, share processed data, and practice efficient, sustainable computing, ultimately enabling them to contribute to open science initiatives and strengthen the integrity and impact of their research.

Objective

1. Implement Version Control and Maximize Reproducibility: Learn how to use version control systems, create proper documentation, and ensure reproducibility of code for neuroimaging analysis and visualization.

2. Utilize Containerized Analysis Pipelines and Standardized Platforms: Understand how to use existing containerized analysis pipelines and harmonize analysis workflows with standardized platforms and data formats to ensure consistency and interoperability.

3. Optimize Computational Efficiency and Reduce Time and Environmental Impact: Gain skills in coding efficiently, utilizing schedulers, and optimizing workflows to reduce computational costs and minimize the carbon footprint of neuroimaging research.
 

Target Audience

The target audience for this course includes neuroimaging researchers, data scientists, and computational biologists at any career stage who are interested in improving the reproducibility, sustainability, and efficiency of their research workflows. Participants should have a basic understanding of neuroimaging data analysis and an interest in adopting best practices for open science and environmentally conscious computing. 

Presentations

First steps towards a FAIR-er and more efficient research: version control systems, public engagement, and open development

You recently read about the most promising approach to analyse your latest data, but it takes you four weeks to code the analysis, and at the end of the day it does not really work out. Or maybe - hooray! - the authors declared that all their code is available online, and you find it, only to discover a very intricate bunch of lines with little instruction. You work for months on a data analysis, only to find out that someone, somewhere, published a toolbox a while back that would have given you results in a heartbeat, but it was not findable. Worse, you spend a great deal of time developing the next revolutionary neuroscientific discovery, only to end up with little to no engagement from the community at large. Are these scenarios familiar? And aren't they incredibly frustrating?
At the request of many journals and funding bodies, our science is becoming more and more open and reproducible, but there is still quite some work to do to achieve complete FAIRness - that is, the work needed to move from producing deliverables for oneself to producing deliverables for everyone. Luckily, there are paths of least resistance to get there.
In this talk we will cover the first steps towards more findable, accessible, and reusable deliverables, as well as towards improving their external engagement through open development principles. After an introduction to version control systems (git) and related public platforms (GitHub, GitLab, ...), we will discuss a few simple practices to improve our code- and text-based deliverables for better reusability, starting with documentation and styling. With the power of automation, we will discuss how to improve the findability and accessibility of our deliverables, and we will briefly discuss licenses. After covering tools like Auto, pre-commit, Read the Docs, Zenodo, and many others, we will discuss the hurdles and gains of open and community development, and how a little extra energy can lead to great collaborations and more scientific engagement with our work. Not a fan of open development and collaboration? Not a problem: the same practices can be used to improve interpersonal and intergenerational collaboration within single laboratories and institutions.
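As a small taste of the documentation and styling practices touched on above, the Python sketch below shows a function written with reuse in mind; the function, its name, and its simplified computation are illustrative assumptions, not material from the talk.

import numpy as np


def motion_summary(motion_params: np.ndarray) -> np.ndarray:
    """Summarize volume-to-volume head motion.

    Parameters
    ----------
    motion_params : np.ndarray
        Array of shape (n_volumes, 6) with rigid-body motion estimates
        (three translations, three rotations).

    Returns
    -------
    np.ndarray
        Sum of absolute backward differences per volume, shape (n_volumes,).
        Note: this is a simplified summary, not the standard framewise
        displacement, which also converts rotations to millimetres.
    """
    # Backward differences between consecutive volumes; the first volume gets 0.
    diffs = np.abs(np.diff(motion_params, axis=0, prepend=motion_params[:1]))
    return diffs.sum(axis=1)

A docstring in a consistent style (here, numpydoc), type hints, and descriptive names are exactly the kind of low-effort habits that tools such as pre-commit and Read the Docs can then check and publish automatically.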

Presenter

Stefano Moia, Maastricht University
Department of Cognitive Neuroscience
Maastricht, Limburg 
Netherlands

Reproducible Neuroimaging Made Easy: A Practical Guide to Software Containers

Neuroimaging research often depends on specialized software, which many of us know can be a time-consuming and frustrating challenge to install on local computers. But if setting up software locally takes days, how much longer would it take to configure and run it on a high-performance computing (HPC) cluster when scaling up analyses? For many, this question feels daunting. However, this talk aims to provide solutions that let you sleep easier at night. Software containers, self-contained environments that package applications with all their dependencies, offer a practical solution to this challenge. They make software portable across different computing environments, including HPC. Yet building and managing containers can still feel complex for researchers without extensive technical expertise. Fortunately, neuroimaging data analysis platforms like Neurodesk (Renton, Dao, et al., 2024; Nature Methods) and BrainLife (Hayashi, et al., 2024; Nature Methods) have revolutionized how the neuroimaging community accesses and uses containerized applications. These platforms simplify the process, making advanced computational tools accessible without requiring a computer science background. In this talk, we will introduce key concepts of software containers, demystify common terminology, and explore popular containerization platforms. We will demonstrate how containers enable reproducible neuroimaging workflows and highlight the ease of using these tools on platforms like Neurodesk and BrainLife. By the end of this session, attendees will see how they can leverage this technology to streamline their research, scale their analyses, and reduce the headaches often associated with complex software setups.
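To make the container concept concrete, here is a minimal Python sketch of launching a generic containerized BIDS App through Docker from a script; the image name, paths, and participant label are placeholders, and platforms such as Neurodesk and BrainLife wrap this kind of call behind much friendlier interfaces.

import subprocess
from pathlib import Path

# Placeholder paths and image name, for illustration only.
bids_dir = Path("/data/my_study_bids")
out_dir = Path("/data/derivatives")
image = "example/bids-app:1.0.0"  # hypothetical container image

cmd = [
    "docker", "run", "--rm",
    "-v", f"{bids_dir}:/bids:ro",    # mount raw BIDS data read-only
    "-v", f"{out_dir}:/out",         # mount the output directory read-write
    image,
    "/bids", "/out", "participant",  # conventional BIDS App positional arguments
    "--participant_label", "01",
]
subprocess.run(cmd, check=True)

Because every dependency lives inside the image, the same command reproduces the same environment on a laptop or, with Apptainer/Singularity in place of Docker, on an HPC cluster.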

Presenter

Fernanda Ribeiro, Justus-Liebig University Giessen, Giessen, Hessen 
Germany

How to measure and reduce the carbon footprint of neuroimaging research computing using open standards and tools

The storage and processing of neuroimaging data uses energy and therefore has a carbon footprint. Data centres currently account for 1-4% of global CO2 emissions, a number that may continue to rise with advances in machine learning. In this session, we will discuss the impact of neuroimaging pipelines on carbon emissions, followed by a demonstration of tools and strategies that help researchers reduce it. Several open tools exist to estimate and track the impact of computing tasks. We will walk through Green Algorithms (web-based), CarbonTracker (embedded), and GA4HPC (cluster-side) to help researchers adopt them in their own work. We will offer recommendations on less carbon-intensive software tools and practices. Through an fMRIPrep use case, we will show that the carbon footprint can be reduced by up to 48%. We will also present tools that can be deployed institution- or HPC-wide to minimize the carbon footprint at scale, such as "green" compute job schedulers like the Climate Aware Task Scheduler (CATS), which optimize job times and locations to minimize carbon intensity. Sustainable science practices intersect heavily with open science principles that promote data sharing and reuse. We will discuss best practices for individual researchers and the community contributions needed to make sustainability a core goal of scientific research. We will present ongoing community initiatives, such as COBIDAS, that facilitate standardized reporting and discovery of neuroimaging research artifacts to avoid duplicating heavy computation. We will also highlight ongoing software efforts (e.g. CATS, fMRIPrepCleanup) to which researchers can contribute, helping maintain and improve these tools to promote efficient data storage and green computing.
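As a back-of-the-envelope illustration of what such calculators do, the Python sketch below follows the general structure of a Green Algorithms-style estimate (runtime × power draw × PUE × grid carbon intensity); every constant is an assumed placeholder, and the actual tools use hardware-specific values and up-to-date carbon intensities.

def job_emissions_gco2e(
    runtime_hours: float,
    n_cores: int,
    power_per_core_w: float = 12.0,   # assumed draw per CPU core (watts)
    core_usage: float = 1.0,          # fraction of core capacity actually used
    memory_gb: float = 16.0,
    power_per_gb_w: float = 0.37,     # assumed memory draw per GB (watts)
    pue: float = 1.67,                # assumed data-centre power usage effectiveness
    carbon_intensity: float = 475.0,  # assumed grid intensity (gCO2e per kWh)
) -> float:
    """Rough estimate of one compute job's footprint in grams of CO2-equivalent."""
    power_w = n_cores * power_per_core_w * core_usage + memory_gb * power_per_gb_w
    energy_kwh = runtime_hours * power_w * pue / 1000.0
    return energy_kwh * carbon_intensity


# Example: an 8-core preprocessing job running for 10 hours.
print(f"{job_emissions_gco2e(runtime_hours=10, n_cores=8):.0f} gCO2e")

Even a rough estimate like this makes it clear why shorter runtimes, fewer re-runs of the same pipeline, and lower-carbon compute locations all matter.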
 

Presenter

Nick Souter, University of Sussex, Brighton, East Sussex 
United Kingdom

Sustainable data-sharing through open standards and tools for data management and discovery

As datasets grow larger, so does their "data gravity": more energy is required to store, transfer, and process them. To reduce unnecessary duplication of these tasks, FAIR (findable, accessible, interoperable, and reusable) practices must be adopted for efficient sharing and reuse of raw and derived neuroimaging data. In this module, we will present open standards and tools that help curate FAIR datasets and enable sustainable data sharing. Standardized data organization and processing is critical for making data interoperable. We will introduce the Brain Imaging Data Structure (BIDS) community standard and the rich ecosystem of portable (containerized) open-source image processing pipelines that natively understand BIDS data (BIDS Apps). We will show how these standards facilitate data pooling and sharing and allow compute-heavy processing to be distributed to clusters with a smaller carbon footprint. The sharing, discovery, and reuse of processed data derivatives require thorough documentation (and metadata) about the data processing methods. We will demonstrate existing tools and frameworks, including DataLad, BABS, Boutiques, Nipoppy, and Neurobagel, that help with provenance tracking, (meta)data annotation, and data discovery. We will discuss how these tools can help generate a variety of interoperable research objects and share them with the larger community together with fully specified, reproducible workflows. Alignment with the FAIR principles is essential for data sharing to be meaningful and sustainable. This module will introduce attendees to decentralized neuroinformatics infrastructure for data processing, annotation, and discovery, which can help reduce data gravity and make neuroimaging more sustainable.
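As one concrete example of what standardization buys, the sketch below queries a hypothetical BIDS dataset with PyBIDS (the bids Python package), relying only on the standard's file naming and JSON sidecars rather than lab-specific conventions; the dataset path is a placeholder.

from bids import BIDSLayout

layout = BIDSLayout("/data/my_study_bids")  # placeholder path to a BIDS dataset

# Enumerate subjects and locate all T1-weighted images by their standardized names.
subjects = layout.get_subjects()
t1w_files = layout.get(suffix="T1w", extension=".nii.gz", return_type="filename")

# Read acquisition metadata from the JSON sidecar of the first image.
metadata = layout.get_metadata(t1w_files[0])
print(len(subjects), "subjects; first T1w TR:", metadata.get("RepetitionTime"), "s")

The same dataset can then be handed, unchanged, to any BIDS App or shared through tools such as DataLad, because consumers can discover its contents programmatically.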
 

Presenter

Michelle Wang, McGill University, Montreal, Quebec 
Canada

Optimizing and parallelizing algorithms in interpreted languages

Interpreted languages are an excellent tool for prototyping algorithms, but a major tradeoff compared to compiled languages is slower performance. This talk will focus on techniques for identifying and addressing performance bottlenecks in Python and MATLAB. Both languages offer a profiler to determine which sections of code use the most processing time, and profiling should always come before investing time in optimizing any code. Once optimization targets have been identified, several families of approaches can be used to speed up processing. One approach is to check whether any loops can be replaced by "vectorized" operations. Although both languages are interpreted, they are supported in part by libraries of compiled code, and vectorizing an algorithm replaces repeated calls to a function with small amounts of data by a single call operating on all available data. Replacing loops with a single function call reduces the overhead of loop management and of memory transfer between the interpreted client and the compiled functions, and these compiled functions are often already designed to minimize computational complexity on large blocks of data. Both languages also offer parallelization, which in some cases is another route to faster algorithms. Sometimes this parallelization happens implicitly, and both Python and MATLAB offer multiple options for explicitly parallelizing algorithms. This talk will address when parallelization is a good candidate for reducing processing time, because there are both advantages and disadvantages to consider: while parallelization lets users take advantage of as many processors as are available, there is also memory and processing overhead involved in initializing, running, and collecting results from the parallel workers. Both languages offer options to reduce this overhead, and their effectiveness will be addressed. All elements of this talk will be supported by interactive segments, and Jupyter notebooks with examples of these concepts will be made available in both MATLAB and Python.
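The sketch below previews both ideas in Python with toy data sizes: a row-wise computation written first as an explicit loop, then as a single vectorized NumPy expression, followed by explicit parallelism with multiprocessing; the MATLAB analogues (whole-array operations and parfor) follow the same logic, and all names here are illustrative.

import numpy as np
from multiprocessing import Pool

rng = np.random.default_rng(0)
data = rng.standard_normal((10_000, 300))  # toy "voxels x timepoints" array


def zscore_loop(x: np.ndarray) -> np.ndarray:
    """Row-wise z-scoring with an explicit Python loop (slow)."""
    out = np.empty_like(x)
    for i in range(x.shape[0]):
        out[i] = (x[i] - x[i].mean()) / x[i].std()
    return out


def zscore_vectorized(x: np.ndarray) -> np.ndarray:
    """The same computation expressed as whole-array (vectorized) operations."""
    return (x - x.mean(axis=1, keepdims=True)) / x.std(axis=1, keepdims=True)


def row_mad(row: np.ndarray) -> float:
    """Per-row median absolute deviation, a stand-in for heavier per-task work."""
    return float(np.median(np.abs(row - np.median(row))))


if __name__ == "__main__":
    assert np.allclose(zscore_loop(data), zscore_vectorized(data))
    # Explicit parallelism pays off only when per-task work outweighs the cost
    # of starting workers and shipping data to and from them.
    with Pool(processes=4) as pool:
        stats = pool.map(row_mad, data)
    print(len(stats), "rows processed")

Profiling both versions (cProfile in Python, the MATLAB Profiler) is the way to confirm which rewrite actually pays off at real data sizes.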

Presenter

Andrew Eck, Washington University in St. Louis, St. Louis, MO 
United States

From Code to Visualization: Reproducible Pipelines for Neuroimaging Research

How can neuroimaging researchers ensure their findings are not only robust but fully transparent and reproducible? This talk addresses this critical question by exploring the tools and practices needed to build reproducible pipelines, from coding to visualization. Whether you’re new to reproducible research or looking to refine your workflows, this session offers actionable insights for all experience levels. We’ll begin by outlining best practices for reproducible coding, emphasizing how transparent and shareable scripts foster collaboration, trust, and consistency in data processing. Examples will highlight the use of open notebooks hosted on public repositories to share study designs, evaluations, and findings openly with the scientific community. The session will then shift focus to an often-overlooked aspect of reproducibility: scientific visualizations. Figures are central to communicating findings but are often treated as static, unrepeatable images. This talk will demonstrate how reproducible visualizations transform figures into dynamic, script-backed content, enabling others to recreate, adjust, and validate visual representations with ease. Using Python-based tools such as Nilearn, Niivue, Cerebro, MMVT, and DIPY, alongside platforms like Blender, we’ll showcase techniques for generating reproducible visualizations across neuroimaging modalities, including 2D and 3D volumetric imaging, cortical surface renderings, brain network diagrams, and tractography. Attendees will learn how to produce publication-quality figures directly from code, eliminating reliance on manual adjustments via GUIs. This session will equip attendees with practical skills for implementing reproducible workflows, along with openly available scripts that can be adapted for their projects. By integrating these practices, researchers can ensure their findings are robust, accessible, and verifiable—contributing to greater transparency and impact in neuroimaging research while addressing key challenges in reproducibility. 
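Here is a minimal Nilearn-based sketch of the script-backed figure idea, using a template volume bundled with Nilearn as a stand-in for real results; the output filename and slice coordinates are arbitrary choices, not material from the talk.

from nilearn import datasets, plotting

# MNI152 template shipped with Nilearn, used here in place of a real statistical map.
template = datasets.load_mni152_template()

# Every visual choice (view, cut coordinates, title) lives in code, so the figure
# can be regenerated exactly rather than rebuilt by hand in a GUI.
plotting.plot_anat(
    template,
    display_mode="ortho",
    cut_coords=(0, 0, 0),
    title="MNI152 template (illustrative)",
    output_file="template_ortho.png",
)

Committing such a script alongside the manuscript makes each figure reviewable and re-renderable by anyone with the same environment.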

Presenter

Sina Mansour L., Ph.D., University of Melbourne & National University of Singapore, Melbourne
Australia