Poster No:
1846
Submission Type:
Abstract Submission
Authors:
Nikhil Kumar Jangamreddy1, Steffen Bollmann1
Institutions:
1The University of Queensland, Brisbane, Queensland
First Author:
Co-Author:
Introduction:
Neuroimaging research requires specialized analysis software that can be complex to install and may produce inconsistent results across different computing environments. The NeuroDesk platform (Renton et al., 2024)(https://www.neurodesk.org/) addresses these challenges by providing an open-source, community-driven suite of neuroimaging software containers. NeuroDesk enables accessible, flexible, portable, and reproducible neuroimaging analysis on personal workstations, high-performance computing (HPC) clusters, and cloud platforms. This work aims to leverage recent advancements in generative artificial intelligence (genAI) to improve the productivity of NeuroImaging researchers. Our contributions are twofold: 1) NeuroDesk copilot: incorporate code autocompletion features, chatbot features specifically adapted to neuroimaging code and 2) NeuroContainer Copilot: automate the generation of docker build scripts for neurocontainers within Neurodesk.
Methods:
NeuroDesk Copilot: Existing state-of-the-art Large Language Models (LLMs), such as OpenAI GPT-4o (Brown et al., 2020), are incredibly powerful general-purpose tools but are not optimally tailored to the specialized demands of neuroimaging pipelines. Although generic LLMs can assist with coding tasks, they may suggest commands, libraries that are irrelevant or suboptimal for neuroimaging analyses. To increase the relevance and effectiveness of these models, we fine-tune OpenAI GPT-4o using neuroimaging code examples extracted directly from the NeuroDesk GitHub repository. It involves the following steps: 1) Create a proxy server leveraging litellm (this step enhances security so that OPENAI_API_KEY is not shared inside a docker container), 2) Converting the .ipynb files to .jsonl format (jsonl is the format used to finetune LLMs), 3) Finally, we perform model fine-tuning on the curated neuroimaging-specific code to align the LLM's output with common neuroimaging tasks, standard workflows, and established conventions.
NeuroContainer Copilot: While NeuroDesk significantly reduces the friction associated with portability and reproducibility, incorporating new neuroimaging tools remains challenging. Incorporating new neuroimaging tools into NeuroDesk is challenging due setting the correct environment variables, finding specific versions of the software dependencies, version requirements or specific build steps. However, writing accurate and efficient container recipes requires expertise in containerization technologies and knowledge of software dependencies. NeuroContainer Copilot can help identify these details from provided documentation (README or installation documentation) and code samples. The development involves the following steps: 1) We use existing neurocontainer recipes to generate the recipes for new tools using Retrieval Augmented Generation (RAG) with the OpenAI gpt-4o model, 2) Recursively identify the dependencies and environment variables to set based on the README file of a new tool.

·Overview of NeuroDesk copilot, we finetune the OpenAI gpt-4o model with NeuroDesk example notebooks, we can observe from the figure that autocompletion provided by finetuned model is more accurate.

·NeuroContainer Copilot enables users to submit a README or installation instructions for a new neuroimaging tool, which it then uses to automatically generate the corresponding container build scripts
Results:
As demonstrated in Figure 1, fine-tuned openAI GPT-4o provides more accurate and context-specific neuroimaging code completions compared to a baseline general-purpose model. NeuroDesk copilot can streamline the workflow for both novice and expert users, reducing the learning curve, minimizing syntax errors and suggesting best coding practices. By automating container build scripts using LLMs, the NeuroContainer Copilot streamlines adding new tools to the NeuroDesk platform, reducing complexity and setup time.
Conclusions:
To conclude, we introduce NeuroDesk Copilot and NeuroContainer Copilot, two generative AI-driven tools that streamline neuroimaging workflows. NeuroDesk Copilot enhances code efficiency by finetuning state-of-the-art LLMs on NeuroDesk code examples, while NeuroContainer Copilot simplifies building accurate container recipes. Integrating LLMs into platforms like NeuroDesk promises a more accessible and efficient research ecosystem ultimately benefiting the broader neuroimaging community.
Modeling and Analysis Methods:
Methods Development
Neuroinformatics and Data Sharing:
Workflows 1
Informatics Other 2
Keywords:
Computational Neuroscience
Computing
Data analysis
Data Organization
Informatics
Machine Learning
Open Data
Open-Source Code
Open-Source Software
Workflows
1|2Indicates the priority used for review
By submitting your proposal, you grant permission for the Organization for Human Brain Mapping (OHBM) to distribute your work in any format, including video, audio print and electronic text through OHBM OnDemand, social media channels, the OHBM website, or other electronic publications and media.
I accept
The Open Science Special Interest Group (OSSIG) is introducing a reproducibility challenge for OHBM 2025. This new initiative aims to enhance the reproducibility of scientific results and foster collaborations between labs. Teams will consist of a “source” party and a “reproducing” party, and will be evaluated on the success of their replication, the openness of the source work, and additional deliverables. Click here for more information.
Propose your OHBM abstract(s) as source work for future OHBM meetings by selecting one of the following options:
I do not want to participate in the reproducibility challenge.
Please indicate below if your study was a "resting state" or "task-activation” study.
Other
Healthy subjects only or patients (note that patient studies may also involve healthy subjects):
Healthy subjects
Was this research conducted in the United States?
No
Were any human subjects research approved by the relevant Institutional Review Board or ethics panel?
NOTE: Any human subjects studies without IRB approval will be automatically rejected.
Not applicable
Were any animal research approved by the relevant IACUC or other animal research panel?
NOTE: Any animal studies without IACUC approval will be automatically rejected.
Not applicable
Please indicate which methods were used in your research:
Computational modeling
Other, Please specify
-
Workflows
Functional MRI
EEG/ERP
MEG
Structural MRI
Diffusion MRI
For human MRI, what field strength scanner do you use?
1.5T
3.0T
7T
Which processing packages did you use for your study?
AFNI
SPM
FSL
Free Surfer
Provide references using APA citation style.
Renton, A. I. Neurodesk: an accessible, flexible and portable data analysis environment for reproducible neuroimaging. Nature Methods. https://doi.org/10.1038/s41592-023-02145-x
Brown, T. B (2020). Language models are few-shot learners. In Proceedings of the 34th International Conference on Neural Information Processing Systems (NIPS '20) (Article 159, pp. 1–25). Curran Associates Inc.
Lewis, P. (2020). Retrieval-augmented generation for knowledge-intensive NLP tasks. In Proceedings of the 34th International Conference on Neural Information Processing Systems (NIPS '20) (Article 793, pp. 1–16). Curran Associates Inc.
No