Sources of data pollution: ill-posed problems

Presented During:

Tuesday, June 25, 2024: 4:00 PM - 5:15 PM
Room: Grand Ballroom 104-105  

Janine Bijsterbosch¹, Ty Easley¹, Aki Nikolaidis², Rotem Botvinik-Nezer³, Chuan-Peng Hu⁴, Kayla Hannon¹


¹Washington University in St Louis, St Louis, MO, ²Child Mind Institute, New York, NY, ³The Hebrew University of Jerusalem, Jerusalem, N.A., ⁴Nanjing Normal University, Nanjing, N.A.


Janine Bijsterbosch  
Washington University in St Louis
St Louis, MO

Additional Organizer:

Ty Easley  
Washington University in St Louis
St Louis, MO


Aki Nikolaidis  
Child Mind Institute
New York, NY
Rotem Botvinik-Nezer  
The Hebrew University of Jerusalem
Jerusalem, N.A.
Chuan-Peng Hu  
Nanjing Normal University
Nanjing, N.A.
Kayla Hannon  
Washington University in St Louis
St Louis, MO

Please describe the advantage of addressing the topic as a symposium:

The field of human brain mapping has long been interested in relating neuroimaging data to individual differences in behavior, cognition, or clinical symptoms. With the recent availability of large-scale neuroimaging data and the focus on predictive modeling using machine learning, this long-standing interest in brain-phenotype associations now dominates much of the field. However, performing brain-behavior modeling without critically evaluating the impact of key methodological decisions and interpretive frameworks risks conclusions that suffer from bias, lack of generalizability, and/or lack of reliability. The goal of this symposium is to share recently developed ideas and insights, and discuss future directions to improve the overall robustness of approaches to brain-behavior modeling.

Specifically, efforts to relate neuroimaging data to behavior, cognition, or clinical symptoms are often plagued by ill-defined phenotypic constructs, analytical flexibility, biased datasets, and disease heterogeneity. These common sources of data pollution share a propensity to create ill-posed problems in large neuroimaging studies, which in turn obfuscate the meanings and goals that can be assigned to research findings. This symposium will inform the audience of the latest insights into each of these challenges and invite a discussion of future directions for addressing them and improving the field of brain-behavior modeling. The symposium will consist of four 12-minute talks, each followed by 3 minutes for questions, and will close with a 15-minute panel discussion.

Provide a brief paragraph (roughly 250 words) describing the timeliness and importance of the topic and the desired learning outcomes.

Brain-behavior neuroimaging research is at an unprecedented inflection point. With the increasing availability of sufficient data, expanding computing resources, and advances in computational approaches, a thoughtful discussion on data pollution is especially timely to educate existing and new members in the field. Recent shifts in the brain-behavior data landscape have altered benchmarks, standards, and constructs, bringing new ethical questions, skill needs, and research paradigms to the fore. This symposium is intended to disseminate important caveats, highlight opportunities for future research, and ultimately improve the insights gained from neuroimaging research.

List 2-3 specific learning objectives for the audience. Learning objectives are used for ACCME purposes.

1. Articulate the impact of sources of data pollution, namely: ill-defined phenotypic constructs, analytical flexibility, biased datasets, and disease heterogeneity.
2. Consider and address potential areas of data pollution in the design and execution of current and future research.

Please identify your target audience (1-2 sentences).

This symposium will be of interest to anyone performing brain-behavior modeling, regardless of the domain (clinical versus basic science) and methodology (ranging from linear regression models to deep learning).

Please provide justification on why your speaker selection meets OHBM's selection criteria concerning diversity of speakers. As stated in our Code of Conduct, we explicitly honor diversity with respect to multiple factors including age, culture, ethnicity, gender identity or expression, language, national origin, political beliefs, profession, race, religion, sexual orientation, and socioeconomic status. Inclusion of speakers from traditionally under-represented groups/nations is particularly encouraged.

If no, please provide justification.
The organizers exhibit diversity of career stage (PhD student and PI), gender (cis and trans), and background (a first-generation graduate with a neuroscience background and an interdisciplinary academic trained in math and physics).
The speakers are gender-balanced (2 female and 2 male), cover the full range of career stages (PhD student, postdoc, and PI), and capture geographical diversity (US, China, and Israel).