Poster No:
1100
Submission Type:
Abstract Submission
Authors:
Zijiao Chen1, Niousha Dehestani1, Juan Helen Zhou1
Institutions:
1National University of Singapore, Singapore, Singapore
First Author:
Zijiao Chen
National University of Singapore
Singapore, Singapore
Co-Author(s):
Introduction:
Predicting continuous behavioral scores often encounters a skewed target distribution: most individuals cluster around typical or "normal" values, while a small subset presents extreme or out-of-distribution scores. Standard regression models tend to regress towards the mean, underestimating values at the tails[1]. This limitation reduces the utility of predictions for identifying and managing high-risk or atypical cases. Existing correction methods can mitigate some biases but struggle with data lying far from the norm, especially when exposed to new or unseen conditions.
Methods:
We propose an embedding-guided inference mechanism to improve predictions of extreme outcomes (Figure 1). We first train a hierarchical Vision Transformer (ViT) [2], initialized from weights pre-trained on large-scale medical imaging data (from the Medical SAM Adapter) [3], on the Adolescent Brain Cognitive Development (ABCD) study dataset (~11,000 participants, ages 9–16) [4]. The model predicts Child Behavior Checklist (CBCL) internalizing behavior scores [5], a continuous measure on which most individuals fall in a "normal" range while a minority exhibit extreme values that are crucial to identify for early intervention. During training, the model learns to represent each input as a condensed embedding, an internally generated vector that captures the input's core features and its position within the broader data distribution.
After training, we extract embeddings for all training samples and compute their average embedding, serving as a reference point that characterizes the dataset's typical feature space. At inference, when a new input arrives, we pass it through the embedding extraction process and measure the cosine similarity between its embedding and the average embedding. As illustrated in Figure 1 (top), a high similarity suggests the input is "typical," whereas a low similarity indicates it may be unusual or extreme. Using this similarity, we apply a guidance scale defined by three parameters (base_scale, k, and ρ_0), as shown in Figure 1 (bottom). The base_scale sets the overall magnitude of adjustments, k determines how sharply the scale changes with embedding similarity, and ρ_0 specifies the similarity threshold where adjustments peak. By tuning these parameters, the model can make larger deviations from its raw predictions for inputs deemed atypical, while maintaining stable predictions for more common cases. This embedding-guided mechanism thus adapts dynamically to the input's characteristics, improving robustness and accuracy, particularly in scenarios where conventional regression models often fail.
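The inference step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract names the parameters base_scale, k, and ρ_0 but does not give the functional form of the guidance scale, so the logistic curve below (with ρ_0 as its midpoint) is an assumption, as is the rule that pushes the raw prediction away from a reference prediction. The function names, default parameter values, synthetic embeddings, and example scores are all hypothetical.

```python
import numpy as np

def cosine_similarity(a, b, eps=1e-12):
    """Cosine similarity between two 1-D vectors."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + eps))

def guidance_scale(rho, base_scale=0.5, k=20.0, rho_0=0.8):
    """ASSUMED logistic form: close to 0 when rho is well above rho_0
    (typical input) and close to base_scale when rho is well below it
    (atypical input). k controls how sharply the scale changes."""
    return base_scale / (1.0 + np.exp(k * (rho - rho_0)))

def guided_prediction(raw_pred, rho, reference_pred, **scale_kwargs):
    """ASSUMED adjustment rule: move the raw prediction further away from
    a reference prediction (e.g. the training-set mean) by the scale."""
    scale = guidance_scale(rho, **scale_kwargs)
    return raw_pred + scale * (raw_pred - reference_pred)

# Synthetic stand-ins for embeddings extracted from a trained model.
rng = np.random.default_rng(0)
train_embeddings = rng.normal(loc=1.0, scale=0.3, size=(500, 16))
mean_embedding = train_embeddings.mean(axis=0)  # reference point

typical = mean_embedding + rng.normal(scale=0.05, size=16)
atypical = rng.normal(loc=0.0, scale=1.0, size=16)

for name, emb in [("typical", typical), ("atypical", atypical)]:
    rho = cosine_similarity(emb, mean_embedding)
    # 60.0 and 50.0 are arbitrary illustrative scores, not study values.
    adjusted = guided_prediction(raw_pred=60.0, rho=rho, reference_pred=50.0)
    print(f"{name}: similarity={rho:.3f}, adjusted prediction={adjusted:.2f}")
```

In this sketch an input whose embedding is close to the average embedding keeps nearly its raw prediction, while a dissimilar input is moved further from the reference value; in practice the three parameters would need to be tuned on held-out data.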

Results:
Without embedding guidance, the 3D ViT predicted children's CBCL internalizing behavior scores with MAE = 4.8182, RMSE = 6.3516, R² = 0.1743, and correlation = 0.4245. With embedding guidance, overall metrics improved modestly: MAE = 4.8174, RMSE = 6.2639, R² = 0.1759, and correlation = 0.4414. More importantly, guidance improved predictions for extreme values. In the lower tail, MAE decreased from 7.98 to 7.16, and in the upper tail from 10.65 to 10.07; expressed relative to the guided MAE, these are reductions of 11.5% and 5.8%, respectively (about 10.3% and 5.4% relative to the unguided baseline). These results indicate that embedding guidance improves correlation and better captures out-of-distribution samples while maintaining overall accuracy.
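Tail-specific error can be computed along the following lines. This is a sketch under assumptions: the abstract does not state how the lower and upper tails were defined, so the 10th and 90th percentile cutoffs of the true scores are hypothetical, as are the function name and the synthetic data (a right-skewed target with a predictor that shrinks towards the mean).

```python
import numpy as np

def tail_mae(y_true, y_pred, lower_q=0.1, upper_q=0.9):
    """MAE overall and within the lower/upper tails of the true scores.
    The quantile cutoffs are an ASSUMED tail definition."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    lo, hi = np.quantile(y_true, [lower_q, upper_q])
    abs_err = np.abs(y_true - y_pred)
    return {
        "overall": float(abs_err.mean()),
        "lower_tail": float(abs_err[y_true <= lo].mean()),
        "upper_tail": float(abs_err[y_true >= hi].mean()),
    }

# Synthetic right-skewed scores and a predictor that regresses to the mean.
rng = np.random.default_rng(0)
y_true = rng.gamma(shape=2.0, scale=3.0, size=2000)
y_pred = (y_true.mean()
          + 0.3 * (y_true - y_true.mean())
          + rng.normal(scale=1.0, size=2000))

metrics = tail_mae(y_true, y_pred)
print(metrics)
```

Reporting tail MAE alongside overall MAE makes the regression-to-the-mean effect visible: a predictor that shrinks towards the mean can look acceptable overall while its error in the tails is much larger.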
Conclusions:
By integrating embedding-based similarity into the prediction process, our method provides a dynamic mechanism for refining outputs on atypical or extreme cases. This leads to more robust and interpretable predictions, which is crucial in clinical settings where identifying individuals with particularly high or low behavioral scores can guide interventions. The improvements in tail MAE and correlation suggest that embedding guidance can mitigate the regression-to-the-mean tendency, offering a more reliable framework for clinical prediction tasks.
Disorders of the Nervous System:
Psychiatric (e.g., Depression, Anxiety, Schizophrenia) 2
Modeling and Analysis Methods:
Classification and Predictive Modeling 1
Image Registration and Computational Anatomy
Methods Development
Multivariate Approaches
Keywords:
Cognition
Computational Neuroscience
Data analysis
Development
Machine Learning
MRI
Statistical Methods
1|2 Indicates the priority used for review
Please indicate below if your study was a "resting state" or "task-activation" study.
Other
Healthy subjects only or patients (note that patient studies may also involve healthy subjects):
Patients
Was this research conducted in the United States?
No
Was any human subjects research approved by the relevant Institutional Review Board or ethics panel?
NOTE: Any human subjects studies without IRB approval will be automatically rejected.
Yes
Was any animal research approved by the relevant IACUC or other animal research panel?
NOTE: Any animal studies without IACUC approval will be automatically rejected.
Not applicable
Please indicate which methods were used in your research:
Structural MRI
Behavior
Computational modeling
For human MRI, what field strength scanner do you use?
3.0T
Which processing packages did you use for your study?
FreeSurfer
Provide references using APA citation style.
1. Chung, Y., Kim, S., & Lee, H. (2020). Systematic Bias of Machine Learning Regression Models: Implications for Continuous Outcome Prediction. Journal of Machine Learning Research, 21(1), 1-25.
2. Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., Uszkoreit, J., & Houlsby, N. (2020). An image is worth 16x16 words: Transformers for image recognition at scale. arXiv preprint arXiv:2010.11929.
3. Wu, J., Ji, W., Liu, Y., Fang, H., Wang, Z.-Y., Xu, Y., & Arbel, T. (2023). Medical SAM Adapter: Adapting Segment Anything Model for Medical Image Segmentation. arXiv preprint arXiv:2304.12620.
4. National Institutes of Health (NIH). (2020). The Adolescent Brain Cognitive Development (ABCD) Study: Understanding the development of risk for mental health and substance use. Nature Psychiatry, 5(6), 486-495. https://doi.org/10.1038/s41386-020-0736-6
5. Achenbach, T.M., & Rescorla, L.A. (2001). Manual for the ASEBA School-Age Forms & Profiles. University of Vermont, Research Center for Children, Youth & Families.