2. Sampling inequalities and biases of neuroimaging-based artificial intelligence (AI) models in middle-lower-income countries and areas

Zhiyi Chen Presenter
Third Military Medical University
Experimental Research Center of Medical and Psychological Science
Chongqing
China
 
Monday, Jun 24: 9:00 AM - 10:15 AM
Symposium 
COEX 
Room: Grand Ballroom 101-102 
The development of artificial intelligence (AI) models to aid in the diagnosis of mental disorders is recognized as a significant breakthrough in psychiatry. However, translating such models into clinical practice remains a challenge, with sampling inequalities and biases being a major limitation. Here, we conducted a pre-registered meta-research assessment of neuroimaging-based models in the psychiatric literature, quantitatively examining global and regional sampling issues over recent decades from a perspective that has been relatively underexplored. A total of 476 studies (n = 118,137) were included in the assessment. Based on these findings, we built a comprehensive 5-star rating system to quantitatively evaluate the quality of existing machine-learning models for psychiatric diagnosis. We also examined risk of bias (ROB) with the structured PROBAST tool. A global sampling inequality in these models was revealed quantitatively (sampling Gini coefficient G = 0.81, p < .01), varying across countries and regions (e.g., China, G = 0.47; the United States, G = 0.58; Germany, G = 0.78; the United Kingdom, G = 0.87). Furthermore, the severity of this sampling inequality was significantly predicted by national economic level (β = -2.75, p < .001, adjusted R² = 0.40; r = -.84, 95% CI: -.97 to -.41) and in turn plausibly predicted model performance, with greater sampling inequality associated with higher reported classification accuracy. Further analyses showed that lack of independent testing (84.24% of models, 95% CI: 81.0-87.5%), improper cross-validation (51.68% of models, 95% CI: 47.2-56.2%), and poor technical transparency (87.8% of models, 95% CI: 84.9-90.8%) and availability (80.88% of models, 95% CI: 77.3-84.4%) remain prevalent in current diagnostic classifiers despite improvements over time. In light of this, we proposed a purpose-built quantitative assessment checklist, which showed that the overall ratings of these models increased with publication year but were negatively associated with model performance. Finally, we found high ROB in the vast majority of existing AI models (83.1%, 95% CI: 80.0-86.2%); none of them is yet considered applicable to clinical practice. Together, improving economic equality in sampling, and hence the quality of machine-learning models, may be a crucial step toward translating neuroimaging-based diagnostic classifiers into clinical practice.
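
The abstract does not spell out how the sampling Gini coefficient was computed; as a rough illustration only, a Gini coefficient over per-site or per-country sample sizes can be obtained with the standard formula sketched below (the grouping unit, the toy numbers, and the sampling_gini helper are assumptions for illustration, not the study's code).

    import numpy as np

    def sampling_gini(sample_sizes):
        # Gini coefficient over per-site (or per-country) sample sizes.
        # 0 = perfectly even sampling; values near 1 = a few sites
        # contribute almost all participants.
        x = np.sort(np.asarray(sample_sizes, dtype=float))
        n = x.size
        if n == 0 or x.sum() == 0.0:
            return 0.0
        # Standard formula for sorted data:
        # G = (2 * sum_i i*x_i) / (n * sum_i x_i) - (n + 1) / n
        idx = np.arange(1, n + 1)
        return float((2.0 * np.sum(idx * x)) / (n * x.sum()) - (n + 1.0) / n)

    # Hypothetical per-site sample sizes in which a few large cohorts dominate
    sites = [1200, 850, 90, 60, 45, 30, 25, 20, 15, 10]
    print(f"G = {sampling_gini(sites):.2f}")  # a high G indicates strong sampling inequality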
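
The prevalence of models lacking independent testing or using improper cross-validation refers to a well-known evaluation pitfall. The sketch below, assuming scikit-learn and synthetic stand-in data rather than any of the reviewed studies' pipelines, illustrates the practice the assessment checklist rewards: cross-validation confined to the training split for model selection, followed by a single evaluation on an untouched, independent test set.

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split, GridSearchCV, StratifiedKFold
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.svm import SVC

    # Toy stand-in for a neuroimaging feature matrix (not real data)
    X, y = make_classification(n_samples=500, n_features=100, random_state=0)

    # Hold out an independent test set before any model selection takes place
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=0)

    # Cross-validation is used only within the training data to pick hyperparameters
    pipe = make_pipeline(StandardScaler(), SVC())
    grid = GridSearchCV(pipe, {"svc__C": [0.1, 1, 10]},
                        cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=0))
    grid.fit(X_train, y_train)

    # The held-out test set is touched exactly once, for the final reported accuracy
    print(f"independent test accuracy = {grid.score(X_test, y_test):.3f}")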