A Tool for Evaluating Artificial Intelligence Studies for Predicting NMIBC Outcomes - Expert Commentary

Artificial intelligence (AI) models are emerging as powerful tools for the diagnosis and risk stratification of several cancers but have not been widely adopted in clinical practice. To identify barriers to AI integration and to facilitate the evaluation of new models for clinical adoption in non-muscle invasive bladder cancer (NMIBC), Kwong et al. developed APPRAISE-AI, a quantitative tool for assessing the quality of AI studies in this field.

The investigators first conducted a systematic review. A total of 15 relevant studies were analyzed, of which 47% were published between 2015 and 2022 and 60% were from Europe. All studies used models that were developed using retrospective data. The median sample size was 125 and the median follow-up was 71 months. The median recurrence rate was 50%, while the median progression rate was 19%. Tumor grading schemes varied across studies, with 60% using the WHO 1973 classification system and 33% using WHO 2004/2016. Definitions of recurrence and progression also differed across studies. Most AI models were based on neural network frameworks (73%). Training and testing procedures differed as well: 47% of studies used different cohorts for training and testing, 27% used different cohorts for training, validation, and testing, 20% used the same cohort for training and testing, and one study used 10-fold cross-validation.
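For readers less familiar with the evaluation designs mentioned above, the following is a minimal sketch of how 10-fold cross-validation partitions a single cohort so that every patient is used for testing exactly once. It is an illustration only and is not the procedure of any study in the review. The function name `k_fold_indices`, the random seed, and the cohort size of 125 (borrowed from the median sample size reported above) are assumptions made for the example.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_idx, test_idx) index lists for k-fold cross-validation.

    Hypothetical helper for illustration; not taken from the reviewed studies.
    """
    indices = list(range(n_samples))
    # Shuffle once so fold membership does not depend on record order.
    random.Random(seed).shuffle(indices)
    # Deal indices into k folds whose sizes differ by at most one.
    folds = [indices[i::k] for i in range(k)]
    for i, test_idx in enumerate(folds):
        # Training set is every fold except the one held out for testing.
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx


# Hypothetical cohort of 125 patients, split into 10 folds.
splits = list(k_fold_indices(125, k=10))
for fold_number, (train_idx, test_idx) in enumerate(splits, start=1):
    print(f"fold {fold_number}: train={len(train_idx)} test={len(test_idx)}")
```

In contrast, using the same cohort for both training and testing corresponds to evaluating on `train_idx` itself, which tends to overstate performance, while a separate test cohort corresponds to a single fixed split with no rotation.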

Study quality improved over time between 2000 and 2022 (p = 0.03). APPRAISE-AI evaluates quality across several domains, including eligibility criteria, disclosures, bias assessment, and data processing procedures. Only 20% of studies described how missing data were handled. In addition, 47% of studies compared the performance of the AI model with non-AI approaches such as Cox regression analyses or nomograms. In all but two of these studies, AI models outperformed non-AI methods in predicting recurrence and progression. The margin of benefit of AI over non-AI approaches depended on study quality.

The findings from this study demonstrate the value of APPRAISE-AI for practical, standardized evaluation of AI-focused studies. This work raises important questions regarding best practices for AI studies in bladder cancer. APPRAISE-AI highlighted quality issues, such as insufficient description of methodological processes or patient cohort details, that will need to be addressed in future studies to enhance the validity, reproducibility, and clinical usefulness of AI models.

Written by: Bishoy M. Faltas, MD, Director of Bladder Cancer Research, Englander Institute for Precision Medicine, Weill Cornell Medicine


  1. Kwong JCC, Wu J, Malik S, et al. Predicting non-muscle invasive bladder cancer outcomes using artificial intelligence: a systematic review using APPRAISE-AI. NPJ Digit Med. 2024;7(1):98. Published 2024 Apr 18. doi:10.1038/s41746-024-01088-7
