(UroToday.com) The 2025 American Society of Clinical Oncology (ASCO) Annual Meeting, held in Chicago, IL, was host to the Poster Session: Genitourinary Cancer - Prostate, Testicular, and Penile Cancer. Dr. Umair Ayub presented Poster 5108: A large language model (LLM)-based multi-agent framework for risk stratification and treatment recommendations in localized prostate cancer.
Dr. Ayub began by highlighting the group's previously proposed hybrid framework, which integrates LLMs with a rule-based algorithm (RBA) to automate risk stratification in localized prostate cancer. Building on this work, the investigators aimed to:
- Validate the performance of the risk stratification agent (RSA) in a prospective cohort.
- Develop and evaluate a treatment recommendation agent (TRA) based on NCCN guidelines.
- Design an interactive interface to support clinicians in delivering accurate risk classification and treatment recommendations at the point of care.
This study included patients with localized prostate cancer seen at Mayo Clinic between 2004 and 2024, all of whom had at least one positive prostate biopsy and an available MRI report. For prospective validation of the RSA, GPT-4 was used in a zero-shot setting to extract key phenotypic variables (PSA, T stage, prostate volume, number of positive cores, Gleason patterns, and grade group) from unstructured biopsy and MRI reports. The RBA then classified patients into NCCN risk categories. Agent performance was compared with the treating clinicians' documentation, using gold-standard annotations made independently by two clinicians as the reference.
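To make the division of labor between the LLM and the rule-based step concrete, the sketch below shows what a rule-based classifier could look like once the extraction agent has returned structured variables. It is not the investigators' RBA: the `Phenotype` fields and the `nccn_risk_group` function are hypothetical, the thresholds are a simplified reading of the NCCN localized prostate cancer risk groups, and a few criteria (flagged in the comments) are omitted. Any real use would need to be checked against the current NCCN guideline version.

```python
from dataclasses import dataclass

# Clinical T stages this sketch accepts; anything else is rejected.
KNOWN_T_STAGES = {"T1a", "T1b", "T1c", "T2a", "T2b", "T2c", "T3a", "T3b", "T4"}


@dataclass
class Phenotype:
    """Variables an extraction agent might return (field names are hypothetical)."""
    psa: float              # serum PSA, ng/mL
    t_stage: str            # clinical T stage, e.g. "T1c"
    prostate_volume: float  # mL, treated as grams for PSA density
    positive_cores: int
    total_cores: int
    primary_gleason: int    # primary Gleason pattern (3-5)
    grade_group: int        # Grade Group 1-5


def nccn_risk_group(p: Phenotype) -> str:
    """Simplified, hypothetical rule-based mapping to NCCN-style risk groups."""
    if p.t_stage not in KNOWN_T_STAGES:
        raise ValueError(f"unrecognized T stage: {p.t_stage!r}")

    # High-risk features: cT3a, Grade Group 4-5, PSA > 20 ng/mL.
    high_features = sum([p.t_stage == "T3a", p.grade_group >= 4, p.psa > 20])

    # Very high: cT3b-T4, primary Gleason pattern 5, or two or more high-risk features.
    # (Omitted in this sketch: the ">4 cores with Grade Group 4-5" criterion.)
    if p.t_stage in ("T3b", "T4") or p.primary_gleason == 5 or high_features >= 2:
        return "very high"
    if high_features == 1:
        return "high"

    # Intermediate-risk factors: cT2b-T2c, Grade Group 2-3, PSA 10-20 ng/mL.
    irf = sum([p.t_stage in ("T2b", "T2c"), p.grade_group in (2, 3), 10 <= p.psa <= 20])
    if irf:
        pct_positive = p.positive_cores / p.total_cores
        favorable = irf == 1 and p.grade_group <= 2 and pct_positive < 0.5
        return "favorable intermediate" if favorable else "unfavorable intermediate"

    # Remaining cases: cT1-T2a, Grade Group 1, PSA < 10 ng/mL.
    # Very low additionally requires cT1c, <3 positive cores, PSA density < 0.15.
    # (Omitted in this sketch: the "<=50% cancer in each core" criterion.)
    psa_density = p.psa / p.prostate_volume
    if p.t_stage == "T1c" and p.positive_cores < 3 and psa_density < 0.15:
        return "very low"
    return "low"


if __name__ == "__main__":
    example = Phenotype(psa=6.0, t_stage="T1c", prostate_volume=45.0,
                        positive_cores=3, total_cores=12,
                        primary_gleason=3, grade_group=2)
    print(nccn_risk_group(example))  # prints "favorable intermediate"
```

Keeping this step deterministic means the LLM only has to get the extraction right; once the variables are correct, the same inputs always yield the same category.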
To develop the TRA, the investigators conducted two experiments in which GPT-4 generated treatment plans with and without retrieval-augmented generation. These outputs were evaluated using a decision tree algorithm (DTA) informed by NCCN guidelines, with weighted accuracy and F1 score as the key evaluation metrics. A clinician-facing web interface (lisr.org/risk) was developed to deliver real-time risk classification and guideline-based treatment recommendations.
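The summary does not state how the F1 score was averaged across categories. Assuming a support-weighted average of per-class F1 (the "weighted" averaging offered by common libraries such as scikit-learn), the metric could be computed as below. `weighted_f1` is a hypothetical helper written for illustration, not the study's evaluation code.

```python
from collections import Counter


def weighted_f1(y_true: list[str], y_pred: list[str]) -> float:
    """Support-weighted mean of per-class F1 scores (hypothetical helper)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")
    support = Counter(y_true)  # number of gold-standard cases per category
    total = len(y_true)
    score = 0.0
    for label, n in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = n - tp
        # F1 = 2TP / (2TP + FP + FN), which equals the harmonic mean of precision and recall.
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        score += f1 * n / total  # weight each class by its share of gold labels
    return score


if __name__ == "__main__":
    gold = ["low", "low", "high", "high"]
    predicted = ["low", "high", "high", "high"]
    print(round(weighted_f1(gold, predicted), 3))  # prints 0.733
```

Weighting by support keeps common risk groups from being swamped by rare ones, but it also means a model can score well while performing poorly on small categories; a macro average would expose that.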
A total of 858 patients were included in the study: 500 for prospective validation of the RSA and 358 for the treatment recommendation analysis. In the prospective cohort, the RSA outperformed the treating clinicians, achieving an F1 score of 0.89 versus 0.58, as illustrated in the table below.
For the TRA, GPT-4 with retrieval-augmented generation (RAG) outperformed GPT-4 alone, achieving 64% full accuracy (all correct treatment options) and 36% partial accuracy (at least one correct option), compared with 35% and 65%, respectively. In a sensitivity analysis in which GPT-4 generated notes informed by the NCCN decision tree algorithm, full accuracy reached 94%, with the remaining 6% partially accurate. Notably, GPT-4 alone produced hallucinated treatment options in 71% of cases; incorporating RAG reduced this to 32%.
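The full accuracy, partial accuracy, and hallucination figures can be read as set comparisons between the options a model proposes and the options the NCCN decision tree returns. The sketch below encodes one plausible interpretation: full means every reference option was proposed, partial means at least one but not all were, and hallucinated means any proposed option falls outside the reference set. These definitions, the helper names, and the placeholder option labels are assumptions; the study's exact scoring rules may differ.

```python
from dataclasses import dataclass


@dataclass
class CaseScore:
    full: bool          # every reference option was proposed
    partial: bool       # at least one, but not every, reference option was proposed
    hallucinated: bool  # at least one proposed option is absent from the reference set


def score_case(generated: set[str], reference: set[str]) -> CaseScore:
    """Compare one model output with the decision-tree reference (assumed definitions)."""
    full = reference <= generated
    partial = bool(generated & reference) and not full
    hallucinated = bool(generated - reference)
    return CaseScore(full, partial, hallucinated)


def summarize(cases: list[tuple[set[str], set[str]]]) -> dict[str, float]:
    """Aggregate (generated, reference) pairs into cohort-level rates."""
    scores = [score_case(generated, reference) for generated, reference in cases]
    n = len(scores)
    return {
        "full_accuracy": sum(s.full for s in scores) / n,
        "partial_accuracy": sum(s.partial for s in scores) / n,
        "hallucination_rate": sum(s.hallucinated for s in scores) / n,
    }


if __name__ == "__main__":
    # Placeholder option labels; not real treatment recommendations.
    cases = [
        ({"treatment A", "treatment B"}, {"treatment A", "treatment B"}),
        ({"treatment A", "treatment X"}, {"treatment A", "treatment C"}),
    ]
    print(summarize(cases))
```

Under this reading, full and partial accuracy are mutually exclusive, which is consistent with each reported pair summing to 100%, while hallucination is tracked separately because a response can cover every correct option and still add an unsupported one.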

The investigators concluded that this multi-agent framework, integrating large language models with a rule-based algorithm, demonstrated high accuracy in both risk stratification and treatment recommendation for localized prostate cancer. The addition of an interactive interface further enhances clinical utility, supporting efficient and accurate decision-making. These findings underscore the potential of AI-driven tools to streamline localized prostate cancer management at the point of care.
Presented by: Umair Ayub, PhD, MS, Postdoctoral Research Fellow at Mayo Clinic Arizona, United States.
Written by: Julian Chavarriaga, MD – Urologic Oncologist at Cancer Treatment and Research Center (CTIC) via Society of Urologic Oncology (SUO) Fellow at The University of Toronto. @chavarriagaj on Twitter during the American Society of Clinical Oncology (ASCO) 2025 Annual Meeting, Chicago, IL, Fri, May 30 – Tues, Jun 3, 2025.