Deep Learning-Based PSMA PET Segmentation Repeatability: A Post-Hoc Analysis of a Single-Center, Prospective, Test-Retest Trial

PSMA PET images from patients treated for advanced prostate cancer can provide biomarkers to help manage treatment. These biomarkers frequently require every individual metastasis to be segmented, which is highly labour-intensive when done manually.

Artificial intelligence (AI) models can quickly generate reproducible segmentations of PSMA PET scans, but the repeatability of AI-derived imaging biomarkers remains underexplored. Repeatability analysis characterises the intrinsic variability of a biomarker across test-retest imaging, and so defines the minimum change that can be attributed to true biological change rather than measurement variability. This is critical if AI-derived biomarkers are to be used in response assessment settings.

Further complicating accurate AI-based response assessment is the variety of radiotracers used clinically for PSMA-targeted PET imaging, such as [68Ga]Ga-PSMA-11 and [18F]F-PSMA-1007, each of which has its own biodistribution and pharmacokinetic profile. There are limited data on the repeatability of AI-derived PSMA PET biomarkers either within or between radiotracer types, which has implications for AI-based response assessment and for incorporation into common response frameworks such as RECIP 1.0. Our primary objective was to quantify the test-retest repeatability of AI-derived PSMA PET biomarkers, with a secondary objective to assess the generalisability of our segmentation model across these two radiotracers.

In this post-hoc analysis of a previous prospective, single-centre, test-retest trial, we quantified the repeatability of AI-derived patient-level imaging biomarkers extracted from [68Ga]Ga-PSMA-11 and [18F]F-PSMA-1007 PET scans. Seventeen participants with metastatic prostate cancer were randomised into two groups, receiving either the same tracer for both scans (intra-tracer group, n = 9) or a different tracer for each scan (inter-tracer group, n = 8). Scans were delineated fully automatically using an AI-based method developed by our group, and semi-automatically by a nuclear medicine physician. We extracted several patient-level metrics, including PSMA-positive tumour volume, and characterised their repeatability through Bland-Altman analysis and repeatability coefficients.
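The repeatability statistics named above can be illustrated with a minimal sketch. It assumes one common convention, which may differ from the one used in the study: percentage differences normalised by the pair mean, Bland-Altman limits of agreement of bias ± 1.96 SD, and a repeatability coefficient of 1.96 × √2 × the within-subject standard deviation. The function names and the tumour volumes are hypothetical and are not trial data.

```python
import math
from statistics import mean, stdev


def percent_differences(test, retest):
    """Relative test-retest differences (%), normalised by each pair's mean."""
    return [100.0 * (b - a) / ((a + b) / 2.0) for a, b in zip(test, retest)]


def bland_altman(diffs):
    """Return (bias, lower limit, upper limit) of agreement for paired differences."""
    bias = mean(diffs)
    sd = stdev(diffs)  # sample standard deviation of the differences
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


def repeatability_coefficient(diffs):
    """RC = 1.96 * sqrt(2) * wSD, where wSD^2 = sum(d^2) / (2n) for paired scans."""
    n = len(diffs)
    wsd = math.sqrt(sum(d * d for d in diffs) / (2 * n))
    return 1.96 * math.sqrt(2) * wsd


# Hypothetical PSMA-positive tumour volumes in mL (illustrative only).
test_ml = [120.0, 45.0, 300.0, 80.0, 15.0]
retest_ml = [126.0, 43.0, 290.0, 84.0, 16.0]

diffs = percent_differences(test_ml, retest_ml)
bias, lower, upper = bland_altman(diffs)
rc = repeatability_coefficient(diffs)
print(f"bias = {bias:.1f}%, limits of agreement = [{lower:.1f}%, {upper:.1f}%]")
print(f"repeatability coefficient = {rc:.1f}%")
```

Under this convention, a change between two scans that is smaller than the repeatability coefficient cannot be distinguished from test-retest variability.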

Our results showed strong correlations between AI- and physician-derived quantitative biomarkers (e.g., rₛ = 0.85 for PSMA-positive tumour volume). Importantly, the repeatability of AI-derived biomarkers was dependent on both the tracer and the disease burden. In the intra-tracer group, the AI-derived PSMA-positive tumour volume demonstrated a repeatability coefficient of 13.8% in those with higher disease burden (≥ median tumour volume). This is well within the 20% progression threshold defined by RECIP 1.0, suggesting that AI-derived tumour volumes are sufficiently stable for use in this framework for response assessment.
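For readers unfamiliar with rₛ, the Spearman rank correlation is the Pearson correlation computed on ranks rather than raw values. The sketch below is a plain-Python illustration with average ranks for ties; the helper names and the paired volumes are hypothetical and do not reproduce the study's data or analysis code.

```python
import math
from statistics import mean


def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical AI- and physician-derived tumour volumes in mL.
ai_ml = [12.0, 48.0, 95.0, 210.0, 33.0]
physician_ml = [15.0, 44.0, 120.0, 190.0, 30.0]
print(f"Spearman correlation = {spearman(ai_ml, physician_ml):.2f}")
```

Because it depends only on rank order, this statistic measures whether the two methods order patients consistently by disease burden, not whether their absolute volumes agree.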

However, markedly poorer repeatability was observed when different tracers were used for the test and retest scans. This indicates that switching PSMA-targeting radiotracers between timepoints introduces substantial variability in biomarker values that is independent of biological change. The same radiotracer should therefore be used wherever possible when quantitative PSMA PET biomarkers are used for response assessment.

The AI segmentation model also generalised well between radiotracers. Although the model was trained exclusively on [68Ga]Ga-PSMA-11 scans, there were no significant differences in segmentation performance between [68Ga]Ga-PSMA-11 and [18F]F-PSMA-1007 scans across voxel- and lesion-level metrics.

These findings provide important quantitative evidence that AI-derived PSMA PET imaging biomarkers can achieve repeatability consistent with frameworks for assessing disease progression on PSMA PET, provided that the same radiotracer is used. This represents an important step toward clinical translation of AI-based biomarker quantification for use in response frameworks such as RECIP 1.0.
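The comparison behind this conclusion can be written as a simple check: a change threshold is usable only if it exceeds the repeatability coefficient. The sketch below uses the two figures reported in this summary (13.8% and 20%); the function and constant names are hypothetical.

```python
def exceeds_measurement_variability(change_pct: float, rc_pct: float) -> bool:
    """True if an observed percentage change is larger than the repeatability coefficient."""
    return abs(change_pct) > rc_pct


# Figures quoted in this summary; names are illustrative.
RC_INTRA_TRACER_HIGH_BURDEN_PCT = 13.8  # repeatability coefficient, same tracer, higher burden
RECIP_PROGRESSION_THRESHOLD_PCT = 20.0  # progression threshold as cited above

print(exceeds_measurement_variability(RECIP_PROGRESSION_THRESHOLD_PCT,
                                      RC_INTRA_TRACER_HIGH_BURDEN_PCT))  # prints True
```

A 10% change, by contrast, would fall inside the 13.8% repeatability coefficient and could not be separated from test-retest variability under this reading.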

Written by: Jake Kendrick,¹,²,³ Jeremy S.L. Ong,¹ Nathaniel Barry,⁴,⁵,⁶ Martin A. Ebert³,⁴,⁵,⁶

1. School of Physics, Mathematics and Computing, The University of Western Australia, Crawley, Perth, WA, Australia.
2. Centre for Advanced Technologies in Cancer Research, Perth, WA, Australia.
3. Australian Centre for Quantitative Imaging, University of Western Australia, Crawley, WA, Australia.
4. School of Physics, Mathematics and Computing, The University of Western Australia, Crawley, Perth, WA, 6009, Australia.
5. Centre for Advanced Technologies in Cancer Research, Perth, WA, Australia.
6. Department of Radiation Oncology, Sir Charles Gairdner Hospital, Perth, WA, Australia.