Characterization of High-Grade Prostate Cancer at Multiparametric MRI: Assessment of PI-RADS Version 2.1 and Version 2 Descriptors Across 21 Readers With Varying Experience (Multi Study) - Beyond the Abstract

In this study, we asked 21 radiologists of varying experience (7 seniors with more than 5 years of experience, 7 seniors with fewer than 5 years of experience, and 7 juniors) to assess a set of 240 predefined prostate MR lesions using the PI-RADS v2 and v2.1 descriptors.


In multi-reader MRI studies, radiologists usually assess the images without being shown preselected lesions. This design has been used repeatedly for prostate MRI, and both PI-RADS v2 and PI-RADS v2.1 have provided good discrimination of clinically significant prostate cancer (csPCa), but with moderate inter-reader reproducibility.1-3 However, with this ‘classical’ design, it may be difficult to determine, when readers disagree, whether the disagreement arises from discrepancies in lesion detection (e.g., one reader detected a lesion that the other overlooked) or in characterization (both readers saw the same lesion but disagree on its degree of suspicion).

In this study, we made three major methodological choices. First, by indicating to the readers the lesion(s) they should score, we deliberately skipped the detection phase to assess only the characterization phase. Second, we insisted that the readers apply the written PI-RADS descriptors rigorously; we wanted to evaluate these descriptors rather than the readers themselves. Third, we retrospectively selected MRIs from 2015-2016 because, at that time, our biopsy policy required targeting virtually all focal prostate lesions, regardless of their degree of suspicion. This allowed us to build a dataset of lesions spanning a wide range of degrees of suspicion, for which a histological standard of reference was available.

Thus, our purpose was not to directly assess the accuracy of PI-RADS v2 and PI-RADS v2.1 in clinical practice. It was rather to evaluate whether the PI-RADS descriptors were specific enough to allow readers with different levels of experience to assign the same scores to the same lesions.

To our knowledge, this question has not been extensively addressed in the literature. In particular, the reproducibility of diagnostic criteria such as lesion size or location (in the transition, peripheral, or central zone) has never been assessed, even though these criteria have a substantial impact on the final PI-RADS score.

Are the PI-RADS v2 and v2.1 descriptors accurate and specific enough? The answer is nuanced.

On one hand, juniors obtained good results in discriminating csPCa, with an area under the curve (AUC) of around 0.80 using both PI-RADS v2 and PI-RADS v2.1 descriptors. Thus, even with little experience, a reader strictly applying the PI-RADS v2/v2.1 descriptors will, in about 80% of csPCa/non-csPCa lesion pairs, assign a higher score to the csPCa lesion than to the non-csPCa lesion (tied scores counting as half). This is not a bad result. Interestingly, there was a trend toward improved specificity with the PI-RADS v2.1 descriptors, but the effect remained small and variable across readers.
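
The pairwise reading of the AUC in this paragraph can be illustrated with a short computation. The Python sketch below is a minimal illustration, not the study's analysis: the function name `pairwise_auc` and the scores are hypothetical, assuming five-point PI-RADS scores. It compares every csPCa lesion with every non-csPCa lesion and credits ties as half a pair, which matters because ordinal scores from 1 to 5 tie often.

```python
from itertools import product


def pairwise_auc(cspca_scores, non_cspca_scores):
    """Empirical AUC: share of (csPCa, non-csPCa) lesion pairs in which the
    csPCa lesion received the higher score, with tied pairs counted as half."""
    if not cspca_scores or not non_cspca_scores:
        raise ValueError("both lesion groups must be non-empty")
    credit = 0.0
    for pos, neg in product(cspca_scores, non_cspca_scores):
        if pos > neg:
            credit += 1.0
        elif pos == neg:
            credit += 0.5
    return credit / (len(cspca_scores) * len(non_cspca_scores))


# Hypothetical PI-RADS scores (1-5), NOT data from the study
cspca = [5, 4, 4, 5, 3, 4]
non_cspca = [2, 3, 4, 2, 3, 1, 2]
print(round(pairwise_auc(cspca, non_cspca), 3))  # prints 0.917
```

With these made-up scores, 38.5 of the 42 pairs are credited to the csPCa lesion. This pairwise count is the Mann-Whitney U statistic divided by the number of pairs, which is the usual nonparametric estimate of the AUC.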

On the other hand, even though the PI-RADS descriptors are meant to be as clear as possible, experienced seniors obtained significantly higher AUCs than the two other groups of readers. This suggests that the descriptors remain subjective and that appropriately distinguishing ‘marked’ from ‘non-marked’ abnormalities, ‘encapsulated’ from ‘mostly encapsulated’ nodules, or ‘focal’ from ‘non-focal’ enhancement requires experience. Additionally, inter-reader agreement remained moderate in all groups of readers with both the PI-RADS v2 and PI-RADS v2.1 descriptors, although v2.1 was designed to improve reproducibility. Of note, even criteria that are usually taken for granted, such as lesion location, did not show perfect agreement among readers.
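
This commentary does not state which agreement statistic the study used. As one common way to quantify agreement on an ordinal scale, the following Python sketch computes Cohen's quadratic-weighted kappa between two readers and maps it onto the Landis and Koch bands, in which values from 0.41 to 0.60 are labeled ‘moderate’. The function names `weighted_kappa` and `landis_koch_label` and the example scores are hypothetical, and this two-reader version is only meant to show how such a statistic works, not how agreement among 21 readers was computed.

```python
def weighted_kappa(reader_a, reader_b, categories=(1, 2, 3, 4, 5)):
    """Cohen's quadratic-weighted kappa for two readers scoring the same lesions."""
    if len(reader_a) != len(reader_b) or not reader_a:
        raise ValueError("both readers must score the same non-empty set of lesions")
    k = len(categories)
    index = {c: i for i, c in enumerate(categories)}
    n = len(reader_a)

    # Observed proportion of lesions for each (reader A score, reader B score) pair
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(reader_a, reader_b):
        observed[index[a]][index[b]] += 1.0 / n

    # Marginal score distribution of each reader
    marg_a = [sum(observed[i]) for i in range(k)]
    marg_b = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    def weight(i, j):
        # Quadratic disagreement weight: 0 on the diagonal, 1 at maximal distance
        return ((i - j) / (k - 1)) ** 2

    cells = [(i, j) for i in range(k) for j in range(k)]
    observed_dis = sum(weight(i, j) * observed[i][j] for i, j in cells)
    expected_dis = sum(weight(i, j) * marg_a[i] * marg_b[j] for i, j in cells)
    if expected_dis == 0:
        raise ValueError("kappa is undefined when no disagreement is expected by chance")
    return 1.0 - observed_dis / expected_dis


def landis_koch_label(kappa):
    """Verbal label on the Landis and Koch (1977) agreement scale."""
    if kappa < 0:
        return "poor"
    for upper, label in ((0.20, "slight"), (0.40, "fair"),
                         (0.60, "moderate"), (0.80, "substantial")):
        if kappa <= upper:
            return label
    return "almost perfect"


# Hypothetical PI-RADS scores from two readers on the same 10 lesions (NOT study data)
reader_a = [3, 4, 5, 2, 4, 3, 5, 1, 2, 4]
reader_b = [3, 5, 4, 2, 3, 3, 5, 2, 2, 3]
kappa = weighted_kappa(reader_a, reader_b)
print(f"weighted kappa = {kappa:.2f} ({landis_koch_label(kappa)})")
```

Quadratic weights penalize a 2-versus-5 disagreement far more than a 4-versus-5 one, which suits an ordinal score such as PI-RADS; unweighted kappa would treat both disagreements alike.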

Can we do better? One may doubt it. Recently, the Bosniak classification of renal cystic lesions underwent a major revision aimed at clarifying its descriptors.4 And yet, although all descriptors are now strictly defined, the new classification failed to improve specificity and inter-reader agreement.5 When the appearance of benign and malignant conditions overlaps substantially, as is the case with prostate lesions and renal cysts, there may be an intrinsic limit to qualitative or semi-objective human reading.

Some authors have suggested using quantitative thresholds for MR biomarkers, such as the apparent diffusion coefficient (ADC), to help distinguish the different PI-RADS categories. However, this approach is limited by the major variability of these measurements across acquisition protocols and scanner manufacturers.6

Artificial intelligence (AI)-based algorithms may help improve csPCa discrimination on MRI in the long run. However, in that domain too, robustness remains an issue: most algorithms show decreased performance when tested on independent external cohorts.7

Pending a hypothetical improvement from artificial intelligence, the PI-RADS scoring system provides good discrimination of csPCa on prostate MRI and should remain the basis of prostate MRI interpretation, despite its only moderate inter-reader agreement.

Written by: Olivier Rouvière, MD, PhD, Department of Imaging, Hôpital Edouard Herriot, Lyon, France.

References:

  1. Park KJ, Choi SH, Lee JS, Kim JK, Kim MH. Interreader Agreement with Prostate Imaging Reporting and Data System Version 2 for Prostate Cancer Detection: A Systematic Review and Meta-Analysis. J Urol. 2020;204:661-70.
  2. Park KJ, Choi SH, Lee JS, Kim JK, Kim MH, Jeong IG. Risk Stratification of Prostate Cancer According to PI-RADS® Version 2 Categories: Meta-Analysis for Prospective Studies. J Urol. 2020;204:1141-9.
  3. Hötker AM, Blüthgen C, Rupp NJ, Schneider AF, Eberli D, Donati OF. Comparison of the PI-RADS 2.1 scoring system to PI-RADS 2.0: Impact on diagnostic accuracy and inter-reader agreement. PLoS One. 2020;15:e0239975.
  4. Silverman SG, Pedrosa I, Ellis JH, Hindman NM, Schieda N, Smith AD, et al. Bosniak Classification of Cystic Renal Masses, Version 2019: An Update Proposal and Needs Assessment. Radiology. 2019;292:475-88.
  5. Smith AD. Bosniak Classification Version 2019: Counterpoint-It's Complicated. AJR Am J Roentgenol. 2022;218:421-2.
  6. Shukla-Dave A, Obuchowski NA, Chenevert TL, Jambawalikar S, Schwartz LH, Malyarenko D, et al. Quantitative imaging biomarkers alliance (QIBA) recommendations for improved precision of DWI and DCE-MRI derived biomarkers in multicenter oncology trials. J Magn Reson Imaging. 2019;49:e101-e21.
  7. Rouvière O, Jaouen T, Baseilhac P, Benomar ML, Escande R, Crouzet S, et al. Artificial intelligence algorithms aimed at characterizing or detecting prostate cancer on MRI: How accurate are they when tested on independent cohorts? - A systematic review. Diagn Interv Imaging. 2022.