SIU 2017: Automatic Grading of Prostate Cancer using the Gleason Grading Groups

Lisbon, Portugal: Prostate cancer (PCa) grade in the diagnostic biopsy is an important determinant of patient management. The Gleason grading groups (GGG) were recently introduced for grading prostate cancer. This new scoring system shows inter-observer agreement similar to that of the Gleason score (60%). Computer-aided diagnosis systems using convolutional neural networks (CNN) have been shown to approach human performance in diagnosing skin disease while reducing inter-observer variability. A CNN learns to recognize patterns from a gold standard and to distinguish preset categories. Applied to digital histology slides, a CNN can learn to detect abnormalities and indicate the likelihood of tumor presence. The aim of this feasibility study was to explore the accuracy of a CNN for grading prostate biopsies using the recently introduced GGG.

H&E-stained, formalin-fixed, paraffin-embedded core biopsies from ten patients were digitized using the Philips Ultra-Fast Scanner at 20X magnification. The gold standard was a set of manual annotations, confirmed by a dedicated urologic pathologist. The CNN was trained on 150 annotated biopsy fragments to differentiate between the different Gleason scores. The GGG were derived from the two Gleason grades with the highest percentages and divided into grade groups 1, 2 and ≥3, based on clinical consequence. A test set of 15 biopsy fragments that had not been used to train the CNN was generated. Accuracy was calculated by dichotomizing the grading of the test set, so as to separate the different treatment groups, and comparing the result with the classification in the pathology report.
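The grouping step described above can be sketched in a few lines of Python. This is a minimal illustration and not the authors' code: it assumes the standard Gleason grade group mapping (3+3 is group 1, 3+4 is group 2, 4+3 is group 3, score 8 is group 4, scores 9-10 are group 5), and the function names `grade_group` and `study_category` are hypothetical.

```python
def grade_group(primary: int, secondary: int) -> int:
    """Map the two most prevalent Gleason patterns to a grade group (1-5).

    Assumption: the standard grade group definitions apply; the report
    does not spell out the exact mapping used in the study.
    """
    if primary not in (3, 4, 5) or secondary not in (3, 4, 5):
        raise ValueError("Gleason patterns are expected to be 3, 4 or 5")
    score = primary + secondary
    if score <= 6:
        return 1
    if score == 7:
        # 3+4 and 4+3 share a score but fall into different groups.
        return 2 if primary == 3 else 3
    if score == 8:
        return 4
    return 5


def study_category(primary: int, secondary: int) -> str:
    """Collapse grade groups into the three categories named in the text:
    1, 2, and 3 or higher (hypothetical label "3+")."""
    group = grade_group(primary, secondary)
    return "3+" if group >= 3 else str(group)


if __name__ == "__main__":
    for patterns in [(3, 3), (3, 4), (4, 3), (4, 5)]:
        print(patterns, "->", study_category(*patterns))
```

Dichotomizing at a treatment threshold then reduces to a comparison such as `grade_group(primary, secondary) >= 2`.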

In distinguishing GGG ≤I from ≥II, the CNN showed a sensitivity, specificity and accuracy of 65%, 93% and 75%, respectively. In distinguishing GGG ≤II from ≥III, it showed a sensitivity, specificity and accuracy of 100%, 67% and 73%, respectively.
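For readers unfamiliar with these metrics, the following Python sketch shows how sensitivity, specificity and accuracy follow from dichotomized labels. It is an illustration under stated assumptions: the higher-grade category is treated as the positive class (the report does not say which side was positive), the function name `binary_metrics` is hypothetical, and the example labels are invented for demonstration and are not the study's data.

```python
import math


def binary_metrics(reference, predicted):
    """Compute sensitivity, specificity and accuracy for dichotomized labels.

    `reference` holds the gold-standard labels (True = higher-grade category),
    `predicted` holds the classifier's labels in the same order.
    """
    if len(reference) != len(predicted):
        raise ValueError("reference and predicted must have the same length")
    tp = fp = tn = fn = 0
    for ref, pred in zip(reference, predicted):
        if ref and pred:
            tp += 1
        elif ref and not pred:
            fn += 1
        elif not ref and pred:
            fp += 1
        else:
            tn += 1
    total = tp + fp + tn + fn
    return {
        # Fraction of true higher-grade cases that were detected.
        "sensitivity": tp / (tp + fn) if (tp + fn) else math.nan,
        # Fraction of true lower-grade cases that were correctly left out.
        "specificity": tn / (tn + fp) if (tn + fp) else math.nan,
        # Fraction of all cases classified correctly.
        "accuracy": (tp + tn) / total if total else math.nan,
    }


if __name__ == "__main__":
    # Invented labels for illustration only.
    reference = [True, True, True, False, False, False, False, False]
    predicted = [True, True, False, False, False, False, True, False]
    print(binary_metrics(reference, predicted))
```

With a test set as small as the one described here, a single reclassified fragment shifts each percentage by several points, which is one reason the authors call for a larger dataset.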

This feasibility study shows the potential value of a CNN in the grading of PCa. To validate the results of the CNN, multiple pathologists need to confirm the gold standard in order to correct for inter-observer variation. A larger and more heterogeneous dataset is required to further validate these results.

Presented by: Jansen I
Affiliation: Academic Medical Center, Amsterdam, The Netherlands

Written by: Hanan Goldberg, MD, Urologic Oncology Fellow (SUO), University of Toronto, Princess Margaret Cancer Centre. Twitter: @GoldbergHanan, at the 37th Congress of Société Internationale d’Urologie, October 19-22, 2017, Lisbon, Portugal