Development of a Natural Language Processing Engine to Generate Bladder Cancer Pathology Data for Health Services Research

As a first step towards assembling population-based cohorts of bladder cancer patients with longitudinal pathology data, we developed and validated a natural language processing (NLP) engine that abstracts pathology data from full-text pathology reports.

Using 600 bladder pathology reports randomly selected from the Department of Veterans Affairs, we developed and validated an NLP engine to abstract data on histology, invasion (presence versus absence and depth), grade, presence of muscularis propria, and presence of carcinoma in situ. Our gold standard was based on independent review of reports by two urologists, followed by adjudication. We assessed NLP performance by calculating accuracy, positive predictive value (PPV), and sensitivity. We subsequently applied the NLP engine to pathology reports from 10,725 bladder cancer patients.
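The three performance measures named above can be illustrated with a minimal sketch. It assumes each report has one gold-standard label and one NLP label for a given variable. The function names and the toy labels are hypothetical and are not taken from the study. For a multi-category variable such as depth of invasion, PPV and sensitivity are computed per category by treating that category as "positive" and all others as "negative".

```python
def accuracy(gold, pred):
    """Fraction of reports where the NLP label equals the gold-standard label."""
    if len(gold) != len(pred) or not gold:
        raise ValueError("gold and pred must be non-empty and of equal length")
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def ppv_and_sensitivity(gold, pred, label):
    """PPV and sensitivity for one category, treated as the positive class.

    Returns None for a measure whose denominator is zero.
    """
    pairs = list(zip(gold, pred))
    tp = sum(g == label and p == label for g, p in pairs)  # true positives
    fp = sum(g != label and p == label for g, p in pairs)  # false positives
    fn = sum(g == label and p != label for g, p in pairs)  # false negatives
    ppv = tp / (tp + fp) if (tp + fp) else None
    sensitivity = tp / (tp + fn) if (tp + fn) else None
    return ppv, sensitivity


# Toy, made-up labels for depth of invasion (illustration only, not study data).
gold = ["none", "lamina propria", "muscularis propria",
        "lamina propria", "none", "muscularis propria"]
pred = ["none", "lamina propria", "lamina propria",
        "lamina propria", "none", "muscularis propria"]

print("accuracy:", accuracy(gold, pred))
for category in ("lamina propria", "muscularis propria"):
    print(category, ppv_and_sensitivity(gold, pred, category))
```

In this toy example one muscularis propria report is mislabeled as lamina propria, which lowers PPV for lamina propria and sensitivity for muscularis propria while leaving the other two measures untouched. This shows why a variable can have modest overall accuracy yet acceptable PPV for individual categories.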

Compared with the gold standard, the NLP engine achieved its highest accuracy (0.98) for presence versus absence of carcinoma in situ. Accuracy for histology, invasion (presence versus absence), grade, and presence of muscularis propria ranged from 0.83 to 0.96. The most challenging variable was depth of invasion (accuracy 0.68), although PPV was acceptable for lamina propria (0.82) and muscularis propria (0.87) invasion. The validated engine was able to abstract pathologic characteristics for 99% of bladder cancer patients.

The NLP engine had high accuracy for five of six variables and abstracted data for the vast majority of patients. This now allows assembly of population-based cohorts with longitudinal pathology data.

Urology. 2017 Sep 12 [Epub ahead of print]

Florian R Schroeck, Olga V Patterson, Patrick R Alba, Erik A Pattison, John D Seigne, Scott L DuVall, Douglas J Robertson, Brenda Sirovich, Philip P Goodney

White River Junction VA Medical Center, White River Junction, VT; Section of Urology; Norris Cotton Cancer Center, Dartmouth Hitchcock Medical Center, Lebanon, NH; The Dartmouth Institute for Health Policy and Clinical Practice, Geisel School of Medicine at Dartmouth College. VA Salt Lake City Health Care System and University of Utah, Salt Lake City, UT. White River Junction VA Medical Center, White River Junction, VT; Section of Urology. Section of Urology; Norris Cotton Cancer Center, Dartmouth Hitchcock Medical Center, Lebanon, NH. White River Junction VA Medical Center, White River Junction, VT; The Dartmouth Institute for Health Policy and Clinical Practice, Geisel School of Medicine at Dartmouth College.