PURPOSE: We examined the degree of exclusion bias that may occur due to missing data when prostate cancer cases from the SEER (Surveillance, Epidemiology and End Results) database are grouped into D'Amico clinical risk groups.
Exclusion bias may occur because D'Amico classification requires all 3 tumor variables (prostate specific antigen, Gleason score and clinical T stage) to be known, and the data may not be missing at random.
MATERIALS AND METHODS: From the SEER database we identified 132,606 men with incident prostate cancer diagnosed from 2004 to 2006. We documented age, race, Gleason score, clinical T stage, prostate specific antigen and geographic region. Men were categorized into D'Amico risk groups. Those with 1 or more unknown tumor variables (prostate specific antigen, T stage and/or Gleason score) were labeled unclassified. For men with known vs unknown prostate specific antigen, Gleason score and T stage, we compared the values of the other 2 known clinical variables. Demographics were compared for men with and without missing data. Comparisons were made using chi-square tests and logistic regression.
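The grouping rule described above, where any unknown tumor variable makes a case unclassified, can be sketched in a few lines. This is a minimal illustration, not the authors' code: the abstract does not list the risk-group cut-offs, so the sketch assumes the commonly cited D'Amico thresholds (low: PSA 10 ng/mL or less, Gleason 6 or less, T1c to T2a; intermediate: PSA above 10 to 20, Gleason 7, or T2b; high: PSA above 20, Gleason 8 or more, or T2c, with T3a also treated as high here). The helper names (`damico_group`, `unclassified_fraction`) and the example records are hypothetical and are not SEER data.

```python
from typing import Optional

# Assumed D'Amico cut-offs (not stated in the abstract).
# Each variable maps to a tier: 0 = low, 1 = intermediate, 2 = high.
T_STAGE_TIER = {
    "T1a": 0, "T1b": 0, "T1c": 0, "T2a": 0,
    "T2b": 1,
    "T2c": 2, "T3a": 2,  # T3a grouped as high risk here (assumption)
}
GROUPS = ("low", "intermediate", "high")


def psa_tier(psa: float) -> int:
    """PSA in ng/mL: <=10 low, >10 to 20 intermediate, >20 high."""
    if psa > 20:
        return 2
    if psa > 10:
        return 1
    return 0


def gleason_tier(gleason: int) -> int:
    """Gleason score: <=6 low, 7 intermediate, >=8 high."""
    if gleason >= 8:
        return 2
    if gleason == 7:
        return 1
    return 0


def damico_group(psa: Optional[float], gleason: Optional[int],
                 t_stage: Optional[str]) -> str:
    """Hypothetical helper returning a D'Amico group or 'unclassified'.

    Following the rule in the abstract, a case with any unknown (None)
    tumor variable is labeled unclassified rather than imputed.
    Otherwise the highest-risk feature of the three sets the group.
    T-stage codes missing from T_STAGE_TIER raise KeyError.
    """
    if psa is None or gleason is None or t_stage is None:
        return "unclassified"
    tier = max(psa_tier(psa), gleason_tier(gleason), T_STAGE_TIER[t_stage])
    return GROUPS[tier]


def unclassified_fraction(cases) -> float:
    """Share of (psa, gleason, t_stage) records that cannot be grouped."""
    groups = [damico_group(*case) for case in cases]
    return groups.count("unclassified") / len(groups)


# Made-up records for illustration only; not SEER data.
EXAMPLE_CASES = [
    (5.2, 6, "T1c"),    # low
    (14.0, 7, "T2a"),   # intermediate
    (25.0, 6, "T1c"),   # high (PSA > 20)
    (None, 6, "T1c"),   # unclassified: PSA unknown
    (8.0, 9, None),     # unclassified: T stage unknown
    (4.0, 6, "T2a"),    # low
]

if __name__ == "__main__":
    for case in EXAMPLE_CASES:
        print(case, damico_group(*case))
    print(f"unclassified: {unclassified_fraction(EXAMPLE_CASES):.1%}")
```

Note that under this strict rule the record `(8.0, 9, None)` is excluded even though its known Gleason score alone would place it in the high-risk group. That is the kind of case whose removal can introduce the exclusion bias the study examines.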
RESULTS: Of the men, 33% had 1 or more unknown tumor variables, with T stage the most commonly missing variable. When T stage or prostate specific antigen was missing, the values of the other 2 known tumor variables did not differ in a clinically significant way. Men older than 75 years were more likely than younger men to have unknown variables. The frequency of unclassified D'Amico data varied significantly by geographic region.
CONCLUSIONS: In studies in which the data set is limited to men who can be classified into a D'Amico risk group, 33% of eligible patients are excluded from analysis. These men are older and from certain SEER registries, but their tumor characteristics are similar to those of men with complete data.
Elliott SP, Johnson DP, Jarosek SL, Konety BR, Adejoro OO, Virnig BA.
University of Minnesota, Minneapolis, Minnesota.
Reference: J Urol. 2012 Jun;187(6):2026-31.