When selecting a dataset, it is important to consider the following points:
- Dataset complexity (which increases with the sample size), file structure (single vs. multiple records per individual), and complexity of survey design
- Cost
- Time to acquire access and institutional review board (IRB) approval
- Ability to link to other datasets
Table 1: Comparison of some of the commonly used databases [1]

Dr. Albertsen stressed that once the data have been acquired and analyzed, it is important to use them to tell a meaningful story and to always keep the ultimate goal in mind. Sensitivity analyses are worth including because they strengthen the overall story. It is also critical to make sure that the findings are internally consistent, and intelligent use of figures is recommended, as figures are more easily remembered by readers.
Dr. Albertsen concluded his great talk by summarizing the main advantages and disadvantages of these databases. The advantages include:
- Ability to assess outcomes that would be impossible to assess with primary data collection
- Large sample sizes can be available in relatively short time frames
- Costs are frequently a fraction of those of primary data collection
- Modern computers can perform remarkably sophisticated analyses
The disadvantages include:
- Very difficult to prove causality
- Populations and data elements included can often lead to selection biases
- Key variables that are needed to control for confounding are often absent
- Missing data can compromise generalizability and long-term outcomes
- Data elements often lack validation
Written By: Hanan Goldberg, MD, Urologic Oncology Fellow (SUO), University of Toronto, Princess Margaret Cancer Centre @GoldbergHanan at American Urological Association's 2019 Annual Meeting (AUA 2019), May 3 – 6, 2019 in Chicago, Illinois
Reference:
1. Boffa DJ, Rosen JE, Mallin K, et al. Using the National Cancer Database for Outcomes Research: A Review. JAMA Oncol. 2017 Dec 1;3(12):1722-1728. doi: 10.1001/jamaoncol.2016.6905.