European Urology - Prostate Cancer Nomograms: An Update
Wednesday, 25 October 2006
Volume 50, Issue 5, Pages 914-926 (November 2006)

1. Introduction

The field of prostate cancer (PCa) prognostics has exploded in the last decade, and clinicians have been provided with numerous tools to assist with evidence-based medical decision-making [1]. Most of these “decision aids” consist of nomograms such as the Kattan nomogram of biochemical recurrence (BCR) after radical prostatectomy (RP) [2], [3], [4], [5], artificial neural networks (ANNs) that were pioneered by Snow et al. [6], probability tables such as the perhaps most widely known and applied Partin staging tables [7], and Classification and Regression Tree (CART) analyses, such as the Hamburg lymph node or side-specific extracapsular extension (SS-ECE) algorithms [8], [9]. These distinct models address various PCa outcomes that range from prediction of biopsy outcome in men considered at risk of PCa [10], [11], [12], [13], [14], [15], [16] through prediction of specific pathologic features [17], [18], [19], [20], [21], [22], [23], [24], such as the likelihood of Gleason upgrading between biopsy and RP pathology [18], to prediction of side-specific extracapsular extension [21] at RP and death from hormone-refractory PCa [25]. For some outcomes more than one model might be available, which makes model selection difficult. Because of the overwhelming output in the field of PCa outcomes and prognostics, because nomograms achieve predictive accuracy (PA) measures as high as those of ANNs and other machine learning methods, and, most importantly, because nomograms are more readily comparable, this update focuses only on PCa probability nomograms that are based on traditional logistic regression and Cox regression analyses [26], [27], [28], [29].

2. Defining and reading nomograms

Various distinct statistical methodologies have broadly been described as “nomograms.” However, the statistical definition of a nomogram applies to a specific functional representation that graphically displays prediction models using lines with numeric scales. These models are based on traditional statistical methods such as multivariable logistic regression analysis to predict a binary outcome or Cox regression analysis to predict a prognostic outcome [1], [29]. Fig. 1 displays an example of a nomogram predicting Gleason sum upgrading between biopsy and final pathology [18]. To obtain the nomogram-predicted probability of biopsy upgrading, locate the patient values on each axis. Subsequently, draw a vertical line to the “Point” axis to determine how many points are attributed to each variable value. Then, sum the points for all variables. Locate the sum on the “Total Points” line to read off the individual probability of biopsy Gleason sum upgrading.

Fig. 1. Nomogram predicting Gleason sum upgrading between biopsy and radical prostatectomy pathology. To obtain the nomogram-predicted probability of biopsy upgrading, locate patient values on each axis. Draw a vertical line to the “Point” axis to determine how many points are attributed to each variable value. Sum the points for all variables. Locate the sum on the “Total Points” line to read off the individual probability of biopsy Gleason sum upgrading on the “P(Upgrade)” line. PSA=prostate-specific antigen (ng/ml); BX Gleason Pri=primary biopsy Gleason score; BX Gleason sec=secondary biopsy Gleason score; P(Upgrade)=probability of biopsy Gleason sum upgrading.

The Loess calibration plot, which graphically explores the correspondence between nomogram-predicted probability and observed rate of Gleason sum upgrading between biopsy and final pathology, is shown in Fig. 2 [18].
Its x-axis represents the nomogram-predicted probability, and its y-axis represents the observed rate of Gleason sum upgrading. Perfect predictions correspond to the 45° line.

Fig. 2. Calibration plot of a nomogram for prediction of biopsy Gleason sum upgrading, where the x-axis represents predicted probability and the y-axis represents observed fraction with evidence of upgrading between biopsy and final pathology. Perfect predictions correspond to the 45° line. Points situated below the 45° line represent overprediction, whereas points situated above the 45° line represent underprediction. A nonparametric, smoothed curve indicates the relationship between predicted probability and observed frequency of biopsy Gleason sum upgrading. Vertical lines indicate the frequency distribution of predicted probabilities.

3. Nomogram criteria

The following criteria can be proposed for nomograms and other prediction models:

(1) Level of complexity represents an important consideration. Excessively complex models are clearly impractical in busy clinical practice. Similarly, models that require computational infrastructure might pose problems with their applicability. For example, ANNs can accurately predict several outcomes of interest [6], [13], [26], [27]. However, the use of ANNs might be restricted due to lack of access to the ANN code or lack of computer infrastructure. Probability tables, such as the Partin tables [7], decision trees based on CART models [8], [9], or nomograms [10], [11], [12], [14], [15], [16] represent user-friendly, paper-based alternatives, which bypass these problems.

(2) Predictive accuracy (PA) is the most important consideration [26], [27], [28], [29], [30], [31], [32]. Current statistical methods offer the possibility of assessing a model's PA. Usually, PA is derived using the receiver operating characteristic (ROC) area under the curve (AUC) and is expressed as a percentage.
The ROC AUC measures discrimination only; PA, in contrast, is based on both discrimination and calibration. PA values range from 50% to 100%, where 50% is equivalent to a flip of a coin and 100% represents perfect prediction. No model is perfect, and generally accepted PAs range from 70% to 80% [1], [2], [3], [4], [5], [6], [7]. PA should ideally be confirmed in an external cohort. Alternatively, statistical methods such as bootstrapping may be used to internally validate the model [12], [15], [27], [31], [32].

(3) Performance characteristics represent another important consideration. Accuracy indicates the overall ability of the model to predict the outcome of interest. However, the overall PA does not inform the user on how good or how bad the predictions might be in specific patient subgroups. Some models may be ideally suited for predictions in high-risk patients but may work poorly in low-risk patients. Other models may predict well throughout the range of predictions [27].

(4) Model generalisability is important because patient characteristics can vary. For example, PCa characteristics may not be the same in Europe as in the United States [21]. Prior to using a tool, the clinician should ensure that it was validated in patients with similar disease characteristics [33], [34], [35], [36]. For example, the preoperative BCR Kattan nomogram has been validated in a community-based cohort [34].

(5) Finally, when judging a new tool, one should examine its accuracy, validity, and performance characteristics relative to established models, with the intent of determining whether the new model offers advantages relative to available alternatives [27], [28], [29], [30], [31], [32].

Availability of several high-quality predictive models should encourage the clinician to adopt these tools into everyday clinical practice. Arguments favouring such behaviour include standardisation of care and of decision-making.

4. Nomogram limitations

Despite their advantages, the limitations of nomograms must be acknowledged. Every nomogram depends on its development cohort, and most PCa nomograms are based on single-centre series, on data from tertiary care referral centres, or on both [2], [3], [4], [5], [18].

(1) Despite prospective data collection, nomogram modeling itself represents a retrospective statistical methodologic approach.

(2) Nomogram update. Tools that were developed in a different era may not provide equally accurate predictions in contemporary patients. For example, nomograms that are based on systematic sextant biopsy information should be updated according to the current gold standard, namely, extended biopsy schemes [22].

(3) Finally, the predicted outcome of interest needs to be put in perspective. PCa nomograms covering pathologic stage predictions or BCR after RP clearly represent established and clinically useful decision aids. However, BCR after treatment represents only a surrogate end point, and the definitive assessment of the effect of any predictor will require analyses of survival or metastatic progression rates. Nevertheless, D’Amico et al. showed that patients with evidence of BCR are at increased risk of dying of PCa [37]. Moreover, it is encouraging to note that several recently published nomograms predict outcomes beyond BCR [38], [39], [40], [41], [42], [43].

5. Clinical value of nomograms

Controversy surrounds the question of the clinical value of nomograms. Studies have shown that nomograms predict more accurately than clinicians [44], [45]. Thus, it appears that nomograms have a better ability to predict the outcome of interest than even expert clinicians. It is conceivable that the advantage related to the use of nomogram predictions may be even more important if clinical ratings were obtained from less expert clinicians.
In breast cancer, nomogram prediction clearly outperformed clinical judgement (72% vs. 54%), where 50% equals a flip of a coin and 100% represents perfect prediction [44]. In the field of PCa, Ross et al. [45] showed that urologist predictions of BCR after RP were inferior to the nomogram (the concordance index decreased from 67% with the nomogram to 55% with urologist predictions, −12%, p<0.05). Increases in accuracy related to the use of nomograms may not only be of statistical significance but, more importantly, may be clinically meaningful. For example, an increase in accuracy of 12% translates into 120 of 1000 patients who are provided with accurate predictions when the nomogram is used [44]. This figure then needs to be extrapolated to the disease prevalence and subsequently to the number of diagnostic or therapeutic procedures. Thus, from a health economic, medical, and personal standpoint, even a small increase in predictive accuracy (PA) translates into a clinically important number of patients who are provided with accurate predictions. However, criteria for selecting a model need to be considered. For example, patients who received neoadjuvant hormonal therapy are excluded from most of the models. Consequently, those models cannot be applied to such patients [18].

6. Patient perspective

Patients are becoming increasingly aware of the existence of predictive tools. This trend is likely to increase in the future. Patients are also increasingly demanding to actively participate in decision-making, which may, in part, be explained by the following observations:

(1) Advances in therapeutics have offered numerous treatment options, and men no longer accept paternalistic physician-centered treatment decision-making. Instead, they demand to know the efficacy and detailed side-effect profiles of treatment alternatives.

(2) The patient is increasingly recognised as a pivotal player in medical decision-making. Decisions can no longer be made by the physician alone.
For example, the American Urological Association suggests a detailed informed consent prior to prostate-specific antigen (PSA) testing.

(3) Health care “consumerism” is a growing phenomenon in North America and Europe. Patients select what option of health care to purchase, rather than passively receiving a given treatment modality.

(4) Attention to bioethical considerations has greatly increased over the past decade and has promoted autonomous decision-making.

Thus, it may be postulated that greater emphasis will be placed on standardised predictions, which will further promote the development of new tools or the improvement of existing predictive tools. These considerations may motivate clinicians to adopt the use of decision tools. Their motivation may also stem from the wealth of clinical data used for the development and validation of each model. Most decision tools are based on thousands of observations, and it is virtually impossible to achieve that level of clinical exposure and expertise on an individual basis. Moreover, most clinicians do not have the capacity to systematically record or remember the risk characteristics of thousands of patients. Additionally, unlike computers, clinicians are incapable of systematically and cumulatively processing the recorded risk characteristics and outcomes of historic cases and of deriving an estimated probability of outcome for a new case at hand. Thus, it may be expected that the majority of physician-derived estimates are not as accurate as computer-derived decision models [44], [45]. Despite this advantage, decision tools are not meant to replace clinical judgement. The input from clinicians needs to be weighed against the pros and cons of several other considerations, such as comorbidity, cost, and social, religious, or emotional considerations. The above criteria are meant to provide guidelines for the process of decision aid selection.
However, a list of hypothetical criteria might not appeal to clinicians, and uncertainty may persist in choosing a reliable decision aid. To address this issue, we provide an organised update of PCa nomograms. For each nomogram, we recorded the predictor variables, the outcome of interest, the number of patients used to develop the nomogram, nomogram-specific features, the accuracy estimates, and whether any validation (internal or external) has been performed.

7. Prediction of biopsy outcome

Prediction tools are needed to assist with the identification of those at highest risk of harbouring PCa on either initial, repeat, or saturation biopsy [10], [11], [12], [14], [15], [16], [46]. Table 1 displays these efforts within the initial biopsy setting. However, since the late 1990s, an extended biopsy scheme has represented the standard of care in the early detection of PCa. This scheme consists of at least 10 biopsy cores and increases the detection rate by 30% relative to the sextant scheme [16]. This trend suggests that nomograms developed in the sextant biopsy era may not predict the probability of PCa on needle biopsy in the extended biopsy era as accurately as they did in the sextant biopsy era. Based on this assumption, many clinicians are reluctant to use tools that were developed in the sextant biopsy era [10], [11], [12], [13].
DRE=digital rectal examination; PSA=prostate-specific antigen; TRUS=transrectal ultrasound; fPSA=free prostate-specific antigen.

Within the repeat biopsy setting, PCa detection rates are as high as 30% and continue to remain elevated at subsequent biopsy sessions, as evidenced by positive biopsy rates of 13–35% at saturation biopsy [14], [15]. However, despite elevated repeat biopsy rates, not all men are at an equally high risk of having PCa after one or several previously negative biopsy sessions. A repeat nomogram was 71% accurate (Table 2) in an external validation [14], [15].
Further, Walz et al. recently reported an internally validated nomogram that was 70% accurate (Table 3) [16]. However, one may argue that a nomogram to predict repeat or saturation biopsy outcome consisting of nine rather complex predictor variables may not be directly applicable to a busy clinical practice.
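The accuracy figures quoted in this section are discrimination estimates of the kind described under the nomogram criteria. As a minimal sketch, assuming made-up predicted probabilities and outcomes rather than data from any cited study, the AUC can be computed as the fraction of event/non-event pairs that the model ranks correctly, and simple bootstrap resampling gives an impression of how variable that estimate is. The function names `auc` and `bootstrap_auc_interval` are illustrative. This is not the full bootstrap internal validation used in the cited studies, which would also refit the model in every resample to correct for optimism.

```python
import random

def auc(probs, outcomes):
    """Share of (event, non-event) pairs ranked correctly; ties count as 0.5."""
    events = [p for p, y in zip(probs, outcomes) if y == 1]
    non_events = [p for p, y in zip(probs, outcomes) if y == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")
    score = 0.0
    for e in events:
        for n in non_events:
            if e > n:
                score += 1.0
            elif e == n:
                score += 0.5
    return score / (len(events) * len(non_events))

def bootstrap_auc_interval(probs, outcomes, n_boot=200, seed=0):
    """Percentile interval of the AUC over resamples drawn with replacement."""
    rng = random.Random(seed)
    indices = range(len(probs))
    values = []
    while len(values) < n_boot:
        sample = [rng.choice(indices) for _ in indices]
        ys = [outcomes[i] for i in sample]
        # Skip resamples in which only one outcome class is present.
        if 0 < sum(ys) < len(ys):
            values.append(auc([probs[i] for i in sample], ys))
    values.sort()
    return values[int(0.025 * n_boot)], values[int(0.975 * n_boot) - 1]

# Hypothetical predictions (probability of a positive biopsy) and outcomes.
probs = [0.10, 0.20, 0.35, 0.40, 0.55, 0.60, 0.75, 0.90]
outcomes = [0, 0, 1, 0, 0, 1, 1, 1]

print(f"AUC = {auc(probs, outcomes):.3f}")  # AUC = 0.875
low, high = bootstrap_auc_interval(probs, outcomes)
print(f"bootstrap 95% interval: {low:.3f} to {high:.3f}")
```

An AUC of 0.5 corresponds to the coin flip and 1.0 to perfect discrimination, which matches the 50% to 100% scale used in the text. Calibration is not captured by this number and would need to be assessed separately.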
8. Prediction of specific pathologic features of clinically localised prostate cancer

As evidenced by the wide use of the Partin probability tables, prediction of pathologic features has a significant impact on choosing an adequate treatment modality [7], [46]. Nomograms that predict specific pathologic features are shown in Table 4.
PSA=prostate-specific antigen; TRUS=transrectal ultrasound; SS-ECE=side-specific extracapsular extension; SVI=seminal vesicle invasion; LNI=lymph node invasion; PLND=pelvic lymphadenectomy.

8.1. Clinically insignificant prostate cancer

The lifetime risk of developing PCa approximates 11%, but the risk of dying from the disease is only 3.6% [17]. The far greater prevalence of histologic or “clinically insignificant” PCa (IPCa) has been cited in support of conservative management of the disease. IPCa, defined as organ confinement with cancer volume <0.5 cc without Gleason pattern 4 or 5, appears to pose little risk to the life or health of the patient [46]. As shown in Table 4, three nomograms were developed by Kattan et al. Each is internally validated and from 64% to 79% accurate [17].

8.2. Gleason sum upgrading between biopsy and final pathology

8.3. Extracapsular extension

8.4. Seminal vesicle invasion

8.5. Lymph node invasion

9. Prediction of biochemical recurrence following radical prostatectomy
Table 5. Prediction of biochemical recurrence after radical prostatectomy
PSA=prostate-specific antigen; BCR=biochemical recurrence; ECE=extracapsular extension; +SM=positive surgical margins; SVI=seminal vesicle invasion; LNI=lymph node invasion.

The preoperative nomogram was originally 74% accurate after internal validation [2]. Graefen et al. [33] and Greene et al. [34] externally validated this preoperative nomogram in internationally assembled external cohorts consisting, respectively, of 6232 and 1701 patients. Additionally, in these validation studies, the authors tested different PSA recurrence definitions, institutional variability, and delivery of neoadjuvant therapy. PA did not seem to be substantially affected [33], [34]. Moreover, Stephenson et al. most recently published a new preoperative BCR nomogram that is adjusted for the year of surgery (after 2003) and includes more detailed biopsy information such as the number of positive and negative cores [3]. This new nomogram is internally and externally validated with a concordance index of 0.76 and 0.79, respectively, and extends BCR predictions up to 10 yr. It allows the clinician to estimate the probability of recurrence at any point in time from 1 to 10 yr after RP. However, its predictions are likely to be valid only in regions where PSA screening is widespread. Thus, for European patients, the original preoperative nomogram is better suited than this new nomogram [3].

10. Prediction of biochemical recurrence following radiation therapy

Table 6. Prediction of biochemical recurrence after radiation therapy
BCR=biochemical recurrence; ASTRO=American Society for Therapeutic Radiology and Oncology; PSA=prostate-specific antigen; XRT=external radiation therapy.

11. Prediction of metastasis following external-beam radiation

Table 7. Prediction of distant metastasis after three-dimensional conformal radiation therapy
PSA=prostate-specific antigen.

12. Prediction of metastasis following radical prostatectomy

Table 8. Prediction of distant metastasis
In the Slovin study, the outcome was the length of predicted distant metastasis-free survival. PSA=prostate-specific antigen; RP=radical prostatectomy; PSA DT=prostate-specific antigen doubling time.
In 239 patients with evidence of BCR following RP and no adjuvant treatment, Dotan et al. addressed the prediction of bone metastases as a binary outcome. They developed an internally validated nomogram that was 93% accurate [38].
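A nomogram for a binary outcome, such as the one above, is a graphical form of a logistic regression model. The reading procedure described under "Defining and reading nomograms" (assign points per variable, sum them, read the probability off the total) can be sketched as follows. The intercept, coefficients, axis ranges, and predictor names are entirely hypothetical and are not taken from any published nomogram; they only illustrate how points relate to the underlying model. Cox-based nomograms follow the same point logic but map total points to a survival probability at a fixed time, which is not shown here.

```python
import math

# Hypothetical logistic model: log-odds = INTERCEPT + sum(coef * value).
INTERCEPT = -3.0
PREDICTORS = {
    # name: (coefficient per unit, axis minimum, axis maximum)
    "psa": (0.08, 0.0, 20.0),
    "clinical_stage_t2": (0.6, 0.0, 1.0),
    "biopsy_gleason_sum": (0.5, 4.0, 10.0),
}

def _reference(coef, lo, hi):
    """Axis end with the lowest risk, so that points are never negative."""
    return lo if coef >= 0 else hi

# The predictor with the largest effect over its axis spans 0 to 100 points.
POINTS_PER_LOGIT = 100.0 / max(
    abs(coef) * (hi - lo) for coef, lo, hi in PREDICTORS.values()
)

def points(name, value):
    """Points attributed to one variable value (the 'Point' axis)."""
    coef, lo, hi = PREDICTORS[name]
    return coef * (value - _reference(coef, lo, hi)) * POINTS_PER_LOGIT

def probability_from_total_points(total):
    """Map the 'Total Points' sum back to a predicted probability."""
    baseline = INTERCEPT + sum(
        coef * _reference(coef, lo, hi) for coef, lo, hi in PREDICTORS.values()
    )
    logit = baseline + total / POINTS_PER_LOGIT
    return 1.0 / (1.0 + math.exp(-logit))

def direct_probability(patient):
    """Same prediction computed straight from the regression equation."""
    logit = INTERCEPT + sum(PREDICTORS[k][0] * v for k, v in patient.items())
    return 1.0 / (1.0 + math.exp(-logit))

# Hypothetical patient.
patient = {"psa": 8.0, "clinical_stage_t2": 1.0, "biopsy_gleason_sum": 7.0}
total = sum(points(name, value) for name, value in patient.items())
print(f"total points: {total:.1f}")
print(f"predicted probability: {probability_from_total_points(total):.3f}")
```

Summing points and reading the total axis gives the same probability as evaluating the regression equation directly, which is why a paper nomogram can replace the computation without loss of accuracy.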









