Interpreting Phase III Clinical Trial Data - Susan Halabi
November 24, 2019
Susan Halabi, Ph.D., Professor of Biostatistics and Bioinformatics, Duke Cancer Institute, Durham, North Carolina, United States
Alicia Morgans, MD, MPH, Associate Professor of Medicine in the Division of Hematology/Oncology at the Northwestern University Feinberg School of Medicine in Chicago, Illinois.
Alicia Morgans: Hi, I'm thrilled to have here with me today Dr. Susan Halabi, who is a Professor of Biostatistics and Bioinformatics at Duke University. Thank you so much for being here.
Susan Halabi: Thank you Dr. Morgans. It's always a pleasure to talk to you.
Alicia Morgans: Well, thank you. So I really look to you and your guidance when I think about interpreting the large Phase III clinical trials that we have. We are so fortunate in prostate cancer to have a number of Phase III trials that have actually continued to roll in over the last few years. But when interpreting these trials, I know that there are pitfalls that we need to avoid. And you talked about those today at APCCC 2019 and I'm just wondering, can you share some of the most important pearls for interpreting these clinical trials to the listeners, so that we can actually do things right?
Susan Halabi: Yeah, absolutely. As you're aware, most clinical trials are really designed to answer the primary question. And no matter what, whether the trial is positive or negative, there's always the temptation to do subgroup analysis.
However, a lot of mistakes can happen in subgroup analysis. And I would like to share with the listeners perhaps the top three errors that we can avoid. Because after all, subgroup analysis should be done only for hypothesis generation and nothing beyond.
So the first mistake to avoid, really, or to keep in mind and to consider when we look at the data, is the type I error rate. Obviously the more hypotheses we test, the more likely it is that something is going to be significant by chance alone.
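To put a number on the multiple-testing point, here is a minimal sketch. It assumes every test is run at a two-sided alpha of 0.05 and that the tests are independent; both are illustrative assumptions (subgroup tests in a real trial are usually correlated), and the function name is hypothetical rather than anything from the talk.

```python
def familywise_error_rate(num_tests, alpha=0.05):
    """Chance of at least one false positive across independent tests.

    Assumes independent tests, each run at the same alpha (an
    illustrative simplification, not a property of any real trial).
    """
    return 1.0 - (1.0 - alpha) ** num_tests


if __name__ == "__main__":
    for k in (1, 5, 10, 20):
        print(f"{k:>2} tests -> {familywise_error_rate(k):.2f}")
```

Under these assumptions the chance of at least one spurious "significant" finding is roughly 40% by the time ten subgroups have been tested, even though each individual test is held at 5%.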
Alicia Morgans: Yes.
Susan Halabi: So even though we adjust for the type I error rate, that's not going to be sufficient to give us the highest level of evidence. Because there are other issues that we have to worry about, such as the power. The power is important because usually the trial is designed with an overall power of, let's say, 85% or 90%, but testing another hypothesis in a subgroup would also require the power to be high. And unfortunately, we don't have that power.
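The loss of power in a subgroup can be sketched with Schoenfeld's approximation for a log-rank test. This is one common textbook approximation, not necessarily the method used in any trial discussed here. The hazard ratio of 0.75, the 1:1 allocation, and the subgroup holding a quarter of the events are all assumed values chosen for illustration, and the function names are hypothetical.

```python
from math import ceil, log, sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, mean 0 and sd 1


def events_needed(hazard_ratio, power=0.90, alpha=0.05):
    """Events required for a two-sided log-rank test, 1:1 allocation
    (Schoenfeld's approximation)."""
    z_alpha = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    z_beta = _STD_NORMAL.inv_cdf(power)
    return ceil(4 * (z_alpha + z_beta) ** 2 / log(hazard_ratio) ** 2)


def power_for_events(events, hazard_ratio, alpha=0.05):
    """Approximate power when only `events` events are available."""
    z_alpha = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    return _STD_NORMAL.cdf(sqrt(events) / 2 * abs(log(hazard_ratio)) - z_alpha)


if __name__ == "__main__":
    assumed_hr = 0.75                      # hypothetical treatment effect
    total = events_needed(assumed_hr)      # sized for 90% power overall
    subgroup = total // 4                  # hypothetical: 25% of the events
    print("overall events:", total,
          "power:", round(power_for_events(total, assumed_hr), 2))
    print("subgroup events:", subgroup,
          "power:", round(power_for_events(subgroup, assumed_hr), 2))
```

With these assumed inputs, a trial sized for 90% power overall has well under 50% power to detect the same effect in a subgroup that contributes a quarter of the events.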
So the best way to move forward is, when we look at the data, we take them with a big grain of salt and focus on and emphasize the overall direction of the trial. And not only the overall results of the trial, but when we look at the subgroups, you want to make sure the direction is consistent across the trial. And my preference is not to report any p-values and to just focus on the hazard ratios and the 95% confidence intervals.
Alicia Morgans: I love the way you showed a figure, or actually it was a table, of subgroup analysis, and you actually Xed out all of the p-values for us. But that really emphasizes something that I think is also so important: the span, or the width, of the confidence intervals.
So if we're looking at confidence intervals in a subgroup analysis that are very tight versus confidence intervals that are actually quite wide, how would you interpret that? Because I know I think about that in a very certain way, and I don't know how everyone does. So what do you think about ... the width of the intervals? When you say it's important for us to focus on that, the hazard ratio and then the width of these confidence intervals, what do you mean?
Susan Halabi: Yeah, thank you for the opportunity to clarify that. So when we look at the width of the confidence interval, obviously when you're looking at the hazard ratio, which is an estimate of clinical benefit in a trial, you want that hazard ratio to fall within a very small interval. You want it to be tight because then your level of confidence is higher. If, let's say, the confidence interval ranges from 0.2 to 1, then we don't have that confidence.
And normally when you do subgroup analysis, what's happening is the number of events in that subgroup is very small, so your level of confidence is very low. It's not going to be high.
Now on the other hand, if you have at least 50% or more of the events happening in a subgroup, then I think my level of confidence will go higher. Because then I know that it's not based on 10 deaths or 20 deaths, it's based on, let's say, 60% or 70% of the total events of the trial. Then my level of confidence is higher, and this is why you see a tight confidence interval.
Alicia Morgans: Yes.
Susan Halabi: To be exact, if, let's say, the hazard ratio from the trial for overall survival was 0.7 and the 95% confidence interval ranged between 0.65 and, let's say, 0.75, then this I would consider a very narrow confidence interval, which shows the reliability and the validity of the result. So in many ways, even though interpreting the width of the confidence interval is challenging for most clinicians, I think this gives us more security in terms of ... the reliability.
Yes exactly, the reliability, the strength of evidence for that drug.
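The link between the number of events and the width of the interval can be sketched as follows. This uses a common large-sample approximation in which the standard error of the log hazard ratio is about 2 divided by the square root of the number of events under 1:1 allocation. The hazard ratio of 0.7 echoes the example above; the total of 600 events is an assumed figure for illustration, and the function name is hypothetical.

```python
from math import exp, log, sqrt
from statistics import NormalDist


def hazard_ratio_ci(hazard_ratio, events, level=0.95):
    """Approximate confidence interval for a hazard ratio.

    Assumes 1:1 allocation and SE(log HR) ~ 2 / sqrt(events); this is a
    rough planning approximation, not a substitute for a fitted model.
    """
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * 2 / sqrt(events)
    center = log(hazard_ratio)
    return exp(center - half_width), exp(center + half_width)


if __name__ == "__main__":
    # 10 and 20 events mimic a small subgroup; 600 is an assumed trial total.
    for events in (10, 20, 300, 600):
        low, high = hazard_ratio_ci(0.7, events)
        print(f"{events:>3} events: HR 0.70, 95% CI {low:.2f} to {high:.2f}")
```

Under this approximation, the same point estimate of 0.7 comes with an interval of roughly 0.20 to 2.4 when it rests on 10 events, and roughly 0.60 to 0.82 when it rests on 600, which is why a subgroup estimate based on a handful of deaths carries so little weight.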
Alicia Morgans: So if we're looking at a subgroup that contains 15 people, for example, African Americans in a trial where there were only 30 African Americans, 15 on treatment, and there were, say, five events out of 600 events, if we were so lucky in a trial, that confidence interval would be very wide. And that's not something that we can rely on, and that's why people do meta-analyses and pool those trials and bring the number of people in that subgroup and the number of events up, so that we can actually be more confident in our conclusions.
So the take-home point is subgroup analyses are helpful in some ways, but actually they spend our alpha and they make it such that our power is less as well.
When we're looking at these subgroup analyses, we should look at where the hazard ratio sits and how wide those confidence intervals are. Tighter confidence intervals give us more confidence in the accuracy, the validity of that result, and help us really think more clearly about that subgroup. But still, it's hypothesis-generating only, right?
Susan Halabi: Absolutely correct. And of course, from the clinical point of view, people spend tons of resources to conduct the trials. Of course, we want to mine the data, we want to generate hypotheses, and that's all legitimate. I just want to clarify one point: I'm not against subgroup analysis. You can do subgroup analysis as long as it's pre-specified in the protocol a priori, before the trial has even been activated.
There are ways you can consider subgroup analysis, either by considering a stratification variable in the trial or even by powering the trial in such a way that you're testing a treatment effect in the overall population and testing the same treatment effect in a subgroup population. That is legitimate, but at the end of the day, no matter what decision you take, there is a price to pay in terms of the type I error rate and the power.
Unfortunately, when we design these trials, we design them only to get the level of evidence based on very limited information. And when you do subgroup analysis, you can think of it like we're really stretching beyond the limit of the information that we're allowed to.
Alicia Morgans: Yes, I remember a statistics teacher once told me, if you beat the data enough, it's sure to talk. So I really appreciate you sharing your insights today and helping us as clinicians think through subgroup analyses. And I look forward to talking to you again. Thank you.
Susan Halabi: Me too, it's my pleasure. Thank you, Dr. Morgans.
Alicia Morgans: Thanks.