The Statistician's Perspective on Methods of Comparing Prostate Cancer Clinical Trial Data - Matthew Sydes
April 11, 2020
Matthew R Sydes, MSc CStat CSci, Professor of Clinical Trials & Methodology, MRC Clinical Trials Unit at UCL London, UK. Matt is responsible for leading the unit's Trial Conduct Methodology activities as well as teams conducting research in prostate cancer and osteosarcoma. Matt has a particular interest in improving clinical trial conduct, particularly around: the use of routinely collected electronic health records (EHR) to support and run trials with Health Data Research UK; running trials with a view to regulatory use and submission; proportionate and efficient trial monitoring; clinical trial data sharing; communication of trial findings; adaptive and efficient designs for late-phase trials, including in uncommon conditions; and the functioning of Data Monitoring Committees. On this, he was involved in the DAMOCLES project, which set standards for (independent) Data Monitoring Committees and led to the widespread use of charters for trial committees. He is part of the faculty for UCL's regular 1-day course on Data Monitoring Committees in practice. Matt has served as a member of IDMCs and TSCs, often as chair, for around 50 trials, attending around 200 meetings. Matt teaches on these topics on the UCL Institute of Clinical Trials and Methodology's MSc in Clinical Trials. He is supervising a number of Ph.D. students in these areas of methodological priority.
Alicia Morgans, MD, MPH Associate Professor of Medicine in the Division of Hematology/Oncology at the Northwestern University Feinberg School of Medicine in Chicago, Illinois, USA.
Alicia Morgans: Hi. I am delighted to have here with me today, Professor Matt Sydes of Clinical Trials and Methodology at University College London. Thank you so much for talking with me today.
Matt Sydes: Thank you for asking me.
Alicia Morgans: Wonderful. So I wanted to talk to you about the talk that you gave at APCCC 2019, really giving us a statistician's perspective on some of the data that we're trying to analyze. Can you tell me a little bit about that talk?
Matt Sydes: For sure. So I was asked to talk about my perspective on the data, supporting people's decisions on whether to use docetaxel or whether to use AR pathway inhibitors. And some of the challenges that there could be in interpreting those data.
And so I tried to talk about the various ways in which one can try to understand those effects. I mean, first we can look at the trials that have reported on docetaxel so far, STAMPEDE and CHAARTED and GETUG-15, and we can see the hazard ratio, the combined hazard ratio from a meta-analysis. You can also see the combined hazard ratio from looking at the abiraterone trials that we've looked at, even if we exclude the newer TITAN and ENZAMET studies that have come through. And so you can see those effect sizes.
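To make concrete what "the combined hazard ratio from a meta-analysis" means mechanically, here is a minimal Python sketch of fixed-effect (inverse-variance) pooling of hazard ratios on the log scale. The trial numbers below are invented purely for illustration; they are not the actual results of STAMPEDE, CHAARTED, GETUG-15, or the abiraterone trials.

```python
import math

def pool_hazard_ratios(trials):
    """Fixed-effect (inverse-variance) pooling of hazard ratios.

    Each trial is summarised as (hr, ci_low, ci_high). We work on the
    log scale, recovering each standard error from the 95% CI width.
    """
    num = 0.0
    den = 0.0
    for hr, lo, hi in trials:
        log_hr = math.log(hr)
        se = (math.log(hi) - math.log(lo)) / (2 * 1.96)  # CI half-width / z
        w = 1.0 / se**2                                  # inverse-variance weight
        num += w * log_hr
        den += w
    pooled_log = num / den
    pooled_se = math.sqrt(1.0 / den)
    return (math.exp(pooled_log),
            math.exp(pooled_log - 1.96 * pooled_se),
            math.exp(pooled_log + 1.96 * pooled_se))

# Illustrative, made-up trial summaries (HR, 95% CI low, 95% CI high):
pooled = pool_hazard_ratios([(0.78, 0.66, 0.93),
                             (0.73, 0.59, 0.89),
                             (0.88, 0.68, 1.14)])
```

The pooled estimate sits between the individual trial estimates, weighted towards the trials with the tightest confidence intervals, and its confidence interval is narrower than any single trial's.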
But those trials were run in slightly different eras, in slightly different ways, in slightly different populations. So is it really that straightforward to look at the relative effects and say which one's better? And I would argue that it's probably a little bit more complicated than that. So you can look at direct comparisons or you can look at indirect comparisons. So I gave the example of within STAMPEDE, a trial which has been recruiting now for 15 years. And we reported on the benefit of docetaxel, where we saw a hazard ratio of about 0.76 in metastatic patients. And we reported on our abiraterone cohort, a randomized trial against the same comparison within STAMPEDE, for the same eligibility criteria. And we saw a hazard ratio of 0.61, 0.62.
I should know that number. And I don't know that number off the top of my head. But very statistically significant improvement-
Alicia Morgans: Yes.
Matt Sydes: In survival with both of those. And the point estimate is larger in the abiraterone cohort.
So people may look at those two papers coming out of STAMPEDE and think it's a no-brainer that abiraterone is better. It's a very small cohort of patients. I say small, just short of 600 patients in STAMPEDE, who were randomized to either abiraterone or docetaxel in the same window between November 2011 and March 2013. And those are the only directly randomized data for those two treatments. So we presented on those data and we published that information. I was asked partly to talk about that. And there, for an early outcome measure, we do see a clear advantage for abiraterone. But for long-term outcome measures, we see no evidence of a difference at all between the two treatments.
Alicia Morgans: Yes.
Matt Sydes: So I was picking out why that might be. And so even within one trial, things potentially shift a little bit, in terms of the patients that go in. So if you have to be careful looking at two papers even within one trial, I wanted people to be very cautious about how they look at papers across trials.
Alicia Morgans: Absolutely.
Matt Sydes: And then we also talked about indirect ways in which you can compare. You can put trials into meta-analyses. So we're used to looking at meta-analyses, we talked about that earlier, but in network meta-analyses, you sort of triangulate from one point to another. You use these nodes and you sort of triangulate in multiple ways. And the STOpCaP meta-analysis team have run one network so far. And so they had a different way of looking at this. They had, at that time, nearly 6,000 patients in the network. And they saw an advantage. It looked like the most advantageous treatment based on aggregate data was abiraterone. So it tells a slightly different story.
So I think it's complicated for people that there are two ways of looking at this and the story isn't necessarily the same. So we were trying to tease out in the presentation this morning why that might be. And those complications in STAMPEDE are probably magnified across a number of trials. So it's not that one is right and one is wrong. It's two different methods of estimation, and we're trying to understand why they might differ.
And now with ENZAMET, with TITAN, with more data coming out, the STOpCaP network meta-analysis team will be updating with individual patient data, which allows them to take in time effects and volume and all kinds of things. So I think they'll have a pretty good ... JT and the unit team will have a pretty good estimate for you in the relatively near future, I hope, and I hope you'll be talking to her about this at that point.
Alicia Morgans: Oh, I'm sure that we will. But I think it's really important just to emphasize that we are taught never to compare trial to trial. We are taught that piece of information. Yet people will pick up two different trials and they will try to compare. Oh, the hazard ratio here or the hazard ratio here. The side effects look different here than they look here. These are not within the same study for sure, and you really can't compare either of those things. A meta-analysis is different. This is specific methodology around combining that data on a more detailed level. But even then there are hazards and you could have different ways of approaching it as well.
Matt Sydes: Absolutely. It's tricky to do and you're right. So even if clinicians, as they're going through their training, are taught not to make comparisons across papers, as you say, people will. It's natural. The eligibility criteria look the same, so we must be able to compare. But I think, you know, it's fine almost to make a light judgment: it's probably the case, but hold that opinion lightly and be ready to shift it, would be my sense as much as possible. And pick out the similarities and pick out the differences.
Alicia Morgans: Yes.
Matt Sydes: I wonder sometimes if you should think about the patient in front of you in clinic. Would they meet the eligibility criteria for this trial, you know?
Certainly the year in which you're talking to them is different from when those patients were diagnosed and those trials ran. So that's bound to be different. What are the other ways in which they would fit the criteria? What are the ways in which they wouldn't fit? So I think that's quite a helpful way to think about this. Which trial are they closer to?
Alicia Morgans: Yes. So one of the other things that I've noticed, before we wrap up, is that, you know, STAMPEDE has had, as many studies have, subgroup analyses. And again, as clinicians we are told that subgroup analyses, even if pre-planned, may be underpowered if you're powering for the primary endpoint. How do you think about subgroup analysis interpretation within the context of a clinical trial, from a statistician's perspective?
Matt Sydes: Subgroup analyses are really challenging. They're really important because you want to understand how the data from a trial, which is quite broad, often very broad, applies to the patient in front of you. So you want to understand those subgroup analyses. You want the subgroup analyses to be interpretable for you. But there are always going to be ... They're going to have less power because you've got fewer patients contributing to them. One way to handle this is to specify certain criteria, certain analyses you'll do upfront, and label others much more explicitly as being exploratory.
Alicia Morgans: Yes.
Matt Sydes: So for example, when Chris Parker presented data last year on the radiotherapy comparison for metastatic patients in STAMPEDE, we had a biological rationale from the HORRAD trial that had already come out. That if there was an effect it was more likely to be in low volume patients.
So we pre-specified that we would do that analysis. And we were able to say, this is how much power we would have for that. It's not the ideal amount of power you'd have if you were designing a trial just in those patients. But we pre-specified that we could do that.
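The idea of stating upfront how much power a pre-specified subgroup analysis carries can be sketched with Schoenfeld's approximation for the logrank test, which ties power to the number of events and the target hazard ratio. The event counts and hazard ratio below are invented for illustration, not STAMPEDE's actual numbers.

```python
import math

def normal_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def logrank_power(events, hr, z_alpha=1.96):
    """Approximate power of a two-arm logrank test (Schoenfeld's formula,
    1:1 randomisation), given the total number of events and the target HR."""
    return normal_cdf(math.sqrt(events) * abs(math.log(hr)) / 2 - z_alpha)

# Illustrative: the whole trial versus one pre-specified subgroup
full_power = logrank_power(events=500, hr=0.78)  # whole-trial analysis
sub_power  = logrank_power(events=120, hr=0.78)  # subgroup with fewer events
```

With the same true effect, the subgroup analysis carries far less power simply because it accrues fewer events, which is exactly why its power is worth stating in advance rather than discovering afterwards.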
We also had other subgroup analyses which were more exploratory, where we looked by age or performance status, and that's really just giving an indication of consistency of effect. But we're sort of holding that information quite lightly as well. And so when you divide into subgroups, you're always going to have fewer patients in each of the groups. So your confidence intervals are going to be wider than for your main effect. People often get upset if the confidence intervals they see in a forest plot then cross one, because that's what we're taught: for a main effect, if it's less than one, there's a statistically significant effect.
Whereas for the subgroups, it's not quite the same because you've got fewer patients. Your confidence intervals will always be wider. So I would encourage people not to get too excited about confidence intervals crossing one. It is helpful if people present heterogeneity tests. The heterogeneity test, which you see sometimes, but not always. If I had my way, you'd always see it.
Alicia Morgans: Yes.
Matt Sydes: Is a test of whether the difference between the treatments is different in the two subgroups. So, is there a difference in the differences, effectively?
Alicia Morgans: Yes.
Matt Sydes: And so it's just a simple measure of this. And there are variations on that, but that's the principle: is there a difference in the differences? And you are hoping, really, most of the time, if you've got an overall effect, that there's no evidence of heterogeneity, no evidence of inconsistency. Of course, that test itself is often underpowered. And so you may see no evidence of heterogeneity; there may be heterogeneity, but there is no evidence of it. I always think it's very helpful to think about whether you've got evidence of something or no evidence of something, rather than it's definitely black or white. But that's just my statistical fence-sitting, I suspect.
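Both of the points above, that subgroup confidence intervals widen as the number of events shrinks, and that heterogeneity is a "difference in the differences", can be sketched in a few lines. The approximation se(log HR) ≈ 2/√events assumes 1:1 randomisation, and all the numbers here are invented for illustration.

```python
import math

def hr_ci(hr, events, z=1.96):
    """95% CI for a hazard ratio, using se(log HR) ~= 2/sqrt(events)
    (a standard approximation under 1:1 randomisation)."""
    se = 2.0 / math.sqrt(events)
    return (hr * math.exp(-z * se), hr * math.exp(z * se))

def heterogeneity_z(hr1, se1, hr2, se2):
    """Interaction ('difference in the differences') z-statistic:
    the gap between two subgroup log hazard ratios, scaled by the
    standard error of that gap."""
    diff = math.log(hr1) - math.log(hr2)
    return diff / math.sqrt(se1**2 + se2**2)

# Same underlying effect, different amounts of information (invented numbers):
overall_ci  = hr_ci(0.76, events=600)   # whole trial: CI stays below 1
subgroup_ci = hr_ci(0.76, events=80)    # small subgroup: CI crosses 1

# Two subgroup estimates that look different but are statistically consistent:
z = heterogeneity_z(0.70, 0.12, 0.85, 0.15)
```

Here the subgroup's interval crosses one even though the effect is identical to the whole trial's, and the heterogeneity z-statistic stays well inside ±1.96 despite the point estimates (0.70 versus 0.85) looking quite different, which is the underpowered-test caveat in action.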
Alicia Morgans: No, but it's very, very helpful for the clinicians as well. So thank you so much for lending us your expertise for a little bit and for kind of taking the statistical black box and shining a light on it for us. We certainly appreciate that and I appreciate your time.
Matt Sydes: Thank you for having me. Thank you.