Evaluating Approaches for Constructing Polygenic Risk Scores for Prostate Cancer in Men of African and European Ancestry - Burcu Darst

August 2, 2023

Burcu Darst explores her team's study on developing polygenic risk scores for prostate cancer in men of African and European ancestry. This study aims at addressing health disparities, and uses a large, diverse prostate cancer GWAS involving over 230,000 men from various populations. The results indicate that men in the top polygenic risk score decile are three to four times more likely to have prostate cancer than those with an average genetic risk. Despite the efficacy of the score, Dr. Darst underscores the necessity for refining its performance through the inclusion of more diverse populations in future GWAS studies. She further discusses the potential of polygenic risk scores in risk-stratified screening, while acknowledging the challenges in interpretation and the need for prospective clinical studies to validate their clinical applicability. The team plans to continue their research on true causal variants and their role in disease risk prediction.


Burcu Darst, PhD, Fred Hutchinson Cancer Center, Seattle, WA

Andrea K. Miyahira, PhD, Director of Global Research & Scientific Communications, The Prostate Cancer Foundation

Read the Full Video Transcript

Andrea Miyahira: Hi everyone. Thanks for joining us today. I'm Andrea Miyahira here at the Prostate Cancer Foundation. Today I'm joined by Dr. Burcu Darst, an assistant professor at the Public Health Sciences Division at the Fred Hutch. Her group recently published the paper "Evaluating Approaches for Constructing Polygenic Risk Scores for Prostate Cancer in Men of African and European Ancestry" in the American Journal of Human Genetics. Dr. Darst, thanks so much for joining me today.

Burcu Darst: Yeah, thank you so much for having me. I'm excited to share this work today. So I'm going to talk about our paper that was just recently published. In this paper we're comparing a polygenic risk score that we constructed a couple of years ago for prostate cancer to some genome-wide approaches for developing polygenic risk scores, as recently there's been some work showing that genome-wide approaches often work better than a polygenic risk score limited to statistically significant variants. So just to give some motivation as to why we decided to pursue this project, prostate cancer represents one of the biggest health disparities in the US, with Black men having incidence and mortality rates that are by far higher than any other population in the US. There are very few established risk factors for incident overall prostate cancer, but those include age, race, and family history or germline genetics, with prostate cancer being one of the most heritable cancers with a heritability of about 58%, suggesting that there is a really strong genetic component to prostate cancer.

However, genetic studies to date have been overwhelmingly represented by European ancestry individuals over the years. And consequently, when we develop polygenic models of complex traits, we tend to see that the predictive accuracy is increasingly poor in populations that are more ancestrally distant from European ancestry. And in particular, we often see the predictive accuracy being poorest in African ancestry individuals. You can imagine that, in conjunction with existing health disparities such as what we see in prostate cancer, applying such polygenic models to complex traits could potentially exacerbate existing health disparities. To try to overcome this problem, we conducted a large and diverse prostate cancer GWAS a couple of years ago, which included over 230,000 men across the populations shown here. We used a statistical fine mapping approach to identify variants that are more likely to have a causal association with prostate cancer risk, which led to a total of 269 known prostate cancer risk variants.

We used these variants to construct a polygenic risk score, which we call a multi-ancestry polygenic risk score because we're using the multi-ancestry results and weights from our prostate cancer GWAS. When we test this in a large independent sample of over 500,000 men, many of whom are from the Million Veteran Program, you can see that men in the top PRS decile have odds of prostate cancer that range from about three to about four fold increased compared to men with average genetic risk. So this is showing that the polygenic risk score is really conveying a lot of important information about risk of prostate cancer. So the polygenic risk score with 269 variants is a valid predictor of prostate cancer risk. But next we wanted to ask the question of whether the performance that we see for our polygenic risk score could be improved if we use these genome-wide approaches, since they have been shown for many other traits to have better performance than just using variants statistically significantly associated with risk of a trait.
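Editor's sketch (not part of the talk): the score described above can be read as a weighted sum of risk-allele counts, with GWAS effect sizes as the weights, and the decile comparison as a ratio of odds between two score bands. The function names, the toy numbers, and the choice of the 40th to 60th percentile band to stand in for "average genetic risk" are all assumptions made for illustration; the talk does not specify them.

```python
import numpy as np


def polygenic_risk_score(dosages, weights):
    """Weighted sum of risk-allele dosages (0, 1, or 2 copies per variant).

    dosages: array of shape (n_people, n_variants)
    weights: array of shape (n_variants,), e.g. GWAS log odds ratios
    """
    dosages = np.asarray(dosages, dtype=float)
    weights = np.asarray(weights, dtype=float)
    return dosages @ weights


def odds_ratio_top_vs_middle(scores, is_case, top=0.90, mid=(0.40, 0.60)):
    """Odds of disease in the top score decile relative to a middle band.

    The 40-60% band is an assumed stand-in for "average genetic risk".
    """
    scores = np.asarray(scores, dtype=float)
    is_case = np.asarray(is_case, dtype=bool)
    q_lo, q_hi, q_top = np.quantile(scores, [mid[0], mid[1], top])
    in_top = scores >= q_top
    in_mid = (scores >= q_lo) & (scores < q_hi)

    def odds(mask):
        cases = np.sum(is_case & mask)
        controls = np.sum(~is_case & mask)
        return cases / controls

    return odds(in_top) / odds(in_mid)


# Toy usage: two people, three hypothetical variants and weights.
toy_scores = polygenic_risk_score([[0, 1, 2], [2, 0, 0]], [0.1, 0.2, 0.3])
```

This crude ratio ignores covariates such as age and ancestry principal components, which a real analysis would adjust for in a regression model.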

So to evaluate this question, we used the GWAS that I just talked about as training data to construct several different genome-wide polygenic risk scores. We used six different genome-wide approaches, which are each shown here. As input, we used our GWAS summary statistics, again from the paper from a couple of years ago. The variants that we used for each of these different approaches were the approximately 1.1 million variants from the HapMap3 panel. And we chose these variants because they're recommended by many of these different approaches and this is what's commonly used across papers that apply these genome-wide approaches. And then for the LD reference, or linkage disequilibrium reference, we used participants from the 1000 Genomes Project.

Once we have these polygenic risk scores developed, we tested them in our testing data, which included African ancestry men from the California Uganda study, European ancestry men from the UK Biobank, and then we have additional external validation in African and European ancestry men from the Million Veteran Program. And all of these data sets are independent of the training data that's used to construct the polygenic risk scores. We then evaluate the polygenic risk score by looking at the area under the curve and the odds ratio for the polygenic risk score. And we do this separately in each population. So looking first at results for African ancestry men in our testing data. Here we're showing the AUC for a covariate only model. So this has just age and principal components of ancestry in it.
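Editor's sketch (not part of the talk): the area under the curve mentioned above can be read as the probability that a randomly chosen case has a higher predicted risk than a randomly chosen control. The minimal function below computes it by comparing every case with every control; the function name and toy values are illustrative assumptions, and the team's actual evaluation pipeline is not described in the transcript.

```python
import numpy as np


def auc(predicted_risk, is_case):
    """Area under the ROC curve via pairwise case/control comparison.

    Counts the fraction of (case, control) pairs in which the case has the
    higher predicted risk; ties count as half. Memory use grows with
    n_cases * n_controls, so this is only suitable for small examples.
    """
    predicted_risk = np.asarray(predicted_risk, dtype=float)
    is_case = np.asarray(is_case, dtype=bool)
    cases = predicted_risk[is_case]
    controls = predicted_risk[~is_case]
    diff = cases[:, None] - controls[None, :]
    return (np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / diff.size


# Toy usage: four people, the last two are cases.
toy_auc = auc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1])
```

Comparing the AUC of a covariate-only model with the AUC after adding a polygenic risk score, as in the talk, would mean calling a function like this once per fitted model on the same independent test set.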

And then here for instance, we're showing the AUC when we add our polygenic risk score that has the 269 variants. And then this is showing the AUC when we add each of the different genome-wide approaches for all the other colors shown here. You can see just looking at the AUC that the AUC for our 269 variants is much higher than any of the AUCs for the genome-wide approaches shown here. And when we look at odds ratios, we consistently see that the odds ratio for our 269 polygenic risk score is higher than the odds ratio for any of the genome-wide polygenic risk scores.

Looking next at the European ancestry men in our testing data, you can see again that our 269 PRS has the highest AUC and odds ratio, but it's pretty similar to what's observed for this particular genome-wide approach, which is called PRS-CSx. Although it's similar, it's not better. It's really very similar to what we observed for our 269, but it doesn't show improved performance. And then looking at our validation data, now the African ancestry men from the Million Veteran Program, you can see that now we're observing that our 269 PRS has significantly better performance than all of the other genome-wide approaches, which you can see because the confidence intervals don't overlap at all between our 269 and the other approaches. And that's true both for AUC and odds ratios for prostate cancer.

And then last, when we look at the validation data for European ancestry men from the Million Veteran Program, again, we see that our PRS with 269 variants has the highest AUC and the highest odds ratios, but it's again pretty similar to what's observed for PRS-CSx, which is not better. So overall, we're seeing that a genome-wide polygenic risk score where we include over a million variants that are not necessarily associated with prostate cancer risk does not lead to improved performance of predicting prostate cancer compared to our 269 variants, where we use a multi-ancestry fine mapping approach to carefully select these variants. And in particular, we actually see that there's significantly worse performance in African ancestry men.

There are some potential reasons for this discrepancy, because this is not something that's usually observed for a lot of other traits. It's a little bit unique to prostate cancer that we're seeing that our 269, a genome-wide significant PRS as opposed to a genome-wide polygenic risk score, seems to be doing better. One potential reason is that we're using a multi-ancestry GWAS as opposed to European ancestry only, and we use a statistical fine mapping approach to try to identify variants that are potentially causally associated with prostate cancer risk.

Another could be that prostate cancer has unique genetic architecture. It's one of the most highly heritable cancers, and it also has a lot of large effect variants compared to the variants that are associated with other cancers, which could potentially contribute to why we're seeing such good performance for just variants that are significantly associated with prostate cancer. We weren't able to evaluate the genome-wide approaches in Asian and Hispanic men because sample sizes were not available in these populations. But that's something that'll need to be looked at further in future studies to see if these findings hold in other populations.

And although we're showing that our 269 PRS has really good performance compared to these genome-wide approaches, it's not necessarily suggesting that we've reached optimal performance for a polygenic risk score. We anticipate that with the inclusion of more diverse populations in more prostate cancer GWAS studies, that we'll see additional improvement in our polygenic risk score performance. So a lot of people contributed to this work, which has taken years to conduct, especially the GWAS, to construct the polygenic risk scores that we've been working with. But yeah, so I'd like to acknowledge all of those people and the funding that have contributed to this work over the years.

Andrea Miyahira: Thank you, Dr. Darst, for sharing that. I guess I wanted to ask you to expand on why you think the genome-wide PRS scores may perform better than the standard PRS methods for predicting some diseases but not prostate cancer, and also why you think there's such a big difference between African-American men and European men in how these scores perform?

Burcu Darst: Yeah, so for the first question about why we're seeing this performance in prostate cancer, and it hasn't been seen in other traits, usually when other papers are comparing a genome-wide polygenic risk score to a genome-wide significant polygenic risk score, that risk score has often been constructed from a pruning and thresholding method, which means that they take the variants that are significantly associated with the trait and then they limit it to those that are independent of each other, so not correlated with each other. We do something that's kind of similar, but we have this added layer of doing the fine mapping to try to identify variants that are potentially causally associated with prostate cancer. And I think the idea is that if you're able to construct a polygenic risk score where all the variants are truly the causal variants, you would expect that that polygenic risk score should be a very good predictor of disease risk.
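Editor's sketch (not part of the talk): the pruning and thresholding idea described above, keeping significantly associated variants and then limiting them to ones that are not correlated with each other, can be illustrated with a greedy selection. This is a simplified assumption-laden version: it takes a precomputed matrix of pairwise r² values, uses illustrative cutoffs (`p_max`, `r2_max`), and ignores the genomic distance windows that real tools apply. It is not the fine-mapping approach the team used.

```python
import numpy as np


def prune_and_threshold(p_values, r2, p_max=5e-8, r2_max=0.1):
    """Greedy pruning and thresholding over a small set of variants.

    p_values: association p-value per variant
    r2:       square matrix of pairwise LD (squared correlation)
    Returns indices of kept variants, most significant first.
    """
    p_values = np.asarray(p_values, dtype=float)
    r2 = np.asarray(r2, dtype=float)

    # Thresholding: only variants passing the significance cutoff.
    candidates = [i for i in np.argsort(p_values) if p_values[i] <= p_max]

    # Pruning: walk from most to least significant, keeping a variant only
    # if it is weakly correlated with everything already kept.
    kept = []
    for i in candidates:
        if all(r2[i, j] < r2_max for j in kept):
            kept.append(int(i))
    return kept


# Toy usage: variants 0 and 1 are in strong LD; variant 2 is not significant.
toy_p = [1e-10, 1e-9, 0.5, 1e-12]
toy_r2 = np.eye(4)
toy_r2[0, 1] = toy_r2[1, 0] = 0.9
toy_kept = prune_and_threshold(toy_p, toy_r2)
```

In a selection like this the retained variant in each region is simply the most significant one, which may be a proxy rather than the causal variant; that is the gap fine mapping tries to close.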

But because of the complex LD structure, the correlation between variants, when we run these GWAS studies, it's really hard for us to know in a particular region which of those variants is actually the causal variant. So we end up using variants that are maybe proxies of the causal variant, but maybe the performance would be best if we could actually identify that causal variant. So I think it would be really interesting for future studies of other traits to compare genome-wide approaches to those statistical fine mapping approaches where they're trying to find those causal variants and see if the findings hold once they make those comparisons. But it could also just be that prostate cancer has unique genetic architecture. I think it could just be a trait dependent or trait specific thing where we see these differences in performance between these different approaches. So I think it's something that wouldn't be a one size fits all.

I think it's something that needs to be carefully investigated across different traits. And then for the question about why we're not seeing very good performance in African ancestry men especially for the genome-wide approaches, I think part of that is probably because even though we're using this multi-ancestry GWAS as our training data to develop the genome-wide approach, about 80% of the GWAS is still European ancestry men. So I think when we construct these genome-wide approaches, it maybe is tending to do a little bit of overfitting towards European ancestry. And then when you apply it to a different population, I think it just doesn't do as well because it's a little bit overfit to a different population. That's my suspicion, but it's kind of hard to say exactly why that might be.

Andrea Miyahira: Okay, thank you. Did you compare the performance of these methods for predicting high risk or metastatic prostate cancer?

Burcu Darst: Yeah, that's a good question. So in some of the previous work that we've had, we often didn't see that there was much difference between how well it predicts overall prostate cancer versus aggressive prostate cancer. And if we were to compare aggressive versus non-aggressive, we wouldn't see any predictive ability at all. So in this paper, we really wanted to just focus on the question of how can we improve our polygenic risk score for overall prostate cancer risk? But in more recent work, with larger sample sizes, we are starting to see that our polygenic risk score has some discriminative ability for aggressive versus non-aggressive prostate cancer. So I think looking at these genome-wide approaches could be some important future work, to see if that could potentially be a way to help us better distinguish aggressive versus non-aggressive.

Andrea Miyahira: Thanks. I guess thinking about what you guys are doing next, what efforts is your team currently undertaking to improve your PRS and particularly in the underrepresented populations?

Burcu Darst: Yeah. So there are a lot of ongoing efforts to try to include as many non-European ancestry men as we can in these GWAS studies. So I showed that we used MVP in this particular study as our independent data set to evaluate the polygenic risk score. But in some of our recent work we have since included them in our multi-ancestry GWAS, which is a really huge boost to power, especially in the non-European ancestry men. So yeah, we're collaborating with a lot of other people to try to increase the sample size of non-European ancestry men, and especially Chris Haiman has been leading a lot of efforts to try to pull together as many African ancestry data sets as possible to improve the representation particularly of African ancestry men in prostate cancer GWAS.

Andrea Miyahira: Okay. And how do you envision this test being used in the clinic and what steps do you think are needed for the PRS 269 or its improved version to become a standard of care?

Burcu Darst: Yeah, so I think the polygenic risk score is really ideal for risk prediction. So I think it probably has best use before the diagnosis of prostate cancer and hopefully to guide screening decisions about who should receive screening and who maybe can opt out of screening because they have such low risk of developing prostate cancer. Since it is a very strongly genetic disease, I think removing men who have very low genetic risk of prostate cancer could be a good way to reduce the over-diagnosis of prostate cancer. So yeah, kind of using it for risk stratified screening I think would be optimal. But I think there are a lot of steps that we need to take in order to get there. And one in particular that I think is a challenge, and that is being thought about in the field of polygenic risk scores more broadly, is how do we actually interpret these polygenic risk scores? Because every time you get a score, you have to think about it; it's different than carrier status where you're either a carrier or you're not.

Because if you have a polygenic risk score, who is it relative to? Especially because we see that there are differences in polygenic risk scores between different populations. You have to really carefully consider when you interpret a particular score, who is that score relative to and how do you determine if someone has high risk versus low risk? I think that's one of the important questions that we really need to think about. And there also haven't really been too many prospective clinical studies at this point to evaluate the efficacy and effectiveness of polygenic risk scores in a clinical setting. So I think those are some of the next steps that we need to take to be able to move them into a clinical setting.

Andrea Miyahira: Okay. Well, thank you for this great work and for coming on today. And congratulations again on the paper.

Burcu Darst: Thank you. Thank you for inviting me.