Principal findings
Our results showed the poor performance of polygenic risk scores in population screening, individual disease prediction, and population risk stratification. This finding is not obvious from the metrics reported in the Polygenic Score Catalog but is clear based on the appropriate metrics used in this study. Our conclusion is consistent with that of others,16 17 19 but is insufficiently recognised. The findings are relevant to consumers, patients, doctors, those involved in preventive medicine and public health, as well as funders and policy makers.
Polygenic risk score distributions overlapped substantially for all conditions studied, and this extensive overlap constrained their performance in each of their intended applications, whether used alone or in combination with conventional risk factors or screening tests. For example, achieving a clinically useful performance in population screening, such as an 80% detection rate for a 5% false positive rate (DR5=80%), requires an odds ratio for one standard deviation of 12 or higher (compared with the median observed value of 1.31) or an area under the receiver operating characteristic curve of 0.96 (compared with the median observed value of 0.65). Only 11.4% of the area under the curve values in the Polygenic Score Catalog exceeded 0.8, which equates to a DR5 of 32%, with most of these resulting from large effect variants at the HLA locus in a few autoimmune diseases (figure 1 and online supplemental file 1).
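The relation between these metrics can be illustrated with a minimal sketch. It assumes that scores in affected and unaffected individuals follow Gaussian distributions with equal standard deviations, so that the natural logarithm of the odds ratio per standard deviation equals the separation d between the two means in standard deviation units. This model and the function names are assumptions made for illustration and are not taken from the study methods.

```python
from math import log, sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard Gaussian, mean 0 and SD 1


def d_from_or_per_sd(odds_ratio):
    """Separation of the two means in SD units (assumes equal-variance Gaussians)."""
    return log(odds_ratio)


def d_from_auc(auc):
    """Invert AUC = Phi(d / sqrt(2)) under the same assumed model."""
    return sqrt(2) * STD_NORMAL.inv_cdf(auc)


def auc_from_d(d):
    """Area under the receiver operating characteristic curve for separation d."""
    return STD_NORMAL.cdf(d / sqrt(2))


def detection_rate(d, false_positive_rate=0.05):
    """Proportion of affected individuals above the cut-off that flags
    the given proportion of unaffected individuals (DR5 by default)."""
    cutoff = STD_NORMAL.inv_cdf(1 - false_positive_rate)
    return STD_NORMAL.cdf(d - cutoff)


if __name__ == "__main__":
    d = d_from_or_per_sd(12)
    print(f"OR per SD 12: AUC={auc_from_d(d):.2f}, DR5={detection_rate(d):.0%}")
    print(f"AUC 0.80: DR5={detection_rate(d_from_auc(0.80)):.0%}")
```

Under these assumptions the sketch approximately reproduces the figures quoted above: an odds ratio per standard deviation of 12 corresponds to an area under the curve of about 0.96 and a DR5 of about 80%, and an area under the curve of 0.8 corresponds to a DR5 of about 32%.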
Study implications
When a risk factor has a monotonic relation with risk of disease,32 more instances of disease arise among the majority with near average risk factor values than among the few with more extreme values, a phenomenon termed the prevention paradox.33 34 In this respect, polygenic risk scores are similar to some non-genetic risk factors, such as blood pressure and low density lipoprotein cholesterol, which, although causal, are poor predictors of coronary artery disease.16 35 Comparing the performance of polygenic risk scores in the prediction of coronary artery disease favourably with that of blood pressure and cholesterol, as is sometimes done,26 merely benchmarks one poor predictor against another.
Where safe and inexpensive preventive interventions are available (eg, statins and blood pressure lowering drugs for prevention of coronary artery disease and stroke), broadening rather than limiting eligibility for such interventions gives greater public health benefits.36 Prevention of coronary artery disease and stroke has been achieved in effect by the progressive lowering of the 10 year risk cut-off value for prescription of statins in primary prevention. The cut-off value was reduced from a 30% 10 year risk of coronary artery disease in the UK in 1997,37 to a 10% 10 year risk of coronary artery disease or stroke in the UK from 2016,27 and to 7.5% in the US from 2019.38 The reduction in the risk cut-off value resulted from reduced drug acquisition costs through patent expiry and from accumulating evidence on long term safety. Eligibility could be extended even further and simplified by using age alone to guide prescription of statins for primary prevention, preventing coronary artery disease and stroke in many more patients.28 In contrast, retaining the same 10 year risk cut-off value and adding information on polygenic risk score to conventional risk factor models has a much weaker effect. Based on recently reported data,5 26 we showed that several thousand individuals need to be genotyped and a polygenic risk score calculated to prevent one additional vascular event.
Identifying a minority of individuals at very high risk (with genetics or other means) might be justified if a preventive intervention is costly, resource limited, or has substantial harms.39 With breast cancer as an example, however, we showed that identifying those at high risk requires testing in all and, apart from missing the many more patients among those at average risk, generates many false positive results. This finding could have substantial downstream resource implications for healthcare systems if, for example, genetic risk stratification was followed by a confirmatory screening test, such as mammography for breast cancer.40 In this case, reducing the age cut-off value for mammography for all women without determining their polygenic risk score might be more sensible.
The enthusiasm surrounding polygenic risk scores might have been encouraged by pressure on academia to demonstrate a tangible health effect after decades of research investment in human genomics and by commercial opportunity. Unrealistic expectations have probably been raised by use of uninformative metrics. Publications on polygenic risk scores often illustrate comparisons between mutually exclusive groups (eg, those in opposite ends of a polygenic score distribution).41 This type of comparison is relevant in aetiological studies but is not relevant in screening. Figure 3 shows seemingly impressive odds ratios of 13, 7, 5, 4, and 3 for comparisons of the top versus the bottom 1%, 5%, 10%, 20%, and 25%, respectively, of the polygenic risk score distribution for coronary artery disease, all of which correspond to a DR5 of only 12%. What is relevant in screening is the risk of an event in a group compared with that of the whole population, which is achieved with the calculation of the detection rate for a specified false positive rate.
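The contrast between odds ratios for opposite tails and the detection rate can be illustrated with a minimal sketch. It assumes equal-variance Gaussian score distributions in affected and unaffected individuals, and it assumes that the tail cut-offs are set on the distribution in unaffected individuals (a reasonable approximation to the whole population only when the disease is uncommon). These modelling choices and the function names are assumptions for illustration, not the method used to produce figure 3.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard Gaussian, mean 0 and SD 1


def d_from_detection_rate(detection_rate, false_positive_rate=0.05):
    """Separation between the affected and unaffected means, in SD units,
    implied by a given detection rate at a given false positive rate."""
    cutoff = STD_NORMAL.inv_cdf(1 - false_positive_rate)
    return cutoff + STD_NORMAL.inv_cdf(detection_rate)


def tail_odds_ratio(d, fraction):
    """Odds ratio for disease, top versus bottom `fraction` of scores.

    Cut-offs are placed on the unaffected distribution (assumption), so each
    tail holds the same share of unaffected individuals and that share,
    together with disease prevalence, cancels out of the odds ratio.
    """
    z = STD_NORMAL.inv_cdf(1 - fraction)
    affected_in_top = 1 - STD_NORMAL.cdf(z - d)
    affected_in_bottom = STD_NORMAL.cdf(-z - d)
    return affected_in_top / affected_in_bottom


if __name__ == "__main__":
    d = d_from_detection_rate(0.12)  # a DR5 of 12%
    for fraction in (0.01, 0.05, 0.10, 0.20, 0.25):
        print(f"top v bottom {fraction:.0%}: OR ~ {tail_odds_ratio(d, fraction):.1f}")
```

Under these assumptions, a score with a DR5 of 12% yields odds ratios for the opposite tails that are close to those quoted above, which shows how the same modest separation between distributions can be presented as either a large odds ratio or a low detection rate.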
Policy implications
Our findings are relevant to commercial providers of genetic tests and to researchers working on polygenic risk scores. Commercial providers could communicate individual test results to customers with greater clarity and relevance to performance in disease prediction; for example, by presenting the overlapping distributions of polygenic risk scores among those later affected and unaffected and by presenting an absolute measure of risk for an individual or group, which requires additional information on population average risk at a particular age over a specified time. At the same time, as already suggested,42 policy makers might wish to consider stricter regulation of commercial genetic tests based on polygenic risk scores, with a focus on clinical performance and not just assay performance (as indicated by the Royal Statistical Society Diagnostic Tests Working Group Report43), to protect the public from unrealistic expectations and already stretched public health systems from becoming overburdened by the management of false positive results. Researchers reporting studies on polygenic risk scores should present as a minimum: mean and standard deviation values for polygenic risk scores among later affected and unaffected individuals; overlap in their distributions; relevant performance metrics, such as the detection rate for a specified false positive rate (eg, DR5), avoiding the need to make this calculation indirectly23; and performance of polygenic risk scores with and without the inclusion of other variables so that users can judge the incremental benefit provided by the polygenic risk score itself.
Although our analysis showed the poor performance of polygenic risk scores in screening, prediction, and risk stratification, these scores might be useful in other situations. For example, polygenic scores might explain the variable penetrance of rare mutations in monogenic diseases (eg, hypertrophic cardiomyopathy or familial hypercholesterolaemia), and be used to help detect patients. Other predictive applications of genotyping also exist, for example in pharmacogenetic testing to optimise the efficacy and safety of medicines. Genotyping might also be of value in blood and tissue matching. Because genetic variation is transmitted from parents to offspring through a randomised process (like treatment allocation in a clinical trial), and is unaltered by disease, an important translational application arising from genomic discoveries could be providing evidence on disease causation and targets for pharmaceutical intervention.44
Conclusion
Use of the appropriate metrics showed poor performance of polygenic risk scores in population screening, individual risk prediction, and population risk stratification. The wide scope and analytical approach of our study might help to resolve the debate on the value of polygenic risk scores, and avoid unjustified expectations about their role in the prediction and prevention of disease.