Study population and data sources
We used primary care data from the Clinical Practice Research Datalink (CPRD) GOLD, linked with hospital admissions from Hospital Episode Statistics (HES) and national mortality records from the Office for National Statistics (ONS) in England. CPRD GOLD contains anonymised individual level primary care records from UK general practitioners, covering about 6.9% of the UK population, and is broadly representative with regard to age, sex, and ethnic group.20 Data for sex were taken from information in the CPRD.
Individuals were identified and followed up from the study baseline, which was set as 1 January 2006. This date allowed a window of about two years after 1 April 2004 (the date of introduction of the quality and outcomes framework in the UK NHS21) for measurements of risk factors to be recorded before baseline. Follow-up continued until the earliest of: the date of the first newly recorded cardiovascular disease event or death; the individual's 95th birthday; the date of de-registration at the general practice or the last contact date for the practice with CPRD; or 31 May 2019 (the end of data availability). We further restricted the study population to those aged 40-85 years with no previous cardiovascular disease (codelists shown in online supplemental methods 1), no history of treatment with statins, and no prevalent diabetes at baseline (because people with type 1 and type 2 diabetes are considered at high risk of cardiovascular disease, regardless of their predicted risks, in the clinical guidelines from NICE and the European Society of Cardiology6 15). Online supplemental figure 1 shows a flowchart of selection of the study population.
Statistical analysis
Risk estimates for cardiovascular disease
For the primary analyses, we estimated the 10 year risk of cardiovascular disease for each individual using the QRISK2 algorithm,5 as recommended in the UK cardiovascular disease risk assessment guideline. Although QRISK3 is recommended in the updated 2023 NICE guideline, the guideline indicates that using QRISK2 might be necessary until electronic clinical systems in which QRISK2 is embedded are updated with QRISK3.6 Online supplemental methods 1 provides details of the cardiovascular disease outcomes and risk factors used in the QRISK2 algorithm. Multiple imputation by chained equations was used to impute missing values for smoking status, systolic blood pressure, total cholesterol, high density lipoprotein cholesterol, and body mass index (online supplemental methods 1). We performed five imputations, which were adequate to achieve relatively high efficiency22 and were pragmatic for our sample size. Analyses were performed in each imputed dataset separately and the results were then pooled across imputations with Rubin's rules.22 External validation of the QRISK2 model in our data involved assessment of overall performance (with the R² statistic23), discrimination (with Harrell's C statistic24 and D statistic25), and calibration (visually assessing the agreement of observed risk and predicted risk by 10ths of predicted risk26) (online supplemental methods 1).
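The pooling step can be illustrated with a minimal Python sketch of Rubin's rules. The function name and the example estimates are hypothetical and are not taken from the study; the sketch shows only the standard combination of within-imputation and between-imputation variance, not the authors' analysis code.

```python
from statistics import mean, variance

def pool_rubin(estimates, variances):
    """Pool per-imputation estimates and their squared standard errors."""
    m = len(estimates)
    q_bar = mean(estimates)            # pooled point estimate
    u_bar = mean(variances)            # within-imputation variance
    b = variance(estimates)            # between-imputation variance (m - 1 denominator)
    total = u_bar + (1 + 1 / m) * b    # total variance of the pooled estimate
    return q_bar, total

# Hypothetical estimates (eg, log hazard ratios) from five imputed datasets
est = [0.50, 0.52, 0.48, 0.51, 0.49]
var = [0.010, 0.011, 0.009, 0.010, 0.010]
pooled, total_var = pool_rubin(est, var)
```

The pooled standard error is the square root of the total variance.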
Risk stratification strategies
With the estimated 10 year risks of cardiovascular disease from the QRISK2 algorithm, individuals were stratified as having a high risk of cardiovascular disease for allocating statins based on two main strategies: in strategy A, predicted risk was a fixed high risk cut-off value of 10% (ie, individuals who had an absolute risk ≥10% were identified as high risk); in strategy B, individuals were first grouped by centiles of predicted risk at each age, by one year age groups, and sex, and then were identified as high risk if they had an absolute risk ≥10% or an estimated risk exceeding the 90th centile of the age and sex specific risk distributions.
The 90th centile was selected as an example to illustrate the potential results of applying age and sex specific thresholds in risk stratification for cardiovascular disease. We applied this approach to lower the thresholds at younger ages rather than to increase the thresholds at older ages, with the consideration that this would be a pragmatic, acceptable, and implementable strategy.
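The two strategies can be sketched in Python as follows. This is an illustration under stated assumptions: the record layout (age, sex, risk), the function names, and the nearest-rank centile definition are choices made here and might differ from the centile estimation used in the study.

```python
import math
from collections import defaultdict

def centile_thresholds(people, centile=90):
    """Nearest-rank centile of predicted risk within each (age, sex) group."""
    groups = defaultdict(list)
    for p in people:
        groups[(p["age"], p["sex"])].append(p["risk"])
    thresholds = {}
    for key, risks in groups.items():
        risks.sort()
        rank = math.ceil(centile / 100 * len(risks))  # 1-based nearest rank
        thresholds[key] = risks[rank - 1]
    return thresholds

def high_risk_a(p, cutoff=0.10):
    """Strategy A: fixed absolute risk cut-off."""
    return p["risk"] >= cutoff

def high_risk_b(p, thresholds, cutoff=0.10):
    """Strategy B: fixed cut-off OR risk above the age and sex specific centile."""
    return p["risk"] >= cutoff or p["risk"] > thresholds[(p["age"], p["sex"])]

# Hypothetical sample: ten women aged 45 and one man aged 70
people = [{"age": 45, "sex": "F", "risk": r / 100}
          for r in (1.0, 1.2, 1.4, 1.6, 1.8, 2.0, 2.2, 2.4, 2.6, 5.0)]
people.append({"age": 70, "sex": "M", "risk": 0.20})

thresholds = centile_thresholds(people)
n_a = sum(high_risk_a(p) for p in people)
n_b = sum(high_risk_b(p, thresholds) for p in people)
```

In this toy sample, strategy A identifies only the 70 year old man, whereas strategy B also identifies the 45 year old woman whose 5% risk exceeds the 90th centile for her age and sex.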
Performance of stratification strategies
Although individuals receiving treatment with statins at baseline were excluded, about 20% of included individuals initiated statin treatment during follow-up (so-called treatment drop-ins27 28). Ignoring treatment initiation could lead to underestimation of the observed risks of cardiovascular disease.29 Therefore, we first estimated the counterfactual statin naive survival time, which accounts for the effect of treatment drop-ins30 (online supplemental methods 2). The counterfactual statin naive survival times were used for the subsequent evaluation of the stratification performance.
To compare the stratification strategies, we calculated sensitivity (ie, the proportion of individuals with a cardiovascular disease event who are correctly grouped as high risk by the stratification strategy31), specificity (ie, the proportion of individuals without a cardiovascular disease event who are correctly identified as low risk31), an adapted area under the receiver operating characteristic curve for dichotomised predictions (AUROC-dp), and net benefit. AUROC-dp measures the ability to discriminate between individuals who do and do not have a cardiovascular disease event according to the combined risk prediction model and the stratification rule. As a measure of discrimination, AUROC-dp generally has values from 0.5 (representing discriminative ability equal to chance alone) to 1 (when the risk prediction model and stratification strategy perfectly divide individuals into those who do and do not later have a cardiovascular disease event).32–34
Net benefit was estimated to assess the clinical value of different risk stratification strategies and their clinical consequences. Net benefit represents the difference between the true positive rate and false positive rate weighted by the odds of the selected threshold for being at high risk, with higher values indicating greater net benefit.35–37 Sensitivity, specificity, AUROC-dp, and net benefit were calculated accounting for censoring. Online supplemental methods 3 and 4 describe the methods in detail.
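As a simplified illustration of these measures, the Python sketch below computes them from fully observed binary outcomes. It makes several assumptions that do not hold in the study: censoring is ignored, AUROC-dp is taken as the mean of sensitivity and specificity (which is what the area under the receiver operating characteristic curve reduces to for a single binary rule), and net benefit is weighted by the odds of a 10% threshold. The function name and data are hypothetical; the censoring-adjusted estimators are those described in the supplemental methods.

```python
def stratification_metrics(high_risk, event, threshold=0.10):
    """Sensitivity, specificity, AUROC-dp and net benefit without censoring."""
    n = len(event)
    tp = sum(h and e for h, e in zip(high_risk, event))
    fp = sum(h and not e for h, e in zip(high_risk, event))
    fn = sum((not h) and e for h, e in zip(high_risk, event))
    tn = sum((not h) and (not e) for h, e in zip(high_risk, event))
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    auroc_dp = (sensitivity + specificity) / 2
    # True positives minus false positives weighted by the threshold odds
    net_benefit = tp / n - fp / n * threshold / (1 - threshold)
    return sensitivity, specificity, auroc_dp, net_benefit

# Hypothetical sample of ten individuals
high_risk = [True, True, False, False, True, False, False, False, False, False]
event = [True, False, True, False, False, False, False, False, False, False]
sens, spec, auroc_dp, nb = stratification_metrics(high_risk, event)
```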
Potential public health impact
We quantified the public health impact of the combined risk prediction model and the stratification rule by the number needed to screen and number needed to treat to prevent one new cardiovascular disease event in 10 years, under the assumption that statin treatment is given to individuals at high risk and reduces the risk of cardiovascular disease. We assumed a 25% relative risk reduction in cardiovascular disease with statin allocation, irrespective of age, sex,38 39 and treatment duration,40 while allowing adherence to statin treatment to differ by age and sex (online supplemental methods 5). The number needed to screen always decreases when the threshold is lowered, and is at a minimum when everyone is treated. In contrast, the number needed to treat always increases when the threshold is lowered.
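The behaviour of the two measures as the threshold changes can be shown with a minimal Python sketch. It assumes that events prevented equal the sum of predicted risks among treated individuals multiplied by the relative risk reduction and a single adherence rate; the function name, the uniform adherence, and the example risks are hypothetical simplifications (the study allowed adherence to vary by age and sex).

```python
import math

def nns_nnt(risks, cutoff=0.10, rrr=0.25, adherence=1.0):
    """Number needed to screen and to treat to prevent one event."""
    treated = [r for r in risks if r >= cutoff]
    prevented = sum(r * rrr * adherence for r in treated)  # expected events avoided
    if prevented == 0:
        return math.inf, math.inf
    nns = len(risks) / prevented     # everyone is screened
    nnt = len(treated) / prevented   # only high risk individuals are treated
    return nns, nnt

# Hypothetical predicted 10 year risks
risks = [0.05, 0.08, 0.12, 0.20, 0.40]
nns_10, nnt_10 = nns_nnt(risks, cutoff=0.10)
nns_5, nnt_5 = nns_nnt(risks, cutoff=0.05)
```

Lowering the cut-off from 10% to 5% in this example reduces the number needed to screen and increases the number needed to treat.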
To investigate the long term benefit of treating individuals with a high risk of cardiovascular disease with statins, we estimated the gain in cardiovascular disease-free life expectancy associated with statin initiation by age and sex. Cardiovascular disease-free life expectancy (or life years free of cardiovascular disease) is defined as the average duration of survival without cardiovascular disease over the follow-up period, and was calculated as the area under the cardiovascular disease-free survival curve.41 To better reflect the potential benefits over a lifetime, especially for younger individuals with low short term risks who were expected to survive far longer than the available follow-up time, we used age as the time scale and adjusted for the competing risk of death from non-cardiovascular disease events to assess the potential longer term cardiovascular disease-free survival.42 Also, future life years were discounted with a time preference rate of 0.03 (which assumes that each year is worth about 97% of the previous year) to account for the progressively lower value that individuals are likely to place on life years further into the future.43 Estimations were based on sex specific life tables combining age specific risks of cardiovascular disease and risks of death from non-cardiovascular disease in one year age intervals.42
When individuals were identified as having a high risk of cardiovascular disease at baseline age according to each stratification strategy, the one year risk of cardiovascular disease was calculated by applying the relative risk reduction from statin treatment to the subdistribution risk of cardiovascular disease for each of the remaining life years. The gain in cardiovascular disease-free life years is the difference in cardiovascular disease-free life expectancy with and without assumed statin treatment. To illustrate the results intuitively at a population level, we calculated the possible gain in cardiovascular disease-free life years in England based on the most recent available data on the age and sex structure of the 2020 mid-year England population aged 40-85 years.44 Online supplemental methods 6 provides details of the calculation.
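A simplified life table calculation of this kind might look like the Python sketch below. It is an illustration under assumptions made here and is not the method in online supplemental methods 6: one year risks are treated as mutually exclusive probabilities, the area under the survival curve is approximated with the trapezoid rule, the first year is left undiscounted, and all names and input values are hypothetical.

```python
def cvd_free_life_years(q_cvd, q_death, rrr=0.0, discount=0.03):
    """Discounted area under the cardiovascular disease-free survival curve.

    q_cvd and q_death are one year risks of cardiovascular disease and of
    non-cardiovascular death for each successive year of age.
    """
    surv = 1.0
    total = 0.0
    for t, (qc, qd) in enumerate(zip(q_cvd, q_death)):
        # Treatment lowers the cardiovascular risk only
        next_surv = surv * (1 - qc * (1 - rrr) - qd)
        area = (surv + next_surv) / 2            # trapezoid for this year
        total += area / (1 + discount) ** t      # discount later years
        surv = next_surv
    return total

# Hypothetical risks over two years of age
q_cvd = [0.10, 0.10]
q_death = [0.0, 0.0]
untreated = cvd_free_life_years(q_cvd, q_death, rrr=0.0, discount=0.0)
treated = cvd_free_life_years(q_cvd, q_death, rrr=0.25, discount=0.0)
gain = treated - untreated
```

The gain is the difference between the treated and untreated totals, in line with the definition above.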
Sensitivity analyses
Because the number needed to screen, number needed to treat, and population average gain in cardiovascular disease-free life years from statin treatment depend on the number of individuals identified as high risk, to make a fairer comparison across strategies, we further performed sensitivity analyses by ascertaining the same number of individuals at high risk of cardiovascular disease in each strategy. We constrained the number of individuals classified as having a high risk of cardiovascular disease to be the same as the number identified in strategy B among the whole population sample, and then identified the corresponding single risk threshold as an alternative fixed threshold for strategy A. This single risk threshold was identified to be 9.2% (strategy A1).
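One way to find a single threshold that identifies a given number of individuals is to take the risk of the individual at that rank, as in the Python sketch below. The function name and the example values are hypothetical, and ties at the threshold (which would identify more individuals than the target) are not handled.

```python
def matching_threshold(risks, n_target):
    """Risk cut-off that classifies n_target individuals as high risk (risk >= cut-off)."""
    if not 1 <= n_target <= len(risks):
        raise ValueError("n_target must be between 1 and the sample size")
    ranked = sorted(risks, reverse=True)
    return ranked[n_target - 1]

# Hypothetical predicted risks; suppose strategy B identified three individuals
risks = [0.02, 0.05, 0.08, 0.095, 0.11, 0.15]
cutoff = matching_threshold(risks, n_target=3)
n_high = sum(r >= cutoff for r in risks)
```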
Sensitivity analyses were also conducted with the SCORE2 and SCORE2-OP algorithms,14 45 as recommended in the current guidelines from the European Society of Cardiology (with the low risk region equations for the UK population as recommended).15 Online supplemental methods 1 provides details of the cardiovascular disease outcomes and risk factors used in the SCORE2 and SCORE2-OP algorithms. We further assessed age specific risk stratification thresholds with high risk cut-off values at 7.5%, 10%, and 15% for younger (40-49 years), middle aged (50-69 years), and older (≥70 years) age groups, respectively, as recommended in the European Society of Cardiology guidelines (strategy C). Thus, for risk estimates based on SCORE2, we compared the stratification performance of strategy A (single 10% threshold), strategy B (age and sex specific thresholds), and strategy C (age specific thresholds).
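Strategy C can be written as a simple rule, sketched here in Python with the age bands and cut-off values stated above. The function name is hypothetical and the sketch assumes that ages are within the study range of 40-85 years.

```python
def high_risk_c(age, risk):
    """Strategy C: age specific cut-off values (7.5%, 10%, 15%)."""
    if age < 50:        # younger (40-49 years)
        cutoff = 0.075
    elif age < 70:      # middle aged (50-69 years)
        cutoff = 0.10
    else:               # older (70 years and over)
        cutoff = 0.15
    return risk >= cutoff
```

For example, a predicted risk of 8% would be classified as high risk at age 45 but not at age 60.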
Analyses were performed with Stata version 15.1 (StataCorp, College Station, TX, USA) and R version 3.6.1 (R Foundation for Statistical Computing, Vienna, Austria). The manuscript was prepared in accordance with the strengthening the reporting of observational studies in epidemiology (STROBE) statement (online supplemental material).
Patient and public involvement
Patients and the public were not involved in the design, or conduct, or reporting, or dissemination plans of this research, mainly because this study used anonymised electronic health records data and focused on population level results. We plan to communicate the study findings to stakeholders related to cardiovascular disease guidelines.