Research

Effect of competing mortality risks on predictive performance of the QFracture-2012 risk prediction tool for major osteoporotic fracture and hip fracture: external validation cohort study in a UK primary care population

Abstract

Objective To externally evaluate the QFracture-2012 risk prediction tool for predicting the risk of major osteoporotic fracture and hip fracture.

Design External validation cohort study.

Setting UK primary care population. Linked general practice (Clinical Practice Research Datalink (CPRD) Gold), mortality registration (Office of National Statistics), and hospital inpatient (Hospital Episode Statistics) data, from 1 January 2004 to 31 March 2016.

Participants 2 747 409 women and 2 684 730 men, aged 30-99 years, with up-to-standard linked data that had passed CPRD checks for at least one year.

Main outcome measures Two outcomes were modelled based on those predicted by QFracture: major osteoporotic fracture and hip fracture. Major osteoporotic fracture was defined as any hip, distal forearm, proximal humerus, or vertebral crush fracture, from general practice, hospital discharge, and mortality data. The QFracture-2012 10 year predicted risk of major osteoporotic fracture and hip fracture was calculated, and performance evaluated versus observed 10 year risk of fracture in the whole population, and in subgroups based on age and comorbidity. QFracture-2012 calibration was examined accounting for, and not accounting for, competing risk of mortality from causes other than the major osteoporotic fracture.

Results 2 747 409 women with 95 598 major osteoporotic fractures and 36 400 hip fractures, and 2 684 730 men with 34 321 major osteoporotic fractures and 13 379 hip fractures were included in the analysis. The incidence of all fractures was higher than in the QFracture-2012 internal derivation. Competing risk of mortality was more common than fracture from middle age onwards. QFracture-2012 discrimination in the whole population was excellent or good for major osteoporotic fracture and hip fracture (Harrell’s C statistic in women 0.813 and 0.918, and 0.738 and 0.888 in men, respectively), but was poor to moderate in age subgroups (eg, Harrell’s C statistic in women and men aged 85-99 years was 0.576 and 0.624 for major osteoporotic fractures, and 0.601 and 0.637 for hip fractures, respectively). Without accounting for competing risks, QFracture-2012 systematically under-predicted the risk of fracture in all models, and more so for major osteoporotic fracture than for hip fracture, and more so in older people. Accounting for competing risks, QFracture-2012 still under-predicted the risk of fracture in the whole population, but over-prediction was considerable in older age groups and in people with high comorbidities at high risk of fracture.

Conclusions The QFracture-2012 risk prediction tool systematically under-predicted the risk of fracture (because of incomplete determination of fracture rates) and over-predicted the risk in older people and in those with more comorbidities (because of competing mortality). QFracture-2016, the version currently used by the UK's health service, needs to be externally validated, particularly in people at high risk of death from other causes.

What is already known on this topic

  • The QFracture risk prediction tool is recommended by the National Institute for Health and Care Excellence (NICE) to predict the risk of fracture and to guide decisions to start bisphosphonates, on the basis of previous validation studies showing good predictive performance

  • Previous validation studies of the original QFracture tool and QFracture-2012 have followed the derivation studies in not including fractures recorded in hospital discharge data, and in not accounting for competing risk of mortality

  • The QFracture-2016 prediction tool currently used by the UK's health service needs to be externally validated in the whole population

What this study adds

  • The observed incidence of fracture was higher in this study (which included hospital recorded fractures) than in the QFracture-2012 derivation and validation studies (which did not)

  • Despite excellent discrimination in the whole population, systematic under-prediction of the risk of fracture by QFracture-2012 was found, as was systematic over-prediction in older people and in those with more comorbidities when accounting for competing risk of mortality

  • Competing mortality risk is an important problem in the context of fracture prediction in older people because non-fracture death is much more common than the fracture outcomes being predicted

How this study might affect research, practice, or policy

  • Research is needed to examine the implications of competing mortality risk for recommended clinical prediction tools where the time-horizon for prediction is long and competing mortality is common

Introduction

Fragility or low impact fractures are a common consequence of osteoporosis and osteopenia, and a major cause of morbidity, disability and, in some cases, death. Bisphosphonates reduce the risk of hip and vertebral fractures in people with osteoporosis,1 and international guidelines recommend drug treatment for people at high risk of fracture.1–5 In the UK, guidelines recommend the use of a fracture risk prediction tool in middle aged and older people who have risk factors for fracture, with measurement of bone mineral density for further classification of risk in those at intermediate risk.2 4 In the US, guidelines from the US Bone Health and Osteoporosis Foundation (previously the National Osteoporosis Foundation) recommend similar use of prediction tools for middle aged people but also recommend routine measurement of bone mineral density in older people.5 These types of guideline recommendations based on risk are increasingly used by people who develop guidelines to target treatment to those with the greatest capacity to benefit, but the effectiveness of this strategy critically depends on the performance of the risk prediction tools used.

Many fracture risk prediction tools have been created, although only three have undergone repeated external validation: QFracture, FRAX, and Garvan.6 7 The first version of QFracture8 was externally validated in a UK primary care dataset, and was found to have excellent discrimination and calibration (discrimination is the ability of the prediction tool to correctly differentiate between people who have a fracture and those who have not, whereas calibration refers to how closely the predicted and observed probabilities agree).9 Subsequently, Dagan et al externally validated the updated QFracture-2012 algorithm and the Garvan prediction tool in an Israeli dataset. QFracture-2012 had good discrimination but Garvan had moderate discrimination, and both tools systematically under-predicted the risk of fracture.7

The fracture risk assessment tool (FRAX) has been internally validated in several datasets, with discrimination reported as good but calibration has rarely been assessed.6 10 FRAX cannot be externally validated, however, because the underlying FRAX algorithm has never been made public which prevents full independent evaluation.7 Dagan et al also presented an external validation of FRAX in their analysis, but FRAX predictions were not based on full FRAX estimates of risk because the prediction equation is not published.7 Based on the approximate FRAX risk used, considerable under-prediction of fractures for this tool was found.

In the UK, the National Institute for Health and Care Excellence (NICE) recommends the use of either QFracture or FRAX to inform decisions to start treatment with bisphosphonates, but recognises that the estimated risk of fracture for individuals can vary considerably between tools.1 2 FRAX over-predicted the risk of fracture when the same method of determining fractures as the QFracture-2012 derivation was used.2 8 11 Two possible reasons for these differences are how fractures are identified in the derivation of each tool, because QFracture-2012 uses codes in primary care records and mortality data12 and FRAX uses self-report and hospital records13 (these might be incomplete in different ways), and that only FRAX takes into account competing risks of mortality. Competing risk of mortality from non-fracture causes is a known problem in risk prediction because standard modelling methods assume that patients who are censored before the intended end of follow-up have the same risk of fracture as those who are not censored. Although this assumption might be reasonable for loss to follow-up because of change in address, when someone dies the assumption is clearly false. Not accounting for competing risk of mortality over-predicts the risk of fracture, which is likely to be more of a problem in older people and those with multimorbidities.14–16 The aim of this study therefore was to externally validate the QFracture-2012 risk prediction tool, and specifically to compare prediction in relation to better determination of fracture rates, and to examine the effect of competing risk on predictive performance. QFracture-2012 has subsequently been updated and the QFracture-2016 model is the version currently in use in the UK's health service.

Methods

Data source and population

Linked general practice (Clinical Practice Research Datalink Gold), mortality registration (Office of National Statistics), and hospital inpatient (Hospital Episode Statistics) data were used. The data are similar to the QFracture derivation dataset in terms of inclusion of linked primary care and mortality data, but we also used linked hospital admission data to determine if a fracture occurred. To be included, patients had to be permanently registered with a general practice contributing up-to-standard (ie, passing Clinical Practice Research Datalink quality checks) data for at least one year; have linkage to Hospital Episode Statistics discharge data and Office of National Statistics mortality data; and be aged ≥30 years and <100 years. Cohort entry was the latest of the dates on or after 1 January 2004. Cohort exit was the earliest of the date of the first relevant fracture event, death, deregistration from the general practice, the last data collection from the practice, or the end of the study on 31 March 2016. All outcomes and predictors were recorded blind to the study hypothesis and recorded as part of routine clinical care. No formal power calculation was done because the study size was determined by data available in the Clinical Practice Research Datalink, which was considered sufficient.17
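As an illustration only, the cohort exit rule described above can be sketched in Python. The function name, argument names, and reason labels are hypothetical and are not taken from the study's analysis code; the rule for resolving same-day ties (fracture listed first) is an assumption of this sketch.

```python
from datetime import date
from typing import Optional, Tuple

STUDY_END = date(2016, 3, 31)  # end of study as stated in the text


def cohort_exit(first_fracture: Optional[date],
                death: Optional[date],
                deregistration: Optional[date],
                last_collection: Optional[date]) -> Tuple[date, str]:
    """Return (exit date, reason) as the earliest of the applicable dates.

    Missing dates are passed as None. If two dates coincide, the reason
    listed first wins (an assumption of this sketch, not of the paper).
    """
    candidates = [
        (first_fracture, "fracture"),
        (death, "death"),
        (deregistration, "deregistration"),
        (last_collection, "last_collection"),
        (STUDY_END, "study_end"),
    ]
    dated = [(d, reason) for d, reason in candidates if d is not None]
    # min() returns the first minimal element, preserving the listed order.
    return min(dated, key=lambda pair: pair[0])


# Hypothetical patient who died before the practice stopped contributing data.
exit_date, reason = cohort_exit(None, date(2010, 5, 1), None, date(2015, 1, 1))
```

The reason returned alongside the date is what later distinguishes a fracture outcome, a competing death, and ordinary censoring.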

Outcomes

Two outcomes were modelled based on those predicted by the QFracture tool: major osteoporotic fracture and hip fracture.12 Major osteoporotic fracture was defined as hip, vertebral, wrist, or proximal humeral fractures determined from codes in the general practice electronic health record (with Read codes, which have been shown to have high positive predictive value for hip fracture),18 Hospital Episode Statistics discharge diagnoses (ICD-10 (international classification of diseases, 10th revision) codes recorded in the primary position as the reason for admission to hospital), and Office of National Statistics death registration (ICD-10 codes) (online supplemental tables S1 and S2). Major osteoporotic fracture recorded before entry into the study was used as a predictor variable. Major osteoporotic fracture or hip fracture recorded after the index date was used as the outcome variable, with the date of the event taken as the first record of fracture.

Prediction model

We used the published QFracture-2012 risk model (under GNU Lesser General Public Licence, version 3) and calculated the QFracture-2012 predicted 10 year risk of a major osteoporotic fracture and hip fracture for all patients in our cohort. Online supplemental tables S3-S5 describe the derived codelists for each morbidity predictor. The key difference from the QFracture-2012 derivation was that for QFracture-2012, body mass index, alcohol consumption, and smoking status, recorded after the date of entry into the study but before any fracture outcome, could be used in the prediction, whereas in this analysis we restricted predictor values to those recorded before entry into the study only, to avoid the use of future information in the prediction.

Comorbidity

For each patient at baseline, we calculated the Charlson comorbidity index based on primary care Read codes.19 The Charlson comorbidity index was not used in the prediction, but was used to classify the analysis of discrimination and calibration by level of comorbidity (Charlson comorbidity index score 0, 1, 2, and ≥3 groups).

Missing data

Online supplemental table S6 details the extent and management of missing data. In common with the QFracture-2012 derivation, those with missing data for ethnic group were assumed to be white. For missing data on body mass index, smoking status, and alcohol consumption, multivariate imputation by chained equations20 was used to generate five imputed datasets, which were combined by using Rubin’s rules.21 Morbidities and prescribing used for prediction were assumed to be absent if there were no relevant data recorded for them (the same as for the QFracture-2012 derivation), reflecting that recording of morbidity and prescribing data in the Clinical Practice Research Datalink is generally good.22 23
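For readers unfamiliar with Rubin's rules, a minimal sketch of pooling one estimate across imputed datasets follows. The function name and the example numbers are hypothetical, and the sketch does not show on which scale the study pooled each summary statistic, which the text does not specify.

```python
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool per-imputation estimates and their variances with Rubin's rules.

    estimates: one point estimate per imputed dataset
    variances: the squared standard error of each estimate
    Returns (pooled estimate, total variance).
    """
    m = len(estimates)
    pooled = mean(estimates)                 # average of the m estimates
    within = mean(variances)                 # average within-imputation variance
    # Between-imputation variance (sample variance, divisor m - 1).
    between = variance(estimates) if m > 1 else 0.0
    total = within + (1 + 1 / m) * between   # Rubin's total variance
    return pooled, total


# Hypothetical estimates from five imputed datasets (not study results).
pooled, total = pool_rubin([0.80, 0.82, 0.81, 0.79, 0.83], [0.0001] * 5)
```

The (1 + 1/m) factor inflates the between-imputation variance to reflect that only a finite number of imputations (five here) was drawn.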

Statistical analysis

Based on the recommendations of reporting guidance,24 the initial analysis compared the study population and fracture rates in this study with the previously published QFracture derivation and validation cohorts (although variable reporting across previously published papers means that the comparison population varies depending on the data available).8 9 12 The performance of the QFracture-2012 risk score was assessed by examining discrimination and calibration. We used Harrell’s C statistic, truncated to only include pairs where the earliest survival time is no later than 10 years after entry (a C statistic of 0.5 indicates discrimination that is no better than chance, whereas a C statistic of 1 indicates perfect discrimination). Two other measures of discrimination were calculated, the D statistic of Royston and Sauerbrei (which is based on the separation in event free survival between patients with predicted risk scores above and below the median; higher values indicate greater discrimination),25 and a related R2 statistic estimating explained variation for censored survival data.26
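The idea of a truncated Harrell's C statistic can be sketched as follows. This is a simple quadratic-time illustration with hypothetical names, not the implementation used in the study; it assumes that only fractures count as events (everything else is treated as censored), that pairs with tied times are skipped, and that tied predicted risks count as half concordant.

```python
def truncated_c(times, events, risks, tau=10.0):
    """Harrell's C restricted to pairs whose earlier time is an event <= tau.

    times:  follow-up time in years for each person
    events: 1 if follow-up ended with a fracture, 0 if censored
    risks:  predicted risk for each person (higher means higher risk)
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        # Person i anchors a pair only if they fractured within the horizon.
        if not events[i] or times[i] > tau:
            continue
        for j in range(n):
            if times[j] > times[i]:  # j was followed event free for longer
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy data: the person with the earliest fracture has the top risk.
c_toy = truncated_c([2, 12, 15], [1, 1, 0], [0.9, 0.1, 0.5], tau=10.0)
```

With `tau=10.0` the fracture at 12 years cannot anchor a pair, so only the two pairs formed with the person who fractured at 2 years are compared.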

Calibration was assessed for 10 equally sized groups (deciles) of participants ranked by predicted risk, by plotting observed proportions versus predicted probabilities. We estimated observed risk for censored data in two ways: with the standard Kaplan-Meier estimator (which is consistent with the assumptions made in the QFracture-2012 derivation in that it does not account for competing risks); and the Aalen-Johansen estimator (an extension to allow for competing events, in this case, death from causes other than fractures).27 All models were fitted in R-4.0.0 and Stata 11.2. Plots were generated separately for sex, for all patients, and for subgroups for age and Charlson comorbidity index, based on summary statistics pooled across the imputed datasets.
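The difference between the two estimators of observed risk can be illustrated with a short Python sketch. The function name, status coding, and toy data are hypothetical; the study itself used R and Stata. The Kaplan-Meier version treats death from other causes as censoring, whereas the Aalen-Johansen version weights each fracture by the probability of still being alive and fracture free.

```python
def observed_risk(times, status, horizon):
    """Return (1 - Kaplan-Meier survival, Aalen-Johansen incidence) at horizon.

    status: 0 = censored, 1 = fracture, 2 = death from another cause
    """
    data = sorted(zip(times, status))
    at_risk = len(data)
    km_surv = 1.0   # fracture free survival, deaths treated as censoring
    all_surv = 1.0  # probability of being alive and fracture free
    cif = 0.0       # cumulative incidence of fracture (Aalen-Johansen)
    i = 0
    while i < len(data) and data[i][0] <= horizon:
        t = data[i][0]
        tied = [s for (u, s) in data[i:] if u == t]  # everyone exiting at t
        fractures = tied.count(1)
        deaths = tied.count(2)
        km_surv *= 1 - fractures / at_risk
        cif += all_surv * fractures / at_risk
        all_surv *= 1 - (fractures + deaths) / at_risk
        at_risk -= len(tied)
        i += len(tied)
    return 1 - km_surv, cif


# Hypothetical toy cohort of six people followed for up to 10 years.
km_risk, aj_risk = observed_risk([1, 2, 3, 4, 5, 6], [2, 1, 2, 1, 0, 1], 10)
```

In this toy cohort the Kaplan-Meier estimate of observed risk exceeds the Aalen-Johansen estimate because two people died before they could fracture; when nobody dies, the two estimates coincide. In a calibration analysis of the kind described above, such estimates would be computed within each decile of predicted risk and compared with the mean predicted risk in that decile.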

Patient and public involvement

Public contributors were involved in the design and conduct of the study as members of the study steering group.

Results

We included 2 747 409 women and 2 684 730 men in the analysis, with mean ages of 50.7 and 48.5 years, respectively (table 1). The study population was similar to the previously published QFracture-2012 internal validation population in terms of mean age, sex, body mass index, and ethnic group but we found a higher recorded prevalence of previous major osteoporotic fracture, residence in a nursing home or care home, and many long term conditions, including type 2 diabetes, history of falls, dementia, cancer, asthma or chronic obstructive pulmonary disease, chronic renal disease, malabsorption, and epilepsy or prescribed anticonvulsant drugs. For the population evaluated for major osteoporotic fracture, median follow-up was 5.7 (interquartile range 2.2-10.5) years in women and 5.6 (2.2-10.4) years in men. For hip fracture, median follow-up was 5.9 (2.2-10.6) years in women and 5.7 (2.2-10.4) years in men.

Table 1 | Baseline data in our external validation cohort and in previously published QFracture-2012 internal validation cohort12

The crude incidence of both major osteoporotic fracture and hip fracture was higher in women than in men (major osteoporotic fracture 6.12 per 1000 person years in women v 2.26 in men; hip fracture 2.30 v 0.88, respectively) (online supplemental tables S7 and S8). We found a marked increase with age for both outcomes, and differences between the sexes were larger in older ages (eg, in women aged 30-34 years, major osteoporotic fracture was 0.95 per 1000 person years, increasing to 33.53 for ages 80-99 years; in men aged 30-34 years, 1.02 per 1000 person years increasing to 15.42 for ages 80-99 years) (online supplemental tables S9 and S10). For the whole population, the incidence of major osteoporotic fracture in this study was 4.22 per 1000 person years of follow-up compared with 2.45 per 1000 person years in the previously published updated QFracture-2012 internal validation cohort,12 and 2.89 per 1000 person years in a previously published Clinical Practice Research Datalink validation cohort.28 For hip fracture, overall incidence was 1.60 per 1000 person years compared with 1.32 in the same previously published Clinical Practice Research Datalink validation cohort.28 Two thirds (64 163, 67.1%) of major osteoporotic fractures in women and half (17 276, 50.3%) in men were in people aged ≥65 years. For hip fracture, 32 339 (88.8%) fractures in women and 10 167 (76.0%) in men were in people aged ≥65 years (online supplemental tables S7 and S8).

Although the incidence of major osteoporotic fracture and hip fracture increased with age in men and women, the incidence of mortality from causes other than fractures increased more steeply with age (particularly in men). The incidence of death from causes other than fractures was similar to the incidence of major osteoporotic fracture in young people, but increased greatly with age, becoming four times as common as major osteoporotic fracture in women aged 90-99 years and almost 10 times as common in men aged 90-99 years (figure 1, online supplemental tables S15 and S16). The incidence of death from causes other than fractures was higher than for hip fracture at all ages.

Figure 1

Incidence of major osteoporotic fracture, hip fracture, and death from causes other than fractures (non-fracture death) in women and men

In the whole population, QFracture-2012 discrimination for major osteoporotic fracture was excellent in women (C=0.813) and good in men (C=0.738), and for hip fracture was excellent in both sexes (women C=0.918, men C=0.888) (table 2). Grouped by age, however, for both outcomes discrimination was poor to moderate in older adults where prediction of fracture risk is recommended1 (eg, for major osteoporotic fracture, ages 65-74 years, C=0.616 for women and 0.660 for men; ages 85-99 years, C=0.576 for women and C=0.624 for men) (table 2). Grouped by Charlson comorbidity index, discrimination was good for major osteoporotic fracture and good to excellent for hip fracture in all groups.

Table 2 | Discrimination and model fit for major osteoporotic fracture and hip fracture*

Figures 2–5 and online supplemental figures S2–S9 show the calibration plots. When observed rates for major osteoporotic fracture were estimated without accounting for competing risk (figures 2 and 3 and online supplemental figures S2–S5), in the whole population for both men and women, we found under-prediction of the risk of fracture at all levels of predicted risk. Grouped by age, under-prediction in all age groups and at all levels of predicted risk was found except in the highest predicted risk decile in people aged 80-99 years where over-prediction was evident. Similar patterns were seen when grouped by Charlson comorbidity index, with under-prediction in all groups except those with the most multimorbidities at the highest levels of predicted risk.

Figure 2

Calibration for major osteoporotic fracture in women without accounting for competing risks and accounting for competing risks. For each pair, observed risk curve above predicted risk curve indicates under-prediction; observed risk curve below predicted risk curve indicates over-prediction. Separate plots for age and Charlson comorbidity index are shown in supplementary figures S2 and S4, respectively. *Observed risk based on Kaplan-Meier estimator, which does not account for competing mortality risk. †Observed risk based on Aalen-Johansen estimator, which accounts for competing mortality risk

Figure 3

Calibration for major osteoporotic fracture in men without accounting for competing risks and accounting for competing risks. For each pair, observed risk curve above predicted risk curve indicates under-prediction; observed risk curve below predicted risk curve indicates over-prediction. Separate plots for age and Charlson comorbidity index are shown in supplementary figures S3 and S5, respectively. *Observed risk based on Kaplan-Meier estimator, which does not account for competing mortality risk. †Observed risk based on Aalen-Johansen estimator, which accounts for competing mortality risk

Figure 4

Calibration for hip fracture in women without accounting for competing risks and accounting for competing risks. For each pair, observed risk curve above predicted risk curve indicates under-prediction; observed risk curve below predicted risk curve indicates over-prediction. Separate plots for age and Charlson comorbidity index are shown in supplementary figures S6 and S8, respectively. *Observed risk based on Kaplan-Meier estimator, which does not account for competing mortality risk. †Observed risk based on Aalen-Johansen estimator, which accounts for competing mortality risk

Figure 5

Calibration for hip fracture in men without accounting for competing risks and accounting for competing risks. For each pair, observed risk curve above predicted risk curve indicates under-prediction; observed risk curve below predicted risk curve indicates over-prediction. Separate plots for age and Charlson comorbidity index are shown in supplementary figures S7 and S9, respectively. *Observed risk based on Kaplan-Meier estimator, which does not account for competing mortality risk. †Observed risk based on Aalen-Johansen estimator, which accounts for competing mortality risk

When observed major osteoporotic fracture rates were estimated accounting for competing risk (figures 2 and 3 and online supplemental figures S2–S5), in the whole population, we found less under-prediction with some over-prediction in women at the highest predicted risk. Grouped by age, under-prediction was found in younger age groups but to a lesser extent than without accounting for competing risk. We found considerable over-prediction in women aged 85-99 years at higher risk and in most men aged 85-99 years, and over-prediction in men and women aged 75-84 years at the highest levels of predicted risk. In these older age groups, observed risk of major osteoporotic fracture was either flat or decreased as the decile of predicted risk increased. Similar patterns were seen when grouped by Charlson comorbidity index, with over-prediction of the risk of fracture in those with the most multimorbidities (Charlson comorbidity index ≥3) and in people with a Charlson comorbidity index of 2 at the highest level of predicted risk.

For hip fracture, when observed rates of hip fracture were estimated without accounting for competing risk (figures 4 and 5 and online supplemental figures S6–S9), in the whole population, we found greater under-prediction of the risk of fracture than for major osteoporotic fracture at all levels of predicted risk for both women and men. Grouped by age, we found under-prediction in all age groups and at all levels of predicted risk except for the highest two predicted risk deciles in women aged 80-99 years where large over-prediction of risk was found. Similar over-prediction was found in the highest risk decile for men aged 80-99 years. When grouped by Charlson comorbidity index, similar patterns were seen, with under-prediction in all groups except for those with the most multimorbidities at the highest levels of predicted risk.

When observed hip fracture rates were estimated accounting for competing risk (figures 4 and 5 and online supplemental figures S6–S9), in the whole population, we found less under-prediction with some over-prediction in women at the highest predicted risk. Grouped by age, under-prediction was less in younger age groups, but over-prediction was considerable in both sexes aged 85-99 years at higher predicted risk, as well as in both sexes aged 75-84 years at the highest levels of predicted risk. Similar to major osteoporotic fracture, in these two older age groups, observed hip fracture rates were flat or declined across all 10 deciles of increasing predicted risk. Similar patterns were seen when grouped by Charlson comorbidity index, with over-prediction of fracture risk in those with the most multimorbidities (Charlson comorbidity index ≥3) and in people with a Charlson comorbidity index of 2 at the highest level of predicted risk.

Discussion

Summary of findings

In this external validation of the QFracture-2012 risk prediction tool, we found very good to excellent discrimination in the whole population aged 30-99 years, but poor to good discrimination in important subgroups, including older patients and those with higher levels of multimorbidity. In contrast, calibration was poor. When evaluated without accounting for competing risk, QFracture-2012 consistently under-predicted both major osteoporotic fracture and hip fracture. The most likely explanation for this finding is that our method of determining the number of fractures in this study was more complete because fractures recorded during admission to hospital were included as well as those recorded in general practice electronic health records and death registrations. In this study, in women, 14 802 (15.5%) major osteoporotic fractures and 6911 (19.0%) hip fractures were recorded only in hospital admission data, compared with 6305 (18.4%) major osteoporotic fractures and 2515 (18.8%) hip fractures in men. Restricting determination of fractures to general practice and mortality data (to match the previously published internal12 and external validation studies9 28) largely explains the higher observed incidence of hip fracture in this study, but only partially explains the observed incidence of major osteoporotic fracture (online supplemental tables S11–S14, online supplemental figure S1). Also, the earliest study entry year in our study was 2004 compared with 1998 in the QFracture-2012 derivation, and recording of fractures in general practice data is likely to have improved over time.

When evaluated against observed fractures, estimated accounting for competing risk of mortality, under-prediction in general declined (because failing to account for competing risk causes over-prediction) but we found large over-prediction at higher levels of predicted risk in older people and in people with more complex multimorbidities. In people aged 85-99 years and in those with a Charlson comorbidity index of ≥3, observed risk was flat or even declining across deciles of increasing predicted risk. QFracture-2012 under-predicted in all patients because derivation was based on incomplete determination of fractures, and it over-predicted in people with a high competing risk of death (mainly elderly people and those with multiple comorbidities).

Strengths and limitations

The strengths of the study include the use of linked population data, the conduct of the study in accordance with methodology recommendations,24 29 the publication of all codelists in the supplementary material to allow our findings to be replicated, the consideration of performance in important subgroups, and the accounting for competing risks of mortality. The high prevalence of missing data for some predictors was an important limitation, and a problem common to all studies that use routine data. Considering that QFracture used information recorded after participant study entry for some variables whereas we did not, more missing data for body mass index and smoking existed in this study compared with the QFracture-2012 internal derivation, although missingness (ie, the extent of missing data) for alcohol status and ethnic group was similar (online supplemental table S6). We used multiple imputation based on the assumption that data are missing at random, which is likely reasonable for the imputed variables in this context. Also, censoring is common with a median follow-up of 5-6 years in this study, similar to others that have used these types of data,9 15 including the QFracture-2012 derivation and validation studies.8 9 12 Although we explicitly accounted for censoring because of death in this study, our analysis, similar to others that have used these types of data, still assumes that people who deregister from a Clinical Practice Research Datalink practice have the same risk of fracture as those who do not. This assumption is likely strong in older people where deregistration because they moved into care housing, or to a nursing home or care home, might be associated with a higher risk of fracture. Studies that can continue to follow up participants even if they move practice would allow this assumption to be examined, which is increasingly possible with the expansion of data linkage driven by the covid-19 pandemic.

A further limitation of our study was that humeral fractures in general practice data are often recorded without specifying whether the fracture was proximal or more distal. Therefore, we defined humeral fractures as proximal if the site was not specified, which might have caused some misclassifications (some false positives). In registry data, 80% of all humeral fractures are proximal,30 however, and we judged that only including humeral fractures specified as proximal (as QFracture does) would have caused greater misclassification (many false negatives). We also included a wider range of wrist fractures (including distal ulnar fractures) in analysis than QFracture derivation (which only included radial fractures), because most ulnar fractures in registry data are not high-energy.30 Some of the observed QFracture-2012 under-prediction may therefore be explained by differences in how fractures are defined. All choices of clinical codes involve judgement about the likely balance of false positives and false negatives, and readers can explicitly examine our choices in our codelists documented in the supplementary material. Like previous studies, we also could not validate our fractures against the gold standard of manually searching medical records, but our observed rates for hip fracture were similar to registry data.30 Finally, the QFracture prediction tool does not include data on bone mineral density because these data are not routinely available, and also one of the guideline recommended uses of the tool is to identify those who would benefit from measurement of bone mineral density. Including bone mineral density in the prediction would be expected to improve predictive performance, but investigating this effect was outside the scope of our analysis.

Comparison with other literature

The first version of QFracture8 was independently externally validated in a similar dataset to ours (The Health Improvement Network) and found to have excellent discrimination and calibration in the whole population.9 The updated QFracture-2012 (evaluated in this study)12 was externally validated in the Clinical Practice Research Datalink by the QFracture derivation team who found excellent discrimination and calibration in the whole population.28 In this study, discrimination in the whole population for major osteoporotic fracture and hip fracture was similarly excellent. Given the large differences in the incidence of fractures across the age ranges studied, however, any prediction tool where the whole population includes people aged 30-99 years is likely to have excellent discrimination.31 32 When grouped by age, discrimination varied from poor to moderate (as expected when the most powerful predictor of fracture is partially removed by examining age subgroups).31 32 Unlike these previously published validations in UK data,8 9 12 calibration was poor.

This study differs from previously published validations of the original and QFracture-2012 models in two ways. Firstly, we included fractures recorded during hospital admission (as well as those recorded in primary care electronic health records and in mortality data), and the primary care data were more recent, so recording of fractures in the general practitioner record might also have improved. Better determination of fractures would be expected to result in under-prediction by QFracture-2012, as observed in this study. Consistent with this finding, an Israeli external validation based on community and hospital data for fractures also observed considerable under-prediction by QFracture.7 Because the lists of Read codes used in QFracture-2012 are unpublished, however, we cannot examine the extent to which differences relate to the choice of which fracture Read codes to include. Secondly, we examined calibration against observed outcomes estimated in the same way as previous external validations (with the Kaplan-Meier estimator, which does not account for competing mortality risk) and also accounting for competing risk (with the Aalen-Johansen estimator). As expected,14 16 31 accounting for competing risks produced large changes in observed risk in older people and those with more multimorbidities, in whom death from causes other than fractures is more common. This finding is consistent with over-prediction by QFracture-2012 in people with a high competing mortality risk (despite under-prediction in all patients because of incomplete determination of fracture in the QFracture-2012 derivation).
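The difference between the two estimators of observed risk can be sketched on a small invented cohort. The code below is a simplified illustration, not the analysis code used in this study; the function names, event coding (0=censored, 1=fracture, 2=death from another cause), and data are assumptions made for the example.

```python
def kaplan_meier_risk(times, events, horizon):
    """1 minus Kaplan-Meier survival for fracture (code 1), treating
    deaths from other causes (code 2) as censored observations."""
    surv = 1.0
    for t in sorted(set(times)):
        if t > horizon:
            break
        at_risk = sum(1 for u in times if u >= t)
        fractures = sum(1 for u, e in zip(times, events) if u == t and e == 1)
        surv *= 1.0 - fractures / at_risk
    return 1.0 - surv


def aalen_johansen_risk(times, events, horizon):
    """Cumulative incidence of fracture (code 1), with death from other
    causes (code 2) handled as a competing event rather than censoring."""
    event_free = 1.0  # probability of no fracture and no death so far
    cum_inc = 0.0
    for t in sorted(set(times)):
        if t > horizon:
            break
        at_risk = sum(1 for u in times if u >= t)
        fractures = sum(1 for u, e in zip(times, events) if u == t and e == 1)
        deaths = sum(1 for u, e in zip(times, events) if u == t and e == 2)
        cum_inc += event_free * fractures / at_risk
        event_free *= 1.0 - (fractures + deaths) / at_risk
    return cum_inc


# Invented cohort of 10 people (assumption): follow-up in years and event type.
times = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
events = [2, 1, 2, 2, 1, 2, 0, 2, 1, 0]

km = kaplan_meier_risk(times, events, horizon=10)
aj = aalen_johansen_risk(times, events, horizon=10)
print(f"Kaplan-Meier 10 year risk:   {km:.3f}")
print(f"Aalen-Johansen 10 year risk: {aj:.3f}")
```

Because half of this invented cohort dies from other causes, the Kaplan-Meier estimate of observed fracture risk is much higher than the Aalen-Johansen estimate. When no competing deaths occur, the two estimators give the same result.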

Implications for policy, practice, and research

QFracture and similar clinical prediction tools28 including a wide age range typically have excellent discrimination, but that likely reflects that age is a powerful predictor of most outcomes.31 32 As we found in this study, excellent discrimination in the whole population is compatible with poor discrimination and poor calibration in the subgroups most at risk of the outcome (older people and those with multimorbidity). Examination of discrimination and calibration grouped by age (and other important predictors where applicable) provides a better indication of predictive performance from a clinical perspective. Future research could examine whether fracture prediction models that are more tailored to different age groups (including premenopausal and postmenopausal groups in women) provide better prediction (eg, osteoporosis might dominate the risk of fracture in younger people, whereas the risk of falls might be important in older people).

QFracture-2012 has two major problems. Firstly, this study and a previous external validation7 in Israel found that it under-predicts risk in general, most likely because its derivation is based on incomplete determination of fractures. Under-prediction is likely at least partly addressed in the updated QFracture-2016 prediction model, which also ascertains fractures using both general practice and hospital admission data. QFracture-2016 is the version currently used by the UK's health service, and its algorithm was published in February 2023 (after this study was completed) on the QFracture website (https://qfracture.org/src.php). The performance of QFracture-2016 has not been externally validated in the whole population, but has been examined in people with chronic obstructive pulmonary disease where the area under the receiver operating characteristic curve was moderate to good for hip fracture (0.761) and poor for major osteoporotic fracture (0.614).33 Hip fracture rates observed in QFracture-2016 derivation were very similar to rates in this study (for women, 2.31/1000 person years of follow-up with QFracture-2016 v 2.30/1000 in this study; for men, 0.86/1000 v 0.88/1000).34 However, observed major osteoporotic fracture rates in QFracture-2016 derivation were still somewhat lower than in this study (for women, 5.27/1000 person years of follow-up with QFracture-2016 v 6.12/1000 in this study; for men, 1.92/1000 v 2.26/1000). This difference at least partly reflects that QFracture derivation includes fractures recorded since 1998 whereas this study only includes fractures recorded since 2004, and there is a lower incidence of non-hip fractures in 1998-2003 in QFracture derivation than in the later period (for women, 4.35/1000 person years in 1998-2003 v 5.69/1000 person years in 2004-15; for men, 1.40/1000 v 2.16/1000).34
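The size of these differences can be expressed as simple rate ratios. The sketch below uses only the rates quoted in the paragraph above; the variable names are illustrative, and the ratios are crude comparisons that take no account of differences in age structure or follow-up between the two cohorts.

```python
# Rates per 1000 person years: (QFracture-2016 derivation, this study),
# as quoted in the text above.
rates = {
    ("hip fracture", "women"): (2.31, 2.30),
    ("hip fracture", "men"): (0.86, 0.88),
    ("major osteoporotic fracture", "women"): (5.27, 6.12),
    ("major osteoporotic fracture", "men"): (1.92, 2.26),
}

# Crude ratio of the derivation rate to the rate observed in this study.
ratios = {key: derivation / study for key, (derivation, study) in rates.items()}

for (outcome, sex), ratio in ratios.items():
    print(f"{outcome}, {sex}: rate ratio {ratio:.3f}")
```

On these figures, hip fracture rates in the QFracture-2016 derivation are within about 2% of those in this study, whereas major osteoporotic fracture rates are about 14-15% lower.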

Secondly, QFracture-2012 does not account for competing mortality risks, which results in considerable over-prediction in people at high risk of death from other causes, notably older people and those with high levels of multimorbidity. Similar over-prediction has been observed in cardiovascular risk prediction models15 35 36 but the effect is greater for prediction of the risk of fracture because death related to fractures is a smaller proportion of total mortality than death related to cardiovascular disease. This problem could be resolved by derivation of new models that explicitly account for competing risk.

The FRAX fracture risk prediction tool is also recommended by NICE and accounts for competing risk of mortality, but systematic external validation is not possible because the prediction algorithm is not publicly available.6 10 Dagan et al reported an external validation of FRAX from primary and secondary care Israeli data, and found similar levels of under-prediction to QFracture-2012 (although their analysis did not account for competing risk of mortality).7 FRAX risk prediction was only approximately based on the number of clinical risk factors, however, rather than based on the actual FRAX risk equation because the FRAX prediction algorithm has never been made publicly available and therefore cannot be replicated. How FRAX accounts for competing risk of mortality and its performance in external validation is uncertain. Publication of the full algorithm would allow direct and fair comparison with other tools to identify the optimal tool for different contexts.7

Bisphosphonates are cost effective at relatively low thresholds of predicted risk1 but misclassification might occur with poor calibration. Consideration of the expected benefit for the individual is recommended in decision making, but aids to patient decision making usually rely on reasonably accurate prediction of individual risk.37 From this perspective, determining risk with QFracture-2012 will under-predict the risk of fracture in younger people and in those with fewer multimorbidities (and will therefore underestimate the expected benefit of treatment) and will over-predict the risk of fracture in older people and those with high levels of multimorbidity (and will therefore overestimate the expected benefit of treatment). The updated QFracture-2016 tool likely corrects under-prediction of hip fracture by better ascertainment of fractures using hospital data as well as GP data (as used in this study), but could still under-predict major osteoporotic fracture because of lower recorded rates of such fractures in the late 1990s and early 2000s.

Prediction in elderly people requires specific attention, building on small existing studies of prediction in this population.38 Updating the FRAX model, which accounts for competing mortality,39 is planned, but publication of the prediction algorithm will be critical in establishing its external validity.24

Conclusion

This study found that QFracture-2012 under-predicts fracture risk in general because its derivation is based on incomplete determination of fractures, and considerably over-predicts in groups with a high risk of death from other causes because it does not account for competing mortality risk. Competing mortality risk is an important problem in the context of fracture prediction in older people because non-fracture death is much more common than the fracture outcomes being predicted. External validation of the QFracture-2016 prediction tool now used by the UK's health service is needed, including examining the impact of competing mortality.

Ethics approval

The study was approved by the Clinical Practice Research Datalink Independent Scientific Advisory Committee, protocol 16_248.