Discussion
Our study found large variation in the prevalence of multimorbidity depending on the timeframe used to define long term conditions in a nationally representative cohort of patients in the general practice electronic health record in England, ranging from 41.4% where three codes were required within a 12 month period to 73.9% where a single code sufficed. Using active problem codes resulted in the lowest prevalence of 35.2%. A large variation was found despite using the same set of conditions and applying the alternative definitions for only 41 (19%) of the 212 conditions included. Prevalence of multimorbidity by sociodemographic factors varied substantially, particularly for age. Our selection of alternative timeframes represents a range of approaches, rather than a comprehensive assessment of all possibilities. Although we were unable to assess from the electronic health record whether people are appropriately reclassified using alternative timeframes, our results highlight the large disparities in estimates of multimorbidity prevalence, and that choice of timeframe will affect estimates for some people more than others.
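The way alternative timeframes can reclassify a patient can be illustrated with a minimal sketch. It is not the code used in the study: the record format, the function names, and the choice of which conditions need repeated codes are hypothetical, and a 12 month period is approximated as 365 days. Multimorbidity is taken as two or more conditions.

```python
from datetime import date, timedelta

def conditions_single_code(records):
    """A condition counts if it has at least one code anywhere in the record."""
    return {condition for condition, _ in records}

def conditions_repeated_codes(records, needs_repeat, n_codes=3, window_days=365):
    """Conditions in `needs_repeat` count only if `n_codes` codes fall within
    one window of `window_days`; every other condition needs a single code."""
    dates_by_condition = {}
    for condition, coded_on in records:
        dates_by_condition.setdefault(condition, []).append(coded_on)

    present = set()
    for condition, dates in dates_by_condition.items():
        if condition not in needs_repeat:
            present.add(condition)
            continue
        dates.sort()
        # Check each run of n_codes consecutive dates against the window.
        for i in range(len(dates) - n_codes + 1):
            if dates[i + n_codes - 1] - dates[i] <= timedelta(days=window_days):
                present.add(condition)
                break
    return present

def is_multimorbid(conditions):
    """Multimorbidity: two or more long term conditions."""
    return len(conditions) >= 2

# Invented example patient: one code each for two conditions.
records = [
    ("hypertension", date(2015, 3, 1)),
    ("dermatitis", date(2018, 6, 1)),
]
single = conditions_single_code(records)
strict = conditions_repeated_codes(records, needs_repeat={"dermatitis"})
print(is_multimorbid(single), is_multimorbid(strict))
```

In this invented example the patient is multimorbid under the single code definition but not when dermatitis requires three codes within 12 months, which is the kind of reclassification described above.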
Prevalence of multimorbidity reported in the existing literature ranges widely, with differences explained in part by the number of long term conditions included.5 We used a similar number of conditions to those of Head and colleagues who reported a prevalence of 52.8% in England in 2019.3 Their reported prevalence of multimorbidity was much higher than those from two other UK based studies of 23.2% and 27.2%, which used a smaller set of long term conditions (40 and 36, respectively, compared with Head et al's 211).9 16 Our most comparable definition (three codes within 12 months) resulted in a lower prevalence of multimorbidity (41.4%) than that of Head and colleagues, which may relate to our recategorisation of codes from the CALIBER study, and our different set of conditions that required multiple codes. Using a single code definition, the detected prevalence in our study was substantially higher (73.9%), likely reflecting inclusion of acute or inactive conditions. Our study is novel in identifying that the timeframe for defining long term conditions also has a substantial effect on prevalence of multimorbidity, comparable to the effect of differences in the number of long term conditions included.
Our results also show that differences in prevalence estimates according to timeframe do not have uniform effects across a population. Large differences in the prevalence of multimorbidity and in the number of long term conditions were found between demographic groups. Older age, South Asian ethnicity, and greater deprivation were associated with a lower risk of being reclassified as not multimorbid (figure 4) across all alternative definitions, but with corresponding larger reductions in the total count of diseases (online supplemental figure A3). Various sociodemographic factors are known to affect incidence and prevalence of multimorbidity, with previous studies highlighting that risk is higher in people living in areas of greater socioeconomic deprivation and people of black ethnicity.3 17 Although we found that those living in less deprived areas had a higher prevalence of multimorbidity when using a single code definition (table 1), this group were more likely to be reclassified as not multimorbid than people in more deprived groups when requiring multiple codes (figure 4). Our findings suggest differences in the composition of long term conditions contributing to multimorbidity in younger versus older age, and in people living in less versus more deprived areas. Groups that are more likely to be reclassified are more likely to have conditions where a single code could indicate an acute condition, for example, dermatitis and enthesopathy and synovial disorders, which were large contributors to overall multimorbidity burden (online supplemental figure A1). Other explanations for differences between groups might include differential under-counting of disease burden or differences in health seeking behaviour and access to healthcare.18 19 Variation in how disease codes are entered between general practices, and between clinicians within the same general practice, may also impact differently on patient groups.20 21
We hypothesised that using only conditions designated by clinicians as active problems in the healthcare record could be an effective route to identifying active long term conditions. This method yielded the lowest prevalence of multimorbidity of 35.2% overall, and of 87.8% in those aged 80 years and older, but with substantial variation between conditions. In primary care in England, the management of some long term conditions is incentivised by the quality and outcomes framework, which started in 2004.22 We found that conditions such as hypertension, type 2 diabetes, and chronic obstructive pulmonary disease, all included in the framework, were much more similar in prevalence when comparing the single code and problems definitions. This suggests there may be coding bias towards conditions present in the framework when using problem codes, and so we do not recommend this method for studies of multimorbidity including a broad range of conditions.
Strengths and limitations
A strength of our study is the use of a large sample of adults registered to primary care in England, with previous studies of the Clinical Practice Research Datalink Aurum data finding it to be representative of the national population.11 We adopted a broad set of disease codes, many of which have particular relevance to primary care, and applied different definitions of long term condition timeframe using the same set of conditions to the same patient cohort. To an extent, these definitions are arbitrary, and our analysis presented a range of approaches used previously in the literature, rather than attempting to analyse all possible definitions. For example, a condition lasting more than six months is sometimes used to define a long term condition,6 but we did not include this definition because estimates would lie between those for our three and 12 month definitions.
A limitation of using routinely collected healthcare data is that some conditions may not be recorded, either because a person does not present with a condition or because when presenting, the condition is not coded by a clinician. Missing codes are unlikely to be random with respect to either patients or diseases. Although a previous systematic review found good agreement in diseases recorded in Clinical Practice Research Datalink with other sources,23 comparison of cancer diagnoses in the Clinical Practice Research Datalink with cancer registry data found that 9–26% of cancers, depending on the cancer type, were missing from the Clinical Practice Research Datalink.24 The financial incentives offered by the quality and outcomes framework have improved data collection and coding in primary care22 and may therefore inflate the prevalence of long term conditions requiring multiple codes for conditions included in the framework relative to other conditions.
The code lists used for raised total cholesterol, raised low density lipoprotein cholesterol, low high density lipoprotein cholesterol, and raised triglycerides available from the CALIBER study include test results (rather than diagnostic codes), which were rarely coded as problems, partially explaining the substantially lower prevalence of multimorbidity overall. Studies using problem codes alone would therefore need to include diagnostic codes, but this was not feasible for comparison in our study as the granularity required in low density lipoprotein cholesterol and high density lipoprotein cholesterol measurements cannot be accounted for using diagnostic codes alone. A further issue of using problem codes is in defining active problems. In Clinical Practice Research Datalink Aurum, no date is recorded at which a problem is changed from active to inactive. Our data extraction occurred in May 2022, and so some codes marked as inactive were likely active at the study start date, meaning our findings may represent an underestimate. However, given our focus on chronic conditions, the number of conditions to have resolved is unlikely to be large.
We used a large set of chronic conditions, which will have unequal burden on clinical outcomes, healthcare use, and quality of life. Many may not be included in other multimorbidity measures, with a recent Delphi study identifying a core set of 24 conditions to always include and 35 to usually include and with no conditions identified for exclusion.25 A smaller and standardised set of conditions can aid comparability and reproducibility but has drawbacks. Firstly, a smaller set limits the scope to detect novel associations between less common conditions. Secondly, choices over conditions are subjective and those that are less frequent can still have a large burden on individuals.
The overlap between some of the long term conditions included here may lead to double counting and over-estimation of multimorbidity, for example, by counting myocardial infarction and angina separately rather than combining them as ischaemic heart disease. However, when investigating the cause of disease, more granular categories may be a benefit because a person with a myocardial infarction who subsequently develops angina may have a different trajectory and opportunities for intervention and prevention to a person with angina who subsequently develops a myocardial infarction, despite a common pathophysiology.
Combining diagnostic codes with relevant medications or treatments may help to distinguish active versus inactive conditions; for example, use of a proton pump inhibitor in gastritis. Similarly, the quality and outcomes framework excludes patients from the asthma register if no asthma related drugs were prescribed in the past 12 months.26 However, a difficulty with this approach for some diseases is that drugs can have multiple indications; for example, proton pump inhibitors being co-prescribed with non-steroidal anti-inflammatory medications, which may be more common in those with a history of gastro-intestinal problems irrespective of active symptoms.27
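A minimal sketch of this kind of rule, modelled loosely on the asthma register example, is given here. The function name, its arguments, and the 365 day lookback are illustrative assumptions rather than an implementation of the quality and outcomes framework business rules.

```python
from datetime import date, timedelta

def is_active(diagnosis_dates, prescription_dates, index_date, lookback_days=365):
    """Treat a condition as active if it was diagnosed on or before the index
    date and a related drug was prescribed within the lookback window that
    ends at the index date."""
    diagnosed = any(d <= index_date for d in diagnosis_dates)
    window_start = index_date - timedelta(days=lookback_days)
    recently_treated = any(
        window_start <= p <= index_date for p in prescription_dates
    )
    return diagnosed and recently_treated

# Invented example: asthma coded in 2010, last inhaler prescribed in 2016.
index = date(2019, 1, 1)
print(is_active([date(2010, 5, 1)], [date(2016, 2, 1)], index))  # False
print(is_active([date(2010, 5, 1)], [date(2018, 9, 1)], index))  # True
```

The sketch does not address the multiple indication problem: a prescription only supports an active condition if the drug is reasonably specific to it.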
Implications
Despite the consensus in the medical literature on the definition of multimorbidity as the co-occurrence of two or more chronic conditions,4 which conditions should be included or how to determine chronicity has not been agreed.5 25 Our findings highlight that even when using the same set of medical codes, decisions on how to define a long term condition can change the prevalence of multimorbidity almost twofold, which has important implications for the direct comparison of estimates between studies. Results also suggest that a universally agreed metric of multimorbidity prevalence may be an unrealistic target and that estimates are highly context dependent. Rather than seeking one rigid definition of multimorbidity, we believe that researchers should instead embrace the variety of approaches, better reflecting the variety of lived experiences of people having multimorbidity. As such, we do not advocate only one of our approaches as the best, with choice dependent in part on the aims of the research. Nevertheless, some approaches may be more suitable in particular contexts, while the impact of bias, and exclusion of specific groups from multimorbidity measures, highlighted in our work, should be assessed (Box 1).
Box 1: Recommendations for research in applying a timeframe for counting long term conditions in the electronic health record
Provide a rationale for the choice of timeframe to define a long term condition
For studies of disease causes, use of a single code anywhere in the record may be preferred, as a condition even if historic and inactive may be relevant to subsequent disease development
Analyses of current interactions with healthcare services may be best suited to using codes appearing in a recent timeframe, for example, within the past 12 months
Timeframes requiring more than one code may be biased by factors related to repeated coding; we therefore recommend considering sensitivity analyses using only a single code definition
Researchers should consider which conditions to include, and may opt for a narrower set of diseases by inclusion of only conditions for which a single code would indicate chronic risk, rather than those that may also represent acute conditions. Where inclusion of a greater breadth of diseases is preferred, researchers should decide whether some of these require presence of multiple codes over time. For studies focused on cause or accumulation of diseases over time, use of any diagnostic code in the record may be the preferred approach, as a disease, even if not currently active and only recorded once, may be relevant to subsequent disease development. However, this approach may lead to inclusion of acute or inactive conditions, inflating the prevalence of multimorbidity. For studies focused on associations between current disease burden and healthcare service use and treatment, use of more contemporaneous active diseases may be preferable, for example, by using codes from the past 12 months or by incorporating prescription data where possible for some conditions. However, any approach that uses multiple codes may be at greater risk of bias in the frequency of coding of conditions, which may depend on factors related to the patient, clinician, general practice, and coding incentives.21 Therefore, we suggest that use of a single code definition should always be considered as a sensitivity analysis to understand the effect on prevalence and differential impact between patients.
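One way such a sensitivity analysis might be set out is sketched here. The data structure, field names, and toy patients are invented for illustration and do not reproduce the study's analysis; each patient is assumed to carry a multimorbidity flag under both the single code definition and a stricter multiple code definition.

```python
from collections import defaultdict

def prevalence(flags):
    """Percentage of patients flagged as multimorbid."""
    return 100 * sum(flags) / len(flags)

def reclassified_by_group(patients):
    """For each group, the percentage of patients who are multimorbid under
    the single code definition but not under the multiple code definition."""
    multimorbid_single = defaultdict(int)
    reclassified = defaultdict(int)
    for p in patients:
        if p["single_code"]:
            multimorbid_single[p["group"]] += 1
            if not p["multiple_codes"]:
                reclassified[p["group"]] += 1
    return {g: 100 * reclassified[g] / n for g, n in multimorbid_single.items()}

# Invented toy cohort, for illustration only.
patients = [
    {"group": "18-39", "single_code": True, "multiple_codes": False},
    {"group": "18-39", "single_code": True, "multiple_codes": True},
    {"group": "80+", "single_code": True, "multiple_codes": True},
    {"group": "80+", "single_code": True, "multiple_codes": True},
]
print(prevalence([p["single_code"] for p in patients]))
print(prevalence([p["multiple_codes"] for p in patients]))
print(reclassified_by_group(patients))
```

Reporting the overall prevalence under both definitions alongside the share reclassified in each group shows both the effect on prevalence and whether that effect differs between patients.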