Discussion
Main findings
We used the OpenSAFELY platform to compare how electronic health records in primary and secondary care in England identified cohorts of people receiving kidney replacement therapy. We found that when used separately, primary and secondary care electronic health records had high sensitivity but low positive predictive value when identifying people currently receiving kidney replacement therapy, with accuracy higher for identifying kidney transplantation than dialysis. We found some variation in accuracy by personal characteristics (in particular, lower accuracy in children), and different patterns were seen in primary and secondary care data. When patients in UKRR data were identified in primary or secondary care data, agreement for treatment modality with UKRR data was high. Prevalent patients in UKRR data who were not coded as receiving kidney replacement therapy in primary care data for the most part had a diagnosis of chronic kidney disease. This group was more likely to be recipients of dialysis in UKRR data. In contrast, most patients in UKRR data with no kidney replacement therapy code in secondary care data also had no chronic kidney disease code, and were largely recipients of kidney transplants in UKRR data. Start dates for kidney replacement therapy were inaccurate in primary care, with half of the UKRR incident cohort having no kidney replacement therapy code within three months of the UKRR start date.
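As an illustration of how sensitivity and positive predictive value relate when a registry is treated as the reference standard, the following minimal sketch compares two sets of patient identifiers. The function name, variable names, and toy identifiers are assumptions made for illustration only; they are not the study's analysis code and the numbers are not study results.

```python
def sensitivity_and_ppv(ehr_ids, registry_ids):
    """Compare an EHR-derived cohort with a registry cohort (reference standard).

    Sensitivity = registry patients also found in the EHR cohort / all registry patients.
    PPV         = EHR cohort patients also found in the registry / all EHR cohort patients.
    """
    ehr_ids = set(ehr_ids)
    registry_ids = set(registry_ids)
    true_positives = ehr_ids & registry_ids
    sensitivity = len(true_positives) / len(registry_ids) if registry_ids else float("nan")
    ppv = len(true_positives) / len(ehr_ids) if ehr_ids else float("nan")
    return sensitivity, ppv


# Toy identifiers (hypothetical): the EHR cohort captures most registry
# patients but also includes several people who are not in the registry.
registry_cohort = {1, 2, 3, 4, 5}
ehr_cohort = {1, 2, 3, 4, 6, 7, 8, 9}

sens, ppv = sensitivity_and_ppv(ehr_cohort, registry_cohort)
print(f"sensitivity={sens:.2f}, ppv={ppv:.2f}")
```

In this toy example most registry patients are captured (high sensitivity), while half of the code-identified cohort is absent from the registry (low positive predictive value), which is the pattern described above.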
Discrepancies between primary care, secondary care, and UKRR data
We found that many people were identified as recipients of kidney replacement therapy (particularly dialysis) in primary and secondary care data only, but only some were potentially accounted for by the known reasons that people receiving kidney replacement therapy are not included in UKRR cohorts. Firstly, the prevalent UKRR cohorts did not include people who stopped kidney replacement therapy by the prevalent date (eg, because of recovery or conservative management); these patients would be included in primary and secondary care cohorts. Secondly, the UKRR database operates on an annual basis and people could be present in one year and not included in a later year (eg, patients who received a kidney transplant who might have moved away temporarily and were cared for only by their primary care doctor on their return). Thirdly, border effects might exist in that only data for English renal centres were submitted to OpenSAFELY, so people residing in England but treated at a Welsh or Scottish renal centre would not be in the UKRR cohort but could be in primary or secondary care cohorts if they were registered with an OpenSAFELY-TPP practice. From UKRR data, we estimate that, in total, these groups would account for <1600 people in the study cohort.
Nationally, more people are registered at English general practices than are in the population,18 so over-registration (eg, of people who have moved abroad) might contribute to the differences seen. The UKRR definition of incidence of kidney replacement therapy excludes those who recover kidney function before 90 days of dialysis treatment, and those who die after starting acute dialysis. A previous UKRR analysis estimated that 20% of people who ever start dialysis are still alive but no longer receiving kidney replacement therapy at 90 days.19 Patients recovering within 90 days who are still alive will continue to be considered part of the kidney replacement therapy population in primary and secondary care, although they would not be identified as an incident patient in the UKRR study population. Also, acute dialysis can be delivered by intensive care staff without involvement of renal care. We believe that acute dialysis is the main reason for the discrepancy between UKRR and secondary care data. The 90 day requirement might also contribute to the discrepancy in start dates, because if a patient receives acute dialysis and continues beyond 90 days to require chronic dialysis, their start date in UKRR data is considered to be when they started acute dialysis. Start dates might not be determined the same way in primary care data.
We found no primary or secondary care codes whose exclusion would substantially improve the positive predictive value without a decrease in sensitivity. However, some codes, along with critical care data, could be used to flag patient records for further investigation. In particular, the acquired arteriovenous fistula code indicates dialysis preparation but not actual dialysis; the transplant nephrectomy code was less common among people also in the UKRR cohorts and could be a miscoding of nephrectomy. Among transplant codes, live donor renal transplant and donor renal transplantation might have been entered for the donor rather than the recipient. The proportion of secondary care inpatient episodes featuring some critical care was higher in people not in the UKRR cohort for the codes for dialysis not elsewhere classified and haemodialysis not elsewhere classified, possibly indicating acute dialysis. In the primary care data, restriction based on estimated glomerular filtration rate measurements can reduce the number of people incorrectly identified as receiving chronic dialysis, but is of limited use for transplant or overall kidney replacement therapy.
Strengths and weaknesses of this study
Our study linked UKRR data with primary and secondary care data at a population level, inclusive of all ages, by using the secure OpenSAFELY platform. This approach allowed validation of primary and secondary care coding in both directions, rather than being limited to assessment of sensitivity through linkage of the UKRR cohort only. The UKRR has been established for more than 25 years, providing in-depth data with complete UK coverage for all adults and children receiving chronic kidney replacement therapy, making it a unique resource for kidney medicine and facilitating analysis that is not possible in other clinical areas. Data undergo extensive validation and cleaning, and thus UKRR data can be considered a gold standard for defining incident and prevalent chronic kidney replacement therapy cohorts.
A key limitation of our study is that the analysis was restricted to people in the OpenSAFELY-TPP database. For the whole patient population, these data have been shown to be reasonably representative of the English population,17 but we found differences compared with the UKRR prevalent cohort. This finding is in part because of the disparities in the cohorts identified by the three data sources (as we set out to describe) but also might be because London has a high prevalence of kidney replacement therapy20 but is under-represented in the OpenSAFELY-TPP database.
Previous work by Iwagami et al21 compared estimated population prevalence in the Clinical Practice Research Datalink with published UKRR data from 2014, and found a similar estimated prevalence of kidney replacement therapy in the two sources. Considering the low prevalence of kidney replacement therapy in the population (0.05%22), discrepancies in the UKRR and primary care kidney replacement therapy cohorts in our study would not have corresponded to notable changes in prevalence. Iwagami et al reported a lower prevalence of haemodialysis in the Clinical Practice Research Datalink compared with UKRR, whereas we found larger numbers of patients receiving dialysis in the primary care data. Our study looked at haemodialysis and peritoneal dialysis together and included more dialysis codes. Also, coding practices and accuracy might have changed since 2014, although no other studies exist to confirm this. Our results are in contrast with findings from international systematic reviews23 24 on the accuracy of coding of chronic kidney disease in electronic health records, where studies generally had poor sensitivity but good specificity and reasonable positive predictive values. This difference suggests that the wider body of work on the validity of using administrative data for chronic kidney disease23–27 is not applicable, and supports the need for further studies looking specifically at kidney replacement therapy.
This linkage was done to understand how previous work on covid-19 that used only primary and secondary care data to identify people receiving kidney replacement therapy might have been affected by misclassification. In this study, we restricted the primary and secondary care definitions to the presence of one of a list of codes indicating kidney replacement therapy in a patient's history, rather than using combinations or exclusions of codes, or requiring codes to be present multiple times to indicate chronic dialysis. This method reflected the approach taken in previous studies of patients receiving kidney replacement therapy in OpenSAFELY, itself based on previous work in other sources of primary care data, but the positive predictive value might be increased while maintaining sensitivity if more complex definitions were applied.
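The difference between the single code definition used here and a stricter definition that requires repeated codes can be sketched as follows. This is a minimal illustration under stated assumptions: the code labels, the function names, and the threshold of two occurrences are hypothetical placeholders, not the study's codelists or a validated definition.

```python
# Hypothetical placeholder labels, not real clinical codes or study codelists.
KRT_CODELIST = {"EXAMPLE_DIALYSIS_CODE", "EXAMPLE_TRANSPLANT_CODE"}


def has_any_krt_code(history, codelist=KRT_CODELIST):
    """Single code definition: at least one listed code anywhere in the history."""
    return any(code in codelist for code in history)


def has_repeated_krt_code(history, codelist=KRT_CODELIST, min_count=2):
    """Stricter definition (assumed threshold): listed codes recorded min_count times or more."""
    return sum(code in codelist for code in history) >= min_count


# A record with a single dialysis code (for example, after one acute episode)
# meets the single code definition but not the stricter one.
single_episode = ["EXAMPLE_OTHER_CODE", "EXAMPLE_DIALYSIS_CODE"]
repeated_codes = ["EXAMPLE_DIALYSIS_CODE", "EXAMPLE_OTHER_CODE", "EXAMPLE_DIALYSIS_CODE"]

print(has_any_krt_code(single_episode), has_repeated_krt_code(single_episode))
print(has_any_krt_code(repeated_codes), has_repeated_krt_code(repeated_codes))
```

Whether a stricter rule of this kind would raise the positive predictive value without lowering sensitivity was not tested in this study and would need validation against registry data.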
Based on our findings, in general, analyses that do not use UKRR data cannot reliably distinguish between people who have had acute dialysis and those who remain on chronic kidney replacement therapy. More than a third of people starting dialysis are given an acute code in UKRR data, and nearly a quarter of these will still be receiving kidney replacement therapy on day 90 and thus considered to be on chronic dialysis.19 Depending on the question, the distinction between acute and chronic dialysis is perhaps not important, especially in terms of identifying risk factors for poor outcomes related to covid-19 disease. For chronic kidney replacement therapy, particularly if correct start dates are needed, registry data are required. For researchers interested in whether people have ever required any form of kidney replacement therapy (eg, as a baseline risk factor for other outcomes), a dataset based on primary and secondary care data only could be considered sufficient. We found that most people incorrectly identified as prevalent recipients of kidney replacement therapy in primary care data had reduced kidney function based on their latest estimated glomerular filtration rate. For previous studies of populations receiving kidney replacement therapy based on OpenSAFELY, some misclassification across stages of chronic kidney disease could have occurred, but if anything, the broader definition would likely have led to attenuated findings.
Policy implications and interpretation
Primary and secondary care electronic health records were used during the covid-19 pandemic to identify clinically vulnerable people and communicate shielding advice. Accurate and prompt coding of people with immunosuppression and other high risk conditions is needed to ensure these patients are adequately protected in future pandemics. Some patients who were eligible for interventions, such as vaccination or antiviral treatment, may not have been identified in a timely manner by primary and secondary care codes, as demonstrated by the analysis of kidney replacement therapy start dates. Communication with patients and care providers may therefore have been suboptimal. Accurate and prompt coding of kidney replacement therapy is needed to ensure that clinically vulnerable groups are adequately protected in future pandemics.
Evaluation of short term outcomes of covid-19 disease is perhaps less relevant in children because of the comparably lower risk of adverse outcomes, but these findings suggest linkage of UKRR data is necessary to monitor vaccination trends and long term outcomes after infection in this cohort.28 Children living with kidney disease have a substantial burden of disease and treatment throughout their lives, with reduced life years compared with their peers,29 and identifying this cohort and monitoring their care is therefore imperative. Poor coding in primary and secondary care data is concerning. We saw variation in the accuracy of coding across age ranges, as well as by ethnic group and index of multiple deprivation, limiting the ability to provide an equitable health service across the population. Coding is often carried out by inexperienced staff, and inaccuracies can have substantial implications for local resources.30 31
Outside of the context of covid-19, obtaining linked data can be challenging, with additional resource and governance requirements. Our analyses can help in clarifying whether routine primary or secondary care electronic health records for a particular project would suffice, thus saving resources if UKRR data are not required. For studies that use only routine primary and secondary care data, as is typical in pharmaco-epidemiology, we showed that linkage to a kidney registry is required to accurately identify starting dates for those who require long term dialysis or kidney transplantation. On the other hand, our work showed the extent of acute kidney care that is performed (and not reported in registries of chronic kidney failure), which is particularly relevant for settings where financing of kidney services is driven only by chronic need. More generally, this study highlighted the value of linking registry data to routine electronic health records with implications beyond kidney medicine, because it adds to a growing body of work demonstrating similar benefits in a range of clinical areas, such as cardiovascular events,5 6 32 cancer,33 and diabetes.34
Conclusions
Linkage with UKRR kidney replacement therapy data facilitated more accurate identification of incident and prevalent cohorts receiving kidney replacement therapy than was achieved with only electronic health records. Codes used in primary and secondary care data missed only a small proportion of prevalent patients receiving kidney replacement therapy. Codes, particularly dialysis codes, also identified many patients who were not receiving chronic kidney replacement therapy according to UKRR data. This study also showed that patients starting dialysis are not identified promptly by primary care codes, leading to delays in interventions for patients with immunosuppression. Poor coding also has implications for any patient care, including resource planning, that relies on accurate recording of kidney replacement therapy in primary and secondary care data.