Research

Comparative effectiveness of second line glucose lowering drug treatments using real world data: emulation of a target trial

Abstract

Objective To build on the recently completed GRADE (Glycemia Reduction Approaches in Diabetes: A Comparative Effectiveness Study) randomised trial examining the comparative effectiveness of second line glucose lowering drugs in achieving and maintaining glycaemic control in adults with type 2 diabetes.

Design Emulation of a target trial.

Setting Medical and pharmacy claims data from the OptumLabs Data Warehouse, a de-identified US national dataset of beneficiaries of commercially insured and Medicare Advantage plans, 29 March 2013 to 30 June 2021.

Participants Adults (≥18 years) with type 2 diabetes who first started taking glimepiride, sitagliptin, liraglutide, insulin glargine, or canagliflozin between 29 March 2013 and 30 June 2021. Participants were treatment naive or were receiving metformin monotherapy at the time of starting the study drug.

Main outcome measures The main outcomes were time to primary and secondary metabolic failure of the assigned treatment, calculated as days to haemoglobin A1c levels of ≥7.0% and >7.5%, respectively. Secondary metabolic, cardiovascular, and microvascular outcomes were analysed as specified in the GRADE statistical analysis plan. Propensity scores were estimated with the gradient boosting method, and inverse propensity score weighting was used to emulate randomisation to the treatment groups, which were then compared with Cox proportional hazards regression.

Results The study cohort included participants starting treatment with glimepiride (n=20 511), liraglutide (n=5569), sitagliptin (n=13 039), insulin glargine (n=7262), and canagliflozin (n=5290). The insulin glargine arm was excluded because of insufficient control of confounding. Median times to primary metabolic failure were 439 (95% confidence interval 400 to 489) days in the canagliflozin arm, 439 (426 to 453) days in the glimepiride arm, 624 (567 to 731) days in the liraglutide arm, and 461 (442 to 482) days in the sitagliptin arm. Median time to secondary metabolic failure was also longest in the liraglutide arm. Adults receiving liraglutide had the lowest one year cumulative incidence rate of primary metabolic failure (0.37, 95% confidence interval 0.35 to 0.40) followed by sitagliptin (0.44, 0.43 to 0.45), glimepiride (0.45, 0.44 to 0.45), and canagliflozin (0.46, 0.44 to 0.48). Similarly, the one year cumulative incidence rate of secondary metabolic failure was 0.27 (0.25 to 0.29) in the canagliflozin arm, 0.28 (0.27 to 0.29) in the glimepiride arm, 0.23 (0.21 to 0.26) in the liraglutide arm, and 0.28 (0.27 to 0.29) in the sitagliptin arm. No differences were observed between the study arms in the rates of microvascular and macrovascular complications.

Conclusions In this target trial emulation of an expanded GRADE study framework, liraglutide was more effective in achieving and maintaining glycaemic control as a second line glucose lowering drug than canagliflozin, sitagliptin, or glimepiride.

What is already known on this topic

  • The GRADE (Glycemia Reduction Approaches in Diabetes: A Comparative Effectiveness Study) randomised trial compared glimepiride (sulphonylurea), sitagliptin (dipeptidyl-peptidase 4 inhibitor), liraglutide (glucagon-like peptide 1 receptor agonist), and insulin glargine (basal analogue insulin) for the ability of these drugs to lower haemoglobin A1c (HbA1c) in people with moderately raised levels of HbA1c receiving metformin monotherapy

  • Although the GRADE trial found that liraglutide was the most effective among the studied drugs, these findings were limited by the lack of a sodium-glucose cotransporter 2 inhibitor comparator arm and narrow eligibility requirements that excluded most patients in real world practice

What this study adds

  • In this comparative study of the effectiveness of five classes of second line glucose lowering drugs, liraglutide was significantly more effective in maintaining glycaemic control in adults with type 2 diabetes than other second line glucose lowering drugs

  • Liraglutide was more effective than glimepiride, sitagliptin, and canagliflozin in achieving and maintaining glycaemic control in adults with type 2 diabetes

  • Insulin glargine was not included in the comparisons because of insufficient control of confounding with propensity score weighting

How this study might affect research, practice, or policy

  • This study suggests that observational data and methods can be used to emulate clinical trials and to examine the comparative effectiveness and safety of interventions in routine care

Introduction

Type 2 diabetes is a common and serious chronic health condition.1 Timely control of hyperglycaemia, most often measured as blood levels of haemoglobin A1c (HbA1c), is necessary to prevent complications of diabetes and reduce the risk of death.2–8 Most clinical practice guidelines recommend an HbA1c target of <7% for most non-pregnant adults.9 Metformin is recommended as the first line glucose lowering drug because of its efficacy, tolerability, and low cost.10–13 Less certainty exists, however, about the optimal second line glucose lowering treatment when metformin is no longer sufficient, is contraindicated, or cannot be tolerated. This uncertainty is partly because of the scarcity of evidence directly comparing currently available second line drug treatments. Clinical practice guidelines advise that the choice of second line treatment should be informed by clinical and situational considerations specific to each person.10–13 Robust evidence supports the preferential use of specific drug classes in the presence of cardiovascular and kidney comorbidities; however, less is known about how these drugs compare with each other in their ability to lower HbA1c levels.

GRADE (Glycemia Reduction Approaches in Diabetes: A Comparative Effectiveness Study) is a recently completed pragmatic, randomised, parallel arm clinical trial that compared, head to head, four second line glucose lowering drugs for their ability to achieve and maintain glycaemic control in adults with moderately uncontrolled type 2 diabetes receiving metformin monotherapy.14–16 GRADE found that liraglutide (a glucagon-like peptide 1 receptor agonist) and insulin glargine (a basal analogue of insulin) were significantly more effective in achieving and maintaining glycaemic control than glimepiride (a sulphonylurea) and sitagliptin (a dipeptidyl-peptidase 4 inhibitor), which were least effective.16

The design of the GRADE trial had important limitations, however, that reduced its relevance to contemporary clinical practice. GRADE did not include sodium-glucose cotransporter 2 inhibitors, agents increasingly used in clinical practice17–19 and the preferred treatment for patients with heart failure and chronic kidney disease.9 Also, because the GRADE eligibility criteria required low baseline levels of HbA1c and the start of second line treatment during metformin monotherapy, trial participants represented only 9.1% of adults with diabetes living in the US.20 Both of these factors highlight the need for more timely and generalisable data that are pertinent to the contemporary management of adults with type 2 diabetes.

To show the feasibility and usefulness of using real world data to emulate randomised controlled trials, and thus generate evidence on the comparative effectiveness and safety of drugs faster, cheaper, and with greater external validity, we previously emulated the GRADE trial with observational claims and electronic health record data before the publication of the GRADE results. Our emulation showed similar findings to the GRADE trial,21 although efforts to emulate all specifications in the GRADE trial were hindered because some study conditions (ie, initiation of insulin glargine as a second line treatment at HbA1c concentrations of <8.5%) were not adequately represented among large populations in real world practice because they are not aligned with contemporary standards of care.

In this study, we sought to address some of the limitations of the trial design of GRADE and build on our earlier emulation21 by using the target trial framework22 and comparing the effectiveness of glimepiride, sitagliptin, liraglutide, insulin glargine, and canagliflozin in achieving and maintaining concentrations of HbA1c <7.0% in adults with type 2 diabetes who were naive to these drugs but without further restrictions imposed by the eligibility criteria of the GRADE trial. We also examined the secondary metabolic, microvascular, macrovascular, and safety endpoints planned in GRADE, where feasible using the available claims and electronic health record data. Another prospective randomised controlled trial comparing these drugs with each other for metabolic, microvascular, and macrovascular outcomes is unlikely to be conducted, given the cost, effort, and duration of the original GRADE trial. Hence we expect this emulation to generate important evidence on the comparative effectiveness and safety of these second line glucose lowering drugs in a diverse and generalisable adult population.

Methods

Study design

We retrospectively analysed medical and pharmacy claims data from the OptumLabs Data Warehouse, a de-identified national dataset of beneficiaries of commercially insured and Medicare Advantage plans that represents a diverse mixture of ages, ethnic groups, practice settings, and geographic regions across the USA.23 24 All study data were de-identified consistent with HIPAA (Health Insurance Portability and Accountability Act of 1996) expert de-identification determination. The study is reported according to the Reporting of studies Conducted using Observational Routinely collected Data (RECORD) reporting guideline.

Study population

We first assembled a cohort of adults (≥18 years) who first started taking glimepiride, sitagliptin, liraglutide, insulin glargine, or canagliflozin between 29 March 2013 (date of approval of canagliflozin by the US Food and Drug Administration; the other study drugs were approved earlier) and 30 June 2021 (online supplemental figure S1 and online supplemental method 1). The index date was set to the date of the first claim for the study drug. People who started two or more study drugs on the index date were excluded. To ensure consistent and adequate capture of baseline comorbidities and treatment data, participants were required to have six months of continuous enrolment before the index date. We excluded people with prescription fills for any glucose lowering drugs other than metformin, those with type 1 diabetes, those with missing information for age or sex (<1% of the final cohort), pregnant individuals, and those with no available HbA1c results during the three months before the index date (baseline HbA1c) and in the follow-up period. Laboratory test results are available for a subset of people in the OptumLabs Data Warehouse based on data sharing agreements between OptumLabs and commercial laboratory companies.

Outcomes

The primary outcome was time to primary metabolic failure of the assigned treatment, calculated as days to HbA1c levels ≥7.0%. To assess for potential bias in outcome ascertainment caused by differences in the frequencies and intervals of HbA1c testing, we compared the number, frequency, and timing of available HbA1c test results and found no difference between the groups (online supplemental table S1). Because testing frequency is guided by baseline HbA1c levels, we also examined intervals between sequential HbA1c tests grouped by baseline levels of HbA1c. No differences were found between the treatment groups (online supplemental table S2). Secondary metabolic, cardiovascular, and microvascular outcomes were analysed as specified in the GRADE statistical analysis plan15 and detailed previously,21 if they were feasible to ascertain from claims data (online supplemental table S3).
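As a minimal sketch of how the two metabolic failure endpoints could be derived from one participant's series of HbA1c results, the Python function below returns the time to the first qualifying result. It is illustrative only, not the study's code: the input layout (days since the index date paired with HbA1c %), the function name, and the 90 day start of the at-risk window (taken from the three month at-risk start described under Statistical analysis) are assumptions.

```python
from typing import Sequence, Tuple

# Hypothetical constant: HbA1c results before about three months are ignored,
# mirroring the three month at-risk start described under Statistical analysis.
AT_RISK_START_DAY = 90


def days_to_failure(
    tests: Sequence[Tuple[int, float]],
    threshold: float,
    inclusive: bool,
    censor_day: int,
) -> Tuple[int, bool]:
    """Return (time in days, event observed) for one participant.

    tests: (days since index date, HbA1c %) pairs, in any order.
    threshold/inclusive: 7.0 and True for primary failure (HbA1c >=7.0%);
        7.5 and False for secondary failure (HbA1c >7.5%).
    censor_day: last day of follow-up if no failure is observed.
    """
    for day, value in sorted(tests):
        # Skip results from before the at-risk window or after censoring.
        if day < AT_RISK_START_DAY or day > censor_day:
            continue
        failed = value >= threshold if inclusive else value > threshold
        if failed:
            return day, True
    return censor_day, False


tests = [(30, 8.4), (120, 7.2), (200, 7.6)]
print(days_to_failure(tests, 7.0, inclusive=True, censor_day=700))   # (120, True)
print(days_to_failure(tests, 7.5, inclusive=False, censor_day=700))  # (200, True)
```

The inclusive flag keeps the two threshold definitions (≥7.0% and >7.5%) explicit rather than hard coding either comparison.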

Covariates

Age, sex, race or ethnic group, and annual household income of participants were identified from OptumLabs Data Warehouse enrollment files at the time of the index date. Data for sex were taken from information in the OptumLabs Data Warehouse database rather than from participant reported gender. Thresholds for categorising these variables were chosen based on clinical relevance and the distribution of the data (online supplemental method 2). Comorbidities (determined from all claims during the six months preceding the index date) included retinopathy, nephropathy, neuropathy, coronary artery disease, cerebrovascular disease, peripheral vascular disease, heart failure, and previous severe hypoglycaemia and hyperglycaemia. Baseline drug treatments, included as surrogates for the burden of complications, were identified from pharmacy claims in the six months preceding the index date. Online supplemental tables S4 and S5 list the codes and drugs used to define all covariates. We also operationalised, where feasible, the eligibility criteria of participants, as defined in the GRADE trial (online supplemental table S6). We included the eligibility criteria as covariates in the propensity score model rather than excluding those participants from the cohort, because our objective was to examine the comparative effectiveness and safety of second line glucose lowering drugs in a generalisable and heterogeneous real world adult population.

Statistical analysis

Primary analyses followed the intention-to-treat censoring approach, with participants followed until the outcome of interest was reached, the anticipated follow-up duration of the trial (seven years) was achieved, the end of the study period (30 June 2021), the end of insurance coverage, or death (online supplemental figure S2).
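Apart from the outcome itself, the intention-to-treat censoring rule amounts to taking the earliest of several dates. The sketch below assumes these dates are available per participant and approximates seven years as 7 x 365 days; the function and constant names are hypothetical.

```python
from datetime import date, timedelta
from typing import Optional

STUDY_END = date(2021, 6, 30)
# Assumption: "seven years" approximated as 7 x 365 days.
MAX_FOLLOW_UP = timedelta(days=7 * 365)


def itt_censor_date(
    index_date: date,
    coverage_end: date,
    death_date: Optional[date] = None,
) -> date:
    """Earliest intention-to-treat censoring date; the outcome is handled separately."""
    candidates = [index_date + MAX_FOLLOW_UP, STUDY_END, coverage_end]
    if death_date is not None:
        candidates.append(death_date)
    return min(candidates)


print(itt_censor_date(date(2019, 1, 15), coverage_end=date(2022, 3, 31)))
# 2021-06-30: the end of the study period comes first
```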

Inverse probability of treatment weighting was used to balance the differences in baseline characteristics among the treatment groups. Propensity scores were defined as the probability of receiving each of the treatments given the baseline variables; these propensity scores were estimated with generalised boosted models, including the baseline variables presented in table 1. Generalised boosted models use an iterative process with multiple regression trees to capture complex and non-linear relations between treatment assignment and the pretreatment covariates, with the aim of identifying the propensity score model that achieves the best balance among the treatment groups.25 The number of trees in the final generalised boosted model ensemble was selected with internal 10-fold cross validation to minimise differences between the propensity score weighted treatment groups. Stabilised weights were calculated by dividing the marginal frequency of treatment by the propensity score of the treatment received.26
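The stabilised weighting step can be illustrated with the short Python sketch below. It assumes the per-arm propensity scores have already been estimated (the study used generalised boosted models; fitting them is out of scope here), and the function name and toy values are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Sequence


def stabilised_weights(
    treatments: Sequence[str],
    propensity: Sequence[Dict[str, float]],
) -> List[float]:
    """Stabilised inverse probability of treatment weights for several arms.

    treatments[i]: arm started by participant i.
    propensity[i]: estimated probability of each arm for participant i.
    weight_i = marginal proportion of the arm received / propensity of the arm received
    """
    n = len(treatments)
    marginal = {arm: count / n for arm, count in Counter(treatments).items()}
    return [marginal[arm] / ps[arm] for arm, ps in zip(treatments, propensity)]


arms = ["liraglutide", "glimepiride", "glimepiride", "sitagliptin"]
ps = [
    {"liraglutide": 0.50, "glimepiride": 0.25, "sitagliptin": 0.25},
    {"liraglutide": 0.25, "glimepiride": 0.25, "sitagliptin": 0.50},
    {"liraglutide": 0.25, "glimepiride": 0.50, "sitagliptin": 0.25},
    {"liraglutide": 0.50, "glimepiride": 0.25, "sitagliptin": 0.25},
]
print(stabilised_weights(arms, ps))  # [0.5, 2.0, 1.0, 1.0]
```

Using the marginal proportion in the numerator, rather than 1, keeps the weighted sample size close to the original one and tempers extreme weights.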

Table 1
Baseline characteristics of weighted cohort

Online supplemental figure S3 illustrates the distribution of weights. Standardised mean differences were used to assess the balance of covariates after weighting; a standardised mean difference ≤0.1 was considered a good balance (online supplemental method 3).27 Before evaluation of the outcomes, weighted sample sizes and ability to account for baseline confounding were examined to determine the feasibility of including each treatment group.
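A weighted standardised mean difference for one covariate can be computed as sketched below. The convention used here is an assumption: the difference in weighted means divided by the square root of the average of the two groups' weighted variances. Software packages differ in how they define the denominator, so values may not match the supplementary tables exactly; the function names are hypothetical.

```python
import math
from typing import Sequence


def weighted_mean(x: Sequence[float], w: Sequence[float]) -> float:
    return sum(xi * wi for xi, wi in zip(x, w)) / sum(w)


def weighted_var(x: Sequence[float], w: Sequence[float]) -> float:
    # Simple weighted variance (no small-sample correction).
    m = weighted_mean(x, w)
    return sum(wi * (xi - m) ** 2 for xi, wi in zip(x, w)) / sum(w)


def weighted_smd(x1, w1, x2, w2) -> float:
    """Absolute standardised mean difference between two weighted groups.

    Works for continuous covariates and for 0/1 indicators (where the
    weighted variance reduces to p * (1 - p)).
    """
    pooled_sd = math.sqrt((weighted_var(x1, w1) + weighted_var(x2, w2)) / 2)
    if pooled_sd == 0:
        return 0.0
    return abs(weighted_mean(x1, w1) - weighted_mean(x2, w2)) / pooled_sd


# Hypothetical indicator covariate (eg, prior heart failure) in two arms.
smd = weighted_smd([1, 0, 1, 0], [1, 1, 1, 1], [1, 0, 0, 0], [2, 1, 1, 1])
print(round(smd, 3), smd <= 0.1)  # 0.202 False -> not yet balanced
```

With more than two arms, the same function would be applied to each pair of arms (or each arm against the pooled sample) for every covariate.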

The cumulative incidences of the primary (time to first HbA1c concentration ≥7.0%) and secondary (time to first HbA1c >7.5%) metabolic failure endpoints within each treatment arm were estimated with the inverse probability of treatment weighted Kaplan-Meier method. We used inverse probability of treatment weighted Cox proportional hazards regression models adjusted for baseline HbA1c values to estimate the hazard ratios between treatment groups. Because baseline HbA1c values spanned a wide range, we modelled baseline HbA1c with a spline with five degrees of freedom. The at-risk period for the proportional hazards model began three months after the index date because primary metabolic failure could be observed only from the third month onward. The proportional hazards assumption was assessed with Schoenfeld residuals for each model. We found significant violations of the proportional hazards assumption in the models of the primary and secondary metabolic failure outcomes. To address these violations, we allowed the effect of baseline HbA1c to vary across three time periods (0-40, 41-365, and >365 days after the index date) and the effect of treatment group to vary across two periods (0-365 and >365 days after the index date) in the pairwise comparisons. The boundaries of the time periods were selected by visual inspection of the Schoenfeld residuals. For the subgroup and falsification endpoint analyses, no significant violation of the proportional hazards assumption was found, so we used a single period for the treatment groups. All pairwise comparisons between the treatment groups were estimated, and we applied the Holm method to adjust the P values for multiple testing, using for each comparison an omnibus test of the null hypothesis that the hazard ratio equals one in all time periods against the alternative that it differs from one in at least one period. The at-risk period for modelling secondary metabolic outcomes (other than those based on HbA1c), cardiovascular outcomes, and microvascular outcomes began at the study index date.

Follow-up time by treatment group was estimated with the inverse probability of treatment weighted Kaplan-Meier method applied to the censoring distribution (reverse Kaplan-Meier).
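The weighted Kaplan-Meier estimator can be sketched as below: each participant contributes their stabilised weight instead of a count of one to the risk set and to the events. The helpers read off the cumulative incidence at a horizon and the median time to failure. This is an illustrative sketch with hypothetical names, not the study's R or SAS code, and it does not compute confidence intervals.

```python
from typing import List, Optional, Sequence, Tuple


def weighted_km(
    times: Sequence[float],
    events: Sequence[bool],
    weights: Sequence[float],
) -> List[Tuple[float, float]]:
    """Weighted Kaplan-Meier estimate: list of (event time, survival probability)."""
    data = sorted(zip(times, events, weights))
    at_risk = float(sum(weights))  # weighted number still at risk
    surv = 1.0
    curve: List[Tuple[float, float]] = []
    i = 0
    while i < len(data):
        t = data[i][0]
        events_w = 0.0   # weighted failures at time t
        leaving_w = 0.0  # weight leaving the risk set at t (failures and censorings)
        while i < len(data) and data[i][0] == t:
            _, event, w = data[i]
            if event:
                events_w += w
            leaving_w += w
            i += 1
        if events_w > 0:
            surv *= 1.0 - events_w / at_risk
            curve.append((t, surv))
        at_risk -= leaving_w
    return curve


def cumulative_incidence(curve, horizon: float) -> float:
    """1 - S(horizon), reading the step function at the last event time <= horizon."""
    surv = 1.0
    for t, s in curve:
        if t > horizon:
            break
        surv = s
    return 1.0 - surv


def median_time(curve) -> Optional[float]:
    """Earliest time at which estimated survival falls to 0.5 or below."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None


curve = weighted_km([100, 200, 200, 300, 400], [True, True, False, True, False], [1.0] * 5)
print(median_time(curve))                          # 300
print(round(cumulative_incidence(curve, 365), 2))  # 0.7
```

Reversing the event indicator (passing `[not e for e in events]`) treats censoring as the event, which gives the censoring-distribution estimate used for median follow-up.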

Results are presented as median times to metabolic failure and the expected proportions of participants with metabolic failure at one and two years by treatment group, and pairwise hazard ratios. P<0.05 was considered significant for all two sided tests. All analyses were performed with SAS 9.4 (SAS Institute, Cary, NC) and R version 4.0.2 (R Foundation, Vienna, Austria). Online supplemental appendix 2 has sample R code for propensity score modelling and outcome comparisons.
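The Holm step-down adjustment described under Statistical analysis can be sketched as follows; it should agree with R's `p.adjust(p, method = "holm")`. The six input P values are hypothetical placeholders for the omnibus tests of the six pairwise comparisons, not study results.

```python
from typing import List, Sequence


def holm_adjust(pvalues: Sequence[float]) -> List[float]:
    """Holm step-down adjusted P values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # The smallest P value is multiplied by m, the next by m - 1, and so on.
        candidate = min(1.0, (m - rank) * pvalues[i])
        # Enforce monotonicity: an adjusted P value cannot be smaller than earlier ones.
        running_max = max(running_max, candidate)
        adjusted[i] = running_max
    return adjusted


# Hypothetical omnibus P values for the six pairwise comparisons of four arms.
raw = [0.001, 0.004, 0.03, 0.02, 0.2, 0.5]
print([round(p, 3) for p in holm_adjust(raw)])  # [0.006, 0.02, 0.09, 0.08, 0.4, 0.5]
```

An adjusted P value below 0.05 would then be declared significant, which controls the familywise error rate across the six comparisons.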

Subgroup analyses

A priori defined subgroup analyses were performed based on baseline HbA1c values (<7.0%, 7.0-7.9%, 8.0-8.9%, and ≥9.0%).

Sensitivity analyses

Firstly, we repeated the analyses for primary and secondary metabolic failure with the per protocol censoring approach, with participants followed until the outcome of interest was reached, the study drug was discontinued (defined as not refilling a prescription within 30 days after the end of the last treatment episode), the anticipated follow-up duration of the trial was reached (seven years), the end of the study period (30 June 2021), the end of insurance coverage, or death (online supplemental figure S2). Secondly, to examine the comparative effectiveness of study drugs while treated only with these drugs and not with any other drug treatments for diabetes, accounting for real world treatment practices, we repeated all analyses with the as treated censoring approach, where participants were followed until the outcome of interest occurred, the study drug was discontinued, any other drug was added, the anticipated follow-up duration of the trial was reached (seven years), the end of the study period (30 June 2021), the end of insurance coverage, or death. Thirdly, we assessed residual confounding by testing a falsification endpoint that was unlikely to be associated with the studied drugs: diagnoses of pneumonia, cholecystitis, and appendicitis (online supplemental table S4) during the follow-up period.
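The per protocol discontinuation rule can be sketched from pharmacy fill records as below. The record layout (fill day relative to the index date, days supplied), the handling of early refills, and dating discontinuation at the end of the last supply rather than at the end of the 30 day allowance are all assumptions, because the methods do not specify them; the function name is hypothetical.

```python
from typing import Optional, Sequence, Tuple

GRACE_DAYS = 30  # refill allowance stated in the methods


def discontinuation_day(
    fills: Sequence[Tuple[int, int]],
    follow_up_end: int,
    grace: int = GRACE_DAYS,
) -> Optional[int]:
    """Day (relative to the index date) on which the study drug is treated as
    discontinued, or None if it is continued to the end of follow-up.

    fills: (fill day, days supplied) pairs, including the index fill.
    Assumption: an early refill extends coverage to the later of the two end
    dates (no stockpiling), and discontinuation is dated at the end of the
    last supply.
    """
    ordered = sorted(fills)
    supply_end = ordered[0][0] + ordered[0][1]
    for day, supply in ordered[1:]:
        if day > supply_end + grace:
            return supply_end  # gap longer than the grace period
        supply_end = max(supply_end, day + supply)
    if follow_up_end > supply_end + grace:
        return supply_end  # no refill after the last episode ended
    return None


print(discontinuation_day([(0, 30), (35, 30), (70, 30)], follow_up_end=400))  # 100
print(discontinuation_day([(0, 90), (200, 90)], follow_up_end=400))           # 90
```

Under this sketch, the per protocol censoring date would be the earliest of the intention-to-treat censoring date and the discontinuation day; the as treated approach would additionally include the day any other glucose lowering drug was added.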

Patient and public involvement

Patients were not directly involved in the design, conduct, or dissemination of this study, although people living with type 2 diabetes involved in another ongoing study led by the senior author (RGM) stressed the need for contemporary information about the relative effectiveness of glucose lowering drugs on HbA1c levels specifically. This study was also informed by the calls by clinicians, professional societies, regulatory bodies, and payors to identify preferred glucose lowering treatment strategies in the absence of direct comparisons across the studied drugs and to examine whether and how data collected in the process of routine patient care can be used to emulate and augment the evidence obtained through prospective clinical trials. Study findings cannot be shared directly with participants because patients included in OptumLabs Data Warehouse are de-identified; however, results will be disseminated to patient and public communities via scientific, social media, and institutional communication channels.

Results

Study population

We identified 20 511 adults with type 2 diabetes who started glimepiride, 13 039 who started sitagliptin, 5569 who started liraglutide, 7262 who started insulin glargine, and 5290 who started canagliflozin (online supplemental figure S1). Online supplemental table S7 shows the baseline characteristics of participants before weighting. We found substantial differences (largest standardised mean difference >0.2) in age, race or ethnic group, annual household income, baseline levels of HbA1c, and comorbidities across the five treatment groups. Participants in the liraglutide arm were more likely to be younger and white and to have a higher income than those in the other treatment arms. Participants in the insulin glargine arm had the lowest income and the highest prevalence of all of the comorbidities examined.

The insulin glargine arm was excluded from all analyses because of the imbalance in baseline variables after applying propensity score weighting (online supplemental table S7) and statistically significant differences for the falsification endpoint tests for the insulin glargine pairwise comparisons with the other four arms (all P<0.05; data not shown). After propensity score weighting, mean age was 60.7 (standard deviation 12.5) years in the canagliflozin arm, 61.5 (12.7) years in the glimepiride arm, 60.5 (12.3) years in the liraglutide arm, and 61.5 (12.7) years in the sitagliptin arm (table 1). Women comprised 47.7%, 47.8%, 49.9%, and 48.4% of the canagliflozin, glimepiride, liraglutide, and sitagliptin treatment arms, respectively. White participants comprised 61.3%, 60.6%, 62.8%, and 60.4% of the treatment arms, respectively. Mean baseline HbA1c levels were 8.3% (standard deviation 1.9), 8.3% (1.9), 8.1% (1.8), and 8.3% (1.9) in the canagliflozin, glimepiride, liraglutide, and sitagliptin arms, respectively. Online supplemental table S8 summarises the distribution of the exclusion criteria from GRADE across the treatment arms. All standardised mean differences were <0.1, except for index year.

Primary and secondary metabolic failure

Median follow-up until intention-to-treat censoring was 885 (95% confidence interval 850 to 929) days in the canagliflozin arm, 871 (846 to 894) days in the glimepiride arm, 853 (810 to 917) days in the liraglutide arm, and 883 (852 to 910) days in the sitagliptin arm (online supplemental figure S4). Median times to primary metabolic failure were 439 (95% confidence interval 400 to 489) days in the canagliflozin arm, 439 (426 to 453) days in the glimepiride arm, 624 (567 to 731) days in the liraglutide arm, and 461 (442 to 482) days in the sitagliptin arm (figure 1 and online supplemental table S9). Median time to secondary metabolic failure was also longest in the liraglutide arm (online supplemental figure S5 and online supplemental table S9). To confirm that sample sizes were adequate for subgroup analyses of all metabolic outcomes, we tabulated the weighted number of events per treatment arm (online supplemental table S10).

Figure 1

Cumulative incidence rate of primary metabolic failure by treatment arm (intention-to-treat approach). Primary metabolic failure was defined as time to first haemoglobin A1c concentration of ≥7.0%

In the Kaplan-Meier analysis, liraglutide was more effective in delaying the time to both primary and secondary metabolic failure than the other drugs (table 2). At one year, the estimated cumulative incidence rate of primary metabolic failure was 0.46 (95% confidence interval 0.44 to 0.48) in the canagliflozin arm, 0.45 (0.44 to 0.45) in the glimepiride arm, 0.37 (0.35 to 0.40) in the liraglutide arm, and 0.44 (0.43 to 0.45) in the sitagliptin arm. Similarly, the one year cumulative incidence rate of secondary metabolic failure was 0.27 (0.25 to 0.29) in the canagliflozin arm, 0.28 (0.27 to 0.29) in the glimepiride arm, 0.23 (0.21 to 0.26) in the liraglutide arm, and 0.28 (0.27 to 0.29) in the sitagliptin arm. These trends in cumulative incidence rates persisted at two years for both primary and secondary metabolic failures.

Table 2
Estimated cumulative incidence rates of primary and secondary metabolic failure by treatment arm (intention-to-treat approach)

Because the proportional hazards assumption was not met, indicating that the hazard ratios of the different drugs changed over time, we performed pairwise comparisons between the drug classes separately for the two time periods (figure 2 and online supplemental table S11). For glimepiride versus sitagliptin, glimepiride was more likely to reach primary and secondary metabolic failure during later years of treatment, with no significant difference between the two drugs during the first year of treatment. For liraglutide versus sitagliptin, liraglutide was less likely to reach primary and secondary metabolic failure during the first year of treatment, with no significant difference between the drugs in subsequent years. For canagliflozin versus glimepiride, canagliflozin was less likely to reach primary and secondary metabolic failure in later years of treatment, with no significant difference between the drugs during the first year of treatment. Canagliflozin and glimepiride were both more likely than liraglutide to reach primary metabolic failure in the first year; in subsequent years, the effect persisted for glimepiride, whereas there was no difference for canagliflozin.

Figure 2

Pairwise comparisons of treatment effects on primary metabolic failure across different time periods. CI=confidence interval

Other secondary outcomes

Insulin was started by 410 (8.4%) participants in the canagliflozin arm, 1900 (9.6%) in the glimepiride arm, 679 (14.5%) in the liraglutide arm, and 1307 (10.4%) in the sitagliptin arm. Online supplemental table S11 presents pairwise comparisons for starting insulin (ie, tertiary metabolic failure). Overall, 323 participants had visits to the emergency department or were admitted to hospital for hypoglycaemia during the study period, including <11 individuals in the liraglutide arm, precluding formal statistical analyses.

Online supplemental table S12 shows event rates for all other outcomes. Compared with canagliflozin, glimepiride had a higher risk for major adverse cardiovascular events (online supplemental table S13). Glimepiride had higher risks for all cause mortality and admission to hospital than sitagliptin and canagliflozin. Liraglutide had a lower risk for all cause mortality than sitagliptin. We found no significant differences between the groups for end stage kidney disease, retinopathy, neuropathy, other cardiovascular events, heart failure, pancreatitis, pancreatic and thyroid cancer, or any cancer.

Subgroup and sensitivity analyses

We examined the comparative risks of primary metabolic failure in subgroups of baseline levels of HbA1c and found larger hazard ratios for all pairwise comparisons at lower HbA1c levels, with the exception of canagliflozin and sitagliptin where no difference was seen (online supplemental table S14). Online supplemental table S15 shows event rates by subgroup, suggesting that sample sizes were likely adequate for these secondary analyses; formal power calculations were not conducted. Results were mostly similar for secondary metabolic failure.

We also conducted sensitivity analyses with the per protocol (online supplemental figure S6 and online supplemental table S16) and as treated (online supplemental figure S7 and online supplemental table S17) censoring approaches. Another glucose lowering drug was added before discontinuation of the assigned treatment in 727 (15%) participants in the canagliflozin arm, 2618 (13%) in the glimepiride arm, 1374 (29%) in the liraglutide arm, and 2252 (18%) in the sitagliptin arm. Results of the sensitivity analyses were consistent with the primary analyses. We found no significant differences among the treatment groups in the pneumonia, cholecystitis, and appendicitis falsification endpoints (online supplemental table S18).

Discussion

Principal findings

We used the target trial framework to emulate an expanded adaptation of the GRADE trial, based on observational data for a diverse population of participants treated under usual care conditions. Comparing the start of treatment with glimepiride, sitagliptin, liraglutide, or canagliflozin as second line agents in achieving and maintaining glycaemic control among adults with type 2 diabetes, we found that liraglutide was associated with a longer time to both primary (HbA1c ≥7.0%) and secondary (HbA1c >7.5%) metabolic failure than the other agents. We found no difference between the study arms in the rates of most microvascular and macrovascular complications, with two notable exceptions. First, glimepiride was associated with a higher risk of major adverse cardiovascular events compared with canagliflozin and with higher risks of all cause mortality and hospital admission compared with either sitagliptin or canagliflozin. Second, liraglutide was associated with a lower risk of all cause mortality compared with sitagliptin. Because a prospective randomised controlled trial comparing these drugs head to head is unlikely to be conducted, our findings fill an important knowledge gap in the clinical management of type 2 diabetes and highlight the potential for healthcare data generated as part of routine medical practice to provide important and timely insights about the comparative effectiveness and safety of commonly used drugs.

The six pairwise comparisons of glimepiride, sitagliptin, liraglutide, and canagliflozin showed that liraglutide was the most effective in achieving and maintaining HbA1c levels <7.0% and ≤7.5% in both the intention-to-treat and per protocol analyses. The greater effectiveness of liraglutide compared with other glucose lowering drugs is consistent with previous studies.28–34 Our study’s new contribution was the direct comparison of all four commonly used classes of glucose lowering drugs, including the sodium-glucose cotransporter 2 inhibitor canagliflozin. We found that liraglutide was the most effective, canagliflozin and sitagliptin were moderately effective, and glimepiride was least effective in delaying both primary and secondary metabolic failure. In the pairwise comparisons, the advantage of liraglutide over the other drugs was most pronounced in the first year of treatment and at lower baseline HbA1c values, narrowing with continued use and as baseline HbA1c levels increased. Among patients with baseline HbA1c levels ≥9%, for whom clinical guidelines recommend combination or insulin treatment,9 glimepiride was associated with an earlier time to both primary and secondary metabolic failure than sitagliptin (signalling a potential inadequacy of pancreatic insulin secretion in response to treatment with sulphonylureas at high glucose levels), whereas we found no difference among the other three drug classes in reaching HbA1c values <7.0% or ≤7.5% at these high levels of HbA1c.

The risk of starting insulin (ie, tertiary metabolic failure) was highest in individuals treated with liraglutide, particularly during the first year of treatment. This may reflect the greater potency of liraglutide (such that if liraglutide fails, people are started on insulin rather than other non-insulin drugs) and its injectable administration (ie, people already taking injectable drugs might have less hesitation in starting insulin than those treated with oral agents).

We found that glimepiride was associated with a significantly higher risk of death than sitagliptin (23% higher; P=0.002), liraglutide (131% higher; P<0.001), and canagliflozin (56% higher; P=0.008). The causes of these deaths are unknown; hypoglycaemia might have contributed, although rates of severe hypoglycaemic events requiring emergency department or hospital care were low in all groups. Previous data on the mortality risk associated with sulphonylureas have been inconsistent: meta-analyses of randomised controlled trials found no difference in the risk of mortality compared with placebo or active comparators,35 whereas observational studies suggested an increased risk.36 37 Previous studies also did not compare sulphonylureas with newer glucose lowering drugs, two of which (liraglutide and canagliflozin) have robust evidence of benefit for cardiovascular and all cause mortality. Liraglutide and canagliflozin have important cardiovascular, kidney, and metabolic benefits independent of their effect on glycaemic control, including reductions in cardiovascular events, cardiovascular death, progression of kidney disease, renal death, and hospital admission for heart failure.13 38 Our findings therefore support clinical guideline recommendations to consider glucagon-like peptide 1 receptor agonists and sodium-glucose cotransporter 2 inhibitors as second line agents for most people with diabetes, particularly those with cardiovascular and kidney comorbidities, and to use sulphonylureas cautiously when they cannot be avoided.

Strengths and limitations of this study

This work builds on the recently completed GRADE study16 and on our emulation of the GRADE trial based on data from the OptumLabs Data Warehouse,21 by adding a sodium-glucose cotransporter 2 inhibitor comparator arm and broadening the eligibility criteria to improve the generalisability of the findings. Sodium-glucose cotransporter 2 inhibitors, like glucagon-like peptide 1 receptor agonists, are increasingly recommended as second line, and even first line, glucose lowering agents because of their beneficial effects on kidney, heart failure, cardiovascular, weight, and mortality outcomes.13 Compared with glucagon-like peptide 1 receptor agonists, however, sodium-glucose cotransporter 2 inhibitors are less costly, are taken orally, and lack the gastrointestinal side effects of glucagon-like peptide 1 receptor agonists, making them highly suitable drugs in the management of type 2 diabetes. In the absence of head-to-head comparisons of the metabolic effects of sodium-glucose cotransporter 2 inhibitors and glucagon-like peptide 1 receptor agonists, patients and clinicians lack the information needed to choose between these drugs. Similarly, for individuals with no cardiovascular or kidney comorbidities or risk factors, the choice of glucose lowering treatment is often driven by metabolic considerations, for which comparative data have been lacking. Our finding that liraglutide has greater glycaemic effectiveness than other second line glucose lowering drugs therefore highlights the advantages of using liraglutide, and likely other glucagon-like peptide 1 receptor agonists, in the management of type 2 diabetes.

Our study had several limitations. Firstly, despite a larger and more heterogeneous study population than in our previous emulation of the GRADE trial,21 we could not adequately match people treated with insulin glargine with those treated with other second line drugs, so these individuals had to be excluded from the pairwise comparisons. This problem represents one of the biggest limitations of the target trial framework: analyses are limited to interventions that are adequately represented in clinical practice. This limitation, however, does not adversely affect the generalisability of our findings because insulin glargine is not used in this clinical context in routine care (in fact, identifying a population that is equally likely to be treated with insulin glargine or the other agents as second line treatments would likely result in a non-generalisable cohort). Secondly, even with rigorous causal inference methods, observational studies are subject to residual confounding. We assessed this risk with several falsification endpoint analyses. Although randomised controlled trials are the gold standard for evaluating the comparative efficacy and safety of interventions, observational data can be used to emulate idealised target trials when a randomised controlled trial is not feasible, practical, or ethical.39 40 Thirdly, the frequency and timing of HbA1c tests in routine practice are influenced by many factors, including baseline HbA1c level, perceived risk of deterioration in glycaemic control, and the person's capacity to complete testing, all of which could affect when metabolic failure is detected. Fourthly, HbA1c results are not available for all people in the OptumLabs Data Warehouse, so our analyses used a convenience sample of participants whose HbA1c tests were done by a commercial laboratory company that provided data to OptumLabs. Fifthly, not all factors that influence glycaemic control can be captured in real world data, so these could not be accounted for in the analyses. In addition, drugs obtained outside of health insurance benefits (ie, through low cost generic drug programmes, patient assistance programmes, or as drug samples) would have been missed, but we expect this practice to be rare in our cohort. Lastly, the study was conducted in Americans with private health plans (both employer sponsored and Medicare Advantage), and the results might not be fully generalisable to people with public health plans, those with no insurance coverage, or those outside of the US.
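Deciding that an arm such as insulin glargine cannot be adequately matched usually rests on a covariate balance check after weighting. One common diagnostic, assumed here because this section does not describe the authors' own criterion, is the weighted standardised mean difference of each baseline covariate, with an absolute value below 0.1 often taken to indicate adequate balance. The sketch assumes propensity scores have already been estimated (in the study, by gradient boosting) and uses average treatment effect weights for a two arm comparison; all function names and the threshold constant are hypothetical.

```python
import math


def ipw_weights(treated, propensity):
    """Inverse propensity score weights for an average treatment effect estimand.

    `propensity` is the estimated probability of receiving the index drug.
    """
    return [1.0 / p if t else 1.0 / (1.0 - p) for t, p in zip(treated, propensity)]


def weighted_mean_var(values, weights):
    """Weighted mean and weighted (population) variance of one covariate."""
    total = sum(weights)
    mean = sum(w * x for x, w in zip(values, weights)) / total
    var = sum(w * (x - mean) ** 2 for x, w in zip(values, weights)) / total
    return mean, var


def weighted_smd(values, treated, weights):
    """Standardised mean difference of a covariate between two weighted arms."""
    arm1 = [(x, w) for x, t, w in zip(values, treated, weights) if t]
    arm0 = [(x, w) for x, t, w in zip(values, treated, weights) if not t]
    m1, v1 = weighted_mean_var([x for x, _ in arm1], [w for _, w in arm1])
    m0, v0 = weighted_mean_var([x for x, _ in arm0], [w for _, w in arm0])
    pooled_sd = math.sqrt((v1 + v0) / 2.0)
    return (m1 - m0) / pooled_sd if pooled_sd > 0 else 0.0


# Assumed conventional cut-off for adequate balance; not taken from the paper.
BALANCE_THRESHOLD = 0.1


def is_balanced(values, treated, weights, threshold=BALANCE_THRESHOLD):
    """True if the absolute weighted SMD is below the threshold."""
    return abs(weighted_smd(values, treated, weights)) < threshold


if __name__ == "__main__":
    # Toy baseline HbA1c (%) where higher values make the index drug more likely
    hba1c = [9.0, 9.0, 7.0, 7.0, 7.0, 9.0]
    treated = [True, True, True, False, False, False]
    ps = [2 / 3, 2 / 3, 1 / 3, 1 / 3, 1 / 3, 2 / 3]
    print(round(weighted_smd(hba1c, treated, [1.0] * 6), 3))  # 0.707
    print(is_balanced(hba1c, treated, ipw_weights(treated, ps)))  # True
```

In practice the check would be repeated for every baseline covariate in each pairwise comparison; an arm for which weighting leaves large residual imbalance, or requires extreme weights, would be a candidate for exclusion.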

Conclusions

In this target trial emulation of the GRADE study framework, we found that liraglutide was more effective in achieving and maintaining glycaemic control as a second line glucose lowering drug than canagliflozin, sitagliptin, or glimepiride for people with type 2 diabetes. Our findings suggest that observational data and methods can be used to emulate clinical trials and examine the comparative effectiveness and safety of interventions in routine care.

Ethics approval

The study was exempt from approval by the Mayo Clinic institutional review board (IRB No 13-006907) because de-identified data were used in the study. Patient consent could not be obtained because the data were de-identified.