Research

Characteristics of non-randomised studies of drug treatments: cross sectional study

Abstract

Objective To examine the characteristics of comparative non-randomised studies that assess the effectiveness or safety, or both, of drug treatments.

Design Cross sectional study.

Data sources Medline (Ovid), for reports published from 1 June 2022 to 31 August 2022.

Eligibility criteria for selecting studies Reports of comparative non-randomised studies that assessed the effectiveness or safety, or both, of drug treatments were included. A randomly ordered sample was screened until 200 eligible reports were found. Data on general characteristics, reporting characteristics, time point alignment, and possible related biases were extracted with a piloted form inspired by reporting guidelines and the target trial emulation framework.

Results Of 462 reports of non-randomised studies identified, 262 studies were excluded (32% had no comparator and 25% did not account for confounding factors). To assess time point alignment and possible related biases, three study time points were considered: eligibility, treatment assignment, and start of follow-up. Of the 200 included reports, 70% had at least one possible bias, related to: inclusion of prevalent users in 24%, post-treatment eligibility criteria in 32%, immortal time periods in 42%, and classification of treatment in 23%. Reporting was incomplete, and only 2% reported all six of the key elements considered: eligibility criteria (87%), description of treatment (46%), deviations in treatment (27%), causal contrast (11%), primary outcomes (90%), and confounding factors (88%). Most studies used routinely collected data (67%), but only 7% reported using validation studies of the codes or algorithms applied to select the population. Only 7% of reports mentioned registration on a trial registry and 3% had an available protocol.

Conclusions The findings of the study suggest that although access to real world evidence could be valuable, the robustness and transparency of non-randomised studies need to be improved.

What is already known on this topic

  • The literature has highlighted particular concerns in non-randomised studies, such as reporting and bias, and occasionally has focused on specific conditions

  • Heterogeneity and limitations in the conduct, analysis, and reporting of non-randomised studies are not as well studied

What this study adds

  • Only 11% of non-randomised studies that assessed the effectiveness or safety, or both, of drug treatments had a comparator, accounted for confounding factors, and had no biases related to time point misalignment (or all biases were dealt with)

  • In a representative sample of 200 reports of non-randomised studies indexed in Medline, most studies had at least one possible bias related to time point misalignment (70%) and only 2% reported all six of the key elements considered

  • Most studies used routinely collected data (67%), but few reported using validation studies of the codes or algorithms applied to select the population (7%), mentioned registration on a trial registry (7%), or had an available protocol (3%)

How this study might affect research, practice, or policy

  • The robustness and transparency of non-randomised studies should be improved by providing tools for researchers to take advantage of the availability of routinely collected data, comprehensively report study elements, and facilitate the adoption of the target trial emulation framework

Introduction

Randomised controlled trials have long been considered the gold standard for assessing the effects of drug treatments. Whether these trials comprehensively describe the scope of real world clinical practice, however, has been questioned.1 Also, randomised controlled trials might not be feasible for dealing with specific clinical questions, such as those involving a particular population, or for providing timely evidence.2 3

The prominence of non-randomised studies has risen in recent years, specifically with the increase in real world data.4 5 Consequently, non-randomised studies can provide evidence on broader patient populations, various treatment regimens, long term outcomes, rare events, and harms.6 7 These studies can have a role in generating timely and cost effective evidence for comparative effectiveness research, providing insight for decision making on drug treatments in the real world setting.8–10 A study summarising the levels of evidence supporting clinical practice guidelines in cardiology found that 40% of 6329 recommendations were supported by level of evidence B (ie, supported by data from observational studies or one randomised controlled trial), with only a few recommendations supported by evidence from randomised trials.11

Non-randomised studies are susceptible to numerous limitations related to their design and analysis choices, which could result in effect estimates that are biased.12 Several guidelines have been developed for reporting non-randomised studies,13–15 and the target trial emulation framework has been developed to overcome the avoidable methodological pitfalls of traditional causal analysis of observational data, thus reducing the risk of bias.16 The framework suggests that a non-randomised study should be conceptualised as an attempt to emulate a hypothetical randomised controlled trial addressing a research question of interest, to make causal inference with observational data.16 This framework requires specifying key components of the target trial, such as time points of eligibility, treatment assignment, and start of follow-up. Failure to align these time points would impose a risk of bias in effect estimates.16

The literature highlights particular concerns in non-randomised studies, such as inadequate reporting, and occasionally focuses on specific conditions.17–20 But the heterogeneity and limitations in the conduct, analysis, and reporting of non-randomised studies, in a representative sample of reports, have not been well studied. In this study, our aim was to examine the characteristics of comparative non-randomised studies that assessed the effectiveness or safety, or both, of drug treatments. We focused on general characteristics, reporting characteristics, and time point alignment and possible related biases.

Methods

Design

This cross sectional study analysed a representative sample of reports, indexed in Medline, of comparative non-randomised studies that accounted for confounding, assessed the effectiveness or safety, or both, of drug treatments, and were published in June-August 2022. The protocol is registered in Open Science Framework (https://osf.io/sjauh).

Eligibility criteria and search strategy

We included reports of non-randomised studies: conducted in humans; aimed at assessing the effectiveness or safety, or both, of drug treatments; with a comparator arm (eg, active drug treatment comparator, standard of care, or no treatment); and reporting methods to account for at least one confounding factor (eg, multivariable regression, matching, or weighting). We excluded reports of specific publication types (ie, editorials, letters, and opinion pieces); reports of studies only assessing non-drug treatments (eg, alternative treatment, surgery, or vaccines); reports of specific study types (trials, case series, case reports, interrupted time series, and guidelines); and reports not written in English. We searched Medline (Ovid) on 29 September 2022. With the help of a medical librarian, we developed a search strategy that included both medical subject headings (MeSH) and keywords (online supplemental appendix 1). We combined terms for non-randomised studies (eg, cohort and real world evidence) and for pharmacological treatments (eg, drug treatments).

Study selection

The records identified from the search were exported to Microsoft Excel (Microsoft Corporation, Redmond, WA) and ranked in random order with a random number generator. One reviewer assessed the eligibility of the abstracts and full texts, with 20% done in duplicate and independently by a second reviewer. Any disagreements were resolved by discussion between the two reviewers or with a third reviewer. We sequentially screened records in batches of 500 until we identified 200 reports of eligible studies.
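To make the sampling procedure concrete, the sketch below shows one way the random ordering and batch screening could be implemented. It is an illustration only: the record structure, the `is_eligible` flag, and the fixed seed are assumptions, and in the study itself eligibility was judged by human reviewers rather than a precomputed flag.

```python
# Illustrative sketch (not the authors' code): randomly ordering search records
# and screening them in batches of 500 until 200 eligible reports are found.
import random

def order_and_batch(records, batch_size=500, target=200, seed=2022):
    rng = random.Random(seed)     # fixed seed so the random order is reproducible
    shuffled = records[:]
    rng.shuffle(shuffled)         # equivalent to ranking records by a random number
    eligible = []
    for start in range(0, len(shuffled), batch_size):
        batch = shuffled[start:start + batch_size]
        # in practice, eligibility is judged by reviewers; here it is a hypothetical flag
        eligible.extend(r for r in batch if r["is_eligible"])
        if len(eligible) >= target:
            break
    return eligible[:target]
```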

Data extraction

We developed and piloted a standardised data extraction form (online supplemental appendix 2) inspired by RECORD (Reporting of studies Conducted using Observational Routinely collected health Data),13 STROBE (Strengthening the Reporting of Observational Studies in Epidemiology),14 CONSORT (Consolidated Standards of Reporting Trials),21 the target trial emulation framework,16 STaRT-RWE (structured template and reporting tool for real world evidence),22 ROBINS-I (Risk Of Bias In Non-randomised Studies - of Interventions),23 and previous studies.24 Data were extracted by two trained researchers, independently and in duplicate for 20% of reports and with verification of the extracted data for the remaining 80%. Any disagreements were resolved by discussion or with a third reviewer.

General characteristics

We extracted study characteristics (eg, medical area, region, and contribution of a statistician or methodologist, determined from author affiliations or explicit reporting). We also extracted data on the research question (eg, explicit statement) and its elements: population (eg, patients with chronic diseases), intervention (eg, start of treatment), comparator (eg, active comparator), and outcomes (eg, effectiveness). We extracted data on study design (eg, cohort), type of data used (eg, routinely collected data), and sources (eg, electronic health records). For the study design, we did not use the descriptions reported by the authors but instead judged the study design based on prespecified criteria (online supplemental appendix 3). We extracted data on the funding source, conflict of interest statement, setting (eg, primary), centre (eg, number of centres), participants (eg, number analysed), and follow-up time. We extracted data on any reference to registration, access to the protocol, content of reported changes to the protocol, data sharing statement (when available), access to codes and algorithms, and reference to ethical review.

Reporting characteristics

We considered six key study elements for reporting: (1) eligibility criteria (ie, explicit reporting of inclusion and exclusion criteria); (2) description of treatment (ie, explicit reporting of dose, frequency, and length of treatment); (3) deviations in treatment (ie, explicit reporting of differences between the definition of treatment and the actual or received treatment); (4) causal contrast or estimand (ie, explicit reporting); (5) primary outcome (ie, explicit reporting of an outcome to be the main or primary outcome in the methods section of the paper); and (6) confounding factors (ie, explicit reporting of the confounding factors accounted for, as part of the methods or results sections in the main text or in the online supplemental material). We also noted whether the reports explicitly stated the use of a reporting guideline (eg, RECORD). We extracted additional data on specific items related to the participants, treatment, outcomes, and confounding (box 1). Online supplemental appendix 2 shows the data extraction instruction sheet.

Box 1

Definitions of specific items

Validation studies of codes or algorithms

We recorded whether studies that used routinely collected data reported using validation studies assessing the sensitivity and specificity of the codes or algorithms applied to select the population (ie, when the studies explicitly stated this or cited validation studies). Appropriate sensitivity and specificity are important to avoid misclassification and maximise detection of the population of interest.
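As a worked illustration of what such a validation study quantifies, the sketch below computes sensitivity and specificity for a hypothetical case-finding algorithm checked against a reference standard (eg, chart review); the counts are invented for illustration.

```python
# Hypothetical validation counts for a code-based case-finding algorithm
true_positives, false_negatives = 90, 10   # reference-standard cases flagged / missed by the codes
true_negatives, false_positives = 950, 50  # reference-standard non-cases correctly / wrongly excluded

sensitivity = true_positives / (true_positives + false_negatives)   # 0.90
specificity = true_negatives / (true_negatives + false_positives)   # 0.95
print(f"sensitivity={sensitivity:.2f}, specificity={specificity:.2f}")
```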

Eligibility for any treatment arm

We considered that studies reported eligibility for any treatment arm when the authors explicitly stated that participants should not have any contraindications to any treatment arm or that participants could receive any treatment arm, to ensure clinical equipoise (ie, individuals having an equal probability of being allocated to any of the treatment groups).

Negative controls

We determined whether negative controls were explicitly stated. Negative controls are proxies for an unmeasured confounder. A negative control outcome is a variable known not to be causally affected by the treatment of interest. A negative control exposure is a variable known not to causally affect the outcome of interest.

E value

We determined whether E values were computed. The E value is a sensitivity analysis method that represents the extent to which an unmeasured confounder would have to be associated with both treatment and outcome to nullify the observed treatment-outcome association. The E value is a useful concept for assessing the robustness of non-randomised studies.
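For illustration, a minimal sketch of the E value calculation for a risk ratio is shown below, following the published formula E = RR + sqrt(RR × (RR − 1)); the example risk ratio is hypothetical and not drawn from any included study.

```python
# Minimal sketch of the E value for a risk ratio, shown for illustration only;
# the included studies used their own implementations.
import math

def e_value(risk_ratio: float) -> float:
    rr = risk_ratio if risk_ratio >= 1 else 1 / risk_ratio  # take the inverse for protective effects
    return rr + math.sqrt(rr * (rr - 1))

# Example: an observed risk ratio of 2.0 gives an E value of about 3.41, ie, an unmeasured
# confounder would need risk ratio associations of at least 3.41 with both treatment and
# outcome to fully explain away the observed association.
print(round(e_value(2.0), 2))  # 3.41
```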

Time point alignment and possible related biases

We relied on the target trial emulation framework.16 We determined whether the three time points were identifiable: eligibility (when patients fulfil the eligibility criteria), treatment assignment (when patients are assigned to one of the treatment strategies), and start of follow-up (when outcomes in participants are assessed). We then considered whether these time points were aligned and, if not, the biases that could occur, such as bias related to: inclusion of prevalent users (selection bias), post-treatment eligibility (selection bias), immortal time periods, and classification of treatment arms. We also recorded the methods reported to deal with these possible biases and whether the reports explicitly mentioned these biases. Online supplemental appendix 4 provides a detailed description of biases related to time point misalignment and online supplemental appendix 5 describes the methods to deal with bias related to immortal time periods or classification of treatment arms.
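The sketch below is a simplified, conceptual illustration (not part of our extraction form) of how misalignment of the three time points can be flagged for a single participant record; the field names and dates are hypothetical, and real assessments require study-level judgment rather than date comparisons alone.

```python
# Conceptual sketch of time point alignment checks for one participant record
from datetime import date

def check_alignment(eligibility: date, assignment: date, follow_up_start: date):
    issues = []
    if assignment > follow_up_start:
        # treated person-time between start of follow-up and assignment cannot contain the
        # outcome in the treated arm: a possible immortal time period
        issues.append("possible immortal time: treatment assigned after start of follow-up")
    if eligibility > min(assignment, follow_up_start):
        # eligibility judged with information arising after time zero: possible selection bias
        issues.append("possible post-treatment eligibility")
    if not (eligibility == assignment == follow_up_start):
        issues.append("time points not aligned")
    return issues or ["time points aligned"]

# Example: eligibility and follow-up start on 1 Jan 2022, treatment assigned on 1 Mar 2022
print(check_alignment(date(2022, 1, 1), date(2022, 3, 1), date(2022, 1, 1)))
```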

Data synthesis

The data were synthesised narratively and in tabular formats. We report descriptive statistics with frequencies and percentages for categorical outcomes and medians with interquartile ranges (25th-75th percentiles) for continuous outcomes. We used IBM SPSS Statistics version 21 for the data analysis.
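As a minimal sketch of these descriptive summaries (the analysis itself was done in SPSS, and the variable names and values below are invented), categorical items were tabulated as counts with percentages and continuous items as medians with 25th-75th percentiles:

```python
# Illustrative descriptive summaries with hypothetical data
from collections import Counter
from statistics import median, quantiles

study_design = ["cohort", "cohort", "case-control", "cohort"]
follow_up_months = [3.0, 17.6, 43.0, 12.0]

for category, n in Counter(study_design).items():
    print(f"{category}: n={n} ({100 * n / len(study_design):.0f}%)")

q1, q2, q3 = quantiles(follow_up_months, n=4)  # quartiles; q2 equals the median here
print(f"follow-up: median {median(follow_up_months)} months (IQR {q1}-{q3})")
```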

Patient and public involvement

The study was about the methods of studies and needed methodological and statistical expertise; therefore, the involvement of patients and the public was not possible. The results of this study will be disseminated on the research institute's website and publicised on social media.

Results

Of the 26 123 reports retrieved from the search and randomly ordered, we screened 6800 records and identified 462 reports of non-randomised studies that assessed the effectiveness or safety, or both, of drug treatments. Of these, 57% were excluded: 148 (32%) reports had no comparator and 114 (25%) reports did not account for confounding factors. Overall, we included 200 reports of non-randomised studies that had a comparator and accounted for confounding factors. Online supplemental appendix 6 shows a flowchart of the selection of the reports included in our study and online supplemental appendix 7 lists the reports of the 200 non-randomised studies.

General characteristics

Study characteristics

The reports mainly assessed treatments in the specialties of oncology (n=54, 27%), infectious diseases (n=41, 21%), and cardiology (n=24, 12%). The studies were conducted mainly in Central and East Asia and the Pacific region (n=82, 41%), Europe (n=59, 30%), and North America (n=50, 25%). Most studies were published in specialised medical journals (n=152, 76%) and only 47% (n=94) included a statistician or methodologist among the authors or in the acknowledgements. Online supplemental appendix 8 shows a summary table of the general characteristics of the included reports of non-randomised studies.

Research question

The research question was explicitly stated in 83% of reports (n=166). The population in most reports comprised patients with chronic diseases (n=118, 59%). Most studies assessed drugs with long standing approval (n=181, 91%). Diverse types of treatment strategies were assessed, such as the start of treatment (n=88, 44%), static treatment strategies (n=43, 22%) (eg, antibiotic treatment until discharge), dynamic treatment strategies (n=32, 16%) (eg, treat-to-target strategies), and different lengths of treatment (n=11, 6%). The comparator was mainly an active comparator (n=78, 39%) or usual care or no treatment (n=72, 36%). Also, half of the reports focused on both effectiveness and safety (n=99, 50%). Table 1 provides a summary of the characteristics of the reports of non-randomised studies.

Table 1 | General characteristics of included reports of non-randomised studies (n=200)

Study design and data

The reports were mostly of cohort studies (n=189, 95%). Most reports used routinely collected data (n=126/188, 67%) from various sources, such as electronic health records (n=49/137, 36%), registries (n=32/137, 23%), and administrative data (n=30/137, 22%) (table 1). More than half of the studies were conducted in a tertiary setting (n=102, 54%) (online supplemental appendix 8). Of those that reported the number of centres, half were conducted in one centre (n=72/135, 53%). The median number of participants included was 949 (interquartile range 288-9881) and the median number of participants analysed was 633 (216-7708). Median follow-up time was 17.6 months (3-43 months). The study was funded by governmental sources in 31% of reports (n=52/168) and received no funding in 24% (n=41/168). In 72% of reports (n=141/196), the authors declared no conflict of interest (online supplemental appendix 8).

Only 7% (n=14) of reports mentioned registration in a trial registry and 3% (n=5) had an available protocol. More than half had a data sharing statement (n=123, 62%), the most common statements being that data were available on reasonable request (n=61/123, 50%) or that data might be obtained from a third party (n=27/123, 22%). Only 5% (n=9) provided access to the codes or algorithms used to classify interventions and outcomes. Most reports mentioned obtaining ethical approval (n=167/190, 88%). In the abstract, a third of the reports used causal language (n=69, 35%). Online supplemental appendix 8 provides a summary of these characteristics.

Reporting characteristics

Figure 1 shows the reporting of key study elements. Only 2% of studies reported all of the key study elements (n=3). Only 11% (n=21) of reports mentioned adherence to reporting guidelines.

Figure 1

Reporting of key study elements for each report (n=200). Each horizontal line corresponds to one included report. Top right panel=a specific colour was attributed to each of the six key study elements, and the colour band shows which of these items were reported for each included report; the 200 included reports of non-randomised studies were sorted according to the total number of reported items, in decreasing order. Top left panel=distribution of total number of reported items for the 200 included reports. Bottom panel=proportion of reports that reported each element

Participants

Eligibility criteria and sources for selection of participants were reported in most reports (n=174 (87%) and n=189 (95%), respectively) (table 2). Only 7% (n=10/137) of reports based on routinely collected data reported using validation studies of the codes or algorithms applied to select the population. Only 13% (n=25) reported that participants did not have any contraindications to any of the treatment arms (ie, participants should be eligible for all treatment arms). Some reports mentioned sample size calculation (n=20, 10%) and 21% (n=41) reported a structured sampling method of the population.

Table 2 | Reporting characteristics of the included reports (n=200)

Treatment

Less than half of the reports described the treatment (ie, explicitly reported the dose, frequency, and length of treatment) (n=92, 46%) (table 2). Deviations in treatment were defined in 27% (n=53) and reported in 26% (n=52) of reports, of which 34% (n=22/64) excluded participants from the analysis.

Causal contrast, outcomes, and confounding factors

The causal contrast or estimand was reported in 11% (n=21) of reports (table 2). Primary outcomes were identified in 90% of reports (n=179), and were identified and defined (ie, included details on the method of assessment or on prespecified time points of assessment) in 79% (n=157). Confounding factors were clearly reported in 88% (n=175) of reports, but were mostly listed without justification (n=115/186, 62%). Of the 52 reports that used statistical methods to identify confounding factors, the most common method was choosing variables from the univariate analysis, with a P value <0.05 (n=17/52, 33%) or a different cut-off value (n=20/52, 39%). Several methods were used to account for confounding factors, and in 39% (n=78) of reports more than one method was used. These methods included: matching (n=73, 37%; of which 85% (n=62) used propensity scores); stratification or regression (n=178, 89%); and inverse probability weighting (n=29, 15%; of which 79% (n=23) used propensity scores). Only three reports (2%) mentioned the E value and two (1%) mentioned using negative control outcomes.
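As a generic, self-contained illustration of the approach most often combined with propensity scores in the included reports (inverse probability of treatment weighting), and not code from any included study, the sketch below simulates data with a single measured confounder and recovers the treatment effect with weighting; all variable names and values are hypothetical.

```python
# Illustrative sketch of propensity score estimation followed by inverse probability weighting
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 1000
confounder = rng.normal(size=n)                            # one measured confounder
treated = rng.binomial(1, 1 / (1 + np.exp(-confounder)))   # treatment depends on the confounder
outcome = 0.5 * treated + confounder + rng.normal(size=n)  # simulated treatment effect of 0.5

# Propensity score: probability of treatment given the measured confounder
ps = LogisticRegression().fit(confounder.reshape(-1, 1), treated).predict_proba(
    confounder.reshape(-1, 1))[:, 1]

# Inverse probability of treatment weights: 1/ps for treated, 1/(1-ps) for untreated
weights = np.where(treated == 1, 1 / ps, 1 / (1 - ps))

# Weighted difference in mean outcome between arms estimates the average treatment effect
ate = (np.average(outcome[treated == 1], weights=weights[treated == 1])
       - np.average(outcome[treated == 0], weights=weights[treated == 0]))
print(f"IPW estimate of treatment effect: {ate:.2f}")  # expected to be close to the simulated 0.5
```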

Time point alignment and possible related biases

In most reports (n=189, 95%), the time points for eligibility, treatment assignment, and start of follow-up were identifiable (online supplemental appendix 9). Figures displaying the three time points were presented as study design diagrams (n=12, 6%) or participant flowcharts (n=51, 26%). The time points were not aligned in 72% (n=143) of reports. Methods to deal with possible biases arising from the misalignment were applied in 11% (n=15/143) of these reports, but the biases were completely dealt with in only three reports. Online supplemental appendix 5 lists the methods used to deal with bias.

Overall, 70% (n=140) of reports had at least one possible bias, 6% (n=11) could not be assessed because of inadequate reporting, and only 25% (n=49) had no bias (figure 2). We identified bias related to inclusion of prevalent users in 24% (n=47) of reports, post-treatment eligibility in 32% (n=63), immortal time periods in 42% (n=84), and classification of treatment in 23% (n=46).

Figure 2

Presence of possible biases related to time point misalignment (n=200). Possible biases for each report of non-randomised studies are summarised for bias related to: inclusion of prevalent users, post-treatment eligibility criteria, immortal time periods, and classification of treatment arms. Each spoke represents one report. The bricks are a visual representation of the possible bias related to time point misalignment: possible bias, could not assess, or no bias. Each concentric circle represents one of the biases, with bias related to inclusion of prevalent users being the circle furthest from the centre and bias related to classification of treatment arms being the central circle. The outermost circle represents an overview of the possible biases for each report (at least one possible bias exists, could not assess, or no bias exists). The histogram summarises the possible biases for the 200 reports

Post-treatment eligibility included having a specific length of follow-up time (n=28, 14%) or an event during follow-up (n=39, 20%), or both (online supplemental appendix 10). Immortal time periods were related to sequential eligibility criteria (n=27, 14%), a requirement to use the treatment during follow-up (n=43, 22%), and presence of grace periods (n=48, 24%). The median grace period was 1 month (interquartile range 0.1-5.7), and 44% (n=21/48) of reports had a difference in grace periods between treatment arms. The authors explicitly stated the presence of at least one possible bias in only 17% (n=24/140) of reports and all possible biases in 4% (n=5/140) of reports.

Discussion

Principal findings

Our study provides a detailed description of the general characteristics, reporting characteristics, and time point alignment and possible related biases, in a representative sample of non-randomised comparative studies assessing drug treatments, indexed in Medline. Most of the reports were of cohort studies conducted in patients with chronic diseases. The reports commonly compared start of treatment with usual care or no treatment, or with other active treatments, assessing both effectiveness and safety outcomes. Most of the reports used routinely collected data, but half were conducted in one centre or in a tertiary setting. Also, reporting of key study elements, such as a description of the treatment, was often missing. Most of the reports had at least one possible bias. In summary, of the reports of studies that assessed the effectiveness or safety, or both, of drug treatments, only 11% had a comparator, accounted for confounding factors, and had no possible bias related to time point misalignment (or all biases were dealt with).

Comparison with other studies

Our findings are in agreement with the literature. One study evaluating the reporting quality of cohort studies based on real world data reported limited transparency: only 24% of studies had an available study protocol and 20% had available raw data.25 Also, a study found poor reporting of eligibility criteria in cohort studies (14%).26 Other key areas with suboptimal reporting were variables and their assessment, description of outcomes, statistical methods, biases, and confounding.25–27 Similarly, a recent systematic review of target trial emulation studies found that the reporting of how the target trial was emulated was inconsistent across the studies identified.20 The literature also highlights the presence of biases in non-randomised studies. One study found that 25% of studies were at high risk of selection bias or immortal time bias, or both, and only five of these studies described solutions to mitigate these biases.24 Also, a scoping review of pharmacoepidemiological studies analysing healthcare data found that 25% of 117 studies mentioned the presence of immortal time bias,28 which was not the case in our study, because only a minority of reports stated the presence of a possible bias.

Interpretation of the findings of this study

The use of routinely collected data has been encouraged in the past, with claims of increasing generalisability to the real world and better adaptation in assessing long term outcomes.3 29 Our findings showed that most non-randomised studies had important limitations related to the accessibility or quality of the available routinely collected data, poor methodological conduct, poor reporting, or a combination of these factors. In addition, elements that increase confidence in the results (eg, use of validation studies and eligibility for any treatment arm) were rarely reported.

Reports of non-randomised studies also lacked transparency as key study elements were not reported adequately. The reasons behind poor reporting should be questioned because these key elements are the basis of every study and, if completed, would improve the quality, reproducibility, and applicability of research.30 31

The possible biases identified can be avoided by thorough planning and explicit reporting to align the three time points. Although aligning the time points of eligibility, treatment assignment, and start of follow-up might sometimes be challenging, many approaches have been proposed in the target trial emulation framework. Moreover, although we only included reports of studies that accounted for confounding factors, we have highlighted the inadequacy of the methods used to select these factors (eg, including significant variables from univariate analysis), raising concerns about the presence of bias related to confounding. Also, with limited access to study protocols, we could not compare what was planned with what was conducted, raising the possibility of selective outcome reporting.

Strengths and limitations of this study

Our study had several strengths. We described a representative sample of non-randomised studies indexed in Medline, covering all medical disciplines, without focusing on specific conditions. Also, we included a wide scope of information from the reports, while using rigorous quality control measures for data extraction.

Our study had some limitations. We did not assess all of the biases that might have been present, particularly those more relevant to case-control studies, such as inappropriate adjustment for covariates, because we focused only on possible biases related to time point misalignment. Our data extraction and assessments were based on the reporting of studies, which might not always reflect how the study was truly conducted. Also, data extraction was done independently and in duplicate for only 20% of reports, with the remaining 80% checked by data verification. Finally, we included only studies indexed in Medline within a specific period of time (three months in 2022).

Study implications for practice

Researchers should take advantage of the availability of real world data to show the true value of real world evidence. One approach is to provide tools for researchers specific to the use of routinely collected data, covering elements of the whole study process (eg, conception, design, and conduct). Also, although reporting guidelines for non-randomised studies (eg, RECORD and ESMO GROW (European Society for Medical Oncology-Guidance for Reporting Oncology real World evidence)32) and protocol harmonisation (eg, HARPER (HARmonised Protocol Template to Enhance Reproducibility)33) have been emphasised, we advocate for the development of more comprehensive guidelines that include elements specific to non-randomised studies in comparative effectiveness research, because these studies have distinct methodological problems that require additional considerations. Moreover, when applicable, researchers should adopt and properly apply the target trial emulation framework, which highlights the need to clearly define the research question, have a well designed protocol with explicitly stated components, and implement appropriate statistical analysis methods. Tools should be developed to facilitate and guide the planning of studies, helping researchers to explicitly define the research question, comprehensively state the components of the study, and determine an appropriate statistical analysis plan.

Conclusions

Non-randomised studies assessing the effectiveness, safety, or both, of drug treatments are becoming increasingly important as a source of evidence. As the literature shifts more towards non-randomised studies, however, specifically with the increase in access to routinely collected data, reassessing their conduct and reporting is important. While recognising the value of real world evidence, the robustness, quality, and transparency of non-randomised studies need to be improved.

Ethics approval

Ethical approval was not required for this study.