
Consistency of covid-19 trial preprints with published reports and impact for decision making: retrospective review
Dena Zeraatkar (1), Tyler Pitre (2), Gareth Leung (3), Ellen Cusano (4), Arnav Agarwal (5), Faran Khalid (2), Zaira Escamilla (2), Matthew Adam Cooper (6), Maryam Ghadimi (2), Ying Wang (7), Francisca Verdugo-Paiva (8, 9), Gabriel Rada (8), Elena Kum (1), Anila Qasim (1), Jessica Julia Bartoszko (2), Reed Alexander Cunningham Siemieniuk (1), Chirag Patel (10), Gordon Guyatt (2), Romina Brignardello-Petersen (1, 11)

  1. Health Research Methods, Evidence, and Impact, McMaster University, Hamilton, ON, Canada
  2. McMaster University, Hamilton, ON, Canada
  3. University of Ottawa, Ottawa, ON, Canada
  4. Internal Medicine Residency Program, University of Calgary Cumming School of Medicine, Calgary, AB, Canada
  5. Department of Medicine, University of Toronto, Toronto, ON, Canada
  6. Department of Medicine, University of Alberta Faculty of Medicine and Dentistry, Edmonton, AB, Canada
  7. Department of Pharmacy, Beijing Chao-Yang Hospital, Capital Medical University, Beijing, China
  8. Epistemonikos Foundation, Santiago, Chile
  9. UC Evidence Centre, Cochrane Chile Associated Centre, Pontificia Universidad Católica de Chile, Santiago, Chile
  10. Biomedical Informatics, Harvard Medical School, Boston, MA, USA
  11. Faculty of Dentistry, University of Chile, Santiago, Chile

Correspondence to Dr Dena Zeraatkar, Health Research Methods, Evidence, and Impact, McMaster University, Hamilton, ON, Canada; dena.zera@gmail.com

Abstract

Objective To assess the trustworthiness (ie, complete and consistent reporting of key methods and results between preprint and published trial reports) and impact (ie, effects of preprints on meta-analytic estimates and the certainty of evidence) of preprint trial reports during the covid-19 pandemic.

Design Retrospective review.

Data sources World Health Organization covid-19 database and the Living Overview of the Evidence (L-OVE) covid-19 platform by the Epistemonikos Foundation (up to 3 August 2021).

Main outcome measures Comparison of characteristics of covid-19 trials with and without preprints, estimates of time to publication of covid-19 preprints, and description of differences in reporting of key methods and results between preprints and their later publications. For the effects of eight treatments on mortality and mechanical ventilation, the study comprised meta-analyses including and excluding preprints at one, three, and six months after the first trial addressing the treatment became available either as a preprint or publication, and at the longest follow-up available (up to 3 August 2021), for 120 meta-analyses in total (60 of which included preprints and 60 of which excluded preprints), and assessed the certainty of evidence using the GRADE framework.

Results Of 356 trials included in the study, 101 were only available as preprints, 181 as journal publications, and 74 as preprints first and subsequently published in journals. The median time to publication of preprints was about six months. Key methods and results showed few important differences between trial preprints and their subsequent published reports. Apart from two (3.3%) of 60 comparisons, point estimates were consistent between meta-analyses including preprints versus those excluding preprints as to whether they indicated benefit, no appreciable effect, or harm. For nine (15%) of 60 comparisons, the rating of the certainty of evidence was different when preprints were included versus being excluded—the certainty of evidence including preprints was higher in four comparisons and lower in five comparisons.

Conclusion No compelling evidence indicates that preprints provide results that are inconsistent with published papers. Preprints remain the only source of findings of many trials for several months—a delay that is unsuitable in a health emergency and not conducive to treating patients with timely evidence. The inclusion of preprints could affect the results of meta-analyses and the certainty of evidence. Evidence users should be encouraged to consider data from preprints.

  • COVID-19
  • Public health



This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/.


WHAT IS ALREADY KNOWN ON THIS TOPIC

  • Clinicians and decision makers need rapidly available, credible information on the comparative effectiveness of treatments and prophylaxis for covid-19

  • During the covid-19 pandemic, the scientific community adopted preprint servers, which allow the rapid dissemination of research findings before publication in peer reviewed journals

  • The medical community, however, has been cautious about adopting preprints owing to concerns that they could lead to the dissemination of erroneous provisional findings

WHAT THIS STUDY ADDS

  • After a review of covid-19 trial preprints and published reports, no compelling evidence indicated any important discrepancies between preprints and published reports

  • The inclusion of preprints could affect the results of meta-analyses and the certainty (quality) of evidence

  • Generalisability of these results is limited to covid-19; furthermore, preprints that are subsequently published in journals might be the most rigorous and might not represent all trial preprints—particularly those that remain unpublished

HOW THIS STUDY MIGHT AFFECT RESEARCH, PRACTICE, OR POLICY

  • Evidence users—including systematic reviewers, guideline developers, and clinicians—are encouraged to consider evidence from preprint trials in contexts in which decisions are being made rapidly and evidence is being produced faster than can be peer reviewed and published

  • Scepticism might still be warranted when suspicion arises regarding falsified data (for which criteria are provided in this article)

Introduction

During the covid-19 pandemic, the scientific community adopted preprint servers, which allow investigators to disseminate research findings before publication in peer reviewed journals. Authors of seminal covid-19 trials—for example, those from massive international collaborations such as RECOVERY1–4 and SOLIDARITY5—reported their results in preprints before subsequent publication in journals.

Growing interest in preprints predates the covid-19 pandemic.6 7 Researchers and evidence users have raised concerns that the traditional publication model is slow, that peer review might not always improve the quality of manuscripts, and that journals impede dissemination through paywalls and high publication fees while encouraging publication bias by prioritising significant or anomalous findings—issues that preprints may avoid.8–14 Despite their potential to resolve these problems, preprints might disseminate provisional findings that contain important errors that, presumably, published papers do not—the medical community has therefore been cautious about their adoption.15 16

Authors of systematic reviews, guideline developers, and other decision makers face a trade-off when considering preprints: on the one hand, including preprints could reduce the credibility of evidence syntheses and risk serious errors if important differences appear in published reports; on the other, including preprints might increase the precision of estimates, allow timely dissemination of research, and minimise the effects of publication bias.

Knowledge of the extent to which preprints might accelerate the dissemination of findings, the frequency and nature of discrepancies between preprints and subsequent published reports, and the impact that preprints might have on meta-analytic estimates could inform the trade-off that evidence users face. Our study capitalises on our living systematic reviews and network meta-analyses (SRNMAs) of drug treatments, antiviral antibodies and cellular treatments, and prophylaxis for covid-19—an initiative launched in July 2020 that provides real time summaries of the comparative effectiveness of treatments and prophylaxis for covid-19.17–19 These living SRNMAs informed linked guidelines for covid-19 treatments and prophylaxis.20 We use these reviews to assess the degree of discrepancies between covid-19 trial preprints and their later publications and to assess the effects of considering evidence from preprints on meta-analytic estimates, certainty (quality) of evidence, and decision making.

Methods

We submitted a protocol for this study for publication on 7 September 2021. Because the protocol was still under review when the study was completed, we withdrew it from consideration for publication and present it in online supplement 1.


Search

Our study uses the search strategy of our living SRNMA, which includes daily searches in the World Health Organization covid-19 database—a comprehensive multilingual source of global published and preprint literature on covid-19 (https://search.bvsalud.org/global-literature-on-novel-coronavirus-2019-ncov/). Before 9 October 2020, when it was merged into the WHO covid-19 database, we also searched the US Centers for Disease Control and Prevention (CDC) covid-19 research articles downloadable database. A validated machine learning model facilitates efficient identification of randomised trials.21

Our search was supplemented by ongoing surveillance of living evidence retrieval services, including the Living Overview of the Evidence (L-OVE) covid-19 platform by the Epistemonikos Foundation (https://app.iloveevidence.com/loves/5e6fdb9669c00e4ac072701d) and the Systematic and Living Map on Covid-19 Evidence by the Norwegian Institute of Public Health (https://www.fhi.no/en/qk/systematic-reviews-hta/map/). Using the above sources, we monitor for retraction notices. Online supplement 2 includes additional details of our search strategy. This study included trials identified up to 3 August 2021.

Study selection

As part of the living SRNMA, pairs of reviewers, after calibration exercises to ensure sufficient agreement, worked independently and in duplicate to screen titles and abstracts of search records and subsequently the full texts of records determined as potentially eligible at the title and abstract screening stage. Reviewers also linked preprint reports with their subsequent publications based on trial registration numbers, the names of investigators, recruiting centres and countries, dates of recruitment, and baseline patient characteristics. When links between preprints and subsequent publications were unclear, we contacted trial authors for confirmation. Reviewers resolved discrepancies by discussion or, when necessary, by adjudication with a third party reviewer.

Eligible preprint and peer reviewed articles reported trials that randomised patients with suspected, probable, or confirmed covid-19 to drug treatments, antiviral antibodies and cellular treatments, placebo, or standard care, or reported trials that randomised healthy participants exposed or unexposed to covid-19 to prophylactic drugs, standard care, or placebo. We did not apply any restrictions on severity of illness, setting, or language of publication but excluded trials reporting on nutritional interventions, traditional Chinese herbal medicines without standardisation in formulations and dosing across batches, and non-drug supportive care interventions.

We did not perform a sample size calculation because we included all eligible trial reports identified through our living SRNMAs up to 3 August 2021. While the parallel living SRNMA performed ongoing daily searches, we pragmatically limited our search because it was no longer feasible to continue to collect additional data from preprints beyond this timepoint.

Data collection

As part of the living SRNMA, for each eligible trial, pairs of reviewers, after training and calibration exercises, independently extracted trial characteristics, methods, and results using a standardised, pilot tested data extraction form. To assess risk of bias, reviewers, after training and calibration exercises, used a revision of the Cochrane tool for assessing risk of bias in randomised trials (RoB 2.0)22 (online supplement 3). Reviewers resolved discrepancies by discussion and, when necessary, by adjudication with a third party.

For the current study, pairs of trained and calibrated reviewers, working independently and in duplicate and using a pilot tested data collection form, collected data on differences in key methods and results between preprint and published trial reports. We prioritised collecting information on key methods and results that might affect the interpretation of trials and decision making by evidence users. For key methods, we focused on aspects of the methods that could affect risk-of-bias judgments: description of the randomisation process and allocation concealment, blinding of patients and healthcare providers, extent and handling of missing outcome data, blinding of outcome assessors and adjudicators, and prespecification of outcomes and analyses. Key results included, for dichotomous outcomes, the number of participants analysed and the number of events in each trial arm and, for continuous outcomes, the number of participants analysed, means or medians, and measures of variability. We focused on the same outcomes as our living SRNMA and linked guidelines that were identified as important or critical for decision making by the review authors and authors of the parallel guidelines, including patient partners: mortality, mechanical ventilation, adverse events leading to discontinuation, admission to hospital, viral clearance, hospital length of stay, length of stay in intensive care, duration of mechanical ventilation, time to symptom resolution or clinical improvement, days free from mechanical ventilation, and time to viral clearance.17–20 For preprints with more than one version, we extracted data from the first version, which is the least likely to have been modified in response to peer review.

Because risk of bias might vary across outcomes, for this analysis we presented risk-of-bias judgments corresponding to the following hierarchy of outcomes for therapy trials: mortality, mechanical ventilation, duration of hospital stay, time to symptom resolution or clinical improvement, and virological outcomes. For prophylaxis trials, we used the following hierarchy: mortality, laboratory confirmed and suspected covid-19 infection, and laboratory confirmed covid-19 infection. These hierarchies represent the relative importance of outcomes based on rankings made by the linked WHO guideline panel.20

Data synthesis and analysis

We compared the characteristics and risk of bias of trials with preprints, trials with publications, and trials first posted as a preprint and subsequently published by calculating differences in proportions with associated confidence intervals and using z tests for differences in independent proportions. To compare the number of participants across these groups of trials, we performed Mann-Whitney U tests.
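
As a minimal sketch of these two comparisons in R (the language used for all analyses in this study; see the end of this section), with entirely hypothetical counts and trial sizes:

  # Hypothetical example: proportion of trials at high risk of bias among
  # preprint-only trials v published-only trials (counts are illustrative)
  x <- c(60, 110)   # trials at high risk of bias in each group
  n <- c(101, 181)  # total trials in each group
  p <- x / n
  p_pool <- sum(x) / sum(n)

  # z test for a difference in two independent proportions
  z <- (p[1] - p[2]) / sqrt(p_pool * (1 - p_pool) * (1 / n[1] + 1 / n[2]))
  p_value <- 2 * pnorm(-abs(z))  # two sided P value

  # 95% confidence interval for the difference in proportions
  ci <- (p[1] - p[2]) + c(-1.96, 1.96) *
    sqrt(p[1] * (1 - p[1]) / n[1] + p[2] * (1 - p[2]) / n[2])
  ci

  # Mann-Whitney U test comparing numbers of participants per trial
  n_preprint  <- c(120, 300, 45, 1500, 90)       # hypothetical trial sizes
  n_published <- c(200, 60, 2500, 150, 75, 30)
  wilcox.test(n_preprint, n_published)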

We calculated the median time from a trial being posted on a preprint server to its eventual publication in a journal and used Kaplan-Meier curves and log-rank tests to assess whether the following factors were predictive of time to publication of trial preprints: source of funding, number of centres and participants, early termination for benefit, intensity of care (inpatient v outpatient), severity of illness (mild/moderate v severe/critical covid-19), significant primary or secondary outcomes (based on cut-off thresholds defined by the authors or, when no cut-off thresholds were defined, based on a cut-off threshold of P<0.05 or confidence intervals not including the null), and risk of bias (trials rated at low v high risk of bias).
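
This survival analysis treats publication as the event and still-unpublished preprints as censored observations. A minimal sketch using the survival package (named in this section) with invented data:

  library(survival)

  # Hypothetical data: months from preprint posting to journal publication;
  # event = 1 if published, 0 if censored (still unpublished at last search)
  d <- data.frame(
    months  = c(2, 4, 5, 6, 8, 12, 14, 3, 7, 9, 11, 13),
    event   = c(1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 0, 0),
    funding = c(rep("government", 7), rep("other", 5))
  )

  fit <- survfit(Surv(months, event) ~ funding, data = d)
  summary(fit)$table                                 # median time to publication per group
  survdiff(Surv(months, event) ~ funding, data = d)  # log-rank test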

Among trial preprints that were subsequently published in a peer reviewed journal, we described the number and types of discrepancies in key methods and results between preprint and published trial reports. For discrepancies in the reporting of key methods, we reported the number and percentage of the changes between preprints and publications that affected risk-of-bias judgments—changes that we considered to be critical. We also compared the number of preprint and published trials that have been retracted.

For trials that reported on interventions that have been addressed by the linked WHO living guideline20 up to 3 August 2021 (ie, corticosteroids, remdesivir, lopinavir-ritonavir, hydroxychloroquine, ivermectin, interleukin 6 receptor blockers, and convalescent plasma for treatment and hydroxychloroquine for prophylaxis) and the two most commonly reported outcomes (ie, mortality, mechanical ventilation), we conducted pairwise frequentist random effects meta-analyses with the restricted maximum likelihood estimator including versus excluding evidence from preprints at one, three, and six months after the first trial of the drug of interest was made public, either via preprint or publication. The choice of timepoints was informed by timeframes within which guideline developers needed to issue recommendations.20 We also conducted an analysis including versus excluding evidence from preprints at 3 August 2021—the longest timepoint at which we collected data. For hydroxychloroquine for prophylaxis, because mechanical ventilation was not an outcome of interest for prophylaxis trials, we reported only on mortality.
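
A minimal sketch of one such paired analysis, using metabin from the meta package (named in this section) with the restricted maximum likelihood (REML) estimator; the trial data are invented, and the preprint flag marks which trials would be dropped in the analysis excluding preprints:

  library(meta)

  # Hypothetical mortality data for one drug comparison at one timepoint
  d <- data.frame(
    study    = c("Trial A", "Trial B", "Trial C (preprint)"),
    ev_trt   = c(25, 40, 8),   n_trt = c(300, 500, 120),
    ev_ctl   = c(35, 55, 15),  n_ctl = c(295, 490, 118),
    preprint = c(FALSE, FALSE, TRUE)
  )

  run_ma <- function(data) {
    metabin(ev_trt, n_trt, ev_ctl, n_ctl, studlab = study, data = data,
            sm = "RR", method.tau = "REML")  # random effects risk ratio
  }

  ma_excluding <- run_ma(subset(d, !preprint))  # journal publications only
  ma_including <- run_ma(d)                     # preprints included
  exp(ma_including$TE.random)                   # pooled risk ratio with preprints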

To facilitate interpretation, we calculated absolute effects. For drug treatments, we derived baseline risks from mortality data from the CDC and ventilation data from the International Severe Acute Respiratory and Emerging Infection covid-19 database.23–25 For prophylaxis, we used the event rate among all participants randomised to standard care or placebo as the baseline risk.

We compared the direction of effect between meta-analyses including preprints and meta-analyses excluding preprints. We considered the direction of effect to be different if one point estimate suggested no effect and the other suggested a benefit or harm, or if one point estimate suggested benefit and the other suggested harm. For treatment, we considered an effect to be beneficial if the point estimate indicated a reduction in risk of mortality of 1% or greater or a reduction in risk of mechanical ventilation of 2% or greater, and harmful if it indicated increases of the same magnitudes. For prophylaxis, we considered an effect to be beneficial if the point estimate indicated a reduction in risk of mortality of 0.5% or greater, and harmful if it indicated an increase of 0.5% or greater. Otherwise, we inferred that there was no important effect. Our thresholds for beneficial and harmful effects were informed by surveys of the coauthors in the parallel living SRNMAs.17–19
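
A minimal sketch of how a pooled relative risk translates into an absolute effect and then into a direction-of-effect category under these thresholds; the baseline risks and relative risks below are hypothetical:

  # Convert a pooled risk ratio to a risk difference per 1000 people, then
  # classify it against the thresholds described above
  classify_effect <- function(rr, baseline, threshold_per_1000) {
    rd_per_1000 <- baseline * (rr - 1) * 1000
    if (rd_per_1000 <= -threshold_per_1000) "benefit"
    else if (rd_per_1000 >= threshold_per_1000) "harm"
    else "no important effect"
  }

  # Mortality for treatment (threshold 1% = 10 per 1000), assuming a 13%
  # baseline risk: 0.13 * (0.85 - 1) * 1000 = -19.5, ie, benefit
  classify_effect(rr = 0.85, baseline = 0.13, threshold_per_1000 = 10)

  # Mechanical ventilation (threshold 2% = 20 per 1000), assuming an 11%
  # baseline risk: 0.11 * (1.05 - 1) * 1000 = 5.5, ie, no important effect
  classify_effect(rr = 1.05, baseline = 0.11, threshold_per_1000 = 20)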

We used the GRADE approach to assess the certainty of evidence, considering risk of bias (limitations in trial design leading to systematic under-estimation or over-estimation of treatment effects), inconsistency (heterogeneity in results reported across trials), indirectness (differences between the question asked in trials and the question of interest), imprecision (width of confidence intervals), and publication bias (propensity for studies with significant results, notable results, or results that support a particular hypothesis to be published, published faster, or published in journals with higher visibility). We also assessed whether meta-analyses including versus excluding preprint reports led to differences in ratings of the overall certainty of evidence or in judgments related to specific GRADE domains, and whether any differences in ratings were likely to affect decision making (ie, evidence rated as high/moderate v low/very low).26 We used a minimally contextualised approach to make judgments about imprecision.27 This approach considers whether confidence intervals include the null effect and does not consider whether plausible effects, captured by confidence intervals, include both important and trivial effects. We considered any effect on mortality and mechanical ventilation to be important. Thresholds of a 1% risk difference for mortality and a 2% risk difference for mechanical ventilation informed judgments of minimal or no treatment effect; for prophylaxis and mortality, we used a 0.5% risk difference.27 We performed all statistical analyses in R (version 4.0.3, R Foundation for Statistical Computing), using the meta, forestplot, survival, and survminer packages.
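
As a simplification of the minimally contextualised imprecision judgment described above (the full approach involves further judgments, such as distinguishing serious from very serious imprecision), a sketch of the null-crossing rule with hypothetical confidence intervals:

  # Rate down for imprecision when the 95% CI of the pooled relative risk
  # includes the null (a simplification of the minimally contextualised approach)
  imprecision_concern <- function(rr_low, rr_high) {
    if (rr_low < 1 && rr_high > 1) "serious (CI includes the null)"
    else "not serious"
  }
  imprecision_concern(0.74, 0.98)  # "not serious"
  imprecision_concern(0.82, 1.19)  # "serious (CI includes the null)"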

Patient and public involvement

Patients were involved in outcome selection, interpretation of results, and the generation of parallel recommendations, as part of the parallel SRNMA and guidelines.20 Patients were not involved in the present secondary study.

Results

Trial characteristics

As of 3 August 2021, we identified 356 eligible trials, of which 101 were only available as preprints, 181 only available as journal publications, and 74 first available as preprints and subsequently published as journal articles. Online supplement 4 presents additional details on the results of the search and table 1 presents trial characteristics.

Table 1

Trial characteristics. Data are number (%) of trials unless stated otherwise

Most trials were registered, completed at the time of reporting, addressed drug treatments, enrolled fewer than 250 participants, reported one or more outcomes that were statistically significant, and were funded by governments or institutions. Nearly two thirds of trials were at high risk of bias, primarily because of their open label design.

Compared with published trials without preprints, trials only available as preprints and trials first available as preprints and then subsequently published were more likely to be registered; trials only available as preprints were more likely to report on interim results, describe drug treatments compared with antiviral antibodies and cellular treatments or prophylaxis, and to have received industry funding; and trials first posted as preprints and subsequently published were more likely to have received government funding.

Predictors of publication and time to publication

During the 1.5 year span of this study, of 175 preprints, 74 (42.3%) were subsequently published in peer reviewed journals. Table 2 presents the proportion of preprints published up to one year. The median time to publication of preprints was 5.9 months. At one year, a third of preprints remained unpublished.

Table 2

Time to publication of covid-19 trial preprints

Table 3 presents predictors for the publication of preprints. Preprints that received government funding, reported on inpatients, or reported on patients with severe disease were published faster than preprints that did not receive government funding, reported on outpatients, or reported on patients with mild or moderate disease.

Table 3

Predictors of time to publication of covid-19 trial preprints

Differences between preprint and published trial reports

Forty two (56.8%) of the 74 trials with both a preprint and a publication had one or more discrepancies in the reporting of key methods and results between the preprint and the later published trial report. We identified a median of 1 (interquartile range 0-2) discrepancy per preprint-publication pair. Online supplement 5 describes these discrepancies.

Thirty (40.5%) trials had one or more discrepancies in the reporting of key methods. The most common discrepancy in the reporting of key methods was the description of allocation concealment, which occurred in eight trials. For four of these eight trials, our judgment of risk of bias for the randomisation domain changed from "probably high" to "low" owing to additional details reported in the published report. Box 1 presents an example.

Box 1

Example of trial reporting additional information on allocation concealment in its published report

The PANAMO trial, which was initially available as a preprint on the Social Science Research Network and later published in Lancet Rheumatology, provided additional details on allocation concealment in the publication. The publication described central randomisation with an online tool and the development of the randomisation list by a third party—details that were not reported in the preprint.49 50 This addition resulted in a change in the rating of the risk of bias owing to randomisation, from "probably high risk of bias" to "definitely low risk of bias."

Preprint

“Patients were randomly assigned in a 1:1 ratio to receive IFX-1, at a dose of 800 mg intravenously, for a maximum of seven doses, plus best supportive care, or best supportive care only … Randomisation was performed with an online tool within the eCRF (electronic case report form) and was stratified by study site.”

Publication

“Patients were randomly assigned in a 1:1 ratio to IFX-1 plus best supportive care (the IFX-1 group) or to best supportive care only (the control group). Randomisation was done by investigators centrally with an online tool within the electronic case report form and was stratified by study site. The tool used a randomised variable block length of either two or 4. The randomisation list was only available to contract research organisation (Metronomia) staff involved in the production of the randomisation list and set-up of the online randomisation tool.”

Other differences in the reporting of key methods were the publication reporting one or more additional statistics important for meta-analysis (eg, interquartile ranges or standard deviations) that were not reported in the preprint (n=6; 8.1%), the preprint reporting interim results and the publication reporting completed trial results (n=4; 5.4%), and the publication including a protocol or statistical analysis plan as supplementary material that was not included with the preprint (n=3; 4.1%). The overall rating of risk of bias, however, changed for only one trial based on additional information provided in the published report.

Thirty one (41.9%) trials had one or more differences in the reporting of key results between preprints and publications. The most common discrepancy in the reporting of key results was a change in outcome data between the preprint and the publication, seen in 20 (27.0%) trials—although most of these discrepancies can probably be attributed to events accumulating between when the preprint was posted and when the trial was published. Table 4 presents an example.

Table 4

Example of trial* reporting different outcome data between preprint51 and publication,2 by outcome and treatment group

Despite discrepancies in outcome data being common, results were similar between preprints and publications in both magnitude and precision. Figure 1 shows differences in results on mortality and mechanical ventilation between preprints and publications. Among all preprints with differences in outcomes, differences in relative effects did not exceed 15%, except for one trial with very few events that included just one additional event in the publication.28 Other differences in key results included the publication reporting at least one additional key outcome that was not included in the preprint (n=11; 14.9%).

Figure 1

Differences in results on mortality and mechanical ventilation between preprints and publications of covid-19 trials.1–5 28 Data are events/total number of participants

Retractions

We identified four retracted trials.29–36 Two trials reported on (hydroxy)chloroquine,29–32 two on favipiravir,31–34 and two on ivermectin.29 30 35 36 One of the trials was retracted when the authors noticed an error in their analysis35 36 and the remainder were retracted owing to concerns about data fabrication or falsification (eg, inconsistencies between the eligibility criteria and patients included in the trial, discrepancies between when the trial was reported to have been conducted and when patients were recruited, inconsistencies between the dataset and the results reported in the preprint, and inconsistencies between the distribution of baseline variables and the described randomisation procedure). We compared the number of retractions between preprints and journal publications. One of the retracted trials was posted as a preprint29 30 and the remainder were published in peer reviewed journals.

Meta-analyses including v excluding preprint reports

Tables 5 and 6 present results of meta-analyses including and excluding data from unpublished preprints for comparisons of several treatments for covid-19 with placebo or standard care, for the outcomes of mortality and mechanical ventilation. Treatments included corticosteroids, remdesivir, lopinavir-ritonavir, hydroxychloroquine, ivermectin, interleukin 6 receptor blockers, and convalescent plasma. Results were recorded at one, three, and six months after the first trial addressing the intervention was made public, either as a preprint or a publication, and at the longest point of follow-up of the trials (up to 3 August 2021). Online supplement 6 presents a more detailed table.

Table 5

Results of meta-analyses excluding and including results from preprints, by mortality outcome

Table 6

Results of meta-analyses excluding and including results from preprints, by mechanical ventilation outcome

In total, we performed and assessed the certainty of evidence of 120 meta-analyses, 60 of which included preprints and 60 of which excluded preprints. Online supplement 7 presents forest plots for meta-analyses.

Because of insufficient data, we could not perform meta-analyses for six comparisons without preprints: mortality and mechanical ventilation, at one month, for ivermectin versus placebo or standard care and for interleukin 6 receptor blockers versus placebo or standard care; and mortality and mechanical ventilation, at three months, for ivermectin versus placebo or standard care.

Differences in estimates from meta-analyses including v excluding preprints

Except for two (3.3%) cases, all meta-analyses including and excluding results from unpublished preprints produced point estimates that were consistent as to whether they indicated benefit, no appreciable effect, or harm. The meta-analysis of corticosteroids at one month suggested a reduction in risk of mechanical ventilation when preprints were excluded (43 fewer per 1000 people (95% confidence interval 59.24 fewer to 22.12 fewer); moderate certainty) and no appreciable effect when preprints were included (1.2 more per 1000 people (60.3 fewer to 131.1 more); very low certainty). The meta-analysis without preprints included one trial with 5418 participants and the meta-analysis with preprints included two trials with 5472 participants.

The meta-analysis of ivermectin at six months suggested no appreciable effect on risk of mechanical ventilation when preprints were excluded (2.3 fewer per 1000 people (95% confidence interval 52.2 fewer to 83.5 more); low certainty) and a reduction in risk of mechanical ventilation when preprints were included (26.7 fewer per 1000 people (74.2 fewer to 75.4 more); very low certainty). The meta-analysis without preprints included seven trials with 1826 participants and the meta-analysis with preprints included nine trials with 4000 participants. Four of 60 meta-analyses had results that were significant with preprints and not significant without preprints, or vice versa.

Differences in ratings of certainty of evidence from meta-analyses including v excluding preprints

We judged nine (15%) of 60 meta-analyses to have different ratings of the certainty of evidence when preprints were included versus when preprints were excluded. For four of these nine cases, we rated the certainty of evidence for the meta-analyses including preprints to be higher than the evidence excluding preprints. For five of these nine cases, we rated the certainty of meta-analyses excluding preprints to be higher than the meta-analyses including preprints. In six of these cases, differences in ratings of the certainty of evidence could have affected decision making (ie, evidence including preprints is rated as high or moderate whereas evidence excluding preprints is rated as low or very low, or vice versa).

Differences in ratings of GRADE domains from meta-analyses including v excluding preprints

Risk of bias

Between meta-analyses including preprints and meta-analyses excluding preprints, judgments related to the GRADE risk-of-bias domain differed only for one meta-analysis (remdesivir v standard care or placebo for mechanical ventilation at six months). We judged the meta-analysis excluding preprints to not have any concerns related to risk of bias and downgraded the meta-analysis including preprints due to serious risk of bias.

Imprecision

Between meta-analyses including preprints and meta-analyses excluding preprints, judgments related to the GRADE imprecision domain differed for 13 of 60 meta-analyses. We judged nine meta-analyses excluding preprints to have more serious concerns related to imprecision than their counterparts including preprints (ie, additional data from preprints narrowed confidence intervals). Furthermore, we judged four meta-analyses excluding preprints to have less serious concerns related to imprecision than meta-analyses including preprints (ie, additional data from preprints increased statistical heterogeneity and hence imprecision).

Discussion

Main findings

Our study presents a detailed assessment of the degree of discrepancies between covid-19 trial preprints and their later publications and of the impact of trial preprints on meta-analytic estimates, the certainty of evidence, and decision making. We show that preprints remain the only source of findings of many trials for several months. Half of all preprints, for example, remain unpublished at six months and a third at one year—a length of time that might be unacceptable in a health emergency or to patients who might expect their care to be guided by the most recent and best available evidence. Preprints can importantly accelerate the time to dissemination of trial findings.

We did not find compelling evidence of important differences between preprint and published reports of trials—although preprint reports of trials that are subsequently published in journals might not be representative of all trial preprints. Further, we found retractions to occur for both preprints and publications, suggesting that publication in a peer reviewed journal alone does not indicate the trustworthiness of a trial report.

We also found that in most cases, meta-analyses including versus excluding evidence from preprints yielded consistent results. In a minority of circumstances, however, including preprints improved the certainty (quality) of evidence. At six months after trial data first became available, when preprints were excluded, we found low certainty evidence indicating that interleukin 6 receptor blockers might reduce mortality—downgraded because of risk of bias and imprecision. But when preprints were included, we found moderate certainty evidence indicating that interleukin 6 receptor blockers probably reduce mortality—downgraded because of risk of bias. For interleukin 6 receptor blockers, consideration of evidence from preprints could have importantly accelerated the time to incorporation of this treatment as part of standard care.20

In a minority of circumstances, meta-analyses including preprints also reduced the certainty of evidence. At one month after trial data first became available, for example, when preprints were excluded, we found moderate certainty evidence that corticosteroids probably reduce mechanical ventilation, downgraded because of serious risk of bias; when preprints were included, we found very low certainty evidence, downgraded because of serious risk of bias and very serious imprecision. For corticosteroids, including preprints increased imprecision.

Implications

Our findings have implications for evidence users, such as clinicians, who are concerned about the quality of preprints, and for systematic reviewers and guideline developers deciding whether to consider preprint reports in systematic reviews and guideline recommendations. Our results support the overall usefulness of preprints and their role as a venue through which the dissemination of trial findings can be accelerated. Although we found that preprints could either increase or reduce the certainty of evidence, we encourage systematic reviewers and guideline developers to consider evidence from preprints, appraise preprint reports, and consider excluding preprints only in situations where there are data fabrication or integrity issues.

Because trials have already been completed at the point of submission to peer reviewed journals and major methodological decisions (such as whether to collect data on an outcome or whether to blind investigators) can no longer be changed, peer review can probably improve only the transparency of trial reports and the interpretation of results. We encourage systematic reviewers and guideline developers to consider trial evidence from preprints, especially in circumstances in which decisions are being made rapidly and evidence is being produced faster than it can be peer reviewed and published.

We caution, however, that preprints (and publications) might describe untrustworthy trials with fabricated or falsified data. Evidence users might consider scrutinising both preprint and published trials for anomalies that suggest fabrication and falsification (examples of which are reported in box 2).37 38 While such methods require subjective judgments and cannot be used to definitively identify untrustworthy trials, they could be useful to identify trials that are at high risk of such issues. Evidence users might subsequently investigate such trials further or systematic reviewers might consider sensitivity analyses excluding such trials from meta-analyses. We direct readers to other sources that describe these methods.39 These methods could also be useful to journal editors and peer reviewers to apply when considering trials for publication.

Box 2

Methods to assess for fabrication and falsification in clinical trials39

  • Judge whether the reported recruitment speed is feasible given local disease patterns, trial eligibility criteria, and capacity of the recruiting centres52 53

  • Review the trial registration or protocol and assess the consistency between the registration and the manuscript in aspects of the trial that cannot be modified after completion (eg, blinding status)

  • Review profiles of the investigators or institutions involved, for a history of research misconduct

  • Review baseline patient characteristics and test whether reported distributions are consistent with randomisation37 38 54 (a minimal sketch of one such check follows this box)

  • Review primary data, when available, for duplicate records or inconsistencies between the data and reported statistics in the trial manuscript

  • Review primary data, when available, and assess whether correlations between variables are plausible52
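
As a minimal sketch of the baseline characteristics check in this box, using invented summary statistics; this simplifies the cited methods,37 38 54 which aggregate evidence across many variables and many trials:

  # Under randomisation, the between-arm difference in means of a baseline
  # variable should be consistent with chance
  baseline_p <- function(m1, sd1, n1, m2, sd2, n2) {
    z <- (m1 - m2) / sqrt(sd1^2 / n1 + sd2^2 / n2)
    2 * pnorm(-abs(z))  # two sided P value for the observed imbalance
  }

  # Hypothetical reported age and weight in the two arms of a trial
  p_age    <- baseline_p(58.1, 12.0, 150, 58.3, 11.8, 150)
  p_weight <- baseline_p(81.2, 15.5, 150, 80.9, 15.9, 150)

  # Across many baseline variables, these P values should be roughly uniform;
  # clusters of extremely small (or implausibly large) values raise concern
  c(p_age, p_weight)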

Review authors who are concerned about publication bias or who want to make decisions within timeframes that might not be conducive to peer review and publication might also consider conducting prospective meta-analyses—meta-analyses that use an inception cohort of registered trials and incorporate unpublished data from investigators.40 During the covid-19 pandemic, investigators have conducted such prospective meta-analyses to review the effectiveness of corticosteroids and interleukin 6 receptor blockers for covid-19.41 42

Relation of study's findings to previous work

Our study presents data on the contribution of preprints to the body of evidence on covid-19 treatments and prophylaxis. Three studies have reported on differences between covid-19 preprint and published study reports and on their citations and Altmetric attention metrics.43–45 One study looked at publication characteristics and dissemination of covid-19 preprints, another at outcome reporting and spin in the interpretation of results, and another at risk of bias and spin. These studies were, however, restricted to publications up to August and October 2020—a sample that is not representative of the current landscape of covid-19 research and does not include most of the evidence currently used to guide covid-19 care, including critical trials of the effects of corticosteroids and interleukin 6 receptor blockers.1 2 These studies also did not compare the effects of including preprints on meta-analytic estimates and the certainty of the body of evidence, which is particularly important because evidence users rely on the totality of the body of evidence, rather than single studies, to make treatment decisions and recommendations.43 One study has reviewed publication rates and citations for covid-19 research, but it primarily deals with basic science research and so its findings might not apply to covid-19 clinical trials.46

The covid-19 pandemic highlighted the need for rapid dissemination of research and incited increased interest in preprint servers, yielding a substantial amount of research posted as preprints and making this study possible. We are not aware of studies looking at the trustworthiness or impact of preprints in other areas—although such research would also be useful. Our results also align with assessments from before covid-19 showing that study interpretations and other study details do not change importantly between preprints and their later publications in high impact journals.47

Strengths and limitations of the study

The strengths of this study include the comprehensive search for and inclusion of preprint and published covid-19 trial reports and rigorous data collection. The generalisability of our results is, however, limited to covid-19. Journals have expedited the publication of covid-19 research and have been publishing more prolifically on covid-19 than on other areas, which could reduce the opportunity for revisions between preprints and their subsequent publications and might mean that the time to publication and the predictors of publication differ from those in other research areas. For these reasons, our estimated time to publication of covid-19 preprints is different from estimates made before covid-19.48

Although the WHO covid-19 database is a comprehensive source of published and preprint literature, it does not include all preprint servers—but the preprint servers not covered by our search strategy serve other subject areas and are unlikely to host covid-19 trials.

Preprints of trials that are subsequently published in journals could represent the most rigorous or transparently reported preprints and might not be representative of all trial preprints. To address this concern, we compared the characteristics of published and unpublished trial preprints and did not identify any important differences, suggesting that published and unpublished trial reports in our sample are comparable.

Furthermore, published trial reports might still contain errors, posting trial reports as preprints could allow more errors to be identified before final publication, and even trials without discrepancies between the preprint and later published report might have important limitations that reduce their trustworthiness.

Our assessment of differences in key results between preprints and publications was limited to the outcomes that were included in our living SRNMAs. While these outcomes were identified as being important or critical to decision making by coauthors of the living SRNMA and the parallel guideline, they do not include adverse events. Differences in such outcomes could exist between preprints and publications.44 Further, other aspects of the reporting of results (eg, baseline characteristics of patients) could be different between preprints and publications.

We report on the number of publications and preprints that were retracted. Preprints, however, could be less likely to be retracted because they might draw less attention and because preprint servers could be less likely than journals to have formal policies on research integrity. Further, we found too few retractions to be able to draw confident conclusions.

Our assessment of the impact of preprints focused only on the impact of preprints on meta-analytic estimates, the certainty of evidence, and decision making and did not consider other aspects of impact, such as number of citations or Altmetrics.

We used the GRADE approach to assess the certainty of evidence.26 While the GRADE framework provides a transparent and systematic framework of all factors that might bear on the certainty of evidence, its application is subjective.

Our assessment of the contribution of trial preprint reports to meta-analytic estimates and their effect on the certainty of evidence was undertaken in the context of pairwise meta-analysis and the minimally contextualised approach for assessing the certainty of evidence, whereas parallel guideline recommendations have been based on network meta-analyses and the fully contextualised approach.20 27 Although we see no compelling reason that our results would not generalise to network meta-analyses, more differences in judgments about the certainty of evidence would be expected in a fully contextualised framework, where judgments depend more on the magnitude and precision of estimates.

We limited our assessment of the impact of meta-analyses including versus excluding preprint reports on meta-analytic estimates to only interventions that have been addressed by the WHO living guideline at the time of analysis.20 While the effects of including or excluding preprints in meta-analyses might vary across interventions, our analysis looks at the interventions that have had sufficient interest and research to instigate guideline recommendations. Interventions informed by fewer trials may be more sensitive to including or excluding evidence from preprints.

We might have over-estimated the time to publication of preprint reports if some preprint authors did not attempt to publish in peer reviewed journals—although evidence shows that most authors of covid-19 preprints intend to publish their findings.43 Time to publication might also be under-estimated if preprints are made public later in the submission process.

Where discrepancies exist between preprints and publications, systematic reviewers and guideline developers will also need to consider whether the results of preprints or publications are more trustworthy and should be incorporated in the meta-analysis. In such situations, systematic reviewers and guideline developers might assume that changes between preprints and publications are due to errors or inaccuracies in the reporting of the preprint that were later corrected during peer review. Future research should investigate whether including results from preprints or final publications affects overall findings and decisions.

Finally, although we describe discrepancies in the reporting of key methods and results, we did not assess differences in the discussion or conclusion sections of trial reports.

Conclusions

We found no compelling evidence of important discrepancies between preprint and published trial reports. We show that including preprints might affect the results of meta-analyses and the certainty of evidence, and we encourage evidence users to consider data from preprints in contexts in which decisions are being made rapidly and evidence is being produced faster than it can be peer reviewed and published. Scepticism might still be warranted when suspicion arises regarding falsified data (for which we provide criteria).

Data availability statement

Data are available in a public, open access repository. Data are available at https://osf.io/9adxb/.

Ethics approval

Not applicable.



Footnotes

  • Twitter @dena.zera

  • Contributors DZ, RACS, and RB-P conceptualised the study. JJB, FV-P, GR, and AQ screened studies for eligibility. TP, GL, EC, AA, FK, ZE, MAC, MG, YW, EK, JJB, and AQ collected data. DZ drafted the first version of the manuscript. All authors reviewed the manuscript and provided important intellectual input. DZ is study guarantor. The corresponding author attests that all listed authors meet authorship criteria and that no others meeting the criteria have been omitted. Transparency: The lead author affirms that the manuscript is an honest, accurate, and transparent account of the study being reported; that no important aspects of the study have been omitted; and that any discrepancies from the study as planned have been explained.

  • Funding This study was supported by the Canadian Institutes of Health Research (CIHR-IRSC:057900132) and Coronavirus Rapid Research Funding Opportunity (OV2170359). DZ is funded by a Banting Postdoctoral Fellowship. The funders had no role in considering the study design or in the collection, analysis, interpretation of data, writing of the report, or decision to submit the article for publication.

  • Competing interests All authors have completed the ICMJE uniform disclosure form at www.icmje.org/disclosure-of-interest/ and declare: support from the Canadian Institutes of Health Research for the submitted work; no financial relationships with any organisations that might have an interest in the submitted work in the previous three years; no other relationships or activities that could appear to have influenced the submitted work.

  • Provenance and peer review Not commissioned; externally peer reviewed.

  • Supplemental material This content has been supplied by the author(s). It has not been vetted by BMJ Publishing Group Limited (BMJ) and may not have been peer-reviewed. Any opinions or recommendations discussed are solely those of the author(s) and are not endorsed by BMJ. BMJ disclaims all liability and responsibility arising from any reliance placed on the content. Where the content includes any translated material, BMJ does not warrant the accuracy and reliability of the translations (including but not limited to local regulations, clinical guidelines, terminology, drug names and drug dosages), and is not responsible for any error and/or omissions arising from translation and adaptation or otherwise.