Methods
We submitted a protocol for this study for publication on 7 September 2021. Because the protocol was still under review at the time the study was completed, we withdrew it from consideration for publication and present it in online supplement 1.
Search
Our study uses the search strategy of our living SRNMA, which includes daily searches in the World Health Organization covid-19 database—a comprehensive multilingual source of global published and preprint literature on covid-19 (https://search.bvsalud.org/global-literature-on-novel-coronavirus-2019-ncov/). We also searched the US Centers for Disease Control and Prevention (CDC) covid-19 research articles downloadable database until it was merged with the WHO covid-19 database on 9 October 2020. A validated machine learning model facilitates efficient identification of randomised trials.21
Our search was supplemented by ongoing surveillance of living evidence retrieval services, including the Living Overview of the Evidence (L-OVE) covid-19 platform by the Epistemonikos Foundation (https://app.iloveevidence.com/loves/5e6fdb9669c00e4ac072701d) and the Systematic and Living Map on Covid-19 Evidence by the Norwegian Institute of Public Health (https://www.fhi.no/en/qk/systematic-reviews-hta/map/). Using the above sources, we monitor for retraction notices. Online supplement 2 includes additional details of our search strategy. This study included trials identified up to 3 August 2021.
Study selection
As part of the living SRNMA, pairs of reviewers, after calibration exercises to ensure sufficient agreement, worked independently and in duplicate to screen titles and abstracts of search records and subsequently the full texts of records determined as potentially eligible at the title and abstract screening stage. Reviewers also linked preprint reports with their subsequent publications based on trial registration numbers, the names of investigators, recruiting centres and countries, dates of recruitment, and baseline patient characteristics. When links between preprints and subsequent publications were unclear, we contacted trial authors for confirmation. Reviewers resolved discrepancies by discussion or, when necessary, by adjudication with a third party reviewer.
Eligible preprint and peer reviewed articles reported trials that randomised patients with suspected, probable, or confirmed covid-19 to drug treatments, antiviral antibodies and cellular treatments, placebo, or standard care, or reported trials that randomised healthy participants exposed or unexposed to covid-19 to prophylactic drugs, standard care, or placebo. We did not apply any restrictions on severity of illness, setting, or language of publication but excluded trials reporting on nutritional interventions, traditional Chinese herbal medicines without standardisation in formulations and dosing across batches, and non-drug supportive care interventions.
We did not perform a sample size calculation because we included all eligible trial reports identified through our living SRNMA up to 3 August 2021. Although the parallel living SRNMA continued to perform daily searches, we pragmatically limited our search because it was no longer feasible to continue collecting additional data from preprints beyond this timepoint.
Data collection
As part of the living SRNMA, for each eligible trial, pairs of reviewers, after training and calibration exercises, independently extracted trial characteristics, methods, and results using a standardised, pilot tested data extraction form. To assess risk of bias, reviewers, after training and calibration exercises, used a revision of the Cochrane tool for assessing risk of bias in randomised trials (RoB 2.0)22 (online supplement 3). Reviewers resolved discrepancies by discussion and, when necessary, by adjudication with a third party.
For the current study, pairs of trained and calibrated reviewers, working independently and in duplicate and using a pilot tested data collection form, collected data on differences in key methods and results between preprint and published trial reports. We prioritised collecting information on key methods and results that might affect the interpretation of trials and decision making by evidence users. For key methods, we focused on aspects of the methods that could affect risk-of-bias judgments, including the description of the randomisation process and allocation concealment, blinding of patients and healthcare providers, extent and handling of missing outcome data, blinding of outcome assessors and adjudicators, and prespecification of outcomes and analyses. Key results included the number of participants analysed and the number of events in each trial arm for dichotomous outcomes, and the number of participants analysed, means or medians, and measures of variability for continuous outcomes. We focused on the same outcomes as our living SRNMA and linked guidelines, identified as important or critical for decision making by the review authors and the authors of the parallel guidelines, including patient partners: mortality, mechanical ventilation, adverse events leading to discontinuation, admission to hospital, viral clearance, hospital length of stay, length of stay in intensive care, duration of mechanical ventilation, time to symptom resolution or clinical improvement, days free from mechanical ventilation, and time to viral clearance.17–20 For preprints with more than one version, we extracted data from the first version of the preprint, which is the least likely to have been modified in response to peer review.
Because risk of bias might vary across outcomes, for this analysis we presented risk-of-bias judgments corresponding to the following hierarchy of outcomes for therapy trials: mortality, mechanical ventilation, duration of hospital stay, time to symptom resolution or clinical improvement, and virological outcomes. For prophylaxis trials, we used the following hierarchy: mortality, laboratory confirmed and suspected covid-19 infection, and laboratory confirmed covid-19 infection. These hierarchies represent the relative importance of outcomes based on rankings made by the linked WHO guideline panel.20
Data synthesis and analysis
We compared the characteristics and risk of bias of trials reported only as preprints, trials reported only as peer reviewed publications, and trials first posted as a preprint and subsequently published by calculating differences in proportions with associated confidence intervals and using z tests for differences between independent proportions. To compare the number of participants across these groups of trials, we performed Mann-Whitney U tests.
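To illustrate, a minimal sketch of these comparisons in base R, using hypothetical counts and trial sizes that are not drawn from our data, could be:

```r
# Two-proportion z test: prop.test() without continuity correction is the
# chi-squared equivalent of the z test for two independent proportions.
# Counts below are hypothetical.
prop.test(x = c(40, 60), n = c(120, 150), correct = FALSE)

# Mann-Whitney U test comparing numbers of participants per trial
# (wilcox.test() with two samples performs the Mann-Whitney U test).
n_preprint_only <- c(120, 85, 300, 45, 1020)    # hypothetical sample sizes
n_published     <- c(200, 150, 760, 96, 2104)
wilcox.test(n_preprint_only, n_published)
```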
We calculated the median time from a trial being posted on a preprint server to its eventual publication in a journal and used Kaplan-Meier curves and log-rank tests to assess whether the following factors were predictive of time to publication of trial preprints: source of funding, number of centres and participants, early termination for benefit, intensity of care (inpatient v outpatient), severity of illness (mild/moderate v severe/critical covid-19), statistically significant primary or secondary outcomes (based on cut-off thresholds defined by the authors or, when no cut-off thresholds were defined, based on a cut-off threshold of P<0.05 or confidence intervals not including the null), and risk of bias (trials rated at low v high risk of bias).
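A minimal sketch of this time-to-publication analysis, using the survival and survminer packages and a hypothetical data frame with hypothetical variable names, might look like:

```r
library(survival)
library(survminer)

# Hypothetical data: days from preprint posting to journal publication,
# censored at 3 August 2021 for preprints not yet published.
preprints <- data.frame(
  days_to_publication = c(45, 120, 210, 90, 365, 150),
  published           = c(1, 1, 0, 1, 0, 1),            # 1 = published, 0 = censored
  risk_of_bias        = c("low", "high", "high", "low", "low", "high")
)

fit <- survfit(Surv(days_to_publication, published) ~ risk_of_bias, data = preprints)
print(fit)                                               # median time to publication per group
survdiff(Surv(days_to_publication, published) ~ risk_of_bias,
         data = preprints)                               # log-rank test
ggsurvplot(fit, fun = "event")                           # cumulative incidence of publication
```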
Among trial preprints that were subsequently published in a peer reviewed journal, we described the number and types of discrepancies in key methods and results between preprint and published trial reports. For discrepancies in the reporting of key methods, we reported the number and percentage of the changes between preprints and publications that affected risk-of-bias judgments—changes that we considered to be critical. We also compared the number of preprint and published trials that have been retracted.
For trials that reported on interventions that have been addressed by the linked WHO living guideline20 up to 3 August 2021 (ie, corticosteroids, remdesivir, lopinavir-ritonavir, hydroxychloroquine, ivermectin, interleukin 6 receptor blockers, and convalescent plasma for treatment and hydroxychloroquine for prophylaxis) and the two most commonly reported outcomes (ie, mortality, mechanical ventilation), we conducted pairwise frequentist random effects meta-analyses with the restricted maximum likelihood estimator including versus excluding evidence from preprints at one, three, and six months after the first trial of the drug of interest was made public, either via preprint or publication. The choice of timepoints was informed by timeframes within which guideline developers needed to issue recommendations.20 We also conducted an analysis including versus excluding evidence from preprints at 3 August 2021—the longest timepoint at which we collected data. For hydroxychloroquine for prophylaxis, because mechanical ventilation was not an outcome of interest for prophylaxis trials, we reported only on mortality.
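A sketch of how one such sensitivity analysis could be set up with the meta package is shown below; the trial names, event counts, and preprint flag are hypothetical rather than values from our dataset:

```r
library(meta)

# Hypothetical trial-level data for one drug and one outcome (mortality),
# flagging whether each report was available only as a preprint at the timepoint of interest.
trials <- data.frame(
  study      = c("Trial A", "Trial B", "Trial C", "Trial D"),
  events_trt = c(10, 25, 8, 40), n_trt = c(200, 400, 150, 800),
  events_ctl = c(15, 30, 9, 55), n_ctl = c(200, 390, 148, 790),
  preprint   = c(TRUE, FALSE, TRUE, FALSE)
)

# Random effects meta-analysis (restricted maximum likelihood estimator of tau-squared)
# including evidence from preprints
m_with_preprints <- metabin(events_trt, n_trt, events_ctl, n_ctl, studlab = study,
                            data = trials, sm = "RR", method.tau = "REML")

# The same analysis excluding evidence from preprints
m_without_preprints <- metabin(events_trt, n_trt, events_ctl, n_ctl, studlab = study,
                               data = trials[!trials$preprint, ],
                               sm = "RR", method.tau = "REML")
```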
To facilitate interpretation, we calculated absolute effects. For drug treatments, we estimated baseline risks using mortality data from the CDC and data on mechanical ventilation from the International Severe Acute Respiratory and Emerging Infection Consortium covid-19 database.23–25 For prophylaxis, we used the event rate among all participants randomised to standard care or placebo to calculate the baseline risk.
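As a worked illustration of this calculation, with hypothetical values for the baseline risk and the pooled relative risk:

```r
# Convert a pooled relative risk into an absolute effect (hypothetical values)
baseline_risk   <- 0.13                      # assumed baseline risk of mortality from an external source
pooled_rr       <- 0.82                      # hypothetical pooled relative risk
risk_difference <- baseline_risk * (pooled_rr - 1)
round(risk_difference * 1000)                # absolute effect per 1000 patients (negative = fewer deaths)
```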
We compared the direction of effect between meta-analyses including preprints and meta-analyses excluding preprints. We considered the direction of effect to be different if one point estimate suggested no effect and another suggested a benefit or harm, or if one point estimate suggested benefit and another suggested harm. For treatment, we considered an effect to be beneficial if the point estimate indicated a reduction in the risk of mortality of 1% or greater or a reduction in the risk of mechanical ventilation of 2% or greater, and harmful if the point estimate indicated an increase in the risk of mortality of 1% or greater or an increase in the risk of mechanical ventilation of 2% or greater. For prophylaxis, we considered an effect to be beneficial if the point estimate indicated a reduction in the risk of mortality of 0.5% or greater, and harmful if it indicated an increase of 0.5% or greater. Otherwise, we inferred that there was no important effect. Our thresholds for beneficial and harmful effects were informed by surveys of the coauthors in the parallel living SRNMAs.17–19
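The following hypothetical helper function illustrates how these thresholds map an absolute risk difference to a direction of effect; it is a sketch rather than part of our analysis code:

```r
# Classify the direction of effect from an absolute risk difference
# (expressed per 1000 patients), given the relevant threshold.
classify_effect <- function(rd_per_1000, threshold_per_1000) {
  if (rd_per_1000 <= -threshold_per_1000) "benefit"
  else if (rd_per_1000 >= threshold_per_1000) "harm"
  else "no important effect"
}

classify_effect(-23, 10)   # mortality, treatment (1% = 10 per 1000): "benefit"
classify_effect(12, 20)    # mechanical ventilation (2% = 20 per 1000): "no important effect"
```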
We used the GRADE approach to assess the certainty of evidence, considering risk of bias (limitations in trial design leading to systematic underestimation or overestimation of treatment effects), inconsistency (heterogeneity in results reported across trials), indirectness (differences between the question asked in trials and the question of interest), imprecision (width of confidence intervals), and publication bias (propensity for studies with significant results, notable results, or results that support a particular hypothesis to be published, published faster, or published in journals with higher visibility). We also assessed whether meta-analyses including versus excluding preprint reports led to differences in ratings of the overall certainty of evidence or in judgments related to specific GRADE domains, and whether any differences in ratings were likely to affect decision making (ie, evidence rated as high/moderate v low/very low).26 We used a minimally contextualised approach to make judgments about imprecision.27 This approach considers only whether confidence intervals include the null effect; it does not consider whether plausible effects captured by the confidence intervals include both important and trivial effects. We considered any effect on mortality and mechanical ventilation to be important. Thresholds of a 1% risk difference for mortality and a 2% risk difference for mechanical ventilation informed judgments of minimal or no treatment effect; for mortality in prophylaxis trials, we used a 0.5% risk difference.27 We performed all statistical analyses in R (version 4.0.3, R Foundation for Statistical Computing), using the meta, forestplot, survival, and survminer packages.
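As a small illustration of the minimally contextualised imprecision check, using a hypothetical confidence interval:

```r
# Does the confidence interval for the relative effect include the null?
# (Interval below is hypothetical, not from our data.)
rr_lower <- 0.78
rr_upper <- 1.04
includes_null <- rr_lower < 1 && rr_upper > 1   # TRUE suggests rating down for imprecision
includes_null
```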
Patient and public involvement
Patients were involved in outcome selection, interpretation of results, and the generation of parallel recommendations, as part of the parallel SRNMA and guidelines.20 Patients were not involved in the present secondary study.