Methods
This project was preregistered on the Open Science Framework (https://osf.io/drpc5).29 More details on the methods are available in the protocol and on the Open Science Framework. All data collection, preparation, and analyses were conducted in Python 3 (Python Software Foundation) with data and code publicly available from GitHub30; manual data extraction was conducted in Google Forms.
Data sources
Data from all available EUCTR protocols (ie, details of the trial in each country) and results sections were extracted with custom web scraping software between 1 December 2020 and 3 December 2020.31 32 A trial record on the EUCTR is made up of a protocol for every EU country in which the trial took place, a protocol for non-EU locations of some paediatric trials, and, if available, a results section covering the whole trial. Key fields for this study from EUCTR included trial status (eg, ongoing, completed), date of ethics committee opinion and date of competent authority decision (ie, as proxies for the start of the trial), date of the global end of the trial, and the initial estimate of the duration of the trial from section E.8.9 of the protocol information on EUCTR (online supplemental figure S1).
Study population and sample
From the full December 2020 dataset, we removed trials with a status of not authorised or prohibited by competent authority, and trials with ethics approval dates before the launch of the registry or after the data extraction date. For the remaining trials, we identified the latest completion date given in the individual trial protocols or in the results section, preferring the results section date when available.
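The exclusion rules and the choice of extracted completion date can be sketched as two small functions. This is an illustration under stated assumptions, not the project's published code: the status strings in `EXCLUDED_STATUSES` are hypothetical labels, and the registry launch date is passed in as a parameter because the text does not give the exact date used.

```python
from datetime import date
from typing import Optional, Sequence

# Hypothetical status labels; the exact strings in the scraped data may differ.
EXCLUDED_STATUSES = {"Not Authorised", "Prohibited by CA"}


def keep_trial(status: str, ethics_date: date,
               registry_launch: date, extraction_date: date) -> bool:
    """Drop excluded statuses and ethics dates outside the registry's lifetime."""
    if status in EXCLUDED_STATUSES:
        return False
    return registry_launch <= ethics_date <= extraction_date


def extracted_completion_date(protocol_dates: Sequence[Optional[date]],
                              results_date: Optional[date]) -> Optional[date]:
    """Return the completion date used for a trial, or None if none is recorded."""
    # The results section covers the whole trial, so its date is preferred.
    if results_date is not None:
        return results_date
    # Otherwise take the latest date recorded across the country protocols.
    known = [d for d in protocol_dates if d is not None]
    return max(known) if known else None
```

Trials for which `extracted_completion_date` returns `None` are the ones passed to the inference step described next.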
Poor completion data are a known concern for EUCTR because many trials lack updated statuses and completion dates.33–35 To capture these trials within our population, and to avoid selection bias towards trials with better record keeping, we developed a method to infer a completion date when this information was missing. Each trial protocol in the dataset without a completion date was assigned a start date (ie, the later of the ethics committee opinion date and the competent authority decision date, because EUCTR has no clear start date field) and an expected duration in days calculated from protocol section E.8.9 (online supplemental figure S1).
All protocols belonging to a trial record were then combined into one record, and the longest estimated duration was added to the latest start date. One further year was conservatively added to this date to allow for any delays in the start or conduct of the trial, giving the final inferred completion date. A Jupyter Notebook detailing this approach and its validation is available on the project's Open Science Framework repository (https://osf.io/r3vc5/).
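The inference rule (latest start date, plus longest expected duration, plus one year) can be sketched as follows. The `Protocol` structure and function names are hypothetical, the E.8.9 duration is assumed to have already been converted to days, and the extra year is approximated as 365 days; the notebook on the Open Science Framework remains the authoritative implementation.

```python
from datetime import date, timedelta
from typing import Iterable, NamedTuple, Optional


class Protocol(NamedTuple):
    """Hypothetical per-country protocol record."""
    ethics_date: Optional[date]     # date of ethics committee opinion
    authority_date: Optional[date]  # date of competent authority decision
    duration_days: Optional[int]    # expected duration derived from section E.8.9


def protocol_start(p: Protocol) -> Optional[date]:
    # No explicit start date field exists, so use the later of the two approval dates.
    known = [d for d in (p.ethics_date, p.authority_date) if d is not None]
    return max(known) if known else None


def inferred_completion(protocols: Iterable[Protocol],
                        buffer_days: int = 365) -> Optional[date]:
    """Latest start date + longest expected duration + a conservative buffer."""
    protocols = list(protocols)
    starts = [s for s in map(protocol_start, protocols) if s is not None]
    durations = [p.duration_days for p in protocols if p.duration_days is not None]
    if not starts or not durations:
        return None  # not enough information to infer a date
    return max(starts) + timedelta(days=max(durations) + buffer_days)
```

For example, protocols starting on 1 February 2012 (365 days expected) and 1 March 2012 (730 days expected) give 1 March 2012 + 730 + 365 days, that is 1 March 2015.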
For trials with an extracted or inferred completion date, we limited our population to those completed at least two years before data extraction (ie, before 1 December 2018) to allow time for reporting across all dissemination routes.36 37 From this population, a random sample of 500 trials was drawn for analysis with the .sample() method of the pandas Python package. The sample size was chosen so that point estimates would have a 95% confidence interval no wider than ±5% (online supplemental box S1). The largest sample needed to achieve this precision (ie, for an observed proportion of 50%) was 384; a final sample of 500 trials was chosen to allow greater precision in point estimates for subpopulations. Trials in the sample that were found at any point to have issues making it difficult or impossible to discover their results, such as being withdrawn without enrolment, still ongoing, or currently inaccessible on EUCTR, were excluded from the analysis sample and replaced with another randomly chosen trial.
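The figure of 384 follows from the usual normal-approximation formula for a proportion, n = z²·p(1 − p)/E², evaluated at the worst case p = 0.5 with E = 0.05 and z ≈ 1.96. The sketch below reproduces that arithmetic and the ±4.4% half-width a sample of 500 implies; it assumes a simple Wald interval, which may differ from the exact calculation in box S1.

```python
import math
from statistics import NormalDist


def z_for(confidence: float) -> float:
    # Two-sided critical value, eg about 1.96 for 95% confidence.
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)


def sample_size_for_proportion(margin: float, confidence: float = 0.95,
                               p: float = 0.5) -> float:
    """Unrounded n so that the Wald CI half-width for a proportion is <= margin."""
    return z_for(confidence) ** 2 * p * (1 - p) / margin ** 2


def half_width(n: int, confidence: float = 0.95, p: float = 0.5) -> float:
    """Wald CI half-width for a proportion p estimated from n observations."""
    return z_for(confidence) * math.sqrt(p * (1 - p) / n)
```

`sample_size_for_proportion(0.05)` is about 384.15 (conventionally rounded up to 385), and `half_width(500)` is about 0.044. The draw itself would be a call such as `df.sample(n=500, random_state=seed)` in pandas; the seed used in the study is not stated here.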
Results search strategy
The search strategy was piloted in 50 trials, with a subset searched in duplicate. Percentage agreement on data extraction was high, and discrepancies were easily resolved through discussion. The pilot informed the decision to use only open, public databases (PubMed and Google Scholar), which allow greater reproducibility, because proprietary databases (Scopus and Ovid) did not add substantial value in locating results. Details from the pilot searches are available in the protocol and on the project's Open Science Framework repository (https://osf.io/r3vc5/).
Each EUCTR trial record was reviewed for results and for information about duplicate registrations or published results. ClinicalTrials.gov and the ISRCTN registry were then searched for potential cross registrations. Both registries can host results directly, and because of their size, geographical focus,38 39 and related regulations,40 41 would be expected to hold cross registrations of trials on the EUCTR. In response to peer review feedback, trials for which no additional registrations had been located were also searched in the International Clinical Trials Registry Platform (ICTRP) database to confirm that no documented cross registrations existed. Registries were searched first with the EUCTR unique trial identifier and then with the trial title, name/acronym, intervention, condition, sponsor, and any additional secondary trial identity numbers. Matching records were reviewed for additional relevant information. To count as an eligible registry result, results had to be hosted directly on the registry rather than linked to an external article or resource. Trials that declared, in place of results, that no analysis was possible (eg, because of low enrolment) were counted as having results because enough detail to understand the fate of the trial would be available to interested parties.
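Trial identity numbers drive both the registry searches and the later check of whether publications cite their registrations. A minimal regex sketch for spotting them in free text is shown below. The patterns are assumptions based on the common formats (EudraCT numbers as YYYY-NNNNNN-CC, NCT plus 8 digits, ISRCTN plus 8 digits); they are not the study's extraction code and will miss nonstandard spellings.

```python
import re

# Assumed identifier formats; real-world text may use other variants.
ID_PATTERNS = {
    "eudract": re.compile(r"\b\d{4}-\d{6}-\d{2}\b"),
    "nct": re.compile(r"\bNCT\d{8}\b", re.IGNORECASE),
    "isrctn": re.compile(r"\bISRCTN\s?\d{8}\b", re.IGNORECASE),
}


def find_trial_ids(text: str) -> dict[str, set[str]]:
    """Return identifiers found in text, grouped by registry."""
    found: dict[str, set[str]] = {}
    for name, pattern in ID_PATTERNS.items():
        # Normalise whitespace and case so matches compare equal across sources.
        matches = {re.sub(r"\s+", "", m).upper() for m in pattern.findall(text)}
        if matches:
            found[name] = matches
    return found
```

A match on an identifier is only a lead; as described below, candidate matches were confirmed against design, indication, intervention, enrolment, and outcomes.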
Lastly, PubMed and Google Scholar were searched for journal publications using all known trial identity numbers, the trial title, acronym, interventions, conditions studied, and any investigator names and affiliations available in the registrations. Searchers could combine these terms, or add further terms, at their discretion. Results in the literature were included if they were published in a journal, reported the final primary results of the trial, and were >500 words long (eg, detailed conference abstracts), consistent with previous methods.37 If multiple eligible publications were located, we recorded the earliest. Only results published before the start of searches were included in the final analysis. Matches between articles and registry records were confirmed by comparing trial identity numbers, study design, indication, intervention, planned enrolment, and registered outcomes.36 No specific threshold was set to define a match between records, but any problems matching registrations and publications were referred to the full study team for further discussion.
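The eligibility rules for journal results, and the choice of the earliest eligible publication, can be expressed as a filter followed by a minimum. The `Publication` fields below are hypothetical stand-ins for judgements that searchers made manually; the sketch only shows how those judgements combine.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, Optional


@dataclass
class Publication:
    """Hypothetical record of a located publication and the searcher's judgements."""
    pub_date: date
    in_journal: bool
    reports_final_primary_results: bool
    word_count: int


def earliest_eligible(pubs: Iterable[Publication], search_start: date,
                      min_words: int = 500) -> Optional[Publication]:
    """Earliest publication meeting all criteria, or None if none qualifies."""
    eligible = [
        p for p in pubs
        if p.in_journal
        and p.reports_final_primary_results
        and p.word_count > min_words      # strictly more than 500 words
        and p.pub_date < search_start     # published before searches began
    ]
    return min(eligible, key=lambda p: p.pub_date, default=None)
```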
One author (ND) searched all trials, and half of the trials were also searched by a second author (JAS, JM, or HD) to validate the search strategy and data extraction. The lead researcher (ND) searched the original sample of 500 trials between December 2020 and July 2021; secondary searches of half the sample began in April 2021 and concluded in January 2023. Further checks and searches were carried out in 2023 in response to peer review. Any uncertainties or discrepancies were resolved by consensus discussion, with remaining concerns referred to a senior member of the study team (CH) for final adjudication. Problems in extracting publication dates because of inconsistencies between sources (eg, PubMed and journal websites) were resolved by the lead author re-extracting all publication dates, preferring the date on the journal website when available. Online supplemental table S1 details the reliability measures between searchers.
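Agreement between two searchers coding the same trials can be summarised with percentage agreement and, as one common chance-corrected option, Cohen's kappa. Kappa is an assumption here: the text does not say which measures table S1 reports. The sketch works on any paired categorical codes (for example, result found or not).

```python
from collections import Counter
from typing import Hashable, Sequence


def percent_agreement(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Share of items on which two raters gave the same code."""
    if len(a) != len(b) or not a:
        raise ValueError("ratings must be non-empty and of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohens_kappa(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(a)
    p_o = percent_agreement(a, b)
    ca, cb = Counter(a), Counter(b)
    # Chance agreement from each rater's marginal category frequencies.
    p_e = sum(ca[k] * cb[k] for k in ca.keys() | cb.keys()) / n ** 2
    if p_e == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_o - p_e) / (1 - p_e)
```

With 10 illustrative paired codes agreeing on 8 items and each rater coding 6 items as positive, agreement is 0.80, chance agreement 0.52, and kappa (0.80 − 0.52)/0.48 ≈ 0.58.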
Outcomes and statistical analysis
The primary outcome was the proportion of trials with results in any of the examined dissemination routes, and in each route individually. Prespecified secondary outcomes were the number and proportion of unique results, and the timing of results, for each dissemination route. The analysis of the timing of results deviated from the prespecified plan (online supplemental box S2). Differences in reporting between trials with inferred and extracted completion dates were assessed with a two-proportion z test (unadjusted α=0.05, with Holm-Bonferroni correction). The presence of trial identity numbers in journal publications was added as a post hoc secondary outcome. This information was extracted during data collection for validation purposes and provides insight into how the published literature links back to the EUCTR registration.
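A two-proportion z test compares, for example, the share of trials with results among those with inferred versus extracted completion dates. The sketch below uses the standard pooled-variance form with a two-sided p value; whether the study used the pooled or unpooled standard error is not stated, so pooling is an assumption. The counts in the example are illustrative only, not study data.

```python
import math


def two_proportion_z_test(x1: int, n1: int, x2: int, n2: int) -> tuple[float, float]:
    """Pooled two-proportion z test; returns (z, two-sided p value)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0:
        raise ValueError("test undefined when the pooled proportion is 0 or 1")
    z = (p1 - p2) / se
    # Two-sided p value: 2 * (1 - Phi(|z|)) == erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value
```

For 60/100 versus 40/100, z ≈ 2.83 and p ≈ 0.0047; the resulting p values would then enter the Holm-Bonferroni correction alongside the other comparisons.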
Exploratory analyses
We performed two prespecified exploratory analyses. For trials with results available in any dissemination route, factors associated with results appearing on EUCTR were examined with univariable and multivariable logistic regression. The prespecified factors were whether the completion date was reported or inferred, trial start year, sponsor type, number of EU protocols registered, final enrolment (intention to treat), and whether the trial was conducted only in the EU or European Economic Area (EEA), in countries both inside and outside of the EU and EEA, or entirely outside of the EU and EEA. Trial start year, final enrolment, and location were extracted manually by the lead author (ND) from trial registrations and the results located during searches. When sources contradicted each other, the most recently updated or available source was preferred; if ambiguity remained, the EUCTR data were preferred. The remaining variables were extracted directly from EUCTR data in code, with missing data coded as unknown. Significance for the univariable and multivariable models was determined with the Holm-Bonferroni method (unadjusted α=0.05).
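The Holm-Bonferroni step-down procedure sorts the m p values, compares the k-th smallest (counting from 0) with α/(m − k), and stops at the first one that fails. The sketch covers only that correction step; the logistic regressions themselves would be fitted separately (for example with a statistics package, which the text does not name), and their coefficient p values passed in.

```python
from typing import Sequence


def holm_bonferroni(p_values: Sequence[float], alpha: float = 0.05) -> list[bool]:
    """Return, in input order, whether each hypothesis is rejected."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, i in enumerate(order):
        # Smallest p is tested at alpha/m, the next at alpha/(m-1), and so on.
        if p_values[i] <= alpha / (m - rank):
            reject[i] = True
        else:
            break  # all larger p values are retained
    return reject
```

For p values 0.01, 0.04, 0.03, and 0.005, the thresholds are 0.0125, 0.0167, and 0.025 in turn, so only 0.005 and 0.01 are declared significant.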
The second exploratory analysis examined variation in reporting behaviour by sponsor country. The sponsor country field (protocol section B.1.3.4 on EUCTR) was extracted from all country level protocols for each trial, and each trial was assigned the sponsor country appearing most frequently across its protocols. In the event of a tie, the trial was coded as having multi-country sponsorship. If no sponsor was located, the sponsor country was coded as unknown. For each sponsor country, we examined the number of trials reporting any results and the number reporting results to EUCTR, to other registries, and in the literature.
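The modal sponsor country rule, including its tie and missing-data cases, is short enough to sketch directly. The output labels "Multi-country" and "Unknown" are hypothetical codings, and empty strings or `None` are assumed to represent a missing B.1.3.4 value.

```python
from collections import Counter
from typing import Iterable, Optional


def assign_sponsor_country(countries: Iterable[Optional[str]]) -> str:
    """Most frequent sponsor country across a trial's protocols."""
    counts = Counter(c for c in countries if c)  # ignore missing values
    if not counts:
        return "Unknown"
    ranked = counts.most_common()
    top_country, top_count = ranked[0]
    # A tie for the highest count means no single modal country.
    if len(ranked) > 1 and ranked[1][1] == top_count:
        return "Multi-country"
    return top_country
```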
Patient and public involvement
No patients or members of the public were involved in determining the research question or outcome measures, or in interpreting the results, because this was a doctoral student project without funding to support patient and public involvement. On publication, the results of this study will be summarised for the public in a blog post by the first authors, disseminated to relevant audiences on the Bennett Institute for Applied Data Science and TranspariMED websites, and publicised on social media.