### Study design

We performed a two-stage meta-analysis of trial IPD to estimate the associations of comorbidity, age, and sex with attrition. In the first stage, for each trial, the association between comorbidity count (the number of conditions in addition to the index condition defining the trial population) and attrition (defined as failure, for any reason, to complete the final trial visit) was estimated in logistic regression models adjusted for age and sex. In similar models, we estimated the associations of age and sex with trial attrition. In the second stage, the resulting effect estimates were meta-analysed in bayesian linear models. We allowed partial pooling across index conditions and drug classes to obtain overall, drug class-specific, and index condition-specific estimates of these associations.

### Data sources and participants

Available IPD were obtained from phase 3 or 4 trials contained within two trial repositories: the multi-sponsored Clinical Study Data Request repository and the Yale University Open Data Access project. Appropriate trials for inclusion were identified according to prespecified criteria (PROSPERO CRD42018048202).13 Specifically, we included trials for medical conditions that are predominantly managed by drug treatments (frequently over a sustained period).13

We classified each trial by index condition, based on the stated trial indication, as described previously.13 Each trial was also classified by intervention drug, using the five-character level of the WHO Anatomical Therapeutic Chemical (ATC) classification.14 For example, the A10BJ (glucagon-like peptide 1 analogues) class includes the drugs A10BJ01 (exenatide) and A10BJ02 (liraglutide).
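Because the five-character ATC class is the first five characters of a seven-character drug code, grouping trials by drug class reduces to a string prefix. A minimal sketch, assuming each trial is keyed to a single seven-character intervention code; the helper `atc_class` and the trial identifiers are hypothetical.

```python
def atc_class(atc_code: str) -> str:
    """Map a seven-character WHO ATC drug code to its five-character class (hypothetical helper)."""
    code = atc_code.strip().upper()
    if len(code) != 7:
        raise ValueError(f"expected a 7-character ATC code, got {atc_code!r}")
    return code[:5]


# Group (made-up) trials by the ATC class of their intervention drug.
trials = {"trial_1": "A10BJ01", "trial_2": "A10BJ02", "trial_3": "C09AA05"}
by_class: dict[str, list[str]] = {}
for trial_id, drug in trials.items():
    by_class.setdefault(atc_class(drug), []).append(trial_id)

print(by_class)  # {'A10BJ': ['trial_1', 'trial_2'], 'C09AA': ['trial_3']}
```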

In a previous publication,13 we defined comorbidities solely by concomitant drug treatments to enable comparison across trial and community settings. The comorbid conditions included cardiovascular disease, chronic pain, arthritis, affective disorders, acid related disorders, asthma or chronic obstructive pulmonary disease, diabetes mellitus, osteoporosis, thyroid disease, thromboembolic disease, inflammatory conditions, benign prostatic hyperplasia, gout, glaucoma, urinary incontinence, erectile dysfunction, psychotic disorders, epilepsy, migraine, parkinsonism, and dementia. For the current analysis, for the 80 trials that did not redact medical history data, we additionally defined the same comorbidities using prespecified codes from the Medical Dictionary for Regulatory Activities. Participants were defined as having a comorbidity if they met the definition based on either concomitant drug treatment or medical history. Definitions and code lists are available at the project repository15 (https://github.com/ChronicDiseaseEpi/como_complete_public). To produce a comorbidity count for each trial participant, the comorbidities present at baseline were summed, excluding the index condition of the respective trial.
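The counting rule combines the two definitions with a logical OR and then drops the index condition. A minimal sketch, assuming each definition has already been resolved to a set of condition names per participant; the function name and inputs are hypothetical, and the real code lists are in the project repository.

```python
def comorbidity_count(drug_based, history_based, index_condition):
    """Count baseline comorbidities for one participant.

    drug_based / history_based: sets of condition names flagged by each
    definition (hypothetical inputs; pass an empty set for history_based
    when a trial redacted medical history).
    """
    present = set(drug_based) | set(history_based)  # either definition suffices
    present.discard(index_condition)                # exclude the trial's index condition
    return len(present)


# A participant in a (hypothetical) diabetes mellitus trial:
drugs = {"diabetes mellitus", "cardiovascular disease"}
history = {"cardiovascular disease", "gout"}
print(comorbidity_count(drugs, history, "diabetes mellitus"))  # 2
print(comorbidity_count(drugs, set(), "diabetes mellitus"))    # 1 (history redacted)
```

Taking the union means a condition flagged by both sources is counted once.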

### Representativeness

Not all sponsors share trial IPD, and not all trials are made available to third party researchers. Consequently, to contextualise the IPD trials included in this analysis, we also examined attrition in a wider set of trials registered on the US clinical trials registry (ClinicalTrials.gov), of which the IPD trials are a subset (PROSPERO registration number CRD42018048202).13 Of the 2235 trials registered on ClinicalTrials.gov, we restricted the analysis to the 777 registered on or after 2010, because trials registered before then were less likely to have posted completion data (consistent with changes in US Food and Drug Administration requirements for trials registered on or after 2007).16 Of these 777 trials, 593 (76.3%) had posted data on enrolment, randomisation, and completion to ClinicalTrials.gov; for these, we summarised the proportion of participants completing each trial, overall and by index condition.
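The registry restriction and completion summary can be sketched as two filters followed by a grouped proportion. This is an illustration only: the record layout is hypothetical (not the ClinicalTrials.gov schema), and using randomised participants as the denominator is an assumption.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class RegisteredTrial:
    # Hypothetical record layout; not the ClinicalTrials.gov data schema.
    trial_id: str
    registration_year: int
    index_condition: str
    enrolled: Optional[int]
    randomised: Optional[int]
    completed: Optional[int]


def completion_summary(trials):
    """Keep trials registered in 2010 or later, then those with posted enrolment,
    randomisation, and completion counts; group completion proportions by condition."""
    eligible = [t for t in trials if t.registration_year >= 2010]
    posted = [
        t for t in eligible
        if t.enrolled is not None and t.randomised and t.completed is not None
    ]
    by_condition = defaultdict(list)
    for t in posted:
        # Assumption: randomised participants are the denominator.
        by_condition[t.index_condition].append(t.completed / t.randomised)
    return len(eligible), len(posted), dict(by_condition)


trials = [
    RegisteredTrial("hypothetical-1", 2008, "asthma", 220, 200, 170),  # pre-2010: excluded
    RegisteredTrial("hypothetical-2", 2012, "asthma", 110, 100, 90),
    RegisteredTrial("hypothetical-3", 2015, "asthma", 420, 400, 300),
    RegisteredTrial("hypothetical-4", 2016, "gout", 60, None, None),   # no posted data
]
n_eligible, n_posted, proportions = completion_summary(trials)
print(n_eligible, n_posted, proportions)  # 3 2 {'asthma': [0.9, 0.75]}

# The arithmetic in the text: 593 of 777 eligible trials had posted data.
print(f"{593 / 777:.1%}")  # 76.3%
```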

### Statistical analysis

For each index condition, we calculated summary statistics for the available IPD trials: age (mean and standard deviation), sex (number and %), comorbidity count (mean and standard deviation), and the proportion of participants with two or more comorbidities. A violin plot was constructed to illustrate the distribution of per-trial attrition proportions in the IPD and ClinicalTrials.gov trials.
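These descriptive summaries are straightforward group-wise means, standard deviations, and proportions. A minimal sketch, assuming participant records are dictionaries with the hypothetical keys shown; `stdev` needs at least two participants per condition.

```python
from collections import defaultdict
from statistics import mean, stdev


def summarise(participants):
    """Baseline summary for one index condition (dictionary keys are hypothetical)."""
    ages = [p["age"] for p in participants]
    counts = [p["como_count"] for p in participants]
    n_male = sum(p["male"] for p in participants)
    return {
        "age_mean": mean(ages), "age_sd": stdev(ages),
        "n_male": n_male, "pct_male": 100 * n_male / len(participants),
        "como_mean": mean(counts), "como_sd": stdev(counts),
        "prop_2plus": sum(c >= 2 for c in counts) / len(counts),
    }


def summarise_by_condition(participants):
    groups = defaultdict(list)
    for p in participants:
        groups[p["index_condition"]].append(p)
    return {cond: summarise(ps) for cond, ps in groups.items()}


example = [
    {"index_condition": "asthma", "age": 50, "male": True, "como_count": 0},
    {"index_condition": "asthma", "age": 60, "male": False, "como_count": 1},
    {"index_condition": "asthma", "age": 70, "male": True, "como_count": 2},
    {"index_condition": "asthma", "age": 80, "male": True, "como_count": 3},
]
print(summarise_by_condition(example)["asthma"]["prop_2plus"])  # 0.5
```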

Full descriptions of the modelling are provided in the online supplemental appendix and are summarised briefly below. For each trial, attrition was regressed in logistic regression models on age (per 15-year increment, which was close to the standard deviation for most trials), sex (male *v* female (reference)), and comorbidity count (per additional comorbidity). We fitted a range of models with and without terms for comorbidity count, comorbidity count squared, age, sex, treatment arm, and a comorbidity-treatment arm interaction. The effect estimates (log-odds ratios) and associated standard errors for each model were then exported from the Yale University Open Data Access and Clinical Study Data Request repository safe havens. Proportions of missing baseline data within trials were very small, and logistic regression models within the trial repositories were fitted to complete cases.
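The first-stage model for a single trial can be illustrated with a small NumPy stand-in; this is not the authors' code. It fits the main model (attrition on age per 15 years, sex, and comorbidity count) to complete cases by Newton-Raphson and returns log-odds ratios with standard errors. The helper names are hypothetical and the data are simulated with invented coefficients.

```python
import numpy as np


def fit_logistic(X, y, max_iter=50, tol=1e-10):
    """Logistic regression by Newton-Raphson; returns log-odds ratios and SEs."""
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        info = X.T @ (X * (p * (1.0 - p))[:, None])  # Fisher information
        step = np.linalg.solve(info, X.T @ (y - p))   # Newton step: information^-1 x score
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    cov = np.linalg.inv(X.T @ (X * (p * (1.0 - p))[:, None]))
    return beta, np.sqrt(np.diag(cov))


def trial_model(age, male, como_count, attrition):
    """One trial: attrition ~ age (per 15 years) + male + comorbidity count,
    fitted to complete cases only."""
    data = np.column_stack([age, male, como_count, attrition]).astype(float)
    data = data[~np.isnan(data).any(axis=1)]  # drop rows with any missing value
    X = np.column_stack([
        np.ones(len(data)),
        data[:, 0] / 15.0,  # age per 15-year increment
        data[:, 1],         # male = 1, female = 0 (reference)
        data[:, 2],         # per additional comorbidity
    ])
    beta, se = fit_logistic(X, data[:, 3])
    names = ["intercept", "age_per_15y", "male", "como_count"]
    return {name: (b, s) for name, b, s in zip(names, beta, se)}


# Simulated trial (all values invented) with a true log-OR of 0.15 per comorbidity.
rng = np.random.default_rng(2024)
n = 20_000
age = rng.normal(60, 15, n)
male = rng.integers(0, 2, n)
como = rng.poisson(1.5, n)
linear = -2.4 + 0.1 * age / 15 + 0.05 * male + 0.15 * como
attrition = (rng.random(n) < 1 / (1 + np.exp(-linear))).astype(int)
age[:25] = np.nan  # a few missing ages, removed by the complete-case step

result = trial_model(age, male, como, attrition)
log_or, se = result["como_count"]
print(f"comorbidity log-OR {log_or:.3f} (SE {se:.3f})")
```

Only the estimates and standard errors leave the function, mirroring the fact that only these summaries were exported from the safe havens.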

The effect estimates for the age (adjusted for sex), sex (adjusted for age), and comorbidity (adjusted for age and sex) terms were subsequently meta-analysed separately in bayesian linear regression models. We used bayesian models because they allowed partial pooling across index conditions and drug classes, and because credible intervals for estimates at the level of index conditions and drug classes could be obtained directly from the posterior without post hoc calculations. We performed a range of meta-analyses for each regression coefficient. In each, the meta-analysed estimate summarised the trial level estimates and depended on the precision with which the association was estimated in each trial (ie, the inverse of the squared standard error of the relevant coefficient), the variation between trials, the variation between other groups (eg, drug class or condition), and the prior distributions (a vague prior for the overall effect and weakly informative priors for the variation parameters). Details of the selected priors are available in the online supplemental data file. In the simplest model, only variation between trials was explicitly modelled. In progressively more complex models, variation between drug classes, between conditions, and between both drug classes and conditions was also modelled. This approach allowed estimates to differ for each group while sharing information between groups (partial pooling), which improves precision and shrinks extreme effect estimates towards the overall mean. The variation between trials, between conditions, and between drug classes was reported as the respective standard deviation.
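Written out, the most complex of these meta-analysis models has roughly the following structure. This is a sketch under the assumption that trial, condition, and drug class effects enter additively as normal random effects; the exact parameterisation and priors are those given in the online supplemental data file and may differ.

$$
\begin{aligned}
\hat{\beta}_i &\sim \mathcal{N}\left(\theta_i,\ \mathrm{SE}_i^2\right), \\
\theta_i &= \mu + a_{c[i]} + b_{d[i]} + \varepsilon_i, \\
a_c &\sim \mathcal{N}\left(0, \sigma_{\mathrm{cond}}^2\right), \quad b_d \sim \mathcal{N}\left(0, \sigma_{\mathrm{drug}}^2\right), \quad \varepsilon_i \sim \mathcal{N}\left(0, \sigma_{\mathrm{trial}}^2\right),
\end{aligned}
$$

where $\hat{\beta}_i$ is the log-odds ratio from trial $i$, $\mathrm{SE}_i$ its standard error (so $1/\mathrm{SE}_i^2$ is the trial's precision), and $c[i]$ and $d[i]$ are the index condition and drug class of trial $i$. The simplest model drops $a_c$ and $b_d$; a condition-specific estimate is $\mu + a_c$, and $\sigma_{\mathrm{trial}}$, $\sigma_{\mathrm{cond}}$, and $\sigma_{\mathrm{drug}}$ are the reported standard deviations.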

Models were fitted using the brms package.17 For each model, 4000 samples from the posterior were obtained and summarised as 50%, 80%, and 95% credible intervals. The probability (bayesian P value) that comorbidity count was positively associated with attrition was estimated as the proportion of the posterior distribution of the log-odds ratio that lay above 0. The models used to assess the association between comorbidity count and attrition are illustrated in figure 1.
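Given the posterior draws for a log-odds ratio, the reported quantities are quantiles and a simple proportion. A minimal sketch, assuming equal-tailed intervals; the normal draws below are an invented stand-in for draws exported from brms, and the function name is hypothetical.

```python
import numpy as np


def summarise_posterior(draws):
    """Equal-tailed credible intervals and P(log-OR > 0) from posterior draws."""
    draws = np.asarray(draws, dtype=float)
    summary = {"median": float(np.median(draws)),
               "p_positive": float(np.mean(draws > 0))}  # 'bayesian P value'
    for label, level in (("ci50", 0.50), ("ci80", 0.80), ("ci95", 0.95)):
        lower, upper = np.quantile(draws, [(1 - level) / 2, (1 + level) / 2])
        summary[label] = (float(lower), float(upper))
    return summary


# Stand-in for 4000 posterior draws of a log-odds ratio (values invented).
rng = np.random.default_rng(7)
draws = rng.normal(0.1, 0.05, 4000)
print(summarise_posterior(draws))
print(summarise_posterior([-1.0, 1.0, 2.0, 3.0])["p_positive"])  # 0.75
```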

Figure 1: Overview of use of models for meta-analysis output. Shaded areas=analyses conducted within Clinical Study Data Request (CSDR) and Yale University Open Data Access (YODA) repository safe havens. Variance matrices of the effect estimates were exported to allow maximum flexibility in subsequent meta-analyses, if required. IPD=individual participant level data

Using the effect estimates from this meta-analysis for the association between comorbidity count and attrition among participants, we then explored the potential impact of comorbidity count at the trial level. Firstly, we constructed a set of notional trials with different plausible mean comorbidity counts (and therefore different proportions of participants with each comorbidity count) and different risks of attrition among participants with no comorbidities (which could differ because of trial level factors such as differences in follow-up methods or settings). Next, we applied the effect estimates to participants in these notional trials to estimate the overall percentage of participants expected not to complete the final trial visit. This analysis is described in detail in the online supplemental appendix.
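The notional-trial calculation converts a baseline risk to odds, scales the odds by the per-comorbidity odds ratio for each count, and averages the resulting risks over the count distribution. A minimal sketch, assuming (hypothetically) Poisson-distributed comorbidity counts; the distribution actually used is described in the supplemental appendix, and the odds ratio of 1.1 below is invented, not a study estimate.

```python
import math


def expected_attrition(mean_count, risk_zero, odds_ratio, max_count=30):
    """Expected % of participants not completing a notional trial.

    Assumes (hypothetically) Poisson-distributed comorbidity counts with the
    given mean, attrition risk `risk_zero` at zero comorbidities, and a
    constant odds ratio per additional comorbidity.
    """
    odds_zero = risk_zero / (1 - risk_zero)
    total = 0.0
    for k in range(max_count + 1):
        p_k = math.exp(-mean_count) * mean_count**k / math.factorial(k)  # P(count = k)
        odds_k = odds_zero * odds_ratio**k
        total += p_k * odds_k / (1 + odds_k)  # risk at count k, weighted by P(count = k)
    return 100 * total


# Illustrative grid over mean comorbidity count and risk at zero comorbidities.
for mean_count in (0.5, 1.0, 2.0, 3.0):
    row = [expected_attrition(mean_count, r0, 1.1) for r0 in (0.05, 0.10, 0.20)]
    print(mean_count, [f"{x:.1f}%" for x in row])
```

Working on the odds scale keeps each per-count risk below 100% however large the count becomes.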

We conducted a sensitivity analysis using wider priors for the variances between trials, between conditions, and between drug classes (details in online supplemental data file). We also conducted a sensitivity analysis within each trial repository, in which we reanalysed the trials after excluding any participant who had an adverse event of any kind. The model outputs for these trials were not exported but were meta-analysed within the repositories, pooling results across all trials. We fitted a frequentist random effects model (which assumes that the true effects in individual trials follow a normal distribution), using a restricted maximum likelihood estimator within the metafor package. This model was fitted using frequentist software rather than the bayesian software used for the main analysis because bayesian software was not available within the trial repositories. The trial level results, model outputs, and analysis code are provided on the project GitHub repository (https://github.com/ChronicDiseaseEpi/como_complete_public).
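The within-repository sensitivity analysis pools trial estimates with a random effects model in which the between-trial variance is estimated by restricted maximum likelihood. The sketch below solves the REML estimating equation by a simple fixed-point iteration; it is an illustration, not necessarily the algorithm metafor uses internally, and the function name and input values are hypothetical.

```python
import numpy as np


def reml_random_effects(y, v, tol=1e-10, max_iter=1000):
    """Random-effects meta-analysis: pooled estimate, its SE, and REML tau^2.

    y: trial effect estimates (eg log-odds ratios); v: their variances (SE^2).
    tau^2 is found by fixed-point iteration on the REML estimating equation.
    """
    y, v = np.asarray(y, float), np.asarray(v, float)
    tau2 = 0.0
    for _ in range(max_iter):
        w = 1.0 / (v + tau2)
        mu = np.sum(w * y) / np.sum(w)
        new_tau2 = np.sum(w**2 * ((y - mu) ** 2 - v)) / np.sum(w**2) + 1.0 / np.sum(w)
        new_tau2 = max(new_tau2, 0.0)  # between-trial variance cannot be negative
        converged = abs(new_tau2 - tau2) < tol
        tau2 = new_tau2
        if converged:
            break
    w = 1.0 / (v + tau2)
    mu = np.sum(w * y) / np.sum(w)
    return float(mu), float(np.sqrt(1.0 / np.sum(w))), float(tau2)


# Invented trial-level log-odds ratios and standard errors.
log_or = [0.12, 0.05, 0.20, -0.02, 0.09]
se = np.array([0.06, 0.08, 0.10, 0.07, 0.05])
mu, mu_se, tau2 = reml_random_effects(log_or, se**2)
print(f"pooled log-OR {mu:.3f} (SE {mu_se:.3f}), tau^2 {tau2:.4f}")
```

With identical estimates the iteration returns a between-trial variance of zero, so the model reduces to an inverse-variance fixed effect pool.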