ADNEX risk prediction model for diagnosis of ovarian cancer: systematic review and meta-analysis of external validation studies

Objectives To conduct a systematic review of studies externally validating the ADNEX (Assessment of Different Neoplasias in the adnexa) model for diagnosis of ovarian cancer and to present a meta-analysis of its performance. Design Systematic review and meta-analysis of external validation studies. Data sources Medline, Embase, Web of Science, Scopus, and Europe PMC, from 15 October 2014 to 15 May 2023. Eligibility criteria for selecting studies All external validation studies of the performance of ADNEX, with any study design and any study population of patients with an adnexal mass. Two independent reviewers extracted the data. Disagreements were resolved by discussion. Reporting quality of the studies was scored with the TRIPOD (Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis) reporting guideline, and methodological conduct and risk of bias with PROBAST (Prediction model Risk Of Bias Assessment Tool). Random effects meta-analyses were performed of the area under the receiver operating characteristic curve (AUC), of sensitivity and specificity at the 10% risk of malignancy threshold, and of net benefit and relative utility at the same threshold. Results 47 studies (17 007 tumours) were included, with a median study sample size of 261 (range 24-4905). On average, 61% of TRIPOD items were reported. Handling of missing data, justification of sample size, and model calibration were rarely described. 91% of validations were at high risk of bias, mainly because of the unexplained exclusion of incomplete cases, small sample size, or no assessment of calibration.
The summary AUC to distinguish benign from malignant tumours in patients who underwent surgery was 0.93 (95% confidence interval 0.92 to 0.94, 95% prediction interval 0.85 to 0.98) for ADNEX with the serum biomarker cancer antigen 125 (CA125) as a predictor (9202 tumours, 43 centres, 18 countries, and 21 studies) and 0.93 (95% confidence interval 0.91 to 0.94, 95% prediction interval 0.85 to 0.98) for ADNEX without CA125 (6309 tumours, 31 centres, 13 countries, and 12 studies). The estimated probability that the model is clinically useful in a new centre was 95% (with CA125) and 91% (without CA125). When the analysis was restricted to studies with a low risk of bias, summary AUC values were 0.93 (with CA125) and 0.91 (without CA125), and estimated probabilities that the model is clinically useful were 89% (with CA125) and 87% (without CA125). Conclusions The results of the meta-analysis indicated that ADNEX performed well in distinguishing between benign and malignant tumours in populations from different countries and settings, regardless of whether the serum biomarker CA125 was used as a predictor. A key limitation was that calibration was rarely assessed. Systematic review registration PROSPERO CRD42022373182.


Introduction
The optimal management of patients with an ovarian mass depends on the histology of the mass.
Patients with a benign mass can be managed without surgery, with clinical and ultrasound follow-up, or with conservative surgical techniques. 1 2 Malignant tumours benefit from management in specialised oncology centres, but borderline malignancies, stage I primary invasive tumours, and advanced primary invasive tumours might require different surgical approaches. 3 4 To optimise patient triage without operating on all masses, diagnostic models can be used to estimate the likelihood of malignancy and hence to plan treatment for patients.
Given the potential advantages of accurately predicting the risk of malignancy, the International Ovarian Tumour Analysis (IOTA) group developed the Assessment of Different Neoplasias in the adnexa (ADNEX) risk prediction model, based on three clinical and six ultrasound predictor variables. 5 The clinical variables are age, serum level of the biomarker cancer antigen 125 (CA125), and type of centre (oncology centre v other). An oncology centre is defined as a tertiary referral centre with a specific gynaecological oncology unit. The ultrasound variables are the maximum diameter of the lesion, proportion of solid tissue (defined as the largest diameter of the largest solid component divided by the largest diameter of the lesion), number of papillary projections, presence of >10 cyst locules, presence of acoustic shadows, and presence of ascites. The ADNEX multinomial logistic regression model estimates the risk of five tumour types: benign, borderline, stage I primary invasive, stage II-IV primary invasive, and secondary metastatic.
The total risk of malignancy calculated by ADNEX is the sum of the risks for each malignant subtype. ADNEX has two versions: one with and one without CA125 as a predictor (the ADNEX formulas are provided in online supplemental material S1). 5 When we refer to the ADNEX model or ADNEX, we refer to both versions of the model. The model was developed on data from 5909 patients with an adnexal mass who subsequently underwent surgery, recruited at 24 centres in 10 countries (Belgium, Italy, Czech Republic, Poland, Sweden, China, France, Spain, UK, and Canada). 8 9 ADNEX is included in national guidelines (eg, in Belgium, the Netherlands, and Sweden) 10-12 and is recommended by scientific societies, such as the International Society of Ultrasound in Obstetrics and Gynecology, European Society of Gynaecological Oncology, European Society for Gynaecological Endoscopy, and the American College of Radiology. 4 13 Manufacturers of ultrasound machines have also begun to incorporate ADNEX directly into their machines.
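The relation between the five category risks and the total risk of malignancy can be sketched as follows. This is a minimal illustration, assuming a standard multinomial logistic (softmax) parameterisation with benign as the reference category. The linear predictor values are hypothetical placeholders, not the published ADNEX coefficients or output; the real formulas are in online supplemental material S1.

```python
import math

# The five ADNEX outcome categories; "benign" is treated as the reference category
# (assumption: standard multinomial logistic parameterisation).
CATEGORIES = ["benign", "borderline", "stage I invasive",
              "stage II-IV invasive", "metastatic"]


def category_risks(linear_predictors):
    """Turn the four non-reference linear predictors into five probabilities.

    linear_predictors maps each malignant category to its log odds versus benign;
    the benign linear predictor is fixed at 0.
    """
    scores = {"benign": 0.0, **linear_predictors}
    denom = sum(math.exp(s) for s in scores.values())
    return {c: math.exp(scores[c]) / denom for c in CATEGORIES}


def total_risk_of_malignancy(risks):
    """Sum of the risks of the four malignant subtypes."""
    return sum(p for c, p in risks.items() if c != "benign")


# Hypothetical linear predictors for one patient (NOT real ADNEX coefficients or output).
lp = {"borderline": -1.2, "stage I invasive": -2.0,
      "stage II-IV invasive": -0.5, "metastatic": -3.0}
risks = category_risks(lp)
print({c: round(p, 3) for c, p in risks.items()})
print("total risk of malignancy:", round(total_risk_of_malignancy(risks), 3))
```

Because the five probabilities sum to one, the total risk of malignancy in this sketch is simply one minus the benign risk.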
Several external validation studies of ADNEX have been carried out. 15-18 The ADNEX model, however, does not only classify masses as benign or malignant; it can also be used as a risk prediction model, providing probability estimates for five different tumour types at the individual patient level. Hence, focusing only on its performance in classifying tumours risks losing useful information. 19 When ADNEX is validated as a diagnostic test at a 10% threshold, the performance metrics (ie, sensitivity and specificity) do not take the individual predicted risks into account: a misclassified patient with an 11% risk is given the same weight as one with a 99% risk. This approach means that these meta-analyses have not fully validated the diagnostic performance of ADNEX; for example, pooling discrimination performance (area under the receiver operating characteristic curve, AUC) allows determination of the ability of the model to differentiate between patients with and without the outcome across different thresholds. 21 22 These guidelines should be used in meta-analyses of validation studies. Hence the objectives of this study were to perform a systematic review of studies that externally validated ADNEX, to describe reporting completeness and risk of bias of the validation studies, and to conduct meta-analyses of measures of performance of the model.
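The point about threshold-based metrics can be illustrated with a small, entirely hypothetical data set. In this sketch, two sets of predicted risks classify every tumour identically at the 10% threshold, so sensitivity and specificity are identical. Only the AUC, computed here as the concordance probability, registers that one set ranks a benign tumour above both malignant ones. The function names and data are illustrative, and tumours with risk ≥10% are assumed to be classified as malignant.

```python
def sens_spec(y, risk, threshold=0.10):
    """Sensitivity and specificity when risk >= threshold is classified as malignant (y=1)."""
    tp = sum(1 for yi, p in zip(y, risk) if yi == 1 and p >= threshold)
    tn = sum(1 for yi, p in zip(y, risk) if yi == 0 and p < threshold)
    n_pos = sum(y)
    return tp / n_pos, tn / (len(y) - n_pos)


def auc(y, risk):
    """AUC as the probability that a malignant tumour gets a higher risk than a benign one."""
    pos = [p for yi, p in zip(y, risk) if yi == 1]
    neg = [p for yi, p in zip(y, risk) if yi == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical data: three benign (0) and two malignant (1) tumours.
y = [0, 0, 0, 1, 1]
model_a = [0.02, 0.05, 0.11, 0.50, 0.90]  # misclassified benign tumour has an 11% risk
model_b = [0.02, 0.05, 0.99, 0.50, 0.90]  # the same tumour now has a 99% risk

print(sens_spec(y, model_a), sens_spec(y, model_b))  # identical at the 10% threshold
print(auc(y, model_a), auc(y, model_b))              # the AUC differs
```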

Protocol registration
We report this study according to the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses, online supplemental file 2) and TRIPOD-SRMA (Transparent Reporting of multivariable prediction models for Individual Prognosis Or Diagnosis: checklist for Systematic Reviews and Meta-Analyses, online supplemental file 3) checklists. 22 Exclusion criteria were studies that did not evaluate the performance of the ADNEX model; studies that only evaluated the predictive performance of updated versions of ADNEX; studies where only an abstract was available or the full text could not be obtained; and case studies presenting the performance of ADNEX for individual patients (this criterion was not prespecified in the protocol but was added post hoc on review of the search results). Updating can refer to recalibration, refitting, or extension with additional predictors. 24 Studies that compared ADNEX with other models and reported performance metrics were eligible for inclusion.

Information sources and search strategy
We created a search string and overall search strategy with the help of biomedical reference librarians from the KU Leuven Libraries. We searched the electronic databases Medline (through PubMed), Embase, Web of Science, and Scopus for published articles, and Europe PMC for preprints. The search dates were from the publication of the first ADNEX paper (15 October 2014) to 15 May 2023 (the date when the final search was run). We also screened all articles citing the original ADNEX paper. 5 The reference lists of relevant review and opinion articles retrieved by the search strings were checked for other potentially eligible articles. Forward and backward snowballing (forward and backward cross reference checking) of the included articles was performed to identify additional publications. 25 Language was not restricted, but for papers in languages other than English, Spanish, Dutch, French, or Swedish, we used an automatic translation tool (deepl.com) to decide whether to include a paper and to extract information. Online supplemental material S2 shows the full search strategy.

Study selection
The studies identified in our search were imported into the Zotero reference manager, where they were automatically deduplicated. The deduplicated records were then imported into the Rayyan web application for manual deduplication (by LB) and subsequent screening of titles and abstracts by two independent authors. Disagreements were resolved by discussion between the two authors (LB and AL). 26 Three of the authors (BVC, LV, and DT) were members of the IOTA group that developed ADNEX, so we divided the studies into those that were linked or not linked to IOTA. A study was linked to IOTA if it was coauthored by a member of the IOTA steering committee (online supplemental material S3). IOTA linked papers, as well as a few others with a potential conflict of interest (ie, including authors who are or were IOTA collaborators), were independently assessed by two of the authors (PD and GSC, medical statisticians with expertise in prediction modelling and unrelated to IOTA). All other studies were independently assessed by LB and AL. To describe the performance of the model, we extracted information on any reported measure related to discrimination, calibration, diagnostic accuracy, or clinical utility. The reference standard could be binary (eg, benign v malignant) or multinomial (eg, the five tumour types predicted by ADNEX). Performance data were extracted for all reported validations (ie, for ADNEX with and without CA125), subgroup analyses, sensitivity analyses, and, for multicentre studies, results specific to each centre. For each study, we assessed the reporting of all TRIPOD items that were applicable to external validation studies (online supplemental table S2). We also checked PROBAST's signalling questions and evaluated risk of bias for each domain (participants, predictors, outcome, and analysis) and overall. We recorded our rationale for each risk of bias classification.

Data extraction and data items
We contacted study authors to obtain further information or results when centre specific results were not reported in multicentre studies; when the type of centre was not explicitly reported (if there was no response, the clinical coauthors, JYV, DT, and LV, classified the centre); when overall performance was reported but not performance by menopausal status; or when performance was not reported separately for patients who underwent surgery in studies that included patients managed surgically and non-surgically. Online supplemental table S1 and the Open Science Framework repository (extraction sheet, https://osf.io/jtsvd/) have details on all extracted items. 29 Meta-analysis, based on specific results for each centre in multicentre studies where possible, was conducted with the random effects method. Meta-analysis was done separately for ADNEX with and without CA125. We used 95% confidence intervals for the summary performance and assessed heterogeneity with τ² and 95% prediction intervals. Online supplemental material S4-S6 gives details on the statistical methodology, including meta-analysis methods, explanations of net benefit and relative utility, and assessment of publication bias. Meta-analysis of the AUC for benign versus malignant tumours was done on the logit scale. Meta-analysis of sensitivity and specificity was also performed on the logit scale with a random effects model. 30 Because the meta-analysis was conducted only for the 10% risk of malignancy threshold, we did not need a 95% confidence ellipse in receiver operating characteristic space, so we did not use the bivariate random effects model specified in the protocol. The 10% threshold was the one most commonly used in the articles included in our systematic review, and is a commonly recommended threshold. 4
Meta-analysis of net benefit 31 and relative utility 32 at the 10% risk of malignancy threshold was performed with bayesian trivariate random effects meta-analyses of sensitivity, specificity, and prevalence of malignancy. 33 For bayesian methods, 95% credible intervals are reported instead of 95% confidence intervals. With the bayesian approach, the probability that the model is useful in a new centre can be estimated (ie, the probability that relative utility is >0). To assess multinomial discrimination, we conducted a meta-analysis of AUC values between pairs of tumour outcomes (pairwise AUC values) on the logit scale. We only included studies that used the conditional risk method to calculate pairwise AUC values. 34 Subgroups were defined based on geographical location, type of centre, and menopausal status. Sensitivity analyses were based on the judgment of risk of bias and on whether the study was linked to IOTA. As prespecified in the protocol, we only conducted a meta-analysis of performance if at least three estimates in a specific analysis could be retrieved from the included studies. We used meta-regression to assess the association of the prevalence of malignancy with the AUC, and with sensitivity and specificity at the 10% risk of malignancy threshold. 35 Reporting bias and small study effects were visually explored with funnel plots adapted for the AUC. The body of evidence was assessed with an adapted version of GRADE (Grading of Recommendations Assessment, Development and Evaluation). 36 … 38-40 Bayesian methods were computed with JAGS version 4.3.1. 40
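To make the logit-scale pooling and the prediction interval concrete, the sketch below pools hypothetical centre specific AUCs. It is a minimal illustration under stated assumptions, not the review's analysis code. τ² is estimated with the DerSimonian-Laird moment estimator (our choice for brevity; online supplemental material S4 describes the methods actually used). The standard error of each AUC is transferred to the logit scale with the delta method. The 95% prediction interval uses a t distribution with k − 2 degrees of freedom, which is one common convention. Standard library Python has no t quantile function, so the helper `t_quantile` approximates it numerically.

```python
import math
from statistics import NormalDist


def logit(p):
    return math.log(p / (1 - p))


def expit(x):
    return 1 / (1 + math.exp(-x))


def t_quantile(prob, df):
    """Student t quantile for prob in (0.5, 1): bisection on a Simpson-rule CDF."""
    const = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(df * math.pi)

    def pdf(x):
        return const * (1 + x * x / df) ** (-(df + 1) / 2)

    def cdf(x, n=2000):  # valid for x >= 0; n must be even for Simpson's rule
        h = x / n
        inner = sum((4 if i % 2 else 2) * pdf(i * h) for i in range(1, n))
        return 0.5 + (pdf(0.0) + pdf(x) + inner) * h / 3

    lo, hi = 0.0, 200.0
    for _ in range(60):
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if cdf(mid) < prob else (lo, mid)
    return (lo + hi) / 2


def random_effects_auc(aucs, ses, level=0.95):
    """Random effects meta-analysis of AUCs on the logit scale (DerSimonian-Laird tau^2)."""
    k = len(aucs)
    if k < 3:
        raise ValueError("need at least three estimates (prediction interval uses k - 2 df)")
    y = [logit(a) for a in aucs]
    # Delta method: SE(logit AUC) = SE(AUC) / (AUC * (1 - AUC))
    v = [(se / (a * (1 - a))) ** 2 for a, se in zip(aucs, ses)]
    w = [1 / vi for vi in v]
    y_fixed = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    q = sum(wi * (yi - y_fixed) ** 2 for wi, yi in zip(w, y))
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (k - 1)) / c)
    w_re = [1 / (vi + tau2) for vi in v]
    mu = sum(wi * yi for wi, yi in zip(w_re, y)) / sum(w_re)
    se_mu = math.sqrt(1 / sum(w_re))
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_pi = t_quantile(0.5 + level / 2, k - 2) * math.sqrt(tau2 + se_mu ** 2)
    return {
        "auc": expit(mu),
        "ci": (expit(mu - z * se_mu), expit(mu + z * se_mu)),
        "pi": (expit(mu - half_pi), expit(mu + half_pi)),
        "tau2": tau2,
    }


# Hypothetical centre specific AUCs and standard errors (not data from this review).
res = random_effects_auc([0.95, 0.90, 0.93, 0.88, 0.96], [0.02, 0.03, 0.015, 0.04, 0.02])
print(res)
```

Pooling on the logit scale and back-transforming keeps the summary AUC and both intervals inside (0, 1). The prediction interval adds τ² to the uncertainty of the mean, so it is always at least as wide as the confidence interval.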

Patient and public involvement
Patients and the public were not involved in the design, conduct, reporting, or dissemination plans of this research. Preliminary results of the research were presented at the ISUOG World Congress in Seoul (October 2023). As this study was a systematic review, no patient data were collected, and we used only the information publicly available in the published papers.

Results
We identified 1843 records and screened 490 after duplicates were removed. Forty seven studies met our inclusion criteria and were included in this systematic review (figure 1 and online supplemental table S3). 6-9 41-83 … 85-87 The data of three studies that were linked to IOTA and three other studies with a potential conflict of interest were extracted by authors PD and GSC. 6 50 63 72 74 75 Table 1 summarises the key study characteristics, and online supplemental table S3 shows the specific … gives a list of reporting inconsistencies. All studies classified borderline tumours as malignant. ADNEX with CA125 was validated in 37 (79%) studies, and ADNEX without CA125 in 19 (40%) studies (16 studies evaluated both versions). Three (6%) studies conducted a mixed validation, in which ADNEX with CA125 was used when CA125 was available. For four (9%) studies, the ADNEX version was unclear. In total, 63 validations of ADNEX were performed after distinguishing between the ADNEX versions used (online supplemental table S4). When reported results for subgroups (eg, by menopausal status or by centre in multicentre studies) were also included, the 47 studies reported a total of 159 validations.
Five (11%) studies focused only on a specific clinical subgroup, such as pregnant patients or tumours for which the clinician's subjective assessment was uncertain. 9 47 61 65 75 Eight (17%) studies selected patients based on histology (online supplemental table S3). Thirty six (77%) studies did not focus on a specific clinical subgroup and did not select tumours based on histology. In these 36 studies, which were eligible for the meta-analysis, the median sample size was 284 tumours (range 50-4905), the median number of malignant tumours was 68 (7-1041), and the median prevalence of malignancy was 28% (3-57%). Fourteen of the 36 (39%) studies had ≥100 benign and ≥100 malignant tumours.
The target population of the studies was either patients who were managed with surgery or patients who were managed surgically or non-surgically. The reference standard for determining the type of tumour in patients who underwent surgery was always histopathology. Four (9%) studies included patients who were managed with and without surgery. In patients who did not undergo surgery, the outcome was determined by the clinician's subjective assessment of the tumour as benign or malignant, or by spontaneous resolution of the tumour during follow-up. 6-9 The most commonly reported performance measure was the AUC for benign versus malignant tumours (72%) (online supplemental table S5). About two thirds of studies (66%) presented a receiver operating characteristic curve, 31 (66%) reported sensitivity and specificity at the 10% risk of malignancy threshold, 12 (26%) reported measures of multinomial discrimination, and four (9%) reported calibration performance.

Critical appraisal: reporting completeness and risk of bias
Completeness of reporting of the TRIPOD items was assessed for the 63 validations. Adherence to TRIPOD … (figure 2 and online supplemental figure S1). The least commonly reported items were comparison of demographics, predictors, and outcome between the model development and external validation data (item 13c; 5%), reporting of performance measures with confidence intervals (item 16; 11%), specification of all performance measures (item 10d; 11%), rationale for study sample size (item 8; 13%), and description of how missing data were handled (item 9; 22%). Fifty seven (90%) of 63 validations were rated as having a high risk of bias, two (3%) as having an unclear risk of bias, and four (6%) as having a low risk of bias (figure 3, online supplemental figures S2-5, and Open Science Framework repository, extraction sheet https://osf.io/jtsvd/). 29 Forty three (68%) validations had a high risk of bias in the participant domain, mostly because incomplete data were an exclusion criterion. Fifty seven (90%) validations had a high risk of bias in the analysis domain, mostly because of small sample size (69%; ie, <100 tumours in the smallest outcome group), not including all participants in the analysis (85%), inappropriate handling of missing data (82%), and incomplete evaluation of model performance (89%, in most instances because calibration was not assessed).
Adherence to TRIPOD items in 36 studies without a focus on selected histologies or clinical subgroups was, on average, 65% (17.47 of 27 items).In these studies, two had a low risk of bias, one had an unclear risk of bias, and 33 had a high risk of bias.
Net benefit and relative utility were calculated from studies that presented sensitivity and specificity at the 10% risk of malignancy threshold in patients who underwent surgery (online supplemental table S7). For ADNEX without CA125, the summary net benefit was 0.28 (95% confidence interval 0.21 to 0.35, 95% prediction interval 0.05 to 0.68) and the summary relative utility was 0.50 (95% confidence interval 0.37 to 0.62, 95% prediction interval −0.44 to 0.79). The probability that the … S6, and figure S7). For ADNEX with CA125, the summary net benefit was 0.28 (95% confidence interval 0.22 to 0.33, 95% prediction interval 0.05 to 0.65), and the summary relative utility was 0.54 (95% confidence interval 0.45 to 0.61, 95% prediction interval −0.12 to 0.78). The probability that the model is clinically useful in a random new centre was estimated to be 95% (online supplemental table S9 and figures S8 and S9). Pairwise AUC values in patients who underwent surgery were reported in four studies for ADNEX without CA125 and in five studies for ADNEX with CA125 (online supplemental table S10). 6 68 72 80 83 The summary pairwise AUC values for ADNEX without CA125 ranged from 0.66 (stage II-IV primary invasive v metastatic) to 0.97 (benign v stage II-IV primary invasive) (online supplemental table S11). For ADNEX with CA125, the summary pairwise AUC values ranged from 0.72 (borderline v stage I primary invasive) to 0.98 (benign v stage II-IV primary invasive).
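The conditional risk method for pairwise AUC values can be sketched as follows, based on our reading of the method cited in the statistical analysis (reference 34). For a pair of outcome categories i and j, only tumours whose true outcome is i or j are kept. Each tumour is then scored by r_i / (r_i + r_j), the risk of category i conditional on the tumour belonging to i or j. The function name, the three-category dictionaries, and all risk values are hypothetical; real ADNEX output has five categories.

```python
def pairwise_auc(outcomes, risks, cat_i, cat_j):
    """Pairwise AUC for cat_i versus cat_j using conditional risks.

    Keeps only tumours whose true outcome is cat_i or cat_j and scores each by
    r_i / (r_i + r_j), the risk of cat_i given that the tumour is cat_i or cat_j.
    """
    scored = [(o, r[cat_i] / (r[cat_i] + r[cat_j]))
              for o, r in zip(outcomes, risks) if o in (cat_i, cat_j)]
    pos = [s for o, s in scored if o == cat_i]
    neg = [s for o, s in scored if o == cat_j]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical predicted risks for five tumours (three categories only, for brevity).
outcomes = ["benign", "borderline", "benign", "metastatic", "borderline"]
risks = [
    {"benign": 0.80, "borderline": 0.15, "metastatic": 0.05},
    {"benign": 0.30, "borderline": 0.60, "metastatic": 0.10},
    {"benign": 0.50, "borderline": 0.40, "metastatic": 0.10},
    {"benign": 0.20, "borderline": 0.30, "metastatic": 0.50},
    {"benign": 0.45, "borderline": 0.35, "metastatic": 0.20},
]
print(pairwise_auc(outcomes, risks, "benign", "borderline"))  # 0.75; the metastatic tumour is ignored
```

Conditioning on the pair makes the result the same whichever category of the pair is treated as the "event". The risks assigned to the other categories affect the score only through the renormalisation.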
The AUC for benign versus malignant tumours in patients managed with and without surgery combined was reported in two studies (5167 tumours, 18 centres, and eight countries), with a summary estimate of 0.94 (95% confidence interval 0.93 to 0.96, 95% prediction interval 0.88 to 0.99) for ADNEX with CA125 (table 2). 6 7 ADNEX without CA125 was assessed in only one study (4905 tumours, 17 centres, and seven countries). 6 This study reported a summary AUC of 0.94 (95% confidence interval 0.91 to 0.95, 95% prediction interval 0.82 to 0.98).
Table 2 and online supplemental tables S6-9 present the sensitivity and subgroup results for AUC, specificity, sensitivity, net benefit, and relative utility. These results showed that the findings were … When the analysis was limited to studies with a low risk of bias, summary AUC values were 0.93 (with CA125) and 0.91 (without CA125). Sensitivity was higher and specificity lower in oncology versus non-oncology centres and in postmenopausal versus premenopausal patients. In line with these findings, meta-regression suggested that the prevalence of malignancy was not related to the AUC but was positively related to sensitivity and negatively related to specificity (online supplemental figures S10-12). Meta-analysis of calibration in patients who underwent surgery was not feasible because only one study reported the calibration slope and intercept. 6 Four studies presented a calibration plot; in three, 72 80 83 the estimated risks were close to the observed risks, and in one study 6 the risk of malignancy was slightly underestimated (online supplemental material S8).
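Calibration intercept and slope are commonly estimated by logistic recalibration of the predicted risk. The sketch below assumes that convention and applies it to the total risk of malignancy; the included study may have computed these measures differently. The intercept (calibration-in-the-large) is fitted with logit(risk) as an offset. The slope is the coefficient of logit(risk) when it is the only covariate. Both are fitted by Newton-Raphson. The grouped example data are hypothetical.

```python
import math


def _expit(x):
    return 1 / (1 + math.exp(-x))


def _logit(p):
    return math.log(p / (1 - p))


def calibration_intercept(y, risk, iters=50):
    """Calibration-in-the-large: fit logit P(y=1) = a + logit(risk), logit(risk) as offset."""
    x = [_logit(p) for p in risk]
    a = 0.0
    for _ in range(iters):
        mu = [_expit(a + xi) for xi in x]
        score = sum(yi - mi for yi, mi in zip(y, mu))
        info = sum(mi * (1 - mi) for mi in mu)
        a += score / info
    return a


def calibration_slope(y, risk, iters=50):
    """Fit logit P(y=1) = a + b * logit(risk) by Newton-Raphson; returns (a, b)."""
    x = [_logit(p) for p in risk]
    a, b = 0.0, 1.0
    for _ in range(iters):
        mu = [_expit(a + b * xi) for xi in x]
        wts = [mi * (1 - mi) for mi in mu]
        g0 = sum(yi - mi for yi, mi in zip(y, mu))
        g1 = sum((yi - mi) * xi for yi, mi, xi in zip(y, mu, x))
        h00 = sum(wts)
        h01 = sum(wi * xi for wi, xi in zip(wts, x))
        h11 = sum(wi * xi * xi for wi, xi in zip(wts, x))
        det = h00 * h11 - h01 * h01
        a += (h11 * g0 - h01 * g1) / det  # Newton step: inverse information times score
        b += (h00 * g1 - h01 * g0) / det
    return a, b


def grouped(events):
    """Hypothetical data: 10 tumours at each predicted risk of 20%, 50%, and 80%."""
    y, risk = [], []
    for p, e in zip([0.2, 0.5, 0.8], events):
        y += [1] * e + [0] * (10 - e)
        risk += [p] * 10
    return y, risk


y_ok, r_ok = grouped([2, 5, 8])      # observed proportions equal the predicted risks
y_over, r_over = grouped([1, 3, 6])  # fewer malignancies than predicted
print(calibration_intercept(y_ok, r_ok), calibration_slope(y_ok, r_ok))
print(calibration_intercept(y_over, r_over), calibration_slope(y_over, r_over))
```

Under this convention, an intercept below 0 suggests that risks are overestimated on average. A slope below 1 suggests that the estimated risks are too extreme.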
Based on the subdomains of the GRADE assessment, we found that the risk of bias in the included studies was a substantial limitation affecting the certainty of our meta-analysis results. Only two studies (representing 5511 tumours, or 32% of the total tumours included in this review) were not classified as having a high risk of bias, 6 72 but the sensitivity analysis according to risk of bias showed consistent findings (table 2 and online supplemental tables S8 and S9). Funnel plots for AUC did not suggest publication bias (online supplemental figure S13).

Principal findings
The ADNEX model performed well in classifying tumours and in differentiating between benign and malignant tumours across various settings and populations.Our results indicated that ADNEX was clinically useful at the 10% risk of malignancy threshold (eg, to help decide whether a patient should be referred for assessment to a gynaecological oncology centre).We found deficiencies in study reporting, and most studies were judged to have a high risk of bias, but our sensitivity analyses indicated that performance was almost identical in studies with a low risk and high risk of bias.
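The net benefit and relative utility calculations behind the clinical usefulness findings can be sketched as follows. This sketch assumes the usual definitions. Net benefit at risk threshold pt is sensitivity × prevalence − (1 − specificity) × (1 − prevalence) × pt/(1 − pt). At pt = 0.10 the weight pt/(1 − pt) is 1/9, so one false positive costs one ninth of a true positive. Relative utility is the model's net benefit gain over the better default strategy ("treat all" or "treat none"), divided by the gain achieved by perfect prediction. The helper `prob_useful` is hypothetical and does not reproduce the review's bayesian trivariate model. It draws sensitivity, specificity, and prevalence independently on the logit scale from assumed predictive distributions, ignoring their correlations, and only illustrates what "probability that relative utility is >0" means.

```python
import math
import random


def net_benefit(sens, spec, prev, pt=0.10):
    """Net benefit at risk threshold pt; false positives are weighted by the odds pt/(1 - pt)."""
    return sens * prev - (1 - spec) * (1 - prev) * pt / (1 - pt)


def relative_utility(sens, spec, prev, pt=0.10):
    """Gain over the best default strategy, relative to the gain of perfect prediction."""
    nb_all = net_benefit(1.0, 0.0, prev, pt)  # classify every tumour as malignant
    best_default = max(nb_all, 0.0)           # classifying none as malignant has net benefit 0
    nb_perfect = prev                         # sensitivity = specificity = 1
    return (net_benefit(sens, spec, prev, pt) - best_default) / (nb_perfect - best_default)


def expit(x):
    return 1 / (1 + math.exp(-x))


def prob_useful(mean_logit, sd_logit, n_draws=20000, seed=1, pt=0.10):
    """Share of simulated new centres in which relative utility is > 0.

    mean_logit and sd_logit hold hypothetical predictive means and SDs for
    logit(sensitivity), logit(specificity), and logit(prevalence). Draws are
    independent, unlike a trivariate model that also estimates correlations.
    """
    rng = random.Random(seed)
    useful = 0
    for _ in range(n_draws):
        sens, spec, prev = (expit(rng.gauss(m, s)) for m, s in zip(mean_logit, sd_logit))
        useful += relative_utility(sens, spec, prev, pt) > 0
    return useful / n_draws


# Hypothetical centre: sensitivity 0.95, specificity 0.80, prevalence 0.33.
print(net_benefit(0.95, 0.80, 0.33), relative_utility(0.95, 0.80, 0.33))
means = [math.log(0.95 / 0.05), math.log(0.80 / 0.20), math.log(0.33 / 0.67)]
print(prob_useful(means, [0.6, 0.5, 0.5]))
```

A relative utility of 1 corresponds to perfect prediction. A value of 0 means the model does no better than the better default strategy, which is why the probability that relative utility is >0 serves as the probability that the model is clinically useful.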
For ADNEX with CA125, the AUC was 0.93 based on all studies and 0.93 based only on studies with a low risk of bias. For ADNEX without CA125, the AUC values were 0.93 (all studies) and 0.91 (low risk of bias only). High risk of bias was mainly caused by small sample size, no assessment of calibration performance, and unjustified use of complete case analysis. Small sample size makes the estimated AUC less precise but does not systematically bias it unless publication bias exists, and the funnel plots did not suggest publication bias. Absence of a calibration assessment does not affect the AUC. Using complete cases, with respect to CA125 or other predictors, might lead to underestimation of the AUC because missing values tend to be associated with the examiner's subjective impression that the tumour is benign. 84 Complete case analysis would then tend to exclude clearly benign tumours, which would make the sample more homogeneous and reduce the AUC. Our results, however, suggest that the effect of complete case analysis on performance might have been minimal. The effect on calibration could not be assessed. Taken together, we believe that the results of our meta-analysis are reliable.

Figure 6 | Forest plot of sensitivity and specificity at the 10% risk of malignancy threshold in studies where the ADNEX (Assessment of Different Neoplasias in the adnexa) model was used without CA125 (cancer antigen 125). 6

Strengths and limitations of this study
Strengths of our systematic review include the meta-analysis of ADNEX both as a risk prediction model and as a diagnostic test, and the thorough critical appraisal of risk of bias and reporting quality with recommended checklists. 21 28 Our study also had limitations. Some of the authors of this study had a conflict of interest because of their involvement in developing ADNEX or in some of the included external validation studies. To deal with this conflict of interest, independent researchers with expertise in study methodology and prediction modelling evaluated the IOTA related studies. A limitation of our findings (but not of our study) is that calibration performance was reported in only four studies, so meta-analysis of calibration was not possible.

Comparison with other studies
… 15-18 These studies used the QUADAS-2 (Quality Assessment of Diagnostic Accuracy Studies 2) tool 88 to assess risk of bias and found that 0-64% of studies had a high risk of bias in at least one domain. Using PROBAST, which was designed for appraising studies of risk prediction models, we judged 45 of 47 studies to have a high risk of bias. None of the previous meta-analyses of ADNEX included clinical utility, calibration, or AUC. Our results align with those of other systematic reviews of risk prediction modelling studies in various domains. These studies consistently indicated that reporting in the original studies was poor and that many studies had a high risk of bias.

Study implications
ADNEX is intended for use by gynaecologists to help them decide on the most appropriate management of an adnexal mass detected on ultrasound. Our findings support the use of ADNEX in choosing between surgery and conservative follow-up. Conservative follow-up might be appropriate in patients with a low risk of malignancy (eg, <1%, based on meta-analysis of patients managed with and without surgery). Our results also support the use of ADNEX in deciding on the most appropriate management if conservative management is not suitable (eg, surgery at a local hospital if the risk of malignancy is <10% or referral to an oncology centre if the risk is >10%; based on meta-analysis of patients who underwent surgery). 4 ADNEX can also be helpful in deciding on the management of a suspected malignancy (ie, investigations to find the primary tumour if a metastasis in the ovary is likely, or fertility sparing surgery if a borderline tumour is likely; based on meta-analysis in patients who underwent surgery). Because the AUC values of ADNEX with and without CA125 were similar, and because adding CA125 mainly helps to distinguish between different types of malignant tumours, we argue that the main use of ADNEX without CA125 is to help decide whether conservative follow-up, surgery in a local centre, or referral to an oncology centre is appropriate. The main use of ADNEX with CA125 is to help decide on the optimal management of a tumour suspected to be malignant, because it differentiates better between malignant subtypes than ADNEX without CA125.
Although our findings suggest that ADNEX is clinically useful, well conducted validations of any model are always of value to monitor its performance in diverse regions and clinical settings, and over time. 96 To improve the performance of ADNEX further, efforts to update the ADNEX formula are of interest. 96 97 If further validation studies are conducted, we recommend including a validation of ADNEX without CA125, using a sample large enough to allow calibration and multinomial discrimination to be assessed, and including patients irrespective of whether they are managed surgically or non-surgically (despite the challenges of establishing a reference standard for patients managed without surgery). 34 98 Methodological recommendations for validation studies include using available tools to ensure an adequate sample size, describing missing data in detail, using methods such as imputation when needed, and assessing calibration performance. 24 97 99-101 Adherence to the TRIPOD reporting checklists is important to maximise the value of a validation study (www.tripod-statement.org).

Conclusions
ADNEX has been validated in many studies, with AUC values >0.90 for differentiating between benign and malignant tumours in various settings, and with strong results for its clinical utility at the 10% risk of malignancy threshold. Because calibration was not assessed in most studies, a thorough evaluation of the accuracy of the estimated risks was not possible in this study.
those of the NHS, NIHR, or Department of Health and Social Care. GC is supported by Cancer Research UK (programme grant C49297/A27294). PD is supported by CRUK (project grant PRCPJT-Nov21\100021). The funders had no role in considering the study design or in the collection, analysis, or interpretation of data, the writing of the report, or the decision to submit the article for publication. Ethics approval Ethics approval was not required for the systematic review and meta-analysis.

Competing interests
Provenance and peer review Not commissioned; externally peer reviewed.

Figure 2 | Adherence to reporting the TRIPOD (Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis) items in 63 validations

Figure 4 | Forest plot of area under the receiver operating characteristic curve in studies where the ADNEX (Assessment of Different Neoplasias in the adnexa) model was used without CA125 (cancer antigen 125). 6 45 49 53 54 60 66-69 72 83 Results in the forest plot are centre specific results, so studies with more than one centre can appear multiple times. CI=confidence interval

Figure 5 | Forest plot of area under the receiver operating characteristic curve in studies where the ADNEX (Assessment of Different Neoplasias in the adnexa) model was used with CA125 (cancer antigen 125). 6 41 42 46 49 53 58 60 63 64 66-69 72 74 76 78 80 82 83 Results in the forest plot are centre specific results, so studies with more than one centre can appear multiple times. CI=confidence interval

All authors have completed the ICMJE uniform disclosure form at www.icmje.org/disclosure-of-interest/ and declare: support from Research Foundation Flanders (FWO), Internal Funds KU Leuven, National Institute for Health and Care Research Community Healthcare MedTech, In Vitro Diagnostics Co-operative at Oxford Health NHS Foundation Trust, Cancer Research UK, and CRUK for the submitted work; BVC, LV, and DT are members of the steering committee of the International Ovarian Tumour Analysis (IOTA) consortium and were involved in the development of the ADNEX model; BVC and DT report consultancy work done by KU Leuven to help implement and test the ADNEX model in ultrasound machines by Samsung Medison, GE Healthcare, Canon Medical Systems Europe, and Shenzhen Mindray Bio-medical Electronics, outside the submitted work; GC is a statistics editor for the BMJ; no financial relationships with any organisations that might have an interest in the submitted work in the previous three years; no other relationships or activities that could appear to have influenced the submitted work.

Table 2 | Meta-analysis of area under the receiver operating characteristic curve to differentiate between benign and malignant tumours, including sensitivity and subgroup analyses