Methods Primer

Assessing the methodological quality and risk of bias of systematic reviews: primer for authors of overviews of systematic reviews

Key messages

  • Systematic reviews underpin evidence based healthcare decision making, but flaws in their conduct may lead to biased estimates of intervention effects and hence invalid recommendations

  • Overviews of reviews (also known as umbrella reviews, meta-reviews, or reviews of reviews) evaluate biases at the systematic review level, among others, but proper use of tools for this purpose requires training, time, and an appreciation of their strengths and limitations

  • AMSTAR-2 and ROBIS are the two most popular and rigorous critical appraisal tools used for appraising systematic reviews

  • The AMSTAR-2 16-item checklist focuses on methodological quality of systematic reviews of healthcare interventions, and incorporates aspects of review conduct, reporting comprehensiveness, and risk of bias as specific items

  • ROBIS is a domain based tool with 24 items focusing on risk of biases in a systematic review (eg, selective reporting of outcomes or analyses) of healthcare interventions and contains items related to risk of bias in results and conclusions, relevance, and an item about risk of interpretation bias or ‘spin’

Carole Lunny and colleagues consider methods such as AMSTAR-2 and ROBIS tools to evaluate the methodological quality and risk of bias of systematic reviews of intervention effects that are included in overviews of reviews

Introduction

The number of published overviews of reviews, which synthesise the findings of systematic reviews,1 has increased substantially over the past decade.2 However, no consensus exists on the terminology used to describe them, and terms such as umbrella reviews, meta-reviews, and reviews of reviews are used interchangeably to mean overviews of reviews. Methods research has been ongoing since the 2010s to develop effective approaches for conducting overviews of reviews and addressing their unique characteristics.3–7 Overview authors use various approaches to assess the methodological quality and risk of bias in their included systematic reviews, and they apply these assessments to inform the overviews' results and conclusions. However, proper use of tools for this purpose requires training, time, and an appreciation of their strengths and limitations. This methods primer aims to address the inconsistency in assessing and reporting bias in systematic reviews of intervention effects included within overviews, and focuses on presenting the different validated tools, comparing them, and providing guidance on the interpretation and reporting of these assessments.

Assessing the methodological quality and risk of bias

Assessment tools

Many tools exist to evaluate methodological quality and risks of bias in systematic reviews, but they have been developed with different purposes, and choosing among them is difficult. Two reviews identified more than 40 critical appraisal tools for systematic reviews and evaluated their content and measurement properties.8 9 After these reviews were published, two new tools were developed (ie, ROBIS and AMSTAR-2),10 11 and one is under development (Risk of Bias in Network Meta-Analysis (RoB NMA)12 13).

In 2016, ROBIS was developed to assess risk of bias in systematic reviews.11 ROBIS consists of three phases: assessment of relevance (optional), identification of bias concerns with the review process, and judgement of the overall risk of bias in the review. The second phase covers four domains: study eligibility criteria, identification and selection of studies, data collection and study appraisal, and synthesis and findings. ROBIS helps reviewers identify potential biases in these domains by asking specific questions related to the review's methods and reporting. The tool underwent content validity and reliability testing to ensure its accuracy and consistency in assessing the risk of bias in systematic reviews.

In 2017, an update to AMSTAR, called AMSTAR-2,10 aimed to assess methodological quality of systematic reviews, and involved inter-rater reliability and usability testing. AMSTAR-2 consists of 16 items that evaluate various aspects of the systematic review process, including the research question formulation, study selection and data extraction, assessment of risk of bias in individual studies, consideration of publication bias, and appropriate statistical analysis. This tool also assesses the overall methodological quality and risk of bias in the review, providing a comprehensive evaluation.

The decision about how to evaluate overall risk of bias for ROBIS is made at the assessors' discretion, as opposed to the AMSTAR-2 overall judgement, which is prescribed by AMSTAR-2 guidance. Examples of how to interpret methodological quality and risk of bias assessments, and how to make an overall judgement are found in box 1.

Box 1

Decision rules: how to decide that the results of a review are of high quality or at low risk of bias overall

Decision rules are a priori strategies that define explicitly how each item is rated, as well as how an overall judgement is made about a specific systematic review with the AMSTAR-2 and ROBIS tools. In the case of AMSTAR-2, the authors of the tool stipulate how to come to an overall high quality rating in the results of the review, but not how to rate each item. For example, item 15 of AMSTAR-2 asks assessors whether an adequate investigation of publication bias (small study bias) was conducted and whether its likely effect on the results was discussed. However, the AMSTAR-2 team did not specify what happens when 10 studies or fewer were included (ie, the analysis will be underpowered to detect publication bias), what methods to detect publication bias are recommended, and if publication bias is detected, how it should be discussed (ie, as a systematic review limitation).

The ROBIS tool equally does not specify what decision rules should be used for assessment of risk of bias, nor how to come to an overall judgement. For example, item 4.6 of ROBIS ("Were biases in primary studies minimal or addressed in the synthesis?") is similar to item 12 of AMSTAR-2 ("If meta-analysis was performed, did the review authors assess the potential impact of risk of bias in individual studies on the results of the meta-analysis?"). Of note, risk of bias should be assessed in any systematic review regardless of whether a meta-analysis was performed. A possible decision rule for answering these two questions when considering whether bias was addressed and considered in the results and their interpretation could be to respond "Yes" or "Probably/Partial Yes" if:

  • All studies received a low risk of bias rating; or

  • Studies were judged at high risk of bias and sensitivity analyses (grouping high v low risk studies in a meta-analysis) or adjustment approaches were used

For a "No" response:

  • Important biases suspected in the included studies were ignored by the review authors; or

  • Risk of bias was not assessed at all in the included studies; or

  • Bias was assessed but authors did not incorporate it into findings, discussion, and conclusions
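The example decision rule above can be written down explicitly, which helps two assessors apply it in the same way. The following is a minimal sketch in Python, not part of either tool: the field names, the order in which the conditions are checked, and the "Unresolved" fallback for cases the rule does not cover are all assumptions made for illustration, and the two "Yes" conditions are treated as alternatives.

```python
from dataclasses import dataclass


@dataclass
class ReviewEvidence:
    """What an assessor recorded about one systematic review (hypothetical fields)."""
    rob_assessed: bool               # risk of bias was assessed in the included studies
    important_biases_ignored: bool   # suspected important biases were ignored by review authors
    all_studies_low_rob: bool        # every included study was rated at low risk of bias
    sensitivity_or_adjustment: bool  # sensitivity analyses or adjustment approaches were used
    rob_incorporated: bool           # bias was carried into findings, discussion, and conclusions


def rate_bias_item(e: ReviewEvidence) -> str:
    """Apply the example rule for ROBIS item 4.6 / AMSTAR-2 item 12."""
    # "No": risk of bias was not assessed at all
    if not e.rob_assessed:
        return "No"
    # "No": important suspected biases were ignored
    if e.important_biases_ignored:
        return "No"
    # "Yes" (or "Probably/Partial Yes"): either condition is sufficient
    if e.all_studies_low_rob or e.sensitivity_or_adjustment:
        return "Yes"
    # "No": bias was assessed but not incorporated into the interpretation
    if not e.rob_incorporated:
        return "No"
    # Assumption: anything the rule does not cover goes to discussion between assessors
    return "Unresolved"


# Usage: high risk studies were present and a sensitivity analysis was run
example = ReviewEvidence(
    rob_assessed=True,
    important_biases_ignored=False,
    all_studies_low_rob=False,
    sensitivity_or_adjustment=True,
    rob_incorporated=True,
)
print(rate_bias_item(example))
```

Writing the rule in this form exposes the cases it leaves open (here returned as "Unresolved"), which assessors would need to settle by discussion and record alongside the rule.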

Based on the above decision rules, how would the following statement be rated? "We planned on conducting sensitivity analysis on the studies based on their level of risk of bias. Most of the included studies had a similar risk of bias across all the domains except for industry sponsorship bias and incomplete data for total testosterone. Due to the inadequate number of studies, we were not able to conduct a sensitivity analysis on the included studies based on industry sponsorship."

For overall judgements, a decision rule could be that if one or more ROBIS domains are at high risk of bias, then the review overall is deemed at high risk of bias. For AMSTAR-2, the authors of the tool have stipulated that the review is considered of low or critically low quality when the subset of seven ‘critical’ items contains one or more critical flaws. While the decisions about how to rate the items and make overall judgements can be debated, the grounds on which overview authors make these decisions should be noted explicitly in the manuscript or in an appendix, so that the assessment results are transparent and reproducible.

  • Cautionary note: empirical evidence does not currently support the assignment of scores to items that are met in a risk of bias tool followed by the summation or averaging of these scores to produce a numerical measure of risk of bias. A thoughtful, nuanced, and customised overall judgement is required that considers all items with suspected bias on the basis of specific context.
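The overall judgement rules described in this box could be sketched as follows. This is an illustration under stated assumptions, not an implementation supplied with either tool. For ROBIS, the "any high risk domain" rule is the example given above, and the handling of "unclear" domains is an added assumption. For AMSTAR-2, the set of critical items (2, 4, 7, 9, 11, 13, and 15) and the thresholds for "high" and "moderate" follow our reading of the published AMSTAR-2 guidance and are not stated in this primer. The function and variable names are hypothetical. The AMSTAR-2 rule assigns categories from the presence of critical flaws; it does not produce a summed numerical score.

```python
# Assumption: critical items as listed in the published AMSTAR-2 guidance
AMSTAR2_CRITICAL_ITEMS = {2, 4, 7, 9, 11, 13, 15}


def robis_overall(domain_judgements: dict[str, str]) -> str:
    """Example rule: one or more high risk domains makes the review high risk."""
    values = [judgement.lower() for judgement in domain_judgements.values()]
    if "high" in values:
        return "high"
    # Assumption: any unclear domain, with no high risk domain, gives "unclear"
    if "unclear" in values:
        return "unclear"
    return "low"


def amstar2_overall(flawed_items: set[int]) -> str:
    """Map the set of flawed AMSTAR-2 item numbers to an overall rating."""
    critical_flaws = flawed_items & AMSTAR2_CRITICAL_ITEMS
    non_critical = flawed_items - AMSTAR2_CRITICAL_ITEMS
    if len(critical_flaws) > 1:
        return "critically low"
    if len(critical_flaws) == 1:
        return "low"
    if len(non_critical) > 1:
        return "moderate"
    return "high"


# Usage with hypothetical assessments
print(robis_overall({
    "study eligibility criteria": "low",
    "identification and selection of studies": "high",
    "data collection and study appraisal": "low",
    "synthesis and findings": "unclear",
}))
print(amstar2_overall({2, 15}))
```

Whatever rule is chosen, recording it in this explicit form in an appendix is one way to make the overall judgements reproducible.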

The AMSTAR-2 and ROBIS tools were designed to assess systematic reviews with pairwise meta-analysis only. A more recent tool under development aims to assess the potential biases and limitations in network meta-analyses.12 13 Guidance documents (eg, Cochrane14 and JBI15) recommend that overview authors use ROBIS or AMSTAR-2 over other available tools when comparing and critically appraising systematic reviews. Figure 1 presents two example assessments conducted by our team: the ROBIS assessment of Normansell and colleagues16 is presented at the domain level, and the AMSTAR-2 assessment of Puig and colleagues17 is presented by item. Items are backed by quotes and rationales to support the answers chosen, for full transparency, and to help when comparing assessments between two independent assessors (figure 2).

Figure 1

Example assessments using ROBIS of Normansell16 and AMSTAR-2 of Puig17. The ROBIS assessment is presented by domain and the AMSTAR-2 assessment by individual items. ROBIS's phase one, where the assessor considers the relevance of the systematic review questions to the overview's question, is not shown. The decision about how to evaluate overall risk of bias for ROBIS is made at the assessors' discretion, as opposed to the AMSTAR-2 overall judgement, which is prescribed by AMSTAR-2 guidance

Figure 2

PICO framework stands for patient or problem, intervention or exposure, comparison or control, and outcomes. DLQI=dermatology life quality index; DMARDs=disease modifying anti-rheumatic drugs; PASI=psoriasis area and severity index; RCT=randomised controlled trial

Comparison of AMSTAR-2 and ROBIS

Both the AMSTAR-2 and ROBIS tools provide structured guidelines for reviewers to evaluate and report on methodological strengths and weaknesses as well as potential biases in systematic reviews, contributing to the overall reliability and credibility of the evidence presented. Considerable overlap exists between the items of the two tools (figure 1). In the documentation for each tool, AMSTAR-2 states that it was developed for systematic reviews of healthcare interventions whereas ROBIS states that it is aimed at reviews of healthcare interventions, diagnosis, prognosis, and biological cause. In practice, the ROBIS tool is generic and its signalling questions relate to interventions in the clinical or public health fields. Questions specific to systematic reviews of diagnosis, prognosis, and biological cause are not found in the tool. AMSTAR-2 was developed to assess methodological quality (which includes indicators of risk of bias) while ROBIS was developed primarily to assess risk of bias but also includes items that address methodological quality.

AMSTAR-2 focuses more on reporting comprehensiveness (eg, reporting of study designs for inclusion and reporting on excluded studies with justification) and methodological quality or transparency constructs (eg, pre-established protocol, sources of funding of primary studies, and reviewers' competing interests), whereas ROBIS focuses on items related to identification of the different biases (eg, selective reporting of outcomes or analyses and publication bias). Bias occurs when factors systematically affect the results and conclusions of a review and cause them to differ from the truth.1 Systematic reviews affected by bias can be inaccurate; for example, finding false positive or false negative intervention effects by systematically overestimating or underestimating the true effect in the target population. Methodological quality focuses on methodological features associated with internal validity. In theory, assessing risk of bias is the preferred approach because a review might have good methodological quality while still being at high risk of bias. For example, a systematic review might have been conducted according to stated guidance, but some relevant databases were not searched for evidence (database selection bias), leaving out crucial primary studies that may affect the results of the review.

In general, assessors found that AMSTAR-2 was more straightforward and user friendly than ROBIS.18 19 The two tools had similar inter-rater reliability.18 20 21 The range in time taken to use AMSTAR-2 was similar to ROBIS (14-60 v 16-60 min) across three comparison studies18 20 21 (table 1). ROBIS users required training and practice in using the tool22 23 and it was often understood and applied differently.20 AMSTAR-2 has been criticised for unclear guidance on some items,24–26 which can lead to varying interpretations and applications. ROBIS is accompanied by voluminous guidance, which can be difficult to manage by the user.21–23
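When two independent assessors rate the same set of items, agreement beyond chance can be summarised with a statistic such as Cohen's kappa. The sketch below is an illustration only: the choice of unweighted kappa is an assumption (the statistics used in the cited comparison studies are not described here), and the function name and ratings are hypothetical.

```python
from collections import Counter


def cohen_kappa(rater_a: list[str], rater_b: list[str]) -> float:
    """Unweighted Cohen's kappa for two raters scoring the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("Both raters must score the same non-empty set of items")
    n = len(rater_a)
    # Observed proportion of items on which the raters agree
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's marginal frequencies
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        # Both raters used a single identical category throughout
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical item-level ratings from two assessors
assessor_1 = ["Yes", "Yes", "No", "No"]
assessor_2 = ["Yes", "No", "No", "No"]
print(cohen_kappa(assessor_1, assessor_2))
```

In this hypothetical example the assessors agree on three of four items, and half of that agreement would be expected by chance given how often each assessor used each response.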

Table 1
Comparison of the AMSTAR-2 and ROBIS tools

While AMSTAR-2 and ROBIS are both widely used tools for assessing systematic reviews, in some situations, one may be preferred over the other. AMSTAR-2 may be preferred when:

  • the primary focus is evaluating the methodological quality of a systematic review of interventions;

  • the aim is to broadly assess aspects of review conduct, reporting comprehensiveness, and risk of bias; or

  • a relatively quick and easy to use tool is sought, because AMSTAR-2 has fewer items compared with ROBIS.

ROBIS may be preferred when:

  • the aim is to identify concerns with the review conduct that may point to risk of biases in the results and conclusions, as well as assessing relevance and minimising interpretation bias or ‘spin’;

  • a more nuanced tool is sought, which may involve more thoughtful assessment and time, because ROBIS contains more items compared with AMSTAR-2; or

  • the aim is to assess multiple types of systematic reviews to compare risk of bias across them (eg, when preparing a clinical practice guideline).

Reporting and interpretation

When reporting and interpreting the overview results, assessors should note some key considerations with AMSTAR-2 and ROBIS assessments. Authors should first report methodological quality or bias assessment results by item, domain, and overall judgement. In addition, assessment should be reported at the outcome level as opposed to the systematic review level.18 Several responses to AMSTAR-2 item 13 (whether risk of bias was discussed or interpreted) are possible when multiple outcomes (eg, mortality and adverse events) are reported in one systematic review. Ideally, results of intervention overviews should be reported by qualifying the inherent methodological quality or risk of bias in the included systematic reviews as potential limitations.

Subgrouping systematic reviews by low and high risk of bias using ROBIS can be a useful way to determine whether authors of reviews of interventions that have a high risk of bias overemphasised their findings and conclusions. Subgrouping also allows overview authors to exclude systematic reviews that are at a high risk of bias from the synthesis. However, using a single criterion (ie, including only the systematic reviews at low risk of bias) for inclusion in analyses can result in unintended loss of information through exclusion of important systematic review data (eg, by excluding the systematic review with the greatest number of unique trials).
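The trade-off described above can be checked before a subgroup is excluded. The following minimal sketch groups systematic reviews by their overall ROBIS judgement and lists the primary trials that would disappear from the overview if the high risk subgroup were dropped. All review identifiers, trial identifiers, judgements, and function names are hypothetical and serve only to illustrate the check.

```python
# Hypothetical included systematic reviews with their overall ROBIS judgement
# and the primary trials each one contains
reviews = [
    {"id": "SR-A", "robis": "low", "trials": {"T1", "T2"}},
    {"id": "SR-B", "robis": "high", "trials": {"T2", "T3", "T4", "T5"}},
    {"id": "SR-C", "robis": "low", "trials": {"T1", "T6"}},
]


def subgroup_by_rob(reviews: list[dict]) -> dict[str, list[str]]:
    """Group review identifiers by overall risk of bias judgement."""
    groups: dict[str, list[str]] = {}
    for review in reviews:
        groups.setdefault(review["robis"], []).append(review["id"])
    return groups


def trials_lost_by_excluding(reviews: list[dict], excluded_level: str = "high") -> set[str]:
    """Trials found only in reviews at the excluded risk of bias level."""
    kept = set().union(*(r["trials"] for r in reviews if r["robis"] != excluded_level))
    dropped = set().union(*(r["trials"] for r in reviews if r["robis"] == excluded_level))
    return dropped - kept


print(subgroup_by_rob(reviews))
print(sorted(trials_lost_by_excluding(reviews)))
```

In this made-up example, excluding the single high risk review removes three trials that no low risk review covers, which is the kind of information loss overview authors would want to report when justifying their inclusion criteria.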

Conclusions

Overviews are used by guideline developers and policy makers to summarise large bodies of evidence in consideration of interventions of interest on a given topic. Using the appropriate tools to critically appraise included systematic reviews of intervention effects means that methodological quality is completely assessed and all the potential biases are considered. Systematic reviews vary considerably by method, how data are synthesised, and how results and conclusions are reported; therefore, an assessment of potential biases is necessary to consider their reproducibility, trustworthiness, and usefulness for end users. At this time, the recommended tools to assess methodological quality and bias among systematic reviews included in overviews are AMSTAR-2 and ROBIS. Proper use of these tools for this purpose requires training, time, and methodological insight.