A checklist for evaluating a systematic review

Last Updated on October 9th, 2017 by

In this article, we focus on some of the key elements to look for in a systematic review, and how to assess its credibility, when searching for answers to a specific clinical question. As outlined in our last blog, a systematic review is a summary of all relevant studies on a clearly formulated clinical question, using systematic methods according to a strict, pre-defined protocol. It often involves meta-analysis, a statistical technique for pooling the results from different studies to provide a single estimate of effect. A major limitation of systematic reviews is that they are only as good as the studies they summarize.

For simplicity, we define ‘treatment’ as any intervention, exposure, or clinical attribute that is being assessed in the systematic review, and ‘placebo’ as the comparison intervention or exposure. 

  1. There should be a clear focused clinical question.

As with any individual study report, authors should clearly state the clinical question. This should include four elements, often referred to in the literature as PICO: the patient (P) or study population characteristics; the intervention (I), exposure, or treatment regimen; a comparison (C) intervention or treatment; and specific outcomes (O).

  2. Is there sufficient detail on how the literature search was conducted?

A well-designed systematic review should report how studies were identified, e.g. which electronic databases were searched, any language restrictions, and any additional sources of data, including clinical trials registers and conference reports, and whether any unpublished studies were included.

If the search for relevant studies is not exhaustive, the results of the systematic review may be flawed. For example, one study showed that searching MEDLINE alone retrieved only 55% of eligible clinical trials. It is important to search multiple electronic databases, including EMBASE and The Cochrane Library, using a range of search terms, medical subject headings (MeSH) and synonyms to yield the best results.

  3. Are there pre-defined criteria for which study types will be included?

A systematic review that involves a therapeutic intervention, or will contribute to clinical guideline development, should prioritize randomized controlled trials where available, as these are more reliable and less subject to selection bias compared to observational study designs. Systematic reviews that aim to assess the adverse effects of treatment may include observational studies, such as case-control studies and post-marketing surveillance studies.

PRISMA guidelines recommend that more than one contributor should be involved in selecting and reviewing the studies for inclusion, in order to avoid subjective decisions. The kappa statistic (κ) of inter-reviewer agreement should be estimated and reported to provide readers and those using the results with a degree of confidence in the systematic review.
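As a minimal sketch of how inter-reviewer agreement could be quantified, the example below computes Cohen's kappa for two reviewers who each make an include/exclude decision on the same set of candidate studies. The function name `cohen_kappa` and the screening decisions are hypothetical and chosen only for illustration; a real review would use its own screening records and may prefer an established statistics package.

```python
def cohen_kappa(decisions_a, decisions_b):
    """Cohen's kappa for two reviewers rating the same studies."""
    if len(decisions_a) != len(decisions_b) or not decisions_a:
        raise ValueError("Both reviewers must rate the same non-empty set of studies")
    n = len(decisions_a)

    # Proportion of studies on which the two reviewers agree
    observed = sum(a == b for a, b in zip(decisions_a, decisions_b)) / n

    # Agreement expected by chance, from each reviewer's own label frequencies
    labels = set(decisions_a) | set(decisions_b)
    expected = sum(
        (decisions_a.count(label) / n) * (decisions_b.count(label) / n)
        for label in labels
    )

    if expected == 1.0:
        return 1.0  # both reviewers gave every study the same single label
    return (observed - expected) / (1 - expected)


# Hypothetical screening decisions for 10 candidate studies
reviewer_1 = ["include", "include", "exclude", "exclude", "include",
              "exclude", "exclude", "include", "exclude", "exclude"]
reviewer_2 = ["include", "exclude", "exclude", "exclude", "include",
              "exclude", "exclude", "include", "exclude", "include"]

kappa = cohen_kappa(reviewer_1, reviewer_2)
print(f"kappa = {kappa:.2f}")
```

A kappa of 1 indicates perfect agreement and a kappa of 0 indicates agreement no better than chance, so kappa is more informative than the raw percentage of matching decisions.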

  4. Does the systematic review include meta-analysis?

Meta-analysis is a statistical technique that combines the results of multiple studies to produce a single estimate of effect, which tends to be more reliable than those from the individual studies because it is based on a larger sample size. However, meta-analysis should only be performed if the individual studies are sufficiently similar in terms of the PICO (patients, intervention, comparisons and outcomes). It is therefore important that the study question has a relatively narrow focus. For example, consider the following questions:

A. What is the effect of all cancer treatments on cancer outcomes?

B. What is the effect of chemotherapy on ovarian cancer survival?

C. What is the effect of carboplatin-based chemotherapy on ovarian cancer-specific survival?

D. What is the effect of standard doses of paclitaxel and carboplatin chemotherapy on ovarian cancer-specific survival?

Question D considerably narrows the focus of the overall research question to a specific treatment for a specific disease condition and a specific outcome measure, and is more likely than Question A to provide a meaningful result with clinical application. However, the results of Question D will need to be carefully applied, as the question does not address differences in population and other aspects of the disease biology.

  5. The meta-analysis should include a test for study heterogeneity and the results interpreted.

Meta-analysis should always include a statistical test for heterogeneity. This assesses the consistency of the results, or variation in outcomes, across the included studies. Most tests for study heterogeneity generate a p-value that should be reported and interpreted by the author in the context of the clinical implications of the study findings. One of the most widely used measures of heterogeneity is the I² statistic, which describes the percentage of variation across studies that is due to heterogeneity rather than chance. In practice, study heterogeneity may mean that the treatment effect differs between patient groups, possibly according to ethnicity, age, gender, etc.
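To make the I² statistic more concrete, here is a small sketch that derives it from Cochran's Q, a commonly used heterogeneity statistic. It assumes each study supplies an effect estimate on a scale suited to inverse-variance weighting (for example, a log odds ratio) together with its variance. The function name `heterogeneity` and the three study values are hypothetical. The p-value for Q comes from a chi-squared distribution and is left out here to keep the example free of external packages.

```python
def heterogeneity(effects, variances):
    """Return Cochran's Q, its degrees of freedom, and I-squared (percent)."""
    if len(effects) != len(variances) or len(effects) < 2:
        raise ValueError("Need at least two studies with matching variances")

    # Inverse-variance weights: more precise studies count for more
    weights = [1.0 / v for v in variances]
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)

    # Q: weighted squared deviations of each study from the pooled effect
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1

    # I-squared: share of total variation beyond what chance would explain
    i_squared = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return q, df, i_squared


# Hypothetical log odds ratios and variances for three studies
q, df, i_squared = heterogeneity([0.0, 0.5, 1.0], [0.1, 0.1, 0.1])
print(f"Q = {q:.2f} on {df} df, I-squared = {i_squared:.0f}%")
```

When Q is no larger than its degrees of freedom, I² is set to 0%, meaning the observed variation is no more than chance alone would be expected to produce.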

  6. The results of meta-analyses should be graphically displayed with a forest plot.

Forest plots are the most effective way to present individual estimates from the input studies included in the meta-analysis, as well as the single summary estimate derived from the meta-analysis. Study estimates are most commonly expressed as an odds ratio comparing the treatment vs. placebo or comparison intervention, along with their 95% confidence intervals.

The odds ratio is the odds of the outcome in the treatment group divided by the odds of the outcome in the placebo group, where the odds are the number of participants who experience the outcome divided by the number who do not. It is one of several measures often referred to as an effect size. If the odds are the same in both groups, the result of this division is 1, also known as the ‘null’ value, i.e. no difference in outcome between the treatment and the placebo.
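As an illustration, an odds ratio is conventionally computed from a 2x2 table of event counts: the odds of the outcome in the treatment group divided by the odds of the outcome in the placebo group. The sketch below uses a hypothetical function name, `odds_ratio`, and made-up counts.

```python
def odds_ratio(events_treated, total_treated, events_control, total_control):
    """Odds ratio of the outcome, treatment vs. comparison group."""
    a = events_treated                    # treated, outcome occurred
    b = total_treated - events_treated    # treated, outcome did not occur
    c = events_control                    # control, outcome occurred
    d = total_control - events_control    # control, outcome did not occur
    if min(a, b, c, d) <= 0:
        raise ValueError("All four cells must be positive for this simple formula")
    return (a / b) / (c / d)


# Hypothetical trial: 10 of 100 treated and 20 of 100 placebo patients had the outcome
estimate = odds_ratio(10, 100, 20, 100)
print(f"odds ratio = {estimate:.2f}")
```

An odds ratio of 1 would mean the odds of the outcome are the same in both groups. In this made-up trial the odds of the outcome are lower in the treated group, so the odds ratio falls below 1.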

The confidence interval is a range of plausible values for the true effect in the study population. A 95% confidence interval is constructed so that, over repeated sampling, about 95% of such intervals would contain the true effect; any single interval may or may not contain it. The width of the interval indicates how precisely the effect has been estimated. A narrow confidence interval indicates a precise estimate, whereas a wide confidence interval indicates an imprecise estimate that should be interpreted with caution. Precision is not the same as accuracy, however: a narrow interval from a biased study can still be far from the true effect.
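The link between study size and interval width can be sketched with the standard large-sample (Wald) confidence interval for an odds ratio, which is calculated on the log scale. The function name `odds_ratio_ci` and the cell counts are hypothetical. The second study has the same proportions as the first but ten times as many participants.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio with an approximate 95% Wald interval.

    a, b: treated patients with / without the outcome
    c, d: control patients with / without the outcome
    """
    log_or = math.log((a * d) / (b * c))
    # Standard error of the log odds ratio shrinks as cell counts grow
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(log_or - z * se)
    upper = math.exp(log_or + z * se)
    return math.exp(log_or), lower, upper


small = odds_ratio_ci(10, 90, 20, 80)       # hypothetical study, 200 patients
large = odds_ratio_ci(100, 900, 200, 800)   # same proportions, 2000 patients

for name, (estimate, lower, upper) in [("small", small), ("large", large)]:
    print(f"{name}: OR = {estimate:.2f}, 95% CI {lower:.2f} to {upper:.2f}")
```

Both studies give the same point estimate, but the larger study produces a much narrower interval. In this made-up example only the larger study's interval excludes the null value of 1.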

A vertical line on the forest plot marks the ‘null’ value of no difference between treatment groups. The odds ratio for each input study is usually depicted as a square, the size of which reflects the ‘weight’ the study carries in the meta-analysis (larger, more precise studies generally carry more weight); the horizontal line drawn through the square represents the confidence interval. The summary estimate from the meta-analysis is typically a weighted average of the results of the individual input studies and is often represented as a diamond at the bottom of the plot. The top and bottom points of the diamond mark the summary effect estimate, and the left and right points mark the limits of its confidence interval.
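The weighted average behind the diamond can be sketched as follows. This example assumes a fixed-effect, inverse-variance model, which is one common choice and not the only one; random-effects models weight studies differently. The function name `pool_fixed_effect`, the odds ratios, and the standard errors are all hypothetical.

```python
import math


def pool_fixed_effect(odds_ratios, log_standard_errors, z=1.96):
    """Inverse-variance fixed-effect pooling of odds ratios on the log scale."""
    log_ors = [math.log(value) for value in odds_ratios]
    weights = [1 / se ** 2 for se in log_standard_errors]
    total = sum(weights)

    # Summary estimate: weighted average of the study log odds ratios
    pooled_log = sum(w * y for w, y in zip(weights, log_ors)) / total
    pooled_se = math.sqrt(1 / total)

    summary = math.exp(pooled_log)
    lower = math.exp(pooled_log - z * pooled_se)
    upper = math.exp(pooled_log + z * pooled_se)
    relative_weights = [w / total for w in weights]  # drives the square sizes
    return summary, lower, upper, relative_weights


# Hypothetical studies: odds ratios and standard errors of the log odds ratios
summary, lower, upper, relative_weights = pool_fixed_effect(
    [0.5, 0.8, 0.6], [0.4, 0.2, 0.1]
)
print(f"summary OR = {summary:.2f}, 95% CI {lower:.2f} to {upper:.2f}")
print("relative weights:", [f"{w:.0%}" for w in relative_weights])
```

The study with the smallest standard error receives the largest weight, which is why it would be drawn as the largest square and why the summary estimate sits closest to its result.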

  7. The systematic review report should include commentary on bias or study limitations.

A well-conducted systematic review should report information that will help readers decide on the applicability of the results. It should include some commentary on possible sources of bias, e.g. publication bias arising from the tendency of journals to publish studies that have positive effects. Language bias can also be a factor if studies are selected because they are published in an English language journal. Authors of systematic reviews should comment on the risk of bias in the individual studies included, and their interpretation of the result of meta-analysis in the context of these limitations.  They should also include an explanation of study heterogeneity as a potential limitation, as outlined in point #5.

  8. An interpretation of the results and implications for clinical practice or further research should be provided.

The authors should provide an interpretation and explanation of all reported statistical estimates and their meaning in terms of the clinical application of the study findings. Readers of systematic reviews can also visually inspect forest plots to identify differences in effect estimates, overlapping confidence intervals, and the direction of the effect from individual studies, i.e. are most of the odds ratios or squares in the forest plot falling on one side of the ‘null’ line or both sides?

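When the outcome is an adverse event, such as death or disease recurrence, effect estimates falling to the left of the ‘null’ line (odds ratio below 1) indicate that the treatment has a favorable effect on the outcome compared to the placebo; those falling to the right of the ‘null’ line (odds ratio above 1) suggest that the treatment has a worse effect on the outcome compared to the placebo. The interpretation is reversed when the outcome is a beneficial event, so check the axis labels on the forest plot.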

A similar judgment can be made for the summary estimate from a meta-analysis to gauge how confident you can be in these results. However, it is incumbent on the authors to provide details of their interpretation, implications for clinical practice, or whether the results are ready for clinical application, and what limitations to its application should be considered.

There is much debate in the scientific literature about whether systematic reviews contribute to research waste, given the mass production of poorly designed and conducted reviews that make exaggerated claims. Judging the quality of systematic reviews is a first step in determining how credible the findings are, whether the methods conform to PRISMA guidelines for conduct and reporting, and how confident we can be in applying their results to healthcare.

