Systematic-Review 2024
The usefulness and interpretation of systematic reviews
(for instance, randomised versus observational studies) and which studies will be included or excluded according to explicit criteria, predefined in the review protocol. For example, a question regarding the efficacy of treatments (Which treatment is better? Which dose is more effective?) is usually best answered by reviewing evidence from randomised controlled trials (RCTs), because randomisation protects against selection bias (Table 2). However, RCTs are not the appropriate trial design for all questions. Questions of aetiology (Does stroke predispose to later depressive disorder?) are better answered by cohort and case–control studies. Diagnostic questions (How well does a screening tool pick up cases of early psychosis?) are best studied with cross-sectional and prospective studies of patients at risk of the disorder. Such studies are called diagnostic validity studies when one diagnostic method is compared with an existing comparator or gold standard.

TABLE 1 Comparison of the characteristics of systematic and narrative reviews
Systematic reviews | Narrative reviews
Also called ‘overviews’ | Also called ‘traditional reviews’
Collect all studies that address a clearly defined clinical question | Select studies on the basis of the views of the review author(s), usually experts in the field
Use explicit methodological strategies to identify all studies on a specific topic | Select studies on implicit criteria, rather than using explicit methodological strategies
All studies that meet predefined criteria are considered and included, although different weight may be allocated in the final conclusion, depending on the strengths/weaknesses of the methodology of each study | Selection and synthesis of results are mainly based on the experience and views of the review author(s)
Methods used in the critical appraisal and synthesis of data are clearly defined | Do not generally include a section describing methods used to synthesise results
Gaps/weaknesses in data are clearly described | Gaps/weaknesses are described according to the opinion of the review author(s)
Explicit methodology reduces the risk of bias, but may not exclude it completely | Lack of a clear and explicit methodology increases the possibility of bias and the incorrect interpretation of study findings
 | The results may be misleading, but the extent of unreliability is difficult to judge
Can be published on a database of systematic reviews (e.g. the Cochrane Library) and updated regularly and systematically according to the original criteria | Are not usually updated as new studies become available
 | Represent a summary of studies selected by an expert at a particular time point
 | May be useful as a descriptive tool to summarise the different aspects of a complex question

TABLE 2 Types of bias and the strategies used to minimise bias in RCTs
Bias | Strategy adopted to prevent the bias
Selection bias (systematic differences between baseline characteristics of the groups) | Randomisation (e.g. a computer-generated random number table); allocation concealment (concealing the sequence of allocation so that it cannot be foreseen)
Performance bias (systematic differences in care or exposure to other factors between groups) | Masking/blinding of participants and study personnel to which intervention has been allocated
Detection bias (systematic differences between groups in how outcomes are determined) | Masking/blinding of outcome assessors and participants
Attrition bias (systematic differences between groups in withdrawals from a study) | Complete reporting of outcome data to include withdrawals and exclusions, with reasons
Reporting bias (preferential reporting of only favourable results within a study) | Complete reporting of outcome data

Once the question has been identified, the review proceeds to the systematic identification of all the relevant studies addressing that question (according to the methods described in the protocol). Published data are often accessed via electronic databases such as PubMed, Embase, PsycINFO and CINAHL. Care needs to be taken in the choice and arrangement of keywords used in the search, as this will have a significant effect on which papers are identified. Reviewers should search not only for published studies, but also for unpublished data and ‘grey literature’ (informally published written material, such as technical reports or working papers from research groups). Reviewers should make all practicable efforts to counteract any publication bias that may exist (see ‘Methods to reduce the effects of bias’ below).

Following identification of the studies, the reviewers critically appraise each one. The extent to which a systematic review can draw conclusions about the effects of an intervention depends on whether the data and results from the included primary studies are valid. A study’s validity relates to whether it answers its research question ‘correctly’, that is, without bias (Higgins 2011b). The evaluation of the validity of the included studies is therefore an essential component of a systematic review, and should influence the analysis, interpretation and conclusions of that review. High-quality evidence is not always available for all outcomes of interest. In such a case, summary evidence can still be presented, together with a measure of quality to guide the reader, for example using the Grading of Recommendations Assessment, Development and Evaluation (GRADE) approach (Guyatt 2008). GRADE provides a system for rating the quality of evidence and the strength of recommendations that is comprehensive and pragmatic, and is increasingly being adopted worldwide. This can help to ensure that judgements about the risk of bias, as well as other factors affecting the quality of evidence (such as imprecision, heterogeneity and publication bias), are considered when interpreting the results of systematic reviews.
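As a rough illustration of how a GRADE-style rating can be derived, the sketch below starts evidence from randomised trials at ‘high’ and evidence from observational studies at ‘low’, then moves down one or two levels for each domain in which the reviewers have concerns. This is a deliberately simplified sketch: the function name, the domain labels and the numeric scoring are assumptions made for illustration, the full GRADE approach also allows upgrading and relies on structured judgement rather than arithmetic, and none of this code comes from the GRADE working group.

```python
from enum import IntEnum


class Quality(IntEnum):
    """The four GRADE quality-of-evidence levels, ordered from lowest to highest."""
    VERY_LOW = 1
    LOW = 2
    MODERATE = 3
    HIGH = 4


def grade_quality(randomised: bool, concerns: dict[str, int]) -> Quality:
    """Return a simplified GRADE-style rating (illustrative only).

    `concerns` maps a hypothetical domain label (e.g. "risk_of_bias",
    "imprecision", "heterogeneity", "publication_bias") to the number of
    levels to downgrade: 0 = no serious concern, 1 = serious, 2 = very serious.
    """
    for domain, levels in concerns.items():
        if levels not in (0, 1, 2):
            raise ValueError(f"{domain}: downgrade must be 0, 1 or 2, got {levels}")

    # Randomised evidence starts high; observational evidence starts low.
    start = Quality.HIGH if randomised else Quality.LOW

    # Subtract the downgrades, but never go below the lowest level.
    level = int(start) - sum(concerns.values())
    return Quality(max(level, int(Quality.VERY_LOW)))


if __name__ == "__main__":
    rating = grade_quality(True, {"risk_of_bias": 1, "imprecision": 1})
    print(rating.name)
```

In this sketch a body of randomised evidence with serious concerns in two domains ends up two levels below where it started, which is the kind of signal a reader needs when high-quality evidence is not available for an outcome.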
The Cochrane Library (www.thecochranelibrary.com) is possibly the best-known database of systematic reviews and the website contains within it several different databases. These include the Cochrane Database of Systematic Reviews (CDSR), Cochrane Central Register of Controlled Trials (CENTRAL), Cochrane Methodology Register (CMR), Database of Abstracts of Reviews of Effects (DARE) and Health Technology Assessment Database (HTA).

Meta-analysis
Meta-analysis refers specifically to the use of statistical techniques to summarise data quantitatively as part of a systematic review (Higgins 2011a). However, the term is often used more loosely to refer to any systematic review that uses statistical methods to combine, weigh and summarise the results of several studies (Cook 1995). The results from the original studies (e.g. primary and secondary outcomes, rates of adverse effects) are extracted, put together and analysed statistically in a final pooled estimate. Various statistical software packages are available to perform these analyses, such as RevMan and Meta-DiSc (which are both free to use), Stata (www.stata.com) and Comprehensive Meta-Analysis. A meta-analysis should take into account the characteristics of each of the primary studies, as the methodological quality of individual trials will affect the quality of recommendations that each meta-analysis can provide. It is important to note that the statistical methods of meta-analysis should only be undertaken following a systematic review (only a systematic review can guarantee transparent and comprehensive collection of all the available evidence, to avoid systematic biases in the selection of studies to be analysed). By contrast, meta-analysis is not an essential part of every systematic review: in some cases it may not be appropriate to combine the results of studies, for example if the original studies are too different from each other.

The overall results of meta-analysis give main treatment effects and relate to the average response in an average patient. Clinical practice, however, involves the assessment and treatment of an individual, and so the results of a subgroup analysis (according to different clinical or socio-demographic characteristics) may at first appear more relevant to the decisions made by clinicians. Subgroup analysis can be performed by combining data from specific subgroups in each study. However, results in subgroups are not always reported in original publications and the randomisation of treatments in the primary studies may not have been stratified according to the same subgroups. In addition, the more subgroup analyses that are performed, the more likely it is that a statistically significant, but incorrect, result will be found purely by chance, as shown in Box 1. As a general rule, any subgroup analysis within a meta-analysis should be treated carefully and is best regarded as generating hypotheses for testing in the future, rather than providing reliable evidence about a particular subgroup.

BOX 1 The effects of chance on a subgroup analysis
Counsell et al (1994) conducted an investigation of the effects of chance on the results of a systematic review containing a subgroup analysis of a fictional treatment called DICE:
• 44 randomised trials were simulated by rolling dice – each roll of the die yielding the outcome for one ‘patient’
• each investigator performed two trials to simulate the effect of gaining experience with the intervention
• it was pre-specified that subgroup analyses would be performed to distinguish each investigator’s first trial from their second.
Overall, chance alone showed that ‘DICE treatment’ was non-significantly better than ‘control’, as measured by death rates. Overall, the analysis did not show a significant difference in death rates for DICE treatment. However, in a subgroup analysis looking only at ‘published’ trials (using a model of publication bias from real trials) performed by ‘experienced’ operators (second trials only), there was a significant 23% reduction in mortality. Thus, significant subgroup effects can be found due to chance alone.
Remember the meaning of the acronym DICE – Don’t Ignore Chance Effects.

Strengths and potential pitfalls of meta-analysis
Strengths
Meta-analysis as a statistical tool has great strengths. Effect size is the estimate of the effect of a treatment in a study (e.g. the risk ratio or odds ratio for dichotomous outcomes and the mean difference or standardised mean difference for continuous outcomes (Nikolakopoulou 2014)), and the techniques of meta-analysis pool research data from a number of studies to provide an overall estimate of effect size in an easily digestible form. The results of a meta-analysis are usually
presented in a forest plot or ‘blobbogram’ (Cipriani 2006). In the plot the left-hand column lists the names of the studies (usually in chronological order) and the right-hand column shows the effect size for each of them (often represented by a square) incorporating confidence intervals represented by horizontal lines. The meta-analysed measure of effect is usually plotted as a diamond, the lateral points of which indicate confidence intervals for this estimate. By combining the effect sizes statistically, the meta-analysis produces much larger sample sizes, minimising random error and increasing the generalisability of the study results. In addition, the methods used in the analysis assess the quality of the included studies and thus the reviewers can indicate the strength of the summary evidence they report (Higgins 2011b).

Potential pitfalls
The methodology of the systematic review
Care should always be used in the interpretation of the results of a meta-analysis, as their validity is dependent on the methodology of the original systematic review. If this was not properly conducted, the results of the meta-analysis will be biased. When reading a systematic review it is important to be able to assess its merits, as not all systematic reviews use the same methodology (Box 2). The extent to which bias has been controlled gives a measure of the internal validity of the study. External validity (or generalisability) gives a measure of the extent to which the results provide a correct basis for generalisations to other circumstances.

BOX 2 How to appraise the merits of a systematic review and meta-analysis
• What are the affiliations and financial support for the review and its authors?
• What are the methods used to identify and select the primary studies on which the review is based?
• What was the quality of the primary studies?
• Were the analysis and synthesis appropriate?
• Were possible sources of bias taken into account?
• What was the statistical and clinical significance of the results?
• Has there been an update of the literature search?

The quality of primary studies
The results of the analysis will also be affected by the quality of the primary studies. If the quality is poor, then it may not be possible to achieve meaningful results from meta-analysis: ‘garbage in, garbage out’. A meta-analysis needs to determine to what extent variations in study quality affect the decision to combine the data. Many tools have been proposed for assessing the quality of studies for use in the context of a systematic review and meta-analysis. Most tools are either scales, on which various components of quality are scored and combined to give a summary score, or checklists, in which specific questions are asked (Jüni 2001). Many instruments contain not only items based on the generally accepted criteria for methodological quality (randomisation, allocation concealment, masking/blinding), but also items that are not directly related to internal validity, such as the presence of a power calculation (which relates more to the precision of the results) or whether the inclusion and exclusion criteria are clearly described (which relates more to applicability than validity) (Moher 1995). Probably the best example of methods used for assessing quality in RCTs is CONSORT, but there are different methods for other study designs. These include QUADAS for studies of diagnostic test accuracy, STROBE for observational studies and TREND for non-randomised studies.

These tools vary and some focus more on the quality of reporting than on the underlying study methodology. To address this problem, the Cochrane Collaboration recommends assessing study quality using its ‘risk of bias’ tool, which is neither a scale nor a checklist. It is a domain-based evaluation, in which critical assessments are made separately for different study-related issues: random sequence generation; allocation concealment; masking/blinding of participants and personnel; masking/blinding of outcome assessment; incomplete outcome data; selective reporting; and other sources of bias (Higgins 2011b).

Addressing the clinical and statistical heterogeneity of studies
Studies always vary, for example in terms of the types of participants involved, the methods used, the types of intervention used as a comparator, the length of follow-up and the outcomes measured. Therefore, there will need to be an element of selection of studies for inclusion. To avoid bias, before starting the review it is very important to specify the main criteria for selecting studies in the review protocol. Reviewers need to avoid over-inclusion of disparate studies, but also over-exclusion of studies that have relevant data.
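To make the idea of variation between study results more concrete, the following sketch pools a handful of effect sizes by inverse-variance weighting and then measures how much they disagree. The statistics used here (Cochran's Q, the I² percentage and the DerSimonian-Laird estimate of between-study variance) are not named in this article; they are introduced purely as an assumption about one common way of doing this, and the effect sizes and standard errors are invented for illustration.

```python
import math


def pool_fixed(effects, ses):
    """Inverse-variance (fixed-effect) pooled estimate and its standard error."""
    weights = [1.0 / se**2 for se in ses]
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    return pooled, math.sqrt(1.0 / sum(weights))


def heterogeneity(effects, ses):
    """Return Cochran's Q, I^2 (as a percentage) and the DerSimonian-Laird tau^2."""
    weights = [1.0 / se**2 for se in ses]
    pooled, _ = pool_fixed(effects, ses)
    # Q: weighted squared deviations of each study from the pooled estimate.
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    # I^2: share of the observed variation not explained by chance alone.
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    # tau^2: estimated between-study variance (truncated at zero).
    c = sum(weights) - sum(w**2 for w in weights) / sum(weights)
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    return q, i2, tau2


def pool_random(effects, ses):
    """Random-effects pooled estimate: each study's variance is inflated by tau^2."""
    _, _, tau2 = heterogeneity(effects, ses)
    weights = [1.0 / (se**2 + tau2) for se in ses]
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    return pooled, math.sqrt(1.0 / sum(weights))


if __name__ == "__main__":
    # Hypothetical log odds ratios and standard errors from four trials.
    log_or = [-0.60, -0.10, -0.45, 0.20]
    se = [0.20, 0.15, 0.25, 0.30]

    q, i2, tau2 = heterogeneity(log_or, se)
    print(f"Q = {q:.2f}, I^2 = {i2:.0f}%, tau^2 = {tau2:.3f}")
    for label, (est, err) in (("fixed", pool_fixed(log_or, se)),
                              ("random", pool_random(log_or, se))):
        low, high = est - 1.96 * err, est + 1.96 * err
        print(f"{label:>6}: {est:.2f} (95% CI {low:.2f} to {high:.2f})")
```

When the studies disagree by more than chance would predict, the between-study variance is greater than zero and the random-effects interval comes out wider than the fixed-effect one; when the studies agree, the two estimates coincide.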
However, even if the inclusion/exclusion criteria are clear and coherent, sometimes the included studies differ significantly. This ‘heterogeneity’ can present challenges. It may not be possible to merge the results and perform a meta-analysis; where there is significant heterogeneity this has been likened to the error of ‘combining oranges and apples’ (Eysenck 1994). Even if it is possible to pool the studies, heterogeneity may well be found during the analysis. If so, usually a random effects model analysis is recommended, as this recognises that the observed differences in effect sizes between different studies reflect true heterogeneity as well as random error (Nikolakopoulou 2014). For this reason, pooled estimates from such an analysis have wider confidence intervals and results are more conservative than a fixed effects analysis.

An example of the difficulties involved in addressing heterogeneity in studies in psychiatry is the question of the effectiveness of community treatment, either intensive or standard, in improving the outcome of patients. The systematic reviews addressing this question have all struggled with similar issues. For example, the definitions of ‘community treatment’ and ‘control treatment’ vary significantly between the centres conducting the trials and have changed over the time that the studies have been conducted. Complex mental health interventions and services are difficult to standardise, and also the labels ‘standard care’ and ‘usual services’ used as the control treatment are often ill-defined and may overlap with the active treatment. In addition, studies have differed in their choice of the best indicator of outcome, with different measures used (Dieterich 2010). One approach (e.g. Marshall 2000a,b) is to rely on the labels (such as assertive community treatment, case management, standard care) given to each treatment arm by the investigators in the original studies. This is a practical solution, but may well mask an underlying ‘clinical heterogeneity’ in the different treatment arms. Some reviews (e.g. Murphy 2012) have found only small numbers of studies that meet their criteria. In addition, many of the studies in this area have small sample sizes (e.g. Malone 2007), giving them inadequate power to detect statistically significant outcome differences, leading to ‘statistical heterogeneity’. Catty et al (2002) used broader inclusion criteria in order to include more studies and increase the overall sample size. They included all studies of ‘home treatment’, which encompassed any treatment outside hospital. Despite these broad inclusion criteria, and the choice of only one outcome measure (days in hospital) and intensive follow-up of the authors of the primary studies, they found that only 57% of the studies yielded data that were usable in their meta-analysis. This is typical for systematic reviews in this area, and severely limits generalisability.

Summary
The results of a meta-analysis rely not only on the methodology used in the systematic review and meta-analysis, but also on the quality of the studies used as the primary data source. Systematic reviews and meta-analyses on the same topic may produce conflicting results. For example, since the publication of the landmark paper by Caspi and colleagues (Caspi 2003) suggesting that the serotonin transporter gene modifies the relationship between stressful life events and depression, a number of individual studies on the subject have been conducted. Meta-analyses of those studies have been contradictory, with some (e.g. Risch 2009) not supporting and others (e.g. Karg 2011; Miller 2013) supporting such a gene–environment interaction. So, even though meta-analysis is probably the most robust tool currently available to summarise the evidence, the results are rarely unequivocal and always need careful appraisal and interpretation.

Bias in systematic reviews and meta-analyses
Bias can occur during the selection, appraisal or synthesis of data and should be avoided, as it gives inaccurate or misleading results. Types of bias are summarised in Box 3.

A key source of bias in systematic reviews is publication bias, which occurs as a result of the tendency for authors, reviewers and editors to publish preferentially studies that have a clearly defined, statistically significant result (Mavridis 2014). Studies where the treatment has a similar or lesser effect than placebo, or than the current well-established treatment, are less likely to be published. Publicly funded research is more likely to be published whatever the results, whereas commercially funded research shows a significant bias towards publication when the findings are positive (Dickersin 1990). A meta-analysis based purely on published results may well be misleading as the published set of data may not be a representative sample of the overall evidence (Higgins 2011b). For example, Turner et al (2008) obtained reviews from the US Food and Drug Administration (FDA) of unpublished studies of antidepressants submitted for regulatory approval. The authors matched results from unpublished reports with the corresponding
or can be incomplete, not representative of the sample being studied and may not have been peer reviewed. The methods of a meta-analysis should recognise that, despite the best efforts of the reviewers, there is likely to be a degree of publication bias in the studies selected for a systematic review.

Researchers attempt to detect publication bias using a number of statistical tests (e.g. Egger’s test and funnel plots) that rely on the underlying theory that studies with small sample size will be more prone to publication bias, whereas larger studies are more likely to be published regardless of their findings (Egger 1997). In a funnel plot, effect sizes are plotted on the horizontal axis against a measure of the weight/size of each study (e.g. standard error or sample size) on the vertical axis. A symmetrical funnel will be formed if publication bias is absent, but the funnel will be skewed or asymmetrical if it might be present (Egger 1997). It is common, therefore, for a meta-analysis to show a funnel plot and perform tests such as the ‘trim and fill’ method to identify and adjust for asymmetry (Duval 2000). Asymmetry is often interpreted as showing direct evidence of the presence of publication bias. However, this is too simplistic: asymmetry may also result from an essential difference (or heterogeneity) between smaller and larger studies (Lau 2006). For example, small studies may focus on high-risk patients, for whom treatment may be more effective; or small studies may have a shorter follow-up. Variation in quality also affects the shape of the funnel plot, with smaller, lower-quality studies showing greater benefit of treatment.

Examples of advanced methodology
Individual patient data meta-analysis
As already mentioned (see ‘Meta-analysis’ above), subgroup analysis within a standard meta-analysis has significant limitations. Individual patient data meta-analysis (IPDMA) is a potentially useful approach in which a meta-analysis is conducted using the data on individual patients from primary studies (Clarke 2005). This allows more accurate subgroup analyses because they can be based on common subgroup classification across studies. It is crucial that the meta-analysis preserves the original clustering of the patients within studies: it is inappropriate to analyse the data from all the patients as if they had all participated in the same study. However, an appropriate analysis can produce results that inform evidence-based practice, such as a pooled estimate of treatment effect across all studies, how the treatment effect varies between studies (e.g. with treatment dose or study location) and varies across types of patients (e.g. grouped by age or stage of disease). IPDMA has many potential advantages over meta-analyses using aggregate data, where the data are sometimes poorly reported, not available or presented differently across studies (Riley 2010). Use of individual data standardises study methods and often provides extra data (e.g. longer follow-up, more outcome measures) not included in the original aggregate publication. However, IPDMA is a highly time-consuming and resource-intensive approach, for both the reviewers and the original study authors; it requires advanced statistical methods and the original data may well be poor or missing. It has not been widely used in psychiatry as yet, although there are some examples of how IPDMA can help clinicians weigh up the benefits of psychiatric treatment in the individual patient (e.g. Furukawa 2015). The proposal of Tudur Smith and colleagues (2014) to start a central repository of individual patient data from trials would substantially reduce the time required to source the original data.

Network meta-analysis
Meta-analyses use as their standard statistical technique pair-wise comparisons of treatments. This means that when reviewing the data on the efficacy of all available treatments for a particular condition, the clinician is presented with an array of pair-wise comparisons, whereas they would rather compare the relative efficacy of all treatments simultaneously. In addition, some comparisons between treatments have not been studied directly and so there are no direct data on which to base a pair-wise comparison. Network meta-analysis (NMA) (also called multiple treatments meta-analysis or mixed treatment comparison) is a statistical method that can fill this gap as it allows multiple treatments to be assessed at the same time, using direct and indirect evidence from the comparison data available (Caldwell 2005). The indirect evidence comes from inferring the relative efficacy of two drugs that have not been directly compared with each other, but that have each been directly compared with the same comparator drug. So for example, as shown in Fig. 1, if there are trials of drug A v. drug B, then this gives us direct information on their efficacy relative to each other. Trials of drugs A v. C and drugs B v. C can also supply indirect data on the relative efficacy of A v. B. The use of indirect evidence performs two functions: it provides data on comparisons for which no trials exist and it improves the precision of the direct data by adding indirect data (and therefore reducing the width of the confidence
MCQs
Select the single best option for each question stem

1 Which of the following is not true of a well-conducted systematic review?
a Studies on a specific topic are identified
b Studies that meet predefined criteria are included
c The methods used to appraise and synthesise the data are clearly defined
d The review is regularly updated using the original criteria
e The systematic and explicit methods used eliminate the possibility of bias.

2 Which of the following is not true regarding bias?
a The risk of bias is greater for narrative reviews than for systematic reviews
b One of the aims of guidelines such as the PRISMA statement is to minimise publication bias
c Masking/blinding of outcome assessors may help overcome selection bias
d The meta-analysis of a biased systematic review will also be biased
e The Cochrane Collaboration recommends a domain-based bias tool.

3 Which of the following is not true regarding meta-analysis?
a It pools data statistically from different studies to give an overall estimate of effect size with a greater sample size
b Heterogeneity is usually addressed using a fixed effects analysis
c It is the use of statistical techniques to quantitatively summarise data
d Meta-analyses of the same question can give significantly different conclusions
e Its results can be summarised in a forest plot.

4 Which of the following is not true of individual patient data meta-analysis?
a It can include data obtained, but not reported in the original studies
b It can investigate how treatment effects vary across centres
c It allows subgroup analysis if individual data are preserved in their original clusters
d The statistical methods are easy to use and data retrieval is not time-intensive
e It can analyse how treatment effects vary in different patient groups.

5 Which of the following is not true regarding network meta-analysis?
a It uses only indirect evidence to compare treatments
b Treatments can be ranked against a specific variable, e.g. efficacy or tolerability
c Indirect data can provide information where no direct comparison exists
d Indirect data can be added to direct data to increase the sample size of that comparison
e The results can be easily understood by clinicians and applied to clinical practice.

MCQ answers
1e 2c 3b 4d 5a