Chapters 1 and 2
CONTENTS
1.1 Introduction
1.2 Topic Preparation
1.3 Literature Search
1.4 Study Screening
1.5 Data Extraction
1.6 Critical Appraisal of Study and Assessment of Risk of Bias
1.7 Analysis
1.8 Reporting
1.9 Using a Systematic Review
1.10 Summary
References
1.1 Introduction
The growth of science depends on accumulating knowledge building on the past work of
others. In health and medicine, such knowledge translates into developing treatments for
diseases and determining the risks of exposures to harmful substances or environments.
Other disciplines benefit from new research that finds better ways to teach students, more
effective ways to rehabilitate criminals, and better ways to protect fragile environments.
Because the effects of treatments and exposures often vary with the conditions under
which they are evaluated, multiple studies are usually required to ascertain their true
extent. As the pace of scientific development quickens and the amount of information in
the literature continues to explode (for example, about 500,000 new articles are added to the
National Library of Medicine’s PubMed database each year), scientists struggle to keep up
with the latest research and recommended practices. It is impossible to read all the stud-
ies in even a specialized subfield and even more difficult to reconcile the often-conflicting
messages that they present. Traditionally, practitioners relied on experts to summarize the
literature and make recommendations in articles that became known as narrative reviews.
Over time, however, researchers began to investigate the accuracy of such review arti-
cles and found that the evidence often did not support the recommendations (Antman et
al., 1992). They began to advocate a more scientific approach to such reviews that did not
rely on one expert’s idiosyncratic review and subjective opinion. This approach required
documented evidence to back claims and a systematic process carried out by a multidisci-
plinary team to ensure that all the evidence was reviewed.
This process is now called a systematic review, especially in the healthcare literature.
Systematic reviews use a scientific approach that carefully searches for and reviews
all evidence using accepted and pre-specified analytic techniques (Committee on
Standards, 2011). A systematic review encompasses a structured search of the literature
in order to combine information across studies using a defined protocol to answer a
focused research question. The process seeks to find and use all available evidence,
both published and unpublished, evaluate it carefully and summarize it objectively to
reach defensible recommendations. The synthesis may be qualitative or quantitative,
but the key feature is its adherence to a set of rules that enable it to be replicated. The
widespread acceptance of systematic reviews has led to a revolution in the way prac-
tices are evaluated and practitioners get information on which interventions to apply.
Table 1.1 outlines some of the fundamental differences between narrative reviews and
systematic reviews.
Systematic reviews are now common in many scientific areas. The modern systematic
review originated in psychology in a 1976 paper by Gene Glass that quantitatively
summarized all the studies evaluating the effectiveness of psychotherapy (Glass, 1976).
Glass called the technique meta-analysis and the method quickly spread into diverse
fields such as education, criminal justice, industrial organization, and economics
(Shadish and Lecy, 2015). It also eventually reached the physical and life sciences,
particularly policy-intensive areas like ecology (Järvinen, 1991; Gurevitch et al., 1992).
It entered the medical literature in the 1980s, with one of the earliest influential papers
being a review of the effectiveness of beta blockers for patients suffering heart attacks
(Yusuf et al., 1985), and soon grew very popular. But over time, especially in healthcare,
the term meta-analysis came to refer primarily to the quantitative analysis of the data
from a systematic review. In other words, systematic reviews without a quantitative
analysis in health studies are not called a meta-analysis, although this distinction is not
yet firmly established in other fields. We will maintain the distinct terms in this book,
however, using meta-analysis to refer to the statistical analysis of the data collected in
a systematic review. Before exploring the techniques available for meta-analysis in the
following chapters, it will be useful first to discuss the parts of the systematic review
process in this chapter. This will enable us to understand the sources of the data and
how the nature of those sources affects the subsequent analysis of the data and
interpretation of the results.

TABLE 1.1
Key Differences between Narrative and Systematic Review

FIGURE 1.1
Systematic review process.
Systematic reviews generally involve six major components: topic preparation, literature
search, study screening, data extraction, analysis, and preparation of a report (Figure 1.1).
Each involves multiple steps and a well-conducted review should carefully attend to all
of them (Wallace et al., 2013). The entire process is an extended one and a large, funded
review may take over a year and cost hundreds of thousands of dollars. Fortunately,
several organizations have written standards and manuals describing proper ways
to carry out a review. Excellent references are the Institute of Medicine’s Standards for
Systematic Reviews of Comparative Effectiveness Research (Committee on Standards, 2011),
the Cochrane Collaboration’s Cochrane Handbook for Systematic Reviews of Interventions
(Higgins et al., 2019) and Handbook for Diagnostic Test Accuracy Reviews (Cochrane,
https://methods.cochrane.org/sdt/handbook-dta-reviews), and the Agency for Healthcare
Research and Quality (AHRQ) Methods Guide for Effectiveness and Comparative Effectiveness
Reviews (Agency for Healthcare Research and Quality, 2014). We briefly describe each
component and reference additional sources for readers wanting more detail. Since the
process is most fully developed and codified in health areas, we will discuss the process
in that area. However, translating the ideas and techniques into any scientific field is
straightforward.
Stakeholders may include patients, clinicians, caregivers, policy makers, insurance com-
panies, product manufacturers, and regulators. Each of these groups of individuals will
bring different perspectives to ensure that the review answers the most important ques-
tions. The use of patient-reported outcomes provides an excellent example of the change
in focus brought about by involvement of all stakeholders. Many older studies and meta-
analyses focused only on laboratory measurements or clinical outcomes but failed to
answer questions related to patient quality of life. When treatments cannot reduce pain,
improve sleep, or increase energy, patients may perceive them to be of little benefit even
if they do improve biological processes. It is also important to address potential financial,
professional, and intellectual conflicts of interest of stakeholders and team members in
order to ensure an unbiased assessment (Committee on Standards, 2011).
Thoroughly framing the topic to be studied and constructing the right testable questions
forms the foundation of a good systematic review. The foundation underlies all the later
steps, especially analysis for which the proper approach depends on addressing the right
question. Scope is often motivated by available resources (time, money, personnel) and
by prior knowledge about the problem and the evidence. Questions must carefully
balance the tradeoff between breadth and depth. Very broadly defined questions may be
criticized for not providing a precise answer to a question. Very narrowly focused
questions have limited applicability and may be misleading if interpreted broadly; there
may also be little or no evidence to answer them.
An analytic framework is often helpful when developing this formulation. An analytic
framework is a graphical representation that presents the chain of logic that links the
intervention to outcomes and helps define the key questions of interest, including their
rationale (Anderson et al., 2011). The rationale should address both research and decision-
making perspectives. Each link relating test, intervention, or outcome represents a poten-
tial key question. Stakeholders can provide important perspectives. Figure 1.2 provides
an example from an AHRQ evidence report on the relationship between cardiovascular
disease and omega-3 fatty acids (Balk et al., 2016).
For each question, it is important to identify the PICOS elements: Populations (partici-
pants and settings), Interventions (treatments and doses), Comparators (e.g., placebo, stan-
dard of care or an active comparator), Outcomes (scales and metrics), and Study designs
(e.g., randomized and observational) to be included in the review. Reviews of studies of
diagnostic test accuracy modify these components slightly to reflect a focus on tests,
rather than treatments. Instead of interventions and comparators, they examine index
tests and gold standards (see Chapter 19). Of course, some reviews may have
non-comparative outcomes (e.g., prevalence of disease) and so would not have a
comparator. Table 1.2 shows potential PICOS components for answering the question
posed in the omega-3 review: “Are omega-3 fatty acids beneficial in reducing
cardiovascular disease?”
As with primary studies, it is also important to construct a thorough protocol that
defines all of the review’s inclusion and exclusion criteria and also carefully describes how
the study will carry out the remaining components of the systematic review: searching,
screening, extraction, analysis, and reporting (Committee on Standards, 2011; Moher et al.,
2015).
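The PICOS elements of a protocol can be captured in a simple structured record. The sketch below is purely illustrative (the class and field names are ours, not from any review software), with entries drawn from the omega-3 example:

```python
from dataclasses import dataclass

@dataclass
class PICOS:
    """Illustrative container for the PICOS elements of a review protocol."""
    populations: list    # participants and settings
    interventions: list  # treatments and doses
    comparators: list    # e.g., placebo, standard of care
    outcomes: list       # scales and metrics
    study_designs: list  # e.g., randomized, observational

# PICOS criteria sketched from the omega-3 review question
omega3 = PICOS(
    populations=["primary prevention", "secondary prevention"],
    interventions=["fish", "EPA", "DHA", "ALA"],
    comparators=["placebo", "no control", "isocaloric control"],
    outcomes=["overall mortality", "sudden death", "stroke"],
    study_designs=["RCTs", "observational studies"],
)
```

Writing the protocol down in a machine-readable form like this makes the inclusion and exclusion criteria easy to audit and reuse during screening.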
Because the PICOS elements comprise a major part of the protocol that informs the
whole study design, it is useful to discuss each element of PICOS in turn. Defining the
appropriate populations is crucial for ensuring that the review applies to the contexts for
which it is intended to apply. Often, inferences are intended to apply widely, but studies
in the review may only focus on narrow settings and groups of individuals. For example,
many studies exclude the elderly and so do not contribute information about the complete
age spectrum. Even when some studies do include all subpopulations, an analysis must
be carefully designed in order to evaluate whether benefits and harms apply differently
to each subpopulation. Homogeneity of effects across geographic, economic, cultural, or
other units may also be difficult to test if most studies are conducted in the same
environment. For some problems, inferences may be desired in a particular well-defined
area, such as effects in a single country, but in others variation in wider geographic
regions may be of interest. The effect of medicines may vary substantially by location if
efficacy is influenced by the local healthcare system.

FIGURE 1.2
Analytic framework for omega-3 fatty acid intake and cardiovascular disease (Balk et al., 2016).

TABLE 1.2
Potential PICOS Criteria for Addressing the Question: “Are Omega-3 Fatty Acids Beneficial in Reducing Cardiovascular Disease?”

Participants: Primary prevention; Secondary prevention
Interventions: Fish, EPA, DHA, ALA; Dosage; Background intake; Duration
Comparators: Placebo; No control; Isocaloric control
Outcomes: Overall mortality; Sudden death; Revascularization; Stroke; Blood pressure
Study Designs: RCTs; Observational studies; Follow-up duration; Sample size
Interventions come in many different forms and effects can vary with the way that an
intervention is implemented. Different durations, different doses, different frequencies
and different co-interventions can all modify treatment effects and make results heteroge-
neous. Restricting the review to similar interventions may reduce this heterogeneity but
will also reduce the generalizability of the results to the populations and settings of inter-
est. It is often important to be able to evaluate the sensitivity of the results to variations in
the interventions and to the circumstances under which the interventions are carried out.
Thus, reviewers must carefully consider the scope of the interventions to be studied and
the generality with which inferences about their relative effects should apply. Many ques-
tions of interest involve the comparison of more than two treatments. A common example
is the comparison of drugs within a class or the comparison of brand names to generics.
Network meta-analysis (see Chapter 10) provides a means to estimate the relative efficacy
and rank the set of treatments studied.
The type of comparator that a study uses can have a large impact on the treatment effect
found. In many reviews, interest lies in comparing one or more interventions with stan-
dards of care or control treatments that serve to provide a baseline response rate. While the
placebo effect is well known and reminds us that positive outcomes can often arise simply
from patient confidence that a treatment is efficacious, many effectiveness studies
require the use of an active control that has been previously proven to provide benefits
compared to no treatment. Because active controls will typically have larger effects than
placebos, treatment effects in studies with active controls are often smaller than in those
with placebo controls. Combining studies with active controls and studies with placebo
controls can lead to an average that mixes different types of treatment effects and
to summaries that are hard to interpret. Even the type of placebo can have an effect. In a
meta-analysis comparing different interventions for patients with osteoarthritis, Bannuru
et al. found that placebos given intravenously worked better than those given orally, thus
distorting the comparison between oral and intravenous treatments which were typi-
cally compared with different placebos (Bannuru et al., 2015). Regression analyses may be
needed when comparators differ in these ways (see Chapter 7).
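A regression adjustment of this kind can be sketched with a weighted least-squares meta-regression of study effects on a comparator indicator. The function and the numbers below are hypothetical, a minimal one-covariate version of the kind of analysis Chapter 7 treats properly:

```python
def weighted_metareg(effects, variances, covariate):
    """Weighted least-squares meta-regression with one covariate
    (e.g., an indicator for active vs. placebo control). Minimal sketch:
    closed-form normal equations, weights = inverse variances."""
    w = [1.0 / v for v in variances]
    sw = sum(w)
    swx = sum(wi * x for wi, x in zip(w, covariate))
    swxx = sum(wi * x * x for wi, x in zip(w, covariate))
    swy = sum(wi * y for wi, y in zip(w, effects))
    swxy = sum(wi * x * y for wi, x, y in zip(w, covariate, effects))
    det = sw * swxx - swx ** 2
    intercept = (swxx * swy - swx * swxy) / det
    slope = (sw * swxy - swx * swy) / det
    return intercept, slope

# Invented data: studies 3 and 4 used an active comparator (indicator 1),
# studies 1 and 2 a placebo (indicator 0).
intercept, slope = weighted_metareg(
    effects=[0.2, 0.2, 0.5, 0.5],
    variances=[0.01, 0.01, 0.01, 0.01],
    covariate=[0, 0, 1, 1],
)
```

The slope estimates how much the treatment effect shifts with the comparator type, so the intercept can be read as the effect against the reference comparator.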
The studies included in a review may report many different types of outcomes and the
choice of which to summarize can be daunting. Outcomes selected should be meaningful
and useful, based on sound scientific principles. Some outcomes correspond to
well-defined events such as death or passing a test. Other outcomes are more subjective: amount
of depression, energy levels, or ability to do daily activities. Outcomes can be self-reported
or reported by a trained evaluator; they can be extracted from a registry or measured by an
instrument. Some are more important to the clinician, teacher, or policymaker; others are
more important to the research participant, patient, or student. Some are more completely
recorded than others; some are primary and some secondary; some relate to benefts, oth-
ers relate to harms. All of these variations affect the way in which analyses are carried out
and interpreted. They change the impact that review conclusions have on different
stakeholders and the degree of confidence they inspire. All of these considerations play a major
role in the choice of methods used for meta-analysis.
Reviews can summarize studies with many different types of designs. Studies may be
randomized or observational, parallel or crossover, cohort or case-control, prospective or
retrospective, longitudinal or cross-sectional, single or multi-site. Different techniques
are needed for each. Study quality can also vary. Not all randomized trials use proper
randomization techniques, appropriate allocation concealment, and double blinding.
Not all studies use standardized protocols, appropriately follow up participants, record
reasons for withdrawal, and monitor compliance to treatment. All of these design differ-
ences among studies can introduce heterogeneity into a meta-analysis and require careful
consideration of whether the results will make sense when combined. Careful consider-
ation of the types and quality of studies to be synthesized in a review can help to either
limit this heterogeneity or expand the review’s generalizability, depending on the aims of
the review’s authors. In many cases, different sensitivity analyses will enable reviewers to
judge the impact of this heterogeneity on conclusions.
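One common sensitivity analysis is to re-estimate the summary effect with each study left out in turn and see whether any single study drives the conclusion. A minimal sketch, assuming inverse-variance fixed-effect pooling (the function name and data are illustrative):

```python
def leave_one_out(effects, variances):
    """Leave-one-out sensitivity analysis: recompute the inverse-variance
    pooled estimate with each study removed in turn."""
    def pool(es, vs):
        w = [1.0 / v for v in vs]
        return sum(wi * e for wi, e in zip(w, es)) / sum(w)

    results = []
    for i in range(len(effects)):
        es = effects[:i] + effects[i + 1:]
        vs = variances[:i] + variances[i + 1:]
        results.append(pool(es, vs))
    return results

# Invented example with one outlying study: dropping study 3 moves the
# pooled estimate from 5.0 back to 0.0, flagging it as influential.
loo = leave_one_out([0.0, 0.0, 10.0], [1.0, 1.0, 1.0])
```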
Accurate and comprehensive searches require knowledge of the structure of the data-
bases and the syntax needed to search them (e.g., Boolean combinations using AND, OR,
and NOT). Medline citations, for example, include the article’s title, authors, journal, publi-
cation date, language, publication type (e.g., article, letter), and 5–15 controlled vocabulary
Medical Subject Heading (MeSH) search terms (i.e., keywords) chosen from a structured
vocabulary that ensures uniformity and consistency of indexing and so greatly facili-
tates searching. The MeSH terms are divided into headings (e.g., disease category or body
region), specialized subheadings (e.g., diagnosis, therapy, epidemiology, human, animal),
publication types (e.g., journal article, randomized controlled trial), and a large list of
supplementary concepts related to the specific article (e.g., type of intervention).
Information specialists like librarians trained in systematic review can help construct
efficient algorithms of keywords, headings, and subclassifications that best use the search
tools in order to optimize sensitivity (not missing relevant items) and specificity (not
capturing irrelevant items) during database searches.
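A search string built from Boolean combinations of synonym groups (OR within a concept, AND across concepts) can be sketched as follows. The helper and the terms are illustrative; real MeSH syntax and field tags vary by database:

```python
def boolean_query(groups):
    """Combine synonym groups into a Boolean search string:
    OR within each group, AND across groups. Illustrative only;
    actual database syntax (e.g., PubMed field tags) differs."""
    return " AND ".join(
        "(" + " OR ".join(f'"{term}"' for term in terms) + ")"
        for terms in groups
    )

# One synonym group per concept in the question
q = boolean_query([
    ["omega-3 fatty acids", "fish oil", "EPA", "DHA"],
    ["cardiovascular diseases", "myocardial infarction", "stroke"],
])
```

Keeping the synonym groups as data makes it easy to document the exact strategy and to rerun or adapt it for each database.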
Searching is an iterative process. The scope of the search can vary greatly depending
on the topic and the questions asked, as well as on the time and manpower resources
available. Reviews must balance completeness of the search with the costs incurred. Each
database will require its own search strategy to take advantage of its unique features, but
general features will remain the same. Some generic search filters that have been
developed for specific types of searches (Glanville et al., 2006) can be easily modified to
provide a convenient starting strategy for a specific problem. A search that returns too
many citations may indicate that the questions being asked are too broad and that the
topic should be reformulated in a more focused manner. Manual searches that identify
items missed by the database search may suggest improved strategies. It is important to
document the exact search strategy, including its date, and the disposition of each report
identified, including reasons for exclusion, so that the search and the final collection of
documents can be reproduced (Liberati et al., 2009).
solution to this problem by allowing the computer to check the human extractor (Jap
et al., 2019). Often, one screener is a subject matter expert and the other a methodolo-
gist so that both aspects of inclusion criteria are covered. Teams also often have a senior
researcher work with a junior member to ensure accuracy and for training purposes.
Screening is a tedious process and requires careful attention. Sometimes, it is possible
to screen using only a title, but in other cases, careful reading of the abstract is required.
Articles are often screened in batches to optimize effort. If duplicate screening is used,
the pair of screeners meet at the end to reconcile any differences. Once the initial
screening of abstracts is completed, the articles identified for further review are retrieved and
examined more closely in a second screening phase. Because sensitivity is so important,
reviewers screen abstracts conservatively and may end up retrieving many articles that
ultimately do not meet criteria.
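Before the screeners reconcile, their agreement is often quantified, commonly with Cohen's kappa. The helper below is a generic textbook implementation, not a tool named in this chapter:

```python
def cohens_kappa(labels1, labels2):
    """Cohen's kappa for agreement between two screeners' include/exclude
    decisions: observed agreement corrected for chance agreement."""
    assert len(labels1) == len(labels2)
    n = len(labels1)
    # observed proportion of agreement
    po = sum(a == b for a, b in zip(labels1, labels2)) / n
    # expected agreement by chance, from each screener's marginal rates
    cats = set(labels1) | set(labels2)
    pe = sum((labels1.count(c) / n) * (labels2.count(c) / n) for c in cats)
    return (po - pe) / (1 - pe)

# Invented decisions: identical labels give kappa = 1
k_perfect = cohens_kappa(["in", "out", "in", "out"],
                         ["in", "out", "in", "out"])
```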
and the intended estimand; high risk of bias studies may lead to large differences. Chapter
12 examines strategies for assessing and dealing with such bias, including analytic strate-
gies such as sensitivity analyses that can be used to assess the impact on conclusions.
It is important to bear in mind that study conduct and study reporting are different
things. A poor study report does not necessarily imply that the study was poorly done and
therefore has biased results. Conversely, a poorly done study may be reported well. In the
end, the report must provide sufficient information for its readers to be confident enough
to know how to proceed to use it.
1.7 Analysis
Analysis of the extracted data can take many forms depending on the questions asked
and data collected but can be generally categorized as either qualitative or quantitative.
Systematic reviews may consist solely of qualitative assessments of individual studies
when insufficient data are available for a full quantitative synthesis or they may involve
both qualitative and quantitative components.
A qualitative synthesis typically summarizes the scientific and methodological
characteristics of the included studies (e.g., size, population, interventions, quality of
execution); the strengths and limitations of their design and execution and the impact of
these on study conclusions; their relevance to the populations, comparisons,
co-interventions, settings, and outcomes or measures of interest defined by the research
questions; and patterns in the relationships between study characteristics and study
findings. Such qualitative summaries help answer questions not amenable to statistical analysis.
Meta-analysis is the quantitative synthesis of information from a systematic review. It
employs statistical analyses to summarize outcomes across studies using either aggre-
gated summary data from trial reports (e.g., trial group summary statistics like means and
standard deviations) or complete data from individual participants. When comparing the
effectiveness and safety of treatments between groups of individuals that receive
different treatments or exposures, the meta-analysis summarizes the differences as
treatment effects, whose size and corresponding uncertainty are expressed on standard
metrics that depend on the scale of the outcome measured (continuous, categorical,
count, or time-to-event). Examples include differences in means of continuous outcomes
and differences in proportions for binary outcomes. When comparison between
treatments is not the object of the analysis, the summaries may take the form of means or
proportions of a single group (see Chapter 3 for discussion of the different types of effect
measures in meta-analysis).
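As a concrete illustration of two such effect measures, the standard formulas for a difference in means and a difference in proportions, with their large-sample variances, can be written as follows (textbook formulas; the inputs are invented):

```python
def mean_difference(m1, sd1, n1, m0, sd0, n0):
    """Difference in means for one study, with its variance
    (treatment group: m1, sd1, n1; control group: m0, sd0, n0)."""
    md = m1 - m0
    var = sd1 ** 2 / n1 + sd0 ** 2 / n0
    return md, var

def risk_difference(e1, n1, e0, n0):
    """Difference in proportions for one study, with its variance
    (events e1 of n1 in treatment, e0 of n0 in control)."""
    p1, p0 = e1 / n1, e0 / n0
    rd = p1 - p0
    var = p1 * (1 - p1) / n1 + p0 * (1 - p0) / n0
    return rd, var

# Hypothetical study summaries
md, md_var = mean_difference(10.0, 4.0, 100, 8.0, 4.0, 100)
rd, rd_var = risk_difference(20, 100, 10, 100)
```

Each study contributes such an (effect, variance) pair; the variances become the weights in the pooling step.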
Combining estimates across studies not only provides an overall estimate of how a
treatment is working in populations and subgroups but can overcome the lack of power
that leads many studies to non-statistically significant conclusions because of insufficient
sample sizes. Meta-analysis also helps to explore heterogeneity of results across studies
and helps identify research gaps that can be filled by future studies. Synthesis can help
uncover differential treatment effects according to patient subgroups, form of intervention
delivery, study setting, and method of measuring outcomes. It can also detect bias that
may arise from poor research such as lack of blinding in randomized studies or failure to
follow all individuals enrolled in a study. As with any statistical analysis, it is also impor-
tant to assess the sensitivity of conclusions to changes in the protocol, study selection, and
analytic assumptions. Findings that are sensitive to small changes in these elements are
less trustworthy.
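The quantitative core of this step can be sketched with an inverse-variance fixed-effect pooled estimate and the usual heterogeneity statistics, Cochran's Q and I². This is a minimal fixed-effect version with invented inputs; random-effects models and their trade-offs come in later chapters:

```python
def fixed_effect_pool(effects, variances):
    """Inverse-variance fixed-effect pooled estimate, plus Cochran's Q
    and the I^2 heterogeneity percentage."""
    w = [1.0 / v for v in variances]
    pooled = sum(wi * e for wi, e in zip(w, effects)) / sum(w)
    # Q: weighted squared deviations of study effects from the pooled value
    q = sum(wi * (e - pooled) ** 2 for wi, e in zip(w, effects))
    df = len(effects) - 1
    # I^2: share of variability beyond what chance (df) would explain
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return pooled, q, i2

# Two hypothetical studies with equal variances but different effects
pooled, q, i2 = fixed_effect_pool([0.1, 0.5], [0.04, 0.04])
```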
The extent to which a meta-analysis captures the truth about treatment effects depends
on how accurately the studies included represent the populations and settings for which
inferences must be made. Research gaps represent studies that need to be done. Publication
bias and reporting bias relate to studies that have been done but that have been incom-
pletely reported. Publication bias refers to the incorrect estimation of a summary treat-
ment effect from the loss of information resulting from studies that are not published
because they had uninteresting, negative, or non-statistically significant findings. Failure
to include such studies in a review leads to an overly optimistic view of treatment effects,
biased toward positive results (Dickersin, 2005).
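A toy simulation makes the mechanism concrete: when only "statistically significant positive" results reach publication, the published average overstates a true null effect. Every number below is invented for illustration:

```python
import random

def simulate_publication_bias(true_effect=0.0, n_studies=2000, se=0.2, seed=1):
    """Toy simulation of publication bias: compare the mean of all study
    estimates with the mean of only the 'significant positive' ones."""
    rng = random.Random(seed)
    estimates = [rng.gauss(true_effect, se) for _ in range(n_studies)]
    # keep only studies whose z-statistic exceeds 1.96 (one direction)
    published = [e for e in estimates if e / se > 1.96]
    all_mean = sum(estimates) / len(estimates)
    pub_mean = sum(published) / len(published)
    return all_mean, pub_mean

all_mean, pub_mean = simulate_publication_bias()
```

With a true effect of zero, the complete set of studies averages near zero, while the "published" subset averages well above it, which is exactly the optimistic bias described above.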
Reporting bias arises when studies report only a subset of their findings, often a subset
of all outcomes examined (Schmid, 2017). Bias is not introduced if the studies fail to collect
the outcomes for reasons unrelated to the outcome values. For example, a study on blood
pressure treatments may collect data on cardiovascular outcomes, but not on kidney out-
comes. A subsequent meta-analysis of the effect of blood pressure treatments on kidney
outcomes can omit those studies without issues of bias. However, if the outcomes were
collected but were not reported because they were negative, then bias will be introduced if
the analysis omits those studies without adjusting for the missing outcomes. Adjustment
for selective outcome reporting is difficult because the missing data mechanism is usually
not known and thus hard to incorporate into analysis. Comparing study protocols with
published reports can often detect potential reporting bias. Sometimes, authors have good
reasons for reporting only some outcomes as when the full report is too long to publish or
outcomes relate to different concepts that might be reported in different publications. In
other cases, it is useful to contact authors to find out why outcomes were unreported and
whether these results were negative. Chapter 13 discusses statistical and non-statistical
methods for handling publication and reporting bias.
1.8 Reporting
Generation of a report summarizing the findings of the meta-analysis is the final step in
the systematic review process. The most important part of the report contains its conclu-
sions regarding the evidence found to answer the review’s research questions. In addition
to stating whether the evidence does or does not favor the research hypotheses, the report
needs to assess the strength of that evidence in order that proper decisions be drawn from
the review (Agency for Healthcare Research and Quality). Strength of evidence involves
both the size of the effect found and the confidence in the stability and validity of that
effect. A meta-analysis that finds a large effect based on studies of low quality is weaker
than one that finds a small effect based on studies of high quality. Likewise, an effect that
disappears with small changes in model assumptions or that is sensitive to leaving out one
study is not very reliable. Thus, a report must summarize not only the analyses leading to a
summary effect estimate, but also the analyses assessing study quality and their potential
for bias. Randomized studies have more internal validity than observational studies; stud-
ies that ignore dropouts are more biased than studies that perform proper missing data
adjustments; analyses that only include studies with individual participant data available
and that ignore the results from known studies with only summary data available may be
both inefficient and biased. It is important to bear in mind that study conduct and study
reporting are different things. A poor report does not necessarily imply that the study
was poorly done and therefore has biased results. Conversely, a poorly done study may be
reported well. In the end, the report must provide sufficient information for its readers to
be confident enough to know how to proceed to use it.
The Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA)
statement (Moher et al., 2009) is the primary guideline to follow for reporting a meta-anal-
ysis of randomized studies. It lists a set of 27 items that should be included in each report.
These include wording for the title and elements in the introduction, methods, results, and
discussion. Slightly different items are needed for observational studies and these can be
found in the Meta-analysis of Observational Studies in Epidemiology (MOOSE) statement
(Stroup et al., 2000). Modified guidelines have been written for particular types of meta-
analyses such as those for individual participant data (Stewart et al., 2015), networks of
treatments (Hutton et al., 2015), diagnostic tests (Bossuyt et al., 2003), and N-of-1 studies
(Vohra et al., 2015; Shamseer et al., 2015). These are now required by most journals that
publish meta-analyses. Later chapters discuss these and other types of meta-analyses and
explain the need for these additional items.
the public and quoted frequently in the popular press. They speak to the growing influence
of systematic reviews, which are highly referenced in the scientific literature.
In the United States and Canada, the Agency for Healthcare Research and Quality
(AHRQ) has supported 10–15 Evidence-Based Practice Centers since 1997. These centers
carry out large reviews of questions nominated by stakeholders and refined through a
consensus process. Stakeholders for AHRQ reports include clinical societies, payers, the
United States Congress, and consumers. Like NICE and the Cochrane Collaboration,
AHRQ has published guidance documents for its review teams that emphasize not only
review methods, but review processes and necessary components of final reports in order
to ensure uniformity and adherence to standards (AHRQ, 2008).
Systematic reviews also serve an important role in planning future research. A review
often identifies areas where further research is needed, either because the overall evidence
is inconclusive or because uncertainty remains about outcomes in certain circumstances,
such as in specific subpopulations, settings, or under treatment variations. Ideally,
decision models should incorporate systematic review evidence and be able to identify which
new studies would best inform the model. Systematic review results can also provide
important sources for inputs such as effect sizes and variances needed in sample size cal-
culations. These can be explicitly incorporated into calculations in a Bayesian framework
(Sutton et al., 2007; Schmid et al., 2004). Chapter 23 discusses these issues.
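For example, an effect size and standard deviation taken from a meta-analysis can feed the standard normal-approximation sample size formula for a two-group comparison of means. This is a textbook calculation, not a method specific to this chapter:

```python
import math
from statistics import NormalDist

def two_group_sample_size(delta, sd, alpha=0.05, power=0.80):
    """Per-group n for a two-sample comparison of means, using the usual
    normal-approximation formula; delta and sd might come from a
    meta-analysis of earlier studies."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_beta = NormalDist().inv_cdf(power)           # power quantile
    n = 2 * ((z_alpha + z_beta) * sd / delta) ** 2
    return math.ceil(n)

# A meta-analytic mean difference of 0.5 with sd 1 (a medium effect)
# gives the familiar answer of 63 participants per group.
n_per_group = two_group_sample_size(delta=0.5, sd=1.0)
```

Planning new studies this way is one concrete route by which review results feed the design of future research, and a Bayesian framework (Sutton et al., 2007; Schmid et al., 2004) formalizes the uncertainty in those inputs.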
1.10 Summary
Systematic reviews have become a standard approach for summarizing the existing
scientific evidence in many different fields. They rely on a set of techniques for framing
the proper question, identifying the relevant studies, extracting the relevant information
from those studies, and synthesizing that information into a report that interprets findings for
its audience. This chapter has summarized the basic principles of these steps as back-
ground for the main focus of this book which is on the statistical analysis of data using
meta-analysis. Readers interested in further information about the non-statistical aspects
of systematic reviews are urged to consult the many excellent references on these essential
preliminaries. The guidance documents referenced in this chapter provide a good starting
point and contain many more pointers in their bibliographies. Kohl et al. (2018) provide a
detailed comparison of computerized systems to aid in the review process.
References
Agency for Healthcare Research and Quality, 2008-. Methods Guide for Effectiveness and Comparative
Effectiveness Reviews [Internet]. Rockville, MD: Agency for Healthcare Research and
Quality (US). https://www.ahrq.gov/research/findings/evidence-based-reports/technical/
methodology/index.html and https://www.ncbi.nlm.nih.gov/books/NBK47095.
Anderson LM, Petticrew M, Rehfuess E, Armstrong R, Ueffing E, Baker P, Francis D and Tugwell
P, 2011. Using logic models to capture complexity in systematic reviews. Research Synthesis
Methods 2(1): 33–42.
Introduction to Systematic Review and Meta-Analysis 15
Antman EM, Lau J, Kupelnick B, Mosteller F and Chalmers TC, 1992. A comparison of results of meta-
analyses of randomized control trials and recommendations of clinical experts. Treatments for
myocardial infarction. JAMA 268(2): 240–248.
Balk EM, Adam GP, Langberg V, Halladay C, Chung M, Lin L, Robertson S, Yip A, Steele D, Smith
BT, Lau J, Lichtenstein AH and Trikalinos TA, 2016. Omega-3 Fatty Acids and Cardiovascular
Disease: An Updated Systematic Review. Evidence Report/Technology Assessment No. 223.
(Prepared by the Brown Evidence-based Practice Center under Contract No. 290-2012-00012-
I.) AHRQ Publication No. 16-E002-EF. Rockville, MD: Agency for Healthcare Research and
Quality, August 2016. www.effectivehealthcare.ahrq.gov/reports/final.cfm.
Balshem H, Stevens A, Ansari M, Norris S, Kansagara D, Shamliyan T, Chou R, Chung M,
Moher D and Dickersin K, 2013. Finding Grey Literature Evidence and Assessing for Outcome
and Analysis Reporting Biases When Comparing Medical Interventions: AHRQ and the Effective
Health Care Program. Methods Guide for Comparative Effectiveness Reviews. AHRQ Publication
No. 13(14)-EHC096-EF. Rockville, MD: Agency for Healthcare Research and Quality. www.
effectivehealthcare.ahrq.gov/reports/final.cfm.
Bannuru RR, Schmid CH, Kent D, Wong J and McAlindon T, 2015. Comparative effectiveness of
pharmacological interventions for knee osteoarthritis: A systematic review and network meta-
analysis. Annals of Internal Medicine 162(1): 46–54.
Bossuyt PM, Reitsma JB, Bruns DE, Gatsonis CA, Glasziou PP, Irwig LM, Lijmer JG, Moher D, Rennie
D and de Vet HC, 2003. Towards complete and accurate reporting of studies of diagnostic accu-
racy: The STARD initiative. Clinical Chemistry 49(1): 1–6.
Cochrane Handbook for Systematic Reviews of Diagnostic Test Accuracy. https://methods.cochrane.
org/sdt/handbook-dta-reviews.
Committee on Standards for Systematic Review of Comparative Effectiveness Research. 2011. In
Eden J, Levit L, Berg A and Morton S (Eds). Finding What Works in Health Care: Standards for
Systematic Reviews. Washington, DC: Institute of Medicine of the National Academies.
Deo A, Schmid CH, Earley A, Lau J and Uhlig K, 2011. Loss to analysis in randomized controlled tri-
als in chronic kidney disease. American Journal of Kidney Diseases 58(3): 349–355.
Dickersin K, 2005. Publication bias: recognizing the problem, understanding its origins and scope,
and preventing harm. In Rothstein HR, Sutton AJ and Borenstein M (Eds). Publication Bias in
Meta-Analysis: Prevention, Assessment and Adjustments. John Wiley & Sons.
Glanville JM, Lefebvre C, Miles JN and Camosso-Stefinovic J, 2006. How to identify randomized
controlled trials in Medline: Ten years on. Journal of the Medical Library Association 94(2):
130–136.
Glass GV, 1976. Primary, secondary and meta-analysis of research. Educational Researcher 5(10): 3–8.
Gurevitch J, Morrow LL, Wallace A and Walsh JS, 1992. A meta-analysis of competition in field
experiments. The American Naturalist 140(4): 539–572.
Higgins JPT, Thomas J, Chandler J, Cumpston M, Li T, Page MJ, Welch VA (editors), 2019. Cochrane
Handbook for Systematic Reviews of Interventions. 2nd Edition. Chichester, UK: John Wiley &
Sons. Also online version 6.0 (updated July 2019) available from www.training.cochrane.org
/handbook.
Hutton B, Salanti G, Caldwell DM, Chaimani A, Schmid CH, Cameron C, Ioannidis JPA, Straus S,
Shing LK, Thorlund K, Jansen J, Mulrow C, Catala-Lopez F, Gotzsche PC, Dickersin K, Altman
D and Moher D, 2015. The PRISMA extension statement for reporting of systematic reviews
incorporating network meta-analyses of healthcare interventions: Checklist and explanations.
Annals of Internal Medicine 162(11): 777–784.
Ip S, Hadar N, Keefe S, Parkin C, Iovin R, Balk EM and Lau J, 2012. A Web-based archive of system-
atic review data. Systematic Reviews 1: 15. https://srdr.ahrq.gov/.
Jap J, Saldanha I, Smith BT, Lau J, Schmid CH, Li T on behalf of the Data Abstraction Assistant
Investigators, 2019. Features and functioning of Data Abstraction Assistant, a software applica-
tion for data abstraction during systematic reviews. Research Synthesis Methods 10(1): 2–14.
Järvinen A, 1991. A meta-analytic study of the effects of female age on laying-date and clutch-size in
the great tit Parus major and the pied flycatcher Ficedula hypoleuca. Ibis 133(1): 62–67.
Jüni P, Holenstein F, Sterne J, Bartlett C and Egger M, 2002. Direction and impact of language bias in
meta-analyses of controlled trials: Empirical study. International Journal of Epidemiology 31(1):
115–123.
Kohl C, McIntosh EJ, Unger S, Haddaway NR, Kecke S, Schiemann J and Wilhelm R, 2018. Online
tools supporting the conduct and reporting of systematic reviews and systematic maps: A case
study on CADIMA and review of existing tools. Environmental Evidence 7(1): 8.
Liberati A, Altman DG, Tetzlaff J, Mulrow C, Gøtzsche PC, Ioannidis JP, Clarke M, Devereaux PJ,
Kleijnen J and Moher D, 2009. The PRISMA statement for reporting systematic reviews and meta-
analyses of studies that evaluate health care interventions: Explanation and elaboration. Annals
of Internal Medicine 151(4): W65–94.
Marshall IJ, Kuiper J and Wallace BC, 2016. RobotReviewer: Evaluation of a system for automatically
assessing bias in clinical trials. Journal of the American Medical Informatics Association
23(1): 193–201.
Marshall IJ, Noel-Storr A, Kuiper J, Thomas J and Wallace BC, 2018. Machine learning for identify-
ing randomized controlled trials: An evaluation and practitioner’s guide. Research Synthesis
Methods 9(4): 602–614.
Moher D, Liberati A, Tetzlaff J, Altman DG and the PRISMA Group, 2009. Preferred reporting items
for systematic reviews and meta-analyses: The PRISMA statement. BMJ 339: b2535.
Moher D, Shamseer L, Clarke M, Ghersi D, Liberati A, Petticrew M, Shekelle P, Stewart LA and the
PRISMA-P Group, 2015. Preferred reporting items for systematic review and meta-analysis
protocols (PRISMA-P) 2015 statement. Systematic Reviews 4: 1.
Morrison A, Polisena J, Husereau D, Moulton K, Clark M, Fiander M, Mierzwinski-Urban M,
Clifford T, Hutton B and Rabb D, 2012. The effect of English-language restriction on systematic
review-based meta-analyses: A systematic review of empirical studies. International Journal of
Technology Assessment in Health Care 28(2): 138–144.
Pham B, Klassen TP, Lawson ML and Moher D, 2005. Language of publication restrictions in system-
atic reviews gave different results depending on whether the intervention was conventional
or complementary. Journal of Clinical Epidemiology 58(8): 769–776. Erratum Journal of Clinical
Epidemiology, 2006 59(2): 216.
Schmid CH, 2017. Outcome reporting bias: A pervasive problem in published meta-analyses.
American Journal of Kidney Diseases 69(2): 172–174.
Schmid CH, Cappelleri JC and Lau J, 2004. Bayesian methods to improve sample size approxima-
tions. In Johnson ML and Brand L (Eds). Methods in Enzymology Volume 383: Numerical Computer
Methods Part D. New York: Elsevier, 406–427.
Schulz KF, Chalmers I, Hayes RJ and Altman DG, 1995. Empirical evidence of bias. Dimensions of
methodological quality associated with estimates of treatment effects in controlled trials. JAMA
273(5): 408–412.
Shadish WR, Brasil ICC, Illingworth DA, White KD, Galindo R, Nagler ED and Rindskopf DM, 2009.
Using UnGraph to extract data from image files: Verification of reliability and validity. Behavior
Research Methods 41(1): 177–183.
Shadish WR and Lecy JD, 2015. The meta-analytic big bang. Research Synthesis Methods 6(3): 246–264.
Shamseer L, Sampson M, Bukutu C, Schmid CH, Nikles J, Tate R, Johnson BC, Zucker DR, Shadish
W, Kravitz R, Guyatt G, Altman DG, Moher D, Vohra S and the CENT Group, 2015. CONSORT
extension for N-of-1 trials (CENT): Explanation and elaboration. BMJ 350: h1793.
Sterne JA, Hernán MA, Reeves BC, Savović J, Berkman ND, Viswanathan M, Henry D, Altman DG,
Ansari MT, Boutron I, Carpenter JR, Chan AW, Churchill R, Deeks JJ, Hróbjartsson A, Kirkham
J, Jüni P, Loke YK, Pigott TD, Ramsay CR, Regidor D, Rothstein HR, Sandhu L, Santaguida PL,
Schünemann HJ, Shea B, Shrier I, Tugwell P, Turner L, Valentine JC, Waddington H, Waters E,
Wells GA, Whiting PF and Higgins JP, 2016. ROBINS-I: A tool for assessing risk of bias in
non-randomised studies of interventions. BMJ 355: i4919.
Stewart LA, Clarke M, Rovers M, Riley RD, Simmonds M, Stewart G, Tierney JF and the PRISMA-
IPD Development Group, 2015. Preferred reporting items for systematic review and meta-anal-
yses of individual participant data: The PRISMA-IPD Statement. JAMA 313(16): 1657–1665.
Stroup DF, Berlin JA, Morton SC, Olkin I, Williamson GD, Rennie D, Moher D, Becker BJ, Sipe TA
and Thacker SB, 2000. Meta-analysis of observational studies in epidemiology: A proposal for
reporting. JAMA 283(15): 2008–2012.
Sutton AJ, Cooper NJ, Jones DR, Lambert PC, Thompson JR and Abrams KR, 2007. Evidence-based
sample size calculations based upon updated meta-analysis. Statistics in Medicine 26(12):
2479–2500.
Vohra S, Shamseer L, Sampson M, Bukutu C, Schmid CH, Tate R, Nikles J, Zucker DR, Kravitz R,
Guyatt G, Altman DG, Moher D and the CENT Group, 2015. CONSORT statement: An exten-
sion for N-of-1 trials (CENT). BMJ 350: h1738.
Wallace BC, Dahabreh IJ, Schmid CH, Lau J and Trikalinos TA, 2013. Modernizing the systematic
review process to inform comparative effectiveness: Tools and methods. Journal of Comparative
Effectiveness Research 2(3): 273–282.
Wallace BC, Small K, Brodley CE, Lau J and Trikalinos TA, 2012. Deploying an interactive machine
learning system in an evidence-based practice center: Abstrackr. In Proceedings of the 2nd ACM
SIGHIT International Health Informatics Symposium, Miami, Florida, 28–30 January 2012. New
York: Association for Computing Machinery, 819–824.
Wallace BC, Trikalinos TA, Lau J, Brodley C and Schmid CH, 2010. Semi-automated screening of
biomedical citations for systematic reviews. BMC Bioinformatics 11: 55.
Wang C, De Pablo P, Chen X, Schmid C and McAlindon T, 2008. Acupuncture for pain relief in
patients with rheumatoid arthritis: A systematic review. Arthritis and Rheumatism (Arthritis Care
and Research) 59(9): 1249–1256.
Whiting P, Savović J, Higgins JP, Caldwell DM, Reeves BC, Shea B, Davies P, Kleijnen J, Churchill
R and ROBIS group, 2016. ROBIS: A new tool to assess risk of bias in systematic reviews was
developed. Journal of Clinical Epidemiology 69: 225–234.
Yusuf S, Peto R, Lewis J, Collins R and Sleight P, 1985. Beta blockade during and after myocar-
dial infarction: An overview of the randomized trials. Progress in Cardiovascular Diseases 27(5):
335–371.
2
General Themes in Meta-Analysis
CONTENTS
2.1 Introduction........................................................................................................................... 19
2.2 Data Structures...................................................................................................................... 20
2.2.1 Study-Level Data....................................................................................................... 20
2.2.2 Individual Participant Data..................................................................................... 21
2.2.3 Randomized versus Observational Data.............................................................. 21
2.2.4 Multivariate Data......................................................................................................22
2.3 Models....................................................................................................................................22
2.3.1 Homogeneous or Heterogeneous Effects.............................................................. 23
2.3.2 One- and Two-Stage Models................................................................................... 24
2.3.3 Fixed or Random Study Effects.............................................................................. 24
2.3.4 Model Checking........................................................................................................ 25
2.3.5 Bayesian or Frequentist............................................................................................ 25
2.4 Conclusion............................................................................................................................. 25
References........................................................................................................................................ 26
2.1 Introduction
Chapter 1 reviewed the parts of a systematic review and noted that they often include a
quantitative synthesis or meta-analysis, when sufficient data are available. Meta-analysis
uses statistical methods to combine data across studies in order to estimate parameters of
interest. In general, meta-analysis is used to address four types of questions. The first type
is descriptive, summarizing some characteristic of a distribution such as the prevalence of
a disease, the mean of a population characteristic, or the sensitivity of a diagnostic test. The
second type of question is comparative: how does one treatment compare with another in
terms of reducing the risk of a stroke; does a new method of teaching improve student
test scores compared with the current method; or does exposure to warmer water change
the number of fish caught? Some of these questions involve specific interventions, others
relate to prevalent exposures, and others to diagnostic tests. We will use the
general term “treatment” to apply to all of them unless a specific need arises to differenti-
ate them. This comparative type is the most common. A third type of question involves
non-comparative associations such as correlations between outcomes or the structure of
an underlying pathway (Chapter 16) and associations between variables in a regression
model (Chapter 18). A fourth type of question involves developing a prognostic or predic-
tive model for an outcome. Frequently different studies report different models or parts
of models that involve predictive factors. Chapter 22 explores methods for combining
DOI: 10.1201/9781315119403-2
such data. Typically, meta-analysis estimates the size and uncertainty of the parameters of
interest expressed by standard metrics that depend on the scale of the outcome. Chapter 3
discusses these metrics in detail for different types of outcomes.
Using meta-analysis to combine information across studies offers a variety of benefits.
It provides an estimate of the average size of a characteristic of a population or of the
effectiveness or harm of a treatment (exposure) as well as a sense of the variation of these
quantities across different study settings. To the extent that the variation is not large or
can be understood, meta-analysis can increase the generalizability of the research findings
and determine their effects in subgroups. By combining small studies, meta-analysis can
also increase the precision with which key parameters are estimated and help to explain
inconsistent results that arise when underpowered studies report statistically non-significant
conclusions because of insufficient sample sizes. Meta-analysis can also focus attention
on discrepancies in study findings that might argue against combining their results or
might argue for more subtle interpretations of parameters whose true values might vary
with characteristics of the populations studied or with the manner in which interventions
are undertaken. In certain cases, exploring the causes of such heterogeneity can lead to
important conclusions in their own right or might point to the need for further studies to
fill in research gaps. Integration of meta-analysis with assessment of risk of bias (Chapter 12)
can also pinpoint weaknesses in the data and evaluate the sensitivity of the conclusions to
poor study processes.
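The precision gain from pooling can be made concrete with a minimal common-effect (inverse-variance) calculation, in which each study is weighted by the reciprocal of its variance. The study effects and variances below are hypothetical.

```python
def inverse_variance_pool(effects, variances):
    """Common-effect (inverse-variance) pooled estimate and its variance.

    Each study i contributes weight w_i = 1 / v_i; the pooled variance
    1 / sum(w_i) is smaller than any single study's variance, which is
    the precision gain from combining studies.
    """
    weights = [1.0 / v for v in variances]
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    return pooled, 1.0 / sum(weights)

# Three hypothetical small studies with broadly consistent effects
effects = [0.30, 0.45, 0.38]
variances = [0.04, 0.06, 0.05]
est, pooled_var = inverse_variance_pool(effects, variances)
```

The pooled variance, 1/Σw_i, is necessarily smaller than the smallest single-study variance, and the pooled estimate always lies within the range of the study effects.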
This chapter introduces the general themes that motivate the methods used for carrying
out meta-analysis and provides a map with which the reader can navigate the rest of the
book. Meta-analysis comes in a variety of flavors motivated by the types of data available,
the research questions to be answered and the inferences desired. Section 2.2 discusses var-
ious types of data that may be available and what they imply about the types of questions
that may be asked. Section 2.3 explores the types of models that may be fit with these data.
Before embarking on this overview, however, it is important to bear in mind that
meta-analysis is not appropriate for all systematic reviews. Reviews with few studies, with
studies that use dissimilar outcomes or outcome scales, that cover diverse interventions or
interventions that evolved over the period covered by the review, or that combine different
study designs (e.g., observational vs. experimental) may not fit into a single statistical
framework and may be better handled qualitatively or split into separate meta-analyses.
and standard deviation (for continuous outcomes). Additional characteristics of the
participants in each group (e.g., the number who were female, their average age, what
proportion went to school, or who had a certain type of disease) or of the treatments or
exposures they received are also recorded for descriptive purposes and perhaps to evaluate
reasons why treatment effects might differ between studies.
If the meta-analysis is focused on a comparison of two treatment groups, the key statistic
will often be the difference in their means or in a function of their means (e.g., the log odds
ratio is a difference between the logits of two proportions). Representing the group mean
for group j of study i as yij , the study treatment effect is yi = yi2 − yi1. Studies often only report
this mean contrast and its resulting standard error si. Sometimes the standard errors will
need to be back-calculated from a reported confidence interval as discussed in Chapter 3.
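That back-calculation can be sketched as follows, assuming the interval is symmetric on the scale of analysis (the log scale for ratio measures); the reported odds ratio and interval below are hypothetical.

```python
from math import log
from statistics import NormalDist

def se_from_ci(lower, upper, level=0.95, log_scale=False):
    """Back-calculate a standard error from a reported confidence interval.

    For ratio measures (odds ratio, risk ratio) set log_scale=True so the
    interval is first transformed to the log scale, on which it is assumed
    symmetric: SE = (upper - lower) / (2 * z).
    """
    if log_scale:
        lower, upper = log(lower), log(upper)
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return (upper - lower) / (2 * z)

# Hypothetical reported odds ratio 0.80 with 95% CI (0.64, 1.00):
# the standard error is recovered on the log odds ratio scale
se_log_or = se_from_ci(0.64, 1.00, log_scale=True)
```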
In rare cases, studies may also report the treatment effects within subgroups such as
male and female and one can investigate treatment by covariate interactions. More com-
monly, results may be reported by dose levels (see Chapter 18). Usually, though, effects
within subgroups are unavailable and heterogeneity of treatment effects across studies can
only be studied through the correlation of the study effects with a summary measure of
the subgroup in each study, such as the proportion of females or the average age of
participants. Within-study heterogeneity due to factors such as participant age or education
level, which vary across participants within a study, cannot be investigated at all. Chapter 7 discusses
meta-regression methods for assessing heterogeneity that test for interactions of treatment
with study-level risk factors such as design characteristics that apply to all individuals in a
given study or to summary measures like average age of a study’s participants.
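In its simplest form, such a meta-regression amounts to weighted least squares of the study effects on a study-level covariate with weights 1/v_i (the within-study variances). The sketch below fits a single covariate and omits the between-study variance component that the full methods of Chapter 7 would add; the data are hypothetical.

```python
def meta_regression(effects, variances, covariate):
    """Weighted least-squares meta-regression of study effects on one
    study-level covariate (e.g., mean participant age), with weights
    w_i = 1 / v_i. Returns (intercept, slope)."""
    w = [1.0 / v for v in variances]
    sw = sum(w)
    xbar = sum(wi * x for wi, x in zip(w, covariate)) / sw
    ybar = sum(wi * y for wi, y in zip(w, effects)) / sw
    sxy = sum(wi * (x - xbar) * (y - ybar)
              for wi, x, y in zip(w, covariate, effects))
    sxx = sum(wi * (x - xbar) ** 2 for wi, x in zip(w, covariate))
    slope = sxy / sxx
    return ybar - slope * xbar, slope

# Hypothetical: three study effects regressed on mean participant age
intercept, slope = meta_regression([0.10, 0.20, 0.30],
                                   [0.01, 0.02, 0.03],
                                   [40, 50, 60])
```

The slope estimates how the treatment effect changes with the study-level covariate; as the text notes, this is an across-study association and cannot substitute for within-study interaction estimates.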
treatment comparisons, with distributions of all potential measured and unmeasured con-
founders balanced by treatment group on average. If the studies are free of bias, each of
their estimates of treatment effect yi provides an independent contribution to the
meta-analytic average, θ̂. Many scientific research studies compare exposures that cannot be
randomized either practically or ethically, though. When studies are not randomized,
meta-analysis is potentially complicated by confounding. To remove this type of bias,
many studies will report estimates adjusted for potential confounders using either multi-
variable regression or special methods such as propensity scores or matching. Because the
meta-analyst must rely on these reported study analyses unless IPD are available, meta-
analysis of non-randomized studies can be biased if the studies do not properly adjust for
confounders or if they adjust in different ways. Whether appropriate adjustment
has taken place can be difficult to determine from a published report alone since it is
not always clear which factors were considered for adjustment and whether those missing
were unavailable or rejected as not significant. Chapter 21 discusses this issue.
2.3 Models
The observed group-level and study-level summaries, yij and yi, respectively, depend on
parameters of the data-generating process. We can construct a model of this measure-
ment process by assuming that each observed effect is a realization of a stochastic process
with group-level means θij or contrast-level means θi and associated variances σij and σi.
Typically, θi = θi2 − θi1. Although the data may be collected at the group-level, the mean
effect θ and perhaps the individual study effects θi are the parameters of interest in a meta-
analysis focused on comparing two groups; θ represents the best single number summary
of the treatment effects in the group of studies. It is informed by all of the studies together.
But sometimes it may be helpful to determine whether the estimate of the treatment effect,
θi in a single study, particularly a small one, could be improved by information from other
studies similar to it. We will find that certain models can do this.
the treatment effect θi in a given study estimated by a weighted average of the estimated
effect yi in that study and the overall mean θ from all the studies. This is commonly termed
a shrinkage estimate and reflects the belief that each study is informed by the other studies.
The weights correspond to the relative within- and between-study variability. While we say
little more about the separate-effects model, common-effect and random-effects models are
discussed throughout the book and in particular in Chapters 4 and 5.
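The shrinkage estimate described above can be written as θ̂i = λi yi + (1 − λi) θ̂, with λi = τ²/(τ² + vi), where vi is the within-study variance and τ² the between-study variance. A minimal sketch follows, with hypothetical values and treating the overall mean (mu in the code) and τ² as known rather than estimated.

```python
def shrinkage_estimate(y_i, v_i, mu, tau2):
    """Shrunken ("borrowed strength") estimate of one study's true effect
    under a normal random-effects model: a weighted average of the study's
    own estimate y_i and the overall mean mu. The weight on the study,
    tau2 / (tau2 + v_i), pulls imprecise studies (large v_i) more strongly
    toward mu.
    """
    lam = tau2 / (tau2 + v_i)
    return lam * y_i + (1.0 - lam) * mu

# For the same observed effect y_i = 1.0, a small, imprecise study
# (v_i = 1.0) is pulled further toward mu = 0 than a precise one (v_i = 0.01)
small_study = shrinkage_estimate(1.0, 1.0, 0.0, 0.5)
large_study = shrinkage_estimate(1.0, 0.01, 0.0, 0.5)
```

With τ² = 0 the estimate collapses to the common-effect mean, and as τ² grows each study's estimate reverts to its own yi.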
estimated, their effects can leak into each other. This is a basic consequence of parameter
shrinkage from multilevel models. As an example, a study with a high observed rate of
events in the control group will tend to have its estimated true control risk
shrunk toward the average control risk. This lowering of the study's control risk will tend
to make the estimated study treatment effect (difference between treated and control) big-
ger and may therefore change the overall treatment effect estimate.
To avoid leakage, one could use a second model that treats each γi as a fixed effect that applies
only to that study and has no relationship to any other study. Assuming that one has correctly
estimated the γi, one might hope that the estimates of the θi and of θ are independent of the γi
and so one has solved the leakage problem. However, this model introduces a new problem
because the number of γi parameters is directly associated with the number of studies in the
meta-analysis, so that the number of parameters is now increasing at the same rate as the
number of data points. This has some poor theoretical properties as discussed in Chapter 5.
Whether to treat control risk parameters as fixed or random remains controversial in the
literature (Jackson et al., 2018). One side argues, particularly when combining randomized
studies, that the parameters must be treated as fixed effects so as not to change the
treatment effect estimates, which are unbiased in each study as a result of the randomization.
The other side argues that the asymptotic properties of the maximum likelihood estimates of
treatment effect (and perhaps also of Bayesian estimates under a non-informative prior) are
compromised when the number of parameters is increasing at the same rate as the number of
studies. If one wants to make generalized inferences to larger populations, however, a
complete model of all the parameters using a random-effects formulation is needed (see Chapter 5).
2.4 Conclusion
This chapter has summarized the basic data structures and models encountered in
meta-analysis. As indicated, many specific problems implement specific variations of these
structures and models. Before proceeding to discussing models in more detail, we discuss
in Chapter 3 how to choose an appropriate effect measure and how to extract the data nec-
essary for analyzing these effect measures.
References
Jackson D and White IR, 2018. When should meta-analysis avoid making hidden normality assump-
tions? Biometrical Journal 60(6): 1040–1058.
Jackson D, Law M, Stijnen T, Viechtbauer W and White IR, 2018. A comparison of 7 random-effects
models for meta-analyses that estimate the summary odds ratio. Statistics in Medicine 37(7):
1059–1085.
Jackson D, Riley R and White IR, 2011. Multivariate meta-analysis: Potential and promise (with dis-
cussion). Statistics in Medicine 30(20): 2481–2510.
Laird NM and Mosteller F, 1990. Some statistical methods for combining experimental results.
International Journal of Technology Assessment in Health Care 6(1): 5–30.
Rice K, Higgins JPT and Lumley T, 2017. A re-evaluation of fixed effect(s) meta-analysis. Journal of the
Royal Statistical Society: Series A 181(1): 205–227.
Riley RD, Thompson JR and Abrams KR, 2008. An alternative model for bivariate random-effects
meta-analysis when the within-study correlations are unknown. Biostatistics 9(1): 172–186.
Schmid CH, Landa M, Jafar TH, Giatras I, Karim T, Reddy M, Stark PC, Levey AS and
Angiotensin-Converting Enzyme Inhibition in Progressive Renal Disease (AIPRD) Study Group, 2003.
Constructing a database of individual clinical trials for longitudinal analysis. Controlled Clinical
Trials 24(3): 324–340.
Stewart LA, Clarke MJ on behalf of the Cochrane Working Group on Meta-Analysis Using Individual
Patient Data, 1995. Practical methodology of meta-analyses (overviews) using updated indi-
vidual patient data. Statistics in Medicine 14(19): 2057–2079.
Trikalinos TA, Hoaglin DC and Schmid CH, 2014. An empirical comparison of univariate and multi-
variate meta-analyses for categorical outcomes. Statistics in Medicine 33(9): 1441–1459.