Handbook of Meta-Analysis

1
Introduction to Systematic Review and Meta-Analysis

Christopher H. Schmid, Ian R. White, and Theo Stijnen

DOI: 10.1201/9781315119403-1

CONTENTS
1.1  Introduction .................................................. 1
1.2  Topic Preparation ............................................. 3
1.3  Literature Search ............................................. 7
1.4  Study Screening ............................................... 8
1.5  Data Extraction ............................................... 9
1.6  Critical Appraisal of Study and Assessment of Risk of Bias ... 10
1.7  Analysis ..................................................... 11
1.8  Reporting .................................................... 12
1.9  Using a Systematic Review .................................... 13
1.10 Summary ...................................................... 14
References ........................................................ 14

1.1 Introduction
The growth of science depends on accumulating knowledge building on the past work of others. In health and medicine, such knowledge translates into developing treatments for diseases and determining the risks of exposures to harmful substances or environments. Other disciplines benefit from new research that finds better ways to teach students, more effective ways to rehabilitate criminals, and better ways to protect fragile environments. Because the effects of treatments and exposures often vary with the conditions under which they are evaluated, multiple studies are usually required to ascertain their true extent. As the pace of scientific development quickens and the amount of information in the literature continues to explode (for example, about 500,000 new articles are added to the National Library of Medicine’s PubMed database each year), scientists struggle to keep up with the latest research and recommended practices. It is impossible to read all the studies in even a specialized subfield and even more difficult to reconcile the often-conflicting messages that they present. Traditionally, practitioners relied on experts to summarize the literature and make recommendations in articles that became known as narrative reviews. Over time, however, researchers began to investigate the accuracy of such review articles and found that the evidence often did not support the recommendations (Antman et al., 1992). They began to advocate a more scientific approach to such reviews that did not rely on one expert’s idiosyncratic review and subjective opinion. This approach required documented evidence to back claims and a systematic process carried out by a multidisciplinary team to ensure that all the evidence was reviewed.
This process is now called a systematic review, especially in the healthcare literature. Systematic reviews use a scientific approach that carefully searches for and reviews all evidence using accepted and pre-specified analytic techniques (Committee on Standards, 2011). A systematic review encompasses a structured search of the literature in order to combine information across studies using a defined protocol to answer a focused research question. The process seeks to find and use all available evidence, both published and unpublished, evaluate it carefully, and summarize it objectively to reach defensible recommendations. The synthesis may be qualitative or quantitative, but the key feature is its adherence to a set of rules that enable it to be replicated. The widespread acceptance of systematic reviews has led to a revolution in the way practices are evaluated and practitioners get information on which interventions to apply. Table 1.1 outlines some of the fundamental differences between narrative reviews and systematic reviews.
Systematic reviews are now common in many scientific areas. The modern systematic review originated in psychology in a 1976 paper by Gene Glass that quantitatively summarized all the studies evaluating the effectiveness of psychotherapy (Glass, 1976). Glass called the technique meta-analysis and the method quickly spread into diverse fields such as education, criminal justice, industrial organization, and economics (Shadish and Lecy, 2015). It also eventually reached the physical and life sciences, particularly policy-intensive areas like ecology (Järvinen, 1991; Gurevitch et al., 1992). It entered the medical literature in the 1980s, with one of the earliest influential papers being a review of the effectiveness of beta blockers for patients suffering heart attacks (Yusuf et al., 1985), and soon grew very popular. But over time, especially in healthcare, the term meta-analysis came to refer primarily to the quantitative analysis of the data from a systematic review. In other words, systematic reviews without a quantitative analysis in health studies are not called a meta-analysis, although this distinction is not yet firmly established in other fields. We will maintain the distinct terms in this book, using meta-analysis to refer to the statistical analysis of the data collected in a systematic review. Before exploring the techniques available for meta-analysis in the following chapters, it will be useful first to discuss the parts of the systematic review process in this chapter. This will enable us to understand the sources of the data and how the nature of those sources affects the subsequent analysis of the data and interpretation of the results.

TABLE 1.1
Key Differences between Narrative and Systematic Review

Narrative review                           | Systematic review
Broad overview of topic                    | Focus on well-formulated questions
Content experts                            | Multidisciplinary team
Not guided by a protocol                   | A priori defined protocol
No systematic literature search            | Comprehensive, reproducible literature search
Unspecified selection of studies           | Study selection by eligibility criteria
No critical appraisal of studies           | Quality assessment of individual studies
Formal quantitative synthesis unlikely     | Meta-analysis often performed when data available
Conclusions based on opinion               | Conclusions follow analytic plan and protocol
Direction for future research rarely given | States gaps in current evidence

FIGURE 1.1
Systematic review process.
Systematic reviews generally involve six major components: topic preparation, literature search, study screening, data extraction, analysis, and preparation of a report (Figure 1.1). Each involves multiple steps, and a well-conducted review should carefully attend to all of them (Wallace et al., 2013). The entire process is an extended one, and a large, funded review may take over a year and cost hundreds of thousands of dollars. Fortunately, several organizations have written standards and manuals describing proper ways to carry out a review. Excellent references are the Institute of Medicine’s Standards for Systematic Reviews of Comparative Effectiveness Research (Committee on Standards, 2011), the Cochrane Collaboration’s Cochrane Handbook for Systematic Reviews of Interventions (Higgins et al., 2019) and Handbook for Diagnostic Test Accuracy Reviews (Cochrane, https://methods.cochrane.org/sdt/handbook-dta-reviews), and the Agency for Healthcare Research and Quality (AHRQ) Methods Guide for Effectiveness and Comparative Effectiveness Reviews (Agency for Healthcare Research and Quality, 2014). We briefly describe each component and reference additional sources for readers wanting more detail. Since the process is most fully developed and codified in health areas, we will discuss the process in that setting. However, translating the ideas and techniques into any scientific field is straightforward.

1.2 Topic Preparation


The Institute of Medicine’s Standards for Systematic Reviews (Committee on Standards, 2011) lists four steps to take when preparing a topic: establishing a review team, consulting with stakeholders, formulating the review topic, and writing a review protocol.

The review team should have appropriate expertise to carry out all phases of the review. This includes not only statisticians and systematic review experts, but also librarians, science writers, and a wide array of experts in various aspects of the subject matter (e.g., clinicians, nurses, social workers, epidemiologists).
Next, for both the scientific validity and the impact of the review, the research team must consult with and involve the review’s stakeholders, those individuals to whom the endeavor is most important and who will be the primary users of the review’s conclusions. Stakeholders may include patients, clinicians, caregivers, policy makers, insurance companies, product manufacturers, and regulators. Each of these groups will bring different perspectives to ensure that the review answers the most important questions. The use of patient-reported outcomes provides an excellent example of the change in focus brought about by involvement of all stakeholders. Many older studies and meta-analyses focused only on laboratory measurements or clinical outcomes but failed to answer questions related to patient quality of life. When treatments cannot reduce pain, improve sleep, or increase energy, patients may perceive them to be of little benefit even if they do improve biological processes. It is also important to address potential financial, professional, and intellectual conflicts of interest of stakeholders and team members in order to ensure an unbiased assessment (Committee on Standards, 2011).
Thoroughly framing the topic to be studied and constructing the right testable questions form the foundation of a good systematic review. This foundation underlies all the later steps, especially the analysis, for which the proper approach depends on addressing the right question. Scope is often determined by available resources (time, money, personnel) and by prior knowledge about the problem and the evidence. Questions must carefully balance the tradeoff between breadth and depth. Very broadly defined questions may be criticized for not providing a precise answer to a question. Very narrowly focused questions have limited applicability and may be misleading if interpreted broadly; there may also be little or no evidence to answer them.
An analytic framework is often helpful when developing this formulation. An analytic framework is a graphical representation of the chain of logic that links the intervention to outcomes, and it helps define the key questions of interest, including their rationale (Anderson et al., 2011). The rationale should address both research and decision-making perspectives. Each link relating a test, intervention, or outcome represents a potential key question. Stakeholders can provide important perspectives. Figure 1.2 provides an example from an AHRQ evidence report on the relationship between cardiovascular disease and omega-3 fatty acids (Balk et al., 2016).
For each question, it is important to identify the PICOS elements: Populations (participants and settings), Interventions (treatments and doses), Comparators (e.g., placebo, standard of care, or an active comparator), Outcomes (scales and metrics), and Study designs (e.g., randomized and observational) to be included in the review. Reviews of studies of diagnostic test accuracy modify these components slightly to reflect a focus on tests rather than treatments. Instead of interventions and comparators, they examine index tests and gold standards (see Chapter 19). Of course, some reviews may have non-comparative outcomes (e.g., prevalence of disease) and so would not have a comparator. Table 1.2 shows potential PICOS components to answer the question posed in the omega-3 review: “Are omega-3 fatty acids beneficial in reducing cardiovascular disease?”
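The PICOS elements for a question can be captured in a simple structured record. The sketch below is purely illustrative — the field names and example values are ours, drawn from the omega-3 example, and do not follow any particular review software’s schema:

```python
# Illustrative only: a structured record holding the PICOS elements of one
# review question, populated with values from the omega-3 example.
from dataclasses import dataclass
from typing import List

@dataclass
class PICOS:
    populations: List[str]    # participants and settings
    interventions: List[str]  # treatments and doses
    comparators: List[str]    # e.g., placebo, standard of care
    outcomes: List[str]       # scales and metrics
    study_designs: List[str]  # e.g., randomized, observational

omega3 = PICOS(
    populations=["primary prevention", "secondary prevention"],
    interventions=["fish", "EPA", "DHA", "ALA"],
    comparators=["placebo", "no control", "isocaloric control"],
    outcomes=["overall mortality", "sudden death", "stroke"],
    study_designs=["RCTs", "observational studies"],
)
print(len(omega3.interventions))  # → 4
```

Writing the elements down in a structured form like this makes it easy to check later that each eligibility decision and each analysis traces back to a pre-specified PICOS component.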
As with primary studies, it is also important to construct a thorough protocol that defines all of the review’s inclusion and exclusion criteria and carefully describes how the study will carry out the remaining components of the systematic review: searching, screening, extraction, analysis, and reporting (Committee on Standards, 2011; Moher et al., 2015).
FIGURE 1.2
Analytic framework for omega-3 fatty acid intake and cardiovascular disease (Balk et al., 2016).

TABLE 1.2
Potential PICOS Criteria for Addressing the Question: “Are Omega-3 Fatty Acids Beneficial in Reducing Cardiovascular Disease?”

Participants:  Primary prevention; Secondary prevention
Interventions: Fish, EPA, DHA, ALA; Dosage; Background intake; Duration
Comparator:    Placebo; No control; Isocaloric control
Outcomes:      Overall mortality; Sudden death; Revascularization; Stroke; Blood pressure
Study Design:  RCTs; Observational studies; Follow-up duration; Sample size

Because the PICOS elements comprise a major part of the protocol that informs the whole study design, it is useful to discuss each element of PICOS in turn. Defining the appropriate populations is crucial for ensuring that the review applies to the contexts for which it is intended. Often, inferences are intended to apply widely, but studies in the review may only focus on narrow settings and groups of individuals. For example, many studies exclude the elderly and so do not contribute information about the complete age spectrum. Even when some studies do include all subpopulations, an analysis must be carefully designed in order to evaluate whether benefits and harms apply differently to each subpopulation. Homogeneity of effects across geographic, economic, cultural, or other units may also be difficult to test if most studies are conducted in the same environment. For some problems, inferences may be desired in a particular well-defined area, such as effects in a single country, but in others, variation across wider geographic regions may be of interest. The effect of medicines may vary substantially by location if efficacy is influenced by the local healthcare system.
Interventions come in many different forms, and effects can vary with the way that an intervention is implemented. Different durations, different doses, different frequencies, and different co-interventions can all modify treatment effects and make results heterogeneous. Restricting the review to similar interventions may reduce this heterogeneity but will also reduce the generalizability of the results to the populations and settings of interest. It is often important to be able to evaluate the sensitivity of the results to variations in the interventions and to the circumstances under which the interventions are carried out. Thus, reviewers must carefully consider the scope of the interventions to be studied and the generality with which inferences about their relative effects should apply. Many questions of interest involve the comparison of more than two treatments; a common example is the comparison of drugs within a class or the comparison of brand names to generics. Network meta-analysis (see Chapter 10) provides a means to estimate the relative efficacy and rank the set of treatments studied.
The type of comparator that a study uses can have a large impact on the treatment effect found. In many reviews, interest lies in comparing one or more interventions with standards of care or control treatments that serve to provide a baseline response rate. The well-known placebo effect reminds us that positive outcomes can arise simply from patient confidence that a treatment is efficacious, and many effectiveness studies therefore require the use of an active control that has previously been proven to provide benefits compared with no treatment. Because active controls will typically have larger effects than placebos, treatment effects in studies with active controls are often smaller than in those with placebo controls. Combining studies with active controls and studies with placebo controls can produce an average that mixes different types of treatment effects and lead to summaries that are hard to interpret. Even the type of placebo can have an effect. In a meta-analysis comparing different interventions for patients with osteoarthritis, Bannuru et al. found that placebos given intravenously worked better than those given orally, thus distorting the comparison between oral and intravenous treatments, which were typically compared with different placebos (Bannuru et al., 2015). Regression analyses may be needed when comparators differ in these ways (see Chapter 7).
The studies included in a review may report many different types of outcomes, and the choice of which to summarize can be daunting. Outcomes selected should be meaningful and useful, based on sound scientific principles. Some outcomes correspond to well-defined events such as death or passing a test. Other outcomes are more subjective: amount of depression, energy levels, or ability to do daily activities. Outcomes can be self-reported or reported by a trained evaluator; they can be extracted from a registry or measured by an instrument. Some are more important to the clinician, teacher, or policymaker; others are more important to the research participant, patient, or student. Some are more completely recorded than others; some are primary and some secondary; some relate to benefits, others to harms. All of these variations affect the way in which analyses are carried out and interpreted. They change the impact that review conclusions have on different stakeholders and the degree of confidence they inspire. All of these considerations play a major role in the choice of methods used for meta-analysis.
Reviews can summarize studies with many different types of designs. Studies may be randomized or observational, parallel or crossover, cohort or case-control, prospective or retrospective, longitudinal or cross-sectional, single- or multi-site. Different techniques are needed for each. Study quality can also vary. Not all randomized trials use proper randomization techniques, appropriate allocation concealment, and double blinding. Not all studies use standardized protocols, appropriately follow up participants, record reasons for withdrawal, and monitor compliance with treatment. All of these design differences among studies can introduce heterogeneity into a meta-analysis and require careful consideration of whether the results will make sense when combined. Careful consideration of the types and quality of studies to be synthesized in a review can help to either limit this heterogeneity or expand the review’s generalizability, depending on the aims of the review’s authors. In many cases, different sensitivity analyses will enable reviewers to judge the impact of this heterogeneity on conclusions.

1.3 Literature Search


The PICOS elements motivate the strategy for searching the relevant literature using a variety of sources to address the research questions. Bibliographic databases such as Medline or PsycINFO are updated continually, and some are freely available to the public. Medline, maintained by the US National Library of Medicine since 1964, indexes more than 5500 biomedical journals and more than 20 million items, with thousands added each day. A large majority are English-language publications. Other databases are available through an annual paid subscription. EMBASE, published by Elsevier, indexes 7500 journals and more than 20 million items in healthcare. Although it overlaps substantially with Medline, it includes more European journals. Other databases provide registries of specific types of publications. The Cochrane Controlled Trials Registry is part of the Cochrane Library and indexes more than 500,000 controlled trials identified through manual searches by volunteers in Cochrane review groups. Many other databases are more specific to subject matter areas: CINAHL covers nursing and allied health fields in more than 1600 journals; PsycINFO covers more than 2000 journals related to psychology; and CAB (Commonwealth Agricultural Bureau) indexes nearly 10,000 journals, books, and proceedings in applied life sciences and agriculture. Sources like Google Scholar are broader but less well-defined, making structured, reproducible searches more difficult to carry out and complicating the capture of all relevant articles.
To ensure that searches capture studies missing from databases, researchers should search the so-called gray literature for results not published as full-text papers in journals (Balshem et al., 2013). These include sources such as dissertations, company reports, regulatory filings at government agencies such as the US Food and Drug Administration, online registries such as clinicaltrials.gov, and conference proceedings that contain presented abstracts. Some of these items may be available through databases that index gray literature, others may be identified through contact with colleagues, and others may require manual searching of key journals and the reference lists of identified publications. Preliminary reports like abstracts often present data that will change with final publication, so it is a good idea to continue to check the literature for the final report.
Searches are often restricted to specific languages, especially English, for expediency. Research has not been completely consistent on the potential impact of this language bias (Morrison et al., 2012; Jüni et al., 2002; Pham et al., 2005), but its impact for studying certain treatments is undeniable. For example, reviews of Chinese medical treatments such as acupuncture that ignore the Chinese literature will be incomplete (Wang et al., 2008). Other considerations in choosing sources to search include the quality of the studies, the accessibility of journals, the cost of accessing articles, and the presence of peer review.
Accurate and comprehensive searches require knowledge of the structure of the databases and the syntax needed to search them (e.g., Boolean combinations using AND, OR, and NOT). Medline citations, for example, include the article’s title, authors, journal, publication date, language, publication type (e.g., article, letter), and 5–15 controlled-vocabulary Medical Subject Heading (MeSH) search terms (i.e., keywords) chosen from a structured vocabulary that ensures uniformity and consistency of indexing and so greatly facilitates searching. The MeSH terms are divided into headings (e.g., disease category or body region), specialized subheadings (e.g., diagnosis, therapy, epidemiology, human, animal), publication types (e.g., journal article, randomized controlled trial), and a large list of supplementary concepts related to the specific article (e.g., type of intervention). Information specialists like librarians trained in systematic review can help construct efficient algorithms of keywords, headings, and subclassifications that best use the search tools in order to optimize sensitivity (not missing relevant items) and specificity (not capturing irrelevant items) during database searches.
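As a rough illustration of how such a strategy combines concepts, the sketch below ORs the synonyms within each PICOS-derived concept group and ANDs the groups together. The terms are invented examples for the omega-3 question, not a validated search filter:

```python
# Hypothetical sketch: build a Boolean query by OR-ing synonyms within each
# concept group and AND-ing the groups together, as a librarian-built
# strategy would.  The term lists are illustrative, not a tested filter.
population = ["cardiovascular diseases", "myocardial infarction"]
intervention = ["omega-3 fatty acids", "fish oil", "EPA", "DHA"]
design = ["randomized controlled trial", "controlled clinical trial"]

def or_group(terms):
    """Parenthesized OR of the synonyms for one concept."""
    return "(" + " OR ".join('"%s"' % t for t in terms) + ")"

query = " AND ".join(or_group(g) for g in (population, intervention, design))
print(query)
```

A real strategy would add database-specific field tags (e.g., MeSH headings in Medline) and be tuned per database, but the OR-within-concept, AND-across-concepts structure is the common backbone.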
Searching is an iterative process. The scope of the search can vary greatly depending on the topic and the questions asked, as well as on the time and manpower available. Reviews must balance completeness of the search with the costs incurred. Each database will require its own search strategy to take advantage of its unique features, but the general features will remain the same. Some generic search filters that have been developed for specific types of searches (Glanville et al., 2006) can be easily modified to provide a convenient starting strategy for a specific problem. A search that returns too many citations may indicate that the questions being asked are too broad and that the topic should be reformulated in a more focused manner. Manual searches that identify items missed by the database search may suggest improved strategies. It is important to document the exact search strategy, including its date, and the disposition of each report identified, including reasons for exclusion, so that the search and the final collection of documents can be reproduced (Liberati et al., 2009).

1.4 Study Screening


Once potential articles are identified, they must be screened to determine which are relevant to the review based on the protocol’s pre-specified criteria. Because a review may address several different questions, each article is not necessarily relevant to all questions. For instance, a review addressing the benefits and harms of an intervention may include only randomized trials in assessing benefits, but both trials and observational studies in assessing harms.
Traditionally, screening has been a laborious process of poring through a large stack of printed abstracts. Recently, computerized systems have been developed to facilitate the process (Wallace et al., 2012). These systems organize and store the abstracts recovered by the search and enable the screener to read, comment on, and highlight text and to make decisions electronically. Systems for computer-aided screening using text mining and machine learning have also been developed (Wallace et al., 2010; Marshall et al., 2016; Marshall et al., 2018).
Experts recommend independent screening by at least two members of the research team in order to minimize errors (Committee on Standards, 2011), although many teams do not have resources for such an effort. Computerized screening offers a possible solution to this problem by allowing the computer to check the human screener (Jap et al., 2019). Often, one screener is a subject matter expert and the other a methodologist so that both aspects of the inclusion criteria are covered. Teams also often have a senior researcher work with a junior member to ensure accuracy and for training purposes. Screening is a tedious process and requires careful attention. Sometimes it is possible to screen using only a title, but in other cases careful reading of the abstract is required. Articles are often screened in batches to optimize effort. If duplicate screening is used, the pair of screeners meet at the end to reconcile any differences. Once the initial screening of abstracts is completed, the articles identified for further review are retrieved and examined more closely in a second screening phase. Because sensitivity is so important, reviewers screen abstracts conservatively and may end up retrieving many articles that ultimately do not meet the criteria.
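When duplicate screening is used, the reconciliation meeting only needs to cover the records on which the two screeners disagree. A minimal sketch (the record IDs and decisions are invented):

```python
# Invented example: find the abstracts whose two independent screening
# decisions differ, so the pair of screeners can meet and reconcile
# only those records.
screener_a = {"id001": "include", "id002": "exclude", "id003": "include"}
screener_b = {"id001": "include", "id002": "include", "id003": "include"}

def to_reconcile(a, b):
    """IDs screened by both reviewers whose decisions disagree."""
    return sorted(i for i in a if i in b and a[i] != b[i])

print(to_reconcile(screener_a, screener_b))  # → ['id002']
```

Screening tools automate exactly this kind of comparison, and logging the disagreements also provides a record of how consistently the eligibility criteria were applied.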

1.5 Data Extraction


After screening, the review team must extract data from the studies identified as relevant. In most cases, the report itself will provide the information; in some cases, investigators may have access to the study dataset or may need to contact the study’s investigators for additional information. Tables, figures, and text from the report provide quantitative and qualitative summary information, including bibliographic information and the PICOS elements relating to demographics, disease characteristics, comorbidities, enrollment, baseline measurements, exposures and interventions, outcomes, and design elements. Outcomes are usually reported by treatment group; participant characteristics are usually aggregated as averages or proportions across or by study groups (e.g., mean age or proportion female). It is also important to extract how each study has defined and ascertained the outcome for use in assessing study quality. Combined with study design features such as location or treatment dosage, these study-level variables aid in assessing the relevance of the study for answering the research questions and for making inferences to the population of interest, including assessing how the effect of treatments might change across studies that enroll different types of participants or use different study designs (see Chapter 7). Extracted items should also include those necessary to assess study quality and the potential for bias (see Chapter 12).
As with screening, extraction should follow a carefully pre-specified process using a structured form developed for the specific systematic review. Independent extraction by two team members, often a subject matter expert and a methodologist who meet to reconcile discrepancies, reduces errors, as does the use of structured items with precise operational definitions of the items to extract. For example, one might specify whether the study’s location is defined by where it was conducted or where its authors are employed. If full duplicate extraction is too costly, duplicate extraction of a random sample of studies or of specific important items may help to determine whether extraction is consistent and whether further training may be necessary. It is often helpful to categorize variables into pre-defined levels in order to harmonize extraction and reduce free-text items. This can be especially useful with non-numeric items like drug classes or racial groups. Pilot testing of the form on a few studies using all extractors can identify inadvertently omitted and ill-defined items and help reduce the need to re-categorize variables or re-extract data upon reconciliation. The pilot testing phase is also useful for training inexperienced extractors.
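Harmonizing free-text entries into pre-defined levels can be as simple as a lookup table that flags anything unmapped for manual review. A hypothetical sketch (the drug names and classes are examples only):

```python
# Hypothetical lookup: map free-text drug names to pre-defined classes,
# flagging unmapped entries for manual review rather than guessing.
DRUG_CLASS = {
    "atenolol": "beta blocker",
    "metoprolol": "beta blocker",
    "lisinopril": "ACE inhibitor",
}

def categorize(entry):
    """Normalize a free-text entry and map it to its pre-defined level."""
    key = entry.strip().lower()
    return DRUG_CLASS.get(key, "NEEDS REVIEW: " + entry)

print(categorize("  Metoprolol "))  # → beta blocker
print(categorize("aspirin"))        # flagged for manual review
```

Keeping the mapping in one place means that a re-categorization decided during reconciliation can be applied uniformly to every study rather than re-extracted by hand.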
Advances in natural language processing are beginning to facilitate computerized data extraction (Marshall et al., 2016).
Quite often, meta-analysts identify errors and missing information in the published
reports and may contact the authors for corrections. In other cases, it may be possible to
infer missing information from other sources by back-calculation (e.g., using a confidence
interval to determine a standard error) or by using digitizing software to recover data
from a graph (Shadish et al., 2009).
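As a concrete example of such back-calculation (a sketch with invented numbers, not from the
chapter), a symmetric Wald-type 95% confidence interval spans about 2 × 1.96 standard errors,
so the standard error can be recovered from the reported limits:

```python
import math
from statistics import NormalDist

def se_from_ci(lower, upper, level=0.95, log_scale=False):
    """Recover a standard error from a reported confidence interval.

    Assumes a symmetric (Wald) interval, computed on the log scale for
    ratio measures such as odds ratios or risk ratios.
    """
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)  # e.g., 1.96 for a 95% CI
    if log_scale:
        lower, upper = math.log(lower), math.log(upper)
    return (upper - lower) / (2 * z)

# A study reports a mean difference of 4.0 (95% CI 1.2 to 6.8):
print(round(se_from_ci(1.2, 6.8), 3))  # → 1.429
# An odds ratio of 1.0 (95% CI 0.5 to 2.0) is handled on the log scale:
print(round(se_from_ci(0.5, 2.0, log_scale=True), 3))  # → 0.354
```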
Many software tools such as spreadsheets, databases, and dedicated systematic review
packages (e.g., the Cochrane Collaboration’s RevMan) aid in collection of data extracted
from papers. The Systematic Review Data Repository is one example of a tool that provides
many facilities for extracting and storing information from studies (Ip et al., 2012). These
tools can be evaluated based on their cost, ease of setup, ease of use, versatility, portability,
accessibility, data management capability, and ability to store and retrieve data.

1.6 Critical Appraisal of Studies and Assessment of Risk of Bias


Many elements extracted from a study help to assess its quality and validity. These
assess the relevance of the study’s populations, interventions, and outcome measures to
the systematic review criteria; the fidelity of the implementation of interventions; and
potential risk of bias that study elements pose to each study’s conclusions and to the
overall synthesis. Elements that inform potential risk of bias in a study include adequacy
of randomization, allocation concealment and blinding in experiments (Schulz et al.,
1995), and proper adjustment for confounding in observational studies (Sterne et al.,
2016). Across studies, biases may arise from missing information in the evidence base
caused by missing or incompletely reported studies that lack full documentation of all
outcomes collected or all patients enrolled (e.g., from withdrawals and loss to follow-up)
(Deo et al., 2011).
One way to get a sense of potential bias is to compare the study as actually conducted
with the study as it should ideally have been conducted. Some trials fail to assign partici-
pants in a truly random fashion or fail to ensure that the randomized treatment for a given
participant is concealed. In other cases, participants may be properly randomized but may
find out their treatment during the study. If this knowledge changes their response, then
their response is no longer that intended by the study. Or participants in the control group
may decide to seek out the treatment on their own, leading to an outcome that no longer
represents that under the assigned control. This would bias a study of efficacy but
might actually provide a better estimate in a study of effectiveness (i.e., an estimate of
how a treatment works when applied in a real-world setting). In other cases, individuals
may drop out of a study because of adverse effects. Other study performance issues that
may cause bias include co-interventions given unequally across study arms and outcome
assessments made inconsistently. These issues are more problematic when studies are not
blinded and those conducting the study are influenced by the assignment or exposure to
treat groups differently.
Several quality assessment tools are available depending on whether the study is experimental
or observational (Whiting et al., 2016; Sterne et al., 2016). These provide excellent
checklists of items to examine for bias and guidelines for defining the degree of bias.
Low risk of bias studies should not lead to large differences between the study estimate
and the intended estimand; high risk of bias studies may lead to large differences. Chapter
12 examines strategies for assessing and dealing with such bias, including analytic strate-
gies such as sensitivity analyses that can be used to assess the impact on conclusions.
It is important to bear in mind that study conduct and study reporting are different
things. A poor study report does not necessarily imply that the study was poorly done and
therefore has biased results. Conversely, a poorly done study may be reported well. In the
end, the report must provide sufficient information for its readers to be confident enough
to know how to proceed to use it.

1.7 Analysis
Analysis of the extracted data can take many forms depending on the questions asked
and data collected but can be generally categorized as either qualitative or quantitative.
Systematic reviews may consist solely of qualitative assessments of individual studies
when insufficient data are available for a full quantitative synthesis or they may involve
both qualitative and quantitative components.
A qualitative synthesis typically summarizes the scientific and methodological
characteristics of the included studies (e.g., size, population, interventions, quality of
execution); the strengths and limitations of their design and execution and the impact of these
on study conclusions; their relevance to the populations, comparisons, co-interventions,
settings, and outcomes or measures of interest defined by the research questions; and patterns
in the relationships between study characteristics and study findings. Such qualitative
summaries help answer questions not amenable to statistical analysis.
Meta-analysis is the quantitative synthesis of information from a systematic review. It
employs statistical analyses to summarize outcomes across studies using either aggre-
gated summary data from trial reports (e.g., trial group summary statistics like means and
standard deviations) or complete data from individual participants. When comparing the
effectiveness and safety of treatments between groups of individuals that receive different
treatments or exposures, the meta-analysis summarizes the differences as treatment effects,
whose size and corresponding uncertainty are expressed by standard metrics that depend on
the scale of the measured outcome, such as continuous, categorical, count, or time-to-event.
Examples include differences in means for continuous outcomes and differences in proportions
for binary outcomes. When comparison between
treatments is not the object of the analysis, the summaries may take the form of means or
proportions of a single group (see Chapter 3 for discussion of the different types of effect
measures in meta-analysis).
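To make these metrics concrete, the following sketch (with hypothetical counts; Chapter 3
treats the measures and their properties formally) computes a risk difference, log risk
ratio, and log odds ratio, with large-sample standard errors, from one study's two-by-two
counts:

```python
import math

def binary_effects(events_t, n_t, events_c, n_c):
    """Common effect measures for a two-arm study with a binary outcome."""
    pt, pc = events_t / n_t, events_c / n_c
    rd = pt - pc                                          # risk difference
    log_rr = math.log(pt / pc)                            # log risk ratio
    log_or = math.log((pt / (1 - pt)) / (pc / (1 - pc)))  # log odds ratio
    # Large-sample (delta-method) standard errors.
    se_rd = math.sqrt(pt * (1 - pt) / n_t + pc * (1 - pc) / n_c)
    se_log_rr = math.sqrt(1 / events_t - 1 / n_t + 1 / events_c - 1 / n_c)
    se_log_or = math.sqrt(1 / events_t + 1 / (n_t - events_t)
                          + 1 / events_c + 1 / (n_c - events_c))
    return {"RD": (rd, se_rd), "log RR": (log_rr, se_log_rr), "log OR": (log_or, se_log_or)}

# Hypothetical trial: 30/100 events on treatment, 45/100 on control.
for name, (est, se) in binary_effects(30, 100, 45, 100).items():
    print(f"{name}: {est:.3f} (SE {se:.3f})")
```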
Combining estimates across studies not only provides an overall estimate of how a
treatment is working in populations and subgroups but can overcome the lack of power
that leads many studies to non-statistically significant conclusions because of insufficient
sample sizes. Meta-analysis also helps to explore heterogeneity of results across studies
and helps identify research gaps that can be filled by future studies. Synthesis can help
uncover differential treatment effects according to patient subgroups, form of intervention
delivery, study setting, and method of measuring outcomes. It can also detect bias that
may arise from poor research such as lack of blinding in randomized studies or failure to
follow all individuals enrolled in a study. As with any statistical analysis, it is also impor-
tant to assess the sensitivity of conclusions to changes in the protocol, study selection, and
analytic assumptions. Findings that are sensitive to small changes in these elements are
less trustworthy.
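The core computation can be sketched as follows (an illustration only, assuming a
common-effect inverse-variance model with hypothetical inputs; later chapters develop the
formal models, and dedicated software should be used in practice). The sketch pools study
estimates that share a scale and reports Cochran's Q and the I² heterogeneity statistic:

```python
import math

def pool_fixed(effects, ses):
    """Inverse-variance (common-effect) pooling with Cochran's Q and I-squared.

    The effects must share a scale (e.g., all log odds ratios).
    """
    w = [1 / s**2 for s in ses]
    pooled = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    se_pooled = math.sqrt(1 / sum(w))
    q = sum(wi * (yi - pooled)**2 for wi, yi in zip(w, effects))  # heterogeneity statistic
    df = len(effects) - 1
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    return pooled, se_pooled, q, i2

# Hypothetical log odds ratios and standard errors from four studies.
est, se, q, i2 = pool_fixed([-0.4, -0.2, -0.6, -0.3], [0.20, 0.25, 0.30, 0.15])
print(f"pooled = {est:.3f} (SE {se:.3f}), Q = {q:.2f}, I2 = {100 * i2:.0f}%")
```

Note how the pooled standard error is smaller than any single study's, which is the gain in
precision from combining small studies described above.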
The extent to which a meta-analysis captures the truth about treatment effects depends
on how accurately the studies included represent the populations and settings for which
inferences must be made. Research gaps represent studies that need to be done. Publication
bias and reporting bias relate to studies that have been done but that have been incom-
pletely reported. Publication bias refers to the incorrect estimation of a summary treat-
ment effect from the loss of information resulting from studies that are not published
because they had uninteresting, negative, or non-statistically significant findings. Failure
to include such studies in a review leads to an overly optimistic view of treatment effects,
biased toward positive results (Dickersin, 2005).
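A toy simulation (not from the chapter; the parameter values are arbitrary) illustrates the
direction of this bias: when only statistically significant results reach publication, the
average published estimate overstates the true effect:

```python
import random

def simulate_publication_bias(true_effect=0.2, se=0.15, n_studies=2000, seed=1):
    """Toy model: each study's estimate is 'published' only if significant."""
    rng = random.Random(seed)
    all_effects, published = [], []
    for _ in range(n_studies):
        y = rng.gauss(true_effect, se)   # one study's estimated effect
        all_effects.append(y)
        if abs(y / se) > 1.96:           # two-sided test at the 5% level
            published.append(y)
    mean = lambda xs: sum(xs) / len(xs)
    return mean(all_effects), mean(published)

full, pub = simulate_publication_bias()
print(f"mean over all studies: {full:.3f}; mean over 'published' studies: {pub:.3f}")
```

The mean over all simulated studies recovers the true effect, while the mean over the
"published" subset is noticeably larger.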
Reporting bias arises when studies report only a subset of their findings, often a subset
of all outcomes examined (Schmid, 2017). Bias is not introduced if the studies fail to collect
the outcomes for reasons unrelated to the outcome values. For example, a study on blood
pressure treatments may collect data on cardiovascular outcomes, but not on kidney out-
comes. A subsequent meta-analysis of the effect of blood pressure treatments on kidney
outcomes can omit those studies without issues of bias. However, if the outcomes were
collected but were not reported because they were negative, then bias will be introduced if
the analysis omits those studies without adjusting for the missing outcomes. Adjustment
for selective outcome reporting is difficult because the missing data mechanism is usually
not known and thus hard to incorporate into analysis. Comparing study protocols with
published reports can often detect potential reporting bias. Sometimes, authors have good
reasons for reporting only some outcomes as when the full report is too long to publish or
outcomes relate to different concepts that might be reported in different publications. In
other cases, it is useful to contact authors to find out why outcomes were unreported and
whether these results were negative. Chapter 13 discusses statistical and non-statistical
methods for handling publication and reporting bias.

1.8 Reporting
Generation of a report summarizing the findings of the meta-analysis is the final step in
the systematic review process. The most important part of the report contains its conclusions
regarding the evidence found to answer the review's research questions. In addition
to stating whether the evidence does or does not favor the research hypotheses, the report
needs to assess the strength of that evidence so that proper decisions can be drawn from
the review (Agency for Healthcare Research and Quality). Strength of evidence involves
both the size of the effect found and the confidence in the stability and validity of that
effect. A meta-analysis that finds a large effect based on studies of low quality is weaker
than one that finds a small effect based on studies of high quality. Likewise, an effect that
disappears with small changes in model assumptions or that is sensitive to leaving out one
study is not very reliable. Thus, a report must summarize not only the analyses leading to a
summary effect estimate, but also the analyses assessing study quality and their potential
for bias. Randomized studies have more internal validity than observational studies; stud-
ies that ignore dropouts are more biased than studies that perform proper missing data
adjustments; analyses that only include studies with individual participant data available
and that ignore the results from known studies with only summary data available may be
both inefficient and biased. It is important to bear in mind that study conduct and study
reporting are different things. A poor report does not necessarily imply that the study
was poorly done and therefore has biased results. Conversely, a poorly done study may be
reported well. In the end, the report must provide sufficient information for its readers to
be confident enough to know how to proceed to use it.
The Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA)
statement (Moher et al., 2009) is the primary guideline to follow for reporting a meta-anal-
ysis of randomized studies. It lists a set of 27 items that should be included in each report.
These include wording for the title and elements in the introduction, methods, results, and
discussion. Slightly different items are needed for observational studies and these can be
found in the Meta-analysis of Observational Studies in Epidemiology (MOOSE) statement
(Stroup et al., 2000). Modifed guidelines have been written for particular types of meta-
analyses such as those for individual participant data (Stewart et al., 2015), networks of
treatments (Hutton et al., 2015), diagnostic tests (Bossuyt et al., 2003), and N-of-1 studies
(Vohra et al., 2015; Shamseer et al., 2015). These are now required by most journals that
publish meta-analyses. Later chapters discuss these and other types of meta-analyses and
explain the need for these additional items.

1.9 Using a Systematic Review


Once the systematic review is complete, a variety of different stakeholders may use it for
a variety of different purposes. Many reviews are commissioned by organizations seek-
ing to set evidence-based policy or guidelines. For example, many clinical societies pub-
lish guidelines for their members to follow in treating patients. Such guidelines ensure
that best practices are used but may also protect members from malpractice claims when
proper treatment fails to produce a desired outcome. Government and private health
insurance coverage decisions make use of systematic reviews to estimate the safety and
effectiveness of new treatments. Government agencies regulating the approval or funding
of new drug treatments or medical devices such as the Food and Drug Administration in
the United States, the European Medicines Agency, or the National Institute for Clinical
Excellence (NICE) in the United Kingdom now often require applicants to present the
results of a meta-analysis summarizing all studies related to the product under review in
order to provide context to the application. Many educational policy decisions are motivated
by reviews of the evidence deposited in the Institute of Education Sciences' What Works
Clearinghouse (https://ies.ed.gov/ncee/wwc). Many national environmental policies
rely on systematic reviews of chemical exposures, ecological interventions, and natural
resources. Often, the impact and cost-effectiveness of decisions are modeled using inputs
derived from systematic reviews. Other stakeholders include businesses making decisions
about marketing strategies and consumer advocacy groups pursuing legal action.
In addition to the What Works Clearinghouse, two other prominent repositories of systematic
reviews are maintained to support these efforts. The Cochrane Collaboration maintains
a large and growing database of nearly 8000 reviews of all types of healthcare interventions
and diagnostic modalities in the Cochrane Database of Systematic Reviews
(www.cochranelibrary.com). The Campbell Collaboration's repository is smaller but similar
in structure, covering reviews in the social sciences (www.campbellcollaboration.org/library.html).
These repositories have become quite influential, used by many researchers and
the public and quoted frequently in the popular press. They speak to the growing influence
of systematic reviews, which are highly referenced in the scientific literature.
In the United States and Canada, the Agency for Healthcare Research and Quality
(AHRQ) has supported 10–15 Evidence-Based Practice Centers since 1997. These centers
carry out large reviews of questions nominated by stakeholders and refned through a
consensus process. Stakeholders for AHRQ reports include clinical societies, payers, the
United States Congress, and consumers. Like NICE and the Cochrane Collaboration,
AHRQ has published guidance documents for its review teams that emphasize not only
review methods, but review processes and necessary components of final reports in order
to ensure uniformity and adherence to standards (AHRQ, 2008).
Systematic reviews also serve an important role in planning future research. A review
often identifies areas where further research is needed either because the overall evidence
is inconclusive or because uncertainty remains about outcomes in certain circumstances
such as in specifc subpopulations, settings, or under treatment variations. Ideally, deci-
sion models should incorporate systematic review evidence and be able to identify which
new studies would best inform the model. Systematic review results can also provide
important sources for inputs such as effect sizes and variances needed in sample size cal-
culations. These can be explicitly incorporated into calculations in a Bayesian framework
(Sutton et al., 2007; Schmid et al., 2004). Chapter 23 discusses these issues.
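As an illustration of the latter point (a standard two-arm calculation with hypothetical
inputs, not an example from the chapter), an effect size and standard deviation taken from a
meta-analysis can feed directly into a sample size formula for a future trial:

```python
import math
from statistics import NormalDist

def n_per_group(delta, sd, alpha=0.05, power=0.80):
    """Per-group sample size for a two-arm comparison of means."""
    z = NormalDist().inv_cdf
    n = 2 * (sd / delta)**2 * (z(1 - alpha / 2) + z(power))**2
    return math.ceil(n)

# Suppose a meta-analysis suggests a mean difference of 5 with standard deviation 12:
print(n_per_group(5, 12))  # → 91 participants per arm
```

A Bayesian version, as in the cited references, would additionally propagate the uncertainty
in these meta-analytic inputs rather than treating them as known.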

1.10 Summary
Systematic reviews have become a standard approach for summarizing the existing scientific
evidence in many different fields. They rely on a set of techniques for framing the
proper question, identifying the relevant studies, extracting the relevant information from
those studies, and synthesizing that information into a report that interprets findings for
its audience. This chapter has summarized the basic principles of these steps as back-
ground for the main focus of this book which is on the statistical analysis of data using
meta-analysis. Readers interested in further information about the non-statistical aspects
of systematic reviews are urged to consult the many excellent references on these essential
preliminaries. The guidance documents referenced in this chapter provide a good starting
point and contain many more pointers in their bibliographies. Kohl et al. (2018) provide a
detailed comparison of computerized systems to aid in the review process.

References
Agency for Healthcare Research and Quality, 2008-. Methods Guide for Effectiveness and Comparative
Effectiveness Reviews [Internet]. Rockville, MD: Agency for Healthcare Research and
Quality (US). https://www.ahrq.gov/research/findings/evidence-based-reports/technical/
methodology/index.html and https://www.ncbi.nlm.nih.gov/books/NBK47095.
Anderson LM, Petticrew M, Rehfuess E, Armstrong R, Ueffing E, Baker P, Francis D and Tugwell
P, 2011. Using logic models to capture complexity in systematic reviews. Research Synthesis
Methods 2(1): 33–42.
Antman EM, Lau J, Kupelnick B, Mosteller F and Chalmers TC, 1992. A comparison of results of meta-
analyses of randomized control trials and recommendations of clinical experts. Treatments for
myocardial infarction. JAMA 268(2): 240–248.
Balk EM, Adam GP, Langberg V, Halladay C, Chung M, Lin L, Robertson S, Yip A, Steele D, Smith
BT, Lau J, Lichtenstein AH and Trikalinos TA, 2016. Omega-3 Fatty Acids and Cardiovascular
Disease: An Updated Systematic Review. Evidence Report/Technology Assessment No. 223.
(Prepared by the Brown Evidence-based Practice Center under Contract No. 290-2012-00012-
I.) AHRQ Publication No. 16-E002-EF. Rockville, MD: Agency for Healthcare Research and
Quality, August 2016. www.effectivehealthcare.ahrq.gov/reports/final.cfm.
Balshem H, Stevens A, Ansari M, Norris S, Kansagara D, Shamliyan T, Chou R, Chung M,
Moher D and Dickersin K, 2013. Finding Grey Literature Evidence and Assessing for Outcome
and Analysis Reporting Biases When Comparing Medical Interventions: AHRQ and the Effective
Health Care Program. Methods Guide for Comparative Effectiveness Reviews. AHRQ Publication
No. 13(14)-EHC096-EF. Rockville, MD: Agency for Healthcare Research and Quality. www.
effectivehealthcare.ahrq.gov/reports/final.cfm.
Bannuru RR, Schmid CH, Kent D, Wong J and McAlindon T, 2015. Comparative effectiveness of
pharmacological interventions for knee osteoarthritis: A systematic review and network meta-
analysis. Annals of Internal Medicine 162(1): 46–54.
Bossuyt PM, Reitsma JB, Bruns DE, Gatsonis CA, Glasziou PP, Irwig LM, Lijmer JG, Moher D, Rennie
D and de Vet HC, 2003. Towards complete and accurate reporting of studies of diagnostic accu-
racy: The STARD initiative. Clinical Chemistry 49(1): 1–6.
Cochrane Handbook for Systematic Reviews of Diagnostic Test Accuracy. https://methods.cochrane.
org/sdt/handbook-dta-reviews.
Committee on Standards for Systematic Review of Comparative Effectiveness Research. 2011. In
Eden J, Levit L, Berg A and Morton S (Eds). Finding What Works in Health Care: Standards for
Systematic Reviews. Washington, DC: Institute of Medicine of the National Academies.
Deo A, Schmid CH, Earley A, Lau J and Uhlig K, 2011. Loss to analysis in randomized controlled tri-
als in chronic kidney disease. American Journal of Kidney Diseases 58(3): 349–355.
Dickersin K, 2005. Publication bias: recognizing the problem, understanding its origins and scope,
and preventing harm. In Rothstein HR, Sutton AJ and Borenstein M (Eds). Publication Bias in
Meta-Analysis: Prevention, Assessment and Adjustments. John Wiley & Sons.
Glanville JM, Lefebvre C, Miles JN and Camosso-Stefinovic J, 2006. How to identify randomized
controlled trials in Medline: Ten years on. Journal of the Medical Library Association 94(2):
130–136.
Glass GV, 1976. Primary, secondary and meta-analysis of research. Educational Researcher 5(10): 3–8.
Gurevitch J, Morrow LL, Wallace A and Walsh JS, 1992. A meta-analysis of competition in field exper-
iments. The American Naturalist 140(4): 539–572.
Higgins JPT, Thomas J, Chandler J, Cumpston M, Li T, Page MJ, Welch VA (editors), 2019. Cochrane
Handbook for Systematic Reviews of Interventions. 2nd Edition. Chichester, UK: John Wiley &
Sons. Also online version 6.0 (updated July 2019) available from www.training.cochrane.org
/handbook.
Hutton B, Salanti G, Caldwell DM, Chaimani A, Schmid CH, Cameron C, Ioannidis JPA, Straus S,
Shing LK, Thorlund K, Jansen J, Mulrow C, Catala-Lopez F, Gotzsche PC, Dickersin K, Altman
D and Moher D, 2015. The PRISMA extension statement for reporting of systematic reviews
incorporating network meta-analyses of healthcare interventions: Checklist and explanations.
Annals of Internal Medicine 162(11): 777–784.
Ip S, Hadar N, Keefe S, Parkin C, Iovin R, Balk EM and Lau J, 2012. A Web-based archive of system-
atic review data. Systematic Reviews 1: 15. https://srdr.ahrq.gov/.
Jap J, Saldanha I, Smith BT, Lau J, Schmid CH, Li T on behalf of the Data Abstraction Assistant
Investigators, 2019. Features and functioning of Data Abstraction Assistant, a software applica-
tion for data abstraction during systematic reviews. Research Synthesis Methods 10(1): 2–14.
Järvinen A, 1991. A meta-analytic study of the effects of female age on laying-date and clutch-size in
the great tit Parus major and the pied flycatcher Ficedula hypoleuca. Ibis 133(1): 62–67.
Jüni P, Holenstein F, Sterne J, Bartlett C and Egger M, 2002. Direction and impact of language bias in
meta-analyses of controlled trials: Empirical study. International Journal of Epidemiology 31(1):
115–123.
Kohl C, McIntosh EJ, Unger S, Haddaway NR, Kecke S, Schiemann J and Wilhelm R, 2018. Online
tools supporting the conduct and reporting of systematic reviews and systematic maps: A case
study on CADIMA and review of existing tools. Environmental Evidence 7(1): 8.
Liberati A, Altman DG, Tetzlaff J, Mulrow C, Gøtzsche PC, Ioannidis JP, Clarke M, Devereaux PJ,
Kleijnen J and Moher D, 2009. The PRISMA statement for reporting systematic reviews and meta-
analyses of studies that evaluate health care interventions: Explanation and elaboration. Annals
of Internal Medicine 151(4): W65–94.
Marshall IJ, Kuiper J and Wallace BC, 2016. RobotReviewer: Evaluation of a system for automatically
assessing bias in clinical trials. Journal of the American Medical Informatics Association JAMIA
23(1): 193–201.
Marshall IJ, Noel-Storr A, Kuiper J, Thomas J and Wallace BC, 2018. Machine learning for identify-
ing randomized controlled trials: An evaluation and practitioner’s guide. Research Synthesis
Methods 9(4): 602–614.
Moher D, Liberati A, Tetzlaff J, Altman DG and the PRISMA Group, 2009. Preferred reporting items
for systematic reviews and meta-analyses: The PRISMA statement. BMJ 339: b2535.
Moher D, Shamseer L, Clarke M, Ghersi D, Liberati A, Petticrew M, Shekelle P, Stewart LA and the
PRISMA-P Group, 2015. Preferred reporting items for systematic review and meta-analysis
protocols (PRISMA-P) 2015 statement. Systematic Reviews 4: 1.
Morrison A, Polisena J, Husereau D, Moulton K, Clark M, Fiander M, Mierzwinski-Urban M,
Clifford T, Hutton B and Rabb D, 2012. The effect of English-language restriction on systematic
review-based meta-analyses: A systematic review of empirical studies. International Journal of
Technology Assessment in Health Care 28(2): 138–144.
Pham B, Klassen TP, Lawson ML and Moher D, 2005. Language of publication restrictions in system-
atic reviews gave different results depending on whether the intervention was conventional
or complementary. Journal of Clinical Epidemiology 58(8): 769–776. Erratum Journal of Clinical
Epidemiology, 2006 59(2): 216.
Schmid CH, 2017. Outcome reporting bias: A pervasive problem in published meta-analyses.
American Journal of Kidney Diseases 69(2): 172–174.
Schmid CH, Cappelleri JC and Lau J, 2004. Bayesian methods to improve sample size approxima-
tions. In Johnson ML and Brand L (Eds). Methods in Enzymology Volume 383: Numerical Computer
Methods Part D. New York: Elsevier, 406–427.
Schulz KF, Chalmers I, Hayes RJ and Altman DG, 1995. Empirical evidence of bias. Dimensions of
methodological quality associated with estimates of treatment effects in controlled trials. JAMA
273(5): 408–412.
Shadish WR, Brasil ICC, Illingworth DA, White KD, Galindo R, Nagler ED and Rindskopf DM, 2009.
Using UnGraph to extract data from image files: Verification of reliability and validity. Behavior
Research Methods 41(1): 177–183.
Shadish WR and Lecy JD, 2015. The meta-analytic big bang. Research Synthesis Methods 6(3): 246–264.
Shamseer L, Sampson M, Bukutu C, Schmid CH, Nikles J, Tate R, Johnson BC, Zucker DR, Shadish
W, Kravitz R, Guyatt G, Altman DG, Moher D, Vohra S and the CENT Group, 2015. CONSORT
extension for N-of-1 trials (CENT): Explanation and elaboration. BMJ 350: h1793.
Sterne JA, Hernán MA, Reeves BC, Savović J, Berkman ND, Viswanathan M, Henry D, Altman DG,
Ansari MT, Boutron I, Carpenter JR, Chan AW, Churchill R, Deeks JJ, Hróbjartsson A, Kirkham
J, Jüni P, Loke YK, Pigott TD, Ramsay CR, Regidor D, Rothstein HR, Sandhu L, Santaguida PL,
Schünemann HJ, Shea B, Shrier I, Tugwell P, Turner L, Valentine JC, Waddington H, Waters E,
Wells GA, Whiting PF and Higgins JP, 2016. ROBINS-I: A tool for assessing risk of bias in non-
randomised studies of interventions. BMJ 355: i4919.
Stewart LA, Clarke M, Rovers M, Riley RD, Simmonds M, Stewart G, Tierney JF and the PRISMA-
IPD Development Group, 2015. Preferred reporting items for systematic review and meta-anal-
yses of individual participant data: The PRISMA-IPD Statement. JAMA 313(16): 1657–1665.
Stroup DF, Berlin JA, Morton SC, Olkin I, Williamson GD, Rennie D, Moher D, Becker BJ, Sipe TA
and Thacker SB, 2000. Meta-analysis of observational studies in epidemiology: A proposal for
reporting. JAMA 283(15): 2008–2012.
Sutton AJ, Cooper NJ, Jones DR, Lambert PC, Thompson JR and Abrams KR, 2007. Evidence-based
sample size calculations based upon updated meta-analysis. Statistics in Medicine 26(12):
2479–2500.
Vohra S, Shamseer L, Sampson M, Bukutu C, Schmid CH, Tate R, Nikles J, Zucker DR, Kravitz R,
Guyatt G, Altman DG, Moher D and the CENT Group, 2015. CONSORT statement: An exten-
sion for N-of-1 trials (CENT). BMJ 350: h1738.
Wallace BC, Dahabreh IJ, Schmid CH, Lau J and Trikalinos TA, 2013. Modernizing the systematic
review process to inform comparative effectiveness: Tools and methods. Journal of Comparative
Effectiveness Research 2(3): 273–282.
Wallace BC, Small K, Brodley CE, Lau J and Trikalinos TA, 2012. Deploying an interactive machine
learning system in an evidence-based practice center: Abstrackr. In Proceedings of the 2nd ACM
SIGHIT International Health Informatics Symposium, Miami, Florida, 28–30 January 2012. New
York Association for Computing Machinery, 819–824.
Wallace BC, Trikalinos TA, Lau J, Brodley C and Schmid CH, 2010. Semi-automated screening of
biomedical citations for systematic reviews. BMC Bioinformatics 11: 55.
Wang C, De Pablo P, Chen X, Schmid C and McAlindon T, 2008. Acupuncture for pain relief in
patients with rheumatoid arthritis: A systematic review. Arthritis and Rheumatism (Arthritis Care
and Research) 59(9): 1249–1256.
Whiting P, Savović J, Higgins JP, Caldwell DM, Reeves BC, Shea B, Davies P, Kleijnen J, Churchill
R and ROBIS group, 2016. ROBIS: A new tool to assess risk of bias in systematic reviews was
developed. Journal of Clinical Epidemiology 69: 225–234.
Yusuf S, Peto R, Lewis J, Collins R and Sleight P, 1985. Beta blockade during and after myocar-
dial infarction: An overview of the randomized trials. Progress in Cardiovascular Diseases 27(5):
335–371.
2
General Themes in Meta-Analysis

Christopher H. Schmid, Theo Stijnen, and Ian R. White

CONTENTS
2.1 Introduction........................................................................................................................... 19
2.2 Data Structures...................................................................................................................... 20
2.2.1 Study-Level Data....................................................................................................... 20
2.2.2 Individual Participant Data..................................................................................... 21
2.2.3 Randomized versus Observational Data.............................................................. 21
2.2.4 Multivariate Data......................................................................................................22
2.3 Models....................................................................................................................................22
2.3.1 Homogeneous or Heterogeneous Effects.............................................................. 23
2.3.2 One- and Two-Stage Models................................................................................... 24
2.3.3 Fixed or Random Study Effects.............................................................................. 24
2.3.4 Model Checking........................................................................................................ 25
2.3.5 Bayesian or Frequentist............................................................................................ 25
2.4 Conclusion............................................................................................................................. 25
References........................................................................................................................................ 26

2.1 Introduction
Chapter 1 reviewed the parts of a systematic review and noted that they often include a
quantitative synthesis or meta-analysis, when sufficient data are available. Meta-analysis
uses statistical methods to combine data across studies in order to estimate parameters of
interest. In general, meta-analysis is used to address four types of questions. The first type
is descriptive, summarizing some characteristic of a distribution such as the prevalence of
a disease, the mean of a population characteristic, or the sensitivity of a diagnostic test. The
second type of question is comparative: how does one treatment compare with another in
terms of reducing the risk of a stroke; does a new method of teaching improve student
test scores compared with the current method; or does exposure to warmer water change
the number of fish caught? Some of these questions involve specific interventions, other
relate to prevalent exposures and others are related to diagnostic tests. We will use the
general term “treatment” to apply to all of them unless a specific need arises to differenti-
ate them. This comparative type is the most common. A third type of question involves
non-comparative associations such as correlations between outcomes or the structure of
an underlying pathway (Chapter 16) and associations between variables in a regression
model (Chapter 18). A fourth type of question involves developing a prognostic or predic-
tive model for an outcome. Frequently different studies report different models or parts
of models that involve predictive factors. Chapter 22 explores methods for combining

DOI: 10.1201/9781315119403-2 19
20 Handbook of Meta-Analysis

such data. Typically, meta-analysis estimates the size and uncertainty of the parameters of
interest expressed by standard metrics that depend on the scale of the outcome. Chapter 3
discusses these metrics in detail for different types of outcomes.
Using meta-analysis to combine information across studies offers a variety of benefits.
It provides an estimate of the average size of a characteristic of a population or of the
effectiveness or harm of a treatment (exposure) as well as a sense of the variation of these
quantities across different study settings. To the extent that the variation is not large or
can be understood, meta-analysis can increase the generalizability of the research findings
and determine their effects in subgroups. By combining small studies, meta-analysis can
also increase the precision with which key parameters are estimated and help to explain
inconsistent results that arise when underpowered studies report non-statistically
significant conclusions because of insufficient sample sizes. Meta-analysis can also focus
attention on discrepancies in study findings that might argue against combining their results or
might argue for more subtle interpretations of parameters whose true values might vary
with characteristics of the populations studied or with the manner in which interventions
are undertaken. In certain cases, exploring the causes of such heterogeneity can lead to
important conclusions in their own right or might point to the need for further studies to
fill in research gaps. Integration of meta-analysis with study of risk of bias (Chapter 12)
can also pinpoint weaknesses in the data and evaluate the sensitivity of the conclusions to
poor study processes.
This chapter introduces the general themes that motivate the methods used for carrying
out meta-analysis and provides a map with which the reader can navigate the rest of the
book. Meta-analysis comes in a variety of flavors motivated by the types of data available,
the research questions to be answered and the inferences desired. Section 2.2 discusses var-
ious types of data that may be available and what they imply about the types of questions
that may be asked. Section 2.3 explores the types of models that may be fit with these data.
Before embarking on this overview, however, it is important to bear in mind that meta-
analysis is not appropriate for every systematic review. Reviews with few studies, with
studies that use dissimilar outcomes or outcome scales, that study diverse interventions or
interventions that evolved over the period covered by the review, or that combine different
study designs (e.g., observational vs. experimental) may not fit into a single statistical
framework and may be better handled qualitatively or split into separate meta-analyses.

2.2 Data Structures


2.2.1 Study-Level Data
The data extracted from published reports of each study in a systematic review typically
include the PICOS elements (see Chapter 1) comprising elements of the study design,
information on the study participants such as demographics and personal history, their
exposures, and their outcomes. Usually, this information will come from study reports
and protocols, either published or unpublished. If a study is comparative, the study-level
data are often reported by treatment group (or arm) and not at the level of the individual
participant. Data items include the number of individuals who received each treatment
and the number of outcomes these individuals had (for categorical outcomes) or the mean
General Themes in Meta-Analysis 21

and standard deviation (for continuous outcomes). Additional characteristics of the par-
ticipants in each group (e.g., the number who were female, their average age, what propor-
tion went to school, or who had a certain type of disease) or of the treatments or exposures
which they had are also recorded for descriptive purposes and perhaps to evaluate rea-
sons why treatment effects might differ between studies.
If the meta-analysis is focused on a comparison of two treatment groups, the key statistic
will often be the difference in their means or in a function of their means (e.g., the log odds
ratio is a difference between the logits of two proportions). Representing the group mean
for group j of study i as yij, the study treatment effect is yi = yi2 − yi1. Studies often only report
this mean contrast and its resulting standard error si. Sometimes the standard errors will
need to be back-calculated from a reported confidence interval, as discussed in Chapter 3.
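The back-calculation just mentioned can be sketched in a few lines. The helper below is illustrative (the function name and the use of the 1.96 normal quantile for a 95% interval are assumptions, not taken from the text); Chapter 3 gives the full details.

```python
import math

def se_from_ci(lower, upper, z=1.959964, log_scale=False):
    """Back-calculate a standard error from a reported 95% confidence interval.

    Ratio measures (odds ratios, risk ratios) have symmetric intervals only
    on the log scale, so pass log_scale=True; the result is then the standard
    error of the log effect.
    """
    if log_scale:
        lower, upper = math.log(lower), math.log(upper)
    # A 95% CI is estimate +/- z * se, so its width is 2 * z * se.
    return (upper - lower) / (2 * z)

# A study reporting an odds ratio of 0.75 with 95% CI (0.60, 0.94):
se_log_or = se_from_ci(0.60, 0.94, log_scale=True)  # se of the log odds ratio
```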
In rare cases, studies may also report the treatment effects within subgroups such as
male and female and one can investigate treatment by covariate interactions. More com-
monly, results may be reported by dose levels (see Chapter 18). Usually, though, effects
within subgroups are unavailable and heterogeneity of treatment effects across studies can
only be studied through the correlation of the study effects with a summary measure of
the subgroup in each study such as the proportion of females or the average age of partici-
pants. Within-study heterogeneity due to factors such as participant age or education level
that vary within-study by participant cannot be investigated at all. Chapter 7 discusses
meta-regression methods for assessing heterogeneity that test for interactions of treatment
with study-level risk factors such as design characteristics that apply to all individuals in a
given study or to summary measures like average age of a study’s participants.

2.2.2 Individual Participant Data


Sometimes, however, it may be possible to obtain individual participant data (IPD) on each
individual in each study (see Chapter 8). IPD substantially increase the capacity to model
within- and between-study variation. With IPD, one can investigate within-study hetero-
geneity of the associations between the outcome and treatments that manifest between
subgroups. For example, IPD can compare treatment efficacy between younger and older
individuals, rather than just between populations with different average ages. IPD per-
mits inferences at the individual participant level; study-level data only allow inference to
study populations. IPD are necessary for meta-analyses of prognostic studies (see Chapter
22) where one needs to combine predicted outcomes from models that vary by study. IPD
can also facilitate modeling time-to-event outcomes (Chapter 15) because the IPD provide
censoring and follow-up information for individual participants. This can be especially
helpful when ongoing longitudinal studies have additional follow-up information that
earlier reports do not capture. IPD have other advantages too, including the ability to
discover outcomes and do analyses that may have been left out of the study reports, to
identify errors and fill in missing values, and to harmonize variable definitions to reduce
potential sources of heterogeneity (Stewart and Clarke, 1995). These benefits come with a
cost, however, because the study databases usually require considerable work to harmo-
nize with each other and investigators may be unwilling to share data (Schmid et al., 2003).

2.2.3 Randomized versus Observational Data


Interpreting meta-analyses that combine studies in which participants have been ran-
domized is more straightforward than when the data are non-randomized. Experimental
designs that use randomization to assign participants give unconfounded estimates of
treatment comparisons, with distributions of all potential measured and unmeasured con-
founders balanced by treatment group on average. If the studies are free of bias, each of
their estimates of treatment effect yi provides an independent contribution to the meta-
analytic average, θ̂. Many scientific research studies compare exposures that cannot be
randomized either practically or ethically, though. When studies are not randomized,
meta-analysis is potentially complicated by confounding. To remove this type of bias,
many studies will report estimates adjusted for potential confounders using either multi-
variable regression or special methods such as propensity scores or matching. Because the
meta-analyst must rely on these reported study analyses unless IPD are available, meta-
analysis of non-randomized studies can be biased if the studies do not properly adjust for
confounders or if they adjust in different ways. Whether appropriate adjustment has taken
place can be difficult to determine from a published report alone, since it is not always
clear which factors were considered for adjustment and whether those omitted were
unavailable or rejected as not significant. Chapter 21 discusses this issue.

2.2.4 Multivariate Data


Systematic reviews typically evaluate several outcomes related to different review ques-
tions. These might involve a combination of benefits and harms or efficacy and resource use.
Traditionally, each outcome is addressed in a separate meta-analysis, but methods are avail-
able to assess outcomes simultaneously in multivariate models (Jackson et al., 2011). These
allow incorporation of correlations among outcomes and can lead to more precise estimates
of effects (Riley et al., 2008; Trikalinos et al., 2014). Chapter 9 lays out general models for mul-
tivariate data. Certain types of meta-analysis use particular multivariate data structures. In
meta-analyses of tests to diagnose a particular disease or condition, both the sensitivity and
the specificity of the test are of interest. Since the use of a threshold to define the test result as
positive or negative introduces correlation between the sensitivity and specificity, it is usual
to model these simultaneously with a bivariate model (see Chapter 19). When combining
correlations between the results of educational tests, it is common to meta-analyze all the
correlations together, leading to a multivariate meta-analysis (Chapter 16).
A different type of multivariate data structure manifests when meta-analyzing a network
of more than two treatments. In such networks, interest lies in the comparison of all pairs of
treatments and in their rank ordering by outcomes using network meta-analysis. Complexity
arises because each study compares only a subset of the entire set of treatments and some
pairs of treatments are never compared in any study. The missing treatment effects can
be recovered indirectly by making assumptions that the comparative difference between
any two treatments is the difference of their respective comparisons to a third treatment
with which they have both been compared. Further assumptions about the studies being
exchangeable and about the correlations among the treatment effects are required to under-
take the modeling process. Chapter 10 provides an overview of network meta-analysis.
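The consistency assumption just described can be expressed numerically: the indirect estimate of a missing comparison is the difference of the two comparisons against a shared reference treatment, and (for independent sources of evidence) the variances add. The helper below is an illustrative sketch, not a full network meta-analysis; the function name and example numbers are assumptions.

```python
import math

def indirect_comparison(d_ab, se_ab, d_ac, se_ac):
    """Indirect estimate of the B-vs-C effect from A-vs-B and A-vs-C evidence.

    Under the consistency assumption, d_BC = d_AC - d_AB, and for
    independent evidence sources the variances of the two estimates add.
    """
    d_bc = d_ac - d_ab
    se_bc = math.sqrt(se_ab**2 + se_ac**2)
    return d_bc, se_bc

# If A vs B gives a log odds ratio of -0.3 (se 0.12) and A vs C gives
# -0.5 (se 0.15), the indirect B vs C estimate is -0.2, with a standard
# error larger than either direct comparison's:
d_bc, se_bc = indirect_comparison(-0.3, 0.12, -0.5, 0.15)
```

The inflated standard error shows why indirect evidence alone is weaker than a head-to-head trial of B versus C.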

2.3 Models
The observed group-level and study-level summaries, yij and yi , respectively, depend on
parameters of the data-generating process. We can construct a model of this measure-
ment process by assuming that each observed effect is a realization of a stochastic process
with group-level means θij or contrast-level means θi and associated variances σij² and σi².
Typically, θi = θi2 − θi1. Although the data may be collected at the group-level, the mean
effect θ and perhaps the individual study effects θi are the parameters of interest in a meta-
analysis focused on comparing two groups; θ represents the best single number summary
of the treatment effects in the group of studies. It is informed by all of the studies together.
But sometimes it may be helpful to determine whether the estimate of the treatment effect
θi in a single study, particularly a small one, could be improved by information from other
studies similar to it. We will find that certain models can do this.

2.3.1 Homogeneous or Heterogeneous Effects


In order to draw inferences about the study effects θi , we must consider whether we are
only interested in the particular studies in the meta-analysis or whether we wish to extrap-
olate to a larger population of similar studies. There are three basic structural models for
the treatment effects. The most common is the random-effects model which treats the θi as
following a distribution, usually a N(θ, τ²) distribution, but other random-effects distribu-
tions are sometimes used too, such as beta and gamma distributions. Because it is more
interpretable on the scale of the data, it is often more useful to report the standard devia-
tion τ rather than the variance τ2. τ is often considered a nuisance parameter for estimating
θ, although τ is important when estimating the uncertainty associated with predicting the
effects in new studies and assessing the amount of heterogeneity in the data. Large values
of τ indicate that study effects vary considerably between studies and suggest searching for
factors associated with this heterogeneity. A common hypothesis of interest is whether the
studies exhibit any heterogeneity. The absence of heterogeneity implies that τ = 0 or, equiva-
lently that θi = θ for all i. In other words, the studies are all estimating a common treatment
effect θ. We call such a model a common-effect (CE) model. The third model, which we shall
call a separate-effects model, treats the θi as fixed effects (Rice et al., 2017). In statistics, a fixed
effect traditionally refers to an unknown parameter that is not randomly drawn from a
larger population. The separate-effects model is appropriate if we do not wish to assume
any relationship among the different treatment effects, but rather consider them separately.
In this case, it may still be of interest to estimate an average of the separate effects. Laird and
Mosteller (1990) suggested using an unweighted mean since there would be no reason to
treat any of the studies differently from each other if one believed them equally important.
This raises an important comment about terminology. Many authors use the term fixed-
effect model instead of common-effect model to describe the model in which each study is
estimating the same effect. We avoid this terminology because this is not a fixed effect
in the common statistical parlance. Furthermore, some authors such as Rice et al. (2017)
use the term fixed-effects model to refer to what we have called the separate-effects model.
Although the use of the plural “effects” technically avoids the improper statistical
terminology, it is confusing both because of its similarity in wording to “fixed effect”
and because of the colloquial use of that term for the common-effect model.
We prefer distinct terms to distinguish distinct models.
The three models also correspond to different points on a continuum with respect to how
much knowledge about one study should be influenced by information from other studies.
The separate-effects model corresponds to a no pooling model in which the estimate of the
treatment effect in a given study is determined only by the data in that study. The common-
effect model describes the opposite end of the spectrum, a complete pooling model in which
the information from all studies is combined into a single estimate believed to describe
each study effect. The random-effects model serves as a compromise between the two with
the treatment effect θi in a given study estimated by a weighted average of the estimated
effect yi in that study and the overall mean θ from all the studies. This is commonly termed
a shrinkage estimate and reflects the belief that each study is informed by the other studies.
The weights correspond to the relative within- and between-study variability. While we say
little more about the separate-effects model, common-effect and random-effects models are
discussed throughout the book and in particular in Chapters 4 and 5.
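The shrinkage idea can be made concrete with a small numerical sketch. The code below implements the widely used DerSimonian–Laird moment estimator of τ² within a two-stage analysis of the kind detailed in Chapter 4; the function name and example data are illustrative assumptions, not material from the chapter.

```python
def dl_random_effects(y, se):
    """Two-stage DerSimonian-Laird random-effects meta-analysis.

    y  : per-study effect estimates (e.g., log odds ratios)
    se : their standard errors, assumed known (the usual two-stage assumption)
    Returns (theta_hat, tau2_hat, shrunken), where `shrunken` holds the
    study-level shrinkage estimates described above.
    """
    w = [1 / s**2 for s in se]                                 # inverse-variance weights
    theta_ce = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)   # common-effect pool
    q = sum(wi * (yi - theta_ce)**2 for wi, yi in zip(w, y))   # Cochran's Q
    k = len(y)
    tau2 = max(0.0, (q - (k - 1)) / (sum(w) - sum(wi**2 for wi in w) / sum(w)))
    w_star = [1 / (s**2 + tau2) for s in se]                   # random-effects weights
    theta_re = sum(wi * yi for wi, yi in zip(w_star, y)) / sum(w_star)
    # Shrinkage: each study effect is pulled toward the overall mean in
    # proportion to its within-study variance relative to tau2.
    shrunken = [(tau2 * yi + s**2 * theta_re) / (tau2 + s**2)
                for yi, s in zip(y, se)]
    return theta_re, tau2, shrunken
```

Note how each shrunken estimate is a weighted average of the study's own estimate and the overall mean, with the weight on the study's own data growing as τ² grows relative to the within-study variance: small, imprecise studies are pulled furthest toward the mean.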

2.3.2 One- and Two-Stage Models


Traditionally, meta-analysts have assumed the true σi to be known and equal to the sam-
ple standard error si, although this is a strong assumption (Jackson and White, 2018). In
this case, we may speak of a two-stage approach to modeling in which the first stage
encompasses the separate analysis of each study’s results to obtain yi and si and the second
stage focuses on using these to estimate θ (and perhaps also the θi). Chapter 4 discusses
the two-stage approach in detail. The two-stage approach focuses on the study treatment
effects and relies heavily on the asymptotic normality of the contrast estimates and on
the assumption of known within-study variances. This approach works particularly well
with larger randomized studies where the balancing of covariates across arms avoids the
need to use estimated effects adjusted for confounding and when the average effect is of
primary importance. However, many problems involve studies that are non-randomized,
small, involve rare events, or for which the normality and known variance assumptions
fail to hold. In such cases, one may need to model the summary data from study arms
directly using their exact likelihoods. This leads to the one-stage models of Chapter 5.
One-stage models are commonly applied to binary outcomes. In such cases, the group-
level summary statistics are the counts in 2 × 2 tables. Using separate binomial distribu-
tions for each study group, the fundamental parameters are the two event proportions in
each group. In this case, the binomial mean and variance are functions of the same single
parameter and one can construct a multilevel model to simultaneously model the group-
level outcomes and their study-level parameters. The four counts in each study are the
sufficient statistics and can be used to construct the exact likelihood.
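To illustrate how such an exact likelihood is assembled, the sketch below writes one study's contribution in terms of a control-arm log odds γi and a log odds ratio θi, the parameterization taken up in Section 2.3.3. The function itself is a hypothetical helper, not code from the chapter; in a full one-stage analysis these contributions would be summed over studies with the γi and θi modeled hierarchically.

```python
import math

def study_loglik(events, totals, gamma_i, theta_i):
    """Exact binomial log-likelihood of one two-arm study's 2x2 table.

    events, totals : (control, treated) event counts and sample sizes
    gamma_i        : log odds of an event in the control arm
    theta_i        : log odds ratio, so the treated-arm log odds is
                     gamma_i + theta_i
    """
    ll = 0.0
    for r, n, logit in zip(events, totals, (gamma_i, gamma_i + theta_i)):
        p = 1.0 / (1.0 + math.exp(-logit))  # inverse-logit
        # Binomial log-pmf: log C(n, r) + r*log(p) + (n-r)*log(1-p)
        ll += (math.lgamma(n + 1) - math.lgamma(r + 1) - math.lgamma(n - r + 1)
               + r * math.log(p) + (n - r) * math.log(1.0 - p))
    return ll
```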
One-stage models are also useful when fitting survival models (Chapter 15) and if one
wants to model variances instead of assuming them known when the effects are continu-
ous. Chapter 5 discusses four different models for meta-analysis of continuous outcomes
that treat the group-level variances as separate from each other or constrain them to be
equal across arms, across studies, or across both arms and studies.

2.3.3 Fixed or Random Study Effects


It is often helpful to reparameterize the group-level parameters of the one-stage model in
terms of the effect in the reference or control group, γi , and the differences between groups,
θi. In problems focusing only on comparative treatment effects, the γi can be considered
nuisance parameters. If we want to estimate the relationship between treatment effects
and the underlying control risk, the γi are parameters of interest themselves (see Chapter
14). Two different models have been used for these two scenarios.
The first model treats each γi as a random effect drawn from a distribution of poten-
tial control risks with unknown mean and variance. The γi and the θi are then estimated
together. Such a formulation is necessary if one wants to form a regression model relating
the treatment effects to the control risks or one wants to make inference about results in
each treatment group in a new study. Because the γi and the θi are being simultaneously
estimated, their effects can leak into each other. This is a basic consequence of parameter
shrinkage from multilevel models. As an example, a study with a high observed rate of
events in the control group will tend to have an estimated true study control risk rate
shrunk to the average control risk. This lowering of the control risk in the study will tend
to make the estimated study treatment effect (difference between treated and control) big-
ger and may therefore change the overall treatment effect estimate.
To avoid leakage, one could use a second model that treats each γi as a fixed effect that applies
only to that study and has no relationship to any other study. Assuming that one has correctly
estimated the γi , one might hope that the estimate of the θi and of θ are independent of the γi
and so one has solved the leakage problem. However, this model introduces a new problem
because the number of γi parameters is directly associated with the number of studies in the
meta-analysis, so that the number of parameters is now increasing at the same rate as the
number of data points. This has some poor theoretical properties as discussed in Chapter 5.
Whether to treat control risk parameters as fixed or random remains controversial in the lit-
erature (Jackson et al., 2018). One side argues, particularly when combining randomized stud-
ies, that the parameters must be treated as fixed effects so as not to change the treatment
effect estimates, which are unbiased in each study as a result of the randomization.
The other side argues that the asymptotic properties of the maximum likelihood estimates of
treatment effect (and perhaps also of Bayesian estimates under a non-informative prior) are
compromised when the number of parameters is increasing at the same rate as the number of
studies. If one wants to make generalized inferences to larger populations, however, a com-
plete model of all the parameters using a random-effect formulation is needed (see Chapter 5).

2.3.4 Model Checking


It is important to verify model assumptions, no matter what model is being used. Both
formal model checks, as described in Chapter 11, and sensitivity analyses in which the
model is refit under different scenarios (such as leaving out one study in turn) help
ensure that results are not driven by outliers and
that assumptions are met. As many of the most common meta-analysis models are types of
linear and generalized linear models, these diagnostics will be familiar. Assumptions about
missing data are particularly important. These can involve participants excluded from stud-
ies, unreported outcomes, and unpublished studies. Chapters 12 and 13 discuss these issues.

2.3.5 Bayesian or Frequentist


The book discusses both frequentist and Bayesian approaches to estimation. Chapters 4 and
5 outline the basic frequentist approaches focusing on normal-distribution based models
and likelihood-based models, respectively. Chapter 6 introduces the Bayesian framework.
Later chapters choose one or the other or sometimes both approaches, depending on the
methods most common in each particular area.

2.4 Conclusion
This chapter has summarized the basic data structures and models encountered in meta-
analysis. As indicated, many specific problems implement specific variations of these
structures and models. Before examining the models in more detail, we discuss
in Chapter 3 how to choose an appropriate effect measure and how to extract the data nec-
essary for analyzing these effect measures.

References
Jackson D and White IR, 2018. When should meta-analysis avoid making hidden normality assump-
tions? Biometrical Journal 60(6): 1040–1058.
Jackson D, Law M, Stijnen T, Viechtbauer W and White IR, 2018. A comparison of 7 random-effects
models for meta-analyses that estimate the summary odds ratio. Statistics in Medicine 37(7):
1059–1085.
Jackson D, Riley R and White IR, 2011. Multivariate meta-analysis: Potential and promise (with dis-
cussion). Statistics in Medicine 30(20): 2481–2510.
Laird NM and Mosteller F, 1990. Some statistical methods for combining experimental results.
International Journal of Technology Assessment in Health Care 6(1): 5–30.
Rice K, Higgins JPT and Lumley T, 2017. A re-evaluation of fixed effect(s) meta-analysis. Journal of the
Royal Statistical Society: Series A 181(1): 205–227.
Riley RD, Thompson JR and Abrams KR, 2008. An alternative model for bivariate random-effects
meta-analysis when the within-study correlations are unknown. Biostatistics 9(1): 172–186.
Schmid CH, Landa M, Jafar TH, Giatras I, Karim T, Reddy M, Stark PC, Levey AS and Angiotensin-
Converting Enzyme Inhibition in Progressive Renal Disease (AIPRD) Study Group, 2003.
Constructing a database of individual clinical trials for longitudinal analysis. Controlled Clinical
Trials 24(3): 324–340.
Stewart LA and Clarke MJ, on behalf of the Cochrane Working Group on Meta-Analysis Using Individual
Patient Data, 1995. Practical methodology of meta-analyses (overviews) using updated indi-
vidual patient data. Statistics in Medicine 14(19): 2057–2079.
Trikalinos TA, Hoaglin DC and Schmid CH, 2014. An empirical comparison of univariate and multi-
variate meta-analyses for categorical outcomes. Statistics in Medicine 33(9): 1441–1459.
