Sample size for qualitative research


Article ID: 20001202
Published: December 2000
Author: Peter DePaulo

Article Abstract
How large should the sample size be in a qualitative study? This article
discusses the importance of sample size in qualitative research.

The risk of missing something important


Editor's note: Peter DePaulo is an independent marketing research consultant and focus
group moderator doing business as DePaulo Research Consulting, Montgomeryville, Pa.

In a qualitative research project, how large should the sample be? How
many focus group respondents, individual depth interviews (IDIs), or
ethnographic observations are needed?
We do have some informal rules of thumb. For example, Maria Krieger (in
her white paper "The Single Group Caveat," Brain Tree Research &
Consulting, 1991) advises that separate focus groups are needed for major
segments such as men, women, and age groups, and that two or more
groups are needed per segment because any one group may be
idiosyncratic. Another guideline is to continue doing groups or IDIs until we
seem to have reached a saturation point and are no longer hearing anything
new.
Such rules are intuitive and reasonable, but they are not solidly grounded
and do not really tell us what an optimal qualitative sample size may be. The
approach proposed here gives specific answers based on a firm foundation.
First, the importance of sample size in qualitative research must be
understood.

Size does matter, even for a qualitative sample


One might suppose that N (the number in the sample) simply is not very
important in a qualitative project. After all, the effect of increasing N, as we
learned in statistics class, is to reduce the sampling error (e.g., the +/- 3
percent variation in opinion polls with N = 1,000) in a quantitative estimate.
Qualitative research normally is inappropriate for estimating quantities. So,
we lack the old familiar reason for increasing sample size.
Nevertheless, in qualitative work, we do try to discover something. We may
be seeking to uncover: the reasons why consumers may or may not be
satisfied with a product; the product attributes that may be important to
users; possible consumer perceptions of celebrity spokespersons; the
various problems that consumers may experience with our brand; or other
kinds of insights. (For lack of a better term, I will use the word "perception"
to refer to a reason, need, attribute, problem, or whatever the qualitative
project is intended to uncover.) It would be up to a subsequent quantitative
study to estimate, with statistical precision, how important or prevalent each
perception actually is.
The key point is this: Our qualitative sample must be big enough to assure
that we are likely to hear most or all of the perceptions that might be
important. Within a target market, different customers may have diverse
perceptions. Therefore, the smaller the sample size, the narrower the range
of perceptions we may hear. On the positive side, the larger the sample size,
the less likely it is that we would fail to discover a perception that we would
have wanted to know. In other words, our objective in designing qualitative
research is to reduce the chances of "discovery failure," as opposed to
reducing (quantitative) estimation error.

Discovery failure can be serious


What might go wrong if a qualitative project fails to uncover an actionable
perception (or attribute, opinion, need, experience, etc.)? Here are some
possibilities:

A source of dissatisfaction is not discovered - and not corrected. In highly
competitive industries, even a small incidence of dissatisfaction could dent
the bottom line.
In the qualitative testing of an advertisement, a copy point that offends a
small but vocal subgroup of the market is not discovered until a
public-relations fiasco erupts.
When qualitative procedures are used to pre-test a quantitative
questionnaire, an undiscovered ambiguity in the wording of a question
may mean that some of the subsequent quantitative respondents give
invalid responses. Thus, qualitative discovery failure eventually can result
in quantitative estimation error due to respondent miscomprehension.

Therefore, size does matter in a qualitative sample, though for a different
reason than in a quant sample. The following example shows how the risk of
discovery failure may be easy to overlook even when it is formidable.

Example of the risk being higher than expected


The managers of a medical clinic (name withheld) had heard favorable
anecdotal feedback about the clinic's quality, but wanted an independent
evaluation through research. The budget permitted only one focus group
with 10 clinic patients. All 10 respondents clearly were satisfied with the
clinic, and group discussion did not reverse these views.
Did we miss anything as a result of interviewing only 10? Suppose, for
example, that the clinic had a moody staff member who, unbeknownst to
management, was aggravating one in 10 clinic patients. Also, suppose that
management would have wanted to discover anything that affects the
satisfaction of at least 10 percent of customers. If there really was an unknown
satisfaction problem with a 10 percent incidence, then what was the chance
that our sample of 10 happened to miss it? That is, what is the probability
that no member of the subgroup, defined as "those who experienced the
staffer in a bad mood," happened to get into the sample?
At first thought, the answer might seem to be "not much chance of missing
the problem." The hypothetical incidence is one in 10, and we did indeed
interview 10 patients. Actually, the probability that our sample failed to
include a patient aggravated by the moody staffer turns out to be just over
one in three (0.349 to be exact). This probability is simple to calculate:
Consider that the chance of any one customer selected at random not being
a member of the 10 percent (aggravated) subgroup is 0.9 (i.e., a nine in 10
chance). Next, consider that the chance of failing to reach anyone from the
10 percent subgroup twice in a row (by selecting two customers at random)
is 0.9 × 0.9, or 0.9 to the second power, which equals 0.81. Now, it should
be clear that the chance of missing the subgroup 10 times in a row (i.e.,
when drawing a sample of 10) is 0.9 to the tenth power, which is 0.35.
Thus, there is a 35 percent chance that our sample of 10 would have
missed patients who experienced the staffer in a bad mood. Put another
way, just over one in three random samples of 10 will miss an experience or
characteristic with an incidence of 10 percent.
This seems counter-intuitively high, even to quant researchers to whom I
have shown this analysis. Perhaps people implicitly assume the fallacy that if
something has an overall frequency of one in N, then it is almost sure to
appear in N chances.
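As a quick check on the arithmetic, the chance that a simple random sample misses a subgroup entirely is just (1 - incidence) raised to the power of the sample size. The short Python sketch below illustrates that calculation; the function name is mine, for illustration only.

```python
def prob_of_missing(incidence: float, sample_size: int) -> float:
    """Chance that a simple random sample contains no one from a
    subgroup with the given population incidence: (1 - p) ** n."""
    return (1.0 - incidence) ** sample_size

# The clinic example: a 10 percent subgroup and a sample of 10.
print(round(prob_of_missing(0.10, 10), 3))  # 0.349, i.e., just over one in three
```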

Basing the decision on calculated probabilities


So, how can we figure the sample size needed to reduce the risk as much as
we want? I am proposing two ways. One would be based on calculated
probabilities like those in the table above, which was created by repeating
the power calculations described above for various incidences and sample
sizes. The client and researcher would peruse the table and select a sample
size that is affordable yet reduces the risk of discovery failure to a tolerable
level.
For example, if the research team would want to discover a perception with
an incidence as low as 10 percent of the population, and if the team wanted
to reduce the risk of missing that subgroup to less than 5 percent, then a
sample of N=30 would suffice, assuming random selection. (To be exact, the
risk shown in the table is .042, or 4.2 percent.) This is analogous to having
95 percent confidence in being able to discover a perception with a 10
percent incidence. Remember, however, that we are expressing the
confidence in uncovering a qualitative insight - as opposed to the usual
quantitative notion of confidence in estimating a proportion or mean plus
or minus the measurement error.
If the team wants to be more conservative and reduce the risk of missing
the one-in-10 subgroup to less than 1 percent (i.e., 99 percent confidence),
then a sample of nearly 50 would be needed. This would reduce the risk to
nearly 0.005 (see table).
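Because the table itself is not reproduced here, the following sketch regenerates the same kind of grid from the (1 - incidence) ** N formula; the particular incidences and sample sizes are my own illustrative choices, not necessarily those of the original table.

```python
# Risk of missing a subgroup entirely, (1 - p) ** n, for several
# hypothetical incidences (rows) and sample sizes (columns).
# The specific values below are illustrative choices.
incidences = [0.05, 0.10, 0.20, 0.30]
sample_sizes = [10, 20, 30, 50, 100]

print("incidence " + "".join(f"{n:>8}" for n in sample_sizes))
for p in incidences:
    print(f"{p:>9.0%} " + "".join(f"{(1 - p) ** n:>8.3f}" for n in sample_sizes))

# Spot checks against figures cited in the article:
#   10% incidence, N=30  -> 0.042
#   10% incidence, N=50  -> 0.005
#    5% incidence, N=100 -> 0.006
```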

What about non-randomness?


Of course, the table assumes random sampling, and qualitative samples
often are not randomly drawn. Typically, focus groups are recruited from
facility databases, which are not guaranteed to be strictly representative of
the local adult population, and factors such as refusals (also a problem in
quantitative surveys, by the way) further compromise the randomness of the
sample.
Unfortunately, nothing can be done about subgroups that are impossible to
reach, such as people who, for whatever reason, never cooperate when
recruiters call. Nevertheless, we can still sample those subgroups who are
less likely to be reached as long as the recruiters call has some chance of
being received favorably, for example, people who are home only half as
often as the average target customer but will still answer the call and accept
our invitation to participate. We can compensate for their reduced likelihood
of being contacted by thinking of their reachable incidence as half of their
actual incidence. Specifically, if we wanted to allocate enough budget to
reach a 10 percent subgroup even if it is twice as hard to reach, then we
would suppose that their reachable incidence is as low as 5 percent, and look
at the 5 percent row in the table. If, for instance, we wanted to be very
conservative, we would recruit 100 respondents, resulting in less than a 1
percent chance - .006, to be exact - of missing a 5 percent subgroup (or a
10 percent subgroup that behaves like a 5 percent subgroup in likelihood of
being reached).
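This reachability adjustment can be folded into the same calculation by discounting the subgroup's incidence before raising it to the power of N. A hedged sketch follows; the relative-reachability parameter is my own label for the halving described above.

```python
def prob_of_missing(incidence: float, sample_size: int,
                    relative_reachability: float = 1.0) -> float:
    """Chance of missing a subgroup after discounting its incidence
    by how reachable its members are relative to the average customer."""
    reachable_incidence = incidence * relative_reachability
    return (1.0 - reachable_incidence) ** sample_size

# A 10 percent subgroup that is only half as easy to reach behaves like
# a 5 percent subgroup; with 100 respondents the risk of missing it is:
print(round(prob_of_missing(0.10, 100, relative_reachability=0.5), 3))  # 0.006
```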

An approach based on actual qualitative findings


The other way of figuring an appropriate sample size would be to consider
the findings of a pair of actual qualitative studies reported by Abbie Griffin
and John Hauser in an article, "The Voice of the Customer" (Marketing
Science, Winter 1993). These researchers looked at the number of customer
needs uncovered by various numbers of focus groups and in-depth
interviews.
In one of the two studies, two-hour focus groups and one-hour in-depth
interviews (IDIs) were conducted with users of a complex piece of office
equipment. In the other study, IDIs were conducted with consumers of
coolers, knapsacks, and other portable means of storing food. Both studies
looked at the number of needs (attributes, broadly defined) uncovered for
each product category. Using mathematical extrapolations, the authors
hypothesized that 20-30 IDIs are needed to uncover 90-95 percent of all
customer needs for the product categories studied.
As with typical learning curves, there were diminishing returns in the sense
that fewer new (non-duplicate) needs were uncovered with each additional
IDI. It seemed that few additional needs would be uncovered after 30 IDIs.
This is consistent with the probability table (shown earlier), which shows
that perceptions of all but the smallest market segments are likely to be
found in samples of 30 or fewer.
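As a toy illustration of that diminishing-returns pattern (this is my own simplification, not Griffin and Hauser's extrapolation method), assume each need has some fixed incidence among respondents; the chance that it has surfaced at least once after n interviews is 1 - (1 - incidence) ** n, which flattens quickly.

```python
# Toy saturation curve: chance that a need with a given incidence has
# surfaced at least once after n interviews. Purely illustrative; not
# the extrapolation used by Griffin and Hauser.
for n in [5, 10, 20, 30, 40]:
    coverage_10pct = 1 - 0.9 ** n   # a need held by 10% of customers
    coverage_30pct = 1 - 0.7 ** n   # a need held by 30% of customers
    print(f"n={n:>2}  10%-need: {coverage_10pct:.2f}  30%-need: {coverage_30pct:.2f}")
```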
In the office equipment study, one two-hour focus group was no better than
two one-hour IDIs, implying that "group synergies [did] not seem to be
present" in the focus groups. The study also suggested that multiple
analysts are needed to uncover the broadest range of needs.
These studies were conducted within the context of quality function
deployment, where, according to the authors, 200-400 customer needs are
usually identified. It is not clear how the results might generalize to other
qualitative applications.
Nevertheless, if one were to base a sample-size decision on the Griffin and
Hauser results, the implication would be to conduct 20-30 IDIs and to
arrange for multiple analysts to look for insights in the data. Perhaps
backroom observers could, to some extent, serve as additional analysts by
taking notes while watching the groups or interviews. The observers notes
might contain some insights that the moderator overlooks, thus helping to
minimize the chances of missing something important.

N=30 as a starting point for planning


Neither the calculation of probabilities in the prior table nor the empirical
rationale of Griffin and Hauser is assured of being the last word on
qualitative sample size. There might be other ways of figuring the number of
IDIs, groups, or ethnographic observations needed to avoid missing
something important.
Until the definitive answer is provided, perhaps an N of 30 respondents is a
reasonable starting point for deciding the qualitative sample size that can
reveal the full range (or nearly the full range) of potentially important
customer perceptions. An N of 30 reduces the probability of missing a
perception with a 10 percent incidence to less than 5 percent (assuming
random sampling), and it is the upper end of the range found by Griffin and
Hauser. If the budget is limited, we might reduce the N below 30, but the
client must understand the increased risks of missing perceptions that may
be worth knowing. If the stakes and budget are high enough, we might go
with a larger sample in order to ensure that smaller (or harder to reach)
subgroups are still likely to be represented.
If focus groups are desired, and we want to count each respondent
separately toward the N we choose (e.g., getting an N of 30 from three
groups with 10 respondents in each), then it is important for every
respondent to have sufficient air time on the key issues. Using mini groups
instead of traditional-size groups could help achieve this objective. Also, it is
critical for the moderator to control dominators and bring out the shy
people, lest the distinctive perceptions of less-talkative customers are
missed.

Across segments or within each one?


A complication arises when we are separately exploring different customer
segments, such as men versus women, different age groups, or consumers
in different geographic regions. In the case of gender and a desired N of 30,
for example, do we need 30 in total (15 males plus 15 females) or do we
really need to interview 60 people (30 males plus 30 females)? This is a
judgment call, which would depend on the researcher's belief in the extent
to which customer perceptions may vary from segment to segment. Of
course, it may also depend on budget. To play it safe, each segment should
have its own N large enough so that appreciable subgroups within the
segment are likely to be represented in the sample.

What if we only want the typical or majority view?


For some purportedly qualitative studies, the stated or implied purpose may
be to get a sense of how customers feel overall about the issue under study.
For example, the client may want to know whether customers generally
respond favorably to a new concept. In that case, it might be argued that we
need not be concerned about having a sample large enough to make certain
that we discover minority viewpoints, because the client is interested only in
how most customers react.
The problem with this agenda is that the qualitative research would have
an implicit quantitative purpose: to reveal the attribute or point of view held
by more than 50 percent of the population. If, indeed, we observe what
most qualitative respondents say or do and then infer that we have found
the majority reaction, we are doing more than discovering that reaction:
We are implicitly estimating its incidence at more than 50 percent.
The approach I propose makes no such inferences. If we find that only one
respondent in a sample of 30 holds a particular view, we make no
assumption that it represents a 10 percent population incidence, although,
as discussed later, it might be that high. The actual population incidence is
likely to be closer to 3.3 percent (1/30) than to 10 percent. Moreover, to
keep the study qualitative, we should not say that we have estimated the
incidence at all. We only want to ensure that if there is an attribute or
opinion with an incidence as low as 10 percent, we are likely to have at least
one respondent to speak for it - and a sample of 30 will probably do the job.
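To see why a single mention in 30 points toward an incidence nearer 1/30 than 10 percent, one can compare how likely "exactly one respondent in 30" would be under each assumption. This binomial comparison is my own illustration, not a calculation from the article.

```python
from math import comb

def prob_exactly_k(incidence: float, sample_size: int, k: int) -> float:
    """Binomial probability of seeing exactly k such respondents."""
    return (comb(sample_size, k)
            * incidence ** k
            * (1 - incidence) ** (sample_size - k))

# Seeing exactly 1 of 30 is more consistent with a ~3.3% incidence
# than with a 10% incidence, though the latter is far from ruled out.
print(round(prob_exactly_k(1 / 30, 30, 1), 3))  # about 0.37
print(round(prob_exactly_k(0.10, 30, 1), 3))    # about 0.14
```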
If we do want to draw quantitative inferences from a qualitative procedure
(and, normally, this is ill advised), then this paper does not apply. Instead,
the researchers should use the usual calculations for setting a quantitative
sample size at which the estimation error resulting from random sampling
variations would be acceptably low.

Keeping qualitative pure


Whenever I present this sample-size proposal, someone usually objects that
I am somehow "quantifying qualitative." On the contrary, estimating the
chances of missing a potentially important perception is completely different
from estimating the percent of a target population who hold a particular
perception. To put it another way, calculating the odds of missing a
perception with a hypothetical incidence does not quantify the incidences of
those perceptions that we actually do uncover.

Therefore, qualitative consultants should not be reluctant to talk about the
probability of missing something important. In so doing, they will not lose
their identity as qualitative researchers, nor will they need any high math.
Moreover, by distinguishing between discovery failure and estimation error,
researchers can help their clients fully understand the difference between
qualitative and quantitative purposes. In short, the approach I propose is
intended to ensure that qualitative will accomplish what it does best - to
discover (not measure) potentially important insights.
