Sample Size

By
Dr. A. Merwyn Jasper D Reuben
Sample Size Considerations
A pharmaceutical company calls and says, “We believe we
have found a cure for the common cold. How many patients
do I need to study to get our product approved?”
Most common statistical question:

“How many patients do we need?”
Where to begin?
N = (Total Budget / Cost per patient)?
Hopefully not!
Does Size Matter?
Too few
Cannot definitively answer the research question
Potentially unethical
Too many
Wasteful of resources
Exposes more people than necessary to potentially harmful treatments
Potentially unethical
May identify treatment effects that are irrelevant and possibly create confusion
Where to begin?
Understand the research question
Learn about the application and the problem
Learn about the disease and the medicine
“What’s the question?”

The following are NOT research questions:
We want to “look at” CD4 count
We want to analyze the data
We want to see if our results are significant
Crystal Ball
Visualize the final analysis and the statistical methods to be used

Where to begin?
Analysis determines sample size

Sample size calculations are based upon the planned
method of analysis
If you don’t know how the data will be analyzed (e.g., 2-sample
t-test), then you cannot accurately estimate the sample size
Sample Size Calculation
Formulate a PRIMARY research question
Identify:
A hypothesis to test (write down H0 and HA), or
A quantity to estimate (e.g., using confidence intervals)
Determine the endpoint or outcome measure associated with
the hypothesis test or quantity to be estimated
How do we “measure” or “quantify” the responses?
Is the measure continuous, binary, or a time-to-event?
Is this a one-sample or two-sample problem?

Based upon the PRIMARY outcome
Other analyses (i.e., secondary outcomes) may be planned, but the
study may not be powered to detect effects for these outcomes
Two strategies
Hypothesis Testing
E.g.,
H0: μ1 = μ2 vs. HA: μ1 ≠ μ2
H0: μ = μ0 vs. HA: μ ≠ μ0
H0: p1 = p2 vs. HA: p1 ≠ p2
H0: p = p0 vs. HA: p ≠ p0
Estimation with Precision

Based on width of a confidence interval
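The slides do not give a formula for the estimation strategy. One common textbook choice, used here as an assumption, is to pick n so that the half-width E of a normal-based confidence interval for a mean is at most a target value: n = (z1-α/2 × σ / E)^2. A minimal Python sketch follows; the function name, parameter names and numbers are illustrative only.

```python
import math
from statistics import NormalDist

def n_for_ci_half_width(sigma, half_width, conf_level=0.95):
    """Smallest n so a normal-based CI for a mean has half-width <= half_width.

    Assumes sigma is known (or well estimated) and uses z rather than t.
    """
    z = NormalDist().inv_cdf(1 - (1 - conf_level) / 2)
    return math.ceil((z * sigma / half_width) ** 2)

# Illustrative values (not from the slides): sigma = 30 mg/dl, half-width 5 mg/dl
n_ci = n_for_ci_half_width(sigma=30, half_width=5)
print(n_ci)
```

Halving the target half-width roughly quadruples n, which is why the precision target deserves as much discussion as the effect size does in the testing approach.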
Sample Size Calculation Using Hypothesis
Testing
The most common approach
The idea is to choose a sample size such that both of the following
conditions simultaneously hold:
If the null hypothesis is true, then the probability of incorrectly rejecting is
(no more than) α
If the alternative hypothesis is true, then the probability of correctly rejecting is
(at least) 1-β = power
Reality
                      Reality
Test result           H0 true               H0 false
Reject H0             Type I error (α)      Power (1-β)
Do not reject H0      1-α                   Type II error (β)
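The two conditions can be checked numerically for a candidate sample size. The sketch below assumes a two-sided one-sample z-test with known σ (a normal approximation, not necessarily the test the study would use) and illustrative cholesterol-style values: σ = 25 mg/dl, d = 5 mg/dl, n = 197.

```python
from statistics import NormalDist

def power_two_sided_z(n, d, sigma, alpha=0.05):
    """Approximate rejection probability of a two-sided one-sample z-test.

    d is the true difference from the null mean; sigma is assumed known.
    Both rejection tails are included.
    """
    nd = NormalDist()
    z_crit = nd.inv_cdf(1 - alpha / 2)
    shift = abs(d) * n ** 0.5 / sigma
    return nd.cdf(shift - z_crit) + nd.cdf(-shift - z_crit)

# Under H0 (d = 0) the rejection probability equals alpha (Type I error).
type_one = power_two_sided_z(n=197, d=0, sigma=25)
# Under HA with d = 5 the rejection probability is the power (just over 0.80).
power = power_two_sided_z(n=197, d=5, sigma=25)
print(round(type_one, 3), round(power, 3))
```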
Determinants of Sample Size:
Hypothesis Testing Approach
α
An “effect size” to detect

Minimum difference that is clinically relevant (for superiority)
E.g., H0: p1 - p2 = 0 vs. HA: p1 - p2 = 0.20
Maximum difference that is clinically irrelevant (for noninferiority)
Estimates of variability
What is Needed to Determine the Sample Size?
α
Up to the investigator
Regulated by FDA for phase III pivotal trials (0.05)
How much type I (false positive) error can you afford?

What is Needed to Determine the Sample Size?
1-β (power)
Up to the investigator (often 80%-90%)

Not regulated by FDA
How much type II (false negative) error can you afford?

Choosing α and β
Weigh the cost of a Type I error versus a Type II error
In early phase clinical trials, we often do not want to “miss” a significant result
and thus often consider designing a study for higher power (perhaps 90%)
and may consider relaxing the α error (perhaps 0.10)
In order to approve a new drug, the FDA requires significance in two Phase III
trials strictly designed with α error no greater than 0.05 (power = 1-β is
often set to 80% - 90%)
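The choice of α and power enters the calculations through standard normal quantiles. As a small sketch, Python's standard-library `statistics.NormalDist` can produce the z-values for the α levels and power targets mentioned above; rounded to three decimals they should agree with the values used in the formulas and tables later in these slides.

```python
from statistics import NormalDist

nd = NormalDist()  # standard normal distribution

# Two-sided critical values z_{1-alpha/2} for a strict and a relaxed alpha
for alpha in (0.05, 0.10):
    print(f"alpha={alpha:.2f}  z_(1-alpha/2)={nd.inv_cdf(1 - alpha / 2):.3f}")

# Quantiles z_{1-beta} for common power targets
for power in (0.80, 0.85, 0.90, 0.95):
    print(f"power={power:.2f}  z_(1-beta)={nd.inv_cdf(power):.3f}")
```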
Effect Size
The “minimum difference (between groups) that is clinically relevant
or meaningful”
Defines HA
E.g., H0: p1 - p2 = 0 vs. HA: p1 - p2 = 0.20
Not readily apparent
Requires clinical input
Often difficult to agree upon

Estimates of Variability
Often obtained from prior studies (historical data)
Explore the literature and data from ongoing studies for estimates
needed in calculations
Consider conducting a pilot study to estimate this
May need to validate this estimate later

Considerations
Scale of endpoint
Continuous vs. binary vs. time-to-event
1-sample vs. 2-sample
Independent samples or paired
1-sided vs. 2-sided

The General Situation
An important issue in planning a new study is the
determination of an appropriate sample size required to
meet certain conditions.
For example, for a study dealing with blood
cholesterol levels, these conditions are typically
expressed in terms such as
“How large a sample do I need to be able to reject the
null hypothesis that two population means are equal if
the difference between them is d = 10 mg/dl?”
The General Approach
We focus on the sample size required to test a specific hypothesis.
In general, there exists a formula for calculating a sample size for the
specific test statistic appropriate to test a specified hypothesis.
Typically, these formulae require that the user specify the α-level and
Power = (1 – β) desired, as well as the difference to be detected and
the variability of the measure in question.
Importantly, it is usually wise not to calculate a single number for the
sample size.
Rather, calculate a range of values by varying the assumptions so
that you can get a sense of their impact on the resulting projected
sample size.
Then you can pick a more suitable sample size from this range.
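One way to follow this advice is to compute the sample size over a small grid of assumptions. The sketch below uses the normal-approximation formula for a two-sided one-sample test, n = [(z1-α/2 + z1-β) σ / d]^2, rounded up; the grid values and the function name are illustrative assumptions.

```python
import math
from statistics import NormalDist

def n_one_sample(d, sigma, alpha=0.05, power=0.80):
    """Normal-approximation sample size for a two-sided one-sample test."""
    nd = NormalDist()
    z_sum = nd.inv_cdf(1 - alpha / 2) + nd.inv_cdf(power)
    return math.ceil((z_sum * sigma / d) ** 2)

# Vary the assumptions instead of committing to a single number.
scenarios = {
    (sigma, d, power): n_one_sample(d, sigma, power=power)
    for sigma in (25, 30, 35)
    for d in (5, 10)
    for power in (0.80, 0.90)
}
for (sigma, d, power), n in scenarios.items():
    print(f"sigma={sigma} d={d} power={power:.2f} -> n={n}")
```

Seeing the whole grid makes it clear which assumption drives the result: halving d costs far more subjects than raising power from 80% to 90%.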
Three Common Situations
We are going to examine the process of estimating
sample size for three common circumstances in detail:
1. One-sample t-test and paired t-test,
2. Two-sample t-test, and
3. Comparison of P1 versus P2 with a z-test.
The tools required for these three situations are broadly
applicable and cover many of the circumstances that are
typically encountered.
There are sophisticated software packages that cover
much more than these three and most professional
biostatisticians have them readily available.
1. One-sample t-test and Paired t-test
For testing the hypothesis:
H0: μ = k vs. H1: μ ≠ k
with a two-tailed test, the formula is:
$$ n = \left[ \frac{(z_{1-\alpha/2} + z_{1-\beta})\,\sigma}{d} \right]^2 $$
Note: this formula is used even though the test statistic
could be a t-test.
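For the paired case, the same formula is usually applied to the within-pair differences, so σ becomes the standard deviation of those differences. The sketch below is hedged: the relation σd^2 = σ1^2 + σ2^2 - 2ρσ1σ2 is standard, but the numbers, the correlation ρ and the function names are illustrative assumptions, not values from the slides.

```python
import math
from statistics import NormalDist

def sd_of_differences(sd1, sd2, rho):
    """SD of paired differences given each measurement's SD and correlation rho."""
    return math.sqrt(sd1**2 + sd2**2 - 2 * rho * sd1 * sd2)

def n_pairs(d, sd_diff, alpha=0.05, power=0.80):
    """Number of pairs for a two-sided paired test (normal approximation)."""
    nd = NormalDist()
    z_sum = nd.inv_cdf(1 - alpha / 2) + nd.inv_cdf(power)
    return math.ceil((z_sum * sd_diff / d) ** 2)

# Illustrative: both measurements have SD 30 mg/dl and correlate at 0.5
sd_d = sd_of_differences(30, 30, rho=0.5)
pairs = n_pairs(d=10, sd_diff=sd_d)
print(sd_d, pairs)
```

The stronger the within-pair correlation, the smaller the SD of the differences and the fewer pairs are needed.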
One-Sample Example
We want the sample size for a study of blood cholesterol
levels in a population.
Suppose σ is about 30 mg/dl for such populations.
The following tables show sample sizes for different
levels of the factors included in the equation
for a one-sample t-test of the difference between a
specified population mean and the true mean.
One-Sample Example (contd.)
α = 0.05, σ = 25, d = 5.0, Power = 0.80
$$ n = \left[ \frac{(z_{1-\alpha/2} + z_{1-\beta})\,\sigma}{d} \right]^2 = \left[ \frac{(1.96 + 0.842) \times 25}{5} \right]^2 = (14.01)^2 = 196.28 $$
n ≈ 197
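The arithmetic of this example can be reproduced in a few lines. The sketch uses the z-values exactly as rounded on the slide; the variable names are illustrative.

```python
import math

# Values from the slide (z-quantiles rounded as on the slide)
z_alpha = 1.96    # z_{1-alpha/2} for alpha = 0.05
z_beta = 0.842    # z_{1-beta} for power = 0.80
sigma = 25.0      # standard deviation, mg/dl
d = 5.0           # difference to detect, mg/dl

n_exact = ((z_alpha + z_beta) * sigma / d) ** 2
n_required = math.ceil(n_exact)  # round up to a whole subject
print(round(n_exact, 2), n_required)
```

Rounding up rather than to the nearest integer keeps the achieved power at or above the target.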
Sample Size for One-Sample t-test
Blood Cholesterol Levels: α = 0.05, σ = 25
σ = 25 mg/dl; rows give d (mg/dl) followed by n at each power level
1-β: 0.5 0.8 0.85 0.9 0.95
z1-β: 0 0.842 1.036 1.282 1.645
0.5 9,604 19,628 22,440 26,276 32,490
1.0 2,401 4,907 5,610 6,569 8,123
3.0 267 545 623 730 903
5.0 96 196 224 263 325
10.0 24 49 56 66 81
20.0 6 12 14 16 20
30.0 3 5 6 7 9
σ = 30 mg/dl, α = 0.05; rows give d (mg/dl) followed by n at each power level
1-β: 0.5 0.8 0.85 0.9 0.95
z1-β: 0 0.842 1.036 1.282 1.645
0.5 13,830 28,264 32,314 37,838 46,786
1.0 3,457 7,066 8,078 9,460 11,696
3.0 384 785 898 1,051 1,300
5.0 138 283 323 378 468
10.0 35 71 81 95 117
20.0 9 18 20 24 29
30.0 4 8 9 11 13
σ = 35 mg/dl, α = 0.05; rows give d (mg/dl) followed by n at each power level
1-β: 0.5 0.8 0.85 0.9 0.95
z1-β: 0 0.842 1.036 1.282 1.645
0.5 18,824 38,471 43,982 51,502 63,681
1.0 4,706 9,618 10,996 12,875 15,920
3.0 523 1,069 1,222 1,431 1,769
5.0 188 385 440 515 637
10.0 47 96 110 129 159
20.0 12 24 27 32 40
30.0 5 11 12 14 18
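The one-sample tables can be regenerated from the formula. This sketch assumes the tables were built with the rounded z-values shown in their headers and with rounding to the nearest integer, which appears to match the printed entries; the names are illustrative.

```python
# z-quantiles as rounded on the slides
Z_ALPHA = 1.96  # z_{1-alpha/2} for alpha = 0.05
Z_BETA = {0.5: 0.0, 0.8: 0.842, 0.85: 1.036, 0.9: 1.282, 0.95: 1.645}
D_VALUES = (0.5, 1.0, 3.0, 5.0, 10.0, 20.0, 30.0)

def one_sample_row(d, sigma):
    """Rounded n for each power column at a given difference d."""
    return [round(((Z_ALPHA + zb) * sigma / d) ** 2) for zb in Z_BETA.values()]

def one_sample_table(sigma):
    """Map each d to its row of sample sizes for the given sigma."""
    return {d: one_sample_row(d, sigma) for d in D_VALUES}

table_25 = one_sample_table(25)
for d, row in table_25.items():
    print(d, row)
```

Note that the tables round to the nearest integer (196 for d = 5, power 0.80, σ = 25), whereas the worked example rounds up to 197; rounding up is the more conservative choice.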
2. Two Sample t-test
For the hypothesis: H0: μ1 = μ2 vs. H1: μ1 ≠ μ2
For a two-tailed t-test, the formula is:
$$ N = n_1 + n_2 = \frac{4\sigma^2 (z_{1-\alpha/2} + z_{1-\beta})^2}{d^2}, \qquad d = \mu_1 - \mu_2 $$
Sample Size for a Two-Tailed t-test
H0: μ1 = μ2 vs. H1: μ1 ≠ μ2
How large a sample would be needed for comparing two
approaches to cholesterol lowering using α = 0.05, to
detect a difference of d = 20 mg/dl or more with
Power = 1-β = 0.90?
The formula is:
$$ N = n_1 + n_2 = \frac{4\sigma^2 (z_{1-\alpha/2} + z_{1-\beta})^2}{d^2}, \qquad d = \mu_1 - \mu_2 $$
Note: Textbooks do not always clearly indicate whether
the formula they provide is for one group only or for
both groups combined.
When σ = 30 mg/dl, β = 0.10, α = 0.05: z1-α/2 = 1.96
Power = 1-β = 0.90: z1-β = 1.282, d = 20 mg/dl
$$ N = n_1 + n_2 = \frac{4(30)^2 (1.96 + 1.282)^2}{(20)^2} = \frac{4 \times 900 \times (3.242)^2}{400} = \frac{37{,}838.03}{400} $$
N ≈ 94.6
Hence about 50 for each group
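A short sketch of the same calculation in Python, assuming equal group sizes and a common σ. It uses unrounded normal quantiles from `statistics.NormalDist`, so the result differs from the slide's figure only in the second decimal; the function name is illustrative.

```python
import math
from statistics import NormalDist

def n_two_sample_total(d, sigma, alpha=0.05, power=0.90):
    """Total N = n1 + n2 for a two-sided two-sample comparison of means.

    Assumes equal group sizes and a common SD sigma (normal approximation).
    """
    nd = NormalDist()
    z_sum = nd.inv_cdf(1 - alpha / 2) + nd.inv_cdf(power)
    return 4 * sigma**2 * z_sum**2 / d**2

total = n_two_sample_total(d=20, sigma=30)
per_group = math.ceil(total / 2)  # round each group up to a whole subject
print(round(total, 1), per_group)
```

Strictly, the formula gives 48 per group; the slide's "about 50" adds a small safety margin.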
Sample Sizes: σ = 25 mg/dl, α = 0.05
Rows give d (mg/dl) followed by total N = n1 + n2 at each power level
1-β: 0.5 0.8 0.85 0.9 0.95
z1-β: 0 0.842 1.036 1.282 1.645
0.5 38,416 78,512 89,760 105,106 129,960
1 9,604 19,628 22,440 26,276 32,490
3 1,067 2,181 2,493 2,920 3,610
5 384 785 898 1,051 1,300
10 96 196 224 263 325
20 24 49 56 66 81
30 11 22 25 29 36
σ = 30 mg/dl, α = 0.05; rows give d (mg/dl) followed by total N = n1 + n2 at each power level
1-β: 0.5 0.8 0.85 0.9 0.95
z1-β: 0 0.842 1.036 1.282 1.645
0.5 55,319 113,057 129,255 151,352 187,143
1 13,830 28,264 32,314 37,838 46,786
3 1,537 3,140 3,590 4,204 5,198
5 553 1,131 1,293 1,514 1,871
10 138 283 323 378 468
20 35 71 81 95 117
30 15 31 36 42 52
σ = 35 mg/dl, α = 0.05; rows give d (mg/dl) followed by total N = n1 + n2 at each power level
1-β: 0.5 0.8 0.85 0.9 0.95
z1-β: 0 0.842 1.036 1.282 1.645
0.5 75,295 153,884 175,930 206,007 254,722
1 18,824 38,471 43,982 51,502 63,681
3 2,092 4,275 4,887 5,722 7,076
5 753 1,539 1,759 2,060 2,547
10 188 385 440 515 637
20 47 96 110 129 159
30 21 43 49 57 71
3. Two-sample proportions
H0: P1 = P2 vs. H1: P1 ≠ P2
$$ N = n_1 + n_2 = \frac{4 (z_{1-\alpha/2} + z_{1-\beta})^2 \left( \frac{P_1 + P_2}{2} \right) \left( 1 - \frac{P_1 + P_2}{2} \right)}{d^2}, \qquad d = P_1 - P_2 $$
Example: d = P1 - P2 = 0.7 - 0.5 = 0.2
With β = 0.10, α = 0.05: z1-α/2 = 1.96
Power = 1-β = 0.90: z1-β = 1.282
(P1+P2)/2 = (0.7+0.5)/2 = 0.6
$$ N = n_1 + n_2 = \frac{4 (1.96 + 1.282)^2 (0.6)(1 - 0.6)}{(0.2)^2} = \frac{4 (3.242)^2 (0.6)(0.4)}{(0.2)^2} = \frac{10.09}{0.04} = 252.25 $$
N ≈ 252.25
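The proportions example can be checked the same way. This sketch assumes equal group sizes and the pooled form with the average proportion (P1 + P2)/2, as in the formula above; it uses unrounded normal quantiles, so it lands slightly below the slide's 252.25. The function name is illustrative.

```python
import math
from statistics import NormalDist

def n_two_proportions_total(p1, p2, alpha=0.05, power=0.90):
    """Total N = n1 + n2 for a two-sided z-test comparing two proportions.

    Uses the average proportion p_bar = (p1 + p2) / 2 and equal group sizes.
    """
    nd = NormalDist()
    z_sum = nd.inv_cdf(1 - alpha / 2) + nd.inv_cdf(power)
    p_bar = (p1 + p2) / 2
    return 4 * z_sum**2 * p_bar * (1 - p_bar) / (p1 - p2) ** 2

total = n_two_proportions_total(0.7, 0.5)
print(round(total, 1), math.ceil(total))
```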
Sample size for testing P1- P2 with α = 0.05
Rows give P1, P2 followed by total N = n1 + n2 at each power level
1-β: 0.5 0.8 0.85 0.9 0.95
z1-β: 0 0.842 1.036 1.282 1.645
0.9 0.8 196 400 458 536 663
0.8 0.7 288 589 673 788 975
0.7 0.6 350 714 817 956 1,183
0.6 0.5 380 777 889 1,041 1,287
0.5 0.4 380 777 889 1,041 1,287
0.4 0.3 350 714 817 956 1,183
0.3 0.2 288 589 673 788 975
0.2 0.1 196 400 458 536 663
0.1 0.0 73 149 171 200 247
Rows give P1, P2 followed by total N = n1 + n2 at each power level
1-β: 0.5 0.8 0.85 0.9 0.95
z1-β: 0 0.842 1.036 1.282 1.645
0.9 0.7 61 126 144 168 208
0.8 0.6 81 165 188 221 273
0.7 0.5 92 188 215 252 312
0.6 0.4 96 196 224 263 325
0.5 0.3 92 188 215 252 312
0.4 0.2 81 165 188 221 273
0.3 0.1 61 126 144 168 208
0.2 0.0 35 71 81 95 117
0.9 0.6 32 65 75 88 108
0.8 0.5 39 79 91 106 131
0.7 0.4 42 86 99 116 143
0.6 0.3 42 86 99 116 143
0.5 0.2 39 79 91 106 131
0.4 0.1 32 65 75 88 108
0.3 0.0 22 44 51 60 74
Rows give P1, P2 followed by total N = n1 + n2 at each power level
1-β: 0.5 0.8 0.85 0.9 0.95
z1-β: 0 0.842 1.036 1.282 1.645
0.9 0.5 20 41 47 55 68
0.8 0.4 23 47 54 63 78
0.7 0.3 24 49 56 66 81
0.6 0.2 23 47 54 63 78
0.5 0.1 20 41 47 55 68
0.4 0.0 15 31 36 42 52
0.9 0.4 14 29 33 38 47
0.8 0.3 15 31 36 42 51
0.7 0.2 15 31 36 42 51
0.6 0.1 14 29 33 38 47
0.5 0.0 12 24 27 32 39
0.9 0.3 10 21 24 28 35
0.8 0.2 11 22 25 29 36
0.7 0.1 10 21 24 28 35
0.6 0.0 9 18 21 25 30
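Rows of the proportions tables can be regenerated as well. The sketch assumes the tables use the rounded z-values from their headers and round to the nearest integer, which appears to match the printed entries; the names are illustrative.

```python
Z_ALPHA = 1.96  # z_{1-alpha/2} for alpha = 0.05, as rounded on the slides
Z_BETA = (0.0, 0.842, 1.036, 1.282, 1.645)  # power 0.5, 0.8, 0.85, 0.9, 0.95

def proportions_row(p1, p2):
    """Rounded total N = n1 + n2 for each power column."""
    p_bar = (p1 + p2) / 2
    scale = 4 * p_bar * (1 - p_bar) / (p1 - p2) ** 2
    return [round(scale * (Z_ALPHA + zb) ** 2) for zb in Z_BETA]

for p1, p2 in [(0.9, 0.8), (0.7, 0.5), (0.6, 0.2), (0.5, 0.0)]:
    print(p1, p2, proportions_row(p1, p2))
```

For a fixed difference, the required N is largest when the average proportion is near 0.5, because that is where P(1 - P) peaks.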