Inferential Statistics
Sampling Distributions
A sampling distribution is the probability distribution of a sample
statistic that is formed when samples of size n are repeatedly taken
from a population.
[Figure: a population from which repeated samples are drawn; each sample yields its own sample mean (e.g., samples 6 to 9 with means x̄6, x̄7, x̄8, x̄9).]
Example:
The weights (in kg) of a population of under-five children are {5, 10, 15, 20}.
These values are written on slips of paper and put in a hat.
a. Find the mean, standard deviation, and variance of the weights
of the population.
Population: 5, 10, 15, 20
μ = 12.5
σ = 5.59
σ² = 31.25
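The population parameters above can be checked with a few lines of Python. This is a minimal sketch using the standard library's `statistics` module; the variable names are illustrative and not part of the original slides.

```python
import statistics

weights = [5, 10, 15, 20]  # population values from the example (kg)

mu = statistics.mean(weights)           # population mean
sigma2 = statistics.pvariance(weights)  # population variance (divides by N, not N - 1)
sigma = statistics.pstdev(weights)      # population standard deviation

print(mu, sigma2, round(sigma, 2))
```

Note that `pvariance` and `pstdev` are the population versions; `variance` and `stdev` would divide by N - 1 and give different numbers.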
Example continued:
c. Now create the probability distribution of the sample means.
Probability Distribution of Sample Means
x̄       f    Probability
5       1    0.0625
7.5     2    0.1250
10      3    0.1875
12.5    4    0.2500
15      3    0.1875
17.5    2    0.1250
20      1    0.0625
The mean and variance of this distribution are 12.5 and 31.25/2 = 15.625, respectively.
[Figure: probability histogram of the sample means; x-axis: sample mean (5 to 20), y-axis: probability (0.05 to 0.20).]
The shape of the graph is symmetric and bell shaped. It approximates a normal distribution.
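The probability distribution of the sample means can be reproduced by listing every possible sample. The sketch below assumes samples of size 2 drawn with replacement, which is inferred from the 16 total outcomes in the frequency column and the variance 31.25/2; the variable names are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

population = [5, 10, 15, 20]
n = 2  # sample size (assumption inferred from the table, see lead-in)

# Every ordered sample of size n drawn with replacement: 4**2 = 16 samples.
sample_means = [Fraction(sum(s), n) for s in product(population, repeat=n)]
counts = Counter(sample_means)
total = len(sample_means)

# Map each distinct sample mean to its probability.
distribution = {float(m): counts[m] / total for m in sorted(counts)}

mean_of_means = sum(m * p for m, p in distribution.items())
var_of_means = sum((m - mean_of_means) ** 2 * p for m, p in distribution.items())

for m, p in distribution.items():
    print(m, p)
print(mean_of_means, var_of_means)
```

The mean of the sample means equals the population mean, and their variance equals the population variance divided by n.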
05/27/2020 Biostat for Postgraduate 9
The Central Limit Theorem
If samples of size n ≥ 30 are taken from a population with any type of
distribution that has mean μ and standard deviation σ,
the sample means will have an approximately normal distribution.
[Figure: distribution of the sample means x̄, forming a bell shape.]
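A small simulation can illustrate the theorem. The sketch below is only an illustration under stated assumptions: the exponential population, the seed, and the number of repeated samples are arbitrary choices and do not come from the slides.

```python
import math
import random
import statistics

rng = random.Random(0)  # fixed seed so the run is repeatable
n = 30                  # sample size at the n >= 30 threshold
num_samples = 2000      # number of repeated samples (arbitrary choice)

# Skewed population: exponential with rate 1, so mean = 1 and sd = 1.
sample_means = [
    statistics.mean([rng.expovariate(1.0) for _ in range(n)])
    for _ in range(num_samples)
]

mean_of_means = statistics.mean(sample_means)
sd_of_means = statistics.stdev(sample_means)
expected_se = 1.0 / math.sqrt(n)  # sigma / sqrt(n)

print(round(mean_of_means, 3), round(sd_of_means, 3), round(expected_se, 3))
```

Even though the population is strongly skewed, the sample means cluster around the population mean with a spread close to σ/√n.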
Central Limit theorem cont…
An illustration showing how a sample size determines
the shape of the sampling distribution
If the population itself is normally distributed,
the sample means will have a normal distribution for any
sample size n.
[Figure: distribution of the population values x and of the sample means x̄.]
The Central Limit Theorem cont..
proportion.
a. Find the mean and standard error of the mean of the sampling
distribution.
Mean: μx̄ = μ = 8
Standard deviation (standard error): σx̄ = σ/√n = 0.7/√38 ≈ 0.11
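The standard error calculation can be verified directly. This is a minimal sketch using the values shown in the example (μ = 8, σ = 0.7, n = 38); the variable names are illustrative.

```python
import math

mu = 8       # population mean from the example
sigma = 0.7  # population standard deviation from the example
n = 38       # sample size from the example

mean_of_sample_means = mu              # the sampling distribution is centred on mu
standard_error = sigma / math.sqrt(n)  # sigma / sqrt(n)

print(mean_of_sample_means, round(standard_error, 2))
```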
Interpreting the Central Limit Theorem
Example continued:
μx̄ = 8, σx̄ = 0.11
Important points about the assumptions of the normal distribution
We apply the normal distribution to a sample if:
• the sample is taken from a normally distributed population whose standard deviation (σ) is known, or
• the sample size is large (i.e., greater than 30), so that we can rely on the Central Limit Theorem (CLT).
We use the Student t-distribution (t-statistic) when the following three conditions are satisfied:
• the sample is from a normally distributed population,
• the population variance is unknown, and
• the sample size is small, i.e., less than 30. (Note that we can also use the t-test even if n > 30 if we want to be more conservative.)
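The point estimate for the population mean is the sample mean:
x̄ = (x1 + x2 + ... + xn) / n = (1/n) Σ xi, for i = 1, ..., n
The point estimate for the population proportion is given by
p̂ = x / n
where x is the total number of successes (events).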
Point estimation cont…
Parameter        Point estimate
Mean μ           x̄
Proportion P     p̂
Variance σ²      s²
The problem with a point estimate is that estimates vary from sample
to sample.
The 95% CI is the most commonly used; however, 90% and 99% CIs are
sometimes used.
If σ is unknown and the sample is from a normal distribution (whether n is small or large), we
use the t-test, substituting S for σ.
In this last situation, if n is large, we can approximate it with the Z-test. However, if
more accuracy is desired, we use the t-test.
20,21, 22,22,23,23,23,24,24,24,24,25,25,25,25,25,26,26,26,26,26,27,
27,27,27,27,27,28,28,28,28,28,28,28,28,29,29,29,29,29,30,30,30,30,
30,30,30,30,30,31,31,31,31,31,31,31,32,32,32,32,32,33,33,33,33,33,
33,33,34,34,34,34,35,35,35,35,36,36,36,36,36,37,37,37,37,38,38,39
Construct the 90%, 95%, and 98% confidence intervals for the population mean.
Solution: First, the sample mean and standard deviation are calculated as 29.92
and 4.52, respectively.
Then we use the Z-distribution: 29.92 ± Zα/2 × 4.52/√88, where √88 ≈ 9.38.
90% CI: 29.92 ± 1.645 × 4.52/9.38 = (29.13, 30.71)
95% CI: 29.92 ± 1.96 × 4.52/9.38 = (28.98, 30.86)
98% CI: 29.92 ± 2.33 × 4.52/9.38 = (28.80, 31.04)
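The three intervals can be reproduced from the summary statistics stated in the solution (mean 29.92, standard deviation 4.52, n = 88). The sketch below takes those summary values as given rather than recomputing them from the raw data; the function name `z_confidence_interval` is hypothetical, and the critical values come from Python's `statistics.NormalDist` instead of a Z table.

```python
import math
from statistics import NormalDist

def z_confidence_interval(mean, sd, n, level):
    """Return (lower, upper) for a Z-based confidence interval for the mean."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)  # two-sided critical value
    margin = z * sd / math.sqrt(n)
    return mean - margin, mean + margin

intervals = {
    level: z_confidence_interval(29.92, 4.52, 88, level)
    for level in (0.90, 0.95, 0.98)
}

for level, (lower, upper) in intervals.items():
    print(f"{level:.0%} CI: ({lower:.2f}, {upper:.2f})")
```

A higher confidence level uses a larger critical value, so the interval widens.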
CI for proportions
Example
p̂ ± zα/2 × √(p̂(1 − p̂)/n)
= 0.175 ± 1.96 × √(0.175 × 0.825 / 947)
= 0.175 ± 1.96 × 0.0124
= [0.151; 0.199]
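The same calculation can be sketched in Python, using the values that appear in the example (p̂ = 0.175, n = 947, 95% confidence). The function name `proportion_confidence_interval` is hypothetical.

```python
import math
from statistics import NormalDist

def proportion_confidence_interval(p_hat, n, level=0.95):
    """Return (lower, upper) for the normal-approximation CI of a proportion."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    standard_error = math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - z * standard_error, p_hat + z * standard_error

standard_error = math.sqrt(0.175 * 0.825 / 947)
lower, upper = proportion_confidence_interval(0.175, 947)

print(round(standard_error, 4), round(lower, 3), round(upper, 3))
```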
The purpose of the study is to collect data which will allow the
researcher to test the hypothesis.
1. Identify the null hypothesis H0 and the alternative hypothesis HA.
2. Choose α. The value should be small, usually 5%. It is important to consider the consequences of both types of errors.
1. H0: μ = μ0 (P = P0)
   HA: μ ≠ μ0 (P ≠ P0), two-tailed test
2. H0: μ = μ0 (P = P0)
   HA: μ < μ0 (P < P0), left-tailed test
3. H0: μ = μ0 (P = P0)
   HA: μ > μ0 (P > P0), right-tailed test
Test statistic = (Observed value − Hypothesized value) / (Standard error of observed value)
But for what values of p-value should we reject the null hypothesis?
When the p-value is less than 0.05, we often say that the result is
statistically significant.
P-value and confidence interval
A researcher reports that the mean IQ of a sample of 16 students, drawn from a normally distributed
population, is 110. The expected value for the whole population is 100, with a standard deviation of 10.
Test the hypothesis.
Solution
H0: µ = 100
vs. (Step 1)
HA: µ ≠ 100
Assume α = 0.05 (Step 2)
Z = (110 − 100)/(10/√16) = (110 − 100) × 4/10 = 4 (Step 3)
The critical Z value for α/2 = 0.025 is 1.96, and 4 ≥ 1.96 (Step 4)
Conclusion: Reject the null hypothesis (Step 5)
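The five steps can be traced in code. This is a minimal sketch with the example's numbers (sample mean 110, μ0 = 100, σ = 10, n = 16, α = 0.05); the function name `one_sample_z` is hypothetical, and the two-sided p-value is an addition that the slide itself does not compute.

```python
import math
from statistics import NormalDist

def one_sample_z(sample_mean, mu0, sigma, n):
    """Z statistic for a one-sample test with known population sigma."""
    return (sample_mean - mu0) / (sigma / math.sqrt(n))

alpha = 0.05
z = one_sample_z(110, 100, 10, 16)                # Step 3
z_critical = NormalDist().inv_cdf(1 - alpha / 2)  # two-tailed critical value
reject_h0 = abs(z) >= z_critical                  # Step 4
p_value = 2 * (1 - NormalDist().cdf(abs(z)))      # two-sided p-value (addition)

print(z, round(z_critical, 2), reject_h0, p_value)
```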
Hypothesis testing for proportions
Example
Step 5: Conclusion
Hence we conclude that the proportion of childhood abuse among
psychiatric patients is different from 0.3.
Example: one-tailed test
A gynecologist claims that girls at birth average less than 51 cm in length.
His colleague Judge objects that this claim is based on
prejudice and that the average length is indeed 51 cm.
They draw a sample of 100 girls and obtain the following summary results: n = 100, x̄ = 50.8 cm, s² = 1.6.
H0: μ = 51 (null hypothesis)
H1: μ < 51 (alternative hypothesis)
One-sided test
z = (x̄ − 51)/√(s²/n) = (50.8 − 51)/√(1.6/100) = −1.58
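Conclusion: Since −1.58 > −1.645, z does not fall in the rejection region. At the 5%
significance level, we do not reject the null hypothesis that the mean length of girls at birth is 51 cm.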
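The one-tailed decision can be checked as follows. This is a minimal sketch using the numbers from the example (x̄ = 50.8, μ0 = 51, s² = 1.6, n = 100, α = 0.05); the variable names are illustrative, and the p-value is an addition that the slide does not report.

```python
import math
from statistics import NormalDist

x_bar, mu0 = 50.8, 51.0
s2, n = 1.6, 100
alpha = 0.05

z = (x_bar - mu0) / math.sqrt(s2 / n)     # test statistic
z_critical = NormalDist().inv_cdf(alpha)  # left-tail critical value
reject_h0 = z <= z_critical               # rejection region is the left tail
p_value = NormalDist().cdf(z)             # left-tailed p-value (addition)

print(round(z, 2), round(z_critical, 3), reject_h0, round(p_value, 3))
```

Because z lies to the right of the critical value, the null hypothesis is not rejected, which matches a p-value slightly above 0.05.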
[Figure: standard normal curve with the left-tail rejection region ]−∞, −1.645]; the observed value −1.58 lies just outside it.]
Two-tailed test
[Figure: standard normal curve with rejection regions ]−∞, −1.96] and [1.96, +∞[; the value 2.14 is marked to the right of 1.96.]
Since 4.52 is in the rejection region (4.52 > 1.96), we reject the
null hypothesis at the 5% significance level.
z = (p̂ − P0)/√(P0(1 − P0)/1000) = 4.52, where P0 is the proportion under H0
p-value = 0.000 (i.e., less than 0.001)
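The reported p-value can be checked from the test statistic alone. The sketch below starts from z = 4.52 as given in the example, because the underlying sample proportion and hypothesized proportion are not shown here. It assumes a two-tailed test at α = 0.05, consistent with the 1.96 critical value.

```python
from statistics import NormalDist

z = 4.52      # test statistic reported in the example
alpha = 0.05  # significance level (two-tailed, assumed from the 1.96 cutoff)

z_critical = NormalDist().inv_cdf(1 - alpha / 2)
p_value = 2 * (1 - NormalDist().cdf(abs(z)))  # two-sided p-value
reject_h0 = abs(z) >= z_critical

print(round(z_critical, 2), reject_h0, f"{p_value:.3f}")
```

The p-value is not exactly zero; it is simply smaller than the three decimal places shown.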