CHAPTER 5 PROBABILITY AND STATISTICS
Definition of statistics: the mathematics of the collection, organization and interpretation of numerical data, especially the analysis of a population by inference from sampling.
Let $P(A)$ denote the probability of an event $A$, which is a subset of a sample space.
5.1 Rules of probability
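1. Complement rule: $P(A') = 1 - P(A)$
2. Addition rule: $P(A \cup B) = P(A) + P(B) - P(A \cap B)$
3. For disjoint events, $P(A \cup B) = P(A) + P(B)$
4. Product rule: $P(A \cap B) = P(A \mid B)\,P(B)$, thus $P(A \mid B) = \dfrac{P(A \cap B)}{P(B)}$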
If
5.2 Conditional probability
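(a) If $A$ and $B$ are any events with $P(B) > 0$, then
$$P(A \mid B) = \frac{P(A \cap B)}{P(B)}$$
(b) If $A$ and $B$ are any events with $P(B) > 0$ and $A$ and $B$ are independent, then
$$P(A \mid B) = P(A)$$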
5.3 Multiplication rule
If $A$ and $B$ are any events, then $P(A \cap B) = P(A \mid B)\,P(B) = P(B \mid A)\,P(A)$.
5.4 Total probability rule
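If $E_1, E_2, \ldots, E_k$ are mutually exclusive and exhaustive events, then
$$P(A) = P(A \cap E_1) + P(A \cap E_2) + \cdots + P(A \cap E_k)$$
$$P(A) = P(A \mid E_1)P(E_1) + P(A \mid E_2)P(E_2) + \cdots + P(A \mid E_k)P(E_k)$$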
5.5 Bayes Theorem
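If $E_1, E_2, \ldots, E_k$ are mutually exclusive events, one of which occurs given that another event $A$ occurs, then
$$P(E_i \mid A) = \frac{P(A \mid E_i)P(E_i)}{P(A \mid E_1)P(E_1) + P(A \mid E_2)P(E_2) + \cdots + P(A \mid E_k)P(E_k)}, \quad i = 1, 2, \ldots, k$$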
Example 5.1 Three machines produce similar car parts. Machine A produces 40% of the total output, and machines B and C produce 25% and 15% respectively. The proportions of the output from each machine that do not conform to the specification are 10% for A, 5% for B and 1% for C. What proportion of the parts that do not conform to the specification are produced by machine A?
Solution
Let $D$ represent the event that a particular part is defective. Then the overall proportion of defective parts is
$$P(D) = P(D \mid A)P(A) + P(D \mid B)P(B) + P(D \mid C)P(C)$$
Using Bayes theorem,
$$P(A \mid D) = \frac{P(D \mid A)P(A)}{P(D)}$$
Example 5.2 Suppose that 0.1% of the people in a certain area have a disease D and that a mass screening test is
used to detect cases. The test gives either a positive result or a negative result for each person. In practice the test
gives a positive result with probability 99.9% for a person who has D and a probability of 0.2% for a person who has
not. What is the probability that a person for whom the test is positive actually has the disease?
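Solution
Let $T$ represent the event that the test gives a positive result.
Then, $P(D) = 0.001$, $P(D') = 0.999$, $P(T \mid D) = 0.999$ and $P(T \mid D') = 0.002$.
Using Bayes theorem,
$$P(D \mid T) = \frac{P(T \mid D)P(D)}{P(T \mid D)P(D) + P(T \mid D')P(D')} = \frac{(0.999)(0.001)}{(0.999)(0.001) + (0.002)(0.999)} = \frac{0.000999}{0.002997} \approx 0.333$$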
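The total probability rule and Bayes theorem can be checked numerically for Example 5.2. The following is a minimal sketch; the function name `bayes_posterior` and its argument names are illustrative assumptions, not part of the notes.

```python
def bayes_posterior(prior, likelihood_given_event, likelihood_given_complement):
    """Return P(event | evidence) for the two-way partition {event, complement}."""
    # Total probability rule: P(T) = P(T|D)P(D) + P(T|D')P(D')
    evidence = (likelihood_given_event * prior
                + likelihood_given_complement * (1.0 - prior))
    # Bayes theorem: P(D|T) = P(T|D)P(D) / P(T)
    return likelihood_given_event * prior / evidence


# Figures taken from Example 5.2: P(D) = 0.001, P(T|D) = 0.999, P(T|D') = 0.002
p_disease_given_positive = bayes_posterior(0.001, 0.999, 0.002)
print(round(p_disease_given_positive, 4))
```

Even with a very accurate test, only about one positive result in three corresponds to a person who has the disease, because the disease itself is rare.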
5.6 Random variables
A random variable (rv) has a sample space of possible numerical values together with a distribution of probabilities.
Examples: (a) the number of defectives in a process (b) number of successful projects.
Random variables can be discrete or continuous.
Discrete random variables and distributions
Definition
If $X$ is a discrete random variable, then $p(x) = P(X = x)$ is called a probability mass function or probability distribution if, for each outcome $x$,
(a) $p(x) \ge 0$
(b) $\sum_x p(x) = 1$
Cumulative distribution functions
The cumulative distribution function $F(x)$ for a discrete random variable $X$ with probability distribution $p(x) = P(X = x)$ is
$$F(x) = P(X \le x) = \sum_{t \le x} P(X = t)$$
Properties of the cumulative distribution functions
$F(x)$ satisfies the following properties:
(a) $F(x) = P(X \le x) = \sum_{t \le x} P(X = t)$
(b) $0 \le F(x) \le 1$
(c) If $x \le y$, then $F(x) \le F(y)$
Mean of a discrete random variable
If $X$ is a discrete random variable with probability distribution $p(x) = P(X = x)$, then the mean or expected value of $X$, denoted by $\mu_X$ or $E(X)$, is given by
$$\mu_X = E(X) = \sum_x x\,p(x)$$
Variance of a discrete random variable
If $X$ is a discrete random variable with probability distribution $p(x) = P(X = x)$, then the variance of $X$, denoted by $V(X)$ or $\sigma_X^2$, is given by
$$V(X) = \sigma_X^2 = E\left[(X - \mu_X)^2\right] = \sum_x (x - \mu_X)^2\,p(x)$$
Standard deviation of a discrete random variable
The standard deviation of a discrete random variable, denoted by $\sigma_X$, is the positive square root of the variance, $\sigma_X = \sqrt{\sigma_X^2}$.
Example 5.3
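The number of successful projects $X$ per day obtained by a small engineering firm can be described by the following probability distribution:
$$P(X = x) = \begin{cases} \dfrac{x}{10} & \text{for } x = 0, 1, 2, 3, 4 \\ 0 & \text{otherwise} \end{cases}$$
Find the cumulative distribution function for $X$. Find the mean and variance of the number of successful projects per day.
Solution
The cumulative distribution function for $X$ is given by
$$F_X(x) = P(X \le x) = \sum_{t \le x} P_X(X = t)$$
For $x = 0$, $F(0) = P(X \le 0) = P(0) = \frac{0}{10} = 0$
For $x = 1$, $F(1) = P(X \le 1) = P(0) + P(1) = 0 + \frac{1}{10} = 0.1$
For $x = 2$, $F(2) = P(X \le 2) = P(0) + P(1) + P(2) = 0 + \frac{1}{10} + \frac{2}{10} = 0.3$
For $x = 3$, $F(3) = P(X \le 3) = P(0) + P(1) + P(2) + P(3) = 0 + \frac{1}{10} + \frac{2}{10} + \frac{3}{10} = 0.6$
For $x = 4$, $F(4) = P(X \le 4) = P(0) + P(1) + P(2) + P(3) + P(4) = 0 + \frac{1}{10} + \frac{2}{10} + \frac{3}{10} + \frac{4}{10} = 1.0$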
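The worked solution of Example 5.3 covers the cumulative distribution function; the mean and variance asked for in the question follow directly from the definitions in this section. A short sketch is given below. It uses exact fractions so that no rounding enters, and the variable names are illustrative assumptions.

```python
from fractions import Fraction

# PMF of Example 5.3: P(X = x) = x/10 for x = 0, 1, 2, 3, 4
pmf = {x: Fraction(x, 10) for x in range(5)}

# Cumulative distribution function: running sum of the PMF
cdf = {}
running_total = Fraction(0)
for x in sorted(pmf):
    running_total += pmf[x]
    cdf[x] = running_total

# Mean: sum of x * p(x); variance: sum of (x - mean)^2 * p(x)
mean = sum(x * p for x, p in pmf.items())
variance = sum((x - mean) ** 2 * p for x, p in pmf.items())
std_dev = float(variance) ** 0.5

print([float(cdf[x]) for x in sorted(cdf)])
print(float(mean), float(variance), std_dev)
```

Under these definitions the mean is 3 successful projects per day and the variance is 1, so the standard deviation is also 1.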
5.7 Continuous random variables and distributions
Definition
If $X$ is a continuous random variable defined over a set of real numbers, then $f(x)$ is called a probability density function if
(a) $f(x) \ge 0$
(b) $\int_{-\infty}^{\infty} f(x)\,dx = 1$
(c) $P(a \le X \le b) = \int_a^b f(x)\,dx$, where $X$ lies in the interval $[a, b]$
Cumulative distribution functions
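The cumulative distribution function $F(x)$ for a continuous random variable $X$ with probability density function $f(x)$ is
$$F(x) = P(X \le x) = \int_{-\infty}^{x} f(t)\,dt \quad \text{for } -\infty < x < \infty$$
Properties of the cumulative distribution functions
$F(x)$ satisfies the following properties:
(a) $P(X \le a) = \int_{-\infty}^{a} f(t)\,dt$ for $-\infty < a < \infty$
(b) $P(X \ge a) = \int_{a}^{\infty} f(t)\,dt$ for $-\infty < a < \infty$
(c) $P(a \le X \le b) = \int_{a}^{b} f(t)\,dt$ for $-\infty < a < b < \infty$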
Mean of a continuous random variable
If $X$ is a continuous random variable with probability density function $f(x)$, then the mean or expected value of $X$, denoted by $\mu_X$ or $E(X)$, is given by
$$\mu_X = E(X) = \int_{-\infty}^{\infty} x f(x)\,dx$$
Variance of a continuous random variable
If $X$ is a continuous random variable with probability density function $f(x)$, then the variance of $X$, denoted by $V(X)$ or $\sigma_X^2$, is given by
$$V(X) = \sigma_X^2 = E\left[(X - \mu_X)^2\right] = \int_{-\infty}^{\infty} (x - \mu_X)^2 f_X(x)\,dx = \int_{-\infty}^{\infty} x^2 f_X(x)\,dx - \mu_X^2$$
Standard deviation of a continuous random variable
The standard deviation of a continuous random variable, denoted by $\sigma_X$, is the positive square root of the variance, $\sigma_X = \sqrt{\sigma_X^2}$.
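Example 5.4 Assume that the particle size of an air pollutant (in micrometers) can be described by the following probability function:
$$f_X(x) = \begin{cases} \dfrac{3}{x^4} & \text{for } x > 1 \\ 0 & \text{otherwise} \end{cases}$$
(a) Show that $f(x)$ is a probability density function
(b) Find the cumulative distribution function
(c) Determine the mean and standard deviation
Solution
(a) $f(x)$ is a probability density function if it satisfies $\int_{-\infty}^{\infty} f(x)\,dx = 1$. Here
$$\int_{-\infty}^{\infty} f_X(x)\,dx = \int_{1}^{\infty} \frac{3}{x^4}\,dx = \left[-x^{-3}\right]_{1}^{\infty} = 0 - (-1) = 1$$
Therefore $f(x)$ is a probability density function.
(b) The cumulative distribution function for $X$ is given by
$$F_X(x) = P(X \le x) = \int_{-\infty}^{x} f(t)\,dt = \int_{1}^{x} \frac{3}{t^4}\,dt = \left[-t^{-3}\right]_{1}^{x} = 1 - \frac{1}{x^3} \quad \text{for } x > 1$$
(c) The mean of $X$ is given by
$$\mu_X = E(X) = \int_{-\infty}^{\infty} x f(x)\,dx = \int_{1}^{\infty} x\,\frac{3}{x^4}\,dx = \int_{1}^{\infty} \frac{3}{x^3}\,dx = \left[-\frac{3}{2x^2}\right]_{1}^{\infty} = \frac{3}{2} \text{ micrometers}$$
The variance of $X$ is given by
$$V(X) = \sigma_X^2 = \int_{-\infty}^{\infty} x^2 f(x)\,dx - \mu_X^2 = \int_{1}^{\infty} x^2\,\frac{3}{x^4}\,dx - \left(\frac{3}{2}\right)^2 = \int_{1}^{\infty} \frac{3}{x^2}\,dx - \left(\frac{3}{2}\right)^2 = \left[-\frac{3}{x}\right]_{1}^{\infty} - \frac{9}{4} = 3 - \frac{9}{4} = \frac{3}{4} \text{ sq. micrometers}$$
so the standard deviation is $\sigma_X = \sqrt{3/4} \approx 0.866$ micrometers.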
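The integrals in Example 5.4 can be cross-checked numerically. The sketch below is one possible check, not the method used in the notes: it assumes the substitution $x = 1/u$, which maps the infinite range $[1, \infty)$ onto $(0, 1]$, and then applies a plain midpoint rule. The helper name `integrate_tail` is hypothetical.

```python
def integrate_tail(g, n=100_000):
    """Midpoint-rule estimate of the integral of g over [1, infinity).

    With x = 1/u the integral becomes the integral over (0, 1]
    of g(1/u) / u**2 du, which has a finite range.
    """
    h = 1.0 / n
    total = 0.0
    for k in range(n):
        u = (k + 0.5) * h          # midpoint of the k-th subinterval
        total += g(1.0 / u) / (u * u)
    return total * h


def pdf(x):
    # Density of Example 5.4, valid for x > 1
    return 3.0 / x**4


area = integrate_tail(pdf)                                 # should be close to 1
mean = integrate_tail(lambda x: x * pdf(x))                # should be close to 3/2
second_moment = integrate_tail(lambda x: x * x * pdf(x))   # should be close to 3
variance = second_moment - mean**2                         # should be close to 3/4
std_dev = variance ** 0.5

print(area, mean, variance, std_dev)
```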
5.8 Discrete distributions
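Bernoulli distribution
PMF: $P(X = x) = p^x (1 - p)^{1 - x}$
Range: $x = 0, 1$ and $0 \le p \le 1$
Mean: $p$
Variance: $p(1 - p)$
Binomial distribution
PMF: $P(X = x) = \binom{n}{x} p^x (1 - p)^{n - x}$
Range: $x = 0, 1, \ldots, n$ and $0 \le p \le 1$
Parameters: $n$ and $p$
Mean: $np$
Variance: $np(1 - p)$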
Example 5.5 Suppose a road is flooded with probability
during a year and not more than one flood occurs
during a year. What is the probability that it will be flooded at least once during a five year period?
Solution
Let X be the event a flood occurs in a year.
Then,
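Poisson distribution
PMF: $P(X = x) = \dfrac{\lambda^x e^{-\lambda}}{x!}$
Range: $x = 0, 1, 2, \ldots$
Parameter: $\lambda > 0$
Mean: $\lambda$
Variance: $\lambda$
If $n$ is large and $p$ is small, the binomial distribution can be approximated by the Poisson distribution with $\lambda = np$.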
Example 5.6 The number of flaws for a thin copper wire follows a Poisson distribution with a mean of 2.3 flaws per mm. (a) Determine the probability of exactly two flaws in 1 mm of wire. (b) Determine the probability of ten flaws in 5 mm of wire.
Solution
(a) Let $X$ be the number of flaws in 1 mm of wire.
Given that $\lambda = 2.3$, thus
$$P(X = 2) = \frac{2.3^2 e^{-2.3}}{2!} \approx 0.265$$
(b) Let $X$ be the number of flaws in 5 mm of wire. Then $X$ has a Poisson distribution with $\lambda = 5 \times 2.3 = 11.5$ flaws.
$$P(X = 10) = \frac{11.5^{10} e^{-11.5}}{10!} \approx 0.113$$
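The two Poisson probabilities in Example 5.6 can be evaluated directly from the PMF. This is a minimal sketch using only the standard library; the function name `poisson_pmf` is an illustrative assumption.

```python
import math


def poisson_pmf(x, lam):
    """P(X = x) for a Poisson random variable with mean lam."""
    return lam**x * math.exp(-lam) / math.factorial(x)


# (a) exactly two flaws in 1 mm of wire, mean 2.3 flaws per mm
p_two_flaws_1mm = poisson_pmf(2, 2.3)

# (b) ten flaws in 5 mm of wire, so the mean scales to 5 * 2.3 = 11.5 flaws
p_ten_flaws_5mm = poisson_pmf(10, 5 * 2.3)

print(round(p_two_flaws_1mm, 3), round(p_ten_flaws_5mm, 3))
```

The key step in part (b) is that the Poisson mean scales with the length of wire inspected.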
5.9 Continuous distribution
Normal distribution
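PDF: $f(x) = \dfrac{1}{\sigma\sqrt{2\pi}} \exp\left[-\dfrac{1}{2}\left(\dfrac{x - \mu}{\sigma}\right)^2\right]$
Range: $-\infty < x < \infty$, $-\infty < \mu < \infty$, $\sigma > 0$
Parameters: $\mu$: location parameter, $\sigma$: scale parameter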
If X follows a normal distribution then
Also,
[Figure: plot of the normal probability density function $f(x)$ against $x$]
5.10 Sample measures and parameter estimates
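Let $X_1, X_2, \ldots, X_n$ be a random sample from a population with mean $\mu$ and variance $\sigma^2$. Then the point estimates for $\mu$ and $\sigma^2$ are
$$\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{1}{n}\sum_{i=1}^{n} x_i$$
where $\bar{x}$ is the sample mean, and
$$s^2 = \frac{1}{n - 1}\sum_{i=1}^{n} (x_i - \bar{x})^2$$
where $s^2$ is the sample variance.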
5.11 Confidence interval for the mean based on the normal distribution
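(1) Population variance is known
The $100(1 - \alpha)\%$ confidence interval for the mean $\mu$ is given by
$$\bar{X} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{X} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$
where
(a) $\bar{X}$ is the sample mean.
(b) $z_{\alpha/2}$ is the upper $100(\alpha/2)$th quantile of the standard normal distribution, that is $P(Z \le z_{\alpha/2}) = 1 - \alpha/2$, which is given in Table 1.
Assumptions:
(a) $X_1, X_2, \ldots, X_n$ is a random sample of size $n$ from a population which has a normal distribution with mean $\mu$ and variance $\sigma^2$.
(b) The sample size $n$ can either be small or large.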
(2) Population variance is unknown
The $100(1 - \alpha)\%$ confidence interval for the mean $\mu$ is given by
$$\bar{X} - z_{\alpha/2}\frac{S}{\sqrt{n}} \le \mu \le \bar{X} + z_{\alpha/2}\frac{S}{\sqrt{n}}$$
where
(a) $\bar{X}$ is the sample mean and $S$ is the sample standard deviation.
(b) $z_{\alpha/2}$ is the upper $100(\alpha/2)$th quantile of the standard normal distribution, that is $P(Z \le z_{\alpha/2}) = 1 - \alpha/2$, which is given in Table 1.
Assumptions:
(a) $X_1, X_2, \ldots, X_n$ is a random sample of size $n$ from a population which has a normal distribution with mean $\mu$ and variance $\sigma^2$.
(b) The sample size $n$ is large.
Table 1: Cumulative distribution function for the standard normal distribution
$$P(Z \le z) = \int_{-\infty}^{z} \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2}\,dx$$
 z     .00     .01     .02     .03     .04     .05     .06     .07     .08     .09
0.0  0.5000  0.5040  0.5080  0.5120  0.5160  0.5199  0.5239  0.5279  0.5319  0.5359
0.1  0.5398  0.5438  0.5478  0.5517  0.5557  0.5596  0.5636  0.5675  0.5714  0.5753
0.2  0.5793  0.5832  0.5871  0.5910  0.5948  0.5987  0.6026  0.6064  0.6103  0.6141
0.3  0.6179  0.6217  0.6255  0.6293  0.6331  0.6368  0.6406  0.6443  0.6480  0.6517
0.4  0.6554  0.6591  0.6628  0.6664  0.6700  0.6736  0.6772  0.6808  0.6844  0.6879
0.5  0.6915  0.6950  0.6985  0.7019  0.7054  0.7088  0.7123  0.7157  0.7190  0.7224
0.6  0.7257  0.7291  0.7324  0.7357  0.7389  0.7422  0.7454  0.7486  0.7517  0.7549
0.7  0.7580  0.7611  0.7642  0.7673  0.7704  0.7734  0.7764  0.7794  0.7823  0.7852
0.8  0.7881  0.7910  0.7939  0.7967  0.7995  0.8023  0.8051  0.8078  0.8106  0.8133
0.9  0.8159  0.8186  0.8212  0.8238  0.8264  0.8289  0.8315  0.8340  0.8365  0.8389
1.0  0.8413  0.8438  0.8461  0.8485  0.8508  0.8531  0.8554  0.8577  0.8599  0.8621
1.1  0.8643  0.8665  0.8686  0.8708  0.8729  0.8749  0.8770  0.8790  0.8810  0.8830
1.2  0.8849  0.8869  0.8888  0.8907  0.8925  0.8944  0.8962  0.8980  0.8997  0.9015
1.3  0.9032  0.9049  0.9066  0.9082  0.9099  0.9115  0.9131  0.9147  0.9162  0.9177
1.4  0.9192  0.9207  0.9222  0.9236  0.9251  0.9265  0.9279  0.9292  0.9306  0.9319
1.5  0.9332  0.9345  0.9357  0.9370  0.9382  0.9394  0.9406  0.9418  0.9429  0.9441
1.6  0.9452  0.9463  0.9474  0.9484  0.9495  0.9505  0.9515  0.9525  0.9535  0.9545
1.7  0.9554  0.9564  0.9573  0.9582  0.9591  0.9599  0.9608  0.9616  0.9625  0.9633
1.8  0.9641  0.9649  0.9656  0.9664  0.9671  0.9678  0.9686  0.9693  0.9699  0.9706
1.9  0.9713  0.9719  0.9726  0.9732  0.9738  0.9744  0.9750  0.9756  0.9761  0.9767
2.0  0.9772  0.9778  0.9783  0.9788  0.9793  0.9798  0.9803  0.9808  0.9812  0.9817
2.1  0.9821  0.9826  0.9830  0.9834  0.9838  0.9842  0.9846  0.9850  0.9854  0.9857
2.2  0.9861  0.9864  0.9868  0.9871  0.9875  0.9878  0.9881  0.9884  0.9887  0.9890
2.3  0.9893  0.9896  0.9898  0.9901  0.9904  0.9906  0.9909  0.9911  0.9913  0.9916
2.4  0.9918  0.9920  0.9922  0.9925  0.9927  0.9929  0.9931  0.9932  0.9934  0.9936
2.5  0.9938  0.9940  0.9941  0.9943  0.9945  0.9946  0.9948  0.9949  0.9951  0.9952
2.6  0.9953  0.9955  0.9956  0.9957  0.9959  0.9960  0.9961  0.9962  0.9963  0.9964
2.7  0.9965  0.9966  0.9967  0.9968  0.9969  0.9970  0.9971  0.9972  0.9973  0.9974
2.8  0.9974  0.9975  0.9976  0.9977  0.9977  0.9978  0.9979  0.9979  0.9980  0.9981
2.9  0.9981  0.9982  0.9982  0.9983  0.9984  0.9984  0.9985  0.9985  0.9986  0.9986
3.0  0.9987  0.9987  0.9987  0.9988  0.9988  0.9989  0.9989  0.9989  0.9990  0.9990
3.1  0.9990  0.9991  0.9991  0.9991  0.9992  0.9992  0.9992  0.9992  0.9993  0.9993
3.2  0.9993  0.9993  0.9994  0.9994  0.9994  0.9994  0.9994  0.9995  0.9995  0.9995
3.3  0.9995  0.9995  0.9995  0.9996  0.9996  0.9996  0.9996  0.9996  0.9996  0.9997
3.4  0.9997  0.9997  0.9997  0.9997  0.9997  0.9997  0.9997  0.9997  0.9997  0.9998
Example 5.7
A study was done to determine the wind speed distribution in Penang. The following monthly wind speed data (measured in m/s) were obtained.
15.42  12.85  10.28  13.36  15.42  20.56  16.28  25.70  15.42   9.25
10.28   9.25   8.22  11.31  14.91  16.45  13.36  15.42  13.36  12.85
11.31  11.31  12.85  11.82  14.39  15.42  16.96  21.59  15.42  15.42
12.85  12.85  11.82  14.39  12.34  24.67  12.85  20.05  27.24  22.62
Find a 90% confidence interval for the true mean wind speed in Penang.
Solution
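Let $\mu$ be the true mean wind speed (in m/s) in Penang.
Since the sample size is large ($n = 40$), the following confidence interval is used.
The 90% confidence interval for the true population mean is given by
$$\bar{X} - z_{\alpha/2}\frac{S}{\sqrt{n}} \le \mu \le \bar{X} + z_{\alpha/2}\frac{S}{\sqrt{n}}$$
$$\bar{X} - z_{0.05}\frac{S}{\sqrt{n}} \le \mu \le \bar{X} + z_{0.05}\frac{S}{\sqrt{n}}$$
$$14.953 - 1.65\left(\frac{4.489}{\sqrt{40}}\right) \le \mu \le 14.953 + 1.65\left(\frac{4.489}{\sqrt{40}}\right)$$
$$14.953 - 1.65(0.710) \le \mu \le 14.953 + 1.65(0.710)$$
$$14.953 - 1.172 \le \mu \le 14.953 + 1.172$$
$$13.781 \le \mu \le 16.125$$
Thus the 90% confidence interval for the true mean wind speed in Penang is from 13.781 m/s to 16.125 m/s.
Calculations
$$\bar{X} = \frac{X_1 + X_2 + \cdots + X_{40}}{40} = \frac{15.42 + 10.28 + \cdots + 15.42 + 22.62}{40} = 14.953$$
$$S^2 = \frac{1}{39}\sum_{i=1}^{40} (X_i - \bar{X})^2 = \frac{(15.42 - 14.953)^2 + (10.28 - 14.953)^2 + \cdots + (22.62 - 14.953)^2}{39} = 4.489^2 = 20.149$$
From Table 1, $z_{0.05} = 1.65$.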
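The large-sample interval of Example 5.7 can be reproduced with the Python standard library. This is a sketch under one stated difference: instead of reading $z_{0.05} = 1.65$ from Table 1, it asks `statistics.NormalDist` for the quantile (about 1.645), so the limits differ from the hand calculation in the third decimal place. The function name `z_confidence_interval` is an illustrative assumption.

```python
import math
from statistics import NormalDist, mean, stdev

# Monthly wind speed data (m/s) from Example 5.7
wind_speeds = [
    15.42, 12.85, 10.28, 13.36, 15.42, 20.56, 16.28, 25.70, 15.42, 9.25,
    10.28, 9.25, 8.22, 11.31, 14.91, 16.45, 13.36, 15.42, 13.36, 12.85,
    11.31, 11.31, 12.85, 11.82, 14.39, 15.42, 16.96, 21.59, 15.42, 15.42,
    12.85, 12.85, 11.82, 14.39, 12.34, 24.67, 12.85, 20.05, 27.24, 22.62,
]


def z_confidence_interval(data, confidence):
    """Large-sample confidence interval for the mean, variance unknown."""
    n = len(data)
    x_bar = mean(data)
    s = stdev(data)                      # sample standard deviation (n - 1 divisor)
    alpha = 1.0 - confidence
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    margin = z * s / math.sqrt(n)
    return x_bar - margin, x_bar + margin


lower, upper = z_confidence_interval(wind_speeds, 0.90)
print(round(lower, 3), round(upper, 3))
```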
Example 5.8
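The flow discharge of Sungai Kerian (measured in m³/s) was obtained at random. 50 readings were collected and the mean flow discharge was found to be 3.512 m³/s with a standard deviation of 0.5 m³/s. Construct a 99% confidence interval for the true mean flow discharge of Sungai Kerian.
Solution
Let $\mu$ be the true mean flow discharge of Sungai Kerian.
Since the sample size is large ($n = 50$), the following confidence interval is used.
The 99% confidence interval for the true population mean is given by
$$\bar{X} - z_{\alpha/2}\frac{S}{\sqrt{n}} \le \mu \le \bar{X} + z_{\alpha/2}\frac{S}{\sqrt{n}}$$
$$\bar{X} - z_{0.005}\frac{S}{\sqrt{n}} \le \mu \le \bar{X} + z_{0.005}\frac{S}{\sqrt{n}}$$
$$3.512 - 2.57\left(\frac{0.5}{\sqrt{50}}\right) \le \mu \le 3.512 + 2.57\left(\frac{0.5}{\sqrt{50}}\right)$$
$$3.512 - 2.57(0.071) \le \mu \le 3.512 + 2.57(0.071)$$
$$3.512 - 0.182 \le \mu \le 3.512 + 0.182$$
$$3.330 \le \mu \le 3.694$$
Thus the 99% confidence interval for the true mean flow discharge of Sungai Kerian is from 3.330 m³/s to 3.694 m³/s.
Calculations
$\bar{X} = 3.512$, $n = 50$, $S = 0.5$. From Table 1, $z_{0.005} = 2.57$.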
5.12 Confidence intervals for the mean based on the t distribution
The $100(1 - \alpha)\%$ confidence interval for the mean $\mu$ is given by
$$\bar{X} - t_{\alpha/2,\,n-1}\frac{S}{\sqrt{n}} \le \mu \le \bar{X} + t_{\alpha/2,\,n-1}\frac{S}{\sqrt{n}}$$
where
(a) $\bar{X}$ is the sample mean.
(b) $S$ is the sample standard deviation.
(c) $t_{\alpha/2,\,n-1}$ is the upper $100(\alpha/2)$th quantile of the t distribution with $n - 1$ degrees of freedom. The critical values of the t distribution are given in Table 2.
Assumptions:
(a) $X_1, X_2, \ldots, X_n$ is a random sample of size $n$ from a population which has a normal distribution with mean $\mu$ and variance $\sigma^2$.
(b) The sample size $n$ is small.
Example 5.9
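The moisture content (measured in percentage) of clay in Batu Ferringhi was investigated. The following data were obtained from a random sample.
1.81  2.00  2.74  3.56  2.13  4.64  3.64  4.62  4.47  3.12
Construct a 98% confidence interval for the true moisture content of clay by assuming that the sample is from a normal distribution.
Solution
Let $\mu$ be the true mean moisture content (in percentage) of clay.
Since the sample size is small ($n = 10$), the following confidence interval is used.
The 98% confidence interval for the true population mean is given by
$$\bar{X} - t_{\alpha/2,\,n-1}\frac{S}{\sqrt{n}} \le \mu \le \bar{X} + t_{\alpha/2,\,n-1}\frac{S}{\sqrt{n}}$$
$$\bar{X} - t_{0.01,\,9}\frac{S}{\sqrt{n}} \le \mu \le \bar{X} + t_{0.01,\,9}\frac{S}{\sqrt{n}}$$
$$3.273 - 2.821\left(\frac{1.091}{\sqrt{10}}\right) \le \mu \le 3.273 + 2.821\left(\frac{1.091}{\sqrt{10}}\right)$$
$$3.273 - 2.821(0.345) \le \mu \le 3.273 + 2.821(0.345)$$
$$3.273 - 0.973 \le \mu \le 3.273 + 0.973$$
$$2.300 \le \mu \le 4.246$$
Thus the 98% confidence interval for the true mean moisture content of clay is from 2.300% to 4.246%.
Calculations
$$\bar{X} = \frac{X_1 + X_2 + \cdots + X_{10}}{10} = \frac{1.81 + 4.64 + \cdots + 2.13 + 3.12}{10} = 3.273$$
$$S^2 = \frac{1}{9}\sum_{i=1}^{10} (X_i - \bar{X})^2 = \frac{(1.81 - 3.273)^2 + (4.64 - 3.273)^2 + \cdots + (3.12 - 3.273)^2}{9} = 1.091^2 = 1.190$$
From Table 2, $t_{0.01,\,9} = 2.821$.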
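The small-sample interval of Example 5.9 can be sketched in the same way. The Python standard library has no t-distribution quantile function, so this sketch simply hard-codes the critical value $t_{0.01,\,9} = 2.821$ read from Table 2; the constant and function names are illustrative assumptions.

```python
import math
from statistics import mean, stdev

# Moisture content data (%) from Example 5.9
moisture = [1.81, 2.00, 2.74, 3.56, 2.13, 4.64, 3.64, 4.62, 4.47, 3.12]

# Critical value t_{0.01, 9}, taken from Table 2 rather than computed
T_CRIT_001_9 = 2.821


def t_confidence_interval(data, t_crit):
    """Confidence interval for the mean using a supplied t critical value."""
    n = len(data)
    x_bar = mean(data)
    s = stdev(data)                      # sample standard deviation (n - 1 divisor)
    margin = t_crit * s / math.sqrt(n)
    return x_bar - margin, x_bar + margin


lower, upper = t_confidence_interval(moisture, T_CRIT_001_9)
print(round(lower, 3), round(upper, 3))
```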
Table 2: Critical values $t_{\alpha,\nu}$ for the t distribution with $\nu$ degrees of freedom
  ν   α=0.40   0.30   0.20   0.15   0.10   0.05   0.025    0.02   0.015    0.01
  1    0.325  0.727  1.376  1.963  3.078  6.314  12.706  15.895  21.205  31.821
  2    0.289  0.617  1.061  1.386  1.886  2.920   4.303   4.849   5.643   6.965
  3    0.277  0.584  0.978  1.250  1.638  2.353   3.182   3.482   3.896   4.541
  4    0.271  0.569  0.941  1.190  1.533  2.132   2.776   2.999   3.298   3.747
  5    0.267  0.559  0.920  1.156  1.476  2.015   2.571   2.757   3.003   3.365
  6    0.265  0.553  0.906  1.134  1.440  1.943   2.447   2.612   2.829   3.143
  7    0.263  0.549  0.896  1.119  1.415  1.895   2.365   2.517   2.715   2.998
  8    0.262  0.546  0.889  1.108  1.397  1.860   2.306   2.449   2.634   2.897
  9    0.261  0.543  0.883  1.100  1.383  1.833   2.262   2.398   2.574   2.821
 10    0.260  0.542  0.879  1.093  1.372  1.813   2.228   2.359   2.528   2.764
 11    0.260  0.540  0.876  1.088  1.363  1.796   2.201   2.328   2.491   2.718
 12    0.259  0.539  0.873  1.083  1.356  1.782   2.179   2.303   2.461   2.681
 13    0.259  0.538  0.870  1.080  1.350  1.771   2.160   2.282   2.436   2.650
 14    0.258  0.537  0.868  1.076  1.345  1.761   2.145   2.264   2.415   2.625
 15    0.258  0.536  0.866  1.074  1.341  1.753   2.131   2.249   2.397   2.603
 16    0.258  0.535  0.865  1.071  1.337  1.746   2.120   2.235   2.382   2.584
 17    0.257  0.534  0.863  1.069  1.333  1.740   2.110   2.224   2.368   2.567
 18    0.257  0.534  0.862  1.067  1.330  1.734   2.101   2.214   2.356   2.552
 19    0.257  0.533  0.861  1.066  1.328  1.729   2.093   2.205   2.346   2.540
 20    0.257  0.533  0.860  1.064  1.325  1.725   2.086   2.197   2.336   2.528
 21    0.257  0.532  0.859  1.063  1.323  1.721   2.080   2.189   2.328   2.518
 22    0.256  0.532  0.858  1.061  1.321  1.717   2.074   2.183   2.320   2.508
 23    0.256  0.532  0.858  1.060  1.320  1.714   2.069   2.177   2.313   2.500
 24    0.256  0.531  0.857  1.059  1.318  1.711   2.064   2.172   2.307   2.492
 25    0.256  0.531  0.856  1.058  1.316  1.708   2.060   2.167   2.301   2.485
 26    0.256  0.531  0.856  1.058  1.315  1.706   2.056   2.162   2.296   2.479
 27    0.256  0.531  0.855  1.057  1.314  1.703   2.052   2.158   2.291   2.473
 28    0.256  0.530  0.855  1.056  1.313  1.701   2.048   2.154   2.286   2.467
 29    0.256  0.530  0.854  1.055  1.311  1.699   2.045   2.150   2.282   2.462
 30    0.256  0.530  0.854  1.055  1.310  1.697   2.042   2.147   2.278   2.457
 40    0.255  0.529  0.851  1.050  1.303  1.684   2.021   2.123   2.250   2.423
 60    0.254  0.527  0.848  1.046  1.296  1.671   2.000   2.099   2.223   2.390
120    0.254  0.526  0.845  1.041  1.289  1.658   1.980   2.076   2.196   2.358
  ∞    0.253  0.524  0.842  1.036  1.282  1.645   1.960   2.054   2.170   2.326
5.13 Tests of hypotheses for the mean based on the normal distribution
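(1) Population variance is known
Hypotheses
One tail tests: $H_0: \mu = d_0$ against $H_1: \mu > d_0$ (or $H_1: \mu < d_0$)
Two tail tests: $H_0: \mu = d_0$ against $H_1: \mu \ne d_0$
Test statistic
$$Z = \frac{\bar{X} - d_0}{\sqrt{\sigma^2 / n}}$$
Rejection region
One tail tests: reject $H_0$ if $Z > z_{\alpha}$ (or $Z < -z_{\alpha}$)
Two tail tests: reject $H_0$ if $|Z| > z_{\alpha/2}$
Notes:
(a) $d_0$ is a constant.
(b) $\bar{X}$ is the sample mean.
(c) $z_{\alpha}$ is the upper $100\alpha$th quantile of the standard normal distribution, that is $P(Z \le z_{\alpha}) = 1 - \alpha$, which is given in Table 1.
Assumptions:
(a) $X_1, X_2, \ldots, X_n$ is a random sample of size $n$ from a population which has a normal distribution with mean $\mu$ and variance $\sigma^2$.
(b) The sample size $n$ can either be small or large.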
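(2) Population variance is unknown
Hypotheses
One tail tests: $H_0: \mu = d_0$ against $H_1: \mu > d_0$ (or $H_1: \mu < d_0$)
Two tail tests: $H_0: \mu = d_0$ against $H_1: \mu \ne d_0$
Test statistic
$$Z = \frac{\bar{X} - d_0}{\sqrt{S^2 / n}}$$
Rejection region
One tail tests: reject $H_0$ if $Z > z_{\alpha}$ (or $Z < -z_{\alpha}$)
Two tail tests: reject $H_0$ if $|Z| > z_{\alpha/2}$
Notes:
(a) $d_0$ is a constant.
(b) $\bar{X}$ is the sample mean and $S$ is the sample standard deviation.
(c) $z_{\alpha/2}$ is the upper $100(\alpha/2)$th quantile of the standard normal distribution, that is $P(Z \le z_{\alpha/2}) = 1 - \alpha/2$, which is given in Table 1.
Assumptions:
(a) $X_1, X_2, \ldots, X_n$ is a random sample of size $n$ from a population which has a normal distribution with mean $\mu$ and variance $\sigma^2$.
(b) The sample size $n$ is large.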
Example 5.10
A study was done to determine the wind speed distribution in Penang. The following monthly wind speed data (measured in m/s) were obtained.
15.42  12.85  10.28  13.36  15.42  20.56  16.28  25.70  15.42   9.25
10.28   9.25   8.22  11.31  14.91  16.45  13.36  15.42  13.36  12.85
11.31  11.31  12.85  11.82  14.39  15.42  16.96  21.59  15.42  15.42
12.85  12.85  11.82  14.39  12.34  24.67  12.85  20.05  27.24  22.62
Can you conclude that the mean wind speed in Penang is less than 12 m/s? Use $\alpha = 0.10$.
Solution
We will follow the six step procedure to solve this problem.
Step 1: Define the population parameter of interest.
Let $\mu$ be the true mean wind speed (in m/s) in Penang.
Since the sample size is large ($n = 40$), the following hypothesis test is used.
Step 2: Define the null and alternative hypotheses
$H_0: \mu = 12$
$H_1: \mu < 12$
Step 3: Calculate the test statistic
$$Z = \frac{\bar{X} - d_0}{\sqrt{S^2 / n}} = \frac{14.953 - 12}{\sqrt{20.149 / 40}} = \frac{2.953}{0.710} = 4.159$$
Calculations
$$\bar{X} = \frac{X_1 + X_2 + \cdots + X_{40}}{40} = \frac{15.42 + 10.28 + \cdots + 15.42 + 22.62}{40} = 14.953$$
$$S^2 = \frac{1}{39}\sum_{i=1}^{40} (X_i - \bar{X})^2 = \frac{(15.42 - 14.953)^2 + (10.28 - 14.953)^2 + \cdots + (22.62 - 14.953)^2}{39} = 4.489^2 = 20.149$$
Step 4: Determine the rejection region
Reject $H_0$ if $Z < -z_{\alpha} = -z_{0.10} = -1.28$ (From Table 1).
Step 5: Result
Since $Z = 4.159$ is not less than $-1.28$, the null hypothesis cannot be rejected.
Step 6: Conclusion
At $\alpha = 0.10$, there is insufficient evidence to show that the true mean wind speed in Penang is less than 12 m/s.
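The test statistic and decision of Example 5.10 can be sketched as follows. This is an illustration under stated assumptions: the critical value comes from `statistics.NormalDist` (about 1.2816) rather than the rounded table value 1.28, and the function name `z_statistic` is hypothetical.

```python
import math
from statistics import NormalDist, mean, variance

# Monthly wind speed data (m/s) from Example 5.10
wind_speeds = [
    15.42, 12.85, 10.28, 13.36, 15.42, 20.56, 16.28, 25.70, 15.42, 9.25,
    10.28, 9.25, 8.22, 11.31, 14.91, 16.45, 13.36, 15.42, 13.36, 12.85,
    11.31, 11.31, 12.85, 11.82, 14.39, 15.42, 16.96, 21.59, 15.42, 15.42,
    12.85, 12.85, 11.82, 14.39, 12.34, 24.67, 12.85, 20.05, 27.24, 22.62,
]


def z_statistic(data, d0):
    """Large-sample test statistic Z = (x_bar - d0) / sqrt(S^2 / n)."""
    n = len(data)
    return (mean(data) - d0) / math.sqrt(variance(data) / n)


alpha = 0.10
z = z_statistic(wind_speeds, 12.0)
z_alpha = NormalDist().inv_cdf(1.0 - alpha)

# Lower-tail test (H1: mu < 12): reject H0 only if Z < -z_alpha
reject_h0 = z < -z_alpha
print(round(z, 3), round(z_alpha, 3), reject_h0)
```

Because the sample mean lies above 12 m/s, the statistic is positive and cannot fall in a lower-tail rejection region, whatever the significance level.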
Example 5.11
The flow discharge of Sungai Kerian (measured in m³/s) was obtained at random. Fifty readings were collected and the mean flow discharge was found to be 3.512 m³/s with a standard deviation of 0.5 m³/s. Show that the true mean flow discharge at Sungai Kerian is not equal to 4 m³/s. Use $\alpha = 0.05$.
Solution
We will follow the six step procedure to solve this problem.
Step 1: Define the population parameter of interest.
Let $\mu$ be the true mean flow discharge of Sungai Kerian.
Since the sample size is large ($n = 50$), the following hypothesis test is used.
Step 2: Define the null and alternative hypotheses
$H_0: \mu = 4$
$H_1: \mu \ne 4$
Step 3: Calculate the test statistic
$$Z = \frac{\bar{X} - d_0}{\sqrt{S^2 / n}} \quad \text{where } \bar{X} = 3.512,\ S^2 = 0.25,\ n = 50$$
$$Z = \frac{3.512 - 4}{\sqrt{0.25 / 50}} = \frac{-0.488}{0.071} = -6.873$$
Step 4: Determine the rejection region
Reject $H_0$ if $Z < -z_{\alpha/2} = -z_{0.025} = -1.96$ or $Z > z_{\alpha/2} = z_{0.025} = 1.96$ (From Table 1).
Step 5: Result
Since $Z = -6.873 < -1.96$, the null hypothesis is rejected.
Step 6: Conclusion
At $\alpha = 0.05$, there is sufficient evidence to show that the true mean flow discharge of Sungai Kerian is not equal to 4 m³/s.
5.14 Test of hypothesis for the mean based on the t distribution
Hypotheses
One tail tests: $H_0: \mu = d_0$ against $H_1: \mu > d_0$ (or $H_1: \mu < d_0$)
Two tail tests: $H_0: \mu = d_0$ against $H_1: \mu \ne d_0$
Test statistic
$$T = \frac{\bar{X} - d_0}{\sqrt{S^2 / n}}$$
Rejection region
One tail tests: reject $H_0$ if $T > t_{\alpha,\,n-1}$ (or $T < -t_{\alpha,\,n-1}$)
Two tail tests: reject $H_0$ if $|T| > t_{\alpha/2,\,n-1}$
Notes:
(a) $d_0$ is a constant.
(b) $\bar{X}$ is the sample mean.
(c) $S$ is the sample standard deviation.
(d) $t_{\alpha,\,n-1}$ is the upper $100\alpha$th quantile of the t distribution with $n - 1$ degrees of freedom. The critical values of the t distribution are given in Table 2.
Assumptions:
(a) $X_1, X_2, \ldots, X_n$ is a random sample of size $n$ from a population which has a normal distribution with mean $\mu$ and variance $\sigma^2$.
(b) The sample size $n$ is small.
Example 5.12
The moisture content (measured in percentage) of clay in Batu Ferringhi was investigated. The following data were obtained from a random sample.
1.81  2.00  2.74  3.56  2.13  4.64  3.64  4.62  4.47  3.12
Is the moisture content greater than 3.0%? Use $\alpha = 0.05$.
Solution
We will follow the six step procedure to solve this problem.
Step 1: Define the population parameter of interest.
Let $\mu$ be the true mean moisture content (in percentage) of clay.
Since the sample size is small ($n = 10$), the following hypothesis test is used.
Step 2: Define the null and alternative hypotheses
$H_0: \mu = 3.0$
$H_1: \mu > 3.0$
Step 3: Calculate the test statistic
$$T = \frac{\bar{X} - d_0}{\sqrt{S^2 / n}} = \frac{3.273 - 3.0}{\sqrt{1.190 / 10}} = \frac{0.273}{0.345} = 0.791$$
Calculations
$$\bar{X} = \frac{X_1 + X_2 + \cdots + X_{10}}{10} = \frac{1.81 + 4.64 + \cdots + 2.13 + 3.12}{10} = 3.273$$
$$S^2 = \frac{1}{9}\sum_{i=1}^{10} (X_i - \bar{X})^2 = \frac{(1.81 - 3.273)^2 + (4.64 - 3.273)^2 + \cdots + (3.12 - 3.273)^2}{9} = 1.091^2 = 1.190$$
Step 4: Determine the rejection region
Reject $H_0$ if $T > t_{\alpha,\,n-1} = t_{0.05,\,9} = 1.833$ (From Table 2).
Step 5: Result
Since $T = 0.791$ is not greater than 1.833, the null hypothesis cannot be rejected.
Step 6: Conclusion
At $\alpha = 0.05$, there is insufficient evidence to show that the true mean moisture content (in percentage) of clay is greater than 3%.
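The t test of Example 5.12 can be sketched with the standard library. Two assumptions are made here: the critical value $t_{0.05,\,9} = 1.833$ is hard-coded from Table 2 because the standard library has no t quantile function, and the standard error is computed as $\sqrt{S^2/n}$ with $n = 10$, following the test statistic defined in Section 5.14.

```python
import math
from statistics import mean, variance

# Moisture content data (%) from Example 5.12
moisture = [1.81, 2.00, 2.74, 3.56, 2.13, 4.64, 3.64, 4.62, 4.47, 3.12]

# Critical value t_{0.05, 9}, taken from Table 2 rather than computed
T_CRIT_005_9 = 1.833


def t_statistic(data, d0):
    """Test statistic T = (x_bar - d0) / sqrt(S^2 / n)."""
    n = len(data)
    return (mean(data) - d0) / math.sqrt(variance(data) / n)


t_stat = t_statistic(moisture, 3.0)

# Upper-tail test (H1: mu > 3.0): reject H0 only if T exceeds the critical value
reject_h0 = t_stat > T_CRIT_005_9
print(round(t_stat, 3), reject_h0)
```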
5.15 Sample correlation
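Correlation measures the linear relationship between two variables, $X$ and $Y$.
The sample correlation coefficient of $n$ pairs of observations $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$, denoted by $r$, is given by
$$r = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_{i=1}^{n} (X_i - \bar{X})^2 \sum_{i=1}^{n} (Y_i - \bar{Y})^2}} = \frac{\sum_{i=1}^{n} X_i Y_i - \dfrac{\left(\sum_{i=1}^{n} X_i\right)\left(\sum_{i=1}^{n} Y_i\right)}{n}}{\sqrt{\left[\sum_{i=1}^{n} X_i^2 - \dfrac{\left(\sum_{i=1}^{n} X_i\right)^2}{n}\right]\left[\sum_{i=1}^{n} Y_i^2 - \dfrac{\left(\sum_{i=1}^{n} Y_i\right)^2}{n}\right]}}$$
The strength of the linear relationship is determined by the following:
If $0.80 \le |r| \le 1.00$ then the relationship is very strong.
If $0.60 \le |r| \le 0.79$ then the relationship is strong.
If $0.40 \le |r| \le 0.59$ then the relationship is moderate.
If $0.20 \le |r| \le 0.39$ then the relationship is weak.
If $0.00 \le |r| \le 0.19$ then the relationship is very weak.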
Example 5.13
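The cost, $Y$, of a manufacturing product usually depends on the lot size, $X$. The following data on the cost of the manufacturing product and its lot size is given below:
Y   30   70   140   270   530   1000   2000   3000
X    1    5    10    25    50    100    250    500
Find the value of the correlation coefficient for the above data.
Solution
The correlation coefficient between $Y$ and $X$ is given by
$$r = \frac{\sum_{i=1}^{n} X_i Y_i - \dfrac{\left(\sum_{i=1}^{n} X_i\right)\left(\sum_{i=1}^{n} Y_i\right)}{n}}{\sqrt{\left[\sum_{i=1}^{n} X_i^2 - \dfrac{\left(\sum_{i=1}^{n} X_i\right)^2}{n}\right]\left[\sum_{i=1}^{n} Y_i^2 - \dfrac{\left(\sum_{i=1}^{n} Y_i\right)^2}{n}\right]}}$$
$$r = \frac{2135030 - \dfrac{(941)(7040)}{8}}{\sqrt{\left[325751 - \dfrac{941^2}{8}\right]\left[14379200 - \dfrac{7040^2}{8}\right]}} = \frac{1306950}{(463.75)(2860.8)} = \frac{1306950}{1326696} = 0.985$$
Therefore, there is a very strong linear relationship between cost and lot size.
Calculations
$n = 8$, $\sum_{i=1}^{8} X_i = 941$, $\sum_{i=1}^{8} Y_i = 7040$, $\sum_{i=1}^{8} X_i Y_i = 2135030$, $\sum_{i=1}^{8} X_i^2 = 325751.00$, $\sum_{i=1}^{8} Y_i^2 = 14379200$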
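The computational form of the sample correlation coefficient used in Example 5.13 can be sketched in a few lines. The function name `sample_correlation` and the variable names are illustrative assumptions.

```python
import math


def sample_correlation(xs, ys):
    """Sample correlation coefficient r from the sums-of-products form."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    # Corrected sums of products and squares
    sxy = sum(x * y for x, y in zip(xs, ys)) - sum_x * sum_y / n
    sxx = sum(x * x for x in xs) - sum_x**2 / n
    syy = sum(y * y for y in ys) - sum_y**2 / n
    return sxy / math.sqrt(sxx * syy)


# Data from Example 5.13
lot_size = [1, 5, 10, 25, 50, 100, 250, 500]
cost = [30, 70, 140, 270, 530, 1000, 2000, 3000]

r = sample_correlation(lot_size, cost)
print(round(r, 3))
```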
5.16 Simple linear regression
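Let $(X_1, Y_1), (X_2, Y_2), \ldots, (X_n, Y_n)$ be $n$ pairs of random variables. Then the simple linear regression model is given by
$$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i, \quad i = 1, 2, \ldots, n$$
where
$Y_i$ is the dependent or response variable
$X_i$ is the independent or regressor or explanatory or predictor variable
$\beta_0$ is the intercept of the regression model
$\beta_1$ is the slope of the regression model
$\varepsilon_i$ is the random error term
Assumptions
The assumptions of the random error term are:
(a) $E(\varepsilon_i) = 0$
(b) $V(\varepsilon_i) = \sigma^2$ (a constant)
(c) The probability distribution is normal
(d) Random error term is independent
Method of least squares
The method of least squares can be used to estimate the values of the intercept ($\beta_0$) and slope ($\beta_1$) parameters.
This method minimizes the sum of squares of the random error term, that is
$$L = \min \sum_{i=1}^{n} \varepsilon_i^2 = \min \sum_{i=1}^{n} (Y_i - \beta_0 - \beta_1 X_i)^2$$
Hence,
$$\frac{\partial L}{\partial \beta_0} = -2\sum_{i=1}^{n} (Y_i - \hat{\beta}_0 - \hat{\beta}_1 X_i) = 0$$
$$\frac{\partial L}{\partial \beta_1} = -2\sum_{i=1}^{n} (Y_i - \hat{\beta}_0 - \hat{\beta}_1 X_i) X_i = 0$$
Simplifying yields,
$$n\hat{\beta}_0 + \hat{\beta}_1 \sum_{i=1}^{n} X_i = \sum_{i=1}^{n} Y_i$$
$$\hat{\beta}_0 \sum_{i=1}^{n} X_i + \hat{\beta}_1 \sum_{i=1}^{n} X_i^2 = \sum_{i=1}^{n} Y_i X_i$$
Solving the two equations yields,
$$\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X} \quad \text{and} \quad \hat{\beta}_1 = \frac{\sum_{i=1}^{n} Y_i X_i - \dfrac{\left(\sum_{i=1}^{n} Y_i\right)\left(\sum_{i=1}^{n} X_i\right)}{n}}{\sum_{i=1}^{n} X_i^2 - \dfrac{\left(\sum_{i=1}^{n} X_i\right)^2}{n}}$$
where
$$\bar{Y} = \frac{1}{n}\sum_{i=1}^{n} Y_i \quad \text{and} \quad \bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$$
Thus the fitted or estimated regression model is
$$\hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1 X_i, \quad i = 1, 2, \ldots, n$$
$e_i = Y_i - \hat{Y}_i$ is called the residual.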
Example 5.14
The yield of a chemical process (in percentage) is hypothesized to be linearly related with the amount of catalyst (in grams). Let $Y$ denote the yield of the chemical process and $X$ be the amount of catalyst. The data is given below.
X    0.9     1.4     1.6     1.7     1.8     2.0     2.1
Y  60.54   63.86   63.76   60.15   66.66   71.66   70.81
Fit a simple linear regression model.
Solution
The following simple linear regression model is fitted
$$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i, \quad i = 1, 2, \ldots, 7$$
where
$Y_i$ is the yield of a chemical process
$X_i$ is the amount of catalyst
By using the least squares method, the estimates for $\beta_0$ and $\beta_1$ are
$$\hat{\beta}_1 = \frac{\sum_{i=1}^{n} Y_i X_i - \dfrac{\left(\sum_{i=1}^{n} Y_i\right)\left(\sum_{i=1}^{n} X_i\right)}{n}}{\sum_{i=1}^{n} X_i^2 - \dfrac{\left(\sum_{i=1}^{n} X_i\right)^2}{n}} = \frac{760.17 - 751.5086}{19.87 - 18.8929} = \frac{8.6614}{0.9771} = 8.8644$$
and
$$\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X} = 65.3486 - 8.8644(1.643) = 65.3486 - 14.5642 = 50.7844$$
Therefore the fitted simple linear regression model is
$$\hat{Y}_i = 50.784 + 8.864 X_i \quad \text{for } i = 1, 2, \ldots, 7$$
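The least squares estimates of Example 5.14 can be reproduced from the closed-form expressions for the slope and intercept. The sketch below is a direct transcription of those formulas; the function name `fit_line` is an illustrative assumption, and small differences in the last decimal place relative to the hand calculation come from intermediate rounding there.

```python
def fit_line(xs, ys):
    """Least squares intercept and slope for the model Y = b0 + b1 * X."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    # Corrected sum of products and corrected sum of squares
    sxy = sum(x * y for x, y in zip(xs, ys)) - sum_x * sum_y / n
    sxx = sum(x * x for x in xs) - sum_x**2 / n
    slope = sxy / sxx
    intercept = sum_y / n - slope * sum_x / n
    return intercept, slope


# Data from Example 5.14
catalyst = [0.9, 1.4, 1.6, 1.7, 1.8, 2.0, 2.1]                 # grams
yield_pct = [60.54, 63.86, 63.76, 60.15, 66.66, 71.66, 70.81]  # percent

b0, b1 = fit_line(catalyst, yield_pct)

# Residuals e_i = Y_i - fitted value; with an intercept they sum to zero
residuals = [y - (b0 + b1 * x) for x, y in zip(catalyst, yield_pct)]
print(round(b0, 3), round(b1, 3))
```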
Example 5.15
A study was conducted to determine the relationship between bridge pier scour depths, $D$, and discharge intensity, $q$. A simple linear regression model of the form $D = \beta_0 q^{\beta_1}$ was proposed. The following data was obtained:
    D      q      D      q      D      q      D      q
35.67  52.51  12.62  11.99  20.73  25.56  11.48  13.22
31.71  52.04   9.76  10.33  11.24   7.39   8.71  11.21
17.84  22.58   8.54   8.36   8.80   6.71   4.94   2.61
14.63   8.51  13.87   8.24  12.44  13.28  10.07  13.21
12.71  11.15  11.60   6.29   9.20   6.49   5.50   1.62
13.72  13.75  19.51  22.03   9.76   6.42   7.13   7.72
12.88  14.31  11.89  11.15  11.42   7.78   6.85   4.68
19.35   9.20  13.72  18.59  11.22  11.85   4.00   3.40
11.92   8.60  11.89  13.66  10.47   9.78   4.07   4.00
14.98  11.43  12.80  15.99   9.48   7.48   4.08   3.18
Determine the simple linear regression model for this problem.
Solution
The proposed model is given by
$$D = \beta_0 q^{\beta_1}$$
The above model can be transformed into a simple linear regression model by taking the natural logarithm as follows:
$$\ln D = \ln\left(\beta_0 q^{\beta_1}\right)$$
$$\ln D = \ln \beta_0 + \ln q^{\beta_1}$$
$$\ln D = \ln \beta_0 + \beta_1 \ln q$$
Letting $Y_i = \ln D$, $\beta_0^* = \ln \beta_0$ and $X_i = \ln q$, we will obtain the following linear regression model
$$Y_i = \beta_0^* + \beta_1 X_i + \varepsilon_i, \quad i = 1, 2, \ldots, 40$$
The following data gives the new values for $Y_i = \ln D$ and $X_i = \ln q$
  Yi    Xi    Yi    Xi    Yi    Xi    Yi    Xi
3.57  3.96  2.54  2.48  3.03  3.24  2.44  2.58
3.46  3.95  2.28  2.34  2.42  2.00  2.16  2.42
2.88  3.12  2.14  2.12  2.17  1.90  1.60  0.96
2.68  2.14  2.63  2.11  2.52  2.59  2.31  2.58
2.54  2.41  2.45  1.84  2.22  1.87  1.70  0.48
2.62  2.62  2.97  3.09  2.28  1.86  1.96  2.04
2.56  2.66  2.48  2.41  2.44  2.05  1.92  1.54
2.96  2.22  2.62  2.92  2.42  2.47  1.39  1.22
2.48  2.15  2.48  2.61  2.35  2.28  1.40  1.39
2.71  2.44  2.55  2.77  2.25  2.01  1.41  1.16
By using the least squares method, the estimates for $\beta_0^*$ and $\beta_1$ are
$$\hat{\beta}_1 = \frac{\sum_{i=1}^{n} Y_i X_i - \dfrac{\left(\sum_{i=1}^{n} Y_i\right)\left(\sum_{i=1}^{n} X_i\right)}{n}}{\sum_{i=1}^{n} X_i^2 - \dfrac{\left(\sum_{i=1}^{n} X_i\right)^2}{n}} = \frac{230.09 - 218.4492}{226.25 - 207.1615} = \frac{11.6408}{19.0885} = 0.6098$$
and
$$\hat{\beta}_0^* = \bar{Y} - \hat{\beta}_1 \bar{X} = 2.3997 - 0.6098(2.2757) = 2.3997 - 1.3877 = 1.012$$
Here $\beta_0^* = \ln \beta_0$, so
$$\hat{\beta}_0 = e^{\hat{\beta}_0^*} = e^{1.012} = 2.7511$$
Therefore the fitted model is
$$\hat{D} = \hat{\beta}_0 q^{\hat{\beta}_1} = 2.7511\, q^{0.6098}$$
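Calculations
$$\bar{Y} = \frac{1}{40}\sum_{i=1}^{40} Y_i = \frac{3.57 + 3.46 + \cdots + 1.40 + 1.41}{40} = \frac{95.99}{40} = 2.3997$$
$$\bar{X} = \frac{1}{40}\sum_{i=1}^{40} X_i = \frac{3.96 + 3.95 + \cdots + 1.39 + 1.16}{40} = \frac{91.03}{40} = 2.2757$$
$$\sum_{i=1}^{40} Y_i X_i = (3.57)(3.96) + (3.46)(3.95) + \cdots + (1.40)(1.39) + (1.41)(1.16) = 230.09$$
$$\frac{\left(\sum_{i=1}^{40} Y_i\right)\left(\sum_{i=1}^{40} X_i\right)}{n} = \frac{(95.99)(91.03)}{40} = \frac{8737.9697}{40} = 218.4492$$
$$\sum_{i=1}^{40} X_i^2 = 3.96^2 + 3.95^2 + 3.12^2 + \cdots + 1.22^2 + 1.39^2 + 1.16^2 = 15.69 + 15.62 + 9.72 + \cdots + 1.50 + 1.92 + 1.34 = 226.25$$
$$\frac{\left(\sum_{i=1}^{40} X_i\right)^2}{n} = \frac{91.03^2}{40} = \frac{8286.4609}{40} = 207.1615$$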
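The log transformation of Example 5.15 can be sketched end to end: take natural logarithms of $D$ and $q$, fit a straight line by least squares, then back-transform the intercept. This is an illustration under stated assumptions: it works from the raw two-decimal data rather than the rounded logarithms in the table, so the estimates agree with the hand calculation only to about two decimal places, and the function name `fit_power_law` is hypothetical.

```python
import math

# Scour depth D and discharge intensity q from Example 5.15 (40 pairs)
depth = [
    35.67, 31.71, 17.84, 14.63, 12.71, 13.72, 12.88, 19.35, 11.92, 14.98,
    12.62, 9.76, 8.54, 13.87, 11.60, 19.51, 11.89, 13.72, 11.89, 12.80,
    20.73, 11.24, 8.80, 12.44, 9.20, 9.76, 11.42, 11.22, 10.47, 9.48,
    11.48, 8.71, 4.94, 10.07, 5.50, 7.13, 6.85, 4.00, 4.07, 4.08,
]
discharge = [
    52.51, 52.04, 22.58, 8.51, 11.15, 13.75, 14.31, 9.20, 8.60, 11.43,
    11.99, 10.33, 8.36, 8.24, 6.29, 22.03, 11.15, 18.59, 13.66, 15.99,
    25.56, 7.39, 6.71, 13.28, 6.49, 6.42, 7.78, 11.85, 9.78, 7.48,
    13.22, 11.21, 2.61, 13.21, 1.62, 7.72, 4.68, 3.40, 4.00, 3.18,
]


def fit_power_law(q_values, d_values):
    """Fit D = b0 * q**b1 by least squares on ln D = ln b0 + b1 * ln q."""
    xs = [math.log(q) for q in q_values]
    ys = [math.log(d) for d in d_values]
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys)) - sum_x * sum_y / n
    sxx = sum(x * x for x in xs) - sum_x**2 / n
    b1 = sxy / sxx
    log_b0 = sum_y / n - b1 * sum_x / n
    return math.exp(log_b0), b1      # back-transform the intercept


b0, b1 = fit_power_law(discharge, depth)
print(round(b0, 2), round(b1, 2))
```

Note that least squares is applied on the logarithmic scale, so the fitted curve minimizes squared errors in $\ln D$, not in $D$ itself.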