1743 Chapter 2 Data Description (B)
MEASURES OF DISPERSION
Measures of dispersion help us to understand the spread or variability of
a set of data. They give additional information for judging the reliability
of the measure of central tendency and help in comparing the dispersion
present in various samples.
Two data sets can have the same mean, the same median, or the same
mode and yet be very different in other respects.
Example: consider the heights (cm) of five employees from each of the
sales and production departments as shown:
The two groups have the same mean height, 190.4 cm, the same median
height, 193 cm, and the same modal height, 193 cm. Nonetheless, it is
clear that the two data sets differ. To describe this difference
quantitatively, we use a measure of dispersion.
The more spread out or dispersed the data, the larger is the range, the
quartile deviation, the variance and the standard deviation.
Range
Range is the difference between the largest and the smallest
observations in a data set.
For raw data: Range = largest value – smallest value
Solution: Range =
Example (Discrete)
The following table shows the daily outputs of 80 workers in a factory.
Determine the range.
Daily output (units) Number of workers
10 – 19 6
20 – 29 10
30 – 39 30
40 – 49 20
50 – 59 14
60 – 69 4
Solution: Range =
Example (Continuous)
Find the range of the following frequency distribution of the time
spent by the students.
Time Spent Number of students
0 –< 6 2
6 –< 12 4
12 –< 18 10
18 –< 24 12
24 –< 30 8
Solution: Range =
Advantage:
It is easy to understand and simple to calculate.
Disadvantage:
Since only the largest and the smallest values are considered, it can be
very much influenced by them especially if they are unrepresentative
extreme values.
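The disadvantage above can be illustrated with a minimal sketch. The income figures below are hypothetical and are not taken from the notes; the helper name `value_range` is likewise an assumption made for illustration.

```python
def value_range(data):
    """Range for raw data: largest value minus smallest value."""
    return max(data) - min(data)

# Hypothetical daily incomes (RM); the second list adds one extreme value.
typical = [20, 25, 26, 30, 32]
with_outlier = typical + [200]

print(value_range(typical))       # 12
print(value_range(with_outlier))  # 180
```

A single unrepresentative value changes the range from 12 to 180 even though the other five observations are unchanged.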
Quartiles divide the data into 4 equal parts. Thus, with the quartiles
known, we can say that a quarter of the observations lie below the first
quartile, a quarter lie above the third quartile, and half of the
observations lie between the two quartiles.
Computation of the quartiles:
(a) Raw data
1. Arrange the data into an array in ascending order of magnitude.
2. Locate the quartile items as:
Q1 = value of the $\frac{n+1}{4}$th item
Q3 = value of the $\frac{3(n+1)}{4}$th item
where n = number of items in a data set.
Example
The following array shows the daily income (in RM) of 10 factory workers:
20, 25, 26, 30, 32, 36, 38, 38, 40, and 50.
Solution:
(a) $n = 10$
Q1 = value of the $\frac{n+1}{4}$th item = value of the $\frac{10+1}{4}$th item
= value of the 2.75th item
= value of 2nd item + 0.75 × (3rd item − 2nd item)
= 25 + 0.75 × (26 − 25) = 25.75 (RM)
Q3 = value of the $\frac{3(n+1)}{4}$th item = value of the ____ th item
= value of the ____ th item
= value of ____ th item + ____ × ( ____ th item − ____ th item)
=
(b) Inter-quartile range = $Q_3 - Q_1$ =
(c) Quartile deviation =
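The positional rule for raw data can be sketched in code as follows. This is a minimal illustration, not a standard library routine: the function name `quartile_raw` is hypothetical, and it assumes the (n + 1)/4 rule with linear interpolation between neighbouring items, as in the worked Q1 above. The quartile deviation is taken as half the inter-quartile range.

```python
def quartile_raw(sorted_data, k):
    """k-th quartile (k = 1 or 3), located at the k(n + 1)/4-th item."""
    n = len(sorted_data)
    position = k * (n + 1) / 4      # 1-based position, may be fractional
    lower = int(position)           # whole part: the item to start from
    fraction = position - lower     # fractional part used to interpolate
    if lower < 1:                   # position falls before the first item
        return sorted_data[0]
    if lower >= n:                  # position falls at or after the last item
        return sorted_data[-1]
    below = sorted_data[lower - 1]
    above = sorted_data[lower]
    return below + fraction * (above - below)

# Daily income (RM) of the 10 factory workers in the example.
incomes = sorted([20, 25, 26, 30, 32, 36, 38, 38, 40, 50])
q1 = quartile_raw(incomes, 1)
q3 = quartile_raw(incomes, 3)
print(q1)  # 25.75
print("inter-quartile range:", q3 - q1)
print("quartile deviation:", (q3 - q1) / 2)
```

Other software may use a different positional convention, so results from a spreadsheet or statistics package can differ slightly from this rule.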
Advantages:
It can be computed even though the end values of the distribution are
not known, as with the open-ended classes.
It is not influenced by the extreme values.
Disadvantage:
It is not fully representative of a set of measurements as it is not based
on all the information available.
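For grouped data (linear interpolation):
$Q_1 = L_{Q_1} + \frac{c_{Q_1}}{f_{Q_1}}\left(\frac{n}{4} - \sum f_{Q_1-1}\right)$
where $L_{Q_1}$ = lower class boundary of the Q1 class
$c_{Q_1}$ = class size of the Q1 class
$f_{Q_1}$ = frequency of the Q1 class
$\sum f_{Q_1-1}$ = cumulative frequency of the class preceding the Q1 class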
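$Q_3 = L_{Q_3} + \frac{c_{Q_3}}{f_{Q_3}}\left(\frac{3n}{4} - \sum f_{Q_3-1}\right)$
where $L_{Q_3}$ = lower class boundary of the Q3 class
$c_{Q_3}$ = class size of the Q3 class
$f_{Q_3}$ = frequency of the Q3 class
$\sum f_{Q_3-1}$ = cumulative frequency of the class preceding the Q3 class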
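The interpolation formulas for grouped data can be sketched as below. The function name `grouped_quartile` and its argument layout (a list of class boundaries plus a list of frequencies) are assumptions made for illustration. The data are the production figures used in the example in this section, with class boundaries 12.5, 17.5, ..., 52.5.

```python
def grouped_quartile(boundaries, freqs, k):
    """Q_k = L + (c / f) * (k*n/4 - cumulative frequency before the class)."""
    n = sum(freqs)
    target = k * n / 4                  # position n/4 (k = 1) or 3n/4 (k = 3)
    cumulative = 0                      # cumulative frequency before class i
    for i, f in enumerate(freqs):
        if f > 0 and cumulative + f >= target:
            lower = boundaries[i]                       # lower class boundary
            size = boundaries[i + 1] - boundaries[i]    # class size
            return lower + size / f * (target - cumulative)
        cumulative += f
    raise ValueError("target position lies beyond the last class")

boundaries = [12.5, 17.5, 22.5, 27.5, 32.5, 37.5, 42.5, 47.5, 52.5]
days = [2, 22, 10, 14, 3, 4, 6, 1]      # number of days in each class

q1 = grouped_quartile(boundaries, days, 1)
q3 = grouped_quartile(boundaries, days, 3)
print(round(q1, 2))  # 20.57
print(round(q3, 2))  # 31.96
```

These values are close to the ogive readings of 20.5 and 32 units quoted in the example; reading a graph by eye is expected to be slightly less precise than the formula.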
Example
The following frequency distribution shows the daily production level in a
production line.
Production (units) Number of days
13 – 17 2
18 – 22 22
23 – 27 10
28 – 32 14
33 – 37 3
38 – 42 4
43 – 47 6
48 – 52 1
Calculate the quartile deviation using
(a) the linear interpolation method;
(b) an ogive.
Solution:
Production (units) Number of days Cumulative frequency
0
13 – 17 2 2
18 – 22 22 24
23 – 27 10 34
28 – 32 14 48
33 – 37 3 51
38 – 42 4 55
43 – 47 6 61
48 – 52 1 62
$n = 62$
Q1 = value of the $\frac{62}{4}$th item = value of the 15.5th item
Q3 = value of the ____ th item = value of the ____ th item
Q3 class boundaries:
Q3 =
Quartile deviation =
(b)
[Figure: '<' ogive, "Production at a production line for a period of 62 days". Horizontal axis: Production (units), class boundaries 12.5 to 52.5; vertical axis: Cumulative frequency, 0 to 70.]
From the ‘<’ ogive, Q1 = 20.5 units: on 25% of the days production was at
most 20.5 units, and on the other 75% of the days it was at least 20.5 units.
From the ‘<’ ogive, Q3 = 32 units: on 75% of the days production was at
most 32 units, and on the other 25% of the days it was at least 32 units.
Quartile deviation $= \frac{Q_3 - Q_1}{2} = \frac{32 - 20.5}{2} = 5.75$ units
Range based on percentiles
Percentiles are the summary measures that divide a ranked data set into 100
equal parts. There are 99 percentiles in a ranked data set.
a) Raw data
Example: Find the P10 and P90 for the following raw data.
63, 105, 30, 43, 53, 73, 65, 77, 89, 70, 68, 47, 38, 34, 41, 80, 60, 54, 59
Solution:
Arrange the data in ascending order:
30, 34, 38, 41, 43, 47, 53, 54, 59, 60, 63, 65, 68, 70, 73, 77, 80, 89, 105
P10 =
P90 =
b) Ungrouped frequency distribution
Example: Find the P10 and P90 for the following distribution.
Marks 10 20 30 40 50 60
Number of students 3 9 20 8 5 4
Cumulative frequency 3 12 32 40 45 49
Solution:
P10 =
P90 =
c) Grouped frequency distribution
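$P_k = L_{P_k} + \frac{c_{P_k}}{f_{P_k}}\left(\frac{nk}{100} - \sum f_{P_k-1}\right)$
where
$L_{P_k}$ = lower class boundary of the percentile class
$c_{P_k}$ = class size of the percentile class
$f_{P_k}$ = frequency of the percentile class
$\sum f_{P_k-1}$ = cumulative frequency of the class preceding the percentile class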
Example
The following frequency distribution shows the daily production level in a
production line. Compute P10 and P90 using (a) formula and (b) ogive.
Production (units) Number of days
13 – 17 2
18 – 22 22
23 – 27 10
28 – 32 14
33 – 37 3
38 – 42 4
43 – 47 6
48 – 52 1
Solution:
Production (units) Number of days Cumulative frequency
0
13 – 17 2 2
18 – 22 22 24
23 – 27 10 34
28 – 32 14 48
33 – 37 3 51
38 – 42 4 55
43 – 47 6 61
48 – 52 1 62
$n = 62$
P10 = value of the $\frac{10(62)}{100}$th item = value of the 6.2th item
[Figure: ogive for the production data. Horizontal axis: Production (units), class boundaries 12.5 to 52.5; vertical axis: cumulative frequency, 0 to 60.]
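The grouped percentile formula can be checked with a short sketch. The function name `grouped_percentile` is hypothetical; it builds the cumulative frequencies with `itertools.accumulate` and locates the percentile class with `bisect.bisect_left`, which is one convenient way to do the lookup and not the only one. It assumes 0 < k < 100 and uses the production data from the example.

```python
from bisect import bisect_left
from itertools import accumulate

def grouped_percentile(boundaries, freqs, k):
    """P_k = L + (c / f) * (n*k/100 - cumulative frequency before the class)."""
    cum = list(accumulate(freqs))       # cumulative frequencies
    n = cum[-1]
    target = n * k / 100                # position nk/100
    i = bisect_left(cum, target)        # first class whose cum. freq. >= target
    before = cum[i - 1] if i > 0 else 0
    size = boundaries[i + 1] - boundaries[i]
    return boundaries[i] + size / freqs[i] * (target - before)

boundaries = [12.5, 17.5, 22.5, 27.5, 32.5, 37.5, 42.5, 47.5, 52.5]
days = [2, 22, 10, 14, 3, 4, 6, 1]

p10 = grouped_percentile(boundaries, days, 10)
p90 = grouped_percentile(boundaries, days, 90)
print(round(p10, 2))  # 18.45
print(round(p90, 2))  # 43.17
print("10-90 percentile range:", round(p90 - p10, 2))
```

The difference P90 − P10 is a range that ignores the lowest and highest 10% of the observations, so it is less affected by extreme values than the ordinary range.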
Standard Deviation
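Square deviations:
$(x_1 - \bar{x})^2, (x_2 - \bar{x})^2, \ldots, (x_n - \bar{x})^2$
Mean-square deviation: $\frac{\sum (x - \bar{x})^2}{n}$
Root-mean-square deviation: $\sqrt{\frac{\sum (x - \bar{x})^2}{n}}$
Alternatively:
$\sigma = \sqrt{\frac{\sum x^2}{N} - \left(\frac{\sum x}{N}\right)^2}$, $s = \sqrt{\frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1}}$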
The standard deviation computed from population data is denoted by the
symbol σ (pronounced sigma); the standard deviation computed from
sample data is denoted by s.
Variance
Example
Find the standard deviation and variance for the following data:
2, 12, 7, 5, 9
Solution
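$N = 5$
$\sum x = 2 + 12 + 7 + 5 + 9 = 35$
$\sum x^2 = 2^2 + 12^2 + 7^2 + 5^2 + 9^2 = 303$
Population standard deviation,
$\sigma = \sqrt{\frac{\sum x^2}{N} - \left(\frac{\sum x}{N}\right)^2} = \sqrt{\frac{303}{5} - \left(\frac{35}{5}\right)^2} = \sqrt{11.6} \approx 3.41$
Population variance, $\sigma^2 = 3.41^2 \approx 11.6$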
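The calculation above can be reproduced with a short sketch that compares the definitional form (mean of squared deviations) with the computational form, and cross-checks both against Python's `statistics` module. Treating the same five values as a sample, in the last lines, is done here purely to show the n − 1 divisor; the notes treat this data set as a population.

```python
from math import sqrt
import statistics

data = [2, 12, 7, 5, 9]
n = len(data)
sum_x = sum(data)                      # 35
sum_x2 = sum(x * x for x in data)      # 303
mean = sum_x / n

# Population variance: definitional form and computational form.
pop_var_def = sum((x - mean) ** 2 for x in data) / n
pop_var = sum_x2 / n - (sum_x / n) ** 2

print(round(pop_var, 2))               # 11.6
print(round(sqrt(pop_var), 2))         # 3.41

# If the same values were a sample, the divisor would be n - 1.
sample_var = (sum_x2 - sum_x ** 2 / n) / (n - 1)
print(round(sample_var, 2))            # 14.5
```

Both forms give the same population variance; the computational form only needs the two running totals, which is convenient for hand calculation.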
Example
During a particular summer month, the number of central air-conditioning
units sold by a random sample of 5 salespersons from a heating and air-
conditioning firm were as follows:
8, 11, 5, 12, 8
Solution
n = 5,
$\sum x =$
$\sum x^2 =$
$s = \sqrt{\frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1}} =$
Sample variance, $s^2 =$
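$s = \sqrt{\frac{\sum f(x - \bar{x})^2}{n - 1}}$
where $n = \sum f$
Alternatively:
Population standard deviation,
$\sigma = \sqrt{\frac{\sum fx^2}{\sum f} - \left(\frac{\sum fx}{\sum f}\right)^2}$
Sample standard deviation,
$s = \sqrt{\frac{\sum fx^2 - \frac{(\sum fx)^2}{n}}{n - 1}}$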
Example
Find the mean and standard deviation of the following frequency distribution.
Class interval Frequency
0–6 2
6 – 12 4
12 – 18 10
18 – 24 12
24 – 30 8
30 – 36 4
Solution
Class interval    f     Class mark, x    fx     fx²
0 – 6             2     3                6      18
6 – 12            4     9                36     324
12 – 18           10    15               150    2250
18 – 24           12    21               252    5292
24 – 30           8     27               216    5832
30 – 36           4     33               132    4356
Total             40                     792    18072
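Population mean, $\mu = \frac{\sum fx}{\sum f} = \frac{792}{40} = 19.8$
Population standard deviation,
$\sigma = \sqrt{\frac{\sum fx^2}{\sum f} - \left(\frac{\sum fx}{\sum f}\right)^2} = \sqrt{\frac{18072}{40} - \left(\frac{792}{40}\right)^2} = \sqrt{59.76} \approx 7.73$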
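The table-based calculation can be mirrored in a few lines of code. This is a sketch under the same assumption the notes make, namely that every observation in a class is represented by its class mark; the variable names are chosen for illustration.

```python
from math import sqrt

classes = [(0, 6), (6, 12), (12, 18), (18, 24), (24, 30), (30, 36)]
freqs = [2, 4, 10, 12, 8, 4]

# Class mark = midpoint of each class interval.
marks = [(low + high) / 2 for low, high in classes]

sum_f = sum(freqs)                                        # 40
sum_fx = sum(f * x for f, x in zip(freqs, marks))         # 792
sum_fx2 = sum(f * x * x for f, x in zip(freqs, marks))    # 18072

mean = sum_fx / sum_f
sd = sqrt(sum_fx2 / sum_f - mean ** 2)    # population formula

print(round(mean, 2))  # 19.8
print(round(sd, 2))    # 7.73
```

Because the class marks stand in for the raw values, the result is an approximation to the standard deviation of the underlying observations.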
Example
The output distribution for a sample of 100 workers in BB Company is shown
below:
Output (units) Number of workers
21 – 25 10
26 – 30 35
31 – 35 16
36 – 40 14
41 – 45 12
46 – 50 10
51 – 55 3
Calculate the mean and the standard deviation.
Solution
Output (units)    f      Class mark, x    fx      fx²
21 – 25           10     23               230
26 – 30           35     28               980
31 – 35           16     33               528
36 – 40           14     38               532
41 – 45           12     43               516
46 – 50           10     48               480
51 – 55           3      53               159
Total             100                     3425
Sample mean, $\bar{x} = \frac{\sum fx}{\sum f} = \frac{3425}{100} = 34.25$ units
$s = \sqrt{\frac{\sum fx^2 - \frac{(\sum fx)^2}{n}}{n - 1}} =$
Example
                            Distribution 1    Distribution 2
Standard deviation          27 km             RM 4.6
Mean                        450 km            RM 10
Coefficient of variation    6%                46%
Thus, the values in distribution 2 are more variable than the values in
distribution 1.
Example
Typist A can type 50 words per minute with standard deviation of 5 while
typist B can type 150 words per minute with standard deviation of 10. Which
typist is more consistent in her work?
Solution
The standard deviation of typist B is twice that of typist A, but B also types
at three times the speed of A. To take both facts into account, the
coefficient of variation is used.
Coefficient of variation for A $= \frac{5}{50} \times 100 = 10\%$
Coefficient of variation for B $= \frac{10}{150} \times 100 = 6.67\%$
The results show that the typing of typist B is more consistent than that
of typist A.
When making comparisons, the rule of thumb is that the larger the
percentage, the greater the relative variation.
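The comparison can be expressed as a small sketch. The function name `coefficient_of_variation` is an illustrative choice; it computes the standard deviation as a percentage of the mean, which is the calculation used in both examples above.

```python
def coefficient_of_variation(sd, mean):
    """Standard deviation expressed as a percentage of the mean."""
    return sd / mean * 100

# Typing speeds in words per minute (mean) with their standard deviations.
cv_a = coefficient_of_variation(5, 50)
cv_b = coefficient_of_variation(10, 150)

print(round(cv_a, 2))  # 10.0
print(round(cv_b, 2))  # 6.67
print("more consistent:", "A" if cv_a < cv_b else "B")  # more consistent: B
```

Because the coefficient of variation has no units, it can also compare distributions measured in different units, such as kilometres and ringgit.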
Coefficient of Skewness
The term skewness is used to describe the shape of a frequency
distribution.
If the histogram of a frequency distribution is drawn, the distribution is said
to be skewed if the peak of the histogram lies to either side of the centre
of the distribution. The terms positive and negative skewness are used to
describe the direction of the skewness.
If mean = median = mode, the distribution is said to be symmetrical;
otherwise it is asymmetrical, or skewed.
2 types of asymmetrical frequency distribution:
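$S_k = \frac{3(\text{Mean} - \text{Median})}{\text{SD}}$
For population data, $S_k = \frac{3(\mu - \tilde{\mu})}{\sigma}$
For sample data, $S_k = \frac{3(\bar{x} - \tilde{x})}{s}$
Range of $S_k$ is $[-3, 3]$
For a symmetrical distribution, $S_k = 0$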
Example:
If $\bar{x} = 30.9$, $\tilde{x} = 28.8$ and $s = 13.23$, find $S_k$ and interpret the value.
Solution:
$S_k = \frac{3(30.9 - 28.8)}{13.23} = 0.4762$
Since $S_k > 0$, the distribution is positively skewed.
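A minimal sketch of the skewness calculation follows. The function names `skewness_coefficient` and `describe_skew` are hypothetical, and the wording of the returned labels is an assumption made for illustration.

```python
def skewness_coefficient(mean, median, sd):
    """S_k = 3 * (mean - median) / standard deviation."""
    return 3 * (mean - median) / sd

def describe_skew(sk):
    """Label the direction of skewness from the sign of S_k."""
    if sk > 0:
        return "positively skewed"
    if sk < 0:
        return "negatively skewed"
    return "symmetrical"

sk = skewness_coefficient(30.9, 28.8, 13.23)
print(round(sk, 4))       # 0.4762
print(describe_skew(sk))  # positively skewed
```

In practice a computed value is rarely exactly zero, so a value close to zero is usually read as approximately symmetrical.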
BAMS1743 QUANTITATIVE METHODS
TUTORIAL 3 (Measures of dispersion)
2. The following array shows the amounts spent (in RM) by a random
sample of 15 students at a primary school canteen:
1.50, 1.50, 1.80, 1.80, 1.80, 1.90, 1.90, 2.50, 2.90, 2.90, 3.40,
3.50, 3.80, 4.00, 4.10.
Determine
(a) the quartiles,
(b) the inter-quartile range,
(c) the standard deviation.
4. A company owns two garages, A and B. In garage A, a representative
sample of 200 consumers’ purchases was taken. The results were as
follows:
5. The following table shows the projected population for males and
females in a village.
(a) Calculate the median, mean and standard deviation for each
distribution.
6. (a) Explain the meaning of absolute and relative measures of
dispersion and compare their use.
Answers:
1. (a) 13 min (b) 6.7 min, 14.6 min (c) 5 min, 15.25 min (d) 5.125 min
2. (a) RM 1.80, RM 3.50 (b) RM 1.70 (c) RM 0.95
3. (a) 10.587 sales, 19.833 sales, 4.62 sales, 6.07 sales
4. (a) 5.6 gallons, 2.58 gallons. (b) Garage B (CV = 55%)
5. (a) Males: 28.98 years, 32.54 years, 21.85 years; Females: 33.19 years,
35.89 years, 23.39 years (b) Males: CV = 67.2%; Females: CV =
65.2%. (c) Males: SK = 0.4888; Females: SK = 0.3463.
6. (b) CV = 7.70% (c) RM260, RM23.9 (d) Admin Dept: CV = 9.19%; hence
the earnings in administration department are more variable than the
earnings in production department.
7. (b) 66 (RM’000), 56 (RM’000), 82 (RM’000), 13 (RM’000) (c) Prices 112
(RM’000)