Chapter 2


A meteorologist interested in the consistency of temperatures in three cities during a given week collected the following data. The temperatures for the five days of the week in the three cities were:

City 1: 25 24 23 26 17
City 2: 22 21 24 22 20
City 3: 32 27 35 24 28

Which city has the most consistent temperature, based on these data?
Expert Answer
Step 1

We are given the temperatures for the five days of the week in the three cities. To find which city has the most consistent temperature based on the given data, we use the coefficient of variation (C.V.).

Coefficient of Variation (C.V.): defined as the ratio of the standard deviation to the mean, usually expressed as a percentage:

$$C.V. = \frac{S}{\bar{X}} \times 100\%$$

where
S: sample standard deviation, $S = \sqrt{\dfrac{\sum (X_i - \bar{X})^2}{n-1}}$
$\bar{X}$: mean, $\bar{X} = \dfrac{\sum X_i}{n}$
n: sample size.

The distribution with the smaller C.V. is said to be less variable, or more consistent.

Step 2

For City 1 the data are:

City 1: 25, 24, 23, 26, 17

Then

$$\bar{X}_1 = \frac{\sum X_i}{n} = \frac{115}{5} = 23$$

$$S_1 = \sqrt{\frac{\sum (X_i - \bar{X}_1)^2}{n-1}} = \sqrt{\frac{(25-23)^2 + (24-23)^2 + (23-23)^2 + (26-23)^2 + (17-23)^2}{5-1}} = \sqrt{\frac{50}{4}} = \sqrt{12.5} = 3.5355$$

For City 2: 22, 21, 24, 22, 20

Then

$$\bar{X}_2 = \frac{\sum X_i}{n} = \frac{109}{5} = 21.8$$

$$S_2 = \sqrt{\frac{(22-21.8)^2 + (21-21.8)^2 + (24-21.8)^2 + (22-21.8)^2 + (20-21.8)^2}{5-1}} = \sqrt{\frac{8.8}{4}} = \sqrt{2.2} = 1.4832$$

For City 3: 32, 27, 35, 24, 28

Then

$$\bar{X}_3 = \frac{\sum X_i}{n} = \frac{146}{5} = 29.2$$

$$S_3 = \sqrt{\frac{(32-29.2)^2 + (27-29.2)^2 + (35-29.2)^2 + (24-29.2)^2 + (28-29.2)^2}{5-1}} = \sqrt{\frac{74.8}{4}} = \sqrt{18.7} = 4.3243$$

Step 3

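Now the coefficient of variation for each city is:

$$C.V_1 = \frac{S_1}{\bar{X}_1} \times 100\% = \frac{3.5355}{23} \times 100\% = 15.3717\%$$

$$C.V_2 = \frac{S_2}{\bar{X}_2} \times 100\% = \frac{1.4832}{21.8} \times 100\% = 6.8036\%$$

$$C.V_3 = \frac{S_3}{\bar{X}_3} \times 100\% = \frac{4.3243}{29.2} \times 100\% = 14.8092\%$$

Since City 2 has the smallest C.V., City 2 has the most consistent temperature.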
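As a cross-check on the hand calculation, here is a minimal Python sketch that recomputes the coefficient of variation for the three cities. It assumes only the standard-library `statistics` module; the names `temps` and `coefficient_of_variation` are illustrative and not part of the original solution.

```python
from statistics import mean, stdev

# Daily temperatures for the five days, as given in the question.
temps = {
    "City 1": [25, 24, 23, 26, 17],
    "City 2": [22, 21, 24, 22, 20],
    "City 3": [32, 27, 35, 24, 28],
}

def coefficient_of_variation(values):
    """Sample standard deviation (n - 1 divisor) over the mean, in percent."""
    return stdev(values) / mean(values) * 100

cv = {city: coefficient_of_variation(v) for city, v in temps.items()}
most_consistent = min(cv, key=cv.get)  # smallest C.V. = most consistent

for city, value in cv.items():
    print(f"{city}: C.V. = {value:.2f}%")
print("Most consistent:", most_consistent)
```

The values agree with the hand results up to rounding, because the hand calculation rounds S to four decimals before dividing.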

CHAPTER TWO

2. Summarizing Data

2.1 Measures of Central Tendency

Objectives:
• To comprehend the data easily
• To facilitate comparison
• To make further statistical analysis

When we want to make comparisons between groups of numbers, it is good to have a single value that is considered to be a good representative of each group. This single value is called the average of the group.
An average which is representative is called a typical average, and an average which is not representative and has only a theoretical value is called a descriptive average.
A typical average should have the following characteristics:
• It should be rigidly defined.
• It should be based on all observations under investigation.
• It should be affected as little as possible by extreme observations.
• It should be capable of further algebraic treatment.
• It should be affected as little as possible by fluctuations of sampling.
• It should be easy to calculate and simple to understand.

2.2 Types of Measures of Central Tendency

Measures of central tendency give us information about the location of the center of the distribution of data values. A single value that approximately describes the characteristics of the entire mass of data is called a measure of central tendency.
There are several different measures of central tendency; each has its own advantages and disadvantages. Among them:
• Mean (arithmetic, weighted, geometric and harmonic)
• Mode
• Median
• Quantiles (quartiles, deciles and percentiles)
The choice among these averages depends on which best fits the property under discussion.

2.2.1 Mean
The Arithmetic Mean
• The arithmetic mean is defined as the sum of the magnitudes of the items divided by the number of items. The mean of X1, X2, X3, …, Xn is denoted by A.M, m or x̄ and is given by:

$$\bar{x} = \frac{x_1 + x_2 + \dots + x_n}{n} = \frac{\sum_{i=1}^{n} x_i}{n}, \quad \text{where } n \text{ is the sample size}$$

• If we take an entire population, the mean is denoted by μ and is given by:

$$\mu = \frac{X_1 + X_2 + \dots + X_N}{N} = \frac{\sum_{i=1}^{N} X_i}{N}, \quad \text{where } N \text{ is the population size}$$

• If X1 occurs f1 times, X2 occurs f2 times, …, and Xk occurs fk times, then the mean will be

$$\bar{X} = \frac{\sum_{i=1}^{k} f_i x_i}{\sum_{i=1}^{k} f_i}, \quad \text{where } k \text{ is the number of classes and } \sum_{i=1}^{k} f_i = n$$

Example: Calculate the arithmetic mean of the sample of numbers of students in 10 classes:
50 42 48 60 58 54 50 42 50 42

$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{50+42+48+60+58+54+50+42+50+42}{10} = \frac{496}{10} = 49.6 \approx 50$$

In this case there are three 42's, one 48, three 50's, one 54, one 58 and one 60. The number of times each number occurs is called its frequency, and the frequency is usually denoted by f. This information can be written in a table as follows.

Value, xi        42   48   50   54   58   60
Frequency, fi     3    1    3    1    1    1
xi fi           126   48  150   54   58   60

The formula for the arithmetic mean for data of this type is

$$\bar{x} = \frac{x_1 f_1 + x_2 f_2 + \dots + x_k f_k}{f_1 + f_2 + \dots + f_k} = \frac{\sum_{i=1}^{k} x_i f_i}{\sum_{i=1}^{k} f_i}$$

In this case we have:

$$\bar{x} = \frac{42 \times 3 + 48 \times 1 + 50 \times 3 + 54 \times 1 + 58 \times 1 + 60 \times 1}{3+1+3+1+1+1} = \frac{126+48+150+54+58+60}{10} = \frac{496}{10} = 49.6 \approx 50$$

The mean number of students in the ten classes is approximately 50.
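The two routes to the mean in this example (raw values and frequency table) can be sketched in Python as follows. This is only an illustration; the variable names are assumptions, and `collections.Counter` is used to build the frequency table.

```python
from collections import Counter

# Numbers of students in the 10 classes, from the example.
class_sizes = [50, 42, 48, 60, 58, 54, 50, 42, 50, 42]

# Route 1: sum of the items divided by the number of items.
raw_mean = sum(class_sizes) / len(class_sizes)

# Route 2: build the frequency table {value: frequency},
# then apply sum(f_i * x_i) / sum(f_i).
freq = Counter(class_sizes)
freq_mean = sum(x * f for x, f in freq.items()) / sum(freq.values())

print(sorted(freq.items()))
print(raw_mean, freq_mean)
```

Both routes give the same value because the frequency table is just a regrouping of the same ten observations.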
Arithmetic Mean for Grouped Data
If data are given in the form of a continuous frequency distribution, then the mean is obtained as follows:

$$\bar{x} = \frac{\sum_{i=1}^{k} f_i x_i}{\sum_{i=1}^{k} f_i}$$

where xi is the class mark of the ith class and fi is the frequency of the ith class.
Example: The following frequency table gives the height (in inches) of 100 students in a college.

Class interval (CI)   60-62   62-64   64-66   66-68   68-70   70-72   Total
Frequency (f)             5      18      42      20       8       7     100

Calculate the mean.

Solution:
Using the formula above, let us calculate the class marks (mid-points) and the products fi xi and collect them in a table for convenience.

Class interval (CI)   60-62   62-64   64-66   66-68   68-70   70-72   Total
Frequency (fi)            5      18      42      20       8       7     100
Mid-point (xi)           61      63      65      67      69      71
fi xi                   305    1134    2730    1340     552     497    6558

Substituting these values, with $\sum_{i=1}^{6} f_i = 100$, we get

$$\bar{x} = \frac{\sum_{i=1}^{k} x_i f_i}{\sum_{i=1}^{k} f_i} = \frac{6558}{100} = 65.58$$

The mean height of the students is 65.58 inches.
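The grouped-data calculation can be sketched in Python as below. The representation of each class as a `(lower, upper)` tuple is an assumption made for illustration; the class mark is taken as the mid-point of the interval, as in the example.

```python
# Class intervals and frequencies from the height example.
intervals = [(60, 62), (62, 64), (64, 66), (66, 68), (68, 70), (70, 72)]
freqs = [5, 18, 42, 20, 8, 7]

# Class mark (mid-point) of each interval.
midpoints = [(lower + upper) / 2 for lower, upper in intervals]

# Grouped mean: sum(f_i * x_i) / sum(f_i).
total_fx = sum(f * x for f, x in zip(freqs, midpoints))
grouped_mean = total_fx / sum(freqs)

print(midpoints)
print(total_fx, grouped_mean)
```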


Exercises:
1. The marks of 75 students are summarized in the following frequency distribution:

Marks    No. of students
40-44    7
45-49    10
50-54    22
55-59    f4
60-64    f5
65-69    6
70-74    3

If 20% of the students have marks between 55 and 59:
I. Find the missing frequencies f4 and f5.
II. Find the mean.
Special Properties of the Arithmetic Mean
1. The sum of the deviations of a set of items from their mean is always zero, i.e.

$$\sum_{i=1}^{n} (x_i - \bar{x}) = 0$$

2. The sum of the squared deviations of a set of items from their mean is the minimum, i.e. for any constant A,

$$\sum_{i=1}^{n} (x_i - \bar{x})^2 \le \sum_{i=1}^{n} (x_i - A)^2$$

with equality only when A = x̄.

3. If the mean of x1, x2, …, xn is x̄, then
a) The mean of x1 ± k, x2 ± k, …, xn ± k will be x̄ ± k.
b) The mean of kx1, kx2, …, kxn will be k x̄.

4. If x̄1 is the mean of n1 observations, x̄2 is the mean of n2 observations, …, and x̄k is the mean of nk observations, then the mean of all the observations in all groups, often called the combined mean, is given by:

$$\bar{x}_c = \frac{\bar{x}_1 n_1 + \bar{x}_2 n_2 + \dots + \bar{x}_k n_k}{n_1 + n_2 + \dots + n_k} = \frac{\sum_{i=1}^{k} \bar{x}_i n_i}{\sum_{i=1}^{k} n_i}$$

Example: If the mean mark of one class of 50 students is 30 and the mean mark of another class of 100 students is 40, what is the mean of all 150 students?
Solution: Based on the above formula, it is (50*30 + 100*40)/(50 + 100) = 5500/150 ≈ 36.7.
5. If a wrong figure has been used when calculating the mean, the correct mean can be obtained without repeating the whole process using:

$$\text{Correct mean} = \text{Wrong mean} + \frac{\text{Correct value} - \text{Wrong value}}{n}$$

where n is the total number of observations.
Example: The average weight of 10 students was calculated to be 65 kg. Later it was discovered that one weight was misread as 40 kg instead of 80 kg. Calculate the correct average weight.

Solution:

$$\text{Correct mean} = 65 + \frac{80 - 40}{10} = 65 + 4 = 69 \text{ kg}$$
6. The effect of transforming the original series on the mean.
a) If a constant k is added to (or subtracted from) every observation, then the new mean will be the old mean + k (or the old mean − k) respectively.
b) If every observation is multiplied by a constant k, then the new mean will be k × old mean.

Example:
1. The mean of n Tetracycline capsules X1, X2, …, Xn is known to be 12 gm. A new set of capsules of another drug is obtained by the linear transformation Yi = 2Xi − 0.5 (i = 1, 2, …, n). What will be the mean of the new set of capsules?
Solution:
New mean = 2 × old mean − 0.5 = 2 × 12 − 0.5 = 23.5
2. The mean of a set of numbers is 500. a) If 10 is added to each of the numbers in the set, what will be the mean of the new set? b) If each of the numbers in the set is multiplied by −5, what will be the mean of the new set?
Solutions:
a. New mean = old mean + 10 = 500 + 10 = 510
b. New mean = −5 × old mean = −5 × 500 = −2500
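Properties 4 to 6 lend themselves to small helper functions. The sketch below is illustrative only: the function names `combined_mean`, `corrected_mean` and `transformed_mean` are hypothetical, and the inputs are the figures from the worked examples above.

```python
def combined_mean(means, sizes):
    """Property 4: mean of all observations from the group means and sizes."""
    return sum(m * n for m, n in zip(means, sizes)) / sum(sizes)

def corrected_mean(wrong_mean, wrong_value, correct_value, n):
    """Property 5: fix a mean that was computed with one misread value."""
    return wrong_mean + (correct_value - wrong_value) / n

def transformed_mean(old_mean, k, a=0.0):
    """Property 6: mean of y_i = k * x_i + a."""
    return k * old_mean + a

class_mean = combined_mean([30, 40], [50, 100])   # two classes of students
weight_mean = corrected_mean(65, 40, 80, 10)      # 40 kg misread for 80 kg
capsule_mean = transformed_mean(12, 2, -0.5)      # Y = 2X - 0.5

print(round(class_mean, 1), weight_mean, capsule_mean)
```

The same `transformed_mean` helper covers the second example: `transformed_mean(500, 1, 10)` for adding 10 and `transformed_mean(500, -5)` for multiplying by −5.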

Merits and Demerits of Arithmetic Mean

Merits:
• It is rigidly defined.
• It is based on all observations.
• It is suitable for further mathematical treatment.
• It is a stable average, i.e. it is comparatively little affected by fluctuations of sampling.
• It is easy to calculate and simple to understand.
Demerits:
• It is affected by extreme observations.
• It cannot be used in the case of open-end classes.
• It cannot be determined by the method of inspection.
• It cannot be used when dealing with qualitative characteristics, such as intelligence, honesty, beauty.
• It can be a number which does not exist in the series.
• Sometimes it leads to a wrong conclusion if the details of the data from which it is obtained are not available.
• It gives high weight to high extreme values and less weight to low extreme values.

Weighted Mean

When different items are to be given different importance, a weighted mean is appropriate. Weights are assigned to each item in proportion to its relative importance.
Let X1, X2, …, Xn be the values of the items of a series and W1, W2, …, Wn their corresponding weights. Then the weighted mean, denoted X̄w, is defined as:

$$\bar{X}_w = \frac{\sum_{i=1}^{n} X_i W_i}{\sum_{i=1}^{n} W_i}$$

Example:
A student obtained the following percentages in an examination: Statistics 60, Biology 75, Mathematics 63, Physics 59 and Chemistry 55. Find the student's weighted arithmetic mean if weights 1, 2, 1, 3, 3 respectively are allotted to the subjects.
Solution:

$$\bar{X}_w = \frac{\sum_{i=1}^{n} X_i W_i}{\sum_{i=1}^{n} W_i} = \frac{60 \times 1 + 75 \times 2 + 63 \times 1 + 59 \times 3 + 55 \times 3}{1+2+1+3+3} = \frac{615}{10} = 61.5$$
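A minimal Python sketch of the weighted mean, using the marks and weights from the example; the function name `weighted_mean` is an assumption made for illustration.

```python
def weighted_mean(values, weights):
    """sum(X_i * W_i) / sum(W_i)."""
    return sum(x * w for x, w in zip(values, weights)) / sum(weights)

# Statistics, Biology, Mathematics, Physics, Chemistry.
marks = [60, 75, 63, 59, 55]
weights = [1, 2, 1, 3, 3]

wm = weighted_mean(marks, weights)
simple = sum(marks) / len(marks)  # unweighted mean, for comparison

print(wm, simple)
```

The weighted mean is lower than the simple mean here because the heaviest weights fall on the two lowest marks.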

The Geometric Mean

If the observed values are measured as ratios, proportions or percentages and the series of observations contains one or more unusually large values, the geometric mean gives a better measure of central tendency than the other means.
• The geometric mean of a set of n observations is the nth root of their product.
• The geometric mean of X1, X2, X3, …, Xn is denoted by G.M and given by:

$$G.M = \sqrt[n]{X_1 \times X_2 \times \dots \times X_n}$$

• Taking the logarithms of both sides:

$$\log(G.M) = \log\left(\sqrt[n]{X_1 \times X_2 \times \dots \times X_n}\right) = \log\left(X_1 \times X_2 \times \dots \times X_n\right)^{1/n}$$

$$\Rightarrow \log(G.M) = \frac{1}{n}\log(X_1 \times X_2 \times \dots \times X_n) = \frac{1}{n}\left(\log X_1 + \log X_2 + \dots + \log X_n\right)$$

$$\Rightarrow \log(G.M) = \frac{1}{n}\sum_{i=1}^{n} \log X_i$$

⇒ The logarithm of the G.M of a set of observations is the arithmetic mean of their logarithms.

$$G.M = \text{antilog}\left(\frac{1}{n}\sum_{i=1}^{n} \log X_i\right)$$

Example: Find the G.M of the numbers 2, 4, 8.
Solution:

$$G.M = \sqrt[n]{X_1 \times X_2 \times \dots \times X_n} = \sqrt[3]{2 \times 4 \times 8} = \sqrt[3]{64} = 4$$
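The logarithm form of the geometric mean can be sketched in Python as below. Natural logarithms are used here as an assumption (any base works, provided the antilog uses the same base), and the function name is illustrative. The sketch requires strictly positive values, since the logarithm is otherwise undefined.

```python
import math

def geometric_mean(values):
    """G.M = antilog of the arithmetic mean of the logarithms."""
    if any(v <= 0 for v in values):
        raise ValueError("geometric mean needs strictly positive values")
    mean_log = sum(math.log(v) for v in values) / len(values)
    return math.exp(mean_log)  # exp is the antilog of the natural log

gm = geometric_mean([2, 4, 8])
direct = (2 * 4 * 8) ** (1 / 3)  # nth root of the product, for comparison

print(gm, direct)
```

Working through logarithms avoids forming a very large product when the series is long.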
The Harmonic Mean
The harmonic mean of X1, X2, X3, …, Xn is denoted by H.M and given by:

$$H.M = \frac{n}{\sum_{i=1}^{n} \dfrac{1}{X_i}}$$

This is called the simple harmonic mean.

In the case of a frequency distribution:

$$H.M = \frac{n}{\sum_{i=1}^{k} \dfrac{f_i}{X_i}}, \quad \text{where } n = \sum_{i=1}^{k} f_i$$

If observations X1, X2, …, Xn have weights W1, W2, …, Wn respectively, then their harmonic mean is given by

$$H.M = \frac{\sum_{i=1}^{n} W_i}{\sum_{i=1}^{n} \dfrac{W_i}{X_i}}$$

This is called the weighted harmonic mean.

Remark: The harmonic mean is useful and appropriate in finding average speeds and average rates.
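To illustrate the remark about average speeds, here is a small Python sketch. The trip figures (60 km/h out, 40 km/h back over the same 120 km) are hypothetical and not taken from the text; they are chosen only to show that the harmonic mean, not the arithmetic mean, matches total distance divided by total time.

```python
def harmonic_mean(values):
    """Simple harmonic mean: n / sum(1 / X_i)."""
    return len(values) / sum(1 / v for v in values)

# Hypothetical round trip: same distance each way at two different speeds.
speeds = [60, 40]     # km/h
distance = 120        # km each way (assumed figure)

hm = harmonic_mean(speeds)
total_time = sum(distance / s for s in speeds)     # hours
true_average = 2 * distance / total_time           # total distance / total time
arithmetic = sum(speeds) / len(speeds)

print(hm, true_average, arithmetic)
```

For equal distances, the true average speed coincides with the harmonic mean, while the arithmetic mean overstates it.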

2.2.2 Mode

• The mode is the value which occurs most frequently in a set of values.
• The mode may not exist, and even if it does exist, it may not be unique.
• In the case of a discrete distribution, the value having the maximum frequency is the modal value.
Examples:
1. Find the mode of 5, 3, 5, 8, 9. Mode = 5
2. Find the mode of 8, 9, 9, 7, 8, 2 and 5. The data are bimodal: the modes are 8 and 9.
3. Find the mode of 4, 12, 3, 6 and 7. There is no mode for these data.
The mode of a set of numbers X1, X2, …, Xn is usually denoted by X̂.

Mode for Grouped Data

If data are given in the form of a continuous frequency distribution, the mode is defined as:

$$\text{Mode} = L_{mo} + \frac{\Delta_1}{\Delta_1 + \Delta_2} \times w$$

where
L_mo = the lower class boundary of the modal class
w = the size (width) of the modal class
Δ1 = f_mo − f1
Δ2 = f_mo − f2
f_mo = frequency of the modal class
f1 = frequency of the class preceding the modal class
f2 = frequency of the class following the modal class
Note: The modal class is the class with the highest frequency.

Example: Calculate the mode for the following frequency distribution.

C.I     1 – 5   6 – 10   11 – 15   16 – 20   21 – 25   26 – 30   31 – 35   Total
Freq.       4        8        12         6         3         4         3      40

Solution: By inspection, the mode lies in the third class, where L_mo = 10.5, f_mo = 12, f1 = 8, f2 = 6 and w = 5.
Using the formula, the mode is:

$$\text{Mode} = L_{mo} + \frac{\Delta_1}{\Delta_1 + \Delta_2} \times w = 10.5 + \frac{12-8}{(12-8)+(12-6)} \times 5 = 10.5 + \frac{4}{10} \times 5 = 12.5$$
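The grouped-mode formula can be sketched in Python as follows. The function name and argument layout are assumptions; the sketch also assumes equal class widths and treats a missing neighbouring class (when the modal class is first or last) as having frequency 0, which is a convention chosen here rather than something stated in the text.

```python
def grouped_mode(lower_boundaries, freqs, width):
    """Mode = L_mo + d1 / (d1 + d2) * w for the class with the highest frequency."""
    i = max(range(len(freqs)), key=lambda j: freqs[j])  # index of the modal class
    f_mo = freqs[i]
    f_prev = freqs[i - 1] if i > 0 else 0               # assumed 0 if no class before
    f_next = freqs[i + 1] if i < len(freqs) - 1 else 0  # assumed 0 if no class after
    d1 = f_mo - f_prev
    d2 = f_mo - f_next
    return lower_boundaries[i] + d1 / (d1 + d2) * width

# Lower class boundaries for 1-5, 6-10, ..., 31-35 and their frequencies.
boundaries = [0.5, 5.5, 10.5, 15.5, 20.5, 25.5, 30.5]
freqs = [4, 8, 12, 6, 3, 4, 3]

mode = grouped_mode(boundaries, freqs, width=5)
print(mode)
```

If two classes tie for the highest frequency, `max` picks the first one, so a bimodal distribution would need separate handling.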

Merits and Demerits of Mode

Merits:
• It is not affected by extreme observations.
• It is easy to calculate and simple to understand.
• It can be calculated for distributions with open-end classes.

Demerits:
• Often its value is not unique.
• It is not based on all observations.
• The mode may not exist in the series.
• It is not suitable for further mathematical treatment.

2.2.3 Median
In a distribution, the median is the value of the variable which divides it into two equal parts. In an ordered series of data, the median is the observation lying exactly in the middle of the series. It is the middlemost value in the sense that the number of values less than the median is equal to the number of values greater than it. It is denoted by x̃.
If X1, X2, …, Xn are the observations, then the numbers arranged in ascending order will be X[1], X[2], …, X[n], where X[i] is the ith smallest value, i.e. X[1] ≤ X[2] ≤ … ≤ X[n].
Median for ungrouped data

$$\tilde{x} = \begin{cases} X_{[(n+1)/2]}, & \text{if } n \text{ is odd} \\[4pt] \dfrac{X_{[n/2]} + X_{[n/2+1]}}{2}, & \text{if } n \text{ is even} \end{cases}$$
Example: Find the median for the following data.

a) -5 15 10 5 0 2 1 4 6 and 8
b) 5 2 2 3 1 8 4

Solution:
a. The data in ascending order are:
-5 0 1 2 4 5 6 8 10 15
n = 10, so n is even. The two middle values are the 5th and 6th observations. So the median is

$$\tilde{x} = \frac{\left(\frac{10}{2}\right)^{\text{th}} \text{value} + \left(\frac{10}{2}+1\right)^{\text{th}} \text{value}}{2} = \frac{5^{\text{th}} \text{value} + 6^{\text{th}} \text{value}}{2} = \frac{4+5}{2} = 4.5$$

b. The data in ascending order are:
1 2 2 3 4 5 8
n = 7, so n is odd. The middle value is the 4th observation. So the median is 3.
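The odd/even rule for the median of ungrouped data can be sketched in Python as below. The function name `median_of` is an assumption; note that Python lists are indexed from 0, so the kth ordered observation is `ordered[k - 1]`.

```python
def median_of(values):
    """Middle value of the ordered data (average of the two middle values if n is even)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                        # the ((n + 1) / 2)th observation
    return (ordered[mid - 1] + ordered[mid]) / 2   # (n/2)th and (n/2 + 1)th observations

median_a = median_of([-5, 15, 10, 5, 0, 2, 1, 4, 6, 8])
median_b = median_of([5, 2, 2, 3, 1, 8, 4])

print(median_a, median_b)
```

The standard-library function `statistics.median` applies the same rule and could be used instead.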
Median for grouped data
If data are given in the form of a continuous frequency distribution, the median is defined as:

$$\text{Median} = L_{med} + \frac{w}{f_{med}}\left(\frac{n}{2} - CF\right)$$

where: L_med = the lower class boundary of the median class;
w = the class width of the median class;
f_med = the frequency of the median class; and
CF = the less-than cumulative frequency corresponding to the class preceding the median class.
Remark: The median class is the class with the smallest less-than cumulative frequency greater than or equal to n/2.

Example: Calculate the median for the following frequency distribution.

C.I     1 – 5   6 – 10   11 – 15   16 – 20   21 – 25   26 – 30   31 – 35   Total
Freq.       4        8        12         6         3         4         3      40

Solution: Construct the less-than cumulative frequency distribution:

C.I           1 – 5   6 – 10   11 – 15   16 – 20   21 – 25   26 – 30   31 – 35   Total
Freq.             4        8        12         6         3         4         3      40
Cum. Freq.        4       12        24        30        33        37        40

Since n = 40, n/2 = 20, and the smallest CF greater than or equal to 20 is 24; thus the median class is the third class. For this class, L_med = 10.5, w = 5, f_med = 12 and CF = 12.

Applying the formula, we get: Median = 10.5 + (20 − 12) × 5/12 ≈ 13.8

Merits and Demerits of Median

Merits:
• The median is a positional average and hence is not influenced by extreme observations.
• It can be calculated in the case of open-end intervals.
• The median can be located even if the data are incomplete.
Demerits:
• It is not a good representative of the data if the number of items is small.
• It is not amenable to further algebraic treatment.
• It is susceptible to sampling fluctuations.

The Relationship of the Mean, Median and Mode

Comparing the Mean, Median and Mode
• If the data are skewed, avoid the mean.
• If there is a large gap around the middle of the data, avoid the median.
• The median is resistant to the influence of extreme data values or outliers.
• The mode has an advantage over both the mean and the median when the data are categorical, since it is not possible to calculate the mean or median for this type of data.
• For a symmetrical distribution, the mean, median and mode coincide: Mean = Median = Mode.

2.3 Measures of Location (Quantiles)

Quantiles are measures which divide a given set of data into approximately equal subdivisions and are obtained by the same procedure as the median. They are averages of position (non-central tendency). Some of these are quartiles, deciles and percentiles.

Quartiles: values which divide the data set into approximately four equal parts, denoted by Q1, Q2 and Q3. The first quartile (Q1) is also called the lower quartile and the third quartile (Q3) is the upper quartile. The second quartile (Q2) is the median.

• Quartiles for individual series:

Let x1, x2, …, xn be n ordered observations. The ith quartile (Qi) is the value of the item at the [i(n+1)/4]th position, i = 1, 2, 3.

That is, after arranging the data in ascending order, Q1, Q2 and Q3 are obtained by:

$$Q_1 = \left(\frac{1(n+1)}{4}\right)^{\text{th}} \text{value}, \quad Q_2 = \left(\frac{2(n+1)}{4}\right)^{\text{th}} \text{value} \quad \text{and} \quad Q_3 = \left(\frac{3(n+1)}{4}\right)^{\text{th}} \text{value}.$$

• Quartiles for discrete data arranged in a frequency distribution:

In this case also, we follow the same procedure as for the median. That is, we construct the less-than cumulative frequency distribution and apply the formula of quartiles for individual series.

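• Quartiles for grouped continuous data:

For continuous data, use the following formula:

$$Q_i = L + \frac{w}{f_{Q_i}}\left(\frac{i\,n}{4} - CF\right)$$

where i = 1, 2, 3, and L, w, f_Qi and CF are defined in the same way as for the median, with the ith quartile class in place of the median class.

That is,

$$Q_1 = L + \frac{w}{f_{Q_1}}\left(\frac{n}{4} - CF\right), \quad Q_2 = L + \frac{w}{f_{Q_2}}\left(\frac{2n}{4} - CF\right) \quad \text{and} \quad Q_3 = L + \frac{w}{f_{Q_3}}\left(\frac{3n}{4} - CF\right)$$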
Deciles: values dividing the data into approximately ten equal parts, denoted by D1, D2, …, D9.

• Deciles for individual series:

Let x1, x2, …, xn be n ordered observations. The ith decile (Di) is the value of the item at the [i(n+1)/10]th position, i = 1, 2, …, 9.

That is, after arranging the data in ascending order, D1, D2, …, D9 are obtained by:

$$D_1 = \left(\frac{1(n+1)}{10}\right)^{\text{th}} \text{value}, \quad D_2 = \left(\frac{2(n+1)}{10}\right)^{\text{th}} \text{value}, \; \dots \; \text{and} \quad D_9 = \left(\frac{9(n+1)}{10}\right)^{\text{th}} \text{value}.$$

• Deciles for discrete data arranged in a frequency distribution:

In this case also, we follow the same procedure as for the median. That is, we construct the less-than cumulative frequency distribution and apply the formula of deciles for individual series.

• Deciles for continuous data:

Apply the following formula and follow the procedure of quartiles for continuous data:

$$D_i = L + \frac{w}{f_{D_i}}\left(\frac{i\,n}{10} - CF\right), \quad i = 1, 2, \dots, 9.$$

Then define the symbols in the same way as we did in the case of quartiles for continuous data.
Percentiles: values which divide the data into approximately one hundred equal parts, denoted by P1, P2, …, P99.

• Percentiles for individual series:

Let x1, x2, …, xn be n ordered observations. The ith percentile (Pi) is the value of the item at the [i(n+1)/100]th position, i = 1, 2, …, 99.

That is, after arranging the data in ascending order, P1, P2, …, P99 are obtained by:

$$P_1 = \left(\frac{1(n+1)}{100}\right)^{\text{th}} \text{value}, \quad P_2 = \left(\frac{2(n+1)}{100}\right)^{\text{th}} \text{value}, \; \dots \; \text{and} \quad P_{99} = \left(\frac{99(n+1)}{100}\right)^{\text{th}} \text{value}.$$

• Percentiles for discrete data arranged in a frequency distribution:

In this case also, we follow the same procedure as for the median. That is, we construct the less-than cumulative frequency distribution and apply the formula of percentiles for individual series.

• Percentiles for continuous data:

Apply the following formula:

$$P_i = L + \frac{w}{f_{P_i}}\left(\frac{i\,n}{100} - CF\right), \quad i = 1, 2, \dots, 99.$$

Then define the symbols in the same way as we did in the case of quartiles or deciles for continuous data.
Interpretations
1. Qi is the value below which (i × 25)% of the observations in the series are found (where i = 1, 2, 3). For instance, Q3 is the value below which 75% of the observations in the given series are found.
2. Di is the value below which (i × 10)% of the observations in the series are found (where i = 1, 2, …, 9). For instance, D4 is the value below which 40% of the values in the series are found.
3. Pi is the value below which i percent of the total observations are found (where i = 1, 2, 3, …, 99). For example, 60 percent of the observations in a given series are below P60.

Example: The marks (out of 85) of 50 students are given below. Based on the data, find Q1, D4 and P7.

Marks   46-50   51-55   56-60   61-65   66-70   71-75   76-80
fi          4       8      15       5       9       5       4

Solution: First find the class boundaries and the cumulative frequency distribution.

Marks             46-50       51-55       56-60       61-65       66-70       71-75       76-80
Class boundary    45.5-50.5   50.5-55.5   55.5-60.5   60.5-65.5   65.5-70.5   70.5-75.5   75.5-80.5
fi                4           8           15          5           9           5           4
Cum. frequency    4           12          27          32          41          46          50

Q1: the measure of the (n/4)th value = 12.5th value, which lies in the group 55.5 – 60.5.

$$Q_1 = L + \frac{w}{f_{Q_1}}\left(\frac{n}{4} - CF\right) = 55.5 + \frac{5}{15}(12.5 - 12) = 55.7$$

D4: the measure of the (4n/10)th value = 20th value, which lies in the group 55.5 – 60.5.

$$D_4 = L + \frac{w}{f_{D_4}}\left(\frac{4n}{10} - CF\right) = 55.5 + \frac{5}{15}(20 - 12) = 58.2$$

P7: the measure of the (7n/100)th value = 3.5th value, which lies in the group 45.5 – 50.5.

$$P_7 = L + \frac{w}{f_{P_7}}\left(\frac{7n}{100} - CF\right) = 45.5 + \frac{5}{4}(3.5 - 0) = 49.875$$
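Because the grouped formulas for the median, quartiles, deciles and percentiles differ only in the target position (n/2, in/4, in/10 or in/100), one helper can serve all of them. The Python sketch below is an illustration under that observation; the function name `grouped_quantile` and its argument layout are assumptions, and equal class widths are assumed.

```python
from itertools import accumulate

def grouped_quantile(lower_boundaries, freqs, width, position):
    """L + (w / f) * (position - CF) for the first class whose
    less-than cumulative frequency is >= position."""
    cum = list(accumulate(freqs))  # less-than cumulative frequencies
    for i, c in enumerate(cum):
        if c >= position:
            cf_before = cum[i - 1] if i > 0 else 0
            return lower_boundaries[i] + width / freqs[i] * (position - cf_before)
    raise ValueError("position exceeds the total frequency")

# Marks example: class boundaries 45.5-50.5, ..., 75.5-80.5.
boundaries = [45.5, 50.5, 55.5, 60.5, 65.5, 70.5, 75.5]
freqs = [4, 8, 15, 5, 9, 5, 4]
n = sum(freqs)

q1 = grouped_quantile(boundaries, freqs, 5, 1 * n / 4)
d4 = grouped_quantile(boundaries, freqs, 5, 4 * n / 10)
p7 = grouped_quantile(boundaries, freqs, 5, 7 * n / 100)

print(round(q1, 1), round(d4, 1), p7)
```

Called with position n/2, the same helper reproduces the grouped median; for the earlier distribution with boundaries 0.5, 5.5, …, 30.5 and frequencies 4, 8, 12, 6, 3, 4, 3 it gives about 13.8.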

2.4. Measures of Dispersion (Variation)

The scatter or spread of the items of a distribution is known as dispersion or variation. In other words, the degree to which numerical data tend to spread about an average value is called the dispersion or variation of the data. Measures of dispersion are statistical measures which provide ways of measuring the extent to which data are dispersed or spread out.

Objectives of measuring variation:

• To judge the reliability of measures of central tendency.
• To control variability itself.
• To compare two or more groups of numbers in terms of their variability.
• To make further statistical analysis.

Absolute and Relative Measures of Dispersion

The measures of dispersion which are expressed in terms of the original units of a series are termed absolute measures. Such measures are not suitable for comparing the variability of two distributions which are expressed in different units of measurement or have different average sizes. Relative measures of dispersion are a ratio or percentage of a measure of absolute dispersion to an appropriate measure of central tendency and are thus pure numbers independent of the units of measurement.
For comparing the variability of two distributions (even if they are measured in the same unit), we compute the relative measure of dispersion instead of the absolute measure of dispersion.

2.4.1 Types of Measures of Dispersion

Various measures of dispersion are in use. The most commonly used are:
1) Range and relative range
2) Quartile deviation and coefficient of Quartile deviation
3) Mean deviation and coefficient of Mean deviation
4) Standard deviation and coefficient of variation.

The Range (R)

The range is the largest score minus the smallest score. It is a quick and dirty measure of variability, although when a test is given back to students they very often wish to know the range of scores. Because the range is greatly affected by extreme scores, it may give a distorted picture of the scores. The following two distributions have the same range, 13, yet appear to differ greatly in the amount of variability.
Distribution 1: 32 35 36 36 37 38 40 42 42 43 43 45
Distribution 2: 32 32 33 33 33 34 34 34 34 34 35 45
For this reason, among others, the range is not the most important measure of variability.
R = L − S, where L = largest observation and S = smallest observation.

Range for grouped data:

If data are given in the form of a continuous frequency distribution, the range is computed as:
R = UCLk − LCL1, where UCLk is the upper class limit of the last class and LCL1 is the lower class limit of the first class.
This is sometimes expressed as:
R = Xk − X1, where Xk is the class mark of the last class and X1 is the class mark of the first class.

Merits and Demerits of Range

Merits:
• It is rigidly defined.
• It is easy to calculate and simple to understand.
Demerits:
• It is not based on all observations.
• It is highly affected by extreme observations.
• It is affected by fluctuations in sampling.
• It is not amenable to further algebraic treatment.
• It cannot be computed in the case of open-end distributions.
• It is very sensitive to the size of the sample.

Relative Range (RR)

It is also sometimes called the coefficient of range and is given by:

$$RR = \frac{L - S}{L + S} = \frac{R}{L + S}$$

For a continuous grouped distribution, using the class limits defined for the range:

$$RR = \frac{UCL_k - LCL_1}{UCL_k + LCL_1}$$
Example:

1. Find the relative range of the above two distributions. (Exercise!)

2. If the range and relative range of a series are 4 and 0.25 respectively, what is the value of:
a) the smallest observation?
b) the largest observation?
Solution (2):
R = L − S = 4 ------ (1)
RR = R/(L + S) = 0.25, so L + S = 4/0.25 = 16 ------ (2)
Solving (1) and (2), we get L = 10 and S = 6.
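The range and relative range, and the recovery of L and S from R and RR, can be sketched in Python as below. The function names are assumptions for illustration; `extremes_from_range` simply solves the two equations L − S = R and L + S = R/RR.

```python
def data_range(values):
    """R = L - S."""
    return max(values) - min(values)

def relative_range(values):
    """RR = (L - S) / (L + S)."""
    return (max(values) - min(values)) / (max(values) + min(values))

def extremes_from_range(r, rr):
    """Solve L - S = r and L + S = r / rr for (L, S)."""
    total = r / rr
    return (total + r) / 2, (total - r) / 2

dist1 = [32, 35, 36, 36, 37, 38, 40, 42, 42, 43, 43, 45]
dist2 = [32, 32, 33, 33, 33, 34, 34, 34, 34, 34, 35, 45]

r1, r2 = data_range(dist1), data_range(dist2)
largest, smallest = extremes_from_range(4, 0.25)

print(r1, r2, relative_range(dist1))
print(largest, smallest)
```

Both distributions share the same extremes, so neither the range nor the relative range can tell them apart, which is the weakness noted in the text.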

Variance and Standard Deviation

Variance

The variance is the average of the squared deviations from the mean. Recall that the sum of squared deviations is a minimum only when taken from the mean.

a) Population Variance (σ²)
If we divide the sum of squared deviations by the number of values in the population, we get the population variance.
For ungrouped data (individual series) from a population:

$$\sigma^2 = \frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N} = \frac{1}{N}\left[\sum_{i=1}^{N} X_i^2 - N\mu^2\right], \quad \text{where } \mu \text{ is the population arithmetic mean}$$

• For discrete data arranged in a frequency distribution and for continuous grouped data:

$$\sigma^2 = \frac{\sum f_i (X_i - \mu)^2}{N} = \frac{1}{N}\left[\sum f_i X_i^2 - N\mu^2\right]$$

where Xi is the class mark of the ith class, fi is the frequency of the ith class and N = Σ fi.
b) Sample Variance (S²)

One would expect the sample variance to simply be the population variance with the population mean replaced by the sample mean. However, one of the major uses of a sample statistic is to estimate the corresponding population parameter, and a formula with divisor n tends to underestimate the population variance. To counteract this, the sum of the squares of the deviations is divided by one less than the sample size, which gives an unbiased estimator.

$$S^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1} = \frac{1}{n-1}\left[\sum_{i=1}^{n} x_i^2 - n\bar{x}^2\right]$$

For a frequency distribution

If the values have frequencies fi (i = 1, 2, …, m), then the sample variance is given by:

$$S^2 = \frac{\sum f_i (x_i - \bar{x})^2}{n-1} = \frac{1}{n-1}\left[\sum f_i x_i^2 - n\bar{x}^2\right], \quad \text{where } n = \sum f_i.$$

The Standard Deviation

The square root of the variance is called the standard deviation. The square root must be taken to get the units back to the same units as the original data values.
• The population standard deviation: σ = √(σ²)
• The sample standard deviation: S = √(S²)

The following steps are used to calculate the sample variance:

1. Find the arithmetic mean.
2. Find the difference between each observation and the mean.
3. Square these differences.
4. Sum the squared differences.
5. Since the data are a sample, divide the number from step 4 by the number of observations minus one, i.e. n − 1 (where n is the number of observations in the data set).

Example: Find the sample variance and standard deviation of:

xi   2   4   5   6   8
fi   2   2   3   1   2

Solution: Prepare the following table:

xi     fi    fi xi   xi²   fi xi²
2      2     4       4     8
4      2     8       16    32
5      3     15      25    75
6      1     6       36    36
8      2     16      64    128
Sum    10    49            279

Thus, n = Σ fi = 10, Σ fi xi = 49 and Σ fi xi² = 279, so x̄ = 49/10 = 4.9.

$$S^2 = \frac{1}{n-1}\left[\sum f_i x_i^2 - n\bar{x}^2\right] = \frac{1}{9}\left[279 - 10\left(\frac{49}{10}\right)^2\right] = \frac{1}{9}(38.9) = 4.32, \quad \text{and } S = \sqrt{4.32} = 2.08.$$
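The frequency-table calculation can be checked with a short Python sketch that computes the sample variance both from the definition and from the shortcut formula. The variable names are assumptions made for illustration.

```python
import math

# Values and frequencies from the example.
xs = [2, 4, 5, 6, 8]
fs = [2, 2, 3, 1, 2]

n = sum(fs)                                        # 10
sum_fx = sum(f * x for f, x in zip(fs, xs))        # 49
sum_fx2 = sum(f * x * x for f, x in zip(fs, xs))   # 279
x_bar = sum_fx / n

# Definition: sum f_i (x_i - x_bar)^2 / (n - 1).
var_definition = sum(f * (x - x_bar) ** 2 for f, x in zip(fs, xs)) / (n - 1)

# Shortcut: (sum f_i x_i^2 - n * x_bar^2) / (n - 1).
var_shortcut = (sum_fx2 - n * x_bar ** 2) / (n - 1)

sd = math.sqrt(var_shortcut)
print(round(var_definition, 2), round(var_shortcut, 2), round(sd, 2))
```

The two forms agree up to floating-point rounding; the shortcut only needs the column totals of the table.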

Properties of Variance & Standard Deviation

1. If a constant is added to (or subtracted from) all the values, the variance remains the same; i.e., for any constant k, Var(X ± k) = Var(X).

2. If each and every value is multiplied by a non-zero constant k, the standard deviation is multiplied by |k| and the variance is multiplied by k²; i.e., SD(kX) = |k| SD(X) and Var(kX) = k² Var(X).

3. Both the variance and the standard deviation give more weight to extreme values and less to those which are near the mean.
4. If the standard deviation of X1, X2, …, Xn is S, then the standard deviation of
a) X1 + k, X2 + k, …, Xn + k will also be S
b) kX1, kX2, …, kXn will be |k|S
c) a + kX1, a + kX2, …, a + kXn will be |k|S
Exercise: Verify each of the above relationships, considering k and a as constants.

Examples:
1. The mean and standard deviation of n Tetracycline capsules X1, X2, …, Xn are known to be 12 gm and 3 gm respectively. A new set of capsules of another drug is obtained by the linear transformation Yi = 2Xi − 0.5 (i = 1, 2, …, n). What will be the standard deviation of the new set of capsules?
2. The mean and the standard deviation of a set of numbers are 500 and 10 respectively.
a. If 10 is added to each of the numbers in the set, what will be the variance and standard deviation of the new set?
b. If each of the numbers in the set is multiplied by −5, what will be the variance and standard deviation of the new set?

Solutions:
1. Using property 4(c) above, the new standard deviation = |k|S = 2 × 3 = 6.
2.
a) They will remain the same: variance = 10² = 100 and standard deviation = 10.
b) New standard deviation = |k|S = 5 × 10 = 50, and new variance = k²S² = 25 × 100 = 2500.
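These properties can be verified numerically. The sketch below uses an arbitrary illustrative sample (the data are an assumption, not taken from the examples) and the standard-library `statistics` module, whose `stdev` and `variance` functions use the n − 1 divisor.

```python
import math
from statistics import stdev, variance

# Arbitrary illustrative sample (hypothetical data).
x = [3, 8, 6, 14, 4, 12, 7, 10]

shifted = [v + 10 for v in x]       # add a constant k = 10
scaled = [-5 * v for v in x]        # multiply by k = -5
linear = [2 * v - 0.5 for v in x]   # y = a + k*x with k = 2, a = -0.5

print(stdev(x), stdev(shifted))            # unchanged by the shift
print(stdev(scaled), 5 * stdev(x))         # multiplied by |k| = 5
print(variance(scaled), 25 * variance(x))  # multiplied by k^2 = 25
print(stdev(linear), 2 * stdev(x))         # only the multiplier matters

shift_ok = math.isclose(stdev(shifted), stdev(x))
scale_ok = math.isclose(stdev(scaled), 5 * stdev(x))
var_ok = math.isclose(variance(scaled), 25 * variance(x))
linear_ok = math.isclose(stdev(linear), 2 * stdev(x))
```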

Coefficient of Variation (C.V)

The coefficient of variation is defined as the ratio of the standard deviation to the mean, usually expressed as a percentage:

$$C.V = \frac{S}{\bar{X}} \times 100$$

The distribution having the smaller C.V is said to be less variable or more consistent.
Examples:
1. An analysis of the monthly wages paid (in Birr) to workers in two firms A and B belonging to the same industry gives the following results:

Value          Firm A   Firm B
Mean wage      52.5     47.5
Median wage    50.5     45.5
Variance       100      121

In which firm, A or B, is there greater variability in individual wages?
Solution:
Calculate the coefficient of variation for both firms.

$$C.V_A = \frac{S_A}{\bar{X}_A} \times 100 = \frac{10}{52.5} \times 100 = 19.05\%$$

$$C.V_B = \frac{S_B}{\bar{X}_B} \times 100 = \frac{11}{47.5} \times 100 = 23.16\%$$

Since C.V_A < C.V_B, there is greater variability in individual wages in firm B.
2. The students of the Mechanical and Civil departments took the Introduction to Statistics and Probability course. At the end of the semester, the following information was recorded.

Department            Mechanical   Civil
Mean score            85           65
Standard deviation    25           12

Compare the relative dispersions of the two departments' scores using the appropriate measure.

Solution:

$$C.V_M = \frac{S_M}{\bar{X}_M} \times 100 = \frac{25}{85} \times 100 = 29.41\%$$

$$C.V_{Ci} = \frac{S_{Ci}}{\bar{X}_{Ci}} \times 100 = \frac{12}{65} \times 100 = 18.46\%$$

Interpretation: Since the C.V of the Mechanical Department students is greater than that of the Civil Department students, we can say that there is more dispersion relative to the mean in the distribution of the Mechanical students' scores than in that of the Civil students.

3. A meteorologist interested in the consistency of temperatures in three cities during a given week collected the following data. The temperatures for the five days of the week in the three cities were:

City 1   25 24 23 26 17
City 2   22 21 24 22 20
City 3   32 27 35 24 28

Which city has the most consistent temperature, based on these data? (Exercise)
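Examples 1 and 2 above start from summary statistics rather than raw data. A minimal Python sketch of that case follows; the function name `cv_from_summary` is an assumption, and note that Example 1 supplies the variance, so its square root is taken first.

```python
import math

def cv_from_summary(mean, sd):
    """Coefficient of variation in percent from a mean and a standard deviation."""
    return sd / mean * 100

# Example 1: the table gives variances, so take square roots.
cv_firm_a = cv_from_summary(52.5, math.sqrt(100))
cv_firm_b = cv_from_summary(47.5, math.sqrt(121))

# Example 2: the table gives standard deviations directly.
cv_mechanical = cv_from_summary(85, 25)
cv_civil = cv_from_summary(65, 12)

print(round(cv_firm_a, 2), round(cv_firm_b, 2))
print(round(cv_mechanical, 2), round(cv_civil, 2))
```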

Standard Scores (Z-Scores)

The Z-score is the number of standard deviations that a given value X lies below or above the mean. Values above the mean have positive Z-scores and values below the mean have negative Z-scores. The numerical value of the Z-score reflects the relative standing of the value within its data set; because of this, the Z-score is also referred to as a measure of relative standing.
• Scores are generally meaningless by themselves unless they are compared to the distribution of scores from some reference group.
• In addition to comparing data sets, the Z-score is useful for transforming a given data set into standard units, with mean 0 and standard deviation 1.

Properties of the Z-score

• The sum of the Z-scores is always zero.
• The mean of the Z-scores is zero.
• The variance and standard deviation of the Z-scores are equal to one.

Z-score computed from the population

$$Z = \frac{X - \mu}{\sigma}$$

Z-score computed from the sample

$$Z = \frac{X - \bar{X}}{S}$$

Example: What is the Z-score for the value of 14 in the following sample data set?
3 8 6 14 4 12 7 10

Solution:
X̄ = 8 and S = 3.8173; thus

$$Z = \frac{14 - 8}{3.8173} \approx 1.57$$

• The data value 14 is located 1.57 standard deviations above the mean of 8, because the Z-score is positive.
Example: Suppose that a student scored 66 in Statistics and 80 in Mathematics. A summary of the scores in the two courses is given below.

Course         Average score   Standard deviation of the score
Statistics     51              12
Mathematics    72              16

In which course did the student score better compared to his classmates?
Solution:
Z-score of the student in Statistics:

$$Z = \frac{X - \mu}{\sigma} = \frac{66 - 51}{12} = \frac{15}{12} = 1.25$$

Z-score of the student in Mathematics:

$$Z = \frac{X - \mu}{\sigma} = \frac{80 - 72}{16} = \frac{8}{16} = 0.5$$

From these two standard scores, we can conclude that the student scored better relative to his classmates in the Statistics course than in the Mathematics course.
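The two Z-score examples, and the listed properties of Z-scores, can be checked with the following Python sketch. It assumes the standard-library `statistics` module (sample standard deviation with the n − 1 divisor); the helper name `z_score` is illustrative.

```python
from statistics import mean, stdev

def z_score(value, center, spread):
    """Number of standard deviations that value lies from center."""
    return (value - center) / spread

# First example: Z-score of 14 within the sample.
sample = [3, 8, 6, 14, 4, 12, 7, 10]
x_bar, s = mean(sample), stdev(sample)
z_14 = z_score(14, x_bar, s)

# Properties: the Z-scores of a data set sum to zero and have standard deviation one.
z_all = [z_score(v, x_bar, s) for v in sample]

# Second example: compare relative standing in the two courses.
z_statistics = z_score(66, 51, 12)
z_mathematics = z_score(80, 72, 16)

print(x_bar, round(s, 4), round(z_14, 2))
print(round(sum(z_all), 10), round(stdev(z_all), 10))
print(z_statistics, z_mathematics)
```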
Exercise
1. Two groups of people were trained for a 100km race and tested to find out which group is faster to complete the race. For the two groups the following information was given:

Value        Group one   Group two
Mean         10.4 min    11.9 min
Stan. dev.   1.2 min     1.3 min

Relatively speaking:

a. Which group is more consistent in its performance?

b. Suppose person A from group one takes 9.2 minutes while person B from group two takes 9.3 minutes. Who was faster in completing the race? Why?
