CHAPTER ONE: DESCRIPTIVE STATISTICS

Definitions:

Statistics: The science of collecting, organizing, presenting, analyzing, and interpreting data to assist in
making more effective decisions.

Descriptive statistics are methods of organizing, summarizing, and presenting data in an informative
way.

Inferential statistics: The methods used to determine something about a population on the basis of a
sample.

Summary of types of variables:

Types of variables

Qualitative, e.g.
• Brand of PC
• Marital status
• Hair color

Quantitative, discrete, e.g.
• Children in a family
• Strokes on a golf hole
• TV sets owned

Quantitative, continuous, e.g.
• Amount of income tax paid
• Weight of a student
• Yearly rainfall
• Temperature

Levels of measurements

Data can be classified according to levels of measurements. The level of measurement of the data
often dictates the calculations that can be done to summarize and present the data. It also determines
the statistical tests that should be performed.

Levels of Measurement

Nominal: Data may only be classified.
• Make of car
• Eye color

Ordinal: Data are ranked.
• Your rank in class
• Team standing in the premiership

Interval: Meaningful difference between values.
• Temperature

Ratio: Meaningful 0 point and ratio between values.
• Number of patients
• Distance to school

1.1. Frequency Distributions and Graphical Descriptive Techniques

Constructing a frequency distribution

Definition:

A frequency distribution is a grouping of data into mutually exclusive classes showing the number of
observations in each class.

Steps for organizing data into a frequency distribution:

Step 1: Decide on the number of classes (k): Use the "2 to the k rule".
This rule suggests that you select the smallest number (k) for the number of classes such that $2^k$ is
greater than the number of observations (n):
$$2^k > n$$

Step 2: Determine the class width or class interval.

Generally the class width should be the same for all classes. The class width is determined using the
following formula:
$$i \geq \frac{H - L}{k}$$
where i is the class width, H is the highest observed value, L is the lowest observed value, and k
is the number of classes.

Step 3: Set the individual class limits


Avoid overlapping or unclear class limits.

Step 4: Tally the observations into classes.

Step 5: Count the number of items in each class (the class frequencies).
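
The five steps can be sketched in Python. This is only an illustration: the function names are hypothetical, the 26 observations are borrowed from the range example later in this chapter, and the starting limit of 5 and the rounded width of 5 are assumptions made for the sketch.

```python
import math

def number_of_classes(n):
    """Step 1: smallest k such that 2**k > n (the "2 to the k rule")."""
    k = 1
    while 2 ** k <= n:
        k += 1
    return k

def minimum_class_width(high, low, k):
    """Step 2: lower bound on the class width, i >= (H - L) / k."""
    return (high - low) / k

def frequency_distribution(data, lower, width, k):
    """Steps 3-5: tally data into k classes [lower + j*width, lower + (j+1)*width)."""
    counts = [0] * k
    for x in data:
        j = int((x - lower) // width)  # index of the class containing x
        counts[j] += 1
    return counts

data = [18, 26, 17, 10, 7, 27, 24, 17, 17, 23, 29, 28, 18,
        10, 23, 16, 9, 12, 26, 5, 12, 23, 22, 24, 16, 5]

k = number_of_classes(len(data))                     # 2**5 = 32 > 26, so k = 5
i_min = minimum_class_width(max(data), min(data), k)  # (29 - 5) / 5 = 4.8
width = math.ceil(i_min)                              # assumed: round up to 5
counts = frequency_distribution(data, lower=5, width=width, k=k)

for j, f in enumerate(counts):
    print(f"{5 + j * width}-<{5 + (j + 1) * width}: {f}")
```

Rounding the width up to a convenient number keeps the class limits easy to read while still covering the highest value.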

Relative frequency distribution

Definition

A relative frequency distribution converts each class frequency into a proportion or percentage of the
total number of observations.
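
As a small illustration, the sketch below converts class frequencies into percentages. It assumes the selling-price frequencies (8, 23, 17, 18, 8, 4, 2) that appear in the cumulative frequency table of this chapter; the variable names are illustrative.

```python
# Class frequencies taken from the selling-price table (80 observations in total).
frequencies = [8, 23, 17, 18, 8, 4, 2]
total = sum(frequencies)

# Relative frequency of each class, expressed as a percentage of the total.
percentages = [100 * f / total for f in frequencies]

for f, p in zip(frequencies, percentages):
    print(f"frequency {f:2d} -> {p:.2f}%")
```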

Graphical Descriptive Techniques

Three charts that help portray a frequency distribution graphically are the histogram, the frequency
polygon, and the cumulative frequency polygon.

HISTOGRAM

A graph in which the classes are marked on the horizontal axis and the class frequencies on the
vertical axis. The class frequencies are represented by the heights of the bars, and the bars are drawn
adjacent to each other.
Example: [histogram figure]

FREQUENCY POLYGON

A frequency polygon consists of line segments connecting the points formed by intersections of the
class midpoints and the frequencies.

CUMULATIVE FREQUENCY DISTRIBUTIONS AND CUMULATIVE FREQUENCY POLYGON

Cumulative frequency distributions


Selling prices   Frequency   Cumulative frequency   Working
15-18            8           8
18-21            23          31                     8+23
21-24            17          48                     8+23+17
24-27            18          66                     8+23+17+18
27-30            8           74
30-33            4           78
33-36            2           80
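
The running totals in the table can be reproduced with a short Python sketch. It uses `itertools.accumulate` from the standard library; the variable names are illustrative.

```python
from itertools import accumulate

frequencies = [8, 23, 17, 18, 8, 4, 2]

# Each cumulative frequency is the sum of all class frequencies up to that class.
cumulative = list(accumulate(frequencies))

print(cumulative)
```

The last cumulative frequency always equals the total number of observations, which is a quick check on the table.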

Cumulative frequency polygon (OGIVE)

LINE GRAPHS

Line charts are particularly effective for business and economic data because they show change or
trends in a variable over time.

Example:

Year 1992 1993 1994 1995 1996 1997 1998 1999 2000
Unemployment_rate 17.8 13.7 11 10.2 11.3 9.1 8.8 8.5 7.9

PIE CHARTS

A pie chart is especially useful for illustrating nominal-level data.

BAR CHART

1.2. Measures of central location

1. Arithmetic mean (AM)


The AM is the most commonly used measure of central tendency. To calculate the AM for ungrouped
data, we use
$$\bar{x} = \frac{x_1 + x_2 + \dots + x_n}{n} = \frac{\sum_{i=1}^{n} x_i}{n}$$
Example:
For 10 years a company declared its percentage dividends as follows:

year 1 2 3 4 5 6 7 8 9 10
Dividend(xi) 5 6 14 20 30 10 15 20 20 30

Calculate the average percentage dividend declared by the company during the 10 years.

Solution
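
The original solution is not reproduced here; the following Python sketch simply applies the ungrouped AM formula to the dividend data above. The function name is illustrative.

```python
def arithmetic_mean(values):
    """Sum of the observations divided by the number of observations."""
    return sum(values) / len(values)

dividends = [5, 6, 14, 20, 30, 10, 15, 20, 20, 30]

mean_dividend = arithmetic_mean(dividends)  # 170 / 10
print(mean_dividend)
```

The ten dividends sum to 170, so the average dividend is 17%.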

Calculating the AM from frequency distribution

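The AM of a discrete frequency distribution is calculated as
$$\bar{x} = \frac{\sum_{i=1}^{k} f_i x_i}{\sum_{i=1}^{k} f_i}$$
where $x_i$ is the ith distinct value, $f_i$ is its frequency, and k is the number of distinct values.

Annual profit (x_i)   Number of outlets (f_i)
10                    3
15                    8
20                    23
25                    10
30                    6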
Solution
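
The original solution is not reproduced here; as a sketch, the frequency-weighted formula can be applied to the table above in Python. The function and variable names are illustrative.

```python
def frequency_mean(values, freqs):
    """AM of a discrete frequency distribution: sum(f_i * x_i) / sum(f_i)."""
    total_fx = sum(f * x for x, f in zip(values, freqs))
    return total_fx / sum(freqs)

annual_profit = [10, 15, 20, 25, 30]
outlets = [3, 8, 23, 10, 6]

mean_profit = frequency_mean(annual_profit, outlets)  # 1040 / 50
print(mean_profit)
```

Here the sum of $f_i x_i$ is 1040 over 50 outlets, giving a mean annual profit of 20.8.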

Calculating the AM from Grouped frequency distribution

The AM of a grouped frequency distribution is calculated as
$$\bar{x} = \frac{\sum_{i=1}^{k} f_i x_i}{\sum_{i=1}^{k} f_i}$$
where $x_i$ is the class mid-point of the ith class, $f_i$ is the number of observations falling in the
ith class, and k is the number of classes.

Example:

The following frequency distribution summarizes data on service times in minutes at the
checkout counter of a supermarket.

Time interval   Customers
2.00-<2.50      3
2.50-<3.00      8
3.00-<3.50      23
3.50-<4.00      10
4.00-<4.50      6

Calculate the estimated average time a customer takes for a checkout at the counter in this
supermarket.

Solution
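
The original solution is not reproduced here. The sketch below estimates the mean from class mid-points, assuming the first class starts at 2.00 minutes as in the later median and mode examples; names are illustrative.

```python
def grouped_mean(classes, freqs):
    """Estimate the AM from grouped data using class mid-points."""
    midpoints = [(lower + upper) / 2 for lower, upper in classes]
    total_fx = sum(f * x for x, f in zip(midpoints, freqs))
    return total_fx / sum(freqs)

# (lower limit, upper limit) of each class, in minutes
classes = [(2.0, 2.5), (2.5, 3.0), (3.0, 3.5), (3.5, 4.0), (4.0, 4.5)]
customers = [3, 8, 23, 10, 6]

mean_time = grouped_mean(classes, customers)  # 166.5 / 50
print(round(mean_time, 2))
```

The mid-points are 2.25, 2.75, 3.25, 3.75 and 4.25, so the estimated average checkout time is 166.5 / 50 = 3.33 minutes.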

2. Median (Mdn/Md)

The median is the middle value when the data are arranged in ascending order. It divides the data set
into two equal parts.

Calculating the median for ungrouped data set

Steps:

1. Arrange the data set in ascending order.


2. If the number of observations (n) in the data set is odd, then the median is the value in position
$\frac{n+1}{2}$.

3. If the number of observations (n) in the data set is even, then the median is given by the
average of the values in positions $\frac{n}{2}$ and $\frac{n}{2} + 1$.
Example:
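
The data for the original example are not available, so the following Python sketch reuses the dividend data from the arithmetic mean example purely as an illustration of the three steps. The function name is illustrative.

```python
def median(values):
    """Median of ungrouped data following the odd/even position rules."""
    ordered = sorted(values)              # step 1: ascending order
    n = len(ordered)
    if n % 2 == 1:
        # odd n: value in position (n + 1) / 2 (positions are 1-based)
        return ordered[(n + 1) // 2 - 1]
    # even n: average of the values in positions n/2 and n/2 + 1
    return (ordered[n // 2 - 1] + ordered[n // 2]) / 2

dividends = [5, 6, 14, 20, 30, 10, 15, 20, 20, 30]
print(median(dividends))
```

With n = 10, the median is the average of the 5th and 6th ordered values (15 and 20), that is 17.5.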

Calculating the median for grouped data set:

Calculating the median for discrete frequency distribution

Steps:

1. Construct the less-than cumulative frequency distribution.

2. Calculate $\frac{n}{2}$, where n is the total cumulative frequency.

3. Find the cumulative frequency equal to or just greater than the value of $\frac{n}{2}$ calculated in step 2.

4. The value whose cumulative frequency is the one found in step 3 is the median for the data set.

Example:

In a survey of 50 retail outlets, the following data were collected.

Annual profit   Number of outlets
10              3
15              8
20              23
25              10
30              6
Calculate the median for the annual profit.
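
A worked sketch of the four steps in Python, applied to the retail outlet data above. The function name is illustrative and not part of the original notes.

```python
from itertools import accumulate

def discrete_median(values, freqs):
    """Median of a discrete frequency distribution (values in ascending order)."""
    n = sum(freqs)
    half = n / 2                                   # step 2
    for value, cum in zip(values, accumulate(freqs)):  # step 1: cumulative frequencies
        if cum >= half:                            # step 3: first cum. freq. >= n/2
            return value                           # step 4
    raise ValueError("empty distribution")

annual_profit = [10, 15, 20, 25, 30]
outlets = [3, 8, 23, 10, 6]

print(discrete_median(annual_profit, outlets))
```

The cumulative frequencies are 3, 11, 34, 44, 50 and n/2 = 25, so the first cumulative frequency at or above 25 is 34 and the median annual profit is 20.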

Calculating the median for grouped frequency distribution with equal intervals

Steps:

1. Construct the less-than cumulative frequency distribution.

2. Calculate $\frac{n}{2}$, where n is the total cumulative frequency.

3. Find the cumulative frequency equal to or just greater than the value of $\frac{n}{2}$ calculated in
step 2.

4. The median class is the class whose cumulative frequency is the one found in step 3.

5. Calculate the median using the following formula:

$$M_d = l_{M_d} + \frac{h}{f}\left(\frac{n}{2} - F\right)$$

Where,

$M_d$ is the median of the data set

$l_{M_d}$ is the lower class limit of the median class

h is the width of the median class

f is the frequency of the median class

n is the total cumulative frequency

F is the cumulative frequency of the class immediately before the median class.

Example:

The following frequency distribution summarizes data on service times in minutes at the checkout
counter of a supermarket.

Time interval   Customers
2.00-<2.50      3
2.50-<3.00      8
3.00-<3.50      23
3.50-<4.00      10
4.00-<4.50      6

Calculate the median for the time it takes for a customer to be checked out at counter in this
supermarket.

Solution

Step 1: Construct the less-than cumulative frequency distribution.

Time interval   Customers (f_i)   Cumulative frequency (F_i)
2.00-<2.50      3                 3
2.50-<3.00      8                 11
3.00-<3.50      23                34
3.50-<4.00      10                44
4.00-<4.50      6                 50

Step 2: $\frac{n}{2} = \frac{50}{2} = 25$

Step 3: The cumulative frequency equal to or just greater than $\frac{n}{2} = 25$ is 34.

Step 4: The median class is 3.00-<3.50.

Step 5: The median is found by using the interpolation formula

$$M_d = l_{M_d} + \frac{h}{f}\left(\frac{n}{2} - F\right)$$

where

$l_{M_d} = 3.00$ is the lower class limit of the median class

h = 0.5 is the width of the median class

f = 23 is the frequency of the median class

n = 50 is the total cumulative frequency

F = 11 is the cumulative frequency of the class immediately before the median class.

$$M_d = 3.00 + \frac{0.5}{23}\left(\frac{50}{2} - 11\right) \approx 3.30$$

3. Mode ( M o )

The mode of a data set is the value that occurs with the greatest frequency, that is, the data point
that occurs most frequently among the measurements that constitute the data set.

Calculating the mode from ungrouped data set.

To find the mode of ungrouped data set we simply observe the data value that occurs most
frequently in the data set.

Calculating the mode from grouped data set:

Calculating the mode from a discrete frequency distribution:

The mode is the value that has the highest frequency.

Example:

In a survey of 50 retail outlets, the following data were collected.

Annual profit   Number of outlets
10              3
15              8
20              23
25              10
30              6
Calculate the mode for the annual profit.

Solution

The highest frequency is 23. Therefore, 20 is the mode.

Calculating the mode for grouped frequency distribution.

The mode is calculated using the following interpolation formula:

$$M_o = l_{M_o} + \frac{f_1 - f_0}{(f_1 - f_0) + (f_1 - f_2)} \times h = l_{M_o} + \frac{f_1 - f_0}{2f_1 - f_0 - f_2} \times h$$

Where

$M_o$ is the mode

$l_{M_o}$ is the lower limit of the modal class

h is the width of the modal class

$f_1$ is the frequency of the modal class

$f_0$ is the frequency of the class immediately before the modal class

$f_2$ is the frequency of the class immediately after the modal class

Definition:

A modal class is the class interval having the highest frequency.

Example

The following frequency distribution summarizes data on service times in minutes at the
checkout counter of a supermarket.

Time interval   Customers
2.00-<2.50      3
2.50-<3.00      8
3.00-<3.50      23
3.50-<4.00      10
4.00-<4.50      6

Calculate the mode for the time it takes for a customer to be checked out at the counter in this
supermarket.

Solution:

The modal class is 3.00-<3.50 as it has the highest frequency, 23.

$l_{M_o} = 3.0$ is the lower limit of the modal class

h = 0.5 is the width of the modal class

$f_1 = 23$ is the frequency of the modal class

$f_0 = 8$ is the frequency of the class immediately before the modal class

$f_2 = 10$ is the frequency of the class immediately after the modal class

Therefore,

$$M_o = 3 + \frac{23 - 8}{(23 - 8) + (23 - 10)} \times 0.5 \approx 3.27$$
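
As a check on the arithmetic, here is a minimal Python sketch of the mode interpolation formula. The function name is illustrative, and treating a missing neighbouring class as having frequency 0 is an assumption of this sketch rather than something stated in the notes.

```python
def grouped_mode(classes, freqs):
    """Mode of a grouped frequency distribution by interpolation."""
    m = freqs.index(max(freqs))                       # index of the modal class
    lower, upper = classes[m]
    f1 = freqs[m]
    f0 = freqs[m - 1] if m > 0 else 0                 # class before (assumed 0 if none)
    f2 = freqs[m + 1] if m < len(freqs) - 1 else 0    # class after (assumed 0 if none)
    h = upper - lower
    return lower + (f1 - f0) / ((f1 - f0) + (f1 - f2)) * h

classes = [(2.0, 2.5), (2.5, 3.0), (3.0, 3.5), (3.5, 4.0), (4.0, 4.5)]
customers = [3, 8, 23, 10, 6]

mode_time = grouped_mode(classes, customers)
print(round(mode_time, 2))
```

With $f_1 - f_0 = 15$ and $f_1 - f_2 = 13$, the result is 3 + (15 / 28) × 0.5, which rounds to 3.27 minutes.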

MEASURES OF DISPERSION

1.3. Partition values: Quartiles and Percentiles

Partition values are values of a variable that divide a data set into a number of equal parts,
e.g. quartiles, deciles, and percentiles.

1. Quartiles
Quartiles of a data set are values (partition values) that divide the data set into four equal parts
when data are arranged in ascending order.
There are three quartiles called lower quartile ( Q1 ), the middle quartile (second quartile Q2 ),
and upper quartile ( Q3 ).
Calculating quartiles from frequency distributions

To calculate the kth quartile from a grouped frequency distribution, we use the following
procedure:
Step 1: Construct the less-than cumulative frequency distribution.
Step 2: Calculate $n_k = \frac{k}{4} \times n$.
For $Q_1$, the value of k = 1
For $Q_2$, the value of k = 2
For $Q_3$, the value of k = 3
Step 3: Find the cumulative frequency equal to or just greater than the value of $\frac{k}{4} \times n$
calculated in step 2.
Step 4: The kth quartile class is the class whose cumulative frequency is the one found in step 3.
Step 5: The kth quartile is calculated using the following interpolation formula:
$$Q_k = l_k + \frac{h}{f_k}\left(\frac{k}{4} \times n - F\right)$$
Where
$Q_k$ is the kth quartile for the data set;
$l_k$ is the lower class limit of the kth quartile class;
h is the width of the kth quartile class;
$f_k$ is the frequency of the kth quartile class;
F is the cumulative frequency of the class immediately before the kth quartile class;
n is the total cumulative frequency.

Example:

The human resource department of a company analyzed the level of absenteeism of 56
employees who reported ill over the past year.

Absenteeism level (days absent)   Number of employees (f_i)
3-<7                              14
7-<11                             22
11-<15                            11
15-<19                            6
19-<23                            3

Determine the first quartile, the second quartile, and the third quartile.
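
The notes do not include a solution; the following Python sketch applies the five-step quartile procedure to the absenteeism data. The function name and the (lower, upper) class representation are illustrative.

```python
def grouped_quartile(classes, freqs, k):
    """kth quartile (k = 1, 2, 3) of a grouped frequency distribution."""
    n = sum(freqs)
    target = k * n / 4                          # step 2: n_k = (k / 4) * n
    cumulative_before = 0                       # F in the formula
    for (lower, upper), f in zip(classes, freqs):
        if cumulative_before + f >= target:     # steps 3-4: the quartile class
            h = upper - lower
            return lower + h * (target - cumulative_before) / f  # step 5
        cumulative_before += f
    raise ValueError("empty distribution")

classes = [(3, 7), (7, 11), (11, 15), (15, 19), (19, 23)]
employees = [14, 22, 11, 6, 3]

q1, q2, q3 = (grouped_quartile(classes, employees, k) for k in (1, 2, 3))
print(round(q1, 2), round(q2, 2), round(q3, 2))
```

Under this procedure the cumulative frequencies are 14, 36, 47, 53, 56, giving $Q_1 = 7$ days, $Q_2 = 7 + 4 \times 14 / 22 \approx 9.55$ days, and $Q_3 = 11 + 4 \times 6 / 11 \approx 13.18$ days.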

2. Percentiles

The percentiles of a data set are values of a random variable that divide the data set into one hundred
equal parts, each containing 1% of the values, when the values are arranged in ascending order.
There are ninety-nine percentiles, called the first percentile, second percentile, …, and ninety-ninth
percentile.
The 50th percentile is the median of the data set, the 25th percentile is the 1st quartile, and the
75th percentile is the 3rd quartile.

Calculating percentiles from frequency distributions

To calculate the kth percentile from a grouped frequency distribution, we use the following
procedure:
Step 1: Construct the less-than cumulative frequency distribution.
Step 2: Calculate $n_k = \frac{k}{100} \times n$.
For $p_1$, the value of k = 1
For $p_2$, the value of k = 2
For $p_3$, the value of k = 3
.
.
.
For $p_{99}$, the value of k = 99

Step 3: Find the cumulative frequency equal to or just greater than the value of $\frac{k}{100} \times n$
calculated in step 2.

Step 4: The kth percentile class is the class whose cumulative frequency is the one found in step 3.

Step 5: The kth percentile is calculated using the following interpolation formula:

$$p_k = l_k + \frac{h}{f_k}\left(\frac{k}{100} \times n - F\right)$$

Where
$p_k$ is the kth percentile for the data set;
$l_k$ is the lower class limit of the kth percentile class;
h is the width of the kth percentile class;
$f_k$ is the frequency of the kth percentile class;
F is the cumulative frequency of the class immediately before the kth percentile class;
n is the total cumulative frequency.
Example:

The human resource department of a company analyzed the level of absenteeism of 56
employees who reported ill over the past year.

Absenteeism level (days absent)   Number of employees (f_i)
3-<7                              14
7-<11                             22
11-<15                            11
15-<19                            6
19-<23                            3

Determine the 65th percentile, the 70th percentile, and the 90th percentile
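
The notes do not include a solution; the sketch below applies the percentile interpolation formula to the absenteeism data in Python. The function name and class representation are illustrative.

```python
def grouped_percentile(classes, freqs, k):
    """kth percentile (k = 1..99) of a grouped frequency distribution."""
    n = sum(freqs)
    target = k * n / 100                        # step 2: n_k = (k / 100) * n
    cumulative_before = 0                       # F in the formula
    for (lower, upper), f in zip(classes, freqs):
        if cumulative_before + f >= target:     # steps 3-4: the percentile class
            h = upper - lower
            return lower + h * (target - cumulative_before) / f  # step 5
        cumulative_before += f
    raise ValueError("empty distribution")

classes = [(3, 7), (7, 11), (11, 15), (15, 19), (19, 23)]
employees = [14, 22, 11, 6, 3]

p65 = grouped_percentile(classes, employees, 65)
p70 = grouped_percentile(classes, employees, 70)
p90 = grouped_percentile(classes, employees, 90)
print(round(p65, 2), round(p70, 2), round(p90, 2))
```

The targets are 36.4, 39.2 and 50.4 observations, so the 65th and 70th percentiles fall in the class 11-<15 (about 11.15 and 12.16 days) and the 90th percentile falls in 15-<19 (about 17.27 days).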

1.4. Measures of dispersion


Two or more data sets may have the same mean and yet be very different in the way they spread
out. To describe this difference quantitatively, we use measures of dispersion. A measure of
dispersion indicates the amount of variation in a data set. Some of the commonly used measures of
spread are the range, inter-quartile range, semi-interquartile range (quartile deviation), variance,
standard deviation, and coefficient of variation.

1. Range

The range is the difference between the highest and lowest values in a data set.

It measures the distance across the entire data set.

Range = Maximum value − Minimum value

Example:

18 26 17 10 7 27 24 17 17 23 29 28
18 10 23 16 9 12 26 5 12 23 22 24
16 5

xmax = 29
xmin = 5
Range = 29 − 5 = 24
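
The same result can be confirmed with a few lines of Python, using the 26 observations listed above; the variable names are illustrative.

```python
data = [18, 26, 17, 10, 7, 27, 24, 17, 17, 23, 29, 28, 18,
        10, 23, 16, 9, 12, 26, 5, 12, 23, 22, 24, 16, 5]

# Range = maximum value - minimum value
data_range = max(data) - min(data)
print(max(data), min(data), data_range)
```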

2. Inter-quartile range (IQR)

Definition

Quartiles of a data set are values (partition values) that divide the data set into four equal parts when
data are arranged in ascending order.

There are three quartiles called lower quartile, the middle quartile (second quartile), and upper quartile.

The inter-quartile range is the difference between the upper and lower quartiles:

$$IQR = Q_3 - Q_1$$

3. Semi interquartile range or quartile deviation

$$SIQR\,(Q.D) = \frac{Q_3 - Q_1}{2}$$
Example:
Let
$Q_1 = 14.5$ days
$Q_2 = 18.89$ days
$Q_3 = 23.93$ days

$$SIQR\,(Q.D) = \frac{23.93 - 14.5}{2} = 4.715 \text{ days}$$
Interpretation: 50% of all observations are expected to lie within 4.715 days either side of the
median of 18.89 days. Equivalently, 25% of observations are expected to lie within 4.715 days below the
median and 25% of observations are expected to lie within 4.715 days above the median value.
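
A brief Python sketch of the two quartile-based measures, using the quartile values from the example above; the function names are illustrative.

```python
def interquartile_range(q1, q3):
    """IQR = Q3 - Q1."""
    return q3 - q1

def quartile_deviation(q1, q3):
    """Semi-interquartile range: (Q3 - Q1) / 2."""
    return interquartile_range(q1, q3) / 2

q1, q3 = 14.5, 23.93   # days, taken from the example above

iqr = interquartile_range(q1, q3)
qd = quartile_deviation(q1, q3)
print(round(iqr, 3), round(qd, 3))
```

For these quartiles the inter-quartile range is 9.43 days and the quartile deviation is 4.715 days.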

4. Variance and standard deviation


The most useful and reliable measures of dispersion are those that:
• Take every observation into account, and
• Are based on average deviation from the central value.

Because the variance is such a measure that satisfies these properties, it has become the most
commonly used measure of dispersion. It is extensively used in statistical analysis.

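The variance is calculated as the average of the squared deviations from the mean:

$$\text{variance} = \frac{\text{sum of squared deviations}}{\text{sample size} - 1}$$

For ungrouped data, the variance is calculated using the following formula:

$$S_x^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1} = \frac{\sum_{i=1}^{n} x_i^2 - n\bar{x}^2}{n - 1}$$

The computational formula for grouped data is

$$S_x^2 = \frac{\sum_{i=1}^{k} f_i (x_i - \bar{x})^2}{n - 1} = \frac{\sum_{i=1}^{k} f_i x_i^2 - n\bar{x}^2}{n - 1}$$

where $x_i$ is the mid-point of the ith class, $f_i$ is its frequency, k is the number of classes, and
$n = \sum_{i=1}^{k} f_i$.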

The variance is the average squared deviation about the arithmetic mean. It is
expressed in squared units. Consequently, its meaning in a practical sense is obscure.

Because of this interpretation problem, a measure that uses the original units is derived from the
variance: the standard deviation.

5. Standard deviation

$$S_x = \sqrt{S_x^2}$$
The standard deviation describes how observations are spread about the mean.
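
To make the formulas concrete, the Python sketch below computes the sample variance and standard deviation for ungrouped and grouped data. Reusing the dividend data and the checkout-time distribution from earlier examples is an assumption made for illustration; the function names are also illustrative.

```python
import math

def sample_variance(values):
    """Ungrouped data: sum of squared deviations divided by n - 1."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)

def grouped_sample_variance(midpoints, freqs):
    """Grouped data: frequency-weighted squared deviations divided by n - 1."""
    n = sum(freqs)
    mean = sum(f * x for x, f in zip(midpoints, freqs)) / n
    return sum(f * (x - mean) ** 2 for x, f in zip(midpoints, freqs)) / (n - 1)

# Ungrouped: percentage dividends over 10 years (mean 17)
dividends = [5, 6, 14, 20, 30, 10, 15, 20, 20, 30]
var_dividends = sample_variance(dividends)          # 692 / 9
std_dividends = math.sqrt(var_dividends)

# Grouped: checkout times, class mid-points and frequencies (mean 3.33)
midpoints = [2.25, 2.75, 3.25, 3.75, 4.25]
customers = [3, 8, 23, 10, 6]
var_times = grouped_sample_variance(midpoints, customers)   # 13.18 / 49
std_times = math.sqrt(var_times)

print(round(var_dividends, 2), round(std_dividends, 2))
print(round(var_times, 4), round(std_times, 4))
```

The shortcut form gives the same numerators: $\sum x_i^2 - n\bar{x}^2 = 3582 - 2890 = 692$ for the dividends, and $\sum f_i x_i^2 - n\bar{x}^2 = 567.625 - 554.445 = 13.18$ for the checkout times.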

6. Coefficient of variation
Sometimes it is necessary to compare samples of data from different random variables to
establish which sample shows greater variability. A direct comparison of their respective
standard deviations would be misleading, as the random variables may be measured in different
units. Thus, a meaningful comparison should be based on a measure of variability expressed in the
same units. This is achieved by producing a measure of relative variability (i.e. relative to the
mean) expressed in percentage terms, called the coefficient of variation.

$$CV = \frac{S_x}{\bar{x}} \times 100\%$$
Example

                     Turnover/month   Employee age
Mean                 R54588           38.2 yrs
Standard deviation   R8444            7.9 yrs
CV                   15.47%           20.68%

The age characteristic shows greater variability than turnover/month.
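
The two CV values in the table can be reproduced with a short Python sketch; the function name is illustrative.

```python
def coefficient_of_variation(std_dev, mean):
    """CV = (standard deviation / mean) * 100, as a percentage."""
    return std_dev / mean * 100

cv_turnover = coefficient_of_variation(8444, 54588)   # rands per month
cv_age = coefficient_of_variation(7.9, 38.2)          # years

print(round(cv_turnover, 2), round(cv_age, 2))
```

Because both results are percentages of their own means, they can be compared directly even though one variable is in rands and the other in years.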

TUTORIAL 1

1. A company employs 12 persons in managerial positions. Their seniority (in years of service)
and sex are listed below:

Sex F M F M F M M F F F F M
Seniority (yrs) 8 15 6 2 9 21 9 3 4 7 2 10
Find the seniority mean, the seniority median and the seniority mode for the above data set.
2. The daily percentage change (to the nearest percentage) of equity traded on the JSE was
monitored for 100 days by an investment analyst. These daily percentage changes were
summarized into the frequency distribution below.

Daily percentage change of an equity (%)   Number of days
2                                          15
3                                          30
4                                          25
5                                          19
6                                          8
7                                          2
8                                          1
Find the mean daily percentage change, the median daily percentage change, and the modal
daily percentage change.

3. Mary secully is employed as an “Affirmative Action Officer” by Ortex electronics. Mary
reports directly to the plant manager, and is responsible for monitoring and making
recommendations on Ortex hiring procedures, working conditions and compensation plans.
As part of her ongoing monitoring of compensation plans, Mary collected data on hourly
earnings of all non-salaried employees at Ortex. To aid in interpreting the data, Mary
organized the data into the following frequency distribution:

Hourly earnings (Rands)   Number of women   Number of men
4.70-4.90                 6                 5
4.90-5.10                 31                16
5.10-5.30                 15                25
5.30-5.50                 29                30
5.50-5.70                 19                24
Calculate the mean, median and the mode of the hourly earnings for the men.

4. The annual earnings of a company’s salesmen at its Johannesburg and Cape Town offices
are as follows:

                   Number of salesmen
Earnings (R1000s)  Johannesburg   Cape Town
6-<8               3              2
8-<10              7              3
10-<12             13             6
12-<14             17             8
14-<16             4              3
16-<20             4              2
20-<25             2              6

(a) Compare the salesmen’s earnings in the Johannesburg and Cape Town offices by finding the
means, medians and quartile deviations.
(b) Find the standard deviation.
