Introduction To Statistics

1

OBJECTIVES:

After the completion of the chapter, students should be able to:

1. differentiate descriptive from inferential Statistics;

2. distinguish population from sample;
3. identify the types of data and the levels of measurement for each given variable;
DISCUSSION:
Probability and statistics are the branches of mathematics concerned with the laws governing random events, including the collection, analysis, interpretation, and display of numerical data.
An introduction to descriptive statistics
Descriptive statistics summarize and organize characteristics of a data set. A data set is a
collection of responses or observations from a sample or entire population.
In quantitative research, after collecting data, the first step of statistical analysis is to describe
characteristics of the responses, such as the average of one variable (e.g., age), or the relation
between two variables (e.g., age and creativity).
The next step is inferential statistics, which help you decide whether your data confirms or
refutes your hypothesis and whether it is generalizable to a larger population.
Types of descriptive statistics
There are 3 main types of descriptive statistics:
-The distribution concerns the frequency of each value.
-The central tendency concerns the averages of the values.
-The variability or dispersion concerns how spread out the values are.
Research example
You want to study the popularity of different leisure activities by gender. You distribute a survey
and ask participants how many times they did each of the following in the past year:
Go to a library
Watch a movie at a theater
ROMMEL H. SARREAL, RME

INSTRUCTOR I
STAT 2
Visit a national park
Your data set is the collection of responses to the survey. Now you can use descriptive statistics
to find out the overall frequency of each activity (distribution), the averages for each activity
(central tendency), and the spread of responses for each activity (variability).
Frequency distribution
A data set is made up of a distribution of values, or scores. In tables or graphs, you can
summarize the frequency of every possible value of a variable in numbers or percentages.
Simple FDT
For the variable of gender, you list all possible answers in the left-hand column. You count the number or percentage of responses for each answer and display it in the right-hand column.
Gender Number
Man 182
Woman 235
No answer 27
From this table, you can see that more women than men took part in the study.
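As a minimal sketch, the simple frequency distribution above can be reproduced in Python. The list of responses is rebuilt from the counts in the table (an assumption for illustration, since the raw survey file is not given), and the standard-library `collections.Counter` tallies each answer.

```python
from collections import Counter

# Rebuild the gender responses from the counts in the table
# (illustrative only; the original survey responses are not given).
responses = ["Man"] * 182 + ["Woman"] * 235 + ["No answer"] * 27

counts = Counter(responses)        # frequency of each answer
total = sum(counts.values())       # number of respondents

for answer in ("Man", "Woman", "No answer"):
    pct = 100 * counts[answer] / total
    print(f"{answer:<10} {counts[answer]:>4} {pct:5.1f}%")
```

Running it lists each answer with its count and percentage (about 41.0%, 52.9% and 6.1% of the 444 respondents).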
Grouped FDT
In a grouped frequency distribution, you can group numerical response values and add up the
number of responses for each group. You can also convert each of these numbers to
percentages.
Library visits in the past year Percent
0–4 6%
5–8 20%
9–12 42%
13–16 24%
17+ 8%
From this table, you can see that most people (20% + 42% + 24% = 86%) visited the library between 5 and 16 times in the past year.

Measures of Central Tendency
Measures of central tendency are measures indicating the center of a set of data arranged in order of magnitude. A measure of central tendency is described as the point about which the scores tend to cluster; hence, it is regarded as a sort of average of the series. It is the center of the concentration of the scores: a single number that describes the whole set of data collected. When it is computed from a population it is called a parameter; when it is computed from a sample it is called a statistic.
There are three measures of central tendency commonly used. These are: the mean, median and
mode.
1. Mean or arithmetic mean (or average) is the most popular and well-known measure of central tendency. It can be used with both discrete and continuous data (although it is used most often with continuous data).
For Ungrouped data: The mean is the most frequently used measure of central tendency. The mean is denoted by the symbol µ (read as “mu”) for the population and X̄ (read as “x-bar”) for the sample.
The mean of a series of values is equal to the sum of the values divided by the number of values. Symbolically, the population mean is:
$$\mu = \frac{\sum X_i}{N} = \frac{X_1 + X_2 + X_3 + \dots + X_N}{N}$$
where:
µ - population mean
Xi - ith observed value in the population
N - total number of observations
and the sample mean is:
$$\bar{X} = \frac{\sum X_i}{n} = \frac{X_1 + X_2 + X_3 + \dots + X_n}{n}$$
where:
X̄ - sample mean
Xi - ith observed value in the sample
∑ - sum of all values
n - total number of observations
Example 1: The items listed below represent the scores of seven BS Applied Statistics students during the final examination. Compute the mean score: 89, 75, 90, 85, 78, 87 and 80.
$$\bar{X} = \frac{\sum X_i}{n} = \frac{89 + 75 + 90 + 85 + 78 + 87 + 80}{7} = \frac{584}{7} = 83.43$$

Example 2: Suppose the BS Applied Math program has 10 students and their heights (in cm) are as follows: 170, 165, 155, 160, 150, 149, 152, 161, 163, 175. Find the mean height of the students.
$$\mu = \frac{\sum X_i}{N} = \frac{170 + 165 + 155 + 160 + 150 + 149 + 152 + 161 + 163 + 175}{10} = \frac{1600}{10} = 160 \text{ cm}$$
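Both examples can be checked with a short Python sketch. It divides the sum by the count directly for Example 1 and uses the standard-library `statistics.mean`, which applies the same formula, for Example 2.

```python
from statistics import mean

# Example 1: final-exam scores of seven students (treated as a sample)
scores = [89, 75, 90, 85, 78, 87, 80]
x_bar = sum(scores) / len(scores)     # sum of the values / number of values

# Example 2: heights (cm) of all 10 students in the program (a population)
heights = [170, 165, 155, 160, 150, 149, 152, 161, 163, 175]
mu = mean(heights)                    # statistics.mean computes the same ratio

print(round(x_bar, 2), mu)
```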
For Grouped Data: The mean for grouped data is denoted by µG for the population and X̄G for the sample.
$$\mu_G = \frac{\sum f_i X_i}{N} = \frac{f_1 X_1 + f_2 X_2 + f_3 X_3 + \dots + f_k X_k}{N}$$
where:
µG - mean of the grouped data
Xi - class mark of the ith class
fi - frequency of the ith class
∑ - sum of all values
N - total number of observations
Example: The table below represents the scores of 64 students in a long quiz.
Class Interval Frequency Class Mark fiXi
5-9 7 7 49
10-14 10 12 120
15-19 13 17 221
20-24 18 22 396
25-29 8 27 216
30-34 5 32 160
35-39 3 37 111
Total N=64 ∑fiXi=1273
N = f1 + f2 + f3 + ... + fk
= 7 + 10 + 13 + 18 + 8 + 5 + 3
= 64
∑fiXi = f1X1 + f2X2 + f3X3 + ... + fkXk
= 7(7) + 10(12) + 13(17) + 18(22) + 8(27) + 5(32) + 3(37)
= 1273
$$\mu_G = \frac{\sum f_i X_i}{N} = \frac{1273}{64} = 19.89$$
Note: Add a column to the frequency distribution table for the product of the ith frequency and the ith class mark, then take the sum of this column.
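The grouped-mean procedure above (class marks, fiXi column, then the sum divided by N) can be sketched in Python as follows; the class intervals and frequencies are taken from the quiz-score table.

```python
# Quiz-score distribution of 64 students (from the table above)
intervals = [(5, 9), (10, 14), (15, 19), (20, 24), (25, 29), (30, 34), (35, 39)]
freqs = [7, 10, 13, 18, 8, 5, 3]

# Class mark X_i is the midpoint of the class limits
marks = [(lo + hi) / 2 for lo, hi in intervals]

N = sum(freqs)                                      # total observations
sum_fx = sum(f * x for f, x in zip(freqs, marks))   # sum of the fiXi column
mu_g = sum_fx / N                                   # grouped mean

print(N, sum_fx, round(mu_g, 2))
```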
Weighted Mean
The weighted mean is denoted by µw for the population and X̄w for the sample.
$$\mu_w = \frac{\sum w_i X_i}{\sum w_i} = \frac{w_1 X_1 + w_2 X_2 + w_3 X_3 + \dots + w_n X_n}{w_1 + w_2 + w_3 + \dots + w_n}$$
where:
µw - weighted mean
Xi - ith quantity
wi - weight of the ith quantity
∑ - sum of all values
Example: Consider the grades of a freshman student during the first semester, where each grade is weighted by the number of units of the subject.
Subject Units Grade wiXi

Purposive Comm 3 2.25 6.75
STS 3 1.75 5.25
MMW 3 2.00 6.00
Panitikan 3 2.00 6.00
PHED1 4 1.50 6.00
Military Science 1.5 1.25 1.875
Total ∑wi=17.5 ∑wiXi=31.875
$$\mu_w = \frac{\sum w_i X_i}{\sum w_i} = \frac{3(2.25) + 3(1.75) + 3(2.00) + 3(2.00) + 4(1.50) + 1.5(1.25)}{3 + 3 + 3 + 3 + 4 + 1.5} = \frac{31.875}{17.5} = 1.82$$
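A minimal Python sketch of the weighted mean, using the units as weights and the grades as the quantities from the table above:

```python
# Units (weights) and grades (quantities) from the first-semester example
units = [3, 3, 3, 3, 4, 1.5]
grades = [2.25, 1.75, 2.00, 2.00, 1.50, 1.25]

sum_w = sum(units)                                   # total units
sum_wx = sum(w * x for w, x in zip(units, grades))   # sum of the wiXi column
weighted_mean = sum_wx / sum_w

print(sum_w, sum_wx, round(weighted_mean, 2))
```

Note that a plain (unweighted) average of the six grades would ignore that PHED1 carries 4 units while Military Science carries only 1.5.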
Properties of the Mean
1. The sum of the deviations of the observations from the mean is zero. The deviation of the ith observation from the mean is denoted by
di = Xi - µ
Given the observed values 3, 8 and 4, the mean is 5.
d1 = X1 – 5 = 3 – 5 = -2
d2 = X2 – 5 = 8 – 5 = 3
d3 = X3 – 5 = 4 – 5 = -1
∑ di = d1 + d2 + d3 = -2 + 3 + (-1) = 0
2. The sum of the squared deviations of the observations from the mean is a minimum; that is, it is smaller than the sum of the squared deviations about any other value.
$$\sum d_i^2 = \sum (X_i - \mu)^2 = \sum (X_i - 5)^2 = (3-5)^2 + (8-5)^2 + (4-5)^2 = 14$$
$$\sum (X_i - X_1)^2 = \sum (X_i - 3)^2 = (3-3)^2 + (8-3)^2 + (4-3)^2 = 26$$
$$\sum (X_i - X_2)^2 = \sum (X_i - 8)^2 = (3-8)^2 + (8-8)^2 + (4-8)^2 = 41$$
$$\sum (X_i - X_3)^2 = \sum (X_i - 4)^2 = (3-4)^2 + (8-4)^2 + (4-4)^2 = 17$$
Hence, the sum of the squared deviations of the observations has its minimum value when the deviations are taken from the mean.
3. The mean reflects the magnitude of every observation, since every observation contributes to the value
of the mean.
4. The mean can be easily affected by the presence of an extreme value; hence, it is not a good measure of central tendency when extreme values occur.

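For example, adding the extreme value 50 to the previous data gives 3, 8, 4 and 50, whose mean is (3 + 8 + 4 + 50)/4 = 16.25, far above three of the four observations.
5. The means of subgroups may be combined when properly weighted; the combined mean is called the weighted mean.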
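The first, second and fourth properties can be checked numerically. The sketch below uses the data 3, 8, 4 from the text; `sum_sq_dev` is a hypothetical helper introduced only for this illustration.

```python
data = [3, 8, 4]
mu = sum(data) / len(data)                  # mean = 5.0

# Property 1: the deviations from the mean sum to zero
deviations = [x - mu for x in data]
print("sum of deviations:", sum(deviations))

# Property 2: the sum of squared deviations is smallest about the mean
def sum_sq_dev(values, a):
    """Sum of squared deviations of values about the point a."""
    return sum((x - a) ** 2 for x in values)

for a in (mu, 3, 8, 4):
    print("about", a, "->", sum_sq_dev(data, a))

# Property 4: a single extreme value pulls the mean strongly
with_outlier = data + [50]
print("mean with 50 added:", sum(with_outlier) / len(with_outlier))
```

The loop reproduces the hand-computed sums 14, 26, 41 and 17.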
2. Median is the middle score for a set of data arranged in order of magnitude. The median is preferred when the data contain extreme values.
For Ungrouped data: The median is defined as the middle value when a set of observed values has been arranged in either ascending (from lowest to highest value) or descending (from highest to lowest value) order of magnitude. The median divides the array into two equal parts; that is, 50% of the observations are less than the median value while the other 50% are greater than the median value. The median is denoted by Md.
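Symbolically, if a given set of data is denoted by X1, X2, . . ., Xn, the array (the data arranged in order of magnitude) is denoted by X(1), X(2), . . ., X(n). The median is
$$Md = X_{\left(\frac{n+1}{2}\right)} \quad \text{if } n \text{ is odd}$$
$$Md = \frac{X_{\left(\frac{n}{2}\right)} + X_{\left(\frac{n}{2}+1\right)}}{2} \quad \text{if } n \text{ is even}$$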
Example 1. The items listed below present the scores of seven graduate students during the final examination. Compute the median score: 89, 75, 90, 85, 78, 87, and 80.
Arrange the data in ascending or descending order of magnitude:
X(1) = 75 X(2) = 78 X(3) = 80 X(4) = 85 X(5) = 87 X(6) = 89 X(7) = 90
Since n = 7 is odd, Md = X((7+1)/2) = X(8/2) = X(4) = 85.
Example 2. Suppose the MA Math program has 10 graduate students and their heights (in cm) are as follows: 170, 165, 155, 160, 150, 149, 152, 161, 163, 175. Find the median height of the graduate students.
Arrange the data in ascending or descending order of magnitude:
X(1) = 149 X(2) = 150 X(3) = 152 X(4) = 155 X(5) = 160
X(6) = 161 X(7) = 163 X(8) = 165 X(9) = 170 X(10) = 175
Since n = 10 is even,
$$Md = \frac{X_{(10/2)} + X_{(10/2+1)}}{2} = \frac{X_{(5)} + X_{(6)}}{2} = \frac{160 + 161}{2} = 160.5 \text{ cm}$$
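The odd/even rule above translates directly into code. The `median` function below is a hypothetical helper written for illustration; the standard-library `statistics.median` gives the same results.

```python
def median(values):
    """Median of ungrouped data using the odd/even rule for the array X(1)..X(n)."""
    x = sorted(values)                       # the array, in ascending order
    n = len(x)
    if n % 2 == 1:
        return x[(n + 1) // 2 - 1]           # X((n+1)/2); Python lists are 0-based
    return (x[n // 2 - 1] + x[n // 2]) / 2   # average of X(n/2) and X(n/2 + 1)

print(median([89, 75, 90, 85, 78, 87, 80]))
print(median([170, 165, 155, 160, 150, 149, 152, 161, 163, 175]))
```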

For Grouped data: The median from grouped data can be calculated using the formula
$$Md = L_{md} + C\left(\frac{\frac{N}{2} - F_b}{f_{md}}\right)$$
where
Lmd - lower class boundary of the median class
C - class size
Fb - less than cumulative frequency (<CF) of the class before the median class
N - total number of observations
fmd - frequency of the median class
Note: The median class is the class containing the middle value, that is, the class which contains the (N/2)th observation. It can be easily identified under the less than cumulative frequency column.
Example: The table below presents the scores of 64 students in a long quiz.
Class Interval Frequency Class Mark Class Boundary <CF Array
5-9 7 7 4.5 – 9.5 7 X(1), X(2), . . ., X(7)
10-14 10 12 9.5 – 14.5 17 X(8), X(9), . . ., X(17)
15-19 13 17 14.5 – 19.5 30 X(18), X(19), . . ., X(30)
20-24 18 22 19.5 – 24.5 48 X(31), X(32), . . ., X(48)
25-29 8 27 24.5 – 29.5 56 X(49), X(50), . . ., X(56)
30-34 5 32 29.5 – 34.5 61 X(57), X(58), . . ., X(61)
35-39 3 37 34.5 – 39.5 64 X(62), X(63), X(64)
Total N=64
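The middle value is the X(32) observation (N/2 = 64/2 = 32), and it falls under the class interval 20-24, the first class whose <CF (48) reaches 32.
Lmd = 19.5 C = 5 Fb = 30 fmd = 18 N = 64
$$Md = 19.5 + 5\left(\frac{\frac{64}{2} - 30}{18}\right) = 19.5 + 5\left(\frac{2}{18}\right) \approx 20.06$$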
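A minimal sketch of the grouped-median formula: it walks down the less than cumulative frequencies until it reaches N/2, then interpolates inside that class. It assumes whole-number class limits, so each lower class boundary is the lower limit minus 0.5; `grouped_median` is a hypothetical helper name.

```python
# Quiz-score distribution (64 students) from the table above
lower_limits = [5, 10, 15, 20, 25, 30, 35]
freqs = [7, 10, 13, 18, 8, 5, 3]
C = 5                                     # class size

def grouped_median(lower_limits, freqs, C):
    N = sum(freqs)
    half = N / 2                          # position of the (N/2)th observation
    cum = 0                               # <CF of the classes before the current one
    for lo, f in zip(lower_limits, freqs):
        if cum + f >= half:               # first class whose <CF reaches N/2
            L_md = lo - 0.5               # lower class boundary (integer limits assumed)
            return L_md + C * (half - cum) / f
        cum += f
    raise ValueError("empty distribution")

print(round(grouped_median(lower_limits, freqs, C), 2))
```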
Properties of Median
1. Median is a positional value and hence is not affected by the presence of an extreme value unlike the
mean.
2. The sum of the absolute deviations from a point a, ∑│Xi – a│, is minimum when a = Md; that is, ∑│Xi – Md│ is the smallest such sum.
3. The Median is not amenable for further computation and hence medians of subgroups cannot be
combined in the same manner as the mean.
4. The median of grouped data can be calculated even with open-ended intervals, provided the median class is not open-ended.
3. Mode is the most frequent score in the data set. It is sometimes considered as the most popular option.

For Ungrouped data: The mode is the value that occurs most often, that is, the most frequently occurring observation. The mode is denoted by Mo.
Example 1. Consider the data set 1, 2, 2, 2, 8, 1, 4, 10. The most frequently occurring observation is 2
which appeared thrice. Thus, the mode is 2, and since there is only one mode, then the distribution is
unimodal.
Example 2. Suppose BS Applied Statistics has 10 students and their heights (in cm) are as follows: 170, 165, 155, 160, 150, 161, 163, 175.
Since all values occur with equal frequency (each appears once), this data set has no mode.
Example 3. A survey of the colors of cars owned by faculty shows that 40 were white, 20 were blue and 10 were red. The modal color of cars owned by faculty is white.
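The three examples can be reproduced with the standard-library `statistics.multimode` (Python 3.8+), which returns every value tied for the highest frequency. Treating "every value equally frequent" as "no mode" follows the rule of Example 2; `multimode` itself simply returns all the tied values.

```python
from statistics import multimode

# Example 1: a single most frequent value -> unimodal
print(multimode([1, 2, 2, 2, 8, 1, 4, 10]))

# Example 3: the mode also works for qualitative data
cars = ["white"] * 40 + ["blue"] * 20 + ["red"] * 10
print(multimode(cars))

# Example 2: every value occurs once, so all values tie
heights = [170, 165, 155, 160, 150, 161, 163, 175]
modes = multimode(heights)
has_mode = len(modes) < len(set(heights))   # False when all values tie
print(has_mode)
```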
For Grouped data: The mode of grouped data can be approximated using the formula
$$Mo = L_{mo} + c\left(\frac{f_{mo} - f_b}{2f_{mo} - f_b - f_a}\right)$$
where
Lmo - lower class boundary of the modal class
c - class size
fmo - frequency of the modal class
fb - frequency of the class before the modal class
fa - frequency of the class after the modal class
Note: The modal class is the class with the highest frequency.
Example: The table below represents the scores of 64 students in a long quiz.
Class Interval Frequency Class Mark Class Boundary

5-9 7 7 4.5 – 9.5
10-14 10 12 9.5 – 14.5
15-19 13 17 14.5 – 19.5
20-24 18 22 19.5 – 24.5
25-29 8 27 24.5 – 29.5
30-34 5 32 29.5 – 34.5
35-39 3 37 34.5 – 39.5
Total N=64
The modal class is the interval 20-24.
Lmo = 19.5 c = 5 fb = 13 fmo = 18 fa = 8
$$Mo = 19.5 + 5\left(\frac{18 - 13}{2(18) - 13 - 8}\right) = 19.5 + 5\left(\frac{5}{15}\right) = 21.17$$
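A minimal sketch of the grouped-mode formula. It assumes whole-number class limits (lower boundary = lower limit minus 0.5) and, as an assumption not stated in the text, treats a missing neighbouring class as having frequency 0 when the modal class is first or last; `grouped_mode` is a hypothetical helper name.

```python
lower_limits = [5, 10, 15, 20, 25, 30, 35]
freqs = [7, 10, 13, 18, 8, 5, 3]
c = 5                                                # class size

def grouped_mode(lower_limits, freqs, c):
    k = freqs.index(max(freqs))                      # modal class = highest frequency
    f_mo = freqs[k]
    f_b = freqs[k - 1] if k > 0 else 0               # frequency before the modal class
    f_a = freqs[k + 1] if k + 1 < len(freqs) else 0  # frequency after the modal class
    L_mo = lower_limits[k] - 0.5                     # lower class boundary
    return L_mo + c * (f_mo - f_b) / (2 * f_mo - f_b - f_a)

print(round(grouped_mode(lower_limits, freqs, c), 2))
```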

Properties of Mode
1. It may not be at the center of the data.
2. It does not make use of all observations.
3. It is difficult to manipulate algebraically.
4. It is well suited to qualitative (categorical) data.
Frequency Distribution Table
-Classes are mutually exclusive categories, each defined by a lower limit and an upper limit, with equal intervals.
-Class frequency is the number of observations in each class.
-Class mark or class midpoint is used in computing the mean and some measures of variability.
-Cumulative frequency is the running total of the class frequencies up to (less than cumulative frequency, <CF) or from (greater than cumulative frequency, >CF) a particular class of interest.
-Relative frequency is the percentage of observations in a particular class of interest.
Steps in Constructing a Frequency Distribution with Equal Class Size
1. Determine the range R of the numerical data.

R = Highest value – Lowest value
2. Determine the number of classes K into which the data are to be grouped, using the Sturges Approximation:
K = 1 + 3.322 log N
where N = total number of values to be grouped and log is the base-10 logarithm.
3. Determine the class size C

C=R/K
4. Determine the lower limit of the first class.

Note: There is no fixed rule in determining the lower limit of the first class. For the
purpose of uniform result, the lowest value in the data set should be the lower limit of the
first class.
5. Construct the class intervals and determine the class frequencies.
Remarks:
1. Sturges Approximation is just a guide and a flexible rule.

2. The number of classes should be large enough to show the major characteristics of the data, yet not so large that the advantage of summarizing the raw data is lost. For instance, when the highest observed value is not included in the last class constructed, do not increase the number of classes just to accommodate it; increase the class size instead.
3. Without the Sturges Approximation, the number of classes is usually taken between 5 and 20, depending on the nature of the data.

4. Class intervals are chosen so that the class marks coincide with actually observed data. However, class boundaries should not coincide with actually observed data.
Example: Raw scores of 50 students in a 200-item test.
144 112 156 122 168 172 141 159 127 154
156 145 134 137 123 149 144 160 136 139
142 138 159 151 147 150 126 152 147 136
135 132 146 133 150 122 139 149 152 129
131 155 116 140 145 135 160 125 172 163
1. The range R = 172 - 112 = 60
2. K = 1 + 3.322 log 50 = 6.643978, rounded up to 7
3. C = 60/7 = 8.571428571, rounded up to 9
4. The lower limit of the first class is 112
5. Construct the Frequency Distribution Table
Class Intervals Frequency Class Mark Class Boundary Relative Frequency (%) <CF >CF
112-120 2 116 111.5-120.5 4 2 50
121-129 7 125 120.5-129.5 14 9 48
130-138 10 134 129.5-138.5 20 19 41
139-147 12 143 138.5-147.5 24 31 31
148-156 11 152 147.5-156.5 22 42 19
157-165 5 161 156.5-165.5 10 47 8
166-174 3 170 165.5-174.5 6 50 3
Total 50 100
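The five construction steps can be sketched in Python using the 50 raw scores from the example. As assumptions, K and C are rounded up with `math.ceil` (as the worked example does), and the scores are whole numbers, so each upper limit is lo + C - 1 and each class boundary lies 0.5 beyond its limit.

```python
import math

scores = [
    144, 112, 156, 122, 168, 172, 141, 159, 127, 154,
    156, 145, 134, 137, 123, 149, 144, 160, 136, 139,
    142, 138, 159, 151, 147, 150, 126, 152, 147, 136,
    135, 132, 146, 133, 150, 122, 139, 149, 152, 129,
    131, 155, 116, 140, 145, 135, 160, 125, 172, 163,
]

n = len(scores)
R = max(scores) - min(scores)                 # Step 1: range
K = math.ceil(1 + 3.322 * math.log10(n))      # Step 2: Sturges, rounded up
C = math.ceil(R / K)                          # Step 3: class size, rounded up
lower = min(scores)                           # Step 4: first lower limit

# Step 5: build the classes and count the scores in each
rows = []
less_cf = 0
for i in range(K):
    lo = lower + i * C
    hi = lo + C - 1                           # integer-valued upper limit
    f = sum(lo <= s <= hi for s in scores)
    less_cf += f
    greater_cf = n - less_cf + f              # observations in this class and above
    rows.append((lo, hi, f, (lo + hi) / 2, lo - 0.5, hi + 0.5,
                 100 * f / n, less_cf, greater_cf))

for lo, hi, f, mark, lcb, ucb, rel, lcf, gcf in rows:
    print(f"{lo}-{hi} {f:>2} {mark:g} {lcb}-{ucb} {rel:g} {lcf:>2} {gcf:>2}")
```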
Graphical Presentation of Frequency Distributions with Equal Class Size
Measures of Dispersion
Measures of dispersion identify how a set of values spreads or fluctuates. The measures of dispersion include the range, the mean absolute deviation, the variance, the standard deviation, the coefficient of variation, the coefficient of skewness and the boxplot.
Range is the simplest measure of dispersion. It is the difference between the highest and lowest scores. It does not reflect the variation of the data lying between the highest and the lowest scores; therefore, it is not considered a reliable measure of variability and spread.
For Ungrouped data: The range of a set of data is the absolute difference between the highest and the lowest value in the set. The range is denoted by R.
R = │HV - LV│
where
R - Range
HV – Highest value
LV - Lowest value

Example 1. The items listed below represent the scores of seven BS Applied Statistics students during the final examination. Compute the range: 89, 75, 90, 85, 78, 87, and 80.
The range, R = │HV - LV│ = │90 - 75│ = 15
Example 2. Suppose the BS Applied Math program has 10 students and their heights (in cm) are as follows: 170, 165, 155, 160, 150, 149, 152, 161, 163, 175. Find the range of the heights of the BSAM students.
The range, R = │HV - LV│ = │175 - 149│ = 26
For Grouped data: The range for grouped data is denoted by RG.
RG = │ULHC – LLLC│
where:
RG - range of the grouped data
ULHC - Upper Limit of the Highest Class
LLLC - Lower Limit of the Lowest Class
Example 3. The table below represents the scores of 64 students in a long quiz.
Class Interval Frequency Class Mark

5-9 7 7
10-14 10 12
15-19 13 17
20-24 18 22
25-29 8 27
30-34 5 32
35-39 3 37
Total 64
RG = │ULHC – LLLC │ = │39 – 5 │ =34
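The ungrouped and grouped ranges from the three examples can be sketched in a few lines of Python; the grouped version assumes the class intervals are listed from lowest to highest, as in the table.

```python
scores = [89, 75, 90, 85, 78, 87, 80]
heights = [170, 165, 155, 160, 150, 149, 152, 161, 163, 175]

R_scores = abs(max(scores) - min(scores))      # |HV - LV|
R_heights = abs(max(heights) - min(heights))

# Grouped data: upper limit of the highest class minus lower limit of the lowest class
intervals = [(5, 9), (10, 14), (15, 19), (20, 24), (25, 29), (30, 34), (35, 39)]
R_G = abs(intervals[-1][1] - intervals[0][0])

print(R_scores, R_heights, R_G)
```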
Properties of the Range
1. It is a quick but rough measure of dispersion.

2. The larger the value of the range, the more dispersed are the observations.
3. It considers only the lowest and the highest value in the data set.
2. Variance takes into account the variation, or spread, of all items in a series about a measure of central tendency, the mean. It is related to, but not the same as, the mean absolute deviation, which averages the absolute deviations │Xi - µ│ instead of the squared deviations.
The variance considers the position of each observation relative to the mean. The variance of a given data set is the average of the squared deviations of the observations from the mean. The population variance is denoted by σ² (read as “sigma squared”) and the sample variance by s² (read as “s squared”).
For Ungrouped data: Given the set of values X1, X2, X3, . . ., XN, the deviation of the ith observation from the mean is Xi - µ. The population variance, σ², is
$$\sigma^2 = \frac{\sum (X_i - \mu)^2}{N} = \frac{(X_1 - \mu)^2 + (X_2 - \mu)^2 + (X_3 - \mu)^2 + \dots + (X_N - \mu)^2}{N}$$
The computational formula of the variance is
$$\sigma^2 = \frac{\sum X_i^2}{N} - \mu^2 = \frac{X_1^2 + X_2^2 + X_3^2 + \dots + X_N^2}{N} - \mu^2$$
Example 4. The following data present the scores of 7 BS Applied Statistics students in a quiz:
X1 = 4, X2 = 7, X3 = 8, X4 = 2, X5 = 2, X6 = 9, X7 = 3.
Compute the population variance.
$$\mu = \frac{4 + 7 + 8 + 2 + 2 + 9 + 3}{7} = \frac{35}{7} = 5$$
$$\sigma^2 = \frac{\sum (X_i - \mu)^2}{N} = \frac{(4-5)^2 + (7-5)^2 + (8-5)^2 + (2-5)^2 + (2-5)^2 + (9-5)^2 + (3-5)^2}{7} = \frac{52}{7} = 7.42857$$
Using the computational formula
$$\sigma^2 = \frac{\sum X_i^2}{N} - \mu^2 = \frac{4^2 + 7^2 + 8^2 + 2^2 + 2^2 + 9^2 + 3^2}{7} - 5^2 = \frac{227}{7} - 25 = 7.43$$
Note: The definitional and the computational formulas give the same population variance, but the computational formula is faster and easier to apply.
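A short Python sketch confirming that the definitional and computational formulas agree on the Example 4 data. The standard-library `statistics.pvariance` would return the same population variance.

```python
x = [4, 7, 8, 2, 2, 9, 3]
N = len(x)
mu = sum(x) / N

# Definitional formula: average squared deviation from the mean
var_def = sum((xi - mu) ** 2 for xi in x) / N

# Computational formula: mean of the squares minus the square of the mean
var_comp = sum(xi ** 2 for xi in x) / N - mu ** 2

print(round(var_def, 5), round(var_comp, 5))
```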
The sample variance is
$$s^2 = \frac{\sum (X_i - \bar{X})^2}{n - 1} = \frac{(X_1 - \bar{X})^2 + (X_2 - \bar{X})^2 + (X_3 - \bar{X})^2 + \dots + (X_n - \bar{X})^2}{n - 1}$$
The computational formula of the variance is
$$s^2 = \frac{n\sum X_i^2 - \left(\sum X_i\right)^2}{n(n - 1)}$$
Example 5: Given a random sample of size, n=10.
X1 =4, X2 =7, X3 =8, X4 =2, X5 =2
X6 =8, X7 =9, X8 =2, X9 =5, X10 =7.

Compute the sample variance.

$$\bar{X} = \frac{\sum X_i}{n} = \frac{4 + 7 + 8 + 2 + 2 + 8 + 9 + 2 + 5 + 7}{10} = \frac{54}{10} = 5.4$$
Using the definitional formula
$$s^2 = \frac{\sum (X_i - \bar{X})^2}{n - 1} = \frac{(4 - 5.4)^2 + (7 - 5.4)^2 + (8 - 5.4)^2 + \dots + (7 - 5.4)^2}{9} = \frac{68.4}{9} = 7.6$$
Using the computational formula
$$\sum X_i^2 = X_1^2 + X_2^2 + X_3^2 + \dots + X_n^2 = 4^2 + 7^2 + 8^2 + \dots + 7^2 = 360$$
$$s^2 = \frac{n\sum X_i^2 - \left(\sum X_i\right)^2}{n(n - 1)} = \frac{10(360) - (54)^2}{10(10 - 1)} = \frac{684}{90} = 7.6$$
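The same two sample-variance formulas in Python, on the Example 5 data; both divide by n - 1, as does the standard-library `statistics.variance`.

```python
x = [4, 7, 8, 2, 2, 8, 9, 2, 5, 7]
n = len(x)
x_bar = sum(x) / n

# Definitional formula with the n - 1 divisor
s2_def = sum((xi - x_bar) ** 2 for xi in x) / (n - 1)

# Computational formula: [n * sum(X^2) - (sum X)^2] / [n(n - 1)]
s2_comp = (n * sum(xi ** 2 for xi in x) - sum(x) ** 2) / (n * (n - 1))

print(x_bar, round(s2_def, 4), round(s2_comp, 4))
```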
For Grouped data: The variance of grouped data can be obtained using the formulas
$$\sigma_G^2 = \frac{\sum f_i X_i^2}{N} - \mu_G^2 = \frac{f_1 X_1^2 + f_2 X_2^2 + f_3 X_3^2 + \dots + f_k X_k^2}{N} - \mu_G^2$$
$$s_G^2 = \frac{n\sum f_i X_i^2 - \left(\sum f_i X_i\right)^2}{n(n - 1)}$$
where
fi - the frequency of the ith class
Xi - the class mark of the ith class
µG - the mean of the grouped data
Example 6: The table below represents the scores of 64 students in a long quiz.
Class Interval Frequency Class Mark fiXi fiXi²

5-9 7 7 49 343
10-14 10 12 120 1440
15-19 13 17 221 3757
20-24 18 22 396 8712
25-29 8 27 216 5832
30-34 5 32 160 5120
35-39 3 37 111 4107
Total 64 1273 29311
$$\sigma_G^2 = \frac{\sum f_i X_i^2}{N} - \mu_G^2 = \frac{29311}{64} - \left(\frac{1273}{64}\right)^2 = 62.347412 \approx 62.35$$

$$s_G^2 = \frac{n\sum f_i X_i^2 - \left(\sum f_i X_i\right)^2}{n(n - 1)} = \frac{64(29311) - (1273)^2}{64(64 - 1)} = 63.3370536 \approx 63.34$$
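A minimal Python sketch of both grouped-variance formulas, using the class marks and frequencies of Example 6; integer arithmetic keeps the column sums exact.

```python
marks = [7, 12, 17, 22, 27, 32, 37]
freqs = [7, 10, 13, 18, 8, 5, 3]

N = sum(freqs)
sum_fx = sum(f * x for f, x in zip(freqs, marks))        # sum of the fiXi column
sum_fx2 = sum(f * x * x for f, x in zip(freqs, marks))   # sum of the fiXi² column

var_pop_g = sum_fx2 / N - (sum_fx / N) ** 2              # population form
var_samp_g = (N * sum_fx2 - sum_fx ** 2) / (N * (N - 1)) # sample form

print(sum_fx, sum_fx2, round(var_pop_g, 2), round(var_samp_g, 2))
```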
Properties of the Variance
1. The variance is always non-negative;
2. The larger the value of the variance, the more dispersed the observations;
3. The variance can be easily manipulated algebraically;
4. Each observation contributes to the magnitude of the variance;
5. The unit of measure of the variance is the square of the unit of measure of the original data set.
Standard deviation is based on the deviations of all the scores in a series from the mean. The standard deviation is defined as the positive square root of the variance. It is denoted by σ for the population standard deviation and s for the sample standard deviation.
$$\sigma = \sqrt{\sigma^2} \qquad \sigma_G = \sqrt{\sigma_G^2}$$
$$s = \sqrt{s^2} \qquad s_G = \sqrt{s_G^2}$$
Example 7. Using the data in Example 4, compute the population standard deviation.
From Example 4, the population variance was 7.43, so the population standard deviation is σ = √7.43 = 2.7258.
Example 8. Using the data in Example 5, compute the sample standard deviation.
From Example 5, the sample variance was 7.6, so the sample standard deviation is s = √7.6 = 2.7568.
Example 9. Using the data in Example 6, compute the population and sample standard deviations.
From Example 6, the population variance was 62.35, so the population standard deviation is σG = √62.347412 = 7.896; the sample variance was 63.34, so the sample standard deviation is sG = √63.3370536 = 7.958.
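The three standard-deviation examples can be checked with the standard-library `statistics.pstdev` (population, divisor N) and `statistics.stdev` (sample, divisor n - 1); the grouped values take the square root of the Example 6 formulas directly.

```python
import math
from statistics import pstdev, stdev

# Example 7: population standard deviation of the Example 4 quiz scores
sigma = pstdev([4, 7, 8, 2, 2, 9, 3])

# Example 8: sample standard deviation of the Example 5 data
s = stdev([4, 7, 8, 2, 2, 8, 9, 2, 5, 7])

# Example 9: square roots of the grouped variances from Example 6
N, sum_fx, sum_fx2 = 64, 1273, 29311
sigma_g = math.sqrt(sum_fx2 / N - (sum_fx / N) ** 2)
s_g = math.sqrt((N * sum_fx2 - sum_fx ** 2) / (N * (N - 1)))

print(round(sigma, 4), round(s, 4), round(sigma_g, 3), round(s_g, 3))
```

The population value differs from 2.7258 in the fourth decimal only because the text rounds the variance to 7.43 before taking the square root.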
Properties of Standard Deviation
The standard deviation has the same properties as the variance except property 5: the unit of measure of the standard deviation is the same as the unit of measure of the raw data.
Coefficient of variation, also known as relative dispersion, is the ratio of the standard deviation to the mean, usually expressed in percent; i.e.,
$$CV = \frac{s}{\bar{X}} \times 100\% \quad \text{or} \quad CV = \frac{\sigma}{\mu} \times 100\%$$
The coefficient of variation is a unitless measure of dispersion, hence it can be used to compare
variability of two or more groups of data measured in the same or different units.
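Because the CV is unitless, it can compare the spread of the Example 4 quiz scores with that of the student heights in centimetres. This sketch uses the population form, σ/µ × 100; `cv_percent` is a hypothetical helper name.

```python
from statistics import mean, pstdev

def cv_percent(values):
    """Coefficient of variation (population form): sigma / mu * 100."""
    return pstdev(values) / mean(values) * 100

scores = [4, 7, 8, 2, 2, 9, 3]                                 # quiz scores
heights = [170, 165, 155, 160, 150, 149, 152, 161, 163, 175]   # heights in cm

print(f"scores:  CV = {cv_percent(scores):.1f}%")
print(f"heights: CV = {cv_percent(heights):.1f}%")
```

The quiz scores vary far more relative to their mean (about 54.5%) than the heights do (about 5.1%), even though the two data sets are in different units.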
Skewness is a measure of how asymmetric the distribution of data is about the mean. Positive skewness indicates a distribution with an asymmetric tail extending toward the right side of the distribution, while negative skewness indicates a distribution with an asymmetric tail extending toward the left.

The distribution of data is said to be symmetric about the mean if its graph can be folded along a vertical axis through the mean so that the two sides coincide. Analytically, if the coefficient of skewness is zero, then the distribution of the data is symmetric about the mean.
Using Measures of Central Tendency
-If Mean = Median = Mode, the skewness is zero (symmetric).
-If Mean > Median > Mode, the skewness is positive.
-If Mean < Median < Mode, the skewness is negative.
Example: Given a random sample of size n = 10,
4, 7, 8, 2, 8, 8, 9, 2, 5, 7
Using the measures of central tendency, tell whether the given data are symmetric, skewed to the left, or skewed to the right.
Mean = 6 Median = 7 Mode = 8
Since Mean < Median < Mode, the data are negatively skewed (skewed to the left).
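The rule of thumb above can be sketched as a small classifier built on the standard-library `statistics` functions. Returning "not applicable" when there is no single mode is an assumption added for this illustration; `shape_by_averages` is a hypothetical helper name.

```python
from statistics import mean, median, multimode

def shape_by_averages(values):
    """Classify skewness by comparing mean, median and mode."""
    modes = multimode(values)
    if len(modes) != 1:
        return "not applicable: no single mode"
    m, md, mo = mean(values), median(values), modes[0]
    if m == md == mo:
        return "symmetric"
    if m > md > mo:
        return "positively skewed (to the right)"
    if m < md < mo:
        return "negatively skewed (to the left)"
    return "inconclusive by this rule"

x = [4, 7, 8, 2, 8, 8, 9, 2, 5, 7]
print(mean(x), median(x), multimode(x), shape_by_averages(x))
```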
The formula for the Pearsonian coefficient of skewness, denoted by SK, is
$$SK = \frac{3(\mu - Md)}{\sigma}$$
where
SK - Pearsonian coefficient of skewness
μ - the mean
Md - the median
σ - the standard deviation
If SK = 0, then the distribution is symmetric;
if SK > 0, then the distribution is positively skewed;
if SK < 0, then the distribution is negatively skewed.
Example 10: The following data represent the scores of 7 BS Applied Statistics students in a quiz:
X1 = 4, X2 = 7, X3 = 8, X4 = 2, X5 = 2, X6 = 9, X7 = 3.
Solution: Md = 4 μ = 5 σ = 2.73
$$SK = \frac{3(5 - 4)}{2.73} = 1.0989 \approx 1.10$$
Hence, the distribution is positively skewed.
Example 11: Given a random sample of size, n=10.

X1=4, X2=7, X3=8, X4=2, X5=2, X6=8, X7=9, X8=2, X9=5, X10=7.
Compute the coefficient of skewness.
Solution: The sample mean is 5.4; s = 2.76; Md = 6 (the average of the 5th and 6th ordered values, 5 and 7).
$$SK = \frac{3(5.4 - 6)}{2.76} = -0.6522 \approx -0.65$$
Hence, the distribution is negatively skewed.
Example 12: Using the data from the frequency distribution table in Example 6, compute the coefficient of skewness.
Solution: MdG = 20.06 μG = 19.89 σG = 7.896
$$SK = \frac{3(19.89 - 20.06)}{7.896} = -0.0646 \approx -0.06$$
Hence, the distribution is slightly negatively skewed.
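A sketch of the Pearsonian coefficient for Examples 10 to 12. It uses σ (`pstdev`) for the population data, s (`stdev`) for the sample data, and recomputes the grouped mean, median and standard deviation from the Example 6 figures; `pearson_sk` is a hypothetical helper name.

```python
import math
from statistics import mean, median, pstdev, stdev

def pearson_sk(mean_value, median_value, sd):
    """Pearsonian coefficient of skewness, SK = 3(mean - median) / sd."""
    return 3 * (mean_value - median_value) / sd

pop = [4, 7, 8, 2, 2, 9, 3]                  # Example 10 (population, sigma)
sample = [4, 7, 8, 2, 2, 8, 9, 2, 5, 7]      # Example 11 (sample, s)

sk_pop = pearson_sk(mean(pop), median(pop), pstdev(pop))
sk_sample = pearson_sk(mean(sample), median(sample), stdev(sample))

# Example 12: grouped mean, median and standard deviation from Example 6
mu_g = 1273 / 64
md_g = 19.5 + 5 * (64 / 2 - 30) / 18
sigma_g = math.sqrt(29311 / 64 - mu_g ** 2)
sk_grouped = pearson_sk(mu_g, md_g, sigma_g)

print(round(sk_pop, 2), round(sk_sample, 2), round(sk_grouped, 2))
```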
Univariate descriptive statistics
Univariate descriptive statistics focus on only one variable at a time. It’s important to examine
data from each variable separately using multiple measures of distribution, central tendency and
spread. Programs like SPSS and Excel can be used to easily calculate these.
Visits to the library
N 6
Mean 9.5
Median 7.5
Mode 3
Standard deviation 9.18
Variance 84.3
Range 24
If you were to only consider the mean as a measure of central tendency, your impression of the
“middle” of the data set can be skewed by outliers, unlike the median or mode.
Likewise, while the range is sensitive to extreme values, you should also consider the standard
deviation and variance to get easily comparable measures of spread.
Bivariate descriptive statistics

If you’ve collected data on more than one variable, you can use bivariate or multivariate
descriptive statistics to explore whether there are relationships between them.
In bivariate analysis, you simultaneously study the frequency and variability of two variables to
see if they vary together. You can also compare the central tendency of the two variables before
performing further statistical tests.
Multivariate analysis is the same as bivariate analysis but with more than two variables.
Contingency table
In a contingency table, each cell represents the intersection of two variables. Usually, an
independent variable (e.g., gender) appears along the vertical axis and a dependent one
appears along the horizontal axis (e.g., activities). You read “across” the table to see how the
independent and dependent variables relate to each other.
Number of visits to the library in the past year
Group 0–4 5–8 9–12 13–16 17+
Men 32 68 37 23 22
Women 36 48 43 83 25
Interpreting a contingency table is easier when the raw data are converted to percentages. Percentages make the rows comparable to one another by making it seem as if each group had only 100 observations or participants. When creating a percentage-based contingency table, you add the N for each group of the independent variable in a final column.
Visits to the library in the past year (Percentages)
Group 0–4 5–8 9–12 13–16 17+ N
Men 18% 37% 20% 13% 12% 182
Women 15% 20% 18% 35% 11% 235

From this table, it is clearer that similar proportions of men and women (12% and 11%) go to the library 17 or more times a year. Additionally, men most commonly went to the library between 5 and 8 times, while for women, this number was between 13 and 16.
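The percentage table can be derived from the raw contingency table with a few lines of Python: each count is divided by its row total N and rounded to a whole percent.

```python
visits = ["0-4", "5-8", "9-12", "13-16", "17+"]
table = {
    "Men":   [32, 68, 37, 23, 22],
    "Women": [36, 48, 43, 83, 25],
}

row_pct = {}
for group, counts in table.items():
    n = sum(counts)                                        # N for this row
    row_pct[group] = [round(100 * c / n) for c in counts]  # whole-number percents
    print(group, dict(zip(visits, row_pct[group])), "N =", n)
```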
