FORM 4 – STATISTICS CLASS NOTES
TABLE OF CONTENTS
SECTION HEADER
TYPES OF DATA
STATISTICAL DIAGRAMS
➢ Bar Graphs
➢ Pie-Charts (To be completed on your own)
➢ Histogram
➢ Frequency Polygon
KEY TERMS REVIEW
➢ CLASS INTERVAL
➢ CLASS LIMIT
➢ CLASS BOUNDARY
➢ UPPER-CLASS BOUNDARY
➢ LOWER-CLASS BOUNDARY
➢ CLASS MIDPOINT
➢ CLASS WIDTH
➢ RANGE
➢ ESTIMATED MEAN
➢ MODAL CLASS
➢ MEDIAN CLASS
➢ PROBABILITY
MEAN & ESTIMATED / WEIGHTED MEAN
➢ Version 1 – Discrete Data organized in Table Form
➢ Version 2 – Continuous Data or Data organized by class intervals
PAST PAPER QUESTION ON KEY TERMS / CONCEPTS
3 AVERAGES
➢ Mean
➢ Median
➢ Mode
TYPES OF DATA:
▪ DISCRETE – Data that can take only certain values (countable data, where only a limited number of values is
possible). Discrete values cannot be subdivided into parts, i.e. discrete data is expressed as whole-number values
only. For example, the number of children in a school is discrete data: you can count whole individuals, but you
can’t count 1.5 kids. We can display discrete data using bar graphs and pie charts.
EX:
- Number of students in a class,
- The number of workers in a company,
- The number of parts damaged during transportation,
- The number of siblings a randomly selected individual has.
- Number of languages an individual speaks.
▪ CONTINUOUS – Continuous data is information that can be measured and could be meaningfully divided into
finer levels. It can be measured on a scale or continuum and can have almost any numeric value. We can
display continuous data by Histograms and Line graphs.
EX:
- You can measure your height at very precise scales: metres, centimetres, millimetres, etc.
(Height = 154.32 cm)
- The amount of time required to complete a project.
- The height of children.
- The amount of rain, in inches, that falls in a hurricane.
- The square footage of a two-bedroom house.
- The weight of a truck.
▪ QUANTITATIVE – Data expressed as numbers (Discrete and Continuous)
▪ QUALITATIVE – Non-numerical data represented by a name, symbol, number code or opinion, which may be
unique to one person
EX:
STATISTICAL DIAGRAMS
▪ BAR GRAPHS: A Bar graph is a pictorial representation of numerical data by a number of bars of uniform
width, erected horizontally or vertically, with equal spacing between them.
o KEY FEATURES –
▪ Title for Bar Graph and Axes
▪ Scale
▪ Rectangular Bars / Lines that express the frequency of data
▪ Fixed space between the bars
▪ Fixed width for bars
o TYPES –
▪ Quantitative Data (Discrete and Continuous)
[Figure: Bar Graph of 3B's Ice-Cream Preferences. y-axis: No. of Students (Frequency); x-axis: Ice-Cream
Flavours (Chocolate, Cookies & Cream, Coconut, Mint Chocolate, Vanilla). Bar heights appearing in the figure:
3, 5, 8, 8 and 13.]
▪ PIE-CHART (COMPLETE ON YOUR OWN)
▪ HISTOGRAM: A Histogram is a pictorial representation of the numerical data by a number of bars of uniform
width erected horizontally or vertically with no spacing between them.
o KEY FEATURES –
▪ Title for Histogram and Axes
▪ Scale
▪ Rectangular Bars that express the frequency of data
▪ No space between the bars
▪ Fixed width for bars
▪ Graph Frequency (y-axis) vs. Class Boundaries (x-axis)
o DATA TYPE –
▪ Quantitative Data (Continuous)
Class Interval Frequency
300 – 399 23
400 – 499 30
500 – 599 17
600 - 699 10
[Figure: Histogram of Math SAT Scores for Form 5 Class. y-axis: Frequency; x-axis: SAT Math Scores, with bars
for 300 – 399 (23), 400 – 499 (30), 500 – 599 (17) and 600 – 699 (10).]
Class Interval Class Boundary Frequency
20 – 29 19.5 – 29.5 3
30 – 39 29.5 – 39.5 5
40 – 49 39.5 – 49.5 8
50 – 59 49.5 – 59.5 6
60 – 69 59.5 – 69.5 2
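The Class Boundary column above appears to follow from the class limits: the gap between one upper limit and the next lower limit is 1 unit, so each limit is moved outwards by half of that gap. A minimal Python sketch of that calculation is given here as an optional check; the function name class_boundaries is made up for illustration and is not part of the notes.

```python
# Class limits (lower, upper) copied from the table above.
class_limits = [(20, 29), (30, 39), (40, 49), (50, 59), (60, 69)]


def class_boundaries(limits):
    """Return (lower boundary, upper boundary) for each class interval.

    Hypothetical helper: assumes the gap between consecutive classes
    is the same throughout the table.
    """
    # Gap between the first upper limit and the next lower limit (1 unit here).
    gap = limits[1][0] - limits[0][1]
    half_gap = gap / 2
    return [(lower - half_gap, upper + half_gap) for lower, upper in limits]


boundaries = class_boundaries(class_limits)
for (lower, upper), (lb, ub) in zip(class_limits, boundaries):
    print(f"{lower} - {upper}: {lb} - {ub}")
```

Each upper boundary equals the lower boundary of the next class, which is why the bars of a histogram touch.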
▪ FREQUENCY POLYGON: A Frequency Polygon is a pictorial representation of the numerical data created by
using lines to join the midpoints of each class interval. It is used to depict trends in the data. Sometimes it is
left “free floating” and other times it is expressed as a closed polygon.
o KEY FEATURES –
▪ Title for Frequency Polygon and Axes
▪ Scale
▪ Straight lines connecting the midpoints of class intervals
▪ Free Floating or Closed Polygon
▪ Graph Frequency (y-axis) vs. Class Midpoint (x-axis)
o DATA TYPE –
▪ Quantitative Data (Continuous)
Midpoint Frequency
349.5 23
449.5 30
549.5 17
649.5 10
[Figure: Frequency Polygon of SAT Math Scores (FREE-FLOATING). y-axis: Frequency (0 to 35); x-axis: SAT Math
Scores (0 to 700).]
Midpoint Frequency
249.5 0
349.5 23
449.5 30
549.5 17
649.5 10
749.5 0
[Figure: Frequency Polygon of SAT Math Scores (CLOSED POLYGON). y-axis: Frequency (0 to 35); x-axis: SAT Math
Scores (0 to 800).]
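Closing the polygon seems to amount to adding one extra midpoint, with a frequency of 0, one class width before the first class and one class width after the last class. The short Python sketch below illustrates this with the SAT midpoints; the helper name close_polygon is an assumption introduced only for this example.

```python
# Midpoints and frequencies of the free-floating polygon.
midpoints = [349.5, 449.5, 549.5, 649.5]
frequencies = [23, 30, 17, 10]


def close_polygon(xs, ys):
    """Return the plotting points of the closed polygon.

    Hypothetical helper: assumes all classes have the same width, so the
    distance between neighbouring midpoints equals the class width.
    """
    step = xs[1] - xs[0]  # class width (100 for this table)
    closed_xs = [xs[0] - step] + list(xs) + [xs[-1] + step]
    closed_ys = [0] + list(ys) + [0]  # the polygon touches the x-axis at both ends
    return list(zip(closed_xs, closed_ys))


closed_points = close_polygon(midpoints, frequencies)
for x, y in closed_points:
    print(x, y)
```

The first and last points, (249.5, 0) and (749.5, 0), match the two rows that were added to the table for the closed polygon.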
Class Interval Class Midpoint Frequency
20 – 29 24.5 3
30 – 39 34.5 5
40 – 49 44.5 8
50 – 59 54.5 6
60 – 69 64.5 2
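As an optional check, the Class Midpoint column above can be reproduced by averaging the two class limits of each interval. This is only a small sketch in Python; the variable names are chosen for illustration.

```python
# Class limits (lower, upper) and frequencies copied from the table above.
class_limits = [(20, 29), (30, 39), (40, 49), (50, 59), (60, 69)]
frequencies = [3, 5, 8, 6, 2]

# Midpoint = (lower limit + upper limit) / 2 for each class interval.
midpoints = [(lower + upper) / 2 for lower, upper in class_limits]

# Points to plot for the frequency polygon: (midpoint, frequency).
polygon_points = list(zip(midpoints, frequencies))
print(polygon_points)
```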
OBJECTIVES: (9/23/20 - 9/30/20)
Review key terms from Form 3 Statistics
Discuss the features and Graph a Histogram by hand
Discuss the features and Graph a Frequency Polygon by hand
Introduce the concepts of modal class and determining the exact mode from a
Histogram
KEY TERMS
▪ CLASS INTERVAL
▪ CLASS LIMIT
▪ CLASS BOUNDARY
▪ UPPER-CLASS BOUNDARY
▪ LOWER-CLASS BOUNDARY
▪ CLASS MIDPOINT
▪ CLASS WIDTH
▪ RANGE
▪ ESTIMATED MEAN
▪ MODAL CLASS
▪ PROBABILITY
▪ MEDIAN CLASS (TBD)
SAMPLE QUESTION: (GROUPED DATA)
CLASS INTERVAL FREQUENCY CLASS WIDTH
300 – 399 13 399.5 – 299.5 =100
400 – 499 20 499.5 – 399.5 =100
500 – 599 7 599.5 – 499.5 =100
Class Interval: One of the groups or ranges into which a large quantity of data is arranged. Each class
interval must have the same class width.
For interval: 300 – 399
Class Limit: The smallest and largest data values that can belong to a class interval.
Lower Limit = 300, Upper Limit = 399
Class Boundary: The numbers used to separate class intervals. Each boundary is the midpoint between the
upper-class limit of one class and the lower-class limit of the subsequent class.
Since there is a 1-unit gap between intervals, subtract 0.5 from each lower limit and add 0.5 to each upper limit.
Lower Class Boundary = Lower Limit − 0.5 = 300 − 0.5 = 299.5
Upper Class Boundary = Upper Limit + 0.5 = 399 + 0.5 = 399.5
Class Boundaries = 299.5 to 399.5
Class Width: The difference between the Upper-Class and Lower-Class Boundaries in every class interval. The
class width must be the same for every class interval in a frequency distribution.
Class Width = Upper-Class Boundary − Lower-Class Boundary = 399.5 − 299.5 = 100
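The rule that every class interval must have the same width can be checked mechanically. The sketch below is an illustration only: it assumes the 0.5 adjustment used in these notes, and the function name class_width is hypothetical.

```python
# Class limits from the sample question.
class_limits = [(300, 399), (400, 499), (500, 599)]


def class_width(lower_limit, upper_limit, half_gap=0.5):
    """Class width = upper-class boundary - lower-class boundary."""
    lower_boundary = lower_limit - half_gap
    upper_boundary = upper_limit + half_gap
    return upper_boundary - lower_boundary


widths = [class_width(lower, upper) for lower, upper in class_limits]
all_equal = len(set(widths)) == 1  # True when every class has the same width

print(widths, all_equal)
```

Note that subtracting the class limits directly (399 − 300 = 99) would understate the width, which is why the boundaries are used.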
Class Midpoint: The middle value between the class limits of a class interval.
$$\text{Class Midpoint} = \frac{x_1 + x_2}{2}$$
where $x_1$ and $x_2$ are the lower and upper class limits.
$$\frac{x_1 + x_2}{2} = \frac{300 + 399}{2} = 349.5$$
Range: The difference between the highest and lowest data entries.
Modal Class: The class interval with the highest frequency. Here, the modal class is 400 – 499.
Probability: How likely an event will occur in an experiment. Probability must be a number between 0 and 1.
PROBABILITY SPECTRUM
$$P(\text{Event Occurring}) = \frac{\text{No. of favourable outcomes in experiment}}{\text{Total no. of outcomes}}$$
$$P(\text{student obtaining a score of 400 or more}) = \frac{20 + 7}{40} = \frac{27}{40} = 0.675$$
Based on the Probability Spectrum, a probability of 0.675 ≈ 0.7 means that the event is likely to occur.
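The probability calculation above can be reproduced from the frequency table of the sample question. The sketch below is an optional check in Python, using the standard fractions module to keep the exact value 27/40; the dictionary layout is just one convenient way to hold the table.

```python
from fractions import Fraction

# Frequencies from the sample question (grouped data).
frequencies = {"300 - 399": 13, "400 - 499": 20, "500 - 599": 7}

# Favourable outcomes: students in the classes 400 - 499 and 500 - 599.
favourable = frequencies["400 - 499"] + frequencies["500 - 599"]
total = sum(frequencies.values())

probability = Fraction(favourable, total)
print(probability, float(probability))

# A probability must always lie between 0 and 1.
is_valid = 0 <= probability <= 1
```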
DETERMINING THE ESTIMATED MEAN FOR GROUPED DATA
CLASS INTERVAL   CLASS BOUNDARIES   MIDPOINT (x)   FREQUENCY (f)   x × f
300 – 399        299.5 – 399.5      349.5          13              4,543.5
400 – 499        399.5 – 499.5      449.5          20              8,990
500 – 599        499.5 – 599.5      549.5          7               3,846.5
600 – 699        599.5 – 699.5      649.5          29              18,835.5
                                                   ∑f = 69         ∑(x × f) = 36,215.5
$$\text{Estimated Mean} = \frac{\text{Sum of all data entries}}{\text{Total no. of data entries}} = \frac{\sum (x \times f)}{\sum f}$$
where f = frequency and x = midpoint of class interval.
$$\text{Estimated Mean} = \frac{36{,}215.5}{69} \approx 524.9$$
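The table and formula above can be checked with a few lines of Python. This is a minimal sketch, not a prescribed method; the function name estimated_mean is introduced here only for illustration.

```python
# (class limits, frequency) for each row of the table above.
rows = [((300, 399), 13), ((400, 499), 20), ((500, 599), 7), ((600, 699), 29)]


def estimated_mean(table):
    """Estimated mean = sum of (midpoint x frequency) / sum of frequencies."""
    total_f = 0
    total_xf = 0.0
    for (lower, upper), f in table:
        x = (lower + upper) / 2  # class midpoint
        total_f += f
        total_xf += x * f
    return total_xf / total_f


mean = estimated_mean(rows)
print(round(mean, 1))
```

The result is called an estimate because every score in a class is treated as if it were equal to the class midpoint.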
MEAN & ESTIMATED / WEIGHTED MEAN
(Revisited)
WARM-UP (9/30/20)
EX 1:
A police station kept records of the number of road traffic accidents in their area each day for 100 days. The data was
tabulated in a frequency table shown below. Calculate the mean number of accidents per day.
EX 2:
The table below gives data on the heights, in cm, of 51 children.
SAMPLE:
(VERSION 1) – DISCRETE DATA PRESENTED IN A TABLE
0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4,
4, 4, 4, 4, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 6, 6, 6, 6, 6, 6, 7, 8
To avoid this tedious process, the data was organized into a table to summarize the information.
For example, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1 = 10 × 1, and so on.
$$\text{Mean} = \frac{\text{Sum of all data entries}}{\text{Total no. of data entries}} = \frac{\sum (x \times f)}{\sum f} = \frac{323}{100} = 3.23$$
where f = frequency and x = no. of accidents.
(VERSION 2) – GROUPED DATA
$$\text{Estimated Mean} = \frac{\text{Sum of all data entries}}{\text{Total no. of data entries}} = \frac{\sum (x \times f)}{\sum f} = \frac{8215}{51} \approx 161 \text{ cm}$$
where f = frequency and x = midpoint of class interval.
Q&A: Discrete vs Continuous Data?
Discrete Data is countable – Ex: No. of students in a class
Continuous Data is measured – Ex: Height or Weight of students
SAMPLE QUESTION: CSEC Paper 2 – June ’16 (Question 7 modified)
Twenty bags of sugar were weighed. The weights, to the nearest kg, are as follows:
3 38 17 33 28
12 43 38 31 30
11 8 23 18 26
50 22 35 39 5
a) Complete the frequency table for the data shown above.
Weight (kg) Tally Number of bags
1 – 10 III 3
11 – 20 IIII 4
21 – 30 IIII 5
31 – 40 IIII I 6
41 – 50 II 2
∑ 𝑓 = 20
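The tallying in part (a) can be double-checked by sorting each weight into its class interval. The Python sketch below is one possible way to do this; the variable names are chosen for illustration.

```python
# The twenty weights (kg) from the question.
weights = [3, 38, 17, 33, 28,
           12, 43, 38, 31, 30,
           11, 8, 23, 18, 26,
           50, 22, 35, 39, 5]

# Class limits (lower, upper) of the frequency table.
class_limits = [(1, 10), (11, 20), (21, 30), (31, 40), (41, 50)]

# Start every class at a frequency of 0, then "tally" each weight.
frequency = {limits: 0 for limits in class_limits}
for w in weights:
    for lower, upper in class_limits:
        if lower <= w <= upper:
            frequency[(lower, upper)] += 1
            break

for (lower, upper), f in frequency.items():
    print(f"{lower} - {upper}: {f}")
```

The frequencies should add up to the number of bags, which is a quick way to catch a missed or double-counted value.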
Weight (kg) Class Boundaries Number of bags
1 – 10 0.5 – 10.5 3
11 – 20 10.5 – 20.5 4
21 – 30 20.5 – 30.5 5
31 – 40 30.5 – 40.5 6
41 – 50 40.5 – 50.5 2
∑ 𝑓 = 20
b) For the class interval 31 – 40, state:
i) The upper-limit = 40
ii) The lower-class boundary = 31 – 0.5 = 30.5
iii) The upper-class boundary = 40 + 0.5 = 40.5
iv) The class midpoint = $\frac{x_1 + x_2}{2} = \frac{31 + 40}{2} = 35.5$
c) For the data above, determine
i) The range = highest data entry – smallest data entry = 50 – 3 = 47
ii) The class width = upper-class boundary – lower-class boundary = 40.5 – 30.5 = 10
(must be the same value for every class interval)
iii) The estimated mean (= Mean / Average)
Weight (kg)   Midpoint (x)   Number of bags (f)   x × f
1 – 10        5.5            3                    16.5 (5.5 × 3)
11 – 20       15.5           4                    62.0
21 – 30       25.5           5                    127.5
31 – 40       35.5           6                    213.0
41 – 50       45.5           2                    91.0
                             ∑f = 20              ∑(x × f) = 510
$$\text{Estimated Mean} = \frac{\text{Sum of all data entries}}{\text{Total no. of data entries}} = \frac{\sum (x \times f)}{\sum f} = \frac{510}{20} = 25.5 \text{ kg}$$
where f = frequency and x = midpoint of class interval.
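Because the raw weights are given in this question, the estimated mean from the grouped table can be compared with the mean of the original twenty values. The following Python sketch is offered as an optional check; the variable names are illustrative.

```python
# Raw weights (kg) from the question.
weights = [3, 38, 17, 33, 28, 12, 43, 38, 31, 30,
           11, 8, 23, 18, 26, 50, 22, 35, 39, 5]

# Grouped table: (class limits, number of bags).
table = [((1, 10), 3), ((11, 20), 4), ((21, 30), 5), ((31, 40), 6), ((41, 50), 2)]

# Estimated mean from the grouped table: sum of (midpoint x f) / sum of f.
sum_xf = sum(((lower + upper) / 2) * f for (lower, upper), f in table)
sum_f = sum(f for _, f in table)
estimated_mean = sum_xf / sum_f

# Mean of the raw data, for comparison.
actual_mean = sum(weights) / len(weights)

print(estimated_mean, actual_mean)
```

For this particular data set the two values happen to agree at 25.5 kg. In general the estimated mean is only an approximation, since each weight is replaced by the midpoint of its class.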
iv) The modal class = class interval with the highest frequency = 31 – 40
v) The median class
d) On a graph page, using a scale of 2cm to represent 10kg on the x-axis, and 1cm to
represent 1 bag on the y-axis, draw a histogram to represent the data contained in
your frequency table above.
Weight (kg)   Class Boundaries (x)   Number of bags (frequency) (y)
1 – 10        0.5 – 10.5             3
11 – 20       10.5 – 20.5            4
21 – 30       20.5 – 30.5            5
31 – 40       30.5 – 40.5            6
41 – 50       40.5 – 50.5            2
                                     ∑f = 20
Class Boundaries (x-axis) and Frequency (y-axis)
e) On a graph page, using a scale of 2cm to represent 10kg on the x-axis, and 1cm to
represent 1 bag on the y-axis, draw a frequency polygon to represent the data
contained in your frequency table above.
Weight (kg)   Class Midpoint (x)   Number of bags (frequency) (y)
1 – 10        5.5                  3
11 – 20       15.5                 4
21 – 30       25.5                 5
31 – 40       35.5                 6
41 – 50       45.5                 2
                                   ∑f = 20
$$\text{Class Midpoint} = \frac{x_1 + x_2}{2}$$
where $x_1$ and $x_2$ are the lower and upper class limits. For the class interval 1 – 10:
$$\frac{x_1 + x_2}{2} = \frac{1 + 10}{2} = 5.5$$
Class Midpoint (x-axis) and Frequency (y-axis)
OBJECTIVES: (10/7/20)
Review key terms from Form 3 Statistics
Recall the concept of Probability
Discuss the features and Graph a Frequency Polygon by hand
Finding the Exact Mode from Histogram
OBJECTIVES: (10/14/20)
Introduce Measures of Central Tendency (Mean, Median and Mode)
Determine the Median from Ungrouped or Grouped Data
Determine the Cumulative Frequency
Graph a Cumulative Frequency Curve
AVERAGES - Individual Data Values
Suppose we need to compare the lengths of leaves from two different varieties of tree. We
can use an AVERAGE such as the MEAN, MEDIAN or MODE to see which has longer leaves.
▪ MEAN: Add up the values and divide by the number of values
▪ MEDIAN: Arrange the numbers in ascending order. The median is the middle value (if
there are odd no. of data entries), or average of the two middle values (if there are
even no. of data entries)
▪ MODE (or Modal Value): The most common value. The mode is very rarely used. The
mean and median are commonly used.
Q & A: Why do we need two different averages?
Often, either is suitable, but if the data contains any extreme values or outliers then the
median is preferable as the mean can be greatly affected by one value.
For example:
Suppose a class did a test and the marks were:
16 20 22 25 28 30 32 95
$$\text{The mean mark} = \frac{\text{sum of test marks}}{\text{no. of marks}} = \frac{268}{8} = 33.5$$
$$\text{The median mark} = \left(\frac{N + 1}{2}\right)\text{th value, where } N = \text{no. of data entries}$$
The formula gives the position of the median once the data is arranged in ascending order.
$$\left(\frac{8 + 1}{2}\right)\text{th value} = \left(\frac{9}{2}\right)\text{th value} = 4.5\text{th value}$$
The 4.5th value implies that the median lies between the 4th and the 5th data entries.
Therefore,
$$\text{The median mark} = \frac{25 + 28}{2} = 26.5$$
NOTE:
If we use the mean (33.5) as the average, then 7 of the 8 students are below that average. In this
situation it is better to use the median, which is 26.5.
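The effect of the extreme mark (95) on the two averages can be seen with Python's standard statistics module. This is a short illustrative sketch, not part of the original worked example.

```python
import statistics

# Test marks from the example, already in ascending order.
marks = [16, 20, 22, 25, 28, 30, 32, 95]

mean_mark = statistics.mean(marks)      # sum of marks / number of marks
median_mark = statistics.median(marks)  # average of the 4th and 5th values

# Position of the median: (N + 1) / 2
position = (len(marks) + 1) / 2

# How many students scored below the mean?
below_mean = sum(1 for m in marks if m < mean_mark)

print(mean_mark, median_mark, position, below_mean)
```

Removing the mark of 95 from the list and running the sketch again shows how strongly a single outlier pulls the mean, while the median changes far less.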