Chapter 3
Methods of Data Collection and Organization
Bekele Simegn (MPH)
Santé Medical College
Department of Medicine
Addis Ababa
May 2022
Learning Objectives
At the end of this chapter, students are expected to:
1. Identify the different types of data and data collection methods.
2. Explain the common problems in data collection.
3. Define a questionnaire and identify the different types of questionnaires.
4. Apply the various methods of data organization.
3.1 Methods of Data Collection
Data collection
• A technique that allows us to collect data systematically.
• The first and foremost step in any statistical analysis.
• There are different types of data collection methods.
• Based on their source, statistical data fall into two categories:
1) Primary data
2) Secondary data
Primary Data:
• Also called first-hand data.
• Collected by the investigator directly.
• Original in character.
• Generated by field surveys.
• More reliable and accurate, but more expensive.
Secondary Data
• Data that have already been collected by others and are then used by the investigator.
• They are primary data for the agency that collected them.
• Obtained from
– Journals,
– Reports,
– Government publications,
– Publications of professionals and
– Research organizations.
• Less expensive, but may be less accurate.
Common Methods of Data Collection
1. Survey through interview
– Interviewer-administered questionnaire
– Self-administered questionnaire
– In-depth interview
– Key informant interview
2. Focus group discussion (FGD)
3. Observation
4. Using available information
1. Survey through interview
• One of the most commonly used data collection techniques in research.
Questionnaire
– a document with a list of questions to be answered by
respondents.
1.1 Interviewer-administered questionnaire
- The interviewer asks the respondent for the required information
- using a prepared questionnaire.
Merit:
– Gives more room for obtaining accurate information
– Makes it easy to apply skip patterns
– High response rate
Demerit:
– Prone to interviewer bias
– Expensive
1.2 Self-administered questionnaire
• The questionnaire is simply given or sent to respondents, who complete it themselves.
• Can be administered to many persons simultaneously.
Merit:
• Cheaper than other methods.
Demerit:
• High non-response rate
• Limited to respondents who can read and write (educated respondents)
Designing a questionnaire
1. Before beginning to design a questionnaire:
• Identify the major variables to be addressed.
2. While developing the draft:
• Keep the questionnaire as short as possible.
• Be clear about why each question is asked and what you will do with the answer.
• Avoid time-consuming, embarrassing, or overly personal questions.
3. Character and order of questions:
• Questions should flow from
– simple to complex,
– general to specific,
– impersonal to personal.
4. Include a confidentiality statement.
Types of questions
1. Open-ended
• Allows respondents to answer freely in their own words
• No list of options is provided for the respondents
e.g. What is your marital status?
2. Closed-ended
– Offers the respondents a list of options to choose from
e.g. What is your marital status?
1. Single
2. Married
3. Divorced
4. Widowed
Requirements of questions
A. Must have face validity
• The question should give an obviously valid and relevant measurement of the variable it is designed to measure.
For example, it may be self-evident that records kept in an
obstetrics ward will provide a more valid indication of birth
weights than information obtained by questioning mothers.
B. Must be clear and unambiguous
• Phrased in language that the respondents are expected to understand in the same way.
C. Must not be offensive
• It is wise to avoid questions that may offend respondents,
e.g. questions that
– deal with intimate matters, or
– require the respondent to give a socially unacceptable answer.
Classification of Questionnaires
• Questionnaires can be classified on different bases.
• Based on structure:
1. Structured questionnaire
2. Non-structured questionnaire
1. Structured Questionnaire
• Designed for surveys.
• Questions are arranged in a logical order.
• Divided into subtopics.
• Skip patterns are important in structured questionnaires.
• The data collector is expected to go smoothly through the questions in sequence.
2. Non-structured Questionnaire
• Commonly used for qualitative studies.
• No strict sequence of questions.
• The data collector may rearrange the questions depending on the subject's responses.
• Based on who developed them:
1. Standardized questionnaire
• Developed by a well-known body, e.g. WHO questionnaires.
2. Non-standardized questionnaire
• Developed by the researcher.
Common problems in data collection
• Language barriers
• Lack of adequate time
• Expense
• Inadequately trained or inexperienced staff
• Invasion of privacy
• Bias (professional, personal, seasonal…)
• Cultural norms (e.g. norms that preclude men from interviewing women…)
1.3 In-depth interview
• A qualitative method that relies on person-to-person discussion.
Advantages:
– Good approach for gathering in-depth attitudes and beliefs from respondents
– Provides an excellent opportunity to probe.
– Participants do not need to be able to read and write to respond.
Disadvantages:
– Does not give quantitative information
– It is time-consuming
– The analysis is relatively difficult
2. Focus Group Discussion (FGD)
• A qualitative method
• Used to obtain in-depth information on perceptions about a certain topic
• A group discussion of approximately 6–12 persons
• Guided by
– a facilitator and
– a note taker.
Advantages:
• Excellent approach to gather information on
– In-depth attitudes, and
– Beliefs of a group
• It facilitates the exploration of collective memories
• Group dynamics might generate more ideas than individual
interviews
• Provides an excellent opportunity to probe.
• Participants are not required to read or write.
Disadvantages:
– Requires a strong facilitator to
• guide the discussion and
• ensure participation by all members.
– Does not give quantitative information.
– Not good for sensitive issues.
– Discussions can be difficult to organize.
– Analysis is relatively difficult.
3. Observation
• A technique that involves systematically
– selecting,
– watching and
– recording the behaviors of people or other phenomena, and aspects of the setting in which they occur,
• for the purpose of getting specified information.
• Includes all methods, from
– simple visual observation
– to the use of high-level machines and sophisticated equipment, such as
• radiographic machines,
• biochemical techniques,
• clinical examinations,
• microbiological examinations, etc.
• Often used as a qualitative method
Advantages
• Gives relatively more accurate data.
Disadvantages
• Subject to the investigator's or observer's own biases
• Needs more resources and skilled personnel when high-level machines are used.
4. Using available information (documentary sources)
• E.g. clinical and other personal records, death certificates, published mortality statistics, census publications….
• Common examples of documentary sources:
1. Official publications of CSA
2. Publications of the MoH and other ministries
3. International publications, e.g. WHO, UNICEF…
4. Records of hospitals or any other health institutions
Merit: The data are easy to obtain and collect.
Demerit: Highly prone to bias.
3.2 Data Organization
• After collecting data, the first task for a researcher is to organize them.
• Organizing data gives a simplified general overview of the results.
• The data collected in a survey are called raw data:
– a mass of unsorted data,
– unable to give much information on their own,
– need to be organized.
• Note that data and information are not the same thing.
Data and information
Exercise
1. Each student's test score is
A. Data
B. Information
2. The average test score of the students is
A. Data
B. Information
Definition
• Data organization is the classification of data according to their resemblance (shared characteristics).
• It helps to present data in a condensed form using
– tables or
– diagrams.
• There are different methods of organizing data.
Methods of Data Organization
1. Array ( ordered array)
2. Frequency distribution tables
1. Array (ordered array)
• Serial arrangement of numerical data in ascending or descending order.
• Helps to show the range over which the observations are distributed.
• Appropriate when the data set is small.
e.g. 8, 12, 4, 9, 6, 25, 1, 3, 5
Answer: 1, 3, 4, 5, …, 25
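As a minimal sketch (Python is used here only for illustration; the chapter does not prescribe any software), the ordered array for the example can be produced with the built-in `sorted`, and the range over which the observations are distributed read off from its ends:

```python
# Data from the array example above
data = [8, 12, 4, 9, 6, 25, 1, 3, 5]

ascending = sorted(data)                    # ordered array, ascending
descending = sorted(data, reverse=True)     # ordered array, descending
data_range = ascending[-1] - ascending[0]   # largest minus smallest observation

print(ascending)    # [1, 3, 4, 5, 6, 8, 9, 12, 25]
print(descending)   # [25, 12, 9, 8, 6, 5, 4, 3, 1]
print(data_range)   # 24
```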
2. Frequency distribution tables (FDT)
• Frequency
- The number of times an observation is repeated.
• Frequency distribution
- A list of values with their corresponding frequencies.
• Frequency distribution table
- A table that lists the values together with the number of times each is repeated.
• An FDT consists of at least two columns:
– one listing the categories or values (x),
– another for the frequencies (f).
• In the x column, values are listed in order (from highest to lowest, or from lowest to highest) without skipping any.
• For the frequency column, tallies are counted for each value.
• The sum of the frequencies should equal N, the total number of observations.
Three types of frequency distributions
1. Categorical frequency distribution
2. Ungrouped frequency distribution
3. Grouped frequency distribution
1. Categorical frequency distribution
Used to present nominal or ordinal data, i.e. qualitative data.
e.g. A social worker collected the following data on marital status for 25 persons (M = married, S = single, W = widowed, D = divorced).
M S D W D
S S M M M
W D S M M
W D D S S
S W W D D
Q2. Construct a frequency distribution table for these data.
• Solution
– Construct a table with three columns:
• the first column lists the categories (marital status),
• the second column shows the tallies,
• the third column shows the frequencies.
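The three-column solution can be sketched in Python with `collections.Counter` (a minimal illustration, not the only way to do it). The data are copied row by row from the example; the tally helper writes each complete group of five as the ASCII approximation `||||/`, standing in for four strokes crossed by a fifth.

```python
from collections import Counter

# Marital status of 25 persons, copied row by row from the example
# (M = married, S = single, W = widowed, D = divorced)
rows = [
    "M S D W D",
    "S S M M M",
    "W D S M M",
    "W D D S S",
    "S W W D D",
]
data = " ".join(rows).split()

freq = Counter(data)


def tally(n):
    """Tally marks: each complete group of five is written as '||||/'."""
    groups, rest = divmod(n, 5)
    marks = ["||||/"] * groups
    if rest:
        marks.append("|" * rest)
    return " ".join(marks)


labels = {"M": "Married", "S": "Single", "W": "Widowed", "D": "Divorced"}

print(f"{'Marital status':<16}{'Tally':<12}{'Frequency':>9}")
for code, name in labels.items():
    print(f"{name:<16}{tally(freq[code]):<12}{freq[code]:>9}")
print(f"{'Total':<16}{'':<12}{sum(freq.values()):>9}")
```

The frequencies come out as married 6, single 7, widowed 5 and divorced 7, summing to N = 25 as the table requires.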
2. Ungrouped frequency distribution
Used for discrete quantitative data.
e.g. The following data represent the number of books read in the past six months by each student in a class of 25.
6 24 14 11 8
6 15 8 14 10
8 27 15 6 15
15 9 8 15 10
6 11 15 8 6
Q1. Construct the frequency distribution table for the above data.
• Solution: arrange the data in ascending or descending order of magnitude; this arrangement is called an "ordered array".
6 6 6 6 6 8 8 8 8 8 9 10 10 11 11 14 14 15 15 15 15 15 15 24 27
(ordered array)
Therefore, its frequency distribution is:
Number of books read (X)   Number of students (f)
6 5
8 5
9 1
10 2
11 2
14 2
15 6
24 1
27 1
Total 25
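A short Python sketch (illustrative only) that rebuilds the ordered array and the ungrouped frequency table from the raw data above; it also acts as a check that the counts add up to 25.

```python
from collections import Counter

# Books read in the past six months by 25 students, as listed above
books = [
    6, 24, 14, 11, 8,
    6, 15, 8, 14, 10,
    8, 27, 15, 6, 15,
    15, 9, 8, 15, 10,
    6, 11, 15, 8, 6,
]

ordered_array = sorted(books)   # ascending ordered array
freq = Counter(books)           # value -> number of students

print(" ".join(map(str, ordered_array)))
print("X\tf")
for x in sorted(freq):          # list every observed value, lowest to highest
    print(f"{x}\t{freq[x]}")
print(f"Total\t{sum(freq.values())}")
```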
3. Grouped frequency distribution
• Used for continuous quantitative variables.
– Sometimes a set of scores covers a wide range of values, and a list of all the x values would be too long to be a simple presentation of the data.
– To remedy this situation, a grouped frequency distribution table (GFDT) is used.
Remember the following terms
1. Lower class limit (LCL)
- The smallest value in each class.
2. Upper class limit (UCL)
- The largest value in each class.
3. Class limits (CL)
- The smallest and largest values of a class (its LCL and UCL).
4. Class mark (mid-point, Xc)
- The midway point between the lower and upper limits of the class:
$X_c = \frac{LCL + UCL}{2}$
5. Class interval (CI)
- The range of values that each class covers.
6. Class width (CW)
- The difference between two consecutive lower class limits.
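7. True limits (class boundaries)
• Limits that treat each class as an interval of a continuous variable, leaving no gaps between classes.
• Used to smooth out (close) the gaps between class intervals:
– subtract 0.5 from the lower limit and add 0.5 to the upper limit (for data recorded as whole numbers).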
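A minimal Python sketch of the class mark, the true limits and the class width. The two classes used below (11–22 and 23–34) are hypothetical, chosen only for illustration, and the half-unit of 0.5 assumes whole-number data, as stated above.

```python
def class_mark(lcl, ucl):
    """Class mark (mid-point): Xc = (LCL + UCL) / 2."""
    return (lcl + ucl) / 2


def true_limits(lcl, ucl, half_unit=0.5):
    """True limits (class boundaries): subtract half a unit from the LCL and
    add it to the UCL. half_unit=0.5 assumes whole-number data."""
    return lcl - half_unit, ucl + half_unit


# Two hypothetical consecutive classes of whole-number data
classes = [(11, 22), (23, 34)]

for lcl, ucl in classes:
    print(f"{lcl}-{ucl}: Xc = {class_mark(lcl, ucl)}, boundaries = {true_limits(lcl, ucl)}")

# Class width: difference between two consecutive lower class limits
cw = classes[1][0] - classes[0][0]
print("CW =", cw)
```

The upper boundary of the first class (22.5) equals the lower boundary of the second, which is how the true limits close the gap between 22 and 23.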
In a GFDT, the X column lists groups of scores, called class intervals, rather than individual values. All intervals have the same width.
Summary
Class limits Frequency
LCL1 - UCL1
LCL2 - UCL2
LCL3 - UCL3
LCL4 - UCL4
LCL5 - UCL5
Total
8. Relative frequency (Rf)
- The frequency of each class divided by the total frequency.
- It is the proportion (P) for each category: $Rf = P = \frac{f}{N}$; multiply by 100 to express it as a percentage.
- The sum of Rf must equal 1.00 (or 100%).
9. Cumulative frequency (CF)
• The number of observations up to and including a specific class:
$CF_i = f_i + \sum_{j<i} f_j$
• Dividing by N and multiplying by 100, $\frac{CF_i}{N} \times 100$, gives the cumulative percentage, i.e. the proportion of the distribution up to that class.
• The last CF always equals the total number of observations N, since by then all frequencies have been added to the previous total.
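To make Rf and CF concrete, here is a minimal Python sketch that applies them to the ungrouped table of books read (the frequencies are taken from that table; the code itself is only an illustration).

```python
# Frequencies from the ungrouped table of books read (X -> f)
table = {6: 5, 8: 5, 9: 1, 10: 2, 11: 2, 14: 2, 15: 6, 24: 1, 27: 1}
N = sum(table.values())

rows = []
cf = 0
for x in sorted(table):
    f = table[x]
    rf = f / N            # relative frequency (proportion)
    cf += f               # cumulative frequency: this class + all previous classes
    rows.append((x, f, rf, cf, 100 * cf / N))

print(f"{'X':>3}{'f':>4}{'Rf':>7}{'CF':>5}{'CF%':>8}")
for x, f, rf, cf_, pct in rows:
    print(f"{x:>3}{f:>4}{rf:>7.2f}{cf_:>5}{pct:>8.1f}")
print("Sum of Rf:", round(sum(r[2] for r in rows), 10))
```

The last CF is 25 (100%), and the relative frequencies sum to 1.00, as the definitions require.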
Guidelines
• Classes must be mutually exclusive (non-overlapping).
• Include all classes, even if their frequency is zero; do not leave them blank.
• Use the same width for all classes.
• The sum of the frequencies must equal the total number of observations in the data set.
Constructing a grouped frequency table
1. Determine the number of classes (K)
• The number of classes is determined using Sturges' rule, which is used only to determine the number of classes:
$K = 1 + 3.322 \log_{10} n$
where n = the total number of observations and log = the common (base-10) logarithm.
• K is the number of rows (classes) you will draw.
N.B. Round any decimal result up to the next integer.
• Generally, the number of classes should be between 5 and 20.
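A minimal Python sketch of step 1, following the rounding-up rule stated in the N.B. The sample sizes used below (20, 30 and 45) are those of the exercises at the end of this chapter. For example, with n = 30: 1 + 3.322 × log10(30) ≈ 1 + 3.322 × 1.477 ≈ 5.91, which rounds up to 6.

```python
import math


def sturges_classes(n):
    """Number of classes by Sturges' rule, K = 1 + 3.322 * log10(n),
    rounded up to the next integer as the chapter instructs."""
    return math.ceil(1 + 3.322 * math.log10(n))


# Sample sizes of the exercises at the end of this chapter
for n in (20, 30, 45):
    print(n, "observations ->", sturges_classes(n), "classes")
```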
2. Determine the class width (CW)
$CW = \frac{\text{Range}}{K}$
where Range = the maximum value minus the minimum value of the given observations, and K = the number of classes.
N.B. Round any decimal result up to the next integer.
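Step 2 as a small Python sketch, applied to the hospital-clients exercise (minimum age 11, maximum 80, n = 30, so K = 6 by Sturges' rule). The function name `class_width` is just a label for this illustration.

```python
import math


def class_width(minimum, maximum, k):
    """CW = Range / K, with Range = maximum - minimum, rounded up to the
    next integer as the chapter instructs."""
    return math.ceil((maximum - minimum) / k)


# Hospital-clients exercise: minimum age 11, maximum 80,
# K = 6 from Sturges' rule with n = 30
print(class_width(11, 80, 6))   # Range 69, 69 / 6 = 11.5, rounded up to 12
```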
3. Determine the lower class limit (LCL) of the first class, which is the smallest value in your observations, i.e.
$LCL_1$ = the smallest value in the given observations
$LCL_2 = LCL_1 + CW$
$LCL_3 = LCL_2 + CW$
$LCL_4 = LCL_3 + CW$
…
$LCL_{i+1} = LCL_i + CW$
where LCL = lower class limit
4. Determine the upper class limit (UCL) of the first class using the formula below, with a correction factor (U):
$UCL_1 = LCL_1 + CW - U$
$UCL_2 = UCL_1 + CW$
$UCL_3 = UCL_2 + CW$
$UCL_4 = UCL_3 + CW$
…
$UCL_{i+1} = UCL_i + CW$
where U is a correction factor matching the precision of the data (1 for whole numbers, 0.1 for one decimal place, 0.01 for two, etc.)
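Steps 3 and 4 can be sketched in Python as follows, again using the hospital-clients exercise (LCL1 = 11, CW = 12, K = 6). The default U = 1 assumes whole-number data; with decimal U, floating-point sums may need rounding.

```python
def class_limits(lcl1, cw, k, u=1):
    """Return (LCL, UCL) pairs for K classes.

    LCL_1 is the smallest observation, LCL_(i+1) = LCL_i + CW,
    UCL_1 = LCL_1 + CW - U and UCL_(i+1) = UCL_i + CW.
    The default U = 1 assumes whole-number data.
    """
    limits = []
    lcl, ucl = lcl1, lcl1 + cw - u
    for _ in range(k):
        limits.append((lcl, ucl))
        lcl += cw
        ucl += cw
    return limits


# Hospital-clients exercise: LCL_1 = 11 (minimum age), CW = 12, K = 6
limits = class_limits(11, 12, 6)
for lcl, ucl in limits:
    print(f"{lcl} - {ucl}")

# The last class must contain the maximum observation (80)
assert limits[-1][1] >= 80
```

This gives the classes 11–22, 23–34, 35–46, 47–58, 59–70 and 71–82. The final check matters: the last UCL equals LCL1 + K × CW − U, so if Range / K happens to be a whole number, the last UCL is the maximum minus U and the maximum itself falls outside every class. In that case, add one more class or widen CW by one unit.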
Exercise
• Suppose the maximum age of 30 clients who visited a certain hospital during the night was 80 and the minimum was 11.
• Construct a grouped frequency distribution
table.
Exercise
• The following data represent the ages of CVD patients who visited the emergency room during the last week, TASH, A/A.
• 38 28 78 92 63 44 52 69 74 78 85 58 48 32 49 84 68
52 37 102.
• Question: Construct a grouped frequency table.
Exercise
• The following data represent the test scores of 30 students in a biostatistics course.
• Construct a GFDT with Rf and CF.
20 24 18 11 12 13 19 25 29 10 17 16 28
8 5 22 24 15 14 10 26 35 33
37 29 8 20 21 38 18
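Putting steps 1 to 4 together, here is a minimal Python sketch of one possible worked solution for these 30 test scores. It follows the chapter's rules (Sturges' rule rounded up, CW rounded up, U = 1 for whole numbers); it is an illustration, not the only acceptable table.

```python
import math

# Test scores of 30 students, as listed in the exercise
scores = [
    20, 24, 18, 11, 12, 13, 19, 25, 29, 10, 17, 16, 28,
    8, 5, 22, 24, 15, 14, 10, 26, 35, 33,
    37, 29, 8, 20, 21, 38, 18,
]

n = len(scores)
k = math.ceil(1 + 3.322 * math.log10(n))           # Step 1: Sturges' rule
cw = math.ceil((max(scores) - min(scores)) / k)    # Step 2: class width
u = 1                                              # correction factor for whole numbers

table = []
cf = 0
lcl = min(scores)                                  # Step 3: LCL of the first class
for _ in range(k):
    ucl = lcl + cw - u                             # Step 4: matching UCL
    f = sum(lcl <= x <= ucl for x in scores)       # count scores in the class
    cf += f
    table.append((lcl, ucl, f, f / n, cf))
    lcl += cw

# Every score must fall in exactly one class
assert sum(row[2] for row in table) == n

print(f"{'Class':<8}{'f':>3}{'Rf':>7}{'CF':>4}")
for lcl_, ucl_, f, rf, cf_ in table:
    label = f"{lcl_}-{ucl_}"
    print(f"{label:<8}{f:>3}{rf:>7.3f}{cf_:>4}")
```

With K = 6 and CW = 6, the classes are 5–10, 11–16, 17–22, 23–28, 29–34 and 35–40, with frequencies 5, 6, 8, 5, 3 and 3 and a final CF of 30.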
Exercise 2
• The following table shows the number of hours 45
hospital patients slept following administration of
a certain anesthetic medication.
10 12 4 8 7 3 8 5 7
12 11 3 8 1 1 13 10 4
4 5 5 8 7 7 3 2 3
8 13 1 7 17 3 4 5 5
3 1 17 10 4 7 7 11 8