Questionnaire Design and Surveys Sampling
The contents of this site are aimed at students who need to perform basic statistical
analyses on data from sample surveys, especially those in marketing science. Students
are expected to have a basic knowledge of statistics, such as descriptive statistics and
the concept of hypothesis testing.
MENU
1. Introduction
2. Variance and Standard Deviation
3. What Is a Confidence Interval?
4. Questionnaire Design and Surveys Management
5. General Sampling Methods
6. What Is the Margin of Error?
7. Sample Size Determination
8. Percentage: Estimation and Testing
9. Multilevel Statistical Models
10. Surveys Sampling Routines
11. Cronbach's Alpha (Coefficient Alpha)
12. The Inter-Rater Reliability
13. Instrumentality Theory
14. Value Measurements Survey Instruments (Rokeach's Value Survey)
15. Danger of Wrong Survey Design and the Interpretation of the Results
16. JavaScript E-labs Learning Objects
Introduction
The main idea of statistical inference is to take a random sample from a population
and then to use the information from the sample to make inferences about particular
population characteristics such as the mean (measure of central tendency), the
standard deviation (measure of spread) or the proportion of units in the population that
have a certain characteristic. Sampling saves money, time, and effort. Additionally, a
sample can, in some cases, provide as much information as a corresponding study that
would attempt to investigate an entire population: careful collection of data from a
sample will often provide better information than a less careful study that tries to look
at everything.
We must study the behavior of the mean of sample values from different specified
populations. Because a sample examines only part of a population, the sample mean
will not exactly equal the corresponding mean of the population. Thus, an important
consideration for those planning and interpreting sampling results is the degree to
which sample estimates, such as the sample mean, will agree with the corresponding
population characteristic.
In practice, only one sample is usually taken (in some cases such as "survey data
analysis" a small "pilot sample" is used to test the data-gathering mechanisms and to
get preliminary information for planning the main sampling scheme). However, for
purposes of understanding the degree to which sample means will agree with the
corresponding population mean, it is useful to consider what would happen if 10, or
50, or 100 separate sampling studies, of the same type, were conducted. How
consistent would the results be across these different studies? If we could see that the
results from each of the samples would be nearly the same (and nearly correct!), then
we would have confidence in the single sample that will actually be used. On the other
hand, seeing that answers from the repeated samples were too variable for the needed
accuracy would suggest that a different sampling plan (perhaps with a larger sample
size) should be used.
A sampling distribution is used to describe the distribution of outcomes that one
would observe from replication of a particular sampling plan.
Know that estimates computed from one sample will be different from estimates that
would be computed from another sample.
Understand that estimates are expected to differ from the population characteristics
(parameters) that we are trying to estimate, but that the properties of sampling
distributions allow us to quantify, probabilistically, how they will differ.
Understand the relationship between sample size and the distribution of sample
estimates.
See that in large samples, many sampling distributions can be approximated with a
normal distribution.
Deviations about the mean of a population are the basis for most of the statistical tests
we will learn. Since we are measuring how widely a set of scores is dispersed about
the mean, we are measuring variability. We can calculate the deviations about the
mean and express them as the variance or the standard deviation. It is very important to have a
firm grasp of this concept because it will be a central concept throughout the course.
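The deviations-about-the-mean calculation can be sketched in a few lines; the following Python illustration uses invented scores:

```python
# Variance and standard deviation from deviations about the mean.
scores = [4, 8, 6, 5, 7]          # hypothetical scores

n = len(scores)
mean = sum(scores) / n

# Sum of squared deviations about the mean.
ss = sum((x - mean) ** 2 for x in scores)

variance = ss / (n - 1)           # sample variance (divisor n - 1)
sd = variance ** 0.5              # standard deviation

print(mean, variance, round(sd, 4))
```

Here the mean is 6, the squared deviations sum to 10, so the sample variance is 10/4 = 2.5 and the standard deviation is about 1.58.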
Know that a confidence interval computed from one sample will be different from a
confidence interval computed from another sample.
Understand the relationship between sample size and width of confidence interval.
Know that sometimes the computed confidence interval does not contain the true
mean value (that is, it is incorrect) and understand how this coverage rate is related to
confidence level.
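The coverage idea in the objectives above can be checked by simulation. The following Python sketch draws repeated samples from an arbitrary normal population (mean 50, SD 10 are invented parameters) and counts how often the nominal 95% interval covers the true mean:

```python
import random
random.seed(1)

# Draw many samples from a known population and count how often the
# nominal 95% confidence interval for the mean covers the true mean.
MU, SIGMA, N, REPS = 50.0, 10.0, 100, 2000

covered = 0
for _ in range(REPS):
    sample = [random.gauss(MU, SIGMA) for _ in range(N)]
    m = sum(sample) / N
    s2 = sum((x - m) ** 2 for x in sample) / (N - 1)
    half = 1.96 * (s2 / N) ** 0.5        # margin of error
    if m - half <= MU <= m + half:
        covered += 1

coverage = covered / REPS                # should be close to 0.95
print(coverage)
```

Roughly 5% of the computed intervals miss the true mean, which is exactly what the confidence level promises.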
When the sampling units are human beings, the main methods of
collecting information are:
face-to-face interviewing,
postal surveys,
telephone surveys,
Internet surveys, and
direct observation.
Source of Errors
For example, consider the following question: "Over the last twelve
months would you say your health has on the whole been: Good? /
Fairly good? / Not good?" The respondent is required to tick one of the
three boxes so labeled.
Further Reading:
Biemer P., and L.Lyberg, Introduction to Survey Quality, Wiley, 2003.
Lehtonen R., and E. Pahkinen, Practical Methods for Design and Analysis of Complex Surveys, Wiley, 2003.
From the food you eat to the TV you watch, from political elections to
school board actions, much of your life is regulated by the results of
sample surveys. In the information age of today and tomorrow, it is
increasingly important that sample survey design and analysis be
understood by many so as to produce good data for decision making and
to recognize questionable data when it arises. Relevant topics are:
Simple Random Sampling, Stratified Random Sampling, Cluster
Sampling, Systematic Sampling, Ratio and Regression Estimation,
Estimating a Population Size, Sampling a Continuum of Time, Area or
Volume, Questionnaire Design, Errors in Surveys.
Under simple random sampling, the estimated variance of a sample proportion p,
with the finite population correction, is:
s² = p(1 − p)(1 − n/N)/(n − 1)
For a stratified sample, the estimated variance of the overall estimate is:
Σt Wt² (Nt − nt) st² / [nt (Nt − 1)]
where, for stratum t, Wt = Nt/N is the stratum weight, Nt the stratum size,
nt the sample size taken from it, and st² its sample variance.
Since the survey usually measures several attributes for each population
member, it is impossible to find an allocation that is simultaneously
optimal for each of those variables. Therefore, in such a case we use the
popular method of allocation that uses the same sampling fraction in
each stratum. This yields the optimal allocation when the variances of the
strata are all equal.
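Proportional allocation, with the same sampling fraction in every stratum, can be sketched as follows; the stratum names and sizes are hypothetical:

```python
# Proportional allocation: stratum t receives n * (N_t / N) sample units,
# so every stratum has the same sampling fraction n/N.
strata_sizes = {"urban": 6000, "suburban": 3000, "rural": 1000}  # N_t (hypothetical)
n = 500                                    # total sample size

N = sum(strata_sizes.values())
allocation = {name: round(n * Nt / N) for name, Nt in strata_sizes.items()}
print(allocation)                          # sampling fraction is 5% everywhere
```

With these numbers, the urban stratum gets 300 units, the suburban stratum 150, and the rural stratum 50.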
with N being the size of the total number of cases, n being the sample
size, the expected error, t being the value taken from the t distribution
corresponding to a certain confidence interval, and p being the
probability of an event.
Further Reading:
Thompson S., Sampling, Wiley, 2002.
Estimation is the process by which sample data are used to indicate the
value of an unknown quantity in a population. For a proportion P estimated
from a simple random sample of size n, an approximate 95% confidence
interval is P ± 1.96 [P(1 − P)/n]½
The reported margin of error is the margin of "sampling error". There are
many nonsampling errors that can and do affect the accuracy of polls.
Here we talk about sampling error. Note that subgroups have larger
sampling error than the full sample. A careful report must also include a
statement such as: "Other sources of error include, but are not limited to,
individuals refusing to participate in the interview and inability to connect
with the selected number. Every feasible effort is made to obtain a
response and reduce the error, but the reader (or the viewer) should be
aware that some error is inherent in all research."
If you have a yes/no question in a survey, you probably want to estimate
the proportion P of Yes's (or No's). Under a simple random sampling design,
the variance of P is P(1 − P)/n, ignoring the finite population correction,
for large n, say over 30. A 95% confidence interval is then P ± 1.96 [P(1 − P)/n]½.
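Worked numerically (a Python sketch; the survey counts are invented):

```python
# 95% confidence interval for a proportion from a yes/no survey item.
yes, n = 330, 1000                       # hypothetical counts: 330 Yes out of 1000
p = yes / n

half = 1.96 * (p * (1 - p) / n) ** 0.5   # margin of error
lower, upper = p - half, p + half
print(round(lower, 4), round(upper, 4))
```

With 330 Yes's out of 1000, the margin of error is about 2.9 percentage points, so the interval runs from roughly 30.1% to 35.9%.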
The question of how large a sample to take arises early in the planning of
any survey. This is an important question that should not be treated lightly.
Taking a larger sample than is needed to achieve the desired results is
wasteful of resources, whereas very small samples often lead to results that
are of no practical use for making good decisions. The main objective is to
obtain both a desirable accuracy and a desirable confidence level with
minimum cost.
People sometimes ask me, "What fraction of the population do you need?"
I answer, "It's irrelevant; accuracy is determined by sample size alone."
This answer has to be modified if the sample is a sizable fraction of the
population.
For an item scored 0/1 for no/yes, the standard error of the observed
proportion is given by SE = [p(1 − p)/N]½, where p is the proportion obtaining
a score of 1, and N is the sample size. Since p(1 − p) is at most 0.25, the
required sample size, N, can then be expressed as the largest integer less
than or equal to 0.25/SE²
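The worst-case sample size rule can be coded directly (a Python sketch; the SE targets below are illustrative choices):

```python
import math

# Worst-case sample size for a target standard error of a proportion:
# p(1 - p) is maximized at p = 0.5, where it equals 0.25, so the
# required N is the largest integer <= 0.25 / SE^2.
def sample_size(se):
    return math.floor(0.25 / se ** 2)

print(sample_size(0.03))    # SE of 3 percentage points
print(sample_size(0.045))   # SE of 4.5 percentage points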
For a finite population of size N, the standard error of the sample mean
of size n is:
S [(N − n)/(nN)]½
where S is the sample standard deviation.
There are several formulas for the sample size needed for a t-test. The
simplest one is
n = 2(Zα/2 + Zβ)² σ²/D²
where σ is the (common) standard deviation and D is the difference to be
detected. This formula underestimates the sample size, but is reasonable
for large sample sizes. A less inaccurate formula replaces the Z values
with t values, and requires iteration, since the df for the t distribution
depends on the sample size. The accurate formula uses a non-central t
distribution, and it also requires iteration.
The simplest approximation is to replace the first Z value in the above
formula with the value from the studentized range statistic that is used to
derive Tukey's follow-up test. If you don't have sufficiently detailed
tables of the studentized range, you can approximate the Tukey follow-up
test using a Bonferroni correction. That is, change the first Z value to
Zα/(2k), where k is the number of comparisons.
Neither of these solutions is exact and the exact solution is a bit messy.
But either of the above approaches is probably close enough, especially
if the resulting sample size is larger than (say) 30.
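The Z-based formula with the Bonferroni adjustment can be sketched as follows; this is a Python illustration (the effect size D = 5 and σ = 10 are invented), not an exact power calculation:

```python
from math import ceil
from statistics import NormalDist

def t_test_n(alpha, beta, sigma, D, k=1):
    """Approximate per-group sample size for a two-sided t-test.

    Uses n = 2 * (Z_{alpha/2} + Z_beta)^2 * sigma^2 / D^2, with the first
    Z value Bonferroni-adjusted to Z_{alpha/(2k)} for k comparisons.
    Being Z-based, it slightly underestimates the t-based answer.
    """
    z = NormalDist()
    z_a = z.inv_cdf(1 - alpha / (2 * k))   # first Z value (two-sided, adjusted)
    z_b = z.inv_cdf(1 - beta)              # Z for the desired power
    return ceil(2 * (z_a + z_b) ** 2 * sigma ** 2 / D ** 2)

print(t_test_n(0.05, 0.20, sigma=10, D=5))        # single comparison -> 63
print(t_test_n(0.05, 0.20, sigma=10, D=5, k=3))   # Bonferroni, 3 comparisons
```

For 80% power at a two-sided 5% level, one comparison needs 63 per group; spreading the same α over three comparisons raises that to 84.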
There are also heuristic methods for determination of sample size. For
example, in healthcare behavior and process measurement, sampling
criteria are designed for a 95% CI of 10 percentage points around a
population mean of 0.50. There is a heuristic rule: "If the number of
individuals in the target population is smaller than 50 per month, systems
do not use sampling procedures but attempt to collect data from all
individuals in the target population."
Further Readings:
Goldstein H., Multilevel Statistical Models, Halstead Press, 1995.
Kish R., G. Kalton, S. Heeringa, C. O'Muircheartaigh, and J. Lepkowski, Collected Papers of Leslie Kish,
Wiley, 2002.
Kish L., Survey Sampling, Wiley, 1995.
The following are two JavaScript applets that construct exact confidence
intervals and tests of hypotheses with respect to proportion, percentage,
and the binomial distribution, with or without a finite population,
respectively.
1. Ignore the claimed value in the null hypothesis, for the time being.
2. Construct a 100(1 − α)% confidence interval based on the available
data.
3. If the constructed CI does not contain the claimed value, then
there is enough evidence to reject the null hypothesis. Otherwise,
there is no reason to reject the null hypothesis.
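A do-it-yourself version of the exact (Clopper-Pearson) interval that such applets compute can be sketched in Python. This is a stdlib-only illustration, not the applets' actual code, and the n = 20 trial count is an assumed value:

```python
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def exact_ci(m, n, conf=0.95):
    """Clopper-Pearson exact confidence interval for a proportion,
    found by bisection on the binomial tail probabilities."""
    alpha = 1 - conf

    def boundary(pred):
        lo, hi = 0.0, 1.0
        for _ in range(60):                # bisection on [0, 1]
            mid = (lo + hi) / 2
            if pred(mid):
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2

    # Lower limit: largest p with P(X >= m) <= alpha/2.
    lower = 0.0 if m == 0 else boundary(
        lambda p: 1 - binom_cdf(m - 1, n, p) <= alpha / 2)
    # Upper limit: the p at which P(X <= m) falls to alpha/2.
    upper = 1.0 if m == n else boundary(
        lambda p: binom_cdf(m, n, p) > alpha / 2)
    return lower, upper

low, high = exact_ci(4, 20)    # 4 successes in a hypothetical 20 trials
print(round(low, 4), round(high, 4))
```

For 4 successes in 20 trials the point estimate is 0.20, and the exact 95% interval is about (0.057, 0.437), which is noticeably wider than the normal-approximation interval at this small n.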
The existence of such data hierarchies is not accidental and should not be
ignored. Individual people differ as do individual animals and this
necessary differentiation is mirrored in all kinds of social activity where
the latter is often a direct result of the former, for example when students
with similar motivations or aptitudes are grouped in highly selective
schools or colleges. In other cases, the groupings may arise for reasons
less strongly associated with the characteristics of individuals, such as
the allocation of young children to elementary schools, or the allocation
of patients to different clinics. Once groupings are established, even if
their establishment is effectively random, they will tend to become
differentiated, and this differentiation implies that the group and its
members both influence and are influenced by the group membership.
To ignore this relationship risks overlooking the importance of group
effects, and may also render invalid many of the traditional statistical
analysis techniques used for studying data relationships.
Further Readings:
Goldstein H., Multilevel Statistical Models, Halstead Press, New York, 1995.
Longford N., Random Coefficient Models, Clarendon Press, Oxford, 1993.
These books cover a very wide range of applications and theory.
* SPSS routine: random selection of 8 of 32 equal-sized groups.
COMPUTE L=L+ID
LEAVE L
COMPUTE E=L-ID
NUMERIC W(F2)
COMPUTE W=0
DO REPEAT A=A1 TO A8
IF (ID=1) A=UNIF(32)
LEAVE A
IF (E LT A AND A LE L) W=W + 1
END REPEAT
SELECT IF (W GT 0)

* Equivalent built-in command: a simple random sample of 8 cases from 32.
SAMPLE 8 FROM 32

* Selecting a random case via the MATRIX procedure.
MATRIX
COMPUTE INT=RAND*MAKE(32, 1, 1)
SAVE INT/OUTPUT=*/VAR=INT
END MATRIX
SELECT IF (INDEX=INT)

* PPS routine: selecting 8 units with probability proportional to size
* (HOU85 = the size measure; population total 91753).
COMPUTE PI=8*HOU85/91753
COMPUTE EPSN=UNIF(1)
COMPUTE L=L+HOU85
LEAVE L
COMPUTE E=L-HOU85
NUMERIC W(F2)
COMPUTE W=0
DO REPEAT A=A1 TO A8
IF (ID=1) A=UNIF(91753)
LEAVE A
IF (E LT A AND A LE L) W=W + 1
END REPEAT
SELECT IF (W GT 0)

* Sequential (systematic) selection loop (fragment).
COMPUTE #C=#C + 1
COMPUTE CASE = #C
COMPUTE #SN=8
COMPUTE #PN=91753
DO IF CASE = 1
COMPUTE #COMP=#RAN
COMPUTE RAN=#RAN
END IF
COMPUTE SAMIND=0
+ COMPUTE #COMP=#COMP+#INT
END LOOP
EXECUTE.
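The idea behind these routines, systematic selection with probability proportional to size (PPS), can be sketched outside SPSS as well. The following Python illustration uses made-up size measures, not the document's data:

```python
import random
random.seed(7)

# Systematic PPS sampling: lay the units end to end on a line scaled by
# their size measures, then select with a random start and a fixed skip.
sizes = [120, 45, 300, 80, 15, 220, 60, 95, 150, 40]   # hypothetical unit sizes
n = 3                                                   # units to select

total = sum(sizes)
skip = total / n                     # selection interval
start = random.uniform(0, skip)      # random start in [0, skip)
points = [start + k * skip for k in range(n)]

selected, cum = [], 0
for i, size in enumerate(sizes):
    lo, hi = cum, cum + size         # unit i owns the interval [lo, hi)
    selected.extend(i for p in points if lo <= p < hi)
    cum += size

print(selected)                      # indices of the n selected units
```

Each unit's chance of selection is proportional to its size measure, and because the skip exceeds every individual size here, no unit can be chosen twice.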
Further Readings:
Bethel J., Sample allocation in multivariate surveys, Survey Methodology, 15, 1989, 47-57.
Valliant R., and J. Gentle, An application of mathematical programming to a sample allocation
problem, Computational Statistics and Data Analysis, 25, 1997, 337-360.
Note 2: The use of the NOMISS option in the CORR procedure. This is
related to Note 1 above. Another way of handling missing observations
is to use the NOMISS option in the CORR procedure. The syntax is as
follows:
PROC CORR DATA=WORK.ONE ALPHA NOMISS;
VAR X1-X10;
RUN;
The effect of this is to remove all items X1-X10 from the analysis for any
record where at least one of these items X1-X10 is missing.
Obviously, for achievement testing, especially for speeded tests, where
most examinees might not be expected to complete all items, this would
be a problem. The use of the NOMISS option would restrict the analysis
to the subset of examinees who did complete all items, and this quite often
would not be the population of interest when wishing to establish an
internal consistency reliability estimate.
One common approach to resolving this problem might be to define a
number of items that must be attempted for the record to be included.
Some health status measures, for example the SF-36, have scoring rules
that require that at least 50% of the items must be answered for the scale
to be defined. If less than half of the items are attempted, then the scale
is not interpreted. If the scale is considered valid, by their definition,
then all missing values on that scale are replaced by the average of the
non-missing items on that scale. The SAS code to implement this scoring
algorithm is summarized below, under the assumption that the scale
has 10 items.
DATA WORK.ONE;SET WORK.ONE;
ARRAY X {10} X1-X10;
IF NMISS(OF X1-X10) > 5 THEN DO I=1 TO 10;
X(I) = .;
END;
ELSE IF NMISS(OF X1-X10) <= 5 THEN DO I=1 TO 10;
IF X(I) = . THEN X(I) = MEAN(OF X1-X10);
END;
RUN;
Note that replacing all missing values with the average of the non-
missing values, in the cases where the number of missing values is not
greater than half of the total number of items, will result in an inflated
Cronbach's alpha. A better approach would be to remove from
consideration records where fewer than 50% of the items are
completed and to leave the remaining records intact, with the missing
values still in. In other words, implement the first IF statement above,
but eliminate the ELSE IF clause, and then run the PROC CORR
without the NOMISS option. The bottom line: the NOMISS option in
PROC CORR in general, and with the ALPHA option in particular, must
be considered carefully.
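For reference, the alpha statistic that PROC CORR reports is straightforward to compute directly. A Python sketch with invented, complete-case item scores:

```python
# Cronbach's alpha: k/(k-1) * (1 - sum of item variances / variance of totals).
# Rows are respondents, columns are items (hypothetical data, no missing values).
data = [
    [3, 4, 3, 4],
    [2, 2, 3, 2],
    [4, 4, 4, 5],
    [1, 2, 1, 2],
    [3, 3, 4, 3],
]

k = len(data[0])                       # number of items

def var(xs):                           # sample variance, divisor n - 1
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

item_vars = [var([row[j] for row in data]) for j in range(k)]
total_var = var([sum(row) for row in data])

alpha = k / (k - 1) * (1 - sum(item_vars) / total_var)
print(round(alpha, 4))
```

For these made-up scores the item variances sum to 5.5 while the totals have variance 18.7, giving an alpha of about 0.94; highly correlated items drive the totals' variance well above the sum of the item variances.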
Note 3: Making sure that all items in the set are coded in the same
direction. Although 0/1 (wrong/right) coding is rarely a problem with
this, for Likert or other scales with more than 2 points on the scale, it is
not uncommon for the scale to remain constant (e.g., Strongly Agree,
Agree, Disagree, Strongly Disagree), but for the wording of the
questions to reverse the appropriate interpretation of the scale. For
example,
Q1. Social Security System Must be reformed SA A D SD
Q2. Social Security System Remain the Same SA A D SD
Clearly, the two questions are on the same scale, but the meanings of the
end points are opposite.
In SAS, the way to adjust for this problem is to pick the direction that we
want the scale to be coded (that is, do we want SA to be a positive
statement about the Social Security System or a negative one?) and then
reverse scale those items where SA reflects negatively (or positively)
on the Social Security System. In the above example, SA for Q1 is a
negative position relative to the Social Security System and, therefore,
should be reverse scaled if the decision is to scale so that SA implies
positive attitudes.
If the coding of the 4-point Likert Scale was SA-0, A-1, D-2, SD-3, then
the item will be reverse scaled as follows:
Q1 = 3-Q1; in this way 0 becomes 3-0 = 3; 1 becomes 3-1 = 2; 2
becomes 3-2 = 1; and 3 becomes 3-3 = 0.
If the coding of the 4-point Likert Scale was SA-1, A-2, D-3, SD-4, then
the item will be reverse scaled as follows:
Q1 = 5-Q1; in this way 1 becomes 5-1 = 4; 2 becomes 5-2 = 3; 3
becomes 5-3 = 2; and 4 becomes 5-4 = 1.
From the earlier example, if items X1, X3, X5, X7, and X9 need
to be reverse scaled before computing an internal consistency
estimate, then the following SAS code would do the job, assuming the
4-point Likert scale illustrated above with 1-4 scoring.
DATA WORK.ONE;SET WORK.ONE;
ARRAY X {10} X1-X10;
/* DEFINING AN ARRAY FOR THE 10 ITEMS */
DO I=1,3,5,7,9; /* INDICATING WHICH ITEMS
IN THE ARRAY TO BE REVERSE SCALED */
X(I) = 5-X(I); /* REVERSE SCALING
FOR 1-4 CODING OF 4-POINT LIKERT SCALE */
END;
RUN;
It should be noted that some of the output from PROC CORR with the
ALPHA option, such as the correlation of the item with the total and the
internal consistency estimate for the scale with each individual item
NOT part of the scale provides very useful diagnostics that should alert
the researcher about either poorly functioning items or items that were
missed when considering reverse scaling. An item that correlates
negatively with the total usually needs to be reverse scaled or is poorly
formed.
Further Readings:
Feldt L., and R. Brennan, Reliability, in Educational Measurement, Linn R. (Ed.), 105-146, 1989,
Macmillan Publishing Company.
Instrumentality Theory
Suppose two corresponding items, one from the dimension being rated
and its mate, the relative importance of that topic, called the "valence",
are cross-multiplied, then added up across all such pairs, then divided by
the number of such pairs. This procedure provides a weighted score, the
sum of the items each weighted by its relative importance. The higher
the average weighted score, the greater the overall importance and rating
of the topic. The technique has been popular because two issues are
considered at once: how satisfied (or prepared, and so on) someone is, and
how important that topic is to them. The approach has been applied to
multivariate issues such as factors affecting leaving an organization, job
satisfaction, managerial behavior, etc.
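The cross-multiplication the theory calls for is simple to express in code; a Python sketch with hypothetical ratings and valences:

```python
# Instrumentality-style weighted score: average of rating * importance pairs.
ratings  = [4, 2, 5, 3]     # satisfaction rating per topic (hypothetical)
valences = [5, 3, 4, 2]     # relative importance ("valence") of each topic

pairs = list(zip(ratings, valences))
weighted_score = sum(r * v for r, v in pairs) / len(pairs)
print(weighted_score)
```

Each rating is weighted by its topic's importance before averaging, so a high rating on an unimportant topic contributes far less than the same rating on an important one.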
Research into the relationship between people's values and their actions
as consumers is still in its infancy. However, it is an area that is destined
to receive increased attention, for it taps a broad dimension of human
behavior that could not be explored effectively before the availability of
standardized value instruments.
In one variant, the items are not reworded to accommodate the Likert format;
instead, respondents are asked to indicate the degree of personal importance each
RVS value holds, from "very unimportant" to "very important," and
they are given the standard Likert scale next to each RVS value. Some
applications use, for example, a 5-point scale and then feature a rank-
ordering of the top three RVS values after each list has already been
rated, to use in correcting for end-piling. Studies show, in many cases,
slightly, but not significantly, lower test-retest reliabilities for the Likert
versus the rank-ordered procedure.
Since the common reason for preferring to use the RVS in a Likert
format is to be able to perform normative statistical tests on the data, it is
worthwhile to point out that there are good arguments in favor of using
normative statistical tests on RVS data with the scale in its original,
rank-ordered format, under some conditions.
Further Readings:
Arsham H., Questionnaire Design and Surveys Sampling, SySurvey: The Online Survey Tool, 2002.
Braithwaite V., Beyond Rokeach's equality-freedom model: Two dimensional values in a one dimensional
world, Journal of Social Issues , 50, 67-94, 1994.
Boomsma A., M. Van Duijn, and T. Snijders, (eds.), Essays on Item Response Theory, Springer Verlag,
2000.
Gibbins K., and I. Walker, Multiple interpretations of the Rokeach value survey, Journal of Social
Psychology, 133, 797-805, 1993.
Sijtsma K., and I. W. Molenaar, Introduction to Nonparametric Item Response Theory, Sage, 2002. Provides
an alternative to parametric Item Response Theory: non-parametric (ordinal) Item Response Theory, such
as the Mokken scaling method.
One of the first things that learners of survey design and sampling must
recognize is that statistical results can very easily be interpreted wrongly.
Sayings such as "You can prove anything with figures" have gained
widespread circulation because they embody the bitter experience of
people who have found themselves misled by incorrect deductions
drawn from basically correct data.