Statistics 2 Marks and Notes 2019
UNIT 1
PART A
1. Define Probability.
Probability is the chance of occurrence of a certain event, expressed quantitatively.
1. P[X=X]=0
2. ∑ P(x)=1
8. Write the usefulness of Poisson distribution?
The Poisson distribution can be considered a good approximation of the binomial
distribution when the number of trials (n) is large and the probability of success (p) is very small
(i.e., as n → ∞ and p → 0). It is given by the function
P(X = x) = e^(−λ) λ^x / x!, x = 0, 1, 2, …
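The approximation can be checked numerically. The sketch below is only an illustration: the values n = 1000 and p = 0.002 are hypothetical, and it assumes the usual choice λ = np for the Poisson parameter.

```python
import math

def binomial_pmf(x, n, p):
    """Exact binomial probability P(X = x) for n trials with success probability p."""
    return math.comb(n, x) * p**x * (1 - p)**(n - x)

def poisson_pmf(x, lam):
    """Poisson probability P(X = x) = e^(-lam) * lam^x / x!."""
    return math.exp(-lam) * lam**x / math.factorial(x)

# Hypothetical values: many trials, very small probability of success.
n, p = 1000, 0.002
lam = n * p  # assumption: the Poisson parameter is taken as lam = n * p

for x in range(5):
    print(x, round(binomial_pmf(x, n, p), 5), round(poisson_pmf(x, lam), 5))
```

The two columns printed for each x should agree closely, which is the sense in which the Poisson distribution approximates the binomial.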
18. Write the mean and variance of uniform distribution. (Nov/Dec 2018)
Mean = (a + b)/2
Variance = (b − a)²/12
19. What are the different types of variables?( Jan 2015)
PART-B
The normal probability curve with mean µ and standard deviation σ has the following
properties:
distribution.
4. It has significant applications in statistical quality control as the control chart in statistical
quality control is closely related to normal distribution.
5. It can be used for smoothing and graduating a distribution which is not normal, simply by
contracting a normal case.
6. It serves as a guiding instrument in the analysis and interpretation of statistical data.
5. Assumption for the binomial, poisson and normal distributions. Under what conditions can
you approximate a binomial and a poisson distribution as a normal distribution? How will you
translate the distribution parameters into normal distribution parameters? (Jan 2015)
Assumption for binomial
i) N->∞
ii) P->0
iii) P=π\n
You toss a coin 10 times. The random variable X is the number of times you get a ‘tail’. X can only
take values 0, 1, 2, … , 10. Therefore, X is a discrete random variable.
Therefore, a person can die immediately on birth (where life = 0 years) or after he attains an age
of 110 years. Within this range, he can die at any age. Therefore, the variable ‘Age’ can take any
value between 0 and 110.
Hence, the values of a continuous random variable cannot be listed one by one, since the number of
possible values is infinite. Also, the probability at any specific value is zero. Instead, probability is
defined over an interval of values and represented by the area under a curve.
UNIT-II
1. What is sampling?
“Sampling” basically means selecting people/objects from a “population” in order to draw
conclusions about the whole population.
For example, we might want to find out how people are going to vote at the next election.
Obviously we can’t ask everyone in the country, so we take a sample.
Population in statistics means the whole of the information which comes under the purview
of a statistical investigation.
A population may be finite or infinite according to whether the number of individuals in it is
finite or infinite. E.g.: The population of the heights of the students in a school.
Statistic:
Any statistical measure computed from sample data is known as a statistic.
6. Define Sampling.
Sampling is the procedure or process of selecting a sample from the population. It is the study of
existing relationship between a population and a sample drawn from the population.
7. What is meant by Statistical estimation?
It helps in estimating an unknown population parameter (such as the population mean, median,
mode, standard deviation, kurtosis etc.) on the basis of a suitable statistic (such as the sample mean,
median, mode, variance etc.) computed from the sample drawn from the parent population.
When sampling is done from a population with mean μ and finite standard deviation σ, the
sampling distribution of the sample mean x̄ will tend to a normal distribution with mean μ and
S.D. σ/√n as the sample size becomes large.
Statistic: A sample statistic is used to estimate the population parameter, and it is called an
estimator of the parameter.
Parameter: Population parameters are estimated by sample statistics.
It is denoted by σᵪ = σ/√n.
14. If Random samples Come from the normal population, what can be said about the sampling
distribution of mean? (Nov/Dec 2016)
If the population is normal, the sampling distribution of the mean (x̄) is also normal for samples of
all sizes.
4. Variability of population.
18. What are non-sampling errors?
Non-sampling errors automatically creep in due to the human factor, which always varies from
one investigator to another. Non-sampling errors arise for the following reasons.
1. Faulty planning
2. Error in response
3. Non-response bias
4. Errors in design of the survey
5. Errors in compilation
6. Publication error
19. What are the types of estimation?
1. Point estimation
2. Interval estimation
20. Define point estimation?
In point estimation, a single statistic is used to provide an estimate of the population
parameter.
The estimate of the population parameter given by a single number is called the point
estimate.
21. Define interval estimation?(Nov/Dec 2018)
Interval estimation is the range of values used in making estimation of a population
parameter.
The interval estimation of a population parameter ‘θ’ is the estimation of the parameter
‘θ’ with the help of the interval (t − s, t + s), where ‘t’ is the sample statistic, i.e., t − s ≤ θ ≤ t + s.
One-tailed test:
1. A test of a statistical hypothesis that is one-sided is called a one-tailed test.
2. The rejection region may be on the left or the right side.
Two-tailed test:
1. When the test of hypothesis is made on the basis of a rejection region represented by both
sides of the standard normal curve, it is called a two-tailed test.
2. The rejection region covers both sides.
23. Distinguish between point estimation and interval estimation of population parameter.(jan
2015)
Point estimate: When a single statistic is used to provide an estimate, it is called a point estimate
of the population parameter. It is deterministic.
Interval estimate: An estimate in which the parameter lies within a range of values (i.e., between
two numbers) is called an interval estimate. This is otherwise called a confidence interval. It is
probabilistic.
24. Why does sampling introduce errors in research studies? Jan 2014
Sampling assumes that a small subset represents the whole population which might not be the
case.
25. Define standard error.(May/ June 2016)
The standard deviation of sampling distribution of a statistic in testing the hypothesis is called as
Standard error of statistic.
26. Write two properties of the sampling distribution of mean when population is normally
distributed. (Jan 2016)
Mean = population mean, i.e., µx̄ = µ
Standard deviation = population standard deviation / square root of sample size, i.e.,
σx̄ = σ/√n
The sampling distribution of x̄ is the probability distribution of all possible values of the sample
mean x̄. If a population is normal, the sampling distribution of the mean (x̄) is also normal for all
sample sizes.
The value of level of significance (zα) and Error (E) must be specified.
30. What is standard error of proportion?
It is computed from proportions of all possible samples of same size drawn from a
population.
It is denoted by σp = √(pq/n)
where p = population proportion; q = 1 − p; n = sample size
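The formula σp = √(pq/n) can be evaluated directly. The following is a minimal sketch; the function name and the example values p = 0.5 and n = 100 are hypothetical and chosen only for illustration.

```python
import math

def standard_error_of_proportion(p, n):
    """Standard error of a proportion: sqrt(p * q / n), with q = 1 - p."""
    q = 1 - p
    return math.sqrt(p * q / n)

# Hypothetical example: population proportion 0.5, sample size 100.
se = standard_error_of_proportion(0.5, 100)
print(se)
```

Because n is in the denominator, quadrupling the sample size halves the standard error.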
Part B
1. EXPLAIN SAMPLING TECHNIQUES.
DESCRIBE THE PROBABILITY AND NON-PROBABILITY SAMPLING METHODS
INTRODUCTION:
Sampling is a process of selecting a sufficient number of elements from the population. Sample
designs are basically of two types viz.,i) Probability Sampling ii) Non-Probability sampling
I. PROBABILITY SAMPLING
Under this sampling design, every item of the universe has an equal chance of inclusion in the
sample.
In this method, every nth element in the population is selected, starting with a randomly chosen
element between 1 and n.
In this sampling, one unit is selected at random from the universe and the other units are at a
specified interval from the selected unit.
Example: Suppose the researcher wants to conduct a study on the consumption rate of ‘Aavin Milk’
in a colony of 400 houses, and the sample size is to be 100. The researcher then chooses every 4th
house, such as houses 4, 8, 12 and 16.
Advantages:
Sample easy to select and Cost effective
Suitable sampling frame can be identified easily
Disadvantages:
Sample may be biased
Each element does not get equal chance
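The systematic selection described above (400 houses, a sample of 100, every 4th house) can be sketched as follows. The function name is hypothetical, and the sketch assumes that the population size is an exact multiple of the sample size.

```python
import random

def systematic_sample(population_size, sample_size, rng=random):
    """Pick a random start between 1 and k, then take every k-th unit."""
    k = population_size // sample_size   # sampling interval, 4 in the example
    start = rng.randint(1, k)            # random start between 1 and k inclusive
    return [start + i * k for i in range(sample_size)]

# Houses are assumed to be numbered 1 to 400.
houses = systematic_sample(400, 100)
print(houses[:5])
```

Only the starting house is random; every later house is fixed by the interval, which is why each element does not get an independent chance of selection.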
In this method, the universe is divided into some recognizable sub-groups which are called
'clusters'. After this a simple random sample of these clusters is drawn, and then all the units
belonging to the selected clusters constitute the sample.
In cluster sampling, groups of elements that are heterogeneous form a group and then the groups are
chosen randomly.
Example: A researcher wants to know the climatic changes and its effects over people throughout
Tamilnadu. He divided TN into different parts like Northern TN, Southern TN, Eastern TN,
Western TN, Central TN and selects some samples in these areas.
Advantages:
1) CONVENIENCE SAMPLING:
In this method, the researcher chooses the sampling units on the basis of convenience or
accessibility. Such samples are called accidental samples because the sample units enter by accident.
This is also known as a sample of the man in the street.
Eg: MD of that company wants to know about the competitive products, its features and pricing
strategies, he enquires only with those 5 officers.
Advantage:
A sample selected for ease of access, immediately known population group and good response
rate.
Disadvantage:
2. What is sample size? Explain the factors to be considered while deciding the sample size.
Sample Size:
The number of individuals included in the finite sample is called sample size . It is typically
denoted by n, a positive integer (natural number).
For example, in a national voting poll the margin of error might be + or – 3%. This means that if
60% of the people in a sample favor Mr. Smith, you could be confident that, if you surveyed the
entire population, between 57% (60-3) and 63% (60+3) of the population would favor Mr. Smith.
The margin of error in social science research generally ranges from 3% to 7% and is closely
related to sample size. A margin of error will get narrower as the sample size increases. The
margin of error selected depends on the precision needed to make population estimates from a
sample.
2. The confidence level :
It is the estimated probability that a population estimate lies within a given margin of error.
Using the example above, a confidence level of 95% tells you that you can be 95% confident that
between 57% and 63% of the population favors Mr. Smith. Common confidence levels in
social science research include 90%, 95%, and 99%. Confidence levels are also closely related
to sample size. As the confidence level increases, so too does the sample size. A researcher
that chooses a confidence level of 90% will need a smaller sample than a researcher who is
required to be 99% confident that the population estimate lies within the margin of error.
Looking at it another way, with a confidence level of 95%, there is a 5% chance that an
estimate derived from a sample will fall outside the confidence interval of 57% to 63%.
Researchers will choose a higher confidence level in order to reduce the chance of making a
wrong conclusion about the population from the sample estimate. For all samples used in the
MGAP Outcome Evaluation, the confidence level is 95%.
3. proportion (or percentage) :
Proportion of a sample that will choose a given answer to a survey question is unknown, but it’s
necessary to estimate this number since it is required for calculating the sample size.
Most researchers will use a proportion (or percentage) that is considered the most conservative
estimate – that is, that 50% of the sample will provide a given response to a survey question.
This is considered the most conservative estimate because it is associated with the largest sample
size. Smaller sample sizes are needed if the proportion of a sample that will choose a given
answer to a question is estimated at 60% (or 40%) while an even smaller sample size is needed if
the estimated proportion of responses is either 70% (or 30%), 80% (or 20%), or 90% (or 10%).
Thus, when determining the sample size needed for a given level of accuracy (i.e., given
confidence level and margin of error), the most conservative estimate of 50% should be used
because it is associated with the largest sample size.
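The effect of the three factors above can be illustrated with the commonly used formula n = z² p(1 − p) / E² for estimating a proportion. This formula is not stated in these notes; it is an assumption of the sketch, which also assumes a large population (no finite population correction). The function name is hypothetical.

```python
import math
from statistics import NormalDist

def sample_size_for_proportion(margin_of_error, confidence_level=0.95, proportion=0.5):
    """Sample size n = z^2 * p * (1 - p) / E^2, rounded up (assumed formula)."""
    # Two-sided critical value z for the chosen confidence level.
    z = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)
    n = z**2 * proportion * (1 - proportion) / margin_of_error**2
    return math.ceil(n)

# Margin of error 3%, as in the voting poll example.
print(sample_size_for_proportion(0.03, 0.95, 0.5))   # most conservative proportion
print(sample_size_for_proportion(0.03, 0.95, 0.6))   # smaller sample needed
print(sample_size_for_proportion(0.03, 0.99, 0.5))   # higher confidence, larger sample
```

Under these assumptions the required sample is largest at a proportion of 50%, and it grows as the confidence level rises or the margin of error narrows.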
UNIT 3
Part A
One-tailed test:
A test of a statistical hypothesis, where the region of rejection is on only one side of
the sampling distribution, is called a one-tailed test.
In the normal curve, the rejection region lies on only one side, either positive or negative.
Two-tailed test:
A test of a statistical hypothesis, where the region of rejection is on both sides of the
sampling distribution, is called a two-tailed test.
If the rejection region lies on both sides (right and left) of the normal curve, it is called a two-tailed test.
3. What is ANOVA?
ANOVA - Analysis of variance. It is a statistical method in which the variation in a set of
observations is divided into distinct components.
Source of variation      Sum of squares   Degrees of freedom   Mean square           F ratio
Between samples          SSC              C-1                  MSC = SSC/(C-1)       F = MSC/MSE
Within samples (error)   SSE              C(R-1)               MSE = SSE/[C(R-1)]
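The one-way table above can be computed as in the following sketch. It assumes C samples (columns) of equal size R, which matches the C(R-1) degrees of freedom in the table. The data and the function name are hypothetical.

```python
def one_way_anova(samples):
    """Return (SSC, SSE, MSC, MSE, F) for C equal-sized samples of R observations."""
    c = len(samples)          # number of samples (columns), C
    r = len(samples[0])       # observations per sample (rows), R
    grand_mean = sum(sum(s) for s in samples) / (c * r)
    means = [sum(s) / r for s in samples]

    # Between-samples sum of squares.
    ssc = sum(r * (m - grand_mean) ** 2 for m in means)
    # Within-samples (error) sum of squares.
    sse = sum((x - m) ** 2 for s, m in zip(samples, means) for x in s)

    msc = ssc / (c - 1)
    mse = sse / (c * (r - 1))
    return ssc, sse, msc, mse, msc / mse

# Hypothetical data: three samples of three observations each.
print(one_way_anova([[1, 2, 3], [2, 3, 4], [3, 4, 5]]))
```

The computed F is then compared with the table value of F for (C-1, C(R-1)) degrees of freedom.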
Source of variation   Sum of squares   Degrees of freedom   Mean square               F ratio
Between columns       SSC              C-1                  MSC = SSC/(C-1)           F = MSC/MSE
Between rows          SSR              R-1                  MSR = SSR/(R-1)           F = MSR/MSE
Error (residual)      SSE              (C-1)(R-1)           MSE = SSE/[(R-1)(C-1)]
The Z test is used when there is a need to determine whether two given population means are
different, the variance is known and the sample size is large (n ≥ 30).
The test statistic is assumed to have a normal distribution, and parameters such as the
standard deviation should be known.
28. Discuss the test procedure to test hypothesized population proportion using single sample
population.
Procedure:
1. Hypothesis:
Null hypothesis: The two proportions are equal, i.e., p1 = p2.
Alternate hypothesis: The two proportions are unequal, i.e., p1 ≠ p2.
2. Test statistic:
Z = (p1 − p2) / S.E.(p1 − p2)
S.E.(p1 − p2) = √(p1q1/n1 + p2q2/n2)
where q = 1 − p
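The test statistic above can be sketched in a few lines. The counts used below are hypothetical, as is the function name; the standard error follows the unpooled form √(p1q1/n1 + p2q2/n2) given in these notes.

```python
import math

def two_proportion_z(successes1, n1, successes2, n2):
    """Z = (p1 - p2) / S.E.(p1 - p2), with S.E. = sqrt(p1*q1/n1 + p2*q2/n2)."""
    p1 = successes1 / n1
    p2 = successes2 / n2
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return (p1 - p2) / se

# Hypothetical data: 60 successes out of 100 versus 50 successes out of 100.
z = two_proportion_z(60, 100, 50, 100)
print(round(z, 4))
```

The calculated Z is compared with the table value of Z at the chosen level of significance.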
UNIT 4
V statistic: Z = (V − µv) / σv
µv = 2n1n2/(n1 + n2) + 1
σ²v = 2n1n2(2n1n2 − n1 − n2) / [(n1 + n2)²(n1 + n2 − 1)]
V = No. of runs.
Prepared by A.Anitha – AP/DOMS
Mailam Engineering College Statistics for Management (BA5106)
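The runs-test quantities above, µv = 2n1n2/(n1 + n2) + 1 and σ²v = 2n1n2(2n1n2 − n1 − n2) / [(n1 + n2)²(n1 + n2 − 1)], can be computed as in this sketch. The input sequence and the function name are hypothetical, and the sequence is assumed to contain exactly two kinds of symbols.

```python
import math

def runs_test(sequence):
    """Return (V, mu_v, sigma_v, Z) for a sequence made of two kinds of symbols."""
    symbols = sorted(set(sequence))
    if len(symbols) != 2:
        raise ValueError("the sequence must contain exactly two kinds of symbols")
    n1 = sum(1 for s in sequence if s == symbols[0])
    n2 = len(sequence) - n1

    # A new run starts at the first element and at every change of symbol.
    v = 1 + sum(1 for a, b in zip(sequence, sequence[1:]) if a != b)

    mu_v = 2 * n1 * n2 / (n1 + n2) + 1
    var_v = (2 * n1 * n2 * (2 * n1 * n2 - n1 - n2)
             / ((n1 + n2) ** 2 * (n1 + n2 - 1)))
    sigma_v = math.sqrt(var_v)
    return v, mu_v, sigma_v, (v - mu_v) / sigma_v

# Hypothetical sequence with runs AAA, BBB, AA, BB.
print(runs_test("AAABBBAABB"))
```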
6. Distinguish between Mann –whitney U test and Kruskal Wallis test. (Jan 2018)
7. List out the working rules of Mann –whitney U test. (Nov/Dec 2017)
Null Hypothesis:
Ho: µ1 = µ2, the two populations are identical
Alternative Hypothesis:
H1: µ1≠ µ2, the two populations are not identical
Test Statistic:
1. Combine all the given samples (from smallest to the largest), and assign ranks to all these
values.
2. Assign the average of ranks if the sample values are same (i.e there are tie score)
3. Find the sum of the ranks for each of the sample. Let us denote these sums by R1 and R2.
Also n1 and n2 are their respective sample sizes.
4. Calculate U –statistic:
U1 = n1n2 + n1(n1 + 1)/2 − R1 [For sample 1]
(OR)
Z = (U − µU) / σU
4. Level of significance:
Choose the level of significance, e.g., 5% or 1%.
Find the table value by using Z test table value.
5. Conclusion:
Find the decision for this problem by comparing calculated value and table value and
write the decision whether it is accepted or rejected.
If │Z│ < Zα, we accept H0; if │Z│ ≥ Zα, we reject H0, where Zα is the tabulated value of Z
for the given level of significance α.
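The working rules above can be sketched as follows. The formula for U1 is the one given in these notes. The formula for U2 and the large-sample values µU = n1n2/2 and σU = √(n1n2(n1 + n2 + 1)/12) are standard results assumed here, since they are not stated in this section. Function names and data are hypothetical.

```python
import math

def average_ranks(values):
    """Rank values from smallest to largest, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1            # average of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def mann_whitney_u(sample1, sample2):
    """Return (U1, U2, Z) using the rank sums R1 and R2 of the combined sample."""
    n1, n2 = len(sample1), len(sample2)
    ranks = average_ranks(list(sample1) + list(sample2))
    r1, r2 = sum(ranks[:n1]), sum(ranks[n1:])
    u1 = n1 * n2 + n1 * (n1 + 1) / 2 - r1
    u2 = n1 * n2 + n2 * (n2 + 1) / 2 - r2        # assumed counterpart of U1
    mu_u = n1 * n2 / 2                            # assumed mean of U
    sigma_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)  # assumed S.D. of U
    return u1, u2, (u1 - mu_u) / sigma_u

print(mann_whitney_u([1, 2, 3], [4, 5, 6]))
```

A useful check on the arithmetic is that U1 + U2 always equals n1n2.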
Dn=max|Fe −F o|
11. What are the advantages of Non – parametric methods? (May/June 2016)
Advantages:
It does not require any parameters and the population to be normal
It is simple and easy to understand
It is based on fewer assumptions
Disadvantages:
They ignore a certain amount of information
They are not as efficient as parametric tests.
The non –parametric tests cannot be used to estimate parameters in the population or the
confidence intervals for such parameters.
12. Distinguish between Non-Parametric and Parametric tests.(Jan 2015)
BASIS FOR COMPARISON: Meaning
PARAMETRIC TEST: A statistical test in which specific assumptions are made about the
population parameter is known as a parametric test.
NONPARAMETRIC TEST: A statistical test used in the case of non-metric independent variables
is called a non-parametric test.
only two population whereas Kruskal Wallis test is employed when more than two populations
are involved.
3. Ranks for the different samples are separated and summed up as R1, R2, R3 etc.
1. One sample: We set up the hypothesis so that + and – signs are the values of random
variables having equal size.
2. Paired sample: This test is also called an alternative to the paired sample t-test. This test
uses the + and – signs in paired sample tests or in before-after study. In this test, null
hypothesis is set up so that the sign of + and – are of equal size, or the population means
are equal to the sample mean.
Procedure:
1. Calculate the + and – signs for the given distribution. Put a + sign for a value greater than
the mean value, and put a – sign for a value less than the mean value. Put 0 if the value
is equal to the mean value; pairs with 0 are considered ties.
2. Denote the total number of signs by ‘n’ (ignore the zero sign) .
3. Sign test for paired data:-
It is based on the direction of a pair of observations and not on their numerical
magnitude.
When the sign value is less than 5, we use the Binomial (or) Poisson distribution:
P(X = r) = nCr p^r q^(n−r) ; P(X = x) = e^(−λ) λ^x / x!
When the sign value is greater than or equal to 5, we use the normal distribution:
z = (x − μ)/σ ; SE(p̂) = √(PQ/n)
For table value see one tailed test or two tailed test.
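A paired sign test can be sketched as follows. The sketch assumes the null hypothesis p = q = 0.5 (plus and minus signs equally likely), drops ties, and reports both an exact two-tailed binomial probability and the normal approximation z = (x − np)/√(npq). The data and the function name are hypothetical.

```python
import math

def paired_sign_test(before, after):
    """Return (plus, minus, exact two-tailed p-value, z) for paired observations."""
    plus = sum(1 for b, a in zip(before, after) if a > b)
    minus = sum(1 for b, a in zip(before, after) if a < b)
    n = plus + minus                      # ties (zero differences) are ignored
    k = min(plus, minus)

    # Exact binomial tail with p = q = 0.5: P(X <= k) = sum of nCr / 2^n.
    tail = sum(math.comb(n, r) for r in range(k + 1)) / 2 ** n
    p_value = min(1.0, 2 * tail)

    # Normal approximation with mean n*p and standard deviation sqrt(n*p*q).
    z = (plus - n * 0.5) / math.sqrt(n * 0.25)
    return plus, minus, p_value, z

# Hypothetical before/after scores; the last pair is a tie.
print(paired_sign_test([1, 2, 3, 4, 5, 6], [2, 3, 4, 5, 6, 6]))
```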
UNIT 5
1. What is meant by Correlation analysis?
Correlation analysis is a statistical technique used to describe not only the degree of
relationship between the variables, but also the direction of influences.
variables.
According to A.M.Tuttle “Correlation is an analysis of the co-variation between two or
more variables.”
6. Regression analysis:
Regression is the measure of the average relationship between 2 or more variables
in terms of the original units of the data.
E.g., If we know that advertising & sales are correlated, we find out expected
amount of sales for a given advertising expenditure or the required amount of expenditure
Regression equation of Y on X:
y − ȳ = r (σy/σx) (x − x̄)
where r (σy/σx) is the regression coefficient of Y on X.
Regression equation of X on Y:
x − x̄ = r (σx/σy) (y − ȳ)
where r (σx/σy) is the regression coefficient of X on Y.
x̄ = Σx/n ; ȳ = Σy/n
Regression coefficient of Y on X:
r (σy/σx) = byx …………….(1)
Regression coefficient of X on Y:
r (σx/σy) = bxy …………….(2)
r = ±√(byx × bxy)
byx = r (σy/σx) = [nΣxy − (Σx)(Σy)] / [nΣx² − (Σx)²]
bxy = r (σx/σy) = [nΣxy − (Σx)(Σy)] / [nΣy² − (Σy)²]
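The two regression coefficients, byx = [nΣxy − (Σx)(Σy)] / [nΣx² − (Σx)²] and bxy = [nΣxy − (Σx)(Σy)] / [nΣy² − (Σy)²], and the relation r = ±√(byx × bxy) can be checked with a short sketch. The data and function name are hypothetical; r is given the common sign of the two coefficients.

```python
import math

def regression_coefficients(x, y):
    """Return (byx, bxy, r) computed from the raw sums of x, y, xy, x^2 and y^2."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)

    numerator = n * sum_xy - sum_x * sum_y
    byx = numerator / (n * sum_x2 - sum_x ** 2)   # regression coefficient of Y on X
    bxy = numerator / (n * sum_y2 - sum_y ** 2)   # regression coefficient of X on Y

    # r = +/- sqrt(byx * bxy), taking the sign shared by both coefficients.
    r = math.copysign(math.sqrt(byx * bxy), byx)
    return byx, bxy, r

# Hypothetical data lying exactly on the line y = 2x.
x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]
byx, bxy, r = regression_coefficients(x, y)

# Estimate y for x = 6 from the regression line of Y on X.
x_bar, y_bar = sum(x) / len(x), sum(y) / len(y)
print(byx, bxy, r, y_bar + byx * (6 - x_bar))
```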
a = Σy/N ; b = Σxy/Σx²
Second degree trend / parabolic trend:
Y = a + bx + cx²
When x ≠ 0;
When x = 0;
Y = a+bx+cx²
∑y = Na+b∑x+c∑x²
∑xy = a∑x + b∑x² + c∑x³
∑x²y = a∑x² + b∑x³ + c∑x⁴
Regression analysis shows us how to determine both nature and the strength of relationship
between two variables.
14. What do you interpret if the r=0 and r=-1? (Jan 2011)
r=0 then the variables are uncorrelated.
r=-1 then it is a perfect negative correlation.
19. Briefly explain how a scatter diagram benefits the researcher? (June 2014)
i) The relationship between the variables can be seen at a glance.
ii) Combination of the two variables can be easily identified.
20. Explain the difference between linear and curvilinear relationships.( Jan 2016)
Linear relationships: A linear relationship has direct proportionality that causes the dependent
variable to change when the independent variable changes. In a linear relationship all the points
on the scatter diagram tend to lie near a straight line.
Curvilinear relationships: A nonlinear/curvilinear relationship does not have proportionality
between the dependent and independent variables (there is no consistent change).
Nonlinear/curvilinear relationships are depicted graphically by anything other than a straight
line.
21. Define Rank Correlation. Write down the formula to calculate rank correlation co-
efficient.(Jan 2013),(Nov/Dec 2018)
The correlation coefficient between the ranks xi and yi is called the rank correlation
between the two characteristics A and B for the group of individuals.
rs = 1 − 6Σd² / [n(n² − 1)]
Where d = x − y
For Repeated ranks:
rs = 1 − 6[Σd² + m1(m1² − 1)/12 + m2(m2² − 1)/12 + …] / [n(n² − 1)]
Where d2 is the square of differences of corresponding ranks and n is the number of pairs
of observation.
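The rank correlation coefficient can be computed as in the sketch below. With no repeated ranks, every correction term m(m² − 1)/12 is zero and the formula reduces to rs = 1 − 6Σd² / [n(n² − 1)]. The function name, the tie_group_sizes parameter (the values m1, m2, …) and the data are hypothetical.

```python
def spearman_rank_correlation(rank_x, rank_y, tie_group_sizes=()):
    """rs = 1 - 6 * [sum(d^2) + sum(m * (m^2 - 1) / 12)] / [n * (n^2 - 1)]."""
    n = len(rank_x)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    # One correction term per group of repeated ranks; none when ranks are distinct.
    correction = sum(m * (m ** 2 - 1) / 12 for m in tie_group_sizes)
    return 1 - 6 * (sum_d2 + correction) / (n * (n ** 2 - 1))

# Hypothetical ranks given to five individuals on characteristics A and B.
print(spearman_rank_correlation([1, 2, 3, 4, 5], [1, 2, 3, 4, 5]))   # identical ranking
print(spearman_rank_correlation([1, 2, 3, 4, 5], [5, 4, 3, 2, 1]))   # opposite ranking
```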