The Normal Distribution Estimation Correlation
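$$
r = \frac{n\sum x_i y_i - \sum x_i \sum y_i}{\sqrt{\left[n\sum x_i^2 - \left(\sum x_i\right)^2\right]\left[n\sum y_i^2 - \left(\sum y_i\right)^2\right]}}
$$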
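The computation of r from paired sample data can be sketched in Python as follows. This is a minimal illustration that assumes the usual computational form of the sample linear correlation coefficient (built from n and the sums of x, y, xy, x², and y²); the function name `pearson_r` is hypothetical, and the data are those of item 3 in the later "Construct a scatter diagram" examples.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Sample linear correlation coefficient from paired data (hypothetical helper)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("xs and ys must be paired and contain at least two points")
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    # The denominator is zero when either variable is constant; r is undefined then.
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


# Data from item 3 of the scatter-diagram examples (Y is exactly 3 times X).
x = [2, 4, 6, 8, 10, 12]
y = [6, 12, 18, 24, 30, 36]
r = pearson_r(x, y)
print(round(r, 3))  # 1.0
```

Because every Y value here is exactly three times its X value, the points lie on a straight line with positive slope and r comes out as 1.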
- Since r is computed from the sample data, it is a sample statistic.
- Interpretation of the values of r
r = 1 : perfect positive correlation between X and Y
0.5 ≤ r < 1 : strong positive correlation between X and Y
0 < r < 0.5 : positive correlation between X and Y
r = 0 : zero correlation
-0.5 < r < 0 : negative correlation between X and Y
-1 < r ≤ -0.5 : strong negative correlation between X and Y
r = -1 : perfect negative correlation between X and Y
- Zero correlation means the absence of a linear relationship, not the absence of any association.
- r measures the strength of the linear relationship. It is not designed to measure the
strength of a relationship that is not linear.
- The value of r is always between -1 and 1, that is, -1 ≤ r ≤ 1. (Round r to at least
three decimal places.)
- Common errors in interpreting the results:
1. We must be careful not to conclude that a significant linear correlation
between two variables is proof of a cause-effect relationship
between them.
2. The absence of a significant linear correlation does not mean that X and Y are
unrelated in every way.
3. Rounding errors can wreak havoc with the results. Round the linear correlation
coefficient to three decimal places.
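The interpretation scale and the caution about nonlinear relationships can be sketched together. In the Python snippet below, `interpret_r` and `pearson_r` are hypothetical helper names, the labels follow the ranges listed above, and the parabola data are made up purely for illustration (they do not come from this handout).

```python
from math import sqrt


def pearson_r(xs, ys):
    """Sample linear correlation coefficient (compact hypothetical helper)."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(a * b for a, b in zip(xs, ys))
    sxx = sum(a * a for a in xs)
    syy = sum(b * b for b in ys)
    return (n * sxy - sx * sy) / sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))


def interpret_r(r):
    """Map r to the verbal labels of the interpretation scale above."""
    r = round(r, 3)  # round to three decimal places, as the notes advise
    if not -1 <= r <= 1:
        raise ValueError("r must lie between -1 and 1")
    if r == 1:
        return "perfect positive correlation"
    if r >= 0.5:
        return "strong positive correlation"
    if r > 0:
        return "positive correlation"
    if r == 0:
        return "zero correlation"
    if r > -0.5:
        return "negative correlation"
    if r > -1:
        return "strong negative correlation"
    return "perfect negative correlation"


# Illustrative (made-up) data: Y = X squared, a perfect but nonlinear relationship.
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x * x for x in xs]
print(interpret_r(pearson_r(xs, ys)))  # zero correlation
```

Here Y is completely determined by X, yet r is 0 because the relationship is not linear. This is why a zero or non-significant r should not be read as "X and Y are unrelated."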
Examples:
For numbers 1 to 4, identify the error in the stated conclusion and write the correct conclusion.
1. Given: The paired sample data result in a linear correlation coefficient very close to zero.
Conclusion: The two variables are not related in any way.
2. Given: There is a strong positive linear correlation between smoking and cancer.
Conclusion: Smoking causes cancer.
3. Given: x = age, y = test score, r = 0.40
Conclusion: Older people tend to get lower scores.
4. Given: There is a strong positive linear correlation between income and spending.
Conclusion: Increased spending is caused by increased income.
5. Ten students from the College of Business Administration were chosen as
respondents in a study of the relationship between students' grades (X) and
the number of hours they spend studying (Y). The computed correlation
coefficient was 0.575. What would be the conclusion?
6. The data on yearly consumption of cigarettes in the Philippines and the percentage of the
country's population admitted to mental institutions as psychiatric cases were collected for 8
years. The correlation coefficient r = 0.61. What can we conclude about the data?
7. The temperature in a certain locality and number of pregnant women were found to have a
strong negative correlation. What would be the right conclusion?
EXAMPLES: Construct a scatter diagram, find r and interpret the results.
1. X 2 3 7 12 16 20 22
Y 14 20 9 14 5 1 15
2. X 9 4 5 4 2 6 3 7 2 8
Y 8 5 8 4 3 4 4 10 4 10
3. X 2 4 6 8 10 12
Y 6 12 18 24 30 36
4. X 25 64 75 35 86 15 19 66 37 9 12 9 47
Y 90 3 85 70 67 45 22 12 85 66 54 16 24
5. X 3 4 3 4 5 6 5 6 7 8 7 8 9 11 9 10
Y 15 17 3 4 5 21 23 13 11 12 25 6 7 9 16 7
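A scatter diagram simply plots each (X, Y) pair as a point. When no plotting tool is at hand, a rough character-grid version is enough to see the general pattern before computing r. The sketch below is one possible approach, not a prescribed method; `text_scatter` and its `width` and `height` parameters are hypothetical, and the data are those of item 1 above.

```python
def text_scatter(xs, ys, width=40, height=12):
    """Return a rough character-grid scatter diagram (X to the right, Y upward)."""
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    if x_min == x_max or y_min == y_max:
        raise ValueError("each variable needs at least two distinct values")
    grid = [[" "] * width for _ in range(height)]
    for x, y in zip(xs, ys):
        # Scale each coordinate to a grid cell; the largest Y goes on the top row.
        col = round((x - x_min) / (x_max - x_min) * (width - 1))
        row = round((y_max - y) / (y_max - y_min) * (height - 1))
        grid[row][col] = "*"
    lines = ["|" + "".join(cells) for cells in grid]
    lines.append("+" + "-" * width)
    return "\n".join(lines)


# Data from item 1 above.
x = [2, 3, 7, 12, 16, 20, 22]
y = [14, 20, 9, 14, 5, 1, 15]
plot = text_scatter(x, y)
print(plot)
```

Points that fall in the same grid cell are drawn as a single mark, so this is only a quick visual check; a properly scaled diagram on graph paper or in plotting software is still needed for the exercises.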
EXERCISES
A. Construct a scatter diagram, find r and interpret the results.
1. Grades of 6 students selected at random
MATH GRADE ( X ) 70 92 80 74 65 83
ENGLISH GRADE (Y) 74 84 63 87 78 90
2. The data below consist of the weights (in pounds) of discarded paper and the sizes of households.
X (paper) 2.41 7.57 9.55 8.82 8.72 6.96 6.83 11.42
Y (household size) 2 3 3 6 4 2 1 5
3. The data below consist of the number of persons in each household and the number of cars
they own.
X (household size) 2 4 4 2 2 1 2 3 5
Y (cars) 2 0 2 2 1 1 3 0 2
4. The data below consist of age and income (in thousands of dollars).
Age 60 63 51 25 47 56 19 24 25 20 66 19 48 52 27
Income 43.4 18.8 14.4 29.4 19.4 83 10.4 12.6 36.4 29.6 17.2 17.2 67 33 37.4
5. A teacher is interested in knowing whether or not two IQ tests produce linearly related
scores. A sample of 10 students was taken randomly. Five students took Test 1 and 5 students
took Test 2 in the morning. In the afternoon, those who took Test 1 took Test 2 and vice versa.
The results are shown in the table below:
STUDENT TEST 1 (X) TEST 2 (Y)
A 125 114
B 145 127
C 110 126
D 120 116
E 124 108
F 110 100
G 121 129
H 142 131
I 100 96
J 126 113
a. Plot a scatter diagram for these data.
b. Solve for r.
c. How well do the two tests relate linearly? Explain.
6. In a study of factors that affect success in a calculus course, data were collected for 10
different persons. Scores on an algebra placement test are given, along with calculus
achievement scores.
a. Plot a scatter diagram for these data.
b. Find the value of the linear correlation coefficient r.
c. Test the significance of r at α = 0.05.
ALGEBRA SCORE (X) 17 21 11 16 15 11 24 27 19 8
CALCULUS SCORE (Y) 73 66 64 61 70 71 90 68 84 52
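This handout does not spell out how the significance of r is tested. One common approach, assumed here, is the t test of the null hypothesis of no linear correlation, with t = r·sqrt((n - 2) / (1 - r²)) and n - 2 degrees of freedom. The sketch below applies it to the r = 0.575, n = 10 case from the earlier interpretation examples. The function name `t_statistic` is hypothetical, and the critical value 2.306 (two-tailed, α = 0.05, 8 degrees of freedom) is taken from a standard t table.

```python
from math import sqrt


def t_statistic(r, n):
    """t statistic for testing H0: no linear correlation (assumed test, df = n - 2)."""
    if n < 3:
        raise ValueError("need at least three pairs")
    if not -1 < r < 1:
        raise ValueError("r must be strictly between -1 and 1")
    return r * sqrt((n - 2) / (1 - r * r))


# r and n from the earlier example on grades and hours of study.
r, n = 0.575, 10
t = t_statistic(r, n)

# Two-tailed critical value for alpha = 0.05 and df = 8, from a standard t table.
T_CRITICAL = 2.306
significant = abs(t) > T_CRITICAL
print(round(t, 3), significant)
```

Under these assumptions the computed t falls below 2.306, so with only 10 pairs an r of 0.575 would not be declared significant at α = 0.05, even though the interpretation scale labels it a strong positive correlation. The same steps apply to part c once r has been computed from the table above.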
7. One study was conducted to determine the relationship between the age and systolic blood
pressure of 12 women.
Age ( X ) Systolic Blood Pressure ( Y )
56 147
42 125
72 160
36 118
63 149
47 128
55 150
49 145
38 115
42 140
68 152
60 155
a. Plot a scatter diagram for these data.
b. Solve for r and interpret.
c. What can you conclude about the relationship between age and systolic blood
pressure of women? Explain statistically.