Multiple Regression Analysis 1
Cautions About Linear Regression
Correlation and regression describe only linear
relations.
Correlation and least-squares regression line
are not resistant to outliers.
Predictions outside the range of observed data
are often inaccurate.
The relationship between two variables is often
influenced by lurking variables not included in
our model.
General Principle of Data Analysis
$$Y = c + b_1 X_1 + b_2 X_2 + \dots + b_p X_p + e$$
• c is the Y intercept
• Y is therefore a weighted combination of the predictors
(and intercept) called a linear composite (LC)
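As a rough illustration of the linear composite, the sketch below estimates the intercept c and the weights b by ordinary least squares with NumPy. The function name `fit_linear_composite` and the demo data are made up for this example; they do not come from the slides.

```python
import numpy as np

def fit_linear_composite(X, y):
    """Return (c, b): the intercept and weights that minimize squared error."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Prepend a column of ones so the intercept c is estimated with the weights.
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[0], coef[1:]

# Hypothetical data generated without error from Y = 1 + 2*X1 - 3*X2.
X_demo = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]])
y_demo = 1.0 + 2.0 * X_demo[:, 0] - 3.0 * X_demo[:, 1]
c_hat, b_hat = fit_linear_composite(X_demo, y_demo)
```

Because the demo data contain no error term, the fit should recover the generating intercept and weights up to floating-point precision.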
Bivariate regression
Multiple Regression
Variance Explained – R²
$$R^2 = \frac{SS_{regression}}{SS_{total}} = \frac{SS_{\hat{Y}}}{SS_Y} = \frac{\sum (\hat{Y} - \bar{Y})^2}{\sum (Y - \bar{Y})^2}$$
a ratio reflecting the proportion of variance captured by our
model relative to the overall variance in our data
R² = .50 means 50% of the variance in Y is explained by the
combination of X1, X2, …, Xp
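The ratio of regression to total sums of squares can be computed directly. The following is a minimal sketch assuming an ordinary least-squares fit with an intercept; `r_squared` and the four data points are illustrative only.

```python
import numpy as np

def r_squared(X, y):
    """Proportion of variance in y captured by the fitted values."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    y_hat = design @ coef
    ss_regression = np.sum((y_hat - y.mean()) ** 2)  # SS of predicted scores
    ss_total = np.sum((y - y.mean()) ** 2)           # SS of observed scores
    return ss_regression / ss_total

# Hypothetical single-predictor data whose correlation with y is r = .80.
X_demo = np.array([[1.0], [2.0], [3.0], [4.0]])
y_demo = np.array([1.0, 3.0, 2.0, 4.0])
r2_demo = r_squared(X_demo, y_demo)
```

With a single predictor, the result equals the squared bivariate correlation (here .64), which is the link between R² and r².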
R² vs r²
Significance of the Model
$$F = \frac{MS_{regression}}{MS_{residual}} = \frac{(N - p - 1) R^2}{p (1 - R^2)}$$
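The F ratio can be obtained from R², the sample size N, and the number of predictors p alone. The helper below is a small sketch of that arithmetic; the function name and the example values are assumptions, not figures from the slides.

```python
def f_from_r2(r2, n, p):
    """F statistic for the overall model, with df = (p, n - p - 1)."""
    if not 0.0 <= r2 < 1.0:
        raise ValueError("r2 must lie in [0, 1)")
    return ((n - p - 1) * r2) / (p * (1.0 - r2))

# Hypothetical example: R^2 = .50 with N = 23 cases and p = 2 predictors.
f_demo = f_from_r2(0.50, 23, 2)
```

The resulting value is compared with the F distribution on p and N − p − 1 degrees of freedom.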
Importance of Individual Predictors
• Residuals
• normality: arrays of Y values are normally
distributed around Ŷ (assumption of normality in
arrays)
• homoscedasticity: variance of Y values is
constant across the full range of Ŷ values (assumption
of homogeneity of variance in arrays)
• linearity: straight-line relationship between Ŷ
and residuals (with mean = 0 and slope = 0)
• independence (residuals uncorrelated)
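The residual checks listed above can be started numerically before any plots are inspected. This sketch, with made-up data and a hypothetical helper `residual_summary`, computes the residual mean and the slope of the residuals on the predicted values for a least-squares fit with an intercept.

```python
import numpy as np

def residual_summary(X, y):
    """Return (mean of residuals, slope of residuals regressed on y_hat)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    y_hat = design @ coef
    residuals = y - y_hat
    # Straight-line fit of residuals against predicted scores.
    slope = np.polyfit(y_hat, residuals, 1)[0]
    return residuals.mean(), slope

# Hypothetical data: two predictors plus random error.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(30, 2))
y_demo = 2.0 + X_demo @ np.array([1.5, -0.5]) + rng.normal(size=30)
mean_resid, slope_resid = residual_summary(X_demo, y_demo)
```

For a least-squares fit with an intercept, both quantities are zero by construction, so they cannot reveal violations on their own. Curvature or fanning in a plot of residuals against Ŷ is what signals nonlinearity or heteroscedasticity.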
Multicollinearity and Singularity
$$\text{Tolerance} = 1 - R_x^2$$
• where $R_x^2$ is the overlap (squared multiple correlation)
between a particular predictor and all the other predictors
• values below .10 considered problematic
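Tolerance can be computed by regressing each predictor on the remaining predictors. The sketch below assumes that approach; `tolerances` and the nearly collinear demo columns are invented for illustration.

```python
import numpy as np

def tolerances(X):
    """Tolerance (1 - R_x^2) for each column of the predictor matrix X."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    result = []
    for j in range(p):
        target = X[:, j]
        others = np.delete(X, j, axis=1)
        design = np.column_stack([np.ones(n), others])
        coef, *_ = np.linalg.lstsq(design, target, rcond=None)
        residuals = target - design @ coef
        ss_total = np.sum((target - target.mean()) ** 2)
        r2_x = 1.0 - np.sum(residuals ** 2) / ss_total
        result.append(1.0 - r2_x)
    return np.array(result)

# Hypothetical predictors: the second column is almost twice the first.
X_collinear = np.array([[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0], [5.0, 10.5]])
tol_collinear = tolerances(X_collinear)

# Hypothetical uncorrelated predictors for comparison.
X_orthogonal = np.array([[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, -1.0]])
tol_orthogonal = tolerances(X_orthogonal)
```

The nearly collinear columns give tolerances far below the .10 cutoff, while the uncorrelated columns give tolerances of 1.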
Multivariate Outliers – discrepancy and influence
(figure labels: high discrepancy, low influence)
Multivariate Outliers – Testing
Leverage
• Leverage statistic (h): varies from 0 to 1, values > .50 are
problematic
• Mahalanobis distance: approximately h × (n − 1), or exactly
(n − 1)(h − 1/n); distributed as chi-square and tested as such
(df = p, criterion α = .001)
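Leverage and Mahalanobis distance can both be computed from the predictor matrix. The following sketch uses made-up data and a hypothetical helper name. With an intercept in the model, the exact identity is D² = (n − 1)(h − 1/n), which h × (n − 1) approximates when n is large.

```python
import numpy as np

def leverage_and_mahalanobis(X):
    """Return (h, d2): hat-matrix diagonals and squared Mahalanobis distances."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    design = np.column_stack([np.ones(n), X])
    # h_i = x_i (X'X)^{-1} x_i', the diagonal of the hat matrix.
    xtx_inv = np.linalg.inv(design.T @ design)
    h = np.einsum("ij,jk,ik->i", design, xtx_inv, design)
    # Squared distance of each case from the predictor centroid,
    # scaled by the sample covariance matrix (n - 1 denominator).
    centered = X - X.mean(axis=0)
    cov_inv = np.linalg.inv(np.atleast_2d(np.cov(X, rowvar=False)))
    d2 = np.einsum("ij,jk,ik->i", centered, cov_inv, centered)
    return h, d2

# Hypothetical data: 20 cases on 2 predictors.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(20, 2))
h_demo, d2_demo = leverage_and_mahalanobis(X_demo)
n_demo = X_demo.shape[0]
```

Each squared distance would then be compared with the chi-square critical value on p degrees of freedom at the .001 level.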
• Point to Linear… and click.
Using SPSS
Multiple Linear Regression:
Selecting Variables
To select multiple
variables, hold down
the Ctrl key and choose
the variables that you
want.
Move shelf space
(space) and price per kg
(price), which are
already highlighted, to
the box labeled
Independent(s) by
clicking the arrow.
Requesting Statistics
Request
descriptive
statistics by
clicking the
button
labeled
Statistics…
Using SPSS
Multiple Linear Regression:
Requesting Statistics
Statistics for the Model
fit and Estimates for
Regression
Coefficients will be
produced by default.
Enter Method
The independent
variables can be
entered into the
analysis using
five different
methods.
Enter Method
Enter is the
default method
of variable entry.
Click the OK
button to run the
Multiple Linear
Regression
procedure.
Using SPSS
Multiple Linear Regression Output:
Descriptive Statistics
Regression
Correlations
Using SPSS
Multiple Linear Regression Enter Method Output:
Variables Entered
Using SPSS
Multiple Linear Regression Enter Method Output:
Model Summary
• Correlation
• Coefficient of Determination
• Standard Deviation around the regression line
• Durbin-Watson Statistic
Independence
Durbin-Watson Statistic.
The D-W statistic is
approximately:
D ≈ 2(1 − ρ)
where ρ is the lag-1 autocorrelation of the residuals.
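To show how the statistic relates to ρ, the sketch below computes Durbin-Watson from its usual definition, the sum of squared successive differences of the residuals divided by their sum of squares, alongside the lag-1 autocorrelation. The function names and the residual series are made up for illustration.

```python
import numpy as np

def durbin_watson(residuals):
    """Sum of squared successive differences over the sum of squared residuals."""
    e = np.asarray(residuals, dtype=float)
    return np.sum(np.diff(e) ** 2) / np.sum(e ** 2)

def lag1_autocorrelation(residuals):
    """Lag-1 autocorrelation of residuals (assumed to have mean zero)."""
    e = np.asarray(residuals, dtype=float)
    return np.sum(e[1:] * e[:-1]) / np.sum(e ** 2)

# Hypothetical residuals with positive serial correlation.
e_demo = np.array([1.0, 1.0, 1.0, -1.0, -1.0, -1.0])
d_demo = durbin_watson(e_demo)
rho_demo = lag1_autocorrelation(e_demo)
# The two differ only by an end-point term that shrinks as the series grows.
end_term = (e_demo[0] ** 2 + e_demo[-1] ** 2) / np.sum(e_demo ** 2)
```

Values near 2 suggest uncorrelated residuals, values well below 2 suggest positive autocorrelation, and values well above 2 suggest negative autocorrelation.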
ANOVA
Measures of
Variation
Using SPSS
Multiple Linear Regression Enter Method Output:
Coefficients
Regression Equation:
ŷi = 10.50x1 + 0.057x2 + 2.029
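The fitted equation can be applied to new predictor values as in the sketch below. The slide does not say which predictor is x1 and which is x2, so the arguments keep those generic names, and the input values are hypothetical.

```python
def predict_y(x1, x2):
    """Predicted value from the regression equation reported in the output."""
    return 10.50 * x1 + 0.057 * x2 + 2.029

# Hypothetical inputs chosen only to show the arithmetic.
y_hat_demo = predict_y(1.0, 100.0)
```

Predictions of this kind are trustworthy only for predictor values inside the range of the observed data.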
Using SPSS
Multiple Linear Regression Enter Method Output:
Residuals Statistics
Using SPSS
Multiple Linear Regression Enter Method Output:
Residuals Histogram
Normality
Normality of residuals is only required for valid hypothesis
testing; that is, the normality assumption assures that the
p-values for the t-tests and F-test will be valid. Normality is
not required in order to obtain unbiased estimates of the
regression coefficients.
Using SPSS
Multiple Linear Regression Enter Method Output:
A standardized
normal
probability (P-P)
plot is sensitive
to non-normality
in the middle
range of the data
rather than in the tails.
Using SPSS
Multiple Linear Regression Enter Method Output:
Interpretation of Output
1. What contribution do both shelf space and price make to the
prediction of sales of pet food?
Together, the two independent variables (shelf space and price) explain 85 per
cent of the variance (R Square) in sales of pet food, which is highly
significant as indicated by the F-value of 34.08.
Using SPSS
Multiple Linear Regression Enter Method Output:
Interpretation of Output
2. Which of the two variables is a better predictor of sales of pet food?
An examination of the t-values and Beta values indicates that price contributes
more to the prediction of sales. Therefore, you can say that price
significantly predicts sales of pet food, with t = 3.22, p < .05. However, the
shelf space allocated is not a significant predictor.