Regression with two or three predictors
We use the previously defined data.
First, let's construct several models:
> g <- lm(sav ~ p15 + p75 + inc)
> g1 <- lm(sav ~ p15 + p75)
> g2 <- lm(sav ~ p15 + inc)
Let's look at these models:
> summary(g,cor=F)
Call: lm(formula = sav ~ p15 + p75 + inc)
Residuals:
    Min     1Q  Median   3Q   Max
 -8.646 -2.567 -0.1192 2.28 10.37
Coefficients:
              Value Std. Error t value Pr(>|t|)
(Intercept) 31.4581     7.4822  4.2044   0.0001
p15         -0.4922     0.1490 -3.3021   0.0019
p75         -1.5677     1.1208 -1.3988   0.1686
inc         -0.0008     0.0009 -0.8938   0.3761
Residual standard error: 3.939 on 46 degrees of freedom
Multiple R-Squared: 0.2744
F-statistic: 5.797 on 3 and 46 degrees of freedom, the p-value is 0.001898
> summary(g1,cor=F)
Call: lm(formula = sav ~ p15 + p75)
Residuals:
    Min     1Q  Median    3Q   Max
 -8.725 -2.704 -0.1199 2.282 10.32
Coefficients:
              Value Std. Error t value Pr(>|t|)
(Intercept) 30.6284     7.4085  4.1342   0.0001
p15         -0.4709     0.1468 -3.2072   0.0024
p75         -1.9342     1.0409 -1.8582   0.0694
Residual standard error: 3.931 on 47 degrees of freedom
Multiple R-Squared: 0.2618
F-statistic: 8.332 on 2 and 47 degrees of freedom, the p-value is 0.0007993
> summary(g2,cor=F)
Call: lm(formula = sav ~ p15 + inc)
Residuals:
    Min     1Q   Median    3Q   Max
 -8.117 -2.656 -0.00547 1.484 10.98
Coefficients:
              Value Std. Error t value Pr(>|t|)
(Intercept) 22.7131     4.1523  5.4700   0.0000
p15         -0.3303     0.0949 -3.4800   0.0011
inc         -0.0013     0.0009 -1.4949   0.1416
Residual standard error: 3.979 on 47 degrees of freedom
Multiple R-Squared: 0.2435
F-statistic: 7.564 on 2 and 47 degrees of freedom, the p-value is 0.001419
Now let's look at the correlations between the residuals and the predictors:
> r1<-g1$residual
> r2<-g2$residual
> cor(r1,p15)
[1] -5.5567e-17
> cor(r2,p15)
[1] -3.252871e-17
> cor(r2,r1)
[1] 0.9613393
What does 0.9613393 measure? (Note that r1 and r2 are residuals.)
Let's plot it:
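The near-zero correlations above can be reproduced on made-up data. The following is a minimal sketch in R, assuming simulated variables x1, x2, x3 and y (hypothetical stand-ins, not the savings data used in this lab). It illustrates that least-squares residuals are uncorrelated with every predictor included in the fit, while a predictor left out of the model is not constrained in this way.

```r
# Simulated stand-in data; these are NOT the savings data of this lab.
set.seed(1)
n  <- 50
x1 <- rnorm(n)                          # plays the role of p15
x2 <- -0.8 * x1 + rnorm(n, sd = 0.6)    # correlated with x1, like p75
x3 <- 0.5 * x2 + rnorm(n)               # a third predictor, like inc
y  <- 10 - 2 * x1 + x2 + 0.5 * x3 + rnorm(n)

f1 <- lm(y ~ x1 + x2)                   # analogue of g1
f2 <- lm(y ~ x1 + x3)                   # analogue of g2
e1 <- residuals(f1)
e2 <- residuals(f2)

# Residuals are orthogonal to each column of the design matrix,
# so these correlations are zero up to rounding error.
c1 <- cor(e1, x1)
c2 <- cor(e2, x1)
print(c(c1, c2))

# x3 was not in f1, so its correlation with e1 need not be zero.
c3 <- cor(e1, x3)
print(c3)

# Both residual vectors are what is left of y after removing x1
# (plus one more predictor), so they can be strongly correlated.
print(cor(e1, e2))
```

The printed values depend on the simulated data; only the first two are forced to be numerically zero.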
> motif()
> brush(cbind(p15,p75,inc,r1,r2))
Play with it, rotate, choose different variables. Click "quit"
when you are done.
Now let's look at the residual sums of squares:
> r<-g$residual
> sum(r^2)
[1] 713.7616
> sum(r1^2)
[1] 726.158
> sum(r2^2)
[1] 744.1213
Why is the first number the smallest? Why is the second number smaller
than the third? What does it mean?
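To experiment with these questions outside the lab data, here is a small sketch in R on simulated data. The variables x1, x2, x3 and y and the helper rss() are hypothetical names introduced only for this illustration. It compares the residual sum of squares of a model with those of two of its sub-models.

```r
# Simulated stand-in data; these are NOT the savings data of this lab.
set.seed(2)
n  <- 50
x1 <- rnorm(n)
x2 <- rnorm(n)
x3 <- rnorm(n)
y  <- 1 + x1 + 0.5 * x2 + rnorm(n)

# Hypothetical helper: residual sum of squares of a fitted lm object.
rss <- function(fit) sum(residuals(fit)^2)

full  <- lm(y ~ x1 + x2 + x3)   # analogue of g
sub12 <- lm(y ~ x1 + x2)        # analogue of g1
sub13 <- lm(y ~ x1 + x3)        # analogue of g2

rss_all <- c(full = rss(full), sub12 = rss(sub12), sub13 = rss(sub13))
print(rss_all)

# Adding a predictor can never increase the RSS, because the smaller
# model is a special case of the larger one (extra coefficient = 0).
print(rss(full) <= rss(sub12))
print(rss(full) <= rss(sub13))
```

Which of the two sub-models has the smaller RSS is not fixed in advance; it depends on how much each extra predictor explains once x1 is already in the model.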
Let's look at the sequential sums of squares:
> anova(g)
Analysis of Variance Table
Response: sav
Terms added sequentially (first to last)
          Df Sum of Sq  Mean Sq  F Value     Pr(F)
      p15  1  204.1241 204.1241 13.15524 0.0007158
      p75  1   53.3462  53.3462  3.43802 0.0701299
      inc  1   12.3964  12.3964  0.79892 0.3760707
Residuals 46  713.7616  15.5166
> anova(g1)
Analysis of Variance Table
Response: sav
Terms added sequentially (first to last)
          Df Sum of Sq  Mean Sq  F Value     Pr(F)
      p15  1  204.1241 204.1241 13.21177 0.0006877
      p75  1   53.3462  53.3462  3.45279 0.0694148
Residuals 47  726.1580  15.4502
> anova(g2)
Analysis of Variance Table
Response: sav
Terms added sequentially (first to last)
          Df Sum of Sq  Mean Sq  F Value     Pr(F)
      p15  1  204.1241 204.1241 12.89283 0.0007855
      inc  1   35.3829  35.3829  2.23484 0.1416147
Residuals 47  744.1213  15.8324
What is the difference between the RSS of the first and second models?
Can you see it somewhere in the output, or do you need to calculate it?
Why is SS(p75) the same in both models, while SS(inc) is not?
Why is SS(p15) always the same?
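One way to explore these questions is to check, on simulated data, how each line of a sequential ANOVA table relates to residual sums of squares. The sketch below is written for R and uses hypothetical variables x1, x2, y and a hypothetical helper rss(). Note that R labels the column "Sum Sq", whereas the S-PLUS output above prints "Sum of Sq".

```r
# Simulated stand-in data; these are NOT the savings data of this lab.
set.seed(3)
n  <- 50
x1 <- rnorm(n)
x2 <- 0.7 * x1 + rnorm(n)       # deliberately correlated with x1
y  <- 2 + x1 - x2 + rnorm(n)

# Hypothetical helper: residual sum of squares of a fitted lm object.
rss <- function(fit) sum(residuals(fit)^2)

a1  <- anova(lm(y ~ x1))        # x1 alone
a12 <- anova(lm(y ~ x1 + x2))   # x1 first, then x2
a21 <- anova(lm(y ~ x2 + x1))   # x2 first, then x1

# The sequential SS of a term is the drop in RSS when that term is
# added to the model containing all the terms listed before it.
ss_x2_after_x1 <- a12["x2", "Sum Sq"]
drop_in_rss    <- rss(lm(y ~ x1)) - rss(lm(y ~ x1 + x2))
print(all.equal(ss_x2_after_x1, drop_in_rss))

# The first term's SS does not depend on what is added after it ...
ss_x1_alone <- a1["x1", "Sum Sq"]
ss_x1_first <- a12["x1", "Sum Sq"]
print(all.equal(ss_x1_alone, ss_x1_first))

# ... but the SS of the same term changes with its position.
ss_x1_last <- a21["x1", "Sum Sq"]
print(c(first = ss_x1_first, last = ss_x1_last))
```

The same comparison can be made by hand with the tables above: subtract the residual sums of squares of two nested models and look for that number among the "Sum of Sq" entries.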