Influential Observations
This follows on from the previous page and uses variables created there. First we compute and plot the Cook statistics:
> cook <- stud^2*lev/(5*(1-lev))
> dotchart(cook,colabs)
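A minimal, self-contained sketch of the same computation follows. It assumes that base R's `LifeCycleSavings` data are the savings data from the previous page, with `sr`, `pop15`, `pop75`, `dpi` and `ddpi` standing in for `sav`, `p15`, `p75`, `inc` and `gro`. It also assumes that `stud` from the previous page holds internally studentized residuals. The 5 in the denominator is the number of parameters p, and the check against `cooks.distance()` confirms the formula.

```r
# Assumption: LifeCycleSavings (base R) stands in for the savings data;
# sr, pop15, pop75, dpi, ddpi correspond to sav, p15, p75, inc, gro.
g <- lm(sr ~ pop15 + pop75 + dpi + ddpi, data = LifeCycleSavings)

lev  <- hatvalues(g)      # leverages h_i
stud <- rstandard(g)      # internally studentized residuals r_i
p    <- length(coef(g))   # number of parameters (5 here)

# Cook statistic: D_i = r_i^2 * h_i / (p * (1 - h_i))
cook <- stud^2 * lev / (p * (1 - lev))

# The hand-computed values should agree with R's built-in function
stopifnot(isTRUE(all.equal(cook, cooks.distance(g))))

# The three largest values, labelled by country
print(head(sort(cook, decreasing = TRUE), 3))
```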
Which ones are large? We now exclude the largest (the only case with a Cook statistic of 0.2 or more) and see how the fit changes:
> gl <- lm(sav ~ p15+p75+inc+gro,subset=(cook < 0.2))
> summary(gl)
Coefficients:
               Value Std. Error  t value Pr(>|t|)
(Intercept)  24.5247     8.2240   2.9821   0.0047
p15          -0.3915     0.1579  -2.4790   0.0171
p75          -1.2810     1.1452  -1.1186   0.2694
inc          -0.0003     0.0009  -0.3430   0.7332
gro           0.6103     0.2688   2.2706   0.0281
Residual standard error: 3.795 on 44 degrees of freedom
Multiple R-Squared: 0.3554
F-statistic: 6.066 on 4 and 44 degrees of freedom, the p-value is 0.0005616
Compare this with the fit to the full data:
> summary(g)
Coefficients:
               Value Std. Error  t value Pr(>|t|)
(Intercept)  28.5666     7.3545   3.8842   0.0003
p15          -0.4612     0.1446  -3.1886   0.0026
p75          -1.6916     1.0836  -1.5611   0.1255
inc          -0.0003     0.0009  -0.3617   0.7193
gro           0.4097     0.1962   2.0882   0.0425
Residual standard error: 3.803 on 45 degrees of freedom
Multiple R-Squared: 0.3385
F-statistic: 5.756 on 4 and 45 degrees of freedom, the p-value is 0.0007902
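Reading two printed summaries side by side is error-prone, so here is a sketch that puts the estimates in one table. It assumes base R's `LifeCycleSavings` stands in for the savings data, with `sr`, `pop15`, `pop75`, `dpi` and `ddpi` in place of `sav`, `p15`, `p75`, `inc` and `gro`. The names `keep`, `dropped` and `comparison` are illustrative only. The last column measures each shift in full-fit standard errors, which makes changes in differently scaled predictors easier to compare.

```r
# Assumption: LifeCycleSavings (base R) stands in for the savings data;
# sr, pop15, pop75, dpi, ddpi correspond to sav, p15, p75, inc, gro.
g    <- lm(sr ~ pop15 + pop75 + dpi + ddpi, data = LifeCycleSavings)
cook <- cooks.distance(g)

# Refit without the case(s) at or above the 0.2 cutoff used in the text
keep    <- cook < 0.2
dropped <- rownames(LifeCycleSavings)[!keep]
gl      <- lm(sr ~ pop15 + pop75 + dpi + ddpi, data = LifeCycleSavings[keep, ])

# Estimates side by side; shift_in_se is (reduced - full) / full-fit SE
se <- coef(summary(g))[, "Std. Error"]
comparison <- cbind(full        = coef(g),
                    reduced     = coef(gl),
                    shift_in_se = (coef(gl) - coef(g)) / se)
print(dropped)
print(round(comparison, 4))
```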
What changed? It would be rather tedious to do this for each country, but there's a quicker way:
> ginf <- lm.influence(g)
> dotchart(ginf$coefficients[,2],colabs,main="p15")
We just plotted the change in the second parameter estimate (the p15 coefficient) when each case is left out in turn. Try this for the other estimates: which countries stick out?
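To see that the quick way really matches the tedious one, the sketch below refits the model once per case and compares the result with `lm.influence()`. It then draws one dot chart per coefficient. As before, base R's `LifeCycleSavings` is assumed to stand in for the savings data (`sr`, `pop15`, `pop75`, `dpi`, `ddpi` for `sav`, `p15`, `p75`, `inc`, `gro`), and `loo` and `biggest` are illustrative names. The rows of `coefficients` are taken here to be the full-data estimate minus the estimate with case i deleted.

```r
# Assumption: LifeCycleSavings (base R) stands in for the savings data;
# sr, pop15, pop75, dpi, ddpi correspond to sav, p15, p75, inc, gro.
g <- lm(sr ~ pop15 + pop75 + dpi + ddpi, data = LifeCycleSavings)
n <- nrow(LifeCycleSavings)

# The tedious way: refit once per case, recording beta_hat - beta_hat_(i)
loo <- t(sapply(seq_len(n), function(i) {
  fit_i <- lm(sr ~ pop15 + pop75 + dpi + ddpi, data = LifeCycleSavings[-i, ])
  coef(g) - coef(fit_i)
}))

# The quick way: lm.influence() gives the same matrix from a single fit
ginf <- lm.influence(g)
stopifnot(isTRUE(all.equal(loo, ginf$coefficients, check.attributes = FALSE)))

# For each coefficient, the case whose removal moves it furthest
biggest <- apply(abs(ginf$coefficients), 2, which.max)
print(setNames(rownames(LifeCycleSavings)[biggest], names(coef(g))))

# One dot chart per coefficient, as the text suggests
op <- par(mfrow = c(2, 3))
for (j in seq_along(coef(g)))
  dotchart(ginf$coefficients[, j], rownames(LifeCycleSavings),
           main = names(coef(g))[j], cex = 0.5)
par(op)
```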
Consider Japan:
> gj <- lm(sav ~ p15+p75+inc+gro,subset=(colabs != "Japan"))
> summary(gj)
Coefficients:
               Value Std. Error  t value Pr(>|t|)
(Intercept)  23.9408     7.7840   3.0756   0.0036
p15          -0.3679     0.1536  -2.3948   0.0210
p75          -0.9738     1.1554  -0.8428   0.4039
inc          -0.0005     0.0009  -0.5119   0.6113
gro           0.3348     0.1984   1.6869   0.0987
Residual standard error: 3.738 on 44 degrees of freedom
Multiple R-Squared: 0.277
F-statistic: 4.214 on 4 and 44 degrees of freedom, the p-value is 0.005648
Compare this with the full-data fit: what qualitative changes do you observe?
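A sketch for the Japan comparison follows, assuming base R's `LifeCycleSavings` stands in for the savings data (`sr`, `pop15`, `pop75`, `dpi`, `ddpi` for `sav`, `p15`, `p75`, `inc`, `gro`). It checks that `dfbeta()` (the named form of `lm.influence()$coefficients`) reproduces the change from refitting. It then shows `dfbetas()`, which divides each change by a leave-one-out standard error so that shifts in different coefficients are comparable. The names `raw` and `tv` are illustrative.

```r
# Assumption: LifeCycleSavings (base R) stands in for the savings data;
# sr, pop15, pop75, dpi, ddpi correspond to sav, p15, p75, inc, gro.
g  <- lm(sr ~ pop15 + pop75 + dpi + ddpi, data = LifeCycleSavings)
gj <- lm(sr ~ pop15 + pop75 + dpi + ddpi,
         data = LifeCycleSavings[rownames(LifeCycleSavings) != "Japan", ])

# Raw change in each estimate caused by dropping Japan
raw <- coef(g) - coef(gj)

# dfbeta() returns the same changes for every case without refitting
stopifnot(isTRUE(all.equal(raw, dfbeta(g)["Japan", ], check.attributes = FALSE)))

# DFBETAS: each change divided by s_(i) * sqrt([(X'X)^{-1}]_jj),
# putting shifts in different coefficients on a comparable scale
print(round(dfbetas(g)["Japan", ], 3))

# t values with and without Japan, to spot qualitative changes
tv <- cbind(full     = coef(summary(g))[, "t value"],
            no_japan = coef(summary(gj))[, "t value"])
print(round(tv, 3))
```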
Finally, what happens to the residual standard error when each case is removed in turn? We plot this:
> dotchart(ginf$sigma,colabs)
Could we have known which countries would stick out on this plot in advance?
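The leave-one-out residual standard error has a closed form. Deleting case i lowers the residual sum of squares by e_i^2/(1-h_i) and costs one degree of freedom, so s_(i)^2 = (RSS - e_i^2/(1-h_i))/(n-p-1). The sketch below checks this against the `sigma` component of `lm.influence()`, assuming base R's `LifeCycleSavings` stands in for the savings data (`sr`, `pop15`, `pop75`, `dpi`, `ddpi` for `sav`, `p15`, `p75`, `inc`, `gro`). The name `sigma_loo` is illustrative. Since e_i^2/(1-h_i) equals s^2 times the squared internally studentized residual, the formula ties the plot to residuals that were already available.

```r
# Assumption: LifeCycleSavings (base R) stands in for the savings data;
# sr, pop15, pop75, dpi, ddpi correspond to sav, p15, p75, inc, gro.
g <- lm(sr ~ pop15 + pop75 + dpi + ddpi, data = LifeCycleSavings)
e <- residuals(g)
h <- hatvalues(g)
n <- nobs(g)
p <- length(coef(g))

# Removing case i drops RSS by e_i^2 / (1 - h_i) and loses one df
rss <- sum(e^2)
sigma_loo <- sqrt((rss - e^2 / (1 - h)) / (n - p - 1))

# Matches the sigma component returned by lm.influence()
stopifnot(isTRUE(all.equal(sigma_loo, lm.influence(g)$sigma,
                           check.attributes = FALSE)))

# Cases whose removal lowers sigma most have the largest e_i^2 / (1 - h_i)
print(head(sort(e^2 / (1 - h), decreasing = TRUE), 3))
```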