Influential Observations

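This follows on from the previous page and uses variables created there. First we compute and plot the Cook statistics from the studentized residuals (stud) and leverages (lev); the 5 in the denominator is the number of parameters in the model: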
    > cook <- stud^2*lev/(5*(1-lev))
    > dotchart(cook,colabs)
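As a check on the formula above, here is a minimal sketch in R that recomputes the Cook statistic D_i = r_i^2 h_i / (p (1 - h_i)) by hand and compares it with R's cooks.distance(). It assumes the savings data are R's built-in LifeCycleSavings data frame, with columns sr, pop15, pop75, dpi and ddpi standing in for sav, p15, p75, inc and gro. That mapping is an assumption, not something stated on this page, although the full-data estimates agree closely with the output shown below.

```r
# Assumption: LifeCycleSavings (built into R) stands in for the savings data,
# with sr, pop15, pop75, dpi, ddpi in the roles of sav, p15, p75, inc, gro.
fit  <- lm(sr ~ pop15 + pop75 + dpi + ddpi, data = LifeCycleSavings)
p    <- length(coef(fit))   # number of parameters, including the intercept
lev  <- hatvalues(fit)      # leverages h_i
stud <- rstandard(fit)      # internally studentized residuals r_i

# Cook statistic: D_i = r_i^2 * h_i / (p * (1 - h_i))
cook <- stud^2 * lev / (p * (1 - lev))

# The hand computation should match the built-in function
print(all.equal(cook, cooks.distance(fit), check.attributes = FALSE))

# The three cases with the largest Cook statistics
print(round(head(sort(cook, decreasing = TRUE), 3), 3))
```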
Which ones are large? We now exclude the case whose Cook statistic exceeds 0.2 and see how the fit changes:
    > gl <- lm(sav ~ p15+p75+inc+gro,subset=(cook < 0.2))
    > summary(gl)
    Coefficients:
                   Value Std. Error  t value Pr(>|t|) 
    (Intercept)  24.5247   8.2240     2.9821   0.0047
            p15  -0.3915   0.1579    -2.4790   0.0171
            p75  -1.2810   1.1452    -1.1186   0.2694
            inc  -0.0003   0.0009    -0.3430   0.7332
            gro   0.6103   0.2688     2.2706   0.0281
    
    Residual standard error: 3.795 on 44 degrees of freedom
    Multiple R-Squared: 0.3554 
    F-statistic: 6.066 on 4 and 44 degrees of freedom, the p-value is 0.0005616
Compare this with the fit to the full data:
    > summary(g)
    Coefficients:
                   Value Std. Error  t value Pr(>|t|) 
    (Intercept)  28.5666   7.3545     3.8842   0.0003
            p15  -0.4612   0.1446    -3.1886   0.0026
            p75  -1.6916   1.0836    -1.5611   0.1255
            inc  -0.0003   0.0009    -0.3617   0.7193
            gro   0.4097   0.1962     2.0882   0.0425
    
    Residual standard error: 3.803 on 45 degrees of freedom
    Multiple R-Squared: 0.3385 
    F-statistic: 5.756 on 4 and 45 degrees of freedom, the p-value is 0.0007902
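The two fits above can also be put side by side, with the percentage change in each estimate. This is a minimal sketch that again assumes R's built-in LifeCycleSavings data stand in for the savings data (sr, pop15, pop75, dpi, ddpi in the roles of sav, p15, p75, inc, gro); the names f, fit2, dropped and cmp are illustrative only.

```r
# Assumption: LifeCycleSavings (built into R) stands in for the savings data,
# with sr, pop15, pop75, dpi, ddpi in the roles of sav, p15, p75, inc, gro.
f    <- sr ~ pop15 + pop75 + dpi + ddpi
fit  <- lm(f, data = LifeCycleSavings)
cook <- cooks.distance(fit)

# Which case does the subset cook < 0.2 drop?
dropped <- names(cook)[cook >= 0.2]
print(dropped)

# Refit on the remaining rows (explicit row indexing avoids scoping
# surprises with lm's subset argument when called inside functions)
fit2 <- lm(f, data = LifeCycleSavings[cook < 0.2, ])

# Estimates side by side, with the percentage change for each
cmp <- cbind(full = coef(fit), reduced = coef(fit2))
cmp <- cbind(cmp, pct.change = 100 * (cmp[, "reduced"] - cmp[, "full"]) / cmp[, "full"])
print(round(cmp, 4))
```

Subsetting the data frame directly, rather than passing subset = to lm(), is a design choice here: it behaves the same at top level and inside a function.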
What changed? Refitting the model once for each country would be rather tedious, but there is a quicker way:
    > ginf <- lm.influence(g)
    > dotchart(ginf$coefficients[,2],colabs,main="p15")
We just plotted the change in the second parameter estimate (the p15 coefficient) when each case is left out in turn. Try this for the other estimates: which countries stick out? Consider Japan:
    > gj <- lm(sav ~ p15+p75+inc+gro,subset=(colabs != "Japan"))
    > summary(gj)
    Coefficients:
                   Value Std. Error  t value Pr(>|t|) 
    (Intercept)  23.9408   7.7840     3.0756   0.0036
            p15  -0.3679   0.1536    -2.3948   0.0210
            p75  -0.9738   1.1554    -0.8428   0.4039
            inc  -0.0005   0.0009    -0.5119   0.6113
            gro   0.3348   0.1984     1.6869   0.0987
    
    Residual standard error: 3.738 on 44 degrees of freedom
    Multiple R-Squared: 0.277 
    F-statistic: 4.214 on 4 and 44 degrees of freedom, the p-value is 0.005648
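Refitting without Japan by hand is one instance of the tedious route that lm.influence() avoids. A minimal sketch of doing it for every country, and checking the result against lm.influence(), follows. It assumes R's built-in LifeCycleSavings data stand in for the savings data (sr, pop15, pop75, dpi, ddpi in the roles of sav, p15, p75, inc, gro). Because this sketch does not rely on the sign convention lm.influence() uses for its coefficient changes, it compares magnitudes only.

```r
# Assumption: LifeCycleSavings (built into R) stands in for the savings data,
# with sr, pop15, pop75, dpi, ddpi in the roles of sav, p15, p75, inc, gro.
f   <- sr ~ pop15 + pop75 + dpi + ddpi
fit <- lm(f, data = LifeCycleSavings)
n   <- nrow(LifeCycleSavings)

# The tedious way: refit once per case, dropping that case each time,
# and record (full-data estimate) - (leave-one-out estimate)
delta <- t(sapply(seq_len(n), function(i) {
  coef(fit) - coef(lm(f, data = LifeCycleSavings[-i, ]))
}))
rownames(delta) <- rownames(LifeCycleSavings)

# The quick way: lm.influence() returns all n leave-one-out changes at once
infl <- lm.influence(fit)$coefficients

# Compare magnitudes only, so the check does not depend on the sign convention
print(all.equal(abs(delta), abs(infl), check.attributes = FALSE))

# Country with the largest absolute change in each coefficient
print(apply(abs(delta), 2, function(d) names(d)[which.max(d)]))
```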
Compare the fit without Japan to the full-data fit: what qualitative changes do you observe? Finally, what happens to the residual standard error when cases are removed? We plot the leave-one-out estimates:
    > dotchart(ginf$sigma,colabs)
Could we have known which countries would stick out on this plot in advance?
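One way to approach this question: for a linear model, the residual standard error with case i removed has a closed form, (n - p - 1) s_(i)^2 = (n - p) s^2 - e_i^2 / (1 - h_i), where e_i is the ordinary residual and h_i the leverage. Removing a case therefore lowers the estimate most when e_i^2 / (1 - h_i) is large. The sketch below checks the identity against lm.influence()$sigma and against an explicit refit without Japan. As before, it assumes R's built-in LifeCycleSavings data stand in for the savings data (sr, pop15, pop75, dpi, ddpi in the roles of sav, p15, p75, inc, gro).

```r
# Assumption: LifeCycleSavings (built into R) stands in for the savings data,
# with sr, pop15, pop75, dpi, ddpi in the roles of sav, p15, p75, inc, gro.
f   <- sr ~ pop15 + pop75 + dpi + ddpi
fit <- lm(f, data = LifeCycleSavings)
n   <- nrow(LifeCycleSavings)
p   <- length(coef(fit))
e   <- residuals(fit)        # ordinary residuals e_i
h   <- hatvalues(fit)        # leverages h_i
s   <- summary(fit)$sigma    # full-data residual standard error

# Closed form: (n - p - 1) s_(i)^2 = (n - p) s^2 - e_i^2 / (1 - h_i)
s.loo <- sqrt(((n - p) * s^2 - e^2 / (1 - h)) / (n - p - 1))

# Should agree with the leave-one-out values from lm.influence()
print(all.equal(s.loo, lm.influence(fit)$sigma, check.attributes = FALSE))

# ... and with an explicit refit that drops Japan
keep    <- rownames(LifeCycleSavings) != "Japan"
s.japan <- summary(lm(f, data = LifeCycleSavings[keep, ]))$sigma
print(all.equal(unname(s.loo["Japan"]), s.japan))

# The smallest s_(i) belongs to the case with the largest e_i^2 / (1 - h_i)
print(names(which.min(s.loo)) == names(which.max(e^2 / (1 - h))))
```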