Residual Plots

We use the savings dataset as an example again:
    > g <- lm(sav ~ p15 + p75 + inc + gro)
First the residuals vs. fitted plot and the abs(residuals) vs. fitted plot.
    > motif()
    > plot(g$fit,g$res)
    > abline(h=0)
    > plot(g$fit,abs(g$res))
What do you see? A quick way to check non-constant variance is this regression:
    > summary(lm(abs(g$res) ~ g$fit))
    Coefficients:
                  Value Std. Error t value Pr(>|t|) 
    (Intercept)  4.8397  1.1865     4.0790  0.0002 
          g$fit -0.2035  0.1185    -1.7166  0.0925 

    Residual standard error: 2.163 on 48 degrees of freedom
    Multiple R-Squared: 0.05784 
    F-statistic: 2.947 on 1 and 48 degrees of freedom, the p-value is 0.09251
This test is not quite right as some weighting should be used and the degrees of freedom should be adjusted but there doesn't seem to be a clear problem with non-constant variance.

 It's often had to judge residual plots without prior experience so let's generate some of the artificial variety. The following four panels show

  1. Constant Variance
  2. Strong non-constant variance
  3. Mild non-constant variance
  4. Non-linearity
  5. > par(mfrow=c(3,3))
    > for(i in 1:9) plot(1:50,rnorm(50))
    > for(i in 1:9) plot(1:50,(1:50)*rnorm(50))
    > for(i in 1:9) plot(1:50,sqrt((1:50))*rnorm(50)) 
    > for(i in 1:9) plot(1:50,cos((1:50)*pi/25)+rnorm(50))
    > par(mfrow=c(1,1))

In this case we know the truth - do you think you would be able to come to the right conclusions for real data? Repeat to get an idea of the usual amount of variation.

 Now look at the residuals against predictor plots:

    > plot(p15,g$res)
Can you see the two groups? Let's compare and test the variances:
    > var(g$res[p15 > 35])/var(g$res[p15 < 35]) 
    [1] 2.785184
    > length(g$res[p15 > 35])
    [1] 23
    > 1-pf(2.785184,22,26)
    [1] 0.006786117
A significant difference is seen:

 Now try the partial regression (added variable) plot for p15:

    > d <- lm(sav ~ p75 + inc + gro)$res
    > m <- lm(p15 ~ p75 + inc + gro)$res
    > plot(m,d)
Compare the slope on the plot to the original regression and show the line on the plot:
    > lm(d ~ m)$coef
       (Intercept)         m 
     -1.231709e-16 -0.461205
    > g$coef
     (Intercept)       p15       p75           inc       gro 
        28.56661 -0.461205 -1.691576 -0.0003367615 0.4096998
    > abline(0,g$coef['p15'])
A partial residual plot is easier to do:
    > plot(p15,g$res+g$coef['p15']*p15)
    > abline(0,g$coef['p15'])
Might there be a different relationship in the two groups:
    > g1 <- lm(sav ~ p15 + p75 + inc + gro,subset=(p15 > 35))
    > g2 <- lm(sav ~ p15 + p75 + inc + gro,subset=(p15 < 35))
    > summary(g1)
    Coefficients:
                   Value Std. Error  t value Pr(>|t|) 
    (Intercept)  -2.4340  21.1550    -0.1151   0.9097
            p15   0.2739   0.4392     0.6235   0.5408
            p75  -3.5485   3.0333    -1.1698   0.2573
            inc   0.0004   0.0050     0.0842   0.9339
            gro   0.3955   0.2901     1.3632   0.1896
    
    Residual standard error: 4.454 on 18 degrees of freedom
    Multiple R-Squared: 0.1558 
    F-statistic: 0.8302 on 4 and 18 degrees of freedom, the p-value is 0.5233 
    > summary(g2)
    Coefficients:
                   Value Std. Error  t value Pr(>|t|) 
    (Intercept)  23.9638   8.0836     2.9645   0.0072
            p15  -0.3860   0.1954    -1.9755   0.0609
            p75  -1.3279   0.9260    -1.4339   0.1657
            inc  -0.0005   0.0007    -0.6339   0.5327
            gro   0.8844   0.2953     2.9945   0.0067
    
    Residual standard error: 2.772 on 22 degrees of freedom
    Multiple R-Squared: 0.5073 
    F-statistic: 5.663 on 4 and 22 degrees of freedom, the p-value is 0.002733
Can you see the difference? The graphical analysis has shown a relationship in the data that a purely numerical analysis might easily have missed.