Testing independence of regression errors with residuals as data

Job Candidate Talk
Tuesday, December 8, 2009 - 2:00pm for 1 hour (actually 50 minutes)
Skiles 269
Xia Hua – Massachusetts Institute of Technology
Christian Houdré
In a regression model, say Y_i=f(X_i)+\epsilon_i, where (X_i,Y_i) are observed and f is an unknown regression function, the errors \epsilon_i may satisfy what we call the ``weak'' assumption that they are orthogonal with mean 0 and the same variance, and often the further ``strong'' assumption that they are i.i.d. N(0,\sigma^2) for some \sigma\geq 0. In this talk, I will focus on the polynomial regression model, namely f(x) = \sum_{i=0}^n a_i x^i for unknown parameters a_i, under the strong assumption on the errors. When the a_i are estimated via least squares (equivalent to maximum likelihood) by \hat a_i, we get the {\it residuals} \hat\epsilon_j := Y_j-\sum_{i=0}^n\hat a_iX_j^i. We would like to test the hypothesis that the nth order polynomial regression holds with \epsilon_j i.i.d. N(0,\sigma^2), while the alternative can simply be the negation or be more specific, e.g., polynomial regression with order higher than n. I will talk about two possible tests: first the rather well known turning point test, and second a possibly new ``convexity point test.'' Here the errors \epsilon_j are unobserved, but for a large enough number of observations, if the model holds, \hat a_i will be close enough to the true a_i that \hat\epsilon_j will have approximately the properties of \epsilon_j. The turning point test would be applicable either by this approximation or in case one can evaluate the distribution of the turning point statistic for residuals. The ``convexity point test,'' for which the test statistic is actually the same whether applied to the errors \epsilon_j or the residuals \hat\epsilon_j, avoids the approximation required in applying the turning point test to residuals. On the other hand, the null distribution of the convexity point statistic depends on the assumption of i.i.d. normal (not only continuously distributed) errors.
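
As an illustration of the turning point test mentioned in the abstract, here is a minimal Python sketch. It is not taken from the talk. The function names are hypothetical, and it uses the standard textbook result that for m i.i.d. continuously distributed observations the number of turning points T has mean 2(m-2)/3 and variance (16m-29)/90, and is approximately normal for large m. Applying it to residuals rather than to the errors themselves relies on the approximation described above.

```python
import math


def turning_points(values):
    """Count interior points that are strict local maxima or minima."""
    count = 0
    for prev, cur, nxt in zip(values, values[1:], values[2:]):
        if (cur > prev and cur > nxt) or (cur < prev and cur < nxt):
            count += 1
    return count


def turning_point_test(values):
    """Return (T, z, p) for the turning point test.

    Assumes the null hypothesis of i.i.d. continuous observations and
    uses the large-sample normal approximation for T; p is two-sided.
    """
    m = len(values)
    if m < 3:
        raise ValueError("need at least 3 observations")
    t = turning_points(values)
    mean = 2.0 * (m - 2) / 3.0          # E[T] under the null
    var = (16.0 * m - 29.0) / 90.0      # Var[T] under the null
    z = (t - mean) / math.sqrt(var)
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p-value
    return t, z, p


# Hypothetical usage: in practice `residuals` would hold the fitted
# residuals \hat\epsilon_j in the order of the observations.
residuals = [0.3, -0.1, 0.4, -0.5, 0.2, 0.1, -0.3, 0.6, -0.2, 0.0]
t, z, p = turning_point_test(residuals)
```

A smooth trend left in the residuals, for example from fitting too low a polynomial order, tends to produce too few turning points and hence a strongly negative z.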