Global Testing under Sparse Alternatives: ANOVA, Multiple Comparisons and the Higher Criticism

Stochastics Seminar
Thursday, January 27, 2011 - 3:05pm
1 hour (actually 50 minutes)
Skiles 005
University of California, San Diego
We study the problem of testing for the significance of a subset of regression coefficients in a linear model under the assumption that the coefficient vector is sparse, a common situation in modern high-dimensional settings. Assume there are p variables and let S be the number of nonzero coefficients. Under moderate sparsity levels, when we may have S > p^(1/2), we show that the analysis of variance F-test is essentially optimal. This is no longer the case under the sparsity constraint S < p^(1/2). In such settings, a multiple comparison procedure is often preferred and we establish its optimality under the stronger assumption S < p^(1/4). However, these two very popular methods are suboptimal, and sometimes powerless, when p^(1/4) < S < p^(1/2). We suggest a method based on the Higher Criticism that is essentially optimal in the whole range S < p^(1/2). We establish these results under a variety of designs, including the classical (balanced) multi-way designs and more modern "p > n" designs arising in genetics and signal processing. (Joint work with Emmanuel Candès and Yaniv Plan.)
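For readers unfamiliar with the Higher Criticism, the following is a minimal sketch of the statistic in its commonly stated form: sort the p-values, then maximize the standardized gap between the empirical fraction i/n and the i-th smallest p-value. It is an illustration only and is not taken from the talk. The function names `hc_from_pvalues` and `higher_criticism`, the `alpha0` cutoff, and the use of two-sided p-values from standard normal z-scores are all assumptions made here for the example; the abstract does not say which statistics the method is applied to in the regression setting.

```python
import numpy as np
from math import erfc, sqrt


def hc_from_pvalues(pvals, alpha0=0.5):
    """Higher Criticism statistic from a collection of p-values.

    Computes max over 1 <= i <= alpha0 * n of
        sqrt(n) * (i/n - p_(i)) / sqrt(p_(i) * (1 - p_(i))),
    where p_(1) <= ... <= p_(n) are the sorted p-values.
    The cutoff alpha0 is an illustrative choice, not a value from the talk.
    """
    p = np.sort(np.asarray(pvals, dtype=float))
    n = p.size
    # Keep p-values strictly inside (0, 1) so the denominator is never zero.
    p = np.clip(p, 1e-300, 1.0 - 1e-12)
    i = np.arange(1, n + 1)
    hc = np.sqrt(n) * (i / n - p) / np.sqrt(p * (1.0 - p))
    k = max(1, int(alpha0 * n))  # only the k smallest p-values are scanned
    return float(hc[:k].max())


def higher_criticism(z, alpha0=0.5):
    """Higher Criticism from z-scores assumed N(0, 1) under the null.

    Two-sided p-value: P(|Z| > |z|) = erfc(|z| / sqrt(2)).
    """
    pvals = [erfc(abs(float(v)) / sqrt(2.0)) for v in z]
    return hc_from_pvalues(pvals, alpha0)
```

A large value of the statistic suggests that more small p-values are present than a global null would produce. Under these assumptions, one would call `higher_criticism(z)` on a vector of test statistics and compare the result against a threshold calibrated under the null, for example by simulation.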