Non-Asymptotic Bounds for Prediction Problems and Density Estimation

Dissertation Defense
Tuesday, May 1, 2012 - 3:00pm for 2 hours
Skiles 005
Stanislav Minsker – School of Mathematics, Georgia Tech
Stanislav Minsker
This dissertation investigates the statistical learning scenarios where a high-dimensional parameter has to be estimated from a given sample of fixed size, often smaller than the dimension of the problem. The first part answers some open questions for the binary classification problem in the framework of active learning. Given a random couple (X,Y)\in R^d\times {\pm 1} with unknown distribution P, the goal of binary classification is to predict a label Y based on the observation X. The prediction rule is constructed based on the observations (X_i,Y_i)_{i=1}^n sampled from P. The concept of active learning can be informally characterized as follows: on every iteration, the algorithm is allowed to request a label Y for any instance X which it considers to be the most informative. The contribution of this work consists of two parts: first, we provide the minimax lower bounds for performance of the active learning methods under certain assumptions. Second, we propose an active learning algorithm which attains nearly optimal rates over a broad class of underlying distributions and is adaptive with respect to the unknown parameters of the problem. The second part of this work is related to sparse recovery in the framework of dictionary learning. Let (X,Y) be a random couple with unknown distribution P, with X taking its values in some metric space S and Y - in a bounded subset of R. Given a collection of functions H={h_t}_{t\in \mb T} mapping S to R, the goal of dictionary learning is to construct a prediction rule for Y given by a linear (or convex) combination of the elements of H. The problem is sparse if there exists a good prediction rule that depends on a small number of functions from H. We propose an estimator of the unknown optimal prediction rule based on penalized empirical risk minimization algorithm. We show that proposed estimator is able to take advantage of the possible sparse structure of the problem by providing probabilistic bounds for its performance. Finally, we provide similar bounds in the density estimation framework.