Plug-in Approach to Active Learning

Stochastics Seminar
Thursday, March 3, 2011 - 3:05pm
1 hour (actually 50 minutes)
Skiles 005
Georgia Tech
 Let (X,Y) be a random couple with unknown distribution P, X being an observation and Y - a binary label to be predicted. In practice, distribution P remains unknown but the learning algorithm has access to the training data - the sample from P. It often happens that the cost of obtaining the training data is associated with labeling the observations while the pool of observations itself is almost unlimited. This suggests to measure the performance of a learning algorithm in terms of its label complexity, the number of labels required to obtain a classifier with the desired accuracy. Active Learning theory explores the possible advantages of this modified framework.We will present a new active learning algorithm based on nonparametric estimators of the regression function and explain main improvements over the previous work.Our investigation provides upper and lower bounds for the performance of proposed method over a broad class of underlying distributions.