Oracle Complexity of Convex Optimization: Distributional and non-Euclidean Lower Bounds

ACO Student Seminar
Friday, November 22, 2013 - 1:05pm
1 hour (actually 50 minutes)
Skiles 005
ISyE, Georgia Tech
First-order (a.k.a. subgradient) methods in convex optimization are a popular choice when facing extremely large-scale problems, where medium-accuracy solutions suffice. The limits of performance of first-order methods can be partially understood through the lens of black-box oracle complexity. In this talk I will present some of the limitations of worst-case black-box oracle complexity, and I will show two recent extensions of the theory.

First, we extend the notion of oracle complexity to the distributional setting, where complexity is measured as the worst average running time of (deterministic) algorithms against a distribution of instances. In this model, the distribution of instances is part of the input to the algorithm, and thus algorithms can potentially exploit it to accelerate their running time. However, we will show that for nonsmooth convex optimization, distributional lower bounds coincide with worst-case complexity up to a constant factor, and thus all notions of complexity collapse; we can further extend these lower bounds to show that the running time is large with high probability (this is joint work with Sebastian Pokutta and Gabor Braun).

Second, we extend the worst-case lower bounds for smooth convex optimization to non-Euclidean settings. Our construction mimics the classical proof for the nonsmooth case (based on piecewise-linear functions), but with a local smoothing of the instances. We establish a general lower bound for a wide class of finite-dimensional Banach spaces, and then apply the results to \ell^p spaces, for p\in[2,\infty]. A further reduction will allow us to extend the lower bounds to p\in[1,2).
As a consequence, we prove the near-optimality of the Frank-Wolfe algorithm for the box and the spectral norm ball, and we establish near-optimality for function classes that contain the standard convex relaxation for the sparse recovery problem (this is joint work with Arkadi Nemirovski).
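
For readers unfamiliar with the Frank-Wolfe algorithm mentioned above, the following is a minimal sketch of the method over a box, not the construction used in the talk. The objective (a simple quadratic), the vector c, the step size 2/(k+2), and the function name frank_wolfe_box are all assumptions chosen for illustration. Each iteration makes one call to a first-order oracle (the gradient) and one linear minimization over the box, which is solved coordinate-wise by picking a vertex.

```python
def frank_wolfe_box(grad, x0, radius=1.0, iters=200):
    """Frank-Wolfe over the box [-radius, radius]^n (illustrative sketch).

    grad: callable returning the gradient of a smooth convex function.
    x0:   starting point, a list of floats inside the box.
    """
    x = list(x0)
    for k in range(iters):
        g = grad(x)  # one first-order oracle call
        # Linear minimization over the box: a vertex minimizing <g, s>.
        s = [-radius if gi > 0 else radius for gi in g]
        gamma = 2.0 / (k + 2)  # standard open-loop step size (assumed here)
        # Convex combination keeps the iterate inside the box.
        x = [(1 - gamma) * xi + gamma * si for xi, si in zip(x, s)]
    return x


# Hypothetical test instance: f(x) = 0.5 * ||x - c||^2 with c partly outside the box.
c = [0.5, -0.25, 2.0]


def f(x):
    return 0.5 * sum((xi - ci) ** 2 for xi, ci in zip(x, c))


def grad_f(x):
    return [xi - ci for xi, ci in zip(x, c)]


x_hat = frank_wolfe_box(grad_f, [0.0, 0.0, 0.0])
```

On this instance the minimizer over the box is the coordinate-wise clipping of c, namely (0.5, -0.25, 1), with optimal value 0.5, so f(x_hat) - 0.5 measures the accuracy reached after the given number of oracle calls.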