Approximation algorithms for optimal design problems

ACO Student Seminar
Friday, April 6, 2018 - 1:05pm for 1 hour (actually 50 minutes)
Skiles 005
Uthaipon (Tao) Tantipongpipat – Georgia Tech – tao@gatech.edu
He Guo
We study the $A$-optimal design problem where we are given vectors $v_1,\ldots, v_n\in \R^d$, an integer $k\geq d$, and the goal is to select a set $S$ of $k$ vectors that minimizes the trace of $\left(\sum_{i\in S} v_i v_i^{\top}\right)^{-1}$. Traditionally, the problem is an instance of optimal design of experiments in statistics (\cite{pukelsheim2006optimal}) where each vector corresponds to a linear measurement of an unknown vector and the goal is to pick $k$ of them that minimize the average variance of the error in the maximum likelihood estimate of the vector being measured. The problem also finds applications in sensor placement in wireless networks~(\cite{joshi2009sensor}), sparse least squares regression~(\cite{BoutsidisDM11}), feature selection for $k$-means clustering~(\cite{boutsidis2013deterministic}), and matrix approximation~(\cite{de2007subset,de2011note,avron2013faster}). In this paper, we introduce \emph{proportional volume sampling} to obtain improved approximation algorithms for $A$-optimal design.Given a matrix, proportional volume sampling involves picking a set of columns $S$ of size $k$ with probability proportional to $\mu(S)$ times $\det(\sum_{i \in S}v_i v_i^\top)$ for some measure $\mu$. Our main result is to show the approximability of the $A$-optimal design problem can be reduced to \emph{approximate} independence properties of the measure $\mu$. We appeal to hard-core distributions as candidate distributions $\mu$ that allow us to obtain improved approximation algorithms for the $A$-optimal design. Our results include a $d$-approximation when $k=d$, an $(1+\epsilon)$-approximation when $k=\Omega\left(\frac{d}{\epsilon}+\frac{1}{\epsilon^2}\log\frac{1}{\epsilon}\right)$ and $\frac{k}{k-d+1}$-approximation when repetitions of vectors are allowed in the solution. We also consider generalization of the problem for $k\leq d$ and obtain a $k$-approximation. The last result also implies a restricted invertibility principle for the harmonic mean of singular values.We also show that the $A$-optimal design problem is$\NP$-hard to approximate within a fixed constant when $k=d$.