- Series
- ACO Student Seminar
- Time
- Friday, October 26, 2018 - 1:05pm for 1 hour (actually 50 minutes)
- Location
- Skiles 005
- Speaker
- Yu Cheng – CS, Duke University – chycharlie@gmail.com – https://users.cs.duke.edu/~yucheng/
- Organizer
- He Guo
We study the fundamental problem of high-dimensional mean
estimation in a robust model where a constant fraction of the samples
are adversarially corrupted. Recent work gave the first polynomial time
algorithms for this problem with dimension-independent
error guarantees for several families of structured distributions.
In this work, we give the first nearly-linear time algorithms for
high-dimensional robust mean estimation. Specifically, we focus on
distributions with (i) known covariance and sub-gaussian tails, and (ii)
unknown bounded covariance. Given $N$ samples
on $R^d$, an $\epsilon$-fraction of which may be arbitrarily corrupted,
our algorithms run in time $\tilde{O}(Nd)/poly(\epsilon)$ and
approximate the true mean within the information-theoretically optimal
error, up to constant factors. Previous robust algorithms
with comparable error guarantees have running times $\tilde{\Omega}(N
d^2)$.
Our algorithms rely on a natural family of SDPs parameterized by
our current guess $\nu$ for the unknown mean $\mu^\star$. We give a
win-win analysis establishing the following: either a near-optimal
solution to the primal SDP yields a good candidate for
$\mu^\star$ -- independent of our current guess $\nu$ -- or the dual SDP
yields a new guess $\nu'$ whose distance from $\mu^\star$ is smaller by
a constant factor. We exploit the special structure of the
corresponding SDPs to show that they are approximately
solvable in nearly-linear time. Our approach is quite general, and we
believe it can also be applied to obtain nearly-linear time algorithms
for other high-dimensional robust learning problems.
This is a joint work with Ilias Diakonikolas and Rong Ge.