Job Candidate Talk
Thursday, January 19, 2017 - 2:00pm
1 hour (actually 50 minutes)
Johns Hopkins University
High-dimensional data arise in many fields of contemporary science and introduce new challenges in statistical learning due to the well-known curse of dimensionality. Many data sets in image analysis and signal processing are in a high-dimensional space but exhibit a low-dimensional structure. We are interested in building efficient representations of these data for the purpose of compression and inference, and giving performance guarantees that are only cursed by the intrinsic dimension of data. Specifically, in the setting where a data set in $R^D$ consists of samples from a probability measure concentrated on or near an unknown $d$-dimensional manifold with $d$ much smaller than $D$, we consider two sets of problems: low-dimensional geometric approximation to the manifold and regression of a function on the manifold. In the first case we construct multiscale low-dimensional empirical approximations to the manifold and give finite-sample performance guarantees. In the second case we exploit these empirical geometric approximations of the manifold to construct multiscale approximations to the function. We prove finite-sample guarantees showing that we attain the same learning rates as if the function was defined on a Euclidean domain of dimension $d$. In both cases our approximations can adapt to the regularity of the manifold or the function even when this varies at different scales or locations. All algorithms have complexity $C n\log (n)$ where $n$ is the number of samples, and the constant $C$ is linear in $D$ and exponential in $d$.