Seminars and Colloquia by Series

Friday, October 19, 2018 - 13:05, Location: Skiles 005, CS, Georgia Tech, Organizer: He Guo
We investigate whether standard dimensionality reduction techniques inadvertently produce data representations with different fidelity for two different populations. We show that on several real-world datasets, PCA has higher reconstruction error on population A than on population B (for example, women versus men, or lower- versus higher-educated individuals). This can happen even when the dataset has a similar number of samples from A and B. This motivates our study of dimensionality reduction techniques that maintain similar fidelity for A and B. We give an efficient algorithm for finding a projection which is nearly optimal with respect to this measure, and evaluate it on several datasets. This is joint work with Uthaipon Tantipongpipat, Jamie Morgenstern, Mohit Singh, and Santosh Vempala.
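The fidelity gap described above can be measured directly: fit PCA on the pooled data, then compare the average reconstruction error on each subpopulation separately. A minimal sketch, using synthetic data and a hypothetical `pca_group_errors` helper (not from the talk):

```python
import numpy as np

def pca_group_errors(X_a, X_b, n_components):
    """Fit rank-`n_components` PCA on the pooled data and return the mean
    squared reconstruction error for groups A and B separately."""
    X = np.vstack([X_a, X_b])
    mean = X.mean(axis=0)
    # Principal directions are the top right singular vectors of the centered data.
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    P = Vt[:n_components].T @ Vt[:n_components]  # projector onto the PCA subspace

    def err(Y):
        R = (Y - mean) - (Y - mean) @ P  # residual after projection
        return float((R ** 2).sum(axis=1).mean())

    return err(X_a), err(X_b)

rng = np.random.default_rng(0)
# Synthetic groups with equal sample sizes but different covariance structure.
X_a = rng.normal(size=(500, 10)) * np.linspace(1.0, 0.1, 10)
X_b = rng.normal(size=(500, 10)) * np.linspace(0.1, 1.0, 10)
err_a, err_b = pca_group_errors(X_a, X_b, n_components=3)
```

On real data, the gap between `err_a` and `err_b` is exactly the disparity the abstract refers to, even when the two groups contribute similar numbers of samples.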
Friday, October 12, 2018 - 13:05, Location: Skiles 005, ISyE, Georgia Tech, Organizer: He Guo
Friday, September 28, 2018 - 13:15, Location: Skiles 005, Math, Georgia Tech, Organizer: He Guo
Given a finite partially ordered set (poset) of width $w$, Dilworth's theorem guarantees the existence of a chain partition of size $w$, and that no smaller chain partition exists. First-Fit is an online algorithm for creating a chain partition of a poset. Given a linear ordering of the points of the poset, $v_1, \ldots, v_n$, First-Fit assigns the point $v_i$ to the first available chain in the chain partition of the points $v_1, \ldots, v_{i-1}$. It is known that First-Fit has unbounded performance over the family of finite posets of width 2. We give a complete characterization of the family of finite posets on which First-Fit performs with subexponential efficiency in terms of width. We will also review what is currently known about the family of posets on which First-Fit performs with polynomial efficiency in terms of width. Joint work with Kevin Milans.
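The First-Fit rule described above is short enough to sketch directly. Below, `leq` is an assumed comparability oracle for the poset; a chain is a set of pairwise comparable points, and each new point goes to the first chain it fits in:

```python
def first_fit_chains(points, leq):
    """Assign each point, in the given order, to the first existing chain
    all of whose elements are comparable to it; open a new chain otherwise."""
    comparable = lambda x, y: leq(x, y) or leq(y, x)
    chains = []
    for v in points:
        for chain in chains:
            if all(comparable(v, u) for u in chain):
                chain.append(v)
                break
        else:
            chains.append([v])  # no existing chain fits: start a new one
    return chains

# Example: the divisibility order on {1, ..., 10}.
divides = lambda x, y: y % x == 0
partition = first_fit_chains(range(1, 11), divides)
```

On this input First-Fit happens to use 5 chains, matching the width (the antichain $\{6,7,8,9,10\}$); in general, and already on width-2 posets, an adversarial presentation order can force it to use unboundedly many.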
Friday, September 21, 2018 - 13:05, Location: Skiles 005, CS, Georgia Tech, Organizer: He Guo
As a generalization of many classic problems in combinatorial optimization, submodular optimization has found a wide range of applications in machine learning (e.g., in feature engineering and active learning).  For many large-scale optimization problems, we are often concerned with the adaptivity complexity of an algorithm, which quantifies the number of sequential rounds where polynomially many independent function evaluations can be executed in parallel.  While low adaptivity is ideal, it is not sufficient for a (distributed) algorithm to be efficient, since in many practical applications of submodular optimization the number of function evaluations required becomes prohibitively large.  Motivated by such applications, we study the adaptivity and query complexity of adaptive submodular optimization. Our main result is a distributed algorithm for maximizing a monotone submodular function subject to a cardinality constraint $k$ that achieves a $(1-1/e-\varepsilon)$-approximation in expectation.  Furthermore, this algorithm runs in $O(\log(n))$ adaptive rounds and makes $O(n)$ calls to the function evaluation oracle in expectation.  All three of these guarantees are optimal, and the query complexity is substantially less than in previous works.  Finally, to show the generality of our simple algorithm and techniques, we extend our results to the submodular cover problem. Joint work with Vahab Mirrokni and Morteza Zadimoghaddam (arXiv:1807.07889).
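To make "adaptive round" concrete, here is a heavily simplified, hypothetical threshold-greedy sketch, not the algorithm from the talk: in each round, the marginal gains of all remaining elements are independent oracle queries (so they could run in parallel), and the threshold decreases between rounds.

```python
import random

def threshold_greedy(f, ground_set, k, eps=0.2):
    """f: a set function (assumed monotone submodular), queried on sets.
    Returns a set of at most k elements chosen by decreasing-threshold greedy."""
    S = set()
    tau = max(f({e}) for e in ground_set)  # initial threshold: best singleton
    while len(S) < k and tau > 1e-9:
        # One adaptive round: these oracle calls are mutually independent,
        # so they could all be issued in parallel.
        gains = {e: f(S | {e}) - f(S) for e in ground_set - S}
        candidates = [e for e, g in gains.items() if g >= tau]
        if candidates:
            S.add(random.choice(candidates))
        else:
            tau *= (1 - eps)  # no element clears the bar: lower it next round
    return S

# Toy coverage function (coverage functions are monotone submodular).
cover = {0: {1, 2}, 1: {2, 3}, 2: {4}, 3: {1, 2, 3, 4}}
f = lambda S: len(set().union(*(cover[e] for e in S))) if S else 0
S = threshold_greedy(f, set(cover), k=2)
```

The round structure, a batch of independent queries followed by one commitment step, is what adaptivity complexity counts; the algorithm in the talk achieves $O(\log n)$ such rounds with only $O(n)$ total queries in expectation.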
Friday, September 14, 2018 - 13:05, Location: Skiles 005, CS, Georgia Tech, Organizer: He Guo
We study the dynamic graph connectivity problem in the massively parallel computation model. We give a data structure for maintaining a dynamic undirected graph that handles batches of updates and connectivity queries in constant rounds, as long as the queries fit on a single machine. This assumption corresponds to the gradual buildup of databases over time from sources such as log files and user interactions. Our techniques combine a distributed data structure for Euler Tour (ET) trees, a structural theorem about rapidly contracting graphs via sampling $n^{\epsilon}$ random neighbors, and modifications to sketching-based dynamic connectivity data structures. Joint work with David Durfee, Janardhan Kulkarni, Richard Peng, and Xiaorui Sun.
Friday, April 20, 2018 - 13:05, Location: Skiles 005, CSE, University of Washington, Organizer: He Guo
We study the $A$-optimal design problem, where we are given vectors $v_1,\ldots, v_n\in \mathbb{R}^d$ and an integer $k\geq d$, and the goal is to select a set $S$ of $k$ vectors that minimizes the trace of $\left(\sum_{i\in S} v_i v_i^{\top}\right)^{-1}$. Traditionally, the problem is an instance of optimal design of experiments in statistics (Pukelsheim, 2006), where each vector corresponds to a linear measurement of an unknown vector and the goal is to pick $k$ of them that minimize the average variance of the error in the maximum likelihood estimate of the vector being measured. The problem also finds applications in sensor placement in wireless networks (Joshi et al., 2009), sparse least squares regression (Boutsidis et al., 2011), feature selection for $k$-means clustering (Boutsidis et al., 2013), and matrix approximation (de Hoog et al., 2007, 2011; Avron et al., 2013). In this paper, we introduce \emph{proportional volume sampling} to obtain improved approximation algorithms for $A$-optimal design. Given a matrix, proportional volume sampling picks a set of columns $S$ of size $k$ with probability proportional to $\mu(S)$ times $\det\left(\sum_{i \in S}v_i v_i^\top\right)$ for some measure $\mu$. Our main result shows that the approximability of the $A$-optimal design problem can be reduced to \emph{approximate} independence properties of the measure $\mu$. We appeal to hard-core distributions as candidate distributions $\mu$ that allow us to obtain improved approximation algorithms for $A$-optimal design. Our results include a $d$-approximation when $k=d$, a $(1+\epsilon)$-approximation when $k=\Omega\left(\frac{d}{\epsilon}+\frac{1}{\epsilon^2}\log\frac{1}{\epsilon}\right)$, and a $\frac{k}{k-d+1}$-approximation when repetitions of vectors are allowed in the solution. We also consider a generalization of the problem for $k\leq d$ and obtain a $k$-approximation.
The last result also implies a restricted invertibility principle for the harmonic mean of singular values. We also show that the $A$-optimal design problem is NP-hard to approximate within a fixed constant when $k=d$.
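The sampling distribution above can be illustrated by brute force for tiny instances: enumerate all size-$k$ subsets and weight each $S$ by $\mu(S)\cdot\det\left(\sum_{i\in S} v_i v_i^\top\right)$. This exponential-time sketch is purely for intuition (the `mu` default is a uniform-measure placeholder, not a hard-core distribution from the talk):

```python
import itertools
import numpy as np

def proportional_volume_sample(V, k, mu=lambda S: 1.0, rng=None):
    """V: (n, d) array of row vectors. Draw a size-k index set S with
    probability proportional to mu(S) * det(sum_{i in S} v_i v_i^T)."""
    rng = rng or np.random.default_rng()
    subsets = list(itertools.combinations(range(V.shape[0]), k))
    # V[list(S)].T @ V[list(S)] equals sum_{i in S} v_i v_i^T, a d x d matrix;
    # by Cauchy-Binet its determinant is nonnegative.
    weights = np.array([mu(S) * np.linalg.det(V[list(S)].T @ V[list(S)])
                        for S in subsets])
    probs = weights / weights.sum()
    return subsets[rng.choice(len(subsets), p=probs)]

rng = np.random.default_rng(1)
V = rng.normal(size=(6, 3))  # n = 6 vectors in R^3
S = proportional_volume_sample(V, k=3, rng=rng)
```

The determinant factor favors well-spread subsets; the reduction in the talk shows that choosing $\mu$ with approximate-independence properties (e.g., hard-core distributions) yields the stated approximation guarantees for $A$-optimal design.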
Physical sensors (thermal, light, motion, etc.) are becoming ubiquitous and offer important benefits to society. However, allowing sensors into our private spaces has resulted in considerable privacy concerns. Differential privacy has been developed to help alleviate these concerns. In this talk, we'll develop and define a framework for releasing physical data that both preserves utility and provides privacy. Our notion of closeness of physical data will be defined via the Earth Mover Distance, and we'll discuss the implications of this choice. Physical data, such as temperature distributions, are often only accessible to us via a linear transformation of the data. We'll analyse the implications of our privacy definition for linear inverse problems, focusing on those that are traditionally considered to be "ill-conditioned". We'll then instantiate our framework with the heat kernel on graphs and discuss how the privacy parameter relates to the connectivity of the graph. Our work indicates that it is possible to produce locally private sensor measurements that both keep the exact locations of the heat sources private and permit recovery of the general "geographic vicinity" of the sources. Joint work with Anna C. Gilbert.