Consider a linear combination of independent identically distributed random variables $X_1, \dots, X_n$ with fixed weights $a_1, \dots, a_n$. If the random variables are continuous, the sum is almost surely non-zero. However, for discrete random variables an exact cancellation may occur with positive probability. This probability depends on the arithmetic nature of the sequence $a_1, \dots, a_n$. We will discuss how to measure the relevant arithmetic properties and how to evaluate the probability of exact and approximate cancellation.
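To make the phenomenon concrete, here is a small Monte Carlo sketch (an illustration added here, not part of the talk) that estimates the cancellation probability for Rademacher-distributed $X_i$ under two choices of weights: equal weights, where cancellation occurs with probability on the order of $1/\sqrt{n}$, and dyadic weights, where it never occurs.

```python
# Monte Carlo sketch (illustrative, not from the talk): estimate
# P(a_1 X_1 + ... + a_n X_n = 0) for Rademacher X_i (+1/-1 with equal probability).
import numpy as np

rng = np.random.default_rng(0)

def cancellation_prob(weights, trials=200_000):
    n = len(weights)
    signs = rng.choice([-1, 1], size=(trials, n))       # i.i.d. Rademacher X_i
    sums = signs @ np.asarray(weights, dtype=float)
    return np.mean(np.isclose(sums, 0.0))

n = 10
print(cancellation_prob([1] * n))                   # equal weights: probability decays like 1/sqrt(n)
print(cancellation_prob([2**i for i in range(n)]))  # dyadic weights: the sum is always odd, so never zero
```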
The popular press often attributes the power of quantum computing to its ability to perform several computations simultaneously, but probabilistic machines have long offered a similar capability. This talk will explore the relationship between quantum and randomized computation, how they are similar and how they differ, and why quantum computation can work exponentially faster on some, but far from all, computational problems. We'll talk about some open
problems in quantum complexity that can help shed light on the
similarities and differences between randomness and “quantumness”.
This talk will not assume any previous knowledge of quantum information or quantum computing.
Abstract:
In this talk, I will present a popular distributed method, namely the distributed consensus-based gradient (DCG) method, for solving optimal
learning problems over a network of agents. Such problems arise in many
applications such as finding optimal parameters over
a large dataset distributed among a network of processors or seeking an
optimal policy for coverage control problems in robotic networks. The
focus is to present our recent results, where we study the performance
of DCG when the agents are only allowed to exchange
their quantized values due to their finite communication bandwidth. In
particular, we develop a novel quantization method, which we refer to as
adaptive quantization. The main idea of our approach is to quantize the
nodes' estimates based on the progress of
the algorithm, which helps to eliminate the quantization errors. Under
adaptive quantization, we then derive bounds on the convergence
rates of the proposed method as a function of the bandwidths
and the underlying network topology, for both convex and strongly convex
objective functions. Our results suggest that, under adaptive quantization, the rates of convergence of DCG with and without quantization are the same, up to a factor that captures the number of quantization bits. To the best of the authors' knowledge, these results improve upon all existing results for DCG under quantization.
This is based on a joint work with Siva Theja Maguluri and Justin Romberg.
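For readers unfamiliar with DCG, the sketch below is a simplified, hypothetical illustration of one consensus-plus-gradient iteration with a naive uniform quantizer; it is not the adaptive quantization scheme analyzed in the talk, which instead adapts the quantizer to the algorithm's progress. The mixing matrix, step size, and quadratic objectives are assumptions made for the example.

```python
# Simplified sketch of a distributed consensus-based gradient (DCG) step with a
# naive uniform quantizer; illustrative only, not the adaptive scheme from the talk.
import numpy as np

def uniform_quantize(x, bits, x_range):
    # Map values in [-x_range, x_range] onto 2**bits evenly spaced levels.
    step = 2 * x_range / (2 ** bits - 1)
    return np.clip(np.round(x / step) * step, -x_range, x_range)

def dcg_step(x, W, grads, step_size, bits=8, x_range=10.0):
    # x: (num_agents, dim) local estimates; W: doubly stochastic mixing matrix.
    q = uniform_quantize(x, bits, x_range)   # agents exchange quantized estimates
    mixed = W @ q                            # consensus (weighted averaging) step
    return mixed - step_size * grads(x)      # local gradient step

# Toy usage: 4 agents, each with a quadratic objective f_i(x) = 0.5 * (x - c_i)^2.
c = np.array([[1.0], [2.0], [3.0], [4.0]])
grads = lambda x: x - c                      # stacked local gradients
W = np.full((4, 4), 0.25)                    # complete-graph averaging
x = np.zeros((4, 1))
for _ in range(200):
    x = dcg_step(x, W, grads, step_size=0.1)
print(x.ravel())                             # estimates end up near the average of c_i (= 2.5), up to quantization error
```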
Bio:
Thinh
T. Doan is a TRIAD postdoctoral fellow at Georgia Institute of
Technology, joint between the School of Industrial and Systems
Engineering and the School of Electrical and Computer Engineering (ECE).
He was born in Vietnam, where he received his Bachelor's degree in Automatic Control from Hanoi University of Science and Technology in 2008. He obtained his Master's degree in ECE from the University of Oklahoma in 2013 and his Ph.D. in ECE from the University of Illinois at Urbana-Champaign in 2018. His research interests
lie at the intersection of control theory, optimization, distributed
algorithms, and applied probability, with the main applications in
machine learning, reinforcement learning, power networks, and
multi-agent systems.
We study the fundamental problem of high-dimensional mean
estimation in a robust model where a constant fraction of the samples
are adversarially corrupted. Recent work gave the first polynomial time
algorithms for this problem with dimension-independent
error guarantees for several families of structured distributions.
In this work, we give the first nearly-linear time algorithms for
high-dimensional robust mean estimation. Specifically, we focus on
distributions with (i) known covariance and sub-gaussian tails, and (ii)
unknown bounded covariance. Given $N$ samples in $\mathbb{R}^d$, an $\epsilon$-fraction of which may be arbitrarily corrupted, our algorithms run in time $\tilde{O}(Nd)/\mathrm{poly}(\epsilon)$ and
approximate the true mean within the information-theoretically optimal
error, up to constant factors. Previous robust algorithms
with comparable error guarantees have running times $\tilde{\Omega}(N
d^2)$.
Our algorithms rely on a natural family of SDPs parameterized by
our current guess $\nu$ for the unknown mean $\mu^\star$. We give a
win-win analysis establishing the following: either a near-optimal
solution to the primal SDP yields a good candidate for
$\mu^\star$ -- independent of our current guess $\nu$ -- or the dual SDP
yields a new guess $\nu'$ whose distance from $\mu^\star$ is smaller by
a constant factor. We exploit the special structure of the
corresponding SDPs to show that they are approximately
solvable in nearly-linear time. Our approach is quite general, and we
believe it can also be applied to obtain nearly-linear time algorithms
for other high-dimensional robust learning problems.
This is a joint work with Ilias Diakonikolas and Rong Ge.
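The toy snippet below (an illustration added here, not the SDP-based algorithm from the paper) shows why the corruption model is challenging: an $\epsilon$-fraction of adversarial samples can shift the empirical mean by an amount that grows with $\sqrt{d}$, and even a simple coordinate-wise trimmed mean, while far more stable, still incurs dimension-dependent error rather than the optimal guarantee discussed above.

```python
# Toy illustration of the robust mean estimation setting; not the algorithm from the paper.
import numpy as np
from scipy.stats import trim_mean

rng = np.random.default_rng(1)
N, d, eps = 10_000, 50, 0.1
true_mean = np.zeros(d)

samples = rng.normal(true_mean, 1.0, size=(N, d))         # clean sub-gaussian data
samples[: int(eps * N)] = 100.0                            # adversarial corruptions

empirical = samples.mean(axis=0)
trimmed = trim_mean(samples, proportiontocut=eps, axis=0)  # coordinate-wise trimmed mean

print(np.linalg.norm(empirical - true_mean))   # large: roughly eps * 100 * sqrt(d)
print(np.linalg.norm(trimmed - true_mean))     # much smaller, but still grows with sqrt(d)
```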
We investigate whether standard dimensionality reduction techniques inadvertently produce data representations with different fidelity for two different populations. We show that, on several real-world datasets, PCA has higher reconstruction error on population A than on population B (for example, women versus men, or lower- versus higher-educated individuals). This can happen even when the dataset has similar numbers of samples from A and B. This motivates our study of dimensionality reduction techniques that maintain similar fidelity for A and B. We give an efficient algorithm for finding a projection
which is nearly-optimal with respect to this measure, and evaluate it on
several datasets. This is a joint work with Uthaipon
Tantipongpipat, Jamie Morgenstern, Mohit Singh, and Santosh Vempala.
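As a quick illustration of the phenomenon (synthetic data, not the paper's datasets or algorithm), the snippet below fits vanilla PCA to the union of two equal-sized groups whose variance lies in different directions and compares their average reconstruction errors; the scales and dimensions are arbitrary choices for the example.

```python
# Toy illustration: standard PCA can reconstruct one sub-population much better
# than another, even when the two groups have equal sample sizes.
import numpy as np

rng = np.random.default_rng(2)
d, k, n = 20, 3, 500

# Group A has large variance on the first 3 coordinates, group B has moderately
# large variance on the last 3; PCA's top-k directions chase A's dominant variance.
scale_A = np.ones(d); scale_A[:3] = 5.0
scale_B = np.ones(d); scale_B[-3:] = 2.0
A = rng.normal(size=(n, d)) * scale_A
B = rng.normal(size=(n, d)) * scale_B

X = np.vstack([A, B])
mu = X.mean(axis=0)
_, _, Vt = np.linalg.svd(X - mu, full_matrices=False)
P = Vt[:k].T @ Vt[:k]                       # rank-k projection chosen by vanilla PCA

def avg_reconstruction_error(G):
    Gc = G - mu
    return np.mean(np.linalg.norm(Gc - Gc @ P, axis=1) ** 2)

print(avg_reconstruction_error(A))          # roughly 17 on this synthetic data
print(avg_reconstruction_error(B))          # noticeably larger, roughly 26
```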
At the heart of most algorithms today there is an optimization engine trying to learn online and provide the best decision, e.g., a ranking of objects, at any time, using the partial information observed thus far. It is often difficult to find near-optimal solutions to such problems because of their inherent combinatorial structure, which leads to certain computational bottlenecks. Submodularity is a discrete analogue of convexity and is a key property often exploited in tackling combinatorial optimization
problems.
In the first part of the talk, we will focus on computational
bottlenecks that involve submodular functions: (a) convex function
minimization over submodular base polytopes (e.g., the permutahedron) and
(b) movement along a line inside submodular base polytopes.
We give a conceptually simple and strongly polynomial algorithm, Inc-Fix, for the former, which is useful for computing Bregman projections in first-order projection-based methods like online mirror descent. For the latter, we will bound the number of iterations of the discrete Newton method, which gives a running-time improvement of at least a factor of $n^6$ over the state of the art. This is joint work with Michel Goemans and Patrick Jaillet. In the second part of the talk, we will
consider the dual problem of (a), i.e., minimization of composite convex and submodular objectives. We will resolve Bach's conjecture from 2015 about the running time of a popular variant of Kelley's cutting-plane method for minimizing these composite objectives. This is
joint work with Madeleine Udell and Song Zhou.
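As background for readers unfamiliar with submodular base polytopes, the sketch below implements Edmonds' greedy algorithm for linear optimization over a base polytope, a standard building block underlying methods of this kind; it is not the Inc-Fix algorithm or the discrete Newton method from the talk, and the function names are chosen here for illustration.

```python
# Background sketch: Edmonds' greedy algorithm maximizes a linear function c.x over
# the base polytope B(f) of a submodular function f by sorting coordinates by c.
def greedy_base_polytope_vertex(f, ground_set, c):
    # f: submodular set function taking a frozenset; returns a vertex of B(f) as a dict.
    order = sorted(ground_set, key=lambda e: c[e], reverse=True)
    x, prefix = {}, frozenset()
    prev = f(prefix)
    for e in order:
        prefix = prefix | {e}
        val = f(prefix)
        x[e] = val - prev            # marginal gain along the greedy order
        prev = val
    return x

# Example: f(S) = min(|S|, 2) is submodular; B(f) is the set of points with coordinates
# in [0, 1] summing to 2, and the greedy vertex puts mass on the two largest-c elements.
f = lambda S: min(len(S), 2)
print(greedy_base_polytope_vertex(f, ["a", "b", "c", "d"],
                                  {"a": 3.0, "b": 1.0, "c": 2.0, "d": 0.5}))
```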
Abstract:
Queueing systems are studied in various asymptotic regimes because they are hard to analyze in general. One
popular regime of study is the heavy-traffic regime, when the system is
loaded very close to its capacity. Heavy-traffic behavior of queueing
systems is traditionally studied using fluid and diffusion limits. In
this talk, I will present a recently developed
method called the 'Drift Method', which is much simpler and is based on
studying the drift of certain test functions. In addition to exactly
characterizing the heavy-traffic behavior, the drift method can be used
to obtain lower and upper bounds for all loads.
In this talk, I will present the drift method, and its successful
application in the context of data center networks to resolve a
decade-old conjecture. I will also talk about ongoing work and some open
problems.
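As a quick illustration of the heavy-traffic regime (a simulation added here, not material from the talk), the snippet below simulates a discrete-time single-server queue and shows the time-average queue length blowing up roughly like $1/(1-\rho)$ as the load $\rho$ approaches capacity; the Poisson arrival model and horizon are assumptions made for the example.

```python
# Illustrative simulation: a discrete-time single-server queue
# q(t+1) = max(q(t) + a(t) - 1, 0) with Poisson(rho) arrivals and unit service.
# In heavy traffic the mean queue length grows roughly like 1/(1 - rho),
# which is the regime the drift method characterizes.
import numpy as np

rng = np.random.default_rng(3)

def mean_queue_length(rho, horizon=2_000_000):
    arrivals = rng.poisson(rho, size=horizon)   # mean-rho arrivals per slot
    q, total = 0, 0
    for a in arrivals:
        q = max(q + a - 1, 0)
        total += q
    return total / horizon

for rho in (0.5, 0.9, 0.99):
    print(rho, mean_queue_length(rho))          # grows roughly like 1/(1 - rho) as rho -> 1
```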
Bio:
Siva
Theja Maguluri is an Assistant Professor in the School of Industrial
and Systems Engineering at Georgia
Tech. Before that, he was a Research Staff Member in the Mathematical
Sciences Department at IBM T. J. Watson Research Center. He obtained his
Ph.D. from the University of Illinois at Urbana-Champaign in Electrical
and Computer Engineering where he worked on
resource allocation algorithms for cloud computing and wireless
networks. Earlier, he received an MS in ECE and an MS in Applied Math
from UIUC and a B.Tech in Electrical Engineering from IIT Madras. His
research interests include Stochastic Processes, Optimization,
Cloud Computing, Data Centers, Resource Allocation and Scheduling
Algorithms, Networks, and Game Theory. The current talk is based on a paper that received the Best Publication in Applied Probability Award, presented by the INFORMS Applied Probability Society.
Given a finite partially ordered set (poset) of width $w$, Dilworth's theorem guarantees the existence of a chain partition into $w$ chains, and no partition into fewer chains exists. First-Fit is an online algorithm for creating a chain partition of a poset. Given a linear ordering of the points of the poset, $v_1, \dots, v_n$, First-Fit assigns the point $v_i$ to the first available chain in the chain partition of the points $v_1, \dots, v_{i-1}$. It is known that First-Fit has unbounded performance over the family of finite posets of width 2. We give a complete characterization of the family of finite posets on which First-Fit performs with subexponential efficiency in terms of width. We will also review what is currently known about the family of posets on which First-Fit performs with polynomial efficiency in terms of width. Joint work with Kevin Milans.
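A direct sketch of First-Fit as described above, assuming the poset is given by a comparability predicate (the function names and the divisibility example are chosen here for illustration):

```python
# Sketch of First-Fit chain partitioning. The poset is given by a comparability test
# comparable(u, v), true iff u <= v or v <= u in the partial order.
def first_fit_chains(points, comparable):
    chains = []                                   # each chain is a list of pairwise comparable points
    for v in points:                              # points arrive in the given linear order v_1, ..., v_n
        for chain in chains:
            if all(comparable(u, v) for u in chain):
                chain.append(v)                   # the first chain that remains a chain receives v
                break
        else:
            chains.append([v])                    # otherwise open a new chain
    return chains

# Example: divisibility order on {2, 3, 4, 9, 8, 27}; this poset has width 2,
# and First-Fit happens to use 2 chains for this presentation order.
divides = lambda u, v: u % v == 0 or v % u == 0
print(first_fit_chains([2, 3, 4, 9, 8, 27], divides))
```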
As a generalization of many classic problems in combinatorial optimization, submodular optimization has found a wide range of applications in machine learning (e.g., in feature engineering and active learning). For many large-scale optimization problems,
we are often concerned with the adaptivity complexity of an algorithm, which quantifies the number of sequential rounds in which polynomially many independent function evaluations can be executed in parallel. While low adaptivity is ideal, it is not sufficient
for a (distributed) algorithm to be efficient, since in many practical applications of submodular optimization the number of function evaluations becomes prohibitively large. Motivated by such applications, we study the adaptivity and query complexity
of adaptive submodular optimization.
Our main result is a distributed algorithm for maximizing a monotone submodular function with cardinality constraint $k$ that achieves a $(1-1/e-\varepsilon)$-approximation in expectation. Furthermore, this algorithm runs in $O(\log(n))$ adaptive rounds
and makes $O(n)$ calls to the function evaluation oracle in expectation. All three of these guarantees are optimal, and the query complexity is substantially less than in previous works. Finally, to show the generality of our simple algorithm and techniques,
we extend our results to the submodular cover problem.
Joint work with Vahab Mirrokni and Morteza Zadimoghaddam (arXiv:1807.07889).
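For contrast with the guarantees above, here is the classical sequential greedy baseline for this problem (not the distributed algorithm from the talk): it achieves a $(1-1/e)$-approximation but uses about $nk$ oracle calls and $k$ adaptive rounds, whereas the algorithm above needs only $O(\log n)$ rounds and $O(n)$ queries in expectation. The coverage example is an arbitrary choice for illustration.

```python
# Baseline sketch: classical sequential greedy for monotone submodular maximization
# under a cardinality constraint k. Not the distributed algorithm from the talk.
def greedy_max(f, ground_set, k):
    S = set()
    for _ in range(k):
        best = max((e for e in ground_set if e not in S),
                   key=lambda e: f(S | {e}) - f(S))   # pick the element with largest marginal gain
        S.add(best)
    return S

# Toy example: coverage function f(S) = size of the union of the chosen sets
# (monotone and submodular).
sets = {"a": {1, 2, 3}, "b": {3, 4}, "c": {4, 5, 6}, "d": {1, 6}}
coverage = lambda S: len(set().union(*(sets[e] for e in S))) if S else 0
print(greedy_max(coverage, sets.keys(), k=2))
```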