## Seminars and Colloquia by Series

### Mathematical and Statistical Challenges on Large Discrete Structures

Series
Job Candidate Talk
Time
Wednesday, March 16, 2022 - 11:00 for 1 hour (actually 50 minutes)
Location
https://bluejeans.com/348214744/2450
Speaker
Miklos RaczPrinceton University

From networks to genomics, large amounts of data are abundant and play critical roles in helping us understand complex systems. In many such settings, these data take the form of large discrete structures with important combinatorial properties. The interplay between structure and randomness in these systems presents unique mathematical and statistical challenges. In this talk I will highlight these through two vignettes: (1) inference problems on networks, and (2) DNA data storage.

First, I will discuss statistical inference problems on edge-correlated stochastic block models. We determine the information-theoretic threshold for exact recovery of the latent vertex correspondence between two correlated block models, a task known as graph matching. As an application, we show how one can exactly recover the latent communities using multiple correlated graphs in parameter regimes where it is information-theoretically impossible to do so using just a single graph. Furthermore, we obtain the precise threshold for exact community recovery using multiple correlated graphs, which captures the interplay between the community recovery and graph matching tasks.

Next, I will give an overview of DNA data storage. Storing data in synthetic DNA is an exciting emerging technology which has the potential to revolutionize data storage. Realizing this goal requires innovation across a multidisciplinary pipeline. I will explain this pipeline, focusing on our work on statistical error correction algorithms and optimizing DNA synthesis, highlighting the intimate interplay between statistical foundations and practice.

### Matrix Concentration and Synthetic Data

Series
Job Candidate Talk
Time
Thursday, March 10, 2022 - 11:00 for 1 hour (actually 50 minutes)
Location
https://bluejeans.com/405947238/3475
Speaker
March BoedihardjoUC Irvine

Classical matrix concentration inequalities are sharp up to a logarithmic factor. This logarithmic factor is necessary in the commutative case but unnecessary in many classical noncommutative cases. We will present some matrix concentration results that are sharp in many cases, where we overcome this logarithmic factor by using an easily computable quantity that captures noncommutativity. Joint work with Afonso Bandeira and Ramon van Handel.

Due to privacy, access to real data is often restricted. Data that are not completely real but resemble certain properties of real data become natural substitutes. Data of this type are called synthetic data. I will talk about the extent to which synthetic data may resemble real data under privacy and computational complexity restrictions. Joint work with Thomas Strohmer and Roman Vershynin.

The link to the online talk:  https://bluejeans.com/405947238/3475

### Trees in graphs and hypergraphs

Series
Job Candidate Talk
Time
Tuesday, March 8, 2022 - 11:00 for 1 hour (actually 50 minutes)
Location
ONLINE
Speaker
Maya SteinUniversity of Chile

Graphs are central objects of study in Discrete Mathematics. A graph consists of a set of vertices, some of which are connected by edges. Their elementary structure makes graphs widely applicable, but the theoretical understanding of graphs is far from complete. Extremal graph theory aims to find connections between global parameters and substructure. A key topic is how a large average or minimum degree of a graph can force certain subgraphs (where the degree is the number of edges at a vertex). For instance, Erdős and Gallai proved in the 1960's that any graph of average degree at least $k$ contains a path of length $k$. Some of the most intriguing open questions in this area concern trees (connected graphs without cycles) as subgraphs. For instance, can one substitute the path from the previous paragraph with a tree? We will give an overview of open problems and recent results in this area, as well as their possible extensions to hypergraphs.

### Low-rank Structured Data Analysis: Methods, Models and Algorithms

Series
Job Candidate Talk
Time
Tuesday, February 22, 2022 - 11:00 for 1 hour (actually 50 minutes)
Location
https://bluejeans.com/717545499/6211
Speaker
Longxiu HuangUCLA

In modern data analysis, the datasets are often represented by large-scale matrices or tensors (the generalization of matrices to higher dimensions). To have a better understanding or extract   values effectively from these data, an important step is to construct a low-dimensional/compressed representation of the data that may be better to analyze and interpret in light of a corpus of field-specific information. To implement the goal, a primary tool is the matrix/tensor decomposition. In this talk, I will talk about novel matrix/tensor decompositions, CUR decompositions, which are memory efficient and computationally cheap. Besides, I will also discuss the applications of CUR decompositions on developing efficient algorithms or models to robust decompositions or data completion problems. Additionally, some simulation results will be provided on real and synthetic datasets.

### Zarankiewicz problem, VC-dimension, and incidence geometry

Series
Job Candidate Talk
Time
Thursday, February 17, 2022 - 11:00 for 1 hour (actually 50 minutes)
Location
https://gatech.bluejeans.com/939739653/6882
Speaker
Cosmin PohoataYale University
The Zarankiewicz problem is a central problem in extremal graph theory, which lies at the intersection of several areas of mathematics. It asks for the maximum number of edges in a bipartite graph on $2n$ vertices, where each side of the bipartition contains $n$ vertices, and which does not contain the complete bipartite graph $K_{s,t}$ as a subgraph. One of the many reasons this problem is rather special among Turán-type problems is that the extremal graphs in question, whenever available, always seem to have to be of algebraic nature, in particular witnesses to basic intersection theory phenomena. The most tantalizing case is by far the diagonal problem, for which the answer is unknown for most values of $s=t$, and where it is a complete mystery what the extremal graphs could look like. In this talk, we will discuss a new phenomenon related to an important variant of this problem, which is the analogous question in bipartite graphs with bounded VC-dimension. We will present several new consequences in incidence geometry, which improve upon classical results. Based on joint work with Oliver Janzer.

### On the sum-product problem

Series
Job Candidate Talk
Time
Tuesday, February 15, 2022 - 11:00 for 1 hour (actually 50 minutes)
Location
ONLINE
Speaker
George ShakanCRM

Let A be a subset of the integers of size n. In 1983, Erdos and Szemeredi conjectured that either A+A or A*A must have size nearly n^2. We discuss ideas towards this conjecture, such as an older connection to incidence geometry as well as somewhat newer breakthroughs in additive combinatorics. We further highlight applications of the sum-product phenomenon.

### Recent progress on Hadwiger's conjecture

Series
Job Candidate Talk
Time
Monday, February 14, 2022 - 11:00 for 1 hour (actually 50 minutes)
Location
ONLINE
Speaker
Luke PostleUniversity of Waterloo

In 1943, Hadwiger conjectured that every graph with no $K_t$ minor is $(t-1)$-colorable for every $t \ge 1$. Hadwiger's Conjecture is a vast generalization of the Four Color Theorem and one of the most important open problems in graph theory. Only the cases when $t$ is at most 6 are known. In the 1980s, Kostochka and Thomason independently proved that every graph with no $K_t$ minor has average degree $O(t (\log t)^{0.5})$ and hence is $O(t (\log t)^{0.5})$-colorable.  In a recent breakthrough, Norin, Song, and I proved that every graph with no $K_t$ minor is $O(t (\log t)^c)$-colorable for every $c > 0.25$,  Subsequently I showed that every graph with no $K_t$ minor is $O(t (\log \log t)^6)$-colorable.  Delcourt and I improved upon this further by showing that every graph with no $K_t$ minor is $O(t \log \log t)$-colorable. Our main technical result yields this as well as a number of other interesting corollaries.  A natural weakening of Hadwiger's Conjecture is the so-called Linear Hadwiger's Conjecture that every graph with no $K_t$ minor is $O(t)$-colorable.  We prove that Linear Hadwiger's Conjecture reduces to small graphs. In 2005, Kühn and Osthus proved that Hadwiger's Conjecture for the class of $K_{s,s}$-free graphs for any fixed positive integer $s \ge 2$. Along this line, we show that Linear Hadwiger's Conjecture holds for the class of $K_r$-free graphs for every fixed $r$.

### Stochastic and Convex Geometry for the Analysis of Complex Data

Series
Job Candidate Talk
Time
Thursday, February 10, 2022 - 11:00 for 1 hour (actually 50 minutes)
Location
https://gatech.bluejeans.com/532559688
Speaker
Eliza O’ReillyCalifornia Institute of Technology

Many modern problems in data science aim to efficiently and accurately extract important features and make predictions from high dimensional and large data sets. While there are many empirically successful methods to achieve these goals, large gaps between theory and practice remain.  A geometric viewpoint is often useful to address these challenges as it provides a unifying perspective of structure in data, complexity of statistical models, and tractability of computational methods.  As a consequence, an understanding of problem geometry leads both to new insights on existing methods as well as new models and algorithms that address drawbacks in existing methodology.

In this talk, I will present recent progress on two problems where the relevant model can be viewed as the projection of a lifted formulation with a simple stochastic or convex geometric description. In particular, I will first describe how the theory of stationary random tessellations in stochastic geometry can address computational and theoretical challenges of random decision forests with non-axis-aligned splits. Second, I will present a new approach to convex regression that returns non-polyhedral convex estimators compatible with semidefinite programming. These works open a number of future research directions at the intersection of stochastic and convex geometry, statistical learning theory, and optimization.

### On sphere packings and the hard sphere model

Series
Job Candidate Talk
Time
Tuesday, February 8, 2022 - 11:00 for 1 hour (actually 50 minutes)
Location
https://bluejeans.com/552606446/5315
Speaker
Will PerkinsUniversity of Illinois, Chicago
The classic sphere packing problem is to determine the densest possible packing of non-overlapping congruent spheres in Euclidean space.  The problem is trivial in dimension 1, straightforward in dimension 2, but a major challenge or mystery in higher dimensions, with the only other solved cases being dimensions 3, 8, and 24.  The hard sphere model is a classic model of a gas from statistical physics, with particles interacting via a hard-core pair potential.  It is believed that this model exhibits a crystallization phase transition in dimension 3, giving a purely geometric explanation for freezing phenomena in nature, but this remains an open mathematical problem. The sphere packing problem and the hard sphere model are closely linked through the following rough rephrasing of the phase transition question: do typical sphere packings at densities just below the maximum density align with a maximum packing or are they disordered?

I will present results on high-dimensional sphere packings and spherical codes and new bounds for the absence of phase transition at low densities in the hard sphere model.  The techniques used take the perspective of algorithms and optimization and can be applied to problems in extremal and enumerative combinatorics as well.

### Understanding Statistical-vs-Computational Tradeoffs via Low-Degree Polynomials

Series
Job Candidate Talk
Time
Thursday, February 3, 2022 - 11:00 for 1 hour (actually 50 minutes)
Location
https://bluejeans.com/500115320/1408
Speaker
Alex WeinUC Berkeley/Simons Institute

A central goal in modern data science is to design algorithms for statistical inference tasks such as community detection, high-dimensional clustering, sparse PCA, and many others. Ideally these algorithms would be both statistically optimal and computationally efficient. However, it often seems impossible to achieve both these goals simultaneously: for many problems, the optimal statistical procedure involves a brute force search while all known polynomial-time algorithms are statistically sub-optimal (requiring more data or higher signal strength than is information-theoretically necessary). In the quest for optimal algorithms, it is therefore important to understand the fundamental statistical limitations of computationally efficient algorithms.

I will discuss an emerging theoretical framework for understanding these questions, based on studying the class of "low-degree polynomial algorithms." This is a powerful class of algorithms that captures the best known poly-time algorithms for a wide variety of statistical tasks. This perspective has led to the discovery of many new and improved algorithms, and also many matching lower bounds: we now have tools to prove failure of all low-degree algorithms, which provides concrete evidence for inherent computational hardness of statistical problems. This line of work illustrates that low-degree polynomials provide a unifying framework for understanding the computational complexity of a wide variety of statistical tasks, encompassing hypothesis testing, estimation, and optimization.