Seminars and Colloquia by Series

Improved performance guarantees for Tukey’s median

Series
Stochastics Seminar
Time
Thursday, October 10, 2024 - 15:30 for 1 hour (actually 50 minutes)
Location
Skiles 006
Speaker
Stanislav MinskerUniversity of Southern California

Is there a natural way to order data in dimension greater than one? The approach based on the notion of data depth, often associated with the name of John Tukey, is among the most popular. Tukey’s depth has found applications in robust statistics, graph theory, and the study of elections and social choice. 

We will give an introduction to the topic, describe the properties of Tukey’s depth, and introduce some remaining open questions as well as our recent progress towards the solutions.

The talk is based on a joint work with Yinan Shen.

Minimum Norm Interpolation Meets The Local Theory of Banach Spaces

Series
Stochastics Seminar
Time
Thursday, October 3, 2024 - 15:30 for 1 hour (actually 50 minutes)
Location
Skiles 006
Speaker
Gil KurETH
Minimum-norm interpolators have recently gained attention primarily as an analyzable model to shed light on the double descent phenomenon observed for neural networks.  The majority of the work has focused on analyzing interpolators in Hilbert spaces, where typically an effectively low-rank structure of the feature covariance prevents a large bias. More recently, tight vanishing bounds have also been shown for isotropic high-dimensional data  for $\ell_p$-spaces with $p\in[1,2)$, leveraging the sparse structure of the ground truth.
This work takes a first step towards establishing a general framework that connects generalization properties of the interpolators to well-known concepts from high-dimensional geometry, specifically, from the local theory of Banach spaces.

Near-optimal estimation on the union of shift-invariant subspaces

Series
Stochastics Seminar
Time
Thursday, September 26, 2024 - 15:30 for 1 hour (actually 50 minutes)
Location
Skiles 006
Speaker
Dmitrii OstrovskiiGeorgia Tech

In the 1990s, Arkadi Nemirovski asked the following question:

How hard is it to estimate a solution to unknown homogeneous linear difference equation with constant coefficients of order S, observed in the Gaussian noise on [0,N]?

The class of all such solutions, or "signals," is parametric---described by 2S complex parameters---but extremely rich: it includes the weighted sums of S exponentials, polynomials of degree S, harmonic oscillations with S arbitrary frequencies, and their algebraic combinations. Geometrically, this class is the union of all S-dimensional shift-invariant subspaces of the space of two-sided sequences, and of interest is the minimax risk on it with respect to the mean-squared error on [0,N]. I will present a recent result that shows this minimax risk to be O( S log(N) log(S)^2 ), improving over the state of the art by a polynomial in S factor, and coming within an O( log(S)^2 ) factor from the lower bound. It relies upon an approximation-theoretic construction related to minimal-norm interpolation over shift-invariant subspaces, in the spirit of the Landau-Kolmogorov problem, that I shall present in some detail. Namely, we will see that any shift-invariant subspace admits a bounded-support reproducing kernel whose spectrum has nearly the smallest possible Lp-energies for all p ≥ 1 at once.

Pseudo-Maximum Likelihood Theory for High-Dimension Rank-One Inference

Series
Stochastics Seminar
Time
Thursday, September 19, 2024 - 15:30 for 1 hour (actually 50 minutes)
Location
Skiles 006
Speaker
Justin KoUniversity of Waterloo

We consider the task of estimating a rank-one matrix from noisy observations. Models that fall in this framework include community detection and spiked Wigner models. In this talk, I will discuss pseudo-maximum likelihood theory for such inference problems. We provide a variational formula for the asymptotic maximum pseudo-likelihood and characterize the asymptotic performance of pseudo maximum likelihood estimators. We will also discuss the implications of these findings to least squares estimators. Our approach uses the recent connections between statistical inference and statistical physics, and in particular the connection between the maximum likelihood and the ground state of a modified spin glass.

Based on joint work with Curtis Grant and Aukosh Jagannath.

Higher-Order Graphon Theory: Fluctuations, Inference, and Degeneracies

Series
Stochastics Seminar
Time
Thursday, September 12, 2024 - 15:30 for 1 hour (actually 50 minutes)
Location
Skiles 006
Speaker
Bhaswar BhattacharyaUniversity of Pennsylvania

Motifs (patterns of subgraphs), such as edges and triangles, encode important structural information about the geometry of a network. Consequently, counting motifs in a large network is an important statistical and computational problem. In this talk we will consider the problem of estimating motif densities and fluctuations of subgraph counts in an inhomogeneous random graph sampled from a graphon. We will show that the limiting distributions of subgraph counts can be Gaussian or non-Gaussian, depending on a notion of regularity of subgraphs with respect to the graphon. Using these results and a novel multiplier bootstrap for graphons, we will construct joint confidence sets for the motif densities. Finally, we will discuss various structure theorems and open questions about degeneracies of the limiting distribution and connections to quasirandom graphs.

Joint work with Anirban Chatterjee, Soham Dan, and Svante Janson

Large deviations for triangles in random graphs in the critical regime

Series
Stochastics Seminar
Time
Thursday, September 5, 2024 - 15:30 for 1 hour (actually 50 minutes)
Location
Skiles 006
Speaker
Will PerkinsGeorgia Tech

A classic problem in probability theory and combinatorics is to estimate the probability that the random graph G(n,p) contains no triangles.  This problem can be viewed as a question in "non-linear large deviations".   The asymptotics of the logarithm of this probability (and related lower tail probabilities) are known in two distinct regimes.  When p>> 1/\sqrt{n}, at this level of accuracy the probability matches that of G(n,p) being bipartite; and when p<< 1/\sqrt{n}, Janson's Inequality gives the asymptotics of the log.  I will discuss a new approach to estimating this probability in the "critical regime": when p = \Theta(1/\sqrt{n}).  The approach uses ideas from statistical physics and algorithms and gives information about the typical structure of graphs drawn from the corresponding conditional distribution.  Based on joint work with Matthew Jenssen, Aditya Potukuchi, and Michael Simkin.

Estimation of trace functionals of covariance operators

Series
Stochastics Seminar
Time
Thursday, August 29, 2024 - 15:30 for 1 hour (actually 50 minutes)
Location
Skiles 006
Speaker
Vladimir KoltchinskiiGeorgia Tech

We will discuss a problem of estimation of functionals of the form $\tau_f(\Sigma):= {\rm tr} (f(\Sigma))$ of unknown covariance operator $\Sigma$ of a centered Gaussian random variable $X$ in a separable Hilbert space ${\mathbb H}$ based on i.i.d. observation $X_1,\dots, X_n$ of $X,$ where $f:{\mathbb R}\mapsto {\mathbb R}$ is a given function. A naive plug-in estimator $\tau_f(\hat \Sigma_n)$ based on the sample covariance operator $\hat \Sigma_n$ has a large bias and bias reduction methods are needed to construct estimators with better error rates. We develop estimators with reduced bias based on linear aggregation of several plug-in estimators with different sample sizes and obtain the error bounds for such estimators with explicit dependence on the sample size $n,$ the effective rank ${\bf r}(\Sigma)= \frac{tr(\Sigma)}{\|\Sigma\|}$ of covariance operator $\Sigma$ and the degree of smoothness of function $f.$

Asymptotic mutual information for quadratic estimation problems over compact groups

Series
Stochastics Seminar
Time
Thursday, August 22, 2024 - 15:30 for 1 hour (actually 50 minutes)
Location
Skiles 006
Speaker
Timothy WeeGeorgia Tech

Motivated by applications to group synchronization and quadratic assignment on random data, we study a general problem of Bayesian inference of an unknown “signal” belonging to a high-dimensional compact group, given noisy pairwise observations of a featurization of this signal.


We establish a quantitative comparison between the signal-observation mutual information in any such problem with that in a simpler model with linear observations, using interpolation methods. For group synchronization, our result proves a replica formula for the asymptotic mutual information and Bayes-optimal mean-squared error. Via analyses of this replica formula, we show that the conjectural phase transition threshold for computationally-efficient weak recovery of the signal is determined by a classification of the real-irreducible components of the observed group representation(s), and we fully characterize the information-theoretic limits of estimation in the example of angular/phase synchronization over SO(2)/U(1). For quadratic assignment, we study observations given by a kernel matrix of pairwise similarities and a randomly permuted and noisy counterpart, and we show in a bounded signal-to-noise regime that the asymptotic mutual information coincides with that in a Bayesian spiked model with i.i.d. signal prior.


This is based on joint work with Kaylee Yang and Zhou Fan.

Max-sliced Wasserstein distances

Series
Stochastics Seminar
Time
Thursday, April 25, 2024 - 15:30 for 1 hour (actually 50 minutes)
Location
Skiles 006
Speaker
March BoedihardjoMichigan State University

I will give essentially matching upper and lower bounds for the expected max-sliced 1-Wasserstein distance between a probability measure on a separable Hilbert space and its empirical distribution from n samples. A version of this result for Banach spaces will also be presented. From this, we will derive an upper bound for the expected max-sliced 2-Wasserstein distance between a symmetric probability measure on a Euclidean space and its symmetrized empirical distribution.

Branching Brownian motion and the road-field model

Series
Stochastics Seminar
Time
Thursday, April 18, 2024 - 15:30 for 1 hour (actually 50 minutes)
Location
Skiles 006
Speaker
Nick CookDuke University

The Fisher-KPP equation was introduced in 1937 to model the spread of an advantageous gene through a spatially distributed population. Remarkably precise information on the traveling front has been obtained via a connection with branching Brownian motion, beginning with works of McKean and Bramson in the 70s. I will discuss an extension of this probabilistic approach to the Road-Field Model: a reaction-diffusion PDE system introduced by H. Berestycki et al. to describe enhancement of biological invasions by a line of fast diffusion, such as a river or a road. Based on joint work with Amir Dembo.

 

Pages