Seminars and Colloquia by Series

Robust construction of the incipient infinite cluster in high-dimensional percolation

Series
Stochastics Seminar
Time
Thursday, April 24, 2025 - 15:30 for 1 hour (actually 50 minutes)
Location
Skiles 006
Speaker
Pranav Chinmay, CUNY Graduate Center

The incipient infinite cluster was first proposed by physicists in the 1970s as a canonical example of a two-dimensional medium on which random walk is subdiffusive. It is the measure obtained in critical percolation by conditioning on the existence of an infinite cluster, which is a probability zero event. Kesten presented the first rigorous two-dimensional construction of this object as a weak limit of the one-arm event. In high dimensions, van der Hofstad and Jarai constructed the IIC as a weak limit of the two-point connection using the lace expansion. Our work presents a new high-dimensional construction which is "robust", establishing that the weak limit is independent of the choice of conditioning. The main tools used are Kesten's original two-dimensional construction combined with Kozma and Nachmias' regularity method. Our robustness allows for several applications, such as the explicit computation of the limiting distribution of the chemical distance, which forms the content of our upcoming project. This is joint work with Shirshendu Chatterjee, Jack Hanson, and Philippe Sosoe. The preprint can be found at https://arxiv.org/abs/2502.10882.

Injective norm of random tensors and quantum states

Series
Stochastics Seminar
Time
Thursday, April 3, 2025 - 15:30 for 1 hour (actually 50 minutes)
Location
Skiles 006
Speaker
Stéphane Dartois, Université Paris-Saclay, CEA, List

In this talk, I will present the results of a collaboration with Benjamin McKenna on the injective norm of large random Gaussian tensors and uniform random quantum states, and describe some of the context underlying this work. The injective norm is a natural generalization to tensors of the operator norm of a matrix and appears in multiple fields. In quantum information, the injective norm is one important measure of genuine multipartite entanglement of quantum states, known as geometric entanglement. In our preprint, we provide a high-probability upper bound on the injective norm of real and complex Gaussian random tensors, which corresponds to a lower bound on the geometric entanglement of random quantum states, and to a bound on the ground-state energy of a particular multispecies spherical spin glass model.
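For concreteness, the injective norm of a real order-3 tensor T is the maximum of <T, u⊗v⊗w> over unit vectors u, v, w. Computing it exactly is NP-hard in general; the sketch below (not from the speakers' work) uses alternating maximization, often called the higher-order power method, to obtain a heuristic lower bound on a Gaussian tensor:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10
T = rng.standard_normal((n, n, n))     # real Gaussian order-3 tensor

def injective_norm_lower_bound(T, iters=200):
    """Alternating maximization of <T, u x v x w> over unit vectors
    (higher-order power method); may stop at a local maximum, so the
    value is only a lower bound on the injective norm."""
    u, v, w = (rng.standard_normal(d) for d in T.shape)
    u, v, w = u / np.linalg.norm(u), v / np.linalg.norm(v), w / np.linalg.norm(w)
    for _ in range(iters):
        u = np.einsum('ijk,j,k->i', T, v, w); u /= np.linalg.norm(u)
        v = np.einsum('ijk,i,k->j', T, u, w); v /= np.linalg.norm(v)
        w = np.einsum('ijk,i,j->k', T, u, v); w /= np.linalg.norm(w)
    return float(np.einsum('ijk,i,j,k->', T, u, v, w))

val = injective_norm_lower_bound(T)
```

By Cauchy-Schwarz the injective norm is at most the Frobenius norm, so the heuristic value always lies between 0 and ||T||_F; the high-probability upper bounds of the talk control the gap for random tensors.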

Randomized Iterative Sketch-and-Project Methods as Efficient Large-Scale Linear Solvers

Series
Stochastics Seminar
Time
Thursday, March 27, 2025 - 15:30 for 1 hour (actually 50 minutes)
Location
Skiles 006
Speaker
Elizaveta Rebrova, Princeton

Randomized Kaczmarz methods — a popular special case of the sketch-and-project optimization framework — solve linear systems through iterative projections onto randomly selected equations, achieving exponential expected convergence via cheap, local updates. While they are known to be effective for highly overdetermined problems or under restricted data access, identifying generic scenarios where these methods are advantageous compared to classical Krylov subspace solvers (e.g., Conjugate Gradient, LSQR, GMRES) has remained open. In this talk, I will present our recent results demonstrating that properly designed randomized Kaczmarz (sketch-and-project) methods can outperform Krylov methods, complexity-wise, for both square and rectangular systems. In addition, they are particularly advantageous for approximately low-rank systems common in machine learning (e.g., kernel matrices, signal-plus-noise models), as they quickly capture the large outlying singular values of the linear system. Our approach combines novel spectral analysis of randomly sketched projection matrices with classical numerical analysis techniques such as momentum, adaptive regularization, and memoization.
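For readers unfamiliar with the method, here is a minimal sketch of the classical randomized Kaczmarz iteration with row sampling proportional to squared row norms (the basic special case, not the accelerated sketch-and-project variants of the talk); all parameters are illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 500, 50                        # highly overdetermined consistent system
A = rng.standard_normal((m, n))
x_true = rng.standard_normal(n)
b = A @ x_true

def randomized_kaczmarz(A, b, iters=5000):
    """Project the current iterate onto one randomly chosen equation
    a_i^T x = b_i per step, sampling rows with probability ~ ||a_i||^2."""
    m, n = A.shape
    row_norms2 = np.einsum('ij,ij->i', A, A)
    probs = row_norms2 / row_norms2.sum()
    x = np.zeros(n)
    for _ in range(iters):
        i = rng.choice(m, p=probs)
        x += (b[i] - A[i] @ x) / row_norms2[i] * A[i]
    return x

x_hat = randomized_kaczmarz(A, b)     # expected error contracts geometrically
```

Each update touches a single row of A, which is what makes such methods attractive under restricted data access.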

Statistical problems for Smoluchowski processes

Series
Stochastics Seminar
Time
Tuesday, March 25, 2025 - 15:30 for 1 hour (actually 50 minutes)
Location
Skiles 006
Speaker
Alexander Goldenshluger, University of Haifa

Suppose that particles are randomly distributed in R^d and are subject to identical stochastic motion independently of each other. The Smoluchowski process describes fluctuations of the number of particles in an observation region over time. The goal is to infer the particle displacement process from such count data. We discuss probabilistic properties of Smoluchowski processes and consider related statistical problems for two different models of the particle displacement process: undeviated uniform motion (a particle moves with random constant velocity along a straight line) and Brownian motion. In these settings we develop estimators with provable accuracy guarantees.
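A minimal simulation of one such count process, assuming a Poisson initial configuration on an interval and Brownian displacement (a one-dimensional toy version of the setting; all parameters are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)

# Initial configuration: a Poisson point process of intensity lam on [-L, L].
lam, L = 5.0, 50.0
n_particles = rng.poisson(lam * 2 * L)
positions = rng.uniform(-L, L, size=n_particles)

# Each particle performs an independent Brownian motion; record the number
# of particles in the observation region [0, 1] after every time step.
dt, n_steps = 0.1, 200
counts = np.empty(n_steps, dtype=int)
for t in range(n_steps):
    positions += np.sqrt(dt) * rng.standard_normal(n_particles)
    counts[t] = np.count_nonzero((positions >= 0.0) & (positions <= 1.0))
```

The array `counts` is one sample path of the Smoluchowski process for the region [0, 1]; the statistical problem is to recover the displacement law from such paths.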

Matrix superconcentration inequalities

Series
Stochastics Seminar
Time
Thursday, March 13, 2025 - 15:30 for 1 hour (actually 50 minutes)
Location
Skiles 006
Speaker
Tatiana Brailovskaya, Duke University

One way to understand the concentration of the norm of a random matrix X with Gaussian entries is to apply a standard concentration inequality, such as the one for Lipschitz functions of i.i.d. standard Gaussian variables, which yields subgaussian tail bounds on the norm of X. However, as Tracy and Widom showed in the 1990s, when the entries of X are i.i.d., the norm of X exhibits even sharper concentration. The phenomenon of a function of many i.i.d. variables having strictly smaller tails than those predicted by classical concentration inequalities is sometimes referred to as "superconcentration", a term coined by Chatterjee. I will discuss novel results that can be interpreted as superconcentration inequalities for the norm of X, where X is a Gaussian random matrix with independent entries and an arbitrary variance profile. We can also view our results as a nonhomogeneous extension of Tracy-Widom-type upper tail estimates for the norm of X.

Why are the logits of trained models distorted? A theory of overfitting for imbalanced classification

Series
Stochastics Seminar
Time
Thursday, March 6, 2025 - 15:30 for 1 hour (actually 50 minutes)
Location
Skiles 006
Speaker
Yiqiao Zhong, University of Wisconsin–Madison

Data imbalance is a fundamental challenge in data analysis: minority classes account for a small fraction of the training data compared to majority classes. Many existing techniques attempt to compensate for the underrepresentation of minority classes, which are often critical in applications such as rare disease detection and anomaly detection. Notably, in empirical deep learning, large model sizes exacerbate the issue. Despite extensive empirical heuristics, however, the statistical foundations of these methods remain underdeveloped, which undermines the reliability of these machine learning models.

In this talk, I will examine imbalanced classification problems in high dimensions, focusing on support vector machines (SVMs) and logistic regression. I will introduce a "truncation" phenomenon---which we verified across single-cell tabular data, image data, and text data---where overfitting in high dimensions distorts the distribution of logits on training data. I will provide a theoretical foundation by characterizing the asymptotic distribution of the logits via a variational formulation. This analysis formalizes the intuition that overfitting disproportionately harms minority classes and reveals how margin rebalancing---a widely used deep learning heuristic---mitigates data imbalance. As a consequence, the theory offers both qualitative and quantitative insights into generalization errors and uncertainty measures such as calibration.

This talk is based on a joint work with Jingyang Lyu (3rd-year Stats PhD student) and Kangjie Zhou (Columbia Statistics): arXiv:2502.11323.

Approximate Message Passing Algorithms for High-dimensional Estimation and Inference

Series
Stochastics Seminar
Time
Thursday, February 20, 2025 - 15:30 for 1 hour (actually 50 minutes)
Location
Skiles 006
Speaker
Cynthia Rush, Columbia University

In this talk, I discuss how one can use approximate message passing (AMP) algorithms — a class of efficient, iterative algorithms that have been successfully employed in many statistical learning tasks like high-dimensional linear regression and low-rank matrix estimation — for characterizing exact statistical properties of estimators in a high-dimensional asymptotic regime, where the sample size of the data is proportional to the number of parameters in the problem. As a running example, we will study sorted L1 penalization (SLOPE) for linear regression and show how AMP theory can be used to give insights on the variable selection properties of this estimator by characterizing the optimal trade-off between measures of type I and type II error. Collaborators on this work include Zhiqi Bu, Oliver Feng, Jason Klusowski, Richard Samworth, Weijie Su, Ramji Venkataramanan, and Ruijia Wu.
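As a rough illustration of the flavor of these algorithms, here is a minimal AMP iteration with scalar soft thresholding for sparse linear regression (a simpler relative of the SLOPE setting of the talk, which uses a sorted-penalty denoiser instead); the threshold rule and all parameters below are illustrative choices, not the speaker's:

```python
import numpy as np

rng = np.random.default_rng(3)
n, p, k = 800, 400, 40                         # samples, dimension, sparsity
A = rng.standard_normal((n, p)) / np.sqrt(n)   # i.i.d. N(0, 1/n) design
x0 = np.zeros(p)
x0[:k] = 3.0 * rng.standard_normal(k)
y = A @ x0 + 0.1 * rng.standard_normal(n)

def soft(u, t):
    """Soft-thresholding denoiser eta(u; t)."""
    return np.sign(u) * np.maximum(np.abs(u) - t, 0.0)

delta = n / p
alpha = 1.5                                 # illustrative threshold multiplier
x, z = np.zeros(p), y.copy()
for _ in range(25):
    tau = np.linalg.norm(z) / np.sqrt(n)    # empirical effective noise level
    r = x + A.T @ z                         # effective observation ~ x0 + tau * N(0, I)
    x_new = soft(r, alpha * tau)
    # Onsager correction: (1/delta) * average of eta'(r), carried by z
    onsager = np.mean(np.abs(r) > alpha * tau) / delta
    z = y - A @ x_new + onsager * z
    x = x_new

mse = np.mean((x - x0) ** 2)
```

The Onsager term in the residual update is what makes the effective observation r behave like the truth plus i.i.d. Gaussian noise, which is the starting point for the exact asymptotic characterizations discussed in the talk.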

Law of Large Numbers and Central Limit Theorem for random sets of solitons for the Korteweg-de Vries equation

Series
Stochastics Seminar
Time
Thursday, February 13, 2025 - 15:30 for 1 hour (actually 50 minutes)
Location
Skiles 006
Speaker
Manuela Girotti, Emory University

N. Zabusky coined the word "soliton" in 1965 to describe a curious feature he and M. Kruskal observed in their numerical simulations of the initial-value problem for a simple nonlinear PDE. The first part of the talk will be a broad introduction to the theory of solitons/solitary waves and integrable PDEs (the Korteweg-de Vries equation in particular), describing classical results in the field. The second (and main) part of the talk will focus on some new developments and the growing interest in a special class of solutions known as a "soliton gas".
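For reference, in one common normalization the KdV equation reads q_t + 6 q q_x + q_{xxx} = 0, and its one-soliton solution is

q(x,t) = 2\kappa^2 \,\mathrm{sech}^2\big(\kappa(x - 4\kappa^2 t - x_0)\big),

a solitary wave whose speed 4\kappa^2 is proportional to its amplitude 2\kappa^2; N-soliton solutions consist of N such waves that interact elastically.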


We study random configurations of N soliton solutions q_N(x,t) of the KdV equation. The randomness appears in the scattering (linear) problem, which is used to solve the PDE: the complex eigenvalues are chosen to be (1) i.i.d. random variables sampled from a probability distribution with compact support on the complex plane, or (2) sampled from a random matrix law. 

Next, we consider the scattering problem for the expectation of the random measure associated to the spectral data, in the limit as N -> + infinity. The corresponding solution q(x,t) of the KdV equation is a soliton gas. 

We are then able to prove a Law of Large Numbers and a Central Limit Theorem for the differences q_N(x,t)-q(x,t).


This is a collection of works (and ongoing collaborations) done with K. McLaughlin (Tulane U.), T. Grava (SISSA/Bristol), R. Jenkins (UCF), A. Minakov (U. Karlova), J. Najnudel (Bristol).

Adaptive density estimation under low-rank constraints

Series
Stochastics Seminar
Time
Thursday, February 6, 2025 - 15:30 for 1 hour (actually 50 minutes)
Location
Skiles 006
Speaker
Olga Klopp, ESSEC Business School and CREST

In this talk, we address the challenge of bivariate probability density estimation under low-rank constraints for both discrete and continuous distributions. For discrete distributions, we model the target as a low-rank probability matrix. In the continuous case, we assume the density function is Lipschitz continuous over an unknown compact rectangular support and can be decomposed into a sum of K separable components, each represented as a product of two one-dimensional functions. We introduce an estimator that leverages these low-rank constraints, achieving significantly improved convergence rates. Specifically, for continuous distributions, our estimator converges in total variation at the one-dimensional rate of (K/n)^{1/3} up to logarithmic factors, while adapting to both the unknown support and the unknown number of separable components. We also derive lower bounds for both discrete and continuous cases, demonstrating that our estimators achieve minimax optimal convergence rates within logarithmic factors. Furthermore, we introduce efficient algorithms for the practical computation of these estimators.
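To illustrate the discrete case, here is a toy plug-in estimator that truncates the SVD of the empirical frequency matrix at rank K. This is illustrative only: the estimator of the talk is more refined, and raw SVD truncation need not return a valid probability matrix (entries can go slightly negative).

```python
import numpy as np

rng = np.random.default_rng(4)

# Ground truth: a rank-K probability matrix P on a d x d grid.
d, K = 30, 2
U, V = rng.random((d, K)), rng.random((d, K))
P = U @ V.T
P /= P.sum()

# Draw n i.i.d. pairs (i, j) ~ P and form the empirical frequency matrix.
n = 20000
idx = rng.choice(d * d, size=n, p=P.ravel())
P_hat = np.bincount(idx, minlength=d * d).reshape(d, d) / n

# Plug-in low-rank estimator: truncate the SVD of P_hat at rank K.
Uh, s, Vh = np.linalg.svd(P_hat)
P_lr = (Uh[:, :K] * s[:K]) @ Vh[:K]

err_full = np.abs(P_hat - P).sum()    # L1 error of the raw histogram
err_lr = np.abs(P_lr - P).sum()       # L1 error after rank truncation
```

The rank truncation discards the sampling noise carried by the d - K trailing singular directions, which is the mechanism behind the improved convergence rates in the talk.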