Seminars and Colloquia by Series

A precise high-dimensional theory for Boosting

Series
Stochastics Seminar
Time
Thursday, October 1, 2020 - 15:30 for 1 hour (actually 50 minutes)
Location
https://bluejeans.com/276389634
Speaker
Pragya Sur, Harvard University

This talk will introduce a precise high-dimensional asymptotic theory for Boosting (AdaBoost) on separable data, taking both statistical and computational perspectives. We will consider the common modern setting where the number of features p and the sample size n are both large and comparable, and in particular, look at scenarios where the data is asymptotically separable. Under a class of statistical models, we will provide an (asymptotically) exact analysis of the generalization error of AdaBoost, when the algorithm interpolates the training data and maximizes an empirical L1 margin. On the computational front, we will provide a sharp analysis of the stopping time when boosting approximately maximizes the empirical L1 margin. Our theory provides several insights into properties of Boosting; for instance, the larger the dimensionality ratio p/n, the faster the optimization reaches interpolation. At the heart of our theory lies an in-depth study of the maximum L1-margin, which can be accurately described by a new system of non-linear equations; we analyze this margin and the properties of this system, using Gaussian comparison techniques and a novel uniform deviation argument. Time permitting, I will present analogous results for a new class of boosting algorithms that correspond to Lq geometry, for q>1. This is based on joint work with Tengyuan Liang.
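The sketch below illustrates the objects in this abstract. It is not the speaker's analysis, which concerns the regime where p and n are both large and comparable. It runs AdaBoost with coordinate weak learners $h_j(x) = x_j$ on ±1 features and reports the normalized empirical L1 margin $\min_i y_i \langle x_i, \theta \rangle / \|\theta\|_1$. Several choices are hypothetical and made only so that the data are separable: the planted majority-vote labels, the sizes n = 20 and p = 60, the number of rounds, and the function name `adaboost_coordinates`.

```python
import math
import random


def adaboost_coordinates(X, y, rounds):
    """AdaBoost whose weak learners are the coordinates h_j(x) = x_j in {-1, +1}.

    Returns theta and the normalized L1 margin min_i y_i <x_i, theta> / ||theta||_1.
    """
    n, p = len(X), len(X[0])
    theta = [0.0] * p
    margins = [0.0] * n  # m_i = y_i <x_i, theta>
    for _ in range(rounds):
        # Sample weights w_i proportional to exp(-m_i), shifted for numerical stability.
        m_min = min(margins)
        w = [math.exp(m_min - m) for m in margins]
        total = sum(w)
        w = [wi / total for wi in w]
        # Weighted edge r_j = sum_i w_i y_i x_ij; pick the coordinate with the largest |r_j|.
        edges = [sum(w[i] * y[i] * X[i][j] for i in range(n)) for j in range(p)]
        j = max(range(p), key=lambda k: abs(edges[k]))
        r = min(max(edges[j], -1 + 1e-12), 1 - 1e-12)  # keep the log finite
        alpha = 0.5 * math.log((1 + r) / (1 - r))  # classical AdaBoost step size
        theta[j] += alpha
        for i in range(n):
            margins[i] += alpha * y[i] * X[i][j]
    l1 = sum(abs(t) for t in theta)
    return theta, min(margins) / l1


random.seed(0)
n, p = 20, 60
X = [[random.choice((-1, 1)) for _ in range(p)] for _ in range(n)]
# Hypothetical planted model: the label is the majority vote of the first three features.
y = [1 if row[0] + row[1] + row[2] > 0 else -1 for row in X]
theta, margin = adaboost_coordinates(X, y, rounds=200)
print(f"normalized L1 margin after 200 rounds: {margin:.4f}")
```

With these labels, θ = (1, 1, 1, 0, …, 0) separates the data with L1 margin at least 1/3. As a result, every round has an edge of at least 1/3, and the usual exponential-loss bound forces zero training error well within 200 rounds.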

Statistical Inference in Popularity Adjusted Stochastic Block Model

Series
Stochastics Seminar
Time
Thursday, September 24, 2020 - 15:30 for 1 hour (actually 50 minutes)
Location
https://ucf.zoom.us/j/92646603521?pwd=TnRGSVo1WXo2bjE4Y3JEVGRPSmNWQT09
Speaker
Marianna Pensky, University of Central Florida

The talk considers the Popularity Adjusted Block Model (PABM) introduced by Sengupta and Chen (2018). We argue that the main appeal of the PABM is the flexibility of the spectral properties of the graph, which makes the PABM an attractive choice for modeling networks that appear in, for example, the biological sciences. In addition, to the best of our knowledge, the PABM is the only stochastic block model that allows one to treat network sparsity as structural sparsity that describes community patterns, rather than as an attribute of the network as a whole.

Couplings of Markov chain Monte Carlo and their uses

Series
Stochastics Seminar
Time
Thursday, September 10, 2020 - 15:30 for 1 hour (actually 50 minutes)
Location
https://us02web.zoom.us/j/83378796301
Speaker
Pierre Jacob, Harvard University

Markov chain Monte Carlo (MCMC) methods are state-of-the-art techniques for numerical integration. MCMC estimators are computed from Markov chains that converge to stationarity, and they converge to the integrals of interest as the number of iterations goes to infinity. This justification, which is asymptotic in the number of iterations, is not ideal: despite decades of research on the topic, the literature offers little practical guidance on how many iterations should be performed. This talk will describe a computational approach to address some of these issues. The key idea, pioneered by Glynn and Rhee in 2014, is to generate couplings of Markov chains, whereby pairs of chains contract, coalesce or even "meet" after a random number of iterations; we will see that these meeting times, which can be simulated in many practical settings, contain useful information about the finite-time marginal distributions of the chains. This talk will provide an overview of this line of research, joint work with John O'Leary, Yves Atchadé and various collaborators.
The main reference is available here: https://rss.onlinelibrary.wiley.com/doi/abs/10.1111/rssb.12336
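Below is a minimal sketch of the lag-one coupling idea. The setup is our own choice: a five-state target with hypothetical weights `WEIGHTS` and an independence Metropolis-Hastings kernel with uniform proposals. The two chains are coupled by sharing both the proposal and the acceptance uniform. The sketch returns the meeting time $\tau = \inf\{t \ge 1 : X_t = Y_{t-1}\}$ and the estimator $h(X_0) + \sum_{t=1}^{\tau-1} (h(X_t) - h(Y_{t-1}))$. This estimator is our reading of the construction in the reference above, and the helper names are hypothetical.

```python
import random

# Hypothetical unnormalized target on the states {0, 1, 2, 3, 4}.
WEIGHTS = [1, 2, 3, 2, 1]


def coupled_step(x, y, rng):
    """Advance both chains by one independence Metropolis-Hastings step,
    sharing the uniform proposal z and the acceptance uniform u."""
    z = rng.randrange(len(WEIGHTS))
    u = rng.random()
    x_next = z if u < WEIGHTS[z] / WEIGHTS[x] else x
    y_next = z if u < WEIGHTS[z] / WEIGHTS[y] else y
    return x_next, y_next


def coupled_estimate(h, rng, x0=0):
    """Return (estimate, tau) for one pair of lag-one coupled chains started at x0."""
    x_prev, y = x0, x0                        # X_0 and Y_0
    x, _ = coupled_step(x_prev, x_prev, rng)  # X_1 ~ P(X_0, .)
    estimate = h(x_prev)                      # h(X_0)
    t = 1
    while x != y:                             # stop at tau = first t with X_t == Y_{t-1}
        estimate += h(x) - h(y)               # add h(X_t) - h(Y_{t-1})
        x, y = coupled_step(x, y, rng)        # (X_{t+1}, Y_t) from the coupled kernel
        t += 1
    return estimate, t


rng = random.Random(1)
runs = [coupled_estimate(lambda s: s, rng) for _ in range(40000)]
mean_estimate = sum(e for e, _ in runs) / len(runs)
mean_tau = sum(t for _, t in runs) / len(runs)
# For these weights the target mean of h(s) = s is exactly 2.
print(f"average estimate {mean_estimate:.3f}, average meeting time {mean_tau:.2f}")
```

The shared randomness makes the coupling faithful: once $X_t = Y_{t-1}$, the two chains make identical moves forever. For that reason the sum can be truncated at $\tau$.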

Eigenvectors' overlaps for integrable models of non-Hermitian random matrices

Series
Stochastics Seminar
Time
Thursday, April 2, 2020 - 15:00 for 1 hour (actually 50 minutes)
Location
Skiles 006
Speaker
Guillaume Dubach

Right and left eigenvectors of non-Hermitian matrices form a bi-orthogonal system to which one can associate homogeneous quantities known as overlaps. The matrix of overlaps quantifies the stability of the spectrum and characterizes the joint increments of the eigenvalues under Dyson-type dynamics. Overlaps first appeared in the physics literature: Chalker and Mehlig (1998) calculated their conditional expectation for complex Ginibre matrices. For the same model, we extend their results by deriving the distribution of the overlaps and their correlations (joint work with P. Bourgade). Similar results can be obtained for quaternionic Gaussian matrices, as well as for matrices from the spherical and truncated-unitary ensembles.

Complexity of the pure spherical p-spin model

Series
Stochastics Seminar
Time
Thursday, March 12, 2020 - 15:05 for 1 hour (actually 50 minutes)
Location
Skiles 006
Speaker
Julian Gold, Northwestern University

The pure spherical p-spin model is a Gaussian random polynomial H of degree p on an N-dimensional sphere, with N large. The sphere is viewed as the state space of a physical system with many degrees of freedom, and the random function H is interpreted as a smooth assignment of energy to each state, i.e. as an energy landscape. 

In 2012, Auffinger, Ben Arous and Cerny used the Kac-Rice formula to compute the expected number of critical points of H with a given index and with energy below a given value. This number is exponentially large in N for p > 2, and its exponential growth rate is a function of the chosen index and of the energy cutoff. This function, called the complexity, reveals interesting topological information about the landscape H. They showed that below an energy threshold marking the bottom of the landscape, all critical points are local minima or saddles whose index does not diverge with N. They also showed that these finite-index saddles have an interesting nested structure, even though their number is exponentially dominated by that of the minima up to the energy threshold. The total complexity (counting critical points of any index) was shown to be positive at energies close to the lowest. Thus, at least from the perspective of the average number of critical points, these random landscapes are highly non-convex. Because they are high-dimensional and rugged, these landscapes are relevant to the folding of large molecules and to the performance of neural nets.
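For orientation, the objects in this paragraph can be written in the normalization commonly used for this model. The labels $\mathrm{Crt}_{N,k}(u)$ and $\Theta_{k,p}(u)$ are ours for this sketch. Here $\nabla$ and $\nabla^2$ are the gradient and Hessian on the sphere, $i(\cdot)$ is the index (the number of negative eigenvalues), and $\phi_{\nabla H_N(\sigma)}$ is the density of the gradient.

```latex
H_N(\sigma) = \frac{1}{N^{(p-1)/2}} \sum_{i_1,\dots,i_p=1}^{N} J_{i_1\cdots i_p}\,\sigma_{i_1}\cdots\sigma_{i_p},
\qquad \sigma \in S^{N-1}(\sqrt{N}), \quad J_{i_1\cdots i_p} \stackrel{\text{iid}}{\sim} \mathcal{N}(0,1),

\mathbb{E}\,[H_N(\sigma)\,H_N(\sigma')] = N \Big(\tfrac{\langle \sigma, \sigma' \rangle}{N}\Big)^{p},

\mathrm{Crt}_{N,k}(u) = \#\big\{\sigma : \nabla H_N(\sigma) = 0,\ i(\nabla^2 H_N(\sigma)) = k,\ H_N(\sigma) \le N u\big\},

\mathbb{E}\,\mathrm{Crt}_{N,k}(u) = \int_{S^{N-1}(\sqrt{N})}
  \mathbb{E}\Big[\,\big|\det \nabla^2 H_N(\sigma)\big|\,
  \mathbf{1}\{ i(\nabla^2 H_N(\sigma)) = k,\ H_N(\sigma) \le N u \}
  \,\Big|\, \nabla H_N(\sigma) = 0 \Big]\,
  \phi_{\nabla H_N(\sigma)}(0)\, d\sigma,

\Theta_{k,p}(u) = \lim_{N \to \infty} \frac{1}{N} \log \mathbb{E}\,\mathrm{Crt}_{N,k}(u).
```

The second line is the covariance structure that makes $H_N(\sigma)$ have variance exactly N at every point of the sphere. The last line is the complexity, that is, the exponential growth rate of the expected number of critical points of index k below energy level Nu.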

Subag made a remarkable contribution in 2017, when he used a second-moment approach to show that the total number of critical points concentrates around its mean. In light of the above, for critical points near the bottom of the landscape, Subag's result can be viewed as a statement about the concentration of the number of local minima: the typical behavior of the minima reflects their average behavior. We complete the picture for the bottom of the landscape by showing that the number of critical points of any finite index concentrates around its mean. This information is important for studying the associated dynamics, for instance navigation between local minima. Joint work with Antonio Auffinger and Yi Gu at Northwestern.

Martingales and descents

Series
Stochastics Seminar
Time
Thursday, March 5, 2020 - 15:00 for 1 hour (actually 50 minutes)
Location
Skiles 006
Speaker
Alperen Ozdemir, University of Southern California

We provide a martingale proof of the fact that the number of descents in random permutations is asymptotically normal with an error bound of order n^{-1/2}. The same techniques are shown to be applicable to other descent and descent-related statistics as they satisfy certain recurrence relation conditions. These statistics include inversions, descents in signed permutations, descents in Stirling permutations, the length of the longest alternating subsequences, descents in matchings and two-sided Eulerian numbers.
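The martingale argument itself does not fit in a short example, but the quantities it concerns are easy to compute. The sketch below does three things. First, it counts descents. Second, it builds the Eulerian numbers A(n, k), the number of permutations of n letters with k descents, from the standard insertion recurrence; we assume this is the kind of recurrence the abstract alludes to. Third, it returns the exact mean (n − 1)/2 and variance (n + 1)/12 that center and scale the normal approximation. The function names are hypothetical.

```python
import math
from collections import Counter
from fractions import Fraction
from itertools import permutations


def descents(perm):
    """Number of positions i with perm[i] > perm[i + 1]."""
    return sum(1 for a, b in zip(perm, perm[1:]) if a > b)


def eulerian_row(n):
    """[A(n, 0), ..., A(n, n-1)] via A(m, k) = (k + 1) A(m-1, k) + (m - k) A(m-1, k-1)."""
    row = [1]  # n = 1
    for m in range(2, n + 1):
        prev = row + [0]  # prev[k] = A(m-1, k), padded with A(m-1, m-1) = 0
        row = [(k + 1) * prev[k] + ((m - k) * prev[k - 1] if k > 0 else 0)
               for k in range(m)]
    return row


def descent_mean_variance(n):
    """Exact mean and variance of the number of descents of a uniform permutation."""
    row = eulerian_row(n)
    total = math.factorial(n)
    mean = Fraction(sum(k * a for k, a in enumerate(row)), total)
    second = Fraction(sum(k * k * a for k, a in enumerate(row)), total)
    return mean, second - mean * mean


# Brute-force check of the recurrence for n = 6.
counts = Counter(descents(p) for p in permutations(range(6)))
print(eulerian_row(6), [counts[k] for k in range(6)])
print(descent_mean_variance(50))  # compare with (n - 1)/2 and (n + 1)/12
```

The central limit theorem in the abstract concerns the standardized count $(D_n - (n-1)/2)/\sqrt{(n+1)/12}$. The exact moments above are what make that standardization explicit.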

Lower Deviations and Convexity

Series
Stochastics Seminar
Time
Thursday, February 27, 2020 - 15:05 for 1 hour (actually 50 minutes)
Location
Skiles 006
Speaker
Petros Valettas, University of Missouri, Columbia

While deviation estimates above the mean are a well-studied subject in high-dimensional probability, far less is known about their lower analogues. However, it has been observed in several key situations that lower deviation inequalities exhibit very different and stronger behavior. In this talk I will discuss how convexity can serve as a key feature to (a) explain this distinction, (b) obtain improved lower tail bounds, and (c) characterize the tightness of Gaussian concentration.

Critical first-passage percolation in high dimensions

Series
Stochastics Seminar
Time
Thursday, February 20, 2020 - 15:05 for 1 hour (actually 50 minutes)
Location
Skiles 006
Speaker
Jack Hanson, City College of New York

In critical Bernoulli percolation on $\mathbb{Z}^d$ for $d$ large, it is known that there are a.s. no infinite open clusters. In particular, for $n$ large, every path from the origin to the boundary of $[-n, n]^d$ must contain some closed edges. Let $T_n$ be the (random) minimal number of closed edges in such a path. How does $T_n$ grow with $n$? We present results showing that for $d$ larger than the upper critical dimension for Bernoulli percolation ($d > 6$), $T_n$ is typically of the order $\log \log n$. This is in contrast with the $d = 2$ case, where $T_n$ grows logarithmically. Perhaps surprisingly, the model exhibits another major change in behavior depending on whether $d > 8$.
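To make $T_n$ concrete, here is a small simulation sketch of our own; it is not the speaker's method. It samples edge states lazily and computes $T_n$ exactly with a 0-1 breadth-first search, which works because open edges cost 0 and closed edges cost 1. The function name and its parameters are hypothetical. On $\mathbb{Z}^2$ the critical value is $p_c = 1/2$. In high dimensions $p_c$ has no closed form, so `p` must be supplied. Also, the box $[-n, n]^d$ grows so quickly in $d > 6$ that this naive approach is feasible only for tiny $n$.

```python
import random
from collections import deque


def closed_edge_passage_time(n, d=2, p=0.5, seed=None):
    """Minimal number of closed edges on a nearest-neighbour path from the origin
    to the boundary of [-n, n]^d, each edge being open independently with prob. p.

    Open edges cost 0 and closed edges cost 1, so a 0-1 BFS finds the minimum.
    Edge states are sampled lazily, the first time an edge is examined."""
    rng = random.Random(seed)
    is_open = {}
    origin = (0,) * d
    dist = {origin: 0}
    queue = deque([origin])
    while queue:
        v = queue.popleft()
        if max(abs(c) for c in v) == n:
            return dist[v]  # the first boundary vertex popped has minimal cost
        # v is strictly inside the box here, so every neighbour w stays in [-n, n]^d.
        for i in range(d):
            for step in (-1, 1):
                w = v[:i] + (v[i] + step,) + v[i + 1:]
                edge = frozenset((v, w))
                if edge not in is_open:
                    is_open[edge] = rng.random() < p
                cost = 0 if is_open[edge] else 1
                if dist[v] + cost < dist.get(w, float("inf")):
                    dist[w] = dist[v] + cost
                    if cost == 0:
                        queue.appendleft(w)  # zero-cost moves go to the front
                    else:
                        queue.append(w)
    raise RuntimeError("boundary not reached")


# Planar example at the critical value p = 1/2 of bond percolation on Z^2.
print([closed_edge_passage_time(n, d=2, p=0.5, seed=n) for n in (10, 20, 40, 80)])
```

Two sanity checks follow from the definition. With every edge open (p = 1), $T_n = 0$. With every edge closed (p = 0), $T_n = n$, the graph distance from the origin to the boundary.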

Heat semigroup approach to isoperimetric inequalities in metric measure spaces

Series
Stochastics Seminar
Time
Thursday, January 30, 2020 - 15:05 for 1 hour (actually 50 minutes)
Location
Skiles 006
Speaker
Patricia Alonso-Ruiz, Texas A&M University

The classical isoperimetric problem consists of finding, among all sets with the same volume (measure), one that minimizes the surface area (perimeter measure). In the Euclidean case, balls are known to solve this problem. Formulating the isoperimetric problem, or an isoperimetric inequality, in more general settings requires, in particular, a good notion of perimeter measure.

The starting point of this talk will be a characterization of sets of finite perimeter, due to Ledoux, that involves the heat semigroup associated to a given stochastic process in the space. This approach connects isoperimetric problems and functions of bounded variation (BV) via heat semigroups, and we will extend these ideas to develop a natural definition of BV functions and sets of finite perimeter on metric measure spaces. In particular, we will obtain corresponding isoperimetric inequalities in this setting.

The main assumption on the underlying space will be a non-negative curvature type condition, which we call weak Bakry-Émery, that is satisfied in many examples of interest, including fractals such as (infinite) Sierpinski gaskets and carpets. The results are part of joint work with F. Baudoin, L. Chen, L. Rogers, N. Shanmugalingam and A. Teplyaev.
