Seminars and Colloquia by Series

Bias-Variance Tradeoffs in Joint Spectral Embeddings

Series
Stochastics Seminar
Time
Thursday, November 5, 2020 - 15:30 for 1 hour (actually 50 minutes)
Location
https://bluejeans.com/974631214
Speaker
Daniel Sussman, Boston University

We consider the ramifications of utilizing biased latent position estimates in subsequent statistical analysis in exchange for sizable variance reductions in finite networks. We establish an explicit bias-variance tradeoff for latent position estimates produced by the omnibus embedding in the presence of heterogeneous network data. We reveal an analytic bias expression, derive a uniform concentration bound on the residual term, and prove a central limit theorem characterizing the distributional properties of these estimates.

Link to the BlueJeans meeting https://bluejeans.com/974631214
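
For readers unfamiliar with the omnibus embedding, here is a minimal numpy sketch, assuming the standard construction in which the (s, t) block of the omnibus matrix is (A_s + A_t)/2 and latent positions are read off from a scaled spectral decomposition. Function names and parameters are illustrative, not the speaker's code.

```python
import numpy as np

def omnibus_embedding(adjs, d):
    """Embed m graphs on a shared vertex set via the omnibus matrix.

    Builds the mn x mn block matrix with (s, t) block (A_s + A_t) / 2,
    then returns d-dimensional scaled-eigenvector latent positions,
    one n x d block per graph.
    """
    m, n = len(adjs), adjs[0].shape[0]
    M = np.zeros((m * n, m * n))
    for s in range(m):
        for t in range(m):
            M[s*n:(s+1)*n, t*n:(t+1)*n] = (adjs[s] + adjs[t]) / 2
    vals, vecs = np.linalg.eigh(M)
    top = np.argsort(np.abs(vals))[::-1][:d]       # top-d eigenpairs by magnitude
    X = vecs[:, top] * np.sqrt(np.abs(vals[top]))  # scaled eigenvectors
    return [X[s*n:(s+1)*n] for s in range(m)]      # latent positions per graph
```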

Higher-order fluctuations in dense random graph models (note the unusual time: 5pm)

Series
Stochastics Seminar
Time
Thursday, October 22, 2020 - 17:00 for 1 hour (actually 50 minutes)
Location
https://bluejeans.com/751242993/PASSWORD (To receive the password, please email Lutz Warnke)
Speaker
Adrian Roellin, National University of Singapore

Dense graph limit theory is essentially a first-order limit theory analogous to the classical Law of Large Numbers. Is there a corresponding central limit theorem? We believe so. Using the language of Gaussian Hilbert Spaces and the comprehensive theory of generalised U-statistics developed by Svante Janson in the 90s, we identify a collection of Gaussian measures (aka white noise processes) that describes the fluctuations of all orders of magnitude for a broad family of random graphs. We complement the theory with error bounds using a new variant of Stein’s method for multivariate normal approximation, which allows us to also generalise Janson’s theory in some important aspects. This is joint work with Gursharn Kaur.

Please note the unusual time: 5pm
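
As a concrete illustration of the two scales at play (not taken from the talk), the following simulation looks at triangle counts in the dense Erdős–Rényi graph G(n, p): the counts obey a law of large numbers at scale n^3, while for fixed p the fluctuations live at scale n^2.

```python
import numpy as np

rng = np.random.default_rng(0)

def triangle_count(A):
    # number of triangles in a simple graph = trace(A^3) / 6
    return np.trace(A @ A @ A) / 6

def sample_counts(n, p, reps):
    counts = []
    for _ in range(reps):
        U = rng.random((n, n))
        A = np.triu((U < p).astype(float), 1)   # upper triangle, no loops
        A = A + A.T                             # symmetrize
        counts.append(triangle_count(A))
    return np.array(counts)

counts = sample_counts(n=200, p=0.1, reps=100)
print(counts.mean() / 200**3, 0.1**3 / 6)   # first order: counts / n^3 -> p^3 / 6
print(counts.std() / 200**2)                # second order: fluctuations are O(n^2)
```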

Coalescence estimates for the corner growth model with exponential weights

Series
Stochastics Seminar
Time
Thursday, October 15, 2020 - 15:30 for 1 hour (actually 50 minutes)
Location
Bluejeans (link to be sent)
Speaker
Xiao Shen, University of Wisconsin

(Joint work with Timo Seppäläinen) We establish estimates for the coalescence time of semi-infinite directed geodesics in the planar corner growth model with i.i.d. exponential weights. There are four estimates: upper and lower bounds on the probabilities of both fast and slow coalescence on the correct spatial scale with exponent 3/2. Our proofs utilize a geodesic duality introduced by Pimentel and properties of the increment-stationary last-passage percolation process. For fast coalescence our bounds are new and they have matching optimal exponential order of magnitude. For slow coalescence, we reproduce bounds proved earlier with integrable probability inputs, except that our upper bound misses the optimal order by a logarithmic factor.
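
For orientation, here is a short numpy sketch of the corner growth model itself (not of the coalescence estimates): last-passage times satisfy the recursion G(i, j) = w(i, j) + max(G(i-1, j), G(i, j-1)), and the geodesic is recovered by backtracking. Names and sizes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def lpp_geodesic(n):
    """Last-passage times in an n x n box with i.i.d. Exp(1) weights,
    plus the backtracked point-to-point geodesic."""
    w = rng.exponential(1.0, size=(n, n))
    G = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            up = G[i - 1, j] if i > 0 else 0.0
            left = G[i, j - 1] if j > 0 else 0.0
            G[i, j] = w[i, j] + max(up, left)
    # backtrack the (a.s. unique) geodesic from the corner to the origin
    i = j = n - 1
    path = [(i, j)]
    while (i, j) != (0, 0):
        if i == 0:
            j -= 1
        elif j == 0:
            i -= 1
        elif G[i - 1, j] >= G[i, j - 1]:
            i -= 1
        else:
            j -= 1
        path.append((i, j))
    return G, path[::-1]

G, path = lpp_geodesic(200)
print(G[-1, -1] / 200)   # law of large numbers: G(n, n)/n -> 4 for Exp(1) weights
```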

Approximate Kernel Principal Component Analysis: Computational vs. Statistical Trade-off

Series
Stochastics Seminar
Time
Thursday, October 8, 2020 - 15:30 for 1 hour (actually 50 minutes)
Location
https://gatech.webex.com/gatech/j.php?MTID=mdd4512d3d11623149a0bd46d9fc086c8
Speaker
Bharath Sriperumbudur, Pennsylvania State University

Kernel principal component analysis (KPCA) is a popular non-linear dimensionality reduction technique, which generalizes classical linear PCA by finding functions in a reproducing kernel Hilbert space (RKHS) such that the function evaluation at a random variable $X$ has maximum variance. Despite its popularity, kernel PCA suffers from poor scalability in big-data scenarios, as it involves solving an $n \times n$ eigensystem with computational complexity $O(n^3)$, where $n$ is the number of samples. To address this issue, in this work we consider a random feature approximation to kernel PCA, which requires solving only an $m \times m$ eigenvalue problem and therefore has computational complexity $O(m^3 + nm^2)$, where $m$ is the number of random features; the approximate method is thus computationally efficient whenever $m < n$. The goal of this work is to investigate the trade-off between the computational and statistical behaviors of approximate KPCA, i.e., whether the computational gain is achieved at the cost of statistical efficiency. We show that approximate KPCA is both computationally and statistically efficient compared to KPCA in terms of the error associated with reconstructing a kernel function based on its projection onto the corresponding eigenspaces.

Link to Cisco Webex meeting: https://gatech.webex.com/gatech/j.php?MTID=mdd4512d3d11623149a0bd46d9fc086c8
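
A minimal sketch of a random feature approximation in the Gaussian-kernel case, using random Fourier features (Rahimi and Recht); this is one standard instantiation, not necessarily the exact scheme analyzed in the talk. Note that only an $m \times m$ covariance is ever eigendecomposed.

```python
import numpy as np

rng = np.random.default_rng(0)

def rff_pca(X, m, d, gamma=1.0):
    """Approximate Gaussian-kernel PCA via m random Fourier features:
    k(x, y) = exp(-gamma ||x - y||^2) ~ z(x)^T z(y),
    z(x) = sqrt(2/m) cos(W x + b), W ~ N(0, 2 * gamma * I).
    PCA needs only the m x m feature covariance: O(m^3 + n m^2)."""
    n, p = X.shape
    W = rng.normal(0.0, np.sqrt(2 * gamma), size=(m, p))
    b = rng.uniform(0, 2 * np.pi, size=m)
    Z = np.sqrt(2.0 / m) * np.cos(X @ W.T + b)   # n x m feature matrix
    Zc = Z - Z.mean(axis=0)                      # center in feature space
    C = Zc.T @ Zc / n                            # m x m covariance
    vals, vecs = np.linalg.eigh(C)
    top = vecs[:, ::-1][:, :d]                   # top-d eigenvectors
    return Zc @ top                              # d-dimensional scores

X = rng.normal(size=(1000, 5))
scores = rff_pca(X, m=100, d=2)                  # m = 100 << n = 1000
```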

A precise high-dimensional theory for Boosting

Series
Stochastics Seminar
Time
Thursday, October 1, 2020 - 15:30 for 1 hour (actually 50 minutes)
Location
https://bluejeans.com/276389634
Speaker
Pragya Sur, Harvard University

This talk will introduce a precise high-dimensional asymptotic theory for Boosting (AdaBoost) on separable data, taking both statistical and computational perspectives. We will consider the common modern setting where the number of features p and the sample size n are both large and comparable, and in particular, look at scenarios where the data is asymptotically separable. Under a class of statistical models, we will provide an (asymptotically) exact analysis of the generalization error of AdaBoost, when the algorithm interpolates the training data and maximizes an empirical L1 margin. On the computational front, we will provide a sharp analysis of the stopping time when boosting approximately maximizes the empirical L1 margin. Our theory provides several insights into properties of Boosting; for instance, the larger the dimensionality ratio p/n, the faster the optimization reaches interpolation. At the heart of our theory lies an in-depth study of the maximum L1-margin, which can be accurately described by a new system of non-linear equations; we analyze this margin and the properties of this system, using Gaussian comparison techniques and a novel uniform deviation argument. Time permitting, I will present analogous results for a new class of boosting algorithms that correspond to Lq geometry, for q>1. This is based on joint work with Tengyuan Liang.
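
To make the objects concrete: on separable data, boosting with linear weak learners amounts to coordinate descent on the exponential loss, and the quantity tracked below is the empirical L1 margin the abstract refers to. This is a rough sketch with illustrative step size and stopping rule, not the speaker's setup.

```python
import numpy as np

rng = np.random.default_rng(0)

def boost_l1_margin(X, y, steps=2000, lr=0.1):
    """Coordinate-descent boosting on the exponential loss, one coordinate
    (weak learner) per step. On separable data the iterates approximately
    maximize the empirical L1 margin: min_i y_i x_i^T beta / ||beta||_1."""
    n, p = X.shape
    beta = np.zeros(p)
    for _ in range(steps):
        w = np.exp(-y * (X @ beta))                    # exponential-loss weights
        grad = -(X * (w * y)[:, None]).mean(axis=0)
        j = np.argmax(np.abs(grad))                    # steepest coordinate
        beta[j] -= lr * np.sign(grad[j])
    margins = y * (X @ beta) / np.abs(beta).sum()
    return beta, margins.min()

# separable data with p and n large and comparable, as in the talk's regime
n, p = 200, 300
X = rng.normal(size=(n, p))
y = np.sign(X @ rng.normal(size=p))
beta, margin = boost_l1_margin(X, y)
print(margin)   # empirical L1 margin; positive once the data are interpolated
```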

Statistical Inference in Popularity Adjusted Stochastic Block Model

Series
Stochastics Seminar
Time
Thursday, September 24, 2020 - 15:30 for 1 hour (actually 50 minutes)
Location
https://ucf.zoom.us/j/92646603521?pwd=TnRGSVo1WXo2bjE4Y3JEVGRPSmNWQT09
Speaker
Marianna Pensky, University of Central Florida

The talk considers the Popularity Adjusted Block Model (PABM) introduced by Sengupta and Chen (2018). We argue that the main appeal of the PABM is the flexibility of the spectral properties of the graph, which makes the PABM an attractive choice for modeling networks that appear in, for example, the biological sciences. In addition, to the best of our knowledge, the PABM is the only stochastic block model that allows one to treat network sparsity as structural sparsity describing community patterns, rather than as an attribute of the network as a whole.

Link to Zoom meeting: https://ucf.zoom.us/j/92646603521?pwd=TnRGSVo1WXo2bjE4Y3JEVGRPSmNWQT09
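
A hypothetical generator for the PABM, assuming the standard parametrization in which P_ij = Lambda[i, c(j)] * Lambda[j, c(i)] for popularity parameters Lambda and community labels c; names and parameter ranges are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_pabm(n, K):
    """Sample an adjacency matrix from the Popularity Adjusted Block Model.
    Lambda[i, k] is node i's popularity towards community k; sparsity can
    be structural, since zeros in Lambda switch off community-pair patterns."""
    c = rng.integers(0, K, size=n)               # community labels
    Lam = rng.uniform(0.1, 0.9, size=(n, K))     # popularity parameters
    idx = np.arange(n)
    P = Lam[idx[:, None], c[None, :]] * Lam[idx[None, :], c[:, None]]
    U = rng.random((n, n))
    A = np.triu((U < P).astype(int), 1)          # upper triangle, no loops
    return A + A.T, c, Lam

A, c, Lam = sample_pabm(n=300, K=3)
```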

Couplings of Markov chain Monte Carlo and their uses

Series
Stochastics Seminar
Time
Thursday, September 10, 2020 - 15:30 for 1 hour (actually 50 minutes)
Location
https://us02web.zoom.us/j/83378796301
Speaker
Pierre Jacob, Harvard University

Markov chain Monte Carlo (MCMC) methods are state-of-the-art techniques for numerical integration. MCMC methods yield estimators that converge to the integrals of interest as the number of iterations grows, obtained from Markov chains that converge to stationarity. This iterative asymptotic justification is not ideal. Indeed, the literature offers little practical guidance on how many iterations should be performed, despite decades of research on the topic. This talk will describe a computational approach to address some of these issues. The key idea, pioneered by Glynn and Rhee in 2014, is to generate couplings of Markov chains, whereby pairs of chains contract, coalesce or even "meet" after a random number of iterations; we will see that these meeting times, which can be simulated in many practical settings, contain useful information about the finite-time marginal distributions of the chains. This talk will provide an overview of this line of research, joint work with John O'Leary, Yves Atchadé and various collaborators.
The main reference is available here: https://rss.onlinelibrary.wiley.com/doi/abs/10.1111/rssb.12336
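
A minimal sketch of the coupling idea for a random-walk Metropolis-Hastings chain: propose jointly from a maximal coupling, share the acceptance uniform, and record the first time the two chains meet. This follows the general recipe of the referenced paper, simplified for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def max_coupling_normal(mu1, mu2, sigma):
    """Sample (X, Y) with X ~ N(mu1, sigma^2), Y ~ N(mu2, sigma^2),
    maximizing P(X = Y) (standard rejection construction)."""
    logpdf = lambda x, mu: -0.5 * ((x - mu) / sigma) ** 2
    X = rng.normal(mu1, sigma)
    if np.log(rng.random()) + logpdf(X, mu1) <= logpdf(X, mu2):
        return X, X                              # both chains propose the same point
    while True:
        Y = rng.normal(mu2, sigma)
        if np.log(rng.random()) + logpdf(Y, mu2) > logpdf(Y, mu1):
            return X, Y

def meeting_time(logpi, x0, y0, sigma=1.0, max_iter=10_000):
    """Two coupled random-walk MH chains targeting pi; a shared uniform
    makes the accept/reject decisions agree once the proposals match,
    so the chains stay together after they meet."""
    x, y = x0, y0
    for t in range(1, max_iter + 1):
        xp, yp = max_coupling_normal(x, y, sigma)
        u = np.log(rng.random())                 # shared acceptance variable
        if u < logpi(xp) - logpi(x): x = xp
        if u < logpi(yp) - logpi(y): y = yp
        if x == y:
            return t                             # meeting time
    return None

logpi = lambda x: -0.5 * x * x                   # standard normal target
print(meeting_time(logpi, x0=10.0, y0=-10.0))
```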

Eigenvectors' overlaps for integrable models of non-Hermitian random matrices

Series
Stochastics Seminar
Time
Thursday, April 2, 2020 - 15:00 for 1 hour (actually 50 minutes)
Location
Skiles 006
Speaker
Guillaume Dubach

Right and left eigenvectors of non-Hermitian matrices form a bi-orthogonal system, to which one can associate homogeneous quantities known as overlaps. The matrix of overlaps quantifies the stability of the spectrum and characterizes the joint increments of the eigenvalues under Dyson-type dynamics. Overlaps first appeared in the physics literature: Chalker and Mehlig calculated their conditional expectation for complex Ginibre matrices (1998). For the same model, we extend their results by deriving the distribution of the overlaps and their correlations (joint work with P. Bourgade). Similar results can be obtained for quaternionic Gaussian matrices, as well as matrices from the spherical and truncated-unitary ensembles.
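
For experimentation, the overlaps are easy to compute numerically. A sketch for the complex Ginibre ensemble follows, using the normalization l_i^* r_j = delta_ij, under which the diagonal overlap is O_ii = ||l_i||^2 ||r_i||^2 and Chalker-Mehlig predicts E[O_ii] ~ N (1 - |lambda_i|^2) inside the unit disk.

```python
import numpy as np

rng = np.random.default_rng(0)

def overlaps(N):
    """Diagonal overlaps O_ii for an N x N complex Ginibre matrix
    (i.i.d. complex Gaussian entries of variance 1/N)."""
    G = (rng.normal(size=(N, N)) + 1j * rng.normal(size=(N, N))) / np.sqrt(2 * N)
    vals, R = np.linalg.eig(G)        # columns of R: right eigenvectors
    L = np.linalg.inv(R).conj().T     # columns of L: left eigenvectors, L^* R = I
    O = np.sum(np.abs(L) ** 2, axis=0) * np.sum(np.abs(R) ** 2, axis=0)
    return vals, O

vals, O = overlaps(200)
print(O.mean())   # grows linearly with N, as Chalker-Mehlig predicts
```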

Complexity of the pure spherical p-spin model

Series
Stochastics Seminar
Time
Thursday, March 12, 2020 - 15:05 for 1 hour (actually 50 minutes)
Location
Skiles 006
Speaker
Julian Gold, Northwestern University

The pure spherical p-spin model is a Gaussian random polynomial H of degree p on an N-dimensional sphere, with N large. The sphere is viewed as the state space of a physical system with many degrees of freedom, and the random function H is interpreted as a smooth assignment of energy to each state, i.e. as an energy landscape. 

In 2012, Auffinger, Ben Arous and Černý used the Kac-Rice formula to count the average number of critical points of H having a given index, and with energy below a given value. This number is exponentially large in N for p > 2, and the rate of growth itself is a function of the index chosen and of the energy cutoff. This function, called the complexity, reveals interesting topological information about the landscape H: it was shown that below an energy threshold marking the bottom of the landscape, all critical points are local minima or saddles with an index not diverging with N. It was shown that these finite-index saddles have an interesting nested structure, despite their number being exponentially dominated by minima up to the energy threshold. The total complexity (considering critical points of any index) was shown to be positive at energies close to the lowest. Thus, at least from the perspective of the average number of critical points, these random landscapes are very non-convex. The high-dimensional and rugged aspects of these landscapes make them relevant to the folding of large molecules and the performance of neural nets. 

Subag made a remarkable contribution in 2017, when he used a second-moment approach to show that the total number of critical points concentrates around its mean. In light of the above, when considering critical points near the bottom of the landscape, we can view Subag's result as a statement about the concentration of the number of local minima. His result demonstrated that the typical behavior of the minima reflects their average behavior. We complete the picture for the bottom of the landscape by showing that the number of critical points of any finite index concentrates around its mean. This information is important for studying the associated dynamics, for instance navigation between local minima. Joint work with Antonio Auffinger and Yi Gu at Northwestern. 
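
To make the landscape tangible, here is a small illustrative sketch (not the Kac-Rice computation from the papers) that samples a pure 3-spin Hamiltonian and runs projected gradient descent to find a low-energy critical point; sizes and step counts are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def pspin(N):
    """Pure 3-spin Hamiltonian H(s) = N^{-1} sum_{ijk} J_{ijk} s_i s_j s_k
    on the sphere |s| = sqrt(N), with i.i.d. standard Gaussian couplings."""
    J = rng.normal(size=(N, N, N))
    H = lambda s: np.einsum('ijk,i,j,k->', J, s, s, s) / N
    grad = lambda s: (np.einsum('ijk,j,k->i', J, s, s)
                      + np.einsum('ijk,i,k->j', J, s, s)
                      + np.einsum('ijk,i,j->k', J, s, s)) / N
    return H, grad

def descend(N, steps=500, lr=0.01):
    """Projected gradient descent: step along -grad H, re-project onto the sphere."""
    H, grad = pspin(N)
    s = rng.normal(size=N)
    s *= np.sqrt(N) / np.linalg.norm(s)
    for _ in range(steps):
        s = s - lr * grad(s)
        s *= np.sqrt(N) / np.linalg.norm(s)
    return H(s) / N

# energy per spin; for p = 3 the energy threshold sits near -1.633
# and the ground state near -1.657 as N grows
print(descend(N=40))
```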
