Seminars and Colloquia by Series

Network data: Modeling and Statistical Analysis

Series
Job Candidate Talk
Time
Thursday, January 10, 2019 - 11:00 for 1 hour (actually 50 minutes)
Location
Skiles 006
Speaker
Subhabrata Sen, MIT
Network data arises frequently in modern scientific applications. These networks often have specific characteristics such as edge sparsity and heavy-tailed degree distributions. Some broad challenges arising in the analysis of such datasets include (i) developing flexible, interpretable models for network datasets, (ii) testing for goodness of fit, and (iii) provably recovering latent structure from such data. In this talk, we will discuss recent progress in addressing very specific instantiations of these challenges. In particular, we will (1) interpret the Caron-Fox model using notions of graph sub-sampling, (2) study model misspecification due to rare, highly “influential” nodes, and (3) discuss recovery of community structure given additional covariates.

A modern maximum-likelihood approach for high-dimensional logistic regression

Series
Job Candidate Talk
Time
Tuesday, January 8, 2019 - 15:05 for 1 hour (actually 50 minutes)
Location
Skiles 006
Speaker
Pragya Sur, Statistics Department, Stanford University
Logistic regression is arguably the most widely used and studied non-linear model in statistics. Statistical inference based on classical maximum-likelihood theory is ubiquitous in this context. This theory hinges on well-known fundamental results: (1) the maximum likelihood estimate (MLE) is asymptotically unbiased and normally distributed, (2) its variability can be quantified via the inverse Fisher information, and (3) the likelihood ratio test (LRT) is asymptotically a chi-square. In this talk, I will show that in the common modern setting where the number of features and the sample size are both large and comparable, these classical results are far from accurate. In fact, (1) the MLE is biased, (2) its variability is far greater than classical theory predicts, and (3) the LRT is not distributed as a chi-square. Consequently, p-values obtained from classical theory are completely invalid in high dimensions. In turn, I will propose a new theory that characterizes the asymptotic behavior of both the MLE and the LRT in a high-dimensional setting, under some assumptions on the covariate distribution. Empirical evidence demonstrates that this asymptotic theory provides accurate inference in finite samples. Practical implementation of these results requires estimating a single scalar, the overall signal strength, and I will propose a procedure for estimating this parameter precisely. This is based on joint work with Emmanuel Candes and Yuxin Chen.
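To see the phenomenon numerically, the following is a minimal simulation sketch (my own illustration, not code from the talk): fit an unpenalized logistic MLE when p/n = 0.1 and compare the fitted coefficients to the truth. The dimensions, signal strength, and the use of statsmodels are illustrative assumptions.

```python
# Minimal sketch, not from the talk: the dimensions, signal, and estimator
# choices below are illustrative assumptions.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n, p = 2000, 200                  # sample size and dimension, p/n = 0.1
beta = np.zeros(p)
beta[:40] = 0.3                   # sparse signal of moderate strength

X = rng.normal(size=(n, p))
y = rng.binomial(1, 1 / (1 + np.exp(-X @ beta)))

mle = sm.Logit(y, X).fit(disp=0).params     # unpenalized logistic MLE
ratio = mle[:40] / beta[:40]
print("mean(MLE / truth) on signal coordinates:", ratio.mean())  # typically > 1
```

In classical asymptotics this ratio would concentrate around 1; the systematic inflation observed here is the bias the talk quantifies.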

Large-time dynamics in intracellular transport

Series
Job Candidate Talk
Time
Thursday, November 29, 2018 - 11:00 for 1 hour (actually 50 minutes)
Location
Skiles 006
Speaker
Dr. Veronica Ciocanel, Mathematical Biosciences Institute at The Ohio State University
The cellular cytoskeleton ensures the dynamic transport, localization, and anchoring of various proteins and vesicles. In the development of egg cells into embryos, messenger RNA (mRNA) molecules bind to and unbind from cellular roads called microtubules, switching between bidirectional transport, diffusion, and stationary states. Since models of intracellular transport can be analytically intractable, asymptotic methods are useful in understanding effective cargo transport properties as well as their dependence on model parameters. We consider these models in the framework of partial differential equations as well as stochastic processes and derive the effective velocity and diffusivity of cargo at large time for a general class of problems. Including the geometry of the microtubule filaments allows for better prediction of particle localization and for investigation of potential anchoring mechanisms. Our numerical studies incorporating model microtubule structures suggest that anchoring of mRNA-molecular motor complexes may be necessary for localization, promoting healthy development of oocytes into embryos. I will also briefly go over other ongoing projects and applications related to intracellular transport.
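As a loose illustration of the effective quantities mentioned above (my own toy model, not the speaker's equations): a particle that switches at constant rates between an active-transport state with speed v and a purely diffusive state has a well-defined effective velocity and diffusivity at large time, which can be read off from simulated paths. All rates and parameters below are made up.

```python
# Two-state switching transport model: a hypothetical illustration with
# made-up rates, not the model analyzed in the talk.
import numpy as np

rng = np.random.default_rng(1)
v, D = 1.0, 0.5            # run speed; diffusivity in the unbound state
k_off, k_on = 2.0, 1.0     # switching rates: transport -> diffusion, diffusion -> transport
T, dt, n_paths = 200.0, 0.01, 2000

x = np.zeros(n_paths)
moving = rng.random(n_paths) < k_on / (k_on + k_off)   # start from stationary fractions
for _ in range(int(T / dt)):
    x += np.where(moving, v * dt, np.sqrt(2 * D * dt) * rng.normal(size=n_paths))
    switch = rng.random(n_paths) < np.where(moving, k_off, k_on) * dt
    moving ^= switch

print("effective velocity ~", x.mean() / T,
      "(stationary-fraction prediction:", v * k_on / (k_on + k_off), ")")
print("effective diffusivity ~", x.var() / (2 * T))
```

The asymptotic methods in the talk deliver such effective quantities analytically, for much richer models that include the microtubule geometry.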

Polynomial Techniques in Quantitative Linear Algebra

Series
Job Candidate Talk
Time
Wednesday, March 7, 2018 - 11:00 for 1 hour (actually 50 minutes)
Location
Skiles 005
Speaker
Adam Marcus, Princeton University
I will discuss a recent line of research that uses properties of real-rooted polynomials to get quantitative estimates in combinatorial linear algebra problems. I will start by discussing the main result that bridges the two areas (the "method of interlacing polynomials") and show some examples of where it has been used successfully (e.g., Ramanujan families and the Kadison–Singer problem). I will then discuss some more recent work that attempts to make the method more accessible by providing generic tools, and also attempts to explain the accuracy of the method by linking it to random matrix theory and (in particular) free probability. I will end by mentioning some current research initiatives as well as possible future directions.
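A small concrete instance of the real-rootedness facts behind interlacing families (my own toy check, not code from the talk): by a classical result of Godsil and Gutman, averaging the characteristic polynomials of all ±1 edge signings of a graph gives its matching polynomial, which is real-rooted; this is the starting point of the Ramanujan construction. The graph below is an arbitrary small example.

```python
# Numerical check of a real-rootedness fact used in interlacing families;
# the graph is an arbitrary illustrative example.
import itertools
import numpy as np

edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]   # a 4-cycle plus one chord
n = 4

avg_coeffs = np.zeros(n + 1)
for signs in itertools.product([-1, 1], repeat=len(edges)):
    A = np.zeros((n, n))
    for (i, j), s in zip(edges, signs):
        A[i, j] = A[j, i] = s
    avg_coeffs += np.poly(A)            # characteristic polynomial coefficients
avg_coeffs /= 2 ** len(edges)

roots = np.roots(avg_coeffs)
print("roots of the expected characteristic polynomial:", np.sort(roots.real))
print("largest imaginary part:", np.abs(roots.imag).max())   # ~ 0, i.e. real-rooted
```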

Mean Field Variational Inference: Computational and Statistical Guarantees

Series
Job Candidate Talk
Time
Thursday, February 1, 2018 - 11:00 for 1 hour (actually 50 minutes)
Location
Skiles 006
Speaker
Anderson Ye Zhang, Yale
Mean field variational inference is widely used in statistics and machine learning to approximate posterior distributions. Despite its popularity, there is remarkably little fundamental theoretical justification for it. The success of variational inference lies mainly in its iterative algorithm, which, to the best of our knowledge, has never been investigated for any high-dimensional or complex model. In this talk, we establish computational and statistical guarantees for mean field variational inference. Using the community detection problem as a test case, we show that its iterative algorithm converges linearly to the optimal statistical accuracy within log n iterations. We are optimistic about going beyond community detection and understanding mean field methods under a general class of latent variable models. In addition, the technique we develop can be extended to analyzing Expectation-Maximization and the Gibbs sampler.
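To make the iterative algorithm concrete, here is a minimal sketch of naive mean-field (coordinate-ascent) updates for a symmetric two-community stochastic block model with known connection probabilities. The sizes, probabilities, and the random initialization are my own assumptions; the guarantees discussed in the talk are established under a suitable initialization.

```python
# Naive mean-field updates for a two-block SBM; an illustrative sketch with
# assumed parameters, not the exact algorithm or regime from the talk.
import numpy as np

rng = np.random.default_rng(2)
n, p, q = 300, 0.10, 0.03
z = rng.integers(0, 2, n)                                # true communities
P = np.where(z[:, None] == z[None, :], p, q)
A = np.triu((rng.random((n, n)) < P).astype(float), 1)
A = A + A.T                                              # symmetric adjacency, no self-loops

lp, lq = np.log(p / q), np.log((1 - p) / (1 - q))
tau = rng.random(n)                                      # q(z_i = 1), random init
for _ in range(30):                                      # theory: ~log n iterations suffice
    m = 2 * tau - 1
    h = lp * (A @ m) + lq * ((1 - A) @ m - m)            # exclude the diagonal term
    tau = 1 / (1 + np.exp(-h))

labels = (tau > 0.5).astype(int)
print("accuracy (up to label swap):",
      max((labels == z).mean(), (labels != z).mean()))
```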

Invertibility and spectrum of random matrices: a convex-geometric approach

Series
Job Candidate Talk
Time
Tuesday, January 23, 2018 - 11:00 for 1 hour (actually 50 minutes)
Location
Skiles 005
Speaker
Konstantin Tikhomirov, Princeton University
Convex-geometric methods, involving random projection operators and coverings, have been successfully used in the study of the largest and smallest singular values, delocalization of eigenvectors, and in establishing the limiting spectral distribution for certain random matrix models. Among further applications of those methods in computer science and statistics are restricted invertibility and dimension reduction, as well as approximation of covariance matrices of multidimensional distributions. Conversely, random linear operators play a very important role in geometric functional analysis. In this talk, I will discuss some recent results (by my collaborators and myself) within convex geometry and the theory of random matrices, focusing on invertibility of square non-Hermitian random matrices (with applications to numerical analysis and the study of the limiting spectral distribution of directed d-regular graphs), approximation of covariance matrices (in particular, a strengthening of the Bai–Yin theorem), as well as some applications of random operators in convex geometry.
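As a quick numerical companion to the invertibility discussion (my own toy experiment, not a result from the talk): for an n × n random ±1 matrix, the smallest singular value is typically of order n^{-1/2}, so in particular such matrices are invertible with high probability.

```python
# Empirical distribution of the smallest singular value of random sign matrices;
# sizes and trial counts are arbitrary illustrative choices.
import numpy as np

rng = np.random.default_rng(3)
n, trials = 200, 200
s_min = np.array([
    np.linalg.svd(rng.choice([-1.0, 1.0], size=(n, n)), compute_uv=False)[-1]
    for _ in range(trials)
])
print("median smallest singular value * sqrt(n):", np.median(s_min) * np.sqrt(n))
print("fraction of numerically singular samples:", (s_min < 1e-10).mean())
```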

TBA by Cheng Mao

Series
Job Candidate Talk
Time
Thursday, January 18, 2018 - 11:00 for 1 hour (actually 50 minutes)
Location
Skiles 006
Speaker
Cheng Mao, Yale University

High Dimensional Inference: Semiparametrics, Counterfactuals, and Heterogeneity

Series
Job Candidate Talk
Time
Tuesday, January 16, 2018 - 15:00 for 1 hour (actually 50 minutes)
Location
Skiles 006
Speaker
Ying Zhu, Michigan State University
Semiparametric regressions enjoy the flexibility of nonparametric models as well as the interpretability of linear models. These advantages can be further leveraged with recent advances in high-dimensional statistics. This talk begins with a simple partially linear model, Yi = Xi β* + g*(Zi) + εi, where the parameter vector of interest, β*, is high dimensional but sufficiently sparse, and g* is an unknown nuisance function. In spite of its simple form, this high-dimensional partially linear model plays a crucial role in counterfactual studies of heterogeneous treatment effects. In the first half of this talk, I present an inference procedure for any sub-vector (regardless of its dimension) of the high-dimensional β*. This method does not require the “beta-min” condition and also works when the vector of covariates, Zi, is high dimensional, provided that the function classes that the E(Xij | Zi)'s and E(Yi | Zi) belong to exhibit certain sparsity features, e.g., a sparse additive decomposition structure. In the second half of this talk, I discuss the connections between semiparametric modeling and Rubin's Causal Framework, as well as the applications of various methods (including the one from the first half of this talk and those from my other papers) in counterfactual studies that are enriched by “big data”.
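The sketch below is a generic Robinson-style residual-on-residual procedure for the partially linear model above, not the speaker's method: residualize Y and each coordinate of X on a spline basis in Z, then run a sparse (lasso) regression of residuals on residuals. The dimensions, the choice of g*, the spline basis, and the scikit-learn estimators are all illustrative assumptions.

```python
# Generic residualize-then-lasso sketch for Y = X beta* + g*(Z) + eps;
# not the inference procedure from the talk, just an illustration of the model.
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression
from sklearn.preprocessing import SplineTransformer

rng = np.random.default_rng(4)
n, p = 500, 200
Z = rng.uniform(-1, 1, size=(n, 1))
X = rng.normal(size=(n, p)) + np.sin(np.pi * Z)          # E(X_ij | Z_i) is smooth in Z_i
beta = np.zeros(p); beta[:5] = 1.0                       # sparse beta*
y = X @ beta + np.cos(np.pi * Z[:, 0]) + rng.normal(size=n)

B = SplineTransformer(degree=3, n_knots=10).fit_transform(Z)   # basis for functions of Z
res_y = y - LinearRegression().fit(B, y).predict(B)
res_X = X - LinearRegression().fit(B, X).predict(B)

beta_hat = Lasso(alpha=0.05).fit(res_X, res_y).coef_
print("estimated support of beta*:", np.flatnonzero(np.abs(beta_hat) > 0.1))
```

Point estimation of this kind is only the starting point; the contribution described in the abstract is valid inference for arbitrary sub-vectors of β* without the beta-min condition.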

Least squares estimation: beyond Gaussian regression models

Series
Job Candidate Talk
Time
Tuesday, December 5, 2017 - 11:00 for 1 hour (actually 50 minutes)
Location
Skiles 006
Speaker
Qiyang Han, University of Washington
We study the convergence rate of the least squares estimator (LSE) in a regression model with possibly heavy-tailed errors. Despite its importance in practical applications, theoretical understanding of this problem has been limited. We first show that from a worst-case perspective, the convergence rate of the LSE in a general non-parametric regression model is given by the maximum of the Gaussian regression rate and the noise rate induced by the errors. In the more difficult statistical model where the errors only have a second moment, we further show that the sizes of the 'localized envelopes' of the model give a sharp interpolation for the convergence rate of the LSE between the worst-case rate and the (optimal) parametric rate. These results indicate both certain positive and negative aspects of the LSE as an estimation procedure in a heavy-tailed regression setting. The key technical innovation is a new multiplier inequality that sharply controls the size of the multiplier empirical process associated with the LSE, which also finds applications in shape-restricted and sparse linear regression problems.
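As a small illustrative experiment in the spirit of this abstract (my own setup, not the speaker's): fit a shape-restricted least squares estimator, here isotonic regression, under Gaussian versus heavy-tailed errors with barely more than two moments, and track the empirical L2 error as n grows.

```python
# Isotonic LSE under light- vs heavy-tailed noise; an illustrative experiment,
# not a reproduction of the results discussed in the talk.
import numpy as np
from sklearn.isotonic import IsotonicRegression

rng = np.random.default_rng(5)
f = lambda x: x ** 2                          # a monotone regression function on [0, 1]

for noise in ("gaussian", "t(2.1)"):
    errs = []
    for n in (200, 800, 3200):
        x = np.sort(rng.uniform(0, 1, n))
        eps = rng.normal(size=n) if noise == "gaussian" else rng.standard_t(2.1, size=n)
        y = f(x) + eps
        fhat = IsotonicRegression().fit_transform(x, y)
        errs.append(np.mean((fhat - f(x)) ** 2))
    print(noise, ["%.4f" % e for e in errs])
```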
