Seminars and Colloquia by Series

Sparse Signal Detection with Binary Outcomes

Series
Job Candidate Talk
Time
Thursday, February 23, 2017 - 11:00 for 1 hour (actually 50 minutes)
Location
Skiles 005
Speaker
Rajarshi Mukherjee, Department of Statistics, Stanford University
In this talk, I will discuss some examples of sparse signal detection problems in the context of binary outcomes. These will be motivated by examples from next-generation sequencing association studies, understanding heterogeneities in large-scale networks, and exploring opinion distributions over networks. Moreover, these examples will serve as templates to explore interesting phase transitions present in such studies. In particular, these phase transitions will reveal a difference between studies with possibly dependent binary outcomes and studies with Gaussian outcomes. The theoretical developments will be further complemented with numerical results.
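As a toy numerical illustration of the kind of phase transition involved (my own sketch, not taken from the talk), the snippet below plants a sparse mean shift in Gaussian noise and compares an aggregate (sum) test with a max test; near the detection boundary the sparse signal is essentially invisible to the sum statistic but is caught by the maximum.

```python
# Toy illustration (my own sketch, not from the talk): plant a sparse mean
# shift in Gaussian noise and compare an aggregate (sum) test to a max test.
import numpy as np

rng = np.random.default_rng(0)
n, k, reps = 10_000, 20, 200
mu = np.sqrt(2 * np.log(n))        # signal strength near the detection boundary

def stats(x):
    return x.sum() / np.sqrt(len(x)), x.max()

null_sum, null_max = zip(*(stats(rng.standard_normal(n)) for _ in range(reps)))
alt_sum, alt_max = [], []
for _ in range(reps):
    x = rng.standard_normal(n)
    x[:k] += mu                    # k non-null coordinates (sparse regime)
    s, m = stats(x)
    alt_sum.append(s)
    alt_max.append(m)

# Power at the empirical 95% null quantile of each statistic
for name, null_v, alt_v in [("sum", null_sum, alt_sum), ("max", null_max, alt_max)]:
    thr = np.quantile(null_v, 0.95)
    print(name, "test power:", np.mean(np.array(alt_v) > thr))
```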

Probabilistic methods for pathogen and copy number evolution

Series
Job Candidate Talk
Time
Tuesday, January 24, 2017 - 10:00 for 1 hour (actually 50 minutes)
Location
Skiles 005
Speaker
Shishi Luo, UC Berkeley
Biology is becoming increasingly quantitative, with large genomic datasets being curated at a rapid rate. Sound mathematical modeling as well as data science approaches are both needed to take advantage of these newly available datasets. I will describe two projects that span these approaches. The first is a Markov chain model of natural selection acting at two scales, motivated by the virulence-transmission tradeoff from pathogen evolution. This stochastic model, under a natural scaling, converges to a nonlinear deterministic system for which we can analytically derive steady-state behavior. This analysis, along with simulations, leads to general properties of selection at two scales. The second project is a bioinformatics pipeline that identifies gene copy number variants, currently a difficult problem in modern genomics. This quantification of copy number variation in turn generates new mathematical questions that require the type of probabilistic modelling used in the first project.
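A minimal sketch of the two-scale idea (an illustrative toy of my own, not the speaker's model): within each host a virulent pathogen type has a replication advantage, while between hosts virulence carries a transmission cost, so the two levels of selection push the type frequency in opposite directions.

```python
# Toy two-scale selection sketch (illustrative only, not the talk's model):
# within each host a "virulent" type has a replication advantage, while
# virulent-heavy hosts transmit less, so the two scales oppose each other.
import numpy as np

rng = np.random.default_rng(1)
H, N, T = 200, 50, 300          # hosts, pathogens per host, host generations
s_in, s_out = 0.10, 0.20        # within-host advantage, transmission cost

p = np.full(H, 0.1)             # virulent-type frequency in each host
for _ in range(T):
    # Within-host step: selection plus drift (Wright-Fisher on N pathogens)
    w = p * (1 + s_in) / (1 + s_in * p)
    p = rng.binomial(N, w) / N
    # Between-host step: hosts reproduce with transmission fitness 1 - s_out*p
    fit = 1 - s_out * p
    p = p[rng.choice(H, size=H, p=fit / fit.sum())]

print("mean virulent-type frequency after", T, "generations:", p.mean())
```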

Multiscale adaptive approximations to data and functions near low-dimensional sets

Series
Job Candidate Talk
Time
Thursday, January 19, 2017 - 14:00 for 1 hour (actually 50 minutes)
Location
Skiles 005
Speaker
Wenjing Liao, Johns Hopkins University
High-dimensional data arise in many fields of contemporary science and introduce new challenges in statistical learning due to the well-known curse of dimensionality. Many data sets in image analysis and signal processing are in a high-dimensional space but exhibit a low-dimensional structure. We are interested in building efficient representations of these data for the purpose of compression and inference, and in giving performance guarantees that are cursed only by the intrinsic dimension of the data. Specifically, in the setting where a data set in $R^D$ consists of samples from a probability measure concentrated on or near an unknown $d$-dimensional manifold with $d$ much smaller than $D$, we consider two sets of problems: low-dimensional geometric approximation to the manifold and regression of a function on the manifold. In the first case we construct multiscale low-dimensional empirical approximations to the manifold and give finite-sample performance guarantees. In the second case we exploit these empirical geometric approximations of the manifold to construct multiscale approximations to the function. We prove finite-sample guarantees showing that we attain the same learning rates as if the function were defined on a Euclidean domain of dimension $d$. In both cases our approximations can adapt to the regularity of the manifold or the function even when this varies at different scales or locations. All algorithms have complexity $Cn\log(n)$, where $n$ is the number of samples and the constant $C$ is linear in $D$ and exponential in $d$.
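The following is a minimal sketch of the local-PCA idea underlying such multiscale approximations (my own toy construction, not the speaker's algorithm): partition samples drawn near a $d$-dimensional manifold in $R^D$ into cells, fit a $d$-dimensional principal plane in each cell, and observe the approximation error shrink as the partition is refined.

```python
# Minimal local-PCA sketch (my own toy, in the spirit of multiscale
# piecewise-linear approximation): one d-dim PCA plane per cell.
import numpy as np

rng = np.random.default_rng(2)
n, D, d = 2000, 10, 1
t = rng.uniform(0, 2 * np.pi, n)
X = np.zeros((n, D))
X[:, 0], X[:, 1] = np.cos(t), np.sin(t)     # a circle (d = 1) in R^10
X += 0.01 * rng.standard_normal((n, D))     # small ambient noise

def local_pca_error(X, k):
    """Mean squared error of a k-cell piecewise-linear approximation."""
    centers = X[rng.choice(len(X), k, replace=False)]
    cell = np.argmin(((X[:, None, :] - centers) ** 2).sum(-1), axis=1)
    err = 0.0
    for c in range(k):
        P = X[cell == c]
        if len(P) <= d:                     # skip (near-)empty cells
            continue
        mu = P.mean(0)
        _, _, Vt = np.linalg.svd(P - mu, full_matrices=False)
        proj = mu + (P - mu) @ Vt[:d].T @ Vt[:d]   # project onto the plane
        err += ((P - proj) ** 2).sum()
    return err / len(X)

for k in [2, 4, 8, 16, 32]:                 # finer partitions = finer scales
    print(k, "cells -> mean squared error", local_pca_error(X, k))
```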

Computational Concerns in Statistical Inference and Learning for Network Data Analysis

Series
Job Candidate Talk
Time
Thursday, January 12, 2017 - 11:05 for 1 hour (actually 50 minutes)
Location
Skiles 006
Speaker
Tengyuan Liang, University of Pennsylvania
Network data analysis has wide applications in computational social science, computational biology, online social media, and data visualization. For many of these network inference questions, the brute-force (yet statistically optimal) methods involve combinatorial optimization, which is computationally prohibitive for large-scale networks. Therefore, it is important to understand the effect on statistical inference of restricting attention to computationally tractable methods. In this talk, we will discuss three closely related statistical models for different network inference problems. These models answer inference questions on cliques, communities, and ties, respectively. For each model, we will describe the statistical model, propose new computationally efficient algorithms, and study the theoretical properties and numerical performance of the algorithms. Further, we will quantify computational optimality by describing the intrinsic barrier for certain classes of efficient algorithms, and investigate the computational-to-statistical gap theoretically. A key feature shared by our studies is that, as the parameters of the model change, the problems exhibit different phases of computational difficulty.
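To convey the flavor of the computational-versus-statistical tradeoff (a toy of my own, not from the talk): for a planted dense subgraph, the statistically optimal test scans all vertex subsets, which is combinatorial, whereas a polynomial-time surrogate such as the top eigenvalue of the centered adjacency matrix already separates null from planted instances once the planted part is dense enough.

```python
# Toy illustration (mine, not from the talk): detecting a planted dense
# subgraph. The optimal test enumerates all vertex subsets (exponential);
# a polynomial-time surrogate uses the top eigenvalue of A - p.
import numpy as np

rng = np.random.default_rng(3)
n, k, p, q = 300, 30, 0.1, 0.5   # graph size, planted size, base/planted density

def sample_graph(planted):
    P = np.full((n, n), p)
    if planted:
        P[:k, :k] = q                       # dense block on the first k vertices
    A = (rng.random((n, n)) < P).astype(float)
    A = np.triu(A, 1)
    return A + A.T                          # simple undirected graph

def top_eig(A):
    return np.linalg.eigvalsh(A - p)[-1]    # center by the null edge density

null = [top_eig(sample_graph(False)) for _ in range(20)]
alt = [top_eig(sample_graph(True)) for _ in range(20)]
print("null max:", max(null), " planted min:", min(alt))
```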

Asymptotic equivalence between density estimation and Gaussian white noise revisited

Series
Job Candidate Talk
Time
Thursday, December 1, 2016 - 15:05 for 1 hour (actually 50 minutes)
Location
Skiles 005
Speaker
Kolyan Ray, Leiden University
Asymptotic equivalence between two statistical models means that they have the same asymptotic (large sample) properties with respect to all decision problems with bounded loss. In nonparametric (infinite-dimensional) statistical models, asymptotic equivalence has been found to be useful since it can allow one to derive certain results by studying simpler models. One of the key results in this area is Nussbaum's theorem, which states that nonparametric density estimation is asymptotically equivalent to a Gaussian shift model, provided that the densities are smooth enough and uniformly bounded away from zero. We will review the notion of asymptotic equivalence and existing results before presenting recent work on the extent to which one can relax the assumption of being bounded away from zero. We further derive the optimal (Le Cam) distance between these models, which quantifies how close they are for finite samples. As an application, we also consider Poisson intensity estimation with low count data. This is joint work with Johannes Schmidt-Hieber.
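For reference, the two experiments in Nussbaum's theorem can be written out explicitly (a standard formulation, added here for context): density estimation observes $X_1, \dots, X_n$ drawn i.i.d. from a density $f$ on $[0,1]$, while the Gaussian white noise (shift) model observes the process $dY_t = \sqrt{f(t)}\,dt + \frac{1}{2\sqrt{n}}\,dW_t$ for $t \in [0,1]$. The square root is variance-stabilizing, which is why the noise level $1/(2\sqrt{n})$ does not depend on $f$; under Nussbaum's conditions (Hölder smoothness greater than $1/2$ and $f$ bounded away from zero) the Le Cam distance between the two experiments vanishes as $n \to \infty$.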

Birkhoff conjecture and spectral rigidity of planar convex domains

Series
Job Candidate Talk
Time
Wednesday, January 27, 2016 - 11:00 for 1 hour (actually 50 minutes)
Location
Skiles 006
Speaker
Jacopo De Simoi, Paris Diderot University
Dynamical billiards constitute a very natural class of Hamiltonian systems: in 1927 George Birkhoff conjectured that, among all billiards inside smooth planar convex domains, only billiards in ellipses are integrable. In this talk we will prove a version of this conjecture for convex domains that are sufficiently close to an ellipse of small eccentricity. We will also describe a remarkable relation to inverse spectral theory and spectral rigidity of planar convex domains. Our techniques can in fact be fruitfully adapted to prove spectral rigidity among generic (finitely) smooth axially symmetric domains which are sufficiently close to a circle. This gives a partial answer to a question of P. Sarnak.
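Some standard background behind the statement (not specific to the talk): a caustic of a billiard is a curve such that any trajectory tangent to it once remains tangent to it after every reflection, and a billiard is called integrable (in the Birkhoff sense) when a neighborhood of the boundary is foliated by caustics. Billiards in ellipses are integrable in exactly this way: the caustics are the confocal conics, and the product of the angular momenta of the trajectory about the two foci is preserved by the billiard map.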

Hybrid simulation methods: simulating the world around you

Series
Job Candidate Talk
Time
Thursday, January 21, 2016 - 11:05 for 1 hour (actually 50 minutes)
Location
Skiles 006
Speaker
Craig Schroeder, UCLA
Hybrid particle/grid numerical methods have been around for a long time, and their usage is common in some fields, from plasma physics to artist-directed fluids. I will explore the use of hybrid methods to simulate many different complex phenomena occurring all around you, from wine to shaving foam and from sand to the snow in Disney's Frozen. I will also talk about some of the practical advantages and disadvantages of hybrid methods and how one of the weaknesses that has long plagued them can now be fixed.
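A minimal one-dimensional particle-in-cell (PIC) transfer loop (my own sketch, with gravity as the only grid force) shows the hybrid structure: scatter particle momentum to a grid, update on the grid, then gather velocities back to the particles; the final comment marks where FLIP would differ from pure PIC.

```python
# Minimal 1-D particle-in-cell transfer sketch (my toy, not the speaker's
# code): particle -> grid scatter, grid update, grid -> particle gather.
import numpy as np

rng = np.random.default_rng(4)
nx, dx, dt, g = 32, 1.0 / 32, 1e-3, -9.8
xp = rng.random(256)                 # particle positions in [0, 1)
vp = np.zeros_like(xp)               # particle velocities

def p2g(xp, vp):
    """Scatter mass and momentum to grid nodes with linear (tent) weights."""
    i = np.floor(xp / dx).astype(int)
    f = xp / dx - i                  # fractional offset within the cell
    m = np.zeros(nx + 1)
    mom = np.zeros(nx + 1)
    np.add.at(m, i, 1 - f)
    np.add.at(m, i + 1, f)
    np.add.at(mom, i, (1 - f) * vp)
    np.add.at(mom, i + 1, f * vp)
    return np.divide(mom, m, out=np.zeros_like(mom), where=m > 0)

for _ in range(100):
    vg = p2g(xp, vp)
    vg += dt * g                     # grid update: gravity only, for brevity
    # Gather (pure PIC interpolates grid velocity; FLIP would add deltas)
    i = np.floor(xp / dx).astype(int)
    f = xp / dx - i
    vp = (1 - f) * vg[i] + f * vg[i + 1]
    xp = np.clip(xp + dt * vp, 0, 1 - 1e-9)   # advect, keep in domain

print("mean velocity after 100 steps:", vp.mean())
```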

A General Framework for High-Dimensional Inference and Multiple Testing

Series
Job Candidate Talk
Time
Tuesday, January 19, 2016 - 15:05 for 1 hour (actually 50 minutes)
Location
Skiles 005
Speaker
Yang Ning, Princeton University
We consider the problem of how to control measures of false scientific discovery in high-dimensional models. Towards this goal, we focus on uncertainty assessment for low-dimensional components in high-dimensional models. Specifically, we propose a novel decorrelated likelihood-based framework to obtain valid p-values for generic penalized M-estimators. Unlike most existing inferential methods, which are tailored to individual models, our method provides a general framework for high-dimensional inference and is applicable to a wide variety of applications, including generalized linear models, graphical models, classification, and survival analysis. The proposed method provides optimal tests and confidence intervals. Extensions to general estimating equations are discussed. Finally, we show that the p-values can be combined to control the false discovery rate in multiple hypothesis testing.
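To give the flavor of the decorrelation step (a standard way of presenting it; the talk's formulation may differ in details): to make inference on a scalar parameter $\theta$ in the presence of a high-dimensional nuisance parameter $\eta$, one replaces the partial score $\nabla_\theta \ell(\theta, \eta)$ by the decorrelated score $S(\theta, \eta) = \nabla_\theta \ell(\theta, \eta) - w^{\top} \nabla_\eta \ell(\theta, \eta)$, where $w$ is chosen so that $S$ is uncorrelated with the nuisance score and is estimated under sparsity assumptions. This removes the first-order effect of plugging in a penalized estimate of $\eta$, so that the resulting test statistic is asymptotically normal and yields valid p-values and confidence intervals.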

Adjacency Spectral Embedding for Random Graphs

Series
Job Candidate Talk
Time
Friday, January 15, 2016 - 11:05 for 1 hour (actually 50 minutes)
Location
Skiles 006
Speaker
Daniel Sussman, Department of Statistics, Harvard University
The eigendecomposition of an adjacency matrix provides a way to embed a graph as points in finite-dimensional Euclidean space. This embedding allows the full arsenal of statistical and machine learning methodology for multivariate Euclidean data to be deployed for graph inference. Our work analyzes this embedding, a graph version of principal component analysis, in the context of various random graph models, with a focus on the impact for subsequent inference. We show that for a particular model this embedding yields a consistent estimate of its parameters, and that these estimates can be used to accurately perform a variety of inference tasks, including vertex clustering and vertex classification, as well as estimation and hypothesis testing about the parameters.
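A short sketch of the embedding on a two-block stochastic block model (my toy setup; the parameters are illustrative): form the scaled top eigenvectors of the adjacency matrix, then cluster the embedded points.

```python
# Sketch of adjacency spectral embedding on a two-block stochastic block
# model (my toy setup; parameters are illustrative, not from the talk).
import numpy as np

rng = np.random.default_rng(5)
n, d = 400, 2
z = np.repeat([0, 1], n // 2)               # latent block labels
B = np.array([[0.5, 0.1],
              [0.1, 0.4]])                  # block connection probabilities
P = B[z][:, z]
A = (rng.random((n, n)) < P).astype(float)
A = np.triu(A, 1)
A = A + A.T                                 # simple undirected graph

# Adjacency spectral embedding: scaled top-d eigenvectors of A
vals, vecs = np.linalg.eigh(A)
top = np.argsort(np.abs(vals))[-d:]
Xhat = vecs[:, top] * np.sqrt(np.abs(vals[top]))

# Cluster the embedded points with a bare-bones 2-means
c = Xhat[rng.choice(n, 2, replace=False)].copy()
for _ in range(20):
    lab = np.argmin(((Xhat[:, None] - c) ** 2).sum(-1), axis=1)
    for j in range(2):
        if np.any(lab == j):
            c[j] = Xhat[lab == j].mean(0)

acc = max(np.mean(lab == z), np.mean(lab != z))  # labels defined up to swap
print("clustering accuracy:", acc)
```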

Bootstrap confidence sets under model misspecification

Series
Job Candidate Talk
Time
Friday, November 20, 2015 - 11:00 for 1 hour (actually 50 minutes)
Location
Skiles 006
Speaker
Mayya Zhilova, Weierstrass Institute
The bootstrap is one of the most powerful and common tools in statistical inference. In this talk, a multiplier bootstrap procedure is considered for the construction of likelihood-based confidence sets. Theoretical results justify the bootstrap validity for small or moderate sample sizes and allow one to control the impact of the parameter dimension p: the bootstrap approximation works if p^3/n is small, where n is the sample size. The main result about bootstrap validity continues to apply even if the underlying parametric model is misspecified, under a so-called small modelling bias condition. When the true model deviates significantly from the considered parametric family, the bootstrap procedure is still applicable but becomes conservative: the size of the constructed confidence sets is increased by the modelling bias. The approach is also extended to the problem of simultaneous confidence estimation. A simultaneous multiplier bootstrap procedure is justified for the case of an exponentially large number of models. Numerical experiments for misspecified regression models nicely confirm our theoretical results.
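A sketch of the multiplier idea in the simplest possible setting (a one-dimensional Gaussian mean, my own illustration rather than the talk's high-dimensional regime): i.i.d. weights with mean one and variance one reweight the log-likelihood terms, and the bootstrap quantile of the weighted likelihood-ratio excess calibrates a likelihood-based confidence set.

```python
# Multiplier-bootstrap sketch in the simplest setting (1-D Gaussian mean;
# my own illustration, not the talk's high-dimensional regime).
import numpy as np

rng = np.random.default_rng(6)
n, B, alpha = 100, 2000, 0.05
x = rng.standard_normal(n) + 0.3            # data with true mean 0.3
theta_hat = x.mean()

def excess(w):
    """sup_theta L_w(theta) - L_w(theta_hat) for the weighted Gaussian loglik."""
    theta_w = np.average(x, weights=w)      # weighted MLE
    return 0.5 * w.sum() * (theta_w - theta_hat) ** 2

# i.i.d. multiplier weights with mean 1 and variance 1 (exponential works)
boot = np.array([excess(rng.exponential(1.0, n)) for _ in range(B)])
q = np.quantile(boot, 1 - alpha)

# Likelihood-based set {theta : L(theta_hat) - L(theta) <= q}; here
# L(theta_hat) - L(theta) = 0.5 * n * (theta - theta_hat)^2, an interval.
half_width = np.sqrt(2 * q / n)
print(f"95% CI: [{theta_hat - half_width:.3f}, {theta_hat + half_width:.3f}]")
```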
