Random Neural Networks with applications to Image Recovery
- Series
- Stochastics Seminar
- Time
- Thursday, April 11, 2019 - 15:05 for 1 hour (actually 50 minutes)
- Location
- Skiles 006
- Speaker
- Paul Hand – Northeastern University – p.hand@northeastern.edu
I will talk about the structure of large square random matrices with centered i.i.d. heavy-tailed entries (only two finite moments are assumed). In our previous work with R. Vershynin, we showed that the operator norm of such a matrix A can be reduced to the optimal sqrt(n)-order with high probability by zeroing out a small submatrix of A, but we did not describe the structure of this "bad" submatrix, nor did we provide a constructive way to find it. Now we can give a very simple description of this small "bad" subset: under the additional assumption that the entries of A are symmetrically distributed, it is enough to zero out a small fraction of the rows and columns of A with the largest L2 norms to bring its operator norm down to the almost optimal sqrt(loglog(n)*n)-order. As a corollary, one also obtains a constructive procedure for finding a small submatrix of A that can be zeroed out to achieve the same regularization.
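The row-and-column regularization described above can be tried out in a small simulation. This is a minimal sketch under assumptions that are not taken from the talk: each entry is a random sign times a Pareto(2.5) magnitude (symmetric, finite variance, infinite third moment), 5% of the rows and columns with the largest L2 norms are zeroed, and the operator norm is estimated by power iteration. The helper names `heavy_tailed_entry`, `operator_norm` and `zero_largest_rows_and_columns` are hypothetical.

```python
import math
import random


def heavy_tailed_entry(rng, alpha=2.5):
    # Assumed entry law: random sign times a Pareto(alpha) magnitude.
    # For alpha = 2.5 the entries are symmetric with finite variance,
    # but their third moment is infinite.
    return rng.choice((-1.0, 1.0)) * rng.paretovariate(alpha)


def operator_norm(A, iters=100, seed=0):
    # Power iteration on A^T A. For a unit vector x, sqrt(||A^T A x||) is a
    # lower bound on the largest singular value and, for a generic start,
    # converges to it.
    rng = random.Random(seed)
    cols = list(zip(*A))
    x = [rng.gauss(0.0, 1.0) for _ in cols]
    norm_x = math.sqrt(sum(v * v for v in x))
    x = [v / norm_x for v in x]
    estimate = 0.0
    for _ in range(iters):
        Ax = [sum(a * xj for a, xj in zip(row, x)) for row in A]
        y = [sum(a * v for a, v in zip(col, Ax)) for col in cols]  # A^T (A x)
        norm_y = math.sqrt(sum(v * v for v in y))
        if norm_y == 0.0:
            return 0.0
        estimate = math.sqrt(norm_y)
        x = [v / norm_y for v in y]
    return estimate


def zero_largest_rows_and_columns(A, fraction):
    # Zero out the given fraction of rows and of columns with the largest
    # L2 norms (squared norms give the same ordering).
    n_rows, n_cols = len(A), len(A[0])
    k_rows = max(1, int(fraction * n_rows))
    k_cols = max(1, int(fraction * n_cols))
    row_sq = [sum(a * a for a in row) for row in A]
    col_sq = [sum(a * a for a in col) for col in zip(*A)]
    bad_rows = set(sorted(range(n_rows), key=row_sq.__getitem__, reverse=True)[:k_rows])
    bad_cols = set(sorted(range(n_cols), key=col_sq.__getitem__, reverse=True)[:k_cols])
    return [
        [0.0 if i in bad_rows or j in bad_cols else a for j, a in enumerate(row)]
        for i, row in enumerate(A)
    ]


if __name__ == "__main__":
    n = 100
    rng = random.Random(1)
    A = [[heavy_tailed_entry(rng) for _ in range(n)] for _ in range(n)]
    before = operator_norm(A)
    after = operator_norm(zero_largest_rows_and_columns(A, fraction=0.05))
    # Compare both norms with the sqrt(n) scale mentioned in the abstract.
    print(f"||A|| / sqrt(n)     ~ {before / math.sqrt(n):.2f}")
    print(f"||A_reg|| / sqrt(n) ~ {after / math.sqrt(n):.2f}")
```

Zeroing rows and columns multiplies A on both sides by 0/1 diagonal projections, so it can never increase the true operator norm; the simulation only shows how large the reduction is for one random draw.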
I am planning to discuss some details of the proof, whose main component is the development of techniques that extend constructive regularization approaches known for Bernoulli matrices (from the works of Feige and Ofek, and of Le, Levina and Vershynin) to the considerably broader class of heavy-tailed random matrices.
One of the most famous methods for solving large-scale overdetermined linear systems is the Kaczmarz algorithm, which iteratively projects the previous approximation x_k onto the solution space of the next equation in the system. An elegant proof of the exponential convergence of this method, under a suitable randomization of the process, is due to Strohmer and Vershynin (2009). Many extensions and generalizations of the method have been proposed since then, including the works of Needell, Tropp, Ward, Srebro, Tan and many others. An interesting unifying view of a number of iterative solvers (including several versions of the Kaczmarz algorithm) was proposed by Gower and Richtarik in 2016. The main idea of their sketch-and-project framework is the following: the random selection of a row (or a row block) can be represented as a sketch, that is, left multiplication by a random vector (or matrix), which pre-processes every iteration of the method; the iteration itself is then represented by a projection onto the image of the sketch.
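The randomized Kaczmarz iteration described above can be sketched in a few lines. This is a minimal illustration, not code from the talk: as in Strohmer and Vershynin (2009), row i is sampled with probability proportional to its squared norm, and x_k is projected onto the hyperplane {x : <a_i, x> = b_i}. For a consistent system their result bounds the expected squared error by a factor of at most (1 - 1/κ(A)²) per step, where κ(A) = ||A||_F ||A^{-1}|| and A^{-1} is the left inverse. The function name `randomized_kaczmarz` and its parameters are hypothetical.

```python
import random
from itertools import accumulate


def randomized_kaczmarz(A, b, iters=2000, seed=0):
    # Sample row i with probability ||a_i||^2 / ||A||_F^2 and project the
    # current iterate onto the solution space {x : <a_i, x> = b_i}.
    rng = random.Random(seed)
    row_sq = [sum(a * a for a in row) for row in A]
    cum_weights = list(accumulate(row_sq))
    rows = range(len(A))
    x = [0.0] * len(A[0])
    for _ in range(iters):
        i = rng.choices(rows, cum_weights=cum_weights)[0]
        residual = b[i] - sum(a * xj for a, xj in zip(A[i], x))
        step = residual / row_sq[i]
        x = [xj + step * a for xj, a in zip(x, A[i])]
    return x


if __name__ == "__main__":
    # Consistent overdetermined Gaussian system with 50 equations, 5 unknowns.
    rng = random.Random(42)
    m, n = 50, 5
    A = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(m)]
    x_true = [rng.gauss(0.0, 1.0) for _ in range(n)]
    b = [sum(a * xj for a, xj in zip(row, x_true)) for row in A]
    x = randomized_kaczmarz(A, b)
    print("max abs error:", max(abs(u - v) for u, v in zip(x, x_true)))
```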
I will give an overview of some of these methods and talk about the role that random matrix theory plays in showing their convergence. I will also discuss our new results with Deanna Needell on the block Gaussian sketch-and-project method.
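A block Gaussian sketch-and-project step can be illustrated as follows. This is a hedged sketch rather than the speakers' method: it assumes a consistent system with full column rank, a sketch S with `block` i.i.d. standard Gaussian columns, and the plain Euclidean norm for the projection (the identity weight matrix in the framework of Gower and Richtarik). Each step projects x onto the solution set of the sketched system S^T A x = S^T b. The names `solve_small` and `block_gaussian_sketch_and_project` are hypothetical.

```python
import random


def solve_small(G, r):
    # Solve the k-by-k system G y = r by Gaussian elimination with partial pivoting.
    k = len(G)
    aug = [list(row) + [ri] for row, ri in zip(G, r)]
    for col in range(k):
        p = max(range(col, k), key=lambda i: abs(aug[i][col]))
        aug[col], aug[p] = aug[p], aug[col]
        for i in range(col + 1, k):
            f = aug[i][col] / aug[col][col]
            for j in range(col, k + 1):
                aug[i][j] -= f * aug[col][j]
    y = [0.0] * k
    for i in reversed(range(k)):
        s = sum(aug[i][j] * y[j] for j in range(i + 1, k))
        y[i] = (aug[i][k] - s) / aug[i][i]
    return y


def block_gaussian_sketch_and_project(A, b, block=3, iters=500, seed=0):
    # Each step draws an m-by-block Gaussian sketch S and applies
    #     x <- x - M^T (M M^T)^{-1} (M x - c),  with M = S^T A and c = S^T b,
    # i.e. the Euclidean projection of x onto {z : S^T A z = S^T b}.
    m, n = len(A), len(A[0])
    if not 1 <= block <= n:
        raise ValueError("block must be between 1 and the number of unknowns")
    rng = random.Random(seed)
    x = [0.0] * n
    for _ in range(iters):
        # Each entry of sketch_cols is one Gaussian column of S (length m).
        sketch_cols = [[rng.gauss(0.0, 1.0) for _ in range(m)] for _ in range(block)]
        M = [[sum(s_i * A[i][j] for i, s_i in enumerate(s)) for j in range(n)]
             for s in sketch_cols]
        c = [sum(s_i * b_i for s_i, b_i in zip(s, b)) for s in sketch_cols]
        r = [sum(mj * xj for mj, xj in zip(row, x)) - ct for row, ct in zip(M, c)]
        G = [[sum(u * v for u, v in zip(row1, row2)) for row2 in M] for row1 in M]
        y = solve_small(G, r)
        x = [x[j] - sum(M[t][j] * y[t] for t in range(block)) for j in range(n)]
    return x


if __name__ == "__main__":
    rng = random.Random(42)
    m, n = 50, 5
    A = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(m)]
    x_true = [rng.gauss(0.0, 1.0) for _ in range(n)]
    b = [sum(a * xj for a, xj in zip(row, x_true)) for row in A]
    x = block_gaussian_sketch_and_project(A, b)
    print("max abs error:", max(abs(u - v) for u, v in zip(x, x_true)))
```

With block = 1 and S replaced by a randomly chosen coordinate vector, the same update reduces to a Kaczmarz step, which is the sense in which the framework unifies these solvers.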
We identify principal component analysis (PCA) as an empirical risk minimization problem with respect to the reconstruction error and prove non-asymptotic upper bounds for the corresponding excess risk. These bounds unify and improve existing upper bounds from the literature. In particular, they give oracle inequalities under mild eigenvalue conditions. We also discuss how our results can be transferred to the subspace distance and, for instance, how our approach leads to a sharp $\sin \Theta$ theorem for empirical covariance operators. The proof is based on a novel contraction property, contrasting previous spectral perturbation approaches. This talk is based on joint works with Markus Reiß and Moritz Jirak.
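For reference, the empirical risk minimization view of PCA can be written out as follows. This is the standard formulation, assumed here rather than quoted from the talk: X is a centered random vector with covariance \Sigma and eigenvalues \lambda_1 \ge \lambda_2 \ge \dots, X_1, ..., X_n are i.i.d. copies, and P ranges over orthogonal projections of rank d.

```latex
\begin{aligned}
R(P) &= \mathbb{E}\,\lVert X - PX \rVert^2
      = \operatorname{tr}(\Sigma) - \operatorname{tr}(P\Sigma), \\
R_n(P) &= \frac{1}{n}\sum_{i=1}^{n} \lVert X_i - P X_i \rVert^2, \\
\hat P_d &\in \operatorname*{arg\,min}_{\operatorname{rank} P = d} R_n(P), \\
\mathcal{E}(\hat P_d) &= R(\hat P_d) - \min_{\operatorname{rank} P = d} R(P)
      = \sum_{j=1}^{d} \lambda_j - \operatorname{tr}(\hat P_d \Sigma) \ \ge\ 0.
\end{aligned}
```

Here \hat P_d is the projection onto the top-d eigenvectors of the empirical covariance (1/n) \sum_i X_i X_i^T, and the population minimum is attained at the projection onto the top-d eigenvectors of \Sigma; the excess risk \mathcal{E}(\hat P_d) is the quantity the upper bounds in the abstract refer to.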
Wiener-Hopf factorization (WHf) encompasses several important results in probability and stochastic processes, as well as in operator theory. The importance of the WHf stems not only from its theoretical appeal, manifested in part through the probabilistic interpretation of analytical results, but also from its practical applications in a wide range of fields, such as fluctuation theory, insurance and finance. The various existing forms of the WHf for Markov chains, strong Markov processes, Levy processes and Markov additive processes have been obtained only in the time-homogeneous case. However, many real-life dynamical systems are modeled by time-inhomogeneous processes, and yet the corresponding Wiener-Hopf factorization theory is not available for this important class of models. In this talk, I will first survey the development of the Wiener-Hopf factorization for time-homogeneous Markov chains, Levy processes and Markov additive processes. Then I will discuss our recent work on the WHf for time-inhomogeneous Markov chains. To the best of our knowledge, this study is the first attempt to investigate the WHf for time-inhomogeneous Markov processes.
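As a point of reference for the time-homogeneous theory surveyed above, the classical Wiener-Hopf factorization for a Levy process X can be stated as follows. This is the standard textbook form, not the new time-inhomogeneous result: \Psi is the characteristic exponent, \mathbb{E}[e^{i\theta X_t}] = e^{-t\Psi(\theta)}, e_q is an independent exponential time with rate q > 0, \overline{X}_t = \sup_{s \le t} X_s and \underline{X}_t = \inf_{s \le t} X_s.

```latex
\frac{q}{q + \Psi(\theta)}
  = \mathbb{E}\big[e^{i\theta X_{e_q}}\big]
  = \mathbb{E}\big[e^{i\theta \overline{X}_{e_q}}\big]\,
    \mathbb{E}\big[e^{i\theta \underline{X}_{e_q}}\big],
  \qquad \theta \in \mathbb{R}.
```

The first equality is the Laplace transform of t \mapsto e^{-t\Psi(\theta)} evaluated at q; the second follows from the independence of \overline{X}_{e_q} and X_{e_q} - \overline{X}_{e_q}, the latter having the same law as \underline{X}_{e_q}.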