Mathematical and Statistical Challenges on Large Discrete Structures

Job Candidate Talk
Wednesday, March 16, 2022 - 11:00am for 1 hour (actually 50 minutes)
Miklos Racz – Princeton University
Cheng Mao

From networks to genomics, large amounts of data are abundant and play critical roles in helping us understand complex systems. In many such settings, these data take the form of large discrete structures with important combinatorial properties. The interplay between structure and randomness in these systems presents unique mathematical and statistical challenges. In this talk I will highlight these through two vignettes: (1) inference problems on networks, and (2) DNA data storage.

First, I will discuss statistical inference problems on edge-correlated stochastic block models. We determine the information-theoretic threshold for exact recovery of the latent vertex correspondence between two correlated block models, a task known as graph matching. As an application, we show how one can exactly recover the latent communities using multiple correlated graphs in parameter regimes where it is information-theoretically impossible to do so using just a single graph. Furthermore, we obtain the precise threshold for exact community recovery using multiple correlated graphs, which captures the interplay between the community recovery and graph matching tasks. 

Next, I will give an overview of DNA data storage. Storing data in synthetic DNA is an exciting emerging technology which has the potential to revolutionize data storage. Realizing this goal requires innovation across a multidisciplinary pipeline. I will explain this pipeline, focusing on our work on statistical error correction algorithms and optimizing DNA synthesis, highlighting the intimate interplay between statistical foundations and practice.