Reconstructing ancestral sequences in large trees

Mathematical Biology Seminar
Thursday, April 28, 2022 - 3:30pm for 1 hour (actually 50 minutes)
Skiles 006 and ONLINE
Brandon Legried – Southeast Center for Mathematics and Biology
Bo Lin

Please Note: Meeting link:

Statistical consistency in phylogenetics has traditionally referred to the accuracy of estimating mutation rates and phylogenies for a fixed number of species as the amount of data in each species' signature, such as a DNA or protein sequence, increases. Analyzing sequences undergoing indel mutations (insertions and deletions of sites) has provided a venue for understanding how much power a large amount of data can provide. In this talk, we discuss some of the failings of this approach. For instance, we show that phylogeny estimation is impossible for infinitely long sequences, even with infinite data. This motivates a dual type of statistical consistency, in which the number of species, rather than the size of each signature, is taken to infinity. Here, we give polynomial-time algorithms for ancestral sequence estimation and sequence alignment on reference phylogenies that contain enough species to be sufficiently dense. Based on joint work with Louis Fan and Sebastien Roch.