## Seminars and Colloquia by Series

### Generalized Permutohedra from Probabilistic Graphical Models

Series
Mathematical Biology Seminar
Time
Wednesday, November 6, 2019 - 11:00 for 1 hour (actually 50 minutes)
Location
Skiles 006
Speaker
Josephine Yu, Georgia Tech

A graphical model encodes conditional independence relations among random variables. For an undirected graph, these conditional independence relations are represented by a simple polytope known as the graph associahedron, which is a Minkowski sum of standard simplices. We prove that analogous polytopes exist for a much larger class of graphical models, and we construct these polytopes as Minkowski sums of matroid polytopes. The motivation came from the problem of learning Bayesian networks from observational data. No background on graphical models will be assumed for the talk. This is joint work with Fatemeh Mohammadi, Caroline Uhler, and Charles Wang.
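For the undirected case only, here is a hedged illustration of the Minkowski-sum description. It assumes the graph associahedron is the sum of the simplices Δ_I = conv{e_i : i ∈ I} over all connected vertex subsets I (Postnikov's building-set construction) and does not attempt the talk's matroid-polytope construction. Because the normal fan of such a sum coarsens the braid arrangement, every vertex maximizes some generic linear functional, so trying all orderings of the coordinates finds every vertex. The function names are hypothetical, and the enumeration is exponential, so it is only meant for small graphs.

```python
from itertools import combinations, permutations


def connected_subsets(n, edges):
    """All nonempty subsets of {0, ..., n-1} that induce a connected subgraph."""
    adj = {v: set() for v in range(n)}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    result = []
    for size in range(1, n + 1):
        for subset in combinations(range(n), size):
            members = set(subset)
            # Depth-first search restricted to the vertices of the subset.
            start = subset[0]
            seen, stack = {start}, [start]
            while stack:
                x = stack.pop()
                for y in adj[x] & members:
                    if y not in seen:
                        seen.add(y)
                        stack.append(y)
            if seen == members:
                result.append(subset)
    return result


def graph_associahedron_vertices(n, edges):
    """Vertices of the Minkowski sum of simplices Delta_I over connected subsets I."""
    tubes = connected_subsets(n, edges)
    vertices = set()
    # Each permutation is a generic weight vector w (w[i] = rank of vertex i).
    for w in permutations(range(n)):
        point = [0] * n
        for tube in tubes:
            # The vertex of Delta_I maximizing <w, x> is e_i for the i in I
            # with the largest weight; a Minkowski sum adds these maximizers.
            point[max(tube, key=lambda i: w[i])] += 1
        vertices.add(tuple(point))
    return vertices
```

On the path 0-1-2 this yields the 5 vertices of a pentagon (the 2-dimensional associahedron), while on the triangle it yields the 6 vertices of a hexagon (the permutohedron).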

### Likelihood challenges for big trees and networks

Series
Mathematical Biology Seminar
Time
Wednesday, October 30, 2019 - 11:00 for 1 hour (actually 50 minutes)
Location
Skiles 006
Speaker

Standard statistical inference techniques for the tree of life, like maximum likelihood and Bayesian inference through Markov chain Monte Carlo (MCMC), have been widely used, but their performance declines as datasets grow (in number of genes or number of species).

I will present two new approaches suitable for big data: an importance sampling technique for Bayesian inference of phylogenetic trees, and a pseudolikelihood method for inference of phylogenetic networks.

The proposed methods will allow scientists to include more species in the tree of life, and thus build a broader picture of evolution.
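The abstract does not describe the proposal distributions used for trees, so the sketch below shows only the generic idea behind importance sampling for Bayesian inference, on a hypothetical one-parameter toy model: draw samples from an easy proposal, weight each by posterior/proposal (up to a constant), and normalize the weights. It assumes a Binomial likelihood with a uniform prior that doubles as the proposal, so each weight reduces to the likelihood. None of this is the speaker's phylogenetic method.

```python
import random


def likelihood(theta, successes, trials):
    """Binomial likelihood of success probability theta, up to a constant."""
    return theta ** successes * (1 - theta) ** (trials - successes)


def importance_posterior_mean(successes, trials, n_samples=200_000, seed=0):
    """Self-normalized importance sampling estimate of E[theta | data].

    Toy assumptions: uniform prior on theta, and the prior itself is the
    proposal, so the prior/proposal ratio is 1 and each weight is the likelihood.
    """
    rng = random.Random(seed)
    total_weight = 0.0
    weighted_sum = 0.0
    for _ in range(n_samples):
        theta = rng.random()                      # draw from the proposal
        w = likelihood(theta, successes, trials)  # unnormalized importance weight
        total_weight += w
        weighted_sum += w * theta
    return weighted_sum / total_weight            # normalize the weights
```

With 7 successes in 10 trials the exact posterior under a uniform prior is Beta(8, 4), whose mean is 8/12, so the estimate should approach that value as `n_samples` grows.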

### Go with the Flow: a parameterized approach to RNA transcript assembly

Series
Mathematical Biology Seminar
Time
Wednesday, October 23, 2019 - 11:00 for 1 hour (actually 50 minutes)
Location
Skiles 006
Speaker
Blair Sullivan, School of Computing, University of Utah

A pervasive challenge in genomics is that RNA/DNA must be reconstructed from short, often noisy subsequences. In this talk, we describe a new digraph algorithm that enables this "assembly" when analyzing high-throughput transcriptomic sequencing data. Specifically, the Flow Decomposition problem on a directed acyclic graph asks for the smallest set of weighted paths that "cover" a flow (a weight function on the edges where the amount entering any vertex other than the source and sink equals the amount leaving it). We describe a new linear-time algorithm solving *k*-Flow Decomposition, the variant where exactly *k* paths are used. Further, we discuss how we implemented and engineered a general Flow Decomposition solver based on this algorithm, and describe its performance on RNA-sequence data. Crucially, our solver finds exact solutions while achieving runtimes competitive with a state-of-the-art heuristic, and we discuss the implications of our results for the underlying model selection in transcript assembly in this setting.
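The solver described in the talk is exact. For contrast, here is a minimal sketch of a common greedy heuristic, which repeatedly removes the widest source-to-sink path (the path whose smallest edge weight is largest) until no flow remains. It assumes integer edge weights, a single source and sink, and edges given as a `{(u, v): weight}` dict; the function names are hypothetical. This is not the speaker's algorithm, and it need not find the minimum number of paths.

```python
from collections import defaultdict


def is_flow(flow, source, sink):
    """Check conservation: inflow equals outflow at every vertex except source and sink."""
    balance = defaultdict(int)
    for (u, v), w in flow.items():
        balance[u] -= w
        balance[v] += w
    return all(b == 0 for x, b in balance.items() if x not in (source, sink))


def topological_order(edges):
    """Kahn's algorithm; raises ValueError if the graph has a cycle."""
    succ, indeg, nodes = defaultdict(list), defaultdict(int), set()
    for u, v in edges:
        succ[u].append(v)
        indeg[v] += 1
        nodes.update((u, v))
    ready = [x for x in nodes if indeg[x] == 0]
    order = []
    while ready:
        x = ready.pop()
        order.append(x)
        for y in succ[x]:
            indeg[y] -= 1
            if indeg[y] == 0:
                ready.append(y)
    if len(order) != len(nodes):
        raise ValueError("graph is not acyclic")
    return order, succ


def widest_path(weight, order, succ, source, sink):
    """Source-sink path maximizing its smallest edge weight (DP in topological order)."""
    best, parent = {source: float("inf")}, {}
    for u in order:
        if u not in best:
            continue
        for v in succ[u]:
            width = min(best[u], weight[(u, v)])
            if width > best.get(v, 0):
                best[v], parent[v] = width, u
    if sink not in best:
        return 0, []
    path = [sink]
    while path[-1] != source:
        path.append(parent[path[-1]])
    return best[sink], path[::-1]


def greedy_flow_decomposition(flow, source, sink):
    """Peel off widest paths until every edge weight reaches zero (heuristic)."""
    if not is_flow(flow, source, sink):
        raise ValueError("edge weights do not satisfy flow conservation")
    order, succ = topological_order(flow)
    remaining = dict(flow)
    paths = []
    while any(w > 0 for w in remaining.values()):
        width, path = widest_path(remaining, order, succ, source, sink)
        if width == 0:
            raise ValueError("remaining flow is not reachable from the source")
        for u, v in zip(path, path[1:]):
            remaining[(u, v)] -= width  # the bottleneck edge drops to zero
        paths.append((width, path))
    return paths
```

Each iteration zeroes at least one edge, so the loop runs at most once per edge; the exact *k*-path formulation in the talk is what guarantees the smallest decomposition.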

### Host metapopulation, disease epidemiology and host evolution

Series
Mathematical Biology Seminar
Time
Wednesday, October 16, 2019 - 11:00 for 1 hour (actually 50 minutes)
Location
Skiles 006
Speaker
Jing Jiao, NIMBioS, University of Tennessee

While most evolutionary studies of host-pathogen dynamics consider pathogen evolution alone or host-pathogen coevolution, for some diseases (e.g., white-nose syndrome in bats) there is evidence that hosts can sometimes evolve more rapidly than their pathogens. In this talk, we will discuss the spatial, temporal, and epidemiological factors that may drive the evolutionary dynamics of the host population. We consider a simplified system of two host genotypes that trade off disease robustness against spatial mobility or growth. For diseases that infect hosts for life, we find that migration and disease-driven mortality can have antagonistic effects on host densities when disease selection on hosts is weak, but act synergistically when selection is strong. For diseases that allow hosts to recover with immunity, we explore the conditions under which the disease dies out, becomes endemic, or has periodic outbreaks, and show how these dynamics relate to the relative success of the robust and wild-type hosts in the population over time. Overall, we will discuss how combinations of host spatial structure, demography, and epidemiology of infectious disease can significantly influence host evolution and disease prevalence. We will conclude with some implications for wildlife conservation and zoonotic disease control.

### Partially ordered Reeb graphs, tree decompositions, and phylogenetic networks

Series
Mathematical Biology Seminar
Time
Wednesday, October 9, 2019 - 11:00 for 1 hour (actually 50 minutes)
Location
Skiles 006
Speaker
Anastasios Stefanou, Mathematical Biosciences Institute, Ohio State University

Inspired by the interval decomposition of persistence modules and the extended Newick format of phylogenetic networks, we show that, inside the larger category of partially ordered Reeb graphs, every Reeb graph with *n* leaves and first Betti number *s* is equal to a coproduct of at most 2*s* trees with (*n* + *s*) leaves. An implication of this result is that Reeb graphs are fixed-parameter tractable when the parameter is the first Betti number. We propose partially ordered Reeb graphs as a natural framework for modeling time-consistent phylogenetic networks. We define a notion of interleaving distance on partially ordered Reeb graphs that is analogous to the interleaving distance for ordinary Reeb graphs. This suggests using the interleaving distance as a novel metric for time-consistent phylogenetic networks.
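The parameter *s* in this result is the first Betti number, i.e. the number of independent cycles of the underlying graph, and *n* counts its leaves (degree-1 vertices). As a hedged sketch that treats a Reeb graph as a plain finite multigraph and ignores its function values, both quantities can be computed as follows, using b1 = |E| - |V| + (number of connected components) with components found by union-find. The function names are hypothetical.

```python
from collections import Counter


def first_betti_number(num_vertices, edges):
    """b1 = |E| - |V| + (number of connected components); parallel edges allowed."""
    parent = list(range(num_vertices))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    components = num_vertices
    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv  # merging two components removes one
            components -= 1
    return len(edges) - num_vertices + components


def count_leaves(num_vertices, edges):
    """Number of degree-1 vertices (a self-loop adds 2 to its vertex's degree)."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return sum(1 for x in range(num_vertices) if degree[x] == 1)
```

For a graph with 4 leaves and a single cycle (*s* = 1), the decomposition result bounds the coproduct by at most 2 trees, each with 5 leaves.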

### Clustering strings with mutations using an expectation-maximization algorithm

Series
Mathematical Biology Seminar
Time
Wednesday, October 2, 2019 - 11:00 for 1 hour (actually 50 minutes)
Location
Skiles 006
Speaker
Afaf Saaidi, Georgia Tech

An expectation-maximization (EM) algorithm is a powerful method for maximum likelihood estimation with latent variables, widely used for clustering, for example to fit Gaussian mixture distributions. When the parameters of the underlying probability distribution are unknown, an EM algorithm aims to estimate the parameters that maximize the likelihood of the data being generated by the model. We present an EM algorithm that addresses the problem of clustering "mutated" substrings of similar parent strings so that each substring is correctly assigned to its parent string. This problem is motivated by the process of simultaneously reading similar RNA sequences, during which various substrings of the sequences are produced and may be mutated; that is, a substring may have some letters changed during the reading process. Because the original RNA sequences are similar, a substring can easily be assigned to the wrong original sequence. We describe our EM algorithm and present a test on a simulated benchmark showing that our method assigns the substrings more accurately than previous methods. We conclude by discussing how this assignment problem applies to RNA structure prediction.
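The abstract does not give the speaker's model, so the sketch below is only a generic illustration of EM-style clustering of mutated strings, under simplifying assumptions that are ours rather than the talk's: reads are pre-aligned and of equal length, each read comes from one of K parent strings, every position mutates independently at a single rate `eps` to one of the other three letters, and the caller supplies the initial parent strings. The E-step computes each read's responsibility for each parent; the M-step updates the mixing weights, the parent letters (a responsibility-weighted majority), and `eps`. Updating the parents and `eps` one after the other makes this an ECM variant of EM. All names are hypothetical.

```python
ALPHABET = "ACGU"


def read_likelihood(read, center, eps):
    """P(read | parent) with independent substitutions at rate eps per position."""
    p = 1.0
    for a, b in zip(read, center):
        p *= (1 - eps) if a == b else eps / (len(ALPHABET) - 1)
    return p


def em_cluster(reads, centers, eps=0.1, iters=20):
    """Cluster equal-length reads around len(centers) parent strings."""
    K, L = len(centers), len(reads[0])
    weights = [1.0 / K] * K
    centers = list(centers)
    for _ in range(iters):
        # E-step: posterior probability that each read came from each parent.
        resp = []
        for r in reads:
            joint = [weights[k] * read_likelihood(r, centers[k], eps) for k in range(K)]
            total = sum(joint)
            resp.append([j / total for j in joint])
        # M-step (1): mixing weights are the average responsibilities.
        weights = [sum(g[k] for g in resp) / len(reads) for k in range(K)]
        # M-step (2): each parent letter is the responsibility-weighted majority.
        centers = [
            "".join(
                max(ALPHABET, key=lambda a: sum(g[k] for r, g in zip(reads, resp) if r[j] == a))
                for j in range(L)
            )
            for k in range(K)
        ]
        # M-step (3): mutation rate is the expected fraction of mismatched positions.
        mismatches = sum(
            g[k] * sum(a != b for a, b in zip(r, centers[k]))
            for r, g in zip(reads, resp)
            for k in range(K)
        )
        eps = min(max(mismatches / (len(reads) * L), 1e-6), 0.5)
    # Final hard assignment with the updated parameters.
    labels = [
        max(range(K), key=lambda k: weights[k] * read_likelihood(r, centers[k], eps))
        for r in reads
    ]
    return centers, labels, eps
```

Seeding the two parents with one read each, six reads drawn from two parents that differ in two positions (some carrying a single substitution) separate into the expected groups.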

### Insertions on Double Occurrence Words

Series
Mathematical Biology Seminar
Time
Wednesday, September 25, 2019 - 11:00 for 1 hour (actually 50 minutes)
Location
Skiles 006
Speaker
Daniel Cruz, Georgia Tech

A double occurrence word (DOW) is a word in which every symbol appears exactly twice; two DOWs are equivalent if one is a symbol-to-symbol image of the other. In the context of genomics, DOWs and operations on DOWs have been used in studies of DNA rearrangement. By modeling the DNA rearrangement process using DOWs, it was observed that over 95% of the scrambled genome of the ciliate *Oxytricha trifallax* could be described by iterative insertions of the "repeat pattern" and the "return pattern". These patterns generalize square and palindromic factors of DOWs, respectively. We introduce a notion of inserting repeat/return words into DOWs and study how two distinct insertions into the same word can produce equivalent DOWs. Given a DOW *w*, we characterize the structure of *w* which allows two distinct insertions to yield equivalent DOWs. This characterization depends on the locations of the insertions and on the length of the inserted repeat/return words and implies that when one inserted word is a repeat word and the other is a return word, then both words must be trivial (i.e., have only one symbol). The characterization also introduces a method to generate families of words recursively.
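To make the definitions concrete, here is a small sketch with hypothetical function names. It tests whether a word is a DOW, decides equivalence by relabeling symbols in order of first appearance, and builds a repeat word u u and a return word u u^R from a word u of distinct symbols (we assume these standard definitions of repeat and return words). The insertion operation itself is not spelled out in the abstract, so it is not implemented here.

```python
from collections import Counter


def is_double_occurrence_word(word):
    """True if every symbol of the word appears exactly twice."""
    return all(count == 2 for count in Counter(word).values())


def canonical_form(word):
    """Relabel symbols 0, 1, 2, ... in order of first appearance.

    Two DOWs are equivalent (one is a symbol-to-symbol image of the other)
    exactly when their canonical forms agree.
    """
    labels = {}
    return tuple(labels.setdefault(symbol, len(labels)) for symbol in word)


def repeat_word(u):
    """u u, e.g. 'abc' -> 'abcabc' (symbols of u assumed distinct)."""
    return u + u


def return_word(u):
    """u followed by its reversal, e.g. 'abc' -> 'abccba'."""
    return u + u[::-1]
```

A one-symbol repeat word and a one-symbol return word are both `aa`, whereas for two or more symbols the two patterns give inequivalent words such as `abab` and `abba`.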

### Species network inference under the coalescent model

Series
Mathematical Biology Seminar
Time
Wednesday, September 18, 2019 - 11:00 for 1 hour (actually 50 minutes)
Location
Skiles 006
Speaker
Hector Banos, Georgia Tech

When hybridization plays a role in evolution, networks are necessary to describe species-level relationships. In this talk, we show that most topological features of a level-1 species network (networks with no interlocking cycles) are identifiable from gene tree topologies under the network multispecies coalescent model (NMSC). We also present the theory behind NANUQ, a new practical method for the inference of level-1 networks under the NMSC.
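NANUQ works from quartet concordance factors, the frequencies of the three unrooted four-taxon topologies among gene trees. As a hedged illustration of the simpler tree case only (no hybridization, and not NANUQ itself), the sketch below uses the standard multispecies coalescent formula for a four-taxon species tree ab|cd with internal branch length t in coalescent units. The gene tree matches ab|cd with probability 1 - (2/3)e^{-t}, and each alternative topology has probability (1/3)e^{-t}. Topology labels and function names are hypothetical.

```python
import math
from collections import Counter

TOPOLOGIES = ("ab|cd", "ac|bd", "ad|bc")


def quartet_concordance_factors(internal_length):
    """Expected gene-tree quartet frequencies for species tree ab|cd under the MSC."""
    if internal_length < 0:
        raise ValueError("branch length must be non-negative")
    minor = math.exp(-internal_length) / 3
    return {"ab|cd": 1 - 2 * minor, "ac|bd": minor, "ad|bc": minor}


def empirical_concordance_factors(gene_quartets):
    """Observed frequency of each quartet topology among a list of gene trees."""
    counts = Counter(gene_quartets)
    return {q: counts[q] / len(gene_quartets) for q in TOPOLOGIES}


def estimate_internal_length(major_cf):
    """Invert the tree formula: t = -ln(1.5 * (1 - major_cf))."""
    if not 1 / 3 <= major_cf < 1:
        raise ValueError("major concordance factor must lie in [1/3, 1)")
    return -math.log(1.5 * (1 - major_cf))
```

With t = 0 all three topologies are equally likely, and the dominant topology becomes more frequent as the internal branch lengthens.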

### The geometry of phylogenetic tree spaces

Series
Mathematical Biology Seminar
Time
Wednesday, September 11, 2019 - 11:00 for 1 hour (actually 50 minutes)
Location
Skiles 006
Speaker
Bo Lin, Georgia Tech

Phylogenetic trees are the fundamental mathematical representation of evolutionary processes in biology. As data objects, they are characterized by the challenges associated with "big data," as well as the complication that their discrete geometric structure results in a non-Euclidean phylogenetic tree space, which poses computational and statistical limitations.

In this talk, I will compare the geometric and statistical properties of a well-studied framework, the Billera-Holmes-Vogtmann (BHV) space, with those of an alternative framework that we propose, which is based on tropical geometry. Our framework exhibits analytic, geometric, and topological properties that are desirable for theoretical studies in probability and statistics, as well as increased computational efficiency. I will also demonstrate our approach on an example of seasonal influenza data.
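The abstract does not define the tropical framework. A common choice in this line of work, which we assume here, represents a tree on a fixed leaf set by its vector of pairwise leaf distances and compares trees with the tropical metric d_tr(u, v) = max_i (u_i - v_i) - min_i (u_i - v_i), which ignores adding the same constant to every coordinate. The toy trees below need not be ultrametric, although the speaker's setting may restrict to ultrametric trees. All function names are hypothetical.

```python
from collections import defaultdict


def build_tree(edges):
    """Undirected weighted tree from (node, node, branch_length) triples."""
    tree = defaultdict(list)
    for u, v, length in edges:
        tree[u].append((v, length))
        tree[v].append((u, length))
    return tree


def path_lengths_from(tree, start):
    """Distance from start to every node (paths in a tree are unique)."""
    dist, stack = {start: 0.0}, [start]
    while stack:
        node = stack.pop()
        for nbr, length in tree[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + length
                stack.append(nbr)
    return dist


def leaf_distance_vector(tree, leaves):
    """Pairwise leaf distances in the fixed order (0,1), (0,2), ..., (1,2), ..."""
    vector = []
    for i, a in enumerate(leaves):
        dist = path_lengths_from(tree, a)
        vector.extend(dist[b] for b in leaves[i + 1:])
    return vector


def tropical_distance(u, v):
    """d_tr(u, v) = max_i (u_i - v_i) - min_i (u_i - v_i)."""
    diffs = [x - y for x, y in zip(u, v)]
    return max(diffs) - min(diffs)
```

Two star trees whose branch lengths differ by a common amount have distance vectors that differ by a constant, so their tropical distance is 0.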

### Some combinatorics of RNA branching

Series
Mathematical Biology Seminar
Time
Wednesday, September 4, 2019 - 11:00 for 1 hour (actually 50 minutes)
Location
Skiles 006
Speaker
Christine Heitsch, Georgia Tech

Understanding the folding of RNA sequences into three-dimensional structures is one of the fundamental challenges in molecular biology. For example, the branching of an RNA secondary structure is an important molecular characteristic, yet it is difficult to predict correctly. However, recent results in geometric combinatorics (both theoretical and computational) yield new insights into the distribution of optimal branching configurations and suggest new directions for improving prediction accuracy.
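The abstract does not fix a representation for secondary structures. As an assumption, the sketch below reads branching off standard dot-bracket notation (nested base pairs only, no pseudoknots). The degree of the loop closed by a base pair counts that pair plus the pairs directly inside it: 1 for a hairpin, 2 for a stacked pair, bulge, or interior loop, and at least 3 for a multiloop, where the structure branches. The function name is hypothetical.

```python
def loop_degrees(dot_bracket):
    """Degree of the loop closed by each base pair, plus the exterior-loop branch count."""
    stack = []  # entries: [index of '(', number of pairs directly inside]
    degrees = []
    exterior_branches = 0
    for i, ch in enumerate(dot_bracket):
        if ch == "(":
            stack.append([i, 0])
        elif ch == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {i}")
            _, inner = stack.pop()
            degrees.append(inner + 1)  # closing pair plus the pairs it encloses
            if stack:
                stack[-1][1] += 1      # this pair branches off the enclosing loop
            else:
                exterior_branches += 1
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r}")
    if stack:
        raise ValueError("unmatched '('")
    return degrees, exterior_branches
```

For `((..)(..)(..))` the outer pair closes a loop of degree 4, a multiloop with three branches, while each inner pair closes a hairpin.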