Different behavior of the LCS depending on the number of colors

Stochastics Seminar
Thursday, September 4, 2008 - 3:00pm for 1 hour (actually 50 minutes)
Skiles 269
Heinrich Matzinger – School of Mathematics, Georgia Tech
Heinrich Matzinger
A common subsequence of two sequences X and Y is a sequence which is a subsequence of X as well as a subsequence of Y. A Longest Common Subsequence (LCS) of X and Y is a common subsequence with maximal length. Longest Common subsequences can be represented as alignments with gaps where the aligned letter pairs corresponds to the letters in the LCS. We consider two independent i.i.d.  binary texts X and Y of length n. We show that the behavior of the the alignment corresponding to the LCS is very different depending on the number of colors.  With 2-colors, long blocks tend to be aligned with no gaps, whilst for four or more colors the opposite is true. Let Ln denote the length of the LCS of X and Y.  In general the order of the variance of Ln is not known. We explain how a biased affect of a finite pattern can influence the order of the fluctuation of Ln.