Data-driven methods in protein engineering: new ways to utilize sequence and structures of proteins

Series: Mathematical Biology Seminar
Time: Wednesday, October 22, 2008 - 11:00am for 1 hour (actually 50 minutes)
Location: Skiles 255
Speaker: Andy Bommarius – School of Chemistry & Biochemistry, Georgia Tech
Organizer: Christine Heitsch

After rational protein design and combinatorial protein engineering (directed evolution), data-driven protein engineering is emerging as a third generation of techniques for improving protein properties. Data-driven protein engineering relies heavily on mathematical algorithms. In the first example, we developed a method for predicting the positions in the amino acid sequence that are critical for the catalytic activity of a protein. Using nucleotide sequences of both functional and non-functional variants and a Support Vector Machine (SVM) learning algorithm, we set out to narrow the interesting sequence space of proteins, i.e., to find the truly relevant positions. Variants of TEM-1 β-lactamase were created in silico using simulations of both mutagenesis and recombination protocols. The algorithm was shown to predict critical positions that can tolerate up to two amino acids. Certain pairs of amino acid residues are known to lead to inactive sequences unless both are mutated jointly. In the second example, we use SVM, Boolean learning (BL), and a combination of the two (BLSVM) to find such interactive residues. Results on interactive residues in two fluorescent proteins, Discosoma Red Fluorescent Protein (Ds-Red) and monomeric Red Fluorescent Protein (mRFP), will be presented.
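To make the first example concrete, here is a minimal sketch of how a linear SVM might flag critical positions from simulated functional and non-functional variants. It is not the speaker's method. It assumes a toy model: a 12-residue protein, a per-position "mutated or not" encoding instead of nucleotide sequences, two hypothetical critical positions (2 and 7), and a rule that any mutation at a critical position abolishes function. The SVM is trained by plain subgradient descent on the hinge loss, and all function names and parameter values are illustrative.

```python
import random


def simulate_variants(n_variants, seq_len, critical, mut_rate, rng):
    """Simulate random mutagenesis (toy model, an assumption).

    X[i][j] is 1 if variant i is mutated at position j, else 0.
    y[i] is +1 (functional) if no critical position is mutated, else -1.
    """
    X, y = [], []
    for _ in range(n_variants):
        x = [1 if rng.random() < mut_rate else 0 for _ in range(seq_len)]
        functional = all(x[j] == 0 for j in critical)
        X.append(x)
        y.append(1 if functional else -1)
    return X, y


def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=300):
    """Soft-margin linear SVM via full-batch subgradient descent.

    Minimizes (lam / 2) * ||w||^2 + mean(max(0, 1 - y * (w.x + b))).
    """
    n, d = len(X), len(X[0])
    w = [0.0] * d
    b = 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]  # gradient of the L2 penalty
        grad_b = 0.0
        for xi, yi in zip(X, y):
            score = sum(wj * xj for wj, xj in zip(w, xi)) + b
            if yi * score < 1.0:  # sample violates the margin
                for j, xj in enumerate(xi):
                    if xj:
                        grad_w[j] -= yi / n
                grad_b -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def rank_positions(w):
    """Order positions from most negative weight upward.

    A strongly negative weight means a mutation at that position pushes
    the prediction toward 'non-functional'.
    """
    return sorted(range(len(w)), key=lambda j: w[j])


rng = random.Random(0)   # fixed seed so the sketch is reproducible
critical = {2, 7}        # hypothetical critical positions
X, y = simulate_variants(400, 12, critical, 0.15, rng)
w, b = train_linear_svm(X, y)
top = rank_positions(w)[:len(critical)]
print("predicted critical positions:", sorted(top))
```

In this toy setting, mutations at non-critical positions are independent of the label, so their weights stay near zero while the weights at critical positions become strongly negative. Real data would need a richer encoding (for example, one indicator per residue identity) to capture positions that tolerate more than one amino acid.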

Georgia Institute of Technology College of Sciences
