Tropical Geometry for Biology
Lior Pachter and Bernd Sturmfels
Department of Mathematics
U.C. Berkeley
Tropical arithmetic
• Annotation is sequence labeling
• Annotation is important for biology
• Annotation is tropical arithmetic
Tropical geometry
• Tree basics
• Tree reconstruction is important for biology
• Tree space is the tropical Grassmannian
Back to the data
What is annotation?
INPUT: ..t..r…o..p..i..c..a..a..l...g..e..e..t..r..y..
OUTPUT: ..t..r…o..p..i..c..a..a..l...g..e..e..t..r..y..
Annotation is the labeling of the input sequence, in this case with 3 colors.
Biology example: gene annotation
Input: TAATATGTCCACGGGTATTGAGCATTGTACACGGGGTATTGAGCATGTAATGAA
Output: TAAT ATGTCCACGG TTGTACACGGCA G GTATTGAGGTATTGAG ATGTAAC TGAA
[Figure label: Leucine]
Finding a good annotation with tropical arithmetic
Example: assign “scores”, say x, y, z, to each color regardless of letter
Best annotation for TAAT is obtained by evaluating
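A minimal sketch of the scoring example, under assumptions the slide does not fix: tropical arithmetic is taken in the max-plus convention (tropical sum is max, tropical product is ordinary +), higher scores are better, and the numeric values chosen for x, y, z are made up. With one score per color regardless of letter, a labeling of TAAT scores as the sum of four color scores, so the best score is the tropical evaluation of the fourth tropical power of (x + y + z). All function names are hypothetical.

```python
from itertools import product

# Hypothetical scores for the three colors; the values are made up.
SCORES = {"x": 1.0, "y": 3.0, "z": 2.0}

def trop_add(a, b):
    """Tropical sum, max-plus convention (an assumption): the maximum."""
    return max(a, b)

def trop_mul(a, b):
    """Tropical product: ordinary addition."""
    return a + b

def best_score_bruteforce(seq, scores):
    """Tropically sum, over all labelings, the tropical product of the scores."""
    best = float("-inf")          # identity for the tropical sum
    for labeling in product(scores, repeat=len(seq)):
        total = 0.0               # identity for the tropical product
        for label in labeling:
            total = trop_mul(total, scores[label])
        best = trop_add(best, total)
    return best

def best_score_factored(seq, scores):
    """Evaluate the tropical power of (x + y + z), one factor per letter."""
    per_position = float("-inf")
    for s in scores.values():
        per_position = trop_add(per_position, s)
    result = 0.0
    for _ in seq:
        result = trop_mul(result, per_position)
    return result

brute = best_score_bruteforce("TAAT", SCORES)
fast = best_score_factored("TAAT", SCORES)
print(brute, fast)  # both print 12.0 with the made-up scores above
```

The brute-force version enumerates all 3^4 = 81 labelings, while the factored version uses distributivity of tropical multiplication over tropical addition, which is the same idea that makes dynamic programming over labelings efficient.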
What is a phylogenetic X-tree?
In Darwin’s example, X = {A,B,C,D,1}
Tree basics
[Figure: three trees, each with leaves labeled 1, 2, 3, 4]
In general, the number of unrooted binary trees on n leaves is the Schröder number (2n-5)!! = (2n-5)*(2n-7)*…*3*1
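As a quick check of the counting formula, here is a small sketch that evaluates (2n-5)!! for a few values of n. It assumes the count refers to unrooted binary trees with n labeled leaves; the function name is hypothetical.

```python
def num_unrooted_binary_trees(n):
    """Return (2n-5)!! = (2n-5)(2n-7)...3*1 for n >= 3 labeled leaves."""
    if n < 3:
        raise ValueError("the formula is used here for n >= 3 leaves")
    count = 1
    # Multiply the odd numbers 2n-5, 2n-7, ..., 3 (the factor 1 is implicit).
    for k in range(2 * n - 5, 1, -2):
        count *= k
    return count

for n in range(3, 8):
    print(n, num_unrooted_binary_trees(n))
# 3 1
# 4 3
# 5 15
# 6 105
# 7 945
```

For n = 4 the count is 3, matching the three trees on leaves 1, 2, 3, 4.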
Metrics and trees
Data: [ dij ], the distance between species i and j
[Figure: tree on leaves 1, 2, 3, 4 with branch lengths 0.1, 0.2, 0.4, 0.2, 0.3]
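To illustrate how a tree with branch lengths gives rise to distances dij, here is a minimal sketch that sums branch lengths along the path between each pair of leaves. The topology (leaves 1, 2 on one side of the internal branch, leaves 3, 4 on the other) and the assignment of the five lengths to particular branches are assumptions made for illustration; the internal node names "u" and "v" and the function names are hypothetical.

```python
from itertools import combinations

# Assumed tree: leaves 1, 2 attach to internal node "u"; leaves 3, 4 to "v".
# Which length sits on which branch is an assumption, not taken from the figure.
EDGES = [(1, "u", 0.1), (2, "u", 0.2), ("u", "v", 0.4), (3, "v", 0.2), (4, "v", 0.3)]

def build_adjacency(edges):
    """Build an undirected weighted adjacency list from (a, b, length) triples."""
    adj = {}
    for a, b, w in edges:
        adj.setdefault(a, []).append((b, w))
        adj.setdefault(b, []).append((a, w))
    return adj

def path_length(adj, start, goal):
    """Length of the unique path between two nodes of a tree (iterative DFS)."""
    stack = [(start, None, 0.0)]
    while stack:
        node, parent, dist = stack.pop()
        if node == goal:
            return dist
        for nxt, w in adj[node]:
            if nxt != parent:
                stack.append((nxt, node, dist + w))
    raise ValueError("nodes are not connected")

def tree_metric(edges, leaves):
    """Return {(i, j): d_ij} for all pairs of leaves i < j."""
    adj = build_adjacency(edges)
    return {(i, j): path_length(adj, i, j) for i, j in combinations(leaves, 2)}

D = tree_metric(EDGES, [1, 2, 3, 4])
for (i, j), d in sorted(D.items()):
    print(f"d[{i},{j}] = {d:.1f}")
```

Tree reconstruction goes in the opposite direction: given the matrix [ dij ] from data, find a tree and branch lengths that reproduce it, exactly or approximately.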
A primate tree from genome sequences
Tree space is the tropical Grassmannian
Example: X={1,2,3,4,5}
[Figure: tree with leaves 1, 2, 3, 4, 5]
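One way to make the statement concrete for X = {1,2,3,4,5}: up to sign convention, the tropical three-term Plücker relations say that for every four taxa i, j, k, l, the maximum of dij + dkl, dik + djl, dil + djk is attained at least twice, which is the four-point condition for tree metrics. The sketch below tests this on two made-up distance tables. The max convention, the example distances (taken from a hypothetical tree with all branch lengths equal to 1, cherries {1,2} and {4,5}, and leaf 3 in the middle), and the function name are assumptions.

```python
from itertools import combinations

def satisfies_four_point(d, taxa, tol=1e-9):
    """True if, for every quadruple, the largest of the three pair sums is attained at least twice."""
    def dist(i, j):
        return d[(i, j)] if (i, j) in d else d[(j, i)]
    for i, j, k, l in combinations(taxa, 4):
        sums = sorted([dist(i, j) + dist(k, l),
                       dist(i, k) + dist(j, l),
                       dist(i, l) + dist(j, k)])
        if abs(sums[2] - sums[1]) > tol:
            return False
    return True

TAXA = [1, 2, 3, 4, 5]

# Distances on a hypothetical unit-branch-length tree with cherries {1,2} and {4,5}.
D_TREE = {
    (1, 2): 2, (1, 3): 3, (1, 4): 4, (1, 5): 4,
    (2, 3): 3, (2, 4): 4, (2, 5): 4,
    (3, 4): 3, (3, 5): 3,
    (4, 5): 2,
}

# Perturb one entry so that the table no longer comes from a tree.
D_BAD = dict(D_TREE)
D_BAD[(1, 3)] = 5

print(satisfies_four_point(D_TREE, TAXA))  # True
print(satisfies_four_point(D_BAD, TAXA))   # False
```

For five taxa there are five quadruples to check, one for each way of leaving out a single taxon.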
Back to the data
[Figure labels: Alignment; Phylogeny; Annotation; Multi HMM; Generalized HMM; Tree Markov models; Generalized Multi HMM; Evol. HMM; Generalized hidden Markov Phylogeny; Graphical Models]
Final message: Tropical mathematics is important for comparative genomics.
For more on mathematics and tropical geometry (and combinatorics and algebra and statistics…):
L. Pachter and B. Sturmfels, Tropical Geometry of Statistical Models, PNAS 101, 2004.
L. Pachter and B. Sturmfels, Parametric Inference for Biological Sequence Analysis, PNAS 101, 2004.
D. Speyer and B. Sturmfels, The Tropical Grassmannian, Advances in Geometry 4, 2004.
L. Pachter and B. Sturmfels, Mathematics of Phylogenomics, arXiv math.ST/0409132, 2004.
and coming soon:
Book (to be published by Cambridge University Press)
Algebraic Statistics for Computational Biology, edited by Pachter and Sturmfels