Cédric Notredame (22/02/2016): Comparing Two Protein Sequences


Upload: hollie-tyler

Post on 18-Jan-2018

DESCRIPTION

Cédric Notredame (22/02/2016)
Outline
-WHY Does It Make Sense To Compare Sequences?
-HOW Can We Align Two Sequences?
-HOW Can I Search a Database?
-HOW Can We Compare Two Sequences?

TRANSCRIPT

Cédric Notredame (22/02/2016)
Comparing Two Protein Sequences

Our Scope
-Pairwise Alignment methods are POWERFUL
-Pairwise Alignment methods are LIMITED
-If You Understand the LIMITS, they Become VERY POWERFUL
-Look once Under the Hood

Outline
-WHY Does It Make Sense To Compare Sequences?
-HOW Can We Align Two Sequences?
-HOW Can I Search a Database?
-HOW Can We Compare Two Sequences?

Why Does It Make Sense To Compare Sequences?
Sequence Evolution

Why Do We Want To Compare Sequences?
wheat --DPNKPKRAMTSFVFFMSEFRSEFKQKHSKLKSIVEMVKAAGER
        | | |||||||| || | |||    ||| | ||||  ||||
????? KKDSNAPKRAMTSFMFFSSDFRS----KHSDL-SIVEMSKAAGAA
[Slide labels: EXTRAPOLATE, ??????, Homology?, SwissProt]

Why Does It Make Sense To Align Sequences?
-Evolution is our Real Tool.
-Nature is LAZY and Keeps re-using Stuff.
-Evolution is mostly DIVERGENT
Same Sequence -> Same Ancestor

Why Does It Make Sense To Align Sequences?
Same Sequence -> Same Function, Same 3D Fold, Same Origin
Many Counter-examples!

Comparing Is Reconstructing Evolution

An Alignment is a STORY
ADKPKRPLSAYMLWLN
ADKPKRPKPRLSAYMLWLN
ADKPRRPLS-YMLWLN
ADKPKRPLSAYMLWLN
Mutations + Selection

An Alignment is a STORY
ADKPRRP---LS-YMLWLN
ADKPKRPKPRLSAYMLWLN
Mutation, Insertion, Deletion

Evolution is NOT Always Divergent
-AFGP with (ThrAlaAla)n, Similar To Trypsinogen
-AFGP with (ThrAlaAla)n, NOT Similar to Trypsinogen
[Figure labels: N, S]
Chen et al, 97, PNAS, 94
SIMILAR Sequences BUT DIFFERENT origin

Evolution is NOT always Divergent
But in MOST cases, you may assume it is.
Same Sequence -> Same Function, Same 3D Fold, Same Origin
Similar Function DOES NOT REQUIRE Similar Sequence
Historical Legacy

How Do Sequences Evolve?
Each Portion of a Genome has its own Agenda.

How Do Sequences Evolve?
-CONSTRAINED Genome Positions Evolve SLOWLY
-EVERY Protein Family Has its Own Level Of Constraint
Family (Ks, Ka): Histone 36.40, Insulin, Interleukin I, Globin, Apolipoprot. AI, Interferon G
Rates in Substitutions/site/Billion Years, as measured on Mouse vs Human (80 Million years).
Ks: Synonymous Mutations. Ka: Non-Neutral Mutations.

Different molecular clocks for different proteins: another prediction

How Do Sequences Evolve?
The Amino Acids Venn Diagram
To Make Things Worse, Every Residue has its Own Personality
[Figure: amino acid Venn diagram; group labels: Aliphatic, Aromatic, Hydrophobic, Polar, Small]

How Do Sequences Evolve?
In a structure, each Amino Acid plays a Special Role
OmpR, Cter Domain
-In the core, SIZE MATTERS
-On the surface, CHARGE MATTERS

How Do Sequences Evolve?
Accepted Mutations Depend on the Structure
-Big -> Big, Small -> Small, NO DELETION
-Charged -> Charged, Small -> Big or Small, DELETIONS

How Can We Compare Sequences?
Substitution Matrices

How Can We Compare Sequences?
To Compare Two Sequences, We need:
-Their Function
-Their Structure
We Do Not Have Them !!!

How Can We Compare Sequences?
We will Need To Replace Structural Information With Sequence Information.
Same Sequence -> Same Function, Same 3D Fold, Same Origin
It CANNOT Work ALL THE TIME !!!

How Can We Compare Sequences?
To Compare Sequences, We need to Compare Residues.
We Need to Know How Much it COSTS to SUBSTITUTE
-an Alanine into an Isoleucine
-a Tryptophan into a Glycine
The table that contains the costs for all the possible substitutions is called the SUBSTITUTION MATRIX.
How to derive that matrix?

How Can We Compare Sequences?
[Figure: amino acid Venn diagram; group labels: Aliphatic, Aromatic, Hydrophobic, Polar, Small]
Using Knowledge Could Work, But we do not know enough about Evolution and Structure.
Using Data works better.

How Can We Compare Sequences?
Making a Substitution Matrix
-Take 100 nice pairs of Protein Sequences, easy to align (80% identical).
-Align them
-Count each mutation in the alignments
 -25 Tryptophans into Phenylalanine
 -30 Isoleucines into Leucine
-For each mutation, set the substitution score to the log odds ratio:
 score = Log( Observed / Expected by chance )

"You're kidding! I was struck by lightning twice too!!"
Gary Larson, The Far Side
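
The log-odds recipe from the "Making a Substitution Matrix" slide can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `log_odds_scores`, the toy alignment, the choice of log base 2, and the way the chance expectation is computed (product of overall residue frequencies, doubled for unordered pairs of different residues) are all choices made here, not details given in the slides.

```python
import math
from collections import Counter


def log_odds_scores(aligned_pairs):
    """Return {(a, b): log2(observed / expected)} for residue pairs with a <= b."""
    pair_counts = Counter()      # how often residues a and b share a column
    residue_counts = Counter()   # how often each residue occurs overall
    for row_a, row_b in aligned_pairs:
        for a, b in zip(row_a, row_b):
            if a == "-" or b == "-":
                continue  # gapped columns carry no substitution information
            pair_counts[tuple(sorted((a, b)))] += 1
            residue_counts[a] += 1
            residue_counts[b] += 1

    total_pairs = sum(pair_counts.values())
    total_residues = sum(residue_counts.values())
    scores = {}
    for (a, b), count in pair_counts.items():
        observed = count / total_pairs
        p_a = residue_counts[a] / total_residues
        p_b = residue_counts[b] / total_residues
        # An unordered pair of different residues can arise two ways by chance.
        expected = p_a * p_b if a == b else 2 * p_a * p_b
        scores[(a, b)] = math.log2(observed / expected)
    return scores


# Made-up two-letter "alignment", purely to exercise the function.
toy_alignment = [("AAAAWWWW", "AAAWWWWA")]
scores = log_odds_scores(toy_alignment)
for pair, value in sorted(scores.items()):
    print(pair, round(value, 3))
```

Pairs seen more often than chance predicts get a positive score and pairs seen less often get a negative one, which is why the diagonal of a real matrix reflects how conserved each residue tends to be.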

How Can We Compare Sequences?
Making a Substitution Matrix
-The Diagonal Indicates How Conserved a residue tends to be.
-W is VERY Conserved
-Some Residues are Easier To mutate into other similar ones
-Cysteines that make disulfide bridges and those that do not get averaged

How Can We Compare Sequences?
Using a Substitution Matrix
ADKPRRP---LS-YMLWLN
ADKPKRPKPRLSAYMLWLN
Mutation, Insertion, Deletion
Given two Sequences and a Substitution Matrix, We must Compute the CHEAPEST Alignment

Scoring an Alignment
Most popular Substitution Matrices:
-PAM250
-Blosum62 (Most widely used)
Raw Score:
TPEA
 | |
APGA
Score = 1 + ... = 9
Question: Is it possible to get such a good alignment by chance only?

Insertions and Deletions
Gap Penalties: Opening a gap is more expensive than extending it
Seq A GARFIELDTHE----CAT
      |||||||||||    |||
Seq B GARFIELDTHELASTCAT
-Gap Opening Penalty
-Gap Extension Penalty

How Can We Compare Sequences?
Limits of the Substitution Matrices
-They ignore non-local interactions and Assume that identical residues are equal
-They assume evolution rate to be constant

How Can We Compare Sequences?
Limits of the Substitution Matrices
Substitution Matrices Cannot Work !!!
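
To make the "Scoring an Alignment" and "Gap Penalties" slides concrete, here is a minimal sketch that scores an already-built alignment. The names `score_alignment` and `identity_score`, the penalty values, and the convention that a gap of length L costs GOP + L * GEP are assumptions made for this example; real programs use a matrix such as PAM250 or Blosum62 instead of the toy identity scoring used here, and may define the gap cost slightly differently.

```python
def identity_score(a, b):
    """Toy stand-in for a substitution matrix: +1 for identity, -1 otherwise."""
    return 1 if a == b else -1


def score_alignment(row_a, row_b, substitution=identity_score, gop=-4, gep=-1):
    """Sum substitution scores over aligned columns, charging affine gap costs."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    total = 0
    gap_side = None  # which row the currently open gap is in, if any
    for a, b in zip(row_a, row_b):
        if a == "-" or b == "-":
            side = "a" if a == "-" else "b"
            if side != gap_side:
                total += gop   # opening a new gap is charged once
            total += gep       # every gapped column pays the extension cost
            gap_side = side
        else:
            total += substitution(a, b)
            gap_side = None
    return total


# The example from the gap-penalty slide: 14 identities and one gap of length 4.
garfield = score_alignment("GARFIELDTHE----CAT", "GARFIELDTHELASTCAT")
print(garfield)

# Same number of gapped columns, split into one gap versus two gaps.
one_gap = score_alignment("AB--CD", "ABXXCD")
two_gaps = score_alignment("A-B-CD", "AXBXCD")
print(one_gap, two_gaps)
```

With these assumed penalties a single gap of length two scores better than two gaps of length one, which is the behaviour the "opening is more expensive than extending" rule is meant to produce.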

Limits of the Substitution Matrices
I know... But at least, could I get some idea of when they are likely to do all right?

How Can We Compare Sequences?
The Twilight Zone
[Figure: %Sequence Identity plotted against Length; above 30% identity: Similar Sequence, Similar Structure (Same 3D Fold); below: Twilight Zone, Different Sequence, Structure ????; axis marks at 30 and 100]

How Can We Compare Sequences?
The Twilight Zone
Substitution Matrices Work Reasonably Well on Sequences that have more than 30% identity over more than 100 residues

How Can We Compare Sequences?
Which Matrix Shall I Use?
-The Initial PAM matrix was computed on 80% similar Proteins
-It has been extrapolated to more distantly related sequences: Pam 250, Pam 350
-Other Matrices Exist: BLOSUM 42, BLOSUM 62

How Can We Compare Sequences?
Which Matrix Shall I Use?
-PAM: Distant Proteins -> High Index (PAM 350)
-BLOSUM: Distant Proteins -> Low Index (Blosum30)
-GONNET 250 > BLOSUM62 > PAM 250
But This will depend on:
-The Family.
-The Program Used and Its Tuning.
Choosing The Right Matrix may be Tricky
Insertions, Deletions?

HOW Can We Align Two Sequences?
-Dot Matrices
-Global Alignments
-Local Alignment

Dot Matrices
QUESTION: What are the elements shared by two sequences?
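
One way to answer that question is a dot matrix filtered by a window and a stringency. The sketch below assumes a common definition (a dot is placed at position (i, j) when at least `stringency` of the `window` residue pairs starting there are identical); the function names `dot_matrix` and `render` and the toy sequences THEFATCAT and THEFASTCAT are illustrative choices, not a reference implementation.

```python
def dot_matrix(seq_x, seq_y, window=1, stringency=1):
    """Return (i, j) start positions where the two windows share enough identities."""
    dots = []
    for i in range(len(seq_x) - window + 1):
        for j in range(len(seq_y) - window + 1):
            matches = sum(seq_x[i + k] == seq_y[j + k] for k in range(window))
            if matches >= stringency:
                dots.append((i, j))
    return dots


def render(seq_x, seq_y, dots):
    """Draw the matrix as text: seq_x across the top, seq_y down the side."""
    marked = set(dots)
    lines = ["  " + seq_x]
    for j, residue in enumerate(seq_y):
        row = "".join("*" if (i, j) in marked else "." for i in range(len(seq_x)))
        lines.append(residue + " " + row)
    return "\n".join(lines)


seq1, seq2 = "THEFATCAT", "THEFASTCAT"
noisy = dot_matrix(seq1, seq2, window=1, stringency=1)
clean = dot_matrix(seq1, seq2, window=3, stringency=3)
print(render(seq1, seq2, noisy))
print()
print(render(seq1, seq2, clean))
```

With window=1 every chance identity produces a dot; with window=3 and stringency=3 only the two diagonal runs survive, and the shift between them shows where one sequence carries an extra residue.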

Dot Matrices
>Seq1 THEFATCAT
>Seq2 THELASTCAT
[Figure: dot matrix with THEFATCAT on one axis and T H E F A S T C A T on the other; labels: Window, Stringency]

Dot Matrices
Parameters: Sequences, Window size, Stringency

Dot Matrices: Stringency
-Window=1, Stringency=1
-Window=11, Stringency=7
-Window=25, Stringency=15

Dot Matrices
[Figures: example dot plots; axis labels x, y]

Dot Matrices: Limits
-Visual aid
-Best Way to EXPLORE the Sequence Organisation
-Does NOT provide us with an ALIGNMENT
wheat --DPNKPKRAMTSFVFFMSEFRSEFKQKHSKLKSIVEMVKAAGER
        | | |||||||| || | |||    ||| | ||||  ||||
????? KKDSNAPKRAMTSFMFFSSDFRS----KHSDL-SIVEMSKAAGAA

Global Alignments
-Take 2 Nice Protein Sequences
-A good Substitution Matrix (blosum)
-A Gap Opening Penalty (GOP)
-A Gap Extension Penalty (GEP)
Affine Gap Penalty
[Figure: gap Cost plotted against gap length L; labels: GOP, GEP]
Parsimony: Evolution takes the simplest path (So We Think)

Global Alignments
-Take 2 Nice Protein Sequences
-A good Substitution Matrix (blosum)
-A Gap Opening Penalty (GOP)
-A Gap Extension Penalty (GEP)
-DYNAMIC PROGRAMMING
>Seq1 THEFATCAT
>Seq2 THEFASTCAT
THEFA-TCAT
THEFASTCAT

Global Alignments
F A S T
F A T
----FAT
FAST---
(L1+L2)! / ((L1)! * (L2)!)
---FAT-
FAST---
--F-AT-
FAST---
Brute Force Enumeration
DYNAMIC PROGRAMMING

Global Alignments: DYNAMIC PROGRAMMING
Match=1, MisMatch=-1, Gap=-1
[Figure: dynamic programming matrix for F A T against F A S T]
Dynamic Programming (Needleman and Wunsch)
FAST
FA-T

Global Alignments: DYNAMIC PROGRAMMING
-Global Alignments are very sensitive to gap Penalties (GOP, GEP)
-Global Alignments do not take into account the MODULAR nature of Proteins
 C: K vitamin dep. Ca Binding
 K: Kringle Domain
 G: Growth Factor module
 F: Finger Module

Local Alignments
[Figure: GLOBAL Alignment vs LOCAL Alignment]
Smith And Waterman (SW) = LOCAL Alignment

Local Alignments
We now have a PairWise Comparison Algorithm, We are ready to search Databases

Database Search
[Figure labels: QUERY (Q), Comparison Engine (SW), Database, list of E-values]
E-values: How many times do we expect such an Alignment by chance?

CONCLUSION
Sequence Comparison
-Thanks to evolution, We CAN compare Sequences
-There is a relation between Sequence and Structure.
-Substitution matrices only work well with similar Sequences (More than 30% id).
-The Easiest way to Compare Two Sequences is a dotplot.

A few Addresses
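
As a recap of the dynamic programming slides, the sketch below fills the Needleman and Wunsch matrix for FAST against FAT with the slide's scores (Match=1, MisMatch=-1, Gap=-1) and traces back one optimal global alignment. A score-only Smith and Waterman variant is included to show the single change that makes the alignment local. The function names, the linear (non-affine) gap cost, and the tie-breaking order in the traceback are assumptions of this sketch.

```python
def needleman_wunsch(seq_a, seq_b, match=1, mismatch=-1, gap=-1):
    """Global alignment: return (score, aligned_a, aligned_b)."""
    n, m = len(seq_a), len(seq_b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap          # seq_a prefix aligned to gaps only
    for j in range(1, m + 1):
        score[0][j] = j * gap          # seq_b prefix aligned to gaps only
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if seq_a[i - 1] == seq_b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,   # align two residues
                              score[i - 1][j] + gap,       # gap in seq_b
                              score[i][j - 1] + gap)       # gap in seq_a
    # Traceback from the bottom-right corner, preferring the diagonal move.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = match if seq_a[i - 1] == seq_b[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + sub:
                out_a.append(seq_a[i - 1])
                out_b.append(seq_b[j - 1])
                i, j = i - 1, j - 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(seq_a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(seq_b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


def smith_waterman_score(seq_a, seq_b, match=1, mismatch=-1, gap=-1):
    """Local alignment score: same recurrence, but cells never drop below 0."""
    best = 0
    previous = [0] * (len(seq_b) + 1)
    for i in range(1, len(seq_a) + 1):
        current = [0]
        for j in range(1, len(seq_b) + 1):
            sub = match if seq_a[i - 1] == seq_b[j - 1] else mismatch
            current.append(max(0,
                               previous[j - 1] + sub,
                               previous[j] + gap,
                               current[j - 1] + gap))
        best = max(best, max(current))
        previous = current
    return best


global_result = needleman_wunsch("FAST", "FAT")
local_result = smith_waterman_score("XXCATXX", "YYYCATY")
print(global_result)
print(local_result)
```

The matrix has (L1+1) * (L2+1) cells, so the work grows with the product of the two lengths rather than with the number of possible alignments that brute force enumeration would have to visit.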