Protein Secondary & Super-Secondary Structure Prediction with HMM by En-Shiun Annie Lee
![Page 1: P ROTEIN SEONDARY & SUPER-SECONDARY STRUCTURE PREDICTION WITH HMM By En-Shiun Annie Lee](https://reader035.vdocument.in/reader035/viewer/2022062411/56813a87550346895da2840b/html5/thumbnails/1.jpg)
PROTEIN SECONDARY & SUPER-SECONDARY
STRUCTURE PREDICTION WITH HMM
By En-Shiun Annie Lee
CS 882 Protein Folding
Instructed by Professor Ming Li
![Page 2: P ROTEIN SEONDARY & SUPER-SECONDARY STRUCTURE PREDICTION WITH HMM By En-Shiun Annie Lee](https://reader035.vdocument.in/reader035/viewer/2022062411/56813a87550346895da2840b/html5/thumbnails/2.jpg)
0. OUTLINE
1. Introduction
2. Problem
3. Methods (4)
4. HMM Examples (3)
  a. Segmentation HMM
  b. Profile HMM
  c. Conditional Random Field
5. Proposal
![Page 3: P ROTEIN SEONDARY & SUPER-SECONDARY STRUCTURE PREDICTION WITH HMM By En-Shiun Annie Lee](https://reader035.vdocument.in/reader035/viewer/2022062411/56813a87550346895da2840b/html5/thumbnails/3.jpg)
1. INTRODUCTION
![Page 4: P ROTEIN SEONDARY & SUPER-SECONDARY STRUCTURE PREDICTION WITH HMM By En-Shiun Annie Lee](https://reader035.vdocument.in/reader035/viewer/2022062411/56813a87550346895da2840b/html5/thumbnails/4.jpg)
1. Genomics
• Achievements in genomics
  – BLAST (Basic Local Alignment Search Tool)
    • Most-cited paper published in the 1990s
    • Cited more than 15,000 times
  – Human Genome Project
    • Completed April 2003
![Page 5: P ROTEIN SEONDARY & SUPER-SECONDARY STRUCTURE PREDICTION WITH HMM By En-Shiun Annie Lee](https://reader035.vdocument.in/reader035/viewer/2022062411/56813a87550346895da2840b/html5/thumbnails/5.jpg)
1. Proteomics
• Precedence to proteomics
  – Protein Data Bank (PDB)
    • 40,132 structures
    • Cited more than 6,000 times
![Page 6: P ROTEIN SEONDARY & SUPER-SECONDARY STRUCTURE PREDICTION WITH HMM By En-Shiun Annie Lee](https://reader035.vdocument.in/reader035/viewer/2022062411/56813a87550346895da2840b/html5/thumbnails/6.jpg)
1. Proteomics
[Figure: number of protein structures in the Protein Data Bank]
![Page 7: P ROTEIN SEONDARY & SUPER-SECONDARY STRUCTURE PREDICTION WITH HMM By En-Shiun Annie Lee](https://reader035.vdocument.in/reader035/viewer/2022062411/56813a87550346895da2840b/html5/thumbnails/7.jpg)
1. Secondary Structure
• Importance
  – The known secondary structure may be used as an input for tertiary structure prediction.
![Page 8: P ROTEIN SEONDARY & SUPER-SECONDARY STRUCTURE PREDICTION WITH HMM By En-Shiun Annie Lee](https://reader035.vdocument.in/reader035/viewer/2022062411/56813a87550346895da2840b/html5/thumbnails/8.jpg)
1. Protein Structure
• Primary Structure
![Page 9: P ROTEIN SEONDARY & SUPER-SECONDARY STRUCTURE PREDICTION WITH HMM By En-Shiun Annie Lee](https://reader035.vdocument.in/reader035/viewer/2022062411/56813a87550346895da2840b/html5/thumbnails/9.jpg)
1. Protein Structure
• Secondary Structure
![Page 10: P ROTEIN SEONDARY & SUPER-SECONDARY STRUCTURE PREDICTION WITH HMM By En-Shiun Annie Lee](https://reader035.vdocument.in/reader035/viewer/2022062411/56813a87550346895da2840b/html5/thumbnails/10.jpg)
1. Secondary Structure
• α-helix
  – Interaction between the i-th and (i+4)-th residues
![Page 11: P ROTEIN SEONDARY & SUPER-SECONDARY STRUCTURE PREDICTION WITH HMM By En-Shiun Annie Lee](https://reader035.vdocument.in/reader035/viewer/2022062411/56813a87550346895da2840b/html5/thumbnails/11.jpg)
1. Secondary Structure
• β-sheet/strand
  – Parallel or anti-parallel
![Page 12: P ROTEIN SEONDARY & SUPER-SECONDARY STRUCTURE PREDICTION WITH HMM By En-Shiun Annie Lee](https://reader035.vdocument.in/reader035/viewer/2022062411/56813a87550346895da2840b/html5/thumbnails/12.jpg)
1. Secondary Structure
• Coil (loop)
![Page 13: P ROTEIN SEONDARY & SUPER-SECONDARY STRUCTURE PREDICTION WITH HMM By En-Shiun Annie Lee](https://reader035.vdocument.in/reader035/viewer/2022062411/56813a87550346895da2840b/html5/thumbnails/13.jpg)
1. Protein Structure
• Tertiary Structure
![Page 14: P ROTEIN SEONDARY & SUPER-SECONDARY STRUCTURE PREDICTION WITH HMM By En-Shiun Annie Lee](https://reader035.vdocument.in/reader035/viewer/2022062411/56813a87550346895da2840b/html5/thumbnails/14.jpg)
1. Protein Structure
• Super-Secondary (2.5) Structure
![Page 15: P ROTEIN SEONDARY & SUPER-SECONDARY STRUCTURE PREDICTION WITH HMM By En-Shiun Annie Lee](https://reader035.vdocument.in/reader035/viewer/2022062411/56813a87550346895da2840b/html5/thumbnails/15.jpg)
1. Protein Structure
• Quaternary Structure
(figure label: Super-Secondary (2.5) Structure)
![Page 16: P ROTEIN SEONDARY & SUPER-SECONDARY STRUCTURE PREDICTION WITH HMM By En-Shiun Annie Lee](https://reader035.vdocument.in/reader035/viewer/2022062411/56813a87550346895da2840b/html5/thumbnails/16.jpg)
2. PROBLEM
![Page 17: P ROTEIN SEONDARY & SUPER-SECONDARY STRUCTURE PREDICTION WITH HMM By En-Shiun Annie Lee](https://reader035.vdocument.in/reader035/viewer/2022062411/56813a87550346895da2840b/html5/thumbnails/17.jpg)
2. Secondary Structure
• Problem
  – Given:
    • A primary sequence of amino acids a1 a2 … an
  – Find:
    • Secondary structure of each ai as
      – α-helix = H
      – β-strand = E *
      – coil = C
![Page 18: P ROTEIN SEONDARY & SUPER-SECONDARY STRUCTURE PREDICTION WITH HMM By En-Shiun Annie Lee](https://reader035.vdocument.in/reader035/viewer/2022062411/56813a87550346895da2840b/html5/thumbnails/18.jpg)
2. Secondary Structure
• Example
  – Given:
    • Primary sequence
      – GHWIATRGQLIREAYEDYRHFSSECPFIP
  – Find:
    • Secondary structure elements
      – CEEEEECHHHHHHHHHHHCCCHHCCCCCC
      – Note: the states occur in contiguous segments
![Page 19: P ROTEIN SEONDARY & SUPER-SECONDARY STRUCTURE PREDICTION WITH HMM By En-Shiun Annie Lee](https://reader035.vdocument.in/reader035/viewer/2022062411/56813a87550346895da2840b/html5/thumbnails/19.jpg)
2. Prediction Quality
• Three-state prediction accuracy
  – Q3 = (# of correctly predicted residues) / (total # of residues)
  – Per-state accuracies: Qα, Qβ, Qc
  – Q3 for random prediction is 33%
  – Theoretical limit: Q3 = 90%
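As a minimal sketch of how Q3 and the per-state accuracies could be computed from two label strings: the function names `q3` and `per_state_accuracy` are hypothetical, and the predicted string is made up for illustration (only the true string comes from the example slide).

```python
def q3(true_ss: str, pred_ss: str) -> float:
    """Fraction of residues whose predicted state equals the true state."""
    if len(true_ss) != len(pred_ss) or not true_ss:
        raise ValueError("label strings must be non-empty and of equal length")
    correct = sum(t == p for t, p in zip(true_ss, pred_ss))
    return correct / len(true_ss)


def per_state_accuracy(true_ss: str, pred_ss: str, state: str) -> float:
    """Fraction of residues truly in `state` that are also predicted as `state`."""
    positions = [i for i, t in enumerate(true_ss) if t == state]
    if not positions:
        raise ValueError(f"state {state!r} does not occur in the true labels")
    return sum(pred_ss[i] == state for i in positions) / len(positions)


# True labels from the example slide; the prediction is hypothetical and
# differs from the truth in exactly one position (index 1).
true_ss = "CEEEEECHHHHHHHHHHHCCCHHCCCCCC"
pred_ss = true_ss[:1] + "C" + true_ss[2:]

print(f"Q3 = {q3(true_ss, pred_ss):.3f}")
print(f"Q_beta = {per_state_accuracy(true_ss, pred_ss, 'E'):.3f}")
```

Q3 treats every residue equally, so a predictor can score well while still breaking up segments; this is why the segment-based measures are reported alongside it.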
![Page 20: P ROTEIN SEONDARY & SUPER-SECONDARY STRUCTURE PREDICTION WITH HMM By En-Shiun Annie Lee](https://reader035.vdocument.in/reader035/viewer/2022062411/56813a87550346895da2840b/html5/thumbnails/20.jpg)
2. Prediction Quality
• Segment Overlap (SOV)
  – Higher penalties for core segment regions
• Matthews Correlation Coefficients (MCC)
  – Prediction errors made for each state
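A per-state Matthews correlation coefficient can be sketched as follows, treating one state as "positive" and the other two as "negative". The function name `mcc` is hypothetical, and returning 0.0 when the denominator vanishes is a common convention assumed here, not something stated in the slides.

```python
import math


def mcc(true_ss: str, pred_ss: str, state: str) -> float:
    """Matthews correlation coefficient for one state versus the rest."""
    if len(true_ss) != len(pred_ss):
        raise ValueError("label strings must have equal length")
    tp = fp = tn = fn = 0
    for t, p in zip(true_ss, pred_ss):
        if t == state and p == state:
            tp += 1
        elif t != state and p == state:
            fp += 1
        elif t == state and p != state:
            fn += 1
        else:
            tn += 1
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Assumed convention: report 0.0 when any marginal count is zero.
    return (tp * tn - fp * fn) / denom if denom else 0.0


print(mcc("HHCC", "HHCC", "H"))  # perfect agreement on the H state
print(mcc("HHCC", "CCHH", "H"))  # complete disagreement on the H state
```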
![Page 21: P ROTEIN SEONDARY & SUPER-SECONDARY STRUCTURE PREDICTION WITH HMM By En-Shiun Annie Lee](https://reader035.vdocument.in/reader035/viewer/2022062411/56813a87550346895da2840b/html5/thumbnails/21.jpg)
2. True Structures
• Three-dimensional PDB data
  – DSSP (Dictionary of Secondary Structure of Proteins)
    • 8 states, each reduced to one of the three prediction states
      – H = alpha helix → H
      – G = 3₁₀-helix → H
      – I = 5-helix (pi helix) → H
      – E = extended strand (beta ladder) → E
      – B = residue in isolated beta-bridge → E
      – T = hydrogen-bonded turn → C
      – S = bend → C
      – C = coil → C
  – STRIDE
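The 8-to-3 state reduction listed on this slide amounts to a lookup table. A minimal sketch follows; the names `DSSP_TO_3` and `reduce_dssp` are hypothetical, and mapping blank or `-` characters to coil is an added assumption about how unassigned residues are encoded.

```python
# Mapping taken from the slide: helices -> H, strands/bridges -> E, rest -> C.
DSSP_TO_3 = {
    "H": "H", "G": "H", "I": "H",
    "E": "E", "B": "E",
    "T": "C", "S": "C", "C": "C",
    # Assumption: unassigned residues (blank or '-') are treated as coil.
    " ": "C", "-": "C",
}


def reduce_dssp(dssp: str) -> str:
    """Reduce an 8-state DSSP string to the 3-state H/E/C alphabet."""
    return "".join(DSSP_TO_3[s] for s in dssp)


print(reduce_dssp("HGIEBTSC"))  # -> HHHEECCC
```

Other reduction schemes exist (for example, some treat G or B as coil), and the choice affects reported Q3 values, so comparisons should use the same mapping.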
![Page 22: P ROTEIN SEONDARY & SUPER-SECONDARY STRUCTURE PREDICTION WITH HMM By En-Shiun Annie Lee](https://reader035.vdocument.in/reader035/viewer/2022062411/56813a87550346895da2840b/html5/thumbnails/22.jpg)
3. METHODS
![Page 23: P ROTEIN SEONDARY & SUPER-SECONDARY STRUCTURE PREDICTION WITH HMM By En-Shiun Annie Lee](https://reader035.vdocument.in/reader035/viewer/2022062411/56813a87550346895da2840b/html5/thumbnails/23.jpg)
3. Sliding Window
• Sliding window
![Page 24: P ROTEIN SEONDARY & SUPER-SECONDARY STRUCTURE PREDICTION WITH HMM By En-Shiun Annie Lee](https://reader035.vdocument.in/reader035/viewer/2022062411/56813a87550346895da2840b/html5/thumbnails/24.jpg)
3. Sliding Window
• Sliding window
![Page 25: P ROTEIN SEONDARY & SUPER-SECONDARY STRUCTURE PREDICTION WITH HMM By En-Shiun Annie Lee](https://reader035.vdocument.in/reader035/viewer/2022062411/56813a87550346895da2840b/html5/thumbnails/25.jpg)
3. Sliding Window
• Sliding window
![Page 26: P ROTEIN SEONDARY & SUPER-SECONDARY STRUCTURE PREDICTION WITH HMM By En-Shiun Annie Lee](https://reader035.vdocument.in/reader035/viewer/2022062411/56813a87550346895da2840b/html5/thumbnails/26.jpg)
3. Sliding Window
• Sliding window
![Page 27: P ROTEIN SEONDARY & SUPER-SECONDARY STRUCTURE PREDICTION WITH HMM By En-Shiun Annie Lee](https://reader035.vdocument.in/reader035/viewer/2022062411/56813a87550346895da2840b/html5/thumbnails/27.jpg)
3. Four Methods
  a. Statistical Method
  b. Neural Network
  c. Support Vector Machine
  d. Hidden Markov Model
![Page 28: P ROTEIN SEONDARY & SUPER-SECONDARY STRUCTURE PREDICTION WITH HMM By En-Shiun Annie Lee](https://reader035.vdocument.in/reader035/viewer/2022062411/56813a87550346895da2840b/html5/thumbnails/28.jpg)
3a. Statistical Method
• Propensity
• Ex. Chou-Fasman: 50~53%
![Page 29: P ROTEIN SEONDARY & SUPER-SECONDARY STRUCTURE PREDICTION WITH HMM By En-Shiun Annie Lee](https://reader035.vdocument.in/reader035/viewer/2022062411/56813a87550346895da2840b/html5/thumbnails/29.jpg)
3b. Neural Network
• Ex. PHD: 71%
![Page 30: P ROTEIN SEONDARY & SUPER-SECONDARY STRUCTURE PREDICTION WITH HMM By En-Shiun Annie Lee](https://reader035.vdocument.in/reader035/viewer/2022062411/56813a87550346895da2840b/html5/thumbnails/30.jpg)
3c. SVM
• Ex. PSIPRED: 76~78%
![Page 31: P ROTEIN SEONDARY & SUPER-SECONDARY STRUCTURE PREDICTION WITH HMM By En-Shiun Annie Lee](https://reader035.vdocument.in/reader035/viewer/2022062411/56813a87550346895da2840b/html5/thumbnails/31.jpg)
3d. HMM Definition
• State set Q
• Output alphabet Σ
![Page 32: P ROTEIN SEONDARY & SUPER-SECONDARY STRUCTURE PREDICTION WITH HMM By En-Shiun Annie Lee](https://reader035.vdocument.in/reader035/viewer/2022062411/56813a87550346895da2840b/html5/thumbnails/32.jpg)
3d. HMM Definition
• Transition probabilities
  – Probability of entering state p from state q
  – Tq(p), where q ∈ Q and p ∈ Q
![Page 33: P ROTEIN SEONDARY & SUPER-SECONDARY STRUCTURE PREDICTION WITH HMM By En-Shiun Annie Lee](https://reader035.vdocument.in/reader035/viewer/2022062411/56813a87550346895da2840b/html5/thumbnails/33.jpg)
3d. HMM Definition
• Emission probabilities
  – Probability of emitting each letter of Σ from state q
  – Eq(ai), where ai ∈ Σ and q ∈ Q
![Page 34: P ROTEIN SEONDARY & SUPER-SECONDARY STRUCTURE PREDICTION WITH HMM By En-Shiun Annie Lee](https://reader035.vdocument.in/reader035/viewer/2022062411/56813a87550346895da2840b/html5/thumbnails/34.jpg)
3d. HMM Decoding
• Problem
  – Given:
    • HMM = (Q, Σ, E, T) and
    • Sequence S
      – Where S = S1, S2, …, Sn
  – Find:
    • Most probable path of states gone through to produce S
      – Where X = X1, X2, …, Xn = state sequence
![Page 35: P ROTEIN SEONDARY & SUPER-SECONDARY STRUCTURE PREDICTION WITH HMM By En-Shiun Annie Lee](https://reader035.vdocument.in/reader035/viewer/2022062411/56813a87550346895da2840b/html5/thumbnails/35.jpg)
3d. HMM Decoding
• Optimize
  – Pr[S, X]
    • X = X1, X2, …, Xn = state sequence
    • S = S1, S2, …, Sn
  – Pr[S | X]
![Page 36: P ROTEIN SEONDARY & SUPER-SECONDARY STRUCTURE PREDICTION WITH HMM By En-Shiun Annie Lee](https://reader035.vdocument.in/reader035/viewer/2022062411/56813a87550346895da2840b/html5/thumbnails/36.jpg)
3d. HMM Decoding
• Dynamic programming
  – Memoryless
  – $\Pr[X_1 \dots X_n, S_1 \dots S_n] = \Pr[X_1 \dots X_{n-1}, S_1 \dots S_{n-1}] \cdot T_{X_{n-1}}(X_n) \cdot E_{X_n}(S_n)$
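The dynamic program for decoding is usually called the Viterbi algorithm. Below is a minimal log-space sketch; `trans[q][p]` plays the role of Tq(p) and `emit[q][a]` the role of Eq(a). The two-state model, its two-letter alphabet, the start distribution, and all probability values are hypothetical toy numbers chosen only for illustration.

```python
import math


def viterbi(obs, states, start, trans, emit):
    """Return the most probable state path for `obs` (log-space Viterbi).

    All probabilities that are looked up must be strictly positive.
    """
    if not obs:
        raise ValueError("obs must be non-empty")
    # score[q] = best log-probability of any path ending in state q
    score = {q: math.log(start[q]) + math.log(emit[q][obs[0]]) for q in states}
    backpointers = []
    for symbol in obs[1:]:
        new_score, pointer = {}, {}
        for p in states:
            best_q = max(states, key=lambda q: score[q] + math.log(trans[q][p]))
            new_score[p] = (score[best_q] + math.log(trans[best_q][p])
                            + math.log(emit[p][symbol]))
            pointer[p] = best_q
        score = new_score
        backpointers.append(pointer)
    # Trace back from the best final state.
    path = [max(states, key=lambda q: score[q])]
    for pointer in reversed(backpointers):
        path.append(pointer[path[-1]])
    return "".join(reversed(path))


# Hypothetical toy model: helix (H) prefers alanine, coil (C) prefers glycine.
states = "HC"
start = {"H": 0.5, "C": 0.5}
trans = {"H": {"H": 0.9, "C": 0.1}, "C": {"H": 0.1, "C": 0.9}}
emit = {"H": {"A": 0.9, "G": 0.1}, "C": {"A": 0.1, "G": 0.9}}

print(viterbi("AAGG", states, start, trans, emit))  # -> HHCC
```

Working in log space avoids numerical underflow on long sequences; the running time is proportional to n·|Q|² for a sequence of length n.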
![Page 37: P ROTEIN SEONDARY & SUPER-SECONDARY STRUCTURE PREDICTION WITH HMM By En-Shiun Annie Lee](https://reader035.vdocument.in/reader035/viewer/2022062411/56813a87550346895da2840b/html5/thumbnails/37.jpg)
4. HMM EXAMPLES
![Page 38: P ROTEIN SEONDARY & SUPER-SECONDARY STRUCTURE PREDICTION WITH HMM By En-Shiun Annie Lee](https://reader035.vdocument.in/reader035/viewer/2022062411/56813a87550346895da2840b/html5/thumbnails/38.jpg)
4a. SEMI-HMM
![Page 39: P ROTEIN SEONDARY & SUPER-SECONDARY STRUCTURE PREDICTION WITH HMM By En-Shiun Annie Lee](https://reader035.vdocument.in/reader035/viewer/2022062411/56813a87550346895da2840b/html5/thumbnails/39.jpg)
4a. Semi-HMM
• Definition
  – Each state can emit a sequence
  – Move emission probabilities into states
  – Model secondary structure segments
![Page 40: P ROTEIN SEONDARY & SUPER-SECONDARY STRUCTURE PREDICTION WITH HMM By En-Shiun Annie Lee](https://reader035.vdocument.in/reader035/viewer/2022062411/56813a87550346895da2840b/html5/thumbnails/40.jpg)
4a. Segmentation
• Sequence Segments
![Page 41: P ROTEIN SEONDARY & SUPER-SECONDARY STRUCTURE PREDICTION WITH HMM By En-Shiun Annie Lee](https://reader035.vdocument.in/reader035/viewer/2022062411/56813a87550346895da2840b/html5/thumbnails/41.jpg)
4a. Segmentation
• Sequence Segments
![Page 42: P ROTEIN SEONDARY & SUPER-SECONDARY STRUCTURE PREDICTION WITH HMM By En-Shiun Annie Lee](https://reader035.vdocument.in/reader035/viewer/2022062411/56813a87550346895da2840b/html5/thumbnails/42.jpg)
4a. Segmentation
• Sequence Segments
  • T = secondary structural type of each segment, from {H, E, L}
  • S = end position of each individual structural segment
  • R = known amino acid sequence
![Page 43: P ROTEIN SEONDARY & SUPER-SECONDARY STRUCTURE PREDICTION WITH HMM By En-Shiun Annie Lee](https://reader035.vdocument.in/reader035/viewer/2022062411/56813a87550346895da2840b/html5/thumbnails/43.jpg)
4a. Segmentation
• Sequence Segments
  • T2 = E = β-strand
  • S2 = 9
  • R2 = residues S1 + 1 through S2
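The (S, T) representation can be derived from a per-residue label string by collapsing runs of identical labels. This is a minimal sketch; `to_segments` and `segment_residues` are hypothetical helper names, and the label and residue strings are made up so that the second segment matches the slide's example (T2 = E, S2 = 9).

```python
from itertools import groupby


def to_segments(labels: str):
    """Return (S, T): 1-based end positions and types of consecutive runs."""
    ends, types = [], []
    position = 0
    for seg_type, run in groupby(labels):
        position += len(list(run))
        ends.append(position)
        types.append(seg_type)
    return ends, types


def segment_residues(residues: str, ends, j: int) -> str:
    """Residues of the j-th segment (1-based): positions S_{j-1}+1 .. S_j."""
    start = ends[j - 2] if j > 1 else 0  # S_0 is taken to be 0
    return residues[start:ends[j - 1]]


labels = "LLLLEEEEELLHHHH"    # hypothetical per-residue labels
residues = "ABCDEFGHIJKLMNO"  # placeholder letters, not a real protein
S, T = to_segments(labels)
print(S, T)                              # -> [4, 9, 11, 15] ['L', 'E', 'L', 'H']
print(segment_residues(residues, S, 2))  # -> EFGHI
```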
![Page 44: P ROTEIN SEONDARY & SUPER-SECONDARY STRUCTURE PREDICTION WITH HMM By En-Shiun Annie Lee](https://reader035.vdocument.in/reader035/viewer/2022062411/56813a87550346895da2840b/html5/thumbnails/44.jpg)
4a. Bayesian
• Bayesian Formulation
  • R = sequence of ALL amino acid residues
  • S = end positions of the segments
  • T = secondary structural types of the segments
    – {H, E, L}
![Page 45: P ROTEIN SEONDARY & SUPER-SECONDARY STRUCTURE PREDICTION WITH HMM By En-Shiun Annie Lee](https://reader035.vdocument.in/reader035/viewer/2022062411/56813a87550346895da2840b/html5/thumbnails/45.jpg)
4a. Bayesian
• Bayesian Formulation
  1. Likelihood
  2. Prior probability
  3. Constant with respect to (S, T), dropped
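The three labelled terms correspond to the pieces of Bayes' rule. The slide's own equation is an image and is not part of the transcript, so the following is a reconstruction under that reading, with the notation R, S, T taken from the definitions above:

```latex
P(S, T \mid R)
  \;=\; \frac{P(R \mid S, T)\, P(S, T)}{P(R)}
  \;\propto\; \underbrace{P(R \mid S, T)}_{\text{likelihood}}
              \;\underbrace{P(S, T)}_{\text{prior}}
```

Here P(R) does not depend on (S, T), so it can be dropped when searching for the most probable segmentation.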
![Page 46: P ROTEIN SEONDARY & SUPER-SECONDARY STRUCTURE PREDICTION WITH HMM By En-Shiun Annie Lee](https://reader035.vdocument.in/reader035/viewer/2022062411/56813a87550346895da2840b/html5/thumbnails/46.jpg)
4a. Bayesian Likelihood
• m = total number of segments
• Sj = end position of the j-th segment
• Tj = secondary structural type of the j-th segment
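With these symbols, a segment-wise factorization of the likelihood can be written as below. This is a sketch that assumes segments are conditionally independent given their boundaries and types, which is the usual assumption in Bayesian segmentation models; the exact form on the original slide is not recoverable from the transcript.

```latex
P(R \mid m, S, T)
  \;=\; \prod_{j=1}^{m} P\!\left(R_{[S_{j-1}+1 \,:\, S_j]} \,\middle|\, S_{j-1},\, S_j,\, T_j\right),
  \qquad S_0 = 0
```

Each factor scores the residues of one whole segment at once, which is what distinguishes the segment-based model from a per-residue HMM.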
![Page 47: P ROTEIN SEONDARY & SUPER-SECONDARY STRUCTURE PREDICTION WITH HMM By En-Shiun Annie Lee](https://reader035.vdocument.in/reader035/viewer/2022062411/56813a87550346895da2840b/html5/thumbnails/47.jpg)
4a. Bayesian Likelihood
![Page 48: P ROTEIN SEONDARY & SUPER-SECONDARY STRUCTURE PREDICTION WITH HMM By En-Shiun Annie Lee](https://reader035.vdocument.in/reader035/viewer/2022062411/56813a87550346895da2840b/html5/thumbnails/48.jpg)
4a. Bayesian Likelihood
![Page 49: P ROTEIN SEONDARY & SUPER-SECONDARY STRUCTURE PREDICTION WITH HMM By En-Shiun Annie Lee](https://reader035.vdocument.in/reader035/viewer/2022062411/56813a87550346895da2840b/html5/thumbnails/49.jpg)
4a. Bayesian Likelihood
N-terminus
Internal
C-terminus
![Page 50: P ROTEIN SEONDARY & SUPER-SECONDARY STRUCTURE PREDICTION WITH HMM By En-Shiun Annie Lee](https://reader035.vdocument.in/reader035/viewer/2022062411/56813a87550346895da2840b/html5/thumbnails/50.jpg)
4a. BSPPS
• Bayesian Segmentation PPS
![Page 51: P ROTEIN SEONDARY & SUPER-SECONDARY STRUCTURE PREDICTION WITH HMM By En-Shiun Annie Lee](https://reader035.vdocument.in/reader035/viewer/2022062411/56813a87550346895da2840b/html5/thumbnails/51.jpg)
4a. BSPPS
• Bayesian Segmentation PPS
![Page 52: P ROTEIN SEONDARY & SUPER-SECONDARY STRUCTURE PREDICTION WITH HMM By En-Shiun Annie Lee](https://reader035.vdocument.in/reader035/viewer/2022062411/56813a87550346895da2840b/html5/thumbnails/52.jpg)
4a. Results
• Better than PSIPRED
  – (without homology information)
![Page 53: P ROTEIN SEONDARY & SUPER-SECONDARY STRUCTURE PREDICTION WITH HMM By En-Shiun Annie Lee](https://reader035.vdocument.in/reader035/viewer/2022062411/56813a87550346895da2840b/html5/thumbnails/53.jpg)
4a. Results
• Better than PSIPRED
  – (without homology information)
![Page 54: P ROTEIN SEONDARY & SUPER-SECONDARY STRUCTURE PREDICTION WITH HMM By En-Shiun Annie Lee](https://reader035.vdocument.in/reader035/viewer/2022062411/56813a87550346895da2840b/html5/thumbnails/54.jpg)
4b. PROFILE-HMM
![Page 55: P ROTEIN SEONDARY & SUPER-SECONDARY STRUCTURE PREDICTION WITH HMM By En-Shiun Annie Lee](https://reader035.vdocument.in/reader035/viewer/2022062411/56813a87550346895da2840b/html5/thumbnails/55.jpg)
4b. Profile HMM
• Main States
  – Columns of the alignment
![Page 56: P ROTEIN SEONDARY & SUPER-SECONDARY STRUCTURE PREDICTION WITH HMM By En-Shiun Annie Lee](https://reader035.vdocument.in/reader035/viewer/2022062411/56813a87550346895da2840b/html5/thumbnails/56.jpg)
4b. Profile HMM
• Insertion States
![Page 57: P ROTEIN SEONDARY & SUPER-SECONDARY STRUCTURE PREDICTION WITH HMM By En-Shiun Annie Lee](https://reader035.vdocument.in/reader035/viewer/2022062411/56813a87550346895da2840b/html5/thumbnails/57.jpg)
4b. Profile HMM
• Deletion States
  – Jump over one or more columns in the alignment
![Page 58: P ROTEIN SEONDARY & SUPER-SECONDARY STRUCTURE PREDICTION WITH HMM By En-Shiun Annie Lee](https://reader035.vdocument.in/reader035/viewer/2022062411/56813a87550346895da2840b/html5/thumbnails/58.jpg)
4b. Profile HMM
• Combined
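The combined topology of main (match), insertion, and deletion states can be sketched as a small graph builder. The slide's diagram is not in the transcript, so this assumes the common textbook layout in which every state at column j may move to the next match state, the next deletion state, or the insertion state at j; the function name and state labels are hypothetical.

```python
def profile_hmm_topology(k: int):
    """Return (states, transitions) for a profile HMM with k match columns."""
    if k < 1:
        raise ValueError("k must be at least 1")

    def match(j: int) -> str:
        # Begin and End behave like match states 0 and k + 1.
        if j == 0:
            return "Begin"
        return "End" if j == k + 1 else f"M{j}"

    states = (["Begin"]
              + [f"M{j}" for j in range(1, k + 1)]
              + [f"I{j}" for j in range(k + 1)]
              + [f"D{j}" for j in range(1, k + 1)]
              + ["End"])
    transitions = set()
    for j in range(k + 1):
        sources = [match(j), f"I{j}"] + ([f"D{j}"] if j >= 1 else [])
        targets = [match(j + 1), f"I{j}"] + ([f"D{j + 1}"] if j < k else [])
        transitions.update((s, t) for s in sources for t in targets)
    return states, transitions


states, transitions = profile_hmm_topology(3)
print(len(states), len(transitions))  # -> 12 30
```

A chain of deletion states (D1 → D2 → …) is how the model skips several alignment columns, and the self-loop on each insertion state allows insertions of any length.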
![Page 59: P ROTEIN SEONDARY & SUPER-SECONDARY STRUCTURE PREDICTION WITH HMM By En-Shiun Annie Lee](https://reader035.vdocument.in/reader035/viewer/2022062411/56813a87550346895da2840b/html5/thumbnails/59.jpg)
4b. HMMSTR
• HMM for local protein STRucture
![Page 60: P ROTEIN SEONDARY & SUPER-SECONDARY STRUCTURE PREDICTION WITH HMM By En-Shiun Annie Lee](https://reader035.vdocument.in/reader035/viewer/2022062411/56813a87550346895da2840b/html5/thumbnails/60.jpg)
4b. HMMSTR
• HMM for local protein STRucture
• Pronounced “hamster”
![Page 61: P ROTEIN SEONDARY & SUPER-SECONDARY STRUCTURE PREDICTION WITH HMM By En-Shiun Annie Lee](https://reader035.vdocument.in/reader035/viewer/2022062411/56813a87550346895da2840b/html5/thumbnails/61.jpg)
4b. I-Site Library
• I-sites Library
  – Motif = short basic structural fragment
    • 3~19 residues
    • 262 motifs
    • Highly predictable
  – Non-redundant PDB data (<25% similarity)
  – Fold uniquely across protein families
  – Exhaustive motif clustering
![Page 62: P ROTEIN SEONDARY & SUPER-SECONDARY STRUCTURE PREDICTION WITH HMM By En-Shiun Annie Lee](https://reader035.vdocument.in/reader035/viewer/2022062411/56813a87550346895da2840b/html5/thumbnails/62.jpg)
4b. Build HMM
• States
  – Amino acid sequence and
  – Structural attribute
• Transition from state
  – Adjacent positions in motif
  – No gap or insertion states
![Page 63: P ROTEIN SEONDARY & SUPER-SECONDARY STRUCTURE PREDICTION WITH HMM By En-Shiun Annie Lee](https://reader035.vdocument.in/reader035/viewer/2022062411/56813a87550346895da2840b/html5/thumbnails/63.jpg)
4b. Build HMM
• Emission probability distributions
  – b = observed amino acid
    • (20 probability values)
  – d = secondary structure
    • (helix, strand, loop)
  – r = backbone angle region
    • (11 dihedral angle symbols)
  – c = structural context descriptor
    • (10 context symbols)
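One way to picture a state with four emission distributions is as a record holding one table per symbol type. The sketch below is illustrative only: the class name, the placeholder symbol names for r and c, the uniform values, and the choice to multiply the four probabilities (that is, to treat the four emissions as independent given the state) are all assumptions, not details taken from the slides.

```python
from dataclasses import dataclass


def uniform(symbols):
    """Uniform distribution over the given symbols (placeholder values)."""
    return {s: 1.0 / len(symbols) for s in symbols}


@dataclass
class HmmstrState:
    b: dict  # amino acid -> probability (20 values)
    d: dict  # secondary structure -> probability (helix, strand, loop)
    r: dict  # backbone angle region symbol -> probability (11 symbols)
    c: dict  # structural context symbol -> probability (10 symbols)

    def emission(self, aa, ss, region, context):
        # Assumption: the four emissions are independent given the state.
        return self.b[aa] * self.d[ss] * self.r[region] * self.c[context]


AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
state = HmmstrState(
    b=uniform(AMINO_ACIDS),
    d=uniform("HEL"),
    r=uniform([f"r{i}" for i in range(11)]),  # hypothetical symbol names
    c=uniform([f"c{i}" for i in range(10)]),  # hypothetical symbol names
)
print(state.emission("A", "H", "r0", "c0"))
```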
![Page 64: P ROTEIN SEONDARY & SUPER-SECONDARY STRUCTURE PREDICTION WITH HMM By En-Shiun Annie Lee](https://reader035.vdocument.in/reader035/viewer/2022062411/56813a87550346895da2840b/html5/thumbnails/64.jpg)
4b. Build HMM
• Model I-site Library
  – Each of the 262 motifs is a chain in the HMM
  – Merge states based on similarity of
    • Sequence
    • Structure
![Page 65: P ROTEIN SEONDARY & SUPER-SECONDARY STRUCTURE PREDICTION WITH HMM By En-Shiun Annie Lee](https://reader035.vdocument.in/reader035/viewer/2022062411/56813a87550346895da2840b/html5/thumbnails/65.jpg)
4b. Build HMM
• Model I-site Library
• Merge states
  – based on similarity of
    • Sequence
    • Structure
![Page 66: P ROTEIN SEONDARY & SUPER-SECONDARY STRUCTURE PREDICTION WITH HMM By En-Shiun Annie Lee](https://reader035.vdocument.in/reader035/viewer/2022062411/56813a87550346895da2840b/html5/thumbnails/66.jpg)
4b. HMMSTR Merge
• Ex. β-Hairpin
  – Serine β-Hairpin
  – Type-I β-Hairpin
![Page 67: P ROTEIN SEONDARY & SUPER-SECONDARY STRUCTURE PREDICTION WITH HMM By En-Shiun Annie Lee](https://reader035.vdocument.in/reader035/viewer/2022062411/56813a87550346895da2840b/html5/thumbnails/67.jpg)
4b. HMMSTR Merge
• Ex. β-Hairpin
  – Serine β-Hairpin
  – Type-I β-Hairpin
![Page 68: P ROTEIN SEONDARY & SUPER-SECONDARY STRUCTURE PREDICTION WITH HMM By En-Shiun Annie Lee](https://reader035.vdocument.in/reader035/viewer/2022062411/56813a87550346895da2840b/html5/thumbnails/68.jpg)
4b. HMMSTR Merge
• Ex. β-Hairpin
![Page 69: P ROTEIN SEONDARY & SUPER-SECONDARY STRUCTURE PREDICTION WITH HMM By En-Shiun Annie Lee](https://reader035.vdocument.in/reader035/viewer/2022062411/56813a87550346895da2840b/html5/thumbnails/69.jpg)
4b. HMMSTR Merge
• Ex. β-Hairpin
![Page 70: P ROTEIN SEONDARY & SUPER-SECONDARY STRUCTURE PREDICTION WITH HMM By En-Shiun Annie Lee](https://reader035.vdocument.in/reader035/viewer/2022062411/56813a87550346895da2840b/html5/thumbnails/70.jpg)
4b. HMMSTR Training
• Input: PDB proteins
• Find
  – Best state sequence for the sequence
  – Probability distribution of one amino acid
• Integrate 3 data sets
  – Aligned probability distribution
  – Amino acid and context information
  – Contact map
![Page 71: P ROTEIN SEONDARY & SUPER-SECONDARY STRUCTURE PREDICTION WITH HMM By En-Shiun Annie Lee](https://reader035.vdocument.in/reader035/viewer/2022062411/56813a87550346895da2840b/html5/thumbnails/71.jpg)
4b. HMMSTR Summary
• 282 nodes
• 317 transitions
• 31 merged motifs
![Page 72: P ROTEIN SEONDARY & SUPER-SECONDARY STRUCTURE PREDICTION WITH HMM By En-Shiun Annie Lee](https://reader035.vdocument.in/reader035/viewer/2022062411/56813a87550346895da2840b/html5/thumbnails/72.jpg)
4b. HMMSTR Summary
• Introduces structural context at the level of super-secondary structure
• Predicts higher-order 3D tertiary structure
  – Side result = predicts 1D secondary structure
![Page 73: P ROTEIN SEONDARY & SUPER-SECONDARY STRUCTURE PREDICTION WITH HMM By En-Shiun Annie Lee](https://reader035.vdocument.in/reader035/viewer/2022062411/56813a87550346895da2840b/html5/thumbnails/73.jpg)
4c. CONDITIONAL RANDOM FIELD
![Page 74: P ROTEIN SEONDARY & SUPER-SECONDARY STRUCTURE PREDICTION WITH HMM By En-Shiun Annie Lee](https://reader035.vdocument.in/reader035/viewer/2022062411/56813a87550346895da2840b/html5/thumbnails/74.jpg)
4c. HMM Disadvantages
• Does not model
  – Multiple interacting features
  – Long-range dependencies
• Strict independence assumptions
![Page 75: P ROTEIN SEONDARY & SUPER-SECONDARY STRUCTURE PREDICTION WITH HMM By En-Shiun Annie Lee](https://reader035.vdocument.in/reader035/viewer/2022062411/56813a87550346895da2840b/html5/thumbnails/75.jpg)
4c. Conditional Model
• Allows
  – Arbitrary features
  – Non-independent features
• Transition probability
  – With respect to past and future observations
![Page 76: P ROTEIN SEONDARY & SUPER-SECONDARY STRUCTURE PREDICTION WITH HMM By En-Shiun Annie Lee](https://reader035.vdocument.in/reader035/viewer/2022062411/56813a87550346895da2840b/html5/thumbnails/76.jpg)
4c. Conditional Model
[Figure: chain-structured graphical models over labels y1 … y6 and observations x1 … x6, one for the HMM and one for the CRF]
![Page 77: P ROTEIN SEONDARY & SUPER-SECONDARY STRUCTURE PREDICTION WITH HMM By En-Shiun Annie Lee](https://reader035.vdocument.in/reader035/viewer/2022062411/56813a87550346895da2840b/html5/thumbnails/77.jpg)
4c. Random Field
• Random Field (undirected graphical model)
  – Let G = (Y, E) be a graph
    • Where each vertex Yv = a random variable
  – If P(Yv | all other Y) = P(Yv | neighbours of Yv), then Y is a random field
![Page 78: P ROTEIN SEONDARY & SUPER-SECONDARY STRUCTURE PREDICTION WITH HMM By En-Shiun Annie Lee](https://reader035.vdocument.in/reader035/viewer/2022062411/56813a87550346895da2840b/html5/thumbnails/78.jpg)
4c. Random Field
• Example:
  – P(Y5 | all other Y) = P(Y5 | Y4, Y6)
![Page 79: P ROTEIN SEONDARY & SUPER-SECONDARY STRUCTURE PREDICTION WITH HMM By En-Shiun Annie Lee](https://reader035.vdocument.in/reader035/viewer/2022062411/56813a87550346895da2840b/html5/thumbnails/79.jpg)
4c. Conditional RF
• Conditional Random Field
  – Let X = random variable over data sequences to be labeled
    • observations
  – Let Y = random variable over the corresponding label sequences
    • labels
  – Let G = (V, E) be a graph
    • Such that Y = (Yv), v ∈ V, so Y is indexed by the vertices of G
  – If P(Yv | X, Yw, w ≠ v) = P(Yv | X, Yw, w ~ v), then (X, Y) is a conditional random field
![Page 80: P ROTEIN SEONDARY & SUPER-SECONDARY STRUCTURE PREDICTION WITH HMM By En-Shiun Annie Lee](https://reader035.vdocument.in/reader035/viewer/2022062411/56813a87550346895da2840b/html5/thumbnails/80.jpg)
4c. Conditional RF
• Example:
  – P(Y3 | X, all other Y) = P(Y3 | X, Y2, Y4)
![Page 81: P ROTEIN SEONDARY & SUPER-SECONDARY STRUCTURE PREDICTION WITH HMM By En-Shiun Annie Lee](https://reader035.vdocument.in/reader035/viewer/2022062411/56813a87550346895da2840b/html5/thumbnails/81.jpg)
4c. HMM vs. CRF
• HMM:
  – Maximize P(x, y | θ) = P(y | x, θ) P(x | θ)
  – Transition and emission probabilities
  – Transition/emission based on only one x
• CRF:
  – Maximize P(y | x, θ)
  – Feature function f(i, j, k)
  – Feature function based on all x
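The contrast can be made concrete by writing both objectives for a chain of length n. The forms below are the standard first-order HMM and linear-chain CRF, given here as an assumed illustration; the slide's feature function f(i, j, k) uses a different indexing that is not spelled out in the transcript, so the feature functions f_k and weights θ_k are notation introduced for this sketch.

```latex
\text{HMM:}\quad
P(x, y \mid \theta) \;=\; \prod_{i=1}^{n} P(y_i \mid y_{i-1})\, P(x_i \mid y_i)

\text{CRF:}\quad
P(y \mid x, \theta) \;=\; \frac{1}{Z(x)}
  \exp\!\Big( \sum_{i=1}^{n} \sum_{k} \theta_k\, f_k(y_{i-1}, y_i, x, i) \Big),
\qquad
Z(x) \;=\; \sum_{y'} \exp\!\Big( \sum_{i=1}^{n} \sum_{k} \theta_k\, f_k(y'_{i-1}, y'_i, x, i) \Big)
```

In the HMM each emission term sees only the single observation x_i, whereas every CRF feature receives the whole sequence x, which is what allows features that look at past and future observations.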
![Page 82: P ROTEIN SEONDARY & SUPER-SECONDARY STRUCTURE PREDICTION WITH HMM By En-Shiun Annie Lee](https://reader035.vdocument.in/reader035/viewer/2022062411/56813a87550346895da2840b/html5/thumbnails/82.jpg)
4c. Beta-Wrap
• β-Helix
  – 3 parallel β-strands
  – Connected by coils
• Few solved structures
  – 9 SCOP superfamilies
  – 14 RH solved structures in PDB
  – Solved structures differ widely
![Page 83: P ROTEIN SEONDARY & SUPER-SECONDARY STRUCTURE PREDICTION WITH HMM By En-Shiun Annie Lee](https://reader035.vdocument.in/reader035/viewer/2022062411/56813a87550346895da2840b/html5/thumbnails/83.jpg)
4c. Graph Definition
• Let G = (V, E1, E2) be a graph
  – V = nodes/states = secondary structures
  – Edges = interactions
• E1
  – Edges between adjacent neighbors
  – Implied in the model
• E2
  – Edges for long-range interactions
  – Explicitly considered
![Page 84: P ROTEIN SEONDARY & SUPER-SECONDARY STRUCTURE PREDICTION WITH HMM By En-Shiun Annie Lee](https://reader035.vdocument.in/reader035/viewer/2022062411/56813a87550346895da2840b/html5/thumbnails/84.jpg)
4c. Beta-Wrap Example
• Simple Example:
  – S2 = first β-strand
  – S3 = coil
  – S4 = second β-strand
  – S5 = coil
  – S6 = α-helix
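The graph G = (V, E1, E2) for this example can be sketched as a small data structure. E1 follows directly from segment order; the single E2 edge between the two β-strands is an assumption made for illustration, since the transcript does not say which long-range pairs the example uses. The function name `build_segment_graph` is hypothetical.

```python
def build_segment_graph(segments, long_range_pairs):
    """Return (V, E1, E2) for an ordered mapping of segment name -> type."""
    nodes = list(segments)
    # E1: edges between adjacent segments, implied by the chain order.
    e1 = [(nodes[i], nodes[i + 1]) for i in range(len(nodes) - 1)]
    # E2: explicitly listed long-range interactions.
    e2 = []
    for a, b in long_range_pairs:
        if a not in segments or b not in segments:
            raise KeyError(f"unknown segment in pair ({a}, {b})")
        e2.append((a, b))
    return nodes, e1, e2


# Segment names and types from the slide's simple example.
segments = {
    "S2": "beta-strand",
    "S3": "coil",
    "S4": "beta-strand",
    "S5": "coil",
    "S6": "helix",
}
# Assumed for illustration: the two strands interact with each other.
V, E1, E2 = build_segment_graph(segments, [("S2", "S4")])
print(E1)
print(E2)
```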
![Page 85: P ROTEIN SEONDARY & SUPER-SECONDARY STRUCTURE PREDICTION WITH HMM By En-Shiun Annie Lee](https://reader035.vdocument.in/reader035/viewer/2022062411/56813a87550346895da2840b/html5/thumbnails/85.jpg)
4c. Beta-Wrap
• β-Helix Solution:
![Page 86: P ROTEIN SEONDARY & SUPER-SECONDARY STRUCTURE PREDICTION WITH HMM By En-Shiun Annie Lee](https://reader035.vdocument.in/reader035/viewer/2022062411/56813a87550346895da2840b/html5/thumbnails/86.jpg)
5. PROPOSAL
![Page 87: P ROTEIN SEONDARY & SUPER-SECONDARY STRUCTURE PREDICTION WITH HMM By En-Shiun Annie Lee](https://reader035.vdocument.in/reader035/viewer/2022062411/56813a87550346895da2840b/html5/thumbnails/87.jpg)
5. Difficulties
• Do not infer global interactions
  – i.e. β-sheet interactions
• Protein structure definition constraint
![Page 88: P ROTEIN SEONDARY & SUPER-SECONDARY STRUCTURE PREDICTION WITH HMM By En-Shiun Annie Lee](https://reader035.vdocument.in/reader035/viewer/2022062411/56813a87550346895da2840b/html5/thumbnails/88.jpg)
5. Possible Future Work
• Novel methods of secondary structure prediction
  – Model as Integer Programming
• Super-secondary structure prediction
![Page 89: P ROTEIN SEONDARY & SUPER-SECONDARY STRUCTURE PREDICTION WITH HMM By En-Shiun Annie Lee](https://reader035.vdocument.in/reader035/viewer/2022062411/56813a87550346895da2840b/html5/thumbnails/89.jpg)
5. Acknowledgement
• Professor Ming Li
  – Guidance in
    • knowledge and
    • expertise
• Bioinformatics lab
  – Mentoring a “rookie”
• Class
  – Attention and listening