Lecture 11. RNA Secondary Structure Prediction
The Chinese University of Hong Kong
CSCI3220 Algorithms for Bioinformatics
CSCI3220 Algorithms for Bioinformatics | Kevin Yip-cse-cuhk | Fall 2015 2
Lecture outline
1. From sequences to functions
2. RNA secondary structures
Last update: 21-Nov-2015
Part 1: FROM SEQUENCES TO FUNCTIONS
From sequences to functions
• One of the biggest questions in biology: can one tell the function of a molecule (DNA/RNA/protein) from its sequence alone?
– Sometimes, but usually not
– Easier if we also know the structure
– Common belief: sequence → structure → function
– Of course, function also depends on the environment
Molecular structures
• Four levels:
– Primary structures
• The sequence
– Secondary structures
• Formed first
• Local
– Tertiary structures
• Global
• “Folds”, “domains”
– Quaternary structures
• Multiple molecules
Image credit: http://www.personal.psu.edu/jms5704/blogs/simmons/levels_of_protein_s_c_la_784.jpg
Primary structures
• Connections (strong covalent bonds vs. weak hydrogen bonds)
– Which molecules are connected
– Which atoms are connected
– First-level constraints on the possible structures
• Example: residues close in the primary structure must also be close in the secondary, tertiary and quaternary structures
Image credit: Wikibooks
Primary structures
• Orientation:
– DNA, RNA: 5’ to 3’
– Proteins: amino (N) terminus to carboxyl (C) terminus
• “Residue”: what remains of a nucleotide or amino acid after a water molecule is expelled as it joins the chain
Image credit: http://bealbio.wikispaces.com/file/view/dsDNA.jpg, http://attentionmanagement.ca/userfiles/image/DNA-RNA%20directions.gif, http://www.phschool.com/science/biology_place/biocoach/images/translation/peptbond.gif, http://www.cystinuria.org/resources/education/aminoacids/peptide.gif
DNA secondary structures
• Double helix
• A-DNA (dehydrated samples)
– Right-handed
– 11 bp per turn
• Most common: B-DNA
– Right-handed
– 10.5 bp per turn
• Z-DNA (some methylated DNA)
– Left-handed
– 12 bp per turn
Image credit: Wikipedia
DNA secondary structures
[Figure: A-DNA, B-DNA and Z-DNA]
Image credit: Wikipedia
RNA secondary structures
• Can largely be projected onto a 2D plane
[Figure: an example RNA secondary structure with labeled stem/hairpin loop, stacking pairs, bulge, internal loop, multi-loop, exterior loop, dangling nucleotides, a less stable pair and coaxial stacking]
Image credit: http://www.clcbio.com/scienceimages/rna_prediction/RNA_structure_prediction_web.png
RNA secondary structures
• Pseudoknots: complex structures in which base pairs cross each other
Image credit: Wikipedia, Sperschneider and Datta, RNA 14(4):630-640, (2008)
Protein secondary structures
• Three main types:
– α-helices
– β-sheets
– Coils (connectors)
Image credit: http://calcium.uhnres.utoronto.ca/cadherin/images/pub_pages/general/ribbon.jpg, http://www.mun.ca/biology/scarr/MGA2-03-25.jpg
DNA tertiary structures
• Wrapped around nucleosomes, which are formed by histone proteins
• Condensed form at the beginning of mitosis and meiosis
Image credit: http://micro.magnet.fsu.edu/cells/nucleus/images/chromatinstructurefigure1.jpg, Wikipedia
RNA tertiary structures
• Overall structure of an RNA
– Better studied for RNAs that are not translated into proteins (“non-coding” RNAs)
– Example: tRNA
Image credit: Wikipedia
Protein tertiary structures
• Complex structures
– Mainly caused by weak forces (hydrogen bonds and hydrophobic interactions)
– Occasionally by stronger forces (disulfide bonds between cysteines)
• The CATH hierarchy
– Class: composition of secondary structures
– Architecture: overall shape
– Topology: connection of secondary structures
– Homologous: with a common ancestor
Image credit: CATH
Quaternary structures
• Types:
– Protein subunit-protein subunit
– Protein-protein
– Protein-DNA
– Protein-RNA
– (Protein-small molecules)
– RNA-RNA
– ...
[Figures: protein-subunit interaction (hemoglobin); protein-DNA interaction]
Image credit: Wikipedia, http://serrano.crg.es/images/protein_dna1.jpg
Structure and function
• Why does function depend on structure?
1. The structure itself is the function (e.g., tubulins)
2. Binding
• Complementarity of interacting structures
• Formation of special bonds
Image credit: http://www.nigms.nih.gov/NR/rdonlyres/54BEAC37-47A9-454A-BC4F-B94EA127FA1E/0/fig1a_large.jpg, http://upload.wikimedia.org/wikimedia/en-labs/7/7f/Protein_Protein_Docking.JPG
Structure and function
• Why does function depend on structure? (cont’d)
3. Functional groups (e.g., catalytic sites)
4. Determining localization (e.g., transporter membrane proteins)
Image credit: http://www.catalysis-ed.org.uk/principles/images/enzyme_substrate.gif, Spudich, Science 288(5470):1358-1359, 2000
Part 2: RNA SECONDARY STRUCTURES
Important RNA classes
• Coding:
– Messenger RNAs (mRNAs)
• Translated into proteins
• Non-coding:
– Ribosomal RNAs (rRNAs)
• Parts of the ribosome complex
– Transfer RNAs (tRNAs)
• Deliver amino acids to the ribosome during translation
– Micro RNAs (miRNAs)
• Bind mRNA targets to promote RNA degradation or repress translation
– Small nucleolar RNAs (snoRNAs)
• Guide chemical modifications of other RNAs
– Small nuclear RNAs (snRNAs)
• Involved in mRNA splicing
– Long non-coding RNAs (lncRNAs)
• Some are involved in gene regulation
– ...
Image source: http://legacy.hopkinsville.kctcs.edu/sitecore/instructors/Jason-Arnold/VLI/Module%201/m1DNAfunction/m1DNAfunction3.html
Importance of RNA structures
• Structure is important to many classes of RNA
• Examples:
Image sources: http://www.bio.miami.edu/dana/pix/tRNA.jpg, http://lowelab.ucsc.edu/images/CDBox.jpg
[Figures: tRNA; snoRNA]
Representing RNA secondary structures
• Formats (see http://projects.binf.ku.dk/pgardner/bralibase/RNAformats.html):
– Dot-bracket format
– Stockholm format
– ...
Dot-bracket format
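• Sequence (in the original figure, nucleotides 10, 20, 30, etc. are marked in red):
GUGAAUGAUGAAUUUAAUUCUUUGGUCCGUGUUUAUGAUGGGAAGUAAGACCCCCGAUAUGAGUGACAAAAGAGAUGUGGUUGACUAUCACAGUAUCUGACG
• Structure (one character per nucleotide: “.” for an unpaired base, “(” and the matching “)” for the two bases of a pair):
......((((.......((((((.(((....((((((.((((..........)))).)))))).))).)))))).((((((.....)))))).)))).....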
Image credit: Xihao Hu
Predicting RNA secondary structures
• A basic assumption in structure prediction:
– The real structure has the lowest free energy
• In a simplified view, more stable bonds mean lower free energy
• In the case of RNA secondary structures:
– Good to form more pairs
• A-U
• C-G
• Sometimes G-U (a “wobble base pair”)
– Good to form more stable pairs
• C-G > A-U > G-U
– Good to have stable sub-structures
• E.g., stacking pairs
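The pairing rules above can be written down directly. In this sketch the numbers in PAIR_STABILITY are hypothetical relative scores chosen only to respect the order C-G > A-U > G-U; they are not measured free energies, and pair_score is an illustrative helper, not part of any package.

```python
from __future__ import annotations

# Hypothetical relative stabilities (NOT measured energies): larger = more stable.
PAIR_STABILITY = {
    ("C", "G"): 3, ("G", "C"): 3,
    ("A", "U"): 2, ("U", "A"): 2,
    ("G", "U"): 1, ("U", "G"): 1,   # wobble pair
}


def can_pair(a: str, b: str) -> bool:
    """True for Watson-Crick (A-U, C-G) and wobble (G-U) pairs."""
    return (a, b) in PAIR_STABILITY


def pair_score(seq: str, pairs: list[tuple[int, int]]) -> int:
    """Sum of the hypothetical stabilities of the 1-based base pairs (i, j)."""
    total = 0
    for i, j in pairs:
        a, b = seq[i - 1], seq[j - 1]
        if not can_pair(a, b):
            raise ValueError(f"{a}-{b} at ({i}, {j}) is not a valid pair")
        total += PAIR_STABILITY[(a, b)]
    return total


helix = [(1, 9), (2, 8), (3, 7)]
print(pair_score("GGGAAACCC", helix))  # 9: three C-G pairs
print(pair_score("GGGAAAUUU", helix))  # 3: three G-U wobble pairs
```

Scoring isolated pairs ignores the third point above (stable sub-structures such as stacking pairs); the thermodynamic model that follows scores sub-structures instead.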
Predicting RNA secondary structures
• We will assume there are no pseudoknots
– With pseudoknots, there is currently no known algorithm that can find the optimal solution efficiently
• We need two things:
1. A thermodynamic model for computing the free energy of a structure
2. A method for finding the structure with the minimum free energy
– Does this setting sound familiar?
Image credit: Wikipedia
A pseudoknot
Further assumptions
1. The free energy of a secondary structure is the sum of the free energies of its sub-structures
– Not the sum over individual bases/base pairs, as one base pair can participate in multiple sub-structures
2. The free energies of the sub-structures are independent
Problem definition
• Given an RNA sequence, find a set of base pairs with the minimum free energy, such that each base is paired at most once
• Example:
– Input sequence: GUGAAUGAUGAAUUU...ACG
– Output set of base pairs:
• (7, 97)
• (8, 96)
• ...
• (18, 74)
• ...
• (81, 87)
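Whether a candidate answer is a legal structure can be checked directly: every base may appear in at most one pair, and, under the no-pseudoknot assumption made earlier, no two pairs (i, j) and (k, l) may cross, i.e. satisfy i < k < j < l. The helper below is a hypothetical sketch of that check, not code from the lecture.

```python
from __future__ import annotations
from itertools import combinations


def is_valid_structure(n: int, pairs: list[tuple[int, int]],
                       allow_pseudoknots: bool = False) -> bool:
    """Check 1-based base pairs (i, j) against a sequence of length n."""
    used: set[int] = set()
    for i, j in pairs:
        if not 1 <= i < j <= n:
            return False                  # out of range or wrongly ordered
        if i in used or j in used:
            return False                  # a base paired more than once
        used.update((i, j))
    if not allow_pseudoknots:
        # after sorting by i, (i, j) comes before (k, l), so i < k
        for (i, j), (k, l) in combinations(sorted(pairs), 2):
            if i < k < j < l:             # crossing pairs form a pseudoknot
                return False
    return True


print(is_valid_structure(10, [(1, 10), (2, 9)]))  # True
print(is_valid_structure(10, [(1, 5), (3, 8)]))   # False: the pairs cross
```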
Image credit: Xihao Hu
Linear view
[Figure: linear view of the example sequence, with each position written above its dot-bracket character]
Thermodynamic model
• We will consider four types of sub-structures here:
– Stacking pairs: both (i, j) and (i+1, j-1) are in the set
– Hairpin loop: there is a pair (i, j), where none of the bases from i+1 to j-1 are paired
– Bulge/internal loop: there are two pairs (i, j) and (i1, j1), where i<i1<j1<j, and none of the bases from i+1 to i1-1 and from j1+1 to j-1 are paired
– Multi-loop: there are pairs (i, j), (i1, j1), ..., (ik, jk), where i<i1<j1<...<ik<jk<j, and all bases from i+1 to i1-1, from j1+1 to i2-1, ..., from j(k-1)+1 to ik-1 and from jk+1 to j-1 are unpaired
• One base pair can participate in multiple sub-structures
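The four definitions can be applied mechanically: for a closing pair (i, j), walk from i+1 to j-1, skip over every pair directly enclosed by (i, j), and count what is found. The sketch below uses hypothetical helper names (pair_partners, classify_loop) and assumes a pseudoknot-free dot-bracket input in which (i, j) is one of the pairs; "stacking pair" here means that (i, j) stacks on (i+1, j-1).

```python
from __future__ import annotations


def pair_partners(structure: str) -> dict[int, int]:
    """Map each paired 1-based position to its partner (pseudoknot-free input)."""
    stack: list[int] = []
    partner: dict[int, int] = {}
    for pos, ch in enumerate(structure, start=1):
        if ch == "(":
            stack.append(pos)
        elif ch == ")":
            i = stack.pop()
            partner[i], partner[pos] = pos, i
    return partner


def classify_loop(i: int, j: int, partner: dict[int, int]) -> str:
    """Type of the loop closed by the pair (i, j), following the definitions above."""
    inner: list[tuple[int, int]] = []     # pairs directly enclosed by (i, j)
    k = i + 1
    while k < j:
        if k in partner:                  # k opens an enclosed pair (k, partner[k])
            inner.append((k, partner[k]))
            k = partner[k] + 1            # skip everything nested inside that pair
        else:
            k += 1                        # k is an unpaired base of this loop
    if not inner:
        return "hairpin loop"
    if len(inner) == 1:
        i1, j1 = inner[0]
        left, right = i1 - i - 1, j - j1 - 1   # unpaired bases on each side
        if left == 0 and right == 0:
            return "stacking pair"
        if left == 0 or right == 0:
            return "bulge"
        return "internal loop"
    return "multi-loop"               # k >= 2 enclosed pairs


struct = "......((((.......((((((.(((....((((((.((((..........)))).)))))).))).)))))).((((((.....)))))).)))....."
partner = pair_partners(struct)
for i, j in [(20, 72), (81, 87), (23, 69), (10, 94)]:
    print((i, j), classify_loop(i, j, partner))
# (20, 72) stacking pair
# (81, 87) hairpin loop
# (23, 69) internal loop
# (10, 94) multi-loop
```

The four example positions are the ones used on the following slides.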
Stacking pairs
• Both (i, j) and (i+1, j-1) are in the set
• E.g., i: 20, j: 72
[Figure: linear view with positions i = 20, i+1, j-1 and j = 72 marked]
Hairpin loop
• There is a pair (i, j), where none of the bases from i+1 to j-1 are paired
• E.g., i: 81, j: 87
[Figure: linear view with positions i = 81 and j = 87 marked]
Image source: http://img.ehowcdn.com/article-new/ds-photo/getty/article/151/226/87820768_XS.jpg
Bulge/Internal loop
• Internal loop: there are two pairs (i, j) and (i1, j1), where i<i1<j1<j, and none of the bases from i+1 to i1-1 and from j1+1 to j-1 are paired
– Called a bulge if only one side has unpaired bases
• E.g., i: 23, j: 69, i1: 25, j1: 67
[Figure: linear view with positions i = 23, i1 = 25, j1 = 67 and j = 69 marked]
Multi-loop
• Multi-loop: there are pairs (i, j), (i1, j1), ..., (ik, jk), where i<i1<j1<...<ik<jk<j, and all bases from i+1 to i1-1, from j1+1 to i2-1, ..., from j(k-1)+1 to ik-1 and from jk+1 to j-1 are unpaired
• E.g., k = 2, i: 10, j: 94, i1: 18, j1: 74, i2: 76, j2: 92
[Figure: linear view with positions i = 10, i1 = 18, j1 = 74, i2 = 76, j2 = 92 and j = 94 marked]
One possible thermodynamic model
• Unpaired bases have 0 free energy, and all the terms below have negative free energy
• eS(i, j): for the stacking pairs (i, j) and (i+1, j-1)
• eH(i, j): for the hairpin loop closed by (i, j)
• eBI(i, j, i1, j1): for a bulge or internal loop enclosed by the pairs (i, j) and (i1, j1)
• eM(i, j, i1, j1, ..., ik, jk): for a multi-loop that consists of the pairs (i, j), (i1, j1), ..., (ik, jk), satisfying i<i1<j1<...<ik<jk<j
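Combining these terms with the two further assumptions (additivity and independence), the free energy of a pseudoknot-free structure S can be written as a single sum. In such a structure every base pair (i, j) closes exactly one loop, the one immediately inside it, so each pair contributes exactly one term; unpaired bases contribute 0. The symbol E(S) and the set names below are introduced here only for illustration.

```latex
E(S) = \sum_{(i,j) \in S_{\mathrm{stack}}} eS(i,j)
     + \sum_{(i,j) \in S_{\mathrm{hairpin}}} eH(i,j)
     + \sum_{(i,j,i_1,j_1) \in S_{\mathrm{BI}}} eBI(i,j,i_1,j_1)
     + \sum_{(i,j,i_1,j_1,\dots,i_k,j_k) \in S_{\mathrm{multi}}} eM(i,j,i_1,j_1,\dots,i_k,j_k)
```

Here S_stack is the set of pairs (i, j) in S such that (i+1, j-1) is also in S, and the other sets collect the hairpin loops, bulges/internal loops and multi-loops of S according to the definitions above.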
Finding the optimal structure
• Dynamic programming
• Let s be the RNA sequence with n nucleotides
• Tables:
– V(j): free energy of the optimal structure for s[1..j]
• The final answer is V(n); the structure itself is recovered by tracing back the choices made
– VP(i, j): free energy of the optimal structure for s[i..j] with i and j forming a pair
– VBI(i, j): free energy of the optimal structure for s[i..j] with i and j forming a pair that closes a bulge or internal loop
– VM(i, j): free energy of the optimal structure for s[i..j] with i and j forming a pair that closes a multi-loop
Update formulas
• V(j): free energy of the optimal structure for s[1..j]
• V(1) = 0 (and V(0) = 0 for the empty prefix)
• For j > 1,
[Figure: either j is unpaired, leaving s[1..j-1], or j pairs with some i, leaving s[1..i-1] and the paired segment s[i..j]]
$$V(j) = \min\begin{cases} V(j-1) & j \text{ is unpaired} \\ \min_{1 \le i < j}\{VP(i,j) + V(i-1)\} & j \text{ pairs with } i \end{cases}$$
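The V recurrence can be tried on its own before the inner tables are introduced. In this sketch, outer_fold is a hypothetical name and VP is assumed to be already computed, given as a dict whose missing entries mean that i and j cannot pair; the function also records each choice so that the outermost pairs can be traced back.

```python
from __future__ import annotations
import math


def outer_fold(n: int, VP: dict[tuple[int, int], float]) -> tuple[float, list[tuple[int, int]]]:
    """V(j) = min(V(j-1), min_i VP(i, j) + V(i-1)), followed by a traceback."""
    V = [0.0] * (n + 1)                        # V[0] = 0 (empty prefix) and V[1] = 0
    choice: list[int | None] = [None] * (n + 1)  # None means "j is unpaired"
    for j in range(2, n + 1):
        V[j] = V[j - 1]                        # case 1: j is unpaired
        for i in range(1, j):                  # case 2: j pairs with i
            cand = VP.get((i, j), math.inf) + V[i - 1]
            if cand < V[j]:
                V[j], choice[j] = cand, i
    outer: list[tuple[int, int]] = []          # traceback of the outermost pairs
    j = n
    while j > 1:
        if choice[j] is None:
            j -= 1
        else:
            outer.append((choice[j], j))
            j = choice[j] - 1
    return V[n], sorted(outer)


toy_VP = {(1, 4): -2.0, (3, 8): -4.0, (5, 10): -3.0}   # hypothetical values
print(outer_fold(10, toy_VP))   # (-5.0, [(1, 4), (5, 10)])
```

Taking the two non-overlapping segments (1, 4) and (5, 10) gives -5.0, which beats the single segment (3, 8) at -4.0 even though the latter is the most stable on its own.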
Update formulas
• VP(i, j): free energy of the optimal structure for s[i..j] with i and j forming a pair
• We require that i < j (and VP(i, j) = +∞ if s[i] and s[j] cannot pair)
[Figure: (i, j) either stacks on (i+1, j-1), or closes a hairpin loop in which all bases between i and j are unpaired, or closes a bulge/internal loop or a multi-loop]
$$VP(i,j) = \min\begin{cases} eS(i,j) + VP(i+1,j-1) & (i,j) \text{ and } (i+1,j-1) \text{ form stacking pairs} \\ eH(i,j) & (i,j) \text{ closes a hairpin loop} \\ VBI(i,j) & (i,j) \text{ closes a bulge or internal loop} \\ VM(i,j) & (i,j) \text{ closes a multi-loop} \end{cases}$$
Update formulas
• VBI(i, j): free energy of the optimal structure for s[i..j] with i and j forming a pair that closes a bulge or internal loop (i.e., (i, j) is the outer pair of the loop and (i1, j1) the inner pair)
[Figure: bulge or internal loop between (i, j) and (i1, j1); the bases from i+1 to i1-1 and from j1+1 to j-1 are all unpaired]
$$VBI(i,j) = \min_{i_1, j_1:\ i < i_1 < j_1 < j} \{ eBI(i,j,i_1,j_1) + VP(i_1,j_1) \}$$
– The case i1 = i+1, j1 = j-1 (no unpaired bases) is the stacking case already covered in VP
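Update formulas
• VM(i, j): free energy of the optimal structure for s[i..j] with i and j forming a pair that closes a multi-loop (with k ≥ 2 inner pairs; with k = 1 the loop would be a bulge or internal loop)
[Figure: multi-loop closed by (i, j)]
$$VM(i,j) = \min_{i_1,j_1,\dots,i_k,j_k:\ i<i_1<j_1<\dots<i_k<j_k<j} \left\{ eM(i,j,i_1,j_1,\dots,i_k,j_k) + \sum_{h=1}^{k} VP(i_h,j_h) \right\}$$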
Time and space requirements
• V: n entries, each takes O(n) time
• VP(i, j): O(n^2) entries, each takes constant time (given the VBI and VM values)
Time and space requirements
• VBI: O(n^2) entries, each takes O(n^2) time
• VM: O(n^2) entries, each takes O(n^(2k)) time
Time and space requirements
• Summary:
– V: n entries, each takes O(n) time
– VP: O(n^2) entries, each takes constant time
– VBI: O(n^2) entries, each takes O(n^2) time
– VM: O(n^2) entries, each takes O(n^(2k)) time
• Total: O(n^2) space, O(n^(2k+2)) time
– Exponential if k is unbounded
– Some approximations could bring the time down to O(n^4): still huge for large n, but feasible for small or medium n
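Below is a compact sketch of the whole dynamic program under stated assumptions, to show how V, VP, VBI and VM fit together and where the O(n^4) comes from. Everything numeric is hypothetical: constant energies (all negative, as in the model above, and not real Turner parameters) and a minimum hairpin size of 3. The multi-loop energy is assumed to be affine, eM = M_CLOSE + M_BRANCH · k, which lets VM be computed through an auxiliary table WM (a standard trick in Zuker-style algorithms) instead of enumerating all k inner pairs. VBI is enumerated exactly as on the slides, so the total time is O(n^4) and the space O(n^2).

```python
from __future__ import annotations
import math

INF = math.inf
PAIRS = {("A", "U"), ("U", "A"), ("C", "G"), ("G", "C"), ("G", "U"), ("U", "G")}
MIN_HAIRPIN = 3  # assumed minimum number of unpaired bases inside a hairpin
# Hypothetical energies (NOT real Turner parameters), all negative as in the slides.
E_STACK, E_HAIRPIN, E_BI = -2.0, -1.0, -0.5
M_CLOSE, M_BRANCH = -0.5, -0.1   # assumed affine multi-loop: eM = M_CLOSE + M_BRANCH * k


def fold(s: str) -> tuple[float, str]:
    """Minimum free energy and dot-bracket structure of s under the toy model."""
    n = len(s)
    VP = [[INF] * (n + 2) for _ in range(n + 2)]  # 1-based VP[i][j]
    WM = [[INF] * (n + 2) for _ in range(n + 2)]  # multi-loop interior with >= 1 branch
    how: dict = {}                                # choices kept for the traceback
    for d in range(1, n):                         # fill by span length d = j - i
        for i in range(1, n - d + 1):
            j = i + d
            if (s[i - 1], s[j - 1]) in PAIRS:
                best, arg = INF, None
                if j - i - 1 >= MIN_HAIRPIN:                       # eH(i, j)
                    best, arg = E_HAIRPIN, ("H",)
                if E_STACK + VP[i + 1][j - 1] < best:              # eS(i, j) + VP(i+1, j-1)
                    best, arg = E_STACK + VP[i + 1][j - 1], ("S",)
                for i1 in range(i + 1, j - 1):                     # VBI: O(n^2) choices
                    for j1 in range(i1 + 1, j):
                        if (i1, j1) != (i + 1, j - 1) and E_BI + VP[i1][j1] < best:
                            best, arg = E_BI + VP[i1][j1], ("BI", i1, j1)
                for k in range(i + 1, j - 1):                      # VM: >= 2 branches
                    cand = M_CLOSE + WM[i + 1][k] + WM[k + 1][j - 1]
                    if cand < best:
                        best, arg = cand, ("M", k)
                VP[i][j], how[("P", i, j)] = best, arg
            best, arg = VP[i][j] + M_BRANCH, ("P",)               # (i, j) is one branch
            if WM[i + 1][j] < best:
                best, arg = WM[i + 1][j], ("L",)                   # i unpaired
            if WM[i][j - 1] < best:
                best, arg = WM[i][j - 1], ("R",)                   # j unpaired
            for k in range(i, j):                                  # two groups of branches
                if WM[i][k] + WM[k + 1][j] < best:
                    best, arg = WM[i][k] + WM[k + 1][j], ("W", k)
            WM[i][j], how[("W", i, j)] = best, arg
    V = [0.0] * (n + 1)                           # V[0] = V[1] = 0
    for j in range(2, n + 1):
        V[j], how[("V", j)] = V[j - 1], None      # j unpaired
        for i in range(1, j):
            if VP[i][j] + V[i - 1] < V[j]:
                V[j], how[("V", j)] = VP[i][j] + V[i - 1], i
    out, todo = ["."] * n, [("V", n)]
    while todo:                                   # traceback of the recorded choices
        item = todo.pop()
        if item[0] == "V":
            j = item[1]
            if j > 1:
                i = how[("V", j)]
                todo += [("V", j - 1)] if i is None else [("V", i - 1), ("P", i, j)]
        elif item[0] == "P":
            _, i, j = item
            out[i - 1], out[j - 1] = "(", ")"
            arg = how[("P", i, j)]
            if arg[0] == "S":
                todo.append(("P", i + 1, j - 1))
            elif arg[0] == "BI":
                todo.append(("P", arg[1], arg[2]))
            elif arg[0] == "M":
                todo += [("W", i + 1, arg[1]), ("W", arg[1] + 1, j - 1)]
        else:
            _, i, j = item
            arg = how[("W", i, j)]
            if arg[0] == "P":
                todo.append(("P", i, j))
            elif arg[0] == "L":
                todo.append(("W", i + 1, j))
            elif arg[0] == "R":
                todo.append(("W", i, j - 1))
            else:
                todo += [("W", i, arg[1]), ("W", arg[1] + 1, j)]
    return V[n], "".join(out)


print(fold("GGGAAACCC"))   # (-5.0, '(((...)))')
```

For "GGGAAACCC" the optimum is two stacking terms plus one hairpin: -2.0 - 2.0 - 1.0 = -5.0. Swapping in a real energy model would change only the energy constants and the places where they are added, not the table structure.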
Some remarks
• If we allow general pseudoknots, there is currently no efficient way to find the optimal RNA secondary structure with the minimum free energy
• Other methods to predict RNA secondary structures:
– Conservation and covariation, e.g., in the alignment below:
• High conservation: columns 2 and 4
• Strong covariation: columns 1 and 5 (they change together but always remain complementary)
– Experimental methods (e.g., RNA footprinting)
Example alignment (columns numbered 1 to 5):
12345
ACGGU
ACUGU
CCAGG
UCCGA
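The two signals in the alignment above can be detected with simple column statistics. This sketch uses one simple criterion chosen for illustration (hypothetical helper names, not a standard tool): a column is conserved if every sequence has the same base, and two columns covary if they form a Watson-Crick or wobble pair in every sequence while using at least two different pair types, i.e. compensatory changes. Practical methods often score covariation with mutual information instead; that is not implemented here.

```python
from __future__ import annotations
from itertools import combinations

CANONICAL = {("A", "U"), ("U", "A"), ("C", "G"), ("G", "C"), ("G", "U"), ("U", "G")}


def column(alignment: list[str], c: int) -> list[str]:
    """Bases in the 1-based column c."""
    return [row[c - 1] for row in alignment]


def conserved_columns(alignment: list[str]) -> list[int]:
    """Columns in which every sequence has the same base."""
    n = len(alignment[0])
    return [c for c in range(1, n + 1) if len(set(column(alignment, c))) == 1]


def covarying_pairs(alignment: list[str]) -> list[tuple[int, int]]:
    """Column pairs that can pair in every sequence and show >= 2 pair types."""
    n = len(alignment[0])
    result = []
    for a, b in combinations(range(1, n + 1), 2):
        observed = list(zip(column(alignment, a), column(alignment, b)))
        if all(p in CANONICAL for p in observed) and len(set(observed)) >= 2:
            result.append((a, b))
    return result


alignment = ["ACGGU", "ACUGU", "CCAGG", "UCCGA"]
print(conserved_columns(alignment))   # [2, 4]
print(covarying_pairs(alignment))     # [(1, 5)]
```

Columns 2 and 4 form a C-G pair in every sequence, but since they never change they give no covariation evidence; columns 1 and 5 switch between A-U, C-G and U-A, which is much harder to explain without a base pair.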
Representing pseudoknots
• Without pseudoknots, RNA secondary structures can be unambiguously represented by dots (unpaired bases) and brackets (base pairs)
– What if there are pseudoknots?
– Need more types of brackets
[Figure: linear view of the pseudoknot-free example structure in dot-bracket notation]
Image source: http://ultrastudio.org/upload/RNAPseudoKnot-25005810.jpg
Example with a pseudoknot (the pair marked by { } crosses the pairs marked by ( )):
GAAGUACAAUAUGUAACCG
.{.((((.....))}))..
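With several bracket types, each type is simply matched with its own stack; pairs written with different bracket types are then allowed to cross. The sketch below assumes the bracket set (), [], {}, <> (an illustrative choice) and uses hypothetical function names; it also lists the crossing pairs, i.e. (i, j) and (k, l) with i < k < j < l.

```python
from __future__ import annotations

CLOSING_TO_OPENING = {")": "(", "]": "[", "}": "{", ">": "<"}


def parse_extended_dot_bracket(structure: str) -> list[tuple[int, int]]:
    """1-based pairs from a dot-bracket string that may use several bracket
    types; each bracket type is matched with its own stack."""
    stacks: dict[str, list[int]] = {op: [] for op in CLOSING_TO_OPENING.values()}
    pairs: list[tuple[int, int]] = []
    for pos, ch in enumerate(structure, start=1):
        if ch in stacks:                          # an opening bracket
            stacks[ch].append(pos)
        elif ch in CLOSING_TO_OPENING:            # a closing bracket
            stack = stacks[CLOSING_TO_OPENING[ch]]
            if not stack:
                raise ValueError(f"unmatched {ch!r} at position {pos}")
            pairs.append((stack.pop(), pos))
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r} at position {pos}")
    if any(stacks.values()):
        raise ValueError("unmatched opening bracket")
    return sorted(pairs)


def crossing_pairs(pairs: list[tuple[int, int]]) -> list[tuple[tuple[int, int], tuple[int, int]]]:
    """All pairs of base pairs (i, j), (k, l) with i < k < j < l (pseudoknots)."""
    return [(p, q) for p in pairs for q in pairs if p[0] < q[0] < p[1] < q[1]]


structure = ".{.((((.....))}))..."[:19]
pairs = parse_extended_dot_bracket(structure)
print(pairs)                  # [(2, 15), (4, 17), (5, 16), (6, 14), (7, 13)]
print(crossing_pairs(pairs))  # [((2, 15), (4, 17)), ((2, 15), (5, 16))]
```

The pair (2, 15), written with { }, crosses the two outer ( ) pairs (4, 17) and (5, 16); with a single bracket type these pairs could not be written unambiguously.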
Epilogue: CASE STUDY, SUMMARY AND FURTHER READINGS
Case study: Drug finding/design
• Drugs are mostly chemicals with a specific structure that interacts with some biological objects
• Examples:
– Inhibiting the activities of an important protein of bacteria
– Blocking the interaction between a virus and the receptors of host cells
– Stimulating the production of a hormone
Case study: Drug finding/design
• Suppose we want to identify/design a chemical that targets a particular object (e.g., a protein). We need to make sure that the two bind tightly, which is assessed through a process called docking
Image source: http://vds.cm.utexas.edu/
Case study: Drug finding/design
• Computational problem:
– Input: a target protein and a list of chemicals
– Goal: find a chemical that binds the target well
• Try different locations and orientations
• Binding depends on structure and chemistry
– Output: one or more chemicals that bind the target well
• Difficulties:
– Computational complexity
• Large search space for each protein-chemical combination
• Need to try many chemicals
– Need to ensure specificity (not to target other proteins and cause side effects)
Case study: Drug finding/design
• There is a game called FoldIt (http://fold.it/) in which players try to fold proteins
– Score based on free energy
– Real-time update of scores and ranks
– Players can discuss and share solutions
– Players have produced some remarkably good folds compared with automatic predictions by computer programs
Image source: http://fold.it/portal/site_files/theme/science/competition.png
Summary
• Functions depend on structures
• Different levels of structures:
– Primary (sequence)
– Secondary (local)
– Tertiary (global)
– Quaternary (interactions)
• RNA secondary structures can be predicted by dynamic programming based on a thermodynamic model
• Important sub-structures
– Stacking pairs
– Hairpin loops
– Internal loops/bulges
– Multi-loops
– Pseudoknots
Further readings
• Chapter 11 of Algorithms in Bioinformatics: A Practical Introduction
– Speeding up the algorithm
– Algorithm for RNA structure prediction with pseudoknots
– Free slides available
• Parts VII and VIII of Fundamental Concepts of Bioinformatics
– Protein folding and protein structure prediction
– Docking