2010/11/22
Applications of structural bioinformatics
徐唯哲 Paul Wei-Che HSU
Assistant Research Specialist
Bioinformatics Core, Institute of Molecular Biology, Academia Sinica, Taiwan, R.O.C.
RNA structure
• Primary structure of an RNA: a sequence of the bases A, G, C and U
• Through hydrogen bonding, the bases of an RNA can form base pairs
– Watson-Crick base pairs:
• G≡C: held by three hydrogen bonds
• A=U: held by two hydrogen bonds
– Wobble base pairs:
• G−U: held by two hydrogen bonds in a non-Watson-Crick (wobble) geometry
• Secondary structure of an RNA: the Watson-Crick and wobble base pairs occurring in the RNA fold
RNA secondary structure
• RNA structure pairing: A-U, C-G, G-U
• Structural elements: a. hairpin loop; b. internal loop; c. bulge loop; d. multibranched loop; e. stem; f. pseudoknot
Thermodynamic Calculations
ΔG: free energy of duplex formation
ΔH: enthalpy
ΔS: entropy
T: temperature in K
ΔG = ΔH − TΔS
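Minimum free energy (MFE)
• E(S): total free energy of a structure S, summed over its base pairs (r_i, r_j)
E(S) = Σ_{(r_i, r_j) ∈ S} e(r_i, r_j)
• The MFE structure is the one that minimizes E(S) over all candidate structures
E_min = min_S E(S)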
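The minimization over structures can be sketched with a small dynamic program. This is a Nussinov-style simplification, not the loop-based energy model that tools such as Mfold use. The per-pair energies in `PAIR_ENERGY` and the `min_loop` parameter are illustrative assumptions, not values from these slides.

```python
# Toy per-pair energies e(r_i, r_j) in arbitrary units (assumed values).
PAIR_ENERGY = {
    ("G", "C"): -3.0, ("C", "G"): -3.0,
    ("A", "U"): -2.0, ("U", "A"): -2.0,
    ("G", "U"): -1.0, ("U", "G"): -1.0,
}


def min_free_energy(seq, min_loop=3):
    """Return min over structures S of E(S) = sum of e(r_i, r_j) for pairs in S.

    Pseudoknots are excluded, and a hairpin loop must enclose at least
    `min_loop` unpaired bases.
    """
    seq = seq.upper()
    n = len(seq)
    if n == 0:
        return 0.0
    # energy[i][j] holds the best energy for the subsequence i..j (inclusive).
    energy = [[0.0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            # Case 1: base i or base j is left unpaired.
            best = min(energy[i + 1][j], energy[i][j - 1])
            # Case 2: bases i and j pair with each other.
            pair = PAIR_ENERGY.get((seq[i], seq[j]))
            if pair is not None:
                best = min(best, pair + energy[i + 1][j - 1])
            # Case 3: the interval splits into two independent parts.
            for k in range(i + 1, j):
                best = min(best, energy[i][k] + energy[k + 1][j])
            energy[i][j] = best
    return energy[0][n - 1]


if __name__ == "__main__":
    print(min_free_energy("GGGAAACCC"))
```

With these toy energies, the hairpin GGGAAACCC scores three stacked G-C pairs around an AAA loop. Real predictors replace the per-pair term with nearest-neighbor stacking and loop energies.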
Nearest-neighbor energy parameters (Δh in kcal/mol, Δs in cal/(K·mol))
                 Breslauer        SantaLucia       Sugimoto
Pair             Δh      Δs       Δh      Δs       Δh      Δs
AA/TT            -9.1    -24.0    -8.4    -23.6    -8.0    -21.9
AG/CT            -7.8    -20.8    -6.1    -16.1    -6.6    -16.4
AT/TA            -8.6    -23.9    -6.5    -18.8    -5.6    -15.2
AC/GT            -6.5    -17.3    -8.6    -23.0    -9.4    -25.5
GA/TC            -5.6    -13.5    -7.7    -20.3    -8.8    -23.5
GG/CC            -11.0   -26.6    -6.7    -15.6    -10.9   -28.4
GC/GC            -11.1   -26.7    -11.1   -28.4    -10.5   -26.4
TA/TA            -6.0    -16.9    -6.3    -18.5    -6.6    -18.4
TG/CA            -5.8    -12.9    -7.4    -19.3    -8.2    -21.0
CG/CG            -11.9   -27.8    -10.1   -25.5    -11.8   -29.0
nuc (GC% > 0)    0.0     -16.8    0.0     -5.9     0.6     -9.0
nuc (GC% = 0)    0.0     -20.1    0.0     -9.0     0.6     -9.0
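Example (Breslauer parameters) for the duplex 5'-GGAT-3' / 3'-CCTA-5':
GGAT
||||
CCTA
The initiation terms Δh_int and Δs_int are taken from the "nuc (GC% > 0)" row.
ΔH = Δh_int + (Δh_GG/CC + Δh_GA/TC + Δh_AT/TA)
   = 0 + (-11.0) + (-5.6) + (-8.6) = -25.2 kcal/mol
ΔS = Δs_int + (Δs_GG/CC + Δs_GA/TC + Δs_AT/TA)
   = (-16.8) + (-26.6) + (-13.5) + (-23.9) = -80.8 cal/(K·mol)
ΔG at 25 °C = ΔH − TΔS
   = (-25.2) − (25 + 273) × (-0.0808) = -1.1 kcal/mol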
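The arithmetic above can be reproduced in a few lines. The sketch below uses only the Breslauer column of the table and follows the slide in converting Celsius to Kelvin by adding 273. The function and variable names are illustrative assumptions.

```python
# Breslauer nearest-neighbor parameters from the table:
# top-strand dinucleotide -> (dh in kcal/mol, ds in cal/(K*mol)).
BRESLAUER = {
    "AA": (-9.1, -24.0), "AG": (-7.8, -20.8), "AT": (-8.6, -23.9),
    "AC": (-6.5, -17.3), "GA": (-5.6, -13.5), "GG": (-11.0, -26.6),
    "GC": (-11.1, -26.7), "TA": (-6.0, -16.9), "TG": (-5.8, -12.9),
    "CG": (-11.9, -27.8),
}
# Initiation ("nuc") terms, chosen by whether the duplex contains any G or C.
INIT_WITH_GC = (0.0, -16.8)
INIT_WITHOUT_GC = (0.0, -20.1)

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def duplex_thermo(seq, temp_c=25.0):
    """Return (dH kcal/mol, dS cal/(K*mol), dG kcal/mol) for seq and its complement."""
    seq = seq.upper()
    has_gc = "G" in seq or "C" in seq
    dh, ds = INIT_WITH_GC if has_gc else INIT_WITHOUT_GC
    for i in range(len(seq) - 1):
        step = seq[i:i + 2]
        # A step such as CC shares its entry with its reverse complement (GG/CC).
        if step not in BRESLAUER:
            step = reverse_complement(step)
        step_dh, step_ds = BRESLAUER[step]
        dh += step_dh
        ds += step_ds
    # dS is in cal, so divide by 1000 before combining with dH in kcal.
    dg = dh - (temp_c + 273.0) * ds / 1000.0
    return dh, ds, dg


if __name__ == "__main__":
    dh, ds, dg = duplex_thermo("GGAT")
    print(f"dH = {dh:.1f} kcal/mol, dS = {ds:.1f} cal/(K*mol), dG = {dg:.1f} kcal/mol")
```

Raising `temp_c` makes ΔG less negative, which matches the expectation that a duplex becomes less stable as temperature increases.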
• Mfold
– http://mfold.rna.albany.edu/
– Fold many RNA/DNA sequences at once
– Fold RNA/DNA at different temperatures (between 0 °C and 100 °C)
M. Zuker. Mfold web server for nucleic acid folding and hybridization prediction. Nucleic Acids Res. 31(13), 3406-15 (2003)
Folding of a DNA sequence at different temperatures (panels: 37 °C, 60 °C, 90 °C)
14 utilities in RNA Studio
• RNAz
– http://rna.tbi.univie.ac.at/cgi-bin/RNAz.cgi
– Predicts structural noncoding RNAs
Washietl S., Hofacker I.L., Stadler P.F. Fast and reliable prediction of noncoding RNAs. Proc. Natl. Acad. Sci. U.S.A. 102, 2454-2459, Feb. 2005
RNAz
• Predict structurally conserved and thermodynamically stable RNA secondary structures in multiple sequence alignments
• Can be used in genome-wide screens:
– Detect functional RNA structures, as found in noncoding RNAs and cis-acting regulatory elements of mRNAs
Step 1: File upload
Step 2: Analysis options
Step 3: Output options
Step 4: View results
The biogenesis of microRNAs
(Esquela-Kerscher and Slack, 2006)
RISC: RNA-induced silencing complex
Genes can be regulated by miRNAs
• microRNAs (miRNAs) take part in critical biological processes by repressing the translation of protein-coding genes.
• An earlier report suggested that more than one-third of the human genome is regulated by RNA (Life science news, 2005)
• Thousands of human genes are microRNA targets (Lewis et al., Cell, 2005; Selbach et al., Nature, 2008)
microRNA cluster
miRNA Function
• mRNA degradation: common in plants
• Translation repression: common in animals
• Transcription repression (histone methylation; active chromatin becomes silent chromatin): common in yeasts, plants, and possibly animals
Schematic of the structure of five human pri-miRNAs
(Cullen et al. Mol. Cell., 2004)
Category in miRNAMap
intergenic
intronic
intergenic
intergenic
exonic
• miRBase
– http://www.mirbase.org/
– The microRNA database
Search result
Genomic location
Seed region of miRNA
• A perfect match to bases 2-8 from the 5' end of the miRNA
• Seed matches correlate with both mRNA degradation and translational repression (Selbach et al., Nature, 2008)
miRNA
Target gene
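A seed match can be located by taking bases 2-8 of the miRNA, reverse-complementing them, and searching the target for that string. The sketch below does only that exact search. The target sequence in the usage lines is a hypothetical example, the let-7a sequence is included for illustration and should be checked against miRBase, and the function names are assumptions. Tools such as TargetScan or miRanda apply further criteria that are not modeled here.

```python
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def to_rna(seq):
    """Upper-case a sequence and write it in the RNA alphabet."""
    return seq.upper().replace("T", "U")


def seed_of(mirna, start=2, end=8):
    """Return the seed: bases start..end (1-based, inclusive) from the 5' end."""
    return to_rna(mirna)[start - 1:end]


def seed_match_sites(mirna, target):
    """Return (site, positions): the target-side string that pairs perfectly
    with the seed, and the 1-based start of every occurrence in the target."""
    site = seed_of(mirna).translate(COMPLEMENT)[::-1]
    target = to_rna(target)
    positions = []
    pos = target.find(site)
    while pos != -1:
        positions.append(pos + 1)
        pos = target.find(site, pos + 1)  # step by one to allow overlapping hits
    return site, positions


if __name__ == "__main__":
    let7a = "UGAGGUAGUAGGUUGUAUAGUU"  # illustrative miRNA sequence
    utr = "AAACUACCUCAAA"             # hypothetical target fragment
    print(seed_of(let7a))
    print(seed_match_sites(let7a, utr))
```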
Tools for identifying miRNA targets
• miRNA.org (miRanda)
– http://www.microrna.org
• TargetScan
– http://www.targetscan.org/
• RNAhybrid
– http://bibiserv.techfak.uni-bielefeld.de/rnahybrid/submission.html
• PicTar
– http://pictar.mdc-berlin.de/
Predicted miRNA targets - miRNA.org
View target View expression profile
Predicted miRNA targets - TargetScan
RNAhybrid
miRNA sequence (in FASTA format)
Target RNA (in FASTA format)
Predicted miRNA targets - PicTar
Known miRNA targets: TarBase
• TarBase: A comprehensive database of experimentally supported animal microRNA targets.
– Sethupathy, P. et al. (RNA, 12:192-197, 2006)
• The database provides a means of searching through a comprehensive set of experimentally supported microRNA targets in at least 9 organisms.
– Number of miRNAs represented: 177
– Number of target genes: 995
– Number of target sites: 883
– http://www.diana.pcbi.upenn.edu/tarbase
• How can we obtain the promoter region of a miRNA gene?
Schematic of the structure of five human pri-miRNAs
(Cullen et al. Mol. Cell., 2004)
Category in miRNAMap
intergenic
intronic
intergenic
intergenic
exonic
Few complete pri-miRNA sequences have been identified
• The positions of most transcription start sites (TSSs) of intergenic miRNAs are unknown
TSS ?
Experimental data
• FANTOM3: CAGE tags
– Cap-analysis gene expression (CAGE) tags (Carninci et al., Nat Genet, 2006)
– http://fantom3.gsc.riken.jp/
• DBTSS: Solexa tags
– 5'-ends of the Solexa sequences of human cell lines (MCF7, HEK293) (Wakaguri et al., Nucleic Acids Res, 2008)
– http://dbtss.hgc.jp
• UCSC: expressed sequence tags (ESTs)
– EST positions in the human genome (Kuhn et al., Nucleic Acids Res, 2007)
Statistics of DBTSS
http://dbtss.hgc.jp
Retrieve promoter sequence
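Once a TSS position and strand are known, retrieving a promoter sequence comes down to coordinate arithmetic. The sketch below assumes 1-based inclusive coordinates and a chromosome held in memory as a string. The window sizes (1000 bp upstream, 200 bp downstream) and the function names are illustrative assumptions, not settings taken from DBTSS or any tool shown in these slides.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def promoter_interval(tss, strand, upstream=1000, downstream=200, chrom_len=None):
    """Return the (start, end) genomic interval around a TSS, 1-based inclusive.

    On the minus strand, upstream means higher genomic coordinates.
    """
    if strand == "+":
        start, end = tss - upstream, tss + downstream
    elif strand == "-":
        start, end = tss - downstream, tss + upstream
    else:
        raise ValueError("strand must be '+' or '-'")
    start = max(1, start)          # clip at the chromosome start
    if chrom_len is not None:
        end = min(chrom_len, end)  # clip at the chromosome end
    return start, end


def promoter_sequence(chrom_seq, tss, strand, upstream=1000, downstream=200):
    """Extract the promoter, written 5'->3' on the strand of the gene."""
    start, end = promoter_interval(tss, strand, upstream, downstream, len(chrom_seq))
    region = chrom_seq[start - 1:end].upper()
    if strand == "-":
        region = region.translate(COMPLEMENT)[::-1]
    return region


if __name__ == "__main__":
    toy_chrom = "AAAACCCCGGGGTTTT"  # hypothetical 16-bp chromosome
    print(promoter_sequence(toy_chrom, tss=9, strand="+", upstream=4, downstream=2))
```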
Comparative analysis of the promoters
Search for TF Binding Site
Search result of TF binding site
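As a rough illustration of what a binding-site search does, the sketch below scans a promoter for simplified consensus strings. This is plain pattern matching, a stand-in for the matrix-based or neural-network methods that real tools use. The consensus strings in `CONSENSUS` are simplified assumptions, only the forward strand is scanned, and the test sequence is hypothetical.

```python
import re

# IUPAC nucleotide codes used in the consensus strings below.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "W": "[AT]", "S": "[CG]", "R": "[AG]", "Y": "[CT]", "N": "[ACGT]",
}

# Simplified consensus sequences (assumptions for illustration only).
CONSENSUS = {
    "TATA box": "TATAWAWR",
    "CCAAT box": "CCAAT",
    "GC box": "GGGCGG",
}


def scan_motifs(seq, motifs=CONSENSUS):
    """Return (name, 1-based start, matched text) for each forward-strand hit."""
    seq = seq.upper()
    hits = []
    for name, consensus in motifs.items():
        pattern = "".join(IUPAC[base] for base in consensus)
        # The lookahead lets overlapping occurrences be reported as well.
        for match in re.finditer("(?=(" + pattern + "))", seq):
            hits.append((name, match.start() + 1, match.group(1)))
    return sorted(hits, key=lambda hit: hit[1])


if __name__ == "__main__":
    promoter = "GGCCAATCTTATAAAAGGGCGG"  # hypothetical sequence
    for name, start, text in scan_motifs(promoter):
        print(name, start, text)
```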
Promoter prediction tools
• EP3
– Method: identification of structural features of DNA
– Species: eukaryotic genomes
– Features: DNA denaturation values, duplex-free energy, GC content
– Data source: UCSC, ENCODE
– Reference: Abeel et al., Genome Res, 2008
• NNPP 2.2
– Method: neural network
– Species: prokaryote/eukaryote
– Features: TATA box
– Data source: EPD
– Reference: Reese et al., Comput Chem, 2001
• Promoter 2.0
– Method: neural network
– Species: vertebrate
– Features: four TFBSs (TATA box, CCAAT box, GC box, Inr)
– Data source: EPD
– Reference: Knudsen, Bioinformatics, 1999
EP3
• http://bioinformatics.psb.ugent.be/webtools/ep3/
BDGP: Neural Network Promoter Prediction
• http://www.fruitfly.org/seq_tools/promoter.html
Promoter 2.0
• http://www.cbs.dtu.dk/services/Promoter/
Protein structure
(Adapted from a slide by P. Johansson, E. Jakobsson)
Drug discovery
Functional study
Protein structure determination
1. NMR
2. X-ray crystallography
Protein structure databases
• 1. PDB (Protein Data Bank):
– PDB contains information about experimentally determined structures of proteins, nucleic acids, and complex assemblies.
– http://www.rcsb.org/pdb/home/home.do
• 2. MMDB (Molecular Modeling Database):
– Data come from PDB, with value-added features such as explicit chemical graphs and computationally identified 3D domains that are used to identify similar 3D structures
– http://www.ncbi.nlm.nih.gov/sites/entrez?db=structure
• 3. Pfam (Protein Family Database):
– Pfam is a large collection of protein families. Proteins are generally composed of one or more functional regions, commonly termed domains.
– http://pfam.sanger.ac.uk/
Protein Data Bank (PDB)
• http://www.pdb.org/pdb/home/home.do
– Structure data determined by X-ray crystallography and NMR
– The data include atom coordinates, references, sequence, secondary structure, disulfide bonds, etc.
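Atom coordinates in a PDB-format file are stored in fixed-width ATOM and HETATM records, so they can be read by column slicing. The sketch below is a minimal reader for a handful of fields, following the column layout of the PDB file format. The `Atom` type and `parse_atoms` function are illustrative names. For real work, an established parser such as Biopython's Bio.PDB is likely the safer choice.

```python
from typing import List, NamedTuple


class Atom(NamedTuple):
    serial: int
    name: str
    res_name: str
    chain: str
    res_seq: int
    x: float
    y: float
    z: float


def parse_atoms(pdb_text: str) -> List[Atom]:
    """Read ATOM/HETATM records from PDB-format text using fixed columns."""
    atoms = []
    for line in pdb_text.splitlines():
        if line[:6] not in ("ATOM  ", "HETATM"):
            continue  # skip HEADER, REMARK, SEQRES, END and other records
        atoms.append(Atom(
            serial=int(line[6:11]),        # columns 7-11
            name=line[12:16].strip(),      # columns 13-16
            res_name=line[17:20].strip(),  # columns 18-20
            chain=line[21],                # column 22
            res_seq=int(line[22:26]),      # columns 23-26
            x=float(line[30:38]),          # columns 31-38
            y=float(line[38:46]),          # columns 39-46
            z=float(line[46:54]),          # columns 47-54
        ))
    return atoms


def centroid(atoms: List[Atom]):
    """Return the mean x, y, z of a non-empty list of atoms."""
    n = len(atoms)
    return (sum(a.x for a in atoms) / n,
            sum(a.y for a in atoms) / n,
            sum(a.z for a in atoms) / n)
```

Slicing by column rather than splitting on whitespace matters because adjacent fields in a PDB record can run together without a separating space.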
Enzyme Classification Histogram
Amprenavir: a protease inhibitor used to treat HIV infection.
MMDB
Pfam
• Pfam is a collection of protein families and domains
• In Pfam, you can
– Look at multiple alignments
– View protein domain architectures
– Examine species distribution
– Follow links to other databases
– View known protein structures
Protein and domain families
• Pfam-A: Families from Pfam
• Pfam-B: A large number of small families taken from the ProDom database
(Note: A single protein can belong to several Pfam families)
URL : http://pfam.sanger.ac.uk/
Keyword Search
apoptosis
Bcl-2 family
Alignment
HMM logo
Phylogenetic tree for Bcl-2 family
Structures
Domain organization
Other databases of structural classification of proteins
• 1. SCOP (Structural Classification of Proteins): folds, superfamilies, and families
– http://scop.mrc-lmb.cam.ac.uk/scop/
• 2. CATH (Classification by Class, Architecture, Topology & Homology)
– http://www.cathdb.info/
• 3. Dali: a network service for comparing protein structures in 3D
– DALI server http://ekhidna.biocenter.helsinki.fi/dali_server/index.html
– DALI Database (fold classification) http://ekhidna.biocenter.helsinki.fi/dali/start
Applications of protein structure software
Software for Protein Structure Visualization
• PyMol http://www.pymol.org/
• Jmol http://jmol.sourceforge.net/
• RasMol http://www.umass.edu/microbio/rasmol/
• MolPOV http://www.chem.ufl.edu/~der/der_pov2.htm
• MolMol http://www.mol.biol.ethz.ch/wuthrich/software/molmol/
• Ribbons http://www.cmc.uab.edu/ribbons/
• MolScript http://www.avatar.se/molscript/
• WebLab ViewerLite and ViewerPro http://www.accelrys.com/about/msi.html
• Swiss-PDB Viewer http://www.expasy.ch/spdbv/
• XtalView http://www.scripps.edu/pub/dem-web/toc.html
• MolView and MolView Lite http://bilbo.bio.purdue.edu/~tom/
STRING
(‘Search Tool for the Retrieval of Interacting Genes/Proteins’)