Wellcome Trust graduate course. - Computational Methods series. --- Sequence-based bioinformatics.
Dr. Hyunji Kim
Department of Biochemistry, University of Oxford, South Parks Road, Oxford, OX1 3QU, UK
Email: [email protected]
Basic Tools
1) BLAST/WU-BLAST
A search engine for finding sequences similar to a query of interest. A search can be refined by varying the substitution matrix and the filtering options used against a specified database.
http://www.ncbi.nlm.nih.gov/BLAST/, http://www.ebi.ac.uk/blast2/
2) ClustalW/T-Coffee/Muscle
Helps us make sense of a set of unaligned sequences by generating pairwise or multiple sequence alignments. Uses a progressive-alignment method.
http://www.ebi.ac.uk/clustalw/
3) HMMer/PSI-BLAST
Builds a profile hidden Markov model (pHMM) from a set of aligned sequences. Aligns sequences to the pHMM, searches a sequence database with it, and can help assign a function to a given sequence.
http://hmmer.wustl.edu/
4) Phylip/TreeDyn
Calculates a distance matrix from a set of sequences and derives phylogenetic trees from such a matrix (e.g. by minimum evolution); parsimony and other methods are also available.
http://evolution.genetics.washington.edu/phylip.html
5) Databases
• Nucleotide databases: EMBL, GenBank & DDBJ
• Protein databases: fully annotated, e.g. Swiss-Prot v52.3 (264,492 entries as of 17 Apr. 2007); computer-annotated, e.g. TrEMBL v35.3
• Genomics databases: Ensembl (v44): 20+14 genomes; Eukaryota, Bacteria and Archaea genomes: 51, 445 and 40, as of 20 Apr. 2007
http://www.ebi.ac.uk/uniprot/index.html, http://www.ensembl.org/, http://www.ebi.ac.uk/genomes/index.html
6) Major bioinformatics centres around the globe
http://www.ebi.ac.uk/, http://www.ncbi.nlm.nih.gov/, http://www.ddbj.nig.ac.jp/, http://us.expasy.org/, http://www.sanger.ac.uk/, http://geneontology.org/
Searching for sequences by homology
- BLAST
[Figure: labels x, y, i, j]
Reference: Gish, W. (1996-2006) http://blast.wustl.edu

Query= KcsA (160 letters)
>Filtered+0
MPPMXXXXXXXXXXXXXGRHGSALHWRXXXXXXXXXXXXXXXGSYLAVLAERGAPGAQLI
TYPRALWWSVETATTVGYGDLYPVTLWGRLVAVVVMVAGITSFGLVTAALATWFVGREQE
RRGHFVRHSEKXXXXXXXXXXXXLHERFDRLERMLDDNRR

Database: swissprot
223,100 sequences; 81,965,973 total letters.
Searching....10....20....30....40....50....60....70....80....90....100% done

Sequences producing High-scoring Segment Pairs:          High Score   Smallest Sum Probability P(N)   N
SW:KCSA_STRCO P0A333 Voltage-gated potassium channel.    615          3.0e-60                         1
SW:KCSA_STRLI P0A334 Voltage-gated potassium channel.    615          3.0e-60                         1

>SW:KCSA_STRCO P0A333 Voltage-gated potassium channel.
Length = 160

Score = 615 (221.5 bits), Expect = 3.0e-60, P = 3.0e-60, Group = 1
Identities = 120/160 (75%), Positives = 120/160 (75%)

Query:     1 MPPMXXXXXXXXXXXXXGRHGSALHWRXXXXXXXXXXXXXXXGSYLAVLAERGAPGAQLI 60
             MPPM             GRHGSALHWR               GSYLAVLAERGAPGAQLI
Sbjct:     1 MPPMLSGLLARLVKLLLGRHGSALHWRAAGAATVLLVIVLLAGSYLAVLAERGAPGAQLI 60

Query:    61 TYPRALWWSVETATTVGYGDLYPVTLWGRLVAVVVMVAGITSFGLVTAALATWFVGREQE 120
             TYPRALWWSVETATTVGYGDLYPVTLWGRLVAVVVMVAGITSFGLVTAALATWFVGREQE
Sbjct:    61 TYPRALWWSVETATTVGYGDLYPVTLWGRLVAVVVMVAGITSFGLVTAALATWFVGREQE 120
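The 75% identity in this report can be traced to the low-complexity filter: query residues masked as X cannot be counted as identities, even though the hit is KcsA itself. The following is a minimal Python sketch of that arithmetic on the first query/subject block (residues 1-60). The helper name `count_identities` is hypothetical and is not part of WU-BLAST.

```python
def count_identities(query: str, sbjct: str) -> tuple[int, int]:
    """Return (identical positions, masked query positions) for two aligned rows."""
    if len(query) != len(sbjct):
        raise ValueError("aligned rows must have equal length")
    # A masked query residue (X) never counts as an identity.
    identical = sum(q == s and q != "X" for q, s in zip(query, sbjct))
    masked = query.count("X")
    return identical, masked


# First alignment block of the report (residues 1-60); X marks filtered residues.
query = "MPPM" + "X" * 13 + "GRHGSALHWR" + "X" * 15 + "GSYLAVLAERGAPGAQLI"
sbjct = "MPPMLSGLLARLVKLLLGRHGSALHWRAAGAATVLLVIVLLAGSYLAVLAERGAPGAQLI"

identical, masked = count_identities(query, sbjct)
print(f"block 1-60: {identical} identical, {masked} masked")

# Whole report: 160 query letters, 120 identities reported by the search.
print(f"overall identity: {120 / 160:.0%}")
```

Every unmasked position in this block matches the subject, so the shortfall from 100% in this block is due to masking alone.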
Multiple sequence alignment
– ClustalW
*****************************************************
 CLUSTAL W (1.83) Multiple Sequence Alignments
*****************************************************

    1. Sequence Input From Disc
    2. Multiple Alignments
    3. Profile / Structure Alignments
    4. Phylogenetic trees

    S. Execute a system command
    H. HELP
    X. EXIT (leave program)

Your choice: 2

****** MULTIPLE ALIGNMENT MENU ******

    1. Do complete multiple alignment now (Slow/Accurate)
    2. Produce guide tree file only
    3. Do alignment using old guide tree file
    4. Toggle Slow/Fast pairwise alignments = SLOW
    5. Pairwise alignment parameters
    6. Multiple alignment parameters
    7. Reset gaps before alignment? = OFF
    8. Toggle screen display = ON
    9. Output format options

    S. Execute a system command
    H. HELP
    or press [RETURN] to go back to main menu

Your choice:
CLUSTAL W (1.82) multiple sequence alignment

KVAP_AERPE  FDALW-WAVVTATTVGYGDVVP-ATPIGKVIGIAVMLTGISALTLLIGTVSNMF------ 79
MVP_METJA   FDAFY-FTTISITTVGYGDITP-KTDAGKLI---IIFS---VLFFISGLITS-------- 70
O28600      FDSLY-MTVITITTTGYGEVKP-MGPGGRVISMLLMFVGVGTF----------------- 64
Q8TXQ4      LTCLY-FTAATITTVGYGDVVP-TTEAGRLLSVIVMFSGIGVASYAL------------- 73
Q6L2S2      FTSLW-WTMQTITTVGYGDTPV-YGFYGRINGMLIMVFGIGTIGYVTASLAT-------- 79
Q979Z2      FTAIW-FTMETVTTVGYGDVVP-VSNLGRVVAMLIMVSGIGLLGTLTATISAYLF----Q 80
O26605      EDSLW-YVLQTITTVGYGDIVP-VTSLGRFTGMVIMFSAIASTSLITASATSTLLERGEQ 114
Q9HIA8      GNAFY-YTGEVITTLGFGDILP-VTMDAKIFTISLAFLGVAIFFSSITALILPSVERRLG 94
Q97CK5      GTALY-YTGETVTTLGFGDILP-VDLESRLFTISLAFLGVAIFFSAMTALITPTIERRVG 84

Residues    Colour    Property
AVFPMILW    Red       Small (small + hydrophobic, incl. aromatic, minus Y)
DE          Blue      Acidic
RHK         Magenta   Basic
STYHCNGQ    Green     Hydroxyl, Amine
Others      Gray
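One way to read the ClustalW alignment is to ask which columns are identical in every sequence. The Python sketch below does this for the nine rows shown on this slide, with the names and trailing residue counts stripped. The function name `conserved_columns` is an illustrative assumption, and column indices are 0-based positions within this alignment block, not residue numbers.

```python
# Rows copied from the ClustalW output on this slide.
alignment = {
    "KVAP_AERPE": "FDALW-WAVVTATTVGYGDVVP-ATPIGKVIGIAVMLTGISALTLLIGTVSNMF------",
    "MVP_METJA": "FDAFY-FTTISITTVGYGDITP-KTDAGKLI---IIFS---VLFFISGLITS--------",
    "O28600": "FDSLY-MTVITITTTGYGEVKP-MGPGGRVISMLLMFVGVGTF-----------------",
    "Q8TXQ4": "LTCLY-FTAATITTVGYGDVVP-TTEAGRLLSVIVMFSGIGVASYAL-------------",
    "Q6L2S2": "FTSLW-WTMQTITTVGYGDTPV-YGFYGRINGMLIMVFGIGTIGYVTASLAT--------",
    "Q979Z2": "FTAIW-FTMETVTTVGYGDVVP-VSNLGRVVAMLIMVSGIGLLGTLTATISAYLF----Q",
    "O26605": "EDSLW-YVLQTITTVGYGDIVP-VTSLGRFTGMVIMFSAIASTSLITASATSTLLERGEQ",
    "Q9HIA8": "GNAFY-YTGEVITTLGFGDILP-VTMDAKIFTISLAFLGVAIFFSSITALILPSVERRLG",
    "Q97CK5": "GTALY-YTGETVTTLGFGDILP-VDLESRLFTISLAFLGVAIFFSAMTALITPTIERRVG",
}


def conserved_columns(rows):
    """Return (column index, residue) where every row shares one non-gap residue."""
    hits = []
    for idx, column in enumerate(zip(*rows)):
        residues = set(column)
        if len(residues) == 1 and "-" not in residues:
            hits.append((idx, column[0]))
    return hits


conserved = conserved_columns(alignment.values())
for idx, residue in conserved:
    print(idx, residue)
```

The fully conserved columns fall in the TTVGYGD region, where the alignment shows T-T-x-G-x-G across all nine sequences.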
Profile alignment & pattern recognition: HMMer
More sensitive homology search: PSI-BLAST & HMMer
DNA sequence
Amino acid sequence
PSI-BLAST
Phylogeny: Phylip & TreeDyn
Saitou N and Nei M, The neighbour-joining method: a new method for reconstructing phylogenetic trees. Mol Biol Evol, 4(4):406-425, 1987
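As a rough illustration of how neighbour-joining starts from a distance matrix, the Python sketch below computes the pair-selection criterion Q(i,j) = (n - 2) d(i,j) - sum_k d(i,k) - sum_k d(j,k) and picks the pair with the smallest value. The five taxa and their distances are made-up illustrative numbers, and `nj_pick_pair` is a hypothetical helper, not a PHYLIP routine. Only the first joining step is shown.

```python
from itertools import combinations


def nj_pick_pair(names, dist):
    """Return the pair of taxa minimising the neighbour-joining Q criterion and its Q value."""
    n = len(names)
    totals = [sum(row) for row in dist]  # total distance from each taxon to all others
    best_pair, best_q = None, float("inf")
    for i, j in combinations(range(n), 2):
        q = (n - 2) * dist[i][j] - totals[i] - totals[j]
        if q < best_q:
            best_pair, best_q = (names[i], names[j]), q
    return best_pair, best_q


# Illustrative symmetric distance matrix (not taken from the course data).
names = ["a", "b", "c", "d", "e"]
dist = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]

pair, q = nj_pick_pair(names, dist)
print(pair, q)
```

In this toy matrix the smallest raw distance is between d and e, yet the Q criterion selects a and b first, because Q also accounts for how far each taxon is from all the others.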
TreeDyn
Protein secondary structure prediction:
two consensus methods
http://sbcb.bioch.ox.ac.uk/TM_noj/TM_noj.html
Example Output
640 650 660 670 680 690 700
| | | | | | |
MFAKGYGKNNEPLRGYILTFLIALGFILIAELNVIAPIISNFFLASYALINFSVFHASLAKSPGWRPAFK
ALOM2 *****************
DAS ****************************************
HMMTOP2 ****************** *************************
MEMSAT1.5 *************************
PHD *************************
SPLIT4 **************** ***************************
TMAP *****************************
TMFINDER ****************************************
TMHMM2 *********************** ******************
TMPRED *************************
TOPPRED2 ********************* *********************
Consensus ------------???hhhhHHHHHHHHHHHHHHHHHhHHhhhhhhhhh???????????-----------
The Transmembrane Prediction Server was developed by Dr. Jonathan Cuthbertson.
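The idea behind a consensus line can be sketched as a per-residue vote over the individual methods. The Python example below is only an illustration under stated assumptions: the four toy predictions, the method names, and the thresholds that map vote fractions to `H`, `h`, `?` and `-` are all hypothetical, and the actual rule used by the Transmembrane Prediction Server is not given in these slides.

```python
def consensus(predictions, strong=0.75, weak=0.5, doubtful=0.25):
    """Combine per-residue predictions ('*' = predicted transmembrane) by vote fraction."""
    rows = list(predictions.values())
    if len({len(row) for row in rows}) != 1:
        raise ValueError("all predictions must cover the same residues")
    symbols = []
    for column in zip(*rows):
        fraction = column.count("*") / len(column)
        if fraction >= strong:
            symbols.append("H")   # most methods agree
        elif fraction >= weak:
            symbols.append("h")   # at least half agree
        elif fraction >= doubtful:
            symbols.append("?")   # a minority predicts a helix
        else:
            symbols.append("-")   # little or no support
    return "".join(symbols)


# Toy predictions over ten residues (hypothetical, not server output).
toy = {
    "method_A": "..****....",
    "method_B": ".*****....",
    "method_C": "..*****...",
    "method_D": "...***....",
}

result = consensus(toy)
print(result)
```

Changing the thresholds trades sensitivity at the helix ends against confidence in the core, which is where the individual methods tend to disagree.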
http://pongo.biocomp.unibo.it/pongo
Pongo
Example Output by Pongo
Background for practical sessions
Introduction to your input sequence
Ion channels; Potassium channels; Voltage-gated potassium channels
• Ion channels are a diverse class of transmembrane proteins that mediate the diffusion of ions across cell membranes.
• There are several major families of ion channels, for instance K+, Na+, Ca2+ and Cl- channels, as well as ligand-gated ion channels (LGICs).
• Many human neurological and muscular disorders have been traced to defects in voltage-gated and ligand-gated ion channels.
[Figure: Fig. 2A, Long et al., Science, Vol. 309, p897, 2005; labels: TM, T1]
K+ channels, blastp
Homologues are visualised in BLIXEM.
Your expected blastp-output
Alignment you are about to build, not necessarily as big.
[Figure: Fig 4, Shealy et al., Biophysical Journal, Vol 84, p2929, 2003. Family labels: Kv, BK, SK, Erg, Kir, CNG, AKT. Kv labels: Kv1.x, Shab, Kv2.x, Shal, Kv4.x, Kv5.6.8.9., Shaw, Kv3.x. Kir labels: Kir2.x, Kir6.2, Kir3.x, Kir4.x, Kir1.1, Kir6.1, Kir2.3]
Example of pHMM-related output
hmmsearch - search a sequence database with a profile HMM
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
HMM file: Kv.hmm [Kv_homologues]
Sequence database: infile_comb
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Query HMM: Kv_homologues
[HMM has been calibrated; E-values are empirical estimates]

Scores for complete sequences (score includes all domains):
Sequence          Description  Score  E-value  N
--------          -----------  -----  -------  ---
CIKS_DROME                     241.2  3.2e-71  1
Q9VX00_DROME                   234.3  3.9e-69  1
CIKB_DROME                     159.3  1.5e-46  1
O62350_Celegans                156.7  8.8e-46  1
Q9VLC6_DROME                   156.6  9.6e-46  1
CIKW_DROME                     156.5  1e-45    1
Q8SYL2_DROME                   156.5  1e-45    1
Q22012_Celegans                155.3  2.4e-45  1
Filtered_5DROME                140.5  6.6e-41  1
Filtered_6DROME                140.5  6.6e-41  1
Q9XXD1_Celegans                125.0  3.1e-36  1
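A score table like the hmmsearch one above is easy to post-process, for example to keep only hits below an E-value cut-off before building an alignment. The Python sketch below parses a few rows copied from that table. The `Hit` type, the `parse_hits` helper and the 1e-45 threshold are illustrative assumptions, not part of HMMER.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    name: str
    score: float
    evalue: float
    domains: int


def parse_hits(text: str) -> list[Hit]:
    """Parse 'name [description] score E-value N' rows, skipping anything else."""
    hits = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue
        try:
            # Read from the right so that an optional description is tolerated.
            hit = Hit(fields[0], float(fields[-3]), float(fields[-2]), int(fields[-1]))
        except ValueError:
            continue  # header or separator line
        hits.append(hit)
    return hits


# A subset of the rows from the table above.
report = """\
Sequence Description Score E-value N
CIKS_DROME 241.2 3.2e-71 1
Q9VX00_DROME 234.3 3.9e-69 1
CIKB_DROME 159.3 1.5e-46 1
O62350_Celegans 156.7 8.8e-46 1
Q9XXD1_Celegans 125.0 3.1e-36 1
"""

hits = parse_hits(report)
selected = [hit.name for hit in hits if hit.evalue <= 1e-45]
print(selected)
```

Reading the numeric fields from the right-hand side keeps the parser working when the Description column is filled in, as it often is for annotated databases.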
Raw tree-files produced by PHYLIP
[Figure: phylogenetic trees; labels: Kir, Kv, BK, SK, AKT, CNG/HErg, KcsA, MthK, Kv1.2, KvAP]
Phylogenetic trees modified in TreeDyn