hmmer tutorial 羅偉軒 [email protected]. account ip: 140.129.78.120 account: binfo2005...

Posted on 21-Dec-2015

HMMER tutorial

羅偉軒 g39208007@ym.edu.tw

Account

• IP: 140.129.78.120

• Account: binfo2005

• Password: 2005binfo

HMMER

• http://hmmer.wustl.edu/

• The theory behind profile HMMs: R. Durbin, S. Eddy, A. Krogh, and G. Mitchison, Biological sequence analysis: probabilistic models of proteins and nucleic acids, Cambridge University Press, 1998.

Flowchart

http://bioweb.pasteur.fr/seqanal/motif/hmmer-uk.html

Format of input alignment files

• Output of the CLUSTAL family of programs

• Wisconsin/GCG MSF format

• Input format of the PHYLIP phylogenetic analysis programs

• Aligned FASTA format

• Stockholm format (HMMER’s native format, used by the Pfam and Rfam databases)

• SELEX format
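To make the Stockholm format concrete, here is a minimal Python sketch (not HMMER's own parser) that reads a single alignment. It expects the "# STOCKHOLM 1.0" header line, skips blank lines and "#=" annotation lines, joins the interleaved blocks of each sequence, and stops at the "//" terminator. The function name read_stockholm and the toy alignment are illustrative assumptions.

```python
def read_stockholm(text):
    """Parse one Stockholm alignment into {sequence name: aligned sequence}.

    Minimal sketch: annotation lines (#=GF, #=GS, #=GR, #=GC) are ignored,
    and only the first alignment (up to the '//' line) is read.
    """
    lines = text.splitlines()
    if not lines or not lines[0].startswith("# STOCKHOLM 1.0"):
        raise ValueError("missing '# STOCKHOLM 1.0' header line")
    seqs = {}
    for raw in lines[1:]:
        line = raw.strip()
        if line == "//":
            break  # end-of-alignment marker
        if not line or line.startswith("#"):
            continue  # blank separator or annotation line
        name, aligned = line.split(None, 1)
        # Interleaved blocks: append this chunk to the sequence seen earlier.
        seqs[name] = seqs.get(name, "") + aligned.strip()
    return seqs


example = """# STOCKHOLM 1.0
#=GF ID toy_family
seq1  ACDE-F
seq2  AC-EGF

seq1  GH
seq2  GH
//
"""
print(read_stockholm(example))  # {'seq1': 'ACDE-FGH', 'seq2': 'AC-EGFGH'}
```

Real Stockholm files also carry per-sequence and per-column annotation; a full reader would keep those lines instead of discarding them.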

Searching a sequence database with a single profile HMM

• Build a profile HMM with hmmbuild:
  > hmmbuild globin.hmm globins50.msf

• Calibrate the profile HMM with hmmcalibrate:
  > hmmcalibrate globin.hmm

• Search the sequence database with hmmsearch:
  > hmmsearch globin.hmm Artemia.fa
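What hmmbuild estimates from an alignment can be illustrated with a deliberately simplified Python sketch. It assumes two things that are not HMMER's actual method: a column becomes a match state when fewer than half of its characters are gaps, and each emission probability is the observed residue count plus one pseudocount per amino acid. The real hmmbuild also estimates insert and delete transitions and uses sequence weighting and Dirichlet mixture priors. The function name and the toy alignment are hypothetical.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def match_emissions(alignment, gap_chars="-.", max_gap_frac=0.5, pseudocount=1.0):
    """Estimate one emission distribution per match column of an alignment.

    Toy version of profile construction: columns whose gap fraction is
    max_gap_frac or more are treated as insert columns and skipped.
    """
    n_rows = len(alignment)
    n_cols = len(alignment[0])
    if any(len(row) != n_cols for row in alignment):
        raise ValueError("all aligned sequences must have the same length")
    profile = []
    for c in range(n_cols):
        column = [row[c] for row in alignment]
        n_gaps = sum(ch in gap_chars for ch in column)
        if n_gaps / n_rows >= max_gap_frac:
            continue  # mostly gaps: insert column, not a match state
        counts = Counter(ch.upper() for ch in column if ch.upper() in AMINO_ACIDS)
        total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
        profile.append({aa: (counts[aa] + pseudocount) / total for aa in AMINO_ACIDS})
    return profile


toy_alignment = ["WK-Y", "WKAY", "WR-Y"]
profile = match_emissions(toy_alignment)
print(len(profile))               # 3 match states; the third column is an insert column
print(round(profile[0]["W"], 3))  # (3 + 1) / (3 + 20)
```

The pseudocounts keep unseen residues from getting probability zero, which is the same problem HMMER's priors solve in a more principled way.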

Local alignment versus global alignment

• In HMMER, whether local or global alignments are allowed is part of the model itself, rather than something achieved by running a different algorithm.

• You therefore need to choose what kind of alignments to allow when you build the model with hmmbuild.

• By default, hmmbuild builds models that allow alignments that are global with respect to the HMM and local with respect to the sequence, and that allow multiple domains to hit per sequence.
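The difference between these alignment modes can be seen with a toy, ungapped Python sketch. It scores a sequence against a small table of per-position scores standing in for the profile (an assumption for brevity: real profile HMMs also have insert and delete states and are scored with the Viterbi algorithm). The mode that is global with respect to the model must use every model position but may start anywhere in the sequence; the fully local mode may also drop model positions at either end. Multiple domain hits are not modeled here, and all names and scores are hypothetical.

```python
MISMATCH = -1.0  # hypothetical score for any residue not listed at a position


def glocal_score(profile, seq):
    """Global with respect to the model, local with respect to the sequence:
    every profile position is aligned, but the match may start anywhere."""
    m = len(profile)
    best = float("-inf")  # stays -inf if seq is shorter than the model
    for start in range(len(seq) - m + 1):
        s = sum(profile[i].get(seq[start + i], MISMATCH) for i in range(m))
        best = max(best, s)
    return best


def local_score(profile, seq):
    """Local with respect to both: best-scoring contiguous stretch of the
    profile against a stretch of the sequence (ungapped Smith-Waterman style)."""
    best = 0.0
    for offset in range(-len(profile) + 1, len(seq)):  # one pass per diagonal
        run = 0.0
        for i in range(len(profile)):
            j = i + offset
            if 0 <= j < len(seq):
                run = max(0.0, run + profile[i].get(seq[j], MISMATCH))
                best = max(best, run)
    return best


profile = [{"W": 2.0}, {"K": 2.0}, {"Y": 2.0}]  # toy 3-position "model"
seq = "AAWKAA"
print(glocal_score(profile, seq))  # 3.0: W and K match, last position forced onto A
print(local_score(profile, seq))   # 4.0: the unmatched last model position is dropped
```

With the default HMMER behaviour described above, a fragment that covers only part of the domain is penalized, because the whole model must be aligned.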

Searching a query sequence against a profile HMM database

• Creating your own profile HMM database (the -A option makes hmmbuild append each new model to the file myhmms):
  > hmmbuild -A myhmms rrm.sto
  > hmmbuild -A myhmms fn3.sto
  > hmmbuild -A myhmms pkinase.sto
  > hmmcalibrate myhmms

• Parsing the domain structure of a sequence with hmmpfam:
  > hmmpfam myhmms 7LES_DROME
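hmmsearch compares one model with many sequences, while hmmpfam compares one sequence with many models. The Python sketch below imitates the second direction, using toy ungapped score tables in place of real profile HMMs (an assumption made for brevity), and parses a domain structure by keeping the best-scoring non-overlapping hits above a threshold. The model names, scores, and query are hypothetical.

```python
MISMATCH = -1.0  # hypothetical score for residues not listed at a position


def scan_models(query, models, threshold=0.0):
    """Score every window of the query against every model and return a
    non-overlapping list of (model, start, end, score), 1-based, by position."""
    candidates = []
    for name, profile in models.items():
        m = len(profile)
        for start in range(len(query) - m + 1):
            score = sum(profile[i].get(query[start + i], MISMATCH) for i in range(m))
            if score > threshold:
                candidates.append((name, start + 1, start + m, score))
    # Greedy domain parse: accept hits from best to worst, skipping overlaps.
    chosen = []
    for hit in sorted(candidates, key=lambda h: h[3], reverse=True):
        if all(hit[2] < c[1] or hit[1] > c[2] for c in chosen):
            chosen.append(hit)
    return sorted(chosen, key=lambda h: h[1])


toy_models = {
    "toyA": [{"W": 2.0}, {"K": 2.0}],
    "toyB": [{"C": 2.0}, {"C": 2.0}, {"H": 2.0}],
}
for hit in scan_models("AWKAACCHA", toy_models):
    print(hit)  # ('toyA', 2, 3, 4.0) then ('toyB', 6, 8, 6.0)
```

The greedy overlap rule is a simplification; hmmpfam derives its domain parse from the model alignment itself.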

Creating and maintaining multiple alignments with hmmalign

• Another use of profile HMMs is to create multiple sequence alignments of large numbers of sequences.

• A profile HMM can be built from a “seed” alignment of a small number of representative sequences, and this profile HMM can then be used to efficiently align any number of additional sequences.

• Aligning the sequences in globins630.fa to the model globin.hmm (-o names the output alignment file):
  > hmmalign -o globins630.ali globin.hmm globins630.fa
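How a model turns each new sequence into one row of the multiple alignment can be sketched with a small dynamic-programming routine in Python. This is a simplified, score-based stand-in for the Viterbi alignment that hmmalign actually computes: the toy profile is a table of per-position scores, and a single hypothetical gap penalty is used both for skipped model positions and for extra residues. Borrowing the convention of HMMER's alignment output, residues placed in model positions are upper case, inserted residues are lower case, and skipped positions are "-".

```python
def align_to_profile(profile, seq, gap=-2.0, mismatch=-1.0):
    """Globally align seq to a toy score profile.

    Returns (aligned_row, score): upper case = residue in a model position,
    lower case = inserted residue, '-' = model position with no residue.
    """
    n, m = len(profile), len(seq)
    score = [[float("-inf")] * (m + 1) for _ in range(n + 1)]
    move = [[""] * (m + 1) for _ in range(n + 1)]
    score[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            if i == 0 and j == 0:
                continue
            options = []
            if i > 0 and j > 0:  # residue j occupies model position i
                emit = profile[i - 1].get(seq[j - 1], mismatch)
                options.append((score[i - 1][j - 1] + emit, "M"))
            if i > 0:  # model position i left empty (deletion)
                options.append((score[i - 1][j] + gap, "D"))
            if j > 0:  # residue j inserted between model positions
                options.append((score[i][j - 1] + gap, "I"))
            score[i][j], move[i][j] = max(options)
    # Trace back from the last cell to recover the aligned row.
    row, i, j = [], n, m
    while i > 0 or j > 0:
        step = move[i][j]
        if step == "M":
            row.append(seq[j - 1].upper())
            i, j = i - 1, j - 1
        elif step == "D":
            row.append("-")
            i -= 1
        else:
            row.append(seq[j - 1].lower())
            j -= 1
    return "".join(reversed(row)), score[n][m]


profile = [{"W": 2.0}, {"K": 2.0}, {"Y": 2.0}]  # toy 3-position model
for s in ["WKY", "WKGY", "WY"]:
    print(s, align_to_profile(profile, s))
```

Because every sequence is aligned to the same model positions, the upper-case columns of all rows line up automatically, which is why one seed model can align any number of new sequences.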

HMMER scoring and determining significance

• HMMER gives you at least two scoring criteria to judge by: the HMMER raw score (a bit score) and an E-value.

• The E-value is calculated from the bit score. It tells you how many false positives (nonhomologous sequences) you would expect to see at or above this bit score in a search of a database of this size.

• HMMER bit scores reflect whether the sequence is a better match to the profile model (positive score) or to the null model of nonhomologous sequences (negative score).
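Both quantities can be written down directly. A bit score is the base-2 log-odds of the sequence under the profile model versus the null model. In HMMER 2, hmmcalibrate fits an extreme value (Gumbel) distribution with parameters mu and lambda to the scores of random sequences; the E-value for a database of N sequences is then N times the probability that a random sequence scores at least as high. The Python sketch below assumes that form; the probabilities and the mu, lambda, and N values are made-up illustrations, not calibrated HMMER parameters.

```python
import math


def bit_score(p_model, p_null):
    """Log-odds score in bits: > 0 if the profile model explains the sequence
    better than the null model, < 0 otherwise."""
    return math.log2(p_model / p_null)


def evd_evalue(score, mu, lam, n_seqs):
    """Expected number of nonhomologous sequences scoring >= score in a
    database of n_seqs, assuming a Gumbel (extreme value) score distribution."""
    # P(S >= score) = 1 - exp(-exp(-lam * (score - mu))), computed stably via expm1.
    p_value = -math.expm1(-math.exp(-lam * (score - mu)))
    return n_seqs * p_value


print(bit_score(0.5, 0.125))  # 2.0: four times more likely under the model
print(bit_score(0.125, 0.5))  # -2.0: four times more likely under the null model
mu, lam, n = -20.0, 0.7, 10_000  # hypothetical calibration values and database size
for s in (0.0, 10.0, 20.0):
    print(s, evd_evalue(s, mu, lam, n))
```

The same bit score gives a larger E-value in a larger database, which is why E-values, not raw scores, are the usual basis for a significance cutoff.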

hmmsearch output

• Building a model
  – hmmbuild: build a profile HMM from a multiple sequence alignment

• Using a model
  – hmmalign: align sequences to an existing model (outputs a multiple alignment)
  – hmmconvert: convert a model into different formats
  – hmmcalibrate: take an HMM and empirically determine parameters that make searches more sensitive by calculating more accurate expectation value scores (E-values)
  – hmmemit: emit sequences probabilistically from a profile HMM
  – hmmsearch: search a sequence database for matches to an HMM

• HMM databases
  – hmmfetch: get a single model from an HMM database
  – hmmindex: index an HMM database (not available on the web server)
  – hmmpfam: search an HMM database for matches to a query sequence

• Other programs
  – alistat: show some simple statistics about a sequence alignment file
  – seqstat: show some simple statistics about a sequence file
  – getseq: retrieve a (sub-)sequence from a sequence file (not available on the web server)
  – sreformat: reformat a sequence or alignment file into a different format
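What hmmemit does can be sketched in a few lines of Python. This toy version assumes the profile is just a list of match-state emission distributions and ignores insert and delete states and transition probabilities, so every emitted sequence has exactly one residue per model position; the real hmmemit samples paths through the full model. The function name and distributions are hypothetical.

```python
import random


def emit_sequences(match_emissions, n, seed=None):
    """Sample n sequences from a list of per-position emission distributions
    (dicts mapping residue -> probability). Insert/delete states are ignored."""
    rng = random.Random(seed)  # seeded generator for reproducible output
    sequences = []
    for _ in range(n):
        residues = []
        for dist in match_emissions:
            symbols = list(dist)
            weights = [dist[s] for s in symbols]
            residues.append(rng.choices(symbols, weights=weights, k=1)[0])
        sequences.append("".join(residues))
    return sequences


toy_model = [{"W": 1.0}, {"K": 0.5, "R": 0.5}, {"Y": 0.9, "F": 0.1}]
for s in emit_sequences(toy_model, 5, seed=42):
    print(s)
```

Emitted sequences are a quick sanity check on a model: they should look like plausible members of the family the model was built from.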

Related links

• HMMER http://hmmer.wustl.edu/
• SAM http://www.cse.ucsc.edu/research/compbio/sam.html
• PFTOOLS http://www.isrec.isb-sib.ch/ftp-server/pftools/
• HMMpro http://www.netid.com/html/hmmpro.html
• GENEWISE http://www.ebi.ac.uk/Wise2/
• PROBE ftp://ftp.ncbi.nih.gov/pub/neuwald/probe1.0/
• META-MEME http://metameme.sdsc.edu/
• BLOCKS http://www.blocks.fhcrc.org/
• PSI-BLAST http://www.ncbi.nlm.nih.gov/BLAST/newblast.html

Homework: Search for homologies with hidden Markov models

• Retrieve the UniProtKB/Swiss-Prot entry of the myb proto-oncogene protein (accession P10242, entry name MYB_HUMAN).

• Using the amino acid sequence of the myb protein, search the NCBI nr protein database with BLASTp. The hits are used to obtain an HMM of myb domains, which you then use to search the UniProtKB/Swiss-Prot protein database.

• While screening the hits of the BLASTp search, select 10 myb domains and copy the corresponding parts of the sequences into a FASTA-format file.

• Make a multiple sequence alignment of these ten myb domains with ClustalW.
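Copying domain regions by hand is error-prone, so this step can be scripted. The Python sketch below assumes you have noted, for each selected BLASTp hit, an identifier, the full sequence, and the 1-based start and end of the myb domain. It writes each region as a FASTA record named id/start-end, borrowing the naming convention of Pfam alignments. The function name, identifiers, and sequences are placeholders, not real hits.

```python
def domains_to_fasta(records, width=60):
    """Return FASTA text for domain regions.

    records: iterable of (seq_id, full_sequence, start, end), where start
    and end are 1-based, inclusive coordinates of the domain.
    """
    lines = []
    for seq_id, sequence, start, end in records:
        if not 1 <= start <= end <= len(sequence):
            raise ValueError(f"bad coordinates for {seq_id}: {start}-{end}")
        domain = sequence[start - 1:end]  # convert to a 0-based slice
        lines.append(f">{seq_id}/{start}-{end}")
        # Wrap the sequence at `width` residues per line.
        lines.extend(domain[k:k + width] for k in range(0, len(domain), width))
    return "\n".join(lines) + "\n"


placeholder_hits = [
    ("hitA", "MAAAWKYLL", 4, 7),
    ("hitB", "GGWKYGG", 3, 5),
]
print(domains_to_fasta(placeholder_hits), end="")
```

Saving the returned text to a file gives an input file for ClustalW.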

Homework: Search for homologies with hidden Markov models (cont.)

• Download HMMER from http://hmmer.wustl.edu/ and install it.

• Build and calibrate an HMM of these myb domains with hmmbuild and hmmcalibrate.

• Use hmmsearch to search the UniProtKB/Swiss-Prot protein database with the HMM of the myb domains.

• Screen the hits, build a new HMM that includes the selected hits, and run hmmsearch again.

• How many hits do you get? What are they?
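The build, calibrate, and search cycle is repeated in each round, so it is convenient to script it. This Python sketch only assembles the same HMMER 2 commands used earlier in this tutorial (hmmbuild, hmmcalibrate, hmmsearch) and runs them when asked. Screening the hits and re-aligning the selected sequences remain manual steps. It assumes the HMMER 2 programs are on your PATH (HMMER 3 no longer has hmmcalibrate), and the function name and all file names are hypothetical.

```python
import shutil
import subprocess


def hmmer2_round(alignment, hmm_file, seq_db, out_file, run=False):
    """Return (and optionally execute) one build -> calibrate -> search round.

    hmmsearch writes its report to standard output, which is saved to out_file.
    """
    steps = [
        ["hmmbuild", hmm_file, alignment],  # profile HMM from the alignment
        ["hmmcalibrate", hmm_file],         # fit E-value parameters, stored in hmm_file
        ["hmmsearch", hmm_file, seq_db],    # search the sequence database
    ]
    if run:
        missing = [argv[0] for argv in steps if shutil.which(argv[0]) is None]
        if missing:
            raise FileNotFoundError(f"not on PATH: {', '.join(missing)}")
        for argv in steps[:-1]:
            subprocess.run(argv, check=True)
        with open(out_file, "w") as report:
            subprocess.run(steps[-1], check=True, stdout=report)
    return steps


# Dry run: print the commands for two rounds (hypothetical file names).
for k, aln in enumerate(["myb_domains.aln", "myb_round2.aln"], start=1):
    for argv in hmmer2_round(aln, f"myb_round{k}.hmm", "uniprot_sprot.fasta", f"round{k}.out"):
        print(" ".join(argv))
```

Using a new HMM file name for each round keeps the earlier models for comparison when you count the hits.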

HMM

Some examples