NCBI Review Concepts 20040715 Chuong Huynh

Pairwise Sequence Alignments

• Purpose:
  • identification of sequences with significant similarity to (a) sequence(s) in a sequence repository
  • identification of all homologous sequences in the repository
  • identification of domains with sequence similarity

• Terminology
  • Global alignment
  • Local alignment

Terminology: Global Alignment

• Finds the optimal alignment over the entire length of the two compared sequences

• Unlikely to detect genes that have evolved by recombination (e.g. domain shuffling) or insertion/deletion of DNA

• Suitable for sequences of homologous molecules
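
To make the idea of scoring an alignment over the entire length concrete, here is a minimal sketch of global alignment scoring. It assumes a simple Needleman-Wunsch recurrence with a linear gap penalty; the function name and the scores (match +1, mismatch -1, gap -2) are illustrative choices, not values from the slides.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Score of the best end-to-end alignment of a and b (illustrative scores)."""
    # prev[j] holds the best score for aligning the processed prefix of a
    # with the first j characters of b.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # aligning a[:i] against an empty prefix of b
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    # Identical sequences score one point per position.
    print(global_alignment_score("GATTACA", "GATTACA"))
    # A two-domain sequence against a single-domain one is penalized
    # for every position that has no partner.
    print(global_alignment_score("AAAAAABBBBBB", "AAAAAA"))
```

Because every unmatched position costs a gap, a sequence that shares only one domain with the other scores poorly, which is the behavior the bullets above describe.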

Terminology: Local Alignment

• Finds short regions of similarity between a pair of sequences.

• Compared sequences can receive high local similarity scores without needing high similarity over their entire length.

• Useful when looking for domains within proteins or for regions of genomic DNA that contain coding exons.
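
A minimal sketch of local alignment scoring, assuming the Smith-Waterman recurrence with a linear gap penalty. The function name and scores (match +1, mismatch -1, gap -2) are illustrative and not taken from the slides. The only change from a global scheme is that a cell may restart at 0 and the best cell anywhere in the table is reported.

```python
def local_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Best score of any local alignment between a and b (illustrative scores)."""
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Flooring at 0 lets an alignment start fresh at any position.
            score = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best


if __name__ == "__main__":
    two_domains = "AAAAAABBBBBB"
    print(local_alignment_score(two_domains, "AAAAAA"))
    print(local_alignment_score(two_domains, "BBBBBB"))
```

Under these assumed scores, a shared domain is found at full score no matter what surrounds it, which is why local alignment is the better fit for domain searches.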

An alignment that BLAST can’t find

1   GAATATATGAAGACCAAGATTGCAGTCCTGCTGGCCTGAACCACGCTATTCTTGCTGTTG
    || | || || || |  || || ||   ||  |  ||| |||||| | | || | ||| |
1   GAGTGTACGATGAGCCCGAGTGTAGCAGTGAAGATCTGGACCACGGTGTACTCGTTGTCG

61  GTTACGGAACCGAGAATGGTAAAGACTACTGGATCATTAAGAACTCCTGGGGAGCCAGTT
    | || ||     ||  ||| ||  | |||||| || | ||||||  |||||  |     |
61  GCTATGGTGTTAAGGGTGGGAAGAAGTACTGGCTCGTCAAGAACAGCTGGGCTGAATCCT

121 GGGGTGAACAAGGTTATTTCAGGCTTGCTCGTGGTAAAAAC
    |||| || ||||| ||  ||    |  | ||||  || |||
121 GGGGAGACCAAGGCTACATCCTTATGTCCCGTGACAACAAC
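
The slide does not say why BLAST misses this alignment. One likely reason, offered here as an assumption, is that nucleotide BLAST seeds its search from short exact word matches (the blastn default word size is 11), and this alignment contains no run of identical bases that long. The sketch below measures the longest exact run and the overall identity for the two sequences shown; the function names are illustrative.

```python
# The two sequences from the alignment above, joined across the three blocks.
SEQ_TOP = (
    "GAATATATGAAGACCAAGATTGCAGTCCTGCTGGCCTGAACCACGCTATTCTTGCTGTTG"
    "GTTACGGAACCGAGAATGGTAAAGACTACTGGATCATTAAGAACTCCTGGGGAGCCAGTT"
    "GGGGTGAACAAGGTTATTTCAGGCTTGCTCGTGGTAAAAAC"
)
SEQ_BOTTOM = (
    "GAGTGTACGATGAGCCCGAGTGTAGCAGTGAAGATCTGGACCACGGTGTACTCGTTGTCG"
    "GCTATGGTGTTAAGGGTGGGAAGAAGTACTGGCTCGTCAAGAACAGCTGGGCTGAATCCT"
    "GGGGAGACCAAGGCTACATCCTTATGTCCCGTGACAACAAC"
)


def longest_exact_run(a, b):
    """Length of the longest stretch of identical aligned positions (no gaps)."""
    longest = current = 0
    for x, y in zip(a, b):
        current = current + 1 if x == y else 0
        longest = max(longest, current)
    return longest


def percent_identity(a, b):
    """Percentage of aligned positions that are identical (no gaps)."""
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / min(len(a), len(b))


if __name__ == "__main__":
    print("longest exact run:", longest_exact_run(SEQ_TOP, SEQ_BOTTOM))
    print("percent identity: %.1f" % percent_identity(SEQ_TOP, SEQ_BOTTOM))
```

For these sequences the longest exact run is 6 bases and identity is roughly 61%, so no 11-base seed exists even though the similarity extends over the whole length.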

BLAST Selection Matrix

Choosing The Right BLAST Flavor for Proteins

What You Want to Do | The Right BLAST Flavor
Find out something about the function of the protein | Use blastp to compare your protein with other proteins contained in the databases.
Discover new genes encoding similar proteins | Use tblastn to compare your protein with DNA sequences translated into their 6 possible reading frames.

Claverie & Notredame 2003

Choosing the Right BLAST Flavor for DNA

Question | Answer
Am I interested in non-coding DNA? | Yes: use blastn. Note: blastn is only for closely related DNA sequences (more than 70% identical).
Do I want to discover new proteins? | Yes: use tblastx.
Do I want to discover proteins encoded in my query DNA sequences? | Yes: use blastx.
Am I unsure of the quality of my DNA? | Yes: use blastx, especially if you suspect your DNA sequence codes for a protein but may contain sequencing errors.

Claverie & Notredame 2003

Choosing The Right BLAST Flavor for DNA Sequences

Usage | Query | Database | Program
Find very similar DNA sequence | DNA | DNA | blastn
Protein discovery and ESTs | Translated DNA | Translated DNA | tblastx
Analysis of query DNA sequence | Translated DNA | Protein | blastx

Claverie & Notredame 2003
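
The flavor tables reduce to a lookup on how the query and the database are compared. A minimal sketch, assuming the labels "dna", "translated_dna", and "protein" as hypothetical keys for the columns used in the slides:

```python
# (query, database) -> BLAST program, following the tables in these slides.
BLAST_FLAVORS = {
    ("dna", "dna"): "blastn",
    ("protein", "protein"): "blastp",
    ("translated_dna", "protein"): "blastx",
    ("protein", "translated_dna"): "tblastn",
    ("translated_dna", "translated_dna"): "tblastx",
}


def choose_blast(query, database):
    """Return the BLAST program for a query/database pairing."""
    try:
        return BLAST_FLAVORS[(query, database)]
    except KeyError:
        raise ValueError(f"no BLAST flavor for {query!r} vs {database!r}") from None


if __name__ == "__main__":
    # A DNA query of uncertain quality, searched against proteins.
    print(choose_blast("translated_dna", "protein"))
```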

BLAST Tips

• It is faster and more accurate to BLAST proteins (blastp) rather than nucleotides.

• If in doubt, use blastp.

• When possible, restrict the search to the subset of the database you are interested in.

• Look around for the database you need, or create your own custom BLAST database. BUT HOW???

• When is the best time to use the BLAST server?
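
One possible answer to the "BUT HOW???" above, as a sketch only: the current NCBI BLAST+ suite builds a custom database with `makeblastdb` (these 2004 slides predate it; the legacy tool was `formatdb`). The helper names and file paths below are hypothetical, and the code only assembles the command lines rather than running them.

```python
def makeblastdb_command(fasta_path, db_name, dbtype="prot"):
    """Command line for building a custom BLAST database with BLAST+."""
    if dbtype not in ("prot", "nucl"):
        raise ValueError("dbtype must be 'prot' or 'nucl'")
    return ["makeblastdb", "-in", fasta_path, "-dbtype", dbtype, "-out", db_name]


def blastp_command(query_path, db_name, evalue=1e-5):
    """Command line for a blastp search against that database (tabular output)."""
    return ["blastp", "-query", query_path, "-db", db_name,
            "-evalue", str(evalue), "-outfmt", "6"]


if __name__ == "__main__":
    # Hypothetical file names; pass each list to subprocess.run() to execute
    # it on a machine where BLAST+ is installed.
    print(" ".join(makeblastdb_command("my_proteins.fasta", "my_proteins")))
    print(" ".join(blastp_command("query.fasta", "my_proteins")))
```

Searching a small custom database also follows the earlier tip about restricting the search to the subset you care about.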

Asking Biological Problems with BLAST

What You Want to Do | General (but More Complicated) Computational Method | Using BLAST
Finding genes in a genome | Run gene prediction software or an ORF finder (for bacteria) | Cut your genome sequence into short (2-5 kb) overlapping pieces. Use blastx to BLAST each piece against NR (nonredundant protein database). Works better for sequences with no introns (bacteria).
Predicting protein function | Domain analysis or wet-lab experimentation | Use blastp to BLAST your protein sequence against SWISS-PROT (future = UniProt). If you get a good hit (more than 25% identity) over the complete length of the protein, then your protein probably has the same function as the SWISS-PROT protein.
Predicting protein 3-D structure | Homology modeling, X-ray, or NMR analysis of the protein of interest | Use blastp to BLAST your protein against PDB (protein structure database). If you get a hit with more than 25% identity, then your protein and the good hit(s) have a similar 3-D structure.
Finding protein family members | Clone new family members using PCR techniques | Use blastp (or better, PSI-BLAST) and run against NR (nonredundant protein database). Once you have all members of the family, you can make a multiple sequence alignment and a phylogenetic tree.

Claverie & Notredame 2003
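
The "cut your genome into overlapping pieces" step in the first row can be sketched as follows. The function name, the 5 kb piece size, and the 500-base overlap are illustrative assumptions; the slides only give the 2-5 kb range.

```python
def overlapping_chunks(seq, size, overlap):
    """Split seq into pieces of at most `size` that share `overlap` characters.

    Returns (start_offset, piece) pairs so hits can be mapped back to the genome.
    """
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    if not seq:
        return []
    step = size - overlap
    chunks = []
    start = 0
    while True:
        chunks.append((start, seq[start:start + size]))
        if start + size >= len(seq):
            break  # this piece reached the end of the sequence
        start += step
    return chunks


if __name__ == "__main__":
    genome = "ACGT" * 3000  # stand-in for a real 12 kb sequence
    for offset, piece in overlapping_chunks(genome, size=5000, overlap=500):
        print(offset, len(piece))
```

Keeping the start offset with each piece makes it straightforward to translate blastx hit coordinates back into genome coordinates.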

BLAST and PSI-BLAST Servers on the Internet

Country | Program | URL
USA | BLAST / PSI-BLAST | http://www.ncbi.nlm.nih.gov/BLAST
USA | BLAST | http://genome.wustl.edu/gsc/BLAST
Europe | BLAST | http://www.ch.embnet.org/software/bBLAST.html
Europe | BLAST | http://www.ebi.ac.uk/blast2/
Japan | BLAST / PSI-BLAST | http://www.ddbj.nig.ac.jp/E-mail/homology.html

Common Mistake

• Seq1 has domains A and B; Seq2 has domain A; Seq3 has domain B.

• Use Seq1 as the query sequence.

• What happens? Both hits may be highly significant (very low E-values) if domains A and B are long and well conserved.

• Seq1 is homologous to Seq2 and to Seq3, but remember that Seq1 is not homologous to either one over its entire length.

• Don't depend on the E-value alone.

• "BLAST hits are not transitive, unless the alignments are overlapping."

• Most proteins have more than one domain, so be careful when looking at BLAST results: not all reported hits belong to the same big family.

Sequence 1: AAAAAABBBBBB
Sequence 2: AAAAAA
Sequence 3: BBBBBB
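
The "not transitive unless overlapping" rule can be checked mechanically from the query coordinates of two hits. A minimal sketch, assuming each hit is given as a hypothetical (start, end) pair of 1-based inclusive positions on the query:

```python
def query_overlap(hit_a, hit_b):
    """Number of query positions covered by both hits (1-based, inclusive)."""
    start = max(hit_a[0], hit_b[0])
    end = min(hit_a[1], hit_b[1])
    return max(0, end - start + 1)


if __name__ == "__main__":
    # Query is Sequence 1 (AAAAAABBBBBB): Sequence 2 aligns to positions 1-6,
    # Sequence 3 to positions 7-12.
    seq2_hit = (1, 6)
    seq3_hit = (7, 12)
    shared = query_overlap(seq2_hit, seq3_hit)
    if shared == 0:
        print("Hits do not overlap: no basis for relating Seq2 to Seq3")
    else:
        print(f"Hits share {shared} query positions")
```

When the overlap is zero, as in the slide's example, the two hits match different domains of the query and say nothing about each other.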

Alternative Methods for Homology Searches

• Smith-Waterman (ssearch): slower but more accurate

• FASTA: slower than BLAST, but more accurate when making DNA comparisons

• BLAT: for locating cDNA in a genome or finding close proteins in a genome

Common Questions

• When I run a BLAST job with WU-BLAST and with NCBI BLAST using the same query sequence, I get different results. Both are based on the same algorithm but are different implementations. So why the difference?

Usually this is due to slight variation in the database version, but differences in the BLAST program version also play a minor role. The results usually do not change dramatically, but they do change a bit.

Basic Gene Prediction Flow Chart

Obtain new genomic DNA sequence

1. Translate in all six reading frames and compare to protein sequence databases
2. Perform a database similarity search of the expressed sequence tag (EST) database of the same organism, or of cDNA sequences if available

Use gene prediction program to locate genes

Analyze regulatory sequences in the gene
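
Step 1 of the flow chart, translating in all six reading frames, can be sketched as below. This assumes the standard genetic code and uses illustrative function names; unknown or ambiguous codons are translated as "X".

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate complete codons only; a trailing partial codon is dropped."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translations(dna):
    """Return translations for frames +1, +2, +3 and -1, -2, -3."""
    dna = dna.upper()
    rc = reverse_complement(dna)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(dna[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames


if __name__ == "__main__":
    for frame, protein in six_frame_translations("ATGGCCATTGTAATGGGCCGCTGA").items():
        print(frame, protein)
```

Each of the six protein strings can then be compared to a protein database, which is what blastx does internally for a DNA query.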

The Annotation Process

[Diagram labels: DNA SEQUENCE, ANALYSIS SOFTWARE, Useful Information, Annotator]

Annotation Process

[Diagram labels, starting from the DNA sequence]
Tools: RepeatMasker, Blastn, Halfwise, Blastx, gene finders, tRNA scan
Features: repeats, promoters, pseudogenes, rRNA, genes, tRNA
Tools: Fasta, BlastP, Pfam, Prosite, Psort, SignalP, TMHMM

How do I do large scale genome analysis?

• Read Koonin’s book on NCBI Bookshelf

TaxPlot is a tool for three-way comparisons of genomes on the basis of the protein sequences they encode.

Demo TaxPlot

http://www.ncbi.nlm.nih.gov/sutils/taxik2.cgi

Demo - VecScreen

http://www.ncbi.nlm.nih.gov/VecScreen/VecScreen.html