basic local alignment and search tool

Upload: lancns

Post on 07-Apr-2018

226 views

Category:

Documents


0 download

TRANSCRIPT

  • 8/3/2019 Basic Local Alignment and Search Tool

    1/33

    Basic Local Alignment and Search Tool(BLAST)

    Database searching

    10-02-2009

    Barbera van Schaik

  • 8/3/2019 Basic Local Alignment and Search Tool

    2/33

    2

    Why use BLAST?

    Dynamic Programming is not suitable for comparing a query

    sequence against a database Takes too much time!

    BLAST is a heuristic method to find the highest locally optimal

    alignments

    BLAST improved overall speed of searches

    BLAST maintains good sensitivity

  • 8/3/2019 Basic Local Alignment and Search Tool

    3/33

    3

    BLAST terminology

    query sequence

    blast

    targetdatabase(GenBank/

    SwissProt)

    output

    sequence list:Hits/subject

    information aboutinput query sequence, e.g.,function

    The aim of a database (blast) search is to discover sequence homologyon basis of sequence similarity

    BLAST returns similar sequences, not necessarily biological similarsequences

  • 8/3/2019 Basic Local Alignment and Search Tool

    4/33

    4

    BLAST variants

    blastptblastnamino acid query

    blastxblastn/tblastxnucleotide query

    protein databasenucleotidedatabase

    Sequence type

    blastn: finds NT sequences similar to your NT sequenceblastp: finds AA sequences similar to your AA sequenceblastx: finds AA sequences similar to translation of your NT

    sequence (if you cannot recognize an ORF)tblastn: translate AA sequence and searches against NTdatabase (for finding pseudogenes)

    tblastx: keep computers busy

  • 8/3/2019 Basic Local Alignment and Search Tool

    5/33

    5

    http://blast.ncbi.nlm.nih.gov/Blast.cgi

    Webinterf

    acechange

    snowand

    then

  • 8/3/2019 Basic Local Alignment and Search Tool

    6/33

    6

    http://blast.ncbi.nlm.nih.gov/Blast.cgi

  • 8/3/2019 Basic Local Alignment and Search Tool

    7/33

    7

    http://blast.ncbi.nlm.nih.gov/Blast.cgi

  • 8/3/2019 Basic Local Alignment and Search Tool

    8/33

    8

    BLASTing a sequence at NCBI programs

  • 8/3/2019 Basic Local Alignment and Search Tool

    9/33

    9

    BLASTing a sequence at NCBI enter accession

  • 8/3/2019 Basic Local Alignment and Search Tool

    10/33

    10

    BLASTing a sequence at NCBI enter sequence

  • 8/3/2019 Basic Local Alignment and Search Tool

    11/33

    11

    Database choice

    Protein databases

    Also good for protein coding nucleotide queries

    Choose a non-redundant database

    Nucleotide databases

    Non-redundant database

    Filter on organism / other Entrez query

  • 8/3/2019 Basic Local Alignment and Search Tool

    12/33

    12

    BLASTing a sequence at NCBI parameters

  • 8/3/2019 Basic Local Alignment and Search Tool

    13/33

    Blast algorithm: step 1

    protein query sequence

    protein database

    compile list of words

    of length W

    13

  • 8/3/2019 Basic Local Alignment and Search Tool

    14/33

    Blast algorithm: step 2Initial search

    Use PAM/BLOSUM matrix

    Find word of length W thatscores at least T (T=11)

    Exact matches only

    The parameter T dictates the

    speed and sensitivity-increasing T increases speed,decreases sensitivity

    14

  • 8/3/2019 Basic Local Alignment and Search Tool

    15/33

    Join words on same diagonal(ungapped)

    database sequence

    word hit

    High-scoring segment pair (score=S')

    distance AScore>T Score>T

    15

  • 8/3/2019 Basic Local Alignment and Search Tool

    16/33

    Join words on same diagnonal

    database sequence

    High-scoring segment pair (score=S')

    Extend HSP until score drops small amount below highest scoreof shorter alignment

    score score HSP

    extend to left (similar to right)

    stop extensiondrops below threshold

    score S for this alignment

    If S>threshold (based on random sequences) then keep HSP16

  • 8/3/2019 Basic Local Alignment and Search Tool

    17/33

    Finding HSPs

    T>11

    T>13

    HSP

    (Join and extend)

    17

  • 8/3/2019 Basic Local Alignment and Search Tool

    18/33

    Trigger gapped extension

    HSP

    Gapped extension

    18

  • 8/3/2019 Basic Local Alignment and Search Tool

    19/33

    19

    BLASTing a sequence at NCBI parameters

  • 8/3/2019 Basic Local Alignment and Search Tool

    20/33

    20

    Masking of sequences low complexity

    Low complexity repeats in genome

    Many amino-acid stretches in proteins

    BLAST recognizes these regions as similar

    but, they are NOT biologically related

  • 8/3/2019 Basic Local Alignment and Search Tool

    21/33

    21

    Masking of sequences highly abundant sequences

    First query sequence against database that contains domainsrepresentative of large sequence families

    Alu repeatsProtein kinase catalytic domainsVector sequences

    Then mask these domains in the query sequence and continuesearch

    Masking option replaces these regions with XXXXXXX

  • 8/3/2019 Basic Local Alignment and Search Tool

    22/33

    22

    When do you change the parameters?

    The database youre searching OR

    filter the reported entries by keywordOR increase the nr of reportedmatches OR increase Expect (theevalue threshold) OR rejectsequences too similar to the query

    (those with very low e-values)

    BLAST reports too many matches

    The substitution matrix or the gappenalties to check the matchsrobustness

    Your match has a borderline E-value

    The substitution matrix or the gappenalties

    BLAST doesnt report any results

    Sequence filter (automatic masking)The sequence youre interested in

    contains many identical residues; ithas a biased composition

    Parameters to changeReason

    Parametersare

    already

    optimiz

    ed

  • 8/3/2019 Basic Local Alignment and Search Tool

    23/33

    23

    BLASTing a sequence at NCBI parameters

  • 8/3/2019 Basic Local Alignment and Search Tool

    24/33

    24

    BLASTing a sequence at NCBI job status

  • 8/3/2019 Basic Local Alignment and Search Tool

    25/33

    25

    If it takes too long: try another BLAST server

    http://blast.ncbi.nlm.nih.gov/Blast.cgi

    http://www.expasy.ch/tools/blast/http://www.ch.embnet.org/software/

    bBLAST.html

    http://www.ebi.ac.uk/blast

    http://blast.ddbj.nig.ac.jp/top-e.html

    BLAST / PSI-BLAST

    BLASTBLAST

    BLAST

    BLAST / PSI-BLAST

    USA

    EuropeEurope

    Europe

    Japan

    URLProgramCountry /continent

    Warning:d

    ifferentdata

    base(versions)!

  • 8/3/2019 Basic Local Alignment and Search Tool

    26/33

    26

    BLASTing a sequence at NCBI blast summary

  • 8/3/2019 Basic Local Alignment and Search Tool

    27/33

    27

    BLASTing a sequence at NCBI used parameters

  • 8/3/2019 Basic Local Alignment and Search Tool

    28/33

    28

    BLASTing a sequence at NCBI graphical display

  • 8/3/2019 Basic Local Alignment and Search Tool

    29/33

    29

    BLASTing a sequence at NCBI hit list

    How often would thishit have occurred bychance?

    Rule of thumb:E-value < 0.0001

  • 8/3/2019 Basic Local Alignment and Search Tool

    30/33

    30

    BLASTing a sequence at NCBI

  • 8/3/2019 Basic Local Alignment and Search Tool

    31/33

    31

    Alternatives for homology searches

    http://genome.ucsc.edu/BLATUSA

    http://www.ddbj.nig.ac.jp/search/

    search-e.html

    SSEARCH/ FASTA

    Japan

    http://www.ch.embnet.org/software/GMFDF_form.html

    SSEARCHEurope

    http://www.ebi.ac.uk/fasta33FASTAEurope

    http://fasta.bioch.Virginia.edu/fastaFASTAUSA

    AddressProgramCountry /continent

  • 8/3/2019 Basic Local Alignment and Search Tool

    32/33

    32

  • 8/3/2019 Basic Local Alignment and Search Tool

    33/33

    33