DNA Sequence Analysis
David Shiuan
Department of Life Science
Institute of Biotechnology and
Interdisciplinary Program of Bioinformatics
National Dong Hwa University
DNA Sequence Analysis (I)
BLAST comparison
ORF (open reading frame) Finder
Promoter Search
- Promoter Prediction (BCM)
- EPD (Eukaryote Promoter Database)
- NNPP prokaryote promoter prediction (BCM)
- ProtScan (BIMAS)
DNA Sequence Analysis (II)
Sequence Alignment (Clustal W)
Tree Analysis (MEGA, PAUP, UPGMA)
Motif Prediction
Restriction Analysis (TCGA)
RNAFOLD (GCG)
Basic Local Alignment Search Tool
A sequence comparison algorithm, optimized for speed, that searches sequence databases for optimal local alignments to a query.
Algorithm: a fixed procedure embodied in a computer program.
Basic Local Alignment Search Tool
The initial search is done for a word of length "W" that scores at least "T" when compared to the query using a substitution matrix. Word hits are then extended in either direction in an attempt to generate an alignment with a score exceeding the threshold of "S". The "T" parameter dictates the speed and sensitivity of the search.
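The seed-and-extend idea above can be sketched in a few lines. This is a minimal illustration, not the NCBI implementation: it assumes a toy +1/-1 scoring scheme in place of a real substitution matrix, an ungapped extension, and a hypothetical `x_drop` cutoff (not mentioned on the slide) that stops an extension once the running score falls too far below its best value. The function names are made up for this example.

```python
def pair_score(a, b, match=1, mismatch=-1):
    """Toy scoring (an assumption): +1 for identical letters, -1 otherwise."""
    return match if a == b else mismatch


def word_score(w1, w2):
    """Score two equal-length words position by position."""
    return sum(pair_score(a, b) for a, b in zip(w1, w2))


def seed_hits(query, subject, W, T):
    """All (i, j) where the W-letter words starting there score at least T."""
    return [
        (i, j)
        for i in range(len(query) - W + 1)
        for j in range(len(subject) - W + 1)
        if word_score(query[i:i + W], subject[j:j + W]) >= T
    ]


def extend_one_way(query, subject, qi, sj, step, x_drop):
    """Ungapped extension from (qi, sj); returns (best gain, letters kept)."""
    best = running = kept = length = 0
    while 0 <= qi < len(query) and 0 <= sj < len(subject):
        running += pair_score(query[qi], subject[sj])
        length += 1
        if running > best:
            best, kept = running, length
        elif best - running > x_drop:
            break  # score dropped too far below the best seen so far
        qi += step
        sj += step
    return best, kept


def blast_like(query, subject, W=4, T=4, S=6, x_drop=2):
    """Report (score, query_start, query_end, subject_start) for hits scoring >= S."""
    found = set()
    for i, j in seed_hits(query, subject, W, T):
        seed = word_score(query[i:i + W], subject[j:j + W])
        right, r_len = extend_one_way(query, subject, i + W, j + W, +1, x_drop)
        left, l_len = extend_one_way(query, subject, i - 1, j - 1, -1, x_drop)
        score = seed + left + right
        if score >= S:
            found.add((score, i - l_len, i + W + r_len, j - l_len))
    return sorted(found, reverse=True)


print(blast_like("ACGTACGTGA", "TTACGTACGTTT"))
```

Lowering `T` admits more seed words, so more candidate alignments are extended (higher sensitivity, slower search); raising it does the opposite.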
Calculating alignment scores
BLOSUM62 Substitution Scoring Matrix
The BLOSUM62 matrix shown here is a 20 x 20 matrix in which every possible identity and substitution is assigned a score based on the observed frequencies of such occurrences in alignments of related proteins.
Identities are assigned the most positive scores.
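As a rough illustration of how such a matrix is used, the sketch below scores an ungapped alignment by summing one matrix entry per aligned pair. Only a handful of BLOSUM62 entries (for A, R, K, I, L and W) are typed in by hand here; treat them as an excerpt to be checked against the full published matrix. The names `SUBSET`, `sub_score` and `alignment_score` are made up for this example.

```python
# Hand-copied excerpt of BLOSUM62 for six residues (verify against the full matrix).
SUBSET = {
    "AA": 4, "RR": 5, "KK": 5, "II": 4, "LL": 4, "WW": 11,
    "AR": -1, "AK": -1, "AI": -1, "AL": -1, "AW": -3,
    "RK": 2, "RI": -3, "RL": -2, "RW": -3,
    "KI": -3, "KL": -2, "KW": -3,
    "IL": 2, "IW": -3,
    "LW": -2,
}


def sub_score(a, b):
    """Look up a pair score; the matrix is symmetric, so try both orders."""
    key = a + b if a + b in SUBSET else b + a
    return SUBSET[key]  # raises KeyError for residues outside the excerpt


def alignment_score(s1, s2):
    """Score an ungapped alignment of two equal-length peptide strings."""
    if len(s1) != len(s2):
        raise ValueError("ungapped alignment needs equal-length sequences")
    return sum(sub_score(a, b) for a, b in zip(s1, s2))


# W/W = 11, I/L = 2, L/L = 4, K/R = 2
print(alignment_score("WILK", "WLLR"))  # 19
```

Conservative substitutions such as I/L or K/R still score positively, while unrelated pairs such as A/W are penalized.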
The NCBI BLAST family of programs
blastp: compares an amino acid query sequence against a protein sequence database
blastn: compares a nucleotide query sequence against a nucleotide sequence database
blastx: compares a nucleotide query sequence translated in all reading frames against a protein sequence database
tblastn: compares a protein query sequence against a nucleotide sequence database dynamically translated in all reading frames
tblastx: compares the six-frame translations of a nucleotide query sequence against the six-frame translations of a nucleotide sequence database
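The "all reading frames" translation used by blastx, tblastn and tblastx can be sketched as follows. This is an illustrative example, not BLAST code: it assumes the standard genetic code, uppercase A/C/G/T input, and marks any codon containing other symbols as "X". The function names are made up for this example.

```python
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def reverse_complement(seq):
    """Complement each base, then reverse the strand."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq):
    """Translate complete codons only; '*' marks a stop codon."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frames(seq):
    """Return the three forward and three reverse-strand translations."""
    seq = seq.upper()
    rc = reverse_complement(seq)
    frames = {}
    for k in range(3):
        frames[f"+{k + 1}"] = translate(seq[k:])
        frames[f"-{k + 1}"] = translate(rc[k:])
    return frames


for name, protein in six_frames("ATGGCCTAA").items():
    print(name, protein)
```

Each of the six resulting peptide strings is then compared as a protein sequence, which is why the translated searches cost more than blastn or blastp.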
Peptide Sequence Databases for BLAST search
nr: All non-redundant GenBank CDS translations + PDB + SwissProt + PIR + PRF
month: All new or revised GenBank CDS translations + PDB + SwissProt + PIR + PRF released in the last 30 days
swissprot: Last major release of the SWISS-PROT protein sequence database (no updates)
Filtering of low-complexity segments
E-value for the score S
The expected number of HSPs with score at least S is given by the formula
$E = K m n e^{-\lambda S}$
HSP: high-scoring segment pair
m and n: sequence lengths
K and $\lambda$ (lambda): parameters
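The formula can be evaluated directly. The sketch below is only a calculator for the expression above; the values passed for K and lambda in the usage lines are illustrative placeholders, not parameters of any particular scoring system, and the function name is made up for this example.

```python
import math


def expected_hsps(S, m, n, K, lam):
    """E = K * m * n * exp(-lambda * S): expected number of HSPs scoring >= S."""
    return K * m * n * math.exp(-lam * S)


# Placeholder parameters (assumptions for illustration only).
K, lam = 0.1, 0.3
m, n = 250, 1_000_000  # query length and database length

for S in (30, 50, 70):
    print(S, expected_hsps(S, m, n, K, lam))
```

E falls exponentially as the score S rises and grows linearly with the size of the search space m * n, so the same score is less significant in a larger database.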
Promoter Search
- ProtScan (at BIMAS)
- EPD (Eukaryote Promoter Database)
- Promoter Prediction (BCM)
- NNPP (Prokaryote Promoter Prediction at BCM)
About the neural network method
NNPP is a method that finds eukaryotic and prokaryotic promoters in a DNA sequence.
It has been shown that multiple functional sites in the primary DNA are involved in the polymerase binding process.
These promoter elements, such as the TATA-box and the transcription start site ("Initiator") in eukaryotes, are present in various combinations separated by various distances in the sequence.
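To make the idea of "elements in combination, separated by various distances" concrete, here is a toy scan. It is not the neural network used by NNPP: it simply looks for a TATA-box-like pattern followed by an Initiator-like pattern within a spacing window. The two simplified consensus patterns and the 15 to 35 bp window are assumptions chosen for illustration, and the function name is made up for this example.

```python
import re

# Simplified consensus patterns (assumptions); lookaheads allow overlapping matches.
TATA = re.compile(r"(?=TATA[AT]A)")
INR = re.compile(r"(?=[CT][CT]A[ACGT][AT][CT][CT])")


def element_pairs(seq, min_gap=15, max_gap=35):
    """Return (tata_start, inr_start) pairs whose start-to-start distance
    lies between min_gap and max_gap bases, inclusive."""
    seq = seq.upper()
    tata_starts = [m.start() for m in TATA.finditer(seq)]
    inr_starts = [m.start() for m in INR.finditer(seq)]
    return [
        (t, i)
        for t in tata_starts
        for i in inr_starts
        if min_gap <= i - t <= max_gap
    ]


demo = "GG" + "TATAAA" + "G" * 20 + "CCATTCC" + "GG"
print(element_pairs(demo))  # [(2, 28)]
```

A fixed pattern and window like this misses promoters whose elements vary, which is the motivation for a trained model that weighs several sites and their spacing together.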