biological sequence databases
1
BIOLOGICAL SEQUENCE
DATABASES
2
NCBI
What is NCBI?
National Center for Biotechnology Information
Established in 1988
Part of the National Library of Medicine at the National Institutes of Health
Major aims: public databases, development of software tools for sequence analysis, and dissemination of biomedical information
3
Roles of NCBI
1) Maintains biological databases, both primary and secondary, including GenBank
2) Provides data retrieval systems such as Entrez
3) Provides computational resources for the analysis of GenBank data and other biological data
4
Kinds of databases
Primary databases
Original submissions by the experimentalists who generated the data
Content is controlled by the submitters
Examples include GenBank, dbSNP and GEO
Secondary databases
Built from primary data retrieved from the primary databases
Content is controlled by a third party (NCBI)
Examples include RefSeq, RefSNP, NCBI Structure, Protein, etc.
5
NCBI homepage
6
NCBI
TOOLS
BLAST: Standard BLAST, MegaBLAST, PSI-BLAST, PHI-BLAST, RPS-BLAST, BLAST 2 Sequences
Database retrieval tool
Specialized tools: ORF Finder, e-PCR, Spidey
Sequence submission tool: BankIt
DATABASES
Nucleotide database
Literature database
Protein database
Expression database
Structure database
7
Retrieval tool ENTREZ
Integrated database search and retrieval system
Provides extensive links between and within database records
Cross references of different databases
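As an illustration of how an Entrez search can be addressed from a program, the sketch below builds a query URL for the `esearch` endpoint of NCBI's E-utilities. The helper name `build_esearch_url` and the example search term are assumptions made for this illustration only; the code just constructs the URL and does not contact NCBI.

```python
from urllib.parse import urlencode

# Base address of the E-utilities esearch endpoint.
ESEARCH_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_esearch_url(db, term, retmax=20):
    """Return an esearch URL for one Entrez database (hypothetical helper)."""
    params = {"db": db, "term": term, "retmax": retmax, "retmode": "json"}
    return ESEARCH_BASE + "?" + urlencode(params)


# Example: search the nucleotide database for human BRCA1 records.
url = build_esearch_url("nucleotide", "BRCA1[Gene Name] AND human[Organism]")
print(url)
```

Changing only the `db` argument (for example to `protein` or `pubmed`) points the same query at a different Entrez database, which reflects the integrated nature of the system.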
8
Sequence submission to NCBI
Databases are constantly updated with new sequence submissions via sequence submission tools such as:
BankIt
Sequin
9
BankIt
Web-based sequence submission tool
Connect to the NCBI home page, then to the GenBank sidebar at left
Tool of choice for simple submissions
Can also be used for updating previously submitted information
10
Sequin
Stand-alone sequence submission and updating tool
Handles multiple sequence submissions
Provides increased capacity for long sequence submissions
Multiple annotations
Supports phylogenetic and population study submissions
11
BLAST
Basic Local Alignment Search Tool
Sequence similarity searches against a variety of sequence databases
UniGene, Gene, MMDB, GEO
12
Kinds of BLAST
blastn, blastp, blastx, tblastn, tblastx
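The five programs differ in the type of the query, the type of the database, and whether either side is translated into protein before comparison. The sketch below records that mapping and picks a program from it. The function name `choose_blast_program` and its `translated` flag are assumptions made for this illustration, not part of any NCBI interface.

```python
# What each classic BLAST program compares.
BLAST_PROGRAMS = {
    "blastn": "nucleotide query vs nucleotide database",
    "blastp": "protein query vs protein database",
    "blastx": "translated nucleotide query vs protein database",
    "tblastn": "protein query vs translated nucleotide database",
    "tblastx": "translated nucleotide query vs translated nucleotide database",
}


def choose_blast_program(query_type, db_type, translated=False):
    """Pick a program name from the query and database types.

    `translated` only matters when both sides are nucleotide: it selects
    tblastx (compare at the protein level) instead of blastn.
    """
    if query_type == "nucleotide" and db_type == "nucleotide":
        return "tblastx" if translated else "blastn"
    if query_type == "protein" and db_type == "protein":
        return "blastp"
    if query_type == "nucleotide" and db_type == "protein":
        return "blastx"
    if query_type == "protein" and db_type == "nucleotide":
        return "tblastn"
    raise ValueError("types must be 'nucleotide' or 'protein'")


program = choose_blast_program("nucleotide", "protein")
print(program, "-", BLAST_PROGRAMS[program])
```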
13
SPECIALIZED TOOLS
There are many sequence analysis tools, which are explained later:
1) ORF Finder
2) e-PCR
3) Spidey
14
ORF Finder
Open reading frame finder
Graphical analysis tool
Finds all open reading frames in the user's sequence or in a sequence already submitted to the databases
Uses standard and alternative genetic codes for the analysis of reading frames
Packaged with Sequin
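The idea of scanning all six reading frames for open reading frames can be sketched in a few lines. This is a minimal illustration, not the algorithm used by NCBI's ORF Finder: it assumes ORFs start at ATG, only reports ORFs closed by a stop codon, and models an alternative genetic code simply as a different set of stop codons. The names `find_orfs`, `STANDARD_STOPS` and `VERT_MITO_STOPS` are hypothetical.

```python
# Stop codons of the standard genetic code.
STANDARD_STOPS = frozenset({"TAA", "TAG", "TGA"})
# Vertebrate mitochondrial code: TGA encodes Trp, AGA and AGG act as stops.
VERT_MITO_STOPS = frozenset({"TAA", "TAG", "AGA", "AGG"})


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_orfs(seq, stops=STANDARD_STOPS, start="ATG", min_codons=2):
    """Return (strand, frame, begin, end, orf) for ORFs in all six frames.

    Coordinates are 0-based on the strand being read; `end` is exclusive
    and includes the stop codon. `min_codons` counts codons before the stop.
    """
    seq = seq.upper()
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            begin = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if begin is None and codon == start:
                    begin = i  # first start codon opens the ORF
                elif begin is not None and codon in stops:
                    if (i - begin) // 3 >= min_codons:
                        orfs.append((strand, frame, begin, i + 3, s[begin:i + 3]))
                    begin = None  # look for the next ORF in this frame
    return orfs


for orf in find_orfs("CCATGAAATAGGG"):
    print(orf)
```

Passing `stops=VERT_MITO_STOPS` reads the same sequence under the vertebrate mitochondrial code, where TGA no longer terminates an ORF.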
15
e-PCR
Electronic polymerase chain reaction
Searches for sequence-tagged sites (STSs)
The whole template DNA is searched for STSs
New database search: a query sequence is searched against a sequence database
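An STS is defined by a primer pair and an expected product size, so the search can be pictured as finding both primers in the correct orientation and at a plausible distance from each other. The sketch below is a simplified illustration that assumes exact primer matches on the forward strand only; the real e-PCR program tolerates mismatches and gaps. The function name `find_sts` and its `margin` parameter are hypothetical.

```python
def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_sts(template, fwd_primer, rev_primer, expected_size, margin=50):
    """Return (start, end, size) for each in-silico PCR product.

    A hit needs the forward primer, then the reverse complement of the
    reverse primer downstream, with a product size within `margin` bases
    of `expected_size`. Coordinates are 0-based; `end` is exclusive.
    """
    template = template.upper()
    fwd = fwd_primer.upper()
    rev_rc = reverse_complement(rev_primer.upper())
    hits = []
    start = template.find(fwd)
    while start != -1:
        pos = template.find(rev_rc, start + len(fwd))
        while pos != -1:
            size = pos + len(rev_rc) - start
            if abs(size - expected_size) <= margin:
                hits.append((start, pos + len(rev_rc), size))
            pos = template.find(rev_rc, pos + 1)
        start = template.find(fwd, start + 1)
    return hits


# Toy template: forward primer site, 20 bases of filler, reverse primer site.
template = "GGG" + "ACGTAC" + "T" * 20 + "GATTCA" + "CCC"
print(find_sts(template, "ACGTAC", "TGAATC", expected_size=32, margin=0))
```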
16
Spidey
Another mRNA-to-genome alignment tool
Searches databases via BLAST
Takes as input a single genomic sequence and a set of mRNA sequences in FASTA format
Pseudogenes and paralogs are eliminated in this search, and the true gene is selected.
17
Databases of NCBI
Nucleotide
Literature
Protein
Gene expression
Structure
Chemical
18
Nucleotide database - GenBank
NCBI's primary sequence database
Comprehensive public database of nucleotide sequences
Bibliographic support
Built from authors' entries submitted to GenBank, including ESTs
GenBank, EMBL and DDBJ make up the INSDC (International Nucleotide Sequence Database Collaboration)
Collaborative approach to share data daily
19
HomoloGene
Automated detection of homologs
Among the genes of completely sequenced eukaryotic genomes
Analyzes the proteins of the input organisms with blastp
Taxonomic trees are built
Statistical analysis of each match is done, and orthologs and paralogs are identified
20
dbSNP
Database of single nucleotide polymorphisms
Also holds short deletion and insertion polymorphisms
SNPs can be mapped onto 3D structures via Cn3D and MMDB
Functional variants can be matched with OMIM
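The difference between a single nucleotide polymorphism and a short insertion or deletion can be shown by comparing a reference allele with an alternative allele. The sketch below is an illustration only and does not follow dbSNP's own classification rules; the function name `classify_variant` and the use of "-" for an absent allele are assumptions.

```python
def classify_variant(ref, alt):
    """Classify a ref/alt allele pair as SNP, insertion, deletion or other.

    "-" stands for an absent allele. Insertions and deletions are recognised
    when one allele is a prefix of the other (a shared anchor base is allowed).
    """
    ref = ref.replace("-", "").upper()
    alt = alt.replace("-", "").upper()
    if len(ref) == 1 and len(alt) == 1 and ref != alt:
        return "SNP"
    if len(alt) > len(ref) and alt.startswith(ref):
        return "insertion"
    if len(ref) > len(alt) and ref.startswith(alt):
        return "deletion"
    return "other"


for ref, alt in [("A", "G"), ("-", "AT"), ("CAT", "C")]:
    print(ref, alt, classify_variant(ref, alt))
```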
21
Literature database - PMC
PubMed Central
Digital archive of peer-reviewed life sciences journals
Holds a very large number of full-text journal articles
Full text is available immediately or within 12 months of publication
22
Protein database
Entrez Protein: protein sequence database of NCBI
Cross-searches databases such as PDB and Swiss-Prot
Taxonomic relations
CDD: Conserved Domain Database
23
Gene expression database
Distribution and regulation of transcriptional products
Normal and abnormal cell types
Many techniques have been developed for surveying genome-wide transcript expression
24
SAGEmap
Serial Analysis of Gene Expression map
Gene expression data analysis
Tag-to-gene function: maps SAGE tags to gene clusters or a single gene
A reciprocal gene-to-tag map is also available
Updated weekly
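The tag-to-gene and gene-to-tag maps can be pictured as two dictionaries built from the same set of transcripts. The sketch below assumes the classic SAGE convention of a 10-base tag taken immediately after the 3'-most CATG (NlaIII) site. It is an illustration only, not the SAGEmap pipeline, and the names `extract_tag` and `build_sage_maps` are hypothetical.

```python
from collections import defaultdict

ANCHOR = "CATG"   # NlaIII site, the anchoring enzyme site assumed here
TAG_LENGTH = 10   # classic short SAGE tag length assumed here


def extract_tag(transcript):
    """Return the 10-base tag after the 3'-most CATG, or None if absent."""
    seq = transcript.upper()
    pos = seq.rfind(ANCHOR)
    if pos == -1:
        return None
    tag = seq[pos + len(ANCHOR):pos + len(ANCHOR) + TAG_LENGTH]
    return tag if len(tag) == TAG_LENGTH else None


def build_sage_maps(transcripts):
    """Build tag -> genes and gene -> tag maps from {gene: sequence}."""
    tag_to_genes = defaultdict(list)
    gene_to_tag = {}
    for gene, seq in transcripts.items():
        tag = extract_tag(seq)
        if tag is not None:
            tag_to_genes[tag].append(gene)
            gene_to_tag[gene] = tag
    return dict(tag_to_genes), gene_to_tag


# Toy transcripts: two genes share one tag, a third has no CATG site.
toy = {
    "geneA": "AAACATG" + "C" * 10 + "TTT",
    "geneB": "GGCATGAAAACATG" + "C" * 10 + "AA",
    "geneC": "TTTT",
}
tag_to_genes, gene_to_tag = build_sage_maps(toy)
print(tag_to_genes)
print(gene_to_tag)
```

In the toy data one tag maps to two genes, which mirrors the case of a tag that resolves to a gene cluster and not to a single gene.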
25
Structure database - MMDB
Molecular Modeling Database (MMDB)
3D macromolecular structures
X-ray diffraction (XRD) and NMR are used for experimental structure determination
Evolutionary history of function
Relationships between macromolecules
29
DATABASES
30
Chemical database - PubChem
Database of chemical molecules
Freely accessible through a web user interface
Chemical structures
Diagnostic and therapeutic agents
Molecular mass below 2000 u
Bridge between macromolecular genomics and the small organic molecules of cellular metabolism
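To make the mass limit concrete, the sketch below computes the molecular mass of aspirin (C9H8O4) from rounded standard atomic weights and checks it against the 2000 u figure quoted on the slide. The helper `molecular_mass` is hypothetical and handles only simple formulas without brackets or charges; the short atomic weight table is an assumption limited to a few common elements.

```python
import re

# Rounded standard atomic weights (u) for a few common elements.
ATOMIC_MASS = {
    "H": 1.008, "C": 12.011, "N": 14.007,
    "O": 15.999, "P": 30.974, "S": 32.06,
}

MASS_LIMIT = 2000.0  # upper bound quoted on the slide, in u


def molecular_mass(formula):
    """Sum atomic masses for a simple formula such as 'C9H8O4'."""
    total = 0.0
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        total += ATOMIC_MASS[element] * (int(count) if count else 1)
    return total


aspirin = molecular_mass("C9H8O4")
print(round(aspirin, 2), aspirin < MASS_LIMIT)
```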
32
Display settings
33
Aspirin
35
Thanks