nucleic acid databases[1]

Upload: piscessanjeev

Post on 03-Jun-2018


  • 8/11/2019 Nucleic Acid Databases[1]

    1/37

    8/27/2014 5:03 AM

    Introduction to Bioinformatics databases: Nucleic Acid Databases

    Dinesh Gupta

    ICGEB


    Biological databases: why?

    The need to store and communicate large datasets has grown.

    To make biological data available to scientists.

    To make biological data available in computer-readable form.


    Different classifications of databases

    Type of data:

      nucleotide sequences

      protein sequences

      protein sequence patterns or motifs

      macromolecular 3D structure

      gene expression data

      metabolic pathways


    Different classifications of databases

    Primary or derived databases:

      Primary databases: experimental results are entered directly into the database

      Secondary databases: results of analysis of primary databases

      Aggregates of many databases:

        Links to other data items

        Combination of data

        Consolidation of data


    Different classifications of databases

    Technical design:

      Flat files

      Relational database (SQL)

      Exchange/publication technologies (FTP, HTML, CORBA, XML, ...)
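
To make the difference between a flat file and a relational database concrete, here is a minimal sketch in Python. The record layout is a hypothetical, heavily simplified EMBL-style format (two-letter line codes, records ended by `//`), and the table name, column names, accessions and sequences are made up for illustration; real entries carry many more fields.

```python
import sqlite3

# Hypothetical, simplified EMBL-style flat file: two-letter line codes,
# each record terminated by "//". Accessions and sequences are invented.
FLAT_FILE = """\
ID   X00001
DE   Example gene A
SQ   ATGGCCTAA
//
ID   X00002
DE   Example gene B
SQ   ATGAAATGA
//
"""


def parse_flat_file(text):
    """Yield one dict per record, keyed by the two-letter line code."""
    record = {}
    for line in text.splitlines():
        if line.startswith("//"):
            if record:
                yield record
            record = {}
        elif line.strip():
            # Columns 0-1 hold the line code, the value starts at column 5.
            record[line[:2]] = line[5:].strip()


def load_into_sqlite(records):
    """Load parsed records into an in-memory relational table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE entry ("
        "accession TEXT PRIMARY KEY, description TEXT, sequence TEXT)"
    )
    conn.executemany(
        "INSERT INTO entry VALUES (?, ?, ?)",
        [(r["ID"], r["DE"], r["SQ"]) for r in records],
    )
    conn.commit()
    return conn


conn = load_into_sqlite(parse_flat_file(FLAT_FILE))
# Once the data is relational, a query is a single SQL statement.
rows = conn.execute(
    "SELECT accession, length(sequence) FROM entry WHERE description LIKE ?",
    ("%gene B%",),
).fetchall()
print(rows)
```

The flat file is easy to exchange and read, but every question requires re-parsing the text; the relational form trades that simplicity for indexed, declarative queries.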


    Different classifications of databases

    Availability:

      Publicly available, no restrictions

      Available, but with copyright

      Accessible, but not downloadable

      Academic, but not freely available

      Proprietary, commercial; possibly free for academics


    Where do I find a database of interest?


    http://www3.oup.co.uk/nar/database/c/


    Nucleotide sequence databases

    EMBL, GenBank, and DDBJ are the three primary nucleotide sequence databases:

      EMBL: www.ebi.ac.uk/embl/

      GenBank: www.ncbi.nlm.nih.gov/Genbank/

      DDBJ: www.ddbj.nig.ac.jp


    GenBank

    An annotated collection of all publicly available nucleotide and protein sequences.

    Set up in 1979 at LANL (Los Alamos).

    Maintained since 1992 by NCBI (Bethesda).

    http://www.ncbi.nlm.nih.gov


    EMBL Nucleotide Sequence Database

    An annotated collection of all publicly available nucleotide and protein sequences.

    Created in 1980 at the European Molecular Biology Laboratory in Heidelberg.

    Maintained since 1994 by EBI, Cambridge.

    http://www.ebi.ac.uk/embl.html


    http://www3.ebi.ac.uk/Services/DBStats/


    DDBJ: DNA Data Bank of Japan

    An annotated collection of all publicly available nucleotide and protein sequences.

    Started in 1984 at the National Institute of Genetics (NIG) in Mishima.

    Still maintained at this institute by a team led by Takashi Gojobori.

    http://www.ddbj.nig.ac.jp


    Other NCBI nucleic acid DBs

    EST database: A collection of expressed sequence tags, or short, single-pass sequence reads from mRNA (cDNA).
    http://www.ncbi.nlm.nih.gov/dbEST/index.html

    GSS database: A database of genome survey sequences, or short, single-pass genomic sequences.
    http://www.ncbi.nlm.nih.gov/dbGSS/index.html

    HomoloGene: A gene homology tool that compares nucleotide sequences between pairs of organisms in order to identify putative orthologs.
    http://www.ncbi.nlm.nih.gov/HomoloGene/

    HTG database: A collection of high-throughput genome sequences from large-scale genome sequencing centers, including unfinished and finished sequences.
    http://www.ncbi.nlm.nih.gov/HTGS/

    SNPs database: A central repository for both single-base nucleotide substitutions and short deletion and insertion polymorphisms.
    http://www.ncbi.nlm.nih.gov/SNP/

    RefSeq: A database of non-redundant reference sequence standards, including genomic DNA contigs, mRNAs, and proteins for known genes. Multiple collaborations, both within NCBI and with external groups, support data-gathering efforts.
    http://www.ncbi.nlm.nih.gov/RefSeq/

    STS database: A database of sequence tagged sites, or short sequences that are operationally unique in the genome.
    http://www.ncbi.nlm.nih.gov/dbSTS/index.html

    UniSTS: A unified, non-redundant view of sequence tagged sites (STSs).
    http://www.ncbi.nlm.nih.gov/genome/sts/

    UniGene: A collection of ESTs and full-length mRNA sequences organized into clusters, each representing a unique known or putative human gene, annotated with mapping and expression information and cross-references to other sources.
    http://www.ncbi.nlm.nih.gov/UniGene/

    Sequence submission

    Data come mainly from direct submissions by the authors.

    Submissions are made through the Internet: web forms or email.

    Sequences are exchanged between the three centers on a daily basis, so the sequence content of the banks is identical.
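
One way to picture what "identical content" means in practice is to compare the sets of accession numbers held by each bank. The sketch below is only an illustration: the accession numbers are invented, and the function name `missing_accessions` is hypothetical, not part of any database's tooling.

```python
def missing_accessions(banks):
    """Map each bank name to the accessions held elsewhere but absent from it."""
    union = set().union(*banks.values())
    return {name: sorted(union - accessions) for name, accessions in banks.items()}


# Invented accession sets; in this toy snapshot DDBJ has not yet
# received one entry through the daily exchange.
banks = {
    "EMBL": {"X00001", "X00002", "X00003"},
    "GenBank": {"X00001", "X00002", "X00003"},
    "DDBJ": {"X00001", "X00002"},
}

gaps = missing_accessions(banks)
print(gaps)
```

If the daily exchange has fully propagated, every list in the result is empty; a non-empty list points at entries that one bank has not yet received.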


    Derived databases

    CUTG: Codon usage tabulated from GenBank
    http://www.kazusa.or.jp/codon/

    Genetic Codes: Deviations from the standard genetic code in various organisms and organelles
    http://www.ncbi.nlm.nih.gov/Taxonomy/Utils/wprintgc.cgi?mode=c

    TIGR Gene Indices: Organism-specific databases of EST and gene sequences
    http://www.tigr.org/tdb/tgi.shtml

    UniGene: Unified clusters of ESTs and full-length mRNA sequences
    http://www.ncbi.nlm.nih.gov/UniGene/

    ASAP: Alternatively spliced isoforms
    http://www.bioinformatics.ucla.edu/ASAP

    Intronerator: Introns and alternative splicing in C. elegans and C. briggsae
    http://www.cse.ucsc.edu/~kent/intronerator/
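
As an illustration of how a derived database such as a codon usage table can be computed from primary sequence data, the following Python sketch counts codons in a single coding sequence. It is not the procedure used by CUTG; the sequence is made up, the function names are hypothetical, and the reading frame is assumed to start at the first base.

```python
from collections import Counter


def codon_usage(cds):
    """Count codons in a coding sequence, read in frame from position 0."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    return Counter(cds[i:i + 3] for i in range(0, usable, 3))


def per_thousand(counts):
    """Convert raw codon counts to frequencies per thousand codons."""
    total = sum(counts.values())
    return {codon: 1000 * n / total for codon, n in counts.items()}


cds = "ATGGCTGCTGCCAAATAA"  # invented sequence: ATG GCT GCT GCC AAA TAA
counts = codon_usage(cds)
frequencies = per_thousand(counts)
print(counts.most_common(1))
```

A real codon usage table would sum such counts over all annotated coding sequences of an organism, which is why it depends on the annotation quality of the primary database.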


    Nucleic acid structure databases

    NDB: Nucleic acid-containing structures
    http://ndbserver.rutgers.edu/

    NTDB: Thermodynamic data for nucleic acids
    http://ntdb.chem.cuhk.edu.hk/

    RNABase: RNA-containing structures from PDB and NDB
    http://www.rnabase.org/

    SCOR: Structural classification of RNA: RNA motifs by structure, function and tertiary interactions
    http://scor.lbl.gov/


    Database searching tips

    Look for links to Help or Examples

    Try Boolean searches

    Be careful with UK/US spelling differences:

      leukaemia vs leukemia

      haemoglobin vs hemoglobin

      colour vs color
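
The Boolean and spelling tips above can be combined: a query term with a known UK/US variant is expanded into an OR group before the terms are joined. This is a small Python sketch under stated assumptions: the helper names are hypothetical, the variant table contains only the three pairs listed on the slide, and the exact operator syntax accepted differs from one database to another.

```python
# UK/US spelling pairs taken from the tips above.
SPELLING_VARIANTS = {
    "leukaemia": "leukemia",
    "haemoglobin": "hemoglobin",
    "colour": "color",
}


def expand_term(term):
    """Return an OR group covering both spellings when a variant is known."""
    lower = term.lower()
    for uk, us in SPELLING_VARIANTS.items():
        if lower in (uk, us):
            return f"({uk} OR {us})"
    return term


def boolean_query(terms, operator="AND"):
    """Join the (possibly expanded) terms with a Boolean operator."""
    return f" {operator} ".join(expand_term(t) for t in terms)


query = boolean_query(["hemoglobin", "human"])
print(query)  # (haemoglobin OR hemoglobin) AND human
```

Check the Help page of the database being searched to confirm which operators and grouping characters it supports.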


    Exercises

    Study the statistics of the three primary nucleic acid databases: do they match?

    Look for a gene of your interest in the three primary nucleic acid databases and compare the information given in each of them.

    Read the NAR DB paper and the NAR DB index site: search for different nucleic acid databases using different search terms.

    Self study:

    http://www3.oup.co.uk/nar/database/c/

    Download the NAR database paper (NARDB2004) from: ftp://cbag.sc.mahidol.ac.th/pub/Course_Materials/dinesh
