TRANSCRIPT
UniProtKB/Swiss-Prot is a central hub for biological data: over 120 databases are cross-referenced (EMBL/DDBJ/GenBank, PDB, 2D-PAGE, OMIM, TAIR, FlyBase, InterPro, PROSITE, etc.).
In order to avoid redundancy and improve sequence reliability, all protein sequences encoded by a given gene are merged into a single entry (on average: 1 human entry -> more than 6 cross-references to EMBL). Differences found between merged entries are documented. Evidence of protein existence is provided.
Our main sources of data are publications (~1’900 journals cited), external scientific expertise and high-performance bioinformatics tools.
Swiss-Prot (55.5, June 2008): 389’046 entries / 11’419 species
Bacteria/Archaea: 777 proteomes
Homo sapiens: 19’804 entries
Other mammals: 42’674 entries
Plants: 22’919 entries
Viruses: 12’283 entries
TrEMBL (38.5, June 2008): 5’906’286 entries / 165’662 species
Swiss-Prot + TrEMBL give access to all publicly available protein sequences. Once in Swiss-Prot, an entry is no longer in TrEMBL.
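Because an entry sits in either Swiss-Prot or TrEMBL but never in both, the two release counts quoted in this transcript can simply be added. The short sketch below does that arithmetic; the function and variable names are illustrative only and are not part of any UniProt tool.

```python
# Release figures quoted in this transcript (June 2008).
SWISS_PROT_ENTRIES = 389_046   # UniProtKB/Swiss-Prot, release 55.5
TREMBL_ENTRIES = 5_906_286     # UniProtKB/TrEMBL, release 38.5


def uniprotkb_summary(reviewed, unreviewed):
    """Return (total entries, reviewed fraction).

    Assumes the two sections are disjoint, as stated above:
    once an entry is in Swiss-Prot it is no longer in TrEMBL.
    """
    total = reviewed + unreviewed
    return total, reviewed / total


total, reviewed_fraction = uniprotkb_summary(SWISS_PROT_ENTRIES, TREMBL_ENTRIES)
print(f"UniProtKB total: {total:,} entries")
print(f"Reviewed (Swiss-Prot) share: {reviewed_fraction:.1%}")
```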
Highlights of a UniProtKB/Swiss-Prot entry in the UniProt view format
UniProtKB/Swiss-Prot is the manually annotated section of the UniProt knowledgebase. Manual annotation consists of a critical review of experimentally proven or predicted data about each protein, including the protein sequence. Data are continuously updated by an expert team of biologists.
A special emphasis is laid on the annotation of biological events which generate protein diversity but are not always predictable at the genomic level. Alternative products (alternative splicing, RNA editing…) and post-translational modifications are extensively annotated. In mammals, polymorphisms (SAPs) and strain differences are also integrated.
GenBank/DDBJ/EMBL, Ensembl and other protein resources
UniProt Knowledgebase (UniProtKB)
Annotation priorities: complete microbial proteomes, plastid-encoded proteins, human and mammalian orthologous proteins, plant proteins (A. thaliana and rice), fungal proteomes, proteomes of representative subsets of virus strains, toxins and anti-microbial peptides, Drosophila, Zebrafish, Xenopus, and C. elegans proteomes…
UniProtKB/Swiss-Prot - the manually annotated section of the UniProt Knowledgebase - provides a link between protein sequences and state-of-the-art knowledge
www.uniprot.org
We need your feedback!
[email protected]@uniprot.org
UniProtKB/Swiss-Prot provides a link between protein sequences and state-of-the-art knowledge
UniProt Consortium: Swiss Institute of Bioinformatics, European Bioinformatics Institute, Protein Information Resource
www.uniprot.org
UniProtKB/TrEMBL: Unreviewed protein sequences
Automatic annotation
UniProtKB/Swiss-Prot: Reviewed protein sequences
Manual annotation: sequence accuracy, no redundancy, high-quality annotation, numerous cross-references
…
UniRef
UniParc
UniProt Knowledgebase
UniParc gives access to archived protein sequences found in publicly accessible databases (UniProtKB, PIR, EMBL, Ensembl, IPI, PDB, RefSeq, FlyBase, WormBase, Patent Offices…).
UniParc allows the tracking of a protein sequence and its integration into various databases.
One UniRef100 entry groups identical sequences (including fragments).
One UniRef90 entry groups sequences that are at least 90% identical -> database size reduction of ~40%.
One UniRef50 entry groups sequences that are at least 50% identical -> database size reduction of ~65%.
Clustering across species.
Three collections of sequence clusters (UniRef100, UniRef90,
UniRef50) based on UniProtKB and selected UniParc records
UniRef is useful for comprehensive BLAST similarity searches, as it provides sets of representative sequences.
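To make the idea of identity-threshold clusters concrete, here is a toy greedy clustering sketch. It is not the procedure used to build UniRef: the `identity` function is a simplified, hypothetical measure that only compares equal-length, ungapped sequences position by position, the sequences are made up, and fragments are not merged into longer sequences as they are in UniRef100.

```python
def identity(a, b):
    """Fraction of matching positions (toy measure, equal-length sequences only)."""
    if len(a) != len(b) or not a:
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / len(a)


def greedy_cluster(sequences, threshold):
    """Put each sequence in the first cluster whose representative
    (its first member) is at least `threshold` identical to it."""
    clusters = []
    for seq in sequences:
        for members in clusters:
            if identity(members[0], seq) >= threshold:
                members.append(seq)
                break
        else:
            # No existing representative is similar enough: start a new cluster.
            clusters.append([seq])
    return clusters


def size_reduction(n_sequences, n_clusters):
    """Relative reduction in size when each cluster is kept as one entry."""
    return 1 - n_clusters / n_sequences


# Made-up sequences for illustration only.
toy = ["MKTAYIAKQR", "MKTAYIAKQR", "MKTAYIAKQK", "MKTAYLLKQK", "GSHMLEDPVD"]
for threshold in (1.0, 0.9, 0.5):
    clusters = greedy_cluster(toy, threshold)
    reduction = size_reduction(len(toy), len(clusters))
    print(f"threshold {threshold}: {len(clusters)} clusters, reduction {reduction:.0%}")
```

Lowering the threshold merges more sequences into each cluster, which is why the 50% level gives a larger size reduction than the 90% level.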
Use with caution: also contains pseudogenes, incorrect CDS predictions,
etc.
Gives access to publicly available protein sequences with a maximum of biological information.
UniProtKB is composed of two sections: UniProtKB/TrEMBL and UniProtKB/Swiss-Prot
UniProtKB/TrEMBL: Unreviewed protein sequences
- Computer annotated entries - 5’906’286 entries (Rel. 38.5, June 2008)
Available protein sequences are automatically integrated into TrEMBL with:
Merge of 100% identical sequences derived from the same organism;
Protein family and domain attribution (InterPro);
Automated annotation.
UniProtKB/Swiss-Prot: Reviewed protein sequences
- Manually annotated entries - 389’046 entries (Rel. 55.5, June 2008)
TrEMBL sequences are manually integrated into Swiss-Prot. This process involves:
Merge of all variant sequences derived from the same gene in a single species (polymorphisms, alternative splicing, RNA editing, etc.): low redundancy and high accuracy of the protein sequence;
Integration of biological and medical data derived from publications, external expertise and high-performance bioinformatics tools: high-quality manual annotation;
Addition of cross-references to relevant databases (links to about 100 databases are available): central hub for biological data.
UniProt: The Universal Protein Resource
One UniParc entry groups identical sequences
across species.
Each entry contains a protein sequence,
taxonomic data and cross-references to source
databases.
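The grouping described here, one archive entry per distinct sequence regardless of species, with cross-references back to the source databases, can be sketched with a dictionary keyed on the sequence. This is an illustration under stated assumptions, not UniParc's implementation: the record layout, the accessions (`ACC_0001`, …) and the species names are hypothetical placeholders.

```python
from collections import defaultdict


def build_archive(records):
    """Group (database, accession, organism, sequence) records by exact sequence.

    Each distinct sequence becomes one archive entry holding the
    cross-references and taxonomic data of every record that carries it.
    """
    archive = defaultdict(list)
    for database, accession, organism, sequence in records:
        archive[sequence].append((database, accession, organism))
    return dict(archive)


# Hypothetical records: accessions, species and sequences are made up.
records = [
    ("UniProtKB", "ACC_0001", "Species A", "MKTAYIAKQR"),
    ("EMBL", "ACC_0002", "Species B", "MKTAYIAKQR"),   # same sequence, other species
    ("RefSeq", "ACC_0003", "Species A", "GSHMLEDPVD"),
]

archive = build_archive(records)
for sequence, xrefs in archive.items():
    print(sequence, "->", xrefs)
```

Keying on the sequence alone is what lets identical sequences from different species share one entry, while the stored tuples keep track of where each copy came from.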
Swiss Institute of Bioinformatics (SIB)
European Bioinformatics Institute (EMBL-EBI)
Protein Information Resource (PIR)
UniProt is mainly supported by the National Institutes of Health (NIH) grant 2 U01 HG02712-04. Additional support for the EBI's involvement in UniProt comes from the European Commission (EC)'s FELICS grant (021902RII3) and from the NIH grant 1R01HGO2273-01. UniProtKB/Swiss-Prot activities at the SIB are supported by the Swiss Federal Government through the Federal Office of Education and Science. PIR activities are also supported by the NIH grants and contracts HHSN266200400061C, NCI-caBIG, and 1R01GM080646-01, and the National Science Foundation (NSF) grant IIS-0430743.
UniMES: UniProt Metagenomic and Environmental Sequences
Currently the database contains only data from the Global Ocean Sampling Expedition (GOS). UniMES is released in FASTA format together with a file of UniMES matches to InterPro methods.
The UniProt Consortium
The mission of UniProt is to provide the scientific community with a comprehensive, high-quality and freely accessible resource of protein sequence and functional information.
UniProt provides four databases, each optimized for different uses: UniProtKB, UniRef, UniParc and UniMES.
UniProt is produced by SIB, EBI and PIR.
UniMES: Metagenomic
UniParc: Sequence archive
EMBL/GenBank/DDBJ, Ensembl, VEGA, RefSeq, other protein resources
UniRef: Sequence clusters
Expert manual annotation
UniProtKB/TrEMBL: Unreviewed
Automated annotation
UniProtKB/Swiss-Prot: Reviewed
UniProtKB: Protein sequence knowledgebase
Contact: [email protected]@uniprot.org
Web site: www.uniprot.org