A Proteomics Toolkit:
UniProt, InterPro and IntAct Databases at the EBI
![Page 2: A Proteomics Toolkit:](https://reader035.vdocument.in/reader035/viewer/2022062322/568146c2550346895db3fc97/html5/thumbnails/2.jpg)
Hinxton, U.K.
![Page 3: A Proteomics Toolkit:](https://reader035.vdocument.in/reader035/viewer/2022062322/568146c2550346895db3fc97/html5/thumbnails/3.jpg)
European Bioinformatics Institute (http://www.ebi.ac.uk/)
Created as part of the EMBL in 1992
• To house the EMBL Nucleotide Sequence Data Library, established in 1980
Today, 3 databases accept primary nucleotide data:
• EMBL: EBI (EMBL)
• GenBank: NCBI (NIH)
• DDBJ: CIB (NIG)
![Page 4: A Proteomics Toolkit:](https://reader035.vdocument.in/reader035/viewer/2022062322/568146c2550346895db3fc97/html5/thumbnails/4.jpg)
European Bioinformatics Institute (http://www.ebi.ac.uk/)
EMBL-EBI maintains the world’s most comprehensive range of molecular databases
![Page 5: A Proteomics Toolkit:](https://reader035.vdocument.in/reader035/viewer/2022062322/568146c2550346895db3fc97/html5/thumbnails/5.jpg)
Nucleotide Sequence Database
Database of Protein Families and Domains
ArrayExpress
Alternative Splicing Database
Protein Sequence Database
Molecular Structure Database
Alternative Transcript Diversity
Automatic Annotation of Genomes
Protein Interaction Database
Chemical Entities of Biological Interest
Gene Ontology
Enzyme Database
Database of Biological Processes
![Page 6: A Proteomics Toolkit:](https://reader035.vdocument.in/reader035/viewer/2022062322/568146c2550346895db3fc97/html5/thumbnails/6.jpg)
http://www.ebi.ac.uk/services/
![Page 7: A Proteomics Toolkit:](https://reader035.vdocument.in/reader035/viewer/2022062322/568146c2550346895db3fc97/html5/thumbnails/7.jpg)
Roles of Public Domain Databases
To provide stable, long-term sources of basic information
To respond over the long term to the needs of the community
To act as repositories for published information
To bridge the gap between multiple data sources
![Page 8: A Proteomics Toolkit:](https://reader035.vdocument.in/reader035/viewer/2022062322/568146c2550346895db3fc97/html5/thumbnails/8.jpg)
Protein Databases
UniProt: Database of Protein Sequences
InterPro: Database of Protein Families and Domains
IntAct: Database of Protein Interactions
![Page 9: A Proteomics Toolkit:](https://reader035.vdocument.in/reader035/viewer/2022062322/568146c2550346895db3fc97/html5/thumbnails/9.jpg)
UniProt
World's most comprehensive catalogue of information on proteins
Funded mainly by NIH
A central repository of protein sequence and function
Based on the original work of PIR, Swiss-Prot and TrEMBL
![Page 10: A Proteomics Toolkit:](https://reader035.vdocument.in/reader035/viewer/2022062322/568146c2550346895db3fc97/html5/thumbnails/10.jpg)
protein sequencing
Met-Gln-Pro-Glu-Glu-Gly-Thr-Gly-Trp-Leu-Leu-Glu-Val-Gln-Gln-
Met-Gly-Arg-Gly-Arg-Cys-Val-Gly-Pro-Ser-Leu-Gln-Glu-Trp-Arg-
annotation
Swiss-Prot
nucleotide sequencing
CGCGCCTGTACGCTGAACGCTCGTGACGTGTAGTGCGCG
CGCTGTGATAGCGCTGATCGTGATGCGTATGCAGGTCGT
EMBL
![Page 11: A Proteomics Toolkit:](https://reader035.vdocument.in/reader035/viewer/2022062322/568146c2550346895db3fc97/html5/thumbnails/11.jpg)
nucleotide sequencing
CGCGCCTGTACGCTGAACGCTCGTGACGTGTAGTGCGCG
CGCTGTGATAGCGCTGATCGTGATGCGTATGCAGGTCGT
EMBL
translated EMBL
TrEMBL
annotation
Swiss-Prot
PIR
annotation
PSD
EBI
Swiss-Prot + TrEMBL + PSD: UniProt
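The "translated EMBL" step can be illustrated with a minimal Python sketch. It assumes the standard genetic code, and the 15-nucleotide coding sequence is a hypothetical example chosen to encode the first five residues of the peptide shown earlier (Met-Gln-Pro-Glu-Glu). It does not model how TrEMBL is actually produced from EMBL entries.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, ...).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(cds):
    """Translate a coding sequence into one-letter protein code.

    Translation stops at the first stop codon; trailing bases that do not
    fill a complete codon are ignored.
    """
    cds = cds.upper().replace("U", "T")
    protein = []
    for i in range(0, len(cds) - len(cds) % 3, 3):
        aa = CODON_TABLE[cds[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


# Hypothetical coding sequence for Met-Gln-Pro-Glu-Glu.
example_cds = "ATGCAGCCGGAAGAA"
print(translate(example_cds))
```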
![Page 12: A Proteomics Toolkit:](https://reader035.vdocument.in/reader035/viewer/2022062322/568146c2550346895db3fc97/html5/thumbnails/12.jpg)
UniProt Consortium
![Page 13: A Proteomics Toolkit:](https://reader035.vdocument.in/reader035/viewer/2022062322/568146c2550346895db3fc97/html5/thumbnails/13.jpg)
UniProt
3 Components:
UniProt Knowledgebase (UniProt)
• Central repository for annotated protein sequences
• Swiss-Prot: non-redundant, manually annotated
• TrEMBL: redundant, automatically annotated
UniProt Reference Clusters (UniRef)
• Combines related sequences to speed up searching
• UniRef100, UniRef90, UniRef50
UniProt Archive (UniParc)
• Comprehensive repository for history of sequences
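The idea behind the UniRef100, UniRef90 and UniRef50 levels (grouping sequences at 100%, 90% and 50% identity) can be sketched as a toy greedy clustering. This is only an illustration: the sequences are hypothetical, identity is computed position by position on equal-length strings rather than from real alignments, and the actual UniRef procedure is not described in these slides.

```python
def identity(a, b):
    """Fraction of identical positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("this toy identity needs equal-length sequences")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def greedy_cluster(sequences, threshold):
    """Assign each sequence to the first cluster whose representative
    (its first member) is at least `threshold` identical; otherwise
    start a new cluster."""
    clusters = []
    for seq in sequences:
        for cluster in clusters:
            if identity(cluster[0], seq) >= threshold:
                cluster.append(seq)
                break
        else:
            clusters.append([seq])
    return clusters


# Hypothetical 10-residue sequences, for illustration only.
toy_sequences = [
    "MKTAYIAKQR",
    "MKTAYIAKQR",  # identical to the first
    "MKTAYIAKQK",  # 90% identical to the first
    "MKTAYLGKHK",  # 60% identical to the first
    "GSHWWPLNDE",  # unrelated
]

for level in (1.0, 0.9, 0.5):
    print(level, len(greedy_cluster(toy_sequences, level)))
```

Lower thresholds give fewer, larger clusters, which is why searching a clustered set is faster than searching every sequence.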
![Page 19: A Proteomics Toolkit:](https://reader035.vdocument.in/reader035/viewer/2022062322/568146c2550346895db3fc97/html5/thumbnails/19.jpg)
![Page 20: A Proteomics Toolkit:](https://reader035.vdocument.in/reader035/viewer/2022062322/568146c2550346895db3fc97/html5/thumbnails/20.jpg)
UniProt Explicit Links
Databases cross-referenced in UniProt
Sequence: EMBL/GenBank/DDBJ, PIR
PTM: GlycoSuiteDB, PhosSite
Structure: HSSP, PDB, MSD
Domains, Sites, Families: Gene3D, HAMAP, InterPro, PANTHER, Pfam, PIRSF, PRINTS, ProDom, PROSITE, SMART, TIGRFAM
2D-gel Electrophoresis: ANU-2DPAGE, Aarhus/Ghent-2DPAGE, COMPLUYEAST-2DPAGE, ECO2DPAGE, HSC-2DPAGE, MAIZE-2DPAGE, OGP, PHCI-2DPAGE, PMMA-2DPAGE, Rat-heart-2DPAGE, Siena-2DPAGE, SWISS-2DPAGE
Molecular Interaction: IntAct, TRANSFAC
Miscellaneous: Ensembl, GermOnline, Gene Ontology, MEROPS
Organism-Specific: AGD, dbSNP, DictyBase, EcoGene, EchoBASE, FlyBase, GeneDB_Spombe, GeneFarm, Genew, Gramene, HIV, H-InvDB, LegioList, Leproma, ListiList, MaizeDB, MGD, MypuList, OMIM, PhotoList, Reactome, RGD, SagaList, SGD, StyGene, SubtiList, TAIR, TIGR, TubercuList, WormBase, WormPep, ZFIN
![Page 21: A Proteomics Toolkit:](https://reader035.vdocument.in/reader035/viewer/2022062322/568146c2550346895db3fc97/html5/thumbnails/21.jpg)
http://www.ebi.ac.uk/services/
![Page 22: A Proteomics Toolkit:](https://reader035.vdocument.in/reader035/viewer/2022062322/568146c2550346895db3fc97/html5/thumbnails/22.jpg)
Searching UniProt
http://www.ebi.uniprot.org/index.shtml
Search tools include:
• Text Search
• Power Search
• Blast, Fasta and MPsrch
• Links to extra search services (including SRS)
![Page 23: A Proteomics Toolkit:](https://reader035.vdocument.in/reader035/viewer/2022062322/568146c2550346895db3fc97/html5/thumbnails/23.jpg)
http://www.ebi.uniprot.org/index.shtml
• Text-based searching
• Logical operators ‘&’ (and), ‘|’ (or)
• (Wildcards and numerical operators not allowed)
• Text Search: keyword queries
• Power Search: can search for specific entry lines
• Warehouse Search: link query to other databases
![Page 24: A Proteomics Toolkit:](https://reader035.vdocument.in/reader035/viewer/2022062322/568146c2550346895db3fc97/html5/thumbnails/24.jpg)
Text Search Results
Each linked to the UniProt entry
![Page 25: A Proteomics Toolkit:](https://reader035.vdocument.in/reader035/viewer/2022062322/568146c2550346895db3fc97/html5/thumbnails/25.jpg)
• Sequence-based searching
• BLAST, Fasta, MPsrch
![Page 26: A Proteomics Toolkit:](https://reader035.vdocument.in/reader035/viewer/2022062322/568146c2550346895db3fc97/html5/thumbnails/26.jpg)
Sequence Search Results
UniProt entry
Identity score
View alignments
![Page 27: A Proteomics Toolkit:](https://reader035.vdocument.in/reader035/viewer/2022062322/568146c2550346895db3fc97/html5/thumbnails/27.jpg)
Manipulate multiple data sets
![Page 28: A Proteomics Toolkit:](https://reader035.vdocument.in/reader035/viewer/2022062322/568146c2550346895db3fc97/html5/thumbnails/28.jpg)
Use Venn diagrams to combine, intersect, or subtract multiple data sets
Build complex data sets
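The combine, intersect and subtract operations correspond to ordinary set operations. A minimal sketch follows, assuming each result set is simply a collection of entry identifiers; the identifiers and set names below are hypothetical placeholders, not real UniProt accessions or part of the search interface.

```python
# Hypothetical result sets from two separate searches.
text_search_hits = {"ENTRY_A", "ENTRY_B", "ENTRY_C"}
sequence_search_hits = {"ENTRY_B", "ENTRY_C", "ENTRY_D"}

combined = text_search_hits | sequence_search_hits    # combine (union)
shared = text_search_hits & sequence_search_hits      # intersect
text_only = text_search_hits - sequence_search_hits   # subtract

print(sorted(combined))
print(sorted(shared))
print(sorted(text_only))
```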
![Page 29: A Proteomics Toolkit:](https://reader035.vdocument.in/reader035/viewer/2022062322/568146c2550346895db3fc97/html5/thumbnails/29.jpg)
UniProt/Swiss-Prot entry for human ubiquitin-protein ligase E3 mdm2
![Page 30: A Proteomics Toolkit:](https://reader035.vdocument.in/reader035/viewer/2022062322/568146c2550346895db3fc97/html5/thumbnails/30.jpg)
Some literature search engines pull synonyms from UniProt for more complete searching
Merged entries:
• Remove redundancy
• Can still be searched
![Page 31: A Proteomics Toolkit:](https://reader035.vdocument.in/reader035/viewer/2022062322/568146c2550346895db3fc97/html5/thumbnails/31.jpg)
![Page 32: A Proteomics Toolkit:](https://reader035.vdocument.in/reader035/viewer/2022062322/568146c2550346895db3fc97/html5/thumbnails/32.jpg)
![Page 33: A Proteomics Toolkit:](https://reader035.vdocument.in/reader035/viewer/2022062322/568146c2550346895db3fc97/html5/thumbnails/33.jpg)
![Page 34: A Proteomics Toolkit:](https://reader035.vdocument.in/reader035/viewer/2022062322/568146c2550346895db3fc97/html5/thumbnails/34.jpg)
![Page 35: A Proteomics Toolkit:](https://reader035.vdocument.in/reader035/viewer/2022062322/568146c2550346895db3fc97/html5/thumbnails/35.jpg)
![Page 36: A Proteomics Toolkit:](https://reader035.vdocument.in/reader035/viewer/2022062322/568146c2550346895db3fc97/html5/thumbnails/36.jpg)
![Page 37: A Proteomics Toolkit:](https://reader035.vdocument.in/reader035/viewer/2022062322/568146c2550346895db3fc97/html5/thumbnails/37.jpg)
IntAct Database
![Page 38: A Proteomics Toolkit:](https://reader035.vdocument.in/reader035/viewer/2022062322/568146c2550346895db3fc97/html5/thumbnails/38.jpg)
![Page 39: A Proteomics Toolkit:](https://reader035.vdocument.in/reader035/viewer/2022062322/568146c2550346895db3fc97/html5/thumbnails/39.jpg)
Summary of nucleotide data upon which entry is originally based
Structural data associated with entry protein
![Page 40: A Proteomics Toolkit:](https://reader035.vdocument.in/reader035/viewer/2022062322/568146c2550346895db3fc97/html5/thumbnails/40.jpg)
IntAct Database
![Page 41: A Proteomics Toolkit:](https://reader035.vdocument.in/reader035/viewer/2022062322/568146c2550346895db3fc97/html5/thumbnails/41.jpg)
IntAct Database
All the interactions with entry protein
![Page 42: A Proteomics Toolkit:](https://reader035.vdocument.in/reader035/viewer/2022062322/568146c2550346895db3fc97/html5/thumbnails/42.jpg)
IntAct Database
![Page 43: A Proteomics Toolkit:](https://reader035.vdocument.in/reader035/viewer/2022062322/568146c2550346895db3fc97/html5/thumbnails/43.jpg)
![Page 44: A Proteomics Toolkit:](https://reader035.vdocument.in/reader035/viewer/2022062322/568146c2550346895db3fc97/html5/thumbnails/44.jpg)
IntAct Database
![Page 45: A Proteomics Toolkit:](https://reader035.vdocument.in/reader035/viewer/2022062322/568146c2550346895db3fc97/html5/thumbnails/45.jpg)
IntAct Database
![Page 46: A Proteomics Toolkit:](https://reader035.vdocument.in/reader035/viewer/2022062322/568146c2550346895db3fc97/html5/thumbnails/46.jpg)
Experimental information
Experimental name
Experimental technique: co-immunoprecipitation
Literature citation used for curation
Taxonomic Reference
Interaction information
Links to interacting protein
![Page 47: A Proteomics Toolkit:](https://reader035.vdocument.in/reader035/viewer/2022062322/568146c2550346895db3fc97/html5/thumbnails/47.jpg)
IntAct Database
Displays interactions graphically
![Page 48: A Proteomics Toolkit:](https://reader035.vdocument.in/reader035/viewer/2022062322/568146c2550346895db3fc97/html5/thumbnails/48.jpg)
View all 7 interactions involving MDM2
View all GO interactions involving MDM2
![Page 49: A Proteomics Toolkit:](https://reader035.vdocument.in/reader035/viewer/2022062322/568146c2550346895db3fc97/html5/thumbnails/49.jpg)
View all InterPro entries associated with MDM2
Expand graph to see network surrounding one protein
Expand graph to see entire network
![Page 50: A Proteomics Toolkit:](https://reader035.vdocument.in/reader035/viewer/2022062322/568146c2550346895db3fc97/html5/thumbnails/50.jpg)
View interactions associated with both MDM2 and p53
View all proteins in a network associated with a specific GO term
![Page 51: A Proteomics Toolkit:](https://reader035.vdocument.in/reader035/viewer/2022062322/568146c2550346895db3fc97/html5/thumbnails/51.jpg)
All proteins in red are associated with “negative regulation of cell proliferation”
![Page 52: A Proteomics Toolkit:](https://reader035.vdocument.in/reader035/viewer/2022062322/568146c2550346895db3fc97/html5/thumbnails/52.jpg)
![Page 53: A Proteomics Toolkit:](https://reader035.vdocument.in/reader035/viewer/2022062322/568146c2550346895db3fc97/html5/thumbnails/53.jpg)
Genomic location
Complete nucleotide sequence
SNP information
Transcript and protein information
Transcript structure
![Page 54: A Proteomics Toolkit:](https://reader035.vdocument.in/reader035/viewer/2022062322/568146c2550346895db3fc97/html5/thumbnails/54.jpg)
![Page 55: A Proteomics Toolkit:](https://reader035.vdocument.in/reader035/viewer/2022062322/568146c2550346895db3fc97/html5/thumbnails/55.jpg)
Interactive map. Can zoom in/out, and move around
Summary and links to information about processes involving this molecule (here cell-cycle checkpoints)
General
Specific
![Page 56: A Proteomics Toolkit:](https://reader035.vdocument.in/reader035/viewer/2022062322/568146c2550346895db3fc97/html5/thumbnails/56.jpg)
Mendelian Inheritance in Man
![Page 57: A Proteomics Toolkit:](https://reader035.vdocument.in/reader035/viewer/2022062322/568146c2550346895db3fc97/html5/thumbnails/57.jpg)
Cellular component
Molecular function
Biological process
![Page 58: A Proteomics Toolkit:](https://reader035.vdocument.in/reader035/viewer/2022062322/568146c2550346895db3fc97/html5/thumbnails/58.jpg)
InterPro Database
• Allow searching for terms
• Linked to GO
![Page 59: A Proteomics Toolkit:](https://reader035.vdocument.in/reader035/viewer/2022062322/568146c2550346895db3fc97/html5/thumbnails/59.jpg)
Domain organisation
Position of motifs and sites
Positions of variable splicing
Experimental mutation information
Sequencing conflicts
![Page 60: A Proteomics Toolkit:](https://reader035.vdocument.in/reader035/viewer/2022062322/568146c2550346895db3fc97/html5/thumbnails/60.jpg)
Secondary structure
![Page 61: A Proteomics Toolkit:](https://reader035.vdocument.in/reader035/viewer/2022062322/568146c2550346895db3fc97/html5/thumbnails/61.jpg)
Easy navigation between UniProt/UniParc/UniRef
Useful for cut/paste into search engines
![Page 62: A Proteomics Toolkit:](https://reader035.vdocument.in/reader035/viewer/2022062322/568146c2550346895db3fc97/html5/thumbnails/62.jpg)
![Page 63: A Proteomics Toolkit:](https://reader035.vdocument.in/reader035/viewer/2022062322/568146c2550346895db3fc97/html5/thumbnails/63.jpg)
![Page 64: A Proteomics Toolkit:](https://reader035.vdocument.in/reader035/viewer/2022062322/568146c2550346895db3fc97/html5/thumbnails/64.jpg)
![Page 65: A Proteomics Toolkit:](https://reader035.vdocument.in/reader035/viewer/2022062322/568146c2550346895db3fc97/html5/thumbnails/65.jpg)
UniProt/TrEMBL
>2.5 M entries in TrEMBL
Doubled since mid-2004
Doubled since mid-2001
>200 K entries in Swiss-Prot
![Page 66: A Proteomics Toolkit:](https://reader035.vdocument.in/reader035/viewer/2022062322/568146c2550346895db3fc97/html5/thumbnails/66.jpg)
UniProt
raw data
Curated automated annotation
TrEMBL ?
Swiss-Prot annotation
![Page 67: A Proteomics Toolkit:](https://reader035.vdocument.in/reader035/viewer/2022062322/568146c2550346895db3fc97/html5/thumbnails/67.jpg)
UniProt/TrEMBL
Redundancy
Automatically maintained
• Automatic clean-up of nucleotide data
• Automatic annotation
• InterPro run and cross-references updated every 2 weeks
Recognises common annotation in related Swiss-Prot entries
Identifies all members of family using InterPro
![Page 68: A Proteomics Toolkit:](https://reader035.vdocument.in/reader035/viewer/2022062322/568146c2550346895db3fc97/html5/thumbnails/68.jpg)
Curated Annotation in InterPro
Swiss-Prot annotated sequences
uncharacterised
Multiple signatures
INTERPRO provides annotation on multiple levels
Feeds back to TrEMBL
![Page 69: A Proteomics Toolkit:](https://reader035.vdocument.in/reader035/viewer/2022062322/568146c2550346895db3fc97/html5/thumbnails/69.jpg)
Entry name uses accession number
Automatic annotation through machine learning
![Page 70: A Proteomics Toolkit:](https://reader035.vdocument.in/reader035/viewer/2022062322/568146c2550346895db3fc97/html5/thumbnails/70.jpg)
Foundations of InterPro
Manual curation
Integration of signatures
InterPro
![Page 71: A Proteomics Toolkit:](https://reader035.vdocument.in/reader035/viewer/2022062322/568146c2550346895db3fc97/html5/thumbnails/71.jpg)
Advantages of integrated signatures
• Greater coverage of proteins
• Relationships between signatures
• Signature databases specialised
greater coverage of annotation features
evolutionary context
Unique to InterPro
![Page 72: A Proteomics Toolkit:](https://reader035.vdocument.in/reader035/viewer/2022062322/568146c2550346895db3fc97/html5/thumbnails/72.jpg)
Characterisation of Protein Sequences
Build up consensus sequences of families, domains, motifs or sites
Conserved signatures
more sequences
BLAST
Basic information
![Page 73: A Proteomics Toolkit:](https://reader035.vdocument.in/reader035/viewer/2022062322/568146c2550346895db3fc97/html5/thumbnails/73.jpg)
Finding Conserved Signatures
Simplest (limited)
• Pattern
• Profile
• Fingerprint
• Sequence clustering
• HMM
More information
![Page 74: A Proteomics Toolkit:](https://reader035.vdocument.in/reader035/viewer/2022062322/568146c2550346895db3fc97/html5/thumbnails/74.jpg)
Patterns
Patterns in sequence: regular expressions
Often used to define important sites within proteins
PROSITE: best-known pattern database
![Page 75: A Proteomics Toolkit:](https://reader035.vdocument.in/reader035/viewer/2022062322/568146c2550346895db3fc97/html5/thumbnails/75.jpg)
Patterns
Example: PS00262 Insulin family signature
B chain xxxxxxCxxxxxxxxxxxxCxxxxxxxxx
A chain xxxxxCCxxxCxxxxxxxxCx
(| | in the original diagram link cysteines, C, between and within the two chains)
INS_HUMAN
MALWMRLLPL LALLALWGPD PAAAFVNQHL CGSHLVEALY LVCGERGFFY TPKTRREAED LQVGQVELGG GPGAGSLQPL ALEGSLQKRG IVEQCCTSIC SLYQLENYCN
Matched region: CCTSICSLYQLENYC
Regular expression
C-C-{P}-x(2)-C-[STDNEKPI]-x(3)-[LIVMFS]-x(3)-C
![Page 81: A Proteomics Toolkit:](https://reader035.vdocument.in/reader035/viewer/2022062322/568146c2550346895db3fc97/html5/thumbnails/81.jpg)
Sequence alignment
Define pattern: Insulin family motif
Extract pattern sequences
xxxxxxxxxxxxxxxxxxxxxxxx
Build regular expression
C-C-{P}-x(2)-C-[STDNEKPI]-x(3)-[LIVMFS]-x(3)-C
Pattern signature
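To see how such a pattern behaves as a regular expression, the sketch below converts the PS00262 pattern into Python `re` syntax and searches the INS_HUMAN sequence from the slides. The helper `prosite_to_regex` is a hypothetical name, and it handles only the common PROSITE elements (residues, `x`, `[...]`, `{...}`, repeat counts and `<`/`>` anchors); it is not how PROSITE scanning tools are implemented.

```python
import re

# One pattern element: optional "<", a core (x, residue, [set] or {excluded set}),
# an optional repeat "(n)" or "(n,m)", and an optional ">".
ELEMENT = re.compile(
    r"(<?)(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?(>?)"
)


def prosite_to_regex(pattern):
    """Convert a PROSITE-style pattern into a Python regular expression."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        m = ELEMENT.fullmatch(element)
        if m is None:
            raise ValueError("unsupported pattern element: " + element)
        start, core, low, high, end = m.groups()
        if core == "x":
            core = "."                        # any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"    # any residue except those listed
        repeat = ""
        if low:
            repeat = "{" + low + ("," + high if high else "") + "}"
        parts.append(("^" if start else "") + core + repeat + ("$" if end else ""))
    return "".join(parts)


INS_HUMAN = (
    "MALWMRLLPL" "LALLALWGPD" "PAAAFVNQHL" "CGSHLVEALY" "LVCGERGFFY"
    "TPKTRREAED" "LQVGQVELGG" "GPGAGSLQPL" "ALEGSLQKRG" "IVEQCCTSIC"
    "SLYQLENYCN"
)

regex = prosite_to_regex("C-C-{P}-x(2)-C-[STDNEKPI]-x(3)-[LIVMFS]-x(3)-C")
match = re.search(regex, INS_HUMAN)
print(regex)
print(match.group(), match.start() + 1, match.end())  # 1-based start and end
```

The match is the A-chain segment CCTSICSLYQLENYC, residues 95 to 109 of the precursor sequence.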
![Page 82: A Proteomics Toolkit:](https://reader035.vdocument.in/reader035/viewer/2022062322/568146c2550346895db3fc97/html5/thumbnails/82.jpg)
Fingerprints
Several discrete motifs characterise family
Highly specific matches to small regions of proteins
PRINTS: best-known fingerprint database
![Page 83: A Proteomics Toolkit:](https://reader035.vdocument.in/reader035/viewer/2022062322/568146c2550346895db3fc97/html5/thumbnails/83.jpg)
Fingerprints
Example: PR00107 Phosphocarrier HPr signature
PTHP_ENTFA:
MEKKEFHIVA ETGIHARPAT LLVQTASKFN SDINLEYKGK SVNLKSIMGV MSLGVGQGSD VTITVDGADE AEGMAAIVET LQKEGLAE
His phosphorylation site
Ser phosphorylation site
Conserved site
PR00107: a fingerprint with three motifs
1) GIHARPATLLVQTASKF
2) KGKSVNLKSIMGVMSL
3) LGVGQGSDVTITVDGADE
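The requirement that all motifs of a fingerprint occur in the right order can be sketched as below, using the PTHP_ENTFA sequence and the three PR00107 motifs from the slides. This is a deliberate simplification: it uses exact substring matching, whereas real fingerprint searches score motifs against the sequence, and the function name `find_fingerprint` is hypothetical.

```python
PTHP_ENTFA = (
    "MEKKEFHIVA" "ETGIHARPAT" "LLVQTASKFN" "SDINLEYKGK" "SVNLKSIMGV"
    "MSLGVGQGSD" "VTITVDGADE" "AEGMAAIVET" "LQKEGLAE"
)

PR00107_MOTIFS = [
    "GIHARPATLLVQTASKF",
    "KGKSVNLKSIMGVMSL",
    "LGVGQGSDVTITVDGADE",
]


def find_fingerprint(sequence, motifs):
    """Return the 0-based start of each motif if all occur in the given order,
    otherwise None. Each motif must start after the previous one starts
    (adjacent motifs may overlap, as motifs 2 and 3 do here)."""
    starts = []
    search_from = 0
    for motif in motifs:
        position = sequence.find(motif, search_from)
        if position == -1:
            return None
        starts.append(position)
        search_from = position + 1
    return starts


starts = find_fingerprint(PTHP_ENTFA, PR00107_MOTIFS)
print(starts)
# Spacing between consecutive motif starts, which a fuller check could constrain.
print([b - a for a, b in zip(starts, starts[1:])])
```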
![Page 88: A Proteomics Toolkit:](https://reader035.vdocument.in/reader035/viewer/2022062322/568146c2550346895db3fc97/html5/thumbnails/88.jpg)
Sequence alignment
Define motifs
His phosphorylation site
Ser phosphorylation site
Conserved site
Extract motif sequences
xxxxxxxxxxxxxxxxxxxxxxxx
xxxxxxxxxxxxxxxxxxxxxxxx
xxxxxxxxxxxxxxxxxxxxxxxx
Fingerprint signature 1 2 3
Correct order
Correct spacing
![Page 89: A Proteomics Toolkit:](https://reader035.vdocument.in/reader035/viewer/2022062322/568146c2550346895db3fc97/html5/thumbnails/89.jpg)
Sequence Clustering
Automatic clustering of homologous domains
Used by ProDom database
![Page 90: A Proteomics Toolkit:](https://reader035.vdocument.in/reader035/viewer/2022062322/568146c2550346895db3fc97/html5/thumbnails/90.jpg)
Sequence Clustering
Well-characterised domain families
PSI-BLAST: recruit homologous domains
MKDOM2: automatically cluster homologous domains
ProDomAlign: align resulting protein domain families
![Page 91: A Proteomics Toolkit:](https://reader035.vdocument.in/reader035/viewer/2022062322/568146c2550346895db3fc97/html5/thumbnails/91.jpg)
Profiles
Sequence alignment scoring matrix
Profile
Sequence search
![Page 92: A Proteomics Toolkit:](https://reader035.vdocument.in/reader035/viewer/2022062322/568146c2550346895db3fc97/html5/thumbnails/92.jpg)
Sequence alignment (Sequence 1 to Sequence 7)
Matrix
(frequency of each residue at each position in alignment)
![Page 93: A Proteomics Toolkit:](https://reader035.vdocument.in/reader035/viewer/2022062322/568146c2550346895db3fc97/html5/thumbnails/93.jpg)
Match values are higher for conserved residues
e.g. Position 1: F>Y>L (phenylalanine and tyrosine are more similar to each other than either is to leucine)
(alignment of Sequence 1 to Sequence 7)
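A position-specific scoring matrix of this kind can be sketched as follows. The seven aligned sequences are hypothetical (the slide's alignment is not reproduced in the text), scores are log-odds against a uniform background with a pseudocount, and the sketch uses residue frequencies only; it leaves out the residue-similarity weighting that the F>Y>L example alludes to.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Hypothetical gap-free alignment of seven sequences, for illustration only.
ALIGNMENT = ["FKLL", "FKAF", "FKAL", "YKLL", "FPAL", "YKLF", "LKAL"]


def build_profile(alignment, pseudocount=1.0):
    """Return one dict per alignment column mapping residue -> log2-odds score."""
    background = 1.0 / len(AMINO_ACIDS)
    total = len(alignment) + pseudocount * len(AMINO_ACIDS)
    profile = []
    for position in range(len(alignment[0])):
        counts = Counter(seq[position] for seq in alignment)
        column = {
            aa: math.log2(((counts[aa] + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        }
        profile.append(column)
    return profile


def score(profile, segment):
    """Sum the position-specific scores of a segment as long as the profile."""
    return sum(column[aa] for column, aa in zip(profile, segment))


profile = build_profile(ALIGNMENT)
print({aa: round(profile[0][aa], 2) for aa in "FYLA"})
print(round(score(profile, "FKAL"), 2), round(score(profile, "WWWW"), 2))
```

Residues seen often in a column receive higher match values than residues never observed there, so a family-like segment outscores an unrelated one.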
![Page 97: A Proteomics Toolkit:](https://reader035.vdocument.in/reader035/viewer/2022062322/568146c2550346895db3fc97/html5/thumbnails/97.jpg)
Profiles
Problem: insertions and deletions not well accounted for
Can characterise proteins over entire length (need trusted sequence alignment)
Position-specific scoring good for modelling divergent as well as conserved regions
![Page 98: A Proteomics Toolkit:](https://reader035.vdocument.in/reader035/viewer/2022062322/568146c2550346895db3fc97/html5/thumbnails/98.jpg)
Hidden Markov Models (HMM)
Large scale profiles
Improvements:
• Probability method gauges scoring parameters
• Allows insertions and deletions
Outperform in sensitivity and specificity
More flexible (can use partial alignments)
![Page 99: A Proteomics Toolkit:](https://reader035.vdocument.in/reader035/viewer/2022062322/568146c2550346895db3fc97/html5/thumbnails/99.jpg)
Hidden Markov Models (HMM)
Sequence alignment
Begin M1 M2 M3 M4 End
M = match state
![Page 100: A Proteomics Toolkit:](https://reader035.vdocument.in/reader035/viewer/2022062322/568146c2550346895db3fc97/html5/thumbnails/100.jpg)
Hidden Markov Models (HMM)
D1 D2 D3 D4
Begin M1 M2 M3 M4 End
I0 I1 I2 I3 I4
M = match state, I = insert state, D = delete state
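The state layout in this diagram can be written down as a transition table. The sketch below assumes the classic profile-HMM topology, in which every match, insert and delete state may move to the next match state, the insert state at the same position, or the next delete state; the slide does not show the arrows, so this connectivity is an assumption, and HMMER2's own architecture omits direct insert-to-delete and delete-to-insert moves. Probabilities are left out.

```python
def profile_hmm_transitions(n_match):
    """Allowed transitions for a profile HMM with n_match match states
    (classic topology; an assumption, see the lead-in)."""
    transitions = {}
    # Begin and I0 sit before the first match position.
    transitions["Begin"] = ["M1", "I0", "D1"]
    transitions["I0"] = ["M1", "I0", "D1"]
    for k in range(1, n_match + 1):
        is_last = k == n_match
        targets = ["End" if is_last else "M%d" % (k + 1), "I%d" % k]
        if not is_last:
            targets.append("D%d" % (k + 1))
        for state in ("M%d" % k, "I%d" % k, "D%d" % k):
            transitions[state] = list(targets)
    return transitions


model = profile_hmm_transitions(4)
for state in ("Begin", "M2", "I2", "D2", "M4"):
    print(state, "->", model[state])
```

A delete state lets a sequence skip a model position, and the self-loop on each insert state lets it carry extra residues, which is how the model accommodates insertions and deletions.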
![Page 101: A Proteomics Toolkit:](https://reader035.vdocument.in/reader035/viewer/2022062322/568146c2550346895db3fc97/html5/thumbnails/101.jpg)
Hidden Markov Models (HMM)
HMMER2 package:
http://hmmer.wustl.edu/
HMMbuild
HMMcalibrate
Database search
![Page 102: A Proteomics Toolkit:](https://reader035.vdocument.in/reader035/viewer/2022062322/568146c2550346895db3fc97/html5/thumbnails/102.jpg)
Hidden Markov Models (HMM)
HMM databases:
Families conserved in sequence
• PIR SUPERFAMILY
• PANTHER
• TIGRFAM
Domains conserved in sequence
• PFAM
• SMART
Domains conserved in structure (special case)
• SUPERFAMILY
• GENE3D
![Page 104: A Proteomics Toolkit:](https://reader035.vdocument.in/reader035/viewer/2022062322/568146c2550346895db3fc97/html5/thumbnails/104.jpg)
SAM Profile HMMs
(http://www.cse.ucsc.edu/research/compbio/sam.html)
SUPERFAMILY + GENE3D
• Proteins related by structure
• Homologous Structural Superfamilies
Often only 1 protein in a family with structural information
May have low sequence identity
SAM:
• Start with single seed sequence
• Uses Target99 (T99) script
Multiple models/superfamily
Combine results
![Page 105: A Proteomics Toolkit:](https://reader035.vdocument.in/reader035/viewer/2022062322/568146c2550346895db3fc97/html5/thumbnails/105.jpg)
SAM T99 Profile HMMs
T99 script:
Single seed sequence: GIHARPATLLVQTASKF
WU-BLASTP search
Close homologues
Low identity matches
Initial HMM
New larger alignment
Final HMM
![Page 106: A Proteomics Toolkit:](https://reader035.vdocument.in/reader035/viewer/2022062322/568146c2550346895db3fc97/html5/thumbnails/106.jpg)
Summary of signature methods
Starting point: sequence alignment
• Single motif method: extract motif pattern (PROSITE)
• Multiple motif methods: extract multiple motifs (PRINTS)
• Full alignment methods, using the full sequence:
1) profile (PROSITE)
2) HMM (PFAM, SMART, SUPERFAMILY, TIGRFAM, PIRSF, GENE3D, PANTHER)
![Page 107: A Proteomics Toolkit:](https://reader035.vdocument.in/reader035/viewer/2022062322/568146c2550346895db3fc97/html5/thumbnails/107.jpg)
Protein Signature Databases
• Patterns: Prosite
• Fingerprints: Prints
• Sequence clustering: ProDom
• Profiles: Prosite
• HMM: PIR Superfamily, Panther, Tigrfam, Pfam, Smart
• T99-SAM HMM: Gene3D, Superfamily
![Page 108: A Proteomics Toolkit:](https://reader035.vdocument.in/reader035/viewer/2022062322/568146c2550346895db3fc97/html5/thumbnails/108.jpg)
Prints
Fingerprint is a set of motifs
Full length of protein
PR00000
Can identify small conserved regions in divergent proteins
Use different combinations of motifs to describe families and sibling subfamilies
http://umber.sbs.man.ac.uk/dbbrowser/PRINTS/
![Page 109: A Proteomics Toolkit:](https://reader035.vdocument.in/reader035/viewer/2022062322/568146c2550346895db3fc97/html5/thumbnails/109.jpg)
Prosite Patterns
Pattern is a regular expression
PS00000
Identify various important sites within proteins
Several models characterise enzymes
Used by UniProt to define catalytic sites
Enzyme catalytic site Prosthetic group attachment Metal ion binding site Cysteines for disulphide bonds Protein or molecule binding
http://us.expasy.org/prosite/
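Because a pattern is a regular expression, it can be translated into the regex dialect of a general-purpose language. The sketch below is illustrative only: the function name `prosite_to_regex` is made up for this example, and it handles only the basic PROSITE syntax (`x`, `[ABC]`, `{ABC}`, repeat counts, `<` and `>` anchors), not every edge case of the format. The pattern used is the N-glycosylation site pattern `N-{P}-[ST]-{P}`; the test sequence is invented.

```python
import re

def prosite_to_regex(pattern: str) -> str:
    """Translate a basic PROSITE-style pattern into a Python regex string."""
    pattern = pattern.rstrip(".")            # patterns end with a period
    parts = []
    for element in pattern.split("-"):       # elements are separated by '-'
        prefix = suffix = ""
        if element.startswith("<"):          # '<' anchors at the N-terminus
            prefix, element = "^", element[1:]
        if element.endswith(">"):            # '>' anchors at the C-terminus
            suffix, element = "$", element[:-1]
        repeat = ""
        if element.endswith(")"):            # repeat count such as (3) or (2,4)
            element, count = element[:-1].split("(")
            repeat = "{" + count + "}"
        if element == "x":                   # any residue
            core = "."
        elif element.startswith("{"):        # residues that are NOT allowed
            core = "[^" + element[1:-1] + "]"
        else:                                # single residue or [ABC] set
            core = element
        parts.append(prefix + core + repeat + suffix)
    return "".join(parts)

regex = prosite_to_regex("N-{P}-[ST]-{P}.")
sequence = "MKNGTAANPSA"                     # invented sequence for illustration
starts = [m.start() for m in re.finditer(regex, sequence)]
print(regex, starts)
```

Positions reported by `re.finditer` are 0-based; add 1 to compare them with residue numbering in database entries.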
![Page 110: A Proteomics Toolkit:](https://reader035.vdocument.in/reader035/viewer/2022062322/568146c2550346895db3fc97/html5/thumbnails/110.jpg)
Prosite Profiles
Profile: PS00000
Pattern: PS00000
Describe protein families or domains conserved in sequence
Use curated sequence alignments
Accurate
Profile is a multiple alignment with matrix frequencies
http://us.expasy.org/prosite/
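The idea of a multiple alignment with matrix frequencies can be sketched in a few lines. This is a deliberately simplified illustration, not the PROSITE profile format: real profiles carry position-specific scores and gap penalties. The function names, the toy alignment, the uniform background frequency and the pseudocount are all assumptions made for the example.

```python
import math
from collections import Counter

def position_frequencies(alignment):
    """Per-column residue frequencies for equal-length aligned sequences."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("aligned sequences must have equal length")
    columns = []
    for i in range(length):
        # Count residues in column i, ignoring gap characters.
        counts = Counter(seq[i] for seq in alignment if seq[i] != "-")
        total = sum(counts.values())
        columns.append({res: n / total for res, n in counts.items()} if total else {})
    return columns

def log_odds_score(columns, segment, background=0.05, pseudo=0.01):
    """Sum of per-position log-odds scores (assumed uniform background)."""
    return sum(
        math.log((col.get(res, 0.0) + pseudo) / background)
        for col, res in zip(columns, segment)
    )

toy_alignment = ["GIHAR", "GLHAK", "GIHSR"]   # invented alignment
cols = position_frequencies(toy_alignment)
print(cols[1])
print(log_odds_score(cols, "GIHAR") > log_odds_score(cols, "PPPPP"))
```

A segment that resembles the aligned family scores higher than an unrelated one, which is the property a profile search relies on.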
![Page 111: A Proteomics Toolkit:](https://reader035.vdocument.in/reader035/viewer/2022062322/568146c2550346895db3fc97/html5/thumbnails/111.jpg)
ProDom
Sequence clustering method: automatic process (mkdom2)
PD000000
Groups UniProt sequences into (core) domains conserved in sequence
http://protein.toulouse.inra.fr/prodom/current/html/home.php
![Page 112: A Proteomics Toolkit:](https://reader035.vdocument.in/reader035/viewer/2022062322/568146c2550346895db3fc97/html5/thumbnails/112.jpg)
Pfam
HMM models built from HMMER2
PF00000
Pfam A: manually curated
Pfam B: automatic clustering
Use trusted cut-offs: accurate
Wide coverage of protein families and domains conserved in sequence
http://www.sanger.ac.uk/Software/Pfam/
Only PFAM A used to build signatures in InterPro
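Trusted cut-offs amount to a per-model score threshold: a hit counts only if its score reaches the threshold curated for that model. A minimal sketch, assuming bit scores and a simple lookup table; the `Hit` class, the protein names, the scores and the cut-off value are all hypothetical, and `PF00000` is just the placeholder accession from the slide.

```python
from dataclasses import dataclass

@dataclass
class Hit:
    model: str        # signature accession, e.g. the placeholder PF00000
    protein: str      # hypothetical protein identifier
    bit_score: float  # score reported by the HMM search

def apply_trusted_cutoffs(hits, cutoffs):
    """Keep only hits whose score reaches the model's trusted cut-off."""
    return [
        h for h in hits
        if h.model in cutoffs and h.bit_score >= cutoffs[h.model]
    ]

# Hypothetical values for illustration only.
cutoffs = {"PF00000": 25.0}
hits = [
    Hit("PF00000", "protein_1", 80.3),
    Hit("PF00000", "protein_2", 12.1),
]
accepted = apply_trusted_cutoffs(hits, cutoffs)
print([h.protein for h in accepted])
```

Hits for models without a recorded cut-off are dropped here; that is a design choice of the sketch rather than documented behaviour of any database.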
![Page 113: A Proteomics Toolkit:](https://reader035.vdocument.in/reader035/viewer/2022062322/568146c2550346895db3fc97/html5/thumbnails/113.jpg)
Smart
HMM domains using curated sequence alignments of families from PSI-BLAST
SM00000
Primarily describe domains conserved in sequence
Concentrate on signalling proteins, and extracellular and nuclear domains
http://smart.embl-heidelberg.de/
![Page 114: A Proteomics Toolkit:](https://reader035.vdocument.in/reader035/viewer/2022062322/568146c2550346895db3fc97/html5/thumbnails/114.jpg)
Tigrfams
HMM families built with curated alignments
TIGR00000
Describe protein families (and domains) conserved in sequence and function
Functional classifications using equivalogs (functionally conserved homologues)
Curated trusted cut-off: very accurate
Use phylogenetic trees: accurate family membership
http://www.tigr.org/TIGRFAMs/
![Page 115: A Proteomics Toolkit:](https://reader035.vdocument.in/reader035/viewer/2022062322/568146c2550346895db3fc97/html5/thumbnails/115.jpg)
PIRSF
http://pir.georgetown.edu/pirsf/
HMM families using computationally defined non-overlapping clusters of sequences
PIRSF000000
Comprehensive protein family database of full-length models
Describe protein families conserved in sequence and domain composition: homeomorphic
![Page 116: A Proteomics Toolkit:](https://reader035.vdocument.in/reader035/viewer/2022062322/568146c2550346895db3fc97/html5/thumbnails/116.jpg)
Panther
https://panther.appliedbiosystems.com/
HMM families based on phylogenetic trees
PTHR00000
Comprehensive protein family database of full-length models
Provides family classification by functions, processes, pathways and taxonomy
Use phylogenetic trees: define functionally distinct families
![Page 117: A Proteomics Toolkit:](https://reader035.vdocument.in/reader035/viewer/2022062322/568146c2550346895db3fc97/html5/thumbnails/117.jpg)
Superfamily
HMMs based on SCOP structural superfamilies
Describe protein domains conserved in structure with evidence of common evolutionary origin
Provides information on structural classification
http://supfam.mrc-lmb.cam.ac.uk/SUPERFAMILY/
Good at describing non-contiguous structural domains
SSF00000
Often define structural domain boundaries
![Page 118: A Proteomics Toolkit:](https://reader035.vdocument.in/reader035/viewer/2022062322/568146c2550346895db3fc97/html5/thumbnails/118.jpg)
Gene3D
HMM domains based on CATH structural superfamily
G3D.0.0.0.0
Provides information on structural classification
http://cathwww.biochem.ucl.ac.uk/latest/index.html
Describe protein domains conserved in structure with evidence of common evolutionary origin
Always define structural domain boundaries
Good at describing non-contiguous structural domains
![Page 119: A Proteomics Toolkit:](https://reader035.vdocument.in/reader035/viewer/2022062322/568146c2550346895db3fc97/html5/thumbnails/119.jpg)
Specialisation of databases
• Prints: describe sibling families
• Prosite: identify binding and active sites (enzymes)
• ProDom: describe conserved core of domains
• Pfam: wide coverage of domains and families
• Smart: signalling, extracellular & nuclear domains
• Tigrfam: functional classification of equivalogs
• PIRSF: homeomorphs, conserved in domain composition
• Panther: functional families; best at detecting fragments
• Superfamily: structural-based domain classification
• Gene3D: describe structural domain boundaries
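Each member database uses its own accession style, so the source of a signature can be recognised from its accession alone. The sketch below takes its prefixes from the placeholder accessions shown on the preceding slides (PR00000, PS00000, PD000000, PF00000, SM00000, TIGR00000, PIRSF000000, PTHR00000, SSF00000, G3D.0.0.0.0). The function name and the exact regular expressions are assumptions; current releases of these databases may format accessions differently.

```python
import re

# Prefixes follow the placeholder accessions on the slides; digit counts are
# left open (\d+) because the sketch does not verify them against real data.
SIGNATURE_PATTERNS = [
    ("Prints", re.compile(r"PR\d+")),
    ("Prosite", re.compile(r"PS\d+")),
    ("ProDom", re.compile(r"PD\d+")),
    ("Pfam", re.compile(r"PF\d+")),
    ("Smart", re.compile(r"SM\d+")),
    ("Tigrfams", re.compile(r"TIGR\d+")),
    ("PIRSF", re.compile(r"PIRSF\d+")),
    ("Panther", re.compile(r"PTHR\d+")),
    ("Superfamily", re.compile(r"SSF\d+")),
    ("Gene3D", re.compile(r"G3D\.\d+\.\d+\.\d+\.\d+")),
]

def member_database(accession):
    """Return the member database name for an accession, or None if unknown."""
    for name, pattern in SIGNATURE_PATTERNS:
        if pattern.fullmatch(accession):
            return name
    return None

for acc in ["PF00000", "PIRSF000000", "G3D.0.0.0.0", "XYZ123"]:
    print(acc, member_database(acc))
```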
![Page 120: A Proteomics Toolkit:](https://reader035.vdocument.in/reader035/viewer/2022062322/568146c2550346895db3fc97/html5/thumbnails/120.jpg)
Structural Representation in InterPro
MSD
PDB sequence
UniProt amino acid position
Residue-by-residue mapping
InterPro sequence-structure comparison
![Page 121: A Proteomics Toolkit:](https://reader035.vdocument.in/reader035/viewer/2022062322/568146c2550346895db3fc97/html5/thumbnails/121.jpg)
Structural Representation
PDB structures displayed as striped patterns
Structural classification in CATH and SCOP
Homology models from Swiss-Model and ModBase
![Page 122: A Proteomics Toolkit:](https://reader035.vdocument.in/reader035/viewer/2022062322/568146c2550346895db3fc97/html5/thumbnails/122.jpg)
Structural Representation
CATH and SCOP divide PDB structures into domains
Swiss-Model and ModBase predict structure for regions not covered by PDB
Note that one domain is non-contiguous
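A non-contiguous domain is one built from two or more separate stretches of the sequence. One way to represent this is as a list of segments rather than a single start and end. The class names, the domain names and the coordinates below are hypothetical and chosen only to illustrate the idea.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Segment:
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive

@dataclass
class Domain:
    name: str
    segments: list

    def contains(self, position: int) -> bool:
        """True if the residue position falls in any segment of the domain."""
        return any(s.start <= position <= s.end for s in self.segments)

    def is_contiguous(self) -> bool:
        """A domain made of a single segment is contiguous."""
        return len(self.segments) == 1

# Hypothetical coordinates: domain_B is inserted inside domain_A.
domain_a = Domain("domain_A", [Segment(1, 80), Segment(150, 220)])
domain_b = Domain("domain_B", [Segment(81, 149)])

print(domain_a.is_contiguous(), domain_b.is_contiguous())
print(domain_a.contains(100), domain_b.contains(100))
```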
![Page 123: A Proteomics Toolkit:](https://reader035.vdocument.in/reader035/viewer/2022062322/568146c2550346895db3fc97/html5/thumbnails/123.jpg)
Sequence-Structure Display
Structural data for specific proteins
Signatures predictive of protein annotation
![Page 124: A Proteomics Toolkit:](https://reader035.vdocument.in/reader035/viewer/2022062322/568146c2550346895db3fc97/html5/thumbnails/124.jpg)
http://www.ebi.ac.uk/interpro/
Search tools include:
• Text Search
• InterProScan (sequence search)
• SRS (multiple database search)
Searching InterPro
![Page 125: A Proteomics Toolkit:](https://reader035.vdocument.in/reader035/viewer/2022062322/568146c2550346895db3fc97/html5/thumbnails/125.jpg)
Text Search Results
Direct links to entry
![Page 126: A Proteomics Toolkit:](https://reader035.vdocument.in/reader035/viewer/2022062322/568146c2550346895db3fc97/html5/thumbnails/126.jpg)
InterProScan search results
Link to InterPro entry
Link to SRS view of InterPro entry
Enables direct searching of other databases in SRS using InterProScan results
Link to signature database
Mouse-over provides signature data: residue position, E-value, accession ID, and name
Single InterPro entry
![Page 127: A Proteomics Toolkit:](https://reader035.vdocument.in/reader035/viewer/2022062322/568146c2550346895db3fc97/html5/thumbnails/127.jpg)
InterPro Entry
• Groups similar signatures together and provides relationships between signatures
• Provides extensive manual annotation
• Provides links to other databases
• Provides structural information and viewers
![Page 128: A Proteomics Toolkit:](https://reader035.vdocument.in/reader035/viewer/2022062322/568146c2550346895db3fc97/html5/thumbnails/128.jpg)
Annotation Fields in InterPro
• Name and short name
• Entry type
• Relationships
• GO mapping
• Abstract
• Structural links
• Database links
• Taxonomy
• Examples
• Publications
![Page 129: A Proteomics Toolkit:](https://reader035.vdocument.in/reader035/viewer/2022062322/568146c2550346895db3fc97/html5/thumbnails/129.jpg)
InterPro entry for the ligand-binding domain of the nuclear hormone receptor
![Page 130: A Proteomics Toolkit:](https://reader035.vdocument.in/reader035/viewer/2022062322/568146c2550346895db3fc97/html5/thumbnails/130.jpg)
Protein matches
![Page 131: A Proteomics Toolkit:](https://reader035.vdocument.in/reader035/viewer/2022062322/568146c2550346895db3fc97/html5/thumbnails/131.jpg)
Shows the InterPro entries that match a protein
![Page 132: A Proteomics Toolkit:](https://reader035.vdocument.in/reader035/viewer/2022062322/568146c2550346895db3fc97/html5/thumbnails/132.jpg)
Protein matches
Shows each individual signature that matches a protein
Shows structural information for the protein with links to PDB, CATH, SCOP
![Page 133: A Proteomics Toolkit:](https://reader035.vdocument.in/reader035/viewer/2022062322/568146c2550346895db3fc97/html5/thumbnails/133.jpg)
Protein matches
![Page 134: A Proteomics Toolkit:](https://reader035.vdocument.in/reader035/viewer/2022062322/568146c2550346895db3fc97/html5/thumbnails/134.jpg)
Protein matches
Splice variants
![Page 135: A Proteomics Toolkit:](https://reader035.vdocument.in/reader035/viewer/2022062322/568146c2550346895db3fc97/html5/thumbnails/135.jpg)
Select data set of these proteins
![Page 136: A Proteomics Toolkit:](https://reader035.vdocument.in/reader035/viewer/2022062322/568146c2550346895db3fc97/html5/thumbnails/136.jpg)
Detailed information
Family, domain, site, repeat
Links to signature databases
Relationships linking different signatures
Mapping to GO terms
Abstract with references
Contains/Found in: describe composition of protein sequences
Parent/Child: family or domain evolutionary hierarchies
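The two relationship types can be pictured as two small graphs: Parent/Child links form a hierarchy that can be walked upwards, while Contains/Found in links record which entries occur inside others. The sketch below uses plain dictionaries; the entry names are hypothetical stand-ins, not real InterPro accessions, and the function names are made up for the example.

```python
# Hypothetical entries. Keys are children, values are their parents.
PARENT = {
    "subfamily_A1": "family_A",
    "family_A": "superfamily_A",
}

# Hypothetical composition: an entry and the entries it contains.
CONTAINS = {
    "family_A": {"domain_X", "repeat_Y"},
    "family_B": {"domain_X"},
}

def ancestors(entry, parent=PARENT):
    """Walk Parent/Child links upwards and return the chain of parents."""
    chain = []
    while entry in parent:
        entry = parent[entry]
        chain.append(entry)
    return chain

def found_in(part, contains=CONTAINS):
    """Invert Contains links: list the entries in which a part is found."""
    return sorted(e for e, parts in contains.items() if part in parts)

print(ancestors("subfamily_A1"))
print(found_in("domain_X"))
```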
![Page 137: A Proteomics Toolkit:](https://reader035.vdocument.in/reader035/viewer/2022062322/568146c2550346895db3fc97/html5/thumbnails/137.jpg)
Structural links
![Page 138: A Proteomics Toolkit:](https://reader035.vdocument.in/reader035/viewer/2022062322/568146c2550346895db3fc97/html5/thumbnails/138.jpg)
Database links
![Page 139: A Proteomics Toolkit:](https://reader035.vdocument.in/reader035/viewer/2022062322/568146c2550346895db3fc97/html5/thumbnails/139.jpg)
Taxonomy
![Page 140: A Proteomics Toolkit:](https://reader035.vdocument.in/reader035/viewer/2022062322/568146c2550346895db3fc97/html5/thumbnails/140.jpg)
Overlap with other InterPro entries
Examples
References
![Page 141: A Proteomics Toolkit:](https://reader035.vdocument.in/reader035/viewer/2022062322/568146c2550346895db3fc97/html5/thumbnails/141.jpg)
Integration of signatures
Greater coverage of annotation features
Relationships provide evolutionary context (unique to InterPro)
Increased coverage of proteins
Enhances functional annotation of TrEMBL
Powerful Annotation Tool
![Page 142: A Proteomics Toolkit:](https://reader035.vdocument.in/reader035/viewer/2022062322/568146c2550346895db3fc97/html5/thumbnails/142.jpg)
Powerful Annotation Tool
• Database links: to several databases to increase annotation
• Taxonomy: search/download using taxonomy
• GO mapping: large-scale classification using GO terms
• Structural information: structural classification, 3-D viewers
• Signature databases: direct links to their annotation
![Page 143: A Proteomics Toolkit:](https://reader035.vdocument.in/reader035/viewer/2022062322/568146c2550346895db3fc97/html5/thumbnails/143.jpg)
InterPro signatures cover:
90% of UniProt/Swiss-Prot proteins
69% of UniProt/TrEMBL proteins
Coverage
>2 million matches in InterPro
>13,000 InterPro entries
>22,000 signature methods
![Page 144: A Proteomics Toolkit:](https://reader035.vdocument.in/reader035/viewer/2022062322/568146c2550346895db3fc97/html5/thumbnails/144.jpg)
Structural coverage in InterPro:
0.6% of proteins have PDB structures
20% of proteins have Swiss-Model structures
63% of proteins have ModBase structures
Coverage
>9500 PDB structures in InterPro
>300,000 Swiss-Model links in InterPro
>950,000 ModBase links in InterPro
![Page 145: A Proteomics Toolkit:](https://reader035.vdocument.in/reader035/viewer/2022062322/568146c2550346895db3fc97/html5/thumbnails/145.jpg)
Availability and downloads
Web access
Tool/Databases: http://www.ebi.ac.uk/services/
Downloads
ftp site: ftp://ftp.ebi.ac.uk/pub/databases/
![Page 146: A Proteomics Toolkit:](https://reader035.vdocument.in/reader035/viewer/2022062322/568146c2550346895db3fc97/html5/thumbnails/146.jpg)
2Can Training and Education
Bioinformatics Educational Resource
Information on EBI Databases
On-line tutorials on EBI Databases and tools
Glossary
Guide to bioinformatics resources on the internet
EBI web services
Protein structure
Nucleotide analysis
Proteomics analysis
Protein function
Genome browsing
Database browsing
![Page 147: A Proteomics Toolkit:](https://reader035.vdocument.in/reader035/viewer/2022062322/568146c2550346895db3fc97/html5/thumbnails/147.jpg)
http://www.ebi.ac.uk/
![Page 148: A Proteomics Toolkit:](https://reader035.vdocument.in/reader035/viewer/2022062322/568146c2550346895db3fc97/html5/thumbnails/148.jpg)
http://www.ebi.ac.uk/2can/
![Page 149: A Proteomics Toolkit:](https://reader035.vdocument.in/reader035/viewer/2022062322/568146c2550346895db3fc97/html5/thumbnails/149.jpg)
http://www.ebi.ac.uk/interpro/
![Page 150: A Proteomics Toolkit:](https://reader035.vdocument.in/reader035/viewer/2022062322/568146c2550346895db3fc97/html5/thumbnails/150.jpg)
Rolf Apweiler
Amos Bairoch
Cathy Wu
+100 annotators
Acknowledgements
Nicky Mulder
IntAct Team
InterPro Consortium
Henning Hermjakob
InterPro Team