


The iProXpress Knowledge System for Proteomic Data Analysis

Hongzhan Huang, Zhang-Zhi Hu, Peter McGarvey and Cathy H. Wu
Protein Information Resource, Georgetown University, Washington, DC 20007

Contact: [email protected]

Figure 1. An Overview of the iProXpress System
[Diagram labels: iProXpress (integrated Protein eXpression Analysis System); IP/2D/MS Proteomic Data; Gene Expression; Gene/Protein ID list; Peptide Sequence; Protein Mapping; Functional Annotation; Expression Profiling; UniProt; iProClass; Protein Information Matrix; Function Categorization Chart; Two-Way Comparison Matrix; Pathway Map; Interaction Map; GO tree visualization]

Introduction
Large-scale proteomic profiling of biological samples such as cells, organelles or biological fluids has led to the discovery of numerous key and novel proteins involved in many biological and disease processes, as well as to the identification of novel disease biomarkers and potential therapeutic targets. Bioinformatics infrastructure and systems play instrumental roles in such data analyses and discovery processes. iProXpress (integrated Protein eXpression system) is a prototype analysis system designed to help analyze proteomic and genomic data, such as protein/peptide and gene profiles from MS proteomic and microarray gene expression experiments. It has been applied to several studies, including the expression profile analysis of hormone-induced changes in tumor cells, and is currently being adopted for analyses of pathogen/host genomic and proteomic data produced by the NIAID Biodefense Proteomic Program. We present a case study in which proteomes of various stages of melanosomes from human melanoma cells were analyzed using the iProXpress system, to illustrate its utility in facilitating a better understanding of the pathways of melanin synthesis and melanosome biogenesis.

System Designs
iProXpress consists of three major components:
• The PIR (Protein Information Resource) data warehouse with integrated protein information,
• Analytical tools for sequence analysis and functional annotation, and
• A graphical user interface for categorization and visualization of expression data.

Major Functionalities
• Gene/Peptide to Protein Mapping. Gene or protein lists are mapped to corresponding entries in the UniProt Knowledgebase (UniProtKB) of all known proteins based on gene/protein IDs, names or sequences.
• Protein Information Matrix. A comprehensive matrix is generated, summarizing salient features including Gene Ontology (GO) terms and pathways, either retrieved from the underlying PIR protein databases with annotated experimental literature information or inferred based on sequence similarity.
• Protein Data Analysis. By iterative categorization and sorting of proteins in the information matrix, users can correlate expression/interaction patterns to protein properties for pathway and network discovery.
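The iterative categorization and sorting described above can be pictured with a small sketch. This is not the iProXpress implementation: the field names (`accession`, `function`, `stage`) and the helper functions are hypothetical, and the five rows are only illustrative, with accessions, functions and stage labels copied from Table 1 of this poster.

```python
from collections import defaultdict

# Illustrative rows only: accessions, functions and stage labels are copied
# from Table 1 of this poster; the field names are hypothetical.
MATRIX = [
    {"accession": "P14679", "function": "melanogenic enzyme", "stage": "Unique MNT1"},
    {"accession": "P17643", "function": "melanosomal enzyme/stabilizing factor", "stage": "Unique MNT1"},
    {"accession": "P51159", "function": "melanosome transport", "stage": "Common all stage"},
    {"accession": "Q14956", "function": "Apparent melanosomal component", "stage": "Common all stage"},
    {"accession": "P40967", "function": "melanosomal matrix protein", "stage": "Common all stage"},
]


def categorize(rows, attribute):
    """Group accessions by the value of one matrix attribute."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[attribute]].append(row["accession"])
    return dict(groups)


def sort_groups(groups):
    """Order categories by size (largest first), then by name."""
    return sorted(groups.items(), key=lambda item: (-len(item[1]), item[0]))


def select(rows, attribute, value):
    """Keep only the rows in one category, ready for the next iteration."""
    return [row for row in rows if row[attribute] == value]


# First pass: categorize by stage label and sort the categories.
by_stage = sort_groups(categorize(MATRIX, "stage"))
# Second pass: drill into one category and re-categorize by function.
mnt1_only = categorize(select(MATRIX, "stage", "Unique MNT1"), "function")
```

Each pass narrows the matrix to one category and regroups it by another attribute, which is one plausible reading of "iterative" here; a real matrix would carry many more attributes (GO, pathway, family and so on).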

Melanosome Proteome Analysis
• ID mapping. A total of 2298 gi numbers were mapped using the UniProt/PIR ID mapping service, which converts common gene/protein IDs (e.g., gi number) to UniProtKB AC/ID and vice versa. Of these, 1253 were mapped to UniProtKB sequences.
• Peptide mapping. Peptides from MS data were matched against all human sequences in UniProtKB with a two-step procedure: direct mapping, or mapping using UniRef90 clusters (sequences with 90% or more identity grouped in one cluster), giving 1506 mapped proteins. Combining both ID and peptide mappings, 1936 (84%) of the proteins were mapped to 1532 UniProtKB sequences.
• Protein information matrix. A comprehensive protein information matrix was generated for the melanosome proteins from the underlying PIR data warehouse (UniProt/iProClass) or inferred based on sequence homologies. Attributes in the matrix include protein name, family, domain/motif/site, isoform, post-translational modification, GO, function/functional category, structure/structural classification, pathway, protein interaction and complex, etc.
• Melanosome proteome analysis. The melanosome proteome datasets were partitioned into 12 groups, and categorization and sorting functions were provided for each group or for all datasets, based on the protein information matrices, especially gene ontologies (GO) and pathways (KEGG and BioCarta). Iterative categorization and sorting of proteins based on functions, pathways, and/or other attributes were carried out to generate various protein clusters, from which interesting unique or common proteins at different stages of melanosome biogenesis can be identified in combination with manual examination.
• Comparative analysis of organelle proteomes. Melanosome proteomes of early or late stages were compared with other organelle proteomes such as lysosome, synaptosome and endosome. This comparative analysis, coupled with other bioinformatic analyses, was aimed at deducing a set of signature proteins characteristic of the melanosome.
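The two-step peptide mapping and the combination with ID mapping can be sketched as follows. This is a minimal illustration under stated assumptions, not the PIR service: the function `map_peptide`, the toy accessions (`ACC1`, `ACC2`) and the toy sequences are hypothetical, and the cluster fallback (match the cluster representative, then return the cluster members) is one assumed reading of "mapping using UniRef90 clusters". The final lines only re-derive figures from the counts reported above.

```python
def map_peptide(peptide, sequences, clusters):
    """Map one peptide to accessions: direct match first, cluster fallback second."""
    # Step 1: direct mapping, exact substring match against known sequences.
    direct = {acc for acc, seq in sequences.items() if peptide in seq}
    if direct:
        return direct
    # Step 2 (assumed behavior): match a cluster representative sequence
    # and report the accessions grouped in that cluster.
    hits = set()
    for representative, members in clusters:
        if peptide in representative:
            hits.update(members)
    return hits


# Toy data: hypothetical accessions and sequences, not from the poster.
SEQUENCES = {"ACC1": "MKTAYIAKQR", "ACC2": "MSLLGGHPWV"}
CLUSTERS = [("QQWERTYPLK", ["ACC2"])]

id_mapped = {"ACC1"}          # result of the ID-mapping step
peptide_mapped = set()        # result of the peptide-mapping step
for peptide in ["TAYIAK", "WERTY", "CCCCC"]:
    peptide_mapped |= map_peptide(peptide, SEQUENCES, CLUSTERS)

# Combining both mappings is a set union, so shared hits count once.
combined = id_mapped | peptide_mapped

# Figures reported on the poster: 1936 of 2298 entries mapped overall.
coverage_percent = round(100 * 1936 / 2298)
# If all three counts refer to the same input entries (an assumption),
# inclusion-exclusion gives the number mapped by both routes.
mapped_by_both = 1253 + 1506 - 1936
```

The union explains why the combined count (1936) is smaller than the sum of the two individual counts (1253 + 1506): entries reached by both routes are counted once.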

Sorting of data sets and display of protein information matrix

Functional categorization based on Gene Ontology

Two-way comparative matrix

NIAID Biodefense Proteomics Program

Master Protein Directory from NIAID Biodefense Proteomic Research Centers.

Browsing selected complete proteome(s) with protein links to the Proteomic Center data.

A Case Study

Organellar proteomes of various stages of melanosomes from human melanoma cell lines
• Mapping to known mouse coat color genes led to identification of 17 human melanosome-related proteins;
• Identification of possible stage-specific melanosome proteins for validation;
• Comparison of the melanosome proteome with those of several other organelles.

Table 1. Mapping of mouse color genes to human melanosome proteins

Gene Symbol | Murine Locus | Function in Pigmentation | Human Melanosome Proteins | Human Disease (OMIM)
gdn | golden (gdn) | Causes delayed and reduced development of melanin pigmentation | Q71RS6 : Ion transporter JSX (Unique late stage) | SLC24A5, a human skin color gene (Science 16 December 2005: 1754-1755)
Rab27a | ashen (ash) | melanosome transport | Q6IAS8 : RAB27A protein; P51159 : Ras-related protein Rab-27A (Common all stage) | Griscelli syndrome, type 2 [607624]
*Matp | underwhite (uw) | transporter | Q9UMX9 : Melanoma antigen AIM1; Q6P2P0 : Membrane-associated transporter protein, isoform b (MNT1 stage1 & 2) | OCA4 [606574]
*Gpnmb | iris pigment dispersion (ipd) | Apparent melanosomal component | Q14956 : Transmembrane glycoprotein NMB precursor (Common all stage) | Glaucoma-related pigment dispersion syndrome-1 [604368]
*Tyrp1 | brown (b) | melanosomal enzyme/stabilizing factor | P17643 : 5,6-dihydroxyindole-2-carboxylic acid oxidase precursor (Unique MNT1) | Rufous albinism, ROCA [115501]; OCA3 [203290]; Precocious graying of hair [278400]
*Tyr | albino, color (c) | melanogenic enzyme | P14679 : Tyrosinase precursor (Unique MNT1) | OCA1 [203100]; OCA1B [606952]…
*si | silver (si) | melanosomal matrix protein | P40967 : Pmel 17 precursor (Common all stage) | Oculocutaneous albinism [155550]
*Rab38 | chocolate (cht) | Targeting of Tyrp1 protein to the melanosome | P57729 : Ras-related protein Rab-38 (unique Skmel28) | [606281]
Sfxn1 | flexed tail | Tricarboxylate carrier | Q9H9B4 : Sideroflexin-1 (Common early stage) |
Ednrb | piebald spotting (s) | melanoblast differentiation | P24530 : ET-B (Common stage1 & MNT1 stage2) | Waardenburg-shah syndrome [277580]…

This is a partial list of the 17 mapped genes. The others include Lyst, Ostm1, Dct, Atp7a, Gpr143, Myo5a and Krt2-17. For the complete list, go to http://pir.georgetown.edu/~huz/datamining/proteomics/

Table 2. Partial list of stage-specific melanosome proteins

MNT1 Stage I (77)

Both melanosome-specific proteins Tyrosinase and TYRP1 are absent in the Skmel28 data set, suggesting that their absence can partially account for the lack of melanin production in Skmel28.
P57729 : Ras-related protein Rab-38 (=mouse Rab38)
P51810 : G-protein coupled receptor 143 (=mouse Gpr143)
P53794 : Sodium/myo-inositol cotransporter (Na(+)/myo-inositol cotransporter)

Skmel28 unique (143)

Q12846 : Syntaxin-4 (TM) (interact with O75379 Vesicle-associated membrane protein 4; Q15836 Vesicle-associated membrane protein 3)
Q04656 : Copper-transporting ATPase 1 (=mouse Atp7a)
Adaptor proteins:
O95782 : AP-2 (~mouse Ap3bl) (also Skmel)
Q96EL6 : Adaptin (~mouse Ap3d)
Vacuolar protein sorting:
Q96A65 : Exocyst complex component Sec8
P46459 : Vesicle-fusing ATPase (EC 3.6.4.6) (interact)
P14415 : Sodium/potassium-transporting ATPase beta-2 chain (TM)

MNT1 Stage II (112)

P36955 : Pigment epithelium-derived factor precursor (PEDF)
Q14254 : Flotillin-2 (Epidermal surface antigen)
P07093 : Glia derived nexin precursor (GDN)
P24390 : ER lumen protein retaining receptor 1 (KDEL receptor 1) (TM)
O14880 : Microsomal glutathione S-transferase 3 (TM)

MNT4 Stage IV (267)

Adaptor protein:
Q9Y6Q5 : Adaptor protein complex AP-1 mu-2 subunit (interact with P63010: AP2B1)
Motor proteins:
Q14203 : Dynactin-1 (Progressive lower motor neuron disease [OMIM:607641]) (interact with P18669: Phosphoglycerate mutase 1)
Q9H193 : KINESIN-13A2
Transport:
Q99747 : Gamma-soluble NSF attachment protein (also Skmel)
Q99698 : Lysosomal trafficking regulator (=mouse Lyst)
Vacuolar protein sorting:
Q9H444 : VPS32 (~mouse Vps33a) (interact) (also Skmel)
Q9NZZ3 : VPS60 (~mouse Vps33a)

All stage IV-specific membrane protein:
P50443 : Sulfate transporter (Solute carrier family 26 member 2) (OMIM: 600972)
Q9NZ45 : Protein C10orf70
P33121 : Long-chain-fatty-acid--CoA ligase 1 (LACS 1) (OMIM: 152425) (also in ER)
Q8NCC2 : Hypothetical protein FLJ90355 (Solute carrier family 2
Q8IWB8 : CCR4-NOT transcription complex, subunit 1, isoform b
P27449 : Vacuolar ATP synthase 16 kDa proteolipid subunit (EC 3.6.3.14) (OMIM: 108745)
Q71RS6 : Ion transporter JSX [Homo sapiens] - human skin color gene
Q6ZTT7 : Hypothetical protein FLJ44232 [Homo sapiens]
Q16444 : Phosphoglycerate kinase (Fragment) [Homo sapiens] (47 aa)

Table 3. Summary of the comparison of organellar proteomes

Organelle | # Protein reported | # Entries mapped | Common with melanosome * | References
Human exosome (EX) | ~56 | 55 | 42 (M:42, S:32) - 18 (43%) all stages | Mears et al, 2004
Human platelet (PL) | ~93 | 71 | 33 (M:26, S:22) - 6 (18%) all stages | Martens et al, 2005
Rat lysosome (LY) | 215 | 116 | 49 (M:40, S:38) - 13 (27%) all stages | Bagshaw et al, 2005
Rat synaptosome (SY) | 200 | 88 | 43 (M:35, S:27) - 14 (33%) all stages | Witzmann et al, 2005
Human neuromelanin granules (NG) | 72 | 72 | 43 (M:38, S:36) - 22 (51%) all stages | Tribl et al, 2005
Mouse ER (ER) | ~141 | 131 | 57 (M:51, S:36) - 19 (33%) all stages | Knoblach et al, 2003
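The percentages in the "Common with melanosome" column of Table 3 appear to be the all-stages count divided by the total number of proteins in common with the melanosome. The poster does not state this formula, so the sketch below treats it as an assumption and simply checks it against the six reported rows; the tuple layout and the function name `all_stage_percent` are hypothetical.

```python
# Each row: (organelle, proteins common with melanosome,
#            proteins common in all stages, percentage reported in Table 3)
TABLE3 = [
    ("Human exosome (EX)", 42, 18, 43),
    ("Human platelet (PL)", 33, 6, 18),
    ("Rat lysosome (LY)", 49, 13, 27),
    ("Rat synaptosome (SY)", 43, 14, 33),
    ("Human neuromelanin granules (NG)", 43, 22, 51),
    ("Mouse ER (ER)", 57, 19, 33),
]


def all_stage_percent(common, all_stages):
    """Share of the common proteins that occur in all stages, as a whole percent."""
    return round(100 * all_stages / common)


# Recompute every percentage and list any row that disagrees with the table.
recomputed = {name: all_stage_percent(common, all_stages)
              for name, common, all_stages, _ in TABLE3}
mismatches = [name for name, common, all_stages, reported in TABLE3
              if all_stage_percent(common, all_stages) != reported]
```

Under this reading, the neuromelanin granules share the largest fraction of their common proteins across all melanosome stages (22 of 43), while platelets share the smallest (6 of 33).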

http://pir.georgetown.edu/iproxpress