Title: Assign Pathways to Gene Set, June 21, 2007, Guanming Wu



Page 1: Assign Pathways to Gene Set

June 21, 2007

Guanming Wu

Page 2: Contents

• Recap of what I have done previously

• Introduction to the dataset from Scott Powers

• Statistical model used

• Results

• Summary and future directions

Page 3: Results from Previous Talks

Data Source | Protein (SwissProt) | Coverage (SwissProt) | Interaction | Citation
Reactome | 1492 (1200) | 5.9% (8.3%) | 45189 | 1
Panther | 2997 (1670) | 11.9% (11.6%) | 75694 | 2
CellMap | 567 (567) | 2.2% (3.9%) | 1195 | 3
INOH | 719 (711) | 2.9% (4.9%) | 11759 | 4
NCI-Nature | 607 (593) | 2.4% (4.1%) | 2945 | 5
NCI-BioCarta | 938 (936) | 3.7% (6.5%) | 4675 | 5
KEGG | 2053 (1947) | 8.1% (13.5%) | 11606 | 6
Total | 5461 (3743) | 21.7% (25.9%) | 142498 |

Notes:
1. Values in parentheses are numbers of proteins in SwissProt (column 2) or coverage in SwissProt (column 3).
2. Coverages in column 3 were calculated by dividing the numbers in column 2 by the total number of HPRD entries (25205) or SwissProt entries (14446).
3. Citations:
1). Joshi-Tope G. et al. Nucleic Acids Res. 33: D428-32 (2005)
2). Huaiyu Mi et al. Nucleic Acids Res. 35: D247-D252 (2007)
3). http://cancer.cellmap.org
4). http://www.inoh.org
5). http://pid.nci.nih.org
6). http://www.genome.jp/kegg
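Note 2 can be checked with a few lines of arithmetic. The sketch below recomputes the Reactome and Total rows from the counts in the table; the function name `coverage_percent` is illustrative and not part of the talk.

```python
HPRD_ENTRIES = 25205       # total HPRD entries, from note 2
SWISSPROT_ENTRIES = 14446  # total SwissProt entries, from note 2

def coverage_percent(protein_count: int, total_entries: int) -> float:
    """Return coverage as a percentage rounded to one decimal place."""
    return round(100.0 * protein_count / total_entries, 1)

# (coverage over HPRD, coverage over SwissProt) for two rows of the table
reactome = (coverage_percent(1492, HPRD_ENTRIES),
            coverage_percent(1200, SWISSPROT_ENTRIES))
total = (coverage_percent(5461, HPRD_ENTRIES),
         coverage_percent(3743, SWISSPROT_ENTRIES))
print(reactome, total)
```

Both rows agree with the percentages shown in the table.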

Page 4: Results from Previous Talks

Data Source | Protein (SwissProt) | Coverage (SwissProt) | Interaction
Pathways | 5461 (3743) | 21.7% (25.9%) | 142498
PPIs | 11977 (7097) | 47.5% (49.1%) | 56616
Total | 14367 (7873) | 57.0% (54.5%) | 194323

Naïve Bayes Classifier

Page 5: Dataset from Scott Powers

CHR | Start.Pos | Length | Min | Samples | Frequency | Genes
1 | 244971182 | 449202 | 2.00032 | CHAGO-K-1;DMS79;KN0119;KN0144… | 14 | OR2T29,PGBD2,OR2T34…
8 | 128167230 | 873909 | 10.8613 | H1437;H647;H647;HCC-44;SK-LC-17… | 9 | MYC
7 | 54277584 | 816214 | 4.29282 | KN0850;KN0244;NCI-H1838;H1792… | 7 | EGFR,SEC61G,MGC33530…
… | … | … | … | … | … | …
23 | 147222115 | 1329556 | 1.86815 | KN0148 | 1 | CXorf40A,FAM11A,LW-1…
24 | 947914 | 1544960 | 1.60234 | KN0397 | 1 | CXYorf2,IL3RA,ZBED1…
24 | 57500280 | 152390 | 1.66632 | KN0129 | 1 | SYBL1

• Lung cancer samples or cell lines: 135

• Amplified fragments: 365

• Genes contained by fragments: 3900

Question: How to find statistically significant pathways for these genes?

Page 6: A Simple Model

Binomial Test

Bonferroni Correction

?
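As a rough illustration of the binomial test in this model, the sketch below computes an upper-tail binomial p-value using only the Python standard library. It assumes the test asks how likely it is to see at least `k` pathway proteins among `n` sample proteins when each has probability `p` (the PathwayProteinRatio) of belonging to the pathway. The talk does not state `n`, so the value 1000 below is hypothetical, and the result is not expected to match the slide values.

```python
import math

def binomial_upper_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p), assuming 0 < p < 1.

    Each term is computed in log space so that large n does not overflow.
    """
    log_p, log_q = math.log(p), math.log1p(-p)
    total = 0.0
    for i in range(k, n + 1):
        log_term = (math.lgamma(n + 1) - math.lgamma(i + 1)
                    - math.lgamma(n - i + 1)
                    + i * log_p + (n - i) * log_q)
        total += math.exp(log_term)
    return min(1.0, total)

pathway_ratio = 0.013     # PathwayProteinRatio of one pathway (from the talk)
proteins_in_sample = 33   # ProteinFromSample for that pathway (from the talk)
sample_size = 1000        # hypothetical: the talk does not give this number

p_value = binomial_upper_tail(proteins_in_sample, sample_size, pathway_ratio)
print(p_value)
```

A small p-value means the pathway contains more sample proteins than its share of all proteins would suggest.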

Page 7: Results from Simple Model

Pathway | PathwayProteinRatio | PathwayProteinNumber | ProteinFromSample | P-Value
B cell receptor signaling pathway(I) | 0.013 | 83 | 33 | 2.90E-07
TGFBR(C) | 0.02 | 128 | 41 | 2.70E-06
Cell adhesion molecules (CAMs)(K) | 0.018 | 119 | 36 | 3.50E-05
Jak-STAT signaling pathway(K) | 0.015 | 98 | 30 | 0.00012
EGFR1(C) | 0.023 | 149 | 40 | 0.00016
Signaling events mediated by PTP1B(N) | 0.0059 | 38 | 15 | 0.00052
Focal adhesion(K) | 0.021 | 137 | 35 | 0.00097
Taste transduction(K) | 0.007 | 45 | 16 | 0.001
PDGFRA Signaling Pathway(N) | 0.0028 | 18 | 9 | 0.0013
il-2 receptor beta chain in t cell activation(B) | 0.0059 | 38 | 14 | 0.0014
il 4 signaling pathway(B) | 0.0023 | 15 | 8 | 0.0016
keratinocyte differentiation(B) | 0.0061 | 39 | 14 | 0.0018
BCR Signaling Pathway(N) | 0.0093 | 59 | 18 | 0.0027
Nucleotide metabolism(R) | 0.013 | 85 | 23 | 0.0034
Leukocyte transendothelial migration(K) | 0.012 | 80 | 22 | 0.0034
FGF signaling pathway(I) | 0.0088 | 56 | 17 | 0.0036
fc epsilon receptor i signaling in mast cells(B) | 0.0045 | 29 | 11 | 0.0036
Hemostasis(R) | 0.015 | 96 | 25 | 0.0038
Regulation of Telomerase(N) | 0.0089 | 57 | 17 | 0.0043
… | … | … | … | …

Page 8: Results from Simple Model

Bonferroni correction: P-values × 536 (536: number of pathways)

Pathway | PathwayProteinRatio | PathwayProteinNumber | ProteinFromSample | P-Value
B cell receptor signaling pathway(I) | 0.013 | 83 | 33 | 1.50E-04
TGFBR(C) | 0.02 | 128 | 41 | 1.40E-03
Cell adhesion molecules (CAMs)(K) | 0.018 | 119 | 36 | 1.90E-02
Jak-STAT signaling pathway(K) | 0.015 | 98 | 30 | 0.064
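The Bonferroni step is a single multiplication, capped at 1. The sketch below applies it to the rounded p-values shown earlier in the talk, so the results can differ slightly from the slide in the last digit. The 0.05 significance threshold is an assumption for illustration; the talk does not state one here.

```python
def bonferroni(p_value: float, n_tests: int) -> float:
    """Bonferroni-corrected p-value: multiply by the number of tests, cap at 1."""
    return min(1.0, p_value * n_tests)

N_PATHWAYS = 536  # number of pathways tested, from the talk

# Raw binomial p-values as displayed (rounded) in the talk
raw = {
    "B cell receptor signaling pathway(I)": 2.90e-07,
    "TGFBR(C)": 2.70e-06,
    "Cell adhesion molecules (CAMs)(K)": 3.50e-05,
    "Jak-STAT signaling pathway(K)": 0.00012,
}

corrected = {name: bonferroni(p, N_PATHWAYS) for name, p in raw.items()}
# Assumed threshold of 0.05, chosen here only for illustration
significant = [name for name, p in corrected.items() if p < 0.05]

for name, p in corrected.items():
    print(f"{name}: {p:.2e}")
```

Under that assumed threshold, the first three pathways remain significant and Jak-STAT signaling pathway(K) does not.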

Page 9: How to consider frequencies?

CHR | Start.Pos | Length | Min | Samples | Frequency | Genes
1 | 244971182 | 449202 | 2.00032 | CHAGO-K-1;DMS79;KN0119;KN0144… | 14 | OR2T29,PGBD2,OR2T34…
8 | 128167230 | 873909 | 10.8613 | H1437;H647;H647;HCC-44;SK-LC-17… | 9 | MYC
7 | 54277584 | 816214 | 4.29282 | KN0850;KN0244;NCI-H1838;H1792… | 7 | EGFR,SEC61G,MGC33530…
… | … | … | … | … | … | …
23 | 147222115 | 1329556 | 1.86815 | KN0148 | 1 | CXorf40A,FAM11A,LW-1…
24 | 947914 | 1544960 | 1.60234 | KN0397 | 1 | CXYorf2,IL3RA,ZBED1…
24 | 57500280 | 152390 | 1.66632 | KN0129 | 1 | SYBL1

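To take frequencies into account, a new list of genes was generated in which each gene is counted as many times as the frequency of its fragment.

E.g.: OR2T29 is counted 14 times, MYC 9 times, etc.

Total number of entries in the new list: 5717

The new list is the redundant set; the original gene list is the non-redundant set.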
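A minimal sketch of how the two gene lists could be built, assuming each fragment is available as a (frequency, genes) pair. Only the genes named on the slide are used; the gene lists there are truncated, so the totals below do not correspond to the 5717 reported in the talk. The function names are illustrative.

```python
from collections import Counter

# (frequency, genes) for three fragments; gene lists are truncated on the
# slide, so only the genes that are actually named there appear here.
fragments = [
    (14, ["OR2T29", "PGBD2", "OR2T34"]),
    (9, ["MYC"]),
    (7, ["EGFR", "SEC61G", "MGC33530"]),
]

def non_redundant_set(fragments):
    """Every gene counted once, regardless of fragment frequency."""
    return {gene for _, genes in fragments for gene in genes}

def redundant_counts(fragments):
    """Every gene counted once per sample carrying its fragment."""
    counts = Counter()
    for frequency, genes in fragments:
        for gene in genes:
            counts[gene] += frequency
    return counts

unique_genes = non_redundant_set(fragments)
counts = redundant_counts(fragments)
redundant_list = list(counts.elements())  # genes repeated by frequency
print(len(unique_genes), len(redundant_list), counts["OR2T29"], counts["MYC"])
```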

Page 10: Results from Simple Model - Redundant Set

Pathway | PathwayProteinRatio | PathwayProteinNumber | ProteinFromSample | P-Value
B cell receptor signaling pathway(I) | 0.013 | 83 | 62 | 2.22E-16
FGF signaling pathway(I) | 0.0088 | 56 | 46 | 5.36E-14
Melanoma(K) | 0.0061 | 39 | 33 | 9.02E-11
Adherens junction(K) | 0.011 | 74 | 47 | 1.65E-10
EGFR1(C) | 0.023 | 149 | 73 | 2.36E-10
Focal adhesion(K) | 0.021 | 137 | 69 | 2.40E-10
Pancreatic cancer(K) | 0.01 | 64 | 42 | 5.96E-10
Signaling events mediated by PTP1B(N) | 0.0059 | 38 | 31 | 7.79E-10
TGFBR(C) | 0.02 | 128 | 64 | 1.42E-09
Glioma(K) | 0.0055 | 35 | 28 | 7.60E-09
keratinocyte differentiation(B) | 0.0061 | 39 | 29 | 1.97E-08
MAPK signaling pathway(K) | 0.04 | 257 | 100 | 2.16E-08
nf-kb signaling pathway(B) | 0.0031 | 20 | 20 | 3.14E-08
p53 signaling pathway(B) | 0.002 | 13 | 16 | 4.74E-08
influence of ras and rho proteins on g1 to s transition(B) | 0.0037 | 24 | 21 | 1.30E-07
trefoil factors initiate mucosal healing(B) | 0.0045 | 29 | 23 | 1.86E-07
agrin in postsynaptic differentiation(B) | 0.0036 | 23 | 20 | 2.82E-07
Apoptosis(K) | 0.012 | 79 | 41 | 4.94E-07
angiotensin ii mediated activation of jnk pathway via pyk2 dependent signaling(B) | 0.0042 | 27 | 21 | 8.44E-07
Atypical NF kappa B Pathway(N) | 0.002 | 13 | 14 | 1.51E-06
… | … | … | … | …

Bonferroni Correction cannot make any difference!

Page 11: Permutation Based Model

Sampling genes

Binomial test

Filtering out hit pathways based on cut-off value

Repeat the steps above 1000 times (1000 permutations)

Counting occurrences of pathways

Generating a mapping file

Binomial test of actual sample

Correcting sample p values using mapping file

Choosing cut-off p-value
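The flow on this slide can be sketched as follows. This is one possible reading, not the talk's actual implementation: the cut-off is set from the observed p-value of the pathway of interest, random gene lists are tested the same way, and the corrected p-value is the fraction of permutations in which that pathway passes the cut-off. The gene universe, the pathways "A" and "B", the sampler, and all function names are hypothetical toy data, and 200 permutations are used instead of 1000 to keep the example fast.

```python
import math
import random
from collections import Counter

def binomial_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), 0 < p < 1, summed in log space."""
    log_p, log_q = math.log(p), math.log1p(-p)
    total = 0.0
    for i in range(k, n + 1):
        log_term = (math.lgamma(n + 1) - math.lgamma(i + 1)
                    - math.lgamma(n - i + 1)
                    + i * log_p + (n - i) * log_q)
        total += math.exp(log_term)
    return min(1.0, total)

def pathway_p_values(genes, pathways, universe_size):
    """Binomial test of one gene list against every pathway."""
    gene_set = set(genes)
    p_values = {}
    for name, members in pathways.items():
        hits = len(gene_set & members)
        ratio = len(members) / universe_size
        p_values[name] = binomial_upper_tail(hits, len(gene_set), ratio)
    return p_values

def count_occurrences(draw_genes, pathways, universe_size, cutoff, n_perm):
    """Mapping: pathway -> number of permutations with p-value <= cutoff."""
    occurrences = Counter()
    for _ in range(n_perm):
        p_values = pathway_p_values(draw_genes(), pathways, universe_size)
        occurrences.update(n for n, p in p_values.items() if p <= cutoff)
    return occurrences

# Toy data: everything below is hypothetical.
universe = [f"G{i}" for i in range(200)]
pathways = {"A": set(universe[:20]), "B": set(universe[100:130])}
actual_sample = universe[:15] + universe[150:165]  # enriched for pathway A

rng = random.Random(0)
n_perm = 200
observed = pathway_p_values(actual_sample, pathways, len(universe))
cutoff = observed["A"]  # cut-off taken from the pathway being corrected
occurrences = count_occurrences(
    lambda: rng.sample(universe, len(actual_sample)),
    pathways, len(universe), cutoff, n_perm)
corrected_a = occurrences["A"] / n_perm  # 0 means "< 1 / n_perm"
print(observed["A"], corrected_a)
```

Here the random sampler draws genes uniformly; any other sampling scheme can be swapped in by passing a different `draw_genes` callable.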

Page 12: Sampling Genes

Chromosome segment based: Using a fixed length to sample a chromosome based on CNV information

Example:

Chromosome 1
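The slide gives no further detail, so the following is only one plausible interpretation of segment-based sampling: for each observed amplified fragment, a segment of the same length is placed at a random position on the same chromosome, and the genes falling inside it are collected. Chromosome lengths, gene names, and gene positions below are hypothetical toy values.

```python
import random

def sample_segment_genes(fragments, chrom_lengths, gene_positions, rng):
    """Draw one random segment per fragment and return the genes inside.

    fragments:      list of (chromosome, segment_length)
    chrom_lengths:  chromosome -> length
    gene_positions: gene -> (chromosome, position)
    """
    sampled = set()
    for chrom, length in fragments:
        max_start = max(0, chrom_lengths[chrom] - length)
        start = rng.randint(0, max_start)
        end = start + length
        for gene, (gene_chrom, position) in gene_positions.items():
            if gene_chrom == chrom and start <= position < end:
                sampled.add(gene)
    return sampled

# Hypothetical toy genome
chrom_lengths = {"1": 1000, "8": 500}
gene_positions = {
    "geneA": ("1", 50), "geneB": ("1", 420), "geneC": ("1", 900),
    "geneD": ("8", 10), "geneE": ("8", 250), "geneF": ("8", 499),
}
# One short fragment on chromosome 1, one spanning all of chromosome 8
fragments = [("1", 100), ("8", 500)]

rng = random.Random(42)
sampled = sample_segment_genes(fragments, chrom_lengths, gene_positions, rng)
print(sorted(sampled))
```

Sampling whole segments keeps neighboring genes together, which mirrors how amplified fragments carry several adjacent genes at once.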

Page 13: One Run

The binomial-test results for the actual sample are the same table shown on Page 7. The highlighted row is:

Pathway | PathwayProteinRatio | PathwayProteinNumber | ProteinFromSample | P-Value
B cell receptor signaling pathway(I) | 0.013 | 83 | 33 | 2.90E-07

The p-value of B cell receptor signaling pathway(I), 2.9E-07, gives a cut-off p-value of 3.0E-07 for this run.

Page 14:

p-value: 3.0E-7, permutation: 1000

Pathway | Occurrence
Mammalian Wnt signaling pathway Diagram(I) | 1
Translation(R) | 1
Antigen processing and presentation(K) | 12
Heterotrimeric GPCR signaling pathway (through G alpha s ACs Epac BRaf and ERKcascade)(I) | 8
Regulation of Telomerase(N) | 1
IFN alpha signaling pathway(JAK1 TYK2 STAT3)(I) | 68
MAPK signaling pathway(K) | 2
Wnt signaling pathway(P) | 33
Heterotrimeric GPCR signaling pathway (through G alpha q, PLC beta and ERK cascade)(I) | 11
Heteromeric GPCR signaling pathway (through_G alpha s_ACs_PKA_BRaf_and_ERKcascade)(canonical)(I) | 13
Natural killer cell mediated cytotoxicity(K) | 3
Neuroactive ligand-receptor interaction(K) | 4
Complement and coagulation cascades(K) | 1
Xenobiotic metabolism(R) | 2
Toll-like receptor signaling pathway(K) | 10
Taste transduction(K) | 21
IFN alpha signaling pathway((JAK1 TYK2 STAT1 STAT2)(I) | 57
JAK-STAT pathway and regulation pathway(I) | 7
Heterotrimeric GPCR signaling pathway (through G alpha i and pertussis toxin)(I) | 11
Insulin signaling pathway(K) | 1
GPCR signaling (cholera toxin)(I) | 12
C. elegans endoderm induction Wnt signaling pathway Diagram(I) | 1
Cadherin signaling pathway(P) | 140
Chromosome Maintenance(R) | 10
Xenopus axis formation Wnt signaling pathway Diagram(I) | 1
JAK STAT MolecularVariation(I) | 15
Cell adhesion molecules (CAMs)(K) | 1
IFN alpha signaling pathway(JAK1 TYK2 STAT1 STAT3)(I) | 67
IFN alpha signaling pathway(JAK1 TYK2 STAT1)(I) | 71
Drosophila Toll-like receptor signaling(I) | 2
Heterotrimeric GTP-binding protein coupled receptor signaling pathway (through G alpha i, adenylate cyclase and cAMP)(I) | 11
TNF alpha/NF-kB(C) | 2
Hemostasis(R) | 1
Role of HDAC Class III(N) | 9
Drosophila Wingless/Wnt signaling pathway Diagram(I) | 1
Cytokine-cytokine receptor interaction(K) | 28

Total Topics: 36

B cell receptor signaling pathway(I) does not appear in this list (0 occurrences in 1000 permutations), so its p value is < 0.001.

Page 15: Another Run

The binomial-test results for the actual sample are the same table shown on Page 7. The highlighted row is:

Pathway | PathwayProteinRatio | PathwayProteinNumber | ProteinFromSample | P-Value
TGFBR(C) | 0.02 | 128 | 41 | 2.70E-06

The p-value of TGFBR(C), 2.7E-06, gives a cut-off p-value of 3.0E-06 for this run.

Page 16:

p-value: 3.0E-6, permutation: 1000

Pathway | Occurrence
Antigen processing and presentation(K) | 18
Apoptosis(K) | 3
Wnt(C) | 1
VEGF signaling pathway(I) | 1
MAPK signaling pathway(K) | 6
Wnt signaling pathway(P) | 56
TGF-beta signaling pathway(K) | 1
Heteromeric GPCR signaling pathway (through_G alpha s_ACs_PKA_BRaf_and_ERKcascade)(canonical)(I) | 35
mRNA Processing(R) | 1
Neuroactive ligand-receptor interaction(K) | 19
Signaling events mediated by VEGFR1 and VEGFR2(N) | 1
IFN alpha signaling pathway((JAK1 TYK2 STAT1 STAT2)(I) | 78
Wnt signaling pathway(K) | 1
Heterotrimeric GPCR signaling pathway (through G alpha i and pertussis toxin)(I) | 36
Jak-STAT signaling pathway(K) | 1
DNA Replication(R) | 1
Chromosome Maintenance(R) | 18
Opioid Signalling(R) | 1
JAK STAT MolecularVariation(I) | 45
Formation and Maturation of mRNA Transcript(R) | 1
Metabolism of glucose, other sugars, and ethanol(R) | 2
Drosophila Toll-like receptor signaling(I) | 17
IFN alpha signaling pathway(JAK1 TYK2 STAT1)(I) | 82
VEGF signaling pathway(K) | 1
TGFBR(C) | 2
Heterotrimeric GTP-binding protein coupled receptor signaling pathway (through G alpha i, adenylate cyclase and cAMP)(I) | 36
TNF alpha/NF-kB(C) | 2
Hemostasis(R) | 1
Heterotrimeric GPCR signaling pathway (through G alpha s ACs Epac BRaf and ERKcascade)(I) | 32
EGFR1(C) | 2
Cell cycle(K) | 1
IFN alpha signaling pathway(JAK1 TYK2 STAT3)(I) | 82
Lipid metabolism(R) | 1
Heterotrimeric GPCR signaling pathway (through G alpha q, PLC beta and ERK cascade)(I) | 30
TCR pathway (CD4 positive cells)(N) | 2
Natural killer cell mediated cytotoxicity(K) | 10
Complement and coagulation cascades(K) | 4
Xenobiotic metabolism(R) | 4
Toll-like receptor signaling pathway(K) | 16
Taste transduction(K) | 37
JAK-STAT pathway and regulation pathway(I) | 23
Insulin signaling pathway(K) | 1
GPCR signaling (cholera toxin)(I) | 34
Cadherin signaling pathway(P) | 143
angiotensin ii mediated activation of jnk pathway via pyk2 dependent signaling(B) | 1
Cell adhesion molecules (CAMs)(K) | 1
IFN alpha signaling pathway(JAK1 TYK2 STAT1 STAT3)(I) | 80
Role of HDAC Class III(N) | 12
Cytokine-cytokine receptor interaction(K) | 50

Total Topics: 49

TGFBR(C) occurs 2 times in 1000 permutations, so its p value is 2/1000 = 0.002.
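The corrected p-value follows directly from the occurrence counts: it is the number of permutations in which the pathway passed the cut-off, divided by the number of permutations. The sketch below reproduces the two values from the talk; the function names are illustrative, and only a few entries of each occurrence list are included.

```python
def corrected_p_value(occurrences, pathway, n_permutations):
    """Fraction of permutations in which the pathway passed the cut-off."""
    return occurrences.get(pathway, 0) / n_permutations

def format_corrected(p, n_permutations):
    """A count of zero only bounds the p-value below 1 / n_permutations."""
    if p == 0:
        return f"<{1 / n_permutations:g}"
    return f"{p:g}"

N_PERMUTATIONS = 1000

# A few entries from the occurrence list at cut-off 3.0E-6
occurrences_3e6 = {
    "TGFBR(C)": 2,
    "Cadherin signaling pathway(P)": 143,
    "EGFR1(C)": 2,
}
# A few entries from the occurrence list at cut-off 3.0E-7;
# B cell receptor signaling pathway(I) is absent from that list.
occurrences_3e7 = {
    "Cadherin signaling pathway(P)": 140,
    "Taste transduction(K)": 21,
}

tgfbr = corrected_p_value(occurrences_3e6, "TGFBR(C)", N_PERMUTATIONS)
bcr = corrected_p_value(
    occurrences_3e7, "B cell receptor signaling pathway(I)", N_PERMUTATIONS)
print(format_corrected(tgfbr, N_PERMUTATIONS))
print(format_corrected(bcr, N_PERMUTATIONS))
```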

Page 17: Significantly Hit Pathways - Non-redundant Set

Pathways | p Values | Corrected p Values
B cell receptor signaling pathway(I) | 2.92E-07 | <0.001
TGFBR(C) | 2.74E-06 | 0.002
Cell adhesion molecules (CAMs)(K) | 3.53E-05 | 0.009
Jak-STAT signaling pathway(K) | 1.26E-04 | 0.01
Signaling events mediated by PTP1B(N) | 5.20E-04 | 0.02
Focal adhesion(K) | 9.74E-04 | 0.021

Page 18: Significantly Hit Pathways - Redundant Set

Pathways | p Values | Corrected p Values
B cell receptor signaling pathway(I) | 2.22E-16 | 0.002
FGF signaling pathway(I) | 5.36E-14 | 0.011
Focal adhesion(K) | 2.40E-10 | 0.02
Pancreatic cancer(K) | 5.96E-10 | 0.027
Signaling events mediated by PTP1B(N) | 7.79E-10 | 0.039
Glioma(K) | 7.60E-09 | 0.01
keratinocyte differentiation(B) | 1.97E-08 | 0.043
nf-kb signaling pathway(B) | 3.14E-08 | 0.002
influence of ras and rho proteins on g1 to s transition(B) | 1.30E-07 | 0.034
cyclins and cell cycle regulation(B) | 4.58E-06 | 0.039
cadmium induces dna synthesis and proliferation in macrophages(B) | 7.73E-06 | 0.016
PDGFRA Signaling Pathway(N) | 1.36E-05 | 0.044
cell cycle: g1/s check point(B) | 1.38E-05 | 0.047
EGF signaling pathway(I) | 4.45E-05 | 0.04
opposing roles of aif in apoptosis and cell survival(B) | 5.82E-05 | 0.034
acetylation and deacetylation of rela in nucleus(B) | 1.30E-04 | 0.028

Page 19: Results from A Simple Sampling

Sampling: Randomly pick 3900 genes from all human genes

Pathways | p Values | Corrected p Values
B cell receptor signaling pathway(I) | 2.92E-07 | <0.001
TGFBR(C) | 2.74E-06 | <0.001
Cell adhesion molecules (CAMs)(K) | 3.53E-05 | <0.001
Jak-STAT signaling pathway(K) | 1.26E-04 | 0.012
EGFR1(C) | 1.67E-04 | 0.037
Signaling events mediated by PTP1B(N) | 5.20E-04 | 0.007
Focal adhesion(K) | 9.74E-04 | 0.022
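The simple sampling scheme on this slide amounts to drawing 3900 genes uniformly at random, without replacement, from the full gene list. A minimal sketch, assuming a hypothetical universe of 25000 placeholder gene identifiers (the talk does not give the size of "all human genes"):

```python
import random

def simple_sample(all_genes, n_genes, rng):
    """Pick n_genes distinct genes uniformly at random."""
    return rng.sample(all_genes, n_genes)

# Hypothetical gene universe; identifiers and size are placeholders.
all_genes = [f"GENE{i}" for i in range(25000)]

rng = random.Random(2007)
sampled_genes = simple_sample(all_genes, 3900, rng)
print(len(sampled_genes), sampled_genes[:3])
```

Unlike segment-based sampling, this draw ignores chromosomal position, so genes that sit next to each other are no more likely to be picked together than any other pair.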

Page 20: Summary

• A framework has been built to look for statistically significant pathways for a list of genes

• Using this framework, we found several pathways linked to the gene set from lung cancer CNVs

• However, the relationships among these hit pathways and the genes in them need further investigation.

Page 21: Future Directions

• Validate the predicted results: Pick disease-related gene sets with known pathways (e.g. Type 1 diabetes)

• Develop a web-based application to deploy the combined network to end users.

• Develop methods based on graph theory to explore relationships among genes in hit pathways: protein interaction data will be used as bridges to traverse different pathways.

Page 22: Reference

Osier, MV, Zhao, H and Cheung, KH: Handling multiple testing while interpreting microarrays with the Gene Ontology Database. BMC Bioinformatics 2004, 5: 124

Page 23: Thanks!!!