Developing Semantic Pathway Alignment Algorithms for Systems Biology
Jonas Gamalielsson, 2006-09-06



Systems Biology
- Study the behaviour of complex biological systems
- Consider the interactions of all cellular/molecular parts
- Often studied over time
- Goal: develop models for system understanding
- Powerful computational tools are required
- Organisation: "Life's complexity pyramid" (Oltvai & Barabasi, 2002), with levels:
  - genes, mRNA, proteins & metabolites
  - regulatory motifs & metabolic pathways
  - higher-order networks, revealing functional modules
  - functional modules, which are hierarchically clustered


Thesis aim

To develop semantic pathway alignment algorithms for systems biology

Three related algorithms:
- GOTEM (GO-based regulatory TEMplates)
- GOSAP (GO-based Semantic Alignment of biological Pathways)
- EGOSAP (Evolutionary GO-based Semantic Alignment of biological Pathways)


Gene Ontology (GO)
[Figure: the GO root (Gene_Ontology) with its three sub-graphs: molecular_function (e.g. catalytic activity, transporter activity), biological_process (e.g. cellular process, development) and cellular_component (e.g. cell, extracellular). Gene products G1-G4 are annotated to terms in each of the function, process and component sub-graphs. Gx = gene product.]


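Semantic similarity
Example: Resnik (1995)
[Figure: fragment of an is-a hierarchy (not all nodes are shown). D (p = 0.10) is the parent of B (p = 0.03) and C (p = 0.07); A (p = 0.01) is a child of C.]
$ms(A,B) = D$, $p_{ms}(A,B) = p_D = 0.10$, $SS(A,B) = -\log_2(0.10) = 3.32$
where ms(A,B) is the most specific common ancestor of A and B, p is the probability of a term, and SS is the semantic similarity.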
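The Resnik calculation above can be sketched in a few lines of Python. This is a minimal illustration rather than the thesis implementation: the toy hierarchy encodes the figure (assuming A is a child of C), the probabilities are the ones shown on the slide, and the helper names `ancestors` and `resnik` are hypothetical.

```python
import math
from typing import Dict, List, Set, Tuple

# Toy is-a hierarchy from the slide: child term -> parent terms (hypothetical encoding)
PARENTS: Dict[str, List[str]] = {"A": ["C"], "B": ["D"], "C": ["D"], "D": []}
# Term probabilities shown on the slide
PROB: Dict[str, float] = {"A": 0.01, "B": 0.03, "C": 0.07, "D": 0.10}


def ancestors(term: str, parents: Dict[str, List[str]]) -> Set[str]:
    """Return the term itself together with all of its ancestors in the DAG."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in parents.get(stack.pop(), []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def resnik(a: str, b: str, parents: Dict[str, List[str]],
           prob: Dict[str, float]) -> Tuple[str, float]:
    """Return (ms(a, b), SS(a, b)): the shared ancestor with the lowest
    probability and its information content -log2 p."""
    common = ancestors(a, parents) & ancestors(b, parents)
    ms = min(common, key=lambda t: prob[t])  # assumes a and b share an ancestor
    return ms, -math.log2(prob[ms])


ms, ss = resnik("A", "B", PARENTS, PROB)
print(ms, round(ss, 2))  # D 3.32
```

Picking the minimum-probability shared ancestor is what makes ms "most specific": when annotation counts are propagated upwards, a term's probability is never smaller than that of its descendants.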


GOTEM: background 1(2)
- It is highly desirable to derive gene regulatory networks from gene expression data
- Reverse engineering (RE) algorithms derive a model (set of rules) that fits the data, e.g. Boolean networks, neural networks, Bayesian networks
- Limitations of RE algorithms:
  - many different model networks can fit the same data
  - few of the derived networks are actually biologically feasible
  - RE algorithms do not distinguish between biologically plausible and implausible networks
- Can the search space be reduced?
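The first limitation can be illustrated with a toy Boolean network. The sketch below uses hypothetical genes x, y, z and two invented observations (not data from the thesis); it enumerates all 16 two-input Boolean update rules for z and keeps those that reproduce the observations.

```python
from itertools import product
from typing import Dict, List, Tuple

State = Tuple[int, int]  # values of the regulators (x, y) at time t

# Hypothetical observations: (x, y) at time t -> z at time t + 1
OBSERVATIONS: Dict[State, int] = {(0, 0): 0, (1, 1): 1}

# All four possible input states, in a fixed order
INPUTS: List[State] = list(product([0, 1], repeat=2))


def consistent_rules(obs: Dict[State, int]) -> List[Dict[State, int]]:
    """Return every Boolean rule z' = f(x, y), as a truth table, that fits obs."""
    rules = []
    for outputs in product([0, 1], repeat=len(INPUTS)):
        table = dict(zip(INPUTS, outputs))
        if all(table[state] == z for state, z in obs.items()):
            rules.append(table)
    return rules


rules = consistent_rules(OBSERVATIONS)
print(len(rules))  # 4
```

The four surviving rules are z' = x AND y, z' = x OR y, z' = x and z' = y. The data alone cannot tell them apart, so additional knowledge is needed to prefer the biologically plausible ones.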


GOTEM: background 2(2)
We propose GOTEM: GO-based regulatory TEMplates [1,2]
Contribution:
- A means to distinguish between plausible and implausible networks
- GOTEM generalises knowledge about gene products using the molecular function part of Gene Ontology
- Binary semantic templates encoding general knowledge of regulation are derived from documented pathways and used to assess the biological plausibility of regulatory hypotheses

[1] Gamalielsson, J., Olsson, B., Nilsson, P. (2005). A Gene Ontology based Method for Assessing the Biological Plausibility of Regulatory Hypotheses. Technical report HS-IKI-TR-05-004, University of Skövde, Sweden.
[2] Gamalielsson, J., Nilsson, P., Olsson, B. (2006). A GO-based Method for Assessing the Biological Plausibility of Regulatory Hypotheses. In Proceedings of the 2nd International Workshop on Bioinformatics Research and Applications (IWBRA 2006), Reading, Great Britain, May 2006.


GOTEM
[Figure: GOTEM workflow; box shapes distinguish methods/algorithms from data/information.]
- GO + annotation databases -> GO term probability calculation -> enriched GO graph
- Model pathway databases -> extract binary pathway relations -> binary relations
- Binary relations + enriched GO graph -> template generation -> templates
- Templates + regulatory hypotheses -> hypothesis assessment -> scored & ranked hypotheses
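The "GO term probability calculation" step can be sketched as follows. This assumes one common choice, p(c) = the fraction of annotated gene products that are annotated to c or to any descendant of c; the slides do not give GOTEM's exact formula. The simplified term hierarchy, the gene products g1-g4 and the function name `term_probabilities` are hypothetical.

```python
from typing import Dict, List, Set

# Simplified toy hierarchy: child term -> parent terms (not the real GO structure)
PARENTS: Dict[str, List[str]] = {
    "catalytic activity": ["molecular_function"],
    "transporter activity": ["molecular_function"],
    "kinase activity": ["catalytic activity"],
}
# Hypothetical annotations: gene product -> directly annotated terms
ANNOTATIONS: Dict[str, Set[str]] = {
    "g1": {"kinase activity"},
    "g2": {"transporter activity"},
    "g3": {"catalytic activity"},
    "g4": {"transporter activity"},
}


def ancestors(term: str, parents: Dict[str, List[str]]) -> Set[str]:
    """Return the term itself together with all of its ancestors."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in parents.get(stack.pop(), []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def term_probabilities(annotations: Dict[str, Set[str]],
                       parents: Dict[str, List[str]]) -> Dict[str, float]:
    """Propagate each gene product's annotations to all ancestor terms (counting
    each product once per term) and divide by the number of annotated products."""
    annotated = [terms for terms in annotations.values() if terms]
    counts: Dict[str, int] = {}
    for terms in annotated:
        covered: Set[str] = set()
        for term in terms:
            covered |= ancestors(term, parents)
        for term in covered:
            counts[term] = counts.get(term, 0) + 1
    return {term: n / len(annotated) for term, n in counts.items()}


PROB = term_probabilities(ANNOTATIONS, PARENTS)
print(PROB["molecular_function"], PROB["catalytic activity"], PROB["kinase activity"])  # 1.0 0.5 0.25
```

Under this choice the root always gets p = 1, so any score of the form -log2 p is 0 for it, and more specific terms receive higher scores.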


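GOTEM: example
Generation: binary relations from the model pathways, e.g.
RAD24 [act] MEC3, SWI4 [expr] CLN1, SWI4 [expr] CLN2, SWI6 [expr] CLN1, SWI6 [expr] CLN2, CLN1 [phos] SIC1, CLN2 [phos] SIC1, CDC28 [phos] SIC1, ...
are generalised into templates:
T1: GO:0003689 [act] GO:0003677
T2: GO:0003689 [act] GO:0003676
T3: GO:0003689 [act] GO:0005488
T4: GO:0003689 [act] GO:0003674
T5: GO:0003677 [act] GO:0003677
T6: GO:0003677 [act] GO:0003676
T7: GO:0003677 [act] GO:0005488
...
Each template is scored by
$\text{GO-score}(T_x) = -\log_2\left(\frac{p(\mathrm{GOID}_{LHS}) + p(\mathrm{GOID}_{RHS})}{2}\right)$
Assessment: each regulatory hypothesis is matched against the templates; the best-matching template (TM1) is shown:
RAD24 [?] MEC3 -> TM1: GO:0003689 [act] GO:0003677 (GO-score = 6.80)
CLN1 [?] SWI4 -> TM1: GO:0003674 [expr] GO:0003674 (GO-score = 0)
MBP1 [?] CLN2 -> TM1: GO:0003700 [expr] GO:0016538 (GO-score = 5.88)
...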
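The generation and assessment steps above can be sketched as follows. The slides do not confirm these details, so they are assumptions: a relation is generalised to every pair of terms drawn from the two gene products' annotations and their ancestors, a hypothesis "X [?] Y" matches any template whose left-hand term covers X and whose right-hand term covers Y, and the hypothesis is scored by its best match (TM1). All terms, probabilities, gene products and function names are hypothetical.

```python
import math
from itertools import product
from typing import Dict, List, Optional, Set, Tuple

Template = Tuple[str, str, str]  # (LHS GO term, relation type, RHS GO term)

# Hypothetical toy data: term hierarchy, term probabilities, annotations
PARENTS: Dict[str, List[str]] = {"T_A": ["ROOT"], "T_A1": ["T_A"], "T_B": ["ROOT"]}
PROB: Dict[str, float] = {"ROOT": 1.0, "T_A": 0.2, "T_A1": 0.05, "T_B": 0.1}
ANNOT: Dict[str, Set[str]] = {"G1": {"T_A1"}, "G2": {"T_B"}, "G3": {"T_A1"}, "G4": {"T_B"}}


def covering_terms(gene: str) -> Set[str]:
    """All terms annotated to the gene product plus their ancestors."""
    result: Set[str] = set()
    stack = list(ANNOT[gene])
    while stack:
        term = stack.pop()
        if term not in result:
            result.add(term)
            stack.extend(PARENTS.get(term, []))
    return result


def go_score(t: Template) -> float:
    """GO-score(T) = -log2((p(LHS) + p(RHS)) / 2), written as log2(2 / (p(LHS) + p(RHS)))."""
    return math.log2(2.0 / (PROB[t[0]] + PROB[t[2]]))


def generate_templates(relations: List[Tuple[str, str, str]]) -> Set[Template]:
    """Generalise each documented relation 'A [type] B' into GO-level templates."""
    templates: Set[Template] = set()
    for lhs, rel, rhs in relations:
        for a, b in product(covering_terms(lhs), covering_terms(rhs)):
            templates.add((a, rel, b))
    return templates


def assess(lhs: str, rhs: str, templates: Set[Template]) -> Tuple[Optional[Template], float]:
    """Score the hypothesis 'lhs [?] rhs' by its best-matching template (TM1)."""
    left, right = covering_terms(lhs), covering_terms(rhs)
    matches = [t for t in templates if t[0] in left and t[2] in right]
    if not matches:
        return None, 0.0
    best = max(matches, key=go_score)
    return best, go_score(best)


TEMPLATES = generate_templates([("G1", "expr", "G2")])
print(assess("G3", "G4", TEMPLATES))  # best match uses the specific terms T_A1 and T_B
print(assess("G4", "G3", TEMPLATES))  # only the root-level template matches (score 0)
```

The reversed hypothesis mirrors the CLN1 [?] SWI4 case on the slide: only a root-level template matches, so it scores 0 and would be ranked as implausible.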


GOTEM: results
Test:
- Templates created from the KEGG S. cerevisiae cell cycle pathway
- Hypotheses reverse engineered from microarray gene expression data
- Assess how well the templates separate true positive interactions from false positive ones
Results:
- The method can filter out a large proportion of implausible hypotheses
- Hence, it improves the specificity of network reconstruction


GOSAP: background 1(2)
- Large base of biological pathways
- Need for pathway analysis methods:
  - inter-species comparisons
  - intra-species comparisons
  - assessment of hypothetical pathways
- Limitations of related work:
  - previous efforts focus on metabolic pathways
  - little work on approximate matching by semantic similarity
  - the EC hierarchy used before only covers the molecular function of enzymes


GOSAP: background 2(2)
We propose GOSAP: GO-based Semantic Alignment of biological Pathways [3,4]
Contribution:
- GO has not been used before for semantic pathway alignment
- GOSAP generalises about any kind of gene product using GO, not only enzymes
- Richer semantic description of gene products by combining the function, process and component ontologies of GO in similarity calculations

[3] Gamalielsson, J., Olsson, B. (2005). GOSAP: Gene Ontology Based Semantic Alignment of Biological Pathways. Technical report HS-IKI-TR-05-005, University of Skövde, Sweden.
[4] Gamalielsson, J., Olsson, B. (200x). GOSAP: GO-based Semantic Alignment of Biological Pathways. Manuscript in preparation.
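One way to combine the three ontologies is sketched below. The slides do not specify GOSAP's combination rule, so this assumes that a gene product pair's similarity within one ontology is the best score over its term pairs, and that the combined score is the mean over the ontologies in which both products are annotated. `term_sim` stands in for a term-level measure such as Resnik's; the annotations and scores are hypothetical.

```python
from typing import Callable, Dict, Sequence, Set

Annotations = Dict[str, Set[str]]  # ontology name -> GO terms of one gene product


def combined_similarity(
    a: Annotations,
    b: Annotations,
    term_sim: Callable[[str, str], float],
    ontologies: Sequence[str] = ("function", "process", "component"),
) -> float:
    """Mean over ontologies of the best term-pair similarity; ontologies in
    which either gene product lacks annotation are skipped."""
    scores = []
    for ont in ontologies:
        terms_a, terms_b = a.get(ont, set()), b.get(ont, set())
        if terms_a and terms_b:
            scores.append(max(term_sim(x, y) for x in terms_a for y in terms_b))
    return sum(scores) / len(scores) if scores else 0.0


def toy_sim(x: str, y: str) -> float:
    """Hypothetical term-level scores: identical terms 1.0, anything else 0.2."""
    return 1.0 if x == y else 0.2


# Hypothetical gene products that differ in function but share process and component
gp1 = {"function": {"F1"}, "process": {"P1"}, "component": {"C1"}}
gp2 = {"function": {"F2"}, "process": {"P1"}, "component": {"C1"}}

print(combined_similarity(gp1, gp2, toy_sim, ontologies=("function",)))  # 0.2
print(combined_similarity(gp1, gp2, toy_sim))
```

In this toy case molecular function alone gives 0.2, while the combined score is about 0.73, a small-scale picture of how combining ontologies can reveal similarity that a single ontology misses.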


GOSAP
[Figure: GOSAP workflow; box shapes distinguish procedures/algorithms from data/information.]
- GO graph + organism annotation databases -> GO term probability calculation -> enriched GO graph
- Model pathway database -> extraction of super-paths -> model paths
- Query pathway database -> query paths
- Model paths + query paths + enriched GO graph + parameter settings -> path alignment -> scored & ranked path alignments


GOSAP: example
Path extraction, e.g.:
1. SWI4 [e]> CLN1
2. SWI4 [e]> CLN2 [p]> SIC1
3. MBP1 [e]> CLB5 [p]> CDC6
...
Only super-paths are extracted, using a depth-first-based algorithm.
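A depth-first enumeration of super-paths might look like the sketch below. It assumes a super-path is a maximal path: it starts at a gene product with no incoming interaction and is extended until no unvisited successor remains. Edge labels are omitted, and the function name and the toy graph (built from the example paths above) are for illustration only.

```python
from typing import Dict, List

Graph = Dict[str, List[str]]  # gene product -> downstream gene products


def super_paths(graph: Graph) -> List[List[str]]:
    """Enumerate maximal simple paths by depth-first search from source nodes.
    A graph with no source node (a pure cycle) yields no paths."""
    targets = {v for succs in graph.values() for v in succs}
    sources = sorted(set(graph) - targets)  # nodes without incoming edges
    paths: List[List[str]] = []

    def dfs(path: List[str]) -> None:
        # successors not already on the path (guards against cycles)
        nexts = [v for v in graph.get(path[-1], []) if v not in path]
        if not nexts:
            paths.append(list(path))  # cannot be extended: record the super-path
            return
        for v in nexts:
            path.append(v)
            dfs(path)
            path.pop()

    for s in sources:
        dfs([s])
    return paths


EXAMPLE = {"SWI4": ["CLN1", "CLN2"], "CLN2": ["SIC1"], "MBP1": ["CLB5"], "CLB5": ["CDC6"]}
print(super_paths(EXAMPLE))
# [['MBP1', 'CLB5', 'CDC6'], ['SWI4', 'CLN1'], ['SWI4', 'CLN2', 'SIC1']]
```

In an acyclic graph every source-to-sink path found this way is maximal, so sub-paths such as CLN2 > SIC1 are never reported on their own.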

Path alignment: query paths such as
1. SWI4 [?]> CLN2
2. MBP1 [?]> CLN1 [?]> CDC6
are aligned against the model paths.

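Example alignment (Q = query path, M = model path; F, P and C list the function, process and component GO terms for each aligned position; e = expression, p = phosphorylation, i = inhibition, ? = unknown interaction):
Q: FAR1 ?> SIC1 (GAP) CLN2 ?> SIC1
M: FAR1 i> CLN1 p> SWI6 e> CLN2 p> SIC1
F: GO:0004861 > GO:0019207 (GAP) GO:0016538 > GO:0019210
P: GO:0007050|GO:0045786 > GO:0000079 (GAP) GO:0000320|GO:0000321 > GO:0000079
C: GO:0005634 > GO:0005634 (GAP) GO:0005634 > GO:0005634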
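The slides do not spell out GOSAP's alignment procedure. As an assumption, the sketch below swaps in a standard Needleman-Wunsch-style global alignment with a linear gap penalty, where `sim` would be a GO-based semantic similarity between gene products; here it is a hypothetical lookup table (`TOY_SIM`) so that the example stays self-contained.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Pair = Tuple[Optional[str], Optional[str]]  # (query gene product, model gene product); None = gap


def align_paths(
    query: Sequence[str],
    model: Sequence[str],
    sim: Callable[[str, str], float],
    gap: float = -1.0,
) -> Tuple[float, List[Pair]]:
    """Global alignment of two paths by dynamic programming."""
    n, m = len(query), len(model)
    score = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = score[i - 1][0] + gap
    for j in range(1, m + 1):
        score[0][j] = score[0][j - 1] + gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sim(query[i - 1], model[j - 1]),  # align the pair
                score[i - 1][j] + gap,  # gap in the model path
                score[i][j - 1] + gap,  # gap in the query path
            )
    # Trace back from the bottom-right cell to recover one optimal alignment
    pairs: List[Pair] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sim(query[i - 1], model[j - 1]):
            pairs.append((query[i - 1], model[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            pairs.append((query[i - 1], None))
            i -= 1
        else:
            pairs.append((None, model[j - 1]))
            j -= 1
    pairs.reverse()
    return score[n][m], pairs


# Hypothetical similarity scores standing in for GO-based semantic similarity
TOY_SIM = {("FAR1", "FAR1"): 1.0, ("CLN2", "CLN2"): 1.0, ("SIC1", "SIC1"): 1.0, ("CLN2", "CLN1"): 0.8}


def toy_sim(q: str, m: str) -> float:
    return TOY_SIM.get((q, m), 0.0)


total, alignment = align_paths(["FAR1", "CLN2", "SIC1"], ["FAR1", "CLN1", "SWI6", "CLN2", "SIC1"], toy_sim)
for q, m in alignment:
    print(q or "(GAP)", m or "(GAP)")
```

The partial score for CLN2 against CLN1 shows how a semantic measure can reward related rather than identical gene products; in this toy case the exact matches still win, so the gaps are placed opposite CLN1 and SWI6.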


GOSAP: results
Test:
- Model pathways: KEGG S. cerevisiae cell cycle and metabolic pathways
- Query pathways: reverse engineered (RE) regulatory pathways, KEGG MAPK and metabolic pathways
- Assess whether GOSAP can find significant alignments of biological interest
Results:
- The method is able to detect significant alignments between RE paths and model paths, and between different metabolic pathways, and to suggest missing gene products in query paths
- Combining the ontologies resulted in significant alignments where molecular function alone did not


EGOSAP: background 1(2)
- Large base of biological pathways and microarray gene expression data
- Sometimes only hypothetical sets of gene products are known
- It is highly desirable to derive interactions between these gene products
- Limitations of related work:
  - previous efforts merely map genes onto known pathways by identity
  - no work on approximate matching by semantic similarity
  - related methods do not attempt to assemble hypothetical paths from a query set of gene products


EGOSAP: background 2(2)
We propose EGOSAP: Evolutionary GO-based Semantic Alignment of biological Pathways [5]
Contribution:
- GO has not been used before for semantic pathway alignment
- Like GOSAP, EGOSAP generalises about any kind of gene product using GO, not only enzymes
- Richer semantic description of gene products by combining the function, process and component ontologies of GO in similarity calculations
- Hypothetical paths are assembled using an evolutionary algorithm and a query set of gene products

[5] Gamalielsson, J., Corne, D. W., Olsson, B. (200x). EGOSAP: Evolutionary Gene Ontology Based Semantic Alignment of Biological Pathways. Manuscript in preparation.


EGOSAP
[Figure: EGOSAP workflow; box shapes distinguish procedures/algorithms from data/information.]
- GO graph + organism annotation databases -> GO term probability calculation -> enriched GO graph
- Model pathway database -> path extraction -> model paths
- Model paths + query set of gene products + enriched GO graph + parameter settings -> evolution of path alignments -> path alignments


EGOSAP: example
Evolutionary algorithm:
    t <- 0
    initialise P(t)
    evaluate P(t)
    while (not term-cond) do
    begin
        t <- t + 1
        select P(t) from P(t-1)
        alter P(t)
        evaluate P(t)
        apply elitism to P(t)
    end
- P(t): a set of gene product permutations initialised from the query alphabet
- evaluate: calculate fitness, i.e. the semantic similarity score between the model path and each evolved path in P(t)
- select: tournament selection
- alter: partially mapped crossover, mutation

Example alignment (fitness = 0.73, p = 0.01):
Query (mouse): MEF2C > NR2F6 > NRBF2 > AFG3L2 > TRIM28
Model (yeast): SWI6 > SWI4 > NDD1 > ACE2 > SFL1
Function: GO:0003713 > GO:0003700 > GO:0016563 > GO:0008237 > GO:0016564
Process: GO:0006366 > GO:0007049 > GO:0006357 > GO:0006508 > GO:0000122
Component: GO:0005634 > GO:0005634 > GO:0005634 > GO:0016021 > GO:0005694
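The evolutionary loop above can be made runnable as in the sketch below. Several details are assumptions rather than EGOSAP's documented design: an individual is a permutation of the whole query set whose first L genes (L = model path length) form the evolved path; fitness is the mean position-wise similarity to the model path; mutation swaps two genes; the termination condition is a fixed number of generations; and elitism copies the best individual into the new population. `sim` stands in for the GO-based semantic similarity, and the identity similarity in the example is hypothetical.

```python
import random
from typing import Callable, List, Sequence, Tuple

Individual = List[str]


def pmx(p1: Individual, p2: Individual, rng: random.Random) -> Individual:
    """Partially mapped crossover: copy a random slice from p1, fill the other
    positions from p2, repairing duplicates through the slice mapping."""
    a, b = sorted(rng.sample(range(len(p1)), 2))
    child: Individual = [""] * len(p1)
    child[a:b + 1] = p1[a:b + 1]
    segment = set(p1[a:b + 1])
    for i in list(range(a)) + list(range(b + 1, len(p1))):
        gene = p2[i]
        while gene in segment:  # follow p1[k] -> p2[k] until the gene is free
            gene = p2[p1.index(gene)]
        child[i] = gene
    return child


def evolve(query: Sequence[str], model: Sequence[str], sim: Callable[[str, str], float],
           pop_size: int = 30, generations: int = 50, tournament: int = 3,
           p_mut: float = 0.2, seed: int = 0) -> Tuple[Individual, float]:
    rng = random.Random(seed)
    length = len(model)

    def fitness(ind: Individual) -> float:
        # mean position-wise similarity between the evolved path and the model path
        return sum(sim(q, m) for q, m in zip(ind[:length], model)) / length

    def select(pop: List[Individual]) -> Individual:
        # tournament selection
        return max(rng.sample(pop, tournament), key=fitness)

    pop = [rng.sample(list(query), len(query)) for _ in range(pop_size)]  # initialise P(0)
    best = max(pop, key=fitness)  # evaluate P(0)
    for _ in range(generations):  # termination condition: fixed generation budget
        children = []
        for _ in range(pop_size):
            child = pmx(select(pop), select(pop), rng)  # select, then alter by crossover
            if rng.random() < p_mut:  # alter by mutation: swap two genes
                i, j = rng.sample(range(len(child)), 2)
                child[i], child[j] = child[j], child[i]
            children.append(child)
        # elitism: the best individual so far replaces the worst new one
        worst = min(range(pop_size), key=lambda k: fitness(children[k]))
        children[worst] = list(best)
        pop = children
        best = max(pop, key=fitness)  # evaluate P(t)
    return best[:length], fitness(best)


path, fit = evolve(["A", "B", "C"], ["C", "A", "B"], sim=lambda q, m: 1.0 if q == m else 0.0)
print(path, fit)
```

PMX is used because it always yields a valid permutation, so no gene product appears twice in an evolved path. With a real similarity function the query set would typically be larger than the model path, and only the first L genes of each permutation are scored.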


EGOSAP: results
Test:
- Model pathway: S. cerevisiae regulatory chain motifs
- Query set: differentially expressed genes for transgenic and knock-out mice
- Assess whether EGOSAP can evolve significant alignments of biological interest
Results:
- The method is able to detect significant alignments between evolved paths and model paths
- As for GOSAP, combining the ontologies resulted in significant alignments where molecular function alone did not


Conclusions
- Three methods for semantic analysis of biological pathways have been developed
- The methods:
  - assess the biological plausibility of derived pathways
  - compare different pathways for semantic similarities
  - evolve hypothetical pathways similar to model pathways
- The methods are novel
- The methods are believed to be useful to biologists


Write-up schedule
- September 2006: thesis contributions, thesis skeleton, set of chapters, draft of GOSAP paper
- October 2006: submission of GOSAP paper, redrafts of earlier material, set of new chapters, draft of EGOSAP paper
- November 2006: submission of EGOSAP paper, nearly complete thesis draft
- December 2006 - February 2007: continual refinement of thesis
- March 2007: submission of thesis