
Page 1: 1 Semantic Aggregation, Integration, and Inference of Pathway Data Co-Destructors: Joanne Luciano, PhD jluciano@biopathways.org Jeremy Zucker zucker@research.dfci.harvard.edu

1

Semantic Aggregation, Integration, and Inference of Pathway Data

Co-Destructors:

Joanne Luciano, PhD jluciano@biopathways.org

Jeremy Zucker zucker@research.dfci.harvard.edu

ISMB 2005 Tutorial, Detroit, Michigan, June 25th 2005

http://www.biopathways.org/ismb2005tutorial-am6/

(Pedantic Aggravation, Irritation, and Interference)


2

Overview

Introduction (45 minutes)

Time Out (15 minutes)

Workshop Case Studies & Exercises (2 hrs 15 minutes)

Subdivide into groups of triads and dyads

• Case Study I (45 minutes)
• Case Study II (45 minutes)
• Case Study III (45 minutes)

Time Out (15 minutes)

Lessons Learned (30 minutes)

Lessons Not Yet Learned (take home)


3

Introduction (45 minutes)

Semantic Aggregation, Integration and Inference of Pathway Data

Pathway Data (domain)
– What is it?
– What does it look like?
– Why do we care? (motivation)

Definitions & Disclaimers

Strategies


4

What is it? Pathway Databases

So many pathway databases, so little time.

Pathway Data (domain)

Graphic from Mike Cary and Gary Bader


5

The Main Categories

• Metabolic Pathways (e.g. Glycolysis)
• Molecular Interaction Networks (e.g. Protein-Protein)
• Signaling Pathways (e.g. Apoptosis)
• Gene Regulation (e.g. Lac Operon)

Different types of pathways

(different strokes for different folks, it’s OK.)


6

Different representations of the same pathways

KEGG Reference Pathway GLYCOLYSIS

<!ELEMENT reaction (substrate*,product*)>

<!ATTLIST reaction name %keggid.type; #REQUIRED>

<!ATTLIST reaction type %reaction-type.type; #REQUIRED>

<!ELEMENT substrate EMPTY>

<!ATTLIST substrate name %keggid.type; #REQUIRED>

<!ELEMENT product EMPTY>

<!ATTLIST product name %keggid.type; #REQUIRED>

starts at -D-Glucose 1P
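To make the KEGG declarations above concrete, here is a minimal Python sketch that parses one reaction element shaped like the DTD fragment. The element is hand-written for illustration: `rn:EXAMPLE`, `cpd:SUBSTRATE` and `cpd:PRODUCT` are placeholders rather than real KEGG identifiers, and the `type` value is assumed.

```python
import xml.etree.ElementTree as ET

# Hand-written element following the DTD fragment above.
# All identifier values are placeholders, not real KEGG IDs.
KGML_REACTION = """
<reaction name="rn:EXAMPLE" type="reversible">
  <substrate name="cpd:SUBSTRATE"/>
  <product name="cpd:PRODUCT"/>
</reaction>
"""

def parse_kegg_reaction(xml_text):
    """Return (name, type, substrates, products) for one reaction element."""
    elem = ET.fromstring(xml_text.strip())
    substrates = [s.get("name") for s in elem.findall("substrate")]
    products = [p.get("name") for p in elem.findall("product")]
    return elem.get("name"), elem.get("type"), substrates, products

kegg_record = parse_kegg_reaction(KGML_REACTION)
print(kegg_record)
```

In this representation the reaction carries only identifiers, so names and structures would have to be looked up elsewhere.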


7

Different representations of the same pathways

BioCYC Reference Pathway GLYCOLYSIS

reactions.dat This file lists all chemical reactions in the PGDB.

Attributes: UNIQUE-ID TYPES COMMON-NAME ACTIVATORS BASAL-TRANSCRIPTION-VALUE DBLINKS DELTAG0 DEPRESSORS EC-LIST EC-NUMBER ENZYMATIC-REACTION EQUILIBRIUM-CONSTANT IN-PATHWAY INHIBITORS LEFT MOVED-IN MOVED-OUT OFFICIAL-EC? REACTANTS REQUIREMENTS RIGHT SIGNAL SPECIES SPONTANEOUS? STIMULATORS SYNONYMS
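A rough sketch of how such an attribute-value file might be read, assuming the usual flat-file layout (one `ATTRIBUTE - value` pair per line, `#` for comments, `//` closing each record). The sample record is hand-written: `EXAMPLE-RXN`, `COMPOUND-1` and `COMPOUND-2` are placeholders, and `parse_dat` is a hypothetical helper, not part of any BioCyc software.

```python
from collections import defaultdict

# Hand-written sample in the assumed attribute-value layout.
# The UNIQUE-ID and compound names are placeholders for illustration.
SAMPLE = """\
# comment lines start with '#'
UNIQUE-ID - EXAMPLE-RXN
TYPES - Chemical-Reactions
EC-NUMBER - 5.3.1.9
LEFT - COMPOUND-1
RIGHT - COMPOUND-2
//
"""

def parse_dat(text):
    """Split the text into records; each maps an attribute to a list of values."""
    records, current = [], defaultdict(list)
    for line in text.splitlines():
        if line.startswith("#") or not line.strip():
            continue                      # skip comments and blank lines
        if line.strip() == "//":          # record terminator
            records.append(dict(current))
            current = defaultdict(list)
            continue
        attribute, _, value = line.partition(" - ")
        current[attribute].append(value)
    return records

biocyc_records = parse_dat(SAMPLE)
print(biocyc_records[0]["LEFT"], "->", biocyc_records[0]["RIGHT"])
```

Values are kept as lists because attributes such as LEFT, RIGHT and SYNONYMS can occur more than once per record.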

starts at -D-glucose6-phosphate


8

Different representations of the same pathways

Reactome Pathway GLYCOLYSIS

<reaction name="R_alpha_D_glucose_6_phosphate_D_fructose_6_phosphate" id="R_163457">

<listOfReactants>

<speciesReference species="R_30537_alpha_D_Glucose_6_phosphate" />

</listOfReactants>

<listOfProducts>

<speciesReference species="R_29512_D_Fructose_6_phosphate" />

</listOfProducts>

<listOfModifiers>

<modifierSpeciesReference species="R_163455_glucose_6_phosphate_isomerase_dimer_name_copied_from_complex_in_Homo_sapiens_" />

</listOfModifiers>

</reaction>

DatabaseObject [41245]

Event [8285]

Reaction [6598]

ConcreteReaction [4034]

GenericReaction [2564]
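For comparison with the other two sources, the Reactome reaction above can be read with a few lines of Python. This is only a sketch: the fragment is copied from this slide without the enclosing model, so no XML namespace is declared, and `species_in` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Reaction element copied from the slide (no namespace, no enclosing model).
SBML_REACTION = """
<reaction name="R_alpha_D_glucose_6_phosphate_D_fructose_6_phosphate" id="R_163457">
  <listOfReactants>
    <speciesReference species="R_30537_alpha_D_Glucose_6_phosphate" />
  </listOfReactants>
  <listOfProducts>
    <speciesReference species="R_29512_D_Fructose_6_phosphate" />
  </listOfProducts>
  <listOfModifiers>
    <modifierSpeciesReference species="R_163455_glucose_6_phosphate_isomerase_dimer_name_copied_from_complex_in_Homo_sapiens_" />
  </listOfModifiers>
</reaction>
"""

def species_in(reaction, list_name):
    """Species ids referenced inside one listOf... container (empty if absent)."""
    container = reaction.find(list_name)
    return [] if container is None else [ref.get("species") for ref in container]

reaction = ET.fromstring(SBML_REACTION.strip())
summary = {
    "id": reaction.get("id"),
    "reactants": species_in(reaction, "listOfReactants"),
    "products": species_in(reaction, "listOfProducts"),
    "modifiers": species_in(reaction, "listOfModifiers"),
}
print(summary)
```

The same biochemical step therefore needs a different parser, and a different identifier scheme, for each source.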


9

Different representations of the same pathways

BioCarta Reference Pathway GLYCOLYSIS

Does not compute.

Pretty, but useless

Starts at Glucose (but it doesn’t matter)

Reactions clickable but...


10

Pathway Data: Why do we care?

Pathway Research has Broad Impact
– Drug Discovery (pathway of target, safety)
– Basic Science (identify pathways)
– Disease Research (cancer pathways)
– Environmental Research (microbial research)

Combine knowledge from multiple sources
– Whole is greater than the sum of its parts
– Biological knowledge is fragmented
– Need database to manage resources


11

Definitions & Disclaimers

Aggregation: 2 (or more) data sources, different data models, common link between (among) them.

Integration: 2 (or more) data sources, same data model, semantic mapping and instance merging required.

Inference: 1 (or more) data sources, one data model, creating new instances or new relationships. (Evidence code type kind of “inference”)

Disclaimer: “Controlled” Vocabulary scope = this tutorial


12

Assembling Knowledge: Aggregation, Integration, Inference

Use Case I

Use Case II

Use Case III

“When it comes to data cleaning, there’s no such thing as a free lunch.” Tim Berners-Lee

Some tasks are specific to a use case, some are common to more than one and there’s no escaping others.


13

Bridging Chemistry and Molecular Biology

Uniprot:P49841

•Different Views have different semantics: Lenses

• When there is a correspondence between objects, a semantic binding is possible

Apply Correspondence Rule:
if ?target.xref.lsid == ?bpx:prot.xref.lsid
then ?target.correspondsTo.?bpx:prot

Source: Eric Neumann Haystack BioDASH Demo http://www.w3.org/2005/04/swls/BioDash/Demo/
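The correspondence rule above can be sketched in plain Python as a join on the shared LSID. This is not the BioDASH implementation: the `Record` type, the `corresponds_to` function and the LSID strings are all made up for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Record:
    name: str        # label of the object in its own view
    xref_lsid: str   # LSID carried by the object's cross-reference

def corresponds_to(targets, proteins):
    """Pair every target with every pathway protein that shares its LSID."""
    by_lsid = {}
    for protein in proteins:
        by_lsid.setdefault(protein.xref_lsid, []).append(protein)
    return [(t.name, p.name)
            for t in targets
            for p in by_lsid.get(t.xref_lsid, [])]

# Illustrative records; the LSID strings are placeholders.
targets = [Record("target-1", "urn:lsid:example.org:uniprot:P49841")]
proteins = [
    Record("bpx-protein-7", "urn:lsid:example.org:uniprot:P49841"),
    Record("bpx-protein-8", "urn:lsid:example.org:uniprot:OTHER"),
]
links = corresponds_to(targets, proteins)
print(links)
```

Only objects whose cross-references resolve to the same identifier are bound; everything else stays unlinked.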


14

Seamark Demonstration: Identification of new drug candidates

Data sources: GO2Keyword.rdf, UniProt.rdf, GO.rdf, Keywords.rdf, Taxonomy.rdf, PubMed.xml, IntAct.rdf, Enzymes.rdf, OMIM.rdf, GO2OMIM.rdf, GO2Enzyme.rdf, KEGG.rdf, GO2UniProt.rdf, ProbeSet.rdf

Linked entities: Citation, Organism, MIM Id, Keyword, Protein, Enzyme, Gene, Probe, Pathway, Compound

1. Differentiate different forms of disease
2. Identify patient subgroups
3. Identify top biomarkers
4. Identify function
5. Identify biological and chemical properties and disease associations of biomarker
6. Identify documents
7. Identify role in metabolic pathways
8. Identify compounds that interact
9. Identify and compare function in other organisms
10. Identify any prior art


15

SBML integration using BioPAX

Use BioPAX to Address SBML’s data integration issues

• Different data types, same representation

• Same data, different representations

• External references…
• Synonyms…
• Provenance…


16

A problem: same representation different semantics (SBML)

Protein-Protein Interaction

<reaction id="pyruvate_dehydrogenase_cplx">
  <listOfReactants>
    <speciesRef species="PdhA"/>
    <speciesRef species="PdhB"/>
  </listOfReactants>
  <listOfProducts>
    <speciesRef species="Pyruvate_dehydrogenase_E1"/>
  </listOfProducts>
</reaction>

Biochemical Reaction

<reaction id="pyruvate_dehydrogenase_rxn">
  <listOfReactants>
    <speciesRef species="NADP+"/>
    <speciesRef species="CoA"/>
    <speciesRef species="pyruvate"/>
  </listOfReactants>
  <listOfProducts>
    <speciesRef species="NADPH"/>
    <speciesRef species="acetyl-CoA"/>
    <speciesRef species="CO2"/>
  </listOfProducts>
  <listOfModifiers>
    <modifierSpeciesRef species="pyruvate_dehydrogenase_E1"/>
  </listOfModifiers>
</reaction>

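The problem with the two SBML fragments above can be seen by parsing them: structurally they are the same kind of element, and nothing in the markup says that one is a complex assembly and the other a biochemical reaction. The sketch below uses cleaned-up, well-formed copies of the two fragments; `shape` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Well-formed copies of the two fragments above (slide abbreviations kept).
COMPLEX = """<reaction id="pyruvate_dehydrogenase_cplx">
  <listOfReactants><speciesRef species="PdhA"/><speciesRef species="PdhB"/></listOfReactants>
  <listOfProducts><speciesRef species="Pyruvate_dehydrogenase_E1"/></listOfProducts>
</reaction>"""

REACTION = """<reaction id="pyruvate_dehydrogenase_rxn">
  <listOfReactants><speciesRef species="NADP+"/><speciesRef species="CoA"/><speciesRef species="pyruvate"/></listOfReactants>
  <listOfProducts><speciesRef species="NADPH"/><speciesRef species="acetyl-CoA"/><speciesRef species="CO2"/></listOfProducts>
  <listOfModifiers><modifierSpeciesRef species="pyruvate_dehydrogenase_E1"/></listOfModifiers>
</reaction>"""

def shape(xml_text):
    """Return the element tag and how many species each child container holds."""
    root = ET.fromstring(xml_text)
    return root.tag, {child.tag: len(child) for child in root}

complex_shape = shape(COMPLEX)
reaction_shape = shape(REACTION)
print(complex_shape)
print(reaction_shape)
```

Both come back as `reaction` elements with reactants and products; a program would have to guess the biology from counts or names, which is exactly the ambiguity the slide points at.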

17

SBML annotated with BioPAX

<sbml xmlns:bp="http://www.biopax.org/release1/biopax-release1.owl"
      xmlns:owl="http://www.w3.org/2002/07/owl#"
      xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <listOfSpecies>
    <species id="PdhA" metaid="PdhA">
      <annotation>
        <bp:protein rdf:ID="#PdhA"/>
      </annotation>
    </species>
    <species id="NADP+" metaid="NADP+">
      <annotation>
        <bp:smallMolecule rdf:ID="#NADP+"/>
      </annotation>
    </species>
  </listOfSpecies>
  <listOfReactions>
    <reaction id="pyruvate_dehydrogenase_cplx">
      <annotation>
        <bp:complexAssembly rdf:ID="#pyruvate_dehydrogenase_cplx"/>
      </annotation>
    </reaction>
  </listOfReactions>
</sbml>

species is protein

protein is PdhA

species is small molecule

small molecule is NADP+
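A minimal sketch of how a program could use these annotations: it walks the annotated document and records, for each SBML element, which BioPAX class its annotation names. The XML is a well-formed copy of the slide's example; `biopax_types` is a hypothetical helper, not part of any SBML or BioPAX library.

```python
import xml.etree.ElementTree as ET

BP = "http://www.biopax.org/release1/biopax-release1.owl"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Well-formed copy of the annotated SBML shown on this slide.
ANNOTATED = f"""<sbml xmlns:bp="{BP}" xmlns:rdf="{RDF}">
  <listOfSpecies>
    <species id="PdhA" metaid="PdhA">
      <annotation><bp:protein rdf:ID="#PdhA"/></annotation>
    </species>
    <species id="NADP+" metaid="NADP+">
      <annotation><bp:smallMolecule rdf:ID="#NADP+"/></annotation>
    </species>
  </listOfSpecies>
  <listOfReactions>
    <reaction id="pyruvate_dehydrogenase_cplx">
      <annotation><bp:complexAssembly rdf:ID="#pyruvate_dehydrogenase_cplx"/></annotation>
    </reaction>
  </listOfReactions>
</sbml>"""

def biopax_types(xml_text):
    """Map each annotated SBML element id to the local name of its BioPAX class."""
    root = ET.fromstring(xml_text)
    types = {}
    for elem in root.iter():
        annotation = elem.find("annotation")
        if annotation is None or elem.get("id") is None:
            continue
        for child in annotation:
            if child.tag.startswith("{" + BP + "}"):
                types[elem.get("id")] = child.tag.split("}", 1)[1]
    return types

types = biopax_types(ANNOTATED)
print(types)
```

With the annotation in place, the reaction that looked like any other is now explicitly a `complexAssembly`.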


18

BioPAX: External References

<species id="pyruvate" metaid="pyruvate">
  <annotation xmlns:bp="http://biopax.org/release1/biopax-release1.owl">
    <bp:smallMolecule rdf:ID="#pyruvate">
      <bp:Xref>
        <bp:unificationXref rdf:ID="#unificationXref119">
          <bp:DB>LIGAND</bp:DB>
          <bp:ID>c00022</bp:ID>
        </bp:unificationXref>
      </bp:Xref>
    </bp:smallMolecule>
  </annotation>
</species>

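One way to read the unification cross-reference above is as a join key: two records from different sources that carry the same (DB, ID) pair describe the same molecule. The sketch below extracts that pair from a well-formed copy of the example. The `rdf` namespace is declared locally so the fragment parses on its own (an assumption; on the slides it is declared on the enclosing `sbml` element), and `unification_keys` and `other_source_keys` are hypothetical names.

```python
import xml.etree.ElementTree as ET

BP = "http://biopax.org/release1/biopax-release1.owl"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Copy of the example above; xmlns:rdf added here so it parses stand-alone.
SPECIES = f"""<species id="pyruvate" metaid="pyruvate" xmlns:rdf="{RDF}">
  <annotation xmlns:bp="{BP}">
    <bp:smallMolecule rdf:ID="#pyruvate">
      <bp:Xref>
        <bp:unificationXref rdf:ID="#unificationXref119">
          <bp:DB>LIGAND</bp:DB>
          <bp:ID>c00022</bp:ID>
        </bp:unificationXref>
      </bp:Xref>
    </bp:smallMolecule>
  </annotation>
</species>"""

def unification_keys(xml_text):
    """Collect the (DB, ID) pairs of all unification cross-references."""
    ns = {"bp": BP}
    root = ET.fromstring(xml_text)
    return {
        (xref.findtext("bp:DB", namespaces=ns), xref.findtext("bp:ID", namespaces=ns))
        for xref in root.iterfind(".//bp:unificationXref", ns)
    }

keys = unification_keys(SPECIES)

# A second source is assumed to carry the same cross-reference.
other_source_keys = {("LIGAND", "c00022")}
same_molecule = bool(keys & other_source_keys)
print(keys, same_molecule)
```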

19

BioPAX: Synonyms

<species id="pyruvate" metaid="pyruvate">
  <annotation xmlns:bp="http://biopax.org/release1/biopax_release1.owl">
    <bp:smallMolecule rdf:ID="#pyruvate">
      <bp:SYNONYMS>2-oxo-propionic acid</bp:SYNONYMS>
      <bp:SYNONYMS>2-oxopropanoate</bp:SYNONYMS>
      <bp:SYNONYMS>BTS</bp:SYNONYMS>
      <bp:SYNONYMS>pyruvic acid</bp:SYNONYMS>
    </bp:smallMolecule>
  </annotation>
</species>

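Synonyms like those above are what allow names from different databases to be matched. A minimal sketch, assuming the synonyms have already been extracted into a dictionary; `build_name_index` and `resolve` are hypothetical helpers, and case-insensitive matching is a design choice made here, not something the slides specify.

```python
# Synonyms copied from the annotation above, keyed by the SBML species id.
SYNONYMS = {
    "pyruvate": ["2-oxo-propionic acid", "2-oxopropanoate", "BTS", "pyruvic acid"],
}

def build_name_index(synonyms):
    """Map every known name (lower-cased) to the species id it stands for."""
    index = {}
    for species_id, names in synonyms.items():
        for name in [species_id, *names]:
            index[name.lower()] = species_id
    return index

NAME_INDEX = build_name_index(SYNONYMS)

def resolve(name):
    """Return the species id for a name, or None if the name is unknown."""
    return NAME_INDEX.get(name.strip().lower())

print(resolve("Pyruvic Acid"))   # matches despite the different capitalisation
print(resolve("lactate"))        # unknown name
```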

20

Strategies

• Develop bridging technologies

• Develop pathway representation standard within the Life Science community (BioPAX) (Social Engineering!)

• Utilize Semantic Web Integration Technologies (LSID, RDF/OWL)

How do we get to a Standard Pathway Representation? (Game plan: Take over the world or have the world take over itself?)


21

Exchange Formats in Pathway Data Space

(Scope)

Database Exchange Formats: BioPAX, PSI-MI 2

Simulation Model Exchange Formats: SBML, CellML

[Figure: scope of each format across the pathway data space]
• Genetic Interactions
• Molecular Interactions (Pro:Pro, All:All)
• Interaction Networks (Molecular, Non-molecular; Pro:Pro, TF:Gene, Genetic)
• Regulatory Pathways (Low Detail to High Detail)
• Metabolic Pathways (Low Detail to High Detail)
• Biochemical Reactions
• Small Molecules (Low Detail to High Detail)
• Rate Formulas

Graphic from Mike Cary & Gary Bader


22

BioPAX Objectives

• Accommodate existing database representations

• Integration and exchange of pathway data

• Interchange through a common (standard) representation

• Provide a basis for future databases

• Enable development of tools for searching and reasoning over the data


23

BioPAX Motivation

Before BioPAX With BioPAX

Common format will make data more accessible, promoting data sharing and distributed curation efforts

>180 DBs and tools

Database

Application

User


24

BioPAX: Biological PAthway eXchange

A data exchange ontology and format for biological pathway integration, aggregation and inference

Initiative arose from the community


25

Biological pathways of the Cell

What is a Pathway?

• Metabolic Pathways (e.g. Glycolysis)
• Molecular Interaction Networks (e.g. Protein-Protein)
• Signaling Pathways (e.g. Apoptosis)
• Gene Regulation (e.g. Lac Operon)

[Figure labels: BioPAX Level 1, BioPAX Level 2]


26

Aggregation, Integration, Inference

1. Multiple kinds of pathway databases
– metabolic
– molecular interactions
– signal transduction

2. Constructs designed for integration
– DB References
– XRefs (Publication, Unification, Relationship)
– synonyms
– provenance

3. OWL DL – to enable reasoning


27

BioPAX Biochemical Reaction

[Figure: the OWL schema and its instances (individuals, i.e. the data) for the phosphoglucose isomerase reaction, EC 5.3.1.9]


28

BioPAX Ontology

• Conceptual framework based upon existing DB schemas: aMAZE, BIND, EcoCyc, WIT, KEGG, Reactome, etc.
• Allows wide range of detail, multiple levels of abstraction
• BioPAX ontology in OWL (XML)
• Designed for pathway database integration
– Database ID
– Unification X-REF
– Relationship X-REF
– Publication X-REF
– Synonyms
– Provenance


29

BioPAX uses other ontologies

• Use pointers to existing ontologies to provide supplemental annotation where appropriate
– Cellular location: GO Component
– Cell type: Cell.obo
– Organism: NCBI taxon DB

• Incorporate other standards where appropriate
– Chemical structure: SMILES, CML, InChI


30

BioPAX Ontology: Overview

Level 1 v1.0 (July 7th, 2004)

parts

how the parts are known to interact

a set of interactions


31

BioPAX Ontology: Top Level

• Pathway
– A set of interactions
– E.g. Glycolysis, MAPK, Apoptosis

• Interaction
– A set of entities and some relationship between them
– E.g. Reaction, Molecular Association, Catalysis

• Physical Entity
– A building block of simple interactions
– E.g. Small molecule, Protein, DNA, RNA

[Figure: class diagram with Entity at the root and Pathway, Interaction and Physical Entity beneath it. Legend: Subclass (is a), Contains (has a)]

Graphic from Gary Bader
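The top-level classes can be mirrored in a few Python dataclasses to show the "is a" and "has a" relations side by side. This is an illustrative sketch only: the real ontology is defined in OWL, and the field names `participants` and `interactions` are simplifications chosen here rather than the ontology's own property names.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Entity:
    """Root class: any discrete biological unit used to describe pathways."""
    name: str

@dataclass
class PhysicalEntity(Entity):
    """A building block of simple interactions (small molecule, protein, ...)."""

@dataclass
class Interaction(Entity):
    """A set of entities and some relationship between them."""
    participants: List[Entity] = field(default_factory=list)

@dataclass
class Pathway(Entity):
    """A set of interactions."""
    interactions: List[Interaction] = field(default_factory=list)

g6p = PhysicalEntity("glucose 6-phosphate")
f6p = PhysicalEntity("fructose 6-phosphate")
isomerization = Interaction("glucose 6-phosphate isomerization", participants=[g6p, f6p])
glycolysis = Pathway("Glycolysis", interactions=[isomerization])
print(glycolysis)
```

Subclassing expresses "is a" (a Pathway is an Entity), while the list fields express "has a" (a Pathway contains Interactions).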


32

BioPAX Ontology: Root

• Root class: Entity
– Any concept referred to as a discrete biological unit when describing pathways. This is the root class for all biological concepts in the ontology, which include pathways, interactions and physical entities.


33

Metabolic Pathways: Interaction sub-classes

Definition: An entity that defines a single biochemical interaction between two or more entities.

An interaction cannot be defined without the entities it relates.

participants


34

Metabolic Pathways: Interaction sub-classes

Definition: Two terms exist under interaction: Control and conversion. In future BioPAX levels, this list may be extended to include other classes, such as genetic interactions.

Examples: Enzyme catalysis controls a biochemical reaction, transport catalysis controls transport, a small molecule that inhibits a pathway by an unknown mechanism controls the pathway.


35

BioPAX as a solution to Aggregation, Integration, Inference

1. Multiple kinds of pathway databases
– metabolic
– molecular interactions
– signal transduction
– gene regulatory

2. Constructs designed for integration
– DB References
– XRefs (Publication, Unification, Relationship)
– Synonyms
– Provenance (not yet implemented)

3. OWL DL – to enable reasoning


36

Time Out

(15 minutes)


37

Workshop Case Studies & Exercises

(2 hrs 15 minutes)

Break into groups of triads and dyads

Case Study I (45 minutes)
• Use Case 1: Inference of a Metabolic Flux Model from an Annotated Genome
• Group Exercise 1

Case Study II (45 minutes)
• Use Case 2: Integration of a metabolic flux model from two sources
• Group Exercise 2

Case Study III (45 minutes)
• Use Case 3: Multi-source aggregation Validation and Testing
• Group Exercise 3


38

Methodology

• Define the goal of the integration
– How will the integrated data be used?
– This defines the level of integration from syntactic through semantic

• Take stock of current resources
– This defines your starting point
– Database sources, programmers, lab access, collaborators

• Scope the work to get from B to A
– Data Profiling
– Resource Profiling


39

3 Case Studies

• Case study I: Semantic Inference of metabolic pathway data from an annotated genome.

• Case study II: Semantic Integration of a metabolic flux model from two sources.

• Case study III: Semantic Aggregation of pathway data from multiple sources


40

Case Study I: Inference of a Metabolic Flux Model from an Annotated Genome

• Objective: To apply Biological knowledge to constrain the possible behaviors of a metabolic network.

• Resources: Annotated Genome, Transport DB, Pathway databases, experimental community, published literature


41

Genes make RNA make Protein

[Figure: nine genes (Gene1 to Gene9), each transcribed into an RNA (RNA 1 to RNA 9) and translated into a protein (P1 to P9). Legend: Gene, RNA, Protein, Enzyme, Transporter, Transcription, Translation]


42

Proteins catalyze biochemical reactions

[Figure: proteins P1 to P9, drawn as enzymes or transporters, catalyze reactions among metabolites A to F in the periplasm and cytoplasm. Legend: Metabolites A-F, Reaction, Enzyme, Transporter, Catalyzes]


43

Biochemical reactions comprise a metabolic network

[Figure: network of metabolites A to F connected by reactions R1 to R9. Uptake: R1, R5. Biomass: R8. Waste: R9. Legend: Exchange, Intracellular, Objective]


44

Metabolic Inference Subgoals

1. Infer genes from sequence and homology
2. Infer enzymatic reactions from Enzyme Commission (EC) numbers
3. Infer metabolic reaction network from enzymatic reactions and metabolites
4. Infer pathway holes using network debugging algorithms
5. Propose candidate enzymes using pathway-hole filling algorithms
6. Add experimentally verified candidates to the annotated genome
7. Lather, rinse, repeat
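Subgoal 2 can be sketched as a simple lookup from gene annotations to reactions via EC numbers. The tables below are toy data: the gene identifiers are hypothetical, and only one EC number (5.3.1.9, phosphoglucose isomerase, which appears elsewhere in this tutorial) is filled in. Genes that cannot be resolved are returned separately, since they are the ones later steps must deal with.

```python
# Toy lookup tables. The gene identifiers are hypothetical; a gene with no
# EC number (None) stands for an annotation that cannot be resolved.
GENE_TO_EC = {"gene_0001": "5.3.1.9", "gene_0002": None}
EC_TO_REACTION = {
    "5.3.1.9": ("D-glucose 6-phosphate", "D-fructose 6-phosphate"),
}

def infer_reactions(gene_to_ec, ec_to_reaction):
    """Return (gene -> reaction) for resolvable genes, plus the unresolved genes."""
    inferred, unresolved = {}, []
    for gene, ec in gene_to_ec.items():
        if ec in ec_to_reaction:
            inferred[gene] = ec_to_reaction[ec]
        else:
            unresolved.append(gene)
    return inferred, unresolved

inferred, unresolved = infer_reactions(GENE_TO_EC, EC_TO_REACTION)
print(inferred)
print(unresolved)
```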


45

Data Profiling of the Annotated Genome

• Orphaned genes
• Orphaned enzymes
• Misannotated genes
• Misannotated enzymes
• Sequencing errors
• BLAST Algorithm errors


46

Schema Level Errors

Gene that codes for the gene product (protein enzyme)

Enzyme (protein) that catalyzes the biochemical reaction

Biochemical reaction

Biochemical reaction


47

Semantic bugs revealed by chemical structure

EcoCyc 7.5 Pathway: Riboflavin and FMN and FAD biosynthesis

No place to go!

4-(1-D-ribitylamino)-5-amino-2,6-dihydroxypyrimidine:


48

Semantic bugs revealed by chemical structure

EcoCyc 8.0 Pathway: Riboflavin and FMN and FAD biosynthesis

Synonyms

4-(1-D-ribitylamino)-5-amino-2,6-dihydroxypyrimidine:


49

Data Profiling of Pathway/Genome Database

• Unbalanced Reactions
• Pathway holes
• Unproducible metabolites
• Generalized Metabolites
• Unconsumable metabolites (toxins)


50

Bugs in Network structure revealed by Forward and Backward chaining

[Figure labels: Known Nutrient set, Fired Reaction, Unfired Reaction, Essential compounds, Missing essential compound, Biomass]


51

Bugs in Network structure revealed by Forward and Backward chaining

[Figure labels: Biomass, Essential compounds, Missing essential compound, Precursor metabolite, Unproduced metabolite]
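Forward chaining can be sketched on the tutorial's toy network: start from a nutrient set, fire every reaction whose substrates are available, and repeat until nothing changes. Any essential compound still missing at the end points to a hole. The nutrient set {A, E} and the essential compound D are assumptions made for this example, as is the choice of which reactions to delete.

```python
# Intracellular reactions of the toy network: name -> (substrates, products).
REACTIONS = {
    "R2": ({"A"}, {"B"}),
    "R3": ({"A"}, {"C"}),
    "R4": ({"B", "E"}, {"D"}),
    "R6": ({"B"}, {"C", "F"}),
    "R7": ({"C"}, {"D"}),
}

def forward_chain(reactions, nutrients):
    """Fire every reaction whose substrates are available until nothing changes."""
    available, fired = set(nutrients), set()
    changed = True
    while changed:
        changed = False
        for name, (substrates, products) in reactions.items():
            if name not in fired and substrates <= available:
                fired.add(name)
                available |= products
                changed = True
    return available, fired

ESSENTIAL = {"D"}  # assumed essential compound

available, fired = forward_chain(REACTIONS, nutrients={"A", "E"})
missing = ESSENTIAL - available

# Deleting R4 and R7 removes every route to D, so D shows up as missing.
pruned = {name: rxn for name, rxn in REACTIONS.items() if name not in ("R4", "R7")}
available_pruned, fired_pruned = forward_chain(pruned, nutrients={"A", "E"})
missing_pruned = ESSENTIAL - available_pruned
print(missing, missing_pruned)
```

Backward chaining would run the same idea in reverse, starting from the missing compound and asking which precursors and reactions could supply it.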


52

Case study II: Integration of a metabolic flux model from two sources

• What is metabolic flux analysis?
• How does one build a metabolic flux model?
• What can go wrong in building a metabolic flux model?


53

What is Metabolic Flux Analysis?

• Starts with the metabolic network
• Assumes steady-state behavior
• Constrain with Thermodynamics
• Add Nutrient conditions
• Choose an objective: Biomass growth
• Predicts growth rate for mutant and wild-type organisms under different conditions.


54

Start with the metabolic network

[Figure: toy network of metabolites A to F with fluxes v1 to v9. Uptake: v1, v5. Objective: v8. Waste: v9. Flux legend: Exchange, Intracellular, Objective]


55

Stoichiometric Matrix: Representation of the metabolic network

     R1   R2   R3   R4   R5   R6   R7   R8   R9
A    +1   -1   -1    0    0    0    0    0    0
B     0   +1    0   -1    0   -2    0    0    0
C     0    0   +1    0    0   +1   -1    0    0
D     0    0    0   +2    0    0   +1   -1    0
E     0    0    0   -1   +1    0    0    0    0
F     0    0    0    0    0   +1    0    0   -1

R1: → A
R2: A → B
R3: A → C
R4: B + E → 2D
R5: → E
R6: 2B → C + F
R7: C → D
R8: D →
R9: F →
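The matrix can be generated directly from the reaction list above, which is a useful check that the two agree. A minimal sketch in plain Python; the dictionary encoding (negative coefficients for consumed metabolites, positive for produced ones) and the function name are choices made for this example.

```python
# Reactions R1..R9 from the slide: metabolite -> stoichiometric coefficient
# (negative = consumed, positive = produced).
REACTIONS = {
    "R1": {"A": +1},
    "R2": {"A": -1, "B": +1},
    "R3": {"A": -1, "C": +1},
    "R4": {"B": -1, "E": -1, "D": +2},
    "R5": {"E": +1},
    "R6": {"B": -2, "C": +1, "F": +1},
    "R7": {"C": -1, "D": +1},
    "R8": {"D": -1},
    "R9": {"F": -1},
}
METABOLITES = ["A", "B", "C", "D", "E", "F"]

def stoichiometric_matrix(reactions, metabolites):
    """One row per metabolite, one column per reaction (in insertion order)."""
    return [[coeffs.get(m, 0) for coeffs in reactions.values()]
            for m in metabolites]

S = stoichiometric_matrix(REACTIONS, METABOLITES)
for metabolite, row in zip(METABOLITES, S):
    print(metabolite, row)
```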


56

What is a metabolic flux?

Sink fluxes

Source fluxes

Metabolite Pool


57

What is a metabolic flux?

For a reaction of stoichiometry R2: A → B

the rate of reaction, or flux, is equal to:

$$v_2 = -\frac{d[A]}{dt} = \frac{d[B]}{dt}$$

For a reaction of stoichiometry R4: B+E → 2D

the flux is equal to:

$$v_4 = -\frac{d[B]}{dt} = -\frac{d[E]}{dt} = \frac{1}{2}\frac{d[D]}{dt}$$


58

What is a metabolic flux?

For a reaction of stoichiometry R4: B+E → 2D

The rate of reaction, or flux, is equal to:

$$v_4 = -\frac{d[B]}{dt} = -\frac{d[E]}{dt} = \frac{1}{2}\frac{d[D]}{dt}$$


59

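At steady-state, nonlinear dynamics simplify to linear fluxes.

[Figure labels: A_ext, A, B, C; proteins P1, P2, P3; rate constants k1, k2, k3; fluxes v1, v2, v3]

$$\frac{dA}{dt} = \frac{k_1 [P_1][A_{ext}]}{K_{m,1} + [A_{ext}]} - \frac{k_2 [P_2][B]}{K_{m,2} + [B]} - \frac{k_3 [P_3][C]}{K_{m,3} + [C]}$$

$$\frac{dA}{dt} = v_1 - v_2 - v_3 = 0$$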


60

At steady-state, the sum of the fluxes that produce a metabolite is equal to the sum of the fluxes that consume it.

$$\frac{d[A]}{dt} = \sum_i c_i v_i = 0$$

[Figure labels: A_ext, A, B, C; fluxes v1, v2, v3]


61

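Stoichiometric Matrix: more unknowns than equations

$$
\begin{aligned}
\frac{d[A]}{dt} &= v_1 - v_2 - v_3 = 0 \\
\frac{d[B]}{dt} &= v_2 - 2 v_6 - v_4 = 0 \\
\frac{d[C]}{dt} &= v_3 + v_6 - v_7 = 0 \\
\frac{d[D]}{dt} &= 2 v_4 + v_7 - v_8 = 0 \\
\frac{d[E]}{dt} &= -v_4 + v_5 = 0 \\
\frac{d[F]}{dt} &= v_6 - v_9 = 0
\end{aligned}
$$

     R1   R2   R3   R4   R5   R6   R7   R8   R9
A    +1   -1   -1    0    0    0    0    0    0
B     0   +1    0   -1    0   -2    0    0    0
C     0    0   +1    0    0   +1   -1    0    0
D     0    0    0   +2    0    0   +1   -1    0
E     0    0    0   -1   +1    0    0    0    0
F     0    0    0    0    0   +1    0    0   -1

Flux vector: (v1, v2, v3, v4, v5, v6, v7, v8, v9)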
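The phrase "more unknowns than equations" can be checked numerically: the toy network has nine fluxes but only six balance equations, so the steady-state condition leaves some fluxes free. The sketch below computes the rank of the stoichiometric matrix by Gaussian elimination over exact fractions; it uses only the standard library, and `rank` is a helper written for this example.

```python
from fractions import Fraction

# Stoichiometric matrix of the toy network (rows A..F, columns R1..R9).
S = [
    [1, -1, -1,  0, 0,  0,  0,  0,  0],
    [0,  1,  0, -1, 0, -2,  0,  0,  0],
    [0,  0,  1,  0, 0,  1, -1,  0,  0],
    [0,  0,  0,  2, 0,  0,  1, -1,  0],
    [0,  0,  0, -1, 1,  0,  0,  0,  0],
    [0,  0,  0,  0, 0,  1,  0,  0, -1],
]

def rank(matrix):
    """Rank of a matrix by Gauss-Jordan elimination over exact fractions."""
    rows = [[Fraction(x) for x in row] for row in matrix]
    r = 0
    for col in range(len(rows[0])):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(len(rows)):
            if i != r and rows[i][col] != 0:
                factor = rows[i][col] / rows[r][col]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
        if r == len(rows):
            break
    return r

n_unknowns = len(S[0])
n_independent = rank(S)
degrees_of_freedom = n_unknowns - n_independent
print(n_unknowns, n_independent, degrees_of_freedom)
```

Three degrees of freedom remain, which is why extra constraints and an objective are needed to single out one flux distribution.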


62

How to determine the metabolic capabilities of a network?

[Figure: toy network of metabolites A to F with fluxes v1 to v9. Uptake: v1, v5. Biomass: v8. Waste: v9. Flux legend: Exchange, Intracellular, Objective]


63

Using Elementary modes to study the steady-state behavior

[Figure: three copies of the toy network, each highlighting a different route through fluxes v1 to v9]

     v1   v2   v3   v4   v5   v6   v7   v8   v9
A    +1   -1   -1    0    0    0    0    0    0
B     0   +1    0   -1    0   -2    0    0    0
C     0    0   +1    0    0   +1   -1    0    0
D     0    0    0   +2    0    0   +1   -1    0
E     0    0    0   -1   +1    0    0    0    0
F     0    0    0    0    0   +1    0    0   -1
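As a sketch of what an elementary mode looks like in numbers, the three flux vectors below each use a minimal set of reactions and satisfy the steady-state condition for the matrix above. They were worked out by hand from the stoichiometry for this example; they are an assumption about what the three panels show, not values read off the slide.

```python
# Stoichiometric matrix of the toy network (rows A..F, columns v1..v9).
S = [
    [1, -1, -1,  0, 0,  0,  0,  0,  0],
    [0,  1,  0, -1, 0, -2,  0,  0,  0],
    [0,  0,  1,  0, 0,  1, -1,  0,  0],
    [0,  0,  0,  2, 0,  0,  1, -1,  0],
    [0,  0,  0, -1, 1,  0,  0,  0,  0],
    [0,  0,  0,  0, 0,  1,  0,  0, -1],
]

# Hand-derived candidate modes, written as (v1, ..., v9).
MODES = {
    "A -> B, B + E -> 2D": [1, 1, 0, 1, 1, 0, 0, 2, 0],
    "A -> C -> D":         [1, 0, 1, 0, 0, 0, 1, 1, 0],
    "A -> B, 2B -> C + F": [2, 2, 0, 0, 0, 1, 1, 1, 1],
}

def is_steady_state(matrix, fluxes):
    """True if every metabolite is produced exactly as fast as it is consumed."""
    return all(sum(c * v for c, v in zip(row, fluxes)) == 0 for row in matrix)

mode_checks = {name: is_steady_state(S, v) for name, v in MODES.items()}

# Any non-negative combination of modes is again a steady state.
combined = [3 * a + 2 * b for a, b in zip(MODES["A -> B, B + E -> 2D"],
                                          MODES["A -> C -> D"])]
combined_ok = is_steady_state(S, combined)
print(mode_checks, combined_ok)
```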


64

How to make predictions about the behavior of the metabolic network?

[Figure: toy network of metabolites A to F with fluxes v1 to v9. Uptake: v1, v5. Biomass: v8. Waste: v9. Flux legend: Exchange, Intracellular, Objective]


65

Optimal wild-type flux distribution

[Figure: toy network annotated with the flux values 10, 10, 10, 10 and 20. Label: Optimal Growth Flux]


66

Optimal mutant flux distribution

[Figure: toy network with one reaction blocked (STOP), annotated with the flux values 10, 10, 10 and 10]


67

[Figure: the example network (metabolites A to F, fluxes v1 to v9) with one reaction blocked (STOP) and flux values 10, 3.3, 6.7, 6.7, 6.7, 3.3, and 3.3.]

Suboptimal mutant flux distribution
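The contrast between an optimal and a suboptimal mutant prediction can be sketched on a toy case. The two-route network below is hypothetical (it is not the slide's network) and is chosen so that the knockout's steady states lie on a single line, which lets both predictions be written in closed form. The optimal prediction maximizes growth; the suboptimal one, in the spirit of minimization of metabolic adjustment, picks the feasible point closest to the wild-type fluxes.

```python
# Hypothetical toy network, flux order: [uptake, r1, r2, growth]
#   A: +uptake - r1 - r2 = 0
#   B: +r1 + 0.5*r2 - growth = 0   (route r2 has half the yield of r1)
# Uptake is capped at 10. The wild type routes everything through r1.
wild_type = [10.0, 10.0, 0.0, 10.0]

# Knocking out r1 leaves only steady states of the form t * direction,
# with 0 <= t <= t_max (the uptake cap).
direction = [1.0, 0.0, 1.0, 0.5]
t_max = 10.0


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def scaled(t, d):
    return [t * x for x in d]


# Optimal mutant: growth = 0.5 * t is largest at t = t_max.
optimal_mutant = scaled(t_max, direction)

# Minimal adjustment: project the wild-type fluxes onto the feasible line,
# then clamp to the allowed range of t.
t_closest = min(max(dot(direction, wild_type) / dot(direction, direction), 0.0), t_max)
suboptimal_mutant = scaled(t_closest, direction)

print(optimal_mutant)
print([round(x, 2) for x in suboptimal_mutant])
```

In this toy case the minimal-adjustment solution takes up less substrate and grows more slowly than the growth-optimal one. Real models have many degrees of freedom and need a linear or quadratic programming solver instead of this closed form.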

68

Case II: Palsson JR904

• good flux balance model
• implicit schema
• literature curated biochemical reactions
• 904 enzymatic reactions
• gene, enzyme-reaction associations

69

Case II: What sources of data are available to build a Metabolic Flux model?

• Annotated Genome
• Literature
• Pathway Databases
• Experimental measurements

70

Model vs. Exper., Glucose limited

[Figure: scatter plot of theoretical flux v_i (theor) against experimental flux v_i (exper) for 17 numbered fluxes; WT (FBA), condition C 0.4; both axes run from 0 to 200. Corr.coeff.=0.97]

(fluxes in [mmol/gr DM h] normalized to glucose uptake flux)

(Segrè, Vitkup and Church, PNAS 2002)
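The correlation coefficient quoted for model versus experiment can be computed as a Pearson correlation over paired fluxes. A minimal sketch follows; the flux values are made up for illustration and are not the data behind the plot.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Made-up paired fluxes (experimental, theoretical), for illustration only.
v_exper = [100.0, 70.0, 15.0, 150.0, 40.0]
v_theor = [100.0, 80.0, 5.0, 140.0, 55.0]

print(round(pearson(v_exper, v_theor), 3))
```

A value near 1 means the predicted fluxes rank and scale like the measured ones; a value near 0 means the model carries little information about the measurements.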

71

[Figure: 3×3 grid of scatter plots (panels A to I) of theoretical flux against experimental flux v_i (exper) for 17 numbered fluxes. Rows: WT (FBA), KO (FBA), KO (MOMA). Columns: C 0.09 (Low Glucose Limited), C 0.4 (High Glucose Limited), N 0.09 (Nitrogen Limited). Vertical axes: v_i (theor) and v_i pyk (theor).]

Corr.coeff.=0.91 (Low Glucose Limited), Corr.coeff.=0.97 (High Glucose Limited), Corr.coeff.=0.78 (Nitrogen Limited)

72

[Figure: two scatter plots of theoretical flux v_i (theor) against experimental flux v_i (exper) for 17 numbered fluxes; both axes run from -50 to 250.]

Max growth (optimal): Corr.coeff.=-0.064, P-value=0.6

Min Adjust. (suboptimal): Corr.coeff.=0.564, P-value=0.007

73

The power of a model lies in its ability to distinguish between competing hypotheses

74

Case II: EcoCyc

• good schema
• Flux balance model doesn’t work

75

What happens if the steady-state behavior of the model fails to reproduce the steady-state behavior of the organism?

[Workflow diagram boxes: Genome; Pathologic; Transporter prediction; Pathway/Genome Database; BioCyc to SBML; Model Definition (SBML); Nutrients & Objective; FBA & MOMA; Flux prediction]

76

What happens if the steady-state behavior of the model fails to reproduce the steady-state behavior of the organism?

[Workflow diagram boxes: Genome; Pathologic; Transporter prediction; Pathway/Genome Database; BioCyc to SBML; Network Debugging; Model Definition (SBML); Nutrients & Objective; FBA & MOMA; Flux prediction]

77

Case II: EcoCyc/JR904

• Best of both worlds

• Biological Objective: From nutrients create all essential compounds required for growth

• True test of metabolic databases: Is the data good enough to predict growth rate under different nutrient conditions and effect of gene knockouts?

78

Case II: Schema level integration

• Translation from BioCyc ontology to BioPAX ontology

• Translation of implicit JR904 schema to BioPAX ontology

• Integration of JR904 concepts with BioPAX ontology (flux limits)

79

Case II: Instance level

• EcoCyc <-> JR904 Gene names

• EcoCyc <-> JR904 Enzyme names

• EcoCyc <-> JR904 Reaction names

• EcoCyc <-> JR904 Reversibility/flux limits

• EcoCyc <-> JR904 Gene->protein associations

• EcoCyc <-> JR904 protein->enzyme complex associations

• EcoCyc <-> JR904 enzyme->reaction associations
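Instance-level matching of this kind usually starts with name normalization plus a curated synonym table. The sketch below is one possible approach, not the procedure used in the tutorial; every identifier in it is invented for illustration and does not come from EcoCyc or JR904.

```python
def normalize(name):
    """Lowercase and drop punctuation that often differs between sources."""
    return "".join(ch for ch in name.lower() if ch.isalnum())


def match_names(names_a, names_b, synonyms=None):
    """Map each name in names_a to a name in names_b.

    A name matches when its normalized form (after an optional synonym
    lookup) equals the normalized form of a name in names_b.
    Returns (matched, unmatched).
    """
    synonyms = synonyms or {}
    index_b = {normalize(n): n for n in names_b}
    matched, unmatched = {}, []
    for name in names_a:
        key = normalize(synonyms.get(name, name))
        if key in index_b:
            matched[name] = index_b[key]
        else:
            unmatched.append(name)
    return matched, unmatched


# Invented identifiers, for illustration only.
source_a = ["PGI-RXN", "pyk_F", "b1234", "orphan1"]
source_b = ["pgirxn", "PYKF", "geneX"]
curated_synonyms = {"b1234": "geneX"}

matched, unmatched = match_names(source_a, source_b, curated_synonyms)
print(matched)
print(unmatched)
```

The unmatched list is the useful output: those are the entries that need manual curation before the two sources can be merged.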

80

Data Profiling of Flux Model

• Incorrect constraints (reversibility)
• Incorrect Nutrient conditions
• Incorrect Biomass composition
• Incorrect protein function predictions

81

Data profiling of Flux Predictions

• Incorrect hypothesis (FBA vs MOMA vs ROOM)

• Incorrect network architecture (gene knockouts)

• Incorrect modeling assumptions (steady-state assumption, gene expression profiles)

82

Fixing the problems you find

Requires different amounts of time, money, and expertise

– Enzyme Genomics project
– Community annotation projects
– Adopt-a-Genome project
– High-throughput experiments
– Pathway hole filling algorithms

83

Case III: Semantic Aggregation Case study

Prochlorococcus marinus MED4

• Most abundant photosynthetic organism in the ocean

• Responsible for a significant portion of photosynthetic carbon fixation

• Iron hypothesis: Possible solution to global warming?

• Need to understand details of metabolic network

84

Case III: Multi-source aggregation

Public
– KEGG (metabolism)
– BioCyc (metabolism)
– WIT (metabolism)
– TransportDB (transport proteins)

Local
– RNA expression (microarrays)
– protein expression (mass spec)

85

Case III: Goal

Constrain metabolic flux model with experimental measurements:

• RNA expression
• Protein expression
• Metabolite concentrations
• Flux measurements

86

Case III: Aggregation Problems

• Higher Level: Orphan enzymes

• Schema Level: Bridge ontologies

• Instance Level: Object identity problem

• Simulation Level: underdetermined system.

87

Case III: Multi-source aggregation Validation and Testing

• Joint-learning from multiple sources

• Semantic test suite for data validation

• Network debugging algorithms

88

Time Out

(15 minutes)

89

Lessons Learned (30 minutes)

What did you learn?

Discussion

“A good representation is the key to good problem solving” –Patrick Winston

“Standard is better than best”—Gerald J Sussman

“The great thing about standards is that there are so many from which to choose” --Unknown

“Above all, one must develop a feeling for the organism.”—Barbara McClintock

“Someone does it once, everybody benefits.” --Eric Miller, W3C Semantic Web Activity Lead

Remember: people, process, technology. Without people there isn’t any process or technology, so it’s all social engineering.

90

Lessons Not Yet Learned

(Take home exercise)

91

Feedback

Our goal is to have you walk away with a clear understanding of how to approach any database integration project.

To provide:

• A methodology to scope and plan the project
• An understanding of what to expect
• Some specific examples to illustrate what is common to all integration projects (data cleaning) and what is specific to a particular task (i.e., to give you a sense of it)
• Some first-hand experience of pedantic aggravation, irritation, and interference

How did we do? Please let us know how we can improve this tutorial.

92

Thank You

Joanne & Jeremy