pathway knowledge base: a public repository for searching biological pathways nikesh kotecha 1, kyle...

22
Pathway Knowledge Base: Pathway Knowledge Base: A Public Repository for Searching A Public Repository for Searching Biological Pathways Biological Pathways Nikesh Kotecha Nikesh Kotecha 1 , Kyle Bruck , Kyle Bruck 1 , William Lu , William Lu 1 and Nigam Shah and Nigam Shah 1 1 Department of Biomedical Informatics, Stanford University Department of Biomedical Informatics, Stanford University Email: [email protected] Email: [email protected] A B C http://pkb.stanford.edu

Upload: luciano-liddicoat

Post on 14-Dec-2015

213 views

Category:

Documents


0 download

TRANSCRIPT

Pathway Knowledge Base:Pathway Knowledge Base:A Public Repository for Searching Biological A Public Repository for Searching Biological

Pathways Pathways

Nikesh KotechaNikesh Kotecha11, Kyle Bruck, Kyle Bruck11, William Lu, William Lu11 and Nigam Shah and Nigam Shah11

11Department of Biomedical Informatics, Stanford UniversityDepartment of Biomedical Informatics, Stanford University

Email: [email protected]: [email protected]

A B

C

http://pkb.stanford.edu

PKBPKB

C

BA

OutlineOutline

• The problem with pathways…The problem with pathways…• BioPAX and data exchangeBioPAX and data exchange• Pathway Knowledge BasePathway Knowledge Base

– Querying over multiple data sourcesQuerying over multiple data sources– Data import and query processData import and query process

• EvaluationEvaluation• Conclusion & Next StepsConclusion & Next Steps

PKBPKB

C

BAPathways convey biological Pathways convey biological phenomenaphenomena

Sundaraaj et. al, 2005

PKBPKB

C

BAUsers interpret non-standard pathway Users interpret non-standard pathway diagramsdiagrams

Insulin signaling pathways – KEGG & BioCarta

PKBPKB

C

BA

The problem with pathways…The problem with pathways…

• Too many sources to investigateToo many sources to investigate– >200 databases in the Pathway Resource List>200 databases in the Pathway Resource List

– http://www.pathguide.orghttp://www.pathguide.org

• Pathway diagrams are the primary way for consuming Pathway diagrams are the primary way for consuming informationinformation– Non-standardNon-standard– Incorporates a handful of genes and proteinsIncorporates a handful of genes and proteins– Difficult to compute overDifficult to compute over

• ““Pathway” representation is not easy to definePathway” representation is not easy to define– Appropriate levels of abstractionAppropriate levels of abstraction– Standard naming systems for moleculesStandard naming systems for molecules– Appropriate numbers of componentsAppropriate numbers of components

PKBPKB

C

BABioPAX is an emerging standard for BioPAX is an emerging standard for pathway representationpathway representation

http://www.biopax.org

PKBPKB

C

BABioPAX allows for creation of a central BioPAX allows for creation of a central resourceresource

• BioPAX BioPAX – Standardized representation of Pathway data in OWLStandardized representation of Pathway data in OWL

• Provide method for exchange that promotes interoperability Provide method for exchange that promotes interoperability between known data sources.between known data sources.

We use BioPAX level 2 to We use BioPAX level 2 to integrate Kegg, BioCyc and integrate Kegg, BioCyc and ReactomeReactome

PKBPKB

C

BAPathway Knowledge Base (PKB)Pathway Knowledge Base (PKB)

• Infrastructure to integrate pathway information Infrastructure to integrate pathway information from multiple sources into a central resourcefrom multiple sources into a central resource– BioPAX and Oracle RDF modelBioPAX and Oracle RDF model

• Methods to query pathways in these databasesMethods to query pathways in these databases– SQL/SPARQLSQL/SPARQL

• A web interface that allows users and other A web interface that allows users and other programs to query and access these pathwaysprograms to query and access these pathways– JSP/AJAXJSP/AJAX

PKBPKB

C

BASearch pathways across data sources and speciesSearch pathways across data sources and species

http://pkb.stanford.edu

PKBPKB

C

BA

Visualize and navigate pathway results Visualize and navigate pathway results

http://pkb.stanford.edu

PKBPKB

C

BA

Custom queries enabled via advanced searchCustom queries enabled via advanced search

http://pkb.stanford.edu

PKBPKB

C

BA

Under the Hood…Under the Hood…

PKBPKB

C

BA

ArchitectureArchitecture

Pathway Viewer tools

Patika, Cytoscape etc.

Web Client Interface

RDF Objects

PKBPKB

C

BA

BioPAXfilesReactomeBioCyc

BioPAX

Import Object

Web Server

Search Pathway

BioPAX output

SQL

HTTP

JSP

JENA

PKBPKB

C

BA

Sample Query: List all reactions in PKBSample Query: List all reactions in PKB

select distinct t,x,y FROM(select 'HUMAN' as t,x,y from TABLE(SDO_RDF_MATCH('(?y rdf:type :biochemicalReaction) (?y :NAME ?x)', SDO_RDF_Models('human'), null,SDO_RDF_Aliases(SDO_RDF_Alias('','http://www.biopax.org/release/biopax-level2.owl#')),null))UNION ALLselect 'YEAST' as t,x,y from TABLE(SDO_RDF_MATCH('(?y rdf:type :biochemicalReaction) (?y :NAME ?x)', SDO_RDF_Models('yeast'), null,SDO_RDF_Aliases(SDO_RDF_Alias('','http://www.biopax.org/release/biopax-level2.owl#')),null))UNION ALLselect 'ECOLI' as t,x,y from TABLE(SDO_RDF_MATCH('(?y rdf:type :biochemicalReaction) (?y :NAME ?x)', SDO_RDF_Models('ecoli'), null,SDO_RDF_Aliases(SDO_RDF_Alias('','http://www.biopax.org/release/biopax-level2.owl#')),null)))ORDER BY x

Each species has its own RDF model

Query across species via UNION ALL

Determine data source via URI:http://pkb.stanford.edu/biopax#biocyc_biochemicalReaction124086301HUMANRecruitment of elongation factors to form HIV-1 elongation complex

http://pkb.stanford.edu/biopax#reactome_Recruitment_of_elongation_factors_to_form_HIV_1_elongation_complex

PKBPKB

C

BA

Adding new species and data sourcesAdding new species and data sources

• Add new speciesAdd new species– Create TABLE for storing rdf dataCreate TABLE for storing rdf data– Create an RDF_MODEL for that speciesCreate an RDF_MODEL for that species– Insert data as N-TriplesInsert data as N-Triples

• Add new data sourceAdd new data source– Obtain BioPAX pathway dataObtain BioPAX pathway data– Convert BioPAX OWL files into the triples format using JENAConvert BioPAX OWL files into the triples format using JENA– Update all unique identifiers to have a custom URI* (Update all unique identifiers to have a custom URI* (

http://pkb.stanford.edu/biopax#datasourcenamehttp://pkb.stanford.edu/biopax#datasourcename))– Upgrade biopax identifiers from level 1 to level 2*Upgrade biopax identifiers from level 1 to level 2*– Insert data into tableInsert data into table

PKBPKB

C

BA

EvaluationEvaluation

PKBPKB

C

BA

PKB Query Performance PKB Query Performance

Human E. coli Yeast Total

# of pathways 1072 515 606 2193

# of reactions 3550 1966 1794 7310

# of triples 525148 367454 238877 1,131,479

Summary of data available in PKB

Example query times

Queries run on a Dell PowerEdge 2800 w/ 4 GB RAM and Oracle 10.2.0.2

Query Time (s)

Search for pathway titles containing phospholipase 1.3

Find all proteins with bcl1 in their name or synonym 16.1

Find all pathways with protein bcl1 35.3

List all entities to the left or right of hydroxyanthranilate 39.739.7

PKBPKB

C

BACertain queries are easier with Certain queries are easier with relational structuresrelational structures

Pathway A Pathway B

Reaction D

Protein E

Reaction C

PathwayPathway ReactionReaction ProteinProtein

AA CC EE

BB DD EE

Query:Query:Given protein E, find all pathwaysGiven protein E, find all pathways

Relational Structure

Directed Graph

PKBPKB

C

BA

Pathway Knowledge BasePathway Knowledge Base

• BenefitsBenefits– Centralized resource for querying pathway information across Centralized resource for querying pathway information across

data sources and speciesdata sources and species

– Demonstrates use of BioPAX and RDF for exchange and Demonstrates use of BioPAX and RDF for exchange and integration of pathway data integration of pathway data

• LimitationsLimitations– Storing data Storing data onlyonly as RDF graph structures limits query as RDF graph structures limits query

performanceperformance

– Import requires manipulation of namespacesImport requires manipulation of namespaces

– Querying options and methods are based on SPARQL but do Querying options and methods are based on SPARQL but do not currently support all the features specified in SPARQLnot currently support all the features specified in SPARQL

PKBPKB

C

BA

Next Steps in PKB?Next Steps in PKB?

• Pathway differences between data sourcesPathway differences between data sources– How does the galactose metabolism differ b/w ecocyc How does the galactose metabolism differ b/w ecocyc

and reactomeand reactome

• Pathway mergingPathway merging– Merge information from the same pathways from Merge information from the same pathways from

different data sourcesdifferent data sources

• Pathway consistency checkingPathway consistency checking– BioPAX rules for testing pathway modelsBioPAX rules for testing pathway models

PKBPKB

C

BA

AcknowledgementsAcknowledgements

• PKB TeamPKB Team– Kyle BruckKyle Bruck– William LuWilliam Lu– Nigam ShahNigam Shah

• FeedbackFeedback– Daniel RubinDaniel Rubin– Russ AltmanRuss Altman

• HostingHosting– National Center for Biomedical OntologiesNational Center for Biomedical Ontologies

• Funding SourcesFunding Sources– Stanford Medical InformaticsStanford Medical Informatics– National Library of MedicineNational Library of Medicine– National Institute of HealthNational Institute of Health

PKBPKB

C

BA

Thanks. Any Questions?Thanks. Any Questions?

http://pkb.stanford.eduhttp://pkb.stanford.edu

[email protected]@lists.stanford.edu