1758-2946-4-6

Upload: pascalm1

Post on 06-Apr-2018

218 views

Category:

Documents


0 download

TRANSCRIPT

  • 8/2/2019 1758-2946-4-6

    1/22

    This Provisional PDF corresponds to the article as it appeared upon accPDF and full text (HTML) versions will be made availab

    Improving integrative searching of systems chemical bsemantic annotation

    Journal of Cheminformatics 2012, 4:6 doi:10.1186/1758

    Bin Chen ([email protected])Ying Ding ([email protected])David J Wild ([email protected])

    Journal of Cheminformatics

    mailto:[email protected]:[email protected]:[email protected]:[email protected]:[email protected]:[email protected]
  • 8/2/2019 1758-2946-4-6

    2/22

    Improving integrative searching of sy

    chemical biology data using semanti

    annotation

    Bin Chen,Aff1Corresponding Affiliation: Aff1

    Email: [email protected]

    Ying Ding,Aff2Email: [email protected]

    David J Wild,Aff1Email: [email protected]

  • 8/2/2019 1758-2946-4-6

    3/22

    Availability

    Chem2Bio2OWL is available at http://chem2bio2rdf.org/owl. The documhttp://chem2bio2owl.wikispaces.com.

    Background

    Recent efforts [1-3] in the Semantic web have involved conversion of var

    biological data sources into semantic formats (e.g., RDF, OWL) and linke

    large networks. The number of bubbles in Linked Open Data (LOD) [4] hfrom 12 in 2007 to 203 in 2010. This richly linked data allows answering

    scientific questions using the SPARQL query language [5], finding paths

    and ranking associations of different entities [7,8]. Our previous work on

    [3] offers a framework to data mine systems chemical biology and chemo

    exemplified by the examples given in our paper: compound selection in p

    multiple pathway inhibitor identification and adverse drug reaction - path

  • 8/2/2019 1758-2946-4-6

    4/22

    explicitly. BioAssay Ontology [22] is developed to standardize the descri

    experiments and screening results. DDI [23] and OBI [24] present integra

    frameworks in drug discovery investigation and biomedical investigationnumber of upper ontologies such as Basic Formal Ontology (BFO) [25] a

    support domain ontology building as well. Many of the ontologies are dep

    foundry [26] or NCBO BioPortal [27], for public access. Using ontologie

    and reason has been widely practiced in life sciences. Baitaluk and Ponom

    IntegromeDB to semantically integrate over 100 experimental and compu

    sources relating to genomics, transcriptomics, genetics, and functional an

    concerning gene transcriptional regulation in eukaryotes and prokaryotes created logical rules using Semantic Web Rule Language to answer resea

    pertaining to pseudogenes [29].

    Systems Chemical Biology [30] (and its sub-discipline of chemogenomic

    discipline studying how chemicals interact with the whole biological syst

    which cover a wide range of entities (compounds, drugs, proteins, genes,

    ff t th d ) d i l ti b t titi h

  • 8/2/2019 1758-2946-4-6

    5/22

    BioPAX offers a standard, well defined representation of biological pathw

    and it has been widely used in biological data integration [15,31]. We imp

    from BioPAX and made subsequent extensions based upon our use casesclasses were refined in accordance with current instance data structure. Sm

    and Protein were put under PhysicalEntity. Their relation with Disease an

    elaborated under Interaction, which is further classified into DrugInduced

    DrugTreatment, DrugDrugInteraction, ProteinProteinInteraction and

    ChemicalProteinInteraction. BioAssay and Literature serve as Evidence t

    relations. Pathway was treated as a black box since its instance data is ju

    Other than Interaction, we did not intend to further classify other individu

    Table 1Primary classes, their description, sample instance data sourcof sample annotated instances. Data sources were described in [32]primary classes description sample instance data sources

    SmallMolecule a small bioactive molecule PubChem, ChEBI

    Drug a chemical used in the treatment DrugBank PharmGKB TTD

  • 8/2/2019 1758-2946-4-6

    6/22

    placed under Interaction; otherwise, they were presented as object proper

    Ontology (RO) [33] was imported to help present basic relations. For exa

    ProteinProteinInteraction not only covers the binary relation between twoaffiliates its experimental conditions (e.g., organism and interaction type)

    participant in that interaction. Similarly, Chemical and Protein serve as pa

    ChemicalProteinInteraction, which includes other information such as the

    interaction. Figure 2 shows major classes and their relations.

    Figure 2Overview of Chem2Bio2OWL. Only part of classes (presented

    relations (presented as edges) are visualized. Some classes in ChemicalPr

    ignored due to the limited space

    Data properties appeared in the original database sources were not fully c

    only the important ones related to our purpose (chemogenomics and syste

    biology). This simplifies the ontology without losing essential knowledge

    including data property name, class name and relation name were manual

    i l t t l i i th OBO d NCBO Bi P t l d th t i

  • 8/2/2019 1758-2946-4-6

    7/22

    Implementation

    Figure 4 shows the data integration workflow we used to populate the ontJava scripts along with the OWL API Java package [37] were used to aut

    annotation of Chem2Bio2RDF data using Chem2Bio2OWL. Pellet reason

    applied to reason new relations. The annotated data plus new relations we

    Virtuoso triple store [39] for querying. Efforts were made to cope with da

    inconsistence and provenance. Data redundancy is originated from the ho

    source of the objects. Chemical compounds for example were presented a

    (e.g., SMILES, InChi, MOL, etc.) and many data sources have their own compounds. The URI of individual instance in Chem2Bio2OWL is based

    source ID or fake ID if primary ID is unavailable. PubChem as the larges

    hub is considered as the primary source for chemicals. Its identifier Comp

    used to identify compounds (e.g., http://chem2bio2rdf.org/chem2bio2owl

    The compounds with unknown CIDs were assigned CIDs by searching Pu

    InChi, a universal structure representation. A fake CID was assigned if th

    i i P bCh D i d id ff i D B k ID U

  • 8/2/2019 1758-2946-4-6

    8/22

    Drug related target identification

    Identification of potential targets for drugs is important for discovering neapplications as well as identifying potential undesirable side-effects (off

    interactions). These kinds of interactions are described in different ways

    Chem2Bio2RDF datasets: PubChem BioAssay, ChEMBL and BindingDB

    experiments; PharmGKB contains genetic variations upon drug response;

    Express contain expression data; KEGG contains interactions in pathway

    question: What are the possible targets of drug (e.g., Troglitazone)? pr

    complex SPARQL query explicitly referencing each data set individuallySPARQL presents the searching of two chemogenomics database:

    PREFIX compound:

    PREFIX bindingdb:

    PREFIX drugbank:

    PREFIX uniprot:

  • 8/2/2019 1758-2946-4-6

    9/22

    PREFIX c2b2r:

    PREFIX bp:

    PREFIX ro:

    PREFIX rdfs:

    PREFIX rdf:

    select distinct ?target_name

    from

    where

    {?chemical rdfs:label Troglitazone^^xsd:string;

    ro:participates_in ?interaction .

    ?interaction rdf:type c2b2r:ChemicalProteinInteraction;

    ro:has_participant ?target .

    ?target rdf:type bp:Protein ;

  • 8/2/2019 1758-2946-4-6

    10/22

    Following the steps in the previous query, this query is interpreted as: this

    evidence which is a bioassay; the assay has description and outcome, and

    exists.

    Target inhibitor/activator searching

    Pregnane X receptor (NR1I2) is a transcriptional regulator of the expressi

    metabolism and transporter genes. It has multiple binding sites, accountin

    functions. Its agonists at the ligand-binding domain would trigger up-regu

    increase the metabolism and excretion of therapeutic agents, and cause drinteractions, but its antagonists counteract such interactions [41]. Due to d

    sites, the two types of compounds may be quite different structurally. Usi

    Chem2Bio2OWL, we are able to answer this question: Find NR1I2 agon

    compounds with weight500. The following SPARQL was used to retriTheir structures are quite different with 6 antagonists retrieved from anoth

    the significance of classifying the ligands.

  • 8/2/2019 1758-2946-4-6

    11/22

    hepatorenal syndrome, that could be further linked to disease genes in ou

    ABCB11 is one of the liver disease related genes and its activity is affecte

    ABCB11 involves in the liver bile acid transportation and metabolism (frABCB11). It is not surprising that the change of its activity will result in

    Similarly, we mapped heart attack to heart disease in the disease ontology

    failure, endocarditis, pericarditis,etc, which are linked to 7 disease genes.

    overlap between disease genes and Rosiglitazone related targets was foun

    turned to find disease related targets. First, the drugs causing heart diseas

    their related targets were further identified, grouped, and ranked by the nu

    common drugs. The higher ranking indicates the higher possibility linkin

    The top 10 targets are CYP3A4, CYP2C9, ABCB1, CYP1A2, PTGS2, C

    CYP3A5, CYP2C19 and PPARG. The top one CYP3A4 for example is sh

    out of 181 total heart disease related drugs. Some high ranked targets like

    affected by Troglitazone, nevertheless, the activity of CYP2D6 shared by

    related drugs is affected only by Rosiglitazone, it was not found in the Tr

    targets. Further literature search indicates that CYP2D6 plays a very impo

    cardiovascular disease [45] Although further experimental evaluation wo

  • 8/2/2019 1758-2946-4-6

    12/22

    Many current methods and tools often convert data into RDFs directly fro

    bases. They prove useful in their own right to link multiple datasets, but w

    ontology to model the concepts and their relations, the linked data would demonstrate the capability of semantics, limiting its further usage in data

    reasoning. In addition, adding evidence and provenance to support asserti

    importance to keep track of the record, but it has not been well implemen

    RDFization process. Chem2Bio2OWL provides a high level structure of t

    relations according to use case queries and instance data structures, and c

    and provenance. However, the case driven ontology building process has

    compromises on accuracy to many top down approaches which aims to

    knowledge oriented from philosophical perspective, nevertheless, our wo

    engineering solution to address the urgent needs in this area. The integrat

    systems chemical biology data can be performed in a very intuitive and ef

    than Chem2Bio2RDF, some public datasets (e.g., Bio2RDF and LODD) c

    using Chem2Bio2OWL.

    F th ff t h ld b d t li Ch 2Bi 2OWL ith b i t

  • 8/2/2019 1758-2946-4-6

    13/22

    Acknowledgements

    We thank Shanshan Chen, Kuochung Peng, Qian Zhu, Jaehong Shin and

    the cheminformatics group at Indiana University for discussion and comm

    References

    1. Belleau F, et al.: Bio2RDF: Towards a mashup to build bioinforma

    systems.J Biomed Inform 2008, 41(5):706716.

    2. Jentzsch A, et al.: Linking open drug data. In Triplification Challeng

    International Conference on Semantic Systems 2009.

    3. Chen B, et al.: Chem2Bio2RDF: a semantic framework for linking

    chemogenomic and systems chemical biology data.BMC Bioinformati

    d i 10 1186/1471 2105 11 255

  • 8/2/2019 1758-2946-4-6

    14/22

    15. Choi J, et al.: A semantic web ontology for small molecules and th

    targets.J Chem Inf Model 2010, 50(5):732741. doi:10.1021/ci900461j.

    16. Ivchenko O, et al.: PLIO: an ontology for formal description of pr

    interactions.Bioinformatics 2011, [[http://dx.doi.org/10.1093/bioinform

    17. Qu XA, et al.: Inferring novel disease indications for known drug

    linking drug action and disease mechanism relationships.BMC Bioinf

    doi:doi:10.1186/1471-2105-10-S5-S4.

    18. Demir E, et al.: The BioPAX community standard for pathway da

    Biotechnol 2010, 28(9):935942. doi:10.1038/nbt.1666.

    19. Luciano JS, Andersson B, Batchelor C, Bodenreider O, Clark T, Den

    C, Gambet T, Harland L, Jentzsch A, Kashyap V, Kos P, Kozlovsky J, Le

    McCusker JP, McGuinness DL, Ogbuji C, Pichler E, Powers RL, Prudho

    S ld M S h i l L T ll t PJ Wh t l PL Zh J St h S D

  • 8/2/2019 1758-2946-4-6

    15/22

    27. Noy NF, et al.: BioPortal: ontologies and integrated data resource

    mouse.Nucleic Acids Res 2009, 37:170173. doi:10.1093/nar/gkp440.

    28. Baitaluk M, Ponomarenko J: Semantic integration of data on trans

    regulation.Bioinformatics 2010, 13:16511661. doi:10.1093/bioinforma

    29. Holford ME, et al.: Using semantic web rules to reason on an onto

    pseudogenes.Bioinformatics 2010, 26(12):7178. doi:10.1093/bioinform

    30. Oprea TI, et al.: Systems chemical biology.Nat Chem Biol 2007, 3(

    doi:10.1038/nchembio0807-447.

    31. Ruebenacker O, Moraru II, Schaff JC, Blinov ML: Kinetic modeling

    ontology.Proceedings (IEEE Int Conf Bioinformatics Biomed) 2007, 200

    doi:[[http://dx.doi.org/10.1109/BIBM.2007.55]].

    32 Ch B t l Ch 2Bi 2RDF A Li k d O D t P t l f S

  • 8/2/2019 1758-2946-4-6

    16/22

    42. Cohen JS: Risks of troglitazone apparent before approval in USA

    2006, 49(6):14541455. doi:10.1007/s00125-006-0245-0.

    43. Ajjan RA, Grant PJ: The cardiovascular safety of rosiglitazone.Ex

    2008, 7(4):367376. doi:10.1517/14740338.7.4.367.

    44. Xie L, et al.: Drug discovery using chemical systems biology: iden

    protein-ligand binding network to explain the side effects of CETP in

    Comput Biol 2009, 5(5):1000387. doi:10.1371/journal.pcbi.1000387.

    45. Lessard E, et al.: Influence of CYP2D6 activity on the disposition a

    toxicity of the antidepressant agent venlafaxine in humans.Pharmaco

    9(4):435443.

    46. Chen H, et al.: Semantic web for integrated network analysis in bi

    Bioinform 2009, 10(2):177192. doi:10.1093/bib/bbp002.

  • 8/2/2019 1758-2946-4-6

    17/22

    " "

    " ""

    " "" " "" "

    " "

    "

  • 8/2/2019 1758-2946-4-6

    18/22

    I n t e r a c t i o n

    S

    i d e E f f

    p

    a r t i c i

    p

    a t e s i n

    i s _ a

    D r u g I n d u c e d

    S

    i d e E f f e c t

    p

    a r t i c i

    p

    a t e s _ i n

    h

    a s _

    p

    a r t i c i

    p

    a n t

    p

    a r t i c i

    p

    a t e s _ i n

    h

    a s _

    p

    a r t i c i

    p

    a n t

    p

    a r t i c i

    p

    a t e s _ i n

    h

    a s _

    p

    a r t i c i

    p

    a n t

    D r u g

    D r u g D r u g I n t e r a c t i o n

    t i i t i

    D r u g T r e a t m e n t

    p

    a r t i c i

    p

    a t e s _ i n

    h

    a s _

    p

    a r t i c i

    p

    a n t

    p

    a r t _ o f

    h

    a s _

    p

    a r t

    S

    m a l l M o l e c u l e

    h

    i l l i

    P r o t e i n P r o t e i n I n t e r a c t i o n

    C

    h

    e m i c a l P r o t e i n I n t e r a c t i o n

    i s _ a

    p

    a r t i c i

    p

    a t e s _ i n

    h

    a s _

    p

    a r t i c i

    p

    a n t

    p

    a r t i c i

    p

    a t e s _ i n

    h

    a s _

    p

    a r t i c i

    p

    a n t

    p

    a r t i c i

    p

    a t e s _ i n

    h

    a s _

    p

    a r t i c i

    p

    a n t

    C

    h

    e m i c a l R e g u l a t e s P r o t e i n

    A c t i v i t y

    E x

    p

    r e s s i o n

    p

    a r t i c i

    p

    a t e s _ i n

    h

    a s _

    p

    a r t i c i

    p

    a n t

    M a c r o m o l e c u l e M e t a b o l i c P r o c e s s

    M e t

    h

    y l a t i o n

    P

    h

    o s

    p

    h

    o r y l a t i o n

  • 8/2/2019 1758-2946-4-6

    19/22

    t r o g l i t a z o n e _ P P A R G _ i n t e r a c t i o n s

    "

    B d

    h

    s "p p " "h

    s "p p " "h

    s b

    I

    s

    u

    h

    s

    T

    u

    t r o g l i t a z o n e _ P P A R G _ i n t e r a c t i o n s

    "

    B d

    h

    s "p p " "h

    s "p p " "h

    s b

    I

    s

    u

    h

    s

    T

    u

    t r o g l i t a z o n e s

    "

    D

    u

    h

    s "s y y m " h

    s "x f D u k : D 0 0 4 8 8 h

    s "x f u C : "5 5 9 1

    t r o g l i t a z o n e s

    "

    D

    u

    h

    s "s y y m " h

    s "x f D u k : D 0 0 4 8 8 h

    s "x f u C : "5 5 9 1

    h

    s " T u h

    s " v " 1h

    s " T u h

    s " v " 1

    f

    p

    p

    s " "

    f

    p

    p

    s

    "

    i o A a _ t r o g l i t a z o n e _ P P A R G

    h

    s " x f M D : 1 0 6 9 1 6 8 0 h

    s "x f C E M B L 5 1 3 9 8 6 h

    s " s m "H " h

    s "m s m "E C 5 0

    i o A a _ t r o g l i t a z o n e _ P P A R G

    h

    s " x f M D : 1 0 6 9 1 6 8 0 h

    s "x f C E M B L 5 1 3 9 8 6 h

    s " s m "H " h

    s "m s m "E C 5 0

    h

    s ""v "0 . 5 5 h

    s "" "u h

    s " "=h

    s ""v "0 . 5 5 h

    s "" "u h

    s " "=

  • 8/2/2019 1758-2946-4-6

    20/22

    C h e m 2 B i o 2 O W L

    "

  • 8/2/2019 1758-2946-4-6

    21/22

    t r o g l i t a z o n e r o s i g l i t a z o n

    a c t i v i t y b i n d i n g

    a c t i v i t y b i n d i n g

    C Y P 3 A 4 P P A R G A B C B 1 1 C Y P 2 D 6P P A R G

    a c t i v i t y b i n d i n g

    a c t i v i t y

    "

    A B C B 1 1

    A C H E C Y P 2 D

    A T P 8 B 1 C Y P 7 B 1

    I

    n t e r a t i o n

    c a u s e d b y p p m

    I

    n t e r a t i o n

    I

    n t e r a t i o n

    I

    n t e r a t i o n

    I

    n t e r a t i o n

    c a u s e d " b y c a u s e d " b yc a u s e d " b y p p m

    c a u s e d " b y c a u s e d " b yc a u s

  • 8/2/2019 1758-2946-4-6

    22/22