Download - 1758-2946-4-6
-
8/2/2019 1758-2946-4-6
1/22
This Provisional PDF corresponds to the article as it appeared upon accPDF and full text (HTML) versions will be made availab
Improving integrative searching of systems chemical bsemantic annotation
Journal of Cheminformatics 2012, 4:6 doi:10.1186/1758
Bin Chen ([email protected])Ying Ding ([email protected])David J Wild ([email protected])
Journal of Cheminformatics
mailto:[email protected]:[email protected]:[email protected]:[email protected]:[email protected]:[email protected] -
8/2/2019 1758-2946-4-6
2/22
Improving integrative searching of sy
chemical biology data using semanti
annotation
Bin Chen,Aff1Corresponding Affiliation: Aff1
Email: [email protected]
Ying Ding,Aff2Email: [email protected]
David J Wild,Aff1Email: [email protected]
-
8/2/2019 1758-2946-4-6
3/22
Availability
Chem2Bio2OWL is available at http://chem2bio2rdf.org/owl. The documhttp://chem2bio2owl.wikispaces.com.
Background
Recent efforts [1-3] in the Semantic web have involved conversion of var
biological data sources into semantic formats (e.g., RDF, OWL) and linke
large networks. The number of bubbles in Linked Open Data (LOD) [4] hfrom 12 in 2007 to 203 in 2010. This richly linked data allows answering
scientific questions using the SPARQL query language [5], finding paths
and ranking associations of different entities [7,8]. Our previous work on
[3] offers a framework to data mine systems chemical biology and chemo
exemplified by the examples given in our paper: compound selection in p
multiple pathway inhibitor identification and adverse drug reaction - path
-
8/2/2019 1758-2946-4-6
4/22
explicitly. BioAssay Ontology [22] is developed to standardize the descri
experiments and screening results. DDI [23] and OBI [24] present integra
frameworks in drug discovery investigation and biomedical investigationnumber of upper ontologies such as Basic Formal Ontology (BFO) [25] a
support domain ontology building as well. Many of the ontologies are dep
foundry [26] or NCBO BioPortal [27], for public access. Using ontologie
and reason has been widely practiced in life sciences. Baitaluk and Ponom
IntegromeDB to semantically integrate over 100 experimental and compu
sources relating to genomics, transcriptomics, genetics, and functional an
concerning gene transcriptional regulation in eukaryotes and prokaryotes created logical rules using Semantic Web Rule Language to answer resea
pertaining to pseudogenes [29].
Systems Chemical Biology [30] (and its sub-discipline of chemogenomic
discipline studying how chemicals interact with the whole biological syst
which cover a wide range of entities (compounds, drugs, proteins, genes,
ff t th d ) d i l ti b t titi h
-
8/2/2019 1758-2946-4-6
5/22
BioPAX offers a standard, well defined representation of biological pathw
and it has been widely used in biological data integration [15,31]. We imp
from BioPAX and made subsequent extensions based upon our use casesclasses were refined in accordance with current instance data structure. Sm
and Protein were put under PhysicalEntity. Their relation with Disease an
elaborated under Interaction, which is further classified into DrugInduced
DrugTreatment, DrugDrugInteraction, ProteinProteinInteraction and
ChemicalProteinInteraction. BioAssay and Literature serve as Evidence t
relations. Pathway was treated as a black box since its instance data is ju
Other than Interaction, we did not intend to further classify other individu
Table 1Primary classes, their description, sample instance data sourcof sample annotated instances. Data sources were described in [32]primary classes description sample instance data sources
SmallMolecule a small bioactive molecule PubChem, ChEBI
Drug a chemical used in the treatment DrugBank PharmGKB TTD
-
8/2/2019 1758-2946-4-6
6/22
placed under Interaction; otherwise, they were presented as object proper
Ontology (RO) [33] was imported to help present basic relations. For exa
ProteinProteinInteraction not only covers the binary relation between twoaffiliates its experimental conditions (e.g., organism and interaction type)
participant in that interaction. Similarly, Chemical and Protein serve as pa
ChemicalProteinInteraction, which includes other information such as the
interaction. Figure 2 shows major classes and their relations.
Figure 2Overview of Chem2Bio2OWL. Only part of classes (presented
relations (presented as edges) are visualized. Some classes in ChemicalPr
ignored due to the limited space
Data properties appeared in the original database sources were not fully c
only the important ones related to our purpose (chemogenomics and syste
biology). This simplifies the ontology without losing essential knowledge
including data property name, class name and relation name were manual
i l t t l i i th OBO d NCBO Bi P t l d th t i
-
8/2/2019 1758-2946-4-6
7/22
Implementation
Figure 4 shows the data integration workflow we used to populate the ontJava scripts along with the OWL API Java package [37] were used to aut
annotation of Chem2Bio2RDF data using Chem2Bio2OWL. Pellet reason
applied to reason new relations. The annotated data plus new relations we
Virtuoso triple store [39] for querying. Efforts were made to cope with da
inconsistence and provenance. Data redundancy is originated from the ho
source of the objects. Chemical compounds for example were presented a
(e.g., SMILES, InChi, MOL, etc.) and many data sources have their own compounds. The URI of individual instance in Chem2Bio2OWL is based
source ID or fake ID if primary ID is unavailable. PubChem as the larges
hub is considered as the primary source for chemicals. Its identifier Comp
used to identify compounds (e.g., http://chem2bio2rdf.org/chem2bio2owl
The compounds with unknown CIDs were assigned CIDs by searching Pu
InChi, a universal structure representation. A fake CID was assigned if th
i i P bCh D i d id ff i D B k ID U
-
8/2/2019 1758-2946-4-6
8/22
Drug related target identification
Identification of potential targets for drugs is important for discovering neapplications as well as identifying potential undesirable side-effects (off
interactions). These kinds of interactions are described in different ways
Chem2Bio2RDF datasets: PubChem BioAssay, ChEMBL and BindingDB
experiments; PharmGKB contains genetic variations upon drug response;
Express contain expression data; KEGG contains interactions in pathway
question: What are the possible targets of drug (e.g., Troglitazone)? pr
complex SPARQL query explicitly referencing each data set individuallySPARQL presents the searching of two chemogenomics database:
PREFIX compound:
PREFIX bindingdb:
PREFIX drugbank:
PREFIX uniprot:
-
8/2/2019 1758-2946-4-6
9/22
PREFIX c2b2r:
PREFIX bp:
PREFIX ro:
PREFIX rdfs:
PREFIX rdf:
select distinct ?target_name
from
where
{?chemical rdfs:label Troglitazone^^xsd:string;
ro:participates_in ?interaction .
?interaction rdf:type c2b2r:ChemicalProteinInteraction;
ro:has_participant ?target .
?target rdf:type bp:Protein ;
-
8/2/2019 1758-2946-4-6
10/22
Following the steps in the previous query, this query is interpreted as: this
evidence which is a bioassay; the assay has description and outcome, and
exists.
Target inhibitor/activator searching
Pregnane X receptor (NR1I2) is a transcriptional regulator of the expressi
metabolism and transporter genes. It has multiple binding sites, accountin
functions. Its agonists at the ligand-binding domain would trigger up-regu
increase the metabolism and excretion of therapeutic agents, and cause drinteractions, but its antagonists counteract such interactions [41]. Due to d
sites, the two types of compounds may be quite different structurally. Usi
Chem2Bio2OWL, we are able to answer this question: Find NR1I2 agon
compounds with weight500. The following SPARQL was used to retriTheir structures are quite different with 6 antagonists retrieved from anoth
the significance of classifying the ligands.
-
8/2/2019 1758-2946-4-6
11/22
hepatorenal syndrome, that could be further linked to disease genes in ou
ABCB11 is one of the liver disease related genes and its activity is affecte
ABCB11 involves in the liver bile acid transportation and metabolism (frABCB11). It is not surprising that the change of its activity will result in
Similarly, we mapped heart attack to heart disease in the disease ontology
failure, endocarditis, pericarditis,etc, which are linked to 7 disease genes.
overlap between disease genes and Rosiglitazone related targets was foun
turned to find disease related targets. First, the drugs causing heart diseas
their related targets were further identified, grouped, and ranked by the nu
common drugs. The higher ranking indicates the higher possibility linkin
The top 10 targets are CYP3A4, CYP2C9, ABCB1, CYP1A2, PTGS2, C
CYP3A5, CYP2C19 and PPARG. The top one CYP3A4 for example is sh
out of 181 total heart disease related drugs. Some high ranked targets like
affected by Troglitazone, nevertheless, the activity of CYP2D6 shared by
related drugs is affected only by Rosiglitazone, it was not found in the Tr
targets. Further literature search indicates that CYP2D6 plays a very impo
cardiovascular disease [45] Although further experimental evaluation wo
-
8/2/2019 1758-2946-4-6
12/22
Many current methods and tools often convert data into RDFs directly fro
bases. They prove useful in their own right to link multiple datasets, but w
ontology to model the concepts and their relations, the linked data would demonstrate the capability of semantics, limiting its further usage in data
reasoning. In addition, adding evidence and provenance to support asserti
importance to keep track of the record, but it has not been well implemen
RDFization process. Chem2Bio2OWL provides a high level structure of t
relations according to use case queries and instance data structures, and c
and provenance. However, the case driven ontology building process has
compromises on accuracy to many top down approaches which aims to
knowledge oriented from philosophical perspective, nevertheless, our wo
engineering solution to address the urgent needs in this area. The integrat
systems chemical biology data can be performed in a very intuitive and ef
than Chem2Bio2RDF, some public datasets (e.g., Bio2RDF and LODD) c
using Chem2Bio2OWL.
F th ff t h ld b d t li Ch 2Bi 2OWL ith b i t
-
8/2/2019 1758-2946-4-6
13/22
Acknowledgements
We thank Shanshan Chen, Kuochung Peng, Qian Zhu, Jaehong Shin and
the cheminformatics group at Indiana University for discussion and comm
References
1. Belleau F, et al.: Bio2RDF: Towards a mashup to build bioinforma
systems.J Biomed Inform 2008, 41(5):706716.
2. Jentzsch A, et al.: Linking open drug data. In Triplification Challeng
International Conference on Semantic Systems 2009.
3. Chen B, et al.: Chem2Bio2RDF: a semantic framework for linking
chemogenomic and systems chemical biology data.BMC Bioinformati
d i 10 1186/1471 2105 11 255
-
8/2/2019 1758-2946-4-6
14/22
15. Choi J, et al.: A semantic web ontology for small molecules and th
targets.J Chem Inf Model 2010, 50(5):732741. doi:10.1021/ci900461j.
16. Ivchenko O, et al.: PLIO: an ontology for formal description of pr
interactions.Bioinformatics 2011, [[http://dx.doi.org/10.1093/bioinform
17. Qu XA, et al.: Inferring novel disease indications for known drug
linking drug action and disease mechanism relationships.BMC Bioinf
doi:doi:10.1186/1471-2105-10-S5-S4.
18. Demir E, et al.: The BioPAX community standard for pathway da
Biotechnol 2010, 28(9):935942. doi:10.1038/nbt.1666.
19. Luciano JS, Andersson B, Batchelor C, Bodenreider O, Clark T, Den
C, Gambet T, Harland L, Jentzsch A, Kashyap V, Kos P, Kozlovsky J, Le
McCusker JP, McGuinness DL, Ogbuji C, Pichler E, Powers RL, Prudho
S ld M S h i l L T ll t PJ Wh t l PL Zh J St h S D
-
8/2/2019 1758-2946-4-6
15/22
27. Noy NF, et al.: BioPortal: ontologies and integrated data resource
mouse.Nucleic Acids Res 2009, 37:170173. doi:10.1093/nar/gkp440.
28. Baitaluk M, Ponomarenko J: Semantic integration of data on trans
regulation.Bioinformatics 2010, 13:16511661. doi:10.1093/bioinforma
29. Holford ME, et al.: Using semantic web rules to reason on an onto
pseudogenes.Bioinformatics 2010, 26(12):7178. doi:10.1093/bioinform
30. Oprea TI, et al.: Systems chemical biology.Nat Chem Biol 2007, 3(
doi:10.1038/nchembio0807-447.
31. Ruebenacker O, Moraru II, Schaff JC, Blinov ML: Kinetic modeling
ontology.Proceedings (IEEE Int Conf Bioinformatics Biomed) 2007, 200
doi:[[http://dx.doi.org/10.1109/BIBM.2007.55]].
32 Ch B t l Ch 2Bi 2RDF A Li k d O D t P t l f S
-
8/2/2019 1758-2946-4-6
16/22
42. Cohen JS: Risks of troglitazone apparent before approval in USA
2006, 49(6):14541455. doi:10.1007/s00125-006-0245-0.
43. Ajjan RA, Grant PJ: The cardiovascular safety of rosiglitazone.Ex
2008, 7(4):367376. doi:10.1517/14740338.7.4.367.
44. Xie L, et al.: Drug discovery using chemical systems biology: iden
protein-ligand binding network to explain the side effects of CETP in
Comput Biol 2009, 5(5):1000387. doi:10.1371/journal.pcbi.1000387.
45. Lessard E, et al.: Influence of CYP2D6 activity on the disposition a
toxicity of the antidepressant agent venlafaxine in humans.Pharmaco
9(4):435443.
46. Chen H, et al.: Semantic web for integrated network analysis in bi
Bioinform 2009, 10(2):177192. doi:10.1093/bib/bbp002.
-
8/2/2019 1758-2946-4-6
17/22
" "
" ""
" "" " "" "
" "
"
-
8/2/2019 1758-2946-4-6
18/22
I n t e r a c t i o n
S
i d e E f f
p
a r t i c i
p
a t e s i n
i s _ a
D r u g I n d u c e d
S
i d e E f f e c t
p
a r t i c i
p
a t e s _ i n
h
a s _
p
a r t i c i
p
a n t
p
a r t i c i
p
a t e s _ i n
h
a s _
p
a r t i c i
p
a n t
p
a r t i c i
p
a t e s _ i n
h
a s _
p
a r t i c i
p
a n t
D r u g
D r u g D r u g I n t e r a c t i o n
t i i t i
D r u g T r e a t m e n t
p
a r t i c i
p
a t e s _ i n
h
a s _
p
a r t i c i
p
a n t
p
a r t _ o f
h
a s _
p
a r t
S
m a l l M o l e c u l e
h
i l l i
P r o t e i n P r o t e i n I n t e r a c t i o n
C
h
e m i c a l P r o t e i n I n t e r a c t i o n
i s _ a
p
a r t i c i
p
a t e s _ i n
h
a s _
p
a r t i c i
p
a n t
p
a r t i c i
p
a t e s _ i n
h
a s _
p
a r t i c i
p
a n t
p
a r t i c i
p
a t e s _ i n
h
a s _
p
a r t i c i
p
a n t
C
h
e m i c a l R e g u l a t e s P r o t e i n
A c t i v i t y
E x
p
r e s s i o n
p
a r t i c i
p
a t e s _ i n
h
a s _
p
a r t i c i
p
a n t
M a c r o m o l e c u l e M e t a b o l i c P r o c e s s
M e t
h
y l a t i o n
P
h
o s
p
h
o r y l a t i o n
-
8/2/2019 1758-2946-4-6
19/22
t r o g l i t a z o n e _ P P A R G _ i n t e r a c t i o n s
"
B d
h
s "p p " "h
s "p p " "h
s b
I
s
u
h
s
T
u
t r o g l i t a z o n e _ P P A R G _ i n t e r a c t i o n s
"
B d
h
s "p p " "h
s "p p " "h
s b
I
s
u
h
s
T
u
t r o g l i t a z o n e s
"
D
u
h
s "s y y m " h
s "x f D u k : D 0 0 4 8 8 h
s "x f u C : "5 5 9 1
t r o g l i t a z o n e s
"
D
u
h
s "s y y m " h
s "x f D u k : D 0 0 4 8 8 h
s "x f u C : "5 5 9 1
h
s " T u h
s " v " 1h
s " T u h
s " v " 1
f
p
p
s " "
f
p
p
s
"
i o A a _ t r o g l i t a z o n e _ P P A R G
h
s " x f M D : 1 0 6 9 1 6 8 0 h
s "x f C E M B L 5 1 3 9 8 6 h
s " s m "H " h
s "m s m "E C 5 0
i o A a _ t r o g l i t a z o n e _ P P A R G
h
s " x f M D : 1 0 6 9 1 6 8 0 h
s "x f C E M B L 5 1 3 9 8 6 h
s " s m "H " h
s "m s m "E C 5 0
h
s ""v "0 . 5 5 h
s "" "u h
s " "=h
s ""v "0 . 5 5 h
s "" "u h
s " "=
-
8/2/2019 1758-2946-4-6
20/22
C h e m 2 B i o 2 O W L
"
-
8/2/2019 1758-2946-4-6
21/22
t r o g l i t a z o n e r o s i g l i t a z o n
a c t i v i t y b i n d i n g
a c t i v i t y b i n d i n g
C Y P 3 A 4 P P A R G A B C B 1 1 C Y P 2 D 6P P A R G
a c t i v i t y b i n d i n g
a c t i v i t y
"
A B C B 1 1
A C H E C Y P 2 D
A T P 8 B 1 C Y P 7 B 1
I
n t e r a t i o n
c a u s e d b y p p m
I
n t e r a t i o n
I
n t e r a t i o n
I
n t e r a t i o n
I
n t e r a t i o n
c a u s e d " b y c a u s e d " b yc a u s e d " b y p p m
c a u s e d " b y c a u s e d " b yc a u s
-
8/2/2019 1758-2946-4-6
22/22