![Page 1: Building linked data large-scale chemistry platform - challenges, lessons and solutions](https://reader036.vdocument.in/reader036/viewer/2022062522/587a32ee1a28abdb1c8b4fcb/html5/thumbnails/1.jpg)
Building linked-data, large-scale chemistry platform: challenges, lessons and solutions
Valery Tkachenko, Alexey Pshenichnov, Aileen Day, Colin Batchelor, Peter Corbett
Royal Society of Chemistry
ACS Spring 2016
San Diego, CA
March 13th 2016
![Page 2: Building linked data large-scale chemistry platform - challenges, lessons and solutions](https://reader036.vdocument.in/reader036/viewer/2022062522/587a32ee1a28abdb1c8b4fcb/html5/thumbnails/2.jpg)
ChemSpider – 2007 - 2011
OpenPHACTS – 2011 - 2014
Chemistry Data Platform – 2014 - …
![Page 3: Building linked data large-scale chemistry platform - challenges, lessons and solutions](https://reader036.vdocument.in/reader036/viewer/2022062522/587a32ee1a28abdb1c8b4fcb/html5/thumbnails/3.jpg)
• 45 million chemicals and growing
• Data sourced from >500 different sources
• Crowdsourced curation and annotation
• Ongoing deposition of data from our journals and our collaborators
• A structure-centric hub for web searching
![Page 4: Building linked data large-scale chemistry platform - challenges, lessons and solutions](https://reader036.vdocument.in/reader036/viewer/2022062522/587a32ee1a28abdb1c8b4fcb/html5/thumbnails/4.jpg)
ChemSpider
![Page 5: Building linked data large-scale chemistry platform - challenges, lessons and solutions](https://reader036.vdocument.in/reader036/viewer/2022062522/587a32ee1a28abdb1c8b4fcb/html5/thumbnails/5.jpg)
Chemical vendors and datasources
![Page 6: Building linked data large-scale chemistry platform - challenges, lessons and solutions](https://reader036.vdocument.in/reader036/viewer/2022062522/587a32ee1a28abdb1c8b4fcb/html5/thumbnails/6.jpg)
ChemSpider
![Page 7: Building linked data large-scale chemistry platform - challenges, lessons and solutions](https://reader036.vdocument.in/reader036/viewer/2022062522/587a32ee1a28abdb1c8b4fcb/html5/thumbnails/7.jpg)
Properties - experimental
![Page 8: Building linked data large-scale chemistry platform - challenges, lessons and solutions](https://reader036.vdocument.in/reader036/viewer/2022062522/587a32ee1a28abdb1c8b4fcb/html5/thumbnails/8.jpg)
Literature and patents references
![Page 9: Building linked data large-scale chemistry platform - challenges, lessons and solutions](https://reader036.vdocument.in/reader036/viewer/2022062522/587a32ee1a28abdb1c8b4fcb/html5/thumbnails/9.jpg)
Classification
![Page 10: Building linked data large-scale chemistry platform - challenges, lessons and solutions](https://reader036.vdocument.in/reader036/viewer/2022062522/587a32ee1a28abdb1c8b4fcb/html5/thumbnails/10.jpg)
Spectra
![Page 11: Building linked data large-scale chemistry platform - challenges, lessons and solutions](https://reader036.vdocument.in/reader036/viewer/2022062522/587a32ee1a28abdb1c8b4fcb/html5/thumbnails/11.jpg)
Multimedia
![Page 12: Building linked data large-scale chemistry platform - challenges, lessons and solutions](https://reader036.vdocument.in/reader036/viewer/2022062522/587a32ee1a28abdb1c8b4fcb/html5/thumbnails/12.jpg)
Tagging
![Page 13: Building linked data large-scale chemistry platform - challenges, lessons and solutions](https://reader036.vdocument.in/reader036/viewer/2022062522/587a32ee1a28abdb1c8b4fcb/html5/thumbnails/13.jpg)
ChemSpider - Summary
• Simple, flattish data model
• InChI as a primary identifier
• Linked by synonyms
• Linked by “ExtId”
• Standard searches (identity, substructure, similarity)
• Very little semantics
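The flat model summarised above can be sketched roughly as follows. This is a minimal illustration, not ChemSpider's actual schema: the class names, the `deposit` and `find_by_synonym` methods, and the vendor names are all hypothetical. It only shows the idea of using the InChI string as the primary key and hanging synonyms and external identifiers ("ExtId") off each record.

```python
from dataclasses import dataclass, field


@dataclass
class CompoundRecord:
    """One flat record per structure, keyed by its InChI string."""
    inchi: str
    synonyms: set = field(default_factory=set)
    ext_ids: dict = field(default_factory=dict)  # source name -> set of external ids


class FlatRegistry:
    """Hypothetical in-memory registry: deposits with the same InChI merge."""

    def __init__(self):
        self._by_inchi = {}

    def deposit(self, inchi, synonym=None, source=None, ext_id=None):
        # The InChI is the primary identifier, so a repeated InChI reuses the record.
        record = self._by_inchi.setdefault(inchi, CompoundRecord(inchi))
        if synonym:
            record.synonyms.add(synonym.strip().lower())
        if source and ext_id:
            record.ext_ids.setdefault(source, set()).add(ext_id)
        return record

    def find_by_synonym(self, name):
        # Synonym lookup is a plain normalised string match, with no semantics.
        key = name.strip().lower()
        return [r for r in self._by_inchi.values() if key in r.synonyms]

    def __len__(self):
        return len(self._by_inchi)


# Two hypothetical vendors deposit the same structure under different names.
ETHANOL = "InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3"
registry = FlatRegistry()
registry.deposit(ETHANOL, synonym="Ethanol", source="VendorA", ext_id="A-001")
registry.deposit(ETHANOL, synonym="Ethyl alcohol", source="VendorB", ext_id="B-42")
hits = registry.find_by_synonym("ethyl alcohol")
```

Both deposits end up on a single record, which is the sense in which sources are "linked by synonyms" and "linked by ExtId" in a model like this.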
![Page 14: Building linked data large-scale chemistry platform - challenges, lessons and solutions](https://reader036.vdocument.in/reader036/viewer/2022062522/587a32ee1a28abdb1c8b4fcb/html5/thumbnails/14.jpg)
OpenPHACTS: 2011-2014
Open PHACTS Mission: Integrate Multiple Research Biomedical Data Resources Into A Single Open & Sustainable Access Point
![Page 15: Building linked data large-scale chemistry platform - challenges, lessons and solutions](https://reader036.vdocument.in/reader036/viewer/2022062522/587a32ee1a28abdb1c8b4fcb/html5/thumbnails/15.jpg)
[email protected] @Open_PHACTS
Open PHACTS Practical Semantics
OpenPHACTS
GlaxoSmithKline – Coordinator
Universität Wien – Managing entity
Technical University of Denmark
University of Hamburg, Center for Bioinformatics
BioSolveIT GmbH
Consorci Mar Parc de Salut de Barcelona
Leiden University Medical Centre
Royal Society of Chemistry
Vrije Universiteit Amsterdam
Novartis
Merck Serono
H. Lundbeck A/S
Eli Lilly
Netherlands Bioinformatics Centre
Swiss Institute of Bioinformatics
ConnectedDiscovery
EMBL-European Bioinformatics Institute
Janssen
Esteve
Almirall
OpenLink
Scibite
The Open PHACTS Foundation
Spanish National Cancer Research Centre
University of Manchester
Maastricht University
Aqnowledge
University of Santiago de Compostela
Rheinische Friedrich-Wilhelms-Universität Bonn
AstraZeneca
Pfizer
![Page 16: Building linked data large-scale chemistry platform - challenges, lessons and solutions](https://reader036.vdocument.in/reader036/viewer/2022062522/587a32ee1a28abdb1c8b4fcb/html5/thumbnails/16.jpg)
Why is it so hard to….
Competitors?
What’s the structure?
Are they in our file?
What’s similar?
What’s the target?
Pharmacology data?
Known Pathways?
Working On Now?
Connections to disease?
Expressed in right cell type?
IP?
![Page 17: Building linked data large-scale chemistry platform - challenges, lessons and solutions](https://reader036.vdocument.in/reader036/viewer/2022062522/587a32ee1a28abdb1c8b4fcb/html5/thumbnails/17.jpg)
@gray_alasdair Big Data Integration
![Page 18: Building linked data large-scale chemistry platform - challenges, lessons and solutions](https://reader036.vdocument.in/reader036/viewer/2022062522/587a32ee1a28abdb1c8b4fcb/html5/thumbnails/18.jpg)
OpenPHACTS Discovery Platform
[Architecture diagram. Labels recoverable from the slide:
• Data sources: RDF, Nanopub and Db stores, each with a VoID description; Public Content, Commercial, Public Ontologies, User Annotations
• Core Platform: Data Cache (Virtuoso Triple Store), Semantic Workflow Engine, Identity Resolution Service, Identifier Management Service, Indexing, Chemistry Registration (Normalisation & Q/C), Domain Specific Services
• Linked Data API (RDF/XML, TTL, JSON)
• Apps
• Example identifiers: P12374, EC2.43.4, CS4532, “Adenosine receptor 2a”]
21 October 2014 Scientific Lenses – A. J. G. Gray
![Page 19: Building linked data large-scale chemistry platform - challenges, lessons and solutions](https://reader036.vdocument.in/reader036/viewer/2022062522/587a32ee1a28abdb1c8b4fcb/html5/thumbnails/19.jpg)
Gleevec®: Imatinib Mesylate
[Figure: records in DrugBank, ChemSpider and PubChem, labelled Imatinib, Mesylate and Imatinib Mesylate; InChIKey shown: YLMAHDNUQAMNNX-UHFFFAOYSA-N]
![Page 20: Building linked data large-scale chemistry platform - challenges, lessons and solutions](https://reader036.vdocument.in/reader036/viewer/2022062522/587a32ee1a28abdb1c8b4fcb/html5/thumbnails/20.jpg)
Structure Lens
skos:exactMatch (InChI)
Strict / Relaxed
Analysing / Browsing
“I need to compute an analysis, give me details of the active compound in Gleevec.”
![Page 21: Building linked data large-scale chemistry platform - challenges, lessons and solutions](https://reader036.vdocument.in/reader036/viewer/2022062522/587a32ee1a28abdb1c8b4fcb/html5/thumbnails/21.jpg)
Lens Effects: Ibuprofen
Commercial ibuprofen is a racemic mixture containing the same proportion of two chiral forms. Both chiral forms are equally active. Typically, the user will wish to retrieve info for any stereoisomer.
CHEMBL427526
CHEMBL521
CHEMBL175
![Page 22: Building linked data large-scale chemistry platform - challenges, lessons and solutions](https://reader036.vdocument.in/reader036/viewer/2022062522/587a32ee1a28abdb1c8b4fcb/html5/thumbnails/22.jpg)
Default Lens
![Page 23: Building linked data large-scale chemistry platform - challenges, lessons and solutions](https://reader036.vdocument.in/reader036/viewer/2022062522/587a32ee1a28abdb1c8b4fcb/html5/thumbnails/23.jpg)
Stereoisomer Lens
![Page 24: Building linked data large-scale chemistry platform - challenges, lessons and solutions](https://reader036.vdocument.in/reader036/viewer/2022062522/587a32ee1a28abdb1c8b4fcb/html5/thumbnails/24.jpg)
Mapping Generation
ops:OPS437281
✔
ops:OPS380297
has_stereoundefined_parent [ci:CHEMINF_000456]
ops:OPS380297
is_stereoisomer_of [ci:CHEMINF_000461]
Other relationships
• has part
• is tautomer of
• uncharged counterpart
• isotope…
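The lens idea from the preceding slides can be sketched as a filter over typed links between identifiers: a lens names the relationship types that count as "the same thing" for a given task, and an identifier is expanded over only those links. The code below is an illustrative sketch, not the Open PHACTS implementation. The `ex:` identifiers, the contents of `LINKS`, the lens names, and the choice to treat links as symmetric and transitive are all assumptions; only the predicate labels come from the slides.

```python
from collections import defaultdict, deque

# Predicate labels as shown on the slides.
EXACT = "skos:exactMatch"
STEREO_PARENT = "ci:CHEMINF_000456"  # has_stereoundefined_parent
STEREOISOMER = "ci:CHEMINF_000461"   # is_stereoisomer_of

# Hypothetical lens definitions: which link types each lens accepts.
LENSES = {
    "default": {EXACT},
    "stereoisomer": {EXACT, STEREO_PARENT, STEREOISOMER},
}

# Hypothetical links between made-up identifiers (subject, predicate, object).
LINKS = [
    ("ex:ibuprofen", EXACT, "ex:otherdb-ibuprofen"),
    ("ex:ibuprofen-S", STEREO_PARENT, "ex:ibuprofen"),
    ("ex:ibuprofen-R", STEREO_PARENT, "ex:ibuprofen"),
    ("ex:ibuprofen-S", STEREOISOMER, "ex:ibuprofen-R"),
]


def expand(identifier, lens, links=LINKS):
    """Return every identifier reachable from `identifier` under `lens`."""
    allowed = LENSES[lens]
    graph = defaultdict(set)
    for subject, predicate, obj in links:
        if predicate in allowed:
            # Assumption: accepted links are followed in both directions.
            graph[subject].add(obj)
            graph[obj].add(subject)
    seen = {identifier}
    queue = deque([identifier])
    while queue:  # breadth-first walk over the accepted links
        node = queue.popleft()
        for neighbour in graph[node] - seen:
            seen.add(neighbour)
            queue.append(neighbour)
    return seen


strict = expand("ex:ibuprofen", "default")
relaxed = expand("ex:ibuprofen", "stereoisomer")
```

With the strict lens only the exact match is returned, which suits analysis; the relaxed stereoisomer lens also pulls in both stereoisomers, which suits browsing.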
![Page 25: Building linked data large-scale chemistry platform - challenges, lessons and solutions](https://reader036.vdocument.in/reader036/viewer/2022062522/587a32ee1a28abdb1c8b4fcb/html5/thumbnails/25.jpg)
OpenPHACTS UI
http://explorer.openphacts.org/
![Page 26: Building linked data large-scale chemistry platform - challenges, lessons and solutions](https://reader036.vdocument.in/reader036/viewer/2022062522/587a32ee1a28abdb1c8b4fcb/html5/thumbnails/26.jpg)
Explorer Screenshot
![Page 27: Building linked data large-scale chemistry platform - challenges, lessons and solutions](https://reader036.vdocument.in/reader036/viewer/2022062522/587a32ee1a28abdb1c8b4fcb/html5/thumbnails/27.jpg)
Explorer Screenshot
![Page 28: Building linked data large-scale chemistry platform - challenges, lessons and solutions](https://reader036.vdocument.in/reader036/viewer/2022062522/587a32ee1a28abdb1c8b4fcb/html5/thumbnails/28.jpg)
OpenPHACTS - Summary
• Principal difference – inter-domain links
• More complex, but still structure-centric data model
• Ontological relationships introduced
• Chemical Lenses – new type of search
![Page 29: Building linked data large-scale chemistry platform - challenges, lessons and solutions](https://reader036.vdocument.in/reader036/viewer/2022062522/587a32ee1a28abdb1c8b4fcb/html5/thumbnails/29.jpg)
Chemistry Data Platform – 2014 - …
![Page 30: Building linked data large-scale chemistry platform - challenges, lessons and solutions](https://reader036.vdocument.in/reader036/viewer/2022062522/587a32ee1a28abdb1c8b4fcb/html5/thumbnails/30.jpg)
Dimensions and complexity of science
![Page 31: Building linked data large-scale chemistry platform - challenges, lessons and solutions](https://reader036.vdocument.in/reader036/viewer/2022062522/587a32ee1a28abdb1c8b4fcb/html5/thumbnails/31.jpg)
![Page 32: Building linked data large-scale chemistry platform - challenges, lessons and solutions](https://reader036.vdocument.in/reader036/viewer/2022062522/587a32ee1a28abdb1c8b4fcb/html5/thumbnails/32.jpg)
RSC Archive – since 1841
![Page 33: Building linked data large-scale chemistry platform - challenges, lessons and solutions](https://reader036.vdocument.in/reader036/viewer/2022062522/587a32ee1a28abdb1c8b4fcb/html5/thumbnails/33.jpg)
Digitally Enabling RSC Archive
![Page 34: Building linked data large-scale chemistry platform - challenges, lessons and solutions](https://reader036.vdocument.in/reader036/viewer/2022062522/587a32ee1a28abdb1c8b4fcb/html5/thumbnails/34.jpg)
ChemSpider Synthetic Pages
Compounds
Reaction
Analytical Data
Text and References
![Page 35: Building linked data large-scale chemistry platform - challenges, lessons and solutions](https://reader036.vdocument.in/reader036/viewer/2022062522/587a32ee1a28abdb1c8b4fcb/html5/thumbnails/35.jpg)
RSC Databases
RSC Compounds
RSC Reactions
RSC Spectra
RSC Crystals
RSC Polymers
RSC Materials
RSC Assays
RSC Algorithms
RSC Models
…and on…
![Page 36: Building linked data large-scale chemistry platform - challenges, lessons and solutions](https://reader036.vdocument.in/reader036/viewer/2022062522/587a32ee1a28abdb1c8b4fcb/html5/thumbnails/36.jpg)
Compounds domain
![Page 37: Building linked data large-scale chemistry platform - challenges, lessons and solutions](https://reader036.vdocument.in/reader036/viewer/2022062522/587a32ee1a28abdb1c8b4fcb/html5/thumbnails/37.jpg)
![Page 38: Building linked data large-scale chemistry platform - challenges, lessons and solutions](https://reader036.vdocument.in/reader036/viewer/2022062522/587a32ee1a28abdb1c8b4fcb/html5/thumbnails/38.jpg)
Data quality issue and CVSP
– Robochemistry
– Proliferation of errors in public and private databases
• ChemSpider
• PubChem
• DrugBank
• KEGG
• ChEBI/ChEMBL
– Automated quality control system
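An automated quality control system of this kind can be pictured as a list of rules, each inspecting a record and reporting an issue with a severity. The sketch below is purely illustrative and is not CVSP's rule set or API: the record layout (a list of element and formal-charge pairs), the three rules, the severity names and the small element whitelist are all assumptions chosen to keep the example self-contained.

```python
from dataclasses import dataclass

# Deliberately tiny whitelist, for illustration only.
KNOWN_ELEMENTS = {"H", "C", "N", "O", "S", "P", "F", "Cl", "Br", "I", "Na", "K"}


@dataclass(frozen=True)
class Issue:
    severity: str  # "error" or "warning" (hypothetical levels)
    message: str


def check_not_empty(record):
    if not record["atoms"]:
        return Issue("error", "structure has no atoms")
    return None


def check_elements(record):
    unknown = sorted({element for element, _ in record["atoms"]} - KNOWN_ELEMENTS)
    if unknown:
        return Issue("error", "unrecognised atom symbols: " + ", ".join(unknown))
    return None


def check_net_charge(record):
    total = sum(charge for _, charge in record["atoms"])
    if total != 0:
        return Issue("warning", f"net charge is {total:+d}")
    return None


# Rules run in order; adding a check means appending a function here.
RULES = [check_not_empty, check_elements, check_net_charge]


def validate(record):
    """Apply every rule and collect the issues that were raised."""
    issues = []
    for rule in RULES:
        issue = rule(record)
        if issue is not None:
            issues.append(issue)
    return issues


salt = {"atoms": [("Na", 1), ("Cl", -1)]}
suspect = {"atoms": [("N", 1), ("R", 0)]}
salt_issues = validate(salt)
suspect_issues = validate(suspect)
```

Keeping the rules as independent functions means the same checks can be run unchanged over records from any source database.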
![Page 39: Building linked data large-scale chemistry platform - challenges, lessons and solutions](https://reader036.vdocument.in/reader036/viewer/2022062522/587a32ee1a28abdb1c8b4fcb/html5/thumbnails/39.jpg)
Chemistry Validation and Standardization Platform
![Page 40: Building linked data large-scale chemistry platform - challenges, lessons and solutions](https://reader036.vdocument.in/reader036/viewer/2022062522/587a32ee1a28abdb1c8b4fcb/html5/thumbnails/40.jpg)
Chemistry Validation and Standardization Platform
![Page 41: Building linked data large-scale chemistry platform - challenges, lessons and solutions](https://reader036.vdocument.in/reader036/viewer/2022062522/587a32ee1a28abdb1c8b4fcb/html5/thumbnails/41.jpg)
Reactions domain
![Page 42: Building linked data large-scale chemistry platform - challenges, lessons and solutions](https://reader036.vdocument.in/reader036/viewer/2022062522/587a32ee1a28abdb1c8b4fcb/html5/thumbnails/42.jpg)
![Page 43: Building linked data large-scale chemistry platform - challenges, lessons and solutions](https://reader036.vdocument.in/reader036/viewer/2022062522/587a32ee1a28abdb1c8b4fcb/html5/thumbnails/43.jpg)
Analytical data domain
![Page 44: Building linked data large-scale chemistry platform - challenges, lessons and solutions](https://reader036.vdocument.in/reader036/viewer/2022062522/587a32ee1a28abdb1c8b4fcb/html5/thumbnails/44.jpg)
Crystallography domain
![Page 45: Building linked data large-scale chemistry platform - challenges, lessons and solutions](https://reader036.vdocument.in/reader036/viewer/2022062522/587a32ee1a28abdb1c8b4fcb/html5/thumbnails/45.jpg)
Chemistry Data Platform - Summary
• Simplified models within each domain
• Each domain is described with its own models, with embedded semantics
• No proper domain-specific identifiers
• Extensive quality control – CVSP (DOI 10.1186/s13321-015-0072-8)
![Page 46: Building linked data large-scale chemistry platform - challenges, lessons and solutions](https://reader036.vdocument.in/reader036/viewer/2022062522/587a32ee1a28abdb1c8b4fcb/html5/thumbnails/46.jpg)
There is no way back