![Page 1: Mining public domain data as a basis for drug repurposing...When errors are identified hard to get fixed! Data licensing is confusing –“Open Data” We are “takers” not “givers”](https://reader036.vdocument.in/reader036/viewer/2022070904/5f6e3419abdf663ee862e652/html5/thumbnails/1.jpg)
Mining public domain data as a basis
for drug repurposing
Antony J Williams, Sean Ekins and Valery Tkachenko
ACS Philadelphia August 2012
http://tinyurl.com/d6wodsl
![Page 2: Mining public domain data as a basis for drug repurposing...When errors are identified hard to get fixed! Data licensing is confusing –“Open Data” We are “takers” not “givers”](https://reader036.vdocument.in/reader036/viewer/2022070904/5f6e3419abdf663ee862e652/html5/thumbnails/2.jpg)
Drug Repurposing
Drug repurposing commonly means re-examining existing data too!
Lots of data mining occurs
Then more screening, which creates more data…
LOTS of public databases used to examine repurposing…
![Page 3: Mining public domain data as a basis for drug repurposing...When errors are identified hard to get fixed! Data licensing is confusing –“Open Data” We are “takers” not “givers”](https://reader036.vdocument.in/reader036/viewer/2022070904/5f6e3419abdf663ee862e652/html5/thumbnails/3.jpg)
A LOT of data coming online
![Page 4: Mining public domain data as a basis for drug repurposing...When errors are identified hard to get fixed! Data licensing is confusing –“Open Data” We are “takers” not “givers”](https://reader036.vdocument.in/reader036/viewer/2022070904/5f6e3419abdf663ee862e652/html5/thumbnails/4.jpg)
Interlinked on the semantic web
![Page 5: Mining public domain data as a basis for drug repurposing...When errors are identified hard to get fixed! Data licensing is confusing –“Open Data” We are “takers” not “givers”](https://reader036.vdocument.in/reader036/viewer/2022070904/5f6e3419abdf663ee862e652/html5/thumbnails/5.jpg)
Where do you get your data?
Databases?
Patents?
Papers?
Your own lab?
Collaborators?
All of the above?
What is likely common to all sources? Data quality issues. There is no perfect database.
![Page 6: Mining public domain data as a basis for drug repurposing...When errors are identified hard to get fixed! Data licensing is confusing –“Open Data” We are “takers” not “givers”](https://reader036.vdocument.in/reader036/viewer/2022070904/5f6e3419abdf663ee862e652/html5/thumbnails/6.jpg)
Public Domain Databases
Our databases are a mess…
Non-curated databases are proliferating errors
We source and deposit data between databases
Original sources of errors hard to determine
Curation is time-consuming and challenging
![Page 7: Mining public domain data as a basis for drug repurposing...When errors are identified hard to get fixed! Data licensing is confusing –“Open Data” We are “takers” not “givers”](https://reader036.vdocument.in/reader036/viewer/2022070904/5f6e3419abdf663ee862e652/html5/thumbnails/7.jpg)
![Page 8: Mining public domain data as a basis for drug repurposing...When errors are identified hard to get fixed! Data licensing is confusing –“Open Data” We are “takers” not “givers”](https://reader036.vdocument.in/reader036/viewer/2022070904/5f6e3419abdf663ee862e652/html5/thumbnails/8.jpg)
Availability of libraries of FDA drugs
Johns Hopkins Clinical Compound Library: made compounds available at cost
![Page 9: Mining public domain data as a basis for drug repurposing...When errors are identified hard to get fixed! Data licensing is confusing –“Open Data” We are “takers” not “givers”](https://reader036.vdocument.in/reader036/viewer/2022070904/5f6e3419abdf663ee862e652/html5/thumbnails/9.jpg)
The FDA Drug Database
![Page 10: Mining public domain data as a basis for drug repurposing...When errors are identified hard to get fixed! Data licensing is confusing –“Open Data” We are “takers” not “givers”](https://reader036.vdocument.in/reader036/viewer/2022070904/5f6e3419abdf663ee862e652/html5/thumbnails/10.jpg)
The DailyMed Database
![Page 11: Mining public domain data as a basis for drug repurposing...When errors are identified hard to get fixed! Data licensing is confusing –“Open Data” We are “takers” not “givers”](https://reader036.vdocument.in/reader036/viewer/2022070904/5f6e3419abdf663ee862e652/html5/thumbnails/11.jpg)
Government Databases Should
Come With a Health Warning
Williams and Ekins, DDT, 16: 747-750 (2011)
![Page 12: Mining public domain data as a basis for drug repurposing...When errors are identified hard to get fixed! Data licensing is confusing –“Open Data” We are “takers” not “givers”](https://reader036.vdocument.in/reader036/viewer/2022070904/5f6e3419abdf663ee862e652/html5/thumbnails/12.jpg)
What is Neomycin?
![Page 13: Mining public domain data as a basis for drug repurposing...When errors are identified hard to get fixed! Data licensing is confusing –“Open Data” We are “takers” not “givers”](https://reader036.vdocument.in/reader036/viewer/2022070904/5f6e3419abdf663ee862e652/html5/thumbnails/13.jpg)
Not this…
![Page 14: Mining public domain data as a basis for drug repurposing...When errors are identified hard to get fixed! Data licensing is confusing –“Open Data” We are “takers” not “givers”](https://reader036.vdocument.in/reader036/viewer/2022070904/5f6e3419abdf663ee862e652/html5/thumbnails/14.jpg)
Data Errors in the NPC Browser: Analysis of Steroids

| Substructure | # of Hits | # of Correct Hits | No stereochemistry | Incomplete stereochemistry | Complete but incorrect stereochemistry |
|---|---|---|---|---|---|
| Gonane | 34 | 5 | 8 | 21 | 0 |
| Gon-4-ene | 55 | 12 | 3 | 33 | 7 |
| Gon-1,4-diene | 60 | 17 | 10 | 23 | 10 |

Williams, Ekins and Tkachenko, Drug Disc Today 17: 685-701 (2012)
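As a worked check of the steroid counts above, the following minimal Python sketch recomputes the share of correct hits for each substructure and confirms that the four categories add up to the hit total. The counts are copied from the slide; the names `STEROID_HITS` and `summarize` are illustrative only and are not from the original analysis.

```python
# Counts copied from the steroid table:
# (hits, correct, no stereo, incomplete stereo, complete but incorrect stereo)
STEROID_HITS = {
    "Gonane": (34, 5, 8, 21, 0),
    "Gon-4-ene": (55, 12, 3, 33, 7),
    "Gon-1,4-diene": (60, 17, 10, 23, 10),
}


def summarize(counts):
    """Return {substructure: percent of hits that are correct}.

    Raises ValueError if the correct and error categories do not sum to the hits.
    """
    summary = {}
    for name, (hits, correct, *errors) in counts.items():
        if correct + sum(errors) != hits:
            raise ValueError(f"categories do not sum to hits for {name}")
        summary[name] = round(100.0 * correct / hits, 1)
    return summary


percent_correct = summarize(STEROID_HITS)
for name, pct in percent_correct.items():
    print(f"{name}: {pct}% of hits correct")
```

On these counts, fewer than a third of the hits in each substructure class are fully correct.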
![Page 15: Mining public domain data as a basis for drug repurposing...When errors are identified hard to get fixed! Data licensing is confusing –“Open Data” We are “takers” not “givers”](https://reader036.vdocument.in/reader036/viewer/2022070904/5f6e3419abdf663ee862e652/html5/thumbnails/15.jpg)
![Page 16: Mining public domain data as a basis for drug repurposing...When errors are identified hard to get fixed! Data licensing is confusing –“Open Data” We are “takers” not “givers”](https://reader036.vdocument.in/reader036/viewer/2022070904/5f6e3419abdf663ee862e652/html5/thumbnails/16.jpg)
Drug Disambiguation Project
![Page 17: Mining public domain data as a basis for drug repurposing...When errors are identified hard to get fixed! Data licensing is confusing –“Open Data” We are “takers” not “givers”](https://reader036.vdocument.in/reader036/viewer/2022070904/5f6e3419abdf663ee862e652/html5/thumbnails/17.jpg)
NCATS Discovering “New Therapeutic Uses for Existing Molecules”
58 molecule names and identifiers. Where are the “structures”?
![Page 18: Mining public domain data as a basis for drug repurposing...When errors are identified hard to get fixed! Data licensing is confusing –“Open Data” We are “takers” not “givers”](https://reader036.vdocument.in/reader036/viewer/2022070904/5f6e3419abdf663ee862e652/html5/thumbnails/18.jpg)
NCATS dataset
• Several groups tried to collate molecules
• Chris Lipinski provided approximately 30 unique molecules
• Simple molecule descriptors show no difference between compounds classified as discontinued (n = 15) and those in clinical trials (n = 14)
• Where is the definitive set of publicly accessible molecules for computational repurposing and analysis?
![Page 19: Mining public domain data as a basis for drug repurposing...When errors are identified hard to get fixed! Data licensing is confusing –“Open Data” We are “takers” not “givers”](https://reader036.vdocument.in/reader036/viewer/2022070904/5f6e3419abdf663ee862e652/html5/thumbnails/19.jpg)
Drug structure quality is important…
Many groups ARE doing in silico repositioning
Integrating or using sets of FDA drugs… and if structures are incorrect, predictions will be too
Where is the definitive set of FDA approved drugs with correct structures?
Ideally we need linkage between in vitro data and clinical data
![Page 20: Mining public domain data as a basis for drug repurposing...When errors are identified hard to get fixed! Data licensing is confusing –“Open Data” We are “takers” not “givers”](https://reader036.vdocument.in/reader036/viewer/2022070904/5f6e3419abdf663ee862e652/html5/thumbnails/20.jpg)
We have a problem…
Lots of data available but quality is suspect
Errors proliferate database to database
Data continues to flow in unabated
When errors are identified, they are hard to get fixed!
Data licensing is confusing – “Open Data”
We are “takers” not “givers” mostly…
Standards are lacking:
Data licensing
Data processing – structure standardization
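One way to see how errors proliferate from database to database is to compare the structure identifiers that different sources assign to the same drug name. The sketch below is a minimal illustration under stated assumptions: the database names, drug names, and keys are hypothetical placeholders merely shaped like InChIKeys, and it assumes, following the InChIKey design, that the first 14-character block reflects connectivity while the later block reflects the remaining layers such as stereochemistry.

```python
from collections import defaultdict

# Hypothetical records: (database, drug name, structure key).
# The keys are made-up placeholders shaped like InChIKeys, not real identifiers.
RECORDS = [
    ("db_a", "drug_x", "AAAAAAAAAAAAAA-BBBBBBBBSA-N"),
    ("db_b", "drug_x", "AAAAAAAAAAAAAA-CCCCCCCCSA-N"),
    ("db_a", "drug_y", "DDDDDDDDDDDDDD-EEEEEEEESA-N"),
    ("db_b", "drug_y", "DDDDDDDDDDDDDD-EEEEEEEESA-N"),
    ("db_a", "drug_z", "FFFFFFFFFFFFFF-GGGGGGGGSA-N"),
    ("db_b", "drug_z", "HHHHHHHHHHHHHH-GGGGGGGGSA-N"),
]


def find_conflicts(records):
    """Return {name: kind of disagreement} for names with more than one key."""
    keys_by_name = defaultdict(set)
    for _database, name, key in records:
        keys_by_name[name].add(key)

    conflicts = {}
    for name, keys in keys_by_name.items():
        if len(keys) < 2:
            continue  # all sources agree on this name
        # Assumption: the first block of the key stands for connectivity.
        skeletons = {key.split("-")[0] for key in keys}
        if len(skeletons) == 1:
            conflicts[name] = "same connectivity, other layers differ"
        else:
            conflicts[name] = "different connectivity"
    return conflicts


conflicts = find_conflicts(RECORDS)
for name, kind in conflicts.items():
    print(f"{name}: {kind}")
```

A check like this only flags disagreement; deciding which record is correct still needs curation.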
![Page 21: Mining public domain data as a basis for drug repurposing...When errors are identified hard to get fixed! Data licensing is confusing –“Open Data” We are “takers” not “givers”](https://reader036.vdocument.in/reader036/viewer/2022070904/5f6e3419abdf663ee862e652/html5/thumbnails/21.jpg)
So what needs to happen to improve?
• Let’s agree collaboration and crowdsourcing can help
• Provide SIMPLE ways to provide feedback
• Contribute when possible – databases should provide feedback mechanisms
• Adopt standards for structure handling and representation
• Adopt standards for data interchange
• Allow machine handling of data – use the power of the semantic web
![Page 22: Mining public domain data as a basis for drug repurposing...When errors are identified hard to get fixed! Data licensing is confusing –“Open Data” We are “takers” not “givers”](https://reader036.vdocument.in/reader036/viewer/2022070904/5f6e3419abdf663ee862e652/html5/thumbnails/22.jpg)
Williams, Ekins and Tkachenko, Drug Disc Today 17: 685-701 (2012)
![Page 23: Mining public domain data as a basis for drug repurposing...When errors are identified hard to get fixed! Data licensing is confusing –“Open Data” We are “takers” not “givers”](https://reader036.vdocument.in/reader036/viewer/2022070904/5f6e3419abdf663ee862e652/html5/thumbnails/23.jpg)
Collaboration on Curation
Collaborate on curation…share through standards and open interfaces
![Page 24: Mining public domain data as a basis for drug repurposing...When errors are identified hard to get fixed! Data licensing is confusing –“Open Data” We are “takers” not “givers”](https://reader036.vdocument.in/reader036/viewer/2022070904/5f6e3419abdf663ee862e652/html5/thumbnails/24.jpg)
All DBs should take comments!
![Page 25: Mining public domain data as a basis for drug repurposing...When errors are identified hard to get fixed! Data licensing is confusing –“Open Data” We are “takers” not “givers”](https://reader036.vdocument.in/reader036/viewer/2022070904/5f6e3419abdf663ee862e652/html5/thumbnails/25.jpg)
Standardize
Use the SRS as guidance for standardization
![Page 26: Mining public domain data as a basis for drug repurposing...When errors are identified hard to get fixed! Data licensing is confusing –“Open Data” We are “takers” not “givers”](https://reader036.vdocument.in/reader036/viewer/2022070904/5f6e3419abdf663ee862e652/html5/thumbnails/26.jpg)
“Appify” curation and collaboration
• The data network is complex
• “Appify” collaboration and
curation networks
• Increasing crowdsourcing role
for data analysis
Ekins & Williams, Pharm Res, 27: 393-395, 2010.
![Page 27: Mining public domain data as a basis for drug repurposing...When errors are identified hard to get fixed! Data licensing is confusing –“Open Data” We are “takers” not “givers”](https://reader036.vdocument.in/reader036/viewer/2022070904/5f6e3419abdf663ee862e652/html5/thumbnails/27.jpg)
Mobile Apps for Drug Discovery
![Page 28: Mining public domain data as a basis for drug repurposing...When errors are identified hard to get fixed! Data licensing is confusing –“Open Data” We are “takers” not “givers”](https://reader036.vdocument.in/reader036/viewer/2022070904/5f6e3419abdf663ee862e652/html5/thumbnails/28.jpg)
Open Drug Discovery Teams
Free iOS app used to expose repurposing data
All of this data has been tweeted http://tinyurl.com/6l9qy4f
Ekins, Clark and Williams, Mol Informatics, in Press 2012
![Page 29: Mining public domain data as a basis for drug repurposing...When errors are identified hard to get fixed! Data licensing is confusing –“Open Data” We are “takers” not “givers”](https://reader036.vdocument.in/reader036/viewer/2022070904/5f6e3419abdf663ee862e652/html5/thumbnails/29.jpg)
Open Drug Discovery Teams
![Page 30: Mining public domain data as a basis for drug repurposing...When errors are identified hard to get fixed! Data licensing is confusing –“Open Data” We are “takers” not “givers”](https://reader036.vdocument.in/reader036/viewer/2022070904/5f6e3419abdf663ee862e652/html5/thumbnails/30.jpg)
Simple Rules for licensing “open” data
Gather stakeholders. Decide if goals are primarily scientific, commercial or mixed.
Explore benefits of open licensing and drawbacks of enclosure. Hold closely to open definitions and standards. Do not write your own IP licenses!
Provide simple explanations for terms of use. Use metadata to indicate licensing terms explicitly - the Creative Commons Rights Expression Language is a good tool.
Do not lock up metadata. If you can’t make the data public domain, make the metadata public domain.
Williams, Wilbanks and Ekins, PLoS Comput. Biol., in press, Sept. 2012
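The rule about stating licensing terms explicitly in metadata can be sketched in a few lines. This is only an illustration under assumptions: it uses a plain dictionary with a hypothetical `license` field rather than the Creative Commons Rights Expression Language itself, and the dataset titles are invented; the CC0 URL is the public Creative Commons one.

```python
import json

# Public URL of the Creative Commons CC0 public domain dedication.
CC0_URL = "https://creativecommons.org/publicdomain/zero/1.0/"


def with_license(metadata, license_url):
    """Return a copy of a metadata record with an explicit license URL."""
    if not license_url:
        raise ValueError("a license URL is required")
    record = dict(metadata)
    record["license"] = license_url  # hypothetical field name
    return record


def missing_license(records):
    """Return the titles of records that do not declare a license."""
    return [r.get("title", "<untitled>") for r in records if not r.get("license")]


# Hypothetical dataset descriptions.
datasets = [
    with_license({"title": "example_drug_set"}, CC0_URL),
    {"title": "unlabelled_set"},
]

print(json.dumps(datasets[0], indent=2))
print("Missing license:", missing_license(datasets))
```

Machine-readable terms like this let a downstream tool refuse or flag records whose reuse conditions are unknown.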
![Page 31: Mining public domain data as a basis for drug repurposing...When errors are identified hard to get fixed! Data licensing is confusing –“Open Data” We are “takers” not “givers”](https://reader036.vdocument.in/reader036/viewer/2022070904/5f6e3419abdf663ee862e652/html5/thumbnails/31.jpg)
Open PHACTS Project
Develop a set of robust standards…
Implement the standards in a semantic integration hub
Deliver services to support drug discovery programs in pharma and public domain
22 partners, 8 pharmaceutical companies, 3 biotechs
36-month project
Guiding principle is open access, open usage, open source
- Key to standards adoption -
![Page 32: Mining public domain data as a basis for drug repurposing...When errors are identified hard to get fixed! Data licensing is confusing –“Open Data” We are “takers” not “givers”](https://reader036.vdocument.in/reader036/viewer/2022070904/5f6e3419abdf663ee862e652/html5/thumbnails/32.jpg)
![Page 33: Mining public domain data as a basis for drug repurposing...When errors are identified hard to get fixed! Data licensing is confusing –“Open Data” We are “takers” not “givers”](https://reader036.vdocument.in/reader036/viewer/2022070904/5f6e3419abdf663ee862e652/html5/thumbnails/33.jpg)
![Page 34: Mining public domain data as a basis for drug repurposing...When errors are identified hard to get fixed! Data licensing is confusing –“Open Data” We are “takers” not “givers”](https://reader036.vdocument.in/reader036/viewer/2022070904/5f6e3419abdf663ee862e652/html5/thumbnails/34.jpg)
To facilitate THIS process!
What’s the structure?
Are they in our file?
What’s similar?
What’s the target?
Pharmacology data?
Known pathways?
Working on now?
Connections to disease?
Expressed in right cell type?
Competitors?
IP?
![Page 35: Mining public domain data as a basis for drug repurposing...When errors are identified hard to get fixed! Data licensing is confusing –“Open Data” We are “takers” not “givers”](https://reader036.vdocument.in/reader036/viewer/2022070904/5f6e3419abdf663ee862e652/html5/thumbnails/35.jpg)
It’s not JUST structures of course…
![Page 36: Mining public domain data as a basis for drug repurposing...When errors are identified hard to get fixed! Data licensing is confusing –“Open Data” We are “takers” not “givers”](https://reader036.vdocument.in/reader036/viewer/2022070904/5f6e3419abdf663ee862e652/html5/thumbnails/36.jpg)
Taxol: Paclitaxel Bioassay Data
Most bioassay data are associated with a structure that has one ambiguous stereocenter
![Page 37: Mining public domain data as a basis for drug repurposing...When errors are identified hard to get fixed! Data licensing is confusing –“Open Data” We are “takers” not “givers”](https://reader036.vdocument.in/reader036/viewer/2022070904/5f6e3419abdf663ee862e652/html5/thumbnails/37.jpg)
Measuring data: dispensing dependencies

| Dispensing process | Hydrophobic features (HPF) | Hydrogen bond acceptor (HBA) | Hydrogen bond donor (HBD) | Observed vs. predicted IC50 (r) |
|---|---|---|---|---|
| Acoustic mediated process | 2 | 1 | 1 | 0.92 |
| Disposable tip mediated process | 0 | 2 | 1 | 0.80 |

Data from two AstraZeneca patents: Ephrin pharmacophores were developed using data for 14 compounds with IC50 values. Different dispensing methods give different results, which affects the resulting hypotheses and could impact drug discovery.
Figure panels: Acoustic; Disposable tip
Ekins, Olechno and Williams, Submitted 2012
![Page 38: Mining public domain data as a basis for drug repurposing...When errors are identified hard to get fixed! Data licensing is confusing –“Open Data” We are “takers” not “givers”](https://reader036.vdocument.in/reader036/viewer/2022070904/5f6e3419abdf663ee862e652/html5/thumbnails/38.jpg)
Measuring data: dispensing dependencies
Acoustically derived IC50 values were 1.5- to 276.5-fold lower than those from tip-based dispensing
• Pharmacophores and other computational models are used to guide medicinal chemistry.
• Non-tip-based methods may improve HTS results and avoid misleading computational and statistical models.
• No analysis of the influence of dispensing processes on data.
• Public databases should annotate metadata to create larger datasets for comparing different computational methods.
How much data is reproducible, accurate, valid? The challenge of high-throughput science.
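The fold difference quoted above is the ratio of the tip-based IC50 to the acoustic IC50 for the same compound, which can only be computed when each measurement carries its dispensing method as metadata. The sketch below shows this under stated assumptions: the compound names, the `dispensing` field, and the IC50 values are hypothetical and are not taken from the patents.

```python
# Hypothetical assay records; the dispensing method is stored as metadata.
ASSAY_RECORDS = [
    {"compound": "cpd_1", "dispensing": "tip", "ic50_um": 3.0},
    {"compound": "cpd_1", "dispensing": "acoustic", "ic50_um": 2.0},
    {"compound": "cpd_2", "dispensing": "tip", "ic50_um": 10.0},
    {"compound": "cpd_2", "dispensing": "acoustic", "ic50_um": 0.5},
    {"compound": "cpd_3", "dispensing": "tip", "ic50_um": 4.0},  # no acoustic value
]


def fold_lower_by_compound(records):
    """Return {compound: tip IC50 / acoustic IC50} where both values exist."""
    by_compound = {}
    for rec in records:
        by_compound.setdefault(rec["compound"], {})[rec["dispensing"]] = rec["ic50_um"]

    folds = {}
    for compound, values in by_compound.items():
        if "tip" in values and "acoustic" in values:
            # A ratio above 1 means the acoustic IC50 is lower than the tip IC50.
            folds[compound] = values["tip"] / values["acoustic"]
    return folds


folds = fold_lower_by_compound(ASSAY_RECORDS)
for compound, fold in folds.items():
    print(f"{compound}: acoustic IC50 is {fold:.1f}-fold lower")
```

Records without the dispensing annotation, or with only one method measured, cannot be compared this way, which is the practical argument for annotating it in public databases.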
![Page 39: Mining public domain data as a basis for drug repurposing...When errors are identified hard to get fixed! Data licensing is confusing –“Open Data” We are “takers” not “givers”](https://reader036.vdocument.in/reader036/viewer/2022070904/5f6e3419abdf663ee862e652/html5/thumbnails/39.jpg)
Conclusions
![Page 40: Mining public domain data as a basis for drug repurposing...When errors are identified hard to get fixed! Data licensing is confusing –“Open Data” We are “takers” not “givers”](https://reader036.vdocument.in/reader036/viewer/2022070904/5f6e3419abdf663ee862e652/html5/thumbnails/40.jpg)
Acknowledgments
Sean Ekins
Christopher Lipinski
Joe Olechno
John Wilbanks
Drug Disambiguation project team
RSC Cheminformatics Team
![Page 41: Mining public domain data as a basis for drug repurposing...When errors are identified hard to get fixed! Data licensing is confusing –“Open Data” We are “takers” not “givers”](https://reader036.vdocument.in/reader036/viewer/2022070904/5f6e3419abdf663ee862e652/html5/thumbnails/41.jpg)
Thank you
Email: [email protected]
Twitter: @chemconnector
Blog: www.chemconnector.com
SLIDES: www.slideshare.net/AntonyWilliams
Email: [email protected]
Twitter: collabchem
Blog: http://www.collabchem.com/