The collection, curation and modeling of Open Melting Point measurements
DESCRIPTION
Jean-Claude Bradley and Andrew Lang present at the 5th Meeting on U.S. Government Chemical Databases and Open Chemistry on August 26, 2011 about "The collection, curation and modeling of Open Melting Point measurements". The talk also covers the role of Open Notebook Science and Google Apps Scripts in this effort.
TRANSCRIPT
![Page 1: The collection, curation and modeling of Open Melting Point measurements](https://reader034.vdocument.in/reader034/viewer/2022042814/55504865b4c9058f768b4f27/html5/thumbnails/1.jpg)
The collection, curation and modeling of Open Melting Point measurements
August 26, 2011
5th Meeting on U.S. Government Chemical Databases and Open Chemistry
Jean-Claude Bradley
Department of Chemistry, Drexel University
Andrew Lang
Department of Mathematics, Oral Roberts University
Antony Williams
ChemSpider, Royal Society of Chemistry
![Page 2: The collection, curation and modeling of Open Melting Point measurements](https://reader034.vdocument.in/reader034/viewer/2022042814/55504865b4c9058f768b4f27/html5/thumbnails/2.jpg)
The Problem of Data Quality in Chemistry
• Lack of provenance
• Reliance on a system of “trusted sources”
In the case of melting points:
• CRC Handbook
• Merck Index
• Chemical Vendor Catalogs (e.g. Sigma-Aldrich)
• Peer-Reviewed Journals
![Page 3: The collection, curation and modeling of Open Melting Point measurements](https://reader034.vdocument.in/reader034/viewer/2022042814/55504865b4c9058f768b4f27/html5/thumbnails/3.jpg)
Strategy for the curation of melting points
Using technology, we can begin to replace the “trusted source” model with one based on transparency and provenance
1. Rely on redundancy when possible
2. Provide the maximum level of provenance when necessary (Open Notebook Science)
3. Adhere to Open Data, Open Descriptors and Open Algorithms for measurements and modeling
![Page 4: The collection, curation and modeling of Open Melting Point measurements](https://reader034.vdocument.in/reader034/viewer/2022042814/55504865b4c9058f768b4f27/html5/thumbnails/4.jpg)
The Chemical Information Validation Sheet
567 curated and referenced measurements from Fall 2010 Chemical Information Retrieval course
![Page 5: The collection, curation and modeling of Open Melting Point measurements](https://reader034.vdocument.in/reader034/viewer/2022042814/55504865b4c9058f768b4f27/html5/thumbnails/5.jpg)
Investigating the m.p. inconsistencies of EGCG
![Page 6: The collection, curation and modeling of Open Melting Point measurements](https://reader034.vdocument.in/reader034/viewer/2022042814/55504865b4c9058f768b4f27/html5/thumbnails/6.jpg)
Most popular data sources
![Page 7: The collection, curation and modeling of Open Melting Point measurements](https://reader034.vdocument.in/reader034/viewer/2022042814/55504865b4c9058f768b4f27/html5/thumbnails/7.jpg)
Alfa Aesar donates melting points to the public
![Page 8: The collection, curation and modeling of Open Melting Point measurements](https://reader034.vdocument.in/reader034/viewer/2022042814/55504865b4c9058f768b4f27/html5/thumbnails/8.jpg)
Open Melting Point Explorer
![Page 9: The collection, curation and modeling of Open Melting Point measurements](https://reader034.vdocument.in/reader034/viewer/2022042814/55504865b4c9058f768b4f27/html5/thumbnails/9.jpg)
Outliers
MDPI dataset
EPA/PhysProp (donated all data to public also)
![Page 10: The collection, curation and modeling of Open Melting Point measurements](https://reader034.vdocument.in/reader034/viewer/2022042814/55504865b4c9058f768b4f27/html5/thumbnails/10.jpg)
Outliers for ethanol: Alfa Aesar and Oxford MSDS
![Page 11: The collection, curation and modeling of Open Melting Point measurements](https://reader034.vdocument.in/reader034/viewer/2022042814/55504865b4c9058f768b4f27/html5/thumbnails/11.jpg)
Inconsistencies and SMILES problems within MDPI dataset
![Page 12: The collection, curation and modeling of Open Melting Point measurements](https://reader034.vdocument.in/reader034/viewer/2022042814/55504865b4c9058f768b4f27/html5/thumbnails/12.jpg)
MDPI Dataset labeled with High Trust Level
![Page 13: The collection, curation and modeling of Open Melting Point measurements](https://reader034.vdocument.in/reader034/viewer/2022042814/55504865b4c9058f768b4f27/html5/thumbnails/13.jpg)
EPA/PHYSPROP Structure Errors (Incorrect Valence): 2315 out of 43543 contained pentavalent nitrogens
![Page 14: The collection, curation and modeling of Open Melting Point measurements](https://reader034.vdocument.in/reader034/viewer/2022042814/55504865b4c9058f768b4f27/html5/thumbnails/14.jpg)
EPA/PHYSPROP Errors: Structure displayed is for the neutral compound dopamine but the associated CAS Number and chemical name in the file are for the hydrobromide salt.
![Page 15: The collection, curation and modeling of Open Melting Point measurements](https://reader034.vdocument.in/reader034/viewer/2022042814/55504865b4c9058f768b4f27/html5/thumbnails/15.jpg)
Common errors in datasets
1. multiple melting points for the same compound in the same database
2. stereochemistry issues
3. sign inversion
4. conversion errors (Kelvin/Celsius, Fahrenheit/Celsius)
5. bad SMILES (non-rendering)
6. salts associated with SMILES for free base
7. using boiling point for melting point
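Some of the errors in the list above leave a numeric signature when two reported values for the same compound are compared. The following is a minimal sketch of such a check, not the curation procedure used in the talk; the function name `classify_discrepancy`, the labels it returns and the 1 °C tolerance are all assumptions made for illustration.

```python
def classify_discrepancy(a_c, b_c, tol=1.0):
    """Guess why two reported melting points (both nominally in °C) disagree.

    The labels and the tolerance are illustrative assumptions, not part of
    the original curation workflow.
    """
    if abs(a_c - b_c) <= tol:
        return "consistent"
    # Same magnitude, opposite sign (e.g. -114 reported as 114)
    if abs(a_c + b_c) <= tol:
        return "sign inversion"
    # One value left in kelvin: the two differ by about 273.15
    if abs(abs(a_c - b_c) - 273.15) <= tol:
        return "Kelvin/Celsius"
    # One value left in Fahrenheit: converting it to Celsius matches the other
    for x, y in ((a_c, b_c), (b_c, a_c)):
        if abs((x - 32.0) * 5.0 / 9.0 - y) <= tol:
            return "Fahrenheit/Celsius"
    return "unexplained"


print(classify_discrepancy(-114.1, 114.1))  # sign inversion
print(classify_discrepancy(80.0, 353.15))   # Kelvin/Celsius
print(classify_discrepancy(176.0, 80.0))    # Fahrenheit/Celsius
```

A label from a check like this is only a hint for a human curator: stereochemistry, salt/free-base and boiling-point mix-ups cannot be detected from the numbers alone.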
![Page 16: The collection, curation and modeling of Open Melting Point measurements](https://reader034.vdocument.in/reader034/viewer/2022042814/55504865b4c9058f768b4f27/html5/thumbnails/16.jpg)
Open melting point datasets
Double+ validated: 2706 compounds (7413 highly curated measurements; range: 0.01-5 °C. Compounds that had at least one chiral center, possessed cis/trans isomerism, or were inorganic or a salt were removed.)
Entire dataset: 19933 unique compounds (27684 measurements – no inorganics or salts)
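The slide does not spell out the selection rule, so the sketch below rests on one reading of it: a compound counts as "double+ validated" when it has at least two measurements whose spread (maximum minus minimum) lies between 0.01 and 5 °C. The function name `double_validated`, the `(compound_id, mp_celsius)` input shape and the choice to report the mean are assumptions for illustration only.

```python
from collections import defaultdict


def double_validated(measurements, min_range=0.01, max_range=5.0):
    """Return {compound_id: mean melting point} for compounds that pass.

    measurements: iterable of (compound_id, mp_celsius) pairs.
    A compound passes when it has two or more values and their spread
    falls within [min_range, max_range] (assumed reading of the slide).
    """
    by_compound = defaultdict(list)
    for compound_id, mp in measurements:
        by_compound[compound_id].append(mp)

    selected = {}
    for compound_id, values in by_compound.items():
        if len(values) < 2:
            continue  # a single source cannot confirm itself
        spread = max(values) - min(values)
        if min_range <= spread <= max_range:
            selected[compound_id] = sum(values) / len(values)
    return selected


data = [
    ("A", 80.0), ("A", 82.0),   # agree within 5 °C: kept
    ("B", 10.0), ("B", 40.0),   # disagree by 30 °C: dropped
    ("C", 5.0),                 # single measurement: dropped
    ("D", 50.0), ("D", 50.0),   # identical values, spread below 0.01: dropped
]
print(double_validated(data))  # {'A': 81.0}
```

Under this reading, the lower bound excludes identical values, which may be copies of one original measurement rather than independent confirmations.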
![Page 17: The collection, curation and modeling of Open Melting Point measurements](https://reader034.vdocument.in/reader034/viewer/2022042814/55504865b4c9058f768b4f27/html5/thumbnails/17.jpg)
Open Models with Open Data Using Open Descriptors (CDK)
![Page 18: The collection, curation and modeling of Open Melting Point measurements](https://reader034.vdocument.in/reader034/viewer/2022042814/55504865b4c9058f768b4f27/html5/thumbnails/18.jpg)
Modeling Results
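| Model | Training set | Test set (TS) | Descriptors | TS AAE | TS RMSE | TS R² |
|---|---|---|---|---|---|---|
| 1 | 2205 | 500 | 132 2D | 29.51 | 40.91 | 0.82 |
| 1 | 2204 | 500 | 170 2D/3D | 29.52 | 40.79 | 0.83 |
| 2 | 16015 | 500 | 137 2D | 26.62 | 36.35 | 0.86 |
| 3 | 16015 | 3500 | 137 2D | 29.36 | 40.18 | 0.81 |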
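For readers unfamiliar with the test-set columns in the table above, the sketch below shows one common way to compute them. It reads AAE as average absolute error and R² as the coefficient of determination (1 − SS_res/SS_tot); the slides do not say which R² definition was used, so that choice, like the function name `error_metrics`, is an assumption.

```python
import math


def error_metrics(observed, predicted):
    """Return (AAE, RMSE, R2) for paired observed and predicted values."""
    n = len(observed)
    errors = [p - o for o, p in zip(observed, predicted)]

    # Average absolute error: mean size of the miss, in the data's units
    aae = sum(abs(e) for e in errors) / n
    # Root-mean-square error: penalizes large misses more heavily
    ss_res = sum(e * e for e in errors)
    rmse = math.sqrt(ss_res / n)
    # Coefficient of determination relative to the mean of the observations
    mean_obs = sum(observed) / n
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    r2 = 1.0 - ss_res / ss_tot
    return aae, rmse, r2


aae, rmse, r2 = error_metrics([0.0, 10.0, 20.0, 30.0], [2.0, 8.0, 23.0, 27.0])
print(aae, round(rmse, 3), round(r2, 3))  # 2.5 2.55 0.948
```

RMSE is always at least as large as AAE, and a wide gap between them (as in the table) suggests that a minority of compounds are predicted much worse than the rest.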
![Page 19: The collection, curation and modeling of Open Melting Point measurements](https://reader034.vdocument.in/reader034/viewer/2022042814/55504865b4c9058f768b4f27/html5/thumbnails/19.jpg)
Melting point prediction service
![Page 20: The collection, curation and modeling of Open Melting Point measurements](https://reader034.vdocument.in/reader034/viewer/2022042814/55504865b4c9058f768b4f27/html5/thumbnails/20.jpg)
Melting point predictions and measurements on iPhone/iPad (Alex Clark)
![Page 21: The collection, curation and modeling of Open Melting Point measurements](https://reader034.vdocument.in/reader034/viewer/2022042814/55504865b4c9058f768b4f27/html5/thumbnails/21.jpg)
Publication of double+ validated melting point dataset to Nature Precedings and LuLu
![Page 22: The collection, curation and modeling of Open Melting Point measurements](https://reader034.vdocument.in/reader034/viewer/2022042814/55504865b4c9058f768b4f27/html5/thumbnails/22.jpg)
For all Formats of ONS Projects
![Page 23: The collection, curation and modeling of Open Melting Point measurements](https://reader034.vdocument.in/reader034/viewer/2022042814/55504865b4c9058f768b4f27/html5/thumbnails/23.jpg)
Open Melting Point Datasets
Currently 20,000 compounds with Open MPs
![Page 24: The collection, curation and modeling of Open Melting Point measurements](https://reader034.vdocument.in/reader034/viewer/2022042814/55504865b4c9058f768b4f27/html5/thumbnails/24.jpg)
Some melting points can’t be resolved from the literature alone: 4-benzyltoluene
![Page 25: The collection, curation and modeling of Open Melting Point measurements](https://reader034.vdocument.in/reader034/viewer/2022042814/55504865b4c9058f768b4f27/html5/thumbnails/25.jpg)
Motivation: Faster Science, Better Science
![Page 26: The collection, curation and modeling of Open Melting Point measurements](https://reader034.vdocument.in/reader034/viewer/2022042814/55504865b4c9058f768b4f27/html5/thumbnails/26.jpg)
Open Lab Notebook page measuring the melting point of 4-benzyltoluene
![Page 27: The collection, curation and modeling of Open Melting Point measurements](https://reader034.vdocument.in/reader034/viewer/2022042814/55504865b4c9058f768b4f27/html5/thumbnails/27.jpg)
Using melting point for temperature dependent solubility prediction
![Page 28: The collection, curation and modeling of Open Melting Point measurements](https://reader034.vdocument.in/reader034/viewer/2022042814/55504865b4c9058f768b4f27/html5/thumbnails/28.jpg)
Crowdsourcing Solubility Data
![Page 29: The collection, curation and modeling of Open Melting Point measurements](https://reader034.vdocument.in/reader034/viewer/2022042814/55504865b4c9058f768b4f27/html5/thumbnails/29.jpg)
Integration of Multiple Web Services to Recommend Solvents for Reactions
![Page 30: The collection, curation and modeling of Open Melting Point measurements](https://reader034.vdocument.in/reader034/viewer/2022042814/55504865b4c9058f768b4f27/html5/thumbnails/30.jpg)
![Page 31: The collection, curation and modeling of Open Melting Point measurements](https://reader034.vdocument.in/reader034/viewer/2022042814/55504865b4c9058f768b4f27/html5/thumbnails/31.jpg)
All ONS web services
![Page 32: The collection, curation and modeling of Open Melting Point measurements](https://reader034.vdocument.in/reader034/viewer/2022042814/55504865b4c9058f768b4f27/html5/thumbnails/32.jpg)
Google Apps Scripts web services
![Page 33: The collection, curation and modeling of Open Melting Point measurements](https://reader034.vdocument.in/reader034/viewer/2022042814/55504865b4c9058f768b4f27/html5/thumbnails/33.jpg)
Google Apps Scripts for conveniently exploring melting point data
![Page 34: The collection, curation and modeling of Open Melting Point measurements](https://reader034.vdocument.in/reader034/viewer/2022042814/55504865b4c9058f768b4f27/html5/thumbnails/34.jpg)
Straight chain carboxylic acids from 1 to 10 carbons
Straight chain alcohols from 1 to 10 carbons
Comparison of model with triple validated measurements
![Page 35: The collection, curation and modeling of Open Melting Point measurements](https://reader034.vdocument.in/reader034/viewer/2022042814/55504865b4c9058f768b4f27/html5/thumbnails/35.jpg)
Cyclic primary amines from 3 to 6 carbons (cyclobutylamine flagged for validation – only single source available)
![Page 36: The collection, curation and modeling of Open Melting Point measurements](https://reader034.vdocument.in/reader034/viewer/2022042814/55504865b4c9058f768b4f27/html5/thumbnails/36.jpg)
Google Apps Scripts for planning reactions and creating schemes
![Page 37: The collection, curation and modeling of Open Melting Point measurements](https://reader034.vdocument.in/reader034/viewer/2022042814/55504865b4c9058f768b4f27/html5/thumbnails/37.jpg)
Open Melting Points in Supplementary Data Pages of Wikipedia (Martin Walker)
![Page 38: The collection, curation and modeling of Open Melting Point measurements](https://reader034.vdocument.in/reader034/viewer/2022042814/55504865b4c9058f768b4f27/html5/thumbnails/38.jpg)
Conclusions
• For science to progress quickly there is great benefit in moving away from a “trusted source” model to one based on transparency and data provenance
• Open Notebook Science offers an efficient way to make research transparent and discoverable