![Page 1: PubChem: A significant resource for scientists · PDF filePubChem: A significant resource for scientists Evan Bolton, Ph.D. NCBI/NLM/NIH 5th Meeting on U.S. Government Chemical Databases](https://reader034.vdocument.in/reader034/viewer/2022051721/5a7a77c57f8b9a0d098dc039/html5/thumbnails/1.jpg)
PubChem: A significant resource for scientists
Evan Bolton, Ph.D.
NCBI/NLM/NIH
5th Meeting on U.S. Government Chemical Databases and Open Chemistry
August 25, [email protected]
![Page 2: PubChem: A significant resource for scientists · PDF filePubChem: A significant resource for scientists Evan Bolton, Ph.D. NCBI/NLM/NIH 5th Meeting on U.S. Government Chemical Databases](https://reader034.vdocument.in/reader034/viewer/2022051721/5a7a77c57f8b9a0d098dc039/html5/thumbnails/2.jpg)
What is PubChem?
• An open archive
  – anyone can contribute
    • chemical structures
    • synonyms
    • comments
    • biological experiments
    • cross references
    • records versioned
    • URLs
  – links external resources
  – voluntary data push
  – automated updates
• A public resource
  – anyone can access
    • data downloadable
    • search, browse, retrieve
  – integrated
    • literature
    • sequences, protein 3‐D
  – analysis capabilities
  – programmatic layers
    • PUG, PUG/SOAP
    • Entrez Utilities
    • URL‐based interfaces
![Page 4: PubChem: A significant resource for scientists · PDF filePubChem: A significant resource for scientists Evan Bolton, Ph.D. NCBI/NLM/NIH 5th Meeting on U.S. Government Chemical Databases](https://reader034.vdocument.in/reader034/viewer/2022051721/5a7a77c57f8b9a0d098dc039/html5/thumbnails/4.jpg)
PubChem home page…
http://pubchem.ncbi.nlm.nih.gov
![Page 5: PubChem: A significant resource for scientists · PDF filePubChem: A significant resource for scientists Evan Bolton, Ph.D. NCBI/NLM/NIH 5th Meeting on U.S. Government Chemical Databases](https://reader034.vdocument.in/reader034/viewer/2022051721/5a7a77c57f8b9a0d098dc039/html5/thumbnails/5.jpg)
PubChem contributors are many…
![Page 6: PubChem: A significant resource for scientists · PDF filePubChem: A significant resource for scientists Evan Bolton, Ph.D. NCBI/NLM/NIH 5th Meeting on U.S. Government Chemical Databases](https://reader034.vdocument.in/reader034/viewer/2022051721/5a7a77c57f8b9a0d098dc039/html5/thumbnails/6.jpg)
PubChem contents are growing…
Chart panels: Depositors; Chemicals; Biological Assays; Bioactivities; Tested Chemicals; Protein Targets
![Page 7: PubChem: A significant resource for scientists · PDF filePubChem: A significant resource for scientists Evan Bolton, Ph.D. NCBI/NLM/NIH 5th Meeting on U.S. Government Chemical Databases](https://reader034.vdocument.in/reader034/viewer/2022051721/5a7a77c57f8b9a0d098dc039/html5/thumbnails/7.jpg)
PubChem is heavily used…
![Page 8: PubChem: A significant resource for scientists · PDF filePubChem: A significant resource for scientists Evan Bolton, Ph.D. NCBI/NLM/NIH 5th Meeting on U.S. Government Chemical Databases](https://reader034.vdocument.in/reader034/viewer/2022051721/5a7a77c57f8b9a0d098dc039/html5/thumbnails/8.jpg)
PubChem is a global resource…
Interactive usage by country (Jul 15 2010 – Aug 15 2010)
![Page 9: PubChem: A significant resource for scientists · PDF filePubChem: A significant resource for scientists Evan Bolton, Ph.D. NCBI/NLM/NIH 5th Meeting on U.S. Government Chemical Databases](https://reader034.vdocument.in/reader034/viewer/2022051721/5a7a77c57f8b9a0d098dc039/html5/thumbnails/9.jpg)
PubChem data relationships…
Unique chemical structure content of PubChem
Mixture, Salt; Parent, Components
“Identity groups”: Exactly Same, Same Isotope, Same Stereo, Same Connectivity, Tautomers
Depositor provided: Primary accession SID
Primary accession CID
Depositor provided: Primary accession AID
![Page 10: PubChem: A significant resource for scientists · PDF filePubChem: A significant resource for scientists Evan Bolton, Ph.D. NCBI/NLM/NIH 5th Meeting on U.S. Government Chemical Databases](https://reader034.vdocument.in/reader034/viewer/2022051721/5a7a77c57f8b9a0d098dc039/html5/thumbnails/10.jpg)
The state of chemical information
(An aside)
![Page 11: PubChem: A significant resource for scientists · PDF filePubChem: A significant resource for scientists Evan Bolton, Ph.D. NCBI/NLM/NIH 5th Meeting on U.S. Government Chemical Databases](https://reader034.vdocument.in/reader034/viewer/2022051721/5a7a77c57f8b9a0d098dc039/html5/thumbnails/11.jpg)
The sad state of chemical information
![Page 12: PubChem: A significant resource for scientists · PDF filePubChem: A significant resource for scientists Evan Bolton, Ph.D. NCBI/NLM/NIH 5th Meeting on U.S. Government Chemical Databases](https://reader034.vdocument.in/reader034/viewer/2022051721/5a7a77c57f8b9a0d098dc039/html5/thumbnails/12.jpg)
Let’s talk chemical information…
• No “Global” rules or standards
  • based on individual organizational needs
  • often based on individual preferences
  • depictions of chemical structures
• PubChem accepts data from many organizations
  • conflicting “business rules”
  • previously unseen data representation schemes
  • combinatorial ways of drawing the same structure
![Page 13: PubChem: A significant resource for scientists · PDF filePubChem: A significant resource for scientists Evan Bolton, Ph.D. NCBI/NLM/NIH 5th Meeting on U.S. Government Chemical Databases](https://reader034.vdocument.in/reader034/viewer/2022051721/5a7a77c57f8b9a0d098dc039/html5/thumbnails/13.jpg)
What do you mean by that?
• “C” means?
  – form of carbon?
    • which one?
      – diamond?
      – graphite?
      – coal?
      – graphene?
      – charcoal?
      – carbon black?
      – nanotube?
  – methane?
![Page 14: PubChem: A significant resource for scientists · PDF filePubChem: A significant resource for scientists Evan Bolton, Ph.D. NCBI/NLM/NIH 5th Meeting on U.S. Government Chemical Databases](https://reader034.vdocument.in/reader034/viewer/2022051721/5a7a77c57f8b9a0d098dc039/html5/thumbnails/14.jpg)
Image from Wikipedia
http://en.wikipedia.org/wiki/Don_Quixote
![Page 15: PubChem: A significant resource for scientists · PDF filePubChem: A significant resource for scientists Evan Bolton, Ph.D. NCBI/NLM/NIH 5th Meeting on U.S. Government Chemical Databases](https://reader034.vdocument.in/reader034/viewer/2022051721/5a7a77c57f8b9a0d098dc039/html5/thumbnails/15.jpg)
Image from Wikipedia
http://en.wikipedia.org/wiki/Don_Quixote
![Page 16: PubChem: A significant resource for scientists · PDF filePubChem: A significant resource for scientists Evan Bolton, Ph.D. NCBI/NLM/NIH 5th Meeting on U.S. Government Chemical Databases](https://reader034.vdocument.in/reader034/viewer/2022051721/5a7a77c57f8b9a0d098dc039/html5/thumbnails/16.jpg)
What did you mean by that?
• Case Study:
(+)‐Iridodial
Defense chemicals from abdominal glands of 13 rove beetle species of subtribe Staphylinina
Ring Closed
Ring Open
![Page 17: PubChem: A significant resource for scientists · PDF filePubChem: A significant resource for scientists Evan Bolton, Ph.D. NCBI/NLM/NIH 5th Meeting on U.S. Government Chemical Databases](https://reader034.vdocument.in/reader034/viewer/2022051721/5a7a77c57f8b9a0d098dc039/html5/thumbnails/17.jpg)
A chemical structure may be represented in many different ways
![Page 18: PubChem: A significant resource for scientists · PDF filePubChem: A significant resource for scientists Evan Bolton, Ph.D. NCBI/NLM/NIH 5th Meeting on U.S. Government Chemical Databases](https://reader034.vdocument.in/reader034/viewer/2022051721/5a7a77c57f8b9a0d098dc039/html5/thumbnails/18.jpg)
A chemical structure may be represented in many different ways
![Page 19: PubChem: A significant resource for scientists · PDF filePubChem: A significant resource for scientists Evan Bolton, Ph.D. NCBI/NLM/NIH 5th Meeting on U.S. Government Chemical Databases](https://reader034.vdocument.in/reader034/viewer/2022051721/5a7a77c57f8b9a0d098dc039/html5/thumbnails/19.jpg)
What do you mean by “sodium acetate”?
![Page 20: PubChem: A significant resource for scientists · PDF filePubChem: A significant resource for scientists Evan Bolton, Ph.D. NCBI/NLM/NIH 5th Meeting on U.S. Government Chemical Databases](https://reader034.vdocument.in/reader034/viewer/2022051721/5a7a77c57f8b9a0d098dc039/html5/thumbnails/20.jpg)
Stereochemistry
• Import issues
  – Often obtained by perception of atom coordinates
    • Coordinates or stereo wedges may be ambiguous
  – Inconsistency between software packages for same file
• Export issues
  – Improper/inconsistent use of file format
    • Format conversion adds/removes/changes stereo
    • Relative stereochemistry improperly treated
    • Depiction vs. machine readable
• Curated data may become corrupted!
Big problem
![Page 21: PubChem: A significant resource for scientists · PDF filePubChem: A significant resource for scientists Evan Bolton, Ph.D. NCBI/NLM/NIH 5th Meeting on U.S. Government Chemical Databases](https://reader034.vdocument.in/reader034/viewer/2022051721/5a7a77c57f8b9a0d098dc039/html5/thumbnails/21.jpg)
Do we have a “defined” structure?
![Page 22: PubChem: A significant resource for scientists · PDF filePubChem: A significant resource for scientists Evan Bolton, Ph.D. NCBI/NLM/NIH 5th Meeting on U.S. Government Chemical Databases](https://reader034.vdocument.in/reader034/viewer/2022051721/5a7a77c57f8b9a0d098dc039/html5/thumbnails/22.jpg)
Is the structure reasonable?
![Page 23: PubChem: A significant resource for scientists · PDF filePubChem: A significant resource for scientists Evan Bolton, Ph.D. NCBI/NLM/NIH 5th Meeting on U.S. Government Chemical Databases](https://reader034.vdocument.in/reader034/viewer/2022051721/5a7a77c57f8b9a0d098dc039/html5/thumbnails/23.jpg)
The (sad) state of chemical information
(End of aside)
![Page 24: PubChem: A significant resource for scientists · PDF filePubChem: A significant resource for scientists Evan Bolton, Ph.D. NCBI/NLM/NIH 5th Meeting on U.S. Government Chemical Databases](https://reader034.vdocument.in/reader034/viewer/2022051721/5a7a77c57f8b9a0d098dc039/html5/thumbnails/24.jpg)
Automated structure processing...
• Verification
  – Atom element
  – Implicit hydrogen
  – Functional group
  – Valence
• Standardization
  – Tautomer invariance
  – Aromaticity detection
  – Stereochemistry
  – Explicit hydrogen
• Calculation
  – Coordinates
  – Properties
  – Descriptors
• Components
  – Isolate covalent units
  – Neutralize (+/‐ proton)
  – Reprocess
  – Detect unique
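The “isolate covalent units” step can be pictured as finding connected components in a graph whose nodes are atoms and whose edges are covalent bonds. The sketch below is only an illustration of that idea: the atom list, bond list, and the function name `covalent_units` are assumptions made for this example, not PubChem's data model or code.

```python
from collections import defaultdict


def covalent_units(num_atoms, bonds):
    """Group atom indices into covalently connected units (graph components)."""
    neighbors = defaultdict(set)
    for a, b in bonds:
        neighbors[a].add(b)
        neighbors[b].add(a)

    seen = set()
    units = []
    for start in range(num_atoms):
        if start in seen:
            continue
        # Depth-first walk over covalent bonds from an unvisited atom.
        stack, unit = [start], []
        seen.add(start)
        while stack:
            atom = stack.pop()
            unit.append(atom)
            for nxt in neighbors[atom]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        units.append(sorted(unit))
    return units


# Toy "sodium acetate" drawn as an ion pair (heavy atoms only, hypothetical input):
# 0=C, 1=C, 2=O, 3=O(-), 4=Na(+); there is no covalent bond to sodium.
atoms = ["C", "C", "O", "O", "Na"]
bonds = [(0, 1), (1, 2), (1, 3)]
units = covalent_units(len(atoms), bonds)
```

With this input the acetate atoms end up in one unit and sodium in another; a real pipeline would then neutralize and reprocess each unit, which this sketch does not attempt.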
![Page 25: PubChem: A significant resource for scientists · PDF filePubChem: A significant resource for scientists Evan Bolton, Ph.D. NCBI/NLM/NIH 5th Meeting on U.S. Government Chemical Databases](https://reader034.vdocument.in/reader034/viewer/2022051721/5a7a77c57f8b9a0d098dc039/html5/thumbnails/25.jpg)
PubChem data access…
• Interfaces
  – text/numeric search
  – fielded/range search
  – precomputed similarities
    • 2‐D, 3‐D, identity groups
  – inter‐database links
    • biomedical literature
    • MeSH ontology
      – biological roles
    • protein 3‐D
    • pathways
  – external resource links
• Tools
  – bioactivity analysis
  – chemical clustering
  – chemical structure search
  – data download
  – FTP site
  – heatmap analysis
  – integrated 3‐D layer
  – similarity computation
  – source summary
  – structure normalization
![Page 27: PubChem: A significant resource for scientists · PDF filePubChem: A significant resource for scientists Evan Bolton, Ph.D. NCBI/NLM/NIH 5th Meeting on U.S. Government Chemical Databases](https://reader034.vdocument.in/reader034/viewer/2022051721/5a7a77c57f8b9a0d098dc039/html5/thumbnails/27.jpg)
Entrez interface…
• Primary (text‐based) search engine
Rapid result subsets
Google‐like approach… most likely answer is at the top…
Result record summaries
User query
![Page 28: PubChem: A significant resource for scientists · PDF filePubChem: A significant resource for scientists Evan Bolton, Ph.D. NCBI/NLM/NIH 5th Meeting on U.S. Government Chemical Databases](https://reader034.vdocument.in/reader034/viewer/2022051721/5a7a77c57f8b9a0d098dc039/html5/thumbnails/28.jpg)
Entrez interface…
• Advanced search capability
  – makes it easy to rapidly create complex queries
  – helps with discoverability of indexes/filters
![Page 29: PubChem: A significant resource for scientists · PDF filePubChem: A significant resource for scientists Evan Bolton, Ph.D. NCBI/NLM/NIH 5th Meeting on U.S. Government Chemical Databases](https://reader034.vdocument.in/reader034/viewer/2022051721/5a7a77c57f8b9a0d098dc039/html5/thumbnails/29.jpg)
Entrez interface…
• “History” query result management
  – AND, OR, NOT operations
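The AND, OR, and NOT operations on saved result sets behave like ordinary set algebra. A minimal sketch using Python sets follows; the CID values are made-up placeholders rather than real query output, and the variable names are purely illustrative.

```python
# Hypothetical saved result sets (placeholder CIDs, not real query results).
query_1 = {101, 102, 103, 104}
query_2 = {103, 104, 105}

both = query_1 & query_2        # "#1 AND #2": records in both result sets
either = query_1 | query_2      # "#1 OR #2": records in at least one result set
only_first = query_1 - query_2  # "#1 NOT #2": records in the first set only
```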
![Page 30: PubChem: A significant resource for scientists · PDF filePubChem: A significant resource for scientists Evan Bolton, Ph.D. NCBI/NLM/NIH 5th Meeting on U.S. Government Chemical Databases](https://reader034.vdocument.in/reader034/viewer/2022051721/5a7a77c57f8b9a0d098dc039/html5/thumbnails/30.jpg)
Entrez interface…
• Each database has lots of specialized indexes and filters
  – PubChem Compound
    • 50+ indexes, e.g., aspirin[synonym]
    • 60+ filters, e.g., "has 3d conformer"[filter]
![Page 31: PubChem: A significant resource for scientists · PDF filePubChem: A significant resource for scientists Evan Bolton, Ph.D. NCBI/NLM/NIH 5th Meeting on U.S. Government Chemical Databases](https://reader034.vdocument.in/reader034/viewer/2022051721/5a7a77c57f8b9a0d098dc039/html5/thumbnails/31.jpg)
Fielded queries to the rescue!
• Interested in chemical names?
  Search just chemical name indexes
• “aspirin”
  – global keyword search ‐ 69 hits
• “aspirin”[Synonym]
  – keyword search ‐ 53 hits ‐ many derivatives, mixtures, salts
• “aspirin”[CompleteSynonym]
  – exactly matches name ‐ 1 hit
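The three query forms above can also be issued programmatically through the Entrez Utilities. The sketch below only builds the request URLs and sends nothing over the network. It assumes the public E-utilities `esearch.fcgi` endpoint and the `pccompound` database name; the helper names `fielded_term` and `esearch_url` are invented for this example. Hit counts returned today will differ from the 2010 numbers on the slide.

```python
from urllib.parse import urlencode

# Assumed endpoint: the NCBI E-utilities ESearch service.
EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def fielded_term(text, field=None):
    """Quote the text and, if a field is given, restrict the search to that index."""
    quoted = f'"{text}"'
    return f"{quoted}[{field}]" if field else quoted


def esearch_url(term, db="pccompound"):
    """Build (but do not send) an ESearch request URL for the given term."""
    return f"{EUTILS_ESEARCH}?{urlencode({'db': db, 'term': term})}"


# Global keyword, synonym index, and exact-name index, as on the slide.
urls = [
    esearch_url(fielded_term("aspirin", field))
    for field in (None, "Synonym", "CompleteSynonym")
]
```

Each URL could then be fetched with any HTTP client; the response format and rate limits are described in the E-utilities documentation.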
![Page 32: PubChem: A significant resource for scientists · PDF filePubChem: A significant resource for scientists Evan Bolton, Ph.D. NCBI/NLM/NIH 5th Meeting on U.S. Government Chemical Databases](https://reader034.vdocument.in/reader034/viewer/2022051721/5a7a77c57f8b9a0d098dc039/html5/thumbnails/32.jpg)
Case study… “glucose”
• Search by global keyword … 1,131 hits!
![Page 33: PubChem: A significant resource for scientists · PDF filePubChem: A significant resource for scientists Evan Bolton, Ph.D. NCBI/NLM/NIH 5th Meeting on U.S. Government Chemical Databases](https://reader034.vdocument.in/reader034/viewer/2022051721/5a7a77c57f8b9a0d098dc039/html5/thumbnails/33.jpg)
Case study… “glucose”
• Search by “glucose[Synonym]”… 975 hits!
![Page 34: PubChem: A significant resource for scientists · PDF filePubChem: A significant resource for scientists Evan Bolton, Ph.D. NCBI/NLM/NIH 5th Meeting on U.S. Government Chemical Databases](https://reader034.vdocument.in/reader034/viewer/2022051721/5a7a77c57f8b9a0d098dc039/html5/thumbnails/34.jpg)
Case study… “glucose”
• “glucose[CompleteSynonym]”… 4 hits!
![Page 35: PubChem: A significant resource for scientists · PDF filePubChem: A significant resource for scientists Evan Bolton, Ph.D. NCBI/NLM/NIH 5th Meeting on U.S. Government Chemical Databases](https://reader034.vdocument.in/reader034/viewer/2022051721/5a7a77c57f8b9a0d098dc039/html5/thumbnails/35.jpg)
What is data quality?
Ideal
• Validated
• Available
• Complete
• Succinct
• Useful
• Facile
• Seamless
• Happy user
Usually found
• Best guess
• Something close
• Fragmented
• Verbose
• Might help
• Lots of work
• Issues
• Frustrated user
![Page 37: PubChem: A significant resource for scientists · PDF filePubChem: A significant resource for scientists Evan Bolton, Ph.D. NCBI/NLM/NIH 5th Meeting on U.S. Government Chemical Databases](https://reader034.vdocument.in/reader034/viewer/2022051721/5a7a77c57f8b9a0d098dc039/html5/thumbnails/37.jpg)
How many names in PubChem?
49.0 million!
Provided more than once: 11.5 million {23.5% of 49.0M}
Unique chemical names: 4.65 million {40.9% of 11.5M}
![Page 38: PubChem: A significant resource for scientists · PDF filePubChem: A significant resource for scientists Evan Bolton, Ph.D. NCBI/NLM/NIH 5th Meeting on U.S. Government Chemical Databases](https://reader034.vdocument.in/reader034/viewer/2022051721/5a7a77c57f8b9a0d098dc039/html5/thumbnails/38.jpg)
Chemical name “consistency” filtering
Ensure name‐chemical associations are consistent at some level of structural “sameness”
– Same structure
– Same stereo isomer
  • varies by isotope
– Same parent structure
  • varies by charge/salt
– Same parent stereo isomer
  • varies by charge/salt/isotope
– Same connectivity
  • varies by isotope/stereo
– Same parent connectivity
  • varies by charge/salt/isotope/stereo
![Page 39: PubChem: A significant resource for scientists · PDF filePubChem: A significant resource for scientists Evan Bolton, Ph.D. NCBI/NLM/NIH 5th Meeting on U.S. Government Chemical Databases](https://reader034.vdocument.in/reader034/viewer/2022051721/5a7a77c57f8b9a0d098dc039/html5/thumbnails/39.jpg)
Can we match a name to one “chemical”?
• Yes! And often!
• One vote per depositor
  – First check that depositor is consistent
• But what consistency ratio?
  – 2 out of 3 is okay!
  – 3 out of 4 is okay!
  – 3 out of 5 is okay!
60%
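The voting rule can be sketched in a few lines of Python. This is an illustration under stated assumptions, not PubChem's implementation: the record layout (depositor, structure id), both function names, and the choice to drop a depositor who maps the name to more than one structure are all assumptions made for this example. Only the one-vote-per-depositor idea and the 60% threshold come from the slide.

```python
from collections import Counter


def one_vote_per_depositor(records):
    """Collapse (depositor, structure_id) records to one vote per depositor."""
    by_depositor = {}
    for depositor, structure in records:
        by_depositor.setdefault(depositor, set()).add(structure)
    # Assumption: a depositor that is not self-consistent casts no vote.
    return {d: next(iter(s)) for d, s in by_depositor.items() if len(s) == 1}


def consensus(votes, min_percent=60):
    """Return the structure that reaches the threshold, or None if none does."""
    if not votes:
        return None
    winner, count = Counter(votes.values()).most_common(1)[0]
    # Integer comparison avoids floating-point edge cases at exactly 60%.
    if count * 100 >= min_percent * len(votes):
        return winner
    return None


# Hypothetical records for one chemical name: five depositors, three agree.
records = [("A", 1), ("B", 1), ("C", 1), ("D", 2), ("E", 3)]
result = consensus(one_vote_per_depositor(records))
```

Here three of five depositors agree, which meets the 60% threshold exactly, so the shared structure is returned; a 1-to-1 split between two depositors would return `None`.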
![Page 40: PubChem: A significant resource for scientists · PDF filePubChem: A significant resource for scientists Evan Bolton, Ph.D. NCBI/NLM/NIH 5th Meeting on U.S. Government Chemical Databases](https://reader034.vdocument.in/reader034/viewer/2022051721/5a7a77c57f8b9a0d098dc039/html5/thumbnails/40.jpg)
Effect of filtering on chemical names
• 4.65M unique chemical names
• Assign synonym to a single “CID”
  Sliding quality scale
  – 4.61M (99.1%) names with “consistent” structure
Observation: Very few cases where inconsistency is found!
![Page 41: PubChem: A significant resource for scientists · PDF filePubChem: A significant resource for scientists Evan Bolton, Ph.D. NCBI/NLM/NIH 5th Meeting on U.S. Government Chemical Databases](https://reader034.vdocument.in/reader034/viewer/2022051721/5a7a77c57f8b9a0d098dc039/html5/thumbnails/41.jpg)
Depositors agree… but stereo an issue
| Level | Names (one vote, 60%) | % of total |
|---|---|---|
| CID | 3,671,623 | 79.7% |
| STE | 4,591 | 0.1% |
| PCID | 40,209 | 0.9% |
| PSTE | 6 | 0.0% |
| CON | 887,314 | 19.3% |
| PCON | 4,643 | 0.1% |

• CID – same exact structure
  – no variation
• STE – same structure stereo form
  – variable isotopic form
• CON – same structure connectivity
  – variable stereo/isotopic form
• PCID – same exact parent structure
  – variable salt/charge state form
• PSTE – same parent structure stereo form
  – variable salt/charge state/isotopic form
• PCON – same parent structure connectivity
  – variable salt/charge state/isotopic/stereo form
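The percentages in the table can be reproduced from the counts, which also shows where the earlier 4.61M figure comes from. The snippet is a quick arithmetic check using only numbers from the slides; the variable names are illustrative, and 4.65M is the rounded total of unique names quoted earlier.

```python
# Names assigned at each "sameness" level (counts taken from the table).
counts = {
    "CID": 3_671_623,
    "STE": 4_591,
    "PCID": 40_209,
    "PSTE": 6,
    "CON": 887_314,
    "PCON": 4_643,
}

total = sum(counts.values())  # about 4.61 million consistent names
percent = {level: round(100 * n / total, 1) for level, n in counts.items()}

# Share of the (rounded) 4.65M unique names that were assigned consistently.
consistent_share = round(100 * total / 4_650_000, 1)
```

The two largest rows, CID and CON, together cover about 99% of the assigned names, which is the basis for the slide's point that depositors agree on connectivity but often differ on stereochemistry.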
![Page 43: PubChem: A significant resource for scientists · PDF filePubChem: A significant resource for scientists Evan Bolton, Ph.D. NCBI/NLM/NIH 5th Meeting on U.S. Government Chemical Databases](https://reader034.vdocument.in/reader034/viewer/2022051721/5a7a77c57f8b9a0d098dc039/html5/thumbnails/43.jpg)
Compound Summary
![Page 44: PubChem: A significant resource for scientists · PDF filePubChem: A significant resource for scientists Evan Bolton, Ph.D. NCBI/NLM/NIH 5th Meeting on U.S. Government Chemical Databases](https://reader034.vdocument.in/reader034/viewer/2022051721/5a7a77c57f8b9a0d098dc039/html5/thumbnails/44.jpg)
BioMedical Annotation
![Page 45: PubChem: A significant resource for scientists · PDF filePubChem: A significant resource for scientists Evan Bolton, Ph.D. NCBI/NLM/NIH 5th Meeting on U.S. Government Chemical Databases](https://reader034.vdocument.in/reader034/viewer/2022051721/5a7a77c57f8b9a0d098dc039/html5/thumbnails/45.jpg)
BioMedical Annotation
![Page 46: PubChem: A significant resource for scientists · PDF filePubChem: A significant resource for scientists Evan Bolton, Ph.D. NCBI/NLM/NIH 5th Meeting on U.S. Government Chemical Databases](https://reader034.vdocument.in/reader034/viewer/2022051721/5a7a77c57f8b9a0d098dc039/html5/thumbnails/46.jpg)
Safety and Toxicology … Literature
![Page 47: PubChem: A significant resource for scientists · PDF filePubChem: A significant resource for scientists Evan Bolton, Ph.D. NCBI/NLM/NIH 5th Meeting on U.S. Government Chemical Databases](https://reader034.vdocument.in/reader034/viewer/2022051721/5a7a77c57f8b9a0d098dc039/html5/thumbnails/47.jpg)
Biological Assay Results
![Page 48: PubChem: A significant resource for scientists · PDF filePubChem: A significant resource for scientists Evan Bolton, Ph.D. NCBI/NLM/NIH 5th Meeting on U.S. Government Chemical Databases](https://reader034.vdocument.in/reader034/viewer/2022051721/5a7a77c57f8b9a0d098dc039/html5/thumbnails/48.jpg)
Pathway and Protein Information
![Page 49: PubChem: A significant resource for scientists · PDF filePubChem: A significant resource for scientists Evan Bolton, Ph.D. NCBI/NLM/NIH 5th Meeting on U.S. Government Chemical Databases](https://reader034.vdocument.in/reader034/viewer/2022051721/5a7a77c57f8b9a0d098dc039/html5/thumbnails/49.jpg)
Synonyms and Computed Properties
![Page 50: PubChem: A significant resource for scientists · PDF filePubChem: A significant resource for scientists Evan Bolton, Ph.D. NCBI/NLM/NIH 5th Meeting on U.S. Government Chemical Databases](https://reader034.vdocument.in/reader034/viewer/2022051721/5a7a77c57f8b9a0d098dc039/html5/thumbnails/50.jpg)
Compound and Substance Information
![Page 51: PubChem: A significant resource for scientists · PDF filePubChem: A significant resource for scientists Evan Bolton, Ph.D. NCBI/NLM/NIH 5th Meeting on U.S. Government Chemical Databases](https://reader034.vdocument.in/reader034/viewer/2022051721/5a7a77c57f8b9a0d098dc039/html5/thumbnails/51.jpg)
Streamlined access to depositor websites
![Page 52: PubChem: A significant resource for scientists · PDF filePubChem: A significant resource for scientists Evan Bolton, Ph.D. NCBI/NLM/NIH 5th Meeting on U.S. Government Chemical Databases](https://reader034.vdocument.in/reader034/viewer/2022051721/5a7a77c57f8b9a0d098dc039/html5/thumbnails/52.jpg)
Streamlined access to depositor websites
![Page 53: PubChem: A significant resource for scientists · PDF filePubChem: A significant resource for scientists Evan Bolton, Ph.D. NCBI/NLM/NIH 5th Meeting on U.S. Government Chemical Databases](https://reader034.vdocument.in/reader034/viewer/2022051721/5a7a77c57f8b9a0d098dc039/html5/thumbnails/53.jpg)
Entrez interface…
• Primary (text‐based) search engine
![Page 54: PubChem: A significant resource for scientists · PDF filePubChem: A significant resource for scientists Evan Bolton, Ph.D. NCBI/NLM/NIH 5th Meeting on U.S. Government Chemical Databases](https://reader034.vdocument.in/reader034/viewer/2022051721/5a7a77c57f8b9a0d098dc039/html5/thumbnails/54.jpg)
BioActivity Analysis Tool
![Page 55: PubChem: A significant resource for scientists · PDF filePubChem: A significant resource for scientists Evan Bolton, Ph.D. NCBI/NLM/NIH 5th Meeting on U.S. Government Chemical Databases](https://reader034.vdocument.in/reader034/viewer/2022051721/5a7a77c57f8b9a0d098dc039/html5/thumbnails/55.jpg)
Download Facility
![Page 56: PubChem: A significant resource for scientists · PDF filePubChem: A significant resource for scientists Evan Bolton, Ph.D. NCBI/NLM/NIH 5th Meeting on U.S. Government Chemical Databases](https://reader034.vdocument.in/reader034/viewer/2022051721/5a7a77c57f8b9a0d098dc039/html5/thumbnails/56.jpg)
Structure Clustering Tool
![Page 57: PubChem: A significant resource for scientists · PDF filePubChem: A significant resource for scientists Evan Bolton, Ph.D. NCBI/NLM/NIH 5th Meeting on U.S. Government Chemical Databases](https://reader034.vdocument.in/reader034/viewer/2022051721/5a7a77c57f8b9a0d098dc039/html5/thumbnails/57.jpg)
Structure Clustering Tool
![Page 58: PubChem: A significant resource for scientists · PDF filePubChem: A significant resource for scientists Evan Bolton, Ph.D. NCBI/NLM/NIH 5th Meeting on U.S. Government Chemical Databases](https://reader034.vdocument.in/reader034/viewer/2022051721/5a7a77c57f8b9a0d098dc039/html5/thumbnails/58.jpg)
Chemical structure search
• Structure query interface
  – One tab for each query type
![Page 59: PubChem: A significant resource for scientists · PDF filePubChem: A significant resource for scientists Evan Bolton, Ph.D. NCBI/NLM/NIH 5th Meeting on U.S. Government Chemical Databases](https://reader034.vdocument.in/reader034/viewer/2022051721/5a7a77c57f8b9a0d098dc039/html5/thumbnails/59.jpg)
Chemical structure sketcher
• Ability to dynamically enter complex structural queries without a plugin
Ihlenfeldt WD, Bolton EE, Bryant SH. The PubChem chemical structure sketcher. J Cheminform. 2009 Dec 17;1(1):20. [PMID: 20298522]
![Page 60: PubChem: A significant resource for scientists · PDF filePubChem: A significant resource for scientists Evan Bolton, Ph.D. NCBI/NLM/NIH 5th Meeting on U.S. Government Chemical Databases](https://reader034.vdocument.in/reader034/viewer/2022051721/5a7a77c57f8b9a0d098dc039/html5/thumbnails/60.jpg)
Score Matrix Service
• Pair‐wise scores in matrix format
  – Similarity scores between compounds
• Allows users to obtain PubChem scores for arbitrary CID lists
• Enables further (external) analysis
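To make “pair‐wise scores in matrix format” concrete, here is a small sketch that computes a Tanimoto similarity matrix for a list of compounds. It is an illustration only: the slide does not say which score types the service returns, so the use of Tanimoto on toy fingerprints (sets of on-bit positions), the placeholder CIDs, and the function names are all assumptions.

```python
def tanimoto(a, b):
    """Tanimoto coefficient of two fingerprints given as sets of on-bit positions."""
    union = len(a | b)
    return len(a & b) / union if union else 1.0


def score_matrix(fingerprints):
    """Return (ordered ids, square matrix of pairwise scores)."""
    ids = sorted(fingerprints)
    matrix = [
        [tanimoto(fingerprints[row], fingerprints[col]) for col in ids]
        for row in ids
    ]
    return ids, matrix


# Toy fingerprints keyed by placeholder CIDs (not real PubChem data).
fps = {1: {1, 2, 3, 4}, 2: {2, 3, 4, 5}, 3: {7, 8}}
cids, matrix = score_matrix(fps)
```

The resulting matrix is symmetric with ones on the diagonal, so it can be handed directly to external clustering or visualization code.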
![Page 61: PubChem: A significant resource for scientists · PDF filePubChem: A significant resource for scientists Evan Bolton, Ph.D. NCBI/NLM/NIH 5th Meeting on U.S. Government Chemical Databases](https://reader034.vdocument.in/reader034/viewer/2022051721/5a7a77c57f8b9a0d098dc039/html5/thumbnails/61.jpg)
PubChem data submission
How do users put data into PubChem?
http://pubchem.ncbi.nlm.nih.gov/deposit
![Page 62: PubChem: A significant resource for scientists · PDF filePubChem: A significant resource for scientists Evan Bolton, Ph.D. NCBI/NLM/NIH 5th Meeting on U.S. Government Chemical Databases](https://reader034.vdocument.in/reader034/viewer/2022051721/5a7a77c57f8b9a0d098dc039/html5/thumbnails/62.jpg)
Standardization Service
• Performs PubChem chemical structure “standardization”
  – Provides CID if structure is in PubChem
• Allows users to examine how PubChem methodology affects their data
![Page 65: PubChem: A significant resource for scientists · PDF filePubChem: A significant resource for scientists Evan Bolton, Ph.D. NCBI/NLM/NIH 5th Meeting on U.S. Government Chemical Databases](https://reader034.vdocument.in/reader034/viewer/2022051721/5a7a77c57f8b9a0d098dc039/html5/thumbnails/65.jpg)
Power User Gateway (PUG)
• Programmatic interface to many PubChem services
• Allows scripted access to PubChem
• Enables one to save a query/view
• SOAP interface
  – Accessible by Pipeline Pilot, Taverna, Java, PERL, Python, VB.net, C#.net, etc.
![Page 66: PubChem: A significant resource for scientists · PDF filePubChem: A significant resource for scientists Evan Bolton, Ph.D. NCBI/NLM/NIH 5th Meeting on U.S. Government Chemical Databases](https://reader034.vdocument.in/reader034/viewer/2022051721/5a7a77c57f8b9a0d098dc039/html5/thumbnails/66.jpg)
InChI Compound‐based Lookup
![Page 67: PubChem: A significant resource for scientists · PDF filePubChem: A significant resource for scientists Evan Bolton, Ph.D. NCBI/NLM/NIH 5th Meeting on U.S. Government Chemical Databases](https://reader034.vdocument.in/reader034/viewer/2022051721/5a7a77c57f8b9a0d098dc039/html5/thumbnails/67.jpg)
InChIKey Compound‐based Lookup
![Page 68: PubChem: A significant resource for scientists · PDF filePubChem: A significant resource for scientists Evan Bolton, Ph.D. NCBI/NLM/NIH 5th Meeting on U.S. Government Chemical Databases](https://reader034.vdocument.in/reader034/viewer/2022051721/5a7a77c57f8b9a0d098dc039/html5/thumbnails/68.jpg)
Integrated 3‐D Layer
![Page 69: PubChem: A significant resource for scientists · PDF filePubChem: A significant resource for scientists Evan Bolton, Ph.D. NCBI/NLM/NIH 5th Meeting on U.S. Government Chemical Databases](https://reader034.vdocument.in/reader034/viewer/2022051721/5a7a77c57f8b9a0d098dc039/html5/thumbnails/69.jpg)
Integrated 3‐D Layer
![Page 70: PubChem: A significant resource for scientists · PDF filePubChem: A significant resource for scientists Evan Bolton, Ph.D. NCBI/NLM/NIH 5th Meeting on U.S. Government Chemical Databases](https://reader034.vdocument.in/reader034/viewer/2022051721/5a7a77c57f8b9a0d098dc039/html5/thumbnails/70.jpg)
Integrated 3‐D Layer
![Page 71: PubChem: A significant resource for scientists · PDF filePubChem: A significant resource for scientists Evan Bolton, Ph.D. NCBI/NLM/NIH 5th Meeting on U.S. Government Chemical Databases](https://reader034.vdocument.in/reader034/viewer/2022051721/5a7a77c57f8b9a0d098dc039/html5/thumbnails/71.jpg)
Integrated 3‐D Layer
![Page 72: PubChem: A significant resource for scientists · PDF filePubChem: A significant resource for scientists Evan Bolton, Ph.D. NCBI/NLM/NIH 5th Meeting on U.S. Government Chemical Databases](https://reader034.vdocument.in/reader034/viewer/2022051721/5a7a77c57f8b9a0d098dc039/html5/thumbnails/72.jpg)
What is a “Similar Conformer”?
• When two conformers have …
  … similar shapes (ST >= 0.80)
  … similar features (CT >= 0.50)
  … BUT only shape optimized
Similarity scores: Shape = 92%; Feature = 54%
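The two thresholds amount to a simple predicate. The sketch below encodes only the ST and CT cutoffs from the slide; the function and constant names are made up for this example, and computing the shape and feature scores themselves (after a shape-optimized overlay) is outside its scope.

```python
SHAPE_MIN = 0.80    # ST threshold from the slide
FEATURE_MIN = 0.50  # CT threshold from the slide


def is_similar_conformer(shape_score, feature_score):
    """True when both the shape and the feature score meet their thresholds."""
    return shape_score >= SHAPE_MIN and feature_score >= FEATURE_MIN


# The pair shown on the slide: Shape = 92%, Feature = 54%.
slide_example = is_similar_conformer(0.92, 0.54)
```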
![Page 73: PubChem: A significant resource for scientists · PDF filePubChem: A significant resource for scientists Evan Bolton, Ph.D. NCBI/NLM/NIH 5th Meeting on U.S. Government Chemical Databases](https://reader034.vdocument.in/reader034/viewer/2022051721/5a7a77c57f8b9a0d098dc039/html5/thumbnails/73.jpg)
PubChem Publications…
![Page 74: PubChem: A significant resource for scientists · PDF filePubChem: A significant resource for scientists Evan Bolton, Ph.D. NCBI/NLM/NIH 5th Meeting on U.S. Government Chemical Databases](https://reader034.vdocument.in/reader034/viewer/2022051721/5a7a77c57f8b9a0d098dc039/html5/thumbnails/74.jpg)
PubChem Publications…
![Page 75: PubChem: A significant resource for scientists · PDF filePubChem: A significant resource for scientists Evan Bolton, Ph.D. NCBI/NLM/NIH 5th Meeting on U.S. Government Chemical Databases](https://reader034.vdocument.in/reader034/viewer/2022051721/5a7a77c57f8b9a0d098dc039/html5/thumbnails/75.jpg)
PubChem3D Thematic Series
http://www.jcheminf.com/series/PubChem3D
![Page 76: PubChem: A significant resource for scientists · PDF filePubChem: A significant resource for scientists Evan Bolton, Ph.D. NCBI/NLM/NIH 5th Meeting on U.S. Government Chemical Databases](https://reader034.vdocument.in/reader034/viewer/2022051721/5a7a77c57f8b9a0d098dc039/html5/thumbnails/76.jpg)
Summary
• PubChem is a chemical biology resource
  – open and public to all
  – continues to grow rapidly
  – many tools to get at the information you need
  – uses Google‐like approach of likely answers first
• Fundamental problems exist in chemical information exchange
  – stereo corruption a major issue
  – chemical name filtering helps remove noise
![Page 77: PubChem: A significant resource for scientists · PDF filePubChem: A significant resource for scientists Evan Bolton, Ph.D. NCBI/NLM/NIH 5th Meeting on U.S. Government Chemical Databases](https://reader034.vdocument.in/reader034/viewer/2022051721/5a7a77c57f8b9a0d098dc039/html5/thumbnails/77.jpg)
PubChem Crew …
Steve Bryant
Jie Chen
Tiejun Chen
Lewis Geer
Asta Gindulyte
Volker Hahnke
Lianyi Han
Jane He
Siqian He
Kenneth Karapetian
Sunghwan Kim
Qingliang Li
Ben Shoemaker
Tugba Suzek
Paul Thiessen
Jiyao Wang
Yanli Wang
Jewen Xiao
Bo Yu
Jian Zhang
Jun Zhang