Chemical and pharmaceutical databases and their use
The open Web offers a rich collection of diverse chemical data sources.
Chemistry Databases on the Web
CCOHS: Web Information Services - Offers subscriptions to chemical
databases as well as free access to Chemindex.
CHEMnetBASE - Online access to major chemical reference works from
Chapman and Hall/CRC including Combined Chemical Dictionary (CCD), The
Handbook of Chemistry and Physics, Polymers: A Property Database, Properties of
Organic Compounds.
CML Reference Collection - Commercial database of more than 120,000
3D molecular structures expressed in CML (Chemical Markup Language) format.
Chem Sources - Supplier information for approximately 200000
chemical compounds. Searching for compounds and supplier information is fee
based.
ChemBank - Initiative for Chemical Genetics. Freely available collection
of data about small molecules (over 2000 compounds) and resources for studying
their properties, especially their effects on biology.
ChemExpo - News, stock reports, products, directories, classifieds,
focus reports, and profiles of commercially available chemical products.
ChemIDplus - A free database of 350000 chemical compounds. Records
consist of name, synonyms, CAS number, molecular formula, and direct links to
biomedical resources. Also includes over 100000 structures. Searchable by
substructure with the free Chime plugin.
ChemSpider - Free chemistry search engine which aggregates and
indexes chemical structures and their associated information into a single searchable
repository.
Chemical Abstracts Service (CAS) - Provides pathways to published
research in the world's journal and patent literature through tools such as SciFinder,
SciFinder Scholar and STN. CAS databases provide access to over 25 million
documents, 26 million chemical substances, and 56 million sequences.
DDBST - Dortmund Data Bank and DDB Software Package - Database
of thermodynamic and transport properties of pure components and mixtures for use
in process design and development and related software package.
DETHERM Thermophysical Properties - Thermophysical properties of
pure substances and mixtures. Phase equilibrium data, caloric data, critical data,
transport properties, electrolyte data. Data sets with descriptors and bibliographic
references.
EaSI-Pro - On-line, fee-based database of dangerous substances and
(inter)national legislation on 200,000 chemicals.
FTIRsearch.com - Search a database of 87,000 FTIR and Raman spectra
using text or full spectral comparison. JAVA required.
Hazardous Chemical Database - Contains names, formulas and registry
numbers for 1991 hazardous chemicals. This site was last updated in 1996.
IRSLDB-Infrared and Raman Spectroscopy Literature Database -
Bibliography of infrared and Raman spectroscopy and its applications. Provides
reference data (title, author(s), journal, page, date info.) collected from 123 journals,
grouped into 129 classes.
IUPAC-NIST Solubility Data Series Database - Comprises 11 volumes
from the IUPAC Solubility Data Series on liquids with liquids. A limited number of
multi-component (organic-water-salt) systems are also included.
Indiana University Molecular Structure Center - Small molecule
database of structures that have primarily been determined using single crystal X-ray
diffraction. Features JAVA applets that can be used to orient and examine the
molecule, and a dedicated Beowulf computer system to generate high quality
rendered images. Currently contains over 2,500 molecules.
JAICI - JAICI is a not-for-profit scientific information service
organization featuring, in particular, chemical expertise.
LiqCryst - Online - Information about approximately 75000 compounds
with liquid crystalline properties. Compound search is free, the compound data are
fee based.
MALDI Database from NIST Polymers Division - Database of methods,
taken largely from the peer-review scientific literature, for matrix-assisted laser
desorption ionization (MALDI) mass spectrometry of synthetic polymers.
NIH Chemical Information - Database of substances ranging from drugs
to hazardous chemicals, maintained by the US National Library of Medicine,
Bethesda, Maryland.
NIST Chemistry WebBook - Database of organic chemistry compounds,
organized by species. Contains chemical and physical property data on over 30,000
compounds.
Nanogen Index 2 - Offers a book containing data on over 3000 pesticides
and other environmental contaminants.
Organic Reactions on Wiley InterScience - Comprehensive database of
important synthetic reactions, with all published examples of the topic reactions.
Organic Syntheses - Free electronic version of printed Organic Syntheses
series - detailed reliable experimental methods for the synthesis of organic
compounds.
Organic Syntheses - Describes checked and edited experimental
procedures, spanning a broad range of synthetic methodologies. CrossRef(R) and
Chemport(R) enabled reference linking.
Organic Synthesis series - All annual volumes of Organic Syntheses in
ISIS/Base format. Checked and edited experimental synthetic procedures.
PubChem - Free database of chemical structures of small organic
molecules and information on their biological activities. Linked with NIH
PubMed/Entrez information retrieval system.
Public Chromatography Applications Database - Contains HPLC and
GC applications provided by major vendors of chromatography equipment. Can be
searched by numerous chromatographic parameters, as well as by structure,
substructure, and structural similarity.
Query Chem - Chemical search engine powered by the Google Web
API.
RADEN Data Bank - Designed to compile experimental research and
calculations of the RADiative and ENergy parameters that define the
distribution of intensity in electronic and vibrational spectra of diatomic molecules.
Reciprocal Net - Distributed database used by research crystallographers
to store information about 3D molecular structures.
SDBS, Spectral Database for Organic Compounds - Integrated spectral
database system for organic compounds, which includes the following spectra -
Electron Impact Mass (EI-MS), Fourier Transform Infrared (FT-IR), 1H and 13C
NMR, Raman and Electron Spin Resonance (ESR).
SOLV-DB - A free database of commercially available solvents
searchable by many properties. Compiled by National Center for Manufacturing
Sciences.
SPRESIweb - InfoChem GmbH - The new SpresiWeb developed by
InfoChem is an integrated scientific database containing over 4.5 million molecules,
3.5 million reactions, 380,000 references, 98,000 patents covering years 1974 - 1999.
STM Data - Data collections from Science, Technology and Medicine
(STM): AntiBase (natural compound identifier), AmicBase (antimicrobial activity
verifier), Mass Spectra (MS) of Pharmaceuticals and Agrochemicals, Infrared Spectra
(IR) of Polymers.
Spectra databases for analytical laboratories - ATR, FT IR, and Raman
Spectra of chemicals and other compounds. Includes polymers, solvents, forensics,
etc.
Structure Database of Chemicals with Pharmaceutical Activity -
Animations require the Chime plug-in to view. From Oxford University, UK.
Synthetic Pages - Free interactive database of practical and reliable
organic, organometallic and inorganic chemical syntheses submitted by synthetic
chemists.
The ChemExper Chemicals Directory - A catalog of 60000 fine
chemicals searchable for free using a chemist-friendly search engine.
Thermochemistry Resource - Free database containing thermochemistry
data for over 400 compounds relevant to high-temperature processes used in
materials synthesis. Searching by molecular formula.
Thermodata - Databases and software for thermochemistry.
WebReactions - Organic Reaction Retrieval System - Free reaction
search system offering direct retrieval of reaction precedents. Search based on
reaction types and bonding change.
Wiley Database of Polymer Properties - Physical property data for
polymers commercially available, with experimentally determined and selected data
for over 2,500 polymers. Derived from the Polymer Handbook.
PubChem “… organized as three linked databases …
PubChem Substance, PubChem Compound, and PubChem BioAssay.”
PubChem is a database of chemical molecules and their activities against
biological assays. The system is maintained by the National Center for Biotechnology
Information (NCBI), a component of the National Library of Medicine, which is part
of the United States National Institutes of Health (NIH). PubChem can be accessed
for free through a web user interface. Millions of compound structures and
descriptive datasets can be freely downloaded via FTP. PubChem contains substance
descriptions and small molecules with fewer than 1000 atoms and 1000 bonds. The
American Chemical Society tried to get the U.S. Congress to restrict the operation of
PubChem, claiming that it competes with the society's Chemical Abstracts Service.
More than 80 database vendors contribute to the growing PubChem database.
PubChem consists of three dynamically growing primary databases. As of 7
January 2011:
Compounds, 31 million entries, contains pure and characterized
chemical compounds.
Substances, 75 million entries, also contains mixtures, extracts,
complexes and uncharacterized substances.
BioAssay, bioactivity results from 1644 high-throughput screening
programs with several million values.
Searching the databases is possible for a broad range of properties including
chemical structure, name fragments, chemical formula, molecular weight, XLogP,
and hydrogen bond donor and acceptor count.
PubChem contains its own online molecule editor with SMILES/SMARTS
and InChI support that allows the import and export of all common chemical file
formats to search for structures and fragments.
Each hit provides information about synonyms, chemical properties,
chemical structure including SMILES and InChI strings, bioactivity, and links to
structurally related compounds and other NCBI databases like PubMed.
In the text search form the database fields can be searched by adding the
field name in square brackets to the search term. A numeric range is represented by
two numbers separated by a colon. The search terms and field names are case-
insensitive. Parentheses and the logical operators AND, OR, and NOT can be used.
AND is assumed if no operator is used.
Example (Lipinski's Rule of Five): 0:500[mw] 0:5[hbdc] 0:10[hbac] -5:5[logp]
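The field syntax described above can be assembled programmatically. The following is a minimal sketch, not an official PubChem client: the helper names (format_number, range_term, build_query) are hypothetical, the field abbreviations are taken from the example above, and the code only builds the query string without contacting any server.

```python
from typing import Iterable, Tuple

# One search constraint: (field name, lower bound, upper bound).
Range = Tuple[str, float, float]


def format_number(value: float) -> str:
    """Render whole numbers without a decimal point (500.0 -> '500')."""
    return str(int(value)) if float(value).is_integer() else str(value)


def range_term(field: str, low: float, high: float) -> str:
    """Build one 'low:high[field]' term; field names are case-insensitive,
    so they are lower-cased here purely for consistency."""
    if low > high:
        raise ValueError(f"empty range for {field}: {low} > {high}")
    return f"{format_number(low)}:{format_number(high)}[{field.lower()}]"


def build_query(ranges: Iterable[Range], operator: str = "AND") -> str:
    """Join range terms into one text query.

    AND is assumed when no operator is written, so AND terms are
    separated by a space only; OR is written out explicitly.
    """
    op = operator.upper()
    if op not in ("AND", "OR"):
        raise ValueError("operator must be AND or OR")
    separator = " " if op == "AND" else f" {op} "
    return separator.join(range_term(f, lo, hi) for f, lo, hi in ranges)


# The Rule of Five ranges exactly as given in the example above.
LIPINSKI = [("mw", 0, 500), ("hbdc", 0, 5), ("hbac", 0, 10), ("logp", -5, 5)]

if __name__ == "__main__":
    print(build_query(LIPINSKI))
```

The resulting string can be pasted into the text search form; passing operator="OR" writes the operator explicitly between the terms.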
The American Chemical Society has raised concerns about the publicly
supported PubChem database, since it appears to compete directly with the society's
existing Chemical Abstracts Service. The society has a strong interest in the issue
because the Chemical Abstracts Service generates a large percentage of its revenue.
To advocate its position against the PubChem database, ACS has actively lobbied the
US Congress.
ZINC “… a free database of commercially-available compounds for virtual
screening. ZINC contains over 13 million purchasable compounds in ready-to-dock,
3D formats.”
The ZINC database is a curated collection of commercially available
chemical compounds prepared especially for virtual screening. ZINC is used by
investigators (generally people with training as biologists or chemists) in
pharmaceutical companies, biotech companies, and research universities.
ZINC is different from other chemical databases because it aims to represent
the biologically relevant, three dimensional form of the molecule.
ZINC is updated regularly and may be downloaded and used free of charge.
It is developed by John Irwin in the Shoichet Laboratory in the Department of
Pharmaceutical Chemistry at the University of California, San Francisco.
The latest release of the website interface is "ZINC 12" (2012). The database
contents are continuously updated. Static subsets are generated regularly and are
dated.
eMolecules “eMolecules discovers sources of chemical data by searching
the Internet, and receives submissions from data providers such as chemical suppliers
and academic research institutions.”
The typical eMolecules user is a research scientist, laboratory technician,
student, or procurement manager seeking small-molecule chemical compounds.
By area of specialization, typical users include:
Function: lab chemists, analytical chemists, medicinal chemists,
biochemists, and procurement managers.
Industry: pharmaceutical drug discovery, biotechnology, research
institutes, flavor and fragrance manufacturing, contract research organizations
(CROs), contract manufacturing organizations (CMOs), and legal (patent search).
Academic: university labs, professors, graduate students, librarian
reference.
ChEBI “Chemical Entities of Biological Interest (ChEBI) is a freely
available dictionary of molecular entities focused on ‘small’ chemical
compounds.”
Chemical Entities of Biological Interest, also known as ChEBI, is a
database and ontology of molecular entities focused on 'small' chemical compounds,
that is part of the Open Biomedical Ontologies effort. The term "molecular entity"
refers to any "constitutionally or isotopically distinct atom, molecule, ion, ion pair,
radical, radical ion, complex, conformer, etc., identifiable as a separately
distinguishable entity".[3]
The molecular entities in question are either products of
nature or synthetic products used to intervene in the processes of living organisms.
Molecules directly encoded by the genome, such as nucleic acids, proteins and
peptides derived from proteins by proteolytic cleavage, are not as a rule included in
ChEBI.
ChEBI uses nomenclature, symbolism and terminology endorsed by the
International Union of Pure and Applied Chemistry (IUPAC) and Nomenclature
Committee of the International Union of Biochemistry and Molecular Biology (NC-
IUBMB).
All data in the database is non-proprietary or is derived from a non-
proprietary source. It is thus freely accessible and available to anyone. In addition,
each data item is fully traceable and explicitly referenced to the original source.
The ChEBI data is available through a public web interface, Web Service
and downloads.
NIST Chemistry WebBook “…provides thermochemical, thermophysical,
and ion energetics data compiled by NIST under the Standard Reference Data
Program.”
Compendium of Common Pesticide Names “This Compendium is
believed to be the only place where all of the ISO-approved standard names of
chemical pesticides are listed. It also includes more than 300 approved names from
national and international bodies for pesticides that do not have ISO names.”
For purposes of trade, registration and legislation, and for use in popular and
scientific publications, pesticides need names that are short, distinctive, non-
proprietary and widely-accepted. Systematic chemical names are rarely short and are
not convenient for general use, and so standards bodies assign common names to the
active ingredients of pesticides.
More than 1100 of these official common names for pesticides have been
assigned by the International Organization for Standardization (ISO). This
Compendium is believed to be the only place where all of the ISO-approved standard
names of chemical pesticides are listed.
The Compendium contains much more than ISO common names, with
nomenclature data sheets for more than 1700 different active ingredients and for
more than 350 ester and salt derivatives, made accessible by a comprehensive set of
indexes and a classification.
NMRShiftDB “… a NMR database (web database) for organic structures
and their nuclear magnetic resonance (nmr) spectra. It allows for spectrum prediction
(13C, 1H and other nuclei) as well as for searching spectra, structures and other
properties. Last not least, it features peer-reviewed submission of datasets by its
users.”
DrugBank “… a unique bioinformatics and cheminformatics resource that
combines detailed drug (i.e. chemical, pharmacological and pharmaceutical) data
with comprehensive drug target (i.e. sequence, structure, and pathway) information.
The database contains 6707 drug entries including 1436 FDA-approved small
molecule drugs, 134 FDA-approved biotech (protein/peptide) drugs, 83 nutraceuticals
and 5086 experimental drugs.”
Literature
ChEBI / Wikipedia – The Free Encyclopedia:
http://en.wikipedia.org/wiki/ChEBI
De Matos, P.; Alcantara, R.; Dekker, A.; Ennis, M.; Hastings, J.; Haug, K.;
Spiteri, I.; Turner, S. et al. (2009). "Chemical Entities of Biological Interest: An
update". Nucleic Acids Research 38 (Database issue): D249–54.
doi:10.1093/nar/gkp886. PMC 2808869. PMID 19854951.
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2808869/
Degtyarenko, K.; De Matos, P.; Ennis, M.; Hastings, J.; Zbinden, M.;
McNaught, A.; Alcantara, R.; Darsow, M. et al. (2007). "ChEBI: A database and
ontology for chemical entities of biological interest". Nucleic Acids Research 36
(Database issue): D344–50. doi:10.1093/nar/gkm791. PMC 2238832.
PMID 17932057. http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2238832/
eMolecules: http://www.emolecules.com
Nic, M.; Jirat, J.; Kosata, B., eds. (2006–). "molecular entity". IUPAC
Compendium of Chemical Terminology (Online ed.). doi:10.1351/goldbook.M03986.
ISBN 0-9678550-9-8. http://goldbook.iupac.org/M03986.html
PubChem / Wikipedia – The Free Encyclopedia:
http://en.wikipedia.org/wiki/PubChem
Sixty-four free chemistry databases. – October 12th, 2011:
http://depth-first.com/articles/2011/10/12/sixty-four-free-chemistry-databases/
ZINC database / Wikipedia – The Free Encyclopedia:
http://en.wikipedia.org/wiki/ZINC_database