Commercial Vendors & Databases
Gary Wiggins
I571, Fall 2006

Factors in the Current Environment

• Interdisciplinary science
• Consolidation of the Scientific-Technical-Medical (STM) publishing world
• Different cultures in the chemistry publishing environment compared with that in biology
• Move to open access journals and data
• Influence of the Web

Size of the Chemical Literature: 2002 Estimate

• ~50 million chemical substances
• ~6 million reagents
• ~7 million published reactions
• ~16,000 protein crystal structures
• ~250,000 small molecule X-ray structures

--Robert Glen and Susan Aldridge (2002)

Size of the Chemical Literature: 2006

• ~88 million chemical substances
• ~? million reagents
• ~? million published reactions
• ~39,000 protein crystal structures
• ~367,000 small molecule X-ray structures
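The 2002 and 2006 estimates can be compared directly for the three categories that have figures in both years. A minimal sketch of the arithmetic, assuming the approximate (~) values can be treated as exact counts; `percent_growth` is an illustrative helper:

```python
# Approximate figures from the 2002 and 2006 slides; categories shown as "?"
# in 2006 are left out because there is nothing to compare.
estimates = {
    "chemical substances": (50_000_000, 88_000_000),
    "protein crystal structures": (16_000, 39_000),
    "small molecule X-ray structures": (250_000, 367_000),
}


def percent_growth(old: float, new: float) -> float:
    """Percentage increase from an old count to a new one."""
    return (new - old) / old * 100


for name, (count_2002, count_2006) in estimates.items():
    print(f"{name}: +{percent_growth(count_2002, count_2006):.0f}% (2002 to 2006)")
```

By this reckoning protein crystal structures more than doubled in four years, while the substance count rose by about three quarters.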

Vendors and Publishers

• Partnership between commercial vendors and abstracting/indexing services (and to some extent with journal publishers)
  – Most activity in online searching started in the early 1970s
  – Comparatively little change in the vendors’ search systems until relatively recently
• Aggregation of databases
• Cross-file searching
• Command-driven access

Vendors of Chemical Databases

• STN International (http://info.cas.org/stn.html)
  – SciFinder and SciFinder Scholar (http://www.cas.org/)
• Thomson Scientific (http://www.isinet.com) (ISI)
• Questel (http://www.questel.orbit.com/index.htm) (Orbit)
  – Merged Markush Service
• Thomson Dialog (http://www.dialog.com/)
• Elsevier Scopus (http://www.info.scopus.com/)
• Elsevier MDL (http://www.mdl.com/)
• US National Library of Medicine (http://www.nlm.nih.gov/)
• Ovid Technologies (http://www.ovid.com/)
• CSA (Cambridge Scientific Abstracts) (http://www.csa.com/)
• Chemical Information System (http://www.nisc.com/cis/qcis1.asp)
• Knovel (http://www.knovel.com/)
• Technical Database Services (http://www.tdsonline.com/)

STN International

• Partnership among Chemical Abstracts Service, FIZ Chemie, and the Japan Science and Technology Corporation

• Has over 200 STM databases
  – STN Database Summary Sheets: http://info.cas.org/ONLINE/DBSS/dbsslist.html
  – Includes some databases also available free through other venues (e.g., Medline, GenBank)

Features in Commercial Systems

• Concept of the Basic Index
  – Default field; in bibliographic databases often limited to keywords from titles, abstracts, and index terms
• Special Boolean operators (proximity, adjacency, etc.)
• Truncation (wild cards and left-hand or right-hand truncation)
• Controlled vocabulary tools (MeSH, CAS’s Index Guide, CA Lexicon)
• Classification of the documents
  – PACS (Physics and Astronomy Classification Scheme)
  – CA Sections/Subsections
• Structure searching (options usually range from exact-structure to full substructure search)
• Searchable numeric and other data
• Data analysis tools
• Current awareness options

Vocabulary Control

• “The entities we deal with, such as genes, sequences, and chemical data, and manipulate and analyze in the context of bioinformatics and biomedical research have not always been properly defined. There are no control vocabularies, no standards for much of the data, and no unified way to refer to them.”

--Pablo Tamayo, senior computational biologist and manager, cancer genomics informatics, MIT Broad Institute, quoted in Drug Discovery & Development August 2004, 7(8), 52.

Command Language Systems

• Allow field-directed searches
• Incorporate sophisticated Boolean relationships
  – AND, OR, NOT
  – Adjacency, Proximity, Logical linking to the same field or sub-field of a record
  – Numbers of intervening words can be specified
• User must learn the commands
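The proximity and adjacency operators described above can be modelled in a few lines. This is a simplified sketch, not any vendor's implementation: the function `near` is hypothetical, it treats a record as a flat list of words, and it only approximates the kind of ordered ("with", W-style) and unordered ("adjacent", A-style) operators assumed here for command languages such as STN's. Field and sub-field linking is not modelled.

```python
import re


def tokenize(text: str) -> list[str]:
    """Lower-case word tokens; punctuation is discarded."""
    return re.findall(r"[a-z0-9]+", text.lower())


def near(text: str, a: str, b: str, max_between: int = 0, ordered: bool = True) -> bool:
    """True if words a and b occur with at most max_between words between them.

    ordered=True approximates a W-style ("with") operator: a must come before b.
    ordered=False approximates an A-style ("adjacent") operator: either order.
    """
    words = tokenize(text)
    positions_a = [i for i, w in enumerate(words) if w == a.lower()]
    positions_b = [i for i, w in enumerate(words) if w == b.lower()]
    for i in positions_a:
        for j in positions_b:
            # A gap of 1 means adjacent; each intervening word adds 1
            gap = (j - i) if ordered else abs(j - i)
            if 1 <= gap <= max_between + 1:
                return True
    return False


title = "Transgenic cotton expressing a Bt toxin"
print(near(title, "transgenic", "cotton"))                 # True: adjacent, in order
print(near(title, "cotton", "transgenic"))                 # False: wrong order
print(near(title, "cotton", "transgenic", ordered=False))  # True: order ignored
print(near(title, "transgenic", "toxin", max_between=4))   # True: 4 words between
print(near(title, "transgenic", "toxin", max_between=3))   # False: too far apart
```

The `max_between` parameter plays the role of the "number of intervening words" that the slide says can be specified.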

User-Oriented Software

• Front-end systems to mask the command language
  – STN’s SciFinder (and SciFinder Scholar)
  – STN on the Web, STNEasy, STN Express
  – Elsevier MDL’s CrossFire Commander and DiscoveryGate
  – Questel’s QWeb and Imagination

Main Chemical Databases

• Chemical Abstracts
• Beilstein/Gmelin
• Cambridge Structural Database
• Protein Data Bank
• Many other relevant databases

CAS DBs: CA File

• CA File, a bibliographic database covering journal articles (from ~9500 journals), technical reports, conference proceedings, dissertations, patents, and other literature

• 1907 to the present (and some earlier); full indexing was added retrospectively for all records

• Linked through the Registry Number to compound data

• CAplus File includes CA File data plus e-journals, some preprints, and all articles from ~1500 key chemical journals within one week of receipt

Old References Recently Added to CA Database

The boiling-point curve for mixtures of ethyl alcohol and water. Noyes, William A.; Warfel, R. R. Rose Polytechnic Institute, Terre Haute, Journal of the American Chemical Society (1901), 23(7), 463-8. CODEN: JACSAT ISSN: 0002-7863. Journal written in English. CAN 0:1311 AN 1906:1311 CAPLUS (Copyright 2004 ACS on SciFinder (R))

Abstract: In the determination with small amounts of alcohol, the readings of the thermometer were taken when the vapors first entered the condenser, as after boiling for a few minutes a relatively large proportion of the alcohol present would be found in the upper layers and in the condenser. The thermometer under these conditions registered about 0.3 higher. An examination of the table and curve revealed that the minimum boiling point is for alcohol of 96% by weight. The curve was steeper on the side toward absolute alcohol. Alcohol of 90.7% had the same boiling point as absolute alcohol.

Relative Contributions of Literature Types to CA

Used with the permission of Chemical Abstracts Service (CAS), a division of the American Chemical Society, from: http://www.cas.org/casdb.html

Growth of Articles in CA

Year   Articles Abstracted
1907       7,994
1945      22,824
1960     104,484
1970     230,902
1980     407,342
1990     394,945
2000     573,469
2005     737,480

Source: http://www.cas.org/EO/casstats.pdf
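The CAS figures above imply steady compound growth, with one exception. A short sketch computing the compound annual growth rate for each interval in the table and for the whole period; `annual_growth_rate` is an illustrative helper, not a CAS statistic:

```python
# (year, articles abstracted) pairs from the CAS statistics table above
growth = [(1907, 7_994), (1945, 22_824), (1960, 104_484), (1970, 230_902),
          (1980, 407_342), (1990, 394_945), (2000, 573_469), (2005, 737_480)]


def annual_growth_rate(start: float, end: float, years: int) -> float:
    """Compound annual growth rate between two counts, as a fraction."""
    return (end / start) ** (1 / years) - 1


# Rate for each interval in the table
for (y0, n0), (y1, n1) in zip(growth, growth[1:]):
    print(f"{y0}-{y1}: {annual_growth_rate(n0, n1, y1 - y0):+.1%} per year")

(first_year, first_count), (last_year, last_count) = growth[0], growth[-1]
overall = annual_growth_rate(first_count, last_count, last_year - first_year)
print(f"{first_year}-{last_year} overall: {overall:.1%} per year")
```

The overall rate comes out at roughly 4.7% per year, and 1980-1990 is the only interval with a negative rate, matching the small drop from 407,342 to 394,945 in the table.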

Basic Index from the CA File

Field name: Basic Index: single words from title (TI), supplementary term (ST), index term (IT), and abstract field (AB), as well as CAS Registry Numbers
Examples:
  S 50-21-5
  S ?FLUOROCARBON?
  S (WATER(S)OIL)/BI
  S transgenic cotton
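The `?` in `S ?FLUOROCARBON?` is a truncation symbol on both ends of the term. A minimal sketch of how such a term could be matched against text, assuming `?` means "any number of word characters, including none"; only this unlimited truncation is modelled, and `truncation_to_regex` is a hypothetical helper:

```python
import re


def truncation_to_regex(term: str):
    """Compile a search term with '?' truncation into a regular expression.

    Assumption: '?' stands for any number of word characters (including
    none) at the start (left truncation) or end (right truncation) of a word.
    """
    pieces = [re.escape(p) for p in term.lower().split("?")]
    return re.compile(r"\b" + r"\w*".join(pieces) + r"\b")


pattern = truncation_to_regex("?FLUOROCARBON?")
text = "Perfluorocarbons and hydrofluorocarbons replaced chlorofluorocarbons; fluorine did not."
matches = pattern.findall(text.lower())
print(matches)  # ['perfluorocarbons', 'hydrofluorocarbons', 'chlorofluorocarbons']
```

Because the expansion stays within a word, right truncation alone (`fluorocarbon?`) would not match `perfluorocarbons`; that is what the left-hand `?` adds.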

Special Fields in the CA File

• In addition to the standard bibliographic citation data, the CA File has:
  – Controlled Terms (CT) or (IT)
  – Classification Codes (CC: the 80 section codes into which the content of the printed CA is divided: http://www.cas.org/PRINTED/sects.html)
  – Document type (DT)
  – Language Code (LA)
  – Role (RL)

CAS Roles

• Used in conjunction with chemical substance searches
• Super roles, e.g., ANST, BIOL, CMBI, FORM, OCCU, PREP, PROC, RACT, USES
• Over 60 more specific role descriptors, e.g., with PREP:
  – BMF Bioindustrial manufacture
  – BPN Biosynthetic preparation
  – BYP Byproduct
  – Combinatorial preparation
  – IMF Industrial manufacture
  – PNU Preparation, unclassified
  – PUR Purification or recovery
  – SPN Synthetic preparation
• Also two roles not up-posted to super roles: PRP (Properties) and MSC (Miscellaneous)
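Up-posting means that a search on a super role also retrieves postings indexed with any of its specific roles. A minimal sketch using only the PREP roles listed above; the record IDs and the `roles_matched` helper are hypothetical and do not reflect how CAS stores roles:

```python
# Specific PREP roles as listed on the slide (the combinatorial-preparation
# code is not given there, so it is left out).
SPECIFIC_TO_SUPER = {
    "BMF": "PREP",  # Bioindustrial manufacture
    "BPN": "PREP",  # Biosynthetic preparation
    "BYP": "PREP",  # Byproduct
    "IMF": "PREP",  # Industrial manufacture
    "PNU": "PREP",  # Preparation, unclassified
    "PUR": "PREP",  # Purification or recovery
    "SPN": "PREP",  # Synthetic preparation
}


def roles_matched(indexed_role: str) -> set[str]:
    """Roles under which a posting indexed with indexed_role can be retrieved.

    A specific role is up-posted to its super role; roles such as PRP and MSC
    are not, so they match only themselves.
    """
    matched = {indexed_role}
    if indexed_role in SPECIFIC_TO_SUPER:
        matched.add(SPECIFIC_TO_SUPER[indexed_role])
    return matched


# Hypothetical substance postings: record ID -> role assigned by the indexer
postings = {"rec1": "SPN", "rec2": "IMF", "rec3": "PRP"}
hits = [rec for rec, role in postings.items() if "PREP" in roles_matched(role)]
print(hits)  # ['rec1', 'rec2']
```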

CAS DBs: Registry File

• “Authority” file that lets indexers and searchers definitively identify a substance as new or find a previous entry

• Contains all types of chemical substances, including biomolecules

• Best file for chemical names
• Many physical properties being added
• Linked to CA and other files through the Registry Number (RN)

CAS Registry Number

• Serves as the accession number in the Registry File

• The RN has no inherent chemical meaning
  – Example: Isatin is 91-56-5
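Although the RN carries no chemical meaning, its last digit is a check digit, so typing errors can be caught before a search is run. Each digit before the check digit is multiplied by its position counted from the right, and the sum modulo 10 must equal the check digit. For isatin, 6×1 + 5×2 + 1×3 + 9×4 = 55, giving check digit 5. A minimal sketch; the 2-to-7-digit length of the first group is an assumption about the format:

```python
import re


def is_valid_cas_rn(rn: str) -> bool:
    """Check the format and the check digit of a CAS Registry Number.

    Assumed format: 2 to 7 digits, a hyphen, 2 digits, a hyphen, 1 check digit.
    """
    match = re.fullmatch(r"(\d{2,7})-(\d{2})-(\d)", rn)
    if match is None:
        return False
    body = match.group(1) + match.group(2)
    check_digit = int(match.group(3))
    # Weight each digit by its position counted from the right, starting at 1
    total = sum(pos * int(digit) for pos, digit in enumerate(reversed(body), start=1))
    return total % 10 == check_digit


for rn in ["91-56-5", "6713-27-5", "6402-36-4", "50-21-5", "906063-52-3", "91-56-6"]:
    print(rn, is_valid_cas_rn(rn))
```

All RNs cited in these slides pass; the mistyped 91-56-6 does not.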

Registry File Contents

• Includes synonyms, molecular formulas, alloy composition tables, classes for polymers, nucleic acid and protein sequences, ring analysis data, and structure diagrams

• Also: experimental and calculated property data from various sources as well as super roles and document type information from CAplus

Registry File Contents

• 87,711,955 substances have an RN in the Registry File as of 9/8/2006

• All substances in CAS files plus others
• Many physical constants now added to the records, most of them calculated
  – Lipinski Rule of Five values
  – BP, MP, Density, Optical Rotatory Power, Refractive Index
  – Data for 3D visualization
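The Lipinski Rule of Five values mentioned above are simple thresholds, which is one reason they can be calculated for millions of records. A minimal sketch using the usual cut-offs (molecular weight 500, log P 5, 5 H-bond donors, 10 H-bond acceptors). The descriptor values in the example are invented for illustration, and flagging a compound that breaks more than one rule is a common reading, not something stated on the slide:

```python
from dataclasses import dataclass


@dataclass
class Descriptors:
    molecular_weight: float  # daltons
    log_p: float             # octanol/water partition coefficient (log P)
    h_bond_donors: int       # count of OH and NH groups
    h_bond_acceptors: int    # count of N and O atoms


def rule_of_five_violations(d: Descriptors) -> list[str]:
    """Return the Lipinski criteria that a compound breaks."""
    checks = [
        ("molecular weight > 500", d.molecular_weight > 500),
        ("log P > 5", d.log_p > 5),
        ("H-bond donors > 5", d.h_bond_donors > 5),
        ("H-bond acceptors > 10", d.h_bond_acceptors > 10),
    ]
    return [label for label, broken in checks if broken]


# Hypothetical descriptor values, not taken from any Registry record
candidate = Descriptors(molecular_weight=612.7, log_p=5.8, h_bond_donors=2, h_bond_acceptors=9)
broken = rule_of_five_violations(candidate)
print(broken)                       # ['molecular weight > 500', 'log P > 5']
print("flagged:", len(broken) > 1)  # flagged: True
```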

Size of the Registry File

Date: Friday, 9/8/2006
Count: 29,884,228 organic and inorganic substances and 57,827,727 sequences (87,711,955 in total)
Most recent CAS Registry Number: 906063-52-3
Source: http://www.cas.org/cgi-bin/regreport.pl

CAS DBs: CASReact

• Derived from journal and patent documents from 1840 to date

• Contains both single-step and multistep reactions

• Structure searchable
• Contains yield data, reaction conditions, etc.

CAS Databases: Other

• CHEMCATS--information about commercially available chemicals and their worldwide suppliers

• CHEMLIST--contains chemical substances on national inventories

• MARPAT--more than 500,000 Markush structure records for patents found in the CA File with patent publication year 1988 to the present

• TOXCENTER--covers the pharmacological, biochemical, physiological, and toxicological effects of drugs and other chemicals

SciFinder and SciFinder Scholar

• Includes access to the CA, Registry, CHEMCATS, CHEMLIST files, plus Medline (1957-)

• Easy structure searching capabilities
• Integrated with ChemPort for easy access to the primary literature
• Download page for SFS:
  – http://www.libraries.iub.edu/index.php?pageId=2114

SciFinder Scholar Under the Hood

• A. Ben Wagner’s look at what really underlies the apparent simplicity of SFS searches

• http://ublib.buffalo.edu/libraries/e-resources/SciFinder/SciFinder200dpi.pdf

PubChem: A Threat to CAS?

• PubChem, part of the NIH Roadmap plan under the Molecular Libraries and Imaging Initiative

• Several million compounds already in the database

• To be linked to assay data from High Throughput Screening analyses

• http://pubchem.ncbi.nlm.nih.gov/

InChI: Another Threat?

• IUPAC-NIST Chemical Identifier
• A unique, non-proprietary label for chemical substances that could be used in printed and electronic data sources, making it easier to link diverse data compilations
• The latest version handles:
  – organic, covalent structures
  – inorganic and organometallic compounds

• http://chemdata.nist.gov/IChI/INChIv11b.zip
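An InChI is a single string built from "/"-separated layers: a version, the molecular formula, and then prefixed layers such as `c` (atom connections) and `h` (hydrogen atoms). A minimal sketch that splits a string into those layers; `inchi_layers` is a hypothetical helper that ignores details a full parser would need, such as layers that repeat a prefix. The example is ethanol written as a current standard InChI (`InChI=1S/`); strings from the 2006-era software began `InChI=1/`.

```python
def inchi_layers(inchi: str) -> dict[str, str]:
    """Split an InChI string into version, formula, and prefixed layers.

    Sketch only: a later layer with the same prefix letter overwrites an
    earlier one, which a full parser would have to handle.
    """
    if not inchi.startswith("InChI="):
        raise ValueError("not an InChI string")
    version, formula, *rest = inchi[len("InChI="):].split("/")
    layers = {"version": version, "formula": formula}
    for layer in rest:
        if layer:  # skip empty segments
            layers[layer[0]] = layer[1:]
    return layers


ethanol = inchi_layers("InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3")
print(ethanol)  # {'version': '1S', 'formula': 'C2H6O', 'c': '1-2-3', 'h': '3H,2H2,1H3'}
```

In the `c` layer, `1-2-3` says atom 1 (C) is bonded to atom 2 (C), which is bonded to atom 3 (O); the `h` layer assigns one H to atom 3, two to atom 2, and three to atom 1. This is what lets the connection table be checked and regenerated from the identifier.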

Beilstein Database

• Covers organic chemistry back to 1771
• Includes many physical properties
• Includes reaction information
• Structure searchable
• Available on the CrossFire Commander system (and direct from MDL via DiscoveryGate) for academic institutions

Gmelin Database

• Covers inorganic and organometallic chemistry back to 1771

• Includes many physical and chemical properties

• Not searchable for reactions
• Accessible through the CrossFire Commander system (and direct from MDL via DiscoveryGate) for academic institutions

MDL’s CrossFire Commander

• Download page for Commander at IU:
  – http://www.libraries.iub.edu/index.php?pageId=2114

DiscoveryGate for Academics

• CrossFire Beilstein
• CrossFire Gmelin
• MDL® Available Chemicals Directory
• MDL® Screening Compounds Directory
• MDL® Reference Library of Synthetic Methodology
• MDL® Solid-Phase Organic Reactions
• ORGSYN (Organic Syntheses) Database
• Encyclopedia of Reagents for Organic Synthesis
• Comprehensive Organic Functional Group Transformations
• Comprehensive Asymmetric Catalysis
• MDL® Comprehensive Medicinal Chemistry
• MDL® Drug Data Report
• MDL® Metabolite Database
• MDL® Toxicity Database
• ChemInform Reaction Library
• Current Synthetic Methodology
• Derwent Journal of Synthetic Methods
• National Cancer Institute Database
• http://www.mdl.com/solutions/solutions_for/academics/dg_academics.jsp

Reaction Databases

• CASReact
• SPRESI
  – http://www.spresi.com/
• Organic Syntheses
  – Free version: http://chemfinder.cambridgesoft.com/reactions/orgsyn.asp
• ISI’s Index Chemicus
• e-EROS (Encyclopedia of Reagents for Organic Synthesis)
• MDL’s Integrated Major Reference Works
  – Reactions indexed with InfoChem’s CLASSIFY Reaction Classification Code, based on the degree of specificity around the reacting center:
  – http://www.infochem.de/content/downloads/classify.pdf

Cross-Product Approaches

• MDL/InfoChem’s Integrated Major Reference Works
  – Thieme’s Science of Synthesis (successor to Houben–Weyl)
  – Springer’s Comprehensive Asymmetric Catalysis and Glycoscience
  – Elsevier Science’s Comprehensive Organic Functional Group Transformations
  – Wiley’s Encyclopedia of Reagents for Organic Synthesis
  – Links to primary journal literature

Physical Property Databases

• Beilstein & Gmelin
• CRC Handbook (CHEMnetBASE)
• Ei ChemVillage
• Knovel
  – Perry’s Chemical Engineers’ Handbook
  – Lange’s Handbook of Chemistry
• Landolt-Börnstein
• CAS Registry File

Spectral Databases

• Bio-Rad
• Aldrich
• NIST Chemical WebBook
• Some high-quality free databases on the Web, e.g.,
  – SDBS, Spectral Database for Organic Compounds: http://www.aist.go.jp/RIODB/SDBS/menu-e.html

SDBS IR Spectrum for Traumatic Acid

CCDC

Isatin on the CSD

Cambridge Structural Database

• Bibliographic, chemical, and crystallographic information for:
  – organic molecules
  – metal-organic compounds
• 3D structures have been determined using:
  – X-ray diffraction
  – neutron diffraction
• The CSD records results that include 3D atomic coordinate data for at least all non-H atoms

CSD components

• ConQuest: search and information retrieval

• Mercury: structure visualization
• Vista: numerical analysis
• PreQuest: database creation

Accessing the CSD at IUB

• Download the Citrix MetaFrame client at:
  – http://www.citrix.com/site/SS/downloads/downloads.asp?dID=2755
• Connect to IUB via VPN and link to:
  – http://www.libraries.iub.edu/scripts/countResources.php?resourceId=1399945
  – For IUPUI, ask Kelsey Forsythe

Other Structural Databases

• Protein Data Bank for polypeptides and polysaccharides having more than 24 units http://www.rcsb.org/pdb/

• Nucleic Acids Database for oligonucleotides http://ndbserver.rutgers.edu/

• Inorganic Crystal Structure Database http://www.fiz-informationsdienste.de/en/DB/icsd/

• CRYSTMET® for metals and alloys http://www.tothcanada.com/

Materials Chemistry Databases

• TDS specializes in chemical engineering data. Includes:
  – American Institute of Chemical Engineers’ DIPPR Pure Component Data
    • 29 fixed-value properties and 13 temperature-dependent properties for about 1600 industrial chemicals

Patent Databases

• Derwent World Patents Index
• USPATFULL
• PCTFULL (WIPO/PCT Patents Full Text)
• INPADOC (INternational PAtent DOcumentation Center)
• IFIPAT
• CA and CAplus
• MDL Patent Chemistry Database

Chemical Information System

• 34 environmental databases
  – Originally developed by the US National Institutes of Health and the Environmental Protection Agency
• Covers over 515,000 compounds
  – Toxicological and/or carcinogenic research data
  – Information on handling hazardous materials
  – Chemical/physical property information
  – Regulations
  – Safety and health effects information
  – Pharmaceutical data

Hybrid Links to the Web

• STN’s eScience
  – http://www.escience.org/
• Elsevier Science’s Scirus
  – http://www.scirus.com/srsapp/
• Elsevier Science’s Scopus (includes Scirus)
  – http://www.info.scopus.com
  – 15,000 titles going back to the mid-1960s
  – More than 500,000 records link to the Beilstein database on either CrossFire or DiscoveryGate
  – 250 million web sources

Traumatic Acid: SFS eScience

Electronic Journals

• Coverage in some cases back to the 17th century

• Most major publishers’ backfiles are now online

• CrossRef
• DOI
• SFX
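A DOI has a prefix beginning "10." that identifies the registrant and a suffix chosen by the registrant, separated by the first slash. A minimal sketch that splits a DOI and builds a link through the public doi.org resolver (older links used dx.doi.org); the helper names are hypothetical, and the example DOI is the Glen and Aldridge article from the bibliography:

```python
from urllib.parse import quote


def split_doi(doi: str) -> tuple[str, str]:
    """Split a DOI into its registrant prefix and item suffix at the first '/'."""
    prefix, sep, suffix = doi.partition("/")
    if not prefix.startswith("10.") or not sep or not suffix:
        raise ValueError(f"not a DOI: {doi!r}")
    return prefix, suffix


def resolver_url(doi: str) -> str:
    """Link to the DOI through the public doi.org resolver."""
    split_doi(doi)  # raises ValueError for malformed input
    # Percent-encode characters that are unsafe in a URL, keeping '/'
    return "https://doi.org/" + quote(doi, safe="/")


print(split_doi("10.1039/b207793k"))     # ('10.1039', 'b207793k')
print(resolver_url("10.1039/b207793k"))  # https://doi.org/10.1039/b207793k
```

Because the link goes through the resolver rather than a publisher URL, it keeps working if the article moves; link servers such as SFX build on the same identifier.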

Shift from Ownership to Licensing of Journals

• IUB Chemistry Library e-journals– http://www.indiana.edu/~libchem/ejournals.html

• Shift away from ownership
• Archival issues
  – Publisher archives (usually 2-3 locations)
  – LOCKSS and other proposals
  – Libraries often have no archival rights

Archival Issues

• “Given their transitory nature, are commercial and even society publishers the parties to which we want to entrust the task of keeping and preserving human knowledge?”

William W. Armstrong, Chemistry Librarian, Louisiana State University, C&EN 10 October 2005, 83(41), 53

Single Publisher Databases

• Elsevier’s ScienceDirect and their encyclopedia DBs
  – Scirus: http://www.scirus.com/srsapp/

• Wiley’s journal, book, and encyclopedia DBs: http://www3.interscience.wiley.com/

• American Chemical Society journals– http://pubs.acs.org/

CrossRef

• CrossRef Search http://www.crossref.org/crossrefsearch.html

• Pilot initiative running in 2004 in collaboration with Google

• Includes the content of 45 publishers (out of the 1488 CrossRef publishers and societies)

• Now covers approximately 6.5 million research articles

• Allows XML searching

Getting at the Data

• New CAS Information Use Policies– http://www.cas.org/infopolicy.html

• STN’s Information Keep & Share Program– http://info.cas.org/copyright/index.html

• SciFinder Scholar download restrictions: 100 items at a time

Data Analysis Tools

• STN’s Analyze and Tabulate feature
• STN Express with Discover! (Analysis Edition)
• STN AnaVist
  – http://www.cas.org/stnanavist/prices.html
  – Flat fee based on the size of the answer set analyzed (from $230 for up to 1,000 answers to $850 for up to 20,000)

• Limited access to records because of A&I publishers’ fear of data piracy
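Because the AnaVist fee is flat within each size band, the cost per analyzed answer falls sharply as answer sets grow. A sketch of the arithmetic for the two end points quoted above, assuming each answer set fills its band; the intermediate bands are not given on the slide and are not modelled:

```python
# End points of the flat-fee schedule quoted above (2006 prices, US dollars)
fee_by_max_answers = {1_000: 230, 20_000: 850}

per_answer = {size: fee / size for size, fee in fee_by_max_answers.items()}
for size, cost in per_answer.items():
    print(f"up to {size:,} answers: ${fee_by_max_answers[size]} flat, "
          f"${cost:.4f} per answer if the set is full")

ratio = per_answer[1_000] / per_answer[20_000]
print(f"per-answer cost at the smallest band is {ratio:.1f}x the largest")
```

At the top of each band, an answer costs $0.23 in the smallest set and about $0.04 in the largest, a ratio of roughly 5.4.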

Open Access

• Institute of Physics: most papers free for 30 days after publication
  – http://www.iop.org/EJ/ and http://www.iop.org/EJ/journal/NJP
• Public Library of Science
  – http://www.publiclibraryofscience.org
• Highwire Press
  – http://www.highwire.org/
• PubMed Central
  – http://www.pubmedcentral.nih.gov/

Budapest Open Access Initiative

• Based on:
  – Self-archiving by authors
  – Open Access journals, e.g., BioMed Central

• http://www.soros.org/openaccess/

Open Access + Semantic Web

• "Almost all of an author's output (compounds, spectra, reactions, properties, etc.) is nowadays computerised and in principle redistributable to the community for re-use. Few journals actively validate the primary data (e.g. spectra) involved in a publication (chemical crystallography being a clear exception where data are intensively reviewed by machine). We reassert that chemists must now move towards publishing their collective knowledge in a systematic and easily accessible form for re-use and innovation....

Open Access + Semantic Web

• We urge that authors, funders, editors, publishers and readers move further towards the following protocol:
  [1] All information should be ultimately machine-understandable in XML....
  [2] Machine-understandable information for a compound should include a connection table, the IUPAC unique identifier (InChI) which guarantees that the connection table can be checked and regenerated, and a name....
  [3] Rights metadata.”
-- Murray-Rust, Rzepa, Tyrrell, Zhang (2004)

Opposition to Open Access

• Reacting to NIH’s proposed policy on open access, C&EN Editor Rudy Baum says:

“[This] action will inflict long-term damage on the communication of scientific results and on maintenance of the archive of scientific knowledge.”

-- C&EN, September 20, 2004, p. 7

Recent Legislative Action

• In the US, Senators Cornyn and Lieberman introduced S. 2696 (109th Congress, 2nd session).
  – Federal Research Public Access Act of 2006
• In the UK, half of the councils in Research Councils UK and the Wellcome Trust have endorsed open access.
  – http://www.rcuk.ac.uk/access/index.asp
  – http://www.wellcome.ac.uk/doc_WTD002766.html

Free Services

• ChemFinder
  – http://chemfinder.cambridgesoft.com/
• ChemIDplus
  – http://chem.sis.nlm.nih.gov/chemidplus/
• Frederick/Bethesda Data and Online Services
  – http://cactus.nci.nih.gov/
• PubMed
  – http://www.ncbi.nlm.nih.gov/entrez/query.fcgi
• DOE’s STI Information Bridge
  – http://www.osti.gov/bridge/
• CICC guide to free chemistry databases
  – http://www.chembiogrid.org/related/resources/databases.html

Future

• XML and metadata
  – Dymond (DYnamic Metadata ON Demand)
• Virtual journals (Virtual Journal of Nanoscale Science and Technology)
• Copyright question and open access resolution
• Legal protection of databases
• Impact of InChI and CML
• Demise of Abstracting and Indexing Services?

Conclusion

• “The main challenge is for chemists to recognise the value of making their data machine-understandable, rather than destroying it with traditional paper or slide-focused publication and dissemination processes.” -- Murray-Rust, Rzepa, Tyrrell, Zhang (2004)

What is Citation Indexing?

• Starts from a known relevant document, regardless of when it was published, to find newer journal articles that have cited that document
• Assumption: authors who cite the document are presumably writing on a related topic
  – Citation indexing lets you find newer articles from an older reference
  – Also found in other tools, e.g., SciFinder Scholar and SCOPUS, but their citation indexing doesn’t go as far back as SCI’s
• Gets around the problems of doing a subject search when you aren’t sure which words to use
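A citation index is essentially the reference lists of many papers turned inside out: instead of asking "what does this paper cite?", it answers "who cites this paper?". A minimal sketch with hypothetical paper IDs; `forward_search` follows cited-by links for a chosen number of steps, which is how one older paper can lead to successively newer ones:

```python
from collections import defaultdict

# Hypothetical reference lists: each paper ID maps to the IDs it cites
references = {
    "P2004": ["P1990", "P2003"],
    "P2005": ["P2003"],
    "P2006": ["P2004", "P1990"],
}


def build_citation_index(refs: dict[str, list[str]]) -> dict[str, set[str]]:
    """Invert reference lists into a cited-by index (cited work -> citing papers)."""
    cited_by: dict[str, set[str]] = defaultdict(set)
    for citing, cited_list in refs.items():
        for cited in cited_list:
            cited_by[cited].add(citing)
    return cited_by


def forward_search(index: dict[str, set[str]], start: str, depth: int = 1) -> set[str]:
    """Newer papers reachable by following cited-by links up to depth steps."""
    found: set[str] = set()
    frontier = {start}
    for _ in range(depth):
        frontier = {p for f in frontier for p in index.get(f, ())} - found
        found |= frontier
    return found


index = build_citation_index(references)
print(sorted(index["P2003"]))                     # ['P2004', 'P2005']
print(sorted(forward_search(index, "P2003", 2)))  # ['P2004', 'P2005', 'P2006']
```

No subject words are needed at any step, which is the advantage the slide describes.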

Source Journal Coverage

• SCIE: 5700 titles
• SSCI: 1735 titles*
• A&HCI: 1145 titles*
  *also includes selected articles from SCIE
• Weekly updates
  – Lag time: 2-3 weeks
• Journal List: http://www.isinet.com/journals/

Web of Science Search Screen

Search Example: Cited Reference Searching

• Use the Full General Search and Cited Reference Search option
• Find publications that have cited the works of Donald E. Linn.
  – Dots before his name indicate that he is not the first listed author on the publication.
  – Links are to ISI source journals.
  – Unlinked items may be incorrect forms of the reference.

SCI Cited Ref Search for DE Linn

Lookup Results for DE Linn Search

DE Linn’s 2003 JACS Article

Newer Articles Citing the 2003 JACS Article by DE Linn et al.

Analysis of All Authors Citing DE Linn

Searches

• Isatin (RN 91-56-5)
• Moronic Acid (RN 6713-27-5)
• Traumatic Acid (RN 6402-36-4)
• Others:
  – http://www.chm.bris.ac.uk/sillymolecules/sillymols.htm

Beilstein Structure Search

[Structure diagram not reproduced]
R1 = O or S
R2 = H, OH, OMe, CH3, or CO2H
X = any halogen
? = any bond value
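The variable groups above let one generic query stand in for many exact-structure queries. A sketch that enumerates the R1, R2, and X combinations, assuming "any halogen" means F, Cl, Br, and I; the "any bond" (?) position multiplies the count further and is not enumerated here:

```python
from itertools import product

# Variable groups from the query above.
# Assumption: "any halogen" is read as F, Cl, Br, and I.
R1 = ["O", "S"]
R2 = ["H", "OH", "OMe", "CH3", "CO2H"]
X = ["F", "Cl", "Br", "I"]

# Every (R1, R2, X) substitution pattern covered by the single generic query;
# the "any bond" (?) position is not enumerated.
combinations = list(product(R1, R2, X))
print(len(combinations))  # 40
print(combinations[:3])   # [('O', 'H', 'F'), ('O', 'H', 'Cl'), ('O', 'H', 'Br')]
```

That is 2 × 5 × 4 = 40 substitution patterns retrieved by a single structure search.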

Bibliography

• Kaufman-Wills Group LLC. The Facts About Open Access. Association of Learned and Professional Societies, 2005. ISBN 0-907341-29-2

• Culp, F. Bartow. "Ten or so things that every chemistry librarian absolutely, positively has to have to keep from being an absolute plonk." Sci-Tech News, February 2004, 58(1), 9. Also published as: SLA Chemistry Division E-Newsletter Winter 2004, 18(3), 19-20.

http://www.sla.org/division/dche/Newsletters/Feb_2004.pdf

• Gasaway, Laura. “The open archives movement.” Information Outlook October 2004, 8(10), 36, 39-40.

• Glen, Robert; Aldridge, Susan. “Developing tools and standards in molecular informatics.” Chemical Communications 2002, (23), 2745-2747. DOI: 10.1039/b207793k

http://xlink.rsc.org/?DOI=b207793k

Bibliography

• Huber, C.; Porter, K. “Cheap tricks.”

http://www.indiana.edu/~cheminfo/workshop/cheap.html

• McLeland, Le-Nhung. What every chemist should know about patents.

http://www.chemistry.org/portal/resources/?id=1b41692a6cf811d6f8dd6ed9fe800100

• Murray-Rust, Peter; Rzepa, Henry S.; Tyrrell, Simon M.; Zhang, Y. “Representation and use of chemistry in the global electronic age.” Organic & Biomolecular Chemistry 2004, 2, 3192-3203.

http://www.ch.ic.ac.uk/rzepa/obc/ (preprint)

Bibliography

• Wagner, A. Ben. “Finding physical properties of chemicals: A practical guide for scientists, engineers, and librarians.” Science & Technology Libraries 2001, 21(3/4), 27-45. (published Fall 2003) Text for personal and professional use available at:

http://ublib.buffalo.edu/libraries/asl/staff/documents/wagner_phys_prop_stl_art.pdf

• Wiggins, Gary. “Overview of databases/data sources.” in Gasteiger, Johannes, ed. Handbook of Chemoinformatics: From Data to Knowledge in 4 Volumes. Wiley-VCH: 2003, v. 2, pp. 496-506.

http://www.indiana.edu/~cheminfo/C571/wiggins_chapter_2003.pdf

• Wiggins, Gary. “Teaching chemical literature, databases, and chemical informatics.” CPT; Committee on Professional Training [newsletter] Spring 2004, 4(1), 1-2.

http://acswebcontent.acs.org/PDF/cpt/nl_cpt_spring2004.pdf