
A bibliometric analysis of chemoinformatics

Presented at the 233rd National Meeting of the American Chemical Society, Chicago, 25th March 2007

Peter Willett, University of Sheffield, UK

Overview of talk

• Bibliometrics

• Chemoinformatics
  • Growth of the subject
  • Subject coverage
  • Key journals
  • Author productivity
  • Yvonne Martin’s contribution

Bibliometrics: what is it?

• Bibliometrics is:
  • “The application of mathematical and statistical methods to books and other media” (A. Pritchard (1969), Statistical bibliography or bibliometrics?, J. Docum., Vol. 25, pp. 348-349)
  • “The study, or measurement, of texts and information” (Wikipedia)

• See also:
  • Webometrics: “The study of the quantitative aspects of the construction and use of information resources, structures and technologies on the Web drawing on bibliometric and informetric approaches” (L. Björneborn and P. Ingwersen (2004), Toward a basic framework for webometrics, J. Amer. Soc. Inf. Sci. Technol., Vol. 55, pp. 1216-1227)
  • Cybermetrics, informetrics, scientometrics

Bibliometrics: subjects of study

• Bibliometric distributions
  • Highly skewed frequency distributions (Bradford, Lotka, Zipf) and their implications

• Citation analysis
  • Analysis of individuals, institutions and journals
  • Use as performance indicators for the evaluation of research

• Philosophy of science
  • Subject coverage
  • Academic collaborations

• Now extended to linkages between Web sites
  • “Sitations”, cf. citations
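Lotka’s law, one of the skewed distributions listed above, is often stated in an inverse-square form: the number of authors who each publish n papers is roughly proportional to 1/n², so authors with two papers are about a quarter as numerous as authors with one. The Python sketch below is illustrative only; the toy author data are invented, and the fixed exponent of 2 is an assumption, since exponents fitted to real bibliographies vary.

```python
from collections import Counter


def author_frequency_distribution(papers_per_author):
    """Return {n: number of authors with exactly n papers}, sorted by n."""
    return dict(sorted(Counter(papers_per_author.values()).items()))


def lotka_expected(authors_with_one_paper, n, exponent=2.0):
    """Expected number of authors with n papers under Lotka's law.

    Assumes the classic inverse-square form (exponent=2); exponents
    fitted to real data sets often differ.
    """
    return authors_with_one_paper / n ** exponent


if __name__ == "__main__":
    # Hypothetical toy data: author -> number of papers published.
    toy = {"A": 1, "B": 1, "C": 1, "D": 1, "E": 2, "F": 4}
    observed = author_frequency_distribution(toy)
    for n, count in observed.items():
        expected = lotka_expected(observed.get(1, 0), n)
        print(f"n={n}: observed {count}, Lotka expects {expected:.2f}")
```

Comparing the observed and expected columns for a real author list is a quick way to see how closely it follows the inverse-square pattern.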

From chemical documentation to chemoinformatics

• Chemistry is an information-rich discipline
  • Chemisches Journal (1778); Chemical Abstracts (1907)
  • Chemical Titles (1961); Morgan and Sussenguth algorithms (1965)

• But chemoinformatics is a new arrival
  • M. Hann and R. Green (1999), Chemoinformatics - a new name for an old problem?, Curr. Opin. Chem. Biol., Vol. 3, pp. 379-383
  • “The use of information technology and management has become a critical part of the drug discovery process. Chemoinformatics is the mixing of those information resources to transform data into information and information into knowledge for the intended purpose of making better decisions faster in the area of drug lead identification and optimization” (F.K. Brown (1998), Chemoinformatics: What is it and how does it impact drug discovery?, Ann. Reports Med. Chem., Vol. 33, pp. 375-384)

• Take 1998, the year of Brown’s definition, as the starting point for the bibliometric analyses

Bibliometric studies in chemoinformatics

• Onodera (2001)
  • Analysis of the subject coverage of Journal of Chemical Information and Computer Sciences

• Redman et al. (2001)
  • Applications of the Cambridge Structural Database

• Bishop et al. (2003)
  • Citations to Sheffield chemoinformatics research

• Warr (2005)
  • Most-cited papers in Journal of Chemical Information and Computer Sciences

• Behrens and Luksch (2006)
  • Contents of the Inorganic Crystal Structure Database

Data sources for bibliometric research

• Web of Knowledge (WOK)
  • Long established as the data source for bibliometric analyses
  • Recent addition of analysis tools (Analyse Results and Citation Reports)
  • Probably still the most comprehensive

• New sources
  • Google, and Google Scholar restricted to the scholarly literature
  • Scopus, a new service from Elsevier offering facilities similar to those of WOK

What shall we call it?

Number of hits for each name in each source:

Term or phrase                   Google  Google Scholar  WOK  Scopus
Chemical documentation          695,000              66    1      34
Chemical informatics             50,400             129   20      39
Chemical information management     978              42    4      28
Chemical information science        779              17    2       5
Chemiinformatics                  2,230               2    2       2
Cheminformatics                 320,000             447   83     250
Chemoinformatics                191,000           5,636   99     473

[Chart: Google postings for Chemo* and Chem* name variants, from http://www.molinspiration.com/chemoinformatics.html]

• WOK search of the title, keyword and abstract fields for:
  • chemoinformatics OR cheminformatics OR “chemical informatics”

• This search retrieved 197 records for the period 1998-2006 in 87 different sources
  • Of these, Journal of Chemical Information and Modeling (and its predecessor, Journal of Chemical Information and Computer Sciences) is clearly the core journal
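The search above is a Boolean OR of three terms over three fields, restricted to 1998-2006. A minimal Python sketch of that logic follows, assuming each record is a hypothetical dict with `title`, `keywords`, `abstract`, `year` and `source` keys. It uses plain case-insensitive substring matching and makes no attempt to reproduce whatever additional matching rules WOK itself applies.

```python
from collections import Counter

# The three search terms from the slide, combined with OR.
TERMS = ("chemoinformatics", "cheminformatics", "chemical informatics")
# Field names are assumptions for this sketch, not WOK's own field tags.
FIELDS = ("title", "keywords", "abstract")


def matches(record, terms=TERMS, fields=FIELDS):
    """True if any term occurs, case-insensitively, in any searched field."""
    return any(
        term in record.get(field, "").lower()
        for field in fields
        for term in terms
    )


def search(records, first_year=1998, last_year=2006):
    """Return matching records published in [first_year, last_year]."""
    return [
        r for r in records
        if first_year <= r["year"] <= last_year and matches(r)
    ]


def source_counts(hits):
    """(source, number of retrieved records) pairs, most frequent first."""
    return Counter(r["source"] for r in hits).most_common()
```

Each field is checked separately so that a phrase cannot be falsely matched across the boundary between two fields. Over a full export, `len(search(records))` and `len(source_counts(hits))` would play the role of the record and source totals quoted above.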

Most frequently occurring sources

Records  Source
     44  Abstracts of papers of ACS meetings
     22  Journal of Chemical Information and Computer Sciences/Journal of Chemical Information and Modeling
     11  Drug Discovery Today
      5  Combinatorial Chemistry and High-Throughput Screening
      5  Bioinformatics
      4  Current Opinion in Drug Discovery and Development
      4  Journal of Computer-Aided Molecular Design
      4  Molecular Diversity
      4  Quantitative Structure-Activity Relationships/QSAR & Combinatorial Science

Inter-journal relationships

• L. Leydesdorff (2007), Visualization of the citation impact environment of scientific journals, J. Amer. Soc. Inf. Sci. Technol., Vol. 58, pp. 25-38
  • Analysis of 2003-04 WOK data to identify the journals that provide >= 1% of the citations to/from a given journal

• For Journal of Chemical Information and Computer Sciences
  • 14 other “to” journals but only 5 other “from” journals

• The emerging, multi-disciplinary nature of the field means that a wide range of sources is used
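The 1% cut-off above amounts to a simple share calculation. The Python sketch below covers only that selection step, not Leydesdorff’s mapping or visualization. It assumes a hypothetical tally of citations between a focal journal and each other journal in one direction; whether the focal journal’s self-citations belong in the total is also an assumption here.

```python
def journals_above_share(citation_counts, threshold=0.01, exclude=None):
    """Journals supplying at least `threshold` of a focal journal's citations.

    citation_counts: mapping of journal name -> citations exchanged with the
        focal journal in one direction ("to" or "from").
    exclude: optional name (e.g. the focal journal itself) left out of the
        result; its citations still count towards the total (an assumption).
    Returns (journal, share) pairs, largest share first.
    """
    total = sum(citation_counts.values())
    if total == 0:
        return []
    selected = [
        (journal, count / total)
        for journal, count in citation_counts.items()
        if journal != exclude and count / total >= threshold
    ]
    return sorted(selected, key=lambda item: item[1], reverse=True)
```

Running it once on the “cited by” tally and once on the “citing” tally would give the two journal lists whose sizes (14 and 5) are compared above.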

Author productivity: I

• Analysis of the authors of all articles published 1998-2006 in three groups of journals:
  • Bioinformatics, Combinatorial Chemistry and High-Throughput Screening and Journal of Biomolecular Screening
  • Journal of Chemical Information and Modeling, Journal of Computer-Aided Molecular Design, Molecular Diversity and QSAR & Combinatorial Science
  • Journal of Molecular Graphics and Modelling, Journal of Molecular Modeling and SAR and QSAR in Environmental Research

• Identification of the 20 most productive authors for each of these journals in 1998-2006

Author productivity: II

• Productive authors in the first group of journals did not publish frequently in the other two groups, but there is a fair degree of overlap between the journals in the other two groups
  • There are two authors in the top-20 for four journals, one author in the top-20 for three journals and 12 authors in the top-20 for two journals

• Eight of the top-20 authors in Journal of Chemical Information and Computer Sciences are also top-20 authors in other journals

• The main degrees of overlap are between
  • Journal of Chemical Information and Modeling and Journal of Computer-Aided Molecular Design
  • QSAR & Combinatorial Science and SAR and QSAR in Environmental Research

Overlap in “top-20” authors

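Number of authors shared between the top-20 lists of each pair of journals (JCICS: Journal of Chemical Information and Computer Sciences; JCAMD: Journal of Computer-Aided Molecular Design; MD: Molecular Diversity; QSAR: QSAR & Combinatorial Science; JMGM: Journal of Molecular Graphics and Modelling; JMM: Journal of Molecular Modeling; SAR: SAR and QSAR in Environmental Research)

      JCAMD    MD  QSAR  JMGM   JMM   SAR
JCICS     5     1     1     2     1     3
JCAMD           0     1     2     0     1
MD                    0     2     1     0
QSAR                        0     0     5
JMGM                              2     0
JMM                                     0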
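An overlap table of this kind can be built by ranking authors by paper count within each journal, keeping the top n, and intersecting the resulting sets pairwise. The Python sketch below assumes a hypothetical input of (journal, author) pairs, one per authorship. Ties at the cut-off are broken by first appearance, which is an arbitrary choice that a real analysis would need to replace with an explicit rule.

```python
from collections import Counter, defaultdict
from itertools import combinations


def top_authors(authorships, n=20):
    """Map each journal to the set of its n most productive authors.

    authorships: iterable of (journal, author) pairs, one per authorship.
    Ties at the cut-off fall back on Counter.most_common ordering
    (the first-seen author wins).
    """
    per_journal = defaultdict(Counter)
    for journal, author in authorships:
        per_journal[journal][author] += 1
    return {j: {a for a, _ in c.most_common(n)} for j, c in per_journal.items()}


def overlap_matrix(top_sets):
    """Number of shared top authors for every unordered pair of journals."""
    return {
        (a, b): len(top_sets[a] & top_sets[b])
        for a, b in combinations(sorted(top_sets), 2)
    }


def implied_pair_total(authors_by_list_count):
    """Pairwise overlaps implied by {k: authors appearing in k top lists}.

    An author in k top-20 lists appears in k*(k-1)/2 journal pairs.
    """
    return sum(m * k * (k - 1) // 2 for k, m in authors_by_list_count.items())


if __name__ == "__main__":
    # Counts quoted on the "Author productivity: II" slide.
    print(implied_pair_total({4: 2, 3: 1, 2: 12}))  # 27
```

As a consistency check, 2 × 6 + 1 × 3 + 12 × 1 = 27, which equals the sum of the entries in the overlap table (13 + 4 + 3 + 5 + 2 + 0). The author counts on the “Author productivity: II” slide and the table therefore agree.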

The core literature

• A basic principle of bibliometrics is that citation corresponds to use, i.e., that frequently cited papers are the most scientifically valuable

• NB there are many exceptions…
  • “Classic” citations
  • Critical citations
  • Self-citation and citation by close collaborators
  • Journal Impact Factor games

• …but it is generally a valid assumption

• The 4411 articles published in seven chemoinformatics journals in 1998-2006 attracted a total of 35,228 citations

Most-cited papers: I

E. Lindahl et al. (2001), GROMACS 3.0: a package for molecular simulation and trajectory analysis, J. Mol. Model., Vol. 7, pp. 306-317

854

G. Schaftenaar and J.H. Noordik (2000), Molden: a pre- and post-processing program for molecular and electronic structures, J. Comput.-Aid. Mol. Design, Vol. 14, pp. 123-134

701

P. Willett et al. (1998), Chemical similarity searching, J. Chem. Inf. Comput. Sci., Vol. 38, pp. 983-996

291

A.K. Dunker et al. (2001), Intrinsically disordered protein, J. Mol. Graph. Model., Vol. 19, pp. 26-59

239

T.J.A. Ewing et al. (2001), DOCK 4.0: search strategies for automated molecular docking of flexible molecule databases, J. Comput.-Aid. Mol. Design, Vol. 15, pp. 411-428

181

A. Golbraikh and A. Tropsha (2002), Beware of q2!, J. Mol. Graph. Model., Vol. 20, pp. 269-276

167

M.D. Wessel et al. (1998), Prediction of human intestinal absorption of drug compounds from molecular structure, J. Chem. Inf. Comput. Sci., Vol. 38, pp. 726-735

157

T.I. Oprea et al. (2001), Is there a difference between leads and drugs? A historical perspective, J. Chem. Inf. Comput. Sci., Vol. 41, pp. 1308-1315

145

Most-cited papers: II

• Certain types of article are strongly represented in the top-30 positions
  • Software descriptions (9)
  • Reviews (4)
  • Drug-likeness (4)
  • Binding energies (4)

• The first of these article-types might be thought of as the field’s “classic” citations (cf. the two most-cited articles in Journal of Chemical Information and Computer Sciences)

Institutional productivity

• The following institutions each provide at least 1% of the papers in the seven journals
  • National Institute of Chemistry, Ljubljana; University of Erlangen-Nurnberg; University of Sheffield; University of Minnesota; Environmental Protection Agency; Russian Academy of Sciences; Liverpool John Moores University; Pennsylvania State University; Chinese Academy of Sciences; and the University of Cambridge

• Of the top-50 institutions, only Tripos (no. 27) and Pfizer (no. 36) are for-profit organisations

National productivity: the ten countries providing the most articles in the seven journals

[Chart: shares of articles from USA, Germany, England, PR China, France, Spain, Italy, Japan, India, Switzerland and all other countries]

Yvonne Martin’s contribution

• Since starting her career in 1958 as a Research Assistant at Abbott Laboratories she has produced:
  • One authored and six edited books
  • 39 book chapters
  • Seven patents
  • 22 review articles and 60 refereed articles

• WOK contains references to 73 of these articles, with >=2 articles in
  • J. Med. Chem. (28), J. Pharm. Sci. (7), Perspect. Drug Discov. Design (7), J. Comput.-Aid. Mol. Design (5) and J. Chem. Inf. Model. (3)

• There are also three references on optical microscopy by a (presumed) namesake

Citation analysis

• A total of 2714 citations to these 73 publications (plus a few more to conference abstracts)

• The large gap between the mean (37.2) and the median (13) number of citations per publication shows a highly skewed distribution: a few of her publications have been very influential

• Eight (and counting) have more than 100 citations
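The mean follows directly from the totals (2714 citations / 73 publications ≈ 37.2), and a mean well above the median is the usual signature of a right-skewed distribution. The Python sketch below uses the standard `statistics` module; the citation counts in it are invented for illustration and are not Martin’s actual data.

```python
import statistics


def citation_summary(citations):
    """Return (mean, median, number of papers with > 100 citations)."""
    mean = statistics.mean(citations)
    median = statistics.median(citations)
    highly_cited = sum(1 for c in citations if c > 100)
    return mean, median, highly_cited


def skewed_right(citations):
    """A mean above the median suggests a few very highly cited papers."""
    mean, median, _ = citation_summary(citations)
    return mean > median


if __name__ == "__main__":
    # Mean implied by the totals on the slide.
    print(round(2714 / 73, 1))  # 37.2
    # Hypothetical citation counts, for illustration only.
    toy = [0, 2, 5, 8, 13, 20, 150, 321]
    print(citation_summary(toy))
```

The median is used alongside the mean because a handful of very highly cited papers can pull the mean far above what a typical paper receives.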

Most-cited papers

R.D. Brown and Y.C. Martin (1996), Use of structure activity data to compare structure-based clustering methods and descriptors for use in compound selection, J. Chem. Inf. Comput. Sci., Vol. 36, pp. 572-584

321

I. Muegge and Y.C. Martin (1999), A general and fast scoring function for protein-ligand interactions: A simplified potential approach, J. Med. Chem., Vol. 42, pp. 791-804

256

Y.C. Martin et al. (1993), Fast new approach to pharmacophore mapping and its application to dopaminergic and benzodiazepine agonists, J. Comput.-Aid. Mol. Design, Vol. 7, pp. 83-102

223

R.D. Brown and Y.C. Martin (1997), The information content of 2D and 3D structural descriptors relevant to ligand-receptor binding, J. Chem. Inf. Comput. Sci., Vol. 37, pp. 1-9

154

M.A. Abreo et al. (1996), Novel 3-pyridyl ethers with subnanomolar affinity for central neuronal nicotinic acetylcholine receptors, J. Med. Chem., Vol. 39, pp. 817-825

142

Y.C. Martin (1992), 3D database searching in drug design, J. Med. Chem., Vol. 35, pp. 2145-2154

138

Y.C. Martin (1981), A practitioner’s perspective of the role of quantitative structure-activity analysis in medicinal chemistry, J. Med. Chem., Vol. 24, pp. 229-237

121

Y.C. Martin et al. (2002), Do structurally similar molecules have similar biological activity?, J. Med. Chem., Vol. 45, pp. 4350-4358

110

Brown and Martin (1996)

• The 321 citations are in 80 journals, with the most frequent sources (>=10 citations each) being
  • J. Chem. Inf. Model. (109), J. Med. Chem. (24), J. Comput.-Aid. Mol. Design (15), J. Mol. Graph. Model. (15), Perspect. Drug Discov. Design (13), Comb. Chem. High-Through. Screen. (11) and Drug Discov. Today (11)

• A wide range of disciplines, with singleton sources including
  • Advances in Informatics, Canadian Journal of Physiology and Pharmacology, Grid Computing in the Life Sciences, IBM Journal of Research and Development, Journal of Immunology, Mathematics of Operations Research, and Technometrics
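A citing-source profile like this one can be produced by tallying the journal of each citing article, then reporting the sources at or above a frequency cut-off (10 for this paper) together with the singletons. The Python sketch below assumes a hypothetical input list of citing-journal names, one entry per citing article.

```python
from collections import Counter


def citing_source_profile(citing_journals, min_count=10):
    """Summarise where a paper's citations come from.

    citing_journals: one journal name per citing article.
    Returns (number of distinct journals,
             [(journal, count)] with count >= min_count, most frequent first,
             sorted list of journals that cite the paper exactly once).
    """
    counts = Counter(citing_journals)
    frequent = [(j, c) for j, c in counts.most_common() if c >= min_count]
    singletons = sorted(j for j, c in counts.items() if c == 1)
    return len(counts), frequent, singletons
```

The first element of the result corresponds to the number of distinct citing journals (80 here), and lowering `min_count` to 5 gives the kind of list used for the next two papers.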

Muegge and Martin (1999)

• The 256 citations are in 76 journals, with the most frequent sources (>=5 citations each) being
  • J. Med. Chem. (69), J. Chem. Inf. Model. (19), J. Comput.-Aid. Mol. Design (19), Proteins (13), J. Mol. Graph. Model. (9), J. Comput. Chem. (7), Bioorg. Med. Chem. (7), Bioorg. Med. Chem. Lett. (6), Curr. Med. Chem. (6) and J. Mol. Biol. (5)

• A slightly more focused range of disciplines, with singleton sources including
  • Biomaterials, FEBS Letters, International Journal of Quantum Chemistry, Journal of Biomedical Materials Research, Nucleic Acids Research, Oncology Reports, Organometallic Chemistry, and Protein Simulations

Martin et al. (1993)

• The 223 citations are in 75 journals, with the most frequent sources (>=5 citations each) being
  • J. Med. Chem. (38), J. Comput.-Aid. Mol. Design (35), J. Chem. Inf. Model. (25), Perspect. Drug Discov. Design (11), Acta Chim. Sinica (7), Bioorg. Med. Chem. (5) and Mol. Pharm. (5)

• A wide range of disciplines, with singleton sources including
  • Algorithmica, Computational Geometry, Cytochrome P450, European Journal of Operational Research, IEEE Transactions on Pattern Analysis and Machine Intelligence, Letters in Peptide Science, Machine Learning, and Trends in Cardiovascular Medicine

Conclusions

• Most academics are interested in their personal citation counts and in the impact factors of their favourite journals

• Bibliometrics has more general applications
  • Subject coverage
  • Key players and articles
  • Relationships between journals

• Recent developments (analysis tools in WOK, and new sources such as Google Scholar and Scopus) make such analyses easier to carry out