a bibliometric analysis of chemoinformatics

A bibliometric analysis of chemoinformatics Presented at the 25th Anniversary Meeting of the Molecular Graphics and Modelling Society, School of Oriental and African Studies, London 13th March 2007 Peter Willett, University of Sheffield, UK

Page 1: A bibliometric analysis of chemoinformatics

A bibliometric analysis of chemoinformatics

Presented at the 25th Anniversary Meeting of the Molecular Graphics and Modelling Society, School of Oriental and African Studies, London 13th March 2007

Peter Willett, University of Sheffield, UK

Page 2: A bibliometric analysis of chemoinformatics

Overview of talk

• Bibliometrics
• Chemoinformatics
  • Growth of the subject
  • Subject coverage
  • Author productivity
• The Journal of Molecular Graphics (and Modelling)

Page 3: A bibliometric analysis of chemoinformatics

Bibliometrics: what is it?

• Bibliometrics is:
  • “The application of mathematical and statistical methods to books and other media” (A. Pritchard (1969), Statistical bibliography or bibliometrics?, J. Docum., Vol. 25, pp. 348-349)
  • “The study, or measurement, of texts and information” (Wikipedia)
• See also:
  • Webometrics: “the study of the quantitative aspects of the construction and use of information resources, structures and technologies on the Web drawing on bibliometric and informetric approaches” (L. Björneborn and P. Ingwersen (2004), Toward a basic framework for webometrics, J. Amer. Soc. Inf. Sci. Technol., Vol. 55, pp. 1216-1227)
  • Cybermetrics, informetrics, scientometrics

Page 4: A bibliometric analysis of chemoinformatics

Bibliometrics: subjects of study

• Bibliometric distributions
  • Highly skewed frequency distributions (Bradford, Lotka, Zipf) and their implications
• Citation analysis
  • Analysis of individuals, institutions and journals
    • Use as performance indicators for the evaluation of research
  • Philosophy of science
    • Subject coverage
    • Academic collaborations
  • Now extension to linkages between Web sites
    • Sitations, cf citations

Page 5: A bibliometric analysis of chemoinformatics

From chemical documentation to chemoinformatics

• Chemical documentation is long established
  • Chemisches Journal started in 1778
  • Chemical Abstracts started in 1907
• First computer-based information systems and services in the Sixties
  • Chemical Titles in 1961
  • Morgan and Sussenguth algorithms in 1965
• Recent emergence of chemoinformatics
  • M. Hann and R. Green (1999), Chemoinformatics - a new name for an old problem?, Curr. Opin. Chem. Biol., Vol. 3, pp. 379-383.

Page 6: A bibliometric analysis of chemoinformatics

Chemoinformatics: definitions

• “The use of information technology and management has become a critical part of the drug discovery process. Chemoinformatics is the mixing of those information resources to transform data into information and information into knowledge for the intended purpose of making better decisions faster in the area of drug lead identification and optimization” F.K. Brown (1998), Chemoinformatics: What is it and how does it impact drug discovery?, Ann. Reports Med. Chem., Vol. 33, pp. 375-384
  • Take 1998 as the starting point for the bibliometric analyses
• Many alternatives, e.g.
  • “Chem(o)informatics is a generic term that encompasses the design, creation, organization, management, retrieval, analysis, dissemination, visualization and use of chemical information” G. Paris (August 1999 ACS meeting), quoted by W.A. Warr at http://www.warr.com/warrzone.htm
  • “Chemoinformatics is the application of informatics methods to solve chemical problems” J. Gasteiger and T. Engel (2003), Chemoinformatics: a Textbook, Wiley-VCH.

Page 7: A bibliometric analysis of chemoinformatics

Bibliometric studies in chemoinformatics

• Onodera (2001)
  • Analysis of the subject coverage of Journal of Chemical Information and Computer Sciences
• Redman et al. (2001)
  • Applications of the Cambridge Structural Database
• Bishop et al. (2003)
  • Citations to Sheffield chemoinformatics research
• Warr (2005)
  • Most cited papers in Journal of Chemical Information and Computer Sciences
• Behrens and Luksch (2006)
  • Contents of the Inorganic Crystal Structure Database

Page 8: A bibliometric analysis of chemoinformatics

Data sources for bibliometric research

• Web of Knowledge (WOK)
  • Long established as the data source for bibliometric analyses
  • Recent addition of analysis tools (Analyse Results and Citation Reports)
  • Probably still the most comprehensive
• New sources
  • Google
    • Google Scholar restricted to the scholarly literature
  • Scopus
    • New service from Elsevier, offering similar facilities to WOK

Page 9: A bibliometric analysis of chemoinformatics

What shall we call it?

Term or phrase                     Google    Google Scholar    WOK    Scopus
Chemical documentation             695,000   66                1      34
Chemical informatics               50,400    129               20     39
Chemical information management    978       42                4      28
Chemical information science       779       17                2      5
Chemiinformatics                   2,230     2                 2      2
Cheminformatics                    320,000   447               83     250
Chemoinformatics                   191,000   5,636             99     473

Page 10: A bibliometric analysis of chemoinformatics

Google postings from http://www.molinspiration.com/chemoinformatics.html

[Chart: numbers of Google postings, on an axis running from 0 to 350,000, for two series labelled Chemo* and Chem*]

Page 11: A bibliometric analysis of chemoinformatics

• WOK search of the title, keyword and abstract fields for:
  • chemoinformatics OR cheminformatics OR “chemical informatics”
• This search retrieved 197 records for the period 1998-2006 in 87 different sources
  • Of these, Journal of Chemical Information and Modeling (and its predecessor) is clearly the core journal

Page 12: A bibliometric analysis of chemoinformatics

Most frequently occurring sources

Source                                                                                                   Citations
Abstracts of papers of ACS meeting                                                                       44
Journal of Chemical Information and Computer Sciences/Journal of Chemical Information and Modeling      22
Drug Discovery Today                                                                                     11
Combinatorial Chemistry and High-Throughput Screening                                                    5
Bioinformatics                                                                                           5
Current Opinion in Drug Discovery and Development                                                        4
Journal of Computer-Aided Molecular Design                                                               4
Molecular Diversity                                                                                      4
Quantitative Structure-Activity Relationships/QSAR & Combinatorial Science                               4

Page 13: A bibliometric analysis of chemoinformatics

Inter-journal relationships

• L. Leydesdorff (2007), "Visualization of the citation impact environment of scientific journals", J. Amer. Soc. Inf. Sci. Technol., Vol. 58, pp. 25-38.
  • Analysis of 2003-04 WOK data to identify journals that provide >= 1% of the citations to/from a given journal
• For Journal of Chemical Information and Computer Sciences
  • 14 other “to” journals but only 5 other “from” journals
  • Multi-disciplinary nature of the field means that a wide range of sources are used
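The 1% threshold used in this kind of analysis can be sketched as follows. This is a minimal illustration of the selection rule only, not Leydesdorff's actual procedure; the function name `journals_above_threshold` and all counts are hypothetical.

```python
def journals_above_threshold(citation_counts, threshold=0.01):
    """Return journals contributing at least `threshold` of all citations.

    citation_counts maps a journal name to the number of citations it
    gives to (or receives from) the journal under study.
    """
    total = sum(citation_counts.values())
    if total == 0:
        return []
    selected = [
        name for name, n in citation_counts.items() if n / total >= threshold
    ]
    # Most heavily cited journals first.
    return sorted(selected, key=lambda name: -citation_counts[name])


# Hypothetical counts, invented purely for illustration.
example = {"Journal A": 120, "Journal B": 45, "Journal C": 3, "Journal D": 832}
selected = journals_above_threshold(example)
```

Running the same function once on the "to" counts and once on the "from" counts would give the two journal lists whose sizes are compared on the slide.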

Page 14: A bibliometric analysis of chemoinformatics

Author productivity: I

• Analysis of the authors of all articles published 1998-2006 in:
  • Bioinformatics, Combinatorial Chemistry and High-Throughput Screening and Journal of Biomolecular Screening
  • Journal of Chemical Information and Modeling, Journal of Computer-Aided Molecular Design, Molecular Diversity and QSAR & Combinatorial Science
  • Journal of Molecular Graphics and Modelling, Journal of Molecular Modeling and SAR and QSAR in Environmental Research
• Identification of the 20 most productive authors for each of these journals in 1998-2006

Page 15: A bibliometric analysis of chemoinformatics

Author productivity: II

• Productive authors in the first group of journals did not publish frequently in the other two groups of journals, but there was a fair degree of overlap between the journals in the other two groups (Molecular Diversity the least)
  • There is one author in the top-20 for four journals, two authors in the top-20 for three journals and 12 authors in the top-20 for two journals
  • Eight of the top-20 authors in Journal of Chemical Information and Computer Sciences are also top-20 authors in other journals
• Main degrees of overlap between
  • Journal of Chemical Information and Modeling and Journal of Computer-Aided Molecular Design
  • QSAR & Combinatorial Science and SAR and QSAR in Environmental Research

Page 16: A bibliometric analysis of chemoinformatics

Overlap in “top-20” authors

         JCAMD   MD   QSAR   JMGM   JMM   SAR
JCICS    5       1    1      2      1     3
JCAMD            0    1      2      0     1
MD                    0      2      1     0
QSAR                         0      0     5
JMGM                                2     0
JMM                                       0
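An overlap table of this kind could be computed along the following lines. This is a sketch under assumptions: the talk does not say how the counts were produced or how ties at rank 20 were handled, and the helper names `top_authors` and `overlap_table`, as well as the toy data, are invented for illustration.

```python
from collections import Counter
from itertools import combinations


def top_authors(author_lists, n=20):
    """Return the set of the n authors with the most papers.

    author_lists holds one list of author names per article. Ties at the
    cut-off are broken arbitrarily here (an assumption, not the talk's rule).
    """
    counts = Counter(name for authors in author_lists for name in authors)
    return {name for name, _ in counts.most_common(n)}


def overlap_table(top_sets):
    """Count the shared top authors for every pair of journals."""
    return {
        (a, b): len(top_sets[a] & top_sets[b])
        for a, b in combinations(sorted(top_sets), 2)
    }


# Invented toy data: each inner list is the author list of one article.
articles = {
    "J1": [["Smith", "Jones"], ["Smith", "Jones"], ["Smith"], ["Lee"]],
    "J2": [["Smith", "Patel"], ["Patel", "Smith"], ["Patel"], ["Jones"]],
    "J3": [["Garcia", "Kim"], ["Garcia", "Kim"], ["Garcia"], ["Smith"]],
}
tops = {journal: top_authors(lists, n=2) for journal, lists in articles.items()}
table = overlap_table(tops)
```

Only the upper triangle is stored, matching the layout of the table above.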

Page 17: A bibliometric analysis of chemoinformatics

The core literature

• A basic principle of bibliometrics is that citation corresponds to use, i.e., frequently cited papers are the most scientifically valuable
• NB the many exceptions…
  • “Classic” citations
  • Critical citations
  • Self-citation and close collaborators
  • Journal Impact Factor games
• …but generally a valid assumption
• The 4,411 articles published in seven chemoinformatics journals in 1998-2006 attracted a total of 35,228 citations

Page 18: A bibliometric analysis of chemoinformatics

Most-cited papers: I

E. Lindahl et al. (2001), GROMACS 3.0: a package for molecular simulation and trajectory analysis, J. Mol. Model., Vol. 7, pp. 306-317. Citations: 854

G. Schaftenaar and J.H. Noordik (2000), Molden: a pre- and post-processing program for molecular and electronic structures, J. Comput.-Aid. Mol. Design, Vol. 14, pp. 123-134. Citations: 701

A.K. Dunker et al. (2001), Intrinsically disordered protein, J. Mol. Graph. Model., Vol. 19, pp. 26-59. Citations: 239

T.J.A. Ewing et al. (2001), DOCK 4.0: search strategies for automated molecular docking of flexible molecule databases, J. Comput.-Aid. Mol. Design, Vol. 15, pp. 411-428. Citations: 181

M.D. Wessel et al. (1998), Prediction of human intestinal absorption of drug compounds from molecular structure, J. Chem. Inf. Comput. Sci., Vol. 38, pp. 726-735. Citations: 157

T.I. Oprea et al. (2001), Is there a difference between leads and drugs? A historical perspective, J. Chem. Inf. Comput. Sci., Vol. 41, pp. 1308-1315. Citations: 145

H.-J. Bohm (1998), Prediction of binding constants of protein ligands: A fast method for the prioritization of hits obtained from de novo design or 3D database search programs, J. Comput.-Aid. Mol. Design, Vol. 12, pp. 309-323. Citations: 143

J.A. Platts et al. (1999), Estimation of molecular linear free energy relation descriptors using a group contribution approach, J. Chem. Inf. Comput. Sci., Vol. 39, pp. 835-845. Citations: 137

Page 19: A bibliometric analysis of chemoinformatics

Most-cited papers: II

• Certain types of article strongly represented in the top-30 positions
  • Software descriptions (9)
  • Reviews (4)
  • Drug-likeness (4)
  • Binding energies (4)
• The first of these might be thought of as the field’s “classic” citations (cf the two most-cited articles in Journal of Chemical Information and Computer Sciences)

Page 20: A bibliometric analysis of chemoinformatics

Institutional productivity

• The following institutions all provide at least 1% of the papers in all of the seven journals
  • National Institute of Chemistry, Ljubljana, University of Erlangen-Nurnberg, University of Sheffield, University of Minnesota, Environmental Protection Agency, Russian Academy of Sciences, Liverpool John Moores University, Pennsylvania State University, Chinese Academy of Sciences and the University of Cambridge
• Of top-50 institutions, only Tripos (no. 27) and Pfizer (no. 36) are for-profit organisations

Page 21: A bibliometric analysis of chemoinformatics

National productivity: the ten countries providing the most articles in the seven journals

[Chart legend: USA, Germany, England, PR China, France, Spain, Italy, Japan, India, Switzerland, All others]

Page 22: A bibliometric analysis of chemoinformatics

The Journal of Molecular Graphics and Modelling

• The journal, then the Journal of Molecular Graphics, was started in 1983 and changed to its current name with Volume 15 in 1997
• The journal is:
  • “devoted to the publication of papers on the uses of computers in theoretical investigations of molecular structure, function, interaction, and design. The scope of the journal includes all aspects of molecular modelling and computational chemistry, including, for instance, the study of molecular shape and properties, molecular simulations, protein and polymer engineering, drug design, materials design, structure-activity and structure-property relationships, database mining, and compound library design”
  • See http://www.elsevier.com/wps/find/journaldescription.cws_home/525012/description#description

Page 23: A bibliometric analysis of chemoinformatics

Bibliometric distributions: I

• Many bibliometric distributions are characterised by inverse, highly skewed frequency distributions
  • Zipf’s Law for word occurrences
  • Lotka’s Law for author productivity
  • Bradford’s Law for subject spread in journals
• Many other examples
  • Design of storage systems
  • Language acquisition
  • Income distribution (Pareto distribution)

Page 24: A bibliometric analysis of chemoinformatics

Bibliometric distributions: II

• All of the bibliometric distributions can be represented by an equation of the form

$$f(k) = \frac{C}{k^{\alpha}}$$

where f(k) is the frequency of occurrence of some bibliometric item that is associated with each member of a population (k=1,2...) that is producing examples of these items, and where C and α are constants
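One simple way to estimate the two constants, assuming the power-law form f(k) = C / k^α, is a straight-line fit on log-log axes, since log f = log C - α log k. The sketch below uses ordinary least squares; the talk does not state which estimator was used, other methods (for example maximum likelihood) exist, and the function name and the synthetic data are assumptions made for illustration.

```python
import math


def fit_power_law(ks, freqs):
    """Least-squares fit of log f = log C - alpha * log k.

    Returns (C, alpha). The caller must exclude zero frequencies,
    because their logarithm is undefined.
    """
    xs = [math.log(k) for k in ks]
    ys = [math.log(f) for f in freqs]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope


# Synthetic data generated from f(k) = 1000 / k^2, not taken from the talk.
ks = [1, 2, 3, 4, 5, 6]
freqs = [1000 / k ** 2 for k in ks]
C, alpha = fit_power_law(ks, freqs)
```

On exact power-law data such as this, the fit recovers the generating constants up to rounding error; real author counts are noisy, especially in the sparse tail of highly productive authors.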

Page 25: A bibliometric analysis of chemoinformatics

Lotka’s Law

• The original formulation (A. Lotka (1926), The frequency distribution of scientific productivity, Journal of the Washington Academy of Sciences, Vol. 16, pp. 317-323) suggested α = 2, but a wide range of values is observed in practice, e.g., 1.78-3.78 (M.L. Pao (1986), An empirical examination of Lotka's Law, J. Amer. Soc. Inf. Sci., Vol. 37, pp. 26-33)
• WOK lists 859 articles appearing in Vols. 2-24 of the journal
  • Reasonable Lotka plot with C = 0.834 and α = 3.02
  • Well-known authors with >= 6 papers: Arteca, Bajorath, Brasseur, Chatterjee, Ferrin, Flower, Gaber, Goodsell, Griffith, Maigret, Martin, Mornon, Nakamura, Olson, Richards, Tapia, Toma, Umeyama, Welsh, White, Willett
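A quick consistency check on the reported constants is possible if f(k) is read as the fraction of authors with k papers. That reading is an assumption; the slide does not state the normalisation. Under it the fractions must sum to one, so C = 1 / Σ k^(-α), summing over k = 1, 2, ... The sketch below approximates that sum by truncation; the function name `lotka_constant` is hypothetical.

```python
def lotka_constant(alpha, terms=100_000):
    """Approximate C = 1 / sum_{k>=1} k^(-alpha) by a truncated sum.

    Valid for alpha > 1, where the infinite sum converges.
    """
    if alpha <= 1:
        raise ValueError("alpha must be greater than 1")
    return 1.0 / sum(k ** -alpha for k in range(1, terms + 1))


# Constant implied by the exponent reported for the journal.
c_for_reported_alpha = lotka_constant(3.02)
```

For an exponent of 3.02 this gives a value of about 0.83, close to the reported C, which suggests that the two constants are mutually consistent under the assumed normalisation.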

Page 26: A bibliometric analysis of chemoinformatics

Lotka data for 859 articles published in Volumes 2-24 of the journal

[Plot: log authors against log papers]

Page 27: A bibliometric analysis of chemoinformatics

Types of paper in Volumes 4 (1986), 14 (1996) and 24 (2006)

[Bar chart: Software and Applications papers for 1986, 1996 and 2006, on a vertical axis running from 0 to 70]

Page 28: A bibliometric analysis of chemoinformatics

Most-cited papers

R. Koradi et al. (1996), MOLMOL: A program for display and analysis of macromolecular structures, J. Mol. Graph., Vol. 14, pp. 51-55. Citations: 3298

W. Humphrey et al. (1996), VMD: Visual molecular dynamics, J. Mol. Graph., Vol. 14, pp. 33-38. Citations: 1732

G. Vriend (1990), What-If – a molecular modelling and drug design program, J. Mol. Graph., Vol. 8, pp. 52-56. Citations: 1505

R.M. Esnouf (1997), An extensively modified version of MolScript that includes greatly enhanced coloring capabilities, J. Mol. Graph. Model., Vol. 15, pp. 132-134. Citations: 1316

S.V. Evans (1993), SETOR – hardware-lighted 3-dimensional solid model representations of macromolecules, J. Mol. Graph., Vol. 11, pp. 134-138. Citations: 1151

T.E. Ferrin et al. (1988), The MIDAS display system, J. Mol. Graph., Vol. 6, pp. 13-27. Citations: 982

M. Carson (1987), Ribbon models of macromolecules, J. Mol. Graph., Vol. 5, pp. 103-106. Citations: 514

W. Smith and T.R. Forester (1996), DL_POLY_2.0: A general-purpose parallel molecular dynamics simulation package, J. Mol. Graph., Vol. 14, pp. 136-141. Citations: 314

Page 29: A bibliometric analysis of chemoinformatics

Inter-journal relatedness

• The Journal Citation Reports database provides a further way of analysing the degree of co-citation between journals

• Let A and B be journals publishing P_A and P_B articles; let C_AB be the number of times that A cites B and let CT_A be the total number of citations in A. Then the relatedness of A to B is defined as

$$\frac{C_{AB}}{CT_{A}\,P_{B}}$$

• A similar calculation can be made of the relatedness of B to A
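The relatedness measure can be sketched in a few lines, assuming the definition is the number of times A cites B divided by the product of the total citations in A and the number of articles in B, and assuming the factor of one million used when the values are tabulated. The function name and all counts below are invented for illustration and are not Journal Citation Reports data.

```python
def relatedness(c_ab, ct_a, p_b, scale=1e6):
    """Relatedness of journal A to journal B, multiplied by `scale`.

    c_ab: number of times A cites B
    ct_a: total number of citations in A
    p_b:  number of articles published in B
    """
    if ct_a <= 0 or p_b <= 0:
        raise ValueError("ct_a and p_b must be positive")
    return scale * c_ab / (ct_a * p_b)


# Invented counts for two hypothetical journals A and B.
a_to_b = relatedness(c_ab=50, ct_a=4000, p_b=200)
b_to_a = relatedness(c_ab=30, ct_a=6000, p_b=100)
```

The measure is not symmetric: the two directions use different citation counts and different normalising terms, which is why both are usually reported.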

Page 30: A bibliometric analysis of chemoinformatics

Relatedness values (× 10^6)

Journal J                                       JMGM to J   J to JMGM
Journal of Computer-Aided Molecular Design      250.35      256.16
Journal of Chemical Information and Modeling    62.95       186.84
Journal of Computational Chemistry              162.85      66.96
Structure                                       30.00       141.65
Proteins                                        55.99       116.33
Acta Crystallographica D                        15.04       111.91
SAR and QSAR in Environmental Research          31.48       98.87
Journal of Molecular Modeling                   22.36       96.27
Current Opinion in Structural Biology           84.66       41.70
Protein Science                                 26.56       79.73

Page 31: A bibliometric analysis of chemoinformatics

Countries providing at least 3% of the articles in Volumes 2-24 of the journal

[Chart legend: USA, England, Japan, France, Germany, Australia, Spain, Switzerland, All others]

Page 32: A bibliometric analysis of chemoinformatics

Conclusions

• Most academics are interested in their personal citation counts and in the impact factors for their favourite journals

• Bibliometrics has more general applications
  • Subject coverage
  • Key players and articles
  • Relationships between journals

• Recent developments facilitate the carrying-out of such analyses