a bibliometric analysis of chemoinformatics

A bibliometric analysis of chemoinformatics. Presented at the 25th Anniversary Meeting of the Molecular Graphics and Modelling Society, School of Oriental and African Studies, London, 13th March 2007. Peter Willett, University of Sheffield, UK

Author: aricin

Post on 09-Jan-2016








  • A bibliometric analysis of chemoinformatics

    Presented at the 25th Anniversary Meeting of the Molecular Graphics and Modelling Society, School of Oriental and African Studies, London 13th March 2007

    Peter Willett, University of Sheffield, UK

  • Overview of talk
    • Bibliometrics
    • Chemoinformatics
    • Growth of the subject
    • Subject coverage
    • Author productivity
    • The Journal of Molecular Graphics (and Modelling)

  • Bibliometrics: what is it?
    Bibliometrics is:
    • "The application of mathematical and statistical methods to books and other media" (A. Pritchard (1969), Statistical bibliography or bibliometrics?, J. Docum., Vol. 25, pp. 348-349)
    • "The study, or measurement, of texts and information" (Wikipedia)
    See also:
    • Webometrics: "the study of the quantitative aspects of the construction and use of information resources, structures and technologies on the Web drawing on bibliometric and informetric approaches" (L. Björneborn and P. Ingwersen (2004), Toward a basic framework for webometrics, J. Amer. Soc. Inf. Sci. Technol., Vol. 55, pp. 1216-1227)
    • Cybermetrics, informetrics, scientometrics

  • Bibliometrics: subjects of study
    • Bibliometric distributions
      • Highly skewed frequency distributions (Bradford, Lotka, Zipf) and their implications
    • Citation analysis
      • Analysis of individuals, institutions and journals
      • Use as performance indicators for the evaluation of research
      • Philosophy of science
      • Subject coverage
      • Academic collaborations
    • Now extension to linkages between Web sites
      • "Sitations", cf. citations

  • From chemical documentation to chemoinformatics
    • Chemical documentation is long established
      • Chemisches Journal started in 1778
      • Chemical Abstracts started in 1907
    • First computer-based information systems and services in the Sixties
      • Chemical Titles in 1961
      • Morgan and Sussenguth algorithms in 1965
    • Recent emergence of chemoinformatics
      • M. Hann and R. Green (1999), Chemoinformatics - a new name for an old problem?, Curr. Opin. Chem. Biol., Vol. 3, pp. 379-383.

  • Chemoinformatics: definitions
    • "The use of information technology and management has become a critical part of the drug discovery process. Chemoinformatics is the mixing of those information resources to transform data into information and information into knowledge for the intended purpose of making better decisions faster in the area of drug lead identification and optimization" (F.K. Brown (1998), Chemoinformatics: What is it and how does it impact drug discovery?, Ann. Reports Med. Chem., Vol. 33, pp. 375-384)
      • Take 1998 as the starting point for the bibliometric analyses
    • Many alternatives, e.g.
      • "Chem(o)informatics is a generic term that encompasses the design, creation, organization, management, retrieval, analysis, dissemination, visualization and use of chemical information" (G. Paris (August 1999 ACS meeting), quoted by W.A. Warr at http://www.warr.com/warrzone.htm)
      • "Chemoinformatics is the application of informatics methods to solve chemical problems" (J. Gasteiger and T. Engel (2003), Chemoinformatics: a Textbook, Wiley-VCH)

  • Bibliometric studies in chemoinformatics
    • Onodera (2001): analysis of the subject coverage of Journal of Chemical Information and Computer Sciences
    • Redman et al. (2001): applications of the Cambridge Structural Database
    • Bishop et al. (2003): citations to Sheffield chemoinformatics research
    • Warr (2005): most cited papers in Journal of Chemical Information and Computer Sciences
    • Behrens and Luksch (2006): contents of the Inorganic Crystal Structure Database

  • Data sources for bibliometric research
    • Web of Knowledge (WOK)
      • Long established as the data source for bibliometric analyses
      • Recent addition of analysis tools (Analyse Results and Citation Reports)
      • Probably still the most comprehensive
    • New sources
      • Google: Google Scholar restricted to the scholarly literature
      • Scopus: new service from Elsevier, offering similar facilities to WOK

  • What shall we call it?

  • Google postings from http://www.molinspiration.com/chemoinformatics.html

  • WOK search of the title, keyword and abstract fields for: chemoinformatics OR cheminformatics OR chemical informatics
    • This search retrieved 197 records for the period 1998-2006 in 87 different sources
    • Of these, Journal of Chemical Information and Modeling (and its predecessor) is clearly the core journal

  • Most frequently occurring sources

  • Inter-journal relationships
    • L. Leydesdorff (2007), "Visualization of the citation impact environment of scientific journals", J. Amer. Soc. Inf. Sci. Technol., Vol. 58, pp. 25-38.
    • Analysis of 2003-04 WOK data to identify journals that provide at least 1% of the citations to or from a given journal
    • For Journal of Chemical Information and Computer Sciences: 14 other "to" journals but only 5 other "from" journals
    • The multi-disciplinary nature of the field means that a wide range of sources is used
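The 1% threshold described above can be illustrated with a short sketch. This is only a hedged illustration of the filtering step, not Leydesdorff's actual procedure: the function name `journals_above_share` and the journal names and counts are made up for the example.

```python
def journals_above_share(citation_counts, threshold=0.01):
    """Return the journals whose share of all citations is >= threshold.

    citation_counts maps a journal name to the number of citations it
    exchanges with the journal under study (hypothetical input format).
    The result is ordered from the largest count to the smallest.
    """
    total = sum(citation_counts.values())
    if total == 0:
        return []
    kept = [j for j, n in citation_counts.items() if n / total >= threshold]
    return sorted(kept, key=lambda j: citation_counts[j], reverse=True)


# Made-up counts, purely for illustration (they sum to 1000).
example_counts = {
    "Journal A": 120,
    "Journal B": 45,
    "Journal C": 3,
    "Journal D": 832,
}
print(journals_above_share(example_counts))
```

Running the same function once on the citations given and once on the citations received would yield the two journal lists whose sizes are compared on the slide.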

  • Author productivity: I
    • Analysis of the authors of all articles published 1998-2006 in:
      • Bioinformatics, Combinatorial Chemistry and High-Throughput Screening and Journal of Biomolecular Screening
      • Journal of Chemical Information and Modeling, Journal of Computer-Aided Molecular Design, Molecular Diversity and QSAR & Combinatorial Science
      • Journal of Molecular Graphics and Modelling, Journal of Molecular Modeling and SAR and QSAR in Environmental Research
    • Identification of the 20 most productive authors for each of these journals in 1998-2006

  • Author productivity: II
    • Productive authors in the first group of journals did not publish frequently in the other two groups of journals, but there is a fair degree of overlap between the journals in the other two groups (Molecular Diversity the least)
    • There is one author in the top-20 for four journals, two authors in the top-20 for three journals and 12 authors in the top-20 for two journals
    • Eight of the top-20 authors in Journal of Chemical Information and Computer Sciences are also top-20 authors in other journals
    • Main degrees of overlap are between:
      • Journal of Chemical Information and Modeling and Journal of Computer-Aided Molecular Design
      • QSAR & Combinatorial Science and SAR and QSAR in Environmental Research
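The top-20 comparison above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the procedure used for the talk: the input format (one list of author names per article), the function names, and the toy data are all hypothetical, and ties at the cut-off are broken arbitrarily by `Counter.most_common`.

```python
from collections import Counter
from itertools import combinations


def top_authors(author_lists, n=20):
    """Return the set of the n authors with the most articles.

    author_lists holds one list of author names per article; each author
    is counted once per article.
    """
    counts = Counter(a for authors in author_lists for a in set(authors))
    return {author for author, _ in counts.most_common(n)}


def pairwise_overlap(top_by_journal):
    """Count the top authors shared by each pair of journals."""
    return {
        (a, b): len(top_by_journal[a] & top_by_journal[b])
        for a, b in combinations(sorted(top_by_journal), 2)
    }


def journals_per_author(top_by_journal):
    """Count in how many journals' top lists each author appears."""
    return Counter(a for tops in top_by_journal.values() for a in tops)


# Toy data with n=2 instead of 20, purely for illustration.
articles = {
    "X": [["Ann", "Bob"], ["Ann"], ["Bob"], ["Cy"]],
    "Y": [["Bob", "Dee"], ["Dee"], ["Bob"], ["Eve"]],
}
tops = {journal: top_authors(lists, n=2) for journal, lists in articles.items()}
print(pairwise_overlap(tops))
print(journals_per_author(tops))
```

Real author data would also need name disambiguation (initials, spelling variants) before counting, which this sketch does not attempt.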

  • Overlap in top-20 authors

  • The core literature
    • A basic principle of bibliometrics is that citation corresponds to use, i.e., frequently cited papers are the most scientifically valuable
    • NB there are many exceptions, but this is generally a valid assumption
      • Classic citations
      • Critical citations
      • Self-citation and close collaborators
      • Journal Impact Factor games
    • Analysis of citations to 4411 articles published in seven chemoinformatics journals in 1998-2006, which attracted a total of 35,228 citations

  • Most-cited papers: I

  • Most-cited papers: II
    • Certain types of article are strongly represented in the top-30 positions
      • Software descriptions (9)
      • Reviews (4)
      • Drug-likeness (4)
      • Binding energies (4)
    • The first of these might be thought of as the field's "classic citations" (cf. the two most-cited articles in Journal of Chemical Information and Computer Sciences)

  • Institutional productivity
    • The following institutions all provide at least 1% of the papers in all of the seven journals:
      • National Institute of Chemistry, Ljubljana
      • University of Erlangen-Nürnberg
      • University of Sheffield
      • University of Minnesota
      • Environmental Protection Agency
      • Russian Academy of Sciences
      • Liverpool John Moores University
      • Pennsylvania State University
      • Chinese Academy of Sciences
      • University of Cambridge
    • Of the top-50 institutions, only Tripos (no. 27) and Pfizer (no. 36) are for-profit organisations

  • National productivity: the ten countries providing the most articles in the seven journals

  • The Journal of Molecular Graphics and Modelling
    • The journal, then the Journal of Molecular Graphics, was started in 1983 and changed to its current name with Volume 15 in 1997
    • The journal is: "devoted to the publication of papers on the uses of computers in theoretical investigations of molecular structure, function, interaction, and design. The scope of the journal includes all aspects of molecular modelling and computational chemistry, including, for instance, the study of molecular shape and properties, molecular simulations, protein and polymer engineering, drug design, materials design, structure-activity and structure-property relationships, database mining, and compound library design"
    • See http://www.elsevier.com/wps/find/journaldescription.cws_home/525012/description#description

  • Bibliometric distributions: I
    • Many bibliometric distributions are characterised by inverse, highly skewed frequency distributions
      • Zipf's Law for word occurrences
      • Lotka's Law for author productivity
      • Bradford's Law for subject spread in journals
    • Many other examples
      • Design of storage systems
      • Language acquisition
      • Income distribution (Pareto distribution)

  • Bibliometric distributions: II
    All of the bibliometric distributions can be represented by an equation of the form

    $$f(k) = \frac{C}{k^{\alpha}}$$

    where $f(k)$ is the frequency of occurrence of some bibliometric item that is associated with each member of a population ($k = 1, 2, \ldots$) that is producing examples of these items, and where $C$ and $\alpha$ are constants
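One simple way to estimate the two constants of an inverse power law of this kind, f(k) = C / k^alpha, is a least-squares straight-line fit on log-log axes, since log f(k) = log C - alpha log k. The sketch below assumes that form and that fitting method; `alpha` is the name used here for the exponent, the function names are hypothetical, and the talk does not say how its own constants were estimated. The synthetic data reuse the values C = 0.834 and exponent 3.02 quoted in the talk for the journal.

```python
import math


def power_law_frequency(k, C, alpha):
    """Frequency predicted by the inverse power law f(k) = C / k**alpha."""
    return C / k ** alpha


def fit_power_law(freq_by_k):
    """Estimate (C, alpha) by ordinary least squares on log-log axes.

    freq_by_k maps k (1, 2, ...) to an observed frequency f(k); entries
    with a zero frequency are skipped because their logarithm is undefined.
    At least two distinct values of k are needed.
    """
    points = [(math.log(k), math.log(f)) for k, f in freq_by_k.items() if f > 0]
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # log f = log C - alpha * log k, so C = exp(intercept) and alpha = -slope.
    return math.exp(intercept), -slope


# Noise-free synthetic data generated from the constants quoted in the talk.
synthetic = {k: power_law_frequency(k, 0.834, 3.02) for k in range(1, 11)}
C_hat, alpha_hat = fit_power_law(synthetic)
print(round(C_hat, 3), round(alpha_hat, 2))
```

A log-log regression is easy to follow but gives heavy weight to the sparse tail of real data; maximum-likelihood estimation is a common alternative.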

  • Lotka's Law
    • The original formulation (A. Lotka (1926), The frequency distribution of scientific productivity, Journal of the Washington Academy of Sciences, Vol. 16, pp. 317-323) suggested an exponent of α = 2, but a wide range of values is observed in practice, e.g., 1.78-3.78 (M.L. Pao (1986), An empirical examination of Lotka's Law, J. Amer. Soc. Inf. Sci., Vol. 37, pp. 26-33)
    • WOK lists 859 articles appearing in Vols. 2-24 of the journal
    • Reasonable Lotka plot with C = 0.834 and α = 3.02
    • Well-known authors with at least 6 papers: Arteca, Bajorath, Brasseur, Chatterjee, Ferrin, Flower, Gaber, Goodsell, Griffith, Maigret, Martin, Mornon, Nakamura, Olson, Richards, Tapia, Toma, Umeyama, Welsh, White, Willett

  • Lotka data for 859 articles published in Volumes 2-24 of the journal

  • Types of paper in Volumes 4 (1986), 14 (1996) and 24 (2006)

  • Most-cited papers

  • Inter-journal relatedness
    • The Journal Citation Reports database provides a further way of analysing the degree of co-citation between journals
    • Let A and B be journals publishing $P_A$ and $P_B$ articles; let $C_{AB}$ be the number of times that A cites B and let $CT_A$ be the total number of citations in A. Then the relatedness of A to B is defined as

    $$R_{AB} = \frac{C_{AB}}{P_B \times CT_A}$$

    A similar calculation can be made of the relatedness of B to A
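The defining formula did not survive on the slide as transcribed, so the sketch below rests on an assumption: that relatedness takes the form used in the Journal Citation Reports, i.e. the citations from A to B divided by the product of the number of articles in B and the total citations in A, reported multiplied by 10^6. The function name and the example numbers are hypothetical.

```python
def relatedness(c_ab, p_b, ct_a, scale=1e6):
    """Relatedness of journal A to journal B (assumed JCR-style form).

    c_ab  : number of times A cites B
    p_b   : number of articles published in B
    ct_a  : total number of citations made in A
    scale : multiplier used for reporting (10**6 assumed here)
    """
    if p_b <= 0 or ct_a <= 0:
        raise ValueError("p_b and ct_a must be positive")
    return scale * c_ab / (p_b * ct_a)


# Made-up numbers: A cites B 50 times, B published 200 articles,
# and A contains 10,000 citations in total.
print(relatedness(c_ab=50, p_b=200, ct_a=10_000))
```

The measure is not symmetric: the relatedness of B to A uses the citations from B to A, the number of articles in A and the total citations in B, so the two directions generally differ.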

  • Relatedness values (× 10^6)

  • Countries providing at least 3% of the articles in Volumes 2-24 of the journal

  • Conclusions
    • Most academics are interested in their personal citation counts and in the impact factors for their favourite journals
    • Bibliometrics has more general applications
      • Subject coverage
      • Key players and articles
      • Relationships between journals
    • Recent developments facilitate the carrying-out of such analyses