Bibliometrics: From Garfield to Google Scholar
Elaine M. Lasda Bergman
University at Albany
Upstate NY SLA Spring Meeting
April 20, 2012
What we’re going to cover
• What is the study of bibliometrics?
• Bibliometrics that assess entire journals
– JIF, Eigenfactor, SNIP, SJR
• Bibliometrics that assess authors, articles, institutions
– citation count, h-index, e-index, etc.
What is bibliometrics?
Eugene Garfield
• Scholarly communication: tracing the history and evolution of ideas from one scholar to another
• Measures the scholarly influence of articles, journals, scholars, institutions
Three sources for citation data
• Citation data overlaps, but not completely
• Unique citing references in all three databases
• Unique metrics developed using each database
– Metrics could be computed in any one of these but most are tied to a particular source
JOURNAL METRICS
What is measured?
• Journal Ranking
– “Quality” or “Importance” of journal relative to other journals
• Usually within a given field of study
• There are many ways to measure “quality,” “importance”
“Impact”
• Journal Impact Factor (JIF)
• Web of Science – Journal Citation Reports
• Basically “how fast are ideas spreading from this journal to other publications?”
• Formula is a ratio: the number of citations received in a given year by articles the journal published in the previous 2 years, divided by the number of scholarly articles the journal published in those 2 years
Journal Impact Factor
• Journal of Hypothetical Examples
Citing references appearing in 2010 to articles published in the Journal in 2009 and 2008: 100
Total number of articles published in the Journal in 2009 and 2008: 200
JIF: 100 / 200 = 0.50
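The ratio for the Journal of Hypothetical Examples can be sketched in a few lines of Python. This is only an illustration of the arithmetic on this slide; the function and variable names are made up for the example and are not part of Journal Citation Reports.

```python
def journal_impact_factor(citations_to_prior_two_years: int,
                          articles_in_prior_two_years: int) -> float:
    """Citations received this year by the previous two years' articles,
    divided by the number of articles published in those two years."""
    if articles_in_prior_two_years <= 0:
        raise ValueError("the journal must have published at least one article")
    return citations_to_prior_two_years / articles_in_prior_two_years


# Figures from the slide: 100 citing references in 2010 to the
# 200 articles published in 2009 and 2008.
jif_2010 = journal_impact_factor(100, 200)
print(f"{jif_2010:.2f}")  # 0.50
```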
Concerns with impact factor
• Cannot be used to compare across disciplines
• Two year time frame not adequate for social sciences, humanities
• Coverage of some disciplines not sufficient in Web of Science
• Is a measure of “impact” a measure of “quality”?
“Influence”
• Eigenfactor.org
• Web of Science: Journal Citation Reports
• Eigenvector analysis: Similar to Google PageRank, “chain of citations”
• Takes into account the total amount of “citation traffic” appearing in JCR
• Influence of the citing journal, divided by the total number of citations appearing in that journal
“Influence”
• Journal Impact Factor:
– All citing references weighted equally
• Eigenfactor:
– SOME CITING REFERENCES ARE MORE IMPORTANT THAN OTHERS
• The citing articles from journals that are heavily cited themselves demonstrate greater influence
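The idea that some citing references count for more than others can be illustrated with a small power iteration over a made-up citation matrix. This is a simplified sketch of eigenvector-style weighting, not the actual Eigenfactor algorithm (which adds further adjustments); the three journals and their citation counts are hypothetical.

```python
def influence_scores(citations, iterations=100):
    """citations[i][j] = number of citations from journal i to journal j.

    Each journal repeatedly passes its current influence to the journals
    it cites, in proportion to its outgoing citations.
    """
    n = len(citations)
    scores = [1.0 / n] * n
    for _ in range(iterations):
        new_scores = [0.0] * n
        for citing in range(n):
            outgoing = sum(citations[citing])
            if outgoing == 0:
                continue  # a journal that cites nothing passes nothing on
            for cited in range(n):
                share = citations[citing][cited] / outgoing
                new_scores[cited] += scores[citing] * share
        total = sum(new_scores)
        scores = [s / total for s in new_scores]
    return scores


# Hypothetical journals: J1 and J2 each receive exactly 10 citations,
# but J1's come from the heavily cited J0 and J2's come from J1.
citation_matrix = [
    [0, 10, 0],   # J0 cites J1
    [30, 0, 10],  # J1 cites J0 and J2
    [30, 0, 0],   # J2 cites J0
]
scores = influence_scores(citation_matrix)
print([round(s, 3) for s in scores])
```

With equal weighting, J1 and J2 would look identical (10 citations each). Under the weighted scheme J1 scores higher because its citations come from a more influential source.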
Considerations
• Eigenfactor will always be bigger if a journal is larger, i.e., publishes more articles
• Article Influence Score: corrects for journal size
– takes the journal’s Eigenfactor score and further divides it by the number of articles in the journal
– Correlation to the JIF
Examples
• For the year 2011, Neurology had an Eigenfactor score of 0.159. This number is the percentage of all citation traffic in the JCR that goes to articles in this journal
• For the year 2011, Neurology had an Article Influence Score of 2.57. This means an average article in this journal is roughly 2.5 times as influential as an average article in all of JCR
• www.eigenfactor.org
“Citation Potential”
• SNIP: Source Normalized Impact Per Paper
• Uses Scopus data
• Citation Potential = total number of citing references in all journals which have cited this journal
• Takes an average citation count
• The ratio of the journal’s average citation count per paper to the citation potential in its subject field
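The ratio described above can be sketched as follows. This is a rough illustration of the idea only, not the Scopus implementation; the function name, the two journals, and every number below are hypothetical.

```python
def snip_like_score(citations: int, papers: int,
                    field_citation_potential: float) -> float:
    """Average citations per paper, divided by the citation potential
    of the journal's subject field."""
    if papers <= 0 or field_citation_potential <= 0:
        raise ValueError("papers and citation potential must be positive")
    citations_per_paper = citations / papers
    return citations_per_paper / field_citation_potential


# Hypothetical: a mathematics journal in a low-citing field and a
# biology journal in a high-citing field.
math_journal = snip_like_score(citations=300, papers=200,
                               field_citation_potential=0.5)
biology_journal = snip_like_score(citations=1200, papers=200,
                                  field_citation_potential=2.0)
print(math_journal, biology_journal)  # 3.0 3.0
```

The biology journal has four times the raw citations per paper, yet the two scores match once each is divided by its field's citation potential. That normalization is what makes comparison across disciplines possible.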
Pros and cons of SNIP
• Can compare SNIP scores across disciplines
• Aggregate of a journal, so larger journals automatically have higher scores than smaller journals
“Prestige”
• SJR: SCImago Journal Rank
• Uses Scopus data
• Measures “current average prestige per paper”
• Prestige factors include: number of journals in the Scopus database, number of articles in Scopus from this journal, citation count, eigenvector analysis of important citing references, corrections for self-citations, and normalization by the number of significant works published in the journal
Pros and Cons of SJR
• Corrects for self citations
• Correlated to JIF
• Scores can be compared across disciplines
• Web version provides data on countries
• Three year window not good for social sciences
• http://www.scimagojr.com/
Examples in Scopus
METRICS FOR SCHOLARS, AUTHORS, INSTITUTIONS, ETC.
Citation count
• Number of times cited within a given time period
– Journals, Authors, Articles, etc.
• Does not take into account
– Materials not included in citation database
– Self citations
– Variations in citation patterns/rates
Citation count
• Citation counts will vary depending on which database you use
• It is very difficult to get a complete count of all citing references
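One way to picture why counts differ by database is to treat each database's citing references as a set. The sketch below is purely illustrative: the reference identifiers are invented, and real records would first need to be matched and deduplicated across databases.

```python
# Hypothetical citing references found for the same article in each database.
web_of_science = {"ref01", "ref02", "ref03", "ref04"}
scopus = {"ref02", "ref03", "ref05"}
google_scholar = {"ref03", "ref04", "ref06", "ref07"}

# Each database reports a different citation count...
counts = {
    "Web of Science": len(web_of_science),
    "Scopus": len(scopus),
    "Google Scholar": len(google_scholar),
}

# ...and none of them matches the combined, deduplicated total.
all_citing_references = web_of_science | scopus | google_scholar
found_in_all_three = web_of_science & scopus & google_scholar

print(counts)
print(len(all_citing_references), sorted(found_in_all_three))
```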
H-index
• Scopus, Google Scholar, WoS?
• Meant to account for differences in citation patterns (i.e., “one-hit wonders” vs. consistent record of scholarship)
“A scientist has index h if h of his/her Np papers have at least h citations each and the other (Np-h) papers have no more than h citations each” (Hirsch 2005)
[Chart: H-index for Scholar A and Scholar B. X-axis: Article Number (1 to 7); y-axis: Number of Citations (0 to 30)]
H-index Example
Scholar A       Scholar B
10              27
10              12
9               5
8               4
7               4
6               2
6               2
56 citations    56 citations
6 h-index       4 h-index
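The h-index values in this example can be reproduced with a short Python sketch. The citation counts are the ones from the Scholar A / Scholar B example; the function name is only illustrative.

```python
def h_index(citations):
    """Largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank
        else:
            break
    return h


scholar_a = [10, 10, 9, 8, 7, 6, 6]
scholar_b = [27, 12, 5, 4, 4, 2, 2]

print(sum(scholar_a), h_index(scholar_a))  # 56 6
print(sum(scholar_b), h_index(scholar_b))  # 56 4
```

Both scholars have 56 citations in total, but Scholar B's count is dominated by one heavily cited article, so the h-index is lower.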
Variations on the H-index
• G-index (Egghe 2006): gives greater weight to highly cited articles
– The top g articles have received a combined total of at least g² citations
• E-index (Zhang 2009): gives greater weight to highly cited articles
– The square root of the surplus of citations in the h-set beyond h²
• Contemporary h-index (Sidiropoulos et al. 2006): gives greater weight to newer articles
– “parameterized”: current year, citations count 4 times; four years ago, citations count 1 time; 6 years ago, citations count 4/6 times
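A minimal sketch of the g-index and e-index, applied to the Scholar A and Scholar B citation counts from the H-index example. The function names are illustrative, and the g-index here is capped at the number of papers, which is an assumption of this sketch.

```python
import math


def h_index(citations):
    """Largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank
        else:
            break
    return h


def g_index(citations):
    """Largest g such that the top g papers together have at least g*g citations."""
    ranked = sorted(citations, reverse=True)
    running_total = 0
    g = 0
    for rank, count in enumerate(ranked, start=1):
        running_total += count
        if running_total >= rank * rank:
            g = rank
    return g


def e_index(citations):
    """Square root of the citations in the h-set beyond the h*h needed for h."""
    ranked = sorted(citations, reverse=True)
    h = h_index(citations)
    surplus = sum(ranked[:h]) - h * h
    return math.sqrt(surplus)


scholar_a = [10, 10, 9, 8, 7, 6, 6]
scholar_b = [27, 12, 5, 4, 4, 2, 2]

print(g_index(scholar_a), round(e_index(scholar_a), 2))  # 7 3.74
print(g_index(scholar_b), round(e_index(scholar_b), 2))  # 7 5.66
```

Scholar B's highly cited article lowers the h-index but raises the e-index, because the e-index counts the surplus citations that the h-index ignores.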
Variations on the H-index
• Individual h-index (Batista et al. 2006): accounts for co-authors
– Divides the h-index by the average number of authors per paper
• Alternative individual h-index (Harzing): accounts for co-authors
– Normalizes citation counts: divides each paper’s number of citations by its number of authors and then computes the h-index
• Another alternative individual h-index (Schreiber 2006): accounts for co-authors
– Counts papers fractionally instead of dividing by # of authors; keeps full citation count
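The first two co-author corrections can be sketched as below. The citation counts are Scholar A's from the H-index example; the author counts are hypothetical. The sketch also assumes that the Batista et al. average is taken over the papers that contribute to the h-index.

```python
def h_index(values):
    """Largest h such that h items have a value of at least h each."""
    ranked = sorted(values, reverse=True)
    h = 0
    for rank, value in enumerate(ranked, start=1):
        if value >= rank:
            h = rank
        else:
            break
    return h


def individual_h_index(citations, author_counts):
    """h-index divided by the mean number of authors of the h-set papers
    (assumed reading of the Batista et al. variant)."""
    h = h_index(citations)
    if h == 0:
        return 0.0
    papers = sorted(zip(citations, author_counts),
                    key=lambda paper: paper[0], reverse=True)
    mean_authors = sum(authors for _, authors in papers[:h]) / h
    return h / mean_authors


def normalized_individual_h_index(citations, author_counts):
    """Divide each paper's citations by its number of authors,
    then compute the h-index of the normalized counts."""
    normalized = [c / a for c, a in zip(citations, author_counts)]
    return h_index(normalized)


citations = [10, 10, 9, 8, 7, 6, 6]   # Scholar A
author_counts = [1, 2, 3, 2, 1, 3, 2]  # hypothetical numbers of authors

print(individual_h_index(citations, author_counts))             # 3.0
print(normalized_individual_h_index(citations, author_counts))  # 4
```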
Variations on the H-index
• Age weighted citation rate and AW index (Jin 2007): accounts for variations in citation patterns over time
– AWCR = the sum of all age-weighted citation counts (each paper’s citations divided by its age) over all papers
– AW-index = the square root of the AWCR
– Per-author AWCR: AWCR divided by number of authors for each paper
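A minimal sketch of the age-weighted measures. It assumes the Publish or Perish convention: AWCR is the plain sum, over all papers, of citations divided by paper age, and the AW-index is its square root. It also assumes a paper published in the current year has an age of 1. The three papers are hypothetical.

```python
import math


def paper_age(publication_year, current_year):
    """Age in years; a paper from the current year has age 1 (assumption)."""
    return current_year - publication_year + 1


def awcr(papers, current_year):
    """Sum over all papers of citations divided by paper age."""
    return sum(citations / paper_age(year, current_year)
               for citations, year, _authors in papers)


def aw_index(papers, current_year):
    """Square root of the AWCR."""
    return math.sqrt(awcr(papers, current_year))


def per_author_awcr(papers, current_year):
    """Each paper's age-weighted count divided by its number of authors."""
    return sum(citations / paper_age(year, current_year) / authors
               for citations, year, authors in papers)


# Hypothetical papers: (citations, publication year, number of authors)
papers = [(10, 2012, 1), (20, 2011, 2), (30, 2008, 3)]

print(awcr(papers, 2012))                # 26.0
print(round(aw_index(papers, 2012), 2))  # 5.1
print(per_author_awcr(papers, 2012))     # 17.0
```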
Publish or Perish
• Google Scholar citation information
• Interdisciplinary topics, fields relying on conference papers or reports
• Greatest variety of metrics
• Dirty data
• Unverified data
• Nonscholarly sources
Differences in H-index
Scopus vs. Google Scholar (PoP)
The Case of Eugene Garfield
PoP Interface
PoP Search for Garfield
An aside: Why I don’t like PoP for Journal Metrics
Scopus Search for Garfield
Citation overview
Link to graphic information next to citation overview
Google Scholar Citations
http://scholar.google.com/intl/en/scholar/citations.html
Microsoft Academic
http://academic.research.microsoft.com/
Considerations
• Don’t measure an individual article’s impact by the metrics for the entire journal
• Do I need a comparison within a discipline or across disciplines?
• Does the citation pattern matter or just the count?
• Does the database being used cover my subject as thoroughly as possible?
• To what degree does my subject area rely on non-journal scholarly publications?
• Not all citing references are positive!
Questions??
Elaine Lasda [email protected]