bibliometrics and preference modelling

Bibliometrics and preference modelling Thierry Marchant Ghent University

Post on 22-Feb-2016

TRANSCRIPT

Page 1: Bibliometrics and preference modelling

Bibliometrics and preference modelling

Thierry Marchant, Ghent University

Page 2: Bibliometrics and preference modelling

Some academic rankings

Page 3: Bibliometrics and preference modelling
Page 4: Bibliometrics and preference modelling
Page 5: Bibliometrics and preference modelling
Page 6: Bibliometrics and preference modelling

Top 5% Authors, as of April 2008

Average Rank Score

Rank  Author                 Nb W   Nb C   Sc C  ANb C  ASc C   Nb P   Sc P  ANb P      h
1     R. J. Barro              52      1      2      1      1     26     11     15      2
2     J. E. Stiglitz            5      5      6      4      9      3      2      2     14
3     A. Shleifer              21      2      1      6      8     10      5     54      1
4     J. J. Heckman            12      4      4      3      3      4      3      5      3
5     P. C. B. Phillips         2     30     69     26     59      2      1      3     52
6     R. E. Lucas Jr.         737     10      8      2      2    156     60     61     33
7     M. L. Gertler           192      3      3      9     11    208     95    270     11
8     M. S. Feldstein          19     71     39     40     23     67     21     25     79
9     E. C. Prescott          109      9      5     11      4    127     82    152     23
10    J. Tirole                14     20     24     21     27      6      4      8      4

Page 7: Bibliometrics and preference modelling
Page 8: Bibliometrics and preference modelling

Outline

• Why rank ?

• Which attributes?

• Some popular rankings.

• How can we motivate a ranking ?

• The axiomatic approach.

• Comparing peers and apples

Page 9: Bibliometrics and preference modelling

Why rank ?

Page 10: Bibliometrics and preference modelling

Why rank universities ?
• To choose one for studying (bachelor student).
• To attract good students (good university).
• To obtain subsidies (good university).
• To allocate subsidies (government).
• To allocate students to various universities according to their score at an exam (government).
• ...

Page 11: Bibliometrics and preference modelling

Why rank departments ?
• To choose one for studying (doctoral student).
• To attract good students (good department).
• To obtain subsidies (good department).
• To allocate subsidies (government).
• To allocate students to various departments according to their score at an exam (government).
• ...

Page 12: Bibliometrics and preference modelling

Why rank scientists ?
• To determine the salary (university).
• To award a scientific distinction (scientific society).
• To hire a new scientist (university).
• To choose a thesis director (student).
• To evaluate a department or university (...).
• To evaluate a journal (...).
• To allocate subsidies (government).
• ...

Page 13: Bibliometrics and preference modelling

Why rank journals ?
• To choose one for publishing (scientist).
  – To maximize the dissemination of one’s results.
  – To maximize one’s value.
• To evaluate a scientist (...).
• To evaluate a department (...).
• To evaluate a university (...).
• To improve one’s image (good publisher).
• ...

Page 14: Bibliometrics and preference modelling

Why rank articles ?
• To select articles (scientist).
• To evaluate a scientist (...).
• To evaluate a department (...).
• To evaluate a university (...).
• To evaluate a journal (...).
• ...

Page 15: Bibliometrics and preference modelling

Focus in this talk

• Rankings of scientists
• Rankings of departments
• Rankings of universities
• Rankings of journals
• Rankings of articles

Page 16: Bibliometrics and preference modelling

Which attributes ?

Page 17: Bibliometrics and preference modelling

Many relevant attributes

Quality
– Evaluation by peers
– Quality of the journals
– Citations (#, authors, journals, +/-)
– Coauthors
– Patents
– Awards
– Budget

Quantity
– Number of papers
– Number of books
– Number of pages
– Coauthors (#)
– Number of patents
– Citations (#)
– Awards
– Budget
– Number of thesis students

Various
– Age
– Career length
– Country
– Nationality
– Discipline
– Century
– University

Page 18: Bibliometrics and preference modelling

Bibliometric attributes

Quality
– Evaluation by peers
– Quality of the journals
– Citations (#, authors, journals, +/-)
– Coauthors
– Patents
– Awards
– Budget

Quantity
– Number of papers
– Number of books
– Number of pages
– Coauthors
– Number of patents
– Citations (#)
– Awards
– Budget
– Number of thesis students

Various
– Age
– Career length
– Country
– Nationality
– Discipline
– Century
– University

Page 19: Bibliometrics and preference modelling

Page 20: Bibliometrics and preference modelling

Bibliometric attributes

Why use bibliometric attributes ?
• Cheap
• Objective ?
• Reliable ?

Page 21: Bibliometrics and preference modelling

Some popular rankings of scientists

Page 22: Bibliometrics and preference modelling

Some popular rankings
• Number of publications
• Total number of citations
• Maximal number of citations
• Number of publications with at least a citations (for a fixed threshold a)
• Average number of citations
• The same ones weighted by
  – Number of authors
  – Number of pages
  – Impact factor
• The same ones corrected for age
• h-index, g-index, hc-index, hI-index, R-index, A-index, …

Page 23: Bibliometrics and preference modelling

The h-index
• Published in 2005 by physicist J. E. Hirsch.
• 462 (1267) citations in March 2009 (May 2013).
• Adopted by Web of Science (ISI, Thomson).
• The h-index of a scientist is the largest natural number x such that at least x of his/her papers have at least x citations each.

[Figure: distribution of the number of citations, both axes running from 0 to 16; in the example shown, h-index = 6.]
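The definition of the h-index can be sketched in a few lines of Python. This is only an illustration: the citation counts below are hypothetical (they are not the data behind the figure) and were chosen so that the result is 6.

```python
def h_index(citations):
    """Largest x such that at least x papers have at least x citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for position, cites in enumerate(ranked, start=1):
        if cites >= position:
            h = position  # the first `position` papers all have >= position citations
        else:
            break
    return h


# Hypothetical citation counts, one entry per paper.
example = [16, 12, 9, 8, 7, 6, 4, 2, 1, 0]
print(h_index(example))  # 6
```

Sorting first makes the test "at least x papers with at least x citations" a single pass over the ranked list.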

Page 24: Bibliometrics and preference modelling

How to justify a ranking ?

• THE true and universal ranking does not exist.

Page 25: Bibliometrics and preference modelling

How to justify a ranking ?

• THE true and universal ranking does not exist.

Two departments: one has 50 scientists and 2000 citations, the other has 3 scientists and 180 citations.

Page 26: Bibliometrics and preference modelling

How to justify a ranking ?

• THE true and universal ranking does not exist.

• If one knows the true ranking, one may compute some correlation between the true one and another one.

Page 27: Bibliometrics and preference modelling

How to justify a ranking ?

• THE true and universal ranking does not exist.

• If one knows the true ranking, one may compute some correlation between the true one and another one.

Assessing the Accuracy of the h- and g-Indexes for Measuring Researchers’ Productivity, Journal of the American Society for Information Science and Technology, 64(6):1224–1234, 2013.

“The analysis quantifies the shifts in ranks that occur when researchers’ productivity rankings by simple indicators such as the h- or g-indexes are compared with those by more accurate FSS.”

Page 28: Bibliometrics and preference modelling

How to justify a ranking ?

• THE true and universal ranking does not exist.

• If one knows the true ranking, one may compute some correlation between the true one and another one.

• Assume a law linking the numbers of papers and citations to the quality of the scientist (unobserved variable) and his age. This law may be probabilistic. Then derive an estimate of the quality of a scientist from his data (papers and citations).

Page 29: Bibliometrics and preference modelling

How to justify a ranking ?

• THE true and universal ranking does not exist.

• If one knows the true ranking, one may compute some correlation between the true one and another one.

• Assume a law linking the numbers of papers and citations to the quality of the scientist (unobserved variable) and his age. This law may be probabilistic. Then derive an estimate of the quality of a scientist from his data (papers and citations).

• Analyze the mathematical properties of rankings.

Page 30: Bibliometrics and preference modelling

Characterization of scoring rules

Page 31: Bibliometrics and preference modelling

Definitions

• Set of journals : J = { j, k, l, …}

• Paper: a paper in journal j with x citations and a coauthors is represented by the triplet (j,x,a).

• Scientist: mapping f from J×N×N to N. The number f(j,x,a) represents the number of publications of author f in journal j with x citations and a coauthors.

• Set of scientists: set X of all mappings from J×N×N to N such that Σ_{j∈J} Σ_{x∈N} Σ_{a∈N} f(j,x,a) is finite.

• Bibliometric ranking : weak order ≥ on X (complete and transitive relation).

Page 32: Bibliometrics and preference modelling

Scoring rules

• Scoring rule : a bibliometric ranking is a scoring rule if there exists a real-valued mapping u defined on J×N×N such that f ≥ g iff

Σ_j Σ_x Σ_a f(j,x,a) u(j,x,a) ≥ Σ_j Σ_x Σ_a g(j,x,a) u(j,x,a)

• Examples :

  – u(j,x,a) = 1 : number of papers

  – u(j,x,a) = x : number of citations

  – u(j,x,a) = x/(a+1) : number of citations weighted by the number of authors

  – u(j,x,a) = IF(j) : number of papers weighted by impact factor

• …
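The four example scoring rules can be sketched as follows. A scientist is stored as a multiset of triplets (journal, citations, coauthors), mirroring the mapping f. The journal names, impact factors and the two scientists are hypothetical, introduced only for illustration.

```python
from collections import Counter

# Hypothetical impact factors for two made-up journals.
IMPACT_FACTOR = {"J1": 2.5, "J2": 0.8}

# Each rule is a mapping u(j, x, a): journal j, x citations, a coauthors.
RULES = {
    "papers": lambda j, x, a: 1,
    "citations": lambda j, x, a: x,
    "citations per author": lambda j, x, a: x / (a + 1),
    "impact factor": lambda j, x, a: IMPACT_FACTOR[j],
}


def score(scientist, u):
    """Sum of f(j, x, a) * u(j, x, a) over all triplets (j, x, a)."""
    return sum(count * u(j, x, a) for (j, x, a), count in scientist.items())


# f: two papers in J1 (10 citations, 1 coauthor each), one solo paper in J2.
f = Counter({("J1", 10, 1): 2, ("J2", 3, 0): 1})
# g: a single paper in J2 with 30 citations and 2 coauthors.
g = Counter({("J2", 30, 2): 1})

for name, u in RULES.items():
    print(name, score(f, u), score(g, u))
```

With these made-up data, f is ranked above g by the number of papers and by citations per author, while g is ranked above f by total citations: different choices of u give different rankings.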

Page 33: Bibliometrics and preference modelling

Axioms

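• Independence: for all f, g in X, all j in J, all x, a in N, we have f ≥ g iff f + 1_{j,x,a} ≥ g + 1_{j,x,a}, where 1_{j,x,a} denotes the scientist with exactly one paper, published in journal j with x citations and a coauthors.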

Page 34: Bibliometrics and preference modelling

Axioms

• Independence: for all f, g in X, all j in J, all x, a in N, we have f ≥ g iff f + 1_{j,x,a} ≥ g + 1_{j,x,a}.

[Figure: f > g; after adding one paper in j with x citations and a coauthors to both f and g, the comparison is unchanged.]

Page 35: Bibliometrics and preference modelling

Axioms

• Archimedeanness: for all f, g, h, e in X with f > g, there is a natural number n such that e + nf ≥ h + ng.

Page 36: Bibliometrics and preference modelling

Axioms

• Archimedeanness: for all f, g, h, e in X with f > g, there is a natural number n such that e + nf ≥ h + ng.

[Figure: e < h; f (10 papers with 20 citations) is added repeatedly to e and g (1 paper with 1 citation) is added repeatedly to h, until e + nf ≥ h + ng.]

Page 37: Bibliometrics and preference modelling

Axioms

• Independence: for all f, g in X, all j in J, all x, a in N, we have f ≥ g iff f + 1_{j,x,a} ≥ g + 1_{j,x,a}.

Not satisfied by the max # of citations or h-index.

Reversal with the h-index when adding 2 papers.

• Archimedeanness: for all f, g, h, e in X with f > g, there is an integer n such that e + nf ≥ h + ng.

Not satisfied by the max # of citations, h-index, lexicographic ranking.
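A small sketch of how the h-index can violate Independence. The two scientists below are hypothetical and are described only by their citation counts (journals and coauthors are ignored). Adding the same paper to both first turns a strict preference into a tie, and adding it a second time reverses the ranking.

```python
def h_index(citations):
    """Largest x such that at least x papers have at least x citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for position, cites in enumerate(ranked, start=1):
        if cites >= position:
            h = position
        else:
            break
    return h


f = [2, 2]   # hypothetical scientist: two papers, 2 citations each
g = [10]     # hypothetical scientist: one paper, 10 citations
extra = 10   # citations of the identical paper added to both

before = (h_index(f), h_index(g))                              # (2, 1): f above g
after_one = (h_index(f + [extra]), h_index(g + [extra]))       # (2, 2): tie
after_two = (h_index(f + [extra] * 2), h_index(g + [extra] * 2))  # (2, 3): g above f

print(before, after_one, after_two)
```

The tie after one added paper already contradicts the "iff" in Independence; the second added paper produces the full reversal.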

Page 38: Bibliometrics and preference modelling

Result

• Theorem : A bibliometric ranking satisfies Independence and Archimedeanness iff it is a scoring rule. Furthermore u is unique up to a positive affine transformation.

• Proof:

• (X, +, ≥) is an extensive measurement structure as in [Luce, 2000].

• (X, +) is a cancellative (f+g = f+h ⇒ g = h) monoid. It can be extended to a group (X’, +) by the Grothendieck construction. (X’, +, ≥) is an Abelian and Archimedean linearly ordered group. It is isomorphic to a subgroup of the ordered group of real numbers (Hölder).

Page 39: Bibliometrics and preference modelling

Special case: u(j,x,a) = x /(a+1).

• Transfer: for all j in J, all x, y, a in N, we have 1_{j,x,a} + 1_{j,y+1,a} ~ 1_{j,x+1,a} + 1_{j,y,a} (u affine in # citations).

• Condition Zero: for all j in J, all a in N, there is f in X such that f + 1_{j,0,a} ~ f (u linear in # citations).

• Journals Do Not Matter: for all j, j’ in J, all a, x in N, 1_{j,x,a} ~ 1_{j’,x,a} (u independent of journal).

• No Reward for Association: for all j in J, all m, x in N with m > 1, 1_{j,x,0} ~ m·1_{j,x,m-1} (u inversely proportional to # authors).
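As a sanity check, one can verify numerically that u(j,x,a) = x/(a+1) is compatible with the four conditions on a small grid of values. This is a finite spot check under hypothetical journal names, not a proof; exact fractions are used to avoid rounding issues.

```python
from fractions import Fraction
from itertools import product


def u(j, x, a):
    """Citations divided by the number of authors (a coauthors + 1)."""
    return Fraction(x, a + 1)


journals = ["J1", "J2"]  # hypothetical journal names
small = range(6)         # small grid of citation and coauthor counts

# Transfer: moving one citation from one paper to another leaves the score unchanged.
transfer = all(
    u(j, x, a) + u(j, y + 1, a) == u(j, x + 1, a) + u(j, y, a)
    for j, x, y, a in product(journals, small, small, small)
)

# Condition Zero: an uncited paper adds nothing to the score.
condition_zero = all(u(j, 0, a) == 0 for j, a in product(journals, small))

# Journals Do Not Matter: the score does not depend on the journal.
journals_do_not_matter = all(
    u(j, x, a) == u(k, x, a)
    for j, k, x, a in product(journals, journals, small, small)
)

# No Reward for Association: one solo paper equals m papers with m authors each.
no_reward = all(
    u(j, x, 0) == m * u(j, x, m - 1)
    for j, x, m in product(journals, small, range(2, 6))
)

print(transfer, condition_zero, journals_do_not_matter, no_reward)
```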

Page 40: Bibliometrics and preference modelling

Characterization of conjugate scoring rules for scientists and departments

Page 41: Bibliometrics and preference modelling

Introduction

• Consider two departments each consisting of two scientists. The scientists in department A both have 4 papers, each one cited 4 times. The scientists in department B both have 3 papers, each one cited 6 times.

• Both scientists in department A have an h-index of 4 and are therefore better than both scientists in department B, with an h-index of 3. Yet, department A has an h-index of 4 and is therefore worse than department B with an h-index of 6. Hence, the “best” department contains the “worst” scientists.
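The numbers in this example can be checked with a short sketch. It assumes that the h-index of a department is computed on the pooled papers of its members, which is the reading that reproduces the values 4 and 6 quoted above.

```python
def h_index(citations):
    """Largest x such that at least x papers have at least x citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for position, cites in enumerate(ranked, start=1):
        if cites >= position:
            h = position
        else:
            break
    return h


def department_h_index(department):
    """h-index of the pooled papers of all scientists (assumed definition)."""
    return h_index([cites for scientist in department for cites in scientist])


# Each scientist is a list of citation counts, one entry per paper.
department_a = [[4] * 4, [4] * 4]  # two scientists, 4 papers cited 4 times each
department_b = [[6] * 3, [6] * 3]  # two scientists, 3 papers cited 6 times each

print([h_index(s) for s in department_a], department_h_index(department_a))  # [4, 4] 4
print([h_index(s) for s in department_b], department_h_index(department_b))  # [3, 3] 6
```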

Page 42: Bibliometrics and preference modelling

Definitions

• Scientist: mapping f from N to N. The number f(x) represents the number of publications of scientist f with x citations.

• Set of scientists: set X of all mappings from N to N such that Σ_{x∈N} f(x) is finite.

• Ranking of scientists : weak order ≥_s on X.

• Department : vector of scientists

• Set of all departments denoted by Y.

• Ranking of departments : weak order ≥_d on Y.

Page 43: Bibliometrics and preference modelling

Scoring rules

• Scoring rule : a ranking of scientists is a scoring rule if there exists a real-valued mapping u defined on N such that f ≥_s g iff

Σ_x f(x) u(x) ≥ Σ_x g(x) u(x)

• Scoring rule : a ranking of departments is a scoring rule if there exists a real-valued mapping v defined on N such that (f_1, …, f_k) ≥_d (g_1, …, g_l) iff

Σ_i Σ_x f_i(x) v(x) ≥ Σ_j Σ_x g_j(x) v(x)

• Conjugate scoring rules : the rankings of scientists and of departments are conjugate scoring rules if u = v.
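A minimal sketch of a pair of conjugate scoring rules, applied to the two departments of the introductory example. The choice u(x) = v(x) = x (total citations) is an assumption made for illustration; any common mapping would do.

```python
from collections import Counter


def u(x):
    """Common weight of a paper with x citations (assumed: the citation count)."""
    return x


def scientist_score(f):
    """Sum over x of f(x) * u(x), where f maps citations to number of papers."""
    return sum(count * u(x) for x, count in f.items())


def department_score(department):
    """Same mapping u applied to every paper of every member."""
    return sum(scientist_score(f) for f in department)


# Department A: two scientists with 4 papers cited 4 times each.
department_a = [Counter({4: 4}), Counter({4: 4})]
# Department B: two scientists with 3 papers cited 6 times each.
department_b = [Counter({6: 3}), Counter({6: 3})]

print([scientist_score(f) for f in department_a], department_score(department_a))
print([scientist_score(f) for f in department_b], department_score(department_b))
```

Because the same mapping is used at both levels, the department with the better scientists (B, with 18 against 16 per scientist) is also the better department (36 against 32), unlike with the h-index.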

Page 44: Bibliometrics and preference modelling

Axioms

• Consistency: if f_i ≥_s g_i, for i = 1, … , k, then (f_1, …, f_k) ≥_d (g_1, …, g_k). In addition, if f_i >_s g_i, for some i, then (f_1, …, f_k) >_d (g_1, …, g_k).

• Totality: if (f_1, …, f_k) and (g_1, …, g_l) are such that Σ_i f_i = Σ_j g_j, then (f_1, …, f_k) ~_d (g_1, …, g_l).

• Dummy : (f_1, …, f_k) ~_d (f_1, …, f_k, 0).

Page 45: Bibliometrics and preference modelling

Result

• Theorem : ≥_s and ≥_d satisfy Consistency, Totality, Dummy and Archimedeanness of ≥_s iff they are conjugate scoring rules. Furthermore u is unique up to a positive affine transformation.

Page 46: Bibliometrics and preference modelling

Discussion

Page 47: Bibliometrics and preference modelling

Discussion

• Axiomatic analysis of more rankings is needed.

• Axiomatic analysis of indices is different but also relevant.

• Consistency is important (e.g. h-index for scientists and IF for journals).

Page 48: Bibliometrics and preference modelling

Literature

• Scientometrics

• Journal of Informetrics

• Journal of the American Society for Information Science and Technology

Page 49: Bibliometrics and preference modelling

Comparing peers and apples

Page 50: Bibliometrics and preference modelling

Comparing scientists of different ages

[Figure: two scientists of different ages, one with h-index = a and the other with h-index = b, where a > b.]

Page 51: Bibliometrics and preference modelling

Comparing scientists of different ages

• Instead of the h-index, use an index that is independent of time.

• For instance, the average number of citations per paper, i.e. Σ_{x∈N} x f(x) / Σ_{x∈N} f(x).

• Problem: suppose f has one paper with 50 citations and g has 10 papers with 40 citations.

• Divide the h-index by the length of the career.

• Problem: the h-index is not a linear function of time.
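The problem with the average number of citations per paper can be made concrete with a short sketch. It assumes that each of g's 10 papers has 40 citations; the slide does not say whether 40 is a per-paper or a total count, so this reading is an assumption.

```python
from collections import Counter


def average_citations(f):
    """Sum of x * f(x) divided by sum of f(x); 0.0 for a scientist without papers."""
    papers = sum(f.values())
    if papers == 0:
        return 0.0
    return sum(x * count for x, count in f.items()) / papers


f = Counter({50: 1})   # one paper with 50 citations
g = Counter({40: 10})  # ten papers, assumed to have 40 citations each

print(average_citations(f), average_citations(g))  # 50.0 40.0
```

Under this reading f is ranked above g, although g has ten well-cited papers and f only one.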

Page 52: Bibliometrics and preference modelling

Comparing across disciplines
• The average number of citations per paper is 80 times larger in medicine than in mathematics.
• Any comparison of scientists across disciplines using an index based on citations is therefore flawed.
• Field normalization: for a given index, compute the distribution of the index in each field (medicine, physics, economics, mathematics, literature, …). Then define the normalized index of a scientist as his/her percentile.
• Problem: the definition of a field is arbitrary. The average number of citations per paper is 20 times larger in physics than in mathematics, but only 2-3 times larger in theoretical physics.
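Field normalization by percentile can be sketched as follows. The index values per field are hypothetical, and the percentile convention used here (share of scientists in the field with an index at most equal to the given value) is one possible choice among several.

```python
from bisect import bisect_right


def percentile(value, field_values):
    """Percentage of scientists in the field whose index is <= value."""
    ranked = sorted(field_values)
    return 100.0 * bisect_right(ranked, value) / len(ranked)


# Hypothetical index values (for instance h-indices) observed in two fields.
FIELD_INDEX = {
    "medicine": [5, 12, 20, 28, 35, 41, 50, 60],
    "mathematics": [1, 2, 3, 5, 6, 8, 10, 14],
}

physician = percentile(20, FIELD_INDEX["medicine"])         # raw index 20
mathematician = percentile(10, FIELD_INDEX["mathematics"])  # raw index 10

print(physician, mathematician)  # 37.5 87.5
```

With these made-up distributions, the mathematician with the lower raw index ends up with the higher normalized index.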

Page 53: Bibliometrics and preference modelling

Source field normalization

• Papers in medicine are often cited. This implies that they have long reference lists. Papers in mathematics have short reference lists.

• Instead of defining disciplines or fields, use the length of the reference list to normalize. Thus, divide the number of citations received by a paper by the length of the reference list.
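As a rough illustration of this normalization, the sketch below divides the citations received by a paper by the length of its own reference list, which is a literal reading of the sentence above. The two papers are hypothetical.

```python
def source_normalized_citations(citations, reference_list_length):
    """Citations received divided by the length of the paper's reference list."""
    if reference_list_length <= 0:
        raise ValueError("reference list length must be positive")
    return citations / reference_list_length


# Hypothetical papers: (citations received, length of reference list).
medical_paper = (120, 60)
mathematics_paper = (12, 8)

print(source_normalized_citations(*medical_paper))      # 2.0
print(source_normalized_citations(*mathematics_paper))  # 1.5
```

The raw counts differ by a factor of 10, while the normalized values are of the same order of magnitude.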

Page 54: Bibliometrics and preference modelling

Distributions

Page 55: Bibliometrics and preference modelling
Page 56: Bibliometrics and preference modelling

Lotka’s law

Proportion of scientists with n papers : F(n) = C/n^a

with a ≃ 2 and C depending on the field.
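If F is read as a probability distribution over n = 1, 2, 3, …, the constant C is determined by the exponent a through normalization: the proportions must sum to 1. The sketch below approximates C by a truncated sum, taking a = 2 (the classical value for Lotka's law) as an assumed exponent; the truncation point is arbitrary.

```python
import math


def lotka_constant(a, n_max=100_000):
    """Approximate C such that the sum of C / n**a over n = 1..n_max equals 1."""
    return 1.0 / sum(n ** -a for n in range(1, n_max + 1))


def lotka_proportion(n, a, c):
    """Proportion of scientists with exactly n papers under F(n) = C / n**a."""
    return c / n ** a


a = 2  # assumed exponent
c = lotka_constant(a)

print(c)                          # close to 6 / pi**2
print(lotka_proportion(1, a, c))  # share of scientists with a single paper
print(6 / math.pi ** 2)
```

For a = 2 the infinite sum is π²/6, so C tends to 6/π², roughly 0.61: about 61% of scientists would have exactly one paper.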

Page 57: Bibliometrics and preference modelling

Non universal power law

Peterson, Pressé and Dill, Proceedings of the National Academy of Sciences, 107, 2010.

Direct citations : the probability that a new paper will randomly cite paper A is P_direct = 1/N, with N the total number of published papers.

Indirect citations : the author of the new paper may first find a paper B and learn of paper A via B’s reference list. P_indirect = k/(N n), with k the number of existing citations to A and n the average length of the reference list.
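The two mechanisms can be compared with a short sketch. It reads the indirect probability as k divided by the product of N and n; the values of N and n below are hypothetical.

```python
def p_direct(total_papers):
    """Probability that a new paper cites paper A directly: 1 / N."""
    return 1.0 / total_papers


def p_indirect(k, total_papers, avg_references):
    """Probability of citing A via another paper's reference list: k / (N * n)."""
    return k / (total_papers * avg_references)


N = 1_000_000  # hypothetical total number of published papers
n = 20         # hypothetical average length of a reference list

for k in (5, 20, 100):
    ratio = p_indirect(k, N, n) / p_direct(N)
    print(k, ratio)  # the ratio equals k / n
```

The ratio of the two probabilities is k/n: the indirect mechanism dominates once a paper has more citations than the average length of a reference list, so well-cited papers attract further citations faster.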

Page 58: Bibliometrics and preference modelling

Non universal power law (ctd)

Fraction of the N papers with k citations :