Is peer review any good? A quantitative analysis of peer review
DESCRIPTION
This is a presentation of the paper in which we focus on the analysis of peer reviews and reviewers' behavior in conference review processes. We report on the development, definition and rationale of a theoretical model for peer review processes, to support the identification of appropriate metrics for assessing the main properties of such processes. We then apply the proposed model and analysis framework to data sets of reviews of conference papers. We discuss in detail the results, their implications, and their possible use toward improving the analyzed peer review processes.
TRANSCRIPT
Is peer review any good? A quantitative analysis of peer review
Fabio Casati, Maurizio Marchese, Azzurra Ragone, Matteo Turrini
University of Trento
http://eprints.biblio.unitn.it/archive/00001654/01/techRep045.pdf
Initial Goals
• Understand how well peer review works
• Understand how to improve the process
• Metrics + analysis (refer to liquid doc)
• Focus only on the gatekeeping aspect
“Not everything that can be counted counts, and not everything that counts can be counted.” -- Albert Einstein
Metric Dimensions
• Dimensions: Quality, Fairness, Efficiency
• Metrics considered: Kendall distance, divergence, disagreement, biases, robustness, unbiasing, effort vs. quality (and effort-invariant alternatives)
Data Sets
• Around 7,000 reviews from various conferences in the CS field (more on the way)
  – Large, medium, small
  – Some with “young reviewers”
Is peer review effective? Does it work?
• And what does it mean to be effective? HOW do we measure it?
• Easier to measure/detect “problems”
• Peer review ranking vs. ideal ranking
Comparing rankings
[Figure: two rankings of the same set of papers (IDs such as 28, 45, 89, 33, 17) shown side by side for comparison]
Ideal ranking (?)
• Success in a subsequent phase
• Citations
Suggested reading: Positional effect on citation and readership in arXiv, by Haque and Ginsparg
Comparing rankings
• Kendall τ distance between the two rankings
• Divergence Div(t, N): compares the top t of the two rankings of N papers (slide example: t = 3, N = 10); both measures are sketched below
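As an illustration of these two measures, here is a minimal Python sketch (not taken from the paper): a normalized Kendall distance over all paper pairs, and a top-t divergence read as the fraction of the top t papers under one ranking that fall outside the top t under the other. The exact definition of Div(t, N) in the paper may differ, and the paper IDs below are invented.

```python
from itertools import combinations

def kendall_tau_distance(rank_a, rank_b):
    """Normalized Kendall distance: fraction of paper pairs that the two
    rankings order differently (0 = identical order, 1 = reversed)."""
    pos_a = {paper: i for i, paper in enumerate(rank_a)}
    pos_b = {paper: i for i, paper in enumerate(rank_b)}
    discordant = sum(
        1
        for x, y in combinations(rank_a, 2)
        if (pos_a[x] - pos_a[y]) * (pos_b[x] - pos_b[y]) < 0
    )
    n = len(rank_a)
    return discordant / (n * (n - 1) / 2)

def divergence(rank_a, rank_b, t):
    """Div(t, N), read here as the fraction of the top-t papers of rank_a
    that do not appear in the top-t of rank_b (assumed definition)."""
    return len(set(rank_a[:t]) - set(rank_b[:t])) / t

# Invented paper IDs: peer-review ranking vs. an "ideal" (e.g. citation) ranking.
review_rank = [28, 45, 89, 33, 17, 67, 2, 5, 9, 11]
ideal_rank = [33, 17, 28, 67, 45, 89, 5, 2, 11, 9]
print(kendall_tau_distance(review_rank, ideal_rank))
print(divergence(review_rank, ideal_rank, t=3))  # e.g. t = 3, N = 10
```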
Results: peer review ranking vs. citation count
[Figure: divergence Div as a function of the normalized cutoff t, for the peer review ranking vs. the citation-count ranking]
Randomness and reliability
• Quality-related but independent of the criteria for the “ideal” ranking
• Basic stats
• Disagreement
• Robustness
• Biases
Quality-related Metrics: Statistics
[Figure: distribution of marks (integer marks, 0 to 10); y-axis: probability]
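For the basic statistics, the empirical mark distribution shown in the figure can be recomputed from raw marks along these lines. A small sketch with invented marks; the real data sets are the conference reviews described earlier.

```python
from collections import Counter

def mark_distribution(marks, scale=10):
    """Empirical probability of each integer mark from 0 to `scale`."""
    counts = Counter(marks)
    total = len(marks)
    return {m: counts.get(m, 0) / total for m in range(scale + 1)}

# Invented marks, just to show the output format.
print(mark_distribution([6, 7, 7, 8, 5, 6, 9, 3, 7, 6]))
```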
Disagreement
• Measures the difference between the marks given by the reviewers on the same contribution.
• The rationale behind this metric is that in a review process we expect some kind of agreement between reviewers (a sketch of the computation follows).
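A possible implementation sketch, assuming disagreement is the mean pairwise absolute difference between marks on the same contribution, normalized by the mark scale, and that the baseline is obtained by randomly reshuffling marks across papers. The paper's exact normalization may differ, and the data below are invented.

```python
import random
from statistics import mean

def normalized_disagreement(marks_by_paper, scale=10):
    """Mean pairwise absolute difference between marks given to the same
    paper, normalized by the mark scale (assumed normalization)."""
    diffs = []
    for marks in marks_by_paper.values():
        for i in range(len(marks)):
            for j in range(i + 1, len(marks)):
                diffs.append(abs(marks[i] - marks[j]) / scale)
    return mean(diffs)

def reshuffled_baseline(marks_by_paper, scale=10, trials=100):
    """Disagreement after randomly reassigning all marks to papers: the
    'no real agreement' reference level the computed value is compared to."""
    sizes = [len(marks) for marks in marks_by_paper.values()]
    pool = [m for marks in marks_by_paper.values() for m in marks]
    results = []
    for _ in range(trials):
        random.shuffle(pool)
        it = iter(pool)
        shuffled = {paper: [next(it) for _ in range(size)]
                    for paper, size in zip(marks_by_paper, sizes)}
        results.append(normalized_disagreement(shuffled, scale))
    return mean(results)

# Invented marks: three papers with three reviews each, on a 0-10 scale.
reviews = {"p1": [6, 7, 5], "p2": [9, 4, 6], "p3": [8, 8, 7]}
print(normalized_disagreement(reviews))
print(reshuffled_baseline(reviews))
```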
Normalized Disagreement (after discussion)

             C1     C2                     C3
Computed     0.27   0.32 (high variance)   0.26 (high variance)
Reshuffled   0.34   0.40                   0.32
Robustness
• Sensitivity to small variations in the marks
  – Tries to assess the impact of small indecisions in giving the mark (e.g., 6 vs. 7)
• Measures divergence after applying an ε-variation to the marks (see the sketch below)
• Results: reasonably robust, except for the conference managed by young researchers
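One way to read the ε-variation test, sketched below with invented average marks: perturb each mark by a random amount in [-ε, +ε], re-rank the papers, and measure how much the top t changes. The paper's exact perturbation and aggregation may differ.

```python
import random

def ranking_from_marks(avg_marks):
    """Order paper IDs by average mark, best first."""
    return sorted(avg_marks, key=avg_marks.get, reverse=True)

def top_t_divergence(rank_a, rank_b, t):
    """Fraction of the top-t of rank_a missing from the top-t of rank_b."""
    return len(set(rank_a[:t]) - set(rank_b[:t])) / t

def robustness(avg_marks, epsilon=0.5, t=3, trials=200):
    """Perturb each average mark by a random amount in [-epsilon, +epsilon],
    re-rank, and return the mean change in the top-t (assumed scheme)."""
    base = ranking_from_marks(avg_marks)
    divs = []
    for _ in range(trials):
        noisy = {paper: mark + random.uniform(-epsilon, epsilon)
                 for paper, mark in avg_marks.items()}
        divs.append(top_t_divergence(base, ranking_from_marks(noisy), t))
    return sum(divs) / trials

# Invented average marks for five papers.
marks = {"p1": 6.0, "p2": 6.3, "p3": 8.1, "p4": 4.5, "p5": 7.9}
print(robustness(marks, epsilon=0.5, t=2))
```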
Metric Dimensions
• Dimensions: Quality, Fairness, Efficiency
• Metrics considered: statistics, Kendall distance, divergence, disagreement, biases, robustness, unbiasing, effort
Fairness
• Definition: A review process is fair if and only if the acceptance of a contribution does not depend on the particular set of PC members that reviews it.
• The key is in the assignment of a paper to reviewers: a paper assignment is unfair if the specific assignment influences (makes more predictable) the fate of the paper.
Potential biases
• Rating bias: reviewers are biased if they consistently give higher/lower marks than their colleagues who are reviewing the same paper (sketched after this list)
• Affiliation bias
• Topic bias
• Country bias
• Gender bias
• …
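Following the definition of rating bias above, a hedged sketch of the computation: each reviewer's bias is taken as their average offset from the mean mark given by their co-reviewers on the same papers. Reviewer names and marks are hypothetical, and the paper's normalization may differ.

```python
from statistics import mean

def rating_bias(reviews):
    """Per-reviewer rating bias, sketched as the reviewer's average offset
    from the mean mark that their co-reviewers gave the same papers.
    `reviews` maps paper ID -> {reviewer: mark}. Positive values suggest an
    'accepting' reviewer, negative values a 'rejecting' one."""
    offsets = {}
    for marks in reviews.values():
        for reviewer, mark in marks.items():
            others = [m for r, m in marks.items() if r != reviewer]
            if others:
                offsets.setdefault(reviewer, []).append(mark - mean(others))
    return {reviewer: mean(o) for reviewer, o in offsets.items()}

# Hypothetical reviewers and marks.
reviews = {
    "p1": {"alice": 8, "bob": 5, "carol": 6},
    "p2": {"alice": 9, "dave": 6, "bob": 5},
    "p3": {"carol": 7, "dave": 7, "bob": 4},
}
print(rating_bias(reviews))  # alice tends toward accepting, bob toward rejecting
```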
Computed Normalized Rating Biases
                  C2      C3      C4
Top accepting     3.44    1.52    1.17
Top rejecting    -2.78   -2.06   -1.17
> +|min bias|     5%      9%      7%
< -|min bias|     4%      8%      7%
                                        C2    C3    C4
Unbiasing effect (divergence)           9%    11%   14%
Unbiasing effect (reviewers affected)   16     5     4
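The unbiasing effect reported above is consistent with a procedure that subtracts each reviewer's estimated rating bias from their marks and re-ranks the papers; the divergence between the rankings before and after that correction would then give the percentages in the first row. The sketch below assumes that reading (it is not necessarily the paper's exact algorithm), reuses a bias estimate like the one computed earlier, and uses invented data.

```python
from statistics import mean

def unbias_and_rank(reviews, bias):
    """Subtract each reviewer's estimated bias from their marks, then rank
    papers by the corrected average mark, best first.
    `reviews`: paper ID -> {reviewer: mark}; `bias`: reviewer -> offset,
    e.g. as produced by rating_bias() above."""
    corrected = {
        paper: [mark - bias.get(reviewer, 0.0) for reviewer, mark in marks.items()]
        for paper, marks in reviews.items()
    }
    averages = {paper: mean(ms) for paper, ms in corrected.items()}
    return sorted(averages, key=averages.get, reverse=True)

# Hypothetical data; the bias values are illustrative, not computed from real reviews.
reviews = {
    "p1": {"alice": 8, "bob": 5, "carol": 6},
    "p2": {"alice": 9, "dave": 6, "bob": 5},
    "p3": {"carol": 7, "dave": 7, "bob": 4},
}
bias = {"alice": 1.8, "bob": -1.4}
print(unbias_and_rank(reviews, bias))
```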