Is peer review any good? A quantitative analysis of peer review
DESCRIPTION
This is a presentation of the paper in which we focus on the analysis of peer reviews and reviewers' behavior in conference review processes. We report on the development, definition and rationale of a theoretical model for peer review processes, to support the identification of appropriate metrics for assessing the main properties of such processes. We then apply the proposed model and analysis framework to data sets of reviews of conference papers. We discuss in detail the results, their implications, and their possible use toward improving the analyzed peer review processes.
TRANSCRIPT
Is peer review any good? A quantitative analysis of peer review
Fabio Casati, Maurizio Marchese, Azzurra Ragone, Matteo Turrini
University of Trento
http://eprints.biblio.unitn.it/archive/00001654/01/techRep045.pdf
Initial Goals
• Understand how well peer review works
• Understand how to improve the process
• Metrics + analysis (refer to liquid doc)
• Focus only on the gatekeeping aspect
“Not everything that can be counted counts, and not everything that counts can be counted.” -- Albert Einstein
Metric Dimensions
• Dimensions: Quality, Fairness, Efficiency
• Metrics considered: Kendall distance, divergence, disagreement, biases, robustness, unbiasing, effort vs. quality (and effort-invariant alternatives)
Data Sets
• Around 7,000 reviews from various conferences in the CS field (more on the way)
  – Large, medium, small
  – Some with “young reviewers”
Is peer review effective? Does it work?
• And what does it mean to be effective? HOW do we measure it?
• Easier to measure/detect “problems”
• Peer review ranking vs. ideal ranking
Comparing rankings
[Figure: two rankings of the same set of papers (IDs such as 28, 45, 89, 33, 17) shown side by side for comparison]
Ideal ranking (?)
• Success in a subsequent phase
• Citations
Suggested reading: Positional effect on citation and readership in arXiv, by Haque and Ginsparg
Comparing rankings
• Kendall τ distance between the two rankings
• Divergence Div(t, N): compares the top t of the two rankings of N papers (slide example: t = 3, N = 10); both measures are sketched below
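As an illustration of these two measures, here is a minimal Python sketch (not taken from the paper): a normalized Kendall distance over all paper pairs, and a top-t divergence read as the fraction of the top t papers under one ranking that fall outside the top t under the other. The exact definition of Div(t, N) in the paper may differ, and the paper IDs below are invented.

```python
from itertools import combinations

def kendall_tau_distance(rank_a, rank_b):
    """Normalized Kendall distance: fraction of paper pairs that the two
    rankings order differently (0 = identical order, 1 = reversed)."""
    pos_a = {paper: i for i, paper in enumerate(rank_a)}
    pos_b = {paper: i for i, paper in enumerate(rank_b)}
    discordant = sum(
        1
        for x, y in combinations(rank_a, 2)
        if (pos_a[x] - pos_a[y]) * (pos_b[x] - pos_b[y]) < 0
    )
    n = len(rank_a)
    return discordant / (n * (n - 1) / 2)

def divergence(rank_a, rank_b, t):
    """Div(t, N), read here as the fraction of the top-t papers of rank_a
    that do not appear in the top-t of rank_b (assumed definition)."""
    return len(set(rank_a[:t]) - set(rank_b[:t])) / t

# Invented paper IDs: peer-review ranking vs. an "ideal" (e.g. citation) ranking.
review_rank = [28, 45, 89, 33, 17, 67, 2, 5, 9, 11]
ideal_rank = [33, 17, 28, 67, 45, 89, 5, 2, 11, 9]
print(kendall_tau_distance(review_rank, ideal_rank))
print(divergence(review_rank, ideal_rank, t=3))  # e.g. t = 3, N = 10
```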
Results: peer review ranking vs. citation count
[Figure: divergence Div as a function of the normalized cutoff t, for the peer review ranking vs. the citation-count ranking]
Randomness and reliability
• Quality-related but independent of the criteria for the “ideal” ranking
• Basic stats
• Disagreement
• Robustness
• Biases
Quality-related Metrics: Statistics
[Figure: distribution of marks (integer marks, 0 to 10); y-axis: probability]
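For the basic statistics, the empirical mark distribution shown in the figure can be recomputed from raw marks along these lines. A small sketch with invented marks; the real data sets are the conference reviews described earlier.

```python
from collections import Counter

def mark_distribution(marks, scale=10):
    """Empirical probability of each integer mark from 0 to `scale`."""
    counts = Counter(marks)
    total = len(marks)
    return {m: counts.get(m, 0) / total for m in range(scale + 1)}

# Invented marks, just to show the output format.
print(mark_distribution([6, 7, 7, 8, 5, 6, 9, 3, 7, 6]))
```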
Disagreement
• Measures the difference between the marks given by the reviewers on the same contribution.
• The rationale behind this metric is that in a review process we expect some kind of agreement between reviewers (a sketch of the computation follows).
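A possible implementation sketch, assuming disagreement is the mean pairwise absolute difference between marks on the same contribution, normalized by the mark scale, and that the baseline is obtained by randomly reshuffling marks across papers. The paper's exact normalization may differ, and the data below are invented.

```python
import random
from statistics import mean

def normalized_disagreement(marks_by_paper, scale=10):
    """Mean pairwise absolute difference between marks given to the same
    paper, normalized by the mark scale (assumed normalization)."""
    diffs = []
    for marks in marks_by_paper.values():
        for i in range(len(marks)):
            for j in range(i + 1, len(marks)):
                diffs.append(abs(marks[i] - marks[j]) / scale)
    return mean(diffs)

def reshuffled_baseline(marks_by_paper, scale=10, trials=100):
    """Disagreement after randomly reassigning all marks to papers: the
    'no real agreement' reference level the computed value is compared to."""
    sizes = [len(marks) for marks in marks_by_paper.values()]
    pool = [m for marks in marks_by_paper.values() for m in marks]
    results = []
    for _ in range(trials):
        random.shuffle(pool)
        it = iter(pool)
        shuffled = {paper: [next(it) for _ in range(size)]
                    for paper, size in zip(marks_by_paper, sizes)}
        results.append(normalized_disagreement(shuffled, scale))
    return mean(results)

# Invented marks: three papers with three reviews each, on a 0-10 scale.
reviews = {"p1": [6, 7, 5], "p2": [9, 4, 6], "p3": [8, 8, 7]}
print(normalized_disagreement(reviews))
print(reshuffled_baseline(reviews))
```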
Normalized Disagreement (after discussion)

             C1     C2                     C3
Computed     0.27   0.32 (high variance)   0.26 (high variance)
Reshuffled   0.34   0.40                   0.32
Robustness
• Sensitivity to small variations in the marks
  – Tries to assess the impact of small indecisions in giving the mark (e.g., 6 vs. 7)
• Measures divergence after applying an ε-variation to the marks (see the sketch below)
• Results: reasonably robust, except for the conference managed by young researchers
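One way to read the ε-variation test, sketched below with invented average marks: perturb each mark by a random amount in [-ε, +ε], re-rank the papers, and measure how much the top t changes. The paper's exact perturbation and aggregation may differ.

```python
import random

def ranking_from_marks(avg_marks):
    """Order paper IDs by average mark, best first."""
    return sorted(avg_marks, key=avg_marks.get, reverse=True)

def top_t_divergence(rank_a, rank_b, t):
    """Fraction of the top-t of rank_a missing from the top-t of rank_b."""
    return len(set(rank_a[:t]) - set(rank_b[:t])) / t

def robustness(avg_marks, epsilon=0.5, t=3, trials=200):
    """Perturb each average mark by a random amount in [-epsilon, +epsilon],
    re-rank, and return the mean change in the top-t (assumed scheme)."""
    base = ranking_from_marks(avg_marks)
    divs = []
    for _ in range(trials):
        noisy = {paper: mark + random.uniform(-epsilon, epsilon)
                 for paper, mark in avg_marks.items()}
        divs.append(top_t_divergence(base, ranking_from_marks(noisy), t))
    return sum(divs) / trials

# Invented average marks for five papers.
marks = {"p1": 6.0, "p2": 6.3, "p3": 8.1, "p4": 4.5, "p5": 7.9}
print(robustness(marks, epsilon=0.5, t=2))
```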
Metric Dimensions
• Dimensions: Quality, Fairness, Efficiency
• Metrics considered: statistics, Kendall distance, divergence, disagreement, biases, robustness, unbiasing, effort
Fairness
• Definition: A review process is fair if and only if the acceptance of a contribution does not depend on the particular set of PC members that reviews it.
• The key is in the assignment of a paper to reviewers: a paper assignment is unfair if the specific assignment influences (makes more predictable) the fate of the paper.
Potential biases
• Rating bias: reviewers are biased if they consistently give higher/lower marks than their colleagues who are reviewing the same paper (sketched after this list)
• Affiliation bias
• Topic bias
• Country bias
• Gender bias
• …
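Following the definition of rating bias above, a hedged sketch of the computation: each reviewer's bias is taken as their average offset from the mean mark given by their co-reviewers on the same papers. Reviewer names and marks are hypothetical, and the paper's normalization may differ.

```python
from statistics import mean

def rating_bias(reviews):
    """Per-reviewer rating bias, sketched as the reviewer's average offset
    from the mean mark that their co-reviewers gave the same papers.
    `reviews` maps paper ID -> {reviewer: mark}. Positive values suggest an
    'accepting' reviewer, negative values a 'rejecting' one."""
    offsets = {}
    for marks in reviews.values():
        for reviewer, mark in marks.items():
            others = [m for r, m in marks.items() if r != reviewer]
            if others:
                offsets.setdefault(reviewer, []).append(mark - mean(others))
    return {reviewer: mean(o) for reviewer, o in offsets.items()}

# Hypothetical reviewers and marks.
reviews = {
    "p1": {"alice": 8, "bob": 5, "carol": 6},
    "p2": {"alice": 9, "dave": 6, "bob": 5},
    "p3": {"carol": 7, "dave": 7, "bob": 4},
}
print(rating_bias(reviews))  # alice tends toward accepting, bob toward rejecting
```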
Computed Normalized Rating Biases
                  C2      C3      C4
Top accepting     3.44    1.52    1.17
Top rejecting    -2.78   -2.06   -1.17
> +|min bias|     5%      9%      7%
< -|min bias|     4%      8%      7%
                                        C2    C3    C4
Unbiasing effect (divergence)           9%    11%   14%
Unbiasing effect (reviewers affected)   16     5     4
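The unbiasing effect reported above is consistent with a procedure that subtracts each reviewer's estimated rating bias from their marks and re-ranks the papers; the divergence between the rankings before and after that correction would then give the percentages in the first row. The sketch below assumes that reading (it is not necessarily the paper's exact algorithm), reuses a bias estimate like the one computed earlier, and uses invented data.

```python
from statistics import mean

def unbias_and_rank(reviews, bias):
    """Subtract each reviewer's estimated bias from their marks, then rank
    papers by the corrected average mark, best first.
    `reviews`: paper ID -> {reviewer: mark}; `bias`: reviewer -> offset,
    e.g. as produced by rating_bias() above."""
    corrected = {
        paper: [mark - bias.get(reviewer, 0.0) for reviewer, mark in marks.items()]
        for paper, marks in reviews.items()
    }
    averages = {paper: mean(ms) for paper, ms in corrected.items()}
    return sorted(averages, key=averages.get, reverse=True)

# Hypothetical data; the bias values are illustrative, not computed from real reviews.
reviews = {
    "p1": {"alice": 8, "bob": 5, "carol": 6},
    "p2": {"alice": 9, "dave": 6, "bob": 5},
    "p3": {"carol": 7, "dave": 7, "bob": 4},
}
bias = {"alice": 1.8, "bob": -1.4}
print(unbias_and_rank(reviews, bias))
```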