Bias against Novelty in Science: A Cautionary Tale for Users of Bibliometric Indicators OECD Blue Sky Forum September 19, 2016 Jian Wang (KU Leuven) Reinhilde Veugelers (KU Leuven, Bruegel & CEPR) Paula Stephan (Georgia State University & NBER)


In a nutshell

• Develop a bibliometric measure of combinatorial novelty.

• Study the impact profile of novel research:

o High risk: higher variance in citations.

o High gain: more likely to be highly cited and to inspire follow-on highly cited papers.

o Transdisciplinary impact: broader impact; highly cited in foreign fields but not in home fields.

o Delayed recognition: not highly cited in the short run.

o Published in low Impact Factor journals.

• Implication:

o Bias against novelty in standard bibliometric indicators.

o Appreciation of novel research comes from foreign fields.

Why do we care?

• Novel research is “high risk/high gain” and therefore relies on public support.

• Funding agencies are increasingly risk-averse.

o Roger Kornberg, Nobel Laureate: “If the work that you propose to do isn’t virtually certain of success, then it won’t be funded.”

• Bibliometrics is increasingly used in funding decisions.

o Performance based research funding systems.

• Research Question:

o What is the relationship between novelty and citation impact?

o Are there potential biases in standard bibliometric indicators against novelty?

Conceptualizing novelty

“The creation of any sort of novelty in art, science, or practical life – consists to a substantial extent of a recombination of conceptual and physical materials that were previously in existence.”

-- Nelson and Winter (1982)

• Combinatorial novelty: combining existing scientific components in an unprecedented fashion.

o Economists (Schumpeter, 1939; Nelson & Winter, 1982); psychologists (Mednick, 1962; Simonton, 2004); sociologists (Latour & Woolgar, 1986).

• Combinatorial novelty is just one dimension of novelty.

Measuring novelty

• For each paper, retrieve its co-cited journal pairs.

• Identify new pairs.

• Check how distant the combined journals are by comparing their co-cited journal profiles.

o Cosine similarity ($COS_{i,j}$) between their journal co-citation profiles in the preceding three years.

• $\text{Novelty} = \sum_{J_i\text{-}J_j \text{ pair is new}} \left(1 - COS_{i,j}\right)$

• To avoid trivial combinations:

o Exclude the 50% least-cited journals (in the preceding 3 years).

o Require new pairs to be reused in the next 3 years.

o Results robust when relaxing these constraints.
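
The scoring steps above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the toy journals A, B, C, and the inputs `known_pairs` and `profiles` are assumptions, and the two filters against trivial combinations are left out.

```python
from itertools import combinations
from math import sqrt


def cosine(u, v):
    """Cosine similarity of two equal-length co-citation profiles."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def paper_novelty(cited_journals, known_pairs, profiles):
    """Sum (1 - cosine) over the journal pairs that are new.

    cited_journals: journals referenced by the paper
    known_pairs:    set of frozensets, pairs co-cited before (assumed input)
    profiles:       journal -> co-citation profile vector (assumed input)
    """
    score = 0.0
    for a, b in combinations(sorted(set(cited_journals)), 2):
        if frozenset((a, b)) in known_pairs:
            continue  # pair is not new, so it contributes nothing
        score += 1.0 - cosine(profiles[a], profiles[b])
    return score


# Toy data (hypothetical journals, not taken from the slides).
profiles = {"A": [1, 0, 2], "B": [1, 0, 2], "C": [0, 3, 0]}
known_pairs = {frozenset(("A", "B"))}
toy_score = paper_novelty(["A", "B", "C"], known_pairs, profiles)
print(toy_score)
```

In the toy data, A-C and B-C are new pairs with orthogonal profiles, so each adds the maximum contribution of 1.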

Measuring novelty: An example

Denk & Horstmann (2004) Serial block-face scanning electron microscopy to reconstruct three-dimensional tissue nanostructure. PLoS Biology, 2(11), e329.

o Cites 19 WoS-indexed journals, and 9 (out of 171) pairs are new.

• Nature Materials: Chemistry, Physical; Materials Science, Multidisciplinary; Physics, Applied; and Physics, Condensed Matter.

• Others: Neurosciences; Cell Biology; and Physiology.

#  Journal 1         Journal 2
1  NATURE MATERIALS  CURRENT OPINION IN NEUROBIOLOGY
2  NATURE MATERIALS  DEVELOPMENTAL DYNAMICS
3  NATURE MATERIALS  PHILOSOPHICAL TRANSACTIONS OF THE ROYAL SOCIETY OF LONDON SERIES B-BIOLOGICAL SCIENCES
4  NATURE MATERIALS  EUROPEAN JOURNAL OF NEUROSCIENCE
5  NATURE MATERIALS  JOURNAL OF HISTOTECHNOLOGY
6  NATURE MATERIALS  SCANNING
7  NATURE MATERIALS  BRAIN RESEARCH REVIEWS
8  NATURE MATERIALS  ANNUAL REVIEW OF BIOPHYSICS AND BIOENGINEERING
9  NATURE MATERIALS  PFLUGERS ARCHIV-EUROPEAN JOURNAL OF PHYSIOLOGY

Measuring novelty: An example

o How distant are NATURE MATERIALS and CURRENT OPINION IN NEUROBIOLOGY?

o Journal co-citation matrix (2001-2003)

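o $COS_{1,2} = \dfrac{J_1 \cdot J_2}{\lVert J_1 \rVert \, \lVert J_2 \rVert} = \dfrac{331 \times 9691 + 110 \times 0 + 0 \times 9959 + \cdots}{\sqrt{0^2 + 331^2 + 110^2 + 0^2 + \cdots} \times \sqrt{0^2 + 9691^2 + 0^2 + 9959^2 + \cdots}} = 0.31$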

                        J1    J2    J3    J4    J5    …    JN
J1 NATURE MATERIALS     /     0     331   110   0     …    …
J2 CURRENT OPINION      0     /     9691  0     9959  …    …
J3 SCIENCE              331   9691  /     …     …     …    …
J4 NANO LETTERS         110   0     …     /     …     …    …
J5 J. OF NEUROSCIENCE   0     9959  …     …     /     …    …
…                       …     …     …     …     …     /    …
JN                      …     …     …     …     …     …    /
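
The cosine calculation can be traced on the matrix entries that are visible above. This is only a sketch: the slide elides most of each journal's profile with "…", so the partial value computed here is not expected to reproduce the reported 0.31, which uses the full profiles. Variable names are illustrative.

```python
from math import sqrt

# Visible entries only (columns J3, J4, J5); the rest is elided on the slide.
# The J1-J2 cell is 0 because the pair is new, so leaving it out changes nothing.
j1 = [331, 110, 0]      # NATURE MATERIALS
j2 = [9691, 0, 9959]    # CURRENT OPINION IN NEUROBIOLOGY

dot = sum(a * b for a, b in zip(j1, j2))
norm1 = sqrt(sum(a * a for a in j1))
norm2 = sqrt(sum(b * b for b in j2))
cos_partial = dot / (norm1 * norm2)

print(round(cos_partial, 2))
```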

Measuring novelty: An example

#  Journal 1         Journal 2                            Novelty
1  NATURE MATERIALS  CURRENT OPINION IN NEUROBIOLOGY      0.69
2  NATURE MATERIALS  DEVELOPMENTAL DYNAMICS               0.72
3  NATURE MATERIALS  PHILOSOPHICAL TRANSACTIONS …         0.56
4  NATURE MATERIALS  EUROPEAN JOURNAL OF NEUROSCIENCE     0.74
5  NATURE MATERIALS  JOURNAL OF HISTOTECHNOLOGY           0.73
6  NATURE MATERIALS  SCANNING                             0.36
7  NATURE MATERIALS  BRAIN RESEARCH REVIEWS               0.76
8  NATURE MATERIALS  ANNUAL REVIEW OF BIOPHYSICS …        0.50
9  NATURE MATERIALS  PFLUGERS …                           0.74

o Novelty of the paper = 5.79, top 1% highly novel in its subject categories.

o This paper was NOT among the top 1% highly cited papers until 2012/2013.
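
The arithmetic of this example can be checked directly. A short sketch using only the numbers printed on the slides: the per-pair scores are rounded to two decimals, so their sum comes to 5.80 rather than the reported 5.79, a gap that is presumably due to that rounding.

```python
from math import comb

# 19 cited journals give C(19, 2) candidate journal pairs.
n_pairs = comb(19, 2)

# Per-pair novelty scores (1 - cosine) as printed on the slide.
pair_scores = [0.69, 0.72, 0.56, 0.74, 0.73, 0.36, 0.76, 0.50, 0.74]
paper_score = sum(pair_scores)

print(n_pairs, round(paper_score, 2))
```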

Measuring novelty

• Novelty scores are highly skewed.

• Categorical measure, NOV CAT:

1. non-novel, if a paper has no new journal combinations;

2. moderately novel, if a paper makes new combinations but has a novelty score lower than the top 1% of its subject category;

3. highly novel, if a paper has a novelty score among the top 1%.

• Sample: 661,643 unique publications from 2001 (1,038,238 observations).

                  % of all papers  Avg # new pairs  Median # new pairs  Avg(avg cos)  Avg(min cos)
Non-novel         89%              /                /                   /             /
Moderately novel  10%              1.76             1.00                0.22          0.19
Highly novel      1%               8.39             7.00                0.13          0.06
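
The three NOV CAT classes can be expressed as a small function. This is a sketch under assumptions: `top1_threshold` stands for a precomputed top-1% novelty cutoff within one subject category, the value 5.0 used below is hypothetical, and treating a score equal to the cutoff as highly novel is a choice made here, not something the slides specify.

```python
def nov_cat(n_new_pairs, novelty, top1_threshold):
    """Classify one paper within one subject category.

    top1_threshold: top-1% novelty cutoff of that category
                    (hypothetical precomputed input).
    """
    if n_new_pairs == 0:
        return "non-novel"
    if novelty >= top1_threshold:
        return "highly novel"
    return "moderately novel"


# Hypothetical cutoff of 5.0; the 9 new pairs and 5.79 come from the example paper.
print(nov_cat(9, 5.79, top1_threshold=5.0))
```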

Novelty and impact

• Data:

o 661,643 unique articles in 2001 in WoS.

o 1,038,238 obs.

o Papers with multiple subject categories are counted multiple times.

• Dependent variables:

o Various aspects of impact.

• Independent variable:

o Categorical novelty measure: NOV CAT

• Control:

o Number of references and authors, whether internationally coauthored, subject category dummies.

High risk of novel research

15-year citations (GNB)
                    Mean       Dispersion
Moderately novel    0.032***   -0.001
Highly novel        0.146***   0.162***

Citation classes (15y), multi-logit
                    top10% vs mid80%   low10% vs mid80%
Moderately novel    0.056***           -0.054**
Highly novel        0.162***           0.137**

*** p<.001, ** p<.01, * p<.05, + p<.10.

Control for international co-authorship, number of authors (ln), number of references (ln), and scientific field fixed effects.

High gain from novel research

                    Top 1% cited (15y), logit   Cited by big hits (10y), logit
Moderately novel    0.122***                    0.055***
Highly novel        0.451***                    0.229***

Additional covariate: 10y citations (ln) 1.669***

• Novel papers are more likely to become big hits, i.e., top 1% highly cited in the field.

• Novel papers are more likely to be cited by papers which themselves become big hits.
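
To read the size of these effects, a logit coefficient can be converted to an odds ratio by exponentiating it. The sketch below assumes the slide reports raw logit coefficients (log-odds) with non-novel papers as the reference group; the slides do not state this explicitly.

```python
from math import exp

# Coefficients from the "Top 1% cited (15y)" logit column.
coefs = {"moderately novel": 0.122, "highly novel": 0.451}

# exp(coefficient) = multiplicative change in the odds versus non-novel papers.
odds_ratios = {label: exp(b) for label, b in coefs.items()}

for label, ratio in odds_ratios.items():
    print(f"{label}: odds ratio = {ratio:.2f}")
```

Under that assumption, highly novel papers have roughly 1.57 times the odds of becoming a top 1% cited paper, holding the controls fixed.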

Transdisciplinary impact

Outcome (15y)                             Model     Moderately novel   Highly novel
# citing fields                           Poisson   0.100***           0.177***
Ratio of foreign-field citations          OLS       0.050***           0.083***
Max distance, citing field to home field  OLS       0.016***           0.030***
Top 1% cited in home field                logit     -0.102**           0.010
Top 1% cited in foreign field             logit     0.318***           0.669***

Additional covariates: 15y cites (ln) 0.494***, 0.002***; 15y foreign cites (ln) 0.052***

• Novel papers are cited in more fields and fields further away from their home field.

• Novel papers are highly cited in foreign fields but not in their home field.

Delayed recognition

• Novel papers are more likely to be top cited in the long run, but not in the short run.

• Delayed recognition.

o Ahead of its time.

o Resistance from incumbent scientific paradigms.

                    Top 1% cited (3y), logit   Top 1% cited (15y), logit
Moderately novel    -0.102**                   0.122***
Highly novel        -0.031                     0.451***

Bias against novelty

• Novel papers are less likely to be published in journals with high Impact Factors.

                    JIF, Poisson   JIF, Poisson   JIF, Poisson
Moderately novel    -0.103***      -0.101***      -0.079***
Highly novel        -0.182***      -0.180***      -0.136***

Additional covariates: Journal age < 4 -0.398***; Journal age (ln) 0.250***
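
A Poisson coefficient can be read as a percentage change in the expected outcome via exp(b) - 1. The sketch below applies this to the first JIF column, assuming the reported values are raw Poisson coefficients with non-novel papers as the reference group.

```python
from math import exp

# Coefficients from the first "JIF, Poisson" column.
coefs = {"moderately novel": -0.103, "highly novel": -0.182}

# Percentage change in expected JIF relative to non-novel papers.
pct_change = {label: (exp(b) - 1.0) * 100.0 for label, b in coefs.items()}

for label, pct in pct_change.items():
    print(f"{label}: {pct:.1f}% expected JIF")
```

Under that assumption, highly novel papers appear in journals with an expected Impact Factor roughly 17% lower, and moderately novel papers roughly 10% lower.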

Summary

Implications

• Science policy that relies on the journal impact factor or short-term citations is potentially biased against novel research.

• Over-reliance on such measures may:

o Directly, discourage novel research that might be of great value.

o Indirectly, miss follow-on breakthroughs built on novel research.

• The monodisciplinary approach in peer review may fail to recognize the full value of novel research.

Caveats

• We measure combinatorial novelty only; novelty has other dimensions.

• Not all breakthrough research is “novel”.

• Data are truncated.

• “Gaming” the system could become a concern if review bodies focused on a “novelty” indicator.

• Note: it is important for public agencies to have a portfolio that includes risk; not all funded research should be risky. There is a real role for “ditch diggers”.

Thanks for your attention!

Questions, comments?