A new software tool for large-scale analysis of citation networks


Nees Jan van Eck

Centre for Science and Technology Studies (CWTS), Leiden University

Workshop “Measuring the Diversity of Research”, Berlin

September 2, 2013


Today’s talk

• Part 1: CWTS research program on bibliometric network analysis

– VOSviewer

– VOS mapping and clustering

– Large-scale modularity optimization

– Algorithmically constructed publication-level classification system

• Part 2: New software tool for large-scale analysis of citation networks


CWTS research program on bibliometric network analysis

Part 1


VOSviewer (1)

(Van Eck & Waltman, Scientometrics, 2010)


VOSviewer (2)

(Van Eck & Waltman, Scientometrics, 2010)


Subject categories


Leiden University


Erasmus University Rotterdam


Delft University of Technology


Clinical Neurology


Clinical Neurology: Citation density

(Van Eck et al., PLoS ONE, 2013)


Clinical Neurology: Reference density


VOS mapping and clustering

• Mapping and clustering are commonly used bibliometric network analysis techniques

• Mapping:

– Assigning the nodes in a network to locations in a (usually two-dimensional) space

– VOS mapping technique has been developed specifically for mapping bibliometric networks

• Clustering:

– Partitioning the nodes in a network into a number of groups (a.k.a. community detection)

– VOS clustering technique has been developed to be used jointly with the VOS mapping technique in a unified technical framework


Unified approach: Clustering seen as mapping in a restricted space



Unified approach to mapping and clustering

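Minimize

$$Q(x_1, \ldots, x_n) = \sum_{i<j} \frac{2m\, c_{ij}}{c_i c_j}\, d_{ij}^2 - \sum_{i<j} d_{ij}$$

where

n: number of nodes in the network

m: number of links in the network

c_ij: number of links between nodes i and j

c_i: number of links of node i

γ: resolution parameter

Mapping: x_i is a vector denoting the location of node i in a p-dimensional map, and

$$d_{ij} = \lVert x_i - x_j \rVert = \sqrt{\sum_{k=1}^{p} (x_{ik} - x_{jk})^2}$$

Clustering: x_i is an integer denoting the cluster to which node i belongs, and

$$d_{ij} = \begin{cases} 0 & \text{if } x_i = x_j \\ 1/\gamma & \text{if } x_i \neq x_j \end{cases}$$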
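To make the unified objective concrete, here is a minimal Python sketch that only evaluates it (it does not optimize it, and it is not the VOSviewer implementation). It assumes the link counts c_ij are given as a symmetric matrix with a zero diagonal and that every node has at least one link; the function names are hypothetical.

```python
import math
from itertools import combinations


def unified_objective(c, d):
    """Q(x_1..x_n) = sum_{i<j} (2m c_ij / (c_i c_j)) d_ij^2 - sum_{i<j} d_ij.

    c: symmetric n x n list of link counts c_ij (zero diagonal assumed)
    d: function (i, j) -> distance d_ij between nodes i and j
    """
    n = len(c)
    c_node = [sum(row) for row in c]        # c_i: number of links of node i
    m = sum(c_node) / 2                     # m: number of links in the network
    q = 0.0
    for i, j in combinations(range(n), 2):  # every pair with i < j
        s_ij = 2 * m * c[i][j] / (c_node[i] * c_node[j])
        q += s_ij * d(i, j) ** 2 - d(i, j)
    return q


def mapping_distance(x):
    # Mapping: x[i] is a p-dimensional location; d_ij is the Euclidean distance
    return lambda i, j: math.dist(x[i], x[j])


def clustering_distance(x, gamma):
    # Clustering: x[i] is a cluster label; d_ij = 0 within a cluster, 1/gamma between clusters
    return lambda i, j: 0.0 if x[i] == x[j] else 1.0 / gamma
```

For two disconnected pairs of nodes (links 0–1 and 2–3) with γ = 1, the clustering {0, 1}, {2, 3} gives Q = −4, lower (better) than putting all nodes in one cluster (Q = 0); placing the four nodes on the corners of a unit square gives Q = 4 − 2√2.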


Unified approach: Mapping

• Equivalent to the VOS mapping technique

• Closely related to multidimensional scaling (Van Eck et al., JASIST, 2010)


Unified approach: Clustering

• Equivalent to a weighted and parameterized variant of modularity-based clustering (Waltman et al., JOI, 2010)

• The resolution parameter γ makes it possible to customize the granularity level of the clustering

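Maximize

$$\hat{Q}(x_1, \ldots, x_n) = \frac{1}{2m} \sum_{i<j} \delta(x_i, x_j)\, w_{ij} \left( c_{ij} - \gamma \frac{c_i c_j}{2m} \right)$$

where δ(x_i, x_j) equals 1 if x_i = x_j and 0 otherwise, and

$$w_{ij} = \frac{2m}{c_i c_j}$$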
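The equivalence claimed on this slide can be checked in a few lines. A short derivation, assuming γ > 0 and c_i > 0 for every node, starting from the clustering variant of the unified objective (d_ij = 0 within a cluster and 1/γ between clusters, i.e. d_ij = (1 − δ(x_i, x_j))/γ):

$$
\begin{aligned}
Q(x_1,\ldots,x_n) &= \sum_{i<j} s_{ij}\, d_{ij}^2 - \sum_{i<j} d_{ij},
  \qquad s_{ij} = \frac{2m\,c_{ij}}{c_i c_j},
  \qquad d_{ij} = \frac{1-\delta(x_i,x_j)}{\gamma} \\
&= \frac{1}{\gamma^2} \sum_{i<j} \bigl(1-\delta(x_i,x_j)\bigr)\,(s_{ij} - \gamma)
  \qquad \text{since } \bigl(1-\delta\bigr)^2 = 1-\delta \\
&= \frac{1}{\gamma^2} \sum_{i<j} (s_{ij} - \gamma)
   - \frac{1}{\gamma^2} \sum_{i<j} \delta(x_i,x_j)\, w_{ij}\Bigl(c_{ij} - \gamma \frac{c_i c_j}{2m}\Bigr)
  \qquad \text{since } w_{ij}\Bigl(c_{ij} - \gamma\frac{c_i c_j}{2m}\Bigr) = s_{ij} - \gamma \\
&= \frac{1}{\gamma^2} \sum_{i<j} (s_{ij} - \gamma) - \frac{2m}{\gamma^2}\, \hat{Q}(x_1,\ldots,x_n)
\end{aligned}
$$

The first term does not depend on the clustering, so minimizing Q over cluster assignments is the same as maximizing Q̂.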


Large-scale modularity optimization

• Modularity optimization is one of the most popular approaches to clustering in networks

• Several variants of the original modularity function have been proposed, supporting for instance weighted networks and different resolution levels

• Optimization of modularity functions in large networks (with millions of nodes and edges) has received only limited attention but has important applications in bibliometrics


New algorithm for large-scale modularity optimization

• ‘Louvain algorithm’ (Blondel et al., 2008) is the best-known algorithm for large-scale modularity optimization

• Our proposed ‘smart local moving algorithm’ can be seen as an enhanced version of this algorithm (Waltman & Van Eck, 2013)
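Both algorithms build on a local moving heuristic: each node is repeatedly moved to the neighbouring cluster that gives the largest increase in modularity, until no move improves it. The Python sketch below shows only that step, for standard modularity with a resolution parameter γ. It omits the aggregation phase of the Louvain algorithm and the additional steps that distinguish the smart local moving algorithm; the adjacency-dict input format and the function name are assumptions for illustration.

```python
import random
from collections import defaultdict


def local_moving(adj, gamma=1.0, seed=0):
    """Local moving heuristic for modularity with resolution parameter gamma.

    adj: dict node -> dict neighbor -> positive edge weight
         (undirected: adj[i][j] == adj[j][i]; no self-loops; at least one edge)
    Returns a dict node -> community label.
    """
    rng = random.Random(seed)
    k = {i: sum(nbrs.values()) for i, nbrs in adj.items()}  # weighted degree k_i
    two_m = sum(k.values())                                 # 2m: twice the total edge weight
    comm = {i: i for i in adj}                              # start from singleton clusters
    tot = dict(k)                                           # total degree of each community
    nodes = list(adj)
    moved = True
    while moved:
        moved = False
        rng.shuffle(nodes)                                  # random node order in each pass
        for i in nodes:
            old = comm[i]
            tot[old] -= k[i]                                # temporarily take i out of its community
            links = defaultdict(float)                      # edge weight from i to each community
            for j, w in adj[i].items():
                links[comm[j]] += w
            # The modularity gain of inserting i into community c is proportional to
            # links[c] - gamma * k_i * tot[c] / 2m, so that quantity is compared.
            best = old
            best_gain = links[old] - gamma * k[i] * tot[old] / two_m
            for c, w_ic in links.items():
                gain = w_ic - gamma * k[i] * tot[c] / two_m
                if gain > best_gain:
                    best, best_gain = c, gain
            tot[best] += k[i]
            if best != old:
                comm[i] = best
                moved = True
    return comm
```

Because a node only moves when the move strictly increases modularity, the loop terminates; on two triangles joined by a single edge it returns the two triangles as clusters. The result is a local optimum, which is exactly the weakness that further refinement steps aim to address.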


Louvain algorithm

[Figure: example network clustered by the Louvain algorithm; panels labeled Q = 0.3791 and Q = 0.4151]


Smart local moving algorithm

[Figure: example network clustered by the smart local moving algorithm; panels labeled Q = 0.4198 and Q = 0.3791]


Comparison

Network (nodes / edges) | Measure | Louvain | Smart local moving
Amazon (0.5M / 0.9M) | Qmin | 0.9257 | 0.9335
  | Qmax | 0.9264 | 0.9338
  | t | 6 | 28
DBLP (0.4M / 1.0M) | Qmin | 0.8203 | 0.8357
  | Qmax | 0.8227 | 0.8367
  | t | 7 | 26
IMDb (0.4M / 15.0M) | Qmin | 0.6976 | 0.7050
  | Qmax | 0.7041 | 0.7077
  | t | 18 | 100
LiveJournal (4.0M / 34.7M) | Qmin | 0.7441 | 0.7676
  | Qmax | 0.7557 | 0.7720
  | t | 350 | 1,549
WoS (10.6M / 104.5M) | Qmin | 0.7714 | 0.7918
  | Qmax | 0.7786 | 0.7957
  | t | 6,800 | 19,994
Web uk-2005 (39.5M / 783.0M) | Qmin | 0.9793 | 0.9801
  | Qmax | 0.9795 | 0.9801
  | t | 11,006 | 17,074

Qmin, Qmax: lowest and highest modularity obtained; t: running time


Classification systems of scientific publications

• Web of Science/Scopus journal subject categories:

– Scientific fields defined at the level of journals rather than individual publications

– Difficulties with multidisciplinary journals

– High level of aggregation

– Sometimes outdated or inaccurate

• Disciplinary classification systems:

– E.g., CA, JEL, MeSH, PACS

– Not available for all disciplines

– Sometimes outdated or inaccurate


Algorithmic classification systems (Waltman & Van Eck, JASIST, 2012)

• Why not algorithmically construct a classification system of science?

• We cluster publications (not journals) into fields based on citation relations

• Only direct citation relations are used; no co-citation or bibliographic coupling relations

• Fields are defined at different levels of granularity and are organized hierarchically
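Since only direct citation relations are used, the network to be clustered can be built straight from reference lists. A minimal Python sketch, assuming a hypothetical input dict that maps each publication in the data set to the publications it cites, and assuming (for illustration) that citation direction is ignored:

```python
def direct_citation_links(references):
    """Undirected direct citation links among the publications in a data set.

    references: dict publication id -> iterable of cited publication ids
    Returns a set of frozensets {i, j}, one per pair of publications in the data
    set where at least one cites the other. Co-citation and bibliographic
    coupling relations are deliberately not derived.
    """
    links = set()
    for citing, cited_ids in references.items():
        for cited in cited_ids:
            # Skip references to publications outside the data set and a publication citing itself
            if cited in references and cited != citing:
                links.add(frozenset((citing, cited)))
    return links
```

The resulting links would then be clustered at several resolution levels to obtain the hierarchical fields; that step is not shown here.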


Example

• 10.2 million publications from the period 2001–2010 indexed in Web of Science

• 97.6 million direct citation relations

• Classification system of 3 hierarchical levels:

– 20 broad disciplines

– 672 fields

– 22,412 subfields

• Clustering by optimizing a variant of the standard modularity function that accounts for differences across fields in citation practices


Map of the 672 research areas at level 2 of the classification system


Map of the 417 publications in research area 4.30.10


New software tool for exploring large-scale citation networks

Part 2


Exploring citation networks: Why?

• To support literature reviewing

• To show how the scientific literature has evolved over time

• To delineate topics or research areas in the literature

• To identify connections between different topics in the literature


Motivation for a new tool

• VOSviewer has proven to be a very useful tool for visualizing science from a static point of view

• VOSviewer has not been developed for visualizing the dynamics of science

• In fact, the availability of software tools for dynamic visualizations is rather limited:

– CiteSpace (Chaomei Chen)

– HistCite (Eugene Garfield)



HistCite

• Timeline visualization of publications and their citation relations, referred to as algorithmic historiography by Garfield
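A timeline view of this kind places publications by publication year and draws the citation relations between them. The Python sketch below illustrates the idea only; keeping the top_n publications most cited within the data set is an assumption made here to keep the picture readable, not a description of HistCite's actual selection rules, and the function name is hypothetical.

```python
from collections import Counter, defaultdict


def timeline(years, citations, top_n):
    """Hypothetical helper: lay out the top_n most-cited publications by year.

    years: dict publication id -> publication year
    citations: list of (citing, cited) pairs
    Returns (dict year -> sorted publication ids, citation pairs among the selected publications).
    """
    # Count citations received from within the data set only
    local_cites = Counter(cited for citing, cited in citations
                          if citing in years and cited in years)
    # Most-cited first; ties broken by publication id for a deterministic result
    ranked = sorted(years, key=lambda p: (-local_cites[p], p))
    selected = set(ranked[:top_n])
    columns = defaultdict(list)
    for p in sorted(selected):
        columns[years[p]].append(p)
    links = [(a, b) for a, b in citations if a in selected and b in selected]
    return dict(sorted(columns.items())), links
```

Each year becomes one column of the timeline, and the returned pairs are the arrows drawn between columns.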


Citation Network Explorer

• Somewhat similar to HistCite, but capable of dealing with much larger citation networks

• So far, the tool has been used successfully with the entire Web of Science citation network of the social sciences (1980–2013; ~2M publications and ~20M citations)

• The aim is to be able to handle the entire citation network of all scientific disciplines (~40M publications and ~500M citations)



Today’s demonstration (1)

• We demonstrate a prototype of the tool

• The core functionality is available, but some options have not yet been fully implemented

• Your feedback is very much appreciated!



Today’s demonstration (2)

• Data set 1:

– Scientometrics

– 1980–2013

– ~10K publications and ~60K citations

• Data set 2:

– All social sciences except for psychology, education, and health-related sciences

– 1980–2013

– ~1.4M publications and ~10M citations



Citation Network Explorer


References

Van Eck, N.J., & Waltman, L. (2010). Software survey: VOSviewer, a computer program for bibliometric mapping. Scientometrics, 84(2), 523-538.

Van Eck, N.J., & Waltman, L. (2011). Text mining and visualization using VOSviewer. ISSI Newsletter, 7(3), 50-54.

Van Eck, N.J., Waltman, L., Dekker, R., & Van den Berg, J. (2010). A comparison of two techniques for bibliometric mapping: Multidimensional scaling and VOS. JASIST, 61(12), 2405-2416.

Van Eck, N.J., Waltman, L., Van Raan, A.F.J., Klautz, R.J.M., & Peul, W.C. (2013). Citation analysis may severely underestimate the impact of clinical research as compared to basic research. PLoS ONE, 8(4), e62395.

Waltman, L., & Van Eck, N.J. (2012). A new methodology for constructing a publication-level classification system of science. JASIST, 63(12), 2378-2392.

Waltman, L., & Van Eck, N.J. (2013). A smart local moving algorithm for large-scale modularity-based community detection. arXiv:1308.6604.

Waltman, L., Van Eck, N.J., & Noyons, E.C.M. (2010). A unified approach to mapping and clustering of bibliometric networks. Journal of Informetrics, 4(4), 629-635.