A new software tool for large-scale analysis of citation networks
Nees Jan van Eck
Centre for Science and Technology Studies (CWTS), Leiden University
Workshop “Measuring the Diversity of Research”, Berlin
September 2, 2013
Today’s talk
• Part 1: CWTS research program on bibliometric network analysis
– VOSviewer
– VOS mapping and clustering
– Large-scale modularity optimization
– Algorithmically constructed publication-level classification system
• Part 2: New software tool for large-scale analysis of citation networks
CWTS research program on bibliometric network analysis
Part 1
VOSviewer (1)
(Van Eck & Waltman, Scientometrics, 2010)
VOSviewer (2)
(Van Eck & Waltman, Scientometrics, 2010)
Subject categories
Leiden University
Erasmus University Rotterdam
Delft University of Technology
Clinical Neurology
Clinical Neurology: Citation density
(Van Eck et al., PLoS ONE, 2013)
Clinical Neurology: Reference density
VOS mapping and clustering
• Mapping and clustering are commonly used bibliometric network analysis techniques
• Mapping:
– Assigning the nodes in a network to locations in a (usually two-dimensional) space
– The VOS mapping technique has been developed specifically for mapping bibliometric networks
• Clustering:
– Partitioning the nodes in a network into a number of groups (a.k.a. community detection)
– The VOS clustering technique has been developed to be used jointly with the VOS mapping technique in a unified technical framework
Unified approach: Clustering seen as mapping in a restricted space
Unified approach to mapping and clustering
Minimize
$$Q(x_1, \ldots, x_n) = \sum_{i<j} \frac{2m\, c_{ij}}{c_i c_j}\, d_{ij}^2 - \sum_{i<j} d_{ij}$$
where
n: number of nodes in the network
m: number of links in the network
c_ij: number of links between nodes i and j
c_i: number of links of node i
Mapping
x_i: vector denoting the location of node i in a p-dimensional map
$$d_{ij} = \lVert x_i - x_j \rVert = \sqrt{\sum_{k=1}^{p} (x_{ik} - x_{jk})^2}$$
Clustering
x_i: integer denoting the cluster to which node i belongs
γ: resolution parameter
$$d_{ij} = \begin{cases} 0 & \text{if } x_i = x_j \\ 1/\gamma & \text{if } x_i \neq x_j \end{cases}$$
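The objective above can be evaluated directly for a given mapping or clustering. The Python sketch below is an illustration written for this transcript, not code from VOSviewer: the function names (unified_objective, mapping_distance, clustering_distance) and the small two-triangle network are hypothetical, every node is assumed to have at least one link so that c_i > 0, and the clustering distance is taken to be 0 within a cluster and 1/γ between clusters.

```python
import math
from itertools import combinations

def unified_objective(c, dist):
    """Q(x_1, ..., x_n) = sum_{i<j} (2m c_ij / (c_i c_j)) d_ij^2 - sum_{i<j} d_ij.

    c:    symmetric n x n list of lists of link counts c_ij (zero diagonal,
          every node with at least one link)
    dist: function (i, j) -> d_ij
    """
    ci = [sum(row) for row in c]          # c_i: number of links of node i
    m = sum(ci) / 2                       # m: number of links in the network
    q = 0.0
    for i, j in combinations(range(len(c)), 2):
        s_ij = 2 * m * c[i][j] / (ci[i] * ci[j])
        d_ij = dist(i, j)
        q += s_ij * d_ij ** 2 - d_ij
    return q

def mapping_distance(x):
    """Mapping: x[i] is a point in p-dimensional space; d_ij is the Euclidean distance."""
    return lambda i, j: math.dist(x[i], x[j])

def clustering_distance(x, gamma):
    """Clustering: x[i] is a cluster label; d_ij = 0 in the same cluster, else 1/gamma."""
    return lambda i, j: 0.0 if x[i] == x[j] else 1.0 / gamma

# Hypothetical example network: triangles {0,1,2} and {3,4,5} joined by the link 2-3
links = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
c = [[0] * 6 for _ in range(6)]
for i, j in links:
    c[i][j] = c[j][i] = 1

print(unified_objective(c, clustering_distance([0, 0, 0, 1, 1, 1], 1.0)))  # about -7.44
print(unified_objective(c, clustering_distance([0] * 6, 1.0)))             # 0.0
layout = [(0, 0), (1, 0), (0.5, 0.8), (2.5, 0.8), (3, 0), (2, 0)]
print(unified_objective(c, mapping_distance(layout)))
```

Lower values are better, so in this toy network putting the two triangles in separate clusters beats putting all nodes in a single cluster.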
Unified approach: Mapping
• Equivalent to the VOS mapping technique
• Closely related to multidimensional scaling (Van Eck et al., JASIST, 2010)
Unified approach: Clustering
• Equivalent to a weighted and parameterized variant of modularity-based clustering (Waltman et al., JOI, 2010)
• The resolution parameter makes it possible to adjust the granularity level of the clustering
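Maximize
$$\hat{Q}(x_1, \ldots, x_n) = \frac{1}{2m} \sum_{i<j} \delta(x_i, x_j)\, w_{ij} \left( c_{ij} - \gamma \frac{c_i c_j}{2m} \right), \qquad w_{ij} = \frac{2m}{c_i c_j}$$
where δ(x_i, x_j) equals 1 if x_i = x_j and 0 otherwise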
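The equivalence stated on this slide can be checked numerically. Taking the clustering distance to be d_ij = 0 within a cluster and 1/γ between clusters, and writing s_ij = 2m c_ij/(c_i c_j), each pair in different clusters contributes (s_ij − γ)/γ² to Q, and s_ij − γ = w_ij (c_ij − γ c_i c_j/(2m)). Hence Q = (S − 2m Q̂)/γ² with S = Σ_{i<j} (s_ij − γ), which does not depend on the clustering, so minimizing Q and maximizing Q̂ select the same clusterings. The hypothetical Python sketch below checks this identity on a small example network; the function names are illustrative and every node is assumed to have at least one link.

```python
from itertools import combinations

def degrees_and_m(c):
    ci = [sum(row) for row in c]            # c_i: number of links of node i
    return ci, sum(ci) / 2                  # m: number of links in the network

def unified_clustering_objective(c, x, gamma):
    """Q with d_ij = 0 within a cluster and 1/gamma between clusters (minimize)."""
    ci, m = degrees_and_m(c)
    q = 0.0
    for i, j in combinations(range(len(c)), 2):
        d = 0.0 if x[i] == x[j] else 1.0 / gamma
        q += 2 * m * c[i][j] / (ci[i] * ci[j]) * d ** 2 - d
    return q

def weighted_modularity(c, x, gamma):
    """Q-hat = 1/(2m) sum_{i<j} delta(x_i, x_j) w_ij (c_ij - gamma c_i c_j / 2m) (maximize)."""
    ci, m = degrees_and_m(c)
    total = 0.0
    for i, j in combinations(range(len(c)), 2):
        if x[i] == x[j]:                    # delta(x_i, x_j) = 1
            w_ij = 2 * m / (ci[i] * ci[j])
            total += w_ij * (c[i][j] - gamma * ci[i] * ci[j] / (2 * m))
    return total / (2 * m)

def constant_term(c, gamma):
    """S = sum_{i<j} (s_ij - gamma); independent of the clustering x."""
    ci, m = degrees_and_m(c)
    return sum(2 * m * c[i][j] / (ci[i] * ci[j]) - gamma
               for i, j in combinations(range(len(c)), 2))

# Hypothetical example network: triangles {0,1,2} and {3,4,5} joined by the link 2-3
links = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
c = [[0] * 6 for _ in range(6)]
for i, j in links:
    c[i][j] = c[j][i] = 1

_, m = degrees_and_m(c)
for x in ([0, 0, 0, 1, 1, 1], [0] * 6, list(range(6))):
    for gamma in (0.5, 1.0, 2.0):
        q = unified_clustering_objective(c, x, gamma)
        q_hat = weighted_modularity(c, x, gamma)
        # Q = (S - 2m Q-hat) / gamma^2: a lower Q always means a higher Q-hat
        assert abs(q - (constant_term(c, gamma) - 2 * m * q_hat) / gamma ** 2) < 1e-9
print("identity holds for all tested clusterings and resolutions")
```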
Large-scale modularity optimization
• Modularity optimization is one of the most popular approaches to clustering in networks
• Several variants of the original modularity function have been proposed, supporting for instance weighted networks and different resolution levels
• Optimization of modularity functions in large networks (with millions of nodes and edges) has received only limited attention but has important applications in bibliometrics
New algorithm for large-scale modularity optimization
• ‘Louvain algorithm’ (Blondel et al., 2008) is the best-known algorithm for large-scale modularity optimization
• Our proposed ‘smart local moving algorithm’ can be seen as an enhanced version of this algorithm (Waltman & Van Eck, 2013)
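Both algorithms build on a local moving heuristic: nodes are visited one at a time and moved to the neighboring cluster that increases modularity the most. The Python sketch below shows only that step, for standard modularity with a resolution parameter γ (not the weighted variant on the earlier slide). It leaves out the aggregation phase of the Louvain algorithm and the further refinements of the smart local moving algorithm, the names local_moving and groups are hypothetical, and it assumes an undirected network with at least one link and no self-loops.

```python
import random
from collections import defaultdict
from itertools import count

def local_moving(adj, gamma=1.0, seed=0):
    """Local moving heuristic for modularity with resolution parameter gamma.

    adj: dict node -> dict neighbor -> link weight (undirected, symmetric, no self-loops).
    Visits nodes in random order and moves each one to the neighboring cluster
    (or a new singleton cluster) with the largest modularity gain, until no move
    improves modularity. Returns dict node -> cluster label.
    """
    rng = random.Random(seed)
    k = {i: sum(nbrs.values()) for i, nbrs in adj.items()}  # node strengths
    two_m = sum(k.values())                                   # 2m = total strength
    cluster = {i: idx for idx, i in enumerate(adj)}          # start from singletons
    total = {cluster[i]: k[i] for i in adj}                  # total strength per cluster
    fresh = count(len(adj))                                   # labels for new clusters
    nodes = list(adj)
    moved = True
    while moved:
        moved = False
        rng.shuffle(nodes)
        for i in nodes:
            old = cluster[i]
            total[old] -= k[i]                    # take node i out of its cluster
            link_weight = defaultdict(float)      # weight from i to each neighboring cluster
            for j, w in adj[i].items():
                link_weight[cluster[j]] += w

            def gain(cl):
                # m times the modularity gain of inserting i into cluster cl
                return link_weight[cl] - gamma * k[i] * total[cl] / two_m

            best, best_gain = old, gain(old)
            for cand in link_weight:
                if gain(cand) > best_gain:
                    best, best_gain = cand, gain(cand)
            if best_gain < 0:                     # being alone (gain 0) is better
                best = next(fresh)
                total[best] = 0.0
            total[best] += k[i]
            cluster[i] = best
            moved = moved or best != old
    return cluster

def groups(cluster):
    """Turn a node -> label mapping into a sorted list of sorted clusters."""
    members = defaultdict(list)
    for node, label in cluster.items():
        members[label].append(node)
    return sorted(sorted(g) for g in members.values())

# Hypothetical example: triangles {0,1,2} and {3,4,5} joined by the link 2-3
links = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
adj = {i: {} for i in range(6)}
for i, j in links:
    adj[i][j] = adj[j][i] = 1.0
print(groups(local_moving(adj)))  # [[0, 1, 2], [3, 4, 5]]
```

A node only moves when the move strictly increases modularity, so the loop terminates; in general the result is a local optimum that can depend on the visiting order, which is what the seed controls.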
Louvain algorithm
Q = 0.3791
Q = 0.4151
Smart local moving algorithm
Q = 0.4198
Q = 0.3791
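Comparison
| Network (nodes / edges) | Measure | Louvain | Smart local moving |
|---|---|---|---|
| Amazon (0.5M / 0.9M) | Qmin | 0.9257 | 0.9335 |
| | Qmax | 0.9264 | 0.9338 |
| | t | 6 | 28 |
| DBLP (0.4M / 1.0M) | Qmin | 0.8203 | 0.8357 |
| | Qmax | 0.8227 | 0.8367 |
| | t | 7 | 26 |
| IMDb (0.4M / 15.0M) | Qmin | 0.6976 | 0.7050 |
| | Qmax | 0.7041 | 0.7077 |
| | t | 18 | 100 |
| LiveJournal (4.0M / 34.7M) | Qmin | 0.7441 | 0.7676 |
| | Qmax | 0.7557 | 0.7720 |
| | t | 350 | 1,549 |
| WoS (10.6M / 104.5M) | Qmin | 0.7714 | 0.7918 |
| | Qmax | 0.7786 | 0.7957 |
| | t | 6,800 | 19,994 |
| Web uk-2005 (39.5M / 783.0M) | Qmin | 0.9793 | 0.9801 |
| | Qmax | 0.9795 | 0.9801 |
| | t | 11,006 | 17,074 |
Qmin / Qmax: lowest / highest modularity obtained; t: running time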
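A little arithmetic on the comparison table makes the trade-off explicit. The Python snippet below copies the values from the table and, for each network, compares the worst smart local moving result (Qmin) with the best Louvain result (Qmax) and computes the ratio of running times. In every row the worst smart local moving value is higher than the best Louvain value, while smart local moving takes roughly 1.5 to 5.6 times as long.

```python
# (network, Louvain Qmin, Louvain Qmax, Louvain t, SLM Qmin, SLM Qmax, SLM t),
# values copied from the comparison table
rows = [
    ("Amazon",      0.9257, 0.9264,     6, 0.9335, 0.9338,    28),
    ("DBLP",        0.8203, 0.8227,     7, 0.8357, 0.8367,    26),
    ("IMDb",        0.6976, 0.7041,    18, 0.7050, 0.7077,   100),
    ("LiveJournal", 0.7441, 0.7557,   350, 0.7676, 0.7720,  1549),
    ("WoS",         0.7714, 0.7786,  6800, 0.7918, 0.7957, 19994),
    ("Web uk-2005", 0.9793, 0.9795, 11006, 0.9801, 0.9801, 17074),
]

summary = []
for name, lv_min, lv_max, lv_t, slm_min, slm_max, slm_t in rows:
    margin = slm_min - lv_max   # worst smart local moving result vs best Louvain result
    ratio = slm_t / lv_t        # how many times longer smart local moving takes
    summary.append((name, margin, ratio))
    print(f"{name:12s} margin = {margin:+.4f}   time ratio = {ratio:.1f}")
```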
Classification systems of scientific publications
• Web of Science/Scopus journal subject categories:
– Scientific fields defined at the level of journals rather than individual publications
– Difficulties with multidisciplinary journals
– High level of aggregation
– Sometimes outdated or inaccurate
• Disciplinary classification systems:
– E.g., CA, JEL, MeSH, PACS
– Not available for all disciplines
– Sometimes outdated or inaccurate
Algorithmic classification systems (Waltman & Van Eck, JASIST, 2012)
• Why not algorithmically construct a classification system of science?
• We cluster publications (not journals) into fields based on citation relations
• Only direct citation relations are used; no co-citation or bibliographic coupling relations
• Fields are defined at different levels of granularity and are organized hierarchically
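Using only direct citations means that each citation relation becomes a link between the citing and the cited publication, with its direction ignored. The Python sketch below is a hypothetical illustration of that step, not the procedure of Waltman & Van Eck (2012): it drops citations to publications outside the data set and publications citing themselves, and it counts a mutual citation (A cites B and B cites A) as c_AB = 2, which is an assumption. Publication identifiers are assumed to be mutually comparable so that each pair can be stored in a canonical order.

```python
from collections import Counter

def direct_citation_network(citations, publications):
    """Turn directed citation pairs into undirected link counts c_ij.

    citations:    iterable of (citing_id, cited_id) pairs
    publications: set of ids in the data set
    Returns a Counter mapping (i, j), with i < j, to c_ij.
    """
    links = Counter()
    for citing, cited in citations:
        if citing == cited:
            continue                          # a publication citing itself
        if citing not in publications or cited not in publications:
            continue                          # relation leaves the data set
        i, j = sorted((citing, cited))        # ignore the direction of the citation
        links[(i, j)] += 1                    # mutual citations count twice (assumption)
    return links

pubs = {"A", "B", "C", "D"}
cites = [("B", "A"), ("C", "A"), ("C", "B"), ("D", "C"), ("D", "X")]  # "X" is outside the data set
net = direct_citation_network(cites, pubs)
print(sorted(net.items()))
# [(('A', 'B'), 1), (('A', 'C'), 1), (('B', 'C'), 1), (('C', 'D'), 1)]
```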
Example
• 10.2 million publications from the period 2001–2010 indexed in Web of Science
• 97.6 million direct citation relations
• Classification system with 3 hierarchical levels:
– 20 broad disciplines
– 672 fields
– 22,412 subfields
• Clustering by optimizing a variant of the standard modularity function that accounts for differences across fields in citation practices
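The figures on this slide imply the average granularity of each level. The Python snippet below only divides the numbers given above; actual cluster sizes vary around these averages, and the slide does not report their distribution.

```python
publications = 10.2e6          # Web of Science publications, 2001–2010
citation_relations = 97.6e6    # direct citation relations
levels = [("broad disciplines", 20), ("fields", 672), ("subfields", 22_412)]

# average number of publications per cluster at each level
for name, clusters in levels:
    print(f"{name:17s} {clusters:>6,d} clusters, {publications / clusters:>9,.0f} publications on average")

# average number of child clusters per parent cluster
for (upper, n_upper), (lower, n_lower) in zip(levels, levels[1:]):
    print(f"{lower} per {upper[:-1]}: {n_lower / n_upper:.1f} on average")

print(f"direct citation relations per publication: {citation_relations / publications:.1f}")
```

Each level thus splits its parent clusters roughly 33 ways, from about 510,000 publications per broad discipline down to about 455 per subfield.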
Map of the 672 research areas at level 2 of the classification system
Map of the 417 publications in research area 4.30.10
New software tool for exploring large-scale citation networks
Part 2
Exploring citation networks: Why?
• To support literature reviewing
• To show how the scientific literature has evolved over time
• To delineate topics or research areas in the literature
• To identify connections between different topics in the literature
Motivation for a new tool
• VOSviewer has proven to be a very useful tool for visualizing science from a static point of view
• VOSviewer has not been developed for visualizing the dynamics of science
• In fact, the availability of software tools for dynamic visualizations is rather limited:
– CiteSpace (Chaomei Chen)
– HistCite (Eugene Garfield)
HistCite
• Timeline visualization of publications and their citation relations, referred to as algorithmic historiography by Garfield
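A timeline of this kind can be approximated by selecting the publications most frequently cited within the data set and placing them in columns by publication year, with their mutual citation relations as edges. The Python sketch below is a hypothetical illustration of that idea; the selection rule (the top_n publications by within-data-set citation count, ties broken by identifier) and the function names are assumptions, not a description of how HistCite or the Citation Network Explorer work internally.

```python
from collections import Counter, defaultdict

def local_citation_scores(citations, publications):
    """Count how often each publication is cited by publications in the data set."""
    scores = Counter({p: 0 for p in publications})
    for citing, cited in citations:
        if citing in publications and cited in publications:
            scores[cited] += 1
    return scores

def timeline(citations, years, top_n):
    """Pick the top_n most locally cited publications and lay them out by year.

    citations: list of (citing, cited) pairs
    years:     dict publication -> publication year
    Returns (layers, edges): layers maps year -> publications, and edges are
    the citation relations among the selected publications.
    """
    scores = local_citation_scores(citations, set(years))
    selected = sorted(years, key=lambda p: (-scores[p], p))[:top_n]  # ties broken by id
    chosen = set(selected)
    layers = defaultdict(list)
    for p in selected:
        layers[years[p]].append(p)
    edges = [(a, b) for a, b in citations if a in chosen and b in chosen]
    return {year: sorted(ps) for year, ps in sorted(layers.items())}, edges

years = {"P1": 1980, "P2": 1985, "P3": 1990, "P4": 1990, "P5": 2000}
citations = [("P2", "P1"), ("P3", "P1"), ("P4", "P1"), ("P4", "P2"), ("P5", "P3"), ("P5", "P4")]
layers, edges = timeline(citations, years, top_n=4)
print(layers)  # {1980: ['P1'], 1985: ['P2'], 1990: ['P3', 'P4']}
print(edges)   # [('P2', 'P1'), ('P3', 'P1'), ('P4', 'P1'), ('P4', 'P2')]
```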
Citation Network Explorer
• Somewhat similar to HistCite, but capable of dealing with much larger citation networks
• So far, the tool has been used successfully with the entire Web of Science citation network of the social sciences (1980–2013; ~2M publications and ~20M citations)
• The aim is to be able to handle the entire citation network of all scientific disciplines (~40M publications and ~500M citations)
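Handling ~40M publications and ~500M citations requires a compact in-memory representation. One common option, assumed here purely for illustration and not confirmed as the design of the Citation Network Explorer, is compressed sparse row (CSR) storage: one offset per publication plus one integer per citation. The Python sketch below builds such a structure and estimates its size under the further assumption of 8-byte offsets and 4-byte publication identifiers; build_csr and csr_bytes are hypothetical names.

```python
from array import array

def build_csr(n, citations):
    """Compressed sparse row (CSR) storage of a citation network.

    n:         number of publications, identified by integers 0..n-1
    citations: list of (citing, cited) pairs
    Returns (offsets, targets); the references of publication i are
    targets[offsets[i]:offsets[i + 1]].
    """
    counts = [0] * (n + 1)
    for citing, _ in citations:
        counts[citing + 1] += 1
    for i in range(n):                          # prefix sums turn counts into offsets
        counts[i + 1] += counts[i]
    offsets = array("q", counts)                # 64-bit offsets: edge counts can exceed 2**31
    targets = array("i", [0]) * len(citations)  # C int ids (4 bytes on common platforms)
    fill = counts[:n]                           # next free slot for each publication
    for citing, cited in citations:
        targets[fill[citing]] = cited
        fill[citing] += 1
    return offsets, targets

def csr_bytes(n_nodes, n_edges, offset_bytes=8, id_bytes=4):
    """Memory for one CSR direction: n + 1 offsets plus one id per edge."""
    return (n_nodes + 1) * offset_bytes + n_edges * id_bytes

offsets, targets = build_csr(4, [(1, 0), (2, 0), (2, 1), (3, 2)])
print(list(targets[offsets[2]:offsets[3]]))                    # references of publication 2: [0, 1]
print(f"{csr_bytes(2_000_000, 20_000_000) / 1e9:.3f} GB")      # social sciences: 0.096 GB
print(f"{csr_bytes(40_000_000, 500_000_000) / 1e9:.3f} GB")    # all disciplines: 2.320 GB
```

Keeping the reverse direction as well (which publications cite a given publication) would roughly double these estimates.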
Today’s demonstration (1)
• We demonstrate a prototype of the tool
• The core functionality is available, but some options have not yet been fully implemented
• Your feedback is very much appreciated!
Today’s demonstration (2)
• Data set 1:
– Scientometrics
– 1980–2013
– ~10K publications and ~60K citations
• Data set 2:
– All social sciences except for psychology, education, and health-related sciences
– 1980–2013
– ~1.4M publications and ~10M citations
Citation Network Explorer
References
Van Eck, N.J., & Waltman, L. (2010). Software survey: VOSviewer, a computer program for bibliometric mapping. Scientometrics, 84(2), 523-538.
Van Eck, N.J., & Waltman, L. (2011). Text mining and visualization using VOSviewer. ISSI Newsletter, 7(3), 50-54.
Van Eck, N.J., Waltman, L., Dekker, R., & Van den Berg, J. (2010). A comparison of two techniques for bibliometric mapping: Multidimensional scaling and VOS. Journal of the American Society for Information Science and Technology, 61(12), 2405-2416.
Van Eck, N.J., Waltman, L., Van Raan, A.F.J., Klautz, R.J.M., & Peul, W.C. (2013). Citation analysis may severely underestimate the impact of clinical research as compared to basic research. PLoS ONE, 8(4), e62395.
Waltman, L., & Van Eck, N.J. (2012). A new methodology for constructing a publication-level classification system of science. Journal of the American Society for Information Science and Technology, 63(12), 2378-2392.
Waltman, L., & Van Eck, N.J. (2013). A smart local moving algorithm for large-scale modularity-based community detection. arXiv:1308.6604.
Waltman, L., Van Eck, N.J., & Noyons, E.C.M. (2010). A unified approach to mapping and clustering of bibliometric networks. Journal of Informetrics, 4(4), 629-635.