large-scale analysis of bibliometric networks

Large-scale analysis of bibliometric networks Nees Jan van Eck Centre for Science and Technology Studies (CWTS), Leiden University International Conference on Data-driven Discovery: When Data Science Meets Information Science Beijing, China, June 20, 2016


Large-scale analysis of bibliometric networks

Nees Jan van Eck
Centre for Science and Technology Studies (CWTS), Leiden University

International Conference on Data-driven Discovery: When Data Science Meets Information Science
Beijing, China, June 20, 2016

Bibliographic databases: ‘Big data’

                Web of Science    Scopus
Journals        12,000            20,000
Publications    45 million        35 million
Citations       1 billion         0.9 billion

Bibliometric networks

Networks derived from a bibliographic database (Web of Science, Scopus):

• Citation network of pubs / authors / journals
• Co-authorship network of authors / organizations
• Co-citation network of pubs / authors / journals
• Co-occurrence network of keywords / terms
• Bibliographic coupling network of pubs / authors / journals
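To make two of these network types concrete, here is a minimal Python sketch that derives bibliographic coupling and co-citation links from reference lists. The input format (a dictionary from a citing publication to its cited references), the function names, and the toy identifiers are assumptions made for illustration. They do not reflect how VOSviewer or any database represents its data.

```python
from collections import Counter
from itertools import combinations


def bibliographic_coupling(refs):
    """Count shared references for each pair of citing publications."""
    counts = Counter()
    for a, b in combinations(sorted(refs), 2):
        shared = len(set(refs[a]) & set(refs[b]))
        if shared:
            counts[(a, b)] = shared
    return counts


def co_citation(refs):
    """Count, for each pair of cited items, how many publications cite both."""
    counts = Counter()
    for cited in refs.values():
        for a, b in combinations(sorted(set(cited)), 2):
            counts[(a, b)] += 1
    return counts


# Hypothetical toy data: P1..P3 are citing publications, A..C are cited ones.
refs = {
    "P1": ["A", "B"],
    "P2": ["A", "B", "C"],
    "P3": ["C"],
}
coupling = bibliographic_coupling(refs)
cocit = co_citation(refs)
```

In this toy example, P1 and P2 are coupled with weight 2 because they share two references, while A and B are co-cited twice because two publications cite both.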

Outline

• Software tools

• Network analysis techniques

• Analysis of data science


Software tools


• VOSviewer (www.vosviewer.com)

– Tool for constructing and visualizing bibliometric networks

• CitNetExplorer (www.citnetexplorer.nl)

– Tool for visualizing and analyzing citation networks of publications

• Both tools have been developed together with my colleague Ludo Waltman

VOSviewer


VOSviewer: Overview

• Software tool for visualizing (bibliometric) networks

• Built-in support for popular bibliographic databases

• Text mining functionality

• Layout and clustering techniques

• Advanced visualization features:

– Smart labeling algorithm

– Overlay visualizations

– Density visualizations (‘heat map’)

• Users:

– Researchers

– Professional users (e.g., universities, libraries, funders, publishers)

[Figure: Map of university co-authorship network]

[Figure: Map of journal citation network]

CitNetExplorer

VOSviewer:

• Any type of bibliometric network
• Co-authorship, direct citations, co-citation, and bibliographic coupling
• Time dimension is ignored
• Networks of at most ~10,000 nodes are supported

CitNetExplorer:

• Only citation networks of publications
• Direct citation between publications
• Time dimension is explicitly considered
• Millions of publications are supported

Network analysis techniques

Layout:

• Assigning the nodes in a network to locations in a (usually 2D) space (a.k.a. mapping)
• Visualization of similarities (VOS)

Clustering:

• Partitioning the nodes in a network into a number of groups (a.k.a. community detection)
• Weighted modularity
• Smart local moving algorithm

Clustering can be seen as mapping in a restricted space

Unified approach to mapping and clustering

Minimize

\[
Q(x_1, \ldots, x_n) = \sum_{i<j} \frac{2m}{k_i k_j} A_{ij} d_{ij}^2 - \sum_{i<j} d_{ij}
\]

where

n: number of nodes in the network
m: total weight of all edges in the network
A_ij: weight of edge between nodes i and j
k_i: total weight of all edges of node i

Mapping

x_i: vector denoting the location of node i in a p-dimensional space

\[
d_{ij} = \lVert x_i - x_j \rVert = \sqrt{\sum_{k=1}^{p} (x_{ik} - x_{jk})^2}
\]

Clustering

x_i: integer denoting the community to which node i belongs
γ: resolution parameter

\[
d_{ij} =
\begin{cases}
0 & \text{if } x_i = x_j \\
1/\gamma & \text{if } x_i \neq x_j
\end{cases}
\]
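As a rough illustration of how one objective serves both purposes, the following Python sketch evaluates Q = Σ_{i<j} (2m / (k_i k_j)) A_ij d_ij² − Σ_{i<j} d_ij with either a Euclidean distance (mapping) or a 0 / (1/γ) distance (clustering). The function names, the dense adjacency-list representation, and the toy network (two triangles joined by one edge) are assumptions made for this example. The code only evaluates Q for given positions or labels and does not perform the minimization.

```python
import math
from itertools import combinations


def quality(A, d):
    """Evaluate Q for a symmetric weight matrix A and a distance function d(i, j).

    Assumes every node has at least one edge (k_i > 0).
    """
    n = len(A)
    k = [sum(row) for row in A]   # k_i: total edge weight of node i
    m = sum(k) / 2.0              # m: total edge weight in the network
    q = 0.0
    for i, j in combinations(range(n), 2):
        dij = d(i, j)
        q += 2.0 * m * A[i][j] / (k[i] * k[j]) * dij ** 2 - dij
    return q


def mapping_distance(x):
    """Mapping case: x[i] is a coordinate tuple, d_ij is Euclidean distance."""
    return lambda i, j: math.dist(x[i], x[j])


def clustering_distance(x, gamma=1.0):
    """Clustering case: x[i] is a community label, d_ij is 0 or 1/gamma."""
    return lambda i, j: 0.0 if x[i] == x[j] else 1.0 / gamma


# Hypothetical toy network: triangles {0,1,2} and {3,4,5} joined by edge 2-3.
A = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 0, 0],
    [0, 0, 1, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]

q_two_clusters = quality(A, clustering_distance([0, 0, 0, 1, 1, 1]))
q_one_cluster = quality(A, clustering_distance([0] * 6))
q_singletons = quality(A, clustering_distance(list(range(6))))

# Arbitrary illustrative coordinates for the mapping case.
coords = [(0, 0), (0, 1), (1, 0.5), (2, 0.5), (3, 0), (3, 1)]
q_layout = quality(A, mapping_distance(coords))
```

On this toy network, the partition into the two triangles gives a lower Q than either a single cluster or all singletons, which matches the intuition that lower Q corresponds to a better solution.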

Smart local moving algorithm

[Figure: steps of the smart local moving algorithm. Panel labels: Original network; Local moving heuristic; Local moving heuristic in subnetworks; Reduced network. Modularity values shown: Q = 0.3791 and Q = 0.4198.]
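The local moving heuristic named on this slide can be sketched in Python as follows. This is only the basic building block, written for clarity rather than speed (modularity is recomputed from scratch for every candidate move). It is not the full smart local moving algorithm, which additionally applies the heuristic within subnetworks and on a reduced network. The function names, the random visiting order, and the toy network are assumptions made for illustration.

```python
import random


def modularity(A, comm, gamma=1.0):
    """Weighted modularity of a partition, with resolution parameter gamma."""
    n = len(A)
    k = [sum(row) for row in A]
    two_m = sum(k)
    q = 0.0
    for i in range(n):
        for j in range(n):
            if comm[i] == comm[j]:
                q += A[i][j] - gamma * k[i] * k[j] / two_m
    return q / two_m


def local_moving(A, gamma=1.0, seed=0):
    """Move single nodes to neighboring communities while modularity improves."""
    n = len(A)
    comm = list(range(n))          # start with every node in its own community
    rng = random.Random(seed)
    improved = True
    while improved:
        improved = False
        order = list(range(n))
        rng.shuffle(order)         # visit nodes in random order (an assumption)
        for i in order:
            best_c = comm[i]
            best_q = modularity(A, comm, gamma)
            neighbor_comms = {comm[j] for j in range(n) if j != i and A[i][j] > 0}
            for c in neighbor_comms:
                old = comm[i]
                comm[i] = c        # tentatively move node i
                q = modularity(A, comm, gamma)
                comm[i] = old      # undo the tentative move
                if q > best_q + 1e-12:
                    best_c, best_q = c, q
            if best_c != comm[i]:
                comm[i] = best_c   # commit the best strictly improving move
                improved = True
    return comm


# Hypothetical toy network: triangles {0,1,2} and {3,4,5} joined by edge 2-3.
A = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 0, 0],
    [0, 0, 1, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
result = local_moving(A)
```

Because a node is only moved when modularity strictly increases, the loop terminates, and the result is a local optimum that may depend on the visiting order.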

Algorithmically constructed classification system of science

• 17.8 million publications from the period 2000–2015 indexed in Web of Science
• 282.4 million citation relations
• Classification system of 3 hierarchical levels:
– 27 broad disciplines
– 817 fields
– 4,113 subfields

Breakdown of scientific literature into 817 fields

[Figure: map of the 817 fields. Region labels: Social sciences and humanities; Biomedical and health sciences; Life and earth sciences; Mathematics and computer science; Physical sciences and engineering.]

[Figure: Publications in scientometrics subfield]

[Figure: Time-line map of highly cited scientometrics publications]

Analysis of data science

What is data science?

• Empirical operationalization of data science based on publications with ‘data’ in title or abstract

Wikipedia: “Data Science is an interdisciplinary field about processes and systems to extract knowledge or insights from data … which is a continuation of some of the data analysis fields such as statistics, data mining, and predictive analytics”

LCDS: “Data Science … deals with finding, analyzing and validating complex patterns in data. Data Science methods are indispensable for maintaining a competitive edge in all disciplines in science”
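The operationalization above (publications with ‘data’ in title or abstract) could be sketched in Python as follows. The slides do not say how the word is matched, so the whole-word, case-insensitive match is an assumption, as are the record layout, the function names, and the toy records. The 25% default threshold mirrors the ‘data science fields’ criterion used later in the talk.

```python
import re
from collections import defaultdict

# Assumption: match 'data' as a whole word, ignoring case.
DATA_WORD = re.compile(r"\bdata\b", re.IGNORECASE)


def is_data_publication(title, abstract):
    """True if 'data' occurs as a word in the title or the abstract."""
    return bool(DATA_WORD.search(title) or DATA_WORD.search(abstract))


def data_share_by_field(pubs):
    """Fraction of 'data' publications per field."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for p in pubs:
        totals[p["field"]] += 1
        if is_data_publication(p["title"], p["abstract"]):
            hits[p["field"]] += 1
    return {field: hits[field] / totals[field] for field in totals}


def data_science_fields(shares, threshold=0.25):
    """Fields whose share of 'data' publications is at least the threshold."""
    return sorted(f for f, s in shares.items() if s >= threshold)


# Hypothetical toy records, for illustration only.
pubs = [
    {"field": "remote sensing", "title": "Satellite data fusion",
     "abstract": "A fusion method."},
    {"field": "remote sensing", "title": "Cloud masks",
     "abstract": "We validate on Landsat data."},
    {"field": "philosophy", "title": "On truth",
     "abstract": "A conceptual essay."},
    {"field": "philosophy", "title": "Database ethics",
     "abstract": "No empirical part."},
]
shares = data_share_by_field(pubs)
selected = data_science_fields(shares)
```

With whole-word matching, a title such as "Database ethics" does not count as a ‘data’ publication. A substring match would count it, so the matching rule has a direct effect on the resulting percentages.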

Growth of data-driven research

[Figure: percentage of publications per year, 1990 to 2015 (y-axis from 0% to 20%). Series: % 'data' publications; % 'theory' publications.]


Data-driven nature of different scientific fields

[Figure: field map showing % pub. with ‘data’ in title or abstract. Region labels: Social sciences and humanities; Biomedical and health sciences; Life and earth sciences; Mathematics and computer science; Physical sciences and engineering.]

Data-driven nature of different scientific fields

[Figure: field map showing % pub. with ‘data’ in title or abstract. Labeled fields: artificial intelligence; statistics; bioinformatics; neuroimaging; pattern recognition; astronomy; earth; water; climate; remote sensing; nutrition; obesity; addiction; accident analysis.]

Data science fields (at least 25% ‘data’ publications)

[Figure: field map. Region labels: Social sciences and humanities; Biomedical and health sciences; Life and earth sciences; Mathematics and computer science; Physical sciences and engineering.]

[Figure: Term map of data science fields]

China’s publication output in data science fields

[Figure: field map. Region labels: Social sciences and humanities; Biomedical and health sciences; Life and earth sciences; Mathematics and computer science; Physical sciences and engineering.]

China’s publication output in data science fields

[Figure: field map. Labeled fields: artificial intelligence; pattern recognition; high energy; earth; atmospheres; weather; remote sensing.]

Chinese institutes with most publications in data science fields (2011-2015)

• Chinese Academy of Sciences
• Peking University
• Tsinghua University
• China University of Geosciences
• Zhejiang University
• Nanjing University
• Shanghai Jiao Tong University
• University of Science and Technology of China
• Beijing Normal University
• University of Hong Kong

CAS publication output in data science fields

[Figure: field map. Labeled fields: earth; atmospheres; weather; remote sensing; vegetation; astronomy; high energy.]

[Figure: Term map based on CAS publications in data science fields]

CAS (Beijing Branch) publication output in data science fields

[Figure: field map. Labeled fields: astronomy; earth; atmospheres; weather; remote sensing; vegetation; high energy.]

CAS (Shanghai Branch) publication output in data science fields

[Figure: field map. Labeled fields: bioinformatics; genetics; astronomy; nuclear.]

Do it yourself!

www.vosviewer.com
www.citnetexplorer.nl

Thank you for your attention!