Large-scale analysis of bibliometric data sources Nees Jan van Eck Centre for Science and Technology Studies (CWTS), Leiden University 8th LCDS Meeting: Statistics & Data Science Leiden, November 13, 2015


Large-scale analysis of bibliometric data sources

Nees Jan van Eck

Centre for Science and Technology Studies (CWTS), Leiden University

8th LCDS Meeting: Statistics & Data Science

Leiden, November 13, 2015

About myself

• Master in computer science
• PhD thesis on bibliometric mapping of science
• Researcher at CWTS since 2009
• Research focus on analysis and visualization of bibliometric networks

Centre for Science and Technology Studies (CWTS)

• Research center at Leiden University focusing on science and technology studies
• About 30 staff members
• History of more than 25 years in bibliometric and scientometric research
• Contract research
• Full access to large bibliographic databases (Web of Science and Scopus)

Bibliographic databases: ‘Big data’

                Web of Science    Scopus
Journals        12,000            20,000
Publications    45 million        35 million
Citations       1 billion         0.9 billion

Bibliometric networks

[Figure: a bibliographic database (Web of Science, Scopus) as the source of the following networks]

• Citation network of publications
• Co-authorship network of authors / organizations
• Co-citation network of publications / authors / journals
• Co-occurrence network of terms
• Bibliographic coupling network of publications / authors / journals
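To make two of the network types listed above concrete, here is a minimal sketch of how bibliographic coupling and co-citation links could be derived from citation data. The toy data, the function names, and the use of plain co-occurrence counts as link weights are assumptions for illustration; they are not taken from the slides or from any particular tool.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy data: citing publication -> set of cited publications.
citations = {
    "P1": {"A", "B"},
    "P2": {"A", "B", "C"},
    "P3": {"C"},
}

def bibliographic_coupling(citations):
    """Link two citing publications by the number of references they share."""
    weights = {}
    for p, q in combinations(sorted(citations), 2):
        shared = len(citations[p] & citations[q])
        if shared:
            weights[(p, q)] = shared
    return weights

def co_citation(citations):
    """Link two cited publications by the number of publications citing both."""
    weights = Counter()
    for refs in citations.values():
        for a, b in combinations(sorted(refs), 2):
            weights[(a, b)] += 1
    return dict(weights)

if __name__ == "__main__":
    print(bibliographic_coupling(citations))
    print(co_citation(citations))
```

Bibliographic coupling looks at the citing side (shared reference lists), while co-citation looks at the cited side (being cited together); both are computed from the same citation relations.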

Outline

• Software tools
• Network analysis techniques
• Analysis of data science

Software tools


• VOSviewer (www.vosviewer.com)
  – Tool for constructing and visualizing bibliometric networks
• CitNetExplorer (www.citnetexplorer.nl)
  – Tool for visualizing and analyzing citation networks of publications
• Both tools have been developed together with my colleague Ludo Waltman

VOSviewer

Map of university co-authorship network

Map of journal citation network

CitNetExplorer

Network analysis techniques

Layout:
• Visualization of similarities (VOS)

Community detection:
• Weighted modularity
• Smart local moving algorithm
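As a rough illustration of the quantity that community detection tries to maximize, the sketch below computes the standard modularity Q of a partition of a weighted, undirected network. This is an assumption about what is meant: the "weighted modularity" named on the slide may refer to a different variant, and the function name and edge-list format are hypothetical.

```python
def modularity(edges, community):
    """Standard modularity of a partition of a weighted, undirected network.

    edges:     list of (u, v, weight) tuples, each undirected edge listed once
    community: dict mapping each node to a community label

    Q = sum over communities c of  w_in(c) / W - (d(c) / (2 W)) ** 2
    where W is the total edge weight, w_in(c) the weight of edges inside c,
    and d(c) the summed weighted degree of the nodes in c.
    """
    total = sum(w for _, _, w in edges)
    internal = {}
    degree = {}
    for u, v, w in edges:
        cu, cv = community[u], community[v]
        degree[cu] = degree.get(cu, 0.0) + w
        degree[cv] = degree.get(cv, 0.0) + w
        if cu == cv:
            internal[cu] = internal.get(cu, 0.0) + w
    return sum(
        internal.get(c, 0.0) / total - (d / (2 * total)) ** 2
        for c, d in degree.items()
    )

# Toy network (assumed for illustration): two triangles joined by one edge.
toy_edges = [
    (0, 1, 1.0), (1, 2, 1.0), (0, 2, 1.0),
    (3, 4, 1.0), (4, 5, 1.0), (3, 5, 1.0),
    (2, 3, 1.0),
]
```

Splitting the toy network into its two triangles gives a higher Q than putting all nodes in a single community, which is the behaviour a community detection algorithm exploits.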

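Smart local moving algorithm

[Figure: illustration of the smart local moving algorithm, with panels labeled "Original network", "Local moving heuristic", "Local moving heuristic in subnetworks", and "Reduced network"; modularity values shown: Q = 0.3791 and Q = 0.4198]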
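The local moving heuristic named above can be sketched as follows: nodes are visited repeatedly, and each node is moved to the neighbouring community that gives the largest increase in modularity, until no move improves it. This is a simplified sketch under stated assumptions, not the smart local moving algorithm itself, which additionally applies the heuristic inside subnetworks and builds a reduced network. The function name `local_moving`, the use of standard modularity, and the fixed node order (real implementations usually visit nodes in random order) are assumptions.

```python
from collections import defaultdict

def local_moving(edges):
    """One run of a local moving heuristic for standard modularity.

    edges: list of (u, v, weight) tuples of an undirected network without
           self-loops, each edge listed once.
    Returns a dict mapping each node to a community label.
    """
    adj = defaultdict(lambda: defaultdict(float))
    for u, v, w in edges:
        adj[u][v] += w
        adj[v][u] += w
    two_w = 2.0 * sum(w for _, _, w in edges)
    strength = {n: sum(adj[n].values()) for n in adj}
    community = {n: n for n in adj}      # start from singleton communities
    comm_strength = dict(strength)       # summed node strength per community
    nodes = sorted(adj)                  # fixed order keeps the sketch deterministic

    improved = True
    while improved:
        improved = False
        for n in nodes:
            old = community[n]
            comm_strength[old] -= strength[n]   # take n out of its community
            links = defaultdict(float)          # weight from n to each community
            for m, w in adj[n].items():
                links[community[m]] += w
            # Gain of joining community c, up to a constant positive factor.
            best = old
            best_gain = links.get(old, 0.0) - strength[n] * comm_strength[old] / two_w
            for c, w_c in links.items():
                gain = w_c - strength[n] * comm_strength[c] / two_w
                if gain > best_gain + 1e-12:
                    best, best_gain = c, gain
            community[n] = best
            comm_strength[best] += strength[n]
            if best != old:
                improved = True
    return community
```

On a toy network of two triangles joined by a single edge, this procedure ends with each triangle as its own community. Because every move strictly increases modularity, the loop terminates, but only at a local optimum.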

Algorithmically constructed classification system of science

• 16.2 million publications from the period 2000–2014 indexed in Web of Science
• 241.7 million citation relations
• Classification system of 3 hierarchical levels:
  – 28 broad disciplines
  – 813 fields
  – 3,822 subfields

Breakdown of scientific literature into 813 fields

[Figure: map of the 813 fields, with five main areas labeled: Social sciences and humanities; Biomedical and health sciences; Life and earth sciences; Mathematics and computer science; Physical sciences and engineering]

Publications in scientometrics subfield

Time-line map of highly cited scientometrics publications

Analysis of data science

What is data science?

Wikipedia: “Data Science is an interdisciplinary field about processes and systems to extract knowledge or insights from data … which is a continuation of some of the data analysis fields such as statistics, data mining, and predictive analytics”

LCDS: “Data Science … deals with finding, analyzing and validating complex patterns in data. Data Science methods are indispensable for maintaining a competitive edge in all disciplines in science”

• Empirical operationalization of data science based on publications with ‘data’ in title or abstract
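The operationalization used in this talk, publications with ‘data’ in title or abstract, could be sketched as below. The record layout (`year`, `title`, `abstract` keys), the function names, and the choice of a case-insensitive whole-word match are assumptions; the slides do not say how variants such as "dataset" or "database" are treated.

```python
import re
from collections import defaultdict

# Assumption: match 'data' as a whole word, ignoring case.
DATA_WORD = re.compile(r"\bdata\b", re.IGNORECASE)

def is_data_publication(title, abstract):
    """True if the word 'data' occurs in the title or the abstract."""
    return bool(DATA_WORD.search(f"{title} {abstract}"))

def data_share_by_year(publications):
    """Fraction of 'data' publications per publication year."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for pub in publications:
        totals[pub["year"]] += 1
        if is_data_publication(pub["title"], pub["abstract"]):
            hits[pub["year"]] += 1
    return {year: hits[year] / totals[year] for year in sorted(totals)}
```

Applying `data_share_by_year` to all publications in a database would give a yearly time series of the share of ‘data’ publications; replacing the pattern with one for 'theory' would give a comparison series.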

Growth of data-driven research

[Figure: percentage of publications per year, 1990 to 2015 (vertical axis from 0% to 20%), with two series: % 'data' publications and % 'theory' publications]


Data-driven nature of different scientific fields

[Figure: map of scientific fields with main areas labeled: Social sciences and humanities; Biomedical and health sciences; Life and earth sciences; Mathematics and computer science; Physical sciences and engineering. Legend: % publications with ‘data’ in title or abstract]

[Figure: map of scientific fields with the following labels: artificial intelligence, statistics, bioinformatics, neuroimaging, pattern recognition, astronomy, earth, water, weather, climate, remote sensing, nutrition, obesity, addiction. Legend: % publications with ‘data’ in title or abstract]

Data science fields (at least 20% ‘data’ publications)

[Figure: map of scientific fields with main areas labeled: Social sciences and humanities; Biomedical and health sciences; Life and earth sciences; Mathematics and computer science; Physical sciences and engineering]
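The selection rule in the slide title, at least 20% ‘data’ publications, amounts to a simple threshold on per-field shares. A minimal sketch, assuming per-field counts are already available; the function name, the input layout, and the field names and numbers in the usage note are hypothetical.

```python
def select_data_science_fields(field_counts, threshold=0.20):
    """Return fields whose share of 'data' publications is at least `threshold`.

    field_counts: dict mapping field name -> (n_data_publications, n_publications)
    """
    selected = {}
    for field, (n_data, n_total) in field_counts.items():
        if n_total == 0:
            continue  # skip empty fields to avoid division by zero
        share = n_data / n_total
        if share >= threshold:
            selected[field] = share
    return selected
```

For example, with made-up counts `{"field_a": (30, 100), "field_b": (5, 100)}`, only `field_a` would be selected.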

Term map of data science fields
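A term map is built on a co-occurrence network of terms. The sketch below shows one simple way such a network could be counted, assuming that terms have already been extracted from each publication's title and abstract. The function name, the minimum-occurrence threshold, and counting each term at most once per publication are assumptions for illustration, not a description of how the maps in this talk were produced.

```python
from collections import Counter
from itertools import combinations

def term_cooccurrence(documents, min_occurrences=2):
    """Count term occurrences and pairwise co-occurrences.

    documents: list of term lists, one list per publication.
    Each term is counted at most once per publication, and only terms that
    occur in at least `min_occurrences` publications are linked.
    """
    occurrences = Counter()
    for terms in documents:
        occurrences.update(set(terms))
    kept = {t for t, n in occurrences.items() if n >= min_occurrences}

    links = Counter()
    for terms in documents:
        present = sorted(set(terms) & kept)
        links.update(combinations(present, 2))
    return occurrences, links
```

The resulting link counts can serve as edge weights of the term network, which is then laid out and clustered like any other bibliometric network.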

Leiden University’s publication output in data science fields

[Figure: map of data science fields with main areas labeled: Social sciences and humanities; Biomedical and health sciences; Life and earth sciences; Mathematics and computer science; Physical sciences and engineering]

Leiden University’s institutes with most publications in data science fields

• Leiden Observatory
• LUMC
• Faculty of Archaeology
• Institute of Psychology (FSW)
• Centre for Science and Technology Studies (FSW)
• Mathematical Institute (Science)
• Institute of Biology Leiden (Science)
• Leiden Institute of Advanced Computer Science (Science)

LUMC departments with most publications in data science fields

• Medical Statistics and Bioinformatics
• Rheumatology
• Psychiatry
• Radiology
• Clinical Epidemiology
• Human Genetics
• Neurosurgery
• Cardiology
• Clinical Oncology
• Endocrinology

Term map based on Leiden University’s publications in data science fields

Do it yourself!

• www.vosviewer.com
• www.citnetexplorer.nl

Thank you for your attention!
