pubmed dataset visualisation pecha kucha

PUBMED dataset visualisation

George Gkotsis

Knowledge Media Institute, The Open University

21.5 million citations, 10.8 million authors

Visualisation

•  XKCD-style

•  Infographic-style, vertical scrolling

Data analysis

1. Co-authorship network

2. Academic retention and productivity

3. Terminology & evolution

1. Co-authorship network

•  For each year, a co-authorship graph is constructed

•  Visualise graph properties:
   – Nodes
   – Edges
   – Clustering coefficient
   – Entropy:

Rowe & Strohmaier, WWW2014
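
A minimal sketch of the per-year graph construction and the listed properties, using NetworkX. The `records` input, the function names, and the sample data are hypothetical. The entropy here is the Shannon entropy of the degree distribution, an assumed stand-in: the entropy definition used on the slide is not reproduced in this text, so this should not be read as the cited measure.

```python
import math
from collections import Counter
from itertools import combinations

import networkx as nx

# Hypothetical input: one (year, [author names]) tuple per citation record.
records = [
    (2000, ["A", "B", "C"]),
    (2000, ["C", "D"]),
    (2001, ["A", "B"]),
    (2001, ["B", "E"]),
    (2001, ["E", "F"]),
]


def yearly_graphs(records):
    """Build one undirected co-authorship graph per publication year."""
    graphs = {}
    for year, authors in records:
        g = graphs.setdefault(year, nx.Graph())
        g.add_nodes_from(authors)
        # Every pair of co-authors on the same paper shares an edge.
        g.add_edges_from(combinations(sorted(set(authors)), 2))
    return graphs


def degree_entropy(g):
    """Shannon entropy (bits) of the degree distribution (assumed measure)."""
    counts = Counter(degree for _, degree in g.degree())
    n = g.number_of_nodes()
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def graph_properties(g):
    """Collect the four properties listed on the slide for one graph."""
    return {
        "nodes": g.number_of_nodes(),
        "edges": g.number_of_edges(),
        "clustering": nx.average_clustering(g),
        "entropy": degree_entropy(g),
    }


properties = {
    year: graph_properties(g)
    for year, g in sorted(yearly_graphs(records).items())
}
```

Each entry of `properties` can then be plotted against its year to show how the network changes over time.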

1. Co-authorship network (cont.)

2. Academic throughput and retention

•  Researcher profile – 4 attributes:

[1] Year of first publication
[2] Year of last publication
[3] Number of publications
[4] Duration of research activity ([2]-[1])

1966 - 2001
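
The four profile attributes can be derived with a single Pandas group-by. This is a sketch under assumptions: the `papers` table, its column names, and the sample rows are hypothetical, with one row per (author, publication) pair.

```python
import pandas as pd

# Hypothetical input: one row per (author, publication year) pair.
papers = pd.DataFrame(
    {
        "author": ["A", "A", "A", "B", "B", "C"],
        "year": [1990, 1995, 2001, 1998, 1998, 1970],
    }
)

profiles = (
    papers.groupby("author")["year"]
    # [1] first year, [2] last year, [3] number of publications
    .agg(first_year="min", last_year="max", n_publications="count")
    # [4] duration of research activity, computed as [2] - [1]
    .assign(duration=lambda df: df["last_year"] - df["first_year"])
)

print(profiles)
```

An author with all publications in a single year gets a duration of 0 under this definition.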

2. Academic throughput and retention (cont.)

3. Terminology & evolution

w: 1-gram word-term
M: 5-year titles' corpus

3. Terminology & evolution (cont.)
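
One way to work with these two symbols is to group titles into 5-year corpora and look up a term in each. The sketch below makes several assumptions: the `titles` sample, the window alignment (years rounded down to a multiple of five), the regex tokenisation, and the relative-frequency measure are all illustrative, since the measure actually applied to w and M is not given in this text. A tokeniser from NLTK could replace the regex.

```python
import re
from collections import Counter, defaultdict

# Hypothetical input: (publication year, title) pairs.
titles = [
    (1990, "Gene expression in yeast"),
    (1992, "Yeast gene regulation"),
    (1996, "Genome sequencing of yeast"),
    (1998, "Human genome project"),
]

WINDOW = 5  # years per corpus M


def window_start(year, window=WINDOW):
    """Map a year to the first year of its window (assumed alignment)."""
    return year - year % window


def window_counts(titles):
    """Count 1-gram terms in the titles of each 5-year window."""
    counts = defaultdict(Counter)
    for year, title in titles:
        tokens = re.findall(r"[a-z]+", title.lower())
        counts[window_start(year)].update(tokens)
    return counts


def relative_frequency(w, M):
    """Share of all tokens in corpus M that are the term w (assumed measure)."""
    total = sum(M.values())
    return M[w] / total if total else 0.0


corpora = window_counts(titles)
for start in sorted(corpora):
    print(start, relative_frequency("genome", corpora[start]))
```

Tracking the same term across consecutive windows gives a simple view of how its usage evolves.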

Development

•  Pandas: data analysis and manipulation

•  NetworkX: network analysis

•  NLTK: natural language processing

•  Matplotlib: plotting
