PubMed dataset visualisation pecha kucha
DESCRIPTION
PubMed dataset visualisation pecha kucha for the WebScience 2014 conference
TRANSCRIPT
PubMed dataset visualisation
George Gkotsis
Knowledge Media Institute, The Open University
21.5 million citations, 10.8 million authors
Visualisation
• XKCD-style
• Infographic-style, vertical scrolling
Data analysis
1. Co-authorship network
2. Academic retention and productivity
3. Terminology & evolution
1. Co-authorship network
• For each year, a co-authorship graph is constructed
• Visualise graph properties:
  – Nodes
  – Edges
  – Clustering coefficient
  – Entropy: Rowe & Strohmaier, WWW2014
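The per-year construction above can be sketched as follows. This is a minimal, standard-library illustration rather than the code behind the talk: the input shape (year, list of authors), the function names, and the sample papers are all hypothetical. In particular, entropy is computed here as the Shannon entropy of the degree distribution, which is an assumption; the slides cite Rowe & Strohmaier (WWW2014) for the measure actually used.

```python
from collections import defaultdict
from itertools import combinations
from math import log2


def build_yearly_graphs(papers):
    """Return {year: {author: set(co_authors)}} from (year, [authors]) records."""
    graphs = defaultdict(lambda: defaultdict(set))
    for year, authors in papers:
        unique = sorted(set(authors))
        for a in unique:
            _ = graphs[year][a]  # register solo authors as isolated nodes
        for a, b in combinations(unique, 2):
            graphs[year][a].add(b)
            graphs[year][b].add(a)
    return graphs


def local_clustering(adj, node):
    """Fraction of a node's neighbour pairs that are themselves connected."""
    neighbours = adj[node]
    k = len(neighbours)
    if k < 2:
        return 0.0
    links = sum(1 for u, v in combinations(neighbours, 2) if v in adj[u])
    return 2.0 * links / (k * (k - 1))


def degree_entropy(adj):
    """Shannon entropy (bits) of the degree distribution (assumed definition)."""
    counts = defaultdict(int)
    for node in adj:
        counts[len(adj[node])] += 1
    n = len(adj)
    return -sum((c / n) * log2(c / n) for c in counts.values())


def graph_properties(adj):
    """Nodes, edges, average clustering coefficient and entropy of one graph."""
    n = len(adj)
    return {
        "nodes": n,
        "edges": sum(len(nb) for nb in adj.values()) // 2,
        "clustering": sum(local_clustering(adj, v) for v in adj) / n if n else 0.0,
        "entropy": degree_entropy(adj) if n else 0.0,
    }


# Hypothetical toy input: one (year, authors) record per paper.
papers = [
    (2000, ["A", "B", "C"]),
    (2000, ["C", "D"]),
    (2001, ["A", "B"]),
]
stats = {year: graph_properties(adj)
         for year, adj in build_yearly_graphs(papers).items()}
```

Each entry of stats holds the four properties for one year, ready to be plotted as a time series. A graph library such as NetworkX, which the deck lists under Development, would normally replace the hand-written adjacency handling.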
1. Co-authorship network (cont.)
2. Academic throughput and retention
• Researcher profile with 4 attributes:
[1] Year of first publication
[2] Year of last publication
[3] Number of publications
[4] Duration of research activity ([2]-[1])
1966-2001
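As a rough illustration of the four-attribute researcher profile, the sketch below derives it from (author, year) pairs, one pair per authored publication. The record layout, the Profile and researcher_profiles names, and the sample authors are hypothetical; author-name disambiguation is ignored here.

```python
from collections import namedtuple

# Field order mirrors the slide: [1] first year, [2] last year,
# [3] number of publications, [4] duration = [2] - [1].
Profile = namedtuple("Profile", "first_year last_year n_publications duration")


def researcher_profiles(records):
    """Build one Profile per author from (author, year) pairs."""
    years_by_author = {}
    for author, year in records:
        years_by_author.setdefault(author, []).append(year)
    return {
        author: Profile(min(ys), max(ys), len(ys), max(ys) - min(ys))
        for author, ys in years_by_author.items()
    }


# Hypothetical sample records.
records = [
    ("Smith J", 1966),
    ("Smith J", 1980),
    ("Smith J", 2001),
    ("Lee K", 1995),
]
profiles = researcher_profiles(records)
```

An author with a single publication gets a duration of 0, so such profiles can be separated from longer careers when looking at retention.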
2. Academic throughput and retention (cont.)
3. Terminology & evolution
w: 1-gram word term
M: corpus of titles from a 5-year period
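The slide defines w and M, but the formula that combines them is not in the transcript. One plausible reading, offered purely as an assumption, is the relative frequency of the 1-gram w within the titles corpus M of each 5-year window. The sketch below follows that reading; the regex tokeniser stands in for NLTK, and the 1965 window origin, function names, and sample titles are hypothetical.

```python
import re
from collections import Counter, defaultdict


def window_start(year, size=5, origin=1965):
    """First year of the size-year window containing `year` (origin is assumed)."""
    return origin + ((year - origin) // size) * size


def term_counts_by_window(titles, size=5, origin=1965):
    """Map each window start to a Counter of 1-grams from (year, title) pairs."""
    corpora = defaultdict(Counter)
    for year, title in titles:
        tokens = re.findall(r"[a-z]+", title.lower())  # simple stand-in tokeniser
        corpora[window_start(year, size, origin)].update(tokens)
    return corpora


def relative_frequency(word, corpus):
    """Share of all tokens in one window's corpus M that equal the term w."""
    total = sum(corpus.values())
    return corpus[word] / total if total else 0.0


# Hypothetical sample titles.
titles = [
    (1990, "Gene therapy in mice"),
    (1992, "Gene expression"),
    (1996, "Protein folding and gene regulation"),
]
corpora = term_counts_by_window(titles)
gene_trend = {start: relative_frequency("gene", corpus)
              for start, corpus in sorted(corpora.items())}
```

Plotting gene_trend against the window start gives one curve per term, which is one way to show how terminology evolves over time.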
3. Terminology & evolution (cont.)
Development
• Pandas: data analysis and manipulation
• NetworkX: network analysis
• NLTK: natural language processing
• Matplotlib: plotting