
Science 2.0 VU
Big Science, e-Science and E-Infrastructures + Bibliometric Network Analysis

WS 2015/16

Elisabeth Lex, KTI, TU Graz


Agenda

•  Recap from last time: altmetrics / altmetrics in practice
•  Big Data and Science
•  E-Science
•  E-Infrastructures
•  Bibliometric Network Analysis
•  Your Assignment!


Altmetrics (recap)

"Altmetrics is the creation and study of new metrics based on the Social Web for analyzing and informing scholarship" - Altmetrics Manifesto, http://altmetrics.org/about

•  Aggregated from many sources (e.g. Twitter, Mendeley, GitHub, SlideShare, ...)
•  Article-Level Metrics (ALM)
    •  Multidimensional suite of transparent and established metrics at the article level


Examples of altmetrics sources (recap)

•  Usage
    •  Views, downloads, ...
•  Captures
    •  Bookmarks, readers, ...
•  Mentions
    •  Blog posts, news stories, Wikipedia articles, comments, reviews
•  Social Media
    •  Tweets, Google+, Facebook likes, shares, ratings
•  Citations
    •  Web of Science, Scopus, Google Scholar, ...


Examples: Altmetric.com

Source: http://www.altmetric.com/details.php?domain=www.altmetric.com&citation_id=843656


Lessons learned (recap)

•  Alternative ways to assess impact of various scientific outputs
•  No common understanding of altmetrics yet
    •  What do they really express?
    •  Are they useful, and for which part of the research process?
•  Not necessarily "better" metrics
    •  E.g. gamification
•  Can help to get an overview of a research field
    •  Visualizations based on altmetrics


Modern Science: What has changed?

•  150 years later: searching for new particles like the Higgs boson with the Large Hadron Collider

•  Built in collaboration with over 10,000 scientists and engineers from over 100 countries and hundreds of universities and laboratories, in a tunnel 27 km in circumference and up to 175 m deep, near Geneva


Motivation

•  The Internet and scientific disciplines (e.g. physical sciences, biological sciences, medicine, and engineering) generate large and complex datasets (Big Data)
    •  These require more advanced database and architectural support
•  A "new kind of research methodology" has emerged: the fourth paradigm of scientific exploration (Hey, 2007)
    •  Based on statistical exploration of large amounts of data

http://www.ksi.mff.cuni.cz/astropara/


Data intensive scientific discovery

http://research.microsoft.com/en-us/collaboration/fourthparadigm/4th_paradigm_book_complete_lr.pdf


Example: Big Data in Science - European Exascale Projects

http://exascale-projects.eu

Exascale computing: computers capable of at least one exaFLOPS (10^18 floating point operations per second) → not yet achieved, currently 10^15 (petaFLOPS)


Publications as Big Data

Cross-journal recommendation based on click streams

[Bollen et al., 2009]


e-Science

•  Large scale science (since 1999)
•  Data-driven discovery
•  Focus on computationally intensive science and how to tackle it using highly distributed environments in a collaborative manner
•  Powerful computers: supercomputers, High Performance Computing (HPC), Grid, ...
•  Distributed computing
•  Powerful research infrastructures: "e-infrastructures", grids, clouds

http://www.anandtech.com/show/6421/inside-the-titan-supercomputer-299k-amd-x86-cores-and-186k-nvidia-gpu-cores/3


Supercomputers

•  Large, expensive systems, usually housed in a single room, in which multiple processors are connected by a fast local network
•  Suited for highly complex, real-time applications and simulations
•  Pros: data can move between processors rapidly → all processors can work together on the same tasks
•  Cons: expensive to build and maintain; do not scale well, e.g. adding more processors is challenging

http://www.top500.org/lists/2014/06/
http://www.wikihow.com/Build-a-Supercomputer


Distributed Computing

•  Systems in which processors are not necessarily located in close proximity to one another (they can even be housed on different continents) but are connected via the Internet or other networks
•  Pros: much less expensive than supercomputers
•  Cons: lower speed than supercomputers


Example: Hadoop

•  Ecosystem of tools for processing big data
•  Simple computational model
    •  Two-stage method for processing large amounts of data
    •  Design an algorithm that operates on one chunk of the data in two stages (a Map and a Reduce stage); MapReduce automatically distributes that algorithm across the cluster → complexity is hidden in the framework
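The two-stage model can be illustrated without a cluster. The following is a minimal single-machine sketch in plain Python, not Hadoop's actual API: the function names (`map_chunk`, `shuffle`, `reduce_group`) and the sample chunks are assumptions made for illustration. In a real deployment the framework would run the Map and Reduce functions on many machines and perform the grouping step itself.

```python
from collections import defaultdict

def map_chunk(chunk):
    """Map stage: emit a (word, 1) pair for every word in one chunk of text."""
    return [(word.lower(), 1) for word in chunk.split()]

def shuffle(pairs):
    """Group emitted values by key (in a cluster, the framework does this)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_group(key, values):
    """Reduce stage: combine all values that share one key."""
    return key, sum(values)

def word_count(chunks):
    # Each chunk is mapped independently, which is what makes the
    # computation easy to distribute.
    mapped = [pair for chunk in chunks for pair in map_chunk(chunk)]
    return dict(reduce_group(k, v) for k, v in shuffle(mapped).items())

# Hypothetical input: three chunks of a larger text collection.
chunks = ["big data in science", "science needs data", "big science"]
counts = word_count(chunks)
print(counts)
```

Only `map_chunk` and `reduce_group` contain problem-specific logic; everything else is plumbing that a framework such as Hadoop would hide.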

http://hadoop.apache.org
http://architects.dzone.com/articles/how-hadoop-mapreduce-works


Hadoop in eScience: Example: Astronomical Image Processing

•  Large telescopes survey the sky over a prolonged period of time
•  Large Synoptic Survey Telescope (LSST)
    •  Under construction
    •  Will capture 1/2 of the sky over 10 years
    •  30 TB of data every night
    •  ~60 PB in 10 years
•  Astronomers pick out faint objects for study by capturing multiple images of the same area and combining them: "coaddition"
•  Challenge: how to organize and process all the resulting data
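Why coaddition helps can be shown with a toy example. The sketch below assumes the simplest possible scheme, a pixel-wise mean over already aligned exposures with independent Gaussian noise; real survey pipelines also handle alignment, calibration and weighting, none of which is modeled here. The image size, noise level and number of exposures are made-up values.

```python
import random

def coadd(images):
    """Pixel-wise mean of equally sized images (each a flat list of pixels)."""
    n = len(images)
    return [sum(pixels) / n for pixels in zip(*images)]

def rms_error(image, truth):
    """Root-mean-square difference between an image and the true sky."""
    return (sum((p - t) ** 2 for p, t in zip(image, truth)) / len(truth)) ** 0.5

rng = random.Random(42)

# Hypothetical sky: dark background with a faint 4-pixel object.
truth = [0.0] * 60 + [0.5] * 4

# 100 exposures of the same area, each with noise much larger than the signal.
exposures = [[t + rng.gauss(0.0, 1.0) for t in truth] for _ in range(100)]

stacked = coadd(exposures)
single_err = rms_error(exposures[0], truth)
stacked_err = rms_error(stacked, truth)
print(f"single exposure error: {single_err:.3f}")
print(f"coadded image error:   {stacked_err:.3f}")
```

Averaging N independent exposures reduces the noise by roughly a factor of sqrt(N), which is why an object that is invisible in a single exposure can appear in the stack.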

http://www.lsst.org/lsst/


Using Hadoop to help with image coaddition

http://escience.washington.edu/get-help-now/astronomical-image-processing-hadoop


Virtual Science Environments

•  Not only HPC but also the sharing of knowledge and data is becoming a requirement for scientific discovery
    •  Provide useful mechanisms to facilitate this sharing
    •  Preserve and organize research data

→ Virtual Science Environments: "virtual environments in which researchers work together through ubiquitous, trusted and easy access to services for scientific data, computing and networking, enabled by e-Infrastructures"


Defining e-Infrastructures

European e-Infrastructure Reflection Group (e-IRG):

'The term e-Infrastructure refers to this new research environment in which all researchers (whether working in the context of their home institutions or in national or multinational scientific initiatives) have shared access to unique or distributed scientific facilities (including data, instruments, computing and communications), regardless of their type and location in the world.'

http://www.e-irg.eu/about-e-irg.html


e-Infrastructures - Goals

•  Opening access to knowledge through reliable, distributed and participatory data e-infrastructures

•  Cost effective infrastructures for preservation and curation for re-use of data

•  Persistent availability of information and linking people and data through flexible and robust digital identifiers

•  Interoperability for consistency of approaches on global data exchange (e.g. standards)

•  Enabling trust through authentication and authorisation mechanisms

http://cordis.europa.eu/fp7/ict/e-infrastructure/docs/framework-for-action-in-h2020_en.pdf


Example: e-Infrastructure OpenAIRE

•  The European Open Access Data Infrastructure for Scholarly and Scientific Communication
•  Functionality:
    •  Harvesting and storing of information about publications from various repositories (OAI-PMH)
    •  Enables searching for publications and related information (e.g. funding, ...)
    •  Provides a list of OA repositories that can be used to store publications
        •  Orphan repository
    •  Shows statistics of stored data

https://www.openaire.eu
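Harvesting via OAI-PMH boils down to sending HTTP requests with a `verb` parameter and parsing the XML that comes back. The sketch below is a minimal illustration using only the Python standard library; it makes no network calls. The base URL `https://repository.example.org/oai` and the sample record are hypothetical placeholders, not OpenAIRE's actual endpoint or data.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# XML namespaces used by OAI-PMH responses carrying Dublin Core metadata.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Build a ListRecords request URL for an OAI-PMH endpoint."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec
    return base_url + "?" + urlencode(params)

def extract_titles(xml_text):
    """Return all Dublin Core titles found in an OAI-PMH response."""
    root = ET.fromstring(xml_text)
    return [el.text for el in root.iterfind(".//dc:title", NS)]

# Hypothetical, heavily shortened response for illustration only.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>A hypothetical open-access paper</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

url = list_records_url("https://repository.example.org/oai")
titles = extract_titles(SAMPLE)
print(url)
print(titles)
```

A real harvester would additionally follow the `resumptionToken` element to page through large result sets.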


OpenAIRE - Applications


Example: e-Infrastructures Austria 1/2

http://www.e-infrastructures.at


Example: e-Infrastructures Austria 2/2


Take away message

•  Big Science / e-Science: data-driven, large scale science
•  Supercomputers and distributed computing
•  Virtual research environments
•  e-Infrastructures


Bibliometric Network Analysis


Bibliometrics

•  Quantitative study of all kinds of bibliographic data
•  Patterns of authorship, publications, citations
•  E.g. citation analysis of research outputs/publications
•  Assess research impact of individuals, groups, institutions
•  Measuring by author (h-index), article (PLOS), or publication (Journal Impact Factor)
•  Measure of output, not quality (quantitative, not qualitative!)
•  Other measures could include funding received, number of patents, awards granted, or qualitative measures such as peer review
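As a concrete example of an author-level measure, the h-index is the largest number h such that an author has h papers with at least h citations each. A minimal sketch in Python follows; the citation counts are invented for illustration.

```python
def h_index(citations):
    """Largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank      # the top `rank` papers all have >= rank citations
        else:
            break
    return h

# Hypothetical citation counts for two authors with five papers each.
print(h_index([10, 8, 5, 4, 3]))   # four papers have at least 4 citations
print(h_index([25, 8, 5, 3, 3]))   # one highly cited paper does not raise h
```

The second call illustrates the "output, not quality" caveat: a single very highly cited paper barely affects the index.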

17/04/2015 Maynooth University


Why use Bibliometrics?

•  Measure impact of research/publishing activity
    •  CV, promotion, tenure, grants, feedback to funding bodies/industry/public
•  Showcase individual/group/institutional research
    •  Identify areas of research strengths/weaknesses
    •  Inform research priorities
•  Identify highest-impact or top-performing journals in a subject area
    •  Where to publish, learning about a particular subject area, identify emerging areas of research
•  Identify the top researchers in a subject area
    •  Collaborations/competitors
    •  Recruitment
    •  Learning about a subject area

17/04/2015 Maynooth University


Bibliometric Networks

•  Represent scientific literature based on bibliographic data in the form of networks
•  Help provide an overview of the structure of scientific literature, e.g. in a domain or with respect to a topic
•  Applications
    •  Identify main research areas within a field
    •  Analyze relationships between research areas


Bibliometric Networks

•  Co-authorship networks
•  Citation networks
•  Co-citation networks
•  Co-occurrence maps
    •  Keywords, extracted topics, ...


Co-authorship Networks

•  Scientific collaboration network
•  Nodes are authors of publications
•  Link between authors if they co-authored a publication
•  Collaboration networks are scale-free
•  Co-authorship networks are affiliation networks (authors are affiliated with the papers they wrote)
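The construction described above can be sketched in a few lines of Python: start from the affiliation data (which authors wrote which paper) and link every pair of authors that share a paper. The paper identifiers and author names are invented; the edge weight (number of joint papers) is one common choice, not the only one.

```python
from collections import Counter
from itertools import combinations

# Hypothetical affiliation data: paper -> list of its authors.
papers = {
    "paper1": ["Ada", "Ben", "Cleo"],
    "paper2": ["Ada", "Ben"],
    "paper3": ["Cleo", "Dan"],
}

def coauthorship_edges(papers):
    """Count, for every pair of authors, how many papers they wrote together."""
    edges = Counter()
    for authors in papers.values():
        # sorted(set(...)) gives each unordered pair one canonical form.
        for a, b in combinations(sorted(set(authors)), 2):
            edges[(a, b)] += 1
    return edges

edges = coauthorship_edges(papers)
for (a, b), weight in sorted(edges.items()):
    print(f"{a} - {b}: {weight} joint paper(s)")
```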


Co-authorship networks: Example


Citation Networks

•  Nodes are publications
•  Directed link from one publication to another if the first cites the second
•  Reveals how often articles were cited
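A citation network can be stored as a mapping from each publication to the publications it cites. The sketch below uses invented paper identifiers and counts incoming links to show how the network reveals how often each article was cited.

```python
# Hypothetical citation data: paper -> papers it cites (outgoing links).
cites = {
    "P1": [],
    "P2": ["P1"],
    "P3": ["P1", "P2"],
    "P4": ["P1", "P3"],
}

def citation_counts(cites):
    """Number of incoming citation links for every paper in the network."""
    counts = {paper: 0 for paper in cites}
    for references in cites.values():
        for ref in references:
            counts[ref] = counts.get(ref, 0) + 1
    return counts

counts = citation_counts(cites)
for paper, n in sorted(counts.items(), key=lambda item: -item[1]):
    print(f"{paper} was cited {n} time(s)")
```

Because links point from the citing paper to the cited one, the network is directed: P3 cites P1, but P1 does not cite P3.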


Citation Networks

http://eduinf.eu/2012/03/15/co-citation-analysis-of-the-topic-social-network-analysis/


Co-Citation Networks

•  Nodes are publications
    •  Links between nodes if two publications were cited together in a paper
    •  How often two articles were cited by some third article
•  OR: nodes are authors
    •  Links between nodes if authors were cited together
    •  To identify clusters of authors
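Co-citation counts can be derived directly from reference lists: every pair of publications that appears together in one reference list gains one co-citation. A minimal Python sketch with invented identifiers:

```python
from collections import Counter
from itertools import combinations

# Hypothetical reference lists: citing paper -> publications it cites.
reference_lists = {
    "P3": ["P1", "P2"],
    "P4": ["P1", "P2", "P5"],
    "P6": ["P2", "P5"],
}

def cocitation_counts(reference_lists):
    """Count how often each pair of publications is cited by the same paper."""
    counts = Counter()
    for refs in reference_lists.values():
        for a, b in combinations(sorted(set(refs)), 2):
            counts[(a, b)] += 1
    return counts

cocitations = cocitation_counts(reference_lists)
for (a, b), weight in sorted(cocitations.items()):
    print(f"{a} and {b} were cited together {weight} time(s)")
```

The same function yields an author co-citation network if the reference lists contain cited authors instead of cited publications.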


Author co-citation network of 15 history & philosophy of science journals. Two authors are connected if they are cited together in some article, and connected more strongly if they are cited together frequently

http://www.scottbot.net/HIAL/?p=38272


Mining in Scientific Networks

•  Find influential researchers
•  Find influential papers
•  Investigate patterns of scientific collaboration
•  ...


Centrality Measures

•  Degree centrality
    •  Equal to the number of links (connections) a node has
    → In citation networks, papers with high in-degree centrality have many citations
    → Widely used metric for measuring the scientific impact of a paper
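In-degree centrality for a citation network can be computed by counting incoming links. The sketch below additionally divides by n - 1 (the maximum possible in-degree), a common normalization that is an assumption here and not stated on the slide. Paper identifiers are invented.

```python
def in_degree_centrality(cites):
    """In-degree of every node, normalized by the maximum possible value n - 1."""
    nodes = set(cites)
    for refs in cites.values():
        nodes.update(refs)

    in_degree = {node: 0 for node in nodes}
    for refs in cites.values():
        for ref in set(refs):          # count each citing paper once
            in_degree[ref] += 1

    scale = 1.0 / (len(nodes) - 1) if len(nodes) > 1 else 0.0
    return {node: degree * scale for node, degree in in_degree.items()}

# Hypothetical citation data: paper -> papers it cites.
cites = {"P2": ["P1"], "P3": ["P1", "P2"], "P4": ["P1"]}
centrality = in_degree_centrality(cites)
for paper in sorted(centrality, key=centrality.get, reverse=True):
    print(f"{paper}: {centrality[paper]:.2f}")
```

P1 is cited by all three other papers and therefore receives the maximum score.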


Centrality Measures

•  "Extension" of degree centrality
•  Degree centrality awards one centrality point for every neighbor a node has
•  However, not all neighbors are equally important
•  In many cases, the importance of a node is increased by having connections to other nodes that are themselves important
•  Eigenvector centrality: not only the number of neighbors is important but also the importance of those neighbors
•  Eigenvector centrality gives each node a score proportional to the sum of the scores of its neighbors
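The "score proportional to the sum of the neighbors' scores" definition can be turned into a simple iteration: start with equal scores, repeatedly replace each score by a sum over neighbors, and renormalize until the values stop changing. The following is a minimal pure-Python sketch for a small undirected graph; the graph, iteration limit and tolerance are illustrative assumptions, and a library implementation should be preferred for real data.

```python
def eigenvector_centrality(adjacency, iterations=200, tol=1e-9):
    """Approximate eigenvector centrality by power iteration."""
    nodes = list(adjacency)
    score = {node: 1.0 for node in nodes}
    for _ in range(iterations):
        # Each node keeps its own score and adds the scores of its neighbors;
        # keeping the old score avoids oscillation on bipartite graphs.
        new = {n: score[n] + sum(score[m] for m in adjacency[n]) for n in nodes}
        norm = sum(v * v for v in new.values()) ** 0.5
        new = {n: v / norm for n, v in new.items()}
        if sum(abs(new[n] - score[n]) for n in nodes) < tol:
            return new
        score = new
    return score

# Hypothetical undirected graph: node -> list of neighbors.
graph = {
    "A": ["B", "C", "D"],
    "B": ["A", "C"],
    "C": ["A", "B"],
    "D": ["A", "E"],
    "E": ["D"],
}

scores = eigenvector_centrality(graph)
for node in sorted(scores, key=scores.get, reverse=True):
    print(f"{node}: {scores[node]:.3f}")
```

B and D both have two neighbors, so degree centrality cannot tell them apart. Eigenvector centrality ranks B higher because its second neighbor (C) is better connected than D's second neighbor (E).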


Centrality Measures in Python

https://networkx.github.io/documentation/latest/reference/algorithms.centrality.html


Summary

•  Big Science
•  E-Science
•  E-Infrastructure
•  Bibliometrics
•  Bibliometric Network Analysis


Thank you for your attention!
