www.tugraz.at
WISSEN TECHNIK LEIDENSCHAFT
Science 2.0 VU Big Science, e-Science and E-Infrastructures + Bibliometric Network Analysis
WS 2015/16
Elisabeth Lex KTI, TU Graz
Agenda
• Repetition from last time: altmetrics / altmetrics in practice
• Big Data and Science
• E-Science
• E-Infrastructures
• Bibliometric Network Analysis
• Your Assignment!
Altmetrics (repetition)
"Altmetrics is the creation and study of new metrics based on the Social Web for analyzing and informing scholarship" - Altmetrics Manifesto, http://altmetrics.org/about
• Aggregated from many sources (e.g. Twitter, Mendeley, GitHub, SlideShare, ...)
• Article Level Metrics (ALM)
  • Multidimensional suite of transparent and established metrics at article level
Examples for Altmetrics sources (repetition)
• Usage
  • Views, downloads, ...
• Captures
  • Bookmarks, readers, ...
• Mentions
  • Blog posts, news stories, Wikipedia articles, comments, reviews
• Social Media
  • Tweets, Google+, Facebook likes, shares, ratings
• Citations
  • Web of Science, Scopus, Google Scholar, ...
Examples: Altmetric.com
Source: http://www.altmetric.com/details.php?domain=www.altmetric.com&citation_id=843656
Lessons learned (repetition)
• Alternative ways to assess impact of various scientific outputs
• No common understanding of altmetrics yet
  • What do they really express?
  • Are they useful and for which part of the research process?
• Not necessarily "better" metrics
  • E.g. Gamification
• Can help to get an overview of a research field
  • Visualizations based on altmetrics
Modern Science: What has changed?
• 150 years later: Searching for new particles like Higgs boson with the Large Hadron Collider
• Built in collaboration with over 10,000 scientists and engineers from over 100 countries and hundreds of universities and laboratories, in a tunnel 27 km in circumference and up to 175 m deep, near Geneva
Motivation
• The Internet and science disciplines (e.g. physical sciences, biological sciences, medicine, and engineering) generate large and complex datasets (Big Data)
  • These require more advanced database and architectural support
• A "new kind of research methodology" has emerged: the fourth paradigm of scientific exploration (Hey, 2007)
  • Based on statistical exploration of large amounts of data
http://www.ksi.mff.cuni.cz/astropara/
Data intensive scientific discovery
http://research.microsoft.com/en-us/collaboration/fourthparadigm/4th_paradigm_book_complete_lr.pdf
Example: Big Data in Science - European Exascale Projects
Exascale computing: computers capable of at least one exaflops (10^18 floating point operations per second) → not yet achieved, currently 10^15
http://exascale-projects.eu
Publications as Big Data
Cross-Journal Recommendation based on Click Streams
[Bollen et al., 2009]
e-Science
• Large scale science (since 1999)
• Data-driven discovery
• Focus on computationally intensive science and how to tackle it using highly distributed environments in a collaborative manner
• Powerful computers: Supercomputers, High Performance Computing (HPC), Grid, ...
• Distributed Computing
• Powerful research infrastructures: "e-infrastructures", grids, clouds
http://www.anandtech.com/show/6421/inside-the-titan-supercomputer-299k-amd-x86-cores-and-186k-nvidia-gpu-cores/3
Supercomputers
• Large, expensive systems, usually housed in a single room, in which multiple processors are connected by a fast local network
• Suited for highly complex, real-time applications and simulation
• Pros: data can move between processors rapidly → all processors can work together on the same tasks
• Cons: expensive to build and maintain; do not scale well, e.g. adding more processors is challenging
http://www.top500.org/lists/2014/06/
http://www.wikihow.com/Build-a-Supercomputer
Distributed Computing
• Systems in which processors are not necessarily located in close proximity to one another (they can even be housed on different continents) but are connected via the Internet or other networks
• Pros: much less expensive than supercomputers
• Cons: lower speed than supercomputers
Example: Hadoop
• Ecosystem of tools for processing big data
• Simple computational model
  • Two-stage method for processing large amounts of data
  • Design an algorithm that operates on one chunk of the data in two stages (a Map and a Reduce stage); MapReduce automatically distributes that algorithm across the cluster → the framework hides the complexity
http://hadoop.apache.org
http://architects.dzone.com/articles/how-hadoop-mapreduce-works
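The two-stage model can be illustrated without a cluster. The following is a minimal single-process sketch in plain Python, not Hadoop's API: the function names `map_stage`, `shuffle`, `reduce_stage` and `run_job`, and the word-count task itself, are assumptions chosen for illustration. In Hadoop, the framework would run the map and reduce functions on many machines and perform the grouping step itself.

```python
from collections import defaultdict


def map_stage(chunk):
    """Map: turn one chunk of text into (key, value) pairs."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(pairs):
    """Group values by key (done by the framework on a real cluster)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_stage(key, values):
    """Reduce: combine all values that share a key."""
    return key, sum(values)


def run_job(chunks):
    # Each chunk could be mapped on a different machine; here we just loop.
    pairs = [pair for chunk in chunks for pair in map_stage(chunk)]
    return dict(reduce_stage(k, v) for k, v in shuffle(pairs).items())


counts = run_job(["big data in science", "big science and e-science"])
print(counts)
```

The author of the job only writes the map and reduce functions for one chunk; everything in `shuffle` and `run_job` stands in for what the framework would hide.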
Hadoop in eScience: Example: Astronomical Image Processing
• Large telescopes survey sky over a prolonged period of time.
• Large Synoptic Survey Telescope (LSST), under construction
  • Will capture 1/2 of the sky over 10 years
  • 30 TB of data every night
  • ~60 PB in 10 years
• Astronomers pick out faint objects for study by capturing multiple images of the same area and combining them ("coaddition")
• Challenge: how to organize and process all the resulting data.
http://www.lsst.org/lsst/
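Coaddition itself can be sketched in a few lines. The example below is a toy under strong assumptions, not the LSST pipeline: it treats each exposure as a small grid of pixel values that are already aligned, and combines them with a pixel-wise mean, which averages out random noise so that faint sources stand out. The function name `coadd` and the pixel values are made up; real pipelines also have to handle alignment, weighting and outlier rejection.

```python
def coadd(images):
    """Pixel-wise mean of equally sized, already aligned images."""
    if not images:
        raise ValueError("need at least one image")
    rows, cols = len(images[0]), len(images[0][0])
    n = len(images)
    return [
        [sum(img[r][c] for img in images) / n for c in range(cols)]
        for r in range(rows)
    ]


# Three noisy 2x2 exposures of the same patch of sky (made-up values).
exposures = [
    [[1.0, 0.0], [0.0, 4.0]],
    [[0.0, 1.0], [0.0, 5.0]],
    [[2.0, -1.0], [0.0, 6.0]],
]
stacked = coadd(exposures)
print(stacked)
```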
Using Hadoop to help with image coaddition
http://escience.washington.edu/get-help-now/astronomical-image-processing-hadoop
Virtual Science Environments
• Not only HPC but also sharing of knowledge and data is becoming a requirement for scientific discovery
  • Providing useful mechanisms to facilitate this sharing
  • Preserve and organize research data
→ Virtual Science Environments: "virtual environments in which researchers work together through ubiquitous, trusted and easy access to services for scientific data, computing and networking, enabled by e-Infrastructures"
Defining e-Infrastructures
European e-Infrastructure Reflection Group (e-IRG):
‘The term e-Infrastructure refers to this new research environment in which all researchers—whether working in the context of their home institutions or in national or multinational scientific initiatives—have shared access
to unique or distributed scientific facilities (including data, instruments, computing and communications),
regardless of their type and location in the world.’
http://www.e-irg.eu/about-e-irg.html
e-Infrastructures - Goals
• Opening access to knowledge through reliable, distributed and participatory data e-infrastructures
• Cost effective infrastructures for preservation and curation for re-use of data
• Persistent availability of information and linking people and data through flexible and robust digital identifiers
• Interoperability for consistency of approaches on global data exchange (e.g. standards)
• Enabling trust through authentication and authorisation mechanisms
http://cordis.europa.eu/fp7/ict/e-infrastructure/docs/framework-for-action-in-h2020_en.pdf
Example: e-Infrastructure OpenAIRE
• The European Open Access Data Infrastructure for Scholarly and Scientific Communication
• Functionality:
  • Harvesting and storing of information about publications from various repositories (OAI-PMH)
  • Enables searching for publications and related information (e.g. funding, ...)
  • Provides a list of OA repositories that can be used to store publications
    • Orphan repository
  • Shows statistics of stored data
https://www.openaire.eu
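Harvesting over OAI-PMH comes down to plain HTTP requests whose query string names a protocol verb. The sketch below only builds such a request URL and does not contact any server; `ListRecords`, `metadataPrefix`, `oai_dc` and `set` come from the OAI-PMH protocol, while the helper name `list_records_url` and the repository endpoint are hypothetical. It makes no claim about how OpenAIRE implements its harvester.

```python
from urllib.parse import urlencode


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Build an OAI-PMH ListRecords request URL for a repository endpoint."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec is not None:
        # Optional: restrict the harvest to one set defined by the repository.
        params["set"] = set_spec
    return base_url + "?" + urlencode(params)


# Hypothetical repository endpoint, for illustration only.
url = list_records_url("https://repository.example.org/oai")
print(url)
```

A harvester would fetch this URL, parse the XML response, and repeat the request with the returned resumption token until the repository reports no further records.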
OpenAIRE - Applications
Example: e-Infrastructures Austria 1/2
http://www.e-infrastructures.at
Example: e-Infrastructures Austria 2/2
Take away message
• Big Science / e-Science: data-driven, large scale science
• Supercomputers and distributed computing
• Virtual research environments
• e-Infrastructures
Bibliometric Network Analysis
Bibliometrics
• Quantitative study of all kinds of bibliographic data
• Patterns of authorship, publications, citations
• E.g. citation analysis of research outputs/publications
• Assess research impact of individuals, groups, institutions
• Measured by author (h-index), article (PLOS), or publication (Journal Impact Factor)
• Measure of output, not quality (quantitative, not qualitative!)
• Other measures could include funding received, number of patents, awards granted, or qualitative measures such as peer review
17/04/2015 Maynooth University
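As an illustration of an author-level measure, the h-index is the largest number h such that the author has h papers with at least h citations each. A minimal sketch, with a made-up list of citation counts and an assumed function name `h_index`:

```python
def h_index(citations):
    """Largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank
        else:
            break
    return h


# Citation counts of one author's papers (made-up values).
example = h_index([10, 8, 5, 4, 3])
print(example)
```

The example also shows why this is a measure of output rather than quality: the value depends only on counts, not on why a paper was cited.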
Why use Bibliometrics?
• Measure impact of research/publishing activity
  • CV, promotion, tenure, grants, feedback to funding bodies/industry/public
• Showcase individual/group/institutional research
  • Identify areas of research strengths/weaknesses
  • Inform research priorities
• Identify highest impact or top performing journals in a subject area
  • Where to publish, learning about a particular subject area, identify emerging areas of research
• Identify the top researchers in a subject area
  • Collaborations/competitors
  • Recruitment
  • Learning about a subject area
17/04/2015 Maynooth University
Bibliometric Networks
• Represent scientific literature based on bibliographic data in form of networks
• Help provide an overview of the structure of scientific literature, e.g. in a domain or with respect to a topic
• Applications
  • Identify main research areas within a field
  • Analyze relationships between research areas
Bibliometric Networks
• Co-authorship networks
• Citation networks
• Co-citation networks
• Co-occurrence maps
  • Keywords, extracted topics, ...
Co-authorship Networks
• Scientific collaboration network
• Nodes are authors of publications
• Link between authors if they co-authored a publication
• Collaboration networks are scale-free
• Co-authorship networks are Affiliation Networks
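The construction rule above (authors as nodes, a link for every shared publication) can be sketched in plain Python. The function name `coauthorship_graph` and the author names are made up for illustration; the graph is stored as a dictionary that maps each author to the set of their co-authors.

```python
from itertools import combinations


def coauthorship_graph(papers):
    """Return {author: set of co-authors} for a list of author lists."""
    graph = {}
    for authors in papers:
        unique = set(authors)
        for author in unique:
            # Solo authors still appear as (isolated) nodes.
            graph.setdefault(author, set())
        for a, b in combinations(unique, 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph


# One author list per publication (made-up names).
papers = [["Ada", "Ben", "Cleo"], ["Ada", "Dan"], ["Eve"]]
graph = coauthorship_graph(papers)
print(graph)
```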
Co-authorship networks: Example
Citation Networks
• Nodes are publications
• Directed link from one publication to another if the first cites the second
• Reveals how often articles were cited
Citation Networks
http://eduinf.eu/2012/03/15/co-citation-analysis-of-the-topic-social-network-analysis/
Co-Citation Networks
• Nodes are publications
  • Links between nodes if two publications were cited together in a paper
  • How often two articles were cited by some third article
• OR: nodes are authors
  • Links between nodes if authors were cited together
  • To identify clusters of authors
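Co-citation strength can be computed directly from reference lists: every pair of works that appears in the same reference list gains one count. The sketch below assumes the input is a dictionary from a citing paper to the works it cites; the function name `cocitation_counts` and all identifiers are made up.

```python
from collections import Counter
from itertools import combinations


def cocitation_counts(reference_lists):
    """Map each pair of cited works to the number of papers citing both."""
    counts = Counter()
    for refs in reference_lists.values():
        # sorted() gives every unordered pair one canonical form.
        for a, b in combinations(sorted(set(refs)), 2):
            counts[(a, b)] += 1
    return counts


# Citing paper -> works it cites (made-up identifiers).
reference_lists = {
    "P1": ["X", "Y", "Z"],
    "P2": ["X", "Y"],
    "P3": ["Y", "Z"],
}
cocited = cocitation_counts(reference_lists)
print(cocited)
```

The same function yields an author co-citation network if the reference lists contain cited authors instead of cited publications.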
Author co-citation network of 15 history & philosophy of science journals. Two authors are connected if they are cited together in some article, and connected more strongly if they are cited together frequently
http://www.scottbot.net/HIAL/?p=38272
Mining in Scientific Networks
• Find influential researchers
• Find influential papers
• Investigate patterns of scientific collaboration
• ...
Centrality Measures
• Degree Centrality
  • Equals the number of links (connections) a node has
  → In citation networks, papers that have high in-degree centrality have a lot of citations
  → Widely used metric for measuring the scientific impact of a paper
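In a citation network the in-degree of a paper is simply its citation count. A minimal sketch, assuming the network is given as a list of (citing, cited) pairs; the function name `in_degree` and the paper identifiers are made up:

```python
def in_degree(citations):
    """Count incoming links for every paper in a directed citation list."""
    counts = {}
    for citing, cited in citations:
        counts.setdefault(citing, 0)  # papers that are never cited get 0
        counts[cited] = counts.get(cited, 0) + 1
    return counts


# (citing paper, cited paper) pairs; identifiers are made up.
citations = [("B", "A"), ("C", "A"), ("D", "A"), ("D", "B")]
indeg = in_degree(citations)
most_cited = max(indeg, key=indeg.get)
print(indeg, most_cited)
```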
Centrality Measures
• Eigenvector centrality as an "extension" of degree centrality
• Degree centrality awards one centrality point for every neighbor a node has
• However, not all neighbors are equally important
  • In many cases the importance of a node is increased by having connections to other nodes that are themselves important
• Eigenvector centrality: not only the number of neighbors matters but also the importance of the neighbors
• Eigenvector centrality gives each node a score proportional to the sum of the scores of its neighbors
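The definition is circular (a node's score depends on its neighbors' scores), so the scores are usually found iteratively. The following is a minimal power-iteration sketch in plain Python, assuming an undirected, connected graph stored as a dictionary of neighbor lists; the function name, the fixed iteration count and the example graph are illustrative choices, not a reference implementation.

```python
def eigenvector_centrality(graph, iterations=100):
    """Power iteration on an undirected graph given as {node: neighbours}."""
    scores = {node: 1.0 for node in graph}
    for _ in range(iterations):
        # New score = own score + sum of the neighbours' scores. Keeping the
        # own score (iterating with A + I instead of A) does not change the
        # dominant eigenvector but avoids oscillation on bipartite graphs.
        new = {n: scores[n] + sum(scores[m] for m in graph[n]) for n in graph}
        norm = sum(v * v for v in new.values()) ** 0.5
        scores = {n: v / norm for n, v in new.items()}
    return scores


# Small example: a triangle A-B-C with a chain A-D-E attached.
graph = {
    "A": ["B", "C", "D"],
    "B": ["A", "C"],
    "C": ["A", "B"],
    "D": ["A", "E"],
    "E": ["D"],
}
scores = eigenvector_centrality(graph)
print(sorted(scores, key=scores.get, reverse=True))
```

B and D both have two neighbors, so degree centrality cannot separate them; eigenvector centrality ranks B higher because its second neighbor (C) is better connected than D's second neighbor (E).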
Centrality Measures in Python
https://networkx.github.io/documentation/latest/reference/algorithms.centrality.html
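A short usage sketch of the NetworkX functions for the two measures discussed above, assuming the `networkx` package is installed. The graphs and node names are made up; `degree_centrality`, `in_degree_centrality` and `eigenvector_centrality` are functions from the library's centrality module, and the degree-based ones normalize by the number of other nodes (n - 1).

```python
import networkx as nx

# Undirected example, e.g. a small co-authorship network.
G = nx.Graph()
G.add_edges_from([("A", "B"), ("A", "C"), ("B", "C"), ("A", "D"), ("D", "E")])

degree = nx.degree_centrality(G)            # degree divided by (n - 1)
eigenvector = nx.eigenvector_centrality(G)  # weights neighbours by importance

# Directed example, e.g. a citation network: an edge points from the
# citing paper to the cited paper.
C = nx.DiGraph([("B", "A"), ("C", "A")])
in_degree = nx.in_degree_centrality(C)      # in-degree divided by (n - 1)

for node in sorted(G):
    print(node, round(degree[node], 2), round(eigenvector[node], 2))
```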
Summary
• Big Science
• E-Science
• E-Infrastructure
• Bibliometrics
• Bibliometric Network Analysis
Thank you for your attention!