The Scientific Literature Background

PCI2014 conference, Athens, Greece, 3rd of October 2014

The OpenScienceLink architecture for novel services exploiting open access data in the biomedical domain

CIP-ICT PSP-2012-6, ICT PSP Main Theme: Open Data and Open Access to Scientific Information

Efstathios Karanastasis, NTUA

Vassiliki Andronikou (NTUA), Efthymios Chondrogiannis (NTUA), George Tsatsaronis (TUD), Daniel Eisinger (TUD), Alina Petrova (TUD)



OpenScienceLink Efstathios Karanastasis, The OpenScienceLink Architecture, PCI2014, 3rd of October, 2014

The Scientific Literature Background

• Lack of universal well-structured repositories of scientific and research data for experimentation and benchmarking of pertinent research works in a given thematic area

• Fragmented, lengthy, weak and inefficient peer review processes given the growing number of journals, magazines and conferences

• Non-objective and narrowly focused tools and metrics for assessing research work, individuals, institutions and organizations: they cover only a few aspects (such as impact and popularity) and are based on a single snapshot of the scientific work

• Poor linking of research articles to data journals

The Scientific Literature Background

• Growing wealth of the scientific work and information produced by researchers and scholars

– scientific/research articles

– monographs

– research datasets

• Need for more effective processes and improved tools and techniques towards:

– reviewing scientific articles and research data

– organising and managing data journals

– bibliographic analysis

– management of scientometrics and development of new ones

– better collaboration between researchers

OpenScienceLink Objectives

• Provide a holistic approach to the publication, sharing, linking, reviewing and evaluation of research results based on open access to scientific information

• Empower a novel eco-system for open access to scientific information, which will provide a range of added-value services for all stakeholders

• Main Outcomes:

– The OpenScienceLink platform

– Implementation of 5 pilots

– The Biomedical Data Journal (BMDJ)

Open Semantically-enabled, Social-aware Access to Scientific Data

OpenScienceLink Pilots

1. Research Dynamics-aware Open Access Data Journals Development

2. Novel open, semantically-assisted peer review process

3. Data Mining for Biomedical and Clinical Research Trends Detection and Analysis

4. Data Mining for Proactive Formulation of Scientific Collaborations

5. Scientific field-aware, Productivity- and Impact-oriented Enhanced Research Evaluation Services

Pilots Overview

1. Research Dynamics-aware Open Access Data Journals Development

• Data journal establishment

• Journal issue suggestion

• Dataset submission

• Dataset peer review

• Publishing

• Assessment and evaluation

• Identification of research dynamics associated with specific datasets

2. Novel open, semantically-assisted peer review process

• Article-based reviewer suggestion

• Assign competent reviewers

• Review support tools (e.g. automatic retrieval of relevant research articles)

• Review form submission

• Post-review discussion

Pilots Overview

3. Data Mining for Biomedical and Clinical Research Trends Detection and Analysis

• Detect research trends

• Analyse research trends

• Essential for:

– allocation of research funding (by private sponsors and governmental agencies)

– overall planning of research strategies

4. Data Mining for Proactive Formulation of Scientific Collaborations

• Enable the networking and collaboration of researchers and scholars working on similar or potentially collaborating scientific fields and sharing similar research interests

• Infer relationships between researchers and research groups, including (in several cases) non-obvious, non-declared ones
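The slides do not say which technique the collaborations pilot uses to relate researchers. As a minimal sketch under that caveat, one simple way to surface researchers "sharing similar research interests" is to score pairs by the Jaccard overlap of their interest terms. The function name, the input shape and the threshold value are all assumptions made for illustration.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple


def suggest_collaborations(
    interests: Dict[str, Set[str]], threshold: float = 0.5
) -> List[Tuple[str, str, float]]:
    """Return researcher pairs whose interest sets overlap strongly.

    The score is the Jaccard index: |A & B| / |A | B|.
    This is an illustrative measure, not the platform's documented method.
    """
    suggestions = []
    for a, b in combinations(sorted(interests), 2):
        union = interests[a] | interests[b]
        if not union:
            continue  # two researchers without any recorded interests
        score = len(interests[a] & interests[b]) / len(union)
        if score >= threshold:
            suggestions.append((a, b, score))
    # Strongest candidate pairs first.
    return sorted(suggestions, key=lambda s: -s[2])
```

A pair that never co-authored anything can still score highly here, which is the kind of non-declared relationship the pilot aims to infer.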

Pilots Overview

5. Scientific field-aware, Productivity- and Impact-oriented Enhanced Research Evaluation Services

• Current simplified indices and impact factors evaluate only an aspect of the scientific work

• Introduce, produce and track new metrics of research and scientific performance, beyond conventional ones for evaluation of:

– Research work (incl. data papers)

– Researcher

– Research group or community

– Conference, Journal, Publisher

– Department, Laboratory, Institution, University, Organisation

– Country

– Research grant

OpenScienceLink Ecosystem

Integrated Platforms

• FP7 SocIoS

– A set of tools that leverage the potential of Social Networking Sites (SNSs)

– Serves as an umbrella for accessing user data scattered among various SNSs through a common and secure interface

– Hides SNS-specific complexity

– Enables the delivery of services which exploit social graphs

• GoPubMed

– A semantic search engine for the life sciences

– Allows exploring PubMed search results with concepts from the Medical Subject Headings (MeSH), the Gene Ontology (GO) and the Universal Protein Resource (UniProt)

– A data management model expanded with the ability to index, annotate, and semantically search datasets

• FP7 PONTE

– A knowledge-oriented platform that supports the design and creation of clinical trial protocols

– Provides a set of semantic web enabled mechanisms and services facilitating the clinical trials lifecycle

– Incorporates a set of advanced data mining and semantic reasoning mechanisms which are applied on a variety of web data sources containing clinical and non-clinical information

OpenScienceLink Conceptual Architecture

OpenScienceLink Core Components

• The OpenScienceLink core components implement the main functionality of the Platform and form the OpenScienceLink API

• Users Management

– Responsible for handling all functionality related to the Platform users, their profile and access rights, such as user registration, profile editing, authentication and role-based authorisation

• Datasets Management

– Responsible for handling all functionality related to datasets and the corresponding metadata

– Metadata are partially based on the Dryad Metadata Application Profile, including extensions at the level of parameters, e.g. dataset source type (real-world vs. synthetic), level of noise, and species, among others
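To make the metadata extensions listed above more concrete, a dataset record could be modelled roughly as follows. This is only a sketch: the class and field names (`DatasetMetadata`, `source_type`, `noise_level`, `species`) are hypothetical and are not taken from the Dryad Metadata Application Profile or from the OpenScienceLink API.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class SourceType(Enum):
    """Dataset source type extension named on the slide."""
    REAL_WORLD = "real-world"
    SYNTHETIC = "synthetic"


@dataclass
class DatasetMetadata:
    # Generic descriptive fields (hypothetical names).
    title: str
    creators: List[str]
    # Extensions mentioned on the slide (hypothetical names and types).
    source_type: SourceType
    noise_level: Optional[str] = None
    species: List[str] = field(default_factory=list)

    def validate(self) -> List[str]:
        """Return a list of problems; an empty list means the record is usable."""
        problems = []
        if not self.title.strip():
            problems.append("title is empty")
        if not self.creators:
            problems.append("at least one creator is required")
        return problems
```

Keeping the extensions as optional fields lets records that carry only the base descriptive fields remain valid.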

Core Components Layer

• Articles Management

– Responsible for handling all functionality related to articles

• Authors Management

– Responsible for handling all functionality related to authors

• Groups Management

– Responsible for handling all functionality related to groups of people

Core Components Layer

• Review Data Management

– Responsible for handling all functionality related to the review process and the corresponding data

– Covers the initiation and updating of the review process as well as the provision of access to the reviews to the corresponding users. For example, for a particular article or dataset, some users can see only their own review (e.g. a reviewer), some users can see all reviews without knowing the reviewers (e.g. an author), and some users can see all reviews and reviewers (e.g. a publisher)

– Comments and ratings are also managed by this component, always considering each user's access rights
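The three visibility levels in the example above can be expressed as a small role-based filter. The sketch below is illustrative only: the role strings, the `Review` and `ReviewView` types and the function name are assumptions, not the component's real interface.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Review:
    reviewer: str
    text: str


@dataclass(frozen=True)
class ReviewView:
    reviewer: Optional[str]  # None when the reviewer identity is hidden
    text: str


def visible_reviews(reviews: List[Review], user: str, role: str) -> List[ReviewView]:
    """Filter the reviews of one article or dataset according to the viewer's role."""
    if role == "reviewer":
        # A reviewer sees only their own review.
        return [ReviewView(r.reviewer, r.text) for r in reviews if r.reviewer == user]
    if role == "author":
        # An author sees every review, with reviewer names withheld.
        return [ReviewView(None, r.text) for r in reviews]
    if role == "publisher":
        # A publisher sees every review together with its reviewer.
        return [ReviewView(r.reviewer, r.text) for r in reviews]
    # Unknown roles see nothing (deny by default).
    return []
```

Returning a separate view type, rather than the stored review, keeps the reviewer name from leaking to callers that should not see it.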

Adaptors Layer

• The OpenScienceLink core components interact with the SocIoS, GoPubMed and PONTE platforms by means of the adaptors

• The latter undertake the required actions, mappings and transformations in order to enable communication with the existing platforms and ultimately the underlying data sources for the exploitation of the existing wealth of information

Social Networks Adaptor

• Comprises a simplification layer on top of the SocIoS Services

• Undertakes the integration of the underlying SocIoS platform and communication with the connected SNS(s)

• Receives requests from the OpenScienceLink core components for the provision of data stemming from the connected SNSs, including the exact type of information required and the SNS(s) involved

• Combines SocIoS services in order to provide tailored functionality pertaining to the specific data needs of the OpenScienceLink core components

• Queries the services built on top of the SocIoS platform in order to further process the specific requests and gather the required data

• Internally performs any data processing or mapping that may be required for the seamless collaboration between the OpenScienceLink core components and the SocIoS platform in either direction

• Offered functionality: persons retrieval, connected persons retrieval, media items retrieval, activities retrieval, data transformation and data extraction
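The adaptor role described above (receive a request naming the SNSs involved, query the underlying services, map the results to one common shape) can be sketched as follows. Nothing here reflects the real SocIoS API: `SnsService`, `find_persons`, the record keys `displayName` and `name`, and the toy `InMemoryService` are all hypothetical stand-ins.

```python
from typing import Dict, List, Protocol


class SnsService(Protocol):
    """Hypothetical per-SNS service exposed by the underlying platform."""

    def find_persons(self, query: str) -> List[Dict[str, str]]: ...


class InMemoryService:
    """Toy service used only to exercise the adaptor."""

    def __init__(self, records: List[Dict[str, str]]):
        self._records = records

    def find_persons(self, query: str) -> List[Dict[str, str]]:
        q = query.lower()
        return [r for r in self._records if any(q in v.lower() for v in r.values())]


class SocialNetworksAdaptor:
    """Hides SNS-specific record formats behind one persons-retrieval call."""

    def __init__(self, services: Dict[str, SnsService]):
        self._services = services

    def retrieve_persons(self, query: str, sns_names: List[str]) -> List[Dict[str, str]]:
        results = []
        for sns in sns_names:
            for raw in self._services[sns].find_persons(query):
                # Data transformation step: map each raw record to a common shape.
                name = raw.get("displayName") or raw.get("name", "")
                results.append({"sns": sns, "name": name})
        return results
```

The core components would then depend only on `retrieve_persons` and its common record shape, not on any individual SNS format.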

Content and Data Management Adaptor

• Integrates the data management system of GoPubMed within the OpenScienceLink Platform

• Integrates the services of the GoPubMed semantic search engine

• Comprises a simplified layer of services on top of the GoPubMed platform that pertain to the indexing of data, annotation with the underlying ontology concepts, importing of new ontologies, semantic search on the indexed data and identification of trends in the indexed data

– Trend identification is utilised for presenting statistics about the resulting set of documents, such as the number of publications over time, the top countries, cities, journals, authors and ontology terms

– It is, thus, a summary of the trends observed for the documents that are returned via the input query
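The kind of summary described above (publications over time, top journals, top ontology terms for a result set) amounts to a few frequency counts. The sketch below assumes each returned document is a plain dictionary with `year`, `journal` and `terms` keys; that record shape and the function name are illustrative assumptions, not the GoPubMed data model.

```python
from collections import Counter
from typing import Any, Dict, Iterable, List


def summarise(docs: Iterable[Dict[str, Any]], top_n: int = 3) -> Dict[str, List[tuple]]:
    """Summarise a query's result set as simple frequency statistics."""
    docs = list(docs)
    per_year = Counter(d["year"] for d in docs)
    journals = Counter(d["journal"] for d in docs)
    terms = Counter(t for d in docs for t in d["terms"])
    return {
        # Chronological order makes the series easy to plot.
        "publications_per_year": sorted(per_year.items()),
        "top_journals": journals.most_common(top_n),
        "top_terms": terms.most_common(top_n),
    }
```

The same counting pattern extends to countries, cities and authors if those fields are present on the records.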

Semantically-enabled Inference Adaptor

• Enables the integration of the PONTE platform with OpenScienceLink

• Exploits the PONTE data mining and semantic reasoning mechanisms and services as well as the rich knowledge base of the PONTE platform

• Uses the term co-occurrence index building capability of the PONTE platform, in order to exploit the fact that relevant terms appear together in the literature (the more this happens, the more relevant they are considered to be) and build a co-occurrence index for pairs and triples of terms, ranked in each case by frequency. This offers a first-stage filter able to reduce the amount of information to manageable levels, without sacrificing interesting results, for guiding research

• Exploits a local knowledge base built on curated data from the web of linked data, as well as specialized data sources (incl. KEGG, ChEBI, DrugBank, Sider, etc.)

• Applies various ranking algorithms to the discovered data, following the knowledge-based concept correlations capability stemming from PONTE, with the ranking results being used either for presentation purposes (top first) or for adjusting the level of inclusion / exclusion of terms deemed relevant
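The co-occurrence index for pairs and triples of terms, ranked by frequency, can be sketched in a few lines. This is a simplified illustration that treats each document as a set of already extracted terms; it is not PONTE's implementation, and the function name and input shape are assumptions.

```python
from collections import Counter
from itertools import combinations
from typing import Iterable, List, Set, Tuple


def cooccurrence_index(
    docs: Iterable[Set[str]], size: int
) -> List[Tuple[Tuple[str, ...], int]]:
    """Count how often each group of `size` terms appears in the same document.

    Use size=2 for pairs and size=3 for triples.
    """
    counts: Counter = Counter()
    for terms in docs:
        # Sorting gives every group one canonical key regardless of term order.
        for group in combinations(sorted(terms), size):
            counts[group] += 1
    # Most frequent first; ties are broken alphabetically for stable output.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
```

Truncating the ranked list to its first entries gives the first-stage filter mentioned above: rarely co-occurring groups fall off the end.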

Conclusions

• The OpenScienceLink platform enables accessing and offering of added value services (including trends detection and analysis, development of new scientometrics, data journals management, enhanced review processes) based on a multitude of openly accessible data sources (from literature and data sets to social network data), while at the same time empowering their semantic linking and data processing

• It further offers a wide range of opportunities for better collaboration between researchers, scholars, and research organisations, including their ability to formulate added-value scientific / research networks

Future Work

• Expand the capabilities of the components and user interfaces according to the recorded end user needs and requirements regarding all Pilots

• Address any issues with the implemented functionality and provide improvements based on the end user’s evaluation feedback

• Consider additional data sources for inclusion via integration with the underlying platforms, according to the needs of OpenScienceLink

• Investigate the integration of more SNSs, with the aim to also include networks specifically addressed to researchers and research communities, with the most probable first candidate being Mendeley

• Analyse the steps required (e.g., link with other domains’ ontologies, data sources and models) for enabling the Platform to offer its services beyond the biomedical domain, and, thus, ideally become domain-agnostic

Thank you

► Contact

Efstathios Karanastasis

Research Engineer

+30 210 772 2132

[email protected]

National Technical University of Athens

School of Electrical and Computer Engineering

Distributed Knowledge and Media Systems Group

http://grid.ece.ntua.gr

The OpenScienceLink Platform

Log in

User registration (step 1 of 3)

User registration (step 2 of 3)

User registration (step 3 of 3)

Main menu bar

My profile

My datasets

Upload dataset

Create review call (1 of 2)

Create review call (2 of 2)

Trends

Collaborations

Evaluation (1 of 2)

Evaluation (2 of 2)

PONTE: Eligibility Criteria Model

Scope within PONTE:

► Formal representation of Eligibility (Inclusion/Exclusion) Criteria

► Patients Model for Clinical Research purposes (especially recruitment)

Current Status: 1st year work

► Work upon extending and adapting the eligibility criteria model for OpenScienceLink purposes

Future work: 2nd and mainly 3rd year

► Update and integrate the I/E criteria model within the OpenScienceLink platform

► Annotate literature search results

► Improve literature search process

PONTE: Abbreviations - Introduction

► An abbreviation is a shortened form of a term or expression (the expanded form)

► Abbreviations are widely used in biomedical articles and datasets. Example:

► An abbreviation is present within a document, e.g. “Cardiac testing for all patients at low-risk for ACS is not sustainable”…

► But its expansion (Acute Coronary Syndrome) is missing

► Highly ambiguous: over 5 expansions per abbreviation on average

► Detecting or predicting the expansion of an abbreviation is a real challenge

PONTE: Abbreviations - Tasks

Current Status: work done during 1st year

► In-depth analysis of the problem

► Abbreviation Expansion Detection and Prediction System Architecture

► Description of Algorithms / Methodology

Future Work: for 2nd and 3rd year

► Repository of abbreviations with expansions along with context

► Suggestion of the most appropriate expansion for an abbreviation
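As a rough idea of how a repository of expansions with context could drive a suggestion, the sketch below picks the candidate whose stored context words overlap most with the sentence containing the abbreviation. The slides do not describe the actual algorithms, so this word-overlap heuristic, the repository layout and the function names are all assumptions; the context words in the usage example are invented for illustration.

```python
import re
from typing import Dict, List, Set


def tokens(text: str) -> Set[str]:
    """Lower-cased alphabetic words of a text."""
    return set(re.findall(r"[a-z]+", text.lower()))


def suggest_expansion(
    abbr: str, sentence: str, repository: Dict[str, Dict[str, List[str]]]
) -> str:
    """Return the candidate expansion whose context words best match the sentence.

    `repository` maps abbreviation -> expansion -> typical context words.
    Ties are resolved in favour of the expansion listed first.
    """
    context = tokens(sentence)
    candidates = repository[abbr]
    return max(candidates, key=lambda exp: len(context & set(candidates[exp])))
```

With a toy repository such as `{"ACS": {"Acute Coronary Syndrome": ["cardiac", "patients", "risk"], "American Chemical Society": ["chemistry", "journal"]}}`, the example sentence from the introduction slide would resolve to the coronary reading, because three of its words match that context.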
