
KIT – University of the State of Baden-Wuerttemberg and National Research Center of the Helmholtz Association

Institute for Data Processing and Electronics (IPE)

www.kit.edu

NORDR

The links between Research Data, Scientific Analysis Workflows, Provenance and Metadata: A Researchers Perspective on RDA

Ajinkya Prabhune


Introduction: Nanoscopy

Nanoscopy Research

• Investigation of “aggressive B-cell lymphomas”

• Microscopy technique – Spectral Precision Distance Microscopy (SPDM)

• Novel imaging method producing datasets in the range of 150-200 TB

Community Requirements

• Community-specific data-processing algorithms

• Manage the continuously evolving scientific workflows

• Allow experiment reproducibility

• Automated provenance information management

Ajinkya Prabhune – The links between Research Data, Scientific Analysis

Workflows, Provenance and Metadata: A Researchers perspective on RDA

02.12.2015


Data Repository / Workflow / Provenance / Metadata

• Data repository for long-term storage of and access to scientific datasets

• Integrate scientific workflows and their associated provenance information in the repository system

• Capture workflow description and execution details for

- enabling research reproducibility

- tracking workflow evolution

- assessing data quality


[Figure: data passes through the stages Raw, Intermediate, Results and Scientific Interpretations]


Generic Provenance Metadata Model (GPDM)

• Enable modelling of prospective & retrospective provenance information

• Flexible: can be modelled as per the needs of the community

• Interoperable: Automated conversion into OPM/PROV model

• Extensible: Easy to integrate vocabularies
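
The interoperability point can be illustrated with a minimal sketch. The slides do not show the GPDM schema, so the `ProvenanceRecord` and `Step` classes below are hypothetical stand-ins, and the workflow and step names are invented for illustration. The output loosely follows the PROV-JSON layout (entities, activities, `used` and `wasGeneratedBy` relations); it is one plausible mapping, not the conversion the author implemented.

```python
from dataclasses import dataclass, field


@dataclass
class Step:
    """One executed workflow step (retrospective provenance)."""
    name: str
    inputs: list[str]
    outputs: list[str]


@dataclass
class ProvenanceRecord:
    """Hypothetical container; NOT the actual GPDM schema."""
    workflow: str                                    # prospective: the planned workflow
    steps: list[Step] = field(default_factory=list)  # retrospective: what actually ran


def to_prov(record: ProvenanceRecord) -> dict:
    """Map the record onto PROV-style entities, activities and relations."""
    doc = {"entity": {}, "activity": {}, "used": {}, "wasGeneratedBy": {}}
    # The planned workflow is kept as an entity typed as a plan.
    doc["entity"][record.workflow] = {"prov:type": "prov:Plan"}
    for step in record.steps:
        doc["activity"][step.name] = {}
        for item in step.inputs:
            doc["entity"].setdefault(item, {})
            doc["used"][f"_:u{len(doc['used']) + 1}"] = {
                "prov:activity": step.name, "prov:entity": item}
        for item in step.outputs:
            doc["entity"].setdefault(item, {})
            doc["wasGeneratedBy"][f"_:g{len(doc['wasGeneratedBy']) + 1}"] = {
                "prov:entity": item, "prov:activity": step.name}
    return doc


# Invented example: two steps turning raw data into results.
record = ProvenanceRecord(
    workflow="localization-workflow",
    steps=[Step("detect", ["raw"], ["intermediate"]),
           Step("cluster", ["intermediate"], ["results"])],
)
prov = to_prov(record)
```

Keeping the community-facing record separate from the PROV-style export is what lets the record stay flexible while the export stays standard.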


Automated Metadata Management in Scientific Repository Systems

Enable Scientific Data Reproducibility

• PROVENANCEGEN: automatic generation of provenance graphs

• Metadata modelling services integrated with metadata model registry

• Building links between data, provenance and metadata

• Digital Object (DO) available in a data repository


[Figure: Workflow Definition, Metadata and Data (Raw, Intermediate, Results, Scientific Interpretations); digital objects DO1 to DO4 with R1 to R3; PROVENANCEGEN produces the Provenance]
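
The idea of generating a provenance graph automatically from workflow executions can be sketched as follows. This is not the PROVENANCEGEN algorithm, whose details the slides do not give. The labels DO1 to DO4 and R1 to R3 are taken from the figure, but treating R1 to R3 as run identifiers and chaining the objects linearly are assumptions made only for this example, as are the function names.

```python
from collections import defaultdict


def build_graph(trace):
    """Map each digital object to the (parent object, run) pairs it derives from."""
    derived_from = defaultdict(set)
    for run, inputs, outputs in trace:
        for out in outputs:
            for inp in inputs:
                derived_from[out].add((inp, run))
    return derived_from


def lineage(graph, obj):
    """Collect every ancestor of obj by walking the derivation edges backwards."""
    seen, stack = set(), [obj]
    while stack:
        for parent, _run in graph.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


# Hypothetical execution trace: (run id, input objects, output objects).
trace = [
    ("R1", ["DO1"], ["DO2"]),
    ("R2", ["DO2"], ["DO3"]),
    ("R3", ["DO3"], ["DO4"]),
]
graph = build_graph(trace)
ancestors = lineage(graph, "DO4")
```

With such a graph in place, asking which raw object a scientific interpretation depends on becomes a plain graph traversal, which is the kind of link between data and provenance the slide describes.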


Metadata WGs & IGs

Metadata Standards

• Metadata directory for storing and accessing various metadata standards

• YAML-based template for submitting a metadata standard

• Well-documented list of tools for handling the metadata standards

• Provision of an API for adding, searching and retrieving metadata standards

• Generic metadata template and metadata principles document available
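
The three API operations named above (adding, searching, retrieving) can be sketched with a small in-memory directory. This is only an illustration: the class, its method names and the entry fields (`slug`, `title`, `subjects`) are assumptions and do not describe the actual RDA directory API or its YAML template. Plain dictionaries stand in for parsed template files.

```python
class StandardsDirectory:
    """In-memory stand-in for a metadata standards directory (hypothetical)."""

    def __init__(self):
        self._entries = {}

    def add(self, entry: dict) -> None:
        """Register an entry; 'slug' and 'title' are required by assumption."""
        missing = {"slug", "title"} - entry.keys()
        if missing:
            raise ValueError(f"missing fields: {sorted(missing)}")
        self._entries[entry["slug"]] = entry

    def retrieve(self, slug: str) -> dict:
        """Return the full entry for a known slug."""
        return self._entries[slug]

    def search(self, term: str) -> list[str]:
        """Return slugs whose title or subjects contain the term, ignoring case."""
        term = term.lower()
        return sorted(
            slug for slug, entry in self._entries.items()
            if term in entry["title"].lower()
            or any(term in s.lower() for s in entry.get("subjects", []))
        )


directory = StandardsDirectory()
directory.add({"slug": "dublin-core", "title": "Dublin Core",
               "subjects": ["general"]})
directory.add({"slug": "datacite", "title": "DataCite Metadata Schema",
               "subjects": ["general", "citation"]})
hits = directory.search("citation")
```

Validating required fields at submission time plays the role the YAML template plays on the slide: it keeps entries uniform enough to be searched.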


Research Data Provenance

Research Data Provenance IG

• Focus on comparison and evaluation of data provenance models (OPM/PROV)

• Provide recommendation on provenance model

• Liaison with Data Citation, Data Foundation & Terminology and Metadata Standards


Repositories WGs and IGs

Repository Platforms for Research Data IG

• Collect and analyse research data use cases in the context of repository platforms

• Matrix relating use cases with functional requirements as a deliverable

• Propose a specification for a generic API in future (new BoF group spawned)

Domain Repositories IG

• Aim to bring together active data repositories serving scientific communities

• Provide a forum for sharing practical experience and developing joint projects


Conclusion

Nanoscopy Data Repository System available for the scientific community

• Capable of managing extremely large datasets

• Metadata management integrated in the repository

• Automated provenance information management enabled via the PROVENANCEGEN algorithm and GPDM

• Involvement with various WGs and IGs is an additional benefit for my research


Extra slides


Aligning Nanoscopy Repository System with RDA


[Figure: architecture of the Nanoscopy Open Reference Data Repository. Recoverable labels: Metadata (Metadata Extraction, Metadata Modelling, Metadata Processing, Metadata Preservation); Services (Scientific Workflow, Intelligent Search, Annotation Service, Data Publication Service); Data (Data Preservation, Data Curation, Data Analysis, Data Processing); Interactive Web Portal; Knowledge Representation; Data Validation; Data Collection; Anonymization; Data Ingest; User-Interface; Data Archive; Data Processing]

Data Fabric IG

• Data Management

• Data Preservation

• Data Analysis

• Data Curation/Processing

• Hardware-Infrastructure

• Reference Data Collection

Metadata SD/C WG and Metadata IG

• Metadata Model

• Metadata Management

• Metadata Store

Research Data Provenance IG

• Data Provenance Model

Repositories WGs and IGs

• Comprehensive coverage of functional requirements

• Generic API for interoperability


Automated Metadata Management in Scientific Repository Systems


Automated Metadata Management

• PROVENANCEGEN algorithm for automatic generation of provenance graphs

• Metadata modelling services integrated with metadata model registry

• Integrated PID system


Data Fabric IG

Data Fabric IG aims to design a flexible and dynamic ecosystem consisting of components, services, tools and infrastructure for enabling efficient, cost-effective and reproducible research.

• Data Fabric IG is the umbrella group and works together with other WGs and IGs

• Use cases submitted by various communities


Research area and its relation with WGs and IGs

• PIDs assignment: PID Information Types

• Scientific data repositories: Repositories Platform for Research Data, Domain Repositories, Research Data Repositories Interoperability

• Metadata management: Metadata Standards Directory, Metadata Standards Catalog and Metadata IG

• Provenance data management: Research Data Provenance

• Data management policies: Practical Policies


Introduction: RDA

WGs and IGs to support the complete research data lifecycle

• Metadata WGs and IGs

• Repository IGs

• Research Data Provenance IG

• Data Fabric IG

• And more…



Scientific Workflows / Provenance / Metadata

Typical scientific workflow execution


[Figure: a researcher's Workflow Definition with Metadata and Data (Raw, Intermediate, Results, Scientific Interpretations)]

Required:

• Raw data repository

• Workflow description

• Provenance information


Consolidated Motivation

Enabling efficient management of scientific research (meta)data lifecycle from the perspective of the scientific community

• Comprehensive scientific data repository system

• Extensible architecture for integrating dynamic requirements

• Seamless integration of complex scientific workflows + associated provenance data management

Active involvement with RDA

• Firsthand feedback from domain experts

• Dedicated groups focusing on specific topics (useable/adaptable outcomes)

• Regular discussion and updates via teleconferences


Scientific Workflows / Provenance / Metadata

Typical scientific workflow execution

• Raw dataset ingested and available in data repository


[Figure: Data stages Raw, Intermediate, Results and Scientific Interpretations]