Polish Infrastructure for Supporting Computational Science in the European Research Space
Support for the Full e-Experimentation Cycle in the Virtual Laboratory Infrastructure
Piotr Nowakowski (1), Eryk Ciepiela (1), Tomasz Gubała (1), Maciej Malawski (1, 2), Marian Bubak (1, 2)
(1) ACC Cyfronet AGH, ul. Nawojki 11, 30-950 Kraków, Poland
(2) Institute of Computer Science AGH, Mickiewicza 30, 30-059 Kraków, Poland
KUKDM’10
Zakopane, 18-19 March 2010
Outline
Motivation
Problem definition
Scientific challenges
Iterative experimentation support
Experiment pipelines and traces
Sharing experiment data through Data Nets
Motivation: e-Science Experiments, Data and Publications
Reproducible experiments, provenance in e-Science
Need to link publications with primary data (experimental data, algorithms, software, workflows, scripts)
Plenitude of scientific software: jobs, workflows, services, components, scripts, experiment plans
Huge amount of scientific data consumed and produced by e-Science: Earth and life sciences, HEP, etc.
Large number of publications makes research difficult:
Computer Science: DBLP contains more than 2^20 = 1,048,576 publications
PubMed stores ~17 million articles to date
ACM Digital Library, ISI Web of Knowledge, Scopus, CiteSeer, arXiv, Google Scholar
Emergence of the Web 2.0-based Scientific Social Community (SSC) model
Open Science & Science 2.0
New means of scientific communication: wikis, blogs, collaborative Web 2.0 technologies
New methods of conducting science: e-Science, in-silico experiments, exploratory applications
Democratization of science
Increasing role of openness
Problem Definition
To construct a theoretical model facilitating open, collaborative e-experimentation, from experiment inception to publication of results, including primary scientific data
To develop a framework implementing the above model
To exploit the emerging solution in the context of existing HPC infrastructures and scientific collaboration
Scientific Challenges
Theoretical: A common method for referencing primary data (experimental data, algorithms, software, workflows, scripts) as part of publications should be developed and integrated with modern e-Science infrastructures
Technological: An integrated architecture for storing, annotating, publishing, referencing and reusing primary data sources. This architecture should span existing virtual laboratory and grid computing systems
Description of the Solution
Phase 1: Iterative experiment preparation
Phase 2: Experiment execution involving semantic storage of results and ensuring repeatability
Experimentation Pipeline
The process of developing an experiment begins with drafting its specification
This is followed by iteratively constructing an experiment plan
Each prototype is tested by a specific research community, using tools provided by the PL-Grid virtual laboratory
Upon completion of tests the experiment can be executed in a production mode
Obtained results can be published along with the experiment plan (i.e. a set of operations which enable reenactment and validation of a given experiment)
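The pipeline stages listed above can be read as a small state machine. The following is only a minimal sketch under that reading: the `Stage` names, the `TRANSITIONS` table and the `Experiment` class are hypothetical illustrations and are not part of the PL-Grid virtual laboratory API. The step back from testing to prototyping models the iterative construction of the experiment plan.

```python
from enum import Enum


class Stage(Enum):
    """Hypothetical stage names derived from the pipeline description."""
    SPECIFICATION = "specification drafted"
    PROTOTYPE = "experiment plan prototype"
    TESTING = "tested by the research community"
    PRODUCTION = "executed in production mode"
    PUBLISHED = "results and plan published"


# Allowed transitions; TESTING -> PROTOTYPE is the iterative loop.
TRANSITIONS = {
    Stage.SPECIFICATION: {Stage.PROTOTYPE},
    Stage.PROTOTYPE: {Stage.TESTING},
    Stage.TESTING: {Stage.PROTOTYPE, Stage.PRODUCTION},
    Stage.PRODUCTION: {Stage.PUBLISHED},
    Stage.PUBLISHED: set(),
}


class Experiment:
    """Tracks the current stage of one experiment and the path taken so far."""

    def __init__(self, name):
        self.name = name
        self.stage = Stage.SPECIFICATION
        self.history = [Stage.SPECIFICATION]

    def advance(self, target):
        """Move to the target stage, rejecting transitions the pipeline does not allow."""
        if target not in TRANSITIONS[self.stage]:
            raise ValueError(f"cannot move from {self.stage.name} to {target.name}")
        self.stage = target
        self.history.append(target)


def demo_pipeline():
    """Walk one experiment through two prototype iterations up to publication."""
    experiment = Experiment("demo")
    for stage in (Stage.PROTOTYPE, Stage.TESTING, Stage.PROTOTYPE,
                  Stage.TESTING, Stage.PRODUCTION, Stage.PUBLISHED):
        experiment.advance(stage)
    return experiment
```

Keeping the allowed transitions in a single table makes it easy to check that an experiment cannot reach production or publication without having passed through testing.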
Experiment Traces
An experiment trace consists of the following:
any input data provided by the experiment enactor;
all steps performed in order to transform this data into publishable scientific results (chronologically arranged);
the documentation of the experiment plan, prepared by a domain scientist (in the form of annotations and comments).
The outcome of this process will be easily manageable and readable, similar to weblog entries
Our VL system will enable enrichment of individual data elements with provenance information, linking them to appropriate stages of the experiment
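One possible way to picture an experiment trace is as a record holding the inputs, a chronologically ordered list of annotated steps, and a lookup from a data element back to the steps that produced it. The sketch below is an assumption made for illustration: the `Step` and `ExperimentTrace` classes and their field names are hypothetical and do not describe the actual VL data model.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class Step:
    description: str      # what was done in this step
    produced: list        # names of data elements created by this step
    timestamp: datetime   # when the step was performed
    annotation: str = ""  # comment added by the domain scientist


@dataclass
class ExperimentTrace:
    inputs: dict                               # input data supplied by the enactor
    steps: list = field(default_factory=list)  # kept in chronological order

    def record(self, description, produced, annotation="", timestamp=None):
        """Add a step and keep the trace chronologically arranged."""
        when = timestamp or datetime.now(timezone.utc)
        self.steps.append(Step(description, list(produced), when, annotation))
        self.steps.sort(key=lambda step: step.timestamp)

    def provenance(self, element):
        """Return the steps that produced the given data element, oldest first."""
        return [step for step in self.steps if element in step.produced]

    def as_log(self):
        """Render the trace as weblog-like lines, one per step."""
        lines = []
        for step in self.steps:
            note = f" ({step.annotation})" if step.annotation else ""
            lines.append(f"{step.timestamp.isoformat()} {step.description}{note}")
        return lines
```

A step recorded out of order is still placed correctly by its timestamp, so `as_log()` always reads top to bottom like a sequence of dated weblog entries.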
Sharing Primary Data: Data Nets
Data Net – unifying modern data storage mechanisms (relational databases, Grid-based file systems, Wiki pages etc.)
A Data Net is a group of data entities linked by named relationships. Such relationships impose a structure upon the dataset and facilitate querying for entities
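The definition of a Data Net as entities linked by named relationships can be sketched as a small labelled graph. This is only an illustration under stated assumptions: the `DataNet` class, its method names, the relation names and the `store` attribute are all hypothetical, and the real system would delegate storage to the underlying back ends rather than keep everything in memory.

```python
from collections import defaultdict


class DataNet:
    """In-memory sketch: entities plus named, directed relationships between them."""

    def __init__(self):
        self.entities = {}             # entity id -> attributes (e.g. backing store)
        self.links = defaultdict(set)  # (source id, relation name) -> target ids

    def add_entity(self, entity_id, **attributes):
        self.entities[entity_id] = attributes

    def link(self, source, relation, target):
        """Connect two known entities with a named relationship."""
        for entity_id in (source, target):
            if entity_id not in self.entities:
                raise KeyError(f"unknown entity: {entity_id}")
        self.links[(source, relation)].add(target)

    def related(self, source, relation):
        """Entities reachable from source over one named relationship."""
        return sorted(self.links.get((source, relation), ()))

    def follow(self, source, *relations):
        """Query by following a chain of named relationships from source."""
        current = {source}
        for relation in relations:
            current = {target
                       for entity in current
                       for target in self.links.get((entity, relation), ())}
        return sorted(current)


def demo_data_net():
    """Link a publication to a result file and to the sample it derives from."""
    net = DataNet()
    net.add_entity("paper-1", store="wiki page")
    net.add_entity("result-a", store="grid file system")
    net.add_entity("sample-17", store="relational database")
    net.link("paper-1", "reports", "result-a")
    net.link("result-a", "derived_from", "sample-17")
    return net
```

With the entities above, `demo_data_net().follow("paper-1", "reports", "derived_from")` walks from the publication to the primary data it rests on, which is the kind of query the named relationships are meant to support.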
https://gs2.cyfronet.pl