carelyn campbell, ben blaiszik, laura bartolo...5 data model definition defines the structure of...
TRANSCRIPT
![Page 1: Carelyn Campbell, Ben Blaiszik, Laura Bartolo...5 Data Model Definition Defines the structure of metadata and data Measurement Data Model Metadata e.g. • Sample owner • Date of](https://reader033.vdocument.in/reader033/viewer/2022042419/5f35d7a5c0b8e4185d50c985/html5/thumbnails/1.jpg)
Carelyn Campbell, Ben Blaiszik, Laura Bartolo
November 1, 2016
![Page 2: Carelyn Campbell, Ben Blaiszik, Laura Bartolo...5 Data Model Definition Defines the structure of metadata and data Measurement Data Model Metadata e.g. • Sample owner • Date of](https://reader033.vdocument.in/reader033/viewer/2022042419/5f35d7a5c0b8e4185d50c985/html5/thumbnails/2.jpg)
Data Landscape
Collaboration Tools (e.g. Google Drive, DropBox, Sharepoint, Github, MatIN)
Data Sharing Communities (e.g. Dryad, FigShare, NanoHub, Kaggle, NDS)
Data Repositories (e.g. Aflow, Materials Project, OQMD, NIMS Material Navi, NoMaD, Materials Universe)
Data Curation
Data Analysis Tools
Software
![Page 3: Carelyn Campbell, Ben Blaiszik, Laura Bartolo...5 Data Model Definition Defines the structure of metadata and data Measurement Data Model Metadata e.g. • Sample owner • Date of](https://reader033.vdocument.in/reader033/viewer/2022042419/5f35d7a5c0b8e4185d50c985/html5/thumbnails/3.jpg)
Data curation is the active and ongoing management of data through its lifecycle of interest and usefulness to scholarship, science, and education.
http://ischool.illinois.edu/academics/degrees/specializations/data_curation
Scott Adams, October 30, 2011
![Page 4: Carelyn Campbell, Ben Blaiszik, Laura Bartolo...5 Data Model Definition Defines the structure of metadata and data Measurement Data Model Metadata e.g. • Sample owner • Date of](https://reader033.vdocument.in/reader033/viewer/2022042419/5f35d7a5c0b8e4185d50c985/html5/thumbnails/4.jpg)
Representative list of tools focused on materials data, but not comprehensive
![Page 5: Carelyn Campbell, Ben Blaiszik, Laura Bartolo...5 Data Model Definition Defines the structure of metadata and data Measurement Data Model Metadata e.g. • Sample owner • Date of](https://reader033.vdocument.in/reader033/viewer/2022042419/5f35d7a5c0b8e4185d50c985/html5/thumbnails/5.jpg)
Data Model Definition
Defines the structure of metadata and data
Measurement Data Model
Metadata e.g.
• Sample owner
• Date of measurement
• Kα1
• Sample stage position
• Apparatus temperature
Data e.g.
• As XML
• Raw data (text, ASCII, binary)
• Imported table
• Link to image or raw data*
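A minimal sketch of how the measurement data model above could be expressed in code, assuming a simple two-part record (metadata plus data) serialized as XML. The class names, field names, units, and XML element names are hypothetical illustrations, not a schema defined in the slides.

```python
from dataclasses import dataclass
from datetime import date
import xml.etree.ElementTree as ET


@dataclass
class Metadata:
    # Field names and units are assumptions made for this sketch.
    sample_owner: str
    measurement_date: date
    sample_stage_position_mm: float
    apparatus_temperature_k: float


@dataclass
class Measurement:
    metadata: Metadata
    raw_data: str = ""   # inline raw data (text/ASCII)
    data_link: str = ""  # link to an image or external raw data file

    def to_xml(self) -> str:
        """Serialize the record to an XML string (hypothetical element names)."""
        root = ET.Element("measurement")
        meta = ET.SubElement(root, "metadata")
        ET.SubElement(meta, "sampleOwner").text = self.metadata.sample_owner
        ET.SubElement(meta, "dateOfMeasurement").text = (
            self.metadata.measurement_date.isoformat()
        )
        ET.SubElement(meta, "sampleStagePosition").text = str(
            self.metadata.sample_stage_position_mm
        )
        ET.SubElement(meta, "apparatusTemperature").text = str(
            self.metadata.apparatus_temperature_k
        )
        data = ET.SubElement(root, "data")
        if self.raw_data:
            ET.SubElement(data, "raw").text = self.raw_data
        if self.data_link:
            ET.SubElement(data, "link", href=self.data_link)
        return ET.tostring(root, encoding="unicode")


record = Measurement(
    Metadata("A. Researcher", date(2016, 11, 1), 12.5, 298.0),
    raw_data="1 2 3",
)
xml_text = record.to_xml()
```

Keeping metadata and data in separate sub-records mirrors the split on the slide and lets the data part be either inline or a link without changing the metadata structure.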
![Page 6: Carelyn Campbell, Ben Blaiszik, Laura Bartolo...5 Data Model Definition Defines the structure of metadata and data Measurement Data Model Metadata e.g. • Sample owner • Date of](https://reader033.vdocument.in/reader033/viewer/2022042419/5f35d7a5c0b8e4185d50c985/html5/thumbnails/6.jpg)
Example APS Data (Large Data Example)
Generate Data at APS
Analysis Tools
Analyzed Data
Share Data and Metadata
NanoHub
![Page 7: Carelyn Campbell, Ben Blaiszik, Laura Bartolo...5 Data Model Definition Defines the structure of metadata and data Measurement Data Model Metadata e.g. • Sample owner • Date of](https://reader033.vdocument.in/reader033/viewer/2022042419/5f35d7a5c0b8e4185d50c985/html5/thumbnails/7.jpg)
Workflows
• Large Datasets: Single Point Source (e.g. APS)
• Experimental data (small to medium size), multiple source generation
• Computational Data
• Infrastructure Selection Tool
![Page 8: Carelyn Campbell, Ben Blaiszik, Laura Bartolo...5 Data Model Definition Defines the structure of metadata and data Measurement Data Model Metadata e.g. • Sample owner • Date of](https://reader033.vdocument.in/reader033/viewer/2022042419/5f35d7a5c0b8e4185d50c985/html5/thumbnails/8.jpg)
Stress-Strain Measurement
![Page 9: Carelyn Campbell, Ben Blaiszik, Laura Bartolo...5 Data Model Definition Defines the structure of metadata and data Measurement Data Model Metadata e.g. • Sample owner • Date of](https://reader033.vdocument.in/reader033/viewer/2022042419/5f35d7a5c0b8e4185d50c985/html5/thumbnails/9.jpg)
Experimental Workflow: Stress-Strain Measurement
Sample Geometry
Sample Metadata
Stress-Strain Measurement: Load Frame, DIC, LVDT
Data: Load Frame Data, DIC data, LVDT data
Metadata: Load Frame Metadata, DIC Metadata, LVDT Metadata
Analysis (Tools e.g. Github, Matlab)
Results
Curate with Materials Commons or ICE
MDF, Granta, Other
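As a rough illustration of the analysis step in this workflow, the sketch below combines sample geometry, load frame data, and LVDT data into an engineering stress-strain curve (stress = force / cross-sectional area, strain = displacement / gauge length). The function and field names, and the choice of units, are assumptions made for this example; the slides do not prescribe a particular analysis.

```python
from dataclasses import dataclass


@dataclass
class SampleGeometry:
    # Hypothetical fields; units chosen so that N / mm^2 gives MPa.
    gauge_length_mm: float
    cross_section_mm2: float


def stress_strain(geometry, load_n, lvdt_mm):
    """Return a list of (strain, stress_mpa) pairs.

    load_n:  forces from the load frame, in newtons
    lvdt_mm: displacements from the LVDT, in millimetres
    """
    if len(load_n) != len(lvdt_mm):
        raise ValueError("load and displacement series must have equal length")
    curve = []
    for force, displacement in zip(load_n, lvdt_mm):
        strain = displacement / geometry.gauge_length_mm
        stress_mpa = force / geometry.cross_section_mm2
        curve.append((strain, stress_mpa))
    return curve


geometry = SampleGeometry(gauge_length_mm=50.0, cross_section_mm2=20.0)
curve = stress_strain(geometry, [0.0, 1000.0, 2000.0], [0.0, 0.05, 0.1])
```

In practice the result would be stored together with the instrument metadata so that the curve stays traceable to the sample and apparatus that produced it.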
![Page 10: Carelyn Campbell, Ben Blaiszik, Laura Bartolo...5 Data Model Definition Defines the structure of metadata and data Measurement Data Model Metadata e.g. • Sample owner • Date of](https://reader033.vdocument.in/reader033/viewer/2022042419/5f35d7a5c0b8e4185d50c985/html5/thumbnails/10.jpg)
Big Data Workflow
![Page 11: Carelyn Campbell, Ben Blaiszik, Laura Bartolo...5 Data Model Definition Defines the structure of metadata and data Measurement Data Model Metadata e.g. • Sample owner • Date of](https://reader033.vdocument.in/reader033/viewer/2022042419/5f35d7a5c0b8e4185d50c985/html5/thumbnails/11.jpg)
Computational Data Workflow
• Lots of different techniques
• Phase Field modeling: no standards.
• Community standards needed
• Codes change quickly
• Social change needed.
• FEM: more benchmarks, more standardized
![Page 12: Carelyn Campbell, Ben Blaiszik, Laura Bartolo...5 Data Model Definition Defines the structure of metadata and data Measurement Data Model Metadata e.g. • Sample owner • Date of](https://reader033.vdocument.in/reader033/viewer/2022042419/5f35d7a5c0b8e4185d50c985/html5/thumbnails/12.jpg)
How do I select a Materials Data Infrastructure Tool?
![Page 13: Carelyn Campbell, Ben Blaiszik, Laura Bartolo...5 Data Model Definition Defines the structure of metadata and data Measurement Data Model Metadata e.g. • Sample owner • Date of](https://reader033.vdocument.in/reader033/viewer/2022042419/5f35d7a5c0b8e4185d50c985/html5/thumbnails/13.jpg)
A Taxonomy of Workflow Management Systems for Grid Computing
Jia Yu and Rajkumar Buyya*
Grid Computing and Distributed Systems (GRIDS) Laboratory, Department of Computer Science and Software Engineering, The University of Melbourne, Melbourne, Australia
E-mail: [email protected]
Received 28 May 2005; accepted in revised form 6 December 2005
Key words: Grid computing, resource management, scheduling, taxonomy, workflow management
Abstract
With the advent of Grid and application technologies, scientists and engineers are building more and more complex applications to manage and process large data sets, and execute scientific experiments on distributed resources. Such application scenarios require means for composing and executing complex workflows. Therefore, many efforts have been made towards the development of workflow management systems for Grid computing. In this paper, we propose a taxonomy that characterizes and classifies various approaches for building and executing workflows on Grids. We also survey several representative Grid workflow systems developed by various projects world-wide to demonstrate the comprehensiveness of the taxonomy. The taxonomy not only highlights the design and engineering similarities and differences of state-of-the-art in Grid workflow systems, but also identifies the areas that need further research.
1. Introduction
Grids [51] have emerged as a global cyber-infrastructure for the next-generation of e-Science applications by integrating large-scale, distributed and heterogeneous resources. Scientific communities, such as high-energy physics, gravitational-wave physics, geophysics, astronomy and bioinformatics, are utilizing Grids to share, manage and process large data sets. In order to support complex scientific experiments, distributed resources such as computational devices, data, applications, and scientific instruments need to be orchestrated while managing the application workflow operations within Grid environments [92].
Workflow is concerned with the automation of procedures whereby files and data are passed between participants according to a defined set of rules to achieve an overall goal [35]. A workflow management system [5] defines, manages and executes workflows on computing resources. Imposing the workflow paradigm for application composition on Grids offers several advantages [117] such as:
• Ability to build dynamic applications which orchestrate distributed resources.
• Utilization of resources that are located in a particular domain to increase throughput or reduce execution costs.
• Execution spanning multiple administrative domains to obtain specific processing capabilities.
• Integration of multiple teams involved in managing of different parts of the experiment workflow, thus promoting inter-organizational collaborations.
Figure 1 shows the architecture and functionalities supported by various components of the Grid workflow system based on the workflow reference model
* Corresponding author.
Journal of Grid Computing (2006) 3: 171–200. © Springer 2006. DOI: 10.1007/s10723-005-9010-8
their dependencies in the form of Directed Acyclic Graph (DAG). Before mapping, Pegasus reduces the abstract workflow by reusing a materialized dataset which is produced by other users within a VO. Reduction optimization assumes that it is more costly to produce a dataset than access the processing results. The reduction algorithm removes any antecedents of the redundant jobs that do not have any unmaterialized descendants in order to reduce the complexity of the executable workflow.
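The reduction step described above can be sketched as follows. This is a simplified illustration written from the paragraph's wording, not Pegasus code: it assumes each job produces one dataset named after the job, treats a job as redundant when its dataset is already materialized, and also drops any antecedent of a redundant job whose descendants are all materialized. The function names and the dictionary-based graph representation are hypothetical.

```python
def descendants(children, job):
    """Return every job reachable from `job` by following child edges."""
    seen = set()
    stack = list(children.get(job, ()))
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(children.get(node, ()))
    return seen


def reduce_workflow(children, materialized):
    """Drop redundant jobs and their no-longer-needed antecedents.

    children:     dict mapping each job to the set of jobs that depend on it
    materialized: set of jobs whose output dataset already exists
    """
    redundant = {job for job in children if job in materialized}
    removed = set(redundant)
    for job in children:
        if job in removed:
            continue
        below = descendants(children, job)
        is_antecedent_of_redundant = bool(below & redundant)
        has_unmaterialized_descendant = bool(below - materialized)
        if is_antecedent_of_redundant and not has_unmaterialized_descendant:
            removed.add(job)
    return {
        job: {child for child in kids if child not in removed}
        for job, kids in children.items()
        if job not in removed
    }


workflow = {
    "extract": {"align"},
    "align": set(),
    "simulate": {"plot"},
    "plot": set(),
}
reduced = reduce_workflow(workflow, materialized={"align"})
```

Here `align` already exists, so it and its only antecedent `extract` are dropped, while the independent `simulate` and `plot` jobs remain.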
Pegasus consults various Grid information services to find the resources, software, and data that are used in the workflow. A Replica Location Service (RLS) [30] and Transformation Catalog (TC) [39] are used to locate the replicas of the required data, and to find the location of the logical application components respectively. Pegasus also queries Globus Monitoring and Discovery Service (MDS) [34] to find available resources and their characteristics.
There are two methods used in Pegasus for resource selection, one is through random allocation, the other is through a performance prediction approach. In the latter approach, Pegasus interacts with Prophesy [68, 140], which serves as an infrastructure for performance analysis and modeling of parallel and distributed applications. Prophesy is used to predict the best site to execute an application component by using performance historical data. Prophesy gathers and stores the performance data of every application. The performance information can provide insight into the performance relationship between the application and hardware and between the application, compilers, and run-time systems. An analytical model is produced based on the performance data
Table 2. Workflow design taxonomy mapping.

| Project name | Structure | Model | Composition systems | QoS constraints |
|---|---|---|---|---|
| DAGMan | DAG | Abstract | User-directed: Language-based | User specified rank expression for desired resources |
| Pegasus | DAG | Abstract | User-directed: Language-based; Automatic | N/A |
| Triana | Non-DAG | Abstract | User-directed: Graph-based | N/A |
| ICENI | Non-DAG | Abstract | User-directed: Language-based, Graph-based | Metrics specified by users |
| Taverna | DAG | Abstract/concrete | User-directed: Language-based, Graph-based | N/A |
| GridAnt | Non-DAG | Concrete | User-directed: Language-based | N/A |
| GrADS | DAG | Abstract | User-directed: Language-based | Estimated application execution time |
| GridFlow | DAG | Abstract | User-directed: Graph-based, Language-based | Application execution time |
| Unicore | Non-DAG | Concrete | User-directed: Graph-based | N/A |
| Gridbus workflow | DAG | Abstract/concrete | User-directed: Language-based | Deadline, cost minimisation |
| Askalon | Non-DAG | Abstract | User-directed: Graph-based, Language-based | Constraints and properties specified by users or pre-defined |
| Karajan | Non-DAG | Abstract | User-directed: Language-based, Graph-based | N/A |
| Kepler | Non-DAG | Abstract/concrete | User-directed: Graph-based | N/A |
Example: Workflow Tool Selection
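One way to read Table 2 as a selection aid is to treat each column as a facet and filter on it. The sketch below encodes the table rows and filters them by structure, model, and composition style. The `WorkflowSystem` class and the `select` function are hypothetical names introduced for illustration; only the row values come from the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WorkflowSystem:
    name: str
    structure: str          # "DAG" or "Non-DAG"
    model: str              # "Abstract", "Concrete", or "Abstract/concrete"
    composition: frozenset  # e.g. {"Language-based", "Graph-based"}
    qos: str = "N/A"


def _row(name, structure, model, composition, qos="N/A"):
    return WorkflowSystem(name, structure, model, frozenset(composition), qos)


L, G = "Language-based", "Graph-based"
TABLE_2 = [
    _row("DAGMan", "DAG", "Abstract", {L},
         "User specified rank expression for desired resources"),
    _row("Pegasus", "DAG", "Abstract", {L, "Automatic"}),
    _row("Triana", "Non-DAG", "Abstract", {G}),
    _row("ICENI", "Non-DAG", "Abstract", {L, G}, "Metrics specified by users"),
    _row("Taverna", "DAG", "Abstract/concrete", {L, G}),
    _row("GridAnt", "Non-DAG", "Concrete", {L}),
    _row("GrADS", "DAG", "Abstract", {L}, "Estimated application execution time"),
    _row("GridFlow", "DAG", "Abstract", {G, L}, "Application execution time"),
    _row("Unicore", "Non-DAG", "Concrete", {G}),
    _row("Gridbus workflow", "DAG", "Abstract/concrete", {L},
         "Deadline, cost minimisation"),
    _row("Askalon", "Non-DAG", "Abstract", {G, L},
         "Constraints and properties specified by users or pre-defined"),
    _row("Karajan", "Non-DAG", "Abstract", {L, G}),
    _row("Kepler", "Non-DAG", "Abstract/concrete", {G}),
]


def select(systems, structure=None, model=None, composition=None):
    """Return names of systems matching every facet that was given."""
    matches = []
    for system in systems:
        if structure and system.structure != structure:
            continue
        # "Abstract/concrete" counts as supporting both models.
        if model and model.lower() not in system.model.lower().split("/"):
            continue
        if composition and composition not in system.composition:
            continue
        matches.append(system.name)
    return matches


dag_with_gui = select(TABLE_2, structure="DAG", composition="Graph-based")
```

The same faceted pattern would apply to a registry of materials data infrastructure tools once a taxonomy for them is agreed.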
![Page 14: Carelyn Campbell, Ben Blaiszik, Laura Bartolo...5 Data Model Definition Defines the structure of metadata and data Measurement Data Model Metadata e.g. • Sample owner • Date of](https://reader033.vdocument.in/reader033/viewer/2022042419/5f35d7a5c0b8e4185d50c985/html5/thumbnails/14.jpg)
Example: Hardware Store Website
Credit: homedepot.com
Any mention of commercial products is for information only; it does not imply recommendation or endorsement by NIST.
![Page 15: Carelyn Campbell, Ben Blaiszik, Laura Bartolo...5 Data Model Definition Defines the structure of metadata and data Measurement Data Model Metadata e.g. • Sample owner • Date of](https://reader033.vdocument.in/reader033/viewer/2022042419/5f35d7a5c0b8e4185d50c985/html5/thumbnails/15.jpg)
Example: Used Car Website
Credit: carmax.com
Any mention of commercial products is for information only; it does not imply recommendation or endorsement by NIST.
![Page 16: Carelyn Campbell, Ben Blaiszik, Laura Bartolo...5 Data Model Definition Defines the structure of metadata and data Measurement Data Model Metadata e.g. • Sample owner • Date of](https://reader033.vdocument.in/reader033/viewer/2022042419/5f35d7a5c0b8e4185d50c985/html5/thumbnails/16.jpg)
Registry: Materials Data Infrastructure Tools
![Page 17: Carelyn Campbell, Ben Blaiszik, Laura Bartolo...5 Data Model Definition Defines the structure of metadata and data Measurement Data Model Metadata e.g. • Sample owner • Date of](https://reader033.vdocument.in/reader033/viewer/2022042419/5f35d7a5c0b8e4185d50c985/html5/thumbnails/17.jpg)
First Rough Draft Taxonomy
![Page 18: Carelyn Campbell, Ben Blaiszik, Laura Bartolo...5 Data Model Definition Defines the structure of metadata and data Measurement Data Model Metadata e.g. • Sample owner • Date of](https://reader033.vdocument.in/reader033/viewer/2022042419/5f35d7a5c0b8e4185d50c985/html5/thumbnails/18.jpg)
First Rough Draft Taxonomy
![Page 19: Carelyn Campbell, Ben Blaiszik, Laura Bartolo...5 Data Model Definition Defines the structure of metadata and data Measurement Data Model Metadata e.g. • Sample owner • Date of](https://reader033.vdocument.in/reader033/viewer/2022042419/5f35d7a5c0b8e4185d50c985/html5/thumbnails/19.jpg)
First Rough Draft Taxonomy
![Page 20: Carelyn Campbell, Ben Blaiszik, Laura Bartolo...5 Data Model Definition Defines the structure of metadata and data Measurement Data Model Metadata e.g. • Sample owner • Date of](https://reader033.vdocument.in/reader033/viewer/2022042419/5f35d7a5c0b8e4185d50c985/html5/thumbnails/20.jpg)
First Rough Draft Taxonomy
![Page 21: Carelyn Campbell, Ben Blaiszik, Laura Bartolo...5 Data Model Definition Defines the structure of metadata and data Measurement Data Model Metadata e.g. • Sample owner • Date of](https://reader033.vdocument.in/reader033/viewer/2022042419/5f35d7a5c0b8e4185d50c985/html5/thumbnails/21.jpg)
Notes from Summit Wrap-up Session
• Integrate tools into undergraduate education
  – Tools need to be more user friendly
• Embed data experts into experimental groups
  – Alternate: floating data experts available for experimental groups.
  – Need to define skills needed for these data experts
• Encourage more conference exchanges at Data Analytics and Materials communities
• Define data curation guidelines/code
  – Benefit to users
• Data Challenge (Student)
  – Prize for dataset
  – Best paper/DOI/PID
• Develop implementation path
• Improve peer recognition
• Develop datacite profile
Interest in following up with small working groups on specific issues.
![Page 22: Carelyn Campbell, Ben Blaiszik, Laura Bartolo...5 Data Model Definition Defines the structure of metadata and data Measurement Data Model Metadata e.g. • Sample owner • Date of](https://reader033.vdocument.in/reader033/viewer/2022042419/5f35d7a5c0b8e4185d50c985/html5/thumbnails/22.jpg)