Post on 19-Dec-2015
National Aeronautics and Space Administration
Jet Propulsion Laboratory
Supporting Science Through Workflows: Infrastructure, Architecture and Modeling
David Woollard
NASA Jet Propulsion Laboratory
University of Southern California
D.M. Woollard. Supporting Science Through Workflows. 2
Agenda
» Motivation
» Classification of in silico Experimentation
» Research Problem
» Related Work
» Introduction to Workflow Systems
» Research Goals
» Methodology
» Refactoring existing software
» Domain Specific Software Architecture
» Evaluation
» Conclusions & Future Work
Motivation
• The nature of scientific investigations has changed.
• Two major trend lines:
– Simulation via computer has, for many, replaced in vivo and in vitro science.
– Collaborations are growing (system of systems science).
• New discoveries in materials science, chemistry, physics, planetary science, and even social sciences are made via in silico experimentation.
in silico Experimentation
• Discovery is a phase in which a scientist rapidly prototypes, tests hypotheses, and develops a methodology.
[Diagram: phases Discovery, Production, Distribution; labels Theory, Practice, Development, Execution, Lone Researcher [Kepner 03]]
in silico Experimentation
• Production is the engineering of replicating an experiment on large volumes of data.
[Diagram: phases Discovery, Production, Distribution]
We will focus on Production Systems in this talk.
in silico Experimentation
• Distribution is a phase in which data is dispersed to peers for review and further experimentation, including:
– Papers
– Federated Data
– Digital Libraries
[Diagram: phases Discovery, Production, Distribution]
The Role of Technology
• In silico science, especially system of systems science, is facilitated by the Grid.
“The sharing that we are concerned with is not primarily file exchange but rather direct access to computers, software, data, and other resources, as is required by a range of collaborative problem-solving and resource-brokering strategies emerging in industry, science, and engineering.”
The Anatomy of the Grid (2001)
Research Problem
• Scientists harness complex hardware and software systems in order to conduct scientific research in silico.
• Once algorithms and processes are established, production systems are created to produce large volumes of data.
• Designing a production system is a complex engineering task as well as a complex scientific task.
Meeting these production requirements forces either a scientist to engineer a production system or a software engineer to rewrite scientific code. This is both inefficient and costly.
Introduction to Workflows
• Grid Systems have traditionally focused on creating Virtual Organizations.
• In Grids, workflows orchestrate processing tasks in production systems.
• Workflows are a processing model that incorporates actors, tasks, data, and rules.
• Workflow management systems execute tasks on data once the task’s dependencies are satisfied based on rules.
[Diagram labels: Production Systems, Grid Systems, Workflows; task graph with nodes T0, T1, T2, T3, T4]
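The execution rule above (run a task once its dependencies are satisfied) can be sketched in a few lines of Python. This is a minimal illustration, not the behavior of any particular workflow manager: the edges between T0 through T4 are assumed, since the slide's task graph did not survive extraction, and `run_workflow` and `make_task` are hypothetical names.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def run_workflow(tasks, dependencies, data):
    """Run each task once every task it depends on has finished.

    tasks:        {task name: callable that takes and returns the data dict}
    dependencies: {task name: set of task names that must finish first}
    """
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError if the rules are circular
    order = []
    while sorter.is_active():
        for name in sorter.get_ready():  # tasks whose dependencies are satisfied
            data = tasks[name](data)
            order.append(name)
            sorter.done(name)
    return order, data


def make_task(name):
    """Build a stand-in task that only records its own execution."""
    def task(data):
        return {**data, "log": data["log"] + [name]}
    return task


# Assumed dependency rules (hypothetical): T0 feeds T1 and T2, both feed T3, T3 feeds T4.
dependencies = {"T1": {"T0"}, "T2": {"T0"}, "T3": {"T1", "T2"}, "T4": {"T3"}}
tasks = {name: make_task(name) for name in ("T0", "T1", "T2", "T3", "T4")}

order, result = run_workflow(tasks, dependencies, {"log": []})
print(order)
```

T1 and T2 become ready at the same time, so a real workflow manager would be free to run them in parallel; this sketch runs them one after the other.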
Workflow System Model
Workflows Workflows Everywhere
Condor-G
Pegasus
Wings
Taverna
Grid Workflow
Yawl
DAG-Man
Triana
ICENI
VDS
GridAnt
GrADS
GridFlow
Unicore
Gridbus
Askalon
Kepler
Karajan
SciFlow
OODT
Bottom-up Taxonomy
• Yu & Buyya presented a taxonomy [Yu & Buyya 05]
– Based on workflow properties like model representation and scheduling policy
– Illustration of divergence in the field
No existing taxonomy classifies workflow systems by their interface to task code.
Insights from an Architect
• Each production workflow task is a complex software application with two primary stakeholders: the scientist and the engineer.
• Software architectures are a system’s blueprint: its form, elements, and rationale [Perry & Wolf, 92].
• An architecture provides appropriate views for each stakeholder in addition to encapsulation of computation and communication. These are the architecture’s components, connectors and topology.
• Reification of architectural elements in code is a method of bridging the gap between design and implementation. First-class connectors and explicit interfaces are such reifications.
Research Goals
• Develop a Domain Specific Software Architecture (DSSA) for tasks in scientific workflows.
• Develop a methodology for refactoring existing scientific code into this DSSA.
• Minimize overhead (computation time and memory footprint).
• Maximize science code reuse.
Decomposing Software
[Approach: Decomposition, Architecting, Deployment]
• Decomposition, the first step in the approach, is a process in which scientific modules are identified and control flow determined.
• Scientific modules are like functions: they have internal scope and a single entry and exit point. In graph theoretic terms, the call dominancy tree for the basic blocks in the module has only one source and one sink.
• The proper level of decomposition is dependent on both scientific functionality and engineering requirements. Therefore, it should be “tunable.”
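The single-entry, single-exit property of a scientific module can be illustrated with a small check. This is a simplified sketch, not the talk's decomposition tool: it tests the property directly on a control-flow graph of basic blocks instead of building a dominator tree. The block names and the graph are hypothetical.

```python
def boundary_blocks(cfg, module):
    """Return (entries, exits) for `module`, a set of basic blocks in `cfg`.

    cfg maps each basic block to the list of blocks it can jump to.
    An entry block is reached from outside the module (or has no predecessor);
    an exit block jumps outside the module (or has no successor).
    """
    module = set(module)
    preds = {block: set() for block in cfg}
    for block, successors in cfg.items():
        for succ in successors:
            preds.setdefault(succ, set()).add(block)
    entries = {b for b in module if not preds.get(b) or preds[b] - module}
    exits = {b for b in module if not cfg.get(b) or set(cfg[b]) - module}
    return entries, exits


def is_single_entry_single_exit(cfg, module):
    entries, exits = boundary_blocks(cfg, module)
    return len(entries) == 1 and len(exits) == 1


# Hypothetical control-flow graph: read input, initialise, loop over a body, write output.
cfg = {
    "read": ["init"],
    "init": ["loop"],
    "loop": ["body", "write"],
    "body": ["loop"],
    "write": [],
}

good = is_single_entry_single_exit(cfg, {"init", "loop", "body"})  # one way in, one way out
bad = is_single_entry_single_exit(cfg, {"loop", "write"})          # two exit blocks
print(good, bad)
```

Choosing a larger or smaller set of blocks as the candidate module is one way to picture the "tunable" level of decomposition.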
“Injecting” Architecture
[Approach: Decomposition, Architecting, Deployment]
• In the second part of the approach, these modules must be “architected” into a workflow task with connectors to services at appropriate levels (to satisfy production requirements).
• We use Prism-MW wrappers to encapsulate and componentize these decomposed modules. This provides us with a standard interface and utilities at the module level for employing event-based communication.
• We use the Exogenous Connector style [Lau et al.] to mimic the original control and data flow in the workflow task and augment these connectors with a specialized version of the invoking connector.
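The wrapper-plus-connector idea can be sketched as follows. This is an illustration in Python and does not use or reproduce the Prism-MW API: `Event`, `ModuleWrapper`, `SequenceConnector` and the `on_invoke` hook are hypothetical names. The point it shows is that control flow lives in the connector, so the wrapped modules never call each other directly.

```python
from dataclasses import dataclass
from typing import Any, Callable, List


@dataclass
class Event:
    name: str
    payload: Any = None


class ModuleWrapper:
    """Wraps one decomposed module behind a uniform, event-based interface."""

    def __init__(self, name: str, module: Callable[[Any], Any]):
        self.name = name
        self.module = module

    def handle(self, event: Event) -> Event:
        # The science code runs unchanged; only its inputs and outputs are boxed in events.
        return Event(f"{self.name}.done", self.module(event.payload))


class SequenceConnector:
    """Exogenous-style connector that reproduces a sequential control flow."""

    def __init__(self, components: List[ModuleWrapper],
                 on_invoke: Callable[[str], None] = lambda name: None):
        self.components = components
        self.on_invoke = on_invoke  # hook where a production service could be attached

    def invoke(self, payload: Any) -> Any:
        for component in self.components:
            self.on_invoke(component.name)
            payload = component.handle(Event("invoke", payload)).payload
        return payload


# Two stand-in "scientific modules": scale a vector, then total it.
trace = []
task = SequenceConnector(
    [ModuleWrapper("scale", lambda xs: [2 * x for x in xs]),
     ModuleWrapper("total", sum)],
    on_invoke=trace.append,
)
result = task.invoke([1, 2, 3])
print(result, trace)
```

Because every invocation passes through the connector, a logging, cataloguing or monitoring service could be attached at `on_invoke` without touching the module code.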
Deploying to the Grid
[Approach: Decomposition, Architecting, Deployment]
• Deployment is the last step in our approach.
• We currently deploy the resulting workflow component into the OODT Science Data System environment, a grid workflow management system used at JPL.
• This choice is purely for developer convenience; the approach should be deployable to any target workflow management system.
SWSA Architecture
Scientific Workflow Software Architecture (SWSA): a domain specific software architecture for workflow tasks.
Preliminary Evaluation
• We chose a canonical scientific application (matrix multiplication) implemented in both Fortran and C.
• Six different metrics were taken:
– Execution time for:
  • Base application
  • Wrapper (no data exchanged)
  • Wrapper (data exchanged)
– Memory footprint for:
  • Base application
  • Wrapper (no data exchanged)
  • Wrapper (data exchanged)
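A measurement harness for these six metrics could look like the sketch below. It is a stand-in written in pure Python, not the Fortran and C setup used in the evaluation. The reading of "no data exchanged" (the wrapper only triggers the module) and "data exchanged" (operands and result travel through events) is an assumption, and all function names are hypothetical.

```python
import copy
import time
import tracemalloc


def make_matrix(n):
    return [[float(i + j) for j in range(n)] for i in range(n)]


def matmul(a, b):
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def base(n):
    """Base application: build the operand and multiply it by itself."""
    a = make_matrix(n)
    return matmul(a, a)


def wrapped_no_data(n):
    """Wrapper only triggers the module; the operand is built inside it."""
    event = {"name": "invoke", "payload": n}
    return base(event["payload"])


def wrapped_with_data(n):
    """Wrapper receives a copy of the operand in an event and replies with the result."""
    event = {"name": "invoke", "payload": copy.deepcopy(make_matrix(n))}
    reply = {"name": "done", "payload": matmul(event["payload"], event["payload"])}
    return reply["payload"]


def measure(fn, n):
    """Return (result, elapsed seconds, peak traced bytes) for one run of fn(n)."""
    tracemalloc.start()
    start = time.perf_counter()
    result = fn(n)
    elapsed = time.perf_counter() - start
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return result, elapsed, peak


def overhead_percent(base_value, wrapped_value):
    return 100.0 * (wrapped_value - base_value) / base_value


runs = {fn.__name__: measure(fn, 16)
        for fn in (base, wrapped_no_data, wrapped_with_data)}
for name, (_, elapsed, peak) in runs.items():
    print(f"{name}: {elapsed:.6f} s, peak {peak} bytes")
```

Tracing allocations slows execution, so a careful study would time and trace in separate runs and average over many repetitions.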
Preliminary Evaluation
Refactoring Methodology Example: Molecular Dynamics Simulation
Performance results are very promising:
Time Overhead: 1.85%
Code Reuse: 96.77%
Conclusions & Future Work
• Scientific Workflow Software Architecture (SWSA) improves upon existing workflow systems by providing:
– A methodology for accessing services.
– A separation of concerns between scientific algorithms and production features of code.
– A clean separation of roles between the scientist and the engineer.
• Satisfies the “cult of performance.”
• Future Work
– Extended evaluation on more advanced simulation codes.
– Expansion of the architecture to support parallel codes.
Thank You
Portions of this research were conducted at the Jet Propulsion Laboratory managed by the California Institute of Technology under a contract with the National Aeronautics and Space Administration.
For more information, please see:
• D. Woollard, N. Medvidovic, Y. Gil, and C. Mattmann. “Scientific Software as Workflows: From Discovery to Distribution.” To appear in IEEE Software Special Issue on Developing Scientific Software, 2008.
• D. Woollard, D. Freeborn, E. Kay-Im, S. LaVoie. “Case Studies in Science Data Systems: Meeting Software Challenges in Competitive Environments.” To appear in Proceedings of the 10th International Conference on Space Operations (SpaceOps-2008), AIAA press, Heidelberg, Germany, May 2008.
• D. Woollard. “Supporting Scientific Workflows Through First-Class Connectors.” Qualifying Examination Report. University of Southern California. May, 2007.
• D. Woollard, C. Mattmann, and N. Medvidovic. “Injecting Software Architectural Constraints into Legacy Scientific Applications.” USC Center for Software Engineering Technical Report, USC-CSE-2007-701, January 2007.