Enabling Self-management of Component-based High-performance Scientific Applications
Hua (Maria) Liu and Manish Parashar
The Applied Software Systems Laboratory
Department of Electrical and Computer Engineering
Rutgers University
Challenges
• Emerging scientific applications are distributed, heterogeneous, long-running, and dynamic, with
  – Changing user requirements
  – Changing problem domains
  – Changing execution contexts
• Emerging execution environments are also distributed, heterogeneous, and dynamic, with
  – Changing workloads and communication capabilities
Solution
• Applications should be aware of changes in application/system state and execution context, and respond to them
  – i.e., applications should be self-managing, or autonomic
• However, this requires a programming system that supports the development and execution of such autonomic, self-managing applications. Such a system should:
  – Extend computational elements (objects, components, and services) to support autonomic behaviors
  – Define the dynamic composition (interactions) of autonomic elements so that it responds to changing user requirements and execution context
  – Provide a runtime infrastructure to achieve self-management
Outline
• Challenges and solution
• Conceptual model of Accord
• Prototype implementation based on the CCA Ccaffeine framework
• Illustrative applications
Overview of Accord Programming System
• Accord supports
  – Dynamic specification of adaptation behaviors in rules
  – Runtime enforcement of adaptation behaviors by invoking sensors and actuators
  – Runtime conflict detection and resolution
• Key contributions
  – Accord provides programming abstractions to define the control port
  – Accord enables applications to be context-aware and self-managing
  – Accord enables element behavior adaptation and interaction adaptation at runtime
Autonomic Element
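[Figure: an Autonomic Element, made up of a Computational Element and an Element Manager, with a Functional Port, a Control Port, and an Operational Port. The Element Manager holds the Rules, the Internal state, and the Contextual state, and handles event generation, actuator invocation, and other interface invocation.]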
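The element structure above can be sketched in a few lines of Python. This is a minimal sketch, not Accord's implementation: it assumes the control port is a registry of named sensors and actuators, that the operational port is where rules are injected at runtime (an assumed role), and that the element manager checks its rules before each functional call. All names (`AutonomicElement`, `export_sensor`, `export_actuator`, `add_rule`) are hypothetical.

```python
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical rule form: (sensor name, predicate on its value, actuator name, new value).
Rule = Tuple[str, Callable[[Any], bool], str, Any]


class AutonomicElement:
    """Illustrative autonomic element; the names are not Accord's API."""

    def __init__(self, compute: Callable[[Any], Any]) -> None:
        self._compute = compute                      # wrapped computational element
        self.sensors: Dict[str, Callable[[], Any]] = {}
        self.actuators: Dict[str, Callable[[Any], None]] = {}
        self.rules: List[Rule] = []                  # rules held by the element manager

    # Functional port: the element's normal computation.
    def run(self, data: Any) -> Any:
        self._manage()                               # element manager acts before each call
        return self._compute(data)

    # Control port: sensors expose state, actuators change behavior.
    def export_sensor(self, name: str, fn: Callable[[], Any]) -> None:
        self.sensors[name] = fn

    def export_actuator(self, name: str, fn: Callable[[Any], None]) -> None:
        self.actuators[name] = fn

    # Operational port (assumed role): inject rules at runtime.
    def add_rule(self, rule: Rule) -> None:
        self.rules.append(rule)

    # Element manager: read sensors and invoke actuators when a rule's condition holds.
    def _manage(self) -> None:
        for sensor, predicate, actuator, value in self.rules:
            if predicate(self.sensors[sensor]()):
                self.actuators[actuator](value)


# Usage: a toy computation whose scale factor is changed by a rule after two calls.
state = {"scale": 1, "calls": 0}


def compute(x: int) -> int:
    state["calls"] += 1
    return x * state["scale"]


elem = AutonomicElement(compute)
elem.export_sensor("calls", lambda: state["calls"])
elem.export_actuator("scale", lambda v: state.update(scale=v))
elem.add_rule(("calls", lambda c: c >= 2, "scale", 10))
results = [elem.run(3) for _ in range(3)]
print(results)  # [3, 3, 30]
```

Keeping the rules outside the computational code is the point of the design: the behavior of `compute` changes at runtime without editing it.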
The Accord Runtime Infrastructure
[Figure: the Composition manager, with the Application workflow, Application strategies, and Application requirements as inputs, and Composition rules and Component rules attached to the individual elements.]
CCA and Ccaffeine Framework
[Figure: processes P0, P1, P2, and P3, each running the same components (blue, green, red) on the framework (gray).]
• Different components in the same process “talk to each other” via ports and the framework
• Instances of the same component in different processes talk to each other through their favorite communications layer (e.g., MPI, PVM, GA)
• Each process is loaded with the same set of components, wired the same way
Note: this slide is taken from the CCA tutorial – www.cca-forum.org
• The characteristics of scientific applications
  – These applications are component-based.
  – The execution of these applications typically consists of a series of computational phases.
Accord-CCA: Extend Ccaffeine to Enable Self-Management Behaviors
[Figure: a Driver and controllable components C1, C2, C3, and C4, with component managers and a composition manager, running on the Ccaffeine framework + TAU.]
Manager Components
• Component managers provide component-level adaptations by
  – Adapting the runtime behaviors of individual components based on component rules
  – Dynamically replacing components based on composition rules
• Composition managers provide application-level adaptations by
  – Coordinating the component managers' behaviors
[Figure: manager components with a RulePort and events from TAU, applied to components C2 and C3.]
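The split between component-level and application-level adaptation can be sketched as two small classes. This is an illustrative sketch only: `ComponentManager`, `CompositionManager`, the `(slot, operation, *arguments)` plan format, and the slot names are hypothetical, and components are represented by their names rather than real CCA components.

```python
from typing import Any, Dict, List, Tuple


class ComponentManager:
    """Component-level adaptation for one component slot (hypothetical sketch)."""

    def __init__(self, slot: str, component: str) -> None:
        self.slot = slot
        self.component = component           # name of the currently wired component
        self.params: Dict[str, Any] = {}

    def adapt(self, param: str, value: Any) -> None:
        # Component rule: change the runtime behavior of this component.
        self.params[param] = value

    def replace(self, new_component: str) -> None:
        # Composition rule: swap the component wired into this slot.
        self.component = new_component


class CompositionManager:
    """Application-level adaptation by coordinating component managers."""

    def __init__(self, managers: List[ComponentManager]) -> None:
        self.managers = {m.slot: m for m in managers}

    def apply(self, plan: List[Tuple[Any, ...]]) -> None:
        # Each step is (slot, operation, *arguments); steps run in order.
        for slot, op, *args in plan:
            getattr(self.managers[slot], op)(*args)


managers = [ComponentManager("flux", "C2"), ComponentManager("solver", "C1")]
composer = CompositionManager(managers)
composer.apply([("flux", "replace", "C3"), ("solver", "adapt", "tolerance", 1e-4)])
print(managers[0].component, managers[1].params)  # C3 {'tolerance': 0.0001}
```

The composition manager never touches a component directly; it only drives the component managers, which keeps each adaptation local to the manager that owns the component.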
Rule
Rule {
  on events;
  when conditions;
  do actions;
}
• events: component or system events
• conditions: evaluated on component or system sensors
• actions: invocations of component or system actuators
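The `on / when / do` rule form reads naturally as an event-condition-action triple. The Python sketch below assumes that events are strings, that conditions are predicates over a dictionary of sensor readings, and that actions are zero-argument actuator calls. `Rule`, `RuleEngine`, `post`, and the event and sensor names are hypothetical, not Accord's rule syntax or API.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set

Sensors = Dict[str, float]


@dataclass
class Rule:
    on: Set[str]                            # component or system events
    when: Callable[[Sensors], bool]         # condition over sensor readings
    do: List[Callable[[], None]]            # actuator invocations


@dataclass
class RuleEngine:
    rules: List[Rule] = field(default_factory=list)

    def post(self, event: str, sensors: Sensors) -> int:
        """Run every rule subscribed to `event` whose condition holds; return how many fired."""
        fired = 0
        for rule in self.rules:
            if event in rule.on and rule.when(sensors):
                for action in rule.do:
                    action()
                fired += 1
        return fired


log: List[str] = []
engine = RuleEngine([
    Rule(on={"timestep_end"},
         when=lambda s: s["residual"] > 1e-3,
         do=[lambda: log.append("refine_mesh")]),
])
n1 = engine.post("timestep_end", {"residual": 5e-3})   # condition holds: fires
n2 = engine.post("timestep_end", {"residual": 1e-4})   # condition fails
n3 = engine.post("checkpoint", {"residual": 5e-3})     # rule not subscribed to this event
print(n1, n2, n3, log)  # 1 0 0 ['refine_mesh']
```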
The Rule Enforcement Engine
[Figure: the rule enforcement engine stages, namely batch condition inquiry, condition evaluation in parallel, conflict detection and resolution, reconciliation, and batch action invocation, operating on the context and the internal state of elements (pre-condition and post-condition).]
Sensor-actuator conflict:
• Detection: executing some rules would change the pre-condition
• Resolution: disable these rules
Actuator-actuator conflict:
• Detection: the post-condition contains multiple invocations of the same actuator with different values
• Resolution: relax the rule conditions by incrementally deleting sensors in a user-specified sequence, until no actuator is invoked with different values
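The enforcement stages above can be sketched in Python under stated assumptions: each rule reads a declared set of sensors, sets one actuator to one value, and declares which sensors its own action would change. The names (`Rule`, `enforce`, `changes`) are hypothetical. The actuator-actuator resolution step (relaxing conditions by deleting sensors in a user-specified sequence) depends on details the slide does not give, so this sketch only detects that conflict and returns it to the caller instead of invoking the conflicting actions; reconciliation across instances is also left out.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, List, Set, Tuple


@dataclass(frozen=True)
class Rule:
    name: str
    sensors: FrozenSet[str]                          # sensors the condition reads
    condition: Callable[[Dict[str, float]], bool]
    actuator: str
    value: object
    changes: FrozenSet[str] = frozenset()            # sensors this action would change


def enforce(rules: List[Rule],
            read_sensor: Callable[[str], float],
            invoke: Callable[[str, object], None]) -> Tuple[List[str], Set[str]]:
    # 1. Batch condition inquiry: read every needed sensor once into a snapshot.
    needed = sorted(set().union(*(r.sensors for r in rules)))
    snapshot = {s: read_sensor(s) for s in needed}

    # 2. Evaluate all conditions in parallel against the same snapshot.
    with ThreadPoolExecutor() as pool:
        results = list(pool.map(lambda r: r.condition(snapshot), rules))
    fired = [r for r, ok in zip(rules, results) if ok]

    # 3. Sensor-actuator conflict: disable rules whose action would change the pre-condition.
    fired = [r for r in fired if not (r.changes & set(snapshot))]

    # 4. Actuator-actuator conflict: the same actuator requested with different values.
    values: Dict[str, Set[object]] = {}
    for r in fired:
        values.setdefault(r.actuator, set()).add(r.value)
    conflicts = {a for a, v in values.items() if len(v) > 1}

    # 5. Batch action invocation of the conflict-free actions only.
    invoked = []
    for r in fired:
        if r.actuator not in conflicts:
            invoke(r.actuator, r.value)
            invoked.append(r.name)
    return invoked, conflicts


readings = {"temperature": 1500.0, "bandwidth": 20.0}
calls: List[Tuple[str, object]] = []
rules = [
    Rule("hot", frozenset({"temperature"}), lambda s: s["temperature"] > 1200, "algorithm", "A2"),
    Rule("slow_net", frozenset({"bandwidth"}), lambda s: s["bandwidth"] < 50, "algorithm", "A3"),
    Rule("log", frozenset({"temperature"}), lambda s: True, "verbosity", 2),
    Rule("cool", frozenset({"temperature"}), lambda s: True, "heater", "off",
         changes=frozenset({"temperature"})),
]
invoked, conflicts = enforce(rules, readings.__getitem__, lambda a, v: calls.append((a, v)))
print(invoked, sorted(conflicts), calls)  # ['log'] ['algorithm'] [('verbosity', 2)]
```

Evaluating every condition against one snapshot is what makes the parallel evaluation safe: no rule can observe a sensor value changed by another rule in the same batch.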
Reconciliation
[Figure: components C1 and C2 run on nodes x, y, and z. The component managers on nodes x and y propose replacing a component with C3 (Algorithm 1), while node z proposes C4 (Algorithm 2).]
Case 1:
• If the replacement on node z has a high priority and the other two have a low priority, propagate the replacement with C4.
• If there are multiple high-priority replacements, report an error.
Case 2:
• If all the replacements have a low priority, propagate the replacement with the highest performance gain.
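The two reconciliation cases can be written as a single decision function. This is a sketch: `Proposal`, `reconcile`, `ReconciliationError`, and the numeric `gain` field are hypothetical, and, following the slide literally, any two high-priority proposals raise an error even if they name the same replacement.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Proposal:
    node: str
    replacement: str          # component proposed by this node's component manager
    high_priority: bool
    gain: float               # estimated performance gain (hypothetical metric)


class ReconciliationError(Exception):
    pass


def reconcile(proposals: List[Proposal]) -> Optional[str]:
    """Choose one replacement to propagate to every instance."""
    if not proposals:
        return None
    high = [p for p in proposals if p.high_priority]
    if len(high) > 1:                                # multiple high-priority replacements
        raise ReconciliationError("multiple high-priority replacements")
    if len(high) == 1:                               # Case 1: the high-priority one wins
        return high[0].replacement
    return max(proposals, key=lambda p: p.gain).replacement   # Case 2: best gain wins


case1 = reconcile([Proposal("x", "C3", False, 0.10),
                   Proposal("y", "C3", False, 0.12),
                   Proposal("z", "C4", True, 0.05)])
case2 = reconcile([Proposal("x", "C3", False, 0.10),
                   Proposal("y", "C3", False, 0.12),
                   Proposal("z", "C4", False, 0.05)])
print(case1, case2)  # C4 C3
```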
The Self-managing CH4 Ignition Simulation: Self-optimizing Via Component Adaptation
[Figure: components Initializer, Executor, Cvode, Thermo, and ChemistryRef, with a Component Manager and a Rule Generator, and the step “export sensor ‘temperature’ and actuator ‘algorithm’”. A plot shows the number of invocations to G versus temperature (1000 to 2400) for rule-based and non-rule-based execution.]
Several algorithms are provided to simulate a set of reaction processes. Some algorithms may fail at certain temperatures, and at the same temperature the algorithms differ in performance (execution time). Algorithms therefore have to be selected dynamically, both to avoid application crashes and to optimize application execution.
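The selection logic described above can be sketched as a rule body that reads the exported sensor “temperature” and returns the value for the actuator “algorithm”. The algorithm names, valid temperature ranges, and cost functions below are illustrative placeholders, not measurements from the CH4 simulation; a real cost model would come from profiling.

```python
from typing import Callable, Dict, Tuple

# Hypothetical algorithm table: name -> (valid temperature range, cost model).
# Ranges and costs are illustrative placeholders, not measured data.
ALGORITHMS: Dict[str, Tuple[Tuple[float, float], Callable[[float], float]]] = {
    "alg_a": ((1000.0, 1800.0), lambda t: 1.0 + t / 1000.0),
    "alg_b": ((1200.0, 2200.0), lambda t: 4.0 - t / 1000.0),
    "alg_c": ((1000.0, 2400.0), lambda t: 3.5),
}


def select_algorithm(temperature: float) -> str:
    """Among algorithms valid at this temperature, pick the one with the lowest cost."""
    valid = {name: cost(temperature)
             for name, ((lo, hi), cost) in ALGORITHMS.items()
             if lo <= temperature <= hi}
    if not valid:                    # no safe algorithm: fail loudly instead of crashing later
        raise ValueError(f"no algorithm valid at T={temperature}")
    return min(valid, key=valid.get)


choices = [select_algorithm(t) for t in (1100.0, 1700.0, 2400.0)]
print(choices)  # ['alg_a', 'alg_b', 'alg_c']
```

Filtering by validity first and optimizing second mirrors the two goals in the text: avoiding crashes takes precedence over execution time.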
The Self-managing Shock Simulation: Self-optimizing Via Component Replacement
Elements: Component Manager, GodunovFlux, EFMFlux, and the performance toolkit (TAU)
Rule: IF cache miss of GodunovFlux > value THEN REPLACE GodunovFlux EFMFlux
1. Register the cache miss event
2. Collect the cache misses of GodunovFlux
3. Evaluate the rule
4. Replace GodunovFlux with EFMFlux
EFMFlux is used from the next computation onward.
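The four steps can be sketched as a manager that swaps a component between computation phases. `Profiler` is a hypothetical stand-in for a performance toolkit and does not model TAU's API; the metric name, the threshold, and the toy flux functions are also hypothetical.

```python
from typing import Callable, Dict, Set, Tuple


class Profiler:
    """Stand-in for a performance toolkit; NOT TAU's API."""

    def __init__(self) -> None:
        self.counters: Dict[str, float] = {}
        self.watched: Set[str] = set()

    def register(self, metric: str) -> None:
        self.watched.add(metric)

    def record(self, metric: str, value: float) -> None:
        if metric in self.watched:             # only registered events are collected
            self.counters[metric] = value

    def collect(self, metric: str) -> float:
        return self.counters.get(metric, 0.0)


def godunov_flux(state: int) -> Tuple[str, int]:
    return ("godunov", state)


def efm_flux(state: int) -> Tuple[str, int]:
    return ("efm", state)


class ComponentManager:
    def __init__(self, profiler: Profiler, threshold: float) -> None:
        self.profiler = profiler
        self.threshold = threshold
        self.flux: Callable[[int], Tuple[str, int]] = godunov_flux
        profiler.register("GodunovFlux.cache_miss")              # 1. register the event

    def between_phases(self) -> None:
        misses = self.profiler.collect("GodunovFlux.cache_miss")  # 2. collect
        if self.flux is godunov_flux and misses > self.threshold: # 3. evaluate the rule
            self.flux = efm_flux                                  # 4. replace


profiler = Profiler()
mgr = ComponentManager(profiler, threshold=1e6)
outputs = []
for phase, misses in enumerate([2e5, 3e6, 1e5]):
    outputs.append(mgr.flux(phase)[0])             # computation uses the current component
    profiler.record("GodunovFlux.cache_miss", misses)
    mgr.between_phases()                           # a swap takes effect in the next phase
print(outputs)  # ['godunov', 'godunov', 'efm']
```

Swapping only between phases keeps a computation from switching flux schemes halfway through, which matches the slide's note that EFMFlux is used from the next computation.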
The Self-managing Shock Simulation: Self-optimizing Via Component Adaptation
Elements: AMRMesh (with algorithms x and y), Component Manager, and the performance toolkit (TAU)
Rule: IF bandwidth < threshold THEN algorithm x
1. Export the actuator “algorithm”
2. Register the communication bandwidth
3. Collect the current bandwidth
4. Evaluate the rule
5. Invoke “algorithm” with x
Algorithm x is used from the next computation onward.
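Unlike the replacement example, this adaptation changes a component's behavior through an exported actuator. In the sketch below, `AMRMesh` is a hypothetical stand-in for the real component, the starting algorithm `"y"` and the bandwidth values are assumptions, and bandwidth collection is passed in directly rather than measured. Only the rule branch given on the slide is implemented, so nothing switches the algorithm back when bandwidth recovers.

```python
from typing import List, Optional


class AMRMesh:
    """Hypothetical stand-in for the AMRMesh component."""

    def __init__(self) -> None:
        self._algorithm = "y"                 # assumed starting algorithm
        self._pending: Optional[str] = None

    def set_algorithm(self, name: str) -> None:
        # Exported actuator "algorithm": the new choice is applied at the next computation.
        self._pending = name

    def compute(self) -> str:
        if self._pending is not None:
            self._algorithm, self._pending = self._pending, None
        return self._algorithm


def manage(mesh: AMRMesh, bandwidth: float, threshold: float) -> None:
    # Rule: IF bandwidth < threshold THEN algorithm x
    if bandwidth < threshold:
        mesh.set_algorithm("x")


mesh = AMRMesh()
used: List[str] = []
for bw in [100.0, 40.0, 120.0]:              # collected bandwidth per phase (illustrative)
    used.append(mesh.compute())
    manage(mesh, bw, threshold=50.0)
print(used)  # ['y', 'y', 'x']
```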
The Self-managing Shock Simulation: Self-healing Via Component Replacement
Elements: Component Manager, GodunovFlux, EFMFlux
Rule: IF GodunovFlux error THEN REPLACE GodunovFlux EFMFlux
1. Register the execution error as a sensor
2. Evaluate the rule
3. Replace GodunovFlux with EFMFlux
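Self-healing follows the same pattern, with an execution error serving as the sensor. In this sketch the failure mode of `godunov_flux` (rejecting negative input) and both flux formulas are hypothetical, and retrying the failed call once with the replacement is an assumption the slide does not state; `SelfHealingSlot` is likewise a hypothetical name.

```python
from typing import Callable


def godunov_flux(u: float) -> float:
    if u < 0:                                 # hypothetical failure mode
        raise ArithmeticError("GodunovFlux failed")
    return 2.0 * u


def efm_flux(u: float) -> float:
    return 2.0 * abs(u)


class SelfHealingSlot:
    """Wraps a flux component; an execution error is exported as a sensor."""

    def __init__(self) -> None:
        self.impl: Callable[[float], float] = godunov_flux
        self.error = False                    # 1. execution error registered as a sensor

    def __call__(self, u: float) -> float:
        try:
            return self.impl(u)
        except ArithmeticError:
            self.error = True
            self.enforce()
            return self.impl(u)               # assumption: retry once with the replacement

    def enforce(self) -> None:
        # 2. Evaluate: IF GodunovFlux error THEN REPLACE GodunovFlux EFMFlux
        if self.error and self.impl is godunov_flux:
            self.impl = efm_flux              # 3. replace the failing component
            self.error = False


slot = SelfHealingSlot()
values = [slot(1.0), slot(-3.0), slot(2.0)]
print(values)  # [2.0, 6.0, 4.0]
```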
Conclusion
• The distribution, heterogeneity, and dynamism of emerging environments and applications impose new requirements on programming systems
  – To support the development and execution of autonomic, self-managing applications
• The Accord programming system extends the CCA Ccaffeine framework to meet these requirements
  – Extends CCA components into autonomic components by adding component managers
  – Provides a runtime infrastructure to enforce adaptation behaviors and detect/resolve runtime conflicts
Additional Slides
Centralized vs Decentralized Reconciliation
• Centralized approach: one instance collects proposals from the other instances and propagates the reconciliation result
  – Convergence time = O(n)
  – Low scalability
  – Not robust
• Decentralized approach: each instance communicates only with its neighbors to achieve local consensus
  – Convergence time = O(lg n)
  – High scalability
  – Robust
• Problems to be solved
  – Which local rules individual component instances should use
  – How to define neighbors
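One way to see the O(lg n) figure is a hypercube-style exchange, where in round k each instance talks only to the neighbor whose index differs in bit k. This is a sketch under two assumptions the slides leave open: neighbors are defined as hypercube partners (so n must be a power of two), and the agreed choice is the proposal with the highest performance gain, as in Case 2 of the reconciliation slide (priorities are ignored). `decentralized_reconcile` and the sample proposals are hypothetical.

```python
from typing import List, Tuple

Proposal = Tuple[float, str]   # (performance gain, proposed component)


def decentralized_reconcile(proposals: List[Proposal]) -> Tuple[List[Proposal], int]:
    """Every instance converges on the best proposal in log2(n) rounds."""
    n = len(proposals)
    assert n > 0 and n & (n - 1) == 0, "sketch assumes n is a power of two"
    state = list(proposals)
    rounds = 0
    step = 1
    while step < n:
        # Each instance exchanges with neighbor i XOR step and keeps the better proposal;
        # the new list is built from the old one, so all exchanges in a round are simultaneous.
        state = [max(state[i], state[i ^ step]) for i in range(n)]
        step <<= 1
        rounds += 1
    return state, rounds


props = [(0.10, "C3"), (0.30, "C4"), (0.20, "C3"), (0.05, "C3"),
         (0.15, "C4"), (0.25, "C3"), (0.12, "C4"), (0.08, "C3")]
final, rounds = decentralized_reconcile(props)
print(final[0], rounds)  # (0.3, 'C4') 3
```

A centralized coordinator would instead receive n - 1 proposals one after another, which is where the O(n) figure for the centralized approach comes from.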