opensaf symposium - intro to opensaf_9.13.11
TRANSCRIPT
Introduction to OpenSAF
David Fick, Senior Software Architect
GoAhead Software
Introduction to OpenSAF
• Service availability and high availability systems and concepts have been around for decades
• However, HA terminology tends to vary from industry to industry and company to company
• Goals of this session:
– High-level technical overview of the Service Availability™ Forum standards
– Overview of the support of those standards within OpenSAF
– Allow you to:
• Familiarize yourself with general HA concepts and terminology OR
• Map the HA concepts and terminology with which you are familiar to the SA Forum and OpenSAF versions
– Resources for getting started with OpenSAF
SA Forum Interfaces: AIS & HPI
[Diagram: SA Forum interface stack. Layers shown: Applications; Service Availability Middleware; Application Interface Specifications (AIS); Operating System; Virtualization; Hardware Platform Interface (HPI); Hardware Platforms A, B, C, and D; with System Management alongside the stack. Callout: "SAF Standards Implemented by OpenSAF".]
AIS services shown in the diagram:
• Availability Management Framework (AMF)
• Cluster Membership (CLM)
• Platform Mgmt (PLM)
• Information Model Mgmt (IMM)
• Log (LOG)
• Notification (NTF)
• Software Mgmt Framework (SMF)
• Checkpoint (CKPT)
• Event (EVT)
• Message (MSG)
• Lock (LCK)
But how to make sense of the
SA Forum “acronym soup”?
AIS Service Groupings
• First, understand that the AIS services fall into three logical groupings*:
– Resource Availability Management Services
• Services that manage and monitor the state of key system resources that affect availability: hardware / operating system, cluster nodes, applications
• Availability Management Framework (AMF), Cluster Membership (CLM), Platform Mgmt (PLM)
– System Management Services
• Services that manage central system capabilities commonly used by both AIS services and applications
• Information Model Mgmt (IMM), Log (LOG), Notification (NTF), Software Mgmt Framework (SMF)
– Application Services
• Optional services to support application operations such as inter-process communication, state replication, and shared resource access control
• Checkpoint (CKPT), Event (EVT), Message (MSG), Lock (LCK)
* Not official SA Forum AIS service groupings
Fault Management Cycle
• Second, AIS services that manage availability are designed around a standard fault management cycle
– Detection
• E.g. component healthchecks
– Isolation
• E.g. blade power off
– Recovery
• E.g. failover of workload assignments to associated standby resources
– Repair
• E.g. automatic restart of failed resource
– Notification
• E.g. state change notifications sent by service managing the resource
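The five stages of the fault management cycle can be sketched as an ordered enumeration. This is only an illustration of the ordering on the slide, not OpenSAF code; `Stage`, `EXAMPLE_ACTIONS`, and `run_fault_cycle` are hypothetical names, and the example actions are the ones listed above.

```python
from enum import Enum


class Stage(Enum):
    """Fault management stages, in the order given on the slide."""
    DETECTION = "detection"
    ISOLATION = "isolation"
    RECOVERY = "recovery"
    REPAIR = "repair"
    NOTIFICATION = "notification"


# Example action per stage, taken from the slide text (illustrative only).
EXAMPLE_ACTIONS = {
    Stage.DETECTION: "component healthcheck failed",
    Stage.ISOLATION: "blade powered off",
    Stage.RECOVERY: "workload failed over to standby",
    Stage.REPAIR: "failed resource restarted",
    Stage.NOTIFICATION: "state change notification sent",
}


def run_fault_cycle(resource):
    """Walk one fault on `resource` through every stage in order."""
    return [
        f"{stage.value}: {resource}: {EXAMPLE_ACTIONS[stage]}"
        for stage in Stage  # Enum iteration follows definition order
    ]
```

Calling `run_fault_cycle("blade-3")` returns one line per stage, starting with detection and ending with notification.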
Resource Dependencies
• Third, Availability Management in the AIS world is driven by a detailed understanding of the availability management dependencies across all resource types
– Managed Applications
• Simple to complex dependencies and relationships can be modeled between the various software elements
• Dependency on a particular node also modeled
– AMF Node
• Represents a node where AMF services are provided
• Depends on a CLM node
– CLM Node
• Represents a cluster node where AIS services are provided
• Depends on an Execution Environment (optional)
– Platform Resource
• Containment and logical dependencies represented between platform resources
• Execution Environment (EE)
– Represents an operating system instance (standalone or virtual)
• Hardware Element (HE)
– Represents a physical hardware resource in the system
[Diagram: dependency stack of Managed Applications, AMF Node, CLM Node, Execution Environment, and Hardware Element, with the last two grouped as Platform Resources]
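The dependency chain from managed application down to hardware element can be sketched as a simple lookup table. This is a minimal model assuming one dependency per resource; the resource names (`app-1`, `he-1`, and so on) and both helper functions are hypothetical and only illustrate how a failure low in the stack affects everything above it.

```python
# Each hypothetical resource maps to the single resource it depends on.
DEPENDS_ON = {
    "app-1": "amf-node-1",       # managed application -> AMF node
    "amf-node-1": "clm-node-1",  # AMF node -> CLM node
    "clm-node-1": "ee-1",        # CLM node -> execution environment
    "ee-1": "he-1",              # execution environment -> hardware element
    "he-1": None,                # hardware element: bottom of the stack
}


def dependency_chain(resource):
    """List everything `resource` depends on, nearest dependency first."""
    chain = []
    current = DEPENDS_ON[resource]
    while current is not None:
        chain.append(current)
        current = DEPENDS_ON[current]
    return chain


def affected_by(failed):
    """List every resource that directly or indirectly depends on `failed`."""
    return sorted(r for r in DEPENDS_ON if failed in dependency_chain(r))
```

For example, `affected_by("ee-1")` reports the CLM node, the AMF node, and the application, which is the set an availability manager would need to recover if that operating system instance failed.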
Common Design Patterns
• Fourth, the AIS services follow common design patterns:
– API
• Common library lifecycle
• Naming conventions
– Resource managed by service → Managed object
• Typically with associated state model
• Managed objects stored in common information model
– Administrative operations
• X.731 style administrative operations for resources which affect availability
– Notifications automatically generated by AIS services for significant system events (alarms, state changes, etc.)
Resource Availability Management Services
• Availability Management Framework (AMF)
– Manages the lifecycle and monitors the state of the managed applications within the system
– More detail in upcoming slides
• Cluster Membership (CLM)
– Provides cluster membership change notifications to AIS services and interested applications
– OpenSAF CLM implements cluster management protocol dealing with:
• Cluster formation
• Active controller selection & failover
• Node failure detection
• Platform Management (PLM)
– Manages the state of modeled hardware elements and execution environments (operating system instances)
– Hardware element states and events accessed through Hardware Platform Interface (HPI)
– Manages graceful blade extraction / de-activation cases
– Supports hardware element controls (power on/off and reset)
– Optional service within OpenSAF
Availability Management Framework (AMF)
AMF Logical Entities
• Structural Entities
– AMF Application
• Represents the highest-level service(s) provided by the system
– Service Group (SG)
• Represents a group of like logical resources that provide the same service(s)
• Associated redundancy model (e.g. 1+1)
– Service Unit (SU)
• Aggregates a set of resources which when combined provide a higher-level service
– Component
• Represents one or more resources that perform a function within the system
[Diagram: containment hierarchy. An AMF Application contains 1..* Service Groups, a Service Group contains 1..* Service Units, and a Service Unit contains 1..* Components.]
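The structural entities nest inside one another, which can be sketched with plain data classes. This is an illustrative model only, not the AMF information model classes; every class, field, and instance name here is hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Component:
    name: str


@dataclass
class ServiceUnit:
    name: str
    components: List[Component] = field(default_factory=list)


@dataclass
class ServiceGroup:
    name: str
    redundancy_model: str  # e.g. "2N"
    service_units: List[ServiceUnit] = field(default_factory=list)


@dataclass
class AmfApplication:
    name: str
    service_groups: List[ServiceGroup] = field(default_factory=list)


def component_names(app):
    """Flatten the Application -> SG -> SU -> Component hierarchy."""
    return [
        comp.name
        for sg in app.service_groups
        for su in sg.service_units
        for comp in su.components
    ]
```

A small application with one 2N service group and two service units would then be built by nesting these objects, and `component_names` walks the whole tree.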
Availability Management Framework (AMF)
AMF Logical Entities
• Workload Entities
– Service Instance (SI)
• Represents a workload to be supported by the system
• Has associated redundancy requirements (1+1, N+M, etc.)
• Protected by an identified SG
• Assigned to one or more SUs with an HA state of active, standby, quiescing or quiesced
– Component Service Instance (CSI)
• Represents a more granular workload that needs to be supported by the system
• Assigned to one or more components
[Diagram: a Service Instance is protected by a Service Group and assigned to Service Units; each Service Instance contains 1..* Component Service Instances, which are assigned to Components.]
Availability Management Framework (AMF)
AMF Logical Entities
• Common Characteristics
– Well-defined state model for each logical entity type
• Operational
• Administrative
• Etc.
– X.731 style administrative operations
• Lock
• Unlock
• Shutdown
• Etc.
• Common AMF Component Types
– SA-aware
– Non-proxied, non-SA-aware
– Proxied, non-SA-aware
[Diagram: SA-aware Component Example. AMF performs lifecycle management of the component process through CLC-CLI scripts and HA state assignment through the AMF library linked into the process.]
Availability Management Framework (AMF)
Service Group Redundancy Models
• Key redundancy model characteristics
– Preferred SI assignment model
• # of active resource(s)
• # of standby resource(s)
– Allowed concurrent HA state assignments for SUs
– # of assignable SUs
• Redundancy model options
– 2N
• Most common redundancy model
• 1 active resource and 1 standby resource per SI
• SUs can have either all active or all standby SI assignments
– N+M
– No Redundancy
– N-way
– N-way active
[Diagram: 2N Service Group Example. SU1 on Node1 holds the active (A) assignments for SI1 and SI2; SU2 on Node2 holds the standby (S) assignments.]
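The 2N example can be sketched in a few lines: one SU carries every active assignment, the other carries every standby assignment, and a failover promotes the standby. This is a simplified model, not AMF's assignment algorithm; `ServiceGroup2N` and its methods are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ServiceGroup2N:
    active_su: str
    standby_su: Optional[str]
    service_instances: List[str] = field(default_factory=list)

    def assignments(self):
        """Return {SI: {SU: HA state}}; each SU is all-active or all-standby."""
        result = {}
        for si in self.service_instances:
            entry = {self.active_su: "active"}
            if self.standby_su is not None:
                entry[self.standby_su] = "standby"
            result[si] = entry
        return result

    def fail_over(self):
        """Promote the standby SU after the active SU fails.

        The failed SU is left unassigned here; a real system would
        repair it and bring it back as the new standby.
        """
        if self.standby_su is None:
            raise RuntimeError("no standby SU available")
        self.active_su, self.standby_su = self.standby_su, None
```

With `ServiceGroup2N("SU1", "SU2", ["SI1", "SI2"])`, both SIs are active on SU1 and standby on SU2 until `fail_over()` moves all active assignments to SU2.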
Availability Management Framework (AMF)
Error Recovery Policies
• Pre-defined AMF component error recovery policies
– Configurable
– Can be overridden at runtime
• Up to 3 actions per policy
– Isolation
– Recovery
– Repair
• Recovery policy scopes
– Component
– Service Unit
– Node
• Recovery policy types
– Restart
– Failover
– Failfast
• Recovery escalation policies
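Recovery escalation can be pictured as a ladder: if the same recovery keeps being needed, the next error triggers a recovery with a wider scope. The sketch below assumes a made-up ladder and a made-up per-level attempt limit; the actual AMF escalation rules and their configuration attributes are defined by the AMF specification and are not reproduced here.

```python
# Hypothetical ladder built from the scopes and types named on the slide.
ESCALATION_LADDER = [
    ("component", "restart"),
    ("service unit", "restart"),
    ("service unit", "failover"),
    ("node", "failfast"),
]


class EscalationTracker:
    """Pick the recovery for each new error, widening scope on repeats."""

    def __init__(self, max_attempts_per_level=2):
        self.max_attempts = max_attempts_per_level  # assumed limit
        self.level = 0
        self.attempts = 0

    def next_action(self):
        """Return the (scope, recovery type) to apply for the next error."""
        last_level = len(ESCALATION_LADDER) - 1
        if self.attempts >= self.max_attempts and self.level < last_level:
            self.level += 1      # escalate to the next rung
            self.attempts = 0
        self.attempts += 1
        return ESCALATION_LADDER[self.level]
```

With the default limit of two, the first two errors restart the component, the next two restart the service unit, and the fifth fails the service unit over.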
System Management Services
Information Model Management (IMM)
• Information Model Highlights
– Based on pre-defined object classes (including AIS classes)
– Holds both configuration and runtime objects
– Used by AIS services to store current configuration and runtime state info
– Can be used by applications as well
• Object Management API
– Object class management
– Access object attribute values
– Search information model
– Configuration change requests
– Administrative operation invocation
• Object Implementer API
– Runtime object management
– CCB validation and application
– Administrative operation handling
• OpenSAF Implementation
– Persistence of information model managed through Persistence BackEnd (PBE) feature
– Replicated to multiple cluster nodes
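A configuration change bundle (CCB) groups configuration changes so that an object implementer can validate them before any of them is applied. The sketch below shows that validate-then-apply idea with an in-memory dictionary; it is not the IMM API, and `ConfigChangeBundle`, its methods, and the validator callback are all hypothetical.

```python
class ConfigChangeBundle:
    """Collect attribute changes and apply them all or not at all."""

    def __init__(self, model, validator):
        self.model = model          # {object name: {attribute: value}}
        self.validator = validator  # stands in for the object implementer
        self.pending = {}

    def modify(self, object_name, attribute, value):
        """Queue a change; the model is untouched until apply()."""
        self.pending[(object_name, attribute)] = value

    def apply(self):
        """Validate every queued change, then apply all of them.

        Returns False and discards the bundle if any change is rejected.
        """
        for (obj, attr), value in self.pending.items():
            if not self.validator(obj, attr, value):
                self.pending.clear()
                return False
        for (obj, attr), value in self.pending.items():
            self.model.setdefault(obj, {})[attr] = value
        self.pending.clear()
        return True
```

Because validation runs over the whole bundle first, one rejected change leaves the model exactly as it was, even if other changes in the same bundle were acceptable.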
System Management Services
Software Management Framework (SMF)
• SMF controls migration from one deployment configuration to another
• Upgrade methods
– Rolling upgrade
– Single step upgrade
• [De-]Activation Unit Scope
– AMF Node
– Service Unit
• During the migration SMF
– Maintains the campaign state change model
– Takes measures to enable error recovery
– Monitors for potential errors caused by the migration
– Deploys error recovery procedures
[Diagram: the Software Management Framework takes an Upgrade Campaign Definition ("Upgrade Instructions") as input; runs adaptation commands (SMF config object); installs / removes software bundles from the Software Repository on target nodes; and performs admin operations and read/create/delete/update of objects in the Information Model.]
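A rolling upgrade touches one activation unit at a time, so the rest of the cluster keeps providing service, and it needs a way back if a step fails. The following is a minimal sketch of that loop with rollback, assuming nodes as the activation units; `rolling_upgrade` and the `install` callback are hypothetical and do not reflect the SMF campaign format or API.

```python
def rolling_upgrade(nodes, install, new_version):
    """Upgrade nodes one at a time; roll back on the first failure.

    nodes:   {node name: current version}
    install: callback(node, version) -> True on success (hypothetical)
    Returns (succeeded, resulting {node name: version}).
    """
    versions = dict(nodes)  # work on a copy of the caller's mapping
    upgraded = []           # (node, previous version) for rollback
    for node in nodes:
        previous = versions[node]
        if install(node, new_version):
            versions[node] = new_version
            upgraded.append((node, previous))
        else:
            # Error recovery: restore already-upgraded nodes, newest first.
            for done, old in reversed(upgraded):
                versions[done] = old
            return False, versions
    return True, versions
```

If every install succeeds, all nodes end on the new version; if any install fails, the returned mapping matches the starting configuration.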
System Management Services
• Notification (NTF)
– Publish-and-subscribe semantics for system-level notifications
– Syntax and semantics for ITU X.73x notifications:
• Alarm / security alarm / state change / object create / delete / attribute change
– Alarm and security alarm notifications automatically logged through LOG service
• Log (LOG)
– Flexible, centralized, system-wide logging mechanism
– Pre-defined log streams: alarm, notification, system
– Multiple, custom application log streams allowed
– Configurable log stream characteristics including:
• log file full action: halt, wrap, and rotate
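The three log file full actions differ in what happens to a record that arrives when the current file is full. The sketch below models files as in-memory lists to show the difference; `LogStream`, the record limit, and the file limit are hypothetical stand-ins, not the LOG service API or its configuration attributes.

```python
class LogStream:
    """Toy log stream that keeps records in lists instead of files."""

    ACTIONS = ("halt", "wrap", "rotate")

    def __init__(self, max_records, full_action, max_files=2):
        if full_action not in self.ACTIONS:
            raise ValueError(f"unknown full action: {full_action}")
        self.max_records = max_records  # capacity of one "file"
        self.full_action = full_action
        self.max_files = max_files      # only used by "rotate"
        self.files = [[]]               # oldest first, current file last

    def write(self, record):
        """Store a record; return False if it was dropped."""
        current = self.files[-1]
        if len(current) < self.max_records:
            current.append(record)
            return True
        if self.full_action == "halt":
            return False                # stop logging when full
        if self.full_action == "wrap":
            current.pop(0)              # overwrite the oldest record
            current.append(record)
            return True
        self.files.append([record])     # rotate: start a new file
        if len(self.files) > self.max_files:
            self.files.pop(0)           # drop the oldest file
        return True
```

Writing three records into a two-record stream drops the third under halt, overwrites the first under wrap, and opens a second file under rotate.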
Application Services
• Checkpoint (CKPT)
– Intended as a state replication mechanism for distributed applications
– Can be used for all standby “temperature levels”
• Cold
• Warm
• Hot (through OpenSAF CKPT service API extension)
– Semantics of a checkpoint
• Arbitrary set of sections containing opaque data
• Stored in one or more replicas distributed across cluster
• Reads and writes occur against the active replica
– Both synchronous and asynchronous replication options available
– Collocated checkpoint option provided for highest performance
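The checkpoint semantics above (sections of opaque data, several replicas, reads and writes against the active replica, synchronous or asynchronous replication) can be sketched with dictionaries. This is a conceptual model only; `Checkpoint`, `flush`, and the node names are hypothetical and are not the CKPT API.

```python
class Checkpoint:
    """Toy checkpoint: sections of bytes copied across replicas."""

    def __init__(self, replica_nodes, synchronous=True):
        self.replicas = {node: {} for node in replica_nodes}
        self.active = replica_nodes[0]  # reads/writes go here
        self.synchronous = synchronous
        self._backlog = []              # writes not yet replicated

    def write(self, section_id, data):
        """Write opaque bytes to a section of the active replica."""
        self.replicas[self.active][section_id] = data
        if self.synchronous:
            self._propagate(section_id, data)  # copy before returning
        else:
            self._backlog.append((section_id, data))

    def read(self, section_id):
        """Read a section from the active replica."""
        return self.replicas[self.active][section_id]

    def flush(self):
        """Deliver queued asynchronous updates to the other replicas."""
        for section_id, data in self._backlog:
            self._propagate(section_id, data)
        self._backlog.clear()

    def _propagate(self, section_id, data):
        for node, sections in self.replicas.items():
            if node != self.active:
                sections[section_id] = data
```

In synchronous mode a standby replica already holds the data when `write` returns; in asynchronous mode it catches up only once the queued updates are delivered, which trades consistency for write latency.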
Application Services
• Event (EVT)
– Publish-and-subscribe communication paradigm
– Flexible event channel, pattern, and filtering definition
– Subscriber event queue maintained within app process
• Message (MSG)
– Messages sent to and read from message queues
– Single message queue owner at a time
– Message queue maintained outside app process
– Message queues can be logically grouped
• Messages can be sent to a message queue group
• Associated distribution policy (round-robin, broadcast, etc.)
• Lock (LCK)
– Cluster-wide, distributed lock service
– Can be used to control access to cluster-level shared resources
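For the Message service, the difference between the round-robin and broadcast distribution policies of a message queue group can be sketched as follows. This is an in-memory illustration, not the MSG API; `QueueGroup`, the policy strings, and the queue names are hypothetical.

```python
from collections import deque
from itertools import cycle


class QueueGroup:
    """Toy message queue group with a distribution policy."""

    def __init__(self, queue_names, policy="round-robin"):
        if policy not in ("round-robin", "broadcast"):
            raise ValueError(f"unknown policy: {policy}")
        self.queues = {name: deque() for name in queue_names}
        self.policy = policy
        self._next = cycle(list(queue_names))  # round-robin cursor

    def send(self, message):
        """Deliver a message per the policy; return the target queues."""
        if self.policy == "broadcast":
            targets = list(self.queues)   # every queue gets a copy
        else:
            targets = [next(self._next)]  # one queue, taken in turn
        for name in targets:
            self.queues[name].append(message)
        return targets
```

Sending to a round-robin group spreads messages across the member queues one at a time, while a broadcast group delivers each message to every member.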
Getting Started with OpenSAF
• OpenSAF Technical Educational Resources
– Developer Wiki [http://devel.opensaf.org/wiki]
– OpenSAF Developers blog [http://devel.opensaf.org/blog]
– OpenSAF mailing lists [Subscribe: http://list.opensaf.org/maillist/listinfo/]
• Users [Archive: http://list.opensaf.org/pipermail/users/]
• Development [Archive: http://list.opensaf.org/pipermail/devel/]
• Announce [Archive: http://list.opensaf.org/pipermail/announce/]
– Latest documentation [http://devel.opensaf.org/hg/opensaf-4.x-documentation/archive/tip.tar.gz]
– FAQ [http://www.opensaf.org/HOA/assn14944/images/FREQUENTLY%20ASKED%20QUESTIONS%20ABOUT%20OPENSAF%20RELEASE%204%20Final%20for%20publication.docx]
– README files in source code repository
• SA Forum Application Interface Specifications [http://www.saforum.org/Service-Availability-Forum:-Application-Interface-Specification-~217404~16627.htm]
Questions