© 2010 IBM Corporation
High Availability and Scalability with System z and z/OS
Joachim von Buttlar, Robert Vaupel
IBM Deutschland Research & Development GmbH

Contacts
• Joachim von Buttlar
  – System z Firmware Development
  – [email protected]
• Robert Vaupel
  – z/OS Workload Management Development and Design
  – IBM Senior Technical Staff Member
  – [email protected]

Summary
• Summary of Course
  – What are the highlights?
  – What to remember?
• About the Exam
• Course Assessment
• Outlook, Science Relationship, Summer
• 100 Years of IBM

System z and z/OS History
[Figure: timeline from 1960 to 2005]
• 7 April 1964: introduction of the S/360 architecture
• Architectures: S/360, S/370, S/390, z/Architecture
• Hardware milestones: symmetric multiprocessing, virtual memory, expanded storage, 2 GB addressing, LPAR, access registers, data spaces, CMOS technology, Parallel Sysplex, 64-bit addressing
• Operating systems: MFT, MVT, SVS, MVS/370, MVS/XA, MVS/ESA, OS/390, z/OS
• Operating system features: fixed storage with 15 partitions or tasks; address spaces; multiple virtual storage; one 16 MB VS area; 2 GB virtual storage; expanded storage; virtual I/O; fast program load; dynamic I/O; POSIX; cluster; Parallel Sysplex; workload management; UNIX System Services; TCP/IP; Java; WebSphere; IEEE float; 64 bit; IRD; Hiperdispatch; offload; security; GDPS

What to Remember: Mainframe Computing
• Mainframes are computers which
  – Execute hundreds of applications
  – Connect to thousands of I/O devices
  – Serve thousands of users simultaneously
• Mainframes can best be defined by their characteristics
  – The most important characteristic is the reliable and predictable execution of transactions
  – Mainframes matter most for database transaction processing and as the backend in data centers
• Mainframe computers are characterized by a set of qualities (Quality of Service):
  – Security
  – Scalability
  – Compatibility
  – Availability
  – Reliability
  – Serviceability
• Mainframes are used by companies that need to store huge amounts of data
• 95% of the world's 2000 biggest companies use System z computers
• Around 65-70% of all relevant data is stored on System z computers
• 60% of all data accessed through the World Wide Web is stored in databases on System z (DB2, VSAM, and IMS)

What to Remember: Mainframes are highly virtualized computer environments
[Figure: a System z Enterprise Server divided into LPARs. Labels in the figure: standard processors (CP1 to CP4), Linux engines (IFL1 to IFL3), offload engines (zIIP, zAAP); operating systems z/OS, z/VSE, VSE, Linux, and z/VM (z/VM V4, with CMS); workloads CICS, IMS, DB2, SAP, Java, and batch.]

What to Remember: Virtualization and z/VM
• First level of virtualization (LPAR)
  – Up to 60 LPARs in a system
  – Memory dedicated, no paging
  – CPUs dedicated or shared
  – I/O dedicated or shared
  – Any mixture of ESA/390 and z/Architecture LPARs
  – EAL5 certified: LPARs in a single system are separated as well as two separate systems
  – Focus is on performance
• Second level of virtualization (z/VM)
  – Any number of virtual machines
  – Memory subject to paging
  – CPUs dedicated or shared
  – I/O dedicated or shared
  – Any mixture of ESA/390 and z/Architecture virtual machines
  – Several communication vehicles between virtual machines
  – Focus is on virtualization
  – z/VM may run under z/VM that runs under z/VM that ...

What to Remember: Processor Types
• All processors are identical from the hardware perspective
• Characterization occurs during system initialization and can be changed on the fly (concurrent upgrade, downgrade, repair scenarios)
• CPs, zAAPs, zIIPs, IFLs run customer software (operating system and applications)
  – CPs are general purpose CPUs
  – zAAPs, zIIPs are exploited by z/OS
  – IFLs are exploited by Linux on System z
  – zAAPs, zIIPs, IFLs, ICFs are cheaper, no software charges apply
• ICFs run Coupling Facility Control Code for a Parallel Sysplex
• SAPs run firmware only (primarily I/O in a running system)
• Spare PUs are idle and can be activated on the fly
• Configuration changes can be permanent or temporary

What to Remember: Firmware Layers
• Major parts of instruction set implemented in hardware
• Millicode layer for complex instructions, interrupts, virtualization
• i390 layer for complex functionality such as I/O subsystem
• LPAR hypervisor for logical partitioning

What to Remember: Hardware
• Multi-level lookup tables for dynamic address translation (DAT)
• 2 to 5 levels, depending on the size of the address space needed
• Enhanced DAT support allows 1 MB segments as lowest level, instead of 4 KB pages
• Complex instruction set (about 1,000 instructions)
• Binary integer data
• Fixed decimal data for commercial applications
• Three floating-point formats (binary, decimal (for commercial applications), hexadecimal)
• String processing
• Time-of-day (TOD) clock for date/time representation
• Clock comparator for real-time measurements
• CPU timer for consumed time measurements
• TOD clock steering fine-tunes the TOD clock (synchronization with other systems)
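
The multi-level DAT lookup mentioned on this slide can be illustrated with a small sketch. It splits a 64-bit virtual address into the indexes used at each table level, assuming the usual z/Architecture field widths (three 11-bit region indexes, an 11-bit segment index, an 8-bit page index, and a 12-bit byte index). The function names are hypothetical and chosen for illustration only; this is not how the hardware or z/OS implements translation.

```python
# Field widths of a 64-bit virtual address, highest-order field first.
# Assumption: standard z/Architecture layout (11+11+11+11+8+12 bits).
FIELDS = (
    ("region_first", 11),
    ("region_second", 11),
    ("region_third", 11),
    ("segment", 11),
    ("page", 8),
    ("byte", 12),
)


def split_virtual_address(addr, large_pages=False):
    """Return the table index used at each DAT level for addr.

    With large_pages=True the segment entry is treated as the lowest
    level (1 MB frame), so page and byte index merge into a 20-bit offset.
    """
    if not 0 <= addr < 1 << 64:
        raise ValueError("address must fit in 64 bits")
    result = {}
    shift = 64
    for name, width in FIELDS:
        shift -= width
        result[name] = (addr >> shift) & ((1 << width) - 1)
    if large_pages:
        del result["page"]
        result["byte"] = addr & ((1 << 20) - 1)
    return result


def levels_needed(address_bits):
    """Number of table levels needed to cover an address space of 2**address_bits bytes."""
    if not 1 <= address_bits <= 64:
        raise ValueError("address_bits must be between 1 and 64")
    levels, covered = 2, 31  # segment table + page table cover 2 GB
    while covered < address_bits:
        covered += 11        # each region table adds 11 index bits
        levels += 1
    return levels


parts = split_virtual_address(0x101234)
print(parts["segment"], parts["page"], hex(parts["byte"]))
print([levels_needed(bits) for bits in (31, 42, 53, 64)])
```

Under these assumptions a 2 GB address space needs two levels and a full 64-bit address space needs five, which matches the "2 to 5 levels" stated on the slide.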

What to Remember: System z Quality of Service
• RAS
  – Reliability
  – Availability
  – Serviceability
• Security / Integrity
• Scalability
• Manageability
  – Centralized control
  – Workload management
• Virtualization / Partitioning Technology
  – Workload separation
• Capacity
  – Evolving architecture
• Flexibility / Variety
  – Multiple workloads, multiple users
• Compatibility
• Capability
  – Autonomic features

What to Remember: Scalability
• Scale-up example for System z9
• Allows installations to choose the capacity they need in a granular fashion and to grow when business needs require it
[Figure: capacity chart for Model S08, 1-way to 8-way, with one series per capacity setting (4xx, 5xx, 6xx, 7xx); vertical axis from 0 to 600]

What to Remember: Scalability …
• Issue
  – Systems with many processors do not scale very well
    • Extreme: overall system performance degrades with more processors
  – Amplification
    • Today's processors are very fast (5.2 GHz) compared to memory accesses
    • Virtualization requires that many logical processors share the same physical processor
    • Too many processors do not fit on one chip/module
    • As a result, processors may need to access data and instructions from remote memories
• Solutions
  – Hierarchical cache structures to mitigate memory accesses → L1, L2, L3, L4 caches
  – Grouping of processors into nodes in order to limit remote cache accesses → Hiperdispatch
  – Ensure that at least some logical processors can use physical processors nearly exclusively → Hiperdispatch
  – Clustering of multiple systems → Parallel Sysplex
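
The "extreme" case above, where adding processors lowers overall performance, can be made concrete with a toy model. The sketch below uses the Universal Scalability Law, which is not part of the course material and is used here only as an illustration. The two coefficients are made-up assumptions, not System z measurements.

```python
def relative_capacity(n, contention=0.03, coherency=0.001):
    """Throughput of n processors relative to one processor.

    Universal Scalability Law: n / (1 + a*(n-1) + b*n*(n-1)).
    'contention' (a) models serialization, 'coherency' (b) models the
    cost of keeping remote caches and memories consistent.
    Both default values are illustrative assumptions.
    """
    if n < 1:
        raise ValueError("need at least one processor")
    return n / (1 + contention * (n - 1) + coherency * n * (n - 1))


def best_processor_count(max_n, contention=0.03, coherency=0.001):
    """Processor count in 1..max_n with the highest modeled throughput."""
    return max(
        range(1, max_n + 1),
        key=lambda n: relative_capacity(n, contention, coherency),
    )


for n in (1, 8, 32, 64, 128):
    print(n, round(relative_capacity(n), 2))
print("peak at", best_processor_count(128), "processors")
```

With these assumed coefficients the modeled throughput peaks at a few dozen processors and then falls. Lowering the coherency term, which is what caches and node grouping aim at, moves the peak to a higher processor count.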

What to Remember: Scalability …
• Scalability issues also exist within the operating system
  – Example:
    • Address space: makes it easy to observe how the architecture has changed to accommodate increasing resource demand
    • Virtualization
      – Running multiple logical systems on the same hardware has implications for the internal processing of the partitions
      – Spin lock processing:
        • Critical path that needs to be executed fast
        • If it is interrupted because of virtualization, the involved layers need to coordinate

What to Remember: Scalability and Workload Management
• Ensures that many different workloads can share the system(s) at the same time
• Ensures that workloads get access to resources based on their importance, demand, and goal fulfillment
  – Tries to avoid overachievement
  – Tries to optimize workload throughput
• Includes technologies to optimize the scalability for work on System z
  – Starts/stops server spaces to help scale-out of middleware on z/OS
  – Starts/stops initiators for batch work across systems
  – Shifts weights between partitions of the same CEC
  – Balances work across processor nodes to enable Hiperdispatch
  – Gives routing recommendations to load balancing functions
  – Promotes work to resolve lock contentions
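
The idea of granting resources by importance and goal fulfillment can be sketched as follows. This is a deliberately simplified illustration, not the actual WLM algorithm: it assumes response time goals only, uses a performance index (actual divided by goal, so values above 1 mean the goal is missed), and all class names and numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ServiceClass:
    name: str
    importance: int        # 1 = most important, larger = less important
    goal_seconds: float    # response time goal
    actual_seconds: float  # measured response time

    @property
    def performance_index(self):
        # > 1: goal missed, < 1: goal overachieved
        return self.actual_seconds / self.goal_seconds


def pick_receiver(classes):
    """Most important class that misses its goal (worst index breaks ties)."""
    missing = [c for c in classes if c.performance_index > 1]
    if not missing:
        return None
    return min(missing, key=lambda c: (c.importance, -c.performance_index))


def pick_donor(classes, receiver):
    """Least important class that overachieves its goal, excluding the receiver."""
    donors = [c for c in classes
              if c is not receiver and c.performance_index < 1]
    if not donors:
        return None
    return max(donors, key=lambda c: (c.importance, -c.performance_index))


classes = [
    ServiceClass("online", 1, goal_seconds=0.5, actual_seconds=0.75),
    ServiceClass("reports", 3, goal_seconds=10.0, actual_seconds=12.0),
    ServiceClass("batch", 4, goal_seconds=60.0, actual_seconds=30.0),
]
receiver = pick_receiver(classes)
donor = pick_donor(classes, receiver)
print(receiver.name, "receives from", donor.name)
```

Taking resources from an overachieving class is one way to read "tries to avoid overachievement": capacity that a class does not need to meet its goal is better spent on important work that is missing its goal.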

What to Remember: High Availability
Business Issue of "Non-Availability"
• Example "Toll Collect": the German state and the company collecting tolls on the autobahn agreed on a contractual penalty of €30 million for each hour of downtime (€500,000 per minute)
• On demand challenges
  – Downtime unaffordable
  – Heterogeneous by nature
  – Complex to manage
• Loss of business
• Loss of customers: the competition is just a mouse click away
• Loss of credibility, brand image, and stock value
[Figure: pie chart "Unplanned Outage Causes" (IDC 2005). Slice sizes: 25%, 30%, 45%. Causes: operator errors, application failures, hardware failures.]
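
The Toll Collect figures can be checked with a few lines of arithmetic. The sketch assumes the penalty is pro-rated per minute, as the per-minute figure on the slide suggests; the actual contract terms are not given in the course material.

```python
PENALTY_EUR_PER_HOUR = 30_000_000  # figure quoted on the slide


def penalty_eur(downtime_minutes):
    """Contractual penalty for an outage, assuming per-minute pro-rating."""
    if downtime_minutes < 0:
        raise ValueError("downtime cannot be negative")
    return PENALTY_EUR_PER_HOUR * downtime_minutes // 60


for minutes in (1, 10, 60):
    print(minutes, "min ->", penalty_eur(minutes), "EUR")
```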

What to Remember: High Availability
Disaster Recovery, Continuous Operations
Single System
• MTBF in decades
• Built-in redundancy
• On/Off Capacity on Demand
• Capacity Backup
• Hot-pluggable I/O
Parallel Sysplex (1 to 32 systems)
• Addresses planned and unplanned HW/SW outages
• Flexible, non-disruptive growth
• Capacity beyond largest CEC
• Scales better than SMPs
• Dynamic workload / resource management
Geographically Dispersed Parallel Sysplex (Site 1, Site 2)
• Addresses site failure / maintenance
• Metro / Global data mirroring
• Sync (PPRC): 100 km
• Async (XRC): any distance
• Eliminates tape / disk single point of failure (SPOF)
• No / some data loss
• Application independent
Clustering in a Box
• Using an ICF, a Parallel Sysplex can be defined within a single CEC (Central Electronic Complex)
• Maintenance on LPAR without loss of data
• Protection from software outages
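
The distance limit for synchronous mirroring and the "no / some data loss" trade-off can be illustrated with a rough latency estimate. The sketch assumes a signal delay of about 5 microseconds per kilometer of fiber and a single round trip per write. Real replication protocols and equipment add further delay, and all workload numbers below are hypothetical.

```python
FIBER_DELAY_US_PER_KM = 5  # assumption: about 200,000 km/s in optical fiber


def sync_write_delay_us(distance_km, round_trips=1):
    """Extra latency per write when the remote copy must be acknowledged first."""
    return 2 * distance_km * FIBER_DELAY_US_PER_KM * round_trips


def unmirrored_writes(write_rate_per_s, replication_lag_s):
    """Writes not yet at the remote site if the primary site fails (async copy)."""
    return write_rate_per_s * replication_lag_s


print(sync_write_delay_us(100), "us added per write at 100 km")
print(sync_write_delay_us(1000), "us added per write at 1000 km")
print(unmirrored_writes(5000, 2), "writes at risk with 2 s asynchronous lag")
```

Under these assumptions a synchronous copy at 100 km adds about one millisecond to every write and loses nothing on a site failure. An asynchronous copy adds no write delay at any distance, but the writes still in flight are lost.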

What to Remember: High Availability and Workload Management
• Ensures that many different workloads can share the system(s) at the same time
• Ensures that workloads get access to resources based on their importance, demand, and goal fulfillment
  – Tries to avoid overachievement
  – Tries to optimize workload throughput
• Includes technologies to optimize the availability for work on System z
  – Starts/stops server spaces to ensure that middleware servants are always available
  – Starts/stops initiators for batch work across systems to ensure that work finds the best place to execute
  – Shifts weights between partitions of the same CEC to ensure that important work is not harmed by less important work
  – Gives routing recommendations to ensure that work finds the best place to execute
  – Promotes work to ensure that work can continuously operate

What to Remember: High Availability and z/OS
• Error recovery
  – Ensures that the operating system can continue to execute even in case of errors
  – Procedures are also used by z/OS participants (middleware and applications)
• Isolation
  – Address space concept ensures isolation of different participants in the system
    • Data exchange and communication, on the other hand, require some thought and preparation
      – SRBs and Common Storage
      – Program Call and Cross Memory
      – Access Registers

A Word About the Exam
• Concepts
• Relationships
• Differences from other platforms
• Techniques
• High availability
• Scalability
• Virtualization
• Sample questions
  – Which methods do you know for achieving high availability?
  – Describe the virtualization techniques on the mainframe.
  – What is a Parallel Sysplex, and how does it work?
  – What is meant by the so-called "characterization" of System z processors? Why is it done?
  – Briefly explain the most important qualities of service of a mainframe.

Course Assessment
Course Assessment: Comparison to Last Year (N=14,15 and N=5,6)

Course Assessment: Feedback
• Disliked, needs improvement
  – Terminology differs from other lectures (mentioned twice)
  – Outdated lecture notes
  – Scope: no distinction between exam-relevant and non-relevant content
  – Somewhat more interactivity (e.g., questions to the audience, for review)
  – More demos (several times, to go deeper into individual topics)
• Liked
  – Live demo
  – Practical relevance, customer stories
  – Many OS topics covered on one concrete system, which gives a well-rounded picture (the interplay of the different parts was easy to see)

Summer Semester 2011: Lecture Announcement
Title:
Reliability, Serviceability, Virtualization, and Security of Enterprise Servers, Using IBM System z as an Example (Zuverlässigkeit, Wartbarkeit, Virtualisierung und Sicherheit von Unternehmensservern am Beispiel von IBM System z)
Chair:
Prof. Dr. Frank Bellosa, KIT
Lecturer:
Joachim von Buttlar & Team, IBM
Topics:
• z/Architecture
• Reliability, Serviceability
• Virtualization
• Security
• Error Handling / Prevention, incl. demo

Summer Semester 2011: Semester Lab Course Announcement
Title:
Semester lab course on enterprise servers (duration: 6 months)
Chair:
Prof. Dr. Ralf Reussner, KIT
Lecturer:
Dr. Michael Kuperberg, KIT
Details at:
http://www.iic.kit.edu/

Information Portal
• Lectures
• Lab courses
• Research projects
• Bachelor's, master's, and diploma theses
• Board of System z Jobs
• Events
• Contacts
• ...
Details at:
http://www.iic.kit.edu/

100 Years of IBM
http://www.youtube.com/watch?v=39jtNUGgmd4&feature=share

Literature
• Introduction to the New Mainframe: Large-Scale Commercial Computing
  – http://www.redbooks.ibm.com/abstracts/sg247175.html?Open
• ABCs of z/OS System Programming Volume 11
  – http://www.redbooks.ibm.com/abstracts/sg246327.html
• Documents for Workload Management
  – http://www-03.ibm.com/servers/eserver/zseries/zos/wlm/documents/
    • z/OS Workload Manager: How It Works and How To Use It, April 2004
  – http://www.research.ibm.com/journal/sj/362/aman.html
    • Adaptive algorithms for managing a distributed data processing workload
• Das Betriebssystem z/OS und zSeries, M. Teuffel, R. Vaupel, ISBN 3-486-27528-3