HMS 2010: Linux on System z – z/VM · 2010-09-02
TRANSCRIPT
HMS 2010 Virtualisierung
1
© 2010 IBM Corporation
HMS 2010:
Linux on System z – z/VM
Arwed Tschoeke – Systems Architect
System z - Overview
• System z Hardware
• z/OS
• Linux on System z
• System z – Virtualization
• z/VM
• Demo
System z Hardware
Linux on System z – z/VM
System z Terminology – Frame (System z10)
I/O Cages
• Contain I/O cards: OSA (Ethernet), FICON (disk), or Cluster
CEC
• Central Electronics Complex
• Houses cores and memory
• Stored in 1-4 books on System z
Cooling (MRU)
• Uses fans and a modular refrigeration unit
• Special liquid cooling unit
• Improves reliability
Support Elements
• Used primarily by the IBM hardware customer engineer
• Checking hardware messages
• Shutdown or restart the system
Compare and contrast
z10 EC Book Layout (z196 similar)
[Diagram: book with power supplies, fanout cards, memory, cooling from/to the MRU, and the MCM; rear and front views]
System z - Packaging and Performance
• Multi-Chip Module (MCM)
– Dense packaging reduces latency
– Fewer parts, improving availability
– IBM z10 – 4.4 GHz cores
– z196 – 5.2 GHz cores
• High CPU Utilization
– Shared cache interconnect
– Flattest IBM memory model
– Twenty cores per MCM
[Diagram: MCM physical view, z10 and z196]
z196 vs. z10 hardware comparison
• z10 EC
– CPU: 4.4 GHz
– Caches:
• L1 private 64k instr, 128k data
• L1.5 private 3 MB
• L2 shared 48 MB / book
• book interconnect: star
• z196
– CPU: 5.2 GHz, Out-of-Order execution
– Caches:
• L1 private 64k instr, 128k data
• L2 private 1.5 MB
• L3 shared 24 MB / chip
• L4 shared 192 MB / book
• book interconnect: star
[Diagram: z10 cache hierarchy (Memory – L2 Cache – L1.5 – L1 – CPU) versus z196 cache hierarchy (Memory – L4 Cache – L3 Cache – L2 – L1 – CPU)]
System z Parts Nomenclature
System z term                                   | x86, UNIX, etc. term
CEC (central electronics complex)               | Server, Computer
Processor, Engine, PU (processing unit),        | Processor
  IOP (I/O processor), CPU (central processing
  unit), CP (central processor), SAP (system
  assist processor), Specialty engines:
  – IFL (Integrated Facility for Linux)
  – zAAP (System z Application Assist Processor)
  – zIIP (System z9 Integrated Information Processor)
DASD – Direct Access Storage Device             | Disk, Storage
Storage (though we are moving toward "memory")  | Memory
Server Virtualization Terms
Hypervisor
• "Virtualization" software
• Divides real computing into logical computers or LPARs
• Referred to as "PR/SM" on System z
Logical Partition
• Also called an LPAR, virtual machine, VM, or guest (z/VM)
• Runs an operating system such as z/OS, Linux, TPF, z/VSE, AIX, IBM i, Windows
Memory Virtualization
• Dedicated to a PR/SM LPAR
• Shared by guests within z/VM
I/O Virtualization – Provided by
• Hypervisor (VMware)
• I/O owning LPAR (PowerVM, Xen)
• Direct hardware virtualization (System z)
2nd Level Hypervisor
• Run a hypervisor inside of an LPAR
• Provides unique features
• Example: z/VM
[Diagram: hardware (disk, Ethernet, computer memory) with a hypervisor hosting logical partitions / virtual machines; a second-level hypervisor runs further virtual machines or guests]
Processor Configuration - Example
[Diagram: IBM System z Server with CPs, a zIIP (DB2 offload), a zAAP (Java offload), a SAP (I/O offload), and IFLs; LPAR1 and LPAR2 run z/OS, LPAR3 and LPAR4 run z/VM hosting Linux guests and a second-level z/VM with Linux]
System z Terminology - I/O Subsystem
• I/O Subsystem
– System Assist Processors (SAP)
– Channels
– Control Units (CU)
• Why an I/O subsystem?
– Allows I/O device sharing between LPARs
– Allows I/O prioritization between LPARs
– Off-loads I/O cycles
• I/O cycles can be significant
• Don't pay z/OS software costs on the I/O portion of the work
– Simplifies disaster recovery by virtualizing I/O
[Diagram: System z hardware with SAPs, channels, and control units connecting LPARs to disk and Ethernet]
Capacity Backup
• Primary Site
– Active z/OS and Linux capacity
– Can also have CBU capacity at the primary site
• Recovery Site
– Capacity backup processors
– All LPARs defined
– Minimum of one z/OS processor
– Five to ten test days – more can be acquired
– Software licenses for active processors only
– Cross-site storage replication
• Low-Cost In-house Disaster Recovery
[Diagram: primary site (hypervisor running z/OS on standard CPs and Linux on IFLs, production active copy) replicating storage to a secondary site (hypervisor with capacity-backup z/OS and other engines, test copy)]
TCO? - A range of cost factors
• Availability
– High availability
– Hours of operation
• Backup / Restore / Site Recovery
– Backup & restore
– Disaster scenario
– Effort for complete site recovery
– SAN effort
• Infrastructure Cost
– Space, power, cooling
– Network infrastructure
– Storage infrastructure
• Additional Development and Implementation
– Investment for one platform – reproduction for others
• Controlling and Accounting
– Analyzing the systems
– Cost
• Operations Effort
– Monitoring, operating
– Problem determination
– Server management tools
– Integrated server management – enterprise-wide
• Security
– Authentication / authorization
– User administration
– Data security
– Server and OS security
– RACF vs. other solutions
• Deployment and Support
– System programming: keeping consistent OS and SW levels, database effort
– Middleware: SW maintenance, SW distribution (across firewall)
– Application: technology upgrade, system release change without interrupts
• Resource Utilization and Performance
– Mixed workload / batch
– Resource sharing: shared nothing vs. shared everything
– Parallel Sysplex vs. other concepts
– Response time
– Performance management
– Peak handling / scalability
• Operating Concept
– Development of an operating procedure
– Feasibility of the developed procedure
– Automation
• Integration
– Integrated functionality vs. functionality to be implemented (possibly with 3rd-party tools)
– Balanced system
– Integration of / into standards
• Further Availability Aspects
– Planned & unplanned outages
– Automated takeover
– Uninterrupted takeover (especially for DB)
– Workload management across physical borders
– Business continuity
– Availability effects for other applications / projects
– End-user service, end-user productivity
– Virtualization
• Skills and Resources
– Personnel education
– Availability of resources
z/OS
Linux on System z – z/VM
Common z/OS Software Components
[Diagram: z/OS major components – crypto services; job output (SDSF); software maintenance (SMP/E); performance monitoring (RMF); automated disk management (DFSMS); database (DB2); batch jobs; security manager; end-user interfaces (ISPF, TSO, UNIX shells); transaction managers (WebSphere, CICS, IMS); workload managers (WLM / IRD); job management (JES); SNA and TCP/IP networking to other computers; activity reporting (SMF); z/OS UNIX services; microkernel. Users: administrator or developer, business user]
Job Entry Subsystem (JES)
• Description
– Component of z/OS that provides the necessary functions to get jobs into, and out of, the system
– Manages jobs before and after running the program; the base control program manages them during processing
– JES2 and JES3 are available
• Benefits
– Improves system efficiency
– Can participate in a Parallel Sysplex
[Diagram: input devices feed Job Control Language to JES2 in z/OS; the BCP executes jobs; JES2 checkpoints and routes output to output devices]
TSO/E
• TSO/E is a base element of the z/OS operating system
– Allows users to interactively work with the system
– Makes it easier for people with all levels of experience to interact with z/OS
• Use TSO/E in either of the following two ways:
– Line Mode
• The quick and direct way to use TSO
– ISPF/PDF
• The principal way to use TSO
• It provides dialog management services to enable users to navigate through panels
ISPF
• The Interactive System Productivity Facility/Program Development Facility (ISPF/PDF) is a set of panels that help you manage libraries of information on a z/OS system.
The New Face of z/OS
• Once upon a time, cards and 3270 terminals were the only ways to access z/OS
• Now we have web applications and web services
• For developers and admins, we also have Web and Eclipse interfaces to develop for and administer z/OS
Data Sets / VSAM
• Data Set
– Refers to a file that has a record orientation
– Storage is referred to as Extended Count Key Data or ECKD
– A data set name is up to 44 uppercase characters, divided by periods into qualifiers, with a maximum of 8 bytes per qualifier
– Most common data set types:
• Sequential - data items that are stored consecutively
• Partitioned - directory and members (sometimes called libraries)
• Key Sequenced - stored with a key so data can be retrieved without searching
– Access methods are generally used for reading and writing data sets
– The most common access method is VSAM
• VSAM – Virtual Storage Access Method
– Four data set organizations: key sequenced (KSDS), relative record (RRDS), entry sequenced (ESDS), and linear (LDS)
– Both IMS and DB2 use VSAM data structures
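The naming rules above (at most 44 characters, period-separated qualifiers of 1-8 characters each, uppercase) can be captured in a small check. This is an illustrative helper only, not part of z/OS: it covers letters, digits, and the national characters #, $, @, and deliberately ignores rarer rules such as member names.

```python
import string

# Characters allowed in a data set qualifier (letters, digits, and the
# national characters #, $, @). Illustrative sketch only.
_ALLOWED = set(string.ascii_uppercase + string.digits + "#$@")

def is_valid_dataset_name(name: str) -> bool:
    """Validate a z/OS data set name: at most 44 characters total,
    period-separated qualifiers of 1-8 characters, each qualifier
    starting with a letter or national character (not a digit)."""
    if not name or len(name) > 44:
        return False
    for qualifier in name.split("."):
        if not 1 <= len(qualifier) <= 8:
            return False
        if qualifier[0] in string.digits:
            return False
        if not set(qualifier) <= _ALLOWED:
            return False
    return True

print(is_valid_dataset_name("SYS1.PROCLIB"))    # True
print(is_valid_dataset_name("lowercase.name"))  # False
```

For example, `SYS1.PROCLIB` has two qualifiers of 4 and 7 characters and passes; a lowercase or nine-character qualifier fails.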
z/OS UNIX System Services
• z/OS UNIX System Services
– Certified UNIX system
– Integral element of z/OS
• Includes
– Over 2000 UNIX APIs
– Multiple shells
– zFS, HFS, and TFS file systems
• Example Applications
– z/OS TCP/IP stack
– WebSphere Application Server
– Lotus Domino for z/OS
– WebSphere MQ
• Benefits
– Allows UNIX or hybrid applications to run under z/OS
– Brings z/OS functions including workload management, resource recovery services, security, activity recording, and performance monitoring to UNIX applications
[Diagram: z/OS Base Control Program with UNIX System Services support, shells, POSIX-conforming applications, a root file system, the OMVS process management component, daemons and servers, secured by RACF]
Decades of existing assets
• New code costs 5X more than reusing existing code (Software Productivity Research (SPR))
• 200 billion lines of COBOL code in existence (eWeek)
• 5 billion lines of COBOL code added yearly (Bill Ulrich, TSG Inc.)
• Between 850K and 1.3 million COBOL developers, with 12,000 per year attrition (IDC)
• Majority of customer data still on mainframes, even though much is exposed through the web and e-commerce applications (Don Greb, Mellon Financial Corp., from Computerworld)
Rewriting all existing applications and moving them to new platforms is not a viable option
Linux
Linux on System z – z/VM
Purpose of an Operating System
• To manage hardware and software resources in a system
– Memory, processor, disk space, programs
• To ensure the system behaves in a predictable way
• To provide a stable, consistent high-level interface to the hardware
– Individual applications do not need to know hardware implementation details
Roll the film!
Codename: Linux
What is Linux?
• A "UNIX-like" operating system
– Source code is published openly
– Developed by a community
– 'Master repository' maintained by Linus Torvalds
– 'Experimental repository' maintained by Andrew Morton
– 'System z subsystem repository' maintained by Martin Schwidefsky
– A 'steering committee' oversees the projects
• Available for many architectures
– x86, POWER, System z…
– IBM Chiphopper
• A Linux distribution is typically used on the basis of a support subscription fee from Linux Distribution Partners (LDP)
– Novell and Red Hat dominant
Linux - Kernel Development Process
Kernel Development
Kernel code contributors
What about "others" and "unknown"? Legal questions?
- Software Freedom Law Center, http://www.softwarefreedom.org/
- Microsoft and Novell announced in fall 2006 that they would cooperate on interoperability and patent protection
- Linux allegedly infringes patents – reality?
- Commitment by the distributors to fix the affected code in the event of a patent infringement
Some Facts
• Linux is really just the kernel
– Memory management
– Process/thread management and synchronization
– Resource management
– Device driver management
– File management
• Monolithic kernel
• A useful system requires much more
– Shells, window servers/managers, utility programs
Contrast z/OS: microkernel – only basic functionality within the kernel itself
Compare: Monolithic and Micro Kernel
Components of Linux Kernel
Linux on System z - System Structure
[Diagram: Linux applications on the GNU runtime environment; GNU C compiler and GNU binutils with System z backends; the Linux kernel split into architecture-independent code (file systems, network protocols, memory management, process management, generic drivers) and System z-dependent code (arch, hardware-dependent drivers); all running on the System z instruction set and I/O hardware]
Linux Distribution
• Provides a complete usable system
• Usually includes
– At least one version of the kernel
– An install system, base device drivers, utilities, and networking
– A software package system, selection, and update mechanism
• Package systems
– Debian, RPM, source code
What is Linux on System z? Linux is Linux ...
... and Linux on System z offers unique added value!
• Not a special Linux
– No change to the look & feel
– About 1% of the source code is customized
• Pure ASCII environment
– No EBCDIC code page
• Linux is Linux is Linux...
– ...but performance, characteristics, and qualities depend on the hardware
• A Linux-only mainframe is possible
• Supports the mainframe's special platform features, e.g. FCP, HyperPAV, …
• Not a replacement for existing operating systems on System z
[Diagram: Linux applications on the Linux kernel, GNU RTL, GNU binutils, and GNU compiler, with IBM-developed code, running on System z or zSeries hardware]
Linux on System z - Added Value
• z/VM
– Memory overcommitment
– See separate topic
• I/O processing
– HyperPAV for ECKD
– FCP for SCSI
• Integration into z/OS HA/DR mechanisms
– ECKD disk backup via DFSMSdss
– GDPS HyperSwap
• Security
– Crypto Express support
– RACF
• Network
– HiperSockets
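Memory overcommitment, mentioned above, simply means the sum of the guests' defined virtual memory may exceed the real memory backing them. A toy calculation with made-up guest sizes (illustrative numbers, not measured values):

```python
# Toy overcommitment calculation. Guest sizes in GB are invented for
# illustration; real z/VM sizing depends on workload behavior.
guest_virtual_memory = [4, 4, 2, 8, 2, 4]   # virtual memory defined per guest
real_memory = 16                            # real memory available to the hypervisor

total_virtual = sum(guest_virtual_memory)
ratio = total_virtual / real_memory

print(f"total virtual: {total_virtual} GB")    # 24 GB
print(f"overcommitment ratio: {ratio:.1f}:1")  # 1.5:1
```

A ratio above 1:1 works because idle guests rarely touch all of their defined memory at once; the hypervisor pages the difference.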
xDR on z/VM
• Proxy
– One Linux system is configured as a proxy for GDPS, with a special configuration (memory locked, access rights to VM, one-node cluster)
– Heartbeat for sanity check
– erpd sends system information and reports disk errors to GDPS
– CLI via rexec
• Production Nodes
– Heartbeat for sanity check
– erpd sends system information
– CLI via rexec
• The command interface to VM CP is hcp (for SLES8, SLES9) or vmcp (for SLES10)
• The interface to retrieve disk errors from VM is vmlogrdr (Linux device)
[Diagram: a z/OS GDPS K-System running xDR scripts (xdr.*) communicates via rexec and heartbeat with a Linux proxy (erpd error reporting, hcp/vmcp, vmlogrdr) and with the production systems of a production cluster under VM]
System z Virtualization
Linux on System z – z/VM
Virtualization – Say What?
Virtualization
• Creates virtual resources and "maps" them to real resources.
• Primarily accomplished with software and/or firmware.
Resources
• Components with architected interfaces/functions.
• May be centralized or distributed. Usually physical.
• Examples: CPUs, memory, storage, network devices.
Virtual Resources
• Proxies for real resources: same interfaces/functions, different attributes.
• May be part of a physical resource or multiple physical resources.
• Separates presentation of resources to users from actual resources
• Aggregates pools of resources for allocation to users as virtual resources
Virtualization – Say What?
• Virtualization comprises the abstraction of physical systems to virtual systems
– Division of static relationships between logical system environments (environment of services and applications) and physical systems
– Two possible directions:
• Integration of many single physical systems into one logical system
• Segmentation of one physical system into many logical systems
– Such "logical systems" are called virtual machines
– A virtual machine is a fully protected and isolated simulation of the underlying hardware
...although virtualization comprises both integration and segmentation, this presentation concentrates on segmentation.
Important Terms Concerning Virtualization
• Supervisor and Hypervisor
– Supervisor is another term for the operating system of a virtual machine
• Controls the virtual machine and its dedicated resources
– Hypervisor (or Virtual Machine Monitor) is another term for a controller of virtual machines
• Controls the physical system resources and dedicates them to virtual machines
• Controls and handles processes of virtual machines which are critical to the physical hardware
• Isolation of virtual machines
• Switching (context switching) between virtual machines (e.g., exits, time slicing...)
[Diagram: a hypervisor on the physical resources, hosting several supervisors]
Important Terms Concerning Virtualization
• Kernel (Privileged) Mode and User Mode
– Kernel Mode provides full access to system resources. It is the mode of the operating system, which administers and dedicates physical system resources.
– User Mode provides restricted access to system resources (e.g., applications)
• Privileged and Non-Privileged instructions
– Privileged instructions can only be executed within Kernel Mode
• Sensitive and Non-Sensitive instructions
– Sensitive instructions invoke critical hardware areas
[Diagram: applications (non-privileged) on operating systems (privileged) on a hypervisor]
What is a Virtual Machine Image?
• Meta-data describing the required server resources
– Number of CPUs (dedicated vs. shared)
– Memory requirements
– I/O and network requirements
• Meta-data describing goals and constraints
– Availability goals
– Placement constraints
• Meta-data describing configuration variables
– OS configuration parameters – IP address, etc.
– Application configuration parameters
• One or more disk images containing OS, middleware, and other application software
• May be a composition of virtual machine images
– Virtual machine images making up a distributed application workload
– Includes additional meta-data scoped to the entire composition
[Diagram: a virtual machine image (meta-data + SW + OS) and a virtual machine image composition of several such images with shared meta-data]
Server Virtualization Approaches
Hardware Partitioning
• Server is subdivided into fractions, each of which can run an OS
• Adjustable partitions with a partition controller
• Physical partitioning: S/370 SI-to-PP and PP-to-SI, Sun Domains, HP nPars
• Logical partitioning: pSeries LPAR, HP (PA) vPars
Bare Metal Hypervisor (Type 1)
• Hypervisor software/firmware runs directly on the server
• Hypervisor provides fine-grained timesharing of all resources
• Examples: System z PR/SM and z/VM, POWER Hypervisor, VMware ESX Server, Xen Hypervisor
Hosted Hypervisor (Type 2)
• Hypervisor software runs on a host operating system
• Hypervisor uses OS services to do timesharing of all resources
• Examples: VMware GSX, Microsoft Virtual Server, HP Integrity VM, User Mode Linux
• Hardware partitioning subdivides a server into fractions, each of which can run an OS
• Hypervisors use a thin layer of code to achieve fine-grained, dynamic resource sharing
• Type 1 hypervisors with high efficiency and availability will become dominant for servers
• Type 2 hypervisors will be mainly for clients, where host OS integration is desirable
Hypervisor Implementation Methods
Trap and Emulate
• VM runs in user mode
• All privileged instructions cause traps, handled by the hypervisor's PrivOp emulation code
• Examples: CP-67, VM/370
• Benefits: Runs unmodified OS
• Issues: Substantial overhead
Translate, Trap, and Emulate
• VM runs in user mode
• Some IA-32 instructions must be replaced with trap ops, handled by the hypervisor's PrivOp emulation code
• Examples: VMware (today), Microsoft VS
• Benefits: Runs unmodified, translated OS
• Issues: May have some substantial overhead
Hypervisor Calls ("Paravirtualization")
• VM runs in normal modes
• OS in the VM calls the hypervisor service in case of critical syscalls
• Examples: POWER Hypervisor, Xen (today), HP Integrity VM
• Benefits: High efficiency depending on hypervisor code + eventual HW support
• Issues: OS kernel must be modified to issue hcalls; OS & hypervisor levels must be in sync
Direct Hardware Virtualization
• VM runs in normal modes
• HW does most of the virtualization
• SIE architecture – sets the architecture of the VM, provides status, translation & assists
• Hypervisor provides control; only some control instructions, executed rather infrequently, exit to the hypervisor service
• Examples: PR/SM, z/VM (which also uses hypervisor calls for some functional enhancements)
• Benefits: Highest efficiency depending on HW/ucode support; runs unmodified OS
• Issues: Requires HW & ucode support
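The trap-and-emulate scheme above can be sketched as a toy loop: the "guest" runs deprivileged, privileged operations raise a trap, and the "hypervisor" emulates them against per-guest virtual state. This is an illustrative model only — every class and operation name here is invented, and real hypervisors work at the instruction level, not in Python.

```python
class PrivilegedOpTrap(Exception):
    """Raised when a guest in user mode hits a privileged instruction."""
    def __init__(self, op, operand):
        self.op, self.operand = op, operand

class Guest:
    """Toy guest: executes a list of (op, operand) 'instructions'.
    Privileged ops trap instead of touching real hardware state."""
    PRIVILEGED = {"set_timer", "io_write"}

    def __init__(self, program):
        self.program = program
        self.user_state = {}

    def run(self, hypervisor):
        for op, operand in self.program:
            if op in self.PRIVILEGED:
                try:
                    # In user mode, a privileged instruction traps...
                    raise PrivilegedOpTrap(op, operand)
                except PrivilegedOpTrap as trap:
                    # ...and control transfers to the hypervisor's
                    # emulation code.
                    hypervisor.emulate(self, trap)
            else:
                # Unprivileged instructions run directly.
                self.user_state[op] = operand

class Hypervisor:
    """Toy hypervisor: emulates privileged ops against virtual state."""
    def __init__(self):
        self.virtual_state = {}

    def emulate(self, guest, trap):
        self.virtual_state.setdefault(id(guest), {})[trap.op] = trap.operand

hv = Hypervisor()
g = Guest([("load", 1), ("set_timer", 100), ("io_write", "disk0")])
g.run(hv)
print(hv.virtual_state[id(g)])  # {'set_timer': 100, 'io_write': 'disk0'}
```

The "substantial overhead" noted for this method corresponds to the exception round-trip taken on every privileged operation.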
I/O Adapter and Network Virtualization Methods
VMware PCI Adapter Virtualization
• I/O virtualization and device drivers are part of the hypervisor, reducing overall system availability
• Failure of an I/O adapter or device driver can cause a system outage or data corruption
[Diagram: the hypervisor virtualizes I/O and holds the device drivers for the storage and network adapters shared by the OSes]
System p Device and Adapter Virtualization
• Firmware (VIOS) provides I/O sharing
• Hardware (TCEs, Translation Control Entries) provides I/O isolation
• PCI-family adapters cannot be shared directly
[Diagram: a Virtual I/O Server OS virtualizes devices and adapters; guest OSes use device-driver proxies and message passing over a host interface; DMA isolation hardware sits in front of the storage and network adapters]
System z Native Adapter & Network Virtualization
• ESCON channels and network support efficient sharing
• ESCON Multiple Image Facility (EMIF) since 1988
[Diagram: OSes pass I/O through channels to ESCON/FICON, bypassing the hypervisor]
System z PCI Adapter Virtualization
• PCI-family I/O adapters cannot be shared directly
• Hydra/OSA hardware provides adapter sharing and protects the server
[Diagram: OSes pass I/O through the PCI adapter to the network, bypassing the hypervisor]
Open Virtual Machine Format - OVF
• "The Open Virtual Machine Format (OVF) describes an open, secure, portable, efficient and extensible format for the packaging and distribution of (collections of) virtual machines"
• Draft specification in the DMTF
– Created by Dell, HP, IBM, VMware and Microsoft
– Defines and standardizes the XML schema (angle brackets) that describes a virtual image
• Key features of OVF…
– Enabled for optimized distribution of virtual machines (virtual appliances)
– Ease of installation and deployment, carrying appropriate meta-data
– Single and multiple virtual machine images supporting multi-tier applications
– Portable packaging format with vendor extensibility
– Permits the specification of virtual machine and application configuration
• Current OVF specification focuses on packaging and deployment
– Future versions will include runtime and retirement aspects
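To make the "XML schema describing a virtual image" concrete, here is a sketch that generates a minimal OVF-style descriptor. The element names loosely echo OVF's Envelope/VirtualSystem structure, but this is NOT a schema-valid OVF document (no namespaces, invented attribute names) — it only illustrates the meta-data-in-XML idea.

```python
import xml.etree.ElementTree as ET

def make_descriptor(vm_id: str, cpus: int, memory_mb: int) -> str:
    """Build a minimal OVF-*style* XML descriptor for one virtual system.
    Illustrative only: real OVF uses namespaced elements and RASD items."""
    envelope = ET.Element("Envelope")
    vs = ET.SubElement(envelope, "VirtualSystem", {"id": vm_id})
    hw = ET.SubElement(vs, "VirtualHardwareSection")
    ET.SubElement(hw, "Item", {"resource": "cpu", "quantity": str(cpus)})
    ET.SubElement(hw, "Item", {"resource": "memory", "quantity": str(memory_mb)})
    return ET.tostring(envelope, encoding="unicode")

xml_text = make_descriptor("web01", cpus=2, memory_mb=4096)
print(xml_text)
```

The point is that the whole server definition — CPUs, memory, and so on — travels as portable, tool-readable meta-data alongside the disk images.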
VMware Virtual Infrastructure
Features
• 8-way Virtual SMP
• 256 GB guest memory
• ESXi
• Unified GUI
• VMotion (guest & storage)
• DRS
• HA
• Update Manager
• DPM
• SRM
[Diagram: architecture]
Hyper-V - Architecture (Windows Server Virtualization)
• 4 vCPUs
• RAM: 1 TB limit overall
[Diagram: the Windows hypervisor (Ring -1) runs on the server hardware beneath a parent partition (Windows Server 2008 with the Windows kernel, VSP, VMBus, IHV drivers, VM service, WMI provider, and VM worker processes) and child partitions: a hypervisor-aware Windows Server 2003/2008 guest (Windows kernel, VSC, VMBus), a Xen-enabled Linux kernel guest (Linux VSC, hypercall adapter), and a non-hypervisor-aware OS running via emulation; applications run in user mode (Ring 3), kernels in kernel mode (Ring 0); components provided by Microsoft/XenSource and ISV/IHV/OEM]
KVM - Architecture
• Included in the Linux kernel since 2006, maintained by the community
• Runs Linux, Windows and other operating system guests
• Advanced features
– Live migration
– Memory page sharing
– Thin provisioning
– PCI pass-through
• Utilizes Linux security
• 16 vCPUs, RAM: 42-bit limit
Xen
• Primarily paravirtualization
• Full virtualization possible
• Novell promoted Xen, currently moving to KVM
• Citrix uses Xen as its strategic platform
• 32 vCPUs with Linux, 8 vCPUs with Windows, 32 GB per VM
Current Dynamics of Open Source Virtualisation
MSFT/Novell/XenSource:
• SUSE "enlightened guest" on Windows Server 2008 Hypervisor
• Windows 2008 paravirtualized guest on SUSE using Xen
• Common WS-Mgmt management standard
• Common MSFT image standard
• Directory interoperability
Red Hat:
• Acquired Qumranet, the sponsor, maintainer and catalyst behind the open-source KVM project
[Diagram: x86 server unit share (IDC Tracker) across Windows, Red Hat Linux, SUSE, other Linux, and other; vendor relationships between VMware, Microsoft, XenSource (Citrix), Novell (SUSE), Red Hat, Sun/Oracle, Virtual Iron, Xen (open source), KVM (open source), and the Linux kernel. Source: IBM MI – STG 1/22/09]
PowerVM Virtualization Architecture
[Diagram: POWER server hardware (processors, memory, I/O expansion slots, local devices and storage, networks and networked storage) under the POWER Hypervisor firmware with a service subsystem; virtual processors, virtual memory, virtual networks, virtual disks, and a Virtual I/O Server with virtual adapters serve AIX, IBM i (i5/OS), and Linux partitions; unassigned on-demand resources; managed via the Hardware Management Console]
PowerVM – Virtualization Features
• Configured via HMC / IVM
– Min/Max/Desired memory
– Required/Desired adapters: real or virtual
– Min/Max/Desired number of VPs
– Min/Max/Desired Capacity Entitlement (CE)
– Capped / uncapped partitions
– Micro-partition weight
• Virtual shared pools
• Excess dedicated capacity utilization
• Active Memory Sharing
• Active Memory Expansion
• I/O virtualization
– Virtual inter-partition LAN
– VIOS Shared Ethernet Adapter bridge
– Integrated Virtual Ethernet adapter
– Storage virtualization backed by LUN / disk / LV / file
– N_Port ID Virtualization
[Diagram: micro-partitions (Red Hat Linux, Virtual I/O Server, AIX V6.1, AIX 5L V5.3, SUSE Linux) drawing on a shared pool of 6 CPUs, alongside dynamic LPARs (AIX 5L V5.2, AIX 5L V5.3) with whole processors; the legend distinguishes utilization, entitled capacity, extra capacity used from the pool (uncapped partition), and excess capacity ceded back to the pool]
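The interplay of capacity entitlement and uncapped weights can be illustrated with a toy calculation: spare pool capacity is shared among uncapped partitions in proportion to their weights. The partition names, weights, and entitlements below are invented, and real PowerVM dispatching is considerably more involved.

```python
# Toy uncapped-weight illustration with made-up numbers.
entitled = {"lpar1": 1.0, "lpar2": 0.5, "lpar3": 0.5}  # entitled capacity (CPUs)
weights  = {"lpar1": 128, "lpar2": 64, "lpar3": 64}    # uncapped weights
pool_cpus = 4.0                                        # shared pool size

spare = pool_cpus - sum(entitled.values())             # unentitled capacity
total_weight = sum(weights.values())
# Each uncapped partition's share of the spare capacity is weight-proportional.
extra = {p: spare * w / total_weight for p, w in weights.items()}

for p in entitled:
    print(p, round(entitled[p] + extra[p], 2))
# lpar1 2.0 / lpar2 1.0 / lpar3 1.0
```

Here lpar1, with double the weight, picks up half of the 2.0 spare CPUs when all three partitions are busy.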
System z - Virtualization
z/VM
• 1966: CP-40
• 1 to 1000s of guests
• Shared memory
• Multiple z/VM LPARs
• z/VM under z/VM
• FICON and FC disk
• EAL 4+ security rating
PR/SM
• Introduced in 1988
• 1 to 60 LPARs
• Integrated firmware
• Dedicated memory
• Longer time slice than z/VM
• CP, IFL, zIIP, zAAP, ICF
• EAL5 security rating
Overview
• Dual hypervisors
• Direct CPU and I/O hardware virtualization
• Virtual networks available for both hypervisors
• Shared I/O
• Mixed production & non-production very common
[Diagram: System z virtualization hardware & shared I/O – the PR/SM hypervisor hosts z/OS LPARs and z/VM LPARs with shared memory running Linux guests and second-level z/VM, using zIIP, zAAP, and ICF engines]
z/OS – Application Virtualization
• "Multiple Virtual Storage" (MVS) was the old name for z/OS
• Multiple applications and middleware instances per z/OS system
– Benefits from proximity between components – performance, simplicity, reliability
• Multiple z/OS instances (LPARs) per CEC (box)
• Networking between LPARs with HiperSockets
[Diagram: System z hardware with CPs, zAAPs, zIIPs, and I/O running z/OS LPARs; each LPAR hosts applications, a portal, a transaction manager, a DBMS, an app server, a web server, an ESB, and a web app server, connected to the network]
Linux on System z – Server Virtualization
• A distributed architecture implemented in a System z frame
– Generally one function per Linux instance, like most distributed server implementations
• Benefits derived from drastically lower environmental and floor-space expense, network efficiency & performance
– Networking between Linux instances with HiperSockets or z/VM VLAN
• z/VM virtualization flexibility, ease of instance management (provisioning, monitoring), security
[Diagram: System z hardware with CPs, IFLs, and I/O running a z/VM LPAR; Linux guests host a web server, transaction manager, DBMS, ESB, web app server, portal, and applications, connected to the network]
Let’s talk about…
Business Value!
Exploding numbers of systems? Consequences!
Data Center Heating
Source: Uptime Institute, Footprint – Heat Density Trends
Why perform the effort of virtualization? - Business Value!
Roles: consolidations; dynamic provisioning / hosting; workload management; workload isolation; software release migration; mixed production and test; mixed OS types/releases; reconfigurable clusters; low-cost backup servers
Benefits: higher resource utilization; greater usage flexibility; improved workload QoS; higher availability / security; lower cost of availability; lower management costs; improved interoperability; legacy compatibility; investment protection
• Reduced hardware costs
– Higher physical resource utilization
– Smaller footprints
• Reduced management costs
– Fewer physical servers to manage
– Many common management tasks become much easier
• Improved flexibility and responsiveness
– Virtual resources can be adjusted dynamically to meet new or changing needs and to optimize service level achievement
– Provisioning and removing of servers within minutes
Virtualization based on SHARING RESOURCES allows an installation to grow dynamically, both vertically & horizontally, on the same server – assuming we are dealing with efficient virtualization
[Diagram: virtual servers mapped onto a physical server]
z/VM
Linux on System z – z/VM
IBM System z – a comprehensive and sophisticated sui te of virtualization function
IBM System z Virtualization Genetics
CP-67
VM/370
VM/SP
VM/HPO
VM/XA
VM/ESA
z/VM V5
S/360
S/370
SMP
64 MB Real
31-Bit
ESA
64-Bit
1960s 1972 1980 1981 1988 1995 2007...
REXX Interpreter
Virtual Machine Resource Manager
Virtual Disks in Storage
CMS Pipelines
Accounting Facility
Absolute | Relative SHARE
Discontiguous Saved Segments
Instruction TRACE
LPAR Hyperviso r
Adapter Interruption Pass-Through
Multiple Logical Channel Subsystems (LCSS)
Open Systems Adapter (OSA) Netwo rk Switching
Zone Relocation
Control Program Hypervisor
Dynamic Address Translation (DAT)
Diagnose Hypervisor Interface
Conversational Monitor System (CMS)
Inter-User Communication Vehicle (IUCV)
Program Event Recording (PER)
Translation Look-Aside Buffer (TLB)
Programmable Operator (PROP)
Dedicated I/O Processors
VM Assist Microcode
Start Interpretive Execution (SIE)
Named Saved Systems
Guest LANs
I/O Priority Queuing
Virtual Switch
Minidisk Cache
Set Observer
Performance Toolkit
SIE on SIE
Expanded Storage
Multiple Image Facility (MIF)
Large SMP
HiperSockets
Integrated Facility for Linux
Host Page-Management Assi st
QDIO Enhanced Buffer State Mgmt
Automated Shutdown
Dynamic Virtual Machine Timeout
HyperSwap
N_Port ID Virtualization (NPIV)
3090
9x21
9672
zSeries
System z9
System z10
308x
303x
4381
Over 40 years of continuous innovation in virtualization
– Refined to support modern business requirements
– Exploit hardware technology for economical growth
– LPAR, Integrated Facility for Linux, HiperSockets
– System z Application Assist Processors
– System z Integrated Information Processors
Business Value: Scalability, Reliability, Robustness, Flexibility, ...
© 2010 IBM Corporation64
z/VM Basic Components
• z/VM Control Program (CP)
– The z/VM hypervisor
– Schedules guests and virtualizes the hardware
• Conversational Monitor System (CMS)
– Lightweight interactive operating system, similar to a shell on UNIX
– Includes editors, commands, scripting languages, etc.
– Used by z/VM administration and some z/VM service machines
• Guests
– Service machines
– Linux, CMS, z/VM
• Guests can pass commands along to the z/VM CP
– CP controls hardware devices such as disks, network adapters, memory, etc.
– CMS and Linux guests can pass commands to CP – useful for automation

[Diagram: CMS and Linux guests running on the z/VM Control Program on System z]
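As noted above, guests can hand commands to CP. From CMS this is a matter of entering CP commands at the console; from a Linux guest, the `vmcp` tool from the s390-tools package does the same. A sketch only – module and package availability vary by distribution:

```
modprobe vmcp                  (load the z/VM CP interface module, s390-tools)
vmcp QUERY USERID              (ask CP which virtual machine this is)
vmcp QUERY VIRTUAL CPUS        (show the virtual processor configuration)
vmcp INDICATE LOAD             (CP's view of current system load)
```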
© 2010 IBM Corporation65
z/VM Virtual Machines

[Diagram: on System z hardware, the z/VM Control Program (with its User Directory) runs service machines (similar to daemons) – MAINT, VMRM, TCP/IP, TSM, ACCOUNTING, AUTOLOG1, PERF TOOLKIT, … – alongside guests: CMS, z/VM, and Linux virtual machines]
© 2010 IBM Corporation66
A Virtual Machine
• We permit any configuration that a real zSeries machine could have
• In other words, we completely implement the z/Architecture Principles of Operation
• There is no “standard virtual machine configuration”

A sample virtual machine:
• z/Architecture
• 512 MB of memory
• 2 processors
• Basic I/O devices:
– A console
– A card reader
– A card punch
– A printer
• Some read-only disks
• Some read-write disks
• Some networking devices
© 2010 IBM Corporation67
VM User Directory
USER LINUX01 MYPASS 512M 1024M G
MACHINE ESA 2
IPL 190 PARM AUTOCR
CONSOLE 01F 3270 A
SPOOL 00C 2540 READER *
SPOOL 00D 2540 PUNCH A
SPOOL 00E 1403 A
SPECIAL 500 QDIO 3 SYSTEM MYLAN
LINK MAINT 190 190 RR
LINK MAINT 19D 19D RR
LINK MAINT 19E 19E RR
MDISK 191 3390 012 001 ONEBIT MW
MDISK 200 3390 050 100 TWOBIT MR
Definitions of:
– memory
– architecture
– processors
– spool devices
– network device
– disk devices
– other attributes
© 2010 IBM Corporation68
z/VM CPU Resource Controls
• Granular sharing of resources
• Resource allocation
– Determines priority for CPU, main storage, and paging capacity
– Use shares or absolute values
– Absolute guests receive top priority
• Shares allow extreme consolidation of low-utilization guests
• Settings can be changed on the fly
– Command or programmed automation
– Virtual Machine Resource Manager
[Chart: guests Lin1–Lin5 on the z/VM Control Program; one axis shows Absolute % (0–80) for absolute guests, the other Relative Share (0–800) for relative guests]
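The split between absolute and relative shares can be illustrated with a small calculation. This is a simplified sketch of the SHARE idea, not z/VM's actual scheduler arithmetic; the guest names and values are hypothetical:

```python
def cpu_fractions(absolute, relative):
    # Absolute guests are promised a fixed fraction of total capacity;
    # the remainder is split among relative guests in proportion to
    # their relative share values (simplified model).
    remaining = 1.0 - sum(absolute.values())
    total_rel = sum(relative.values())
    fractions = dict(absolute)
    for guest, share in relative.items():
        fractions[guest] = remaining * share / total_rel
    return fractions

shares = cpu_fractions(
    absolute={"LIN1": 0.20},                           # ABSOLUTE 20%
    relative={"LIN2": 100, "LIN3": 100, "LIN4": 200},  # RELATIVE shares
)
# LIN1 keeps 20%; of the remaining 80%, LIN2 and LIN3 get 20% each
# and LIN4 gets 40%
```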
© 2010 IBM Corporation69
Anomalies of Time
• VM virtualizes various timers and clocks
– CPU timer – runs as processor time is consumed
– Time-of-day (TOD) clock
– Clock comparator
• Anomaly
– TOD always moves at wall-clock speed
– The virtual CPU timer “moves” slower as sharing of the real processor increases
– Problems arise when calculations assume the CPU timer moves at TOD clock speed
• LPAR
– Same potential, but seldom shares processors to a high enough degree to create drastic anomalies
© 2010 IBM Corporation70
Anomalies of Time
60 seconds of wall-clock (TOD) time:
– Virtual server A: CPU timer shows 20 seconds running, 5 seconds waiting; CP then stops virtual server A and dispatches virtual server B
– Virtual server B: CPU timer shows 30 seconds running, 5 seconds waiting

Virtual   CPU Timer   Total       Incorrect     Correct
Server    'busy'      CPU Timer   Utilization   Utilization
A         20          25          80%           33%
B         30          35          86%           50%
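The table's numbers follow directly from dividing by the wrong time base; a quick check in code, with the values taken from the slide:

```python
WALL_CLOCK = 60.0  # seconds of real (TOD) time in the interval

def utilization(cpu_busy, cpu_total):
    # "Incorrect": busy time over virtual CPU-timer time only, as computed
    # by a guest that assumes its CPU timer advances at TOD speed.
    incorrect = cpu_busy / cpu_total
    # Correct: busy time over real wall-clock time.
    correct = cpu_busy / WALL_CLOCK
    return incorrect, correct

server_a = utilization(cpu_busy=20, cpu_total=25)  # 80% vs 33%
server_b = utilization(cpu_busy=30, cpu_total=35)  # 86% vs 50%
```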
© 2010 IBM Corporation71
• Allows a z/VM guest to expand or contract the number of virtual processors it uses without affecting the overall CPU capacity it is allowed to consume
– Guests can dynamically optimize their multiprogramming capacity based on workload demand
– Starting and stopping virtual CPUs does not affect the total amount of CPU capacity the guest is authorized to use
– The Linux CPU hotplug daemon (cpuplugd) starts and stops virtual CPUs based on the Linux load average value
• The cpuplugd daemon is available with SLES 10 SP2; IBM is working with its Linux distributor partners to provide this function in other Linux on System z distributions
• Helps enhance the overall efficiency of a Linux-on-z/VM environment

Note: overall CPU capacity for a guest system can be dynamically adjusted using the SHARE setting
[Diagram: a guest with total SHARE = 100 and four virtual CPUs. When the need for multiprogramming drops, two CPUs are stopped and the remaining two run at SHARE 50 each; when it rises again, the stopped CPUs are restarted and all four run at SHARE 25 each. The guest's total SHARE stays 100 throughout]
Dynamic virtual processor management
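The cpuplugd behaviour can be caricatured as a simple control loop. This is a toy sketch of the idea only, not cpuplugd's actual rule language or thresholds:

```python
def plan_online_cpus(load_avg, online, defined, minimum=1):
    # Start a stopped virtual CPU when the load average exceeds the
    # number of online CPUs; stop one when load falls well below it.
    # Either way, the guest's total SHARE (its authorized CPU capacity)
    # is unchanged -- only the degree of multiprogramming moves.
    if load_avg > online and online < defined:
        return online + 1
    if load_avg < online - 1 and online > minimum:
        return online - 1
    return online

demand_up = plan_online_cpus(load_avg=3.5, online=2, defined=4)    # -> 3
demand_down = plan_online_cpus(load_avg=0.2, online=4, defined=4)  # -> 3
steady = plan_online_cpus(load_avg=1.8, online=2, defined=4)       # -> 2
```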
© 2010 IBM Corporation72
z/VM: Memory and Disk
• Memory
– Virtualized – shared by all guests
– Memory overcommit rules of thumb:
• Production: 1.5–3:1
• Test: 2:1–5:1
• Development: 5:1–10:1
– Hardware assists for efficiency
– Shared memory capabilities
• Shared memory segments
• Shared Linux kernel
• Disk
– FICON and/or FC attachment
– Shared and/or dedicated LUNs
[Diagram: z/VM with Linux guests sharing memory segments in shared memory, with disks attached via Fibre Channel and FICON]
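The overcommit rules of thumb above translate directly into sizing arithmetic; a small illustration with hypothetical numbers:

```python
# Rule-of-thumb overcommit ratio ranges from the slide
RATIOS = {"production": (1.5, 3.0), "test": (2.0, 5.0), "development": (5.0, 10.0)}

def guest_memory_budget(real_gb, workload):
    # Range of total guest (virtual) memory that a z/VM system with
    # real_gb of real memory can reasonably back for this workload type.
    low, high = RATIOS[workload]
    return real_gb * low, real_gb * high

# 64 GB of real memory in a production LPAR:
low, high = guest_memory_budget(64, "production")  # 96.0 .. 192.0 GB
```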
© 2010 IBM Corporation73
VM Memory Virtualization
[Diagram: three levels of memory – guest virtual, guest real, and host real. The guest maps its virtual pages to guest-real pages (missing pages are resolved by guest swapping); CP maps guest-real pages to host-real frames (missing pages are resolved by CP paging)]
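The two mapping layers in the figure can be sketched as nested lookups. This is a toy model only; the page numbers mirror the figure loosely:

```python
# Guest page table: guest-virtual page -> guest-real page
# (virtual page 4 has been swapped out by the guest itself)
guest_map = {1: 2, 2: 1, 3: 3}
# CP's table: guest-real page -> host-real frame
# (guest-real page 3 has been paged out by CP)
host_map = {1: 1, 2: 2, 4: 4}

def resolve(virtual_page):
    guest_real = guest_map.get(virtual_page)
    if guest_real is None:
        return "guest must swap the page in"
    host_real = host_map.get(guest_real)
    if host_real is None:
        return "CP must page the frame in"
    return host_real  # host-real frame backing the guest-virtual page
```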
© 2010 IBM Corporation74
Collaborative Memory Management Assist (CMMA)

• Extends coordination of memory and paging between Linux and z/VM to the level of individual pages
• z/VM reclaims “unused” pages at higher priority
• Bypasses host page writes for unused and “volatile” pages (clean disk cache pages)
• Signals an exception if a guest references a discarded volatile page
• Uses Host Page-Management Assist to re-instantiate pages for next use
• z/VM support included in V5.3
© 2010 IBM Corporation75
Saved Segment and NSS Support
• DCSS (Discontiguous Saved Segments)
– Defines an address range (on an MB boundary) to the system
– A single copy is shared among all guests
– A guest "loads" the DCSS (maps it into its address space)
• Can be located outside the guest's defined storage
– DAT lets this work with minimal CP involvement
– Contains:
• Data (e.g. file system control blocks)
• Code (e.g. CMS code libraries)
• NSS (Named Saved Systems)
– An IPL-able saved segment
– Great for CMS or for Linux
• 1 shared copy on the system for N guests, instead of N copies
• Faster boot
• Special cases
– Writable by guest, or by CP
– Restricted (sensitive data)
– Can have both exclusive and shared ranges
© 2010 IBM Corporation76
Linux Exploitation of z/VM Discontiguous Saved Segments (DCSS)

• DCSS support is data-in-memory technology
– Shares a single, real memory location among multiple virtual machines
– High-performance data access
– Can reduce real memory utilization
• Linux exploitation support available today
– Execute-in-place (xip2) file system
– DCSS memory locations can reside outside the defined virtual machine configuration
– Access to the file system is at memory speed; executables are invoked directly out of the file system (no data movement required)
– Avoids duplication of virtual memory and of data stored on disks
– Enables throughput benefits for Linux guest images and helps enhance overall system performance and scalability
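On the Linux side, a DCSS holding a file system can be attached through the dcssblk driver's sysfs interface and mounted execute-in-place. A sketch only – the segment name LNXSEG and mount point are hypothetical, and the exact options depend on distribution and kernel level:

```
echo LNXSEG > /sys/devices/dcssblk/add            (attach the saved segment as a block device)
mount -t ext2 -o ro,xip /dev/dcssblk0 /mnt/dcss   (execute-in-place: code runs directly from the shared segment)
```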
© 2010 IBM Corporation77
• z/VM V5.4 exploits dynamic memory reconfiguration
• Users can nondisruptively add memory to a z/VM LPAR
– Additional memory can come from: a) unused available memory, b) a concurrent memory upgrade, or c) an LPAR that can release memory
– Systems can now be configured to reduce the need to re-IPL z/VM
– Memory cannot be nondisruptively removed from a z/VM LPAR
• z/VM virtualizes this hardware support for guest machines
– Currently, only z/OS and z/VM support this capability in a virtual machine environment
[Diagram: LPAR resources (CPU, memory, I/O and network) dynamically added to a z/VM LPAR hosting Linux, z/VSE, z/VM, and z/OS guests – new with V5.4]

Smart economics: nondisruptively scale your z/VM environment by adding hardware assets that can be shared with every virtual server

Dynamic memory upgrade
© 2010 IBM Corporation78
Virtualization of Disks
[Diagram: z/VM with guests Linux1, Linux2, and Linux3. An Enterprise Storage Server™ ("Shark") holds minidisks 1–3 and a dedicated volume. Linux1 has R/W minidisks and a R/W virtual disk in storage (memory); Linux2 uses TDISK space and a R/O link; Linux3 has R/W minidisks and the dedicated R/W volume. Minidisk cache is a high-speed, in-memory disk cache]

Notes: R/W = read/write, R/O = read only
– Minidisk: z/VM disk-allocation technology
– TDISK: on-the-fly disk-allocation pool
– Virtual disk in storage (memory): an excellent swap device if the system is not storage-constrained
© 2010 IBM Corporation79
z/VM Disk Technology – SCSI
[Diagram: z/VM with guests Linux1, Linux2, and Linux3 using minidisks A–C, TDISK space, and paging carved from emulated FBA volumes on an Enterprise Storage Server™ ("Shark"); minidisk cache is a high-speed, in-memory disk cache; TDISK is an on-the-fly disk-allocation pool]

SCSI disks attached to z/VM appear to guests and to the rest of VM as emulated FBA.
© 2010 IBM Corporation80
System z Networking
• Internal
– HiperSockets – between LPARs
– VSWITCH – between z/VM guests
– Shared OSA (hardware)
• External
– Dedicated OSA (hardware)
– Shared OSA (hardware)
– VSWITCH to OSA (z/VM)
• Mix and match

Note: the Open Systems Adapter (OSA) is an Ethernet adapter for System z; PR/SM is the hardware hypervisor
[Diagram: a System z with a z/VM LPAR – Linux guests with virtual NICs connected to a z/VM Virtual Switch, which attaches to an OSA – and a z/OS LPAR; both connect to external Ethernet switches via OSAs and to each other via a (PR/SM) HiperSocket]
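A VSWITCH like the one in the diagram is typically set up with CP commands along these lines. A sketch only – the switch name VSW1, the device addresses, and the guest name LINUX01 are hypothetical, and the parenthesized text is annotation, not part of the commands:

```
CP DEFINE VSWITCH VSW1 RDEV 2000       (define a virtual switch backed by the OSA at device 2000)
CP SET VSWITCH VSW1 GRANT LINUX01      (authorize guest LINUX01 to connect)
CP DEFINE NIC 0600 TYPE QDIO           (in the guest: create a virtual QDIO NIC at address 0600)
CP COUPLE 0600 TO SYSTEM VSW1          (couple the NIC to the virtual switch)
```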
© 2010 IBM Corporation81
z/VM Clustered Hypervisor & Guest Mobility (SOD)*

• Cluster up to four z/VM systems in a Single System Image
– On the same or different System z10 servers
– A set of shared resources for the z/VM systems and their hosted virtual machines
• Simplifies management of a multi-z/VM environment
– Single user directory
– Apply maintenance to all systems from one location
– Issue commands from one system to run on another
– Built-in cross-system capabilities
• Move running Linux guests from one z/VM LPAR to another
– Reduce planned outages
– Balance available resources

*All statements regarding IBM's plans, directions, and intent are subject to change or withdrawal without notice, and represent goals and objectives only.
© 2010 IBM Corporation82
Demo
Linux on System z – z/VM
© 2010 IBM Corporation83
THANK YOU
© 2010 IBM Corporation84
A list of terms…
– BCP: Base Control Program
– CBU: Capacity Backup
– CEC: Central Electronics Complex
– CF: Coupling Facility
– CHPID: Channel Path ID
– CICS: Customer Information Control System
– CP: Central Processor
– CSE: Cross System Extensions
– CUoD: Capacity Upgrade on Demand
– DASD: Direct Access Storage Device
– DCSS: Discontiguous Shared Storage
– ESCON: Enterprise System Connection
– ETR: External Time Reference
– FICON: Fibre Connection
– FICON-E: Fibre Connection Express
– FSP: Flexible Service Processor
– GDPS: Geographically Dispersed Parallel Sysplex
– HMC: Hardware Management Console
– HSA: Hardware System Area
– ICB: Integrated Cluster Bus
– ISC: Inter Systems Channel
– ICF: Internal Coupling Facility
– IFL: Integrated Facility for Linux
– IMS: Information Management System
– ISPF: Interactive System Productivity Facility
– JES: Job Entry Subsystem
– LCSS: Logical Channel Sub-system
– LIC: Licensed Internal Code
– LPAR: Logical Partition
– MBA: Memory Bus Adapter
– MCM: Multi-Chip Module
– MIF: Multi-Image Facility
– MQ: Message Queuing
– NIC: Network Interface Card
– OSA: Open Systems Adapter
– PPRC: Peer to Peer Remote Copy
– PR/SM: Processor Resource/Systems Manager
– PU: Processor Unit
– RACF: Resource Access Control Facility
– RMF: Resource Measurement Facility
– SAP: System Assist Processor
– SE: Support Element
– STI: Self Timed Interface
– STP: Server Time Protocol
– TSO: Time Sharing Option
– VIPA: Virtual IP Address
– VM: Virtual Machine
– XRC: Extended Remote Copy
– zAAP: zSeries Application Assist Processor
– zIIP: zSeries Integrated Information Processor