Post on 26-May-2018
© Copyright IBM Corporation 2015. Technical University/Symposia materials may not be reproduced in whole or in part without the prior written permission of IBM.
AIX Virtual User Group, Sept. 24, 2015
What’s New in PowerVP
Optimization Redbook
• Draft available now! http://www.redbooks.ibm.com/redpieces/abstracts/sg248171.html
• Topics: POWER7 & POWER8, PowerVM Hypervisor, AIX, IBM i & Linux, Java, WAS, DB2…, compilers & optimization, performance tools & tuning
HIPER Performance Issue
• This is a PowerVM Hypervisor issue on POWER8, but it is not considered pervasive
• Impacts POWER8 PowerVM systems
– Hypervisor issue that can delay dispatching of Virtual Processors
– Rare; noticed as pronounced degradation every couple of months
– Affected systems have only a small number of partitions, shared capped/uncapped, running in a defined non-default shared pool (other than Pool 0)
– Partitions running uncapped in the default shared pool, or dedicated partitions, are not impacted
• This defect is fixed in the following releases
– SV810_133_081 / FW810.33 shipped 8/14/2015
– SC820_XX_XX: perform the circumvention until the fix has GA’d
– SV830_068_048 / FW830.10 shipped 09/10/15
• Mitigation
– Create a new shared processor AIX/Linux partition using the default pool (Pool 0)
– Boot it into SMS; there is no requirement to install an operating system. Help document: http://www.ibm.com/support/docview.wss?uid=nas8N1020863
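As a quick triage aid, the fix levels listed on this slide can be turned into a lookup. This is a minimal Python sketch, not an IBM tool. The helper name `hiper_fix_status` is hypothetical. The sketch makes two assumptions:
- A firmware level string looks like `SV810_133_081`: a two-letter family prefix, a release, a service pack, and a third field that is ignored here.
- Within one release, a larger service-pack number is newer.

It only knows the three releases named on the slide.

```python
import re

# Minimum fixed service pack per firmware release, from the slide:
# SV810_133_081 (FW810.33) and SV830_068_048 (FW830.10).
# Release 820 had no GA fix yet, so the circumvention applies (None).
FIXED_SP = {"810": 133, "830": 68, "820": None}

# Assumed layout: <2-letter family><release>_<service pack>_<third field>
LEVEL_RE = re.compile(r"^([A-Z]{2})(\d{3})_(\d{3})_(\d{3})$")


def hiper_fix_status(level: str) -> str:
    """Classify a firmware level string such as 'SV810_133_081' (hypothetical helper)."""
    m = LEVEL_RE.match(level.strip().upper())
    if not m:
        raise ValueError(f"unrecognized firmware level: {level!r}")
    _family, release, sp, _third = m.groups()
    if release not in FIXED_SP:
        return "unknown release"
    fixed = FIXED_SP[release]
    if fixed is None:
        return "no fix yet: apply circumvention"
    # Assumption: a larger service-pack number within a release is newer.
    return "fixed" if int(sp) >= fixed else "exposed: update firmware"
```

Feed it the level string your HMC or `lsmcode` reports; parsing that output is not shown here.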
What is Affinity?
• Affinity is a locality measurement of an entity with respect to physical resources
– An entity could be a thread within an OS instance (AIX/i/Linux) or the OS/Virtual Machine itself. For this presentation, we focus on the latter.
– Physical resources could be a core, chip, node, socket, cache (L1/L2/L3), memory controller, memory DIMMs, or I/O buses
• Affinity is optimal when the number of cycles required to access resources is minimized
Figure: POWER7+ 760 planar (source: Power 750/760 D Technical Overview), labeling chips, Dual Chip Modules, sockets/nodes and DIMM memory. Note the x & z buses between chips, and the A & B buses between Dual Chip Modules (DCM). In this model, each DCM is a “node”.
How does partition placement work?
• PowerVM knows the chip types and memory configuration, and attempts to pack partitions onto the smallest number of chips / nodes / drawers
– Optimizing placement will result in higher exploitation of local CPU and memory resources
– Dispatches across node boundaries incur longer latencies, and both AIX and PowerVM actively try to minimize them via Enhanced Affinity and Hypervisor mechanisms
• It considers the partition profiles and calculates optimal placements
– Placement is a function of the Desired Entitlement and the Desired & Maximum Memory settings
– Maximum memory defines the size of the Hardware Page Table (HPT) maintained for each partition: 1/64th of Maximum on POWER7 and 1/128th on POWER7+ and POWER8
– Ideally, Desired + (Maximum / HPT ratio) < node memory size, if possible
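The HPT arithmetic above is easy to check before changing a partition profile. Below is a small Python sketch. The function names are hypothetical, and the node memory size is whatever your system's configuration (for example, lssrad output) tells you. The numbers in the usage note are illustrative only.

```python
# HPT size as a fraction of Maximum Memory, per the slide:
# 1/64 on POWER7, 1/128 on POWER7+ and POWER8.
HPT_RATIO = {"POWER7": 64, "POWER7+": 128, "POWER8": 128}


def hpt_size_gb(max_mem_gb: float, processor: str) -> float:
    """Size of the Hardware Page Table the hypervisor keeps for one partition."""
    return max_mem_gb / HPT_RATIO[processor]


def fits_in_node(desired_gb: float, max_gb: float, node_gb: float, processor: str) -> bool:
    """Rule of thumb from the slide: Desired + (Maximum / HPT ratio) < node memory."""
    return desired_gb + hpt_size_gb(max_gb, processor) < node_gb
```

Take a 64 GB node and a partition with 60 GB Desired and 256 GB Maximum. It fits on POWER8 (60 + 2 GB) but not on POWER7 (60 + 4 GB). This is one reason not to inflate Maximum Memory far beyond what the partition will use.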
AIX Enhanced Affinity
• AIX on POWER and above uses Enhanced Affinity instrumentation to localize threads by Scheduler Resource Allocation Domain (SRAD)
• AIX Enhanced Affinity measures:
– Local: usually a chip
– Near: local node/DCM
– Far: other node/drawer/CEC
• These are logical mappings, which may or may not map exactly 1:1 to physical resources
Figure: affinity domains on a POWER8 S824 DCM and on a POWER7 770/780/795 (Local = chip, Near = intra-node, Far = inter-node)
AIX Affinity: lssrad tool shows logical placement
View of a 24-way, two-socket POWER7+ 760 with Dual Chip Modules (DCM): 6 cores per chip, 12 in each DCM. 5 Virtual Processors x 4-way SMT = 20 logical CPUs.
Terms: REF = Node (drawer or DCM/MCM socket); SRAD = Scheduler Resource Allocation Domain (typically a chip)
# lssrad -av
REF1  SRAD       MEM  CPU
0        0  12363.94  0-7
         2   4589.00  12-15
1        1   5104.50  8-11
         3   3486.00  16-19
If a thread’s ‘home’ is SRAD 0, SRAD 2 would be ‘near’ and SRADs 1 & 3 would be ‘far’.
Figure: Node 0 contains SRADs 0 and 2; Node 1 contains SRADs 1 and 3.
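The local/near/far rule above can be applied mechanically to `lssrad -av` output. Below is a minimal Python sketch. It assumes the common layout in which each REF1 value sits on its own line and MEM is reported in MB. The sample text reproduces this slide's numbers in that layout, and the helper names are hypothetical.

```python
from dataclasses import dataclass

SAMPLE = """\
REF1   SRAD        MEM      CPU
0
          0   12363.94      0-7
          2    4589.00      12-15
1
          1    5104.50      8-11
          3    3486.00      16-19
"""


@dataclass
class Srad:
    ref: int          # REF1: node (drawer or DCM/MCM socket)
    srad: int         # Scheduler Resource Allocation Domain (typically a chip)
    mem_mb: float     # memory in MB (assumed unit)
    cpus: list[int]   # logical CPU numbers


def expand_cpus(spec: str) -> list[int]:
    """Expand a CPU list such as '0-7' or '0-3,8-11' into integers."""
    cpus = []
    for part in spec.split(","):
        if not part:
            continue
        lo, _, hi = part.partition("-")
        cpus.extend(range(int(lo), int(hi or lo) + 1))
    return cpus


def parse_lssrad(text: str) -> list[Srad]:
    """Parse 'lssrad -av' text, assuming REF1 values appear on their own lines."""
    srads, ref = [], None
    for line in text.splitlines()[1:]:  # skip the header row
        tokens = line.split()
        if not tokens:
            continue
        if len(tokens) == 1:            # a REF1 (node) value
            ref = int(tokens[0])
            continue
        cpu_spec = tokens[2] if len(tokens) > 2 else ""  # SRAD may have no CPUs
        srads.append(Srad(ref, int(tokens[0]), float(tokens[1]), expand_cpus(cpu_spec)))
    return srads


def distance(home: Srad, other: Srad) -> str:
    """Same SRAD is local, same node is near, anything else is far."""
    if other.srad == home.srad:
        return "local"
    return "near" if other.ref == home.ref else "far"
```

For a thread homed on SRAD 0, this classifies SRAD 2 as near and SRADs 1 and 3 as far, matching the slide.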
Affinity: Diagnosis
When may I have a problem?
• SRAD has CPUs but no memory, or vice versa
• CPU or memory is very unbalanced (subjective)
• DPO affinity score is low (subjective)
• Customer is complaining of variable performance from similarly configured environments
But how do I really know? Tools help you:
• lssrad, topas, mpstat, svmon (AIX)
• numactl, numastat (Linux)
• PowerVP & Dynamic Platform Optimizer (AIX, i, Linux)
• High percentage of threads (1000’s) with far dispatches
• Measured disparity in application metrics between equivalent systems
PowerVM & POWER8 provide a variety of improvements:
• Firmware, pHyp, OS, Dynamic Platform Optimizer and PowerVP
• Cache size, L4 cache, access logic, DIMM bandwidth
• Inter-socket latencies and bus bandwidths
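The first two symptoms in the list lend themselves to a scripted check. This Python sketch takes (SRAD, MB of memory, logical CPU count) tuples, for instance read from lssrad. It flags SRADs that have CPUs but no memory, or memory but no CPUs, and a large spread in memory per CPU. The function name and the `imbalance_ratio` cutoff are hypothetical; the cutoff is arbitrary because the slide itself calls "very unbalanced" subjective.

```python
def srad_warnings(srads, imbalance_ratio=4.0):
    """Flag SRADs matching the 'When may I have a problem?' symptoms.

    srads: iterable of (srad_id, mem_mb, logical_cpu_count) tuples.
    imbalance_ratio: hypothetical cutoff for the spread in memory per CPU.
    """
    srads = list(srads)
    warnings = []
    for srad, mem_mb, ncpu in srads:
        if ncpu > 0 and mem_mb == 0:
            warnings.append(f"SRAD {srad}: CPUs but no memory")
        elif mem_mb > 0 and ncpu == 0:
            warnings.append(f"SRAD {srad}: memory but no CPUs")
    # Compare memory per logical CPU across SRADs that have both.
    per_cpu = [mem_mb / ncpu for _, mem_mb, ncpu in srads if mem_mb > 0 and ncpu > 0]
    if len(per_cpu) > 1 and max(per_cpu) / min(per_cpu) > imbalance_ratio:
        warnings.append("memory per CPU is very unbalanced across SRADs")
    return warnings
```

With the lssrad numbers from the 760 example, no warning is raised: memory per CPU ranges from roughly 870 MB to 1550 MB, well under the assumed 4x spread.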
AIX topas Logical Affinity (‘M’ option)
Topas Monitor for host: claret4    Interval: 2
===================================================================
REF1 SRAD TOTALMEM INUSE  FREE  FILECACHE HOMETHRDS CPUS
-------------------------------------------------------------------
0    2    4.48G    515M   3.98G 52.9M     134.0     12-15
     0    12.1G    1.20G  10.9G 141M      236.0     0-7
1    1    4.98G    537M   4.46G 59.0M     129.0     8-11
     3    3.40G    402M   3.01G 39.7M     116.0     16-19
===================================================================
CPU SRAD TOTALDISP LOCALDISP% NEARDISP% FARDISP%
----------------------------------------------------------
0   0    303.0     43.6       15.5      40.9
2   0    1.00      100.0      0.0       0.0
3   0    1.00      100.0      0.0       0.0
4   0    1.00      100.0      0.0       0.0
5   0    1.00      100.0      0.0       0.0
6   0    1.00      100.0      0.0       0.0
Callouts: REF1 = node, SRAD = chip, dispatch columns (local is optimal)
What’s a bad FARDISP% rate? There is no rule of thumb, but thousands of far dispatches per second likely indicate lower performance.
How do we fix it? Entitlement & memory best practices + current firmware + Dynamic Platform Optimizer
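To turn the FARDISP% column into the "far dispatches per second" figure the slide talks about, multiply it by TOTALDISP. Two points in this Python sketch are assumptions:
- You should confirm for your topas level whether TOTALDISP is already a per-second rate or a count over the 2-second interval. The `interval_s` parameter covers the second case.
- The 1000-per-second cutoff only stands in for the slide's "1000's".

The function names are hypothetical.

```python
def far_dispatches_per_sec(total_disp: float, far_pct: float, interval_s: float = 1.0) -> float:
    """Far dispatches per second for one CPU row of topas 'M'.

    Assumption: total_disp counts dispatches over interval_s seconds; keep
    interval_s=1.0 if your topas already reports TOTALDISP as a rate.
    """
    return total_disp * far_pct / 100.0 / interval_s


def total_far_rate(rows, interval_s: float = 1.0) -> float:
    """rows: (TOTALDISP, FARDISP%) pairs, one per logical CPU."""
    return sum(far_dispatches_per_sec(t, f, interval_s) for t, f in rows)


def needs_attention(rate: float, threshold: float = 1000.0) -> bool:
    """Hypothetical cutoff standing in for 'thousands of far dispatches per second'."""
    return rate >= threshold
```

For the claret4 rows above, only CPU 0 contributes far dispatches: about 124 per second if TOTALDISP is already a rate. That is far below the level the slide warns about.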
Review: POWER7+ 750/760 Planar
Figure: POWER7+ 750/760 planar (source: Power 750/760 D Technical Overview), labeling chips, Dual Chip Modules, sockets/nodes, DIMM memory, memory controllers and I/O buses
• Intra-DCM bus: x & z
• Inter-DCM/socket bus: A & B
Review: POWER7+ 770/780 Planar
Figure: POWER7+ 770/780 planar (source: Power 770/780 D Technical Overview)
Not as pretty as the 750+ diagram. Note that this model has x, w & z buses between chips; the buses to other nodes (not pictured) and the I/O buses are a little more cryptic.
Why PowerVP
• During an IPL of the entire Power System, the Hypervisor determines an optimal resource placement strategy for the server based on the partition configuration and the hardware topology of the system
• Customers wanted a visual understanding of how hardware resources were assigned and being consumed
– Physical placement and utilization
– Thresholding for indicating busiest resources
– Recording for playback
• Development labs had internal tooling that provided affinity information, but no integrated solution usable by customers or IBM support
PowerVP Overview
• Graphically displays data from existing and new performance tools
• Converges performance data from across the system
• Shows CEC, node & partition level performance data
• Illustrates topology utilization with colored “heat” threshold settings
• Enables drill down for both physical and logical approaches
• Provides real-time monitoring and recording
• Simplifies physical/virtual environment monitoring and analysis
• Not intended to replace any current monitoring or management product
PowerVP Environment
System-wide Collector
• One required per system
• P7 topology information
• P7 chip/core utilizations
• P7 Power bus utilizations
• Memory and I/O utilization
• LPAR entitlements, utilization
Optional Partition Collectors
• LPAR CPU utilization
• Disk activity
• Network activity
• CPI analysis
• Cache analysis
Figure: PowerVP environment, showing the Power hardware (core HPMCs, chip PMUlets), the FW/Hypervisor and the hypervisor interfaces used by the System Collector, the LPARs running IBM i, AIX, VIOS or Linux (thread PMUs, Partition Collector), and the PowerVP GUI
You only need to install a single system-wide collector to see global metrics.
PowerVP: Versions
• PowerVP 1.1.2 is required for POWER8, but memory bus activity is not currently available. This capability is expected to ship in a later firmware update.
• The PowerVP GUI (a client in this architecture) has changed
– Versions 1.1 & 1.2 were Java-based applications
– Version 1.3 runs as a WebSphere Liberty browser plug-in
  • You use a browser to point to the remote collector, e.g. http://localhost:9080/powerVPWeb/PowerVP.html
  • The Liberty package ships/installs wherever you run the GUI
– IBM Global Security Kit (GSKit) packages are included to support ssh client operation
– The client can run on Windows, Linux or AIX (supported). A Java installer is also shipped, which runs without problem on OS X.
• All versions use the same agent/collector schema; the collectors are install packages for AIX, Linux or IBM i
PowerVP: System Info, Global Usage, Recording
Screenshot callouts: System Information; Global Utilization; Recording/Playback Control; Viewing Preferences; Connection Controls; VIOS Performance Advisor Reports (if the VIOS is the System Collector)
PowerVP: LPAR List View (v1.1.3)
List view of partitions within a frame
• The LPAR name is only available if a system or partition collector is installed
• But the single system collector shown here can capture every partition’s ID, state, entitlement and utilization
• LPAR colors are used to correlate affinity information in other views
PowerVP: System, Node and Partition Views
Screenshots: System Topology, Node Drill Down, Partition Drill Down
PowerVP: System Topology
• The initial view shows the hardware topology of the system you are logged into
• In this view, we see a Power 795 with all eight books and/or nodes installed, each with four sockets
• Values within boxes show CPU usage
• Lines between nodes show SMP fabric activity
PowerVP: Node drill down
• Selecting a node shows its resource assignments and/or consumption
• In this view, we see a POWER7 780 node with four chips, each with four cores (version 1.1.2)
• Active buses are shown with solid colored lines. These can be between nodes, chips, memory controllers and I/O buses.
PowerVP: Node Utilization View (P8 S824)
Screenshot callouts: I/O bus, SMP bus, chip, memory controller, cores & utilization
On systems like the 750+ and S824, a node is a socket holding a Dual Chip Module (DCM).
PowerVP: Chip (POWER7 780 / 4 cores) & Memory
Screenshot callouts: chip, memory controller, DIMMs, I/O and SMP buses, LPAR Virtual Processors
PowerVP: CPU Affinity
LPAR 7 has 8 VPs. As we select cores, 2 VPs are “homed” to each core. The fourth core has 4 VPs from four LPARs “homed” to it.
This does not prevent VPs from being dispatched elsewhere in the pool as utilization requirements demand.
PowerVP: Memory Affinity
LPAR 7 Online Memory is 32768 MB, 50% of the 64 GB in DIMMs
LPARs are listed in color order
PowerVP: Pre/Post DPO Operation (v1.1.3)
Before: the purple partition has its Virtual Processors (4) and memory spread across 4 sockets (Affinity Score = 25%)
After the DPO operation: memory & Virtual Processors are collapsed onto a single socket (Affinity Score = 100%)
Did the Hypervisor remove VPs?
No – it ‘homed’ 2 VPs each onto the two physical cores shown.
VPs homed to a socket/chip are more likely to be dispatched to other shared pool cores on that same chip as the workload requires.
PowerVP: VIOS Performance Advisor Reports
• If the System Level (SL) Collector is a VIOS, PowerVP can pull VIOS Performance Advisor Reports from it
• CPU requirements for the collector are extremely low, so there is no issue running it on a VIOS
• Configuration: /etc/opt/ibm/powervp/powervp.conf
– VIOSAdvisor [runtime in minutes] [start time in HH format]
– Multiple entries are allowed
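Because multiple VIOSAdvisor entries are allowed, a quick sanity check of powervp.conf can help. This Python sketch makes two assumptions about the file syntax, which the slide does not spell out:
- Each entry is a single whitespace-separated line of the form `VIOSAdvisor <minutes> <HH>`.
- `#` starts a comment.

The function and class names are hypothetical, and the sample configuration in the test is made up.

```python
from dataclasses import dataclass


@dataclass
class ViosAdvisorRun:
    runtime_min: int  # how long the advisor runs, in minutes
    start_hour: int   # start time, HH (0-23)


def parse_vios_advisor(conf_text: str) -> list[ViosAdvisorRun]:
    """Collect VIOSAdvisor entries (hypothetical parser; syntax is assumed)."""
    runs = []
    for raw in conf_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # assumption: '#' starts a comment
        parts = line.split()
        if not parts or parts[0] != "VIOSAdvisor":
            continue  # other directives are ignored here
        if len(parts) != 3:
            raise ValueError(f"expected 'VIOSAdvisor <minutes> <HH>': {raw!r}")
        runtime, hour = int(parts[1]), int(parts[2])
        if runtime <= 0 or not 0 <= hour <= 23:
            raise ValueError(f"value out of range: {raw!r}")
        runs.append(ViosAdvisorRun(runtime, hour))
    return runs
```

Each returned entry is one scheduled advisor run, so two lines in the file yield two runs.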
PowerVP: System/Partition Collector Recording
• In addition to recordings on the PowerVP GUI (client) side, version 1.1.3 provides a new method to record performance data locally at the collector level
– The collector is configured to write to a data store file via configuration file settings
– Recording size/disk usage depends on the number of partitions and the sample interval. The monitoring interval defaults to 2 seconds, which IMO is too frequent for general recording. The allowed range is currently 1 to 120 seconds, but I’ve asked for 300 seconds.
– Config file: /etc/opt/ibm/powervp/powervp.conf (AIX, VIOS and Linux)
– Recording file name format: PVPmmddyyyyhhmmss.csv
• Recordings are CSV files in an nmon-like format
– ZZZZ for timestamp
– AAAA for version info, config, processor, system info
– SYSTOP (system), CHIPTOP (processor), CORETOP (core) topology
– REGLPARS (registered LPARs)
– AFFDTOP (domain), AFFPTOP (partition), AFFVPROC (VP) affinity
– SCPUBC (global utilization)
– SMETRICBP (partition metrics)
– PSTAT (partition status), PENENT (eth), PDISK (disk), PCPU
Single point collector for frame and partition CPU utilization: write your own consolidator!
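As a starting point for such a consolidator, the two facts on this slide can be coded directly: the file naming pattern, and the nmon-style section tag in each row's first field. This Python sketch makes several assumptions:
- `hh` in the file name is a 24-hour value.
- Every CSV row begins with its section tag.
- The sample rows are made up, because the deck does not show the per-section column layout.

The function names are hypothetical.

```python
import csv
import io
from collections import defaultdict
from datetime import datetime


def recording_start(filename: str) -> datetime:
    """Parse the start time from a name like PVPmmddyyyyhhmmss.csv (24-hour hh assumed)."""
    stem = filename.rsplit("/", 1)[-1]
    if not (stem.startswith("PVP") and stem.endswith(".csv")):
        raise ValueError(f"not a PowerVP recording name: {filename!r}")
    return datetime.strptime(stem[3:-4], "%m%d%Y%H%M%S")


def group_by_section(csv_text: str) -> dict[str, list[list[str]]]:
    """Group rows by their first field (ZZZZ, AAAA, PCPU, ...), nmon style."""
    sections = defaultdict(list)
    for row in csv.reader(io.StringIO(csv_text)):
        if row:
            sections[row[0]].append(row[1:])
    return dict(sections)


# Made-up rows for illustration only; real column layouts are not shown in the deck.
SAMPLE_CSV = """\
AAAA,version,1.1.3
ZZZZ,T0001,14:30:00,24-SEP-2015
PCPU,T0001,lpar7,42.5
ZZZZ,T0002,14:30:10,24-SEP-2015
PCPU,T0002,lpar7,40.1
"""
```

A consolidator could sort files from several collectors by `recording_start` and then merge the PCPU sections across frames.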
PowerVP: Local Recording (cont)
• Settings
– LogData: enables recording
  • Yes | No
  • System or Partition
– LogFileRotation
  • By hours (1 to 24H) or by size (nM = n MB, nG = n GB)
  • Minimum value for size is 100M
  • PowerVP always rotates the log file at midnight (00:00:00)
– LogFileArchive: number of days to archive recordings (as with nmon recording)
– SampleInterval
– LogFilePath (path)
  • /opt/ibm/powervp/logs on AIX/VIOS and Linux
  • /QIBM/UserData/powervp/logs on IBM i
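The LogFileRotation rules above (1 to 24H, or a size in M or G with a 100M minimum) can be validated before editing the file. Below is a minimal Python sketch with a hypothetical function name. It assumes 1G = 1024M and that the value is written exactly as the slide shows it, such as `12H`, `500M` or `2G`.

```python
import re


def parse_rotation(value: str) -> tuple[str, int]:
    """Validate a LogFileRotation value (hypothetical helper).

    Returns ('hours', n) for 1H..24H, or ('size_mb', n) for nM / nG.
    Assumes 1G = 1024M; the slide gives a 100M minimum for size-based rotation.
    """
    m = re.fullmatch(r"(\d+)([HMG])", value.strip().upper())
    if not m:
        raise ValueError(f"bad LogFileRotation value: {value!r}")
    n, unit = int(m.group(1)), m.group(2)
    if unit == "H":
        if not 1 <= n <= 24:
            raise ValueError("hour-based rotation must be 1 to 24H")
        return ("hours", n)
    size_mb = n * 1024 if unit == "G" else n
    if size_mb < 100:
        raise ValueError("minimum size-based rotation is 100M")
    return ("size_mb", size_mb)
```

Whatever the setting, the slide notes that PowerVP also rotates at midnight, so a 24H setting effectively means one file per day.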
PowerVP: Thresholds & Alerts
• Utilization thresholds can be configured for:
– System & partition CPU
– Memory controller
– I/O bus
– Inter-node & intra-node bus
• Alerts use built-in OS functionality
– AIX/VIOS syslog
– IBM i QSYSOPR message queue
– The amount of time before alerting and/or re-alerting is configurable
• The PowerVP GUI does not serve as a threshold/event monitor; use existing OS monitors to exploit the alerts
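The "time before alerting and re-alerting" behavior can be illustrated with a small state machine over utilization samples. This Python sketch is a hypothetical model of that behavior, not PowerVP's implementation. An alert fires once utilization has stayed above the threshold for `delay_s` seconds. It repeats every `realert_s` seconds while utilization stays above the threshold, and resets when utilization drops.

```python
def alert_times(samples, threshold, delay_s, realert_s):
    """Return timestamps at which an alert would be raised (hypothetical model).

    samples: (time_s, utilization_pct) pairs in time order.
    """
    alerts = []
    above_since = None   # when utilization first went above threshold
    last_alert = None    # when we last alerted during this episode
    for t, util in samples:
        if util <= threshold:
            above_since = last_alert = None  # episode over: re-arm
            continue
        if above_since is None:
            above_since = t
        if last_alert is None:
            if t - above_since >= delay_s:
                alerts.append(t)
                last_alert = t
        elif t - last_alert >= realert_s:
            alerts.append(t)
            last_alert = t
    return alerts
```

On AIX the returned events would end up in syslog. Python's standard `syslog` module could forward them, but that step is left out here.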
PowerVP: Partition drill down
• This view allows us to drill down on resources being used by the selected partition
• In this view, we see CPU, Memory, Disk IOPS, and Ethernet being consumed. We can also get an idea of cache and memory affinity.
• We can drill down on several of these resources. Example: we can drill down on the disk transfer or network activity by selecting the resource
PowerVP: How do I use this?
• PowerVP is not intended to replace traditional performance management products. It is not a management tool.
• PowerVP does provide an overview of hardware resource activity that gives you a high-level view of:
– Node/socket activity
– Cores assigned to dedicated and shared pools
– VMs’ Virtual Processors assigned to cores
– VMs’ memory assigned to DIMMs
– Memory bus activity (not currently supported on POWER8)
– I/O bus activity
– Partition activity related to
  • hdisk & network
  • System and partition Cycles-Per-Instruction
PowerVP: How do I use this? High-Level
• The high-level view allows visual identification of node and bus stress
– Thresholding is largely arbitrary, but if one memory controller is obviously saturated and others are inactive, that indicates a more detailed review is required
– The nodes, CPUs and buses with the heaviest activity provide a starting point to correlate with DPO information
– Placement issues with CPU & memory are clearly represented
– There are no rules of thumb or best practices for thresholds (yet)
– You can review the system Redbooks and determine where you are with respect to bus performance (not always available, but newer Redbooks are more informative)
• This tool provides high-level diagnosis with some detailed views (if partition-level collectors are installed)
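The "one memory controller saturated, the others idle" pattern can be expressed as a simple filter over per-resource utilization, for example numbers read off the PowerVP views. In this Python sketch the function name, the 90% and 10% cutoffs and the resource names are hypothetical. As the slide says, thresholding is largely arbitrary.

```python
def saturated_among_idle(utilization: dict[str, float],
                         hot_pct: float = 90.0,
                         idle_pct: float = 10.0) -> list[str]:
    """Return resources that look saturated while at least one peer is nearly idle.

    utilization: resource name -> percent busy (names and cutoffs are hypothetical).
    """
    hot = [name for name, u in utilization.items() if u >= hot_pct]
    idle = [name for name, u in utilization.items() if u <= idle_pct]
    # Only report when the load is lopsided, not when everything is busy.
    return sorted(hot) if hot and idle else []
```

A non-empty result is a prompt for the more detailed review the slide describes, not a diagnosis in itself.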
PowerVP: How do I use this? Low-Level
• Cycles-Per-Instruction (CPI) is a complicated subject; assessing it in detail will be beyond the capacity of most customers
– In general, a lower CPI is better
– The fewer CPU cycles per instruction, the more instructions can get done
– PowerVP gives you various CPI values. These values, in conjunction with OS tools, can tell you whether you have good affinity
• Affinity is a measurement of a thread’s locality to physical resources. Resources can be many things: L1/L2/L3 cache, core(s), chip, memory controller, socket, node, drawer, etc.
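CPI itself is just cycles divided by instructions completed. This Python sketch uses made-up counter values to show why a lower CPI means more work per cycle. It does not reproduce how PowerVP derives its various CPI values, and the function names are hypothetical.

```python
def cpi(cycles: int, instructions: int) -> float:
    """Cycles per instruction: lower means more work done per cycle."""
    if instructions <= 0:
        raise ValueError("instructions must be positive")
    return cycles / instructions


def instructions_per(cycles: float, cpi_value: float) -> float:
    """Instructions completed in a given number of cycles at a given CPI."""
    return cycles / cpi_value
```

At a CPI of 2.0, one billion cycles complete 500 million instructions. Halving the CPI, for instance because data is served from local cache and DIMMs instead of a distant node, doubles that to one billion.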
PowerVP Physical Affinity
A Dynamic Platform Optimizer action causes workloads to move:
• Logical affinity with AIX topas (local, near & far)
• Physical affinity with PowerVP (local, remote & distant)
• The more local, the more ideal
Screenshot panels: Cache Affinity and DIMM Affinity (local is optimal)
Computed CPI is an inverse calculation; lower is typically better.
PowerVP supported Power models and ITEs
• All PowerVM POWER8 systems/firmware (no memory bus metrics at this time)
• Power System models and ITEs with 770 firmware support
– 710-E1D, 720-E4D, 730-E2D, 740-E6D (also includes Linux D models)
– 750-E8D, 760-RMD
– 770-MMC, 780-MHC, ESE 9109-RMD
– p260-22X, p260-23X, p460-42X, p460-43X, p270-24X, p470-44X, p24L-7FL
– 71R-L1S, 71R-L1C, 71R-L1D, 71R-L1T, 7R2-L2C, 7R2-L2S, 7R2-L2D, 7R2-L2T
• Power System models added with 780 firmware support
– 770-MMB and 780-MHB (eConfig support 1/28/2014)
– 795-FHB Dec 2013
• Power System models with 780 firmware support
– 770-MMD, 780-MHD (4/30/2014)
* Some Power models and firmware releases listed above are currently planned for the future and have not yet been announced.
* All statements regarding IBM's future direction and intent are subject to change or withdrawal without notice, and represent goals and objectives only.
http://www-304.ibm.com/support/customercare/sas/f/power5cm/power7.html
PowerVP OS Support
• Announced and GA in 4Q 2013
• PowerVP 1.1.2 shipped 6/2014, SP2 8/2014
• PowerVP 1.1.3 available 6/2015 (SP2 7/2015, SP3 Q4/2015)
• Available as a standalone product or with PowerVM Enterprise Edition
• Collectors run on IBM i, AIX, Linux on Power and VIOS
– IBM i V7R1, AIX 6.1/7.1, any VIOS level supporting POWER7
– RHEL 6.4, 6.5, 7+, SUSE 11 SP3
– Other Linux variants expected in 2015 updates
• Client supported on Windows, Linux, and AIX
– v1.1.2 and earlier require Java 1.6
– v1.1.3 requires Java 1.7 and WebSphere Liberty
– Installers provided for Windows, Linux, and AIX
– A Java installer is also provided, which has worked on OS X (my own testing); it has worked on VMware and Mac where the others don’t