© Copyright IBM Corporation 2015. Technical University/Symposia materials may not be reproduced in whole or in part without the prior written permission of IBM.

AIX Virtual User Group, Sept. 24, 2015

What's New PowerVP



Optimization Redbook

http://www.redbooks.ibm.com/redpieces/abstracts/sg248171.html

Draft available now! Topics:
▪ POWER7 & POWER8
▪ PowerVM Hypervisor
▪ AIX, i & Linux
▪ Java, WAS, DB2…
▪ Compilers & optimization
▪ Performance tools & tuning

HIPER Performance Issue

▪ This is a PowerVM Hypervisor issue on POWER8, but it is not considered pervasive
▪ Impacts POWER8 PowerVM systems
– Hypervisor issue that can delay dispatching of Virtual Processors
– Rare; noticed as a pronounced degradation every couple of months
– Affected systems have only a small number of partitions, shared (capped or uncapped), running in a defined non-default shared pool (other than Pool 0)
– Partitions running uncapped in the default shared pool, or dedicated partitions, are not impacted
▪ This defect is fixed in the following releases
– SV810_133_081 / FW810.33 shipped 8/14/2015
– SC820_XX_XX: perform the circumvention until the fix has GA'd
– SV830_068_048 / FW830.10 shipped 09/10/15
▪ Mitigation
– Create a new shared processor AIX/Linux partition using the default pool (Pool 0)
– Boot it into SMS; there is no requirement to install an operating system. Help document: http://www.ibm.com/support/docview.wss?uid=nas8N1020863
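The fix levels above reduce to a version comparison. The following is a minimal Python sketch, assuming the `FW<stream>.<level>` spelling used on the slide (e.g. FW810.33). The status labels and function names are hypothetical. Whether a system is actually exposed also depends on the shared-pool configuration described above, which this sketch does not check.

```python
# Sketch only: compare a POWER8 firmware string with the HIPER fix levels
# listed on the slide. Reading the level off a real system is out of scope.

POWER8_STREAMS = {810, 820, 830}     # streams named on the slide
FIX_LEVELS = {810: 33, 830: 10}      # first fixed level per stream


def parse_level(fw: str) -> tuple[int, int]:
    """'FW810.33' -> (810, 33)."""
    if not fw.upper().startswith("FW"):
        raise ValueError(f"unexpected firmware string: {fw!r}")
    stream, level = fw[2:].split(".")
    return int(stream), int(level)


def hiper_status(fw: str) -> str:
    """Return 'fixed', 'update', 'circumvent' or 'unknown' (hypothetical labels)."""
    stream, level = parse_level(fw)
    if stream not in POWER8_STREAMS:
        return "unknown"             # stream not covered by the slide
    fixed_at = FIX_LEVELS.get(stream)
    if fixed_at is None:
        return "circumvent"          # no GA fix listed (820): use the mitigation
    return "fixed" if level >= fixed_at else "update"


if __name__ == "__main__":
    for fw in ("FW810.30", "FW810.33", "FW820.20", "FW830.10"):
        print(fw, hiper_status(fw))
```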


Affinity Review

What is Affinity?

▪ Affinity is a locality measurement of an entity with respect to physical resources
– An entity could be a thread within an OS instance (AIX/i/Linux) or the OS/Virtual Machine itself. For this presentation, we focus on the latter.
– Physical resources could be a core, chip, node, socket, cache (L1/L2/L3), memory controller, memory DIMMs, or I/O buses
▪ Affinity is optimal when the number of cycles required to access resources is minimized

[Figure: POWER7+ 760 planar (source: Power 750/760 D Technical Overview), labeling chips, Dual Chip Modules, sockets/nodes and DIMM memory. Note the x & z buses between chips, and the A & B buses between Dual Chip Modules (DCM). In this model, each DCM is a "node".]

How does partition placement work?

▪ PowerVM knows the chip types and memory configuration, and attempts to pack partitions onto the smallest number of chips / nodes / drawers
– Optimizing placement results in higher exploitation of local CPU and memory resources
– Dispatches across node boundaries incur longer latencies, and both AIX and PowerVM actively try to minimize them via Enhanced Affinity and Hypervisor mechanisms
▪ It considers the partition profiles and calculates optimal placements
– Placement is a function of the Desired Entitlement and the Desired & Maximum Memory settings
– Maximum memory defines the size of the Hardware Page Table (HPT) maintained for each partition: 1/64th of Maximum on POWER7, and 1/128th on POWER7+ and POWER8
– Ideally, Desired + (Maximum/HPT ratio) < node memory size, if possible
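The sizing guideline in the last two bullets is simple arithmetic. The following is a minimal Python sketch, assuming memory values in MB and the HPT ratios stated above. The function names and the sample partition (32 GB desired, 128 GB maximum, 64 GB node) are hypothetical.

```python
# Sketch only: Desired + HPT size should fit within one node's memory.
# HPT size as a fraction of Maximum memory, per the slide.
HPT_DIVISOR = {"POWER7": 64, "POWER7+": 128, "POWER8": 128}


def hpt_size_mb(max_mem_mb: int, processor: str) -> float:
    """Hardware Page Table size implied by the partition's Maximum memory."""
    return max_mem_mb / HPT_DIVISOR[processor]


def fits_in_node(desired_mb: int, max_mem_mb: int, node_mem_mb: int,
                 processor: str) -> bool:
    """True if Desired + Maximum/HPT-ratio is below the node's memory size."""
    return desired_mb + hpt_size_mb(max_mem_mb, processor) < node_mem_mb


if __name__ == "__main__":
    print(hpt_size_mb(131072, "POWER7"))                  # 2048.0
    print(hpt_size_mb(131072, "POWER8"))                  # 1024.0
    print(fits_in_node(32768, 131072, 65536, "POWER8"))   # True
```

Raising Maximum memory "just in case" therefore also grows the HPT, which counts against the node's memory.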

AIX Enhanced Affinity

▪ AIX on POWER and above uses Enhanced Affinity instrumentation to localize threads by Scheduler Resource Allocation Domain (SRAD)
▪ AIX Enhanced Affinity measures:
– Local: usually a chip
– Near: local node/DCM
– Far: other node/drawer/CEC
▪ These are logical mappings, which may or may not map exactly 1:1 to physical resources

[Figure: affinity domains on a POWER8 S824 DCM and on POWER7 770/780/795. Local = chip, Near = intra-node, Far = inter-node]

AIX Affinity: lssrad tool shows logical placement

View of a 24-way, two-socket POWER7+ 760 with Dual Chip Modules (DCM): 6 cores per chip, 12 in each DCM. 5 Virtual Processors x 4-way SMT = 20 logical CPUs.

Terms:
– REF: Node (drawer or DCM/MCM socket)
– SRAD: Scheduler Resource Allocation Domain (typically a chip)

    # lssrad -av
    REF1   SRAD        MEM      CPU
    0
              0   12363.94      0-7
              2    4589.00    12-15
    1
              1    5104.50     8-11
              3    3486.00    16-19

If a thread's 'home' was SRAD 0, SRAD 2 would be 'near' and SRADs 1 & 3 would be 'far'.

[Figure: Node 0 contains SRADs 0 and 2; Node 1 contains SRADs 1 and 3]
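The near/far reasoning above can be made mechanical. The following is a minimal Python sketch, assuming the `lssrad -av` layout shown on this slide: a REF1 value alone on a line, followed by indented SRAD rows. The function names are hypothetical. Real output may differ in spacing or leave a column blank for an SRAD without memory or CPUs; the two-field branch is a guess at that case.

```python
# Sketch only: parse `lssrad -av` text and classify SRAD distance.
SAMPLE = """\
REF1   SRAD        MEM      CPU
0
          0   12363.94      0-7
          2    4589.00    12-15
1
          1    5104.50     8-11
          3    3486.00    16-19
"""


def parse_lssrad(text: str) -> dict[int, dict]:
    """Return {srad: {"ref1": node, "mem_mb": float, "cpus": str}}."""
    srads: dict[int, dict] = {}
    node = None
    for line in text.splitlines()[1:]:      # skip the header row
        fields = line.split()
        if len(fields) == 1:                # REF1 (node) line
            node = int(fields[0])
        elif len(fields) == 3:              # SRAD, MEM, CPU
            srads[int(fields[0])] = {"ref1": node, "mem_mb": float(fields[1]),
                                     "cpus": fields[2]}
        elif len(fields) == 2:              # assumption: one column is blank
            has_mem = "." in fields[1]      # MEM is printed with decimals
            srads[int(fields[0])] = {"ref1": node,
                                     "mem_mb": float(fields[1]) if has_mem else 0.0,
                                     "cpus": "" if has_mem else fields[1]}
    return srads


def distance(srads: dict[int, dict], home: int, other: int) -> str:
    """Local = same SRAD, near = same REF1 node, far = another node."""
    if home == other:
        return "local"
    if srads[home]["ref1"] == srads[other]["ref1"]:
        return "near"
    return "far"


if __name__ == "__main__":
    srads = parse_lssrad(SAMPLE)
    for srad in sorted(srads):
        print(srad, distance(srads, 0, srad))   # 0 local, 1 far, 2 near, 3 far
```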

Affinity: Diagnosis

When may I have a problem?
• SRAD has CPUs but no memory, or vice versa
• CPU or memory is very unbalanced (subjective)
• DPO affinity score is low (subjective)
• Customer is complaining of variable performance from similarly configured environments

But how do I really know? Tools help you:
• lssrad, topas, mpstat, svmon (AIX)
• numactl, numastat (Linux)
• PowerVP & Dynamic Platform Optimizer (AIX, i, Linux)
• High percentage of threads (1000's) with far dispatches
• Measured disparity in application metrics between equivalent systems

PowerVM & POWER8 provide a variety of improvements:
• Firmware, pHyp, OS, Dynamic Platform Optimizer and PowerVP
• Cache size, L4 cache, access logic, DIMM bandwidth
• Inter-socket latencies and bus bandwidths
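The first two warning signs can be checked from per-SRAD numbers such as those lssrad reports. The following is a minimal Python sketch. The input format (SRAD id mapped to memory in MB and a logical CPU count), the function name, and the 4:1 imbalance ratio are all assumptions; the slide calls the imbalance judgement subjective.

```python
# Sketch only: flag SRADs with CPUs but no memory (or vice versa) and
# a large memory spread between SRADs.


def affinity_warnings(srads: dict[int, tuple[float, int]],
                      imbalance_ratio: float = 4.0) -> list[str]:
    """srads maps SRAD id -> (memory_mb, logical_cpu_count)."""
    warnings = []
    for srad, (mem_mb, cpus) in sorted(srads.items()):
        if cpus and not mem_mb:
            warnings.append(f"SRAD {srad}: CPUs but no memory")
        elif mem_mb and not cpus:
            warnings.append(f"SRAD {srad}: memory but no CPUs")
    mems = [mem for mem, _ in srads.values() if mem]
    if mems and max(mems) / min(mems) > imbalance_ratio:
        warnings.append("memory is heavily unbalanced across SRADs")
    return warnings


if __name__ == "__main__":
    # Values from the lssrad example (CPUs 0-7 = 8 logical CPUs, and so on).
    print(affinity_warnings({0: (12363.94, 8), 2: (4589.00, 4),
                             1: (5104.50, 4), 3: (3486.00, 4)}))   # []
    # Hypothetical bad layout: SRAD 1 has CPUs but no memory.
    print(affinity_warnings({0: (16384.0, 8), 1: (0.0, 8)}))
```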

AIX topas Logical Affinity ('M' option)

    Topas Monitor for host: claret4    Interval: 2
    ===================================================================
    REF1 SRAD TOTALMEM INUSE  FREE  FILECACHE HOMETHRDS CPUS
    -------------------------------------------------------------------
    0    2    4.48G    515M   3.98G 52.9M     134.0     12-15
         0    12.1G    1.20G  10.9G 141M      236.0     0-7
    1    1    4.98G    537M   4.46G 59.0M     129.0     8-11
         3    3.40G    402M   3.01G 39.7M     116.0     16-19
    ===================================================================
    CPU SRAD TOTALDISP LOCALDISP% NEARDISP% FARDISP%
    ----------------------------------------------------------
    0   0    303.0     43.6       15.5      40.9
    2   0    1.00      100.0      0.0       0.0
    3   0    1.00      100.0      0.0       0.0
    4   0    1.00      100.0      0.0       0.0
    5   0    1.00      100.0      0.0       0.0
    6   0    1.00      100.0      0.0       0.0

The upper panel shows memory and threads per node (REF1) and chip (SRAD); the lower panel shows dispatches per CPU. Local is optimal.

What's a bad FARDISP% rate? There is no rule-of-thumb, but 1000's of far dispatches per second likely indicate lower performance.

How do we fix it? Entitlement & Memory Best Practices + Current Firmware + Dynamic Platform Optimizer
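FARDISP% is a percentage, so it helps to turn it back into a count before applying the "1000's per second" guidance. The following is a minimal Python sketch that assumes far dispatches = TOTALDISP x FARDISP% / 100. Check the topas documentation for whether TOTALDISP is already a per-second rate or a count for the interval; the function names and the 1000/s default are only the slide's order of magnitude, not a documented limit.

```python
# Sketch only: columns CPU, SRAD, TOTALDISP, LOCALDISP%, NEARDISP%, FARDISP%.
ROWS = """\
0   0    303.0     43.6       15.5      40.9
2   0    1.00      100.0      0.0       0.0
"""


def far_dispatches(rows: str) -> dict[int, float]:
    """Map logical CPU -> far dispatches (TOTALDISP * FARDISP% / 100)."""
    result = {}
    for line in rows.splitlines():
        if not line.strip():
            continue
        cpu, _srad, total, _local, _near, far_pct = line.split()
        result[int(cpu)] = float(total) * float(far_pct) / 100.0
    return result


def worth_investigating(far_per_second: float, limit: float = 1000.0) -> bool:
    """No hard rule exists; 1000/s is only the slide's rough guidance."""
    return far_per_second >= limit


if __name__ == "__main__":
    far = far_dispatches(ROWS)
    print(round(far[0], 1))                          # 123.9
    print(worth_investigating(sum(far.values())))    # False
```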

PowerVP Version 1.1.3

PowerVP Redbook

http://www.redbooks.ibm.com/redpieces/pdfs/redp5112.pdf

Review: POWER7+ 750/760 Planar

Intra-DCM bus: x & z. Inter-DCM/socket bus: A & B.

[Figure: POWER7+ 750/760 planar (source: Power 750/760 D Technical Overview), labeling chips, Dual Chip Modules, sockets/nodes, DIMM memory, memory controllers and the I/O bus]

Review: POWER7+ 770/780 Planar

[Figure: POWER7+ 770/780 planar (source: Power 770/780 D Technical Overview)]

Not as pretty as the 750+ diagram. Note that this model has x, w & z buses between chips, and that the buses to other nodes (not pictured) and the I/O buses are a little more cryptic.

Why PowerVP

▪ During an IPL of the entire Power System, the Hypervisor determines an optimal resource placement strategy for the server based on the partition configuration and the hardware topology of the system
▪ Customers wanted a visual understanding of how hardware resources were assigned and being consumed
– Physical placement and utilization
– Thresholding for indicating busiest resources
– Recording for playback
▪ Development labs had internal tooling that provided affinity information, but no integrated solution usable by customers or IBM support

PowerVP Overview

▪ Graphically displays data from existing and new performance tools
▪ Converges performance data from across the system
▪ Shows CEC, node & partition level performance data
▪ Illustrates topology utilization with colored "heat" threshold settings
▪ Enables drill down for both physical and logical approaches
▪ Allows real-time monitoring and recording function
▪ Simplifies physical/virtual environment, monitoring, and analysis
▪ Not intended to replace any current monitoring or management product

PowerVP Environment

System-wide Collector
▪ One required per system
▪ P7 topology information
▪ P7 chip/core utilizations
▪ P7 Power bus utilizations
▪ Memory and I/O utilization
▪ LPAR entitlements, utilization

Optional Partition Collectors
▪ LPAR CPU utilization
▪ Disk Activity
▪ Network Activity
▪ CPI analysis
▪ Cache analysis

[Figure: PowerVP environment. Power Hardware (core HPMCs, chip PMUlets, thread PMUs); FW/Hypervisor with hypervisor interfaces; LPARs running IBM i, AIX, VIOS or Linux with the System Collector and Partition Collectors; PowerVP GUI]

You only need to install a single system-wide collector to see global metrics.

PowerVP: Versions

▪ PowerVP 1.1.2 is required for POWER8, but memory bus activity is not currently available. This capability is expected to ship in a later firmware update.
▪ The PowerVP GUI (a client in this architecture) has changed
– Versions 1.1 & 1.2 were Java-based applications
– Version 1.3 runs as a WebSphere Liberty browser plug-in
  • You use a browser to point to the remote collector, e.g. http://localhost:9080/powerVPWeb/PowerVP.html
  • The Liberty package ships/installs wherever you run the GUI
– IBM Global Security Kit (GSKit) packages are included to support ssh client operation
– The client can run on Windows, Linux or AIX (supported). A Java installer is also shipped, which runs without problem on OS X.
▪ All versions use the same agent/collector schema; the collectors are install packages for AIX, Linux or System i

PowerVP: System Info, Global Usage, Recording

[Screenshot callouts: System Information; Global Utilization; Recording/Playback Control; Viewing Preferences; Connection Controls; VIOS Performance Advisor Reports (if VIOS is the System Collector)]

PowerVP: LPAR List View (v 1.1.3)

List view of partitions within a frame
▪ LPAR name is only available if a system or partition collector is installed
▪ But the single system collector here can capture every partition's ID, state, entitlement and utilization
▪ LPAR colors are used to correlate affinity information in other views

PowerVP: System, Node and Partition Views

[Figure: System Topology, with Node drill down and Partition drill down]


PowerVP: System Topology

• The initial view shows the hardware topology of the system you are logged into

• In this view, we see a Power 795 with all eight books and/or nodes installed, each with four sockets

• Values within boxes show CPU usage

• Lines between nodes show SMP fabric activity

PowerVP: Node drill down

• Selection of a node provides resource assignments and/or consumption
• In this view, we see a POWER7 780 node with four chips, each with four cores (version 1.1.2)
• Active buses are shown with solid colored lines. These can be between nodes, chips, memory controllers and I/O buses.

PowerVP: Node Utilization View (P8 S824)

[Screenshot callouts: I/O Bus; SMP Bus; Chip; Memory Controller; Cores & Utilization]

On systems like the 750+ and S824, a node is a socket with Dual Chips (DCM).


PowerVP: Node View with Affinity v1.1.2 (P7 780)

PowerVP: Chip (POWER7 780 / 4 cores)

[Screenshot callouts: Chip & Memory; Memory Controller; DIMM; I/O; SMP Bus; LPAR Virtual Processors]

PowerVP: CPU Affinity

LPAR 7 has 8 VPs. As we select cores, 2 VPs are "homed" to each core. The fourth core has 4 VPs from four LPARs "homed" to it.

This does not prevent VPs from being dispatched elsewhere in the pool as utilization requirements demand.

PowerVP: Memory Affinity

LPAR 7 Online Memory is 32768 MB, 50% of the 64 GB in DIMMs.

LPARs are listed in color order.

PowerVP: Pre/Post DPO Operation v1.1.3

Before: the Purple Partition has its Virtual Processors (4) and memory spread across 4 sockets (Affinity Score = 25%)

DPO Operation

After: Memory & Virtual Processors are collapsed onto a single socket (Affinity Score = 100%)

Did the Hypervisor remove VPs? No – it 'homed' 2 VPs each onto the two physical cores shown.

VPs homed to a socket/chip are more likely to be dispatched to other shared pool cores on that same chip as workload requires.
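This section does not describe how the hypervisor computes the DPO affinity score. To illustrate why spreading a partition across 4 sockets scores lower than packing it onto one, here is a toy Python metric. It is an assumption, not the DPO formula: the average share of a partition's VPs and memory found on its best-placed socket. It happens to give 25% and 100% for the two states above; the memory sizes are hypothetical.

```python
# Toy metric only: NOT the hypervisor's DPO affinity score.


def toy_affinity_score(vps_per_socket: list[int],
                       mem_gb_per_socket: list[float]) -> float:
    """Average VP and memory share on the single best-placed socket, in %."""
    total_vps = sum(vps_per_socket)
    total_mem = sum(mem_gb_per_socket)
    best = max(vps_per_socket[i] / total_vps + mem_gb_per_socket[i] / total_mem
               for i in range(len(vps_per_socket)))
    return 100.0 * best / 2


if __name__ == "__main__":
    # Before DPO: 4 VPs and (hypothetically) 32 GB spread over 4 sockets.
    print(toy_affinity_score([1, 1, 1, 1], [8.0, 8.0, 8.0, 8.0]))    # 25.0
    # After DPO: everything collapsed onto one socket.
    print(toy_affinity_score([4, 0, 0, 0], [32.0, 0.0, 0.0, 0.0]))   # 100.0
```

Scoring VPs and memory on the same socket together is deliberate: VPs on one socket with their memory on another should not look ideal.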

PowerVP: VIOS Performance Advisor Reports

▪ If the System Level (SL) Collector is a VIOS, PowerVP can pull VIOS Performance Advisor Reports from it
▪ CPU requirements for the collector are extremely low, so running it on a VIOS is not an issue
▪ Configuration, in /etc/opt/ibm/powervp/powervp.conf:

    VIOSAdvisor [runtime in minutes] [start time in HH format]

– Multiple entries are allowed
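The following is a minimal Python sketch of reading those entries, assuming only the syntax shown above: `VIOSAdvisor <runtime minutes> <start HH>`, one entry per line, with multiple entries allowed. Treating `#` lines as comments and ignoring other directives are assumptions. `AdvisorRun` and `parse_vios_advisor` are hypothetical names.

```python
# Sketch only: collect VIOSAdvisor entries from powervp.conf-style text.
from typing import NamedTuple


class AdvisorRun(NamedTuple):
    runtime_min: int
    start_hour: int


def parse_vios_advisor(conf_text: str) -> list[AdvisorRun]:
    runs = []
    for raw in conf_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):     # assumption: '#' comments
            continue
        fields = line.split()
        if fields[0] != "VIOSAdvisor":
            continue                             # other directives ignored here
        if len(fields) != 3:
            raise ValueError(f"expected 2 arguments: {raw!r}")
        runtime, hour = int(fields[1]), int(fields[2])
        if runtime <= 0 or not 0 <= hour <= 23:
            raise ValueError(f"out of range: {raw!r}")
        runs.append(AdvisorRun(runtime, hour))
    return runs


if __name__ == "__main__":
    sample = "# hypothetical example\nVIOSAdvisor 30 02\nVIOSAdvisor 60 14\n"
    print(parse_vios_advisor(sample))
```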


PowerVP: VIOS Performance Advisor Reports

PowerVP: System/Partition Collector Recording

▪ In addition to recordings on the PowerVP GUI (client) side, version 1.1.3 provides a new method to record performance data locally at the collector level
– The collector is configured to write to a data store file via configuration file settings
– Recording size/disk usage depends on the number of partitions and the sample interval. The monitoring interval defaults to 2 seconds, which IMO is too frequent for general recording. The allowed range is currently 1 to 120 seconds, but I've asked for 300 seconds.
– Config file: /etc/opt/ibm/powervp/powervp.conf (AIX, VIOS and Linux)
– Recording file name format: PVPmmddyyyyhhmmss.csv
▪ Recordings are CSV files in an nmon-like format
– ZZZZ for timestamp
– AAAA for version info, config, processor, system info
– SYSTOP (system), CHIPTOP (processor), CORETOP (core) topology
– REGLPARS (registered LPARs)
– AFFDTOP (domain), AFFPTOP (partition), AFFVPROC (VP) affinity
– SCPUBC (global utilization)
– SMETRICBP (partition metrics)
– PSTAT (partition status), PENENT (eth), PDISK (disk), PCPU

Single Point Collector for frame and partition CPU utilization – write your own Consolidator!
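Taking up the "write your own Consolidator" invitation, here is a minimal Python starting point. It assumes only what the slide states (the PVPmmddyyyyhhmmss.csv name and the nmon-like section tags), plus the nmon convention that each row starts with its section tag. The slide does not give column layouts, so rows are kept as raw fields. The sample rows and function names are hypothetical.

```python
# Sketch only: parse the recording start time and group rows by section tag.
import csv
import io
from collections import defaultdict
from datetime import datetime


def recording_start(filename: str) -> datetime:
    """Parse the timestamp embedded in PVPmmddyyyyhhmmss.csv."""
    if not (filename.startswith("PVP") and filename.endswith(".csv")):
        raise ValueError(f"not a PowerVP recording name: {filename}")
    return datetime.strptime(filename[3:-4], "%m%d%Y%H%M%S")


def group_sections(csv_text: str) -> dict[str, list[list[str]]]:
    """Group CSV rows by their leading tag (ZZZZ, AAAA, SYSTOP, PCPU, ...)."""
    sections: dict[str, list[list[str]]] = defaultdict(list)
    for row in csv.reader(io.StringIO(csv_text)):
        if row:
            sections[row[0]].append(row[1:])
    return dict(sections)


if __name__ == "__main__":
    print(recording_start("PVP09242015143000.csv"))   # 2015-09-24 14:30:00
    sample = "ZZZZ,T0001,14:30:00\nPCPU,T0001,lpar1,12.5\nPCPU,T0001,lpar2,3.0\n"
    print(len(group_sections(sample)["PCPU"]))         # 2
```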

PowerVP: Local Recording (cont)

▪ Settings
– LogData directive enables recording
  • Yes | No
  • System or Partition
– LogFileRotation
  • Settings for hours (1 to 24H) or size (NM for MB or NG for GB)
  • Minimum value for size is 100M
  • PowerVP always rotates the log file at midnight (00:00:00)
– LogFileArchive: number of days to archive recordings (as with nmon recording)
– SampleInterval
– LogFilePath (path)
  • /opt/ibm/powervp/logs on AIX/VIOS and Linux
  • /QIBM/UserData/powervp/logs on IBM i
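The following is a minimal Python sketch of checking LogFileRotation and SampleInterval values against the ranges above. The value syntax (a number followed by H, M or G), accepting lowercase units, and 1G = 1024M are assumptions. The function names are hypothetical. The sketch ignores the fixed midnight rotation, which happens regardless of these settings.

```python
# Sketch only: validate rotation and sample-interval settings.
import re


def parse_rotation(value: str) -> tuple[str, int]:
    """Return ("hours", n) or ("megabytes", n) for a LogFileRotation value."""
    match = re.fullmatch(r"(\d+)([HMG])", value.strip().upper())
    if match is None:
        raise ValueError(f"bad LogFileRotation value: {value!r}")
    n, unit = int(match.group(1)), match.group(2)
    if unit == "H":
        if not 1 <= n <= 24:
            raise ValueError("hour-based rotation must be 1H to 24H")
        return ("hours", n)
    megabytes = n if unit == "M" else n * 1024       # assumption: 1G = 1024M
    if megabytes < 100:
        raise ValueError("size-based rotation must be at least 100M")
    return ("megabytes", megabytes)


def check_sample_interval(seconds: int) -> int:
    """SampleInterval is currently limited to 1-120 seconds."""
    if not 1 <= seconds <= 120:
        raise ValueError("SampleInterval must be 1 to 120 seconds")
    return seconds


if __name__ == "__main__":
    print(parse_rotation("12H"))    # ('hours', 12)
    print(parse_rotation("500M"))   # ('megabytes', 500)
    print(parse_rotation("2G"))     # ('megabytes', 2048)
```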

PowerVP: Thresholds & Alerts

▪ Utilization thresholds can be configured for
– System & partition CPU
– Memory controller
– I/O bus
– Inter-node & intra-node bus
▪ Alerts use built-in OS functionality
– AIX/VIOS syslog
– IBM i QSYSOPR message queue
– The time before alerting and/or re-alerting is configurable
▪ The PowerVP GUI does not serve as a threshold/event monitor; use existing OS monitors to exploit the alerts
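The configurable delays before alerting and re-alerting follow a common hold-down pattern. The following is a minimal Python sketch of that pattern under stated assumptions. An alert fires only after a resource stays over its threshold for `hold_s` seconds, then repeats at most every `realert_s` seconds while the breach continues. It illustrates the idea and is not PowerVP's implementation. `ThresholdAlert`, `hold_s` and `realert_s` are hypothetical names.

```python
# Sketch only: threshold alert with hold-down and re-alert intervals.
from typing import Optional


class ThresholdAlert:
    def __init__(self, threshold: float, hold_s: float, realert_s: float):
        self.threshold = threshold
        self.hold_s = hold_s
        self.realert_s = realert_s
        self.breach_start: Optional[float] = None
        self.last_alert: Optional[float] = None

    def observe(self, t: float, value: float) -> bool:
        """Feed one sample at time t; return True if an alert is due now."""
        if value < self.threshold:
            self.breach_start = None        # breach over: reset state
            self.last_alert = None
            return False
        if self.breach_start is None:
            self.breach_start = t
        if t - self.breach_start < self.hold_s:
            return False                    # not over threshold long enough
        if self.last_alert is not None and t - self.last_alert < self.realert_s:
            return False                    # alerted recently
        self.last_alert = t
        return True


if __name__ == "__main__":
    alert = ThresholdAlert(threshold=80.0, hold_s=10, realert_s=60)
    samples = [(0, 85), (5, 90), (10, 90), (20, 95), (70, 95), (75, 50), (80, 90)]
    print([alert.observe(t, v) for t, v in samples])
    # [False, False, True, False, True, False, False]
```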

PowerVP: Partition drill down

• This view allows us to drill down on resources being used by the selected partition
• In this view, we see CPU, Memory, Disk IOPS, and Ethernet being consumed. We can also get an idea of cache and memory affinity.
• We can drill down on several of these resources. Example: we can drill down on the disk transfer or network activity by selecting the resource


PowerVP: Partition drill down (CPU, CPI)


PowerVP: Partition drill down (Disk)

PowerVP: How do I use this?

▪ PowerVP is not intended to replace traditional performance management products. It is not a management tool.
▪ PowerVP does provide an overview of hardware resource activity that gives you a high-level view of
– Node/socket activity
– Cores assigned to dedicated and shared pools
– VMs' Virtual Processors assigned to cores
– VMs' memory assigned to DIMMs
– Memory bus activity (not currently supported on POWER8)
– I/O bus activity
– Partition activity related to
  • hdisk & network
  • System and partition Cycles-Per-Instruction

PowerVP: How do I use this? High-Level

▪ The high-level view allows visual identification of node and bus stress
– Thresholding is largely arbitrary, but if one memory controller is obviously saturated and others are inactive, you have an indication that a more detailed review is required
– The nodes, CPUs and buses with the heaviest activity provide a starting point to correlate with DPO information
– Placement issues with CPU & memory are clearly represented
– There are no rules-of-thumb or best practices for thresholds (yet)
– You can review the system Redbooks and determine where you are with respect to bus performance (not always available, but newer Redbooks are more informative)
▪ This tool provides a high-level diagnosis with some detailed views (if partition-level collectors are installed)
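The "one memory controller saturated, the others inactive" pattern can be expressed as a simple check. The following is a minimal Python sketch. The 80% and 10% cut-offs are arbitrary assumptions, since the slide notes there are no agreed thresholds. The controller names and the function name are hypothetical; utilization values would come from whatever you read off PowerVP.

```python
# Sketch only: flag busy controllers when every other controller is idle.


def lopsided_controllers(util_pct: dict[str, float],
                         busy: float = 80.0, idle: float = 10.0) -> list[str]:
    """Return the busy controllers if all remaining controllers are idle."""
    hot = sorted(name for name, pct in util_pct.items() if pct >= busy)
    rest = [pct for name, pct in util_pct.items() if name not in hot]
    if hot and rest and all(pct <= idle for pct in rest):
        return hot
    return []


if __name__ == "__main__":
    print(lopsided_controllers({"MC0": 92.0, "MC1": 3.0, "MC2": 5.0}))  # ['MC0']
    print(lopsided_controllers({"MC0": 92.0, "MC1": 70.0}))             # []
```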

PowerVP: How do I use this? Low-Level

▪ Cycles-Per-Instruction (CPI) is a complicated subject; assessing it in detail will be beyond the capacity of most customers
– In general, a lower CPI is better
– The fewer CPU cycles per instruction, the more instructions can get done
– PowerVP gives you various CPI values. These values, in conjunction with OS tools, can tell you whether you have good affinity
▪ Affinity is a measurement of a thread's locality to physical resources. Resources can be many things: L1/L2/L3 cache, core(s), chip, memory controller, socket, node, drawer, etc.
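"Lower CPI is better" follows directly from the definition. CPI = cycles / instructions, so at a fixed clock rate, instructions completed per second = clock / CPI. The following is a minimal worked example in Python. It uses a simplified single-thread view that ignores SMT and pipeline overlap, and the 4 GHz clock and counter values are hypothetical.

```python
# Sketch only: CPI arithmetic with hypothetical counter values.


def cpi(cycles: int, instructions: int) -> float:
    """Cycles per instruction."""
    return cycles / instructions


def instructions_per_second(clock_hz: float, cpi_value: float) -> float:
    """Throughput at a fixed clock: fewer cycles each means more instructions."""
    return clock_hz / cpi_value


if __name__ == "__main__":
    poor = cpi(8_000_000_000, 4_000_000_000)      # 2.0, e.g. waiting on far memory
    better = cpi(8_000_000_000, 8_000_000_000)    # 1.0
    print(instructions_per_second(4e9, poor))     # 2000000000.0
    print(instructions_per_second(4e9, better))   # 4000000000.0
```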


Review: AIX topas Logical Affinity ('M' option)

Local is optimal.

What's a bad FARDISP% rate? There is no rule-of-thumb, but 1000's of far dispatches per second likely indicate lower performance.

How do we fix it? Entitlement & Memory Best Practices + Current Firmware + Dynamic Platform Optimizer

PowerVP Physical Affinity

• Dynamic Platform Optimizer actions cause workloads to move
• Logical affinity with AIX topas (local, near & far)
• Physical affinity with PowerVP (local, remote & distant)
• More local, more ideal

[Screenshot panels: Cache Affinity; DIMM Affinity]

Computed CPI is an inverse calculation; lower is typically better. Local is optimal.

PowerVP supported Power models and ITE's

▪ All PowerVM POWER8 systems/firmware (no memory bus metrics at this time)
▪ Power System models and ITE's with 770 firmware support
  • 710-E1D, 720-E4D, 730-E2D, 740-E6D (also includes Linux D models)
  • 750-E8D, 760-RMD
  • 770-MMC, 780-MHC, ESE 9109-RMD
  • p260-22X, p260-23X, p460-42X, p460-43X, p270-24X, p470-44X, p24L-7FL
  • 71R-L1S, 71R-L1C, 71R-L1D, 71R-L1T, 7R2-L2C, 7R2-L2S, 7R2-L2D, 7R2-L2T
▪ Power System models added with 780 firmware support
– 770-MMB and 780-MHB (eConfig support 1/28/2014)
– 795-FHB Dec 2013
▪ Power System models with 780 firmware support
– 770-MMD, 780-MHD (4/30/2014)

* Some Power models and firmware releases listed above are currently planned for the future and have not yet been announced.
* All statements regarding IBM's future direction and intent are subject to change or withdrawal without notice, and represent goals and objectives only.

http://www-304.ibm.com/support/customercare/sas/f/power5cm/power7.html

PowerVP OS Support

▪ Announced and GA in 4Q 2013
▪ PowerVP 1.1.2 shipped 6/2014, SP2 8/2014
▪ PowerVP 1.1.3 available 6/2015 (SP2 7/2015, SP3 Q4/2015)
▪ Available as a standalone product or with PowerVM Enterprise Edition
▪ Collectors will run on IBM i, AIX, Linux on Power and VIOS
– System i V7R1, AIX 6.1/7.1, any VIOS level supporting POWER7
– RHEL 6.4, 6.5, 7+, SUSE 11 SP 3
– Other Linux variants expected in 2015 updates
▪ Client supported on Windows, Linux, and AIX
– v1.1.2 and earlier require Java 1.6
– v1.1.3 requires Java 1.7 and WebSphere Liberty
– Installer provided for Windows, Linux, and AIX
– Java installer, which has worked on OS X (my own testing); it has worked on VMware and Mac where the others don't