5046 Performance Analysis - Understanding Perfstat Data

68
Insight 2008 – NetApp Confidential Limited Use Performance Analysis – Understanding Perfstat Data Spencer G. Watson

Upload: rahul-bc

Post on 27-Nov-2014


Page 1: 5046 Performance Analysis - Understanding Perfstat Data

Insight 2008 – NetApp Confidential Limited Use

Performance Analysis – Understanding Perfstat Data

Spencer G. Watson

Page 2: 5046 Performance Analysis - Understanding Perfstat Data

© 2008 NetApp. All rights reserved. NetApp Confidential - Limited Use - Insight 2008 2

Agenda

Introductions

Tools and Data Collection

Level 1 – “Quick Look Analysis”

Process & Workflow

Level 2 – Perfstat “Deeper Analysis”

Responsibilities & Round-Up

Data Review & Translation

Q & A

Page 3: 5046 Performance Analysis - Understanding Perfstat Data

Objectives

After completing this session you will be able to:

– Collect and Analyse Performance Data

– Monitor Performance

– Perform Bottleneck Analysis

– Make Recommendations

– Know When to Sell Chargeable PS Performance Analysis

Page 4: 5046 Performance Analysis - Understanding Perfstat Data

Performance Analysis – The Basics

Why Monitor Performance?

– Pre-sales sizing for new environments

– Additional workload sizing

– Replacement of older systems

– Analyse system headroom

– Increases customer satisfaction

Don’t Panic!

Page 5: 5046 Performance Analysis - Understanding Perfstat Data

How Do You See Performance?

How do you see performance issues?

Page 6: 5046 Performance Analysis - Understanding Perfstat Data

1st Rule for Performance Cases – No Fear

[Figure: a key that maps each digit 1 to 9 to a symbol; the symbols did not survive extraction]

How long would it take to memorize this code?

15 minutes?

10 minutes?

5 minutes?

5 seconds??

Page 7: 5046 Performance Analysis - Understanding Perfstat Data

1st Rule for Performance Cases – No Fear

1 2 3

4 5 6

7 8 9

Page 8: 5046 Performance Analysis - Understanding Perfstat Data

Process & Workflow

Page 9: 5046 Performance Analysis - Understanding Perfstat Data

Performance Analysis Process

[Diagram: process flow centred on the CUSTOMER]

Roles: TSC, TSE, FSE Escalations, Sales / SE

Stages: Level 1 “Quick Look Analysis”, Perfstat “Level 2 Analysis”, Formal PS Analysis (£)

Triggers: Customer Issue, Customer Audit, Customer Refresh

Page 10: 5046 Performance Analysis - Understanding Perfstat Data

Performance Analysis is a Correlation of Data

Single / Multiprocessor CPU Utilization

FCAL Loop throughput

Efficiency of disk writes

Hit rate of read cache

Network speed and throughput

Type of workload

Client OS configuration

[Diagram callouts:]

Multiple loops? Multipath I/O? Clustering?

MP? Single CPU?

% utilization?

Disks: 300GB (x4 shown)

10K RPM? 15K RPM? ATA?

Chain lengths? Even % utilization?

FCAL Loop

Mb/s throughput?

Ethernet

Wire speed? Congested network?

GB Ethernet? VIF?

Page 11: 5046 Performance Analysis - Understanding Perfstat Data

Tools and Data Collection

Page 12: 5046 Performance Analysis - Understanding Perfstat Data

Monitoring Storage Controller Performance

There are several tools used to collect controller performance metrics:

– From console commands (“Level-1”)

– From client using perfstat (“Level-2”)

In this session we will cover “sysstat”, “statit”, “stats”, “perfstat” and a few others…

Page 13: 5046 Performance Analysis - Understanding Perfstat Data

Level-1 SE “Quick Look Analysis”

Page 14: 5046 Performance Analysis - Understanding Perfstat Data

“sysstat 1”

Reports real-time aggregated system performance statistics

Depends on workload and how much RAM

NFS Ops has effect on CPU UT%

Network traffic compared to max speed

Disk writes compared to reads indicate type of activity

CPU fairly busy

Mostly Disk Reads except during CP

Page 15: 5046 Performance Analysis - Understanding Perfstat Data

“sysstat -x 1”

Extended performance statistics per sec

CP types are one factor of write performance

CP Types of NVLog Full, Flush to Disk Affecting Disk Utilisation

Page 16: 5046 Performance Analysis - Understanding Perfstat Data

Consistency Points

CP Types: two fields that dictate what kind of CP is happening

1st Field – (Type)

- No CP started during sampling interval

number Number of CPs started during sampling interval

B Back to back CPs (CP generated CP)

b Deferred back to back CP

F CP caused by full NVLog

H CP caused by high water mark

L CP caused by low water mark

S CP caused by snapshot operation

T CP caused by timer

U CP caused by flush

Z CP caused by internal sync

: continuation of CP from previous interval

# continuation of CP from previous interval, and the NVLog for the next CP is now full, so the next CP will be type B.

Page 17: 5046 Performance Analysis - Understanding Perfstat Data

Consistency Points

CP Types: two fields that dictate what kind of CP is happening

2nd Field – (Phase)

0 Initializing

n Processing normal files

s Processing special files

f Flushing modified data to disk

v Writing superblock information to disk

q Processing quota files
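The two tables above can be turned into a small lookup. The following is a minimal sketch, not a NetApp tool: the function name `decode_cp_ty` and the dictionary names are hypothetical, and the descriptions are copied from the slides. It assumes the "CP ty" column holds either `-`, a number, or a type character optionally followed by a phase character.

```python
# Hypothetical helper: decode the "CP ty" column of sysstat output
# using the type and phase codes listed on the slides.
CP_TYPES = {
    "B": "back to back CPs (CP generated CP)",
    "b": "deferred back to back CP",
    "F": "CP caused by full NVLog",
    "H": "CP caused by high water mark",
    "L": "CP caused by low water mark",
    "S": "CP caused by snapshot operation",
    "T": "CP caused by timer",
    "U": "CP caused by flush",
    "Z": "CP caused by internal sync",
    ":": "continuation of CP from previous interval",
    "#": "continuation of CP from previous interval; next CP will be type B",
}
CP_PHASES = {
    "0": "initializing",
    "n": "processing normal files",
    "s": "processing special files",
    "f": "flushing modified data to disk",
    "v": "writing superblock information to disk",
    "q": "processing quota files",
}


def decode_cp_ty(field: str) -> str:
    """Return a readable description of one CP ty value."""
    field = field.strip()
    if not field:
        return "empty CP ty field"
    if field == "-":
        return "no CP started during sampling interval"
    if field.isdigit():
        return f"{field} CPs started during sampling interval"
    cp_type = CP_TYPES.get(field[0], f"unknown type {field[0]!r}")
    if len(field) == 1:
        return cp_type
    phase = CP_PHASES.get(field[1], f"unknown phase {field[1]!r}")
    return f"{cp_type}; phase: {phase}"


for value in ("-", "Fn", ":s", "bn"):
    print(value, "=>", decode_cp_ty(value))
```

A lookup like this is mainly useful when scanning long sysstat captures for runs of `F`, `B`, `b` or `#`, which the case studies treat as signs of write pressure.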

Page 18: 5046 Performance Analysis - Understanding Perfstat Data

“sysstat -m” CPU

Running the sysstat command at the admin level (or higher) with the new multi-processor option (-m) will display:

• The percentage of time one or more processors were utilized (ANY). This is the same as the standard sysstat command's CPU column

• The average utilization of all processors in the system (AVG)

• The utilization of each individual processor (CPUx)

All statistics listed above will range from 0% to 100%. Note that if the -m option is invoked on a uni-processor system, the sysstat command will display its standard help menu.

Page 19: 5046 Performance Analysis - Understanding Perfstat Data

“sysstat -M” CPU (undocumented)

Running the sysstat command at diag level (or higher) with the new multi-processor option (-M) will display:

• The average utilization of all processors in the system (AVG)

• The utilization of each processor (CPUx)

• Parallelism level accounting (ANYx+)

• CSMP Domain utilization for each CSMP Domain (Network, Storage, Kahuna, Exempt)

• Interrupt utilization (Intr)

• Operations per second (Ops/s)

• Amount of time processing a consistency point (CP)

These statistics are useful for spotting CPU bottlenecks caused by overload/imbalance.

But just because something is at 100% does not mean it is a problem!!

Page 20: 5046 Performance Analysis - Understanding Perfstat Data

So what is a good or bad value?

Look for limits being reached like –

– 1 GigE Network ~120MB/sec

– FCAL 2Gb ~180MB/s, 4Gb ~360MB/s

– CPU driven by real client ops

– Disks at 100% for extended periods

The main item of concern should be the end user latency.
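As a rough illustration of checking a sysstat throughput figure against the ceilings quoted above, here is a minimal sketch. The function name and the dictionary are hypothetical, the ceilings are the approximate values from the slide, and the example assumes all traffic crosses a single link, which is often not the case in practice.

```python
# Approximate ceilings quoted on the slide, in MB/s (not measured values).
LIMITS_MB_S = {"1GigE": 120, "FCAL 2Gb": 180, "FCAL 4Gb": 360}


def link_utilisation(kb_per_sec: float, link: str) -> float:
    """Return observed throughput as a fraction of the quoted ceiling.

    sysstat reports kB/s, so divide by 1024 to get MB/s first.
    """
    return (kb_per_sec / 1024) / LIMITS_MB_S[link]


# 71663 kB/s is one "Net in" sample from the Case Study #1 output.
print(f"{link_utilisation(71663, '1GigE'):.0%} of a single 1 GigE link")
```

With the sample value this comes to about 58% of one 1 GigE link, so on its own that figure would not indicate a network ceiling.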

Page 21: 5046 Performance Analysis - Understanding Perfstat Data

Case Study #1 – “sysstat” Example

SE “Quick-Look Analysis”

– Translating a real customer sysstat output

Page 22: 5046 Performance Analysis - Understanding Perfstat Data

Case Study #1 – “sysstat” Example

CPU  NFS   CIFS  Total  Net kB/s      Disk kB/s     Cache  Cache  CP    CP  Disk  iSCSI
                        in     out    read   write  age    hit    time  ty  util
95%  2348  1605  4909   46299  26415  46773  16     41s    97%    0%    -   77%   956
84%  2336  1692  4831   26636  30408  50628  7      37s    97%    0%    -   75%   803
83%  2055  1405  3962   8166   28938  43933  13305  37s    98%    40%   Fn  71%   502
89%  2241  1810  4605   35697  30198  51612  36155  37s    94%    100%  :s  76%   554
95%  2124  1518  4104   45802  21917  34179  76323  36s    99%    100%  :s  71%   462
76%  3444  2666  6821   11297  27633  45383  24220  37s    97%    100%  :s  67%   711
73%  2599  2885  6134   7724   20151  27640  24     37s    97%    100%  :s  54%   650
71%  2323  2779  5678   28031  21946  23697  0      37s    98%    100%  #s  59%   576
54%  2079  2161  4995   2032   46478  21822  8      37s    99%    100%  #s  52%   755
75%  2018  2092  4975   2520   51821  19317  24     37s    99%    100%  #s  49%   865
70%  2031  2041  4934   2342   53533  26432  0      38s    99%    100%  #s  57%   862
50%  2187  1666  4516   1781   38545  22851  0      39s    99%    100%  #s  37%   663
59%  1769  2030  4612   1967   46198  28087  24     40s    99%    100%  #s  69%   813
67%  0     1982  2942   1346   40913  28107  8      41s    99%    100%  #s  46%   960
75%  271   1269  2049   3034   19880  32610  27730  42s    95%    100%  bn  67%   509
97%  2509  2240  5985   71663  43530  39063  29767  43s    96%    100%  :f  50%   1236
96%  2117  1527  4762   19900  41703  43079  20968  43s    96%    13%   :   78%   1118

Page 23: 5046 Performance Analysis - Understanding Perfstat Data

Case Study #1 – “sysstat” Answer

High CPU rates. Full NVLog, continuation of CPs, and back-to-back CPs. Writes to disk halted during excessive (100%) CP processing.


Page 24: 5046 Performance Analysis - Understanding Perfstat Data

Case Study #2 – “sysstat” Example

CPU  Total  Net kB/s  Disk kB/s      Cache  Cache  CP    CP  Disk  FCP   FCP kB/s
            in   out  read   write   age    hit    time  ty  util        in      out
60%  2007   3    3    26466  128052  27     98%    86%   Ff  15%   2007  57566   25914
28%  2108   3    9    15840  82408   27     94%    100%  :f  16%   2108  65316   26610
18%  1745   1    3    12303  18299   27     95%    36%   :   13%   1745  51064   26730
20%  1737   1    3    11838  0       27     91%    0%    -   11%   1737  56387   18548
62%  2786   1    3    28351  69429   27     97%    77%   Ff  15%   2786  94237   60886
24%  2137   1    4    28819  168587  27     99%    100%  :f  17%   2137  72937   47971
22%  1776   1    3    11305  96      27     96%    7%    :   8%    1776  77683   30335
68%  3154   1    3    31089  139776  27     98%    66%   Ff  18%   3154  97516   92202
33%  2884   1    3    16160  69349   27     97%    42%   :   15%   2884  107474  49175
53%  2175   1    3    21083  49794   27     97%    33%   Fn  16%   2175  83760   42363
38%  2444   1    3    24244  145267  27     99%    100%  :f  14%   2444  101737  35545
27%  2015   2    4    15249  35319   27     95%    18%   :   12%   2015  60705   46845
37%  1736   1    3    16255  7       27     94%    19%   Fn  13%   1736  56553   28115
52%  2253   1    9    23673  224311  27     98%    65%   :   20%   2253  66491   47080
15%  1294   1    3    6934   0       27     96%    0%    -   6%    1294  48047   22544

Page 25: 5046 Performance Analysis - Understanding Perfstat Data

Case Study #2 – “sysstat” Answer

CPU Avg Low. Similar CP write traffic. Write Interval Varies Widely. Likely Under Utilised Controller.

CPU  Total  Net kB/s  Disk kB/s      Cache  Cache  CP    CP  Disk  FCP   FCP kB/s        Data written  Time taken
            in   out  read   write   age    hit    time  ty  util        in      out     per CP        to write
60%  2007   3    3    26466  128052  27     98%    86%   Ff  15%   2007  57566   25914   228M          2.22s
28%  2108   3    9    15840  82408   27     94%    100%  :f  16%   2108  65316   26610
18%  1745   1    3    12303  18299   27     95%    36%   :   13%   1745  51064   26730
20%  1737   1    3    11838  0       27     91%    0%    -   11%   1737  56387   18548
62%  2786   1    3    28351  69429   27     97%    77%   Ff  15%   2786  94237   60886   237M          1.84s
24%  2137   1    4    28819  168587  27     99%    100%  :f  17%   2137  72937   47971
22%  1776   1    3    11305  96      27     96%    7%    :   8%    1776  77683   30335
68%  3154   1    3    31089  139776  27     98%    66%   Ff  18%   3154  97516   92202   208M          1.08s
33%  2884   1    3    16160  69349   27     97%    42%   :   15%   2884  107474  49175
53%  2175   1    3    21083  49794   27     97%    33%   Fn  16%   2175  83760   42363   229M          1.51s
38%  2444   1    3    24244  145267  27     99%    100%  :f  14%   2444  101737  35545
27%  2015   2    4    15249  35319   27     95%    18%   :   12%   2015  60705   46845
37%  1736   1    3    16255  7       27     94%    19%   Fn  13%   1736  56553   28115   224M          0.84s
52%  2253   1    9    23673  224311  27     98%    65%   :   20%   2253  66491   47080
15%  1294   1    3    6934   0       27     96%    0%    -   6%    1294  48047   22544
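The "data written per CP" and "time taken to write" annotations can be approximated by summing the disk write and CP time columns over each CP, including its continuation rows. The sketch below is one way to do that, under two stated assumptions: each row is a one-second sample (as with `sysstat 1`), and a CP ty value beginning with `:` or `#` continues the CP started on an earlier row. The function name and tuple layout are hypothetical.

```python
# (disk write kB/s, CP time %, CP ty) for each row of Case Study #2.
samples = [
    (128052, 86, "Ff"), (82408, 100, ":f"), (18299, 36, ":"), (0, 0, "-"),
    (69429, 77, "Ff"), (168587, 100, ":f"), (96, 7, ":"),
    (139776, 66, "Ff"), (69349, 42, ":"),
    (49794, 33, "Fn"), (145267, 100, ":f"), (35319, 18, ":"),
    (7, 19, "Fn"), (224311, 65, ":"), (0, 0, "-"),
]


def per_cp_totals(rows, interval_s=1.0):
    """Sum kB written and seconds spent in CP for each consistency point."""
    cps = []
    for write_kb, cp_pct, cp_ty in rows:
        if cp_ty == "-":
            continue  # no CP active in this sample
        if cp_ty[0] in ":#" and cps:
            # continuation row: add to the CP that is already open
            cps[-1][0] += write_kb * interval_s
            cps[-1][1] += cp_pct / 100 * interval_s
        else:
            # a new CP starts on this row
            cps.append([write_kb * interval_s, cp_pct / 100 * interval_s])
    return [(kb, round(sec, 2)) for kb, sec in cps]


for kb, sec in per_cp_totals(samples):
    print(f"{kb / 1000:.0f} MB written over {sec:.2f}s of CP time")
```

The first CP comes out at 228759 kB over 2.22 s, in line with the slide's 228M and 2.22s. The other sums land within a megabyte or two of the slide's figures, which suggests the slide rounded or truncated slightly differently.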

Page 26: 5046 Performance Analysis - Understanding Perfstat Data

Level-2 “Deeper Analysis”

Page 27: 5046 Performance Analysis - Understanding Perfstat Data

Perfstat Tool

Perfstat is a script that captures statit, sysstat, and other controller and client side performance statistics and configuration

Engineers run perfstat instead of individual performance commands

perfstat for UNIX

perfstat for Windows

Right-click on perfstat.sh and download to a valid host machine

Run perfstat from the host machine to produce the perfstat output file

Page 28: 5046 Performance Analysis - Understanding Perfstat Data

Perfstat – Download

http://now.netapp.com/NOW/download/tools/perfstat/

Page 29: 5046 Performance Analysis - Understanding Perfstat Data

Typical Usage of Perfstat

With workload to be monitored running in background:

perfstat -f <storage cntlr> -t 15 -i 88 -F -p > <cntlr>1.txt

(Collect at 15min intervals for 22 hours of day (outside backup window), controller performance data only, output to labelled text file)

Multiple filer use:

perfstat -f filername1,filername2 -t 15 > perfstat.out

Multiple host use:

perfstat -h host1,host2 -f filername -t 15 > perfstat.out
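The first example pairs `-t 15` with `-i 88` to cover 22 hours. Reading `-t` as minutes per iteration and `-i` as the iteration count (as the slide's explanation implies), the count for any collection window follows from simple arithmetic. The helper below is a hypothetical sketch of that calculation, not part of perfstat.

```python
def perfstat_iterations(window_hours: float, minutes_per_iteration: int) -> int:
    """Number of whole iterations that fit in the collection window."""
    return int(window_hours * 60 // minutes_per_iteration)


# 22 hours at 15 minutes per iteration, as in the slide's example.
print(perfstat_iterations(22, 15))
```

For the slide's case, 22 × 60 = 1320 minutes, and 1320 / 15 = 88 iterations.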

Page 30: 5046 Performance Analysis - Understanding Perfstat Data

Accessing PerfViewer (Partners)

Page 31: 5046 Performance Analysis - Understanding Perfstat Data

Accessing PerfViewer (Partners)

Page 32: 5046 Performance Analysis - Understanding Perfstat Data

Perfstat Tools – Offline Perfstat Grapher (Partners)

Reads in perfstat output and graphs the data

Page 33: 5046 Performance Analysis - Understanding Perfstat Data

Perfstat Tools – Perfstat Grapher (Direct Employees)

Reads in perfstat output and graphs the data

Page 34: 5046 Performance Analysis - Understanding Perfstat Data

Perfstat Tools – Perfstat Viewer

Reads in perfstat output and gives breakdown of output

Page 35: 5046 Performance Analysis - Understanding Perfstat Data

Data Review and Translation

Page 36: 5046 Performance Analysis - Understanding Perfstat Data

Sample Perfstat Output - pre v7.2

Page 37: 5046 Performance Analysis - Understanding Perfstat Data

Perfstat Output – Pre DOT v7.2.x

Page 38: 5046 Performance Analysis - Understanding Perfstat Data

statit Command

An advanced command for analysis of system resources

Gathers a set of performance statistics over the interval between the time statit is begun (statit -b) and ended (statit -e)

– CPU

– Network interfaces

– Disks

– System software

Some of the information provided is also available from other commands

(Keep “baseline” of performance for future reference)

Page 39: 5046 Performance Analysis - Understanding Perfstat Data

statit – “CPU Statistics”

[Figure: statit CPU Statistics output; callouts 1 and 2 mark its two sections and [1] to [7] mark the fields described below]

> 190% = CPU bottleneck

The first section reports the system name and system id number, the amount of RAM, the software version, and the date and time when statit -b was executed.

The second section of the report breaks down the usage of the CPU(s) to microsecond precision.

[1] is elapsed time for the measurement, in seconds.

[2] ("system time") is time that the CPU was in use (not idle) and the percentage of elapsed time in that state. On a multiprocessor system, the system time is reported as a sum across all CPUs (0-200%).

[3] ("rupt time") is time that the CPU was executing interrupt-level code, the percentage of elapsed time in that state, the number of interrupts, and the average CPU-time per interrupt. On a multiprocessor system, the rupt time is reported as a sum across all CPUs (0-200%).

[4] ("non-rupt system time") is time that the CPU was executing base-level code and the percentage of elapsed time in that state. On a multiprocessor system, the non-rupt system time is reported as a sum across all CPUs (0-200%).

[5] ("idle time") is time that the CPU was idle and the percentage of elapsed time in that state. On a multiprocessor system, idle time is reported as a sum across all CPUs (0-200%).

[6] ("time in CP") is time that WAFL had a consistency point (CP) in progress (which may include time when the CPU was idle) and the percentage of elapsed time in that state (0-100%).

[7] ("rupt time in CP") is time that the CPU was executing interrupt-level code during a CP, the percentage of CP time spent in interrupt-level code, the number of CP interrupts, and the average CPU-time per CP interrupt. On a multiprocessor system, rupt time in CP is reported as a sum across all CPUs (0-200%).

Can you tell what may be wrong with this picture?

Page 40: 5046 Performance Analysis - Understanding Perfstat Data

statit – Multi Processors

Coarse Symmetric Multi Processing (CSMP)

– Multiprocessing architecture developed by NetApp

– Parallelism of processing across multiple CPUs through “Domains”

Data ONTAP Domains

– Data ONTAP is divided up into multiple “domains”

– A domain is a group of processes that are organized together for ease of synchronisation

– A single processor can only work in one domain at any time

– Exception is the ‘WAFL Exempt’ (and IDLE) domain

– Ongoing development for multi processing improvements

Page 41: 5046 Performance Analysis - Understanding Perfstat Data

statit – Multi Processors

Data ONTAP Domains

IDLE The CPU is not doing any processing

Network Net drivers, TCP/UDP/IP, NFS

RAID SCSI & FC operations

TARGET Several miscellaneous calls (FCP & iSCSI)

STORAGE SCSI & FC device drivers

EXEMPT Locking & libraries

KAHUNA WAFL, SnapMirror & the bulk of Data ONTAP code (Note that CIFS was given its own domain in DOT v7.2)

NETCACHE / 2 Code shared with Data ONTAP from NetCache software (Redundant Domains)

Page 42: 5046 Performance Analysis - Understanding Perfstat Data

statit – Multi Processors

New with v7.2.x

CIFS Handles CIFS protocol requests. New with v7.2.x

WAFL_EXEMPT New with v7.2.x

Page 43: 5046 Performance Analysis - Understanding Perfstat Data

DOT v7.3 Multi-CPU Performance Improvements

Ongoing optimizations and parallelism for four-way FAS platforms

– Better CPU utilization for FAS platforms with four or more processors / cores – FAS3070, FAS6070, FAS6080

– More efficient distribution of parallel network layer processing on multiple CPUs

– Move write allocation processing from kernel to separate domain

Performance improvements are workload dependent

– Most protocols should see benefit to some degree – NFSv3 & 4, CIFS, iSCSI

Transparent – no configuration or commands

How? Many more domains!!

Page 44: 5046 Performance Analysis - Understanding Perfstat Data

statit – Multi Processor Statistics

1) Kernel Events / Switches

2) MP Domain

3) Microseconds per Second

4) Active CPU Time

5) Active CPU Percentage

# divided by 10^6 x 100 = MP domain utilization %

Percent / usec CPUs have been running in Domain (>80% = concern)
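The conversion in the callout is a one-line calculation. A minimal sketch follows, assuming the input is the "microseconds per second" figure that statit reports for a domain. The function names and the example value of 850000 are hypothetical; the 80% threshold is the one quoted on the slide.

```python
CONCERN_THRESHOLD_PCT = 80  # ">80% = concern" per the slide


def domain_utilisation_pct(usec_per_sec: float) -> float:
    """Convert microseconds of domain CPU time per second into a percentage."""
    return usec_per_sec * 100 / 1_000_000


def is_concern(usec_per_sec: float) -> bool:
    """True when the domain exceeds the slide's 80% rule of thumb."""
    return domain_utilisation_pct(usec_per_sec) > CONCERN_THRESHOLD_PCT


# Hypothetical reading: 850,000 usec of domain time in each second.
print(domain_utilisation_pct(850_000), is_concern(850_000))
```

Since a processor can only work in one domain at a time, a single domain near 100% can limit throughput even when the average CPU figure looks moderate.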

Page 45: 5046 Performance Analysis - Understanding Perfstat Data

statit – Miscellaneous Statistics

Look for network KB received and transmitted to get an idea of the amount of traffic, and also the distribution of read and write workload

Receiving less than Writing – maybe due to parity calc, client ops or fragmentation

Overall Cache Effectiveness and Network Traffic against Disk Activity

Page 46: 5046 Performance Analysis - Understanding Perfstat Data

statit – WAFL Statistics

Type of CPs is one indicator of amount of write requests, disk i/o, and write performance

WAFL Statistics (per second)
     63.20  name cache hits ( 28%)
    165.49  name cache misses ( 72%)
   1443.76  inode cache hits ( 96%)
     55.13  inode cache misses ( 4%)
   3213.54  buf cache hits ( 100%)
      2.96  buf cache misses ( 0%)
      0.23  blocks read
      0.00  blocks read-ahead
      0.00  chains read-ahead
      0.00  dummy reads
      0.00  blocks speculative read-ahead
    415.10  blocks written
     26.13  stripes written
      0.00  blocks over-written
      0.05  wafl_timer generated CP
      0.00  snapshot generated CP
      0.00  wafl_avail_bufs generated CP
      0.00  dirty_blk_cnt generated CP
      0.04  full NV-log generated CP
      0.00  back-to-back CP
      0.00  flush generated CP
      0.00  sync generated CP
      0.00  wafl_avail_vbufs generated CP
      0.00  deferred back-to-back CP
    850.86  non-restart messages
      0.03  IOWAIT suspends
    144280  buffers

Name cache hits = Name to File handle cache

Inode cache hits = Inode cache given a file handle

Buf cache hits = Cached block information

CP = Indicator of write requests, disk I/O, write performance

Performance of file and buffer cache and I/O Character by CP Type
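The percentages printed next to each cache counter are simply hits divided by hits plus misses. The sketch below recomputes them from the per-second rates shown in the WAFL statistics above; the function name and dictionary are hypothetical.

```python
def hit_rate_pct(hits: float, misses: float) -> float:
    """Cache hit rate as a percentage of all lookups."""
    total = hits + misses
    return 0.0 if total == 0 else 100 * hits / total


# (hits per second, misses per second) from the statit sample on the slide.
caches = {
    "name cache": (63.20, 165.49),
    "inode cache": (1443.76, 55.13),
    "buf cache": (3213.54, 2.96),
}

for name, (hits, misses) in caches.items():
    print(f"{name}: {hit_rate_pct(hits, misses):.0f}% hits")
```

The results round to 28%, 96% and 100%, matching the figures in the statit output. Recomputing them this way is handy when comparing samples or aggregating several intervals.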

Page 47: 5046 Performance Analysis - Understanding Perfstat Data

statit – Network Interfaces

Amount of network traffic in KBs. Can be divided by 1024 to find MB/s.

Page 48: 5046 Performance Analysis - Understanding Perfstat Data

statit – RAID Stripes on Statit

Looking at RAID group sizes of 4 drives (not including parity)

Only 1 disk was written to in the RAID group = 1.24 stripes

Only 2 disks were written to in the RAID group = 0.59 stripes

Only 3 disks were written to in the RAID group = 1.87 stripes

All 4 disks were written to in the RAID group = 76.15 stripes

Page 49: 5046 Performance Analysis - Understanding Perfstat Data

statit – RAID Stripes on Statit

Poor Write Allocation / Effect of RG Increase?

We know the total number of stripes written is 79.85 stripes

We know the total number of full stripes written is 76.15 stripes

So ~95% of all stripes written to the 4-data-disk RAID group are full stripes.

Also…

664.09 blocks written vs. 200.42 stripes written = 3.31 to 1 ratio

(Ratio of 2.5 to 1 or lower indicates potential poor write allocation (fragmentation))
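Both checks on this slide are short calculations. The following sketch reproduces them from the figures quoted in the slides; the variable names are hypothetical, and the 2.5 threshold is the slide's rule of thumb rather than a documented limit.

```python
# Stripes written per second, keyed by how many data disks were written to
# (4-data-disk RAID group, values from the previous slide).
stripes_by_disks_written = {1: 1.24, 2: 0.59, 3: 1.87, 4: 76.15}

total_stripes = sum(stripes_by_disks_written.values())
full_stripe_pct = 100 * stripes_by_disks_written[4] / total_stripes

# Blocks-to-stripes ratio from the WAFL statistics quoted on this slide.
blocks_written, stripes_written = 664.09, 200.42
blocks_per_stripe = blocks_written / stripes_written

POOR_ALLOCATION_RATIO = 2.5  # "2.5 to 1 or lower" per the slide

print(f"total stripes: {total_stripes:.2f}")
print(f"full stripes: {full_stripe_pct:.0f}%")
print(f"blocks per stripe: {blocks_per_stripe:.2f}")
print("possible poor write allocation:", blocks_per_stripe <= POOR_ALLOCATION_RATIO)
```

Here the full-stripe share is about 95% and the ratio is about 3.31 to 1, so neither check points to poor write allocation in this sample.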

Page 50: 5046 Performance Analysis - Understanding Perfstat Data

statit – Disk Statistics

Disk transfers indicate number of reads, writes, and other disk I/O; Max about 130 - 180

A handy guide to Disk Statistics

disk ut% xfers ureads-chain-usecs writes-chain-usecs cpreads-chain-usecs greads--chain-usecs gwrites-chain-usecs

/vol0/plex0/rg0:

0a.1 4 3.57 0.54 1.00 26667 2.86 4.56 1644 0.18 14.00 1143 0.00 .... . 0.00 .... .

0a.0 3 2.86 0.54 1.00 18333 2.14 5.08 1967 0.18 2.00 5000 0.00 .... . 0.00 .... .

0a.5 2 1.61 0.18 1.00 15000 1.43 7.00 1429 0.00 .... . 0.00 .... . 0.00 .... .

0a.6 2 2.50 0.18 1.00 10000 2.14 5.67 1368 0.18 12.00 1167 0.00 .... . 0.00 .... .

0a.8 3 3.57 1.07 2.67 2313 2.32 5.38 1457 0.18 14.00 1071 0.00 .... . 0.00 .... .

0a.9 2 1.61 0.54 1.00 11667 0.89 11.00 1036 0.18 1.00 9000 0.00 .... . 0.00 .... .

0a.10 2 1.61 0.18 1.00 17000 1.25 7.86 1345 0.18 1.00 9000 0.00 .... . 0.00 .... .

/vol0/plex0/rg1:

0a.19 2 1.79 0.18 1.00 4000 1.61 6.22 1464 0.00 .... . 0.00 .... . 0.00 .... .

0a.20 4 3.75 2.32 2.08 4333 1.43 7.00 1554 0.00 .... . 0.00 .... . 0.00 .... .

0a.21 3 4.64 3.21 2.72 1265 1.25 7.86 1291 0.18 1.00 8000 0.00 .... . 0.00 .... .

0a.22 3 3.21 1.61 1.11 7800 1.61 6.22 1571 0.00 .... . 0.00 .... . 0.00 .... .

Disk Statistics (per second)

ut% is the percent of time the disk was busy.

xfers is the number of data-transfer commands issued per second.

xfers = ureads + writes + cpreads + greads + gwrites

chain is the average number of 4K blocks per command.

usecs is the average disk round-trip time per 4K block.

Page 51: 5046 Performance Analysis - Understanding Perfstat Data

statit – Disk Statistics

Method for Identifying likely Fragmentation

disk    writes--chain-usecs     cpreads-chain-usecs
/vol0/plex0/rg0:
0b.16   1.67    1.75   1400     1.88    2.33   6000
0b.17   1.51    1.72   1439     1.68    2.33   5143
0b.18   1.53    1.67   1419     1.73    2.33   5443
/vol1/plex0/rg0:
0b.19   157.58  15.90  477      1.30    16.00  460
0b.20   157.42  15.90  506      2.42    11.74  613
0b.21   156.67  15.96  481      2.23    10.87  645

1. Add up the write operations for a RAID group

2. Add up the CPreads operations for a RAID group

3. Divide total write operations by total CPread operations

> 1.20 = RAID Group Good

1.20 – 1.0 = Concern

< 1.0 = Probably Fragmented
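The three steps and the thresholds can be sketched as a small function. This is an illustrative helper, not a NetApp utility: the function name is hypothetical, the inputs are the per-disk `writes` and `cpreads` rates from the statit rows on this slide, and the verdict strings reuse the slide's wording.

```python
def fragmentation_verdict(writes, cpreads):
    """Apply the slide's writes-to-CPreads rule of thumb to one RAID group."""
    total_writes = sum(writes)      # step 1
    total_cpreads = sum(cpreads)    # step 2
    if total_cpreads == 0:
        return float("inf"), "RAID Group Good"
    ratio = total_writes / total_cpreads  # step 3
    if ratio > 1.20:
        verdict = "RAID Group Good"
    elif ratio >= 1.0:
        verdict = "Concern"
    else:
        verdict = "Probably Fragmented"
    return ratio, verdict


# Per-disk (writes, cpreads) rates from the two RAID groups shown above.
raid_groups = {
    "/vol0/plex0/rg0": ([1.67, 1.51, 1.53], [1.88, 1.68, 1.73]),
    "/vol1/plex0/rg0": ([157.58, 157.42, 156.67], [1.30, 2.42, 2.23]),
}

for name, (writes, cpreads) in raid_groups.items():
    ratio, verdict = fragmentation_verdict(writes, cpreads)
    print(f"{name}: ratio {ratio:.2f} -> {verdict}")
```

A ratio below 1.0 on a nearly idle RAID group (such as a root volume with only a few operations per second) is weak evidence, so it is worth weighing the verdict against the absolute I/O rates.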

Page 52: 5046 Performance Analysis - Understanding Perfstat Data

statit – Disk Statistics

disk    writes--chain-usecs     cpreads-chain-usecs
/vol0/plex0/rg0:
0b.16   1.67    1.75   1400     1.88    2.33   6000
0b.17   1.51    1.72   1439     1.68    2.33   5143
0b.18   1.53    1.67   1419     1.73    2.33   5443
/vol1/plex0/rg0:
0b.19   137.31  15.90  477      163.30  16.00  460
0b.20   137.22  15.90  506      167.42  11.74  613
0b.21   146.56  15.96  481      164.23  10.87  645
/vol2/plex0/rg0:
0b.22   157.58  14.90  437      1.30    16.00  460
0b.23   157.42  14.90  596      2.42    11.74  613
0b.24   156.67  14.96  421      2.23    10.87  645

Vol0

Total Writes = 4.71

Total CPreads = 5.29

Total Writes / Total CPreads = 0.89

maybe not Fragmented

Vol1

Total Writes = 421.09

Total CPreads = 494.95

Total Writes / Total CPreads = 0.85

Likely Fragmented

Vol2

Total Writes = 471.67

Total CPreads = 5.95

Total Writes / Total CPreads = 79.27

RAID Group Good

Lets test this theory…

Page 53: 5046 Performance Analysis - Understanding Perfstat Data

Throughput vs. Response Time Curves

[Chart: Average Request Response Time vs. IOPS. X axis: IOPS per data drive (0 to 350). Y axis: milliseconds (0 to 30). Series: 7.2K rpm SATA, 10K rpm FC, 15K rpm SAS, 15K rpm FC.]

Page 54: 5046 Performance Analysis - Understanding Perfstat Data

“The Story”

What have we surmised…

– This is a pre v7.2.x DOT (only 9 NetApp “Domains”) system

– Two processor system with lack of resources

– Spending almost all its time (97%) with CP Writes

– Kahuna Domain more than x2 of others (not good) (Bottleneck in CPUs or lack of parallelism… even though flat out?)

– Kahuna and RAID switching indicates most of Domain activity

– Only NFS operations. No other protocol active in period

– General workload Net-in with similar disk write activity

– Good caching with Net-out 3x disk reads (Network-in Bottleneck at 124MB/s due to connectivity limitations?) (Writing more to disk than Net-in. Fragmentation or Client operation?)

– WAFL file & buffer cache effective. CP Write activity high for WAFL

– Four NICs all active. Bottleneck not apparent now. No network errors.

– For most active RGs there doesn’t appear to be fragmentation (95% full stripes)

– Low disk utilisation/IOPS but mostly write traffic

RECOMMENDATION: “Likely under CPU resourced controller. Under performance for required workloads (in this sample). Controller upgrade necessary before DOT upgrade as latent demand/CSMP may create more issues”

Page 55: 5046 Performance Analysis - Understanding Perfstat Data

Case Study #3 – PerfViewer Output

Generating PerfViewer Output & Translating


SE “Deeper Analysis”

Page 56: 5046 Performance Analysis - Understanding Perfstat Data

Perfstat Tools – Throughput by Protocol

Reads in perfstat output & graphs the data

Page 57: 5046 Performance Analysis - Understanding Perfstat Data

Perfstat Tools – CPU & Disk Utilisation

Reads in perfstat output & graphs the data

Page 58: 5046 Performance Analysis - Understanding Perfstat Data

Perfstat Tools – Aggregate Throughput

Reads in perfstat output & graphs the data

Page 59: 5046 Performance Analysis - Understanding Perfstat Data

Perfstat Tools – Domain Utilisation

Reads in perfstat output & graphs the data

Page 60: 5046 Performance Analysis - Understanding Perfstat Data

Perfstat Tools – “sysstat -M” Output

Reads in perfstat output & graphs the data

Page 61: 5046 Performance Analysis - Understanding Perfstat Data

Sample Perfstat v7.3 Output

Page 62: 5046 Performance Analysis - Understanding Perfstat Data

Perfstat Output – DOT v7.3

Page 63: 5046 Performance Analysis - Understanding Perfstat Data

Performance Analysis Wrap Up

Page 64: 5046 Performance Analysis - Understanding Perfstat Data

Performance Analysis Process

[Diagram: process flow centred on the CUSTOMER]

Roles: TSC, TSE, FSE Escalations, Sales / SE

Stages: Level 1 “Quick Look Analysis”, Perfstat “Level 2 Analysis”, Formal PS Analysis (£)

Triggers: Customer Issue, Customer Audit, Customer Refresh

Page 65: 5046 Performance Analysis - Understanding Perfstat Data

Performance Strategy

Moving Forward…

– Premium AutoSupport Tool (“Willow” ASUP visualization & graphs)

– Operations Manager with Performance Advisor (a “must use”)

– Internal Tool using Perfstats and ASUPs to provide stats and graphs

– Updates to the stats command are the future!

Page 66: 5046 Performance Analysis - Understanding Perfstat Data

Performance Strategy

The process again…

– Practice & prepare yourself to understand command line / perfstat output

– If an issue… a call must be raised with GSC!!

– If it’s a sizing, scoping, general performance pre-sale question then use command line output

– If large project then use PS Service

– If formal pre-sales analysis then use perfstat

– Perfstats take a while to collect & translate

– Give yourself enough time to complete

Page 67: 5046 Performance Analysis - Understanding Perfstat Data

Performance Analysis Education

Page 68: 5046 Performance Analysis - Understanding Perfstat Data

Thank You

© 2008 NetApp. All rights reserved. Specifications are subject to change without notice. NetApp, the NetApp logo, Go further, faster, FlexClone, FlexVol, RAID-DP, and Snapshot are trademarks or registered trademarks of NetApp, Inc. in the United States and/or other countries. Windows is a registered trademark of Microsoft Corporation. Linux is a registered trademark of Linus Torvalds. Solaris is a trademark of Sun Microsystems, Inc. Oracle is a registered trademark of Oracle Corporation. SAP is a registered trademark of SAP AG. VMware is a registered trademark of VMware, Inc. UNIX is a registered trademark of The Open Group. All other brands or products are trademarks or registered trademarks of their respective holders and should be treated as such.