Proactive Storage Performance and Capacity Management



Post on 09-Jun-2015


Page 1: Proactive storage performance and capacity management

Smart Storage Sizing

Storage Intelligence

Proactive Storage Performance and Capacity Management

Lee LaFrese – Senior Consultant [email protected]

214-432-7920 www.intellimagic.net

© IntelliMagic 2014

Page 2: Proactive storage performance and capacity management

Speaker

Lee LaFrese, Senior Storage Consultant

Page 3: Proactive storage performance and capacity management

Objectives

What is Storage Intelligence?

Storage Technology Overview and Measurement

IntelliMagic Vision’s Proactive Analytics

Case Study – High Front-end Response Times

Conclusion and Q&A

Page 4: Proactive storage performance and capacity management

Performance Management: Perception and Reality

Common IT perception: “We have ‘proactive’ monitoring – we get alerts on outages, usually before the end user calls to report the problem”
Reality: ‘Proactive’ requires predictive analytics to prevent problems. Anything less is just ‘airbag monitoring’

Common IT perception: “Disruptions due to performance problems are inherently unexpected and cannot be predicted”
Reality: Most I/O problems are not due to failures. They are from unbalanced or over-saturated components - which can be predicted

Common IT perception: “Optimization is too complex and time consuming, we don’t have the hours or headcount for that”
Reality: The more constrained your staff, the more you need automated risk prediction to avoid fire fighting and constant ‘whack a mole’

Common IT perception: “Our storage hardware vendor’s proprietary tools provide me with good enough insight”
Reality: Most IT environments are multi-vendor, which implies multiple proprietary tools and duplicate resources to manage them

Page 5: Proactive storage performance and capacity management

Storage Cost Trends: Better and Worse

Storage Costs Are Decreasing

• Virtualization

• De-duplication, thin provisioning

• Compression

• SSD, Automated Tiering

• Commoditization of hardware

• Cloud storage options

• Blurring of the lines between file and block level storage

Storage Costs Are Increasing

• Constant growth in data

• More frequent spikes in demand (e.g., mobile) create SLA pressure

• More hotspots - access density is higher: de-duped, thin provisioned, compressed...

• More complexity to manage

• Expertise shortage, constantly growing ratio of devices per head

Constant, fresh Storage Intelligence about your workloads on your hardware is how to minimize both business risk and total storage costs

Page 6: Proactive storage performance and capacity management

[Chart: Performance over Time, with labels for Response Time, Sub-component Saturation, the SLA level, the point covered by classic monitoring tools, and the point where Storage Intelligence provides Key Risk Indicators]

Storage Intelligence delivers constant knowledge about how your infrastructure is handling your peak workloads

Storage Intelligence uses predictive analytics to produce Key Risk Indicators that identify root issues before the “knee of the curve” is reached, avoiding SLA violations
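The “knee of the curve” can be illustrated with a textbook single-server queueing approximation, R = S / (1 - U), where S is the service time and U is the utilization. This is only a sketch of why response time stays flat and then climbs steeply; it is not IntelliMagic's model, and the 0.5 ms service time and 2 ms SLA below are assumed values.

```python
def response_time_ms(service_ms: float, utilization: float) -> float:
    """M/M/1 approximation: response time grows as 1 / (1 - utilization)."""
    if not 0.0 <= utilization < 1.0:
        raise ValueError("utilization must be in [0, 1)")
    return service_ms / (1.0 - utilization)


def max_utilization_for_sla(service_ms: float, sla_ms: float) -> float:
    """Highest utilization at which the modeled response time still meets the SLA."""
    if sla_ms <= service_ms:
        return 0.0
    return 1.0 - service_ms / sla_ms


SERVICE_MS = 0.5  # assumed component service time
SLA_MS = 2.0      # assumed response-time SLA

for u in (0.50, 0.70, 0.90, 0.95):
    print(f"utilization {u:.0%}: {response_time_ms(SERVICE_MS, u):.2f} ms")

limit = max_utilization_for_sla(SERVICE_MS, SLA_MS)
print(f"modeled SLA breach above {limit:.0%} utilization")
```

In this model, going from 50% to 70% utilization adds well under a millisecond, while going from 90% to 95% roughly doubles the response time. Watching component utilization therefore gives warning earlier than watching response time alone.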

Page 7: Proactive storage performance and capacity management

IntelliMagic is a leader in advanced predictive analytics - especially for large data storage infrastructures

Over 20 years developing storage performance modeling solutions

Privately held, financially independent

Customer centric and highly responsive

Proactive Performance Monitoring: Find / Avoid Problems

Predictive Performance Modeling Services: Optimize Investments

Page 8: Proactive storage performance and capacity management

Who uses IntelliMagic?

IntelliMagic solutions are used to manage many of the largest, most complex IT environments in the world:

• 4 of the 5 largest banks in the US
• The largest wireless provider in the US market
• The world’s largest manufacturer of farm equipment
• One of the largest auto manufacturers in the world
• Multiple US Federal Agencies
• The largest bank in Brazil
• The largest data center in Germany

Trusted to provide solutions for some of the largest enterprise storage vendors:

• IBM: Disk Magic, Capacity Magic, Batch Magic, etc.
• HP

Page 9: Proactive storage performance and capacity management

IntelliMagic is the market leader in providing advanced, actionable Storage Intelligence

Visualizes workload constraints deep inside storage devices

Quantifies Key Risk Indicators (KRIs) across the enterprise

Includes performance best practices learned from peer sites

Page 10: Proactive storage performance and capacity management

Storage Intelligence: See Deep Inside Storage Devices

Quantifies the risk of specific storage components being unable to handle the daily peak period workloads running on that hardware

Page 11: Proactive storage performance and capacity management

Storage Intelligence: See Risk Across Entire Enterprise

Provides hardware- and workload-sensitive Key Risk Indicators (KRIs) that are understandable at a single glance and give early warning of issues

Page 12: Proactive storage performance and capacity management

The intuitive web interface provides a clear, prioritized view of risks and observations

Page 13: Proactive storage performance and capacity management

IntelliMagic Vision Architecture: z/OS Disk

Page 14: Proactive storage performance and capacity management

IntelliMagic Vision Architecture: z/OS Tape

Page 15: Proactive storage performance and capacity management

IntelliMagic Vision Architecture: Distributed SAN

IntelliMagic Vision as a Service (IVaaS) is available for all three platforms and includes IntelliMagic consulting. You benefit from hard-to-obtain knowledge gained from experience outside your company's borders.

Page 16: Proactive storage performance and capacity management

Infrastructure Layers

Layer | Potential Bottlenecks
Hosts (System z servers, Unix servers, VMware servers) | Host multi-pathing or HA contention
Fabric | Fabric/FICON or ISL congestion
Storage Systems | Front-end, cache, CPU or back end

Page 17: Proactive storage performance and capacity management

Why Do I/O Bottlenecks Occur?

Symptom | Infrastructure Layer | Bottleneck | Reason | Common Solutions
Poor/inconsistent I/O response time on host | DSS | Front end | Imbalanced and over-subscribed Host Adapters or ports | Rebalance or add adapters or ports
Poor/inconsistent I/O response time on host | DSS | Back-end HDDs | Too many I/Os per spindle | Add disks/SSDs or move workload
Low read-hit ratio on and/or back-end storage controller | DSS | Cache | Cache-hostile applications (e.g., large data warehouse) and large/few storage pools | Add cache, rebalance or add storage pool resources
Poor/inconsistent I/O response time on host | Host | Multi-pathing | Only using a single path when bandwidth required is greater | Increase pathing
Poor/inconsistent front-end response times | Fabric | ISL or path | Poor design, ISLs oversubscribed or hardware outage | Use design best practices, reduce oversubscription if appropriate

Bottlenecks are a leading cause of performance issues.

Page 18: Proactive storage performance and capacity management

z/OS Data Sources

• Collect data on a per sysplex basis
• Consolidate by DSS for reporting

[Diagram: CF, CHPIDs / MIF, z/OS systems and FCD, annotated with the record types collected at each point; legend: required / optional]

Record types shown in the diagram:

• 70: Processor
• 72: Workload
• 75: Paging
• 73: Channel
• 74.1: z/OS Device
• 74.5: Cache
• 74.8: Link and HDD
• 78.3: I/O Queueing
• 74.7: FICON Director
• 74.4: CF
• 74.2: XCF
• 42.5/6: DFSMS Dataset
• 30: Job

Optional User Records:

• Type 105: GDPS GM
• Type 206: SRDF/A

Vision may also process GDPS GEOXPARM when loading data from an XRC installation

Page 19: Proactive storage performance and capacity management

Data Sources for z/OS Tape

• Collect data on a per Library (Grid/Cluster) basis
• Consolidate by Grid/Library Cluster for reporting

[Diagram: CF, CHPIDs / MIF, z/OS systems, TMS, and real and/or virtual tape (virtual tape with optional back-end tape), annotated with the data collected at each point; legend: required / optional]

Data sources shown in the diagram:

• TMS
• 21: Mounts
• 30: Jobs/Pgms
• 14: DSN Read
• 15: DSN Write
• BVIR TS7700

Page 20: Proactive storage performance and capacity management

What Can We Measure – SAN Storage

Component | Metrics
Port | Throughput
Adapter / processor | Throughput, I/Os, response time, utilization
Volume | Throughput, I/Os, response time, cache hits, sequential stages, cache-full delays, backend activity
Disk / RAID group | Throughput, I/Os, response time / utilization

[Diagram: Storage View with HBA, edge, core, HA and LUN elements]

Page 21: Proactive storage performance and capacity management

What Can We Measure – SAN Fabric

Component | Metrics
Port traffic | Bytes, packets, and frames transmitted and received
Port errors | Address errors, zero buffer-to-buffer credits, CRC errors, link failures, signal failures, …

[Diagram: Fabric View with HBA, edge, core, HA and LUN elements]

Page 22: Proactive storage performance and capacity management

Rated Performance Reports

• No border: no rating
• Green border: good
• Yellow border: early warning
• Red border: performance exceptions

Reports for key metrics are rated according to adaptive thresholds defined per platform, providing proactive warning of potential performance issues.
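The border colors can be read as a simple classification of a metric against a warning level and an exception level. The slides do not say how Vision derives its thresholds or its numeric rating, so the function below is a hypothetical sketch with assumed fixed thresholds, not the product's logic.

```python
from typing import Optional


def border_color(value: Optional[float], warning: float, exception: float) -> str:
    """Map a metric value to the border colors used for rated reports."""
    if value is None:
        return "none"    # no border: the metric is not rated
    if value >= exception:
        return "red"     # performance exception
    if value >= warning:
        return "yellow"  # early warning
    return "green"       # good


# Assumed thresholds for a front-end response time in ms (illustrative only)
for sample in (None, 0.8, 2.5, 6.0):
    print(sample, border_color(sample, warning=2.0, exception=5.0))
```

With adaptive thresholds, the `warning` and `exception` arguments would be computed per platform and workload instead of being passed as constants.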

Page 23: Proactive storage performance and capacity management

Static Thresholds

• Thresholds are based on the hardware configuration.
• Thresholds can be changed if required.
• The chart frame indicates if there is a problem.

Page 24: Proactive storage performance and capacity management

Dynamic Workload Based Thresholds

Some thresholds, like those for Front-end Read Response time, are based upon the capabilities of the controller, activity and workload.

Page 25: Proactive storage performance and capacity management

Vision for z/OS Examples

Page 26: Proactive storage performance and capacity management

Throughput per DSS (MB/s) [rating: 0.15] for all Disk Storage Systems by Serial

Throughput shows storage machine capabilities and thus it is important for proactive monitoring.

Page 27: Proactive storage performance and capacity management

Response Time (ms) [rating: 0.00] for all Disk Storage Systems by Serial

From this chart, you can drill down to further analyze response time.

Page 28: Proactive storage performance and capacity management

Dissecting Disconnect Time

• Compute Read Miss and Synchronous Replication components
• Queuing & Delays is disconnect time that can’t be accounted for
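The decomposition amounts to a subtraction: whatever disconnect time remains after the computed read-miss and synchronous-replication components is attributed to queuing and delays. The sketch below assumes per-I/O averages in milliseconds for one interval; the function name and sample numbers are made up for illustration.

```python
def unaccounted_disconnect_ms(disconnect_ms: float,
                              read_miss_ms: float,
                              sync_replication_ms: float) -> float:
    """Queuing & Delays: disconnect time left after the computed components."""
    remainder = disconnect_ms - read_miss_ms - sync_replication_ms
    # The components are estimates, so clamp a slightly negative remainder to zero.
    return max(remainder, 0.0)


# Illustrative interval: 1.20 ms disconnect, 0.70 ms read miss, 0.30 ms replication
queuing = unaccounted_disconnect_ms(1.20, 0.70, 0.30)
print(f"Queuing & Delays: {queuing:.2f} ms")
```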

Page 29: Proactive storage performance and capacity management

Front-end Adapter Utilization (%) [rating: 0.00] by Serial

IntelliMagic Vision goes beyond just showing the response time. This chart looks at the Host Adapters themselves.

Page 30: Proactive storage performance and capacity management

Storage Pool Read & Write Response Time

Response time at the drive level tells us when the disks are overloaded.

From this chart, you can drill down to further analyze response time.

Page 31: Proactive storage performance and capacity management

Replication Send (MB/s) [rating: 0.00] for all Ports by Serial

Page 32: Proactive storage performance and capacity management

Average RPO over interval (sec) [rating: 0.00] for all Global Mirror Sessions by Session name

Page 33: Proactive storage performance and capacity management

All CP, zIIP and zAAP time used (processors) for all Service Classes by Service Class

The “by service class” charts help show changes in workloads.

From this chart, you can drill down to further analyze response time.

Page 34: Proactive storage performance and capacity management

Address spaces with highest Disk I/O intensity (#) (top 20) for Service Class 'YXDNXY' by Address Space Name

You can easily see which service classes can benefit from proper storage management.

From this chart, you can drill down to further analyze response time.

Page 35: Proactive storage performance and capacity management

Coupling Facility Dashboard [rating: 3.00] for all Coupling Facilities by CF Name

Page 36: Proactive storage performance and capacity management

Example of a Tape Dashboard

Page 37: Proactive storage performance and capacity management

z/OS Tape Drill Down to a Detail Rated Chart

Page 38: Proactive storage performance and capacity management

Vision for Distributed SAN Examples

Page 39: Proactive storage performance and capacity management

Operations: Instant Enterprise Wide Health Check of Storage Environment

Automated Enterprise Performance Dashboard Report.

Quickly survey your enterprise storage performance health.

Page 40: Proactive storage performance and capacity management

Cluster (Host) Performance Dashboard

Summarize all key clusters (hosts) and drill down to identify performance issues.

Page 41: Proactive storage performance and capacity management

Sample Fabric Health Check

Quickly detect and identify any port/SFP issues across the SAN environment.

Page 42: Proactive storage performance and capacity management

Performance/Capacity: Identify Imbalances

Drill down to ports and volumes to identify the volumes and associated hosts causing the imbalances.

Chart shows average, min/max and standard deviation over time
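The statistics in such a chart can be computed per interval from the individual port measurements. The sketch below is an assumption about how one might do this with Python's standard library; the port names and MB/s values are illustrative, not taken from the presentation.

```python
from statistics import mean, pstdev


def imbalance_stats(port_mbps: dict) -> dict:
    """Summarize one interval: average, min, max and standard deviation across ports."""
    values = list(port_mbps.values())
    return {
        "avg": mean(values),
        "min": min(values),
        "max": max(values),
        "stdev": pstdev(values),  # population standard deviation over the ports
    }


# Illustrative interval: two busy ports and two nearly idle ones
interval = {"port0": 380.0, "port1": 360.0, "port2": 20.0, "port3": 40.0}
stats = imbalance_stats(interval)

# Coefficient of variation: a unitless measure of how uneven the load is
cv = stats["stdev"] / stats["avg"]
print(stats, f"cv={cv:.2f}")
```

A large gap between min and max, or a standard deviation close to the average, points to load concentrated on a few ports even when the average looks healthy.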

Page 43: Proactive storage performance and capacity management

Capacity Management: Location, Tier Trending, and Detailed Reports

Identify trends and determine which storage pools are growing over time.
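One simple way to quantify growth is a least-squares line through a pool's used capacity over time, then projecting when it reaches the pool size. This is a generic sketch, not the method Vision uses; the sample points and the 60 TB pool size are assumed.

```python
def linear_trend(days, used_tb):
    """Ordinary least-squares fit; returns (slope in TB/day, intercept in TB)."""
    n = len(days)
    mean_x = sum(days) / n
    mean_y = sum(used_tb) / n
    sxx = sum((x - mean_x) ** 2 for x in days)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(days, used_tb))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def days_until_full(days, used_tb, capacity_tb):
    """Days after the last sample until the fitted line reaches capacity.

    Returns None when the pool is not growing.
    """
    slope, intercept = linear_trend(days, used_tb)
    if slope <= 0:
        return None
    return (capacity_tb - intercept) / slope - days[-1]


# Illustrative samples: used capacity measured every 30 days
sample_days = [0, 30, 60, 90]
sample_used = [40.0, 43.0, 46.0, 49.0]

growth, _ = linear_trend(sample_days, sample_used)
remaining = days_until_full(sample_days, sample_used, capacity_tb=60.0)
print(f"growth {growth:.2f} TB/day, about {remaining:.0f} days until full")
```

A straight line is the simplest possible model; pools with seasonal or step-wise growth would need a longer history or a different fit.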

Page 44: Proactive storage performance and capacity management

Case Study – High Front-end Response Times

Page 45: Proactive storage performance and capacity management

DSS Dashboard Minicharts

Drill down shows rated mini line charts. This is useful in seeing relationships between the data.

Notice the relationship between the throughput and response time and Read Hit %.

Page 46: Proactive storage performance and capacity management

Response Time (ms) [rating: 0.61] For Serial 'IBM-000'

Response time peaks at 11:00 PM.

Page 47: Proactive storage performance and capacity management

Drive Read Response (ms) [rating: 0.00] For Serial 'IBM-000'

Average back-end response time has lots of peaks. This indicates a lack of correlation with front-end response time.
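The slide makes this judgment visually, but the same check can be expressed as a Pearson correlation between the two time series. The sketch below uses made-up interval values in which the front-end series has a single peak while the back-end series peaks repeatedly.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long series.

    Raises ZeroDivisionError if either series is constant.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Illustrative response times (ms) per interval, not data from the case study
front_end = [1.0, 1.1, 1.0, 1.2, 4.5, 1.1]
back_end = [6.0, 9.0, 5.0, 9.5, 7.0, 8.5]

r = pearson(front_end, back_end)
print(f"correlation: {r:.2f}")
```

A coefficient near zero supports looking elsewhere, such as the front-end ports, for the cause of the response time peak.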

Page 48: Proactive storage performance and capacity management

Fibre Front-End Read Response (ms) [rating: 1.62] for all Ports by Serial

Fibre port read response times peak at 11:00 PM!

Page 49: Proactive storage performance and capacity management

Read+Write (MB/s) For Serial 'IBM-000' by HA Name

Peak throughput by HA shows imbalanced host adapter throughput.

Throughput is reaching the peak capability of this type of host adapter.

Page 50: Proactive storage performance and capacity management

Conclusion

Finding: HA0000 & HA0001 are saturated during peak periods. Back-end drive write response times peak during the same time but there are no FW bypasses. Front-end write response time should not be affected by increases in back-end response time.

Recommendation: Redistribute some of the load from HA0000 & HA0001 to HA0002, which was previously not in use.
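The effect of such a redistribution can be estimated before any paths are changed. The sketch below moves a fraction of each saturated adapter's peak load to the unused adapter; the MB/s figures and the one-third fraction are assumptions for illustration, since the slides do not give the actual numbers.

```python
def rebalance(load_mbps: dict, donors: list, receiver: str, fraction: float) -> dict:
    """Move `fraction` of each donor adapter's load to the receiver; returns a new dict."""
    result = dict(load_mbps)
    for ha in donors:
        moved = load_mbps[ha] * fraction
        result[ha] -= moved
        result[receiver] += moved
    return result


# Assumed peak throughput per host adapter in MB/s
peak = {"HA0000": 760.0, "HA0001": 740.0, "HA0002": 0.0}

after = rebalance(peak, donors=["HA0000", "HA0001"], receiver="HA0002", fraction=1 / 3)
for name, mbps in after.items():
    print(f"{name}: {mbps:.0f} MB/s")
```

The total load is unchanged, but the busiest adapter now carries about two thirds of its previous peak, which moves it away from its saturation point.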

Page 51: Proactive storage performance and capacity management

IntelliMagic Offerings Summary

Proactive Performance Monitoring: Find / Avoid Problems

IntelliMagic Vision as a Service
• Daily Monitoring / Analysis / Reporting
• Expert Recommendations
• 1-2 Day Install – Analytics Next Day

IntelliMagic Vision On Premise
• SAN, z/OS or Tape
• Perpetual or Term Licenses

Predictive Performance Modeling Services: Optimize Investments

• IntelliMagic Direction storage modeling engagements
• In a “tech refresh cycle” - Model prospective hardware vendors’ ability to deliver your specific workloads

Special Offer for Today’s Webinar Participants: Customers who sign up for IntelliMagic Vision as a Service by August 29 will have their Initial Setup Fee waived (a $10,000 value)

Page 52: Proactive storage performance and capacity management

Questions?

Contact us for a custom demo

Call: 1-877-815-3799 (toll free)
Email: [email protected]
Web: www.intellimagic.net
Twitter: @IntelliMagic

Page 53: Proactive storage performance and capacity management

Appendix

Page 54: Proactive storage performance and capacity management

Platform Support – Distributed SAN

Vendor | Models
EMC | VNX (block), VMAX, DMX, CX
IBM | SVC, V7000, DS8000, XIV, DS3000, DS4000, DS5000
HP | 3PAR, P2000, P9500
HDS | VSP, AMS
Brocade | Fabric Switches