high performance storage in today’s critical applications · 2020. 7. 16. · how flash...

17
1 © 2013 IBM Corporation High Performance Storage in Today’s Critical Applications March 23, 2014 Andy Walls, IBM Fellow, CTO and Chief Architect IBM Flash Systems

Upload: others

Post on 27-Sep-2020

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: High Performance Storage in Today’s Critical Applications · 2020. 7. 16. · How Flash Accelerates Today’s Most Critical Applications • Latency – Inherent read latency –

1 © 2013 IBM Corporation

High Performance Storage in Today’s Critical Applications

March 23, 2014 Andy Walls, IBM Fellow, CTO and Chief Architect IBM Flash Systems

Page 2: High Performance Storage in Today’s Critical Applications · 2020. 7. 16. · How Flash Accelerates Today’s Most Critical Applications • Latency – Inherent read latency –

© 2013 IBM Corporation 2

Hard Disk Drive History

• RAMAC was the first hard disk drive! – One of the top technological inventions. . . .

EVER!!

• 5MB across 50 HUGE platters

• After 50 years, the capacity increase is incredible.

• As are the reliability increases. . . .

• Performance limited by the rate at which it can spin.

– 15K RPM

• Has not kept up with the speed of CPUs

RAMAC Prototype

Page 3: High Performance Storage in Today’s Critical Applications · 2020. 7. 16. · How Flash Accelerates Today’s Most Critical Applications • Latency – Inherent read latency –

© 2013 IBM Corporation 3

HDD growth focus: areal density for 50 years

HDD access latency: <10% / y for most of that period

Cache

Access Latency

Data Rate Areal Density

Data rate has just topped 100MB/sec. But RPM not increasing. New increases will come from linear density improvement

SO: With HDDs, Performance improvements have been gained by scaling out high speed disks and only using a portion

Outer Diameter

Hard Disk Drive History

Page 4: High Performance Storage in Today’s Critical Applications · 2020. 7. 16. · How Flash Accelerates Today’s Most Critical Applications • Latency – Inherent read latency –

© 2013 IBM Corporation 4

Hard Disk Drive Technology Has Not Kept Up With Advances in CPUs or CPU Scaling

As you can see from this database example, which uses rotating disk drives, even well-tuned databases have the opportunity to improve performance and reduce hardware resources

•App %

•Sys %

•I/O Wait %

Perc

ent

CPU Time

Clock Time Source: Internal IBM performance lab testing

Reducing I/O wait time can allow for higher server utilization

Page 5: High Performance Storage in Today’s Critical Applications · 2020. 7. 16. · How Flash Accelerates Today’s Most Critical Applications • Latency – Inherent read latency –

© 2013 IBM Corporation 5

Performance Gap

CPU performance up 10x this last decade

Storage has grown capacity but unable to keep up in performance

Systems are now Latency & IO bound resulting in significant performance gap

From 1980 to 2010, CPU performance has grown 60% per year*

…and yet, disk performance has grown ~5% per year during that same period**

IT Infrastructure Challenges

Page 6: High Performance Storage in Today’s Critical Applications · 2020. 7. 16. · How Flash Accelerates Today’s Most Critical Applications • Latency – Inherent read latency –

© 2013 IBM Corporation 6

Flash is a powerful accelerator for today’s critical applications

• Big Data – Hadoop, MongoDB, Cassandra

• High Performance Cloud

• Business Analytics

• OLTP

• HPC

Page 7: High Performance Storage in Today’s Critical Applications · 2020. 7. 16. · How Flash Accelerates Today’s Most Critical Applications • Latency – Inherent read latency –

© 2013 IBM Corporation 7

How Flash Accelerates Today’s Most Critical Applications

• Latency – Inherent read latency – Systems employ DRAM for buffering so write latency can be very fast

• IOPS – Very high IOPS – More importantly, high IOPS with low average response time under load. – More consistent performance - can handle temporary workload spikes

• High Throughput – Reduced table scan times – Reuced time for clones and snapshots – Reduced time for backup coalescence

• Reduction in batch windows

Page 8: High Performance Storage in Today’s Critical Applications · 2020. 7. 16. · How Flash Accelerates Today’s Most Critical Applications • Latency – Inherent read latency –

© 2013 IBM Corporation 8

The Impact of Low Latency on CPU Performance

100 microseconds : 1 second :: 1 second : 2.78 hours

MicroLatency deliver microseconds response time to accelerate critical applications to achieve competitive advantages

• Faster decision making • Increase revenue • Accelerate cost savings

• Eliminate wait time • Scale performance with

capacity

I/O Time Network Time CPU Time

I/O Time Network Time CPU Time Time Recovered

Disk-Based

FlashSystem

Page 9: High Performance Storage in Today’s Critical Applications · 2020. 7. 16. · How Flash Accelerates Today’s Most Critical Applications • Latency – Inherent read latency –

© 2013 IBM Corporation 9

The Value of Performance

Extreme Performance enable business to unleash the power of performance, scale, and insight to drive services and products to market faster • Improved end-user experience • Faster insights into critical applications

A 1-SECOND DELAY IN PAGE

LOAD TIME

= 7% LOSS IN CONVERSIONS

11% FEWER PAGE VIEWS

16% DECREASE IN CUSTOMER SATISFACTION

In dollar terms, this means that if your site typically earns $100,000 a day, this year you could lose $2.5 million in sales.

Source: Aberdeen Group

Page 10: High Performance Storage in Today’s Critical Applications · 2020. 7. 16. · How Flash Accelerates Today’s Most Critical Applications • Latency – Inherent read latency –

© 2013 IBM Corporation 10

Much has Changed Around Flash Enabling Technology

• Given the right controller technology, one really does not have to worry about endurance any more – IBM is a Leader in enabling MLC for enterprise applications

• Well designed all flash arrays can be designed with excellent write

performance

• Flash has excellent sequential throughput characteristics – Not just good random IOPs – Most workloads have some attributes of each and Flash excels

Page 11: High Performance Storage in Today’s Critical Applications · 2020. 7. 16. · How Flash Accelerates Today’s Most Critical Applications • Latency – Inherent read latency –

© 2013 IBM Corporation 11

Flash Offers Other Significant Advantages

• Power reductions – A key consideration in driving Internet data centers to Flash – Can be the main driver in internet data centers and Big Data

• Density

– Incredible densities per rack unit possible with Flash – Saves rack space, floor space

• Form Factors and Flexibility

– Can be placed in many parts of the infrastructure – Can go on DIMMs, PCIE slots, attached directly via cables, unique form factors,

etc.

4TB Custom Flash Module

Page 12: High Performance Storage in Today’s Critical Applications · 2020. 7. 16. · How Flash Accelerates Today’s Most Critical Applications • Latency – Inherent read latency –

© 2013 IBM Corporation 12

High Performance Networked Flash Storage Architectures

• Inside Traditional Storage Systems – Hybrid or pure storage

• All Flash Arrays

– SAN Attached – RDMA SAN

• IB SRP, iSER, RoCE – SAN “Less”

• Ethernet, iSCSI – Building blocks for scale out storage.

• Advantages – Shared! – High Availability built in – Advanced storage function

like Disaster Recovery – All flash array is flash

optimized from ground up

• Perceived weaknesses – Network latencies – Further away from CPU

Page 13: High Performance Storage in Today’s Critical Applications · 2020. 7. 16. · How Flash Accelerates Today’s Most Critical Applications • Latency – Inherent read latency –

© 2013 IBM Corporation 13

World Class and Consistent Performance!

0.00

0.25

0.50

0.75

1.00

1.25

1.50

1.75

2.00

2.25

2.50

2.75

3.00

3.25

3.50

3.75

4.00

0 200,000 400,000 600,000 800,000 1,000,000 1,200,000

Res

pons

e Ti

me

(ms)

IOPS

IBM FlashSystem 840 Random 4K Read/Write Performance

100% rr

90% rr-- 10% rw

80% rr-- 20% rw

70% rr-- 30% rw

60% rr-- 40% rw

50% rr-- 50% rw

40% rr-- 60% rw

30% rr-- 70% rw

20% rr-- 80% rw

10%rr -- 90% rw

100% rw

Page 14: High Performance Storage in Today’s Critical Applications · 2020. 7. 16. · How Flash Accelerates Today’s Most Critical Applications • Latency – Inherent read latency –

© 2013 IBM Corporation 14

High Performance Direct Attached Flash Storage Architectures

• PCIe Drawers – Dense and can be

attached to 2 servers

• PCIe Cards

• Flash DIMMs

• Advantages – Attached to lowest latency

buses – Memory bus is snooped – Uses existing infrastructure

for power/cooling • Perceived weaknesses

– No Inherent high availability – Mirroring more expensive

than RAID – No advanced DR or storage

functionality

Page 15: High Performance Storage in Today’s Critical Applications · 2020. 7. 16. · How Flash Accelerates Today’s Most Critical Applications • Latency – Inherent read latency –

© 2013 IBM Corporation 15

Bottlenecks in Flash Storage

• RAID Controllers – Flash Optimized RAID controllers with hardware assists now exist

• Network HBAs – Reductions in latency – RDMA protocols

•OS and Stack Latency! – Standard driver model adds significant latency and reduces IOPS per core by an

order of magnitude – Fusion-io Atomic Writes – sNVMe and SCSIe – IBM Power CAPI

• Many Legacy Applications written around HDDs – Added path length to coalesce, avoid store, etc.

Page 16: High Performance Storage in Today’s Critical Applications · 2020. 7. 16. · How Flash Accelerates Today’s Most Critical Applications • Latency – Inherent read latency –

© 2013 IBM Corporation 16

User Space

Concept • Attach FlashSystem to POWER8 via CAPI coherent attach • CAPI flash controller operates in user space to eliminate 97% of instruction path length • Lowest achievable overhead and latency memory to flash. • Saves up to 10-12 cores per 1M IOPs

Application

Kernel

LVM

Disk & Adapter DD

FileSystem

Application

User Space Library Shared Memory Work Queue in cache Hierarchy

* CAPI (Coherent Accelerator Processor Interface)

Legacy Filesystem Stack CAPI Flash Stack User Space

Bounce Buffering and context switch overheads

Memory pinning and mapping for DMA overhead

Standard PCI-express Bus

Lowest achievable latency and overhead from DRAM to Flash

20K instructions reduced to <500 (lower core overhead)

CAPI Bus

Allows many Cores to directly Drive IOP

CAPI Attached Flash Value

11.20.2013 IBM CONFIDENTIAL 16

Page 17: High Performance Storage in Today’s Critical Applications · 2020. 7. 16. · How Flash Accelerates Today’s Most Critical Applications • Latency – Inherent read latency –

© 2013 IBM Corporation 17

Workload Optimized Systems and Flash

• Analytics – Very fast table scans – Tremendous IOPS capability to identify patterns and

relationships

• OLTP – Credit card, travel reservation, other – Can share without sacrificing IOPs – But low response time is key

• Cloud and Big Data

– Either inside servers as hyper converged or – Linear scale out with QoS for Grid Scale.