
Post on 26-Feb-2016


Page 1: HuyLy@us.ibm.com 763.228.6463

© 2011 IBM Corporation. IBM Internal Use Only

Freakish Database Performance With Flash Storage

Page 2

Agenda

• Share some experience with using solid state/flash storage for database workloads:

• OLTP (2TB)

• Warehouse (76TB)

• Which workload characteristics can best leverage flash storage?

• What are some best practices?

Page 3

OLTP Workload

Initial profile

• A brokerage house package

• Batch cycle comprising five Java programs (only one can be parallelized)

• 1.5M transactions in 8 hours after extensive application and SQL tuning

• 1.68TB uncompressed

• Online backup (backup, then gzip) takes 36 hours

The Challenge

• Goal: 1.5M trans in 5.5 hours

• Stretch goal: 1.5M trans in 2 hours

• Improve backup time

• What is possible if CPU, memory, storage, and network are not constrained?

Page 4

The Setup

No holds barred

• 2 x 64 cores, 3.86 GHz, 1TB RAM

• 86TB HDD, 256GB cache – 2 ms average response time

• 1TB SSD

• 10GbE

Approaches

• Enabled compression

• No database tuning

• All-HDD

• Mixed – SSD (logs & temp), HDD (data & indexes)

• All-SSD

Page 5

Results

Mixed, SSD (logs & temp) and HDD (data & indexes): 14%

All-SSD: 26%

Disk utilization: < 1% busy

Average IOPS: 20

Throughput: 450KB/s

Application engines: 30

Uncompressed offline backup: 30 – 40 min

Compressed online/offline backup (SSD to HDD): 18 min

Accept all default database settings out of the box

• STMM (self-tuning memory manager)

• Auto runstats

• Auto online table reorg

Page 6

Application Engines Performance

Most improvements resulted from more CPUs for the application

• CPU intensive

• Verbose application logging

• Application logs generated more IOs than database!

• More application engines generating transactions to reduce batch elapsed time

• Low database IO profile

Page 7

Final Results

Results

Goal: 1.5M trans in 5.5 hours Y

Stretch goal: 1.5M trans in 2 hours Y

Improve backup time: 18 minutes v. 36 hours Y

Best result: 1.5M trans in 1.1 hours! (All SSD) Y
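
The elapsed times above can be turned into transaction rates and speed-up factors. The following is a minimal sketch that uses only the figures quoted on the slides (1.5M transactions, an 8-hour baseline, and a 36-hour v. 18-minute backup); the function and variable names are illustrative and not part of the original deck.

```python
# Figures quoted on the slides; the names below are illustrative only.
TRANSACTIONS = 1_500_000
BASELINE_HOURS = 8.0  # batch elapsed time before the hardware change


def speedup(baseline_hours: float, new_hours: float) -> float:
    """How many times faster the new elapsed time is than the baseline."""
    return baseline_hours / new_hours


def rate_per_second(transactions: int, hours: float) -> float:
    """Average transactions per second over the batch window."""
    return transactions / (hours * 3600)


runs = {"goal": 5.5, "stretch goal": 2.0, "best (all SSD)": 1.1}
for label, hours in runs.items():
    print(f"{label}: {rate_per_second(TRANSACTIONS, hours):.0f} trans/s, "
          f"{speedup(BASELINE_HOURS, hours):.1f}x the baseline")

# Backup: 36 hours before v. 18 minutes after, both expressed in minutes.
backup_speedup = (36 * 60) / 18
print(f"backup: {backup_speedup:.0f}x faster")
```

On these figures the best run works out to roughly 7.3x the 8-hour baseline, and the backup to 120x.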

Page 8

Warehouse Workload

Initial profile

• Servers and storage running at 100% all day long

• Maxed out at around 30 – 40 active users

• Half-stroked disks to get performance and throughput

The Challenge

• Aging servers and storage

• Data center floor space, cooling, and power consumption constraints

• Same or better performance

Page 9

The Setup

Approach

• Replacement will be very fast, very small, very simple

Page 10

Database IO Improvement for Warehouse Workload

76TB IBM SSD v. Old HDD

Sub-millisecond IO response time, sustained

• Synchronous reads 21.8x

• Synchronous writes 13.6x

• Asynchronous reads 17.6x

• Asynchronous writes 18.34x

• Data pages per asynchronous request 1.8x

Note: Asynchronous IOs are ~18x faster, and each asynchronous request is ~2x more effective due to the 32K page size, which amounts to a ~36x improvement.
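
The 36x figure in the note is the product of two rounded factors. A minimal sketch of that arithmetic follows, using only the numbers on this slide; the function name is illustrative. It also shows the product with the unrounded figures, which comes out slightly lower.

```python
# Factors quoted on the slide (SSD v. old HDD).
ASYNC_READ_SPEEDUP = 17.6
ASYNC_WRITE_SPEEDUP = 18.34
PAGES_PER_REQUEST_GAIN = 1.8


def effective_gain(io_speedup: float, pages_gain: float) -> float:
    """Combined gain: faster requests, each carrying more data pages."""
    return io_speedup * pages_gain


# The note rounds to ~18x and ~2x before multiplying.
rounded = effective_gain(18, 2)

# With the unrounded figures the product is a little smaller.
reads = effective_gain(ASYNC_READ_SPEEDUP, PAGES_PER_REQUEST_GAIN)
writes = effective_gain(ASYNC_WRITE_SPEEDUP, PAGES_PER_REQUEST_GAIN)

print(f"rounded: {rounded}x, async reads: {reads:.1f}x, async writes: {writes:.1f}x")
```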

Page 11

Benchmark Queries Improvement for Warehouse Workload

76TB IBM SSD v. Old HDD

Benchmark details

• Actual IO and CPU intensive queries captured from business users

• Runs weekly to monitor any performance degradation with respect to new and organic growth in the warehouse over time

• Noise queries (75) + benchmark queries (25) = 100

• Noise queries completed: 85% (all SSD) v. 32% (old HDD)

• BM queries completed: 100% (all SSD, first time ever) v. 64% (old HDD, historically never reached 100%)

• CPU utilization: 30% (all SSD) v. 100% (old HDD)

Page 12

Benchmark Queries Speed Up Factor for Warehouse Workload (Plotted on Logarithmic Scale)

76TB IBM SSD v. Old HDD

Speed up details

• Average: 2.21 (log) or 163.96x faster

• Median: 1.48 (log) or 29.96x faster (50% is at least ~30x faster)

• Low: 0.56 (log) or 3.59x faster

• High: 3.05 (log) or 1,113.56x faster

• Time is measured as elapsed time (prepare + execute + fetch)
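
The "(log)" values above appear to be base-10 logarithms of the speed-up factors. The sketch below checks that reading against the four figures on the slide; the base-10 interpretation is an assumption, and the dictionary names are illustrative.

```python
import math

# Speed-up factors quoted on the slide (SSD v. old HDD).
speedups = {
    "average": 163.96,
    "median": 29.96,
    "low": 3.59,
    "high": 1113.56,
}

# Assumed: the plotted value is log10 of the factor, rounded to two decimals.
log_values = {name: round(math.log10(factor), 2) for name, factor in speedups.items()}

for name, value in log_values.items():
    print(f"{name}: {speedups[name]}x faster -> {value} on the log scale")
```

Under that assumption the rounded results match the slide: 2.21, 1.48, 0.56 and 3.05.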

Page 13

CPU Utilization

About 30% busy. By the way, we are also using disk-level encryption (SED).

Page 14

EXP30 Ultra SSD

IO Specifications

• Each drive: SFF (1.8”), 1/5 of 1U, 387GB

• IO drawer: 30 drives (6 x 5). Total raw capacity: 11.6TB (30 x 387GB). Cache: 3.1GB

• IOPS: 400K (100% read) / 280K (70/30 R/W) / 165K (100% write)

• Two POWER 740 servers connected to one IO drawer

• PCIe attached via GX++ adapter (8Gb/s)

• Configured as 5+p LUNs (130GB LUNs)

Page 15

Deployment Considerations

• IO adapter card (HBA)

• At 120K – 400K IOPS per IO drawer, and 32K IO size, it is possible to saturate the HBA

• Plan for adequate number of HBAs

• If using SAN then be sure the bandwidth to the storage server is consistent along the whole path, for example, 8Gb/s

• Balance IOs across HBAs and front end ports for even utilization

• Be cautious about mixing flash storage & HDD drives in one HBA

• Fewer, larger LUNs (500GB – 700GB)

• LUNs do take up available system memory and CPU cycles on the server

• Multiple logical volumes per LUN, no reason to stripe LV across LUNs

• Use a large page size (32K) and extent size, but ensure that the database bufferpool(s) are adequately sized to accept big reads

• Optimize data movement with fewer IOPS. It is not about driving up IOPS
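
The HBA saturation point above follows from multiplying IOPS by IO size. The sketch below is a rough estimate under stated assumptions: every IO is a full 32K (taken as 32 KiB), links run at the 8Gb/s used as the example above, and protocol and encoding overhead are ignored. The function names are illustrative.

```python
import math

IO_SIZE_BYTES = 32 * 1024   # 32K IO size, assumed to mean 32 KiB
LINK_GBITS_PER_S = 8        # the 8Gb/s example link speed


def throughput_gbits(iops: int, io_size_bytes: int = IO_SIZE_BYTES) -> float:
    """Raw data rate in Gb/s if every IO moves io_size_bytes."""
    return iops * io_size_bytes * 8 / 1e9


def links_needed(iops: int, link_gbits: float = LINK_GBITS_PER_S) -> int:
    """Minimum number of links to carry the load, ignoring overhead."""
    return math.ceil(throughput_gbits(iops) / link_gbits)


for iops in (120_000, 400_000):
    print(f"{iops} IOPS x 32K = {throughput_gbits(iops):.1f} Gb/s, "
          f"at least {links_needed(iops)} x 8Gb/s links")
```

Even the low end of the range exceeds a single 8Gb/s link several times over, which is why the number of HBAs and the bandwidth along the whole path need planning.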

Page 16

Candidate Application Considerations

• High IO profile

• Indexes, data

• Database logs and temp spaces can already take advantage of cache write through, so they may not be the best candidates

• Applications that can parallelize well to take advantage of higher IO throughput

• Before we can process more transactions per second the applications need to be able to generate more transactions per second

• For example, we needed to increase the number of application engines from 3 to 30 in order to generate 8x throughput in transaction rate

• Applications that spend more time fetching result sets across a network, rather than executing complex queries in the database, will likely see less improvement (slow consumers)

• client_idle_wait_time (ms) (time spent waiting for client/application to send its next request)

• If the database spends more time waiting for client/application to send work then improving database response time alone will not improve throughput.

• Increase application parallelism

• Look for network congestion issues

• call monreport.dbsummary(600), examine client_idle_wait_time
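
One way to act on the client_idle_wait_time reading is to compare it with the time the database spent working on requests over the same interval. The sketch below is a hedged illustration: the slide names only client_idle_wait_time, so the use of a request-time figure (such as the total_rqst_time monitor element) as the comparison is an assumption, and the sample numbers are hypothetical rather than measured.

```python
def client_idle_share(client_idle_wait_ms: float, total_rqst_ms: float) -> float:
    """Fraction of the combined time spent waiting on the client/application."""
    total = client_idle_wait_ms + total_rqst_ms
    if total <= 0:
        raise ValueError("no time recorded in the monitoring interval")
    return client_idle_wait_ms / total


def is_client_bound(client_idle_wait_ms: float, total_rqst_ms: float) -> bool:
    """True when the database waits on the client longer than it works."""
    return client_idle_share(client_idle_wait_ms, total_rqst_ms) > 0.5


# Hypothetical values read from a 600-second summary report.
idle_ms, rqst_ms = 450_000, 150_000
print(f"client idle share: {client_idle_share(idle_ms, rqst_ms):.0%}")
print("client bound" if is_client_bound(idle_ms, rqst_ms) else "database bound")
```

When the share is above one half, faster storage alone is unlikely to raise throughput; application parallelism and the network are the places to look first.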

Page 17

Why Consider Flash Storage

• Greatly beneficial for high IO workloads

• Much smaller footprint, much more energy efficient

• Servers (11), IO drawers (7), power supply all fit in one rack!

• Achieve high performance and throughput quickly, without tuning

• Performance, reliability, price