hStorage-DB: Heterogeneity-aware Data Management to Exploit the Full Capability of Hybrid Storage Systems

Tian Luo, Rubao Lee, Xiaodong Zhang (The Ohio State University)
Michael Mesnier, Feng Chen (Intel Labs)

Post on 01-Apr-2015



Heterogeneous Storage Resources vs. Diverse QoS Requirements of DB Requests

• Storage advances give us:
  – High-capacity, low-cost, but slow hard disk devices (HDD)
  – Fast, low-power, but expensive solid state devices (SSD)
  – HDDs and SSDs co-exist because each has unique merits and limits

• DB requests have diverse QoS requirements:
  – Different access patterns: bandwidth/latency demands
  – Different priorities of data processing requests
  – Dynamic changes of requirements

• Hybrid storage can satisfy the diverse QoS requirements of DB requests
  – It should be automatic and adaptive, with low overhead
  – But this comes with challenges


Challenges for Hybrid Storage Systems to Satisfy Diverse QoS Requirements

• DBMS (What I/O services do I need as a storage user?)
  – Classification of I/O requests into different types
  – hStorage awareness
  – DBMS enhancements to use the classifications automatically

• hStorage (What can I do for you as a service provider?)
  – Clear definition of the supported QoS classifications
  – Hide device details from the DBMS
  – Efficient data management among heterogeneous devices

• Communication between DBMS and hStorage
  – Rich information to deliver, but limited by interface abilities
  – Needs a standard, general-purpose protocol


Current interface to access storage

read/write(int fd, void *buf, size_t count);

  – fd: on-disk location
  – buf: in-memory data
  – count: request size

This interface cannot tell the storage system the QoS of each request, so we must take other approaches.


DBA-based Approach

• DBAs decide data placement among heterogeneous devices based on experience

• Limitations:
  – Significant human effort: requires expertise in both DB and storage
  – Large granularity, e.g. table- or partition-based data placement
  – Static storage layout:
    • Tuned for the "common" case
    • Cannot respond well to execution dynamics

[Figure: a DBMS with indexes placed on SSD and other data placed on HDD]


Monitoring-based Solutions

• Storage systems automatically make data placement and replacement decisions by monitoring access patterns
  – LRU (a basic structure), LIRS (MySQL), ARC (IBM I/O controller)
  – Examples from industry:
    • Solid State Hybrid Drive (Seagate)
    • Easy Tier (IBM)

• Limitations:
  – Takes time to recognize access patterns
    • Hard to handle dynamics in short periods
  – With concurrency, access patterns cannot be easily detected
  – Certain critical insights are not related to access patterns
    • Domain information (available from the DBMS) is not utilized


What Information from the DBMS Can We Use?

• System catalog
  – Data type: index, regular table
  – Ownership of data sets: e.g. VIP user, regular user

• Query optimizer
  – Order of operations and access paths
  – Estimated frequency of accesses to related data

• Query planner
  – Access patterns

• Execution engine
  – Life cycles of data usage

Together, these are unorganized pieces of semantic information about I/O requests.


DBMS Knowledge is not Well Utilized

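[Figure: DBMS I/O path. The query optimizer, system catalog, and execution engine sit above the buffer pool manager and the storage manager; requests leave the DBMS as I/O requests and reach storage through the block interface (r/w, LBN, data, size).]

The block interface does not consider critical semantic information for storage management.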


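Goal: Organize and Utilize DBMS Semantic Information

[Figure: inside the DBMS, the connection pool (User1, User2, ...), query optimizer, buffer pool, and background processes (checkpoint, vacuum) hold semantic information such as access patterns (sequential, random, repeated scan) and data types (system table, index, user table, temporary data). A semantic gap separates the DBMS from storage.]

The mission of hStorage-DB is to fill this gap.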


hStorage-DB: DBMS for hStorage

• Objectives:
  – Automatic system management
  – High performance
    • Use the domain knowledge available within the DBMS for storage I/O
    • Fine-grained data management (block granularity)
    • Respond well to the dynamics of DB requests with different QoS requirements

• System design outline
  – An hStorage system specifies a set of QoS policies
  – At runtime, the DBMS selects the needed policy for each I/O request based on the organized semantic information
  – I/O requests and their QoS policies are passed to the hStorage system
  – The hStorage system takes data placement actions accordingly


Outline

• Introduction
• hStorage-DB
• Caching priority of each I/O request
• Evaluation


Structure of hStorage-DB

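[Figure: structure of hStorage-DB. The query optimizer, query planner, and execution engine feed the buffer pool manager; each request reaches the storage manager together with its semantic information. The storage manager consults the policy assignment table (Info 1, ..., Info N to QoS policy) and sends the I/O request plus its QoS policy to the storage system control logic, which manages the SSDs and HDDs.]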


Highlights of hStorage-DB

• Policy assignment table
  – Stores all the rules for assigning a QoS policy to each I/O request
  – Assignments are made based on organized DB semantic information

• Communication between a DBMS and hStorage
  – The QoS policy for each I/O request is delivered to an hStorage system through the "Differentiated Storage Services" protocol (SOSP'11)
  – The hStorage system acts accordingly


The Interface Used in hStorage-DB

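#include <fcntl.h>
#include <sys/uio.h>

struct iovec myiov[2];
char qos = 19;                        /* QoS policy of this request */

/* Open with a flag; O_CLASSIFIED is not a standard POSIX flag */
int fd = open("foo", O_RDWR|O_CLASSIFIED, 0666);

myiov[0].iov_base = &qos;             /* first element: the QoS policy */
myiov[0].iov_len = 1;
myiov[1].iov_base = "Hello, world!";  /* second element: the payload */
myiov[1].iov_len = 13;

writev(fd, myiov, 2);                 /* QoS is delivered with the payload */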


QoS Policies

• They are high-level abstractions of hStorage systems
  – Hide device complexities
  – Match resource characteristics

• QoS policy examples:
  – High bandwidth (parallelism in SSD/disk array)
  – Low latency for random accesses (SSD)
  – Low latency for large sequential accesses (HDD)
  – Reliability (data duplication)

• For a caching system
  – Caching priorities: Priority 1, Priority 2, …, Bypass


Outline

• Introduction
• Design of hStorage-DB
• Caching priority for each I/O request
• Evaluation


Caching Priorities as QoS Policies

• Priorities are enumerated
  – E.g. 1, 2, 3, …, N
  – Priority 1 is the highest priority
    • Data from high-priority requests can evict data cached for low-priority requests

• Special "priorities"
  – Bypass
    • Requests with this priority do not affect in-cache data
  – Eviction
    • Data accessed by requests with an eviction "priority" is immediately evicted from the cache
  – Write buffer
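
The priority rules above can be sketched in C. This is only an illustration under stated assumptions: the constant values, the names `cache_action` and `can_evict`, the choice of N = 8, and the rule that equal priorities do not evict each other are all hypothetical and do not come from the slides.

```c
#include <stdbool.h>

/* Hypothetical encoding of the caching "priorities" named on the slide. */
enum { PRIO_HIGHEST = 1, PRIO_LOWEST = 8 };   /* N = 8 is an arbitrary choice */
enum { PRIO_BYPASS = 100, PRIO_EVICT = 101, PRIO_WRITE_BUFFER = 102 };

enum action { ACT_CACHE, ACT_SKIP, ACT_EVICT, ACT_BUFFER_WRITE };

/* Decide what the cache does with the block touched by a request. */
enum action cache_action(int req_prio)
{
    if (req_prio == PRIO_BYPASS)
        return ACT_SKIP;            /* leave in-cache data untouched */
    if (req_prio == PRIO_EVICT)
        return ACT_EVICT;           /* drop the block from the cache right away */
    if (req_prio == PRIO_WRITE_BUFFER)
        return ACT_BUFFER_WRITE;    /* absorb the write in the cache */
    return ACT_CACHE;               /* numbered priority: candidate for caching */
}

/* A request may displace a cached block only if it has a strictly higher
 * priority, i.e. a smaller priority number (assumed tie rule: no eviction). */
bool can_evict(int req_prio, int victim_prio)
{
    if (req_prio < PRIO_HIGHEST || req_prio > PRIO_LOWEST)
        return false;               /* special "priorities" never displace data */
    return req_prio < victim_prio;
}
```

For example, `can_evict(1, 3)` is true while `can_evict(3, 1)` is false, which matches the statement that Priority 1 is the highest priority.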


From Semantic Information to Caching Priorities

• Principles:
  1. Possibility of data reuse: no reuse, no cache
  2. Benefit from cache: no benefit, no cache (repeated scan)

• Methodology:
  1. Classify requests into different types (focus on OLAP)
    • Sequential access
    • Random access
    • Temporary data requests
    • Update requests
  2. Associate each type with a caching priority
    • Some types are further divided into subtypes
  3. The hStorage system makes placement decisions accordingly upon receiving each I/O request


Policy Assignment Table

[Figure: policy assignment table mapping request types (sequential accesses, random accesses, temporary data accesses, temporary data delete, updates) to QoS policies (Priority 1, Priority 2, …, Priority N, Bypass, Eviction, Write Buffer).]
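
A policy assignment table can be sketched as a simple lookup from request type to QoS policy. The arrows of the original figure are not recoverable from this transcript, so the specific mapping below (sequential to Bypass, temporary data to Priority 1, temporary data delete to Eviction, updates to Write Buffer) is an assumption chosen to agree with the evaluation slides, and all type and function names are hypothetical.

```c
/* Request types listed on the slide. */
enum req_type { REQ_SEQUENTIAL, REQ_RANDOM, REQ_TEMP_DATA, REQ_TEMP_DELETE, REQ_UPDATE };

/* QoS policies listed on the slide. */
enum policy { POLICY_PRIORITY, POLICY_BYPASS, POLICY_EVICTION, POLICY_WRITE_BUFFER };

struct qos {
    enum policy kind;
    int priority;   /* meaningful only when kind == POLICY_PRIORITY */
};

/* Assumed mapping; random_prio is the numbered priority the DBMS derived
 * for a random request (for example from the query plan tree). */
struct qos assign_policy(enum req_type type, int random_prio)
{
    switch (type) {
    case REQ_SEQUENTIAL:  return (struct qos){ POLICY_BYPASS, 0 };
    case REQ_RANDOM:      return (struct qos){ POLICY_PRIORITY, random_prio };
    case REQ_TEMP_DATA:   return (struct qos){ POLICY_PRIORITY, 1 };
    case REQ_TEMP_DELETE: return (struct qos){ POLICY_EVICTION, 0 };
    case REQ_UPDATE:      return (struct qos){ POLICY_WRITE_BUFFER, 0 };
    }
    return (struct qos){ POLICY_BYPASS, 0 };   /* unknown type: do not disturb the cache */
}
```

Keeping the rules in one function (or one table) means the rest of the DBMS only has to classify a request; the choice of policy stays in a single place.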


Random Requests

• Determined by operator position in the query plan tree
• Follows the iteration model

[Figure: query plan tree built from Join and Hash operators over index scans on t.a, t.b, and t.c and a sequential scan on t.b; operators are labeled Priority 2, Priority 4, or Bypass according to their position.]


Concurrent Queries

• Concurrent queries may access the same object
  – This causes non-deterministic priorities for random requests
    • Because each query may have a different query plan tree

• Solution
  – A data structure that "aggregates" all concurrent query plan trees
  – The data structure is updated at the start and end of each query
  – Each of the concurrent queries is assigned a QoS policy based on analytics
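
One possible shape for such an aggregating structure is a per-object counter table that is updated when a query starts and ends. The slides say only that the policy is assigned "based on analytics", so the resolution rule used here (take the highest priority requested by any running query) is an assumption, as are the limits and all names.

```c
#define MAX_OBJECTS 64   /* hypothetical limits for the sketch */
#define MAX_PRIO    8
#define NO_PRIORITY 0    /* no running query touches the object */

/* active[obj][p] counts running queries whose plan asks for priority p on obj. */
static int active[MAX_OBJECTS][MAX_PRIO + 1];

static int valid(int obj, int prio)
{
    return obj >= 0 && obj < MAX_OBJECTS && prio >= 1 && prio <= MAX_PRIO;
}

/* Called at query start for each object the plan accesses randomly. */
void query_start(int obj, int prio)
{
    if (valid(obj, prio))
        active[obj][prio]++;
}

/* Called at query end with the same arguments as query_start. */
void query_end(int obj, int prio)
{
    if (valid(obj, prio) && active[obj][prio] > 0)
        active[obj][prio]--;
}

/* Assumed rule: use the highest priority (smallest number) that any
 * currently running query requests for this object. */
int effective_priority(int obj)
{
    if (obj < 0 || obj >= MAX_OBJECTS)
        return NO_PRIORITY;
    for (int p = 1; p <= MAX_PRIO; p++)
        if (active[obj][p] > 0)
            return p;
    return NO_PRIORITY;
}
```

With this sketch, two concurrent queries that request priorities 4 and 2 for the same object both see priority 2 until the second query ends, after which the object falls back to priority 4.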


Outline

• Introduction
• Design of hStorage-DB
• Caching priority for each I/O request
• Evaluation


Experimental setup

• Dual-machine setup (with 10Gb Ethernet)
  – A DBMS: hStorage-DB, based on PostgreSQL
  – A dedicated storage system with an SSD cache

• Configuration
  – Xeon, 2-way, quad-core 2.33GHz, 8GB RAM
  – 2 Seagate 15.7K rpm HDDs
  – SSD cache: Intel 320 Series 300GB (32GB used)

• Workload
  – TPC-H at scale factor 30 (46GB with 7 indexes)


Diverse Request Types in TPC-H

[Figure: stacked bar chart of the share (0% to 100%) of temporary data (Tmp.), random (Rand.), and sequential (Seq.) requests in each of TPC-H queries 1 to 22.]

• Most queries are dominated by sequential requests
• Queries 2, 8, 9, 20, and 21 have a large number of random requests
• Query 18 has a large number of temporary data requests


No Overhead for Cache-Insensitive Queries

Execution time (sec):

Query   HDD-only   LRU   hStorage-DB   SSD-only
1       317        368   317           313
5       279        323   280           279
11      65         68    65            62
19      252        315   254           245

• Current SSDs cannot speed up these queries
• Caching may harm performance (LRU)
• hStorage-DB does not incur overhead for sequential requests


Working Well for Cache-Effective Queries

Query 9, execution time (sec):

Configuration   Time (sec)   Speedup over HDD-only
HDD-only        35865        baseline
LRU             6216         5.77X
hStorage-DB     6120         5.86X
SSD-only        4986         7.19X

• Random requests benefit from SSD
• High locality can be captured by traditional LRU
• hStorage-DB achieves high performance without monitoring effort


Efficiently Handling Temporary Data Requests

• hStorage-DB:
  – Temporary data is cached for its whole lifetime and evicted immediately at the end of its lifetime
  – The lifetime is hard to detect unless it is conveyed semantically

Query 18, execution time (sec):

Configuration   Time (sec)   Speedup over HDD-only
HDD-only        8950         baseline
LRU             8694         1.03X
hStorage-DB     6146         1.46X
SSD-only        5990         1.49X


Concurrency (Throughput)

[Figure: two bar charts of execution time (sec) for Queries 9 and 18 under HDD-only, LRU, hStorage-DB, and SSD-only; one panel shows performance in concurrency and the other shows performance in independent execution.]


Summary

• A DBMS can exploit organized semantic information
• A DBMS should be hStorage-aware (QoS policies)
• A set of rules determines the QoS policy (caching priority) for each I/O request
• Experiments on hStorage-DB show its effectiveness


Thank you!

Questions?