The 5 Minute Rule: Jim Gray, Microsoft Research


Page 1: The 5 Minute Rule

Jim Gray
Microsoft Research
Gray@Microsoft.com
http://www.Research.Microsoft.com/~Gray/talks

Kilo 10^3
Mega 10^6
Giga 10^9
Tera 10^12 (today, we are here)
Peta 10^15
Exa 10^18

Page 2: Storage Hierarchy (9 levels)

• Cache 1, 2
• Main (1, 2, 3 if NUMA)
• Disk (1 (cached), 2)
• Tape (1 (mounted), 2)

Page 3: Meta-Message: Technology Ratios Are Important

• If everything gets faster & cheaper at the same rate,
  THEN nothing really changes.
• Things getting MUCH BETTER:
  – communication speed & cost 1,000x
  – processor speed & cost 100x
  – storage size & cost 100x
• Things staying about the same:
  – speed of light (more or less constant)
  – people (10x more expensive)
  – storage speed (only 10x better)

Page 4: Today's Storage Hierarchy: Speed & Capacity vs Cost Tradeoffs

[Figure: two log-log panels over the levels Cache, Main, Secondary (Disc), Online Tape, Nearline Tape, and Offline Tape. "Size vs Speed" plots typical system size (bytes, 10^3 to 10^15) against access time (seconds, 10^-9 to 10^3); "Price vs Speed" plots $/MB (10^-4 to 10^4) against access time.]

Page 5: Storage Ratios Changed

• 10x better access time
• 10x more bandwidth
• 4,000x lower media price
• DRAM/DISK 100:1 to 10:1 to 50:1

[Figure: three panels over 1980 to 2000. "Disk Performance vs Time" plots access time (ms) and bandwidth (MB/s); "Disk Performance vs Time (accesses/second & Capacity)" plots accesses per second and disk capacity (GB); "Storage Price vs Time" plots $/MB on a log scale from 0.01 to 10,000.]

Page 6: Thesis: Performance = Storage Accesses, not Instructions Executed

• In the "old days" we counted instructions and IOs
• Now we count memory references
• Processors wait most of the time

[Figure: "Where the time goes: clock ticks used by AlphaSort components", broken into Sort, Disc Wait, OS, Memory Wait, D-Cache Miss, I-Cache Miss, and B-Cache Data Miss.]

Page 7: The Pico Processor

[Diagram: a 1 mm³ Pico Processor (1 M SPECmarks) on top of a storage hierarchy:]
• 10 pico-second RAM: megabyte
• 10 nano-second RAM: 10 gigabyte
• 10 microsecond RAM: 1 terabyte
• 10 millisecond disc: 100 terabyte
• 10 second tape archive: 100 petabyte

Annotations: 10^6 clocks/fault to bulk RAM. Event horizon on chip. VM reincarnated. Multi-program cache. Terror Bytes!

Page 8: Storage Latency: How Far Away is the Data?

Relative latency (1 = a register access), with a human-scale analogy at one minute per unit:
• Registers: 1 (My Head, 1 min)
• On Chip Cache: 2 (This Room)
• On Board Cache: 10 (This Campus, 10 min)
• Memory: 100 (Sacramento, 1.5 hr)
• Disk: 10^6 (Pluto, 2 Years)
• Tape / Optical Robot: 10^9 (Andromeda, 2,000 Years)
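To make the analogy concrete, here is a small Python sketch that converts each relative latency into human time, assuming (as the slide's labels suggest) that one unit corresponds to one minute. The helper `human_scale` and the 365-day year are our own illustrative choices; the results land close to the slide's rounded figures (about 1.7 hr rather than 1.5 hr for memory, about 1,900 years for the robot).

```python
# Relative latencies from the slide (1 = register access).
LEVELS = [
    ("Registers", 1),
    ("On Chip Cache", 2),
    ("On Board Cache", 10),
    ("Memory", 100),
    ("Disk", 10**6),
    ("Tape / Optical Robot", 10**9),
]

MINUTES_PER_HOUR = 60
MINUTES_PER_YEAR = 60 * 24 * 365  # assumes a 365-day year


def human_scale(units: int) -> str:
    """Express a latency as human time, treating one unit as one minute."""
    minutes = units
    if minutes < MINUTES_PER_HOUR:
        return f"{minutes} min"
    if minutes < MINUTES_PER_YEAR:
        return f"{minutes / MINUTES_PER_HOUR:.1f} hr"
    return f"{minutes / MINUTES_PER_YEAR:,.0f} years"


for name, units in LEVELS:
    print(f"{name:<22}{units:>15,}  ~ {human_scale(units)}")
```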

Page 9: The Five Minute Rule

• Trade DRAM for Disk Accesses
• Cost of an access (DriveCost / Access_per_second)
• Cost of a DRAM page ($/MB / pages_per_MB)
• Break even has two terms: a Technology term (PagesPerMBofDRAM / AccessPerSecondPerDisk) and an Economic term (PricePerDiskDrive / PricePerMBofDRAM)
• Grew page size to compensate for changing ratios.
• Still at 5 minutes for random access, 1 minute for sequential

$$\text{BreakEvenReferenceInterval} = \frac{\text{PagesPerMBofDRAM}}{\text{AccessPerSecondPerDisk}} \times \frac{\text{PricePerDiskDrive}}{\text{PricePerMBofDRAM}}$$
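The break-even formula can be evaluated directly. This is a minimal Python sketch, not the author's code: `break_even_interval_s` is a hypothetical helper, the prices and the roughly 100 random accesses/s per disk are taken from the 1998 "For the Record" table later in this deck, and the 8 KB page size (128 pages per MB) is an assumption.

```python
def break_even_interval_s(pages_per_mb_dram: float,
                          accesses_per_sec_per_disk: float,
                          price_per_disk_drive: float,
                          price_per_mb_dram: float) -> float:
    """Break-even reference interval in seconds.

    Pages re-referenced more often than this are cheaper to keep in DRAM;
    pages referenced less often are cheaper to re-read from disk.
    """
    technology_term = pages_per_mb_dram / accesses_per_sec_per_disk
    economic_term = price_per_disk_drive / price_per_mb_dram
    return technology_term * economic_term


# Illustrative 1998 inputs: $900 disk, $5000/GB DRAM (= $5/MB),
# ~100 random accesses/s per disk, and an ASSUMED 8 KB page (128 pages/MB).
interval = break_even_interval_s(pages_per_mb_dram=128,
                                 accesses_per_sec_per_disk=100,
                                 price_per_disk_drive=900,
                                 price_per_mb_dram=5)
print(f"break-even interval: {interval:.0f} s ({interval / 60:.1f} min)")
```

With these inputs the interval is about 230 s (roughly 4 minutes), the same order as the rule's five minutes. Multiplying both prices by the same factor leaves the result unchanged, which is the meta-message again: only the ratios matter.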

Page 10: Shows Best Index Page Size ~16KB

Index Page Utility vs Page Size and Index Element Size (utility; page size in KB)

Entry size      2     4     8     16    32    64    128
16 byte       0.64  0.72  0.78  0.82  0.79  0.69  0.54
32 byte       0.54  0.62  0.69  0.73  0.71  0.63  0.50
64 byte       0.44  0.53  0.60  0.64  0.64  0.57  0.45
128 byte      0.34  0.43  0.51  0.56  0.56  0.51  0.41

Index Page Utility vs Page Size and Disk Performance (utility; page size in KB)

Transfer rate   2     4     8     16    32    64    128
40 MB/s       0.65  0.74  0.83  0.91  0.97  0.99  0.94
10 MB/s       0.64  0.72  0.78  0.82  0.79  0.69  0.54
5 MB/s        0.62  0.69  0.73  0.71  0.63  0.50  0.34
3 MB/s        0.51  0.56  0.58  0.54  0.46  0.34  0.22
1 MB/s        0.40  0.44  0.44  0.41  0.33  0.24  0.16
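One way to read these curves: an index page is worth the number of B-tree levels it lets a search skip, roughly log2 of the entries on the page, divided by the time to read it. The Python sketch below is our reconstruction, not a formula stated on the slide; the 10 ms latency, 70% page fill, 1 KB = 1,024 bytes and 1 MB/s = 10^6 bytes/s are assumptions, and `index_page_utility` and `best_page_kb` are hypothetical names. With these assumptions it reproduces the first table, and the 40, 10 and 5 MB/s rows of the second (which evidently use 16-byte entries, since its 10 MB/s row equals the 16-byte row), to within about 0.01. It does not reproduce the 3 MB/s and 1 MB/s rows, which must rest on disk parameters the slide does not give.

```python
import math

LATENCY_S = 0.010   # ASSUMED average disk access latency: 10 ms
FILL_FACTOR = 0.70  # ASSUMED fraction of each index page in use
KB = 1024           # page sizes in KB of 1024 bytes
MB = 10**6          # ASSUMED transfer rates in MB/s of 10**6 bytes

PAGE_SIZES_KB = [2, 4, 8, 16, 32, 64, 128]


def index_page_utility(page_kb: int, entry_bytes: int, rate_mb_s: float) -> float:
    """B-tree levels traversed per page read, divided by read time in ms.

    Benefit: log2(entries on the page). Cost: latency + page transfer time.
    """
    entries = page_kb * KB * FILL_FACTOR / entry_bytes
    access_ms = (LATENCY_S + page_kb * KB / (rate_mb_s * MB)) * 1000
    return math.log2(entries) / access_ms


def best_page_kb(entry_bytes: int, rate_mb_s: float) -> int:
    """Page size (from the slide's range) with the highest utility."""
    return max(PAGE_SIZES_KB,
               key=lambda p: index_page_utility(p, entry_bytes, rate_mb_s))


for rate in (40, 10, 5):
    row = [index_page_utility(p, 16, rate) for p in PAGE_SIZES_KB]
    print(f"{rate:>3} MB/s", " ".join(f"{u:.2f}" for u in row),
          f"best={best_page_kb(16, rate)} KB")
```

Faster transfer lowers the cost of large pages, so the best size moves up (8 KB at 5 MB/s, 16 KB at 10 MB/s, 64 KB at 40 MB/s in this model).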

Page 11: Standard Storage Metrics

• Capacity:
  – RAM: MB and $/MB: today at 10 MB & 100 $/MB
  – Disk: GB and $/GB: today at 10 GB and 200 $/GB
  – Tape: TB and $/TB: today at .1 TB and 25 k$/TB (nearline)
• Access time (latency)
  – RAM: 100 ns
  – Disk: 10 ms
  – Tape: 30 second pick, 30 second position
• Transfer rate
  – RAM: 1 GB/s
  – Disk: 5 MB/s; arrays can go to 1 GB/s
  – Tape: 5 MB/s; striping is problematic

Page 12: New Storage Metrics: Kaps, Maps, SCAN?

• Kaps: How many kilobyte objects served per second
  – The file server, transaction processing metric
  – This is the OLD metric.
• Maps: How many megabyte objects served per second
  – The Multi-Media metric
• SCAN: How long to scan all the data
  – the data mining and utility metric
• And
  – Kaps/$, Maps/$, TBscan/$

Page 13: For the Record (good 1998 devices packaged in system:
http://www.tpc.org/results/individual_results/Dell/dell.6100.9801.es.pdf)

                     DRAM     DISK     TAPE robot
Unit capacity (GB)   1        9        35 (x 14 tapes)
Unit price ($)       5000     900      10000
$/GB                 5000     100      20
Latency (s)          1.E-7    1.E-2    3.E+1
Bandwidth (MB/s)     500      5        5
Kaps                 5.E+5    1.E+2    3.E-2
Maps                 5.E+2    4.76     3.E-2
Scan time (s)        2        1800     98000
$/Kaps               1.E-10   1.E-7    3.E-3
$/Maps               1.E-7    2.E-6    3.E-3
$/TBscan             $0.11    $2       $296
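The Kaps, Maps and scan rows follow from latency, bandwidth and capacity: each kilobyte (or megabyte) object costs one latency plus its transfer time, and a scan reads the whole unit at full bandwidth. The Python sketch below recomputes those rows. The function names are hypothetical, 1 KB is taken as 10^-3 MB, and the 3-year amortization behind `dollars_per_access` is our assumption, chosen because it matches the table's $/Kaps and $/Maps columns, not something the slide states.

```python
SECONDS_PER_3_YEARS = 3 * 365 * 24 * 3600  # ASSUMED 3-year depreciation window


def kaps(latency_s: float, bandwidth_mb_s: float) -> float:
    """Kilobyte objects served per second: 1 / (latency + 1 KB transfer)."""
    return 1 / (latency_s + 1e-3 / bandwidth_mb_s)


def maps(latency_s: float, bandwidth_mb_s: float) -> float:
    """Megabyte objects served per second: 1 / (latency + 1 MB transfer)."""
    return 1 / (latency_s + 1 / bandwidth_mb_s)


def scan_seconds(capacity_gb: float, bandwidth_mb_s: float) -> float:
    """Time to read the whole unit sequentially."""
    return capacity_gb * 1e3 / bandwidth_mb_s


def dollars_per_access(price: float, rate_per_s: float) -> float:
    """Unit price spread over every access it can serve in 3 years."""
    return price / (rate_per_s * SECONDS_PER_3_YEARS)


devices = {  # name: (capacity GB, price $, latency s, bandwidth MB/s)
    "DRAM": (1, 5000, 1e-7, 500),
    "DISK": (9, 900, 1e-2, 5),
    "TAPE robot": (35 * 14, 10000, 30, 5),
}
for name, (cap, price, lat, bw) in devices.items():
    k, m = kaps(lat, bw), maps(lat, bw)
    print(f"{name:10s} Kaps={k:.1e} Maps={m:.2g} "
          f"scan={scan_seconds(cap, bw):,.0f} s "
          f"$/Kaps={dollars_per_access(price, k):.0e} "
          f"$/Maps={dollars_per_access(price, m):.0e}")
```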

Page 14: How To Get Lots of Maps, SCANs

• parallelism: use many little devices in parallel
• Beware of the media myth
• Beware of the access time myth

1 Terabyte at 10 MB/s: 1.2 days to scan.
1 Terabyte, 1,000 x parallel: 100 seconds SCAN.

Parallelism: divide a big problem into many smaller ones to be solved in parallel.
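The scan arithmetic is just size divided by aggregate bandwidth. A minimal Python sketch, assuming decimal units and data spread evenly over identical devices with no coordination overhead (the helper name `scan_time_s` is hypothetical):

```python
def scan_time_s(data_bytes: float, rate_bytes_per_s: float, devices: int = 1) -> float:
    """Seconds to scan data striped evenly across `devices` identical devices."""
    return data_bytes / (rate_bytes_per_s * devices)


TB = 10**12
MB = 10**6

serial = scan_time_s(1 * TB, 10 * MB)
parallel = scan_time_s(1 * TB, 10 * MB, devices=1000)
print(f"1 device:     {serial:,.0f} s = {serial / 86400:.1f} days")
print(f"1000 devices: {parallel:,.0f} s")
```

One terabyte at 10 MB/s is 10^5 seconds, about 1.2 days; a thousand such devices working in parallel cut that to 100 seconds.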

Page 15: The Disk Farm On a Card

The 100GB disc card (14"): an array of discs. Can be used as
• 100 discs
• 1 striped disc
• 10 Fault Tolerant discs
• ...etc.
LOTS of accesses/second and bandwidth.

Life is cheap, it's the accessories that cost ya.

Processors are cheap, it's the peripherals that cost ya (a 10k$ disc card).

Page 16: Tape Farms for Tertiary Storage, Not Mainframe Silos

Many independent tape robots (like a disc farm). Scan in 27 hours.
• One 10K$ robot: 14 tapes, 500 GB, 5 MB/s, 20$/GB, 30 Maps
• 100 robots: 1M$, 50TB, 50$/GB, 3K Maps, 27 hr Scan
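The farm numbers scale linearly: capacity, bandwidth and price all grow with the robot count, so the scan time stays at the single-robot figure. The sketch below uses hypothetical `TapeRobot` and `farm_summary` names and assumes one 5 MB/s drive per robot and decimal units; it reproduces the 50 TB, 1M$ and roughly 27 hour scan.

```python
from dataclasses import dataclass


@dataclass
class TapeRobot:
    price_usd: float
    capacity_gb: float
    rate_mb_s: float  # ASSUMED: one drive per robot


def farm_summary(robot: TapeRobot, count: int) -> dict:
    """Aggregate a farm of independent robots that all stream in parallel."""
    capacity_gb = robot.capacity_gb * count
    rate_mb_s = robot.rate_mb_s * count
    return {
        "capacity_tb": capacity_gb / 1000,
        "price_usd": robot.price_usd * count,
        "scan_hours": capacity_gb * 1000 / rate_mb_s / 3600,
    }


robot = TapeRobot(price_usd=10_000, capacity_gb=500, rate_mb_s=5)
single = farm_summary(robot, 1)
farm = farm_summary(robot, 100)
print(single)
print(farm)
```

The contrast is with a silo like the STC robot on the next slide, where thousands of tapes share a handful of readers, so scan time is bounded by those few readers rather than by the tape count.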

Page 17: The Metrics: Disk and Tape Farms Win

[Figure: log-scale bar chart (0.01 to 1,000,000) comparing a 1000 x Disc Farm, an STC Tape Robot (6,000 tapes, 8 readers), and a 100x DLT Tape Farm on GB/K$, Maps, SCANS/Day, and Kaps.]

Data Motel: Data checks in, but it never checks out

Page 18: Tape & Optical: Beware of the Media Myth

Optical is cheap: 200 $/platter, 2 GB/platter => 100 $/GB (2x cheaper than disc)

Tape is cheap: 30 $/tape, 20 GB/tape => 1.5 $/GB (100x cheaper than disc)

Page 19: Tape & Optical Reality: Media is 10% of System Cost

Tape needs a robot (10 k$ ... 3 m$); 10 ... 1000 tapes (at 20 GB each) => 20 $/GB ... 200 $/GB (1x…10x cheaper than disc)

Optical needs a robot (100 k$); 100 platters = 200 GB (TODAY) => 400 $/GB (more expensive than mag disc)

Robots have poor access times.
Not good for Library of Congress (25 TB).
Data motel: data checks in but it never checks out!
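The media myth is a cost-per-GB calculation that forgets the robot. A minimal Python sketch, taking the 10K$ robot holding 14 tapes and 500 GB from the tape-farm slide and assuming the 30 $/tape media price from the previous slide applies to those tapes; `system_cost_per_gb` is a hypothetical helper.

```python
def system_cost_per_gb(robot_usd: float, media_count: int,
                       media_usd: float, total_gb: float) -> tuple[float, float]:
    """Return (system $/GB including the robot, fraction of cost spent on media)."""
    media_total = media_count * media_usd
    system_total = robot_usd + media_total
    return system_total / total_gb, media_total / system_total


# 10K$ robot, 14 tapes, 500 GB (tape-farm slide); 30 $/tape is ASSUMED.
per_gb, media_share = system_cost_per_gb(robot_usd=10_000, media_count=14,
                                         media_usd=30, total_gb=500)
media_only = 14 * 30 / 500
print(f"media alone:   {media_only:.2f} $/GB")
print(f"robot + media: {per_gb:.2f} $/GB, media share {media_share:.0%}")
```

With these inputs the media costs under 1 $/GB on its own, while the robot-plus-media system comes to about 21 $/GB, with media only a few percent of the total.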

Page 20: The Access Time Myth

The Myth: seek or pick time dominates.
The reality:
(1) Queuing dominates
(2) Transfer dominates BLOBs
(3) Disk seeks are often short
Implication: many cheap servers are better than one fast expensive server:
– shorter queues
– parallel transfer
– lower cost/access and cost/byte
This is now obvious for disk arrays.
This will be obvious for tape arrays.

[Diagram: access time as the myth pictures it (Seek, Rotate, Transfer) versus the reality (Wait, Seek, Rotate, Transfer).]
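The claim that queuing, not seek time, dominates can be illustrated with a standard M/M/1 queue, a model we are swapping in for illustration (the slide does not name one). The 10 ms service time, 80 requests/s load, and 10 MB BLOB are assumed numbers, and `mm1_response_time` is a hypothetical helper.

```python
def mm1_response_time(service_s: float, arrival_per_s: float) -> float:
    """Mean response time (queue wait + service) of an M/M/1 server, in seconds."""
    utilization = arrival_per_s * service_s
    if utilization >= 1:
        raise ValueError("server is saturated (utilization >= 1)")
    return service_s / (1 - utilization)


SERVICE_S = 0.010  # ASSUMED 10 ms per random disk access
LOAD = 80.0        # ASSUMED total request rate, requests/second

one_disk = mm1_response_time(SERVICE_S, LOAD)
# Load split at random over four disks, so each still sees Poisson arrivals.
four_disks = mm1_response_time(SERVICE_S, LOAD / 4)
print(f"1 disk : {one_disk * 1e3:.1f} ms, "
      f"{(one_disk - SERVICE_S) * 1e3:.1f} ms of it queuing")
print(f"4 disks: {four_disks * 1e3:.1f} ms")

# Transfer dominates BLOBs: an ASSUMED 10 MB object read at 5 MB/s.
blob_transfer_s = 10 / 5
print(f"10 MB BLOB transfer: {blob_transfer_s:.1f} s "
      f"vs ~{SERVICE_S * 1e3:.0f} ms per random access")
```

At 80% utilization a single disk's mean response is about 50 ms, 40 ms of it spent waiting in line; spreading the same load over four disks drops it to 12.5 ms. Likewise, a 10 MB BLOB spends 2 s in transfer at 5 MB/s, far more than the time to position the head.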