1
The 5 Minute Rule
Jim Gray
Microsoft Research
[email protected]
http://www.Research.Microsoft.com/~Gray/talks
Kilo 10^3
Mega 10^6
Giga 10^9
Tera 10^12 (today, we are here)
Peta 10^15
Exa 10^18
2
Storage Hierarchy (9 levels)
• Cache (1, 2)
• Main (1, 2, 3 if NUMA)
• Disk (1 (cached), 2)
• Tape (1 (mounted), 2)
3
Meta-Message: Technology Ratios Are Important
• If everything gets faster & cheaper at the same rate, THEN nothing really changes.
• Things getting MUCH BETTER:
  – communication speed & cost 1,000x
  – processor speed & cost 100x
  – storage size & cost 100x
• Things staying about the same:
  – speed of light (more or less constant)
  – people (10x more expensive)
  – storage speed (only 10x better)
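The meta-message is arithmetic on ratios: what matters is how much one component improved relative to another. A minimal sketch using only the improvement factors listed above; the `IMPROVEMENT` table and the `ratio_shift` helper are hypothetical names for illustration.

```python
# Improvement factors quoted on the slide (new / old, higher = better).
# "people (10x more expensive)" got worse rather than better, so it is left out.
IMPROVEMENT = {
    "communication": 1_000,
    "processor": 100,
    "storage_size": 100,
    "storage_speed": 10,
}


def ratio_shift(a: str, b: str) -> float:
    """How much component `a` improved relative to component `b`.

    1.0 means both changed at the same rate, so the trade-off between them
    is unchanged; any other value moves the balance point.
    """
    return IMPROVEMENT[a] / IMPROVEMENT[b]


if __name__ == "__main__":
    for a, b in [("processor", "storage_speed"),
                 ("communication", "processor"),
                 ("storage_size", "storage_speed")]:
        print(f"{a} vs {b}: {ratio_shift(a, b):g}x")
```

Each pair comes out at 10x. For example, a processor that got 100x faster while storage got only 10x faster now waits about 10x more of its own cycles per storage access, and a device whose capacity grew 100x while its speed grew 10x takes about 10x longer to read end to end.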
4
Today’s Storage Hierarchy: Speed & Capacity vs Cost Tradeoffs
[Figure, two panels plotting Cache, Main, Secondary (Disc), Online Tape, Nearline Tape, and Offline Tape against access time (10^-9 to 10^3 seconds): "Size vs Speed" (typical system size, 10^3 to 10^15 bytes) and "Price vs Speed" ($/MB, 10^-4 to 10^4).]
5
Storage Ratios Changed
• 10x better access time
• 10x more bandwidth
• 4,000x lower media price
• DRAM/DISK: 100:1 to 10:1 to 50:1
[Figures, 1980 to 2000: "Disk Performance vs Time" (access time in ms and bandwidth in MB/s); "Disk Performance vs Time" (accesses per second and disk capacity in GB); "Storage Price vs Time" ($/MB, 0.01 to 10,000).]
6
Thesis: Performance = Storage Accesses not Instructions Executed
• In the “old days” we counted instructions and IO’s
• Now we count memory references
• Processors wait most of the time
[Figure: "Where the time goes: clock ticks used by AlphaSort components": Sort, Disc Wait, OS, Memory Wait, D-Cache Miss, I-Cache Miss, B-Cache Data Miss.]
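Why processors "wait most of the time" can be seen with the textbook stall model: cycles per instruction = useful cycles + (misses per instruction × miss penalty). A minimal sketch; the numbers below are hypothetical and are not the AlphaSort measurements from the figure, and `wait_fraction` is an illustrative name.

```python
def wait_fraction(base_cpi: float, misses_per_instr: float, miss_penalty: float) -> float:
    """Fraction of clock ticks spent stalled on memory.

    cycles per instruction = base_cpi + misses_per_instr * miss_penalty
    """
    stall = misses_per_instr * miss_penalty
    return stall / (base_cpi + stall)


if __name__ == "__main__":
    # Hypothetical workload: 1 cycle of useful work per instruction,
    # 2 cache misses per 100 instructions, 100 cycles per miss.
    print(f"{wait_fraction(1.0, 0.02, 100):.0%} of ticks are waits")
```

Even a 2% miss rate leaves the processor idle about two thirds of the time, which is why the slide counts memory references rather than instructions.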
7
The Pico Processor
• 1 mm^3
• 1 M SPECmarks
• 10^6 clocks/fault to bulk RAM
• Event-horizon on chip
• VM reincarnated
• Multi-program cache
• Terror Bytes!
Storage hierarchy:
• 10 picosecond RAM: 1 megabyte
• 10 nanosecond RAM: 10 gigabytes
• 10 microsecond RAM: 1 terabyte
• 10 millisecond disc: 100 terabytes
• 10 second tape archive: 100 petabytes
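The "10^6 clocks/fault to bulk RAM" figure follows from dividing the 10 microsecond bulk-RAM latency by a clock period. A sketch assuming a 10 ps clock: the slide does not state the clock rate, and 10 ps is simply the latency of its fastest RAM. `LEVELS_PS`, `CLOCK_PS`, and `clocks_per_fault` are hypothetical names.

```python
# Latencies from the slide, in picoseconds (integers keep the arithmetic exact).
LEVELS_PS = {
    "10 ps RAM (1 MB)": 10,
    "10 ns RAM (10 GB)": 10_000,
    "10 us RAM (1 TB)": 10_000_000,
    "10 ms disc (100 TB)": 10_000_000_000,
    "10 s tape (100 PB)": 10_000_000_000_000,
}

CLOCK_PS = 10  # assumption: one clock tick is about the 10 ps RAM latency


def clocks_per_fault(latency_ps: int, clock_ps: int = CLOCK_PS) -> int:
    """Clock ticks the processor idles while one access completes."""
    return latency_ps // clock_ps


if __name__ == "__main__":
    for name, ps in LEVELS_PS.items():
        print(f"{name:20} {clocks_per_fault(ps):>18,} clocks")
```

Under that assumption a fault to the 1 TB bulk RAM costs 1,000,000 clocks, a disc fault 10^9, and a tape fault 10^12.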
8
Storage Latency: How Far Away is the Data?
Relative latency, with a human-scale analogy (1 unit ≈ 1 minute):
• Registers: 1 (My Head, 1 min)
• On Chip Cache: 2 (This Room)
• On Board Cache: 10 (This Campus, 10 min)
• Memory: 100 (Sacramento, 1.5 hr)
• Disk: 10^6 (Pluto, 2 Years)
• Tape / Optical Robot: 10^9 (Andromeda, 2,000 Years)
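The analogy is a unit conversion: treat one unit of latency as one minute and restate it in hours or years. A minimal sketch; `LATENCY` and `humanize` are hypothetical names, and a year is taken as 365 days.

```python
MINUTES_PER_HOUR = 60
MINUTES_PER_YEAR = 60 * 24 * 365  # ignores leap days

# Relative latency from the slide, read as "1 unit = 1 minute" of human time.
LATENCY = {
    "Registers (My Head)": 1,
    "On Chip Cache (This Room)": 2,
    "On Board Cache (This Campus)": 10,
    "Memory (Sacramento)": 100,
    "Disk (Pluto)": 10**6,
    "Tape/Optical Robot (Andromeda)": 10**9,
}


def humanize(minutes: float) -> str:
    """Render a duration in minutes using the largest convenient unit."""
    if minutes < MINUTES_PER_HOUR:
        return f"{minutes:g} min"
    if minutes < MINUTES_PER_YEAR:
        return f"{minutes / MINUTES_PER_HOUR:.1f} hr"
    return f"{minutes / MINUTES_PER_YEAR:,.0f} years"


if __name__ == "__main__":
    for level, units in LATENCY.items():
        print(f"{level:32} {humanize(units)}")
```

Memory comes out at about 1.7 hours and tape at about 1,903 years, so the slide's "1.5 hr" and "2,000 Years" are rounded figures.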
9
The Five Minute Rule
• Trade DRAM for Disk Accesses
• Cost of an access (DriveCost / Access_per_second)
• Cost of a DRAM page ($/MB / pages_per_MB)
• Break even has two terms: a technology term and an economic term
• Grew page size to compensate for changing ratios.
• Still at 5 minutes for random, 1 minute for sequential

$$\text{BreakEvenReferenceInterval} = \underbrace{\frac{\text{PagesPerMBofDRAM}}{\text{AccessPerSecondPerDisk}}}_{\text{technology term}} \times \underbrace{\frac{\text{PricePerDiskDrive}}{\text{PricePerMBofDRAM}}}_{\text{economic term}}$$
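Plugging numbers into the break-even formula shows where "5 minutes" comes from. A sketch under stated assumptions: 8 KB pages (128 pages per MB), and the 1998 figures from the "For the Record" table in this deck, a $900 disk serving about 100 random accesses per second and DRAM at $5,000/GB (about $5/MB). The function name `break_even_interval_s` is hypothetical.

```python
def break_even_interval_s(pages_per_mb_of_dram: float,
                          accesses_per_second_per_disk: float,
                          price_per_disk_drive: float,
                          price_per_mb_of_dram: float) -> float:
    """Reference interval (seconds) at which keeping a page in DRAM costs
    the same as re-reading it from disk on every reference."""
    technology = pages_per_mb_of_dram / accesses_per_second_per_disk
    economic = price_per_disk_drive / price_per_mb_of_dram
    return technology * economic


if __name__ == "__main__":
    # Assumptions: 8 KB pages; $900 disk at ~100 accesses/s; DRAM at ~$5/MB.
    interval = break_even_interval_s(128, 100, 900, 5)
    print(f"break-even: {interval:.0f} s ({interval / 60:.1f} min)")
```

This gives about 230 seconds (3.8 minutes): pages re-referenced more often than that are cheaper to keep in DRAM, which is in the five-minute ballpark.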
10
Shows Best Index Page Size ~16KB

Index Page Utility vs Page Size and Index Element Size
Page size (KB)    2     4     8     16    32    64    128
16 B entries      0.64  0.72  0.78  0.82  0.79  0.69  0.54
32 B entries      0.54  0.62  0.69  0.73  0.71  0.63  0.50
64 B entries      0.44  0.53  0.60  0.64  0.64  0.57  0.45
128 B entries     0.34  0.43  0.51  0.56  0.56  0.51  0.41

Index Page Utility vs Page Size and Disk Performance
Page size (KB)    2     4     8     16    32    64    128
40 MB/s           0.65  0.74  0.83  0.91  0.97  0.99  0.94
10 MB/s           0.64  0.72  0.78  0.82  0.79  0.69  0.54
5 MB/s            0.62  0.69  0.73  0.71  0.63  0.50  0.34
3 MB/s            0.51  0.56  0.58  0.54  0.46  0.34  0.22
1 MB/s            0.40  0.44  0.44  0.41  0.33  0.24  0.16
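The slide does not give its utility formula. One model that reproduces the shape of the 10 MB/s, 16-byte curve (peak at 16 KB) and the 40 MB/s row (peak at 64 KB): the benefit of reading an index page is log2 of the entries it holds, since that is how far one page narrows a B-tree search, and the cost is latency plus transfer time. Assumptions: 10 ms latency, pages 70% full, and 1 MB/s = 10^6 bytes/s. The output is normalized to the peak, so it compares curve shapes, not the table's absolute values. `utility` and `best_page_kb` are hypothetical names.

```python
import math

PAGE_SIZES_KB = [2, 4, 8, 16, 32, 64, 128]


def utility(page_kb: int, entry_bytes: int = 16, latency_s: float = 0.010,
            bandwidth_bps: float = 10e6, fill: float = 0.7) -> float:
    """Index levels resolved per second of disk time for one page read.

    benefit: log2(entries per page)
    cost:    access latency + time to transfer the page
    """
    page_bytes = page_kb * 1024
    entries = page_bytes / entry_bytes * fill
    cost_s = latency_s + page_bytes / bandwidth_bps
    return math.log2(entries) / cost_s


def best_page_kb(**kw) -> int:
    """Page size (KB) with the highest utility under the given parameters."""
    return max(PAGE_SIZES_KB, key=lambda kb: utility(kb, **kw))


if __name__ == "__main__":
    peak = utility(best_page_kb())
    for kb in PAGE_SIZES_KB:
        print(f"{kb:>4} KB  {utility(kb) / peak:.2f}")
    print("best:", best_page_kb(), "KB")
```

Larger pages buy only logarithmically more fan-out but linearly more transfer time, so the optimum moves to bigger pages as disk bandwidth rises.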
11
Standard Storage Metrics
• Capacity:
  – RAM: MB and $/MB: today at 10 MB & 100 $/MB
  – Disk: GB and $/GB: today at 10 GB and 200 $/GB
  – Tape: TB and $/TB: today at 0.1 TB and 25 k$/TB (nearline)
• Access time (latency):
  – RAM: 100 ns
  – Disk: 10 ms
  – Tape: 30 second pick, 30 second position
• Transfer rate:
  – RAM: 1 GB/s
  – Disk: 5 MB/s (arrays can go to 1 GB/s)
  – Tape: 5 MB/s (striping is problematic)
12
New Storage Metrics: Kaps, Maps, SCAN?
• Kaps: How many kilobyte objects served per second
  – The file server, transaction processing metric
  – This is the OLD metric.
• Maps: How many megabyte objects served per second
  – The Multi-Media metric
• SCAN: How long to scan all the data
  – the data mining and utility metric
• And
  – Kaps/$, Maps/$, TBscan/$
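Kaps and Maps both follow from latency and bandwidth: serving one object costs one access latency plus its transfer time. SCAN is bandwidth-bound. A minimal sketch, assuming decimal units and the disk figures from the Standard Storage Metrics slide (10 ms access, 5 MB/s); all function names are hypothetical.

```python
KB, MB, TB = 1e3, 1e6, 1e12  # decimal units


def service_time_s(obj_bytes: float, latency_s: float, bandwidth_bps: float) -> float:
    """Time to serve one object: wait for the first byte, then stream the rest."""
    return latency_s + obj_bytes / bandwidth_bps


def kaps(latency_s: float, bandwidth_bps: float) -> float:
    """Kilobyte objects served per second."""
    return 1 / service_time_s(KB, latency_s, bandwidth_bps)


def maps(latency_s: float, bandwidth_bps: float) -> float:
    """Megabyte objects served per second."""
    return 1 / service_time_s(MB, latency_s, bandwidth_bps)


def scan_s(capacity_bytes: float, bandwidth_bps: float) -> float:
    """Sequential read of everything; the single initial latency is ignored."""
    return capacity_bytes / bandwidth_bps


if __name__ == "__main__":
    disk = dict(latency_s=0.010, bandwidth_bps=5 * MB)
    print(f"Kaps {kaps(**disk):.0f}, Maps {maps(**disk):.2f}, "
          f"SCAN of 1 TB {scan_s(TB, disk['bandwidth_bps']):,.0f} s")
```

The Maps value (4.76) matches the disk entry in the 1998 table. Kaps is close to 100 because a 1 KB transfer is negligible next to the 10 ms latency, while a 1 MB transfer dominates it.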
13
For the Record (good 1998 devices packaged in system, http://www.tpc.org/results/individual_results/Dell/dell.6100.9801.es.pdf)
                     DRAM     DISK     TAPE robot
Unit capacity (GB)   1        9        35
Unit price ($)       5000     900      10000
$/GB                 5000     100      20
Latency (s)          1.E-7    1.E-2    3.E+1
Bandwidth (MB/s)     500      5        5
Kaps                 5.E+5    1.E+2    3.E-2
Maps                 5.E+2    4.76     3.E-2
Scan time (s/TB)     2        1800     98000
$/Kaps               1.E-10   1.E-7    3.E-3
$/Maps               1.E-7    2.E-6    3.E-3
$/TBscan             $0.11    $2       $296
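The table does not say how $/Kaps and $/Maps were derived. One reading that reproduces the DRAM and disk entries is to spread each unit's price over every operation it can perform in three years of continuous use; the 3-year lifetime is an assumption, and `dollars_per_op` is a hypothetical name.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600
LIFETIME_S = 3 * SECONDS_PER_YEAR  # assumption: 3-year depreciation


def dollars_per_op(unit_price: float, ops_per_second: float,
                   lifetime_s: float = LIFETIME_S) -> float:
    """Device price divided by the operations it can serve over its lifetime."""
    return unit_price / (ops_per_second * lifetime_s)


if __name__ == "__main__":
    # Unit price, Kaps and Maps as listed in the table above.
    devices = {
        "DRAM": (5000, 5e5, 5e2),
        "DISK": (900, 1e2, 4.76),
    }
    for name, (price, k, m) in devices.items():
        print(f"{name}: $/Kaps {dollars_per_op(price, k):.1e}, "
              f"$/Maps {dollars_per_op(price, m):.1e}")
```

This yields about 1e-10 and 1e-7 for DRAM and about 1e-7 and 2e-6 for disk, in line with the table.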
14
How To Get Lots of Maps, SCANs
• parallelism: use many little devices in parallel
• Beware of the media myth
• Beware of the access time myth
1 Terabyte at 10 MB/s: 1.2 days to scan.
1 Terabyte, 1,000x parallel: 100 seconds SCAN.
Parallelism: divide a big problem into many smaller ones to be solved in parallel.
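The scan figures are capacity divided by aggregate bandwidth. A minimal sketch assuming the data is spread evenly and every device streams at full speed with no coordination overhead; `scan_time_s` is a hypothetical name.

```python
SECONDS_PER_DAY = 86_400
TB, MB = 1e12, 1e6  # decimal units


def scan_time_s(capacity_bytes: float, bandwidth_bps: float, n_devices: int = 1) -> float:
    """Time to read everything when the data is spread evenly over n devices
    that all stream in parallel (overheads are not modeled)."""
    return capacity_bytes / (bandwidth_bps * n_devices)


if __name__ == "__main__":
    serial = scan_time_s(1 * TB, 10 * MB)
    parallel = scan_time_s(1 * TB, 10 * MB, n_devices=1000)
    print(f"1 device:     {serial / SECONDS_PER_DAY:.1f} days")
    print(f"1000 devices: {parallel:.0f} s")
```

The same function gives about 27.8 hours for one 500 GB, 5 MB/s tape robot from the tape-farm slide; a farm of 100 such robots, each scanning its own tapes, also finishes in that time, matching the slide's 27-hour scan.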
15
The Disk Farm On a Card
The 100GB disc card (14"): an array of discs. Can be used as
• 100 discs
• 1 striped disc
• 10 Fault Tolerant discs
• ...etc.
LOTS of accesses/second and bandwidth.
Life is cheap, it's the accessories that cost ya.
Processors are cheap, it’s the peripherals that cost ya (a 10k$ disc card).
16
Tape Farms for Tertiary Storage, Not Mainframe Silos
Many independent tape robots (like a disc farm):
• One 10K$ robot: 14 tapes, 500 GB, 5 MB/s, 20 $/GB, 30 Maps
• 100 robots: 1M$, 50 TB, 50 $/GB, 3K Maps, scan in 27 hours
17
The Metrics: Disk and Tape Farms Win
[Figure: GB/K$, Kaps, Maps, and SCANS/Day on a log scale (0.01 to 1,000,000) for a 1000x disc farm, an STC tape robot (6,000 tapes, 8 readers), and a 100x DLT tape farm.]
Data Motel: Data checks in, but it never checks out
18
Tape & Optical: Beware of the Media Myth
Optical is cheap: 200 $/platter, 2 GB/platter => 100 $/GB (2x cheaper than disc)
Tape is cheap: 30 $/tape, 20 GB/tape => 1.5 $/GB (100x cheaper than disc)
19
Tape & Optical Reality: Media is 10% of System Cost
Tape needs a robot (10 k$ ... 3 m$) for 10 ... 1000 tapes (at 20 GB each) => 20 $/GB ... 200 $/GB (1x ... 10x cheaper than disc)
Optical needs a robot (100 k$) for 100 platters = 200 GB (TODAY) => 400 $/GB (more expensive than mag disc)
Robots have poor access times.
Not good for Library of Congress (25 TB).
Data motel: data checks in but it never checks out!
20
The Access Time Myth
The Myth: seek or pick time dominates
The reality:
 (1) Queuing dominates
 (2) Transfer dominates BLOBs
 (3) Disk seeks often short
Implication: many cheap servers better than one fast expensive server
– shorter queues
– parallel transfer
– lower cost/access and cost/byte
This is now obvious for disk arrays.
This will be obvious for tape arrays.
[Figure: two access-time breakdowns, each with Seek, Rotate, and Transfer; the second adds a queuing Wait.]
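One way to see why queuing, not seek time, dominates is the textbook M/M/1 queue (random arrivals, one server), whose mean response time is service time / (1 − utilization). This is a swapped-in standard model, not one the slide gives. Assumptions: 10 ms per access, and a load that keeps one server 90% busy, compared with the same load split evenly over 10 servers of the same speed. `mm1_response_s` is a hypothetical name.

```python
def mm1_response_s(service_s: float, utilization: float) -> float:
    """Mean response time (queue wait + service) of an M/M/1 server."""
    if not 0 <= utilization < 1:
        raise ValueError("utilization must be in [0, 1)")
    return service_s / (1 - utilization)


SERVICE_S = 0.010  # assumption: 10 ms per disk access
LOAD = 0.9         # assumption: enough requests to keep one server 90% busy

if __name__ == "__main__":
    one_busy = mm1_response_s(SERVICE_S, LOAD)         # one server at 90%
    ten_cheap = mm1_response_s(SERVICE_S, LOAD / 10)   # same load over 10 servers
    print(f"1 server:   {one_busy * 1e3:.0f} ms "
          f"({(one_busy - SERVICE_S) * 1e3:.0f} ms waiting)")
    print(f"10 servers: {ten_cheap * 1e3:.1f} ms")
```

With one busy server, about 90 of the 100 ms is spent waiting in line; the seek, rotate, and transfer work is the same 10 ms in both cases. Spreading the load over more servers shortens the queues, which is the first point in the slide's list.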