CS222: Principles of Database Management Fall 2010
Professor Chen Li
Department of Computer Science
University of California, Irvine
Notes 01
Topic 1: Data Storage and Record-Oriented File Systems
• Data Storage
– Storage hierarchy
– Disks
• Record-oriented file systems
Storage hierarchy
[Figure: storage hierarchy diagram; labels: CPU, cache, Memory, Controller, Disk/tape]
Storage Media
• Cache: inside/outside CPU
– CPU: becoming faster and faster (>=3 GHz now)
• Main Memory
– costs $100/Mbyte -- reduces every year
– ‘volatile’ -- does not survive system failures
– random I/O very fast
– data can be processed by CPU directly
– capacity is orders of magnitude smaller than what a database needs
Storage Media: secondary storage
• Disks (floppy disks, hard disks, CD)
– Cheap, and price reduces each year
– Non-volatile (except when disk crashes)
– Random I/O slow
– Data needs to be transferred to memory to be processed by CPU
• Tape
– Cheaper but slower than disks.
– Sequential I/O devices.
– Handy for backups, sometimes for archival.
Databases and Storage Devices
• Due to capacity, cost, and volatility factors, DBs are usually stored on disks.
• Data is brought from disk into main memory for processing
• There are many ways to interface memory with disk-resident data
• E.g., virtual memory:
– VM size limited to max address generated by CPU
– Existing VM does not support durability
• File system provides a more powerful mapping between memory and disk storage
• Several techniques are used to ensure that the high latency of secondary storage does not hurt application response time or system throughput:
– access disks asynchronously with active applications
– prefetch data before application needs it
– intelligent caching techniques
Disk Storage -- Outline
• Disk mechanics
• Access times (random, sequential)
• Examples
• Optimization
• Other topics
Disk mechanics
Terms: Spindle, Platters, Magnetic surfaces, Disk head, Disk controller, …
Top Views
[Figure: top views of a disk; labels: Tracks, Sectors, Gaps, Cylinders]
Characteristics
• Diameter: 1 inch -- 15 inches
• Cylinders: 100 -- 2000
• Surfaces: 1 (CDs) -- many
• Tracks/Cyl: 2 (floppies) -- 30
• Sector Size: 512B -- 50K
• Capacity: 360 KB (old floppy) -- >=200GB
“Block”
• Corresponds to 1 or multiple sectors
• Its address consists of:
– Physical device # (in case of multiple disks)
– Cylinder #
– Surface #
– Sector #
Random disk access time
[Figure: a request "I want block X" goes to the disk; block X ends up in memory]
Time = Seek Time (time 1) + Rotational Delay (time 2) + Transfer Time (time 3) + Other (time 4)
Time 1: seek time
[Figure: plot of seek time (vertical axis, marked x and 3x or 5x) against cylinders traveled (horizontal axis, 1 to N)]
Average Random Seek Time
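$$S = \frac{\sum_{i=1}^{N} \sum_{j=1,\, j \neq i}^{N} \mathrm{SeekTime}(\text{Track } i \rightarrow \text{Track } j)}{N(N-1)}$$
• Assumptions:
– Each track has the same probability of being accessed.
– From any track, the head is equally likely to jump to each of the other tracks.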
• Typical S value: 10 ms – 50 ms
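The average above can be evaluated directly once a seek-time function is chosen. The sketch below is only illustrative: `linear_seek` and its constants (`startup_ms`, `per_track_ms`) are hypothetical and not taken from the slides, which do not specify a seek-time model.

```python
def average_seek_time(n_tracks, seek_time):
    """Mean of seek_time(i, j) over all ordered track pairs with i != j."""
    total = 0.0
    for i in range(1, n_tracks + 1):
        for j in range(1, n_tracks + 1):
            if i != j:
                total += seek_time(i, j)
    return total / (n_tracks * (n_tracks - 1))


def linear_seek(i, j, startup_ms=1.0, per_track_ms=0.01):
    """Hypothetical model: fixed startup cost plus a cost per track crossed."""
    return startup_ms + per_track_ms * abs(i - j)


S = average_seek_time(100, linear_seek)
print(f"S = {S:.3f} ms")  # about 1.337 ms for this made-up model
```

Under the uniform assumptions, the mean distance |i - j| over pairs with i ≠ j is (N + 1)/3, so any model that is linear in distance can also be averaged in closed form.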
Time 2: Rotational Delay
[Figure: rotating platter showing the initial head position and the block wanted]
• Average delay:
– R = 1/2 revolution
– If the disk speed is 3600 RPM, then R = 8.33 ms
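As a quick check of the 3600 RPM figure, here is a minimal sketch that converts a rotation speed into the average rotational delay. The function name is an illustrative choice, not something from the course material.

```python
def average_rotational_delay_ms(rpm):
    """Average rotational delay: half of one revolution, in milliseconds."""
    ms_per_revolution = 60_000.0 / rpm  # 60,000 ms in one minute
    return ms_per_revolution / 2.0


print(f"{average_rotational_delay_ms(3600):.2f} ms")  # 8.33 ms
```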
Complication
May have to wait for start of track before we can read desired block
[Figure: track diagram marking where the head is ("Head Here"), the block we want, and the track start]
Time 3: Transfer time
• Transfer time: block size/transfer rate
• Typical transfer rate: 1 -- 3 MB/sec
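The transfer-time formula is a single division. The sketch below assumes 1 MB = 2^20 bytes, and the 4 KB block at 2 MB/sec is a made-up example chosen from within the typical range quoted above.

```python
BYTES_PER_MB = 2 ** 20  # assumption: binary megabytes


def transfer_time_ms(block_bytes, rate_mb_per_sec):
    """Transfer time = block size / transfer rate, in milliseconds."""
    seconds = block_bytes / (rate_mb_per_sec * BYTES_PER_MB)
    return seconds * 1000.0


# Hypothetical example: a 4 KB block at 2 MB/sec.
print(f"{transfer_time_ms(4096, 2.0):.3f} ms")  # 1.953 ms
```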
Time 4: Other Delays
• CPU time to issue I/O
• Contention for controller
• Contention for bus, memory, etc.
Typical value: “0”
Sequential disk access
• Reading “Next” block
• Additional time = Block size/transfer rate
• Other time negligible:
– skip gaps
– once in a while, next cylinder
Random I/O vs Sequential I/O
• Average sequential I/O time is much smaller than random I/O time
– Random I/O: 20 ms (most time on the initial delay)
– Sequential I/O: 1 ms.
• When designing a structure, try to use sequential I/Os.
– Data layout on disk becomes critical
– Do not just look at the number of I/Os
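To see why the layout matters more than the raw I/O count, the sketch below compares reading the same number of blocks when they are scattered versus laid out contiguously. It uses the 20 ms and 1 ms figures quoted above. The cost model for a contiguous run (one random access to reach the run, then sequential reads) is an assumption made for illustration.

```python
RANDOM_IO_MS = 20.0      # per-block random I/O cost quoted on the slide
SEQUENTIAL_IO_MS = 1.0   # per-block sequential I/O cost quoted on the slide


def scattered_read_ms(n_blocks):
    """Every block needs its own random access."""
    return n_blocks * RANDOM_IO_MS


def contiguous_read_ms(n_blocks):
    """Assumed model: one random access, then sequential reads for the rest."""
    if n_blocks == 0:
        return 0.0
    return RANDOM_IO_MS + (n_blocks - 1) * SEQUENTIAL_IO_MS


n = 1000
print(scattered_read_ms(n), contiguous_read_ms(n))  # 20000.0 1019.0
```

Both plans issue 1000 block reads, yet under these numbers the contiguous layout finishes roughly 20 times sooner.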
Modify blocks
• Read block
• Modify in memory
• Write block
• Verify
– Optional
– If performed, add to the access time: full rotation + block size/transfer rate
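The verify cost can be sketched as follows. The full rotation appears because the block has just passed under the head when the write ends, so the disk must come all the way around before the block can be read back. The 3600 RPM speed and the 0.117 ms block transfer time are example inputs, and the function name is illustrative.

```python
def verify_overhead_ms(rpm, block_transfer_ms):
    """Extra time for the optional verify step: full rotation + one block transfer."""
    full_rotation_ms = 60_000.0 / rpm
    return full_rotation_ms + block_transfer_ms


# Example inputs (assumed): 3600 RPM disk, 0.117 ms to transfer one block.
print(f"{verify_overhead_ms(3600, 0.117):.2f} ms")  # 16.78 ms
```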
Example 1
Disk Specs:
• 3.5 in diameter
• 3600 RPM
• 1 surface
• Usable capacity: 16 MB = $2^{24}$ bytes
• # of cylinders: 128 = $2^{7}$
• 1 block = 1 sector = 1 KB
• 10% overhead between blocks (gaps)
• seek time:
– average = 25 ms
– adjacent cyl = 5 ms
Cylinder
• bytes/cyl = $2^{24}/2^{7} = 2^{17}$ bytes = 128 KB
• blocks/cyl = 128 KB / 1 KB = 128
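The cylinder arithmetic can be checked in a few lines. Because the Example 1 disk has a single surface, each cylinder holds exactly one track, so the blocks-per-cylinder figure is also the blocks-per-track figure. Variable names are illustrative.

```python
capacity_bytes = 2 ** 24   # 16 MB usable capacity
cylinders = 2 ** 7         # 128 cylinders
block_bytes = 2 ** 10      # 1 KB blocks
surfaces = 1               # one surface, so one track per cylinder

bytes_per_cylinder = capacity_bytes // cylinders
blocks_per_cylinder = bytes_per_cylinder // block_bytes
blocks_per_track = blocks_per_cylinder // surfaces

print(bytes_per_cylinder, blocks_per_cylinder, blocks_per_track)  # 131072 128 128
```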
Track
[Figure: one track ...]
• Speed:
– 3600 RPM → 60 revolutions / sec → 16.66 ms/rev
• In each revolution:
– Time over useful data: 16.66 * 0.9 = 14.99 ms
– Time over gaps: 16.66 * 0.1 = 1.66 ms
– Transfer time 1 block = 14.99/128 = 0.117 ms
– Trans. time 1 block + gap = 16.66/128 = 0.13ms
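The per-track timings can be recomputed without rounding intermediate values. The slide works with 16.66 ms per revolution, so its figures differ from the ones below in the last digit or so. Constant names are illustrative.

```python
RPM = 3600
BLOCKS_PER_TRACK = 128
GAP_FRACTION = 0.10  # 10% of each revolution passes over gaps

ms_per_rev = 60_000.0 / RPM                       # about 16.67 ms
useful_ms = ms_per_rev * (1.0 - GAP_FRACTION)     # about 15.00 ms over data
gap_ms = ms_per_rev * GAP_FRACTION                # about 1.67 ms over gaps
block_ms = useful_ms / BLOCKS_PER_TRACK           # one block, no gap
block_plus_gap_ms = ms_per_rev / BLOCKS_PER_TRACK # one block plus its gap

print(f"{block_ms:.3f} ms, {block_plus_gap_ms:.3f} ms")  # 0.117 ms, 0.130 ms
```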
Bandwidths
• Burst bandwidth:
– No time on gaps (10%)
– 1 KB in 0.117 ms.
BB=1KB / 0.117ms = 8.54 KB/ms = 8.33MB/sec
• Sustained bandwidth:
– Including time on gaps
– 128 KB in 16.66 ms.
SB=128KB /16.66ms = 7.68 KB/ms = 7.50 MB/sec
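Both bandwidths follow from the track timings. The sketch below recomputes them with unrounded intermediates and converts KB/ms to MB/sec assuming 1 MB = 1024 KB, which is the conversion the slide's figures imply. With exact arithmetic the burst rate comes out near 8.53 KB/ms, slightly below the slide's 8.54, while the MB/sec values agree.

```python
RPM = 3600
BLOCKS_PER_TRACK = 128   # 1 KB blocks, so 128 KB per track
GAP_FRACTION = 0.10

ms_per_rev = 60_000.0 / RPM
block_ms = ms_per_rev * (1.0 - GAP_FRACTION) / BLOCKS_PER_TRACK


def kb_per_ms_to_mb_per_sec(kb_per_ms):
    """Assumes 1 MB = 1024 KB."""
    return kb_per_ms * 1000.0 / 1024.0


burst_mb_s = kb_per_ms_to_mb_per_sec(1.0 / block_ms)                  # gaps excluded
sustained_mb_s = kb_per_ms_to_mb_per_sec(BLOCKS_PER_TRACK / ms_per_rev)  # gaps included

print(f"burst {burst_mb_s:.2f} MB/sec, sustained {sustained_mb_s:.2f} MB/sec")
```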
Time of random block access
• Time to read one random block: T1
• T1 = seek time + rotational delay + transfer time
– Assume we do not have to wait for track start
– Seek time = 25ms
– Rotational delay = 16.66ms /2 = 8.33 ms
– Transfer time = .117 ms
– Total = 25 ms + 8.33 ms + .117 ms= 33.45 ms
• Most of the time is on “seek time” and “rotational delay”!
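To make the breakdown concrete, the sketch below recomputes T1 from the Example 1 parameters and reports how much of it is positioning (seek plus rotational delay). Names are illustrative, and "other" delays are taken as zero, as on the earlier slide.

```python
SEEK_MS = 25.0           # average seek time from Example 1
RPM = 3600
BLOCKS_PER_TRACK = 128
GAP_FRACTION = 0.10

ms_per_rev = 60_000.0 / RPM
rotational_delay_ms = ms_per_rev / 2.0
transfer_ms = ms_per_rev * (1.0 - GAP_FRACTION) / BLOCKS_PER_TRACK

t1_ms = SEEK_MS + rotational_delay_ms + transfer_ms
positioning_share = (SEEK_MS + rotational_delay_ms) / t1_ms

print(f"T1 = {t1_ms:.2f} ms, positioning share = {positioning_share:.1%}")
```

Under these numbers the actual data transfer accounts for well under 1% of the access time.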
Larger blocks?
• Suppose OS deals with 4 KB blocks
• We need to include the time of reading 1 block (without gap) and 3 blocks (with gaps)
• T4 = 25 ms + (16.66 ms / 2) + (0.117 ms × 1) + (0.130 ms × 3) = 33.84 ms
• Compare to T1 = 33.45 ms
– not much difference
– That’s why we want to use sequential IOs!
[Figure: consecutive blocks 1 2 3 4 ... on a track, with the extent of 1 block marked]
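The same calculation generalizes to any number of consecutive blocks: the first k - 1 blocks are each followed by a gap the head must cross, and the last one is not. The sketch below is one way to write that, using unrounded intermediates (so k = 4 gives about 33.84 ms). The function name and its parameterization are illustrative.

```python
SEEK_MS = 25.0
RPM = 3600
BLOCKS_PER_TRACK = 128
GAP_FRACTION = 0.10

ms_per_rev = 60_000.0 / RPM
block_ms = ms_per_rev * (1.0 - GAP_FRACTION) / BLOCKS_PER_TRACK
block_plus_gap_ms = ms_per_rev / BLOCKS_PER_TRACK


def read_consecutive_ms(k):
    """Time to read k >= 1 consecutive 1 KB blocks on one track after a random seek."""
    positioning = SEEK_MS + ms_per_rev / 2.0
    return positioning + (k - 1) * block_plus_gap_ms + block_ms


for k in (1, 4, 16):
    print(k, f"{read_consecutive_ms(k):.2f} ms")
```

Quadrupling the amount of data read adds less than 0.4 ms to an access of about 33 ms.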
Reading a track
• TT = Time to read a full track (start at any block)
• TT = 25 ms (seek time)
+ (0.13 ms / 2) (rotational delay, half of a block)
+ 16.66 ms (transfer time)
= 41.73 ms
• The time could be a bit less by ignoring the last gap.
• Question: what if we need to wait for the start of a track?
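The full-track figure, and the slightly smaller value obtained by dropping the last gap, can be reproduced as below. This is a sketch of the slide's case only (reading may start at any block boundary); it does not attempt the wait-for-track-start question. Names are illustrative.

```python
SEEK_MS = 25.0
RPM = 3600
BLOCKS_PER_TRACK = 128
GAP_FRACTION = 0.10

ms_per_rev = 60_000.0 / RPM
block_plus_gap_ms = ms_per_rev / BLOCKS_PER_TRACK
one_gap_ms = ms_per_rev * GAP_FRACTION / BLOCKS_PER_TRACK

# Start at any block: on average wait half a block (plus gap) for the next boundary.
track_read_ms = SEEK_MS + block_plus_gap_ms / 2.0 + ms_per_rev
# Variant: stop once the last block's data has passed, skipping the final gap.
track_read_no_last_gap_ms = track_read_ms - one_gap_ms

print(f"{track_read_ms:.2f} ms, {track_read_no_last_gap_ms:.2f} ms")  # 41.73 ms, 41.72 ms
```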