DBMSs On A Modern Processor: Where Does Time Go?
by A. Ailamaki, D.J. DeWitt, M.D. Hill, and D. Wood
University of Wisconsin-Madison Computer Science Dept., Madison, WI
Presented by
Derwin Halim
Agenda
Database and DBMS
Motivation for DBMS performance study
Proposed DBMS performance study
Processor model
Query execution time breakdown
Database workload
Experimental setup and results
Conclusion
Database and DBMS
A database is a collection of data, typically describing the activities of one or more related organizations in terms of entities and relationships
A DBMS (Database Management System) is software designed to assist in maintaining and utilizing large collections of data
Motivation for DBMS Performance Study
DBMSs are becoming compute and memory bound
Modern processors do not improve database system performance to the same extent as scientific workloads
Contrasting commercial DBMSs and identifying common characteristics are difficult
Urgent need to evaluate and understand the processor and memory behavior of commercial DBMSs on existing hardware platforms
Proposed DBMS Performance Study
Analyze the execution time breakdown of multiple different commercial DBMSs on the same hardware platform
Use a workload consisting of simple queries on a memory-resident database
Isolate basic operations and identify common trends across the DBMSs
Identify and analyze bottlenecks and provide solutions
Processor Model: Basic Pipeline Operation
Processor Model: Handling Pipeline Stall
Non-blocking cache
Out-of-order execution
Speculative execution with branch prediction
Query Execution Time Breakdown
$T_Q = T_C + T_M + T_B + T_R - T_{OVL}$
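A minimal sketch of how the breakdown can be evaluated. It reads the terms as the original paper does: T_C is computation time, T_M memory stall time, T_B branch misprediction penalty, T_R resource stall time, and T_OVL the stall time that overlaps with other work. The function names and the sample cycle counts are hypothetical and are not measurements from the study.

```python
def query_time(t_c, t_m, t_b, t_r, t_ovl):
    """Evaluate T_Q = T_C + T_M + T_B + T_R - T_OVL (all terms in the same unit)."""
    return t_c + t_m + t_b + t_r - t_ovl


def stall_fraction(t_c, t_m, t_b, t_r, t_ovl):
    """Fraction of T_Q that is not spent on useful computation."""
    t_q = query_time(t_c, t_m, t_b, t_r, t_ovl)
    return (t_q - t_c) / t_q


# Hypothetical cycle counts, chosen only to make the arithmetic easy to follow.
t_q = query_time(t_c=50, t_m=30, t_b=10, t_r=20, t_ovl=10)
stalled = stall_fraction(t_c=50, t_m=30, t_b=10, t_r=20, t_ovl=10)
print(t_q, stalled)
```

Subtracting T_OVL matters because the stall components are measured separately and some of them overlap, so a plain sum would overestimate the query time.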
Database Workload
Single-table range selections and two-table equijoins over a memory-resident database, running a single command stream
Eliminates dynamic and random parameters
Isolates basic operations: sequential access and index selection
Allows examination of the processor and memory behavior without I/O interference
Database Workload
Table:
create table R (a1 integer not null,
a2 integer not null,
a3 integer not null,
<rest of field>)
Sequential range selection:
select avg(a3)
from R
where a2 < Hi and a2 > Lo
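The range selection can be tried on a small scale with Python's built-in sqlite3 module. This is only a sketch: SQLite stands in for the commercial systems, the padding fields are left out, and the row count, the data generator and the helper name range_select_avg are assumptions made for illustration.

```python
import sqlite3

# An in-memory database mirrors the memory-resident setup (no disk I/O).
conn = sqlite3.connect(":memory:")
conn.execute(
    "create table R (a1 integer not null, "
    "a2 integer not null, "
    "a3 integer not null)"
)

# Hypothetical data: 1000 rows, a2 cycles through 0..99, a3 equals the row number.
conn.executemany(
    "insert into R values (?, ?, ?)",
    [(i, i % 100, i) for i in range(1000)],
)


def range_select_avg(conn, lo, hi):
    """Run the sequential range selection with Lo and Hi bound as parameters."""
    cur = conn.execute(
        "select avg(a3) from R where a2 < ? and a2 > ?", (hi, lo)
    )
    return cur.fetchone()[0]


# Lo = 9 and Hi = 20 select a2 in 10..19, which is 10% of the rows.
result = range_select_avg(conn, lo=9, hi=20)
print(result)
```

Moving Lo and Hi closer together or further apart changes the selectivity without changing the shape of the query.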
Database Workload
Indexed range selection:
construct a non-clustered index on R.a2, then resubmit the range selection
Sequential join:
select avg(R.a3)
from R, S
where R.a2 = S.a1
40,000 100-byte records in S, each of which joins with 30 records in R
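A scaled-down sketch of the join and the indexed selection, again using sqlite3 as a stand-in. The slide does not give the schema of S, so it is assumed here to match R. The table sizes (40 rows in S instead of 40,000), the index name idx_R_a2 and the helper build_join_tables are hypothetical; only the fan-out of 30 matching R records per S record follows the slide.

```python
import sqlite3


def build_join_tables(n_s=40, fanout=30):
    """Create R and S so that every S row joins with `fanout` rows of R."""
    conn = sqlite3.connect(":memory:")
    for name in ("R", "S"):
        conn.execute(
            f"create table {name} (a1 integer not null, "
            "a2 integer not null, a3 integer not null)"
        )
    # S.a1 is unique; R.a2 cycles over the S keys, giving the desired fan-out.
    conn.executemany(
        "insert into S values (?, ?, ?)", [(k, k, k) for k in range(n_s)]
    )
    conn.executemany(
        "insert into R values (?, ?, ?)",
        [(i, i % n_s, i) for i in range(n_s * fanout)],
    )
    return conn


conn = build_join_tables()

# Sequential join, as on the slide.
join_rows = conn.execute(
    "select count(*) from R, S where R.a2 = S.a1"
).fetchone()[0]
join_avg = conn.execute(
    "select avg(R.a3) from R, S where R.a2 = S.a1"
).fetchone()[0]

# Indexed range selection: add a secondary index on R.a2, then resubmit the query.
conn.execute("create index idx_R_a2 on R (a2)")
indexed_avg = conn.execute(
    "select avg(a3) from R where a2 < ? and a2 > ?", (20, 9)
).fetchone()[0]
print(join_rows, join_avg, indexed_avg)
```

The index changes the access path, not the answer, so the same query can be timed with and without it.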
Experimental Setup: Hardware and Software Platform
400 MHz PII Xeon/MT Workstation
512 MB main memory with 100 MHz system bus
Out-of-order engine and speculative instruction execution
Non-blocking cache
Separate data and instruction first level caches
Unified second level cache
4 commercial DBMSs on Windows NT 4.0 Service Pack 4
Event measurement counters and emon
Experimental Setup: PII Xeon Cache Characteristics
Experimental Setup: Measuring Stall Time Components
Results: Execution Time Breakdown
Processor spends most of the time stalled
The problem will be exacerbated by the ever increasing processor-memory gap
Bottleneck shifts
Results: Memory Stalls Breakdown
L1 D-cache, L2 I-cache, and ITLB stall times are insignificant
Focus on the L1 I-cache and L2 D-cache stall time components
Results: L2 D-cache Stall Time
Depends on the position of the accessed data in the records and on the record size
L2 D-cache miss is much more expensive than L1 D-cache miss
Only gets worse as processor-memory performance gap increases
Larger cache => longer latency
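Why record size and field position matter can be illustrated with a rough count of the cache lines a scan touches when it reads one field per record. This is only a sketch under stated assumptions: records are packed back to back starting at address 0, the field is 4 bytes wide, and the cache line is 32 bytes. The function name and all parameter values are hypothetical and chosen for illustration.

```python
def lines_touched(num_records, record_size, field_offset,
                  field_size=4, line_size=32):
    """Count distinct cache lines read when scanning one field of every record.

    Assumes records are stored contiguously, starting at address 0.
    """
    touched = set()
    for r in range(num_records):
        start = r * record_size + field_offset
        end = start + field_size - 1
        # A field may straddle a line boundary, so add every line it covers.
        for line in range(start // line_size, end // line_size + 1):
            touched.add(line)
    return len(touched)


# Small records: neighbouring fields share lines, so fewer lines than records.
small = lines_touched(num_records=1000, record_size=20, field_offset=8)
# Large records: every field sits in its own line, one line per record.
large = lines_touched(num_records=1000, record_size=100, field_offset=8)
# A badly placed field straddles two lines within a single record.
straddle = lines_touched(num_records=1, record_size=100, field_offset=30)
print(small, large, straddle)
```

Under these assumptions the scan of 100-byte records fetches a full line for every 4 useful bytes, so the number of lines brought in per record grows with the record size.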
Results: L1 I-cache Stall Time
L1 I-cache miss is difficult to overlap and causes serial bottleneck in the pipeline
L1 cache size vs. latency
L1 I-cache misses increase as data record size increases
- Inclusion: L2 cache replacement forces L1 cache replacement
- OS interrupt: periodic context switching
- Page boundary crossing
Results: Branch Mis-prediction
Serial bottleneck and instruction cache misses
40% BTB misses on average => more static prediction
L1 I-cache miss follows branch mis-prediction behavior
Results: Resource Stall Time
Dominated by dependency and/or functional unit stalls
Dependency stalls, caused by low ILP, are the most important resource stalls, except for System A
FU stalls are caused by contention in the execution unit
Results: Simple Query vs TPC Benchmarks
Simple Query vs TPC-D (DSS):
- Similar CPI breakdown
- Still dominated by L1 I-cache and L2 D-cache miss
Simple Query vs TPC-C (OLTP):
- CPI rate of TPC-C is much higher
- Resource stalls are higher
- Dominated by L2 D- and I-cache miss
Conclusion
Memory stalls are a serious performance bottleneck
Focus on L1 I-cache and L2 D-cache misses
Improvements should address all of the stall components, because the bottleneck may shift
Simple queries offer a methodological advantage
TPC-D has a similar execution time breakdown, while TPC-C incurs more second level cache and resource stalls