Lock scheduling for TPC-C workload
Slides originally prepared by Jisu Oh
David T. McWherter, Bianca Schroeder, Anastassia Ailamaki, Mor Harchol-Balter, "Improving Preemptive Prioritization via Statistical Characterization of OLTP," IEEE International Conference on Data Engineering (ICDE), 2005.



TRANSCRIPT

Page 1: Lock scheduling for TPC-C workload

Lock scheduling for TPC-C workload

Slides originally prepared by Jisu Oh

David T. McWherter, Bianca Schroeder, Anastassia Ailamaki, Mor Harchol-Balter, "Improving Preemptive Prioritization via Statistical Characterization of OLTP," IEEE International Conference on Data Engineering (ICDE), 2005.

Page 2: Lock scheduling for TPC-C workload

Motivation

• Minimize long delays and unpredictably large response times in online transaction processing (OLTP)
• Transaction prioritization (aka differentiated service)
  – High-priority transactions are not delayed by low-priority transactions
  – Low-priority transactions are not excessively penalized
• Most commercial systems such as DB2 focus on CPU prioritization, not lock prioritization

Page 3: Lock scheduling for TPC-C workload

Contributions

• Show that locks are the bottleneck resource
• Evaluate lock scheduling policies
  – Standard, NPrio, NPrioinher, PAbort
  – Average response time for high/low-priority transactions
• Transaction performance under non-preemptive/preemptive policies
• Implementation and evaluation of POW scheduling

Page 4: Lock scheduling for TPC-C workload

Bottleneck: locks

• TPC-C workload
• Assumption: CPU and I/O utilization are high
• Transactions spend more than 80% of their lifetime waiting for locks
• Response times depend on lock waiting

Page 5: Lock scheduling for TPC-C workload

Evaluating lock scheduling policies

• Policy comparison (O = yes, X = no):

  Policy       Preemptive   Prioritization   P-inheritance
  Standard     X            X                X
  NPrio        X            O                X
  NPrioinher   X            O                O
  PAbort       O            O                N/A

• Experimental setup:
  – 300 clients
  – Think times of [10, 1], giving [25, 250] active clients
  – 10% high-priority transactions, 90% low-priority transactions

Page 6: Lock scheduling for TPC-C workload

Average response time

• See Fig. 2(a) and (b).

Page 7: Lock scheduling for TPC-C workload

Policy overhead

• See Fig. 2(c).

Page 8: Lock scheduling for TPC-C workload

High-priority performance under non-preemptive policies

• Q1. How many lock requests do high-priority transactions wait for?

• Q2. How long are lock waits?

• Q3. How much lock waiting is attributed to current lock holders?

• Q4. How much do lock waits contribute to response times?

Page 9: Lock scheduling for TPC-C workload

Q1. How many lock requests do high-priority transactions wait for?

• Reducing the number of lock waits may be an effective strategy to improve high-priority transactions

[Figure: 99% / 1% split]

Page 10: Lock scheduling for TPC-C workload

Q2. How long are lock waits?

• QueueTime
  – From when the transaction initiates the lock request until it is granted
  – Includes preemption time
  – 40-50% of a high-priority transaction's response time
• PAbort QueueTime: only half of NPrioinher's

Page 11: Lock scheduling for TPC-C workload

Q3. How much lock waiting is attributed to current lock holders?

• WaitExcess
  – The time from when the lock request is made until the first transaction waiting for the lock is woken and acquires the lock
  – The time that a transaction waits for current holders to release the lock
• WaitRemainder
  – The time from when the first waiter acquires the lock until the lock request is finally granted
  – The time the transaction waits for other transactions in the queue with it
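The decomposition above is plain timestamp arithmetic; the following is a hedged sketch where the function and field names (decompose_lock_wait, t_request, etc.) are illustrative, not from the paper:

```python
# Hypothetical timestamp arithmetic for the lock-wait decomposition;
# names are illustrative, not from the paper.

def decompose_lock_wait(t_request, t_first_waiter_acquires, t_granted):
    """Split a lock wait into WaitExcess and WaitRemainder.

    t_request: when the transaction issued the lock request
    t_first_waiter_acquires: when the first waiter in the queue got the lock
    t_granted: when this transaction's request was finally granted
    """
    queue_time = t_granted - t_request
    wait_excess = t_first_waiter_acquires - t_request     # waiting on current holders
    wait_remainder = t_granted - t_first_waiter_acquires  # waiting on queue peers
    return queue_time, wait_excess, wait_remainder
```

Note that QueueTime = WaitExcess + WaitRemainder by construction, matching the slide's definitions.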

Page 12: Lock scheduling for TPC-C workload

Q3. How much lock waiting is attributed to current lock holders?

(cont.)

• High-priority transactions wait for only the current lock holders

[Figures: QueueTime and WaitExcess for NPrioinher; QueueTime and WaitExcess for NPrio]

Page 13: Lock scheduling for TPC-C workload

Q4. How much do lock waits contribute to response times?

• An accurate predictor for the length of a transaction’s remaining execution time is whether the transaction is about to wait for locks or not

Page 14: Lock scheduling for TPC-C workload

Preemption penalties of low-priority transactions

• The cost of rolling back transactions
  – In optimized commercial systems, rollback cost is less significant
• The number of preemptions (and rollbacks) per transaction
  – Less than 0.4, so the expected total cost is not large
• The work lost executing transactions that are subsequently preempted
  – Transactions are preempted after executing 75-90% of their length
  – This roughly doubles a transaction's expected execution cost
  – The most significant flaw of PAbort

Page 15: Lock scheduling for TPC-C workload

Best lock scheduling?

• Conclusions from the evaluations
  – High-priority transactions under non-preemptive policies wait too long for current lock holders
  – Low-priority transactions under preemptive policies are preempted after completing a significant amount of work
• Solution: Preempt-On-Wait scheduling (POW)
  – When a high-priority transaction H waits for a lock X1 held by a low-priority transaction L, L is preempted if and only if L currently, or in the future, waits for some other lock X2
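A minimal sketch of the POW rule, assuming a much-simplified lock manager; the class and function names (Txn, on_high_priority_wait, preempt) are hypothetical, not the authors' implementation:

```python
# Hypothetical sketch of the Preempt-On-Wait (POW) rule: a low-priority
# holder L is preempted only once it also waits for some lock, never
# while it is still running.

class Txn:
    def __init__(self, name, high_priority=False):
        self.name = name
        self.high_priority = high_priority
        self.pow_flag = False   # "fpow": a high-priority txn waits on a lock we hold
        self.waiting = False    # True while blocked on some lock

preempted = []                  # stand-in for rollback + lock release

def preempt(txn):
    preempted.append(txn.name)

def on_high_priority_wait(holder):
    """A high-priority txn H starts waiting on a lock held by `holder`."""
    if not holder.high_priority:
        if holder.waiting:
            preempt(holder)         # L already waits for another lock: preempt now
        else:
            holder.pow_flag = True  # defer: preempt only if L ever blocks

def on_lock_wait(txn):
    """`txn` is about to block on some other lock."""
    if txn.pow_flag:
        preempt(txn)                # flagged earlier by a waiting high-priority txn
    else:
        txn.waiting = True
```

Under this rule, a low-priority transaction that never blocks again is allowed to finish, which is exactly the reprieve that distinguishes POW from PAbort.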

Page 16: Lock scheduling for TPC-C workload

POW example

[Figure: POW example with locks X1, X2, X3 and low-priority transactions L1, L2, L3; waiting holders are marked "fpow: on" and preempted when H waits]

Page 17: Lock scheduling for TPC-C workload

POW performance evaluation

• Performance of high-priority transactions identical to PAbort

• Performance of low-priority transactions identical to NPrioinher

Page 18: Lock scheduling for TPC-C workload

POW vs. CR300

• CR300
  – Similar to PAbort, except that a transaction gets a reprieve time (300 ms) to complete before being preempted

• Lock waiting is more important than reprieve time

Page 19: Lock scheduling for TPC-C workload

Summary of POW performance

• Low-priority penalty
  – Due to work lost in transactions that are later preempted
  – Number of preempted transactions: less than 1% (cf. PAbort: 20%)
  – Age of preempted transactions: many complete during their reprieve instead of being preempted
• High-priority improvement
  – Reduces QueueTime of high-priority transactions significantly
  – A high-priority transaction waits no longer than it would have if it had preempted the current holder(s)

Page 20: Lock scheduling for TPC-C workload
Page 21: Lock scheduling for TPC-C workload

Multi-programming level for external scheduling

Slides originally prepared by Phil H. Sin

B. Schroeder, M. Harchol-Balter, A. Iyengar, E. Nahum, A. Wierman, "How to Determine a Good Multi-Programming Level for External Scheduling," ICDE 2006.

Page 22: Lock scheduling for TPC-C workload

Introduction

• For web applications it is desirable to control the order in which transactions are executed at the DBMS, where the majority of request processing time is spent
  – Response time is the time from when a transaction arrives until it completes, including time spent queueing externally to the DBMS
• External scheduling is used as a method of controlling the order in which transactions are executed
  – Provides class-based quality-of-service guarantees for database workloads
  – Portable and easy to implement, since it does not require changes to the complex internals of the backend DBMS
• A multi-programming limit (MPL) is used to control the number of transactions concurrently executing within the DBMS
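The MPL mechanism amounts to an admission gate in front of the DBMS. A minimal sketch, assuming a thread-per-transaction front end; the ExternalScheduler name and the semaphore choice are illustrative, not the paper's code:

```python
# Minimal sketch of an external scheduler enforcing an MPL: at most
# `mpl` transactions run inside the DBMS at once; the rest queue
# outside. Illustrative only, not the paper's implementation.
import threading

class ExternalScheduler:
    def __init__(self, mpl):
        self.slots = threading.BoundedSemaphore(mpl)

    def run(self, txn):
        self.slots.acquire()        # wait in the external queue
        try:
            return txn()            # dispatch to the DBMS
        finally:
            self.slots.release()    # free a slot for the next waiter
```

Because all queueing happens in this front end, the scheduler can reorder the external queue however it likes, which is what makes class-based prioritization possible without touching DBMS internals.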

Page 23: Lock scheduling for TPC-C workload

Choosing Appropriate MPL

• Holding transactions outside the DBMS and sequencing them creates the potential for head-of-line (HOL) blocking, where long-running transactions prevent shorter transactions from entering the DBMS
• Three important considerations:
  – Seek the lowest possible MPL value necessary to ensure optimal throughput levels inside the DBMS
  – Seek the lowest possible MPL value necessary to prevent an increase in overall mean response time
  – It is not at all obvious that external scheduling, even with a sufficiently low MPL, will be as effective as internal scheduling, since an external scheduler has no control over transactions once they are dispatched to the DBMS

Page 24: Lock scheduling for TPC-C workload

Experimental Setup

• To study the feasibility and effectiveness of external prioritization, it is important to evaluate the effect of different workloads and hardware configurations
  – DBMS: IBM DB2 and Shore
  – Hardware: 2.4-GHz Pentium 4 running Linux 2.4.23
  – Workloads: TPC-C and TPC-W
• A wide range of workloads is obtained by varying a large number of hardware and benchmark configuration parameters

Page 25: Lock scheduling for TPC-C workload

Benchmark configuration

• Parameters that are varied include:
  – Number of warehouses in TPC-C
  – Size of the TPC-W database, including both the number of items and the number of emulated browsers
  – Transaction mix used in TPC-W, in particular whether transactions are primarily "browsing" or primarily "ordering"

Page 26: Lock scheduling for TPC-C workload

Effect of low MPL

• Experimentally study low MPL values and their effect on throughput and on mean response time
  – Identify the workload factors that affect the MPL
• Effect on throughput: CPU-bound, I/O-bound, balanced, and lock-bound workloads
• Effect on response time: variability of the workload

Page 27: Lock scheduling for TPC-C workload

Throughput: CPU bound workloads

• WCPU-inventory vs. WCPU-browsing
• 1 CPU vs. 2 CPUs
• A higher MPL is needed to reach maximum throughput with 2 CPUs than with 1 CPU, because more transactions are needed to saturate 2 CPUs

Page 28: Lock scheduling for TPC-C workload

Throughput: I/O bound workloads

• WI/O-inventory and WI/O-browsing
• Varying numbers of disks
• The MPL needed to maximize throughput grows for systems with more disks, since more transactions are required to saturate more resources
• The MPL for WI/O-browsing (2 GB) is higher than for WI/O-inventory (6 GB) because its database is smaller than WI/O-inventory's

Page 29: Lock scheduling for TPC-C workload

Throughput: Balanced workloads

• WCPU+I/O-inventory
• 1 disk & 1 CPU vs. 4 disks & 2 CPUs
• The MPL required is largely proportional to the number of resources that are utilized in a system without an MPL

Page 30: Lock scheduling for TPC-C workload

Throughput: Lock-bound workloads

• Uncommitted read vs. repeatable read
  – Increasing the amount of locking lowers the MPL. When the amount of locking is high, throwing more transactions into the system does not increase the rate at which transactions complete.

Page 31: Lock scheduling for TPC-C workload

Response time

• Investigate the effect of the MPL value on mean response time in an open system with Poisson arrivals
  – The degree to which the MPL affects mean response time in TPC-W is dominated by the variability of the workload, rather than by other factors such as resource utilization
• A low MPL increases overall mean response time when short transactions get stuck waiting behind very long transactions in the external queue
  – External scheduling with an MPL can be viewed as a FIFO queue
  – In queueing theory, the mean response time of a FIFO queue is known to be directly affected by job-size variability
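The queueing-theory claim can be made concrete with the classical M/G/1 FIFO mean response time (the Pollaczek-Khinchine formula), a standard result the slide alludes to rather than states:

```latex
E[T] \;=\; E[S] \;+\; \frac{\lambda\, E[S^2]}{2\,(1-\rho)}, \qquad \rho = \lambda\, E[S]
```

where S is the job size and λ the arrival rate. The waiting term grows with the second moment E[S²], i.e. with job-size variability, which is why highly variable workloads suffer most from FIFO external queueing.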

Page 32: Lock scheduling for TPC-C workload

Factors influencing choice of MPL

• The key factor with respect to throughput is the number of resources that the workload utilizes
  – An I/O-bound workload running with 4 disks requires a higher MPL than if running with only 1 disk
• To optimize overall mean response time, the dominant factor for the MPL is the variability in transaction demands
  – Higher variability requires a higher MPL
• The number of resources utilized to keep throughput high matters more than the type of resources
  – With respect to both throughput and mean response time, results are hardly affected by whether the workload is I/O-bound, CPU-bound, or lock-bound

Page 33: Lock scheduling for TPC-C workload

Finding MPL

• Need to develop techniques for automatically tuning the MPL value to make the external scheduling approach viable in practice
  – Predict how a given MPL changes throughput and mean response time relative to the optimal performance
• Develop queueing-theoretic models and analysis to capture the basic properties of the relationship between system throughput, response time, and the MPL
  – Use the models to predict a lower bound on the MPL that limits performance penalties to some specified threshold
  – A control loop optimizes the value in alternating observation and reaction phases
    • The observation phase collects data on the relevant performance metrics (throughput and mean response time)
    • The reaction phase updates the MPL accordingly

Page 34: Lock scheduling for TPC-C workload

Queueing analysis of throughput vs. MPL

• Model the MPL using a closed system with a fixed number of clients (the MPL)
  – Compare results to the maximum throughput for the system until finding the lowest MPL value that leads to the desired throughput level
  – "Worst-case" assumptions: all resources are equally utilized, and a client never utilizes two resources at once
• The MPL required to reach near-maximum throughput grows linearly with the number of disks
  – Minimum MPL required to achieve 80% of maximum throughput (circles)
  – Minimum MPL required to achieve 95% of maximum throughput (squares)
• The analysis captures the main trends of the throughput-vs.-MPL function, yielding an initial estimate of the MPL required to achieve the desired throughput
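The closed-system model can be sketched with textbook exact mean-value analysis (MVA) for K identical stations with zero think time, matching the slide's worst-case assumptions of equal utilization and one resource per client at a time. This is a generic illustration (function names and thresholds are mine), not the authors' exact analysis:

```python
# Exact mean-value analysis (MVA) for a closed network of K identical
# queueing resources, each with per-visit service time `svc_time` and
# zero think time. A generic sketch of the slide's worst-case model.

def throughput_vs_mpl(num_resources, svc_time, max_mpl):
    """Return the list of throughputs X(N) for N = 1..max_mpl clients."""
    K, S = num_resources, svc_time
    Q = [0.0] * K                            # mean queue length per resource
    X = []
    for n in range(1, max_mpl + 1):
        R = [S * (1.0 + q) for q in Q]       # arrival theorem: residence times
        x = n / sum(R)                       # Little's law (no think time)
        Q = [x * r for r in R]
        X.append(x)
    return X

def min_mpl(num_resources, svc_time, frac=0.95, max_mpl=200):
    """Smallest MPL reaching `frac` of the bottleneck throughput 1/svc_time."""
    X = throughput_vs_mpl(num_resources, svc_time, max_mpl)
    xmax = 1.0 / svc_time
    return next(n + 1 for n, x in enumerate(X) if x >= frac * xmax)
```

For identical stations this recurrence gives X(N) = N / (S (K + N - 1)), so the MPL needed to come within a fixed fraction of maximum throughput grows linearly in the number of resources K, the trend the slide reports for disks.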

Page 35: Lock scheduling for TPC-C workload

Queueing analysis of response time vs. MPL

• The external scheduling mechanism with an MPL parameter can be viewed as a single unbounded FIFO queue feeding into a processor-sharing (PS) server
  – Mean response time is dominated by the variability
• Continuous-time Markov chain (CTMC) model
  – Vary the variability C² by modeling job sizes with a 2-phase hyperexponential (H2) distribution with probability parameter p and rates μ1 and μ2
  – The number of servers fluctuates between 1 and the MPL
  – The sum of the service rates at the multiple servers is always kept constant, equal to that of the single PS server
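One common way to pick the H2 parameters (p, μ1, μ2) for a target mean and C² is the balanced-means heuristic; the slide does not say how the parameters were chosen, so the fitting recipe below is an assumption, not the paper's method:

```python
# Fit a 2-phase hyperexponential (H2) to a target mean and squared
# coefficient of variation C^2 >= 1 using the balanced-means heuristic
# (p/mu1 == (1-p)/mu2); a generic sketch, not the paper's method.
import math
import random

def h2_params(mean, c2):
    """Return (p, mu1, mu2) matching the requested mean and C^2 (C^2 >= 1)."""
    p = 0.5 * (1.0 + math.sqrt((c2 - 1.0) / (c2 + 1.0)))
    mu1 = 2.0 * p / mean
    mu2 = 2.0 * (1.0 - p) / mean
    return p, mu1, mu2

def sample_h2(p, mu1, mu2, rng=random):
    """Draw one H2 job size: Exp(mu1) with probability p, else Exp(mu2)."""
    mu = mu1 if rng.random() < p else mu2
    return rng.expovariate(mu)
```

A quick check of the algebra: the mean is p/μ1 + (1-p)/μ2 and C² = 1/(2p(1-p)) - 1, which the balanced-means choice makes equal to the targets.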

Page 36: Lock scheduling for TPC-C workload

Evaluation of CTMC

• For low levels of variability, mean response time is largely independent of MPL value

• For high levels of variability, the required MPL depends on the load

Page 37: Lock scheduling for TPC-C workload

Simple controller to find lowest feasible MPL

• Choosing the right amount by which to adjust the parameter in each iteration
  – If too small, conservative adjustments will lead to long convergence times
  – If too large, adjustments can cause overshooting and oscillations
• A control loop with a close-to-optimal starting value provides fast convergence times
• A critical factor in implementing a feedback-based controller is the choice of observation period: enough samples are needed to provide reliable estimates of mean response time and throughput
  – Observation periods spanning around 100 transactions provide stable estimates
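The alternating observation/reaction loop might look like the following hedged sketch, where measure() stands for an observation phase over roughly 100 transactions and the step size is illustrative:

```python
# Hypothetical sketch of the controller: grow the MPL until measured
# throughput meets the target, then shrink it to the lowest value that
# still meets the target. Step size and bounds are illustrative.

def lowest_feasible_mpl(measure, mpl, target_tput, step=1, max_mpl=500):
    """measure(mpl) -> observed throughput over an observation period."""
    # reaction phase (grow): admit more transactions while below target
    while measure(mpl) < target_tput and mpl < max_mpl:
        mpl += step
    # reaction phase (shrink): tighten while the target is still met,
    # to keep response-time penalties from an unnecessarily high MPL low
    while mpl > step and measure(mpl - step) >= target_tput:
        mpl -= step
    return mpl
```

A close-to-optimal starting `mpl` (e.g. from the MVA-style throughput estimate) shortens the grow phase, which is the fast-convergence point the slide makes.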

Page 38: Lock scheduling for TPC-C workload

External scheduling for prioritization

• A low MPL gives control over the order in which transactions are scheduled, since the order can be determined from the external queue
  – The remaining problem is differentiating between high- and low-priority transactions
• Allow as many transactions into the system as the MPL permits, where high-priority transactions are given first priority and low-priority transactions are chosen only if no high-priority transactions are waiting
  – Randomly assign 10% of transactions high priority and the remainder low priority
• The MPL is adjusted to limit throughput loss to 5% or 20% compared to the case where no external scheduling is used
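The "high-priority first" admission rule can be sketched with two external queues; the class and method names are illustrative:

```python
# Sketch of priority-aware external dispatch: whenever an MPL slot
# frees up, take a high-priority transaction first and fall back to
# low priority only when none are waiting. Names are illustrative.
from collections import deque

class PriorityDispatcher:
    def __init__(self):
        self.high = deque()
        self.low = deque()

    def submit(self, txn, high_priority=False):
        (self.high if high_priority else self.low).append(txn)

    def next_txn(self):
        """Called whenever an MPL slot frees up; None if queues are empty."""
        if self.high:
            return self.high.popleft()
        if self.low:
            return self.low.popleft()
        return None
```

Within each class the order stays FIFO, so low-priority transactions are only deferred, never starved, as long as high-priority arrivals are a small fraction (10% in the experiments).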

Page 39: Lock scheduling for TPC-C workload

Internal scheduling

• Scheduling the internals of the DBMS is more involved than external scheduling
  – It is not clear which resource should be prioritized: the CPU, the disk, the lock queues, etc.
    • For workloads run on a 2PL DBMS, transaction execution times are often dominated by lock waiting times
    • In other DBMSs, execution times are dominated by CPU usage or I/O
• Lock-bound workloads: Preempt-on-Wait lock prioritization policy
  – High-priority transactions move ahead of low-priority transactions in the lock queue
• CPU-bound workloads: manual CPU scheduling
  – High-priority transactions are set to -20 (highest) and low-priority transactions to 20 (lowest) available CPU priority

Page 40: Lock scheduling for TPC-C workload

Internal vs. External

• Setup
  – 1: WCPU-inventory, 1 CPU, 1 disk, RR
  – 3: WCPU-browsing, 1 CPU, 1 disk, RR
• Comparison
  – Internal vs. external (5%, 20%, 0% throughput loss) prioritization
• External scheduling is nearly as effective as the internal scheduling algorithms, and can even be more effective when the MPL is low
• The key point is that external scheduling is a promising approach when the MPL is adjusted appropriately

Page 41: Lock scheduling for TPC-C workload

Conclusion

• The dominant factor in lower-bounding the MPL with respect to minimizing throughput loss is the number of resources that the workload utilizes.

• The key factor in choosing an MPL so as not to hurt overall mean response time is the variability in the service demands of transactions

• External scheduling mechanism is highly effective in providing prioritization differentiation.

Page 42: Lock scheduling for TPC-C workload

Questions?