A Case for an Interleaving Constrained Shared-Memory Multi-Processor
Jie Yu and Satish Narayanasamy
University of Michigan
Why is Parallel Programming Hard?
• Is single-threaded programming relatively easy?
– Verification is NP-hard
– BUT, properties such as a function’s pre/post-conditions, loop invariants are verifiable in polynomial time
• Parallel programming is harder
– Verifying properties for even small code regions is NP-hard
– Reason: Unbounded number of legal thread interleavings exposed to the parallel runtime
– Impractical to test/verify properties for all legal interleavings
Legal Thread Interleavings
Too much freedom given to parallel runtime?
[Figure: the space of legal thread interleavings, with these regions labeled]
• Tested correct interleavings
• Incorrect interleavings found during testing
– Eliminated by adding synchronization constraints
• Untested interleavings
– Cause for concurrency bugs
Solution: Limit Freedom
• Programmer tests as many legal interleavings as practically possible
• Interleaving constraints from correct test runs are encoded in the program binary
• Runtime system avoids untested interleavings, i.e. avoids corner cases
Result of Constraining Interleavings
• A majority of the concurrency bugs are avoidable
– Data races, atomicity violations, and also order violations
• Performance overhead is low
– Untested interleavings in well-tested programs are likely to manifest rarely
– Processor support helps reduce the cost of enforcing interleaving constraints
Challenges
• How to encode tested interleavings in a program’s binary?
– Predecessor Set (PSet) interleaving constraints
• How to efficiently enforce interleaving constraints at runtime?
– Detect violations of PSet constraints using processor support
– Avoid violations by stalling or using rollback-and-re-execution support
Encoding Tested Interleavings
• Interleaving Constraints from Test Runs
– Too specific to a test input: performance loss for a different input
– Too generic: might allow untested interleavings
• Predecessor Set (PSet)
– PSet(m) is defined for each static memory operation m
– pred ∈ PSet(m) if m is immediately and remotely memory dependent on pred in at least one tested execution
A Test Run
[Figure: memory operations W1, W2, W3, R1, R2, R3, R4 of one test run across Thread 1, Thread 2, and Thread 3, each annotated with its PSet]
PSet(W1) = {}
PSet(R1) = {}
PSet(R2) = {W1}
PSet(R3) = {W1}
PSet(R4) = {}
PSet(W2) = {R3, R4}
PSet(W3) = {W2}
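How PSets could be collected from a test run can be sketched in software. The following is a minimal sketch, not the authors' tool: the trace format, the names `Access`, `LocationState`, and `learn_psets`, and the exact dependence rule are assumptions. It assumes a read depends on the last write to the same location, a write depends on the reads since the last write (or on the last write when no read intervenes), and only dependences on a different thread are recorded.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// One dynamic memory access in a recorded test run (hypothetical format).
struct Access {
    int thread;         // id of the thread that executed the access
    std::string inst;   // label of the static memory instruction, e.g. "W1"
    bool is_write;      // true for a store, false for a load
    int addr;           // memory location touched
};

using PSetTable = std::map<std::string, std::set<std::string>>;

// Per-location bookkeeping: the last write and the reads seen since it.
struct LocationState {
    bool has_writer = false;
    Access last_writer{};
    std::vector<Access> readers_since_write;
};

// Adds the remote dependences observed in one test run to `psets`.
// Calling it once per test run accumulates the union over all runs.
void learn_psets(const std::vector<Access>& trace, PSetTable& psets) {
    std::map<int, LocationState> mem;
    for (const Access& a : trace) {
        psets[a.inst];  // every executed instruction gets an entry, possibly empty
        LocationState& loc = mem[a.addr];
        if (a.is_write && !loc.readers_since_write.empty()) {
            // Write-after-read: depends on remote reads since the last write.
            for (const Access& r : loc.readers_since_write)
                if (r.thread != a.thread) psets[a.inst].insert(r.inst);
        } else if (loc.has_writer && loc.last_writer.thread != a.thread) {
            // Read-after-write or write-after-write on a remote store.
            psets[a.inst].insert(loc.last_writer.inst);
        }
        if (a.is_write) {
            loc.last_writer = a;
            loc.has_writer = true;
            loc.readers_since_write.clear();
        } else {
            loc.readers_since_write.push_back(a);
        }
    }
}
```

The thread placement of the original figure did not survive, but one hypothetical single-location trace that reproduces the PSets listed on the slide is: R1 and W1 on thread 1, R2 on thread 2, R3 on thread 3, R4 on thread 1, W2 on thread 2, W3 on thread 3, in that order.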
Enforcing Tested Interleaving
• Processor support for detecting and avoiding PSet constraint violations
• Detecting PSet constraint violations
– For each memory location, track its last accessor
  • Cache extension
– Detect PSet constraint violation
  • Piggyback cache coherence reply with last accessor
  • Processor executes PSet membership test by executing additional micro-ops
• Overcoming a PSet constraint violation
– Stall
– Re-execute using checkpoint-and-rollback support
  • E.g. SafetyNet, ReVive, etc.
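The membership test can be pictured in software as follows. This is only a sketch of the idea, not the hardware design: `LastAccessor`, `Action`, and `check_access` are hypothetical names, and the fields carried by the coherence reply are assumed. A linear scan stands in for the extra micro-ops, which is plausible because most PSets in the measurements are empty or very small.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical summary of what a coherence reply might carry about the
// last access to the requested location.
struct LastAccessor {
    bool valid;         // false if the location has no recorded accessor
    int thread;         // thread (or core) that performed the last access
    std::uint64_t pc;   // address of the static instruction that performed it
};

enum class Action { Proceed, StallOrRollback };

// Decides whether the current access may proceed, given the PSet of the
// accessing instruction (a list of predecessor instruction addresses).
Action check_access(int my_thread,
                    const std::vector<std::uint64_t>& my_pset,
                    const LastAccessor& last) {
    // No remote dependence: nothing to check.
    if (!last.valid || last.thread == my_thread) return Action::Proceed;
    // Remote dependence: allowed only if it was seen in a tested run.
    for (std::uint64_t pc : my_pset)
        if (pc == last.pc) return Action::Proceed;
    return Action::StallOrRollback;
}
```

Whether a flagged access is then stalled or rolled back is a separate policy decision; the sketch only reports that one of the two is needed.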
Two Case Studies
• Case Study 1
– An atomicity violation bug in MySQL
– Avoided using stall
• Case Study 2
– An order violation bug in Mozilla
  • Neither a data race nor an atomicity violation
– Avoided using rollback and re-execution
An Atomicity Violation Bug in MySQL
Thread 1 (sql/log.cc):
MYSQL_LOG::new_file(){
  …
  close();    // W1: … log_status = LOG_CLOSED; …
  open(…);    // W2: … log_status = LOG_OPEN; …
  …
}
Thread 2 (sql/sql_insert.cc):
mysql_insert(…){
  …
  if (log_status != LOG_CLOSED) {    // R1
    // write into a log file
  }
  …
}
Correct Interleaving #1 -- “frequent”, therefore likely to be tested
[Figure: Thread 2 executes R1 (log_status != LOG_CLOSED ?) before Thread 1 executes W1 (log_status = LOG_CLOSED) and W2 (log_status = LOG_OPEN)]
PSet(W1) = {R1}
PSet(W2) = {}
PSet(R1) = {}
Correct Interleaving #2 -- “frequent”, therefore likely to be tested
[Figure: Thread 1 executes W1 (log_status = LOG_CLOSED) and W2 (log_status = LOG_OPEN) before Thread 2 executes R1 (log_status != LOG_CLOSED ?)]
PSet(W1) = {R1}
PSet(W2) = {}
PSet(R1) = {W2}
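Incorrect Interleaving -- rare, and therefore likely to be untested
[Figure: Thread 2 executes R1 (log_status != LOG_CLOSED ?) between Thread 1's W1 (log_status = LOG_CLOSED) and W2 (log_status = LOG_OPEN)]
PSets from the tested runs: PSet(W1) = {R1}, PSet(W2) = {}, PSet(R1) = {W2}
Constraint violation: W1 ∉ PSet(R1)
W2 ∈ PSet(R1), so stalling R1 until W2 has executed yields a tested interleaving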
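The stall-based avoidance for this bug can be replayed in a small simulation. This is a sketch under stated assumptions, not the processor mechanism: `Op` and `replay_with_stall` are hypothetical names, there is a single shared variable (log_status), and the dependence rule is that a read depends on the last write while a write depends on the reads since the last write, or on the last write when there are none.

```cpp
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <vector>

// One access to the single shared variable (log_status in the example).
struct Op {
    int thread;
    std::string inst;   // static instruction label, e.g. "R1"
    bool is_write;
};

using PSetTable = std::map<std::string, std::set<std::string>>;

// Replays `attempted`, the order the threads would run in if unconstrained.
// An access that would violate its PSet is stalled, together with later
// accesses of the same thread, and retried after every executed access.
// Returns the executed order; accesses still stalled at the end are omitted
// (a real system would need rollback or a timeout for those).
std::vector<std::string> replay_with_stall(const std::vector<Op>& attempted,
                                           const PSetTable& psets) {
    std::vector<std::string> executed;
    std::vector<Op> stalled;            // kept in program order
    Op last_writer{-1, "", true};       // thread -1 means "no write yet"
    std::vector<Op> readers;            // reads since the last write

    auto in_pset = [&](const Op& op, const Op& pred) {
        auto it = psets.find(op.inst);
        return it != psets.end() && it->second.count(pred.inst) > 0;
    };
    auto allowed = [&](const Op& op) {
        if (op.is_write && !readers.empty()) {
            for (const Op& r : readers)
                if (r.thread != op.thread && !in_pset(op, r)) return false;
            return true;
        }
        return last_writer.thread == -1 || last_writer.thread == op.thread ||
               in_pset(op, last_writer);
    };
    auto execute = [&](const Op& op) {
        executed.push_back(op.inst);
        if (op.is_write) { last_writer = op; readers.clear(); }
        else { readers.push_back(op); }
    };
    auto thread_is_stalled = [&](int thread) {
        for (const Op& s : stalled) if (s.thread == thread) return true;
        return false;
    };
    auto retry_stalled = [&]() {
        bool progress = true;
        while (progress) {
            progress = false;
            std::set<int> blocked;  // threads whose oldest stalled access still fails
            for (std::size_t i = 0; i < stalled.size(); ++i) {
                const Op op = stalled[i];
                if (blocked.count(op.thread) > 0) continue;
                if (allowed(op)) {
                    stalled.erase(stalled.begin() + i);
                    execute(op);
                    progress = true;
                    break;
                }
                blocked.insert(op.thread);
            }
        }
    };

    for (const Op& op : attempted) {
        if (thread_is_stalled(op.thread) || !allowed(op)) {
            stalled.push_back(op);
            continue;
        }
        execute(op);
        retry_stalled();
    }
    return executed;
}
```

With PSet(W1) = {R1}, PSet(W2) = {}, PSet(R1) = {W2} and the attempted order W1, R1, W2, the replay stalls R1 and executes W1, W2, R1, which is Correct Interleaving #2.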
Correct Test Run
Thread 1 (TimerThread.cpp):
TimerThread::Run() {
  ...
  Lock(lock);
  mProcessing = TRUE;
  while (mProcessing) {
    ...
    mWaiting = TRUE;      // W
    Wait(cond, lock);
    mWaiting = FALSE;
  }
  Unlock(lock);
  ...
}
Thread 2 (TimerThread.cpp):
TimerThread::Shutdown() {
  ...
  Lock(lock);
  mProcessing = FALSE;
  if (mWaiting)           // R
    Notify(cond, lock);
  Unlock(lock);
  ...
  mThread->Join();
  return NS_OK;
}
In the tested run, W (mWaiting = TRUE) executes before R (if (mWaiting) ?).
PSet(W) = {}
PSet(R) = {W}
Avoiding Order Violation
[Figure: the same TimerThread::Run() and TimerThread::Shutdown() code from TimerThread.cpp, with Thread 2's R (if (mWaiting) ?) executing before Thread 1's W (mWaiting = TRUE)]
PSet(W) = {}
PSet(R) = {W}
Constraint violation: R ∉ PSet(W)
Rollback
Methodology
• Pin-based analysis
• 17 documented bugs analyzed
– MySQL, Apache, Mozilla, pbzip2, aget, pfscan
– Plus Parsec and Splash for the performance study
• Applications tested using regression test suites when available, or random test input
PSet Constraints from Test Runs
• Concurrent workload
– MySQL: run regression test suite in parallel with OSDB
– FFT, pbzip2: random test input
Bug Avoidance Capability
• 17 bugs from MySQL, Apache, Mozilla, pbzip2, aget, pfscan
• 15/17 bugs avoided by enforcing PSet constraints
– Including a bug that is neither a data race nor an atomicity violation bug
• 2/17 false negatives
– A multi-variable atomicity violation
– A context-sensitive deadlock bug
• 6 bugs are avoided using the stalling mechanism. The others require the rollback mechanism.
PSet violations in Bug Free Execution
• 2 PSet constraint violations in MySQL not avoided
– MySQL, bmove512 unrolls a loop 128 times
PSet Size of Instructions
Over 95% of the instructions have PSets of size zero
Less than 2% of static memory instructions have a PSet of size greater than two
Summary
• Multi-threaded programming is hard
– Existing shared-memory programming model exposes too many legal interleavings to the runtime
– Most interleavings remain untested in production code
• Interleaving constrained shared-memory multiprocessor
– Avoids untested (rare) interleavings to avoid concurrency bugs
• Predecessor Set interleaving constraints
– 15/17 concurrency bugs are avoidable
– Acceptable performance and space overhead
Thanks
• Q & A
Memory Space Overhead
Program        App. Size   # PSet Pairs   Overhead w.r.t. App.
Pbzip2         39KB        201            2.16%
Aget           90KB        365            1.69%
Pfscan         17KB        295            7.34%
Apache         2435KB      4119           0.69%
MySQL          4284KB      6604           0.64%
FFT            24KB        158            2.74%
FMM            73KB        1764           10.13%
LU             24KB        244            4.31%
Radix          21KB        255            5.00%
Blackscholes   54KB        41             0.32%
Canneal        59KB        752            5.24%
Space overhead: in the worst case, a 10% code size increase
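A back-of-the-envelope reading of the table: dividing a program's overhead in bytes by its number of PSet pairs gives the storage implied per pair. The helper below is a hypothetical sketch; it assumes 1 KB = 1024 bytes, and the rounded table values limit its precision.

```cpp
// Implied storage per PSet pair, derived from one row of the table.
// Assumption: app_kb is in units of 1024 bytes.
double implied_bytes_per_pair(double app_kb, int pairs, double overhead_percent) {
    const double app_bytes = app_kb * 1024.0;
    const double overhead_bytes = app_bytes * (overhead_percent / 100.0);
    return overhead_bytes / pairs;
}
```

For example, the Pbzip2 row (39KB, 201 pairs, 2.16%) and the MySQL row (4284KB, 6604 pairs, 0.64%) both come out to a little over 4 bytes per pair. The slides do not state the actual encoding, so this is only an observation about the reported numbers.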