Pitfalls in Teaching Development and Testing of Concurrent Programs and How to Overcome Them
Eitan Farchi
IBM Labs in Haifa
2 Contest
Objectives of the course I wanted to teach
Background: the process abstraction, mutual exclusion and conditional synchronization, scheduling policies and fairness, the process life cycle, synchronization primitives (semaphores, monitors), message passing, logical time, examples,…
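Design the protocol through an abstraction: use atomic and atomic-wait primitives, whose semantics are
$(c_1, s_1) \to s_2 \;\Rightarrow\; (c_1 \parallel c_2,\ s_1) \to (c_2,\ s_2)$ (interleaving)
$(c_1, s_1) \to s_2 \;\Rightarrow\; (\langle c_1 \rangle,\ s_1) \to s_2$ (atomic action)
$(b, s_1) \to \mathit{true} \;\wedge\; (c, s_1) \to s_2 \;\Rightarrow\; (\langle \mathbf{await}\ b\ c \rangle,\ s_1) \to s_2$ (conditional atomic action)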
Using higher-abstraction-level synchronization primitives leads to fewer possible interleavings and makes mistakes less likely.
The design is validated by reviewing the important interleavings and by formal reasoning (invariants, proofs, model checking,…).
Higher-abstraction-level synchronization primitives are correctly translated to lower-abstraction-level synchronization primitives. For example, an atomic primitive is carefully translated to locks and unlocks, and bug patterns are used to avoid mistakes.
The implementation is tested using ConTest. At this stage, a good test plan is readily available from the previous development phases.
I taught this course for several years in various formats:
to third-year computer science students;
to professional programmers, with and without experience in developing concurrent programs, holding at least a first degree in computer science;
to testers with various degrees of programming skill.
Real-world description of the ticket algorithm (start with something concrete)
Some stores and government offices employ the following method to ensure that customers are serviced in order of arrival. Upon entering the store, a customer draws a number that is larger than the number held by any other customer. The customer then waits until all customers holding smaller numbers have been serviced. This algorithm is implemented by a number dispenser and by a display indicating which customer is being served. If the store has one employee behind the service counter, customers are served one at a time in their order of arrival.
High-level implementation of the ticket algorithm
var number := 1, next := 1, turn[1:n] := ([n], 0)
P[i:1..n]:: do true ->
    <turn[i] := number, number := number + 1>
    <await turn[i] == next>
    critical section
    <next := next + 1>
    non-critical section
od
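A minimal Java sketch of the pseudocode above, assuming java.util.concurrent.atomic.AtomicInteger for the atomic ticket grab and a yielding busy-wait for <await turn[i] == next>. The class names TicketLock and TicketDemo are hypothetical, and the comments map each step to the store analogy.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical Java rendering of the high-level ticket algorithm.
class TicketLock {
    private final AtomicInteger number = new AtomicInteger(1); // the number dispenser
    private final AtomicInteger next = new AtomicInteger(1);   // the "now serving" display

    // <turn[i] := number, number := number + 1> then <await turn[i] == next>
    int lock() {
        int turn = number.getAndIncrement(); // draw a ticket in one atomic step
        while (next.get() != turn) {         // spin until our number is displayed,
            Thread.yield();                  // yielding the CPU while we wait
        }
        return turn;
    }

    // <next := next + 1>: call the next customer
    void unlock() {
        next.incrementAndGet();
    }
}

class TicketDemo {
    // Starts `threads` threads that each enter the critical section `rounds` times
    // and returns the final value of a counter incremented only inside it.
    static int run(int threads, int rounds) throws InterruptedException {
        TicketLock lock = new TicketLock();
        int[] counter = {0};
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int r = 0; r < rounds; r++) {
                    lock.lock();
                    try {
                        counter[0]++;  // critical section
                    } finally {
                        lock.unlock(); // the non-critical section follows
                    }
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            w.join();
        }
        return counter[0];
    }
}
```

For example, TicketDemo.run(3, 300) should return 900. Spinning suits short critical sections; a monitor-based wait (wait/notifyAll) avoids burning CPU when waits are long.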
Mapping of the previous two abstraction levels (real-world and high-level descriptions):
<turn[i] := number, number := number + 1>   // customer obtains a ticket
<await turn[i] == next>                      // customer waits for their turn
<next := next + 1>                           // call the next customer
Testing/validating the protocol
Even if the synchronization primitives are high level, there are typically too many interleavings to review.
This is addressed by inductive proofs and invariants. Assume process i has entered the critical section; then turn[i] == next right after <await turn[i] == next>, and next does not change until process i executes <next := next + 1>. It is easy to prove that turn[i] <> turn[j] if i <> j, turn[i] <> 0, and turn[j] <> 0. Thus, as long as process i has not exited the critical section, any other process j that reaches <await turn[j] == next> will have to wait, so at most one process can be in the critical section.
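The invariant argument can be written out as an induction over the atomic actions. The conjunct $I_1$ (every ticket handed out is smaller than number) is a strengthening added here so that the induction goes through; it is not stated on the slide.

```latex
\begin{aligned}
I_1 &:\quad \forall i.\; turn[i] < number\\
I_2 &:\quad \forall i \neq j.\; \bigl(turn[i] \neq 0 \wedge turn[j] \neq 0\bigr) \Rightarrow turn[i] \neq turn[j]\\[4pt]
\text{Init} &:\quad number = 1,\; turn[k] = 0\ \forall k \;\Rightarrow\; I_1 \wedge I_2\\
\text{Step} &:\quad \langle turn[i] := number,\ number := number + 1\rangle:\\
&\quad turn'[i] = number > turn[j] = turn'[j]\ (j \neq i) \;\Rightarrow\; I_2'\\
&\quad number' = number + 1 > turn'[k]\ \forall k \;\Rightarrow\; I_1'
\end{aligned}
```

No other action writes turn or number, so $I_1 \wedge I_2$ holds in every reachable state. A process waiting at the await has already drawn a ticket, so its turn is nonzero; hence, while process i is in the critical section with turn[i] == next, every other waiting process j has turn[j] ≠ next and blocks.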
Students: “We don’t like mathematics and we don’t like proofs; in fact, we hate them.”
“And by the way, the ticket algorithm is ridiculously simple: it’s only a loop with four lines of code.”
Maybe they don’t understand that there is an exponential space of possible interleavings?
Objectives of the course - updated
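Background: the process abstraction, mutual exclusion and conditional synchronization, scheduling policies and fairness, the process life cycle, synchronization primitives (semaphores, monitors), message passing, logical time, examples,…
Design the protocol through an abstraction: use atomic and atomic-wait primitives, whose semantics are
$(c_1, s_1) \to s_2 \;\Rightarrow\; (c_1 \parallel c_2,\ s_1) \to (c_2,\ s_2)$ (interleaving)
$(c_1, s_1) \to s_2 \;\Rightarrow\; (\langle c_1 \rangle,\ s_1) \to s_2$ (atomic action)
$(b, s_1) \to \mathit{true} \;\wedge\; (c, s_1) \to s_2 \;\Rightarrow\; (\langle \mathbf{await}\ b\ c \rangle,\ s_1) \to s_2$ (conditional atomic action)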
Using higher-abstraction-level synchronization primitives leads to fewer possible interleavings and makes mistakes less likely.
The design is validated by:
systematically representing the set of possible interleavings, typically through the use of Cartesian product models;
reviewing the important interleavings.
Higher-abstraction-level synchronization primitives are correctly translated to lower-abstraction-level synchronization primitives. For example, an atomic primitive is carefully translated to locks and unlocks, and bug patterns are used to avoid mistakes.
The implementation is tested using ConTest. At this stage, a good test plan is readily available from the previous development phases.
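A Cartesian product model can be made concrete with a tiny explicit-state explorer for the ticket protocol. This is a sketch under stated assumptions: each process runs a bounded number of loop iterations (so the state space is finite), the non-critical section is omitted because it touches no shared variables, and each <…> action is one atomic step. TicketModelCheck and its methods are hypothetical names; this is a teaching toy, not a real model checker.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// A state is one element of the Cartesian product
// pc[0..n-1] x done[0..n-1] x number x next x turn[0..n-1].
class TicketModelCheck {
    static final int TAKE = 0, AWAIT = 1, CS = 2; // program counter values

    // Explores every interleaving of n processes, each looping `rounds` times,
    // and returns the number of reachable states. Throws if an invariant fails.
    static int explore(int n, int rounds) {
        int[] init = new int[3 * n + 2]; // all pcs TAKE, all done counts 0, all turns 0
        init[2 * n] = 1;                 // number := 1
        init[2 * n + 1] = 1;             // next := 1
        Set<List<Integer>> seen = new HashSet<>();
        Deque<int[]> work = new ArrayDeque<>();
        seen.add(key(init));
        work.push(init);
        while (!work.isEmpty()) {
            int[] s = work.pop();
            check(s, n);
            for (int i = 0; i < n; i++) {    // try one atomic step of each process
                int[] t = step(s, i, n, rounds);
                if (t != null && seen.add(key(t))) {
                    work.push(t);
                }
            }
        }
        return seen.size();
    }

    // Mutual exclusion and the "distinct nonzero tickets" invariant.
    static void check(int[] s, int n) {
        int inCs = 0;
        for (int i = 0; i < n; i++) {
            if (s[i] == CS) inCs++;
            for (int j = i + 1; j < n; j++) {
                int ti = s[2 * n + 2 + i], tj = s[2 * n + 2 + j];
                if (ti != 0 && ti == tj) throw new IllegalStateException("duplicate ticket " + Arrays.toString(s));
            }
        }
        if (inCs > 1) throw new IllegalStateException("mutual exclusion violated " + Arrays.toString(s));
    }

    // One atomic step of process i, or null if it is blocked or has finished.
    static int[] step(int[] s, int i, int n, int rounds) {
        int pc = i, done = n + i, number = 2 * n, next = 2 * n + 1, turn = 2 * n + 2 + i;
        if (s[done] == rounds) return null;
        int[] t = s.clone();
        switch (s[pc]) {
            case TAKE:  // <turn[i] := number, number := number + 1>
                t[turn] = s[number];
                t[number] = s[number] + 1;
                t[pc] = AWAIT;
                break;
            case AWAIT: // <await turn[i] == next>
                if (s[turn] != s[next]) return null;
                t[pc] = CS;
                break;
            default:    // leave the critical section: <next := next + 1>
                t[next] = s[next] + 1;
                t[pc] = TAKE;
                t[done] = s[done] + 1;
        }
        return t;
    }

    static List<Integer> key(int[] s) {
        List<Integer> k = new ArrayList<>(s.length);
        for (int v : s) k.add(v);
        return k;
    }
}
```

explore(1, 1) returns 4 (take ticket, await, critical section, leave). A useful exercise is to break the protocol, for example by splitting the ticket grab into a separate read of number and a later write; the explorer should then report a duplicate ticket.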
Helping the students realize that there is an exponential interleaving space
First attempt, counting: the number of possible interleavings is enormous. For (a;b;c;e;f;g) || (h;i;j;k;l;m) of non-blocking atomic actions, the number of possible traces is 12!/(6!*6!) = 924.
Second attempt, riddles: 100 threads are executing x++ on a shared variable initialized to 0; what are the possible outcomes?
Students: “OK, there are many things happening together in parallel and they can occur in many ways, but it is hard, too hard, to think about things happening in parallel.”
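Both attempts can be backed by small Java computations. interleavings(m, n) evaluates (m+n)!/(m!*n!) for two threads with m and n atomic actions; incrementOutcomes enumerates every interleaving of a few threads that each run x++ once, assuming x++ is a separate load and store (on a shared Java field it is not atomic). The riddle's 100 threads are far too many to enumerate this way, so the sketch uses small thread counts. The class and method names are hypothetical.

```java
import java.math.BigInteger;
import java.util.Set;
import java.util.TreeSet;

class InterleavingRiddles {
    // Number of interleavings of two threads with m and n non-blocking atomic
    // actions: (m + n)! / (m! * n!), computed as the binomial coefficient C(m + n, n).
    static BigInteger interleavings(int m, int n) {
        BigInteger result = BigInteger.ONE;
        for (int k = 1; k <= n; k++) {
            // After this step result == C(m + k, k), so the division is always exact.
            result = result.multiply(BigInteger.valueOf(m + k)).divide(BigInteger.valueOf(k));
        }
        return result;
    }

    // All final values of x when `threads` threads each execute x++ once on a
    // shared x initialized to 0, with x++ modelled as two atomic steps:
    // tmp := x (load) and x := tmp + 1 (store).
    static Set<Integer> incrementOutcomes(int threads) {
        Set<Integer> outcomes = new TreeSet<>();
        explore(new int[threads], new int[threads], 0, outcomes);
        return outcomes;
    }

    // pc[i]: 0 = before load, 1 = before store, 2 = done; tmp[i]: thread i's register.
    private static void explore(int[] pc, int[] tmp, int x, Set<Integer> outcomes) {
        boolean anyEnabled = false;
        for (int i = 0; i < pc.length; i++) {
            if (pc[i] == 2) continue;
            anyEnabled = true;
            int[] pc2 = pc.clone();
            int[] tmp2 = tmp.clone();
            int x2 = x;
            if (pc[i] == 0) {
                tmp2[i] = x;      // load
            } else {
                x2 = tmp[i] + 1;  // store
            }
            pc2[i]++;
            explore(pc2, tmp2, x2, outcomes);
        }
        if (!anyEnabled) {
            outcomes.add(x);      // every thread finished: record the final value
        }
    }
}
```

interleavings(6, 6) returns 924, matching the slide, and incrementOutcomes(3) returns the set {1, 2, 3}.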
Serialization helps understand the algorithm
Process 1                                 | Process 2                                 | number | next | turn[1] | turn[2]
                                          |                                           | 1      | 1    | 0       | 0
<turn[1] := number, number := number + 1> |                                           | 2      | 1    | 1       | 0
<await turn[1] == next>                   |                                           |        |      |         |
                                          | <turn[2] := number, number := number + 1> | 3      | 1    | 1       | 2
                                          | <await turn[2] == next> blocks            |        |      |         |
critical section                          |                                           |        |      |         |
<next := next + 1>                        |                                           | 3      | 2    | 1       | 2
Serialization helps understand the algorithm (continued)
Process 1                                 | Process 2                                 | number | next | turn[1] | turn[2]
                                          | <await turn[2] == next> returns           |        |      |         |
                                          | critical section                          |        |      |         |
                                          | <next := next + 1>                        | 3      | 3    | 1       | 2
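The serialization can be replayed deterministically, one process step at a time, to check the numbers in the tables. This is a minimal sketch with hypothetical names; it models the two processes as method calls on a single thread, not as real threads.

```java
import java.util.ArrayList;
import java.util.List;

// Single-threaded replay of the serialized trace, recording
// "number next turn[1] turn[2]" after each state-changing step.
class TicketTrace {
    int number = 1, next = 1;
    final int[] turn = new int[3]; // turn[1], turn[2]; index 0 unused to match the slide

    void takeTicket(int i) { turn[i] = number; number = number + 1; } // <turn[i] := number, number := number + 1>
    boolean mayEnter(int i) { return turn[i] == next; }               // would <await turn[i] == next> return?
    void leave() { next = next + 1; }                                 // <next := next + 1>

    String row() { return number + " " + next + " " + turn[1] + " " + turn[2]; }

    static List<String> replay() {
        TicketTrace s = new TicketTrace();
        List<String> rows = new ArrayList<>();
        rows.add(s.row());                  // initial state
        s.takeTicket(1); rows.add(s.row()); // process 1 draws ticket 1
        if (!s.mayEnter(1)) throw new IllegalStateException("process 1 should enter");
        s.takeTicket(2); rows.add(s.row()); // process 2 draws ticket 2
        if (s.mayEnter(2)) throw new IllegalStateException("process 2 should block");
        s.leave(); rows.add(s.row());       // process 1 leaves its critical section
        if (!s.mayEnter(2)) throw new IllegalStateException("process 2 should now enter");
        s.leave(); rows.add(s.row());       // process 2 leaves its critical section
        return rows;
    }
}
```

replay() returns the five state rows shown in the two tables, from "1 1 0 0" to "3 3 1 2".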
Next we implement the protocol
Students – “Locks are easy to use – no need to read the instructions”
Avoid errors by understanding the synchronization primitives [precise-java]. In Java, each object is associated with a lock. Consider the following class:
class Conflict {
    Conflict(…) { synchronized(Conflict.class) {…}; };
    synchronized static void f(…) {….};
    synchronized void g(…) {….};
    void h(…) { synchronized(this) {….}; };
    void r(…) {…};
};
Which of the following pairs of methods, when executing concurrently, can cause a conflict?
f || g, f || h, f || r, g || h, g || r, h || r
Pairs of the constructor and one of the other methods
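One way to check answers to this question is to ask the JVM which monitors are held inside each method. The sketch below is a hypothetical stand-in for the Conflict class: the method bodies are replaced by probes built on Thread.holdsLock, which reports whether the current thread holds a given object's monitor.

```java
// Hypothetical stand-in for Conflict: each method reports which monitors
// the calling thread holds while inside it.
class ConflictProbe {
    static String ctorLocks; // what the constructor observed

    ConflictProbe() {
        synchronized (ConflictProbe.class) { ctorLocks = held(this); }
    }

    static synchronized String f() { return held(null); }
    synchronized String g() { return held(this); }
    String h() { synchronized (this) { return held(this); } }
    String r() { return held(this); }

    // "C" if the Class object's monitor is held, "T" if this instance's monitor is held.
    private static String held(Object self) {
        String s = Thread.holdsLock(ConflictProbe.class) ? "C" : "";
        if (self != null && Thread.holdsLock(self)) s += "T";
        return s;
    }
}
```

Two methods can block each other only when they need the same monitor: f and the constructor share the Class object's monitor, g and h share the instance's monitor (only when called on the same object), and r takes no monitor at all. Blocking is not the only kind of conflict, though: r can still race with any of the others on shared fields.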
Translating from abstract to concrete: implementation pitfalls are explained
The difference between atomicity and locking: what protection does synchronized(o){x++} provide when it runs in parallel with an unsynchronized x++?
When translating from an atomic block to locks/unlocks, we need to identify all program locations that contend on the shared resource.
Check that the lock was obtained; this is not good: lock() unlock()
Check that the lock was released along all error paths.
What happens if a signal is taken while in the critical section (pthreads)?
What happens if an interrupt exception is taken while in wait()?
try {
    synchronized(o) { o.wait(); }
} catch(Exception e) {}
When an atomic conditional wait is implemented, we typically introduce a race, and we need to recheck the condition once inside the critical section.
Teaching pitfalls is highly effective in shortening the learning curve.
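A sketch of how the ticket protocol's two atomic actions might be translated to java.util.concurrent.locks.ReentrantLock and Condition. It illustrates three of the points above under one common convention: lock() is called before the try, so that finally only releases a lock that was actually obtained; finally releases the lock on every path; and the awaited condition is rechecked in a loop. LockedTicket is a hypothetical name.

```java
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical translation of the ticket protocol's atomic actions to explicit locks.
class LockedTicket {
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition nextChanged = lock.newCondition();
    private int number = 1, next = 1; // guarded by lock

    // <turn[i] := number, number := number + 1> followed by <await turn[i] == next>.
    // Caveat: if await() is interrupted, this ticket is never served and every
    // later ticket waits forever; a production version must handle that case.
    void enter() throws InterruptedException {
        lock.lock();                 // before try: finally must not unlock a lock we never got
        try {
            int turn = number++;
            while (turn != next) {   // recheck after every wake-up: signalAll wakes
                nextChanged.await(); // non-matching tickets too, and wake-ups may be spurious
            }
        } finally {
            lock.unlock();           // released on every path, including exceptions
        }
    }

    // <next := next + 1>
    void exit() {
        lock.lock();
        try {
            next++;
            nextChanged.signalAll(); // wake every waiter; only the matching ticket proceeds
        } finally {
            lock.unlock();
        }
    }

    // Runs threads * rounds critical sections and returns the protected counter.
    static int demo(int threads, int rounds) throws InterruptedException {
        LockedTicket t = new LockedTicket();
        int[] counter = {0};
        Thread[] ws = new Thread[threads];
        for (int k = 0; k < threads; k++) {
            ws[k] = new Thread(() -> {
                try {
                    for (int r = 0; r < rounds; r++) {
                        t.enter();
                        counter[0]++;        // critical section, protected by the ticket
                        t.exit();
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            ws[k].start();
        }
        for (Thread w : ws) w.join();
        return counter[0];
    }
}
```

Note that the critical section itself runs outside the ReentrantLock: the lock only protects the protocol variables, and the ticket protects the critical section.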
Hiding the protocol implementation
Prepare general synchronization services for the system, located in a separate class (see picture on the right).
Students: “OK, but we’ll implement the protocol all over the place anyway.”
This is hard to teach without real-life experience with large systems, and hard to suggest to engineers who maintain an existing system that is not structured like that:
“If it’s not broken, don’t fix it…”
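A minimal sketch of such a service, assuming the turn-taking protocol is the ticket algorithm implemented with Java's built-in monitor methods (wait/notifyAll). SyncService and inTurn are hypothetical names; the point is that call sites only state what must run in turn, never how.

```java
import java.util.function.Supplier;

// Hypothetical synchronization service in its own class.
final class SyncService {
    private int number = 1, next = 1; // guarded by this object's monitor

    // Runs action when the caller's ticket comes up and returns its result.
    // Assumption: callers are not interrupted while waiting (an interrupted
    // waiter would leave its ticket unserved and block everyone behind it).
    <T> T inTurn(Supplier<T> action) throws InterruptedException {
        synchronized (this) {
            int turn = number++;
            while (turn != next) {
                wait();             // recheck after every wake-up
            }
        }
        try {
            return action.get();   // the critical section runs outside the monitor
        } finally {
            synchronized (this) {
                next++;
                notifyAll();
            }
        }
    }
}
```

Call sites look like service.inTurn(() -> …); if the protocol implementation changes, only SyncService changes.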
Testing
Running a test many times on code that has a concurrency problem does not necessarily reveal the problem, especially in unit-test environments. This is easy to demonstrate through examples: create an “empty test” in which the synchronization primitives used are mapped to no-ops, and show that the protocol “works fine”.
Best practice: your test should at least expose a problem with the “empty implementation”.
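A sketch of the empty-implementation experiment: the same read-modify-write test runs against a real lock and against a lock whose lock()/unlock() are no-ops. An optional sleep between the read and the write widens the race window, similar in spirit to ConTest's noise injection; this is not ConTest's API, and all names are hypothetical.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.locks.ReentrantLock;

// One unit test, run against a real lock and against an "empty" lock.
class EmptyImplementationTest {
    interface Sync { void lock(); void unlock(); }

    static final class RealSync implements Sync {
        private final ReentrantLock l = new ReentrantLock();
        public void lock() { l.lock(); }
        public void unlock() { l.unlock(); }
    }

    static final class EmptySync implements Sync { // synchronization mapped to no-ops
        public void lock() { }
        public void unlock() { }
    }

    private static int shared; // read and written only in the "critical section"

    // Every thread performs one read-modify-write of `shared`. If noiseMillis > 0,
    // it sleeps between the read and the write, widening the race window.
    static int run(Sync sync, int threads, long noiseMillis) throws InterruptedException {
        shared = 0;
        CountDownLatch start = new CountDownLatch(1);
        Thread[] ws = new Thread[threads];
        for (int k = 0; k < threads; k++) {
            ws[k] = new Thread(() -> {
                try {
                    start.await();      // all threads begin together
                    sync.lock();
                    try {
                        int v = shared;
                        if (noiseMillis > 0) Thread.sleep(noiseMillis);
                        shared = v + 1;
                    } finally {
                        sync.unlock();
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            ws[k].start();
        }
        start.countDown();
        for (Thread w : ws) w.join();
        return shared;                  // join() makes the threads' writes visible here
    }
}
```

With noiseMillis = 0, the empty version frequently returns the full thread count, so the test "passes" against code with no synchronization at all; with a few milliseconds of noise it typically loses updates. Exact numbers depend on the machine and the scheduler.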
Running black-box tests that have the required contention (e.g., customers accessing the ticketing system simultaneously) does not necessarily produce the white-box contention you are after, such as:
the blocking in <await turn[i] == next> occurring, and not occurring;
a context switch occurring right before and right after <await turn[i] == next>.
Defining the coverage tasks you are after and checking their “coverage” helps, e.g., ConTest synchronization coverage.
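To make "coverage tasks" concrete, the sketch below records two tasks for <await turn[i] == next>, "the await blocked" and "the await did not block", and drives a test that deliberately hits both. It is a hand-rolled illustration of the idea, not ConTest's coverage tool; all names are hypothetical.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hand-rolled coverage for two synchronization coverage tasks of the await.
class AwaitCoverage {
    final AtomicBoolean blocked = new AtomicBoolean();
    final AtomicBoolean notBlocked = new AtomicBoolean();
    private int number = 1, next = 1; // guarded by this object's monitor

    synchronized void enter() throws InterruptedException {
        int turn = number++;
        if (turn == next) notBlocked.set(true); else blocked.set(true); // record the task hit
        while (turn != next) wait();
    }

    synchronized void exit() {
        next++;
        notifyAll();
    }

    // A test written to cover both tasks: the main thread enters first (no blocking)
    // and stays in its critical section until a second thread has hit the blocking case.
    static AwaitCoverage drive() throws InterruptedException {
        AwaitCoverage c = new AwaitCoverage();
        c.enter();                               // ticket 1 == next 1: does not block
        Thread t = new Thread(() -> {
            try {
                c.enter();                       // ticket 2 != next 1: must block
                c.exit();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        t.start();
        while (!c.blocked.get()) Thread.yield(); // wait until the blocking task is covered
        c.exit();                                // now let the second thread in
        t.join();
        return c;
    }
}
```

A black-box test that merely starts many clients at once may never hit one of the two tasks; recording which tasks were hit shows whether the contention you wanted actually happened.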
BACKUP
Exercises - knowing the synchronization primitives (Java)
100 threads execute i++, where i is a global variable. Describe all possible outcomes.
The following thread is interrupted while waiting in foo.wait():
try {
    synchronized(foo) { foo.wait(); }
} catch(Exception e) {};
Is the thread still holding the lock, and is the thread's interrupt bit turned on, inside the catch block? What are the answers to the same questions if we change the program to:
synchronized(foo) {
    try { foo.wait(); } catch(Exception e) {};
}
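A small program for checking answers to the interrupt exercise. It runs both variants, interrupts the thread while it is parked in foo.wait(), and records inside the catch block whether the thread holds foo's monitor (Thread.holdsLock) and whether its interrupt flag is set (isInterrupted). The class name is hypothetical.

```java
class InterruptExercise {
    static final Object foo = new Object();

    // Variant 1: the try/catch surrounds the synchronized block.
    static boolean[] outsideVariant() throws InterruptedException {
        boolean[] seen = new boolean[2]; // {holds foo's monitor?, interrupt flag set?}
        Thread t = new Thread(() -> {
            try {
                synchronized (foo) { foo.wait(); }
            } catch (Exception e) {
                seen[0] = Thread.holdsLock(foo);
                seen[1] = Thread.currentThread().isInterrupted();
            }
        });
        return interruptWhileWaiting(t, seen);
    }

    // Variant 2: the try/catch is inside the synchronized block.
    static boolean[] insideVariant() throws InterruptedException {
        boolean[] seen = new boolean[2];
        Thread t = new Thread(() -> {
            synchronized (foo) {
                try {
                    foo.wait();
                } catch (Exception e) {
                    seen[0] = Thread.holdsLock(foo);
                    seen[1] = Thread.currentThread().isInterrupted();
                }
            }
        });
        return interruptWhileWaiting(t, seen);
    }

    private static boolean[] interruptWhileWaiting(Thread t, boolean[] seen) throws InterruptedException {
        t.start();
        while (t.getState() != Thread.State.WAITING) Thread.yield(); // t is parked in foo.wait()
        t.interrupt();
        t.join(); // join() makes t's writes to seen visible here
        return seen;
    }
}
```

Per the Object.wait() documentation, the waiting thread reacquires the monitor before InterruptedException is thrown, and its interrupted status is cleared when the exception is thrown. Whether the lock is still held in the catch block therefore depends only on whether the catch is inside the synchronized block.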
Exercises - knowing the synchronization primitives (Java)
What happens if one thread executes the following method recursively, e.g., by executing factorial(7)?
synchronized int factorial(int i){
if(i == 0)
return(1);
else
return(i * factorial(i-1));
}
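Java's intrinsic locks are reentrant: a thread that already holds an object's monitor may acquire it again, and the monitor is released only when the outermost synchronized call exits, so the recursion completes instead of deadlocking. A minimal wrapper for trying it (the class name is hypothetical):

```java
// The exercise's method wrapped in a class so it can be run.
class Recursion {
    synchronized int factorial(int i) {
        if (i == 0)
            return 1;
        else
            return i * factorial(i - 1); // re-acquires the monitor this thread already holds
    }
}
```

new Recursion().factorial(7) returns 5040. With a non-reentrant mutex, for example a POSIX mutex of type PTHREAD_MUTEX_NORMAL, the same thread relocking it would deadlock.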
Will Parallel Programming Become Common Knowledge and the Parallel Programmer the Programmer of the future?
It is hard to teach parallel programming development and verification to novices: comprehending the space of possible interleavings is hard, and accurately and correctly defining the behavior of many threads acting in parallel is hard.
With the introduction of multi-core, there is an increasing need for programmers who are able to reliably develop parallel programs. But maybe a different solution is possible?
Can we avoid the need for the parallel programmer? Can we have the compiler or the programming language encapsulate the difficulties of parallelism and put the genie back in the bottle?
Will parallel programming become common knowledge, and the parallel programmer the agent of the next revolution in programming paradigms?
Will Parallel Programming Become Common Knowledge and the Parallel Programmer the Programmer of the future? (continued)
How will future multi-core systems be programmed? How well do existing primitives address various application domains, and how well do they coexist? (3)
What is the role of high-level primitives (e.g., the transaction model)? Can it hide performance? (3)
Is the major difficulty in programming parallel programs testing them? (2)
How do we address students' huge difficulties in predicting possible interleavings and, most especially, the unwanted/undesired ones? (2)
What courses should be added to the curriculum, and what should be taught on the job? (2)
What is the minimum knowledge one needs if the underlying program is parallel? To be more specific, most programmers probably know close to nothing about compiler optimization and processor structure. Will they need more knowledge in the future, or can the details be hidden from them? (1)
What will be the minimum knowledge needed by a parallel programmer and how will he or she acquire it, with emphasis on testing/debugging? (1)