Verifying Conformance to Memory Models: the Test Model-checking Approach
Ganesh Gopalakrishnan (funded by NSF)
presenting work done by
Ratan Nalumasu (PhD, 9/98, HP Cupertino)
Rajnish Ghughal, (MS, 10/98?, interviewing)
Abdel Mokkedem (postdoc; going to Compaq)
Other members of the group:
Ravi Hosabettu (PhD, 12/99? Processor Verification)
Mike Jones (PhD, 6/00? - at Intel now)
Annette Bunker (PhD, 6/01?)
10/7/98 Ganesh, Utah Verifier group -- UT Austin talk
FM for memory system design
• Processor speed increases at 55% a year, while memory speed increases at 7%
• With shared memory multiprocessors, the mismatch is exacerbated
• Complex protocols are employed to help improve the overall performance
• Performance improvements should not be at the expense of correctness...
• Hence, the need for a formal verification technique for memory systems that can be employed in actual design cycles
Goal of our Research
Develop domain-specific formal methods for shared memory systems
[Figure: CPUs and Memory modules connected through an Interconnect]
Types of shared memory systems: 1. Symmetric Multiprocessors (SMP)
- Can scale up to tens of processors
- Modern caches have support for such SMP protocols
[Figure: several CPUs, each with a cache ($), sharing one Memory]
2. Distributed Shared Memory (DSM) systems
[Figure: three NODEs, each with its own MEM, connected by a Network]
Each node may be an SMP or a single CPU
Shared Memory Correctness
• Low level:
  – deadlock
  – forward progress
  – bus arbitration
• Intermediate level:
  – at most one owner of cache-line
• High-level:
  – verify abstraction provided to software (the desired Formal Memory Model)
Results of the UV group
• New Partial Order reduction algorithm (Nalumasu)
  • Realized in verifier called PV
  • Outperforms SPIN “10 to 1” on most examples
  • Selective state-caching is available “for free”
• A DSM Protocol synthesis algorithm (Nalumasu)
  • Safety of synthesis proved correct using PVS
  • Derives realistic (hand-quality) DSM protocols
  • Incorporates a scalable buffer-reservation scheme
• Verifying Formal Memory Models (Nalumasu, Mokkedem, Ghughal) ....this talk
The key issue in verifying memory models: must reorder reads / writes correctly

Uniprocessor:
  P1: write(a,new) ; read(b)   reordered to   P1: read(b) ; write(a,new)
  ok (cache / compiler / out-of-order execution)

Multiprocessor:
  P1: write(a,new) ; read(b)      P2: write(b,new) ; read(a)
  reordered to
  P1: read(b) ; write(a,new)      P2: read(a) ; write(b,new)
  Not ok under S.C.

Test model checking
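The multiprocessor case above can be checked mechanically. The following is a minimal sketch, not the talk's tool: it assumes both addresses start at a hypothetical value "old", runs every interleaving of the two original programs on a single serial memory, and collects the read results. The helper names `interleavings` and `sc_outcomes` are illustrative only.

```python
def interleavings(threads):
    """Yield every merge of the per-thread op lists that keeps each thread's order."""
    if all(not t for t in threads):
        yield []
        return
    for i, t in enumerate(threads):
        if t:
            rest = threads[:i] + [t[1:]] + threads[i + 1:]
            for tail in interleavings(rest):
                yield [(i, t[0])] + tail


def sc_outcomes(threads, init):
    """Collect the read results of every interleaving run on one serial memory."""
    outcomes = set()
    for run in interleavings(threads):
        mem = dict(init)
        reads = {}
        for tid, op in run:
            if op[0] == "wr":
                mem[op[1]] = op[2]
            else:
                reads[(tid, op[1])] = mem[op[1]]
        outcomes.add(tuple(sorted(reads.items())))
    return outcomes


# The two programs from the slide, in program order (thread 0 is P1, thread 1 is P2).
P1 = [("wr", "a", "new"), ("rd", "b")]
P2 = [("wr", "b", "new"), ("rd", "a")]
init = {"a": "old", "b": "old"}  # assumed initial values

outcomes = sc_outcomes([P1, P2], init)

# The result produced by hoisting both reads above the writes.
both_old = (((0, "b"), "old"), ((1, "a"), "old"))
print(both_old in outcomes)
```

No interleaving lets both reads return the old value, which is why reordering each read above its write is not acceptable under S.C. even though it is harmless on a uniprocessor.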
Speculative Execution can permit OO processing while maintaining SC...
CPU1: wr(a,2) ; rd(b) ;    cache: b : shared, a : invalid
CPU2: wr(b,3) ; rd(a) ;    cache: a : shared, b : invalid
[Figure: CPU1 and CPU2 connected by a bus to MEM]
CPU1:
- Miss on “a”
- Hit on “b” (speculative)
- Bus serializes as wr(a) ; wr(b)
- Speculation successful
CPU2:
- Miss on “b”
- Hit on “a” (speculative)
- Bus serializes as wr(a) ; wr(b)
- Speculation unsuccessful
Speculative Execution can permit OO processing while maintaining SC...
CPU1: wr(a,2) ; rd(b) ;    cache: b = 1 : shared, a : invalid
CPU2: wr(b,3) ; rd(a) ;    cache: a = 0 : shared, b : invalid
[Figure: CPU1 and CPU2 connected by a bus to MEM]
- SC can explain sequence wr(a,2); rd(b,1); wr(b,3)
- SC can’t explain sequence wr(a,2); wr(b,3); rd(a,0)
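The two sequences above can be replayed against a single serial memory to see why one is explainable and the other is not. This is a minimal sketch under the slide's initial values (a = 0, b = 1); the function name `explained_by_serial_memory` is an illustrative assumption.

```python
def explained_by_serial_memory(trace, init):
    """Replay a trace on one memory; every read must return the latest value written."""
    mem = dict(init)
    for kind, addr, val in trace:
        if kind == "wr":
            mem[addr] = val
        elif mem[addr] != val:
            return False  # this read saw a value the serial memory could not supply
    return True


init = {"a": 0, "b": 1}  # initial cache contents shown on the slide

ok = explained_by_serial_memory(
    [("wr", "a", 2), ("rd", "b", 1), ("wr", "b", 3)], init)
bad = explained_by_serial_memory(
    [("wr", "a", 2), ("wr", "b", 3), ("rd", "a", 0)], init)
print(ok, bad)
```

In the second sequence the read of a returns the stale 0 after wr(a,2) has already been serialized, so the speculative hit has to be squashed.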
What Test model-checking does
SC: The observed execution results can be explained by a weave of the individual instruction sequences.
[Figure: test model checking asks “?” of a system of CPUs with caches ($) sharing one Memory]
Current Memory-system Verification Techniques
Simulation:
– Ad-hoc
– Incomplete
– Memory model bugs are non-intuitive. Difficult to tell that something’s wrong!
Current Formal Techniques for Memory System Verification
• Graf’s work on verifying Lazy caching in ACTL*
  – Not adequately demonstrated
    • Toy examples
    • Technique unfit for iterative design cycles (..more..)
• Gibbons/Korach approach: verify execution traces
  • Ad-hoc
  • NP-complete
• McMillan proposed the seed of an idea similar to test model-checking
  • No details
Graf’s work: ACTL* for (stronger than) SC
• AG(enabled( read(a,d) )) avail(a,d)
• AG(avail(a,d) AND EF(enable(read(a,d)))) A[NOT avail(a,d) W AG NOT avail(a,d)]
• ...
• init AG[after(write(a,d)) A(NOT enabled(read(a,d) W avail(a,d))]
Such MODEL DEPENDENT SPECS do not fit in an iterative industrial framework:
- spec too tedious to write
- spec rendered obsolete by design iterations
Test model-checking evolved out of ARCHTEST (Collier)
• Thread-based tests that are run on the CPUs
• Architectural rules formulated as safety properties
• Has detected bugs in commercial multiprocessors
• Formally based
• Available for free (for schools...)
• Unfortunately, ARCHTEST
– isn’t that effective at design-time
– is incomplete in many ways
ARCHTEST overview: (i) Instruction Execution
View programs w.r.t. their memory instructions (reads and writes) :
... rd(a) wr(b,2) ...
and focus on the outcomes of such executions...
(ii) Shared memory modeling
[Figure: SMP (CPUs sharing a Memory) and DSM (NODEs with local MEM on a Network) systems redrawn as CPU_i and CPU_j, each attached to a conceptual “local store” (STORE_i, STORE_j)]
Program ordering:
  CPU_i: R1(a) ; W2(b,1) ; R5(d) ;
  CPU_j: R3(c) ; W4(d,2) ; W6(d) ;
(iii) Given a parallel program..., define EXECUTIONS...
  CPU_i: R1(a,T) ; W2(b,1) ; R5(d,2) ;
  CPU_j: R3(c,T) ; W4(d,2) ; W6(d,3) ;
Events:
  STORE_i: R1(a,T) W2(b,1) R5(d,2) and W4(d,2) W6(d,3)
  STORE_j: R3(c,T) W4(d,2) W6(d,3) and W2(b,1)
Event orderings RO, WO, PO...: the corresponding arcs are present if RO obeyed, if WO obeyed, if PO obeyed
(iv) Define computational ordering, CMP, as a total ordering S per address and per CPU that includes a valid linearization of all local read events and all write events:
  STORE_i: R1(a,T) W2(b,1) R5(d,2) and W4(d,2) W6(d,3)
  STORE_j: R3(c,T) W4(d,2) W6(d,3) and W2(b,1)
[Figure: one CMP order (cmp1) and another (cmp2) drawn over these events]
The entire event-graph for CPU_i...
  CPU_i: R1(a,T) ; W2(b,1) ; R5(d,2) ;
  CPU_j: R3(c,T) ; W4(d,2) ; W6(d,3) ;
  STORE_i: R1(a,T) W2(b,1) R5(d,2) and W4(d,2) W6(d,3), with RO, WO and cmp1 arcs
- RO arcs present in event-graph if architecture obeys Read Ordering (similarly for WO, PO, CMP)
- Acyclic event-graph G => the architecture obeys ordering rules used in G
- Given execution E on architecture A, if all members of G(E,R) are cyclic, A violates one of the rules in R
cmp2 leads to a cycle.. (but cmp1 doesn’t..)
Example execution revealing (CMP,RO,WO) violation
  CPU_i: wr(A,1) ; wr(A,2) ; wr(A,3) ;
  CPU_j: rd(A,1) ; rd(A,3) ; rd(A,2) ;
Event graph:
  reads:  rd(A,1) -RO-> rd(A,3) -RO-> rd(A,2)
  writes: wr(A,1) -WO-> wr(A,2) -WO-> wr(A,3)
  plus CMP arcs
Any linearization consistent with this graph causes a cycle:
  Eg1) w1 r1 w2 r2 w3 r3 -- not acceptable because r3 <RO r2
  Eg2) w1 r1 w2 r3 w3 r2 -- not acceptable because this is not a valid linearization...
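The cycle in this example can be found with an ordinary depth-first search over the event graph. The sketch below is illustrative only: the event names w1..w3 and r1..r3 follow the slide's shorthand, and the CMP arcs are an assumed encoding (each write precedes the read that returns its value, and each read precedes the write that overwrites that value).

```python
def has_cycle(arcs):
    """Depth-first search for a cycle in a directed graph given as (src, dst) pairs."""
    graph = {}
    for src, dst in arcs:
        graph.setdefault(src, []).append(dst)
        graph.setdefault(dst, [])
    WHITE, GREY, BLACK = 0, 1, 2
    colour = {node: WHITE for node in graph}

    def visit(node):
        colour[node] = GREY
        for succ in graph[node]:
            if colour[succ] == GREY:
                return True  # back edge: succ is still on the DFS stack
            if colour[succ] == WHITE and visit(succ):
                return True
        colour[node] = BLACK
        return False

    return any(colour[node] == WHITE and visit(node) for node in graph)


wo = [("w1", "w2"), ("w2", "w3")]   # write order in CPU_i's program
ro = [("r1", "r3"), ("r3", "r2")]   # read order in CPU_j's program
cmp_arcs = [
    ("w1", "r1"), ("w2", "r2"), ("w3", "r3"),  # a read follows the write it returns
    ("r1", "w2"), ("r2", "w3"),                # a read precedes the overwriting write
]

violation = has_cycle(wo + ro + cmp_arcs)
without_ro = has_cycle(wo + cmp_arcs)
print(violation, without_ro)
```

With all three rules the graph contains the cycle r3, r2, w3, r3; dropping the RO arcs leaves it acyclic, so the execution is only a violation for an architecture that claims to obey RO together with WO and CMP.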
ARCHTEST test for (CMP,RO,WO)
  P1: A := 1 ; A := 2 ; A := 3 ; .... A := k
  P2: X1 := A ; X2 := A ; X3 := A ; .... Xk := A
Check Invariant: For all j >= i, X(j) >= X(i)
Drawbacks:
- ARCHTEST runs on real machines
  * (very) late-cycle debugging!
  * non-deterministic interleavings, non-exhaustive
- What “k” to use?
- P2 never writes into A - what if a buggy “write-update” coherence protocol is used?
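Because P1 writes strictly increasing values, the invariant amounts to requiring that the values P2 observes never decrease. A minimal sketch of that check over a recorded list X1..Xk follows; the function name `first_violation` and the sample traces are illustrative assumptions, not ARCHTEST code.

```python
def first_violation(xs):
    """Return indices (i, j) with i < j and xs[j] < xs[i], or None if xs never decreases."""
    best = 0  # index of the largest value seen so far
    for j in range(1, len(xs)):
        if xs[j] < xs[best]:
            return (best, j)
        if xs[j] > xs[best]:
            best = j
    return None


good_run = first_violation([0, 1, 1, 2, 3])  # reads may repeat or skip values
bad_run = first_violation([1, 3, 2])         # the rd(A,1); rd(A,3); rd(A,2) execution
print(good_run, bad_run)
```

A check like this only covers the interleavings that happened to occur in one run on real hardware, which is the non-exhaustiveness drawback listed above.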
Test Model Checking
• Adaptation of ARCHTEST to model-checking
• (Like ARCHTEST) tests are independent of the model being verified
• Usable at design-time (VIS- or PV-based model-checking)
  – Simulates the effect of k = infinity
  – Considers all interleavings
  – Complete tests (defined later) do examine all possible writes by both CPUs
Basic adaptation of ARCHTEST to get k=infinity and all interleavings is possible, assuming that the memory system is:

Projectible: Any execution projected onto a subset of the addresses is still an execution. Used in “limited address theorems” later...
  CPU_i: R1(a,T) ; W2(b,1) ; R5(d,2) ;    projects to    R5(d,2) ;
  CPU_j: R3(c,T) ; W4(d,2) ; W6(d,3) ;    projects to    W4(d,2) ; W6(d,3) ;

Data independent: Replacing a data value d in an execution by f(d) results in an execution. Used to define complete tests...
  CPU_i: R1(a,T) ; W2(b,1) ; R5(d,22) ;
  CPU_j: R3(c,T) ; W4(d,22) ; W6(d,3) ;
Details
• Define a formal shared memory description language
  – “data is not used for control decisions”
  – “addresses are symmetric”
• Use Model checking
  – “Small number of addresses” sufficient
• Have applied technique to HP Runway / PA 8000 memory system, using PV
Test model-checking adaptation of ARCHTEST for (CMP,RO,WO) (k=infinity; all non-det interleavings; still incomplete...)
  P1: A := 1 ; A := 2 ; A := 3 ; .... A := k
  P2: X1 := A ; X2 := A ; X3 := A ; .... Xk := A
Check Invariant: For all j >= i, X(j) >= X(i)
[Figure: test automata for P1 and P2 with transitions labeled wr(0), wr(1), wr(1) and rd(0), rd(1), rd(0), rd(1), ending in an Error state]
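One way to read the automaton picture is as a finite-state test: P1 writes 0 once and then writes 1 forever, while P2 reads repeatedly and flags an error if it sees 0 after having seen 1. The sketch below is an assumption-laden illustration of that reading, not the PV or VIS model from the talk. It explores every interleaving by breadth-first search against two hypothetical memory models: a serial memory, and a buggy one that may still return the previous value.

```python
from collections import deque


def observer_step(state, value):
    """P2's test automaton: reading 0 after having read 1 is an error."""
    if state == "error":
        return "error"
    if state == "seen0":
        return "seen1" if value == 1 else "seen0"
    return "seen1" if value == 1 else "error"  # state == "seen1"


def serial_write(mem, v):
    return v


def serial_reads(mem):
    return [mem]


def stale_write(mem, v):
    return (v, mem[0])  # remember the previous value alongside the new one


def stale_reads(mem):
    return sorted(set(mem))  # a read may return the current or the previous value


def error_reachable(init_mem, write, reads):
    """Explore all interleavings of the writer P1 and the observer P2."""
    start = (init_mem, 0, "seen0")  # (memory, P1 program counter, P2 state)
    seen = {start}
    queue = deque([start])
    while queue:
        mem, pc, obs = queue.popleft()
        if obs == "error":
            return True
        # P1 step: wr(0) first, then wr(1) repeated forever (the k = infinity loop).
        successors = [(write(mem, 0 if pc == 0 else 1), 1, obs)]
        # P2 step: one read, fed to the observer automaton.
        successors += [(mem, pc, observer_step(obs, v)) for v in reads(mem)]
        for nxt in successors:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


serial_ok = not error_reachable(0, serial_write, serial_reads)
stale_caught = error_reachable((0, 0), stale_write, stale_reads)
print(serial_ok, stale_caught)
```

Because the writer loops on wr(1), the state space stays finite while still covering an unbounded number of writes, and the search covers every interleaving rather than the ones a real machine happens to produce.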
Completeness result for (CMP,RO,WO)
– For any number of CPUs (N >= 1), we need consider only executions over TWO addresses
– The proof is by showing that if there exists an event-graph cycle involving more than two addresses, there exists one involving one fewer address
Reducing all (CMP,RO,WO) cycles to those that contain only two addresses
[Figure: event graph for P1 over the reads R1(P1), R2(P1) and the writes W1(P2), W2(P2), W3(P3), W4(P3), built up in steps by adding RO, WO and CMP arcs; in the last step a read R3(P1) and further RO arcs are added]
Involves two addrs!
Executions to consider for verifying (CMP,RO,WO)
• How a complete test can be developed: Design a test automaton that non-deterministically examines all “relevant” executions over two addresses...
2-address (CMP,RO,WO) test is broken into the following cases (approx...)
Case 1:
[Figure: test automata. P1 transitions: “wr(A,0) or rd(A,-)”, “wr(A,1) or rd(A,-)”, “wr(A,1) or rd(A,-)”, wr(A,1), rd(A,1), rd(A,0), Error. P2 transitions: “wr(A,2) or rd(A,-)”, “wr(A,2) or rd(A,-)”, rd(A,1), rd(A,0), Error.]
2-address (CMP,RO,WO) test, Case 2:
[Figure: test automata. P1 states and transitions: Sigma(0,0), Sigma(1,0), Sigma(1,1), Sigma(1,1), Sigma(2,2), wr(A,1), wr(B,1), rd(B,1), rd(A,0), Error. P2 transitions: “wr(A,2) or wr(B,2) or rd(A,-) or rd(B,-)” (twice), rd(B,1), rd(A,0), Error.]
Write Atomicity, and S.C.
• All processors agree on the order of writes
  – WO imposes the order only if the writes are from same program
[Figure: wr(A,0) and wr(B,1)]
SC is (CMP, PO, WA)
SC testing: need to consider N-address programs...
E.g.: The execution
  P1: wr(a,1) ; rd(b,0) ;
  P2: wr(b,1) ; rd(c,0) ;
  P3: wr(c,1) ; rd(d,0) ;
  P4: wr(d,1) ; rd(a,0) ;
violates SC when all four addresses a,b,c,d are considered..... but is SC if only 3 addresses are considered at a time.... for example:
  P1: wr(a,1) ; rd(b,0) ;
  P2: wr(b,1) ; rd(c,0) ;
  P3: wr(c,1) ;
  P4: rd(a,0) ;
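The claim can be confirmed by brute force. The sketch below assumes every address initially holds 0 and searches for an interleaving on a single serial memory that reproduces each recorded read value; the names `is_sc` and `project` are illustrative, not part of the talk's tooling.

```python
from itertools import combinations


def is_sc(threads, init):
    """True if some interleaving on one serial memory yields every recorded read value."""
    def search(pcs, mem):
        if all(pc == len(t) for pc, t in zip(pcs, threads)):
            return True
        for i, t in enumerate(threads):
            if pcs[i] == len(t):
                continue
            kind, addr, val = t[pcs[i]]
            nxt = pcs[:i] + (pcs[i] + 1,) + pcs[i + 1:]
            if kind == "wr":
                if search(nxt, {**mem, addr: val}):
                    return True
            elif mem[addr] == val and search(nxt, mem):
                return True
        return False

    return search((0,) * len(threads), dict(init))


def project(threads, addresses):
    """Drop every event whose address is outside the given set."""
    return [[ev for ev in t if ev[1] in addresses] for t in threads]


full = [
    [("wr", "a", 1), ("rd", "b", 0)],  # P1
    [("wr", "b", 1), ("rd", "c", 0)],  # P2
    [("wr", "c", 1), ("rd", "d", 0)],  # P3
    [("wr", "d", 1), ("rd", "a", 0)],  # P4
]
init = {addr: 0 for addr in "abcd"}  # assumed initial contents

four_addr_sc = is_sc(full, init)
three_addr_sc = [is_sc(project(full, set(keep)), init)
                 for keep in combinations("abcd", 3)]
print(four_addr_sc, three_addr_sc)
```

Each three-address projection removes one link of the four-way cycle, so a test restricted to three addresses cannot expose this violation.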
SC creates barriers wrt writes
Now each event-space provides at least two writes such that these writes are connected by the write-atomicity equivalence arcs:
[Figure: two rows of writes (w w w) joined by WA equivalence arcs]
Complete Tests for SC
• Theorem: A system with N processors implements SC if and only if it has no errors on all n<N address programs
• Scheme for N processors
  – N barriers
  – Data written before, at, and after barrier are different
    • data 0, 1, 2 for P0, and data 3, 4, 5 for P1
A portion of the complete test for SC for 1-address programs
[Figure: test automata. P1: Sigma(0), wr(A,1), “rd(A,3) or rd(A,4)”, Sigma(2). P2: Sigma(3), wr(A,4), “rd(A,0) or rd(A,1)”, Sigma(5).]
Error: Saw 1,4 in P1; 4,1 in P2
Case Studies
• Serial memory (operational semantics of SC)
• Lazy caching
• Runway/PA system model
– Bus based design
– An aggressive split transaction protocol
– Out-of-order completion of transactions on Runway for high-performance
– In-order completion of instructions in PA for sequential consistency
Test Model checking of HP/Runway

Test     Spin          PV
PO-1     56K           2412
PO-2     > 5M/DNF      285K
SC-1     499K          7880
SC-2a    > 5M/DNF      5.9M
SC-2b    > 4M/DNF      574K
Conclusions
• Test model-checking is practically viable
• Work is in progress in adapting these ideas for weak memory models
• SC tests don’t scale well: future work:
  – discover non-trivial equivalence relations to reduce execution-space even more
  – Need to consider symmetries in the system
  – Need to build a tool integrated into the design cycle of CPUs (performance evaluation + test model-checking must go hand in hand...)