Formal Design and Verification Methods for Shared Memory Systems
Ratan Nalumasu
Dissertation Defense
September 10, 1998
9/10/1998 Design Complexity 2
Problems Facing Digital Design
• Complexity
• Longer design time
• Shorter time to market
Current Debugging Technology
+ Full model
– Partial examination: no assurance
– Weaker properties
– Difficult correctness metrics
– Full model
Formal Methods
• Formal methods = math-based techniques
• Continuous math : Engineering = Discrete math : Digital system design
“It is what the designers want. It’s just challenging to prove.”
Formal Methods based Design
– Reduced model
+ Complete examination
+ Better assurances (on the reduced model)
+ Stronger property language
+ Better correctness metrics
+ Reduced model
FM Taxonomy
• Manual verification techniques: Interactive theorem provers
• Automatic verification techniques: Model checkers
• Compilation techniques: Refinement rules
Interactive Theorem Provers
+ Can deal with infinite state systems
– Extensive manual reasoning
Proof of a compilation scheme
+ Good for algorithm verification
Model Checking
process p(x) { global G; local L;
  while (...) { recv ...; send ...; } }
process q(x,y) ...
[Figure: state graph with nodes 0 to 3; each node is a global state such as (G=0, p.L=0, ...)]
Model Checking Strengths
• Automatic
• If a property fails, the model checker shows the error trace
– Deadlock: how the deadlocked state is reached from the initial state
– Assertion: how the violating state is reached from the initial state
– Starvation: a loop where no progress is made
Model Checking: Example
• Construct graph of the system, and check the property: Deadlock at (22)
[Figure: two processes, each with local states 0, 1, 2; the product graph has the nine states 00, 10, 20, 01, 11, 21, 02, 12, 22]
• State explosion: addressed by partial order reductions
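
The graph construction on this slide can be sketched in a few lines. This is a minimal illustration, not the tool used in the thesis: it assumes a hypothetical model in which each of the two processes simply steps 0, 1, 2 and then stops, and it treats a state with no enabled transition as a deadlock. The names `successors` and `find_deadlock` are made up for the example.

```python
from collections import deque

LAST = 2  # assumption: each process steps 0 -> 1 -> 2 and then has no move


def successors(state):
    """Yield every global state reachable in one step (exactly one process moves)."""
    p, q = state
    if p < LAST:
        yield (p + 1, q)
    if q < LAST:
        yield (p, q + 1)


def find_deadlock(initial):
    """Breadth-first search. Returns (discovered states, trace to a deadlock or None)."""
    parent = {initial: None}
    queue = deque([initial])
    while queue:
        state = queue.popleft()
        nexts = list(successors(state))
        if not nexts:  # no enabled transition: report the path from the initial state
            trace = []
            while state is not None:
                trace.append(state)
                state = parent[state]
            return set(parent), trace[::-1]
        for nxt in nexts:
            if nxt not in parent:
                parent[nxt] = state
                queue.append(nxt)
    return set(parent), None


visited, trace = find_deadlock((0, 0))
print(len(visited), trace)
```

The parent map is what makes the error trace possible: once the search reaches a bad state, walking the parents back to the initial state gives the path a designer would inspect.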
Refinement Algorithms
• Need to verify only high-level protocols
• Domain-specific compilers can generate efficient implementations
Refinement rules for DSM protocols
State of the art of Applied FM
+ General purpose
+ Widely applicable techniques
– Inefficient algorithms
– Inefficient “compilers”
– Do not help with domain specific concerns
Thesis Statement
Domain specific formal methods
• Efficient verification techniques
• Address domain specific concerns
Domain: shared memory systems
[Figure: CPUs connected to memories]
Overview
• Introduction to formal verification
• Shared memory systems
• Contributions
• Conclusions
Memory Bottleneck
• Processor speed increases at 55% a year, while memory speed increases at 7%
– Caches
• Tendency toward multiprocessors
– Further imbalance, hence complex protocols
– SMP systems
– DSM systems
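
To see how quickly the two growth rates diverge, the gap can be compounded year over year. The sketch below only does that arithmetic; it assumes both speeds are normalized to 1 in year 0 and that the 55% and 7% rates hold constant, which the slide does not claim. The function name `speed_gap` is invented for the example.

```python
def speed_gap(years, cpu_growth=0.55, mem_growth=0.07):
    """Ratio of processor speed to memory speed after `years` of compound growth."""
    return ((1 + cpu_growth) / (1 + mem_growth)) ** years


for y in (1, 5, 10):
    print(y, round(speed_gap(y), 1))
```

Under these assumptions the ratio grows by a factor of about 1.45 every year, so after a decade the processor is roughly forty times further ahead of memory than it was at the start.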
Symmetric Multiprocessors
• Can scale up to tens of processors
• Modern caches have support for such SMP protocols
[Figure: three CPUs, each with a cache ($), sharing one memory]
SMP Protocol Design
• Bus protocols
– Bus arbitration algorithm
– Cache invalidation scheme
– Lack of atomicity on the bus
• Bus and CPU interaction
– Does CPU have out-of-order execution?
– Does bus allow out-of-order completion?
• Are these decisions visible to software?
Distributed Shared Memory
[Figure: three nodes, each with local memory (MEM), connected by a network]
Each node may be an SMP or a single CPU
DSM Protocol Design
• Network port arbitration
• Coherency maintenance across the network
– Maintaining distributed state
– Little atomicity
– “Ghost” messages
– Transient states
• Are these decisions visible to software?
Shared Memory Correctness
• Low level:
– deadlock
– forward progress
– bus arbitration
• Intermediate level:
– at most one owner of a cache line at a time
• High-level:
– abstraction provided to the software
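
The intermediate-level property is a state invariant, so it can be written as a predicate that a model checker evaluates in every reachable state. The sketch below is only an illustration: the state names ("owner", "shared", "invalid") and the dictionary layout are assumptions, not the encoding used in the thesis.

```python
def single_owner(line_states):
    """True if at most one node holds the cache line in the (hypothetical) 'owner' state."""
    return sum(1 for s in line_states.values() if s == "owner") <= 1


# One snapshot per cache line: node name -> state of that line at the node.
ok_state = {"node0": "owner", "node1": "invalid", "node2": "shared"}
bad_state = {"node0": "owner", "node1": "owner", "node2": "invalid"}

print(single_owner(ok_state), single_owner(bad_state))
```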
Abstraction Provided to Software
Multiprocessor:
P1: write(a,new); read(b)      P2: write(b,new); read(a)
reordered to
P1: read(b); write(a,new)      P2: read(a); write(b,new)
Not OK under S.C.
Uniprocessor:
P1: write(a,new); read(b)
reordered to
P1: read(b); write(a,new)
OK (cache / compiler / out-of-order execution)
Test model checking
Overview
• Introduction to formal verification
• Shared memory systems
• Contributions
– mitigating state explosion: partial order reduction algorithm
– facilitating high-level design: protocol synthesis algorithm
– enhancing applicability: high-level correctness such as SC
• Conclusions
Contributions
[Figure: how the contributions relate to a protocol: PO algorithm, test model checking, refinement rules, efficient implementation]
Contribution #1
Mitigating State Explosion Problem
Partial Order Reductions
Partial Order Reductions
[Figure: the full product graph of two 3-state processes (states 00, 10, 20, 01, 11, 21, 02, 12, 22) next to a reduced graph that visits only 00, 10, 20, 21, 22]
If two transitions are independent, then explore one of them, postponing the other
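
The rule above can be tried on the two-process example. This is a deliberately naive sketch, not the algorithm from the thesis: it assumes the two processes are fully independent and that each just steps 0, 1, 2, and its "reduction" is simply to expand only the first process that still has a move. The names `enabled` and `explore` are invented for the example.

```python
LAST = 2  # assumption: each process steps 0 -> 1 -> 2


def enabled(state):
    """Return one list of successor states per process (empty if that process is done)."""
    p, q = state
    return [[(p + 1, q)] if p < LAST else [],
            [(p, q + 1)] if q < LAST else []]


def explore(initial, reduce):
    """Depth-first search; with reduce=True only the first enabled process is expanded."""
    seen, stack = {initial}, [initial]
    while stack:
        state = stack.pop()
        moves = [m for m in enabled(state) if m]
        if reduce:
            moves = moves[:1]  # postpone the other, independent process
        for succ in (s for group in moves for s in group):
            if succ not in seen:
                seen.add(succ)
                stack.append(succ)
    return seen


full = explore((0, 0), reduce=False)
reduced = explore((0, 0), reduce=True)
print(len(full), len(reduced))
```

The reduced search still reaches the final state 22, which is why postponing is safe here. The shortcut is only sound because this graph has no cycles; with loops, a transition could be postponed forever, which is the ignoring problem.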
Ignoring Problem
Select some transitions and postpone the others, but do not postpone them forever
[Figure: a loop between states S0 and S1 in which a transition is postponed at both states]
Proviso based Solution
• The solutions of Godefroid, Valmari, Holzmann, and Peled are very similar: Proviso
– Expands the “last” state of the loop completely
[Figure: a loop between states S0 and S1; the transition is postponed at one state and the other state is expanded]
Problem with Proviso
[Figure: processes P and Q, each with local states 0, 1, 2. With Q postponed, the proviso-based search still visits ALL 9 states]
Our Algorithm: 2-phase
[Figure: for the same processes P and Q, the 2-phase search visits only 5 states: 00, 01, 10, 20, 02]
Performance Comparison
Protocol (tool)   States      Time
Mig (Spin)        113,628     13.6
Mig (2 PV)        9,185       1.7
Inv (Spin)        > 620,446   DNF
Inv (2 PV)        135,404     21.2
[Figure: bar chart comparing SPIN and PV on SC2, SC3, SC4, Pftp, Snpy; vertical axis 0 to 20,000; annotation "(20x)"]
Contribution #2
Facilitating High-level Design
Protocol Refinement
Protocol Refinement
• PO reductions not sufficient, theorem provers ruled out
• Compile from high-level protocol specification
– easier to design
– easier to verify
– can generate efficient implementation using domain knowledge
Unexpected Messages
[Figure: process P sends a req to Q and waits to recv an ack from Q; meanwhile some other request arrives at P: ???]
• Always nack: no forward progress
• Always silence: deadlock
Refinement Procedures
• Debug the high-level specification: synchronous communication with no transient states
• An automatic refinement procedure transforms it into a detailed implementation
– No need to verify the implementation
– Needs domain specific knowledge for efficiency
Related Work
• Buckley & Silberschatz, 83
– For OS environments, not fit for hardware
• Gribomont, 90
– Protocols where synchronous messages can be simply replaced by asynchronous messages
Related Work (contd)
• Teapot, 96 for DSM systems (Chandra)
– Protocol programming language
– “Suspend” construct for transient states
– Not high-level: Suspend states still specify what to do in a transient state
Context: DSM Protocols
• Protocol per each cache line
• 1 home, n “remote” nodes per each line
• Home is responsible for maintaining consistency (Hub)
[Figure: three nodes, each with local memory (MEM), connected by a network]
Refinement Rules
[Figure: two message diagrams between Home and Remote; in each, a Req is answered by an Ack or a Nack]
Refinement Rules (2)
[Figure: message diagram between Home and Remote with requests Req1 and Req2, followed by an Ack or Nack. Req1 is ignored by both processes]
Debugging Effort
Protocol N Low-level High-level spec
Mig 2 54 23,164/2.8
4 235/0.4
8 965/0.5
Inv 2 546/0.6 193389/20.6
4 18686/18.4
Protocol compilation scheme has been proved using a theorem prover
Contribution #3
Enhancing Applicability
Shared Memory Model Verification
Relaxing Instruction Orders
P1: write(a,new); read(b)      P2: write(b,new); read(a)
P1: read(b); write(a,new)      P2: read(a); write(b,new)
Under SC
Verification of HW/SW Interface
SC: The result can be explained by some interleaving of the instructions.
Test model checking
[Figure: three CPUs, each with a cache ($), sharing one memory]
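
The definition of SC on this slide can be checked by brute force for the two-processor program used earlier in the talk. The sketch below is an illustration only: it assumes both locations initially hold "old", enumerates every interleaving that keeps each program's own order, and records what each processor reads. The helper names are invented for the example.

```python
from itertools import combinations

# P1 reads b after writing a; P2 reads a after writing b (program from the earlier slides).
P1 = [("write", "a"), ("read", "b")]
P2 = [("write", "b"), ("read", "a")]


def interleavings(p1, p2):
    """Yield every merge of p1 and p2 that preserves each program's own order."""
    n = len(p1) + len(p2)
    for slots in combinations(range(n), len(p1)):
        it1, it2 = iter(p1), iter(p2)
        yield [(1, next(it1)) if i in slots else (2, next(it2)) for i in range(n)]


def sc_outcomes(p1, p2):
    """Set of (value read by P1, value read by P2) over all interleavings."""
    outcomes = set()
    for schedule in interleavings(p1, p2):
        memory = {"a": "old", "b": "old"}  # assumption: both locations start as "old"
        reads = {}
        for proc, (op, addr) in schedule:
            if op == "write":
                memory[addr] = "new"
            else:
                reads[proc] = memory[addr]
        outcomes.add((reads[1], reads[2]))
    return outcomes


allowed = sc_outcomes(P1, P2)
relaxed = sc_outcomes(P1[::-1], P2[::-1])  # each processor's read moved before its write
print(sorted(allowed), sorted(relaxed))
```

No interleaving of the original program lets both processors read "old", so a machine that produces that result is not sequentially consistent. Swapping the read and write inside each processor makes that outcome possible, which is why the reordering is visible to software.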
Current Verification Techniques
• Simulation
– Must study lengthy executions
– Must choose non-trivial programs
• Formal techniques (next slide)
Related Work
• Graf’s Lazy caching in ACTL*
• Gibbons’ approach: run programs and check if the results are SC
• McMillan’s thesis: data abstraction for a test
• Hojati: data abstraction in a different context
• Undecidability result by Alur et al
ACTL* for (stronger than) SC
• AG(enabled(read(a,d))) => avail(a,d)
• AG(avail(a,d) AND EF(enable(read(a,d)))) => A[NOT avail(a,d) W AG NOT avail(a,d)]
• ...
• init => AG[after(write(a,d)) => A(NOT enabled(read(a,d)) W avail(a,d))]
Such MODEL DEPENDENT SPECS do not fit in an iterative industrial frame
Test Model Checking
• Adaptation of simulation to model checking
– model checking (full coverage) + testing (“black box approach”)
• Tests are independent of the model being verified, so manual effort is considerably reduced
– Test model checking can be used early in the design cycle
Results
• Defined a shared memory description language
– “data is not used for control decisions”
– “addresses are symmetric”
– Can specify HP’s Runway/PA, ...
• Model checking technique
– “Small number of addresses is sufficient”
• Application to Runway/PA using PV
If P1 executes two write instructions, then P2 sees them in the program order of P1
P1: A := 1; A := 2; A := 3; ....; A := k
P2: X1 := A; X2 := A; X3 := A; ....; Xk := A
Read Order, Write Order
X(i+1) ≥ X(i)
Many deficiencies
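
The check behind this test can be written as a small predicate over the values P2 observes. It is a sketch under two assumptions: that the condition the slide intends is X(i+1) ≥ X(i), and that P1 writes strictly increasing values so that "later write" and "larger value" coincide. The function name is invented for the example.

```python
def reads_respect_write_order(observed):
    """True if P2's successive reads X1..Xk never go back to an older value of A."""
    return all(later >= earlier for earlier, later in zip(observed, observed[1:]))


# P1 writes 1, 2, 3, ... in program order; 0 stands for the assumed initial value of A.
print(reads_respect_write_order([0, 1, 1, 3]))  # reads only move forward
print(reads_respect_write_order([1, 3, 2]))     # a read went back to an older write
```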
Deficiencies of the Test
• Finite k
– What if an error occurs for a really large k?
• Location “A” is never written by P2
– What if an error occurs when the ownership changes?
• Only 1-address
– The definitions of RO and WO are not restricted to a single address at a time
– How many addresses to consider?
Data abstraction + non-determinism
Unbounded k
[Figure: test automaton with transitions labeled wr(0), wr(1), rd(0), rd(1); the switch between written values is a non-deterministic change]
Ownership Changes
[Figure: the same automaton (wr(0), wr(1), rd(0), rd(1)) with the alternatives "or rd(-)" and "or wr(2)" added to its transitions]
Complete 1-address test
2-address (RO, WO) test
[Figure: the 1-address automaton (wr(1), rd(0), rd(1), with alternatives rd(-) OR wr(0); rd(-) OR wr(1); rd(-) OR wr(2)) next to its 2-address version (wr(A,1), rd(A,0), rd(B,1), with alternatives rd(A,-) OR rd(B,-) OR wr(A,0) OR wr(B,0); rd(A,-) OR wr(A,1) OR rd(B,-) OR wr(B,1); rd(A,-) OR rd(B,-) OR wr(A,2) OR wr(B,2))]
Complete Test for (RO, WO)
• Theorem: A system implements (RO, WO) if and only if it has no errors on all 1- and 2-address programs
• Complete 1-address and 2-address tests
Program Order
• PO generalizes RO and WO to include orderings between a read followed by a write, and a write followed by a read
[Figure: operations rd(A), rd(B), wr(A), rd(B) with ordering edges labeled RO, RW, WR; together they form PO]
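
One way to read the four labels is as a classification of ordered pairs of operations within a single program. The sketch below follows that reading; it assumes RO means read followed by read and WO means write followed by write, and the tuple encoding of operations is made up for the example.

```python
def ordering_kind(first, second):
    """Classify an ordered pair of operations ('rd' or 'wr') as RO, WO, RW, or WR."""
    kinds = {("rd", "rd"): "RO", ("wr", "wr"): "WO",
             ("rd", "wr"): "RW", ("wr", "rd"): "WR"}
    return kinds[(first, second)]


def program_order_pairs(program):
    """List (i, j, kind) for every pair of operations with i before j in one program."""
    return [(i, j, ordering_kind(program[i][0], program[j][0]))
            for i in range(len(program)) for j in range(i + 1, len(program))]


program = [("rd", "A"), ("wr", "A"), ("rd", "B")]  # hypothetical three-operation program
pairs = program_order_pairs(program)
print(pairs)
```

Under this reading, a memory system that enforces PO must respect all four kinds of pairs, while one that enforces only (RO, WO) is free to reorder the RW and WR pairs.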
Write Atomicity
• All processors agree on the order of writes
– WO imposes the order only if the writes are from the same program
[Figure: writes wr(A,0) and wr(B,1)]
SC is (PO, WA)
1-address SC test
[Figure: P0 issues A := 0 and rd(A), then A := 1, then A := 2 and rd(A); P1 issues A := 3 and rd(A), then A := 4, then A := 5 and rd(A); a barrier links the two programs]
ORDER: 1, 4 OR 4, 1
Complete Tests for SC
• Theorem: A system with N processors implements SC if and only if it has no errors on n-address programs, n<N
• Scheme for N processors
– N barriers
– Data written before, at, and after barrier are different
• data 0, 1, 2 for P0, and data 3, 4, 5 for P1
Case Studies
• Serial memory (operational semantics of SC)
• Lazy caching
• Runway/PA system model
– Bus based design
– An aggressive split transaction protocol
– Out-of-order completion of transactions on Runway for high-performance
– In-order completion of instructions in PA for sequential consistency
Test Model Checking of HP/Runway
Test     Spin        PV
PO-1     56K         2412
PO-2     > 5M/DNF    285K
SC-1     499K        7880
SC-2a    > 5M/DNF    5.9M
SC-2b    > 4M/DNF    574K
Conclusion
Showed that specializing formal methods for a particular domain (shared memory) leads to efficient verification techniques for the domain, and increases the applicability of the formal methods
– Two phase algorithm
– Refinement procedure
– Memory model verification
Future Work
• Model checking algorithms
– better partial order algorithms
– tune for test model checking
• Protocol synthesis
– More optimizations
• Test model checking
– Weaker memory models, other objects
– Application to other fields