A Performance Study of BDD-Based Model Checking
Bwolen Yang
Randal E. Bryant, David R. O’Hallaron, Armin Biere, Olivier Coudert, Geert Janssen, Rajeev K. Ranjan, Fabio Somenzi
2
Motivation for Studying Model Checking (MC)
MC is an important part of formal verification of digital circuits and other finite-state systems
BDDs are an enabling technology for MC, but their use in MC is not well studied
BDD packages are tuned using combinational circuits (CC)
Qualitative differences between CC and MC computations
» CC: build outputs; constant-time equivalence checking
» MC: build the model; many fixed-point computations to verify the specs
» CC: BDD algorithms are polynomial
» MC: key BDD algorithms are exponential
3
Outline
BDD Overview
Organization of this Study
» participants, benchmarks, evaluation process
Experimental Results
» performance improvements
» characterizations of MC computations
BDD Evaluation Methodology
» evaluation platform (various BDD packages, real workload)
» metrics
4
BDD Overview
BDD: DAG representation for Boolean functions
» fixed order on the Boolean variables
Set Representation: represent a set as a Boolean function
» an element’s value is true <==> it is in the set
Transition Relation Representation: set of pairs (current-to-next-state transitions)
» each state variable is split into two copies: current state and next state
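To make the set and transition-relation encodings concrete, here is a minimal sketch in plain Python (explicit truth assignments rather than BDDs; the example state space, the 2-bit counter relation, and all function names are invented for illustration):

    from itertools import product

    # A set of states is represented by its characteristic function:
    # chi(s) is True exactly when state s is in the set.
    def chi_even(v0, v1):
        return v0 == v1          # example set: {00, 11}

    # For the transition relation, each state variable is split into a
    # current-state copy (v) and a next-state copy (n).
    def trans(v0, v1, n0, n1):
        nxt = (2 * v1 + v0 + 1) % 4          # example: 2-bit counter
        return n0 == bool(nxt & 1) and n1 == bool(nxt & 2)

    # Image computation: states reachable in one step from the set chi_S.
    def image(chi_S):
        return {(n0, n1)
                for v0, v1, n0, n1 in product([False, True], repeat=4)
                if chi_S(v0, v1) and trans(v0, v1, n0, n1)}

    print(image(chi_even))       # successors of {00, 11}

In an actual model checker these characteristic functions are BDDs, and the image is computed with BDD operations (conjunction and existential quantification) rather than by enumeration.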
5
BDD Overview (Cont’d)
BDD Algorithms: dynamic programming over sub-problems (operations)
» recursively apply Shannon decomposition
» memoization: computed cache
Garbage Collection: recycle unreachable (dead) nodes
Dynamic Variable Reordering: the BDD graph size depends on the variable order
» sifting based: nodes in adjacent levels are swapped
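A minimal sketch of the algorithmic core just described: recursive Shannon decomposition with a computed cache (memoization). The node encoding and names are illustrative only and do not reflect any of the packages in this study; real packages also use a bounded, lossy cache rather than an unbounded dictionary, and garbage collection and sifting (omitted here) operate on the unique table.

    import operator

    class BDD:
        """Toy reduced ordered BDD: nodes are (level, low, high) tuples,
        terminals are the Python Booleans; the variable order is fixed."""
        def __init__(self, num_vars):
            self.num_vars = num_vars
            self.unique = {}        # unique table: canonical nodes
            self.cache = {}         # computed cache: (op, f, g) -> result

        def node(self, level, low, high):
            if low == high:                       # redundant test: skip it
                return low
            key = (level, low, high)
            return self.unique.setdefault(key, key)

        def var(self, level):
            return self.node(level, False, True)

        def top(self, f):
            return self.num_vars if isinstance(f, bool) else f[0]

        def cofactors(self, f, level):
            if isinstance(f, bool) or f[0] != level:
                return f, f                       # f does not test this var
            return f[1], f[2]

        def apply(self, op, f, g):
            if isinstance(f, bool) and isinstance(g, bool):
                return op(f, g)                   # terminal case
            key = (op, f, g)
            if key in self.cache:                 # computed-cache hit
                return self.cache[key]
            level = min(self.top(f), self.top(g))
            f0, f1 = self.cofactors(f, level)
            g0, g1 = self.cofactors(g, level)
            # Shannon decomposition on the topmost variable
            result = self.node(level,
                               self.apply(op, f0, g0),
                               self.apply(op, f1, g1))
            self.cache[key] = result
            return result

    m = BDD(3)
    x0, x1, x2 = m.var(0), m.var(1), m.var(2)
    f = m.apply(operator.or_, m.apply(operator.and_, x0, x1), x2)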
6
Organization of this Study: Participants
Armin Biere: ABCD (Carnegie Mellon / Universität Karlsruhe)
Olivier Coudert: TiGeR (Synopsys / Monterey Design Systems)
Geert Janssen: EHV (Eindhoven University of Technology)
Rajeev K. Ranjan: CAL (Synopsys)
Fabio Somenzi: CUDD (University of Colorado)
Bwolen Yang: PBF (Carnegie Mellon)
7
Organization of this Study: Setup
Metrics: 17 statistics
Benchmark: 16 SMV execution traces
» traces of BDD calls from verification of cache coherence, Tomasulo, phone, reactor, TCAS, …
» size: 6 million - 10 billion sub-operations, 1 - 600 MB of memory
Evaluation platform: a trace driver that “drives” the BDD packages based on the execution traces
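A hedged sketch of what such a trace driver might look like; the trace format and the package interface shown here are hypothetical, invented only to illustrate the idea of replaying recorded BDD calls against different packages:

    # Hypothetical replay loop: each trace line names a result id, an
    # operation, and the ids of earlier results it operates on.
    def replay(trace_lines, package):
        results = {}                               # trace id -> BDD handle
        for line in trace_lines:
            out_id, op, *operands = line.split()
            if op == "var":                        # declare a BDD variable
                results[out_id] = package.var(int(operands[0]))
            else:                                  # e.g. "and", "or", "exists"
                args = [results[a] for a in operands]
                results[out_id] = getattr(package, op)(*args)
        return results

Driving several packages from the same traces is what makes the cross-package comparison in this study possible.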
8
Organization of this Study: Evaluation Process
Phase 1: no dynamic variable reordering
Phase 2: with dynamic variable reordering
Iterative process: hypothesize, design experiments, validate; collect stats, identify issues, suggest improvements, validate
9
Phase 1 Results: Initial / Final
Conclusion: collaborative efforts have led to significant performance improvements
[Scatter plot: initial time (sec) vs. final time (sec), log-log scale from 1e+1 to 1e+6, with 1x / 10x / 100x speedup diagonals]
speedups: > 100: 6 cases, 10 - 100: 16, 5 - 10: 11, 2 - 5: 28
10
Phase 1: Hypotheses / Experiments
Computed Cache
» effects of computed cache size
» amounts of repeated sub-problems across time
Garbage Collection
» reachable / unreachable
Complement Edge Representation
» work
» space
Memory Locality for Breadth-First Algorithms
11
Phase 1: Hypotheses / Experiments (Cont’d)
For Comparison
ISCAS85 combinational circuits (> 5 sec, < 1 GB)
» c2670, c3540
» 13-bit and 14-bit multipliers based on c6288
Metrics: depend only on the trace and the BDD algorithms
» machine-independent
» implementation-independent
12
Computed Cache: Repeated Sub-problems Across Time
Source of Speedup: increased computed cache size
Possible Cause: many repeated sub-problems are far apart in time
Validation: study the number of repeated sub-problems across user-issued operations (top-level operations)
13
Hypothesis: Top-Level Sharing
Hypothesis: MC computations have a large number of repeated sub-problems across the top-level operations.
Experiment: measure the minimum number of operations with GC disabled and a complete cache; compare this with the same setup but with the cache flushed between top-level operations.
14
Results on Top-Level Sharing
flush: cache flushed between top-level operations
Conclusion: large cache is more important for MC
[Bar chart: # of ops (flush / min), log scale 1 - 100, for Model Checking Traces vs. ISCAS85]
15
Garbage Collection: Rebirth Rate
Source of Speedup: reduced GC frequency
Possible Cause: many dead nodes become reachable again (rebirth)
» GC is delayed until the number of dead nodes reaches a threshold
» dead nodes are reborn when they are part of the result of new sub-problems
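An illustrative sketch of the reference-counting scheme being described (dead nodes are only marked, GC is delayed until a threshold, and a dead node is "reborn" if a new result reuses it); the structure and names are invented and not taken from any particular package:

    class NodeTable:
        GC_THRESHOLD = 1000          # GC only when enough nodes are dead

        def __init__(self):
            self.refcount = {}       # node -> reference count
            self.dead = set()        # nodes whose refcount dropped to 0

        def incref(self, node):
            if node in self.dead:
                self.dead.discard(node)   # rebirth: dead node reused by a new result
            self.refcount[node] = self.refcount.get(node, 0) + 1

        def decref(self, node):
            self.refcount[node] -= 1
            if self.refcount[node] == 0:
                self.dead.add(node)       # marked dead, not freed yet

        def maybe_gc(self, free_node):
            if len(self.dead) >= self.GC_THRESHOLD:
                for node in self.dead:
                    free_node(node)
                self.dead.clear()

With a high rebirth rate, most of these dead nodes come back, which is why the study's conclusions below suggest delaying GC (and reference-count updates) rather than triggering GC on the dead-node count alone.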
16
Hypothesis: Rebirth Rate
Hypothesis: MC computations have a very high rebirth rate.
Experiment: measure the number of deaths and the number of rebirths
17
Results on Rebirth Rate
[Bar chart: rebirths / deaths (0 - 1) for Model Checking Traces vs. ISCAS85]
Conclusions
» delay garbage collection
» triggering of GC should not be based only on the # of dead nodes
» delay updating reference counts
18
BF BDD Construction
On MC traces, breadth-first (BF) based BDD construction has no demonstrated advantage over traditional depth-first (DF) based techniques.
Two packages (CAL and PBF) are BF based.
19
BF BDD Construction Overview
Level-by-Level Access: operations on the same level (variable) are processed together
» one queue per level
Locality: group nodes of the same level together in memory
Good memory locality due to BF ==> the # of ops processed per queue visit must be high
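A simplified sketch of breadth-first construction with one request queue per level, for an AND over the same tuple-style node encoding used in the earlier sketch; it is only meant to show the level-by-level scheduling, not CAL's or PBF's actual algorithms or data layout:

    NUM_VARS = 3
    def top(f):                      # level of the topmost variable in f
        return NUM_VARS if isinstance(f, bool) else f[0]

    def cofactors(f, level):
        if isinstance(f, bool) or f[0] != level:
            return f, f
        return f[1], f[2]

    def bf_and(f, g):
        queues = [[] for _ in range(NUM_VARS)]   # one request queue per level
        children = {}                            # request -> (level, req0, req1)

        def issue(f, g):
            req = (f, g)
            if not (isinstance(f, bool) or isinstance(g, bool)
                    or req in children):
                children[req] = None             # mark as issued
                queues[min(top(f), top(g))].append(req)
            return req

        root = issue(f, g)
        # Apply phase: visit each queue once, top level to bottom, so all
        # operations touching one level are processed together.
        for level in range(NUM_VARS):
            for rf, rg in queues[level]:
                f0, f1 = cofactors(rf, level)
                g0, g1 = cofactors(rg, level)
                children[(rf, rg)] = (level, issue(f0, g0), issue(f1, g1))

        # Reduce phase: build result nodes bottom-up.
        results, unique = {}, {}
        def result(req):
            rf, rg = req
            if isinstance(rf, bool):
                return rg if rf else False       # AND with a constant
            if isinstance(rg, bool):
                return rf if rg else False
            return results[req]

        for level in reversed(range(NUM_VARS)):
            for req in queues[level]:
                lvl, req0, req1 = children[req]
                low, high = result(req0), result(req1)
                node = low if low == high else \
                    unique.setdefault((lvl, low, high), (lvl, low, high))
                results[req] = node
        return result(root)

    x0, x1 = (0, False, True), (1, False, True)
    print(bf_and(x0, x1))            # (0, False, (1, False, True))

The metric on the next slide, # of ops processed per queue visit, corresponds to how much work each of these per-level passes finds to do.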
20
Average BF Locality
Conclusion: MC traces generally have less BF locality
[Bar chart: # of ops processed per queue visit (0 - 5000) for Model Checking Traces vs. ISCAS85]
21
Average BF Locality / Work
Conclusion: for comparable BF locality, MC computations do much more work.
[Bar chart: avg. locality / # of ops (x 1e-6), 0 - 250, for Model Checking Traces vs. ISCAS85]
22
Phase 1: Some Issues / Open Questions
Memory Management
» space-time tradeoff: computed cache size / GC frequency
» resource awareness: available physical memory, memory limit, page fault rate
Top-Level Sharing: possibly the main cause of
» strong cache dependency
» high rebirth rate
Better understanding may lead to
» better memory management
» higher-level algorithms to exploit the pattern
23
Phase 2: Dynamic Variable Reordering
BDD Packages Used: CAL, CUDD, EHV, TiGeR
» improvements from Phase 1 incorporated
24
Why Is Variable Reordering Hard to Study?
Time-space tradeoff: how much time to spend to reduce graph sizes
Chaotic behavior: e.g., small changes to triggering / termination criteria can have a significant performance impact
Resource intensive: reordering is expensive; the space of possible orderings is combinatorial
Different variable order ==> different computation: e.g., many “don’t-care space” optimization algorithms
25
Phase 2: Experiments
Quality of Variable Order Generated
Variable Grouping Heuristic: keep strongly related variables adjacent
Reorder Transition Relation: BDDs for the transition relation are used repeatedly
Effects of Initial Variable Order: with and without variable reordering
Only CUDD is used
26
Variable Grouping Heuristic: Group Current / Next Variables
Current / Next State Variables: for the transition relation, each state variable is split into two copies
» current state and next state
Hypothesis: grouping the corresponding current- and next-state variables is a good heuristic (see the sketch after this list).
Experiment: for both with and without grouping, measure
» work (# of operations)
» space (max # of live BDD nodes)
» reorder cost (# of nodes swapped with their children)
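A trivial sketch of the grouping in question: the order interleaves each current-state variable with its next-state copy, and the pairs are recorded as groups that a reordering procedure would keep adjacent (variable names are made up):

    state_vars = ["v0", "v1", "v2"]       # hypothetical state variables

    order, groups = [], []
    for v in state_vars:
        cur, nxt = v, v + "'"             # current- and next-state copies
        order += [cur, nxt]
        groups.append((cur, nxt))         # sifting moves each pair as a unit

    print(order)    # ['v0', "v0'", 'v1', "v1'", 'v2', "v2'"]
    print(groups)   # [('v0', "v0'"), ('v1', "v1'"), ('v2', "v2'")]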
27
Results on Grouping Current / Next Variables
[Bar charts over the MC traces: Space (max live nodes), Reorder Cost (nodes swapped), Work (# of ops)]
All results are normalized against no variable grouping.
Conclusion: grouping is generally effective
28
Effects of Initial Variable Order: Experimental Setup
For each trace:
» find a good variable ordering O
» perturb O to generate new variable orderings (varying the fraction of variables perturbed and the distance moved)
» measure the effects of these new orderings with and without dynamic variable reordering
29
Effects of Initial Variable Order: Perturbation Algorithm
Perturbation Parameters (p, d)
» p: probability that a variable will be perturbed
» d: perturbation distance
Properties
» on average, a fraction p of the variables is perturbed
» the max distance moved is 2d
(p = 1, d = infinity) ==> completely random variable order
For each perturbation level (p, d), generate a number (sample size) of variable orders
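A hedged sketch of one way the (p, d) perturbation could be realized, using random swaps of bounded distance; the exact procedure used in the study is not spelled out on the slide, so treat the details here, including how d = infinity is handled, as assumptions:

    import random

    def perturb(order, p, d=None, rng=random):
        """Perturb a variable order: each position is selected with
        probability p and swapped with a position at most d away
        (d=None means unbounded; with p = 1 every position gets a
        random swap partner, approximating a fully random order)."""
        order = list(order)
        n = len(order)
        for i in range(n):
            if rng.random() < p:
                lo = 0 if d is None else max(0, i - d)
                hi = n - 1 if d is None else min(n - 1, i + d)
                j = rng.randint(lo, hi)       # swap partner within distance d
                order[i], order[j] = order[j], order[i]
        return order

    sample = perturb(list(range(100)), p=0.3, d=10)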
30
Effects of Initial Variable Order: Parameters
Parameter Values
» p: 0.1, 0.2, …, 1.0
» d: 10, 20, …, 100, infinity
» sample size: 10
==> for each trace, 1100 orderings and 2200 runs (with and without dynamic reordering)
31
Effects of Initial Variable Order: Smallest Test Case
Base Case (best ordering): time 13 sec, memory 127 MB
Resource Limits on Generated Orders: time 128x the base case, memory 500 MB
32
Effects of Initial Variable Order: Result
[Histogram: # of unfinished cases (0 - 1200) vs. time limit (> 1x, > 2x, > 4x, …, > 128x), for “no reorder” and “reorder”]
At the 128x / 500 MB limit, “no reorder” finished 33% of the cases; “reorder” finished 90%.
Conclusion: dynamic reordering is effective
33
Phase 2: Some Issues / Open Questions
Computed Cache Flushing: cost
Effects of Initial Variable Order: determining the sample size
Need new, better experimental designs
34
BDD Evaluation Methodology
Trace-Driven Evaluation Platform
» real workload (BDD-call traces)
» study various BDD packages
» focus on key (expensive) operations
Evaluation Metrics: more rigorous quantitative analysis
35
BDD Evaluation Methodology Metrics: Time
[Metric-dependency diagram: elapsed time (performance); CPU time; page fault rate; BDD algorithm time; GC time; reordering time; # of GCs; # of node swaps (reorder cost); memory usage; # of ops (work); # of reorderings; computed cache size]
36
BDD Evaluation Methodology Metrics: Space
[Metric-dependency diagram: memory usage; # of GCs; reordering time; computed cache size; max # of BDD nodes]
37
Importance of Better Metrics (Example: Memory Locality)
[Metric-dependency diagram: elapsed time (performance); CPU time; page fault rate; cache miss rate; TLB miss rate; memory locality; # of GCs; # of reorderings; computed cache size; # of ops (work)]
Without accounting for work, measurements may lead to false conclusions.
38
Summary
Collaboration + Evaluation Methodology
» significant performance improvements: up to 2 orders of magnitude
» characterization of MC computation: computed cache size, garbage collection frequency, effects of complement edges, BF locality, reordering heuristic (current / next state variable grouping), effects of reordering the transition relation, effects of initial variable orderings
Other general results (not mentioned in this talk)
Issues and open questions for future research
39
Conclusions
Rigorous quantitative analysis can lead to:
» dramatic performance improvements
» better understanding of computational characteristics
Adopt the evaluation methodology by:
» building more benchmark traces (for IP issues, BDD-call traces are hard to understand)
» using / improving the proposed metrics for future evaluation
For the data and BDD traces used in this study, see http://www.cs.cmu.edu/~bwolen/fmcad98/
41
Phase 1: Hypotheses / Experiments
Computed Cache
» effects of computed cache size
» amounts of repeated sub-problems across time
Garbage Collection
» reachable / unreachable
Complement Edge Representation
» work
» space
Memory Locality for Breadth-First Algorithms
42
Phase 2: Experiments
Quality of Variable Order Generated
Variable Grouping Heuristic: keep strongly related variables adjacent
Reorder Transition Relation: BDDs for the transition relation are used repeatedly
Effects of Initial Variable Order: for both with and without variable reordering
Only CUDD is used
43
Benchmark “Sizes”
            # of BDD Vars   Work: Min Ops (millions)   Space: Max Live Nodes (thousands)
MC          86 - 600        6 - ~10000                 53 - 26944
ISCAS85     26 - 233        15 - 178                   4363 - 9662
Min Ops: minimum number of sub-problems/operations (no GC and complete cache)
Max Live Nodes: maximum number of live BDD nodes
44
Phase 1: Before/After
Cumulative Speedup Histogram
6 packages * 16 traces = 96 cases
[Cumulative histogram: # of cases (0 - 80) per speedup category: > 100, > 10, > 5, > 2, > 1, > 0.95, > 0, new failed, bad]
45
Computed Cache Size Dependency
Hypothesis: the computed cache is more important for MC than for CC.
Experiment: vary the cache size and measure its effect on work
» cache size as a percentage of the # of BDD nodes
» results normalized to the minimum amount of work necessary, i.e., no GC and a complete cache
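For reference, here is a sketch of the kind of bounded, lossy computed cache this experiment varies: a fixed number of slots (here set as a percentage of the current number of BDD nodes), where a new entry simply overwrites whatever previously hashed to its slot. The class and its names are illustrative, not from any specific package:

    class BoundedCache:
        """Direct-mapped, lossy computed cache with a fixed number of slots."""
        def __init__(self, num_nodes, percent):
            self.slots = [None] * max(1, num_nodes * percent // 100)

        def _index(self, key):
            return hash(key) % len(self.slots)

        def get(self, key):
            entry = self.slots[self._index(key)]
            return entry[1] if entry is not None and entry[0] == key else None

        def put(self, key, value):
            self.slots[self._index(key)] = (key, value)   # may evict silently

    cache = BoundedCache(num_nodes=100000, percent=40)    # 40% of BDD nodes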
46
Effects of Computed Cache Size
# of ops: normalized to the minimum number of operations
cache size: % of BDD nodes
[Two plots, MC Traces and ISCAS85 Circuits: # of ops (1e+0 - 1e+4, log scale) vs. cache size (10%, 20%, 40%, 80%)]
Conclusion: large cache is important for MC
47
Death Rate
[Bar chart: deaths / operations (0 - 3) for Model Checking Traces vs. ISCAS85]
Conclusion: death rate for MC can be very high
48
Effects of Complement Edge
Work
[Bar chart: # of ops, no C.E. / with C.E. (0 - 2), for Model Checking Traces vs. ISCAS85]
Conclusion: complement edge only affects work for CC
49
Effects of Complement Edge
Space
Conclusion: complement edge does not affect space
Note: maximum # of live BDD nodes would be better measure
[Bar chart: sum of BDD sizes, no C.E. / with C.E. (0 - 1.5), for Model Checking Traces vs. ISCAS85]
50
Phase 2 Results
With / Without Reorder
4 packages * 16 traces = 64 cases
[Scatter plot: no reorder (sec) vs. with reorder (sec), log-log scale from 1e+1 to 1e+6, with .01x / .1x / 1x / 10x / 100x diagonals]
51
Variable Reordering
Effects of Reordering Transition Relation Only
nodes swapped: number of nodes swapped with their children
[Bar charts over the MC traces: Space (max live nodes), Reorder Cost (nodes swapped), Work (# of ops)]
All results are normalized against variable reordering w/ grouping.
52
Quality of the New Variable Order
Experiment: use the final variable ordering as the new initial order; compare the results using the new initial order with the results using the original variable order.
53
Results on Quality of the New Variable Order
Conclusion: qualities of new variable orders are generally good.
[Scatter plot: original order (sec) vs. new initial order (sec), log-log scale from 1e+1 to 1e+6, with .01x / .1x / 1x / 10x / 100x diagonals]
54
No Reorder (> 4x or > 500 MB)
[3D histogram: # of cases (0 - 10) vs. perturbation distance (10 - 100, inf) and probability (0 - 1.0)]
55
> 4x or > 500 MB
Conclusions: for very low perturbation, reordering does not work well; overall, very few cases finish.
[Two 3D histograms, No Reorder and Reorder: # of cases (0 - 10) vs. perturbation distance (10 - inf) and probability (0.1 - 1.0)]
56
> 32x or > 500 MB
Conclusion: variable reordering worked rather well
[Two 3D histograms, No Reorder and Reorder: # of cases (0 - 10) vs. perturbation distance (10 - inf) and probability (0.1 - 1.0)]
57
Memory Out (> 512 MB)
Conclusion: memory-intensive in the highly perturbed region
[Two 3D histograms, No Reorder and Reorder: # of cases (0 - 10) vs. perturbation distance (10 - inf) and probability (0.1 - 1.0)]
58
Timed Out (> 128x)
Note: diagonal band from lower left to upper right
[Two 3D histograms, No Reorder and Reorder: # of cases (0 - 10) vs. perturbation distance (10 - inf) and probability (0.1 - 1.0)]
59
Issues and Open Questions: Cross Top-Level Sharing
Why are there so many repeated sub-problems across top-level operations?
Potential experiment: identify how far apart these repetitions are
60
Issues and Open Questions: Inconsistent Cross-Platform Results
For some BDD packages, results on machine A are 2x faster than on machine B; for other BDD packages, the difference is not as significant.
Probable cause: memory hierarchy
This may shed some light on the memory locality issues.