A Performance Study of BDD-Based Model Checking
Bwolen Yang
Randal E. Bryant, David R. O’Hallaron, Armin Biere, Olivier Coudert, Geert Janssen, Rajeev K. Ranjan, Fabio Somenzi
2
Motivation for Studying Model Checking (MC)
MC is an important part of formal verification of digital circuits and other finite-state systems
BDDs are an enabling technology for MC, but their use in MC is not well studied
BDD packages are tuned using combinational circuits (CC)
Qualitative differences between CC and MC computations
» CC: build outputs; constant-time equivalence checking
» MC: build the model; many fixed-point computations to verify the specs
» CC: BDD algorithms are polynomial
» MC: key BDD algorithms are exponential
3
Outline
BDD Overview
Organization of this Study
» participants, benchmarks, evaluation process
Experimental Results
» performance improvements
» characterizations of MC computations
BDD Evaluation Methodology
» evaluation platform (various BDD packages, real workload)
» metrics
4
BDD Overview
BDD: DAG representation for Boolean functions
» fixed order on the Boolean variables
Set Representation: represent a set as a Boolean function
» an element’s value is true <==> it is in the set
Transition Relation Representation: set of pairs (current-to-next-state transitions)
» each state variable is split into two copies: current state and next state
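To make the set and transition-relation encodings concrete, here is a minimal sketch in plain Python (explicit truth assignments rather than BDDs; the example state space, the 2-bit counter relation, and all function names are invented for illustration):

    from itertools import product

    # A set of states is represented by its characteristic function:
    # chi(s) is True exactly when state s is in the set.
    def chi_even(v0, v1):
        return v0 == v1          # example set: {00, 11}

    # For the transition relation, each state variable is split into a
    # current-state copy (v) and a next-state copy (n).
    def trans(v0, v1, n0, n1):
        nxt = (2 * v1 + v0 + 1) % 4          # example: 2-bit counter
        return n0 == bool(nxt & 1) and n1 == bool(nxt & 2)

    # Image computation: states reachable in one step from the set chi_S.
    def image(chi_S):
        return {(n0, n1)
                for v0, v1, n0, n1 in product([False, True], repeat=4)
                if chi_S(v0, v1) and trans(v0, v1, n0, n1)}

    print(image(chi_even))       # successors of {00, 11}

In an actual model checker these characteristic functions are BDDs, and the image is computed with BDD operations (conjunction and existential quantification) rather than by enumeration.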
5
BDD Overview (Cont’d)
BDD Algorithms: dynamic programming over sub-problems (operations)
» recursively apply Shannon decomposition
» memoization: computed cache
Garbage Collection: recycle unreachable (dead) nodes
Dynamic Variable Reordering: the BDD graph size depends on the variable order
» sifting based: nodes in adjacent levels are swapped
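A minimal sketch of the algorithmic core just described: recursive Shannon decomposition with a computed cache (memoization). The node encoding and names are illustrative only and do not reflect any of the packages in this study; real packages also use a bounded, lossy cache rather than an unbounded dictionary, and garbage collection and sifting (omitted here) operate on the unique table.

    import operator

    class BDD:
        """Toy reduced ordered BDD: nodes are (level, low, high) tuples,
        terminals are the Python Booleans; the variable order is fixed."""
        def __init__(self, num_vars):
            self.num_vars = num_vars
            self.unique = {}        # unique table: canonical nodes
            self.cache = {}         # computed cache: (op, f, g) -> result

        def node(self, level, low, high):
            if low == high:                       # redundant test: skip it
                return low
            key = (level, low, high)
            return self.unique.setdefault(key, key)

        def var(self, level):
            return self.node(level, False, True)

        def top(self, f):
            return self.num_vars if isinstance(f, bool) else f[0]

        def cofactors(self, f, level):
            if isinstance(f, bool) or f[0] != level:
                return f, f                       # f does not test this var
            return f[1], f[2]

        def apply(self, op, f, g):
            if isinstance(f, bool) and isinstance(g, bool):
                return op(f, g)                   # terminal case
            key = (op, f, g)
            if key in self.cache:                 # computed-cache hit
                return self.cache[key]
            level = min(self.top(f), self.top(g))
            f0, f1 = self.cofactors(f, level)
            g0, g1 = self.cofactors(g, level)
            # Shannon decomposition on the topmost variable
            result = self.node(level,
                               self.apply(op, f0, g0),
                               self.apply(op, f1, g1))
            self.cache[key] = result
            return result

    m = BDD(3)
    x0, x1, x2 = m.var(0), m.var(1), m.var(2)
    f = m.apply(operator.or_, m.apply(operator.and_, x0, x1), x2)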
6
Organization of this Study: Participants
Armin Biere: ABCD (Carnegie Mellon / Universität Karlsruhe)
Olivier Coudert: TiGeR (Synopsys / Monterey Design Systems)
Geert Janssen: EHV (Eindhoven University of Technology)
Rajeev K. Ranjan: CAL (Synopsys)
Fabio Somenzi: CUDD (University of Colorado)
Bwolen Yang: PBF (Carnegie Mellon)
7
Organization of this Study: Setup
Metrics: 17 statistics
Benchmark: 16 SMV execution traces
» traces of BDD calls from verification of cache coherence, Tomasulo, phone, reactor, TCAS, …
» size: 6 million - 10 billion sub-operations, 1 - 600 MB of memory
Evaluation platform: a trace driver that “drives” the BDD packages based on the execution traces
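A hedged sketch of what such a trace driver might look like; the trace format and the package interface shown here are hypothetical, invented only to illustrate the idea of replaying recorded BDD calls against different packages:

    # Hypothetical replay loop: each trace line names a result id, an
    # operation, and the ids of earlier results it operates on.
    def replay(trace_lines, package):
        results = {}                               # trace id -> BDD handle
        for line in trace_lines:
            out_id, op, *operands = line.split()
            if op == "var":                        # declare a BDD variable
                results[out_id] = package.var(int(operands[0]))
            else:                                  # e.g. "and", "or", "exists"
                args = [results[a] for a in operands]
                results[out_id] = getattr(package, op)(*args)
        return results

Driving several packages from the same traces is what makes the cross-package comparison in this study possible.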
8
Organization of this Study: Evaluation Process
Phase 1: no dynamic variable reordering
Phase 2: with dynamic variable reordering
Iterative process: hypothesize, design experiments, validate; collect stats, identify issues, suggest improvements, validate
9
Phase 1 Results: Initial / Final
Conclusion: collaborative efforts have led to significant performance improvements
[Scatter plot: initial time (sec) vs. final time (sec), log-log scale from 1e+1 to 1e+6, with 1x / 10x / 100x speedup diagonals]
speedups: > 100: 6 cases, 10 - 100: 16, 5 - 10: 11, 2 - 5: 28
10
Phase 1: Hypotheses / Experiments
Computed Cache
» effects of computed cache size
» amounts of repeated sub-problems across time
Garbage Collection
» reachable / unreachable
Complement Edge Representation
» work
» space
Memory Locality for Breadth-First Algorithms
11
Phase 1: Hypotheses / Experiments (Cont’d)
For Comparison
ISCAS85 combinational circuits (> 5 sec, < 1 GB)
» c2670, c3540
» 13-bit and 14-bit multipliers based on c6288
Metrics: depend only on the trace and the BDD algorithms
» machine-independent
» implementation-independent
12
Computed Cache: Repeated Sub-problems Across Time
Source of Speedup: increased computed cache size
Possible Cause: many repeated sub-problems are far apart in time
Validation: study the number of repeated sub-problems across user-issued operations (top-level operations)
13
Hypothesis: Top-Level Sharing
Hypothesis: MC computations have a large number of repeated sub-problems across the top-level operations.
Experiment: measure the minimum number of operations with GC disabled and a complete cache; compare this with the same setup but with the cache flushed between top-level operations.
14
Results on Top-Level Sharing
flush: cache flushed between top-level operations
Conclusion: large cache is more important for MC
[Bar chart: # of ops (flush / min), log scale 1 - 100, for Model Checking Traces vs. ISCAS85]
15
Garbage Collection: Rebirth Rate
Source of Speedup: reduced GC frequency
Possible Cause: many dead nodes become reachable again (rebirth)
» GC is delayed until the number of dead nodes reaches a threshold
» dead nodes are reborn when they are part of the result of new sub-problems
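An illustrative sketch of the reference-counting scheme being described (dead nodes are only marked, GC is delayed until a threshold, and a dead node is "reborn" if a new result reuses it); the structure and names are invented and not taken from any particular package:

    class NodeTable:
        GC_THRESHOLD = 1000          # GC only when enough nodes are dead

        def __init__(self):
            self.refcount = {}       # node -> reference count
            self.dead = set()        # nodes whose refcount dropped to 0

        def incref(self, node):
            if node in self.dead:
                self.dead.discard(node)   # rebirth: dead node reused by a new result
            self.refcount[node] = self.refcount.get(node, 0) + 1

        def decref(self, node):
            self.refcount[node] -= 1
            if self.refcount[node] == 0:
                self.dead.add(node)       # marked dead, not freed yet

        def maybe_gc(self, free_node):
            if len(self.dead) >= self.GC_THRESHOLD:
                for node in self.dead:
                    free_node(node)
                self.dead.clear()

With a high rebirth rate, most of these dead nodes come back, which is why the study's conclusions below suggest delaying GC (and reference-count updates) rather than triggering GC on the dead-node count alone.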
16
Hypothesis: Rebirth Rate
Hypothesis: MC computations have a very high rebirth rate.
Experiment: measure the number of deaths and the number of rebirths
17
Results on Rebirth Rate
[Bar chart: rebirths / deaths (0 - 1) for Model Checking Traces vs. ISCAS85]
Conclusions
» delay garbage collection
» triggering of GC should not be based only on the # of dead nodes
» delay updating reference counts
18
BF BDD Construction
On MC traces, breadth-first (BF) based BDD construction has no demonstrated advantage over traditional depth-first (DF) based techniques.
Two packages (CAL and PBF) are BF based.
19
BF BDD Construction Overview
Level-by-Level Access: operations on the same level (variable) are processed together
» one queue per level
Locality: group nodes of the same level together in memory
Good memory locality due to BF ==> the # of ops processed per queue visit must be high
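A simplified sketch of breadth-first construction with one request queue per level, for an AND over the same tuple-style node encoding used in the earlier sketch; it is only meant to show the level-by-level scheduling, not CAL's or PBF's actual algorithms or data layout:

    NUM_VARS = 3
    def top(f):                      # level of the topmost variable in f
        return NUM_VARS if isinstance(f, bool) else f[0]

    def cofactors(f, level):
        if isinstance(f, bool) or f[0] != level:
            return f, f
        return f[1], f[2]

    def bf_and(f, g):
        queues = [[] for _ in range(NUM_VARS)]   # one request queue per level
        children = {}                            # request -> (level, req0, req1)

        def issue(f, g):
            req = (f, g)
            if not (isinstance(f, bool) or isinstance(g, bool)
                    or req in children):
                children[req] = None             # mark as issued
                queues[min(top(f), top(g))].append(req)
            return req

        root = issue(f, g)
        # Apply phase: visit each queue once, top level to bottom, so all
        # operations touching one level are processed together.
        for level in range(NUM_VARS):
            for rf, rg in queues[level]:
                f0, f1 = cofactors(rf, level)
                g0, g1 = cofactors(rg, level)
                children[(rf, rg)] = (level, issue(f0, g0), issue(f1, g1))

        # Reduce phase: build result nodes bottom-up.
        results, unique = {}, {}
        def result(req):
            rf, rg = req
            if isinstance(rf, bool):
                return rg if rf else False       # AND with a constant
            if isinstance(rg, bool):
                return rf if rg else False
            return results[req]

        for level in reversed(range(NUM_VARS)):
            for req in queues[level]:
                lvl, req0, req1 = children[req]
                low, high = result(req0), result(req1)
                node = low if low == high else \
                    unique.setdefault((lvl, low, high), (lvl, low, high))
                results[req] = node
        return result(root)

    x0, x1 = (0, False, True), (1, False, True)
    print(bf_and(x0, x1))            # (0, False, (1, False, True))

The metric on the next slide, # of ops processed per queue visit, corresponds to how much work each of these per-level passes finds to do.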
20
Average BF Locality
Conclusion: MC traces generally have less BF locality
[Bar chart: # of ops processed per queue visit (0 - 5000) for Model Checking Traces vs. ISCAS85]
21
Average BF Locality / Work
Conclusion: for comparable BF locality, MC computations do much more work.
[Bar chart: avg. locality / # of ops (x 1e-6), 0 - 250, for Model Checking Traces vs. ISCAS85]
22
Phase 1: Some Issues / Open Questions
Memory Management
» space-time tradeoff: computed cache size / GC frequency
» resource awareness: available physical memory, memory limit, page fault rate
Top-Level Sharing: possibly the main cause of
» strong cache dependency
» high rebirth rate
Better understanding may lead to
» better memory management
» higher-level algorithms to exploit the pattern
23
Phase 2: Dynamic Variable Reordering
BDD Packages Used: CAL, CUDD, EHV, TiGeR
» improvements from Phase 1 incorporated
24
Why Is Variable Reordering Hard to Study?
Time-space tradeoff: how much time to spend to reduce graph sizes
Chaotic behavior: e.g., small changes to triggering / termination criteria can have a significant performance impact
Resource intensive: reordering is expensive; the space of possible orderings is combinatorial
Different variable order ==> different computation: e.g., many “don’t-care space” optimization algorithms
25
Phase 2: Experiments
Quality of Variable Order Generated
Variable Grouping Heuristic: keep strongly related variables adjacent
Reorder Transition Relation: BDDs for the transition relation are used repeatedly
Effects of Initial Variable Order: with and without variable reordering
Only CUDD is used
26
Variable Grouping Heuristic: Group Current / Next Variables
Current / Next State Variables: for the transition relation, each state variable is split into two copies
» current state and next state
Hypothesis: grouping the corresponding current- and next-state variables is a good heuristic (see the sketch after this list).
Experiment: for both with and without grouping, measure
» work (# of operations)
» space (max # of live BDD nodes)
» reorder cost (# of nodes swapped with their children)
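A trivial sketch of the grouping in question: the order interleaves each current-state variable with its next-state copy, and the pairs are recorded as groups that a reordering procedure would keep adjacent (variable names are made up):

    state_vars = ["v0", "v1", "v2"]       # hypothetical state variables

    order, groups = [], []
    for v in state_vars:
        cur, nxt = v, v + "'"             # current- and next-state copies
        order += [cur, nxt]
        groups.append((cur, nxt))         # sifting moves each pair as a unit

    print(order)    # ['v0', "v0'", 'v1', "v1'", 'v2', "v2'"]
    print(groups)   # [('v0', "v0'"), ('v1', "v1'"), ('v2', "v2'")]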
27
Results on Grouping Current / Next Variables
[Bar charts over the MC traces: Space (max live nodes), Reorder Cost (nodes swapped), Work (# of ops)]
All results are normalized against no variable grouping.
Conclusion: grouping is generally effective
28
Effects of Initial Variable Order: Experimental Setup
For each trace:
» find a good variable ordering O
» perturb O to generate new variable orderings (varying the fraction of variables perturbed and the distance moved)
» measure the effects of these new orderings with and without dynamic variable reordering
29
Effects of Initial Variable Order: Perturbation Algorithm
Perturbation Parameters (p, d)
» p: probability that a variable will be perturbed
» d: perturbation distance
Properties
» on average, a fraction p of the variables is perturbed
» the max distance moved is 2d
(p = 1, d = infinity) ==> completely random variable order
For each perturbation level (p, d), generate a number (sample size) of variable orders
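A hedged sketch of one way the (p, d) perturbation could be realized, using random swaps of bounded distance; the exact procedure used in the study is not spelled out on the slide, so treat the details here, including how d = infinity is handled, as assumptions:

    import random

    def perturb(order, p, d=None, rng=random):
        """Perturb a variable order: each position is selected with
        probability p and swapped with a position at most d away
        (d=None means unbounded; with p = 1 every position gets a
        random swap partner, approximating a fully random order)."""
        order = list(order)
        n = len(order)
        for i in range(n):
            if rng.random() < p:
                lo = 0 if d is None else max(0, i - d)
                hi = n - 1 if d is None else min(n - 1, i + d)
                j = rng.randint(lo, hi)       # swap partner within distance d
                order[i], order[j] = order[j], order[i]
        return order

    sample = perturb(list(range(100)), p=0.3, d=10)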
30
Effects of Initial Variable Order: Parameters
Parameter Values
» p: 0.1, 0.2, …, 1.0
» d: 10, 20, …, 100, infinity
» sample size: 10
==> for each trace, 1100 orderings and 2200 runs (with and without dynamic reordering)
31
Effects of Initial Variable Order: Smallest Test Case
Base Case (best ordering): time 13 sec, memory 127 MB
Resource Limits on Generated Orders: time 128x the base case, memory 500 MB
32
Effects of Initial Variable Order: Result
[Histogram: # of unfinished cases (0 - 1200) vs. time limit (> 1x, > 2x, > 4x, …, > 128x), for “no reorder” and “reorder”]
At the 128x / 500 MB limit, “no reorder” finished 33% of the cases; “reorder” finished 90%.
Conclusion: dynamic reordering is effective
33
Phase 2: Some Issues / Open Questions
Computed Cache Flushing: cost
Effects of Initial Variable Order: determining the sample size
Need new, better experimental designs
34
BDD Evaluation Methodology
Trace-Driven Evaluation Platform
» real workload (BDD-call traces)
» study various BDD packages
» focus on key (expensive) operations
Evaluation Metrics: more rigorous quantitative analysis
35
BDD Evaluation Methodology Metrics: Time
[Metric-dependency diagram: elapsed time (performance); CPU time; page fault rate; BDD algorithm time; GC time; reordering time; # of GCs; # of node swaps (reorder cost); memory usage; # of ops (work); # of reorderings; computed cache size]
36
BDD Evaluation Methodology Metrics: Space
[Metric-dependency diagram: memory usage; # of GCs; reordering time; computed cache size; max # of BDD nodes]
37
Importance of Better Metrics (Example: Memory Locality)
[Metric-dependency diagram: elapsed time (performance); CPU time; page fault rate; cache miss rate; TLB miss rate; memory locality; # of GCs; # of reorderings; computed cache size; # of ops (work)]
Without accounting for work, measurements may lead to false conclusions.
38
Summary
Collaboration + Evaluation Methodology
» significant performance improvements: up to 2 orders of magnitude
» characterization of MC computation: computed cache size, garbage collection frequency, effects of complement edges, BF locality, reordering heuristic (current / next state variable grouping), effects of reordering the transition relation, effects of initial variable orderings
Other general results (not mentioned in this talk)
Issues and open questions for future research
39
Conclusions
Rigorous quantitative analysis can lead to:
» dramatic performance improvements
» better understanding of computational characteristics
Adopt the evaluation methodology by:
» building more benchmark traces (for IP issues, BDD-call traces are hard to understand)
» using / improving the proposed metrics for future evaluation
For the data and BDD traces used in this study, see http://www.cs.cmu.edu/~bwolen/fmcad98/
41
Phase 1: Hypotheses / Experiments
Computed Cache
» effects of computed cache size
» amounts of repeated sub-problems across time
Garbage Collection
» reachable / unreachable
Complement Edge Representation
» work
» space
Memory Locality for Breadth-First Algorithms
42
Phase 2: Experiments
Quality of Variable Order Generated
Variable Grouping Heuristic: keep strongly related variables adjacent
Reorder Transition Relation: BDDs for the transition relation are used repeatedly
Effects of Initial Variable Order: for both with and without variable reordering
Only CUDD is used
43
Benchmark “Sizes”
            # of BDD Vars   Work: Min Ops (millions)   Space: Max Live Nodes (thousands)
MC          86 - 600        6 - ~10000                 53 - 26944
ISCAS85     26 - 233        15 - 178                   4363 - 9662
Min Ops: minimum number of sub-problems/operations (no GC and complete cache)
Max Live Nodes: maximum number of live BDD nodes
44
Phase 1: Before/After
Cumulative Speedup Histogram
6 packages * 16 traces = 96 cases
[Cumulative histogram: # of cases (0 - 80) per speedup category: > 100, > 10, > 5, > 2, > 1, > 0.95, > 0, new failed, bad]
45
Computed Cache Size Dependency
Hypothesis: the computed cache is more important for MC than for CC.
Experiment: vary the cache size and measure its effect on work
» cache size as a percentage of the # of BDD nodes
» results normalized to the minimum amount of work necessary, i.e., no GC and a complete cache
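For reference, here is a sketch of the kind of bounded, lossy computed cache this experiment varies: a fixed number of slots (here set as a percentage of the current number of BDD nodes), where a new entry simply overwrites whatever previously hashed to its slot. The class and its names are illustrative, not from any specific package:

    class BoundedCache:
        """Direct-mapped, lossy computed cache with a fixed number of slots."""
        def __init__(self, num_nodes, percent):
            self.slots = [None] * max(1, num_nodes * percent // 100)

        def _index(self, key):
            return hash(key) % len(self.slots)

        def get(self, key):
            entry = self.slots[self._index(key)]
            return entry[1] if entry is not None and entry[0] == key else None

        def put(self, key, value):
            self.slots[self._index(key)] = (key, value)   # may evict silently

    cache = BoundedCache(num_nodes=100000, percent=40)    # 40% of BDD nodes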
46
Effects of Computed Cache Size
# of ops: normalized to the minimum number of operations
cache size: % of BDD nodes
[Two plots, MC Traces and ISCAS85 Circuits: # of ops (1e+0 - 1e+4, log scale) vs. cache size (10%, 20%, 40%, 80%)]
Conclusion: large cache is important for MC
47
Death Rate
[Bar chart: deaths / operations (0 - 3) for Model Checking Traces vs. ISCAS85]
Conclusion: death rate for MC can be very high
48
Effects of Complement Edge
Work
[Bar chart: # of ops, no C.E. / with C.E. (0 - 2), for Model Checking Traces vs. ISCAS85]
Conclusion: complement edge only affects work for CC
49
Effects of Complement Edge
Space
Conclusion: complement edge does not affect space
Note: maximum # of live BDD nodes would be better measure
[Bar chart: sum of BDD sizes, no C.E. / with C.E. (0 - 1.5), for Model Checking Traces vs. ISCAS85]
50
Phase 2 Results
With / Without Reorder
4 packages * 16 traces = 64 cases
[Scatter plot: no reorder (sec) vs. with reorder (sec), log-log scale from 1e+1 to 1e+6, with .01x / .1x / 1x / 10x / 100x diagonals]
51
Variable Reordering
Effects of Reordering Transition Relation Only
nodes swapped: number of nodes swapped with their children
[Bar charts over the MC traces: Space (max live nodes), Reorder Cost (nodes swapped), Work (# of ops)]
All results are normalized against variable reordering w/ grouping.
52
Quality of the New Variable Order
Experiment: use the final variable ordering as the new initial order; compare the results using the new initial order with the results using the original variable order.
53
Results on Quality of the New Variable Order
Conclusion: qualities of new variable orders are generally good.
[Scatter plot: original order (sec) vs. new initial order (sec), log-log scale from 1e+1 to 1e+6, with .01x / .1x / 1x / 10x / 100x diagonals]
54
No Reorder (> 4x or > 500 MB)
[3D histogram: # of cases (0 - 10) vs. perturbation distance (10 - 100, inf) and probability (0 - 1.0)]
55
> 4x or > 500 MB
Conclusions: for very low perturbation, reordering does not work well; overall, very few cases finish.
[Two 3D histograms, No Reorder and Reorder: # of cases (0 - 10) vs. perturbation distance (10 - inf) and probability (0.1 - 1.0)]
56
> 32x or > 500 MB
Conclusion: variable reordering worked rather well
[Two 3D histograms, No Reorder and Reorder: # of cases (0 - 10) vs. perturbation distance (10 - inf) and probability (0.1 - 1.0)]
57
Memory Out (> 512 MB)
Conclusion: memory-intensive in the highly perturbed region
[Two 3D histograms, No Reorder and Reorder: # of cases (0 - 10) vs. perturbation distance (10 - inf) and probability (0.1 - 1.0)]
58
Timed Out (> 128x)
Note: diagonal band from lower left to upper right
[Two 3D histograms, No Reorder and Reorder: # of cases (0 - 10) vs. perturbation distance (10 - inf) and probability (0.1 - 1.0)]
59
Issues and Open Questions: Cross Top-Level Sharing
Why are there so many repeated sub-problems across top-level operations?
Potential experiment: identify how far apart these repetitions are
60
Issues and Open Questions: Inconsistent Cross-Platform Results
For some BDD packages, results on machine A are 2x faster than on machine B; for other BDD packages, the difference is not as significant.
Probable cause: memory hierarchy
This may shed some light on the memory locality issues.