Copyright
by
Jason Raymond Baumgartner
2002
The Dissertation Committee for Jason Raymond Baumgartner
certifies that this is the approved version of the following dissertation:
Automatic Structural Abstraction Techniques for
Enhanced Verification
Committee:
Jacob Abraham, Supervisor
Andreas Kuehlmann
Adnan Aziz
E. Allen Emerson
Lizy Kurian John
Automatic Structural Abstraction Techniques for
Enhanced Verification
by
Jason Raymond Baumgartner, B.S., M.S.
Dissertation
Presented to the Faculty of the Graduate School of
The University of Texas at Austin
in Partial Fulfillment
of the Requirements
for the Degree of
Doctor of Philosophy
The University of Texas at Austin
December 2002
To my wife Shelly, to my parents,
to my grandmothers and the memory of my grandfathers, to the memory of Clover,
and to the semi-formal team at IBM
Acknowledgments
This acknowledgment is partitioned into several components.

• I wish to first acknowledge those who were the most central to the research contributions of this thesis, as indeed this is the acknowledgment section of this thesis.
– After graduation with a Bachelor of Science in Electrical Engineering, I joined
IBM Austin in 1995, and quickly became involved in functional verification.
In 1996 I began the Master’s program at UT, and one of my first classes was
Formal Verification taught by Adnan Aziz. This class was the reason that I developed a passion for formal verification; within several months I began model checking at IBM using RuleBase. Furthermore, my term projects in Adnan's classes involved structural abstractions entailing state folding in a property checking framework, which evolved into phase abstraction and c-slow abstraction. I wish to thank Adnan for bringing me into the world of formal verification. I also wish to thank and acknowledge Vigyan Singhal for collaborations and insight during these projects. Both Adnan and Vigyan were instrumental to the development of my skills in technical writing and formal reasoning.
– I next wish to acknowledge Andreas Kuehlmann, whom I met in 1999. By
this time I had been deploying functional formal verification at IBM for several years, relying upon every trick in the book and a few others to attempt to push large Gigahertz designs through model checkers.1 Numerous ideas for the automation of some of these tricks were floating around my head at this time, such as generalizations of the acyclic c-slow abstraction which evolved into our approach for structural diameter overapproximation, and some reductions that have been subsumed by retiming and structural target enlargement. Andreas was the visionary behind the concept of transformation-based verification, enabling the synergistic application of various structural abstractions in a verification framework. I immediately acknowledged this framework as the cohesive force to unify all of my ideas and more. Andreas additionally provided many brilliant concepts to the retiming work described herein, and has been central to the development of my technical writing and formal reasoning skills since.
One virtue (or curse) I inherited through Andreas is the notion that “Excellent
is not good enough, because there is always better.” What this means below
the surface of the words is that behind every good idea is an even more general
concept.2 I owe many of the results of this thesis, and much of my technical
focus, to Andreas.
– I wish to briefly acknowledge those whose efforts have most inspired and influenced this work, and those who have provided me stimulating discussion during the past: my supervisor Jacob Abraham, E. Allen Emerson, Aarti Gupta, Alan Hu, Kenneth McMillan, Steven German, Edmund Clarke, Armin Biere, Carl Pixley, Pranav Ashar, Alessandro Cimatti, In-Ho Moon, James Kukula, Thomas Shiple, Fabio Somenzi, Kavita Ravi, Robert Kurshan, Flemming Andersen, James Saxe, Malay Ganai, and many others whose names appear in the bibliography.
1 I wish to thank Tamir Heyman for teaching me many of these tricks, including that of optimally exploiting insomnia.
2 A similar idea was quoted by my Integer Programming professor Dr. Gang Yu: "Behind complexity, there is always simplicity to be revealed. Inside simplicity, there is always complexity to be discovered."
– I lastly wish to thank some of the outstanding professors that I have had the privilege to learn from while at UT: Greg Plaxton, Vladimir Lifschitz, Lizy Kurian John, Gang Yu, Margarida Jacome, and Joydeep Ghosh.

• I next wish to acknowledge those at IBM who have influenced this research, as well as the IBM Server Group as a whole for supporting this work, and for providing the real-world motivation for many of the techniques developed herein.
– Many of the techniques described in this thesis have been implemented within IBM's semi-formal verification tool. This project has enabled numerous experimental results reported herein, and provided the motivation to wring through implementation details that otherwise would likely have gone unexplored. I want to thank those at IBM who have helped make this project happen, including Dave T. Nelson, one of the major motivating factors behind this project; Wolfgang Roesner, the technically cohesive force between this project and the rest of IBM; and Victor Rodriguez, the manager of this project who has been central to keeping it on its tracks. I next wish to thank the semi-formal team itself, which is the best development team one could ever hope to be a part of: Viresh Paruthi, Mark Williams, Bob Kanzelman, Hari Mony, Jessie Xu, and Yee Ja.
– I additionally wish to thank those who have assisted this project via support and development of adjacent components: Steven Bergman, Matyas Sustik, Ali El-Zein, Zoltan Hidvegi, Robert Shadowen, Geert Janssen, Paul Roessler, Gavin Meil, John J. Forrest, and Scott Mack.

• I last, but certainly not least, wish to acknowledge those who have influenced my life during this period.
– I wish to thank my motivating, supportive, and all-around optimal wife Shelly, without whom my sanity, and quite possibly my will to live, would long since have vanished during this grueling effort of graduate work in parallel to an extremely time-consuming full-time job.
– I wish to acknowledge our present stress-relief lops Mocha, Loppy, and Beary for adding some amusement to my life. I also wish to acknowledge the memory of Clover; her ever-cheerful nature served to pick up my spirits no matter how difficult and stressful times became during much of the period that I was working on this thesis.
– I wish to thank my parents for always encouraging me to achieve.
– I wish to thank the rest of my family for words of encouragement.
– I wish to thank my lifelong friends who provided me buffer overflow protection:
Hagop Jay Tumayan, Raymond Jones, Chris Bald, Mark Dungan, and Biz E. J.
Marquis.
– I lastly wish to thank my spatially-immediate friends whom I have had the opportunity to meet only more recently, who also provided relief from terminal steam build-up: Kenneth Klapproth, David Mui, Praveen and Sona Reddy, Jerome Delune, John Spencer, Susann Keohane, David Fink, Andy Murati, James Marrone, Michael Barenys, Steve Roberts, and Jun Sawada.
JASON RAYMOND BAUMGARTNER
The University of Texas at Austin
December 2002
Automatic Structural Abstraction Techniques for
Enhanced Verification
Publication No.
Jason Raymond Baumgartner, Ph.D.
The University of Texas at Austin, 2002
Supervisor: Jacob Abraham
Computers have become central components of nearly every facet of modern life. Advances in hardware development have resulted in computers more powerful than the largest mainframe of the last decade becoming available and affordable for general use. This in turn has enabled problems which were historically intractable to become solvable with present technologies. This trend has been noted for four decades.
Functional verification is the process of validating that a design conforms to its specification. Exhaustive verification generally requires exponential resources with respect to design size, hence there is a fine line between "solvable" and "intractable"; this cut-off point is unfortunately often far smaller than that which is practically necessary. Due to ongoing increases in hardware design size, direct application of exhaustive techniques to verify these designs requires exponentially-growing verification resources which outpace available boosts in computing power. Therefore, on the surface, Moore's law works against the hardware verification community.
This thesis presents an approach to battling verification complexity via automatic abstraction techniques which transform the structure of a design. These techniques require only polynomial resources with respect to design size, and may yield exponential speedups to the verification process. These abstractions are developed as components of a modular transformation-based verification framework, enabling optimal synergy between the various techniques.

Our specific contributions include: 1) a compositional and structural diameter overapproximation technique, enabling the use of abstractions to tighten the produced bounds; 2) an on-the-fly retiming technique for redundancy removal; 3) the concept of fanin register sharing to enhance min-area retiming; 4) a generalized retiming approach which eliminates reset state and input-output equivalence constraints, and supports negative registers; 5) structural cut-based abstraction; 6) a structural target enlargement approach; 7) the technique of c-slow abstraction; and 8) the technique of phase abstraction. Numerous experiments demonstrate the utility and synergy of these techniques in simplifying difficult problems. We therefore feel that these techniques comprise a significant step towards a scalable, automated verification system, helping to realize the prediction made by E. Allen Emerson that "Someday, Moore's Law will work for us [the verification community], rather than against us."
Contents
Acknowledgments v
Abstract ix
List of Tables xiv
List of Figures xv
Chapter 1 Background and Scope 1
Chapter 2 Previous Work 13
Chapter 3 Netlists: Syntax and Semantics 18
3.1 Verification Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . 26
3.2 Figure Symbols . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
Chapter 4 Diameter Overapproximation Techniques 30
4.1 Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
4.2 Experimental Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
Chapter 5 Redundancy Removal 44
5.1 Redundancy Removal Algorithms . . . . . . . . . . . . . . . . . . . . . 46
5.2 Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
5.3 Experimental Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
Chapter 6 Generalized Retiming 58
6.1 Retiming Formulation . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
6.1.1 Fanout REGISTER Sharing . . . . . . . . . . . . . . . . . . . 60
6.1.2 Fanin REGISTER Sharing . . . . . . . . . . . . . . . . . . . . 61
6.1.3 Relaxing Input-Output Equivalence Constraints . . . . . . . . 64
6.1.4 Enabling NEGATIVE REGISTERs . . . . . . . . . . . . . . . 66
6.1.5 Normalized Retiming . . . . . . . . . . . . . . . . . . . . . . 70
6.2 Retiming for Enhanced Verification . . . . . . . . . . . . . . . . . . . . 71
6.3 Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
6.4 Experimental Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
6.4.1 Redundancy Removal Experiments . . . . . . . . . . . . . . . 79
6.4.2 Retiming Experiments . . . . . . . . . . . . . . . . . . . . . . 84
6.4.3 Diameter Overapproximation Experiments . . . . . . . . . . . 88
Chapter 7 Cut-Based Abstraction 94
7.1 Cut-Based Abstraction Algorithms . . . . . . . . . . . . . . . . . . . . 98
7.2 Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 106
7.3 Experimental Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107
Chapter 8 Structural Target Enlargement 113
8.1 Target Enlargement Algorithms . . . . . . . . . . . . . . . . . . . . . . 118
8.2 Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 121
8.3 Experimental Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . 123
Chapter 9 C-Slow Abstraction 127
9.1 C-Slow Abstraction Algorithms . . . . . . . . . . . . . . . . . . . . . . 142
9.2 Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 144
9.3 Experimental Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . 145
Chapter 10 Phase Abstraction 148
10.1 Phase Abstraction Algorithms . . . . . . . . . . . . . . . . . . . . . . . 164
10.2 Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 166
10.3 Experimental Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . 169
Chapter 11 Conclusions and Future Work 172
Appendix A Appendix 175
A.1 Modeling Interconnections as Nets . . . . . . . . . . . . . . . . . . . . 175
A.2 Alternate Gate Types . . . . . . . . . . . . . . . . . . . . . . . . . . . . 176
Bibliography 178
Vita 190
List of Tables
6.1 Retiming results for ISCAS89 benchmarks . . . . . . . . . . . . . . . . 81
6.2 Retiming results for IBM Gigahertz Processor (GP) netlists . . . . . . . 82
6.3 Generalized retiming results for ISCAS89 and GP netlists . . . . . . . . 86
6.4 Effect of retiming on reachability analysis . . . . . . . . . . . . . . . . . 87
6.5 Diameter experiments for ISCAS89 benchmarks . . . . . . . . . . . . . 91
6.6 Diameter experiments for GP netlists . . . . . . . . . . . . . . . . . . . 92
7.1 Cut results for ISCAS89 benchmarks . . . . . . . . . . . . . . . . . . . 110
7.2 Cut results for GP netlists . . . . . . . . . . . . . . . . . . . . . . . . . 111
8.1 Target enlargement results for ISCAS89 benchmarks . . . . . . . . . . . 124
8.2 Target enlargement results for GP netlists . . . . . . . . . . . . . . . . . 126
10.1 Phase abstraction results for GP netlists . . . . . . . . . . . . . . . . . . 170
List of Figures
1.1 Invariant checking methodology . . . . . . . . . . . . . . . . . . . . . . 5
1.2 Example flow of transformation-based verification system . . . . . . . . 11
3.1 Simulate algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
3.2 Figure symbols . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
4.1 Depiction of netlist partitioning for diameter overapproximation . . . . . 34
4.2 Structural diameter overapproximation algorithm . . . . . . . . . . . . . 37
4.3 Diameter overapproximation example . . . . . . . . . . . . . . . . . . . 40
5.1 Structural Merge algorithm . . . . . . . . . . . . . . . . . . . . . . . . . 45
5.2 Mapping an AND/INVERTER/REGISTER graph to a netlist . . . . . . . 48
5.3 Fanout REGISTER sharing example . . . . . . . . . . . . . . . . . . . . 49
5.4 AND/INVERTER/REGISTER-graph algorithm for AND gate creation . . 51
5.5 AND/INVERTER/REGISTER-graph algorithm for REGISTER creation . 52
5.6 On-the-fly retiming example . . . . . . . . . . . . . . . . . . . . . . . . 53
5.7 AND/INVERTER/REGISTER graph example . . . . . . . . . . . . . . . 54
6.1 Example of ILP modeling of fanout and fanin REGISTER sharing . . . . 61
6.2 Decomposition of AND vertex for optimal fanin REGISTER sharing . . . 62
6.3 Retiming graph example . . . . . . . . . . . . . . . . . . . . . . . . . . 63
6.4 Alternate retiming graph example . . . . . . . . . . . . . . . . . . . . . 64
6.5 Alternate retiming graph example with relaxations . . . . . . . . . . . . 65
6.6 Retimed netlist example with a NEGATIVE REGISTER . . . . . . . . . 68
6.7 Example of incorrect ILP modeling of sharing with relaxed non-negativity constraints . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70
6.8 Temporal decomposition of a retimed netlist . . . . . . . . . . . . . . . . 72
6.9 Example netlist depicting how on-the-fly retiming may hurt REGISTER count 84
6.10 BDD profile for reachability of S3330 with retiming and redundancy removal 89
7.1 Cut abstraction trace lifting algorithm . . . . . . . . . . . . . . . . . . . 97
7.2 Top-level Cut Abstract algorithm . . . . . . . . . . . . . . . . . . . . . . 100
7.3 Analyze Cut algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . 102
7.4 Synthesize Set algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . 103
7.5 BDD synthesis example . . . . . . . . . . . . . . . . . . . . . . . . . . . 104
8.1 Enlarge Target algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . 115
8.2 Top-level target enlargement flow . . . . . . . . . . . . . . . . . . . . . 119
8.3 Target enlargement trace lifting algorithm . . . . . . . . . . . . . . . . . 119
9.1 Example three-slow netlist . . . . . . . . . . . . . . . . . . . . . . . . . 129
9.2 Recurrence structure of abstracted three-slow netlist . . . . . . . . . . . 130
9.3 Initialization structure of abstracted three-slow netlist . . . . . . . . . . . 130
9.4 Algorithm for preprocessing generalized c-slow netlists . . . . . . . . . . 137
9.5 C-Slow trace lifting algorithm . . . . . . . . . . . . . . . . . . . . . . . 140
9.6 C Slow Abstract algorithm . . . . . . . . . . . . . . . . . . . . . . . . . 142
9.7 Algorithms for coloring generalized c-slow netlists . . . . . . . . . . . . 143
10.1 Semantics-preserving translation of LATCHes to REGISTERs . . . . . . 150
10.2 Example netlist with two minimal dependent layers . . . . . . . . . . . . 153
10.3 Example two-phase netlist . . . . . . . . . . . . . . . . . . . . . . . . . 154
10.4 Phase-abstracted netlist . . . . . . . . . . . . . . . . . . . . . . . . . . . 156
10.5 Alternate phase-abstracted netlist . . . . . . . . . . . . . . . . . . . . . 157
10.6 Phase Abstract algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . 161
10.7 Phase abstraction trace lifting algorithm . . . . . . . . . . . . . . . . . . 162
10.8 MDL partitioning algorithm . . . . . . . . . . . . . . . . . . . . . . . . 165
10.9 Example three-phase MDL . . . . . . . . . . . . . . . . . . . . . . . . . 166
A.1 Remodeling GATED-CLOCK REGISTERs . . . . . . . . . . . . . . . . 177
Chapter 1
Background and Scope
Computers have become pervasive elements in our daily lives. Advances in hardware development have resulted in computers more powerful than the largest mainframe of the last decade becoming available and affordable as deskside units for general use. Computers have found their way into nearly every electronic device manufactured today – becoming central components of communications, entertainment, medical, and transportation devices, to name a few. Several-generation-old computer technology may be manufactured for mere pennies. The trend of bigger (in capacity), smaller (in physical size), faster, and cheaper is characteristic of microprocessor, memory, and storage device technologies [1, 2, 3].
The digital design process consists of several distinct phases. First is the architecture stage, wherein the desired functionality and performance are used to dictate the type of logic components that will be used to realize the design. These components are next implemented, often using a register-transfer level hardware description language (HDL). Finally, through logic synthesis, the HDL models are refined down to gate-level models, which in turn are refined down to transistor-level models suitable ultimately as a template for silicon [4]. Verification is the process of assessing that the design conforms to its specification, and is performed at various stages of the design flow.
Among the technical advances required to enable the design of high performance
computers are powerful computers themselves – today’s high-end computers are critical
tools to the crafting of tomorrow’s even higher-end computers. There are two primary
tasks related to hardware design for which computers are irreplaceable: logic synthesis
and verification. Verification is the focus of this thesis. We further will focus only upon timing-independent functional verification of gate-level hardware models; combinational propagation delays are assumed to be zero, and the domain of verification is to prove that certain properties hold in all reachable states of the design. Additionally, we will focus only upon automatic verification paradigms which carry out a proof without a need for manual intervention, hence we will not discuss theorem proving techniques [5, 6].
Computing machines are generally reactive to input stimulus, and sequential, hence may "remember" past input stimuli and computations via state-holding elements such as registers. The verification of such designs is extremely complex; in addition to needing to verify these designs against every possible input stimulus from the environment, their sequentiality requires verification against all possible sequences of stimuli. Even a simple invariant check – that some predicate holds in all reachable states of the design – is PSPACE-complete [7], hence generally requires exponential verification resources with respect to design size. This complexity limits the applicability of exhaustive verification techniques to designs with several hundred state-holding elements or less, while modern processors may have millions of state-holding elements, even ignoring main memory and caches. Additionally, the individual "building block" components of such a design – such as arithmetic logic, prefetch logic, address translation logic, and cache controllers – are likely to contain many thousands of state elements.
Traditionally, simulation-based approaches are used for verification. In simulation, testcases (sequences of input stimuli) are developed manually or randomly, and the behavior of the design as subjected to these testcases is explored. The benefit of test-based approaches is that they are scalable, and may be applied to designs of almost any size. Their drawback is that they are incomplete; the fraction of design behavior – e.g., coverage of reachable states – that can possibly be explored through simulation given fixed resources generally decreases exponentially with design size. Many techniques have been developed to increase the coverage attainable through simulation, such as high-level model-based test generation and the use of coverage analysis to direct test generation [8]. If done cleverly, simulation has the ability to flush out many design flaws, and will likely always have an important role in design validation due to its scalability. However, as design sizes increase, test-based approaches must be deployed in an ad-hoc rather than systematic and complete manner, hence they cannot prove the absence of errors. The incomplete nature of simulation implies that certain design flaws will go unexposed. Even one missed design flaw may cost a company hundreds of millions of dollars to rectify, cause project cancellation due to lost time-to-market, and even risk the loss of human life as computers are finding their way into safety-critical applications such as transportation and medicine.
Due to the limitations of test-based approaches, there has been an increased effort throughout the industry to exploit formal verification techniques. Formal verification (FV) addresses the coverage problem; it exhaustively considers all possible design behavior, hence has the ability to prove the absence as well as the presence of design errors. As mentioned above, exhaustive verification generally consumes exponential resources with respect to design size, hence is of limited applicability to larger designs, which arise frequently in industrial applications. Unlike test-based approaches which circumvent this complexity through incompleteness, the exhaustiveness of FV directly entails this complexity. Nevertheless, the only way to guarantee the absence of bugs is through formal techniques, and industrial designs are the ultimate target for FV application – hence there exists a need for a robust mechanism to allow FV to scale up to large designs. Modern high-performance designs pose many challenges to verification due to the very characteristics which are intrinsic to achieving their high performance: structural redundancy (or near-redundancy) such as duplication of logic and storage to minimize propagation delays to distinct fanout points, and a high degree of pipelining [9]. The higher degree of pipelining often implies higher complexity due to more "timing window" conditions; e.g., cache-type logic must correlate processor accesses against external snoop requests not only concurrently but across several clock periods, since the processing of both is spread across several clock periods. Such timing windows often render test-based techniques grossly insufficient at exposing design flaws. The design-for-high-performance-silicon paradigm further poses significant barriers to the application of formal techniques: due to the increasing number of state-holding elements for a design with specific functionality, FV faces increasing challenges in keeping up with the pace of high-performance designs.
The focus of this thesis is a novel approach to battling the FV complexity barrier through the use of automatic structural abstraction techniques. These abstractions reduce the complexity of verification of a design through a transformation of its structure. Our abstraction techniques are largely motivated by industrial design characteristics. By restricting our techniques to using fast graph-based analysis, we constrain their computational resources to polynomial with respect to design size, and enable exponential speedups in the verification process. We develop our abstractions as components of a general and modular transformation-based verification framework as proposed in [10]. This framework enables optimal synergy between the various algorithms for problem simplification and decomposition, and run-time configurability of algorithm flows to most efficiently discharge the verification problem at hand.
We assume that our netlist comprises a composition of the design, its driver (also known as the environment, encoding input assumptions [11]), and correctness monitors referred to as property automata, as depicted in Figure 1.1. In composition, certain vertices in the design under test (DUT) – such as primary inputs, and possibly also internal vertices – will be merged onto vertices in the driver. This merging algorithm is introduced in Figure 5.1. The driver may also be dependent upon the DUT, hence some vertices of the driver may be merged onto vertices of the DUT; care must be taken not to introduce combinational cycles in this composition. A similar process is used to compose the property automata to the DUT/driver composition. Our verification problem is thus an invariant check – an attempt to find a trace from an initial state to one which assigns a binary one to a target vertex in the property automaton, or to prove that no such trace exists. This would be akin to checking the CTL [12] property AG(¬target) – equivalently, that the invariant ¬target holds in all reachable states of this composition.1 Such a paradigm is sufficient for the verification of safety properties [13, 14, 15], which from our experience is almost always sufficient for industrial verification problems. However, liveness properties cannot be expressed in this system. Practically, one may often decompose hardware liveness properties into a set of conservative safety properties (as a simple example, every request will be granted within n steps rather than "eventually") – though clearly there are limitations to such an approach.
Figure 1.1: Invariant checking methodology
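As a minimal sketch of this invariant-checking paradigm (our own illustrative Python over a toy explicit-state model, not the structural netlist algorithms developed in later chapters; all names and the two-bit-counter example are assumptions for illustration), the check amounts to a reachability search for a state asserting the target:

```python
from collections import deque

def check_invariant(initial_states, next_states, is_target):
    """Breadth-first search over reachable states.

    Returns a counterexample trace (list of states from an initial state
    to a target state) if the target is reachable, or None if the
    invariant AG(not target) holds."""
    frontier = deque((s, (s,)) for s in initial_states)
    visited = set(initial_states)
    while frontier:
        state, trace = frontier.popleft()
        if is_target(state):
            return list(trace)            # trace witnessing the target
        for succ in next_states(state):   # one successor per input valuation
            if succ not in visited:
                visited.add(succ)
                frontier.append((succ, trace + (succ,)))
    return None                           # search exhausted: invariant proven

# Toy design: a two-bit counter that wraps modulo 4.
def next_states(s):
    return {(s + 1) % 4}

# Target "counter reads 3" is reachable; "counter reads 5" is not.
trace = check_invariant({0}, next_states, lambda s: s == 3)
assert trace == [0, 1, 2, 3]
```

The state-explosion problem described in the surrounding text is visible even in this sketch: `visited` grows with the number of reachable states, which is exponential in register count.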
There are several reasons that a more general property checking paradigm is not
1 We use the common "implicit type casting" of binary 1 ≡ boolean true and binary 0 ≡ boolean false throughout this thesis.
discussed in detail herein.

• The goal of this thesis is to describe structural abstraction techniques that are usable with arbitrary verification algorithms – e.g., binary and symbolic simulation, hardware emulation, explicit and symbolic reachability analysis, satisfiability-based bounded model checking, and semi-formal hybrid approaches. It is our experience that each algorithm has its own strengths and weaknesses, and works well on certain designs though not on others. Many industrial problems are too large for direct application of exact verification techniques – abstraction techniques may significantly help, but only up to a point. Therefore, approximate techniques may be the only direct approach of obtaining any verification coverage for a very large problem, at least before expert manual abstraction may be performed (if the latter is even feasible). Many underapproximate algorithms do not handle general property checking.

• The invariant checking paradigm is sufficient for nearly all industrial verification problems due to the ability to convert safety properties to automata, hence this restriction is not a significant practical barrier.

• The lengthy discussion of suitable temporal logic fragments, liveness constraints, and modifications of these to match the structural transformations is unnecessary in our paradigm, and would detract from its primary focus. Furthermore, there may exist theoretical limitations on usability of certain abstractions within a more general property checking environment, hence focusing on such may preclude a fragment of research which is of significant utility in an invariant-checking paradigm. The bulk of the techniques described herein are applicable for use in a more general verification framework (e.g., temporal logic model checking), though there may well be challenges, limitations, and in cases impossibilities. We have in cases explored such extensions [16, 17], though have not sufficiently generalized this research for the above reasons.
The main bottleneck of invariant checking is the potentially exorbitant computational resources necessary for state traversal. In general, there is no clear dependency between the structure or size of a netlist and verification resource requirements. For example, some designs with less than 100 registers are too complex for reachability analysis; others with more than 500 may be simple for reachability analysis. In some cases, reducing register count may increase correlation between them, hence hurt BDD-based analysis. In other cases, one reduction may hurt another – e.g., translating a shift register to a log-2 counter may hurt retiming, since retiming may be able to eliminate all registers of a pipeline but cannot eliminate any registers from a directed cycle. However, our experiments demonstrate that a reduction in netlist size by one technique often enhances the application of other reduction techniques. Furthermore, a smaller netlist graph tends to require less memory and run-time resources for performing verification – often, exponentially lower. In particular, for BDD-based techniques [18, 19] fewer registers result in fewer BDD variables which typically decreases the size of the BDDs representing the set of states and transitions among them. Similarly, in satisfiability-based (SAT-based) state enumeration [20], the complexity of the state recording device proportionally depends upon the number of registers. A second motivation comes from the observation that a reduced number of registers often decreases the functional correlation between them, although as mentioned above the opposite may occasionally occur. Intuitively, register reduction often produces a less scattered state encoding which results in a more compact set of BDDs or cube structure for BDD- or SAT-based reachability analysis, respectively. Thus, the primary objective of our model reductions is to reduce register count.
Our secondary objective is to minimize the number of primary inputs, which we
denote as FREE vertices. At a coarse level, the number of FREE vertices, like the number
of registers, has a bearing upon an upper bound on the size of the transition relation –
and more generally upon the number of distinct functions over these elements, which
provides an upper bound on the maximum number of irredundant vertices in a netlist.
Furthermore,
the more FREE vertices a netlist has, the less likely simulation is to transition the
netlist into a specific state which may occur for only one of exponentially many possible
valuations to
the FREE vertices.
Our third and final objective is to minimize the number of combinational vertices
(e.g., AND gates) in a netlist. The size of a netlist entails a linear-time increase in
the run-time of binary simulation and graph traversal algorithms, and often a superlinear
increase in the resources necessary for SAT-based or BDD-based analysis.
Generally, we wish to minimize some function of these three entities – i.e., it may
not be beneficial to exponentially increase combinational vertex count to yield a small
decrease in register count, but the above is a close approximation to what we have found
to be an optimal objective function. Several of the transformations that we will discuss
have been borrowed from synthesis optimization techniques – for example, retiming and
redundancy removal. The use of structural transformations for enhanced functional
verification is a fairly new topic, whereas such transformations have been used in
synthesis and combinational equivalence checking for many years. Note that the objective
of transformations for optimal synthesis may often differ from that for verification. For
synthesis, one must balance the effect of a transformation upon combinational delay,
circuit size and topology, and power consumption. For verification, our objective is in a
sense more direct – we only care to decrease verification complexity, so minimization of
netlist graph size as per the above objectives is our only concern. However, as mentioned
above, a reduction in netlist size may hurt a verification flow in uncommon cases.
The abstraction techniques described in this thesis are discussed as modules of a
re-entrant engine-based verification toolset. This enables synergistic interactions of
and iteration between engines. Each technique is developed according to the following
criteria.

• It must be "sound and complete" – i.e., it will not cause the user to see a
semantically incorrect trace or obtain an incorrect pass/fail answer.

• It must be capable of efficiently "lifting a trace" obtained from the abstracted
netlist to one consistent with the original, unabstracted netlist.

• It must be capable of receiving a netlist, then handing off a (presumably simpler)
netlist; no extra information may be required to perform the abstraction other than a
structural transformation. However, the encapsulating engine may decompose a problem,
hence split off multiple subproblems.

• It may not require assistance of the user – e.g., manually-guided abstractions will
not be considered. Also, it must operate without special annotation to or syntax of the
implementation; e.g., without a need for word-level predicates in the source HDL.
While user-guided abstraction and the exploitation of more abstract design models may
be arbitrarily useful in simplifying a verification problem, we feel that many of the
benefits attainable in a self-contained abstraction framework are infeasible to
reproduce manually due to their applicability to complex bit-level control logic.
Furthermore, purely automatic abstractions are more extensible in an industrial setting
where much of the design and verification staff is not versed in formal techniques.
Overall, we view manual and automatic abstractions as complementary approaches, and
further investigation of the synergy of these approaches is a promising area of future
research.
By developing our abstraction techniques to adhere to these criteria, we enable
their application within a transformation-based verification framework as proposed in
[10], wherein one may iteratively simplify and decompose a verification problem using a
series of transformations, until the problem is "simple enough" to be solved by a
terminal verification engine. Therefore, our abstractions must operate solely by
structural transformation, and must be optimized for arbitrary subsequent verification
flows. Such a verification system holds tremendous promise for industrial designs, which
are often large and incorporate many diverse types of logic – e.g., control, arbitration,
and table-based storage within a single component. Note that such a modular, engine-based
approach was also key to making
automatic logic synthesis [21, 22] and combinational equivalence checking [23] practical.
Many of the techniques discussed in this thesis have the ability to work synergistically –
i.e., application of one technique may enhance the application of another. Additionally,
these techniques are largely independent of the actual terminal verification algorithms
used, making the contribution of this thesis orthogonal but complementary to the body of
research on
pushing the capacity of verification algorithms.
A particular instance of this system is depicted in Figure 1.2. Note that the netlist
representation at each engine may differ, since each engine may transform the netlist. For
example, after redundancy removal, which merges semantically equivalent vertices, the
resulting netlist will generally contain fewer vertices in the cone-of-influence of the
targets than the netlist prior to redundancy removal. After retiming, some vertices in the
netlist are temporally skewed with respect to the netlist received. It is possible that a
transformation alone may trivialize a target – for example, by merging a target vertex
onto a constant vertex: ZERO or ONE. However, such cases are fairly infrequent, thus our
abstractions are primarily effective in enhancing subsequent verification flows. We
require that any trace returned by an engine be consistent with the netlist received by
that engine. Therefore, an engine which transforms a netlist must undo the effects of its
transformation when receiving a trace from a child engine (which will be in terms of the
transformed netlist) before passing a corresponding trace up to its parent engine, which
will expect the trace to be in terms of the untransformed netlist. Note also that an
engine may instantiate multiple child engines, as depicted in Figure 1.2. This may be
useful to re-use work performed by an ancestor engine flow for a multi-faceted
verification strategy, or to decompose a problem into sub-problems as is done by some of
the algorithms to be discussed.
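The engine contract described above can be illustrated with a small sketch. The class and
method names below are hypothetical (the thesis does not prescribe an API); the sketch
merely shows the receive-transform-delegate-lift discipline, with a toy constant-folding
reduction and a stub terminal engine.

```python
# Hypothetical sketch of the engine contract described above; class and
# method names are illustrative, not prescribed by the thesis.

class Engine:
    """Receives a netlist, optionally transforms it, delegates to a child
    engine, and lifts any returned trace so that it is consistent with
    the netlist this engine originally received."""

    def __init__(self, child=None):
        self.child = child

    def transform(self, netlist):
        return netlist                    # identity by default

    def lift_trace(self, trace):
        return trace                      # undo transform's effect on a trace

    def verify(self, netlist):
        reduced = self.transform(netlist)
        result, trace = self.child.verify(reduced)
        if trace is not None:
            trace = self.lift_trace(trace)    # re-express in our own terms
        return result, trace


class ConstantFoldingEngine(Engine):
    """Toy reduction: note which gates are constant ZERO, so that a
    child's trace can be re-expanded over them."""

    def transform(self, netlist):
        self.merged = {g for g, t in netlist["types"].items() if t == "ZERO"}
        return netlist

    def lift_trace(self, trace):
        lifted = dict(trace)
        for g in self.merged:
            lifted.setdefault((g, 0), 0)  # re-insert merged constants
        return lifted


class TerminalEngine(Engine):
    """Stub decision procedure: reports a hit at time 0 if the target is
    a constant ONE in this toy encoding."""

    def verify(self, netlist):
        t = netlist["target"]
        if netlist["types"][t] == "ONE":
            return "hit", {(t, 0): 1}
        return "unreachable", None
```

A chain such as `ConstantFoldingEngine(child=TerminalEngine())` then behaves as in
Figure 1.2: the reduced netlist flows down the chain, and the trace is lifted on the way
back up.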
This thesis is organized as follows. We first briefly discuss previous work in the
area of abstraction for enhanced verification in Chapter 2, though research most related to
our techniques will be discussed in the corresponding chapters. We next define the syntax
and semantics of our netlist-based representation of the verification problem in Chapter 3.
[Figure 1.2 depicts a chain of engines – HDL Compilation from the User, a Random
Simulation Engine, a Retiming Engine, a Redundancy Removal Engine, a Target Enlargement
Engine, and a Symbolic Reachability Engine – in which each engine passes a transformed
netlist (N, N′, N″, N‴) down to its child and lifts the returned trace (p‴, p″, p′, p)
back up to its parent.]
Figure 1.2: Example flow of transformation-based verification system
We introduce several common verification algorithms in Section 3.1; these are useful tools
in discharging invariants, and are used as components of some of our abstraction
algorithms. We explore the notion of netlist diameter in Chapter 4, and introduce a
compositional structural algorithm for diameter overapproximation, in addition to some
theory that will be used throughout the thesis to enable diameter bounds obtained upon an
abstracted netlist to backward-imply diameter bounds on the unabstracted netlist. This
chapter extends results of collaborative work with Andreas Kuehlmann and Jacob Abraham
reported in [24]. We then begin detailed discussion of several abstraction techniques. The
first topic is redundancy removal, discussed in Chapter 5. Our contribution to this area
is the topic of on-the-fly retiming, which incorporates results of collaborative work with
Andreas Kuehlmann reported in [25]. We next describe the use of generalized min-area
retiming for enhanced verification in Chapter 6, extending results of collaborative work
with Andreas Kuehlmann reported in [10, 25]. We discuss the concept of cut-based
abstraction in Chapter 7. Chapter 8 introduces the topic of structural target enlargement,
extending results of collaborative research with Andreas Kuehlmann and Jacob Abraham
reported in [24]. A discussion of generalized c-slow abstraction follows in Chapter 9,
which generalizes upon results reported in [17] obtained in collaboration with Anson
Tripp, Adnan Aziz, Vigyan Singhal, and Flemming Andersen. The final topic is phase
abstraction, presented in Chapter 10, which extends results of collaborative work with
Tamir Heyman, Vigyan Singhal, and Adnan Aziz reported in [16]. In Chapter 11 we conclude
the thesis and discuss future research directions. In Appendix A we discuss ways to model
more complex gate types and interconnections in our framework. We have organized this
thesis so that a reader interested only in a particular topic may read only the
corresponding chapter, possibly using Chapter 3 as a reference.
Chapter 2
Previous Work
In this chapter, we briefly discuss research related to the use of abstraction for enhanced
verification. Because of the volume of prior work, we limit our focus to abstractions of use
in enhancing automatic invariant checking of netlists. We furthermore focus on abstractions
applicable to general verification flows, hence do not consider abstractions which must be
embedded inside specific verification algorithms. We defer discussion of prior research
most closely related to the topics explored in this thesis until later chapters, so that we may
provide the proper background to discuss them more meaningfully.
Several categories of abstractions seek to compress the state space representation
of a design, though do not focus on ways in which to efficiently represent such reductions
in terms of netlist structure. Therefore, they are not directly useful as reduction
engines in a transformation-based verification setting – though they may be useful as
features of a terminal verification engine. Because of their analysis of state space
representations, they risk outweighing the cost of an invariant check in themselves, and
are often more focused on enhancing temporal logic model checking approaches, which are
more computationally expensive than invariant checking. Such approaches are fundamentally
different from those taken in this thesis – we use fast structural analysis of the netlist
to automatically guide our abstractions, occasionally using semantic analysis (such as
BDDs) only in a resource-bounded manner. Examples of such abstractions include the
following.

• Bisimulation minimization may be used to reduce the state space of a design to
simplify temporal logic verification [26, 27]. As demonstrated in [28], bisimilarity
preserves property checking for any CTL* formula, hence this reduction is sound and
complete, though often weaker than necessary for invariant checking. Such techniques
require analysis of the state space of the design, which is too computationally
expensive to consistently offer benefits to invariant checking, as noted in [27].

• Several similar, more aggressive techniques for reduction of the state space, while
guaranteeing preservation of only a necessary set of formulas (rather than all
formulas), have been proposed, for example in [29]. Experimental results on the utility
vs. cost of such approaches have yet to establish them as useful for invariant checking.

• The topic of abstract interpretation [30, 31] has been proposed to allow reasoning
about abstractions of a design. This work is an excellent framework from which to
theoretically understand many forms of abstraction. However, these approaches do not
directly address how to automatically select abstractions; they instead provide an
infrastructure for reasoning about a selected abstraction of a system.
Numerous practical and powerful abstractions for enhanced verification are proposed
in [32, 33], using a theoretical framework similar to abstract interpretation. These
techniques are generally prone to yielding inexact answers unless fairly stringent
sufficiency conditions are met. These approaches also fall into the category of
manually-obtained abstractions for extracting an abstract state space representation for
temporal logic model checking, rather than for generating a more compact netlist
representation usable in a transformation-based verification setting. Furthermore, the
proposed abstractions are geared more toward simplifying data representations of
word-level models than toward simplifying complex bit-level control logic.
The automated technique of localization introduced in [34] proposes isolating a
sub-netlist local to the property automata by using a free fence – an overapproximate cut
of the fanin cone. This approach is therefore generally prone to false failures, though a
special case of this technique is sound and complete: when the entire cone of influence is
retained and the remainder of the netlist is discarded. Localization, and techniques to
exploit spurious counterexamples obtained through localization to refine its abstraction,
have been the topic of numerous research approaches [34, 35]. This technique is
complementary to our contributions; the sound and complete cone of influence reduction has
become a cornerstone of most practical verification tools, and our abstractions may be
used to simplify the verification of localized cones.
Numerous approaches have been developed for reducing the complexity of word-level
designs. For example, the data-path abstractions proposed in [36, 37, 38] are prone to
false negatives due to exploitation of uninterpreted functions. Some related approaches
require use of custom verification algorithms [36, 38], hence are not applicable in a
general transformation-based verification toolset. Most of these techniques are not
applicable if data fans out to control (except possibly in a constrained fashion)
[36, 37, 38, 39], and require word-level predicates in the design to exploit, which is
also characteristic of automatic predicate abstraction approaches [40]. Many of these
techniques require manual guidance to select the abstraction, and none are discussed as
components of a transformation-based verification system.
Symbolic verification approaches [19, 41] have provided a tremendous increase in
verification capacity, scaling up to designs with hundreds of REGISTERs, hence up to
2^(100k) states for some small k. Such approaches are complementary to our contributions,
and useful as verification engines in our system. Numerous techniques have been proposed
to enhance symbolic verification, such as exploitation of known invariants of the system
[42], partitioning of the transition relation to decompose image computation [43, 44], and
many more – all of which are complementary techniques useful in a transformation-based
verification setting.
Compositional verification approaches [45, 33] have been proposed to isolate
components of large systems for standalone verification using abstract environment models.
The environments encode a set of input assumptions, and the verification task consists of
demonstrating that the composition of the assumptions and the component under test
satisfies certain properties. Such approaches have become invaluable to industrial
verification, and are complementary to our work – indeed, as depicted in Figure 1.1, our
system is designed for use with such a paradigm.
Hierarchical verification approaches seek to reuse the results of verification at a
lower level to simplify higher-level proofs [46]. For example, if one validates that
components of a netlist satisfy a set of properties, one may attempt to compose automata
representations of those properties to validate higher-level properties directly upon the
automata composition, without needing to reason about the underlying implementation. Such
approaches are also complementary to our contributions; our techniques may be used to
enhance verification at the lower level (of the implementation), as well as at the higher
level (of the composition of property automata).
Symmetry reduction techniques [47, 48, 49] seek to identify symmetries of the
underlying design – e.g., parallel instantiations of identical components – for enhancing
the verification process. While many of these techniques are proposed as embedded
components for simplifying specific verification algorithms, and possibly require manual
assistance to identify the symmetry, incorporation of symmetry reductions into our
framework is a promising complementary area of future research.
Structural simplification techniques [50, 51] have become core algorithms of
combinational equivalence checking. In equivalence checking, one has two netlist gates
that one wishes to prove semantically equivalent, often from two levels of refinement of
the design process (e.g., gate-level vs. transistor-level representations). An
exclusive-or is built over these two gates, and the verification goal is to demonstrate
that this exclusive-or is semantically equivalent to zero. Simplification techniques are
used to attempt to identify redundant gates in the fanin cone of the exclusive-or; when
two redundant gates are found, one is merged onto the other. This merging thereby reduces
the size of the problem, and often trivializes it (e.g., by merging the two gates being
compared onto each other). Invariant checking may be viewed as a sequential generalization
of the equivalence checking paradigm, hence structural simplification techniques are
equally applicable to invariant checking. Such techniques are discussed further in
Chapter 5.
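The miter construction described above can be sketched as follows. This is an illustrative
brute-force check, not the structural algorithms of [50, 51]: XOR is synthesized from the
AND/INVERTER primitives of our netlist model, and equivalence to zero is established by
exhaustive input enumeration, which is feasible only for tiny fanin cones.

```python
from itertools import product

def xor_gate(a, b):
    # XOR synthesized from the AND/INVERTER primitives of our model:
    # a XOR b = NOT( NOT(a AND NOT b) AND NOT(NOT a AND b) )
    return int(not (not (a and not b) and not ((not a) and b)))

def miter_unsat(f, g, n_inputs):
    """True iff the exclusive-or miter over gates f and g is semantically
    equivalent to zero, i.e., f and g agree on every input valuation.
    Exhaustive enumeration: a sketch usable only on tiny fanin cones."""
    return all(xor_gate(f(*vals), g(*vals)) == 0
               for vals in product((0, 1), repeat=n_inputs))
```

For example, `a AND b` and its De Morgan form `NOT(NOT a OR NOT b)` yield an
unsatisfiable miter, trivializing the equivalence check.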
Chapter 3
Netlists: Syntax and Semantics
In this chapter we introduce the netlist, and provide structural and semantic definitions that
will be used throughout this thesis to reason about the netlist. A reader well-versed in
hardware verification may wish to skip to the next chapter, referring to this chapter only as
a reference.
Our netlist definition is based upon a directed graph model.
Definition 3.1. A directed graph G = ⟨V, E⟩ consists of a finite set of vertices V, and a
set of directed edges between vertices E ⊆ V × V. For edge (u, v), we refer to u as the
source vertex and v as the sink vertex.
Definition 3.2. We define inlist(U) = {v : ∃u ∈ U. (v, u) ∈ E} as the set of vertices
sourcing input edges to vertex set U. We define the indegree of a vertex set U by
indegree(U) = |inlist(U)|.

Definition 3.3. We define outlist(U) = {v : ∃u ∈ U. (u, v) ∈ E} as the set of vertices
sinking output edges from vertex set U. We define the outdegree of vertex set U by
outdegree(U) = |outlist(U)|.

Definition 3.4. We define fanin cone(U) = U ∪ fanin cone(inlist(U)) for vertex set U.
Due to the monotonicity of evaluation of this definition, and the finiteness of G, this
set is well-formed.
Definition 3.5. We define fanout cone(U) = outlist(U) ∪ fanout cone(outlist(U)) for
vertex set U. This set is well-formed as per the analysis of Definition 3.4.
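Definitions 3.2 through 3.5 are directly executable as fixed-point computations. The
following sketch uses an assumed set-of-edge-pairs encoding (illustrative, not the
thesis's data model); it terminates by the same monotonicity and finiteness argument
given in Definition 3.4.

```python
def inlist(U, edges):
    # vertices sourcing input edges to vertex set U (Definition 3.2)
    return {v for (v, u) in edges if u in U}

def outlist(U, edges):
    # vertices sinking output edges from vertex set U (Definition 3.3)
    return {v for (u, v) in edges if u in U}

def fanin_cone(U, edges):
    # least fixed point of cone = U ∪ inlist(cone) (Definition 3.4);
    # monotone and bounded by the finite vertex set, hence terminates
    cone = set(U)
    while True:
        nxt = cone | inlist(cone, edges)
        if nxt == cone:
            return cone
        cone = nxt

def fanout_cone(U, edges):
    # Definition 3.5: outlist(U) ∪ fanout_cone(outlist(U))
    cone = set()
    frontier = outlist(U, edges)
    while frontier - cone:
        cone |= frontier
        frontier = outlist(frontier, edges)
    return cone
```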
Definition 3.6. A strongly-connected component (SCC) is a set of vertices U ⊆ V such
that ∀u, v ∈ U. (u = v) ∨ (u ∈ fanin cone(v) ∧ u ∈ fanout cone(v)). The maximal SCC
associated with vertex set U, comprising the union of all SCCs containing u ∈ U, is
denoted by SCC(U).

Definition 3.7. A cut ⟨C, C̄⟩ is a partition of a set of vertices V into two sets: C and
C̄ = V ∖ C. A cut defines a set of edges E_C with sources in C and sinks in C̄, i.e.,
E_C = {(u, v) : (u, v) ∈ E ∧ u ∈ C ∧ v ∈ C̄}. The set of vertices sourcing E_C is denoted
as V_C = {u : ∃v. (u, v) ∈ E_C}.

Definition 3.8. Given the set of all possible cuts of a graph, a min-cut is one such
that E_C is minimal in cardinality.

Definition 3.9. Given the set of all possible cuts of a graph, a vertex min-cut is one
whose set of sourcing vertices V_C is minimal in cardinality.

We often wish to specify a set of sources C_s ⊆ V and sinks C_t ⊆ V ∖ C_s to seed a
min-cut solution; i.e., C_s ⊆ C and C_t ⊆ C̄ for any cut returned by a min-cut algorithm.
The resulting seeded formulation is referred to as an s-t min-cut problem.
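An s-t min-cut as formulated above may be computed by the standard max-flow/min-cut
duality. The following is a textbook Edmonds-Karp sketch on unit-capacity edges
(illustrative only, not the algorithm used in this thesis): once the flow saturates, C is
the set of vertices still reachable from s in the residual graph, and E_C is the crossing
edge set of Definition 3.7.

```python
from collections import deque, defaultdict

def st_min_cut(edges, s, t):
    """Edmonds-Karp max-flow on unit-capacity edges, then recovery of the
    cut <C, C-bar> of Definition 3.7 as the residual-reachable set.
    A textbook sketch, not the thesis's algorithm."""
    cap = defaultdict(int)
    adj = defaultdict(set)
    for (u, v) in edges:
        cap[(u, v)] += 1
        adj[u].add(v)
        adj[v].add(u)                  # residual (reverse) direction

    def augmenting_path():
        # BFS for a shortest path with remaining capacity
        parent = {s: None}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            if u == t:
                return parent
            for v in adj[u]:
                if v not in parent and cap[(u, v)] > 0:
                    parent[v] = u
                    queue.append(v)
        return None

    while (parent := augmenting_path()) is not None:
        v = t                          # push one unit of flow along the path
        while parent[v] is not None:
            u = parent[v]
            cap[(u, v)] -= 1
            cap[(v, u)] += 1
            v = u

    C = {s}                            # residual-reachable side of the cut
    queue = deque([s])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in C and cap[(u, v)] > 0:
                C.add(v)
                queue.append(v)
    E_C = {(u, v) for (u, v) in edges if u in C and v not in C}
    return C, E_C
```

Seeding with source and sink sets C_s and C_t reduces to this single-source form by
adding a super-source feeding C_s and a super-sink fed by C_t.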
Our verification problem is expressed as a netlist. This netlist represents a
composition of the design under test, a netlist-based representation of its driver (also
known as the environment, encoding input assumptions), and a netlist-based representation
of the property automata, as illustrated in Figure 1.1. We assume that time is discrete,
defined on [0, ∞). As will be demonstrated in Definition 3.12, we reason about the netlist
in two dimensions: vertices and time. Time dictates the period of update of our only
sequential netlist primitive: the REGISTER. Combinational elements are assumed to have 0
delay. In practice, this model is sufficient to reason about most interesting properties
of synchronous sequential netlists, and many asynchronous problems may be modeled in this
fashion. We now introduce a formal syntax and semantics for our netlists.
Definition 3.10. A netlist is a tuple N = ⟨G, G, Z, T⟩. Term G = ⟨V, E⟩ represents a
directed graph, where the vertices V represent gates, and the edges E represent
interconnections. Function G : V ↦ types defines the semantic gate type associated with
each gate v. Function Z : V ↦ V is the initial value mapping Z(v) of each gate v. The
nonempty set of targets T ⊆ V correlates to a set of invariants, as will be discussed in
Definition 3.13.
Definition 3.11. A gate v ∈ V may be of the following types, which comprise the range of
function G. Term G_v represents the semantic function correlating to the type of v, which
will be used in Definition 3.12. Let u_j denote the j-th element of an arbitrary ordering
of the set inlist(v).

• FREE. This gate has indegree of zero. It may nondeterministically drive a 0 or a 1
at any time-step, independently of any other vertices. Term G_v() is not referenced for
FREE vertices, though for simplification of notation we occasionally will refer to
G_v() at a specific time-step i as the sampled value of the corresponding FREE vertex
at time i.

• ZERO. This gate has indegree of zero. It is semantically equivalent to 0, thus
G_v() = 0.

• INVERTER. This gate has one input which is combinationally inverted, therefore
G_v(u_1) = ¬u_1.

• AND. This gate has n ≥ 1 inputs. It drives the combinational conjunction of all
input values, thus G_v(u_1, …, u_n) = ⋀_{j=1}^{n} u_j.

• REGISTER. This one-input gate drives its initial value Z(v) at time 0, and thereafter
unconditionally shadows its input by one time-step. Term G_v is not referenced for
REGISTERs.
Hereafter we refer to the vertex type ONE as a shorthand for an INVERTER whose
incoming edge is sourced by a ZERO vertex. Note that the REGISTER has an implicit clock –
it represents an unconditional one-time-step delay of its input value. For a discussion of
how to incorporate alternate gate types and more intricate interconnection types into this
framework, refer to Appendix A. The above set of gate types is sufficient to succinctly
model sequential Boolean netlists. We will occasionally use more complex gate types in our
examples for brevity (refer to Section 3.2); it is understood that this is merely
shorthand for the equivalent synthesis of such gates into the above types. Hereafter we
denote the set of REGISTERs as R, and the set of FREE vertices as I.
Definition 3.12. The semantics of a netlist N are defined in terms of its traces: 0, 1
valuations to gates over time. We denote the set of all legal traces associated with a
netlist by P ⊆ [V × ℕ ↦ {0, 1}], defining P as the subset of all possible functions from
V × ℕ to {0, 1} which are consistent with the following rule. The value of gate v at time
i in trace p ∈ P is denoted by p(v, i). The value of edge (u, v) ∈ E at time i in trace p
is defined as p((u, v), i) = p(u, i). Term u_j denotes the j-th element of an arbitrary
ordering of inlist(v), implying that (u_j, v) ∈ E.

  p(v, i) =  b ∈ {0, 1}                      if v ∈ I
             G_v(p(u_1, i), …, p(u_n, i))    if v ∈ V ∖ (R ∪ I)
             p(u_1, i − 1)                   if (v ∈ R) ∧ (i > 0)
             p(Z(v), 0)                      if (v ∈ R) ∧ (i = 0)
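The rule of Definition 3.12 reads directly as a recursive evaluator once the FREE
valuations b are fixed. A sketch under an assumed dict-based netlist encoding
(illustrative, not the thesis's representation):

```python
from functools import lru_cache

def make_trace(types, inputs_of, init_of, free_vals):
    """Evaluate p(v, i) per the rule of Definition 3.12, given fixed
    values free_vals[(v, i)] for the FREE vertices.  The dict-based
    netlist encoding here is illustrative only."""
    @lru_cache(maxsize=None)
    def p(v, i):
        t = types[v]
        if t == "FREE":
            return free_vals[(v, i)]          # b in {0, 1}
        if t == "ZERO":
            return 0
        if t == "INVERTER":
            return 1 - p(inputs_of[v][0], i)
        if t == "AND":                        # conjunction over all inputs
            return min(p(u, i) for u in inputs_of[v])
        if t == "REGISTER":
            if i == 0:
                return p(init_of[v], 0)       # p(Z(v), 0)
            return p(inputs_of[v][0], i - 1)  # p(u1, i - 1)
        raise ValueError(t)
    return p
```

Rule 3 of Definition 3.24 (every directed cycle has strictly positive sequential weight)
is what guarantees that this recursion terminates: every cycle is broken by a REGISTER,
which strictly decreases i.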
The initial values of a netlist constrain the values that a REGISTER may take at time
0; note that this function is ignored for non-REGISTER types. Our semantics allow us to
reason about a netlist as a state machine – i.e., a Mealy or Moore machine [52]. However,
in this thesis we limit our semantic analysis to this trace-theoretic view. We
occasionally will refer to the set of all legal traces associated with a netlist as Ṗ.

Definition 3.13. We say that target t is hit in trace p at time i if p(t, i) = 1. A
target t is not hittable in any trace iff ∀p ∈ P. ∀i ∈ ℕ. p(t, i) = 0. We say that a
target which may be hit is reachable, and one which is not hittable is unreachable.
Because our properties are safety properties, the traces generated by abstraction or
verification algorithms will be partial traces.

Definition 3.14. A partial trace is defined as a member of the set of finite subsets of
legal traces¹: P̃ = {p̃ : ∃p ∈ P. p̃ ⊆ p}.

Definition 3.15. A refinement p̃′ of a partial trace p̃ consists of adding one or more
elements to p̃, such that p̃′ ⊇ p̃ and p̃′ is a partial trace.

Definition 3.16. A partial trace p̃ is said to be semantically consistent if all possible
refinements of p̃ are partial traces.
To illustrate the concept of a semantically consistent partial trace, consider a
netlist with FREE vertices a and b, and AND vertex c = a ∧ b. Set {⟨(a, 0), 0⟩, ⟨(c, 0), 0⟩}
is a consistent partial trace, as is {⟨(a, 0), 1⟩, ⟨(b, 0), 1⟩, ⟨(c, 0), 1⟩}. However, the
set {⟨(a, 0), 1⟩, ⟨(c, 0), 1⟩} is not consistent, since one possible refinement, which adds
⟨(b, 0), 0⟩, does not render a partial trace. This fact has implications on an optimal
toolset; it may be desirable that each abstraction algorithm provide as small of a partial
trace as possible, to make the trace lifting process as efficient as possible.
Nevertheless, it is often necessary that sufficient data be reflected in the partial trace
to guarantee legality, else the trace provided to a user of the tool may not comprise
sufficient data to explain how the corresponding target was hit. Binary simulation
algorithms are useful to refine traces, and may be used to fully populate a partial trace
for the necessary length up to a hit of a target.

¹We will occasionally refer to an element of a function a ↦ b as ⟨a, b⟩, using the common
extension of a function to a relation.
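For small, purely combinational examples such as the a, b, c = a ∧ b netlist above,
semantic consistency at time 0 can be checked by brute force over the unassigned FREE
vertices. The encoding and helper names below are hypothetical:

```python
from itertools import product

def consistent(partial, frees, eval_fn):
    """Brute-force check of Definition 3.16 at time 0 for a purely
    combinational netlist: a partial trace is semantically consistent iff
    every valuation of its unassigned FREE vertices completes it to a
    legal trace.  eval_fn(v, env) evaluates gate v; hypothetical API."""
    fixed = {v: b for (v, i), b in partial.items() if i == 0 and v in frees}
    unassigned = [v for v in frees if v not in fixed]
    for vals in product((0, 1), repeat=len(unassigned)):
        env = dict(fixed, **dict(zip(unassigned, vals)))
        for (v, i), b in partial.items():
            if eval_fn(v, env) != b:
                return False           # some refinement is not a trace
    return True
```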
Definition 3.17. The length of a partial trace p̃ is
max{i : ∃v ∈ V. ∃b ∈ {0, 1}. ⟨(v, i), b⟩ ∈ p̃} + 1.

Throughout this thesis, we use the convention that max ∅ = 0 and min ∅ = ∞.

We now introduce some terminology related to the structure of netlists.
Definition 3.18. A structural path of a netlist is an ordered set of vertices
⟨v_0, …, v_n⟩ such that ∀i ∈ [0, n − 1]. (v_i, v_{i+1}) ∈ E.

Definition 3.19. A directed cycle is a structural path ⟨v_0, …, v_n⟩ such that v_0 = v_n.

Definition 3.20. The sequential weight of the structural path ⟨v_0, …, v_n⟩ is defined as
Σ_{i=0}^{n} (v_i ∈ R).

Definition 3.21. The cone of influence of a vertex set U is denoted as coi(U), and
defined as fanin cone(U) ∪ fanin cone(Z(R ∩ fanin cone(U))).

Definition 3.22. The combinational fanin of vertex set U is defined as ⋃_{u∈U} cfi(u),
where cfi(u) is defined as u if u ∈ R, else u ∪ combinational fanin(inlist(u)) if u ∉ R.
This set is well-formed as per the analysis of Definition 3.4.

Definition 3.23. The combinational fanout of vertex set U is defined as ⋃_{u∈U} cfo(u),
where cfo(u) is defined as outlist(u) ∪ combinational fanout(outlist(u) ∖ R). This set is
well-formed as per the analysis of Definition 3.4.
Intuitively, the combinational fanin of v contains all vertices in the fanin cone of v
which may be reached without passing through a REGISTER, and the combinational fanout of v
contains all vertices in the fanout cone of v which may be reached without passing through
a REGISTER.
Definition 3.24. A legal netlist is one which satisfies the following rules.

1. The indegree of each gate is consistent with its specified type. Each INVERTER and
REGISTER has indegree of 1; each AND gate has indegree greater than 0; each ZERO and
FREE vertex has indegree of 0.

2. A legal netlist has a finite number of gates.

3. Every directed cycle has strictly positive sequential weight.

4. The initial value cone of each REGISTER must be entirely combinational, i.e.,
fanin cone(Z(R)) ∩ R = ∅.
Hereafter, we assume that all netlists under discussion are legal. The first three
rules of Definition 3.24 are trivially satisfied by netlists generated by synthesis of
HDL, which encompass the class of netlists which are the primary focus of this thesis. The
requirement that initial values be combinational is not semantically limiting; since the
initial values have semantic significance only at time 0, if one wishes to have a REGISTER
u in the fanin cone of an initial value, one may simply replace the occurrence of u in
this initial value cone by Z(u). This constraint prevents possible ill-formed netlists due
to cyclic initial value definitions – e.g., for REGISTERs u and v, stating that Z(u) = v
and Z(v) = u. These assumptions collectively ensure that Definition 3.12 is well-formed.
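Rules 1 through 3 of Definition 3.24 are mechanically checkable. In particular, rule 3
holds iff the subgraph induced on non-REGISTER vertices is acyclic, since a zero-weight
cycle contains no REGISTER. A sketch over an assumed dict encoding (rule 4, on
initial-value cones, is omitted for brevity):

```python
def is_legal(types, inputs_of):
    """Check rules 1-3 of Definition 3.24 on a dict-encoded netlist
    (illustrative encoding; rule 4 is omitted for brevity)."""
    fixed_indegree = {"INVERTER": 1, "REGISTER": 1, "ZERO": 0, "FREE": 0}
    for v, t in types.items():                 # rule 1: indegrees
        n = len(inputs_of.get(v, ()))
        if t == "AND":
            if n < 1:
                return False
        elif n != fixed_indegree[t]:
            return False
    # rule 2 (finitely many gates) is implicit in the finite dict.
    # rule 3: a zero-weight cycle contains no REGISTER, so it suffices
    # that the subgraph induced on non-REGISTER vertices is acyclic.
    comb = {v for v, t in types.items() if t != "REGISTER"}
    state = {}
    def has_cycle(v):
        if state.get(v) == "done":
            return False
        if state.get(v) == "open":             # back edge: cycle found
            return True
        state[v] = "open"
        if any(has_cycle(u) for u in inputs_of.get(v, ()) if u in comb):
            return True
        state[v] = "done"
        return False
    return not any(has_cycle(v) for v in list(comb))
```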
Definition 3.25. A state s of a netlist is defined as s ∈ S, where S = 2^R.

Intuitively, a state s represents the subset of REGISTERs which evaluate to a binary 1
at a given time-step; set R ∖ s represents the set of REGISTERs which evaluate to a binary
0 at that time-step.

Definition 3.26. We define the initial states S_0 of a netlist as
⋃_{p∈P} {{r ∈ R : p(Z(r), 0) = 1}}.

Definition 3.27. A Kripke state s_K of a netlist is defined as s_K ∈ 2^(R∪I).

Given a subset of REGISTERs and FREE vertices (a Kripke state), we may use the
semantics of our netlist to provide a unique deterministic valuation to any vertex using
the Simulate algorithm depicted in Figure 3.1.
Binary Simulate(Vertex v, Kripke State s_K) {
  switch (G(v)) {
    case FREE:
    case REGISTER:
      return v ∈ s_K;
    case ZERO:
      return 0;
    case INVERTER:
      return ¬Simulate(inlist(v), s_K);
    case AND:
      return ⋀_{u_i ∈ inlist(v)} Simulate(u_i, s_K);
  }
}

Figure 3.1: Simulate algorithm
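The Simulate algorithm of Figure 3.1 transcribes almost directly into executable form;
the dict-based encoding below is illustrative, not the thesis's data model.

```python
def simulate(v, s_K, types, inputs_of):
    """Transcription of the Simulate algorithm of Figure 3.1: evaluate
    gate v under Kripke state s_K, the set of REGISTERs and FREE
    vertices currently holding 1.  Dict encoding is illustrative."""
    t = types[v]
    if t in ("FREE", "REGISTER"):
        return int(v in s_K)
    if t == "ZERO":
        return 0
    if t == "INVERTER":
        return 1 - simulate(inputs_of[v][0], s_K, types, inputs_of)
    if t == "AND":                    # conjunction of recursive values
        return min(simulate(u, s_K, types, inputs_of) for u in inputs_of[v])
    raise ValueError(t)
```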
Definition 3.28. The image of a set of states A, denoted by image(A), is defined as
{s ∈ S : ∃s′ ∈ A. ∃i ⊆ I. ∀r ∈ R. (Simulate(inlist(r), s′ ∪ i) = (r ∈ s))}.

Definition 3.29. The preimage of a set of states A, denoted by preimage(A), is defined as
{s ∈ S : ∃s′ ∈ A. ∃i ⊆ I. ∀r ∈ R. (Simulate(inlist(r), s ∪ i) = (r ∈ s′))}.

Definition 3.30. The distance from state s to s′ is defined as distance(s, s′) =
min{j : ∃p ∈ P. ∃i ∈ ℕ. ∀r ∈ R. (((p(r, i) = 1) ↔ (r ∈ s)) ∧ ((p(r, i + j) = 1) ↔ (r ∈ s′)))}.
Because of our convention that min ∅ = ∞, Definition 3.30 implies that the distance
from state s to state s′ is ∞ if s′ is not reachable from s along any trace. Since a legal
netlist is finite, the distance between s and s′, provided that s′ is reachable from s,
cannot be ∞.
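Definitions 3.28 and 3.29 admit a purely illustrative explicit-state transcription:
enumerate every state s′ in A and every input subset i ⊆ I, and evaluate each REGISTER's
next-state function under the resulting Kripke state. The function `next_fn` below stands
in for Simulate(inlist(r), ·); the encoding is hypothetical, and the enumeration is
exponential in |I| – symbolic techniques [18, 19] exist precisely to avoid it.

```python
from itertools import chain, combinations

def powerset(xs):
    # all subsets of xs, from the empty set to xs itself
    xs = list(xs)
    return chain.from_iterable(combinations(xs, r) for r in range(len(xs) + 1))

def image(A, registers, free, next_fn):
    """Explicit-state reading of Definition 3.28: over all states s' in A
    and all input subsets i of I, collect the state whose REGISTERs r
    satisfy next_fn(r, s' ∪ i) = 1.  Hypothetical encoding, exponential
    cost; shown only to make the definition concrete."""
    out = set()
    for s_prime in A:
        for i in powerset(free):
            s_K = set(s_prime) | set(i)      # Kripke state s' ∪ i
            out.add(frozenset(r for r in registers if next_fn(r, s_K)))
    return out
```

The preimage of Definition 3.29 is the same enumeration with the roles of s and s′
exchanged.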
Definition 3.31. Vertices v and v′ of N are said to be semantically equivalent iff
∀p ∈ P. ∀i ∈ ℕ. p(v, i) = p(v′, i).

Definition 3.32. Vertex sets A and A′ of netlists N and N′, respectively, are said to be
trace equivalent iff there exists a bijective mapping ψ : A ↦ A′ which satisfies the
following conditions.

• ∀p ∈ P. ∃p′ ∈ P′. ∀i ∈ ℕ. ∀a ∈ A. p(a, i) = p′(ψ(a), i)

• ∀p′ ∈ P′. ∃p ∈ P. ∀i ∈ ℕ. ∀a ∈ A. p(a, i) = p′(ψ(a), i)

The notion of bisimilarity, relating state transition graphs of netlists, is more
restrictive than trace equivalence – bisimilarity implies trace equivalence, though the
latter does not imply the former. However, in an invariant checking domain, trace
equivalence is a sufficient condition for most purposes.
3.1 Verification Algorithms
In this section we briefly introduce several common verification algorithms which are useful to discharge invariants.
There are two primary methodologies for the verification of safety properties: state
traversal techniques and inductive methods. State traversal techniques employ exact or
approximate search to attempt to calculate a trace which hits a target; unreachability is
proven if a search exhausts without finding such a trace. Exhaustive search is performed by enumerating the reachable states of the design, almost exclusively using BDDs [53] to represent the transition relation and state sets [18, 19]. However, more
recently, noncanonical representations have been proposed for reachability analysis [54],
as have satisfiability-based algorithms [55]. Because of their exponential complexity, exact
state traversal techniques – whether symbolic or explicit – are applicable only to smaller designs with at most several hundred REGISTERs.
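The underlying fixpoint of exhaustive state traversal can be sketched as follows, with explicit Python sets standing in for the BDD-represented state sets, and a step callback standing in for the image operator (names are illustrative):

```python
def reachable_states(initial_states, step):
    """Breadth-first reachability fixpoint: repeatedly apply the image
    operator (abstracted as step, mapping a set of states to the set of
    their successors) until no new states appear."""
    reached = set(initial_states)
    frontier = set(initial_states)
    while frontier:
        new = step(frontier) - reached   # states seen for the first time
        reached |= new
        frontier = new
    return reached
```

For a mod-4 counter started at 0, the fixpoint is reached after enumerating all four states.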
Numerous approximate techniques have been proposed to address the capacity limitations of exact state traversal. Overapproximating the set of reachable states is useful to prove a target unreachable if all target states remain outside the overapproximation, though it cannot readily demonstrate reachability otherwise. For example, design partitioning [56] may be applied to overapproximate the set of reachable states by exploring components whose sizes are tractable for exact traversal. Similarly, the concept of an injected FREE fence [34] to obtain an overapproximate localized cone of influence has been suggested for proving unreachability of a target by localization.
Conversely, underapproximate techniques are useful to demonstrate reachability of targets, but are generally incapable of proving their unreachability. For example, sequential binary simulation is based upon the combinational algorithm depicted in Figure 3.1, and consists of the process of evaluating Definition 3.12 to produce a semantically consistent partial trace. Random selection is used to determine valuations to FREE vertices. As another example, bounded model checking (BMC) [57] is based upon a satisfiability check of a finite $k$-step unfolding of the target. This unfolding process consists of building a combinational netlist by recursively evaluating Definition 3.12, injecting a unique FREE vertex $v_i$ for each time-step $i$ of a given FREE vertex $v$ encountered during the evaluation, and similarly replicating combinational gates per time-step for any AND, INVERTER, and ZERO vertices encountered. Note that REGISTERs merely constitute a shift in time-steps, hence do not appear in the unfolded netlist. If it can be proven that the diameter of the netlist (refer to Definitions 4.1 and 4.2) is smaller than or equal to $k$, BMC becomes complete and can thereby also prove unreachability; this concept is explored further in Chapter 4. A similar underapproximate method is based upon a bounded backward unfolding of the design starting from the target. The unfolded structure comprises an enlarged target which may be used to either directly discharge the verification problem or to produce a new, simplified problem to be solved by a subsequent verification flow. We explore target enlargement further in Chapter 8. Lastly, a semi-formal toolset, which interleaves between resource-bounded exhaustive searches and simulation [58, 59], may be useful to quickly calculate a trace which hits a target even if that target is too probabilistically difficult to be hit by random simulation alone, especially when netlist size renders exact search infeasible.
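The unfolding process described above can be sketched as follows; this is a hypothetical encoding that builds nested-tuple expressions rather than an actual SAT instance, with gate_type, inlist, and init_val as illustrative netlist tables:

```python
def unfold(v, i, gate_type, inlist, init_val, cache=None):
    """Build a combinational expression for the value of vertex v at
    time-step i of a BMC unfolding. REGISTERs merely shift time: at
    time i > 0 they refer to their input at time i - 1, and at time 0
    they take their initial value, so they do not appear in the
    unfolded netlist. Each FREE vertex v becomes a fresh variable
    (v, i) per time-step; combinational gates replicate per step."""
    if cache is None:
        cache = {}
    key = (v, i)
    if key in cache:
        return cache[key]
    t = gate_type[v]
    if t == "FREE":
        expr = ("var", v, i)               # unique FREE vertex v_i
    elif t == "ZERO":
        expr = ("const", 0)
    elif t == "REGISTER":
        if i == 0:
            expr = ("const", init_val[v])  # initial value at time 0
        else:
            expr = unfold(inlist[v][0], i - 1, gate_type, inlist,
                          init_val, cache)
    elif t == "INVERTER":
        expr = ("not", unfold(inlist[v][0], i, gate_type, inlist,
                              init_val, cache))
    else:  # AND
        expr = ("and", tuple(unfold(u, i, gate_type, inlist,
                                    init_val, cache)
                             for u in inlist[v]))
    cache[key] = expr
    return expr
```

For a REGISTER fed directly by a FREE vertex, the time-2 unfolding is simply that FREE vertex's time-1 copy, illustrating the shift in time-steps.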
An inductive proof requires an invariant (either automatically generated or manually provided) that implies the property; one then demonstrates that the invariant holds in all reachable states. The base step of a $k$-step inductive proof checks that the invariant holds during the first $k$ time-steps. This may be performed by a $k$-step bounded model check of the invariant, which is used to validate the induction hypothesis. The inductive step must then demonstrate that asserting the invariant during time-steps $i, \ldots, (i + k - 1)$ implies that it continues to hold at time-step $(i + k)$. Inductive proofs may be performed via BDD-based analysis [60, 24] or SAT-based analysis [57, 61]. If the proof is completed, then unreachability of the target is deduced. The general drawback of inductive schemes is the intrinsic difficulty in determining a powerful enough invariant that is inductive and also implies correctness of the property. However, for many practical problems, backward unfolding of the target – target enlargement (see Chapter 8) – yields an inductive invariant after several steps.
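The overall flow of a $k$-step inductive proof can be sketched as follows, assuming a generic bounded-checker interface (base_check and step_check are hypothetical callbacks, e.g. backed by a SAT solver):

```python
def k_induction(base_check, step_check, max_k):
    """Sketch of k-step induction. base_check(k) returns True iff the
    invariant holds in the first k time-steps from the initial states
    (e.g., via a k-step bounded model check); step_check(k) returns
    True iff the invariant holding at steps i..i+k-1 forces it to hold
    at step i+k, from an arbitrary state. Returns ('proved', k),
    ('falsified', k), or ('unknown', max_k)."""
    for k in range(1, max_k + 1):
        if not base_check(k):
            return ("falsified", k)   # a bounded trace hits the target
        if step_check(k):
            return ("proved", k)      # induction hypothesis validated
    return ("unknown", max_k)
```

Increasing $k$ strengthens the induction hypothesis, which is why many invariants that are not 1-step inductive become provable at a larger depth.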
3.2 Figure Symbols
In this section we introduce the symbols that we will use throughout this thesis in our
figures depicting netlists. We illustrate these symbols in Figure 3.2. Though our defined
netlist gate types are only FREE, AND, INVERTER, ZERO, and REGISTER, we often use
more abstract types in our examples for brevity.
The term $ite(sel, a, b)$ means "if $sel$ then $a$ else $b$." We label the "data ports" $a$ and $b$ of multiplexors with a 1 and a 0 in our figures, respectively, indicating which respective net has its value sensitized through the multiplexor when the selector $sel$ evaluates to a binary 1 vs. a binary 0, respectively.
[Figure 3.2 illustrates the symbols used in our netlist figures: AND gate, AND gate with inverted inputs, OR gate, INVERTER, multiplexor $c = ite(sel, a, b)$ with data ports labeled 1 and 0, REGISTER, FREE vertex, sinkless 1-input AND gate, combinational logic without FREE vertices, and combinational logic possibly containing FREE vertices.]

Figure 3.2: Figure symbols
Chapter 4
Diameter Overapproximation Techniques
In this chapter, we define the diameter of a netlist, and discuss various diameter overapproximation techniques. Diameter is an important topic to verification, since we may use an overapproximate diameter bound to ensure that an application of BMC is sound and complete; BMC is often much more efficient than general fixpoint computations used for reachability analysis. In addition to discussing our structural diameter approximation technique from [24], which is collaborative work with Andreas Kuehlmann and Jacob Abraham, we formalize a theory for allowing a compositional approach to diameter approximation which allows arbitrary techniques to be used on a per-component basis. Additionally, we introduce concepts which will be used throughout this thesis to allow a diameter overapproximation obtained upon an abstracted netlist to imply a bound on the diameter of the corresponding unabstracted netlist.
Definition 4.1. The diameter $d \in \mathbb{N}$ of netlist $N$ is the maximum finite distance between any two states plus one: $d = \max_{\{s, s' \in S : distance(s, s') \neq \infty\}} distance(s, s') + 1$.
In other words, if any state $s'$ is reachable from state $s$, then $s'$ is reachable in fewer than $d$ time-steps from $s$. This implies that an exhaustive bounded state traversal of depth $0, \ldots, d - 1$ is sufficient to determine whether a target is hittable or unreachable, since FREE vertices may take values independently at any point in time, and since a deterministic valuation to any vertex may be obtained from a valuation to $R \cup I$ as per the algorithm Simulate of Figure 3.1. Note that our definition of diameter is one greater than the standard definition for graphs; this simplifies the exposition of our compositional techniques, and matches the number of time-steps necessary to ensure completeness of BMC.
In many cases, using diameter to bound the depth of application of BMC is not tight. For example, to assess reachability of a target, we may ignore any vertices outside of its cone of influence, which may decrease diameter. This observation is one of the motivations for an alternate diameter definition we present in Definition 4.2. Additionally, a BMC application for the maximum distance from any initial state rather than from any reachable state suffices for invariant checking. Furthermore, for invariant checking target $t$, we need only perform a search deep enough to assess whether we may toggle the target from 0 to 1 relative to an initial state; the amount of time necessary to toggle the target from a 1 to a 0 may be exponentially greater. This concept is revisited in Theorem 8.3. However, the more conservative diameter bound will be necessary for our compositional algorithms, and most approximation techniques directly yield diameter bounds.
Note that use of Kripke states in Definition 4.1 may yield a result one larger. For example, a purely combinational netlist containing FREE vertices has a diameter of 1. However, this netlist will have multiple Kripke states, and each of these Kripke states may transition to every other, hence it will have a Kripke diameter of 2. Next, consider a netlist with no FREE vertices but a set of REGISTERs which act as a mod-$c$ counter. We will obtain a diameter of $c$ for this netlist whether or not we use a Kripke representation, since it requires $c - 1$ time-steps to transition the counter from any state to the corresponding furthest state. To apply BMC in a complete manner, for the former combinational netlist we need only verify one time-step. For the latter netlist, we must verify $c$ time-steps. These tight bounds are accurately reflected in our non-Kripke diameter definition. Thus, inclusion of FREE
vertices in our state model unnecessarily weakens our diameter bounds. We now introduce
an alternate diameter definition, which will allow further tightening.
Definition 4.2. The diameter $d(U)$ of vertex set $U$ is the minimum number such that for any trace $p$ and any increasing succession¹ $k_1, \ldots, k_c$, there exists another trace $p'$ and another increasing succession $l_1, \ldots, l_c$ such that $\bigwedge_{j=1}^{c} (l_j \leq k_j)$ and $(l_c \leq l_{c-1} + d(U))$, taking $l_0 = -1$, which satisfies $\forall u \in U.\ \bigwedge_{j=1}^{c} (p(u, k_j) = p'(u, l_j))$.

¹An increasing succession is an ordered set of natural numbers $k_1, \ldots, k_c$ for $c \geq 1$ which satisfies the relation $k_i < k_{i+1}, \forall i \in [1, c-1]$.

By Definition 4.2, the diameter of vertices $U$ actually need not correlate to that of $coi(U)$. For example, if a vertex $u$ encodes an XOR function of a FREE vertex and a sequential cone $A$, then $d(u) = 1$ regardless of that of $A$, since any valuation to $u$ will be producible at any time-step. This definition provides an opportunity to bound diameter without a need to analyze the underlying state space representation, which is key to understanding our structural diameter overapproximation algorithm of Figure 4.2. Furthermore, this definition is extended in Theorem 4.3 to enable a bound obtained on a transformed netlist to be used to imply a bound for the original, untransformed netlist.

Theorem 4.1. The diameter $d$ of Definition 4.1 is equal to or one greater than $d(V)$ of Definition 4.2.

Proof. We consider two cases. First, assume that the netlist has a diameter of 1 by Definition 4.1. This implies that either the netlist is combinational hence $S = \emptyset$, or the netlist has REGISTERs though they act as constants – i.e., no state may transition to any other. Because of the lack of any sequential behavior of the netlist, any valuation reachable at any time $i$ must be reachable at every time-step, thus we also obtain a diameter of 1 by Definition 4.2.

Second, assume that the netlist has a $d > 1$ by Definition 4.1. Let $s$ and $s'$ represent a maximally-distant state pair with $distance(s, s') < \infty$. If there exists such a maximally-distant state pair $s$ and $s'$ such that $s$ is an initial state, and $s'$ is not reachable in any trace before time $d - 1$, then these two definitions clearly yield identical results by using $c = 1$ in Definition 4.2. Otherwise, assume that state $s''$ transitions to state $s$ along some trace. Let $x$ be the FREE vertex valuation which transitions the netlist from $s''$ to $s$. We note that the minimum number of time-steps necessary to witness $s'$ after witnessing $\{s'', x\}$ in any trace is exactly $d$, thus these two definitions yield identical results. Lastly, assume that we may not transition to any state $s$ which is maximally distant to any other state $s'$ (i.e., $s$ must be an initial state); additionally, state $s'$ is reachable more shallowly along a trace not passing through $s$. Definition 4.2 will yield a bound one smaller than that of Definition 4.1, since it will take $d - 1$ time-steps to witness $s'$ after witnessing $s$. Thus, the diameter with respect to Definition 4.2 is often identical to, though occasionally one less than, that with respect to Definition 4.1.
Theorem 4.1 illustrates an interesting result; our diameter of Definition 4.2 is less than or equal to that of Definition 4.1, though the former but not the latter may include FREE vertices. As per the previous discussion of the mod-$c$ counter, we cannot merely drop the addition of 1 from Definition 4.1 to attempt to yield an identical bound. This increment is generally necessary to ensure completeness of BMC, and indeed our proof of Theorem 4.1 indicates that in most cases these two definitions yield identical bounds. Nevertheless, the bound of Definition 4.2 is sufficient both for invariant checking (as follows from assigning $c = 1$) as well as for bounding the diameter of isolated components (refer to Theorem 4.2).
We now define recurrence diameter, which constitutes an overapproximation of diameter. Our definition is one greater than that of [57] for consistency with diameter.

Definition 4.3. The recurrence diameter $d_r \in \mathbb{N}$ of a netlist $N$ is defined as the length of its maximal acyclic state sequence. In other words, $d_r = \max\{j : \exists p \in P.\ \exists i \in \mathbb{N}.\ \exists s_1, \ldots, s_j \in S.\ \forall r \in R.\ (\bigwedge_{k=1}^{j} ((p(r, i + k - 1) = 1) \leftrightarrow (r \in s_k)) \wedge (\forall k, l \in [1, j].\ (k \neq l) \rightarrow (s_k \neq s_l)))\}$.

There are two characteristics of practical netlists which may be exploited to compute tight diameter bounds. First, netlists seldom represent monolithic structural strongly
connected graphs. Instead, they often comprise multiple maximal SCCs; an approximation of diameter may thus be compositionally derived from an estimation of the individual SCC diameters. Second, although the diameter of a component is generally exponential in its REGISTER count, several commonly occurring structures have much tighter bounds. For example, as proven in Theorem 4.2, the diameter of a single memory row comprising $n$ REGISTERs is 2 instead of $2^n$; acyclic REGISTERs only cause a linear, rather than multiplicative, increase in diameter.
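Definition 4.3 can be made concrete with an explicit-state sketch that enumerates repetition-free state sequences over the reachable state graph; this is exponential in general and shown only for illustration (successors is a hypothetical next-state callback):

```python
def recurrence_diameter(initial_states, successors):
    """Explicit-state recurrence diameter per Definition 4.3: the
    length of the maximal acyclic (pairwise-distinct) state sequence
    occurring along any trace. successors(s) yields the next states
    of s. Exponential; for illustration only."""
    # First compute the reachable states, since an acyclic sequence
    # may begin at any reachable state along a trace.
    reached, frontier = set(initial_states), set(initial_states)
    while frontier:
        new = {t for s in frontier for t in successors(s)} - reached
        reached |= new
        frontier = new

    best = 0

    def extend(s, seen):
        nonlocal best
        best = max(best, len(seen))
        for t in successors(s):
            if t not in seen:
                extend(t, seen | {t})

    for s in reached:
        extend(s, frozenset({s}))
    return best
```

For a mod-4 counter this yields 4, matching its diameter; for a component whose state graph is complete, it can be exponentially looser than the true diameter.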
[Figure 4.1 depicts a slice of a TSAP structure: consecutive components $TSAP_{j-1}$, $TSAP_j$, and $TSAP_{j+1}$, with associated cut sets $C_{j-2}$, $C_{j-1}$, $C_j$, and $C_{j+1}$.]

Figure 4.1: Slice of TSAP structure
Definition 4.4. A topologically sorted acyclic partitioning (TSAP) of $V$ into $n$ components is a labeling $TSAP : V \mapsto \{1, \ldots, n\}$ such that $\forall u, v \in V.\ ((u, v) \in E \Rightarrow TSAP(u) \leq TSAP(v))$. We denote the $i$-th component of a TSAP by the set $TSAP_i = \{v : TSAP(v) = i\}$.

Note that the acyclic requirement mandates that $TSAP_{TSAP(v)} \supseteq SCC(v)$. Let $C_i = \{TSAP_i \cap T\} \cup \{u : \exists v \in V.\ ((u, v) \in E \wedge u \in \bigcup_{j=1}^{i} TSAP_j \wedge v \in \bigcup_{j=i+1}^{n} TSAP_j)\}$. Set $C_i$ comprises the targets in $TSAP_i$, in addition to vertices of components $1, \ldots, i$ which have sinks in components $i + 1, \ldots, n$. For example, in Figure 4.1, some elements of component $TSAP_{j-1}$ are included in $C_j$ and $C_{j+1}$, though no elements of $TSAP_j$ are included in $C_{j+1}$ since no outgoing edges from $TSAP_j$ have sinks beyond $TSAP_{j+1}$. We use $C_i$ in our compositional diameter overapproximation approach; it is the vertices in $C_i \cap TSAP_i$ which must be considered in our bound for $TSAP_i$.
Definition 4.5. We distinguish between the following TSAP component types. Let $x_i$ be a REGISTER vertex and $y_i$ be the source of the incoming edge to $x_i$.

- A combinational/constant component (CC) contains only non-REGISTER vertices, or REGISTERs whose incoming edges are sourced by themselves; i.e., $y_i = x_i$. FREE vertices may only appear in CCs.

- An acyclic component (AC) contains only REGISTER vertices whose incoming edges are inputs to the component.

- A memory component (MC) is composed solely of a set of $r \times c$ REGISTERs and combinational gates, for $r \geq 1$ and $c \geq 1$. The next-state functions of the REGISTERs have the form: $y_{i,j} = (x_{i,j} \wedge \bigwedge_{k=1}^{w} \neg load_{i,k}) \vee \bigvee_{k=1}^{w} (data_{i,j,k} \wedge load_{i,k})$, for $1 \leq i \leq r$ and $1 \leq j \leq c$, where $data_{i,j,k}$ and $load_{i,k}$ are inputs to the component. Let $rows(TSAP_i) = r$ for MC $TSAP_i$.

- A queue component (QC) is composed solely of a set of $r \times c$ REGISTERs and combinational gates, for $r > 1$ and $c \geq 1$. The next-state functions of the REGISTERs have the form: $y_{1,j} = (x_{1,j} \wedge \bigwedge_{k=1}^{w} \neg load_k) \vee \bigvee_{k=1}^{w} (data_{j,k} \wedge load_k)$; $y_{i,j} = (x_{i,j} \wedge \bigwedge_{k=1}^{w} \neg load_k) \vee (x_{i-1,j} \wedge \bigvee_{k=1}^{w} load_k)$, for $1 < i \leq r$ and $1 \leq j \leq c$, where $data_{j,k}$ and $load_k$ are inputs to the component. Let $rows(TSAP_i) = r$ for QC $TSAP_i$.

- All remaining components are termed general components (GCs). We note that $R \cap TSAP_i \neq \emptyset$ for GCs. If there exists a combinational path from an input of $TSAP_i$ to any combinational gate $u \in TSAP_i$, and $u \in C_i$, we say that the GC is Mealy.
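As a concrete sketch, the MC next-state equations can be evaluated as follows; this is an illustrative encoding with hypothetical matrix arguments, not code from the thesis:

```python
def mc_next_state(x, data, load):
    """Next-state function of a memory component (MC) with r rows,
    c columns, and w load ports, per Definition 4.5:
      y[i][j] = (x[i][j] AND all load[i][k] deasserted)
                OR any (data[i][j][k] AND load[i][k]).
    x:    r x c matrix of current REGISTER values (0/1)
    data: r x c x w matrix of data inputs
    load: r x w matrix of load-enable inputs"""
    r, c = len(x), len(x[0])
    w = len(load[0])
    y = [[0] * c for _ in range(r)]
    for i in range(r):
        for j in range(c):
            hold = x[i][j] and all(not load[i][k] for k in range(w))
            write = any(data[i][j][k] and load[i][k] for k in range(w))
            y[i][j] = int(hold or write)
    return y
```

Each row either holds its current value (when all of its load ports are deasserted) or captures the data selected by an asserted load port, which is why a single row contributes a diameter of 2 rather than $2^c$.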
Note that MCs and QCs have been generalized for $w$ load ports. Further generalizations are possible, though we have found these adequate for most commonly-occurring structures. The constant REGISTERs in CCs may have constant initial values (in which case they may be simplified by constant propagations) or symbolic initial values (e.g., implementing forall variables). As we shall demonstrate, our overapproximation algorithm provides the smallest bounds for TSAPs with maximally-sized ACs, CCs, MCs, and QCs.
Obtaining such a partition is a simple linear-time procedure: we first identify the cyclic vertices using a fanin or fanout sweep. Any REGISTERs not in the cyclic subset will be AC elements. The other REGISTERs are then classified as follows: if the incoming edge of a REGISTER is sourced by itself, it is a CC. Otherwise, we use a pattern-matching heuristic to see if the REGISTER appears as a table cell; i.e., an element of a QC or MC. If so, we hash the corresponding REGISTER based upon its load vertices. All REGISTERs with identical load vertices are candidates for appearing in the same MC or QC component. Finally, we selectively cluster components in an attempt to maximize the size of the ACs, MCs, and QCs, while preventing the introduction of cycles in the partition graph.
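The classification step of this procedure can be sketched as follows; the function and argument names are hypothetical, and the pattern-matching heuristic is abstracted as a load_signals callback:

```python
from collections import defaultdict

def classify_registers(registers, source, cyclic, load_signals):
    """Illustrative sketch of the linear-time REGISTER classification
    described above (not code from the thesis).
    registers:    iterable of REGISTER vertices
    source[r]:    the vertex sourcing r's incoming edge
    cyclic:       set of vertices on some cycle (fanin/fanout sweep)
    load_signals: maps r to a tuple of its load vertices if r
                  pattern-matches a QC/MC table cell, else None"""
    kinds = {}
    table_groups = defaultdict(list)   # candidate MC/QC groups
    for r in registers:
        if r not in cyclic:
            kinds[r] = "AC"            # acyclic REGISTERs
        elif source[r] == r:
            kinds[r] = "CC"            # constant (self-sourced)
        elif load_signals(r) is not None:
            kinds[r] = "MC/QC"         # table cell: hash by loads
            table_groups[load_signals(r)].append(r)
        else:
            kinds[r] = "GC"            # everything else
    return kinds, dict(table_groups)
```

REGISTERs hashed to the same load set become candidates for the same MC or QC; a subsequent clustering pass (not shown) would merge them while keeping the partition graph acyclic.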
Our approximation of the diameter of target $t$ is based upon a TSAP of its cone of influence. We ascribe an additive element $d_a$ and a multiplicative element $d_m$ with each TSAP component, using the algorithm of Figure 4.2. Term $\chi(i)$ in this algorithm denotes whether $TSAP_i$ entails a cut between components $1, \ldots, i-1$ and components $i+1, \ldots, n$, and $\bar{\chi}(i) = \neg\chi(i)$. In Figure 4.1, only $TSAP_{j-1}$ entails a cut (provided that it is not a Mealy GC), hence $\chi(j-1) = 1$, whereas $\chi(j) = \chi(j+1) = 0$. Term $D_i$ represents an upper-bound on the diameter of $C_i \cap TSAP_i$ in the context of $N$; clearly $2^{|R \cap TSAP_i|}$ is conservative, and may be improved upon by various mechanisms as we will discuss later in this chapter.
Theorem 4.2. The value $d_a(i) + d_m(i)$ obtained by the algorithm of Figure 4.2 is an upper-bound on the diameter of $C_i$. This implies that $d(t) = d_a(TSAP(t)) + d_m(TSAP(t))$ is an upper-bound on the diameter of target $t$.

Proof. We will prove this theorem by induction on $i$. Our proof is based upon the hypothesis that any arbitrary succession of reachable valuations to $C_i$ is producible within $\tau = c \cdot d_m(i) + d_a(i)$ time-steps. Restating this hypothesis more formally: for any increasing succession of time-steps $k_1, \ldots, k_c$ and any trace $p$, there exists another increasing
⟨d_a, d_m⟩ Preprocess_Diameter(Netlist N, TSAP A) {
  d_m(0) = 1; d_a(0) = 0;
  for (i = 1; i ≤ |A|; i++) {
    if (χ̄(i−1) ∨ (C_{i−1} ∩ C_i ≠ ∅) ∨ (type(A_i) ≡ Mealy GC)) { χ(i) = 0; }
    else { χ(i) = 1; }
    χ̄(i) = 1 − χ(i);
    d_m(i) = d_m(i−1)                         : type(A_i) ∈ {CC, AC}
           = d_m(i−1) · (rows(A_i) + χ̄(i))   : type(A_i) ∈ {MC, QC}
           = d_m(i−1) · (D_i − χ(i)) + χ(i)   : type(A_i) ≡ GC
    d_a(i) = d_a(i−1)                         : type(A_i) ∈ {CC, GC}
           = d_a(i−1) + χ(i)                  : type(A_i) ∈ {MC, QC}
           = d_a(i−1) + 1                     : type(A_i) ≡ AC
  }
  return ⟨d_a, d_m⟩;
}

Figure 4.2: Algorithm for calculation of d_a and d_m

succession $l_1, \ldots, l_c$ such that $\bigwedge_{j=1}^{c} (l_j \leq k_j)$ and $(l_c < \tau)$, and another trace $p'$ such that $\forall u \in C_i.\ \bigwedge_{j=1}^{c} (p(u, k_j) = p'(u, l_j))$. This theorem follows from assigning $c = 1$.
The intuition behind this hypothesis is that component $TSAP_{i+1}$ may transition from each of its states only upon witnessing a distinct valuation to $C_i$. Therefore, in order to ensure that we attain an upper bound on the diameter of $TSAP_{i+1}$, we generally must wait for a succession of $c = D_{i+1}$ valuations to $C_i$. For example, if $TSAP_1$ is a mod-4 counter, and $TSAP_2$ is a mod-5 counter, we will assign $c = 5$ since we need to wait for 5 valuations to $C_1$ to be sure that we attain an upper-bound on the diameter of $C_2$.
Our base case has $i = 1$. If $type(TSAP_1) = CC$, we obtain $d_m(1) = 1$ and $d_a(1) = 0$. This result is correct, since any valuation producible by $C_1$ is producible every time-step due to its lack of sequential behavior. We note that $type(TSAP_1)$ cannot be MC, QC, or AC since those types require other components to drive their inputs. Finally, if $type(TSAP_1) = GC$, then $d_m(1) = D_1$, which is an upper bound on the diameter of $C_1$ by definition, hence our proof obligation is satisfied.

We next proceed to the inductive step. If $type(TSAP_{i+1}) = CC$, then our result is correct by hypothesis, noting that $TSAP_{i+1}$ is a purely combinational function of $C_i$, as well as FREE vertices and REGISTERs which behave as constants. If $type(TSAP_{i+1}) = AC$, then $d_m(i+1) = d_m(i)$ and $d_a(i+1) = d_a(i) + 1$. This result is correct since the initial values of an AC have semantic importance only at time 0, and since an AC merely delays some valuations to $C_i$ by one time-step. If $type(TSAP_{i+1}) \in \{MC, QC\}$, then we obtain $d_m(i+1) = d_m(i) \cdot (rows(TSAP_{i+1}) + \bar{\chi}(i+1))$ and $d_a(i+1) = d_a(i) + \chi(i+1)$. This result is correct by noting that it can take at most $c \cdot d_m(i) + d_a(i)$ time-steps to reach any possible succession of valuations to $C_i$ by hypothesis. If $\bar{\chi}(i+1) = 1$, then $C_i$ fans out to $C_{i+2}$, meaning that we generally must wait for $c = (rows(TSAP_{i+1}) + 1)$ valuations to $C_i$ to be sure that we have an upper bound on the diameter of $C_{i+1}$. If $\bar{\chi}(i+1) = 0$, then we need only wait for $c = rows(TSAP_{i+1})$ valuations to $C_i$, plus one extra time-step for the load to take effect upon $C_{i+1}$. Lastly, if $type(TSAP_{i+1}) = GC$, then $d_m(i+1) = d_m(i) \cdot (D_{i+1} - \chi(i+1)) + \chi(i+1)$ and $d_a(i+1) = d_a(i)$, where $D_{i+1}$ is defined as an upper-bound on the diameter of $C_{i+1} \cap TSAP_{i+1}$. For $\chi(i+1) = 0$ this result is obvious. Otherwise, note that any trace segment begins in one state of $TSAP_{i+1}$, and $c = (D_{i+1} - 1)$ transitions – which must initiate within $c \cdot d_m(i) + d_a(i)$ time-steps, plus one for the final transition to complete – is sufficient to put $TSAP_{i+1}$ into any of its subsequently-reachable states. Hence $d_m(i+1) = d_m(i) \cdot (D_{i+1} - 1) + 1$ time-steps satisfies our obligation.
We demonstrate the use of our structural diameter overapproximation algorithm for the netlist depicted in Figure 4.3. We have partitioned this example netlist into six components. The first component to the left is a CC containing only combinational logic, possibly including FREE vertices. Our algorithm provides $d_a(1) = 0$ and $d_m(1) = 1$; thus the diameter overapproximation $d$ that our algorithm would ascribe to any vertex in component 1 is 1. This result implies that we need to check such a target only for time-step 0 to provide an exact hit/unreachable answer, which is intuitive since component 1 does not act sequentially; any reachable valuation to the vertices in this component will be reachable every time-step, thus if a target cannot be hit at time 0, then it cannot be hit at any time-step. We next compose an AC onto this first component. Our algorithm provides $d_a(2) = 1$ and $d_m(2) = 1$; thus $d$ for any vertex in component 2 will be 2. This implies that we need only check such a target for time-steps 0 and 1, which is again intuitive since the time-0 check will validate whether the initial values can hit the target, and at time 1 any possible valuation to component 1 will propagate through component 2. We next compose another CC, which does not affect diameter; a two-step bounded check is complete since this bounds the diameter of component 2, and since component 3 does not act sequentially. Component 4 is an AC and adds one to diameter, which is correct as per the analysis of component 2. We next add an MC with two rows, which constitutes a cut of the netlist. It thus adds one to the $d_a$ sum and doubles the $d_m$ product, yielding a diameter bound of 5 for vertices in this component. This result is conservative since we need to wait at most for two additional time-steps over the diameter of component 4 to be sure that all possible load and data values will propagate into these two memory rows. Note that load values may be correlated hence mutually exclusive, which is why we must double $d_m$ to be sure that we have waited long enough for two loads to occur. Lastly, we compose another CC as component 6, which does not affect diameter as per the previous discussion.
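The $d_a$/$d_m$ accumulation that produces these numbers can be sketched as follows; this is a hypothetical encoding where each component is given as (type, rows-or-$D_i$, is-cut), with the cut indicator $\chi(i)$ supplied rather than computed from the netlist:

```python
def preprocess_diameter(components):
    """Sketch of the d_a/d_m accumulation of Figure 4.2. Each component
    is (ctype, n, cut): ctype in {"CC","AC","MC","QC","GC"}; n is the
    row count for an MC/QC or the bound D_i for a GC (ignored
    otherwise); cut corresponds to chi(i) = 1. Returns the
    per-component diameter bounds d_a(i) + d_m(i)."""
    dm, da = 1, 0
    bounds = []
    for ctype, n, cut in components:
        chi = 1 if cut else 0
        chibar = 1 - chi
        if ctype in ("MC", "QC"):
            dm = dm * (n + chibar)     # multiplicative element
            da = da + chi              # additive element
        elif ctype == "GC":
            dm = dm * (n - chi) + chi
        elif ctype == "AC":
            da = da + 1
        # a CC leaves both d_a and d_m unchanged
        bounds.append(da + dm)
    return bounds
```

For the six components of the example (CC, AC, CC, AC, two-row cut MC, CC) this yields the bounds 1, 2, 2, 3, 5, 5 discussed above.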
The following corollary is an immediate consequence of Theorem 4.2.
Corollary 4.1. Given an arbitrary TSAP of a netlist $N$, we may compositionally obtain a diameter bound by the algorithm of Figure 4.2 while using an arbitrary mechanism to obtain a diameter bound $D_i$ for each component $i$ in the context of $N$. Furthermore, the diameter bounds obtained by the algorithm of Figure 4.2 for each isolated CC, AC, MC, and QC are overapproximate regardless of the nature of the overall netlist.

[Figure 4.3 depicts the example netlist partitioned into six components (CC, AC, CC, AC, MC, CC), annotated with $d_a(1) = 0$, $d_m(1) = 1$, $d = 1$; $d_a(2) = 1$, $d_m(2) = 1$, $d = 2$; $d_a(3) = 1$, $d_m(3) = 1$, $d = 2$; $d_a(4) = 2$, $d_m(4) = 1$, $d = 3$; $d_a(5) = 3$, $d_m(5) = 2$, $d = 5$; $d_a(6) = 3$, $d_m(6) = 2$, $d = 5$.]

Figure 4.3: Diameter overapproximation example
Corollary 4.1 implies that we may use different techniques to obtain a diameter bound on the various components of a TSAP and still obtain an overall overapproximate bound. This is a noteworthy result, since general-purpose exact diameter calculation procedures are presently intractable (refer to Section 4.1), and since overapproximation techniques may yield results which are tight, or exponentially loose, or anywhere in between. However, different techniques may yield superior results on different components. Consider, for example, a complete state graph, such as that induced by a vector of FREE vertices driving a parallel vector of REGISTERs as illustrated by component 2 of Figure 4.3. Corollary 4.1 states that the diameter of this AC is at most two. However, use of an overapproximate recurrence diameter will yield an exponentially loose bound for this AC. In other cases, recurrence diameter may yield a tight bound. Our compositional approach therefore provides a theoretical framework to enable a robust mechanism for efficiently obtaining as tight a diameter bound as possible using a variety of techniques. Since diameter overapproximation enables the use of bounded verification algorithms instead of often more costly unbounded algorithms for assessing unreachability, this theory, coupled with further advances in diameter estimation techniques, may well become a powerful cornerstone of a robust, multi-faceted verification strategy.
Note that it is critical to obtain a bound on the diameter of each component in the context of its cone-of-influence. To visualize this, assume that a given GC is an $n$-bit counter with a parallel load input port. If the load input is asserted, the valuation at the parallel data port will be loaded into the counter; else the counter will increment. If we isolate this counter for semantic analysis, since every state may reach every other state via this load mechanism, the isolated counter will have a diameter of 2. However, in the context of $N$, the load input may be semantically equivalent to 0, or perhaps some valuations to the parallel data port are unsensitizable – thus implying a potentially exponentially greater diameter for this component in the context of $N$. In this example, the partitioned analysis enables state transitions in the component which are unreachable in the context of $N$. The diameter obtained on the isolated component may conversely be larger than that in the context of $N$ since unreachable states may become reachable. Therefore, we conclude that we cannot use a tight diameter bound obtained upon an isolated component without consideration of its cone of influence. However, use of recurrence diameter obtained from isolated analysis is conservative; the possible additional states and state transitions which are an artifact of partitioned analysis may only increase recurrence diameter. Similarly, we may use the number of states reachable in the isolated component instead of $2^{|R \cap TSAP_i|}$. Additionally, we discuss the impact of all of the abstraction techniques presented in this thesis upon diameter, thus enabling a transformation-based approach at calculating a diameter bound $D_i$ of $C_i \cap TSAP_i$ for each component $i$. Each transformation will render a component which may be recursively partitioned and analyzed using the theory presented in this chapter. For example, it is possible that an abstraction may yield a component which may be sub-partitioned into more "diameter-friendly" types such as ACs and MCs. Alternatively, due to the potential for REGISTER reduction inherent in our abstractions, semantic approaches are likely to become more efficient and yield tighter bounds.
Theorem 4.3. Let $N$ and $N'$ be netlists which are trace-equivalent with respect to vertex sets $A$ and $A'$ and bijective mapping $\psi : A \mapsto A'$. The diameter of $A$ is equal to that of $A'$.

Proof. This theorem follows immediately from Definitions 3.32 and 4.2.
While the result of Theorem 4.3 may seem obvious from the trace equivalence of $A$ and $A'$, it is somewhat counter-intuitive since the cones-of-influence of $A$ and $A'$ may be arbitrarily dissimilar. For example, the cone-of-influence of $A$ may be entirely combinational while that of $A'$ includes REGISTERs. Theorem 4.3 represents a powerful observation: we may derive a bound for the diameter of a set of vertices based upon analysis of another trace-equivalent set of vertices. We therefore could view our diameter of $A$ in the context of $N$ as being equivalent to that of $A$ with $coi(A) \setminus A$ being replaced by any arbitrary set of vertices which preserves trace-equivalence of $A$. While finding such a minimal safe replacement will often be computationally infeasible (similarly to the computational complexity of bisimilarity reductions [27]), we heuristically will consider specific replacements as those resulting from optimal solutions to our structural abstraction techniques. We will exploit this fact in later chapters to demonstrate how a diameter bound obtained upon a transformed (e.g., retimed) netlist implies a diameter bound on the original netlist.
4.1 Related Work
In this section we discuss prior research in diameter estimation techniques. Note that
breadth-first reachability analysis may be used to calculate the distance between states,
and thus yield a diameter bound. However, this approach is not practically useful, since a
reachability calculation from the initial states is sufficient to solve a verification problem.
The techniques of [57, 61] propose two uses of satisfiability algorithms to attempt to obtain a bound. First, quantified boolean formulae (QBF) are capable of providing tight diameter bounds, though their solution is PSPACE-complete [62] and effective heuristics have not yet been demonstrated. Second, recurrence diameter may in cases be tight, though in others may be exponentially loose (recall the AC example discussed after Corollary 4.1). In [61] it is further proposed to use a hybrid between these two approaches to attempt to partially alleviate their shortcomings. Both of these techniques rely on heavy semantic analysis which often outweighs the complexity of the BMC of the target itself, and which significantly limits their applicability to practical problems. Our structural approach consumes trivial resources, though it may also yield an exponentially loose solution in the case of GCs. However, for other component types our approach does provide near-tight bounds. Furthermore, our compositional theory allows a per-component hybrid use of structural vs. semantic techniques, hence these semantic approaches are complementary tools to enable our theory to obtain the tightest possible overapproximations with minimal resources.
The technique of [63] proposes using directed simulation to estimate diameter. The primary drawback of this approach, and a significant differentiating factor with respect to our technique, is that it constitutes neither an overapproximation nor an underapproximation of diameter, hence is not useful for enabling completeness of BMC. The computational resources reported in [63] also outweigh ours by several orders of magnitude.
In [17], we demonstrate that an acyclic netlist may be transformed into a purely combinational netlist as a special case of c-slow abstraction. However, the theory presented in this chapter yields a smaller netlist through unfolding in such cases due to obviating the need to represent "initial value selection" logic (refer to Chapter 9). Furthermore, this chapter generalizes feed-forward c-slow abstraction in allowing minimal unfolding of other types of cyclic logic (such as CCs and MCs).
4.2 Experimental Results
We defer experimental results for our diameter approximation technique until Section 6.4.3,
so that we may study its synergy with various abstractions.
Chapter 5
Redundancy Removal
In this chapter we discuss redundancy removal techniques, by which we mean transformations which structurally replace vertices in the netlist graph with semantically-equivalent vertices, thereby minimizing the total number of vertices in the cone-of-influence of a target. The common optimization technique of constant propagation [64, 65] is a special case of redundancy removal, which entails merging vertices onto ZERO or ONE. The intricacy of exploiting this technique lies in efficiently detecting as many semantically equivalent vertices as possible to achieve optimal reductions.
The crux of redundancy removal is the Merge algorithm of Figure 5.1, which moves all outgoing edges from one vertex v0 to another vertex v, and causes v0 to shadow v. In order to ensure soundness and completeness for invariant checking, the Merge function may generally only be applied to two vertices which are determined to be semantically equivalent. To ensure legality of the resulting netlist, and to ensure efficiency, we additionally require that we do not merge a vertex with a purely combinational fanin cone onto one which has a sequential fanin cone, and that we only merge a REGISTER onto another REGISTER or a constant vertex.¹ Practically, we may swap the merge arguments, or the order of merges, to circumvent this limitation – and often achieve superior reductions and
¹This rule may be relaxed, as long as care is taken not to introduce combinational cycles when merging a REGISTER onto an AND vertex.
void Merge(Vertex v, Vertex v0) {
  if (v == v0) { return; }
  foreach u ∈ outlist(v0) { Delete_Edge(v0, u); Add_Edge(v, u); }
  foreach u ∈ inlist(v0) { Delete_Edge(v0, u); }
  G(v0) = AND;
  Add_Edge(v, v0);
  foreach r ∈ R { if (Z(r) == v0) { Z(r) = v; } }
  if (v0 ∈ T) { T = (T \ {v0}) ∪ {v}; }
}
Figure 5.1: Structural Merge algorithm
run-time in doing so.
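To make the figure concrete, a minimal Python sketch of this Merge operation follows. The Netlist class, its dictionary-based adjacency lists, and the field names are hypothetical stand-ins for our actual data structures, not the implementation itself.

```python
class Netlist:
    """Toy netlist: adjacency lists, gate types, register initial values, targets."""
    def __init__(self):
        self.outlist = {}    # vertex -> list of sink vertices
        self.inlist = {}     # vertex -> list of source vertices
        self.gate_type = {}  # vertex -> "AND", "FREE", ...
        self.Z = {}          # register -> initial-value vertex
        self.targets = set()

    def merge(self, v, v0):
        """Move all fanout of v0 onto v; v0 becomes a 1-input AND shadowing v."""
        if v == v0:
            return
        # Redirect every outgoing edge of v0 to originate from v
        for u in list(self.outlist.get(v0, [])):
            self.outlist[v0].remove(u)
            self.inlist[u].remove(v0)
            self.outlist.setdefault(v, []).append(u)
            self.inlist[u].append(v)
        # Disconnect v0 from its sources
        for u in list(self.inlist.get(v0, [])):
            self.inlist[v0].remove(u)
            self.outlist[u].remove(v0)
        # v0 becomes a 1-input AND whose single input is v (it shadows v)
        self.gate_type[v0] = "AND"
        self.outlist.setdefault(v, []).append(v0)
        self.inlist[v0] = [v]
        # Repair initial-value references and the target set
        for r, init in self.Z.items():
            if init == v0:
                self.Z[r] = v
        if v0 in self.targets:
            self.targets.discard(v0)
            self.targets.add(v)
```

Note that, as in the figure, the shadowed vertex v0 keeps a single input from v so that any external references to it (e.g., from initial-value cones or targets) remain semantically consistent.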
Theorem 5.1. Algorithm Merge(v0, v) does not alter the semantics of any vertex in N, provided that v and v0 are semantically equivalent.

Proof. We note that after the merge, vertices v and v0 are still semantically equivalent since v0 shadows v. We therefore conclude by Definition 3.12 that any trace which is legal before a Merge is legal after and vice-versa.

Theorem 5.1 implies that redundancy removal is sound and complete for invariant checking. Furthermore, this theorem implies that trace lifting merely requires a call to Simulate to propagate consistent valuations to any merged vertices.
Theorem 5.2.Redundancy removal generates a legal netlist.
Proof. We consider the requirements for legality enumerated in Definition 3.24.
1. The only gates modified by redundancy removal become one-input AND gates, which
are legal. All other gates are legal by assumption.
2. No gates are created by redundancy removal, hence the resulting netlist is finite by
assumption.
3. Redundancy removal cannot merge a combinationally-driven vertex onto a sequentially-driven one. Initial value cones are combinational by assumption, thus cannot be made sequential by redundancy removal.

4. Any REGISTER transformed by redundancy removal is merged onto another REGISTER, or onto ZERO or ONE. Therefore, each directed cycle remains sequential, or is broken.
Theorem 5.3. If the diameter of a set of vertices U of a redundancy-removed netlist is d(U), then the diameter of U prior to redundancy removal is also d(U).

Proof. This proof follows from Theorem 4.3, letting N represent the netlist before redundancy removal, and N′ represent the netlist after redundancy removal. Clearly U and U′ are trace-equivalent vertices with respect to the correspondence defined as the set of tuples ⟨u, u′⟩ for each u ∈ U.
5.1 Redundancy Removal Algorithms
In this section we discuss efficient algorithms and data structures for performing redun-
dancy removal. The core redundancy removal algorithm is the Merge function defined in
Figure 5.1. However, the intricacy of exploiting redundancy removal lies in efficiently detecting as many semantically equivalent vertices as possible to achieve optimal reductions. There are two distinct, complementary approaches for this detection. On-the-fly compression minimizes vertex count during netlist construction [50, 25] by exploiting constant folding techniques (such as a ∧ ¬a = 0) and by merging isomorphic vertices. Post-processing techniques such as BDD sweeping [51] are used to identify semantically-equivalent vertices which are too structurally dissimilar to be identified as such by the efficient yet limited on-the-fly techniques. On-the-fly techniques augment the use of semantic approaches by keeping the original netlist representation as compact as possible, and by exploiting the merging initiated by the semantic analysis.
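The flavor of on-the-fly compression can be illustrated with a small Python sketch of hash-consed AND construction over a purely combinational AND/INVERTER structure. The integer-literal encoding (even ids are non-inverted nodes, the low bit denotes inversion, 0 is ZERO and 1 is ONE) and the method names are assumptions for illustration; the full technique additionally carries the REGISTER edge attributes of Section 5.1.

```python
class AIG:
    """Hash-consed combinational AND/INVERTER structure (illustrative)."""
    def __init__(self):
        self.table = {}   # (lhs, rhs) -> node id, for isomorphic-vertex merging
        self.next_id = 2  # 0 = ZERO, 1 = ONE; low bit of a literal = inversion

    @staticmethod
    def neg(e):
        return e ^ 1  # invert a literal by toggling its low bit

    def new_free(self):
        v = self.next_id
        self.next_id += 2
        return v

    def new_and(self, a, b):
        a, b = min(a, b), max(a, b)      # canonical order (commutativity)
        if a == 0:
            return 0                     # ZERO AND x = ZERO
        if a == 1:
            return b                     # ONE AND x = x (identity)
        if a == b:
            return a                     # idempotency
        if a == self.neg(b):
            return 0                     # x AND NOT x = ZERO (contradiction)
        key = (a, b)
        if key not in self.table:        # merge isomorphic vertices via hashing
            self.table[key] = self.next_id
            self.next_id += 2
        return self.table[key]
```

Constant folding and hashing here guarantee that structurally identical conjunctions always return the same node id, which is precisely what makes subsequent Merge calls unnecessary for isomorphic vertices.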
In this section we focus upon our technique for on-the-fly redundancy removal from [25], which is collaborative work with Andreas Kuehlmann. This technique, termed on-the-fly retiming, is based upon an AND/INVERTER/REGISTER graph representation of the netlist described in Definition 5.1. The AND/INVERTER/REGISTER graph is a powerful and compact netlist representation, useful not only for redundancy removal but also for a retiming implementation (as discussed in Chapter 6) since it closely matches a retiming graph representation.
Definition 5.1. An AND/INVERTER/REGISTER graph representation of a netlist is a graph where all vertices are of type AND or FREE; INVERTERs and REGISTERs are represented implicitly as edge attributes. With each edge (u, v) ∈ E we associate a tuple ⟨w, E, i⟩.

• Weight w(u, v) ∈ Z represents the number of REGISTERs along this edge.

• Term E = E_uv^1, …, E_uv^{w(u,v)} represents the corresponding sequence of initial values for the REGISTERs along this edge.

• Term i(u, v) ∈ {0, 1} is an inverted attribute indicating whether the edge function is complemented; if 1, the corresponding INVERTER is at the fanout of any REGISTERs along the edge.
We may map an AND/INVERTER/REGISTER graph to a netlist as demonstrated in Figure 5.2. For each edge (u, v) in the AND/INVERTER/REGISTER graph, there will be a structural path in the netlist beginning with vertex u and ending with vertex v. This path will include a sequence of w(u, v) intermediate REGISTERs whose initial values are determined by the sequence E_uv^i. Additionally, if i(u, v) is 1, there will be an INVERTER at the fanout of the sequence of REGISTERs.
Figure 5.2: Mapping the AND/INVERTER/REGISTER graph to a netlist: (a) AND/INVERTER/REGISTER graph edge, (b) corresponding netlist fragment
The direct mapping to a netlist as depicted in Figure 5.2 does not account for the concept of fanout REGISTER sharing as proposed in [66]. As is depicted in Figure 5.3, the REGISTERs along all outgoing edges from a given source vertex may be shared, provided that their initial value sequences are compatible. A more efficient mapping of an AND/INVERTER/REGISTER graph to a netlist should account for fanout REGISTER sharing, hence will generate only the maximum number of REGISTERs along any outgoing edge from a vertex, rather than the sum across all outgoing edges.
Note that an individual gate of the resulting netlist is identifiable by a source vertex u and a set of attributes ⟨w, E, i⟩. For this reason, our algorithms for constructing gates in the AND/INVERTER/REGISTER graph provided in Figures 5.4 and 5.5 return ⟨vertex, attribute⟩ tuples, and take such tuples as operands.

Edge weights w will always be non-negative given that our only sequential elements are REGISTERs. However, we introduce the NEGATIVE REGISTER in Chapter 6. Allowing
Figure 5.3: Fanout REGISTER sharing example (outgoing edges of vertex u with attributes w = 2, i = 1; w = 1, i = 0; and w = 0, i = 1)
negative weights enables use of the AND/INVERTER/REGISTER graph for netlists containing NEGATIVE REGISTERs.

As discussed, the functions represented by two edges are semantically equivalent if (though not necessarily "only if") they have: (1) the same source vertex, and (2) the same ⟨w, E, i⟩ attributes; in this case, they correlate to the same netlist gate. In our implementation of this data structure, we use a compact 64-bit word to uniquely represent these tuples. This word is composed of four bit fields: an index into the array of graph vertices, the number of edge REGISTERs, an index to a canonical representation of their initial values, and a single bit to indicate edge complementation. Using this data structure, a simple comparison of two words may decide whether two edges are semantically equivalent or inverted.
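A Python sketch of such a packed edge word follows. The particular field widths (31/16/16/1 bits) are illustrative assumptions rather than our actual layout.

```python
# Illustrative bit-field widths for the 64-bit edge word (assumed, not actual):
# vertex index | REGISTER count | initial-value index | inverted bit
V_BITS, W_BITS, E_BITS = 31, 16, 16

def pack_edge(vertex_idx, num_regs, init_idx, inverted):
    """Pack a <vertex, w, E-index, i> tuple into a single word."""
    assert vertex_idx < (1 << V_BITS)
    assert num_regs < (1 << W_BITS)
    assert init_idx < (1 << E_BITS)
    word = vertex_idx
    word = (word << W_BITS) | num_regs
    word = (word << E_BITS) | init_idx
    word = (word << 1) | (inverted & 1)
    return word

def unpack_edge(word):
    """Recover the <vertex, w, E-index, i> tuple from a packed word."""
    inverted = word & 1
    word >>= 1
    init_idx = word & ((1 << E_BITS) - 1)
    word >>= E_BITS
    num_regs = word & ((1 << W_BITS) - 1)
    word >>= W_BITS
    return word, num_regs, init_idx, inverted

# Two edges are semantically equivalent iff their words are equal, and
# complements of one another iff the words differ only in the low bit.
def equivalent(w1, w2):
    return w1 == w2

def inverse(w1, w2):
    return (w1 ^ w2) == 1
```

The payoff of this encoding is that both equivalence and complementation checks reduce to single-word integer comparisons.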
The canonical representation of initial values is based upon a tree structure where the paths correspond to sequences of initial values of the edges of N. The tree root is a NULL dummy node. The first level of children represents the initial values of the first REGISTER along any edge. The tree branching structure corresponds to the different combinations of initial values of all edges. By ensuring uniqueness of the individual paths and subpaths during tree construction and manipulation, a pointer to any of the tree nodes provides a representation that is canonical for that particular set of initial values. (However, the initial value vertices themselves may not be canonical.)
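This canonicalization can be sketched as a hash-consed trie in Python; the class shape and method names are hypothetical.

```python
class InitValueTree:
    """Trie over initial-value sequences: unique paths give canonical node ids."""
    def __init__(self):
        self.children = {}  # (parent_id, value) -> node_id
        self.root = 0       # dummy NULL root
        self.next_id = 1

    def intern(self, values):
        """Return the canonical node id for a sequence of initial values."""
        node = self.root
        for v in values:
            key = (node, v)
            if key not in self.children:
                self.children[key] = self.next_id
                self.next_id += 1
            node = self.children[key]
        return node
```

Because each (parent, value) pair maps to exactly one child, two edges with the same initial-value sequence always intern to the same node id, and shared prefixes share trie nodes.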
The technique of on-the-fly retiming is used to eliminate sequential redundancy by applying specific, local retiming [66] steps during the construction of the AND/INVERTER/REGISTER graph. Similar to the use of an AND/INVERTER graph for combinational netlists [50], this approach may result in a significant compaction of the netlist representation without significant time or memory overhead. The on-the-fly retiming step is integrated into our algorithms for constructing AND gates and REGISTERs. We demonstrate the algorithm for 2-input AND gates in Figure 5.4, and for REGISTERs in Figure 5.5. The operands to these functions each comprise a source vertex and a set of edge attributes, correlating to an AND/INVERTER/REGISTER graph edge without a sink vertex. The graph construction begins from FREE and ZERO vertices and an arbitrary set of REGISTER cuts of the cyclic logic. For each cut, first a dummy 0-input AND vertex is created and used as a place-holder. Once the next-state function of the corresponding REGISTER is built, the place-holder is merged onto that structure.
The AND construction algorithm first performs constant folding similar to methods applied in combinational netlist compaction [50]. Next, the REGISTER sequences along both edges are truncated by "dragging" as many REGISTERs as possible through the AND vertex. The edge truncation is performed by function Truncate_Registers; note that for inverted input edges, these truncated initial values must be inverted before being returned. The initial values of the dragged REGISTERs are computed by a pairwise AND of the truncated initial values. This operation is performed by the function And_Initial_Values. If a pre-existing node corresponding to this conjunction is found in the initial value tree, it is reused; otherwise, a new node is constructed. We next form an AND vertex over the truncated edges. We swap incoming edges to the AND vertex to capture commutativity using an arbitrary ordering function Rank, then check to see if an isomorphic AND vertex exists. If so, it is reused; otherwise, a new AND vertex is created and hashed. The edge correlating to the set of dragged REGISTERs is then returned with the source AND vertex. For an AND gate with n inputs, the computational resources required for each call to this algorithm are O(n · wmin) due to REGISTER dragging and initial value conjunction; the other operations require constant time, assuming a constant-time hashing function.
/* Create_And takes two operand edges e1 and e2, and
   returns an edge representing their conjunction */
AIR_Edge Create_And(AIR_Edge e1, AIR_Edge e2) {
  if (e1 == ZERO) return ZERO;
  if (e2 == ZERO) return ZERO;
  if (e1 == ONE) return e2;
  if (e2 == ONE) return e1;
  if (e1 == e2) return e1;
  if (e1 == ¬e2) return ZERO;
  /* Truncate as many REGISTERs as possible from each
     edge, and store them as edge attributes in Ei */
  wmin = min{w(e1), w(e2)};
  (e1′, E1) = Truncate_Registers(e1, wmin);
  (e2′, E2) = Truncate_Registers(e2, wmin);
  /* Merge the initial values by ANDing them */
  E = And_Initial_Values(E1, E2);
  /* Apply ranking to exploit commutativity */
  if (Rank(e1′) > Rank(e2′)) Swap(e1′, e2′);
  /* Hash lookup for AND over e1′ and e2′ */
  e = Hash_Lookup(e1′, e2′);
  /* Create & hash new vertex if lookup fails */
  if (e == NULL) { e = Create_And_Vertex(e1′, e2′); }
  /* Add back dragged REGISTERs */
  return (e, ⟨wmin, E, 0⟩);
}
Figure 5.4: AND/INVERTER/REGISTER-graph algorithm for AND gate creation
/* Create_Register takes two operand edges: en representing
   the input to the REGISTER, and ei representing
   its initial value. It returns an edge representing
   the corresponding REGISTER */
AIR_Edge Create_Register(AIR_Edge en, AIR_Edge ei) {
  if ((en == ZERO) ∧ (ei == ZERO)) return ZERO;
  if ((en == ONE) ∧ (ei == ONE)) return ONE;
  i = i(en);
  /* Drag inversion past REGISTER */
  if (i) { ei = Create_Inverter(ei); }
  (e, E) = Create_Edge(en, ei);
  return (e, ⟨w(en) + 1, E, i⟩);
}
Figure 5.5: AND/INVERTER/REGISTER-graph algorithm for REGISTER creation
The REGISTER construction algorithm first attempts to replace the REGISTER with a constant. If unsuccessful, it drags any inversion past the REGISTER being created by inverting the corresponding initial value using function Create_Inverter. Create_Inverter merely toggles the inversion attribute of the corresponding edge. Create_Edge looks for a node in the initial value tree correlating to ei as a child of en; if it finds one, it reuses this node, otherwise a new node is constructed. Each call to this algorithm requires constant time, assuming a constant-time hashing function.
We introduce an example netlist in Figure 5.6a, and its corresponding AND/INVERTER/REGISTER graph in Figure 5.7a. If this graph is created using the on-the-fly retiming algorithms of Figures 5.4 and 5.5, the resulting AND/INVERTER/REGISTER graph is depicted in Figure 5.7b, corresponding to the netlist of Figure 5.6b. The graph was constructed from the original netlist shown in Figure 5.6a starting from the FREE vertices and a cut at REGISTER r1.
Figure 5.6: On-the-fly retiming example: (a) original netlist, (b) netlist after on-the-fly retiming
Lemma 5.1. The on-the-fly retiming transformations of Figures 5.4 and 5.5 are sound and complete for invariant checking.

Proof. We consider the individual transformations.

• The following transformations are correct by propositional logic.

– Conjunction with ZERO yields ZERO.
– Conjunction of opposite polarity literals yields ZERO (contradiction).
– Conjunction of a literal with ONE is semantically equivalent to that literal (identity).
53
0
1
0
0
0
1
1
(b)
(a) y1y2g4g3g1=g2
y1y2g2g4g3w = 2 w = 1w = 1w = 1w = 2
w = 2w = 1
w = 1
g5
g5
x3x1x2x3x1x2
w = 1
g1
edge inversionreference to node of initial value tree
set ofw REGISTERsw
Figure 5.7: AND/INVERTER/REGISTER graph example: (a) graph of original netlist ofFigure 5.6a, (b) graph of on-the-fly retimed netlist of Figure 5.6b
– Conjunction of identical literals is semantically equivalent to that literal (idempotency).
– Swapping of incoming edges to an AND vertex is semantically correct by the commutativity of conjunction.
– Elimination of pairs of adjacent INVERTERs is semantically correct (double negation).

• The following transformations are correct by semantic equivalence (Theorem 5.1).
– Re-use of an existing AND or INVERTER with identical inputs is semantically
correct.
– Re-use of an existing REGISTER with an identical input and initial value is
semantically correct.
– Replacement of a REGISTER with a constant initial value, whose input is the same constant, with the corresponding constant is semantically correct.

• An INVERTER dragged past a REGISTER is semantically equivalent to the original REGISTER (without dragging). Because we invert the initial value of the bypassed REGISTER, at time 0 the dragged INVERTER drives the inversion of the inverted initial value, equivalent to the value of the original REGISTER by the double negation property of propositional logic. Thereafter, the dragged INVERTER will drive the negation of the valuation that appeared at the source of the undragged INVERTER one time-step earlier, as will the original un-bypassed REGISTER.
one time-step earlier, as will the original un-bypassed REGISTER.� The last of thewmin REGISTERs dragged beyond an AND vertex is semantically
equivalent to the original unbypassed AND vertex. At time-stepi 2 [0; wmin � 1℄,this last dragged REGISTER drives the(wmin � i)-th dragged initial value, which
is equivalent to the valuation to the unbypassed AND vertex at the same time-step
because we conjunct initial values of the dragged REGISTERs. Thereafter, this last
dragged REGISTER drives the conjunction of valuations to the sources of the by-
passed AND from wmin time-steps earlier; valuations to the sources of the bypassed
AND arewmin time-steps earlier than those of the unbypassed AND.
Note that each Merge call requires linear resources with respect to netlist size. Practically, the resources tend to be near constant, since most vertices have a relatively small indegree and outdegree, and since we may hash REGISTER initial values to avoid needing to explicitly check each one during Merge. After merging, the merged vertex will have zero sinks, and will not be in the cone of influence of T. Therefore, optimal redundancy removal may be achieved within |V| calls to Merge, which overall bounds necessary resources to quadratic. After a vertex is merged, it is often beneficial to analyze its original sinks to see if they too may be candidates for simplification – we refer to this recursive forward sweeping of simplification as forward hashing. However, we must take care not to get caught in an infinite loop of dragging REGISTERs through cyclic logic when redundancy removal includes on-the-fly retiming. This may be enforced by setting and clearing visited flags as forward hashing processes vertices. If the visited flag of a vertex is already set, forward hashing neglects processing that vertex to prevent infinite recursion.
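A Python sketch of forward hashing with visited flags follows. The netlist interface assumed here (sinks, try_simplify, merge) is a hypothetical stand-in for our actual data structures.

```python
def forward_hash(netlist, vertex, visited=None):
    """Recursively revisit the sinks of a vertex as simplification candidates.
    Visited flags prevent infinite recursion through cyclic logic."""
    if visited is None:
        visited = set()
    if vertex in visited:       # flag already set: skip to avoid looping
        return
    visited.add(vertex)         # set the visited flag while processing
    for sink in list(netlist.sinks(vertex)):
        replacement = netlist.try_simplify(sink)  # hypothetical hook
        if replacement is not None:
            netlist.merge(replacement, sink)
            # the surviving vertex's sinks may now simplify too: sweep forward
            forward_hash(netlist, replacement, visited)
    visited.discard(vertex)     # clear the flag once processing completes
```

Without the visited set, dragging REGISTERs around a cycle could re-trigger simplification of the same vertices indefinitely; the flag bounds the sweep to each vertex once per recursion path.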
5.2 Related Work
Redundancy removal has been the topic of numerous prior research efforts; our contribution
to this area is the technique of on-the-fly retiming. Our AND/INVERTER/REGISTERgraph
is a sequential extension to the AND/INVERTER graph proposed in [50] which enables
sequential redundancy removal by relocation of REGISTERs across combinational vertices
in the graph. This relocation correlates to the applicationof specific retiming moves [66].
Semantic approaches of redundancy removal provide an extension to on-the-fly techniques. If they determine two vertices to be semantically equivalent, they merge one onto the other. As an example, the technique of [51] iterates between BDD-sweeping and SAT solving with resource bounds. BDDs are canonical representations of the function of a vertex, and are built up to a specified upper-bound size limit. BDD hashing is used to determine whether vertices are semantically equivalent, in which case they are merged, or opposite, in which case one is merged onto the inversion of the other. When BDDs become too large, intermediate cut vertices are used to allow BDDs to be built beginning from arbitrary points in the netlist, not just FREE or REGISTER points. SAT is also used to prove
equivalence or inversion as an alternate algorithm which may in cases outperform BDDs.
The application of on-the-fly redundancy removal may be synergistically combined with
such semantic approaches. The integration of on-the-fly retiming furthermore extends the
equivalence checking capability of these techniques beyond combinational verification to
potentially cover a significant class of practical problems in verifying retimed netlists [67].
The benefit of specific forms of redundancy removal for enhancing verification has
been noted in numerous prior publications, such as [64, 65].
5.3 Experimental Results
The experimental results of our redundancy removal techniques will be provided in Sec-
tion 6.4.1 so that we may study their synergy with retiming.
Chapter 6
Generalized Retiming
In this chapter we discuss the use of generalized min-area retiming to reduce verifica-
tion complexity. This chapter extends the results of collaborative work with Andreas
Kuehlmann reported in [10, 25]. Retiming is a structural optimization technique which
relocates REGISTERs in a netlist across combinational gates with the objective of minimizing their total count, minimizing the greatest combinational delay along any directed path containing no REGISTERs, or minimizing one objective while constraining the potential increase of the other [68, 66]. For synthesis purposes, the latter is the most common objective since minimization of REGISTER count is often contrary to minimization of the worst-case combinational delay, thus simultaneous minimization of these two objectives is typically necessary. However, for invariant checking we are not concerned about combinational delays, hence the minimization of REGISTER count – which is referred to as min-area retiming – is our primary objective. Unlike the on-the-fly retiming technique from the previous chapter, traditional retiming alters the semantics of the netlist by causing gates to be temporally shifted.

The traditional use of retiming is for enhanced synthesis, which imposes two constraints that fundamentally limit its solution space: the retimed netlist must be physically implementable, and the retiming must preserve the original input-output behavior of the
netlist. For verification, these restrictions may be lifted, which results in a larger solution space, hence a potentially significantly greater reduction in REGISTER count. There are three generalizations of classical retiming that may be exploited in a verification domain. First, REGISTERs which are sourced by FREE vertices or have no sinks represent a mere temporal shift of peripheral values, thus may be suppressed for state space traversal using the technique of peripheral retiming. Second, a temporally partitioned invariant check eliminates the restriction that the retimed netlist must have an equivalent reset state. Third, verification algorithms may handle NEGATIVE REGISTERs, which are formalized in Definition 6.2. This significantly increases the solution space for legal retimings by removing the non-negativity constraints from the problem formulation. In this chapter we explore these topics, extending the results we reported in [10].
Retiming is traditionally applied to a rigid netlist graph and repositions the REGISTERs without altering the combinational logic structure. When interleaved with redundancy removal, a repeated application of retiming may significantly optimize the overall netlist structure [10, 69, 70]. In this chapter we additionally introduce our technique of fanin REGISTER sharing from [25], which is analogous to the original concept of fanout REGISTER sharing [66]. This technique takes a new view of the retiming formulation by departing from a traditional, more restrictive use of a fixed netlist structure.
6.1 Retiming Formulation

In this section we define retiming, and discuss its formulation. The retiming optimization problem may be formulated as an Integer Linear Program (ILP) using a directed graph model of the netlist [66] which represents REGISTERs implicitly as edge weights, similarly to our AND/INVERTER/REGISTER graph introduced in Section 5.1. For simplicity of exposition, all theory of this chapter is developed according to this representation; refer to Section 5.1 for a precise mapping of this representation to a netlist.
Definition 6.1. A retiming of netlist N is a gate labeling r : V → Z, where r(v) is the lag of vertex v denoting the number of REGISTERs that are moved backward through it.

The retimed edge weights w̃ of the retimed netlist Ñ are computed as follows.

    w̃(u, v) = w(u, v) + r(v) − r(u)    (6.1)

Traditional retiming also imposes non-negativity constraints upon w̃(u, v).

    w̃(u, v) ≥ 0    (6.2)

For min-area retiming, we are interested in minimizing the total number of REGISTERs of Ñ.

    min Σ_{(u,v) ∈ E} w̃(u, v)    (6.3)
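To make objective (6.3) under constraints (6.1) and (6.2) concrete, the following Python sketch solves a tiny min-area instance by brute force over a bounded lag range. This is a toy stand-in for an ILP solver, and no REGISTER sharing is modeled.

```python
from itertools import product

def min_area_retiming(vertices, edges, lag_range=range(-2, 3)):
    """edges: dict (u, v) -> weight w(u, v). Returns (lags, cost) minimizing
    the sum of retimed weights w~(u, v) = w(u, v) + r(v) - r(u)   (Eq. 6.1)
    subject to w~(u, v) >= 0                                      (Eq. 6.2)."""
    best_lags, best_cost = None, float("inf")
    for lags in product(lag_range, repeat=len(vertices)):
        r = dict(zip(vertices, lags))
        retimed = {e: w + r[e[1]] - r[e[0]] for e, w in edges.items()}
        if all(w >= 0 for w in retimed.values()):   # non-negativity
            cost = sum(retimed.values())            # objective (6.3)
            if cost < best_cost:
                best_lags, best_cost = r, cost
    return best_lags, best_cost
```

For example, a cycle a → b → d → a and a → c → d with two REGISTERs on each of the edges leaving a retimes down from four REGISTERs to two (both moved onto the edge d → a), since total weight around any cycle is invariant under (6.1).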
6.1.1 Fanout REGISTER Sharing

The above retiming formulation does not consider fanout REGISTER sharing as depicted in Figure 5.3. Leiserson and Saxe [66] provide an extension to the retiming formulation to account for fanout sharing, as depicted in Figure 6.1a. Their approach adds a dummy vertex for each netlist vertex v with an outdegree greater than one. (No semantics are applied to these dummy vertices; they are merely temporary artifacts of the ILP graph.) This dummy vertex will sink all vertices in outlist(v), and edge weights are modified as shown. Let wmax represent the maximum weight of any of these outgoing edges, and n = outdegree(v). Each weight wi is divided by n, and the new edges to the dummy vertex are assigned a weight equal to the difference between wmax/n and the modified weight of the corresponding fanout edge. This fractional weight is realized by associating a "cost per unit weight" of 1/n with each edge (u, v), and minimizing the total weighted cost in the objective function (6.3). Note that the sum of all edge weights in this "sharing subnetlist" is equal to wmax, and that the retiming formulation accounts for w̃max in the overall minimization problem because at least one incoming edge to the dummy vertex
Figure 6.1: REGISTER sharing: (a) ILP model of fanout REGISTER sharing, (b) extension to fanin REGISTER sharing
will have a weight of zero in any optimal solution. This precisely models fanout REGISTER
sharing for the ILP formulation.
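The weight bookkeeping can be checked on a small instance in Python, using the fanout weights w = 2, 1, 0 of the Figure 5.3 example; the helper name is hypothetical.

```python
from fractions import Fraction

def fanout_sharing_weights(fanout_weights):
    """Model of Figure 6.1a: each fanout weight w_i becomes w_i / n, and the
    edge into the dummy vertex gets wmax/n - w_i/n, so the subnetlist's total
    weight is exactly wmax. Each such edge carries a cost of 1/n per unit
    weight in objective (6.3)."""
    n = len(fanout_weights)
    wmax = max(fanout_weights)
    modified = [Fraction(w, n) for w in fanout_weights]
    dummy = [Fraction(wmax, n) - m for m in modified]
    return modified, dummy

modified, dummy = fanout_sharing_weights([2, 1, 0])
# 2/3 + 1/3 + 0 on the fanout edges, 0 + 1/3 + 2/3 on the dummy edges:
# the subnetlist totals wmax = 2, counting the shared REGISTERs once.
assert sum(modified) + sum(dummy) == 2
```

Exact rationals (Fraction) are used so the wmax invariant can be checked without floating-point error; an actual ILP formulation would carry the 1/n factor in the objective coefficients instead.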
6.1.2 Fanin REGISTER Sharing
As we proposed in [25], the concept of fanout REGISTER sharing may be extended to fanin REGISTER sharing. If the vertices represent completely symmetric boolean functions, then all possible tree configurations establish a valid decomposition of their function. In our framework, we use multi-input AND vertices as a base system for fanin sharing. However, the presented concepts are equally applicable to other completely symmetric functions (e.g., OR and XOR vertices).
Figure 6.1b shows how the concept of fanout REGISTER sharing may be adapted to fanin REGISTER sharing. A dummy vertex for fanin sharing is created for each AND vertex v with an indegree of three or greater (since decomposition of a 2-input AND is not useful). This dummy vertex sources new incoming edges into all vertices of inlist(v), and the edge weights are modified as shown, analogously to the modeling of fanout REGISTER sharing. With this configuration, the retiming optimization problem will minimize the maximum number of REGISTERs at any of the fanin edges to v, rather than their sum. Once a min-area retiming is computed, a simple algorithm may be used to decompose a multi-input AND vertex into a tree of 2-input AND vertices to enable maximal sharing of REGISTERs.
Figure 6.2: Decomposition of an AND vertex that requires only wmax = k + n + m REGISTERs: (a) vertex with incoming edges sorted by weight, (b) corresponding AND tree
The scheme for flexible tree decomposition that requires only w̃max REGISTERs is
illustrated in Figure 6.2. The algorithm first sorts the incoming edges to an AND vertex by
their retimed weight. Next, an AND tree is built using the structure of Figure 6.2b. For each
set of incoming edges with identical weight, a balanced AND subtree is constructed. The
individual subtrees are then connected by REGISTERs in a linear sequence. The number
of REGISTERs assigned to the edges between the subtrees is equal to the difference of
their REGISTER count. This construction, and the calculation of the corresponding initial
values, may be performed by building a series of 2-input AND vertices from the highest-
weight incoming edge to the lowest-weight incoming edge using the on-the-fly retiming
algorithm of Figure 5.4.
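This decomposition can be sketched in Python as follows. The tuple-based node encoding is a hypothetical stand-in for our graph structures, and the equal-weight subtrees are built as chains rather than balanced trees for brevity.

```python
from itertools import groupby

def decompose_and(fanins):
    """fanins: list of (signal, weight) pairs. Returns nested tuples where
    ('AND', a, b) is a 2-input AND and ('REGS', k, x) is k REGISTERs fed by x.
    Subtrees over equal-weight fanins are chained by REGISTERs covering the
    weight differences, so only wmax REGISTERs appear on the deepest path."""
    fanins = sorted(fanins, key=lambda sw: sw[1], reverse=True)
    tree, tree_weight = None, None
    for weight, group in groupby(fanins, key=lambda sw: sw[1]):
        # subtree over the equal-weight fanins (chain here for brevity)
        sub = None
        for signal, _ in group:
            sub = signal if sub is None else ("AND", sub, signal)
        if tree is None:
            tree, tree_weight = sub, weight  # start from the highest weight
        else:
            # REGISTERs equal to the weight difference, then AND the subtrees
            tree = ("AND", ("REGS", tree_weight - weight, tree), sub)
            tree_weight = weight
    # any weight common to all fanins is placed once at the output
    return ("REGS", tree_weight, tree) if tree_weight else tree
```

Because consecutive subtrees are separated by REGISTERs counting only the weight difference, the total REGISTER count along the tree telescopes to the maximum fanin weight, matching the w̃max bound of Figure 6.2.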
For maximum fanin sharing, the netlist graph is first restructured to form maximal AND vertices – i.e., we iteratively subsume AND vertices which are connected without inversion or sequential elements into "larger" multi-input AND vertices. Next, a retiming graph with the dummy vertices for fanin and fanout sharing is built. Note that for any edge e involved in simultaneous fanin and fanout sharing, a splitting one-input AND vertex must
be introduced between its endpoints to disambiguate a retiming solution during resynthesis
of the retiming graph. After computing the optimal retiming, a two-input AND graph may
be rebuilt using the procedures depicted in Figures 5.3 and 6.2 to obtain a minimum number
of REGISTERs.
Figure 6.3: Retiming graph for netlist of Figure 5.6: (a) original graph with 3 REGISTERs, (b) optimal solution resulting in 2 REGISTERs
Figure 6.3 depicts the retiming graph for the example netlist of Figure 5.6b. Part (a) provides the edge weights for the original netlist. The top portion of the graph depicts the dummy vertex modeling the possible sharing of fanout REGISTERs of vertex g1=g2. The bottom portion models the possible sharing of fanin REGISTERs of vertex g3=g4. Note that we initially arbitrarily assigned the two REGISTERs between gates g1=g2 and g3=g4 to the input portion of gate g3=g4. An assignment to the output portion of gate g1=g2 would yield identical results. Part (b) shows the resulting weights from the ILP solver, which corresponds to an optimal retiming.
6.1.3 Relaxing Input-Output Equivalence Constraints
The original definition of retiming for synthesis requires the preservation of input-output
semantic equivalence. One consequence of this requirementis that the sequential weight
of any path from a design input (which correlates to a FREE vertex in our framework) to a
design output (which correlates to a sinkless vertex in our framework) must be unchanged
through retiming. Leiserson and Saxe [68] propose enforcing this constraint by introducing a special host vertex, which sources edges to all FREE vertices, and sinks edges from all outputs. Retiming the host therefore merely shifts REGISTERs across these peripheral vertices rather than discarding them, thereby preserving path weights.
Figure 6.4a depicts a netlist with six REGISTERs R1, ..., R6, two FREE vertices a and b, and one sinkless target t. The initial values of the REGISTERs are Z(R1) = ONE, Z(R2) = ZERO, Z(R3) = ZERO, Z(R4) = ONE, Z(R5) = ZERO, and Z(R6) = ZERO.
Figure 6.4b shows the retiming graph for this netlist including the host vertex. The edge
labels denote the number of REGISTERs along the corresponding nets.
[Figure 6.4: Retiming example: (a) original netlist, (b) corresponding retiming graph]
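As a concrete illustration of the host-vertex construction, the following Python sketch builds such a retiming graph; the dict-based graph encoding and the vertex name "host" are our own illustrative assumptions, not the dissertation's actual data structures:

```python
def build_retiming_graph(edges, free_vertices, outputs):
    """Sketch of a Leiserson-Saxe-style retiming graph with a host vertex.

    edges: dict mapping (u, v) -> REGISTER count w(u, v).
    free_vertices: design inputs (FREE vertices); outputs: sinkless vertices.
    The host sources zero-weight edges to all FREE vertices and sinks
    zero-weight edges from all outputs, so every input-to-output path closes
    into a cycle whose total weight no retiming can change."""
    graph = dict(edges)
    for v in free_vertices:
        graph[("host", v)] = 0      # host -> input edge
    for v in outputs:
        graph[(v, "host")] = 0      # output -> host edge
    return graph

# Tiny example: input a, gate g1, target t, with 1 and 2 REGISTERs on the edges.
g = build_retiming_graph({("a", "g1"): 1, ("g1", "t"): 2}, ["a"], ["t"])
```

Because the host closes every input-to-output path into a cycle, and retiming preserves cycle weights, path weights from inputs to outputs are preserved exactly as the text describes.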
For verification purposes, REGISTERs at the peripheral vertices of a netlist represent mere temporal offsets and do not impact the reachability of the netlist core [71]. Thus, they may be suppressed during the verification process. These offsets may be restored by temporal shifts in any traces obtained on the retimed netlist as per the trace-lifting algorithm implied by Lemma 6.2. To enable discarding of REGISTERs from peripheral vertices, which is termed peripheral retiming [69], the host vertex is removed from the retiming graph, causing the ILP solution to pull as many REGISTERs as possible out of the netlist. For synthesis applications, these REGISTERs are considered temporarily “hidden” or “borrowed” and would have to be added back after optimization [69]. Figure 6.5a shows the graph for a maximal peripheral retiming of the netlist of Figure 6.4, ignoring initial state equivalence. The edge labels represent the REGISTER counts w/~w of the original netlist (w) and retimed netlist (~w), respectively. The vertex labels denote their lag, i.e., the number of REGISTERs that have been pushed backward through them. As depicted, by merging R1 and R2 and removing R6, the REGISTER count may be reduced from six to four.
[Figure 6.5: Relaxed retiming graphs for the example of Fig. 6.4: (a) peripheral retiming ignoring reset state equivalence, (b) retiming with NEGATIVE REGISTERs permitted]
A second constraint imposed by synthesis requirements is that the retimed netlist must have an equivalent initial state. With this restriction, the netlist of Figure 6.4a cannot readily be retimed since REGISTERs R1 and R2 have incompatible initial values and cannot be merged by a backward move. To visualize this, if R1 and R2 are shared with an initial value of ONE, the sequence {⟨(a,0),0⟩, ⟨(b,0),0⟩, ⟨(a,1),1⟩, ⟨(b,1),0⟩, ⟨(a,2),0⟩, ⟨(b,2),0⟩} would produce the sequence {⟨(t,0),0⟩, ⟨(t,1),0⟩, ⟨(t,2),0⟩} instead of the sequence {⟨(t,0),0⟩, ⟨(t,1),0⟩, ⟨(t,2),1⟩} in the REGISTER-shared and original netlist, respectively. Similarly, the sequence {⟨(a,0),1⟩, ⟨(b,0),0⟩, ⟨(a,1),0⟩, ⟨(b,1),0⟩, ⟨(a,2),1⟩, ⟨(b,2),0⟩, ⟨(a,3),0⟩, ⟨(b,3),0⟩} would produce a distinguishing sequence of valuations to t if both REGISTERs are shared with a joint initial value of ZERO.
In verification, we need not preserve input-output equivalence of the retimed netlist as long as invariant checking is preserved. The requirement for equivalent reset states may be relaxed by a temporal decomposition of the verification task into two parts: (1) performing a bounded model check of each time-step of t included in a combinational initialization structure, representing an unfolding of each vertex v for time-steps 0, ..., −1 − r(v), hereafter referred to as the retiming stump, and (2) checking the retimed netlist core, hereafter referred to as the retimed recurrence structure. By separating these two obligations, we perform a temporal decomposition of the invariant check which enables greater reduction capability for the subsequent verification flow than otherwise possible.
6.1.4 Enabling NEGATIVE REGISTERs
A third and final relaxation of retiming is achieved by enabling negative weights along the
edges. This approach is motivated by the fact that REGISTERs merely denote functional
relations between different time-steps as illustrated by Definition 3.12. In logic synthesis,
clocked or unclocked delay elements are used to physically implement these relations. Such
elements may only cause delays of present values into future time-steps. However, for verification, this limitation may be lifted and arbitrary temporal relations in either direction may
be supported, thus enabling the generation of NEGATIVE REGISTERs. NEGATIVE REG-
ISTERs are formalized in Definition 6.2, which serves as an addendum to Definition 3.11
and 3.12 for this chapter.
Definition 6.2. A NEGATIVE REGISTER is a one-input gate which acts as a one-time-step
predictor of its input. Term Gv is not referenced for NEGATIVE REGISTERs. If type(v) = NEGATIVE REGISTER, then p(v, i) = p(u, i + 1), where u = inlist(v).
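On a bounded trace, Definition 6.2 can be illustrated with a minimal Python sketch (the function names and list encoding are ours, not the thesis's): a REGISTER delays its input stream by one time-step, while a NEGATIVE REGISTER predicts it.

```python
def register(inputs, init):
    # REGISTER: p(v, i) = p(u, i - 1), with the initial value Z(v) at time 0.
    return [init] + inputs[:-1]

def negative_register(inputs):
    # NEGATIVE REGISTER (Definition 6.2): p(v, i) = p(u, i + 1); the value at
    # the final step of a bounded trace is unconstrained (None here), which is
    # why forward traversal must be validated against future time-steps.
    return inputs[1:] + [None]
```

Composing the two recovers the original stream, reflecting that their sequential weights (+1 and −1) cancel along a path.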
We provide an alternative interpretation of the NEGATIVE REGISTER in Figure 6.6b. The sequential weight of a NEGATIVE REGISTER is −1, whereas that of a REGISTER is 1. In the presence of NEGATIVE REGISTERs, states reached during forward traversal must be validated as being truly reachable by analysis of future time-steps. This results in a third component for the temporal decomposition of the verification task, reflected by the retiming top (refer to Figures 6.6 and 6.8): a state encountered during forward traversal may be determined to be unreachable if it cannot satisfy this structure.
Though most abstractions discussed in later chapters are applicable to netlists which contain NEGATIVE REGISTERs, we do not discuss them in that context due to the necessity of validating counterexample traces against the retiming top. Extensions of most verification algorithms to handle NEGATIVE REGISTERs are straightforward as per Definition 6.2. However, the focus of this thesis is on abstractions of use in a general invariant checking paradigm without restriction or customization of verification algorithms. We therefore only discuss NEGATIVE REGISTERs in this chapter.
To enable NEGATIVE REGISTERs, the non-negativity constraints of formula (6.2) are relaxed for the ILP solver, and we minimize the sum of the absolute value of each ~w(u,v) in our objective function (6.3). In synthesis, NEGATIVE REGISTERs are considered temporary and must be eliminated after optimization, except in specific cases where precomputation may be employed [72]. Figure 6.5b shows the resulting retiming graph for the netlist of Figure 6.4. By using one NEGATIVE REGISTER, the total sequential element count is reduced to three. Figure 6.6a depicts the resulting netlist, where ~R2 represents a NEGATIVE REGISTER. Note that these three sequential elements reflect the true temporal relations present in the cyclic and reconverging paths of the original netlist. Figure 6.6b also provides an alternative interpretation of NEGATIVE REGISTERs; we may merge a NEGATIVE REGISTER r onto a new FREE vertex v, but we then must constrain the netlist so that the value driven by v at time i is equal to the value sourcing the input to r at time i + 1. During symbolic analysis, NEGATIVE REGISTERs may be handled by exchanging
[Figure 6.6: Retimed netlist of Fig. 6.5b: (a) retimed netlist, (b) intuitive interpretation of NEGATIVE REGISTERs]
the present and next state variables in the transition relation.
As a practical implementation issue, use of an absolute value in the objective function (6.3) causes a nonlinearity which may significantly increase computational requirements in precluding the application of a linear algorithm. We have found that an efficient way to deal with this problem is to use two variables to denote retimed edge weight: ~w+(u,v) and ~w−(u,v), correlating to the number of REGISTERs and NEGATIVE REGISTERs along the retimed edge, respectively [73]. We split (6.1) into two constraints per edge:

    ~w+(u,v) ≥ w(u,v) + r(v) − r(u)
    ~w−(u,v) ≥ −(w(u,v) + r(v) − r(u))        (6.4)

We in turn require that ~w+(u,v) and ~w−(u,v) are non-negative:

    ~w+(u,v) ≥ 0
    ~w−(u,v) ≥ 0        (6.5)

Our modified objective minimizes the sum of these two variables. Clearly, any optimal solution will assign at least one of these two variables per edge to 0.

    min Σ_{(u,v)∈E} ( ~w+(u,v) + ~w−(u,v) )        (6.6)
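To see that the split-variable objective (6.6) computes min Σ|~w(u,v)|, one can brute-force the lags of a tiny graph. The Python sketch below is only an illustrative stand-in for the actual ILP solver, with our own graph encoding:

```python
from itertools import product

def min_total_abs_weight(vertices, w, lag_range=range(-3, 4)):
    """Brute-force stand-in for the ILP of (6.4)-(6.6): over all lag
    assignments r, minimize sum |w(u,v) + r(v) - r(u)|, which equals the
    sum of ~w+ and ~w- since an optimum zeroes one of them per edge."""
    best = None
    for lags in product(lag_range, repeat=len(vertices)):
        r = dict(zip(vertices, lags))
        cost = sum(abs(wt + r[v] - r[u]) for (u, v), wt in w.items())
        best = cost if best is None else min(best, cost)
    return best
```

A directed cycle keeps its total weight under any retiming (the r terms telescope), so a 2-cycle of weight 2 can never drop below 2; a simple pipeline, by contrast, can be emptied entirely, as in peripheral retiming.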
This modeling allows the use of an efficient ILP algorithm (such as the network simplex algorithm [74]) to calculate an optimal solution. As demonstrated by Leiserson and Saxe [66], if NEGATIVE REGISTERs are disallowed we may cast the retiming problem as a min-cost flow problem for which we may use a polynomial-time graph-based algorithm.¹ Allowing NEGATIVE REGISTERs precludes this modeling; though efficient, the simplex algorithm is not guaranteed to require sub-exponential resources for arbitrary problems. However, we have found that the network simplex algorithm often yields superior performance to graph-based algorithms for retiming applications even when NEGATIVE REGISTERs are disallowed.
One noteworthy practical issue is that relaxation of non-negativity constraints along with REGISTER-sharing modeling does not present an accurate cost model to the ILP solver. In particular, if we allow negative weights on edges with a non-unity sharing factor, retiming may produce a solution with a higher REGISTER count than the modeled cost of the ILP solution reflected in the objective function. An example of this phenomenon is depicted in Figure 6.7. In Figure 6.7a is a retiming graph with a total cost of one, distributed across three edges. With non-negativity constraints, we could not reduce the number of REGISTERs along the incoming edges to the dummy sink vertex unless we could backward-retime g2 or g3 and thereby exploit fanout sharing or backward-retiming of g1, or forward-retime g4 to enable forward-retiming across the dummy vertex; each of those retimings risks causing suboptimalities due to other sources and sinks of g2, ..., g4. However, without non-negativity constraints, the ILP solver may drop the cost by 1/3 without retiming g2, ..., g4 as depicted in Figure 6.7b. Such a retiming obviously does not reduce REGISTER count for the resulting netlist, and may overall prevent the ILP solver from selecting an available truly lower-cost solution. For example, one possible solution may drop the retimed netlist weight by one, but the ILP solver instead will choose a solution which merely retimes the dummy vertices in a graph with three structures as depicted in Figure 6.7. There are two possible solutions to this modeling problem: first, we may impose non-negativity constraints only on the edges with a non-unity sharing factor, which is the solution we have chosen in our
¹One of the most efficient known graph algorithms for the min-cost flow problem is the enhanced capacity scaling algorithm, which is O(|E| · log(|V|) · (|E| + |V| · log(|V|))) [75].
[Figure 6.7: Example of incorrect ILP modeling of sharing with relaxed non-negativity constraints: (a) original graph with total cost of 1; (b) “incorrect” solution of lagging dummy vertex by −1, with absolute value total cost of 2/3]
implementation due to its computational efficiency. Second, we may correct the sharing
model by departing from a graph representation, instead representing the “maximum ver-
sus summation” condition of sharing by modeling constraints which concurrently reason
about multiple edges, rather than individual edges.
6.1.5 Normalized Retiming
Formula (6.1) imposes an equivalence relation on the set of retimings. Two retimings r1 and r2 result in identical REGISTER placement and count, and thus are said to be equivalent, if and only if r1 = r2 + c for some arbitrary integer c. This concept enables us to use a normalized retiming without sacrificing reduction optimality.
Definition 6.3. A normalized retiming r′ is obtained from an arbitrary retiming r, and is defined as r′ = r − max_{v∈V} r(v).

Hereafter, we will use the term retiming to denote a normalized retiming. As will
be discussed, the use of a normalized retiming simplifies the calculation of initial values of retimed REGISTERs; a solution to this problem otherwise may not exist. Furthermore, because fanout sharing is only applicable if the shared REGISTERs have equivalent initial values, we may use an extra degree of normalization to shift all vertices forward until the REGISTERs to be shared obtain equivalent initial values (refer to the retiming stump discussed in Definition 6.4), thereby enhancing reduction potential. A similar trick may be employed to ensure that REGISTERs on outgoing edges from ZERO or ONE may be eliminated from the retimed netlist through constant propagations regardless of their initial values.
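Definition 6.3 amounts to a one-line shift of the lag vector. A minimal Python sketch, assuming lags are kept in a dict (an encoding of our own choosing):

```python
def normalize(r):
    """Definition 6.3: r'(v) = r(v) - max_u r(u), so the maximum lag becomes 0
    and all lags are non-positive. Since the retimed weights of formula (6.1)
    depend only on lag differences, REGISTER placement and count are
    unchanged by this shift."""
    m = max(r.values())
    return {v: lag - m for v, lag in r.items()}
```

The non-positivity of normalized lags is what makes the initial-value formula (6.9) well formed.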
6.2 Retiming for Enhanced Verification
In this section we discuss the use of generalized retiming for enhanced verification, and
provide proofs of correctness of this technique for invariant checking.
As discussed in the previous sections, we temporally decompose our verification
task into three components to enable greater reduction capability for min-area retiming.
Figure 6.8 illustrates the overall temporally decomposed verification task for the netlist
of Figure 6.6a. The medium-shaded area reflects the retimed recurrence structure which
must generally be discharged by sequential reachability analysis. The darkly-shaded area
denotes the retiming stump (refer to Definition 6.4), which is used to compute the initial values for the retimed REGISTERs and to verify our target t for the first three time-steps.
The lightly-shaded area represents the retiming top.
We now illustrate how to process these three verification components. First, we need to prove that the property holds for the retiming stump; because this is a combinational structure, we may discharge this obligation using a BMC approach. In our example, we see that t_i = 0 for i = 0, 1, 2. The set of retimed initial values ~Z is a subset of the retiming stump. In our example, we obtain ~Z(~R1) = a0 ∧ Z(R1) and ~Z(~R3) = ¬(~R2 ∧ Z(R2) ∧ ¬((a0 ∨ b0) ∧ Z(R3) ∧ Z(R5))). This correlates to an initial state set (~R1, ~R2, ~R3) = {(0,0,1), (0,1,1), (1,0,1), (1,1,1)}. Next, using these retimed initial values, sequential verification is performed on the recurrence structure. This leads to a counterexample for initial state (0,1,1), with FREE vertex valuations a1 = 0 and b1 = 0. Furthermore, the
[Figure 6.8: Components of retimed netlist depicted in Fig. 6.6: darkly shaded: retiming stump; medium shaded: retiming recurrence structure; lightly shaded: retiming top]
retiming top imposes the constraint a2 ∨ b2 upon the NEGATIVE REGISTER ~R2, which is satisfiable for the given failing state. A complete counterexample trace is composed of a satisfying assignment to the retiming stump for generating a retimed initial state, a counterexample trace generated upon the recurrence structure from the corresponding retimed initial state, and a satisfying assignment to the constraint imposed by the retiming top. For the given example, this results in {⟨(a,0),0⟩, ⟨(b,0),0⟩, ⟨(a,1),0⟩, ⟨(b,1),0⟩, ⟨(a,2),0⟩, ⟨(b,2),1⟩}.

Ascribing semantics to this netlist representation where REGISTERs are implicit as edge attributes, let E^j_uv, 1 ≤ j ≤ w(u,v), denote the initial value of the j-th REGISTER along edge (u,v). Additionally, let G_v(f_jv, ..., f_kv) be the function of gate v with incoming edges (j,v), ..., (k,v). If v is a FREE vertex, G_v() denotes the sampled input value at a specified time-step. Valuations to the gates of N at time i ≥ 0 may be computed by (6.7).
    ṗ(f_uv, i) = E^{w(u,v)−i}_uv           if i < w(u,v);
    ṗ(f_uv, i) = ṗ(u, i − w(u,v))          otherwise
    ṗ(u, i)    = G_u( ṗ(f_ju, i), ..., ṗ(f_ku, i) )        (6.7)
For example, the value at time i of the net connecting the output of REGISTER j with the input of REGISTER j+1 of edge (u,v) is ṗ(f_uv, i + w(u,v) − j).

Similar to formula (6.7), for a given retiming r, valuations to the gates of the corresponding retimed netlist ~N at time i may be computed by (6.8). Term ~E^i_uv represents the initial values of the corresponding retimed REGISTERs of ~N.

    ṗ(~f_uv, i) = ~E^{~w(u,v)−i}_uv         if i < ~w(u,v);
    ṗ(~f_uv, i) = ṗ(~u, i − ~w(u,v))        otherwise
    ṗ(~u, i)    = G_u( ṗ(~f_ju, i), ..., ṗ(~f_ku, i) )        (6.8)
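The recursion of formula (6.7) can be sketched directly in Python for a toy netlist; the dict-based encoding of gates, weights, and initial values below is our own illustrative assumption:

```python
def valuation(gates, w, E, inputs, node, i):
    """Sketch of formula (6.7): value of gate `node` of netlist N at time i.

    gates: vertex -> (function, fanin list); w: edge -> REGISTER count w(u,v);
    E: edge -> list of initial values (index 0 holds E^1, the REGISTER
    nearest u); inputs: FREE vertex -> sampled input stream."""
    def edge_val(u, v, t):
        if t < w[(u, v)]:
            return E[(u, v)][w[(u, v)] - t - 1]    # E^{w(u,v)-t}_uv
        return gate_val(u, t - w[(u, v)])          # recurse past the REGISTERs
    def gate_val(u, t):
        if u in inputs:
            return inputs[u][t]                    # G_u() for a FREE vertex
        fn, fanins = gates[u]
        return fn(*(edge_val(x, u, t) for x in fanins))
    return gate_val(node, i)
```

For a gate g computing the inversion of input a through one REGISTER with initial value 0, the valuation at time 0 uses the initial value, and later times recurse back to sampled input values.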
In contrast to formula (6.7), it is not obvious that this formula is well formed, because ~w(u,v) may be negative.
Lemma 6.1. Let N be a legal netlist, and r be a retiming resulting in netlist ~N. The evaluation of formula (6.8) for computing the state of ~N at time i will terminate for any finite i ≥ 0.
Proof. First, we note that i remains non-negative during the evaluation of (6.8) since time is defined only upon N, hence we never begin a valuation at a negative i, and since (6.8) will never reduce i below 0. Second, since N and therefore ~N are finite, any non-terminating evaluation of formula (6.8) must involve an infinite recursion on at least one gate. Let u be one of those gates and ⟨u, u1, ..., un, u⟩ be a directed cycle in ~N causing the recursion. The difference between i and i′ of two succeeding recursions is then i − i′ = ~w(u,u1) + ~w(u1,u2) + ... + ~w(un,u). A substitution using (6.1) leads to i − i′ = w(u,u1) + w(u1,u2) + ... + w(un,u) since the r terms telescope. Our original netlist is assumed to be legal, hence all directed cycles have strictly positive sequential weight; the telescoping property of retiming furthermore guarantees that the cumulative weight along any directed cycle is unchanged by retiming. Therefore i strictly decreases after each recursion through u, which causes the evaluation to terminate once i < ~w(uj, uj+1) for some edge (uj, uj+1) in the corresponding directed cycle.
Definition 6.4. The retiming stump ~NS is a combinational netlist obtained by an unfolding of N, and contains vertices corresponding to the following set.

    ~NS = { s^i_uv : (u,v) ∈ E ∧ (s^i_uv = ṗ(f_uv, i)) ∧ (0 ≤ i < ~w(u,v) − r(v)) }

Our retimed verification structure is a composition of ~N and ~NS. The retiming stump ~NS provides the edge functions for the first several time-steps, which is necessary for verification of the targets during the time-steps eliminated from the recurrence structure, as well as for providing the initial values for the REGISTERs of ~N as follows.

    ~E^j_uv = s^{~w(u,v)−r(v)−j}_uv,   0 < j ≤ ~w(u,v)        (6.9)
Note that this formula is well formed for normalized retimings because r(v) ≤ 0.
Lemma 6.2. Let N be a legal netlist, and r be a retiming resulting in netlist ~N and retiming stump ~NS. The following relations provide a bijective mapping between each edge function of {~N, ~NS} to the corresponding edge function of N and vice versa.

    ṗ(f_uv, i) = s^i_uv                  if i < ~w(u,v) − r(v);
    ṗ(f_uv, i) = ṗ(~f_uv, i + r(v))      otherwise        (6.10)

    s^i_uv = ṗ(f_uv, i)                  if i < ~w(u,v) − r(v);
    ṗ(~f_uv, i) = ṗ(f_uv, i − r(v))      otherwise        (6.11)
Proof. We first demonstrate that formula (6.10) correctly maps {~N, ~NS} to N. For i < ~w(u,v) − r(v), (6.10) reflects ~NS from Definition 6.4, thus our obligation is trivially satisfied. For i ≥ ~w(u,v) − r(v), after substitution using (6.8), we must demonstrate that ṗ(f_uv, i) = G_u( ṗ(~f_ju, i + r(v) − ~w(u,v)), ..., ṗ(~f_ku, i + r(v) − ~w(u,v)) ) by inductively proving for each input to u that ṗ(~f_ju, i + r(v) − ~w(u,v)) = ṗ(f_ju, i − w(u,v)).

Base case: For the base case, we have that i + r(v) − ~w(u,v) < ~w(j,u). Using (6.8) and (6.9) we obtain ṗ(~f_ju, i + r(v) − ~w(u,v)) = ~E^{~w(j,u)−i−r(v)+~w(u,v)}_ju = ṗ(f_ju, i + r(v) − ~w(u,v) − r(u)) which, after applying (6.1), satisfies our obligation.

Inductive step: For the inductive step, we have that i + r(v) − ~w(u,v) ≥ ~w(j,u). A substitution using (6.8) results in the equality ṗ(~f_ju, i + r(v) − ~w(u,v)) = G_j( ṗ(~f_hj, i + r(v) − ~w(u,v) − ~w(j,u)), ..., ṗ(~f_lj, i + r(v) − ~w(u,v) − ~w(j,u)) ). If ~w(j,u) > 0 we may immediately reduce the temporal arguments of G_j by induction. If ~w(j,u) ≤ 0, then the right-hand side must be further expanded until an inductive reduction may be performed. A termination analysis similar to the proof of Lemma 6.1 may be applied, demonstrating that the time-step i will eventually decrease and therefore the expansion will terminate after a finite number of iterations. This termination will either result in a valuation from ~NS, which satisfies our proof obligation as demonstrated in the base case analysis above, or at a zero-input gate (FREE or ZERO), which is clearly semantically equivalent both before and after the retiming.

We next demonstrate that (6.11) correctly maps N to {~N, ~NS}. The first part follows from Definition 6.4. The second part follows from the previous inductive proof.
Formula (6.10) illustrates an efficient mechanism for lifting a trace obtained on the
retimed netlist to one consistent with the original netlist.
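As a sketch of that mechanism, assuming a hypothetical per-edge encoding of the stump and retimed trace (the data layout here is our own, not the dissertation's implementation):

```python
def lift_edge_trace(edge, horizon, stump, retimed, r, w_t):
    """Lift the values of one edge (u, v) of the retimed netlist back to the
    original netlist for time-steps 0 .. horizon-1, per formula (6.10): the
    first ~w(u,v) - r(v) values come from the stump s^i_uv, the rest from the
    retimed trace shifted by the lag r(v).

    stump: edge -> list of s^i_uv values; retimed: edge -> list of
    p(~f_uv, .) values; r: vertex -> lag; w_t: edge -> retimed weight ~w."""
    u, v = edge
    boundary = w_t[edge] - r[v]
    return [stump[edge][i] if i < boundary else retimed[edge][i + r[v]]
            for i in range(horizon)]
```

With normalized lags r(v) ≤ 0, the shift i + r(v) only moves backward in time, so the retimed trace need not extend beyond the lifted horizon.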
Theorem 6.1. Retiming is sound and complete for invariant checking.
Proof. This theorem is an immediate consequence of the bijective ⟨vertex, time⟩ mapping between the original and retimed netlist reflected by Lemma 6.2. In particular, a target unreachable result will be generated only if all time-steps of the target within the retiming stump are proven unreachable, and also if the target is proven unreachable in the recurrence structure, which collectively imply that the unretimed target is also unreachable.

Additionally, a target hit result will be generated if the target is hit within the retiming stump, or if the target is hit in the recurrence structure. In either case, the unretimed target is also hittable. The trace generated by a subsequent verification flow is semantically correct with respect to the retimed netlist and hits the retimed target by assumption. Therefore, the trace-lifting procedure implied by Lemma 6.2 will yield a semantically correct trace with respect to the unretimed netlist, and will hit the original unretimed target.
Theorem 6.2. A retimed netlist is a legal netlist.
Proof. We consider the requirements for legality enumerated in Definition 3.24.
1. The only gates fabricated by retiming are either retimed REGISTERs (and NEGATIVE
REGISTERs) which are correct by our synthesis of the AND/INVERTER/REGISTER
graph, or constructed by combinational unfolding, which are correct by the assump-
tion that the original netlist is legal.
2. We note that a normalized retiming will lag each vertex at most |R| time-steps, and each retimed edge weight will be between −|R|, ..., |R|, else the retiming is not optimal. Therefore, ~NS is of finite size. Furthermore, the recurrence structure contains a copy of each combinational gate of the original netlist, with at most |R| generated REGISTERs and NEGATIVE REGISTERs, else the retiming is not optimal. Thus the composite retimed netlist ~N ∥ ~NS is finite.
3. Our retimed initial values come from the retiming stump, which comprises a combinational unfolding of the original netlist. Hence, the initial value of every retimed REGISTER must be combinational.
4. Due to the telescoping of r values for the vertices comprising each directed cycle, retiming preserves the sequential weight of directed cycles whether or not NEGATIVE REGISTERs are allowed. Therefore, by assumption, all directed cycles will have strictly positive weight.
Theorem 6.3. If the diameter of a set of vertices ~U of the recurrence structure is d(~U), and max_{~u∈~U} (−r(~u)) = i, then the diameter of the original set of vertices U satisfies d(U) ≤ d(~U) + i.

Proof. Note that we may compose a series of −r(~u) REGISTERs to each ~u, whose initial values are determined by corresponding values from ~NS, to yield a set of vertices U′ which are trace-equivalent to U. Each stage of this pipeline is an AC, hence increments diameter by at most 1. This proof therefore follows from Theorem 4.3 and Corollary 4.1 by the trace-equivalence of U′ and U.
6.3 Related Work
Leiserson and Saxe first proposed retiming as a synthesis optimization [68] and developed
its graph-based ILP formulation [66]. Malik et al. [69] were the first to introduce peripheral retiming with the objective of moving a maximum number of REGISTERs to the netlist
boundaries. This yields a maximal combinational netlist core to enhance the domain of
applicability of conventional combinational optimizations. They also introduced the con-
cept of NEGATIVE REGISTERs as a method of temporarily “borrowing” or “discarding”
REGISTERs from inputs and outputs. After combinational optimization, these NEGATIVE
REGISTERs are “legalized” by retiming them back to positive REGISTERs. In contrast, we
provide algorithms to directly handle NEGATIVE REGISTERs for enhanced verification.
The problem of generating valid initial states for a retimed netlist has been the topic
of several prior research efforts. Touati and Brayton [76] proposed a method for adding
reset logic which forces an equivalent initial state. Even et al. [77] described a modified
retiming algorithm that favors forward retiming, allowing a simple computation of the initial states similarly to our use of a normalized retiming. All previous work on retimed
initial state computation assumes the necessity of preserving input-output equivalence. In
contrast, our approach eliminates this restriction through a temporal decomposition of the
verification task, enabling a larger solution space and hence a greater reduction potential for the
retiming solution.
Gupta et al. [71] were the first to propose the application of maximal peripheral re-
timing in the context of simulation-based verification. They showed that peripheral REG-
ISTERs may be discarded during test generation without compromising the coverage of the
resulting transition tour. However, their approach is focused upon test generation and does
not consider more general verification frameworks. Furthermore, their work does not ad-
dress the initialization problem and does not use the concept of NEGATIVE REGISTERs.
The work of Cabodi et al. [78], which uses retiming to enhance symbolic reachability analysis, is the closest to ours. However, they use an original synthesis retiming algorithm
with the above-mentioned limitations regarding enforced reset state equivalence and dis-
allowing of NEGATIVE REGISTERs. Furthermore, their retiming domain is based upon
next-state functions of REGISTERs which significantly reduces the optimization freedom.
Consequently, their reported results demonstrate fairly modest improvements.
There are only two previous publications related to our technique of fanin REGISTER sharing, to our knowledge. In [79], a technique is presented that simultaneously considers multiple structures for possible logic implementations using a choice vertex. Their
technique focuses upon technology mapping in synthesis, and despite its recursive capa-
bility, it must explicitly generate candidate structures for an AND cluster decomposition
including possible retiming configurations. In our approach, we defer the actual decom-
position step until after an optimal retiming is computed. Our modeling guarantees that
there will exist a decomposition of the AND clusters with the minimal number of REG-
ISTERs computed by the retiming solution. In [80], the concept of algebraic factorization
is extended to sequential expressions, which implicitly intertwines retiming with structural
rewriting. This work proposes a set of sequential transformations which may be applied in
a synthesis scenario. In contrast to our work, this technique is based on individual, local
restructuring steps and does not model the decomposition flexibility of the expressions for
global retiming.
6.4 Experimental Results
In this section we provide a set of experimental results for retiming, redundancy removal,
and diameter overapproximation. We have deferred results for the previous chapters until
now so that we may study their synergy with retiming. We implemented these techniques in
C using the data structures and algorithms described in these chapters. We used the primal
network simplex algorithm from IBM’s Optimization Solutions Library (OSL) [81] as the ILP solver for the retiming formulation.
6.4.1 Redundancy Removal Experiments
Our first set of experiments studies the effect of our on-the-fly retiming algorithm (presented in Section 5.1) and our fanin sharing algorithm on the reduction capability of retiming. We disallow NEGATIVE REGISTERs for this set of experiments. These experiments were run on an IBM ThinkPad Model T21, with an 800MHz PIII and 256 MB main memory, running RedHat Linux 6.2. In these experiments we used peripheral retiming [71]. We focus here on reduction of the size of the recurrence structure ~N, injecting constants for retimed initial values to eliminate the contribution of the retiming stump ~NS. The retiming stump is often small, hence does not constitute a bottleneck in the overall verification scheme. This is mainly due to the fact that large portions of the stump resolve to constants since most of the original REGISTERs have constant initial values. We revisit the size of the recurrence structure in Section 7.3.
Table 6.1 provides results for various retiming options for the ISCAS89 benchmarks. The results are based upon the described AND/INVERTER/REGISTER graph representation of the netlist and report the number of 2-input AND vertices and REGISTERs. Columns 1 and 2 list the name of the netlists and their initial, unretimed sizes, respectively. Column 3 provides the netlist sizes for retiming without the application of on-the-fly retiming or fanin REGISTER sharing. This option is identical to classical peripheral retiming as per [66]. In column 4 we report the result for fanin-REGISTER sharing without on-the-fly retiming, whereas for the following column we enabled both. Columns 6 through 8 provide the results for an iterated application of retiming interleaved with redundancy removal, using the technique of Kuehlmann et al. [51]. We iterated between both engines until no further improvement was gained and reported the best results. Column 6 provides these results using plain retiming (as in column 3), whereas column 7 reports the results of the best option of the techniques used in column 4 or 5. Column 8 indicates the required computing resources for the best run between columns 6 and 7, preferring minimum REGISTERs to minimum AND vertices. In column 9 we provide previously published results. As shown, our technique almost always yields lower REGISTER counts. Despite detailed analysis, we could not reproduce the results reported in [71] for netlists S344 and S349.
Table 6.2 provides the data for an identical set of experiments for various IBM Gigahertz Processor (GP) netlists, after performing phase abstraction [16]. There are several noteworthy trends in both tables. First, plain retiming decreases REGISTER count by an average of 16.8% on the ISCAS netlists, and by 50.1% on the GP netlists. The larger reductions observed for the GP netlists are a characteristic of the high degree of pipelining inherent in high-performance designs, and indicative of the power of retiming to alleviate these inflated REGISTER counts. Fanin REGISTER sharing allows an additional reduction of the REGISTER count by an average of 0.9% and 4.7% for the ISCAS and GP netlists, respectively. In addition, the AND count is significantly decreased by the maximal AND clustering and tree reformation process, by 9.8% for ISCAS and 20.7% for GP.
Design Original Plain Retiming On-the-Fly Iteration of interleaved Previousnetlist retiming with retiming retiming and redundancy removal results
Design | Original netlist | Plain retiming [66] | Retiming with fanin sharing | On-the-fly retiming with fanin sharing | Iterated interleaved retiming and redundancy removal (until no further improvement): plain retiming | Iterated: best of columns 4 or 5 | Time (s) ; Memory (MB) | [71] ; [78]
PROLOG | 853 ; 136 | 853 ; 45 | 676 ; 45 | 672 ; 46 | 709 ; 45 | 644 ; 45 | 1.0 ; 14.9 | - ; -
S1196 | 480 ; 18 | 480 ; 16 | 475 ; 16 | 475 ; 16 | 463 ; 16 | 456 ; 16 | 0.4 ; 4.4 | 16 ; -
S1238 | 533 ; 18 | 533 ; 16 | 532 ; 16 | 532 ; 16 | 518 ; 16 | 513 ; 16 | 0.5 ; 6.5 | 17 ; -
S1269 | 478 ; 37 | 478 ; 36 | 462 ; 36 | 463 ; 36 | 459 ; 36 | 450 ; 36 | 0.3 ; 4.4 | - ; -
S13207.1 | 3205 ; 638 | 3205 ; 389 | 2604 ; 390 | 2593 ; 407 | 1295 ; 266 | 1221 ; 267 | 3.6 ; 31.3 | - ; -
S1423 | 507 ; 74 | 507 ; 72 | 458 ; 72 | 458 ; 72 | 461 ; 72 | 455 ; 72 | 0.4 ; 5.5 | 72 ; 74
S1488 | 734 ; 6 | 734 ; 6 | 618 ; 6 | 632 ; 6 | 659 ; 6 | 610 ; 6 | 0.7 ; 12.7 | - ; -
S1494 | 746 ; 6 | 746 ; 6 | 629 ; 6 | 644 ; 6 | 668 ; 6 | 622 ; 6 | 0.4 ; 6.5 | - ; -
S1512 | 484 ; 57 | 484 ; 57 | 455 ; 57 | 455 ; 57 | 470 ; 57 | 455 ; 57 | 0.3 ; 2.4 | - ; 57
S15850.1 | 3852 ; 534 | 3852 ; 495 | 3457 ; 498 | 3465 ; 498 | 3283 ; 490 | 3112 ; 475 | 9.3 ; 34.5 | - ; -
S208.1 | 77 ; 8 | 77 ; 8 | 70 ; 8 | 71 ; 8 | 70 ; 8 | 70 ; 8 | 0.2 ; 2.2 | - ; -
S27 | 8 ; 3 | 8 ; 3 | 8 ; 3 | 8 ; 3 | 8 ; 3 | 8 ; 3 | 0.1 ; 2.3 | - ; -
S298 | 125 ; 14 | 125 ; 14 | 97 ; 14 | 97 ; 14 | 100 ; 14 | 91 ; 14 | 0.2 ; 6.3 | - ; -
S3271 | 1125 ; 116 | 1125 ; 110 | 1091 ; 110 | 1093 ; 110 | 1082 ; 110 | 1067 ; 110 | 1.0 ; 8.7 | - ; 116
S3330 | 820 ; 132 | 820 ; 45 | 657 ; 45 | 654 ; 46 | 692 ; 45 | 624 ; 45 | 0.7 ; 9.7 | - ; -
S3384 | 1070 ; 183 | 1070 ; 72 | 1070 ; 72 | 1070 ; 72 | 1064 ; 72 | 1062 ; 72 | 0.9 ; 6.7 | - ; 147
S344 | 109 ; 15 | 109 ; 15 | 102 ; 15 | 102 ; 15 | 101 ; 15 | 98 ; 15 | 0.2 ; 2.3 | 7 ; -
S349 | 112 ; 15 | 112 ; 15 | 104 ; 15 | 104 ; 15 | 101 ; 15 | 98 ; 15 | 0.2 ; 2.3 | 7 ; -
S35932 | 12204 ; 1728 | 12204 ; 1728 | 11948 ; 1728 | 11948 ; 1728 | 11660 ; 1728 | 11660 ; 1728 | 14.3 ; 38.5 | - ; -
S382 | 148 ; 21 | 148 ; 15 | 134 ; 15 | 136 ; 15 | 140 ; 15 | 134 ; 15 | 0.2 ; 2.3 | 15 ; -
S38584.1 | 13479 ; 1426 | 13479 ; 1416 | 11769 ; 1375 | 11811 ; 1415 | 11794 ; 1374 | 11464 ; 1373 | 86.6 ; 239.9 | - ; -
S386 | 188 ; 6 | 188 ; 6 | 126 ; 6 | 133 ; 6 | 166 ; 6 | 125 ; 6 | 0.2 ; 4.3 | - ; -
S400 | 158 ; 21 | 158 ; 15 | 141 ; 15 | 143 ; 15 | 148 ; 15 | 141 ; 15 | 0.2 ; 2.3 | 15 ; -
S420.1 | 165 ; 16 | 165 ; 16 | 156 ; 16 | 159 ; 16 | 156 ; 16 | 156 ; 16 | 0.2 ; 2.3 | - ; -
S444 | 169 ; 21 | 169 ; 15 | 150 ; 15 | 153 ; 15 | 155 ; 15 | 149 ; 15 | 0.2 ; 2.3 | 15 ; -
S4863 | 1750 ; 104 | 1750 ; 72 | 1537 ; 37 | 1537 ; 37 | 1376 ; 37 | 1326 ; 37 | 2.4 ; 17.3 | - ; 96
S499 | 187 ; 22 | 187 ; 22 | 199 ; 22 | 199 ; 22 | 187 ; 22 | 190 ; 20 | 0.3 ; 4.4 | - ; -
S510 | 213 ; 6 | 213 ; 6 | 213 ; 6 | 213 ; 6 | 211 ; 6 | 206 ; 6 | 0.3 ; 6.4 | - ; -
S526N | 251 ; 21 | 251 ; 21 | 191 ; 21 | 191 ; 21 | 202 ; 21 | 183 ; 21 | 0.3 ; 6.4 | - ; -
S5378 | 1422 ; 179 | 1422 ; 115 | 1346 ; 114 | 1321 ; 124 | 1260 ; 112 | 1242 ; 113 | 1.4 ; 15.0 | - ; 144
S635 | 190 ; 32 | 190 ; 32 | 190 ; 32 | 190 ; 32 | 161 ; 32 | 161 ; 32 | 0.2 ; 2.3 | - ; -
S641 | 160 ; 19 | 160 ; 15 | 132 ; 15 | 132 ; 15 | 146 ; 15 | 131 ; 15 | 0.2 ; 3.3 | 18 ; -
S6669 | 2263 ; 239 | 2263 ; 92 | 2199 ; 92 | 2199 ; 92 | 2238 ; 77 | 2174 ; 76 | 1.1 ; 5.8 | - ; -
S713 | 174 ; 19 | 174 ; 15 | 137 ; 15 | 137 ; 15 | 149 ; 15 | 130 ; 15 | 0.2 ; 5.4 | - ; -
S820 | 468 ; 5 | 468 ; 5 | 325 ; 5 | 335 ; 5 | 345 ; 5 | 317 ; 5 | 0.5 ; 12.6 | - ; -
S832 | 482 ; 5 | 482 ; 5 | 335 ; 5 | 344 ; 5 | 355 ; 5 | 324 ; 5 | 0.4 ; 8.5 | - ; -
S838.1 | 341 ; 32 | 341 ; 32 | 328 ; 32 | 335 ; 32 | 328 ; 32 | 328 ; 32 | 0.2 ; 2.3 | - ; -
S9234.1 | 2346 ; 211 | 2346 ; 172 | 1896 ; 172 | 1891 ; 174 | 1437 ; 145 | 1377 ; 146 | 1.8 ; 14.3 | - ; -
S938 | 341 ; 32 | 341 ; 32 | 328 ; 32 | 335 ; 32 | 328 ; 32 | 328 ; 32 | 0.2 ; 2.3 | - ; -
S953 | 348 ; 29 | 348 ; 6 | 356 ; 6 | 343 ; 6 | 340 ; 6 | 332 ; 6 | 0.3 ; 4.4 | - ; -
S967 | 369 ; 29 | 369 ; 6 | 386 ; 6 | 370 ; 6 | 357 ; 6 | 355 ; 6 | 0.3 ; 4.4 | - ; -
S991 | 299 ; 19 | 299 ; 19 | 297 ; 19 | 297 ; 19 | 297 ; 19 | 297 ; 19 | 0.2 ; 2.3 | - ; -
% Reduction | 0.0 ; 0.0 | 0.0 ; 16.8 | 9.8 ; 17.7 | 9.5 ; 17.4 | 10.8 ; 18.7 | 14.3 ; 18.9
Table 6.1: Retiming results for the ISCAS89 benchmarks (number of two-input AND vertices ; number of REGISTERs)
Design | Original netlist | Plain retiming [66] | Retiming with fanin sharing | On-the-fly retiming with fanin sharing | Iterated interleaved retiming and redundancy removal (until no further improvement): plain retiming | Iterated: best of columns 4 or 5 | Time (s) ; Memory (MB)
CP RAS | 2686 ; 660 | 2686 ; 585 | 2103 ; 492 | 2159 ; 492 | 2148 ; 489 | 2039 ; 489 | 4.9 ; 32.4
CR RAS | 2297 ; 431 | 2297 ; 379 | 2200 ; 378 | 2209 ; 387 | 1735 ; 341 | 1873 ; 348 | 2.0 ; 14.5
D DASA | 1223 ; 115 | 1223 ; 100 | 967 ; 100 | 968 ; 100 | 844 ; 100 | 815 ; 100 | 0.8 ; 8.9
D DCLA | 10916 ; 1137 | 10916 ; 771 | 10483 ; 771 | 10506 ; 771 | 7853 ; 750 | 7443 ; 750 | 23.9 ; 94.1
D DUDD | 1295 ; 129 | 1295 ; 100 | 1143 ; 100 | 1146 ; 100 | 1119 ; 100 | 1084 ; 100 | 1.1 ; 12.9
I IBBC | 389 ; 195 | 389 ; 43 | 228 ; 41 | 217 ; 41 | 207 ; 43 | 196 ; 37 | 0.5 ; 9.7
I IFAR | 1202 ; 413 | 1202 ; 147 | 1031 ; 142 | 1033 ; 143 | 997 ; 139 | 929 ; 137 | 1.7 ; 18.5
I IFEC | 334 ; 182 | 334 ; 46 | 302 ; 45 | 309 ; 45 | 308 ; 46 | 287 ; 45 | 0.7 ; 15.0
I IFPF | 5896 ; 1546 | 5896 ; 705 | 5273 ; 679 | 4715 ; 612 | 2812 ; 350 | 2768 ; 355 | 43.9 ; 78.0
L EMQ | 981 ; 220 | 981 ; 88 | 737 ; 87 | 745 ; 88 | 920 ; 86 | 632 ; 74 | 1.2 ; 16.3
L EXEC | 1618 ; 535 | 1618 ; 168 | 1191 ; 163 | 1193 ; 197 | 1178 ; 144 | 974 ; 138 | 2.2 ; 19.0
L FLUSH | 893 ; 159 | 893 ; 5 | 495 ; 1 | 409 ; 1 | 358 ; 1 | 338 ; 1 | 0.6 ; 8.7
L LMQ | 14074 ; 1876 | 14074 ; 1196 | 12921 ; 1190 | 12983 ; 1190 | 5793 ; 432 | 5363 ; 428 | 41.5 ; 91.9
L LRU | 581 ; 237 | 581 ; 94 | 524 ; 94 | 518 ; 94 | 469 ; 94 | 439 ; 94 | 1.0 ; 13.1
L PNTR | 1453 ; 541 | 1453 ; 245 | 1351 ; 245 | 1349 ; 245 | 1387 ; 245 | 1325 ; 245 | 1.2 ; 8.2
L TBWK | 1160 ; 307 | 1160 ; 125 | 829 ; 124 | 829 ; 124 | 279 ; 40 | 267 ; 40 | 0.8 ; 11.0
M CIU | 4550 ; 777 | 4550 ; 459 | 3262 ; 415 | 3244 ; 415 | 2929 ; 381 | 2757 ; 379 | 4.8 ; 35.8
S SCU1 | 1520 ; 373 | 1520 ; 212 | 1296 ; 204 | 1346 ; 207 | 1308 ; 201 | 1160 ; 192 | 2.8 ; 20.2
S SCU2 | 8560 ; 1368 | 8560 ; 640 | 6632 ; 566 | 5990 ; 564 | 3928 ; 432 | 4119 ; 425 | 34.6 ; 58.9
V CACH | 753 ; 173 | 753 ; 103 | 652 ; 105 | 649 ; 110 | 424 ; 95 | 393 ; 97 | 0.8 ; 14.9
V DIR | 554 ; 178 | 554 ; 87 | 491 ; 87 | 285 ; 50 | 160 ; 45 | 152 ; 43 | 0.5 ; 10.7
V L2FB | 120 ; 75 | 120 ; 26 | 103 ; 26 | 103 ; 26 | 107 ; 26 | 95 ; 26 | 0.3 ; 4.4
V SCR1 | 826 ; 150 | 826 ; 95 | 418 ; 52 | 618 ; 94 | 341 ; 49 | 325 ; 48 | 0.6 ; 10.6
V SCR2 | 2563 ; 551 | 2563 ; 458 | 1157 ; 86 | 2343 ; 460 | 524 ; 82 | 510 ; 82 | 1.4 ; 14.3
V SNPC | 78 ; 93 | 78 ; 21 | 68 ; 21 | 68 ; 21 | 67 ; 21 | 62 ; 21 | 0.3 ; 5.4
V SNPM | 2421 ; 1421 | 2421 ; 241 | 1843 ; 237 | 1814 ; 241 | 1800 ; 232 | 1221 ; 180 | 33.8 ; 116.8
W GAR | 2107 ; 242 | 2107 ; 93 | 1775 ; 91 | 1769 ; 91 | 1896 ; 91 | 1590 ; 75 | 3.3 ; 16.8
W SFA | 471 ; 64 | 471 ; 42 | 329 ; 42 | 329 ; 42 | 324 ; 41 | 300 ; 41 | 0.6 ; 12.7
% Reduction | 0.0 ; 0.0 | 0.0 ; 50.1 | 20.7 ; 54.8 | 20.3 ; 51.8 | 33.6 ; 60.3 | 39.3 ; 61.1
Table 6.2: Retiming results for IBM Gigahertz Processor (GP) netlists
The additional application of on-the-fly retiming has a varying effect upon size.
Our experiments show that on average it hurts both REGISTER count and AND count.
However, in individual cases, it provides a substantial benefit. For example, for seven of
the 42 ISCAS netlists and eleven of the 28 GP netlists, on-the-fly retiming further reduced
the overall AND count. In addition, for three GP netlists the number of REGISTERs is
decreased. Selecting the best result of columns 4 and 5 on a per-netlist basis, we attain an
additional cumulative reduction of 2.5% in AND count and 1.6% in REGISTER count for
the GP netlists over the results of column 4 alone. Also, as illustrated in Figure 5.7, on-
the-fly retiming alone may result in REGISTER reduction even without solving the retiming
problem. For example, the GP netlist LFLUSH is a reconvergent acyclic pipeline. Before
using the ILP solver to calculate an optimal retiming, the options used in columns 4 and 5
reduce the REGISTER count to 78 and 38, respectively. Nevertheless, on-the-fly retiming
often temporarily hurts REGISTER count; this penalty is subsequently rectified during the
global retiming phase.
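The global retiming phase can be viewed in the classical Leiserson–Saxe formulation: each vertex v receives an integer lag r(v), and the retimed weight of an edge (u, v) is w_r(u, v) = w(u, v) + r(v) − r(u). The following is a minimal brute-force sketch of that formulation on a hypothetical three-vertex cycle, not the ILP implementation used here; it also illustrates why the total weight of any directed cycle is invariant under retiming.

```python
from itertools import product

# Toy netlist graph: edges are (tail, head, weight), where the weight is the
# number of REGISTERs on the edge. This forms one directed cycle a->b->c->a.
edges = [("a", "b", 1), ("b", "c", 0), ("c", "a", 1)]
vertices = sorted({u for (u, _, _) in edges} | {v for (_, v, _) in edges})

def retime(edges, lag):
    # Leiserson-Saxe retimed weight: w_r(u,v) = w(u,v) + lag(v) - lag(u)
    return [(u, v, w + lag[v] - lag[u]) for (u, v, w) in edges]

def legal(retimed):
    # Legal without NEGATIVE REGISTERs: no retimed edge weight is negative.
    return all(w >= 0 for (_, _, w) in retimed)

# Brute-force small lag assignments to minimize the total REGISTER count.
best = None
for lags in product(range(-2, 3), repeat=len(vertices)):
    lag = dict(zip(vertices, lags))
    r = retime(edges, lag)
    if legal(r):
        cost = sum(w for (_, _, w) in r)
        best = cost if best is None else min(best, cost)

# Retiming preserves the sequential weight of every directed cycle, so the
# cycle always retains its two REGISTERs regardless of the lags chosen.
print(best)  # 2
```

The lag differences telescope around any cycle, which is exactly why no legal retiming can drop below the cycle's original weight of two here.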
We briefly discuss how redundancy removal and on-the-fly retiming may occasion-
ally hurt REGISTER count using the example netlist depicted in Figure 6.9. Initially, ver-
tices a and b have three sinks: one AND vertex g1 through a weight-of-zero edge; one
AND vertex g2 through a weight-of-one edge; and another distinct set of vertices through
a weight-of-one edge. By fanout sharing, the initial netlist depicted in part (a) has a total
weight of two. However, with on-the-fly retiming, we will drag the REGISTERs beyond g2,
thus eliminating the ability to share fanout REGISTERs; we then may merge g1 and g2,
locking us into the weight-of-three solution depicted in part (b). Even if we had not merged
g1 and g2, this example depicts how on-the-fly retiming often temporarily hurts REGIS-
TER count, though the ILP solver has the opportunity to rectify this penalty. By merging
g1 and g2, we hurt REGISTER count in a manner which the ILP solver cannot rectify un-
less adjacent retiming opportunities may be exploited, since backward retiming the merged
vertex may entail a NEGATIVE REGISTER on the outgoing edge with weight zero. Therefore,
a promising direction of future research is to apply on-the-fly retiming in a more limited
fashion, perhaps neglecting such a drag unless it is determined that all non-zero-weight
outgoing edges from both a and b may be on-the-fly retimed, or that a
resulting merge will cause a weight-of-zero and a weight-of-one outgoing edge from the
merged-onto vertex (g1 = g2 in this example).
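Under fanout sharing, the REGISTERs on a vertex's outgoing edges can be implemented as a single chain, so each source vertex contributes only the maximum weight among its outgoing edges rather than their sum. A small sketch of that cost model, with the vertex names and weights assumed from the Figure 6.9 discussion:

```python
# Fanout sharing: a vertex's outgoing REGISTERs form a shared chain, so the
# vertex contributes max(outgoing edge weights) REGISTERs, not their sum.
def shared_register_count(fanout_weights):
    # fanout_weights maps each source vertex to the weights of its outgoing edges
    return sum(max(ws) for ws in fanout_weights.values())

# Part (a), illustrative: a and b each drive g1 (weight 0), g2 (weight 1), and
# other sinks (weight 1); each contributes max = 1 REGISTER, for a total of 2.
before = {"a": [0, 1, 1], "b": [0, 1, 1]}

# Part (b), illustrative: dragging REGISTERs beyond g2 and merging g1 with g2
# leaves unsharable weight-1 edges at a and b plus one at the merged vertex.
after = {"a": [1], "b": [1], "g1=g2": [1]}

print(shared_register_count(before))  # 2
print(shared_register_count(after))   # 3
```

This reproduces the weight-of-two versus weight-of-three counts quoted for parts (a) and (b).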
Iteration of redundancy removal and retiming may provide significant additional
reductions. Compared to the single application runs, an additional average reduction of
4.5% and 1.2% on the ISCAS benchmarks, and 18.6% and 6.3% on the GP netlists, was
achieved for the number of AND vertices and REGISTERs, respectively. Up to six itera-
tions were applied during these runs, with an average number of 2.6 for ISCAS and 4.6
Figure 6.9: Example netlist depicting how on-the-fly retiming may hurt REGISTER count
for GP. The reported results in column 7 used on-the-fly retiming on eight of the 42 ISCAS
netlists and on six of the 28 GP netlists. One particularly interesting result is that an iterated
application using our new techniques of fanin sharing and on-the-fly retiming is able to sig-
nificantly outperform an interleaved classical retiming and redundancy removal approach.
This demonstrates the overall potential of the presented approaches for enhancing verifi-
cation and technology-independent logic synthesis, and furthermore illustrates the synergy
possible between reduction algorithms in a transformation-based verification framework.
6.4.2 Retiming Experiments
In our next set of experiments we evaluated the impact of generalized retiming to re-
duce netlist size and enhance verification. These experiments were performed on an IBM
RS/6000 Model 260, with a 256 MB memory limit. NEGATIVE REGISTERs are allowed
for these experiments.
In the first set of experiments we assessed the potential of generalized retiming
for reducing REGISTER count. In particular, we evaluated an iterative scheme where the
retiming engine (RET) and the redundancy removal [51] engine (COM) are called in an
interleaved manner. The results for the ISCAS and GP netlists are provided in Table 6.3.
For the ISCAS benchmarks, we list only the netlists with more than 16 REGISTERs since
smaller designs are of less interest. Columns 2, 3, and 4 report the number of REGISTERs
plus NEGATIVE REGISTERs of the original netlist, after applying COM only, and after
84
applying RET only, respectively. The following columns provide these REGISTER counts
after performing several iterations of COM followed by RET. The number of NEGATIVE
REGISTERs in the sum, if non-zero, is provided in parentheses. For brevity, we report only
up to three iterations; additional iterations provided marginal, though non-zero, improve-
ments. The maximum lag reported in column 9 provides an indication of retiming stump
size; see Section 7.3 for a more detailed discussion of this topic.
Overall, these results indicate that generalized retiming has a significant potential for
reducing the number of REGISTERs for enhanced verification. For the ISCAS benchmarks
we obtained a maximum REGISTER reduction of 79% with an average of 27%. For the
GP netlists we achieved a maximum reduction of 99.4% with an average of 62%. One
particularly interesting example is the LFLUSH netlist which implements intricate acyclic
control logic. It has one critical path which prevents retiming from removing all
REGISTERs. Retiming removes all REGISTERs outside this path, and finds a single
net along the critical path to which the remaining REGISTER may be moved.
The number of NEGATIVE REGISTERs generated by retiming is quite small. This
can be explained by several factors. First, we disallow NEGATIVE REGISTERs on sharing
edges as per the discussion of Section 6.1 to enable efficient linear algorithms for our (albeit
more limited) solution space. Second, since retiming preserves the sequential weight of
directed cycles, there is generally a penalty associated with NEGATIVE REGISTERs within
a SCC. Only paths between the SCCs are likely to require NEGATIVE REGISTERs for an
optimal solution.
The results further indicate that a repeated application of retiming and redundancy
removal techniques may achieve greater reductions than a single application of either or
both techniques. For example, the number of REGISTERs of netlist LLMQ is reduced from
1876 to 1185, 433, and 425 by applying one, two, or three iterations of redundancy removal
followed by retiming, respectively. This is a justification of the power of a transformation-
based verification architecture in simplifying problems which may otherwise be infeasible.
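Such interleaved engine application can be sketched as a simple fixpoint loop; the COM and RET stand-ins below are hypothetical reduction functions over a size estimate, not the actual engines:

```python
def iterate_engines(netlist_size, engines, max_iters=6):
    """Apply reduction engines in an interleaved manner until a fixpoint.

    `engines` is an ordered list of functions mapping a size estimate to a
    (hopefully smaller) size estimate -- stand-ins for COM and RET. Up to
    `max_iters` passes are made, stopping early when no engine improves.
    """
    for _ in range(max_iters):
        before = netlist_size
        for engine in engines:
            netlist_size = engine(netlist_size)
        if netlist_size == before:  # no further improvement: fixpoint reached
            break
    return netlist_size

# Illustrative stand-ins: COM strips 10% of REGISTERs, RET a further 30%.
com = lambda n: int(n * 0.9)
ret = lambda n: int(n * 0.7)
print(iterate_engines(1876, [com, ret]))  # shrinks far below the initial 1876
```

The point of the sketch is only the control structure: each engine runs on the output of the previous one, and iteration continues until neither helps (or an iteration cap is hit), mirroring the up-to-six iterations reported above.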
Design | Number of sequential elements (NEGATIVE REGISTERs): Original | COM only | RET only | COM-RET, 1 iteration | COM-RET, 2 iterations | COM-RET, 3 iterations | Relative reduction (best) | Max. lag | Time (s) ; Memory (MB)
PROLOG | 136 | 81 | 45 (1) | 45 (1) | 45 (3) | 44 (2) | 67.6% | 2 | 1.4 ; 22.4
S1196 | 18 | 16 | 16 | 14 | 14 | 14 | 22.2% | 1 | 0.6 ; 10.7
S1238 | 18 | 17 | 16 | 15 | 14 | 14 | 22.2% | 1 | 0.9 ; 21.1
S1269 | 37 | 37 | 36 | 36 | 36 | 36 | 2.7% | 1 | 0.4 ; 6.2
S13207.1 | 638 | 513 | 390 | 343 | 292 (1) | 289 | 54.7% | 11 | 3.8 ; 34.7
S1423 | 74 | 74 | 72 | 72 | 72 | 72 | 2.7% | 1 | 0.5 ; 6.2
S1512 | 57 | 57 | 57 | 57 | 57 | 57 | 0.0% | 1 | 0.5 ; 6.2
S15850.1 | 534 | 518 | 498 | 488 | 485 | 485 | 9.2% | 6 | 5.3 ; 31.8
S3271 | 116 | 116 | 110 | 110 | 110 | 110 | 5.2% | 5 | 0.7 ; 7.0
S3330 | 132 | 81 | 44 (2) | 44 (3) | 44 (2) | 44 (2) | 66.7% | 3 | 0.7 ; 7.0
S3384 | 183 | 183 | 72 | 72 | 72 | 72 | 60.7% | 6 | 0.7 ; 7.1
S35932 | 1728 | 1728 | 1728 | 1728 | 1728 | 1728 | 0.0% | 1 | 7.2 ; 38.0
S382 | 21 | 21 | 15 | 15 | 15 | 15 | 28.6% | 1 | 0.3 ; 5.9
S38584.1 | 1426 | 1415 | 1375 | 1375 | 1374 | 1374 | 3.6% | 5 | 29.4 ; 127.4
S400 | 21 | 21 | 15 | 15 | 15 | 15 | 28.6% | 0 | 0.3 ; 5.9
S444 | 21 | 21 | 15 | 15 | 15 | 15 | 28.6% | 1 | 0.3 ; 5.9
S4863 | 104 | 88 | 37 | 37 | 37 | 37 | 64.4% | 4 | 0.9 ; 7.3
S499 | 22 | 22 | 22 | 22 | 20 | 20 | 9.1% | 1 | 0.6 ; 15.1
S526N | 21 | 21 | 21 | 21 | 21 | 21 | 0.0% | 2 | 0.4 ; 5.9
S5378 | 179 | 164 | 112 (6) | 112 (6) | 111 (6) | 111 (6) | 38.0% | 5 | 1.6 ; 18.4
S635 | 32 | 32 | 32 | 32 | 32 | 32 | 0.0% | 1 | 0.4 ; 5.9
S641 | 19 | 17 | 15 | 15 | 15 | 15 | 21.1% | 2 | 0.4 ; 5.9
S6669 | 239 | 231 | 92 | 75 | 75 | 75 | 68.6% | 5 | 1.6 ; 14.1
S713 | 19 | 17 | 15 | 15 | 15 | 15 | 21.1% | 2 | 0.4 ; 5.9
S838.1 | 32 | 32 | 32 | 32 | 32 | 32 | 0.0% | 0 | 0.5 ; 6.1
S9234.1 | 211 | 193 | 172 | 172 | 165 | 131 | 37.9% | 3 | 2.5 ; 26.2
S938 | 32 | 32 | 32 | 32 | 32 | 32 | 0.0% | 0 | 0.4 ; 6.1
S953 | 29 | 29 | 6 | 6 | 6 | 6 | 79.3% | 0 | 0.4 ; 6.1
S967 | 29 | 29 | 6 | 6 | 6 | 6 | 79.3% | 0 | 0.4 ; 6.1
S991 | 19 | 19 | 19 | 19 | 19 | 19 | 0.0% | 2 | 0.4 ; 6.0
CR RAS | 431 | 431 | 378 | 370 | 348 | 348 | 19.3% | 3 | 6.0 ; 22.6
D DASA | 115 | 115 | 100 | 100 | 100 | 100 | 13.0% | 2 | 0.9 ; 7.1
D DCLA | 1137 | 1137 | 771 | 750 | 750 | 750 | 34.0% | 1 | 35.4 ; 36.2
D DUDD | 129 | 129 | 100 | 100 | 100 | 100 | 22.5% | 3 | 0.9 ; 7.0
I IBBC | 195 | 195 | 40 | 40 | 38 | 36 | 81.5% | 2 | 1.6 ; 21.6
I IFAR | 413 | 413 | 142 | 139 | 136 | 136 | 67.1% | 4 | 3.1 ; 19.5
I IFEC | 182 | 182 | 45 | 45 | 45 | 45 | 75.3% | 6 | 0.7 ; 7.0
I IFPF | 1546 | 1356 | 673 (4) | 661 (4) | 449 (2) | 442 (2) | 71.4% | 10 | 46.5 ; 127.9
L EMQ | 220 | 220 | 87 | 88 | 74 | 74 | 66.4% | 4 | 3.4 ; 18.5
L EXEC | 535 | 535 | 163 | 137 | 135 | 134 | 75.0% | 6 | 9.8 ; 28.1
L FLUSH | 159 | 159 | 1 | 1 | 1 | 1 | 99.4% | 3 | 0.8 ; 7.0
L LMQ | 1876 | 1831 | 1190 | 1185 | 433 (3) | 425 (3) | 77.3% | 3 | 50.7 ; 139.1
L LRU | 237 | 237 | 94 | 94 | 94 | 94 | 60.3% | 2 | 1.1 ; 7.1
L PNTR | 541 | 541 | 245 | 245 | 245 | 245 | 54.7% | 3 | 1.8 ; 8.8
L TBWK | 307 | 307 | 124 | 124 | 40 | 40 | 87.0% | 3 | 2.7 ; 18.0
M CIU | 777 | 686 | 415 | 415 | 411 | 387 (1) | 50.2% | 15 | 26.3 ; 76.6
S SCU1 | 373 | 373 | 204 | 200 | 192 | 192 | 48.5% | 3 | 9.0 ; 20.6
S SCU2 | 1368 | 1368 | 566 | 565 | 426 | 423 | 69.1% | 5 | 102.2 ; 67.4
V CACH | 173 | 155 | 104 (2) | 96 (3) | 96 (2) | 95 (1) | 45.1% | 9 | 1.1 ; 24.0
V DIR | 178 | 151 | 87 | 83 | 43 | 42 (1) | 76.4% | 5 | 0.9 ; 22.3
V L2FB | 75 | 75 | 26 | 26 | 26 | 26 | 65.3% | 2 | 0.5 ; 5.9
V SCR1 | 150 | 128 | 52 | 48 (1) | 48 (1) | 48 | 68.0% | 4 | 0.7 ; 10.9
V SCR2 | 551 | 551 | 86 | 82 | 82 | 82 | 85.1% | 4 | 4.4 ; 15.0
V SNPC | 93 | 93 | 21 | 21 | 21 | 21 | 77.4% | 4 | 0.5 ; 6.8
V SNPM | 1421 | 1216 | 233 (7) | 233 (7) | 231 (11) | 227 (8) | 84.0% | 15 | 14.7 ; 65.2
W GAR | 242 | 232 | 91 (1) | 90 | 90 | 79 (1) | 67.4% | 2 | 3.2 ; 25.4
W SFA | 64 | 64 | 42 | 42 | 41 | 41 | 35.9% | 1 | 1.0 ; 16.0
Table 6.3: Generalized retiming results for ISCAS89 (upper part) and GP (lower part)
Design | Original netlist: REGISTERs | Reachability steps, algo. | Time (s) ; Memory (MB) | Reduced netlist: REGISTERs | Reachability steps, algo. | Init-state BDD nodes | Time (s) ; Memory (MB) | Relative improvement: Time ; Memory
PROLOG | 136 | 17 C I | 2285 ; 134.5 | 45 | 16 C H | 611 | 81.6 ; 27.5 | 96.4% ; 79.6%
S1196 | 18 | 4 C I | 1.1 ; 6.5 | 14 | 2 C I | 122 | 0.5 ; 6.3 | 54.5% ; 3.1%
S1238 | 18 | 4 C I | 1.2 ; 6.5 | 14 | 2 C I | 159 | 0.1 ; 6.3 | 91.7% ; 3.1%
S1269 | 37 | 11 C H | 13194 ; 185.5 | 36 | 11 C H | 901 | 13395 ; 187.5 | -1.5% ; -1.1%
S3330 | 132 | 17 C H | 668.0 ; 35.3 | 45 | 16 C I | 194 | 35.8 ; 15.6 | 94.6% ; 55.8%
S382 | 21 | 13 C I | < 0.1 ; 6.2 | 15 | 11 C I | 17 | < 0.1 ; 6.1 | 0.0% ; 1.6%
S400 | 21 | 10 C I | < 0.1 ; 6.2 | 15 | 10 C H | 16 | < 0.1 ; 6.1 | 0.0% ; 1.6%
S444 | 21 | 4 C I | < 0.1 ; 6.1 | 15 | 3 C H | 27 | < 0.1 ; 6.1 | 0.0% ; 0.0%
S4863 | 104 | 3 I | 14400 ; 174.2 | 37 | 4 C I | 199 | 14.8 ; 16.6 | 99.9% ; 90.5%
S499 | 22 | 1 C H | 0.2 ; 6.2 | 20 | 1 C H | 21 | < 0.1 ; 6.2 | 100% ; 0.0%
S641 | 19 | 6 C I | 0.8 ; 6.4 | 15 | 5 C I | 15 | 1.0 ; 6.4 | -25.0% ; 0.0%
S713 | 19 | 6 C I | 0.9 ; 6.3 | 15 | 5 C I | 15 | 0.6 ; 6.4 | 33.3% ; -1.6%
S953 | 29 | 6 C I | 0.8 ; 6.4 | 6 | 5 C H | 7 | < 0.1 ; 6.1 | 100% ; 4.7%
S967 | 29 | 4 C I | 1.1 ; 6.3 | 6 | 3 C H | 7 | < 0.1 ; 6.1 | 100% ; 3.2%
CR RAS | 431 | 1028 C I | 724.3 ; 57.2 | 370 | 1026 C I | 415 | 424.0 ; 51.8 | 41.5% ; 9.4%
D DASA | 115 | 6 C I | 19.7 ; 7.8 | 100 | 5 C I | 200 | 33.0 ; 11.6 | -67.5% ; -48.7%
D DUDD | 129 | 13 C I | 953.3 ; 112.8 | 100 | 11 C H | 2568 | 359.1 ; 33.7 | 62.3% ; 70.1%
I IBBC | 195 | 5 C H | 145.3 ; 11.4 | 40 | 3 C H | 41 | 4.4 ; 6.4 | 97.0% ; 43.9%
I IFAR | 413 | 5 I | 14400 ; 87.0 | 139 | 22 C I | 719 | 2302 ; 102.0 | 84.0% ; -17.2%
I IFEC | 182 | 6 C I | 66.3 ; 8.4 | 45 | 2 C H | 151 | 28.0 ; 6.9 | 57.8% ; 17.9%
L EMQ | 220 | 8 C H | 323.7 ; 17.0 | 88 | 5 C H | 5519 | 205.6 ; 33.0 | 36.5% ; -94.1%
L EXEC | 535 | 5 H | 14400 ; 63.2 | 137 | 9 C I | 1856 | 593.6 ; 103.2 | 95.9% ; -63.3%
L FLUSH | 159 | 4 C I | 37.4 ; 7.7 | 1 | 2 C H | 2 | < 0.1 ; 6.2 | 100% ; 19.5%
L PNTR | 541 | 6 C I | 6687 ; 138.5 | 245 | 3 C I | 242 | 2423 ; 51.2 | 63.8% ; 63.0%
L TBWK | 307 | 6 C H | 184.1 ; 9.1 | 124 | 4 C H | 123 | 74.0 ; 7.4 | 59.8% ; 18.7%
S SCU1 | 373 | 14 C H | 8934 ; 165.8 | 200 | 12 C H | 755 | 1195 ; 118.1 | 86.6% ; 28.8%
V CACH | 173 | 11 C H | 92.1 ; 17.2 | 97 | 8 C I | 910 | 20.0 ; 8.9 | 78.3% ; 48.3%
V DIR | 178 | 8 C H | 57.9 ; 8.3 | 83 | 2 C I | 95 | 11.1 ; 7.0 | 80.8% ; 15.7%
V L2FB | 75 | 4 C I | 2.9 ; 6.3 | 26 | 2 C H | 27 | < 0.1 ; 6.1 | 100% ; 3.2%
V SCR1 | 150 | 20 C H | 250.0 ; 17.7 | 48 | 17 C I | 90 | 5.0 ; 15.5 | 98.0% ; 12.4%
V SCR2 | 551 | 22 C I | 1201 ; 105.0 | 82 | 20 C I | 220 | 260.0 ; 36.7 | 78.4% ; 65.0%
V SNPC | 93 | 4 C H | 4.9 ; 6.6 | 21 | 1 C H | 17 | < 0.1 ; 6.2 | 100% ; 6.1%
W GAR | 242 | 11 C I | 109.8 ; 25.0 | 90 | 9 C H | 191 | 82.5 ; 13.0 | 24.9% ; 48.0%
W SFA | 64 | 7 C I | 3.7 ; 6.8 | 42 | 6 C I | 14 | 3.6 ; 6.9 | 2.7% ; -1.5%
Table 6.4: Effect of retiming on reachability analysis (C = completed within a time limit of four hours, H = hybrid image computation, I = IWLS95 image computation)
Table 6.4 provides results of another experiment on assessing the impact of gener-
alized retiming for enhanced symbolic reachability analysis using VIS 1.4 [82]. We report
results for all netlists of Table 6.3 for which retiming resulted in a REGISTER reduction,
and for which reachability analysis (before or after retiming) could be completed. We ran
each experiment with two options for image computation: the IWLS95 partitioned transi-
tion relation method [83] and the hybrid method [43]. We report the best of the two results
on a per-example basis. Although after reduction we complete traversal for only three ad-
ditional netlists, the results clearly show that retiming significantly improves the overall
performance of reachability analysis. The CPU time is decreased by an average of 53.1%
for ISCAS and 64.0% for GP netlists, respectively. The corresponding memory reductions
are 17.2% and 12.3%, respectively. The cumulative run-time speedup is 55.7% for the
ISCAS benchmarks and 83.5% for the GP netlists. As another measure of the size of the
retiming stump, we report the BDD sizes for the initial states in column 7. As shown, these
BDDs remain fairly small and tend not to hinder reachability analysis.
Figure 6.10 illustrates the profile of peak BDD size while traversing benchmark
S3330, for the original netlist and after various reductions. This example demonstrates
how retiming tends to benefit the performance of reachability analysis. To further illus-
trate the effect of retiming on reducing the correlation of the state encoding, we analyzed
the traversal of netlist S4863. Reachability timed out during the third traversal step of
the original netlist. Using retiming, the correlation between the remaining REGISTERs
was completely removed, resulting in full reachability of all 2^37 states. Interestingly, the
fine-grained conjunction scheduling approach proposed in [44] provides a similar result for
this netlist which eliminates the need for representing the present- and next-state variables
of any REGISTERs without using retiming, instead using an advanced image computation
algorithm. They too are able to complete reachability for this netlist, though their com-
putational requirements exceed ours by more than an order of magnitude, and on a faster
computer. While such a profound result is likely atypical, this is strong evidence of the
power of both redundancy removal and retiming to reduce REGISTER correlation.
6.4.3 Diameter Overapproximation Experiments
In our final set of experiments, we implemented the diameter overapproximation algorithms
presented in Chapter 4. We ran several sets of experiments to assess the effectiveness of
these techniques on netlists after various transformations.
Our first set of experiments, summarized in Table 6.5, are on the ISCAS89 bench-
marks, using each primary output as a target. We categorized the REGISTERs in the netlist
[Figure: Symbolic Reachability Profile for S3330 – peak number of BDD nodes per time-step (0 through 16), plotted for No Reduction, COM Only, Retiming Only, and COM + Retiming.]
Figure 6.10: Peak BDD size profile for traversing S3330 with the IWLS95 image computation method after various transformations
into the various TSAP types: CCs, ACs, MCs + QCs, and GCs. We additionally ran our di-
ameter overapproximation algorithm on all targets; any with a diameter of less than 50 were
enumerated in set T′ ⊆ T, and the average of these corresponding diameters is reported.
The bound of 50 was arbitrarily chosen as being a reasonable cut-off size for discharging
with BMC. In the bottom row we report the cumulative sum of REGISTERs of the corre-
sponding types, and the cumulative sum of |T′| and |T|. We performed these experiments
on the original netlists; on redundancy-removed netlists (COM); and on netlists after redun-
dancy removal and retiming (COM,RET,COM), using Theorems 5.3 and 6.3. We perform
the identical set of experiments on GP netlists in Table 6.6. We do not report per-line
computational resources for these experiments; our structural diameter overapproximation
algorithms consume trivial resources. The maximum resources necessary per netlist in
these runs were 12.4 seconds for ISCAS, and 0.4 seconds for GP, with less than 1 MB for
either. The reason for the larger requirements for ISCAS is that we distinctly analyze the
fanin cone of each target; some of the ISCAS netlists have a large number of targets. Our
resource requirements are thus less than one second per target on any of these benchmarks.
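The constant and acyclic categories used below are purely structural: a REGISTER is acyclic precisely when it lies on no directed cycle of the REGISTER dependency graph. A minimal sketch of that classification follows; the dependency graph and register names are invented for illustration, and the MC/QC/GC table-cell analysis is not reproduced here.

```python
def on_cycle(graph):
    """Return the set of vertices that lie on some directed cycle.

    `graph` maps each REGISTER to the REGISTERs in its next-state support.
    A vertex is cyclic iff it can reach itself (non-trivial SCC or self-loop).
    Uses a simple iterative reachability check, adequate for small graphs.
    """
    def reaches(src, dst):
        seen, stack = set(), [src]
        while stack:
            v = stack.pop()
            for w in graph.get(v, ()):
                if w == dst:
                    return True
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        return False
    return {v for v in graph if reaches(v, v)}

# Illustrative dependency graph: r1 and r2 form an acyclic pipeline; r3 feeds
# back on itself (cyclic); r4 depends on nothing (e.g. holds a constant).
deps = {"r1": [], "r2": ["r1"], "r3": ["r3", "r2"], "r4": []}
cyclic = on_cycle(deps)
acyclic = set(deps) - cyclic
print(sorted(cyclic))   # ['r3']
print(sorted(acyclic))  # ['r1', 'r2', 'r4']
```

In a production implementation one would use a linear-time SCC decomposition rather than per-vertex reachability, but the classification criterion is the same.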
Analyzing the ISCAS results, we see that for the original netlists, many REGISTERs
are non-complex: 21% are acyclic REGISTERs, and 5% are table cells. A total of 477
original targets (30%) have a diameter of less than 50. After redundancy removal, 24%
of the REGISTERs are acyclic, and 10% are table cells; 556 of the targets (34%) have a
diameter of less than 50. After redundancy removal and retiming, 10% of the REGISTERs
are acyclic and 11% are table cells. This drop in acyclic REGISTERs is due primarily
to their elimination by retiming. A total of 639 targets (40%) have a diameter of less
than 50. These results demonstrate the significant potential of structural transformations
to enable a practically useful overapproximate diameter bound for the
untransformed netlists. This result is particularly profound noting that we did not employ
any (possibly costly) techniques to attempt to tighten GC diameter bounds; our experiments
thus reflect a very fine line between being able to attain a small diameter bound and a huge
bound. As techniques emerge for efficiently improving diameter bounding for GCs, the
compositional and transformation-based theory we have developed should prove even more
useful in obtaining superior results with lesser resources.
For the GP netlists, we see that a larger fraction of the REGISTERs is originally non-
complex: 1% are constants, 57% are acyclic, and 13% are table cells. A total of 95 targets
(33%) have a diameter of less than 50. After redundancy removal, 0.5% of the REGISTERs
are constants, 58% are acyclic, and 15% are table cells. A total of 111 targets (39%) have
a diameter of less than 50. After retiming and redundancy removal, 1% of the REGISTERs
are constants, 19% are acyclic, and 34% are table cells. A total of 126 (44%) of these
targets have a diameter of less than 50.
Design | Original: |R| ∈ CC ; AC ; MC+QC ; GC | |T′| / |T| ; avg. d(t′) | COM: |R| ∈ CC ; AC ; MC+QC ; GC | |T′| / |T| ; avg. d(t′) | COM,RET,COM: |R| ∈ CC ; AC ; MC+QC ; GC | |T′| / |T| ; avg. d(t′)
PROLOG | 0 ; 107 ; 1 ; 28 | 14 / 73 ; 8.9 | 0 ; 103 ; 1 ; 28 | 16 / 73 ; 11.9 | 0 ; 16 ; 1 ; 28 | 24 / 73 ; 21.0
S1196 | 0 ; 18 ; 0 ; 0 | 14 / 14 ; 3.3 | 0 ; 18 ; 0 ; 0 | 14 / 14 ; 3.3 | 0 ; 16 ; 0 ; 0 | 14 / 14 ; 4.3
S1238 | 0 ; 18 ; 0 ; 0 | 14 / 14 ; 3.3 | 0 ; 18 ; 0 ; 0 | 14 / 14 ; 3.3 | 0 ; 16 ; 0 ; 0 | 14 / 14 ; 4.3
S1269 | 0 ; 9 ; 17 ; 11 | 2 / 10 ; 10.0 | 0 ; 9 ; 17 ; 11 | 2 / 10 ; 10.0 | 0 ; 8 ; 17 ; 11 | 2 / 10 ; 10.0
S13207.1 | 0 ; 314 ; 128 ; 196 | 49 / 152 ; 2.0 | 0 ; 315 ; 128 ; 195 | 49 / 152 ; 2.1 | 0 ; 77 ; 89 ; 183 | 79 / 152 ; 6.4
S1423 | 0 ; 3 ; 16 ; 55 | 1 / 5 ; 1.0 | 0 ; 3 ; 16 ; 55 | 1 / 5 ; 1.0 | 0 ; 1 ; 12 ; 59 | 1 / 5 ; 2.0
S1488 | 0 ; 0 ; 0 ; 6 | 19 / 19 ; 33.0 | 0 ; 0 ; 0 ; 6 | 19 / 19 ; 33.0 | 0 ; 0 ; 0 ; 6 | 19 / 19 ; 33.0
S1494 | 0 ; 0 ; 0 ; 6 | 19 / 19 ; 33.0 | 0 ; 0 ; 0 ; 6 | 19 / 19 ; 33.0 | 0 ; 0 ; 0 ; 6 | 19 / 19 ; 33.0
S1512 | 0 ; 0 ; 1 ; 56 | 0 / 21 ; 0.0 | 0 ; 0 ; 0 ; 57 | 0 / 21 ; 0.0 | 0 ; 0 ; 0 ; 57 | 0 / 21 ; 0.0
S15850.1 | 0 ; 99 ; 124 ; 311 | 115 / 150 ; 2.7 | 0 ; 96 ; 107 ; 328 | 115 / 150 ; 2.7 | 0 ; 73 ; 81 ; 292 | 115 / 150 ; 4.7
S208.1 | 0 ; 0 ; 0 ; 8 | 0 / 1 ; 0.0 | 0 ; 0 ; 0 ; 8 | 0 / 1 ; 0.0 | 0 ; 0 ; 0 ; 8 | 0 / 1 ; 0.0
S27 | 0 ; 1 ; 2 ; 0 | 1 / 1 ; 4.0 | 0 ; 1 ; 2 ; 0 | 1 / 1 ; 4.0 | 0 ; 1 ; 2 ; 0 | 1 / 1 ; 4.0
S298 | 0 ; 0 ; 1 ; 13 | 0 / 6 ; 0.0 | 0 ; 0 ; 1 ; 13 | 0 / 6 ; 0.0 | 0 ; 0 ; 1 ; 13 | 0 / 6 ; 0.0
S3271 | 0 ; 6 ; 0 ; 110 | 1 / 14 ; 7.0 | 0 ; 6 ; 0 ; 110 | 1 / 14 ; 7.0 | 0 ; 0 ; 0 ; 110 | 1 / 14 ; 7.0
S3330 | 0 ; 103 ; 1 ; 28 | 16 / 73 ; 11.9 | 0 ; 103 ; 1 ; 28 | 16 / 73 ; 11.9 | 0 ; 16 ; 1 ; 28 | 33 / 73 ; 25.3
S3384 | 0 ; 111 ; 0 ; 72 | 6 / 26 ; 16.5 | 0 ; 111 ; 0 ; 72 | 6 / 26 ; 16.5 | 0 ; 0 ; 0 ; 72 | 6 / 26 ; 16.5
S344 | 0 ; 0 ; 4 ; 11 | 3 / 11 ; 5.0 | 0 ; 0 ; 4 ; 11 | 3 / 11 ; 5.0 | 0 ; 0 ; 4 ; 11 | 3 / 11 ; 5.0
S349 | 0 ; 0 ; 4 ; 11 | 3 / 11 ; 5.0 | 0 ; 0 ; 4 ; 11 | 3 / 11 ; 5.0 | 0 ; 0 ; 4 ; 11 | 3 / 11 ; 5.0
S35932 | 0 ; 0 ; 0 ; 1728 | 0 / 320 ; 0.0 | 0 ; 0 ; 0 ; 1728 | 0 / 320 ; 0.0 | 0 ; 0 ; 0 ; 1728 | 0 / 320 ; 0.0
S382 | 0 ; 6 ; 0 ; 15 | 0 / 6 ; 0.0 | 0 ; 6 ; 0 ; 15 | 0 / 6 ; 0.0 | 0 ; 0 ; 0 ; 15 | 0 / 6 ; 0.0
S38584.1 | 0 ; 47 ; 4 ; 1375 | 56 / 304 ; 1.0 | 1 ; 203 ; 366 ; 854 | 133 / 304 ; 14.9 | 0 ; 170 ; 345 ; 832 | 110 / 304 ; 16.7
S386 | 0 ; 0 ; 0 ; 6 | 7 / 7 ; 33.0 | 0 ; 0 ; 0 ; 6 | 7 / 7 ; 33.0 | 0 ; 0 ; 0 ; 6 | 7 / 7 ; 33.0
S400 | 0 ; 6 ; 0 ; 15 | 0 / 6 ; 0.0 | 0 ; 6 ; 0 ; 15 | 0 / 6 ; 0.0 | 0 ; 0 ; 0 ; 15 | 0 / 6 ; 0.0
S420.1 | 0 ; 0 ; 0 ; 16 | 0 / 1 ; 0.0 | 0 ; 0 ; 0 ; 16 | 0 / 1 ; 0.0 | 0 ; 0 ; 0 ; 16 | 0 / 1 ; 0.0
S444 | 0 ; 6 ; 0 ; 15 | 0 / 6 ; 0.0 | 0 ; 6 ; 0 ; 15 | 0 / 6 ; 0.0 | 0 ; 0 ; 0 ; 15 | 0 / 6 ; 0.0
S4863 | 0 ; 62 ; 0 ; 42 | 0 / 16 ; 0.0 | 0 ; 83 ; 0 ; 21 | 0 / 16 ; 0.0 | 0 ; 16 ; 0 ; 21 | 0 / 16 ; 0.0
S499 | 0 ; 0 ; 0 ; 22 | 0 / 22 ; 0.0 | 0 ; 0 ; 0 ; 22 | 0 / 22 ; 0.0 | 0 ; 0 ; 0 ; 22 | 0 / 22 ; 0.0
S510 | 0 ; 0 ; 0 ; 6 | 7 / 7 ; 33.0 | 0 ; 0 ; 0 ; 6 | 7 / 7 ; 33.0 | 0 ; 0 ; 0 ; 6 | 7 / 7 ; 33.0
S526N | 0 ; 0 ; 1 ; 20 | 0 / 6 ; 0.0 | 0 ; 0 ; 1 ; 20 | 0 / 6 ; 0.0 | 0 ; 0 ; 1 ; 20 | 0 / 6 ; 0.0
S5378 | 0 ; 115 ; 0 ; 64 | 4 / 49 ; 1.5 | 0 ; 126 ; 0 ; 53 | 4 / 49 ; 1.5 | 0 ; 56 ; 0 ; 56 | 7 / 49 ; 3.9
S635 | 0 ; 0 ; 0 ; 32 | 0 / 1 ; 0.0 | 0 ; 0 ; 0 ; 32 | 0 / 1 ; 0.0 | 0 ; 0 ; 0 ; 32 | 0 / 1 ; 0.0
S641 | 0 ; 7 ; 0 ; 12 | 3 / 24 ; 1.0 | 0 ; 7 ; 0 ; 12 | 3 / 24 ; 1.0 | 0 ; 4 ; 0 ; 10 | 7 / 24 ; 2.0
S6669 | 0 ; 181 ; 0 ; 58 | 37 / 55 ; 3.4 | 0 ; 181 ; 0 ; 58 | 37 / 55 ; 3.4 | 0 ; 18 ; 0 ; 58 | 37 / 55 ; 4.0
S713 | 0 ; 7 ; 0 ; 12 | 3 / 23 ; 1.0 | 0 ; 7 ; 0 ; 12 | 3 / 23 ; 1.0 | 0 ; 7 ; 0 ; 7 | 7 / 23 ; 2.3
S820 | 0 ; 0 ; 0 ; 5 | 19 / 19 ; 17.0 | 0 ; 0 ; 0 ; 5 | 19 / 19 ; 17.0 | 0 ; 0 ; 0 ; 5 | 19 / 19 ; 17.0
S832 | 0 ; 0 ; 0 ; 5 | 19 / 19 ; 17.0 | 0 ; 0 ; 0 ; 5 | 19 / 19 ; 17.0 | 0 ; 0 ; 0 ; 5 | 19 / 19 ; 17.0
S838.1 | 0 ; 0 ; 0 ; 32 | 0 / 1 ; 0.0 | 0 ; 0 ; 0 ; 32 | 0 / 1 ; 0.0 | 0 ; 0 ; 0 ; 32 | 0 / 1 ; 0.0
S9234.1 | 0 ; 45 ; 9 ; 157 | 22 / 39 ; 1.2 | 0 ; 49 ; 5 ; 157 | 22 / 39 ; 1.2 | 0 ; 14 ; 25 ; 133 | 22 / 39 ; 2.0
S938 | 0 ; 0 ; 0 ; 32 | 0 / 1 ; 0.0 | 0 ; 0 ; 0 ; 32 | 0 / 1 ; 0.0 | 0 ; 0 ; 0 ; 32 | 0 / 1 ; 0.0
S953 | 0 ; 23 ; 0 ; 6 | 3 / 23 ; 2.0 | 0 ; 23 ; 0 ; 6 | 3 / 23 ; 2.0 | 0 ; 0 ; 0 ; 6 | 23 / 23 ; 29.8
S967 | 0 ; 23 ; 0 ; 6 | 3 / 23 ; 2.0 | 0 ; 23 ; 0 ; 6 | 3 / 23 ; 2.0 | 0 ; 0 ; 0 ; 6 | 23 / 23 ; 29.8
S991 | 0 ; 0 ; 0 ; 19 | 17 / 17 ; 8.8 | 0 ; 0 ; 0 ; 19 | 17 / 17 ; 8.8 | 0 ; 0 ; 0 ; 19 | 17 / 17 ; 8.8
Σ | 0 ; 1317 ; 313 ; 4622 | 477 / 1615 | 1 ; 1503 ; 653 ; 4086 | 556 / 1615 | 0 ; 509 ; 583 ; 3992 | 639 / 1615
Table 6.5: Diameter experiments for ISCAS89 benchmarks
Design | Original: |R| ∈ CC ; AC ; MC+QC ; GC | |T′| / |T| ; avg. d(t′) | COM: |R| ∈ CC ; AC ; MC+QC ; GC | |T′| / |T| ; avg. d(t′) | COM,RET,COM: |R| ∈ CC ; AC ; MC+QC ; GC | |T′| / |T| ; avg. d(t′)
CP RAS | 0 ; 279 ; 66 ; 315 | 0 / 2 ; 0.0 | 0 ; 286 ; 66 ; 307 | 0 / 2 ; 0.0 | 0 ; 179 ; 65 ; 238 | 0 / 2 ; 0.0
CLB CNTL | 0 ; 29 ; 2 ; 19 | 0 / 2 ; 0.0 | 0 ; 25 ; 2 ; 19 | 0 / 2 ; 0.0 | 0 ; 15 ; 2 ; 20 | 0 / 2 ; 0.0
CR RAS | 0 ; 96 ; 6 ; 329 | 0 / 1 ; 0.0 | 0 ; 100 ; 7 ; 321 | 0 / 1 ; 0.0 | 0 ; 52 ; 10 ; 284 | 0 / 1 ; 0.0
D DASA | 0 ; 16 ; 81 ; 18 | 1 / 2 ; 35.0 | 0 ; 10 ; 86 ; 13 | 2 / 2 ; 27.0 | 0 ; 1 ; 86 ; 13 | 2 / 2 ; 28.0
D DCLA | 0 ; 382 ; 1 ; 754 | 0 / 2 ; 0.0 | 0 ; 387 ; 1 ; 748 | 0 / 2 ; 0.0 | 0 ; 14 ; 0 ; 736 | 0 / 2 ; 0.0
D DUDD | 0 ; 30 ; 28 ; 71 | 4 / 22 ; 9.2 | 0 ; 21 ; 28 ; 71 | 4 / 22 ; 10.8 | 0 ; 1 ; 21 ; 71 | 7 / 22 ; 11.0
I IBBQn | 0 ; 623 ; 1488 ; 0 | 15 / 15 ; 4.7 | 0 ; 623 ; 1488 ; 0 | 15 / 15 ; 4.7 | 0 ; 0 ; 1488 ; 0 | 15 / 15 ; 4.7
I IFAR | 0 ; 303 ; 11 ; 99 | 0 / 2 ; 0.0 | 0 ; 257 ; 11 ; 93 | 0 / 2 ; 0.0 | 0 ; 41 ; 18 ; 79 | 0 / 2 ; 0.0
I IFPF | 11 ; 893 ; 44 ; 598 | 0 / 1 ; 0.0 | 1 ; 923 ; 35 ; 525 | 0 / 1 ; 0.0 | 0 ; 191 ; 4 ; 218 | 0 / 1 ; 0.0
L3 SNP1 | 25 ; 529 ; 39 ; 82 | 0 / 5 ; 0.0 | 6 ; 400 ; 41 ; 62 | 0 / 5 ; 0.0 | 0 ; 31 ; 30 ; 41 | 1 / 5 ; 1.0
L EMQn | 5 ; 146 ; 6 ; 66 | 0 / 1 ; 0.0 | 5 ; 136 ; 6 ; 66 | 1 / 1 ; 1.0 | 5 ; 20 ; 14 ; 57 | 1 / 1 ; 1.0
L EXEC | 12 ; 421 ; 0 ; 102 | 0 / 2 ; 0.0 | 0 ; 430 ; 0 ; 58 | 0 / 2 ; 0.0 | 0 ; 88 ; 0 ; 57 | 0 / 2 ; 0.0
L FLUSHn | 6 ; 198 ; 0 ; 4 | 7 / 7 ; 3.7 | 0 ; 194 ; 0 ; 4 | 7 / 7 ; 3.7 | 0 ; 12 ; 0 ; 4 | 7 / 7 ; 4.0
L INTRo | 14 ; 143 ; 12 ; 5 | 30 / 30 ; 3.8 | 0 ; 135 ; 12 ; 5 | 30 / 30 ; 3.8 | 0 ; 3 ; 12 ; 4 | 30 / 30 ; 3.6
L LMQo | 28 ; 690 ; 4 ; 133 | 0 / 16 ; 0.0 | 24 ; 682 ; 4 ; 141 | 0 / 16 ; 0.0 | 24 ; 114 ; 2 ; 132 | 0 / 16 ; 0.0
L LRU | 0 ; 142 ; 20 ; 75 | 0 / 12 ; 0.0 | 0 ; 127 ; 86 ; 9 | 12 / 12 ; 15.0 | 0 ; 0 ; 86 ; 8 | 12 / 12 ; 15.0
L PFQo | 14 ; 1936 ; 17 ; 84 | 1 / 67 ; 1.0 | 8 ; 1929 ; 82 ; 20 | 1 / 67 ; 1.0 | 8 ; 192 ; 83 ; 17 | 1 / 67 ; 1.0
L PNTRn | 3 ; 228 ; 10 ; 11 | 23 / 31 ; 2.0 | 0 ; 211 ; 10 ; 11 | 23 / 31 ; 2.0 | 0 ; 1 ; 10 ; 11 | 23 / 31 ; 4.0
L PRQn | 34 ; 366 ; 106 ; 265 | 10 / 10 ; 15.2 | 30 ; 367 ; 108 ; 260 | 10 / 10 ; 15.2 | 30 ; 12 ; 64 ; 302 | 10 / 10 ; 8.0
L SLB | 3 ; 135 ; 6 ; 27 | 2 / 3 ; 1.0 | 0 ; 126 ; 6 ; 26 | 2 / 3 ; 1.0 | 0 ; 15 ; 6 ; 23 | 2 / 3 ; 1.0
L TBWKn | 0 ; 202 ; 117 ; 14 | 0 / 21 ; 0.0 | 0 ; 186 ; 119 ; 12 | 1 / 21 ; 1.0 | 0 ; 1 ; 78 ; 53 | 1 / 21 ; 1.0
M CIU | 0 ; 343 ; 10 ; 424 | 0 / 6 ; 0.0 | 0 ; 321 ; 5 ; 417 | 0 / 6 ; 0.0 | 0 ; 63 ; 60 ; 286 | 6 / 6 ; 1.0
SIDECAR4 | 3 ; 109 ; 32 ; 455 | 0 / 1 ; 0.0 | 0 ; 60 ; 34 ; 453 | 0 / 1 ; 0.0 | 0 ; 24 ; 34 ; 67 | 0 / 1 ; 0.0
S SCU1 | 1 ; 232 ; 4 ; 136 | 0 / 3 ; 0.0 | 0 ; 220 ; 6 ; 124 | 0 / 3 ; 0.0 | 0 ; 75 ; 4 ; 70 | 2 / 3 ; 2.0
V CACH | 5 ; 94 ; 15 ; 59 | 0 / 1 ; 0.0 | 0 ; 93 ; 14 ; 52 | 0 / 1 ; 0.0 | 1 ; 22 ; 15 ; 27 | 1 / 1 ; 1.0
V DIR | 6 ; 91 ; 13 ; 68 | 0 / 2 ; 0.0 | 0 ; 100 ; 13 ; 55 | 0 / 2 ; 0.0 | 0 ; 13 ; 10 ; 20 | 2 / 2 ; 8.0
V SNPM | 65 ; 846 ; 134 ; 376 | 1 / 2 ; 2.0 | 3 ; 762 ; 97 ; 401 | 2 / 2 ; 1.5 | 0 ; 51 ; 26 ; 46 | 2 / 2 ; 1.5
W GAR | 0 ; 159 ; 0 ; 83 | 1 / 7 ; 1.0 | 0 ; 158 ; 0 ; 82 | 1 / 7 ; 1.0 | 0 ; 10 ; 0 ; 81 | 1 / 7 ; 1.0
W SFA | 0 ; 22 ; 0 ; 42 | 0 / 8 ; 0.0 | 0 ; 22 ; 0 ; 42 | 0 / 8 ; 0.0 | 0 ; 0 ; 0 ; 42 | 0 / 8 ; 0.0
Σ | 235 ; 9683 ; 2272 ; 4714 | 95 / 284 | 77 ; 9291 ; 2367 ; 4397 | 111 / 284 | 68 ; 1241 ; 2228 ; 3007 | 126 / 284
Table 6.6: Diameter experiments for GP netlists
Note that in some cases, the diameter bound obtained on the retimed netlist is
slightly larger than that of the original netlist – for example, with S1196 and S158501.
This is due to the inequality in Theorem 6.3; we must add the negated lag of the target to its
diameter bound, even though retiming may not have reduced REGISTER count for that tar-
get. Use of a normalized retiming helps minimize this potential increase, as does retiming a
single target cone at a time. However, the potential for increase tends to be very small (since
most lags tend to be very small), and the potential for decrease is much greater (potentially
exponentially greater). Transformations also impact table identification and TSAP clus-
tering heuristics. Due to the speed of these heuristic algorithms, it may be beneficial to run
them on every possible netlist representation to enable the best possible result.
We now discuss several netlists in more detail. Netlist I IBBQn is a large table-
based netlist. Forward reachability analysis of the redundancy-removed cone of a single
unreachable target with a diameter of three (comprising 442 REGISTERs and 134 FREE
vertices) requires 172.3 seconds and 25 MB with the MLP [84] algorithm, with sift variable
ordering enabled and a random initial ordering. For a cone of this size, completion of
reachability is somewhat a matter of luck, in this case due to a large degree of independence
of the corresponding BDD variables. However, because of its small diameter, the presented
techniques solve the target using SAT with a total of 0.46 seconds and 16 MB. Ignoring the
time necessary to parse the netlist, we attain nearly a three-order-of-magnitude speedup.
L FLUSHn is nearly acyclic; it has only ten REGISTERs in self-loops, six of which
are constant. For one target with 38 REGISTERs and 47 FREE vertices, reachability analysis
of the optimized target with MLP requires 1.20 seconds and 11 MB. Redundancy removal
plus retiming enable MLP to solve the target in 0.60 seconds with 13 MB. Due to a shallow
diameter of three, our techniques solve the target using SAT with cumulative resources of
0.19 seconds and 9 MB.
Chapter 7
Cut-Based Abstraction
In this chapter we discuss the technique of structural cut-based abstraction. The idea of this approach is to identify a cut ⟨C, C̄⟩ of the netlist graph where T ⊆ C̄, then to replace the cut cone C with a simpler yet trace-equivalent logic cone. In order to ensure soundness of this approach, we need to include the initial values of any REGISTERs in C̄ as elements of C̄, hence T ∪ Z(C̄ ∩ R) ⊆ C̄. The abstracted netlist is then transferred to an arbitrary verification flow, which may include successive applications of cut-based abstraction interspersed with other abstraction techniques such as redundancy removal (refer to Chapter 5) and retiming (refer to Chapter 6).
We develop the theory of this chapter to handle arbitrary cuts, though the implementation we discuss limits C to be combinational. We provide efficient algorithms for computing a minimally-sized trace-equivalent replacement cone for C. The primary goal of this abstraction is to reduce the number of FREE vertices, with reduction of AND vertices as a secondary goal. Our primary motivation for this combinational restriction is the difficulty of sequential trace-equivalence calculation, which generally requires state space enumeration like bisimilarity reduction [27, 85], hence often outweighs the cost of invariant checking. We wish to simplify an overall verification flow with this abstraction, hence wish to avoid relying on algorithms which are likely to significantly hamper the verification
effort. Furthermore, when coupled with retiming, this combinational limitation becomes
less restrictive because retiming increases the size of the combinationally-driven logic of
the netlist. Additionally, this abstraction is useful in reducing the size of the retiming stump
(refer to Section 6.2).
This abstraction is beneficial to several types of verification flows. First and fore-
most, though it is possible that the number of AND vertices may increase through this
abstraction (whereas the implementation may easily be tuned to guarantee that the number
of FREE vertices will decrease), as we demonstrate in our experimental results we often
reduce AND count. Therefore, this technique tends to increase the efficiency of arbitrary
subsequent algorithms since it reduces netlist size, hence the amount of memory required
to represent the netlist and the amount of time necessary to analyze the netlist (regardless
of the nature of the analysis algorithms) both tend to decrease.
BDD-based techniques (such as symbolic reachability analysis and symbolic sim-
ulation) often benefit since, with fewer FREE vertices, there are fewer necessary BDD
variables, hence BDDs tend to be smaller and reordering tends to take less time. This is one
motivation behind the concept of parametric representation [86, 87]. Additionally, the cut-
based method of creating FREE vertices to drive the replacement cone for C does not cause any correlation that did not already exist in C, and is often able to eliminate correlation,
resulting in a more compact BDD encoding.
Simulation-based techniques (including semi-formal approaches) may be enhanced by cut-based abstraction since, in minimizing FREE vertex count, it becomes probabilistically more likely to exercise a better distribution of valuations to the cut frontier. For example, given a 10-input AND vertex v whose inlist comprises only FREE vertices, only one of the 2^10 possible valuations to inlist(v) will drive a 1 to v. However, if we replace v with a single FREE vertex v′, one of only two possible valuations to v′ will result in a 1. Such a replacement may often be beneficial to increase the coverage attainable with simulation, and may be viewed as a transformation-based approach to exploiting constraint-based testcase
pattern generation to achieve similar goals. However, there is a risk that this transformation may hurt simulation; for example, the above-mentioned AND vertex may represent a reset condition which should assert very infrequently to ensure best coverage.
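The arithmetic behind this probabilistic argument can be checked directly; the following is a minimal sketch (the function name is illustrative, not from the thesis):

```python
from itertools import product

def prob_of_one(n_inputs):
    """Exact probability that an n-input AND of FREE vertices evaluates to 1
    under uniformly random input valuations."""
    hits = sum(1 for vals in product([0, 1], repeat=n_inputs) if all(vals))
    return hits / 2 ** n_inputs

# A 10-input AND driven only by FREE vertices: 1 of 2^10 valuations drives a 1.
assert prob_of_one(10) == 1 / 1024
# After replacing it with a single FREE vertex v', half the valuations drive a 1.
assert prob_of_one(1) == 1 / 2
```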
Finally, this approach is capable of enhancing SAT-based analysis. First, structural SAT solvers often benefit from BDD sweeping [51] to eliminate redundancy in the netlist; as per the above analysis, BDD-based analysis may greatly benefit from this approach. Additionally, SAT efficiency tends to be more dependent upon the amount of combinational logic in the netlist than on FREE vertex count. This abstraction is often able to reduce combinational logic, thus may enhance SAT analysis in a similar but complementary manner as BDD sweeping for redundancy removal is capable of enhancing SAT analysis.
Lemma 7.1. Let VC denote the set of vertices of N sourcing edges EC crossing an arbitrary cut ⟨C, C̄⟩ where T ∪ Z(C̄ ∩ R) ⊆ C̄. Let V′C of N′ denote a trace-equivalent set of vertices with respect to bijective mapping ψ : VC → V′C. The netlist N″ = N′ ∥ C̄ formed from N′ ∥ N by merging each v ∈ VC onto ψ(v) satisfies the condition that vertex set {V″C ∪ C̄″} is trace-equivalent to vertex set {VC ∪ C̄} with respect to bijective mapping ψ″ = {⟨v, ψ(v)″⟩ : v ∈ VC} ∪ {⟨v, v″⟩ : v ∈ C̄}.
Proof. This lemma is similar to the result that bisimilarity preserves all CTL formulas [28], viewing C̄ as a synthesized automaton representing some correctness formula. However, our formulas are invariants, hence trace equivalence is a sufficient condition.
We first note that the only way in which a vertex u ∈ C may semantically affect v ∈ C̄ is if there exists a structural path from u to v, or if u fans out to the initial value of a REGISTER in C̄, as follows from Definition 3.12. This definition further implies that we may consistently evaluate vertex v using only valuations to VC, without a need to observe valuations to coi(VC) \ VC, given that Z(C̄ ∩ R) ⊆ C̄.
Since we merge each vertex of VC onto a trace-equivalent vertex of V′C, this implies that any sequence of valuations to VC is also producible at V′C and vice versa. This in turn implies that any sequence of valuations to C̄ is also producible at C̄″ and vice versa, which
Partial Trace Lift_Trace(Partial Trace p′)
1. Complete p′ over N″ up to its length with Simulate.
2. Initialize p = ∅. For each v ∈ {VC ∪ C̄}, and each i ∈ 0, …, length(p′) − 1: p = p ∪ ⟨(v, i), p′(ψ″(v), i)⟩.
3. Use BMC over coi(VC) to calculate a satisfying assignment to the sequence of valuations to VC present in p. BMC will produce another trace p″.
4. Add all valuations from p″ into p. For each v ∈ coi(VC), and each i ∈ 0, …, length(p) − 1: p = p ∪ ⟨(v, i), p″(v, i)⟩.
5. return p.
Figure 7.1: Cut abstraction trace lifting algorithm
collectively imply trace-equivalence of {VC ∪ C̄} and {V″C ∪ C̄″} with respect to ψ″.
Lemma 7.1 forms the theoretical basis for cut-based abstraction. It indicates that we may in certain cases merge vertices onto trace-equivalent vertices in a sound and complete manner – which is more general than restricting merging onto semantically equivalent vertices as discussed in Theorem 5.1. We may only exploit this generalization provided that we merge a "semantic cut" of trace-equivalent vertices; otherwise we risk violating trace equivalence of the resulting netlist. For example, assume that we have FREE vertices i1 and i2, which fan out to u1 = i1 ∨ i2 and u2 = i1 ∧ i2. Vertex u1 is trace-equivalent to any FREE vertex u′1, and vertex u2 is trace-equivalent to any FREE vertex u′2. Note that {u1, u2} is not trace-equivalent to {u′1, u′2} because any trace over the former set will adhere to u2 → u1, whereas there is no correlation between u′1 and u′2; thus performing either or both merges will lose this correlation. Therefore, merging of a vertex onto a trace-equivalent vertex risks becoming overapproximate unless we merge an entire trace-equivalent cut.
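The {u1, u2} example can be checked by exhaustive enumeration; a small sketch, with names matching the example above:

```python
from itertools import product

# Valuations producible by the original pair (u1, u2) = (i1 OR i2, i1 AND i2).
original = {(i1 | i2, i1 & i2) for i1, i2 in product([0, 1], repeat=2)}

# Valuations producible by two independent FREE vertices (u1', u2').
replacement = set(product([0, 1], repeat=2))

# Individually, each vertex ranges over {0, 1}, so u1 and u2 are each
# trace-equivalent to a FREE vertex ...
assert {u1 for u1, _ in original} == {0, 1}
assert {u2 for _, u2 in original} == {0, 1}
# ... but the pair is not: (u1, u2) = (0, 1) would violate u2 -> u1.
assert (0, 1) not in original
assert (0, 1) in replacement
```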
Our cut-abstraction trace lifting algorithm is depicted in Figure 7.1. First, we use binary simulation to complete the abstract trace up to the necessary length. Second, we propagate valuations to V″C and C̄″ from the abstract trace to the lifted trace since by Lemma 7.1 we will be able to obtain a trace for which those valuations are consistent with N. Lastly, we use a bounded model check of the valuations to VC inherited from the abstract trace to find corresponding valuations to coi(VC) which yield a consistent partial trace.
Theorem 7.1. Cut-based abstraction is sound and complete for invariant checking.
Proof. First, any target unreachable result will be correct by Lemma 7.1. Second, any target hit result is correct by the same lemma. By assumption, the trace received from the verification of the abstracted target is semantically correct with respect to the abstracted netlist N″ = N′ ∥ C̄, and hits the abstracted target. Since VC is trace-equivalent to ψ(VC), the BMC call will be satisfiable and will obtain a corresponding set of valuations to C to produce the sequence of valuations to VC observed in the abstracted trace. Composition of these two traces thus yields a semantically correct trace. Since the target is an element of C̄, the target will also be hit in the lifted trace.
We omit the proof that cut-based abstraction generates a legal netlist, since this proof is dependent upon the nature of N′. However, since the original netlist is legal by assumption, clearly a legal solution is attainable.
Theorem 7.2. If the diameter of a set of vertices A″ ⊆ {V″C ∪ C̄″} of a cut-abstracted netlist N″ is d(A″), then the diameter of the corresponding vertices A in the unabstracted netlist is also d(A″).
Proof. This theorem is an immediate consequence of Lemma 7.1 and Theorem 4.3.
7.1 Cut-Based Abstraction Algorithms
In this section we discuss algorithms for performing cut-based abstraction. As previously
mentioned, the implementation we present limits its domain to combinational cones. This limitation was largely motivated by the desire to integrate a technique to augment retiming;
while retiming is useful to reduce the size of the recurrence structure of a netlist, it does add combinational logic including FREE vertices for the retiming stump (refer to Section 6.2). A justification of this limitation is that it enables efficient algorithms for performing the abstraction, whereas sequential trace-equivalence abstraction would generally require state space enumeration similarly to bisimilarity reduction, which often outweighs the cost of invariant checking [27]. Recently, Moon et al. [88] have proposed a similar variable reduction technique with applications to BDD-based combinational equivalence checking. Some of the techniques presented in this section follow from [88], though are included for completeness and to qualify our experimental results. We discuss their work further in Section 7.2.
Our top-level algorithm is encapsulated in the Cut_Abstract function depicted in Figure 7.2. We seed our cut solution by using the FREE vertices of the netlist as sources Cs. Additionally, all REGISTERs and their fanout cones and initial values, plus all targets, are seeded as sinks Ct. The overall concept of the abstraction process is to compute the characteristic function BDDi of the cut vertices VC, representing the set of all reachable valuations to VC. Once obtained, we synthesize a netlist N′ containing vertices V′C which have the identical characteristic function, and merge each element of VC onto a trace-equivalent correspondent in V′C. Ideally, the number of FREE vertices and AND gates will be smaller in coi(V′C) than in coi(VC). Rather than processing VC in one piece, we maximally partition this cut into sets Ci which have disjoint fanin cones. We decide whether to attempt to transform coi(Ci) in step 3a based upon the following heuristics.
1. If Ci = Ii, then no transformation is possible.
2. If |Ci| ≥ |Ii|, our technique may not be capable of reducing the number of FREE vertices. We therefore may wish to neglect processing the component to minimize resources. Alternatively, we may wish to attempt to minimize the number of AND gates in this component, and perform a transformation only if we demonstrate that such a reduction is possible.
void Cut_Abstract(Netlist N)
1. Compute a cut from the seeded Cs and Ct, defined as follows.
(a) Cs = I.
(b) Ct = T ∪ Z(R) ∪ R ∪ fanout_cone(R).
2. Maximally partition VC into k disjoint sets C1, …, Ck, such that (i ≠ j) → (fanin_cone(Ci) ∩ fanin_cone(Cj) = ∅), and C1 ∪ … ∪ Ck = VC. Let Ii represent I ∩ coi(Ci).
3. For each component Ci:
(a) Decide whether to attempt a transformation.
(b) If we wish to attempt to transform Ci, we try to obtain BDDi representing the characteristic function of Ci using algorithm Analyze_Cut. If successful, we do the following:
i. Perform aggressive reordering on BDDi to make it as small as possible.
ii. Synthesize BDDi using algorithm Synthesize_Set. This will yield a netlist N′ containing vertices C′i.
iii. If N′ is not too large, we merge each v ∈ Ci onto the corresponding v′ ∈ C′i.
Figure 7.2: Top-level Cut_Abstract algorithm
3. If |Ci| < |Ii|, we likely wish to transform the component. The only conditions under which we neglect transforming the component are if obtaining BDDi exceeds resource bounds, or if the size of the replacement component coi(C′i) is too large.
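Step 2 of Cut_Abstract – the maximal partition into components with disjoint fanin cones – can be sketched with a union-find over shared FREE vertices. This is a minimal sketch under an assumed representation mapping each cut vertex to the set of FREE vertices in its fanin cone; the thesis does not prescribe this data structure:

```python
def partition_cut(fanin):
    """Maximally partition cut vertices into components with disjoint fanin
    cones. `fanin` maps each cut vertex to the FREE vertices in its fanin
    cone (a hypothetical representation). Two cut vertices land in the same
    component exactly when their fanin cones share a FREE vertex."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    def union(a, b):
        parent[find(a)] = find(b)

    for v, inputs in fanin.items():
        for i in inputs:
            union(("cut", v), ("free", i))

    components = {}
    for v in fanin:
        components.setdefault(find(("cut", v)), []).append(v)
    return sorted(sorted(c) for c in components.values())

# v1 and v2 share FREE vertex i2; v3's fanin cone is disjoint.
print(partition_cut({"v1": {"i1", "i2"}, "v2": {"i2"}, "v3": {"i3"}}))
# -> [['v1', 'v2'], ['v3']]
```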
We often obtain the greatest reductions from using a vertex min-cut in step 1. For optimality, it is beneficial to perform a cone-of-influence reduction prior to cut abstraction to prevent edges crossing out of coi(T) from affecting the solution. Additionally, a prior redundancy removal is useful to help enable a smaller cut size. While our algorithms for abstracting the cut are often quite efficient, in some cases the exact min-cut is too complex to process in one step, or its trace-equivalent replacement may be too large. Therefore, it may occasionally be beneficial to use a "less minimal" cut. It furthermore may be beneficial to incrementally approach a min-cut through repeated calls to this algorithm, similarly to the
incremental BDD-based approach proposed in [88]. Such an incremental approach effectively decomposes the min-cut reduction. While the number of necessary cut vertices at each intermediate step will likely be larger, such a decomposition often reduces computational resources. This is because additional partitioning (in step 2 of Figure 7.2) is often possible due to elimination of reconvergence with respect to the larger min-cut cone. Furthermore, a subsequent attempt at abstracting the min-cut is more likely to succeed with lesser resources, since the intermediate abstraction will likely reduce FREE vertex count. A decomposed approach also allows us to intersperse other reduction algorithms (such as redundancy removal) between repeated calls to the cut-based abstraction.
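A vertex min-cut of the flavor used in step 1 can be sketched via the standard node-splitting reduction to max-flow with BFS augmenting paths. This is a generic Edmonds-Karp-style sketch, not the tuned algorithm of [90], and it assumes every source-to-sink path passes through at least one interior (cuttable) vertex:

```python
from collections import defaultdict, deque

INF = float("inf")

def vertex_min_cut(edges, sources, sinks):
    """Minimum vertex cut separating sinks from sources. Each interior
    vertex v is split into (v, 'in') -> (v, 'out') with capacity 1;
    sources, sinks, and graph edges get infinite capacity, so only
    interior vertices can be cut."""
    cap, adj = defaultdict(int), defaultdict(set)

    def add(u, v, c):
        cap[(u, v)] += c
        adj[u].add(v)
        adj[v].add(u)

    nodes = set(sources) | set(sinks) | {x for e in edges for x in e}
    for v in nodes:
        add((v, "in"), (v, "out"), INF if v in sources or v in sinks else 1)
    for u, v in edges:
        add((u, "out"), (v, "in"), INF)
    s, t = "S", "T"
    for v in sources:
        add(s, (v, "in"), INF)
    for v in sinks:
        add((v, "out"), t, INF)

    def bfs():
        prev = {s: None}
        q = deque([s])
        while q:
            u = q.popleft()
            for v in adj[u]:
                if v not in prev and cap[(u, v)] > 0:
                    prev[v] = u
                    if v == t:
                        return prev
                    q.append(v)
        return None

    while (prev := bfs()) is not None:
        v = t                      # push one unit along the augmenting path
        while prev[v] is not None:
            u = prev[v]
            cap[(u, v)] -= 1
            cap[(v, u)] += 1
            v = u

    reach, stack = {s}, [s]        # residual-reachable side of the cut
    while stack:
        u = stack.pop()
        for v in adj[u]:
            if v not in reach and cap[(u, v)] > 0:
                reach.add(v)
                stack.append(v)
    # Vertices whose saturated split edge crosses the frontier form the cut.
    return {v for v in nodes if (v, "in") in reach and (v, "out") not in reach}

# i1 and i2 both feed u, which feeds target g: {u} is a minimum vertex cut.
print(vertex_min_cut([("i1", "u"), ("i2", "u"), ("u", "g")], {"i1", "i2"}, {"g"}))
# -> {'u'}
```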
Algorithm Analyze_Cut, depicted in Figure 7.3, is used to obtain the characteristic function of a set of cut vertices Ci. In our implementation we use BDD-based analysis with a tuned conjunction and quantification schedule. However, other techniques such as simulation-based or SAT-based enumeration may be used for this purpose. We use a modified MLP [84] algorithm for the conjunction and quantification schedule. It is the BDD(vj) for each vj ∈ Ci, representing the function of cut vertex vj over I variables, that must be conjoined, and the I variables that must be quantified. Rather than waiting to perform all conjunctions prior to quantification, we wish to perform quantification as early as possible to keep peak BDD size low. As soon as we complete the last conjunction of a BDD(vj) which has a given FREE variable in its support, we may quantify that variable.
At each MLP scheduling step, we either schedule a composition, or "activate" a FREE vertex u ∈ I to simplify future scheduling decisions – initially, all FREE vertices are "inactive." Our goal is to minimize the lifetime of FREE vertex variables, from entering the support of BDDi through conjunction until leaving the support through quantification. The following modifications of the MLP algorithm have proven to be the most useful.
• At each decision point, we schedule the conjunction of any BDD(vj) which has zero inactive FREE vertices in its support.
• If no BDD(vj) satisfies the above criterion, we instead activate an inactive FREE
BDD Analyze_Cut(Vertex Set Ci)
1. Compute MLP [84] schedule (v1, …, v|Ci|) for vertices in Ci.
2. Initialize BDDi = 1.
3. for ( j = 1; j ≤ |Ci|; j++ )
(a) Associate a BDD variable with vertex vj, denoted by b(vj).
(b) Calculate BDD(vj) representing the function of vj over I.
(c) Update BDDi = BDDi ∧ (b(vj) ≡ BDD(vj)).
(d) Perform early quantification of I variables from BDDi.
(e) If BDDi is too large, return NULL.
4. return BDDi.
Figure 7.3: Analyze_Cut algorithm
vertex. When choosing which FREE vertex u to activate, we select one which is in the support of an unscheduled cut vertex vj with the fewest inactive FREE vertices in its support. Ties are broken to minimize the total number of FREE vertices not already in the BDD support which would need to be introduced before u could be quantified.
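The early-quantification idea behind these scheduling decisions can be sketched as follows. The function operates only on variable supports and emits a conjoin/quantify schedule; no actual BDD operations are performed, and the activation and tie-breaking heuristics above are omitted:

```python
def schedule_with_early_quantification(supports):
    """Given the FREE-variable support of each BDD(v_j), in conjunction
    order, quantify each variable immediately after its last conjunction
    so that its lifetime in the support of BDDi is minimized."""
    last_use = {}
    for j, sup in enumerate(supports):
        for var in sup:
            last_use[var] = j

    schedule = []
    for j, sup in enumerate(supports):
        schedule.append(("conjoin", j))
        for var in sorted(v for v in sup if last_use[v] == j):
            schedule.append(("quantify", var))
    return schedule

# BDD(v0) over {a, b}, BDD(v1) over {b, c}: variable a leaves the support
# right after the first conjunction; b and c after the second.
print(schedule_with_early_quantification([{"a", "b"}, {"b", "c"}]))
# -> [('conjoin', 0), ('quantify', 'a'), ('conjoin', 1), ('quantify', 'b'), ('quantify', 'c')]
```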
Once we obtain the characteristic function BDDi of cut vertices Ci, we next synthesize this BDD into a netlist to obtain a trace-equivalent set of vertices C′i. Synthesis of BDDi may be performed by the algorithm Synthesize_Set provided in Figure 7.4. Predicate parents(n) returns the set of parent nodes of BDD node n. Note that for the root node, parents(root) is empty.
Netlist Synthesize_Set(BDD BDDi)
For each BDD variable ñ of BDDi, in order of support from root to leaf, we do the following. These variables correlate to vertices in Ci; any not in the support of BDDi may be processed in any order.
1. Create a new FREE vertex v and assign ψ(ñ) = v.
2. Initialize a′(ñ) = 0 and a″(ñ) = 0.
3. For each BDD node n over variable ñ:
• If n is the BDD root, we define a(n) = ONE. Otherwise, we synthesize the set of paths which "sensitize" n from the root as follows:
– Initialize a(n) = ZERO.
– foreach m ∈ parents(n) {
if (n is the then branch of m) { a(n) = a(n) ∨ (a(m) ∧ ψ′(m̃)); }
else { a(n) = a(n) ∨ (a(m) ∧ ¬ψ′(m̃)); }
}
Term ψ′(m̃) is defined below.
• If else(n) ≡ 0, then a′(ñ) = a′(ñ) ∨ a(n).
• If then(n) ≡ 0, then a″(ñ) = a″(ñ) ∨ a(n).
4. Synthesize ψ′(ñ) = a′(ñ) ∨ (¬a′(ñ) ∧ ¬a″(ñ) ∧ ψ(ñ)).
return C′i = ψ′(BDD_vars(BDDi)).
Figure 7.4: Synthesize_Set algorithm
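The behavior of this synthesis can be illustrated functionally on a two-variable BDD. The following sketch evaluates a(n), a′, a″, and ψ′ as closures over fresh parametric inputs rather than building an actual netlist; the tuple encoding of BDD nodes is illustrative, not the thesis's data structure:

```python
from itertools import product

# A tiny reduced ordered BDD: node = (var, then_child, else_child);
# terminals are the Python booleans True/False. This BDD encodes the
# reachable cut valuations {(0,0), (1,0), (1,1)} of a pair (u1, u2),
# i.e. the characteristic function u1 OR NOT u2.
N2 = ("u2", False, True)
ROOT = ("u1", True, N2)

def synthesize(root, order):
    """Per BDD variable, build a driver over fresh parametric inputs x so
    that the drivers jointly produce exactly the minterms of the BDD."""
    nodes = {v: [] for v in order}          # nodes labelled by each variable
    stack, seen = [root], set()
    while stack:
        n = stack.pop()
        if isinstance(n, tuple) and id(n) not in seen:
            seen.add(id(n))
            nodes[n[0]].append(n)
            stack += [n[1], n[2]]

    drivers = {}

    def a(target, x):
        # Sensitization a(target): walking from the root under the already
        # synthesized drivers, do we pass through `target`?
        cur = root
        while isinstance(cur, tuple) and cur is not target:
            cur = cur[1] if drivers[cur[0]](x) else cur[2]
        return cur is target

    for v in order:                         # process variables root-to-leaf
        def driver(x, v=v):
            a1 = any(a(n, x) for n in nodes[v] if n[2] is False)  # a': else(n) = 0
            a0 = any(a(n, x) for n in nodes[v] if n[1] is False)  # a'': then(n) = 0
            return 1 if a1 else (0 if a0 else x[v])               # psi'(v)
        drivers[v] = driver
    return drivers

drivers = synthesize(ROOT, ["u1", "u2"])
produced = {tuple(drivers[v]({"u1": x1, "u2": x2}) for v in ("u1", "u2"))
            for x1, x2 in product([0, 1], repeat=2)}
print(sorted(produced))
# -> [(0, 0), (1, 0), (1, 1)]
```

The synthesized cone amounts to u1′ = x1 and u2′ = x1 ∧ x2, which produces exactly the on-set of the characteristic function and nothing more.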
The purpose of a′ and a″ in algorithm Synthesize_Set of Figure 7.4 is to enumerate the valuations of ψ(ñ) for which vertex ψ′(ñ) must drive a deterministic value, to in turn prevent C′i from being able to produce a valuation which cannot be sensitized in Ci. Intuitively, a′(ñ) represents the set of valuations to predecessor variables of ñ for which an assignment of 0 to ñ would render a cross-product of valuations in the offset of BDDi, meaning that the corresponding valuation is not in the characteristic function of Ci. Similarly, a″(ñ) represents the set of valuations to predecessor variables of ñ for which an assignment of 1 to ñ would render a cross-product of valuations in the offset of BDDi. The new inputs ψ(ñ) are parametric variables, and are the source of random choice in the abstracted cone C′i.
We demonstrate Synthesize_Set on an example BDD in Figure 7.5. We have borrowed this example from the work of [89], which will be discussed in Section 7.2. Each xi represents a parametric variable ψ(ñi) correlating to BDD variable ñi, over nodes nia and nib. Term x′i represents the corresponding synthesized ψ′(ñi).
[Figure 7.5 depicts an example BDD with nodes n1a, n2a, n2b, n3a, n3b, n4a, and n4b over four variables, terminals 1 and 0, then/else branch labels, the sensitizing-path terms a(n) (with a(n4a) = a′(ñ4a) and a(n4b) = a″(ñ4b)), the parametric inputs x1–x4, and the synthesized drivers x′1–x′4.]
Figure 7.5: BDD synthesis example
Lemma 7.2. Algorithm Synthesize_Set performs a semantically correct BDD synthesis. In particular, for the generated netlist N′, a given minterm m : BDD_vars(BDDi) → {0, 1} is an element of BDDi if and only if ∃p′ ∈ P′. ∀ñ ∈ BDD_vars(BDDi). p′(ψ′(ñ), 0) = m(ñ).
Proof. For any variable ñ not in the support of the BDD, Synthesize_Set will assign ψ′(ñ) = ψ(ñ) since both a′(ñ) and a″(ñ) will be 0. This is correct since there is no cross-correlation between ñ and any other variable, hence we need to drive ψ′(ñ) by an uncorrelated parametric FREE vertex.
For other variables, the disjuncted term a′(ñ) indicates the set of valuations to predecessor variables (with respect to the arbitrary BDD rank) for which only a binary 1 may be driven onto ψ′(ñ), to prevent the synthesized netlist from being able to drive a valuation which is not a minterm of BDDi. Similarly, the disjuncted term a″(ñ) indicates the set of valuations to predecessor variables for which only a binary 0 may be driven. It is only for sets of valuations to predecessor variables which do not satisfy a′(ñ) ∨ a″(ñ) that we may allow ψ′(ñ) to randomly select values via ψ(ñ).
We now analyze the number of AND vertices created by Synthesize_Set. Given m BDD nodes, there will be at most 2·m 2-input AND vertices necessary to represent the conjunctions inside of a, since each node has at most 2 children and hence appears at most twice in any of these conjunction terms (once with respect to its variable inverted, once uninverted). There will be at most an additional 2·m 2-input AND vertices for the disjunctions over those conjoined terms, though this number is often smaller since many nodes have a single parent. We note that there will be at most m elements in a′ ∪ a″, since no node will have both children as 0 (else the BDD is not reduced). Therefore, there will be at most m 2-input AND vertices to associate the a′ and a″ vertices to variables. Practically, the number of a′ and a″ vertices necessary tends to be much smaller than m, since most nodes are likely not to have 0 as a child. Lastly, there will be at most 3·|BDD_vars(BDDi)| 2-input AND vertices necessary for the ψ′ terms, as there is one disjunction and a 3-input conjunction per variable to drive ψ′. However, unless the nodes of a given variable have 0 both as a then and an else branch, at least one of a′ or a″ will be empty, hence fewer than 3 vertices will be necessary for that variable.
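The worst-case count above can be transcribed directly (the function name is illustrative):

```python
def and_vertex_bound(m, num_vars):
    """Worst-case number of 2-input AND vertices created by the synthesis,
    per the analysis above: 2m for the conjunctions inside a, 2m for the
    disjunctions over those terms, m to associate the a' and a'' vertices
    to variables, and 3 per variable for the psi' terms."""
    return 2 * m + 2 * m + m + 3 * num_vars

# E.g. a BDD with 8 internal nodes over 4 variables:
assert and_vertex_bound(8, 4) == 52
```

In practice the actual count is far below this bound, since most nodes have a single parent and do not have 0 as a child.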
7.2 Related Work
The overall theory of soundness of replication of a cut by a trace-equivalent cone is similar
to that for a bisimilar cone, hence may be viewed as a conservative approach of assume-
guarantee reasoning [45] when verifying the moduleC. Several techniques for study-
ing bisimulation minimization and less conservative property-preserving minimizations
have been proposed, for example in [85, 27, 29]; while more general than the combina-
tional implementation proposed in this chapter, they suffer from computational complexity
which outweighs an invariant check. In contrast, we focus upon a more restrictive do-
main for which efficient reduction algorithms are applicable; this combinational domain
furthermore well-suits the goal of augmenting the reductions possible through retiming in
a transformation-based verification setting.
The discussed implementation is quite similar to the recent work of Moon et al. [88], hence we will limit our discussion of previous work to this technique. Their work provides a cut-based variable reduction technique, with presented applications to simplifying BDD-based combinational equivalence checking. Though their techniques are tuned for enhancing BDD-based verification and thus perform their analysis and reduction purely via BDDs, fairly straightforward extensions could be used to plug their algorithms for analyzing and abstracting cuts into our top-level Cut_Abstract function. Our Analyze_Cut technique is one contribution beyond their approach. Both of our techniques require the calculation of the characteristic function of a set of vertices to enable parametric reductions for cutsets; however, they do not discuss their algorithms for doing so. We present a tuned conjunction and early quantification schedule for performing this calculation via BDDs,
which has proven efficient as per our experimental results. Additionally, this BDD-based approach could be replaced with simulation-based or SAT-based enumeration. Their approach of obtaining a parametric representation is derived from the input-output relation synthesis algorithm presented in [89]. Though more general, when applied to BDDs with only output variables (as with our approach) the technique of [89] tends to require several more AND vertices per BDD node. As an example, applying their approach to the BDD of Figure 7.5, and using the on-the-fly reduction techniques presented in Chapter 5, will require 24 2-input AND vertices instead of 11 with our approach (note that the top-most two AND vertices of that figure are unnecessary due to conjunction with ONE). The technique of [88] obtains a parametric representation directly as a BDD rather than a netlist, though their algorithm BFS_PR would yield an identical netlist to our Synthesize_Set if its results were mapped to gates instead of BDD nodes. By representing the abstraction as a netlist, we enable an efficient trace lifting procedure as per Figure 7.1 which may use an arbitrary algorithm to discharge the BMC obligation (we have found SAT to often be the most efficient); the approach of [88] does not provide a solution to the trace generation problem. Nevertheless, their implementation is quite similar to ours; our primary motivation for discussing a structural flavor of this technique is to demonstrate its synergy with other abstractions, as will be reflected in our experimental results. In particular, we have found that this technique coupled with redundancy removal is the most efficient way to minimize the retiming stump created by retiming, thereby helping to ensure that retiming will not risk hampering a verification flow.
7.3 Experimental Results
In this section we provide experimental results for our combinational cut-based abstrac-
tion implementation. All experiments were run on an IBM ThinkPad model T21 running
RedHat Linux 7.2, with an 800 MHz Pentium III and 256 MB main memory. We set the
peak BDD size to 2^19 nodes. We chose a modified augmenting path algorithm [90] to compute a vertex min-cut, which tends to provide near-linear runtimes despite its worst-case complexity of O(|V| · |E|).
We performed several sets of experiments to study the reduction capability of this
technique in minimizing FREE and AND vertex count. The first set of experiments was
performed upon the ISCAS89 benchmarks, and is summarized in Table 7.1. We enumer-
ated every primary output of these netlists as a target – aside from any which are also
FREE vertices. For various transformation flows, we report the number of FREE vertices in
the cone of influence of the targets before and after the reduction. We additionally report
the number of combinationally-driven AND vertices (elements of oi(VC) from the algo-
rithm of Figure 7.2) before and after the abstraction. The first column provides the name
of the benchmark. The next five columns present the results, before and after cut-based
abstraction, for various flows of abstraction engines. CUT refers to this cut-based ab-
straction engine; COM refers to a redundancy removal engine using the technique of [51];
RET refers to a retiming engine (refer to Chapter 6). In columns 2-6, we report the flows
CUT, COM-CUT, RET-CUT, COM-RET-CUT, and COM-RET-COM-CUT, respectively.
In each of these columns, we first report the number of FREE vertices in the cone of influence of the targets after the abstraction; the number in parentheses reports the number eliminated through the abstraction. The second set of numbers (after the semicolon) refers to the number of combinationally-driven AND vertices A after the abstraction; the number in parentheses reports the number eliminated through the abstraction. In some cases, AND count is increased through the abstraction, correlating to a negative number in parentheses. A summation of these values is provided in the last row. Table 7.2 provides an identical
set of experiments for randomly-selected components and targets for the IBM Gigahertz
Processor, after performing phase abstraction (refer to Chapter 10).
The computing resources for this abstraction tend to be quite negligible. For Ta-
ble 7.1, the maximum run-time for the cut engine is 160 seconds (for S158501); the av-
erage run-time for the others is 0.66 seconds. The maximum memory requirement is 10.9
MB; the average for the others is less than 1 MB. During the abstraction process for these
benchmarks, we aborted the Analyze_Cut computation for six of 200 components due to exceeded resources. For the ISCAS benchmarks, we note that very little reduction is possible prior to
retiming. This is due to two phenomena: first, the larger number of REGISTERs implies
that there is a fairly small combinational cone to which to apply this technique. Retiming
reduces REGISTER count, thus the recurrence structure tends to have a larger combinational
cone. Second, after retiming we have the combinational retiming stump composed onto the
recurrence structure, creating an additional domain of applicability of this abstraction. As
mentioned in Section 6.4, and as reflected in those experiments, the retiming stump is rarely
a hindrance to the overall verification flow, as much of it may be eliminated by redundancy
removal, though it nonetheless does add to netlist size. Comparing columns 5 and 6, prior
to the post-retiming call to redundancy removal, we have 1897 FREE vertices and 11835
AND vertices total in the combinational cones. After this last call to redundancy removal,
these numbers drop to 1846 and 9560, respectively. However, redundancy removal is limited in the type of reductions it may provide, which is the motivation for experimentation with this combinational cut-based abstraction technique. Cut-based abstraction reduces
FREE vertex count for these two cases by 332 and 411, respectively, correlating to 17.5%
and 22.3%, respectively. In the former case we reduce AND count by 20.0%; however,
we see in the latter case that the transformation of S158501 causes a significant increase
in AND count. This illustrates an occasional, though infrequent, risk of increasing AND
count when reducing FREE vertices – though note that far more frequently the AND count
is reduced. This risk may be minimized by bounding the increased size of an abstracted
cone and neglecting a replacement if this threshold is exceeded. Nevertheless, ignoring
S158501, cut-based abstraction yields a reduction of 302 of 1624 FREE vertices (correlat-
ing to 18.6% reduction), and a reduction of 1559 of 9560 AND vertices (correlating to a
reduction of 16.3%) for this last column.
Design | CUT | COM,CUT | RET,CUT | COM,RET,CUT | COM,RET,COM,CUT
(each entry: |I| (Δ); |A| (Δ))
PROLOG | 33 (3); 16 (3) | 33 (3); 14 (3) | 58 (9); 87 (-7) | 52 (10); 88 (14) | 51 (11); 79 (12)
S1196 | 14 (0); 303 (0) | 14 (0); 303 (0) | 27 (1); 445 (-14) | 27 (1); 449 (-22) | 27 (1); 449 (-22)
S1238 | 14 (0); 340 (0) | 14 (0); 343 (0) | 27 (1); 499 (-21) | 27 (1); 496 (-21) | 27 (1); 493 (-21)
S1269 | 18 (0); 26 (0) | 18 (0); 26 (0) | 18 (0); 26 (0) | 30 (2); 51 (2) | 30 (2); 51 (2)
S132071 | 54 (2); 11 (3) | 54 (2); 11 (3) | 148 (85); 968 (1222) | 121 (62); 591 (476) | 103 (35); 254 (77)
S1423 | 17 (0); 1 (0) | 17 (0); 1 (0) | 29 (0); 24 (0) | 29 (0); 24 (0) | 29 (0); 23 (0)
S1488 | 8 (0); 12 (0) | 8 (0); 12 (0) | 10 (0); 13 (0) | 8 (0); 12 (0) | 8 (0); 12 (0)
S1494 | 8 (0); 12 (0) | 8 (0); 12 (0) | 10 (0); 13 (0) | 8 (0); 12 (0) | 8 (0); 12 (0)
S1512 | 29 (0); 11 (0) | 29 (0); 11 (0) | 29 (3); 24 (-4) | 29 (0); 11 (0) | 29 (0); 11 (0)
S158501 | 55 (12); 259 (-39) | 43 (12); 227 (-39) | 198 (8); 1426 (22) | 206 (6); 1783 (-22) | 103 (109); 30601 (-29101)
S2081 | 10 (0); 1 (0) | 10 (0); 1 (0) | 10 (0); 1 (0) | 10 (0); 1 (0) | 10 (0); 1 (0)
S27 | 4 (0); 0 (0) | 4 (0); 0 (0) | 7 (0); 2 (0) | 4 (0); 0 (0) | 4 (0); 0 (0)
S298 | 3 (0); 0 (0) | 3 (0); 0 (0) | 6 (0); 2 (0) | 3 (0); 0 (0) | 3 (0); 0 (0)
S3271 | 26 (0); 189 (0) | 26 (0); 180 (0) | 26 (0); 186 (0) | 26 (0); 179 (0) | 26 (0); 179 (0)
S3330 | 34 (3); 12 (3) | 34 (3); 12 (3) | 59 (7); 67 (4) | 55 (10); 86 (21) | 54 (11); 74 (22)
S3384 | 43 (0); 24 (0) | 43 (0); 24 (0) | 66 (102); 489 (435) | 81 (103); 1046 (510) | 81 (103); 1040 (510)
S344 | 9 (0); 0 (0) | 9 (0); 0 (0) | 1 (0); 0 (0) | 9 (0); 0 (0) | 9 (0); 0 (0)
S349 | 9 (0); 0 (0) | 9 (0); 0 (0) | 1 (0); 0 (0) | 9 (0); 0 (0) | 9 (0); 0 (0)
S35932 | 35 (0); 32 (0) | 35 (0); 32 (0) | 69 (0); 119 (0) | 35 (0); 32 (0) | 35 (0); 32 (0)
S382 | 3 (0); 0 (0) | 3 (0); 0 (0) | 6 (0); 2 (0) | 6 (0); 2 (0) | 6 (0); 2 (0)
S385841 | 32 (0); 11 (0) | 31 (0); 11 (0) | 94 (12); 1256 (21) | 95 (15); 1606 (-85) | 89 (15); 944 (-159)
S386 | 7 (0); 6 (0) | 7 (0); 6 (0) | 7 (0); 2 (0) | 7 (0); 6 (0) | 7 (0); 6 (0)
S400 | 3 (0); 0 (0) | 3 (0); 0 (0) | 6 (0); 2 (0) | 6 (0); 2 (0) | 6 (0); 2 (0)
S4201 | 18 (0); 1 (0) | 18 (0); 1 (0) | 18 (0); 1 (0) | 18 (0); 1 (0) | 18 (0); 1 (0)
S444 | 3 (0); 0 (0) | 3 (0); 0 (0) | 6 (0); 2 (0) | 3 (0); 0 (0) | 3 (0); 0 (0)
S4863 | 47 (2); 0 (16) | 47 (2); 0 (16) | 73 (88); 0 (809) | 72 (100); 74 (1382) | 72 (100); 61 (1015)
S499 | 1 (0); 0 (0) | 1 (0); 0 (0) | 1 (0); 0 (0) | 1 (0); 0 (0) | 1 (0); 0 (0)
S510 | 19 (0); 6 (0) | 19 (0); 6 (0) | 19 (0); 5 (0) | 19 (0); 6 (0) | 19 (0); 6 (0)
S526N | 3 (0); 0 (0) | 3 (0); 0 (0) | 6 (0); 2 (0) | 3 (0); 0 (0) | 3 (0); 0 (0)
S5378 | 34 (1); 108 (1) | 34 (1); 108 (1) | 48 (13); 214 (18) | 51 (13); 227 (21) | 51 (13); 217 (31)
S635 | 2 (0); 0 (0) | 2 (0); 0 (0) | 2 (0); 0 (0) | 2 (0); 0 (0) | 2 (0); 0 (0)
S641 | 33 (0); 20 (0) | 33 (0); 20 (0) | 33 (0); 14 (0) | 32 (0); 16 (0) | 32 (0); 16 (0)
S6669 | 83 (0); 84 (0) | 83 (0); 84 (0) | 213 (1); 2418 (51) | 213 (1); 2418 (51) | 212 (2); 2297 (52)
S713 | 33 (0); 20 (0) | 33 (0); 20 (0) | 33 (0); 14 (0) | 33 (0); 16 (0) | 33 (0); 16 (0)
S820 | 18 (0); 39 (0) | 18 (0); 37 (0) | 17 (1); 18 (1) | 18 (0); 37 (0) | 18 (0); 37 (0)
S832 | 18 (0); 39 (0) | 18 (0); 37 (0) | 17 (1); 17 (1) | 18 (0); 37 (0) | 18 (0); 37 (0)
S8381 | 34 (0); 1 (0) | 34 (0); 1 (0) | 34 (0); 1 (0) | 34 (0); 1 (0) | 34 (0); 1 (0)
S92341 | 24 (1); 20 (1) | 24 (1); 20 (1) | 32 (8); 34 (48) | 34 (8); 54 (40) | 34 (8); 44 (40)
S938 | 34 (0); 1 (0) | 34 (0); 1 (0) | 34 (0); 1 (0) | 34 (0); 1 (0) | 34 (0); 1 (0)
S953 | 16 (0); 49 (0) | 16 (0); 49 (0) | 16 (0); 50 (0) | 16 (0); 52 (0) | 16 (0); 52 (0)
S967 | 16 (0); 59 (0) | 16 (0); 59 (0) | 16 (0); 49 (0) | 16 (0); 51 (0) | 16 (0); 51 (0)
S991 | 65 (0); 0 (0) | 65 (0); 0 (0) | 68 (14); 1 (20) | 65 (0); 0 (0) | 65 (0); 0 (0)
Σ | 969 (24); 1713 (-12) | 956 (24); 1669 (-12) | 1598 (354); 8494 (2606) | 1565 (332); 9468 (2367) | 1435 (411); 37102 (-27542)
Table 7.1: Cut results for ISCAS89 benchmarks
Design | CUT | COM,CUT | RET,CUT | COM,RET,CUT | COM,RET,COM,CUT
(each column reports |I| (Δ); |A| (Δ))
CP RAS | 68 (0); 0 (0) | 68 (0); 0 (0) | 76 (44); 34 (107) | 73 (46); 28 (94) | 70 (47); 5 (96)
D DASA | 19 (0); 13 (0) | 19 (0); 13 (0) | 25 (2); 14 (7) | 19 (0); 13 (0) | 19 (0); 13 (0)
D DCLA | 67 (0); 23 (0) | 67 (0); 23 (0) | 75 (8); 69 (42) | 73 (0); 64 (0) | 73 (0); 64 (0)
D DUDD | 49 (5); 81 (71) | 48 (5); 41 (65) | 89 (15); 1462 (-708) | 81 (22); 730 (-292) | 79 (24); 764 (-396)
I IBBQn | 402 (0); 279 (0) | 402 (0); 68 (0) | 402 (0); 3179 (0) | 402 (0); 3085 (0) | 402 (0); 3058 (0)
I IFAR | 36 (4); 3 (4) | 28 (4); 3 (4) | 62 (26); 739 (-440) | 51 (10); 79 (9) | 48 (8); 85 (-27)
I IFPF | 128 (27); 8 (120) | 121 (26); 8 (116) | 1110 (32); 5849 (122) | 128 (48); 74 (153) | 121 (52); 52 (155)
L3 SNP1 | 65 (0); 31 (0) | 41 (0); 24 (0) | 85 (23); 3546 (-1851) | 42 (3); 401 (1) | 42 (3); 303 (1)
L EMQn | 79 (10); 5 (10) | 0 (0); 0 (0) | 95 (132); 140 (139) | 0 (0); 0 (0) | 0 (0); 0 (0)
L EXEC | 108 (1); 3 (1) | 77 (2); 2 (2) | 140 (87); 109 (263) | 65 (20); 44 (25) | 48 (16); 28 (21)
L FLUSHn | 41 (6); 45 (11) | 41 (6); 41 (11) | 47 (48); 33 (300) | 41 (29); 17 (119) | 41 (29); 17 (119)
L INTRo | 24 (0); 0 (0) | 24 (0); 0 (0) | 17 (7); 49 (19) | 17 (7); 35 (15) | 17 (7); 35 (15)
L LMQo | 149 (40); 96 (59) | 149 (40); 98 (57) | 167 (50); 356 (-2) | 170 (48); 332 (13) | 170 (48); 290 (48)
L LRU | 17 (0); 1 (0) | 16 (0); 0 (0) | 25 (6); 82 (6) | 13 (3); 12 (3) | 13 (3); 12 (3)
L PFQo | 46 (0); 6 (0) | 46 (0); 6 (0) | 68 (12); 135 (12) | 66 (12); 168 (14) | 65 (13); 117 (42)
L PNTRn | 88 (4); 58 (5) | 13 (0); 45 (0) | 331 (9); 1152 (-51) | 13 (0); 146 (0) | 0 (0); 0 (0)
L PRQn | 5 (1); 0 (1) | 5 (1); 0 (1) | 6 (5); 0 (5) | 6 (1); 0 (1) | 6 (1); 0 (1)
L SLB | 28 (1); 3 (2) | 28 (1); 3 (2) | 39 (15); 78 (4) | 32 (23); 41 (34) | 32 (23); 41 (29)
L TBWKn | 13 (1); 3 (1) | 13 (1); 3 (1) | 11 (5); 6 (5) | 9 (5); 4 (5) | 9 (5); 4 (5)
SIDECAR 4 | 15 (0); 20 (0) | 13 (0); 9 (0) | 25 (3); 37 (27) | 25 (3); 25 (11) | 25 (3); 23 (6)
S SCU1 | 70 (1); 5 (1) | 70 (1); 5 (1) | 74 (32); 27 (67) | 63 (23); 9 (49) | 49 (18); 5 (29)
W GAR | 38 (1); 0 (1) | 38 (1); 0 (1) | 71 (1); 7 (1) | 38 (1); 0 (1) | 38 (1); 0 (1)
Σ | 1555 (102); 683 (287) | 1327 (88); 392 (261) | 3040 (562); 17103 (-1926) | 1427 (304); 5307 (255) | 1367 (301); 4916 (148)
Table 7.2: Cut results for GP netlists
For the GP netlists of Table 7.2, our peak run time was 282 seconds for a resource-exceeded attempt on L PNTRn; the average of the other runs is 0.9 seconds. Our peak memory utilization was 1 MB. We aborted the abstraction process for a total of four of 363 components due to exceeded resources. For these netlists, we see a greater potential for reduction prior to retiming; using the cut abstraction by itself we are able to reduce FREE vertex count by 102 of 1657 (or 6.1%), and AND count by 287 of 970 (or 29.6%). After a single redundancy removal call, we reduce FREE vertex count by 88 of 1415 (or 6.2%), and AND count by 261 of 653 (or 40.0%). After redundancy removal and retiming (columns 5 and 6), prior to the post-retiming call to redundancy removal, we have 1731 FREE vertices and 5562 AND vertices total in the combinational cone. After this call to redundancy removal, these numbers drop to 1668 and 5064, respectively. Addition of our cut-based abstraction reduces FREE vertex count for these two cases by 304 and 301, respectively, correlating to 17.6% and 18.0%. We additionally reduce AND count by 4.6% and 2.9%, respectively. Note that D DUDD hurts the cumulative AND reduction for these two columns. Ignoring this netlist, we reduce FREE vertex count by 17.3% and 17.7%, and AND vertex count by 10.7% and 11.6%, for these two columns, respectively.
Overall, these results demonstrate that cut-based abstraction has the potential to yield significant reductions above and beyond redundancy removal techniques, attainable with negligible computational resources. They further illustrate its synergy with retiming and redundancy removal. Extending structural cut-based abstraction to include sequential logic for which trace-equivalent reduction is computationally efficient is a promising direction for future work. Additionally, more research is needed into techniques to improve incremental reductions in cases of exceeded resources, and possibly into preferring alternate cuts if the replacement cone of a given cut increases AND count significantly.
Chapter 8
Structural Target Enlargement
In this chapter we introduce our technique of structural target enlargement from [24], which is collaborative work with Andreas Kuehlmann and Jacob Abraham. The goal of target enlargement is to render a target which may be hit at a shallower depth from the initial states of a netlist, and with a higher probability, than the original target, as noted by prior research [91, 92, 93]. Additionally, our particular approach enables significant reduction in the size of the enlarged target by temporally decomposing the overall verification problem, and may be viewed as a generalized inductive proof which makes use of SAT-based BMC, BDD-based analysis, and diameter overapproximation techniques.
Definition 8.1. A k-step target enlargement is the set of states that can reach target t in k time-steps, denoted as S^t_k ⊆ S, and defined as follows.

    S^t_k = { s ∈ S : ∃ i ∈ I . Simulate(t, {s, i}) = 1 }   if k = 0
          = preimage(S^t_{k−1})                             if k ≠ 0      (8.1)
If an initial state s_0 becomes part of the enlarged target for any S^t_j, the target is proven reachable. Otherwise, if during the current enlargement step j no new states are enumerated that have not been encountered in "shallower" steps, i.e., S^t_j \ ∪_{i=0}^{j−1} S^t_i = ∅, the target is clearly unreachable. If k ≥ d(t) preimages are performed without reaching an initial state, unreachability may be inferred. If at any step the computing resources exceed a given limit, the enlargement process is truncated and the verification problem is reformulated based upon the states enumerated during shallower steps (refer to Figures 8.1 and 8.2).
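On a toy explicit-state model, the recurrence of Definition 8.1 together with the two termination tests above can be sketched as follows. This is a set-based illustration only, not our BDD implementation; the three-state machine, its transition labels, and the function names are hypothetical.

```python
def preimage(trans, states):
    """States with some input-labeled transition into `states`; inputs are
    implicitly quantified by allowing any edge."""
    return {s for (s, i), nxt in trans.items() if nxt in states}

def enlarge(trans, target_states, init, k):
    """Iterate S^t_0 .. S^t_k; report 'reachable' when an initial state
    appears, 'unreachable' when step j adds no states beyond shallower steps."""
    layer, seen = set(target_states), set()
    for j in range(k + 1):
        if layer & init:
            return ("reachable", j)
        if not layer - seen:         # S^t_j \ union of shallower layers is empty
            return ("unreachable", j)
        seen |= layer
        layer = preimage(trans, layer)
    return ("unknown", k)            # resources exhausted; reformulate on `layer`

# 3-state example: 0 -> 1 -> 2 -> 2; target is state 2, initial state is 0.
trans = {(0, 'a'): 1, (1, 'a'): 2, (2, 'a'): 2}
print(enlarge(trans, {2}, {0}, 5))   # ('reachable', 2)
```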
As per Definition 8.1, target enlargement is based upon preimage computation, for which there are three primary techniques: (1) transition-relation based methods [91, 92, 93, 94], (2) transition-function based methods using the constrain operator [95], and (3) transition-function based methods using the compose operator [96]. In our implementation we chose the latter, since the set of REGISTERs in the support of each iteration of a target enlargement is often a small subset of those in the entire cone of influence of the target. This avoids unnecessary computational complexity, and well suits our goal of rendering a simpler problem with as few REGISTERs as possible – the enlarged target – if the target is not solved during enlargement.
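The compose-based preimage can be illustrated at the level of Boolean valuations: substitute each next-state function into the target, then existentially quantify the FREE (input) variables. A minimal sketch, with hypothetical next-state functions f_r1 and f_r2 standing in for a small netlist cone:

```python
from itertools import product

# Hypothetical 2-register cone: next-state functions over (r1, r2, i1).
f_r1 = lambda v: v['r2']
f_r2 = lambda v: v['r1'] ^ v['i1']

def compose_preimage(target, next_fns, free_vars):
    """Compose-based preimage: substitute each next-state function into the
    target, then existentially quantify the FREE variables by enumeration."""
    def pre(v):
        # Evaluate the target on the successor state reached from valuation v,
        # for some choice of the FREE variables.
        return any(bool(target({r: f(dict(v, **dict(zip(free_vars, bits))))
                                for r, f in next_fns.items()}))
                   for bits in product([0, 1], repeat=len(free_vars)))
    return pre

t = lambda s: s['r1'] and s['r2']          # target: both registers asserted
pre_t = compose_preimage(t, {'r1': f_r1, 'r2': f_r2}, ['i1'])
print(pre_t({'r1': 0, 'r2': 1}))           # True: choosing i1 = 1 hits t next step
```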
Figure 8.1 shows the pseudocode for our target enlargement algorithm. We use BMC to attempt to hit the target as well as to discharge our induction hypothesis for the subsequent backward analysis. In our implementation, we use SAT-based BMC rather than BDD-based analysis since the former is often more efficient for bounded analysis. If BMC hits the target, or an overapproximation of diameter d(t) is surpassed during the bounded search, we discharge the target in step 1. A diameter bound may be obtained by the technique presented in Chapter 4, or by any other mechanism. If BMC is inconclusive, we perform compose-based preimage computations; we may alternatively iterate between BMC and preimage computation with resource bounds. We apply early quantification of FREE vertex variables to keep the intermediate BDD size small; as soon as the last composition which has a given FREE variable v in its support is performed, we may quantify v. We use a modified MLP algorithm [84] for our quantification and composition scheduling. At each MLP scheduling step, we either schedule a composition, or "activate" a FREE vertex to
BDD Enlarge_Target(Target t, N k, N d(t))

1. for ( i = 0; i < k; i++ )
   (a) Run BMC on target t for time-step i. If t is hit, report the hit and return NULL.
   (b) If t has not been hit, and i ≥ d(t) − 1, then report t as unreachable; return NULL.
2. Build BDD_0 for t, over variables for {I ∪ R} ∩ combinational_fanin(t).
3. Existentially quantify I variables from BDD_0. Note that BDD_0 represents the set S^t_0.
4. for ( i = 1; i ≤ k; i++ )
   (a) Compute MLP [84] schedule (R_1, ..., R_n) for REGISTERs supporting BDD_{i−1}.
   (b) Rename all variables r in BDD_{i−1} to r′, forming BDD_i.
   (c) for ( j = 1; j ≤ n; j++ )
       i. BDD_i = BDD_compose(BDD_i, r′_j, f_{r_j}), which substitutes f_{r_j} (the BDD for the next-state function of REGISTER r_j) in place of variable r′_j in BDD_i.
       ii. Perform early quantification of I variables from BDD_i.
       iii. Minimize BDD_i with BDD_compact using BDD_0, ..., BDD_{i−1} as don't cares.
       iv. If BDD_i is too large, assign k = i − 1 and return BDD_{i−1}.
   (d) If BDD_i is 0, then report t as unreachable; return NULL.
5. return BDD_k.

Figure 8.1: Enlarge_Target algorithm
simplify future scheduling decisions – initially, all FREE vertices are "inactive." Our goal is to minimize the lifetime of FREE vertex variables from activation until quantification, and to delay the introduction of REGISTER variables. Each composition step eliminates one next-state REGISTER variable r′, and introduces zero or more present-state REGISTER variables r and FREE vertex variables. The following modifications of the MLP algorithm have proven to be the most useful.

- At each scheduling step, we schedule compositions of all REGISTERs with no inactive FREE vertices in their support which introduce at most one REGISTER not already in the BDD support. Each such composition eliminates the corresponding r′ variable from the BDD support, and adds at most one r variable to the support, which is typically beneficial for minimizing peak BDD size. We next schedule compositions of all REGISTERs with zero inactive, and nonzero active, FREE vertices in their support, regardless of their REGISTER support, to force quantification.

- If no REGISTER satisfies the above criteria, we instead activate a FREE vertex. When choosing which FREE vertex v to activate, we select one which is in the support of an unscheduled REGISTER with the fewest, though nonzero, inactive FREE vertices in its support. Ties are broken to minimize the total number of REGISTERs not already in the BDD support which would need to be introduced before v could be quantified.
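A simplified set-based sketch of the two scheduling rules above, with support sets standing in for actual BDD supports and arbitrary tie-breaking in place of the tie-breaking rule described; the register and FREE-vertex names are hypothetical.

```python
def schedule(regs):
    """regs: name -> (REGISTER support set, FREE-vertex support set).
    Returns an interleaving of ('compose', r) and ('activate', v) actions."""
    support, active, pending, plan = set(), set(), dict(regs), []
    while pending:
        # Rule 1a: fully-active FREE support, at most one new REGISTER.
        ready = [r for r, (rs, fs) in pending.items()
                 if fs <= active and len(rs - support) <= 1]
        # Rule 1b: fully-active FREE support, regardless of REGISTER support
        # (forces quantification of the now-active FREE vertices).
        if not ready:
            ready = [r for r, (rs, fs) in pending.items() if fs <= active]
        if ready:
            r = ready[0]
            support |= pending.pop(r)[0]   # composing r introduces its support
            plan.append(('compose', r))
            continue
        # Rule 2: activate a FREE vertex of the register with the fewest
        # (nonzero) inactive FREE vertices in its support.
        r = min(pending, key=lambda x: len(pending[x][1] - active))
        v = sorted(pending[r][1] - active)[0]
        active.add(v)
        plan.append(('activate', v))
    return plan

regs = {'r1': ({'r2'}, set()), 'r2': ({'r1'}, {'v1'})}
print(schedule(regs))   # [('compose', 'r1'), ('activate', 'v1'), ('compose', 'r2')]
```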
After each quantification, the intermediate BDD_i is simplified by the BDD_compact operation [97], using the BDDs of previous iterations as don't cares.¹ This simplification constitutes a weak inductive proof of unreachability of the target; use of a constraint instead of a don't care would constitute an exact inductive proof. Note that the corresponding induction hypothesis was previously discharged by BMC. The resulting simplified BDD_i satisfies the following relation.

    S^t_i \ ∪_{j=0}^{i−1} S^t_j  ⊆  BDD_i  ⊆  ∪_{j=0}^{i} S^t_j      (8.2)

Additionally, size(BDD_i) ≤ size(S^t_i), where size(S^t_i) represents the BDD node count of S^t_i. The BDD_compact operation cannot introduce new variables into the support of a BDD, and may eliminate some. Hence it is well-suited for our goal of minimizing the support of each preimage computation and thereby of the enlarged target. It is also this goal that prompts us to keep each BDD_i distinct; taking their union may result in greater reductions through BDD_compact, though this union may be a costly operation. Using don't cares instead of constraints weakens our unreachability analysis, thus a fixed-point may never be reached. However, as demonstrated by our experimental results, this weaker approach is capable of solving or significantly simplifying many targets, which justifies the chosen trade-off of precision versus computational efficiency.

¹ A similar reachability-based approach would exploit states that can hit t within k time-steps as don't cares when assessing reachability of a state that can hit t in exactly k steps. With this observation, we may use these don't cares also to simplify the next-state functions of REGISTERs, which may further reduce complexity for a subsequent verification flow.
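At the set level, relation (8.2) permits any simplification between the exact new layer and the union of all layers enumerated so far. A sketch with hypothetical layers, taking the smallest legal choice (dropping previously enumerated states):

```python
def compact(exact_layer, seen):
    # Set-level analogue of BDD_compact: any set between exact_layer - seen
    # (the onset) and exact_layer | seen (onset plus don't cares) is a legal
    # simplification; here we take the smallest choice.
    return exact_layer - seen

layers = [{1, 2}, {2, 3}, {3}]      # hypothetical S^t_0, S^t_1, S^t_2
seen = set()
for layer in layers:
    simplified = compact(layer, seen)
    assert layer - seen <= simplified <= layer | seen   # relation (8.2)
    seen |= layer

# Once every state of a layer was enumerated at a shallower step, the
# simplified layer is empty and the weak inductive unreachability test fires.
print(compact({3}, seen))   # set()
```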
If the BDD size at any step exceeds a given limit, the enlargement process is truncated and the BDD of the previous iteration is returned. This prevents exceedingly large enlarged targets which could harm the subsequent verification flow. We have found it beneficial to use two limits, since the intermediate BDD size tends to be significantly larger than that of the final, fully-quantified BDD: one hard upper-limit on BDD size to prevent the enlargement process from consuming too many resources, and another smaller limit on the final BDD_i size which reflects the potential increase in AND count of the resulting netlist.
As observed in [98], representing the target as a structure may often be beneficial in a general toolset, in circumventing the need for a potentially costly interfacing of algorithms (e.g., mapping simulation results to a BDD to check for intersection). Furthermore, the ability to quantify FREE variables is useful to increase the probability of hitting a target with incomplete search techniques. We have found that this approach – from structure to BDDs back to structure – is more effective in a flexible toolset than enlargement by purely structural transformation [54]. The latter tends to yield large, redundant structures which may significantly hinder subsequent BDD- or simulation-based analysis; structural quantification furthermore may entail an exponential increase in size. In contrast, our enlargement approach often reduces the size of the target cone and thus enhances arbitrary subsequent verification approaches.
Using SAT rather than BDDs for an inductive proof may occasionally be more efficient. However, if unsuccessful, our BDD-based result may be reused to directly represent the simplified function of the k-step enlarged target. A similar reuse is not possible with a SAT-based method. Furthermore, without BDD-based analysis, it is virtually impossible to assess whether a given enlargement risks fatally hurting a subsequent BDD-based engine in a transformation-based verification flow. In [99] it is proposed to apply cubes obtained during an inductive SAT call as "lighthouses" to enhance the ability to subsequently hit targets; such an incomplete approach, however, precludes the structural reductions of our technique.
8.1 Target Enlargement Algorithms
In this section we discuss the overall flow of our decomposition algorithm Enlarge, which is illustrated in Figure 8.2. For each target, we first determine a limit on the number of enlargement steps, and then call the algorithm Enlarge_Target on target t. If Enlarge_Target reports a hit or unreachable solution, the corresponding target has been discharged. Otherwise a structure representing the enlarged target is added to N. This is performed by creating a new netlist N′ which encodes the function of the BDD of the enlarged target, using a standard multiplexor-based BDD synthesis [100]. The output gate of N′, denoted as t′, is a combinational function over the REGISTERs in N. The composition of N and N′, denoted as N ∥ N′, is then passed to a subsequent verification flow to attempt to solve t′. For example, we may next apply retiming (refer to Chapter 6) and redundancy removal (refer to Chapter 5), which have the potential to further reduce the netlist size, after which we may wish to attempt another target enlargement. If a subsequent engine demonstrates unreachability of t′, then t is also unreachable. If the subsequent verification flow hits t′, we use simulation and another BMC to lift the trace for the parent verification flow, as depicted in Figure 8.3.
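At the level of state sequences, the trace lifting of Figure 8.3 reduces to concatenating the BMC suffix onto the completed child trace while overwriting the shared time-step. A sketch with hypothetical state names:

```python
def lift_trace(p2, p3):
    """Figure 8.3, step 3: concatenate the BMC suffix p3 onto the completed
    child trace p2, overwriting the last time-step of p2 (which equals the
    first time-step of p3, the state at which the enlarged target t' is hit)."""
    assert p2[-1] == p3[0], "traces must overlap on the enlargement state"
    return p2[:-1] + p3

# Hypothetical states: the child flow hits t' at s2, and a k = 2 step BMC
# from s2 extends the trace to s4, where the original target t is hit.
p2 = ['s0', 's1', 's2']
p3 = ['s2', 's3', 's4']
print(lift_trace(p2, p3))   # ['s0', 's1', 's2', 's3', 's4']
```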
Theorem 8.1. Target enlargement is sound and complete for invariant checking.

Proof. We first consider the case that an unreachable result is generated. There are three conditions in which this result may occur. First, in algorithm Enlarge_Target, if BMC does not hit the target but k constitutes an upper-bound on diameter, an unreachability
void Enlarge(Netlist N)
  foreach t ∈ T:
  1. Determine a limit on the number of enlargement steps k as follows. Let d(t) represent an arbitrary upper-bound on the diameter of t. We assign k = min(d(t), user_specified_limit).
  2. Invoke BDD_k = Enlarge_Target(t, k, d(t)) to enlarge the target.
  3. If t is unsolved, synthesize BDD_k into netlist N′; compose N′ onto N; and replace t with t′ in T.

Figure 8.2: Top-level target enlargement flow
Partial_Trace Lift_Trace(Partial_Trace p′)
1. Complete p′ over N′ with Simulate up to the first hit of t′ to obtain p′′.
2. Cast a BMC of t for k time-steps from the last state of p′′, where k is the number of time-steps that t′ was enlarged. This call must be satisfiable, and will yield trace p′′′.
3. Concatenate p′′′ onto p′′, overwriting the last time-step of p′′ with the first time-step of p′′′, to obtain p′′′′.
4. return p′′′′.

Figure 8.3: Target enlargement trace lifting algorithm
result is correct by the definition of diameter. Second, in the same function, if a given BDD_i becomes equivalent to 0, the unreachable result is correct by inductiveness; BMC discharged our base case. Finally, if an unreachable result for the enlarged target is reported by a child verification flow, this result will be propagated upward. This result is correct by noting that the enlarged target constitutes the characteristic function of the set of states defined in formula (8.2): a subset of all states that can hit the target in 0, ..., k steps, and a superset of those that can hit the target in exactly k steps minus those that can hit the target in 0, ..., k − 1 steps. Since BMC has demonstrated that the target cannot be hit at time 0, ..., k − 1, and the child flow has effectively proven that the target is not reachable at times i, ..., ∞ where i ≤ k, this collectively constitutes a valid proof that t is unreachable.
We next consider the case that a target hit result is generated. If the target is hit by BMC during enlargement, the result and trace are correct by assumption. The only other target hit result will be generated if a child flow hits the enlarged target. We note that the enlarged target t′ will first be hit along the child trace from a state s ∈ S^t_k \ ∪_{j=0}^{k−1} S^t_j, since BMC has demonstrated that the target cannot be hit at time 0, ..., k − 1. Furthermore, there exists a k-step extension to the child trace which hits t for this same reason. Concatenation of these two traces thus clearly yields a semantically correct trace which hits t.

Theorem 8.2. Target enlargement generates a legal netlist.
Proof. We consider the requirements for legality enumerated in Definition 3.24.

1. The only gates fabricated by target enlargement are from the synthesis of BDD_k, which are correct by construction.
2. Since the original netlist is finite, and since k must be finite (we will always obtain a finite diameter overapproximation for a finite netlist using the algorithm of Figure 4.2), our BMC instance will be finite. Furthermore, BDD_k must be finite since it is over a finite number of BDD variables, and it is synthesized using a straightforward translation of one multiplexor per BDD node. Thus N ∥ N′ is finite.
3. Target enlargement does not alter initial values, hence all initial value cones are combinational by assumption.
4. The only logic created by target enlargement is a combinational function over REGISTERs in N – no REGISTER, nor its fanin cone, is affected. Therefore target enlargement cannot create combinational cycles.
Theorem 8.3. If the diameter of a k-step enlarged target t′ is d(t′), then the original target t is hittable within d(t′) + k time-steps, if at all.
Proof. If d(t′) = i, then t′ must be hittable at time 0, ..., i − 1 if at all, as follows from the definition of diameter. As per the proof of Theorem 8.1, if t′ is first hit at time j along trace p′, then t must be hittable at exactly time j + k along some trace p′′′′.

Due to the nature of the temporal union in (8.2), and the quantification inherent in target enlargement, it may be the case that a transition of t′ from 1 to 0 may be skewed and possibly eliminated with respect to such a transition of t. For example, target t may be an OR over an arbitrary cone A and the function counter ≢ 0 for a mod-n counter. The first hit of t via A may cause the counter to unconditionally begin counting, such that t will thereafter only be deasserted one time-step of every n time-steps. Target enlargement may obscure this deasserted time-step such that, once hit, t′ will never be deasserted. The number of time-steps necessary to drive a binary 1 to t from any reachable state of N may be exponentially smaller than the number necessary to subsequently drive a binary 0 onto t; note that BMC ensures that the initial value of t is 0. Therefore, target enlargement does not entail as clean an impact on diameter as we may hope; we cannot use a target enlargement approach to bound the diameter of an intermediate component of a partitioned netlist, for example. However, the result of Theorem 8.3 is sufficient to allow a bound derived from the target-enlarged netlist to imply an upper-bound on the number of time-steps sufficient to perform BMC in a complete manner for the original target.
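The counter example above can be simulated directly. Here a hypothetical mod-4 counter shows the original target deasserting periodically, while its 1-step enlargement (the set of states that can hit the target within one step) never deasserts:

```python
N = 4  # modulus of the hypothetical counter

def t(s):                 # original target: counter != 0
    return s != 0

def step(s):              # counter increments unconditionally
    return (s + 1) % N

# 1-step enlarged target: states hitting t within one time-step.
t_states = {s for s in range(N) if t(s)}
t1_states = t_states | {s for s in range(N) if step(s) in t_states}

s, trace_t, trace_t1 = 1, [], []
for _ in range(2 * N):
    trace_t.append(int(t(s)))
    trace_t1.append(int(s in t1_states))
    s = step(s)

print(trace_t)    # [1, 1, 1, 0, 1, 1, 1, 0]: t deasserts once every N steps
print(trace_t1)   # [1, 1, 1, 1, 1, 1, 1, 1]: the enlarged target never deasserts
```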
8.2 Related Work
There have been several research efforts related to target enlargement. The concept of using preimage computation to enumerate the target-enlarged states for enhancing forward search was first proposed by Yang and Dill [101] and independently by Yuan et al. [91]. The latter effort termed this approach retrograde analysis, borrowing from the artificial intelligence community. Yang and Dill provide a more extensive study of the probabilistic increase of simulation hitting the enlarged target in [92]. These works also propose ways in which to use the enlarged target to prioritize state traversal. Unlike our approach, these efforts do not offer structural reduction capability for the forward search, since they represent the enlarged targets as BDDs; nor do they propose the intertwined application of induction or diameter bounding techniques. Furthermore, their preimage computation uses a transition-relation approach, which limits the size of the design to which it may be applied.
The work of [93] uses the notion of controllability when isolating a localized cone of a target for enlargement. Using a transition-relation based approach, they calculate the set of states for which an arbitrary environment cannot prevent the localized target from being reached. Due to their compositional approach, they are able to scale to arbitrarily large designs in some cases, as with our technique. However, this notion of controllability weakens the enlargement potential; due to their compositional approach, they may enumerate only the set of states for which hitting a target is unavoidable rather than possible. For this reason, our enumeration provides a larger set of states. They additionally do not address reduction potential, or interaction with diameter bounding or inductive methods.
There are several variations to target enlargement, such as target look-ahead [98], which computes exactly S^t_k in formula (8.1). This calculation may be performed using structural compose-based preimages [99], similarly to the BED-based preimage computation proposed in [54]. Quantification of FREE variables is performed by a translation to-and-from BDDs (instead of purely structurally as with [54], which risks exponential increase in structure size), representing the final result as a netlist. However, this approach lacks the ability to use don't cares and induction, hence does not offer the reduction or unreachability capability which is the primary contribution of our technique.
The concept of a lighthouse may be viewed as an incomplete target enlargement, and lighthouses may either be manually specified [58] or automatically generated [102]. Like target enlargement, the use of lighthouses may increase the probability of simulation hitting a target; however, the incomplete nature of lighthouses precludes any reduction potential.
8.3 Experimental Results
In this section we provide experimental results for our target enlargement approach. All experiments were run on an IBM ThinkPad model T21 running RedHat Linux 6.2, with an 800 MHz Pentium III and 256 MB main memory. We set the peak BDD size to 2^17 nodes, and capped BMC (using a structural SAT solver [51]) to 10 seconds per target, with an upper-bound of fifty steps.
Our first set of experiments was performed on the ISCAS89 benchmarks. The results are provided in Table 8.1. Since these netlists have no specified properties, we labeled each primary output as a target. Column 1 provides the name of the benchmark. The next columns provide results for two distinct runs: first a standard run using the techniques as described in the previous sections, and second a "reduction-only" run which does not apply BMC to solve the problem. Instead, if BMC would solve the target in i steps, our enlargement is performed to depth j = i − 1; if j < 1, we only build BDD_0 in Enlarge. For the standard run, in Column 2 we report the number of targets in the netlist, the number of targets which are hit, and the number of targets that are proven unreachable. The number of unreachable results proven with BDDs is provided in parentheses. Any targets proven unreachable by SAT use the structural diameter overapproximation algorithm of Figure 4.2. In Column 3 we report the accumulated size of the coi's of unsolved targets in terms of the number of REGISTERs and FREE vertices, and the number eliminated in the corresponding enlarged cones. In other words, after the semicolon we report the sum of the coi size of each unsolved target, and before the semicolon we report the number of REGISTERs and FREE vertices of the corresponding un-enlarged cones which were eliminated by the enlargement. Column 4 reports the average number of seconds spent per target, and the peak memory usage. For the reduction-only run we report coi sizes and reduction results (similar to Column 3) in Column 5.
Design | Standard Run: |T| ; Hit ; Unrch (BDDs) | |R| (|I|) Eliminated ; Sum | Time/|T| (s) ; Memory (MB) | Reduction-Only Run: |R| (|I|) Eliminated ; Sum
PROLOG | 73 ; 69 ; 4 (0) | 0 (0); 0 (0) | 0.07 ; 15 | 146 (126); 2044 (1438)
S1196 | 14 ; 14 ; 0 (0) | 0 (0); 0 (0) | 0.08 ; 12 | 24 (56); 88 (196)
S1238 | 14 ; 14 ; 0 (0) | 0 (0); 0 (0) | 0.08 ; 12 | 24 (56); 88 (196)
S1269 | 10 ; 10 ; 0 (0) | 0 (0); 0 (0) | 0.10 ; 15 | 289 (145); 296 (152)
S13207.1 | 152 ; 131 ; 12 (9) | 26 (0); 527 (18) | 1.02 ; 107 | 3155 (302); 24244 (2172)
S1423 | 5 ; 5 ; 0 (0) | 0 (0); 0 (0) | 0.13 ; 15 | 2 (0); 278 (69)
S1488 | 19 ; 18 ; 0 (0) | 0 (0); 6 (8) | 0.74 ; 23 | 0 (0); 114 (152)
S1494 | 19 ; 18 ; 0 (0) | 0 (0); 6 (8) | 0.89 ; 23 | 0 (0); 114 (152)
S1512 | 21 ; 10 ; 0 (0) | 8 (8); 437 (283) | 8.22 ; 24 | 135 (93); 837 (543)
S15850.1 | 150 ; 135 ; 8 (1) | 451 (54); 1450 (174) | 0.68 ; 63 | 2425 (321); 9683 (1301)
S208.1 | 1 ; 1 ; 0 (0) | 0 (0); 0 (0) | 0.27 ; 15 | 0 (0); 8 (10)
S27 | 1 ; 1 ; 0 (0) | 0 (0); 0 (0) | 0.23 ; 12 | 0 (0); 3 (4)
S298 | 6 ; 6 ; 0 (0) | 0 (0); 0 (0) | 0.12 ; 15 | 22 (6); 54 (18)
S3271 | 14 ; 14 ; 0 (0) | 0 (0); 0 (0) | 0.11 ; 15 | 0 (0); 1248 (339)
S3330 | 73 ; 73 ; 0 (0) | 0 (0); 0 (0) | 0.08 ; 15 | 146 (125); 2044 (1442)
S3384 | 26 ; 26 ; 0 (0) | 0 (0); 0 (0) | 0.09 ; 15 | 26 (25); 2587 (425)
S344 | 11 ; 10 ; 1 (0) | 0 (0); 0 (0) | 0.09 ; 15 | 6 (2); 129 (75)
S349 | 11 ; 10 ; 1 (0) | 0 (0); 0 (0) | 0.09 ; 15 | 3 (1); 126 (74)
S35932 | 320 ; 320 ; 0 (0) | 0 (0); 0 (0) | 2.01 ; 105 | 0 (0); 331776 (11200)
S382 | 6 ; 6 ; 0 (0) | 0 (0); 0 (0) | 1.71 ; 15 | 34 (6); 96 (18)
S38584.1 | 304 ; 301 ; 1 (0) | 0 (0); 1377 (24) | 1.54 ; 88 | 17925 (458); 105273 (2564)
S386 | 7 ; 7 ; 0 (0) | 0 (0); 0 (0) | 0.07 ; 14 | 0 (1); 42 (43)
S400 | 6 ; 6 ; 0 (0) | 0 (0); 0 (0) | 1.72 ; 15 | 34 (6); 96 (18)
S420.1 | 1 ; 1 ; 0 (0) | 0 (0); 0 (0) | 0.25 ; 15 | 0 (0); 16 (18)
S444 | 6 ; 6 ; 0 (0) | 0 (0); 0 (0) | 1.80 ; 15 | 34 (6); 96 (18)
S4863 | 16 ; 16 ; 0 (0) | 0 (0); 0 (0) | 0.09 ; 15 | 0 (0); 1664 (784)
S499 | 22 ; 22 ; 0 (0) | 0 (0); 0 (0) | 0.09 ; 16 | 0 (0); 484 (22)
S510 | 7 ; 4 ; 0 (0) | 0 (0); 18 (57) | 6.64 ; 25 | 0 (0); 42 (133)
S526N | 6 ; 2 ; 0 (0) | 8 (2); 64 (12) | 10.44 ; 27 | 10 (2); 96 (18)
S5378 | 49 ; 47 ; 1 (1) | 4 (0); 164 (33) | 0.59 ; 26 | 165 (37); 7087 (1456)
S635 | 1 ; 0 ; 0 (0) | 0 (0); 32 (2) | 18.23 ; 15 | 0 (0); 32 (2)
S641 | 24 ; 23 ; 1 (1) | 0 (0); 0 (0) | 0.09 ; 15 | 64 (64); 319 (338)
S6669 | 55 ; 55 ; 0 (0) | 0 (0); 0 (0) | 0.08 ; 15 | 16 (0); 3061 (1466)
S713 | 23 ; 22 ; 1 (1) | 0 (0); 0 (0) | 0.10 ; 15 | 64 (64); 304 (323)
S820 | 19 ; 19 ; 0 (0) | 0 (0); 0 (0) | 0.23 ; 13 | 0 (0); 90 (324)
S832 | 19 ; 19 ; 0 (0) | 0 (0); 0 (0) | 0.25 ; 13 | 0 (0); 90 (324)
S838.1 | 1 ; 1 ; 0 (0) | 0 (0); 0 (0) | 0.29 ; 15 | 0 (0); 32 (34)
S9234.1 | 39 ; 37 ; 2 (0) | 0 (0); 0 (0) | 0.06 ; 16 | 146 (24); 1786 (317)
S938 | 1 ; 1 ; 0 (0) | 0 (0); 0 (0) | 0.33 ; 15 | 0 (0); 32 (34)
S953 | 23 ; 23 ; 0 (0) | 0 (0); 0 (0) | 0.13 ; 15 | 23 (8); 143 (288)
S967 | 23 ; 23 ; 0 (0) | 0 (0); 0 (0) | 0.14 ; 15 | 23 (8); 143 (288)
S991 | 17 ; 17 ; 0 (0) | 0 (0); 0 (0) | 0.08 ; 13 | 64 (564); 67 (629)
Table 8.1: Target enlargement results for ISCAS89 benchmarks
As indicated in Table 8.1, our techniques solve most targets regardless of netlist size – 1575 of 1615 – whether reachable or not. Refer to Table 6.5 for the size of these netlists. Though the "difficulty" of these targets is unknown, this is an indication of the robustness of our approach. For netlists with unsolved targets, we achieve an average reduction per netlist of 5.3% in REGISTER count and 5.0% in FREE vertex count, and a cumulative reduction of 12.2% for REGISTERs and 10.3% for FREE vertices. Our reduction-only run yields an average reduction per netlist of 13.9% in REGISTERs and 13.0% in FREE vertices.

In Table 8.2 we provide a similar analysis for randomly-selected targets from the IBM Gigahertz Processor (GP), after performing phase abstraction (refer to Chapter 10). Most targets – 254 out of 284 – are solved; refer to Table 6.6 for the size of these netlists. We achieve an average reduction per netlist of 12.1% in REGISTERs and 11.1% in FREE vertices. The reduction-only run yields an average reduction per netlist of 54.9% in REGISTERs and 54.8% in FREE vertices, and a cumulative reduction of 70.6% of REGISTERs and 69.5% of FREE vertices.
We now discuss several results in more detail. IIBBQn is a large table-based netlist.
Forward reachability analysis of the redundancy removed [51] cone of a single unreach-
able target with a diameter of three (comprising 442 REGISTERs and 134 FREE vertices)
requires 172.3 seconds and 25 MB with a MLP [84] algorithm, with sift variable reordering
enabled and a random initial order. Ourcompose-basedsearch requires 34.7 seconds and
16 MB for the same BDD conditions. After one step of enlargement, the cone drops to 380
REGISTERs and 132 FREE vertices; the second step solves the target.
Netlist L FLUSHn is primarily acyclic; less than 5% of its REGISTERs are elements
of directed cycles. For one target with 38 REGISTERs and 47 FREE vertices, reachability
analysis of the redundancy-removed [51] target with MLP requires 1.20 seconds and 11
MB. Redundancy removal plus retiming [10] with MLP solves the target in 0.60 seconds
with 13 MB. Compose-based search requires 0.50 seconds and 9 MB. The first two steps
of enlargement of this target reduce it to 4 then 2 REGISTERs, and 3 then 2 FREE vertices,
Design     |T| ; Hit ; Unrch (BDDs) | Eliminated |R| (|I|) ; Sum | Time/|T| (s) ; Memory (MB) | Reduction-Only Eliminated ; Sum
CP RAS      2 ;  2 ; 0 (0) | 0 (0) ; 0 (0) | 0.61 ; 19 | 1 (0) ; 554 (131)
CLB CNTL    2 ;  2 ; 0 (0) | 0 (0) ; 0 (0) | 0.24 ; 15 | 0 (0) ; 84 (12)
CR RAS      1 ;  0 ; 0 (0) | 0 (0) ; 401 (99) | 3.55 ; 24 | 0 (0) ; 401 (99)
D DASA      2 ;  2 ; 0 (0) | 0 (0) ; 0 (0) | 0.24 ; 15 | 11 (17) ; 20 (25)
D DCLA      2 ;  1 ; 1 (1) | 0 (0) ; 0 (0) | 7.65 ; 44 | 273 (67) ; 469 (133)
D DUDD     22 ; 14 ; 8 (6) | 0 (0) ; 0 (0) | 1.15 ; 25 | 491 (353) ; 1009 (725)
I IBBQn    15 ;  8 ; 7 (0) | 0 (0) ; 0 (0) | 0.28 ; 60 | 190 (30) ; 2169 (437)
I IFAR      2 ;  2 ; 0 (0) | 0 (0) ; 0 (0) | 0.31 ; 16 | 8 (0) ; 101 (35)
I IFPF      1 ;  1 ; 0 (0) | 0 (0) ; 0 (0) | 2.72 ; 40 | 745 (152) ; 746 (154)
L3 SNP1     5 ;  4 ; 1 (0) | 0 (0) ; 0 (0) | 1.21 ; 22 | 7 (0) ; 595 (164)
L EMQn      1 ;  0 ; 1 (1) | 0 (0) ; 0 (0) | 11.57 ; 18 | 127 (89) ; 127 (89)
L EXEC      2 ;  2 ; 0 (0) | 0 (0) ; 0 (0) | 0.47 ; 18 | 433 (200) ; 433 (200)
L FLUSHn    7 ;  6 ; 1 (0) | 0 (0) ; 0 (0) | 0.11 ; 12 | 128 (170) ; 165 (222)
L INTRo    30 ; 24 ; 6 (0) | 0 (0) ; 0 (0) | 0.06 ; 12 | 750 (626) ; 830 (672)
L LMQo     16 ;  0 ; 8 (8) | 0 (0) ; 2592 (1512) | 14.01 ; 39 | 2568 (1512) ; 5160 (3024)
L LRU      12 ;  5 ; 7 (7) | 0 (0) ; 0 (0) | 6.27 ; 19 | 721 (192) ; 721 (192)
L PFQo     67 ;  0 ; 67 (66) | 0 (0) ; 0 (0) | 10.99 ; 77 | 10318 (3036) ; 10318 (3036)
L PNTRn    31 ;  0 ; 31 (8) | 0 (0) ; 0 (0) | 2.92 ; 19 | 1057 (1023) ; 1057 (1023)
L PRQn     10 ;  0 ; 8 (2) | 24 (8) ; 36 (12) | 0.30 ; 19 | 42 (16) ; 54 (20)
L SLB       3 ;  1 ; 2 (0) | 0 (0) ; 0 (0) | 0.16 ; 15 | 1 (1) ; 61 (29)
L TBWKn    21 ;  1 ; 3 (3) | 2 (0) ; 291 (238) | 17.07 ; 26 | 36 (28) ; 342 (280)
M CIU       6 ;  1 ; 5 (0) | 0 (0) ; 0 (0) | 0.24 ; 18 | 775 (60) ; 775 (60)
SIDECAR 4   1 ;  0 ; 0 (0) | 1 (0) ; 137 (13) | 18.64 ; 27 | 1 (0) ; 137 (13)
S SCU1      3 ;  2 ; 1 (1) | 0 (0) ; 0 (0) | 0.66 ; 24 | 386 (142) ; 579 (213)
V CACH      1 ;  0 ; 1 (1) | 0 (0) ; 0 (0) | 0.61 ; 16 | 86 (21) ; 86 (21)
V DIR       2 ;  2 ; 0 (0) | 0 (0) ; 0 (0) | 0.20 ; 15 | 33 (16) ; 33 (16)
V SNPM      2 ;  1 ; 1 (0) | 0 (0) ; 0 (0) | 1.27 ; 32 | 905 (266) ; 905 (266)
W GAR       7 ;  6 ; 0 (0) | 4 (0) ; 86 (37) | 2.43 ; 20 | 4 (0) ; 500 (224)
W SFA       8 ;  8 ; 0 (0) | 0 (0) ; 0 (0) | 0.08 ; 15 | 42 (21) ; 112 (56)
Table 8.2: Target enlargement results for GP netlists
respectively. The third step hits the target.
One target of netlist S15850.1 comprises 476 REGISTERs and 55 FREE vertices.
MLP-based analysis is infeasible on this cone, even after redundancy removal [51] plus
retiming [10] which yields 397 REGISTERs. However, the first five steps of structural
enlargement of this target reduce it to 475, 38, 36, 35, and finally 24 REGISTERs, and to
55, 55, 14, 13, and 13 FREE vertices, respectively. MLP-based forward reachability hits
the 5-step-enlarged target in 10 iterations with a combined effort of 2.5 seconds and 23
MB. The only other approach that is able to hit this target is a 15-step BMC which requires
7.3 seconds and 14 MB; if unreachable, BMC would not have been applicable. Traditional
approaches of target enlargement would be ineffective on this netlist since they do not offer
reduction capability, without which the enlarged target remains infeasibly complex.
Chapter 9
C-Slow Abstraction
In this chapter we discuss our generalized c-slow abstraction techniques, extending the re-
sults of collaborative work with Anson Tripp, Adnan Aziz, Vigyan Singhal, and Flemming
Andersen reported in [17]. The goal of this abstraction is to reduce REGISTER count and
diameter; in doing so, we often benefit BDD-based algorithms due to reducing variable
count, which often reduces their size and reordering time. Further, the removal of REG-
ISTERs allows "collapsing" of adjacent logic cones to a single combinational cone, which
increases the domain of applicability of combinational redundancy removal techniques,
thereby helping to enable a smaller netlist graph which benefits arbitrary algorithms. How-
ever, this elimination of REGISTERs also risks explosion of BDD sizes representing these
composite cones; this abstraction thus has the potential to harm BDD-based analysis. Nev-
ertheless, as our experiments demonstrate, this abstraction tends to enhance BDD-based
analysis; refer to Section 9.3 for a more detailed discussion of this topic.
Leiserson and Saxe [68, 66] define a c-slow netlist N as one which is retiming-
equivalent (i.e., may be made structurally equivalent through retiming, ignoring initial
value cones) to another netlist N′, where the number of REGISTERs along each net of
N′ is a multiple of c. Netlist N′ may be viewed as having c equivalence classes of REGIS-
TERs; those in class i may only fan out to those in class (i + 1) mod c. Each equivalence
class of REGISTERs of N′ contains data from an independent stream of execution, and data
from two independent streams may never arrive at any vertex concurrently. Intuitively, it is
this property which allows c-slow abstraction to "fold" such designs to a smaller domain
of a single coloring of REGISTERs – rendering a netlist where each vertex may be a func-
tion of each data source at each time-step. They demonstrate how designs may be made
systolic¹ through slowdown (increasing c) and retiming, and how this process may signifi-
cantly benefit the clock period of such designs through reducing their maximum-length
combinational path.
Definition 9.1. A c-slow netlist, for c > 1, is one whose gates may be c-colored. We denote
the coloring function C : V → {0, …, c − 1}, defined as follows.

1. If the color of REGISTER r is C(r), then the color of each REGISTER v in the com-
binational fanout of r is (C(r) + 1) mod c.

2. If the color of REGISTER r is C(r), then the color of each non-REGISTER v in the
combinational fanout of r is C(r).

3. If the color of REGISTER r is C(r), then the color of each gate v in the combinational
fanin of inlist(r) is (C(r) − 1 + c) mod c.

4. If the color of REGISTER r is C(r), then the color of Z(r) is C(r).

5. The color of each target is c − 1.
The last two rules of Definition 9.1 are additions to the definition of [66] to fit our
verification paradigm. In Definition 9.4 we generalize this definition to ensure that these
conditions, and others, do not preclude the application of c-slow abstraction. For optimality
of reduction, we wish to find the maximum c consistent with this definition; if c = 1,
then the design is not c-slow. This definition lends itself to a simple linear-time coloring
algorithm for determining a maximal c, which will be provided in Figure 9.7.

¹Informally, a systolic netlist is one in which predefined SCC clusters connect to others only through paths of strictly positive sequential weight.
Consider the 3-slow netlist N depicted in Figure 9.1. We label vertices according
to their color: e.g., REGISTERs Ri have color i. Netlist N is defined by the following
expressions: p̄(V_A2, i) = f2(p̄(V_F2, i), p̄(I2, i)); p̄(V_B0, i + 1) = p̄(V_A2, i); p̄(V_C0, i) =
f0(p̄(V_B0, i), p̄(I0, i)); p̄(V_D1, i + 1) = p̄(V_C0, i); p̄(V_E1, i) = f1(p̄(V_D1, i), p̄(I1, i)); and
p̄(V_F2, i + 1) = p̄(V_E1, i). Through unrolling, we obtain the expression p̄(V_F2, i + 3) =
f1(f0(f2(p̄(V_F2, i), p̄(I2, i)), p̄(I0, i + 1)), p̄(I1, i + 2)). This illustrates that V_F2 is a func-
tion with modulo-3 feedback. Similar analysis demonstrates that all nets within the SCC
have this property.
Figure 9.1: Example three-slow netlist N
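The modulo-3 feedback derived above can be observed with a small Python simulation. This is a sketch with arbitrarily chosen Boolean functions standing in for f0, f1, f2 (any functions work), not the dissertation's netlist representation: simulating N for 3t steps and its collapsed recurrence for t steps, the values of V_F2 agree under the modulo-3 time-folding.

```python
import random

# Arbitrary Boolean functions standing in for f0, f1, f2 of Figure 9.1.
f0 = lambda a, b: a ^ b
f1 = lambda a, b: a & b
f2 = lambda a, b: a | b

def run_concrete(steps, inputs, init):
    """Simulate N: registers B0, D1, F2 (colors 0, 1, 2) all update each
    step; returns the trace of register F2."""
    b0, d1, vf2 = init
    trace = []
    for i0, i1, i2 in inputs[:steps]:
        trace.append(vf2)
        a2 = f2(vf2, i2)              # color-2 gate
        c0 = f0(b0, i0)               # color-0 gate
        e1 = f1(d1, i1)               # color-1 gate
        b0, d1, vf2 = a2, c0, e1      # all three registers advance
    return trace

def run_abstract(steps, inputs, init_f2):
    """Simulate the recurrence structure: a single register for F2 with the
    three logic cones collapsed into one combinational cone."""
    vf2 = init_f2
    trace = []
    for i0, i1, i2 in inputs[:steps]:
        trace.append(vf2)
        vf2 = f1(f0(f2(vf2, i2), i0), i1)   # modulo-1 feedback
    return trace

random.seed(1)
T = 8
ins = [tuple(random.randint(0, 1) for _ in range(3)) for _ in range(3 * T)]
concrete = run_concrete(3 * T, ins, (0, 1, 1))
# Time-folding: abstract step j consumes I2 at time 3j, I0 at 3j+1, I1 at 3j+2.
abs_ins = [(ins[3*j + 1][0], ins[3*j + 2][1], ins[3*j][2]) for j in range(T)]
abstract = run_abstract(T, abs_ins, concrete[0])
print(all(concrete[3*j] == abstract[j] for j in range(T)))   # True
```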
Consider netlist N″ depicted in Figure 9.2, and netlist N′ depicted in Figure 9.3,
which collectively comprise our c-slow abstraction of N. We use N′ as an initialization
structure for N″, which is the recurrence structure of the c-slow abstracted netlist.
Netlist N″ depicted in Figure 9.2 represents the recurrence structure of N. We ob-
tain the expressions p̄(V″_A2, i) = f2(p̄(V″_F2, i), p̄(I″_2, i)); p̄(V″_B0, i) = p̄(V″_A2, i); p̄(V″_C0, i) =
f0(p̄(V″_B0, i), p̄(I″_0, i)); p̄(V″_D1, i) = p̄(V″_C0, i); p̄(V″_E1, i) = f1(p̄(V″_D1, i), p̄(I″_1, i)); and fi-
nally p̄(V″_F2, i + 1) = p̄(V″_E1, i). We additionally define p̄(V″_F2, 0) = p̄(V′_F2, 0), the latter
of which is defined in the initialization structure N′ depicted in Figure 9.3. Through un-
rolling, we obtain p̄(V″_F2, i + 1) = f1(f0(f2(p̄(V″_F2, i), p̄(I″_2, i)), p̄(I″_0, i)), p̄(I″_1, i)). This
Figure 9.2: Abstracted three-slow netlist N″: recurrence structure

Figure 9.3: Abstracted three-slow netlist N′: initialization structure
illustrates that V″_F2 is a function with modulo-1 feedback in the abstracted netlist.
The initialization structure of N″ is N′, depicted in Figure 9.3. We obtain the
expressions p̄(V′_A2, i) = f2(p̄(V′_F2, i), p̄(I′_2, i)); p̄(V′_B0, i) = p̄(Z(R0)′, i); p̄(V′_C0, i) =
f0(p̄(V′_B0, i), p̄(I′_0, i)); p̄(V′_D1, i) = ite(p̄(n1, i), p̄(Z(R1)′, i), p̄(V′_C0, i)); p̄(V′_E1, i) = f1(
p̄(V′_D1, i), p̄(I′_1, i)); and finally p̄(V′_F2, i) = ite(p̄(n2, i), p̄(Z(R2)′, i), p̄(V′_E1, i)). Term
Z(Rj)′ represents a copy of the initial value cone Z(Rj) from N.
Definition 9.2. A c-slow abstraction is a structural transformation of a c-slow netlist N as
follows. We first preprocess the targets of the netlist to ensure that they are color c − 1.
For each target t ∈ T, if C(t) = c − 1 then no action is necessary. Otherwise, we create a
sequence of c − 1 − C(t) REGISTERs with initial values of ZERO connected in series, the
first being sourced by t, and re-label the last REGISTER in the sequence as the target.
We next create two correspondents for each vertex in N: one for an initialization
structure N′, and one for the sequential recurrence structure N″. We create N′ as follows.

– We replace each non-REGISTER gate v by an identical gate v′ in N′.

– We replace each REGISTER v of color C(v) = 0 by a 1-input AND gate sourced by
the correspondent of Z(v).

– We replace each REGISTER v of color C(v) ≠ 0 by a multiplexor v′ in N′. The
"then" input is driven by the correspondent of Z(v). The "else" input is driven by
the correspondent of inlist(v). The selector is driven by n_{C(v)}, defined as follows.

  – We create i = ⌈log2(c)⌉ new FREE vertices m_0, …, m_{i−1}.
  – n_j = (unsigned(m_0, …, m_{i−1}) ≡ j) for each 0 < j < c.
  – n_0 = ⋀_{j=1}^{c−1} ¬n_j.

We create N″ as follows. Each t″ corresponding to t ∈ T will be an abstracted
target.

– We replace each non-REGISTER gate v by an identical gate v″ in N″.

– We replace each REGISTER v of color C(v) ≠ c − 1 by a 1-input AND gate v″.

– We replace each REGISTER v of color C(v) = c − 1 by another REGISTER v″. The
initial value of v″ is v′ from N′.

Intuitively, the nondeterministic values n_i allow us to initialize the REGISTERs of
N″ with the set of all valuations that the corresponding REGISTERs of N could take at
times 0, …, c − 1 by selecting an initial value of a specific color. Thereafter, straightforward
reachability analysis will ensure correspondence of N and Ñ = N′ ∥ N″. This allows the
subsequent verification flow to select any of these initial values that may be necessary to hit
a target; unreachability may be assessed only when it is determined that no color of initial
values may hit the target. Note that we only explicitly use values n_j for 0 < j < c to signify
the selection of initial values of color j; n_0 signifies that none of these colors have been
selected, hence implicitly selects the initial value of color 0. To simplify the subsequent
verification task, we normalize all targets to be color c − 1. For optimality, we prefer a
coloring from the set of possible colorings implied by Definition 9.1 which assigns c − 1
to as many targets as possible. This normalization entails only a small overhead, since
the added pipeline REGISTERs are sinkless, hence may be trivially removed by retiming or
structural target enlargement.
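The selector signals n_j of Definition 9.2 can be sketched in Python as follows. This illustrates only the encoding; treating m_0 as the most-significant bit is an assumption, not something the dissertation fixes.

```python
from math import ceil, log2
from itertools import product

def selector_signals(c, m):
    """Build n_0, ..., n_{c-1} from the FREE selector bits m:
    n_j = (unsigned(m) == j) for 0 < j < c, and n_0 is the conjunction of
    the negations of the others, so it also absorbs out-of-range encodings
    and exactly one n_j is asserted."""
    val = 0
    for bit in m:                      # m[0] taken as most-significant bit
        val = (val << 1) | bit
    n = [1 if val == j else 0 for j in range(c)]
    n[0] = 1 if not any(n[1:]) else 0  # n_0 = AND of negated n_1..n_{c-1}
    return n

c = 3
bits = ceil(log2(c))                   # 2 FREE vertices for c = 3
for m in product((0, 1), repeat=bits):
    print(m, selector_signals(c, m))
```

Note that the unused encoding (1, 1) also asserts n_0, matching the observation that n_0 implicitly selects the color-0 initial value.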
Generalizing our analysis from 3-slow to arbitrary c-slow netlists, we obtain the
following expression for the color-(c − 1) vertices V_{c−1}. We define the numerical sequence
a_0, …, a_{c−1} as a_i = c − 1 − i.

p̄(V_{c−1}, j) =
    f_{c−1}(p̄(Z(R_{c−1}), 0), p̄(I_{c−1}, 0))                                 : j = 0
    f_{a_0}(… (f_{a_j}(p̄(Z(R_{a_j}), 0), p̄(I_{a_j}, 0)), …), p̄(I_{a_0}, j))   : 0 < j < c      (9.1)

In formula (9.1), the first sequence represents a nesting of f_{a_0}(f_{a_1}(… (f_{a_j}. We
use this same sequencing in (9.3). The second sequence represents the closing of these
functions with the corresponding FREE vertices p̄(I_{a_j}, 0)), …), p̄(I_{a_1}, j − 1)), p̄(I_{a_0}, j)).
In (9.3), the FREE vertex ordering is identical, but the temporal arguments are all 0.

p̄(V_{c−1}, i + c) = f_{a_0}(… (f_{a_{c−1}}(f_{c−1}(p̄(R_{c−1}, i), p̄(I_{c−1}, i)),
    p̄(I_{a_{c−1}}, i + 1)), …), p̄(I_{a_0}, i + c))      (9.2)

In formula (9.2), the first sequence represents f_{a_0}(f_{a_1}(… (f_{a_{c−1}}. We use this
same sequencing in (9.4). The second sequence represents the closing of these functions
with the corresponding FREE vertices p̄(I_{a_{c−1}}, i + 1)), …), p̄(I_{a_1}, i + c − 1)), p̄(I_{a_0}, i + c)).
In (9.4), the vertex ordering is identical, but the temporal arguments are i, i, …, i, i + 1.

After c-slow abstraction, we obtain the following.

p̄(V″_{c−1}, 0) = (n_{c−1} ∧ f_{c−1}(p̄(Z(R_{c−1})′, 0), p̄(I″_{c−1}, 0)))
    ∨ ⋁_{j=1}^{c−1} (n_{a_j} ∧ f_{a_0}(… (f_{a_j}(p̄(Z(R_{a_j})′, 0), p̄(I″_{a_j}, 0)), …), p̄(I″_{a_0}, 0)))      (9.3)

p̄(V″_{c−1}, i + 1) = f_{a_0}(… (f_{a_{c−1}}(f_{c−1}(p̄(R″_{c−1}, i), p̄(I″_{c−1}, i)),
    p̄(I″_{a_{c−1}}, i)), …), p̄(I″_{a_0}, i + 1))      (9.4)
The key observations that follow from formulas (9.1)–(9.4) are the following.

– As demonstrated by (9.2), valuations to V_{c−1} of the c-slow netlist at time i + c and
greater are a function of R_{c−1} only from the c-th predecessor time-step, and of each
FREE vertex at most once during the previous c time-steps.

– As demonstrated by (9.4), valuations to V″_{c−1} of the abstracted netlist at time i + 1 and
greater are a function of R″_{c−1} only from the previous time-step, and of each FREE
vertex from the previous time-step.

– Since exactly one of the n_j terms is asserted at any time, (9.1) and (9.3) demonstrate
that the initial states of the abstracted netlist correspond to all reachable valuations to
V_{c−1} at times 0, …, c − 1.
These observations demonstrate that valuations to V_{c−1} and V″_{c−1} directly correspond
with a time-folding modulo-c. Based upon this analysis, we introduce our notion of c-slow
bisimilarity.

Definition 9.3. A c-slow bisimulation relation² with respect to bisimilar vertex sets A and
A″, where C(A) = c − 1, holds between c-slow netlist N and its abstraction Ñ = N′ ∥
N″, respectively, iff there exists a bijective mapping ϕ : A → A″ which satisfies the
following conditions.

1. ∀p ∈ P. ∀i < c. ∃p̃ ∈ P̃. ∀j ∈ ℕ. ∀a ∈ A. p(a, i + j·c) = p̃(ϕ(a), j)
2. ∀p̃ ∈ P̃. ∃p ∈ P. ∃i < c. ∀j ∈ ℕ. ∀a ∈ A. p(a, i + j·c) = p̃(ϕ(a), j)

²Though in this thesis we only require trace equivalence for invariant checking, as we demonstrate in [17] this abstraction preserves a type of bisimilarity.

We restrict c-slow bisimilarity to color-(c − 1) vertices so that we may directly use
the results of (9.1)–(9.4). Without this restriction, additional effort is necessary to map Ñ
to and from N with respect to initial states. Practically, since we may "rotate" coloring,
and since we "pad" targets to make them color c − 1, this restriction is not limiting for
invariant checking.
Lemma 9.1. If Ñ = N′ ∥ N″ is a c-slow abstraction of N, then N is c-slow bisimilar to
Ñ with respect to any corresponding vertex sets A and A″ such that C(A) = c − 1.

Proof. From (9.1) and (9.3), for any valuation to A ⊆ V_{c−1} reachable at time i < c in
trace p, there exists an equivalent valuation of A″ ⊆ V″_{c−1} at time 0 in trace p̃ which has
n_{c−1−i} = 1.

The correspondence of c transitions of N to single transitions of Ñ follows from
(9.2) and (9.4). In particular, for any p and i < c there exists a p̃ with p̃(n_{c−1−i}, 0) = 1 such
that valuations to V_{c−1} at times i + j·c and i + k·c in p are equivalent to valuations to V″_{c−1} at
times j and k in p̃ for all j and k. Similarly, for every p̃ there exists an i : p̃(n_{c−1−i}, 0) = 1
and a p such that valuations to V_{c−1} at times i + j·c and i + k·c in p are equivalent to
valuations to V″_{c−1} at times j and k in p̃.

We now discuss generalizations of netlist topologies suitable for c-slow abstraction,
using a type of logic replication. We note that Definition 9.1 is overly restrictive from the
viewpoint that many REGISTERs are likely to have ZERO as their initial values. However,
if two REGISTERs of different colors have ZERO as their initial values, then the netlist is
not c-slow by this definition. We clearly could replicate ZERO for each color a, and use
the properly-colored copy ZERO_a for each corresponding initial value to create a c-slow
netlist without altering semantics. Due to this example, and by the observation that no
vertex in the netlist may ever be a function of more than one color of initial value anyway,
we may wish to globally relax Rule 4. However, this relaxation would be unsound with
our abstraction. For example, assume that the initial value of a color-0 REGISTER is FREE
vertex v, which also fans out to function f1 of color 1. If we perform c-slow abstraction on
this netlist, due to its temporal folding we could miss a certain hit of a target which requires
v as the initial value of the REGISTER to be 1 at time 0, and also requires v as the source
of f1 to be 0 at time 1.
We therefore introduce a preprocessing transformation of the netlist which allevi-
ates these restrictions, and enables a generalization of the c-slow topology for verification
purposes. As we discussed in [17], we may readily relax Rule 3 of Definition 9.1 and
yield a bisimilar model by replicating multi-colored combinational cones. This relaxation
allows combinational cones of logic to fan out to multiple colors of REGISTERs. We now
formalize this concept, and extend it to handle multi-colored sequential cones.

Definition 9.4. A generalized c-slow netlist is one in which each directed cycle has sequen-
tial weight which is a non-zero multiple of c > 1. Given a generalized c-slow netlist N, we
may attribute a hypercoloring C̄ : V → 2^{0,…,c−1}, defined as follows.

1. If REGISTER r has color a ∈ C̄(r), then each gate v in the combinational fanin of
inlist(r) has color (a − 1 + c) mod c ∈ C̄(v).

2. If non-REGISTER v has color a ∈ C̄(v), then each gate u in the combinational fanin
of v has color a ∈ C̄(u).

3. If REGISTER r has color a ∈ C̄(r), then the initial value of r has color a ∈ C̄(Z(r)).

For optimality, we wish to minimally hypercolor a generalized c-slow netlist; an
efficient coloring algorithm is presented in Figure 9.7. We may obtain a c-slow netlist N̄
from a generalized c-slow netlist N using the algorithm of Figure 9.4. This preprocessing
transformation is itself sound and complete for a generalized c-slow netlist; however, since
it often increases vertex count of each type, it is not useful as a standalone transformation.
Therefore, we introduce this transformation only as a preprocessing step to enable generalized
c-slow abstraction, which will offset its potential increase in REGISTER count. Note that
we develop a vertex mapping V in addition to coloring C through this preprocessing.
Lemma 9.2. If N̄ was obtained via a preprocessing of generalized c-slow netlist N, then
N̄ is legal and is a c-slow netlist.

Proof. We first prove that N̄ is legal. Note that N is legal by assumption. We consider the
requirements for legality enumerated in Definition 3.24.

1. The preprocessing step only generates legal gates. By the hypercoloring of Defini-
tion 9.4, the indegree of each gate u_a generated by the preprocessing is identical to
that of u such that V(u_a) = u. Padding REGISTERs are also legal by construction.

2. The number of gates of N̄ is at most c · (|V| + 2 · |T|), and we may ensure that c is
finite for any legal netlist, hence N̄ is finite.

3. The only initial values generated by preprocessing are either ZERO, which is com-
binational, or are replications of initial values of N, hence are combinational by as-
sumption.

4. Any directed cycle A of a generalized c-slow netlist has sequential weight of i · c
for some i > 0 by assumption. By the hypercoloring rules of Definition 9.4, every
correspondent of cycle A in the preprocessed netlist will have the same sequential
weight as the original A.
⟨Netlist, C, V⟩ Preprocess_C_Slow(Netlist N, Hypercoloring C̄) {
    C = V = V̄ = Ē = Ḡ = Z̄ = T̄ = ∅;
    foreach u ∈ V {
        foreach a ∈ C̄(u) {
            add new vertex u_a to V̄;
            Ḡ(u_a) = G(u); V(u_a) = u; C(u_a) = a;
        }
    }
    foreach u ∈ V {
        foreach v ∈ inlist(u) {
            foreach a ∈ C̄(u) {
                b = color of v consistent by Definition 9.1 w.r.t. u having color a;
                add edge (v_b, u_a) to Ē;
            }
        }
    }
    foreach u ∈ R {
        foreach a ∈ C̄(u) {
            Z̄(u_a) = {v_a : (V(v_a) ≡ Z(u)) ∧ (C(v_a) ≡ a)};
        }
    }
    foreach t ∈ T {
        u = t̄ = {max-colored v : V(v) ≡ t};
        for (a = C(t̄) + 1; a ≤ c − 1; a++) {
            create REGISTER v; C(v) = a; Z̄(v) = ZERO_a;
            add edge (u, v) to Ē;
            u = v;
        }
        T̄ = T̄ ∪ {u};
    }
    N̄ = ⟨⟨V̄, Ē⟩, Ḡ, Z̄, T̄⟩;
    return ⟨N̄, C, V⟩;
}

Figure 9.4: Algorithm for preprocessing generalized c-slow netlists
We next prove that N̄ is c-slow. Every edge added to N̄ maintains the satisfaction
of the coloring rules of Definition 9.1 by construction. Additionally, we guarantee that the
colors of initial values of REGISTERs are equivalent to those of the corresponding REGISTERs,
and that our set of targets are of color c − 1.
Lemma 9.3. If N̄ was obtained via a preprocessing of generalized c-slow netlist N, then
vertex set Ā of N̄ is trace-equivalent to any corresponding vertex set A of N such that
∀a, a′ ∈ Ā. C(a) = C(a′).

Proof. The intuition of this proof is that each "clone" v_j of a given vertex v ∈ A for a given
color j is trace-equivalent to v. Because every directed cycle of N̄ has modulo-c REGISTERs,
no vertex in N̄ can "observe" the replication inherent in preprocessing.

We first assume that no REGISTERs are replicated. Take any color-j set Ā derived
via preprocessing from A. Because every directed cycle has modulo-c REGISTERs, it fol-
lows that valuations to V_j at time i + c are exclusively functions of R_j at time i, and FREE
vertices at time-steps during i, …, i + c, similarly to (9.2). Unlike the c-slow case, for gen-
eralized c-slow netlists, a given multi-colored FREE vertex v may appear in this expression
during multiple time-steps i, …, i + c − 1. However, since v may take valuations at each
time-step independently of other time-steps, clearly all possible transitions of valuations
to V_j from time-step i to i + c are trace-equivalent to those possible if we replicate the
multi-colored FREE vertices into one copy per color, and replace each color-a occurrence
of v with v_a, the "properly-colored" copy of v. Therefore, we conclude that sequences of c
transitions out of any state are trace-equivalent with respect to A and Ā. Also, similarly
to (9.1), it follows that valuations to V_j at time i < c are a function only of initial values of
REGISTERs R_{j−i} and FREE vertices. Using the above analysis, any cross-dependencies or
lack thereof between initial value cones and the fanin cone of A are preserved in Ā, because
the initial value of a REGISTER must have the same color as that REGISTER. We there-
fore conclude that preprocessing preserves trace-equivalence, provided that no REGISTER
was replicated. This result is similar to Lemma 7.1 for cut-based abstraction; c-slow pre-
processing effectively replaces semantic cuts of formulas (9.1)–(9.2) with trace-equivalent
cones. In the c-slow case, these semantic cuts are on a per-color basis.
We next eliminate the restriction that no REGISTER is replicated in preprocessing.
Because each clone of a REGISTER is of a distinct color, and has an identically-colored
initial value, we note that the initial value of each r̄ ∈ R̄ corresponds to r ∈ R : V(r̄) = r.
Therefore, we may conclude that time-0 valuations to Ā, determined uniquely by initial
values of R̄ and FREE vertices of color C(Ā), are trace-equivalent to those of A, since these
valuations are a function of a single color. Valuations to the color-j vertices at time i define
the state of the color-((j + 1) mod c) REGISTERs at time i + 1. We therefore conclude by
a simple inductive argument that c-slow preprocessing preserves trace-equivalence for all
color-j vertices for any j. This lemma follows by assigning j = C(Ā).

Lemma 9.3 implies that, regardless of how many copies we make of a target t, we
need only verify one – the one which will be preprocessed to have color c − 1. We introduce
our c-slow trace lifting algorithm in Figure 9.5, which is the last necessary component to
demonstrate soundness and completeness of c-slow abstraction.
Theorem 9.1. C-slow abstraction is sound and complete for invariant checking.
Proof. A target unreachable result will be generated only if the abstracted target t̃ is proven
to be unreachable by a child verification flow. We first note that padding REGISTERs with
initial values of ZERO onto a target does not affect its unreachability. Second, we note
that preprocessing preserves trace-equivalence with respect to any color of vertices and
hence invariant checking, as per Lemma 9.3. Third, c-slow abstraction guarantees that
every valuation to a color-(c − 1) vertex is preserved through c-slow abstraction due to
c-slow bisimilarity as per Lemma 9.1. Therefore, unreachable results are correct.

A target hit result, accompanied by a trace demonstrating a hit of the target, will
be generated only when an abstracted target t̃ is hit by a child verification flow. By as-
sumption, the corresponding trace p′ is semantically correct and hits t̃. We first note that
Partial Trace Lift_Trace(Partial Trace p′) {
    complete p′ up to its length with Simulate;
    n = {i : p′(n_i, 0) ≡ 1};
    i = c − 1 − n;
    foreach v ∈ V {
        if (n ≤ C(v) < c − 1) {
            p = p ∪ ⟨(v, C(v) − n), p′(v′, 0)⟩;
        }
    }
    for (j = 0; j < length(p′); j++) {
        foreach v ∈ V {
            k = (C(v) + 1) mod c;
            p = p ∪ ⟨(v, c·j + i + k), p′(v″, j)⟩;
        }
    }
    return p;
}

Figure 9.5: C-slow trace lifting algorithm
any trace hitting a REGISTER-padded target must previously hit the un-padded target. By
Lemma 9.3, preprocessing preserves trace-equivalence. By Lemma 9.1, t̃ will be hittable
iff t is hittable. Thus our obligation is only to demonstrate that trace lifting yields a seman-
tically correct trace.

The lifting of values from N′ for the first i time-steps directly reflects (9.1) and (9.3),
hence is consistent with N. From (9.2) and (9.4), the effect of c-slow abstraction is to fold
time modulo-c, hence our lifting of values from N″ multiplies time by c. The addition of i
accounts for the temporal folding of initial states from (9.1) and (9.3), and correlates to the
bisimilarity offset of Definition 9.3. The addition of k is necessary for the generalization
of (9.2) and (9.4) to arbitrary-colored vertices.

Note that a vertex which was not replicated will only have valuations in p at most
once per c consecutive time-steps. For a vertex v that was replicated for k colors, we will
attain k valuations to v per c consecutive time-steps. The correctness of the trace lifting
in the presence of replications follows from the trace-equivalence demonstrated in
Lemma 9.3. Therefore, we conclude that our lifted trace is semantically correct.
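The index arithmetic of this lifting can be made concrete with a small sketch. The helper names are hypothetical, and the sketch reproduces only the time mapping of the lifting algorithm of Figure 9.5, not the copying of values.

```python
def lift_time_recurrence(j, color, n, c):
    """Concrete time-step for the value of a color-`color` vertex at
    abstract time-step j, when initial-value color n was selected:
    i = c - 1 - n is the bisimilarity offset and k = (color + 1) mod c
    the per-color correction."""
    i = c - 1 - n
    k = (color + 1) % c
    return c * j + i + k

def lift_time_initialization(color, n, c):
    """Concrete time-step for a value taken from the initialization
    structure N', defined for vertices with n <= color < c - 1."""
    assert n <= color < c - 1
    return color - n

# For c = 3 with color n = 2 selected (offset i = 0), a color-2 vertex at
# abstract step j lands at concrete time 3j.
print([lift_time_recurrence(j, 2, 2, 3) for j in range(4)])   # [0, 3, 6, 9]
```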
Theorem 9.2. A c-slow abstracted netlist is a legal netlist.

Proof. By Lemma 9.2, preprocessing yields a legal netlist. We therefore need only prove
that the c-slow abstraction procedure yields a legal netlist. We consider the requirements
for legality enumerated in Definition 3.24.

1. The only gates generated by c-slow abstraction other than cloned vertices are one-
or-more input AND gates, one-input INVERTERs, FREE vertices, and multiplexor
structures, all of which are legal by construction.

2. For both N′ and N″, each gate of N is either cloned, or translated into a buffer or
multiplexor. The n_i logic requires at most 2c AND, FREE, and INVERTER gates.
We may guarantee that c is finite, hence the abstracted netlist will be finite.

3. The only REGISTERs created in the abstracted netlist have initial values defined by
N′, which is a purely combinational structure. Hence all initial values will be com-
binational.

4. Every directed cycle initially contains at least one REGISTER of each color. N′
contains no cycles, since color-0 REGISTERs have their initial values inlined, which
are combinational and acyclic by assumption. Every cycle in N″ will contain at least
one preserved REGISTER of color c − 1. Since N′ and N″ are composed in an acyclic
fashion, Ñ has no combinational cycles.
Theorem 9.3. If the diameter of a set of vertices U″ of c-slow abstracted netlist Ñ = N′ ∥
N″ is d(U″), then the diameter of the corresponding set of vertices U of the original netlist
N, provided that ∀u ∈ U. C(u) = c − 1, is at most c · d(U″).

Proof. By Definition 4.2, if the diameter of the c-slow abstracted vertex set U″ is d(U″),
then the longest required duration to witness a particular valuation to U″ is d(U″) time-
steps. From Lemmas 9.1 and 9.3, we know that c-slow abstraction folds time modulo-c.
void C_Slow_Abstract(Netlist N)

1. Color netlist N using algorithm Color_Generalized_C_Slow to determine c.
   – If c = 1, no abstraction is possible.
   – If c = ∞, the netlist is acyclic; skip the abstraction or assign c to be the maxi-
     mum color of any vertex.

2. Preprocess N using algorithm Preprocess_C_Slow to yield c-slow netlist N̄.

3. Perform c-slow abstraction upon N̄ using the algorithm provided in Definition 9.2.

Figure 9.6: C_Slow_Abstract algorithm
Therefore, any transition of states in Ñ correlates to c transitions in N, and the correspond-
ing valuation to U will occur within c · d(U″) time-steps. Note that the first c − 1 time-steps
account for any necessary delay in N to produce a corresponding initial value of Ñ.
9.1 C-Slow Abstraction Algorithms
In this section we provide our algorithms for c-slow abstraction. Several core algorithms
were introduced in Figures 9.4 and 9.5 for preprocessing generalized c-slow netlists and
for trace lifting, respectively. Our top-level C_Slow_Abstract function of Figure 9.6 calls
our coloring algorithm depicted in Figure 9.7. Performing a cone-of-influence reduction,
and redundancy removal, prior to c-slow abstraction is beneficial to prevent unnecessary
logic from reducing c. GCD refers to a greatest-common divisor. We note that if we
obtain c = 1, then all vertices will have color 0, hence c-slow abstraction is not useful.
Because a legal netlist is finite, c = ∞ uniquely identifies an acyclic netlist; in such cases,
we assign c to be the maximum color of any vertex. However, use of BMC with a tight
diameter overapproximation resulting from our technique from Chapter 4 is often a superior
verification strategy in such cases.
Hypercoloring Color_Generalized_C_Slow(Netlist N) {
    C = ∅; c = ∞; visited(V) = ⊥;
    foreach v ∈ V {
        if (C(v) ≡ ∅) {
            Color(v, |V|, c, C);
        }
    }
    subtract min{a : ∃v ∈ V. a ∈ C(v)} from each value in C;
    return C;
}

void Color(Vertex v, ℕ color, ℕ c, Hypercoloring C) {
    color = color mod c;
    if (color ∈ C(v)) { return; }
    if (visited(v) ≡ ⊤) {
        a = max{a ∈ C(v)}; a′ = max{a, color} − min{a, color};
        if (a′ mod c ≠ 0) { c = GCD(c, a′); Normalize(C, c); }
        return;
    }
    C(v) = C(v) ∪ {color};
    visited(v) = ⊤;
    new_color = (c + color − (G(v) ≡ REGISTER)) mod c;
    foreach u ∈ inlist(v) {
        Color(u, new_color, c, C);
    }
    if (G(v) ≡ REGISTER) {
        Color(Z(v), color, c, C);
    }
    visited(v) = ⊥;
}

void Normalize(Hypercoloring C, ℕ new_c) {
    foreach v ∈ V {
        foreach a ∈ C(v) {
            C(v) = (C(v) ∖ {a}) ∪ {a mod new_c};
        }
    }
}

Figure 9.7: Algorithms for coloring generalized c-slow netlists
The running time of our c-slow algorithms is O(c · (|E| + |V| + |T|)). This follows
from noting that in the worst case, each vertex will obtain every color during algorithm
Color, hence preprocessing will need to replicate every vertex c times, and pad every target
with at most c REGISTERs. This bound assumes that the number of calls to Normalize
will be a constant factor of c; in a pathological and rare case, the number of calls may
become logarithmic in |R|. This function is actually unnecessary except as a final step
of Color_Generalized_C_Slow; its processing may be emulated during reads and writes
of C, though this clutters the exposition of the algorithms. The abstraction process itself
performs a linear sweep over the preprocessed netlist, replicating each vertex twice.
9.2 Related Work
The use of slowdown (increasing c) as a design optimization technique was first proposed
by Leiserson and Saxe [68, 66]. They demonstrate that slowdown coupled with retiming is
capable of yielding significant reductions in the clock period of a design through decreasing
its longest combinational path. They also provide algorithms to increase and decrease c as
a design technique.
The topic of retiming (refer to Chapter 6) is a related, yet orthogonal, structural
transformation. Retiming itself is insufficient to achieve the results of c-slow abstraction –
for example, retiming cannot alter the weight of a directed cycle. However, retiming is
a complementary technique which yields different types of reductions – for example, the
REGISTER placement after c-slow abstraction will match that of the color-(c−1) REGISTERs
before the abstraction, whereas retiming may move REGISTERs to fairly arbitrary positions.
Phase abstraction (refer to Chapter 10) is a topologically related though fundamentally
different state-folding approach. Phase abstraction applies only to LEVEL-SENSITIVE
LATCH-based netlists, whereas c-slow abstraction is applicable to REGISTER-based netlists.
It is possible that repeated c-slow abstractions may be useful, interleaved with other transformations
that render increasing simplifications; phase abstraction is useful at most once
for a verification run. Semantically, in multi-phase designs, only one class of latches updates
at each time-step, hence the LATCHes stutter. C-slow designs generally do not stutter
whatsoever. Furthermore, the initial values of all but one class of LATCHes will be overwritten
before propagation. In contrast, all initial values of a c-slow design may propagate.
Overall, these two techniques are complementary.
For acyclic netlists, we have demonstrated in [17] that a simple modification of c-slow
abstraction may be used to yield a purely combinational netlist. This is a similar
result as using our diameter bounding algorithms presented in Chapter 4 to render a combinational
netlist through unfolding. However, the diameter overapproximation algorithms
are superior to c-slow abstraction in such cases, since they enable tight bounds and obviate
the need for creating the ni logic.
9.3 Experimental Results
Our experimental results were obtained with the model checker RuleBase [103]. All results
were obtained on an IBM RS/6000 Workstation Model 595 with 2 GB of main memory.
We arbitrarily selected ten components of IBM's Gigahertz Processor which had previously
been model checked. Our algorithms identified two of these as being c-slow. The first is an
acyclic pipeline with an instruction qualifier input that combinationally fans out to multiple
stages; the second is an intricate cyclic five-slow pipeline with an asynchronous interrupt
to every stage. These multi-colored inputs prevent the classification of these designs as
c-slow by Definition 9.1, but our generalized Definition 9.4 enables a c-slow classification.
Both of these examples were explicitly entered in HDL as c-slow designs – this topology
is not the by-product of a synthesis optimization. Both of these components had been undergoing
verification for more than 12 months prior to the development of this abstraction
technique. Consequently, the unabstracted variants had very good BDD orders available.
RuleBase was run with phase abstraction [16] and its redundancy removal reductions enabled,
and with dynamic BDD reordering enabled using the technique of Rudell [104].
Prior to performing c-slow abstraction, we performed a structural transformation to eliminate
scan-chain connections between the REGISTERs (which unnecessarily limited c), and
to cut self-feedback loops on constant REGISTERs.
We first deployed this abstraction technique on the acyclic pipeline for the most
complex property against which the design had been verified. The unabstracted version
had 148 variables, and with our best initial order required 409.6 seconds with a maximum
of 1410244 allocated BDD nodes to complete this property. The first run on the abstracted
variant with a random initial order (though pairing present-state and next-state variables
for each state element) had 53 variables, and required 44.9 seconds with a maximum of
201224 BDD nodes. While this speedup is significant, this comparison is skewed since the
unabstracted run benefited from the extensive prior BDD reordering. Re-running the unabstracted
experiment with a random initial order required 3657.9 seconds, with 2113255
BDD nodes. Re-running the abstracted experiment using the order obtained during the first
run required 4.6 seconds with 98396 nodes. Computing the c-slow abstraction required 0.3
seconds. Overall, our abstraction yielded a factor of 81 speedup with random initial orders.
Justifications for comparing relative to random initial orders include the following points.
• At the initial stages of a verification effort, no good orders are available.
• These results capture the difficulty of calculating a reasonable BDD order before and
after abstraction (since reordering times are included in the results), and reflect the time
necessary to obtain a result for a new problem.
We even attained a factor of 9 speedup in the extremely skewed case that the abstracted run
had a random order and the unabstracted run had a very good order.
The next example is the five-slow design. With our best BDD order, model checking
the unabstracted design against one arbitrarily selected formula required 5526.4 seconds,
with 251 variables and 3662500 nodes. With a random initial order, the unabstracted run
required 23692.5 seconds with 7461703 nodes. The first run of the abstracted design with
a random initial order required 381.5 seconds, with 134 variables and 339424 nodes. Re-running
the formula twice more and reusing the calculated BDD orders yielded a run of
181.1 seconds with 293545 nodes. Performing c-slow abstraction required 3.2 seconds.
Due to the potential increase in depth of combinational cones through this abstraction,
there is a potential for a significant blowup of a BDD-based representation of the
resulting netlist. Advanced techniques such as splitting and conjoining [43], or fine-grain
reachability analysis [44], may be used to help combat such blowup. A reasonable BDD
order furthermore seems fairly important when performing this abstraction. With reordering
off, and a random initial order, the results for the acyclic pipeline were akin to those reported
above. However, the five-slow abstracted transition relation was significantly larger
than the unabstracted variant given the random order, thereby resulting in a much slower
execution than on the unabstracted run. With reordering enabled, the results as reported
above were consistently significantly superior.
Chapter 10
Phase Abstraction
In this chapter we discuss the technique of phase abstraction, extending results of collaborative
work with Tamir Heyman, Vigyan Singhal, and Adnan Aziz reported in [16]. Phase
abstraction is an efficient method to translate a netlist comprised of LEVEL-SENSITIVE
LATCHes into one comprised instead of REGISTERs. The LEVEL-SENSITIVE LATCH, or
simply LATCH, was purposefully not introduced as a possible gate type previously in this
thesis; this is the only chapter in which LATCHes will be discussed, and by performing
phase abstraction, we practically circumvent the need to consider LATCHes elsewhere in
this thesis, or in a verification toolset. Definition 10.1 formalizes the LATCH, and serves as
an addendum to Definitions 3.11 and 3.12 for this chapter.

Definition 10.1. A LATCH vertex v has two inputs: clock and data. Term Gv is not referenced
for LATCHes.
• If i > 0, then p(v, i) = ite(p(clock(v), i), p(data(v), i), p(v, i−1)). Otherwise
p(v, 0) = ite(p(clock(v), 0), p(data(v), 0), p(Z(v), 0)).
We refer to the set of LATCHes as L. We define the clock logic of a netlist as D =
fanin_cone(clock(L)). We assume without practical loss of generality that ⟨D, V \ D⟩
constitutes a cut of the netlist whose crossing edges are clock edges.
In common two-phase designs, a "correct" clocking scheme may be visualized as a
global clock vertex which alternates between 0 and 1 at every time-step. A LATCH which
is transparent when the global clock is a 0 will be denoted as a φ0 LATCH (often referred
to as an L1 LATCH); one which is transparent when the global clock is a 1 will be denoted
as φ1 (often referred to as an L2 LATCH). Hardware design rules, arising from timing constraints,
require any structural path between two φ0 LATCHes to comprise a φ1 LATCH,
and vice-versa. An elementary design style requires each φ0 LATCH to fan out directly
to a φ1 LATCH (called a master-slave LATCH pair), and allows only φ1 to drive combinational
logic. However, a common high-performance hardware design technique involves
distributing combinational logic freely between φ0 and φ1 LATCHes to better utilize each
half-period of the clock. Such designs are often explicitly implemented in this manner; this
topology is not the byproduct of a synthesis tool, but instead a necessary design technique
to ensure the highest-performance hardware.
One may readily model a LATCH using implicitly clocked REGISTERs as demonstrated
in Figure 10.1. We use a multiplexor selected by clock; when clock is a 1, and
the LATCH is transparent, we sensitize a combinational flow-through path from data to
dout. Otherwise, we sensitize a path driven by a REGISTER with the same initial value as
the LATCH, which shadows the last-driven value through the LATCH. Note that modeling
LATCHes in this fashion may cause the appearance of combinational cycles, for example,
given a structural directed cycle from a φ0 to a φ1 back to the φ0. In the presence of a
correct clocking scheme, this apparent combinational path is an unsensitizable false path.
However, in case of a clocking flaw, a semantic combinational cycle may truly exist.
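As a sanity check of this modeling, the mux-plus-REGISTER structure of Figure 10.1 can be simulated functionally. A minimal Python sketch follows; the function and signal names are invented for illustration.

```python
def latch_as_register(clock_seq, data_seq, init):
    """Model a level-sensitive LATCH with a multiplexor and a shadowing REGISTER."""
    shadow = init                     # REGISTER initialized to the LATCH's Z(v)
    dout_seq = []
    for clk, d in zip(clock_seq, data_seq):
        dout = d if clk else shadow   # transparent flow-through when clock is 1
        dout_seq.append(dout)
        shadow = dout                 # REGISTER shadows the last-driven value
    return dout_seq
```

This matches Definition 10.1: p(v, i) = ite(p(clock(v), i), p(data(v), i), p(v, i−1)), with p(v, 0) falling back to the initial value Z(v).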
Because LATCH-based netlists tend to contain more sequential elements than functionally
correspondent REGISTER-based netlists, verification algorithms that enumerate
states often require more time and memory in the former case, potentially exponentially
more. We furthermore must model an oscillating clock, which is in the support of all
LATCHes. Additionally, since k image computations are necessary per clock period, the
Figure 10.1: Semantics-preserving translation of LATCHes to REGISTERs
diameter of such a netlist is k times that of a correspondent REGISTER-based netlist. We
therefore propose the technique of phase abstraction to overcome these difficulties. We
perform this abstraction by selectively eliminating LATCHes. In doing so, this technique
often reduces netlist size, thereby enhancing arbitrary transformation and verification algorithms
which may consume superlinear, possibly even exponential, resources. Additionally,
by eliminating state elements, we often benefit BDD-based algorithms due to reducing
variable count, which often reduces their size and reordering time. Further, the removal of
REGISTERs allows "collapsing" of adjacent logic cones to a single combinational cone,
which increases the domain of applicability of combinational redundancy removal techniques.
However, this elimination of state elements also risks explosion of BDD sizes
representing these composite cones; this abstraction thus has the potential to harm BDD-based
analysis. Nevertheless, as our experiments demonstrate, this abstraction tends to
enhance BDD-based analysis. We have not observed one case where phase abstraction hurt
a verification effort to the point where it needed to be disabled during five years of using
this technique; refer to Section 10.3 for a more detailed discussion of this topic.
Definition 10.2. A k-phase netlist N, for k ≥ 2, contains LATCHes but no REGISTERs.
We associate φ, representing a "global" clock, and C : V \ D ↦ {0, …, k−1}, representing
a k-coloring function, with N. Semantically, φ acts as an unconditional mod-k up
counter which initializes to 0, thus p(φ, i) = i mod k for any p ∈ P. We require that
p(clock(v), i) = 1 iff (p(φ, i) = C(v)) for each v ∈ L and each p ∈ P. Therefore, φ indicates
which phase of LATCHes are transparent at the corresponding time-step. We require
that T ∩ D = ∅. Coloring C is defined as follows.
1. If the color of LATCH v is C(v), then the color of each LATCH v′ in the combinational
fanout of v is (C(v) + 1) mod k.
2. If the color of LATCH v is C(v), then the color of each non-LATCH v′ in the combinational
fanout of v is C(v).
3. If the color of LATCH v is C(v), then the color of each non-LATCH v′ in the combinational
fanin of data(v) is (C(v) − 1 + k) mod k.
4. If the color of LATCH v is C(v), and Z(v) is not ZERO or ONE, then C(Z(v)) = C(v).
If the topology of a netlist renders a consistent gate coloring with respect to Definition
10.2 infeasible, the netlist is not k-phase. Note that the coloring of k-phase netlists
resembles that of c-slow netlists described in Definition 9.1. A linear-time algorithm may
be used to color the vertices of N; φ further provides a "seed" for the coloring. It is possible
to generalize k-phase topologies similarly to our generalization of c-slow netlists provided
in Definition 9.4; however, synthesis constraints preclude the need for such. Note that we
require that the target not be an element of the clock logic D, which is necessary for soundness
of the abstraction. Use of an integer rather than a set of binary values for φ is merely
a notational shorthand.
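The linear-time coloring mentioned above can be sketched as a breadth-first propagation of rules 1 and 2 from a seed LATCH. This is a minimal Python illustration only: the `comb_fanout` encoding is invented here, and the full algorithm would also propagate backward through data fanins per rule 3 and onto initial-value vertices per rule 4.

```python
from collections import deque

def k_phase_color(latches, comb_fanout, k, seed):
    """Forward propagation of the coloring rules of Definition 10.2.
    Returns a color map, or None if no consistent k-coloring exists."""
    color = {seed: 0}                  # seed color supplied by the clock logic
    queue = deque([seed])
    while queue:
        v = queue.popleft()
        for u in comb_fanout.get(v, ()):
            # rule 1: a LATCH in the combinational fanout takes (C(v)+1) mod k;
            # rule 2: a non-LATCH inherits C(v)
            want = (color[v] + 1) % k if u in latches else color[v]
            if u in color:
                if color[u] != want:
                    return None        # inconsistent: the netlist is not k-phase
            else:
                color[u] = want
                queue.append(u)
    return color
```

A two-phase ring l0 → g → l1 → h → l0 colors consistently, whereas a direct combinational cycle between two same-phase LATCHes is rejected.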
Definition 10.3. Given k-phase netlists N and N′, let A = {u ∈ V : C(u) = k−1} \ {u ∈
V : combinational_fanin(u) ∩ I ≠ ∅} and A′ = {u′ ∈ V′ : C′(u′) = k−1} \ {u′ ∈
V′ : combinational_fanin(u′) ∩ I′ ≠ ∅}. The composition of k-phase netlists N″ = N ∘k N′
is defined by merging some FREE vertices v of N onto vertices u′ ∈ {I′ ∪ A′} of N′ with
equal color: C(v) = C′(u′), and by merging some FREE vertices v′ of N′ onto vertices
u ∈ {I ∪ A} of N with equal color: C′(v′) = C(u). We require that fanin_cone(Z″(L″))
remain combinational after merging.

Since we may only merge vertices of the same color, the composition of two k-phase
netlists N and N′ yields a k-phase netlist N″ which inherits coloring, hence C″ = C ∪ C′.
Composition furthermore is guaranteed to yield a legal k-phase netlist since we may only
merge a FREE vertex of N onto a FREE vertex of N′, or onto a non-FREE vertex of N′
which has no FREE vertices in its combinational fanin. The optimality of our algorithm
results from representing N as the composition of minimal dependent layers (MDLs) of
LATCHes, and preserving only one phase of LATCHes per MDL.
Definition 10.4. A dependent layer of a k-phase netlist is a nonempty set of φ0, …, φ_{k−1}
LATCHes l0, …, l_{k−1}, such that l_{i+1} is a superset of all LATCHes in the combinational
fanout of l_i, and l_i is a superset of all LATCHes in the combinational fanin of data(l_{i+1}),
for 0 ≤ i < k−1.

Definition 10.5. A dependent layer l is termed minimal if and only if there does not exist
a nonempty set of LATCHes l′ which may be removed from l and still result in a nonempty
dependent layer l \ l′.

Lemma 10.1. There is a unique MDL partition of any k-phase netlist.
Proof. We prove this lemma by contradiction. Let Q0 and Q1 be two non-equivalent MDL
partitions of N. Let q0_i represent the i-th MDL of Q0, and q1_i the i-th MDL of Q1. For
Q0 to be non-unique, there must exist a q1_i which is not an element of Q0. Note that there
cannot exist a q0_i which is a superset of this q1_i, else q0_i is not minimal (or is equivalent
to q1_i); similarly, this q1_i cannot be a superset of any q0_i.

If q1_i is a singleton {l} of color j, there must exist no other LATCHes in the fanout
(unless j = k−1) or fanin (unless j = 0) cone of l, else q1_i is not a dependent layer.
Clearly, the q0_i which contains l is not minimal since we may remove l from that set and
the remaining nonempty set q0_i \ l is still a dependent layer – as is the singleton {l}.

If q1_i has cardinality greater than one, there must exist two LATCHes l and m in q1_i
such that l ∈ q0_i and m ∈ q0_j for i ≠ j. If l is a φ_i for i < k−1, we note that all φ_{i+1}
LATCHes l_{i+1} in the combinational fanout of l must be in q0_i and q1_i, and all φ_i LATCHes
l_i in the combinational fanin of data(l_{i+1}) must also be in q0_i and q1_i. We may iteratively
repeat this analysis for LATCHes in the combinational fanout of l_j (for 0 ≤ j < k−1), and
for LATCHes in the combinational fanin of data(l_j) (for 0 < j ≤ k−1), until all LATCHes
have been encountered and we have reached a fixed point of dependent LATCHes. Note
that m must be one of the LATCHes reached in this fixed point, else q1_i is not minimal.
Furthermore m must also be an element of q0_i else q0_i is not a dependent layer, contradicting
the claim that q0_i ≠ q1_i. □

Figure 10.2: Example netlist with two minimal dependent layers
Consider the example two-phase netlist depicted in Figure 10.2. The two unique
MDLs are marked with dotted boxes. Merely removing all φ0 or all φ1 LATCHes will not
yield an optimum reduction for this netlist; the φ0 LATCHes of layer A, and the φ1 LATCHes
of layer B, must be removed to yield an optimum solution of two LATCHes.
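The MDL partition itself can be computed as the connected components of the latch dependency relation, mirroring the fixed-point construction in the proof of Lemma 10.1. A minimal Python sketch follows; the `latch_fanout` encoding, mapping each LATCH to the LATCHes in its combinational fanout, is invented for illustration.

```python
def mdl_partition(latches, latch_fanout):
    """Partition LATCHes into minimal dependent layers (connected components
    of the symmetric closure of the combinational latch-to-latch relation)."""
    neighbors = {l: set(latch_fanout.get(l, ())) for l in latches}
    for l, outs in latch_fanout.items():
        for m in outs:
            neighbors[m].add(l)            # fanin direction: symmetric closure
    partition, seen = [], set()
    for l in latches:
        if l in seen:
            continue
        comp, stack = set(), [l]
        while stack:                       # depth-first fixed point
            v = stack.pop()
            if v not in comp:
                comp.add(v)
                stack.extend(neighbors[v] - comp)
        seen |= comp
        partition.append(comp)
    return partition
```

On a topology like Figure 10.2, layers A and B come out as two separate components, so each may be abstracted independently.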
Consider the generic two-phase netlist shown in Figure 10.3. The initial values of
the LATCHes are Z(VB0) and Z(VD1). Note that Z(VB0) will not be visible since the φ0
LATCHes are transparent at time 0. Let φ denote the global clock, which initializes to 0, and
alternates between 0 and 1 at every time-step, indicating whether the φ0 or φ1 LATCHes,
respectively, are presently transparent. For all i, we have ṗ(VB0, i) = ite(ṗ(φ, i), ṗ(VB0, i−1), ṗ(VA1, i)).
For i > 0, we have ṗ(VD1, i) = ite(ṗ(φ, i), ṗ(VC0, i), ṗ(VD1, i−1)). For the
combinational gates, for all i we have ṗ(VA1, i) = f1(ṗ(VD1, i), ṗ(I1, i)) and ṗ(VC0, i) = f0(ṗ(VB0, i), ṗ(I0, i)).

Figure 10.3: Two-phase netlist N

Extending our example to an arbitrary k-phase netlist, we denote the combinational
logic sourcing data(φ_{(j+1) mod k}) as f{j}, which has LATCHes φj and FREE vertices Ij of
color j in its combinational fanin. Note that the initial values of the φ0 LATCHes are of
no semantic importance since the φ0 LATCHes are transparent at time 0. The initial values
of only the φ_{k−1} LATCHes propagate to other LATCHes (since all others are transparent
before φ_{k−1}) – though the initial values of LATCHes of color 0 < j < k−1 are of semantic
importance during the first j−1 time-steps. For this reason, unlike c-slow abstraction, we
cannot exploit the simplifying assumption that targets will be of color k−1 through padding.
Finally, we note that the φj LATCHes are transparent only at time-steps k·i + j, and stutter
between. We obtain the following expressions for color-j vertices Vj.

ṗ(Vj, j) =
  f{0}( f{k−1}( ṗ(Z(φ_{k−1}), 0), ṗ(I_{k−1}, 0) ), ṗ(I0, 0) )                          : j = 0
  f{j}( … ( f{0}( f{k−1}( ṗ(Z(φ_{k−1}), 0), ṗ(I_{k−1}, 0) ), ṗ(I0, 1) ), … ), ṗ(Ij, j) ) : j ≠ 0
                                                                                    (10.1)
In (10.1), the first sequence represents a nesting of f{j}(f{j−1}(…(f{1}(f{0}. We
will use this same sequence in (10.4). The second sequence closes the first with the corresponding
FREE vertices ṗ(I0, 1)), ṗ(I1, 2)), …), ṗ(I_{j−1}, j)), ṗ(Ij, j)). In (10.4), the FREE
vertex ordering is identical, but the temporal arguments are all 0.

ṗ(Vj, bj + k) = f{a_j^0}( … ( f{a_j^{k−1}}( f{j}( ṗ(φj, bj), ṗ(Ij, bj + 1) ),
                ṗ(I_{a_j^{k−1}}, bj + 2) ), … ), ṗ(I_{a_j^0}, bj + k) )             (10.2)

We define the sequence a_j^0, …, a_j^{k−1} as a_j^i = (k + j − i) mod k. We define bj =
k·i + j; the φj LATCHes are transparent only during such time-steps. The first sequence of
(10.2) represents a nesting of functions f{a_j^0}(f{a_j^1}(…(f{a_j^{k−2}}(f{a_j^{k−1}}. The second
sequence closes the first with the corresponding vertices ṗ(I_{a_j^{k−1}}, bj + 2)), ṗ(I_{a_j^{k−2}}, bj +
3)), …), ṗ(I_{a_j^1}, bj + k)), ṗ(I_{a_j^0}, bj + k)). In (10.4), the FREE vertex ordering is identical;
the temporal arguments are discussed below.
Letting ĵ = k·i + j′ for any j′ : (j′ ≠ j) ∧ (j′ < k), we obtain the following
valuations for time-steps when φj is not transparent hence stutters.

ṗ(Vj, ĵ) = f{j}( ṗ(φj, ĵ), ṗ(Ij, ĵ) )                                              (10.3)

Similarly to the analysis of c-slow netlists provided in formulas (9.1) and (9.2), formulas
(10.1) and (10.2) indicate that each valuation to a φj LATCH is a deterministic function
of the value of the φj LATCH k time-steps past, and of the FREE vertex valuations at the
appropriate time-steps since. Unlike c-slow designs, only the initial values of φ_{k−1} LATCHes
propagate as per (10.1). Furthermore, each φj only updates once per k time-steps.
Either layer of LATCHes of a two-phase netlist N may be turned into buffers (one-input
AND gates), and the remaining layer transformed to REGISTERs; the resulting abstracted
netlist will be shown to be bisimilar to the original netlist with respect to the
colored vertices V \ D. Figure 10.4 illustrates the first abstraction, which removes the
φ0 LATCHes. Term Z(VD1)′ = ṗ(V′D1, 0) is the cloned initial value of the preserved
sequential elements. For i > 0, we have ṗ(V′D1, i) = ṗ(V′C0, i−1). For the combinational
nets, we have ṗ(V′A1, i) = f1(ṗ(V′D1, i), ṗ(I′1, i)); ṗ(V′B0, i) = ṗ(V′A1, i); and
ṗ(V′C0, i) = f0(ṗ(V′B0, i), ṗ(I′0, i)).
Figure 10.4: Phase-abstracted netlist N′
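The correspondence between the two-phase netlist N of Figure 10.3 and its abstraction N′ of Figure 10.4 can be checked by simulation. Below is a hedged Python sketch (all function and signal names are invented): the color-1 vertices of N′ at abstract time j match those of N at time 2j, the color-0 vertices match at time 2j+1, and each abstract FREE vertex of color c samples the original input at offset (c+1) mod 2 within the clock period.

```python
import random

def simulate_n(f0, f1, z_d1, i0, i1):
    """Two-phase netlist N of Figure 10.3; phi0 LATCHes are transparent at
    even time-steps, phi1 LATCHes at odd time-steps."""
    trace, vb0, vd1 = [], None, None
    for t in range(len(i0)):
        if t % 2 == 0:                     # phi = 0: VD1 holds, VB0 transparent
            vd1 = z_d1 if t == 0 else vd1
            va1 = f1(vd1, i1[t])
            vb0 = va1
        else:                              # phi = 1: VB0 holds, VD1 transparent
            vd1 = f0(vb0, i0[t])           # VD1 = VC0 = f0(VB0, I0)
            va1 = f1(vd1, i1[t])
        trace.append({"VB0": vb0, "VD1": vd1, "VA1": va1})
    return trace

def simulate_n_prime(f0, f1, z_d1, i0p, i1p):
    """Phase-abstracted netlist N' of Figure 10.4 (phi0 LATCHes removed)."""
    trace, vd1 = [], z_d1
    for t in range(len(i0p)):
        va1 = f1(vd1, i1p[t])
        vb0 = va1                          # eliminated LATCH became a buffer
        trace.append({"VB0": vb0, "VD1": vd1, "VA1": va1})
        vd1 = f0(vb0, i0p[t])              # REGISTER update for the next step
    return trace

random.seed(0)
f0 = lambda a, b: a ^ b
f1 = lambda a, b: a & b
steps = 16
i0 = [random.randint(0, 1) for _ in range(2 * steps)]
i1 = [random.randint(0, 1) for _ in range(2 * steps)]
full = simulate_n(f0, f1, 1, i0, i1)
folded = simulate_n_prime(f0, f1, 1,
                          [i0[2 * j + 1] for j in range(steps)],  # color 0, offset 1
                          [i1[2 * j] for j in range(steps)])      # color 1, offset 0
for j in range(steps):
    assert full[2 * j]["VD1"] == folded[j]["VD1"]       # color-1 vertices, offset 0
    assert full[2 * j]["VA1"] == folded[j]["VA1"]
    assert full[2 * j + 1]["VB0"] == folded[j]["VB0"]   # color-0 vertex, offset 1
```

The passing assertions exhibit the modulo-2 time folding: one transition of N′ accounts for two time-steps of N.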
Figure 10.5 shows the second abstraction N″ with layer φ1 removed. We need a
new REGISTER Init″ whose initial value is 1, and which is thereafter 0. This REGISTER ensures
that the initial value clone Z(VD1)″ is applied to net V″D1 in N″ at time 0. The initial
values of the preserved state variables (which have been transformed from LATCHes into
REGISTERs) are transformed to f1(Z(VD1)″, I″1), which is equivalent to ṗ(V″A1, 0). This is
to prevent false hits of a color-0 target at time 0, since φ0 is transparent at time 0 hence its
initial value is of no semantic importance. For i > 0 we have ṗ(V″B0, i) = ṗ(V″A1, i−1).
If ṗ(Init″, i) = 1, then ṗ(V″D1, i) = ṗ(Z(VD1)″, i), else ṗ(V″D1, i) = ṗ(V″C0, i). For the
other combinational nets, we have that ṗ(V″A1, i) = f1(ṗ(V″D1, i), ṗ(I″1, i)) and ṗ(V″C0, i) =
f0(ṗ(V″B0, i), ṗ(I″0, i)).

Either of these abstractions may be applied to each MDL partition of a two-phase
netlist independently of the other MDLs, thus yielding an abstraction which has a globally
minimum number of LATCHes (refer to Theorem 10.4). This minimum would, in general,
be less than that obtained by removing either all φ0 or all φ1 LATCHes. We now define the
phase abstraction process for arbitrary k ≥ 2.
Definition 10.6. A k-phase abstraction is a transformation of a k-phase netlist N to a
REGISTER-based netlist Ñ as follows. The transformation operates upon maximal directed
paths e from a φi to a φj LATCH including each color at most once; i = 0 and j = k−1,
unless there exist no other LATCHes in the fanin cone of φi, and no other LATCHes in the
fanout cone of φj, respectively.

Figure 10.5: Alternate phase-abstracted netlist N″

• Exactly one LATCH l along e is replaced by a REGISTER r if i = 0 and j = k−1;
otherwise, at most one LATCH is replaced by a REGISTER. Letting l be a φi LATCH,
we refer to the transformation as a preserve-φi abstraction.
  – If C(l) = k−1, then use Z(φ_{k−1}) as the initial value of r.
  – If C(l) ≠ k−1, use an unfolding-based approach to calculate ṗ(r, C(l)) as a
    function over Z(φ_{k−1}) and I as per (10.1).
• All LATCHes along e other than l are eliminated.
  – A φi LATCH (i ≠ k−1) which is eliminated is replaced by a buffer.
  – A φ_{k−1} LATCH which is eliminated is replaced by a multiplexor structure
    to preserve its initial value, as demonstrated by Figure 10.5.
Straightforward analysis demonstrates that valuations to the phase-abstracted vertices
of color j, denoted by Ṽj, satisfy the following formulas.

ṗ(Ṽj, 0) =
  f{0}( f{k−1}( ṗ(Z̃(φ_{k−1}), 0), ṗ(Ĩ_{k−1}, 0) ), ṗ(Ĩ0, 0) )                            : j = 0
  f{j}( … ( f{0}( f{k−1}( ṗ(Z̃(φ_{k−1}), 0), ṗ(Ĩ_{k−1}, 0) ), ṗ(Ĩ0, 0) ), … ), ṗ(Ĩj, 0) ) : 0 < j < k−1
  f{k−1}( ṗ(Z̃(φ_{k−1}), 0), ṗ(Ĩ_{k−1}, 0) )                                              : j = k−1
                                                                                      (10.4)

The following formula defines transitions of Ñ. Note that the "artifact" initial values of any
preserved φj for j ≠ k−1, obtained through unfolding, imply that valuations to φ̃j will
stutter from time-step 0 to 1. We therefore require consideration of the type of abstraction
in this formula. Let σj(i) = (k − 1 − j + i) mod k. Term γj(v) is 1 if a φ_{j′} LATCH was
preserved for σj(j′) > σj(C(v)) in any path e containing v (refer to Definition 10.6), else
γj(v) is 0. Term δj(v) = (C(v) ≠ k−1) ∧ ¬γj(v). We also use δj(v) in our k-phase
bisimilarity of Definition 10.7 and our trace lifting algorithm of Figure 10.7 to ignore the
time-0 valuations to vertices v with δj(v) = 1, due to the above-mentioned stuttering. We
again define the numerical sequence a_j^0, …, a_j^{k−1} as a_j^i = (k + j − i) mod k for (10.5).
Let Ṽ′j be a subset of Ṽj such that ∀v, v′ ∈ Ṽ′j : γj(v) = γj(v′), and let i be δj(Ṽ′j) + c for
any c ∈ ℕ.

ṗ(Ṽ′j, i+1) = f{a_j^0}( f{a_j^1}( … ( f{a_j^{k−1}}( f{j}( ṗ(φ̃j, i), ṗ(Ĩj, i) ),        (10.5)
              ṗ(Ĩ_{a_j^{k−1}}, i+1−γj(Ĩ_{a_j^{k−1}})) ), … ), ṗ(Ĩ_{a_j^1}, i+1−γj(Ĩ_{a_j^1})) ), ṗ(Ĩ_{a_j^0}, i+1) )

The first sequence of formula (10.5), like that of (10.2), represents a nesting of functions
f{a_j^0}(f{a_j^1}(…(f{a_j^{k−2}}(f{a_j^{k−1}}. The second sequence closes the first with the corresponding
vertices ṗ(Ĩ_{a_j^{k−1}}, i+1−γj(Ĩ_{a_j^{k−1}}))), ṗ(Ĩ_{a_j^{k−2}}, i+1−γj(Ĩ_{a_j^{k−2}}))), …), ṗ(Ĩ_{a_j^1}, i+
1−γj(Ĩ_{a_j^1}))), ṗ(Ĩ_{a_j^0}, i+1)).

Comparing formulas (10.2) and (10.5), we note that similarly to c-slow abstraction,
the semantic effect of phase abstraction is to fold time modulo k. Furthermore, the time-j
valuations to the color-j vertices as per (10.1) correspond to the time-0 valuations to the
abstracted vertices as per (10.4) and (10.5).
Definition 10.7. A k-phase bisimulation relation1 with respect to bisimilar vertex sets A
and Ã, where ∀a, a′ ∈ A: (C(a) = C(a′)) ∧ (γ_{C(a)}(a) = γ_{C(a′)}(a′)), holds between k-phase
netlist N and its abstraction Ñ, respectively, iff there exists a bijective mapping ξ : A ↦ Ã
which satisfies the following conditions. Let ζ(a) = (C(a) + 1) mod k.

1. ∀p ∈ P. ∃p̃ ∈ P̃. ∀j ∈ ℕ. ∀a ∈ A: p(a, ζ(a) + j·k) = p̃(ξ(a), j + δ_{C(a)}(a))
2. ∀p̃ ∈ P̃. ∃p ∈ P. ∀j ∈ ℕ. ∀a ∈ A: p(a, ζ(a) + j·k) = p̃(ξ(a), j + δ_{C(a)}(a))

Note that k-phase bisimilarity leaves an initial semantic gap due to ζ(a), necessary
because the initial values for φj LATCHes (where 0 < j < k−1) may not be represented
in the abstracted netlist. We will use BMC to patch this hole during invariant checking,
thereby performing a temporal decomposition of the verification task. We define k-phase
bisimilarity only with respect to a vertex set A of a single color, and with identical γ values.
This not only guarantees applicability of the results of formulas (10.1)-(10.5), but is
necessary since certain cross-products of concurrent valuations to multi-colored vertices in
N may become unreachable in Ñ through the temporal folding inherent in phase abstraction.
Furthermore, the stuttering at times 0 and 1 of the preserved lower-or-equal-colored
LATCH causing δ_{C(a)}(a) = 1 requires special consideration – the addition of δ_{C(a)}(a) – in
this bisimilarity.
Lemma 10.2. If Ñ is a k-phase abstraction of N, then N is k-phase bisimilar to Ñ with
respect to any corresponding vertex sets A and Ã such that ∀a, a′ ∈ A: (C(a) = C(a′)) ∧
(γ_{C(a)}(a) = γ_{C(a)}(a′)).

Proof. Formulas (10.1) and (10.4) demonstrate bisimilarity of the time-C(a) valuations of
N to the time-0 valuations of Ñ for C(a) ≠ k−1. Bisimilarity of the time-0 valuations of
color-(k−1) vertices follows from (10.3) and (10.4). Correspondence of k transitions of N to
one transition of Ñ follows from (10.2) and (10.5). Note that incrementing the color of the
vertices to use as the offset ζ(a) within the clock period is necessary as per (10.2). Note
also that for δ_{C(a)}(a) = 1, valuations to ã reflect a stuttering from time 0 to 1 which is not
present in N, correlating to the addition of δ_{C(a)}(a) in Definition 10.7. □

1Though in this thesis we only require trace equivalence for invariant checking, as we demonstrate in [16]
this abstraction preserves a type of bisimilarity.
Lemma 10.3. Let N and Ñ be k-phase bisimilar with respect to vertex sets A and Ã,
and N′ and Ñ′ be k-phase bisimilar with respect to vertex sets A′ and Ã′. A k-phase
composition N″ = N ∘k N′ is k-phase bisimilar to Ñ″ = Ñ ∘k Ñ′ with respect to A ∪ A′
and Ã ∪ Ã′ provided that ∀a, a′ ∈ {A ∪ A′}: (C(a) = C(a′)) ∧ (γ_{C(a)}(a) = γ_{C(a)}(a′)).

Proof. This proof follows immediately from (10.1)-(10.5) and the analysis of Lemma 10.2,
noting that as per Definition 10.3, the rules of composition of k-phase netlists yield another
k-phase netlist. □

Lemmas 10.2 and 10.3 allow us to apply the various k-phase abstractions independently
on each dependent layer, and still render a k-phase bisimilar netlist on the composition
of the abstractions.
Our top-level phase-abstraction algorithm is depicted in Figure 10.6. We first color
the netlist using the algorithm implied by Definition 10.2 and a seed provided from the
clocking logic. We use BMC to determine if any targets of color 1, …, k−2 may be hit by
the initial values of the corresponding LATCHes, since as demonstrated by Definition 10.7,
phase abstraction may not preserve those initial values. Practically, this BMC is rarely necessary
since k is almost always equal to 2 for industrial designs; furthermore, the likelihood
that a target is of color k−1 tends to be rather high regardless of k. Even when necessary,
constant propagations of initial values are likely to trivialize the BMC call, similarly to the
observation that a retiming stump tends to be small due to constant propagations (refer to
Section 6.4). Note that we need only perform a BMC for one time-step – this is because
the initial values of these intermediate-colored vertices do not propagate, and because the
φj LATCHes hold their initial values through time j−1 as per formula (10.3). We next partition
the netlist, and abstract via preserve-φi for a maximal-colored i of smallest LATCH
cardinality for each element of the partition.

void PhaseAbstract(Netlist N)
  1. Color netlist N.
  2. If k < 2, or N cannot be k-colored, or T ∩ D ≠ ∅, no abstraction is possible.
  3. Run BMC on all targets of color {1, …, k−2} for time 0.
  4. Partition the netlist into MDLs.
  5. Perform phase abstraction. For each MDL Ai:
     (a) Bi = max{j : ∀j′ ≠ j. |φj ∩ Ai| ≤ |φ_{j′} ∩ Ai|}.
     (b) Perform a preserve-φ_{Bi} abstraction on Ai.

Figure 10.6: PhaseAbstract algorithm
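Step 5(a) of Figure 10.6 reduces to a small computation per MDL. A hedged Python sketch (the function and parameter names are invented):

```python
def choose_preserved_phase(mdl, phase_of, k):
    """Pick B_i = max{j : |phi_j in A_i| <= |phi_j' in A_i| for all j' != j}:
    the maximal color among the phases of smallest LATCH cardinality."""
    counts = [0] * k
    for latch in mdl:
        counts[phase_of[latch]] += 1
    smallest = min(counts)
    return max(j for j in range(k) if counts[j] == smallest)
```

Preserving the smallest phase minimizes the remaining state elements; ties break toward the maximal color, whose initial values need no unfolding.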
Our algorithm for trace lifting is depicted in Figure 10.7. The color-C(v) vertices
are those of semantic importance at offset ζ(v) within the k-step clock period, because at
such times the FREE vertices of color C(v) are those that impact transitions of the netlist
as per (10.2). Terms γ(v) and δ(v) capture the type of abstraction. If δ(v) = 1, then
the corresponding MDL was abstracted with a "preserve-φ_{C(v)} or lesser" abstraction for
C(v) ≠ k−1, hence the valuation must be pushed back one clock period to properly
capture its temporal correlation as per (10.5), causing its time-0 valuation to be dropped.
For trace lifting, we do not define γ and δ on a per-j basis unlike (10.5), because we root
our evaluation directly to color-(k−1) vertices.
Theorem 10.1.Phase abstraction is sound and complete for invariant checking.
Proof. A target unreachableresult will be generated by phase abstraction only if the corre-
sponding abstracted target was proven to be unreachable by achild verification flow. From
Lemmas 10.2 and 10.3, phase abstraction preserves all valuations to colored vertices other
Partial_Trace Lift_Trace(Partial_Trace p′) {
  foreach v ∈ {V \ D} {
    γ(v) = (C(v) + 1) mod k;
    λ(v) = (MDL(v) was abstracted via preserve-Φ_j for j > C(v));
    δ(v) = (C(v) ≠ k−1) ∧ ¬λ(v);
    for (i = 0; i < length(p′); i++) {
      j = k · (i − δ(v)) + γ(v);
      if ((j ≥ 0) ∧ (∃b. ⟨(ṽ, i), b⟩ ∈ p′)) {
        p = p ∪ ⟨(v, j), p′(ṽ, i)⟩;
      }
    }
  }
  return p;
}

Figure 10.7: Phase abstraction trace lifting algorithm
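The lifting of Figure 10.7 can be sketched in Python as follows. This is an illustrative rendering under simplifying assumptions (traces as dictionaries, λ reduced to a per-vertex Boolean derived from a hypothetical mdl_preserved map), not the thesis implementation.

```python
def lift_trace(p_abs, color, mdl_preserved, k, length):
    """Lift an abstracted-netlist trace back to the k-phase netlist.

    p_abs: dict mapping (vertex, abstract_time) -> Boolean valuation.
    color: dict mapping vertex -> its color C(v).
    mdl_preserved: dict mapping vertex -> the color preserved in its MDL.
    Returns the lifted trace p as a dict over (vertex, original_time).
    """
    p = {}
    for v in color:
        gamma = (color[v] + 1) % k
        lam = mdl_preserved[v] > color[v]          # preserve-Phi_j for j > C(v)
        delta = int(color[v] != k - 1 and not lam)
        for i in range(length):
            j = k * (i - delta) + gamma            # shift back one period if delta
            if j >= 0 and (v, i) in p_abs:         # negative times are dropped
                p[(v, j)] = p_abs[(v, i)]
    return p
```

Each abstract time-step i expands to original time k·(i − δ(v)) + γ(v), so a vertex with δ(v) = 1 loses its time-0 valuation, as described above.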
than initial values of LATCHes of color 1, …, k−2. Additionally, a BMC call is used to
guarantee that targets of color 1, …, k−2 are not hittable at time 0; BMC thus fills in this
temporal gap for invariant checking. Hence unreachable results are correct.
Note that, unlike c-slow bisimilarity from Lemma 9.1, in a k-phase netlist a valuation to a particular FREE vertex of color j only affects transitions of the trace for one
time-step, a = k·i + (j+1) mod k, per k time-steps a, …, a+k−1. However, (10.3)
demonstrates that any valuation to V_j during the range between consecutive transparent
states of Φ_j is producible at each time-step during this range, thus this characteristic does
not impact invariant checking. The stuttering of state elements in the combinational fanin
of vertices ṽ for which δ_{C(v)}(v) = 1 similarly does not impact invariant checking.
A target hit result will be generated only when an abstracted target t̃ is hit by BMC
or a child verification flow. If the target was hit by BMC, this result is correct by assumption. Otherwise, we note that the corresponding trace p′ is semantically correct with respect
to Ñ and hits t̃ by assumption. Comparing our trace lifting algorithm from Figure 10.7 to
(10.1) and (10.2), and to (10.4) and (10.5), we see that the lifting correctly temporally
transforms the trace into one that is semantically consistent with N. Furthermore, our trace
lifting algorithm propagates every valuation to colored vertices from the abstracted trace,
aside from time-0 valuations to vertices ṽ for which δ_{C(v)}(v) = 1, which are equivalent to their
time-1 valuations, and aside from time-0, …, j−1 valuations to color-j vertices for j ≠ k−1,
which cannot comprise a target hit as validated by BMC. Our target is colored, hence the
lifted trace also hits the target.
Theorem 10.2. Phase abstraction generates a legal netlist.
Proof. We consider the requirements for legality enumerated in Definition 3.24.
1. The only gates fabricated by phase abstraction are one-input AND gates, REGISTERs,
multiplexors, and cloned vertices, all of which are legal (possibly by assumption).

2. Each set of fabricated gates (see the previous point) which replaces a gate in the k-phase netlist is of constant size. Thus Ñ is finite by assumption.

3. Phase abstraction replicates LATCH initial values, which are combinational by assumption, to use for preserved-Φ_{k−1} REGISTERs, and uses combinational unfolding
to generate initial values of preserved-Φ_i REGISTERs for i ≠ k−1. Hence all initial
values of Ñ are combinational.

4. By assumption, every original directed cycle will include every LATCH color at least
once (else we do not have a legal k-phase netlist). Phase abstraction will guarantee
that at least one LATCH of each path from a Φ_0 to a Φ_{k−1} will be translated to a
REGISTER, hence phase abstraction cannot generate combinational cycles.
Theorem 10.3. If the diameter of a set of vertices Ũ of phase-abstracted netlist Ñ is d(Ũ), then the diameter of the corresponding vertices U of the k-phase netlist N, provided that ∀u, u′ ∈ U: (C(u) = C(u′)) ∧ (γ_{C(u)}(u) = γ_{C(u)}(u′)), is at most k · d(Ũ).
Proof. By Definition 4.2, if the diameter of the phase-abstracted vertex set Ũ is d(Ũ), then the longest required duration to witness a particular valuation to Ũ is d(Ũ) time-steps. From Lemma 10.2, we know that phase abstraction folds time modulo-k. Therefore,
any transition of states in Ñ correlates to k transitions in N, and the corresponding valuation to U will occur within k · d(Ũ) time-steps.
10.1 Phase Abstraction Algorithms

In this section we discuss our algorithms for abstracting k-phase netlists. Because use
of implicit-clocked REGISTERs simplifies a toolset, and since phase abstraction is not iteratively applicable, it is often beneficial to perform phase abstraction during the design
compile and import process.
Several important algorithms were already provided in Figure 10.6 (our top-level
phase abstraction algorithm) and in Figure 10.7 (for trace lifting). We thus need only
provide our MDL partitioning algorithm in Figure 10.8. Note that this algorithm may readily be optimized so that each net is considered only a constant number of times in fanout
traversal as well as fanin traversal, thus ensuring linearity of the phase abstraction process.
Theorem 10.4. Algorithm PhaseAbstract performs optimal k-phase abstraction reductions for two-phase netlists.
Proof. By Lemma 10.1, there is a unique MDL partition of a netlist. Each MDL is of minimum size, resulting in a maximum number of dependent layers in the netlist. Since each
MDL may be abstracted independently of the others, the locally optimal solutions yield a
globally optimal result for two-phase netlists. This property follows from the observation
that eliminating any LATCH l from a two-phase MDL implies that all LATCHes l′ in the
combinational fanin or fanout of this LATCH (within the MDL) must be preserved, which
in turn implies that all LATCHes in the combinational fanin or fanout of l′ must be eliminated, and so on. Therefore, all LATCHes of a single color must be eliminated, and the
other color preserved, within the MDL.
Partition K_Phase_Partition(Netlist N) {
  i = −1;
  foreach v ∈ L {
    if (v ∈ ⋃_{j=0..i} A[j]) { continue; }
    i++;
    A[i] = in_q = out_q = {v};
    while (¬empty(in_q) ∨ ¬empty(out_q)) {
      if (¬empty(in_q)) {
        u = Pop(in_q);
        if (C(u) ≤ 0) { continue; }
        Γ = {w : w ∈ combinational_fanin(data(u)) ∧ w ∈ L} \ A[i];
        Assert(C(Γ) = C(u) − 1);
        Push(in_q, Γ); Push(out_q, Γ); A[i] = A[i] ∪ Γ;
      }
      if (¬empty(out_q)) {
        u = Pop(out_q);
        if (C(u) ≥ k − 1) { continue; }
        Γ = {w : w ∈ combinational_fanout(u) ∧ w ∈ L} \ A[i];
        Assert(C(Γ) = C(u) + 1);
        Push(in_q, Γ); Push(out_q, Γ); A[i] = A[i] ∪ Γ;
      }
    }
  }
  return A[];
}

Figure 10.8: MDL partitioning algorithm
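The partitioning of Figure 10.8 amounts to a breadth-first flood from each unvisited LATCH, growing through combinational fanin (down toward color 0) and fanout (up toward color k−1). A minimal Python sketch, with the two queues of the figure collapsed into a single worklist and hypothetical fanin/fanout maps:

```python
from collections import deque

def mdl_partition(latches, color, fanin, fanout, k):
    """Partition a k-phase netlist's LATCHes into MDLs (illustrative sketch).

    latches: iterable of LATCH names; color: LATCH -> color in 0..k-1;
    fanin/fanout: LATCH -> set of LATCHes one combinational step away.
    Returns a list of MDL sets.
    """
    partition = []
    seen = set()
    for v in latches:
        if v in seen:
            continue
        mdl = {v}
        seen.add(v)
        queue = deque([v])
        while queue:
            u = queue.popleft()
            grow = set()
            if color[u] > 0:          # fanin traversal stops at color 0
                grow |= fanin.get(u, set())
            if color[u] < k - 1:      # fanout traversal stops at color k-1
                grow |= fanout.get(u, set())
            for w in grow - mdl:
                mdl.add(w)
                seen.add(w)
                queue.append(w)
        partition.append(mdl)
    return partition
```

Each LATCH is enqueued at most once, so the flood is linear in the number of LATCH-to-LATCH connections, mirroring the linearity argument above.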
For the relatively uncommon case of k ≥ 3, this linear-time algorithm may not
yield an optimal solution. Consider the 3-phase MDL of Figure 10.9, where the numbers
after the slashes indicate the “width” of the corresponding vectored LATCHes. Preserving any single phase will yield a solution with three REGISTERs, whereas preserving the Φ_1 A2 and the Φ_0 A3 yields an optimum solution of two. Clearly the optimum solution
is achievable in superlinear polynomial time by solving an s-t node min-cut problem² on

²One of the most efficient known algorithms for solving the s-t min-cut problem is the highest-label preflow-push algorithm, which is O(|V|² · |E|^{1/2}) [105].
Figure 10.9: Example three-phase MDL (vectored LATCHes A1/2, A2/1, A3/1, A4/2, A5/3, spanning phases Φ_0, Φ_1, Φ_2; the number after each slash is the LATCH width)
the LATCH connectivity graph (whose vertices are all LATCHes, and whose directed edges
represent a combinational fanout connectivity between LATCHes of color i to color i+1 for 0 ≤ i < k−1) between sources of color 0 and sinks of color k−1. Rather than
spending superlinear resources for phase abstraction, it is our experience that an optimal
tool implementation will use a linear technique to achieve efficient phase abstraction (yet
to obtain superior reductions compared to a global preserve-Φ_i approach [106]), then to
subsequently use other superlinear reduction techniques such as retiming (refer to Chapter 6) to provide additional reductions. As noted in [106], retiming will compensate for any
suboptimality in the phase abstraction process, thus a global preserve-Φ_i approach coupled
with retiming may be a reasonable choice for a simpler tool implementation aside from the
extra processing time required for retiming the less optimal phase-abstracted netlist. Alternatively, if optimal phase abstraction is desired, a more sophisticated algorithm may readily
be incorporated into the above framework to perform finer-grained abstraction decisions.
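For illustration, the s-t node min-cut formulation above can be sketched as follows. This hedged example uses the textbook node-splitting construction and a simple Edmonds-Karp max-flow (not the highest-label preflow-push algorithm of the footnote), with hypothetical names throughout.

```python
from collections import deque

def min_preserved_width(widths, color, edges, k):
    """Optimal phase-abstraction choice as an s-t node min-cut (sketch).

    widths: latch -> vector width; color: latch -> 0..k-1;
    edges: (u, v) pairs where v is in the combinational fanout of u.
    Node capacities are modeled by splitting each latch into in/out halves;
    returns the minimum total width of LATCHes that must be preserved.
    """
    INF = float("inf")
    cap = {}

    def add(u, v, c):
        cap[(u, v)] = cap.get((u, v), 0) + c
        cap.setdefault((v, u), 0)          # residual edge

    for v, w in widths.items():
        add((v, "in"), (v, "out"), w)      # node capacity = latch width
    for u, v in edges:
        add((u, "out"), (v, "in"), INF)
    for v in widths:
        if color[v] == 0:
            add("s", (v, "in"), INF)       # sources: color-0 latches
        if color[v] == k - 1:
            add((v, "out"), "t", INF)      # sinks: color-(k-1) latches

    adj = {}
    for (u, v) in cap:
        adj.setdefault(u, []).append(v)

    flow = 0
    while True:  # Edmonds-Karp: BFS for a shortest augmenting path
        parent = {"s": None}
        q = deque(["s"])
        while q and "t" not in parent:
            u = q.popleft()
            for v in adj.get(u, []):
                if v not in parent and cap[(u, v)] > 0:
                    parent[v] = u
                    q.append(v)
        if "t" not in parent:
            return flow                    # max-flow = min-cut weight
        path, v = [], "t"
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        bottleneck = min(cap[e] for e in path)
        for u, v in path:
            cap[(u, v)] -= bottleneck
            cap[(v, u)] += bottleneck
        flow += bottleneck
```

By max-flow/min-cut duality, the returned flow equals the minimum total width of a set of LATCHes whose removal separates the color-0 sources from the color-(k−1) sinks, i.e., an optimal choice of LATCHes to preserve as REGISTERs.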
10.2 Related Work
A similar technique for phase abstraction was proposed in [106] for sequential hardware
equivalence. They propose globally converting LATCHes of all but a single phase into
buffers. Their work proves correctness for the steady-state subgraph of the abstracted
netlist; as such, initial values are discarded. However, this approach is insufficient for
invariant checking; modern hardware designs typically require an explicit initialization se-
quence (e.g., via scan chains) before proper functionalityis ensured. Failure to consider
initial values, and transitions outside of the steady state, may fail to expose certain arbi-
trarily complicated design flaws existing before steady state is reached, and even prevent
the netlist from reaching its intended steady state. Our approach does preserve initial val-
ues, and we prove a bisimilarity between the original and abstracted netlists relative to
their initial states. The technique in [106] further proposed a globally greedy approach of
“removing all but the smallest phase set” of LATCHes. They propose retiming as a sec-
ond reduction step to ensure minimal LATCH counts. Our work calculates minimally-sized
partitions of the netlist, and allows a greedy choice of which phases to discard for each
partition, independently of the other partitions, hence is able to achieve reductions beyond
those possible with the technique of [106] alone (without the more costly retiming step). A
customized algorithm for efficient image calculation upon k-phase netlists, exploiting the
distinctness of vertices of each color, is presented in [106] – though it does not offer the
above-mentioned benefits of our structural transformation.
Many hardware compilers allow automatic translation of master-slave LATCH sets
into a single REGISTER. Retiming algorithms [68] may be used to retime the netlist such
that Φ_0–Φ_{k−1} layers become adjacent and one-to-one. However, use of such a mechanism
would require interpretation of LATCHes in addition to REGISTERs in a re-entrant retiming engine, and furthermore would require additional retiming constraints to ensure such
adjacency, which is somewhat unattractive. Additionally, retiming requires quadratic resources or greater³; a prior linear-time phase abstraction may offer significant speed-ups
to retiming algorithms by decreasing REGISTER count, as was also observed in [106]. We
therefore have found that use of retiming to perform phase abstraction is less attractive than
our approach. However, phase abstraction and retiming offer complementary benefits, and
we have found the subsequent use of retiming extremely beneficial.
³Retiming is solvable as a min-cost flow problem [66], for which one of the most efficient known algorithms is the enhanced capacity scaling algorithm, which is O(|E| · log(|V|) · (|E| + |V| · log(|V|))) [75].
Stuttering bisimulation [28], which relates two machines which are semantically
equivalent except that either may add “repetitious state transitions” that do not appear in
the other, is a related concept. The satisfaction of two stuttering bisimilar states is identical for CTL* formulae with no X(f) subformulae [107], which covers invariant checking.
Stuttering bisimilarity offers some degree of insight into the nature of our k-phase bisimilarity. Indeed, k-phase abstraction yields an abstracted netlist Ñ related to its original N by a k-stuttering upon sets of vertices with no FREE vertices in their combinational fanin.
Therefore, the results of prior research on stuttering bisimulation hold between selective
vertices of N and Ñ. However, there are several important distinctions between these topics. One is that we establish trace-equivalence over vertices with FREE vertices in their
combinational fanin, which generally do not stutter. Additionally, there is one fundamental
contribution of this work beyond (or leading to) stuttering bisimulation: we provide linear-time algorithms that analyze and transform the structure of a netlist, hence there is no need
to transform or even analyze the state transition graph of a netlist to achieve our reductions.
Thus, while stuttering bisimulation is a good framework from which to theoretically understand phase abstraction, it does not offer a practically useful mechanism to perform phase
abstraction on very large netlists.
Related to the topic of phase abstraction is c-slow abstraction (refer to Chapter 9).
Like a k-phase netlist, the topology of c-slow netlists guarantees that any directed cycle has
modulo-c state elements; a similar c-coloring may be applied to both netlist types. However, unlike phase abstraction, c-slow abstraction is only applicable to netlists composed
of REGISTERs. Furthermore, the state elements of c-slow netlists generally do not stutter
whatsoever, and all of their initial values have semantic importance, unlike k-phase netlists.
The use of c-slow abstraction after phase abstraction may be a beneficial verification strategy; these techniques are complementary.
The work of [108] provides a methodology for a specification to operate at a different time-scale than a hardware implementation, to increase the utility of assume-guarantee
reasoning. However, they do not focus on abstractions to enhance verification, only on the
mechanics of interfacing a specification in one time-scale to an implementation in another.
The work of [109] provides a general set of formalisms to relate various transformations
of netlists with various latching and clocking schemes, such as multi-phase netlists, re-
timed netlists, and netlists with multiple clock domains. However, their approach does not
address techniques for reducing netlist size.
In [76], Touati and Brayton proposed a method for adding reset logic which forces
an equivalent initial state for retimed netlists. This reset logic is similar to our technique of
preserving the initial value of eliminated phase-(k−1) LATCHes as depicted in Figure 10.5.
10.3 Experimental Results
Our experimental results are reported for two-phase abstraction using the model checker
RuleBase [103]. This algorithm has been deployed for use on many components of IBM’s
Gigahertz Processor. The results of this reduction on several components of this processor
are provided in Table 10.1. During the initial stages of model checking, this abstraction was
not available. Once the abstraction became available, properties which previously required
many hours to complete would finish in several minutes. More encompassing properties,
which would not otherwise complete, became feasible on the abstracted netlist.
These experiments were run on an IBM RS/6000 Workstation Model 590 with 2 GB
main memory. RuleBase was run with redundancy removal reductions enabled. These ex-
periments were run with a random initial BDD order (though pairing present-state and next-
state variables), and with dynamic reordering enabled using the technique of Rudell [104].
One property run on the Load Serialization Logic required 25.6 seconds and 36 MB
on the abstracted netlist (with 81 FREE plus REGISTER variables), including phase abstrac-
tion resources. The same property required 450.2 seconds and 92 MB for the unabstracted
netlist (with 116 variables). A more challenging property run on the Instruction Flushing
Logic Function                               State Elements     State Elements
                                             Before Reduction   After Reduction
Load Serialization Logic                           8096               2586
L1 Cache Reload Logic                              3102               1418
Instruction Flushing Logic                          138                 69
Instruction-Fetch Address Generation Logic         4891               2196
Branch Logic                                       6918               3290
Instruction Issue Logic                            6578               3249
Tag Management Logic                                578                289
Instruction Decode Logic                           1980                978
Load / Store Control                                821                409
Table 10.1: Phase abstraction results for GP netlists
Logic required 852 seconds of user time and 48 MB on the abstracted netlist (with 96 vari-
ables). This same property did not complete on the unabstracted netlist (with 162 variables)
within 72 hours.
While it may seem surprising in these two cases that the number of variables after
phase abstraction is more than half that without phase abstraction, this is due to several
phenomena. First, some of these variables are used for the driver and property automata;
these may be modeled directly as REGISTERs rather than Φ_0–Φ_1 LATCHes even for the unabstracted two-phase netlist. Second, phase abstraction does not eliminate FREE variables.
Third, since these results include redundancy removal, some of the initial LATCH variables
may be eliminated by this technique.
With this abstraction, as demonstrated above, model checking was able to verify
much “larger” and more meaningful properties in less time. All RuleBase users on the
Gigahertz processor project began running exclusively with this abstraction. There have
been more than one thousand formulae written and model checked to date on this project,
which collectively have exposed more than two hundred bugs at various design stages.
This abstraction thus provided an efficient means to help alleviate the verification burdens
imposed by the low level of the high-performance implementation.
Additionally, we have implemented phase abstraction within the transformation-
based verification system discussed in previous chapters; all Gigahertz processor experi-
ments mentioned in those chapters use this technique. Though elimination of state elements
often reduces verification complexity as per the above results, such an approach risks ex-
ponential blowup of BDDs representing the composite cones. Advanced BDD-based tech-
niques such as implicit conjoining [110], or fine-grained reachability analysis [44], may
be used to minimize such risk. However, it is noteworthy that in more than five years of
deploying this technique for model checking and invariant checking on nearly 100 design
components, we never once needed to disable this abstraction to prevent BDD-blowup.
Chapter 11
Conclusions and Future Work
In this chapter we summarize the contributions of this thesis, and discuss future research di-
rections. Our overall research thrust has focused upon the deployment of structural analysis
and abstraction techniques to enhance hardware verification. At a high-level, our contribu-
tions are two-fold: we discuss a set of structural abstraction techniques to simplify netlist
representations, and we provide theory for compositionally and structurally deriving di-
ameter bounds from netlist partitions which enables the use of abstractions to help tighten
these bounds. A common theme across many of these techniques is that a temporal decomposition of the verification task enables significant spatial reductions. We develop all
techniques as re-entrant modules, allowing arbitrary sequencing of and synergy between
these techniques under a transformation-based verification framework as proposed in [10].
Numerous experimental results have been provided to demonstrate the power and synergy
of these techniques, as well as their overall ability to increase the capacity of automated
proof systems. Our specific contributions include the following.

• Our diameter approximation techniques are discussed in Chapter 4. We discuss a
structural algorithm for overapproximating design diameter. Though overapproximate, this approach is very efficient and able to yield tight bounds for some netlists
for which other approximate techniques (such as recurrence diameter) are exponentially loose. We perform the overapproximation based upon a partitioning of the
netlist, and develop theory to allow arbitrary methods to be used on a per-component
basis. Additionally, we discuss the effects of our abstraction techniques upon diameter in each of the corresponding chapters, allowing per-component transformations
to improve diameter bounds.

• We discuss redundancy removal in Chapter 5. Our contribution in this area is the
technique of on-the-fly retiming, and the efficient AND/INVERTER/REGISTER graph
netlist representation.

• We discuss our technique of generalized retiming in Chapter 6. Our generalizations
include the use of peripheral retiming, NEGATIVE REGISTERs, and a relaxed reset
state in an invariant checking framework. We furthermore propose the concept of
fanin sharing of REGISTERs to enhance min-area retiming.

• We discuss the use of structural cut-based abstraction in Chapter 7, based upon the
technique presented in [88]. This abstraction is useful in eliminating combinational
logic and FREE vertices in a netlist.

• We discuss structural target enlargement in Chapter 8. This technique is capable of
providing significant reductions in netlist size, in addition to the common characteristic of making targets probabilistically easier, and shallower, to hit.

• We discuss generalized c-slow abstraction in Chapter 9. This state folding technique
is capable of providing significant reductions in REGISTER count.

• We discuss phase abstraction in Chapter 10. This approach renders a REGISTER-based netlist from a LATCH-based one, which is easier to support in a verification
toolset and contains significantly fewer sequential elements.
There are numerous future work directions to enhance the results reported herein.
First, structural diameter overapproximation techniques should be improved to enable better bounding for SCCs. Additionally, as semantic approaches (such as QBF and recurrence
diameter) are improved, our compositional theory provides a framework to synergistically
exploit their strengths. Finally, the incorporation and exploitation of other abstraction tech-
niques to help bound netlist or component diameters can further help enable the most effi-
cient diameter overapproximation system, capable of yielding the tightest diameter bounds.
This research direction has the potential to greatly increase automatic proof capacity on
large netlists due to the efficiency of BMC techniques.
Second, there are numerous synergistic abstraction approaches which may be in-
cluded in a transformation-based verification setting, such as symmetry reductions and
more powerful synthesis optimizations. Further enhancements to the techniques proposed
herein are also possible, as discussed in the respective chapters. Furthermore, the use of
our completely automatic structural abstractions may be augmented by the application of
manually-guided abstractions and those that require more abstract netlist representations,
perhaps as a preprocessing step.
Third, as improved verification techniques emerge such as enhancements to SAT and
reachability analysis, the overall capacity of encapsulating verification tools will increase.
This will enhance verification capacity synergistically with abstraction techniques.
Finally, further research into applying these techniques to more general property
checking frameworks may be useful to exploit their potential to reduce verification com-
plexity for more general types of proof (such as liveness).
Appendix A
Appendix
A.1 Modeling Interconnections as Nets
Throughout this thesis we refer to interconnections as nets. This is somewhat imprecise;
a net may have multiple sinks and sources, and is not necessarily “directed,” whereas an
edge has exactly one source and one sink, and is directed. In this section we discuss ways
to model more general interconnections as nets.
1. An interconnection may have multiple sinks. This may simply be handled by representing an n-sink net as n distinct edges in our graph model. Recall that each edge is
semantically equivalent to its source vertex.
2. An interconnection may have multiple sources – and furthermore these sources may
drive conflicting values, or no values, onto the net at any time-step.
In this case, we may need to add additional logic to create a semantically-equivalent
net. For example, if the interconnection acts as an OR between various sources, we
may need to inject an OR gate whose inputs are the sources of the interconnection,
which in turn is the source of the edge correlating to the original interconnection.
If an interconnection is defined as a multi-source bus, a more complicated function
may exist for its behavior in the presence of multiple active sources, for example,
due to varying transistor sizes and drive strengths. Nevertheless, a straightforward
modeling is often possible.
In the case of zero active drivers, there are two common possibilities. First, the net
may act as a pull-up or a pull-down, which may be modeled by logic that drives a
ONE or ZERO, respectively, in case of zero active drivers. If no drivers are active and
no “default-value” logic is in place, or if two or more drivers are driving conflicting
values and driver strengths cannot resolve the value deterministically, we may con-
servatively inject a new driver – a unique FREE vertex, whose value takes dominance
in precisely these ambiguous cases.
Given such conventions, we may transform arbitrary interconnections to nets.
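As an illustration of these conventions, the following Python sketch (a hypothetical function, not from the thesis) resolves one time-step of a multi-source interconnection, applying a pull-up/pull-down default and falling back to a fresh FREE value in the ambiguous cases:

```python
def resolve_net(driver_values, default=None):
    """Model one time-step of a multi-source interconnection (sketch).

    driver_values: list of values driven this time-step; each is True, False,
    or None when that source is not actively driving.
    default: optional pull-up (True) / pull-down (False) value.
    Returns the resolved value, or "FREE" where a fresh FREE vertex must be
    injected (no active driver and no default, or conflicting drivers).
    """
    active = [v for v in driver_values if v is not None]
    if not active:
        return default if default is not None else "FREE"
    if all(v == active[0] for v in active):
        return active[0]
    return "FREE"  # conflicting drivers: the injected FREE vertex dominates
```

In a structural model, the "FREE" outcome corresponds to injecting a new FREE vertex whose value takes dominance in precisely these ambiguous cases.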
A.2 Alternate Gate Types
One may wish to add to the possible gate types of our netlist model – e.g., to add OR or
XOR primitives. We may readily extend Definition 3.11 by defining G_v = f_v(u_1, u_2, …, u_j) for a new gate function f_v. Note that our set of gate types from Definition 3.11 are all
completely symmetric on their inputs. If we wish to include more complex gates which
are not symmetric on inputs (e.g., a multiplexor), our netlist model would need to reflect
the ordering of incoming edges of each vertex – and the graph representation and structural
algorithms may need to be altered appropriately. Synthesis of alternate gate types into our
supported types is possible using common logic decompositions. Our choice of the AND
gate as the only multi-input primitive is motivated by prior research [79, 51, 25].
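For example, the common logic decompositions alluded to above can be sketched as follows, expressing OR and XOR purely in terms of the AND and inversion primitives (an illustrative sketch on Python Booleans):

```python
def NOT(a):
    return not a

def AND(a, b):
    return a and b

def OR(a, b):
    """OR via De Morgan: a + b = NOT(NOT(a) AND NOT(b))."""
    return NOT(AND(NOT(a), NOT(b)))

def XOR(a, b):
    """XOR decomposed into two ANDs and an OR: a XOR b = a*NOT(b) + NOT(a)*b."""
    return OR(AND(a, NOT(b)), AND(NOT(a), b))
```

In an AND/INVERTER graph these decompositions introduce only AND vertices and inverted edges.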
Our use of implicit-clock REGISTERs may seem limiting, since a hardware design
may have GATED-CLOCK REGISTERs, MULTI-PORT REGISTERs, or LEVEL-SENSITIVE
LATCHes. Translation of LEVEL-SENSITIVE LATCHes to REGISTERs is handled by phase
abstraction (refer to Chapter 10). We now show that the alternate REGISTERs may be
readily modeled by the implicit-clock version with some added combinational gates.
Definition A.1. A GATED-CLOCK REGISTER has 2 inputs: data and gate. Its semantics
are defined as follows.

• If i ≥ 0, then p(v, i+1) = ite(p(gate(v), i), p(data(v), i), p(v, i)). Otherwise p(v, 0) = p(Z(v), 0).

As with the LEVEL-SENSITIVE LATCH in Definition 10.1, Definition A.1 requires
an ordering of incoming edges to map the structure of the GATED-CLOCK REGISTER to
a precise semantics. We may model GATED-CLOCK REGISTERs as normal REGISTERs,
with the addition of a feedback path as depicted in Figure A.1[64]. This figure depicts a
GATED-CLOCK REGISTERto the left, with thegatesignal which may force it to holddout.
To the right we indicate a semantically equivalent model of this structure using an implicit-
clock REGISTER. This transformation consists of adding a multiplexor, selected by the
gate, which sensitizes a sampling ofdata if a 1, else a feedback loop from the REGISTER
to emulate the “hold” condition if thegateis a0.
Figure A.1: Remodeling GATED-CLOCK REGISTERs (left: a GATED-CLOCK REGISTER with gate and data inputs driving dout; right: an implicit-clock REGISTER whose data input is a multiplexor selecting data when gate = 1, else the REGISTER's own output)
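The remodeling of Figure A.1 can be illustrated with a small simulation sketch (hypothetical helper names; the values stand in for the valuations p(·, i)):

```python
def ite(sel, then_val, else_val):
    """Multiplexor: the only combinational gate added by the remodeling."""
    return then_val if sel else else_val

def simulate_remodeled_register(init, gate_trace, data_trace):
    """Simulate the Figure A.1 remodeling of a GATED-CLOCK REGISTER.

    The implicit-clock REGISTER's next-state input is a multiplexor that
    selects data when gate is 1, else feeds the REGISTER's own output back
    to emulate the hold condition. Returns the trace of dout valuations.
    """
    state = init
    trace = []
    for g, d in zip(gate_trace, data_trace):
        trace.append(state)
        state = ite(g, d, state)  # multiplexor plus feedback path
    return trace
```

When the gate is 0, the mux routes the REGISTER's output back to its input, so dout holds its previous value, matching the ite semantics of Definition A.1.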
MULTIPLE-PORT REGISTERs may be represented by a set of k > 1 ⟨gate, data⟩ input
ports. They must have a pre-specified permutation of “priorities” between them to define
which input port's data value will be sampled in case of multiple active gates – though
such a condition is almost always a design error. Intuitively, this MULTI-PORT REGISTER
will sample and delay the data of the highest-priority port which has a non-blocking gate,
or hold its value if none are non-blocking. MULTI-PORT REGISTERs may be represented
by generalizing the synthesis of Figure A.1 in the straightforward fashion.
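A sketch of that generalized synthesis: the priority-ordered selection below computes the next state of a hypothetical MULTI-PORT REGISTER, which in hardware would be a chain of multiplexors ending in the Figure A.1 feedback path.

```python
def multiport_next_state(ports, current):
    """Next-state function of a MULTI-PORT REGISTER (illustrative sketch).

    ports: list of (gate, data) pairs, ordered highest priority first.
    Selects the data of the highest-priority port with an active gate;
    if no gate is active, the current value is held via the feedback path.
    """
    for gate, data in ports:
        if gate:
            return data
    return current
```

Each loop iteration corresponds to one multiplexor in the chain, selected by that port's gate.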
Bibliography
[1] G. E. Moore, “Cramming more components onto integrated circuits,” Electronics, vol. 38,
pp. 114–117, April 1965.
[2] P. Gelsinger, P. Gargini, G. Parker, and A. Yu, “Microprocessors circa 2000,” IEEE Spectrum,
vol. 26, pp. 43–47, October 1989.
[3] D. W. Jorgenson and C. W. Wessner, eds., Measuring and Sustaining the New Economy:
Report of a Workshop. National Academies Press, 2002.
[4] N. Weste and K. Eshraghian, Principles of CMOS VLSI Design - A System Perspective. Second Edition. Addison-Wesley Publishing Company, 1993.
[5] M. Srivas, H. Rueß, and D. Cyrluk, “Hardware verification using PVS,” in Formal Hardware
Verification: Methods and Systems in Comparison, pp. 156–205, Springer-Verlag, 1997.
[6] M. Kaufmann, P. Manolios, and J. S. Moore, Computer-Aided Reasoning: An Approach.
Kluwer Academic Publishers, 2000.
[7] A. Aziz, V. Singhal, and R. K. Brayton, “Verifying interacting finite state machines: Com-
plexity issues,” Tech. Rep. UCB/ERL M93/52, Electronics Research Lab, University of Cal-
ifornia at Berkeley, July 1993.
[8] J. M. Ludden, W. Roesner, G. M. Heiling, J. R. Reysa, J. R. Jackson, B.-L. Chu, M. L. Behm,
J. Baumgartner, R. D. Peterson, J. Abdulhafiz, W. E. Bucy, J. H. Klaus, D. J. Klema, T. N.
Le, F. D. Lewis, P. E. Milling, L. A. McConville, B. S. Nelson, V. Paruthi, T. W. Pouarz,
A. D. Romonosky, J. Stuecheli, K. D. Thompson, D. W. Victor, and B. Wile, “Functional
verification of the POWER4 microprocessor and POWER4 multiprocessor systems,” IBM
Journal of Research and Development, vol. 46, pp. 53–76, January 2002.
[9] D. A. Patterson and D. R. Ditzel, “The case for the reduced instruction set computer,” Computer Architecture News, vol. 8, pp. 25–33, October 1980.
[10] A. Kuehlmann and J. Baumgartner, “Transformation-based verification using generalized
retiming,” in Computer-Aided Verification (CAV’01), (Paris, France), pp. 104–117, July 2001.
[11] A. Pnueli, “In transition from global to modular temporal reasoning about programs,” Logics
and Models of Concurrent Systems, vol. F13, pp. 123–144, 1985.
[12] E. M. Clarke and E. A. Emerson, “Design and synthesis of synchronization skeletons using branching-time temporal logic,” in Proceedings of the Workshop on Logic of Programs,
(Yorktown Heights, NY), pp. 52–71, May 1981.
[13] E. A. Emerson, “Temporal and modal logic,” Handbook of Theoretical Computer Science,
vol. B, pp. 996–1072, 1990.
[14] R. Gerth, D. Peled, M. Y. Vardi, and P. Wolper, “Simple on-the-fly automatic verification of
linear temporal logic,” in Protocol Specification Testing and Verification, (Warsaw, Poland),
pp. 3–18, June 1995.
[15] I. Beer, S. Ben-David, and A. Landver, “On-the-fly model checking of RCTL formulas,” in
Computer-Aided Verification (CAV’98), (Vancouver, BC, Canada), pp. 184–194, July 1998.
[16] J. Baumgartner, T. Heyman, V. Singhal, and A. Aziz, “Model checking the IBM Gigahertz
Processor: An abstraction algorithm for high-performance netlists,” in Computer-Aided Verification (CAV’99), (Trento, Italy), pp. 72–83, July 1999.
[17] J. Baumgartner, A. Tripp, A. Aziz, V. Singhal, and F. Andersen, “An abstraction algorithm
for the verification of generalized C-slow designs,” in Computer-Aided Verification (CAV’00),
(Chicago, IL), pp. 5–19, July 2000.
[18] O. Coudert, C. Berthet, and J. C. Madre, “Verification of synchronous sequential machines
based on symbolic execution,” in International Workshop on Automatic Verification Methods
for Finite State Systems, (Grenoble, France), pp. 365–373, June 1989.
[19] J. R. Burch, E. M. Clarke, K. L. McMillan, D. L. Dill, and L. J. Hwang, “Symbolic model
checking: 10²⁰ states and beyond,” in Fifth Annual Symposium on Logic in Computer Science, (Philadelphia, PA), pp. 428–439, June 1990.
[20] T. Niermann and J. H. Patel, “HITEC: A test generation package for sequential circuits,” in
European Conference on Design Automation, (Amsterdam, The Netherlands), pp. 214–218,
February 1991.
[21] J. A. Darringer, D. Brand, J. V. Gerbi, W. H. Joyner, and L. H. Trevillyan, “Logic synthesis
through local transformations,”IBM Journal on Research and Development, vol. 25, pp. 272–
280, July 1981.
[22] E. M. Sentovich, K. J. Singh, C. Moon, H. Savoj, R. K. Brayton, and A. L. Sangiovanni-
Vincentelli, “Sequential circuit design using synthesis and optimization,” in IEEE
International Conference on Computer Design, pp. 328–333, October 1992.
[23] A. Kuehlmann, V. Paruthi, F. Krohm, and M. Ganai, “Robust Boolean reasoning for equiva-
lence checking and functional property verification,” IEEE Transactions on Computer-Aided
Design, vol. 21, December 2002.
[24] J. Baumgartner, A. Kuehlmann, and J. Abraham, “Property checking via structural analysis,”
in Computer-Aided Verification (CAV’02), (Copenhagen, Denmark), pp. 151–165, July 2002.
[25] J. Baumgartner and A. Kuehlmann, “Min-area retiming on flexible circuit structures,” in
IEEE/ACM International Conference on Computer-Aided Design, (San Jose, CA), pp. 176–
192, November 2001.
[26] A. Aziz, V. Singhal, G. M. Swamy, and R. K. Brayton, “Minimizing interacting finite state
machines: A compositional approach to language containment,” in IEEE International Con-
ference on Computer Design, (Cambridge, MA), pp. 255–261, October 1994.
[27] K. Fisler and M. Vardi, “Bisimulation and model checking,” in Correct Hardware Design and
Verification Methods (CHARME’99), (Bad Herrenalb, Germany), pp. 338–341, September
1999.
[28] M. C. Browne, E. M. Clarke, and O. Grumberg, “Characterizing finite Kripke structures in
propositional temporal logic,” Theoretical Computer Science, vol. 59, pp. 115–131, 1988.
[29] A. Aziz, T. R. Shiple, V. Singhal, and A. L. Sangiovanni-Vincentelli, “Formula-dependent
equivalence for compositional CTL model checking,” in Computer-Aided Verification
(CAV’94), (Stanford, CA), pp. 324–337, June 1994.
[30] P. Cousot and R. Cousot, “Abstract interpretation: A unified lattice model for static analy-
sis of programs by construction or approximation of fixpoints,” in ACM Symposium on the
Principles of Programming Languages, (Los Angeles, CA), pp. 238–252, January 1977.
[31] D. Dams, R. Gerth, and O. Grumberg, “Abstract interpretation of reactive systems,” ACM
Transactions on Programming Languages and Systems, vol. 19, no. 2, pp. 253–291, 1997.
[32] E. M. Clarke, O. Grumberg, and D. E. Long, “Model checking and abstraction,” in
Symposium on the Principles of Programming Languages, (Albuquerque, New Mexico), pp. 343–
354, January 1992.
[33] D. E. Long, Model Checking, Abstraction and Compositional Verification. PhD thesis,
Carnegie Mellon University, Pittsburgh, Pennsylvania, July 1993.
[34] R. P. Kurshan, Computer-Aided Verification of Coordinating Processes. Princeton University
Press, 1994.
[35] E. M. Clarke, O. Grumberg, S. Jha, Y. Lu, and H. Veith, “Counterexample-guided
abstraction refinement,” in Computer-Aided Verification (CAV’00), (Chicago, IL), pp. 154–169, July
2000.
[36] R. Hojati and R. K. Brayton, “Automatic datapath abstraction of hardware systems,” in
Computer-Aided Verification (CAV’95), (Liege, Belgium), pp. 98–113, July 1995.
[37] R. Hojati, A. J. Isles, D. Kirkpatrick, and R. K. Brayton, “Verification using uninterpreted
functions and finite instantiations,” in Formal Methods in Computer-Aided Design, (Palo
Alto, CA), pp. 218–232, November 1996.
[38] P.-H. Ho, A. J. Isles, and T. Kam, “Formal verification of pipeline control using controlled
token nets and abstract interpretation,” in IEEE/ACM International Conference on Computer-
Aided Design, (San Jose, CA), pp. 529–536, November 1998.
[39] V. Paruthi, N. Mansouri, and R. Vemuri, “Automatic datapath abstraction for verification of
large scale designs,” in IEEE International Conference on Computer Design, (Austin, TX),
pp. 192–194, October 1998.
[40] K. S. Namjoshi and R. P. Kurshan, “Syntactic program transformations for automatic
abstraction,” in Computer-Aided Verification (CAV’00), (Chicago, IL), pp. 435–449, July 2000.
[41] O. Coudert and J. C. Madre, “A unified framework for the formal verification of sequential
circuits,” in IEEE International Conference on Computer-Aided Design, (Santa Clara, CA),
pp. 126–129, November 1990.
[42] I.-H. Moon, J.-Y. Jang, G. D. Hachtel, F. Somenzi, J. Yuan, and C. Pixley, “Approximate
reachability don’t cares for CTL model checking,” in IEEE/ACM International Conference
on Computer-Aided Design, (San Jose, CA), pp. 351–358, November 1998.
[43] I.-H. Moon, J. H. Kukula, K. Ravi, and F. Somenzi, “To split or to conjoin: the question
in image computation,” in ACM/IEEE Design Automation Conference, (Los Angeles, CA),
pp. 23–28, June 2000.
[44] H. Jin, A. Kuehlmann, and F. Somenzi, “Fine-grain conjunction scheduling for symbolic
reachability analysis,” in Tools and Algorithms for the Construction and Analysis of Systems,
(Grenoble, France), pp. 312–326, April 2002.
[45] E. M. Clarke, D. E. Long, and K. L. McMillan, “Compositional model checking,” in IEEE
Symposium on Logic in Computer Science, (Pacific Grove, CA), pp. 353–362, June 1989.
[46] R. Beers, R. Ghughal, and M. Aagaard, “Applications of hierarchical verification in model
checking,” in Formal Methods in Computer-Aided Design, (Austin, TX), November 2000.
[47] E. A. Emerson and A. P. Sistla, “Symmetry and model checking,” in Computer-Aided
Verification (CAV’93), (Elounda, Greece), pp. 463–478, 1993.
[48] C. N. Ip and D. L. Dill, “Better verification through symmetry,” in Computer Hardware
Description Languages and their Applications, (Ottawa, Canada), pp. 97–111, 1993.
[49] G. S. Manku, R. Hojati, and R. K. Brayton, “Structural symmetry and model checking,” in
Computer-Aided Verification (CAV’98), (Vancouver, BC, Canada), pp. 159–171, July 1998.
[50] M. K. Ganai and A. Kuehlmann, “On-the-fly compression of logical circuits,” in
International Workshop on Logic & Synthesis, (Dana Point, CA), May 2000.
[51] A. Kuehlmann, M. K. Ganai, and V. Paruthi, “Circuit-based Boolean reasoning,” in
ACM/IEEE Design Automation Conference, (Las Vegas, NV), pp. 232–237, June 2001.
[52] Z. Kohavi, Switching and Finite Automata Theory. New York, NY: McGraw-Hill, 1978.
[53] R. E. Bryant, “Graph-based algorithms for Boolean function manipulation,” IEEE
Transactions on Computers, vol. C-35, pp. 677–691, August 1986.
[54] P. F. Williams, A. Biere, E. M. Clarke, and A. Gupta, “Combining decision diagrams and
SAT procedures for efficient symbolic model checking,” in Computer-Aided Verification
(CAV’00), (Chicago, IL), pp. 124–138, July 2000.
[55] K. L. McMillan, “Applying SAT methods in unbounded symbolic model checking,” in
Computer-Aided Verification (CAV’02), (Copenhagen, Denmark), pp. 250–264, July 2002.
[56] H. Cho, G. D. Hachtel, E. Macii, B. Plessier, and F. Somenzi, “Algorithms for approximate
FSM traversal based on state space decomposition,” IEEE Transactions on Computer-Aided
Design, vol. 15, pp. 1465–1478, December 1996.
[57] A. Biere, A. Cimatti, E. M. Clarke, and Y. Zhu, “Symbolic model checking without BDDs,”
in Tools and Algorithms for Construction and Analysis of Systems, (Amsterdam, The Nether-
lands), pp. 193–207, March 1999.
[58] M. K. Ganai, A. Aziz, and A. Kuehlmann, “Enhancing simulation with BDDs and ATPG,”
in ACM/IEEE Design Automation Conference, (New Orleans, LA), pp. 385–390, June 1999.
[59] P.-H. Ho, T. Shiple, K. Harer, J. Kukula, R. Damiano, V. Bertacco, J. Taylor, and J. Long,
“Smart simulation using collaborative formal and simulation engines,” in IEEE/ACM
International Conference on Computer-Aided Design, (San Jose, CA), pp. 120–126, November
2000.
[60] D. Deharbe and A. M. Moreira, “Using induction and BDDs to model check invariants,”
in Correct Hardware Design and Verification Methods (CHARME’97), (Montreal, Canada),
pp. 203–213, October 1997.
[61] M. Sheeran, S. Singh, and G. Stalmarck, “Checking safety properties using induction and
a SAT-solver,” in Formal Methods in Computer-Aided Design, (Austin, TX), pp. 108–125,
November 2000.
[62] L. J. Stockmeyer and A. R. Meyer, “Word problems requiring exponential time,” in
Proceedings of the 5th ACM Symposium on the Theory of Computing, (Austin, TX), pp. 1–9, April
1973.
[63] C.-C. Yen, K.-C. Chen, and J.-Y. Jou, “A practical approach to cycle bound estimation,” in
International Workshop on Logic & Synthesis, (New Orleans, LA), pp. 149–154, June 2002.
[64] R. K. Ranjan, Design and Implementation Verification of Finite State Systems. PhD thesis,
University of California at Berkeley, Berkeley, CA, December 1997.
[65] I. Beer, S. Ben-David, D. Geist, R. Gewirtzman, and M. Yoeli, “Methodology and system for
practical formal verification of reactive hardware,” in Computer-Aided Verification (CAV’94),
(Stanford, CA), pp. 182–193, July 1994.
[66] C. Leiserson and J. Saxe, “Retiming synchronous circuitry,” Algorithmica, vol. 6, pp. 5–35,
1991.
[67] G. P. Bischoff, K. S. Brace, S. Jain, and R. Razdan, “Formal implementation verification of
the bus interface unit for the Alpha 21264 microprocessor,” in IEEE International Conference
on Computer Design, (Austin, TX), pp. 16–24, October 1997.
[68] C. Leiserson and J. Saxe, “Optimizing synchronous systems,” Journal of VLSI and Computer
Systems, vol. 1, pp. 41–67, January 1983.
[69] S. Malik, E. M. Sentovich, R. K. Brayton, and A. Sangiovanni-Vincentelli, “Retiming and
resynthesis: Optimizing sequential networks with combinational techniques,” IEEE
Transactions on Computer-Aided Design, vol. 10, pp. 74–84, January 1991.
[70] S. Hassoun and C. Ebeling, “Experiments in the iterative application of resynthesis and re-
timing,” in International Workshop on Timing Issues in the Specification and Synthesis of
Digital Systems, December 1997.
[71] A. Gupta, P. Ashar, and S. Malik, “Exploiting retiming in a guided simulation based valida-
tion methodology,” in Correct Hardware Design and Verification Methods (CHARME’99),
(Bad Herrenalb, Germany), pp. 350–353, September 1999.
[72] S. Hassoun and C. Ebeling, “Using precomputation in architecture and logic synthesis,” in
IEEE/ACM International Conference on Computer-Aided Design, (San Jose, CA), pp. 316–
323, November 1998.
[73] J. J. Forrest. Personal communication, 2000.
[74] G. B. Dantzig, Linear Programming and Extensions. Princeton University Press, 1963.
[75] J. B. Orlin, “A faster strongly polynomial minimum cost flow algorithm,” in Proceedings of
the 20th ACM Symposium on the Theory of Computing, (Chicago, IL), pp. 377–387, May
1988.
[76] H. J. Touati and R. K. Brayton, “Computing the initial states of retimed circuits,” IEEE
Transactions on Computer-Aided Design, vol. 12, pp. 157–162, January 1993.
[77] G. Even, I. Y. Spillinger, and L. Stok, “Retiming revisited and reversed,” IEEE Transactions
on Computer-Aided Design, vol. 15, pp. 348–357, March 1996.
[78] G. Cabodi, S. Quer, and F. Somenzi, “Optimizing sequential verification by retiming
transformations,” in ACM/IEEE Design Automation Conference, (Los Angeles, CA), pp. 601–606,
June 2000.
[79] E. Lehman, Y. Watanabe, J. Grodstein, and H. Harkness, “Logic decomposition during
technology mapping,” IEEE Transactions on Computer-Aided Design, vol. 16, pp. 813–834,
August 1997.
[80] G. D. Micheli, “Synchronous logic synthesis: Algorithms for cycle-time minimization,”
IEEE Transactions on Computer-Aided Design, vol. 10, pp. 63–73, January 1991.
[81] M. S. Hung, W. O. Rom, and A. D. Waren, Optimization with IBM OSL. Scientific Press,
1993.
[82] R. K. Brayton, G. D. Hachtel, A. Sangiovanni-Vincentelli, F. Somenzi, A. Aziz, S.-T. Cheng,
S. Edwards, S. Khatri, Y. Kukimoto, A. Pardo, S. Qadeer, R. K. Ranjan, S. Sarwary, T. R.
Shiple, G. Swamy, and T. Villa, “VIS: A system for verification and synthesis,” in
Computer-Aided Verification (CAV’96), (New Brunswick, NJ), pp. 428–432, July 1996.
[83] R. K. Ranjan, A. Aziz, R. K. Brayton, B. Plessier, and C. Pixley, “Efficient BDD algorithms
for FSM synthesis and verification,” in International Workshop on Logic & Synthesis, (Lake
Tahoe, NV), June 1995.
[84] I.-H. Moon, G. D. Hachtel, and F. Somenzi, “Border-block triangular form and conjunction
schedule in image computation,” in Formal Methods in Computer-Aided Design, (Austin,
TX), pp. 73–90, November 2000.
[85] A. Dovier, C. Piazza, and A. Policriti, “A fast bisimulation algorithm,” in Computer-Aided
Verification (CAV’01), (Paris, France), pp. 79–90, July 2001.
[86] P. Jain and G. Gopalakrishnan, “Efficient symbolic simulation-based verification using the
parametric form of Boolean expressions,” IEEE Transactions on Computer-Aided Design,
vol. 13, pp. 1005–1015, April 1994.
[87] M. D. Aagaard, R. B. Jones, and C.-J. H. Seger, “Formal verification using parametric
representations of Boolean constraints,” in ACM/IEEE Design Automation Conference, (New
Orleans, LA), pp. 402–407, June 1999.
[88] I.-H. Moon, H. H. Kwak, J. Kukula, T. Shiple, and C. Pixley, “Simplifying circuits for formal
verification using parametric representation,” in Formal Methods in Computer-Aided Design,
(Portland, OR), pp. 52–69, November 2002.
[89] J. H. Kukula and T. R. Shiple, “Building circuits from relations,” in Computer-Aided Verifi-
cation (CAV’00), (Chicago, IL), pp. 113–123, July 2000.
[90] L. R. Ford and D. R. Fulkerson, “Maximal flow through a network,” Canadian Journal of
Mathematics, vol. 8, pp. 399–404, 1956.
[91] J. Yuan, J. Shen, J. Abraham, and A. Aziz, “On combining formal and informal verification,”
in Computer-Aided Verification (CAV’97), (Haifa, Israel), pp. 376–387, June 1997.
[92] C. H. Yang and D. L. Dill, “Validation with guided search of the state space,” in ACM/IEEE
Design Automation Conference, (San Francisco, CA), pp. 599–604, June 1998.
[93] L. de Alfaro, T. A. Henzinger, and F. Y. C. Mang, “Detecting errors before reaching them,”
in Computer-Aided Verification (CAV’00), (Chicago, IL), pp. 186–201, July 2000.
[94] J. R. Burch, E. M. Clarke, D. E. Long, K. L. McMillan, and D. L. Dill, “Symbolic model
checking for sequential circuit verification,” IEEE Transactions on Computer-Aided Design,
vol. 13, pp. 401–424, April 1994.
[95] O. Coudert, C. Berthet, and J. C. Madre, “Verification of sequential machines using Boolean
functional vectors,” in IMEC-IFIP International Workshop on Applied Formal Methods for
Correct VLSI Design, (Leuven, Belgium), pp. 111–128, November 1989.
[96] T. Filkorn, “Functional extension of symbolic model checking,” in Computer-Aided
Verification (CAV’91), (Aalborg, Denmark), pp. 225–232, June 1991.
[97] Y. Hong, P. A. Beerel, J. R. Burch, and K. L. McMillan, “Safe BDD minimization using
don’t cares,” in ACM/IEEE Design Automation Conference, (Anaheim, CA), pp. 208–213,
June 1997.
[98] M. Ganai and A. Aziz, “Enhancements to invariant verification using SIVA,” in International
Workshop on High Level Design Validation and Test (HLDVT’99), (San Diego, CA), Novem-
ber 1999.
[99] M. K. Ganai, Algorithms for Efficient State Space Search. PhD thesis, University of Texas,
Austin, TX, May 2001.
[100] P. Ashar, S. Devadas, and K. Keutzer, “Gate-delay-fault testability properties of multiplexor-
based networks,” Formal Methods in System Design, vol. 2, no. 1, pp. 93–112, 1993.
[101] C. H. Yang and D. L. Dill, “Spotlight: Best-first search of FSM state space,” in International
Workshop on High Level Design Validation and Test (HLDVT’96), (Oakland, CA), November
1996.
[102] P. Yalagandula, V. Singhal, and A. Aziz, “Automatic lighthouse generation for directed state
space search,” in Design, Automation, and Test in Europe, (Paris, France), pp. 237–242,
March 2000.
[103] I. Beer, S. Ben-David, C. Eisner, and A. Landver, “RuleBase: an industry-oriented formal
verification tool,” in ACM/IEEE Design Automation Conference, (Las Vegas, NV), pp. 655–
660, June 1996.
[104] R. Rudell, “Dynamic variable ordering for ordered binary decision diagrams,” in
International Workshop on Logic & Synthesis, (Tahoe City, CA), May 1993.
[105] J. Cheriyan and S. N. Maheshwari, “Analysis of preflow push algorithms for maximum net-
work flow,” SIAM Journal on Computing, vol. 18, no. 6, pp. 1057–1086, 1989.
[106] G. Hasteer, A. Mathur, and P. Banerjee, “Efficient equivalence checking of multi-phase de-
signs using retiming,” in IEEE/ACM International Conference on Computer-Aided Design,
(San Jose, CA), pp. 557–562, November 1998.
[107] E. A. Emerson and J. Y. Halpern, “‘Sometimes’ and ‘not never’ revisited: on branching time
versus linear time temporal logic,” Journal of the ACM, vol. 33, no. 1, pp. 151–178, 1986.
[108] T. A. Henzinger, S. Qadeer, and S. K. Rajamani, “Assume-guarantee refinement between
different time scales,” in Computer-Aided Verification (CAV’99), (Trento, Italy), pp. 208–
221, July 1999.
[109] A. R. Albrecht and A. J. Hu, “Register transformations with multiple clock domains,” in
Correct Hardware Design and Verification Methods (CHARME’01), (Livingston, Scotland),
pp. 126–139, September 2001.
[110] A. J. Hu, G. York, and D. L. Dill, “New techniques for efficient verification with implicitly
conjoined BDDs,” in ACM/IEEE Design Automation Conference, (San Diego, CA), pp. 276–
282, June 1994.
Vita
Jason Raymond Baumgartner received his Bachelor of Science in Electrical Engineering
from the University of Florida in May 1995. He immediately joined IBM’s Server Group
in Austin, TX, becoming involved in hardware verification. He began graduate school in
the Computer Engineering program at the University of Texas at Austin in 1996, receiving
his Master of Science in 1998. He immediately became captivated by algorithms and
mathematical logic, and their implications for formal verification. This interest led him
to begin deploying model checking technologies at IBM. His efforts have uncovered
hundreds of complex design flaws, and helped to establish formal verification as an essential
complementary verification technique for emerging designs. His research is focused upon
automatic abstraction techniques to enable formal verification to scale to large and complex
industrial designs.
Permanent Address: 14936 Purslane Meadow Trail
Austin, TX 78728
This dissertation was typeset with LaTeX 2ε by the author.
LaTeX 2ε is an extension of LaTeX. LaTeX is a collection of macros for TeX. TeX is a trademark of the American Mathematical Society. Some of the macros used in formatting this dissertation were written by Dinesh Das, Department of Computer Sciences, The University of Texas at Austin, and extended by Bert Kay and James A. Bednar.