Copyright
by
Jason Raymond Baumgartner
2002
The Dissertation Committee for Jason Raymond Baumgartner
certifies that this is the approved version of the following dissertation:
Automatic Structural Abstraction Techniques for
Enhanced Verification
Committee:
Jacob Abraham, Supervisor
Andreas Kuehlmann
Adnan Aziz
E. Allen Emerson
Lizy Kurian John
Automatic Structural Abstraction Techniques for
Enhanced Verification
by
Jason Raymond Baumgartner, B.S., M.S.
Dissertation
Presented to the Faculty of the Graduate School of
The University of Texas at Austin
in Partial Fulfillment
of the Requirements
for the Degree of
Doctor of Philosophy
The University of Texas at Austin
December 2002
To my wife Shelly, to my parents,
to my grandmothers and the memory of my grandfathers, to the memory of Clover,
and to the semi-formal team at IBM
Acknowledgments
This acknowledgment is partitioned into several components.

• I wish to first acknowledge those who were the most central to the research contributions of this thesis, as indeed this is the acknowledgment section of this thesis.
– After graduation with a Bachelor of Science in Electrical Engineering, I joined
IBM Austin in 1995, and quickly became involved in functional verification.
In 1996 I began the Master’s program at UT, and one of my first classes was
Formal Verification taught by Adnan Aziz. This class was the reason that I developed a passion for formal verification; within several months I began model checking at IBM using RuleBase. Furthermore, my term projects in Adnan's classes involved structural abstractions entailing state folding in a property checking framework, which evolved into phase abstraction and c-slow abstraction. I wish to thank Adnan for bringing me into the world of formal verification. I also wish to thank and acknowledge Vigyan Singhal for collaborations and insight during these projects. Both Adnan and Vigyan were instrumental to the development of my skills in technical writing and formal reasoning.
– I next wish to acknowledge Andreas Kuehlmann, whom I met in 1999. By
this time I had been deploying functional formal verification at IBM for several years, relying upon every trick in the book and a few others to attempt to push large Gigahertz designs through model checkers.1 Numerous ideas for the automation of some of these tricks were floating around my head at this time, such as generalizations of the acyclic c-slow abstraction which evolved into our approach for structural diameter overapproximation, and some reductions that have been subsumed by retiming and structural target enlargement. Andreas was the visionary behind the concept of transformation-based verification, enabling the synergistic application of various structural abstractions in a verification framework. I immediately acknowledged this framework as the cohesive force to unify all of my ideas and more. Andreas additionally provided many brilliant concepts to the retiming work described herein, and has been central to the development of my technical writing and formal reasoning skills since.
One virtue (or curse) I inherited through Andreas is the notion that “Excellent
is not good enough, because there is always better.” What this means below
the surface of the words is that behind every good idea is an even more general
concept.2 I owe many of the results of this thesis, and much of my technical
focus, to Andreas.
– I wish to briefly acknowledge those whose efforts have most inspired and influenced this work, and those who have provided me stimulating discussion during the past: my supervisor Jacob Abraham, E. Allen Emerson, Aarti Gupta, Alan Hu, Kenneth McMillan, Steven German, Edmund Clarke, Armin Biere, Carl Pixley, Pranav Ashar, Alessandro Cimatti, In-Ho Moon, James Kukula, Thomas Shiple, Fabio Somenzi, Kavita Ravi, Robert Kurshan, Flemming Andersen, James Saxe, Malay Ganai, and many others whose names appear in the bibliography.
1 I wish to thank Tamir Heyman for teaching me many of these tricks, including that of optimally exploiting insomnia.
2 A similar idea was quoted by my Integer Programming professor Dr. Gang Yu: "Behind complexity, there is always simplicity to be revealed. Inside simplicity, there is always complexity to be discovered."
– I lastly wish to thank some of the outstanding professors that I have had the privilege to learn from while at UT: Greg Plaxton, Vladimir Lifschitz, Lizy Kurian John, Gang Yu, Margarida Jacome, and Joydeep Ghosh.

• I next wish to acknowledge those at IBM who have influenced this research, as well as the IBM Server Group as a whole for supporting this work, and for providing the real-world motivation for many of the techniques developed herein.
– Many of the techniques described in this thesis have been implemented within IBM's semi-formal verification tool. This project has enabled numerous experimental results reported herein, and provided the motivation to wring through implementation details that otherwise would likely have gone unexplored. I want to thank those at IBM who have helped make this project happen, including Dave T. Nelson, one of the major motivating factors behind this project; Wolfgang Roesner, the technically cohesive force between this project and the rest of IBM; and Victor Rodriguez, the manager of this project who has been central to keeping it on its tracks. I next wish to thank the semi-formal team itself, which is the best development team one could ever hope to be a part of: Viresh Paruthi, Mark Williams, Bob Kanzelman, Hari Mony, Jessie Xu, and Yee Ja.
– I additionally wish to thank those who have assisted this project via support and development of adjacent components: Steven Bergman, Matyas Sustik, Ali El-Zein, Zoltan Hidvegi, Robert Shadowen, Geert Janssen, Paul Roessler, Gavin Meil, John J. Forrest, and Scott Mack.

• I last, but certainly not least, wish to acknowledge those who have influenced my life during this period.
– I wish to thank my motivating, supportive, and all-around optimal wife Shelly, without whom my sanity, and quite possibly my will to live, would long since have vanished during this grueling effort of graduate work in parallel to an extremely time-consuming full-time job.
– I wish to acknowledge our present stress-relief lops Mocha, Loppy, and Beary for adding some amusement to my life. I also wish to acknowledge the memory of Clover; her ever-cheerful nature served to pick up my spirits no matter how difficult and stressful times became during much of the period that I was working on this thesis.
– I wish to thank my parents for always encouraging me to achieve.
– I wish to thank the rest of my family for words of encouragement.
– I wish to thank my lifelong friends who provided me buffer overflow protection:
Hagop Jay Tumayan, Raymond Jones, Chris Bald, Mark Dungan, and Biz E. J.
Marquis.
– I lastly wish to thank my spatially-immediate friends whom I have had the opportunity to meet only more recently, who also provided relief from terminal steam build-up: Kenneth Klapproth, David Mui, Praveen and Sona Reddy, Jerome Delune, John Spencer, Susann Keohane, David Fink, Andy Murati, James Marrone, Michael Barenys, Steve Roberts, and Jun Sawada.
JASON RAYMOND BAUMGARTNER
The University of Texas at Austin
December 2002
Automatic Structural Abstraction Techniques for
Enhanced Verification
Publication No.
Jason Raymond Baumgartner, Ph.D.
The University of Texas at Austin, 2002
Supervisor: Jacob Abraham
Computers have become central components of nearly every facet of modern life. Advances in hardware development have resulted in computers more powerful than the largest mainframe of the last decade becoming available and affordable for general use. This in turn has enabled problems which were historically intractable to become solvable with present technologies. This trend has been noted for four decades.
Functional verification is the process of validating that a design conforms to its specification. Exhaustive verification generally requires exponential resources with respect to design size, hence there is a fine line between "solvable" and "intractable"; this cut-off point is unfortunately often far smaller than that which is practically necessary. Due to ongoing increases in hardware design size, direct application of exhaustive techniques to verify these designs requires exponentially-growing verification resources which outpace available boosts in computing power. Therefore, on the surface, Moore's law works against the hardware verification community.
This thesis presents an approach to battling verification complexity via automatic abstraction techniques which transform the structure of a design. These techniques require only polynomial resources with respect to design size, and may yield exponential speedups to the verification process. These abstractions are developed as components of a modular transformation-based verification framework, enabling optimal synergy between the various techniques.

Our specific contributions include: 1) a compositional and structural diameter overapproximation technique, enabling the use of abstractions to tighten the produced bounds; 2) an on-the-fly retiming technique for redundancy removal; 3) the concept of fanin register sharing to enhance min-area retiming; 4) a generalized retiming approach which eliminates reset state and input-output equivalence constraints, and supports negative registers; 5) structural cut-based abstraction; 6) a structural target enlargement approach; 7) the technique of c-slow abstraction; and 8) the technique of phase abstraction. Numerous experiments demonstrate the utility and synergy of these techniques in simplifying difficult problems. We therefore feel that these techniques comprise a significant step towards a scalable, automated verification system, helping to realize the prediction made by E. Allen Emerson that "Someday, Moore's Law will work for us [the verification community], rather than against us."
Contents
Acknowledgments v
Abstract ix
List of Tables xiv
List of Figures xv
Chapter 1 Background and Scope 1
Chapter 2 Previous Work 13
Chapter 3 Netlists: Syntax and Semantics 18
3.1 Verification Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . 26
3.2 Figure Symbols . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
Chapter 4 Diameter Overapproximation Techniques 30
4.1 Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
4.2 Experimental Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
Chapter 5 Redundancy Removal 44
5.1 Redundancy Removal Algorithms . . . . . . . . . . . . . . . . . . . . . 46
5.2 Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
5.3 Experimental Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
Chapter 6 Generalized Retiming 58
6.1 Retiming Formulation . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
6.1.1 Fanout REGISTER Sharing . . . . . . . . . . . . . . . . . . . 60
6.1.2 Fanin REGISTER Sharing . . . . . . . . . . . . . . . . . . . . 61
6.1.3 Relaxing Input-Output Equivalence Constraints . . . . . . . . 64
6.1.4 Enabling NEGATIVE REGISTERs . . . . . . . . . . . . . . . 66
6.1.5 Normalized Retiming . . . . . . . . . . . . . . . . . . . . . . 70
6.2 Retiming for Enhanced Verification . . . . . . . . . . . . . . . . . . . . 71
6.3 Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
6.4 Experimental Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
6.4.1 Redundancy Removal Experiments . . . . . . . . . . . . . . . 79
6.4.2 Retiming Experiments . . . . . . . . . . . . . . . . . . . . . . 84
6.4.3 Diameter Overapproximation Experiments . . . . . . . . . . . 88
Chapter 7 Cut-Based Abstraction 94
7.1 Cut-Based Abstraction Algorithms . . . . . . . . . . . . . . . . . . . . 98
7.2 Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 106
7.3 Experimental Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107
Chapter 8 Structural Target Enlargement 113
8.1 Target Enlargement Algorithms . . . . . . . . . . . . . . . . . . . . . . 118
8.2 Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 121
8.3 Experimental Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . 123
Chapter 9 C-Slow Abstraction 127
9.1 C-Slow Abstraction Algorithms . . . . . . . . . . . . . . . . . . . . . . 142
9.2 Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 144
9.3 Experimental Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . 145
Chapter 10 Phase Abstraction 148
10.1 Phase Abstraction Algorithms . . . . . . . . . . . . . . . . . . . . . . . 164
10.2 Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 166
10.3 Experimental Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . 169
Chapter 11 Conclusions and Future Work 172
Appendix A Appendix 175
A.1 Modeling Interconnections as Nets . . . . . . . . . . . . . . . . . . . . 175
A.2 Alternate Gate Types . . . . . . . . . . . . . . . . . . . . . . . . . . . . 176
Bibliography 178
Vita 190
List of Tables
6.1 Retiming results for ISCAS89 benchmarks . . . . . . . . . . . . . . . . 81
6.2 Retiming results for IBM Gigahertz Processor (GP) netlists . . . . . . . 82
6.3 Generalized retiming results for ISCAS89 and GP netlists . . . . . . . . 86
6.4 Effect of retiming on reachability analysis . . . . . . . . . . . . . . . . . 87
6.5 Diameter experiments for ISCAS89 benchmarks . . . . . . . . . . . . . 91
6.6 Diameter experiments for GP netlists . . . . . . . . . . . . . . . . . . . 92
7.1 Cut results for ISCAS89 benchmarks . . . . . . . . . . . . . . . . . . . 110
7.2 Cut results for GP netlists . . . . . . . . . . . . . . . . . . . . . . . . . 111
8.1 Target enlargement results for ISCAS89 benchmarks . . . . . . . . . . . 124
8.2 Target enlargement results for GP netlists . . . . . . . . . . . . . . . . . 126
10.1 Phase abstraction results for GP netlists . . . . . . . . . . . . . . . . . . 170
List of Figures
1.1 Invariant checking methodology . . . . . . . . . . . . . . . . . . . . . . 5
1.2 Example flow of transformation-based verification system . . . . . . . . 11
3.1 Simulate algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
3.2 Figure symbols . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
4.1 Depiction of netlist partitioning for diameter overapproximation . . . . . 34
4.2 Structural diameter overapproximation algorithm . . . . . . . . . . . . . 37
4.3 Diameter overapproximation example . . . . . . . . . . . . . . . . . . . 40
5.1 Structural Merge algorithm . . . . . . . . . . . . . . . . . . . . . . . . . 45
5.2 Mapping an AND/INVERTER/REGISTER graph to a netlist . . . . . . . 48
5.3 Fanout REGISTER sharing example . . . . . . . . . . . . . . . . . . . . 49
5.4 AND/INVERTER/REGISTER-graph algorithm for AND gate creation . . 51
5.5 AND/INVERTER/REGISTER-graph algorithm for REGISTER creation . 52
5.6 On-the-fly retiming example . . . . . . . . . . . . . . . . . . . . . . . . 53
5.7 AND/INVERTER/REGISTER graph example . . . . . . . . . . . . . . . 54
6.1 Example of ILP modeling of fanout and fanin REGISTER sharing . . . . 61
6.2 Decomposition of AND vertex for optimal fanin REGISTER sharing . . . 62
6.3 Retiming graph example . . . . . . . . . . . . . . . . . . . . . . . . . . 63
6.4 Alternate retiming graph example . . . . . . . . . . . . . . . . . . . . . 64
6.5 Alternate retiming graph example with relaxations . . . . . . . . . . . . 65
6.6 Retimed netlist example with a NEGATIVE REGISTER . . . . . . . . . 68
6.7 Example of incorrect ILP modeling of sharing with relaxed non-negativity constraints . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70
6.8 Temporal decomposition of a retimed netlist . . . . . . . . . . . . . . . . 72
6.9 Example netlist depicting how on-the-fly retiming may hurt REGISTER count 84
6.10 BDD profile for reachability of S3330 with retiming and redundancy removal 89
7.1 Cut abstraction trace lifting algorithm . . . . . . . . . . . . . . . . . . . 97
7.2 Top-level Cut Abstract algorithm . . . . . . . . . . . . . . . . . . . . . . 100
7.3 Analyze Cut algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . 102
7.4 Synthesize Set algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . 103
7.5 BDD synthesis example . . . . . . . . . . . . . . . . . . . . . . . . . . . 104
8.1 Enlarge Target algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . 115
8.2 Top-level target enlargement flow . . . . . . . . . . . . . . . . . . . . . 119
8.3 Target enlargement trace lifting algorithm . . . . . . . . . . . . . . . . . 119
9.1 Example three-slow netlist . . . . . . . . . . . . . . . . . . . . . . . . . 129
9.2 Recurrence structure of abstracted three-slow netlist . . . . . . . . . . . 130
9.3 Initialization structure of abstracted three-slow netlist . . . . . . . . . . . 130
9.4 Algorithm for preprocessing generalized c-slow netlists . . . . . . . . . . 137
9.5 C-Slow trace lifting algorithm . . . . . . . . . . . . . . . . . . . . . . . 140
9.6 C Slow Abstract algorithm . . . . . . . . . . . . . . . . . . . . . . . . . 142
9.7 Algorithms for coloring generalized c-slow netlists . . . . . . . . . . . . 143
10.1 Semantics-preserving translation of LATCHes to REGISTERs . . . . . . 150
10.2 Example netlist with two minimal dependent layers . . . . . . . . . . . . 153
10.3 Example two-phase netlist . . . . . . . . . . . . . . . . . . . . . . . . . 154
10.4 Phase-abstracted netlist . . . . . . . . . . . . . . . . . . . . . . . . . . . 156
10.5 Alternate phase-abstracted netlist . . . . . . . . . . . . . . . . . . . . . 157
10.6 Phase Abstract algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . 161
10.7 Phase abstraction trace lifting algorithm . . . . . . . . . . . . . . . . . . 162
10.8 MDL partitioning algorithm . . . . . . . . . . . . . . . . . . . . . . . . 165
10.9 Example three-phase MDL . . . . . . . . . . . . . . . . . . . . . . . . . 166
A.1 Remodeling GATED-CLOCK REGISTERs . . . . . . . . . . . . . . . . 177
Chapter 1
Background and Scope
Computers have become pervasive elements in our daily lives. Advances in hardware development have resulted in computers more powerful than the largest mainframe of the last decade becoming available and affordable as deskside units for general use. Computers have found their way into nearly every electronic device manufactured today – becoming central components of communications, entertainment, medical, and transportation devices, to name a few. Several-generation-old computer technology may be manufactured for mere pennies. The trend of bigger (in capacity), smaller (in physical size), faster, and cheaper is characteristic of microprocessor, memory, and storage device technologies [1, 2, 3].
The digital design process consists of several distinct phases. First is the architecture stage, wherein the desired functionality and performance are used to dictate the type of logic components that will be used to realize the design. These components are next implemented, often using a register-transfer level hardware description language (HDL). Finally, through logic synthesis, the HDL models are refined down to gate-level models, which in turn are refined down to transistor-level models suitable ultimately as a template for silicon [4]. Verification is the process of assessing that the design conforms to its specification, and is performed at various stages of the design flow.
Among the technical advances required to enable the design of high performance
computers are powerful computers themselves – today’s high-end computers are critical
tools to the crafting of tomorrow’s even higher-end computers. There are two primary
tasks related to hardware design for which computers are irreplaceable: logic synthesis
and verification. Verification is the focus of this thesis. We further will focus only upon timing-independent functional verification of gate-level hardware models; combinational propagation delays are assumed to be zero, and the domain of verification is to prove that certain properties hold in all reachable states of the design. Additionally, we will focus only upon automatic verification paradigms which carry out a proof without a need for manual intervention, hence we will not discuss theorem proving techniques [5, 6].
Computing machines are generally reactive to input stimulus, and sequential, hence may "remember" past input stimuli and computations via state-holding elements such as registers. The verification of such designs is extremely complex; in addition to needing to verify these designs against every possible input stimulus from the environment, their sequentiality requires verification against all possible sequences of stimuli. Even a simple invariant check – that some predicate holds in all reachable states of the design – is PSPACE-complete [7], hence generally requires exponential verification resources with respect to design size. This complexity limits the applicability of exhaustive verification techniques to designs with several hundred state-holding elements or less, while modern processors may have millions of state-holding elements, even ignoring main memory and caches. Additionally, the individual "building block" components of such a design – such as arithmetic logic, prefetch logic, address translation logic, and cache controllers – are likely to contain many thousands of state elements.
Traditionally, simulation-based approaches are used for verification. In simulation, testcases (sequences of input stimuli) are developed manually or randomly, and the behavior of the design as subjected to these testcases is explored. The benefit of test-based approaches is that they are scalable, and may be applied to designs of almost any size. Their drawback is that they are incomplete; the fraction of design behavior – e.g., coverage of reachable states – that can possibly be explored through simulation given fixed resources generally decreases exponentially with design size. Many techniques have been developed to increase the coverage attainable through simulation, such as high-level model-based test generation and the use of coverage analysis to direct test generation [8]. If done cleverly, simulation has the ability to flush out many design flaws, and will likely always have an important role in design validation due to its scalability. However, as design sizes increase, test-based approaches must be deployed in an ad-hoc rather than systematic and complete manner, hence they cannot prove the absence of errors. The incomplete nature of simulation implies that certain design flaws will go unexposed. Even one missed design flaw may cost a company hundreds of millions of dollars to rectify, cause project cancellation due to lost time-to-market, and even risk the loss of human life as computers are finding their way into safety-critical applications such as transportation and medicine.
Due to the limitations of test-based approaches, there has been an increased effort throughout the industry to exploit formal verification techniques. Formal verification (FV) addresses the coverage problem; it exhaustively considers all possible design behavior, hence has the ability to prove the absence as well as the presence of design errors. As mentioned above, exhaustive verification generally consumes exponential resources with respect to design size, hence is of limited applicability to larger designs, which arise frequently in industrial applications. Unlike test-based approaches which circumvent this complexity through incompleteness, the exhaustiveness of FV directly entails this complexity. Nevertheless, the only way to guarantee the absence of bugs is through formal techniques, and industrial designs are the ultimate target for FV application – hence there exists a need for a robust mechanism to allow FV to scale up to large designs. Modern high-performance designs pose many challenges to verification due to the very characteristics which are intrinsic to achieving their high performance: structural redundancy (or near-redundancy) such as duplication of logic and storage to minimize propagation delays to distinct fanout points, and a high degree of pipelining [9]. The higher degree of pipelining often implies higher complexity due to more "timing window" conditions; e.g., cache-type logic must correlate processor accesses against external snoop requests not only concurrently but across several clock periods, since the processing of both is spread across several clock periods. Such timing windows often render test-based techniques grossly insufficient at exposing design flaws. The design-for-high-performance-silicon paradigm further poses significant barriers to the application of formal techniques: due to the increasing number of state-holding elements for a design with specific functionality, FV faces increasing challenges in keeping up with the pace of high-performance designs.
The focus of this thesis is a novel approach to battling the FV complexity barrier through the use of automatic structural abstraction techniques. These abstractions reduce the complexity of verification of a design through a transformation of its structure. Our abstraction techniques are largely motivated by industrial design characteristics. By restricting our techniques to using fast graph-based analysis, we constrain their computational resources to polynomial with respect to design size, and enable exponential speedups in the verification process. We develop our abstractions as components of a general and modular transformation-based verification framework as proposed in [10]. This framework enables optimal synergy between the various algorithms for problem simplification and decomposition, and run-time configurability of algorithm flows to most efficiently discharge the verification problem at hand.
We assume that our netlist comprises a composition of the design, its driver (also known as the environment, encoding input assumptions [11]), and correctness monitors referred to as property automata, as depicted in Figure 1.1. In composition, certain vertices in the design under test (DUT) – such as primary inputs, and possibly also internal vertices – will be merged onto vertices in the driver. This merging algorithm is introduced in Figure 5.1. The driver may also be dependent upon the DUT, hence some vertices of the driver may be merged onto vertices of the DUT; care must be taken not to introduce combinational cycles in this composition. A similar process is used to compose the property automata to the DUT/driver composition. Our verification problem is thus an invariant check – an attempt to find a trace from an initial state to one which assigns a binary one to a target vertex in the property automaton, or to prove that no such trace exists. This would be akin to checking the CTL [12] property AG(¬target) – equivalently, that the invariant ¬target holds in all reachable states of this composition.1 Such a paradigm is sufficient for the verification of safety properties [13, 14, 15], which from our experience is almost always sufficient for industrial verification problems. However, liveness properties cannot be expressed in this system. Practically, one may often decompose hardware liveness properties into a set of conservative safety properties (as a simple example, every request will be granted within n steps rather than "eventually") – though clearly there are limitations to such an approach.
Figure 1.1: Invariant checking methodology
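As a minimal sketch of this invariant-checking paradigm (our own illustrative Python over a toy explicit-state model, not the structural netlist algorithms developed in later chapters; all names and the two-bit-counter example are assumptions for illustration), the check amounts to a reachability search for a state asserting the target:

```python
from collections import deque

def check_invariant(initial_states, next_states, is_target):
    """Breadth-first search over reachable states.

    Returns a counterexample trace (list of states from an initial state
    to a target state) if the target is reachable, or None if the
    invariant AG(not target) holds."""
    frontier = deque((s, (s,)) for s in initial_states)
    visited = set(initial_states)
    while frontier:
        state, trace = frontier.popleft()
        if is_target(state):
            return list(trace)            # trace witnessing the target
        for succ in next_states(state):   # one successor per input valuation
            if succ not in visited:
                visited.add(succ)
                frontier.append((succ, trace + (succ,)))
    return None                           # search exhausted: invariant proven

# Toy design: a two-bit counter that wraps modulo 4.
def next_states(s):
    return {(s + 1) % 4}

# Target "counter reads 3" is reachable; "counter reads 5" is not.
trace = check_invariant({0}, next_states, lambda s: s == 3)
assert trace == [0, 1, 2, 3]
```

The state-explosion problem described in the surrounding text is visible even in this sketch: `visited` grows with the number of reachable states, which is exponential in register count.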
There are several reasons that a more general property checking paradigm is not
1 We use the common "implicit type casting" of binary 1 ≡ boolean true and binary 0 ≡ boolean false throughout this thesis.
discussed in detail herein.

• The goal of this thesis is to describe structural abstraction techniques that are usable with arbitrary verification algorithms – e.g., binary and symbolic simulation, hardware emulation, explicit and symbolic reachability analysis, satisfiability-based bounded model checking, and semi-formal hybrid approaches. It is our experience that each algorithm has its own strengths and weaknesses, and works well on certain designs though not on others. Many industrial problems are too large for direct application of exact verification techniques – abstraction techniques may significantly help, but only up to a point. Therefore, approximate techniques may be the only direct approach of obtaining any verification coverage for a very large problem, at least before expert manual abstraction may be performed (if the latter is even feasible). Many underapproximate algorithms do not handle general property checking.

• The invariant checking paradigm is sufficient for nearly all industrial verification problems due to the ability to convert safety properties to automata, hence this restriction is not a significant practical barrier.

• The lengthy discussion of suitable temporal logic fragments, liveness constraints, and modifications of these to match the structural transformations is unnecessary in our paradigm, and would detract from its primary focus. Furthermore, there may exist theoretical limitations on usability of certain abstractions within a more general property checking environment, hence focusing on such may preclude a fragment of research which is of significant utility in an invariant-checking paradigm. The bulk of the techniques described herein are applicable for use in a more general verification framework (e.g., temporal logic model checking), though there may well be challenges, limitations, and in cases impossibilities. We have in cases explored such extensions [16, 17], though have not sufficiently generalized this research for the above reasons.
The main bottleneck of invariant checking is the potentially exorbitant computational resources necessary for state traversal. In general, there is no clear dependency between the structure or size of a netlist and verification resource requirements. For example, some designs with less than 100 registers are too complex for reachability analysis; others with more than 500 may be simple for reachability analysis. In some cases, reducing register count may increase correlation between them, hence hurt BDD-based analysis. In other cases, one reduction may hurt another – e.g., translating a shift register to a log-2 counter may hurt retiming, since retiming may be able to eliminate all registers of a pipeline but cannot eliminate any registers from a directed cycle. However, our experiments demonstrate that a reduction in netlist size by one technique often enhances the application of other reduction techniques. Furthermore, a smaller netlist graph tends to require less memory and run-time resources for performing verification – often, exponentially lower. In particular, for BDD-based techniques [18, 19] fewer registers result in fewer BDD variables which typically decreases the size of the BDDs representing the set of states and transitions among them. Similarly, in satisfiability-based (SAT-based) state enumeration [20], the complexity of the state recording device proportionally depends upon the number of registers. A second motivation comes from the observation that a reduced number of registers often decreases the functional correlation between them, although as mentioned above the opposite may occasionally occur. Intuitively, register reduction often produces a less scattered state encoding which results in a more compact set of BDDs or cube structure for BDD- or SAT-based reachability analysis, respectively. Thus, the primary objective of our model reductions is to reduce register count.
Our secondary objective is to minimize the number of primary inputs, which we
denote as FREE vertices. At a coarse level, the number of FREE vertices, like the number
of registers, has a bearing upon an upper bound on the size of the transition relation –
and more generally upon the number of distinct functions over these elements, which
provides an upper bound on the maximum number of irredundant vertices in a netlist.
Furthermore,
the more FREE vertices a netlist has, the less likely simulation is to transition the
netlist into a specific state which may occur for only one of exponentially many possible
valuations to
the FREE vertices.
Our third and final objective is to minimize the number of combinational vertices
(e.g., AND gates) in a netlist. The size of a netlist entails a linear-time increase in
the run-time of binary simulation and graph traversal algorithms, and often a superlinear
increase in the resources necessary for SAT-based or BDD-based analysis.
Generally, we wish to minimize some function of these three entities – i.e., it may
not be beneficial to exponentially increase combinational vertex count to yield a small
decrease in register count, but the above is a close approximation to what we have found
to be an optimal objective function. Several of the transformations that we will discuss
have been borrowed from synthesis optimization techniques – for example, retiming and
redundancy removal. The use of structural transformations for enhanced functional
verification is a fairly new topic, whereas such transformations have been used in
synthesis and combinational equivalence checking for many years. Note that the objective
of transformations for optimal synthesis may often differ from that for verification. For
synthesis, one must balance the effect of a transformation upon combinational delay,
circuit size and topology, and power consumption. For verification, our objective is in a
sense more direct – we only care to decrease verification complexity, so minimization of
netlist graph size as per the above objectives is our only concern. However, as mentioned
above, a reduction in netlist size may hurt a verification flow in uncommon cases.
The abstraction techniques described in this thesis are discussed as modules of a
re-entrant engine-based verification toolset. This enables synergistic interactions of
and iteration between engines. Each technique is developed according to the following
criteria.

• It must be "sound and complete" – i.e., it will not cause the user to see a
semantically incorrect trace or obtain an incorrect pass/fail answer.

• It must be capable of efficiently "lifting a trace" obtained from the abstracted
netlist to one consistent with the original, unabstracted netlist.

• It must be capable of receiving a netlist, then handing off a (presumably simpler)
netlist; no extra information may be required to perform the abstraction other than a
structural transformation. However, the encapsulating engine may decompose a problem,
hence split off multiple subproblems.

• It may not require assistance of the user – e.g., manually-guided abstractions will
not be considered. Also, it must operate without special annotation to or syntax of the
implementation; e.g., without a need for word-level predicates in the source HDL.
While user-guided abstraction and the exploitation of more abstract design models may
be arbitrarily useful in simplifying a verification problem, we feel that many of the
benefits attainable in a self-contained abstraction framework are infeasible to
reproduce manually due to their applicability to complex bit-level control logic.
Furthermore, purely automatic abstractions are more extensible in an industrial setting
where much of the design and verification staff is not versed in formal techniques.
Overall, we view manual and automatic abstractions as complementary approaches, and
further investigation of the synergy of these approaches is a promising area of future
research.
By developing our abstraction techniques to adhere to these criteria, we enable
their application within a transformation-based verification framework as proposed in
[10], wherein one may iteratively simplify and decompose a verification problem using a
series of transformations, until the problem is "simple enough" to be solved by a
terminal verification engine. Therefore, our abstractions must operate solely by
structural transformation, and must be optimized for arbitrary subsequent verification
flows. Such a verification system holds tremendous promise for industrial designs, which
are often large and incorporate many diverse types of logic – e.g., control, arbitration,
and table-based storage within a single component. Note that such a modular, engine-based
approach was also key to making
automatic logic synthesis [21, 22] and combinational equivalence checking [23] practical.
Many of the techniques discussed in this thesis have the ability to work synergistically –
i.e., application of one technique may enhance the application of another. Additionally,
these techniques are largely independent of the actual terminal verification algorithms
used, making the contribution of this thesis orthogonal but complementary to the body of
research on
pushing the capacity of verification algorithms.
A particular instance of this system is depicted in Figure 1.2. Note that the netlist
representation at each engine may differ, since each engine may transform the netlist. For
example, after redundancy removal, which merges semantically equivalent vertices, the
resulting netlist will generally contain fewer vertices in the cone-of-influence of the
targets than the netlist prior to redundancy removal. After retiming, some vertices in the
netlist are temporally skewed with respect to the netlist received. It is possible that a
transformation alone may trivialize a target – for example, by merging a target vertex
onto a constant vertex: ZERO or ONE. However, such cases are fairly infrequent, thus our
abstractions are primarily effective in enhancing subsequent verification flows. We
require that any trace returned by an engine be consistent with the netlist received by
that engine. Therefore, an engine which transforms a netlist must undo the effects of its
transformation when receiving a trace from a child engine (which will be in terms of the
transformed netlist) before passing a corresponding trace up to its parent engine, which
will expect the trace to be in terms of the untransformed netlist. Note also that an
engine may instantiate multiple child engines, as depicted in Figure 1.2. This may be
useful to re-use work performed by an ancestor engine flow for a multi-faceted
verification strategy, or to decompose a problem into sub-problems as is done by some of
the algorithms to be discussed.
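The engine contract described above can be illustrated with a small sketch. The class and
method names below are hypothetical (the thesis does not prescribe an API); the sketch
merely shows the receive-transform-delegate-lift discipline, with a toy constant-folding
reduction and a stub terminal engine.

```python
# Hypothetical sketch of the engine contract described above; class and
# method names are illustrative, not prescribed by the thesis.

class Engine:
    """Receives a netlist, optionally transforms it, delegates to a child
    engine, and lifts any returned trace so that it is consistent with
    the netlist this engine originally received."""

    def __init__(self, child=None):
        self.child = child

    def transform(self, netlist):
        return netlist                    # identity by default

    def lift_trace(self, trace):
        return trace                      # undo transform's effect on a trace

    def verify(self, netlist):
        reduced = self.transform(netlist)
        result, trace = self.child.verify(reduced)
        if trace is not None:
            trace = self.lift_trace(trace)    # re-express in our own terms
        return result, trace


class ConstantFoldingEngine(Engine):
    """Toy reduction: note which gates are constant ZERO, so that a
    child's trace can be re-expanded over them."""

    def transform(self, netlist):
        self.merged = {g for g, t in netlist["types"].items() if t == "ZERO"}
        return netlist

    def lift_trace(self, trace):
        lifted = dict(trace)
        for g in self.merged:
            lifted.setdefault((g, 0), 0)  # re-insert merged constants
        return lifted


class TerminalEngine(Engine):
    """Stub decision procedure: reports a hit at time 0 if the target is
    a constant ONE in this toy encoding."""

    def verify(self, netlist):
        t = netlist["target"]
        if netlist["types"][t] == "ONE":
            return "hit", {(t, 0): 1}
        return "unreachable", None
```

A chain such as `ConstantFoldingEngine(child=TerminalEngine())` then behaves as in
Figure 1.2: the reduced netlist flows down the chain, and the trace is lifted on the way
back up.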
This thesis is organized as follows. We first briefly discuss previous work in the
area of abstraction for enhanced verification in Chapter 2, though research most related to
our techniques will be discussed in the corresponding chapters. We next define the syntax
and semantics of our netlist-based representation of the verification problem in Chapter 3.
[Figure 1.2 depicts a chain of engines – HDL Compilation from the User, a Random
Simulation Engine, a Retiming Engine, a Redundancy Removal Engine, a Target Enlargement
Engine, and a Symbolic Reachability Engine – in which each engine passes a transformed
netlist (N, N′, N″, N‴) down to its child and lifts the returned trace (p‴, p″, p′, p)
back up to its parent.]
Figure 1.2: Example flow of transformation-based verification system
We introduce several common verification algorithms in Section 3.1; these are useful tools
in discharging invariants, and are used as components of some of our abstraction
algorithms. We explore the notion of netlist diameter in Chapter 4, and introduce a
compositional structural algorithm for diameter overapproximation, in addition to some
theory that will be used throughout the thesis to enable diameter bounds obtained upon an
abstracted netlist to backward-imply diameter bounds on the unabstracted netlist. This
chapter extends results of collaborative work with Andreas Kuehlmann and Jacob Abraham
reported in [24]. We then begin detailed discussion of several abstraction techniques. The
first topic is redundancy removal, discussed in Chapter 5. Our contribution to this area
is the topic of on-the-fly retiming, which incorporates results of collaborative work with
Andreas Kuehlmann reported in [25]. We next describe the use of generalized min-area
retiming for enhanced verification in Chapter 6, extending results of collaborative work
with Andreas Kuehlmann reported in [10, 25]. We discuss the concept of cut-based
abstraction in Chapter 7. Chapter 8 introduces the topic of structural target enlargement,
extending results of collaborative research with Andreas Kuehlmann and Jacob Abraham
reported in [24]. A discussion of generalized c-slow abstraction follows in Chapter 9,
which generalizes upon results reported in [17] obtained in collaboration with Anson
Tripp, Adnan Aziz, Vigyan Singhal, and Flemming Andersen. The final topic is phase
abstraction, presented in Chapter 10, which extends results of collaborative work with
Tamir Heyman, Vigyan Singhal, and Adnan Aziz reported in [16]. In Chapter 11 we conclude
the thesis and discuss future research directions. In Appendix A we discuss ways to model
more complex gate types and interconnections in our framework. We have organized this
thesis so that a reader interested only in a particular topic may read only the
corresponding chapter, possibly using Chapter 3 as a reference.
Chapter 2
Previous Work
In this chapter, we briefly discuss research related to the use of abstraction for enhanced
verification. Because of the volume of prior work, we limit our focus to abstractions of use
in enhancing automatic invariant checking of netlists. We furthermore focus on abstractions
applicable to general verification flows, hence do not consider abstractions which must be
embedded inside specific verification algorithms. We defer discussion of prior research
most closely related to the topics explored in this thesis until later chapters, so that we may
provide the proper background to discuss them more meaningfully.
Several categories of abstractions seek to compress the state space representation
of a design, though do not focus on ways in which to efficiently represent such reductions
in terms of netlist structure. Therefore, they are not directly useful as reduction
engines in a transformation-based verification setting – though they may be useful as
features of a terminal verification engine. Because of their analysis of state space
representations, they risk outweighing the cost of an invariant check in themselves, and
are often more focused on enhancing temporal logic model checking approaches, which are
more computationally expensive than invariant checking. Such approaches are fundamentally
different from those taken in this thesis – we use fast structural analysis of the netlist
to automatically guide our abstractions, occasionally using semantic analysis (such as
BDDs) only in a resource-bounded manner. Examples of such abstractions include the
following.

• Bisimulation minimization may be used to reduce the state space of a design to
simplify temporal logic verification [26, 27]. As demonstrated in [28], bisimilarity
preserves property checking for any CTL* formula, hence this reduction is sound and
complete, though often weaker than necessary for invariant checking. Such techniques
require analysis of the state space of the design, which is too computationally
expensive to consistently offer benefits to invariant checking, as noted in [27].

• Several similar, more aggressive techniques for reduction of the state space, while
guaranteeing preservation of only a necessary set of formulas (rather than all
formulas), have been proposed, for example in [29]. Experimental results on the utility
vs. cost of such approaches have yet to establish them as useful for invariant checking.

• The topic of abstract interpretation [30, 31] has been proposed to allow reasoning
about abstractions of a design. This work is an excellent framework from which to
theoretically understand many forms of abstraction. However, these approaches do not
directly address how to automatically select abstractions; they instead provide an
infrastructure for reasoning about a selected abstraction of a system.
Numerous practical and powerful abstractions for enhanced verification are proposed
in [32, 33], using a theoretical framework similar to abstract interpretation. These
techniques are generally prone to yielding inexact answers unless fairly stringent
sufficiency conditions are met. These approaches also fall into the category of
manually-obtained abstractions for extracting an abstract state space representation for
temporal logic model checking, rather than for generating a more compact netlist
representation usable in a transformation-based verification setting. Furthermore, the
proposed abstractions are geared more toward simplifying data representations of
word-level models than toward simplifying complex bit-level control logic.
The automated technique of localization introduced in [34] proposes isolating a
sub-netlist local to the property automata by using a free fence – an overapproximate cut
of the fanin cone. This approach is therefore generally prone to false failures, though a
special case of this technique is sound and complete: when the entire cone of influence is
retained and the remainder of the netlist is discarded. Localization, and techniques to
exploit spurious counterexamples obtained through localization to refine its abstraction,
have been the topic of numerous research approaches [34, 35]. This technique is
complementary to our contributions; the sound and complete cone of influence reduction has
become a cornerstone of most practical verification tools, and our abstractions may be
used to simplify the verification of localized cones.
Numerous approaches have been developed for reducing the complexity of word-level
designs. For example, the data-path abstractions proposed in [36, 37, 38] are prone to
false negatives due to exploitation of uninterpreted functions. Some related approaches
require use of custom verification algorithms [36, 38], hence are not applicable in a
general transformation-based verification toolset. Most of these techniques are not
applicable if data fans out to control (except possibly in a constrained fashion)
[36, 37, 38, 39], and require word-level predicates in the design to exploit, which is
also characteristic of automatic predicate abstraction approaches [40]. Many of these
techniques require manual guidance to select the abstraction, and none are discussed as
components of a transformation-based verification system.
Symbolic verification approaches [19, 41] have provided a tremendous increase in
verification capacity, scaling up to designs with hundreds of REGISTERs, hence up to
2^(100k) states for some small k. Such approaches are complementary to our contributions,
and useful as verification engines in our system. Numerous techniques have been proposed
to enhance symbolic verification, such as exploitation of known invariants of the system
[42], partitioning of the transition relation to decompose image computation [43, 44], and
many more – all of which are complementary techniques useful in a transformation-based
verification setting.
Compositional verification approaches [45, 33] have been proposed to isolate
components of large systems for standalone verification using abstract environment models.
The environments encode a set of input assumptions, and the verification task consists of
demonstrating that the composition of the assumptions and the component under test
satisfies certain properties. Such approaches have become invaluable to industrial
verification, and are complementary to our work – indeed, as depicted in Figure 1.1, our
system is designed for use with such a paradigm.
Hierarchical verification approaches seek to reuse the results of verification at a
lower level to simplify higher-level proofs [46]. For example, if one validates that
components of a netlist satisfy a set of properties, one may attempt to compose automata
representations of those properties to validate higher-level properties directly upon the
automata composition, without needing to reason about the underlying implementation. Such
approaches are also complementary to our contributions; our techniques may be used to
enhance verification at the lower level (of the implementation), as well as at the higher
level (of the composition of property automata).
Symmetry reduction techniques [47, 48, 49] seek to identify symmetries of the
underlying design – e.g., parallel instantiations of identical components – for enhancing
the verification process. While many of these techniques are proposed as embedded
components for simplifying specific verification algorithms, and possibly require manual
assistance to identify the symmetry, incorporation of symmetry reductions into our
framework is a promising complementary area of future research.
Structural simplification techniques [50, 51] have become core algorithms of
combinational equivalence checking. In equivalence checking, one has two netlist gates
that one wishes to prove semantically equivalent, often from two levels of refinement of
the design process (e.g., gate-level vs. transistor-level representations). An
exclusive-or is built over these two gates, and the verification goal is to demonstrate
that this exclusive-or is semantically equivalent to zero. Simplification techniques are
used to attempt to identify redundant gates in the fanin cone of the exclusive-or; when
two redundant gates are found, one is merged onto the other. This merging thereby reduces
the size of the problem, and often trivializes it (e.g., by merging the two gates being
compared onto each other). Invariant checking may be viewed as a sequential generalization
of the equivalence checking paradigm, hence structural simplification techniques are
equally applicable to invariant checking. Such techniques are discussed further in
Chapter 5.
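The miter construction described above can be sketched as follows. This is an illustrative
brute-force check, not the structural algorithms of [50, 51]: XOR is synthesized from the
AND/INVERTER primitives of our netlist model, and equivalence to zero is established by
exhaustive input enumeration, which is feasible only for tiny fanin cones.

```python
from itertools import product

def xor_gate(a, b):
    # XOR synthesized from the AND/INVERTER primitives of our model:
    # a XOR b = NOT( NOT(a AND NOT b) AND NOT(NOT a AND b) )
    return int(not (not (a and not b) and not ((not a) and b)))

def miter_unsat(f, g, n_inputs):
    """True iff the exclusive-or miter over gates f and g is semantically
    equivalent to zero, i.e., f and g agree on every input valuation.
    Exhaustive enumeration: a sketch usable only on tiny fanin cones."""
    return all(xor_gate(f(*vals), g(*vals)) == 0
               for vals in product((0, 1), repeat=n_inputs))
```

For example, `a AND b` and its De Morgan form `NOT(NOT a OR NOT b)` yield an
unsatisfiable miter, trivializing the equivalence check.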
Chapter 3
Netlists: Syntax and Semantics
In this chapter we introduce the netlist, and provide structural and semantic definitions that
will be used throughout this thesis to reason about the netlist. A reader well-versed in
hardware verification may wish to skip to the next chapter, referring to this chapter only as
a reference.
Our netlist definition is based upon a directed graph model.
Definition 3.1. A directed graph G = ⟨V, E⟩ consists of a finite set of vertices V, and a
set of directed edges between vertices E ⊆ V × V. For edge (u, v), we refer to u as the
source vertex and v as the sink vertex.
Definition 3.2. We define inlist(U) = {v : ∃u ∈ U. (v, u) ∈ E} as the set of vertices
sourcing input edges to vertex set U. We define the indegree of a vertex set U by
indegree(U) = |inlist(U)|.

Definition 3.3. We define outlist(U) = {v : ∃u ∈ U. (u, v) ∈ E} as the set of vertices
sinking output edges from vertex set U. We define the outdegree of vertex set U by
outdegree(U) = |outlist(U)|.

Definition 3.4. We define fanin cone(U) = U ∪ fanin cone(inlist(U)) for vertex set U.
Due to the monotonicity of evaluation of this definition, and the finiteness of G, this
set is well-formed.
Definition 3.5. We define fanout cone(U) = outlist(U) ∪ fanout cone(outlist(U)) for
vertex set U. This set is well-formed as per the analysis of Definition 3.4.
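Definitions 3.2 through 3.5 are directly executable as fixed-point computations. The
following sketch uses an assumed set-of-edge-pairs encoding (illustrative, not the
thesis's data model); it terminates by the same monotonicity and finiteness argument
given in Definition 3.4.

```python
def inlist(U, edges):
    # vertices sourcing input edges to vertex set U (Definition 3.2)
    return {v for (v, u) in edges if u in U}

def outlist(U, edges):
    # vertices sinking output edges from vertex set U (Definition 3.3)
    return {v for (u, v) in edges if u in U}

def fanin_cone(U, edges):
    # least fixed point of cone = U ∪ inlist(cone) (Definition 3.4);
    # monotone and bounded by the finite vertex set, hence terminates
    cone = set(U)
    while True:
        nxt = cone | inlist(cone, edges)
        if nxt == cone:
            return cone
        cone = nxt

def fanout_cone(U, edges):
    # Definition 3.5: outlist(U) ∪ fanout_cone(outlist(U))
    cone = set()
    frontier = outlist(U, edges)
    while frontier - cone:
        cone |= frontier
        frontier = outlist(frontier, edges)
    return cone
```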
Definition 3.6. A strongly-connected component (SCC) is a set of vertices U ⊆ V such
that ∀u, v ∈ U. (u = v) ∨ (u ∈ fanin cone(v) ∧ u ∈ fanout cone(v)). The maximal SCC
associated with vertex set U, comprising the union of all SCCs containing u ∈ U, is
denoted by SCC(U).

Definition 3.7. A cut ⟨C, C̄⟩ is a partition of a set of vertices V into two sets: C and
C̄ = V ∖ C. A cut defines a set of edges E_C with sources in C and sinks in C̄, i.e.,
E_C = {(u, v) : (u, v) ∈ E ∧ u ∈ C ∧ v ∈ C̄}. The set of vertices sourcing E_C is denoted
as V_C = {u : ∃v. (u, v) ∈ E_C}.

Definition 3.8. Given the set of all possible cuts of a graph, a min-cut is one such
that E_C is minimal in cardinality.

Definition 3.9. Given the set of all possible cuts of a graph, a vertex min-cut is one
whose set of sourcing vertices V_C is minimal in cardinality.

We often wish to specify a set of sources C_s ⊆ V and sinks C_t ⊆ V ∖ C_s to seed a
min-cut solution; i.e., C_s ⊆ C and C_t ⊆ C̄ for any cut returned by a min-cut algorithm.
The resulting seeded formulation is referred to as an s-t min-cut problem.
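An s-t min-cut as formulated above may be computed by the standard max-flow/min-cut
duality. The following is a textbook Edmonds-Karp sketch on unit-capacity edges
(illustrative only, not the algorithm used in this thesis): once the flow saturates, C is
the set of vertices still reachable from s in the residual graph, and E_C is the crossing
edge set of Definition 3.7.

```python
from collections import deque, defaultdict

def st_min_cut(edges, s, t):
    """Edmonds-Karp max-flow on unit-capacity edges, then recovery of the
    cut <C, C-bar> of Definition 3.7 as the residual-reachable set.
    A textbook sketch, not the thesis's algorithm."""
    cap = defaultdict(int)
    adj = defaultdict(set)
    for (u, v) in edges:
        cap[(u, v)] += 1
        adj[u].add(v)
        adj[v].add(u)                  # residual (reverse) direction

    def augmenting_path():
        # BFS for a shortest path with remaining capacity
        parent = {s: None}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            if u == t:
                return parent
            for v in adj[u]:
                if v not in parent and cap[(u, v)] > 0:
                    parent[v] = u
                    queue.append(v)
        return None

    while (parent := augmenting_path()) is not None:
        v = t                          # push one unit of flow along the path
        while parent[v] is not None:
            u = parent[v]
            cap[(u, v)] -= 1
            cap[(v, u)] += 1
            v = u

    C = {s}                            # residual-reachable side of the cut
    queue = deque([s])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in C and cap[(u, v)] > 0:
                C.add(v)
                queue.append(v)
    E_C = {(u, v) for (u, v) in edges if u in C and v not in C}
    return C, E_C
```

Seeding with source and sink sets C_s and C_t reduces to this single-source form by
adding a super-source feeding C_s and a super-sink fed by C_t.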
Our verification problem is expressed as a netlist. This netlist represents a
composition of the design under test, a netlist-based representation of its driver (also
known as the environment, encoding input assumptions), and a netlist-based representation
of the property automata, as illustrated in Figure 1.1. We assume that time is discrete,
defined on [0, ∞). As will be demonstrated in Definition 3.12, we reason about the netlist
in two dimensions: vertices and time. Time dictates the period of update of our only
sequential netlist primitive: the REGISTER. Combinational elements are assumed to have 0
delay. In practice, this model is sufficient to reason about most interesting properties
of synchronous sequential netlists, and many asynchronous problems may be modeled in this
fashion. We now introduce a formal syntax and semantics for our netlists.
Definition 3.10. A netlist is a tuple N = ⟨G, G, Z, T⟩. Term G = ⟨V, E⟩ represents a
directed graph, where the vertices V represent gates, and the edges E represent
interconnections. Function G : V ↦ types defines the semantic gate type associated with
each gate v. Function Z : V ↦ V is the initial value mapping Z(v) of each gate v. The
nonempty set of targets T ⊆ V correlates to a set of invariants, as will be discussed in
Definition 3.13.
Definition 3.11. A gate v ∈ V may be of the following types, which comprise the range of
function G. Term G_v represents the semantic function correlating to the type of v, which
will be used in Definition 3.12. Let u_j denote the j-th element of an arbitrary ordering
of the set inlist(v).

• FREE. This gate has indegree of zero. It may nondeterministically drive a 0 or a 1
at any time-step, independently of any other vertices. Term G_v() is not referenced for
FREE vertices, though for simplification of notation we occasionally will refer to
G_v() at a specific time-step i as the sampled value of the corresponding FREE vertex
at time i.

• ZERO. This gate has indegree of zero. It is semantically equivalent to 0, thus
G_v() = 0.

• INVERTER. This gate has one input which is combinationally inverted, therefore
G_v(u_1) = ¬u_1.

• AND. This gate has n ≥ 1 inputs. It drives the combinational conjunction of all
input values, thus G_v(u_1, …, u_n) = ⋀_{j=1}^{n} u_j.

• REGISTER. This one-input gate drives its initial value Z(v) at time 0, and thereafter
unconditionally shadows its input by one time-step. Term G_v is not referenced for
REGISTERs.
Hereafter we refer to the vertex type ONE as a shorthand for an INVERTER whose
incoming edge is sourced by a ZERO vertex. Note that the REGISTER has an implicit clock –
it represents an unconditional one-time-step delay of its input value. For a discussion of
how to incorporate alternate gate types and more intricate interconnection types into this
framework, refer to Appendix A. The above set of gate types is sufficient to succinctly
model sequential Boolean netlists. We will occasionally use more complex gate types in our
examples for brevity (refer to Section 3.2); it is understood that this is merely
shorthand for the equivalent synthesis of such gates into the above types. Hereafter we
denote the set of REGISTERs as R, and the set of FREE vertices as I.
Definition 3.12. The semantics of a netlist N are defined in terms of its traces: 0, 1
valuations to gates over time. We denote the set of all legal traces associated with a
netlist by P ⊆ [V × ℕ ↦ {0, 1}], defining P as the subset of all possible functions from
V × ℕ to {0, 1} which are consistent with the following rule. The value of gate v at time
i in trace p ∈ P is denoted by p(v, i). The value of edge (u, v) ∈ E at time i in trace p
is defined as p((u, v), i) = p(u, i). Term u_j denotes the j-th element of an arbitrary
ordering of inlist(v), implying that (u_j, v) ∈ E.

  p(v, i) =  b ∈ {0, 1}                      if v ∈ I
             G_v(p(u_1, i), …, p(u_n, i))    if v ∈ V ∖ (R ∪ I)
             p(u_1, i − 1)                   if (v ∈ R) ∧ (i > 0)
             p(Z(v), 0)                      if (v ∈ R) ∧ (i = 0)
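The rule of Definition 3.12 reads directly as a recursive evaluator once the FREE
valuations b are fixed. A sketch under an assumed dict-based netlist encoding
(illustrative, not the thesis's representation):

```python
from functools import lru_cache

def make_trace(types, inputs_of, init_of, free_vals):
    """Evaluate p(v, i) per the rule of Definition 3.12, given fixed
    values free_vals[(v, i)] for the FREE vertices.  The dict-based
    netlist encoding here is illustrative only."""
    @lru_cache(maxsize=None)
    def p(v, i):
        t = types[v]
        if t == "FREE":
            return free_vals[(v, i)]          # b in {0, 1}
        if t == "ZERO":
            return 0
        if t == "INVERTER":
            return 1 - p(inputs_of[v][0], i)
        if t == "AND":                        # conjunction over all inputs
            return min(p(u, i) for u in inputs_of[v])
        if t == "REGISTER":
            if i == 0:
                return p(init_of[v], 0)       # p(Z(v), 0)
            return p(inputs_of[v][0], i - 1)  # p(u1, i - 1)
        raise ValueError(t)
    return p
```

Rule 3 of Definition 3.24 (every directed cycle has strictly positive sequential weight)
is what guarantees that this recursion terminates: every cycle is broken by a REGISTER,
which strictly decreases i.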
The initial values of a netlist constrain the values that a REGISTER may take at time
0; note that this function is ignored for non-REGISTER types. Our semantics allow us to
reason about a netlist as a state machine – i.e., a Mealy or Moore machine [52]. However,
in this thesis we limit our semantic analysis to this trace-theoretic view. We
occasionally will refer to the set of all legal traces associated with a netlist as Ṗ.

Definition 3.13. We say that target t is hit in trace p at time i if p(t, i) = 1. A
target t is not hittable in any trace iff ∀p ∈ P. ∀i ∈ ℕ. p(t, i) = 0. We say that a
target which may be hit is reachable, and one which is not hittable is unreachable.
Because our properties are safety properties, the traces generated by abstraction or
verification algorithms will be partial traces.

Definition 3.14. A partial trace is defined as a member of the set of finite subsets of
legal traces¹: P̃ = {p̃ : ∃p ∈ P. p̃ ⊆ p}.

Definition 3.15. A refinement p̃′ of a partial trace p̃ consists of adding one or more
elements to p̃, such that p̃′ ⊇ p̃ and p̃′ is a partial trace.

Definition 3.16. A partial trace p̃ is said to be semantically consistent if all possible
refinements of p̃ are partial traces.
To illustrate the concept of a semantically consistent partial trace, consider a
netlist with FREE vertices a and b, and AND vertex c = a ∧ b. Set {⟨(a, 0), 0⟩, ⟨(c, 0), 0⟩}
is a consistent partial trace, as is {⟨(a, 0), 1⟩, ⟨(b, 0), 1⟩, ⟨(c, 0), 1⟩}. However, the
set {⟨(a, 0), 1⟩, ⟨(c, 0), 1⟩} is not consistent, since one possible refinement, which adds
⟨(b, 0), 0⟩, does not render a partial trace. This fact has implications on an optimal
toolset; it may be desirable that each abstraction algorithm provide as small of a partial
trace as possible, to make the trace lifting process as efficient as possible.
Nevertheless, it is often necessary that sufficient data be reflected in the partial trace
to guarantee legality, else the trace provided to a user of the tool may not comprise
sufficient data to explain how the corresponding target was hit. Binary simulation
algorithms are useful to refine traces, and may be used to fully populate a partial trace
for the necessary length up to a hit of a target.

¹We will occasionally refer to an element of a function a ↦ b as ⟨a, b⟩, using the common
extension of a function to a relation.
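For small, purely combinational examples such as the a, b, c = a ∧ b netlist above,
semantic consistency at time 0 can be checked by brute force over the unassigned FREE
vertices. The encoding and helper names below are hypothetical:

```python
from itertools import product

def consistent(partial, frees, eval_fn):
    """Brute-force check of Definition 3.16 at time 0 for a purely
    combinational netlist: a partial trace is semantically consistent iff
    every valuation of its unassigned FREE vertices completes it to a
    legal trace.  eval_fn(v, env) evaluates gate v; hypothetical API."""
    fixed = {v: b for (v, i), b in partial.items() if i == 0 and v in frees}
    unassigned = [v for v in frees if v not in fixed]
    for vals in product((0, 1), repeat=len(unassigned)):
        env = dict(fixed, **dict(zip(unassigned, vals)))
        for (v, i), b in partial.items():
            if eval_fn(v, env) != b:
                return False           # some refinement is not a trace
    return True
```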
Definition 3.17. The length of a partial trace p̃ is
max{i : ∃v ∈ V. ∃b ∈ {0, 1}. ⟨(v, i), b⟩ ∈ p̃} + 1.

Throughout this thesis, we use the convention that max ∅ = 0 and min ∅ = ∞.

We now introduce some terminology related to the structure of netlists.
Definition 3.18. A structural path of a netlist is an ordered set of vertices
⟨v_0, …, v_n⟩ such that ∀i ∈ [0, n − 1]. (v_i, v_{i+1}) ∈ E.

Definition 3.19. A directed cycle is a structural path ⟨v_0, …, v_n⟩ such that v_0 = v_n.

Definition 3.20. The sequential weight of the structural path ⟨v_0, …, v_n⟩ is defined as
Σ_{i=0}^{n} (v_i ∈ R).

Definition 3.21. The cone of influence of a vertex set U is denoted as coi(U), and
defined as fanin cone(U) ∪ fanin cone(Z(R ∩ fanin cone(U))).

Definition 3.22. The combinational fanin of vertex set U is defined as ⋃_{u∈U} cfi(u),
where cfi(u) is defined as u if u ∈ R, else u ∪ combinational fanin(inlist(u)) if u ∉ R.
This set is well-formed as per the analysis of Definition 3.4.

Definition 3.23. The combinational fanout of vertex set U is defined as ⋃_{u∈U} cfo(u),
where cfo(u) is defined as outlist(u) ∪ combinational fanout(outlist(u) ∖ R). This set is
well-formed as per the analysis of Definition 3.4.
Intuitively, the combinational fanin of v contains all vertices in the fanin cone of v
which may be reached without passing through a REGISTER, and the combinational fanout of v
contains all vertices in the fanout cone of v which may be reached without passing through
a REGISTER.
Definition 3.24. A legal netlist is one which satisfies the following rules.

1. The indegree of each gate is consistent with its specified type. Each INVERTER and
REGISTER has indegree of 1; each AND gate has indegree greater than 0; each ZERO and
FREE vertex has indegree of 0.

2. A legal netlist has a finite number of gates.

3. Every directed cycle has strictly positive sequential weight.

4. The initial value cone of each REGISTER must be entirely combinational, i.e.,
fanin cone(Z(R)) ∩ R = ∅.
Hereafter, we assume that all netlists under discussion are legal. The first three
rules of Definition 3.24 are trivially satisfied by netlists generated by synthesis of
HDL, which encompass the class of netlists which are the primary focus of this thesis. The
requirement that initial values be combinational is not semantically limiting; since the
initial values have semantic significance only at time 0, if one wishes to have a REGISTER
u in the fanin cone of an initial value, one may simply replace the occurrence of u in
this initial value cone by Z(u). This constraint prevents possible ill-formed netlists due
to cyclic initial value definitions – e.g., for REGISTERs u and v, stating that Z(u) = v
and Z(v) = u. These assumptions collectively ensure that Definition 3.12 is well-formed.
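Rules 1 through 3 of Definition 3.24 are mechanically checkable. In particular, rule 3
holds iff the subgraph induced on non-REGISTER vertices is acyclic, since a zero-weight
cycle contains no REGISTER. A sketch over an assumed dict encoding (rule 4, on
initial-value cones, is omitted for brevity):

```python
def is_legal(types, inputs_of):
    """Check rules 1-3 of Definition 3.24 on a dict-encoded netlist
    (illustrative encoding; rule 4 is omitted for brevity)."""
    fixed_indegree = {"INVERTER": 1, "REGISTER": 1, "ZERO": 0, "FREE": 0}
    for v, t in types.items():                 # rule 1: indegrees
        n = len(inputs_of.get(v, ()))
        if t == "AND":
            if n < 1:
                return False
        elif n != fixed_indegree[t]:
            return False
    # rule 2 (finitely many gates) is implicit in the finite dict.
    # rule 3: a zero-weight cycle contains no REGISTER, so it suffices
    # that the subgraph induced on non-REGISTER vertices is acyclic.
    comb = {v for v, t in types.items() if t != "REGISTER"}
    state = {}
    def has_cycle(v):
        if state.get(v) == "done":
            return False
        if state.get(v) == "open":             # back edge: cycle found
            return True
        state[v] = "open"
        if any(has_cycle(u) for u in inputs_of.get(v, ()) if u in comb):
            return True
        state[v] = "done"
        return False
    return not any(has_cycle(v) for v in list(comb))
```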
Definition 3.25. A state s of a netlist is defined as s ∈ S, where S = 2^R.

Intuitively, a state s represents the subset of REGISTERs which evaluate to a binary 1
at a given time-step; set R ∖ s represents the set of REGISTERs which evaluate to a binary
0 at that time-step.

Definition 3.26. We define the initial states S_0 of a netlist as
⋃_{p∈P} {{r ∈ R : p(Z(r), 0) = 1}}.

Definition 3.27. A Kripke state s_K of a netlist is defined as s_K ∈ 2^(R∪I).

Given a subset of REGISTERs and FREE vertices (a Kripke state), we may use the
semantics of our netlist to provide a unique deterministic valuation to any vertex using
the Simulate algorithm depicted in Figure 3.1.
Binary Simulate(Vertex v, Kripke State s_K) {
  switch (G(v)) {
    case FREE:
    case REGISTER:
      return v ∈ s_K;
    case ZERO:
      return 0;
    case INVERTER:
      return ¬Simulate(inlist(v), s_K);
    case AND:
      return ⋀_{u_i ∈ inlist(v)} Simulate(u_i, s_K);
  }
}

Figure 3.1: Simulate algorithm
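The Simulate algorithm of Figure 3.1 transcribes almost directly into executable form;
the dict-based encoding below is illustrative, not the thesis's data model.

```python
def simulate(v, s_K, types, inputs_of):
    """Transcription of the Simulate algorithm of Figure 3.1: evaluate
    gate v under Kripke state s_K, the set of REGISTERs and FREE
    vertices currently holding 1.  Dict encoding is illustrative."""
    t = types[v]
    if t in ("FREE", "REGISTER"):
        return int(v in s_K)
    if t == "ZERO":
        return 0
    if t == "INVERTER":
        return 1 - simulate(inputs_of[v][0], s_K, types, inputs_of)
    if t == "AND":                    # conjunction of recursive values
        return min(simulate(u, s_K, types, inputs_of) for u in inputs_of[v])
    raise ValueError(t)
```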
Definition 3.28. The image of a set of states A, denoted by image(A), is defined as
{s ∈ S : ∃s′ ∈ A. ∃i ⊆ I. ∀r ∈ R. (Simulate(inlist(r), s′ ∪ i) = (r ∈ s))}.

Definition 3.29. The preimage of a set of states A, denoted by preimage(A), is defined as
{s ∈ S : ∃s′ ∈ A. ∃i ⊆ I. ∀r ∈ R. (Simulate(inlist(r), s ∪ i) = (r ∈ s′))}.

Definition 3.30. The distance from state s to s′ is defined as distance(s, s′) =
min{j : ∃p ∈ P. ∃i ∈ ℕ. ∀r ∈ R. (((p(r, i) = 1) ↔ (r ∈ s)) ∧ ((p(r, i + j) = 1) ↔ (r ∈ s′)))}.
Because of our convention that min ∅ = ∞, Definition 3.30 implies that the distance
from state s to state s′ is ∞ if s′ is not reachable from s along any trace. Since a legal
netlist is finite, the distance between s and s′, provided that s′ is reachable from s,
cannot be ∞.
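Definitions 3.28 and 3.29 admit a purely illustrative explicit-state transcription:
enumerate every state s′ in A and every input subset i ⊆ I, and evaluate each REGISTER's
next-state function under the resulting Kripke state. The function `next_fn` below stands
in for Simulate(inlist(r), ·); the encoding is hypothetical, and the enumeration is
exponential in |I| – symbolic techniques [18, 19] exist precisely to avoid it.

```python
from itertools import chain, combinations

def powerset(xs):
    # all subsets of xs, from the empty set to xs itself
    xs = list(xs)
    return chain.from_iterable(combinations(xs, r) for r in range(len(xs) + 1))

def image(A, registers, free, next_fn):
    """Explicit-state reading of Definition 3.28: over all states s' in A
    and all input subsets i of I, collect the state whose REGISTERs r
    satisfy next_fn(r, s' ∪ i) = 1.  Hypothetical encoding, exponential
    cost; shown only to make the definition concrete."""
    out = set()
    for s_prime in A:
        for i in powerset(free):
            s_K = set(s_prime) | set(i)      # Kripke state s' ∪ i
            out.add(frozenset(r for r in registers if next_fn(r, s_K)))
    return out
```

The preimage of Definition 3.29 is the same enumeration with the roles of s and s′
exchanged.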
Definition 3.31. Vertices v and v′ of N are said to be semantically equivalent iff
∀p ∈ P. ∀i ∈ ℕ. p(v, i) = p(v′, i).

Definition 3.32. Vertex sets A and A′ of netlists N and N′, respectively, are said to be
trace equivalent iff there exists a bijective mapping ψ : A ↦ A′ which satisfies the
following conditions.

• ∀p ∈ P. ∃p′ ∈ P′. ∀i ∈ ℕ. ∀a ∈ A. p(a, i) = p′(ψ(a), i)

• ∀p′ ∈ P′. ∃p ∈ P. ∀i ∈ ℕ. ∀a ∈ A. p(a, i) = p′(ψ(a), i)

The notion of bisimilarity, relating state transition graphs of netlists, is more
restrictive than trace equivalence – bisimilarity implies trace equivalence, though the
latter does not imply the former. However, in an invariant checking domain, trace
equivalence is a sufficient condition for most purposes.
3.1 Verification Algorithms
In this section we briefly introduce several common verification algorithms which are useful to discharge invariants.
There are two primary methodologies for the verification of safety properties: state
traversal techniques and inductive methods. State traversal techniques employ exact or
approximate search to attempt to calculate a trace which hits a target; unreachability is
proven if a search exhausts without finding such a trace. Exhaustive search is performed by enumerating the reachable states of the design, almost exclusively using BDDs [53] to represent the transition relation and state sets [18, 19]. However, more
recently, noncanonical representations have been proposed for reachability analysis [54],
as have satisfiability-based algorithms [55]. Because of their exponential complexity, exact
state traversal techniques – whether symbolic or explicit – are applicable only to smaller designs with at most several hundred REGISTERs.
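The underlying fixpoint of exhaustive state traversal can be sketched as follows, with explicit Python sets standing in for the BDD-represented state sets, and a step callback standing in for the image operator (names are illustrative):

```python
def reachable_states(initial_states, step):
    """Breadth-first reachability fixpoint: repeatedly apply the image
    operator (abstracted as step, mapping a set of states to the set of
    their successors) until no new states appear."""
    reached = set(initial_states)
    frontier = set(initial_states)
    while frontier:
        new = step(frontier) - reached   # states seen for the first time
        reached |= new
        frontier = new
    return reached
```

For a mod-4 counter started at 0, the fixpoint is reached after enumerating all four states.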
Numerous approximate techniques have been proposed to address the capacity limitations of exact state traversal. Overapproximating the set of reachable states is useful to prove a target unreachable if all target states remain outside the overapproximation, though it cannot readily demonstrate reachability otherwise. For example, design partitioning [56] may be applied to overapproximate the set of reachable states by exploring components whose sizes are tractable for exact traversal. Similarly, the concept of an injected FREE fence [34] to obtain an overapproximate localized cone of influence has been suggested for proving unreachability of a target by localization.
Conversely, underapproximate techniques are useful to demonstrate reachability of targets, but are generally incapable of proving their unreachability. For example, sequential binary simulation is based upon the combinational algorithm depicted in Figure 3.1, and consists of the process of evaluating Definition 3.12 to produce a semantically consistent partial trace. Random selection is used to determine valuations to FREE vertices. As another example, bounded model checking (BMC) [57] is based upon a satisfiability check of a finite $k$-step unfolding of the target. This unfolding process consists of building a combinational netlist by recursively evaluating Definition 3.12, injecting a unique FREE vertex $v_i$ for each time-step $i$ of a given FREE vertex $v$ encountered during the evaluation, and similarly replicating combinational gates per time-step for any AND, INVERTER, and ZERO vertices encountered. Note that REGISTERs merely constitute a shift in time-steps, hence do not appear in the unfolded netlist. If it can be proven that the diameter of the netlist (refer to Definitions 4.1 and 4.2) is smaller than or equal to $k$, BMC becomes complete and can thereby also prove unreachability; this concept is explored further in Chapter 4. A similar underapproximate method is based upon a bounded backward unfolding of the design starting from the target. The unfolded structure comprises an enlarged target which may be used to either directly discharge the verification problem or to produce a new, simplified problem to be solved by a subsequent verification flow. We explore target enlargement further in Chapter 8. Lastly, a semi-formal toolset, which interleaves between resource-bounded exhaustive searches and simulation [58, 59], may be useful to quickly calculate a trace which hits a target even if that target is too probabilistically difficult to be hit by random simulation alone, especially when netlist size renders exact search infeasible.
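The unfolding process described above can be sketched as follows; this is a hypothetical encoding that builds nested-tuple expressions rather than an actual SAT instance, with gate_type, inlist, and init_val as illustrative netlist tables:

```python
def unfold(v, i, gate_type, inlist, init_val, cache=None):
    """Build a combinational expression for the value of vertex v at
    time-step i of a BMC unfolding. REGISTERs merely shift time: at
    time i > 0 they refer to their input at time i - 1, and at time 0
    they take their initial value, so they do not appear in the
    unfolded netlist. Each FREE vertex v becomes a fresh variable
    (v, i) per time-step; combinational gates replicate per step."""
    if cache is None:
        cache = {}
    key = (v, i)
    if key in cache:
        return cache[key]
    t = gate_type[v]
    if t == "FREE":
        expr = ("var", v, i)               # unique FREE vertex v_i
    elif t == "ZERO":
        expr = ("const", 0)
    elif t == "REGISTER":
        if i == 0:
            expr = ("const", init_val[v])  # initial value at time 0
        else:
            expr = unfold(inlist[v][0], i - 1, gate_type, inlist,
                          init_val, cache)
    elif t == "INVERTER":
        expr = ("not", unfold(inlist[v][0], i, gate_type, inlist,
                              init_val, cache))
    else:  # AND
        expr = ("and", tuple(unfold(u, i, gate_type, inlist,
                                    init_val, cache)
                             for u in inlist[v]))
    cache[key] = expr
    return expr
```

For a REGISTER fed directly by a FREE vertex, the time-2 unfolding is simply that FREE vertex's time-1 copy, illustrating the shift in time-steps.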
An inductive proof requires an invariant (either automatically generated or manually provided) that implies the property; one then demonstrates that the invariant holds in all reachable states. The base step of a $k$-step inductive proof checks that the invariant holds during the first $k$ time-steps. This may be performed by a $k$-step bounded model check of the invariant, which is used to validate the induction hypothesis. The inductive step must then demonstrate that asserting the invariant during time-steps $i, \ldots, (i + k - 1)$ implies that it continues to hold at time-step $(i + k)$. Inductive proofs may be performed via BDD-based analysis [60, 24] or SAT-based analysis [57, 61]. If the proof is completed, then unreachability of the target is deduced. The general drawback of inductive schemes is the intrinsic difficulty in determining a powerful enough invariant that is inductive and also implies correctness of the property. However, for many practical problems, backward unfolding of the target – target enlargement (see Chapter 8) – yields an inductive invariant after several steps.
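The overall flow of a $k$-step inductive proof can be sketched as follows, assuming a generic bounded-checker interface (base_check and step_check are hypothetical callbacks, e.g. backed by a SAT solver):

```python
def k_induction(base_check, step_check, max_k):
    """Sketch of k-step induction. base_check(k) returns True iff the
    invariant holds in the first k time-steps from the initial states
    (e.g., via a k-step bounded model check); step_check(k) returns
    True iff the invariant holding at steps i..i+k-1 forces it to hold
    at step i+k, from an arbitrary state. Returns ('proved', k),
    ('falsified', k), or ('unknown', max_k)."""
    for k in range(1, max_k + 1):
        if not base_check(k):
            return ("falsified", k)   # a bounded trace hits the target
        if step_check(k):
            return ("proved", k)      # induction hypothesis validated
    return ("unknown", max_k)
```

Increasing $k$ strengthens the induction hypothesis, which is why many invariants that are not 1-step inductive become provable at a larger depth.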
3.2 Figure Symbols
In this section we introduce the symbols that we will use throughout this thesis in our
figures depicting netlists. We illustrate these symbols in Figure 3.2. Though our defined
netlist gate types are only FREE, AND, INVERTER, ZERO, and REGISTER, we often use
more abstract types in our examples for brevity.
The term $ite(sel, a, b)$ means "if $sel$ then $a$ else $b$." We label the "data ports" $a$ and $b$ of multiplexors with a 1 and a 0 in our figures, respectively, indicating which respective net has its value sensitized through the multiplexor when the selector $sel$ evaluates to a binary 1 vs. a binary 0, respectively.
[Figure 3.2 illustrates the symbols used in our netlist figures: AND gate, AND gate with inverted inputs, OR gate, INVERTER, multiplexor $c = ite(sel, a, b)$ with data ports labeled 1 and 0, REGISTER, FREE vertex, sinkless 1-input AND gate, combinational logic without FREE vertices, and combinational logic possibly containing FREE vertices.]

Figure 3.2: Figure symbols
Chapter 4
Diameter Overapproximation Techniques
In this chapter, we define the diameter of a netlist, and discuss various diameter overapproximation techniques. Diameter is an important topic to verification, since we may use an overapproximate diameter bound to ensure that an application of BMC is sound and complete; BMC is often much more efficient than general fixpoint computations used for reachability analysis. In addition to discussing our structural diameter approximation technique from [24], which is collaborative work with Andreas Kuehlmann and Jacob Abraham, we formalize a theory for allowing a compositional approach to diameter approximation which allows arbitrary techniques to be used on a per-component basis. Additionally, we introduce concepts which will be used throughout this thesis to allow a diameter overapproximation obtained upon an abstracted netlist to imply a bound on the diameter of the corresponding unabstracted netlist.
Definition 4.1. The diameter $d \in \mathbb{N}$ of netlist $N$ is the maximum finite distance between any two states plus one: $d = \max_{\{s, s' \in S : distance(s, s') \neq \infty\}} distance(s, s') + 1$.
In other words, if any state $s'$ is reachable from state $s$, then $s'$ is reachable in fewer than $d$ time-steps from $s$. This implies that an exhaustive bounded state traversal of depth $0, \ldots, d - 1$ is sufficient to determine whether a target is hittable or unreachable, since FREE vertices may take values independently at any point in time, and since a deterministic valuation to any vertex may be obtained from a valuation to $R \cup I$ as per the algorithm Simulate of Figure 3.1. Note that our definition of diameter is one greater than the standard definition for graphs; this simplifies the exposition of our compositional techniques, and matches the number of time-steps necessary to ensure completeness of BMC.
In many cases, using diameter to bound the depth of application of BMC is not tight. For example, to assess reachability of a target, we may ignore any vertices outside of its cone of influence, which may decrease diameter. This observation is one of the motivations for an alternate diameter definition we present in Definition 4.2. Additionally, a BMC application for the maximum distance from any initial state rather than from any reachable state suffices for invariant checking. Furthermore, for invariant checking target $t$, we need only perform a search deep enough to assess whether we may toggle the target from 0 to 1 relative to an initial state; the amount of time necessary to toggle the target from a 1 to a 0 may be exponentially greater. This concept is revisited in Theorem 8.3. However, the more conservative diameter bound will be necessary for our compositional algorithms, and most approximation techniques directly yield diameter bounds.
Note that use of Kripke states in Definition 4.1 may yield a result one larger. For example, a purely combinational netlist containing FREE vertices has a diameter of 1. However, this netlist will have multiple Kripke states, and each of these Kripke states may transition to every other, hence it will have a Kripke diameter of 2. Next, consider a netlist with no FREE vertices but a set of REGISTERs which act as a mod-$c$ counter. We will obtain a diameter of $c$ for this netlist whether or not we use a Kripke representation, since it requires $c - 1$ time-steps to transition the counter from any state to the corresponding furthest state. To apply BMC in a complete manner, for the former combinational netlist we need only verify one time-step. For the latter netlist, we must verify $c$ time-steps. These tight bounds are accurately reflected in our non-Kripke diameter definition. Thus, inclusion of FREE
vertices in our state model unnecessarily weakens our diameter bounds. We now introduce
an alternate diameter definition, which will allow further tightening.
Definition 4.2. The diameter $d(U)$ of vertex set $U$ is the minimum number such that for any trace $p$ and any increasing succession¹ $k_1, \ldots, k_c$, there exists another trace $p'$ and another increasing succession $l_1, \ldots, l_c$ such that $\bigwedge_{j=1}^{c} (l_j \leq k_j)$ and $(l_c \leq l_{c-1} + d(U))$, taking $l_0 = -1$, which satisfies $\forall u \in U.\ \bigwedge_{j=1}^{c} (p(u, k_j) = p'(u, l_j))$.

¹An increasing succession is an ordered set of natural numbers $k_1, \ldots, k_c$ for $c \geq 1$ which satisfies the relation $k_i < k_{i+1}, \forall i \in [1, c-1]$.

By Definition 4.2, the diameter of vertices $U$ actually need not correlate to that of $coi(U)$. For example, if a vertex $u$ encodes an XOR function of a FREE vertex and a sequential cone $A$, then $d(u) = 1$ regardless of that of $A$, since any valuation to $u$ will be producible at any time-step. This definition provides an opportunity to bound diameter without a need to analyze the underlying state space representation, which is key to understanding our structural diameter overapproximation algorithm of Figure 4.2. Furthermore, this definition is extended in Theorem 4.3 to enable a bound obtained on a transformed netlist to be used to imply a bound for the original, untransformed netlist.

Theorem 4.1. The diameter $d$ of Definition 4.1 is equal to or one greater than $d(V)$ of Definition 4.2.

Proof. We consider two cases. First, assume that the netlist has a diameter of 1 by Definition 4.1. This implies that either the netlist is combinational hence $S = \emptyset$, or the netlist has REGISTERs though they act as constants – i.e., no state may transition to any other. Because of the lack of any sequential behavior of the netlist, any valuation reachable at any time $i$ must be reachable at every time-step, thus we also obtain a diameter of 1 by Definition 4.2.

Second, assume that the netlist has a $d > 1$ by Definition 4.1. Let $s$ and $s'$ represent a maximally-distant state pair with $distance(s, s') < \infty$. If there exists such a maximally-distant state pair $s$ and $s'$ such that $s$ is an initial state, and $s'$ is not reachable in any trace before time $d - 1$, then these two definitions clearly yield identical results by using $c = 1$ in Definition 4.2. Otherwise, assume that state $s''$ transitions to state $s$ along some trace. Let $x$ be the FREE vertex valuation which transitions the netlist from $s''$ to $s$. We note that the minimum number of time-steps necessary to witness $s'$ after witnessing $\{s'', x\}$ in any trace is exactly $d$, thus these two definitions yield identical results. Lastly, assume that we may not transition to any state $s$ which is maximally distant to any other state $s'$ (i.e., $s$ must be an initial state); additionally, state $s'$ is reachable more shallowly along a trace not passing through $s$. Definition 4.2 will yield a bound one smaller than that of Definition 4.1, since it will take $d - 1$ time-steps to witness $s'$ after witnessing $s$. Thus, the diameter with respect to Definition 4.2 is often identical to, though occasionally one less than, that with respect to Definition 4.1.
Theorem 4.1 illustrates an interesting result; our diameter of Definition 4.2 is less than or equal to that of Definition 4.1, though the former but not the latter may include FREE vertices. As per the previous discussion of the mod-$c$ counter, we cannot merely drop the addition of 1 from Definition 4.1 to attempt to yield an identical bound. This increment is generally necessary to ensure completeness of BMC, and indeed our proof of Theorem 4.1 indicates that in most cases these two definitions yield identical bounds. Nevertheless, the bound of Definition 4.2 is sufficient both for invariant checking (as follows from assigning $c = 1$) as well as for bounding the diameter of isolated components (refer to Theorem 4.2).
We now define recurrence diameter, which constitutes an overapproximation of diameter. Our definition is one greater than that of [57] for consistency with diameter.

Definition 4.3. The recurrence diameter $d_r \in \mathbb{N}$ of a netlist $N$ is defined as the length of its maximal acyclic state sequence. In other words, $d_r = \max\{j : \exists p \in P.\ \exists i \in \mathbb{N}.\ \exists s_1, \ldots, s_j \in S.\ \forall r \in R.\ (\bigwedge_{k=1}^{j} ((p(r, i + k - 1) = 1) \leftrightarrow (r \in s_k)) \wedge (\forall k, l \in [1, j].\ (k \neq l) \rightarrow (s_k \neq s_l)))\}$.

There are two characteristics of practical netlists which may be exploited to compute tight diameter bounds. First, netlists seldom represent monolithic structural strongly
connected graphs. Instead, they often comprise multiple maximal SCCs; an approximation of diameter may thus be compositionally derived from an estimation of the individual SCC diameters. Second, although the diameter of a component is generally exponential in its REGISTER count, several commonly occurring structures have much tighter bounds. For example, as proven in Theorem 4.2, the diameter of a single memory row comprising $n$ REGISTERs is 2 instead of $2^n$; acyclic REGISTERs only cause a linear, rather than multiplicative, increase in diameter.
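Definition 4.3 can be made concrete with an explicit-state sketch that enumerates repetition-free state sequences over the reachable state graph; this is exponential in general and shown only for illustration (successors is a hypothetical next-state callback):

```python
def recurrence_diameter(initial_states, successors):
    """Explicit-state recurrence diameter per Definition 4.3: the
    length of the maximal acyclic (pairwise-distinct) state sequence
    occurring along any trace. successors(s) yields the next states
    of s. Exponential; for illustration only."""
    # First compute the reachable states, since an acyclic sequence
    # may begin at any reachable state along a trace.
    reached, frontier = set(initial_states), set(initial_states)
    while frontier:
        new = {t for s in frontier for t in successors(s)} - reached
        reached |= new
        frontier = new

    best = 0

    def extend(s, seen):
        nonlocal best
        best = max(best, len(seen))
        for t in successors(s):
            if t not in seen:
                extend(t, seen | {t})

    for s in reached:
        extend(s, frozenset({s}))
    return best
```

For a mod-4 counter this yields 4, matching its diameter; for a component whose state graph is complete, it can be exponentially looser than the true diameter.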
[Figure 4.1 depicts a slice of a TSAP structure: consecutive components $TSAP_{j-1}$, $TSAP_j$, and $TSAP_{j+1}$, with associated cut sets $C_{j-2}$, $C_{j-1}$, $C_j$, and $C_{j+1}$.]

Figure 4.1: Slice of TSAP structure
Definition 4.4. A topologically sorted acyclic partitioning (TSAP) of $V$ into $n$ components is a labeling $TSAP : V \mapsto \{1, \ldots, n\}$ such that $\forall u, v \in V.\ ((u, v) \in E \Rightarrow TSAP(u) \leq TSAP(v))$. We denote the $i$-th component of a TSAP by the set $TSAP_i = \{v : TSAP(v) = i\}$.

Note that the acyclic requirement mandates that $TSAP_{TSAP(v)} \supseteq SCC(v)$. Let $C_i = \{TSAP_i \cap T\} \cup \{u : \exists v \in V.\ ((u, v) \in E \wedge u \in \bigcup_{j=1}^{i} TSAP_j \wedge v \in \bigcup_{j=i+1}^{n} TSAP_j)\}$. Set $C_i$ comprises the targets in $TSAP_i$, in addition to vertices of components $1, \ldots, i$ which have sinks in components $i + 1, \ldots, n$. For example, in Figure 4.1, some elements of component $TSAP_{j-1}$ are included in $C_j$ and $C_{j+1}$, though no elements of $TSAP_j$ are included in $C_{j+1}$ since no outgoing edges from $TSAP_j$ have sinks beyond $TSAP_{j+1}$. We use $C_i$ in our compositional diameter overapproximation approach; it is the vertices in $C_i \cap TSAP_i$ which must be considered in our bound for $TSAP_i$.
Definition 4.5. We distinguish between the following TSAP component types. Let $x_i$ be a REGISTER vertex and $y_i$ be the source of the incoming edge to $x_i$.

- A combinational/constant component (CC) contains only non-REGISTER vertices, or REGISTERs whose incoming edges are sourced by themselves; i.e., $y_i = x_i$. FREE vertices may only appear in CCs.

- An acyclic component (AC) contains only REGISTER vertices whose incoming edges are inputs to the component.

- A memory component (MC) is composed solely of a set of $r \times c$ REGISTERs and combinational gates, for $r \geq 1$ and $c \geq 1$. The next-state functions of the REGISTERs have the form: $y_{i,j} = (x_{i,j} \wedge \bigwedge_{k=1}^{w} \neg load_{i,k}) \vee \bigvee_{k=1}^{w} (data_{i,j,k} \wedge load_{i,k})$, for $1 \leq i \leq r$ and $1 \leq j \leq c$, where $data_{i,j,k}$ and $load_{i,k}$ are inputs to the component. Let $rows(TSAP_i) = r$ for MC $TSAP_i$.

- A queue component (QC) is composed solely of a set of $r \times c$ REGISTERs and combinational gates, for $r > 1$ and $c \geq 1$. The next-state functions of the REGISTERs have the form: $y_{1,j} = (x_{1,j} \wedge \bigwedge_{k=1}^{w} \neg load_k) \vee \bigvee_{k=1}^{w} (data_{j,k} \wedge load_k)$; $y_{i,j} = (x_{i,j} \wedge \bigwedge_{k=1}^{w} \neg load_k) \vee (x_{i-1,j} \wedge \bigvee_{k=1}^{w} load_k)$, for $1 < i \leq r$ and $1 \leq j \leq c$, where $data_{j,k}$ and $load_k$ are inputs to the component. Let $rows(TSAP_i) = r$ for QC $TSAP_i$.

- All remaining components are termed general components (GCs). We note that $R \cap TSAP_i \neq \emptyset$ for GCs. If there exists a combinational path from an input of $TSAP_i$ to any combinational gate $u \in TSAP_i$, and $u \in C_i$, we say that the GC is Mealy.
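As a concrete sketch, the MC next-state equations can be evaluated as follows; this is an illustrative encoding with hypothetical matrix arguments, not code from the thesis:

```python
def mc_next_state(x, data, load):
    """Next-state function of a memory component (MC) with r rows,
    c columns, and w load ports, per Definition 4.5:
      y[i][j] = (x[i][j] AND all load[i][k] deasserted)
                OR any (data[i][j][k] AND load[i][k]).
    x:    r x c matrix of current REGISTER values (0/1)
    data: r x c x w matrix of data inputs
    load: r x w matrix of load-enable inputs"""
    r, c = len(x), len(x[0])
    w = len(load[0])
    y = [[0] * c for _ in range(r)]
    for i in range(r):
        for j in range(c):
            hold = x[i][j] and all(not load[i][k] for k in range(w))
            write = any(data[i][j][k] and load[i][k] for k in range(w))
            y[i][j] = int(hold or write)
    return y
```

Each row either holds its current value (when all of its load ports are deasserted) or captures the data selected by an asserted load port, which is why a single row contributes a diameter of 2 rather than $2^c$.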
Note that MCs and QCs have been generalized for $w$ load ports. Further generalizations are possible, though we have found these adequate for most commonly-occurring structures. The constant REGISTERs in CCs may have constant initial values (in which case they may be simplified by constant propagations) or symbolic initial values (e.g., implementing forall variables). As we shall demonstrate, our overapproximation algorithm provides the smallest bounds for TSAPs with maximally-sized ACs, CCs, MCs, and QCs.
Obtaining such a partition is a simple linear-time procedure: we first identify the cyclic vertices using a fanin or fanout sweep. Any REGISTERs not in the cyclic subset will be AC elements. The other REGISTERs are then classified as follows: if the incoming edge of a REGISTER is sourced by itself, it is a CC. Otherwise, we use a pattern-matching heuristic to see if the REGISTER appears as a table cell; i.e., an element of a QC or MC. If so, we hash the corresponding REGISTER based upon its load vertices. All REGISTERs with identical load vertices are candidates for appearing in the same MC or QC component. Finally, we selectively cluster components in an attempt to maximize the size of the ACs, MCs, and QCs, while preventing the introduction of cycles in the partition graph.
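The classification step of this procedure can be sketched as follows; the function and argument names are hypothetical, and the pattern-matching heuristic is abstracted as a load_signals callback:

```python
from collections import defaultdict

def classify_registers(registers, source, cyclic, load_signals):
    """Illustrative sketch of the linear-time REGISTER classification
    described above (not code from the thesis).
    registers:    iterable of REGISTER vertices
    source[r]:    the vertex sourcing r's incoming edge
    cyclic:       set of vertices on some cycle (fanin/fanout sweep)
    load_signals: maps r to a tuple of its load vertices if r
                  pattern-matches a QC/MC table cell, else None"""
    kinds = {}
    table_groups = defaultdict(list)   # candidate MC/QC groups
    for r in registers:
        if r not in cyclic:
            kinds[r] = "AC"            # acyclic REGISTERs
        elif source[r] == r:
            kinds[r] = "CC"            # constant (self-sourced)
        elif load_signals(r) is not None:
            kinds[r] = "MC/QC"         # table cell: hash by loads
            table_groups[load_signals(r)].append(r)
        else:
            kinds[r] = "GC"            # everything else
    return kinds, dict(table_groups)
```

REGISTERs hashed to the same load set become candidates for the same MC or QC; a subsequent clustering pass (not shown) would merge them while keeping the partition graph acyclic.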
Our approximation of the diameter of target $t$ is based upon a TSAP of its cone of influence. We ascribe an additive element $d_a$ and a multiplicative element $d_m$ with each TSAP component, using the algorithm of Figure 4.2. Term $\chi(i)$ in this algorithm denotes whether $TSAP_i$ entails a cut between components $1, \ldots, i-1$ and components $i+1, \ldots, n$, and $\bar{\chi}(i) = \neg\chi(i)$. In Figure 4.1, only $TSAP_{j-1}$ entails a cut (provided that it is not a Mealy GC), hence $\chi(j-1) = 1$, whereas $\chi(j) = \chi(j+1) = 0$. Term $D_i$ represents an upper-bound on the diameter of $C_i \cap TSAP_i$ in the context of $N$; clearly $2^{|R \cap TSAP_i|}$ is conservative, and may be improved upon by various mechanisms as we will discuss later in this chapter.
Theorem 4.2. The value $d_a(i) + d_m(i)$ obtained by the algorithm of Figure 4.2 is an upper-bound on the diameter of $C_i$. This implies that $d(t) = d_a(TSAP(t)) + d_m(TSAP(t))$ is an upper-bound on the diameter of target $t$.

Proof. We will prove this theorem by induction on $i$. Our proof is based upon the hypothesis that any arbitrary succession of reachable valuations to $C_i$ is producible within $\tau = c \cdot d_m(i) + d_a(i)$ time-steps. Restating this hypothesis more formally: for any increasing succession of time-steps $k_1, \ldots, k_c$ and any trace $p$, there exists another increasing
⟨d_a, d_m⟩ Preprocess_Diameter(Netlist N, TSAP A) {
  d_m(0) = 1; d_a(0) = 0;
  for (i = 1; i ≤ |A|; i++) {
    if (χ̄(i−1) ∨ (C_{i−1} ∩ C_i ≠ ∅) ∨ (type(A_i) ≡ Mealy GC)) { χ(i) = 0; }
    else { χ(i) = 1; }
    χ̄(i) = 1 − χ(i);
    d_m(i) = d_m(i−1)                         : type(A_i) ∈ {CC, AC}
           = d_m(i−1) · (rows(A_i) + χ̄(i))   : type(A_i) ∈ {MC, QC}
           = d_m(i−1) · (D_i − χ(i)) + χ(i)   : type(A_i) ≡ GC
    d_a(i) = d_a(i−1)                         : type(A_i) ∈ {CC, GC}
           = d_a(i−1) + χ(i)                  : type(A_i) ∈ {MC, QC}
           = d_a(i−1) + 1                     : type(A_i) ≡ AC
  }
  return ⟨d_a, d_m⟩;
}

Figure 4.2: Algorithm for calculation of d_a and d_m

succession $l_1, \ldots, l_c$ such that $\bigwedge_{j=1}^{c} (l_j \leq k_j)$ and $(l_c < \tau)$, and another trace $p'$ such that $\forall u \in C_i.\ \bigwedge_{j=1}^{c} (p(u, k_j) = p'(u, l_j))$. This theorem follows from assigning $c = 1$.
The intuition behind this hypothesis is that component $TSAP_{i+1}$ may transition from each of its states only upon witnessing a distinct valuation to $C_i$. Therefore, in order to ensure that we attain an upper bound on the diameter of $TSAP_{i+1}$, we generally must wait for a succession of $c = D_{i+1}$ valuations to $C_i$. For example, if $TSAP_1$ is a mod-4 counter, and $TSAP_2$ is a mod-5 counter, we will assign $c = 5$ since we need to wait for 5 valuations to $C_1$ to be sure that we attain an upper-bound on the diameter of $C_2$.
Our base case has $i = 1$. If $type(TSAP_1) = CC$, we obtain $d_m(1) = 1$ and $d_a(1) = 0$. This result is correct, since any valuation producible by $C_1$ is producible every time-step due to its lack of sequential behavior. We note that $type(TSAP_1)$ cannot be MC, QC, or AC since those types require other components to drive their inputs. Finally, if $type(TSAP_1) = GC$, then $d_m(1) = D_1$, which is an upper bound on the diameter of $C_1$ by definition, hence our proof obligation is satisfied.

We next proceed to the inductive step. If $type(TSAP_{i+1}) = CC$, then our result is correct by hypothesis, noting that $TSAP_{i+1}$ is a purely combinational function of $C_i$, as well as FREE vertices and REGISTERs which behave as constants. If $type(TSAP_{i+1}) = AC$, then $d_m(i+1) = d_m(i)$ and $d_a(i+1) = d_a(i) + 1$. This result is correct since the initial values of an AC have semantic importance only at time 0, and since an AC merely delays some valuations to $C_i$ by one time-step. If $type(TSAP_{i+1}) \in \{MC, QC\}$, then we obtain $d_m(i+1) = d_m(i) \cdot (rows(TSAP_{i+1}) + \bar{\chi}(i+1))$ and $d_a(i+1) = d_a(i) + \chi(i+1)$. This result is correct by noting that it can take at most $c \cdot d_m(i) + d_a(i)$ time-steps to reach any possible succession of valuations to $C_i$ by hypothesis. If $\bar{\chi}(i+1) = 1$, then $C_i$ fans out to $C_{i+2}$, meaning that we generally must wait for $c = (rows(TSAP_{i+1}) + 1)$ valuations to $C_i$ to be sure that we have an upper bound on the diameter of $C_{i+1}$. If $\bar{\chi}(i+1) = 0$, then we need only wait for $c = rows(TSAP_{i+1})$ valuations to $C_i$, plus one extra time-step for the load to take effect upon $C_{i+1}$. Lastly, if $type(TSAP_{i+1}) = GC$, then $d_m(i+1) = d_m(i) \cdot (D_{i+1} - \chi(i+1)) + \chi(i+1)$ and $d_a(i+1) = d_a(i)$, where $D_{i+1}$ is defined as an upper-bound on the diameter of $C_{i+1} \cap TSAP_{i+1}$. For $\chi(i+1) = 0$ this result is obvious. Otherwise, note that any trace segment begins in one state of $TSAP_{i+1}$, and $c = (D_{i+1} - 1)$ transitions – which must initiate within $c \cdot d_m(i) + d_a(i)$ time-steps, plus one for the final transition to complete – is sufficient to put $TSAP_{i+1}$ into any of its subsequently-reachable states. Hence $d_m(i+1) = d_m(i) \cdot (D_{i+1} - 1) + 1$ time-steps satisfies our obligation.
We demonstrate the use of our structural diameter overapproximation algorithm for the netlist depicted in Figure 4.3. We have partitioned this example netlist into six components. The first component to the left is a CC containing only combinational logic, possibly including FREE vertices. Our algorithm provides $d_a(1) = 0$ and $d_m(1) = 1$; thus the diameter overapproximation $d$ that our algorithm would ascribe to any vertex in component 1 is 1. This result implies that we need to check such a target only for time-step 0 to provide an exact hit/unreachable answer, which is intuitive since component 1 does not act sequentially; any reachable valuation to the vertices in this component will be reachable every time-step, thus if a target cannot be hit at time 0, then it cannot be hit at any time-step. We next compose an AC onto this first component. Our algorithm provides $d_a(2) = 1$ and $d_m(2) = 1$; thus $d$ for any vertex in component 2 will be 2. This implies that we need only check such a target for time-steps 0 and 1, which is again intuitive since the time-0 check will validate whether the initial values can hit the target, and at time 1 any possible valuation to component 1 will propagate through component 2. We next compose another CC, which does not affect diameter; a two-step bounded check is complete since this bounds the diameter of component 2, and since component 3 does not act sequentially. Component 4 is an AC and adds one to diameter, which is correct as per the analysis of component 2. We next add an MC with two rows, which constitutes a cut of the netlist. It thus adds one to the $d_a$ sum and doubles the $d_m$ product, yielding a diameter bound of 5 for vertices in this component. This result is conservative since we need to wait at most for two additional time-steps over the diameter of component 4 to be sure that all possible load and data values will propagate into these two memory rows. Note that load values may be correlated hence mutually exclusive, which is why we must double $d_m$ to be sure that we have waited long enough for two loads to occur. Lastly, we compose another CC as component 6, which does not affect diameter as per the previous discussion.
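The $d_a$/$d_m$ accumulation that produces these numbers can be sketched as follows; this is a hypothetical encoding where each component is given as (type, rows-or-$D_i$, is-cut), with the cut indicator $\chi(i)$ supplied rather than computed from the netlist:

```python
def preprocess_diameter(components):
    """Sketch of the d_a/d_m accumulation of Figure 4.2. Each component
    is (ctype, n, cut): ctype in {"CC","AC","MC","QC","GC"}; n is the
    row count for an MC/QC or the bound D_i for a GC (ignored
    otherwise); cut corresponds to chi(i) = 1. Returns the
    per-component diameter bounds d_a(i) + d_m(i)."""
    dm, da = 1, 0
    bounds = []
    for ctype, n, cut in components:
        chi = 1 if cut else 0
        chibar = 1 - chi
        if ctype in ("MC", "QC"):
            dm = dm * (n + chibar)     # multiplicative element
            da = da + chi              # additive element
        elif ctype == "GC":
            dm = dm * (n - chi) + chi
        elif ctype == "AC":
            da = da + 1
        # a CC leaves both d_a and d_m unchanged
        bounds.append(da + dm)
    return bounds
```

For the six components of the example (CC, AC, CC, AC, two-row cut MC, CC) this yields the bounds 1, 2, 2, 3, 5, 5 discussed above.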
The following corollary is an immediate consequence of Theorem 4.2.
Corollary 4.1. Given an arbitrary TSAP of a netlist $N$, we may compositionally obtain a diameter bound by the algorithm of Figure 4.2 while using an arbitrary mechanism to obtain a diameter bound $D_i$ for each component $i$ in the context of $N$. Furthermore, the diameter bounds obtained by the algorithm of Figure 4.2 for each isolated CC, AC, MC, and QC are overapproximate regardless of the nature of the overall netlist.

[Figure 4.3 depicts the example netlist partitioned into six components (CC, AC, CC, AC, MC, CC), annotated with $d_a(1) = 0$, $d_m(1) = 1$, $d = 1$; $d_a(2) = 1$, $d_m(2) = 1$, $d = 2$; $d_a(3) = 1$, $d_m(3) = 1$, $d = 2$; $d_a(4) = 2$, $d_m(4) = 1$, $d = 3$; $d_a(5) = 3$, $d_m(5) = 2$, $d = 5$; $d_a(6) = 3$, $d_m(6) = 2$, $d = 5$.]

Figure 4.3: Diameter overapproximation example
Corollary 4.1 implies that we may use different techniques to obtain a diameter bound on the various components of a TSAP and still obtain an overall overapproximate bound. This is a noteworthy result, since general-purpose exact diameter calculation procedures are presently intractable (refer to Section 4.1), and since overapproximation techniques may yield results which are tight, or exponentially loose, or anywhere in between. However, different techniques may yield superior results on different components. Consider, for example, a complete state graph, such as that induced by a vector of FREE vertices driving a parallel vector of REGISTERs as illustrated by component 2 of Figure 4.3. Corollary 4.1 states that the diameter of this AC is at most two. However, use of an overapproximate recurrence diameter will yield an exponentially loose bound for this AC. In other cases, recurrence diameter may yield a tight bound. Our compositional approach therefore provides a theoretical framework to enable a robust mechanism for efficiently obtaining as tight a diameter bound as possible using a variety of techniques. Since diameter overapproximation enables the use of bounded verification algorithms instead of often more costly unbounded algorithms for assessing unreachability, this theory, coupled with further advances in diameter estimation techniques, may well become a powerful cornerstone of a robust, multi-faceted verification strategy.
Note that it is critical to obtain a bound on the diameter of each component in the context of its cone-of-influence. To visualize this, assume that a given GC is an $n$-bit counter with a parallel load input port. If the load input is asserted, the valuation at the parallel data port will be loaded into the counter; else the counter will increment. If we isolate this counter for semantic analysis, since every state may reach every other state via this load mechanism, the isolated counter will have a diameter of 2. However, in the context of $N$, the load input may be semantically equivalent to 0, or perhaps some valuations to the parallel data port are unsensitizable – thus implying a potentially exponentially greater diameter for this component in the context of $N$. In this example, the partitioned analysis enables state transitions in the component which are unreachable in the context of $N$. The diameter obtained on the isolated component may conversely be larger than that in the context of $N$ since unreachable states may become reachable. Therefore, we conclude that we cannot use a tight diameter bound obtained upon an isolated component without consideration of its cone of influence. However, use of recurrence diameter obtained from isolated analysis is conservative; the possible additional states and state transitions which are an artifact of partitioned analysis may only increase recurrence diameter. Similarly, we may use the number of states reachable in the isolated component instead of $2^{|R \cap TSAP_i|}$. Additionally, we discuss the impact of all of the abstraction techniques presented in this thesis upon diameter, thus enabling a transformation-based approach at calculating a diameter bound $D_i$ of $C_i \cap TSAP_i$ for each component $i$. Each transformation will render a component which may be recursively partitioned and analyzed using the theory presented in this chapter. For example, it is possible that an abstraction may yield a component which may be sub-partitioned into more "diameter-friendly" types such as ACs and MCs. Alternatively, due to the potential for REGISTER reduction inherent in our abstractions, semantic approaches are likely to become more efficient and yield tighter bounds.
Theorem 4.3. Let $N$ and $N'$ be netlists which are trace-equivalent with respect to vertex sets $A$ and $A'$ and bijective mapping $\psi : A \mapsto A'$. The diameter of $A$ is equal to that of $A'$.

Proof. This theorem follows immediately from Definitions 3.32 and 4.2.
While the result of Theorem 4.3 may seem obvious from the trace equivalence of $A$ and $A'$, it is somewhat counter-intuitive since the cones-of-influence of $A$ and $A'$ may be arbitrarily dissimilar. For example, the cone-of-influence of $A$ may be entirely combinational while that of $A'$ includes REGISTERs. Theorem 4.3 represents a powerful observation: we may derive a bound for the diameter of a set of vertices based upon analysis of another trace-equivalent set of vertices. We therefore could view our diameter of $A$ in the context of $N$ as being equivalent to that of $A$ with $coi(A) \setminus A$ being replaced by any arbitrary set of vertices which preserves trace-equivalence of $A$. While finding such a minimal safe replacement will often be computationally infeasible (similarly to the computational complexity of bisimilarity reductions [27]), we heuristically will consider specific replacements as those resulting from optimal solutions to our structural abstraction techniques. We will exploit this fact in later chapters to demonstrate how a diameter bound obtained upon a transformed (e.g., retimed) netlist implies a diameter bound on the original netlist.
4.1 Related Work
In this section we discuss prior research in diameter estimation techniques. Note that
breadth-first reachability analysis may be used to calculate the distance between states,
and thus yield a diameter bound. However, this approach is not practically useful, since a
reachability calculation from the initial states is sufficient to solve a verification problem.
The techniques of [57, 61] propose two uses of satisfiability algorithms to attempt to obtain a bound. First, quantified boolean formulae (QBF) are capable of providing tight diameter bounds, though their solution is PSPACE-complete [62] and effective heuristics have not yet been demonstrated. Second, recurrence diameter may in cases be tight, though in others may be exponentially loose (recall the AC example discussed after Corollary 4.1). In [61] it is further proposed to use a hybrid between these two approaches to attempt to partially alleviate their shortcomings. Both of these techniques rely on heavy semantic analysis which often outweighs the complexity of the BMC of the target itself, and which significantly limits their applicability to practical problems. Our structural approach consumes trivial resources, though it may also yield an exponentially loose solution in the case of GCs. However, for other component types our approach does provide near-tight bounds. Furthermore, our compositional theory allows a per-component hybrid use of structural vs. semantic techniques, hence these semantic approaches are complementary tools to enable our theory to obtain the tightest possible overapproximations with minimal resources.
The technique of [63] proposes using directed simulation to estimate diameter. The primary drawback of this approach, and a significant differentiating factor with respect to our technique, is that it constitutes neither an overapproximation nor an underapproximation of diameter, hence is not useful for enabling completeness of BMC. The computational resources reported in [63] also outweigh ours by several orders of magnitude.
In [17], we demonstrate that an acyclic netlist may be transformed into a purely combinational netlist as a special case of c-slow abstraction. However, the theory presented in this chapter yields a smaller netlist through unfolding in such cases due to obviating the need to represent "initial value selection" logic (refer to Chapter 9). Furthermore, this chapter generalizes feed-forward c-slow abstraction in allowing minimal unfolding of other types of cyclic logic (such as CCs and MCs).
4.2 Experimental Results
We defer experimental results for our diameter approximation technique until Section 6.4.3,
so that we may study its synergy with various abstractions.
Chapter 5
Redundancy Removal
In this chapter we discuss redundancy removal techniques, by which we mean transformations which structurally replace vertices in the netlist graph with semantically-equivalent vertices, thereby minimizing the total number of vertices in the cone-of-influence of a target. The common optimization technique of constant propagation [64, 65] is a special case of redundancy removal, which entails merging vertices onto ZERO or ONE. The intricacy of exploiting this technique lies in efficiently detecting as many semantically equivalent vertices as possible to achieve optimal reductions.
The crux of redundancy removal is the Merge algorithm of Figure 5.1, which moves all outgoing edges from one vertex v0 to another vertex v, and causes v0 to shadow v. In order to ensure soundness and completeness for invariant checking, the Merge function may generally only be applied to two vertices which are determined to be semantically equivalent. To ensure legality of the resulting netlist, and to ensure efficiency, we additionally require that we do not merge a vertex with a purely combinational fanin cone onto one which has a sequential fanin cone, and that we only merge a REGISTER onto another REGISTER or a constant vertex.¹ Practically, we may swap the merge arguments, or the order of merges, to circumvent this limitation – and often achieve superior reductions and
¹This rule may be relaxed, as long as care is taken not to introduce combinational cycles when merging a REGISTER onto an AND vertex.
void Merge(Vertex v, Vertex v0) {
  if (v == v0) { return; }
  foreach u ∈ outlist(v0) { Delete_Edge(v0, u); Add_Edge(v, u); }
  foreach u ∈ inlist(v0) { Delete_Edge(v0, u); }
  G(v0) = AND;
  Add_Edge(v, v0);
  foreach r ∈ R { if (Z(r) == v0) { Z(r) = v; } }
  if (v0 ∈ T) { T = (T \ {v0}) ∪ {v}; }
}
Figure 5.1: Structural Merge algorithm
run-time in doing so.
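To make the figure concrete, a minimal Python sketch of this Merge operation follows. The Netlist class, its dictionary-based adjacency lists, and the field names are hypothetical stand-ins for our actual data structures, not the implementation itself.

```python
class Netlist:
    """Toy netlist: adjacency lists, gate types, register initial values, targets."""
    def __init__(self):
        self.outlist = {}    # vertex -> list of sink vertices
        self.inlist = {}     # vertex -> list of source vertices
        self.gate_type = {}  # vertex -> "AND", "FREE", ...
        self.Z = {}          # register -> initial-value vertex
        self.targets = set()

    def merge(self, v, v0):
        """Move all fanout of v0 onto v; v0 becomes a 1-input AND shadowing v."""
        if v == v0:
            return
        # Redirect every outgoing edge of v0 to originate from v
        for u in list(self.outlist.get(v0, [])):
            self.outlist[v0].remove(u)
            self.inlist[u].remove(v0)
            self.outlist.setdefault(v, []).append(u)
            self.inlist[u].append(v)
        # Disconnect v0 from its sources
        for u in list(self.inlist.get(v0, [])):
            self.inlist[v0].remove(u)
            self.outlist[u].remove(v0)
        # v0 becomes a 1-input AND whose single input is v (it shadows v)
        self.gate_type[v0] = "AND"
        self.outlist.setdefault(v, []).append(v0)
        self.inlist[v0] = [v]
        # Repair initial-value references and the target set
        for r, init in self.Z.items():
            if init == v0:
                self.Z[r] = v
        if v0 in self.targets:
            self.targets.discard(v0)
            self.targets.add(v)
```

Note that, as in the figure, the shadowed vertex v0 keeps a single input from v so that any external references to it (e.g., from initial-value cones or targets) remain semantically consistent.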
Theorem 5.1. Algorithm Merge(v0, v) does not alter the semantics of any vertex in N, provided that v and v0 are semantically equivalent.

Proof. We note that after the merge, vertices v and v0 are still semantically equivalent since v0 shadows v. We therefore conclude by Definition 3.12 that any trace which is legal before a Merge is legal after and vice-versa.

Theorem 5.1 implies that redundancy removal is sound and complete for invariant checking. Furthermore, this theorem implies that trace lifting merely requires a call to Simulate to propagate consistent valuations to any merged vertices.
Theorem 5.2.Redundancy removal generates a legal netlist.
Proof. We consider the requirements for legality enumerated in Definition 3.24.
1. The only gates modified by redundancy removal become one-input AND gates, which
are legal. All other gates are legal by assumption.
2. No gates are created by redundancy removal, hence the resulting netlist is finite by
assumption.
3. Redundancy removal cannot merge a combinationally-driven vertex onto a sequentially-driven one. Initial value cones are combinational by assumption, thus cannot be made sequential by redundancy removal.

4. Any REGISTER transformed by redundancy removal is merged onto another REGISTER, or onto ZERO or ONE. Therefore, each directed cycle remains sequential, or is broken.
Theorem 5.3. If the diameter of a set of vertices U of a redundancy-removed netlist is d(U), then the diameter of U prior to redundancy removal is also d(U).

Proof. This proof follows from Theorem 4.3, letting N represent the netlist before redundancy removal, and N′ represent the netlist after redundancy removal. Clearly U and U′ are trace-equivalent vertices with respect to the correspondence defined as the set of tuples ⟨u, u′⟩ for each u ∈ U.
5.1 Redundancy Removal Algorithms
In this section we discuss efficient algorithms and data structures for performing redun-
dancy removal. The core redundancy removal algorithm is the Merge function defined in
Figure 5.1. However, the intricacy of exploiting redundancy removal lies in efficiently detecting as many semantically equivalent vertices as possible to achieve optimal reductions. There are two distinct, complementary approaches for this detection. On-the-fly compression minimizes vertex count during netlist construction [50, 25] by exploiting constant folding techniques (such as a ∧ ¬a = 0) and by merging isomorphic vertices. Post-processing techniques such as BDD sweeping [51] are used to identify semantically-equivalent vertices which are too structurally dissimilar to be identified as such by the efficient yet limited on-the-fly techniques. On-the-fly techniques augment the use of semantic approaches by keeping the original netlist representation as compact as possible, and by exploiting the merging initiated by the semantic analysis.
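The flavor of on-the-fly compression can be illustrated with a small Python sketch of hash-consed AND construction over a purely combinational AND/INVERTER structure. The integer-literal encoding (even ids are non-inverted nodes, the low bit denotes inversion, 0 is ZERO and 1 is ONE) and the method names are assumptions for illustration; the full technique additionally carries the REGISTER edge attributes of Section 5.1.

```python
class AIG:
    """Hash-consed combinational AND/INVERTER structure (illustrative)."""
    def __init__(self):
        self.table = {}   # (lhs, rhs) -> node id, for isomorphic-vertex merging
        self.next_id = 2  # 0 = ZERO, 1 = ONE; low bit of a literal = inversion

    @staticmethod
    def neg(e):
        return e ^ 1  # invert a literal by toggling its low bit

    def new_free(self):
        v = self.next_id
        self.next_id += 2
        return v

    def new_and(self, a, b):
        a, b = min(a, b), max(a, b)      # canonical order (commutativity)
        if a == 0:
            return 0                     # ZERO AND x = ZERO
        if a == 1:
            return b                     # ONE AND x = x (identity)
        if a == b:
            return a                     # idempotency
        if a == self.neg(b):
            return 0                     # x AND NOT x = ZERO (contradiction)
        key = (a, b)
        if key not in self.table:        # merge isomorphic vertices via hashing
            self.table[key] = self.next_id
            self.next_id += 2
        return self.table[key]
```

Constant folding and hashing here guarantee that structurally identical conjunctions always return the same node id, which is precisely what makes subsequent Merge calls unnecessary for isomorphic vertices.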
In this section we focus upon our technique for on-the-fly redundancy removal from [25], which is collaborative work with Andreas Kuehlmann. This technique, termed on-the-fly retiming, is based upon an AND/INVERTER/REGISTER graph representation of the netlist described in Definition 5.1. The AND/INVERTER/REGISTER graph is a powerful and compact netlist representation, useful not only for redundancy removal but also for a retiming implementation (as discussed in Chapter 6) since it closely matches a retiming graph representation.
Definition 5.1. An AND/INVERTER/REGISTER graph representation of a netlist is a graph where all vertices are of type AND or FREE; INVERTERs and REGISTERs are represented implicitly as edge attributes. With each edge (u, v) ∈ E we associate a tuple ⟨w, E, i⟩.

• Weight w(u, v) ∈ Z represents the number of REGISTERs along this edge.

• Term E = E_uv^1, …, E_uv^{w(u,v)} represents the corresponding sequence of initial values for the REGISTERs along this edge.

• Term i(u, v) ∈ {0, 1} is an inverted attribute indicating whether the edge function is complemented; if 1, the corresponding INVERTER is at the fanout of any REGISTERs along the edge.
We may map an AND/INVERTER/REGISTER graph to a netlist as demonstrated in Figure 5.2. For each edge (u, v) in the AND/INVERTER/REGISTER graph, there will be a structural path in the netlist beginning with vertex u and ending with vertex v. This path will include a sequence of w(u, v) intermediate REGISTERs whose initial values are determined by the sequence E_uv^i. Additionally, if i(u, v) is 1, there will be an INVERTER at the fanout of the sequence of REGISTERs.
Figure 5.2: Mapping the AND/INVERTER/REGISTER graph to a netlist: (a) AND/INVERTER/REGISTER graph edge, (b) corresponding netlist fragment
The direct mapping to a netlist as depicted in Figure 5.2 does not account for the concept of fanout REGISTER sharing as proposed in [66]. As is depicted in Figure 5.3, the REGISTERs along all outgoing edges from a given source vertex may be shared, provided that their initial value sequences are compatible. A more efficient mapping of an AND/INVERTER/REGISTER graph to a netlist should account for fanout REGISTER sharing, hence will generate only the maximum number of REGISTERs along any outgoing edge from a vertex, rather than the sum across all outgoing edges.
Note that an individual gate of the resulting netlist is identifiable by a source vertex u and a set of attributes ⟨w, E, i⟩. For this reason, our algorithms for constructing gates in the AND/INVERTER/REGISTER graph provided in Figures 5.4 and 5.5 return ⟨vertex, attribute⟩ tuples, and take such tuples as operands.

Edge weights w will always be non-negative given that our only sequential elements are REGISTERs. However, we introduce the NEGATIVE REGISTER in Chapter 6. Allowing
Figure 5.3: Fanout REGISTER sharing example (outgoing edges of vertex u with attributes w = 2, i = 1; w = 1, i = 0; and w = 0, i = 1)
negative weights enables use of the AND/INVERTER/REGISTER graph for netlists containing NEGATIVE REGISTERs.

As discussed, the functions represented by two edges are semantically equivalent if (though not necessarily "only if") they have: (1) the same source vertex, and (2) the same ⟨w, E, i⟩ attributes; in this case, they correlate to the same netlist gate. In our implementation of this data structure, we use a compact 64-bit word to uniquely represent these tuples. This word is composed of four bit fields: an index into the array of graph vertices, the number of edge REGISTERs, an index to a canonical representation of their initial values, and a single bit to indicate edge complementation. Using this data structure, a simple comparison of two words may decide whether two edges are semantically equivalent or inverted.
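A Python sketch of such a packed edge word follows. The particular field widths (31/16/16/1 bits) are illustrative assumptions rather than our actual layout.

```python
# Illustrative bit-field widths for the 64-bit edge word (assumed, not actual):
# vertex index | REGISTER count | initial-value index | inverted bit
V_BITS, W_BITS, E_BITS = 31, 16, 16

def pack_edge(vertex_idx, num_regs, init_idx, inverted):
    """Pack a <vertex, w, E-index, i> tuple into a single word."""
    assert vertex_idx < (1 << V_BITS)
    assert num_regs < (1 << W_BITS)
    assert init_idx < (1 << E_BITS)
    word = vertex_idx
    word = (word << W_BITS) | num_regs
    word = (word << E_BITS) | init_idx
    word = (word << 1) | (inverted & 1)
    return word

def unpack_edge(word):
    """Recover the <vertex, w, E-index, i> tuple from a packed word."""
    inverted = word & 1
    word >>= 1
    init_idx = word & ((1 << E_BITS) - 1)
    word >>= E_BITS
    num_regs = word & ((1 << W_BITS) - 1)
    word >>= W_BITS
    return word, num_regs, init_idx, inverted

# Two edges are semantically equivalent iff their words are equal, and
# complements of one another iff the words differ only in the low bit.
def equivalent(w1, w2):
    return w1 == w2

def inverse(w1, w2):
    return (w1 ^ w2) == 1
```

The payoff of this encoding is that both equivalence and complementation checks reduce to single-word integer comparisons.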
The canonical representation of initial values is based upon a tree structure where the paths correspond to sequences of initial values of the edges of N. The tree root is a NULL dummy node. The first level of children represents the initial values of the first REGISTER along any edge. The tree branching structure corresponds to the different combinations of initial values of all edges. By ensuring uniqueness of the individual paths and subpaths during tree construction and manipulation, a pointer to any of the tree nodes provides a representation that is canonical for that particular set of initial values. (However, the initial value vertices themselves may not be canonical.)
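This canonicalization can be sketched as a hash-consed trie in Python; the class shape and method names are hypothetical.

```python
class InitValueTree:
    """Trie over initial-value sequences: unique paths give canonical node ids."""
    def __init__(self):
        self.children = {}  # (parent_id, value) -> node_id
        self.root = 0       # dummy NULL root
        self.next_id = 1

    def intern(self, values):
        """Return the canonical node id for a sequence of initial values."""
        node = self.root
        for v in values:
            key = (node, v)
            if key not in self.children:
                self.children[key] = self.next_id
                self.next_id += 1
            node = self.children[key]
        return node
```

Because each (parent, value) pair maps to exactly one child, two edges with the same initial-value sequence always intern to the same node id, and shared prefixes share trie nodes.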
The technique of on-the-fly retiming is used to eliminate sequential redundancy by applying specific, local retiming [66] steps during the construction of the AND/INVERTER/REGISTER graph. Similar to the use of an AND/INVERTER graph for combinational netlists [50], this approach may result in a significant compaction of the netlist representation without significant time or memory overhead. The on-the-fly retiming step is integrated into our algorithms for constructing AND gates and REGISTERs. We demonstrate the algorithm for 2-input AND gates in Figure 5.4, and for REGISTERs in Figure 5.5. The operands to these functions each comprise a source vertex and a set of edge attributes, correlating to an AND/INVERTER/REGISTER graph edge without a sink vertex. The graph construction begins from FREE and ZERO vertices and an arbitrary set of REGISTER cuts of the cyclic logic. For each cut, first a dummy 0-input AND vertex is created and used as a place-holder. Once the next-state function of the corresponding REGISTER is built, the place-holder is merged onto that structure.
The AND construction algorithm first performs constant folding similar to methods applied in combinational netlist compaction [50]. Next, the REGISTER sequences along both edges are truncated by "dragging" as many REGISTERs as possible through the AND vertex. The edge truncation is performed by function Truncate_Registers; note that for inverted input edges, these truncated initial values must be inverted before being returned. The initial values of the dragged REGISTERs are computed by a pairwise AND of the truncated initial values. This operation is performed by the function And_Initial_Values. If a pre-existing node corresponding to this conjunction is found in the initial value tree, it is reused; otherwise, a new node is constructed. We next form an AND vertex over the truncated edges. We swap incoming edges to the AND vertex to capture commutativity using an arbitrary ordering function Rank, then check to see if an isomorphic AND vertex exists. If so, it is reused; otherwise, a new AND vertex is created and hashed. The edge correlating to the set of dragged REGISTERs is then returned with the source AND vertex. For an AND gate with n inputs, the computational resources required for each call to this algorithm are O(n · wmin) due to REGISTER dragging and initial value conjunction; the other operations require constant time, assuming a constant-time hashing function.
/* Create_And takes two operand edges e1 and e2, and
   returns an edge representing their conjunction */
AIR_Edge Create_And(AIR_Edge e1, AIR_Edge e2) {
  if (e1 == ZERO) return ZERO;
  if (e2 == ZERO) return ZERO;
  if (e1 == ONE) return e2;
  if (e2 == ONE) return e1;
  if (e1 == e2) return e1;
  if (e1 == ¬e2) return ZERO;
  /* Truncate as many REGISTERs as possible from each
     edge, and store them as edge attributes in Ei */
  wmin = min{w(e1), w(e2)};
  (e1′, E1) = Truncate_Registers(e1, wmin);
  (e2′, E2) = Truncate_Registers(e2, wmin);
  /* Merge the initial values by ANDing them */
  E = And_Initial_Values(E1, E2);
  /* Apply ranking to exploit commutativity */
  if (Rank(e1′) > Rank(e2′)) Swap(e1′, e2′);
  /* Hash lookup for AND over e1′ and e2′ */
  e = Hash_Lookup(e1′, e2′);
  /* Create & hash new vertex if lookup fails */
  if (e == NULL) { e = Create_And_Vertex(e1′, e2′); }
  /* Add back dragged REGISTERs */
  return (e, ⟨wmin, E, 0⟩);
}
Figure 5.4: AND/INVERTER/REGISTER-graph algorithm for AND gate creation
/* Create_Register takes two operand edges: en representing
   the input to the REGISTER, and ei representing
   its initial value. It returns an edge representing
   the corresponding REGISTER */
AIR_Edge Create_Register(AIR_Edge en, AIR_Edge ei) {
  if ((en == ZERO) ∧ (ei == ZERO)) return ZERO;
  if ((en == ONE) ∧ (ei == ONE)) return ONE;
  i = i(en);
  /* Drag inversion past REGISTER */
  if (i) { ei = Create_Inverter(ei); }
  (e, E) = Create_Edge(en, ei);
  return (e, ⟨w(en) + 1, E, i⟩);
}
Figure 5.5: AND/INVERTER/REGISTER-graph algorithm for REGISTER creation
The REGISTER construction algorithm first attempts to replace the REGISTER with a constant. If unsuccessful, it drags any inversion past the REGISTER being created by inverting the corresponding initial value using function Create_Inverter. Create_Inverter merely toggles the inversion attribute of the corresponding edge. Create_Edge looks for a node in the initial value tree correlating to ei as a child of en; if it finds one, it reuses this node, otherwise a new node is constructed. Each call to this algorithm requires constant time, assuming a constant-time hashing function.
We introduce an example netlist in Figure 5.6a, and its corresponding AND/INVERTER/REGISTER graph in Figure 5.7a. If this graph is created using the on-the-fly retiming algorithms of Figures 5.4 and 5.5, the resulting AND/INVERTER/REGISTER graph is depicted in Figure 5.7b, corresponding to the netlist of Figure 5.6b. The graph was constructed from the original netlist shown in Figure 5.6a starting from the FREE vertices and a cut at REGISTER r1.
Figure 5.6: On-the-fly retiming example: (a) original netlist, (b) netlist after on-the-fly retiming
Lemma 5.1. The on-the-fly retiming transformations of Figures 5.4 and 5.5 are sound and complete for invariant checking.

Proof. We consider the individual transformations.

• The following transformations are correct by propositional logic.

– Conjunction with ZERO yields ZERO.
– Conjunction of opposite polarity literals yields ZERO (contradiction).
– Conjunction of a literal with ONE is semantically equivalent to that literal (identity).
53
0
1
0
0
0
1
1
(b)
(a) y1y2g4g3g1=g2
y1y2g2g4g3w = 2 w = 1w = 1w = 1w = 2
w = 2w = 1
w = 1
g5
g5
x3x1x2x3x1x2
w = 1
g1
edge inversionreference to node of initial value tree
set ofw REGISTERsw
Figure 5.7: AND/INVERTER/REGISTER graph example: (a) graph of original netlist ofFigure 5.6a, (b) graph of on-the-fly retimed netlist of Figure 5.6b
– Conjunction of identical literals is semantically equivalent to that literal (idempotency).
– Swapping of incoming edges to an AND vertex is semantically correct by the commutativity of conjunction.
– Elimination of pairs of adjacent INVERTERs is semantically correct (double negation).

• The following transformations are correct by semantic equivalence (Theorem 5.1).
– Re-use of an existing AND or INVERTER with identical inputs is semantically
correct.
– Re-use of an existing REGISTER with an identical input and initial value is
semantically correct.
– Replacement of a REGISTER with a constant initial value, whose input is the same constant, with the corresponding constant is semantically correct.

• An INVERTER dragged past a REGISTER is semantically equivalent to the original REGISTER (without dragging). Because we invert the initial value of the bypassed REGISTER, at time 0 the dragged INVERTER drives the inversion of the inverted initial value, equivalent to the value of the original REGISTER by the double negation property of propositional logic. Thereafter, the dragged INVERTER will drive the negation of the valuation that appeared at the source of the undragged INVERTER one time-step earlier, as will the original un-bypassed REGISTER.
one time-step earlier, as will the original un-bypassed REGISTER.� The last of thewmin REGISTERs dragged beyond an AND vertex is semantically
equivalent to the original unbypassed AND vertex. At time-stepi 2 [0; wmin � 1℄,this last dragged REGISTER drives the(wmin � i)-th dragged initial value, which
is equivalent to the valuation to the unbypassed AND vertex at the same time-step
because we conjunct initial values of the dragged REGISTERs. Thereafter, this last
dragged REGISTER drives the conjunction of valuations to the sources of the by-
passed AND from wmin time-steps earlier; valuations to the sources of the bypassed
AND arewmin time-steps earlier than those of the unbypassed AND.
Note that each Merge call requires linear resources with respect to netlist size. Practically, the resources tend to be near constant, since most vertices have a relatively small indegree and outdegree, and since we may hash REGISTER initial values to avoid needing to explicitly check each one during Merge. After merging, the merged vertex will have zero sinks, and will not be in the cone of influence of T. Therefore, optimal redundancy removal may be achieved within |V| calls to Merge, which overall bounds necessary resources to quadratic. After a vertex is merged, it is often beneficial to analyze its original sinks to see if they too may be candidates for simplification – we refer to this recursive forward sweeping of simplification as forward hashing. However, we must take care not to get caught in an infinite loop of dragging REGISTERs through cyclic logic when redundancy removal includes on-the-fly retiming. This may be enforced by setting and clearing visited flags as forward hashing processes vertices. If the visited flag of a vertex is already set, forward hashing neglects processing that vertex to prevent infinite recursion.
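A Python sketch of forward hashing with visited flags follows. The netlist interface assumed here (sinks, try_simplify, merge) is a hypothetical stand-in for our actual data structures.

```python
def forward_hash(netlist, vertex, visited=None):
    """Recursively revisit the sinks of a vertex as simplification candidates.
    Visited flags prevent infinite recursion through cyclic logic."""
    if visited is None:
        visited = set()
    if vertex in visited:       # flag already set: skip to avoid looping
        return
    visited.add(vertex)         # set the visited flag while processing
    for sink in list(netlist.sinks(vertex)):
        replacement = netlist.try_simplify(sink)  # hypothetical hook
        if replacement is not None:
            netlist.merge(replacement, sink)
            # the surviving vertex's sinks may now simplify too: sweep forward
            forward_hash(netlist, replacement, visited)
    visited.discard(vertex)     # clear the flag once processing completes
```

Without the visited set, dragging REGISTERs around a cycle could re-trigger simplification of the same vertices indefinitely; the flag bounds the sweep to each vertex once per recursion path.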
5.2 Related Work
Redundancy removal has been the topic of numerous prior research efforts; our contribution
to this area is the technique of on-the-fly retiming. Our AND/INVERTER/REGISTERgraph
is a sequential extension to the AND/INVERTER graph proposed in [50] which enables
sequential redundancy removal by relocation of REGISTERs across combinational vertices
in the graph. This relocation correlates to the applicationof specific retiming moves [66].
Semantic approaches of redundancy removal provide an extension to on-the-fly techniques. If they determine two vertices to be semantically equivalent, they merge one onto the other. As an example, the technique of [51] iterates between BDD-sweeping and SAT solving with resource bounds. BDDs are canonical representations of the function of a vertex, and are built up to a specified upper-bound size limit. BDD hashing is used to determine whether vertices are semantically equivalent, in which case they are merged, or opposite, in which case one is merged onto the inversion of the other. When BDDs become too large, intermediate cut vertices are used to allow BDDs to be built beginning from arbitrary points in the netlist, not just FREE or REGISTER points. SAT is also used to prove
equivalence or inversion as an alternate algorithm which may in cases outperform BDDs.
The application of on-the-fly redundancy removal may be synergistically combined with
such semantic approaches. The integration of on-the-fly retiming furthermore extends the
equivalence checking capability of these techniques beyond combinational verification to
potentially cover a significant class of practical problems in verifying retimed netlists [67].
The benefit of specific forms of redundancy removal for enhancing verification has
been noted in numerous prior publications, such as [64, 65].
5.3 Experimental Results
The experimental results of our redundancy removal techniques will be provided in Sec-
tion 6.4.1 so that we may study their synergy with retiming.
Chapter 6
Generalized Retiming
In this chapter we discuss the use of generalized min-area retiming to reduce verifica-
tion complexity. This chapter extends the results of collaborative work with Andreas
Kuehlmann reported in [10, 25]. Retiming is a structural optimization technique which
relocates REGISTERs in a netlist across combinational gates with the objective of minimizing their total count, minimizing the greatest combinational delay along any directed path containing no REGISTERs, or minimizing one objective while constraining the potential increase of the other [68, 66]. For synthesis purposes, the latter is the most common objective since minimization of REGISTER count is often contrary to minimization of the worst-case combinational delay, thus simultaneous minimization of these two objectives is typically necessary. However, for invariant checking we are not concerned about combinational delays, hence the minimization of REGISTER count – which is referred to as min-area retiming – is our primary objective. Unlike the on-the-fly retiming technique from the previous chapter, traditional retiming alters the semantics of the netlist by causing gates to be temporally shifted.

The traditional use of retiming is for enhanced synthesis, which imposes two constraints that fundamentally limit its solution space: the retimed netlist must be physically implementable, and the retiming must preserve the original input-output behavior of the
netlist. For verification, these restrictions may be lifted, which results in a larger solution space, hence a potentially significantly greater reduction in REGISTER count. There are three generalizations of classical retiming that may be exploited in a verification domain. First, REGISTERs which are sourced by FREE vertices or have no sinks represent a mere temporal shift of peripheral values, thus may be suppressed for state space traversal using the technique of peripheral retiming. Second, a temporally partitioned invariant check eliminates the restriction that the retimed netlist must have an equivalent reset state. Third, verification algorithms may handle NEGATIVE REGISTERs, which are formalized in Definition 6.2. This significantly increases the solution space for legal retimings by removing the non-negativity constraints from the problem formulation. In this chapter we explore these topics, extending the results we reported in [10].
Retiming is traditionally applied to a rigid netlist graph and repositions the REGISTERs without altering the combinational logic structure. When interleaved with redundancy removal, a repeated application of retiming may significantly optimize the overall netlist structure [10, 69, 70]. In this chapter we additionally introduce our technique of fanin REGISTER sharing from [25], which is analogous to the original concept of fanout REGISTER sharing [66]. This technique takes a new view of the retiming formulation by departing from a traditional, more restrictive use of a fixed netlist structure.
6.1 Retiming Formulation

In this section we define retiming, and discuss its formulation. The retiming optimization problem may be formulated as an Integer Linear Program (ILP) using a directed graph model of the netlist [66] which represents REGISTERs implicitly as edge weights, similarly to our AND/INVERTER/REGISTER graph introduced in Section 5.1. For simplicity of exposition, all theory of this chapter is developed according to this representation; refer to Section 5.1 for a precise mapping of this representation to a netlist.
Definition 6.1. A retiming of netlist N is a gate labeling r : V → Z, where r(v) is the lag of vertex v denoting the number of REGISTERs that are moved backward through it.

The retimed edge weights w̃ of the retimed netlist Ñ are computed as follows.

    w̃(u, v) = w(u, v) + r(v) − r(u)    (6.1)

Traditional retiming also imposes non-negativity constraints upon w̃(u, v).

    w̃(u, v) ≥ 0    (6.2)

For min-area retiming, we are interested in minimizing the total number of REGISTERs of Ñ.

    min Σ_{(u,v) ∈ E} w̃(u, v)    (6.3)
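To make objective (6.3) under constraints (6.1) and (6.2) concrete, the following Python sketch solves a tiny min-area instance by brute force over a bounded lag range. This is a toy stand-in for an ILP solver, and no REGISTER sharing is modeled.

```python
from itertools import product

def min_area_retiming(vertices, edges, lag_range=range(-2, 3)):
    """edges: dict (u, v) -> weight w(u, v). Returns (lags, cost) minimizing
    the sum of retimed weights w~(u, v) = w(u, v) + r(v) - r(u)   (Eq. 6.1)
    subject to w~(u, v) >= 0                                      (Eq. 6.2)."""
    best_lags, best_cost = None, float("inf")
    for lags in product(lag_range, repeat=len(vertices)):
        r = dict(zip(vertices, lags))
        retimed = {e: w + r[e[1]] - r[e[0]] for e, w in edges.items()}
        if all(w >= 0 for w in retimed.values()):   # non-negativity
            cost = sum(retimed.values())            # objective (6.3)
            if cost < best_cost:
                best_lags, best_cost = r, cost
    return best_lags, best_cost
```

For example, a cycle a → b → d → a and a → c → d with two REGISTERs on each of the edges leaving a retimes down from four REGISTERs to two (both moved onto the edge d → a), since total weight around any cycle is invariant under (6.1).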
6.1.1 Fanout REGISTER Sharing

The above retiming formulation does not consider fanout REGISTER sharing as depicted in Figure 5.3. Leiserson and Saxe [66] provide an extension to the retiming formulation to account for fanout sharing, as depicted in Figure 6.1a. Their approach adds a dummy vertex for each netlist vertex v with an outdegree greater than one. (No semantics are applied to these dummy vertices; they are merely temporary artifacts of the ILP graph.) This dummy vertex will sink all vertices in outlist(v), and edge weights are modified as shown. Let wmax represent the maximum weight of any of these outgoing edges, and n = outdegree(v). Each weight wi is divided by n, and the new edges to the dummy vertex are assigned a weight equal to the difference between wmax/n and the modified weight of the corresponding fanout edge. This fractional weight is realized by associating a "cost per unit weight" of 1/n with each edge (u, v), and minimizing the total weighted cost in the objective function (6.3). Note that the sum of all edge weights in this "sharing subnetlist" is equal to wmax, and that the retiming formulation accounts for w̃max in the overall minimization problem because at least one incoming edge to the dummy vertex
Figure 6.1: REGISTER sharing: (a) ILP model of fanout REGISTER sharing, (b) extension to fanin REGISTER sharing
will have a weight of zero in any optimal solution. This precisely models fanout REGISTER
sharing for the ILP formulation.
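The weight bookkeeping can be checked on a small instance in Python, using the fanout weights w = 2, 1, 0 of the Figure 5.3 example; the helper name is hypothetical.

```python
from fractions import Fraction

def fanout_sharing_weights(fanout_weights):
    """Model of Figure 6.1a: each fanout weight w_i becomes w_i / n, and the
    edge into the dummy vertex gets wmax/n - w_i/n, so the subnetlist's total
    weight is exactly wmax. Each such edge carries a cost of 1/n per unit
    weight in objective (6.3)."""
    n = len(fanout_weights)
    wmax = max(fanout_weights)
    modified = [Fraction(w, n) for w in fanout_weights]
    dummy = [Fraction(wmax, n) - m for m in modified]
    return modified, dummy

modified, dummy = fanout_sharing_weights([2, 1, 0])
# 2/3 + 1/3 + 0 on the fanout edges, 0 + 1/3 + 2/3 on the dummy edges:
# the subnetlist totals wmax = 2, counting the shared REGISTERs once.
assert sum(modified) + sum(dummy) == 2
```

Exact rationals (Fraction) are used so the wmax invariant can be checked without floating-point error; an actual ILP formulation would carry the 1/n factor in the objective coefficients instead.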
6.1.2 Fanin REGISTER Sharing
As we proposed in [25], the concept of fanout REGISTER sharing may be extended to fanin REGISTER sharing. If the vertices represent completely symmetric boolean functions, then all possible tree configurations establish a valid decomposition of their function. In our framework, we use multi-input AND vertices as a base system for fanin sharing. However, the presented concepts are equally applicable to other completely symmetric functions (e.g., OR and XOR vertices).
Figure 6.1b shows how the concept of fanout REGISTER sharing may be adapted to fanin REGISTER sharing. A dummy vertex for fanin sharing is created for each AND vertex v with an indegree of three or greater (since decomposition of a 2-input AND is not useful). This dummy vertex sources new incoming edges into all vertices of inlist(v), and the edge weights are modified as shown, analogously to the modeling of fanout REGISTER sharing. With this configuration, the retiming optimization problem will minimize the maximum number of REGISTERs at any of the fanin edges to v, rather than their sum. Once a min-area retiming is computed, a simple algorithm may be used to decompose a multi-input AND vertex into a tree of 2-input AND vertices to enable maximal sharing of REGISTERs.
Figure 6.2: Decomposition of an AND vertex that requires only wmax = k + n + m REGISTERs: (a) vertex with incoming edges sorted by weight, (b) corresponding AND tree
The scheme for flexible tree decomposition that requires only w̃max REGISTERs is
illustrated in Figure 6.2. The algorithm first sorts the incoming edges to an AND vertex by
their retimed weight. Next, an AND tree is built using the structure of Figure 6.2b. For each
set of incoming edges with identical weight, a balanced AND subtree is constructed. The
individual subtrees are then connected by REGISTERs in a linear sequence. The number
of REGISTERs assigned to the edges between the subtrees is equal to the difference of
their REGISTER count. This construction, and the calculation of the corresponding initial
values, may be performed by building a series of 2-input AND vertices from the highest-
weight incoming edge to the lowest-weight incoming edge using the on-the-fly retiming
algorithm of Figure 5.4.
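This decomposition can be sketched in Python as follows. The tuple-based node encoding is a hypothetical stand-in for our graph structures, and the equal-weight subtrees are built as chains rather than balanced trees for brevity.

```python
from itertools import groupby

def decompose_and(fanins):
    """fanins: list of (signal, weight) pairs. Returns nested tuples where
    ('AND', a, b) is a 2-input AND and ('REGS', k, x) is k REGISTERs fed by x.
    Subtrees over equal-weight fanins are chained by REGISTERs covering the
    weight differences, so only wmax REGISTERs appear on the deepest path."""
    fanins = sorted(fanins, key=lambda sw: sw[1], reverse=True)
    tree, tree_weight = None, None
    for weight, group in groupby(fanins, key=lambda sw: sw[1]):
        # subtree over the equal-weight fanins (chain here for brevity)
        sub = None
        for signal, _ in group:
            sub = signal if sub is None else ("AND", sub, signal)
        if tree is None:
            tree, tree_weight = sub, weight  # start from the highest weight
        else:
            # REGISTERs equal to the weight difference, then AND the subtrees
            tree = ("AND", ("REGS", tree_weight - weight, tree), sub)
            tree_weight = weight
    # any weight common to all fanins is placed once at the output
    return ("REGS", tree_weight, tree) if tree_weight else tree
```

Because consecutive subtrees are separated by REGISTERs counting only the weight difference, the total REGISTER count along the tree telescopes to the maximum fanin weight, matching the w̃max bound of Figure 6.2.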
For maximum fanin sharing, the netlist graph is first restructured to form maximal AND vertices – i.e., we iteratively subsume AND vertices which are connected without inversion or sequential elements into "larger" multi-input AND vertices. Next, a retiming graph with the dummy vertices for fanin and fanout sharing is built. Note that for any edge e involved in simultaneous fanin and fanout sharing, a splitting one-input AND vertex must
be introduced between its endpoints to disambiguate a retiming solution during resynthesis
of the retiming graph. After computing the optimal retiming, a two-input AND graph may
be rebuilt using the procedures depicted in Figures 5.3 and 6.2 to obtain a minimum number
of REGISTERs.
Figure 6.3: Retiming graph for netlist of Figure 5.6: (a) original graph with 3 REGISTERs, (b) optimal solution resulting in 2 REGISTERs
Figure 6.3 depicts the retiming graph for the example netlist of Figure 5.6b. Part (a) provides the edge weights for the original netlist. The top portion of the graph depicts the dummy vertex modeling the possible sharing of fanout REGISTERs of vertex g1=g2. The bottom portion models the possible sharing of fanin REGISTERs of vertex g3=g4. Note that we initially arbitrarily assigned the two REGISTERs between gates g1=g2 and g3=g4 to the input portion of gate g3=g4. An assignment to the output portion of gate g1=g2 would yield identical results. Part (b) shows the resulting weights from the ILP solver, which corresponds to an optimal retiming.
6.1.3 Relaxing Input-Output Equivalence Constraints
The original definition of retiming for synthesis requires the preservation of input-output
semantic equivalence. One consequence of this requirementis that the sequential weight
of any path from a design input (which correlates to a FREE vertex in our framework) to a
design output (which correlates to a sinkless vertex in our framework) must be unchanged
through retiming. Leiserson and Saxe [68] propose enforcing this constraint by introducing a special host vertex, which sources edges to all FREE vertices, and sinks edges from all outputs. Retiming the host therefore merely shifts REGISTERs across these peripheral vertices rather than discarding them, thereby preserving path weights.
Figure 6.4a depicts a netlist with six REGISTERs R1, ..., R6, two FREE vertices a and b, and one sinkless target t. The initial values of the REGISTERs are Z(R1) = ONE, Z(R2) = ZERO, Z(R3) = ZERO, Z(R4) = ONE, Z(R5) = ZERO, and Z(R6) = ZERO.
Figure 6.4b shows the retiming graph for this netlist including the host vertex. The edge
labels denote the number of REGISTERs along the corresponding nets.
[Figure 6.4: Retiming example: (a) original netlist, (b) corresponding retiming graph]
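As a concrete illustration of the host-vertex construction, the following Python sketch builds such a retiming graph; the dict-based graph encoding and the vertex name "host" are our own illustrative assumptions, not the dissertation's actual data structures:

```python
def build_retiming_graph(edges, free_vertices, outputs):
    """Sketch of a Leiserson-Saxe-style retiming graph with a host vertex.

    edges: dict mapping (u, v) -> REGISTER count w(u, v).
    free_vertices: design inputs (FREE vertices); outputs: sinkless vertices.
    The host sources zero-weight edges to all FREE vertices and sinks
    zero-weight edges from all outputs, so every input-to-output path closes
    into a cycle whose total weight no retiming can change."""
    graph = dict(edges)
    for v in free_vertices:
        graph[("host", v)] = 0      # host -> input edge
    for v in outputs:
        graph[(v, "host")] = 0      # output -> host edge
    return graph

# Tiny example: input a, gate g1, target t, with 1 and 2 REGISTERs on the edges.
g = build_retiming_graph({("a", "g1"): 1, ("g1", "t"): 2}, ["a"], ["t"])
```

Because the host closes every input-to-output path into a cycle, and retiming preserves cycle weights, path weights from inputs to outputs are preserved exactly as the text describes.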
For verification purposes, REGISTERs at the peripheral vertices of a netlist represent mere temporal offsets and do not impact the reachability of the netlist core [71]. Thus, they may be suppressed during the verification process. These offsets may be restored by temporal shifts in any traces obtained on the retimed netlist as per the trace-lifting algorithm implied by Lemma 6.2. To enable discarding of REGISTERs from peripheral vertices, which is termed peripheral retiming [69], the host vertex is removed from the retiming graph, causing the ILP solution to pull as many REGISTERs as possible out of the netlist. For synthesis applications, these REGISTERs are considered temporarily “hidden” or “borrowed” and would have to be added back after optimization [69]. Figure 6.5a shows the graph for a maximal peripheral retiming of the netlist of Figure 6.4, ignoring initial state equivalence. The edge labels represent the REGISTER counts w/~w of the original netlist (w) and retimed netlist (~w), respectively. The vertex labels denote their lag, i.e., the number of REGISTERs that have been pushed backward through them. As depicted, by merging R1 and R2 and removing R6, the REGISTER count may be reduced from six to four.
[Figure 6.5: Relaxed retiming graphs for the example of Fig. 6.4: (a) peripheral retiming ignoring reset state equivalence, (b) retiming with NEGATIVE REGISTERs permitted]
A second constraint imposed by synthesis requirements is that the retimed netlist must have an equivalent initial state. With this restriction, the netlist of Figure 6.4a cannot readily be retimed since REGISTERs R1 and R2 have incompatible initial values and cannot be merged by a backward move. To visualize this, if R1 and R2 are shared with an initial value of ONE, the sequence {⟨(a,0),0⟩, ⟨(b,0),0⟩, ⟨(a,1),1⟩, ⟨(b,1),0⟩, ⟨(a,2),0⟩, ⟨(b,2),0⟩} would produce the sequence {⟨(t,0),0⟩, ⟨(t,1),0⟩, ⟨(t,2),0⟩} instead of the sequence {⟨(t,0),0⟩, ⟨(t,1),0⟩, ⟨(t,2),1⟩} in the REGISTER-shared and original netlist, respectively. Similarly, the sequence {⟨(a,0),1⟩, ⟨(b,0),0⟩, ⟨(a,1),0⟩, ⟨(b,1),0⟩, ⟨(a,2),1⟩, ⟨(b,2),0⟩, ⟨(a,3),0⟩, ⟨(b,3),0⟩} would produce a distinguishing sequence of valuations to t if both REGISTERs are shared with a joint initial value of ZERO.
In verification, we need not preserve input-output equivalence of the retimed netlist as long as invariant checking is preserved. The requirement for equivalent reset states may be relaxed by a temporal decomposition of the verification task into two parts: (1) performing a bounded model check of each time-step of t included in a combinational initialization structure, representing an unfolding of each vertex v for time-steps 0, ..., −1 − r(v), hereafter referred to as the retiming stump, and (2) checking the retimed netlist core, hereafter referred to as the retimed recurrence structure. By separating these two obligations, we perform a temporal decomposition of the invariant check which enables greater reduction capability for the subsequent verification flow than otherwise possible.
6.1.4 Enabling NEGATIVE REGISTERs
A third and final relaxation of retiming is achieved by enabling negative weights along the
edges. This approach is motivated by the fact that REGISTERs merely denote functional
relations between different time-steps as illustrated by Definition 3.12. In logic synthesis,
clocked or unclocked delay elements are used to physically implement these relations. Such
elements may only cause delays of present values into future time-steps. However, for verification, this limitation may be lifted and arbitrary temporal relations in either direction may
be supported, thus enabling the generation of NEGATIVE REGISTERs. NEGATIVE REG-
ISTERs are formalized in Definition 6.2, which serves as an addendum to Definition 3.11
and 3.12 for this chapter.
Definition 6.2. A NEGATIVE REGISTER is a one-input gate which acts as a one-time-step
predictor of its input. Term Gv is not referenced for NEGATIVE REGISTERs. If type(v) = NEGATIVE REGISTER, then p(v, i) = p(u, i + 1), where u = inlist(v).
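On a bounded trace, Definition 6.2 can be illustrated with a minimal Python sketch (the function names and list encoding are ours, not the thesis's): a REGISTER delays its input stream by one time-step, while a NEGATIVE REGISTER predicts it.

```python
def register(inputs, init):
    # REGISTER: p(v, i) = p(u, i - 1), with the initial value Z(v) at time 0.
    return [init] + inputs[:-1]

def negative_register(inputs):
    # NEGATIVE REGISTER (Definition 6.2): p(v, i) = p(u, i + 1); the value at
    # the final step of a bounded trace is unconstrained (None here), which is
    # why forward traversal must be validated against future time-steps.
    return inputs[1:] + [None]
```

Composing the two recovers the original stream, reflecting that their sequential weights (+1 and −1) cancel along a path.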
We provide an alternative interpretation of the NEGATIVE REGISTER in Figure 6.6b. The sequential weight of a NEGATIVE REGISTER is −1, whereas that of a REGISTER is 1. In the presence of NEGATIVE REGISTERs, states reached during forward traversal must be validated as being truly reachable by analysis of future time-steps. This results in a third component for the temporal decomposition of the verification task, reflected by the retiming top (refer to Figures 6.6 and 6.8): a state encountered during forward traversal may be determined to be unreachable if it cannot satisfy this structure.
Though most abstractions discussed in later chapters are applicable to netlists which contain NEGATIVE REGISTERs, we do not discuss them in that context due to the necessity of validating counterexample traces against the retiming top. Extensions of most verification algorithms to handle NEGATIVE REGISTERs are straightforward as per Definition 6.2. However, the focus of this thesis is on abstractions of use in a general invariant checking paradigm without restriction or customization of verification algorithms. We therefore only discuss NEGATIVE REGISTERs in this chapter.
To enable NEGATIVE REGISTERs, the non-negativity constraints of formula (6.2) are relaxed for the ILP solver, and we minimize the sum of the absolute value of each ~w(u,v) in our objective function (6.3). In synthesis, NEGATIVE REGISTERs are considered temporary and must be eliminated after optimization, except in specific cases where precomputation may be employed [72]. Figure 6.5b shows the resulting retiming graph for the netlist of Figure 6.4. By using one NEGATIVE REGISTER, the total sequential element count is reduced to three. Figure 6.6a depicts the resulting netlist, where ~R2 represents a NEGATIVE REGISTER. Note that these three sequential elements reflect the true temporal relations present in the cyclic and reconverging paths of the original netlist. Figure 6.6b also provides an alternative interpretation of NEGATIVE REGISTERs; we may merge a NEGATIVE REGISTER r onto a new FREE vertex v, but we then must constrain the netlist so that the value driven by v at time i is equal to the value sourcing the input to r at time i + 1. During symbolic analysis, NEGATIVE REGISTERs may be handled by exchanging
[Figure 6.6: Retimed netlist of Fig. 6.5b: (a) retimed netlist, (b) intuitive interpretation of NEGATIVE REGISTERs]
the present and next state variables in the transition relation.
As a practical implementation issue, use of an absolute value in the objective function (6.3) causes a nonlinearity which may significantly increase computational requirements in precluding the application of a linear algorithm. We have found that an efficient way to deal with this problem is to use two variables to denote retimed edge weight: ~w+(u,v) and ~w−(u,v), correlating to the number of REGISTERs and NEGATIVE REGISTERs along the retimed edge, respectively [73]. We split (6.1) into two constraints per edge:

    ~w+(u,v) ≥ w(u,v) + r(v) − r(u)
    ~w−(u,v) ≥ −(w(u,v) + r(v) − r(u))        (6.4)

We in turn require that ~w+(u,v) and ~w−(u,v) are non-negative:

    ~w+(u,v) ≥ 0
    ~w−(u,v) ≥ 0        (6.5)

Our modified objective minimizes the sum of these two variables. Clearly, any optimal solution will assign at least one of these two variables per edge to 0.

    min Σ_{(u,v)∈E} ( ~w+(u,v) + ~w−(u,v) )        (6.6)
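To see that the split-variable objective (6.6) computes min Σ|~w(u,v)|, one can brute-force the lags of a tiny graph. The Python sketch below is only an illustrative stand-in for the actual ILP solver, with our own graph encoding:

```python
from itertools import product

def min_total_abs_weight(vertices, w, lag_range=range(-3, 4)):
    """Brute-force stand-in for the ILP of (6.4)-(6.6): over all lag
    assignments r, minimize sum |w(u,v) + r(v) - r(u)|, which equals the
    sum of ~w+ and ~w- since an optimum zeroes one of them per edge."""
    best = None
    for lags in product(lag_range, repeat=len(vertices)):
        r = dict(zip(vertices, lags))
        cost = sum(abs(wt + r[v] - r[u]) for (u, v), wt in w.items())
        best = cost if best is None else min(best, cost)
    return best
```

A directed cycle keeps its total weight under any retiming (the r terms telescope), so a 2-cycle of weight 2 can never drop below 2; a simple pipeline, by contrast, can be emptied entirely, as in peripheral retiming.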
This modeling allows the use of an efficient ILP algorithm (such as the network simplex algorithm [74]) to calculate an optimal solution. As demonstrated by Leiserson and Saxe [66], if NEGATIVE REGISTERs are disallowed we may cast the retiming problem as a min-cost flow problem for which we may use a polynomial-time graph-based algorithm.¹ Allowing NEGATIVE REGISTERs precludes this modeling; though efficient, the simplex algorithm is not guaranteed to require sub-exponential resources for arbitrary problems. However, we have found that the network simplex algorithm often yields superior performance to graph-based algorithms for retiming applications even when NEGATIVE REGISTERs are disallowed.
One noteworthy practical issue is that relaxation of non-negativity constraints along with REGISTER-sharing modeling does not present an accurate cost model to the ILP solver. In particular, if we allow negative weights on edges with a non-unity sharing factor, retiming may produce a solution with a higher REGISTER count than the modeled cost of the ILP solution reflected in the objective function. An example of this phenomenon is depicted in Figure 6.7. In Figure 6.7a is a retiming graph with a total cost of one, distributed across three edges. With non-negativity constraints, we could not reduce the number of REGISTERs along the incoming edges to the dummy sink vertex unless we could backward-retime g2 or g3 and thereby exploit fanout sharing or backward-retiming of g1, or forward-retime g4 to enable forward-retiming across the dummy vertex; each of those retimings risks causing suboptimalities due to other sources and sinks of g2, ..., g4. However, without non-negativity constraints, the ILP solver may drop the cost by 1/3 without retiming g2, ..., g4 as depicted in Figure 6.7b. Such a retiming obviously does not reduce REGISTER count for the resulting netlist, and may overall prevent the ILP solver from selecting an available truly lower-cost solution. For example, one possible solution may drop the retimed netlist weight by one, but the ILP solver instead will choose a solution which merely retimes the dummy vertices in a graph with three structures as depicted in Figure 6.7. There are two possible solutions to this modeling problem: first, we may impose non-negativity constraints only on the edges with a non-unity sharing factor, which is the solution we have chosen in our
¹One of the most efficient known graph algorithms for the min-cost flow problem is the enhanced capacity scaling algorithm, which is O(|E| · log(|V|) · (|E| + |V| · log(|V|))) [75].
[Figure 6.7: Example of incorrect ILP modeling of sharing with relaxed non-negativity constraints: (a) original graph with total cost of 1; (b) “incorrect” solution of lagging dummy vertex by −1, with absolute value total cost of 2/3]
implementation due to its computational efficiency. Second, we may correct the sharing
model by departing from a graph representation, instead representing the “maximum ver-
sus summation” condition of sharing by modeling constraints which concurrently reason
about multiple edges, rather than individual edges.
6.1.5 Normalized Retiming
Formula (6.1) imposes an equivalence relation on the set of retimings. Two retimings r1 and r2 result in identical REGISTER placement and count, and thus are said to be equivalent, if and only if r1 = r2 + c for some arbitrary integer c. This concept enables us to use a normalized retiming without sacrificing reduction optimality.
Definition 6.3. A normalized retiming r′ is obtained from an arbitrary retiming r, and is defined as r′ = r − max_{v∈V} r(v).

Hereafter, we will use the term retiming to denote a normalized retiming. As will
be discussed, the use of a normalized retiming simplifies the calculation of initial values of retimed REGISTERs; a solution to this problem otherwise may not exist. Furthermore, because fanout sharing is only applicable if the shared REGISTERs have equivalent initial values, we may use an extra degree of normalization to shift all vertices forward until the REGISTERs to be shared obtain equivalent initial values (refer to the retiming stump discussed in Definition 6.4), thereby enhancing reduction potential. A similar trick may be employed to ensure that REGISTERs on outgoing edges from ZERO or ONE may be eliminated from the retimed netlist through constant propagations regardless of their initial values.
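Definition 6.3 amounts to a one-line shift of the lag vector. A minimal Python sketch, assuming lags are kept in a dict (an encoding of our own choosing):

```python
def normalize(r):
    """Definition 6.3: r'(v) = r(v) - max_u r(u), so the maximum lag becomes 0
    and all lags are non-positive. Since the retimed weights of formula (6.1)
    depend only on lag differences, REGISTER placement and count are
    unchanged by this shift."""
    m = max(r.values())
    return {v: lag - m for v, lag in r.items()}
```

The non-positivity of normalized lags is what makes the initial-value formula (6.9) well formed.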
6.2 Retiming for Enhanced Verification
In this section we discuss the use of generalized retiming for enhanced verification, and
provide proofs of correctness of this technique for invariant checking.
As discussed in the previous sections, we temporally decompose our verification
task into three components to enable greater reduction capability for min-area retiming.
Figure 6.8 illustrates the overall temporally decomposed verification task for the netlist
of Figure 6.6a. The medium-shaded area reflects the retimed recurrence structure which
must generally be discharged by sequential reachability analysis. The darkly-shaded area
denotes the retiming stump (refer to Definition 6.4), which is used to compute the initial values for the retimed REGISTERs and to verify our target t for the first three time-steps.
The lightly-shaded area represents the retiming top.
We now illustrate how to process these three verification components. First, we need to prove that the property holds for the retiming stump; because this is a combinational structure, we may discharge this obligation using a BMC approach. In our example, we see that t_i = 0 for i = 0, 1, 2. The set of retimed initial values ~Z is a subset of the retiming stump. In our example, we obtain ~Z(~R1) = a0 ∧ Z(R1) and ~Z(~R3) = ¬(~R2 ∧ Z(R2) ∧ ¬((a0 ∨ b0) ∧ Z(R3) ∧ Z(R5))). This correlates to an initial state set (~R1, ~R2, ~R3) = {(0,0,1), (0,1,1), (1,0,1), (1,1,1)}. Next, using these retimed initial values, sequential verification is performed on the recurrence structure. This leads to a counterexample for initial state (0,1,1), with FREE vertex valuations a1 = 0 and b1 = 0. Furthermore, the
[Figure 6.8: Components of retimed netlist depicted in Fig. 6.6: darkly shaded: retiming stump; medium shaded: retiming recurrence structure; lightly shaded: retiming top]
retiming top imposes the constraint a2 ∨ b2 upon the NEGATIVE REGISTER ~R2, which is satisfiable for the given failing state. A complete counterexample trace is composed of a satisfying assignment to the retiming stump for generating a retimed initial state, a counterexample trace generated upon the recurrence structure from the corresponding retimed initial state, and a satisfying assignment to the constraint imposed by the retiming top. For the given example, this results in {⟨(a,0),0⟩, ⟨(b,0),0⟩, ⟨(a,1),0⟩, ⟨(b,1),0⟩, ⟨(a,2),0⟩, ⟨(b,2),1⟩}.

Ascribing semantics to this netlist representation where REGISTERs are implicit as edge attributes, let E^j_uv, 1 ≤ j ≤ w(u,v), denote the initial value of the j-th REGISTER along edge (u,v). Additionally, let G_v(f_jv, ..., f_kv) be the function of gate v with incoming edges (j,v), ..., (k,v). If v is a FREE vertex, G_v() denotes the sampled input value at a specified time-step. Valuations to the gates of N at time i ≥ 0 may be computed by (6.7).
    ṗ(f_uv, i) = E^{w(u,v)−i}_uv           if i < w(u,v);
    ṗ(f_uv, i) = ṗ(u, i − w(u,v))          otherwise
    ṗ(u, i)    = G_u( ṗ(f_ju, i), ..., ṗ(f_ku, i) )        (6.7)
For example, the value at time i of the net connecting the output of REGISTER j with the input of REGISTER j+1 of edge (u,v) is ṗ(f_uv, i + w(u,v) − j).

Similar to formula (6.7), for a given retiming r, valuations to the gates of the corresponding retimed netlist ~N at time i may be computed by (6.8). Term ~E^i_uv represents the initial values of the corresponding retimed REGISTERs of ~N.

    ṗ(~f_uv, i) = ~E^{~w(u,v)−i}_uv         if i < ~w(u,v);
    ṗ(~f_uv, i) = ṗ(~u, i − ~w(u,v))        otherwise
    ṗ(~u, i)    = G_u( ṗ(~f_ju, i), ..., ṗ(~f_ku, i) )        (6.8)
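The recursion of formula (6.7) can be sketched directly in Python for a toy netlist; the dict-based encoding of gates, weights, and initial values below is our own illustrative assumption:

```python
def valuation(gates, w, E, inputs, node, i):
    """Sketch of formula (6.7): value of gate `node` of netlist N at time i.

    gates: vertex -> (function, fanin list); w: edge -> REGISTER count w(u,v);
    E: edge -> list of initial values (index 0 holds E^1, the REGISTER
    nearest u); inputs: FREE vertex -> sampled input stream."""
    def edge_val(u, v, t):
        if t < w[(u, v)]:
            return E[(u, v)][w[(u, v)] - t - 1]    # E^{w(u,v)-t}_uv
        return gate_val(u, t - w[(u, v)])          # recurse past the REGISTERs
    def gate_val(u, t):
        if u in inputs:
            return inputs[u][t]                    # G_u() for a FREE vertex
        fn, fanins = gates[u]
        return fn(*(edge_val(x, u, t) for x in fanins))
    return gate_val(node, i)
```

For a gate g computing the inversion of input a through one REGISTER with initial value 0, the valuation at time 0 uses the initial value, and later times recurse back to sampled input values.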
In contrast to formula (6.7), it is not obvious that this formula is well formed, because ~w(u,v) may be negative.
Lemma 6.1. Let N be a legal netlist, and r be a retiming resulting in netlist ~N. The evaluation of formula (6.8) for computing the state of ~N at time i will terminate for any finite i ≥ 0.
Proof. First, we note that i remains non-negative during the evaluation of (6.8) since time is defined only upon N, hence we never begin a valuation at a negative i, and since (6.8) will never reduce i below 0. Second, since N and therefore ~N are finite, any non-terminating evaluation of formula (6.8) must involve an infinite recursion on at least one gate. Let u be one of those gates and ⟨u, u1, ..., un, u⟩ be a directed cycle in ~N causing the recursion. The difference between i and i′ of two succeeding recursions is then i − i′ = ~w(u,u1) + ~w(u1,u2) + ... + ~w(un,u). A substitution using (6.1) leads to i − i′ = w(u,u1) + w(u1,u2) + ... + w(un,u) since the r terms telescope. Our original netlist is assumed to be legal, hence all directed cycles have strictly positive sequential weight; the telescoping property of retiming furthermore guarantees that the cumulative weight along any directed cycle is unchanged by retiming. Therefore i strictly decreases after each recursion through u, which causes the evaluation to terminate once i < ~w(uj, uj+1) for some edge (uj, uj+1) in the corresponding directed cycle.
Definition 6.4. The retiming stump ~NS is a combinational netlist obtained by an unfolding of N, and contains vertices corresponding to the following set.

    ~NS = { s^i_uv : (u,v) ∈ E ∧ (s^i_uv = ṗ(f_uv, i)) ∧ (0 ≤ i < ~w(u,v) − r(v)) }

Our retimed verification structure is a composition of ~N and ~NS. The retiming stump ~NS provides the edge functions for the first several time-steps, which is necessary for verification of the targets during the time-steps eliminated from the recurrence structure, as well as for providing the initial values for the REGISTERs of ~N as follows.

    ~E^j_uv = s^{~w(u,v)−r(v)−j}_uv,   0 < j ≤ ~w(u,v)        (6.9)
Note that this formula is well formed for normalized retimings because r(v) ≤ 0.
Lemma 6.2. Let N be a legal netlist, and r be a retiming resulting in netlist ~N and retiming stump ~NS. The following relations provide a bijective mapping between each edge function of {~N, ~NS} to the corresponding edge function of N and vice versa.

    ṗ(f_uv, i) = s^i_uv                  if i < ~w(u,v) − r(v);
    ṗ(f_uv, i) = ṗ(~f_uv, i + r(v))      otherwise        (6.10)

    s^i_uv = ṗ(f_uv, i)                  if i < ~w(u,v) − r(v);
    ṗ(~f_uv, i) = ṗ(f_uv, i − r(v))      otherwise        (6.11)
Proof. We first demonstrate that formula (6.10) correctly maps {~N, ~NS} to N. For i < ~w(u,v) − r(v), (6.10) reflects ~NS from Definition 6.4, thus our obligation is trivially satisfied. For i ≥ ~w(u,v) − r(v), after substitution using (6.8), we must demonstrate that ṗ(f_uv, i) = G_u( ṗ(~f_ju, i + r(v) − ~w(u,v)), ..., ṗ(~f_ku, i + r(v) − ~w(u,v)) ) by inductively proving for each input to u that ṗ(~f_ju, i + r(v) − ~w(u,v)) = ṗ(f_ju, i − w(u,v)).

Base case: For the base case, we have that i + r(v) − ~w(u,v) < ~w(j,u). Using (6.8) and (6.9) we obtain ṗ(~f_ju, i + r(v) − ~w(u,v)) = ~E^{~w(j,u)−i−r(v)+~w(u,v)}_ju = ṗ(f_ju, i + r(v) − ~w(u,v) − r(u)) which, after applying (6.1), satisfies our obligation.

Inductive step: For the inductive step, we have that i + r(v) − ~w(u,v) ≥ ~w(j,u). A substitution using (6.8) results in the equality ṗ(~f_ju, i + r(v) − ~w(u,v)) = G_j( ṗ(~f_hj, i + r(v) − ~w(u,v) − ~w(j,u)), ..., ṗ(~f_lj, i + r(v) − ~w(u,v) − ~w(j,u)) ). If ~w(j,u) > 0 we may immediately reduce the temporal arguments of G_j by induction. If ~w(j,u) ≤ 0, then the right-hand side must be further expanded until an inductive reduction may be performed. A termination analysis similar to the proof of Lemma 6.1 may be applied, demonstrating that the time-step i will eventually decrease and therefore the expansion will terminate after a finite number of iterations. This termination will either result in a valuation from ~NS, which satisfies our proof obligation as demonstrated in the base case analysis above, or at a zero-input gate (FREE or ZERO), which is clearly semantically equivalent both before and after the retiming.

We next demonstrate that (6.11) correctly maps N to {~N, ~NS}. The first part follows from Definition 6.4. The second part follows from the previous inductive proof.
Formula (6.10) illustrates an efficient mechanism for lifting a trace obtained on the
retimed netlist to one consistent with the original netlist.
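As a sketch of that mechanism, assuming a hypothetical per-edge encoding of the stump and retimed trace (the data layout here is our own, not the dissertation's implementation):

```python
def lift_edge_trace(edge, horizon, stump, retimed, r, w_t):
    """Lift the values of one edge (u, v) of the retimed netlist back to the
    original netlist for time-steps 0 .. horizon-1, per formula (6.10): the
    first ~w(u,v) - r(v) values come from the stump s^i_uv, the rest from the
    retimed trace shifted by the lag r(v).

    stump: edge -> list of s^i_uv values; retimed: edge -> list of
    p(~f_uv, .) values; r: vertex -> lag; w_t: edge -> retimed weight ~w."""
    u, v = edge
    boundary = w_t[edge] - r[v]
    return [stump[edge][i] if i < boundary else retimed[edge][i + r[v]]
            for i in range(horizon)]
```

With normalized lags r(v) ≤ 0, the shift i + r(v) only moves backward in time, so the retimed trace need not extend beyond the lifted horizon.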
Theorem 6.1. Retiming is sound and complete for invariant checking.
Proof. This theorem is an immediate consequence of the bijective ⟨vertex, time⟩ mapping between the original and retimed netlist reflected by Lemma 6.2. In particular, a target unreachable result will be generated only if all time-steps of the target within the retiming stump are proven unreachable, and also if the target is proven unreachable in the recurrence structure, which collectively imply that the unretimed target is also unreachable.

Additionally, a target hit result will be generated if the target is hit within the retiming stump, or if the target is hit in the recurrence structure. In either case, the unretimed target is also hittable. The trace generated by a subsequent verification flow is semantically correct with respect to the retimed netlist and hits the retimed target by assumption. Therefore, the trace-lifting procedure implied by Lemma 6.2 will yield a semantically correct trace with respect to the unretimed netlist, and will hit the original unretimed target.
Theorem 6.2. A retimed netlist is a legal netlist.
Proof. We consider the requirements for legality enumerated in Definition 3.24.
1. The only gates fabricated by retiming are either retimed REGISTERs (and NEGATIVE
REGISTERs) which are correct by our synthesis of the AND/INVERTER/REGISTER
graph, or constructed by combinational unfolding, which are correct by the assump-
tion that the original netlist is legal.
2. We note that a normalized retiming will lag each vertex at most |R| time-steps, and each retimed edge weight will be between −|R|, ..., |R|, else the retiming is not optimal. Therefore, ~NS is of finite size. Furthermore, the recurrence structure contains a copy of each combinational gate of the original netlist, with at most |R| generated REGISTERs and NEGATIVE REGISTERs, else the retiming is not optimal. Thus the composite retimed netlist ~N ∥ ~NS is finite.
3. Our retimed initial values come from the retiming stump, which comprises a combinational unfolding of the original netlist. Hence, the initial value of every retimed REGISTER must be combinational.
4. Due to the telescoping of r values for the vertices comprising each directed cycle, retiming preserves the sequential weight of directed cycles whether or not NEGATIVE REGISTERs are allowed. Therefore, by assumption, all directed cycles will have strictly positive weight.
Theorem 6.3. If the diameter of a set of vertices ~U of the recurrence structure is d(~U), and max_{~u∈~U} (−r(~u)) = i, then the diameter of the original set of vertices U satisfies d(U) ≤ d(~U) + i.

Proof. Note that we may compose a series of −r(~u) REGISTERs to each ~u, whose initial values are determined by corresponding values from ~NS, to yield a set of vertices U′ which are trace-equivalent to U. Each stage of this pipeline is an AC, hence increments diameter by at most 1. This proof therefore follows from Theorem 4.3 and Corollary 4.1 by the trace-equivalence of U′ and U.
6.3 Related Work
Leiserson and Saxe first proposed retiming as a synthesis optimization [68] and developed
its graph-based ILP formulation [66]. Malik et al. [69] were the first to introduce peripheral retiming with the objective of moving a maximum number of REGISTERs to the netlist
boundaries. This yields a maximal combinational netlist core to enhance the domain of
applicability of conventional combinational optimizations. They also introduced the con-
cept of NEGATIVE REGISTERs as a method of temporarily “borrowing” or “discarding”
REGISTERs from inputs and outputs. After combinational optimization, these NEGATIVE
REGISTERs are “legalized” by retiming them back to positive REGISTERs. In contrast, we
provide algorithms to directly handle NEGATIVE REGISTERs for enhanced verification.
The problem of generating valid initial states for a retimed netlist has been the topic
of several prior research efforts. Touati and Brayton [76] proposed a method for adding
reset logic which forces an equivalent initial state. Even et al. [77] described a modified
retiming algorithm that favors forward retiming, allowing a simple computation of the initial states similarly to our use of a normalized retiming. All previous work on retimed
initial state computation assumes the necessity of preserving input-output equivalence. In
contrast, our approach eliminates this restriction through a temporal decomposition of the
verification task, enabling a larger solution space and hence a greater reduction potential for the
retiming solution.
Gupta et al. [71] were the first to propose the application of maximal peripheral re-
timing in the context of simulation-based verification. They showed that peripheral REG-
ISTERs may be discarded during test generation without compromising the coverage of the
resulting transition tour. However, their approach is focused upon test generation and does
not consider more general verification frameworks. Furthermore, their work does not ad-
dress the initialization problem and does not use the concept of NEGATIVE REGISTERs.
The work of Cabodi et al. [78], which uses retiming to enhance symbolic reachability analysis, is the closest to ours. However, they use an original synthesis retiming algorithm
with the above-mentioned limitations regarding enforced reset state equivalence and dis-
allowing of NEGATIVE REGISTERs. Furthermore, their retiming domain is based upon
next-state functions of REGISTERs which significantly reduces the optimization freedom.
Consequently, their reported results demonstrate fairly modest improvements.
There are only two previous publications related to our technique of fanin REGISTER sharing, to our knowledge. In [79], a technique is presented that simultaneously considers multiple structures for possible logic implementations using a choice vertex. Their
technique focuses upon technology mapping in synthesis, and despite its recursive capa-
bility, it must explicitly generate candidate structures for an AND cluster decomposition
including possible retiming configurations. In our approach, we defer the actual decom-
position step until after an optimal retiming is computed. Our modeling guarantees that
there will exist a decomposition of the AND clusters with the minimal number of REG-
ISTERs computed by the retiming solution. In [80], the concept of algebraic factorization
is extended to sequential expressions, which implicitly intertwines retiming with structural
rewriting. This work proposes a set of sequential transformations which may be applied in
a synthesis scenario. In contrast to our work, this technique is based on individual, local
restructuring steps and does not model the decomposition flexibility of the expressions for
global retiming.
6.4 Experimental Results
In this section we provide a set of experimental results for retiming, redundancy removal,
and diameter overapproximation. We have deferred results for the previous chapters until
now so that we may study their synergy with retiming. We implemented these techniques in
C using the data structures and algorithms described in these chapters. We used the primal
network simplex algorithm from IBM’s Optimization Solutions Library (OSL) [81] as the ILP solver for the retiming formulation.
6.4.1 Redundancy Removal Experiments
Our first set of experiments studies the effect of our on-the-fly retiming algorithm (presented in Section 5.1) and our fanin sharing algorithm on the reduction capability of retiming. We disallow NEGATIVE REGISTERs for this set of experiments. These experiments were run on an IBM ThinkPad Model T21, with an 800MHz PIII and 256 MB main memory, running RedHat Linux 6.2. In these experiments we used peripheral retiming [71]. We focus here on reduction of the size of the recurrence structure ~N, injecting constants for retimed initial values to eliminate the contribution of the retiming stump ~NS. The retiming stump is often small, hence does not constitute a bottleneck in the overall verification scheme. This is mainly due to the fact that large portions of the stump resolve to constants since most of the original REGISTERs have constant initial values. We revisit the size of the recurrence structure in Section 7.3.
Table 6.1 provides results for various retiming options for the ISCAS89 benchmarks. The results are based upon the described AND/INVERTER/REGISTER graph representation of the netlist and report the number of 2-input AND vertices and REGISTERs. Columns 1 and 2 list the name of the netlists and their initial, unretimed sizes, respectively. Column 3 provides the netlist sizes for retiming without the application of on-the-fly retiming or fanin REGISTER sharing. This option is identical to classical peripheral retiming as per [66]. In column 4 we report the result for fanin-REGISTER sharing without on-the-fly retiming, whereas for the following column we enabled both. Columns 6 through 8 provide the results for an iterated application of retiming interleaved with redundancy removal, using the technique of Kuehlmann et al. [51]. We iterated between both engines until no further improvement was gained and reported the best results. Column 6 provides these results using plain retiming (as in column 3), whereas column 7 reports the results of the best option of the techniques used in column 4 or 5. Column 8 indicates the required computing resources for the best run between columns 6 and 7, preferring minimum REGISTERs to minimum AND vertices. In column 9 we provide previously published results. As shown, our technique almost always yields lower REGISTER counts. Despite detailed analysis, we could not reproduce the results reported in [71] for netlists S344 and S349.
Table 6.2 provides the data for an identical set of experiments for various IBM Gigahertz Processor (GP) netlists, after performing phase abstraction [16]. There are several noteworthy trends in both tables. First, plain retiming decreases REGISTER count by an average of 16.8% on the ISCAS netlists, and by 50.1% on the GP netlists. The larger reductions observed for the GP netlists are a characteristic of the high degree of pipelining inherent in high-performance designs, and indicative of the power of retiming to alleviate these inflated REGISTER counts. Fanin REGISTER sharing allows an additional reduction of the REGISTER count by an average of 0.9% and 4.7% for the ISCAS and GP netlists, respectively. In addition, the AND count is significantly decreased by the maximal AND clustering and tree reformation process, by 9.8% for ISCAS and 20.7% for GP.
Design Original Plain Retiming On-the-Fly Iteration of interleaved Previousnetlist retiming with retiming retiming and redundancy removal results
Design | Original netlist | Plain retiming [66] | Retiming with fanin sharing | On-the-fly retiming with fanin sharing | Iterated interleaved retiming and redundancy removal (until no further improvement): plain retiming | Iterated: best of columns 4 or 5 | Time (s) ; Memory (MB) | [71] ; [78]
PROLOG | 853 ; 136 | 853 ; 45 | 676 ; 45 | 672 ; 46 | 709 ; 45 | 644 ; 45 | 1.0 ; 14.9 | - ; -
S1196 | 480 ; 18 | 480 ; 16 | 475 ; 16 | 475 ; 16 | 463 ; 16 | 456 ; 16 | 0.4 ; 4.4 | 16 ; -
S1238 | 533 ; 18 | 533 ; 16 | 532 ; 16 | 532 ; 16 | 518 ; 16 | 513 ; 16 | 0.5 ; 6.5 | 17 ; -
S1269 | 478 ; 37 | 478 ; 36 | 462 ; 36 | 463 ; 36 | 459 ; 36 | 450 ; 36 | 0.3 ; 4.4 | - ; -
S13207.1 | 3205 ; 638 | 3205 ; 389 | 2604 ; 390 | 2593 ; 407 | 1295 ; 266 | 1221 ; 267 | 3.6 ; 31.3 | - ; -
S1423 | 507 ; 74 | 507 ; 72 | 458 ; 72 | 458 ; 72 | 461 ; 72 | 455 ; 72 | 0.4 ; 5.5 | 72 ; 74
S1488 | 734 ; 6 | 734 ; 6 | 618 ; 6 | 632 ; 6 | 659 ; 6 | 610 ; 6 | 0.7 ; 12.7 | - ; -
S1494 | 746 ; 6 | 746 ; 6 | 629 ; 6 | 644 ; 6 | 668 ; 6 | 622 ; 6 | 0.4 ; 6.5 | - ; -
S1512 | 484 ; 57 | 484 ; 57 | 455 ; 57 | 455 ; 57 | 470 ; 57 | 455 ; 57 | 0.3 ; 2.4 | - ; 57
S15850.1 | 3852 ; 534 | 3852 ; 495 | 3457 ; 498 | 3465 ; 498 | 3283 ; 490 | 3112 ; 475 | 9.3 ; 34.5 | - ; -
S208.1 | 77 ; 8 | 77 ; 8 | 70 ; 8 | 71 ; 8 | 70 ; 8 | 70 ; 8 | 0.2 ; 2.2 | - ; -
S27 | 8 ; 3 | 8 ; 3 | 8 ; 3 | 8 ; 3 | 8 ; 3 | 8 ; 3 | 0.1 ; 2.3 | - ; -
S298 | 125 ; 14 | 125 ; 14 | 97 ; 14 | 97 ; 14 | 100 ; 14 | 91 ; 14 | 0.2 ; 6.3 | - ; -
S3271 | 1125 ; 116 | 1125 ; 110 | 1091 ; 110 | 1093 ; 110 | 1082 ; 110 | 1067 ; 110 | 1.0 ; 8.7 | - ; 116
S3330 | 820 ; 132 | 820 ; 45 | 657 ; 45 | 654 ; 46 | 692 ; 45 | 624 ; 45 | 0.7 ; 9.7 | - ; -
S3384 | 1070 ; 183 | 1070 ; 72 | 1070 ; 72 | 1070 ; 72 | 1064 ; 72 | 1062 ; 72 | 0.9 ; 6.7 | - ; 147
S344 | 109 ; 15 | 109 ; 15 | 102 ; 15 | 102 ; 15 | 101 ; 15 | 98 ; 15 | 0.2 ; 2.3 | 7 ; -
S349 | 112 ; 15 | 112 ; 15 | 104 ; 15 | 104 ; 15 | 101 ; 15 | 98 ; 15 | 0.2 ; 2.3 | 7 ; -
S35932 | 12204 ; 1728 | 12204 ; 1728 | 11948 ; 1728 | 11948 ; 1728 | 11660 ; 1728 | 11660 ; 1728 | 14.3 ; 38.5 | - ; -
S382 | 148 ; 21 | 148 ; 15 | 134 ; 15 | 136 ; 15 | 140 ; 15 | 134 ; 15 | 0.2 ; 2.3 | 15 ; -
S38584.1 | 13479 ; 1426 | 13479 ; 1416 | 11769 ; 1375 | 11811 ; 1415 | 11794 ; 1374 | 11464 ; 1373 | 86.6 ; 239.9 | - ; -
S386 | 188 ; 6 | 188 ; 6 | 126 ; 6 | 133 ; 6 | 166 ; 6 | 125 ; 6 | 0.2 ; 4.3 | - ; -
S400 | 158 ; 21 | 158 ; 15 | 141 ; 15 | 143 ; 15 | 148 ; 15 | 141 ; 15 | 0.2 ; 2.3 | 15 ; -
S420.1 | 165 ; 16 | 165 ; 16 | 156 ; 16 | 159 ; 16 | 156 ; 16 | 156 ; 16 | 0.2 ; 2.3 | - ; -
S444 | 169 ; 21 | 169 ; 15 | 150 ; 15 | 153 ; 15 | 155 ; 15 | 149 ; 15 | 0.2 ; 2.3 | 15 ; -
S4863 | 1750 ; 104 | 1750 ; 72 | 1537 ; 37 | 1537 ; 37 | 1376 ; 37 | 1326 ; 37 | 2.4 ; 17.3 | - ; 96
S499 | 187 ; 22 | 187 ; 22 | 199 ; 22 | 199 ; 22 | 187 ; 22 | 190 ; 20 | 0.3 ; 4.4 | - ; -
S510 | 213 ; 6 | 213 ; 6 | 213 ; 6 | 213 ; 6 | 211 ; 6 | 206 ; 6 | 0.3 ; 6.4 | - ; -
S526N | 251 ; 21 | 251 ; 21 | 191 ; 21 | 191 ; 21 | 202 ; 21 | 183 ; 21 | 0.3 ; 6.4 | - ; -
S5378 | 1422 ; 179 | 1422 ; 115 | 1346 ; 114 | 1321 ; 124 | 1260 ; 112 | 1242 ; 113 | 1.4 ; 15.0 | - ; 144
S635 | 190 ; 32 | 190 ; 32 | 190 ; 32 | 190 ; 32 | 161 ; 32 | 161 ; 32 | 0.2 ; 2.3 | - ; -
S641 | 160 ; 19 | 160 ; 15 | 132 ; 15 | 132 ; 15 | 146 ; 15 | 131 ; 15 | 0.2 ; 3.3 | 18 ; -
S6669 | 2263 ; 239 | 2263 ; 92 | 2199 ; 92 | 2199 ; 92 | 2238 ; 77 | 2174 ; 76 | 1.1 ; 5.8 | - ; -
S713 | 174 ; 19 | 174 ; 15 | 137 ; 15 | 137 ; 15 | 149 ; 15 | 130 ; 15 | 0.2 ; 5.4 | - ; -
S820 | 468 ; 5 | 468 ; 5 | 325 ; 5 | 335 ; 5 | 345 ; 5 | 317 ; 5 | 0.5 ; 12.6 | - ; -
S832 | 482 ; 5 | 482 ; 5 | 335 ; 5 | 344 ; 5 | 355 ; 5 | 324 ; 5 | 0.4 ; 8.5 | - ; -
S838.1 | 341 ; 32 | 341 ; 32 | 328 ; 32 | 335 ; 32 | 328 ; 32 | 328 ; 32 | 0.2 ; 2.3 | - ; -
S9234.1 | 2346 ; 211 | 2346 ; 172 | 1896 ; 172 | 1891 ; 174 | 1437 ; 145 | 1377 ; 146 | 1.8 ; 14.3 | - ; -
S938 | 341 ; 32 | 341 ; 32 | 328 ; 32 | 335 ; 32 | 328 ; 32 | 328 ; 32 | 0.2 ; 2.3 | - ; -
S953 | 348 ; 29 | 348 ; 6 | 356 ; 6 | 343 ; 6 | 340 ; 6 | 332 ; 6 | 0.3 ; 4.4 | - ; -
S967 | 369 ; 29 | 369 ; 6 | 386 ; 6 | 370 ; 6 | 357 ; 6 | 355 ; 6 | 0.3 ; 4.4 | - ; -
S991 | 299 ; 19 | 299 ; 19 | 297 ; 19 | 297 ; 19 | 297 ; 19 | 297 ; 19 | 0.2 ; 2.3 | - ; -
% Reduction | 0.0 ; 0.0 | 0.0 ; 16.8 | 9.8 ; 17.7 | 9.5 ; 17.4 | 10.8 ; 18.7 | 14.3 ; 18.9
Table 6.1: Retiming results for the ISCAS89 benchmarks (number of two-input AND vertices ; number of REGISTERs)
Design | Original netlist | Plain retiming [66] | Retiming with fanin sharing | On-the-fly retiming with fanin sharing | Iterated interleaved retiming and redundancy removal (until no further improvement): plain retiming | Iterated: best of columns 4 or 5 | Time (s) ; Memory (MB)
CP RAS | 2686 ; 660 | 2686 ; 585 | 2103 ; 492 | 2159 ; 492 | 2148 ; 489 | 2039 ; 489 | 4.9 ; 32.4
CR RAS | 2297 ; 431 | 2297 ; 379 | 2200 ; 378 | 2209 ; 387 | 1735 ; 341 | 1873 ; 348 | 2.0 ; 14.5
D DASA | 1223 ; 115 | 1223 ; 100 | 967 ; 100 | 968 ; 100 | 844 ; 100 | 815 ; 100 | 0.8 ; 8.9
D DCLA | 10916 ; 1137 | 10916 ; 771 | 10483 ; 771 | 10506 ; 771 | 7853 ; 750 | 7443 ; 750 | 23.9 ; 94.1
D DUDD | 1295 ; 129 | 1295 ; 100 | 1143 ; 100 | 1146 ; 100 | 1119 ; 100 | 1084 ; 100 | 1.1 ; 12.9
I IBBC | 389 ; 195 | 389 ; 43 | 228 ; 41 | 217 ; 41 | 207 ; 43 | 196 ; 37 | 0.5 ; 9.7
I IFAR | 1202 ; 413 | 1202 ; 147 | 1031 ; 142 | 1033 ; 143 | 997 ; 139 | 929 ; 137 | 1.7 ; 18.5
I IFEC | 334 ; 182 | 334 ; 46 | 302 ; 45 | 309 ; 45 | 308 ; 46 | 287 ; 45 | 0.7 ; 15.0
I IFPF | 5896 ; 1546 | 5896 ; 705 | 5273 ; 679 | 4715 ; 612 | 2812 ; 350 | 2768 ; 355 | 43.9 ; 78.0
L EMQ | 981 ; 220 | 981 ; 88 | 737 ; 87 | 745 ; 88 | 920 ; 86 | 632 ; 74 | 1.2 ; 16.3
L EXEC | 1618 ; 535 | 1618 ; 168 | 1191 ; 163 | 1193 ; 197 | 1178 ; 144 | 974 ; 138 | 2.2 ; 19.0
L FLUSH | 893 ; 159 | 893 ; 5 | 495 ; 1 | 409 ; 1 | 358 ; 1 | 338 ; 1 | 0.6 ; 8.7
L LMQ | 14074 ; 1876 | 14074 ; 1196 | 12921 ; 1190 | 12983 ; 1190 | 5793 ; 432 | 5363 ; 428 | 41.5 ; 91.9
L LRU | 581 ; 237 | 581 ; 94 | 524 ; 94 | 518 ; 94 | 469 ; 94 | 439 ; 94 | 1.0 ; 13.1
L PNTR | 1453 ; 541 | 1453 ; 245 | 1351 ; 245 | 1349 ; 245 | 1387 ; 245 | 1325 ; 245 | 1.2 ; 8.2
L TBWK | 1160 ; 307 | 1160 ; 125 | 829 ; 124 | 829 ; 124 | 279 ; 40 | 267 ; 40 | 0.8 ; 11.0
M CIU | 4550 ; 777 | 4550 ; 459 | 3262 ; 415 | 3244 ; 415 | 2929 ; 381 | 2757 ; 379 | 4.8 ; 35.8
S SCU1 | 1520 ; 373 | 1520 ; 212 | 1296 ; 204 | 1346 ; 207 | 1308 ; 201 | 1160 ; 192 | 2.8 ; 20.2
S SCU2 | 8560 ; 1368 | 8560 ; 640 | 6632 ; 566 | 5990 ; 564 | 3928 ; 432 | 4119 ; 425 | 34.6 ; 58.9
V CACH | 753 ; 173 | 753 ; 103 | 652 ; 105 | 649 ; 110 | 424 ; 95 | 393 ; 97 | 0.8 ; 14.9
V DIR | 554 ; 178 | 554 ; 87 | 491 ; 87 | 285 ; 50 | 160 ; 45 | 152 ; 43 | 0.5 ; 10.7
V L2FB | 120 ; 75 | 120 ; 26 | 103 ; 26 | 103 ; 26 | 107 ; 26 | 95 ; 26 | 0.3 ; 4.4
V SCR1 | 826 ; 150 | 826 ; 95 | 418 ; 52 | 618 ; 94 | 341 ; 49 | 325 ; 48 | 0.6 ; 10.6
V SCR2 | 2563 ; 551 | 2563 ; 458 | 1157 ; 86 | 2343 ; 460 | 524 ; 82 | 510 ; 82 | 1.4 ; 14.3
V SNPC | 78 ; 93 | 78 ; 21 | 68 ; 21 | 68 ; 21 | 67 ; 21 | 62 ; 21 | 0.3 ; 5.4
V SNPM | 2421 ; 1421 | 2421 ; 241 | 1843 ; 237 | 1814 ; 241 | 1800 ; 232 | 1221 ; 180 | 33.8 ; 116.8
W GAR | 2107 ; 242 | 2107 ; 93 | 1775 ; 91 | 1769 ; 91 | 1896 ; 91 | 1590 ; 75 | 3.3 ; 16.8
W SFA | 471 ; 64 | 471 ; 42 | 329 ; 42 | 329 ; 42 | 324 ; 41 | 300 ; 41 | 0.6 ; 12.7
% Reduction | 0.0 ; 0.0 | 0.0 ; 50.1 | 20.7 ; 54.8 | 20.3 ; 51.8 | 33.6 ; 60.3 | 39.3 ; 61.1
Table 6.2: Retiming results for IBM Gigahertz Processor (GP) netlists
The additional application of on-the-fly retiming has a varying effect upon size.
Our experiments show that on average it hurts both REGISTER count and AND count.
However, in individual cases, it provides a substantial benefit. For example, for seven of
the 42 ISCAS netlists and eleven of the 28 GP netlists, on-the-fly retiming further reduced
the overall AND count. In addition, for three GP netlists the number of REGISTERs is
decreased. Selecting the best result of columns 4 and 5 on a per-netlist basis, we attain an
additional cumulative reduction of 2.5% in AND count and 1.6% in REGISTER count for
the GP netlists over the results of column 4 alone. Also, as illustrated in Figure 5.7, on-
the-fly retiming alone may result in REGISTER reduction even without solving the retiming
problem. For example, the GP netlist LFLUSH is a reconvergent acyclic pipeline. Before
using the ILP solver to calculate an optimal retiming, the options used in columns 4 and 5
reduce the REGISTER count to 78 and 38, respectively. Nevertheless, on-the-fly retiming
often temporarily hurts REGISTER count; this penalty is subsequently rectified during the
global retiming phase.
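The global retiming phase can be viewed in the classical Leiserson–Saxe formulation: each vertex v receives an integer lag r(v), and the retimed weight of an edge (u, v) is w_r(u, v) = w(u, v) + r(v) − r(u). The following is a minimal brute-force sketch of that formulation on a hypothetical three-vertex cycle, not the ILP implementation used here; it also illustrates why the total weight of any directed cycle is invariant under retiming.

```python
from itertools import product

# Toy netlist graph: edges are (tail, head, weight), where the weight is the
# number of REGISTERs on the edge. This forms one directed cycle a->b->c->a.
edges = [("a", "b", 1), ("b", "c", 0), ("c", "a", 1)]
vertices = sorted({u for (u, _, _) in edges} | {v for (_, v, _) in edges})

def retime(edges, lag):
    # Leiserson-Saxe retimed weight: w_r(u,v) = w(u,v) + lag(v) - lag(u)
    return [(u, v, w + lag[v] - lag[u]) for (u, v, w) in edges]

def legal(retimed):
    # Legal without NEGATIVE REGISTERs: no retimed edge weight is negative.
    return all(w >= 0 for (_, _, w) in retimed)

# Brute-force small lag assignments to minimize the total REGISTER count.
best = None
for lags in product(range(-2, 3), repeat=len(vertices)):
    lag = dict(zip(vertices, lags))
    r = retime(edges, lag)
    if legal(r):
        cost = sum(w for (_, _, w) in r)
        best = cost if best is None else min(best, cost)

# Retiming preserves the sequential weight of every directed cycle, so the
# cycle always retains its two REGISTERs regardless of the lags chosen.
print(best)  # 2
```

The lag differences telescope around any cycle, which is exactly why no legal retiming can drop below the cycle's original weight of two here.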
We briefly discuss how redundancy removal and on-the-fly retiming may occasion-
ally hurt REGISTER count using the example netlist depicted in Figure 6.9. Initially, ver-
tices a and b have three sinks: one AND vertex g1 through a weight-of-zero edge; one
AND vertex g2 through a weight-of-one edge; and another distinct set of vertices through
a weight-of-one edge. By fanout sharing, the initial netlist depicted in part (a) has a total
weight of two. However, with on-the-fly retiming, we will drag the REGISTERs beyond g2,
thus eliminating the ability to share fanout REGISTERs; we then may merge g1 and g2,
locking us into the weight-of-three solution depicted in part (b). Even if we had not merged
g1 and g2, this example depicts how on-the-fly retiming often temporarily hurts REGIS-
TER count, though the ILP solver has the opportunity to rectify this penalty. By merging
g1 and g2, we hurt REGISTER count in a manner which the ILP solver cannot rectify un-
less adjacent retiming opportunities may be exploited, since backward retiming the merged
vertex may entail a NEGATIVE REGISTER on the outgoing edge with weight zero. Therefore,
a promising direction of future research is to apply on-the-fly retiming in a more limited
fashion, perhaps neglecting such a drag unless it is determined that all non-zero-weight
outgoing edges from both a and b may be on-the-fly retimed, or that a
resulting merge will cause a weight-of-zero and a weight-of-one outgoing edge from the
merged-onto vertex (g1 = g2 in this example).
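Under fanout sharing, the REGISTERs on a vertex's outgoing edges can be implemented as a single chain, so each source vertex contributes only the maximum weight among its outgoing edges rather than their sum. A small sketch of that cost model, with the vertex names and weights assumed from the Figure 6.9 discussion:

```python
# Fanout sharing: a vertex's outgoing REGISTERs form a shared chain, so the
# vertex contributes max(outgoing edge weights) REGISTERs, not their sum.
def shared_register_count(fanout_weights):
    # fanout_weights maps each source vertex to the weights of its outgoing edges
    return sum(max(ws) for ws in fanout_weights.values())

# Part (a), illustrative: a and b each drive g1 (weight 0), g2 (weight 1), and
# other sinks (weight 1); each contributes max = 1 REGISTER, for a total of 2.
before = {"a": [0, 1, 1], "b": [0, 1, 1]}

# Part (b), illustrative: dragging REGISTERs beyond g2 and merging g1 with g2
# leaves unsharable weight-1 edges at a and b plus one at the merged vertex.
after = {"a": [1], "b": [1], "g1=g2": [1]}

print(shared_register_count(before))  # 2
print(shared_register_count(after))   # 3
```

This reproduces the weight-of-two versus weight-of-three counts quoted for parts (a) and (b).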
Iteration of redundancy removal and retiming may provide significant additional
reductions. Compared to the single application runs, an additional average reduction of
4.5% and 1.2% on the ISCAS benchmarks, and 18.6% and 6.3% on the GP netlists, was
achieved for the number of AND vertices and REGISTERs, respectively. Up to six itera-
tions were applied during these runs, with an average number of 2.6 for ISCAS and 4.6
Figure 6.9: Example netlist depicting how on-the-fly retiming may hurt REGISTER count
for GP. The reported results in column 7 used on-the-fly retiming on eight of the 42 ISCAS
netlists and on six of the 28 GP netlists. One particularly interesting result is that an iterated
application using our new techniques of fanin sharing and on-the-fly retiming is able to sig-
nificantly outperform an interleaved classical retiming and redundancy removal approach.
This demonstrates the overall potential of the presented approaches for enhancing verifi-
cation and technology-independent logic synthesis, and furthermore illustrates the synergy
possible between reduction algorithms in a transformation-based verification framework.
6.4.2 Retiming Experiments
In our next set of experiments we evaluated the impact of generalized retiming to re-
duce netlist size and enhance verification. These experiments were performed on an IBM
RS/6000 Model 260, with a 256 MB memory limit. NEGATIVE REGISTERs are allowed
for these experiments.
In the first set of experiments we assessed the potential of generalized retiming
for reducing REGISTER count. In particular, we evaluated an iterative scheme where the
retiming engine (RET) and the redundancy removal [51] engine (COM) are called in an
interleaved manner. The results for the ISCAS and GP netlists are provided in Table 6.3.
For the ISCAS benchmarks, we list only the netlists with more than 16 REGISTERs since
smaller designs are of less interest. Columns 2, 3, and 4 report the number of REGISTERs
plus NEGATIVE REGISTERs of the original netlist, after applying COM only, and after
84
applying RET only, respectively. The following columns provide these REGISTER counts
after performing several iterations of COM followed by RET. The number of NEGATIVE
REGISTERs in the sum, if non-zero, is provided in parentheses. For brevity, we report only
up to three iterations; additional iterations provided marginal, though non-zero, improve-
ments. The maximum lag reported in column 9 provides an indication of retiming stump
size; see Section 7.3 for a more detailed discussion of this topic.
Overall, these results indicate that generalized retiming has a significant potential for
reducing the number of REGISTERs for enhanced verification. For the ISCAS benchmarks
we obtained a maximum REGISTER reduction of 79% with an average of 27%. For the
GP netlists we achieved a maximum reduction of 99.4% with an average of 62%. One
particularly interesting example is the LFLUSH netlist which implements intricate acyclic
control logic. It has one critical path which prevents retiming from removing all
REGISTERs. Retiming removes all REGISTERs outside this path, and finds a single
net along the critical path to which the remaining REGISTER may be moved.
The number of NEGATIVE REGISTERs generated by retiming is quite small. This
can be explained by several factors. First, we disallow NEGATIVE REGISTERs on sharing
edges as per the discussion of Section 6.1 to enable efficient linear algorithms for our (albeit
more limited) solution space. Second, since retiming preserves the sequential weight of
directed cycles, there is generally a penalty associated with NEGATIVE REGISTERs within
a SCC. Only paths between the SCCs are likely to require NEGATIVE REGISTERs for an
optimal solution.
The results further indicate that a repeated application of retiming and redundancy
removal techniques may achieve greater reductions than a single application of either or
both techniques. For example, the number of REGISTERs of netlist LLMQ is reduced from
1876 to 1185, 433, and 425 by applying one, two, or three iterations of redundancy removal
followed by retiming, respectively. This is a justification of the power of a transformation-
based verification architecture in simplifying problems which may otherwise be infeasible.
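Such interleaved engine application can be sketched as a simple fixpoint loop; the COM and RET stand-ins below are hypothetical reduction functions over a size estimate, not the actual engines:

```python
def iterate_engines(netlist_size, engines, max_iters=6):
    """Apply reduction engines in an interleaved manner until a fixpoint.

    `engines` is an ordered list of functions mapping a size estimate to a
    (hopefully smaller) size estimate -- stand-ins for COM and RET. Up to
    `max_iters` passes are made, stopping early when no engine improves.
    """
    for _ in range(max_iters):
        before = netlist_size
        for engine in engines:
            netlist_size = engine(netlist_size)
        if netlist_size == before:  # no further improvement: fixpoint reached
            break
    return netlist_size

# Illustrative stand-ins: COM strips 10% of REGISTERs, RET a further 30%.
com = lambda n: int(n * 0.9)
ret = lambda n: int(n * 0.7)
print(iterate_engines(1876, [com, ret]))  # shrinks far below the initial 1876
```

The point of the sketch is only the control structure: each engine runs on the output of the previous one, and iteration continues until neither helps (or an iteration cap is hit), mirroring the up-to-six iterations reported above.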
Design | Number of sequential elements (NEGATIVE REGISTERs): Original | COM only | RET only | COM-RET, 1 iteration | COM-RET, 2 iterations | COM-RET, 3 iterations | Relative reduction (best) | Max. lag | Time (s) ; Memory (MB)
PROLOG | 136 | 81 | 45 (1) | 45 (1) | 45 (3) | 44 (2) | 67.6% | 2 | 1.4 ; 22.4
S1196 | 18 | 16 | 16 | 14 | 14 | 14 | 22.2% | 1 | 0.6 ; 10.7
S1238 | 18 | 17 | 16 | 15 | 14 | 14 | 22.2% | 1 | 0.9 ; 21.1
S1269 | 37 | 37 | 36 | 36 | 36 | 36 | 2.7% | 1 | 0.4 ; 6.2
S13207.1 | 638 | 513 | 390 | 343 | 292 (1) | 289 | 54.7% | 11 | 3.8 ; 34.7
S1423 | 74 | 74 | 72 | 72 | 72 | 72 | 2.7% | 1 | 0.5 ; 6.2
S1512 | 57 | 57 | 57 | 57 | 57 | 57 | 0.0% | 1 | 0.5 ; 6.2
S15850.1 | 534 | 518 | 498 | 488 | 485 | 485 | 9.2% | 6 | 5.3 ; 31.8
S3271 | 116 | 116 | 110 | 110 | 110 | 110 | 5.2% | 5 | 0.7 ; 7.0
S3330 | 132 | 81 | 44 (2) | 44 (3) | 44 (2) | 44 (2) | 66.7% | 3 | 0.7 ; 7.0
S3384 | 183 | 183 | 72 | 72 | 72 | 72 | 60.7% | 6 | 0.7 ; 7.1
S35932 | 1728 | 1728 | 1728 | 1728 | 1728 | 1728 | 0.0% | 1 | 7.2 ; 38.0
S382 | 21 | 21 | 15 | 15 | 15 | 15 | 28.6% | 1 | 0.3 ; 5.9
S38584.1 | 1426 | 1415 | 1375 | 1375 | 1374 | 1374 | 3.6% | 5 | 29.4 ; 127.4
S400 | 21 | 21 | 15 | 15 | 15 | 15 | 28.6% | 0 | 0.3 ; 5.9
S444 | 21 | 21 | 15 | 15 | 15 | 15 | 28.6% | 1 | 0.3 ; 5.9
S4863 | 104 | 88 | 37 | 37 | 37 | 37 | 64.4% | 4 | 0.9 ; 7.3
S499 | 22 | 22 | 22 | 22 | 20 | 20 | 9.1% | 1 | 0.6 ; 15.1
S526N | 21 | 21 | 21 | 21 | 21 | 21 | 0.0% | 2 | 0.4 ; 5.9
S5378 | 179 | 164 | 112 (6) | 112 (6) | 111 (6) | 111 (6) | 38.0% | 5 | 1.6 ; 18.4
S635 | 32 | 32 | 32 | 32 | 32 | 32 | 0.0% | 1 | 0.4 ; 5.9
S641 | 19 | 17 | 15 | 15 | 15 | 15 | 21.1% | 2 | 0.4 ; 5.9
S6669 | 239 | 231 | 92 | 75 | 75 | 75 | 68.6% | 5 | 1.6 ; 14.1
S713 | 19 | 17 | 15 | 15 | 15 | 15 | 21.1% | 2 | 0.4 ; 5.9
S838.1 | 32 | 32 | 32 | 32 | 32 | 32 | 0.0% | 0 | 0.5 ; 6.1
S9234.1 | 211 | 193 | 172 | 172 | 165 | 131 | 37.9% | 3 | 2.5 ; 26.2
S938 | 32 | 32 | 32 | 32 | 32 | 32 | 0.0% | 0 | 0.4 ; 6.1
S953 | 29 | 29 | 6 | 6 | 6 | 6 | 79.3% | 0 | 0.4 ; 6.1
S967 | 29 | 29 | 6 | 6 | 6 | 6 | 79.3% | 0 | 0.4 ; 6.1
S991 | 19 | 19 | 19 | 19 | 19 | 19 | 0.0% | 2 | 0.4 ; 6.0
CR RAS | 431 | 431 | 378 | 370 | 348 | 348 | 19.3% | 3 | 6.0 ; 22.6
D DASA | 115 | 115 | 100 | 100 | 100 | 100 | 13.0% | 2 | 0.9 ; 7.1
D DCLA | 1137 | 1137 | 771 | 750 | 750 | 750 | 34.0% | 1 | 35.4 ; 36.2
D DUDD | 129 | 129 | 100 | 100 | 100 | 100 | 22.5% | 3 | 0.9 ; 7.0
I IBBC | 195 | 195 | 40 | 40 | 38 | 36 | 81.5% | 2 | 1.6 ; 21.6
I IFAR | 413 | 413 | 142 | 139 | 136 | 136 | 67.1% | 4 | 3.1 ; 19.5
I IFEC | 182 | 182 | 45 | 45 | 45 | 45 | 75.3% | 6 | 0.7 ; 7.0
I IFPF | 1546 | 1356 | 673 (4) | 661 (4) | 449 (2) | 442 (2) | 71.4% | 10 | 46.5 ; 127.9
L EMQ | 220 | 220 | 87 | 88 | 74 | 74 | 66.4% | 4 | 3.4 ; 18.5
L EXEC | 535 | 535 | 163 | 137 | 135 | 134 | 75.0% | 6 | 9.8 ; 28.1
L FLUSH | 159 | 159 | 1 | 1 | 1 | 1 | 99.4% | 3 | 0.8 ; 7.0
L LMQ | 1876 | 1831 | 1190 | 1185 | 433 (3) | 425 (3) | 77.3% | 3 | 50.7 ; 139.1
L LRU | 237 | 237 | 94 | 94 | 94 | 94 | 60.3% | 2 | 1.1 ; 7.1
L PNTR | 541 | 541 | 245 | 245 | 245 | 245 | 54.7% | 3 | 1.8 ; 8.8
L TBWK | 307 | 307 | 124 | 124 | 40 | 40 | 87.0% | 3 | 2.7 ; 18.0
M CIU | 777 | 686 | 415 | 415 | 411 | 387 (1) | 50.2% | 15 | 26.3 ; 76.6
S SCU1 | 373 | 373 | 204 | 200 | 192 | 192 | 48.5% | 3 | 9.0 ; 20.6
S SCU2 | 1368 | 1368 | 566 | 565 | 426 | 423 | 69.1% | 5 | 102.2 ; 67.4
V CACH | 173 | 155 | 104 (2) | 96 (3) | 96 (2) | 95 (1) | 45.1% | 9 | 1.1 ; 24.0
V DIR | 178 | 151 | 87 | 83 | 43 | 42 (1) | 76.4% | 5 | 0.9 ; 22.3
V L2FB | 75 | 75 | 26 | 26 | 26 | 26 | 65.3% | 2 | 0.5 ; 5.9
V SCR1 | 150 | 128 | 52 | 48 (1) | 48 (1) | 48 | 68.0% | 4 | 0.7 ; 10.9
V SCR2 | 551 | 551 | 86 | 82 | 82 | 82 | 85.1% | 4 | 4.4 ; 15.0
V SNPC | 93 | 93 | 21 | 21 | 21 | 21 | 77.4% | 4 | 0.5 ; 6.8
V SNPM | 1421 | 1216 | 233 (7) | 233 (7) | 231 (11) | 227 (8) | 84.0% | 15 | 14.7 ; 65.2
W GAR | 242 | 232 | 91 (1) | 90 | 90 | 79 (1) | 67.4% | 2 | 3.2 ; 25.4
W SFA | 64 | 64 | 42 | 42 | 41 | 41 | 35.9% | 1 | 1.0 ; 16.0
Table 6.3: Generalized retiming results for ISCAS89 (upper part) and GP (lower part)
Design | Original netlist: REGISTERs | Reachability steps, algo. | Time (s) ; Memory (MB) | Reduced netlist: REGISTERs | Reachability steps, algo. | Init-state BDD nodes | Time (s) ; Memory (MB) | Relative improvement: Time ; Memory
PROLOG | 136 | 17 C I | 2285 ; 134.5 | 45 | 16 C H | 611 | 81.6 ; 27.5 | 96.4% ; 79.6%
S1196 | 18 | 4 C I | 1.1 ; 6.5 | 14 | 2 C I | 122 | 0.5 ; 6.3 | 54.5% ; 3.1%
S1238 | 18 | 4 C I | 1.2 ; 6.5 | 14 | 2 C I | 159 | 0.1 ; 6.3 | 91.7% ; 3.1%
S1269 | 37 | 11 C H | 13194 ; 185.5 | 36 | 11 C H | 901 | 13395 ; 187.5 | -1.5% ; -1.1%
S3330 | 132 | 17 C H | 668.0 ; 35.3 | 45 | 16 C I | 194 | 35.8 ; 15.6 | 94.6% ; 55.8%
S382 | 21 | 13 C I | < 0.1 ; 6.2 | 15 | 11 C I | 17 | < 0.1 ; 6.1 | 0.0% ; 1.6%
S400 | 21 | 10 C I | < 0.1 ; 6.2 | 15 | 10 C H | 16 | < 0.1 ; 6.1 | 0.0% ; 1.6%
S444 | 21 | 4 C I | < 0.1 ; 6.1 | 15 | 3 C H | 27 | < 0.1 ; 6.1 | 0.0% ; 0.0%
S4863 | 104 | 3 I | 14400 ; 174.2 | 37 | 4 C I | 199 | 14.8 ; 16.6 | 99.9% ; 90.5%
S499 | 22 | 1 C H | 0.2 ; 6.2 | 20 | 1 C H | 21 | < 0.1 ; 6.2 | 100% ; 0.0%
S641 | 19 | 6 C I | 0.8 ; 6.4 | 15 | 5 C I | 15 | 1.0 ; 6.4 | -25.0% ; 0.0%
S713 | 19 | 6 C I | 0.9 ; 6.3 | 15 | 5 C I | 15 | 0.6 ; 6.4 | 33.3% ; -1.6%
S953 | 29 | 6 C I | 0.8 ; 6.4 | 6 | 5 C H | 7 | < 0.1 ; 6.1 | 100% ; 4.7%
S967 | 29 | 4 C I | 1.1 ; 6.3 | 6 | 3 C H | 7 | < 0.1 ; 6.1 | 100% ; 3.2%
CR RAS | 431 | 1028 C I | 724.3 ; 57.2 | 370 | 1026 C I | 415 | 424.0 ; 51.8 | 41.5% ; 9.4%
D DASA | 115 | 6 C I | 19.7 ; 7.8 | 100 | 5 C I | 200 | 33.0 ; 11.6 | -67.5% ; -48.7%
D DUDD | 129 | 13 C I | 953.3 ; 112.8 | 100 | 11 C H | 2568 | 359.1 ; 33.7 | 62.3% ; 70.1%
I IBBC | 195 | 5 C H | 145.3 ; 11.4 | 40 | 3 C H | 41 | 4.4 ; 6.4 | 97.0% ; 43.9%
I IFAR | 413 | 5 I | 14400 ; 87.0 | 139 | 22 C I | 719 | 2302 ; 102.0 | 84.0% ; -17.2%
I IFEC | 182 | 6 C I | 66.3 ; 8.4 | 45 | 2 C H | 151 | 28.0 ; 6.9 | 57.8% ; 17.9%
L EMQ | 220 | 8 C H | 323.7 ; 17.0 | 88 | 5 C H | 5519 | 205.6 ; 33.0 | 36.5% ; -94.1%
L EXEC | 535 | 5 H | 14400 ; 63.2 | 137 | 9 C I | 1856 | 593.6 ; 103.2 | 95.9% ; -63.3%
L FLUSH | 159 | 4 C I | 37.4 ; 7.7 | 1 | 2 C H | 2 | < 0.1 ; 6.2 | 100% ; 19.5%
L PNTR | 541 | 6 C I | 6687 ; 138.5 | 245 | 3 C I | 242 | 2423 ; 51.2 | 63.8% ; 63.0%
L TBWK | 307 | 6 C H | 184.1 ; 9.1 | 124 | 4 C H | 123 | 74.0 ; 7.4 | 59.8% ; 18.7%
S SCU1 | 373 | 14 C H | 8934 ; 165.8 | 200 | 12 C H | 755 | 1195 ; 118.1 | 86.6% ; 28.8%
V CACH | 173 | 11 C H | 92.1 ; 17.2 | 97 | 8 C I | 910 | 20.0 ; 8.9 | 78.3% ; 48.3%
V DIR | 178 | 8 C H | 57.9 ; 8.3 | 83 | 2 C I | 95 | 11.1 ; 7.0 | 80.8% ; 15.7%
V L2FB | 75 | 4 C I | 2.9 ; 6.3 | 26 | 2 C H | 27 | < 0.1 ; 6.1 | 100% ; 3.2%
V SCR1 | 150 | 20 C H | 250.0 ; 17.7 | 48 | 17 C I | 90 | 5.0 ; 15.5 | 98.0% ; 12.4%
V SCR2 | 551 | 22 C I | 1201 ; 105.0 | 82 | 20 C I | 220 | 260.0 ; 36.7 | 78.4% ; 65.0%
V SNPC | 93 | 4 C H | 4.9 ; 6.6 | 21 | 1 C H | 17 | < 0.1 ; 6.2 | 100% ; 6.1%
W GAR | 242 | 11 C I | 109.8 ; 25.0 | 90 | 9 C H | 191 | 82.5 ; 13.0 | 24.9% ; 48.0%
W SFA | 64 | 7 C I | 3.7 ; 6.8 | 42 | 6 C I | 14 | 3.6 ; 6.9 | 2.7% ; -1.5%
Table 6.4: Effect of retiming on reachability analysis (C = completed within a time limit of four hours, H = hybrid image computation, I = IWLS95 image computation)
Table 6.4 provides results of another experiment on assessing the impact of gener-
alized retiming for enhanced symbolic reachability analysis using VIS 1.4 [82]. We report
results for all netlists of Table 6.3 for which retiming resulted in a REGISTER reduction,
and for which reachability analysis (before or after retiming) could be completed. We ran
each experiment with two options for image computation: the IWLS95 partitioned transi-
tion relation method [83] and the hybrid method [43]. We report the best of the two results
on a per-example basis. Although after reduction we complete traversal for only three ad-
ditional netlists, the results clearly show that retiming significantly improves the overall
performance of reachability analysis. The CPU time is decreased by an average of 53.1%
for ISCAS and 64.0% for GP netlists, respectively. The corresponding memory reductions
are 17.2% and 12.3%, respectively. The cumulative run-time speedup is 55.7% for the
ISCAS benchmarks and 83.5% for the GP netlists. As another measure of the size of the
retiming stump, we report the BDD sizes for the initial states in column 7. As shown, these
BDDs remain fairly small and tend not to hinder reachability analysis.
Figure 6.10 illustrates the profile of peak BDD size while traversing benchmark
S3330, for the original netlist and after various reductions. This example demonstrates
how retiming tends to benefit the performance of reachability analysis. To further illus-
trate the effect of retiming on reducing the correlation of the state encoding, we analyzed
the traversal of netlist S4863. Reachability timed out during the third traversal step of
the original netlist. Using retiming, the correlation between the remaining REGISTERs
was completely removed, resulting in full reachability of all 2^37 states. Interestingly, the
fine-grained conjunction scheduling approach proposed in [44] provides a similar result for
this netlist which eliminates the need for representing the present- and next-state variables
of any REGISTERs without using retiming, instead using an advanced image computation
algorithm. They too are able to complete reachability for this netlist, though their com-
putational requirements exceed ours by more than an order of magnitude, and on a faster
computer. While such a profound result is likely atypical, this is strong evidence of the
power of both redundancy removal and retiming to reduce REGISTER correlation.
6.4.3 Diameter Overapproximation Experiments
In our final set of experiments, we implemented the diameter overapproximation algorithms
presented in Chapter 4. We ran several sets of experiments to assess the effectiveness of
these techniques on netlists after various transformations.
Our first set of experiments, summarized in Table 6.5, are on the ISCAS89 bench-
marks, using each primary output as a target. We categorized the REGISTERs in the netlist
[Figure: Symbolic Reachability Profile for S3330 – peak number of BDD nodes per time-step (0 through 16), plotted for No Reduction, COM Only, Retiming Only, and COM + Retiming.]
Figure 6.10: Peak BDD size profile for traversing S3330 with the IWLS95 image computation method after various transformations
into the various TSAP types: CCs, ACs, MCs + QCs, and GCs. We additionally ran our di-
ameter overapproximation algorithm on all targets; any with a diameter of less than 50 were
enumerated in set T′ ⊆ T, and the average of these corresponding diameters is reported.
The bound of 50 was arbitrarily chosen as being a reasonable cut-off size for discharging
with BMC. In the bottom row we report the cumulative sum of REGISTERs of the corre-
sponding types, and the cumulative sum of |T′| and |T|. We performed these experiments
on the original netlists; on redundancy-removed netlists (COM); and on netlists after redun-
dancy removal and retiming (COM,RET,COM), using Theorems 5.3 and 6.3. We perform
the identical set of experiments on GP netlists in Table 6.6. We do not report per-line
computational resources for these experiments; our structural diameter overapproximation
algorithms consume trivial resources. The maximum resources necessary per netlist in
these runs were 12.4 seconds for ISCAS, and 0.4 seconds for GP, with less than 1 MB for
either. The reason for the larger requirements for ISCAS is that we distinctly analyze the
fanin cone of each target; some of the ISCAS netlists have a large number of targets. Our
resource requirements are thus less than one second per target on any of these benchmarks.
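The constant and acyclic categories used below are purely structural: a REGISTER is acyclic precisely when it lies on no directed cycle of the REGISTER dependency graph. A minimal sketch of that classification follows; the dependency graph and register names are invented for illustration, and the MC/QC/GC table-cell analysis is not reproduced here.

```python
def on_cycle(graph):
    """Return the set of vertices that lie on some directed cycle.

    `graph` maps each REGISTER to the REGISTERs in its next-state support.
    A vertex is cyclic iff it can reach itself (non-trivial SCC or self-loop).
    Uses a simple iterative reachability check, adequate for small graphs.
    """
    def reaches(src, dst):
        seen, stack = set(), [src]
        while stack:
            v = stack.pop()
            for w in graph.get(v, ()):
                if w == dst:
                    return True
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        return False
    return {v for v in graph if reaches(v, v)}

# Illustrative dependency graph: r1 and r2 form an acyclic pipeline; r3 feeds
# back on itself (cyclic); r4 depends on nothing (e.g. holds a constant).
deps = {"r1": [], "r2": ["r1"], "r3": ["r3", "r2"], "r4": []}
cyclic = on_cycle(deps)
acyclic = set(deps) - cyclic
print(sorted(cyclic))   # ['r3']
print(sorted(acyclic))  # ['r1', 'r2', 'r4']
```

In a production implementation one would use a linear-time SCC decomposition rather than per-vertex reachability, but the classification criterion is the same.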
Analyzing the ISCAS results, we see that for the original netlists, many REGISTERs
are non-complex: 21% are acyclic REGISTERs, and 5% are table cells. A total of 477
original targets (30%) have a diameter of less than 50. After redundancy removal, 24%
of the REGISTERs are acyclic, and 10% are table cells; 556 of the targets (34%) have a
diameter of less than 50. After redundancy removal and retiming, 10% of the REGISTERs
are acyclic and 11% are table cells. This drop in acyclic REGISTERs is due primarily
to their elimination by retiming. A total of 639 targets (40%) have a diameter of less
than 50. These results demonstrate the significant potential of structural transformations
to enable a practically useful overapproximate diameter bound for the
untransformed netlists. This result is particularly profound noting that we did not employ
any (possibly costly) techniques to attempt to tighten GC diameter bounds; our experiments
thus reflect a very fine line between being able to attain a small diameter bound and a huge
bound. As techniques emerge for efficiently improving diameter bounding for GCs, the
compositional and transformation-based theory we have developed should prove even more
useful in obtaining superior results with lesser resources.
For the GP netlists, we see that a larger fraction of the REGISTERs is originally non-
complex: 1% are constants, 57% are acyclic, and 13% are table cells. A total of 95 targets
(33%) have a diameter of less than 50. After redundancy removal, 0.5% of the REGISTERs
are constants, 58% are acyclic, and 15% are table cells. A total of 111 targets (39%) have
a diameter of less than 50. After retiming and redundancy removal, 1% of the REGISTERs
are constants, 19% are acyclic, and 34% are table cells. A total of 126 (44%) of these
targets have a diameter of less than 50.
Design | Original: |R| ∈ CC ; AC ; MC+QC ; GC | |T′| / |T| ; avg. d(t′) | COM: |R| ∈ CC ; AC ; MC+QC ; GC | |T′| / |T| ; avg. d(t′) | COM,RET,COM: |R| ∈ CC ; AC ; MC+QC ; GC | |T′| / |T| ; avg. d(t′)
PROLOG | 0 ; 107 ; 1 ; 28 | 14 / 73 ; 8.9 | 0 ; 103 ; 1 ; 28 | 16 / 73 ; 11.9 | 0 ; 16 ; 1 ; 28 | 24 / 73 ; 21.0
S1196 | 0 ; 18 ; 0 ; 0 | 14 / 14 ; 3.3 | 0 ; 18 ; 0 ; 0 | 14 / 14 ; 3.3 | 0 ; 16 ; 0 ; 0 | 14 / 14 ; 4.3
S1238 | 0 ; 18 ; 0 ; 0 | 14 / 14 ; 3.3 | 0 ; 18 ; 0 ; 0 | 14 / 14 ; 3.3 | 0 ; 16 ; 0 ; 0 | 14 / 14 ; 4.3
S1269 | 0 ; 9 ; 17 ; 11 | 2 / 10 ; 10.0 | 0 ; 9 ; 17 ; 11 | 2 / 10 ; 10.0 | 0 ; 8 ; 17 ; 11 | 2 / 10 ; 10.0
S13207.1 | 0 ; 314 ; 128 ; 196 | 49 / 152 ; 2.0 | 0 ; 315 ; 128 ; 195 | 49 / 152 ; 2.1 | 0 ; 77 ; 89 ; 183 | 79 / 152 ; 6.4
S1423 | 0 ; 3 ; 16 ; 55 | 1 / 5 ; 1.0 | 0 ; 3 ; 16 ; 55 | 1 / 5 ; 1.0 | 0 ; 1 ; 12 ; 59 | 1 / 5 ; 2.0
S1488 | 0 ; 0 ; 0 ; 6 | 19 / 19 ; 33.0 | 0 ; 0 ; 0 ; 6 | 19 / 19 ; 33.0 | 0 ; 0 ; 0 ; 6 | 19 / 19 ; 33.0
S1494 | 0 ; 0 ; 0 ; 6 | 19 / 19 ; 33.0 | 0 ; 0 ; 0 ; 6 | 19 / 19 ; 33.0 | 0 ; 0 ; 0 ; 6 | 19 / 19 ; 33.0
S1512 | 0 ; 0 ; 1 ; 56 | 0 / 21 ; 0.0 | 0 ; 0 ; 0 ; 57 | 0 / 21 ; 0.0 | 0 ; 0 ; 0 ; 57 | 0 / 21 ; 0.0
S15850.1 | 0 ; 99 ; 124 ; 311 | 115 / 150 ; 2.7 | 0 ; 96 ; 107 ; 328 | 115 / 150 ; 2.7 | 0 ; 73 ; 81 ; 292 | 115 / 150 ; 4.7
S208.1 | 0 ; 0 ; 0 ; 8 | 0 / 1 ; 0.0 | 0 ; 0 ; 0 ; 8 | 0 / 1 ; 0.0 | 0 ; 0 ; 0 ; 8 | 0 / 1 ; 0.0
S27 | 0 ; 1 ; 2 ; 0 | 1 / 1 ; 4.0 | 0 ; 1 ; 2 ; 0 | 1 / 1 ; 4.0 | 0 ; 1 ; 2 ; 0 | 1 / 1 ; 4.0
S298 | 0 ; 0 ; 1 ; 13 | 0 / 6 ; 0.0 | 0 ; 0 ; 1 ; 13 | 0 / 6 ; 0.0 | 0 ; 0 ; 1 ; 13 | 0 / 6 ; 0.0
S3271 | 0 ; 6 ; 0 ; 110 | 1 / 14 ; 7.0 | 0 ; 6 ; 0 ; 110 | 1 / 14 ; 7.0 | 0 ; 0 ; 0 ; 110 | 1 / 14 ; 7.0
S3330 | 0 ; 103 ; 1 ; 28 | 16 / 73 ; 11.9 | 0 ; 103 ; 1 ; 28 | 16 / 73 ; 11.9 | 0 ; 16 ; 1 ; 28 | 33 / 73 ; 25.3
S3384 | 0 ; 111 ; 0 ; 72 | 6 / 26 ; 16.5 | 0 ; 111 ; 0 ; 72 | 6 / 26 ; 16.5 | 0 ; 0 ; 0 ; 72 | 6 / 26 ; 16.5
S344 | 0 ; 0 ; 4 ; 11 | 3 / 11 ; 5.0 | 0 ; 0 ; 4 ; 11 | 3 / 11 ; 5.0 | 0 ; 0 ; 4 ; 11 | 3 / 11 ; 5.0
S349 | 0 ; 0 ; 4 ; 11 | 3 / 11 ; 5.0 | 0 ; 0 ; 4 ; 11 | 3 / 11 ; 5.0 | 0 ; 0 ; 4 ; 11 | 3 / 11 ; 5.0
S35932 | 0 ; 0 ; 0 ; 1728 | 0 / 320 ; 0.0 | 0 ; 0 ; 0 ; 1728 | 0 / 320 ; 0.0 | 0 ; 0 ; 0 ; 1728 | 0 / 320 ; 0.0
S382 | 0 ; 6 ; 0 ; 15 | 0 / 6 ; 0.0 | 0 ; 6 ; 0 ; 15 | 0 / 6 ; 0.0 | 0 ; 0 ; 0 ; 15 | 0 / 6 ; 0.0
S38584.1 | 0 ; 47 ; 4 ; 1375 | 56 / 304 ; 1.0 | 1 ; 203 ; 366 ; 854 | 133 / 304 ; 14.9 | 0 ; 170 ; 345 ; 832 | 110 / 304 ; 16.7
S386 | 0 ; 0 ; 0 ; 6 | 7 / 7 ; 33.0 | 0 ; 0 ; 0 ; 6 | 7 / 7 ; 33.0 | 0 ; 0 ; 0 ; 6 | 7 / 7 ; 33.0
S400 | 0 ; 6 ; 0 ; 15 | 0 / 6 ; 0.0 | 0 ; 6 ; 0 ; 15 | 0 / 6 ; 0.0 | 0 ; 0 ; 0 ; 15 | 0 / 6 ; 0.0
S420.1 | 0 ; 0 ; 0 ; 16 | 0 / 1 ; 0.0 | 0 ; 0 ; 0 ; 16 | 0 / 1 ; 0.0 | 0 ; 0 ; 0 ; 16 | 0 / 1 ; 0.0
S444 | 0 ; 6 ; 0 ; 15 | 0 / 6 ; 0.0 | 0 ; 6 ; 0 ; 15 | 0 / 6 ; 0.0 | 0 ; 0 ; 0 ; 15 | 0 / 6 ; 0.0
S4863 | 0 ; 62 ; 0 ; 42 | 0 / 16 ; 0.0 | 0 ; 83 ; 0 ; 21 | 0 / 16 ; 0.0 | 0 ; 16 ; 0 ; 21 | 0 / 16 ; 0.0
S499 | 0 ; 0 ; 0 ; 22 | 0 / 22 ; 0.0 | 0 ; 0 ; 0 ; 22 | 0 / 22 ; 0.0 | 0 ; 0 ; 0 ; 22 | 0 / 22 ; 0.0
S510 | 0 ; 0 ; 0 ; 6 | 7 / 7 ; 33.0 | 0 ; 0 ; 0 ; 6 | 7 / 7 ; 33.0 | 0 ; 0 ; 0 ; 6 | 7 / 7 ; 33.0
S526N | 0 ; 0 ; 1 ; 20 | 0 / 6 ; 0.0 | 0 ; 0 ; 1 ; 20 | 0 / 6 ; 0.0 | 0 ; 0 ; 1 ; 20 | 0 / 6 ; 0.0
S5378 | 0 ; 115 ; 0 ; 64 | 4 / 49 ; 1.5 | 0 ; 126 ; 0 ; 53 | 4 / 49 ; 1.5 | 0 ; 56 ; 0 ; 56 | 7 / 49 ; 3.9
S635 | 0 ; 0 ; 0 ; 32 | 0 / 1 ; 0.0 | 0 ; 0 ; 0 ; 32 | 0 / 1 ; 0.0 | 0 ; 0 ; 0 ; 32 | 0 / 1 ; 0.0
S641 | 0 ; 7 ; 0 ; 12 | 3 / 24 ; 1.0 | 0 ; 7 ; 0 ; 12 | 3 / 24 ; 1.0 | 0 ; 4 ; 0 ; 10 | 7 / 24 ; 2.0
S6669 | 0 ; 181 ; 0 ; 58 | 37 / 55 ; 3.4 | 0 ; 181 ; 0 ; 58 | 37 / 55 ; 3.4 | 0 ; 18 ; 0 ; 58 | 37 / 55 ; 4.0
S713 | 0 ; 7 ; 0 ; 12 | 3 / 23 ; 1.0 | 0 ; 7 ; 0 ; 12 | 3 / 23 ; 1.0 | 0 ; 7 ; 0 ; 7 | 7 / 23 ; 2.3
S820 | 0 ; 0 ; 0 ; 5 | 19 / 19 ; 17.0 | 0 ; 0 ; 0 ; 5 | 19 / 19 ; 17.0 | 0 ; 0 ; 0 ; 5 | 19 / 19 ; 17.0
S832 | 0 ; 0 ; 0 ; 5 | 19 / 19 ; 17.0 | 0 ; 0 ; 0 ; 5 | 19 / 19 ; 17.0 | 0 ; 0 ; 0 ; 5 | 19 / 19 ; 17.0
S838.1 | 0 ; 0 ; 0 ; 32 | 0 / 1 ; 0.0 | 0 ; 0 ; 0 ; 32 | 0 / 1 ; 0.0 | 0 ; 0 ; 0 ; 32 | 0 / 1 ; 0.0
S9234.1 | 0 ; 45 ; 9 ; 157 | 22 / 39 ; 1.2 | 0 ; 49 ; 5 ; 157 | 22 / 39 ; 1.2 | 0 ; 14 ; 25 ; 133 | 22 / 39 ; 2.0
S938 | 0 ; 0 ; 0 ; 32 | 0 / 1 ; 0.0 | 0 ; 0 ; 0 ; 32 | 0 / 1 ; 0.0 | 0 ; 0 ; 0 ; 32 | 0 / 1 ; 0.0
S953 | 0 ; 23 ; 0 ; 6 | 3 / 23 ; 2.0 | 0 ; 23 ; 0 ; 6 | 3 / 23 ; 2.0 | 0 ; 0 ; 0 ; 6 | 23 / 23 ; 29.8
S967 | 0 ; 23 ; 0 ; 6 | 3 / 23 ; 2.0 | 0 ; 23 ; 0 ; 6 | 3 / 23 ; 2.0 | 0 ; 0 ; 0 ; 6 | 23 / 23 ; 29.8
S991 | 0 ; 0 ; 0 ; 19 | 17 / 17 ; 8.8 | 0 ; 0 ; 0 ; 19 | 17 / 17 ; 8.8 | 0 ; 0 ; 0 ; 19 | 17 / 17 ; 8.8
Σ | 0 ; 1317 ; 313 ; 4622 | 477 / 1615 | 1 ; 1503 ; 653 ; 4086 | 556 / 1615 | 0 ; 509 ; 583 ; 3992 | 639 / 1615
Table 6.5: Diameter experiments for ISCAS89 benchmarks
Design | Original: |R| ∈ CC ; AC ; MC+QC ; GC | |T′| / |T| ; avg. d(t′) | COM: |R| ∈ CC ; AC ; MC+QC ; GC | |T′| / |T| ; avg. d(t′) | COM,RET,COM: |R| ∈ CC ; AC ; MC+QC ; GC | |T′| / |T| ; avg. d(t′)
CP RAS | 0 ; 279 ; 66 ; 315 | 0 / 2 ; 0.0 | 0 ; 286 ; 66 ; 307 | 0 / 2 ; 0.0 | 0 ; 179 ; 65 ; 238 | 0 / 2 ; 0.0
CLB CNTL | 0 ; 29 ; 2 ; 19 | 0 / 2 ; 0.0 | 0 ; 25 ; 2 ; 19 | 0 / 2 ; 0.0 | 0 ; 15 ; 2 ; 20 | 0 / 2 ; 0.0
CR RAS | 0 ; 96 ; 6 ; 329 | 0 / 1 ; 0.0 | 0 ; 100 ; 7 ; 321 | 0 / 1 ; 0.0 | 0 ; 52 ; 10 ; 284 | 0 / 1 ; 0.0
D DASA | 0 ; 16 ; 81 ; 18 | 1 / 2 ; 35.0 | 0 ; 10 ; 86 ; 13 | 2 / 2 ; 27.0 | 0 ; 1 ; 86 ; 13 | 2 / 2 ; 28.0
D DCLA | 0 ; 382 ; 1 ; 754 | 0 / 2 ; 0.0 | 0 ; 387 ; 1 ; 748 | 0 / 2 ; 0.0 | 0 ; 14 ; 0 ; 736 | 0 / 2 ; 0.0
D DUDD | 0 ; 30 ; 28 ; 71 | 4 / 22 ; 9.2 | 0 ; 21 ; 28 ; 71 | 4 / 22 ; 10.8 | 0 ; 1 ; 21 ; 71 | 7 / 22 ; 11.0
I IBBQn | 0 ; 623 ; 1488 ; 0 | 15 / 15 ; 4.7 | 0 ; 623 ; 1488 ; 0 | 15 / 15 ; 4.7 | 0 ; 0 ; 1488 ; 0 | 15 / 15 ; 4.7
I IFAR | 0 ; 303 ; 11 ; 99 | 0 / 2 ; 0.0 | 0 ; 257 ; 11 ; 93 | 0 / 2 ; 0.0 | 0 ; 41 ; 18 ; 79 | 0 / 2 ; 0.0
I IFPF | 11 ; 893 ; 44 ; 598 | 0 / 1 ; 0.0 | 1 ; 923 ; 35 ; 525 | 0 / 1 ; 0.0 | 0 ; 191 ; 4 ; 218 | 0 / 1 ; 0.0
L3 SNP1 | 25 ; 529 ; 39 ; 82 | 0 / 5 ; 0.0 | 6 ; 400 ; 41 ; 62 | 0 / 5 ; 0.0 | 0 ; 31 ; 30 ; 41 | 1 / 5 ; 1.0
L EMQn | 5 ; 146 ; 6 ; 66 | 0 / 1 ; 0.0 | 5 ; 136 ; 6 ; 66 | 1 / 1 ; 1.0 | 5 ; 20 ; 14 ; 57 | 1 / 1 ; 1.0
L EXEC | 12 ; 421 ; 0 ; 102 | 0 / 2 ; 0.0 | 0 ; 430 ; 0 ; 58 | 0 / 2 ; 0.0 | 0 ; 88 ; 0 ; 57 | 0 / 2 ; 0.0
L FLUSHn | 6 ; 198 ; 0 ; 4 | 7 / 7 ; 3.7 | 0 ; 194 ; 0 ; 4 | 7 / 7 ; 3.7 | 0 ; 12 ; 0 ; 4 | 7 / 7 ; 4.0
L INTRo | 14 ; 143 ; 12 ; 5 | 30 / 30 ; 3.8 | 0 ; 135 ; 12 ; 5 | 30 / 30 ; 3.8 | 0 ; 3 ; 12 ; 4 | 30 / 30 ; 3.6
L LMQo | 28 ; 690 ; 4 ; 133 | 0 / 16 ; 0.0 | 24 ; 682 ; 4 ; 141 | 0 / 16 ; 0.0 | 24 ; 114 ; 2 ; 132 | 0 / 16 ; 0.0
L LRU | 0 ; 142 ; 20 ; 75 | 0 / 12 ; 0.0 | 0 ; 127 ; 86 ; 9 | 12 / 12 ; 15.0 | 0 ; 0 ; 86 ; 8 | 12 / 12 ; 15.0
L PFQo | 14 ; 1936 ; 17 ; 84 | 1 / 67 ; 1.0 | 8 ; 1929 ; 82 ; 20 | 1 / 67 ; 1.0 | 8 ; 192 ; 83 ; 17 | 1 / 67 ; 1.0
L PNTRn | 3 ; 228 ; 10 ; 11 | 23 / 31 ; 2.0 | 0 ; 211 ; 10 ; 11 | 23 / 31 ; 2.0 | 0 ; 1 ; 10 ; 11 | 23 / 31 ; 4.0
L PRQn | 34 ; 366 ; 106 ; 265 | 10 / 10 ; 15.2 | 30 ; 367 ; 108 ; 260 | 10 / 10 ; 15.2 | 30 ; 12 ; 64 ; 302 | 10 / 10 ; 8.0
L SLB | 3 ; 135 ; 6 ; 27 | 2 / 3 ; 1.0 | 0 ; 126 ; 6 ; 26 | 2 / 3 ; 1.0 | 0 ; 15 ; 6 ; 23 | 2 / 3 ; 1.0
L TBWKn | 0 ; 202 ; 117 ; 14 | 0 / 21 ; 0.0 | 0 ; 186 ; 119 ; 12 | 1 / 21 ; 1.0 | 0 ; 1 ; 78 ; 53 | 1 / 21 ; 1.0
M CIU | 0 ; 343 ; 10 ; 424 | 0 / 6 ; 0.0 | 0 ; 321 ; 5 ; 417 | 0 / 6 ; 0.0 | 0 ; 63 ; 60 ; 286 | 6 / 6 ; 1.0
SIDECAR4 | 3 ; 109 ; 32 ; 455 | 0 / 1 ; 0.0 | 0 ; 60 ; 34 ; 453 | 0 / 1 ; 0.0 | 0 ; 24 ; 34 ; 67 | 0 / 1 ; 0.0
S SCU1 | 1 ; 232 ; 4 ; 136 | 0 / 3 ; 0.0 | 0 ; 220 ; 6 ; 124 | 0 / 3 ; 0.0 | 0 ; 75 ; 4 ; 70 | 2 / 3 ; 2.0
V CACH | 5 ; 94 ; 15 ; 59 | 0 / 1 ; 0.0 | 0 ; 93 ; 14 ; 52 | 0 / 1 ; 0.0 | 1 ; 22 ; 15 ; 27 | 1 / 1 ; 1.0
V DIR | 6 ; 91 ; 13 ; 68 | 0 / 2 ; 0.0 | 0 ; 100 ; 13 ; 55 | 0 / 2 ; 0.0 | 0 ; 13 ; 10 ; 20 | 2 / 2 ; 8.0
V SNPM | 65 ; 846 ; 134 ; 376 | 1 / 2 ; 2.0 | 3 ; 762 ; 97 ; 401 | 2 / 2 ; 1.5 | 0 ; 51 ; 26 ; 46 | 2 / 2 ; 1.5
W GAR | 0 ; 159 ; 0 ; 83 | 1 / 7 ; 1.0 | 0 ; 158 ; 0 ; 82 | 1 / 7 ; 1.0 | 0 ; 10 ; 0 ; 81 | 1 / 7 ; 1.0
W SFA | 0 ; 22 ; 0 ; 42 | 0 / 8 ; 0.0 | 0 ; 22 ; 0 ; 42 | 0 / 8 ; 0.0 | 0 ; 0 ; 0 ; 42 | 0 / 8 ; 0.0
Σ | 235 ; 9683 ; 2272 ; 4714 | 95 / 284 | 77 ; 9291 ; 2367 ; 4397 | 111 / 284 | 68 ; 1241 ; 2228 ; 3007 | 126 / 284
Table 6.6: Diameter experiments for GP netlists
Note that in some cases, the diameter bound obtained on the retimed netlist is
slightly larger than that of the original netlist – for example, with S1196 and S158501.
This is due to the inequality in Theorem 6.3; we must add the negated lag of the target to its
diameter bound, even though retiming may not have reduced REGISTER count for that tar-
get. Use of a normalized retiming helps minimize this potential increase, as does retiming a
single target cone at a time. However, the potential for increase tends to be very small (since
most lags tend to be very small), and the potential for decrease is much greater (potentially
exponentially greater). Transformations also impact table identification and TSAP clus-
tering heuristics. Due to the speed of these heuristic algorithms, it may be beneficial to run
them on every possible netlist representation to enable the best possible result.
We now discuss several netlists in more detail. Netlist I IBBQn is a large table-
based netlist. Forward reachability analysis of the redundancy-removed cone of a single
unreachable target with a diameter of three (comprising 442 REGISTERs and 134 FREE
vertices) requires 172.3 seconds and 25 MB with the MLP [84] algorithm, with sift variable
ordering enabled and a random initial ordering. For a cone of this size, completion of
reachability is somewhat a matter of luck, in this case due to a large degree of independence
of the corresponding BDD variables. However, because of its small diameter, the presented
techniques solve the target using SAT with a total of 0.46 seconds and 16 MB. Ignoring the
time necessary to parse the netlist, we attain nearly a three-order-of-magnitude speedup.
L FLUSHn is nearly acyclic; it has only ten REGISTERs in self-loops, six of which
are constant. For one target with 38 REGISTERs and 47 FREE vertices, reachability analysis
of the optimized target with MLP requires 1.20 seconds and 11 MB. Redundancy removal
plus retiming enable MLP to solve the target in 0.60 seconds with 13 MB. Due to a shallow
diameter of three, our techniques solve the target using SAT with cumulative resources of
0.19 seconds and 9 MB.
Chapter 7
Cut-Based Abstraction
In this chapter we discuss the technique of structural cut-based abstraction. The idea of this approach is to identify a cut ⟨C, C̄⟩ of the netlist graph where T ⊆ C̄, then to replace the cut cone C with a simpler yet trace-equivalent logic cone. In order to ensure soundness of this approach, we need to include the initial values of any REGISTERs in C̄ as elements of C̄, hence T ∪ Z(C̄ ∩ R) ⊆ C̄. The abstracted netlist is then transferred to an arbitrary verification flow, which may include successive applications of cut-based abstraction interspersed with other abstraction techniques such as redundancy removal (refer to Chapter 5) and retiming (refer to Chapter 6).
We develop the theory of this chapter to handle arbitrary cuts, though the implementation we discuss limits C to be combinational. We provide efficient algorithms for computing a minimally-sized trace-equivalent replacement cone for C. The primary goal of this abstraction is to reduce the number of FREE vertices, with reduction of AND vertices as a secondary goal. Our primary motivation for this combinational restriction is the difficulty of sequential trace-equivalence calculation, which generally requires state space enumeration like bisimilarity reduction [27, 85], hence often outweighs the cost of invariant checking. We wish to simplify an overall verification flow with this abstraction, hence wish to avoid relying on algorithms which are likely to significantly hamper the verification
effort. Furthermore, when coupled with retiming, this combinational limitation becomes
less restrictive because retiming increases the size of the combinationally-driven logic of
the netlist. Additionally, this abstraction is useful in reducing the size of the retiming stump
(refer to Section 6.2).
This abstraction is beneficial to several types of verification flows. First and fore-
most, though it is possible that the number of AND vertices may increase through this
abstraction (whereas the implementation may easily be tuned to guarantee that the number
of FREE vertices will decrease), as we demonstrate in our experimental results we often
reduce AND count. Therefore, this technique tends to increase the efficiency of arbitrary
subsequent algorithms since it reduces netlist size, hence the amount of memory required
to represent the netlist and the amount of time necessary to analyze the netlist (regardless
of the nature of the analysis algorithms) both tend to decrease.
BDD-based techniques (such as symbolic reachability analysis and symbolic sim-
ulation) often benefit since, with fewer FREE vertices, there are fewer necessary BDD
variables, hence BDDs tend to be smaller and reordering tends to take less time. This is one
motivation behind the concept of parametric representation [86, 87]. Additionally, the cut-
based method of creating FREE vertices to drive the replacement cone for C does not cause any correlation that did not already exist in C, and is often able to eliminate correlation,
resulting in a more compact BDD encoding.
Simulation-based techniques (including semi-formal approaches) may be enhanced by cut-based abstraction since, in minimizing FREE vertex count, it becomes probabilistically more likely to exercise a better distribution of valuations to the cut frontier. For example, given a 10-input AND vertex v whose inlist comprises only FREE vertices, only one of the 2^10 possible valuations to inlist(v) will drive a 1 to v. However, if we replace v with a single FREE vertex v′, one of only two possible valuations to v′ will result in a 1. Such a replacement may often be beneficial to increase the coverage attainable with simulation, and may be viewed as a transformation-based approach to exploiting constraint-based testcase
pattern generation to achieve similar goals. However, there is a risk that this transformation may hurt simulation; for example, the above-mentioned AND vertex may represent a reset condition which should assert very infrequently to ensure best coverage.
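The arithmetic behind this probabilistic argument can be checked directly; the following is a minimal sketch (the function name is illustrative, not from the thesis):

```python
from itertools import product

def prob_of_one(n_inputs):
    """Exact probability that an n-input AND of FREE vertices evaluates to 1
    under uniformly random input valuations."""
    hits = sum(1 for vals in product([0, 1], repeat=n_inputs) if all(vals))
    return hits / 2 ** n_inputs

# A 10-input AND driven only by FREE vertices: 1 of 2^10 valuations drives a 1.
assert prob_of_one(10) == 1 / 1024
# After replacing it with a single FREE vertex v', half the valuations drive a 1.
assert prob_of_one(1) == 1 / 2
```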
Finally, this approach is capable of enhancing SAT-based analysis. First, structural SAT solvers often benefit from BDD sweeping [51] to eliminate redundancy in the netlist; as per the above analysis, BDD-based analysis may greatly benefit from this approach. Additionally, SAT efficiency tends to be more dependent upon the amount of combinational logic in the netlist than on FREE vertex count. This abstraction is often able to reduce combinational logic, thus may enhance SAT analysis in a similar but complementary manner as BDD sweeping for redundancy removal is capable of enhancing SAT analysis.
Lemma 7.1. Let VC denote the set of vertices of N sourcing edges EC crossing an arbitrary cut ⟨C, C̄⟩ where T ∪ Z(C̄ ∩ R) ⊆ C̄. Let V′C of N′ denote a trace-equivalent set of vertices with respect to bijective mapping ψ : VC → V′C. The netlist N″ = N′ ∥ C̄ formed from N′ ∥ N by merging each v ∈ VC onto ψ(v) satisfies the condition that vertex set {V″C ∪ C̄″} is trace-equivalent to vertex set {VC ∪ C̄} with respect to bijective mapping ψ″ = {⟨v, ψ(v)″⟩ : v ∈ VC} ∪ {⟨v, v″⟩ : v ∈ C̄}.
Proof. This lemma is similar to the result that bisimilarity preserves all CTL formulas [28], viewing C̄ as a synthesized automaton representing some correctness formula. However, our formulas are invariants, hence trace equivalence is a sufficient condition.
We first note that the only way in which a vertex u ∈ C may semantically affect v ∈ C̄ is if there exists a structural path from u to v, or if u fans out to the initial value of a REGISTER in C̄, as follows from Definition 3.12. This definition further implies that we may consistently evaluate vertex v using only valuations to VC, without a need to observe valuations to coi(VC) \ VC, given that Z(C̄ ∩ R) ⊆ C̄.
Since we merge each vertex of VC onto a trace-equivalent vertex of V′C, this implies that any sequence of valuations to VC is also producible at V′C and vice versa. This in turn implies that any sequence of valuations to C̄ is also producible at C̄″ and vice versa, which
Partial Trace Lift_Trace(Partial Trace p′)
1. Complete p′ over N″ up to its length with Simulate.
2. Initialize p = ∅. For each v ∈ {VC ∪ C̄}, and each i ∈ 0, …, length(p′) − 1: p = p ∪ ⟨(v, i), p′(ψ″(v), i)⟩.
3. Use BMC over coi(VC) to calculate a satisfying assignment to the sequence of valuations to VC present in p. BMC will produce another trace p″.
4. Add all valuations from p″ into p. For each v ∈ coi(VC), and each i ∈ 0, …, length(p) − 1: p = p ∪ ⟨(v, i), p″(v, i)⟩.
5. return p.
Figure 7.1: Cut abstraction trace lifting algorithm
collectively imply trace-equivalence of {VC ∪ C̄} and {V″C ∪ C̄″} with respect to ψ″.
Lemma 7.1 forms the theoretical basis for cut-based abstraction. It indicates that we may in certain cases merge vertices onto trace-equivalent vertices in a sound and complete manner – which is more general than restricting merging onto semantically equivalent vertices as discussed in Theorem 5.1. We may only exploit this generalization provided that we merge a "semantic cut" of trace-equivalent vertices; otherwise we risk violating trace equivalence of the resulting netlist. For example, assume that we have FREE vertices i1 and i2, which fan out to u1 = i1 ∨ i2 and u2 = i1 ∧ i2. Vertex u1 is trace-equivalent to any FREE vertex u′1, and vertex u2 is trace-equivalent to any FREE vertex u′2. Note that {u1, u2} is not trace-equivalent to {u′1, u′2} because any trace over the former set will adhere to u2 → u1, whereas there is no correlation between u′1 and u′2; thus performing either or both merges will lose this correlation. Therefore, merging of a vertex onto a trace-equivalent vertex risks becoming overapproximate unless we merge an entire trace-equivalent cut.
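The {u1, u2} example can be checked by exhaustive enumeration; a small sketch, with names matching the example above:

```python
from itertools import product

# Valuations producible by the original pair (u1, u2) = (i1 OR i2, i1 AND i2).
original = {(i1 | i2, i1 & i2) for i1, i2 in product([0, 1], repeat=2)}

# Valuations producible by two independent FREE vertices (u1', u2').
replacement = set(product([0, 1], repeat=2))

# Individually, each vertex ranges over {0, 1}, so u1 and u2 are each
# trace-equivalent to a FREE vertex ...
assert {u1 for u1, _ in original} == {0, 1}
assert {u2 for _, u2 in original} == {0, 1}
# ... but the pair is not: (u1, u2) = (0, 1) would violate u2 -> u1.
assert (0, 1) not in original
assert (0, 1) in replacement
```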
Our cut-abstraction trace lifting algorithm is depicted in Figure 7.1. First, we use binary simulation to complete the abstract trace up to the necessary length. Second, we propagate valuations to V″C and C̄″ from the abstract trace to the lifted trace since by Lemma 7.1 we will be able to obtain a trace for which those valuations are consistent with N. Lastly, we use a bounded model check of the valuations to VC inherited from the abstract trace to find corresponding valuations to coi(VC) which yield a consistent partial trace.
Theorem 7.1. Cut-based abstraction is sound and complete for invariant checking.
Proof. First, any target unreachable result will be correct by Lemma 7.1. Second, any target hit result is correct by the same lemma. By assumption, the trace received from the verification of the abstracted target is semantically correct with respect to the abstracted netlist N″ = N′ ∥ C̄, and hits the abstracted target. Since VC is trace-equivalent to ψ(VC), the BMC call will be satisfiable and will obtain a corresponding set of valuations to C to produce the sequence of valuations to VC observed in the abstracted trace. Composition of these two traces thus yields a semantically correct trace. Since the target is an element of C̄, the target will also be hit in the lifted trace.
We omit the proof that cut-based abstraction generates a legal netlist, since this proof is dependent upon the nature of N′. However, since the original netlist is legal by assumption, clearly a legal solution is attainable.
Theorem 7.2. If the diameter of a set of vertices A″ ⊆ {V″C ∪ C̄″} of a cut-abstracted netlist N″ is d(A″), then the diameter of the corresponding vertices A in the unabstracted netlist is also d(A″).
Proof. This theorem is an immediate consequence of Lemma 7.1 and Theorem 4.3.
7.1 Cut-Based Abstraction Algorithms
In this section we discuss algorithms for performing cut-based abstraction. As previously
mentioned, the implementation we present limits its domain to combinational cones. This limitation was largely motivated by the desire to integrate a technique to augment retiming;
while retiming is useful to reduce the size of the recurrence structure of a netlist, it does add combinational logic including FREE vertices for the retiming stump (refer to Section 6.2). A justification of this limitation is that it enables efficient algorithms for performing the abstraction, whereas sequential trace-equivalence abstraction would generally require state space enumeration similarly to bisimilarity reduction, which often outweighs the cost of invariant checking [27]. Recently, Moon et al. [88] have proposed a similar variable reduction technique with applications to BDD-based combinational equivalence checking. Some of the techniques presented in this section follow from [88], though are included for completeness and to qualify our experimental results. We discuss their work further in Section 7.2.
Our top-level algorithm is encapsulated in the Cut_Abstract function depicted in Figure 7.2. We seed our cut solution by using the FREE vertices of the netlist as sources Cs. Additionally, all REGISTERs and their fanout cones and initial values, plus all targets, are seeded as sinks Ct. The overall concept of the abstraction process is to compute the characteristic function BDDi of the cut vertices VC, representing the set of all reachable valuations to VC. Once obtained, we synthesize a netlist N′ containing vertices V′C which have the identical characteristic function, and merge each element of VC onto a trace-equivalent correspondent in V′C. Ideally, the number of FREE vertices and AND gates will be smaller in coi(V′C) than in coi(VC). Rather than processing VC in one piece, we maximally partition this cut into sets Ci which have disjoint fanin cones. We decide whether to attempt to transform coi(Ci) in step 3a based upon the following heuristics.
1. If Ci = Ii, then no transformation is possible.
2. If |Ci| ≥ |Ii|, our technique may not be capable of reducing the number of FREE vertices. We therefore may wish to neglect processing the component to minimize resources. Alternatively, we may wish to attempt to minimize the number of AND gates in this component, and perform a transformation only if we demonstrate that such a reduction is possible.
void Cut_Abstract(Netlist N)
1. Compute a cut from the seeded Cs and Ct, defined as follows.
(a) Cs = I.
(b) Ct = T ∪ Z(R) ∪ R ∪ fanout_cone(R).
2. Maximally partition VC into k disjoint sets C1, …, Ck, such that (i ≠ j) → (fanin_cone(Ci) ∩ fanin_cone(Cj) = ∅), and C1 ∪ … ∪ Ck = VC. Let Ii represent I ∩ coi(Ci).
3. For each component Ci:
(a) Decide whether to attempt a transformation.
(b) If we wish to attempt to transform Ci, we try to obtain BDDi representing the characteristic function of Ci using algorithm Analyze_Cut. If successful, we do the following:
i. Perform aggressive reordering on BDDi to make it as small as possible.
ii. Synthesize BDDi using algorithm Synthesize_Set. This will yield a netlist N′ containing vertices C′i.
iii. If N′ is not too large, we merge each v ∈ Ci onto the corresponding v′ ∈ C′i.
Figure 7.2: Top-level Cut_Abstract algorithm
3. If |Ci| < |Ii|, we likely wish to transform the component. The only conditions under which we neglect transforming the component are if obtaining BDDi exceeds resource bounds, or if the size of the replacement component coi(C′i) is too large.
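Step 2 of Cut_Abstract – the maximal partition into components with disjoint fanin cones – can be sketched with a union-find over shared FREE vertices. This is a minimal sketch under an assumed representation mapping each cut vertex to the set of FREE vertices in its fanin cone; the thesis does not prescribe this data structure:

```python
def partition_cut(fanin):
    """Maximally partition cut vertices into components with disjoint fanin
    cones. `fanin` maps each cut vertex to the FREE vertices in its fanin
    cone (a hypothetical representation). Two cut vertices land in the same
    component exactly when their fanin cones share a FREE vertex."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    def union(a, b):
        parent[find(a)] = find(b)

    for v, inputs in fanin.items():
        for i in inputs:
            union(("cut", v), ("free", i))

    components = {}
    for v in fanin:
        components.setdefault(find(("cut", v)), []).append(v)
    return sorted(sorted(c) for c in components.values())

# v1 and v2 share FREE vertex i2; v3's fanin cone is disjoint.
print(partition_cut({"v1": {"i1", "i2"}, "v2": {"i2"}, "v3": {"i3"}}))
# -> [['v1', 'v2'], ['v3']]
```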
We often obtain the greatest reductions from using a vertex min-cut in step 1. For optimality, it is beneficial to perform a cone-of-influence reduction prior to cut abstraction to prevent edges crossing out of coi(T) from affecting the solution. Additionally, a prior redundancy removal is useful to help enable a smaller cut size. While our algorithms for abstracting the cut are often quite efficient, in some cases the exact min-cut is too complex to process in one step, or its trace-equivalent replacement may be too large. Therefore, it may occasionally be beneficial to use a "less minimal" cut. It furthermore may be beneficial to incrementally approach a min-cut through repeated calls to this algorithm, similarly to the
incremental BDD-based approach proposed in [88]. Such an incremental approach effectively decomposes the min-cut reduction. While the number of necessary cut vertices at each intermediate step will likely be larger, such a decomposition often reduces computational resources. This is because additional partitioning (in step 2 of Figure 7.2) is often possible due to elimination of reconvergence with respect to the larger min-cut cone. Furthermore, a subsequent attempt at abstracting the min-cut is more likely to succeed with lesser resources, since the intermediate abstraction will likely reduce FREE vertex count. A decomposed approach also allows us to intersperse other reduction algorithms (such as redundancy removal) between repeated calls to the cut-based abstraction.
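A vertex min-cut of the flavor used in step 1 can be sketched via the standard node-splitting reduction to max-flow with BFS augmenting paths. This is a generic Edmonds-Karp-style sketch, not the tuned algorithm of [90], and it assumes every source-to-sink path passes through at least one interior (cuttable) vertex:

```python
from collections import defaultdict, deque

INF = float("inf")

def vertex_min_cut(edges, sources, sinks):
    """Minimum vertex cut separating sinks from sources. Each interior
    vertex v is split into (v, 'in') -> (v, 'out') with capacity 1;
    sources, sinks, and graph edges get infinite capacity, so only
    interior vertices can be cut."""
    cap, adj = defaultdict(int), defaultdict(set)

    def add(u, v, c):
        cap[(u, v)] += c
        adj[u].add(v)
        adj[v].add(u)

    nodes = set(sources) | set(sinks) | {x for e in edges for x in e}
    for v in nodes:
        add((v, "in"), (v, "out"), INF if v in sources or v in sinks else 1)
    for u, v in edges:
        add((u, "out"), (v, "in"), INF)
    s, t = "S", "T"
    for v in sources:
        add(s, (v, "in"), INF)
    for v in sinks:
        add((v, "out"), t, INF)

    def bfs():
        prev = {s: None}
        q = deque([s])
        while q:
            u = q.popleft()
            for v in adj[u]:
                if v not in prev and cap[(u, v)] > 0:
                    prev[v] = u
                    if v == t:
                        return prev
                    q.append(v)
        return None

    while (prev := bfs()) is not None:
        v = t                      # push one unit along the augmenting path
        while prev[v] is not None:
            u = prev[v]
            cap[(u, v)] -= 1
            cap[(v, u)] += 1
            v = u

    reach, stack = {s}, [s]        # residual-reachable side of the cut
    while stack:
        u = stack.pop()
        for v in adj[u]:
            if v not in reach and cap[(u, v)] > 0:
                reach.add(v)
                stack.append(v)
    # Vertices whose saturated split edge crosses the frontier form the cut.
    return {v for v in nodes if (v, "in") in reach and (v, "out") not in reach}

# i1 and i2 both feed u, which feeds target g: {u} is a minimum vertex cut.
print(vertex_min_cut([("i1", "u"), ("i2", "u"), ("u", "g")], {"i1", "i2"}, {"g"}))
# -> {'u'}
```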
Algorithm Analyze_Cut, depicted in Figure 7.3, is used to obtain the characteristic function of a set of cut vertices Ci. In our implementation we use BDD-based analysis with a tuned conjunction and quantification schedule. However, other techniques such as simulation-based or SAT-based enumeration may be used for this purpose. We use a modified MLP [84] algorithm for the conjunction and quantification schedule. It is the BDD(vj) for each vj ∈ Ci, representing the function of cut vertex vj over I variables, that must be conjoined, and the I variables that must be quantified. Rather than waiting to perform all conjunctions prior to quantification, we wish to perform quantification as early as possible to keep peak BDD size low. As soon as we complete the last conjunction of a BDD(vj) which has a given FREE variable in its support, we may quantify that variable.
At each MLP scheduling step, we either schedule a composition, or "activate" a FREE vertex u ∈ I to simplify future scheduling decisions – initially, all FREE vertices are "inactive." Our goal is to minimize the lifetime of FREE vertex variables, from entering the support of BDDi through conjunction until leaving the support through quantification. The following modifications of the MLP algorithm have proven to be the most useful.
• At each decision point, we schedule the conjunction of any BDD(vj) which has zero inactive FREE vertices in its support.
• If no BDD(vj) satisfies the above criterion, we instead activate an inactive FREE
BDD Analyze_Cut(Vertex Set Ci)
1. Compute MLP [84] schedule (v1, …, v|Ci|) for vertices in Ci.
2. Initialize BDDi = 1.
3. for ( j = 1; j ≤ |Ci|; j++ )
(a) Associate a BDD variable with vertex vj, denoted by b(vj).
(b) Calculate BDD(vj) representing the function of vj over I.
(c) Update BDDi = BDDi ∧ (b(vj) ≡ BDD(vj)).
(d) Perform early quantification of I variables from BDDi.
(e) If BDDi is too large, return NULL.
4. return BDDi.
Figure 7.3: Analyze_Cut algorithm
vertex. When choosing which FREE vertex u to activate, we select one which is in the support of an unscheduled cut vertex vj with the fewest inactive FREE vertices in its support. Ties are broken to minimize the total number of FREE vertices not already in the BDD support which would need to be introduced before u could be quantified.
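The early-quantification idea behind these scheduling decisions can be sketched as follows. The function operates only on variable supports and emits a conjoin/quantify schedule; no actual BDD operations are performed, and the activation and tie-breaking heuristics above are omitted:

```python
def schedule_with_early_quantification(supports):
    """Given the FREE-variable support of each BDD(v_j), in conjunction
    order, quantify each variable immediately after its last conjunction
    so that its lifetime in the support of BDDi is minimized."""
    last_use = {}
    for j, sup in enumerate(supports):
        for var in sup:
            last_use[var] = j

    schedule = []
    for j, sup in enumerate(supports):
        schedule.append(("conjoin", j))
        for var in sorted(v for v in sup if last_use[v] == j):
            schedule.append(("quantify", var))
    return schedule

# BDD(v0) over {a, b}, BDD(v1) over {b, c}: variable a leaves the support
# right after the first conjunction; b and c after the second.
print(schedule_with_early_quantification([{"a", "b"}, {"b", "c"}]))
# -> [('conjoin', 0), ('quantify', 'a'), ('conjoin', 1), ('quantify', 'b'), ('quantify', 'c')]
```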
Once we obtain the characteristic function BDDi of cut vertices Ci, we next synthesize this BDD into a netlist to obtain a trace-equivalent set of vertices C′i. Synthesis of BDDi may be performed by the algorithm Synthesize_Set provided in Figure 7.4. Predicate parents(n) returns the set of parent nodes of BDD node n. Note that for the root node, parents(root) is empty.
Netlist Synthesize_Set(BDD BDDi)
For each BDD variable ñ of BDDi, in order of support from root to leaf, we do the following. These variables correlate to vertices in Ci; any not in the support of BDDi may be processed in any order.
1. Create a new FREE vertex v and assign ψ(ñ) = v.
2. Initialize a′(ñ) = 0 and a″(ñ) = 0.
3. For each BDD node n over variable ñ:
• If n is the BDD root, we define a(n) = ONE. Otherwise, we synthesize the set of paths which "sensitize" n from the root as follows:
– Initialize a(n) = ZERO.
– foreach m ∈ parents(n) {
if (n is the then branch of m) { a(n) = a(n) ∨ (a(m) ∧ ψ′(m̃)); }
else { a(n) = a(n) ∨ (a(m) ∧ ¬ψ′(m̃)); }
}
Term ψ′(m̃) is defined below.
• If else(n) ≡ 0, then a′(ñ) = a′(ñ) ∨ a(n).
• If then(n) ≡ 0, then a″(ñ) = a″(ñ) ∨ a(n).
4. Synthesize ψ′(ñ) = a′(ñ) ∨ (¬a′(ñ) ∧ ¬a″(ñ) ∧ ψ(ñ)).
return C′i = ψ′(BDD_vars(BDDi)).
Figure 7.4: Synthesize_Set algorithm
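The behavior of this synthesis can be illustrated functionally on a two-variable BDD. The following sketch evaluates a(n), a′, a″, and ψ′ as closures over fresh parametric inputs rather than building an actual netlist; the tuple encoding of BDD nodes is illustrative, not the thesis's data structure:

```python
from itertools import product

# A tiny reduced ordered BDD: node = (var, then_child, else_child);
# terminals are the Python booleans True/False. This BDD encodes the
# reachable cut valuations {(0,0), (1,0), (1,1)} of a pair (u1, u2),
# i.e. the characteristic function u1 OR NOT u2.
N2 = ("u2", False, True)
ROOT = ("u1", True, N2)

def synthesize(root, order):
    """Per BDD variable, build a driver over fresh parametric inputs x so
    that the drivers jointly produce exactly the minterms of the BDD."""
    nodes = {v: [] for v in order}          # nodes labelled by each variable
    stack, seen = [root], set()
    while stack:
        n = stack.pop()
        if isinstance(n, tuple) and id(n) not in seen:
            seen.add(id(n))
            nodes[n[0]].append(n)
            stack += [n[1], n[2]]

    drivers = {}

    def a(target, x):
        # Sensitization a(target): walking from the root under the already
        # synthesized drivers, do we pass through `target`?
        cur = root
        while isinstance(cur, tuple) and cur is not target:
            cur = cur[1] if drivers[cur[0]](x) else cur[2]
        return cur is target

    for v in order:                         # process variables root-to-leaf
        def driver(x, v=v):
            a1 = any(a(n, x) for n in nodes[v] if n[2] is False)  # a': else(n) = 0
            a0 = any(a(n, x) for n in nodes[v] if n[1] is False)  # a'': then(n) = 0
            return 1 if a1 else (0 if a0 else x[v])               # psi'(v)
        drivers[v] = driver
    return drivers

drivers = synthesize(ROOT, ["u1", "u2"])
produced = {tuple(drivers[v]({"u1": x1, "u2": x2}) for v in ("u1", "u2"))
            for x1, x2 in product([0, 1], repeat=2)}
print(sorted(produced))
# -> [(0, 0), (1, 0), (1, 1)]
```

The synthesized cone amounts to u1′ = x1 and u2′ = x1 ∧ x2, which produces exactly the on-set of the characteristic function and nothing more.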
The purpose of a′ and a″ in algorithm Synthesize_Set of Figure 7.4 is to enumerate the valuations of ψ(ñ) for which vertex ψ′(ñ) must drive a deterministic value, to in turn prevent C′i from being able to produce a valuation which cannot be sensitized in Ci. Intuitively, a′(ñ) represents the set of valuations to predecessor variables of ñ for which an assignment of 0 to ñ would render a cross-product of valuations in the offset of BDDi, meaning that the corresponding valuation is not in the characteristic function of Ci. Similarly, a″(ñ) represents the set of valuations to predecessor variables of ñ for which an assignment of 1 to ñ would render a cross-product of valuations in the offset of BDDi. The new inputs ψ(ñ) are parametric variables, and are the source of random choice in the abstracted cone C′i.
We demonstrate Synthesize_Set on an example BDD in Figure 7.5. We have borrowed this example from the work of [89], which will be discussed in Section 7.2. Each xi represents a parametric variable ψ(ñi) correlating to BDD variable ñi, over nodes nia and nib. Term x′i represents the corresponding synthesized ψ′(ñi).
[Figure 7.5 depicts an example BDD with nodes n1a, n2a, n2b, n3a, n3b, n4a, and n4b over four variables, terminals 1 and 0, then/else branch labels, the sensitizing-path terms a(n) (with a(n4a) = a′(ñ4a) and a(n4b) = a″(ñ4b)), the parametric inputs x1–x4, and the synthesized drivers x′1–x′4.]
Figure 7.5: BDD synthesis example
Lemma 7.2. Algorithm Synthesize_Set performs a semantically correct BDD synthesis. In particular, for the generated netlist N′, a given minterm m : BDD_vars(BDDi) → {0, 1} is an element of BDDi if and only if ∃p′ ∈ P′. ∀ñ ∈ BDD_vars(BDDi). p′(ψ′(ñ), 0) = m(ñ).
Proof. For any variable ñ not in the support of the BDD, Synthesize_Set will assign ψ′(ñ) = ψ(ñ) since both a′(ñ) and a″(ñ) will be 0. This is correct since there is no cross-correlation between ñ and any other variable, hence we need to drive ψ′(ñ) by an uncorrelated parametric FREE vertex.
For other variables, the disjuncted term a′(ñ) indicates the set of valuations to predecessor variables (with respect to the arbitrary BDD rank) for which only a binary 1 may be driven onto ψ′(ñ), to prevent the synthesized netlist from being able to drive a valuation which is not a minterm of BDDi. Similarly, the disjuncted term a″(ñ) indicates the set of valuations to predecessor variables for which only a binary 0 may be driven. It is only for sets of valuations to predecessor variables which do not satisfy a′(ñ) ∨ a″(ñ) that we may allow ψ′(ñ) to randomly select values via ψ(ñ).
We now analyze the number of AND vertices created by Synthesize_Set. Given m BDD nodes, there will be at most 2·m 2-input AND vertices necessary to represent the conjunctions inside of a, since each node has at most 2 children and hence appears at most twice in any of these conjunction terms (once with respect to its variable inverted, once uninverted). There will be at most an additional 2·m 2-input AND vertices for the disjunctions over those conjoined terms, though this number is often smaller since many nodes have a single parent. We note that there will be at most m elements in a′ ∪ a″, since no node will have both children as 0 (else the BDD is not reduced). Therefore, there will be at most m 2-input AND vertices to associate the a′ and a″ vertices to variables. Practically, the number of a′ and a″ vertices necessary tends to be much smaller than m, since most nodes are likely not to have 0 as a child. Lastly, there will be at most 3·|BDD_vars(BDDi)| 2-input AND vertices necessary for the ψ′ terms, as there is one disjunction and a 3-input conjunction per variable to drive ψ′. However, unless the nodes of a given variable have 0 both as a then and an else branch, at least one of a′ or a″ will be empty, hence fewer than 3 vertices will be necessary for that variable.
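The worst-case count above can be transcribed directly (the function name is illustrative):

```python
def and_vertex_bound(m, num_vars):
    """Worst-case number of 2-input AND vertices created by the synthesis,
    per the analysis above: 2m for the conjunctions inside a, 2m for the
    disjunctions over those terms, m to associate the a' and a'' vertices
    to variables, and 3 per variable for the psi' terms."""
    return 2 * m + 2 * m + m + 3 * num_vars

# E.g. a BDD with 8 internal nodes over 4 variables:
assert and_vertex_bound(8, 4) == 52
```

In practice the actual count is far below this bound, since most nodes have a single parent and do not have 0 as a child.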
7.2 Related Work
The overall theory of soundness of replication of a cut by a trace-equivalent cone is similar
to that for a bisimilar cone, hence may be viewed as a conservative approach of assume-
guarantee reasoning [45] when verifying the moduleC. Several techniques for study-
ing bisimulation minimization and less conservative property-preserving minimizations
have been proposed, for example in [85, 27, 29]; while more general than the combina-
tional implementation proposed in this chapter, they suffer from computational complexity
which outweighs an invariant check. In contrast, we focus upon a more restrictive do-
main for which efficient reduction algorithms are applicable; this combinational domain
furthermore well-suits the goal of augmenting the reductions possible through retiming in
a transformation-based verification setting.
The discussed implementation is quite similar to the recent work of Moon et al. [88], hence we will limit our discussion of previous work to this technique. Their work provides a cut-based variable reduction technique, with presented applications to simplifying BDD-based combinational equivalence checking. Though their techniques are tuned for enhancing BDD-based verification and thus perform their analysis and reduction purely via BDDs, fairly straightforward extensions could be used to plug their algorithms for analyzing and abstracting cuts into our top-level Cut_Abstract function. Our Analyze_Cut technique is one contribution beyond their approach. Both of our techniques require the calculation of the characteristic function of a set of vertices to enable parametric reductions for cutsets; however, they do not discuss their algorithms for doing so. We present a tuned conjunction and early quantification schedule for performing this calculation via BDDs,
which has proven efficient as per our experimental results. Additionally, this BDD-based approach could be replaced with simulation-based or SAT-based enumeration. Their approach of obtaining a parametric representation is derived from the input-output relation synthesis algorithm presented in [89]. Though more general, when applied to BDDs with only output variables (as with our approach) the technique of [89] tends to require several more AND vertices per BDD node. As an example, applying their approach to the BDD of Figure 7.5, and using the on-the-fly reduction techniques presented in Chapter 5, will require 24 2-input AND vertices instead of 11 with our approach (note that the top-most two AND vertices of that figure are unnecessary due to conjunction with ONE). The technique of [88] obtains a parametric representation directly as a BDD rather than a netlist, though their algorithm BFS_PR would yield an identical netlist to our Synthesize_Set if its results were mapped to gates instead of BDD nodes. By representing the abstraction as a netlist, we enable an efficient trace lifting procedure as per Figure 7.1 which may use an arbitrary algorithm to discharge the BMC obligation (we have found SAT to often be the most efficient); the approach of [88] does not provide a solution to the trace generation problem. Nevertheless, their implementation is quite similar to ours; our primary motivation for discussing a structural flavor of this technique is to demonstrate its synergy with other abstractions, as will be reflected in our experimental results. In particular, we have found that this technique coupled with redundancy removal is the most efficient way to minimize the retiming stump created by retiming, thereby helping to ensure that retiming will not risk hampering a verification flow.
7.3 Experimental Results
In this section we provide experimental results for our combinational cut-based abstrac-
tion implementation. All experiments were run on an IBM ThinkPad model T21 running
RedHat Linux 7.2, with an 800 MHz Pentium III and 256 MB main memory. We set the
peak BDD size to 2^19 nodes. We chose a modified augmenting path algorithm [90] to compute a vertex min-cut, which tends to provide near-linear runtimes despite its worst-case complexity of O(|V| · |E|).
We performed several sets of experiments to study the reduction capability of this
technique in minimizing FREE and AND vertex count. The first set of experiments was
performed upon the ISCAS89 benchmarks, and is summarized in Table 7.1. We enumer-
ated every primary output of these netlists as a target – aside from any which are also
FREE vertices. For various transformation flows, we report the number of FREE vertices in
the cone of influence of the targets before and after the reduction. We additionally report
the number of combinationally-driven AND vertices (elements of oi(VC) from the algo-
rithm of Figure 7.2) before and after the abstraction. The first column provides the name
of the benchmark. The next five columns present the results, before and after cut-based
abstraction, for various flows of abstraction engines. CUT refers to this cut-based ab-
straction engine; COM refers to a redundancy removal engine using the technique of [51];
RET refers to a retiming engine (refer to Chapter 6). In columns 2-6, we report the flows
CUT, COM-CUT, RET-CUT, COM-RET-CUT, and COM-RET-COM-CUT, respectively.
In each of these columns, we first report the number of FREE vertices in the cone of influence of the targets after the abstraction; the number in parentheses reports the number eliminated through the abstraction. The second set of numbers (after the semicolon) refers to the number of combinationally-driven AND vertices A after the abstraction; the number in parentheses reports the number eliminated through the abstraction. In some cases, AND count is increased through the abstraction, correlating to a negative number in parentheses. A summation of these values is provided in the last row. Table 7.2 provides an identical
set of experiments for randomly-selected components and targets for the IBM Gigahertz
Processor, after performing phase abstraction (refer to Chapter 10).
The computing resources for this abstraction tend to be quite negligible. For Ta-
ble 7.1, the maximum run-time for the cut engine is 160 seconds (for S158501); the av-
erage run-time for the others is 0.66 seconds. The maximum memory requirement is 10.9
MB; the average for the others is less than 1 MB. During the abstraction process for these
benchmarks, we aborted the Analyze_Cut computation for six of 200 components due to exceeded resources. For the ISCAS benchmarks, we note that very little reduction is possible prior to
retiming. This is due to two phenomena: first, the larger number of REGISTERs implies
that there is a fairly small combinational cone to which to apply this technique. Retiming
reduces REGISTER count, thus the recurrence structure tends to have a larger combinational
cone. Second, after retiming we have the combinational retiming stump composed onto the
recurrence structure, creating an additional domain of applicability of this abstraction. As
mentioned in Section 6.4, and as reflected in those experiments, the retiming stump is rarely
a hindrance to the overall verification flow, as much of it may be eliminated by redundancy
removal, though it nonetheless does add to netlist size. Comparing columns 5 and 6, prior
to the post-retiming call to redundancy removal, we have 1897 FREE vertices and 11835
AND vertices total in the combinational cones. After this last call to redundancy removal,
these numbers drop to 1846 and 9560, respectively. However, redundancy removal is limited in the type of reductions it may provide, which is the motivation for experimentation with this combinational cut-based abstraction technique. Cut-based abstraction reduces
FREE vertex count for these two cases by 332 and 411, respectively, correlating to 17.5%
and 22.3%, respectively. In the former case we reduce AND count by 20.0%; however,
we see in the latter case that the transformation of S158501 causes a significant increase
in AND count. This illustrates an occasional, though infrequent, risk of increasing AND
count when reducing FREE vertices – though note that far more frequently the AND count
is reduced. This risk may be minimized by bounding the increased size of an abstracted
cone and neglecting a replacement if this threshold is exceeded. Nevertheless, ignoring
S158501, cut-based abstraction yields a reduction of 302 of 1624 FREE vertices (correlat-
ing to 18.6% reduction), and a reduction of 1559 of 9560 AND vertices (correlating to a
reduction of 16.3%) for this last column.
Design | CUT | COM,CUT | RET,CUT | COM,RET,CUT | COM,RET,COM,CUT
(each entry: |I| (Δ); |A| (Δ))
PROLOG | 33 (3); 16 (3) | 33 (3); 14 (3) | 58 (9); 87 (-7) | 52 (10); 88 (14) | 51 (11); 79 (12)
S1196 | 14 (0); 303 (0) | 14 (0); 303 (0) | 27 (1); 445 (-14) | 27 (1); 449 (-22) | 27 (1); 449 (-22)
S1238 | 14 (0); 340 (0) | 14 (0); 343 (0) | 27 (1); 499 (-21) | 27 (1); 496 (-21) | 27 (1); 493 (-21)
S1269 | 18 (0); 26 (0) | 18 (0); 26 (0) | 18 (0); 26 (0) | 30 (2); 51 (2) | 30 (2); 51 (2)
S132071 | 54 (2); 11 (3) | 54 (2); 11 (3) | 148 (85); 968 (1222) | 121 (62); 591 (476) | 103 (35); 254 (77)
S1423 | 17 (0); 1 (0) | 17 (0); 1 (0) | 29 (0); 24 (0) | 29 (0); 24 (0) | 29 (0); 23 (0)
S1488 | 8 (0); 12 (0) | 8 (0); 12 (0) | 10 (0); 13 (0) | 8 (0); 12 (0) | 8 (0); 12 (0)
S1494 | 8 (0); 12 (0) | 8 (0); 12 (0) | 10 (0); 13 (0) | 8 (0); 12 (0) | 8 (0); 12 (0)
S1512 | 29 (0); 11 (0) | 29 (0); 11 (0) | 29 (3); 24 (-4) | 29 (0); 11 (0) | 29 (0); 11 (0)
S158501 | 55 (12); 259 (-39) | 43 (12); 227 (-39) | 198 (8); 1426 (22) | 206 (6); 1783 (-22) | 103 (109); 30601 (-29101)
S2081 | 10 (0); 1 (0) | 10 (0); 1 (0) | 10 (0); 1 (0) | 10 (0); 1 (0) | 10 (0); 1 (0)
S27 | 4 (0); 0 (0) | 4 (0); 0 (0) | 7 (0); 2 (0) | 4 (0); 0 (0) | 4 (0); 0 (0)
S298 | 3 (0); 0 (0) | 3 (0); 0 (0) | 6 (0); 2 (0) | 3 (0); 0 (0) | 3 (0); 0 (0)
S3271 | 26 (0); 189 (0) | 26 (0); 180 (0) | 26 (0); 186 (0) | 26 (0); 179 (0) | 26 (0); 179 (0)
S3330 | 34 (3); 12 (3) | 34 (3); 12 (3) | 59 (7); 67 (4) | 55 (10); 86 (21) | 54 (11); 74 (22)
S3384 | 43 (0); 24 (0) | 43 (0); 24 (0) | 66 (102); 489 (435) | 81 (103); 1046 (510) | 81 (103); 1040 (510)
S344 | 9 (0); 0 (0) | 9 (0); 0 (0) | 1 (0); 0 (0) | 9 (0); 0 (0) | 9 (0); 0 (0)
S349 | 9 (0); 0 (0) | 9 (0); 0 (0) | 1 (0); 0 (0) | 9 (0); 0 (0) | 9 (0); 0 (0)
S35932 | 35 (0); 32 (0) | 35 (0); 32 (0) | 69 (0); 119 (0) | 35 (0); 32 (0) | 35 (0); 32 (0)
S382 | 3 (0); 0 (0) | 3 (0); 0 (0) | 6 (0); 2 (0) | 6 (0); 2 (0) | 6 (0); 2 (0)
S385841 | 32 (0); 11 (0) | 31 (0); 11 (0) | 94 (12); 1256 (21) | 95 (15); 1606 (-85) | 89 (15); 944 (-159)
S386 | 7 (0); 6 (0) | 7 (0); 6 (0) | 7 (0); 2 (0) | 7 (0); 6 (0) | 7 (0); 6 (0)
S400 | 3 (0); 0 (0) | 3 (0); 0 (0) | 6 (0); 2 (0) | 6 (0); 2 (0) | 6 (0); 2 (0)
S4201 | 18 (0); 1 (0) | 18 (0); 1 (0) | 18 (0); 1 (0) | 18 (0); 1 (0) | 18 (0); 1 (0)
S444 | 3 (0); 0 (0) | 3 (0); 0 (0) | 6 (0); 2 (0) | 3 (0); 0 (0) | 3 (0); 0 (0)
S4863 | 47 (2); 0 (16) | 47 (2); 0 (16) | 73 (88); 0 (809) | 72 (100); 74 (1382) | 72 (100); 61 (1015)
S499 | 1 (0); 0 (0) | 1 (0); 0 (0) | 1 (0); 0 (0) | 1 (0); 0 (0) | 1 (0); 0 (0)
S510 | 19 (0); 6 (0) | 19 (0); 6 (0) | 19 (0); 5 (0) | 19 (0); 6 (0) | 19 (0); 6 (0)
S526N | 3 (0); 0 (0) | 3 (0); 0 (0) | 6 (0); 2 (0) | 3 (0); 0 (0) | 3 (0); 0 (0)
S5378 | 34 (1); 108 (1) | 34 (1); 108 (1) | 48 (13); 214 (18) | 51 (13); 227 (21) | 51 (13); 217 (31)
S635 | 2 (0); 0 (0) | 2 (0); 0 (0) | 2 (0); 0 (0) | 2 (0); 0 (0) | 2 (0); 0 (0)
S641 | 33 (0); 20 (0) | 33 (0); 20 (0) | 33 (0); 14 (0) | 32 (0); 16 (0) | 32 (0); 16 (0)
S6669 | 83 (0); 84 (0) | 83 (0); 84 (0) | 213 (1); 2418 (51) | 213 (1); 2418 (51) | 212 (2); 2297 (52)
S713 | 33 (0); 20 (0) | 33 (0); 20 (0) | 33 (0); 14 (0) | 33 (0); 16 (0) | 33 (0); 16 (0)
S820 | 18 (0); 39 (0) | 18 (0); 37 (0) | 17 (1); 18 (1) | 18 (0); 37 (0) | 18 (0); 37 (0)
S832 | 18 (0); 39 (0) | 18 (0); 37 (0) | 17 (1); 17 (1) | 18 (0); 37 (0) | 18 (0); 37 (0)
S8381 | 34 (0); 1 (0) | 34 (0); 1 (0) | 34 (0); 1 (0) | 34 (0); 1 (0) | 34 (0); 1 (0)
S92341 | 24 (1); 20 (1) | 24 (1); 20 (1) | 32 (8); 34 (48) | 34 (8); 54 (40) | 34 (8); 44 (40)
S938 | 34 (0); 1 (0) | 34 (0); 1 (0) | 34 (0); 1 (0) | 34 (0); 1 (0) | 34 (0); 1 (0)
S953 | 16 (0); 49 (0) | 16 (0); 49 (0) | 16 (0); 50 (0) | 16 (0); 52 (0) | 16 (0); 52 (0)
S967 | 16 (0); 59 (0) | 16 (0); 59 (0) | 16 (0); 49 (0) | 16 (0); 51 (0) | 16 (0); 51 (0)
S991 | 65 (0); 0 (0) | 65 (0); 0 (0) | 68 (14); 1 (20) | 65 (0); 0 (0) | 65 (0); 0 (0)
Σ | 969 (24); 1713 (-12) | 956 (24); 1669 (-12) | 1598 (354); 8494 (2606) | 1565 (332); 9468 (2367) | 1435 (411); 37102 (-27542)
Table 7.1: Cut results for ISCAS89 benchmarks
Design | CUT | COM,CUT | RET,CUT | COM,RET,CUT | COM,RET,COM,CUT
(each column reports |I| (Δ); |A| (Δ))
CP RAS | 68 (0); 0 (0) | 68 (0); 0 (0) | 76 (44); 34 (107) | 73 (46); 28 (94) | 70 (47); 5 (96)
D DASA | 19 (0); 13 (0) | 19 (0); 13 (0) | 25 (2); 14 (7) | 19 (0); 13 (0) | 19 (0); 13 (0)
D DCLA | 67 (0); 23 (0) | 67 (0); 23 (0) | 75 (8); 69 (42) | 73 (0); 64 (0) | 73 (0); 64 (0)
D DUDD | 49 (5); 81 (71) | 48 (5); 41 (65) | 89 (15); 1462 (-708) | 81 (22); 730 (-292) | 79 (24); 764 (-396)
I IBBQn | 402 (0); 279 (0) | 402 (0); 68 (0) | 402 (0); 3179 (0) | 402 (0); 3085 (0) | 402 (0); 3058 (0)
I IFAR | 36 (4); 3 (4) | 28 (4); 3 (4) | 62 (26); 739 (-440) | 51 (10); 79 (9) | 48 (8); 85 (-27)
I IFPF | 128 (27); 8 (120) | 121 (26); 8 (116) | 1110 (32); 5849 (122) | 128 (48); 74 (153) | 121 (52); 52 (155)
L3 SNP1 | 65 (0); 31 (0) | 41 (0); 24 (0) | 85 (23); 3546 (-1851) | 42 (3); 401 (1) | 42 (3); 303 (1)
L EMQn | 79 (10); 5 (10) | 0 (0); 0 (0) | 95 (132); 140 (139) | 0 (0); 0 (0) | 0 (0); 0 (0)
L EXEC | 108 (1); 3 (1) | 77 (2); 2 (2) | 140 (87); 109 (263) | 65 (20); 44 (25) | 48 (16); 28 (21)
L FLUSHn | 41 (6); 45 (11) | 41 (6); 41 (11) | 47 (48); 33 (300) | 41 (29); 17 (119) | 41 (29); 17 (119)
L INTRo | 24 (0); 0 (0) | 24 (0); 0 (0) | 17 (7); 49 (19) | 17 (7); 35 (15) | 17 (7); 35 (15)
L LMQo | 149 (40); 96 (59) | 149 (40); 98 (57) | 167 (50); 356 (-2) | 170 (48); 332 (13) | 170 (48); 290 (48)
L LRU | 17 (0); 1 (0) | 16 (0); 0 (0) | 25 (6); 82 (6) | 13 (3); 12 (3) | 13 (3); 12 (3)
L PFQo | 46 (0); 6 (0) | 46 (0); 6 (0) | 68 (12); 135 (12) | 66 (12); 168 (14) | 65 (13); 117 (42)
L PNTRn | 88 (4); 58 (5) | 13 (0); 45 (0) | 331 (9); 1152 (-51) | 13 (0); 146 (0) | 0 (0); 0 (0)
L PRQn | 5 (1); 0 (1) | 5 (1); 0 (1) | 6 (5); 0 (5) | 6 (1); 0 (1) | 6 (1); 0 (1)
L SLB | 28 (1); 3 (2) | 28 (1); 3 (2) | 39 (15); 78 (4) | 32 (23); 41 (34) | 32 (23); 41 (29)
L TBWKn | 13 (1); 3 (1) | 13 (1); 3 (1) | 11 (5); 6 (5) | 9 (5); 4 (5) | 9 (5); 4 (5)
SIDECAR 4 | 15 (0); 20 (0) | 13 (0); 9 (0) | 25 (3); 37 (27) | 25 (3); 25 (11) | 25 (3); 23 (6)
S SCU1 | 70 (1); 5 (1) | 70 (1); 5 (1) | 74 (32); 27 (67) | 63 (23); 9 (49) | 49 (18); 5 (29)
W GAR | 38 (1); 0 (1) | 38 (1); 0 (1) | 71 (1); 7 (1) | 38 (1); 0 (1) | 38 (1); 0 (1)
Σ | 1555 (102); 683 (287) | 1327 (88); 392 (261) | 3040 (562); 17103 (-1926) | 1427 (304); 5307 (255) | 1367 (301); 4916 (148)
Table 7.2: Cut results for GP netlists
For the GP netlists of Table 7.2, our peak run time was 282 seconds for a resource-exceeded attempt on L PNTRn; the average of the other runs is 0.9 seconds. Our peak memory utilization was 1 MB. We aborted the abstraction process for a total of four of 363 components due to exceeded resources. For these netlists, we see a greater potential for reduction prior to retiming; using the cut abstraction by itself we are able to reduce FREE vertex count by 102 of 1657 (or 6.1%), and AND count by 287 of 970 (or 29.6%). After a single redundancy removal call, we reduce FREE vertex count by 88 of 1415 (or 6.2%), and AND count by 261 of 653 (or 40.0%). After redundancy removal and retiming (columns 5 and 6), prior to the post-retiming call to redundancy removal, we have 1731 FREE vertices and 5562 AND vertices total in the combinational cone. After this call to redundancy removal, these numbers drop to 1668 and 5064, respectively. Addition of our cut-based abstraction reduces FREE vertex count for these two cases by 304 and 301, respectively, correlating to 17.6% and 18.0%. We additionally reduce AND count by 4.6% and 2.9%, respectively. Note that D DUDD hurts the cumulative AND reduction for these two columns. Ignoring this netlist, we reduce FREE vertex count by 17.3% and 17.7%, and AND vertex count by 10.7% and 11.6%, for these two columns, respectively.
Overall, these results demonstrate that cut-based abstraction has the potential to yield significant reductions above and beyond redundancy removal techniques, attainable with negligible computational resources. They further illustrate its synergy with retiming and redundancy removal. Extending structural cut-based abstraction to include sequential logic for which trace-equivalent reduction is computationally efficient is a promising direction for future work. Additionally, more research is needed into techniques to improve incremental reductions in cases of exceeded resources, and possibly into preferring alternate cuts if the replacement cone of a given cut increases AND count significantly.
Chapter 8
Structural Target Enlargement
In this chapter we introduce our technique of structural target enlargement from [24], which is collaborative work with Andreas Kuehlmann and Jacob Abraham. The goal of target enlargement is to render a target which may be hit at a shallower depth from the initial states of a netlist, and with a higher probability, than the original target, as noted by prior research [91, 92, 93]. Additionally, our particular approach enables significant reduction in the size of the enlarged target by temporally decomposing the overall verification problem, and may be viewed as a generalized inductive proof which makes use of SAT-based BMC, BDD-based analysis, and diameter overapproximation techniques.
Definition 8.1. A k-step target enlargement is the set of states that can reach target t in k time-steps, denoted as S^t_k ⊆ S, and defined as follows.

    S^t_k = { s ∈ S : ∃ i ∈ I . Simulate(t, {s, i}) = 1 }   if k = 0
          = preimage(S^t_{k−1})                             if k ≠ 0      (8.1)
If an initial state s_0 becomes part of the enlarged target for any S^t_j, the target is proven reachable. Otherwise, if during the current enlargement step j no new states are enumerated that have not been encountered in "shallower" steps, i.e., S^t_j \ ∪_{i=0}^{j−1} S^t_i = ∅, the target is clearly unreachable. If k ≥ d(t) preimages are performed without reaching an initial state, unreachability may be inferred. If at any step the computing resources exceed a given limit, the enlargement process is truncated and the verification problem is reformulated based upon the states enumerated during shallower steps (refer to Figures 8.1 and 8.2).
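On a toy explicit-state model, the recurrence of Definition 8.1 together with the two termination tests above can be sketched as follows. This is a set-based illustration only, not our BDD implementation; the three-state machine, its transition labels, and the function names are hypothetical.

```python
def preimage(trans, states):
    """States with some input-labeled transition into `states`; inputs are
    implicitly quantified by allowing any edge."""
    return {s for (s, i), nxt in trans.items() if nxt in states}

def enlarge(trans, target_states, init, k):
    """Iterate S^t_0 .. S^t_k; report 'reachable' when an initial state
    appears, 'unreachable' when step j adds no states beyond shallower steps."""
    layer, seen = set(target_states), set()
    for j in range(k + 1):
        if layer & init:
            return ("reachable", j)
        if not layer - seen:         # S^t_j \ union of shallower layers is empty
            return ("unreachable", j)
        seen |= layer
        layer = preimage(trans, layer)
    return ("unknown", k)            # resources exhausted; reformulate on `layer`

# 3-state example: 0 -> 1 -> 2 -> 2; target is state 2, initial state is 0.
trans = {(0, 'a'): 1, (1, 'a'): 2, (2, 'a'): 2}
print(enlarge(trans, {2}, {0}, 5))   # ('reachable', 2)
```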
As per Definition 8.1, target enlargement is based upon preimage computation, for which there are three primary techniques: (1) transition-relation based methods [91, 92, 93, 94], (2) transition-function based methods using the constrain operator [95], and (3) transition-function based methods using the compose operator [96]. In our implementation we chose the latter, since the set of REGISTERs in the support of each iteration of a target enlargement is often a small subset of those in the entire cone of influence of the target. This avoids unnecessary computational complexity, and well suits our goal of rendering a simpler problem with as few REGISTERs as possible – the enlarged target – if the target is not solved during enlargement.
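The compose-based preimage can be illustrated at the level of Boolean valuations: substitute each next-state function into the target, then existentially quantify the FREE (input) variables. A minimal sketch, with hypothetical next-state functions f_r1 and f_r2 standing in for a small netlist cone:

```python
from itertools import product

# Hypothetical 2-register cone: next-state functions over (r1, r2, i1).
f_r1 = lambda v: v['r2']
f_r2 = lambda v: v['r1'] ^ v['i1']

def compose_preimage(target, next_fns, free_vars):
    """Compose-based preimage: substitute each next-state function into the
    target, then existentially quantify the FREE variables by enumeration."""
    def pre(v):
        # Evaluate the target on the successor state reached from valuation v,
        # for some choice of the FREE variables.
        return any(bool(target({r: f(dict(v, **dict(zip(free_vars, bits))))
                                for r, f in next_fns.items()}))
                   for bits in product([0, 1], repeat=len(free_vars)))
    return pre

t = lambda s: s['r1'] and s['r2']          # target: both registers asserted
pre_t = compose_preimage(t, {'r1': f_r1, 'r2': f_r2}, ['i1'])
print(pre_t({'r1': 0, 'r2': 1}))           # True: choosing i1 = 1 hits t next step
```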
Figure 8.1 shows the pseudocode for our target enlargement algorithm. We use BMC to attempt to hit the target as well as to discharge our induction hypothesis for the subsequent backward analysis. In our implementation, we use SAT-based BMC rather than BDD-based analysis since the former is often more efficient for bounded analysis. If BMC hits the target, or an overapproximation of diameter d(t) is surpassed during the bounded search, we discharge the target in step 1. A diameter bound may be obtained by the technique presented in Chapter 4, or by any other mechanism. If BMC is inconclusive, we perform compose-based preimage computations; we may alternatively iterate between BMC and preimage computation with resource bounds. We apply early quantification of FREE vertex variables to keep the intermediate BDD size small; as soon as the last composition which has a given FREE variable v in its support is performed, we may quantify v. We use a modified MLP algorithm [84] for our quantification and composition scheduling. At each MLP scheduling step, we either schedule a composition, or "activate" a FREE vertex to
BDD Enlarge_Target(Target t, N k, N d(t))

1. for ( i = 0; i < k; i++ )
   (a) Run BMC on target t for time-step i. If t is hit, report the hit and return NULL.
   (b) If t has not been hit, and i ≥ d(t) − 1, then report t as unreachable; return NULL.
2. Build BDD_0 for t, over variables for {I ∪ R} ∩ combinational_fanin(t).
3. Existentially quantify I variables from BDD_0. Note that BDD_0 represents the set S^t_0.
4. for ( i = 1; i ≤ k; i++ )
   (a) Compute MLP [84] schedule (R_1, ..., R_n) for REGISTERs supporting BDD_{i−1}.
   (b) Rename all variables r in BDD_{i−1} to r′, forming BDD_i.
   (c) for ( j = 1; j ≤ n; j++ )
       i. BDD_i = BDD_compose(BDD_i, r′_j, f_{r_j}), which substitutes f_{r_j} (the BDD for the next-state function of REGISTER r_j) in place of variable r′_j in BDD_i.
       ii. Perform early quantification of I variables from BDD_i.
       iii. Minimize BDD_i with BDD_compact using BDD_0, ..., BDD_{i−1} as don't cares.
       iv. If BDD_i is too large, assign k = i − 1 and return BDD_{i−1}.
   (d) If BDD_i is 0, then report t as unreachable; return NULL.
5. return BDD_k.

Figure 8.1: Enlarge_Target algorithm
simplify future scheduling decisions – initially, all FREE vertices are "inactive." Our goal is to minimize the lifetime of FREE vertex variables from activation until quantification, and to delay the introduction of REGISTER variables. Each composition step eliminates one next-state REGISTER variable r′, and introduces zero or more present-state REGISTER variables r and FREE vertex variables. The following modifications of the MLP algorithm have proven to be the most useful.

- At each scheduling step, we schedule compositions of all REGISTERs with no inactive FREE vertices in their support which introduce at most one REGISTER not already in the BDD support. Each such composition eliminates the corresponding r′ variable from the BDD support, and adds at most one r variable to the support, which is typically beneficial for minimizing peak BDD size. We next schedule compositions of all REGISTERs with zero inactive, and nonzero active, FREE vertices in their support, regardless of their REGISTER support, to force quantification.

- If no REGISTER satisfies the above criteria, we instead activate a FREE vertex. When choosing which FREE vertex v to activate, we select one which is in the support of an unscheduled REGISTER with the fewest, though nonzero, inactive FREE vertices in its support. Ties are broken to minimize the total number of REGISTERs not already in the BDD support which would need to be introduced before v could be quantified.
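A simplified set-based sketch of the two scheduling rules above, with support sets standing in for actual BDD supports and arbitrary tie-breaking in place of the tie-breaking rule described; the register and FREE-vertex names are hypothetical.

```python
def schedule(regs):
    """regs: name -> (REGISTER support set, FREE-vertex support set).
    Returns an interleaving of ('compose', r) and ('activate', v) actions."""
    support, active, pending, plan = set(), set(), dict(regs), []
    while pending:
        # Rule 1a: fully-active FREE support, at most one new REGISTER.
        ready = [r for r, (rs, fs) in pending.items()
                 if fs <= active and len(rs - support) <= 1]
        # Rule 1b: fully-active FREE support, regardless of REGISTER support
        # (forces quantification of the now-active FREE vertices).
        if not ready:
            ready = [r for r, (rs, fs) in pending.items() if fs <= active]
        if ready:
            r = ready[0]
            support |= pending.pop(r)[0]   # composing r introduces its support
            plan.append(('compose', r))
            continue
        # Rule 2: activate a FREE vertex of the register with the fewest
        # (nonzero) inactive FREE vertices in its support.
        r = min(pending, key=lambda x: len(pending[x][1] - active))
        v = sorted(pending[r][1] - active)[0]
        active.add(v)
        plan.append(('activate', v))
    return plan

regs = {'r1': ({'r2'}, set()), 'r2': ({'r1'}, {'v1'})}
print(schedule(regs))   # [('compose', 'r1'), ('activate', 'v1'), ('compose', 'r2')]
```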
After each quantification, the intermediate BDD_i is simplified by the BDD_compact operation [97], using the BDDs of previous iterations as don't cares.¹ This simplification constitutes a weak inductive proof of unreachability of the target; use of a constraint instead of a don't care would constitute an exact inductive proof. Note that the corresponding induction hypothesis was previously discharged by BMC. The resulting simplified BDD_i satisfies the following relation.

    S^t_i \ ∪_{j=0}^{i−1} S^t_j  ⊆  BDD_i  ⊆  ∪_{j=0}^{i} S^t_j      (8.2)

Additionally, size(BDD_i) ≤ size(S^t_i), where size(S^t_i) represents the BDD node count of S^t_i. The BDD_compact operation cannot introduce new variables into the support of a BDD, and may eliminate some. Hence it is well-suited for our goal of minimizing the support of each preimage computation and thereby of the enlarged target. It is also this goal that prompts us to keep each BDD_i distinct; taking their union may result in greater reductions through BDD_compact, though this union may be a costly operation. Using don't cares instead of constraints weakens our unreachability analysis, thus a fixed-point may never be reached. However, as demonstrated by our experimental results, this weaker approach is capable of solving or significantly simplifying many targets, which justifies the chosen trade-off of precision versus computational efficiency.

¹ A similar reachability-based approach would exploit states that can hit t within k time-steps as don't cares when assessing reachability of a state that can hit t in exactly k steps. With this observation, we may use these don't cares also to simplify the next-state functions of REGISTERs, which may further reduce complexity for a subsequent verification flow.
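At the set level, relation (8.2) permits any simplification between the exact new layer and the union of all layers enumerated so far. A sketch with hypothetical layers, taking the smallest legal choice (dropping previously enumerated states):

```python
def compact(exact_layer, seen):
    # Set-level analogue of BDD_compact: any set between exact_layer - seen
    # (the onset) and exact_layer | seen (onset plus don't cares) is a legal
    # simplification; here we take the smallest choice.
    return exact_layer - seen

layers = [{1, 2}, {2, 3}, {3}]      # hypothetical S^t_0, S^t_1, S^t_2
seen = set()
for layer in layers:
    simplified = compact(layer, seen)
    assert layer - seen <= simplified <= layer | seen   # relation (8.2)
    seen |= layer

# Once every state of a layer was enumerated at a shallower step, the
# simplified layer is empty and the weak inductive unreachability test fires.
print(compact({3}, seen))   # set()
```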
If the BDD size at any step exceeds a given limit, the enlargement process is truncated and the BDD of the previous iteration is returned. This prevents exceedingly large enlarged targets which could harm the subsequent verification flow. We have found it beneficial to use two limits, since the intermediate BDD size tends to be significantly larger than that of the final, fully-quantified BDD: one hard upper-limit on BDD size to prevent the enlargement process from consuming too many resources, and another smaller limit on the final BDD_i size which reflects the potential increase in AND count of the resulting netlist.
As observed in [98], representing the target as a structure may often be beneficial in a general toolset, in circumventing the need for a potentially costly interfacing of algorithms (e.g., mapping simulation results to a BDD to check for intersection). Furthermore, the ability to quantify FREE variables is useful to increase the probability of hitting a target with incomplete search techniques. We have found that this approach – from structure to BDDs back to structure – is more effective in a flexible toolset than enlargement by purely structural transformation [54]. The latter tends to yield large, redundant structures which may significantly hinder subsequent BDD- or simulation-based analysis; structural quantification furthermore may entail an exponential increase in size. In contrast, our enlargement approach often reduces the size of the target cone and thus enhances arbitrary subsequent verification approaches.
Using SAT rather than BDDs for an inductive proof may occasionally be more efficient. However, if unsuccessful, our BDD-based result may be reused to directly represent the simplified function of the k-step enlarged target. A similar reuse is not possible with a SAT-based method. Furthermore, without BDD-based analysis, it is virtually impossible to assess whether a given enlargement risks fatally hurting a subsequent BDD-based engine in a transformation-based verification flow. In [99] it is proposed to apply cubes obtained during an inductive SAT call as "lighthouses" to enhance the ability to subsequently hit targets; such an incomplete approach, however, precludes the structural reductions of our technique.
8.1 Target Enlargement Algorithms
In this section we discuss the overall flow of our decomposition algorithm Enlarge, which is illustrated in Figure 8.2. For each target, we first determine a limit on the number of enlargement steps, and then call the algorithm Enlarge_Target on target t. If Enlarge_Target reports a hit or unreachable solution, the corresponding target has been discharged. Otherwise a structure representing the enlarged target is added to N. This is performed by creating a new netlist N′ which encodes the function of the BDD of the enlarged target, using a standard multiplexor-based BDD synthesis [100]. The output gate of N′, denoted as t′, is a combinational function over the REGISTERs in N. The composition of N and N′, denoted as N ∥ N′, is then passed to a subsequent verification flow to attempt to solve t′. For example, we may next apply retiming (refer to Chapter 6) and redundancy removal (refer to Chapter 5), which have the potential to further reduce the netlist size, after which we may wish to attempt another target enlargement. If a subsequent engine demonstrates unreachability of t′, then t is also unreachable. If the subsequent verification flow hits t′, we use simulation and another BMC to lift the trace for the parent verification flow, as depicted in Figure 8.3.
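At the level of state sequences, the trace lifting of Figure 8.3 reduces to concatenating the BMC suffix onto the completed child trace while overwriting the shared time-step. A sketch with hypothetical state names:

```python
def lift_trace(p2, p3):
    """Figure 8.3, step 3: concatenate the BMC suffix p3 onto the completed
    child trace p2, overwriting the last time-step of p2 (which equals the
    first time-step of p3, the state at which the enlarged target t' is hit)."""
    assert p2[-1] == p3[0], "traces must overlap on the enlargement state"
    return p2[:-1] + p3

# Hypothetical states: the child flow hits t' at s2, and a k = 2 step BMC
# from s2 extends the trace to s4, where the original target t is hit.
p2 = ['s0', 's1', 's2']
p3 = ['s2', 's3', 's4']
print(lift_trace(p2, p3))   # ['s0', 's1', 's2', 's3', 's4']
```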
Theorem 8.1. Target enlargement is sound and complete for invariant checking.

Proof. We first consider the case that an unreachable result is generated. There are three conditions in which this result may occur. First, in algorithm Enlarge_Target, if BMC does not hit the target but k constitutes an upper-bound on diameter, an unreachability
void Enlarge(Netlist N)
  foreach t ∈ T:
  1. Determine a limit on the number of enlargement steps k as follows. Let d(t) represent an arbitrary upper-bound on the diameter of t. We assign k = min(d(t), user_specified_limit).
  2. Invoke BDD_k = Enlarge_Target(t, k, d(t)) to enlarge the target.
  3. If t is unsolved, synthesize BDD_k into netlist N′; compose N′ onto N; and replace t with t′ in T.

Figure 8.2: Top-level target enlargement flow
Partial_Trace Lift_Trace(Partial_Trace p′)
1. Complete p′ over N′ with Simulate up to the first hit of t′ to obtain p′′.
2. Cast a BMC of t for k time-steps from the last state of p′′, where k is the number of time-steps that t′ was enlarged. This call must be satisfiable, and will yield trace p′′′.
3. Concatenate p′′′ onto p′′, overwriting the last time-step of p′′ with the first time-step of p′′′, to obtain p′′′′.
4. return p′′′′.

Figure 8.3: Target enlargement trace lifting algorithm
result is correct by the definition of diameter. Second, in the same function, if a given BDD_i becomes equivalent to 0, the unreachable result is correct by inductiveness; BMC discharged our base case. Finally, if an unreachable result for the enlarged target is reported by a child verification flow, this result will be propagated upward. This result is correct by noting that the enlarged target constitutes the characteristic function of the set of states defined in formula (8.2): a subset of all states that can hit the target in 0, ..., k steps, and a superset of those that can hit the target in exactly k steps minus those that can hit the target in 0, ..., k − 1 steps. Since BMC has demonstrated that the target cannot be hit at time 0, ..., k − 1, and the child flow has effectively proven that the target is not reachable at times i, ..., ∞ where i ≤ k, this collectively constitutes a valid proof that t is unreachable.
We next consider the case that a target hit result is generated. If the target is hit by BMC during enlargement, the result and trace are correct by assumption. The only other target hit result will be generated if a child flow hits the enlarged target. We note that the enlarged target t′ will first be hit along the child trace from a state s ∈ S^t_k \ ∪_{j=0}^{k−1} S^t_j, since BMC has demonstrated that the target cannot be hit at time 0, ..., k − 1. Furthermore, there exists a k-step extension to the child trace which hits t for this same reason. Concatenation of these two traces thus clearly yields a semantically correct trace which hits t.

Theorem 8.2. Target enlargement generates a legal netlist.
Proof. We consider the requirements for legality enumerated in Definition 3.24.

1. The only gates fabricated by target enlargement are from the synthesis of BDD_k, which are correct by construction.
2. Since the original netlist is finite, and since k must be finite (we will always obtain a finite diameter overapproximation for a finite netlist using the algorithm of Figure 4.2), our BMC instance will be finite. Furthermore, BDD_k must be finite since it is over a finite number of BDD variables, and it is synthesized using a straightforward translation of one multiplexor per BDD node. Thus N ∥ N′ is finite.
3. Target enlargement does not alter initial values, hence all initial value cones are combinational by assumption.
4. The only logic created by target enlargement is a combinational function over REGISTERs in N – no REGISTER, nor its fanin cone, is affected. Therefore target enlargement cannot create combinational cycles.
Theorem 8.3. If the diameter of a k-step enlarged target t′ is d(t′), then the original target t is hittable within d(t′) + k time-steps, if at all.
Proof. If d(t′) = i, then t′ must be hittable at time 0, ..., i − 1 if at all, as follows from the definition of diameter. As per the proof of Theorem 8.1, if t′ is first hit at time j along trace p′, then t must be hittable at exactly time j + k along some trace p′′′′.

Due to the nature of the temporal union in (8.2), and the quantification inherent in target enlargement, it may be the case that a transition of t′ from 1 to 0 may be skewed and possibly eliminated with respect to such a transition of t. For example, target t may be an OR over an arbitrary cone A and the function counter ≢ 0 for a mod-n counter. The first hit of t via A may cause the counter to unconditionally begin counting, such that t will thereafter only be deasserted one time-step of every n time-steps. Target enlargement may obscure this deasserted time-step such that, once hit, t′ will never be deasserted. The number of time-steps necessary to drive a binary 1 to t from any reachable state of N may be exponentially smaller than the number necessary to subsequently drive a binary 0 onto t; note that BMC ensures that the initial value of t is 0. Therefore, target enlargement does not entail as clean an impact on diameter as we may hope; we cannot use a target enlargement approach to bound the diameter of an intermediate component of a partitioned netlist, for example. However, the result of Theorem 8.3 is sufficient to allow a bound derived from the target-enlarged netlist to imply an upper-bound on the number of time-steps sufficient to perform BMC in a complete manner for the original target.
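The counter example above can be simulated directly. Here a hypothetical mod-4 counter shows the original target deasserting periodically, while its 1-step enlargement (the set of states that can hit the target within one step) never deasserts:

```python
N = 4  # modulus of the hypothetical counter

def t(s):                 # original target: counter != 0
    return s != 0

def step(s):              # counter increments unconditionally
    return (s + 1) % N

# 1-step enlarged target: states hitting t within one time-step.
t_states = {s for s in range(N) if t(s)}
t1_states = t_states | {s for s in range(N) if step(s) in t_states}

s, trace_t, trace_t1 = 1, [], []
for _ in range(2 * N):
    trace_t.append(int(t(s)))
    trace_t1.append(int(s in t1_states))
    s = step(s)

print(trace_t)    # [1, 1, 1, 0, 1, 1, 1, 0]: t deasserts once every N steps
print(trace_t1)   # [1, 1, 1, 1, 1, 1, 1, 1]: the enlarged target never deasserts
```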
8.2 Related Work
There have been several research efforts related to target enlargement. The concept of using preimage computation to enumerate the target-enlarged states for enhancing forward search was first proposed by Yang and Dill [101] and independently by Yuan et al. [91]. The latter effort termed this approach retrograde analysis, borrowing from the artificial intelligence community. Yang and Dill provide a more extensive study of the probabilistic increase of simulation hitting the enlarged target in [92]. These works also propose ways in which to use the enlarged target to prioritize state traversal. Unlike our approach, these efforts do not offer structural reduction capability for the forward search, since they represent the enlarged targets as BDDs; nor do they propose the intertwined application of induction or diameter bounding techniques. Furthermore, their preimage computation uses a transition-relation approach, which limits the size of the design to which it may be applied.
The work of [93] uses the notion of controllability when isolating a localized cone of a target for enlargement. Using a transition-relation based approach, they calculate the set of states for which an arbitrary environment cannot prevent the localized target from being reached. Due to their compositional approach, they are able to scale to arbitrarily large designs in some cases, as with our technique. However, this notion of controllability weakens the enlargement potential; due to their compositional approach, they may enumerate only the set of states for which hitting a target is unavoidable rather than possible. For this reason, our enumeration provides a larger set of states. They additionally do not address reduction potential, or interaction with diameter bounding or inductive methods.
There are several variations to target enlargement, such as target look-ahead [98], which computes exactly S^t_k in formula (8.1). This calculation may be performed using structural compose-based preimages [99], similarly to the BED-based preimage computation proposed in [54]. Quantification of FREE variables is performed by a translation to-and-from BDDs (instead of purely structurally as with [54], which risks exponential increase in structure size), representing the final result as a netlist. However, this approach lacks the ability to use don't cares and induction, hence does not offer the reduction or unreachability capability which is the primary contribution of our technique.
The concept of a lighthouse may be viewed as an incomplete target enlargement, and lighthouses may either be manually specified [58] or automatically generated [102]. Like target enlargement, the use of lighthouses may increase the probability of simulation hitting a target; however, the incomplete nature of lighthouses precludes any reduction potential.
8.3 Experimental Results
In this section we provide experimental results for our target enlargement approach. All experiments were run on an IBM ThinkPad model T21 running RedHat Linux 6.2, with an 800 MHz Pentium III and 256 MB main memory. We set the peak BDD size to 2^17 nodes, and capped BMC (using a structural SAT solver [51]) to 10 seconds per target, with an upper-bound of fifty steps.
Our first set of experiments was performed on the ISCAS89 benchmarks. The results are provided in Table 8.1. Since these netlists have no specified properties, we labeled each primary output as a target. Column 1 provides the name of the benchmark. The next columns provide results for two distinct runs: first a standard run using the techniques as described in the previous sections, and second a "reduction-only" run which does not apply BMC to solve the problem. Instead, if BMC would solve the target in i steps, our enlargement is performed to depth j = i − 1; if j < 1, we only build BDD_0 in Enlarge. For the standard run, in Column 2 we report the number of targets in the netlist, the number of targets which are hit, and the number of targets that are proven unreachable. The number of unreachable results proven with BDDs is provided in parentheses. Any targets proven unreachable by SAT use the structural diameter overapproximation algorithm of Figure 4.2. In Column 3 we report the accumulated size of the coi's of unsolved targets in terms of the number of REGISTERs and FREE vertices, and the number eliminated in the corresponding enlarged cones. In other words, after the semicolon we report the sum of the coi size of each unsolved target, and before the semicolon we report the number of REGISTERs and FREE vertices of the corresponding un-enlarged cones which were eliminated by the enlargement. Column 4 reports the average number of seconds spent per target, and the peak memory usage. For the reduction-only run we report coi sizes and reduction results (similar to Column 3) in Column 5.
Design | Standard Run: |T| ; Hit ; Unrch (BDDs) | |R| (|I|) Eliminated ; Sum | Time/|T| (s) ; Memory (MB) | Reduction-Only Run: |R| (|I|) Eliminated ; Sum
PROLOG | 73 ; 69 ; 4 (0) | 0 (0); 0 (0) | 0.07 ; 15 | 146 (126); 2044 (1438)
S1196 | 14 ; 14 ; 0 (0) | 0 (0); 0 (0) | 0.08 ; 12 | 24 (56); 88 (196)
S1238 | 14 ; 14 ; 0 (0) | 0 (0); 0 (0) | 0.08 ; 12 | 24 (56); 88 (196)
S1269 | 10 ; 10 ; 0 (0) | 0 (0); 0 (0) | 0.10 ; 15 | 289 (145); 296 (152)
S13207.1 | 152 ; 131 ; 12 (9) | 26 (0); 527 (18) | 1.02 ; 107 | 3155 (302); 24244 (2172)
S1423 | 5 ; 5 ; 0 (0) | 0 (0); 0 (0) | 0.13 ; 15 | 2 (0); 278 (69)
S1488 | 19 ; 18 ; 0 (0) | 0 (0); 6 (8) | 0.74 ; 23 | 0 (0); 114 (152)
S1494 | 19 ; 18 ; 0 (0) | 0 (0); 6 (8) | 0.89 ; 23 | 0 (0); 114 (152)
S1512 | 21 ; 10 ; 0 (0) | 8 (8); 437 (283) | 8.22 ; 24 | 135 (93); 837 (543)
S15850.1 | 150 ; 135 ; 8 (1) | 451 (54); 1450 (174) | 0.68 ; 63 | 2425 (321); 9683 (1301)
S208.1 | 1 ; 1 ; 0 (0) | 0 (0); 0 (0) | 0.27 ; 15 | 0 (0); 8 (10)
S27 | 1 ; 1 ; 0 (0) | 0 (0); 0 (0) | 0.23 ; 12 | 0 (0); 3 (4)
S298 | 6 ; 6 ; 0 (0) | 0 (0); 0 (0) | 0.12 ; 15 | 22 (6); 54 (18)
S3271 | 14 ; 14 ; 0 (0) | 0 (0); 0 (0) | 0.11 ; 15 | 0 (0); 1248 (339)
S3330 | 73 ; 73 ; 0 (0) | 0 (0); 0 (0) | 0.08 ; 15 | 146 (125); 2044 (1442)
S3384 | 26 ; 26 ; 0 (0) | 0 (0); 0 (0) | 0.09 ; 15 | 26 (25); 2587 (425)
S344 | 11 ; 10 ; 1 (0) | 0 (0); 0 (0) | 0.09 ; 15 | 6 (2); 129 (75)
S349 | 11 ; 10 ; 1 (0) | 0 (0); 0 (0) | 0.09 ; 15 | 3 (1); 126 (74)
S35932 | 320 ; 320 ; 0 (0) | 0 (0); 0 (0) | 2.01 ; 105 | 0 (0); 331776 (11200)
S382 | 6 ; 6 ; 0 (0) | 0 (0); 0 (0) | 1.71 ; 15 | 34 (6); 96 (18)
S38584.1 | 304 ; 301 ; 1 (0) | 0 (0); 1377 (24) | 1.54 ; 88 | 17925 (458); 105273 (2564)
S386 | 7 ; 7 ; 0 (0) | 0 (0); 0 (0) | 0.07 ; 14 | 0 (1); 42 (43)
S400 | 6 ; 6 ; 0 (0) | 0 (0); 0 (0) | 1.72 ; 15 | 34 (6); 96 (18)
S420.1 | 1 ; 1 ; 0 (0) | 0 (0); 0 (0) | 0.25 ; 15 | 0 (0); 16 (18)
S444 | 6 ; 6 ; 0 (0) | 0 (0); 0 (0) | 1.80 ; 15 | 34 (6); 96 (18)
S4863 | 16 ; 16 ; 0 (0) | 0 (0); 0 (0) | 0.09 ; 15 | 0 (0); 1664 (784)
S499 | 22 ; 22 ; 0 (0) | 0 (0); 0 (0) | 0.09 ; 16 | 0 (0); 484 (22)
S510 | 7 ; 4 ; 0 (0) | 0 (0); 18 (57) | 6.64 ; 25 | 0 (0); 42 (133)
S526N | 6 ; 2 ; 0 (0) | 8 (2); 64 (12) | 10.44 ; 27 | 10 (2); 96 (18)
S5378 | 49 ; 47 ; 1 (1) | 4 (0); 164 (33) | 0.59 ; 26 | 165 (37); 7087 (1456)
S635 | 1 ; 0 ; 0 (0) | 0 (0); 32 (2) | 18.23 ; 15 | 0 (0); 32 (2)
S641 | 24 ; 23 ; 1 (1) | 0 (0); 0 (0) | 0.09 ; 15 | 64 (64); 319 (338)
S6669 | 55 ; 55 ; 0 (0) | 0 (0); 0 (0) | 0.08 ; 15 | 16 (0); 3061 (1466)
S713 | 23 ; 22 ; 1 (1) | 0 (0); 0 (0) | 0.10 ; 15 | 64 (64); 304 (323)
S820 | 19 ; 19 ; 0 (0) | 0 (0); 0 (0) | 0.23 ; 13 | 0 (0); 90 (324)
S832 | 19 ; 19 ; 0 (0) | 0 (0); 0 (0) | 0.25 ; 13 | 0 (0); 90 (324)
S838.1 | 1 ; 1 ; 0 (0) | 0 (0); 0 (0) | 0.29 ; 15 | 0 (0); 32 (34)
S9234.1 | 39 ; 37 ; 2 (0) | 0 (0); 0 (0) | 0.06 ; 16 | 146 (24); 1786 (317)
S938 | 1 ; 1 ; 0 (0) | 0 (0); 0 (0) | 0.33 ; 15 | 0 (0); 32 (34)
S953 | 23 ; 23 ; 0 (0) | 0 (0); 0 (0) | 0.13 ; 15 | 23 (8); 143 (288)
S967 | 23 ; 23 ; 0 (0) | 0 (0); 0 (0) | 0.14 ; 15 | 23 (8); 143 (288)
S991 | 17 ; 17 ; 0 (0) | 0 (0); 0 (0) | 0.08 ; 13 | 64 (564); 67 (629)
Table 8.1: Target enlargement results for ISCAS89 benchmarks
As indicated in Table 8.1, our techniques solve most targets regardless of netlist size – 1575 of 1615 – whether reachable or not. Refer to Table 6.5 for the size of these netlists. Though the "difficulty" of these targets is unknown, this is an indication of the robustness of our approach. For netlists with unsolved targets, we achieve an average reduction per netlist of 5.3% in REGISTER count and 5.0% in FREE vertex count, and a cumulative reduction of 12.2% for REGISTERs and 10.3% for FREE vertices. Our reduction-only run yields an average reduction per netlist of 13.9% in REGISTERs and 13.0% in FREE vertices.

In Table 8.2 we provide a similar analysis for randomly-selected targets from the IBM Gigahertz Processor (GP), after performing phase abstraction (refer to Chapter 10). Most targets – 254 out of 284 – are solved; refer to Table 6.6 for the size of these netlists. We achieve an average reduction per netlist of 12.1% in REGISTERs and 11.1% in FREE vertices. The reduction-only run yields an average reduction per netlist of 54.9% in REGISTERs and 54.8% in FREE vertices, and a cumulative reduction of 70.6% of REGISTERs and 69.5% of FREE vertices.
We now discuss several results in more detail. IIBBQn is a large table-based netlist.
Forward reachability analysis of the redundancy removed [51] cone of a single unreach-
able target with a diameter of three (comprising 442 REGISTERs and 134 FREE vertices)
requires 172.3 seconds and 25 MB with a MLP [84] algorithm, with sift variable reordering
enabled and a random initial order. Ourcompose-basedsearch requires 34.7 seconds and
16 MB for the same BDD conditions. After one step of enlargement, the cone drops to 380
REGISTERs and 132 FREE vertices; the second step solves the target.
Netlist L FLUSHn is primarily acyclic; less than 5% of its REGISTERs are elements
of directed cycles. For one target with 38 REGISTERs and 47 FREE vertices, reachability
analysis of the redundancy-removed [51] target with MLP requires 1.20 seconds and 11
MB. Redundancy removal plus retiming [10] with MLP solves the target in 0.60 seconds
with 13 MB. Compose-based search requires 0.50 seconds and 9 MB. The first two steps
of enlargement of this target reduce it to 4 then 2 REGISTERs, and 3 then 2 FREE vertices,
Design     |T| ; Hit ; Unrch (BDDs) | Eliminated |R| (|I|) ; Sum | Time/|T| (s) ; Memory (MB) | Reduction-Only Eliminated ; Sum
CP RAS      2 ;  2 ; 0 (0) | 0 (0) ; 0 (0) | 0.61 ; 19 | 1 (0) ; 554 (131)
CLB CNTL    2 ;  2 ; 0 (0) | 0 (0) ; 0 (0) | 0.24 ; 15 | 0 (0) ; 84 (12)
CR RAS      1 ;  0 ; 0 (0) | 0 (0) ; 401 (99) | 3.55 ; 24 | 0 (0) ; 401 (99)
D DASA      2 ;  2 ; 0 (0) | 0 (0) ; 0 (0) | 0.24 ; 15 | 11 (17) ; 20 (25)
D DCLA      2 ;  1 ; 1 (1) | 0 (0) ; 0 (0) | 7.65 ; 44 | 273 (67) ; 469 (133)
D DUDD     22 ; 14 ; 8 (6) | 0 (0) ; 0 (0) | 1.15 ; 25 | 491 (353) ; 1009 (725)
I IBBQn    15 ;  8 ; 7 (0) | 0 (0) ; 0 (0) | 0.28 ; 60 | 190 (30) ; 2169 (437)
I IFAR      2 ;  2 ; 0 (0) | 0 (0) ; 0 (0) | 0.31 ; 16 | 8 (0) ; 101 (35)
I IFPF      1 ;  1 ; 0 (0) | 0 (0) ; 0 (0) | 2.72 ; 40 | 745 (152) ; 746 (154)
L3 SNP1     5 ;  4 ; 1 (0) | 0 (0) ; 0 (0) | 1.21 ; 22 | 7 (0) ; 595 (164)
L EMQn      1 ;  0 ; 1 (1) | 0 (0) ; 0 (0) | 11.57 ; 18 | 127 (89) ; 127 (89)
L EXEC      2 ;  2 ; 0 (0) | 0 (0) ; 0 (0) | 0.47 ; 18 | 433 (200) ; 433 (200)
L FLUSHn    7 ;  6 ; 1 (0) | 0 (0) ; 0 (0) | 0.11 ; 12 | 128 (170) ; 165 (222)
L INTRo    30 ; 24 ; 6 (0) | 0 (0) ; 0 (0) | 0.06 ; 12 | 750 (626) ; 830 (672)
L LMQo     16 ;  0 ; 8 (8) | 0 (0) ; 2592 (1512) | 14.01 ; 39 | 2568 (1512) ; 5160 (3024)
L LRU      12 ;  5 ; 7 (7) | 0 (0) ; 0 (0) | 6.27 ; 19 | 721 (192) ; 721 (192)
L PFQo     67 ;  0 ; 67 (66) | 0 (0) ; 0 (0) | 10.99 ; 77 | 10318 (3036) ; 10318 (3036)
L PNTRn    31 ;  0 ; 31 (8) | 0 (0) ; 0 (0) | 2.92 ; 19 | 1057 (1023) ; 1057 (1023)
L PRQn     10 ;  0 ; 8 (2) | 24 (8) ; 36 (12) | 0.30 ; 19 | 42 (16) ; 54 (20)
L SLB       3 ;  1 ; 2 (0) | 0 (0) ; 0 (0) | 0.16 ; 15 | 1 (1) ; 61 (29)
L TBWKn    21 ;  1 ; 3 (3) | 2 (0) ; 291 (238) | 17.07 ; 26 | 36 (28) ; 342 (280)
M CIU       6 ;  1 ; 5 (0) | 0 (0) ; 0 (0) | 0.24 ; 18 | 775 (60) ; 775 (60)
SIDECAR 4   1 ;  0 ; 0 (0) | 1 (0) ; 137 (13) | 18.64 ; 27 | 1 (0) ; 137 (13)
S SCU1      3 ;  2 ; 1 (1) | 0 (0) ; 0 (0) | 0.66 ; 24 | 386 (142) ; 579 (213)
V CACH      1 ;  0 ; 1 (1) | 0 (0) ; 0 (0) | 0.61 ; 16 | 86 (21) ; 86 (21)
V DIR       2 ;  2 ; 0 (0) | 0 (0) ; 0 (0) | 0.20 ; 15 | 33 (16) ; 33 (16)
V SNPM      2 ;  1 ; 1 (0) | 0 (0) ; 0 (0) | 1.27 ; 32 | 905 (266) ; 905 (266)
W GAR       7 ;  6 ; 0 (0) | 4 (0) ; 86 (37) | 2.43 ; 20 | 4 (0) ; 500 (224)
W SFA       8 ;  8 ; 0 (0) | 0 (0) ; 0 (0) | 0.08 ; 15 | 42 (21) ; 112 (56)
Table 8.2: Target enlargement results for GP netlists
respectively. The third step hits the target.
One target of netlist S15850.1 comprises 476 REGISTERs and 55 FREE vertices.
MLP-based analysis is infeasible on this cone, even after redundancy removal [51] plus
retiming [10] which yields 397 REGISTERs. However, the first five steps of structural
enlargement of this target reduce it to 475, 38, 36, 35, and finally 24 REGISTERs, and to
55, 55, 14, 13, and 13 FREE vertices, respectively. MLP-based forward reachability hits
the 5-step-enlarged target in 10 iterations with a combined effort of 2.5 seconds and 23
MB. The only other approach that is able to hit this target is a 15-step BMC which requires
7.3 seconds and 14 MB; if unreachable, BMC would not have been applicable. Traditional
approaches of target enlargement would be ineffective on this netlist since they do not offer
reduction capability, without which the enlarged target remains infeasibly complex.
Chapter 9
C-Slow Abstraction
In this chapter we discuss our generalized c-slow abstraction techniques, extending the re-
sults of collaborative work with Anson Tripp, Adnan Aziz, Vigyan Singhal, and Flemming
Andersen reported in [17]. The goal of this abstraction is to reduce REGISTER count and
diameter; in doing so, we often benefit BDD-based algorithms due to reducing variable
count, which often reduces their size and reordering time. Further, the removal of REG-
ISTERs allows "collapsing" of adjacent logic cones to a single combinational cone, which
increases the domain of applicability of combinational redundancy removal techniques,
thereby helping to enable a smaller netlist graph which benefits arbitrary algorithms. How-
ever, this elimination of REGISTERs also risks explosion of BDD sizes representing these
composite cones; this abstraction thus has the potential to harm BDD-based analysis. Nev-
ertheless, as our experiments demonstrate, this abstraction tends to enhance BDD-based
analysis; refer to Section 9.3 for a more detailed discussion of this topic.
Leiserson and Saxe [68, 66] define a c-slow netlist N as one which is retiming-
equivalent (i.e., may be made structurally equivalent through retiming, ignoring initial
value cones) to another netlist N′, where the number of REGISTERs along each net of
N′ is a multiple of c. Netlist N′ may be viewed as having c equivalence classes of REGIS-
TERs; those in class i may only fan out to those in class (i + 1) mod c. Each equivalence
class of REGISTERs of N′ contains data from an independent stream of execution, and data
from two independent streams may never arrive at any vertex concurrently. Intuitively, it is
this property which allows c-slow abstraction to "fold" such designs to a smaller domain
of a single coloring of REGISTERs – rendering a netlist where each vertex may be a func-
tion of each data source at each time-step. They demonstrate how designs may be made
systolic¹ through slowdown (increasing c) and retiming, and how this process may signifi-
cantly benefit the clock period of such designs through reducing their maximum-length
combinational path.
Definition 9.1. A c-slow netlist, for c > 1, is one whose gates may be c-colored. We denote
the coloring function C : V → {0, …, c − 1}, defined as follows.

1. If the color of REGISTER r is C(r), then the color of each REGISTER v in the com-
binational fanout of r is (C(r) + 1) mod c.

2. If the color of REGISTER r is C(r), then the color of each non-REGISTER v in the
combinational fanout of r is C(r).

3. If the color of REGISTER r is C(r), then the color of each gate v in the combinational
fanin of inlist(r) is (C(r) − 1 + c) mod c.

4. If the color of REGISTER r is C(r), then the color of Z(r) is C(r).

5. The color of each target is c − 1.
The last two rules of Definition 9.1 are additions to the definition of [66] to fit our
verification paradigm. In Definition 9.4 we generalize this definition to ensure that these
conditions, and others, do not preclude the application of c-slow abstraction. For optimality
of reduction, we wish to find the maximum c consistent with this definition; if c = 1,
then the design is not c-slow. This definition lends itself to a simple linear-time coloring
algorithm for determining a maximal c, which will be provided in Figure 9.7.

¹Informally, a systolic netlist is one in which predefined SCC clusters connect to others only through paths of strictly positive sequential weight.
Consider the 3-slow netlist N depicted in Figure 9.1. We label vertices according
to their color: e.g., REGISTERs Ri have color i. Netlist N is defined by the following
expressions: p̄(V_A2, i) = f2(p̄(V_F2, i), p̄(I2, i)); p̄(V_B0, i + 1) = p̄(V_A2, i); p̄(V_C0, i) =
f0(p̄(V_B0, i), p̄(I0, i)); p̄(V_D1, i + 1) = p̄(V_C0, i); p̄(V_E1, i) = f1(p̄(V_D1, i), p̄(I1, i)); and
p̄(V_F2, i + 1) = p̄(V_E1, i). Through unrolling, we obtain the expression p̄(V_F2, i + 3) =
f1(f0(f2(p̄(V_F2, i), p̄(I2, i)), p̄(I0, i + 1)), p̄(I1, i + 2)). This illustrates that V_F2 is a func-
tion with modulo-3 feedback. Similar analysis demonstrates that all nets within the SCC
have this property.
Figure 9.1: Example three-slow netlist N
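The modulo-3 feedback derived above can be observed with a small Python simulation. This is a sketch with arbitrarily chosen Boolean functions standing in for f0, f1, f2 (any functions work), not the dissertation's netlist representation: simulating N for 3t steps and its collapsed recurrence for t steps, the values of V_F2 agree under the modulo-3 time-folding.

```python
import random

# Arbitrary Boolean functions standing in for f0, f1, f2 of Figure 9.1.
f0 = lambda a, b: a ^ b
f1 = lambda a, b: a & b
f2 = lambda a, b: a | b

def run_concrete(steps, inputs, init):
    """Simulate N: registers B0, D1, F2 (colors 0, 1, 2) all update each
    step; returns the trace of register F2."""
    b0, d1, vf2 = init
    trace = []
    for i0, i1, i2 in inputs[:steps]:
        trace.append(vf2)
        a2 = f2(vf2, i2)              # color-2 gate
        c0 = f0(b0, i0)               # color-0 gate
        e1 = f1(d1, i1)               # color-1 gate
        b0, d1, vf2 = a2, c0, e1      # all three registers advance
    return trace

def run_abstract(steps, inputs, init_f2):
    """Simulate the recurrence structure: a single register for F2 with the
    three logic cones collapsed into one combinational cone."""
    vf2 = init_f2
    trace = []
    for i0, i1, i2 in inputs[:steps]:
        trace.append(vf2)
        vf2 = f1(f0(f2(vf2, i2), i0), i1)   # modulo-1 feedback
    return trace

random.seed(1)
T = 8
ins = [tuple(random.randint(0, 1) for _ in range(3)) for _ in range(3 * T)]
concrete = run_concrete(3 * T, ins, (0, 1, 1))
# Time-folding: abstract step j consumes I2 at time 3j, I0 at 3j+1, I1 at 3j+2.
abs_ins = [(ins[3*j + 1][0], ins[3*j + 2][1], ins[3*j][2]) for j in range(T)]
abstract = run_abstract(T, abs_ins, concrete[0])
print(all(concrete[3*j] == abstract[j] for j in range(T)))   # True
```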
Consider netlist N″ depicted in Figure 9.2, and netlist N′ depicted in Figure 9.3,
which collectively comprise our c-slow abstraction of N. We use N′ as an initialization
structure for N″, which is the recurrence structure of the c-slow abstracted netlist.
Netlist N″ depicted in Figure 9.2 represents the recurrence structure of N. We ob-
tain the expressions p̄(V″_A2, i) = f2(p̄(V″_F2, i), p̄(I″_2, i)); p̄(V″_B0, i) = p̄(V″_A2, i); p̄(V″_C0, i) =
f0(p̄(V″_B0, i), p̄(I″_0, i)); p̄(V″_D1, i) = p̄(V″_C0, i); p̄(V″_E1, i) = f1(p̄(V″_D1, i), p̄(I″_1, i)); and fi-
nally p̄(V″_F2, i + 1) = p̄(V″_E1, i). We additionally define p̄(V″_F2, 0) = p̄(V′_F2, 0), the latter
of which is defined in the initialization structure N′ depicted in Figure 9.3. Through un-
rolling, we obtain p̄(V″_F2, i + 1) = f1(f0(f2(p̄(V″_F2, i), p̄(I″_2, i)), p̄(I″_0, i)), p̄(I″_1, i)). This
Figure 9.2: Abstracted three-slow netlist N″: recurrence structure

Figure 9.3: Abstracted three-slow netlist N′: initialization structure
illustrates that V″_F2 is a function with modulo-1 feedback in the abstracted netlist.
The initialization structure of N″ is N′, depicted in Figure 9.3. We obtain the
expressions p̄(V′_A2, i) = f2(p̄(V′_F2, i), p̄(I′_2, i)); p̄(V′_B0, i) = p̄(Z(R0)′, i); p̄(V′_C0, i) =
f0(p̄(V′_B0, i), p̄(I′_0, i)); p̄(V′_D1, i) = ite(p̄(n1, i), p̄(Z(R1)′, i), p̄(V′_C0, i)); p̄(V′_E1, i) = f1(
p̄(V′_D1, i), p̄(I′_1, i)); and finally p̄(V′_F2, i) = ite(p̄(n2, i), p̄(Z(R2)′, i), p̄(V′_E1, i)). Term
Z(Rj)′ represents a copy of the initial value cone Z(Rj) from N.
Definition 9.2. A c-slow abstraction is a structural transformation of a c-slow netlist N as
follows. We first preprocess the targets of the netlist to ensure that they are color c − 1.
For each target t ∈ T, if C(t) = c − 1 then no action is necessary. Otherwise, we create a
sequence of c − 1 − C(t) REGISTERs with initial values of ZERO connected in series, the
first being sourced by t, and re-label the last REGISTER in the sequence as the target.
We next create two correspondents for each vertex in N: one for an initialization
structure N′, and one for the sequential recurrence structure N″. We create N′ as follows.

– We replace each non-REGISTER gate v by an identical gate v′ in N′.

– We replace each REGISTER v of color C(v) = 0 by a 1-input AND gate sourced by
the correspondent of Z(v).

– We replace each REGISTER v of color C(v) ≠ 0 by a multiplexor v′ in N′. The
"then" input is driven by the correspondent of Z(v). The "else" input is driven by
the correspondent of inlist(v). The selector is driven by n_{C(v)}, defined as follows.

  – We create i = ⌈log2(c)⌉ new FREE vertices m_0, …, m_{i−1}.
  – n_j = (unsigned(m_0, …, m_{i−1}) ≡ j) for each 0 < j < c.
  – n_0 = ⋀_{j=1}^{c−1} ¬n_j.

We create N″ as follows. Each t″ corresponding to t ∈ T will be an abstracted
target.

– We replace each non-REGISTER gate v by an identical gate v″ in N″.

– We replace each REGISTER v of color C(v) ≠ c − 1 by a 1-input AND gate v″.

– We replace each REGISTER v of color C(v) = c − 1 by another REGISTER v″. The
initial value of v″ is v′ from N′.

Intuitively, the nondeterministic values n_i allow us to initialize the REGISTERs of
N″ with the set of all valuations that the corresponding REGISTERs of N could take at
times 0, …, c − 1 by selecting an initial value of a specific color. Thereafter, straightforward
reachability analysis will ensure correspondence of N and Ñ = N′ ∥ N″. This allows the
subsequent verification flow to select any of these initial values that may be necessary to hit
a target; unreachability may be assessed only when it is determined that no color of initial
values may hit the target. Note that we only explicitly use values n_j for 0 < j < c to signify
the selection of initial values of color j; n_0 signifies that none of these colors have been
selected, hence implicitly selects the initial value of color 0. To simplify the subsequent
verification task, we normalize all targets to be color c − 1. For optimality, we prefer a
coloring from the set of possible colorings implied by Definition 9.1 which assigns c − 1
to as many targets as possible. This normalization entails only a small overhead, since
the added pipeline REGISTERs are sinkless, hence may be trivially removed by retiming or
structural target enlargement.
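The selector signals n_j of Definition 9.2 can be sketched in Python as follows. This illustrates only the encoding; treating m_0 as the most-significant bit is an assumption, not something the dissertation fixes.

```python
from math import ceil, log2
from itertools import product

def selector_signals(c, m):
    """Build n_0, ..., n_{c-1} from the FREE selector bits m:
    n_j = (unsigned(m) == j) for 0 < j < c, and n_0 is the conjunction of
    the negations of the others, so it also absorbs out-of-range encodings
    and exactly one n_j is asserted."""
    val = 0
    for bit in m:                      # m[0] taken as most-significant bit
        val = (val << 1) | bit
    n = [1 if val == j else 0 for j in range(c)]
    n[0] = 1 if not any(n[1:]) else 0  # n_0 = AND of negated n_1..n_{c-1}
    return n

c = 3
bits = ceil(log2(c))                   # 2 FREE vertices for c = 3
for m in product((0, 1), repeat=bits):
    print(m, selector_signals(c, m))
```

Note that the unused encoding (1, 1) also asserts n_0, matching the observation that n_0 implicitly selects the color-0 initial value.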
Generalizing our analysis from 3-slow to arbitrary c-slow netlists, we obtain the
following expression for the color-(c − 1) vertices V_{c−1}. We define the numerical sequence
a_0, …, a_{c−1} as a_i = c − 1 − i.

p̄(V_{c−1}, j) =
    f_{c−1}(p̄(Z(R_{c−1}), 0), p̄(I_{c−1}, 0))                                 : j = 0
    f_{a_0}(… (f_{a_j}(p̄(Z(R_{a_j}), 0), p̄(I_{a_j}, 0)), …), p̄(I_{a_0}, j))   : 0 < j < c      (9.1)

In formula (9.1), the first sequence represents a nesting of f_{a_0}(f_{a_1}(… (f_{a_j}. We
use this same sequencing in (9.3). The second sequence represents the closing of these
functions with the corresponding FREE vertices p̄(I_{a_j}, 0)), …), p̄(I_{a_1}, j − 1)), p̄(I_{a_0}, j)).
In (9.3), the FREE vertex ordering is identical, but the temporal arguments are all 0.

p̄(V_{c−1}, i + c) = f_{a_0}(… (f_{a_{c−1}}(f_{c−1}(p̄(R_{c−1}, i), p̄(I_{c−1}, i)),
    p̄(I_{a_{c−1}}, i + 1)), …), p̄(I_{a_0}, i + c))      (9.2)

In formula (9.2), the first sequence represents f_{a_0}(f_{a_1}(… (f_{a_{c−1}}. We use this
same sequencing in (9.4). The second sequence represents the closing of these functions
with the corresponding FREE vertices p̄(I_{a_{c−1}}, i + 1)), …), p̄(I_{a_1}, i + c − 1)), p̄(I_{a_0}, i + c)).
In (9.4), the vertex ordering is identical, but the temporal arguments are i, i, …, i, i + 1.

After c-slow abstraction, we obtain the following.

p̄(V″_{c−1}, 0) = (n_{c−1} ∧ f_{c−1}(p̄(Z(R_{c−1})′, 0), p̄(I″_{c−1}, 0)))
    ∨ ⋁_{j=1}^{c−1} (n_{a_j} ∧ f_{a_0}(… (f_{a_j}(p̄(Z(R_{a_j})′, 0), p̄(I″_{a_j}, 0)), …), p̄(I″_{a_0}, 0)))      (9.3)

p̄(V″_{c−1}, i + 1) = f_{a_0}(… (f_{a_{c−1}}(f_{c−1}(p̄(R″_{c−1}, i), p̄(I″_{c−1}, i)),
    p̄(I″_{a_{c−1}}, i)), …), p̄(I″_{a_0}, i + 1))      (9.4)
The key observations that follow from formulas (9.1)–(9.4) are the following.

– As demonstrated by (9.2), valuations to V_{c−1} of the c-slow netlist at time i + c and
greater are a function of R_{c−1} only from the c-th predecessor time-step, and of each
FREE vertex at most once during the previous c time-steps.

– As demonstrated by (9.4), valuations to V″_{c−1} of the abstracted netlist at time i + 1 and
greater are a function of R″_{c−1} only from the previous time-step, and of each FREE
vertex from the previous time-step.

– Since exactly one of the n_j terms is asserted at any time, (9.1) and (9.3) demonstrate
that the initial states of the abstracted netlist correspond to all reachable valuations to
V_{c−1} at times 0, …, c − 1.
These observations demonstrate that valuations to V_{c−1} and V″_{c−1} directly correspond
with a time-folding modulo-c. Based upon this analysis, we introduce our notion of c-slow
bisimilarity.

Definition 9.3. A c-slow bisimulation relation² with respect to bisimilar vertex sets A and
A″, where C(A) = c − 1, holds between c-slow netlist N and its abstraction Ñ = N′ ∥
N″, respectively, iff there exists a bijective mapping ϕ : A → A″ which satisfies the
following conditions.

1. ∀p ∈ P. ∀i < c. ∃p̃ ∈ P̃. ∀j ∈ ℕ. ∀a ∈ A. p(a, i + j·c) = p̃(ϕ(a), j)
2. ∀p̃ ∈ P̃. ∃p ∈ P. ∃i < c. ∀j ∈ ℕ. ∀a ∈ A. p(a, i + j·c) = p̃(ϕ(a), j)

²Though in this thesis we only require trace equivalence for invariant checking, as we demonstrate in [17] this abstraction preserves a type of bisimilarity.

We restrict c-slow bisimilarity to color-(c − 1) vertices so that we may directly use
the results of (9.1)–(9.4). Without this restriction, additional effort is necessary to map Ñ
to and from N with respect to initial states. Practically, since we may "rotate" coloring,
and since we "pad" targets to make them color c − 1, this restriction is not limiting for
invariant checking.
Lemma 9.1. If Ñ = N′ ∥ N″ is a c-slow abstraction of N, then N is c-slow bisimilar to
Ñ with respect to any corresponding vertex sets A and A″ such that C(A) = c − 1.

Proof. From (9.1) and (9.3), for any valuation to A ⊆ V_{c−1} reachable at time i < c in
trace p, there exists an equivalent valuation of A″ ⊆ V″_{c−1} at time 0 in trace p̃ which has
n_{c−1−i} = 1.

The correspondence of c transitions of N to single transitions of Ñ follows from
(9.2) and (9.4). In particular, for any p and i < c there exists a p̃ with p̃(n_{c−1−i}, 0) = 1 such
that valuations to V_{c−1} at times i + j·c and i + k·c in p are equivalent to valuations to V″_{c−1} at
times j and k in p̃ for all j and k. Similarly, for every p̃ there exists an i : p̃(n_{c−1−i}, 0) = 1
and a p such that valuations to V_{c−1} at times i + j·c and i + k·c in p are equivalent to
valuations to V″_{c−1} at times j and k in p̃.

We now discuss generalizations of netlist topologies suitable for c-slow abstraction,
using a type of logic replication. We note that Definition 9.1 is overly restrictive from the
viewpoint that many REGISTERs are likely to have ZERO as their initial values. However,
if two REGISTERs of different colors have ZERO as their initial values, then the netlist is
not c-slow by this definition. We clearly could replicate ZERO for each color a, and use
the properly-colored copy ZERO_a for each corresponding initial value to create a c-slow
netlist without altering semantics. Due to this example, and by the observation that no
vertex in the netlist may ever be a function of more than one color of initial value anyway,
we may wish to globally relax Rule 4. However, this relaxation would be unsound with
our abstraction. For example, assume that the initial value of a color-0 REGISTER is FREE
vertex v, which also fans out to function f1 of color 1. If we perform c-slow abstraction on
this netlist, due to its temporal folding we could miss a certain hit of a target which requires
v as the initial value of the REGISTER to be 1 at time 0, and also requires v as the source
of f1 to be 0 at time 1.
We therefore introduce a preprocessing transformation of the netlist which allevi-
ates these restrictions, and enables a generalization of the c-slow topology for verification
purposes. As we discussed in [17], we may readily relax Rule 3 of Definition 9.1 and
yield a bisimilar model by replicating multi-colored combinational cones. This relaxation
allows combinational cones of logic to fan out to multiple colors of REGISTERs. We now
formalize this concept, and extend it to handle multi-colored sequential cones.

Definition 9.4. A generalized c-slow netlist is one in which each directed cycle has sequen-
tial weight which is a non-zero multiple of c > 1. Given a generalized c-slow netlist N, we
may attribute a hypercoloring C̄ : V → 2^{0,…,c−1}, defined as follows.

1. If REGISTER r has color a ∈ C̄(r), then each gate v in the combinational fanin of
inlist(r) has color (a − 1 + c) mod c ∈ C̄(v).

2. If non-REGISTER v has color a ∈ C̄(v), then each gate u in the combinational fanin
of v has color a ∈ C̄(u).

3. If REGISTER r has color a ∈ C̄(r), then the initial value of r has color a ∈ C̄(Z(r)).

For optimality, we wish to minimally hypercolor a generalized c-slow netlist; an
efficient coloring algorithm is presented in Figure 9.7. We may obtain a c-slow netlist N̄
from a generalized c-slow netlist N using the algorithm of Figure 9.4. This preprocessing
transformation is itself sound and complete for a generalized c-slow netlist; however, since
it often increases vertex count of each type, it is not useful as a standalone transformation.
Therefore, we introduce this transformation only as a preprocessing step to enable generalized
c-slow abstraction, which will offset its potential increase in REGISTER count. Note that
we develop a vertex mapping V in addition to coloring C through this preprocessing.
Lemma 9.2. If N̄ was obtained via a preprocessing of generalized c-slow netlist N, then
N̄ is legal and is a c-slow netlist.

Proof. We first prove that N̄ is legal. Note that N is legal by assumption. We consider the
requirements for legality enumerated in Definition 3.24.

1. The preprocessing step only generates legal gates. By the hypercoloring of Defini-
tion 9.4, the indegree of each gate u_a generated by the preprocessing is identical to
that of u such that V(u_a) = u. Padding REGISTERs are also legal by construction.

2. The number of gates of N̄ is at most c · (|V| + 2 · |T|), and we may ensure that c is
finite for any legal netlist, hence N̄ is finite.

3. The only initial values generated by preprocessing are either ZERO, which is com-
binational, or are replications of initial values of N, hence are combinational by as-
sumption.

4. Any directed cycle A of a generalized c-slow netlist has sequential weight of i · c
for some i > 0 by assumption. By the hypercoloring rules of Definition 9.4, every
correspondent of cycle A in the preprocessed netlist will have the same sequential
weight as the original A.
⟨Netlist, C, V⟩ Preprocess_C_Slow(Netlist N, Hypercoloring C̄) {
    C = V = V̄ = Ē = Ḡ = Z̄ = T̄ = ∅;
    foreach u ∈ V {
        foreach a ∈ C̄(u) {
            add new vertex u_a to V̄;
            Ḡ(u_a) = G(u); V(u_a) = u; C(u_a) = a;
        }
    }
    foreach u ∈ V {
        foreach v ∈ inlist(u) {
            foreach a ∈ C̄(u) {
                b = color of v consistent by Definition 9.1 w.r.t. u having color a;
                add edge (v_b, u_a) to Ē;
            }
        }
    }
    foreach u ∈ R {
        foreach a ∈ C̄(u) {
            Z̄(u_a) = {v_a : (V(v_a) ≡ Z(u)) ∧ (C(v_a) ≡ a)};
        }
    }
    foreach t ∈ T {
        u = t̄ = {max-colored v : V(v) ≡ t};
        for (a = C(t̄) + 1; a ≤ c − 1; a++) {
            create REGISTER v; C(v) = a; Z̄(v) = ZERO_a;
            add edge (u, v) to Ē;
            u = v;
        }
        T̄ = T̄ ∪ {u};
    }
    N̄ = ⟨⟨V̄, Ē⟩, Ḡ, Z̄, T̄⟩;
    return ⟨N̄, C, V⟩;
}

Figure 9.4: Algorithm for preprocessing generalized c-slow netlists
We next prove that N̄ is c-slow. Every edge added to N̄ maintains the satisfaction
of the coloring rules of Definition 9.1 by construction. Additionally, we guarantee that the
colors of initial values of REGISTERs are equivalent to those of the corresponding REGISTERs,
and that our set of targets are of color c − 1.
Lemma 9.3. If N̄ was obtained via a preprocessing of generalized c-slow netlist N, then
vertex set Ā of N̄ is trace-equivalent to any corresponding vertex set A of N such that
∀a, a′ ∈ Ā. C(a) = C(a′).

Proof. The intuition of this proof is that each "clone" v_j of a given vertex v ∈ A for a given
color j is trace-equivalent to v. Because every directed cycle of N̄ has modulo-c REGISTERs,
no vertex in N̄ can "observe" the replication inherent in preprocessing.

We first assume that no REGISTERs are replicated. Take any color-j set Ā derived
via preprocessing from A. Because every directed cycle has modulo-c REGISTERs, it fol-
lows that valuations to V_j at time i + c are exclusively functions of R_j at time i, and FREE
vertices at time-steps during i, …, i + c, similarly to (9.2). Unlike the c-slow case, for gen-
eralized c-slow netlists, a given multi-colored FREE vertex v may appear in this expression
during multiple time-steps i, …, i + c − 1. However, since v may take valuations at each
time-step independently of other time-steps, clearly all possible transitions of valuations
to V_j from time-step i to i + c are trace-equivalent to those possible if we replicate the
multi-colored FREE vertices into one copy per color, and replace each color-a occurrence
of v with v_a, the "properly-colored" copy of v. Therefore, we conclude that sequences of c
transitions out of any state are trace-equivalent with respect to A and Ā. Also, similarly
to (9.1), it follows that valuations to V_j at time i < c are a function only of initial values of
REGISTERs R_{j−i} and FREE vertices. Using the above analysis, any cross-dependencies or
lack thereof between initial value cones and the fanin cone of A are preserved in Ā, because
the initial value of a REGISTER must have the same color as that REGISTER. We there-
fore conclude that preprocessing preserves trace-equivalence, provided that no REGISTER
was replicated. This result is similar to Lemma 7.1 for cut-based abstraction; c-slow pre-
processing effectively replaces semantic cuts of formulas (9.1)–(9.2) with trace-equivalent
cones. In the c-slow case, these semantic cuts are on a per-color basis.
We next eliminate the restriction that no REGISTER is replicated in preprocessing.
Because each clone of a REGISTER is of a distinct color, and has an identically-colored
initial value, we note that the initial value of each r̄ ∈ R̄ corresponds to r ∈ R : V(r̄) = r.
Therefore, we may conclude that time-0 valuations to Ā, determined uniquely by initial
values of R̄ and FREE vertices of color C(Ā), are trace-equivalent to those of A, since these
valuations are a function of a single color. Valuations to the color-j vertices at time i define
the state of the color-((j + 1) mod c) REGISTERs at time i + 1. We therefore conclude by
a simple inductive argument that c-slow preprocessing preserves trace-equivalence for all
color-j vertices for any j. This lemma follows by assigning j = C(Ā).

Lemma 9.3 implies that, regardless of how many copies we make of a target t, we
need only verify one – the one which will be preprocessed to have color c − 1. We introduce
our c-slow trace lifting algorithm in Figure 9.5, which is the last necessary component to
demonstrate soundness and completeness of c-slow abstraction.
Theorem 9.1. C-slow abstraction is sound and complete for invariant checking.
Proof. A target unreachable result will be generated only if the abstracted target t̃ is proven
to be unreachable by a child verification flow. We first note that padding REGISTERs with
initial values of ZERO onto a target does not affect its unreachability. Second, we note
that preprocessing preserves trace-equivalence with respect to any color of vertices and
hence invariant checking, as per Lemma 9.3. Third, c-slow abstraction guarantees that
every valuation to a color-(c − 1) vertex is preserved through c-slow abstraction due to
c-slow bisimilarity as per Lemma 9.1. Therefore, unreachable results are correct.

A target hit result, accompanied by a trace demonstrating a hit of the target, will
be generated only when an abstracted target t̃ is hit by a child verification flow. By as-
sumption, the corresponding trace p′ is semantically correct and hits t̃. We first note that
Partial Trace Lift_Trace(Partial Trace p′) {
    complete p′ up to its length with Simulate;
    n = {i : p′(n_i, 0) ≡ 1};
    i = c − 1 − n;
    foreach v ∈ V {
        if (n ≤ C(v) < c − 1) {
            p = p ∪ ⟨(v, C(v) − n), p′(v′, 0)⟩;
        }
    }
    for (j = 0; j < length(p′); j++) {
        foreach v ∈ V {
            k = (C(v) + 1) mod c;
            p = p ∪ ⟨(v, c·j + i + k), p′(v″, j)⟩;
        }
    }
    return p;
}

Figure 9.5: C-slow trace lifting algorithm
any trace hitting a REGISTER-padded target must previously hit the un-padded target. By
Lemma 9.3, preprocessing preserves trace-equivalence. By Lemma 9.1, t̃ will be hittable
iff t is hittable. Thus our obligation is only to demonstrate that trace lifting yields a seman-
tically correct trace.

The lifting of values from N′ for the first i time-steps directly reflects (9.1) and (9.3),
hence is consistent with N. From (9.2) and (9.4), the effect of c-slow abstraction is to fold
time modulo-c, hence our lifting of values from N″ multiplies time by c. The addition of i
accounts for the temporal folding of initial states from (9.1) and (9.3), and correlates to the
bisimilarity offset of Definition 9.3. The addition of k is necessary for the generalization
of (9.2) and (9.4) to arbitrary-colored vertices.

Note that a vertex which was not replicated will only have valuations in p at most
once per c consecutive time-steps. For a vertex v that was replicated for k colors, we will
attain k valuations to v per c consecutive time-steps. The correctness of the trace lifting
in the presence of replications follows from the trace-equivalence demonstrated in
Lemma 9.3. Therefore, we conclude that our lifted trace is semantically correct.
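The index arithmetic of this lifting can be made concrete with a small sketch. The helper names are hypothetical, and the sketch reproduces only the time mapping of the lifting algorithm of Figure 9.5, not the copying of values.

```python
def lift_time_recurrence(j, color, n, c):
    """Concrete time-step for the value of a color-`color` vertex at
    abstract time-step j, when initial-value color n was selected:
    i = c - 1 - n is the bisimilarity offset and k = (color + 1) mod c
    the per-color correction."""
    i = c - 1 - n
    k = (color + 1) % c
    return c * j + i + k

def lift_time_initialization(color, n, c):
    """Concrete time-step for a value taken from the initialization
    structure N', defined for vertices with n <= color < c - 1."""
    assert n <= color < c - 1
    return color - n

# For c = 3 with color n = 2 selected (offset i = 0), a color-2 vertex at
# abstract step j lands at concrete time 3j.
print([lift_time_recurrence(j, 2, 2, 3) for j in range(4)])   # [0, 3, 6, 9]
```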
Theorem 9.2. A c-slow abstracted netlist is a legal netlist.

Proof. By Lemma 9.2, preprocessing yields a legal netlist. We therefore need only prove
that the c-slow abstraction procedure yields a legal netlist. We consider the requirements
for legality enumerated in Definition 3.24.

1. The only gates generated by c-slow abstraction other than cloned vertices are one-
or-more input AND gates, one-input INVERTERs, FREE vertices, and multiplexor
structures, all of which are legal by construction.

2. For both N′ and N″, each gate of N is either cloned, or translated into a buffer or
multiplexor. The n_i logic requires at most 2c AND, FREE, and INVERTER gates.
We may guarantee that c is finite, hence the abstracted netlist will be finite.

3. The only REGISTERs created in the abstracted netlist have initial values defined by
N′, which is a purely combinational structure. Hence all initial values will be com-
binational.

4. Every directed cycle initially contains at least one REGISTER of each color. N′
contains no cycles, since color-0 REGISTERs have their initial values inlined, which
are combinational and acyclic by assumption. Every cycle in N″ will contain at least
one preserved REGISTER of color c − 1. Since N′ and N″ are composed in an acyclic
fashion, Ñ has no combinational cycles.
Theorem 9.3. If the diameter of a set of vertices U″ of c-slow abstracted netlist Ñ = N′ ∥
N″ is d(U″), then the diameter of the corresponding set of vertices U of the original netlist
N, provided that ∀u ∈ U. C(u) = c − 1, is at most c · d(U″).

Proof. By Definition 4.2, if the diameter of the c-slow abstracted vertex set U″ is d(U″),
then the longest required duration to witness a particular valuation to U″ is d(U″) time-
steps. From Lemmas 9.1 and 9.3, we know that c-slow abstraction folds time modulo-c.
void C_Slow_Abstract(Netlist N)

1. Color netlist N using algorithm Color_Generalized_C_Slow to determine c.
   – If c = 1, no abstraction is possible.
   – If c = ∞, the netlist is acyclic; skip the abstraction or assign c to be the maxi-
     mum color of any vertex.

2. Preprocess N using algorithm Preprocess_C_Slow to yield c-slow netlist N̄.

3. Perform c-slow abstraction upon N̄ using the algorithm provided in Definition 9.2.

Figure 9.6: C_Slow_Abstract algorithm
Therefore, any transition of states in Ñ correlates to c transitions in N, and the correspond-
ing valuation to U will occur within c · d(U″) time-steps. Note that the first c − 1 time-steps
account for any necessary delay in N to produce a corresponding initial value of Ñ.
9.1 C-Slow Abstraction Algorithms
In this section we provide our algorithms for c-slow abstraction. Several core algorithms
were introduced in Figures 9.4 and 9.5 for preprocessing generalized c-slow netlists and
for trace lifting, respectively. Our top-level C_Slow_Abstract function of Figure 9.6 calls
our coloring algorithm depicted in Figure 9.7. Performing a cone-of-influence reduction,
and redundancy removal, prior to c-slow abstraction is beneficial to prevent unnecessary
logic from reducing c. GCD refers to a greatest-common divisor. We note that if we
obtain c = 1, then all vertices will have color 0, hence c-slow abstraction is not useful.
Because a legal netlist is finite, c = ∞ uniquely identifies an acyclic netlist; in such cases,
we assign c to be the maximum color of any vertex. However, use of BMC with a tight
diameter overapproximation resulting from our technique from Chapter 4 is often a superior
verification strategy in such cases.
Hypercoloring Color_Generalized_C_Slow(Netlist N) {
    C = ∅; c = ∞; visited(V) = ⊥;
    foreach v ∈ V {
        if (C(v) ≡ ∅) {
            Color(v, |V|, c, C);
        }
    }
    subtract min{a : ∃v ∈ V. a ∈ C(v)} from each value in C;
    return C;
}

void Color(Vertex v, ℕ color, ℕ c, Hypercoloring C) {
    color = color mod c;
    if (color ∈ C(v)) { return; }
    if (visited(v) ≡ ⊤) {
        a = max{a ∈ C(v)}; a′ = max{a, color} − min{a, color};
        if (a′ mod c ≠ 0) { c = GCD(c, a′); Normalize(C, c); }
        return;
    }
    C(v) = C(v) ∪ {color};
    visited(v) = ⊤;
    new_color = (c + color − (G(v) ≡ REGISTER)) mod c;
    foreach u ∈ inlist(v) {
        Color(u, new_color, c, C);
    }
    if (G(v) ≡ REGISTER) {
        Color(Z(v), color, c, C);
    }
    visited(v) = ⊥;
}

void Normalize(Hypercoloring C, ℕ new_c) {
    foreach v ∈ V {
        foreach a ∈ C(v) {
            C(v) = (C(v) ∖ {a}) ∪ {a mod new_c};
        }
    }
}

Figure 9.7: Algorithms for coloring generalized c-slow netlists
The running time of our c-slow algorithms is O(c · (|E| + |V| + |T|)). This follows
from noting that in the worst case, each vertex will obtain every color during algorithm
Color, hence preprocessing will need to replicate every vertex c times, and pad every target
with at most c REGISTERs. This bound assumes that the number of calls to Normalize
will be a constant factor of c; in a pathological and rare case, the number of calls may
become logarithmic in |R|. This function is actually unnecessary except as a final step
of Color_Generalized_C_Slow; its processing may be emulated during reads and writes
of C, though this clutters the exposition of the algorithms. The abstraction process itself
performs a linear sweep over the preprocessed netlist, replicating each vertex twice.
9.2 Related Work
The use of slowdown (increasing c) as a design optimization technique was first proposed
by Leiserson and Saxe [68, 66]. They demonstrate that slowdown coupled with retiming is
capable of yielding significant reductions in the clock period of a design through decreasing
its longest combinational path. They also provide algorithms to increase and decrease c as
a design technique.
The topic of retiming (refer to Chapter 6) is a related, yet orthogonal, structural
transformation. Retiming itself is insufficient to achieve the results of c-slow abstraction –
for example, retiming cannot alter the weight of a directed cycle. However, retiming is
a complementary technique which yields different types of reductions – for example, the
REGISTER placement after c-slow abstraction will match that of the color-(c−1) REGISTERs
before the abstraction, whereas retiming may move REGISTERs to fairly arbitrary positions.
Phase abstraction (refer to Chapter 10) is a topologically related though fundamentally
different state-folding approach. Phase abstraction applies only to LEVEL-SENSITIVE
LATCH-based netlists, whereas c-slow abstraction is applicable to REGISTER-based netlists.
It is possible that repeated c-slow abstractions may be useful, interleaved with other transformations
that render increasing simplifications; phase abstraction is useful at most once
for a verification run. Semantically, in multi-phase designs, only one class of latches updates
at each time-step, hence the LATCHes stutter. C-slow designs generally do not stutter
whatsoever. Furthermore, the initial values of all but one class of LATCHes will be overwritten
before propagation. In contrast, all initial values of a c-slow design may propagate.
Overall, these two techniques are complementary.
For acyclic netlists, we have demonstrated in [17] that a simple modification of c-slow
abstraction may be used to yield a purely combinational netlist. This is a similar
result as using our diameter bounding algorithms presented in Chapter 4 to render a combinational
netlist through unfolding. However, the diameter overapproximation algorithms
are superior to c-slow abstraction in such cases, since they enable tight bounds and obviate
the need for creating the ni logic.
9.3 Experimental Results
Our experimental results were obtained with the model checker RuleBase [103]. All results
were obtained on an IBM RS/6000 Workstation Model 595 with 2 GB of main memory.
We arbitrarily selected ten components of IBM's Gigahertz Processor which had previously
been model checked. Our algorithms identified two of these as being c-slow. The first is an
acyclic pipeline with an instruction qualifier input that combinationally fans out to multiple
stages; the second is an intricate cyclic five-slow pipeline with an asynchronous interrupt
to every stage. These multi-colored inputs prevent the classification of these designs as
c-slow by Definition 9.1, but our generalized Definition 9.4 enables a c-slow classification.
Both of these examples were explicitly entered in HDL as c-slow designs – this topology
is not the by-product of a synthesis optimization. Both of these components had been undergoing
verification for more than 12 months prior to the development of this abstraction
technique. Consequently, the unabstracted variants had very good BDD orders available.
RuleBase was run with phase abstraction [16] and its redundancy removal reductions enabled,
and with dynamic BDD reordering enabled using the technique of Rudell [104].
Prior to performing c-slow abstraction, we performed a structural transformation to eliminate
scan-chain connections between the REGISTERs (which unnecessarily limited c), and
to cut self-feedback loops on constant REGISTERs.
We first deployed this abstraction technique on the acyclic pipeline for the most
complex property against which the design had been verified. The unabstracted version
had 148 variables, and with our best initial order required 409.6 seconds with a maximum
of 1410244 allocated BDD nodes to complete this property. The first run on the abstracted
variant with a random initial order (though pairing present-state and next-state variables
for each state element) had 53 variables, and required 44.9 seconds with a maximum of
201224 BDD nodes. While this speedup is significant, this comparison is skewed since the
unabstracted run benefited from the extensive prior BDD reordering. Re-running the unabstracted
experiment with a random initial order required 3657.9 seconds, with 2113255
BDD nodes. Re-running the abstracted experiment using the order obtained during the first
run required 4.6 seconds with 98396 nodes. Computing the c-slow abstraction required 0.3
seconds. Overall, our abstraction yielded a factor of 81 speedup with random initial orders.
Justifications for comparing relative to random initial orders include the following points.
• At the initial stages of a verification effort, no good orders are available.
• These results capture the difficulty of calculating a reasonable BDD order before and
after abstraction (since reordering times are included in the results), and reflect the time
necessary to obtain a result for a new problem.
We even attained a factor of 9 speedup in the extremely skewed case that the abstracted run
had a random order and the unabstracted run had a very good order.
The next example is the five-slow design. With our best BDD order, model checking
the unabstracted design against one arbitrarily selected formula required 5526.4 seconds,
with 251 variables and 3662500 nodes. With a random initial order, the unabstracted run
required 23692.5 seconds with 7461703 nodes. The first run of the abstracted design with
a random initial order required 381.5 seconds, with 134 variables and 339424 nodes. Re-running
the formula twice more and reusing the calculated BDD orders yielded a run of
181.1 seconds with 293545 nodes. Performing c-slow abstraction required 3.2 seconds.
Due to the potential increase in depth of combinational cones through this abstraction,
there is a potential for a significant blowup of a BDD-based representation of the
resulting netlist. Advanced techniques such as splitting and conjoining [43], or fine-grain
reachability analysis [44], may be used to help combat such blowup. A reasonable BDD
order furthermore seems fairly important when performing this abstraction. With reordering
off, and a random initial order, the results for the acyclic pipeline were akin to those reported
above. However, the five-slow abstracted transition relation was significantly larger
than the unabstracted variant given the random order, thereby resulting in a much slower
execution than on the unabstracted run. With reordering enabled, the results as reported
above were consistently significantly superior.
Chapter 10
Phase Abstraction
In this chapter we discuss the technique of phase abstraction, extending results of collaborative
work with Tamir Heyman, Vigyan Singhal, and Adnan Aziz reported in [16]. Phase
abstraction is an efficient method to translate a netlist comprised of LEVEL-SENSITIVE
LATCHes into one comprised instead of REGISTERs. The LEVEL-SENSITIVE LATCH, or
simply LATCH, was purposefully not introduced as a possible gate type previously in this
thesis; this is the only chapter in which LATCHes will be discussed, and by performing
phase abstraction, we practically circumvent the need to consider LATCHes elsewhere in
this thesis, or in a verification toolset. Definition 10.1 formalizes the LATCH, and serves as
an addendum to Definitions 3.11 and 3.12 for this chapter.

Definition 10.1. A LATCH vertex v has two inputs: clock and data. Term Gv is not referenced
for LATCHes.
• If i > 0, then p(v, i) = ite(p(clock(v), i), p(data(v), i), p(v, i−1)). Otherwise
p(v, 0) = ite(p(clock(v), 0), p(data(v), 0), p(Z(v), 0)).
We refer to the set of LATCHes as L. We define the clock logic of a netlist as D =
fanin_cone(clock(L)). We assume without practical loss of generality that ⟨D, V \ D⟩
constitutes a cut of the netlist whose crossing edges are clock edges.
In common two-phase designs, a "correct" clocking scheme may be visualized as a
global clock vertex which alternates between 0 and 1 at every time-step. A LATCH which
is transparent when the global clock is a 0 will be denoted as a φ0 LATCH (often referred
to as an L1 LATCH); one which is transparent when the global clock is a 1 will be denoted
as φ1 (often referred to as an L2 LATCH). Hardware design rules, arising from timing constraints,
require any structural path between two φ0 LATCHes to comprise a φ1 LATCH,
and vice-versa. An elementary design style requires each φ0 LATCH to fan out directly
to a φ1 LATCH (called a master-slave LATCH pair), and allows only φ1 to drive combinational
logic. However, a common high-performance hardware design technique involves
distributing combinational logic freely between φ0 and φ1 LATCHes to better utilize each
half-period of the clock. Such designs are often explicitly implemented in this manner; this
topology is not the byproduct of a synthesis tool, but instead a necessary design technique
to ensure the highest-performance hardware.
One may readily model a LATCH using implicitly clocked REGISTERs as demonstrated
in Figure 10.1. We use a multiplexor selected by clock; when clock is a 1, and
the LATCH is transparent, we sensitize a combinational flow-through path from data to
dout. Otherwise, we sensitize a path driven by a REGISTER with the same initial value as
the LATCH, which shadows the last-driven value through the LATCH. Note that modeling
LATCHes in this fashion may cause the appearance of combinational cycles, for example,
given a structural directed cycle from a φ0 to a φ1 back to the φ0. In the presence of a
correct clocking scheme, this apparent combinational path is an unsensitizable false path.
However, in case of a clocking flaw, a semantic combinational cycle may truly exist.
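As a sanity check of this modeling, the mux-plus-REGISTER structure of Figure 10.1 can be simulated functionally. A minimal Python sketch follows; the function and signal names are invented for illustration.

```python
def latch_as_register(clock_seq, data_seq, init):
    """Model a level-sensitive LATCH with a multiplexor and a shadowing REGISTER."""
    shadow = init                     # REGISTER initialized to the LATCH's Z(v)
    dout_seq = []
    for clk, d in zip(clock_seq, data_seq):
        dout = d if clk else shadow   # transparent flow-through when clock is 1
        dout_seq.append(dout)
        shadow = dout                 # REGISTER shadows the last-driven value
    return dout_seq
```

This matches Definition 10.1: p(v, i) = ite(p(clock(v), i), p(data(v), i), p(v, i−1)), with p(v, 0) falling back to the initial value Z(v).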
Because LATCH-based netlists tend to contain more sequential elements than functionally
correspondent REGISTER-based netlists, verification algorithms that enumerate
states often require more time and memory in the former case, potentially exponentially
more. We furthermore must model an oscillating clock, which is in the support of all
LATCHes. Additionally, since k image computations are necessary per clock period, the
Figure 10.1: Semantics-preserving translation of LATCHes to REGISTERs
diameter of such a netlist is k times that of a correspondent REGISTER-based netlist. We
therefore propose the technique of phase abstraction to overcome these difficulties. We
perform this abstraction by selectively eliminating LATCHes. In doing so, this technique
often reduces netlist size, thereby enhancing arbitrary transformation and verification algorithms
which may consume superlinear, possibly even exponential, resources. Additionally,
by eliminating state elements, we often benefit BDD-based algorithms due to reducing
variable count, which often reduces their size and reordering time. Further, the removal of
REGISTERs allows "collapsing" of adjacent logic cones to a single combinational cone,
which increases the domain of applicability of combinational redundancy removal techniques.
However, this elimination of state elements also risks explosion of BDD sizes
representing these composite cones; this abstraction thus has the potential to harm BDD-based
analysis. Nevertheless, as our experiments demonstrate, this abstraction tends to
enhance BDD-based analysis. We have not observed one case where phase abstraction hurt
a verification effort to the point where it needed to be disabled during five years of using
this technique; refer to Section 10.3 for a more detailed discussion of this topic.
Definition 10.2. A k-phase netlist N, for k ≥ 2, contains LATCHes but no REGISTERs.
We associate φ, representing a "global" clock, and C : V \ D ↦ {0, …, k−1}, representing
a k-coloring function, with N. Semantically, φ acts as an unconditional mod-k up
counter which initializes to 0, thus p(φ, i) = i mod k for any p ∈ P. We require that
p(clock(v), i) = 1 iff (p(φ, i) = C(v)) for each v ∈ L and each p ∈ P. Therefore, φ indicates
which phase of LATCHes are transparent at the corresponding time-step. We require
that T ∩ D = ∅. Coloring C is defined as follows.
1. If the color of LATCH v is C(v), then the color of each LATCH v′ in the combinational
fanout of v is (C(v) + 1) mod k.
2. If the color of LATCH v is C(v), then the color of each non-LATCH v′ in the combinational
fanout of v is C(v).
3. If the color of LATCH v is C(v), then the color of each non-LATCH v′ in the combinational
fanin of data(v) is (C(v) − 1 + k) mod k.
4. If the color of LATCH v is C(v), and Z(v) is not ZERO or ONE, then C(Z(v)) = C(v).
If the topology of a netlist renders a consistent gate coloring with respect to Definition
10.2 infeasible, the netlist is not k-phase. Note that the coloring of k-phase netlists
resembles that of c-slow netlists described in Definition 9.1. A linear-time algorithm may
be used to color the vertices of N; φ further provides a "seed" for the coloring. It is possible
to generalize k-phase topologies similarly to our generalization of c-slow netlists provided
in Definition 9.4; however, synthesis constraints preclude the need for such. Note that we
require that the target not be an element of the clock logic D, which is necessary for soundness
of the abstraction. Use of an integer rather than a set of binary values for φ is merely
a notational shorthand.
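The linear-time coloring mentioned above can be sketched as a breadth-first propagation of rules 1 and 2 from a seed LATCH. This is a minimal Python illustration only: the `comb_fanout` encoding is invented here, and the full algorithm would also propagate backward through data fanins per rule 3 and onto initial-value vertices per rule 4.

```python
from collections import deque

def k_phase_color(latches, comb_fanout, k, seed):
    """Forward propagation of the coloring rules of Definition 10.2.
    Returns a color map, or None if no consistent k-coloring exists."""
    color = {seed: 0}                  # seed color supplied by the clock logic
    queue = deque([seed])
    while queue:
        v = queue.popleft()
        for u in comb_fanout.get(v, ()):
            # rule 1: a LATCH in the combinational fanout takes (C(v)+1) mod k;
            # rule 2: a non-LATCH inherits C(v)
            want = (color[v] + 1) % k if u in latches else color[v]
            if u in color:
                if color[u] != want:
                    return None        # inconsistent: the netlist is not k-phase
            else:
                color[u] = want
                queue.append(u)
    return color
```

A two-phase ring l0 → g → l1 → h → l0 colors consistently, whereas a direct combinational cycle between two same-phase LATCHes is rejected.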
Definition 10.3. Given k-phase netlists N and N′, let A = {u ∈ V : C(u) = k−1} \ {u ∈
V : combinational_fanin(u) ∩ I ≠ ∅} and A′ = {u′ ∈ V′ : C′(u′) = k−1} \ {u′ ∈
V′ : combinational_fanin(u′) ∩ I′ ≠ ∅}. The composition of k-phase netlists N″ = N ∘k N′
is defined by merging some FREE vertices v of N onto vertices u′ ∈ {I′ ∪ A′} of N′ with
equal color: C(v) = C′(u′), and by merging some FREE vertices v′ of N′ onto vertices
u ∈ {I ∪ A} of N with equal color: C′(v′) = C(u). We require that fanin_cone(Z″(L″))
remain combinational after merging.

Since we may only merge vertices of the same color, the composition of two k-phase
netlists N and N′ yields a k-phase netlist N″ which inherits coloring, hence C″ = C ∪ C′.
Composition furthermore is guaranteed to yield a legal k-phase netlist since we may only
merge a FREE vertex of N onto a FREE vertex of N′, or onto a non-FREE vertex of N′
which has no FREE vertices in its combinational fanin. The optimality of our algorithm
results from representing N as the composition of minimal dependent layers (MDLs) of
LATCHes, and preserving only one phase of LATCHes per MDL.
Definition 10.4. A dependent layer of a k-phase netlist is a nonempty set of φ0, …, φ_{k−1}
LATCHes l0, …, l_{k−1}, such that l_{i+1} is a superset of all LATCHes in the combinational
fanout of l_i, and l_i is a superset of all LATCHes in the combinational fanin of data(l_{i+1}),
for 0 ≤ i < k−1.

Definition 10.5. A dependent layer l is termed minimal if and only if there does not exist
a nonempty set of LATCHes l′ which may be removed from l and still result in a nonempty
dependent layer l \ l′.

Lemma 10.1. There is a unique MDL partition of any k-phase netlist.
Proof. We prove this lemma by contradiction. Let Q0 and Q1 be two non-equivalent MDL
partitions of N. Let q0_i represent the i-th MDL of Q0, and q1_i the i-th MDL of Q1. For
Q0 to be non-unique, there must exist a q1_i which is not an element of Q0. Note that there
cannot exist a q0_i which is a superset of this q1_i, else q0_i is not minimal (or is equivalent
to q1_i); similarly, this q1_i cannot be a superset of any q0_i.

If q1_i is a singleton {l} of color j, there must exist no other LATCHes in the fanout
(unless j = k−1) or fanin (unless j = 0) cone of l, else q1_i is not a dependent layer.
Clearly, the q0_i which contains l is not minimal since we may remove l from that set and
the remaining nonempty set q0_i \ l is still a dependent layer – as is the singleton {l}.

If q1_i has cardinality greater than one, there must exist two LATCHes l and m in q1_i
such that l ∈ q0_i and m ∈ q0_j for i ≠ j. If l is a φ_i for i < k−1, we note that all φ_{i+1}
LATCHes l_{i+1} in the combinational fanout of l must be in q0_i and q1_i, and all φ_i LATCHes
l_i in the combinational fanin of data(l_{i+1}) must also be in q0_i and q1_i. We may iteratively
repeat this analysis for LATCHes in the combinational fanout of l_j (for 0 ≤ j < k−1), and
for LATCHes in the combinational fanin of data(l_j) (for 0 < j ≤ k−1), until all LATCHes
have been encountered and we have reached a fixed point of dependent LATCHes. Note
that m must be one of the LATCHes reached in this fixed point, else q1_i is not minimal.
Furthermore m must also be an element of q0_i else q0_i is not a dependent layer, contradicting
the claim that q0_i ≠ q1_i. □

Figure 10.2: Example netlist with two minimal dependent layers
Consider the example two-phase netlist depicted in Figure 10.2. The two unique
MDLs are marked with dotted boxes. Merely removing all φ0 or all φ1 LATCHes will not
yield an optimum reduction for this netlist; the φ0 LATCHes of layer A, and the φ1 LATCHes
of layer B, must be removed to yield an optimum solution of two LATCHes.
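The MDL partition itself can be computed as the connected components of the latch dependency relation, mirroring the fixed-point construction in the proof of Lemma 10.1. A minimal Python sketch follows; the `latch_fanout` encoding, mapping each LATCH to the LATCHes in its combinational fanout, is invented for illustration.

```python
def mdl_partition(latches, latch_fanout):
    """Partition LATCHes into minimal dependent layers (connected components
    of the symmetric closure of the combinational latch-to-latch relation)."""
    neighbors = {l: set(latch_fanout.get(l, ())) for l in latches}
    for l, outs in latch_fanout.items():
        for m in outs:
            neighbors[m].add(l)            # fanin direction: symmetric closure
    partition, seen = [], set()
    for l in latches:
        if l in seen:
            continue
        comp, stack = set(), [l]
        while stack:                       # depth-first fixed point
            v = stack.pop()
            if v not in comp:
                comp.add(v)
                stack.extend(neighbors[v] - comp)
        seen |= comp
        partition.append(comp)
    return partition
```

On a topology like Figure 10.2, layers A and B come out as two separate components, so each may be abstracted independently.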
Consider the generic two-phase netlist shown in Figure 10.3. The initial values of
the LATCHes are Z(VB0) and Z(VD1). Note that Z(VB0) will not be visible since the φ0
LATCHes are transparent at time 0. Let φ denote the global clock, which initializes to 0, and
alternates between 0 and 1 at every time-step, indicating whether the φ0 or φ1 LATCHes,
respectively, are presently transparent. For all i, we have ṗ(VB0, i) = ite(ṗ(φ, i), ṗ(VB0, i−1), ṗ(VA1, i)).
For i > 0, we have ṗ(VD1, i) = ite(ṗ(φ, i), ṗ(VC0, i), ṗ(VD1, i−1)). For the
combinational gates, for all i we have ṗ(VA1, i) = f1(ṗ(VD1, i), ṗ(I1, i)) and ṗ(VC0, i) = f0(ṗ(VB0, i), ṗ(I0, i)).

Figure 10.3: Two-phase netlist N

Extending our example to an arbitrary k-phase netlist, we denote the combinational
logic sourcing data(φ_{(j+1) mod k}) as f{j}, which has LATCHes φj and FREE vertices Ij of
color j in its combinational fanin. Note that the initial values of the φ0 LATCHes are of
no semantic importance since the φ0 LATCHes are transparent at time 0. The initial values
of only the φ_{k−1} LATCHes propagate to other LATCHes (since all others are transparent
before φ_{k−1}) – though the initial values of LATCHes of color 0 < j < k−1 are of semantic
importance during the first j−1 time-steps. For this reason, unlike c-slow abstraction, we
cannot exploit the simplifying assumption that targets will be of color k−1 through padding.
Finally, we note that the φj LATCHes are transparent only at time-steps k·i + j, and stutter
between. We obtain the following expressions for color-j vertices Vj.

ṗ(Vj, j) =
  f{0}( f{k−1}( ṗ(Z(φ_{k−1}), 0), ṗ(I_{k−1}, 0) ), ṗ(I0, 0) )                          : j = 0
  f{j}( … ( f{0}( f{k−1}( ṗ(Z(φ_{k−1}), 0), ṗ(I_{k−1}, 0) ), ṗ(I0, 1) ), … ), ṗ(Ij, j) ) : j ≠ 0
                                                                                    (10.1)
In (10.1), the first sequence represents a nesting of f{j}(f{j−1}(…(f{1}(f{0}. We
will use this same sequence in (10.4). The second sequence closes the first with the corresponding
FREE vertices ṗ(I0, 1)), ṗ(I1, 2)), …), ṗ(I_{j−1}, j)), ṗ(Ij, j)). In (10.4), the FREE
vertex ordering is identical, but the temporal arguments are all 0.

ṗ(Vj, bj + k) = f{a_j^0}( … ( f{a_j^{k−1}}( f{j}( ṗ(φj, bj), ṗ(Ij, bj + 1) ),
                ṗ(I_{a_j^{k−1}}, bj + 2) ), … ), ṗ(I_{a_j^0}, bj + k) )             (10.2)

We define the sequence a_j^0, …, a_j^{k−1} as a_j^i = (k + j − i) mod k. We define bj =
k·i + j; the φj LATCHes are transparent only during such time-steps. The first sequence of
(10.2) represents a nesting of functions f{a_j^0}(f{a_j^1}(…(f{a_j^{k−2}}(f{a_j^{k−1}}. The second
sequence closes the first with the corresponding vertices ṗ(I_{a_j^{k−1}}, bj + 2)), ṗ(I_{a_j^{k−2}}, bj +
3)), …), ṗ(I_{a_j^1}, bj + k)), ṗ(I_{a_j^0}, bj + k)). In (10.4), the FREE vertex ordering is identical;
the temporal arguments are discussed below.
Letting ĵ = k·i + j′ for any j′ : (j′ ≠ j) ∧ (j′ < k), we obtain the following
valuations for time-steps when φj is not transparent hence stutters.

ṗ(Vj, ĵ) = f{j}( ṗ(φj, ĵ), ṗ(Ij, ĵ) )                                              (10.3)

Similarly to the analysis of c-slow netlists provided in formulas (9.1) and (9.2), formulas
(10.1) and (10.2) indicate that each valuation to a φj LATCH is a deterministic function
of the value of the φj LATCH k time-steps past, and of the FREE vertex valuations at the
appropriate time-steps since. Unlike c-slow designs, only the initial values of φ_{k−1} LATCHes
propagate as per (10.1). Furthermore, each φj only updates once per k time-steps.
Either layer of LATCHes of a two-phase netlist N may be turned into buffers (one-input
AND gates), and the remaining layer transformed to REGISTERs; the resulting abstracted
netlist will be shown to be bisimilar to the original netlist with respect to the
colored vertices V \ D. Figure 10.4 illustrates the first abstraction, which removes the
φ0 LATCHes. Term Z(VD1)′ = ṗ(V′D1, 0) is the cloned initial value of the preserved
sequential elements. For i > 0, we have ṗ(V′D1, i) = ṗ(V′C0, i−1). For the combinational
nets, we have ṗ(V′A1, i) = f1(ṗ(V′D1, i), ṗ(I′1, i)); ṗ(V′B0, i) = ṗ(V′A1, i); and
ṗ(V′C0, i) = f0(ṗ(V′B0, i), ṗ(I′0, i)).
Figure 10.4: Phase-abstracted netlist N′
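The correspondence between the two-phase netlist N of Figure 10.3 and its abstraction N′ of Figure 10.4 can be checked by simulation. Below is a hedged Python sketch (all function and signal names are invented): the color-1 vertices of N′ at abstract time j match those of N at time 2j, the color-0 vertices match at time 2j+1, and each abstract FREE vertex of color c samples the original input at offset (c+1) mod 2 within the clock period.

```python
import random

def simulate_n(f0, f1, z_d1, i0, i1):
    """Two-phase netlist N of Figure 10.3; phi0 LATCHes are transparent at
    even time-steps, phi1 LATCHes at odd time-steps."""
    trace, vb0, vd1 = [], None, None
    for t in range(len(i0)):
        if t % 2 == 0:                     # phi = 0: VD1 holds, VB0 transparent
            vd1 = z_d1 if t == 0 else vd1
            va1 = f1(vd1, i1[t])
            vb0 = va1
        else:                              # phi = 1: VB0 holds, VD1 transparent
            vd1 = f0(vb0, i0[t])           # VD1 = VC0 = f0(VB0, I0)
            va1 = f1(vd1, i1[t])
        trace.append({"VB0": vb0, "VD1": vd1, "VA1": va1})
    return trace

def simulate_n_prime(f0, f1, z_d1, i0p, i1p):
    """Phase-abstracted netlist N' of Figure 10.4 (phi0 LATCHes removed)."""
    trace, vd1 = [], z_d1
    for t in range(len(i0p)):
        va1 = f1(vd1, i1p[t])
        vb0 = va1                          # eliminated LATCH became a buffer
        trace.append({"VB0": vb0, "VD1": vd1, "VA1": va1})
        vd1 = f0(vb0, i0p[t])              # REGISTER update for the next step
    return trace

random.seed(0)
f0 = lambda a, b: a ^ b
f1 = lambda a, b: a & b
steps = 16
i0 = [random.randint(0, 1) for _ in range(2 * steps)]
i1 = [random.randint(0, 1) for _ in range(2 * steps)]
full = simulate_n(f0, f1, 1, i0, i1)
folded = simulate_n_prime(f0, f1, 1,
                          [i0[2 * j + 1] for j in range(steps)],  # color 0, offset 1
                          [i1[2 * j] for j in range(steps)])      # color 1, offset 0
for j in range(steps):
    assert full[2 * j]["VD1"] == folded[j]["VD1"]       # color-1 vertices, offset 0
    assert full[2 * j]["VA1"] == folded[j]["VA1"]
    assert full[2 * j + 1]["VB0"] == folded[j]["VB0"]   # color-0 vertex, offset 1
```

The passing assertions exhibit the modulo-2 time folding: one transition of N′ accounts for two time-steps of N.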
Figure 10.5 shows the second abstraction N″ with layer φ1 removed. We need a
new REGISTER Init″ whose initial value is 1, and which is thereafter 0. This REGISTER ensures
that the initial value clone Z(VD1)″ is applied to net V″D1 in N″ at time 0. The initial
values of the preserved state variables (which have been transformed from LATCHes into
REGISTERs) are transformed to f1(Z(VD1)″, I″1), which is equivalent to ṗ(V″A1, 0). This is
to prevent false hits of a color-0 target at time 0, since φ0 is transparent at time 0 hence its
initial value is of no semantic importance. For i > 0 we have ṗ(V″B0, i) = ṗ(V″A1, i−1).
If ṗ(Init″, i) = 1, then ṗ(V″D1, i) = ṗ(Z(VD1)″, i), else ṗ(V″D1, i) = ṗ(V″C0, i). For the
other combinational nets, we have that ṗ(V″A1, i) = f1(ṗ(V″D1, i), ṗ(I″1, i)) and ṗ(V″C0, i) =
f0(ṗ(V″B0, i), ṗ(I″0, i)).

Either of these abstractions may be applied to each MDL partition of a two-phase
netlist independently of the other MDLs, thus yielding an abstraction which has a globally
minimum number of LATCHes (refer to Theorem 10.4). This minimum would, in general,
be less than that obtained by removing either all φ0 or all φ1 LATCHes. We now define the
phase abstraction process for arbitrary k ≥ 2.
Definition 10.6. A k-phase abstraction is a transformation of a k-phase netlist N to a
REGISTER-based netlist Ñ as follows. The transformation operates upon maximal directed
paths e from a φi to a φj LATCH including each color at most once; i = 0 and j = k−1,
unless there exist no other LATCHes in the fanin cone of φi, and no other LATCHes in the
fanout cone of φj, respectively.

Figure 10.5: Alternate phase-abstracted netlist N″

• Exactly one LATCH l along e is replaced by a REGISTER r if i = 0 and j = k−1;
otherwise, at most one LATCH is replaced by a REGISTER. Letting l be a φi LATCH,
we refer to the transformation as a preserve-φi abstraction.
  – If C(l) = k−1, then use Z(φ_{k−1}) as the initial value of r.
  – If C(l) ≠ k−1, use an unfolding-based approach to calculate ṗ(r, C(l)) as a
    function over Z(φ_{k−1}) and I as per (10.1).
• All LATCHes along e other than l are eliminated.
  – A φi LATCH (i ≠ k−1) which is eliminated is replaced by a buffer.
  – A φ_{k−1} LATCH which is eliminated is replaced by a multiplexor structure
    to preserve its initial value, as demonstrated by Figure 10.5.
Straightforward analysis demonstrates that valuations to the phase-abstracted vertices
of color j, denoted by Ṽj, satisfy the following formulas.

ṗ(Ṽj, 0) =
  f{0}( f{k−1}( ṗ(Z̃(φ_{k−1}), 0), ṗ(Ĩ_{k−1}, 0) ), ṗ(Ĩ0, 0) )                            : j = 0
  f{j}( … ( f{0}( f{k−1}( ṗ(Z̃(φ_{k−1}), 0), ṗ(Ĩ_{k−1}, 0) ), ṗ(Ĩ0, 0) ), … ), ṗ(Ĩj, 0) ) : 0 < j < k−1
  f{k−1}( ṗ(Z̃(φ_{k−1}), 0), ṗ(Ĩ_{k−1}, 0) )                                              : j = k−1
                                                                                      (10.4)

The following formula defines transitions of Ñ. Note that the "artifact" initial values of any
preserved φj for j ≠ k−1, obtained through unfolding, imply that valuations to φ̃j will
stutter from time-step 0 to 1. We therefore require consideration of the type of abstraction
in this formula. Let σj(i) = (k − 1 − j + i) mod k. Term γj(v) is 1 if a φ_{j′} LATCH was
preserved for σj(j′) > σj(C(v)) in any path e containing v (refer to Definition 10.6), else
γj(v) is 0. Term δj(v) = (C(v) ≠ k−1) ∧ ¬γj(v). We also use δj(v) in our k-phase
bisimilarity of Definition 10.7 and our trace lifting algorithm of Figure 10.7 to ignore the
time-0 valuations to vertices v with δj(v) = 1, due to the above-mentioned stuttering. We
again define the numerical sequence a_j^0, …, a_j^{k−1} as a_j^i = (k + j − i) mod k for (10.5).
Let Ṽ′j be a subset of Ṽj such that ∀v, v′ ∈ Ṽ′j : γj(v) = γj(v′), and let i be δj(Ṽ′j) + c for
any c ∈ ℕ.

ṗ(Ṽ′j, i+1) = f{a_j^0}( f{a_j^1}( … ( f{a_j^{k−1}}( f{j}( ṗ(φ̃j, i), ṗ(Ĩj, i) ),        (10.5)
              ṗ(Ĩ_{a_j^{k−1}}, i+1−γj(Ĩ_{a_j^{k−1}})) ), … ), ṗ(Ĩ_{a_j^1}, i+1−γj(Ĩ_{a_j^1})) ), ṗ(Ĩ_{a_j^0}, i+1) )

The first sequence of formula (10.5), like that of (10.2), represents a nesting of functions
f{a_j^0}(f{a_j^1}(…(f{a_j^{k−2}}(f{a_j^{k−1}}. The second sequence closes the first with the corresponding
vertices ṗ(Ĩ_{a_j^{k−1}}, i+1−γj(Ĩ_{a_j^{k−1}}))), ṗ(Ĩ_{a_j^{k−2}}, i+1−γj(Ĩ_{a_j^{k−2}}))), …), ṗ(Ĩ_{a_j^1}, i+
1−γj(Ĩ_{a_j^1}))), ṗ(Ĩ_{a_j^0}, i+1)).

Comparing formulas (10.2) and (10.5), we note that similarly to c-slow abstraction,
the semantic effect of phase abstraction is to fold time modulo k. Furthermore, the time-j
valuations to the color-j vertices as per (10.1) correspond to the time-0 valuations to the
abstracted vertices as per (10.4) and (10.5).
Definition 10.7. A k-phase bisimulation relation1 with respect to bisimilar vertex sets A
and Ã, where ∀a, a′ ∈ A: (C(a) = C(a′)) ∧ (γ_{C(a)}(a) = γ_{C(a′)}(a′)), holds between k-phase
netlist N and its abstraction Ñ, respectively, iff there exists a bijective mapping ξ : A ↦ Ã
which satisfies the following conditions. Let ζ(a) = (C(a) + 1) mod k.

1. ∀p ∈ P. ∃p̃ ∈ P̃. ∀j ∈ ℕ. ∀a ∈ A: p(a, ζ(a) + j·k) = p̃(ξ(a), j + δ_{C(a)}(a))
2. ∀p̃ ∈ P̃. ∃p ∈ P. ∀j ∈ ℕ. ∀a ∈ A: p(a, ζ(a) + j·k) = p̃(ξ(a), j + δ_{C(a)}(a))

Note that k-phase bisimilarity leaves an initial semantic gap due to ζ(a), necessary
because the initial values for φj LATCHes (where 0 < j < k−1) may not be represented
in the abstracted netlist. We will use BMC to patch this hole during invariant checking,
thereby performing a temporal decomposition of the verification task. We define k-phase
bisimilarity only with respect to a vertex set A of a single color, and with identical γ values.
This not only guarantees applicability of the results of formulas (10.1)-(10.5), but is
necessary since certain cross-products of concurrent valuations to multi-colored vertices in
N may become unreachable in Ñ through the temporal folding inherent in phase abstraction.
Furthermore, the stuttering at times 0 and 1 of the preserved lower-or-equal-colored
LATCH causing δ_{C(a)}(a) = 1 requires special consideration – the addition of δ_{C(a)}(a) – in
this bisimilarity.
Lemma 10.2. If Ñ is a k-phase abstraction of N, then N is k-phase bisimilar to Ñ with
respect to any corresponding vertex sets A and Ã such that ∀a, a′ ∈ A: (C(a) = C(a′)) ∧
(γ_{C(a)}(a) = γ_{C(a)}(a′)).

Proof. Formulas (10.1) and (10.4) demonstrate bisimilarity of the time-C(a) valuations of
N to the time-0 valuations of Ñ for C(a) ≠ k−1. Bisimilarity of the time-0 valuations of
color-(k−1) vertices follows from (10.3) and (10.4). Correspondence of k transitions of N to
one transition of Ñ follows from (10.2) and (10.5). Note that incrementing the color of the
vertices to use as the offset ζ(a) within the clock period is necessary as per (10.2). Note
also that for δ_{C(a)}(a) = 1, valuations to ã reflect a stuttering from time 0 to 1 which is not
present in N, correlating to the addition of δ_{C(a)}(a) in Definition 10.7. □

1Though in this thesis we only require trace equivalence for invariant checking, as we demonstrate in [16]
this abstraction preserves a type of bisimilarity.
Lemma 10.3. Let N and Ñ be k-phase bisimilar with respect to vertex sets A and Ã,
and N′ and Ñ′ be k-phase bisimilar with respect to vertex sets A′ and Ã′. A k-phase
composition N″ = N ∘k N′ is k-phase bisimilar to Ñ″ = Ñ ∘k Ñ′ with respect to A ∪ A′
and Ã ∪ Ã′ provided that ∀a, a′ ∈ {A ∪ A′}: (C(a) = C(a′)) ∧ (γ_{C(a)}(a) = γ_{C(a)}(a′)).

Proof. This proof follows immediately from (10.1)-(10.5) and the analysis of Lemma 10.2,
noting that as per Definition 10.3, the rules of composition of k-phase netlists yield another
k-phase netlist. □

Lemmas 10.2 and 10.3 allow us to apply the various k-phase abstractions independently
on each dependent layer, and still render a k-phase bisimilar netlist on the composition
of the abstractions.
Our top-level phase-abstraction algorithm is depicted in Figure 10.6. We first color
the netlist using the algorithm implied by Definition 10.2 and a seed provided from the
clocking logic. We use BMC to determine if any targets of color 1, …, k−2 may be hit by
the initial values of the corresponding LATCHes, since as demonstrated by Definition 10.7,
phase abstraction may not preserve those initial values. Practically, this BMC is rarely necessary
since k is almost always equal to 2 for industrial designs; furthermore, the likelihood
that a target is of color k−1 tends to be rather high regardless of k. Even when necessary,
constant propagations of initial values are likely to trivialize the BMC call, similarly to the
observation that a retiming stump tends to be small due to constant propagations (refer to
Section 6.4). Note that we need only perform a BMC for one time-step – this is because
the initial values of these intermediate-colored vertices do not propagate, and because the
φj LATCHes hold their initial values through time j−1 as per formula (10.3). We next partition
the netlist, and abstract via preserve-φi for a maximal-colored i of smallest LATCH
cardinality for each element of the partition.

void PhaseAbstract(Netlist N)
  1. Color netlist N.
  2. If k < 2, or N cannot be k-colored, or T ∩ D ≠ ∅, no abstraction is possible.
  3. Run BMC on all targets of color {1, …, k−2} for time 0.
  4. Partition the netlist into MDLs.
  5. Perform phase abstraction. For each MDL Ai:
     (a) Bi = max{j : ∀j′ ≠ j. |φj ∩ Ai| ≤ |φ_{j′} ∩ Ai|}.
     (b) Perform a preserve-φ_{Bi} abstraction on Ai.

Figure 10.6: PhaseAbstract algorithm
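Step 5(a) of Figure 10.6 reduces to a small computation per MDL. A hedged Python sketch (the function and parameter names are invented):

```python
def choose_preserved_phase(mdl, phase_of, k):
    """Pick B_i = max{j : |phi_j in A_i| <= |phi_j' in A_i| for all j' != j}:
    the maximal color among the phases of smallest LATCH cardinality."""
    counts = [0] * k
    for latch in mdl:
        counts[phase_of[latch]] += 1
    smallest = min(counts)
    return max(j for j in range(k) if counts[j] == smallest)
```

Preserving the smallest phase minimizes the remaining state elements; ties break toward the maximal color, whose initial values need no unfolding.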
Our algorithm for trace lifting is depicted in Figure 10.7. The color-C(v) vertices
are those of semantic importance at offset ζ(v) within the k-step clock period, because at
such times the FREE vertices of color C(v) are those that impact transitions of the netlist
as per (10.2). Terms γ(v) and δ(v) capture the type of abstraction. If δ(v) = 1, then
the corresponding MDL was abstracted with a "preserve-φ_{C(v)} or lesser" abstraction for
C(v) ≠ k−1, hence the valuation must be pushed back one clock period to properly
capture its temporal correlation as per (10.5), causing its time-0 valuation to be dropped.
For trace lifting, we do not define γ and δ on a per-j basis unlike (10.5), because we root
our evaluation directly to color-(k−1) vertices.
Theorem 10.1.Phase abstraction is sound and complete for invariant checking.
Proof. A target unreachableresult will be generated by phase abstraction only if the corre-
sponding abstracted target was proven to be unreachable by achild verification flow. From
Lemmas 10.2 and 10.3, phase abstraction preserves all valuations to colored vertices other
Partial_Trace Lift_Trace(Partial_Trace p′) {
  foreach v ∈ {V \ D} {
    γ(v) = (C(v) + 1) mod k;
    λ(v) = (MDL(v) was abstracted via preserve-Φ_j for j > C(v));
    δ(v) = (C(v) ≠ k−1) ∧ ¬λ(v);
    for (i = 0; i < length(p′); i++) {
      j = k · (i − δ(v)) + γ(v);
      if ((j ≥ 0) ∧ (∃b. ⟨(ṽ, i), b⟩ ∈ p′)) {
        p = p ∪ ⟨(v, j), p′(ṽ, i)⟩;
      }
    }
  }
  return p;
}

Figure 10.7: Phase abstraction trace lifting algorithm
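The lifting of Figure 10.7 can be sketched in Python as follows. This is an illustrative rendering under simplifying assumptions (traces as dictionaries, λ reduced to a per-vertex Boolean derived from a hypothetical mdl_preserved map), not the thesis implementation.

```python
def lift_trace(p_abs, color, mdl_preserved, k, length):
    """Lift an abstracted-netlist trace back to the k-phase netlist.

    p_abs: dict mapping (vertex, abstract_time) -> Boolean valuation.
    color: dict mapping vertex -> its color C(v).
    mdl_preserved: dict mapping vertex -> the color preserved in its MDL.
    Returns the lifted trace p as a dict over (vertex, original_time).
    """
    p = {}
    for v in color:
        gamma = (color[v] + 1) % k
        lam = mdl_preserved[v] > color[v]          # preserve-Phi_j for j > C(v)
        delta = int(color[v] != k - 1 and not lam)
        for i in range(length):
            j = k * (i - delta) + gamma            # shift back one period if delta
            if j >= 0 and (v, i) in p_abs:         # negative times are dropped
                p[(v, j)] = p_abs[(v, i)]
    return p
```

Each abstract time-step i expands to original time k·(i − δ(v)) + γ(v), so a vertex with δ(v) = 1 loses its time-0 valuation, as described above.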
than initial values of LATCHes of color 1, …, k−2. Additionally, a BMC call is used to
guarantee that targets of color 1, …, k−2 are not hittable at time 0; BMC thus fills in this
temporal gap for invariant checking. Hence unreachable results are correct.
Note that, unlike c-slow bisimilarity from Lemma 9.1, in a k-phase netlist a valuation to a particular FREE vertex of color j only affects transitions of the trace for one
time-step, a = k·i + (j+1) mod k, per k time-steps a, …, a+k−1. However, (10.3)
demonstrates that any valuation to V_j during the range between consecutive transparent
states of Φ_j is producible at each time-step during this range, thus this characteristic does
not impact invariant checking. The stuttering of state elements in the combinational fanin
of vertices ṽ for which δ_{C(v)}(v) = 1 similarly does not impact invariant checking.
A target hit result will be generated only when an abstracted target t̃ is hit by BMC
or a child verification flow. If the target was hit by BMC, this result is correct by assumption. Otherwise, we note that the corresponding trace p′ is semantically correct with respect
to Ñ and hits t̃ by assumption. Comparing our trace lifting algorithm from Figure 10.7 to
(10.1) and (10.2), and to (10.4) and (10.5), we see that the lifting correctly temporally
transforms the trace into one that is semantically consistent with N. Furthermore, our trace
lifting algorithm propagates every valuation to colored vertices from the abstracted trace,
aside from time-0 valuations to vertices ṽ for which δ_{C(v)}(v) = 1, which are equivalent to their
time-1 valuations, and aside from time-0, …, j−1 valuations to color-j vertices for j ≠ k−1,
which cannot comprise a target hit as validated by BMC. Our target is colored, hence the
lifted trace also hits the target.
Theorem 10.2. Phase abstraction generates a legal netlist.
Proof. We consider the requirements for legality enumerated in Definition 3.24.
1. The only gates fabricated by phase abstraction are one-input AND gates, REGISTERs,
multiplexors, and cloned vertices, all of which are legal (possibly by assumption).

2. Each set of fabricated gates (see the previous point) which replaces a gate in the k-phase netlist is of constant size. Thus Ñ is finite by assumption.

3. Phase abstraction replicates LATCH initial values, which are combinational by assumption, to use for preserved-Φ_{k−1} REGISTERs, and uses combinational unfolding
to generate initial values of preserved-Φ_i REGISTERs for i ≠ k−1. Hence all initial
values of Ñ are combinational.

4. By assumption, every original directed cycle will include every LATCH color at least
once (else we do not have a legal k-phase netlist). Phase abstraction will guarantee
that at least one LATCH of each path from a Φ_0 to a Φ_{k−1} will be translated to a
REGISTER, hence phase abstraction cannot generate combinational cycles.
Theorem 10.3. If the diameter of a set of vertices Ũ of phase-abstracted netlist Ñ is d(Ũ), then the diameter of the corresponding vertices U of the k-phase netlist N, provided that ∀u, u′ ∈ U: (C(u) = C(u′)) ∧ (γ_{C(u)}(u) = γ_{C(u)}(u′)), is at most k · d(Ũ).
Proof. By Definition 4.2, if the diameter of the phase-abstracted vertex set Ũ is d(Ũ), then the longest required duration to witness a particular valuation to Ũ is d(Ũ) time-steps. From Lemma 10.2, we know that phase abstraction folds time modulo-k. Therefore,
any transition of states in Ñ correlates to k transitions in N, and the corresponding valuation to U will occur within k · d(Ũ) time-steps.
10.1 Phase Abstraction Algorithms

In this section we discuss our algorithms for abstracting k-phase netlists. Because use
of implicit-clocked REGISTERs simplifies a toolset, and since phase abstraction is not iteratively applicable, it is often beneficial to perform phase abstraction during the design
compile and import process.
Several important algorithms were already provided in Figure 10.6 (our top-level
phase abstraction algorithm) and in Figure 10.7 (for trace lifting). We thus need only
provide our MDL partitioning algorithm in Figure 10.8. Note that this algorithm may readily be optimized so that each net is considered only a constant number of times in fanout
traversal as well as fanin traversal, thus ensuring linearity of the phase abstraction process.
Theorem 10.4. Algorithm PhaseAbstract performs optimal k-phase abstraction reductions for two-phase netlists.
Proof. By Lemma 10.1, there is a unique MDL partition of a netlist. Each MDL is of minimum size, resulting in a maximum number of dependent layers in the netlist. Since each
MDL may be abstracted independently of the others, the locally optimal solutions yield a
globally optimal result for two-phase netlists. This property follows from the observation
that eliminating any LATCH l from a two-phase MDL implies that all LATCHes l′ in the
combinational fanin or fanout of this LATCH (within the MDL) must be preserved, which
in turn implies that all LATCHes in the combinational fanin or fanout of l′ must be eliminated, and so on. Therefore, all LATCHes of a single color must be eliminated, and the
other color preserved, within the MDL.
Partition K_Phase_Partition(Netlist N) {
  i = −1;
  foreach v ∈ L {
    if (v ∈ ⋃_{j=0..i} A[j]) { continue; }
    i++;
    A[i] = in_q = out_q = {v};
    while (¬empty(in_q) ∨ ¬empty(out_q)) {
      if (¬empty(in_q)) {
        u = Pop(in_q);
        if (C(u) ≤ 0) { continue; }
        Γ = {w : w ∈ combinational_fanin(data(u)) ∧ w ∈ L} \ A[i];
        Assert(C(Γ) = C(u) − 1);
        Push(in_q, Γ); Push(out_q, Γ); A[i] = A[i] ∪ Γ;
      }
      if (¬empty(out_q)) {
        u = Pop(out_q);
        if (C(u) ≥ k − 1) { continue; }
        Γ = {w : w ∈ combinational_fanout(u) ∧ w ∈ L} \ A[i];
        Assert(C(Γ) = C(u) + 1);
        Push(in_q, Γ); Push(out_q, Γ); A[i] = A[i] ∪ Γ;
      }
    }
  }
  return A[];
}

Figure 10.8: MDL partitioning algorithm
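The partitioning of Figure 10.8 amounts to a breadth-first flood from each unvisited LATCH, growing through combinational fanin (down toward color 0) and fanout (up toward color k−1). A minimal Python sketch, with the two queues of the figure collapsed into a single worklist and hypothetical fanin/fanout maps:

```python
from collections import deque

def mdl_partition(latches, color, fanin, fanout, k):
    """Partition a k-phase netlist's LATCHes into MDLs (illustrative sketch).

    latches: iterable of LATCH names; color: LATCH -> color in 0..k-1;
    fanin/fanout: LATCH -> set of LATCHes one combinational step away.
    Returns a list of MDL sets.
    """
    partition = []
    seen = set()
    for v in latches:
        if v in seen:
            continue
        mdl = {v}
        seen.add(v)
        queue = deque([v])
        while queue:
            u = queue.popleft()
            grow = set()
            if color[u] > 0:          # fanin traversal stops at color 0
                grow |= fanin.get(u, set())
            if color[u] < k - 1:      # fanout traversal stops at color k-1
                grow |= fanout.get(u, set())
            for w in grow - mdl:
                mdl.add(w)
                seen.add(w)
                queue.append(w)
        partition.append(mdl)
    return partition
```

Each LATCH is enqueued at most once, so the flood is linear in the number of LATCH-to-LATCH connections, mirroring the linearity argument above.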
For the relatively uncommon case of k ≥ 3, this linear-time algorithm may not
yield an optimal solution. Consider the 3-phase MDL of Figure 10.9, where the numbers
after the slashes indicate the “width” of the corresponding vectored LATCHes. Preserving any single phase will yield a solution with three REGISTERs, whereas preserving the Φ_1 A2 and the Φ_0 A3 yields an optimum solution of two. Clearly the optimum solution
is achievable in superlinear polynomial time by solving an s-t node min-cut problem² on

²One of the most efficient known algorithms for solving the s-t min-cut problem is the highest-label preflow-push algorithm, which is O(|V|² · |E|^{1/2}) [105].
Figure 10.9: Example three-phase MDL (vectored LATCHes A1/2, A2/1, A3/1, A4/2, A5/3, spanning phases Φ_0, Φ_1, Φ_2; the number after each slash is the LATCH width)
the LATCH connectivity graph (whose vertices are all LATCHes, and whose directed edges
represent a combinational fanout connectivity between LATCHes of color i to color i+1 for 0 ≤ i < k−1) between sources of color 0 and sinks of color k−1. Rather than
spending superlinear resources for phase abstraction, it is our experience that an optimal
tool implementation will use a linear technique to achieve efficient phase abstraction (yet
to obtain superior reductions compared to a global preserve-Φ_i approach [106]), then to
subsequently use other superlinear reduction techniques such as retiming (refer to Chapter 6) to provide additional reductions. As noted in [106], retiming will compensate for any
suboptimality in the phase abstraction process, thus a global preserve-Φ_i approach coupled
with retiming may be a reasonable choice for a simpler tool implementation aside from the
extra processing time required for retiming the less optimal phase-abstracted netlist. Alternatively, if optimal phase abstraction is desired, a more sophisticated algorithm may readily
be incorporated into the above framework to perform finer-grained abstraction decisions.
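For illustration, the s-t node min-cut formulation above can be sketched as follows. This hedged example uses the textbook node-splitting construction and a simple Edmonds-Karp max-flow (not the highest-label preflow-push algorithm of the footnote), with hypothetical names throughout.

```python
from collections import deque

def min_preserved_width(widths, color, edges, k):
    """Optimal phase-abstraction choice as an s-t node min-cut (sketch).

    widths: latch -> vector width; color: latch -> 0..k-1;
    edges: (u, v) pairs where v is in the combinational fanout of u.
    Node capacities are modeled by splitting each latch into in/out halves;
    returns the minimum total width of LATCHes that must be preserved.
    """
    INF = float("inf")
    cap = {}

    def add(u, v, c):
        cap[(u, v)] = cap.get((u, v), 0) + c
        cap.setdefault((v, u), 0)          # residual edge

    for v, w in widths.items():
        add((v, "in"), (v, "out"), w)      # node capacity = latch width
    for u, v in edges:
        add((u, "out"), (v, "in"), INF)
    for v in widths:
        if color[v] == 0:
            add("s", (v, "in"), INF)       # sources: color-0 latches
        if color[v] == k - 1:
            add((v, "out"), "t", INF)      # sinks: color-(k-1) latches

    adj = {}
    for (u, v) in cap:
        adj.setdefault(u, []).append(v)

    flow = 0
    while True:  # Edmonds-Karp: BFS for a shortest augmenting path
        parent = {"s": None}
        q = deque(["s"])
        while q and "t" not in parent:
            u = q.popleft()
            for v in adj.get(u, []):
                if v not in parent and cap[(u, v)] > 0:
                    parent[v] = u
                    q.append(v)
        if "t" not in parent:
            return flow                    # max-flow = min-cut weight
        path, v = [], "t"
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        bottleneck = min(cap[e] for e in path)
        for u, v in path:
            cap[(u, v)] -= bottleneck
            cap[(v, u)] += bottleneck
        flow += bottleneck
```

By max-flow/min-cut duality, the returned flow equals the minimum total width of a set of LATCHes whose removal separates the color-0 sources from the color-(k−1) sinks, i.e., an optimal choice of LATCHes to preserve as REGISTERs.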
10.2 Related Work
A similar technique for phase abstraction was proposed in [106] for sequential hardware
equivalence. They propose globally converting LATCHes of all but a single phase into
buffers. Their work proves correctness for the steady-state subgraph of the abstracted
netlist; as such, initial values are discarded. However, this approach is insufficient for
invariant checking; modern hardware designs typically require an explicit initialization se-
quence (e.g., via scan chains) before proper functionalityis ensured. Failure to consider
initial values, and transitions outside of the steady state, may fail to expose certain arbi-
trarily complicated design flaws existing before steady state is reached, and even prevent
the netlist from reaching its intended steady state. Our approach does preserve initial val-
ues, and we prove a bisimilarity between the original and abstracted netlists relative to
their initial states. The technique in [106] further proposed a globally greedy approach of
“removing all but the smallest phase set” of LATCHes. They propose retiming as a sec-
ond reduction step to ensure minimal LATCH counts. Our work calculates minimally-sized
partitions of the netlist, and allows a greedy choice of which phases to discard for each
partition, independently of the other partitions, hence is able to achieve reductions beyond
those possible with the technique of [106] alone (without the more costly retiming step). A
customized algorithm for efficient image calculation upon k-phase netlists, exploiting the
distinctness of vertices of each color, is presented in [106] – though it does not offer the
above-mentioned benefits of our structural transformation.
Many hardware compilers allow automatic translation of master-slave LATCH sets
into a single REGISTER. Retiming algorithms [68] may be used to retime the netlist such
that Φ_0–Φ_{k−1} layers become adjacent and one-to-one. However, use of such a mechanism
would require interpretation of LATCHes in addition to REGISTERs in a re-entrant retiming engine, and furthermore would require additional retiming constraints to ensure such
adjacency, which is somewhat unattractive. Additionally, retiming requires quadratic resources or greater³; a prior linear-time phase abstraction may offer significant speed-ups
to retiming algorithms by decreasing REGISTER count, as was also observed in [106]. We
therefore have found that use of retiming to perform phase abstraction is less attractive than
our approach. However, phase abstraction and retiming offer complementary benefits, and
we have found the subsequent use of retiming extremely beneficial.
³Retiming is solvable as a min-cost flow problem [66], for which one of the most efficient known algorithms is the enhanced capacity scaling algorithm, which is O(|E| · log(|V|) · (|E| + |V| · log(|V|))) [75].
Stuttering bisimulation [28], which relates two machines which are semantically
equivalent except that either may add “repetitious state transitions” that do not appear in
the other, is a related concept. The satisfaction of two stuttering bisimilar states is identical for CTL* formulae with no X(f) subformulae [107], which covers invariant checking.
Stuttering bisimilarity offers some degree of insight into the nature of our k-phase bisimilarity. Indeed, k-phase abstraction yields an abstracted netlist Ñ related to its original N by a k-stuttering upon sets of vertices with no FREE vertices in their combinational fanin.
Therefore, the results of prior research on stuttering bisimulation hold between selective
vertices of N and Ñ. However, there are several important distinctions between these topics. One is that we establish trace-equivalence over vertices with FREE vertices in their
combinational fanin, which generally do not stutter. Additionally, there is one fundamental
contribution of this work beyond (or leading to) stuttering bisimulation: we provide linear-time algorithms that analyze and transform the structure of a netlist, hence there is no need
to transform or even analyze the state transition graph of a netlist to achieve our reductions.
Thus, while stuttering bisimulation is a good framework from which to theoretically understand phase abstraction, it does not offer a practically useful mechanism to perform phase
abstraction on very large netlists.
Related to the topic of phase abstraction is c-slow abstraction (refer to Chapter 9).
Like a k-phase netlist, the topology of c-slow netlists guarantees that any directed cycle has
modulo-c state elements; a similar c-coloring may be applied to both netlist types. However, unlike phase abstraction, c-slow abstraction is only applicable to netlists composed
of REGISTERs. Furthermore, the state elements of c-slow netlists generally do not stutter
whatsoever, and all of their initial values have semantic importance, unlike k-phase netlists.
The use of c-slow abstraction after phase abstraction may be a beneficial verification strategy; these techniques are complementary.
The work of [108] provides a methodology for a specification to operate at a different time-scale than a hardware implementation, to increase the utility of assume-guarantee
reasoning. However, they do not focus on abstractions to enhance verification, only on the
mechanics of interfacing a specification in one time-scale to an implementation in another.
The work of [109] provides a general set of formalisms to relate various transformations
of netlists with various latching and clocking schemes, such as multi-phase netlists, re-
timed netlists, and netlists with multiple clock domains. However, their approach does not
address techniques for reducing netlist size.
In [76], Touati and Brayton proposed a method for adding reset logic which forces
an equivalent initial state for retimed netlists. This reset logic is similar to our technique of
preserving the initial value of eliminated phase-(k−1) LATCHes as depicted in Figure 10.5.
10.3 Experimental Results
Our experimental results are reported for two-phase abstraction using the model checker
RuleBase [103]. This algorithm has been deployed for use on many components of IBM’s
Gigahertz Processor. The results of this reduction on several components of this processor
are provided in Table 10.1. During the initial stages of model checking, this abstraction was
not available. Once the abstraction became available, properties which previously required
many hours to complete would finish in several minutes. More encompassing properties,
which would not otherwise complete, became feasible on the abstracted netlist.
These experiments were run on an IBM RS/6000 Workstation Model 590 with 2 GB
main memory. RuleBase was run with redundancy removal reductions enabled. These ex-
periments were run with a random initial BDD order (though pairing present-state and next-
state variables), and with dynamic reordering enabled using the technique of Rudell [104].
One property run on the Load Serialization Logic required 25.6 seconds and 36 MB
on the abstracted netlist (with 81 FREE plus REGISTER variables), including phase abstrac-
tion resources. The same property required 450.2 seconds and 92 MB for the unabstracted
netlist (with 116 variables). A more challenging property run on the Instruction Flushing
Logic Function                               State Elements     State Elements
                                             Before Reduction   After Reduction
Load Serialization Logic                           8096               2586
L1 Cache Reload Logic                              3102               1418
Instruction Flushing Logic                          138                 69
Instruction-Fetch Address Generation Logic         4891               2196
Branch Logic                                       6918               3290
Instruction Issue Logic                            6578               3249
Tag Management Logic                                578                289
Instruction Decode Logic                           1980                978
Load / Store Control                                821                409
Table 10.1: Phase abstraction results for GP netlists
Logic required 852 seconds of user time and 48 MB on the abstracted netlist (with 96 vari-
ables). This same property did not complete on the unabstracted netlist (with 162 variables)
within 72 hours.
While it may seem surprising in these two cases that the number of variables after
phase abstraction is more than half that without phase abstraction, this is due to several
phenomena. First, some of these variables are used for the driver and property automata;
these may be modeled directly as REGISTERs rather than Φ_0–Φ_1 LATCHes even for the unabstracted two-phase netlist. Second, phase abstraction does not eliminate FREE variables.
Third, since these results include redundancy removal, some of the initial LATCH variables
may be eliminated by this technique.
With this abstraction, as demonstrated above, model checking was able to verify
much “larger” and more meaningful properties in less time. All RuleBase users on the
Gigahertz processor project began running exclusively with this abstraction. There have
been more than one thousand formulae written and model checked to date on this project,
which collectively have exposed more than two hundred bugs at various design stages.
This abstraction thus provided an efficient means to help alleviate the verification burdens
imposed by the low level of the high-performance implementation.
Additionally, we have implemented phase abstraction within the transformation-
based verification system discussed in previous chapters; all Gigahertz processor experi-
ments mentioned in those chapters use this technique. Though elimination of state elements
often reduces verification complexity as per the above results, such an approach risks ex-
ponential blowup of BDDs representing the composite cones. Advanced BDD-based tech-
niques such as implicit conjoining [110], or fine-grained reachability analysis [44], may
be used to minimize such risk. However, it is noteworthy that in more than five years of
deploying this technique for model checking and invariant checking on nearly 100 design
components, we never once needed to disable this abstraction to prevent BDD-blowup.
Chapter 11
Conclusions and Future Work
In this chapter we summarize the contributions of this thesis, and discuss future research di-
rections. Our overall research thrust has focused upon the deployment of structural analysis
and abstraction techniques to enhance hardware verification. At a high-level, our contribu-
tions are two-fold: we discuss a set of structural abstraction techniques to simplify netlist
representations, and we provide theory for compositionally and structurally deriving di-
ameter bounds from netlist partitions which enables the use of abstractions to help tighten
these bounds. A common theme across many of these techniques is that a temporal decomposition of the verification task enables significant spatial reductions. We develop all
techniques as re-entrant modules, allowing arbitrary sequencing of and synergy between
these techniques under a transformation-based verification framework as proposed in [10].
Numerous experimental results have been provided to demonstrate the power and synergy
of these techniques, as well as their overall ability to increase the capacity of automated
proof systems. Our specific contributions include the following.

• Our diameter approximation techniques are discussed in Chapter 4. We discuss a
structural algorithm for overapproximating design diameter. Though overapproximate, this approach is very efficient and able to yield tight bounds for some netlists
for which other approximate techniques (such as recurrence diameter) are exponentially loose. We perform the overapproximation based upon a partitioning of the
netlist, and develop theory to allow arbitrary methods to be used on a per-component
basis. Additionally, we discuss the effects of our abstraction techniques upon diameter in each of the corresponding chapters, allowing per-component transformations
to improve diameter bounds.

• We discuss redundancy removal in Chapter 5. Our contribution in this area is the
technique of on-the-fly retiming, and the efficient AND/INVERTER/REGISTER graph
netlist representation.

• We discuss our technique of generalized retiming in Chapter 6. Our generalizations
include the use of peripheral retiming, NEGATIVE REGISTERs, and a relaxed reset
state in an invariant checking framework. We furthermore propose the concept of
fanin sharing of REGISTERs to enhance min-area retiming.

• We discuss the use of structural cut-based abstraction in Chapter 7, based upon the
technique presented in [88]. This abstraction is useful in eliminating combinational
logic and FREE vertices in a netlist.

• We discuss structural target enlargement in Chapter 8. This technique is capable of
providing significant reductions in netlist size, in addition to the common characteristic of making targets probabilistically easier, and shallower, to hit.

• We discuss generalized c-slow abstraction in Chapter 9. This state folding technique
is capable of providing significant reductions in REGISTER count.

• We discuss phase abstraction in Chapter 10. This approach renders a REGISTER-based netlist from a LATCH-based one, which is easier to support in a verification
toolset and contains significantly fewer sequential elements.
There are numerous future work directions to enhance the results reported herein.
First, structural diameter overapproximation techniques should be improved to enable better bounding for SCCs. Additionally, as semantic approaches (such as QBF and recurrence
diameter) are improved, our compositional theory provides a framework to synergistically
exploit their strengths. Finally, the incorporation and exploitation of other abstraction tech-
niques to help bound netlist or component diameters can further help enable the most effi-
cient diameter overapproximation system, capable of yielding the tightest diameter bounds.
This research direction has the potential to greatly increase automatic proof capacity on
large netlists due to the efficiency of BMC techniques.
Second, there are numerous synergistic abstraction approaches which may be in-
cluded in a transformation-based verification setting, such as symmetry reductions and
more powerful synthesis optimizations. Further enhancements to the techniques proposed
herein are also possible, as discussed in the respective chapters. Furthermore, the use of
our completely automatic structural abstractions may be augmented by the application of
manually-guided abstractions and those that require more abstract netlist representations,
perhaps as a preprocessing step.
Third, as improved verification techniques emerge such as enhancements to SAT and
reachability analysis, the overall capacity of encapsulating verification tools will increase.
This will enhance verification capacity synergistically with abstraction techniques.
Finally, further research into applying these techniques to more general property
checking frameworks may be useful to exploit their potential to reduce verification com-
plexity for more general types of proof (such as liveness).
Appendix A
Appendix
A.1 Modeling Interconnections as Nets
Throughout this thesis we refer to interconnections as nets. This is somewhat imprecise;
a net may have multiple sinks and sources, and is not necessarily “directed,” whereas an
edge has exactly one source and one sink, and is directed. In this section we discuss ways
to model more general interconnections as nets.
1. An interconnection may have multiple sinks. This may simply be handled by representing an n-sink net as n distinct edges in our graph model. Recall that each edge is
semantically equivalent to its source vertex.
2. An interconnection may have multiple sources – and furthermore these sources may
drive conflicting values, or no values, onto the net at any time-step.
In this case, we may need to add additional logic to create a semantically-equivalent
net. For example, if the interconnection acts as an OR between various sources, we
may need to inject an OR gate whose inputs are the sources of the interconnection,
which in turn is the source of the edge correlating to the original interconnection.
If an interconnection is defined as a multi-source bus, a more complicated function
may exist for its behavior in the presence of multiple active sources, for example,
due to varying transistor sizes and drive strengths. Nevertheless, a straightforward
modeling is often possible.
In the case of zero active drivers, there are two common possibilities. First, the net
may act as a pull-up or a pull-down, which may be modeled by logic that drives a
ONE or ZERO, respectively, in case of zero active drivers. If no drivers are active and
no “default-value” logic is in place, or if two or more drivers are driving conflicting
values and driver strengths cannot resolve the value deterministically, we may con-
servatively inject a new driver – a unique FREE vertex, whose value takes dominance
in precisely these ambiguous cases.
Given such conventions, we may transform arbitrary interconnections to nets.
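As an illustration of these conventions, the following Python sketch (a hypothetical function, not from the thesis) resolves one time-step of a multi-source interconnection, applying a pull-up/pull-down default and falling back to a fresh FREE value in the ambiguous cases:

```python
def resolve_net(driver_values, default=None):
    """Model one time-step of a multi-source interconnection (sketch).

    driver_values: list of values driven this time-step; each is True, False,
    or None when that source is not actively driving.
    default: optional pull-up (True) / pull-down (False) value.
    Returns the resolved value, or "FREE" where a fresh FREE vertex must be
    injected (no active driver and no default, or conflicting drivers).
    """
    active = [v for v in driver_values if v is not None]
    if not active:
        return default if default is not None else "FREE"
    if all(v == active[0] for v in active):
        return active[0]
    return "FREE"  # conflicting drivers: the injected FREE vertex dominates
```

In a structural model, the "FREE" outcome corresponds to injecting a new FREE vertex whose value takes dominance in precisely these ambiguous cases.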
A.2 Alternate Gate Types
One may wish to add to the possible gate types of our netlist model – e.g., to add OR or
XOR primitives. We may readily extend Definition 3.11 by defining G_v = f_v(u_1, u_2, …, u_j) for a new gate function f_v. Note that our set of gate types from Definition 3.11 are all
completely symmetric on their inputs. If we wish to include more complex gates which
are not symmetric on inputs (e.g., a multiplexor), our netlist model would need to reflect
the ordering of incoming edges of each vertex – and the graph representation and structural
algorithms may need to be altered appropriately. Synthesis of alternate gate types into our
supported types is possible using common logic decompositions. Our choice of the AND
gate as the only multi-input primitive is motivated by prior research [79, 51, 25].
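For example, the common logic decompositions alluded to above can be sketched as follows, expressing OR and XOR purely in terms of the AND and inversion primitives (an illustrative sketch on Python Booleans):

```python
def NOT(a):
    return not a

def AND(a, b):
    return a and b

def OR(a, b):
    """OR via De Morgan: a + b = NOT(NOT(a) AND NOT(b))."""
    return NOT(AND(NOT(a), NOT(b)))

def XOR(a, b):
    """XOR decomposed into two ANDs and an OR: a XOR b = a*NOT(b) + NOT(a)*b."""
    return OR(AND(a, NOT(b)), AND(NOT(a), b))
```

In an AND/INVERTER graph these decompositions introduce only AND vertices and inverted edges.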
Our use of implicit-clock REGISTERs may seem limiting, since a hardware design
may have GATED-CLOCK REGISTERs, MULTI-PORT REGISTERs, or LEVEL-SENSITIVE
LATCHes. Translation of LEVEL-SENSITIVE LATCHes to REGISTERs is handled by phase
abstraction (refer to Chapter 10). We now show that the alternate REGISTERs may be
readily modeled by the implicit-clock version with some added combinational gates.
Definition A.1. A GATED-CLOCK REGISTER has 2 inputs: data and gate. Its semantics
are defined as follows.

• If i ≥ 0, then p(v, i+1) = ite(p(gate(v), i), p(data(v), i), p(v, i)). Otherwise p(v, 0) = p(Z(v), 0).

As with the LEVEL-SENSITIVE LATCH in Definition 10.1, Definition A.1 requires
an ordering of incoming edges to map the structure of the GATED-CLOCK REGISTER to
a precise semantics. We may model GATED-CLOCK REGISTERs as normal REGISTERs,
with the addition of a feedback path as depicted in Figure A.1[64]. This figure depicts a
GATED-CLOCK REGISTERto the left, with thegatesignal which may force it to holddout.
To the right we indicate a semantically equivalent model of this structure using an implicit-
clock REGISTER. This transformation consists of adding a multiplexor, selected by the
gate, which sensitizes a sampling ofdata if a 1, else a feedback loop from the REGISTER
to emulate the “hold” condition if thegateis a0.
Figure A.1: Remodeling GATED-CLOCK REGISTERs (left: a GATED-CLOCK REGISTER with gate and data inputs driving dout; right: an implicit-clock REGISTER whose data input is a multiplexor selecting data when gate = 1, else the REGISTER's own output)
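The remodeling of Figure A.1 can be illustrated with a small simulation sketch (hypothetical helper names; the values stand in for the valuations p(·, i)):

```python
def ite(sel, then_val, else_val):
    """Multiplexor: the only combinational gate added by the remodeling."""
    return then_val if sel else else_val

def simulate_remodeled_register(init, gate_trace, data_trace):
    """Simulate the Figure A.1 remodeling of a GATED-CLOCK REGISTER.

    The implicit-clock REGISTER's next-state input is a multiplexor that
    selects data when gate is 1, else feeds the REGISTER's own output back
    to emulate the hold condition. Returns the trace of dout valuations.
    """
    state = init
    trace = []
    for g, d in zip(gate_trace, data_trace):
        trace.append(state)
        state = ite(g, d, state)  # multiplexor plus feedback path
    return trace
```

When the gate is 0, the mux routes the REGISTER's output back to its input, so dout holds its previous value, matching the ite semantics of Definition A.1.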
MULTIPLE-PORT REGISTERs may be represented by a set of k > 1 ⟨gate, data⟩ input
ports. They must have a pre-specified permutation of “priorities” between them to define
which input port's data value will be sampled in case of multiple active gates – though
such a condition is almost always a design error. Intuitively, this MULTI-PORT REGISTER
will sample and delay the data of the highest-priority port which has a non-blocking gate,
or hold its value if none are non-blocking. MULTI-PORT REGISTERs may be represented
by generalizing the synthesis of Figure A.1 in the straightforward fashion.
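A sketch of that generalized synthesis: the priority-ordered selection below computes the next state of a hypothetical MULTI-PORT REGISTER, which in hardware would be a chain of multiplexors ending in the Figure A.1 feedback path.

```python
def multiport_next_state(ports, current):
    """Next-state function of a MULTI-PORT REGISTER (illustrative sketch).

    ports: list of (gate, data) pairs, ordered highest priority first.
    Selects the data of the highest-priority port with an active gate;
    if no gate is active, the current value is held via the feedback path.
    """
    for gate, data in ports:
        if gate:
            return data
    return current
```

Each loop iteration corresponds to one multiplexor in the chain, selected by that port's gate.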
Bibliography
[1] G. E. Moore, “Cramming more components onto integrated circuits,” Electronics, vol. 38,
pp. 114–117, April 1965.
[2] P. Gelsinger, P. Gargini, G. Parker, and A. Yu, “Microprocessors circa 2000,” IEEE Spectrum,
vol. 26, pp. 43–47, October 1989.
[3] D. W. Jorgenson and C. W. Wessner, eds., Measuring and Sustaining the New Economy:
Report of a Workshop. National Academies Press, 2002.
[4] N. Weste and K. Eshraghian, Principles of CMOS VLSI Design - A System Perspective. Second Edition. Addison-Wesley Publishing Company, 1993.
[5] M. Srivas, H. Rueß, and D. Cyrluk, “Hardware verification using PVS,” in Formal Hardware
Verification: Methods and Systems in Comparison, pp. 156–205, Springer-Verlag, 1997.
[6] M. Kaufmann, P. Manolios, and J. S. Moore, Computer-Aided Reasoning: An Approach.
Kluwer Academic Publishers, 2000.
[7] A. Aziz, V. Singhal, and R. K. Brayton, “Verifying interacting finite state machines: Com-
plexity issues,” Tech. Rep. UCB/ERL M93/52, Electronics Research Lab, University of Cal-
ifornia at Berkeley, July 1993.
[8] J. M. Ludden, W. Roesner, G. M. Heiling, J. R. Reysa, J. R. Jackson, B.-L. Chu, M. L. Behm,
J. Baumgartner, R. D. Peterson, J. Abdulhafiz, W. E. Bucy, J. H. Klaus, D. J. Klema, T. N.
Le, F. D. Lewis, P. E. Milling, L. A. McConville, B. S. Nelson, V. Paruthi, T. W. Pouarz,
A. D. Romonosky, J. Stuecheli, K. D. Thompson, D. W. Victor, and B. Wile, “Functional
verification of the POWER4 microprocessor and POWER4 multiprocessor systems,” IBM
Journal of Research and Development, vol. 46, pp. 53–76, January 2002.
[9] D. A. Patterson and D. R. Ditzel, “The case for the reduced instruction set computer,” Computer Architecture News, vol. 8, pp. 25–33, October 1980.
[10] A. Kuehlmann and J. Baumgartner, “Transformation-based verification using generalized
retiming,” in Computer-Aided Verification (CAV’01), (Paris, France), pp. 104–117, July 2001.
[11] A. Pnueli, “In transition from global to modular temporal reasoning about programs,” Logics
and Models of Concurrent Systems, vol. F13, pp. 123–144, 1985.
[12] E. M. Clarke and E. A. Emerson, “Design and synthesis of synchronization skeletons using branching-time temporal logic,” in Proceedings of the Workshop on Logic of Programs,
(Yorktown Heights, NY), pp. 52–71, May 1981.
[13] E. A. Emerson, “Temporal and modal logic,” Handbook of Theoretical Computer Science,
vol. B, pp. 996–1072, 1990.
[14] R. Gerth, D. Peled, M. Y. Vardi, and P. Wolper, “Simple on-the-fly automatic verification of
linear temporal logic,” in Protocol Specification Testing and Verification, (Warsaw, Poland),
pp. 3–18, June 1995.
[15] I. Beer, S. Ben-David, and A. Landver, “On-the-fly model checking of RCTL formulas,” in
Computer-Aided Verification (CAV’98), (Vancouver, BC, Canada), pp. 184–194, July 1998.
[16] J. Baumgartner, T. Heyman, V. Singhal, and A. Aziz, “Model checking the IBM Gigahertz
Processor: An abstraction algorithm for high-performance netlists,” in Computer-Aided Verification (CAV’99), (Trento, Italy), pp. 72–83, July 1999.
[17] J. Baumgartner, A. Tripp, A. Aziz, V. Singhal, and F. Andersen, “An abstraction algorithm
for the verification of generalized C-slow designs,” in Computer-Aided Verification (CAV’00),
(Chicago, IL), pp. 5–19, July 2000.
[18] O. Coudert, C. Berthet, and J. C. Madre, “Verification of synchronous sequential machines
based on symbolic execution,” in International Workshop on Automatic Verification Methods
for Finite State Systems, (Grenoble, France), pp. 365–373, June 1989.
[19] J. R. Burch, E. M. Clarke, K. L. McMillan, D. L. Dill, and L. J. Hwang, “Symbolic model
checking: 10²⁰ states and beyond,” in Fifth Annual Symposium on Logic in Computer Science, (Philadelphia, PA), pp. 428–439, June 1990.
[20] T. Niermann and J. H. Patel, “HITEC: A test generation package for sequential circuits,” in
European Conference on Design Automation, (Amsterdam, The Netherlands), pp. 214–218,
February 1991.
[21] J. A. Darringer, D. Brand, J. V. Gerbi, W. H. Joyner, and L. H. Trevillyan, “Logic synthesis
through local transformations,”IBM Journal on Research and Development, vol. 25, pp. 272–
280, July 1981.
[22] E. M. Sentovich, K. J. Singh, C. Moon, H. Savoj, R. K. Brayton, and A. L. Sangiovanni-
Vincentelli, “Sequential circuit design using synthesis and optimization,” in IEEE
International Conference on Computer Design, pp. 328–333, October 1992.
[23] A. Kuehlmann, V. Paruthi, F. Krohm, and M. Ganai, “Robust Boolean reasoning for equiva-
lence checking and functional property verification,” IEEE Transactions on Computer-Aided
Design, vol. 21, December 2002.
[24] J. Baumgartner, A. Kuehlmann, and J. Abraham, “Property checking via structural analysis,”
in Computer-Aided Verification (CAV’02), (Copenhagen, Denmark), pp. 151–165, July 2002.
[25] J. Baumgartner and A. Kuehlmann, “Min-area retiming on flexible circuit structures,” in
IEEE/ACM International Conference on Computer-Aided Design, (San Jose, CA), pp. 176–
192, November 2001.
[26] A. Aziz, V. Singhal, G. M. Swamy, and R. K. Brayton, “Minimizing interacting finite state
machines: A compositional approach to language containment,” in IEEE International Con-
ference on Computer Design, (Cambridge, MA), pp. 255–261, October 1994.
[27] K. Fisler and M. Vardi, “Bisimulation and model checking,” in Correct Hardware Design and
Verification Methods (CHARME’99), (Bad Herrenalb, Germany), pp. 338–341, September
1999.
[28] M. C. Browne, E. M. Clarke, and O. Grumberg, “Characterizing finite Kripke structures in
propositional temporal logic,” Theoretical Computer Science, vol. 59, pp. 115–131, 1988.
[29] A. Aziz, T. R. Shiple, V. Singhal, and A. L. Sangiovanni-Vincentelli, “Formula-dependent
equivalence for compositional CTL model checking,” in Computer-Aided Verification
(CAV’94), (Stanford, CA), pp. 324–337, June 1994.
[30] P. Cousot and R. Cousot, “Abstract interpretation: A unified lattice model for static analy-
sis of programs by construction or approximation of fixpoints,” in ACM Symposium on the
Principles of Programming Languages, (Los Angeles, CA), pp. 238–252, January 1977.
[31] D. Dams, R. Gerth, and O. Grumberg, “Abstract interpretation of reactive systems,” ACM
Transactions on Programming Languages and Systems, vol. 19, no. 2, pp. 253–291, 1997.
[32] E. M. Clarke, O. Grumberg, and D. E. Long, “Model checking and abstraction,” in
Symposium on the Principles of Programming Languages, (Albuquerque, New Mexico), pp. 343–
354, January 1992.
[33] D. E. Long, Model Checking, Abstraction and Compositional Verification. PhD thesis,
Carnegie Mellon University, Pittsburgh, Pennsylvania, July 1993.
[34] R. P. Kurshan, Computer-Aided Verification of Coordinating Processes. Princeton University
Press, 1994.
[35] E. M. Clarke, O. Grumberg, S. Jha, Y. Lu, and H. Veith, “Counterexample-guided
abstraction refinement,” in Computer-Aided Verification (CAV’00), (Chicago, IL), pp. 154–169, July
2000.
[36] R. Hojati and R. K. Brayton, “Automatic datapath abstraction of hardware systems,” in
Computer-Aided Verification (CAV’95), (Liege, Belgium), pp. 98–113, July 1995.
[37] R. Hojati, A. J. Isles, D. Kirkpatrick, and R. K. Brayton, “Verification using uninterpreted
functions and finite instantiations,” in Formal Methods in Computer-Aided Design, (Palo
Alto, CA), pp. 218–232, November 1996.
[38] P.-H. Ho, A. J. Isles, and T. Kam, “Formal verification of pipeline control using controlled
token nets and abstract interpretation,” in IEEE/ACM International Conference on Computer-
Aided Design, (San Jose, CA), pp. 529–536, November 1998.
[39] V. Paruthi, N. Mansouri, and R. Vemuri, “Automatic datapath abstraction for verification of
large scale designs,” in IEEE International Conference on Computer Design, (Austin, TX),
pp. 192–194, October 1998.
[40] K. S. Namjoshi and R. P. Kurshan, “Syntactic program transformations for automatic
abstraction,” in Computer-Aided Verification (CAV’00), (Chicago, IL), pp. 435–449, July 2000.
[41] O. Coudert and J. C. Madre, “A unified framework for the formal verification of sequential
circuits,” in IEEE International Conference on Computer-Aided Design, (Santa Clara, CA),
pp. 126–129, November 1990.
[42] I.-H. Moon, J.-Y. Jang, G. D. Hachtel, F. Somenzi, J. Yuan, and C. Pixley, “Approximate
reachability don’t cares for CTL model checking,” in IEEE/ACM International Conference
on Computer-Aided Design, (San Jose, CA), pp. 351–358, November 1998.
[43] I.-H. Moon, J. H. Kukula, K. Ravi, and F. Somenzi, “To split or to conjoin: the question
in image computation,” in ACM/IEEE Design Automation Conference, (Los Angeles, CA),
pp. 23–28, June 2000.
[44] H. Jin, A. Kuehlmann, and F. Somenzi, “Fine-grain conjunction scheduling for symbolic
reachability analysis,” in Tools and Algorithms for the Construction and Analysis of Systems,
(Grenoble, France), pp. 312–326, April 2002.
[45] E. M. Clarke, D. E. Long, and K. L. McMillan, “Compositional model checking,” in IEEE
Symposium on Logic in Computer Science, (Pacific Grove, CA), pp. 353–362, June 1989.
[46] R. Beers, R. Ghughal, and M. Aagaard, “Applications of hierarchical verification in model
checking,” in Formal Methods in Computer-Aided Design, (Austin, TX), November 2000.
[47] E. A. Emerson and A. P. Sistla, “Symmetry and model checking,” in Computer-Aided
Verification (CAV’93), (Elounda, Greece), pp. 463–478, 1993.
[48] C. N. Ip and D. L. Dill, “Better verification through symmetry,” in Computer Hardware
Description Languages and their Applications, (Ottawa, Canada), pp. 97–111, 1993.
[49] G. S. Manku, R. Hojati, and R. K. Brayton, “Structural symmetry and model checking,” in
Computer-Aided Verification (CAV’98), (Vancouver, BC, Canada), pp. 159–171, July 1998.
[50] M. K. Ganai and A. Kuehlmann, “On-the-fly compression of logical circuits,” in
International Workshop on Logic & Synthesis, (Dana Point, CA), May 2000.
[51] A. Kuehlmann, M. K. Ganai, and V. Paruthi, “Circuit-based Boolean reasoning,” in
ACM/IEEE Design Automation Conference, (Las Vegas, NV), pp. 232–237, June 2001.
[52] Z. Kohavi, Switching and Finite Automata Theory. New York, NY: McGraw-Hill, 1978.
[53] R. E. Bryant, “Graph-based algorithms for Boolean function manipulation,” IEEE
Transactions on Computers, vol. C-35, pp. 677–691, August 1986.
[54] P. F. Williams, A. Biere, E. M. Clarke, and A. Gupta, “Combining decision diagrams and
SAT procedures for efficient symbolic model checking,” in Computer-Aided Verification
(CAV’00), (Chicago, IL), pp. 124–138, July 2000.
[55] K. L. McMillan, “Applying SAT methods in unbounded symbolic model checking,” in
Computer-Aided Verification (CAV’02), (Copenhagen, Denmark), pp. 250–264, July 2002.
[56] H. Cho, G. D. Hachtel, E. Macii, B. Plessier, and F. Somenzi, “Algorithms for approximate
FSM traversal based on state space decomposition,” IEEE Transactions on Computer-Aided
Design, vol. 15, pp. 1465–1478, December 1996.
[57] A. Biere, A. Cimatti, E. M. Clarke, and Y. Zhu, “Symbolic model checking without BDDs,”
in Tools and Algorithms for Construction and Analysis of Systems, (Amsterdam, The Nether-
lands), pp. 193–207, March 1999.
[58] M. K. Ganai, A. Aziz, and A. Kuehlmann, “Enhancing simulation with BDDs and ATPG,”
in ACM/IEEE Design Automation Conference, (New Orleans, LA), pp. 385–390, June 1999.
[59] P.-H. Ho, T. Shiple, K. Harer, J. Kukula, R. Damiano, V. Bertacco, J. Taylor, and J. Long,
“Smart simulation using collaborative formal and simulation engines,” in IEEE/ACM
International Conference on Computer-Aided Design, (San Jose, CA), pp. 120–126, November
2000.
[60] D. Deharbe and A. M. Moreira, “Using induction and BDDs to model check invariants,”
in Correct Hardware Design and Verification Methods (CHARME’97), (Montreal, Canada),
pp. 203–213, October 1997.
[61] M. Sheeran, S. Singh, and G. Stalmarck, “Checking safety properties using induction and
a SAT-solver,” in Formal Methods in Computer-Aided Design, (Austin, TX), pp. 108–125,
November 2000.
[62] L. J. Stockmeyer and A. R. Meyer, “Word problems requiring exponential time,” in
Proceedings of the 5th ACM Symposium on the Theory of Computing, (Austin, TX), pp. 1–9, April
1973.
[63] C.-C. Yen, K.-C. Chen, and J.-Y. Jou, “A practical approach to cycle bound estimation,” in
International Workshop on Logic & Synthesis, (New Orleans, LA), pp. 149–154, June 2002.
[64] R. K. Ranjan, Design and Implementation Verification of Finite State Systems. PhD thesis,
University of California at Berkeley, Berkeley, CA, December 1997.
[65] I. Beer, S. Ben-David, D. Geist, R. Gewirtzman, and M. Yoeli, “Methodology and system for
practical formal verification of reactive hardware,” in Computer-Aided Verification (CAV’94),
(Stanford, CA), pp. 182–193, July 1994.
[66] C. Leiserson and J. Saxe, “Retiming synchronous circuitry,” Algorithmica, vol. 6, pp. 5–35,
1991.
[67] G. P. Bischoff, K. S. Brace, S. Jain, and R. Razdan, “Formal implementation verification of
the bus interface unit for the Alpha 21264 microprocessor,” in IEEE International Conference
on Computer Design, (Austin, TX), pp. 16–24, October 1997.
[68] C. Leiserson and J. Saxe, “Optimizing synchronous systems,” Journal of VLSI and Computer
Systems, vol. 1, pp. 41–67, January 1983.
[69] S. Malik, E. M. Sentovich, R. K. Brayton, and A. Sangiovanni-Vincentelli, “Retiming and
resynthesis: Optimizing sequential networks with combinational techniques,” IEEE
Transactions on Computer-Aided Design, vol. 10, pp. 74–84, January 1991.
[70] S. Hassoun and C. Ebeling, “Experiments in the iterative application of resynthesis and re-
timing,” in International Workshop on Timing Issues in the Specification and Synthesis of
Digital Systems, December 1997.
[71] A. Gupta, P. Ashar, and S. Malik, “Exploiting retiming in a guided simulation based valida-
tion methodology,” in Correct Hardware Design and Verification Methods (CHARME’99),
(Bad Herrenalb, Germany), pp. 350–353, September 1999.
[72] S. Hassoun and C. Ebeling, “Using precomputation in architecture and logic synthesis,” in
IEEE/ACM International Conference on Computer-Aided Design, (San Jose, CA), pp. 316–
323, November 1998.
[73] J. J. Forrest. Personal communication, 2000.
[74] G. B. Dantzig, Linear Programming and Extensions. Princeton University Press, 1963.
[75] J. B. Orlin, “A faster strongly polynomial minimum cost flow algorithm,” in Proceedings of
the 20th ACM Symposium on the Theory of Computing, (Chicago, IL), pp. 377–387, May
1988.
[76] H. J. Touati and R. K. Brayton, “Computing the initial states of retimed circuits,” IEEE
Transactions on Computer-Aided Design, vol. 12, pp. 157–162, January 1993.
[77] G. Even, I. Y. Spillinger, and L. Stok, “Retiming revisited and reversed,” IEEE Transactions
on Computer-Aided Design, vol. 15, pp. 348–357, March 1996.
[78] G. Cabodi, S. Quer, and F. Somenzi, “Optimizing sequential verification by retiming
transformations,” in ACM/IEEE Design Automation Conference, (Los Angeles, CA), pp. 601–606,
June 2000.
[79] E. Lehman, Y. Watanabe, J. Grodstein, and H. Harkness, “Logic decomposition during
technology mapping,” IEEE Transactions on Computer-Aided Design, vol. 16, pp. 813–834,
August 1997.
[80] G. D. Micheli, “Synchronous logic synthesis: Algorithms for cycle-time minimization,”
IEEE Transactions on Computer-Aided Design, vol. 10, pp. 63–73, January 1991.
[81] M. S. Hung, W. O. Rom, and A. D. Waren, Optimization with IBM OSL. Scientific Press,
1993.
[82] R. K. Brayton, G. D. Hachtel, A. Sangiovanni-Vincentelli, F. Somenzi, A. Aziz, S.-T. Cheng,
S. Edwards, S. Khatri, Y. Kukimoto, A. Pardo, S. Qadeer, R. K. Ranjan, S. Sarwary, T. R.
Shiple, G. Swamy, and T. Villa, “VIS: A system for verification and synthesis,” in
Computer-Aided Verification (CAV’96), (New Brunswick, NJ), pp. 428–432, July 1996.
[83] R. K. Ranjan, A. Aziz, R. K. Brayton, B. Plessier, and C. Pixley, “Efficient BDD algorithms
for FSM synthesis and verification,” in International Workshop on Logic & Synthesis, (Lake
Tahoe, NV), June 1995.
[84] I.-H. Moon, G. D. Hachtel, and F. Somenzi, “Border-block triangular form and conjunction
schedule in image computation,” in Formal Methods in Computer-Aided Design, (Austin,
TX), pp. 73–90, November 2000.
[85] A. Dovier, C. Piazza, and A. Policriti, “A fast bisimulation algorithm,” in Computer-Aided
Verification (CAV’01), (Paris, France), pp. 79–90, July 2001.
[86] P. Jain and G. Gopalakrishnan, “Efficient symbolic simulation-based verification using the
parametric form of Boolean expressions,” IEEE Transactions on Computer-Aided Design,
vol. 13, pp. 1005–1015, April 1994.
[87] M. D. Aagaard, R. B. Jones, and C.-J. H. Seger, “Formal verification using parametric
representations of Boolean constraints,” in ACM/IEEE Design Automation Conference, (New
Orleans, LA), pp. 402–407, June 1999.
[88] I.-H. Moon, H. H. Kwak, J. Kukula, T. Shiple, and C. Pixley, “Simplifying circuits for formal
verification using parametric representation,” in Formal Methods in Computer-Aided Design,
(Portland, OR), pp. 52–69, November 2002.
[89] J. H. Kukula and T. R. Shiple, “Building circuits from relations,” in Computer-Aided Verifi-
cation (CAV’00), (Chicago, IL), pp. 113–123, July 2000.
[90] L. R. Ford and D. R. Fulkerson, “Maximal flow through a network,” Canadian Journal of
Mathematics, vol. 8, pp. 399–404, 1956.
[91] J. Yuan, J. Shen, J. Abraham, and A. Aziz, “On combining formal and informal verification,”
in Computer-Aided Verification (CAV’97), (Haifa, Israel), pp. 376–387, June 1997.
[92] C. H. Yang and D. L. Dill, “Validation with guided search of the state space,” in ACM/IEEE
Design Automation Conference, (San Francisco, CA), pp. 599–604, June 1998.
[93] L. de Alfaro, T. A. Henzinger, and F. Y. C. Mang, “Detecting errors before reaching them,”
in Computer-Aided Verification (CAV’00), (Chicago, IL), pp. 186–201, July 2000.
[94] J. R. Burch, E. M. Clarke, D. E. Long, K. L. McMillan, and D. L. Dill, “Symbolic model
checking for sequential circuit verification,” IEEE Transactions on Computer-Aided Design,
vol. 13, pp. 401–424, April 1994.
[95] O. Coudert, C. Berthet, and J. C. Madre, “Verification of sequential machines using Boolean
functional vectors,” in IMEC-IFIP International Workshop on Applied Formal Methods for
Correct VLSI Design, (Leuven, Belgium), pp. 111–128, November 1989.
[96] T. Filkorn, “Functional extension of symbolic model checking,” in Computer-Aided
Verification (CAV’91), (Aalborg, Denmark), pp. 225–232, June 1991.
[97] Y. Hong, P. A. Beerel, J. R. Burch, and K. L. McMillan, “Safe BDD minimization using
don’t cares,” in ACM/IEEE Design Automation Conference, (Anaheim, CA), pp. 208–213,
June 1997.
[98] M. Ganai and A. Aziz, “Enhancements to invariant verification using SIVA,” in International
Workshop on High Level Design Validation and Test (HLDVT’99), (San Diego, CA), Novem-
ber 1999.
[99] M. K. Ganai, Algorithms for Efficient State Space Search. PhD thesis, University of Texas,
Austin, TX, May 2001.
[100] P. Ashar, S. Devadas, and K. Keutzer, “Gate-delay-fault testability properties of multiplexor-
based networks,” Formal Methods in System Design, vol. 2, no. 1, pp. 93–112, 1993.
[101] C. H. Yang and D. L. Dill, “Spotlight: Best-first search of FSM state space,” in International
Workshop on High Level Design Validation and Test (HLDVT’96), (Oakland, CA), November
1996.
[102] P. Yalagandula, V. Singhal, and A. Aziz, “Automatic lighthouse generation for directed state
space search,” in Design, Automation, and Test in Europe, (Paris, France), pp. 237–242,
March 2000.
[103] I. Beer, S. Ben-David, C. Eisner, and A. Landver, “RuleBase: an industry-oriented formal
verification tool,” in ACM/IEEE Design Automation Conference, (Las Vegas, NV), pp. 655–
660, June 1996.
[104] R. Rudell, “Dynamic variable ordering for ordered binary decision diagrams,” in
International Workshop on Logic & Synthesis, (Tahoe City, CA), May 1993.
[105] J. Cheriyan and S. N. Maheshwari, “Analysis of preflow push algorithms for maximum net-
work flow,” SIAM Journal on Computing, vol. 18, no. 6, pp. 1057–1086, 1989.
[106] G. Hasteer, A. Mathur, and P. Banerjee, “Efficient equivalence checking of multi-phase de-
signs using retiming,” in IEEE/ACM International Conference on Computer-Aided Design,
(San Jose, CA), pp. 557–562, November 1998.
[107] E. A. Emerson and J. Y. Halpern, “‘Sometimes’ and ‘not never’ revisited: on branching time
versus linear time temporal logic,” Journal of the ACM, vol. 33, no. 1, pp. 151–178, 1986.
[108] T. A. Henzinger, S. Qadeer, and S. K. Rajamani, “Assume-guarantee refinement between
different time scales,” in Computer-Aided Verification (CAV’99), (Trento, Italy), pp. 208–
221, July 1999.
[109] A. R. Albrecht and A. J. Hu, “Register transformations with multiple clock domains,” in
Correct Hardware Design and Verification Methods (CHARME’01), (Livingston, Scotland),
pp. 126–139, September 2001.
[110] A. J. Hu, G. York, and D. L. Dill, “New techniques for efficient verification with implicitly
conjoined BDDs,” in ACM/IEEE Design Automation Conference, (San Diego, CA), pp. 276–
282, June 1994.
Vita
Jason Raymond Baumgartner received his Bachelor of Science in Electrical Engineering
from the University of Florida in May 1995. He immediately joined IBM’s Server Group
in Austin, TX, becoming involved in hardware verification. He began graduate school in
the Computer Engineering program at the University of Texas at Austin in 1996, receiving
his Master of Science in 1998. He immediately became captivated by algorithms and
mathematical logic, and their implications for formal verification. This interest led him
to begin deploying model checking technologies at IBM. His efforts have uncovered
hundreds of complex design flaws, and helped to establish formal verification as an essential
complementary verification technique for emerging designs. His research is focused upon
automatic abstraction techniques to enable formal verification to scale to large and complex
industrial designs.
Permanent Address: 14936 Purslane Meadow Trail
Austin, TX 78728
This dissertation was typeset with LaTeX 2ε by the author.
LaTeX 2ε is an extension of LaTeX. LaTeX is a collection of macros for TeX. TeX is a trademark of the American Mathematical Society. Some of the macros used in formatting this dissertation were written by Dinesh Das, Department of Computer Sciences, The University of Texas at Austin, and extended by Bert Kay and James A. Bednar.