good programming practices for building less memory-intensive eda applications alan mishchenko...

Good Programming Practices for Building Less Memory-Intensive EDA Applications

Alan Mishchenko
University of California, Berkeley

Outline

- Introduction
  - What is special about programming for EDA
  - Why much of industrial code is not efficient
  - Why saving memory also saves runtime
  - When to optimize for memory
  - Simplicity wins
- Suggestions for improvement
  - Design custom data-structures
  - Store objects in a topological order
  - Make fanout representation optional
  - Use 4-byte integers instead of 8-byte pointers
  - Never use linked lists
- Conclusions

EDA Programming

- Programming for EDA is different from
  - programming for the web
  - programming databases, etc.
- EDA deals with
  - Very complex computations (NP-hard problems)
  - Very large datasets (designs with 100M+ objects)
- Programming for EDA requires knowledge of algorithms/data-structures and careful hand-crafting of efficient solutions
- Finding an efficient solution is often the result of laborious and time-consuming trial and error

Why Industrial Code Is Often Bad

- Heritage code
  - Designed long ago by somebody who did not know or did not care or both
- Overdesigned code
  - Designed for the most general case, which rarely or never happens
- Underdesigned code
  - Designed for small netlists, while the size of a typical netlist doubles every few years, making scalability an elusive target

Less Memory = Less Runtime

- Although not true in general, in most EDA applications dealing with large datasets, smaller memory results in faster code
  - Because most EDA computations are memory-intensive, the effect of CPU cache misses determines their runtime
- Keep this in mind when designing new data-structures

When to Optimize Memory?

- Optimize memory if we store many similar entries (nodes in a graph, timing objects, placement locations, etc.)
  - For example, when designing a netlist, which typically stores millions of individual objects, the object data-structure is very important
  - However, if only a few instances of a netlist are used at the same time, the netlist data-structure is less important

Design Custom Data-Structures

- Figure out what is needed in each application and design a custom data-structure
  - The lowest possible memory usage
  - The fastest possible runtime
  - Simpler and cleaner code
  - Often good data-structures can be reused elsewhere
  - Translation to and from a custom data-structure rarely takes more than 3% of runtime
- Example: In a typical synthesis/mapping application, it is enough to have ‘node’ and there is no need for ‘net’, ‘edge’, ‘pin’, etc.

Store Objects in a Topo Order

- Topological order
  - An order in which the fanins (incoming edges) of a node precede the node itself
- Storing objects in topological order makes it unnecessary to recompute the order when performing local or global changes
  - Saves runtime
- Using topological order reduces CPU cache misses, which occur when computation jumps all over memory
  - Saves runtime
- It is best to have a specialized procedure or command to establish a topo order of the network (graph, etc.)
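
A minimal sketch of such a procedure, assuming the network is acyclic and is given as per-node lists of fanin IDs. The names `FaninLists`, `TopoOrder`, and `IsTopoOrder` are hypothetical and not taken from any particular tool. It uses a depth-first search with an explicit stack, which is one common way to establish the order, not necessarily the one used in a given EDA system.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical representation: fanins[i] lists the IDs of the fanins of node i.
using FaninLists = std::vector<std::vector<std::int32_t>>;

// Returns node IDs ordered so that every fanin precedes the node itself.
// Assumes the network is acyclic. An explicit stack replaces recursion so
// that deep networks do not overflow the call stack.
std::vector<std::int32_t> TopoOrder(const FaninLists& fanins) {
    const std::int32_t n = static_cast<std::int32_t>(fanins.size());
    std::vector<char> visited(fanins.size(), 0);
    std::vector<std::int32_t> order;
    order.reserve(fanins.size());
    // Each stack entry holds a node and the index of its next fanin to explore.
    std::vector<std::pair<std::int32_t, std::size_t>> stack;
    for (std::int32_t root = 0; root < n; ++root) {
        if (visited[root]) continue;
        visited[root] = 1;
        stack.emplace_back(root, 0);
        while (!stack.empty()) {
            const std::int32_t node = stack.back().first;
            const std::size_t next = stack.back().second;
            if (next < fanins[node].size()) {
                stack.back().second = next + 1;
                const std::int32_t fanin = fanins[node][next];
                assert(fanin >= 0 && fanin < n);
                if (!visited[fanin]) {
                    visited[fanin] = 1;
                    stack.emplace_back(fanin, 0);
                }
            } else {
                order.push_back(node);  // all fanins are already in 'order'
                stack.pop_back();
            }
        }
    }
    return order;
}

// Checks that 'order' contains every node and that each fanin appears
// earlier than the node it feeds.
bool IsTopoOrder(const FaninLists& fanins, const std::vector<std::int32_t>& order) {
    if (order.size() != fanins.size()) return false;
    std::vector<std::int32_t> position(fanins.size(), -1);
    for (std::size_t i = 0; i < order.size(); ++i)
        position[order[i]] = static_cast<std::int32_t>(i);
    for (std::size_t node = 0; node < fanins.size(); ++node) {
        if (position[node] < 0) return false;
        for (std::int32_t fanin : fanins[node])
            if (position[fanin] < 0 || position[fanin] >= position[node]) return false;
    }
    return true;
}
```

Once the order is known, nodes can be renumbered and stored in that order, so later passes become simple loops over consecutive memory.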

Fanout Representation

- Traditionally, each object (node) in a netlist has both fanins (incoming edges) and fanouts (outgoing edges)
- In most applications, fanins alone are enough
  - Reduces memory ~2x
  - Reduces runtime
- Fanouts can be computed on demand
  - Exercise: Implement computation of required times of all nodes in a combinational netlist without fanouts
- In many cases, it’s enough to have “static fanout”
  - If the netlist is fixed, fanouts are never added/removed
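
One possible approach to the exercise and to static fanouts, sketched under stated assumptions: nodes are stored in topological order (every fanin ID is smaller than the node's ID), each node has a single delay, and all primary outputs share one required time. The `Netlist` and `StaticFanouts` types and both function names are hypothetical. Required times are obtained by visiting nodes in reverse order and pushing each requirement down to the fanins, so no fanout information is needed. Static fanouts are built in two passes over the fanins and stored in two flat integer arrays.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical netlist stored in topological order.
struct Netlist {
    std::vector<std::vector<std::int32_t>> fanins;  // fanins[i] = fanin IDs of node i
    std::vector<float> delay;                       // delay[i] = delay through node i
    std::vector<char> isOutput;                     // nonzero for primary outputs
};

// Required time at the output of every node, computed without fanouts.
// Nodes that drive nothing and are not outputs keep an infinite required time.
std::vector<float> RequiredTimes(const Netlist& net, float outputRequired) {
    const std::size_t n = net.fanins.size();
    std::vector<float> required(n, std::numeric_limits<float>::infinity());
    for (std::size_t i = 0; i < n; ++i)
        if (net.isOutput[i]) required[i] = outputRequired;
    for (std::size_t i = n; i-- > 0;) {  // reverse topological order
        const float atInputs = required[i] - net.delay[i];
        for (std::int32_t f : net.fanins[i]) {
            assert(f >= 0 && static_cast<std::size_t>(f) < i);
            required[f] = std::min(required[f], atInputs);
        }
    }
    return required;
}

// Static fanouts: the fanouts of node i are ids[start[i]] .. ids[start[i + 1] - 1].
struct StaticFanouts {
    std::vector<std::int32_t> start;
    std::vector<std::int32_t> ids;
};

StaticFanouts BuildStaticFanouts(const Netlist& net) {
    const std::size_t n = net.fanins.size();
    StaticFanouts fo;
    fo.start.assign(n + 1, 0);
    for (const auto& fi : net.fanins)        // pass 1: count fanouts per node
        for (std::int32_t f : fi) ++fo.start[f + 1];
    for (std::size_t i = 0; i < n; ++i)      // prefix sums turn counts into offsets
        fo.start[i + 1] += fo.start[i];
    fo.ids.resize(fo.start[n]);
    std::vector<std::int32_t> next(fo.start.begin(), fo.start.end() - 1);
    for (std::size_t i = 0; i < n; ++i)      // pass 2: record each node at its fanins
        for (std::int32_t f : net.fanins[i])
            fo.ids[next[f]++] = static_cast<std::int32_t>(i);
    return fo;
}
```

Because the fanout arrays are rebuilt from the fanins, they can be freed as soon as the computation that needs them is finished.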

Use Integers Instead of Pointers

- In the old days, integer (int) and pointer (void *) used the same amount of memory (4 bytes)
- In recent years, most of the EDA companies and their customers switched to 64-bit platforms
  - One pointer now takes 8 bytes!
  - However, most of the code uses a lot of pointers
  - This leads to a 2x memory increase for no reason
- Suggestion: Design your code to store attributes of objects as integers, rather than as pointers
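
A small sketch of the difference, assuming a two-input node (as in an AIG) and a typical 64-bit platform where a pointer takes 8 bytes. `PtrNode`, `IdNode`, and `Fanin0ChainLength` are hypothetical names. The ID-based node is 8 bytes, while the pointer-based one is twice the pointer size, and an ID is resolved simply by indexing the array that stores all nodes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Two-input node that refers to its fanins by pointer.
struct PtrNode {
    PtrNode* fanin0;
    PtrNode* fanin1;
};

// The same node referring to its fanins by 4-byte ID; -1 marks "no fanin".
struct IdNode {
    std::int32_t fanin0;
    std::int32_t fanin1;
};

static_assert(sizeof(IdNode) == 8, "two 4-byte IDs");

// Follows fanin0 from node 'id' until a node without fanins is reached and
// returns the number of edges traversed. Assumes topological order, so each
// fanin ID is smaller than the ID of the node it feeds.
std::int32_t Fanin0ChainLength(const std::vector<IdNode>& nodes, std::int32_t id) {
    std::int32_t length = 0;
    while (nodes[static_cast<std::size_t>(id)].fanin0 >= 0) {
        const std::int32_t fanin = nodes[static_cast<std::size_t>(id)].fanin0;
        assert(fanin < id);
        id = fanin;
        ++length;
    }
    return length;
}
```

A 4-byte signed ID can address about two billion objects, which covers the design sizes mentioned earlier in the talk.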

Avoiding Pointers (example)

- Node points to its fanins
  - Fanins can be integer IDs, instead of pointers
  - Instead of a linked list of node pointers, use an array of integer IDs
    - A linked list uses at least 6x more memory
    - Iterating through a linked list is slower
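
The 6x figure can be checked with a short sketch, assuming a doubly linked list whose links hold a node pointer and a typical 64-bit platform. `FaninLink` and the three helper functions are hypothetical names. Each link then takes three 8-byte pointers (24 bytes) where the array needs one 4-byte ID, and this does not yet count the allocator's per-link overhead.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

struct Node;  // hypothetical netlist node; only pointers to it are used here

// One link of a doubly linked fanin list: user data plus two link pointers.
struct FaninLink {
    Node* fanin;
    FaninLink* prev;
    FaninLink* next;
};

// Bytes needed to list nFanins fanins as a doubly linked list of node
// pointers, ignoring any per-allocation overhead of the allocator.
std::size_t BytesAsLinkedList(std::size_t nFanins) {
    return nFanins * sizeof(FaninLink);
}

// Bytes needed for the same fanins as a contiguous array of 4-byte IDs.
std::size_t BytesAsIdArray(std::size_t nFanins) {
    return nFanins * sizeof(std::int32_t);
}

// Ratio of the two footprints; 6.0 when a pointer takes 8 bytes.
double MemoryRatio(std::size_t nFanins) {
    assert(nFanins > 0);
    return static_cast<double>(BytesAsLinkedList(nFanins)) /
           static_cast<double>(BytesAsIdArray(nFanins));
}
```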

Integer IDs for Indexing Attributes

- Each node in the netlist can have an integer ID
- The node structure can be as simple as possible

    struct Node {
        int   ID;
        int   nFanins;
        int * pFanins;
    };

- Any attribute of the node can be represented as an entry in the array with node’s ID used as an index

    Vec<int>   Type;
    Vec<int>   Level;
    Vec<float> Slack;

- Attributes can be allocated/freed on demand, which helps control memory usage
- Light-weight basic data-structure makes often-used computations (such as traversals) very fast
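
A minimal sketch of an on-demand attribute, assuming `std::vector` stands in for the `Vec` container and that nodes are stored in topological order. The `Network` type and its methods are hypothetical. The Level array is allocated only when levels are requested, is indexed by node ID, and can be released as soon as it is no longer needed.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical network: the per-node data stays minimal and optional
// attributes live in side arrays indexed by node ID.
struct Network {
    std::vector<std::vector<std::int32_t>> fanins;  // topological order assumed
    std::vector<std::int32_t> level;                // empty until requested

    // Allocates the Level attribute and fills it in one pass over the nodes:
    // the level of a node is one more than the largest level among its fanins.
    void ComputeLevels() {
        level.assign(fanins.size(), 0);
        for (std::size_t i = 0; i < fanins.size(); ++i)
            for (std::int32_t f : fanins[i]) {
                assert(f >= 0 && static_cast<std::size_t>(f) < i);
                level[i] = std::max(level[i], level[f] + 1);
            }
    }

    // Releases the memory held by the attribute.
    void FreeLevels() { std::vector<std::int32_t>().swap(level); }
};
```

Swapping with an empty vector is used instead of `clear()` because `clear()` keeps the allocated capacity.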

Avoid Linked Lists

- Each link, in addition to user’s data, has previous and next fields
  - Potentially 3x increase in memory usage
- Most linked lists use pointers
  - Potentially 2x increase in memory usage
- Other drawbacks
  - Allocating numerous links leads to memory fragmentation
- Most data-structures can be efficiently implemented without linked lists

Simplicity Wins

- Whenever possible, keep data-structures simple and light-weight
- It is better to have on-demand attributes associated with objects, rather than an overly complex object data-structure

Case Study: Storage for Many Similar Entries

- Same-size entries (for example, AIG or BDD nodes) are best stored in an array
  - Node’s index is the place in the array where the node is stored
- Different-size entries (for example, nodes in a logic network) are best stored in a custom memory manager
  - Manager allocates memory in pages (e.g. 1MB / page)
  - Each page can store entries of different size
  - Each entry is assigned an integer number (called ID)
  - There is a vector mapping IDs into pointers to memory for each object
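
A minimal sketch of such a memory manager for different-size entries, written under stated assumptions: the class name `PagedStore` and its interface are hypothetical, entries are never freed individually, and each request is rounded up so that every entry stays aligned. It follows the four points above: memory comes in fixed-size pages, a page holds entries of different sizes, each entry gets an integer ID, and a vector maps IDs to entry memory.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <vector>

// Hypothetical page-based manager for entries of different sizes.
class PagedStore {
public:
    explicit PagedStore(std::size_t pageBytes = 1 << 20) : pageBytes_(pageBytes) {}

    // Reserves nBytes (nBytes > 0) for a new entry and returns its integer ID.
    std::int32_t Alloc(std::size_t nBytes) {
        // Round up so that every entry is aligned for any fundamental type.
        const std::size_t a = alignof(std::max_align_t);
        nBytes = (nBytes + a - 1) / a * a;
        assert(nBytes > 0 && nBytes <= pageBytes_);
        if (pages_.empty() || used_ + nBytes > pageBytes_) {  // start a new page
            pages_.emplace_back(new unsigned char[pageBytes_]);
            used_ = 0;
        }
        entries_.push_back(pages_.back().get() + used_);
        used_ += nBytes;
        return static_cast<std::int32_t>(entries_.size() - 1);
    }

    // Maps an ID back to the memory of its entry.
    void* Entry(std::int32_t id) const { return entries_[static_cast<std::size_t>(id)]; }

    std::size_t NumPages() const { return pages_.size(); }
    std::size_t NumEntries() const { return entries_.size(); }

private:
    std::size_t pageBytes_;
    std::size_t used_ = 0;                                 // bytes used in the last page
    std::vector<std::unique_ptr<unsigned char[]>> pages_;  // owned pages
    std::vector<unsigned char*> entries_;                  // ID -> entry memory
};
```

All memory is released together when the manager is destroyed, which avoids the fragmentation caused by allocating and freeing many small objects one by one.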

Conclusion

- Reviewed several reasons for inefficient memory usage in industrial code
- Offered several suggestions and good coding practices
- Gave a vow to think carefully about memory when designing new data-structures
