Good Programming Practices for Building Less Memory-Intensive EDA Applications
Alan Mishchenko
University of California, Berkeley

Outline
- Introduction
  - What is special about programming for EDA
  - Why much of industrial code is not efficient
  - Why saving memory also saves runtime
  - When to optimize for memory
  - Simplicity wins
- Suggestions for improvement
  - Design custom data-structures
  - Store objects in a topological order
  - Make fanout representation optional
  - Use 4-byte integers instead of 8-byte pointers
  - Never use linked lists
- Conclusions

EDA Programming
- Programming for EDA is different from
  - programming for the web
  - programming databases, etc.
- EDA deals with
  - Very complex computations (NP-hard problems)
  - Very large datasets (designs with 100M+ objects)
- Programming for EDA requires knowledge of algorithms/data-structures and careful hand-crafting of efficient solutions
- Finding an efficient solution is often the result of laborious and time-consuming trial and error

Why Industrial Code Is Often Bad
- Heritage code
  - Designed long ago by somebody who did not know or did not care or both
- Overdesigned code
  - Designed for the most general case, which rarely or never happens
- Underdesigned code
  - Designed for small netlists, while the size of a typical netlist doubles every few years, making scalability an elusive target

Less Memory = Less Runtime
- Although not true in general, in most EDA applications dealing with large datasets, smaller memory results in faster code
  - Because most EDA computations are memory intensive, the effect of CPU cache misses determines their runtime
- Keep this in mind when designing new data-structures

When to Optimize Memory?
- Optimize memory if we store many similar entries (nodes in a graph, timing objects, placement locations, etc.)
  - For example, when designing a netlist, which typically stores millions of individual objects, the object data-structure is very important
  - However, if only a few instances of a netlist are used at the same time, the netlist data-structure is less important

Design Custom Data-Structures
- Figure out what is needed in each application and design a custom data-structure
  - The lowest possible memory usage
  - The fastest possible runtime
  - Simpler and cleaner code
  - Often good data-structures can be reused elsewhere
  - Translation to and from a custom data-structure rarely takes more than 3% of runtime
- Example: In a typical synthesis/mapping application, it is enough to have ‘node’ and there is no need for ‘net’, ‘edge’, ‘pin’, etc.

Store Objects In a Topo Order
- Topological order
  - An order in which the fanins (incoming edges) of a node precede the node itself
- Storing objects in a topological order makes it unnecessary to recompute the order when performing local or global changes
  - Saves runtime
- Using topological order reduces CPU cache misses, which occur when computation jumps all over memory
  - Saves runtime
- It is best to have a specialized procedure or command to establish a topo order of the network (graph, etc.)
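
A minimal sketch of such an ordering procedure, assuming nodes are identified by integer IDs and each node stores only the IDs of its fanins. The names `Netlist` and `topoOrder` are hypothetical, not taken from any particular tool. The traversal is an iterative depth-first search, chosen here so that deep netlists do not overflow the call stack.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical minimal netlist: fanins[i] lists the IDs of the fanins of node i.
using Netlist = std::vector<std::vector<int>>;

// Returns node IDs ordered so that every fanin precedes the node it feeds.
// Assumes the netlist is acyclic (combinational); cycles are not detected.
std::vector<int> topoOrder(const Netlist& fanins) {
    const int n = static_cast<int>(fanins.size());
    std::vector<int> order;
    order.reserve(n);
    std::vector<char> visited(n, 0);
    // Explicit DFS stack of (node, index of the next fanin to explore).
    std::vector<std::pair<int, std::size_t>> stack;
    for (int root = 0; root < n; ++root) {
        if (visited[root]) continue;
        visited[root] = 1;
        stack.emplace_back(root, std::size_t{0});
        while (!stack.empty()) {
            const int node = stack.back().first;
            const std::size_t k = stack.back().second;
            if (k < fanins[node].size()) {
                stack.back().second = k + 1;
                const int f = fanins[node][k];
                assert(f >= 0 && f < n);
                if (!visited[f]) {
                    visited[f] = 1;
                    stack.emplace_back(f, std::size_t{0});
                }
            } else {
                // All fanins are already in the order, so the node can follow them.
                order.push_back(node);
                stack.pop_back();
            }
        }
    }
    return order;
}
```

Once the order is known, the nodes can be renumbered and stored in that order, so that a plain loop over the node array is already a topological traversal.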

Fanout Representation
- Traditionally, each object (node) in a netlist has both fanins (incoming edges) and fanouts (outgoing edges)
- In most applications, only fanins are enough
  - Reduces memory ~2x
  - Reduces runtime
- Fanouts can be computed on demand
  - Exercise: Implement computation of required times of all nodes in a combinational netlist without fanouts
- In many cases, it’s enough to have “static fanout”
  - If the netlist is fixed, fanouts are never added/removed
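
One possible way to compute static fanouts on demand is sketched below, assuming the netlist stores only fanin IDs. The fanouts of all nodes are packed into two flat integer arrays instead of per-node lists. The names `StaticFanouts` and `computeFanouts` are hypothetical.

```cpp
#include <cassert>
#include <vector>

// Hypothetical static fanout storage: the fanouts of node i are
// ids[start[i]] .. ids[start[i + 1] - 1].
struct StaticFanouts {
    std::vector<int> start; // size = number of nodes + 1
    std::vector<int> ids;   // size = total number of edges
};

// Derives fanouts from fanins in two passes: count, then fill.
// fanins[i] lists the IDs of the fanins of node i.
StaticFanouts computeFanouts(const std::vector<std::vector<int>>& fanins) {
    const int n = static_cast<int>(fanins.size());
    StaticFanouts fo;
    fo.start.assign(n + 1, 0);
    // Pass 1: count the fanouts of each node, shifted by one slot.
    for (int i = 0; i < n; ++i)
        for (int f : fanins[i]) {
            assert(f >= 0 && f < n);
            ++fo.start[f + 1];
        }
    // Prefix sums turn the counts into starting offsets.
    for (int i = 0; i < n; ++i)
        fo.start[i + 1] += fo.start[i];
    // Pass 2: write each node into the fanout range of each of its fanins.
    fo.ids.resize(fo.start[n]);
    std::vector<int> next(fo.start.begin(), fo.start.end() - 1);
    for (int i = 0; i < n; ++i)
        for (int f : fanins[i])
            fo.ids[next[f]++] = i;
    return fo;
}
```

Because the result is a separate object, it can be built right before an algorithm that needs fanouts and discarded afterwards, leaving the netlist itself fanin-only.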

Use Integers Instead of Pointers
- In the old days, integer (int) and pointer (void *) used the same amount of memory (4 bytes)
- In recent years, most of the EDA companies and their customers switched to using 64 bits
  - One pointer now takes 8 bytes!
  - However, most of the code uses a lot of pointers
  - This leads to a 2x memory increase for no reason
- Suggestion: Design your code to store attributes of objects as integers, rather than as pointers
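
The size difference can be illustrated with a small sketch. The two-fanin node layouts `NodePtr` and `NodeId` and the helper `toId` are hypothetical; the 16-byte figure in the comments assumes a typical 64-bit platform.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>

// Hypothetical two-input node that refers to its fanins by pointer.
struct NodePtr {
    NodePtr* fanin0;
    NodePtr* fanin1;
};

// The same node referring to its fanins by 32-bit integer IDs.
struct NodeId {
    std::int32_t fanin0;
    std::int32_t fanin1;
};

// The ID-based node stays at 8 bytes regardless of the pointer width.
static_assert(sizeof(NodeId) == 8, "two 32-bit IDs take 8 bytes");
// The pointer-based node grows with the pointer width (16 bytes on 64-bit).
static_assert(sizeof(NodePtr) == 2 * sizeof(void*), "two pointers per node");

// Converts an array index into a 32-bit ID, checking that it fits.
inline std::int32_t toId(std::size_t index) {
    assert(index <= static_cast<std::size_t>(std::numeric_limits<std::int32_t>::max()));
    return static_cast<std::int32_t>(index);
}
```

A signed 32-bit ID can address about 2 billion objects, which leaves ample room for designs with 100M+ objects.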

Avoiding Pointers (example)
- Node points to its fanins
  - Fanins can be integer IDs, instead of pointers
  - Instead of a linked list of node pointers, use an array of integer IDs
    - A linked list uses at least 6x more memory
    - Iterating through a linked list is slower
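
A sketch of the array-of-IDs alternative is given below. One way to arrive at the 6x figure, assuming a doubly linked list with 8-byte pointers, is that each link holds three pointers (24 bytes) where the array holds one 4-byte ID. The types `FaninLink` and `FaninStore` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Node; // opaque here; only pointers to it are needed

// Hypothetical link of a doubly linked fanin list: three pointers per fanin.
struct FaninLink {
    Node* fanin;
    FaninLink* prev;
    FaninLink* next;
};

// All fanins of all nodes in one flat array of 32-bit IDs.
// The fanins of node i are ids[first[i]] .. ids[first[i + 1] - 1].
struct FaninStore {
    std::vector<std::int32_t> first{0};
    std::vector<std::int32_t> ids;

    // Appends a new node with the given fanins and returns its ID.
    std::int32_t addNode(const std::vector<std::int32_t>& fanins) {
        const std::int32_t id = static_cast<std::int32_t>(first.size()) - 1;
        for (std::int32_t f : fanins) {
            assert(f >= 0 && f < id); // fanins must already exist
            ids.push_back(f);
        }
        first.push_back(static_cast<std::int32_t>(ids.size()));
        return id;
    }

    // Number of fanins of a node, read from two adjacent offsets.
    int numFanins(std::int32_t id) const { return first[id + 1] - first[id]; }
};

// Per-fanin storage ratio of a link to an ID (6 with 8-byte pointers).
constexpr std::size_t kLinkToIdRatio = sizeof(FaninLink) / sizeof(std::int32_t);
```

Iterating over the fanins of a node is then a scan over adjacent integers, with no pointer chasing between separately allocated links.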

Integer IDs for Indexing Attributes
- Each node in the netlist can have an integer ID
- The node structure can be as simple as possible
    struct Node {
        int ID;
        int nFanins;
        int * pFanins;
    };
- Any attribute of the node can be represented as an entry in the array with node’s ID used as an index
    Vec<int> Type;
    Vec<int> Level;
    Vec<float> Slack;
- Attributes can be allocated/freed on demand, which helps control memory usage
- Light-weight basic data-structure makes often-used computations (such as traversals) very fast
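
An on-demand attribute might look like the sketch below, which uses `std::vector` in place of the slide's `Vec` and assumes nodes are stored in topological order, so that every fanin ID is smaller than the ID of the node it feeds. The names `Netlist`, `computeLevels` and `freeLevels` are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical netlist core: fanins[id] lists the fanin IDs of node `id`.
// Nodes are assumed to be stored in topological order (fanin ID < node ID).
struct Netlist {
    std::vector<std::vector<int>> fanins;
    std::vector<int> level; // on-demand attribute, empty until computed
};

// Allocates and fills the Level attribute: 0 for nodes without fanins,
// otherwise 1 + the maximum level among the fanins.
void computeLevels(Netlist& ntk) {
    ntk.level.assign(ntk.fanins.size(), 0);
    for (std::size_t id = 0; id < ntk.fanins.size(); ++id)
        for (int f : ntk.fanins[id]) {
            assert(f >= 0 && static_cast<std::size_t>(f) < id);
            ntk.level[id] = std::max(ntk.level[id], ntk.level[f] + 1);
        }
}

// Releases the attribute's memory once it is no longer needed.
void freeLevels(Netlist& ntk) {
    std::vector<int>().swap(ntk.level);
}
```

Because the attribute lives outside the node structure, algorithms that never ask for levels pay no memory for them.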

Avoid Linked Lists
- Each link, in addition to user’s data, has previous and next fields
  - Potentially 3x increase in memory usage
- Most linked lists use pointers
  - Potentially 2x increase in memory usage
- Other drawbacks
  - Allocating numerous links leads to memory fragmentation
- Most data-structures can be efficiently implemented without linked lists

Simplicity Wins
- Whenever possible keep data-structures simple and light-weight
- It is better to have on-demand attributes associated with objects, rather than an overly complex object data-structure

Case Study: Storage for Many Similar Entries
- Same-size entries (for example, AIG or BDD nodes) are best stored in an array
  - Node’s index is the place in the array where the node is stored
- Different-size entries (for example, nodes in a logic network) are best stored in a custom memory manager
  - Manager allocates memory in pages (e.g. 1MB / page)
  - Each page can store entries of different size
  - Each entry is assigned an integer number (called ID)
  - There is a vector mapping IDs into pointers to memory for each object
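
A memory manager of this kind could be sketched as follows. The class name `PagedStore`, its methods, and the 8-byte rounding of entry sizes are assumptions made for illustration; only the page-based allocation and the ID-to-pointer vector come from the slide.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical page-based manager for entries of different sizes.
class PagedStore {
public:
    explicit PagedStore(std::size_t pageBytes = 1 << 20) : pageBytes_(pageBytes) {}

    // Reserves `bytes` bytes for a new entry and returns its integer ID.
    int allocate(std::size_t bytes) {
        // Round up so that every entry starts at an offset that is a multiple of 8.
        bytes = (bytes + 7) & ~static_cast<std::size_t>(7);
        assert(bytes > 0 && bytes <= pageBytes_);
        // Open a new page when the current one cannot hold the entry.
        if (pages_.empty() || used_ + bytes > pageBytes_) {
            pages_.push_back(std::make_unique<unsigned char[]>(pageBytes_));
            used_ = 0;
        }
        entries_.push_back(pages_.back().get() + used_);
        used_ += bytes;
        return static_cast<int>(entries_.size()) - 1;
    }

    // Maps an ID back to the memory of its entry.
    void* entry(int id) const {
        assert(id >= 0 && static_cast<std::size_t>(id) < entries_.size());
        return entries_[id];
    }

    std::size_t numPages() const { return pages_.size(); }

private:
    std::size_t pageBytes_;
    std::size_t used_ = 0;                                // bytes used in the last page
    std::vector<std::unique_ptr<unsigned char[]>> pages_; // owns all pages
    std::vector<unsigned char*> entries_;                 // ID -> entry memory
};
```

Entries are never freed individually in this sketch; all memory is released together when the manager is destroyed, which avoids the fragmentation caused by many small allocations.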

Conclusion
- Reviewed several reasons for inefficient memory usage in industrial code
- Offered several suggestions and good coding practices
- Gave a vow to think carefully about memory when designing new data-structures