Routing Wire Optimization through Generic Synthesis on FPGA Carry Chains
Hadi P. Afshar
Joint work with: Grace Zgheib, Philip Brisk and Paolo Ienne
FPGAs and ASICs Gaps*
• Performance ratio: 3-4
• Area ratio: 20-35
• Power ratio: 7-15
*I. Kuon and J. Rose, "Measuring the gap between FPGAs and ASICs," IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 26, no. 2, February 2007, pp. 203-215.
Routing resources consume ≈60-80% of the chip area and are significant contributors to circuit delay.
How to narrow the gap?
• Specialized (DSP) blocks
• Coarser-grained logic blocks
• Hard-wired connections
Concerns:
✘ Lack of generality and flexibility
✘ Underutilization
✘ Change in routing structure
Carry Chains
[Figure: array of CLBs linked by carry chains; each CLB has 8 inputs and contains 4-LUTs and adders (+)]
Motivation Example
Problem Definition
LUT Mapped Flow Graph
Step1: Logic Matching
Step2: Chaining
Logic Matching
• Step1: Enumeration of Programmable Part
• Step2: Identifying regular and independent segments
• Step3: Developing alphabet library of the macro cell
• Step4: Mask division and library matching
[Figure: macro cell built from two LUTs and an adder (+); signal labels A, B, Cin, Cout]
Logic Matching (Example)
• Step1: Enumeration
i3 i2 i1 i0 LUT1 LUT2
0 0 0 0 A0 B0
0 0 0 1 A0 B1
0 0 1 0 A1 B0
0 0 1 1 A1 B1
0 1 0 0 A2 B2
0 1 0 1 A2 B3
0 1 1 0 A3 B2
0 1 1 1 A3 B3
1 0 0 0 A4 B4
1 0 0 1 A4 B5
1 0 1 0 A5 B4
1 0 1 1 A5 B5
1 1 0 0 A6 B6
1 1 0 1 A6 B7
1 1 1 0 A7 B6
1 1 1 1 A7 B7
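The enumeration table above can be regenerated with a few lines of Python. This is a minimal sketch, not the authors' tool: it assumes, judging only from the pattern in the table, that LUT1 is addressed by inputs (i3, i2, i1) and LUT2 by inputs (i3, i2, i0). The function name is hypothetical.

```python
from itertools import product

def enumerate_programmable_part():
    """For every input vector (i3, i2, i1, i0), name the configuration bit each LUT reads."""
    rows = []
    for i3, i2, i1, i0 in product((0, 1), repeat=4):
        a_index = 4 * i3 + 2 * i2 + i1   # assumed LUT1 address: i3, i2, i1
        b_index = 4 * i3 + 2 * i2 + i0   # assumed LUT2 address: i3, i2, i0
        rows.append(((i3, i2, i1, i0), f"A{a_index}", f"B{b_index}"))
    return rows

# Print the table in the same column order as the slide.
for inputs, lut1, lut2 in enumerate_programmable_part():
    print(*inputs, lut1, lut2)
```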
Logic Matching (Example)
• Step2: Regular and Independent Segments
i3 i2 i1 i0 LUT1 LUT2
0 0 0 0 A0 B0
0 0 0 1 A0 B1
0 0 1 0 A1 B0
0 0 1 1 A1 B1
0 1 0 0 A2 B2
0 1 0 1 A2 B3
0 1 1 0 A3 B2
0 1 1 1 A3 B3
1 0 0 0 A4 B4
1 0 0 1 A4 B5
1 0 1 0 A5 B4
1 0 1 1 A5 B5
1 1 0 0 A6 B6
1 1 0 1 A6 B7
1 1 1 0 A7 B6
1 1 1 1 A7 B7
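One way to find the segments in this table is to group rows that share a configuration bit. The sketch below makes two assumptions that the slides do not state: "independent" is read as "no configuration bit shared between groups", and the LUT addressing is inferred from the table. All names are hypothetical.

```python
from itertools import product

def find_segments():
    """Group table rows into segments that share no configuration bits."""
    rows = []
    for i3, i2, i1, i0 in product((0, 1), repeat=4):
        # Configuration bits read by this row (addressing inferred from the table).
        rows.append({f"A{4 * i3 + 2 * i2 + i1}", f"B{4 * i3 + 2 * i2 + i0}"})

    segments = []  # each entry: (sorted row numbers, set of configuration bits)
    for row_number, bits in enumerate(rows):
        touching = [s for s in segments if s[1] & bits]
        merged_rows, merged_bits = [row_number], set(bits)
        for s in touching:            # merge every segment this row overlaps
            merged_rows += s[0]
            merged_bits |= s[1]
            segments.remove(s)
        segments.append((sorted(merged_rows), merged_bits))
    return segments

for row_numbers, bits in find_segments():
    print(row_numbers, sorted(bits))
```

Under these assumptions the table splits into four groups of four consecutive rows, and every group has the same shape (two A bits and two B bits), which is what makes the segments regular.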
Logic Matching (Example)
• Step3: Alphabet library of the cell
8-bit alphabets of configuration mask dictionary (one column per configuration):

                 A0=0  A0=1  A0=0  A0=1  A0=0
                 A1=0  A1=0  A1=1  A1=1  A1=0
                 B0=0  B0=0  B0=0  B0=0  B0=1
LUT1 LUT2 Cin    B1=0  B1=0  B1=0  B1=0  B1=0
A0   B0   0      0     0     0     0     0     …
A0   B1   0      0     0     0     0     0     …
A1   B0   0      0     0     0     0     0     …
A1   B1   0      0     0     0     0     0     …
A0   B0   1      0     1     0     1     1     …
A0   B1   1      0     1     0     1     0     …
A1   B0   1      0     0     1     1     1     …
A1   B1   1      0     0     1     1     0     …
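The alphabet library for one segment can be sketched as follows. The cell function is an assumption: the code uses the carry-out of a full adder (majority of A, B, Cin), which reproduces the five columns shown on the slide, but the slides do not name the function. The row order and all identifiers are likewise illustrative.

```python
from itertools import product

def carry_out(a, b, cin):
    """Assumed cell function: carry-out of a full adder (majority of a, b, cin)."""
    return (a & b) | (cin & (a | b))

# Row order of the slide's table: (bit read by LUT1, bit read by LUT2, Cin).
ROWS = [("A0", "B0", 0), ("A0", "B1", 0), ("A1", "B0", 0), ("A1", "B1", 0),
        ("A0", "B0", 1), ("A0", "B1", 1), ("A1", "B0", 1), ("A1", "B1", 1)]

def alphabet(config):
    """8-bit column produced by one setting of the segment's configuration bits."""
    return tuple(carry_out(config[a], config[b], cin) for a, b, cin in ROWS)

def build_library():
    """Map each distinct alphabet to the configurations that produce it."""
    library = {}
    for a0, a1, b0, b1 in product((0, 1), repeat=4):
        config = {"A0": a0, "A1": a1, "B0": b0, "B1": b1}
        library.setdefault(alphabet(config), []).append(config)
    return library

# Second column of the slide: A0 = 1, all other bits 0.
print(alphabet({"A0": 1, "A1": 0, "B0": 0, "B1": 0}))
```

Keying the dictionary by alphabet means a later lookup of an 8-bit segment returns a configuration that realizes it directly.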
Logic Matching (Example)
• Step4: Mask segmented matching
[Figure: the mask is divided into four 8-bit segments, each matched against the library]
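Segmented matching can be illustrated as below. This is a sketch under stated assumptions: the mask is 32 bits wide, it is cut into four 8-bit segments with the most significant segment first, and all segments are checked against one shared library. The library contents are the five alphabets from the Step 3 table, read top to bottom as a byte; that bit order is a choice made here, not something the slides specify.

```python
def split_mask(mask, segment_bits=8, mask_bits=32):
    """Cut a mask into equal-width segments, most significant segment first."""
    count = mask_bits // segment_bits
    low_bits = (1 << segment_bits) - 1
    return [(mask >> (segment_bits * (count - 1 - k))) & low_bits
            for k in range(count)]

def match_mask(mask, library):
    """Return True when every segment of the mask is an alphabet in the library."""
    return all(segment in library for segment in split_mask(mask))

# Hypothetical library: the five example alphabets encoded as bytes.
LIBRARY = {0x00, 0x0C, 0x03, 0x0F, 0x0A}

print(split_mask(0x0C030F0A))
print(match_mask(0x0C030F0A, LIBRARY))
print(match_mask(0x0C03FF0A, LIBRARY))
```

Each segment lookup is a set membership test, so matching a mask costs one lookup per segment instead of a comparison against every full-width mask.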
How much do we gain?
• Assume that mask is 32-bit
– N Segments
– M Patterns in each segment
– Our Library Size = 32.MN Bits
– Num of all configurations = 32.NMN
Orders of magnitude less memory
Orders of magnitude fewer comparisons
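The source of the saving can be seen by counting stored patterns rather than bits. The sketch below is only a counting argument under the slide's definitions of N and M: it compares one small library per segment with an exhaustive list of every combination of segment patterns, and it does not attempt to reproduce the slide's bit-size expressions. The function name and the sample values are illustrative.

```python
def stored_patterns(n_segments, m_patterns):
    """Count library entries for segmented versus exhaustive matching."""
    segmented = n_segments * m_patterns     # M patterns kept for each of N segments
    exhaustive = m_patterns ** n_segments   # every combination of segment patterns
    return segmented, exhaustive

# Illustrative values only: N = 4 segments, M = 16 patterns per segment.
segmented, exhaustive = stored_patterns(4, 16)
print(segmented, exhaustive, exhaustive // segmented)
```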
Chaining Heuristic
[Figure: flow graph from Input to Output with numbered nodes, shown in three successive views]
We need to find chains of functions that are mappable to the macro cell, so that they can be placed on the carry chains.
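The slides do not spell out the heuristic itself, so the following is a stand-in rather than the authors' algorithm: a greedy sketch that repeatedly pulls the longest path of still-unused chainable nodes out of a DAG and keeps it if it meets a minimum length. The graph encoding (node mapped to its successors), the function names, and the example graph are all hypothetical.

```python
def longest_chain(graph, eligible):
    """Longest path that visits only eligible nodes of a DAG (node -> successors)."""
    best = {}

    def walk(node):
        # Memoized longest eligible path starting at this node.
        if node not in best:
            tails = [walk(nxt) for nxt in graph.get(node, ()) if nxt in eligible]
            best[node] = [node] + max(tails, key=len, default=[])
        return best[node]

    return max((walk(n) for n in sorted(eligible)), key=len, default=[])

def extract_chains(graph, chainable, min_length=4):
    """Greedily remove longest chains until none reaches min_length."""
    remaining = set(chainable)
    chains = []
    while True:
        chain = longest_chain(graph, remaining)
        if len(chain) < min_length:
            return chains
        chains.append(chain)
        remaining -= set(chain)

# Hypothetical LUT-mapped graph in which every node is chainable.
graph = {"a": ["b", "x"], "b": ["c"], "c": ["d"], "d": [], "x": ["y"], "y": []}
print(extract_chains(graph, set(graph), min_length=4))
```

Nodes left out of every chain stay in the netlist unchanged, so the threshold trades chain coverage against the overhead of very short chains.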
Synthesis and Chaining Results
Benchmark   Chainable   Chained   Max Chain Length   Average Chain Length
alu4        74%         39%       4                  3.5
pdc         69%         35%       6                  3.9
misex3      68%         42%       4                  3.1
ex1010      71%         41%       5                  3.4
ex5p        72%         40%       4                  3.5
des*        65%         31%       3                  3.0
apex2       73%         42%       4                  3.6
apex4       75%         39%       4                  3.7
spla        72%         43%       6                  4.2
seq         69%         38%       4                  3.4
Average     70%         39%       4.4                3.5
* The minimum threshold for the chain length is 4, except for "des", for which it is 3.
Experimental Methodology
Goal: Extract chains of eligible functions from the synthesized netlist and place them on the logic chains; the non-chained functions remain unchanged.
Flow:
• Quartus-II Synthesis and LUT Mapping
• Our Synthesis Engine
– VQM Parser and DAG Generation
– Logic Matching
– Chaining Heuristic
– Netlist Generation
• Quartus-II Place & Route
Local Routing Wires
26% saving in the number of local wires
Total Wire Lengths
9% saving in total wire lengths
Delay
3% delay penalty due to the large input-to-output delay of the adder
Conclusion
Narrow the FPGA and ASIC Gaps
Lighten the stress on routing resources
Hardwired connections + Dedicated logic
Improved Routability with a Lighter Network