![Page 1: Design, Synthesis and Evaluation of Heterogeneous FPGA with Mixed LUTs and Macro-Gates Yu Hu 1, Satyaki Das 2, Steve Trimberger 2, and Lei He 1 1. Electrical](https://reader034.vdocument.in/reader034/viewer/2022042615/56649cca5503460f9499223b/html5/thumbnails/1.jpg)
Design, Synthesis and Evaluation of Heterogeneous FPGA
with Mixed LUTs and Macro-Gates
Yu Hu1, Satyaki Das2, Steve Trimberger2, and Lei He1
1. Electrical Engineering Dept., UCLA
2. Research Labs, Xilinx Inc.
Presented by Yu Hu
Address comments to [email protected]
![Page 2: Design, Synthesis and Evaluation of Heterogeneous FPGA with Mixed LUTs and Macro-Gates Yu Hu 1, Satyaki Das 2, Steve Trimberger 2, and Lei He 1 1. Electrical](https://reader034.vdocument.in/reader034/viewer/2022042615/56649cca5503460f9499223b/html5/thumbnails/2.jpg)
Outline
Introduction
Design of the Macro-gates
Synthesis for the Proposed FPGA Architecture
Comparison of Heterogeneous FPGA Architectures
Conclusions and Future Work
![Page 3: Design, Synthesis and Evaluation of Heterogeneous FPGA with Mixed LUTs and Macro-Gates Yu Hu 1, Satyaki Das 2, Steve Trimberger 2, and Lei He 1 1. Electrical](https://reader034.vdocument.in/reader034/viewer/2022042615/56649cca5503460f9499223b/html5/thumbnails/3.jpg)
Heterogeneity in FPGA Architectures
Heterogeneity among SLICEs
  Programmable logic and routing tiles are not identical
    Soft logic fabric [Kaviani, FPGA’96]
    Hard structures [Jamieson, FPL’05]
  Dedicated hard structures
    e.g., DSP blocks
    e.g., memory blocks
Heterogeneity within a SLICE
  Programmable logic and routing tiles (SLICEs) are identical
  Different logic elements exist within a SLICE
    e.g., LUTs of different sizes [Cong, FPGA’99]
    e.g., mixed PLAs and LUTs [Cong, TODAES’05]
    e.g., mixed macro-gates and LUTs
(source: Jamieson@FPL’05)
![Page 4: Design, Synthesis and Evaluation of Heterogeneous FPGA with Mixed LUTs and Macro-Gates Yu Hu 1, Satyaki Das 2, Steve Trimberger 2, and Lei He 1 1. Electrical](https://reader034.vdocument.in/reader034/viewer/2022042615/56649cca5503460f9499223b/html5/thumbnails/4.jpg)
Heterogeneous FPGA with Macro-Gates
There is a trade-off between programmability and cost when choosing between LUTs and macro-gates
  Xilinx V4 benefits from small gates (MUX2, XOR2) built into SLICEs.
The benefit of wider macro-gates
  The effectiveness of incorporating wider logic functions (macro-gates) is not clear.
Our contributions
  Design a new FPGA architecture with mixed LUTs and macro-gates
  Propose a new automatic synthesis flow for mapping a circuit to the proposed FPGA architecture
  Evaluate the architecture and show that it reduces delay and area by 16.5% and 30%, respectively, compared to the LUT-only architecture.
![Page 5: Design, Synthesis and Evaluation of Heterogeneous FPGA with Mixed LUTs and Macro-Gates Yu Hu 1, Satyaki Das 2, Steve Trimberger 2, and Lei He 1 1. Electrical](https://reader034.vdocument.in/reader034/viewer/2022042615/56649cca5503460f9499223b/html5/thumbnails/5.jpg)
Outline
Introduction
Design of the Macro-gates
Synthesis for the Proposed FPGA Architecture
Comparison of Heterogeneous FPGA Architectures
Conclusions and Future Work
![Page 6: Design, Synthesis and Evaluation of Heterogeneous FPGA with Mixed LUTs and Macro-Gates Yu Hu 1, Satyaki Das 2, Steve Trimberger 2, and Lei He 1 1. Electrical](https://reader034.vdocument.in/reader034/viewer/2022042615/56649cca5503460f9499223b/html5/thumbnails/6.jpg)
Overview of Macro-Gate Design
Key problem
  Select the logic functions for the macro-gate
Problem formulation
  Input: a set of training circuits that have been mapped to K-input LUTs
  Output: N K-input Boolean functions f1, …, fN
  Objective: maximize the number of logic functions (in the training circuit set) that can be implemented by f1, …, fN
The proposed solution
  Rank the logic functions found in a set of training circuits
![Page 7: Design, Synthesis and Evaluation of Heterogeneous FPGA with Mixed LUTs and Macro-Gates Yu Hu 1, Satyaki Das 2, Steve Trimberger 2, and Lei He 1 1. Electrical](https://reader034.vdocument.in/reader034/viewer/2022042615/56649cca5503460f9499223b/html5/thumbnails/7.jpg)
NPN-Class Diagram: Organization of Logic Functions
Canonical and efficient representation of all NPN classes
  NPN-equivalent: functionally equivalent under input negation, input permutation, or output negation
    E.g., f(a,b,c)=a+bc and g(a,b,c)=b’a+b’c
  The NPN-cofactor relationship is indicated
  DAG: easy to manipulate
It becomes impractical to compute for functions with more than 6 inputs!
  Solution: the Utilization NPN-Class Diagram
[Figure: NPN-Class Diagram organized by level. Level 0: constant; Level 1: 1-input; Level 2: 2-input; Level 3: 3-input; higher levels hold wider-input functions]
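The NPN-equivalence example on this slide can be checked by brute force for small functions. The sketch below is not the authors' tool: it assumes truth tables packed into integers and uses the smallest reachable truth table as a canonical form; the helper names `truth_table` and `npn_canonical` are hypothetical.

```python
from itertools import permutations, product

def truth_table(func, n):
    """Pack func's outputs over all 2**n input rows into an int (bit r = output on row r)."""
    tt = 0
    for row in range(2 ** n):
        bits = [(row >> i) & 1 for i in range(n)]
        if func(*bits):
            tt |= 1 << row
    return tt

def npn_canonical(tt, n):
    """Smallest truth table reachable by input permutation, input negation and output negation."""
    rows = 2 ** n
    full = (1 << rows) - 1
    best = None
    for perm in permutations(range(n)):
        for neg in product((0, 1), repeat=n):
            new_tt = 0
            for row in range(rows):
                bits = [(row >> i) & 1 for i in range(n)]
                # Input i of the original function reads variable perm[i], optionally negated.
                src = [bits[perm[i]] ^ neg[i] for i in range(n)]
                src_row = sum(b << i for i, b in enumerate(src))
                if (tt >> src_row) & 1:
                    new_tt |= 1 << row
            # Also consider the output-negated version of this transform.
            for cand in (new_tt, new_tt ^ full):
                if best is None or cand < best:
                    best = cand
    return best

f = truth_table(lambda a, b, c: a | (b & c), 3)                    # f = a + bc
g = truth_table(lambda a, b, c: ((1 - b) & a) | ((1 - b) & c), 3)  # g = b'a + b'c
h = truth_table(lambda a, b, c: a ^ b ^ c, 3)                      # 3-input XOR, for contrast
same_fg = npn_canonical(f, 3) == npn_canonical(g, 3)
same_fh = npn_canonical(f, 3) == npn_canonical(h, 3)
print(same_fg, same_fh)
```

The exhaustive search visits n!·2^(n+1) transforms, each over 2^n rows, which is one way to see why a more compact organization is wanted as the input count grows.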
![Page 8: Design, Synthesis and Evaluation of Heterogeneous FPGA with Mixed LUTs and Macro-Gates Yu Hu 1, Satyaki Das 2, Steve Trimberger 2, and Lei He 1 1. Electrical](https://reader034.vdocument.in/reader034/viewer/2022042615/56649cca5503460f9499223b/html5/thumbnails/8.jpg)
UND: Utilization NPN-Class Diagram
UND is a DAG and a sub-graph of the NPN-Class Diagram (NCD)
It helps in scoring and ranking logic functions
[Figure: example UND. Each node is labeled functionality / appearance frequency / implementation capability. Nodes: ab’c’+a’bc’ / 1 / xx%; abc / 1 / xx%; ab’+a’b / 0 / xx%; ab / 0 / xx%; a / 0 / xx%; -0- / 0 / xx%]
![Page 9: Design, Synthesis and Evaluation of Heterogeneous FPGA with Mixed LUTs and Macro-Gates Yu Hu 1, Satyaki Das 2, Steve Trimberger 2, and Lei He 1 1. Electrical](https://reader034.vdocument.in/reader034/viewer/2022042615/56649cca5503460f9499223b/html5/thumbnails/9.jpg)
UND: Utilization NPN-Class Diagram
[Figure: the same example UND with two annotations updated: ab’+a’b / 1 / xx% and a / 1 / xx%. Remaining nodes: ab’c’+a’bc’ / 1 / xx%; abc / 1 / xx%; ab / 0 / xx%; -0- / 0 / xx%]
![Page 10: Design, Synthesis and Evaluation of Heterogeneous FPGA with Mixed LUTs and Macro-Gates Yu Hu 1, Satyaki Das 2, Steve Trimberger 2, and Lei He 1 1. Electrical](https://reader034.vdocument.in/reader034/viewer/2022042615/56649cca5503460f9499223b/html5/thumbnails/10.jpg)
UND: Utilization NPN-Class Diagram
Calculate Implementation Capability
[Figure: example UND with implementation capability filled in. Nodes: ab’c’+a’bc’ / 1 / 75%; abc / 1 / 50%; ab’+a’b / 1 / 50%; ab / 0 / 25%; a / 1 / 25%; -0- / 0 / xx%. The fanout cone of ab’c’+a’bc’ is highlighted]
The DAG topology of the UND lets us efficiently explore different metrics for ranking functions, e.g., utilization rate.
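The edges of the diagram rest on the cofactor relationship: a gate can also realize any function obtained by tying some of its inputs to constants. The sketch below illustrates only that relationship for the slide's example function ab’c’+a’bc’. The exact scoring formula is not given in this transcript, so none is assumed; `cofactors` is a hypothetical helper and no NPN canonicalization is applied.

```python
from itertools import product

def truth_table(func, n):
    """Pack func's outputs over all 2**n input rows into an int (bit r = output on row r)."""
    tt = 0
    for row in range(2 ** n):
        bits = [(row >> i) & 1 for i in range(n)]
        if func(*bits):
            tt |= 1 << row
    return tt

def cofactors(func, n):
    """Truth tables realizable from func by tying any subset of its inputs to 0 or 1."""
    result = set()
    for fixing in product((None, 0, 1), repeat=n):
        def tied(*bits, fixing=fixing):
            # Use the constant where an input is tied, otherwise the live input value.
            args = [b if fx is None else fx for b, fx in zip(bits, fixing)]
            return func(*args)
        result.add(truth_table(tied, n))
    return result

# Example gate from the slide: ab'c' + a'bc'
gate = lambda a, b, c: (a & (1 - b) & (1 - c)) | ((1 - a) & b & (1 - c))
reachable = cofactors(gate, 3)

xor_ab = truth_table(lambda a, b, c: a ^ b, 3)       # ab' + a'b, obtained with c = 0
just_a = truth_table(lambda a, b, c: a, 3)           # a, obtained with b = 0 and c = 0
and_abc = truth_table(lambda a, b, c: a & b & c, 3)  # abc is a different 3-input function
print(xor_ab in reachable, just_a in reachable, and_abc in reachable)
```

This matches the figure's structure, where ab’+a’b and a sit below ab’c’+a’bc’ while abc is a separate top-level node.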
![Page 11: Design, Synthesis and Evaluation of Heterogeneous FPGA with Mixed LUTs and Macro-Gates Yu Hu 1, Satyaki Das 2, Steve Trimberger 2, and Lei He 1 1. Electrical](https://reader034.vdocument.in/reader034/viewer/2022042615/56649cca5503460f9499223b/html5/thumbnails/11.jpg)
Recap: Overall Flow for Macro-Gate Design
Map with LUT-N
Extract logic functions
Generate the Utilization NPN-Class Diagram
Calculate scores for logic functions
Rank logic functions
[Figure: illustration of the flow. A circuit mapped to LUTs; extracted functions labeled and2(3), inv(1), nand2(2) and stored as truth-table bit strings; the resulting UND with nodes ab’c’+a’bc’ / 1 / 75%; abc / 1 / 50%; ab’+a’b / 1 / 50%; ab / 0 / 25%; a / 1 / 25%; -0- / 0 / xx%; score computations shown on the nodes: 1+1*1/2=1.5; 1; 1*1/2=0.5; 1+1*1/3=1.33; 1+1*2/3+1*1/3=2]
Best function: ab’c’+a’bc’
![Page 12: Design, Synthesis and Evaluation of Heterogeneous FPGA with Mixed LUTs and Macro-Gates Yu Hu 1, Satyaki Das 2, Steve Trimberger 2, and Lei He 1 1. Electrical](https://reader034.vdocument.in/reader034/viewer/2022042615/56649cca5503460f9499223b/html5/thumbnails/12.jpg)
Proposed Macro-Gates and FPGA Architecture
For the IWLS’05 benchmarks, the following four 6-input functions have the highest ranks:
  GI1 = a b c d e f (AND-6)
  GI2 = a’ b’ c’ + b c f’ + b c’ d’ + b’ c e (MUX-4)
  GI3 = a b’ c d’ e + b c e f + d e f
  GI4 = a b’ + a’ c d’ + b’ c’ + e’ + f’
A macro-gate built from these functions can implement over 50% of the logic functions in the IWLS’05 benchmarks.
[Figure: architecture of the proposed macro-gate and FPGA SLICE]
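The MUX-4 label on GI2 can be confirmed by exhaustive evaluation: with b and c as select lines, GI2 picks among a’, e, d’ and f’. The sketch below checks this over all 64 input combinations; `gi2` and `mux4` are hypothetical helper names, and the select/data ordering of `mux4` is an assumption made for the check.

```python
from itertools import product

def gi2(a, b, c, d, e, f):
    """GI2 = a'b'c' + bcf' + bc'd' + b'ce, with inputs given as 0/1 integers."""
    na, nb, nc, nd, nf = 1 - a, 1 - b, 1 - c, 1 - d, 1 - f
    return (na & nb & nc) | (b & c & nf) | (b & nc & nd) | (nb & c & e)

def mux4(sel1, sel0, d0, d1, d2, d3):
    """Plain 4:1 multiplexer: returns the data input indexed by the two select bits."""
    return (d0, d1, d2, d3)[2 * sel1 + sel0]

# GI2 equals a MUX-4 with selects (b, c) and data inputs a', e, d', f'.
matches = all(
    gi2(a, b, c, d, e, f) == mux4(b, c, 1 - a, e, 1 - d, 1 - f)
    for a, b, c, d, e, f in product((0, 1), repeat=6)
)
print(matches)
```

Because only input negations separate GI2 from a plain MUX-4, the two fall in the same NPN class.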
![Page 13: Design, Synthesis and Evaluation of Heterogeneous FPGA with Mixed LUTs and Macro-Gates Yu Hu 1, Satyaki Das 2, Steve Trimberger 2, and Lei He 1 1. Electrical](https://reader034.vdocument.in/reader034/viewer/2022042615/56649cca5503460f9499223b/html5/thumbnails/13.jpg)
Outline
Design of the Embedded Macro-gates
Synthesis for the Proposed FPGA Architecture
Technology Mapping for Heterogeneous FPGAs
SAT-based Packing
Place and Routing
Comparison of Heterogeneous FPGA Architectures
Conclusions and Future Work
![Page 14: Design, Synthesis and Evaluation of Heterogeneous FPGA with Mixed LUTs and Macro-Gates Yu Hu 1, Satyaki Das 2, Steve Trimberger 2, and Lei He 1 1. Electrical](https://reader034.vdocument.in/reader034/viewer/2022042615/56649cca5503460f9499223b/html5/thumbnails/14.jpg)
Functional & Structural Cut Enumeration
[Figure: example network over inputs w, x, y, z with nodes a, b, c, d]
a=(x+y)’, b=y+wz
d=ab=(x+y)’(y+wz)=x’y’wz
Is x’y’wz in the 4-input macro-gate library (stored as truth tables)? Yes
Phase 1: Enumerate and label cuts from PIs to POs
  Check the feasibility of each cut w.r.t. the macro-gate
Phase 2: Select the best choice from POs to PIs
A general yet efficient solution is SAT-based Boolean matching
  "Exploiting Symmetry in SAT-Based Boolean Matching for Heterogeneous FPGA Technology Mapping", Session 5C.1, ICCAD’07
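The feasibility check in the example can be sketched as follows: collapse the cut into a single function of its leaves, then look its truth table up in the library. This is a simplification under stated assumptions: the library contents are hypothetical, and the lookup is an exact truth-table match, whereas the slide points to SAT-based Boolean matching for the general case.

```python
def truth_table(func, n):
    """Pack func's outputs over all 2**n input rows into an int (bit r = output on row r)."""
    tt = 0
    for row in range(2 ** n):
        bits = [(row >> i) & 1 for i in range(n)]
        if func(*bits):
            tt |= 1 << row
    return tt

def cut_function(w, x, y, z):
    """Function of node d over the cut leaves {w, x, y, z}, as on the slide."""
    a = 1 - (x | y)   # a = (x + y)'
    b = y | (w & z)   # b = y + wz
    return a & b      # d = ab

d_tt = truth_table(cut_function, 4)
simplified = truth_table(lambda w, x, y, z: (1 - x) & (1 - y) & w & z, 4)  # x'y'wz

# Hypothetical 4-input macro-gate library, stored as a set of truth tables.
library = {
    simplified,
    truth_table(lambda w, x, y, z: w & x & y & z, 4),  # AND-4, included only as filler
}
in_library = d_tt in library
print(d_tt == simplified, in_library)
```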
![Page 15: Design, Synthesis and Evaluation of Heterogeneous FPGA with Mixed LUTs and Macro-Gates Yu Hu 1, Satyaki Das 2, Steve Trimberger 2, and Lei He 1 1. Electrical](https://reader034.vdocument.in/reader034/viewer/2022042615/56649cca5503460f9499223b/html5/thumbnails/15.jpg)
Key in Technology Mapping: Balance Resource Utilization
An asymmetric architecture makes balanced resource utilization difficult
  Exclusive use of one logic resource leaves much of the fabric unused
Simple yet effective solution: change the LUT-MG ratio by adjusting the area weights of LUTs and macro-gates
  Precise calibration is hard to reach with this approach
[Figure: bar chart of MG# and LUT6# (y-axis 0 to 6000) for the settings 1:1, 1:0.95, 1:0.9, 1:0.8, 1:0.5, 1:0.1. Annotations: "Total# too large!"; "Hard to obtain precise calibration"]
Objective architecture: LUT6 : MacroGate6 = 1:1
Best LUT-MG ratio = 1:1
LUT-MG ratio = LUT# / MG#
![Page 16: Design, Synthesis and Evaluation of Heterogeneous FPGA with Mixed LUTs and Macro-Gates Yu Hu 1, Satyaki Das 2, Steve Trimberger 2, and Lei He 1 1. Electrical](https://reader034.vdocument.in/reader034/viewer/2022042615/56649cca5503460f9499223b/html5/thumbnails/16.jpg)
Post-Mapping Area Recovery (motivation example)
Given:
  Target architecture = LUT6 + MG6
  LUT-MG ratio in target architecture = 1:1
  LUT# < MG# in the mapped design
  Intrinsic delay (LUT6 : MG6) = 5:4
Objective: balance the numbers of LUTs and MGs without increasing delay
[Figure: mapped network from PI to PO with one LUT6 and six MG6 nodes. Node labels (read as arrival / required time): 5 / 5, 4 / 5, 9 / 9, 9 / 13, 8 / 9, 13 / 13, 17 / 17]
![Page 17: Design, Synthesis and Evaluation of Heterogeneous FPGA with Mixed LUTs and Macro-Gates Yu Hu 1, Satyaki Das 2, Steve Trimberger 2, and Lei He 1 1. Electrical](https://reader034.vdocument.in/reader034/viewer/2022042615/56649cca5503460f9499223b/html5/thumbnails/17.jpg)
Post-Mapping Area Recovery (motivation example)
[Figure: the same network after one MG6 is replaced by a LUT6 (two LUT6, five MG6). Node labels: 5 / 5, 4 / 5, 9 / 9, 10 / 13, 8 / 9, 13 / 13, 17 / 17. The label of the replaced node changes from 9 / 13 to 10 / 13, and the timing target is still met]
![Page 18: Design, Synthesis and Evaluation of Heterogeneous FPGA with Mixed LUTs and Macro-Gates Yu Hu 1, Satyaki Das 2, Steve Trimberger 2, and Lei He 1 1. Electrical](https://reader034.vdocument.in/reader034/viewer/2022042615/56649cca5503460f9499223b/html5/thumbnails/18.jpg)
Post-Mapping Area Recovery (motivation example)
[Figure: the same network after further MG6 nodes are replaced (four LUT6, three MG6). Node labels: 5 / 5, 5 / 5, 9 / 9, 10 / 13, 10 / 9, 14 / 13, 18 / 17]
Timing target violation!
Timing slack budgeting is necessary!
![Page 19: Design, Synthesis and Evaluation of Heterogeneous FPGA with Mixed LUTs and Macro-Gates Yu Hu 1, Satyaki Das 2, Steve Trimberger 2, and Lei He 1 1. Electrical](https://reader034.vdocument.in/reader034/viewer/2022042615/56649cca5503460f9499223b/html5/thumbnails/19.jpg)
Post Mapping Area Recovery by Timing Budgeting
Formulated as an Integer Linear Programming (ILP) problem
  Objective (minimize the gap between target and actual LUT-MG ratios): $\min \left| m_2 + \dots + m_7 - 7/2 \right|$
  Arrival time constraints: $a_i + d_j + b_j \le a_j$
  Clock period target: $a_i \le 17$
  LUT assignment with given timing slack: $(5-4)\, m_j \le b_j$, $m_j \in \{0, 1\}$
[Figure: the example network from PI to PO, one LUT6 and six MG6 nodes, with arrival-time variables a1 to a7 attached to the nodes]
Easily generalized to architectures with multiple macro-gates that have different numbers of input pins
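For a network this small, the effect of the formulation can be reproduced by exhaustive search instead of an ILP solver. The sketch below is an illustration under assumptions, not the authors' implementation: the fan-in topology in `FANINS` is one guess that is consistent with the arrival/required labels of the motivation example (the real edges are not recoverable from the transcript), node 1 is taken to be the existing LUT6, and the gap is measured over all seven nodes.

```python
from itertools import product

# Assumed topology (hypothetical): node -> list of fan-in nodes, ids in topological order.
FANINS = {1: [], 2: [], 3: [1], 4: [1], 5: [2], 6: [3, 5], 7: [4, 6]}
DELAY = {"MG6": 4, "LUT6": 5}   # intrinsic delays, LUT6 : MG6 = 5 : 4
CLOCK_TARGET = 17
FIXED_LUTS = {1}                # node 1 is already a LUT6 in the mapped design

def arrival_times(is_lut):
    """Arrival time at each node output for a given LUT/MG assignment."""
    arr = {}
    for node in sorted(FANINS):
        start = max((arr[p] for p in FANINS[node]), default=0)
        arr[node] = start + (DELAY["LUT6"] if is_lut[node] else DELAY["MG6"])
    return arr

def best_assignments():
    """Enumerate all assignments; keep those closest to a 1:1 mix that meet the clock target."""
    free = [n for n in FANINS if n not in FIXED_LUTS]
    target = len(FANINS) / 2
    best_gap, best = None, []
    for choice in product((0, 1), repeat=len(free)):
        is_lut = {n: 1 for n in FIXED_LUTS}
        is_lut.update(zip(free, choice))
        if max(arrival_times(is_lut).values()) > CLOCK_TARGET:
            continue  # violates the timing target, like the last step of the example
        gap = abs(sum(is_lut.values()) - target)
        luts = sorted(n for n, m in is_lut.items() if m)
        if best_gap is None or gap < best_gap:
            best_gap, best = gap, [luts]
        elif gap == best_gap:
            best.append(luts)
    return best_gap, best

gap, solutions = best_assignments()
print(gap, solutions)
```

Under this assumed topology only nodes with spare slack can become LUT6s, and two nodes on the same path cannot both spend the same unit of slack, which is the situation the slack variables $b_j$ are meant to budget.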
![Page 20: Design, Synthesis and Evaluation of Heterogeneous FPGA with Mixed LUTs and Macro-Gates Yu Hu 1, Satyaki Das 2, Steve Trimberger 2, and Lei He 1 1. Electrical](https://reader034.vdocument.in/reader034/viewer/2022042615/56649cca5503460f9499223b/html5/thumbnails/20.jpg)
Outline
Design of the Embedded Macro-gates
Synthesis for the Proposed FPGA Architecture
Technology Mapping for Heterogeneous FPGAs
SAT-based Packing
Comparison of Heterogeneous FPGA Architectures
Conclusions and Future Work
![Page 21: Design, Synthesis and Evaluation of Heterogeneous FPGA with Mixed LUTs and Macro-Gates Yu Hu 1, Satyaki Das 2, Steve Trimberger 2, and Lei He 1 1. Electrical](https://reader034.vdocument.in/reader034/viewer/2022042615/56649cca5503460f9499223b/html5/thumbnails/21.jpg)
SAT-Based Packing
Motivation
  Traditional packing tools, e.g., T-VPack, hard-code the architecture specification of a SLICE
    Re-implementation from scratch is needed when the architecture changes
  Propose a unified implementation of packers for different architectures: easy to perform architecture exploration!
The architecture-dependent sub-problem in packing
  Structural feasibility checking of a sub-circuit against the SLICE
Solution
  Treat the problem of validating SLICE packing as a local place-and-route problem
  A SAT solver is used to carry out the validation check
![Page 22: Design, Synthesis and Evaluation of Heterogeneous FPGA with Mixed LUTs and Macro-Gates Yu Hu 1, Satyaki Das 2, Steve Trimberger 2, and Lei He 1 1. Electrical](https://reader034.vdocument.in/reader034/viewer/2022042615/56649cca5503460f9499223b/html5/thumbnails/22.jpg)
Example of SAT-Based SLICE Packing
Examples of constraints (one for each class of constraint):
Placement and routing choice variables: X@A, X@B, U5@N10
Exclusivity constraint: (¬X@A) ∨ (¬X@B)
Presence constraint: (X@A) ∨ (X@B)
Input/Output constraint: X@A → U5@N10
Routing constraint: (G0→out ∧ U5@N10) → U5@N12
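The constraint classes above can be exercised on a toy instance by enumerating all assignments in place of calling a SAT solver. The sketch below is illustrative only: it uses the five variables named on the slide, encodes presence as "X is placed at A or at B" (an assumption about the intended clause), and treats G0→out as a single routing-choice variable.

```python
from itertools import product

VARS = ["X@A", "X@B", "U5@N10", "U5@N12", "G0->out"]

def implies(p, q):
    """Logical implication p -> q."""
    return (not p) or q

def satisfies(v):
    """True if assignment v (dict: variable name -> bool) meets all four example constraints."""
    return (
        (not v["X@A"] or not v["X@B"])                             # exclusivity: not both sites
        and (v["X@A"] or v["X@B"])                                 # presence: at least one site (assumed form)
        and implies(v["X@A"], v["U5@N10"])                         # input/output constraint
        and implies(v["G0->out"] and v["U5@N10"], v["U5@N12"])     # routing constraint
    )

# Exhaustive enumeration stands in for the SAT solver on this tiny instance.
all_assignments = [dict(zip(VARS, bits)) for bits in product((False, True), repeat=len(VARS))]
models = [m for m in all_assignments if satisfies(m)]

feasible = len(models) > 0
# In every model where X sits at A and the G0->out switch is used, net U5 must reach N12.
forced = all(m["U5@N12"] for m in models if m["X@A"] and m["G0->out"])
print(feasible, len(models), forced)
```

A packing is structurally feasible exactly when at least one model exists; changing the architecture changes only the clause set, not the checker.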
![Page 23: Design, Synthesis and Evaluation of Heterogeneous FPGA with Mixed LUTs and Macro-Gates Yu Hu 1, Satyaki Das 2, Steve Trimberger 2, and Lei He 1 1. Electrical](https://reader034.vdocument.in/reader034/viewer/2022042615/56649cca5503460f9499223b/html5/thumbnails/23.jpg)
Recap: Overall Synthesis Flow
[Figure: flowchart of the synthesis flow. Steps: area weight setting; cut-based mapping; decision "Area-balance trade-off?" with Y and N branches; post-mapping area recovery; packing. Illustrated on a LUT network that is mapped to LUT6 and MG6 nodes, rebalanced, and then packed]
![Page 24: Design, Synthesis and Evaluation of Heterogeneous FPGA with Mixed LUTs and Macro-Gates Yu Hu 1, Satyaki Das 2, Steve Trimberger 2, and Lei He 1 1. Electrical](https://reader034.vdocument.in/reader034/viewer/2022042615/56649cca5503460f9499223b/html5/thumbnails/24.jpg)
Outline
Motivation and Objectives
Methodology for Logic Function Exploration
Technology Mapping for Heterogeneous FPGAs
Evaluation of Heterogeneous FPGA Architectures
Conclusions and Future Work
![Page 25: Design, Synthesis and Evaluation of Heterogeneous FPGA with Mixed LUTs and Macro-Gates Yu Hu 1, Satyaki Das 2, Steve Trimberger 2, and Lei He 1 1. Electrical](https://reader034.vdocument.in/reader034/viewer/2022042615/56649cca5503460f9499223b/html5/thumbnails/25.jpg)
Experimental Setting
Design library parameters [Cong, TODAES’05]
Benchmark set: IWLS 2005
Four architectures are compared: LUT4, LUT4 + macro-gate, LUT6, and LUT6 + macro-gate
Synthesize the proposed macro-gate by SIS 1.2
Delay and area model
  Interconnect delay is ignored
![Page 26: Design, Synthesis and Evaluation of Heterogeneous FPGA with Mixed LUTs and Macro-Gates Yu Hu 1, Satyaki Das 2, Steve Trimberger 2, and Lei He 1 1. Electrical](https://reader034.vdocument.in/reader034/viewer/2022042615/56649cca5503460f9499223b/html5/thumbnails/26.jpg)
Delay Comparisons
Compared to LUT4, LUT4+MG reduces both logic depth and delay by 9.2%.
Compared to LUT6, LUT6+MG reduces delay by 16.5% while increasing logic depth by 36.5%.
  A LUT6 can implement more logic functions than a macro-gate
[Figure: two bar charts, values listed in bar order LUT4, LUT4+MG, LUT6, LUT6+MG]
Logic depth: 7.86, 7.14, 5.48, 7.48
Delay: 7.86, 7.14, 10.95, 9.14
![Page 27: Design, Synthesis and Evaluation of Heterogeneous FPGA with Mixed LUTs and Macro-Gates Yu Hu 1, Satyaki Das 2, Steve Trimberger 2, and Lei He 1 1. Electrical](https://reader034.vdocument.in/reader034/viewer/2022042615/56649cca5503460f9499223b/html5/thumbnails/27.jpg)
Logic Area Comparisons
Compared to LUT4, LUT4+MG reduces logic area by 12.5%.
Compared to LUT6, LUT6+MG reduces logic area by about 30%.
[Figure: two bar charts, values listed in bar order LUT4, LUT4+MG, LUT6, LUT6+MG]
PLB#: 6406, 2985, 3711, 2142
Area: 7346, 6408, 16816, 11849
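The relative changes can be recomputed from the bar values on this slide and the previous one. The sketch below assumes the bars are ordered LUT4, LUT4+MG, LUT6, LUT6+MG (the order in which the four architectures are listed in the experimental setting); `pct_change` is a hypothetical helper.

```python
ARCHS = ["LUT4", "LUT4+MG", "LUT6", "LUT6+MG"]  # assumed bar order

# Bar values as they appear in the charts.
depth = dict(zip(ARCHS, [7.86, 7.14, 5.48, 7.48]))
delay = dict(zip(ARCHS, [7.86, 7.14, 10.95, 9.14]))
area = dict(zip(ARCHS, [7346, 6408, 16816, 11849]))

def pct_change(metric, base, new):
    """Signed percentage change of metric[new] relative to metric[base]."""
    return 100.0 * (metric[new] - metric[base]) / metric[base]

for name, metric in (("logic depth", depth), ("delay", delay), ("area", area)):
    for base, new in (("LUT4", "LUT4+MG"), ("LUT6", "LUT6+MG")):
        print(f"{name}: {base} -> {new}: {pct_change(metric, base, new):+.1f}%")
```

Negative values are reductions. Under the assumed bar order the LUT6 comparisons come out at roughly -16.5% for delay, +36.5% for logic depth and -29.5% for area.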
![Page 28: Design, Synthesis and Evaluation of Heterogeneous FPGA with Mixed LUTs and Macro-Gates Yu Hu 1, Satyaki Das 2, Steve Trimberger 2, and Lei He 1 1. Electrical](https://reader034.vdocument.in/reader034/viewer/2022042615/56649cca5503460f9499223b/html5/thumbnails/28.jpg)
Outline
Motivation and Objectives
Methodology for Logic Function Exploration
Technology Mapping for Heterogeneous FPGAs
Comparison of Heterogeneous FPGA Architectures
Conclusions and Future Work
![Page 29: Design, Synthesis and Evaluation of Heterogeneous FPGA with Mixed LUTs and Macro-Gates Yu Hu 1, Satyaki Das 2, Steve Trimberger 2, and Lei He 1 1. Electrical](https://reader034.vdocument.in/reader034/viewer/2022042615/56649cca5503460f9499223b/html5/thumbnails/29.jpg)
Conclusions
Conclusions
  A novel FPGA architecture with mixed LUTs and macro-gates is proposed
  A synthesis flow for the proposed architecture is implemented
  Preliminary experimental results show the effectiveness of the proposed architecture for area and delay reduction
Future Work
  Perform physical design for the synthesized circuits and compare routing costs; evaluate the architecture considering interconnect delay
  Study the effectiveness of the proposed architecture for power reduction
  Examine macro-gates with wider inputs