Experimental Flow
(SYNTHESIS): EXAMINING WHERE ACADEMIC FPGA TOOLS LAG BEHIND INDUSTRY
Photo credit: Leandro Amato – https://www.flickr.com/photos/grunge/14302660967/
Department of Computing, Department of Electrical and Electronic Engineering
Executive Summary: By extending the open-source academic CAD suite Verilog-To-Routing (VTR) to target commercial Xilinx FPGAs, a comparison can be made with industrial tools. It indicates that the academic delay gap is +31% at synthesis, +10% at packing & placement, and +15% at routing.
Introduction
FPGA research into architecture design, and into the corresponding computer-aided design (CAD) algorithms, is typically conducted using the academic VTR toolflow. Unfortunately, since these tools do not currently target commercial architectures, any CAD innovations that are made cannot be easily evaluated on physical silicon.
Key Challenge: Adding VTR support for the complex and irregular Xilinx Virtex-6 architecture.
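[Figure: FPGA tile. Routing wires of length L=1, L=2, L=4 and L=16 (bi-directional) in the N, E, S and W directions, with multiplicities between x2 and x7, connect to the logic (SLICE x2). Each SLICE contains four BLEs. A BLE comprises a fracturable 6-LUT (inputs A6:A1, built from two 5-LUTs, outputs O6 and O5), carry logic (XADDER, with MUXCY and XORCY, fed by CIN and driving COUT), flip-flops (FF, e.g. output AQ, with the AX bypass input) and the AMUX output.]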
Industrial Comparison Results
[Figure: area, runtime and delay charts per flow (labels include Flow 1, Flow 5 and Geomean; some entries are marked "ODIN II failed"; one annotation reads "Difference between flows 1 & 2").]
Area:
Area Gap: +82%
Runtime:
Runtime Gap: 3.5X (the synthesis stage consumes a diminishing proportion of total runtime; synthesis proportions shown: 13%, 6%, 7%, 4%)
Delay (from Xilinx TRCE):
Synthesis Gap: +31% (… yet contributes largest gap!)
Pack & Place Gap: +10%
Routing Gap: +15%
Target FPGA: mid-range Xilinx Virtex-6 xc6vlx240t, with 150K LUTs. All results geometrically averaged over 10 different placement seeds.
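As a minimal sketch of how a seed-averaged gap figure could be computed (this is not the poster's own script), the Python below takes a geometric mean over per-seed delays and expresses the academic result relative to the industrial one. The function names `geomean` and `percent_gap`, and every delay value in the usage lines, are hypothetical and chosen only for illustration.

```python
import math


def geomean(values):
    """Geometric mean of positive values, computed in log space."""
    values = list(values)
    if not values:
        raise ValueError("need at least one value")
    if any(v <= 0 for v in values):
        raise ValueError("geometric mean requires positive values")
    return math.exp(sum(math.log(v) for v in values) / len(values))


def percent_gap(academic, industrial):
    """Relative gap of the academic result over the industrial one, in percent."""
    return (academic / industrial - 1.0) * 100.0


# Hypothetical per-seed critical-path delays (ns); NOT measured data.
academic_delays = [13.0, 13.2, 13.1]
industrial_delays = [10.0, 10.1, 9.9]

gap = percent_gap(geomean(academic_delays), geomean(industrial_delays))
print(f"delay gap: {gap:+.1f}%")
```

The geometric mean is the natural choice for ratios such as these, because the gap between two geometric means equals the geometric mean of the per-seed ratios when the seeds are paired.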
Logic cluster: composed of four 6-input LUTs, fracturable into two 5-input LUTs followed by two flip-flops, with partially-hardened adder resources (XADDER) as well as a vertical carry-chain. RAMs and DSPs are also supported.
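To make the "fracturable" behaviour concrete, here is a small Python model of a single 6-input LUT with two outputs. It assumes the usual dual-output convention in which O6 reads the full 64-bit truth table and O5 reads only its lower 32 bits using the five low inputs; the function names and the bit ordering are assumptions of this sketch, not something taken from the VTB architecture description.

```python
def lut_eval(init, inputs):
    """Evaluate a LUT whose truth table is the integer `init`.

    `inputs` is a list of 0/1 values; inputs[0] is the least
    significant address bit.
    """
    index = sum(bit << i for i, bit in enumerate(inputs))
    return (init >> index) & 1


def fracturable_lut6(init64, a):
    """Return (O6, O5) for inputs a = [A1, A2, A3, A4, A5, A6].

    O6 is the full 6-input function. O5 is the 5-input function
    stored in the lower half of the truth table. Tying A6 high makes
    O6 read the upper half, so the two outputs act as two 5-LUTs
    sharing inputs A1..A5.
    """
    if len(a) != 6:
        raise ValueError("expected six inputs A1..A6")
    o6 = lut_eval(init64, a)
    o5 = lut_eval(init64 & 0xFFFFFFFF, a[:5])
    return o6, o5


# Illustrative truth tables: lower half = 5-input AND, upper half = 5-input OR.
AND5 = 1 << 31        # only address 31 (all ones) yields 1
OR5 = 0xFFFFFFFE      # every address except 0 yields 1
INIT = (OR5 << 32) | AND5

print(fracturable_lut6(INIT, [1, 0, 0, 0, 0, 1]))  # A6 = 1: O6 is the OR, O5 is the AND
```

With A6 held at 1, the pair behaves as two independent 5-input functions of the same inputs, which is the mode a packer would exploit when fitting two small functions into one LUT site.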
Routing architecture: On top of VTR's default routing model, Virtex-6 devices also contain a mix of bi- and unidirectional, diagonal and bent wires, amongst other features. This exact architecture is extracted from Torc and stitched into VTB.
Yosys is a Verilog synthesis tool gaining traction in the ASIC community, with extensive support for Verilog-2005 as well as for the BLIF and EDIF netlist formats.
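For readers unfamiliar with Yosys, the sketch below assembles a short command sequence that reads Verilog, runs generic synthesis and writes BLIF, using the standard Yosys commands `read_verilog`, `synth`, `abc` and `write_blif`. It is an illustration only and not the script used by VTR-to-Bitstream; the helper names, the file names and the optional LUT-mapping step are assumptions of this example. By default the LUT mapping is left out, since the flow described here runs ABC as a separate stage.

```python
def build_yosys_script(verilog_file, top_module, blif_file, lut_size=None):
    """Build a semicolon-separated Yosys command string.

    If `lut_size` is given, Yosys' built-in ABC pass maps to LUTs of
    that size; otherwise technology mapping is left to a later stage.
    """
    commands = [
        f"read_verilog {verilog_file}",
        f"synth -top {top_module}",
    ]
    if lut_size is not None:
        commands.append(f"abc -lut {lut_size}")
    commands.append(f"write_blif {blif_file}")
    return "; ".join(commands)


def yosys_command_line(script):
    """Argument list for running the script quietly via `yosys -q -p`."""
    return ["yosys", "-q", "-p", script]


# Hypothetical file and module names, for illustration only.
script = build_yosys_script("design.v", "top", "design.blif")
print(" ".join(yosys_command_line(script)))
```

The returned list can be passed to `subprocess.run` on a machine where Yosys is installed; the sketch itself only constructs the command and does not invoke the tool.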
Current Limitations
● Support for xc6vlx240t only, contributions welcome for other Xilinx architectures (though due to ISE being deprecated, no path beyond Virtex-7 currently exists)
● Incomplete support for all architectural features, e.g. register control (clock enable, set-reset), distributed memory (SLICEM), wide multiplexers (MUXF7/F8), etc.
Future Work
● Investigate outliers (e.g. area utilisation of the 'stereo2' benchmark)
● Improve routing runtime (in particular, two high fanout nets in the 'mcml' benchmark were observed to consume 70% of the total routing runtime)
● Examine the effect of increasing synthesis effort – for example with more aggressive technology-mapping algorithms.
Conclusion
The synthesis stage of the academic VTR CAD suite consumes the smallest share of total runtime (on average, 4%) yet contributes the largest delay gap (31%) across the three stages. This leads us to believe that research should not focus only on back-end tools such as VPR: opportunities also exist at the front-end.
Eddie Hung
Imperial College London, England
VTR-to-Bitstream v2.0 available from http://eddiehung.github.io
Acknowledgements: Grateful for support from the UK EPSRC (grants EP/I012036/1 & EP/I020357/1) and for equipment and license donations from Xilinx.
The long-term goal of this work is to bridge this divide, allowing researchers to use real parts in weird (but also wonderful!) ways.
One such way that is now accessible is to make a robust comparison between academic and industrial tools by targeting the same FPGA.
[Figure: Academic (VTR/VPR, theoretical FPGA architectures) versus Industry (proprietary CAD tools, physical FPGAs?).]
Main Contributions
i) VTR-to-Bitstream (VTB) v2.0 update – an open-source extension that improves front-end Verilog support by leveraging Yosys, as well as back-end support for timing-driven routing on Xilinx architectures. Available from: http://eddiehung.github.io
ii) Applied this extended toolflow to make fair and rigorous comparisons between the quality of results gained by academic and industrial offerings, by targeting an identical commercial device and analysing the outputs using an industrial static timing analysis tool.
[Figure: experimental flow diagram. Input: Verilog HDL. Five flows (Flow 1 to Flow 5) are shown, with labels including "Vanilla ISE" and "Vanilla VTR", built from the following stages.
Xilinx ISE: xst (synthesis), ngdbuild (merge), map (pack & place), par (route), bitgen, trce (STA).
VTR: Odin II (synthesis), ABC (technology mapping), VPR (pack & place), VPR (route).
New in this work: Yosys (synthesis); Architecture Description (e.g. cluster model, placement sites, routing model, wire delays, etc.); VTR-to-Bitstream v2 (with xdl2ncd and DRC).
Prior work: VTB v1 (with xdl2ncd and par).
Intermediate and output files: .edif, .blif, .ncd, .bit, .twr.
The diagram marks which flows are proposed in this work.]