Slide 1 (French 207, MAPLD 2005)
Integrated Tool Suite for Post Synthesis FPGA Power Consumption Analysis
Matthew French, Li Wang
University of Southern California, Information Sciences Institute
Tyler Anderson, Michael Wirthlin
Brigham Young University
Slide 2
FPGA Power Trends & Needs
[Figure: Internal Power Consumption. Number of F-Fs, power (mW), clocking frequency (MHz), and voltage (V), plotted on a logarithmic axis from 1 to 1,000,000 for each Xilinx family: Virtex, Virtex-E, Virtex-II, Virtex-II Pro, Virtex 4 LX. Power calculated assuming 80% device utilization, 80% peak clock frequency, 12.5% toggling rate. Internal logic only, no I/O.]
• Number of logic blocks & maximum operating frequency track Moore’s Law
• Voltage reduction is slower
• Resulting power increase is exponential
• Power needs to be a first-class design constraint
• Limited power tools available
– Spreadsheets
  • Manual entry
  • Prone to guess-timation
– XPower (post-routing)
  • At end of design cycle
  • Profiled after timing simulation
  • Time intensive
  • Unwieldy file sizes
  • Limited reporting
  • Only total power consumed
  • No ability to capture power transients
  • Limited design path if specifications not met
  • Routing tools optimize only throughput
Slide 3
Power Tools: Goals
• Push power analysis, visualization, and optimization to the front of the tool chain:
– Analyze power consumption at logic simulation with two levels of accuracy
• Pre-place-and-route, using heuristic estimates based on fanout
• Back-annotated with precise post-place-and-route RC data
– Visualize by providing intuitive views to help the designer rapidly find and correct inefficient circuits, operating modes, data patterns, etc.
– Optimize systems by automatically identifying problem paths and suggesting improvements
• Benefits
– Closer to logical level and design entry
– Power profiling during functional simulation
– Early estimation before place and route
– Automatic specific resource utilization power details
– Facilitates high-level design alternative exploration
[Figure: FPGA Tool Flow, marking the Proposed Power Tool Entry Point and the Current Power Tool Entry Point.]
Slide 4
Tool Backbone: JHDL & EDIF Parser
• Leverage JHDL simulation environment with EDIF Parser circuit manipulation
• JHDL
– Java-based structural design tool for FPGAs
– Circuits described by creating Java classes
– Design libraries provided for several FPGA families
– http://www.jhdl.org
• JHDL design aids
– Logic simulator & waveform viewer
– Circuit schematic & hierarchy browser
– Module generators
• Circuit designer does not need to know Java!
[Figure: block diagram with EDIF Netlist, EDIF Parser, EDIF Data Structure, JHDL Data Structure, Manipulation Tools, and 3rd Party Tools.]
• EDIF Parser– Supports multiple EDIF files
– Virtex2 libraries and memory initialization
– Support for “black boxes”
– No JHDL wrapper required
– http://splish.ee.byu.edu/reliability/edif/
– Verified: Synplicity, Synplicity Pro, Coregen, System Generator, Chipscope
[Figures: JHDL Environment; EDIF Parser]
Slide 5
Power Tool Flow: Timing-Level
[Figure: Xilinx Tool Flow with boxes Source Code (VHDL, Verilog, JHDL), Synthesis, Map Place & Route, Bitgen (To Target), and Xpower; file labels EDIF, .ncd, and .pwr. Power Tools: EDIF Parser, JHDL Power Analysis & Visualization, and Routed Circuit Model.]
• Event model restructured
– Tool interoperability
– Cross-probing enabled
• Support dynamic insertion of 3rd-party (power) tools
– Circuit APIs in place
– Graphical user interface (GUI) support
Slide 6
Power Visualization Tool
• Two views:
– Instantaneous vs. cumulative power consumption over time
– Sorted tree view of “worst offenders”
• Integrated “cross-probing” with existing JHDL tools
– Unified environment
– Allows experimentation
– Smart re-use of CPU memory
• Help rapidly identify inefficient circuits and operating modes
• Per-cell / per-bit granularity
• Simulation trigger on power specification
[Figure: Cross Probing]
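The views listed on this slide can be illustrated with a minimal Python sketch. It is not the JHDL tool's code: the function name `power_views`, the cell names, and the sample values are all hypothetical, and it assumes the simulator yields one power sample (in mW) per cell per cycle.

```python
from itertools import accumulate

def power_views(samples_mw, spec_mw):
    """samples_mw maps a cell name to its per-cycle power samples (mW)."""
    n_cycles = len(next(iter(samples_mw.values())))
    # Instantaneous view: total power drawn in each simulation cycle.
    instantaneous = [sum(cell[i] for cell in samples_mw.values())
                     for i in range(n_cycles)]
    # Cumulative view: running sum of the instantaneous trace.
    cumulative = list(accumulate(instantaneous))
    # "Worst offenders": cells sorted by their total over the whole run.
    offenders = sorted(samples_mw, key=lambda c: sum(samples_mw[c]), reverse=True)
    # Trigger: first cycle whose instantaneous power exceeds the spec, else None.
    trigger = next((i for i, p in enumerate(instantaneous) if p > spec_mw), None)
    return instantaneous, cumulative, offenders, trigger

# Hypothetical per-cell samples, not measurements from the presentation.
samples = {"mult0": [4.0, 6.0, 9.0],
           "fifo0": [1.0, 1.0, 2.0],
           "ctrl": [0.5, 0.5, 0.5]}
inst, cum, worst, trig = power_views(samples, spec_mw=10.0)
print(inst, cum, worst, trig)
```

With these made-up samples the trigger fires in the third cycle (index 2), where the total first exceeds the 10 mW specification.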
Slide 7
Post Synthesis Level Power Modeling
• Power modeling
– Quiescent power based on total circuit size
– Dynamic power
  • Toggle rates (data dependent)
  • Components used
  • Routing interconnect
– Actual quiescent and dynamic power not known until circuit is placed and routed
• Leverage existing JHDL tool environment
– Toggling rates derived from simulator
  • Will lose glitching information
– Components known from EDIF or JHDL primitives
  • Component capacitance imported from Xpower
– How to model routing interconnect?
  • Do not have exact routing information at synthesis
  • Routing tools can pick different route each iteration
– Interconnect length and combinations vary
$$\text{Power} = (\%\,\text{toggle}) \, (\text{Freq}_{\text{Clock}}) \, (\text{Cap}_{\text{Component}} + \text{Cap}_{\text{Wire}})$$
Xpower Component Capacitance
Component   Cap (pF)
FF          1.21
LUT         1.0
SRL         3.0
LD          1.0
INV         1.0
AND         1.0
RAM         1.0
MULT        17.2
DLL         40.0
IBUF        1.0
BUFG        6.0
BRAM        59.0
Xpower Interconnect Capacitance
Interconnect     Cap (pF)
Long Line        11.8
Hex Line         0.59
Double Line      0.44
Direct Connect   0.29
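As a worked example of this slide's power model, the sketch below combines toggle rate, clock frequency, and capacitance using the Xpower values tabulated on the slide. Several things are assumptions rather than part of the presentation: that component and wire capacitances add, that the result is scaled by the square of the supply voltage as in the standard CMOS dynamic power expression, the default of 1.5 V, and the function name `dynamic_power_mw`.

```python
# Capacitances in pF, copied from the Xpower tables on this slide.
COMPONENT_CAP_PF = {"FF": 1.21, "LUT": 1.0, "SRL": 3.0, "LD": 1.0,
                    "INV": 1.0, "AND": 1.0, "RAM": 1.0, "MULT": 17.2,
                    "DLL": 40.0, "IBUF": 1.0, "BUFG": 6.0, "BRAM": 59.0}
INTERCONNECT_CAP_PF = {"long": 11.8, "hex": 0.59, "double": 0.44, "direct": 0.29}

def dynamic_power_mw(toggle_rate, clock_mhz, component, wires, vdd=1.5):
    """Estimate dynamic power (mW) of one component and its output wiring.

    toggle_rate is a fraction (0.125 for 12.5%); wires lists the
    interconnect segments driven. The vdd**2 factor and its default are
    assumptions; the slide's expression does not show a voltage term.
    """
    cap_pf = COMPONENT_CAP_PF[component] + sum(INTERCONNECT_CAP_PF[w] for w in wires)
    # MHz * pF * V^2 gives microwatts; divide by 1000 for milliwatts.
    return toggle_rate * clock_mhz * cap_pf * vdd ** 2 / 1000.0

# A flip-flop driving one direct connect at 100 MHz with 12.5% toggling.
ff_power = dynamic_power_mw(0.125, 100.0, "FF", ["direct"])
print(round(ff_power, 6))
```

Swapping the `"direct"` segment for `"long"` shows how strongly the routing choice dominates the estimate for small components.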
Slide 8
Wire Power Model Analysis
• Developed power tools to analyze relationships
• Can plot capacitance vs.
– Fanout
– Programmable Interconnect Points
– Wire length
– Total number of nets
– Total number of components
• Which relationships maintain correlation from synthesis to place and route?
– Optimizer removes components, nets
• Can also use tools to judge routing quality
– Identify outliers
– Information available to do power-weighted placement and routing
  • Use placement macros in JHDL
  • Use UCF placement and/or timing constraints
[Figure: Optimization Candidates]
Slide 9
Low Fanout Capacitance Variance
• Not all routes are created Equal
• Up to 60% variance on “same” route length
• East-West vs North-South Bias
• Switches sometimes use Doubles instead of Direct Connects
[Figure: Direct vs Double switch logic, with routes labeled Direct Connect and Double Wire. Annotated routes:]
2.45 pF (#2727): YQ -> F2 (omux-B3)
2.37 pF (#4791): YQ -> G4 (omux-B4)
1.46 pF (#2768): YQ -> F4 (omux-A2)
0.75 pF (#131): YQ -> F2 (omux-A7)
Slide 10
Capacitance vs Fanout
• Fanout model well correlated
• Secondary fit line corresponds to Macros
• High variance at low fanout
• Achieving 4.3% average error, 16% variance
• Explored device utilization models as well
[Figure: Capacitance vs Fanout plot, with Placement Macros labeled]
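One way to picture a fanout-based wire model is a straight-line fit of routed capacitance against fanout. The sketch below uses ordinary least squares, which is an assumption: the presentation does not say which fitting method was used. The data points are hypothetical and chosen to be exactly linear so the result is easy to check.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def mean_abs_pct_error(xs, ys, slope, intercept):
    """Average absolute error of the fit, as a percentage of each measured value."""
    errors = [abs((slope * x + intercept) - y) / y for x, y in zip(xs, ys)]
    return 100.0 * sum(errors) / len(errors)

# Hypothetical (fanout, routed capacitance in pF) pairs, not data from the slides.
fanout = [1, 2, 4, 8, 16]
cap_pf = [0.8, 1.1, 1.7, 2.9, 5.3]

slope, intercept = fit_line(fanout, cap_pf)
error_pct = mean_abs_pct_error(fanout, cap_pf, slope, intercept)
print(slope, intercept, error_pct)
```

With real post-route data, the same error function would give a figure comparable to the average error quoted on this slide, and nets built from placement macros could be fitted separately to capture the secondary line.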
Slide 11
Resulting Power Tool Flow
[Figure: Xilinx Tool Flow with boxes Source Code (VHDL, Verilog, JHDL), Synthesis, Map Place & Route, Bitgen (To Target), and Xpower; file labels EDIF, .ncd, and .pwr. Power Tools: EDIF Parser, JHDL Power Analysis & Visualization, Virtex II Power Model, and Routed Circuit Model.]
Slide 12
Power Optimization Approach
• Influence Xilinx Place&Route tools for power efficiency
– Minimize clock/wire lengths of high power nets
• Use power analysis tools to identify hot-spots and generate constraints
– Timing constraints on non-clock signals
– Location constraints on sink flip-flops of clock signals
• Verify power optimization approaches
– Use final circuit timing model to verify power savings
[Figure: optimization and verification flow. Xilinx Tool Flow: EDIF, Ngdbuild & Map, Place & Route, bitgen, with .ncd and .ucf files. Optimization: EDIF Parser and Power Tools produce Timing Constraint (ns) and Placement Constraint (X,Y). Tool Verification: ModelSim (vhd, vcd) and Xpower.]
Slide 13
Timing Constraint Power Optimization
• Wire power is optimized by reducing length
– MAXDELAY constraint in the UCF file sets the maximum delay allowed on a net
• Power tools contain Wire Table database
– Sortable by: average power, toggling rate, fanout, load
– Apply constraints
Default constraints: constraint freq. 50 MHz, operating freq. 50 MHz; poor power efficiency
Power timing constraints: constraint freq. 100 MHz, operating freq. 50 MHz; better power efficiency
[Figure: Wire Table]
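The step from a sorted wire table to UCF timing constraints can be sketched as follows. This is an illustration only: the table layout, net names, and the function `maxdelay_constraints` are hypothetical, and the presentation does not show the constraint text the tool actually writes. The 10 ns budget corresponds to constraining nets as if the clock ran at 100 MHz, matching the example on this slide.

```python
def maxdelay_constraints(wire_table, fraction, max_delay_ns):
    """Return UCF MAXDELAY lines for the highest-power fraction of nets."""
    # Rank nets by average power, highest first, as a sortable wire table would.
    ranked = sorted(wire_table, key=lambda w: w["avg_power_mw"], reverse=True)
    # Constrain at least one net even when the fraction rounds down to zero.
    count = max(1, int(len(ranked) * fraction))
    return ['NET "{}" MAXDELAY = {:g} ns;'.format(w["net"], max_delay_ns)
            for w in ranked[:count]]

# Hypothetical wire table entries; real values would come from simulation.
wire_table = [
    {"net": "ctrl_state", "avg_power_mw": 0.05, "fanout": 3},
    {"net": "acc_out", "avg_power_mw": 0.42, "fanout": 12},
    {"net": "fifo_rd", "avg_power_mw": 0.11, "fanout": 2},
    {"net": "mult_a", "avg_power_mw": 0.30, "fanout": 6},
]

# Constrain the top 25% of nets to a 100 MHz period (1000 / 100 = 10 ns).
ucf_lines = maxdelay_constraints(wire_table, fraction=0.25, max_delay_ns=1000 / 100)
print("\n".join(ucf_lines))
```

Sorting by `fanout` or another column instead of `avg_power_mw` would give the other selection policies a wire table makes possible.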
Slide 14
Timing Constraint Power Optimization: Preliminary Results
- Power reduction ranges from -1.4% (a slight increase) to 11.8%
- More constraints are not necessarily better
- Can also vary the amount of timing that nets are constrained by
- Circuits still meet original timing specification requirements
Constraint set             % of total nets constrained   Clock (mW)   Signal (mW)   Total Power (mW), Clock + Signal
Baseline, no constraints   N/A                           442.5        19.9          462.4
All nets constrained       12.5%                         439.3        29.4          468.7 (-1.4%)
Fanout < 10 constrained    11.1%                         394.2        23.7          417.9 (9.6%)
Fanout < 4 constrained     10.6%                         400.6        23.1          423.7 (8.4%)
Top 25% constrained        4.1%                          384.5        23.4          407.9 (11.8%)
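The percentages in parentheses are savings relative to the unconstrained baseline total. A short check, using only the clock and signal figures reported on this slide (the variable names are illustrative):

```python
# (constraint set, clock mW, signal mW) as reported on this slide.
baseline = ("Baseline, no constraints", 442.5, 19.9)
runs = [
    ("All nets constrained", 439.3, 29.4),
    ("Fanout < 10 constrained", 394.2, 23.7),
    ("Fanout < 4 constrained", 400.6, 23.1),
    ("Top 25% constrained", 384.5, 23.4),
]

baseline_total = baseline[1] + baseline[2]

def saving_pct(total_mw):
    """Power saved relative to the baseline; negative means power went up."""
    return 100.0 * (baseline_total - total_mw) / baseline_total

savings = {name: round(saving_pct(clock + signal), 1)
           for name, clock, signal in runs}
for name, pct in savings.items():
    print("{}: {}%".format(name, pct))
```

In every constrained run the signal power is higher than the baseline's 19.9 mW, so the net savings come from the clock column.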
Slide 15
Location Constraint Power Optimization
• Power Optimization Guidelines
– Minimize clock zone utilization
– Group flip-flops as tightly as possible
– Group flip-flops closer to clock trunks
[Figure: Less Power Efficient vs. More Power Efficient flip-flop placement]
Reduce clock paths by putting constraints on flip-flop locations, thus reducing the clock capacitance and power.
Slide 16
Location Constraint Power Optimization Interface
• Clock table can be sorted by power, number of flip-flops, etc.
• Users can select locations of flip-flops
– Users can select how tightly flip-flops are placed
– Users can define the area where flip-flops are placed; the tool checks the validity of constraint areas
– Users can select which flip-flop groups the constraints are applied to
[Figure: Clock Table]
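A minimal sketch of turning a user-selected area into UCF location constraints, including a simple validity check. Everything here is an assumption made for illustration: the presentation does not show what constraints the tool emits or how it validates an area. The sketch uses UCF `AREA_GROUP` constraints, hypothetical instance names, and a capacity check based on two flip-flops per Virtex-II slice.

```python
def area_group_ucf(group, flip_flops, x0, y0, x1, y1, ffs_per_slice=2):
    """Return UCF lines placing flip_flops inside the slice range (x0,y0)-(x1,y1)."""
    if x1 < x0 or y1 < y0:
        raise ValueError("area corners are reversed")
    slices = (x1 - x0 + 1) * (y1 - y0 + 1)
    # Validity check (assumed): the area must have room for every flip-flop.
    if len(flip_flops) > slices * ffs_per_slice:
        raise ValueError("area too small for {} flip-flops".format(len(flip_flops)))
    lines = ['INST "{}" AREA_GROUP = "{}";'.format(ff, group) for ff in flip_flops]
    lines.append('AREA_GROUP "{}" RANGE = SLICE_X{}Y{}:SLICE_X{}Y{};'.format(
        group, x0, y0, x1, y1))
    return lines

# Hypothetical flip-flop instances on one clock net, packed into a 2x1 slice area.
clk_ffs = ["acc_reg<0>", "acc_reg<1>", "acc_reg<2>"]
ucf_lines = area_group_ucf("AG_clk_a", clk_ffs, 10, 20, 11, 20)
print("\n".join(ucf_lines))
```

Shrinking the range tightens the grouping, which is the knob the slide describes for how tightly flip-flops are placed.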
Slide 17
Location Constraint Power Optimization: Preliminary Results
Configuration                       Clock (mW)        Signal (mW)      Logic (mW)      Total Power (mW), Clock + Signal + Logic
Baseline, no constraints            442.5             19.9             285.8           748.2
All FFs placed                      293.7 (33.6%)     27.6 (-38.8%)    255.4 (10.6%)   576.7 (22.9%)
IOs in IOBs, all other FFs placed   356.251 (19.5%)   21.909 (-10%)    285.787 (0%)    663.947 (11.3%)
- Individual clock net improvement ranged from -4% to 57%
- Achieve up to 22.9% total power improvement
- Circuits still meet timing requirement if IO buffer flip-flops are left in IOBs
- Power could be further reduced if IO buffer flip-flops are not constrained to be within IOBs
[Figure: Unconstrained vs. Constrained placement]
Slide 18
Conclusions
• Post-synthesis level power modeling is feasible
– Some accuracy trade-offs inevitable
– Quicker power results enable
  • Capability to determine power specifications early in the design flow
  • Feedback on design-level circuit power ramifications
  • Tighter feedback loop to designer for more design iterations
• Optimization
– Preliminary results encouraging
– Tools do not alter original circuit functionality & use COTS inputs
– Developing optimization algorithms & routines
• Tools are open source: http://rhino.east.isi.edu
• This research made possible by a grant from the NASA Earth-Sun System Technology Office