Fully Pipelined FPU for OR1200
Eric Zhang
Electrical & Computer Engineering
Introduction & Motivation
• Floating Point Unit:
– Performs floating point operations such as:
• add/sub, multiplication, division, sine, cosine, FMA
– Wide dynamic range and high precision
– Required by many algorithms and applications
• E.g., Hotspot, SRAD, etc.
– High performance and low power consumption
FPU in OR1200
• Arithmetic, Conversion, Comparison
FPU in OR1200
• Serial implementation with long stalls
– 10 cycles total
– 38 cycles total
– 37 cycles total
Goals and Objectives
• Pipeline the current version of floating point
multiplication and division
• Reduce number of clock cycles
• Eliminate the stalls due to serial implementation
• Synthesize and obtain the physical layout of the
pipelined FPU using Synopsys Top-Down design flow
Methodology
• Analyze existing floating point implementation
– Identify serial implementations that can be pipelined
• Pipeline the FPU multiplier and divider using Synopsys
Register Retiming design flow
• DC for synthesis, VCS for functional simulation and
verification, IC Compiler for physical layout, and power
and area measurement
Register Retiming
Register Retiming
1. Library setup
2. Constraint setup
3.
4. Compile
5. New constraint
6. Retiming
Register Retiming Flow
Register Retiming Timing Report
Schematic Before Retiming
Schematic After Retiming
VCS Functional Simulation
1.6 * 4.0 = 6.4
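When reading a result like this off a simulation waveform, it helps to know the raw bit patterns of the operands and the expected product. The sketch below is a hedged illustration, not part of the original testbench: it assumes the FPU operates on IEEE 754 single-precision (32-bit) values, and the helper names `float_to_bits`, `bits_to_float`, and `fields` are made up for this example.

```python
import struct

def float_to_bits(x: float) -> int:
    """Round x to IEEE 754 single precision and return its 32-bit pattern."""
    return struct.unpack(">I", struct.pack(">f", x))[0]

def bits_to_float(b: int) -> float:
    """Interpret a 32-bit pattern as an IEEE 754 single-precision value."""
    return struct.unpack(">f", struct.pack(">I", b))[0]

def fields(b: int) -> tuple:
    """Split a 32-bit pattern into (sign, biased exponent, fraction)."""
    return (b >> 31, (b >> 23) & 0xFF, b & 0x7FFFFF)

# Operands from the slide, rounded to single precision.
a_bits = float_to_bits(1.6)
b_bits = float_to_bits(4.0)

# Reference product: multiply the rounded operands, then round the result.
product_bits = float_to_bits(bits_to_float(a_bits) * bits_to_float(b_bits))

for name, bits in (("a", a_bits), ("b", b_bits), ("a*b", product_bits)):
    print(f"{name:>3} = 0x{bits:08X}  fields = {fields(bits)}")
```

The printed hex words can be compared against the operand and result buses in the waveform viewer. Multiplying by 4.0 only changes the exponent, so the fraction field of the product matches that of 1.6.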
VCS Functional Simulation
1.6 / 4.0 = 0.4
Physical Layout
Specification Results
Spec               Pipelined   Original
Frequency          222 MHz     222 MHz
VDD                1.05 V      1.05 V
Metal Layers       9           9
# of input pins    143         143
# of output pins   80          80
Area               0.5 mm^2    0.45 mm^2
FPMUL Cycles       13          38
FPDIV Cycles       11          37
Dynamic Power      3.79 mW     0.65 mW
Leakage Power      1.33 mW     0.69 mW
Total Power        5.13 mW     1.34 mW
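The cycle counts above are per-operation latencies. The benefit of pipelining shows up mostly in throughput on a stream of independent operations. The sketch below is a rough model under stated assumptions, not a measurement. It assumes the pipelined unit can accept one new operation per cycle and that the original unit stalls for the full latency of every operation. The function names are illustrative only.

```python
def serial_cycles(n_ops: int, latency: int) -> int:
    """Total cycles when each operation must finish before the next starts."""
    return n_ops * latency

def pipelined_cycles(n_ops: int, depth: int) -> int:
    """Total cycles when a new operation is issued every cycle (assumption):
    the first result appears after `depth` cycles, then one result per cycle."""
    if n_ops == 0:
        return 0
    return depth + (n_ops - 1)

# Latencies taken from the results table.
FPMUL = {"pipelined": 13, "original": 38}
FPDIV = {"pipelined": 11, "original": 37}

n = 100  # hypothetical number of back-to-back independent multiplications
mul_serial = serial_cycles(n, FPMUL["original"])
mul_piped = pipelined_cycles(n, FPMUL["pipelined"])
print(f"FPMUL, {n} ops: original {mul_serial} cycles, pipelined {mul_piped} cycles")
print(f"single-op latency ratio: {FPMUL['original'] / FPMUL['pipelined']:.2f}x")
```

Under these assumptions, 100 multiplications take 3800 cycles on the original unit and 112 cycles on the pipelined one. Dependent operations would see only the single-operation latency improvement.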
DesignWare IP
• Technology-independent
• Microarchitecture-level library
• Synthesizable for ASIC, SoC, and FPGA design
• IPs include:
– Arithmetic components: multiplier, divider, adder, etc.
• DW01_add, DW02_mult, DW_fp_mult
– DSP, AMBA Bus, Memory Controller
• DW_fir
– etc
DesignWare IP
• To use DesignWare IP:
1. set synthetic_library dw_foundation.sldb
2. set link_library "* $target_library $synthetic_library"
3. License: DesignWare
• Instantiation In Verilog file:
– DW02_mult #(8, 8) U1 (A, B, TC, PRODUCT);
• Synthesize using normal flow
DesignWare IP
• Benefits of using DesignWare IP
– Increased productivity: parameterized, pre-verified
– Better quality of results (QoR): optimized by Synopsys
– Design reusability
Improved Scripts for design flow
• Automatically set up all necessary folders and scripts
• Automatically set up scratch storage for synthesis
results
• Scripts common to different projects are created as
symbolic links
– E.g., setup.tcl
Improved Scripts for design flow
Top level folder without any projects:
Create a project called “test”:
Improved Scripts for design flow
Top level folder after creating “test”:
Folder layout of project “test” :
Other useful scripts:
• timing_closure.sh: binary search for minimum delay
• project_init.tcl: project-specific information (top-level design name, language, etc.)
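The idea behind a binary search for minimum delay can be sketched as follows. This is not the contents of timing_closure.sh. It is a minimal Python illustration that assumes timing closure is monotonic in the clock period (if a period meets timing, every longer period does too). The `meets_timing` callback and the `fake_synthesis` stand-in are hypothetical. A real script would launch a synthesis run and check the reported slack instead.

```python
from typing import Callable

def min_period(meets_timing: Callable[[float], bool],
               lo: float, hi: float, tol: float = 0.01) -> float:
    """Return a period within `tol` of the smallest one in [lo, hi] that
    meets timing. Requires that `hi` itself meets timing."""
    if not meets_timing(hi):
        raise ValueError("timing is not met even at the largest period")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if meets_timing(mid):
            hi = mid   # mid works: try a tighter clock
        else:
            lo = mid   # mid fails: relax the clock
    return hi

# Hypothetical stand-in for one synthesis run: pretend the critical path
# is 4.5 ns, so any period at or above that meets timing.
def fake_synthesis(period_ns: float) -> bool:
    return period_ns >= 4.5

best = min_period(fake_synthesis, lo=1.0, hi=10.0)
print(f"minimum period found: {best:.3f} ns")
```

Each iteration halves the search interval, so the number of synthesis runs grows only logarithmically with the ratio of the initial range to the tolerance.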
Thank you!