Fully Pipelined FPU for OR1200

Eric Zhang

Electrical & Computer Engineering

Introduction & Motivation

• Floating Point Unit:

– Performs floating point operations such as:

• add/sub, multiplication, division, sine, cosine, FMA

– Wide dynamic range and high precision

– Required by many algorithms and applications

• E.g., Hotspot, SRAD

– High performance and low power consumption

FPU in OR1200

• Arithmetic, Conversion, Comparison

• Serial implementation with long stalls

– 10 cycles total

– 38 cycles total

– 37 cycles total

Goals and Objectives

• Pipeline the current floating-point multiplication and division units

• Reduce the number of clock cycles per operation

• Eliminate the stalls caused by the serial implementation

• Synthesize and obtain the physical layout of the pipelined FPU using the Synopsys top-down design flow

Methodology

• Analyze existing floating point implementation

– Identify the serial parts of the implementation that can be pipelined

• Pipeline the FPU multiplier and divider using the Synopsys register retiming design flow

• Design Compiler (DC) for synthesis, VCS for functional simulation and verification, and IC Compiler for physical layout and for power and area measurement

Register Retiming

1. Library setup

2. Constraint setup

3.

4. Compile

5. New constraint

6. Retiming

Register Retiming Flow

Register Retiming Timing Report

Schematic Before Retiming

Schematic After Retiming

VCS Functional Simulation

1.6 * 4.0 = 6.4

1.6 / 4.0 = 0.4

Physical Layout

Specification Results

Spec               Pipelined    Original
Frequency          222 MHz      222 MHz
VDD                1.05 V       1.05 V
Metal layers       9            9
# of input pins    143          143
# of output pins   80           80
Area               0.5 mm^2     0.45 mm^2
FPMUL cycles       13           38
FPDIV cycles       11           37
Dynamic power      3.79 mW      0.65 mW
Leakage power      1.33 mW      0.69 mW
Total power        5.13 mW      1.34 mW
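
The ratios implied by the table can be checked with a few lines of shell. This is only a sketch: the helper name ratio is made up for illustration, and the numbers are copied from the table above. Both designs run at 222 MHz, so the cycle ratio is also the latency ratio.

```bash
#!/usr/bin/env bash
# ratio A B: print A/B rounded to two decimals.
# (Hypothetical helper; awk does the floating-point division.)
ratio() {
    LC_ALL=C awk -v a="$1" -v b="$2" 'BEGIN { printf "%.2f\n", a / b }'
}

echo "FPMUL cycle reduction: $(ratio 38 13)x"      # 38 -> 13 cycles, prints 2.92x
echo "FPDIV cycle reduction: $(ratio 37 11)x"      # 37 -> 11 cycles, prints 3.36x
echo "Total power increase:  $(ratio 5.13 1.34)x"  # prints 3.83x
echo "Area increase:         $(ratio 0.5 0.45)x"   # prints 1.11x
```

In other words, multiplication and division take roughly a third of the original cycle count, at the cost of about 3.8x the total power and about 11% more area.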

DesignWare IP

• Technology-independent

• Microarchitecture-level library

• Synthesizable for ASIC, SoC, and FPGA design

• IPs include:

– Arithmetic components: multiplier, divider, adder, etc.

• DW01_add, DW02_mult, DW_fp_mult

– DSP, AMBA Bus, Memory Controller

• DW_fir

– etc.

• To use DesignWare IP:

1. set synthetic_library dw_foundation.sldb

2. set link_library "* $target_library $synthetic_library"

3. License: DesignWare

• Instantiation in a Verilog file:

– DW02_mult #(8, 8) U1 (A, B, TC, PRODUCT);

• Synthesize using normal flow

• Benefits of using DesignWare IP

– Increased productivity: parameterized, pre-verified

– Better quality of results (QoR): optimized by Synopsys

– Design reusability

Improved Scripts for Design Flow

• Automatically set up all necessary folders and scripts

• Automatically set up scratch storage for synthesis results

• Scripts common to different projects are created as symbolic links

– E.g., setup.tcl
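
A minimal sketch of the symbolic-link idea, not the actual project scripts. Everything here is an assumption for illustration: the function name create_project, the common/ folder, the sub-folder names (src, scripts, reports, work), and the optional scratch argument.

```bash
#!/usr/bin/env bash
# create_project TOP NAME [SCRATCH]
# TOP/common is assumed to hold the scripts shared by all projects (e.g. setup.tcl).
# Sub-folder names are illustrative, not the layout used in the slides.
create_project() {
    local top=$1 name=$2 scratch=${3:-}
    local proj="$top/$name"
    if [ -e "$proj" ]; then
        echo "project '$name' already exists" >&2
        return 1
    fi
    mkdir -p "$proj/src" "$proj/scripts" "$proj/reports"

    # Link rather than copy each shared script, so one fix in common/
    # reaches every project.
    local f
    for f in "$top"/common/*.tcl; do
        [ -e "$f" ] || continue      # common/ has no shared scripts yet
        ln -s "../../common/$(basename "$f")" "$proj/scripts/$(basename "$f")"
    done

    # Optional scratch storage for synthesis results (SCRATCH must be an
    # absolute path so the link resolves from inside the project).
    if [ -n "$scratch" ]; then
        mkdir -p "$scratch/$name"
        ln -s "$scratch/$name" "$proj/work"
    fi
}
```

For example, create_project . test would create ./test and link ./common/setup.tcl into ./test/scripts. Relative link targets keep the tree relocatable as a whole.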


Top level folder without any projects:

Create a project called “test”:


Top level folder after creating “test”:

Folder layout of project “test” :

Other useful scripts:

– timing_closure.sh: binary search for the minimum delay

– project_init.tcl: project-specific information (top-level design name, language, etc.)
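
The binary search behind a script like timing_closure.sh might look as follows. This is a sketch, not the original script: meets_timing is a hypothetical stand-in for one synthesis run (a real version would invoke the synthesis tool and check the reported slack), and the 4500 ps threshold is an arbitrary placeholder. It assumes timing is monotonic, so any period longer than a passing one also passes.

```bash
#!/usr/bin/env bash
# Hypothetical stand-in for "synthesize at this clock period and check slack".
# Here the design is simply assumed to pass at MIN_FEASIBLE_PS or slower.
meets_timing() {
    local period_ps=$1
    [ "$period_ps" -ge "${MIN_FEASIBLE_PS:-4500}" ]
}

# find_min_period LO HI: smallest integer period (ps) in [LO, HI] that passes.
find_min_period() {
    local lo=$1 hi=$2 mid
    if ! meets_timing "$hi"; then
        echo "no feasible period <= $hi ps" >&2
        return 1
    fi
    if meets_timing "$lo"; then
        echo "$lo"                   # even the lower bound passes
        return 0
    fi
    # Invariant: lo fails timing, hi meets timing.
    while [ $((hi - lo)) -gt 1 ]; do
        mid=$(( (lo + hi) / 2 ))
        if meets_timing "$mid"; then
            hi=$mid                  # mid passes: minimum is at or below mid
        else
            lo=$mid                  # mid fails: minimum is above mid
        fi
    done
    echo "$hi"
}

find_min_period 1000 10000           # prints 4500 with the stub above
```

Each iteration halves the search interval, so a 1000 to 10000 ps range needs about 14 trial runs instead of thousands, which matters when every trial is a full synthesis run.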

Thank you!
