
Post on 24-Feb-2016

Page 1: Fully Pipelined FPU for OR1200

Fully Pipelined FPU for OR1200

Eric Zhang

Electrical & Computer Engineering

Page 2: Fully Pipelined FPU for OR1200

Introduction & Motivation

• Floating Point Unit:

– Performs floating point operations such as:

• add/sub, multiplication, division, sine, cosine, FMA

– Wide dynamic range and high precision

– Required by many algorithms and applications

• E.g., Hotspot, SRAD, etc.

– High performance and low power consumption

Page 3: Fully Pipelined FPU for OR1200

FPU in OR1200

• Arithmetic, Conversion, Comparison

Page 4: Fully Pipelined FPU for OR1200

FPU in OR1200

• Serial implementation with long stalls

– 10 cycles total

– 38 cycles total

– 37 cycles total

Page 5: Fully Pipelined FPU for OR1200

Goals and Objectives

• Pipeline the current version of floating point multiplication and division

• Reduce number of clock cycles

• Eliminate the stalls due to serial implementation

• Synthesize and obtain the physical layout of the pipelined FPU using Synopsys Top-Down design flow
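The benefit of removing serial stalls can be estimated with a simple cycle-count model. The sketch below assumes that a serial unit accepts a new operation only after the previous one finishes, and that a fully pipelined unit accepts one new operation per cycle. The function names are illustrative and are not taken from the OR1200 sources; the example latencies (38 cycles serial, 13 cycles pipelined for multiplication) are the figures reported in the results slide of this deck.

```python
def serial_cycles(n_ops: int, latency: int) -> int:
    """Cycles to finish n_ops back-to-back operations on a non-pipelined unit."""
    return n_ops * latency


def pipelined_cycles(n_ops: int, latency: int) -> int:
    """Cycles on a fully pipelined unit.

    Assumption: one new operation can be issued every cycle, so after the
    first result (latency cycles) one further result completes per cycle.
    """
    if n_ops == 0:
        return 0
    return latency + (n_ops - 1)


if __name__ == "__main__":
    n = 100  # hypothetical number of independent multiplications
    print("serial   :", serial_cycles(n, latency=38), "cycles")
    print("pipelined:", pipelined_cycles(n, latency=13), "cycles")
```

The model ignores data dependencies between operations; dependent operations would still see the full latency of each result.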

Page 6: Fully Pipelined FPU for OR1200

Methodology

• Analyze existing floating point implementation

– Identify serial implementations that can be pipelined

• Pipeline the FPU multiplier and divider using Synopsys Register Retiming design flow

• DC for synthesis, VCS for functional simulation and verification, IC Compiler for physical layout and for power and area measurement

Page 7: Fully Pipelined FPU for OR1200

Register Retiming

Page 8: Fully Pipelined FPU for OR1200

Register Retiming

1. Library setup

2. Constraint setup

3.

4. Compile

5. New constraint

6. Retiming

Page 9: Fully Pipelined FPU for OR1200

Register Retiming Flow

Page 10: Fully Pipelined FPU for OR1200

Register Retiming Timing Report

Page 11: Fully Pipelined FPU for OR1200

Schematic Before Retiming

Page 12: Fully Pipelined FPU for OR1200

Schematic After Retiming

Page 13: Fully Pipelined FPU for OR1200

VCS Functional Simulation

1.6 * 4.0 = 6.4
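When checking a simulation waveform, it helps to know the bit patterns the operand and result buses should carry. The sketch below computes IEEE 754 single-precision encodings with Python's standard `struct` module. It assumes the FPU operates on single-precision values with round-to-nearest-even; the helper names are illustrative and not part of the original testbench.

```python
import struct


def f32_bits(x: float) -> int:
    """IEEE 754 single-precision bit pattern of x, as an unsigned integer."""
    return struct.unpack(">I", struct.pack(">f", x))[0]


def f32_round(x: float) -> float:
    """Round a Python float (double) to the nearest single-precision value."""
    return struct.unpack(">f", struct.pack(">f", x))[0]


def expected_product_bits(a: float, b: float) -> int:
    """Expected single-precision result bits for a * b.

    The product of two single-precision values is exact in double precision,
    so rounding it once to single gives the correctly rounded result.
    """
    return f32_bits(f32_round(a) * f32_round(b))


if __name__ == "__main__":
    a, b = 1.6, 4.0
    print(f"a      = 0x{f32_bits(a):08X}")
    print(f"b      = 0x{f32_bits(b):08X}")
    print(f"a * b  = 0x{expected_product_bits(a, b):08X}")
```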

Page 14: Fully Pipelined FPU for OR1200

VCS Functional Simulation

1.6 / 4.0 = 0.4

Page 15: Fully Pipelined FPU for OR1200

Physical Layout

Page 16: Fully Pipelined FPU for OR1200

Specification Results

Spec               Pipelined    Original
Frequency          222 MHz      222 MHz
VDD                1.05 V       1.05 V
Metal Layers       9            9
# of input pins    143          143
# of output pins   80           80
Area               0.5 mm^2     0.45 mm^2
FPMUL Cycles       13           38
FPDIV Cycles       11           37
Dynamic Power      3.79 mW      0.65 mW
Leakage Power      1.33 mW      0.69 mW
Total Power        5.13 mW      1.34 mW
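The trade-off in these results can be made concrete with a few derived figures. The sketch below computes latency speedup and a rough energy estimate per isolated operation from the reported cycle counts, total power, and clock frequency. The energy figure assumes the reported total power is the average power drawn while a single operation is in flight, which the slides do not state; the function names are illustrative.

```python
FREQ_MHZ = 222.0  # clock frequency reported for both designs


def speedup(original_cycles: int, pipelined_cycles: int) -> float:
    """Latency improvement as a ratio of cycle counts at equal frequency."""
    return original_cycles / pipelined_cycles


def latency_ns(cycles: int, freq_mhz: float = FREQ_MHZ) -> float:
    """Latency of one operation in nanoseconds."""
    return cycles / freq_mhz * 1e3


def energy_nj(power_mw: float, cycles: int, freq_mhz: float = FREQ_MHZ) -> float:
    """Rough energy of one isolated operation in nanojoules.

    Assumption: power_mw is the average power during the operation.
    mW * cycles / MHz = 1e-3 W * cycles / 1e6 Hz = 1e-9 J.
    """
    return power_mw * cycles / freq_mhz


if __name__ == "__main__":
    print(f"FPMUL speedup: {speedup(38, 13):.2f}x")
    print(f"FPDIV speedup: {speedup(37, 11):.2f}x")
    print(f"FPMUL latency: {latency_ns(13):.1f} ns vs {latency_ns(38):.1f} ns")
    print(f"FPMUL energy : {energy_nj(5.13, 13):.3f} nJ vs {energy_nj(1.34, 38):.3f} nJ")
```

With a stream of independent operations the pipelined unit overlaps them, so the energy per operation would be lower than this single-operation estimate suggests.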

Page 17: Fully Pipelined FPU for OR1200

DesignWare IP

• Technology-independent

• Microarchitecture-level library

• Synthesizable for ASIC, SoC, and FPGA design

• IPs include:

– Arithmetic components: multiplier, divider, adder, etc.

• DW01_add, DW02_mult, DW_fp_mult

– DSP, AMBA Bus, Memory Controller

• DW_fir

– etc

Page 18: Fully Pipelined FPU for OR1200

DesignWare IP

• To use DesignWare IP:

1. set synthetic_library dw_foundation.sldb

2. set link_library "* $target_library $synthetic_library"

3. License: DesignWare

• Instantiation In Verilog file:

– DW02_mult #(8, 8) U1 (A, B, TC, PRODUCT);

• Synthesize using normal flow

Page 19: Fully Pipelined FPU for OR1200

DesignWare IP

• Benefits of using DesignWare IP

– Increased productivity: parameterized, pre-verified

– Better quality of results (QoR): optimized by Synopsys

– Design reusability

Page 20: Fully Pipelined FPU for OR1200

Improved Scripts for design flow

• Automatically set up all necessary folders and scripts

• Automatically set up scratch storage for synthesis results

• Scripts common to different projects are created as symbolic links

– E.g., setup.tcl
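The idea of sharing common scripts through symbolic links can be sketched as follows. This is not the author's script: the `common` and `scripts` folder names and the `create_project` function are hypothetical, chosen only to illustrate the approach on a system that supports symbolic links.

```python
import os
from pathlib import Path


def create_project(top: Path, name: str, shared=("setup.tcl",)) -> Path:
    """Create a project folder whose shared scripts are symlinks.

    Hypothetical layout: shared scripts live in top/common, and each
    project gets top/<name>/scripts with links pointing back to them.
    """
    common = top / "common"
    scripts = top / name / "scripts"
    scripts.mkdir(parents=True, exist_ok=True)
    for script in shared:
        link = scripts / script
        if not link.is_symlink() and not link.exists():
            # Relative target keeps the tree relocatable.
            link.symlink_to(os.path.relpath(common / script, scripts))
    return top / name


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        top = Path(tmp)
        (top / "common").mkdir()
        (top / "common" / "setup.tcl").write_text("# shared setup\n")
        project = create_project(top, "test")
        print(sorted(p.name for p in (project / "scripts").iterdir()))
```

Because every project links to the same file, an edit to the shared script takes effect in all projects at once.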

Page 21: Fully Pipelined FPU for OR1200

Improved Scripts for design flow

Top level folder without any projects:

Create a project called “test”:

Page 22: Fully Pipelined FPU for OR1200

Improved Scripts for design flow

Top level folder after creating “test”:

Folder layout of project “test” :

Other useful scripts:

– timing_closure.sh: binary search for minimum delay

– project_init.tcl: project-specific information (top-level design name, language, etc.)
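A binary search for the minimum delay can be sketched as below. This is an illustration of the search idea only, not the contents of timing_closure.sh: `meets_timing` is a hypothetical callback standing in for a synthesis run at a given clock period, and the search assumes that if a period passes, every longer period also passes.

```python
from typing import Callable


def min_passing_period(
    meets_timing: Callable[[float], bool],
    lo_ns: float,
    hi_ns: float,
    tol_ns: float = 0.01,
) -> float:
    """Smallest clock period (within tol_ns) for which meets_timing is True.

    lo_ns is a period assumed to fail, hi_ns a period that must pass.
    """
    if not meets_timing(hi_ns):
        raise ValueError("upper bound does not meet timing")
    while hi_ns - lo_ns > tol_ns:
        mid = (lo_ns + hi_ns) / 2
        if meets_timing(mid):
            hi_ns = mid  # mid passes: try a tighter period
        else:
            lo_ns = mid  # mid fails: relax the period
    return hi_ns


if __name__ == "__main__":
    # Stand-in for a real synthesis run: pretend the design closes at 4.5 ns.
    fake_run = lambda period_ns: period_ns >= 4.5
    print(f"{min_passing_period(fake_run, 1.0, 10.0):.2f} ns")
```

Each probe corresponds to one full synthesis run in practice, so the logarithmic number of steps is the main reason to prefer binary search over a linear sweep.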

Page 23: Fully Pipelined FPU for OR1200

Thank you!