Embedded ISA Support for Enhanced Floating-Point to Fixed-Point ANSI C Compilation Tor Aamodt and Paul Chow University of Toronto { aamodt, pc }@eecg.utoronto.ca 3rd ACM International Conference on Compilers, Architectures and Synthesis for Embedded Systems, Nov. 17- 18th, 2000, San Jose CA

Page 1: Embedded ISA Support for Enhanced Floating-Point to Fixed-Point ANSI C Compilation

Embedded ISA Support for Enhanced Floating-Point to Fixed-Point ANSI C Compilation

Tor Aamodt and Paul Chow

University of Toronto

{ aamodt, pc }@eecg.utoronto.ca

3rd ACM International Conference on Compilers, Architectures and Synthesis for Embedded Systems, Nov. 17-18th, 2000, San Jose CA

Page 2: Embedded ISA Support for Enhanced Floating-Point to Fixed-Point ANSI C Compilation

Tor Aamodt & Paul Chow

University of Toronto

Embedded ISA Support for Enhanced Floating-Point to Fixed-Point ANSI C Compilation Slide 2 of 32

What is this presentation about?

FOCUS: Signal processing applications developed using high-level language representation and floating-point data types...

WANT: Faster fixed-point software development...

QUESTION: Are there “better” fixed-point DSP instruction-sets in terms of runtime, power, or roundoff-noise performance?

Page 3: Embedded ISA Support for Enhanced Floating-Point to Fixed-Point ANSI C Compilation

Presentation Outline

Motivation & Background

Focus on…

• Automatic Conversion to Fixed-Point

• Architectural Enhancements

• Some Experimental Results

Summary / Future Directions

Page 4: Embedded ISA Support for Enhanced Floating-Point to Fixed-Point ANSI C Compilation

Motivation

80% of DSPs in use are Fixed-Point. Why?

Because fixed-point hardware is cheaper and uses less power …

… however, it is much harder to develop signal-processing software for.

Page 5: Embedded ISA Support for Enhanced Floating-Point to Fixed-Point ANSI C Compilation

Background

UTDSP Project: DSP Compiler/Architecture Co-design

• Traditional DSP architectures are hard for compilers to generate efficient code for, e.g. extended-precision accumulators

• First Generation Silicon, Sept. 30, 1999: 108-pin PGA, 0.35 µm CMOS, 63 MHz (Sean Peng’s M.A.Sc.)

• 16-bit Fixed-Point VLIW DSP with novel 2-level instruction fetching architecture (reduced pin count)

June 2000: Synopsys CoCentric Fixed-Point Designer Tool

• First commercial tool for transforming floating-point ANSI C programs into fixed-point ($20,000 US)

Page 6: Embedded ISA Support for Enhanced Floating-Point to Fixed-Point ANSI C Compilation

Background: Fixed-Point versus Floating-Point

Fixed-Point: sign bit | Integer Part | Fractional Part (implied binary point)

32-bit Floating-Point (IEEE): sign bit | 8-bit exponent (excess 127) | 23+1 bit normalized mantissa (explicit binary point)
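To make the implied binary point concrete, here is a minimal sketch of converting between a float and a 16-bit fixed-point word. The helper names `to_fixed` and `to_float` and the `frac_bits` parameter are illustrative assumptions, not part of the presentation; the point is only that the word itself carries no record of where the binary point sits.

```c
#include <stdint.h>

/* Hypothetical helpers (not from the slides): a 16-bit fixed-point word
 * whose binary point is implied by frac_bits, the number of fractional
 * bits the programmer has chosen. */
static int16_t to_fixed(float value, int frac_bits)
{
    /* Scale by 2^frac_bits and round to nearest. The caller must keep the
     * scaled value inside the int16_t range. */
    float scaled = value * (float)(1 << frac_bits);
    return (int16_t)(scaled >= 0.0f ? scaled + 0.5f : scaled - 0.5f);
}

static float to_float(int16_t word, int frac_bits)
{
    /* Undo the scaling; frac_bits must match the value used in to_fixed. */
    return (float)word / (float)(1 << frac_bits);
}
```

For instance, 0.75 with 15 fractional bits is stored as 24576, and the same bit pattern read with 12 fractional bits would mean 6.0, which is why the scaling has to be tracked outside the data.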

Page 7: Embedded ISA Support for Enhanced Floating-Point to Fixed-Point ANSI C Compilation

Background: Using Fixed-Point Arithmetic

Floating-Point:

y[n] = y[n-1] + x[n]

Fixed-Point (Explicit Scaling Operations):

y[n] = ((•y[n-1] >> 3) + x[n]) << 1

Page 8: Embedded ISA Support for Enhanced Floating-Point to Fixed-Point ANSI C Compilation

Automatic Conversion Process

Traditional Optimizing Compiler:

Input Program → Parser → Optimizer → Code Generator → Processor

• CONSTRAINT: Input/Output Invariance

• GOAL: Application Speedup

i.e. make code faster, but do not break anything!

Page 9: Embedded ISA Support for Enhanced Floating-Point to Fixed-Point ANSI C Compilation

Automatic Conversion Process

Traditional Optimizing Compiler:

Input Program → Parser → Optimizer → Code Generator → Processor

Floating-Point to Fixed-Point Translator (uses Sample Inputs)

• “RELAX” CONSTRAINTS…

• GOALS:

“Good” Input/Output Fidelity (e.g. good signal-to-noise ratio)

Fast/Low-Power Operation (10-500 times faster than FP emulation)

Page 10: Embedded ISA Support for Enhanced Floating-Point to Fixed-Point ANSI C Compilation

Floating-Point to Fixed-Point Translation

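Floating-point source:

float a, b, x[N];
y = a*x[i] + b*x[i+1];

Fixed-point translation:

int a, b, x[N];
y = (a•x[i] >> 2) + b•x[i+1];

1. Type Conversion (float becomes int)

2. Scaling Operations (the >> 2)

3. Fractional Fixed-Point Operations (the • multiplies)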
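The three steps can be sketched in plain C as follows. This is an illustration under stated assumptions, not the translator's actual output: the words are assumed to be 16 bits with 15 fractional bits, `fmul` and `two_tap` are hypothetical names, and the shift distance of 2 is simply copied from the slide.

```c
#include <stdint.h>

/* Step 3: fractional multiply. Both operands are 16-bit words with 15
 * fractional bits, so the 32-bit product is rescaled by 2^15.
 * Division stands in for an arithmetic right shift (it truncates toward
 * zero, whereas a shift would round toward minus infinity). */
static int16_t fmul(int16_t p, int16_t q)
{
    return (int16_t)(((int32_t)p * (int32_t)q) / 32768);
}

/* Step 1: the float variables have become integer words.
 * Step 2: a scaling operation (divide by 4, i.e. >> 2) aligns the first
 * term with the second before the add. */
static int16_t two_tap(int16_t a, int16_t b, const int16_t *x, int i)
{
    int16_t t1 = fmul(a, x[i]);
    int16_t t2 = fmul(b, x[i + 1]);
    return (int16_t)(t1 / 4 + t2);   /* (a*x[i] >> 2) + b*x[i+1] */
}
```

With a = 16384, b = 8192 and both samples equal to 16384, the two products are 8192 and 4096, and the scaled sum is 6144.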

Page 11: Embedded ISA Support for Enhanced Floating-Point to Fixed-Point ANSI C Compilation

Floating-Point to Fixed-Point Translator

SUIF Parser*

*SUIF = Stanford University Intermediate Format See: http://suif.stanford.edu

Identifier Assignment

Optimizer

Instrument Code

Profile

Sample Inputs

Fixed-Point Conversion

Page 12: Embedded ISA Support for Enhanced Floating-Point to Fixed-Point ANSI C Compilation

Collecting Dynamic Range Information

Consider the ANSI C code:

float a, b, x[N];
y = a*x[i] + b*x[i+1];

Equivalent Expression Tree:

y = + ( * (a, x[i]), * (b, x[i+1]) )

ID Assignment:

tmp_1 = a*x[i];        “1” : tmp_1
tmp_2 = b*x[i+1];      “2” : tmp_2
y = tmp_1 + tmp_2;     “0” : y

Code Instrumentation:

profile(tmp_1,1);
profile(tmp_2,2);
profile(y,0);
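A minimal sketch of what such instrumentation might look like at run time is given below. The body of `profile`, the `max_abs` table, and the wrapper `instrumented` are assumptions made for illustration; the slides only show the calls, not what the profiler records.

```c
#define NUM_IDS 3

/* Largest magnitude observed so far for each identifier (assumed to be
 * the statistic of interest for dynamic range). */
static double max_abs[NUM_IDS];

/* Hypothetical body for the profile() calls: record the dynamic range of
 * the value tagged with the given identifier. */
static void profile(double value, int id)
{
    double mag = value < 0.0 ? -value : value;
    if (mag > max_abs[id]) {
        max_abs[id] = mag;
    }
}

/* Instrumented version of y = a*x[i] + b*x[i+1], split into the
 * temporaries named in the ID assignment. */
static float instrumented(float a, float b, const float *x, int i)
{
    float tmp_1 = a * x[i];
    float tmp_2 = b * x[i + 1];
    float y = tmp_1 + tmp_2;
    profile(tmp_1, 1);
    profile(tmp_2, 2);
    profile(y, 0);
    return y;
}
```

Running the instrumented code over the sample inputs leaves one maximum per identifier, which is the intermediate-result information the later scaling rules draw on.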

Page 13: Embedded ISA Support for Enhanced Floating-Point to Fixed-Point ANSI C Compilation

Generating Scaling Operations

Signal Scaling: Integer Word Length (IWL)

definition: IWL[x] = floor( log2( max |x| ) ) + 1

Word layout: Sign bit | Integer Part (IWL bits) | Fractional Part
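As a small worked sketch, the IWL can be computed from a profiled maximum as shown below. It assumes the definition is floor(log2(max |x|)) + 1; the function name `iwl_from_max` and the treatment of a zero maximum are illustrative choices, not taken from the slides.

```c
/* Integer word length for a signal whose largest observed magnitude is
 * max_abs, assuming IWL = floor(log2(max_abs)) + 1.
 * The value is normalized into [0.5, 1) by repeated halving or doubling,
 * so no math library is needed. */
static int iwl_from_max(double max_abs)
{
    int n = 0;
    if (max_abs <= 0.0) {
        return 0;   /* arbitrary choice for a signal that was always zero */
    }
    while (max_abs >= 1.0) { max_abs /= 2.0; n++; }
    while (max_abs < 0.5)  { max_abs *= 2.0; n--; }
    return n;
}
```

A signal that peaks at 3.9 gets an IWL of 2, one that reaches 4.0 needs 3, and a signal that never exceeds 0.3 gets a negative IWL of -1, meaning the leading fractional bit is never used.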

Page 14: Embedded ISA Support for Enhanced Floating-Point to Fixed-Point ANSI C Compilation

Generating Scaling Operations

Example: “A op B”

Converted Sub-Expressions:

A: IWL_A measured, IWL_A current

B: IWL_B measured, IWL_B current

Result (A op B): IWL_{A op B} measured, IWL_{A op B} current = ?

Page 15: Embedded ISA Support for Enhanced Floating-Point to Fixed-Point ANSI C Compilation

Automatic Conversion Process:

IRP: Using Intermediate Result Profile Data

Previous Algorithms:

‘Worst-Case Evaluation’: Markus Willems et al. FRIDGE: An Interactive Code Generation Environment for HW/SW CoDesign. ICASSP, April 1997. (a.k.a. predecessor to the Synopsys CoCentric Fixed-Point Designer Tool)

A ‘Statistical’ Approach: Ki-Il Kum, Jiyang Kang, and Wonyong Sung. A Floating-Point to Fixed-Point C Converter for Fixed-Point Digital Signal Processors. In Proc. 2nd SUIF Compiler Workshop, August 1997.

Neither uses Intermediate Result Profile data; instead, they combine range information from leaf nodes.

Is Useful Information Lost?

Page 16: Embedded ISA Support for Enhanced Floating-Point to Fixed-Point ANSI C Compilation

IRP: Additive Operations

“A ± B” → “(A << n_A) ± (B >> [n - n_B])”

where:

n_A = IWL_A current - IWL_A measured
n_B = IWL_B current - IWL_B measured
n = IWL_A measured - IWL_B measured

IWL_{A±B} current = IWL_A measured

For example, assume |A| > |B|, and IWL_{A±B} measured ≤ IWL_A measured.

[Diagram: B is shifted right by n bits (>> n) so that its binary point lines up with A before “A ± B” is formed.]
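The additive rule can be exercised with a short sketch. It assumes n_A = IWL_A current - IWL_A measured, n_B = IWL_B current - IWL_B measured, and n = IWL_A measured - IWL_B measured, with 16-bit words whose real value is word / 2^(15 - IWL). The names `scale` and `irp_add` are hypothetical.

```c
#include <stdint.h>

/* Scale x by 2^n; a negative n scales down. Multiplication and division
 * stand in for left and right shifts (division truncates toward zero). */
static int32_t scale(int32_t x, int n)
{
    return n >= 0 ? x * (1 << n) : x / (1 << -n);
}

/* Additive rule sketch: returns (A << n_A) + (B >> (n - n_B)).
 * The result has IWL equal to iwl_a_meas, which assumes the sum needs no
 * more integer bits than A itself does. */
static int32_t irp_add(int32_t a, int iwl_a_cur, int iwl_a_meas,
                       int32_t b, int iwl_b_cur, int iwl_b_meas)
{
    int n_a = iwl_a_cur - iwl_a_meas;
    int n_b = iwl_b_cur - iwl_b_meas;
    int n   = iwl_a_meas - iwl_b_meas;
    return scale(a, n_a) + scale(b, -(n - n_b));
}
```

For example, A = 3.0 held at a current IWL of 3 is the word 12288, and B = 0.75 held at a current IWL of 1 is also 12288. With measured IWLs of 2 and 0, the rule yields 24576 + 6144 = 30720, which is 3.75 at an IWL of 2.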

Page 17: Embedded ISA Support for Enhanced Floating-Point to Fixed-Point ANSI C Compilation

IRP: Multiplication

“A • B” → “(A << n_A) • (B << n_B)”

where:

n_A = IWL_A current - IWL_A measured
n_B = IWL_B current - IWL_B measured

IWL_{A•B} current = IWL_A measured + IWL_B measured
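A minimal sketch of the multiplication rule follows, assuming 16-bit words whose real value is word / 2^(15 - IWL), n_A = IWL_A current - IWL_A measured, n_B = IWL_B current - IWL_B measured, and current IWLs that are no smaller than the measured ones. The name `irp_mul` is hypothetical.

```c
#include <stdint.h>

/* Multiplication rule sketch: (A << n_A) * (B << n_B), using a fractional
 * multiply that rescales the wide product by 2^15.
 * The result has IWL = iwl_a_meas + iwl_b_meas.
 * Assumes iwl_*_cur >= iwl_*_meas so both shifts are left shifts. */
static int32_t irp_mul(int32_t a, int iwl_a_cur, int iwl_a_meas,
                       int32_t b, int iwl_b_cur, int iwl_b_meas)
{
    int32_t a_norm = a * (1 << (iwl_a_cur - iwl_a_meas));
    int32_t b_norm = b * (1 << (iwl_b_cur - iwl_b_meas));
    /* Division stands in for the right shift of the fractional multiply. */
    return (int32_t)(((int64_t)a_norm * (int64_t)b_norm) / 32768);
}
```

With A = 3.0 (word 12288 at current IWL 3, measured IWL 2) and B = 0.75 (word 12288 at current IWL 1, measured IWL 0), both operands normalize to 24576 and the product word is 18432, which is 2.25 at an IWL of 2.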

Page 18: Embedded ISA Support for Enhanced Floating-Point to Fixed-Point ANSI C Compilation

IRP: Division

“A / B” → “(A >> [n_dividend - n_A]) / (B << n_B)”

where:

n_A = IWL_A current - IWL_A measured
n_B = IWL_B current - IWL_B measured
n_diff = IWL_{A/B} measured - IWL_A measured + IWL_B measured

n_dividend = n_diff, if n_diff ≥ 0
n_dividend = 0, otherwise
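The role of n_dividend can be sketched as below. The sketch assumes both operands are already normalized to their measured IWLs (so n_A = n_B = 0), 16-bit words whose real value is word / 2^(15 - IWL), and a fractional divide defined as (dividend * 2^15) / divisor; the names `dividend_shift` and `irp_div` are hypothetical.

```c
#include <stdint.h>

/* n_dividend: extra right shift applied to the dividend when the measured
 * quotient needs more integer bits than IWL_A - IWL_B provides. */
static int dividend_shift(int iwl_q_meas, int iwl_a_meas, int iwl_b_meas)
{
    int n_diff = iwl_q_meas - iwl_a_meas + iwl_b_meas;
    return n_diff >= 0 ? n_diff : 0;
}

/* Fractional divide sketch: pre-scale the dividend down by 2^n_dividend,
 * then form (dividend * 2^15) / divisor. The quotient then has
 * IWL = IWL_A + n_dividend - IWL_B. The divisor must be non-zero. */
static int32_t irp_div(int32_t a, int32_t b, int n_dividend)
{
    int64_t dividend = (int64_t)(a / (1 << n_dividend)) * 32768;
    return (int32_t)(dividend / b);
}
```

For A = 3.0 (word 24576 at IWL 2) and B = 0.75 (word 24576 at IWL 0), the quotient 4.0 has a measured IWL of 3, so n_dividend is 1 and the result word is 16384, which is 4.0 at an IWL of 3.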

Page 19: Embedded ISA Support for Enhanced Floating-Point to Fixed-Point ANSI C Compilation

IRP-SA: Using ‘Shift Absorption’

Example:

y = (a*x[i] + (b*x[i+1]>>1)) << 1

Question: Is information discarded unnecessarily here?

Consider the following alternative:

y = (a*x[i]<<1) + b*x[i+1]

BUT: Can we really discard most significant bits and get roughly the same answer? YES!
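One way to see why discarding most significant bits can be harmless is two's-complement wraparound: an intermediate overflow cancels out as long as the final result fits in the word. The sketch below illustrates this for 16-bit words; the function names are hypothetical, and p and q stand for the two products a*x[i] and b*x[i+1].

```c
#include <stdint.h>

/* Reinterpret a 16-bit pattern as a two's-complement value without
 * relying on implementation-defined conversions. */
static int32_t as_signed16(uint16_t u)
{
    return (u & 0x8000u) ? (int32_t)u - 65536 : (int32_t)u;
}

/* Form without shift absorption: (p + (q >> 1)) << 1.
 * The low bit of q is lost before the add. Division by 2 stands in for
 * the right shift (it truncates toward zero). */
static int32_t without_absorption(int16_t p, int16_t q)
{
    return (p + q / 2) * 2;
}

/* Form with shift absorption: (p << 1) + q in 16-bit wraparound
 * arithmetic. p << 1 may overflow the word, but the wrapped sum is still
 * correct whenever the true result fits in 16 bits. */
static int32_t with_absorption(int16_t p, int16_t q)
{
    uint16_t p2 = (uint16_t)((uint32_t)(uint16_t)p << 1);
    uint16_t sum = (uint16_t)(p2 + (uint16_t)q);
    return as_signed16(sum);
}
```

With p = 20000 and q = -30001, the exact value of 2p + q is 9999. The absorbed form returns 9999 even though 2p alone does not fit in 16 bits, while the unabsorbed form returns 10000 because the low bit of q was dropped.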

Page 20: Embedded ISA Support for Enhanced Floating-Point to Fixed-Point ANSI C Compilation

Architectural Support

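Fractional Multiplication with internal Left Shift

Common occurrence (using IRP-SA): A•B << n

[Diagram: operand A with integer part IWL_A and operand B with integer part IWL_B; the product A*B has an integer part of IWL_A + IWL_B bits, and the output is taken n bits to the left.]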
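A software model of the idea, offered as a sketch rather than a description of the actual hardware, compares a fractional multiply followed by a separate left shift with a multiply that selects its output window from the wide product directly. It assumes 16-bit operands with 15 fractional bits, a shift distance 0 <= n <= 15, and results that fit in 16 bits; `fmul_then_shift` and `fmls` are hypothetical names.

```c
#include <stdint.h>

/* Conventional sequence: fractional multiply, then a separate left shift.
 * The n low-order bits of the result are always zero.
 * Division stands in for the right shift (truncates toward zero). */
static int32_t fmul_then_shift(int16_t a, int16_t b, int n)
{
    int32_t product = (int32_t)a * (int32_t)b;
    return (product / 32768) * (1 << n);
}

/* FMLS model: take the output window n bits lower in the 32-bit product,
 * so the bits that the separate shift would fill with zeros are kept. */
static int32_t fmls(int16_t a, int16_t b, int n)
{
    int32_t product = (int32_t)a * (int32_t)b;
    return product / (1 << (15 - n));
}
```

For a = 1000, b = 1001 and n = 2, the two-step sequence gives 120 while the single-step model gives 122: the extra low-order bits come from the part of the product the first sequence had already thrown away.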

Page 21: Embedded ISA Support for Enhanced Floating-Point to Fixed-Point ANSI C Compilation

Experimental Results

Benchmarks

• 4th Order Cascaded/Parallel IIR Filter (IIR-C, IIR-P)

• (Normalized) Lattice Filter (LAT, NLAT)

• 128-Point Radix 2 Decimation in Time FFT (FFT-NR, FFT-MW)

• Levinson-Durbin Recursion (LEVDUR)

• 10x10 Matrix-Multiply (MMUL10)

• Nonlinear Control (INVPEND)

• Trig Function (SIN)

Page 22: Embedded ISA Support for Enhanced Floating-Point to Fixed-Point ANSI C Compilation

SQNR Enhancement: FMLS and/or IRP-SA

[Bar chart. y-axis: Equivalent Bits (-0.5 to 2). Benchmarks: IIR4-C, IIR4-P, NLAT, LAT, FFT-NR, FFT-MW, INVPEND, LEVDUR, MMUL10, SIN. Series: IRP-SA, FMLS, IRP-SA w/ FMLS.]

Page 23: Embedded ISA Support for Enhanced Floating-Point to Fixed-Point ANSI C Compilation

What Is The Effect of “Shift Absorption” ?

[Bar chart: Distribution of Fractional Multiply Output Shifts. x-axis: FMLS Output Shift Distance (3 left, 2 left, 1 left, none, 1 right). y-axis: Relative Frequency (0 to 0.8). Series: IRP, IRP-SA.]

Page 24: Embedded ISA Support for Enhanced Floating-Point to Fixed-Point ANSI C Compilation

Experimental Results:

Rotational Inverted Pendulum

U of T System Control Group

Non-linear Testbench

Page 25: Embedded ISA Support for Enhanced Floating-Point to Fixed-Point ANSI C Compilation

Closed-Loop System Response: Rotational Inverted Pendulum

12-bit Controller Comparison

WC: 32.8 dB

IRP-SA: 41.1 dB

IRP-SA w/ FMLS: 48.0 dB
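To put these decibel figures next to the "equivalent bits" used elsewhere in the talk, one common rule of thumb (an assumption here, not something the slide states) is that each additional bit of precision is worth about 6.02 dB of SQNR. Under that assumption the gains over the worst-case (WC) result work out roughly as follows:

```latex
\Delta b \approx \frac{\Delta\mathrm{SQNR}}{6.02\ \mathrm{dB/bit}}, \qquad
\frac{41.1 - 32.8}{6.02} \approx 1.4\ \text{bits (IRP-SA)}, \qquad
\frac{48.0 - 32.8}{6.02} \approx 2.5\ \text{bits (IRP-SA w/ FMLS)}
```

The presentation may measure equivalent bits differently, so these numbers are only an order-of-magnitude reading of the slide.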

Page 26: Embedded ISA Support for Enhanced Floating-Point to Fixed-Point ANSI C Compilation

128-Point Radix-2 FFT (Generated by MATLAB RealTime Workshop)

Page 27: Embedded ISA Support for Enhanced Floating-Point to Fixed-Point ANSI C Compilation

Speedup?

Rotational Inverted Pendulum: Fractional Multiply Output Shift Relative Frequencies

Page 28: Embedded ISA Support for Enhanced Floating-Point to Fixed-Point ANSI C Compilation

…Yup!

Page 29: Embedded ISA Support for Enhanced Floating-Point to Fixed-Point ANSI C Compilation

Speedup* Using FMLS

[Bar chart. y-axis: Relative Speedup (1 to 1.4). Benchmarks: IIR4-C, IIR4-P, NLAT, LAT, FFT-NR, FFT-MW, LEVDUR, MMUL10, INVPEND, SIN. Series: Limiting, 8-FMUL, 4-FMUL, 2-FMUL.]

8-FMUL = { 4 left thru 3 right }

4-FMUL = { 2 left thru 1 right }

2-FMUL = { one left, no shift }

Page 30: Embedded ISA Support for Enhanced Floating-Point to Fixed-Point ANSI C Compilation

SQNR Enhancement for various Output Shift Sets

[Bar chart. y-axis: Equivalent Bits (0 to 2). Benchmarks: IIR4-C, IIR4-P, NLAT, LAT, FFT-NR, FFT-MW, LEVDUR, MMUL10, INVPEND, SIN. Series: Limiting, 8-FMUL, 4-FMUL, 2-FMUL.]

Page 31: Embedded ISA Support for Enhanced Floating-Point to Fixed-Point ANSI C Compilation

Summary

The Fractional Multiply with internal Left Shift (FMLS) operation can improve runtime and signal-to-noise performance.

Speedups of up to 35%, and SQNR enhancement equivalent to up to 2 bits, maybe even 4 bits (depending on how you choose to measure it).

Easy VLSI implementation, and easy for compiler to use.

Page 32: Embedded ISA Support for Enhanced Floating-Point to Fixed-Point ANSI C Compilation

Future Directions

Higher Level Transformations:

• Automatic Generation of Block-Floating-Point...

• Quantization Error Feedback…

BOTH need signal-flow-graph representation… therefore probably need a better DSP language than ANSI C

Variable Precision Arithmetic (How much precision does each operation need?)