truncated logarithmic approximation work many approximate conversion schemes exist (di er in...

21
Truncated Logarithmic Approximation Michael Sullivan and Earl E. Swartzlander, Jr. University of Texas April 10, 2013

Upload: dinhcong

Post on 26-Mar-2018

215 views

Category:

Documents


2 download

TRANSCRIPT

Page 1: Truncated Logarithmic Approximation Work Many approximate conversion schemes exist (di er in precision, latency, cost, and exibility). The least costly (and least precise) schemes

Truncated Logarithmic Approximation

Michael Sullivan and Earl E. Swartzlander, Jr.University of Texas

April 10, 2013

Page 2: Truncated Logarithmic Approximation Work Many approximate conversion schemes exist (di er in precision, latency, cost, and exibility). The least costly (and least precise) schemes

Introduction

Approximate arithmetic units have the potential to save power,area, and latency over conventional circuits.

Approximate logarithmic conversion is attractive because it canestimate multiplication, division, rooting, and raising to apower with bounded relative error.

Known application areas include:

I Graphics

I Neural networks

I DSP applicationsI Specialized circuitry

I period meters for nuclear power plants

Page 3: Truncated Logarithmic Approximation Work Many approximate conversion schemes exist (di er in precision, latency, cost, and exibility). The least costly (and least precise) schemes

Example: Approximate division

For example, division of logs can be performed using subtraction.

log2(a/b) = log2(a) − log2(b)

a/b = 2(log2(a)−log2(b))

Approximate logarithms can be used to cheaply approximatefixed-point division.

a/b ≈ 2̃

(l̃og2(a)−l̃og2(b)

)

Page 4: Truncated Logarithmic Approximation Work Many approximate conversion schemes exist (di er in precision, latency, cost, and exibility). The least costly (and least precise) schemes

This WorkMany approximate conversion schemes exist (differ in precision,latency, cost, and flexibility).

The least costly (and least precise) schemes use piece-wise linearinterpolation. All such schemes refine a simple linear interpolationscheme (Mitchell, 1962).

Intuition:The precision of interpolation should be proportional to theprecision of the final result.

The Idea:Rounding off the log and anti-log approximation reduces theircosts, and can actually improve the average precision of the result.

Truncated logarithmic approximation can be used as a drop-inreplacement for Mitchell’s scheme, improving its cost and precision.

Page 5: Truncated Logarithmic Approximation Work Many approximate conversion schemes exist (di er in precision, latency, cost, and exibility). The least costly (and least precise) schemes

Some Background

A fixed-point input, N can be written as N = 2k ∗ (1 + f ).

I k is the characteristic (0≤k< log2(N))

I f is the fractional component (0≤ f < 1)

The binary logarithm of N is log2(N) = k + log2(1 + f ).Approximate logarithmic computations approximate log2(1 + f)with enough fidelity to achieve a set precision.

Page 6: Truncated Logarithmic Approximation Work Many approximate conversion schemes exist (di er in precision, latency, cost, and exibility). The least costly (and least precise) schemes

Mitchell’s Approximation (cont.)

Mitchell estimates log2(1+f ) using a single straight-lineapproximation to the logarithm curve, f ≈ log2(1+f )

26 27 28

6.0

6.5

7.0

7.5

8.0

N

Valu

e

Log2 HNL

Mitchell

Figure: Mitchell’s log approximation.

Page 7: Truncated Logarithmic Approximation Work Many approximate conversion schemes exist (di er in precision, latency, cost, and exibility). The least costly (and least precise) schemes

Mitchell’s Logarithm Generation

LOD Shifter

Encode

X

nlog (n)

n nf

k2

(a) Logarithm Generation

Shifterf'n 2N

z

k'log (n)+12

(b) Anti-log Generation

Figure: Approximate log and anti-log conversion.

Page 8: Truncated Logarithmic Approximation Work Many approximate conversion schemes exist (di er in precision, latency, cost, and exibility). The least costly (and least precise) schemes

Mitchell Hardware Cost

Hardware costs are dominated by two shifters:

X0X1X2X3X4X5X6X7X8X9X10X11X12X13X14

f0f1f2f3f4f5f6f7f8f9f10f11f12f13f14

(a) Logarithm Shifter

1f14f13f12f11f10f9f8f7f6f5f4f3f2f1f0

Z15Z14Z13Z12Z11Z10Z9Z8Z7Z6Z5Z4Z3Z2Z1Z0 Z31Z30Z27Z26 Z29Z28Z19Z18Z17Z16 Z25Z24Z21Z20 Z23Z22

(b) Anti-Log Shifter (after multiplication)

Figure: Shifters for approximate (anti-)logarithmic conversion.

Page 9: Truncated Logarithmic Approximation Work Many approximate conversion schemes exist (di er in precision, latency, cost, and exibility). The least costly (and least precise) schemes

Mitchell’s Approximation

0.0 0.2 0.4 0.6 0.8 1.0

0.0

0.2

0.4

0.6

0.8

1.0

f

Valu

e Log2 H1+fLf

(a) Values

0.0 0.2 0.4 0.6 0.8 1.0

-0.08

-0.06

-0.04

-0.02

0.00

f

Absolu

te

Error

f- Log2 H1+fL

(b) Absolute Error

Figure: The error in Mitchell’s log approximation.

Page 10: Truncated Logarithmic Approximation Work Many approximate conversion schemes exist (di er in precision, latency, cost, and exibility). The least costly (and least precise) schemes

Truncated Logarithmic Approximation

Truncated logarithmic approximation replaces Mitchell’s algorithm,retaining only the t most-significant bits of the fractionalcomponent.

log2-T↑(X , t) = k + (f mod 2−t) + 2−t ≈ log2(X ) (1)

LOD Shifter

Encode

X

nlog (n)

tf

k2

n

Shifterf't 2N

z

k'

log (n)+12

Page 11: Truncated Logarithmic Approximation Work Many approximate conversion schemes exist (di er in precision, latency, cost, and exibility). The least costly (and least precise) schemes

Hardware Savings - Log Generation

X0X1X2X3X4X5X6X7X8X9X10X11X12X13X14

f0f1f2f3f4f5f6f7f8f9f10f11f12f13f14

(a) Logarithm Shifter

X0X1X2X3X4X5X6X7X8X9X10X11X12X13X14

f11f12f13f14

tb

The TruncationSpecific Region

(b) Trunc. Log Shift (t = 4)

Figure: The impact of truncation on the log generation shifter.

Page 12: Truncated Logarithmic Approximation Work Many approximate conversion schemes exist (di er in precision, latency, cost, and exibility). The least costly (and least precise) schemes

Hardware Savings - Anti-Log Generation

1f14f13f12f11f10f9f8f7f6f5f4f3f2f1f0

Z15Z14Z13Z12Z11Z10Z9Z8Z7Z6Z5Z4Z3Z2Z1Z0 Z31Z30Z27Z26 Z29Z28Z19Z18Z17Z16 Z25Z24Z21Z20 Z23Z22

(a) Full Anti-Logarithm Shifter

1f14f13f12f11

Z15Z14Z13Z12Z11Z10Z9Z8Z7Z6Z5Z4Z3Z2Z1Z0 Z31Z30Z27Z26 Z29Z28Z19Z18Z17Z16 Z25Z24Z21Z20 Z23Z22

(b) Truncated Anti-Log Shift (t=4)

Page 13: Truncated Logarithmic Approximation Work Many approximate conversion schemes exist (di er in precision, latency, cost, and exibility). The least costly (and least precise) schemes

Truncated Log Error

0.0 0.2 0.4 0.6 0.8 1.0

0.0

0.2

0.4

0.6

0.8

1.0

f

Valu

e Exact

Mitchell

Log2-T­ Ht=4L

Envelope ­ Ht=4L

(c) Up-Rounded Values

0.0 0.2 0.4 0.6 0.8 1.0

-0.08

-0.06

-0.04

-0.02

0.00

0.02

0.04

0.06

f

Error

Mitchell

Log2-T­ Ht=4L

Log2-T­ Ht=6L

Envelope ­ Ht=4L

(d) Up-Rounded Error

Figure: The error of upward-rounded truncation.

Page 14: Truncated Logarithmic Approximation Work Many approximate conversion schemes exist (di er in precision, latency, cost, and exibility). The least costly (and least precise) schemes

Cost Savings

0 5 10 15 20 25 30200

400

600

800

1000

t

Cost

HUG

ML

Mitchell

Log2-T�

(a) Cost Across t (n=32)

æ

æ

æ

æ

à

à

à

à

16 32 64 1280

1000

2000

3000

4000

5000

n

Cost

HUG

ML

Mitchell

Log2-T�Ht=4L

(b) Cost Across n (t=4)

Figure: The relative costs of truncated log gen/anti-gen.

Page 15: Truncated Logarithmic Approximation Work Many approximate conversion schemes exist (di er in precision, latency, cost, and exibility). The least costly (and least precise) schemes

Cost Savings (cont.)

10

20

30

4050

16 32 64 128

1

16

32

64

128

n

t

Figure: Exploring the n/t landscape.

Page 16: Truncated Logarithmic Approximation Work Many approximate conversion schemes exist (di er in precision, latency, cost, and exibility). The least costly (and least precise) schemes

Piece-wise Linear Logarithmic Approximation

1. M. Combet, H. Van Zonneveld, and L. Verbeek, “Computation ofthe base two logarithm of binary numbers,” IEEE Transactions onComputers, vol. EC-14, no. 6, pp. 863–867, 1965.

2. E. Hall, D. Lynch, and S. Dwyer, “Generation of products andquotients using approximate binary logarithms for digital filteringapplications,” IEEE Transactions on Computers, vol. C-19, no. 2,pp. 97–105, 1970.

3. S. SanGregory, C. Brothers, D. Gallagher, and R. Siferd, “A fast,low-power logarithm approximation with CMOS VLSIimplementation,” in Midwest Symposium on Circuits and Systems,vol. 1, 1999, pp. 388–391.

4. K. Abed and R. Siferd, “CMOS VLSI implementation of alow-power logarithmic converter,” IEEE Transactions on Computers,vol. 52, no. 11, pp. 1421–1433, 2003.

Page 17: Truncated Logarithmic Approximation Work Many approximate conversion schemes exist (di er in precision, latency, cost, and exibility). The least costly (and least precise) schemes

Using as a Drop-In Replacement for Combet et. al.

0.0 0.2 0.4 0.6 0.8 1.0

-0.05

0.00

0.05

f

Error

Correction

Mitchell

Log2-T­ Ht=8L

Log2-T­+Combet Ht=8L

(a) Truncated Combet et. al.

10 15 20 25 300.000

0.005

0.010

0.015

0.020

0.025

t

Error

Maximum

Average

Log2-T­+Combet Ht=8L

(b) Truncated Combet t Exploration

Figure: Truncation applied to Combet et. al.’s correction scheme.

Page 18: Truncated Logarithmic Approximation Work Many approximate conversion schemes exist (di er in precision, latency, cost, and exibility). The least costly (and least precise) schemes

A Novel Error Correction Scheme with Truncation

0.0 0.2 0.4 0.6 0.8 1.0

-0.05

0.

0.05

f

Error

Correction

Mitchell

Log2-T­ Ht=4L

Log2-T­+EC4Ht=4, tec=4L

(a) log2-T↑+EC4 (tec =4)

æ

æ

æ

æ

à

à

à

à

ì

ì

ì

ì

16 32 64 1280

20

40

60

80

n

Cost

for

Correctio

nHU

GM

L

Log2-T­+EC4Htec=4L

Log2-T­+EC4Htec=3L

Log2-T­+EC4Htec=2L

(b) log2-T↑+EC4 (tec =4) cost

Figure: Novel error correction for truncated logarithms.

Page 19: Truncated Logarithmic Approximation Work Many approximate conversion schemes exist (di er in precision, latency, cost, and exibility). The least costly (and least precise) schemes

To Recap (Numerical Results)

Table: Precision and Cost of Truncated Mitchell Analogues (n = 32).

TechniqueMin.Error

Max.Error

Max. Abs.Error

AverageAbs. Error

Cost

Mitchell -0.086 0 0.086 0.057 1.0log2−T↑(t=4) -0.086 0.062 0.086 0.035 0.57

log2−T↑+EC4(tec =2) -0.081 0.078 0.081 0.028 0.60

log2−T↑+EC4(tec =3) -0.066 0.070 0.070 0.024 0.61

log2−T↑+EC4(tec =4) -0.061 0.066 0.061 0.023 0.63

Page 20: Truncated Logarithmic Approximation Work Many approximate conversion schemes exist (di er in precision, latency, cost, and exibility). The least costly (and least precise) schemes

Conclusion

Truncated approximate logarithms improve piecewise-linearapproximate logarithm computations. They are based off of theintuition that the internal precision of a conversion schemeshould be proportional to the precision of the approximation.

Benefits:

I Decrease cost (up to ∼50%)

I May improve the precision of results

I Amenable to existing error reduction techniques

I May allow unique truncation-specific error reduction

Page 21: Truncated Logarithmic Approximation Work Many approximate conversion schemes exist (di er in precision, latency, cost, and exibility). The least costly (and least precise) schemes

Methodology - Unit Gate Model

Delay and cost (energy) estimated using a unit-gate model:

I Simple 2-input gates (AND, OR) [C = 1,T = 1]

I 2-input XOR gates and MUXes [C = 2,T = 2]

I m-input gates composed of a tree of 2-input gates

Advantages of the unit gate model:

I Offers a rough technology-agnostic model for circuit efficiency

I Betters understanding of scaling properties, bottlenecks

I Can be used for rapid design-space exploration

Inverters, buffering, and wiring concerns are ignored, but:

I Limited fan-out components are used

I Wiring stays roughly equivalent after truncation