Dept. Electrónica y Computación, Univ. Santiago de Compostela
Lab. de l’Informatique du Parallélisme, ENS-Lyon
Faithful Powering ($X^p$) Computation
using Table Look-Up and a
Fused Accumulation Tree
José-Alejandro Piñeiro, Javier D. Bruguera, Jean-Michel Muller
SUMMARY
INTRODUCTION
ALGORITHM
• 2nd-degree minimax approximation
• specialized squaring unit
• fused accumulation tree
• error computation
PROPOSED ARCHITECTURES
• unfolded & pipelined architectures
• pre-layout synthesis results & comparison
CONCLUSION
INTRODUCTION
Faithfully rounded single-precision floating-point computation of the powering function ($X^p$)
• $p$ is a parameter
• Applications: computer graphics and DSP
• Framework for computing the reciprocal ($X^{-1}$, hence division), square root ($X^{1/2}$), inverse square root ($X^{-1/2}$), inverse squaring ($X^{-2}$), ...
2nd-degree minimax approximation
Table Look-Up + Specialized Squaring Unit + Fused Accumulation Tree
Speed of 1st-order methods & area of 2nd-order methods
INTRODUCTION: types of methods
Direct Table Look-Up
Polynomial or rational approximations
Iterative algorithms
• digit-recurrence methods (linear convergence)
• multiplicative-based algorithms (quadratic convergence)
Table-based Methods (table look-up + low-degree polynomial)
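To make the "quadratic convergence" of multiplicative-based algorithms concrete, here is a minimal sketch using the Newton-Raphson reciprocal iteration y <- y(2 - xy). The function name, the operand and the seed value are illustrative assumptions, not taken from the slides. Exact rational arithmetic shows the relative error 1 - xy being squared at every step.

```python
from fractions import Fraction

def newton_reciprocal(x, seed, steps):
    """Newton-Raphson for 1/x: y <- y * (2 - x*y). Returns all iterates."""
    y = Fraction(seed)
    iterates = [y]
    for _ in range(steps):
        y = y * (2 - x * y)
        iterates.append(y)
    return iterates

x = Fraction(3, 2)                              # hypothetical operand 1.5
ys = newton_reciprocal(x, Fraction(5, 8), 3)    # hypothetical seed 0.625
errors = [1 - x * y for y in ys]                # relative error of each iterate
for e in errors:
    print(float(e))
```

Each printed error is the square of the previous one, which is why a sufficiently accurate table-based seed lets such methods finish in very few iterations.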
INTRODUCTION: table-based methods
bipartite tables
  $f(X) \approx a_0(X_1, X_2) + a_1(X_1, X_3)$
  table LU + addition
  table sizes: $2^{2m}$ ($2^{16}$ for $m = 8$)
linear approximations
  $f(X) \approx C_0 + C_1 X_2$ or $f(X) \approx C' X'$
  table LU + multiplication
  table sizes: $2^{3m/2}$ ($2^{12}$ for $m = 8$)
second-order approximations
  $f(X) \approx C_0 + C_1 X_2 + C_2 X_2^2$
  table LU + 2nd-degree polynomial evaluation
  table sizes: $2^{m}$ ($2^{8}$ for $m = 8$)
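The three table-size orders above can be compared with a few lines of Python. This is only a sketch: the function name is hypothetical, and the counts are read here as numbers of table entries, which is an assumption about what "table sizes" means on the slide.

```python
def table_entries(m):
    """Order of magnitude of the table size per method, for parameter m."""
    return {
        "bipartite tables": 2 ** (2 * m),
        "linear approximations": 2 ** (3 * m // 2),   # exact only for even m
        "second-order approximations": 2 ** m,
    }

sizes = table_entries(8)
for method, entries in sizes.items():
    print(f"{method}: 2^{entries.bit_length() - 1} = {entries}")
```

For m = 8 the second-order scheme needs 256 times fewer entries than the bipartite one, which is the area argument the slide is making.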
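ALGORITHM: 2nd-degree minimax approx.
$X^p \approx C_0 + C_1 X_2 + C_2 X_2^2$
$X = X_1 + X_2$
$X_1 = [1.x_1 x_2 \ldots x_m]$
$X_2 = [.x_{m+1} x_{m+2} \ldots x_n] \cdot 2^{-m}$
example: $p = -1/2$, $m = 8$
$C_0, C_1, C_2$: Look-Up Tables
$X_2^2$: specialized squaring unit
SD-4 recoding
polynomial evaluation: fused accumulation tree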
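The operand split and the quadratic evaluation can be sketched as follows. Two assumptions are made loudly here: n = 23 fractional bits (single precision), and the coefficients come from a Taylor expansion around X_1 used as a simple stand-in, not the minimax coefficients the slides call for. All function names are hypothetical.

```python
def split_operand(x, m=8, n=23):
    """Split x in [1, 2) with n fractional bits into X_1 (leading 1 plus the
    m most significant fractional bits) and X_2 (the remaining bits)."""
    scaled = round(x * 2 ** n)                  # integer significand
    low_bits = n - m
    x1 = (scaled >> low_bits) / 2 ** m
    x2 = (scaled & ((1 << low_bits) - 1)) / 2 ** n
    return x1, x2

def taylor_coefficients(x1, p):
    """Stand-in for the look-up tables: Taylor coefficients of X^p at X_1."""
    c0 = x1 ** p
    c1 = p * x1 ** (p - 1)
    c2 = p * (p - 1) / 2 * x1 ** (p - 2)
    return c0, c1, c2

def powering(x, p, m=8):
    """Evaluate C_0 + C_1*X_2 + C_2*X_2^2 for the interval selected by X_1."""
    x1, x2 = split_operand(x, m)
    c0, c1, c2 = taylor_coefficients(x1, p)
    return c0 + c1 * x2 + c2 * x2 * x2

x = 1 + 5977391 / 2 ** 23                       # arbitrary 23-bit operand
print(powering(x, -0.5), x ** -0.5)
```

In hardware the three coefficients would be read from tables addressed by the m fractional bits of X_1; the Taylor formulas only keep the sketch self-contained.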
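ALGORITHM: 2nd-degree minimax approx.
Maple program for obtaining the coefficients
minimax($X^p$) $\rightarrow C_0^{*}, C_1^{*}, C_2^{*}$
rounding($C_1^{*}$, $j$ bits) $\rightarrow C_1$
minimax($X^p - C_1 X_2$) $\rightarrow C_0^{**}, C_2^{**}$
rounding($C_2^{**}$, $k$ bits) $\rightarrow C_2$
minimax($X^p - C_1 X_2 - C_2 X_2^2$) $\rightarrow C_0^{***}$
rounding($C_0^{***}$, $i$ bits) $\rightarrow C_0$
error($i$, $j$, $k$ bits) $<$ error_bound ??
$X^p \approx C_0 + C_1 X_2 + C_2 X_2^2$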
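The idea of refitting the remaining coefficients after each rounding step can be sketched in Python. This is not the authors' Maple program: a least-squares fit on a sample grid stands in for the minimax step, and the word lengths i = 27, j = 18, k = 10 as well as every function name are hypothetical choices made for illustration.

```python
def solve(A, b):
    """Solve a small dense linear system by Gaussian elimination with pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - tail) / M[r][r]
    return x

def fit(target, powers, ts):
    """Least-squares fit of target(t) by sum(c * t**k); stand-in for minimax."""
    A = [[sum(t ** (a + b) for t in ts) for b in powers] for a in powers]
    rhs = [sum(target(t) * t ** a for t in ts) for a in powers]
    return solve(A, rhs)

def round_to_bits(c, bits):
    """Round c to the nearest multiple of 2**-bits."""
    return round(c * 2 ** bits) / 2 ** bits

def three_pass(p, x1, m, i, j, k, samples=256):
    h = 2.0 ** -m                       # X_2 = t * h with t in [0, 1)
    ts = [s / samples for s in range(samples)]
    f = lambda t: (x1 + t * h) ** p
    # Pass 1: fit all three coefficients, keep C_1 and round it to j bits.
    _, c1_raw, _ = fit(f, (0, 1, 2), ts)
    c1 = round_to_bits(c1_raw / h, j)
    # Pass 2: refit C_0 and C_2 against the rounded C_1, round C_2 to k bits.
    _, c2_raw = fit(lambda t: f(t) - c1 * t * h, (0, 2), ts)
    c2 = round_to_bits(c2_raw / h ** 2, k)
    # Pass 3: refit C_0 against both rounded coefficients, round it to i bits.
    (c0_raw,) = fit(lambda t: f(t) - c1 * t * h - c2 * (t * h) ** 2, (0,), ts)
    c0 = round_to_bits(c0_raw, i)
    err = max(abs(f(t) - (c0 + c1 * t * h + c2 * (t * h) ** 2)) for t in ts)
    return c0, c1, c2, err

c0, c1, c2, err = three_pass(p=-0.5, x1=1.0, m=8, i=27, j=18, k=10)
print(c0, c1, c2, err)
```

The point of the refits is that each later pass absorbs part of the error introduced by rounding an earlier coefficient; the final check against an error bound would then decide whether the chosen word lengths are acceptable.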
ALGORITHM: specialized squaring unit
math. identities
  $x_i x_i = x_i$
  $x_i x_j = x_j x_i$
  $x_i x_j + x_j x_i = 2 x_i x_j$
Carry-Save output
leading zeros & truncation ($m = 8$)
  $X_2$: $m$ leading zeros
  $X_2^2$: $2m$ leading zeros
  truncation error of $X_2^2$: a function of $h$ and $m$
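The bit-level identities used by a specialized squarer (x_i x_i = x_i and x_i x_j + x_j x_i = 2 x_i x_j) can be checked with a short sketch on unsigned integers. The function name is hypothetical, and truncation and carry-save output are deliberately left out.

```python
def folded_square(x, width):
    """Square an unsigned width-bit integer using folded partial products.
    Returns the square and the number of bit products generated."""
    bits = [(x >> i) & 1 for i in range(width)]
    total = 0
    products = 0
    for i in range(width):
        total += bits[i] << (2 * i)                  # x_i * x_i = x_i
        products += 1
        for j in range(i + 1, width):
            # x_i x_j + x_j x_i = 2 x_i x_j: one product, shifted one extra bit
            total += (bits[i] & bits[j]) << (i + j + 1)
            products += 1
    return total, products

square, products = folded_square(0b10110111, 8)
print(square, products, 8 * 8)
```

Folding reduces the bit-product count from width^2 to width(width + 1)/2, which is where the area saving over a general multiplier comes from.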
ALGORITHM: fused accumulation tree
$C_0 + pps(C_1 X_2) + pps(C_2 X_2^2)$
recoding to SD-4
$1 + 8 + 6 = 15$ pps $\rightarrow$ 3 CSA levels
$m = 8$
Delay: about that of a 24x24-bit multiplier
Reduced area (coefficient word lengths)
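A software sketch of the fused accumulation idea follows: the operands are recoded into radix-4 signed digits, the partial products of both products are generated, and everything is reduced together with C_0 in carry-save form before a single final addition. Several things are assumptions for illustration: the simple recoding routine, the use of 3:2 carry-save adders instead of a specific tree layout, the integer (fixed-point) operands, and all function names.

```python
def sd4_recode(n):
    """Recode a non-negative integer into radix-4 signed digits in {-2..2},
    least significant digit first."""
    digits = []
    while n:
        d = n % 4
        if d == 3:              # use -1 here and carry 1 into the next digit
            d = -1
        digits.append(d)
        n = (n - d) // 4
    return digits

def partial_products(coeff, operand):
    """One partial product per SD-4 digit of operand: digit * coeff * 4**k."""
    return [d * coeff * 4 ** k for k, d in enumerate(sd4_recode(operand))]

def carry_save_add(a, b, c):
    """3:2 carry-save adder on Python integers: a + b + c == s + carry."""
    s = a ^ b ^ c
    carry = ((a & b) | (a & c) | (b & c)) << 1
    return s, carry

def fused_accumulate(c0, c1, x2, c2, x2_squared):
    """Reduce C_0 and all partial products of C_1*X_2 and C_2*X_2^2 together."""
    pps = [c0] + partial_products(c1, x2) + partial_products(c2, x2_squared)
    while len(pps) > 2:
        s, carry = carry_save_add(pps[0], pps[1], pps[2])
        pps = pps[3:] + [s, carry]
    return sum(pps)             # single carry-propagate addition at the end

print(fused_accumulate(5, 7, 1234, -3, 999))
```

Because no intermediate product is ever converted to non-redundant form, only one carry-propagate addition is needed, which is consistent with the slide's claim that the delay is close to that of a single multiplier.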
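ALGORITHM: error computation
FAITHFUL ROUNDING: the returned result is one of the two machine numbers surrounding the exact value
$\varepsilon_{total} = \varepsilon_{interm} + \varepsilon_{round} < 2^{-r}$
$\varepsilon_{round} \le 2^{-r-1}$
$\varepsilon_{interm} < 2^{-r-1}$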
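$\varepsilon_{interm} = \varepsilon_{approx} + \varepsilon_{comput}$
$\varepsilon_{comput}$: depends on $C_2^{max}$ and on the squaring truncation error $\varepsilon_{squaring}$ (a function of $h$ and $m$)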
$C_0 = C_0 + 2^{-r-1}$
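The error budget can be checked numerically with exact rationals. The sketch below assumes that the final rounding is done by adding 2^(-r-1) and truncating (so the constant can be folded into C_0); the function names and the sample values are hypothetical.

```python
from fractions import Fraction
import math

def truncate(v, r):
    """Drop all bits below weight 2**-r (floor to a multiple of 2**-r)."""
    return Fraction(math.floor(v * 2 ** r), 2 ** r)

def round_by_injection(v, r):
    """Add 2**-(r+1), then truncate: rounding to nearest without a rounder."""
    return truncate(v + Fraction(1, 2 ** (r + 1)), r)

r = 23
exact = Fraction(2, 3)                          # hypothetical exact value
interm = exact + Fraction(1, 2 ** 26)           # intermediate error below 2**-(r+1)
result = round_by_injection(interm, r)
print(abs(result - interm) <= Fraction(1, 2 ** (r + 1)))
print(abs(result - exact) < Fraction(1, 2 ** r))
```

As long as the intermediate error stays below 2^(-r-1), the rounding step contributes at most another 2^(-r-1), so the total stays below 2^(-r), which is the faithful-rounding condition.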
ARCHITECTURE: unfolded architecture
inverse square root ($p = -1/2$), $m = 8$
Table size: 11.75 Kb
2 functions: insert a new set of tables and multiplexers
$t_{unfolded} = t_{squaring} + t_{CS2SD} + t_{fused\_acc\_tree}$
$t_{unfolded} = t_{table\_lu} + t_{fused\_acc\_tree}$
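One hedged way to read the two delay expressions together: if the table look-up runs in parallel with the squaring unit and the carry-save to signed-digit recoding (an assumption about the datapath that the transcript does not state), the unfolded delay is set by the slower of the two paths feeding the accumulation tree.

```latex
t_{unfolded} = \max\left(t_{squaring} + t_{CS2SD},\; t_{table\_lu}\right) + t_{fused\_acc\_tree}
```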
ARCHITECTURE: pipelined architecture
latency: 3 cycles
throughput: 1 result / cycle
$t_{pipel.} = t_{squaring} + t_{CS2SD} + t_{reg}$
$t_{pipel.} = t_{table\_lu} + t_{reg}$
$t_{pipel.} = t_{CPA} + t_{reg}$
$t_{pipel.} = t_{pp\_gen} + 3\, t_{4:2\ CSA} + t_{reg}$
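A small timing model shows how the candidate stage delays and the latency/throughput figures fit together. It assumes the cycle time is the slowest candidate path plus the register overhead; the function names and the normalized delay values are hypothetical and are not synthesis results.

```python
def cycle_time(path_delays, t_reg):
    """Cycle time of the pipeline: slowest candidate path plus register delay."""
    return max(path_delays) + t_reg

def total_cycles(n_operands, latency=3):
    """Cycles needed for n results at 1 result per cycle once the pipe is full."""
    if n_operands <= 0:
        return 0
    return latency + n_operands - 1

# Hypothetical normalized delays for three candidate paths (not measured data).
print(cycle_time([1.0, 2.5, 2.0], t_reg=0.5))
print(total_cycles(1), total_cycles(100))
```

With a latency of 3 cycles, a stream of N operands completes in N + 2 cycles, so the per-result cost approaches one cycle for long streams.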
ARCHITECTURE: synthesis results and comparison
pre-layout synthesis: CMOS 0.35 µm, VHDL design flow, Synopsys
unfolded arch.
pipelined arch.
comparison
CONCLUSION
New method for faithfully rounded single-precision floating-point powering computation
Second-degree minimax approximation, look-up tables, specialized squaring unit and fused accumulation tree
Unfolded and pipelined (3:1) architectures proposed
Pre-layout synthesis results (CMOS 0.35 µm) and comparison with previous table-based methods
Speed of linear approximations & reduced area of 2nd-degree interpolations
Future work
• Generalization to any f(X)
• Use as seed generator for double-precision computations with multiplicative-based methods: a single Newton-Raphson or Goldschmidt iteration required