a 5ghz+ 128-bit binary floating-point adder for the power6 processor

4
A 5GHz+ 128-bit Binary Floating-Point Adder f o r t h e POWER Processor Xiao Y an Yu,Yiu-Hing Chan, Brian Curran Eric Schwarz, Michael Kelly I BM Systems a n d Technology Group Poughkeepsie, U S A {xianyu, chanyiu, curranb, eschwarz, mrkelly}@us.ibm.com Abstract-A fast 128-bit end-around carry adder i s designed a n d fabricated a s part o f t h e POWER floating-point unit i n a 65nm SO I process technology. Efficient u s e o f static circuits a n d careful balance of t h e look-ahead tree enable o u r floating- point design t o operate beyond 5GHz with 1.1V supply. I . INTRODUCTION Addition is often the timing critical path o f modem microprocessors. A number o f high-performance adders have been proposed i n t h e past [5],[6],[7]. Al l o f them implement a parallel adder structure using dynamic logic i n order t o achieve t h e performance target. These techniques a ll result i n significantly more power consumption. Designing solely for higher frequency will only yield a less power efficient design. Design space should b e fully explored before a design choice i s made. Recent silicon technologies have l e d t o the increase i sub- threshold leakage. This i s especially true f o r low-Vt transistors which have been intensively used i n high performance designs. O n e must fully exploit t he u s e o f high- V t devices before inserting low-Vt devices. As circuits approaching the ultimate power limit, careful circuit a nd layout implementations a e required i n order t o produce a power efficient design. Interconnect delay h as become more significant in each generation [8]. Currently, wire delays contribute large percentage o f cycle time. Designs with o f long wires will suffer from significant increase i n area, delay a n d power. Performance impact d u e t o physical implementation must b e analyzed a t design time i n order t o guarantee t h e optimality o f a design. A s a result, adders with dense prefix trees a r e no t desirable d u e t o their massive signal communications. This paper presents a fast 128-bit adder implemented i n static circuits using a 65nm S O I technology [ 2 ] with nominal V t devices. T he adder is realized i n a 7-cycle multiply-add pipeline that i s a part of the POWER6 microprocessor [1]. Several adders havebeen proposed i n t h e past t o beused i n the multiply-add operation [16],[17],[18],[19],[20]. They utilize various adder schemes based o n t h e delay profile o f Bruce Fleischer IBM T . J . Watson Research Center Yorktown Heights, U S A [email protected] th e multiply compression tree. These designs a r e power efficient only when the final addition i s performed right after th e compression tree a n d when t h e end-around carry computation i s n o t needed. ............. l ............. .............. ............... ... r~~~ ~~~~~~~ a : res, 2n~3d Cycle 6 t h Cycle 7 t h Cycle Figure 1 . POWER6 floating-point dataflow [1 ] Figure 1 shows a block diagram o f our floating-point unit. T he shaded boxes indicate t h e latch points ofeach pipeline stage. Addition i s partitioned into three different cycles i n order t o provide the best floating-point performance. T h e generation of bit-wise propagate a n d generate terms is done a t e n d o f third pipeline cycle. T h e actual end-around carry computation a n d the generation of 3 2 b group conditional sums a r e accomplished during fourth cycle. T h e final sums a r e selected prior t o normalization a t beginning o f fifth cycle. This floating-point unit requires a high performance adder design sincethe carry signal i s o n t h e critical path. O u r adder i s implemented using a prefix-2 Kogge-Stone tree which i s 1-4244-0303-4/06/ 20.00 C2006 IEEE. 1 6 6

Upload: salloum18

Post on 02-Jun-2018

230 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: A 5GHz+ 128-bit Binary Floating-Point Adder for the POWER6 Processor

8/11/2019 A 5GHz+ 128-bit Binary Floating-Point Adder for the POWER6 Processor

http://slidepdf.com/reader/full/a-5ghz-128-bit-binary-floating-point-adder-for-the-power6-processor 1/4

A

5GHz+ 1 2 8 - b i t Binary

F l o a t i n g - P o i n t

Adder

for

t h e

POWER

Processor

X i a o Yan Y u , Y i u - H i n g C h a n ,

B r i a n

C u r r a n

E r i c

S c h w a r z , M i c h a e l K e l l y

IBM

S y s t e m s

a n d T e c h n o l o g y G r o u p

P o u g h k e e p s i e , USA

{ x i a n y u ,

c h a n y i u ,

c u r r a n b ,

e s c h w a r z ,

m r k e l l y } @ u s . i b m . c o m

A b s t r a c t - A

f a s t

1 2 8 - b i t

e n d - a r o u n d

c a r r y a d d e r i s d e s i g n e d

a n d f a b r i c a t e d

a s p a r t

o f t h e POWER f l o a t i n g - p o i n t u n i t i n a

65nm SO I

p r o c e s s t e c h n o l o g y .

E f f i c i e n t

u s e

o f s t a t i c

c i r c u i t s

a n d

c a r e f u l b a la n c e o f t h e

l o o k - a h e a d t r e e

e n a b l e

o u r f l o a t i n g -

p o i n t d e s i g n

t o

o p e r a t e

b e y o n d 5GHz

w i t h

1 . 1 V

s u p p l y .

I . INTRODUCTION

A d d i t i o n i s

o f t e n t h e

t i m i n g c r i t i c a l p a t h o f modem

m i c r o p r o c e s s o r s . A n u m b e r

o f h i g h - p e r f o r m a n c e

a d d e r s h a v e

b e e n

p r o p o s e d

i n

t h e

p a s t [ 5 ] , [ 6 ] , [ 7 ] .

A l l

o f

t h e m

i m p l e m e n t

a p a r a l l e l

a d d e r

s t r u c t u r e

u s i n g

d y n a m i c

l o g i c

i n o r d e r

t o

a c h i e v e

t h e p e r f o r m a n c e t a r g e t .

T h e s e

t e c h n i q u e s

a l l

r e s u l t

i n

s i g n i f i c a n t l y

m o r e p o w e r

c o n s u m p t i o n .

D e s i g n i n g s o l e l y f o r

h i g h e r

f r e q u e n c y

w i l l

o n l y

y i e l d

a

l e s s p o w e r

e f f i c i e n t

d e s i g n .

D e s i g n s p a c e s h o u l d b e

f u l l y

e x p l o r e d

b e f o r e

a

d e s i g n

c h o i c e

i s

m a d e .

R e c e n t

s i l i c o n

t e c h n o l o g i e s h a v e

l e d t o

t h e i n c r e a s e

i n

s u b -

t h r e s h o l d l e a k a g e .

T h i s

i s

e s p e c i a l l y

t r u e

f o r

l o w - V t

t r a n s i s t o r s w h i c h h a v e b e e n

i n t e n s i v e l y u s e d

i n

h i g h

p e r f o r m a n c e d e s i g n s .

O n e m u s t

f u l l y e x p l o i t

t h e

u s e

o f h i g h -

V t

d e v i c e s b e f o r e

i n s e r t i n g l o w - V t

d e v i c e s . As c i r c u i t s

a p p r o a c h i n g

t h e u l t i m a t e

p o w e r l i m i t ,

c a r e f u l c i r c u i t a n d

l a y o u t

i m p l e m e n t a t i o n s

a r e

r e q u i r e d

i n o r d e r

t o

p r o d u c e a

p o w e r

e f f i c i e n t

d e s i g n .

I n t e r c o n n e c t

d e l a y

h a s

b e c o m e

m o r e

s i g n i f i c a n t

i n

e a c h

g e n e r a t i o n [ 8 ] .

C u r r e n t l y , w i r e d e l a y s

c o n t r i b u t e

l a r g e p e r c e n t a g e o f c y c l e t i m e . D e s i g n s

w i t h

o f

l o n g

w i r e s w i l l

s u f f e r

f r o m

s i g n i f i c a n t

i n c r e a s e

i n a r e a , d e l a y

a n d

p o w e r .

P e r f o r m a n c e

i m p a c t

d u e

t o

p h y s i c a l

i m p l e m e n t a t i o n

m u s t

b e a n a l y z e d

a t

d e s i g n t i m e

i n

o r d e r t o

g u a r a n t e e

t h e

o p t i m a l i t y

o f

a

d e s i g n .

A s a

r e s u l t ,

a d d e r s w i t h

d e n s e

p r e f i x

t r e e s

a r e n o t

d e s i r a b l e d u e

t o

t h e i r m a s s i v e

s i g n a l

c o m m u n i c a t i o n s .

T h i s

p a p e r p r e s e n t s a f a s t 1 2 8 - b i t

a d d e r

i m p l e m e n t e d

i n

s t a t i c c i r c u i t s

u s i n g

a

65nm S O I

t e c h n o l o g y [ 2 ]

w i t h n o m i n a l

V t

d e v i c e s . T h e a d d e r i s r e a l i z e d

i n

a

7 - c y c l e m u l t i p l y - a d d

p i p e l i n e

t h a t

i s

a

p a r t o f

t h e POWER6

m i c r o p r o c e s s o r

[ 1 ] .

S e v e r a l a d d e r s

h a v e b e e n

p r o p o s e d

i n t h e

p a s t

t o

b e u s e d

i n

t h e

m u l t i p l y - a d d o p e r a t i o n [ 1 6 ] , [ 1 7 ] , [ 1 8 ] , [ 1 9 ] , [ 2 0 ] . T h e y

u t i l i z e v a r i o u s

a d d e r s c h e m e s

b a s e d

o n t h e

d e l a y p r o f i l e

o f

B r u c e

F l e i s c h e r

IBM T .

J .

W a t s o n R e s e a r c h C e n t e r

Y o r k t o w n H e i g h t s , USA

f l e i s c h r @ u s . i b m . c o m

t h e

m u l t i p l y c o m p r e s s i o n

t r e e . T h e s e d e s i g n s a r e p o w e r

e f f i c i e n t

o n l y

w h e n

t h e f i n a l a d d i t i o n

i s

p e r f o r m e d

r i g h t

a f t e r

t h e

c o m p r e s s i o n

t r e e a n d w h e n t h e e n d - a r o u n d

c a r r y

c o m p u t a t i o n

i s

n o t n e e d e d .

. . . . . . . . . . . . .l . . . . . . . . . . . . .. . . . . . . . . . . . . .. . . . . . . . . . . . . . .. . .

r ~ ~ ~ ~ ~ ~ ~ ~ ~ ~

a

: r e s , 2 n ~ 3 d

C y c l e

6 t h

C y c l e

7 t h

C y c l e

F i g u r e

1 . POWER6

f l o a t i n g - p o i n t

d a t a f l o w

[ 1 ]

F i g u r e

1

s h o w s a b l o c k d i a g r a m o f

o u r

f l o a t i n g - p o i n t

u n i t .

T h e

s h a d e d

b o x e s

i n d i c a t e

t h e l a t c h p o i n t s

o f e a c h

p i p e l i n e

s t a g e .

A d d i t i o n i s

p a r t i t i o n e d

i n t o

t h r e e

d i f f e r e n t

c y c l e s

i n

o r d e r

t o

p r o v i d e

t h e b e s t

f l o a t i n g - p o i n t

p e r f o r m a n c e .

T h e

g e n e r a t i o n

o f

b i t - w i s e

p r o p a g a t e

a n d

g e n e r a t e

t e r m s

i s d o n e

a t

e n d

o f t h i r d

p i p e l i n e c y c l e .

T h e a c t u a l e n d - a r o u n d

c a r r y

c o m p u t a t i o n

a n d t h e

g e n e r a t i o n

o f 3 2 b

g r o u p

c o n d i t i o n a l

s u m s

a r e

a c c o m p l i s h e d

d u r i n g

f o u r t h

c y c l e .

T h e

f i n a l

s u m s

a r e

s e l e c t e d

p r i o r

t o n o r m a l i z a t i o n a t

b e g i n n i n g

o f f i f t h

c y c l e .

T h i s

f l o a t i n g - p o i n t

u n i t

r e q u i r e s

a

h i g h

p e r f o r m a n c e a d d e r

d e s i g n

s i n c e t h e

c a r r y s i g n a l

i s o n t h e c r i t i c a l

p a t h .

O u r

a d d e r

i s

i m p l e m e n t e d

u s i n g

a

p r e f i x - 2 K o g g e - S t o n e

t r e e w h i c h

i s

1 - 4 2 4 4 - 0 3 0 3 - 4 / 0 6 / 2 0 . 0 0

C 2 0 0 6

I E E E . 1 6 6

Page 2: A 5GHz+ 128-bit Binary Floating-Point Adder for the POWER6 Processor

8/11/2019 A 5GHz+ 128-bit Binary Floating-Point Adder for the POWER6 Processor

http://slidepdf.com/reader/full/a-5ghz-128-bit-binary-floating-point-adder-for-the-power6-processor 2/4

d e s c r i b e d

i n

S e c t i o n I I . T h e

c h i p m e a s u r e m e n t s d e m o n s t r a t e

t h e

o p e r a t i n g f r e q u e n c y

b e y o n d 5GHz a t

l . l V . S t o r a g e

e l e m e n t s a r e c l o c k - g a t e d

w h e n

i t

i s n o t

i n

u s e

t o s a v e a c t i v e

p o w e r . T h e s t r u c t u r e i s

t u n e d u s i n g

s l a c k - b a s e d t r a n s i s t o r

l e v e l

t i m i n g

m e t h o d o l o g y .

T h i s

a l l o w s

u s

t o

p r o d u c e

a

p o w e r

e f f i c i e n t

d e s i g n .

I I . ADDER IMPLEMENTATION

A .

P r e l i m i n a r y

T h e

d e s c r i p t i o n s o f

b i n a r y

f l o a t i n g - p o i n t u n i t w i t h m u l t i p l y -

a d d

d a t a f l o w c a n

b e f o u n d

i n

[ 3 ]

a n d

[ 4 ] .

T h i s

i m p l e m e n t a t i o n

a l l o w s

t h e

r e a l i z a t i o n

o f

f u s e d

m u l t i p l y - a d d

o p e r a t i o n :

T

=

B

+

A

x

C

p e r f o r m a n c e .

W i t h t h i s

c o n f i g u r a t i o n , t h e c a r r y

p a t h

b e c o m e s t h e m o s t c r i t i c a l .

, I

Su m s e e c t i o n

V W i r e f l o w

D i r e c t i o n o f

d a t a

C a r r y

a n d

c o n d i t i o n a l

su m

co m

u t a t i o n

  1 )

T h e e n d - a r o u n d c a r r y a d d e r p e r f o r m s t h e

f i n a l

a d d i t i o n

a f t e r

t h e

m u l t i p l y

c o u n t e r t r e e .

I t s

c a r r y

c h a i n s a r e

e q u a l l e n g t h

a n d

w r a p s

a r o u n d

f o r e f f e c t i v e

s u b t r a c t i o n .

A s s u m i n g

t h e

a d d e r i s d i v i d e d i n t o f o u r

g r o u p s ,

t h e

c a r r y

f o r e a c h

g r o u p

c a n b e

e x p r e s s e d

a s :

( b )a )

F i g u r e 2 .

B i n a r y

f l o a t i n g p o i n t

u n i t

f l o o r p l a n [ 1 ]

C o

G o

+

P o G ,

+

P o P I G 2

+

P o P I P 2 G 3

+

PI

P 2 P 3

C l

G C

+

P J G 2

+

P l P 2 G 3

+

P P 2 P 3 G 0

+

P o P I P 2 P 3

( 2 )

C 2 = G 2 + P 2 G 3

+ P 2 P 3 G O

+ P 2 P 3 P C G j

+

P o P I P 2 P 3

C 3 = G 3 + P 3 G o

+ P 3 P o G C

+

P 3 P o P C G 2

+

PI

P 2 P 3

I n t e r e s t e d

r e a d e r s

c a n

r e f e r

t o

[ 4 ]

f o r

m o r e d e t a i l s o n

e n d -

a r o u n d

c a r r y

a d d e r .

B . O u r A d d e r S t r u c t u r e

I n o r d e r t o

m i n i m i z e

c o m m u n i c a t i o n

o v e r h e a d

w i t h i n

t h e

f l o a t i n g p o i n t

u n i t ,

we c r e a t e d a n

  O

s h a p e d f l o o r p l a n .

D a t a

f l o w s c l o c k w i s e

a l o n g

t h e

r i g h t

s t a c k a n d

t h r o u g h

t h e

a d d e r

a n d

u p

t h e

l e f t

s t a c k b a c k t o t h e

r e g i s t e r s

s h o w n

i n

F i g u r e

2 ( a ) .

T h i s

f l o o r p l a n

l i m i t s t h e w i r e r e s o u r c e s

t h a t t h e a d d e r

c a n u s e

i n t e r n a l l y .

F o r t h i s

r e a s o n ,

we d e c i d e d t o u s e a n o n -

u n i f o r m

s p a r s e

a d d e r

s c h e m e .

U n i f o r m l y s p a r s e

a d d e r

s c h e m e s

w o u l d

o c c u p y

much

m o r e w i r e t r a c k s

f o r

i n t e r m e d i a t e c a r r i e s a n d i s n o t f e a s i b l e

i n

o u r

c a s e .

We u s e

d e n s e r

p r e f i x

t r e e

f o r

b l o c k s

w i t h

r e l a t i v e l y

s h o r t

w i r e s

a n d

s p a r s e r p r e f i x

t r e e f o r b l o c k s w i t h

l o n g

w i r e s .

By d o i n g

t h i s

w a y ,

we a r e

a b l e

t o r o u t e

o u r

c r i t i c a l

s i g n a l s

w i t h

b e t t e r

w i r e

w i d t h

a n d

s p a c e

w i t h o u t

a l l o c a t i n g

d e d i c a t e d

r o u t i n g

a r e a s

i n

o u r

d e s i g n

f o r

t h e m .

I n

o r d e r t o

f i t

t h e e n t i r e

f l o a t i n g p o i n t

u n i t

i n

t h e

g i v e n a r e a ,

t h e

a d d e r i s

s e p a r a t e d

i n t o t w o s e c t i o n s

p l a c e d

s i d e

b y

s i d e .

T h e

c a r r y

c o m p u t a t i o n

a n d t h e

f i n a l

sum s e l e c t i o n

a r e a

s h o w n

i n

F i g u r e 2 ( b ) .

T h e

e n d - a r o u n d c a r r i e s a n d

t h e

c o n d i t i o n a l

sum

s i g n a l s

a r e

r e q u i r e d

t o t u r n

1 8 0 °

b e f o r e

e n t e r i n g

t h e

f i n a l

sum s e l e c t i o n

b l o c k .

O u r i n t e n t i o n

i s

n o t t o

p r o d u c e

a n

a dd e r w i t h t h e

b e s t

s t a n d - a l o n e

p e r f o r m a n c e

b u t

o n e

t h a t c a n

p r o v i d e

t h e b e s t o v e r a l l

f l o a t i n g p o i n t

F i g u r e 3 ( a )

s h o w s t h e b l o c k

d i a g r a m

o f o u r

a d d e r . I t i s

d i v i d e d i n t o

t h r e e

d i f f e r e n t

s u b - b l o c k s ,

t h e

3 2 - b i t

a d d e r

b l o c k , t h e e n d - a r o u n d

c a r r y

g e n e r a t i o n

b l o c k a n d t h e f i n a l

sum s e l e c t i o n

b l o c k .

C i r c u i t b l o c k s a r e

p l a c e d

o p t i m a l l y

t o

s p e e d u p

t h e

c a r r y p a t h s . F i g u r e 3 ( b )

s h o w s

t h e

b l o c k

d i a g r a m

o f t h e

3 2 - b i t

a d d e r w i t h c r i t i c a l

p a t h

l a b e l e d w i t h a

t h i c k l i n e . I t

i s

p a r t i t i o n e d

i n t o t h r e e

s u b - c o m p o n e n t s .

F i r s t

s u b - c o m p o n e n t

i s t h e 8 - b i t

p r e f i x - 2

K o g g e - S t o n e

t r e e

[ 9 ]

w i t h

s p a r s e n e s s

o f

2

t h a t

g e n e r a t e s

8 - b i t

c a r r y ,

c a r r y + 1

a n d

p r o p a g a t e

t e r m s

a s

w e l l

a s

c o n d i t i o n a l

s u m s .

T h i s i s n e e d e d

l a t e r f o r sum

s e l e c t i o n .

S e c o n d

s u b - c o m p o n e n t i s t h e

p r e f i x -

2

K o g g e - S t o n e

t r e e w i t h

s p a r s e n e s s

o f

8 t h a t

g e n e r a t e s

3 2 - b i t

c a r r y , p r o p a g a t e

t e r m a n d a s w e l l a s

3 2 - b i t

c o n d i t i o n a l

s u m s .

C a r r y + 1

t e r m

i s

o n l y p r o p a g a t e d

w i t h i n t h e

3 2 - b i t

g r o u p .

S i n c e

c a r r y - o u t

i s o u r

c r i t i c a l

p a t h ,

we

h a v e i s o l a t e d t h i s

p a t h

s o

t h a t t h e

f a n - o u t

o f

e a c h n e t o n t h i s

p a t h

i s

1 . T h e

c a r r y

p a t h

i s

r e p l i c a t e d

i n o r d e r

t o

g e n e r a t e

i n t e r m e d i a t e

c a r r i e s .

T h e r e

a r e

s e v e r a l

w a y s

t o

i m p l e m e n t

t h e

c a r r y

p r o p a g a t i o n .

F i r s t way

i s t o

p r o p a g a t e

t h e

c a r r y a s s u m i n g

t h e

c a r r y - i n

o f

t h e

g r o u p

i s 0 .

C a r r y + 1

a t i t

b i t

c a n t h e n

b e

g e n e r a t e d

b y

a n

OR

o p e r a t i o n :

( C a r r y + 1 ) i

=

C a r r y i

+

P i

  3 )

T h i s

way d o e s

n o t

r e q u i r e t h e

p r o p a g a t i o n

o f b o t h

c a r r y

a n d

c a r r y + 1 . C a r r y + 1

s i g n a l

c a n

b e

p r o d u c e d

i n a n

a d d i t i o n a l

s t a g e .

S e c o n d

way

i s

t o

p r o p a g a t e

b o t h

c a r r y

s i g n a l s

a n d u s e

c a r r y + 1

a s t h e

p r o p a g a t e

t e r m s i n c e

P i

c

( C a r r y + l ) i .

Due t o

t h e f a c t t h a t t h e

c a r r y

s i g n a l

i s c r i t i c a l

i n o u r

t i m i n g ,

we

d e c i d e d t o

c h o o s e t h e s e c o n d s c h e m e . T h i s

r e q u i r e s

o n e l e s s

s t a g e

o n t h e c r i t i c a l

p a t h

c o m p a r i n g

t o

t h e

f i r s t s c h e m e .

U s i n g

t h i s

t e c h n i q u e ,

we a r e a b l e t o m e e t t h e

t i m i n g

r e q u i r e m e n t

o f

i n t e r m e d i a t e

c a r r y p a t h s

w i t h o u t

d i s t u r b i n g

t h e c r i t i c a l

p a t h s .

O u r

sum

s e l e c t i o n b l o c k i s

i m p l e m e n t e d

u s i n g

t r a n s m i s s i o n

g a t e

m u l t i p l e x e r s

w i t h

b u f f e r s

t o d r i v e t h e

1 8 0 °

w i r e

t u r n . T h e e n d - a r o u n d

c a r r y l o g i c

b l o c k

i m p l e m e n t s

1 6 7

Page 3: A 5GHz+ 128-bit Binary Floating-Point Adder for the POWER6 Processor

8/11/2019 A 5GHz+ 128-bit Binary Floating-Point Adder for the POWER6 Processor

http://slidepdf.com/reader/full/a-5ghz-128-bit-binary-floating-point-adder-for-the-power6-processor 3/4

t h e

e q u a t i o n s ( 2 )

s h o w n

i n

p r e v i o u s

s e c t i o n .

T h e c a r r y

b l o c k s

a r e p l a c e d

t o

e n s u r e b a l a n c e d w i r e d e l a y s a t e a c h s t a g e . T h e

f i n a l

s u m s e l e c t i o n

i s

i m p l e m e n t e d u s i n g s i m i l a r s t r u c t u r e

a s

t h e

s u m s e l e c t i o n

i n

3 2 - b a d d e r b l o c k .

n u m b e r s h o w s t h e

r e l a t i v e

s p e e d o f a

d e s i g n

i s

c o m p a r e d t o

t h e

c y c l e t i m e t a r g e t .

T h e

a v e r a g e p o w e r d i s s i p a t i o n

o f a

d e s i g n

a t e a c h

p e r f o r m a n c e p o i n t

i s s i m u l a t e d

u s i n g

o u r

i n -

h o u s e

p o w e r

s i m u l a t o r , CP M [ 1 4 ] . E a c h o u t p u t

i s l o a d e d

w i t h

e q u i v a l e n t c a p a c i t i v e

l o a d

c a l c u l a t e d

a t

u n i t

l e v e l .

A l l

s l a c k n u m b e r s a r e n o r m a l i z e d

t o

t h e f a n o u t - o f - 4

( F O 4 )

i n v e r t e r d e l a y , w h i c h i s

i n d e p e nd e n t o f t e c h no lo gy a n d

e n v i r o n m e n t

[ 1 5 ] .

F i g u r e 4 ( a ) s h o w s t h e p o w e r - p e r f o r m a n c e

c u r v e

o f t h e s e

d e s i g n s w i t h 20 i n p u t

s w i t c h i n g

a c t i v i t y .

A l l

d e s i g n s

a r e

o p t i m i z e d u n d e r

s a m e

r a n g e

o f

p o w e r - p e r f o r m a n c e

t r a d e o f f

f a c t o r s .

.-

CN

a )

a )

a )

( a )

- 7 . 0 0

- 6 . 0 0 - 5 . 0 0 - 4 . 0 0 - 3 . 0 0 - 2 . 0 0 - 1 . 0 0

0 . 0 0

1 . 0 0

S l a c k

( F O 4 )

( a )

r i t i c a l

p

g 3 2 , P 3 2

SumO

07

Sum

8 - 1 5

S u m O 1 6

2 3

s u m ° 2 4 - 3 1

S u m l 0 S u m 1 8 1 5

S u m 1

1 6 - 2 3

S u m 1 2 4 3 1

C a r r y 8 i

C a r r y 1 6 p 8 p + 1 E

G

P 1

C a r r y 8

( F o r c r i t i c a l

p a t h )

8 i

( F o r

c r i t i c a l

p a t h )

  C a r r y + 1 ) 8 ,

  C a r r y + 1 ) 8 ,

C a r r y C o p y 8 i + 1

C a r r y C o p y 6 i

  C a r r y + l ) 8 , i , I l

( C a r r y + 1

  1 6 i

C a r r y C o p y 8

( F o r

n o n - c r i t i c a l

C a r r y C o p y 8

( F o r

n o n - c r i t i c a l

p a t h )

p a t h )

( b )

F i g u r e 3 . ( a ) B l o c k

d i a g r a m

o f

o u r

a d d e r

( b ) D i a g r a m

o f 3 2 - b

b l o c k

a )

a )

a )

  7 0 0

- 6 . 0 0 - 5 . 0 0 - 4 . 0 0 - 3 . 0 0 - 2 . 0 0 - 1 . 0 0

0 . 0 0 1 . 0 0

S l a c k

( F O 4 )

( b )

I I I . C O M P AR I SO N W IT H C O N V EN T I O N A L

DESIGNS

We h a v e

c o m p a r e d

o u r

d e s i g n

a g a i n s t

t h e

L a d n e r - F i s c h e r

( L F A )

d e s i g n

[ 1 0 ] ,

[ 1 1 ]

a n d

a p r e f i x - 2

K o g g e - S t o n e

a d d e r

w i t h

s p a r s e n e s s

o f

8

( S p a r s e - 8 ) .

T h e

LFA

d e s i g n

i s

u s e d

i n

o u r

f i r s t

p a s s

t e s t

c h i p .

I t s 8 - b

s u b - c o m p o n e n t

i s

i m p l e m e n t e d

u s i n g

f u l l L a d n e r

F i s c h e r t r e e .

T h e 3 2 - b s u b -

c o m p o n e n t

i s

i m p l e m e n t e d u s i n g

t h e

s a m e

p r e f i x

s c h e m e

a s

i t s 8 - b

s u b - c o m p o n e n t

w i t h

s p a r s e n e s s

o f

8 . A l l

d e s i g n s

u s e

o n l y

n o m i n a l

V t

t r a n s i s t o r s .

T h e

o p t i m i z a t i o n p o i n t s

o f e a c h

d e s i g n

a r e

o b t a i n e d

b y

v a r y i n g p o w e r p e r f o r m a n c e

t r a d e o f f

f a c t o r

u s i n g

o u r i n - h o u s e f o r m a l s t a t i c

t u n e r ,

E i n s t u n e r

w i t h

c o n s t r a i n e d

i n p u t

s i z e

[ 1 2 ] .

T h e

p e r f o r m a n c e

o f e a c h

p o i n t

i s

s i m u l a t e d

u s i n g

o u r i n - h o u s e t r a n s i s t o r l e v e l s t a t i c

t i m e r ,

E i n s T L T

[ 1 3 ]

a n d i s

p r e s e n t e d

a s

a s l a c k n u m b e r .

T h i s

s l a c k

F i g u r e

4 .

( a ) A v e r a g e

P o w e r

v s .

S l a c k

( b ) L e a k a g e

P o w e r v s . S l a c k

A l l

t h r e e

d e s i g n s

b e h a v e

s i m i l a r l y

a t s l a c k

t i m e o f

- 6 F 0 4 .

A r o u n d

t h a t

s l a c k ,

e a c h

t o p o l o g y

c a n b e

i m p l e m e n t e d

t h r o u g h

c i r c u i t

t u n i n g

t o

g e t

t o

a n e f f i c i e n t

d e s i g n .

We

b e g i n

t o s e e d i f f e r e n c e s b e t w e e n t h e s e

t o p o l o g i e s

a s

t h e

c y c l e

t i m e

a p p r o a c h e s

o u r

t a r g e t .

LFA h a s

h i g h e s t

p e r f o r m a n c e

a t - 0 . 4

F04

s l a c k . T h e

p e r f o r m a n c e

o f

o u r

d e s i g n

a n d t h e

S p a r s e - 8

d e s i g n ,

o n t h e o t h e r

h a n d ,

c o n t i n u e s

t o

i m p r o v e

b e y o n d

o u r

t a r g e t .

O u r

d e s i g n

h a s

s i m i l a r

p e r f o r m a n c e

a s t h e

S p a r s e - 8

d e s i g n

a t e a c h

t r a d e - o f f

p o i n t .

H o w e v e r ,

t h e

p o w e r - p e r f o r m a n c e

c u r v e

o f

o u r

d e s i g n

r e s i d e s

b e l o w

t h a t

o f t h e

S p a r s e - 8

a n d c r o s s e s o v e r t h e

LFA

d e s i g n .

F i g u r e

4 ( b )

s h o w s t h e

t r e n d

o f

l e a k a g e p o w e r

a s

a f u n c t i o n

1 6 8

4

 

* - O u r

I m p l e m e n t a t i o n

-s -

LFA

- - A r -

S p a r s e

8

  3

I - - ,

29 -

2 7 - -

2 - r . , - -

1 1 2 -

 

- 0 -

O u r

I m p l e m e n t a t i o n

t

-B -

LFA

 

-A-

S p a s e - 8

1 0

I

z.

11

 9

4

 

Page 4: A 5GHz+ 128-bit Binary Floating-Point Adder for the POWER6 Processor

8/11/2019 A 5GHz+ 128-bit Binary Floating-Point Adder for the POWER6 Processor

http://slidepdf.com/reader/full/a-5ghz-128-bit-binary-floating-point-adder-for-the-power6-processor 4/4

o f

p e r f o r m a n c e p e r t a i n i n g

t o e a c h d e s i g n .

T h e

l e a k a g e p o w e r

c u r v e o f o ur d e s i g n a n d t h e

LFA

d e s i g n a r e

c o i n c i d e

w i t h

e a c h o t h e r .

T h e

S p a r s e - 8 d e s i g n , h o w e v e r ,

i s s h i f t e d u p w a r d s

a p p r o x i m a t e l y 2mW.

T h i s s h i f t r e s u l t s

i n

r e d u c t i o n

o f p o w e r

e f f i c i e n c y

f o r

t h e

S p a r s e - 8

d e s i g n

i n

t h e

p o w e r

p e r f o r m a n c e

s p a c e . T h e r e f o r e , o u r d e s i g n i s m o r e

p o w e r e f f i c i e n t

t h a n t h e

S p a r s e - 8 d e s i g n a n d t h e

LFA d e s i g n .

A t t h e h i g h e s t p e r f o r m a n c e p o i n t s o f a l l d e s i g n s , c a r r y a n d

sum

p a t h s b e c o m e e q u a l l y c r i t i c a l .

O p t i m i z i n g

t h e d e s i g n

w i t h

e v e n h i g h e r

t r a d e o f f

f a c t o r w i l l s e e d i m i n i s h i n g

r e t u r n .

S i n c e we a r e o n l y i n t e r e s t e d a t a d e s i g n p o i n t t h a t i s a b l e t o

a c h i e v e t h e c y c l e t i m e t a r g e t , t h e p o i n t w h e r e i t j u s t c r o s s e s 0

s l a c k b o u n d a r y

i s d e s i r e d . T h i s

d e s i g n

p o i n t i s a b o u t

0 . 5

F0 4

f a s t e r

t h a n LFA

w i t h o n l y

5 p o w e r

i n c r e a s e

a n d

6

a r e a

i n c r e a s e .

T h i s

s h o w s

t h a t b a l a n c i n g t h e

p r e f i x t r e e i n

a

d e s i g n a c c o r d i n g

t o

i t s c r i t i c a l p a t h

i m p r o v e s

t h e o v e r a l l

p e r f o r m a n c e .

O u r a d d e r

i s i m p l e m e n t e d a n d f a b r i c a t e d u s i n g

a

65nm

S O I

t e c h n o l o g y .

F i g u r e

5

s h o w s

a

p a r t

o f

o u r

f l o a t i n g - p o i n t

u n i t . B o x e s o n t h e f i g u r e i n d i c a t e t h e

p o s i t i o n s

o f

o u r a d d e r

b l o c k s .

T h e

c h i p m e a s u r e m e n t s

s h o w t h a t

a d d e r

i s f u l l y f u n c t i o n a l a t 5GHz w i t h 1 . 1 V s u p p l y v o l t a g e .

F i g u r e

5 . A d d e r

l a y o u t

I V .

CONCLUSION

A f a s t

1 2 8 - b i t

f l o a t i n g - p o i n t

a d d e r i s

i m p l e m e n t e d a n d

f a b r i c a t e d

a s

p a r t

o f t h e

POWER6

p r o c e s s o r

i n

a

65nm

S O I

t e c h n o l o g y

[ 2 ] .

We

u s e d n o n - u ni f o r m

s p a r s e K o g g e - S t o n e

t r e e a n d

c a r e f u l l y

b a l a n c e d t h e

p r e f i x

t r e e

a c c o r d i n g

t o i t s

c r i t i c a l

p a t h .

T h i s

new

d e s i g n

h a s

m e t

t h e s t r i n g e n t

t i m i n g

r e q u i r e m e n t

a f t e r

r e d u c i n g t h e

s l a c k t i m e

b y 0 . 5 F04

c o m p a r e d

t o

t h e

L a d n e r F i s c h e r

s c h e m e ,

w h i c h

w a s

u s e d

i n

t h e f i r s t

t e s t

c h i p .

C o m p a r e d

t o L a d n e r F i s c h e r

d e s i g n ,

o u r

d e s i g n o n l y

c o n s u m e s

6

a r e a

o v e r h e a d a n d 5

p o w e r

i n c r e a s e .

T h e m e a s u r e m e n t s

d e m o n s t r a t e o p e r a t i o n o f

t h i s

a d d e r

b e y o n d

5GHz w i t h 1 . IV

s u p p l y .

ACKNOWLEDGEMENT

T h e a u t h o r s

w o u l d

l i k e t o t h a n k

V o j i n

O k l o b d z i j a ,

K e v i n

N o w k a ,

V i c t o r

Z y u b a n ,

M a r y

J o

S a c c a m a n g o

a n d M i l e n a

V r a t o n j i c f o r v a l u a b l e

d i s c u s s i o n s

a n d

s u g g e s t i o n s .

REFERENCES

[ 1 ] B .

C u r r a n ,

e t . a l ,  4GHz+

L o w - L a t e n c y F i x e d - P o i n t a n d

B i n a r y

F l o a t i n g - P o i n t E x e c u t i o n U n i t s f o r t h e

POWER6 P r o c e s s o r ,

D i g e s t

o f 2 0 0 6

IEEE I n t e r n a t i o n a l S o l i d - S t a t e

C i r c u i t s C o n f e r e n c e ,

F e b r u a r y

8 ,

2 0 0 6 .

[ 2 ] E .

L e o b a n d u n ,

e t .

a l ,

  H i g h P e r f o r ma n c e 6 5 nm S O I T e c h n o l o g y w i t h

D u a l S t r e s s

L i n e r a n d

Low c a p a c i t a n c e SR M

c e l l , D i g e s t o f

2 0 0 5

S y m p o s i u m

o n

VLSI T e c h n o l o g y , 2 0 0 5 .

[ 3 ] R . K . M o n t o y e , e t .

a l ,   D e s i g n o f t h e

IBM

RISC

S y s t e m / 6 0 0 0

f l o a t i n g - p o i n t e x e c u t i o n u n i t ,

IBM J o u r n a l o f

R es e ar ch a n d

D e v e l o p m e n t ,

v o l . 3 4 , n o . 1 ,

p p .

5 9 .

[ 4 ]

E . S c h w a r z , B i n a r y

F l o a t i n g - P o i n t U n i t D e si g n ,

b o o k c h a p t e r

i n

H i g h P e r f o r m a n c e E n e r g y E f f i c i e n t M i c r o p r o c e s s o r

D e s i g n , S p r i n g e r ,

E d i t e d

b y

R .

K r i s h n a m u r t h y

a n d V . G . O k l o b d z i j a ,

M a r c h

2 0 0 6 .

[ 5 ]

J . P a r k ,

e t .

a l ,

  4 7 0 p s

6 4 - b i t P a r a l l e l B i n a r y A d d e r , D i g e s t o f o f

2 0 0 0 S y m p o s i u m o n

VLSI C i r c u i t s , 2 0 0 0 .

[ 6 ]

S .

M a t h e w ,

e t .

a l ,

  S u b - 5 0 0 - p s

6 4 - b

ALUs

i n 0 . 1 8 - , u m

S O I / b u l k

CMOS:

d e s i g n

a n d

s c a l i n g t r e n d s ,

IEEE J o u r n a l o f

S o l i d - S t a t e

C i r c u i t s , V o l u m e 1 1 , N o v e m b e r

2 0 0 1 .

[ 7 ]

B .

Z e y d e l

e t .

a l ,

  E f f i c i e n t

M a p p i n g

o f A d di t i o n R e c ur r e nc e

A l g o r i t h m s

i n CMOS ,

1 7 t h

IEEE S y m p o s i u m

o n C o m p u t e r

A r i t h m e t i c , J u n e 2 7 - 2 9 , 2 0 05 .

[ 8 ]   I n t e r c o n n e c t ,

I n t e r n a t i o n a l

T e c h n o l o g y

R o a d m a p f o r

S e m i c o n d u c t o r s ( I T R S )

2 0 0 5 .

[ 9 ] P . M . K o g g e a n d H . S .

S t o n e ,

 A p a r a l l e l

a l g o r i t h m

f o r

t h e

e f f i c i e n t

s o l u t i o n o f a g e n e r a l c l a s s o f r e c u r r e n c e

e q u a t i o n s , IEEE

T r a n s .

C o m p u t e r s ,

V o l . C - 2 2 , N o . 8 , 1 9 7 3 , p p. 7 8 6 - 7 9 3.

[ 1 0 ] R . E . L a d n e r , e t .

a l ,

  P a r a l l e l

P r e f i x C o m p u t a t i o n ,

J . ACM,

v o l .

2 7 ,

n o . 4 , p p . 8 3 1 - 8 3 8 ,

1 9 8 0 .

[ 1 1 ]

S .

K n o w l e s ,

  A

F am i l y o f

A dd e r s , P r o c e e d i ng s o f t h e

1 4 t h

IEEE

S y m p o s i u m o n

C o m p u t e r A r i t h m e t i c , A d e l a i d e , A u s t r a l i a , A p r i l 1 4 -

1 6 , 1 9 9 9 .

[ 1 2 ] A .

R .

C o n n , e t . a l ,

  G r a d i e n t - B a s e d O p t i m i z a t i o n o f C u s t o m

C i r c u i t s

U s i n g

a S t a t i c - T i m i n g

F o r m u l a t i o n ,

P ro ce e d i ng s o f

t h e

D e s i g n

A u t o m a t i o n C o n f e r e n c e , J u n e 1 9 9 9 ,

p p .

4 5 2 - 4 5 9 .

[ 1 3 ] V . R a o , e t .

a l ,   E i n s T L T : T r a n s i s t o r L e v e l

T i m i n g

W i t h E i n s T i m e r ,

ACM/IEEE

I n t e r n a t i o n a l W o r k s h o p o n T i m i n g I s s u e s

i n

t h e

S p e c i f i c a t i o n

a n d

S y n t h e s i s

o f D i g i t a l S y s t e m s ,

M a r c h

8 - 9 ,

1 9 9 9 .

[ 1 4 ]

J .

S . N e e l y , e t . a l ,

  C P A M :

a common p o w e r

a n a l y s i s

m e t h o d o l o g y

f o r

h i g h - p e r f o r m a n c e

V L S I d e s i g n ,

i n

P r o c e e d i n g s ,

IEEE 9 t h

T o p i c a l

M e e t i n g

o n E l e c t r i c a l P e r f o r m a n c e

o f

E l e c t r o n i c P a c k a g i n g ,

O c t o b e r

2 0 0 0 ,

p p .

3 0 3 - 3 0 6 .

[ 1 5 ]

M . H o r o w i t z ,   V L S I

S c a l i n g

f o r A r c h i t e c t s , P r e s e n t a t i o n s l i d e s ,

C o m p u t e r S y s t e m s

L a b o r a t o r y , S t a n f o r d U n i v e r s i t y .

[ 1 6 ]

V . G . O k l o b d z i j a , e t . a l ,   M u l t i p l i e r D e s i g n

U t i l i z i n g

I m p r o v e d

C o l u m n

C o m p r e s s i o n

T r e e And

O p t i m i z e d

F i n a l

A d d e r

I n

C M O S

T e c h n o l o g y ,

P r o c e e d i n g s

o f

t h e 1 9 9 3 I n t e r n a t i o n a l

S y m p o s i u m o n

VLSI

T e ch n ol o gy , S y s t e m s

a n d A p p l i c a t i o n s ,

p p . 2 0 9 - 2 1 2 ,

1 9 9 3 .

[ 1 7 ]

V . G . O k l o b d z i j a , e t . a l ,   I m p r o v i n g M u l t i p l i e r D e s i g n By

U s i n g

I m p r o v e d C o l u m n

C o m p r e s s i o n

T r e e

And O p t i m i z e d F i n a l

A d d e r

I n

C M O S

T e c h n o l o g y ,

IEEE T r a n s a c t i o n s

o n VLSI S y s t e m s ,

V o l .

3 ,

N o .

2 , J u n e ,

1 9 9 5 .

[ 1 8 ]

V . G O k l o b d z i j a , e t . a l ,   D e s i g n S t r a t e g i e s f o r t h e F i n a l

A d d e r i n

a

P a r a l l e l

M u l t i p l i e r , T w e n t y - N i n t h

A n n u a l

A s i l o m a r

C o nf e r e n c e o n

s i g n a l s ,

S y s t e m s

a n d C o m p u t e r s ,

P a c i f i c

G r o v e ,

C a l i f o r n i a ,

O c t o b e r

2 9

-

N o v e m b e r 1 , 1 9 9 5 .

[ 1 9 ]

P .

S t e l l i n g , e t . a l ,   D e s i g n S t r a t e g i e s

f o r

O p t i m a l H y b r i d F i n a l A d d e r s

i n

a P a r a l l e l M u l t i p l i e r , s p e c i a l i s s u e o n VLSI A r i t h m e t i c , J o u r n a l o f

VLSI

S i g n a l

P r o c e s s i n g ,

K l u w e r A c a d e m i c

P u b l i s h e r s ,

V o l .

1 4 ,

N o .

3 , D e c e m b e r

1 9 9 6 .

[ 2 0 ] B . R . Z e y d e l , e t . a l ,

  A 90nm 1 G H z

22mW 1 6 x 1 6 - b i t

2 ' s

C o m p l e m e n t M u l t i p l i e r

f o r W i r e l e s s

B a s e b an d , P r o ce e d i ng s

o f t h e

2 0 0 3

S y m p o s i u m

o n VLSI

C i r c u i t s , K y o t o ,

JAPAN, J u n e 1 2

-

1 4 ,

2 0 0 3 .

1 6 9