
Computer Arithmetic 1, Dept. of EE, Fu Jen Catholic University, Taiwan

Computer Arithmetic Design

Instructor: Kuan Jen Lin
E-Mail: [email protected]
Web: http://vlsi.ee.fju.edu.tw/teacher/kjlin/kjlin.htm
Dept. of EE, FJU, Taiwan
Room: SF 727B


SW & HW

SW = Algorithm + Data Structure + Programming techniques

HW = Algorithm + Architecture + Design Method

Computing

Communication

Pipeline

Systolic array

Low power

Interface

Full custom

Cell based

FPGA

System level

Course Objectives

Learn computer algorithms to do arithmetic operations.
Learn hardware designs for computer arithmetic.

After completing the course:
Students are able to implement computer arithmetic hardware designs using HDL.
Students are able to read research papers about computer arithmetic.

Textbook

•Textbook:

Behrooz Parhami, “Computer Arithmetic: Algorithms and Hardware Designs,” Oxford University Press

•Reference books:

Ercegovac and Lang, “Digital Arithmetic,” MKP.

Stine, “Digital Computer Arithmetic Datapath Design Using Verilog HDL,” CAP

Syllabus

Number representation
Two-operand Addition
Multi-operand Addition
Multiplication
Division
Square Root
Papers reading and presentation

Grading

Mid Exam (30%)
Papers reading and presentation (30%)
Homework (some problems need HDL programming) (30%)
Attendance and Others (10%)

Number Representation

Instructor: Kuan Jen Lin
E-Mail: [email protected]
Dept. of EE, FJU, Taiwan
Room: SF 727B

Most slides are revisions of PowerPoint files obtained from the textbook website.

Numbers and Arithmetic

Chapter Goals
Define scope and provide motivation
Set the framework for the rest of the book
Review positional fixed-point numbers

Chapter Highlights
What goes on inside your calculator?
Ways of encoding numbers in k bits
Radices and digit sets: conventional, exotic
Conversion from one system to another

What is Computer Arithmetic?

Pentium Division Bug (1994-95): Pentium’s radix-4 SRT algorithm occasionally gave an incorrect quotient.
First noted in 1994 by T. Nicely, who computed sums of reciprocals of twin primes:

1/5 + 1/7 + 1/11 + 1/13 + . . . + 1/p + 1/(p + 2) + . . .

Worst-case example of division error in Pentium:

c = 4 195 835 / 3 145 727

Correct quotient: 1.333 820 44...
Circa 1994 Pentium double FLP value: 1.333 739 06... (accurate to only 14 bits, worse than single!)
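The "14 bits" figure can be checked from the two quotients quoted above. The following is a minimal sketch, not part of the original slides; the variable names are illustrative, and the flawed value is simply the rounded figure from the slide rather than the output of a real Pentium.

```python
import math

correct = 4195835 / 3145727   # evaluated by a correct divider
flawed = 1.33373906           # value quoted above for the flawed Pentium (rounded)

relative_error = abs(correct - flawed) / correct
agreeing_bits = -math.log2(relative_error)   # leading bits on which the two values agree

print(f"relative error = {relative_error:.2e}")
print(f"agreement      = about {agreeing_bits:.1f} bits")
```

A relative error near 2^-14 means only about 14 significant bits are trustworthy, fewer than the 24-bit significand of single precision.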

The Scope of Computer Arithmetic

Hardware (our focus in this book):
Design of efficient digital circuits for primitive and other arithmetic operations such as +, –, ×, ÷, √, log, sin, cos
Issues: Algorithms; Error analysis; Speed/cost trade-offs; Hardware implementation; Testing, verification

Software:
Numerical methods for solving systems of linear equations, partial differential equations, etc.
Issues: Algorithms; Error analysis; Computational complexity; Programming; Testing, verification

General-purpose:
Flexible data paths; Fast primitive operations like +, –, ×, ÷, √; Benchmarking

Special-purpose:
Tailored to applications like: Digital filtering; Image processing; Radar tracking

A Motivating Example

Using a calculator with √, $x^2$, and $x^y$ functions, compute:

u = √√ … √2 = 1.000 677 131 (“1024th root of 2”)
v = $2^{1/1024}$ = 1.000 677 131

Save u and v; if you can’t save, recompute values when needed

x = $(((u^2)^2)...)^2$ = 1.999 999 963
x' = $u^{1024}$ = 1.999 999 973
y = $(((v^2)^2)...)^2$ = 1.999 999 983
y' = $v^{1024}$ = 1.999 999 994

Perhaps v and u are not really the same value

w = v – u = 1 × $10^{-11}$ (nonzero due to hidden digits)
(u – 1) × 1000 = 0.677 130 680 [Hidden ... (0) 68]
(v – 1) × 1000 = 0.677 130 690 [Hidden ... (0) 69]

Finite Precision Can Lead to Disaster

Example: Failure of Patriot Missile (1991 Feb. 25)
Source: http://www.math.psu.edu/dna/455.f96/disasters.html

American Patriot Missile battery in Dhahran, Saudi Arabia, failed to intercept an incoming Iraqi Scud missile.
The Scud struck an American Army barracks, killing 28.
Cause, per GAO/IMTEC-92-26 report: “software problem” (inaccurate calculation of the time since boot)

Problem specifics:
Time in tenths of a second, as measured by the system’s internal clock, was multiplied by 1/10 to get the time in seconds.
Internal registers were 24 bits wide.
1/10 = 0.0001 1001 1001 1001 1001 100 (chopped to 24 b)
Error ≈ 0.1100 1100 × $2^{-23}$ ≈ 9.5 × $10^{-8}$

Error in 100-hr operation period ≈ 9.5 × $10^{-8}$ × 100 × 60 × 60 × 10 = 0.34 s

Distance traveled by Scud = (0.34 s) × (1676 m/s) ≈ 570 m
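The arithmetic above can be reproduced exactly with rational numbers. This is a sketch under the slide's own assumptions (the constant 1/10 chopped to the 23 fractional bits shown, a 100-hour uptime, a Scud speed of 1676 m/s); the variable names are illustrative.

```python
from fractions import Fraction

FRACTION_BITS = 23  # bits after the binary point in the chopped constant shown above

exact = Fraction(1, 10)
# int() truncates toward zero, which models chopping rather than rounding.
chopped = Fraction(int(exact * 2**FRACTION_BITS), 2**FRACTION_BITS)
error_per_tick = exact - chopped

ticks = 100 * 60 * 60 * 10            # tenths of a second in 100 hours
drift_seconds = float(error_per_tick * ticks)
distance_m = drift_seconds * 1676     # Scud speed from the slide, in m/s

print(f"error per tick = {float(error_per_tick):.3e}")
print(f"clock drift    = {drift_seconds:.3f} s")
print(f"distance       = {distance_m:.0f} m")
```

Carrying the unrounded drift through gives a distance slightly above the slide's 570 m, which was computed from the rounded 0.34 s.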

Numbers and Their Encodings

Some 4-bit number representation formats:

Unsigned integer
Signed integer (sign ±)
Fixed point, 3+1 (with radix point)
Signed fraction (sign ±)
2's-compl fraction
Floating point (exponent e in {−2, −1, 0, 1}, significand s in {0, 1, 2, 3})
Logarithmic (base-2 logarithm, log x)

Encoding Numbers in 4 Bits

[Figure: values representable in each 4-bit number format, plotted on a number line from −16 to 16]

Number formats shown:

Unsigned integers
Signed-magnitude (±)
3 + 1 fixed-point, xxx.x
Signed fraction, ±.xxx
2’s-compl. fraction, x.xxx
2 + 2 floating-point, s × $2^e$, e in [−2, 1], s in [0, 3]
2 + 2 logarithmic (log = xx.xx)

Fixed-Radix Positional Number Systems

$(x_{k-1}x_{k-2} \ldots x_1x_0 \,.\, x_{-1}x_{-2} \ldots x_{-l})_r = \sum_{i=-l}^{k-1} x_i r^i$

One can generalize to:
Arbitrary radix (not necessarily integer, positive, constant)
Arbitrary digit set, usually {–α, –α+1, . . . , β–1, β} = [–α, β]

Example 1.1. Balanced ternary number system: Radix r = 3, digit set = [–1, 1]

Example 1.2. Negative-radix number systems: Radix –r, r ≥ 2, digit set = [0, r – 1]
The special case with radix –2 and digit set [0, 1] is known as the negabinary number system

Can it represent all integers?
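One way to explore the question about negabinary is to convert integers of both signs and check that each converts back. The sketch below is illustrative and not from the slides; `to_negabinary` and `from_digits` are hypothetical helper names, and digits are listed most significant first.

```python
def to_negabinary(n: int) -> list[int]:
    """Digits of n in radix -2 with digit set [0, 1], most significant first."""
    if n == 0:
        return [0]
    digits = []
    while n != 0:
        n, rem = divmod(n, -2)      # Python gives rem in {0, -1} for divisor -2
        if rem < 0:                 # force the digit into [0, 1]
            n, rem = n + 1, rem + 2
        digits.append(rem)
    return digits[::-1]

def from_digits(digits: list[int], radix: int) -> int:
    """Evaluate sum of x_i * radix**i for digits given most significant first."""
    value = 0
    for d in digits:
        value = value * radix + d
    return value

for n in (-3, -1, 2, 6):
    print(n, to_negabinary(n))
```

Every integer tried, positive or negative, gets a representation without a sign symbol, which is consistent with a "yes" answer for integers.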

More Examples of Number Systems

Example 1.3. Digit set [–4, 5] for r = 10: (3 –1 5)ten represents 295 = 300 – 10 + 5

Example 1.4. Digit set [–7, 7] for r = 10: (3 –1 5)ten = (3 0 –5)ten = (1 –7 0 –5)ten

Example 1.7. Quater-imaginary number system: radix r = 2j, digit set [0, 3]

Number Radix Conversion

u = w . v (w: whole part, v: fractional part)
= $(x_{k-1}x_{k-2} \ldots x_1x_0 \,.\, x_{-1}x_{-2} \ldots x_{-l})_r$ Old
= $(X_{K-1}X_{K-2} \ldots X_1X_0 \,.\, X_{-1}X_{-2} \ldots X_{-L})_R$ New

Radix conversion, using arithmetic in the old radix r: convenient when converting from r = 10

Radix conversion, using arithmetic in the new radix R: convenient when converting to R = 10

Example: (31)eight = (25)ten, i.e., 31 Oct. = 25 Dec. (Halloween = Xmas)

Radix Conversion: Old-Radix Arithmetic

Converting whole part w: (105)ten = (?)five

Repeatedly divide by five:

Quotient   Remainder
105
 21        0
  4        1
  0        4

Therefore, (105)ten = (410)five

Converting fractional part v: (105.486)ten = (410.?)five

Repeatedly multiply by five:

Whole Part   Fraction
             .486
2            .430
2            .150
0            .750
3            .750
3            .750

Therefore, (105.486)ten ≅ (410.22033)five
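The two procedures above, repeated division for the whole part and repeated multiplication for the fraction, can be sketched as follows. The function names are illustrative; `Fraction` is used so that 0.486 is held exactly instead of as a binary floating-point approximation.

```python
from fractions import Fraction

def whole_to_radix(w: int, radix: int) -> list[int]:
    """Repeatedly divide by the new radix; remainders are digits, least significant first."""
    digits = []
    while w > 0:
        w, remainder = divmod(w, radix)
        digits.append(remainder)
    return digits[::-1] or [0]

def fraction_to_radix(v: Fraction, radix: int, places: int) -> list[int]:
    """Repeatedly multiply by the new radix; whole parts are digits, most significant first."""
    digits = []
    for _ in range(places):
        v *= radix
        digit = int(v)      # whole part becomes the next digit
        digits.append(digit)
        v -= digit          # keep only the fraction for the next step
    return digits

print(whole_to_radix(105, 5))                         # [4, 1, 0]
print(fraction_to_radix(Fraction(486, 1000), 5, 5))   # [2, 2, 0, 3, 3]
```

The fractional conversion is truncated after a chosen number of places, which is why the slide writes ≅ rather than = for the result.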

Radix Conversion: New-Radix Arithmetic

Converting whole part w: (22033)five = (?)ten

Horner’s rule or formula:

((((2 × 5) + 2) × 5 + 0) × 5 + 3) × 5 + 3

2 × 5 = 10
10 + 2 = 12
12 × 5 + 0 = 60
60 × 5 + 3 = 303
303 × 5 + 3 = 1518

Converting fractional part v: (410.22033)five = (105.?)ten

(0.22033)five × $5^5$ = (22033)five = (1518)ten
1518 / $5^5$ = 1518 / 3125 = 0.48576

Therefore, (410.22033)five = (105.48576)ten

Horner’s Rule for Fractions

Converting fractional part v: (0.22033)five = (?)ten

Horner’s rule or formula:

(((((3 / 5) + 3) / 5 + 0) / 5 + 2) / 5 + 2) / 5

3 / 5 = 0.6
0.6 + 3 = 3.6
3.6 / 5 + 0 = 0.72
0.72 / 5 + 2 = 2.144
2.144 / 5 + 2 = 2.4288
2.4288 / 5 = 0.48576
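Both forms of Horner's rule fit in a few lines. This is a minimal sketch with illustrative function names; digits are given most significant first, and the fractional version works from the last digit inward, as in the nested expression above.

```python
from fractions import Fraction

def horner_whole(digits: list[int], radix: int) -> int:
    """Whole part: multiply by the radix and add the next digit, left to right."""
    value = 0
    for d in digits:
        value = value * radix + d
    return value

def horner_fraction(digits: list[int], radix: int) -> Fraction:
    """Fractional part: add the next digit and divide by the radix, right to left."""
    value = Fraction(0)
    for d in reversed(digits):
        value = (value + d) / radix
    return value

print(horner_whole([2, 2, 0, 3, 3], 5))            # 1518
print(float(horner_fraction([2, 2, 0, 3, 3], 5)))  # 0.48576
```

Using `Fraction` keeps the intermediate values exact; a float-based version would follow the same steps but could accumulate rounding error.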

Classes of Number Representations

Signed number
Redundant number system
Residue number system
Real number

2 Representing Signed Numbers

Chapter Goals
Learn different encodings of the sign info
Discuss implications for arithmetic design

Chapter Highlights
Using sign bit, biasing, complementation
Properties of 2’s-complement numbers
Signed vs unsigned arithmetic
Signed numbers, positions, or digits

Four-bit signed-magnitude number representation system for integers

[Figure: number wheel pairing each bit pattern (representation) with its signed value (signed magnitude). Patterns 0000 through 0111 represent +0 through +7; patterns 1000 through 1111 represent −0 through −7. Moving clockwise increments values on the positive half and decrements them on the negative half.]

Four-bit biased integer number representation system with a bias of 8

[Figure: number wheel pairing each bit pattern (representation) with its signed value (biased by 8). Patterns 0000 through 1111 represent −8 through +7. Moving clockwise increments values on both halves.]

Arithmetic with Biased Numbers

Addition/subtraction of biased numbers:
x + y + bias = (x + bias) + (y + bias) – bias
x – y + bias = (x + bias) – (y + bias) + bias

A power-of-2 (or $2^a - 1$) bias simplifies addition/subtraction

Comparison of biased numbers:
Compare like ordinary unsigned numbers
Find true difference by ordinary subtraction

We seldom perform arbitrary arithmetic on biased numbers
Main application: Exponent field of floating-point numbers
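The two identities above translate directly into code. This is a small sketch assuming the four-bit system with a bias of 8 from the earlier slide; the function names are illustrative, and no range checking is done.

```python
BIAS = 8  # four-bit example; representable values are -8 .. +7

def encode(x: int) -> int:
    return x + BIAS

def decode(code: int) -> int:
    return code - BIAS

def biased_add(cx: int, cy: int) -> int:
    # (x + bias) + (y + bias) - bias = x + y + bias
    return cx + cy - BIAS

def biased_sub(cx: int, cy: int) -> int:
    # (x + bias) - (y + bias) + bias = x - y + bias
    return cx - cy + BIAS

print(decode(biased_add(encode(3), encode(-5))))   # -2
print(decode(biased_sub(encode(3), encode(-4))))   # 7
print(encode(-3) < encode(2))                      # True: codes compare like unsigned numbers
```

With a power-of-2 bias such as 8 = 1000 in binary, subtracting or adding the bias only affects the top bit, which is why such a bias simplifies the hardware.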

Example and Two Special Cases

Example: complement system for fixed-point numbers
Complementation constant M = 12.000
Fixed-point number range [–6.000, +5.999]
Represent –3.258 as 12.000 – 3.258 = 8.742

Auxiliary operations for complement representations:
Complementation or change of sign (computing M – x)
Computations of residues mod M

Thus, M must be selected to simplify these operations

Two choices allow just this for fixed-point radix-r arithmetic with k whole digits and l fractional digits:

Radix complement: M = $r^k$

Digit complement: M = $r^k - \mathrm{ulp}$ (aka diminished radix compl)

ulp (unit in least position) stands for $r^{-l}$

Allows us to forget about l, even for nonintegers

Two’s-Complement Numbers

[Figure: four-bit number wheel pairing unsigned representations with signed values (2’s complement). Patterns 0000 through 0111 represent +0 through +7; patterns 1000 through 1111 represent −8 through −1.]

Two’s complement = radix complement system for r = 2

M = $2^k$

$2^k - x = [(2^k - \mathrm{ulp}) - x] + \mathrm{ulp} = x^{\mathrm{compl}} + \mathrm{ulp}$

Range of representable numbers with k whole bits:

from $-2^{k-1}$ to $2^{k-1} - \mathrm{ulp}$

ulp (unit in least position) stands for $r^{-l}$

Allows us to forget about l, even for nonintegers

One’s-Complement Number Representation

One’s complement = digit complement (diminished radix complement) system for r = 2

M = $2^k - \mathrm{ulp}$

$(2^k - \mathrm{ulp}) - x = x^{\mathrm{compl}}$

Range of representable numbers with k whole bits:

from $-2^{k-1} + \mathrm{ulp}$ to $2^{k-1} - \mathrm{ulp}$

[Figure: four-bit number wheel pairing unsigned representations with signed values (1’s complement). Patterns 0000 through 0111 represent +0 through +7; patterns 1000 through 1111 represent −7 through −0.]

Range/Precision Extension for 2’s- and 1’s-Complement

Range/precision extension for 2’s-complement numbers:

. . . $x_{k-1}$ $x_{k-1}$ $x_{k-1}$ | $x_{k-1}$ $x_{k-2}$ . . . $x_1$ $x_0$ . $x_{-1}$ $x_{-2}$ . . . $x_{-l}$ | 0 0 0 . . .
(sign extension | sign bit through LSD | extension)

Range/precision extension for 1’s-complement numbers:

. . . $x_{k-1}$ $x_{k-1}$ $x_{k-1}$ | $x_{k-1}$ $x_{k-2}$ . . . $x_1$ $x_0$ . $x_{-1}$ $x_{-2}$ . . . $x_{-l}$ | $x_{k-1}$ $x_{k-1}$ $x_{k-1}$ . . .
(sign extension | sign bit through LSD | extension)

Mod $2^k$ vs Mod $2^k - 1$

Mod-$2^k$ operation needed in 2’s-complement arithmetic is trivial:
Simply drop the carry-out (subtract $2^k$ if result is $2^k$ or greater)

Mod-($2^k$ – ulp) operation needed in 1’s-complement arithmetic is done via end-around carry:

(x + y) – ($2^k$ – ulp): connect $c_{out}$ to $c_{in}$

Since the dropped carry is worth $2^k$ units and the inserted carry is worth ulp, the combined effect is to reduce the magnitude by $2^k$ – ulp.
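End-around carry can be modelled on k-bit integer patterns (so ulp = 1). This is an illustrative sketch, not hardware from the slides; `ones_complement_add` and `ones_complement_value` are hypothetical names.

```python
def ones_complement_add(a: int, b: int, k: int) -> int:
    """Add two k-bit 1's-complement patterns using end-around carry."""
    mask = (1 << k) - 1
    total = a + b
    carry_out = total >> k                  # 1 if the sum reached 2**k
    # Dropping the carry subtracts 2**k; re-inserting it at the LSB adds 1.
    return ((total & mask) + carry_out) & mask

def ones_complement_value(code: int, k: int) -> int:
    """Decode a k-bit 1's-complement pattern (1111...1 decodes to 0, i.e. -0)."""
    return code if code < (1 << (k - 1)) else code - ((1 << k) - 1)

result = ones_complement_add(0b0101, 0b1101, 4)   # (+5) + (-2)
print(format(result, "04b"), ones_complement_value(result, 4))
```

For 2's complement the same addition needs only the `& mask` step, with no carry re-inserted.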

Why 2’s-Complement Is the Universal Choice

[Figure: adder/subtractor architecture for 2’s-complement numbers. A mux controlled by the add/sub signal (0 for addition, 1 for subtraction) passes either y or its complement to the adder (controlled complementation); the same signal drives the adder’s $c_{in}$. The adder takes x and the selected value and produces s = x ± y and $c_{out}$. The mux can be replaced with k XOR gates.]

Interpreting a 2’s-complement number as having a negatively weighted most-significant digit:

x = (1 0 1 0 0 1 1 0)two’s-compl

Weights: $-2^7$ $2^6$ $2^5$ $2^4$ $2^3$ $2^2$ $2^1$ $2^0$

–128 + 32 + 4 + 2 = –90

Check:
x = (1 0 1 0 0 1 1 0)two’s-compl

–x = (0 1 0 1 1 0 1 0)two

Weights: $2^7$ $2^6$ $2^5$ $2^4$ $2^3$ $2^2$ $2^1$ $2^0$

64 + 16 + 8 + 2 = 90
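The negatively weighted interpretation gives a direct decoder. A short sketch, with an illustrative function name, taking the bit pattern as a string with the most significant bit first:

```python
def twos_complement_value(bits: str) -> int:
    """Decode a 2's-complement bit string by giving the leading bit weight -2**(k-1)."""
    k = len(bits)
    value = -int(bits[0]) * 2 ** (k - 1)          # negatively weighted sign position
    for position, bit in enumerate(bits[1:]):
        value += int(bit) * 2 ** (k - 2 - position)
    return value

print(twos_complement_value("10100110"))   # -90
print(twos_complement_value("01011010"))   # 90
```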

Redundant Number Systems

Chapter Goals
Explore the advantages and drawbacks of using more than r digit values in radix r

Chapter Highlights
Redundancy eliminates long carry chains
Redundancy takes many forms: trade-offs
Conversions between redundant and nonredundant representations
Redundancy used for end values too?

Coping with the Carry Problem

Ways of dealing with the carry propagation problem:

1. Limit propagation to within a small number of bits (Chapters 3-4)

2. Detect end of propagation; don’t wait for worst case (Chapter 5)

3. Speed up propagation via lookahead etc. (Chapters 6-7)

4. Ideal: Eliminate carry propagation altogether! (Chapter 3)

Use Redundant Number System (1/2)

     5  7  8  2  4  9
+    6  2  9  3  8  9    Operand digits in [0, 9]
----------------------
    11  9 17  5 12 18    Position sums in [0, 18]

•The digit values 10 through 18 are redundant.

•A position sum of 10 or more would normally produce a carry; since no position sum exceeds 18, each sum is kept as a single digit instead.

But how can we extend this beyond a single addition? Subsequent additions will cause problems.

Use Redundant Number System (2/2)

The sum of digits for each position is in [0, 36]; each can be decomposed into an interim sum in [0, 16] and a transfer digit in [0, 2], i.e., a carry.

    18 18 18 18 18
+    0  0  0  0  1
----------------------
     8  8  8  8  9    Interim sums
  1  1  1  1  1       Transfer digits (each moves one position to the left)
  1  9  9  9  9  9    Sum digits

Is there still a carry propagation problem?

Example: Addition of Redundant Numbers

Position sum decomposition: [0, 36] = 10 × [0, 2] + [0, 16]

Absorption of transfer digit: [0, 16] + [0, 2] = [0, 18]

    11  9 17 10 12 18    Operand digits in [0, 18]
+    6 12  9 10  8 18
----------------------
    17 21 26 20 20 36    Position sums in [0, 36]
     7 11 16  0 10 16    Interim sums in [0, 16]
  1  1  1  2  1  2       Transfer digits in [0, 2]
  1  8 12 18  1 12 16    Sum digits in [0, 18]
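The two steps of the example, decomposition and absorption, can be sketched in code. This is an illustrative implementation for radix 10 with digit set [0, 18]; the decomposition of a position sum is not unique, so the threshold rule chosen here (an assumption of this sketch) may produce different digits from the worked example while representing the same value.

```python
RADIX = 10

def carry_free_add(x: list[int], y: list[int]) -> list[int]:
    """Add two equal-length radix-10 numbers with digits in [0, 18], most significant first."""
    position_sums = [a + b for a, b in zip(x, y)]                 # each in [0, 36]
    # One valid decomposition: transfer in [0, 2], interim sum in [0, 16].
    transfers = [2 if p >= 20 else 1 if p >= 10 else 0 for p in position_sums]
    interim = [p - RADIX * t for p, t in zip(position_sums, transfers)]
    # Each transfer is absorbed one position to the left; nothing propagates further.
    incoming = transfers[1:] + [0]
    sums = [w + t for w, t in zip(interim, incoming)]
    return [transfers[0]] + sums

def value(digits: list[int]) -> int:
    """Value of a radix-10 digit list, most significant first."""
    total = 0
    for d in digits:
        total = total * RADIX + d
    return total

x = [11, 9, 17, 10, 12, 18]
y = [6, 12, 9, 10, 8, 18]
s = carry_free_add(x, y)
print(s, value(s) == value(x) + value(y))
```

Each sum digit depends only on the operand digits at its own position and the one to its right, so all positions can be computed in parallel.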

Carry-Free Addition Schemes

[Figure: three addition schemes drawn over positions i+1, i, i–1, with operand digits $x_i$, $y_i$ at position i, interim sum at position i, transfer digit $t_i$ into position i, and sum digits $s_{i+1}$, $s_i$, $s_{i-1}$.]

(a) Ideal single-stage carry-free. (Impossible for positional system with fixed digit set)

(b) Two-stage carry-free.

(c) Single-stage with lookahead.

Redundancy Index

So, redundancy helps us achieve carry-free addition

But how much redundancy is actually needed? Is [0, 11] enough for r = 10?

    11 10  7 11  3  8    Operand digits in [0, 11]
+    7  2  9 10  9  8
----------------------
    18 12 16 21 12 16    Position sums in [0, 22]
     8  2  6  1  2  6    Interim sums in [0, 9]
  1  1  1  2  1  1       Transfer digits in [0, 2]
  1  9  3  8  2  3  6    Sum digits in [0, 11]

Redundancy index ρ = α + β + 1 – r
For example, 0 + 11 + 1 – 10 = 2

Digit Sets and Digit-Set Conversions

Example 3.1: Convert from digit set [0, 18] to [0, 9] in radix 10

11  9 17 10 12 18     18 = 10 (carry 1) + 8
11  9 17 10 13  8     13 = 10 (carry 1) + 3
11  9 17 11  3  8     11 = 10 (carry 1) + 1
11  9 18  1  3  8     18 = 10 (carry 1) + 8
11 10  8  1  3  8     10 = 10 (carry 1) + 0
12  0  8  1  3  8     12 = 10 (carry 1) + 2

1 2 0 8 1 3 8         Answer; all digits in [0, 9]

Note: Conversion from redundant to nonredundant representation always involves carry propagation

Thus, the process is sequential and slow
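The conversion above is an ordinary right-to-left carry sweep. A minimal sketch with an illustrative function name, assuming nonnegative digits given most significant first:

```python
def to_conventional(digits: list[int], radix: int = 10) -> list[int]:
    """Convert digits from a redundant set such as [0, 18] to the conventional set [0, radix - 1]."""
    result = []
    carry = 0
    for d in reversed(digits):                   # least significant digit first
        carry, digit = divmod(d + carry, radix)  # the carry ripples to the next position
        result.append(digit)
    while carry:                                 # emit any remaining leading digits
        carry, digit = divmod(carry, radix)
        result.append(digit)
    return result[::-1]

print(to_conventional([11, 9, 17, 10, 12, 18]))   # [1, 2, 0, 8, 1, 3, 8]
```

The loop makes the sequential dependence visible: each digit cannot be finalized until the carry from the position to its right is known.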

Generalized Signed-Digit Numbers

Radix r
Digit set [–α, β]
Requirement: α + β + 1 ≥ r
Redundancy index: ρ = α + β + 1 – r

[Figure: taxonomy tree of radix-r positional number systems, reconstructed as an outline]

Radix-r positional
  ρ = 0: Non-redundant
    α = 0: Conventional
    α ≥ 1: Non-redundant signed-digit
  ρ ≥ 1: Generalized signed-digit (GSD)
    ρ = 1: Minimal GSD
      α = β (even r): Symmetric minimal GSD
        r = 2: BSD or BSB
      α ≠ β: Asymmetric minimal GSD
        α = 0: Stored-carry (SC)
          r = 2: BSC
        α = 1 (r ≠ 2): Non-binary SB
    ρ ≥ 2: Non-minimal GSD
      α = β: Symmetric non-minimal GSD
        α < r: Ordinary signed-digit (OSD)
          α = ⌊r/2⌋ + 1: Minimally redundant OSD
          α = r – 1: Maximally redundant OSD
      α ≠ β: Asymmetric non-minimal GSD
        α = 0: Unsigned-digit redundant (UDR)
        α = 1, β = r: SCB
          r = 2: BSCB

Binary Signed Digit (BSD)

$x_i$         1    –1    0    –1    0      BSD representation of +6
⟨s, v⟩       01    11    00   11    00     Sign and value encoding
2’s-compl    01    11    00   11    00     2-bit 2’s-complement
⟨n, p⟩       01    10    00   10    00     Negative & positive flags
⟨n, z, p⟩    001   100   010  100   010    1-out-of-3 encoding

Carry-Free Addition Algorithms

Carry-free addition of GSD numbers:

Compute the position sums $p_i = x_i + y_i$

Divide $p_i$ into a transfer $t_{i+1}$ and interim sum $w_i = p_i - r t_{i+1}$

Add incoming transfers to get the sum digits $s_i = w_i + t_i$

[Figure: two-stage carry-free adder. Operand digits $x_i$, $y_i$ produce the interim sum $w_i$ and the transfer $t_{i+1}$; $w_i$ and the incoming transfer $t_i$ produce the sum digit $s_i$.]

If the transfer digits $t_i$ are in [–λ, μ], we must have:

$-\alpha + \lambda \le p_i - r t_{i+1} \le \beta - \mu$

The middle term is the interim sum. The left bound is the smallest interim sum if a transfer of –λ is to be absorbable; the right bound is the largest interim sum if a transfer of μ is to be absorbable.

These constraints lead to:

$\lambda \ge \alpha / (r - 1)$

$\mu \ge \beta / (r - 1)$
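The step from the interim-sum constraint to the two bounds can be filled in by looking at the extreme position sums. The derivation below is a sketch of that reasoning, using only the symbols already defined; it shows the bounds are necessary, not that they are sufficient.

```latex
\begin{align*}
&\text{Operand digits lie in } [-\alpha, \beta], \text{ so } p_i = x_i + y_i \in [-2\alpha,\, 2\beta]. \\[4pt]
&\text{Largest case, } p_i = 2\beta \text{ with the largest transfer } t_{i+1} = \mu: \\
&\qquad 2\beta - r\mu \le \beta - \mu
  \;\Longrightarrow\; \beta \le (r-1)\mu
  \;\Longrightarrow\; \mu \ge \frac{\beta}{r-1}. \\[4pt]
&\text{Smallest case, } p_i = -2\alpha \text{ with the smallest transfer } t_{i+1} = -\lambda: \\
&\qquad -2\alpha + r\lambda \ge -\alpha + \lambda
  \;\Longrightarrow\; (r-1)\lambda \ge \alpha
  \;\Longrightarrow\; \lambda \ge \frac{\alpha}{r-1}.
\end{align*}
```

For the radix-10 digit set [0, 18] used earlier, this gives μ ≥ 18/9 = 2 and λ ≥ 0, matching the transfer digits in [0, 2].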

Is Carry-Free Addition Always Applicable?

No: It requires one of the following two conditions [Parh 90]

a. r > 2, ρ ≥ 3

b. r > 2, ρ = 2, α ≠ 1, β ≠ 1 (e.g., not [−1, 10] in radix 10)

In other words, it is inapplicable for

r = 2 (perhaps the most useful case)

ρ = 1 (e.g., carry-save)

ρ = 2 with α = 1 or β = 1 (e.g., carry/borrow-save)

BSD is not two-stage carry-free.

[Figure: BSD addition example]

Use Carry-Estimate

A position sum of –1 is kept intact when the incoming transfer is in [0, 1], whereas it is rewritten as 1 with a carry of –1 when the incoming transfer is in [–1, 0]. This guarantees that $t_i$ and $w_i$ are never both 1 or both –1, and thus –1 ≤ $s_i$ ≤ 1.

          1   –1    0   –1    0     $x_i$ in [–1, 1]
+         0   –1   –1    0    1     $y_i$ in [–1, 1]
------------------------------
          1   –2   –1   –1    1     $p_i$ in [–2, 2]
 high   low  low  low  high high    $e_i$ in {low: [–1, 0], high: [0, 1]}
          1    0    1   –1   –1     $w_i$ in [–1, 1]
    0    –1   –1    0    1          $t_{i+1}$ in [–1, 1]
    0     0   –1    1    0   –1     $s_i$ in [–1, 1]
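The carry-estimate rule can be sketched in code. This is an illustrative implementation, not taken from the slides: it assumes the estimate for position i is "high" when both operand digits at position i–1 are nonnegative (and for position 0, where the incoming transfer is 0), and "low" otherwise. Digits are given most significant first.

```python
def bsd_add(x: list[int], y: list[int]) -> list[int]:
    """Add two equal-length BSD numbers (digits in {-1, 0, 1}) using a carry estimate."""
    xs, ys = x[::-1], y[::-1]              # work least significant digit first
    n = len(xs)
    w = [0] * n                            # interim sums
    t = [0] * (n + 1)                      # t[i] is the transfer into position i
    for i in range(n):
        p = xs[i] + ys[i]                  # position sum in [-2, 2]
        # Estimate of the incoming transfer: high means t[i] in [0, 1], low means [-1, 0].
        high = i == 0 or (xs[i - 1] >= 0 and ys[i - 1] >= 0)
        if p == 2:
            w[i], t[i + 1] = 0, 1
        elif p == 1:
            w[i], t[i + 1] = (-1, 1) if high else (1, 0)
        elif p == 0:
            w[i], t[i + 1] = 0, 0
        elif p == -1:
            w[i], t[i + 1] = (-1, 0) if high else (1, -1)
        else:  # p == -2
            w[i], t[i + 1] = 0, -1
    s = [w[i] + t[i] for i in range(n)] + [t[n]]
    return s[::-1]

def bsd_value(digits: list[int]) -> int:
    """Value of a radix-2 signed-digit list, most significant first."""
    total = 0
    for d in digits:
        total = 2 * total + d
    return total

s = bsd_add([1, -1, 0, -1, 0], [0, -1, -1, 0, 1])   # 6 + (-11)
print(s, bsd_value(s))
```

Because the interim sum is chosen with the opposite sign bias from the estimated transfer, the final digit never leaves [–1, 1] and no second transfer is generated.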

Residue Number Systems

Chapter Goals
Study a way of encoding large numbers as a collection of smaller numbers to simplify and speed up some operations

Chapter Highlights
Moduli, range, arithmetic operations
Many sets of moduli possible: tradeoffs
Conversions between RNS and binary
The Chinese remainder theorem
Why are RNS applications limited?

RNS Representations and Arithmetic

Chinese puzzle, 1500 years ago:

What number has the remainders of 2, 3, and 2 when divided by 7, 5, and 3, respectively?

Residues uniquely identify the number, hence they constitute a representation

Pairwise relatively prime moduli: $m_{k-1} > \ldots > m_1 > m_0$

The residue $x_i$ of x wrt the ith modulus $m_i$ (similar to a digit):
$x_i = x \bmod m_i = \langle x \rangle_{m_i}$

RNS representation contains a list of k residues or digits:
x = (2 | 3 | 2)RNS(7|5|3)

Default RNS for this chapter: RNS(8 | 7 | 5 | 3)

RNS Dynamic Range

Product M of the k pairwise relatively prime moduli is the dynamic range

$M = m_{k-1} \times \ldots \times m_1 \times m_0$

For RNS(8 | 7 | 5 | 3), M = 8 × 7 × 5 × 3 = 840

Negative numbers: Complement relative to M
$\langle -x \rangle_{m_i} = \langle M - x \rangle_{m_i}$
21 = (5 | 0 | 1 | 0)RNS
–21 = (8 – 5 | 0 | 5 – 1 | 0)RNS = (3 | 0 | 4 | 0)RNS

Here are some example numbers in our default RNS(8 | 7 | 5 | 3):
(0 | 0 | 0 | 0)RNS Represents 0 or 840 or . . .
(1 | 1 | 1 | 1)RNS Represents 1 or 841 or . . .
(2 | 2 | 2 | 2)RNS Represents 2 or 842 or . . .
(0 | 1 | 4 | 1)RNS Represents 64 or 904 or . . .
(2 | 0 | 0 | 2)RNS Represents –70 or 770 or . . .
(7 | 6 | 4 | 2)RNS Represents –1 or 839 or . . .

We can take the range of RNS(8|7|5|3) to be [−420, 419] or any other set of 840 consecutive integers

RNS as Weighted Representation

For RNS(8 | 7 | 5 | 3), the weights of the 4 positions are:

105   120   336   280

Example: (1 | 2 | 4 | 0)RNS represents the number

$\langle 105 \times 1 + 120 \times 2 + 336 \times 4 + 280 \times 0 \rangle_{840} = \langle 1689 \rangle_{840} = 9$

For RNS(7 | 5 | 3), the weights of the 3 positions are:

15   21   70

Example -- Chinese puzzle: (2 | 3 | 2)RNS(7|5|3) represents the number

$\langle 15 \times 2 + 21 \times 3 + 70 \times 2 \rangle_{105} = \langle 233 \rangle_{105} = 23$

We will see later how the weights can be determined for a given RNS
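
One way to obtain these weights is sketched below, assuming Python 3.8+ (for `pow(a, -1, m)` and `math.prod`). Each weight is the multiple of M/m_i that is congruent to 1 modulo m_i; the function names are illustrative.

```python
from math import prod


def crt_weights(moduli):
    """Weight of each RNS position, in the same order as the moduli."""
    M = prod(moduli)
    weights = []
    for m in moduli:
        Mi = M // m                  # product of all the other moduli
        alpha = pow(Mi, -1, m)       # multiplicative inverse of Mi mod m
        weights.append(Mi * alpha % M)
    return weights


def from_rns(residues, moduli):
    """Weighted sum of the digits, reduced modulo the dynamic range M."""
    M = prod(moduli)
    return sum(w * r for w, r in zip(crt_weights(moduli), residues)) % M
```

For example, `from_rns((1, 2, 4, 0), (8, 7, 5, 3))` reproduces the value 9 computed above.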

RNS Encoding and Arithmetic Operations

[Figure: Binary-coded format for RNS(8 | 7 | 5 | 3): the mod-8, mod-7, mod-5, and mod-3 residues occupy 3, 3, 3, and 2 bits.]

Arithmetic in RNS(8 | 7 | 5 | 3)
(5 | 5 | 0 | 2)RNS   Represents x = +5
(7 | 6 | 4 | 2)RNS   Represents y = –1
(4 | 4 | 4 | 1)RNS   x + y : $\langle 5 + 7 \rangle_8 = 4$, $\langle 5 + 6 \rangle_7 = 4$, etc.
(6 | 6 | 1 | 0)RNS   x – y : $\langle 5 - 7 \rangle_8 = 6$, $\langle 5 - 6 \rangle_7 = 6$, etc. (alternatively, find –y and add to x)
(3 | 2 | 0 | 1)RNS   x × y : $\langle 5 \times 7 \rangle_8 = 3$, $\langle 5 \times 6 \rangle_7 = 2$, etc.

[Figure: Operand 1 and Operand 2 feed independent Mod-8, Mod-7, Mod-5, and Mod-3 units, which produce the 3-, 3-, 3-, and 2-bit digits of the Result.]
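
The digit-wise operations in the example can be reproduced with a short sketch. The helper `rns_op` is a hypothetical name; it models the independent mod-m units by reducing each digit pair separately.

```python
import operator

MODULI = (8, 7, 5, 3)


def rns_op(op, a, b, moduli=MODULI):
    """Apply op to each pair of digits and reduce by the matching modulus."""
    return tuple(op(x, y) % m for x, y, m in zip(a, b, moduli))


x = (5, 5, 0, 2)   # +5
y = (7, 6, 4, 2)   # -1

total = rns_op(operator.add, x, y)
difference = rns_op(operator.sub, x, y)
product = rns_op(operator.mul, x, y)
```

No information flows between digit positions, so the delay is set by the widest modulus rather than by the total word width.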

Choosing the RNS Moduli

Target range for our RNS: Decimal values [0, 100 000]

Strategy 1: To minimize the largest modulus, and thus ensure high-speed arithmetic, pick prime numbers in sequence

Pick $m_0 = 2$, $m_1 = 3$, $m_2 = 5$, etc. After adding $m_5 = 13$:
RNS(13 | 11 | 7 | 5 | 3 | 2)        M = 30 030    Inadequate
RNS(17 | 13 | 11 | 7 | 5 | 3 | 2)   M = 510 510   Too large
RNS(17 | 13 | 11 | 7 | 3 | 2)       M = 102 102   Just right!
5 + 4 + 4 + 3 + 2 + 1 = 19 bits

Fine tuning: Combine pairs of moduli 2 & 13 (26) and 3 & 7 (21)
RNS(26 | 21 | 17 | 11)   M = 102 102
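
The figures quoted for each candidate set can be checked with a small helper. This is a sketch: `describe` is a hypothetical name, and the bit count assumes each residue is stored in plain binary, so a modulus m needs enough bits to hold m − 1.

```python
from itertools import combinations
from math import gcd, prod


def describe(moduli):
    """Return (dynamic range M, total residue bits) for a set of moduli."""
    if any(gcd(a, b) != 1 for a, b in combinations(moduli, 2)):
        raise ValueError("moduli must be pairwise relatively prime")
    M = prod(moduli)
    bits = sum((m - 1).bit_length() for m in moduli)
    return M, bits


strategy_1 = describe((17, 13, 11, 7, 3, 2))
combined = describe((26, 21, 17, 11))
```

Combining moduli leaves M unchanged and reduces the number of parallel units, at the price of wider individual digits.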

An Improved Strategy

Target range for our RNS: Decimal values [0, 100 000]

Strategy 2: Improve strategy 1 by including powers of smaller primes before proceeding to the next larger prime

RNS(2^2 | 3)                        M = 12
RNS(3^2 | 2^3 | 7 | 5)              M = 2520
RNS(11 | 3^2 | 2^3 | 7 | 5)         M = 27 720
RNS(13 | 11 | 3^2 | 2^3 | 7 | 5)    M = 360 360

(remove one 3, combine 3 & 5)
RNS(15 | 13 | 11 | 2^3 | 7)         M = 120 120

4 + 4 + 4 + 3 + 3 = 18 bits

Fine tuning: Maximize the size of the even modulus within the 4-bit limit
RNS(2^4 | 13 | 11 | 3^2 | 7 | 5)    M = 720 720   Too large
We can now remove 5 or 7; not an improvement in this example

Low-Cost RNS Moduli

Target range for our RNS: Decimal values [0, 100 000]

Strategy 3: To simplify the modular reduction (mod $m_i$) operations, choose only moduli of the forms $2^a$ or $2^a - 1$, aka “low-cost moduli”

RNS($2^{a_{k-1}}$ | $2^{a_{k-2}} - 1$ | . . . | $2^{a_1} - 1$ | $2^{a_0} - 1$)

We can have only one even modulus
$2^{a_i} - 1$ and $2^{a_j} - 1$ are relatively prime iff $a_i$ and $a_j$ are relatively prime

RNS(2^3 | 2^3–1 | 2^2–1)            basis: 3, 2      M = 168
RNS(2^4 | 2^4–1 | 2^3–1)            basis: 4, 3      M = 1680
RNS(2^5 | 2^5–1 | 2^3–1 | 2^2–1)    basis: 5, 3, 2   M = 20 832
RNS(2^5 | 2^5–1 | 2^4–1 | 2^3–1)    basis: 5, 4, 3   M = 104 160

Comparison
RNS(15 | 13 | 11 | 2^3 | 7)         18 bits   M = 120 120
RNS(2^5 | 2^5–1 | 2^4–1 | 2^3–1)    17 bits   M = 104 160

Reduction mod $2^k$ and mod $2^k - 1$ is easy
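
Why these moduli are cheap can be illustrated in software. The sketch below assumes nonnegative integer inputs; the function names are hypothetical. Reduction mod 2^a keeps the low a bits, and reduction mod 2^a − 1 repeatedly adds a-bit chunks, because 2^a ≡ 1 (mod 2^a − 1).

```python
def mod_pow2(x, a):
    """x mod 2**a: keep the a least significant bits."""
    return x & ((1 << a) - 1)


def mod_pow2_minus_1(x, a):
    """x mod (2**a - 1) by folding a-bit chunks (end-around-carry style)."""
    mask = (1 << a) - 1
    while x > mask:
        # The high part has weight 2**a, which is congruent to 1.
        x = (x & mask) + (x >> a)
    # The all-ones pattern is a second encoding of zero.
    return 0 if x == mask else x
```

In hardware the folding corresponds to an adder with end-around carry, so no general division circuit is needed.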

Encoding and Decoding of Numbers

Conversion from binary/decimal to RNS

 i    2^i    ⟨2^i⟩_7    ⟨2^i⟩_5    ⟨2^i⟩_3
 0      1       1          1          1
 1      2       2          2          2
 2      4       4          4          1
 3      8       1          3          2
 4     16       2          1          1
 5     32       4          2          2
 6     64       1          4          1
 7    128       2          3          2
 8    256       4          1          1
 9    512       1          2          2

Table 4.1 Residues of the first 10 powers of 2

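Example 4.1: Represent the number y = (1010 0100)two = (164)ten in RNS(8 | 7 | 5 | 3)

The mod-8 residue is easy to find

$x_3 = \langle y \rangle_8 = (100)_{two} = 4$

We have $y = 2^7 + 2^5 + 2^2$; thus

$x_2 = \langle y \rangle_7 = \langle 2 + 4 + 4 \rangle_7 = 3$

$x_1 = \langle y \rangle_5 = \langle 3 + 2 + 4 \rangle_5 = 4$

$x_0 = \langle y \rangle_3 = \langle 2 + 2 + 1 \rangle_3 = 2$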
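
The table-lookup procedure of Example 4.1 can be sketched as follows. The function name and the `width` parameter (number of table rows, 10 in Table 4.1) are illustrative assumptions; the input must fit in `width` bits.

```python
def residue_from_bits(y, m, width=10):
    """Residue of y mod m, summing precomputed residues of powers of 2."""
    if y < 0 or y >= 1 << width:
        raise ValueError("y must fit in `width` bits")
    table = [pow(2, i, m) for i in range(width)]      # one column of Table 4.1
    partial = sum(table[i] for i in range(width) if (y >> i) & 1)
    return partial % m                                # small final reduction


residues_164 = [residue_from_bits(164, m) for m in (8, 7, 5, 3)]
```

Only the table entries for the 1 bits of y are added, so the final reduction operates on a small number.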

Conversion from RNS to Binary/Decimal

Theorem 4.1 (The Chinese remainder theorem)

$x = (x_{k-1} \mid \dots \mid x_2 \mid x_1 \mid x_0)_{RNS} = \left\langle \sum_i M_i \langle \alpha_i x_i \rangle_{m_i} \right\rangle_M$
where $M_i = M/m_i$ and $\alpha_i = \langle M_i^{-1} \rangle_{m_i}$ (multiplicative inverse of $M_i$ wrt $m_i$)

Implementing CRT-based RNS-to-binary conversion
$x = \left\langle \sum_i M_i \langle \alpha_i x_i \rangle_{m_i} \right\rangle_M = \left\langle \sum_i f_i(x_i) \right\rangle_M$

We can use a table to store the $f_i$ values: $\sum_i m_i$ entries

Table 4.2 Values needed in applying the Chinese remainder theorem to RNS(8 | 7 | 5 | 3)

 i    m_i    x_i    ⟨M_i ⟨α_i x_i⟩_{m_i}⟩_M
 3     8      0       0
              1     105
              2     210
              3     315
              .       .
              .       .
              .       .
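
A sketch of the table-based conversion, assuming Python 3.8+. The names `build_crt_table` and `decode` are hypothetical; the table holds one row per modulus and one entry per possible residue, as in Table 4.2.

```python
from math import prod


def build_crt_table(moduli):
    """table[i][xi] = <Mi * <alpha_i * xi>_mi>_M for every residue value xi."""
    M = prod(moduli)
    table = []
    for m in moduli:
        Mi = M // m
        alpha = pow(Mi, -1, m)       # multiplicative inverse of Mi mod m
        table.append([Mi * ((alpha * xi) % m) % M for xi in range(m)])
    return table


def decode(residues, moduli, table):
    """Look up one entry per digit and add the entries modulo M."""
    return sum(row[r] for row, r in zip(table, residues)) % prod(moduli)


MODULI = (8, 7, 5, 3)
TABLE = build_crt_table(MODULI)
```

With the table in place, decoding costs k lookups and a k-operand modular addition.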

Intuitive Justification for CRT

Puzzle: What number has the remainders of 2, 3, and 2 when divided by the numbers 7, 5, and 3, respectively?

x = (2 | 3 | 2)RNS(7|5|3) = (?)ten

(1 | 0 | 0)RNS(7|5|3) = multiple of 15 that is 1 mod 7 = 15
(0 | 1 | 0)RNS(7|5|3) = multiple of 21 that is 1 mod 5 = 21
(0 | 0 | 1)RNS(7|5|3) = multiple of 35 that is 1 mod 3 = 70

(2 | 3 | 2)RNS(7|5|3) = (2 | 0 | 0) + (0 | 3 | 0) + (0 | 0 | 2)
= 2 × (1 | 0 | 0) + 3 × (0 | 1 | 0) + 2 × (0 | 0 | 1)
= 2 × 15 + 3 × 21 + 2 × 70 = 30 + 63 + 140
= 233 = 23 mod 105

Therefore, x = (23)ten

Difficult RNS Arithmetic Operations

Sign test
Magnitude comparison
Division

• Could convert back and forth to/from binary.
• Another approach: convert to a mixed-radix system, since numbers in a mixed-radix system can be compared digit by digit.
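
A sketch of one standard mixed-radix conversion procedure, assuming Python 3.8+ and moduli listed most significant first, as in RNS(8 | 7 | 5 | 3). The slides do not give this algorithm at this point, so the function below is an illustration rather than the course's method. The resulting digits have weights ..., m1 × m0, m0, 1.

```python
def to_mixed_radix(residues, moduli):
    """Mixed-radix digits (most significant first) of an RNS number."""
    r = list(reversed(residues))     # work from the least significant modulus
    m = list(reversed(moduli))
    digits = []
    for i in range(len(m)):
        z = r[i] % m[i]              # next mixed-radix digit
        digits.append(z)
        for j in range(i + 1, len(m)):
            # Subtract the digit, then divide by m[i] in every remaining channel.
            r[j] = (r[j] - z) * pow(m[i], -1, m[j]) % m[j]
    return tuple(reversed(digits))


puzzle_digits = to_mixed_radix((2, 3, 2), (7, 5, 3))   # 23 = 1*15 + 2*3 + 2
```

Because the mixed-radix system is positional, two unsigned values can be compared lexicographically on these digits; a sign test then amounts to comparing against M/2.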

Difficult RNS Arithmetic Operations

Example: Of the following RNS(8 | 7 | 5 | 3) numbers:
Which, if any, are negative?
Which is the largest?
Which is the smallest?

Assume a range of [–420, 419]

a = (0 | 1 | 3 | 2)RNS
b = (0 | 1 | 4 | 1)RNS
c = (0 | 6 | 2 | 1)RNS
d = (2 | 0 | 0 | 2)RNS
e = (5 | 0 | 1 | 0)RNS
f = (7 | 6 | 4 | 2)RNS

Answers:
d < c < f < a < e < b
–70 < –8 < –1 < 8 < 21 < 64

General RNS Division

General RNS division, as opposed to division by one of the moduli (aka scaling), is difficult; hence, use of RNS is unlikely to be effective when an application requires many divisions

Scheme proposed in 1994 PhD thesis of Ching-Yu Hung (UCSB):
Use an algorithm that has built-in tolerance to imprecision, and apply the approximate CRT decoding to choose quotient digits

Example –– SRT algorithm (s is the partial remainder)

s < 0    quotient digit = –1
s ≅ 0    quotient digit = 0
s > 0    quotient digit = 1

The BSD quotient can be converted to RNS on the fly

Limits of Fast Arithmetic in RNS

Known results from number theory

Theorem 4.2: The ith prime $p_i$ is asymptotically $i \ln i$

Theorem 4.3: The number of primes in [1, n] is asymptotically $n / \ln n$

Theorem 4.4: The product of all primes in [1, n] is asymptotically $e^n$

Implications to speed of arithmetic in RNS

Theorem 4.5: It is possible to represent all k-bit binary numbers in RNS with O(k / log k) moduli such that the largest modulus has O(log k) bits

That is, with fast log-time adders, addition needs O(log log k) time

Hardware Implementation for RNS Representations

[Figure: Operand 1 and Operand 2 feed independent Mod-8, Mod-7, Mod-5, and Mod-3 units, which produce the 3-, 3-, 3-, and 2-bit digits of the Result.]

Computer Arithmetic 2, Dept. of EE, Fu Jen Catholic University, Taiwan

Addition/Subtraction

Instructor: Kuan Jen Lin
E-Mail: [email protected]
Dept. of EE, FJU, Taiwan
Room: SF 727B

Most slides originate from the textbook author’s PowerPoint presentation files.

II Addition / Subtraction

Topics in This Part
Chapter 5 Basic Addition and Counting
Chapter 6 Carry-Lookahead Adders
Chapter 7 Variations in Fast Adders
Chapter 8 Multioperand Addition

Review addition schemes and various speedup methods
• Addition is a key op (in itself, and as a building block)
• Subtraction = negation + addition
• Carry propagation speedup: lookahead, skip, select, …
• Two-operand versus multioperand addition

Basic Addition and Counting

Chapter Goals
Study the design of ripple-carry adders, discuss why their latency is unacceptable, and set the foundation for faster adders

Chapter Highlights
Full adders are versatile building blocks
Longest carry chain on average: $\log_2 k$ bits
Fast asynchronous adders are simple
Counting is relatively easy to speed up

HA and FA Adders

Half-adder (HA): Truth table and block diagram

Inputs    Outputs
x  y      c  s
0  0      0  0
0  1      0  1
1  0      0  1
1  1      1  0

[Figure: HA block with inputs x, y and outputs c, s.]

Full-adder (FA): Truth table and block diagram

Inputs         Outputs
x  y  c_in     c_out  s
0  0  0        0      0
0  0  1        0      1
0  1  0        0      1
0  1  1        1      0
1  0  0        0      1
1  0  1        1      0
1  1  0        1      0
1  1  1        1      1

[Figure: FA block with inputs x, y, c_in and outputs c_out, s.]

Half-Adder Implementations

[Figure: three half-adder circuits with inputs x, y and outputs c, s: (a) AND/XOR half-adder. (b) NOR-gate half-adder. (c) NAND-gate half-adder with complemented carry.]

Some Full-Adder Details

Logic equations for a full-adder:

$s = x \oplus y \oplus c_{in}$ (odd parity function)
$\;\; = x\,y\,c_{in} \lor x'\,y'\,c_{in} \lor x'\,y\,c_{in}' \lor x\,y'\,c_{in}'$

$c_{out} = x\,y \lor x\,c_{in} \lor y\,c_{in}$ (majority function)
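
The two equations translate directly into bitwise operations. This is a behavioral sketch on single bits (0 or 1), not an HDL description; the function names are illustrative.

```python
def half_adder(x, y):
    """Return (s, c) for two input bits."""
    return x ^ y, x & y


def full_adder(x, y, c_in):
    """Return (s, c_out): odd parity and majority of the three input bits."""
    s = x ^ y ^ c_in
    c_out = (x & y) | (x & c_in) | (y & c_in)
    return s, c_out
```

In both cells the outputs satisfy s + 2c = (sum of the input bits), which is a convenient check of the truth tables.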

Full-Adder Implementations

[Figure: three full-adder circuits with inputs x, y, c_in and outputs s, c_out: (a) Built of half-adders. (b) Built as an AND-OR circuit. (c) Suitable for CMOS realization (multiplexer based).]

Bit Serial Adder and Ripple Adder

[Figure: (a) Bit-serial adder: operand shift registers x and y feed one FA whose carry is held in a carry flip-flop; the sum bits are shifted into s under control of the clock. (b) Ripple-carry adder: a chain of FA cells for bit positions 0 through 31, with c_0 = c_in and c_32 = c_out.]

Critical Path Through a Ripple-Carry Adder

[Figure: Critical path in a k-bit ripple-carry adder, running from x_0, y_0 through the carry chain c_1, . . . , c_{k-1} to s_{k-1} (and c_k).]

$T_{ripple-add} = T_{FA}(x, y \to c_{out}) + (k - 2) \times T_{FA}(c_{in} \to c_{out}) + T_{FA}(c_{in} \to s)$
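
A behavioral sketch of the ripple-carry structure: the loop visits the bit positions in the same order in which the carry ripples, which is why the hardware delay grows linearly with k. The function name and interface are illustrative assumptions.

```python
def ripple_carry_add(x, y, k, c_in=0):
    """Add two k-bit unsigned operands bit by bit; return (sum, c_out)."""
    carry = c_in
    total = 0
    for i in range(k):                      # position 0 first, k - 1 last
        xi = (x >> i) & 1
        yi = (y >> i) & 1
        s = xi ^ yi ^ carry                 # FA sum output
        carry = (xi & yi) | (xi & carry) | (yi & carry)   # FA carry output
        total |= s << i
    return total, carry


example = ripple_carry_add(0b1011, 0b0110, 4)
```

Each iteration depends on the carry of the previous one, mirroring the (k − 2) × T_FA(c_in → c_out) term in the formula.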

Conditions and Exceptions

$\text{overflow}_{\text{2's-compl}} = x_{k-1}\, y_{k-1}\, s_{k-1}' \lor x_{k-1}'\, y_{k-1}'\, s_{k-1}$

$\text{overflow}_{\text{2's-compl}} = c_k \oplus c_{k-1} = c_k\, c_{k-1}' \lor c_k'\, c_{k-1}$

[Figure: k-bit ripple-carry adder built of FA cells, with Overflow, Negative, and Zero outputs.]

Overflow occurs when two numbers of like sign are added and a result of the opposite sign is produced.
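
The two overflow expressions can be compared in a short sketch. Operands are k-bit two's-complement bit patterns held in nonnegative Python integers; the function names, and the definitions of the Negative flag as the sign bit of the sum and the Zero flag as "all sum bits are 0", are assumptions made for illustration.

```python
def add_with_flags(x, y, k):
    """Return (sum bits, overflow, negative, zero) for k-bit patterns x, y."""
    carries = [0]                                   # c_0 = 0
    s = 0
    for i in range(k):
        xi, yi, c = (x >> i) & 1, (y >> i) & 1, carries[-1]
        s |= (xi ^ yi ^ c) << i
        carries.append((xi & yi) | (xi & c) | (yi & c))
    overflow = carries[k] ^ carries[k - 1]          # c_k XOR c_{k-1}
    negative = (s >> (k - 1)) & 1                   # sign bit of the sum
    zero = int(s == 0)
    return s, overflow, negative, zero


def overflow_from_signs(x, y, s, k):
    """Overflow from the operand and result sign bits."""
    xs, ys, ss = (x >> (k - 1)) & 1, (y >> (k - 1)) & 1, (s >> (k - 1)) & 1
    return (xs & ys & (ss ^ 1)) | ((xs ^ 1) & (ys ^ 1) & ss)
```

For example, with k = 4 the call `add_with_flags(7, 1, 4)` adds +7 and +1 and reports overflow, since +8 is not representable in 4 bits.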

Binary Adders as Versatile Building Blocks (1/2)

Fig. 5.6 Four-bit binary adder used to realize the logic function f = w ∨ xyz and its complement.

[Figure: four FA cells (Bit 3 to Bit 0) fed with w, x, y, z and constants 0 and 1; the carries form xy, xyz, and w ∨ xyz, and the outputs are w ∨ xyz and (w ∨ xyz)′.]

Set one input to 0: c_out = AND of other inputs

Set one input to 1: c_out = OR of other inputs

Set one input to 0 and another to 1: s = NOT of third input

Binary Adders as Versatile Building Blocks (2/2)

Inputs         Outputs
x  y  c_in     c_out  s
0  0  0        0      0
0  0  1        0      1
0  1  0        0      1
0  1  1        1      0
1  0  0        0      1
1  0  1        1      0
1  1  0        1      0
1  1  1        1      1

[Figure: FA block with inputs x, y, c_in and outputs c_out, s.]

Example of Carry Propagation

Bit positions
       15 14 13 12   11 10  9  8    7  6  5  4    3  2  1  0
        1  0  1  1    0  1  1  0    0  1  1  0    1  1  1  0
cout    0  1  0  1    1  0  0  1    1  1  0  0    0  0  1  1    cin

Carry chains and their lengths (from left to right): 4, 6, 3, 2

Using Probability to Analyze Carry Propagation

Given binary numbers with random bits, for each position i we have

Probability of carry generation = ¼ (both 1s)
Probability of carry annihilation = ¼ (both 0s)
Probability of carry propagation = ½ (different)

Probability that carry generated at position i propagates through position j – 1 and stops at position j (j > i)

$2^{-(j-1-i)} \times 1/2 = 2^{-(j-i)}$

Expected length of the carry chain that starts at position i

$$\sum_{j=i+1}^{k-1} (j-i)\,2^{-(j-i)} + (k-i)\,2^{-(k-1-i)} = \sum_{l=1}^{k-1-i} l\,2^{-l} + (k-i)\,2^{-(k-1-i)}$$

$$= 2 - (k-i+1)\,2^{-(k-1-i)} + (k-i)\,2^{-(k-1-i)} = 2 - 2^{-(k-i-1)}$$

Because the carry definitely stops at position k, the term for k is not multiplied by ½.
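
The closed form can be checked against the defining sum with exact rational arithmetic. This is a sketch; the function names are illustrative.

```python
from fractions import Fraction


def expected_chain_length(i, k):
    """Expected length of a carry chain starting at position i (k-bit adder)."""
    total = Fraction(0)
    for j in range(i + 1, k):
        # Chain stops at position j < k with probability 2**-(j - i).
        total += (j - i) * Fraction(1, 2 ** (j - i))
    # The chain certainly stops at position k, so there is no extra factor 1/2.
    total += (k - i) * Fraction(1, 2 ** (k - 1 - i))
    return total


def closed_form(i, k):
    return 2 - Fraction(1, 2 ** (k - i - 1))
```

For positions far from the left end, the expected chain length is therefore just under 2 bit positions.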

Carry Completion Detection

[Figure: carry-completion detection logic. Each bit position i produces a dual-rail carry pair (b_{i+1}, c_{i+1}) and a per-position done signal d_{i+1}; these are combined with the signals from the other bit positions to form alldone. b = 1: No carry; c = 1: Carry.]

Dual rail coding

b_i  c_i
0    0     Carry not yet known
0    1     Carry known to be 1
1    0     Carry known to be 0

Self-Timed Adder

Self-Timed Adder with Parallel Carry Completion Sensing

Addition of a Constant: Counters

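[Figure: up (down) counter. A multiplexer, controlled by Count / Initialize′, selects between Data in (Load) and the output x + 1 (x − 1) of an incrementer (decrementer) that adds +1 (−1) to the count register contents x; the count register has Reset, Clear, Enable, and Clock inputs and drives Data out; c_out of the incrementer signals counter overflow.]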

Implementing a Simple Up Counter

[Figure: Four-bit asynchronous up counter built only of negative-edge-triggered T flip-flops; the Increment signal drives the first flip-flop and the Q outputs form count outputs 0 through 3.]

[Figure: Ripple-carry incrementer for use in an up counter, with inputs x_0 . . . x_{k-1}, carries c_1 . . . c_k, and outputs s_0 . . . s_{k-1}.]

Manchester Carry Chains and Adders

Sum digit in radix r:       $s_i = (x_i + y_i + c_i) \bmod r$
Special case of radix 2:    $s_i = x_i \oplus y_i \oplus c_i$

Computing the carries $c_i$ is thus our central problem. For this, the actual operand digits are not important. What matters is whether in a given position a carry is generated, propagated, or annihilated (absorbed).

For binary addition:
$g_i = x_i y_i$    $p_i = x_i \oplus y_i$    $a_i = x_i' y_i' = (x_i \lor y_i)'$

It is also helpful to define a transfer signal:
$t_i = g_i \lor p_i = a_i' = x_i \lor y_i$

Using these signals, the carry recurrence is written as
$c_{i+1} = g_i \lor c_i p_i = g_i \lor c_i g_i \lor c_i p_i = g_i \lor c_i t_i$
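
The equivalence of the propagate and transfer forms of the recurrence can be checked exhaustively for small widths. This is a sketch; `carry_vector` and its `use_transfer` flag are hypothetical names.

```python
def carry_vector(x, y, k, c0=0, use_transfer=False):
    """Return [c_0, c_1, ..., c_k] for k-bit operands x and y."""
    c = [c0]
    for i in range(k):
        xi, yi = (x >> i) & 1, (y >> i) & 1
        g = xi & yi                  # generate
        p = xi ^ yi                  # propagate
        t = xi | yi                  # transfer = g OR p
        c.append(g | (c[i] & (t if use_transfer else p)))
    return c
```

Using t instead of p gives the same carries because the extra case (both bits 1) already produces a carry through g.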

Manchester Carry Network

[Figure: one stage of a Manchester carry chain. (a) Conceptual representation: switches controlled by p_i, g_i, and a_i connect c_{i+1} to c_i, to Logic 1, or to Logic 0. (b) Possible CMOS realization: clocked stage between VDD and VSS operating on the complemented carries c_i′ and c_{i+1}′.]

The worst-case delay of a Manchester carry chain has three components:

1. Latency of forming the switch control signals
2. Set-up time for switches
3. Signal propagation delay through k switches

$g_i = x_i y_i$    $p_i = x_i \oplus y_i$

$c_{i+1} = g_i \lor c_i p_i$

Carry Network is the Essence of a Fast Adder

The main part of an adder is the carry network. The rest is just a set of gates to produce the g and p signals and the sum bits.

[Figure: the signals g_i, p_i of all bit positions (g_0, p_0 through g_{k-1}, p_{k-1}) and c_0 enter the carry network, which produces c_1 through c_k; each sum bit s_i is formed from x_i, y_i, and c_i.]

g_i  p_i   Carry is:
0    0     annihilated or killed
0    1     propagated
1    0     generated
1    1     (impossible)

$g_i = x_i y_i$    $p_i = x_i \oplus y_i$

Carry network types: Ripple; Skip; Lookahead; Parallel-prefix

Carry Propagation Network of a Ripple-Carry Adder

[Figure: ripple carry network. Position i combines g_i, p_i, and c_i into c_{i+1}, chaining from c_0 up to c_k.]

The carry recurrence: $c_{i+1} = g_i \lor p_i c_i$

Latency of k-bit adder is roughly 2k gate delays:

1 gate delay for production of p and g signals, plus
2(k – 1) gate delays for carry propagation, plus
1 XOR gate delay for generation of the sum bits

Carry-Lookahead Adders

Chapter Goals
Understand the carry-lookahead method and its many variations used in the design of fast adders

Chapter Highlights
Single- and multilevel carry lookahead
Various designs for log-time adders
Relating the carry determination problem to parallel prefix computation
Implementing fast adders in VLSI

Unrolling the Carry Recurrence

Recall the generate, propagate, annihilate (absorb), and transfer signals:

Signal   Radix r                              Binary
g_i      is 1 iff x_i + y_i ≥ r               x_i y_i
p_i      is 1 iff x_i + y_i = r – 1           x_i ⊕ y_i
a_i      is 1 iff x_i + y_i < r – 1           x_i′ y_i′ = (x_i ∨ y_i)′
t_i      is 1 iff x_i + y_i ≥ r – 1           x_i ∨ y_i
s_i      (x_i + y_i + c_i) mod r              x_i ⊕ y_i ⊕ c_i

The carry recurrence can be unrolled to obtain each carry signal directly from inputs, rather than through propagation

$c_i = g_{i-1} \lor c_{i-1} p_{i-1}$
$\;\; = g_{i-1} \lor (g_{i-2} \lor c_{i-2} p_{i-2}) p_{i-1}$
$\;\; = g_{i-1} \lor g_{i-2} p_{i-1} \lor c_{i-2} p_{i-2} p_{i-1}$
$\;\; = g_{i-1} \lor g_{i-2} p_{i-1} \lor g_{i-3} p_{i-2} p_{i-1} \lor c_{i-3} p_{i-3} p_{i-2} p_{i-1}$
$\;\; = g_{i-1} \lor g_{i-2} p_{i-1} \lor g_{i-3} p_{i-2} p_{i-1} \lor g_{i-4} p_{i-3} p_{i-2} p_{i-1} \lor c_{i-4} p_{i-4} p_{i-3} p_{i-2} p_{i-1}$
$\;\; = \dots$

where $p_j$ can be replaced with $t_j$.

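Four-Bit Carry-Lookahead Adder (1/2)

Complexity reduced by deriving the carry-out indirectly: $c_4 = g_3 \lor c_3 p_3$

[Figure: four-bit lookahead carry network with inputs g_0 . . . g_3, p_0 . . . p_3, c_0 and outputs c_1 . . . c_4.]

Full carry lookahead is quite practical for a 4-bit adder

$c_1 = g_0 \lor c_0 p_0$
$c_2 = g_1 \lor g_0 p_1 \lor c_0 p_0 p_1$
$c_3 = g_2 \lor g_1 p_2 \lor g_0 p_1 p_2 \lor c_0 p_0 p_1 p_2$
$c_4 = g_3 \lor g_2 p_3 \lor g_1 p_2 p_3 \lor g_0 p_1 p_2 p_3 \lor c_0 p_0 p_1 p_2 p_3$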
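
The four lookahead equations can be written out literally and checked against ordinary addition. This is a behavioral sketch; `cla4_add` is a hypothetical name.

```python
def cla4_add(x, y, c0=0):
    """4-bit addition with every carry computed directly from g, p, and c0."""
    g = [((x >> i) & (y >> i)) & 1 for i in range(4)]
    p = [((x >> i) ^ (y >> i)) & 1 for i in range(4)]
    c1 = g[0] | (c0 & p[0])
    c2 = g[1] | (g[0] & p[1]) | (c0 & p[0] & p[1])
    c3 = g[2] | (g[1] & p[2]) | (g[0] & p[1] & p[2]) | (c0 & p[0] & p[1] & p[2])
    c4 = (g[3] | (g[2] & p[3]) | (g[1] & p[2] & p[3])
          | (g[0] & p[1] & p[2] & p[3])
          | (c0 & p[0] & p[1] & p[2] & p[3]))
    carries = [c0, c1, c2, c3]
    s = sum((p[i] ^ carries[i]) << i for i in range(4))   # s_i = p_i XOR c_i
    return s, c4
```

No carry expression refers to another carry, so in hardware all four can be evaluated in parallel as two-level AND-OR logic.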

Four-Bit Carry-Lookahead Adder (2/2)

Source: Ercegovac and Lang, “Digital Arithmetic,” MKP

Carry Lookahead Beyond 4 Bits

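Consider a 32-bit adder

$c_1 = g_0 \lor c_0 p_0$
$c_2 = g_1 \lor g_0 p_1 \lor c_0 p_0 p_1$
$c_3 = g_2 \lor g_1 p_2 \lor g_0 p_1 p_2 \lor c_0 p_0 p_1 p_2$
. . .
$c_{31} = g_{30} \lor g_{29} p_{30} \lor g_{28} p_{29} p_{30} \lor g_{27} p_{28} p_{29} p_{30} \lor \dots \lor c_0 p_0 p_1 p_2 p_3 \dots p_{29} p_{30}$

The expression for $c_{31}$ needs a 32-input OR, and its last term needs a 32-input AND.
High fan-ins necessitate tree-structured circuits

For wide words, full carry lookahead is impractical.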

Two Schemes to Manage the Complexity

High-radix addition (i.e., radix $2^h$)
Increases the latency for generating g and p signals and sum digits, but simplifies the carry network (optimal radix?)

Multilevel lookahead

Example: 16-bit addition
Radix-16 (four digits)
Two-level carry lookahead (four 4-bit blocks)

Either way, the carries c4, c8, and c12 are determined first

c16    c15 c14 c13    c12    c11 c10 c9    c8    c7 c6 c5    c4    c3 c2 c1    c0
cout                   ?                   ?                 ?                 cin

One-Level Carry Lookahead Adder

Source: Ercegovac and Lang, “Digital Arithmetic”, p. 72.

Block Generate and Propagate Signals

$g_{[i,i+3]} = g_{i+3} \lor g_{i+2} p_{i+3} \lor g_{i+1} p_{i+2} p_{i+3} \lor g_i p_{i+1} p_{i+2} p_{i+3}$

$p_{[i,i+3]} = p_i p_{i+1} p_{i+2} p_{i+3}$

Note: the block signals are unrelated to $c_i$

[Figure: 4-bit lookahead carry generator with inputs c_i and (g_i, p_i) through (g_{i+3}, p_{i+3}), producing c_{i+1}, c_{i+2}, c_{i+3} and the block signals g[i,i+3], p[i,i+3].]

$c_k = g_{[0,k-1]} \lor c_0 p_{[0,k-1]}$

$c_{i+4} = g_{[i,i+3]} \lor c_i p_{[i,i+3]}$
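
A sketch of the block signals for a 16-bit adder split into four 4-bit blocks. For brevity the block carries are chained with a loop here instead of a second lookahead level, which is a simplification and not the two-level circuit itself; the function names are hypothetical.

```python
def block_signals(x, y, i=0):
    """Return (g[i,i+3], p[i,i+3]) for the 4-bit block starting at bit i."""
    g = [((x >> (i + j)) & (y >> (i + j))) & 1 for j in range(4)]
    p = [((x >> (i + j)) ^ (y >> (i + j))) & 1 for j in range(4)]
    g_blk = (g[3] | (g[2] & p[3]) | (g[1] & p[2] & p[3])
             | (g[0] & p[1] & p[2] & p[3]))
    p_blk = p[0] & p[1] & p[2] & p[3]
    return g_blk, p_blk


def block_carries_16(x, y, c0=0):
    """Return [c4, c8, c12, c16] using c_{i+4} = g[i,i+3] OR c_i p[i,i+3]."""
    c = c0
    out = []
    for i in (0, 4, 8, 12):
        g_blk, p_blk = block_signals(x, y, i)
        c = g_blk | (c & p_blk)
        out.append(c)
    return out
```

The block signals depend only on the operand bits of the block, which is what allows them to be formed before any carry-in is known.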

4-bit Lookahead Carry Generator

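[Figure: gate-level 4-bit lookahead carry generator with inputs c_i, g_i . . . g_{i+3}, and p_i . . . p_{i+3}. One part (Block Signal Generation) forms g[i,i+3] and p[i,i+3]; the other part (Intermediate Carries) forms c_{i+1}, c_{i+2}, and c_{i+3}.]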

A Two-Level Carry-Lookahead Adder (64 bits)

[Figure: 64-bit adder built of four 16-bit carry-lookahead adders. Inside each 16-bit CLA, a 4-bit lookahead carry generator takes the block signals g[0,3], p[0,3] through g[12,15], p[12,15] and produces c_4, c_8, and c_12. A second 4-bit lookahead carry generator takes g[0,15], p[0,15] through g[48,63], p[48,63], produces c_16, c_32, and c_48, and outputs g[0,63], p[0,63].]

c_4, c_8, and c_12 are the c_{i+1}, c_{i+2}, and c_{i+3}, respectively, of the previous slide.

$c_k = g_{[0,k-1]} \lor c_0 p_{[0,k-1]}$

Latency of a 16-bit 2-Level Carry-Lookahead Adder (1/2)

(Level 1) g and p for individual bit positions            1 gate level

(Level 1) g and p signals for 4-bit blocks,               2 gate levels
          i.e., g[0,3], p[0,3], . . . , g[12,15], p[12,15]

(Level 2) Block carry-in signals c4, c8, and c12;         2 gate levels
          g[0,15], p[0,15]

(Level 1) Internal carries within 4-bit blocks:           2 gate levels
          c1, c2, c3, c5, . . .
          (Level 2) C15 if required

(Level 1) Sum bits (XOR)                                  2 gate levels ???

Latency of a 16-bit 2-Level Carry-Lookahead Adder (2/2)

Total latency for the 16-bit adder is 9 gate levels
Each additional lookahead level adds 4 gate levels of latency (yellow block in last slide)

Latency for k-bit CLA adder: $4 \log_4 k + 1$ gate levels

Combining of g and p signals

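Combining of g and p signals of two (contiguous or overlapping) blocks B′ and B″ of arbitrary widths into the g and p signals for block B.

[Figure: Block B′ spans positions i_0 to j_0 with signals (g′, p′); Block B″ spans positions i_1 to j_1 with signals (g″, p″); the carry operator ¢ takes (g″, p″) and (g′, p′) and produces (g, p) for the combined Block B.]

$g = g'' \lor g' p''$    $p = p' p''$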

Formulating the Prefix Computation Problem

The problem of carry determination can be formulated as:

Given   (g_0, p_0)   (g_1, p_1)   . . .   (g_{k–2}, p_{k–2})   (g_{k–1}, p_{k–1})
Find    (g[0,0], p[0,0])   (g[0,1], p[0,1])   . . .   (g[0,k–2], p[0,k–2])   (g[0,k–1], p[0,k–1])
        c_1                c_2                . . .   c_{k–1}               c_k

Carry-in can be viewed as an extra (−1) position: (g_{–1}, p_{–1}) = (c_in, 0)

The desired pairs are found by evaluating all prefixes of
(g_0, p_0) ¢ (g_1, p_1) ¢ . . . ¢ (g_{k–2}, p_{k–2}) ¢ (g_{k–1}, p_{k–1})

The carry operator ¢ is associative, but not commutative
[(g_1, p_1) ¢ (g_2, p_2)] ¢ (g_3, p_3) = (g_1, p_1) ¢ [(g_2, p_2) ¢ (g_3, p_3)]

Prefix sums analogy:
Given   x_0   x_1       x_2           . . .   x_{k–1}
Find    x_0   x_0+x_1   x_0+x_1+x_2   . . .   x_0+x_1+...+x_{k–1}
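
The prefix formulation can be sketched with the carry operator written as a function on (g, p) pairs. The names `carry_op` and `all_carries` are illustrative, and the scan below is serial; it only demonstrates the algebra, not a parallel network.

```python
def carry_op(low, high):
    """(g', p') ¢ (g'', p''): low is the less significant block."""
    g1, p1 = low
    g2, p2 = high
    return g2 | (g1 & p2), p1 & p2


def all_carries(x, y, k, c_in=0):
    """Return [c_0, ..., c_k] as the g components of all prefixes."""
    pairs = [(c_in, 0)]                           # extra position -1
    pairs += [(((x >> i) & (y >> i)) & 1, ((x >> i) ^ (y >> i)) & 1)
              for i in range(k)]
    acc = pairs[0]
    carries = [acc[0]]                            # c_0 = c_in
    for pair in pairs[1:]:
        acc = carry_op(acc, pair)                 # prefix [-1, i]
        carries.append(acc[0])                    # its g component is c_{i+1}
    return carries
```

Associativity is what lets the prefixes be grouped in any tree shape; the lack of commutativity means the left-to-right order of the pairs must be preserved.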

Prefix-Based Carry Network

[Figure: a four-input prefix sums network built of adders (+), shown next to a four-bit carry lookahead network with the same structure built of carry operators (¢); the scan order is marked.]

Inputs:   (g_0, p_0)   (g_1, p_1)   (g_2, p_2)   (g_3, p_3)

Outputs:
g[0,0], p[0,0] = (c_1, --)
g[0,1], p[0,1] = (c_2, --)
g[0,2], p[0,2] = (c_3, --)
g[0,3], p[0,3] = (c_4, --)


Parallel Prefix Sums Network Built of Two k/2-Input Networks and k/2 Adders (Ladner-Fischer)

Delay recurrence: $D(k) = D(k/2) + 1 = \log_2 k$
Cost recurrence: $C(k) = 2C(k/2) + k/2 = (k/2)\log_2 k$
Incurs large fanout

[Figure: recursive dividing. Two k/2-input prefix sums networks handle inputs $x_{k-1} \dots x_{k/2}$ and $x_{k/2-1} \dots x_0$; the lower network yields $s_{k/2-1} \dots s_0$ directly, and k/2 adders combine $s_{k/2-1}$ with the outputs of the upper network to form $s_{k-1} \dots s_{k/2}$.]
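The recursive division can be mimicked in a few lines to check the two recurrences. This is only a sketch under assumptions: the function name `ladner_fischer` is hypothetical, the input length is a power of two, and ordinary integer addition stands in for the adder (or ¢) cell.

```python
from itertools import accumulate
from operator import add

def ladner_fischer(xs, op):
    """Return (prefixes, cell_count, depth) for a power-of-two-length input."""
    k = len(xs)
    if k == 1:
        return list(xs), 0, 0
    low, c_low, d_low = ladner_fischer(xs[:k // 2], op)
    high, c_high, d_high = ladner_fischer(xs[k // 2:], op)
    # k/2 cells combine the last low prefix with every high prefix;
    # that single signal drives k/2 cells, which is the large fanout.
    merged = low + [op(low[-1], h) for h in high]
    return merged, c_low + c_high + k // 2, max(d_low, d_high) + 1

prefixes, cost, depth = ladner_fischer(list(range(1, 17)), add)
print(cost, depth)  # 32 4, i.e. (k/2) log2 k cells and log2 k levels for k = 16
```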


a is t in the textbook

Source: Ercegovac and Lang, “Digital Arithmetic”, pp.81


Eliminate Large Fanout

Increase the number of levels
Increase the number of cells


The Brent-Kung Recursive Construction

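Parallel prefix sums network built of one k/2-input network and k – 1 adders.

Delay recurrence: $D(k) = D(k/2) + 2 = 2\log_2 k - 1$ (–2 really)
Cost recurrence: $C(k) = C(k/2) + k - 1 = 2k - 2 - \log_2 k$

[Figure: inputs $x_{k-1}, x_{k-2}, \dots, x_3, x_2, x_1, x_0$ are added in adjacent pairs, the pair sums feed a k/2-input prefix sums network, and a final row of adders completes the outputs $s_{k-1}, s_{k-2}, \dots, s_3, s_2, s_1, s_0$.]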
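To see where the "–2 really" comes from, the construction can be simulated while tracking the level at which each signal becomes available. The sketch below is an assumption-laden model rather than the textbook's circuit: `brent_kung` is a hypothetical name, signals are (value, level) pairs, the input length is a power of two, and integer addition stands in for the cell operation.

```python
from itertools import accumulate
from operator import add

def brent_kung(xs, op):
    """xs: list of (value, level) signals. Returns (prefix signals, cell count)."""
    k = len(xs)
    if k == 1:
        return list(xs), 0

    def cell(a, b):
        # One adder (or carry-operator) cell: output is one level after its inputs.
        return (op(a[0], b[0]), max(a[1], b[1]) + 1)

    pairs = [cell(xs[i], xs[i + 1]) for i in range(0, k, 2)]  # k/2 cells
    odd, cost = brent_kung(pairs, op)                         # s1, s3, s5, ...
    out = [xs[0]]                                             # s0 = x0
    for i in range(1, k):
        if i % 2 == 1:
            out.append(odd[i // 2])
        else:
            out.append(cell(odd[i // 2 - 1], xs[i]))          # k/2 - 1 cells
    return out, cost + k - 1

signals = [(v, 0) for v in range(1, 17)]
out, cost = brent_kung(signals, add)
print(cost, max(level for _, level in out))  # 26 6
```

For k = 16 this gives 26 cells (2k – 2 – log2 k) and 6 levels (2 log2 k – 2); the last output of the inner network is ready early, so the final adder row does not always add a full level.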


Brent-Kung Carry Network (8-Bit Adder)

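[Figure: Brent-Kung carry network for an 8-bit adder. The inputs are the (g, p) pairs for [7, 7], [6, 6], . . . , [0, 0]; ¢ cells form the intermediate blocks [6, 7], [4, 5], [2, 3], [0, 1], then [4, 7] and [0, 3]; the outputs are the pairs for [0, 7], [0, 6], . . . , [0, 0]. An inset shows one ¢ cell combining (g[1,1], p[1,1]) and (g[0,0], p[0,0]) into (g[0,1], p[0,1]).]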


Source: Ercegovac and Lang, “Digital Arithmetic”, pp.83


Brent-Kung Carry Network (16-Bit Adder)

[Figure: Brent-Kung parallel prefix graph with inputs x0 . . . x15, outputs s0 . . . s15, and levels numbered 1 through 6.]

Reason for latency being $2\log_2 k - 2$: for k = 16 the graph has 6 levels.


Kogge-Stone Carry Network (16-Bit Adder)

[Figure: Kogge-Stone parallel prefix graph with inputs x0 . . . x15 and outputs s0 . . . s15.]

$\log_2 k$ levels (minimum possible)

Cost formula:
$C(k) = (k - 1) + (k - 2) + (k - 4) + \dots + (k - k/2) = k\log_2 k - k + 1$
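The cost formula can be checked by building the levels one at a time. A minimal sketch, assuming a power-of-two input length, the hypothetical name `kogge_stone`, and integer addition in place of the carry operator:

```python
from itertools import accumulate
from operator import add

def kogge_stone(xs, op):
    """Return (prefixes, cell_count, levels) for a Kogge-Stone style network."""
    k = len(xs)
    cur, cost, levels, d = list(xs), 0, 0, 1
    while d < k:
        nxt = list(cur)
        for i in range(d, k):
            nxt[i] = op(cur[i - d], cur[i])  # combine with the value d positions lower
        cost += k - d                        # this level uses k - d cells
        cur, levels, d = nxt, levels + 1, 2 * d
    return cur, cost, levels

prefixes, cost, levels = kogge_stone(list(range(1, 17)), add)
print(cost, levels)  # 49 4, i.e. k log2 k - k + 1 cells in log2 k levels for k = 16
```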


Source: Ercegovac and Lang, “Digital Arithmetic”, pp.84


Speed-Cost Tradeoffs in Carry Networks

Method           Delay              Cost
Ladner-Fischer   $\log_2 k$         $(k/2)\log_2 k$
Kogge-Stone      $\log_2 k$         $k\log_2 k - k + 1$
Brent-Kung       $2\log_2 k - 2$    $2k - 2 - \log_2 k$

Improving the Ladner/Fischer design

[Figure: the Ladner-Fischer construction again, with two k/2-input prefix sums networks and a row of adders forming the upper outputs; an annotation points at some of the outputs.]

These outputs can be produced one time unit later without increasing the overall latency

This strategy saves enough to make the overall cost linear (best possible)


Hybrid B-K/K-S Carry Network (16-Bit Adder)

[Figure: three 16-input parallel prefix graphs with inputs x0 . . . x15 and outputs s0 . . . s15: a Brent-Kung network (levels 1 through 6), a Kogge-Stone network, and a hybrid whose stages are labeled Brent-Kung, Kogge-Stone, Brent-Kung.]

Network       Levels   Cells
Brent-Kung    6        26
Kogge-Stone   4        49
Hybrid        5        32


Four-Bit Manchester Carry Chains (Transistor Level)

[Figure: two four-bit Manchester carry-chain circuits, (a) and (b), at the transistor level. Both take g0 through g3 and p0 through p3 together with the signal PH2 and produce g[0,3] and p[0,3]; circuit (b) also produces g[0,1], p[0,1], g[0,2], and p[0,2].]


Variations in Fast Adders

Chapter Goals
Study alternatives to the carry-lookahead method for designing fast adders

Chapter Highlights
Many methods besides CLA are available (both competing and complementary)
Best design is technology-dependent (often hybrid rather than pure)
Knowledge of timing allows optimizations


Simple Carry-Skip Adders

[Figure: (a) Ripple-carry adder: 4-bit blocks of ripple-carry stages (bits 3, 2, 1, 0 within each block) with carries c16, c12, c8, c4, c0. (b) Simple carry-skip adder: the same 4-bit blocks, each with skip logic (2 gates) controlled by the block propagate signals p[12,15], p[8,11], p[4,7], p[0,3].]


Carry-Skip Adder Using MUX

Source: Ercegovac and Lang, “Digital Arithmetic”, pp.66.


Another View of Carry-Skip Addition

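Street/freeway analogy for carry-skip adder.

[Figure: one 4-bit block with g and p signals at positions 4j, 4j+1, 4j+2, 4j+3 and carries c4j through c4j+4. The ripple path through c4j+1, c4j+2, c4j+3 is the one-way street; the skip path from c4j to c4j+4 is the freeway.]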


Carry-Skip Adder with Fixed Block Size
Block width b; k/b blocks to form a k-bit adder (assume b divides k)

$T_{\text{fixed-skip-add}} = (b - 1) + 0.5 + (k/b - 2) + (b - 1)$
  (b – 1): ripple in block 0; 0.5: OR gate; (k/b – 2): skips; (b – 1): ripple in last block
$\approx 2b + k/b - 3.5$ stages (1 stage = 2 gate levels)

$dT/db = 2 - k/b^2 = 0 \Rightarrow b^{\text{opt}} = \sqrt{k/2}$

$T^{\text{opt}} = 2\sqrt{2k} - 3.5$

Example: k = 32, b opt = 4, T opt = 12.5 stages (contrast with 32 stages for a ripple-carry adder)
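The optimum block width can also be found by brute force over the divisors of k, which is a useful cross-check on the calculus. A small sketch, with hypothetical function names and delays measured in the same "stages" as the slide:

```python
import math

def t_fixed_skip(k, b):
    """Worst-case delay in stages: ripple in block 0, OR gate, skips, ripple in last block."""
    return (b - 1) + 0.5 + (k / b - 2) + (b - 1)

def best_block_width(k):
    """Block width (among the divisors of k) that minimizes the delay."""
    divisors = [b for b in range(1, k + 1) if k % b == 0]
    return min(divisors, key=lambda b: t_fixed_skip(k, b))

k = 32
b = best_block_width(k)
print(b, t_fixed_skip(k, b))       # 4 12.5
print(2 * math.sqrt(2 * k) - 3.5)  # 12.5, the closed-form optimum
```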


Worst Case Delay

Source: Ercegovac and Lang, “Digital Arithmetic”, pp.67-68.


Worst case in block 0: 1111 + 0001, with C0 = 0
Worst case in last block: 0111 + 0000, with C12 = 1


Carry-Skip Adder with Variable-Width Blocks (1/2)

[Figure: blocks t–1, t–2, . . . , 1, 0 with block widths $b_{t-1}, b_{t-2}, \dots, b_1, b_0$, each with a ripple path and a skip path; three carry paths are marked (1), (2), and (3).]

Carry path (2) goes through one fewer skip than (1), so block t–2 can be one bit wider than block t–1 without increasing the total delay.

Carry path (3) goes through one fewer skip than (1), so block 1 can be one bit wider than block 0 without increasing the total delay.


Carry-Skip Adder with Variable-Width Blocks (2/2)

The total number of bits in the t blocks is k:

$2[b + (b + 1) + \dots + (b + t/2 - 1)] = t(b + t/4 - 1/2) = k$

$b = k/t - t/4 + 1/2$

$T_{\text{var-skip-add}} = 2(b - 1) + 0.5 + t - 2 = 2k/t + t/2 - 2.5$

$dT/dt = -2k/t^2 + 1/2 = 0 \Rightarrow t^{\text{opt}} = 2\sqrt{k}$

$T^{\text{opt}} = 2\sqrt{k} - 2.5$ (a factor of $\sqrt{2}$ smaller than for fixed-block)

Let b = 1 (substituting $t^{\text{opt}}$ into the expression for b gives b = 1/2)
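A quick numeric check of the width sum and the optimum delay. This is only an illustrative sketch: the function names are hypothetical, t is assumed even, and the widths are listed symmetrically as b, b+1, . . . , b+t/2–1, b+t/2–1, . . . , b.

```python
import math

def block_widths(b, t):
    """Widths of t blocks that grow by one bit toward the middle of the adder."""
    half = [b + i for i in range(t // 2)]
    return half + half[::-1]

def t_var_skip(k, t):
    """Delay in stages after substituting b = k/t - t/4 + 1/2."""
    return 2 * k / t + t / 2 - 2.5

widths = block_widths(1, 8)
print(widths, sum(widths))  # [1, 2, 3, 4, 4, 3, 2, 1] 20
print(t_var_skip(16, 8))    # 5.5, equal to 2*sqrt(16) - 2.5 at t = 2*sqrt(16)
```

With b = 1 and t = 8 the blocks cover 20 bits, slightly more than the k = 16 for which t = 8 is optimal, because the formula's b = 1/2 has been rounded up.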


Multilevel Carry-Skip Adders

[Figure: three schematic diagrams of multilevel carry-skip adders between c_out and c_in, built from level-1 skip blocks S1 and level-2 skip blocks S2.]


Single-Level Carry-Skip Adder (Example 7.1)
Assumptions: Each of the following takes one unit of time: generation of $g_i$ and $p_i$, generation of level-i skip signal from level-(i–1) skip signals, ripple, skip, and formation of sum bit once the incoming carry is known

Build the widest possible one-level carry-skip adder with total delay of 8

[Figure: blocks b6 . . . b0 between c_out and c_in, each with level-1 skip logic S1, annotated with signal arrival times.]

Stage b0 takes 2 time units: one for generating g and p and the other for generating the carry.

Stage b1 cannot be more than 3 bits, because its output is available at time 3, so it can take one time unit for generating g and p and two for propagation across 2 bits.

At the right end, block width is limited by the output timing requirement.


Generalization of Example 7.1 for total time T (even or odd)
T even: 1 2 3 . . . T/2 T/2 . . . 4 3 1
T odd: 1 2 3 . . . (T + 1)/2 . . . 4 3 1

Thus, for any T, the total width is $\lfloor (T + 1)^2/4 \rfloor - 2$

Stage b4 cannot be more than 3 bits, because its input becomes available at time 5 and the total adder delay is to be 8 units.

Max adder width = 18 (1 + 2 + 3 + 4 + 4 + 3 + 1)

At the left end, block width is limited by input timing.
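The width pattern and the closed form can be checked against each other. The sketch below uses a hypothetical helper `skip_block_widths` that simply writes out the pattern 1 2 3 . . . 4 3 1 for a given T; it assumes T ≥ 4, the smallest value for which the pattern as written makes sense.

```python
def skip_block_widths(T):
    """Block widths, listed as in the pattern above, for total delay T >= 4."""
    peak = (T + 1) // 2                   # T/2 for even T, (T + 1)/2 for odd T
    rising = list(range(1, peak + 1))     # 1 2 3 ... peak
    start = T // 2                        # repeat the peak only when T is even
    falling = list(range(start, 2, -1)) + [1]  # ... 4 3 1 (width 2 is skipped)
    return rising + falling

print(skip_block_widths(8), sum(skip_block_widths(8)))  # [1, 2, 3, 4, 4, 3, 1] 18
```

For T = 8 this reproduces the maximum adder width of 18 bits, and the sum agrees with ⌊(T + 1)²/4⌋ – 2 for larger T as well.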


Two-Level Carry-Skip Adder (1/2)

Given the delay pair {β, α} for a level-2 block in Fig. 7.7a, the number of level-1 blocks that can be accommodated is $\gamma = \min(\beta - 1, \alpha)$

Example 7.2

Width of the ith level-1 block in the level-2 block characterized by {β, α} is $b_i = \min(\beta - \gamma + i + 1,\ \alpha - i)$; the total block width is then $\sum_{i=0}^{\gamma-1} b_i$

[Figure: two single-level carry-skip adders built of level-1 blocks with S1 skip logic, one with $T_{\text{produce}} = \beta$ and one with $T_{\text{assimilate}} = \alpha$.]


Two-Level Carry-Skip Adder (2/2)

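Max adder width = 30 (4 + 8 + 8 + 6 + 3 + 1)

Level-2 blocks, from c_out (left) to c_in (right), with their delay pairs {Tproduce, Tassimilate}:

Block   F        E        D        C        B        A
Pair    {8, 1}   {7, 2}   {6, 3}   {5, 4}   {4, 5}   {3, 8}

[Figure: (a) the six level-2 blocks with S2 skip logic between c_out and c_in; a second panel shows the level-1 blocks inside blocks F through A, with carry timing from t = 0 at c_in to t = 8 at c_out.]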


Carry-Skip Adder Optimization Scheme

[Figure: a block of b full-adder units with its inputs and a level-h skip, labeled with the quantities I(b), A(b), G(b), $E_h(b)$, and $S_h(b)$.]


Carry-Select Adders

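$C_{\text{select-add}}(k) = 3C_{\text{add}}(k/2) + k/2 + 1$

$T_{\text{select-add}}(k) = T_{\text{add}}(k/2) + 1$

[Figure: Carry-select adder for k-bit numbers built from three k/2-bit adders. One adder handles the low k/2 bits (positions 0 to k/2 – 1) with c_in; two adders handle the high k/2 bits (positions k/2 to k – 1) with carry-in 0 and 1; a (k/2 + 1)-bit mux controlled by $c_{k/2}$ selects the high k/2 bits and c_out.]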
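A behavioral model makes the selection step concrete. This is a sketch under assumptions: `carry_select_add` is a hypothetical name, k is even, and Python's integer addition stands in for each k/2-bit adder.

```python
def carry_select_add(x, y, k, cin=0):
    """Add two k-bit numbers (k even); returns (k-bit sum, carry-out)."""
    half = k // 2
    mask = (1 << half) - 1
    low = (x & mask) + (y & mask) + cin    # low k/2-bit adder
    high0 = (x >> half) + (y >> half)      # high adder assuming c_{k/2} = 0
    high1 = (x >> half) + (y >> half) + 1  # high adder assuming c_{k/2} = 1
    c_half = low >> half                   # actual carry out of the low half
    high = high1 if c_half else high0      # (k/2 + 1)-bit multiplexer
    return ((high & mask) << half) | (low & mask), high >> half

print(carry_select_add(0b101101, 0b011011, 6))  # (8, 1): 45 + 27 = 72 = 64 + 8
```

The two high-half results are computed in parallel with the low half, so only the multiplexer delay is added once $c_{k/2}$ is known.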


Two-level Carry-Select Adder Built of k/4-bit adders

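[Figure: two-level carry-select adder built of k/4-bit adders. The low k/4 bits (positions 0 to k/4 – 1) use one adder with c_in; the middle k/4 bits (positions k/4 to k/2 – 1) use adders with carry-in 0 and 1 and a (k/4 + 1)-bit mux controlled by $c_{k/4}$; the high k/2 bits (positions k/2 to 3k/4 – 1 and 3k/4 to k – 1) form a k/2-bit conditional-sum block of k/4-bit adders and muxes, and a (k/2 + 1)-bit mux controlled by $c_{k/2}$ selects the high bits and c_out.]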


Conditional Adder

Source: Ercegovac and Lang, “Digital Arithmetic”, pp.86


Carry Select Adder

Source: Ercegovac and Lang, “Digital Arithmetic”, pp.87


Conditional Sum Adder

Source: Ercegovac and Lang, “Digital Arithmetic”, pp.87


16-Bit Conditional Sum Adder

The same as Fig. 7.20 in textbook
Source: Ercegovac and Lang, “Digital Arithmetic”, pp.89


Conditional-Sum Adder
Multilevel carry-select idea carried out to the extreme (to 1-bit blocks).

$C(k) \approx 2C(k/2) + k + 2 \approx k(\log_2 k + 2) + k\,C(1)$

$T(k) = T(k/2) + 1 = \log_2 k + T(1)$

where C(1) and T(1) are the cost and delay of the following circuit for deriving the sum and carry bits with a carry-in of 0 and 1

[Figure: circuit taking $x_i$ and $y_i$ and producing $s_i$ and $c_{i+1}$ for $c_i = 0$ and for $c_i = 1$.]

k + 2 is an upper bound on the number of single-bit 2-to-1 multiplexers needed for combining two k/2-bit adders into a k-bit adder
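The recursion down to 1-bit blocks can be written directly. The following is a behavioral sketch, not a gate-level design: `cond_sum` is a hypothetical name, k is assumed to be a power of two, and the operands are assumed to fit in k bits.

```python
def cond_sum(x, y, k):
    """Return [(sum, cout) for cin = 0, (sum, cout) for cin = 1]."""
    if k == 1:
        # 1-bit block: sum and carry for both possible carry-in values.
        return [(x ^ y, x & y), ((x ^ y) ^ 1, x | y)]
    half = k // 2
    mask = (1 << half) - 1
    low = cond_sum(x & mask, y & mask, half)
    high = cond_sum(x >> half, y >> half, half)
    result = []
    for cin in (0, 1):
        low_sum, c_half = low[cin]
        high_sum, cout = high[c_half]  # multiplexers select on the low carry-out
        result.append(((high_sum << half) | low_sum, cout))
    return result

print(cond_sum(0b1011, 0b0110, 4))  # [(1, 1), (2, 1)]: 11 + 6 = 17, 11 + 6 + 1 = 18
```

Each recursion level adds one multiplexer delay, which matches T(k) = T(k/2) + 1.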


A Hybrid Carry-Lookahead/Carry-Select Adder

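The most popular hybrid addition scheme:

[Figure: a lookahead carry generator receives the block g, p signals and c_in; the carries it produces control the multiplexers of the carry-select blocks, each of which computes its results for carry-in 0 and 1; the final carry is c_out.]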


Summary

Source: Ercegovac and Lang, “Digital Arithmetic”, pp.114.


A Hybrid Ripple-Carry/Carry-Lookahead Design

Any Two Addition Schemes Can Be Combined
Other possibilities: hybrid carry-select/ripple-carry, hybrid ripple-carry/carry-select, . . .

[Figure: 16-bit carry-lookahead adders (with carry-out) connected through the carries c16, c32, and c48; each contains a 4-bit lookahead carry generator that takes g[0,3], p[0,3] through g[12,15], p[12,15] with c0 and produces c4, c8, c12, and c16.]


Optimizations in Fast Adders

What looks best at the block diagram or gate level may not be best when a circuit-level design is generated (effects of wire length, signal loading, . . . )

Modern practice: Optimization at the transistor level

Variable-block carry-lookahead adder

Optimizations for average or peak power consumption

Timing-based optimizations (next slide)


Multioperand Addition

Chapter Goals
Learn methods for speeding up the addition of several numbers (needed for multiplication or inner-product)

Chapter Highlights
Running total kept in redundant form
Current total + Next number → New total
Deferred carry assimilation
Wallace/Dadda trees and parallel counters


Some Applications of Multioperand Addition

[Figure: Multioperand addition problems for multiplication or inner-product computation in dot notation. Left: a 4-bit multiplicand a and multiplier x give four shifted partial products $x_0 a 2^0$, $x_1 a 2^1$, $x_2 a 2^2$, $x_3 a 2^3$, which add up to the 8-bit product p. Right: seven operands $p^{(0)}$ through $p^{(6)}$ add up to the sum s.]


Serial Implementation with One Adder

$T_{\text{serial-multi-add}} = O(n \log(k + \log n)) = O(n \log k + n \log\log n)$

Therefore, addition time grows superlinearly with n when k is fixed and logarithmically with k for a given n

[Figure: a single adder adds the k-bit operand $x^{(i)}$ to the (k + log2 n)-bit partial sum register, which holds $\sum_{j=0}^{i-1} x^{(j)}$.]


Pipelined Adder

Source: Ercegovac and Lang, “Digital Arithmetic”, pp.166.


Parallel Implementation as Tree of Adders

[Figure: Adding 7 numbers in a binary tree of adders, with $\lceil\log_2 n\rceil$ adder levels and n – 1 adders; operand widths grow from k at the inputs to k + 1, k + 2, and k + 3 at successive levels.]

$T_{\text{tree-fast-multi-add}} = O(\log k + \log(k + 1) + \dots + \log(k + \lceil\log_2 n\rceil - 1)) = O(\log n \log k + \log n \log\log n)$

$T_{\text{tree-ripple-multi-add}} = O(k + \log n)$ [Justified on the next slide]


Elaboration on Tree of Ripple-Carry Adders

$T_{\text{tree-ripple-multi-add}} = O(k + \log n)$

Fig. 8.5 Ripple-carry adders at levels i and i + 1 in the tree of adders used for multi-operand addition.

[Figure: chains of HA and FA cells at level i and level i+1, annotated with signal times t, t+1, t+2, and t+3, next to the binary tree of adders with widths k through k+3.]

The absolute best latency that we can hope for is O(log k + log n)

There are kn data bits to process and using any set of computation elements with constant fan-in, this requires O(log(kn)) time

We will see shortly that carry-save adders achieve this optimum time


Carry-Save Adders

[Figure: cutting the carry chain of a ripple-carry adder (a carry-propagate adder with c_in and c_out) turns its row of full-adders into a carry-save adder (CSA), also called a (3; 2)-counter or 3-to-2 reduction circuit. Both are also shown in dot notation.]

[Figure: Specifying full- and half-adder blocks, with their inputs and outputs, in dot notation.]


Example of CSA

Also considered as reduction by column [3:2].

[p:q] counter: takes p bits of the same weight and produces q bits of adjacent weights.

Reduction by row: (3:2) counter


Use Dot Notation

[Figure: a carry-propagate adder (with c_in and c_out) and a carry-save adder (CSA), also called a (3; 2)-counter or 3-to-2 reduction circuit, drawn in dot notation.]


Multioperand Addition Using Carry-Save Adders

[Figure: Tree of carry-save adders reducing seven numbers to two, followed by a carry-propagate adder.]

$T_{\text{carry-save-multi-add}} = O(\text{tree height} + T_{\text{CPA}}) = O(\log n + \log k)$

$C_{\text{carry-save-multi-add}} = (n - 2)C_{\text{CSA}} + C_{\text{CPA}}$

[Figure: Serial carry-save addition using a single CSA: the input and the sum and carry registers feed the CSA, and a CPA produces the output.]
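Carry-save reduction is easy to model on whole words with bitwise operations. The sketch below is illustrative only: `csa` and `multi_add` are hypothetical names, operands are non-negative Python integers of unbounded width, and the built-in `sum` plays the role of the final carry-propagate adder.

```python
def csa(a, b, c):
    """(3; 2) reduction on whole words: returns (sum word, carry word)."""
    sum_word = a ^ b ^ c
    carry_word = ((a & b) | (a & c) | (b & c)) << 1  # carries move one position left
    return sum_word, carry_word

def multi_add(operands):
    """Reduce the operands to two with CSA levels, then add them.

    Returns (total, number of CSA levels, number of CSAs used).
    """
    ops, levels, count = list(operands), 0, 0
    while len(ops) > 2:
        groups = len(ops) // 3
        nxt = []
        for g in range(groups):
            nxt.extend(csa(*ops[3 * g:3 * g + 3]))
        nxt.extend(ops[3 * groups:])  # operands left over at this level
        ops, levels, count = nxt, levels + 1, count + groups
    return sum(ops), levels, count

nums = [13, 7, 61, 2, 33, 50, 19]
print(multi_add(nums))  # (185, 4, 5): n - 2 = 5 CSAs in 4 levels for n = 7
```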


Reduction by a CSA Tree

Addition of seven 6-bit numbers in dot notation:
12 FAs, then 6 FAs, then 6 FAs, then 4 FAs + 1 HA, then a 7-bit adder
Total cost = 7-bit adder + 28 FAs + 1 HA

Representing a seven-operand addition in tabular form:

Bit position   8  7  6  5  4  3  2  1  0
                        7  7  7  7  7  7    6×2 = 12 FAs
                     2  5  5  5  5  5  3    6 FAs
                     3  4  4  4  4  4  1    6 FAs
                  1  2  3  3  3  3  2  1    4 FAs + 1 HA
                  2  2  2  2  2  1  2  1    7-bit adder
               --Carry-propagate adder--
               1  1  1  1  1  1  1  1  1

A full-adder compacts 3 dots into 2 (compression ratio of 1.5)

A half-adder rearranges 2 dots (no compression, but still useful)


Width of Adders in a CSA Tree
Adding seven k-bit numbers and the CSA/CPA widths required.

Due to the gradual retirement (dropping out) of some of the result bits, CSA widths do not vary much as we go down the tree levels

The index pair [i, j] means that bit positions from i up to j are involved.

[Figure: a tree of five k-bit CSAs and a k-bit CPA. The seven inputs occupy bit positions [0, k–1]; intermediate sum and carry vectors occupy [0, k–1], [1, k], and [2, k+1]; the final result covers positions 0 through k+2.]

Bit k+1 does not involve addition


Wallace and Dadda Trees

$h(n) = 1 + h(\lceil 2n/3 \rceil)$

$n(h) = \lfloor 3n(h - 1)/2 \rfloor$

$2 \times 1.5^{h-1} < n(h) \le 2 \times 1.5^{h}$

[Figure: a CSA tree with n inputs, 2 outputs, and h levels.]

Table 8.1 The maximum number n(h) of inputs for an h-level CSA tree

h    n(h)     h    n(h)     h    n(h)
0    2        7    28       14   474
1    3        8    42       15   711
2    4        9    63       16   1066
3    6        10   94       17   1599
4    9        11   141      18   2398
5    13       12   211      19   3597
6    19       13   316      20   5395

n(h): Maximum number of inputs for h levels
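The table follows directly from the recurrence for n(h), starting from n(0) = 2. A short sketch with hypothetical function names `max_inputs` and `min_levels`:

```python
def max_inputs(h):
    """n(h): the largest number of operands an h-level CSA tree can reduce to two."""
    n = 2
    for _ in range(h):
        n = 3 * n // 2  # n(h) = floor(3 n(h - 1) / 2)
    return n

def min_levels(n):
    """h(n): the smallest number of CSA levels that can handle n operands."""
    h = 0
    while max_inputs(h) < n:
        h += 1
    return h

print([max_inputs(h) for h in range(7)])  # [2, 3, 4, 6, 9, 13, 19]
print(min_levels(7))                      # 4
```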


Wallace and Dadda Reduction Trees

Wallace tree: Reduce the number of operands at the earliest possible opportunity

Dadda tree: Postpone the reduction to the extent possible without causing added delay

Addition of seven 6-bit numbers using Wallace’s strategy:
12 FAs, 6 FAs, 6 FAs, 4 FAs + 1 HA, 7-bit adder
Total cost = 7-bit adder + 28 FAs + 1 HA

Adding seven 6-bit numbers using Dadda’s strategy:
6 FAs, 11 FAs, 7 FAs, 4 FAs + 1 HA, 7-bit adder
Total cost = 7-bit adder + 28 FAs + 1 HA

h      2   3   4   5    6
n(h)   4   6   9   13   19


A Small Optimization in Reduction Trees

Adding seven 6-bit numbers using Dadda’s strategy:
6 FAs, 11 FAs, 7 FAs, 4 FAs + 1 HA, 7-bit adder
Total cost = 7-bit adder + 28 FAs + 1 HA

Adding seven 6-bit numbers taking advantage of the final adder’s carry-in:
6 FAs, 11 FAs, 6 FAs + 1 HA, 3 FAs + 2 HA, 7-bit adder
Total cost = 7-bit adder + 26 FAs + 3 HA


Parallel Counters

1-bit full-adder = (3; 2)-counter

Circuit reducing 7 bits to their 3-bit sum = (7; 3)-counter

Circuit reducing n bits to their $\lceil\log_2(n + 1)\rceil$-bit sum = $(n; \lceil\log_2(n + 1)\rceil)$-counter

[Figure: A 10-input parallel counter, also known as a (10; 4)-counter, built of full-adders, half-adders, and a 3-bit ripple-carry adder.]
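As a concrete instance, a (7; 3)-counter can be assembled from four full-adders. The arrangement below is one possible wiring chosen for illustration, not necessarily the one drawn on the slide, and the function names are hypothetical; the final line checks it exhaustively.

```python
from itertools import product

def full_adder(a, b, c):
    """(3; 2)-counter: returns (sum bit, carry bit)."""
    return a ^ b ^ c, (a & b) | (a & c) | (b & c)

def counter_7_3(bits):
    """Count the ones among 7 input bits using four full-adders."""
    s_a, c_a = full_adder(bits[0], bits[1], bits[2])
    s_b, c_b = full_adder(bits[3], bits[4], bits[5])
    out0, c_c = full_adder(s_a, s_b, bits[6])  # weight-1 column
    out1, out2 = full_adder(c_a, c_b, c_c)     # weight-2 column
    return out0 + 2 * out1 + 4 * out2

assert all(counter_7_3(b) == sum(b) for b in product((0, 1), repeat=7))
```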


Implementation of [4:2] Counter

Source: Ercegovac and Lang, “Digital Arithmetic”, pp.145.


Implementation of [5:2] Counter

Source: Ercegovac and Lang, “Digital Arithmetic”, pp.146.


Implementation of [7:2] Counter

Source: Ercegovac and Lang, “Digital Arithmetic”, pp.146.


Generalized Parallel Counters

[Figure: Dot notation for a (5, 5; 4)-counter and the use of such counters for reducing five numbers to two numbers (multicolumn reduction).]

[Figure: a (2, 3; 3)-counter, an example with unequal columns.]

Generalized parallel counter = parallel compressor


A General Strategy for Column Compression

n + ψ1 + ψ2 + ψ3 + . . . ≤ 3 + 2ψ1 + 4ψ2 + 8ψ3 + . . .

n – 3 ≤ ψ1 + 3ψ2 + 7ψ3 + . . .

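(n; 2)-counters

[Figure: one circuit slice of an (n; 2)-counter at position i, with n inputs, incoming transfers ψ1, ψ2, ψ3 from positions i – 1, i – 2, i – 3, and outgoing transfers ψ1, ψ2, ψ3 to positions i + 1, i + 2, i + 3]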

Example: Design a bit-slice of an (11; 2)-counter
Solution: Let’s limit transfers to two stages. Then, 8 ≤ ψ1 + 3ψ2
Possible choices include ψ1 = 5, ψ2 = 1 or ψ1 = ψ2 = 2
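The inequality n – 3 ≤ ψ1 + 3ψ2 can be explored mechanically. The sketch below lists, for each ψ2, the smallest ψ1 that satisfies it; the function name `transfer_choices` is hypothetical and the two-stage limit is taken from the example.

```python
def transfer_choices(n):
    """Minimal (psi1, psi2) pairs satisfying n - 3 <= psi1 + 3 * psi2."""
    choices = []
    psi2 = 0
    while True:
        psi1 = max(0, n - 3 - 3 * psi2)   # smallest psi1 that works for this psi2
        choices.append((psi1, psi2))
        if psi1 == 0:
            break
        psi2 += 1
    return choices

print(transfer_choices(11))
```

For n = 11 the list contains both choices named in the example, (5, 1) and (2, 2), along with the single-stage option (8, 0).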


Computer Arithmetic 3, Dept. of EE, Fu Jen Catholic University, Taiwan

Multiplication

Instructor: Kuan Jen Lin
E-Mail: [email protected]
Dept. of EE, FJU, Taiwan
Room: SF 727B

Most slides originate from the textbook author’s PowerPoint presentation files.


III Multiplication

Topics in This Part

Chapter 9 Basic Multiplication Schemes
Chapter 10 High-Radix Multipliers
Chapter 11 Tree and Array Multipliers
Chapter 12 Variations in Multipliers

Review multiplication schemes and various speedup methods
• Multiplication is heavily used (in arithmetic and array indexing)
• Division = reciprocation + multiplication
• Multiplication speedup: high-radix, tree, . . .
• Bit-serial, modular, and array multipliers


9 Basic Multiplication Schemes

Chapter Goals
Study shift/add or bit-at-a-time multipliers and set the stage for faster methods and variations to be covered in Chapters 10-12

Chapter Highlights
Multiplication = multioperand addition
Hardware, firmware, software algorithms
Multiplying 2’s-complement numbers
The special case of one constant operand


Shift/Add Multiplication Algorithms

Notation for our discussion of multiplication algorithms:

a   Multiplicand       a_{k-1} a_{k-2} . . . a_1 a_0
x   Multiplier         x_{k-1} x_{k-2} . . . x_1 x_0
p   Product (a × x)    p_{2k-1} p_{2k-2} . . . p_3 p_2 p_1 p_0

Initially, we assume unsigned operands

[Figure: Multiplication of two 4-bit unsigned binary numbers in dot notation: multiplicand a, multiplier x, the partial products bit-matrix with rows x_0 a 2^0, x_1 a 2^1, x_2 a 2^2, x_3 a 2^3, and the product p.]


Multiplication Recurrence

Multiplication with right shifts: top-to-bottom accumulation (preferred)

p^(j+1) = (p^(j) + x_j a 2^k) 2^{-1}   with p^(0) = 0 and p^(k) = p = ax + p^(0) 2^{-k}

Each step adds x_j a 2^k to p^(j), then shifts the sum right by one bit.

Multiplication with left shifts: bottom-to-top accumulation

p^(j+1) = 2p^(j) + x_{k-j-1} a   with p^(0) = 0 and p^(k) = p = ax + p^(0) 2^k

Each step shifts p^(j) left by one bit, then adds x_{k-j-1} a.

[Figure: dot-notation partial products bit-matrix for a 4-bit multiplication, with rows x_0 a 2^0 through x_3 a 2^3, accumulated top-to-bottom (right shifts) or bottom-to-top (left shifts)]
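Both recurrences can be tried out directly on integers. The following is a minimal sketch for unsigned k-bit operands with p^(0) = 0; the function names are chosen for this example only.

```python
def multiply_right_shift(a, x, k):
    """p(j+1) = (p(j) + x_j * a * 2^k) / 2, consuming multiplier bits x_0 .. x_{k-1}."""
    p = 0
    for j in range(k):
        x_j = (x >> j) & 1
        p = (p + x_j * (a << k)) >> 1   # add, then shift right (division is exact)
    return p

def multiply_left_shift(a, x, k):
    """p(j+1) = 2 * p(j) + x_{k-j-1} * a, consuming multiplier bits x_{k-1} .. x_0."""
    p = 0
    for j in range(k):
        bit = (x >> (k - j - 1)) & 1
        p = 2 * p + bit * a             # shift left, then add
    return p

print(multiply_right_shift(0b1010, 0b1011, 4), multiply_left_shift(0b1010, 0b1011, 4))
```

With a = 1010 and x = 1011 both functions return 110, the product of 10 and 11.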


Examples of Basic Multiplication

Right-shift algorithm
==============================
a           1 0 1 0
x           1 0 1 1
==============================
p^(0)       0 0 0 0
+x_0 a      1 0 1 0
------------------------------
2p^(1)    0 1 0 1 0
p^(1)       0 1 0 1 0
+x_1 a      1 0 1 0
------------------------------
2p^(2)    0 1 1 1 1 0
p^(2)       0 1 1 1 1 0
+x_2 a      0 0 0 0
------------------------------
2p^(3)    0 0 1 1 1 1 0
p^(3)       0 0 1 1 1 1 0
+x_3 a      1 0 1 0
------------------------------
2p^(4)    0 1 1 0 1 1 1 0
p^(4)       0 1 1 0 1 1 1 0
==============================

Left-shift algorithm
==============================
a                 1 0 1 0
x                 1 0 1 1
==============================
p^(0)             0 0 0 0
2p^(0)          0 0 0 0 0
+x_3 a            1 0 1 0
------------------------------
p^(1)           0 1 0 1 0
2p^(1)        0 1 0 1 0 0
+x_2 a            0 0 0 0
------------------------------
p^(2)         0 1 0 1 0 0
2p^(2)      0 1 0 1 0 0 0
+x_1 a            1 0 1 0
------------------------------
p^(3)       0 1 1 0 0 1 0
2p^(3)    0 1 1 0 0 1 0 0
+x_0 a            1 0 1 0
------------------------------
p^(4)     0 1 1 0 1 1 1 0
==============================


Programmed Multiplication Using Right-Shift Algorithm

{Using right shifts, multiply unsigned m_cand and m_ier,
 storing the resultant 2k-bit product in p_high and p_low.
 Registers: R0 holds 0      Rc for counter
            Ra for m_cand   Rx for m_ier
            Rp for p_high   Rq for p_low}

{Load operands into registers Ra and Rx}
mult:    load   Ra with m_cand
         load   Rx with m_ier
{Initialize partial product and counter}
         copy   R0 into Rp
         copy   R0 into Rq
         load   k into Rc
{Begin multiplication loop}
m_loop:  shift  Rx right 1       {LSB moves to carry flag}
         branch no_add if carry = 0
         add    Ra to Rp         {carry flag is set to cout}
no_add:  rotate Rp right 1       {carry to MSB, LSB to carry}
         rotate Rq right 1       {carry to MSB, LSB to carry}
         decr   Rc               {decrement counter by 1}
         branch m_loop if Rc ≠ 0
{Store the product}
         store  Rp into p_high
         store  Rq into p_low
m_done:  ...

[Figure: register usage: R0 = 0, Rc = counter, Ra = multiplicand, Rx = multiplier, Rp = product (high), Rq = product (low)]


Time Complexity of Programmed Multiplication

Assume k-bit words

k iterations of the main loop
6-7 instructions per iteration, depending on the multiplier bit

Thus, 6k + 3 to 7k + 3 machine instructions, ignoring operand loads and result store

k = 32 implies 200+ instructions on average

This is too slow for many modern applications!
Microprogrammed multiply would be somewhat better
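The instruction count can be checked by simulating the programmed right-shift multiplication register by register. This is a sketch under the assumption that each pseudo-instruction of the program costs one machine instruction and that operand loads and result stores are not counted; `programmed_multiply` is a name invented for this example.

```python
def programmed_multiply(m_cand, m_ier, k):
    """Simulate the register-level program; returns (p_high, p_low, instruction count)."""
    mask = (1 << k) - 1
    ra, rx = m_cand & mask, m_ier & mask
    rp = rq = 0
    rc = k
    count = 3                                  # copy, copy, load k
    while True:
        carry, rx = rx & 1, rx >> 1            # shift Rx right 1 (LSB to carry flag)
        count += 2                             # the shift plus "branch no_add"
        if carry:
            total = rp + ra                    # add Ra to Rp
            rp, carry = total & mask, total >> k
            count += 1
        rp, carry = (carry << (k - 1)) | (rp >> 1), rp & 1   # rotate Rp right 1
        rq, carry = (carry << (k - 1)) | (rq >> 1), rq & 1   # rotate Rq right 1
        rc -= 1                                # decr Rc
        count += 4                             # two rotates, decr, branch m_loop
        if rc == 0:
            return rp, rq, count

print(programmed_multiply(0b1010, 0b1011, 4))
```

For k = 4, a = 1010, and x = 1011, the result is p_high = 0110, p_low = 1110, with 30 instructions, which lies between 6k + 3 = 27 and 7k + 3 = 31.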


Sequential Multiplication with Right Shifts

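[Figure: Hardware realization of sequential multiplication with right shifts: a shifting multiplier register x, the multiplicand register a, a mux selecting 0 or a under control of x_j (giving x_j a), a k-bit adder with carry-out c_out, and a shifting double-width partial product register p^(j).]

Clock?

Control path?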


Sequential Multiplication with Left Shifts

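[Figure: Sequential multiplication with left shifts: a shifting multiplier register x supplying x_{k-j-1}, the multiplicand register a, a mux selecting 0 or a (giving x_{k-j-1} a), a 2k-bit adder with carry-out c_out, and a shifting double-width partial product register p^(j).]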


Multiplication of Signed Numbers

Negative multiplicand, positive multiplier: no change, other than looking out for proper sign extension

================================
a           1 0 1 1 0
x           0 1 0 1 1
================================
p^(0)       0 0 0 0 0
+x_0 a      1 0 1 1 0
--------------------------------
2p^(1)    1 1 0 1 1 0
p^(1)       1 1 0 1 1 0
+x_1 a      1 0 1 1 0
--------------------------------
2p^(2)    1 1 0 0 0 1 0
p^(2)       1 1 0 0 0 1 0
+x_2 a      0 0 0 0 0
--------------------------------
2p^(3)    1 1 1 0 0 0 1 0
p^(3)       1 1 1 0 0 0 1 0
+x_3 a      1 0 1 1 0
--------------------------------
2p^(4)    1 1 0 0 1 0 0 1 0
p^(4)       1 1 0 0 1 0 0 1 0
+x_4 a      0 0 0 0 0
--------------------------------
2p^(5)    1 1 1 0 0 1 0 0 1 0
p^(5)       1 1 1 0 0 1 0 0 1 0
================================


Multiplication with a Negative Multiplier

Negative multiplicand, negative multiplier: in the last step (the sign bit), subtract rather than add

(1 0 1 0 1)two = –1 × 2^4 + 2^2 + 2^0

================================
a           1 0 1 1 0
x           1 0 1 0 1
================================
p^(0)       0 0 0 0 0
+x_0 a      1 0 1 1 0
--------------------------------
2p^(1)    1 1 0 1 1 0
p^(1)       1 1 0 1 1 0
+x_1 a      0 0 0 0 0
--------------------------------
2p^(2)    1 1 1 0 1 1 0
p^(2)       1 1 1 0 1 1 0
+x_2 a      1 0 1 1 0
--------------------------------
2p^(3)    1 1 0 0 1 1 1 0
p^(3)       1 1 0 0 1 1 1 0
+x_3 a      0 0 0 0 0
--------------------------------
2p^(4)    1 1 1 0 0 1 1 1 0
p^(4)       1 1 1 0 0 1 1 1 0
+(–x_4 a)   0 1 0 1 0
--------------------------------
2p^(5)    0 0 0 1 1 0 1 1 1 0
p^(5)       0 0 0 1 1 0 1 1 1 0
================================
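The rule "subtract rather than add in the last step" follows from the negative weight of the sign bit. A minimal sketch of the right-shift algorithm for k-bit 2's-complement operands is shown below; it relies on Python integers behaving as sign-extended 2's-complement values under `>>` and `&`, and the function name `multiply_signed` is an assumption of this example.

```python
def multiply_signed(a, x, k):
    """Right-shift multiplication of signed a and x, each in [-2^(k-1), 2^(k-1))."""
    p = 0
    for j in range(k):
        x_j = (x >> j) & 1              # bit j of the 2's-complement pattern of x
        term = x_j * a * (1 << k)
        if j == k - 1:
            term = -term                # sign bit has weight -2^(k-1): subtract
        p = (p + term) >> 1             # arithmetic shift preserves the sign
    return p

print(multiply_signed(-10, 11, 5), multiply_signed(-10, -11, 5))
```

The two calls correspond to a = 10110 with x = 01011 and with x = 10101, and give -110 and 110.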


Booth’s Recoding

–––––––––––––––––––––––––––––––––––––––––––––––––––
x_i  x_{i-1}   y_i   Explanation
–––––––––––––––––––––––––––––––––––––––––––––––––––
0    0          0    No string of 1s in sight
0    1          1    End of string of 1s in x
1    0         –1    Beginning of string of 1s in x
1    1          0    Continuation of string of 1s in x
–––––––––––––––––––––––––––––––––––––––––––––––––––

Example
      1  0  0  1  1  1  0  1  1  0  1  0  1  1  1  0    Operand x
(1)  –1  0  1  0  0 –1  1  0 –1  1 –1  1  0  0 –1  0    Recoded version y

Justification
2^j + 2^{j-1} + . . . + 2^{i+1} + 2^i = 2^{j+1} – 2^i
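The recoding table amounts to y_i = x_{i-1} – x_i with x_{-1} = 0. The sketch below applies that rule to the 16-bit example operand; `booth_recode` is a name introduced here for illustration.

```python
def booth_recode(x, k):
    """Booth digits y_{k-1} .. y_0 (MSB first) of the k-bit pattern x."""
    digits = []
    prev = 0                            # x_{-1} = 0
    for i in range(k):
        bit = (x >> i) & 1
        digits.append(prev - bit)       # y_i = x_{i-1} - x_i
        prev = bit
    return digits[::-1]

x = 0b1001110110101110
y = booth_recode(x, 16)
value = sum(d * (1 << i) for i, d in enumerate(reversed(y)))
print(y, value)
```

Without the extra leading digit (1), the recoded digits evaluate to the 2's-complement value of x, that is, x – 2^16 for this operand; including it gives the unsigned value.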


Example Multiplication with Booth’s Recoding

The 2’s complement of 10110 is 01010

================================
a           1 0 1 1 0
x           1 0 1 0 1               Multiplier
y          –1 1 –1 1 –1             Booth-recoded
================================
p^(0)       0 0 0 0 0
+y_0 a      0 1 0 1 0
--------------------------------
2p^(1)    0 0 1 0 1 0
p^(1)       0 0 1 0 1 0
+y_1 a      1 0 1 1 0
--------------------------------
2p^(2)    1 1 1 0 1 1 0
p^(2)       1 1 1 0 1 1 0
+y_2 a      0 1 0 1 0
--------------------------------
2p^(3)    0 0 0 1 1 1 1 0
p^(3)       0 0 0 1 1 1 1 0
+y_3 a      1 0 1 1 0
--------------------------------
2p^(4)    1 1 1 0 0 1 1 1 0
p^(4)       1 1 1 0 0 1 1 1 0
+y_4 a      0 1 0 1 0
--------------------------------
2p^(5)    0 0 0 1 1 0 1 1 1 0
p^(5)       0 0 0 1 1 0 1 1 1 0
================================


Multiplication by Constants

Explicit, e.g. y := 12 ∗ x + 1
Implicit, e.g. A[i, j] := A[i, j] + B[i, j]
    Address of A[i, j] = base + n ∗ i + j

[Figure: an m × n array with rows 0 . . . m – 1 and columns 0 . . . n – 1, showing the element in row i and column j]

Software aspects:
Optimizing compilers replace multiplications by shifts/adds/subs
    Produce efficient code using as few registers as possible
    Find the best code by a time/space-efficient algorithm

Hardware aspects:
Synthesize special-purpose units such as filters
    y[t] = a_0 x[t] + a_1 x[t – 1] + a_2 x[t – 2] + b_1 y[t – 1] + b_2 y[t – 2]


Multiplication Using Binary Expansion

Example: Multiply R1 by the constant 113 = (1 1 1 0 0 0 1)two

R2   ← R1 shift-left 1
R3   ← R2 + R1
R6   ← R3 shift-left 1
R7   ← R6 + R1
R112 ← R7 shift-left 4
R113 ← R112 + R1

Ri: Register that contains i times (R1)
This notation is for clarity; only one register other than R1 is needed

Shorter sequence using shift-and-add instructions

R3   ← R1 shift-left 1 + R1
R7   ← R3 shift-left 1 + R1
R113 ← R7 shift-left 4 + R1
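The binary-expansion method generalizes to any positive constant by scanning its bits from the most significant end. The following sketch counts one instruction per shift (of any distance) and one per add, which is an assumption about the cost model; `multiply_by_constant` is a hypothetical helper name.

```python
def multiply_by_constant(r1, c):
    """Multiply r1 by the positive constant c using shifts and adds only.

    Returns (product, number of shift or add instructions)."""
    bits = bin(c)[2:]                    # MSB first; bits[0] is the leading 1
    acc, ops, pending = r1, 0, 0
    for b in bits[1:]:
        pending += 1                     # one more position to shift
        if b == "1":
            acc = (acc << pending) + r1  # shift by the pending amount, then add
            ops += 2
            pending = 0
    if pending:                          # trailing 0s need one final shift
        acc <<= pending
        ops += 1
    return acc, ops

print(multiply_by_constant(5, 113))
```

For the constant 113 the sketch uses 6 shift or add instructions, matching the six-step register sequence in the example.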


Multiplication via Recoding

Example: Multiply R1 by 113 = (1 1 1 0 0 0 1)two = (1 0 0 –1 0 0 0 1)two

R8   ← R1 shift-left 3
R7   ← R8 – R1
R112 ← R7 shift-left 4
R113 ← R112 + R1

Shorter sequence using shift-and-add/subtract instructions

R7   ← R1 shift-left 3 – R1
R113 ← R7 shift-left 4 + R1

6 shift or add (3 shift-and-add) instructions needed without recoding
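A signed-digit form of a constant can be produced automatically. The sketch below uses the non-adjacent form (NAF), which is one possible recoding and is an assumption here, since the slide does not name the recoding it uses; for 113 it happens to produce the same digits as the example.

```python
def signed_digits(c):
    """Non-adjacent form of c > 0, MSB first, with digits in {-1, 0, 1}."""
    digits = []
    while c:
        if c & 1:
            d = 2 - (c & 3)             # +1 if c mod 4 == 1, -1 if c mod 4 == 3
            c -= d
        else:
            d = 0
        digits.append(d)
        c >>= 1
    return digits[::-1]

def multiply_recoded(r1, c):
    """Multiply r1 by c with one shift per digit and an add or subtract per nonzero digit."""
    acc = 0
    for d in signed_digits(c):
        acc = (acc << 1) + d * r1
    return acc

print(signed_digits(113), multiply_recoded(3, 113))
```

Fewer nonzero digits mean fewer add/subtract steps: 113 has four 1s in binary but only three nonzero digits after recoding.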


Multiplication via Factorization

Example: Multiply R1 by 119 = 7 × 17 = (8 – 1) × (16 + 1)

R8   ← R1 shift-left 3
R7   ← R8 – R1
R112 ← R7 shift-left 4
R119 ← R112 + R7

Shorter sequence using shift-and-add/subtract instructions

R7   ← R1 shift-left 3 – R1
R119 ← R7 shift-left 4 + R7

119 = (1 1 1 0 1 1 1)two = (1 0 0 0 –1 0 0 –1)two

More instructions may be needed without factorization

Requires a scratch register for holding the 7 multiple
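The factored sequence maps onto two shift-and-add/subtract steps, with 7 × R1 formed as (R1 shift-left 3) – R1. A minimal sketch, where the variable `r7` plays the role of the scratch register and `times_119` is a name chosen for this example:

```python
def times_119(r1):
    """Multiply by 119 = (8 - 1) * (16 + 1) using two shift-and-add/subtract steps."""
    r7 = (r1 << 3) - r1      # 7 * R1, held in a scratch register
    return (r7 << 4) + r7    # 17 * (7 * R1) = 119 * R1

print(times_119(1), times_119(25))
```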


High-Radix Multipliers

Chapter Goals
Study techniques that allow us to handle more than one multiplier bit in each cycle (two bits in radix 4, three in radix 8, . . .)

Chapter Highlights
High radix gives rise to “difficult” multiples
Recoding (change of digit-set) as remedy
Carry-save addition reduces cycle time
Implementation and optimization methods


Radix-4 Multiplication in Dot Notation

Number of cycles is halved, but now the “difficult” multiple 3a must be dealt with

[Figure: Radix-4, or two-bit-at-a-time, multiplication in dot notation. The four radix-2 rows x_0 a 2^0 through x_3 a 2^3 of the partial products bit-matrix are replaced by two rows, (x_1 x_0)two a 4^0 and (x_3 x_2)two a 4^1.]


A Possible Design for a Radix-4 Multiplier

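[Figure: the multiplier register is shifted 2 bits at a time; bits x_{i+1} x_i control a 4-input mux (00, 01, 10, 11) that selects 0, a, 2a, or 3a and sends it to the adder. The multiple 3a is precomputed via shift-and-add (3a = 2a + a).]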


Example Radix-4 Multiplication Using 3a

====================================
a                     0 1 1 0
3a                0 1 0 0 1 0
x                     1 1 1 0
====================================
p^(0)                 0 0 0 0
+(x_1 x_0)two a   0 0 1 1 0 0
------------------------------------
4p^(1)            0 0 1 1 0 0
p^(1)                 0 0 1 1 0 0
+(x_3 x_2)two a   0 1 0 0 1 0
------------------------------------
4p^(2)            0 1 0 1 0 1 0 0
p^(2)                 0 1 0 1 0 1 0 0
====================================
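The radix-4 right-shift scheme with a precomputed 3a can be sketched as follows, for unsigned operands and even k; the function name `multiply_radix4` is an assumption of this example.

```python
def multiply_radix4(a, x, k):
    """Two-bit-at-a-time multiplication of unsigned k-bit operands (k even)."""
    multiples = [0, a, 2 * a, 2 * a + a]        # 3a precomputed via shift-and-add
    p = 0
    for j in range(0, k, 2):
        digit = (x >> j) & 3                    # radix-4 digit (x_{j+1} x_j)two
        p = (p + (multiples[digit] << k)) >> 2  # add the multiple, then 2-bit right shift
    return p

print(multiply_radix4(0b0110, 0b1110, 4))
```

With a = 0110 and x = 1110 the loop runs k/2 = 2 times and returns 84, as in the example.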


A Second Design for a Radix-4 Multiplier

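The multiple 3a is replaced by 4a (carry into the next higher radix-4 multiplier digit) and –a.

x_{i+1}  x_i  c    Mux control   Set carry
-------  ---  ---  -----------   ---------
0        0    0    0 0           0
0        0    1    0 1           0
0        1    0    0 1           0
0        1    1    1 0           0
1        0    0    1 0           0
1        0    1    1 1           1
1        1    0    1 1           1
1        1    1    0 0           1

Mux control = (x_{i+1} ⊕ x_i c, x_i ⊕ c)
Carry FF: set if x_{i+1} = x_i = 1 or if x_{i+1} = c = 1, that is, when x_{i+1}(x_i ∨ c) = 1

[Figure: the multiplier register is shifted 2 bits at a time; x_{i+1}, x_i, and the carry c held in a flip-flop (FF) form (x_{i+1} x_i)two + c mod 4, which controls a 4-input mux (00, 01, 10, 11) selecting 0, a, 2a, or –a for the adder.]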
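The control table can be exercised in software. This sketch assumes that mux code 11 selects –a (as the table implies) and that a final carry left in the flip-flop contributes one more multiple a at weight 2^k; the names `recode_digit` and `multiply_radix4_carry` are hypothetical.

```python
def recode_digit(x_hi, x_lo, c):
    """Return (mux control, next carry) for multiplier bits x_{i+1}, x_i and carry c."""
    mux = ((x_hi ^ (x_lo & c)) << 1) | (x_lo ^ c)
    carry = x_hi & (x_lo | c)
    return mux, carry

MULTIPLE = {0: 0, 1: 1, 2: 2, 3: -1}    # mux code 11 selects -a

def multiply_radix4_carry(a, x, k):
    """Radix-4 multiplication of unsigned k-bit operands (k even) without forming 3a."""
    p, c = 0, 0
    for j in range(0, k, 2):
        mux, c = recode_digit((x >> (j + 1)) & 1, (x >> j) & 1, c)
        p += (MULTIPLE[mux] * a) << j
    return p + ((c * a) << k)           # leftover carry adds a at weight 2^k

print(multiply_radix4_carry(0b0110, 0b1110, 4))
```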


Radix-4 Booth’s Recoding

–––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––
                 Context   Recoded radix-2 digits   Radix-4 digit
x_{i+1}  x_i     x_{i-1}   y_{i+1}  y_i             z_{i/2}         Explanation
–––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––
0        0       0          0        0               0              No string of 1s in sight
0        0       1          0        1               1              End of string of 1s
0        1       0          0        1               1              Isolated 1
0        1       1          1        0               2              End of string of 1s
1        0       0         –1        0              –2              Beginning of string of 1s
1        0       1         –1        1              –1              End a string, begin new one
1        1       0          0       –1              –1              Beginning of string of 1s
1        1       1          0        0               0              Continuation of string of 1s
–––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––

Example
      1  0  0  1  1  1  0  1  1  0  1  0  1  1  1  0    Operand x
(1)  –1  0  1  0  0 –1  1  0 –1  1 –1  1  0  0 –1  0    Recoded version y
(1)     –2     2    –1     2    –1    –1     0    –2    Radix-4 version z

Only shifting and complementation required
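Every row of the table satisfies z_{i/2} = –2x_{i+1} + x_i + x_{i-1}. The sketch below uses that closed form (an equivalent restatement of the table, not a formula quoted from the slide) and then multiplies with the recoded digits; both function names are invented for this example.

```python
def booth_radix4(x, k):
    """Radix-4 Booth digits z_{k/2-1} .. z_0 (MSB first) of a k-bit 2's-complement x, k even."""
    digits = []
    prev = 0                                    # x_{-1} = 0
    for i in range(0, k, 2):
        x_i = (x >> i) & 1
        x_i1 = (x >> (i + 1)) & 1
        digits.append(-2 * x_i1 + x_i + prev)   # digit in [-2, 2]
        prev = x_i1
    return digits[::-1]

def multiply_booth_radix4(a, x, k):
    """Signed multiplication using only the multiples 0, ±a, ±2a."""
    p = 0
    for z in booth_radix4(x, k):
        p = 4 * p + z * a
    return p

print(booth_radix4(0b1001110110101110, 16), multiply_booth_radix4(6, -6, 4))
```

The first result reproduces the radix-4 version z of the example operand; the second multiplies a = 0110 by x = 1010 (that is, 6 by –6).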


Example Multiplication via Modified Booth’s Recoding

================================
a             0 1 1 0
x             1 0 1 0
z              –1  –2        Radix-4
================================
p^(0)     0 0 0 0 0 0
+z_0 a    1 1 0 1 0 0
--------------------------------
4p^(1)    1 1 0 1 0 0
p^(1)     1 1 1 1 0 1 0 0
+z_1 a    1 1 1 0 1 0
--------------------------------
4p^(2)    1 1 0 1 1 1 0 0
p^(2)         1 1 0 1 1 1 0 0
================================


Multiple Generation with Radix-4 Booth’s Recoding

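[Figure: Multiple generation with radix-4 Booth’s recoding. The multiplier register (2-bit shift per cycle, with the bit to the right of x_0 initialized to 0) supplies x_{i+1}, x_i, and x_{i-1} to the recoding logic, which produces the signals neg, two, and non0. The signal two (which could have been named one/two) selects a or 2a from the k-bit multiplicand through a 0/1 mux, and non0 enables the mux, giving the (k + 1)-bit value 0, a, or 2a at the adder input; neg is the add/subtract control, so that z_{i/2} a is added. Shifting the partial product uses sign extension, not 0.]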


Using Carry-Save Adders

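[Figure: radix-4 multiplication using a carry-save adder. Two muxes controlled by the multiplier bits x_{i+1} and x_i select 0 or 2a and 0 or a; a CSA combines the two selected values with the old cumulative partial product, and an adder produces the new cumulative partial product.]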


Keeping the Partial Product in Carry-Save Form

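[Figure: multiplier with the partial product kept in carry-save form. A mux selects 0 or the multiplicand as the next multiple; a k-bit CSA adds it to the old partial product, held as separate sum and carry words, to form the new partial product (CS sum), which is shifted each cycle; muxes route the sum and carry words to a k-bit adder for the final conversion.]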


Carry-Save Multiplier with Radix-4 Booth’s Recoding (1/2)

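[Figure: carry-save multiplier with radix-4 Booth’s recoding. A Booth recoder and selector takes the multiplicand a and multiplier bits x_{i+1}, x_i, x_{i-1} and produces z_{i/2} a plus an extra “dot”; a CSA adds it to the old cumulative partial product to give the new cumulative partial product; a 2-bit adder with a carry flip-flop (FF) sends two bits per cycle to the lower half of the partial product, and an adder performs the final conversion.]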


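Carry-Save Multiplier with Radix-4 Booth’s Recoding (2/2)

[Figure: Booth recoder and selector. The recoding logic takes x_{i+2}, x_{i+1}, x_i, x_{i-1} and produces neg, two, and non0; a 0/1 mux (select = two, enable = non0) yields the (k + 1)-bit value 0, a, or 2a; a selective-complement stage controlled by neg yields the (k + 2)-bit value 0, a, –a, 2a, or –2a (z_{i/2} a) in complemented form, together with an extra “dot” for column i.]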


Another Design for Radix-4 Multiplication

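[Figure: another design for radix-4 multiplication. Two muxes controlled by x_{i+1} and x_i select 0 or 2a and 0 or a; two CSAs add the selected values to the old cumulative partial product to form the new cumulative partial product; a 2-bit adder with a carry flip-flop (FF) sends two bits per cycle to the lower half of the partial product, and an adder performs the final conversion.]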


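Radix-8 and Radix-16 Multipliers

[Figure: radix-16 multiplier. Four muxes controlled by multiplier bits x_{i+3}, x_{i+2}, x_{i+1}, x_i select 0 or 8a, 0 or 4a, 0 or 2a, and 0 or a; a tree of CSAs adds the selected values to the upper half of the partial product, kept as sum and carry words, with a 4-bit right shift per cycle; a 4-bit adder with a carry flip-flop (FF) sends four bits per cycle to the lower half of the partial product.]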


A Spectrum of Multiplier Design Choices

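[Figure: a spectrum of designs. Basic binary: the next multiple and the partial product feed an adder. High-radix or partial tree: several multiples and the partial product feed a small CSA tree followed by an adder. Full tree: all multiples feed a full CSA tree followed by an adder. Moving toward the full tree speeds up the design; moving toward basic binary economizes.]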


VLSI Complexity Issues

A radix-2^b multiplier requires:
    bk two-input AND gates to form the partial products bit-matrix
    O(bk) area for the CSA tree
    At least Θ(k) area for the final carry-propagate adder

Total area:  A = O(bk)
Latency:     T = O((k/b) log b + log k)

Any VLSI circuit computing the product of two k-bit integers must satisfy the following constraints:
    AT grows at least as fast as k^{3/2}
    AT^2 is at least proportional to k^2

The preceding radix-2^b implementations are suboptimal, because:
    AT = O(k^2 log b + bk log k)
    AT^2 = O((k^3/b) log^2 b)
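The AT and AT^2 figures follow from multiplying out the area and latency bounds quoted above. A short derivation, offered as a sketch rather than a tight analysis:

```latex
\begin{align*}
A &= O(bk), \qquad T = O\!\left(\tfrac{k}{b}\log b + \log k\right) \\
AT &= O\!\left(bk \cdot \tfrac{k}{b}\log b + bk\log k\right)
    = O\!\left(k^2 \log b + bk \log k\right) \\
AT^2 &= O\!\left(bk\left(\tfrac{k}{b}\log b + \log k\right)^{2}\right)
      = O\!\left(\tfrac{k^3}{b}\log^2 b + k^2 \log b \log k + bk \log^2 k\right)
\end{align*}
```

For b = O(1) the expansion gives AT = O(k^2) and AT^2 = O(k^3); for b = O(k) it gives AT = O(k^2 log k) and AT^2 = O(k^2 log^2 k). The slide's AT^2 expression keeps the first term of the expansion.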


Comparing High- and Low-Radix Multipliers

          Low-Cost      High Speed        AT- or AT^2-Optimal
          b = O(1)      b = O(k)
AT        O(k^2)        O(k^2 log k)      O(k^{3/2})
AT^2      O(k^3)        O(k^2 log^2 k)    O(k^2)

AT = O(k^2 log b + bk log k)    AT^2 = O((k^3/b) log^2 b)

Intermediate designs do not yield better AT or AT^2 values;
the multipliers remain asymptotically suboptimal for any b

By the AT measure (indicator of cost-effectiveness), slower radix-2 multipliers are better than high-radix or tree multipliers

Thus, when an application requires many independent multiplications, it is more cost-effective to use a large number of slower multipliers

High-radix multiplier latency can be reduced from O((k/b) log b + log k) to O(k/b + log k) through more effective pipelining (Chapter 11)


Tree and Array Multipliers

Chapter Goals
Study the design of multipliers for highest possible performance (speed, throughput)

Chapter Highlights
Tree multiplier = reduction tree + redundant-to-binary converter
Avoiding full sign extension in multiplying signed numbers
Array multiplier = one-sided reduction tree + ripple-carry adder


Full-Tree Multipliers

[Figure: general structure of a full-tree multiplier. Multiple-forming circuits (inputs: the multiplier and the multiplicand a) feed a partial-products reduction tree (multi-operand addition tree); its redundant result goes to a redundant-to-binary converter, which produces the higher-order product bits. Some lower-order product bits are generated directly.]


Full-Tree versus Partial-Tree Multiplier

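[Figure: Full-tree multiplier: all partial products enter a large tree of carry-save adders (log-depth), followed by an adder that yields the product. Partial-tree multiplier: several partial products at a time enter a small tree of carry-save adders (log-depth), followed by an adder that yields the product.]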


Variations in Full-Tree Multiplier Design

Designs are distinguished by variations in three elements:

1. Multiple-forming circuits

2. Partial products reduction tree

3. Redundant-to-binary converter

[Figure: the general full-tree multiplier structure (multiple-forming circuits, partial-products reduction tree, redundant-to-binary converter), with the three elements labeled 1 to 3.]


Example of Variations in CSA Tree Design

Two different binary 4 × 4 tree multipliers.

Wallace Tree (5 FAs + 3 HAs + 4-Bit Adder)

Column heights:   1 2 3 4 3 2 1
Reduce with:      FA FA FA HA
Column heights:   1 3 2 3 2 1 1
Reduce with:      FA HA FA HA
Column heights:   2 2 2 2 1 1 1
Reduce with:      4-Bit Adder
Result:           1 1 1 1 1 1 1 1

Dadda Tree (4 FAs + 2 HAs + 6-Bit Adder)

Column heights:   1 2 3 4 3 2 1
Reduce with:      FA FA
Column heights:   1 3 2 2 3 2 1
Reduce with:      FA HA HA FA
Column heights:   2 2 2 2 1 2 1
Reduce with:      6-Bit Adder
Result:           1 1 1 1 1 1 1 1

Latency!!


A 7X7 Tree Multiplier

[Figure: a 7 × 7 tree multiplier. The seven partial products occupy bit positions [0, 6], [1, 7], [2, 8], [3, 9], [4, 10], [5, 11], and [6, 12]; they are reduced by levels of 7-bit CSAs and a 10-bit CSA, and a 10-bit CPA forms the final sum.]

The index pair [i, j] means that bit positions from i up to j are involved.


Balanced-Delay Tree for 11 Inputs

[Figure: balanced-delay tree of full adders (FA) for 11 inputs, showing the inputs, the level-1, level-2, and level-3 carries, the level-4 carry, and the outputs.]

11 + ψ1 = 2ψ1 + 3

Therefore, ψ1 = 8 carries are needed


Binary Tree of 4-to-2 Reduction Modules

Due to its recursive structure, a binary tree is more regular than a 3-to-2 reduction tree when laid out in VLSI

[Figure: binary tree of 4-to-2 reduction modules (four modules at the first level, two at the second, one at the third). Each 4-to-2 reduction module is implemented with two levels of (3; 2)-counters (CSAs).]
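A 4-to-2 reduction module built from two levels of (3; 2)-counters can be modeled at the word level. The sketch below works on non-negative Python integers and pairs up operands in a binary tree; the function names and the zero-padding of an odd operand count are assumptions of this example.

```python
def csa(x, y, z):
    """Word-level (3; 2)-counter: returns (sum, carry) with x + y + z == sum + carry."""
    return x ^ y ^ z, ((x & y) | (x & z) | (y & z)) << 1

def reduce_4_to_2(w, x, y, z):
    """4-to-2 reduction module: two levels of carry-save addition."""
    s, c = csa(w, x, y)
    return csa(s, c, z)

def tree_reduce(operands):
    """Reduce a non-empty list of non-negative integers to two numbers with the same total."""
    ops = list(operands) + [0] * (len(operands) % 2)     # pad to an even count
    pairs = [(ops[i], ops[i + 1]) for i in range(0, len(ops), 2)]
    while len(pairs) > 1:
        nxt = []
        for i in range(0, len(pairs), 2):
            if i + 1 < len(pairs):
                nxt.append(reduce_4_to_2(*pairs[i], *pairs[i + 1]))
            else:
                nxt.append(pairs[i])                     # odd pair passes through
        pairs = nxt
    return pairs[0]

s, c = tree_reduce(range(1, 17))
print(s, c, s + c)
```

Sixteen operands pass through three levels of modules (four, two, then one), mirroring the tree in the figure; a final carry-propagate addition of the two outputs gives the total.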


Tree Multipliers for Signed Numbers

From Fig. 8.18: Sign extension in multioperand addition.

---------- Extended positions ----------       Sign      Magnitude positions ---------
x_{k-1}  x_{k-1}  x_{k-1}  x_{k-1}  x_{k-1}    x_{k-1}   x_{k-2}  x_{k-3}  x_{k-4} . . .
y_{k-1}  y_{k-1}  y_{k-1}  y_{k-1}  y_{k-1}    y_{k-1}   y_{k-2}  y_{k-3}  y_{k-4} . . .
z_{k-1}  z_{k-1}  z_{k-1}  z_{k-1}  z_{k-1}    z_{k-1}   z_{k-2}  z_{k-3}  z_{k-4} . . .

The difference in multiplication is the shifting sign positions

[Figure: Fig. 11.7 Sharing of full adders to reduce the CSA width in a signed tree multiplier. Three partial products with signs α, β, γ in shifting positions, followed by their sign extensions; the full adders (FA) handling the sign extensions are shared, so that five redundant copies are removed.]


Using the Negative-Weight Property of the Sign Bit

Sign extension is a way of converting negatively weighted bits (negabits) to positively weighted bits (posibits) to facilitate reduction, but there are other methods of accomplishing the same without introducing a lot of extra bits

Baugh and Wooley have contributed two such methods

Operands: a4 a3 a2 a1 a0 × x4 x3 x2 x1 x0. Each array below sums to the product p9 p8 . . . p0; columns are numbered 9 (leftmost) to 0, and ′ denotes a complemented bit.

a. Unsigned

9       8       7       6       5       4       3       2       1       0
                                        a4x0    a3x0    a2x0    a1x0    a0x0
                                a4x1    a3x1    a2x1    a1x1    a0x1
                        a4x2    a3x2    a2x2    a1x2    a0x2
                a4x3    a3x3    a2x3    a1x3    a0x3
        a4x4    a3x4    a2x4    a1x4    a0x4

b. 2's-complement

9       8       7       6       5       4       3       2       1       0
                                        -a4x0   a3x0    a2x0    a1x0    a0x0
                                -a4x1   a3x1    a2x1    a1x1    a0x1
                        -a4x2   a3x2    a2x2    a1x2    a0x2
                -a4x3   a3x3    a2x3    a1x3    a0x3
        a4x4    -a3x4   -a2x4   -a1x4   -a0x4

c. Baugh-Wooley

9       8       7       6       5       4       3       2       1       0
                                        a4x0′   a3x0    a2x0    a1x0    a0x0
                                a4x1′   a3x1    a2x1    a1x1    a0x1
                        a4x2′   a3x2    a2x2    a1x2    a0x2
                a4x3′   a3x3    a2x3    a1x3    a0x3
        a4x4    a3′x4   a2′x4   a1′x4   a0′x4
        a4′                             a4
1       x4′                             x4

d. Modified B-W

9       8       7       6       5       4       3       2       1       0
                                        (a4x0)′ a3x0    a2x0    a1x0    a0x0
                                (a4x1)′ a3x1    a2x1    a1x1    a0x1
                        (a4x2)′ a3x2    a2x2    a1x2    a0x2
                (a4x3)′ a3x3    a2x3    a1x3    a0x3
        a4x4    (a3x4)′ (a2x4)′ (a1x4)′ (a0x4)′
1                               1


Fig. 11.8

[Figure: panels c. Baugh-Wooley and d. Modified B-W of the 5 × 5 partial-product bit matrix]

The Baugh-Wooley Method and Its Modified Form

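Baugh-Wooley: $-a_4 x_0 = a_4(1 - x_0) - a_4 = a_4 x_0' - a_4$

The entry $-a_4 x_0$ is replaced by $a_4 x_0'$ and $a_4$ in the same column, with $-a_4$ in the next column.

Modified form: $-a_4 x_0 = (1 - a_4 x_0) - 1 = (a_4 x_0)' - 1$

The entry $-a_4 x_0$ is replaced by $(a_4 x_0)'$ and $1$ in the same column, with $-1$ in the next column.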
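The modified form can be checked numerically. The following is a minimal sketch, not taken from the slides: the function names `bit` and `modified_baugh_wooley` are illustrative, and the code assumes k-bit 2's-complement operands and a 2k-bit product. Every negatively weighted term is replaced by its complement, and the leftover constants collapse into a 1 in column k and a 1 in column 2k − 1.

```python
def bit(v, i):
    """Bit i of v; Python's arithmetic shift gives 2's-complement bits for negative v."""
    return (v >> i) & 1


def modified_baugh_wooley(a, x, k=5):
    """Multiply two k-bit 2's-complement integers using only positively weighted bits."""
    total = 0
    for i in range(k):
        for j in range(k):
            term = bit(a, i) & bit(x, j)
            # Exactly one of the two bits is a sign bit: the term has negative
            # weight, so use its complement instead (-t = (1 - t) - 1).
            if (i == k - 1) != (j == k - 1):
                term ^= 1
            total += term << (i + j)
    # The "-1" constants left over from the complemented terms sum to
    # -(2^(2k-1) - 2^k), which is 2^k + 2^(2k-1) modulo 2^(2k).
    total += (1 << k) + (1 << (2 * k - 1))
    total &= (1 << (2 * k)) - 1
    # Interpret the 2k-bit result as a signed number.
    if total >> (2 * k - 1):
        total -= 1 << (2 * k)
    return total
```

For k = 5 the two constants land in columns 5 and 9, which is where the 1 entries appear in panel d of Fig. 11.8.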


Alternate Views of the Baugh-Wooley Methods

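Columns shown are positions 9 8 7 6 5 4.

    + 0  0  –a4x3    –a4x2    –a4x1    –a4x0
    + 0  0  –a3x4    –a2x4    –a1x4    –a0x4
    --------------------------------------------
    – 0  0   a4x3     a4x2     a4x1     a4x0
    – 0  0   a3x4     a2x4     a1x4     a0x4
    --------------------------------------------
    + 1  1  (a4x3)′  (a4x2)′  (a4x1)′  (a4x0)′      (modified Baugh-Wooley)
    + 1  1  (a3x4)′  (a2x4)′  (a1x4)′  (a0x4)′
                                         1
                                         1
    --------------------------------------------
    + a4 a4  a4x3′    a4x2′    a4x1′    a4x0′       (Baugh-Wooley)
    + x4 x4  a3′x4    a2′x4    a1′x4    a0′x4
                                         a4
                                         x4
    --------------------------------------------
      1  a4′
         x4′

In the last step, the entries a4 a4 and x4 x4 in positions 9 and 8 reduce (modulo 2^10) to a 1 in position 9 and a4′, x4′ in position 8.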



Partial-Tree Multipliers

Fig. 11.9 General structure of a partial-tree multiplier.

[Figure blocks: h inputs; CSA tree; sum and carry outputs holding the upper part of the cumulative partial product (stored-carry); h-bit adder with FF; lower part of the cumulative partial product; final adder]

High-radix versus partial-tree multipliers: The difference is quantitative, not qualitative

For small h, say ≤ 8 bits, we view the multiplier of Fig. 11.9 as high-radix

When h is a significant fraction of k, say k/2 or k/4, then we tend to view it as a partial-tree multiplier

Better design through pipelining to be covered in Section 11.6


Truncated Multipliers

Removing the dots at the right does not lead to much loss of precision.

                    ulp
      . o o o o o o o o      k-by-k fractional
    × . o o o o o o o o      multiplication
    ---------------------------------
      .   o o o o o o o|o
      .     o o o o o o|o o
      .       o o o o o|o o o
      .         o o o o|o o o o
      .           o o o|o o o o o
      .             o o|o o o o o o
      .               o|o o o o o o o
      .                |o o o o o o o o
    ---------------------------------
      . o o o o o o o o|o o o o o o o o

Max error = 8/2 + 7/4 + 6/8 + 5/16 + 4/32 + 3/64 + 2/128 + 1/256 = 7.004 ulp

Mean error = 1.751 ulp
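The two error figures above can be reproduced with a short calculation. This is a sketch under stated assumptions: the function name `truncation_error_ulp` is illustrative, and the mean assumes independent, uniformly random operand bits, so that each dropped dot equals 1 with probability 1/4.

```python
from fractions import Fraction


def truncation_error_ulp(k=8):
    """Worst-case and mean error, in ulp, from dropping all dots right of the cut."""
    # The first dropped column holds k dots worth 1/2 ulp each, the next holds
    # k - 1 dots worth 1/4 ulp each, and so on down to 1 dot worth 1/2^k ulp.
    max_err = sum(Fraction(k + 1 - i, 2 ** i) for i in range(1, k + 1))
    # A dot a_i * x_j is 1 with probability 1/4 for random bits (assumption).
    mean_err = max_err / 4
    return max_err, mean_err


max_err, mean_err = truncation_error_ulp(8)
print(float(max_err), float(mean_err))
```

For k = 8 the exact values are 1793/256 and 1793/1024 ulp, which round to the 7.004 and 1.751 quoted on the slide.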


Truncated Multipliers with Error Compensation

Constant and variable error compensation for truncated multipliers.

We can introduce additional “dots” on the left-hand side to compensate for the removal of dots from the right-hand side

Constant compensation

    .   o o o o o o o|
    .     o o o o o o|
    .       o o o o o|
    .         o o o o|
    .           o o o|
    .           1 o o|
    .               o|
    .                |

Max error = +4 ulp
Max error ≅ −3 ulp
Mean error = ? ulp

Variable compensation

    .   o o o o o o o|
    .     o o o o o o|
    .       o o o o o|
    .         o o o o|
    .           o o o|
    .             o o|
    .           x–1 o|
    .           y–1  |

Max error = +? ulp
Max error ≅ −? ulp
Mean error = ? ulp


Array Multipliers

A basic array multiplier uses a one-sided CSA tree and a ripple-carry adder.

[Figure: one-sided CSA tree with inputs x0 a, x1 a, x2 a, x3 a, x4 a passing through four CSA levels into a ripple-carry adder that produces ax; cell-level diagram with partial-product bits a_i x_j (i, j = 0 … 4), zero inputs at the first row, and product bits p0 … p9]

Details of a 5×5 array multiplier using FA blocks.

[3:2] Adder, i.e. a full adder


Signed (2’s-complement) Array Multiplier

Array multiplier modified to handle 2’s-complement inputs using the Baugh-Wooley method, or to shorten the critical path.

[Figure: 5 × 5 array of partial-product bits a_i x_j with the sign-related terms complemented (overbars), extra inputs a4, x4, their complements, and a constant 1; product bits p0 … p9]


Array Multiplier Built of Modified Full-Adder Cells

Design of a 5 × 5 array multiplier with two additive inputs and full-adder blocks that include AND gates.

[Figure: 5 × 5 cell array with column inputs a4 … a0, row inputs x4 … x0, and outputs p9 … p0; each cell is an FA with a built-in AND gate]


Array Multiplier without a Final Carry-Propagate Adder

[Figure: levels i and i+1 of the array, with multiplexers (Mux) selecting among the conditional bit groups B_i and B_i+1 to form the upper product bits [k, 2k–1]]

All remaining bits of the final product produced only 2 gate levels after pk–1

See next slide


Extend Bits in Less-Significant Part in a Conditional Adder

The circuit on the right can be regarded as a conditional adder, just like the circuit on the left. Source: Ercegovac and Lang, “Digital Arithmetic”, pp. 86-87


Pipelined Tree and Array Multipliers

General structure of a partial-tree multiplier.

[Figure blocks: h inputs; CSA tree; sum and carry outputs holding the upper part of the cumulative partial product (stored-carry); h-bit adder with FF; lower part of the cumulative partial product; final adder]

Efficiently pipelined partial-tree multiplier.

[Figure blocks: h inputs; pipelined CSA tree with latches between levels; two further CSA levels and a latch, forming an (h + 2)-input CSA tree; sum and carry outputs; h-bit adder with FF; lower part of the cumulative partial product; final adder]


Pipelined Array Multipliers

With latches after every FA level, the maximum throughput is achieved

Latches may be inserted after every h FA levels for an intermediate design

Pipelined 5×5 array multiplier using latched FA blocks. The small shaded boxes are latches.

[Figure: 5 × 5 array with inputs a4 … a0 and x4 … x0 and outputs p9 … p0; legend: latched FA with AND gate, latch, FA]

Example: 3-stage pipeline


Variations in Multipliers

Chapter Goals
Learn additional methods for synthesizing fast multipliers as well as other types of multipliers (bit-serial, modular, etc.)

Chapter Highlights
Building a multiplier from smaller units
Performing multiply-add as one operation
Bit-serial and (semi)systolic multipliers
Using a multiplier for squaring is wasteful


Divide-and-Conquer Designs

Building a wide multiplier from narrower ones

Divide-and-conquer (recursive) strategy for synthesizing a 2b × 2b multiplier from b × b multipliers.

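[Figure: operands a = (aH, aL) and x = (xH, xL), each half b bits wide; the four partial products aL xL, aL xH, aH xL, and aH xH, each 2b bits wide, are shown first in their natural positions and then rearranged for the 2b-by-2b multiplication, with aH xH and aL xL side by side in one row; the remaining addition spans 3b bits]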
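The decomposition can be written out directly. The sketch below is an illustration, not the slide's hardware: the function name `multiply_2b` and the `mul_b` parameter (standing in for a b × b multiplier block) are hypothetical. It forms the four b × b products and adds them at offsets 0, b, b, and 2b.

```python
def multiply_2b(a, x, b, mul_b=lambda u, v: u * v):
    """Multiply two 2b-bit unsigned numbers using four b-by-b multiplications."""
    mask = (1 << b) - 1
    a_hi, a_lo = a >> b, a & mask
    x_hi, x_lo = x >> b, x & mask
    # a * x = aH xH * 2^(2b) + (aH xL + aL xH) * 2^b + aL xL
    return ((mul_b(a_hi, x_hi) << (2 * b))
            + ((mul_b(a_hi, x_lo) + mul_b(a_lo, x_hi)) << b)
            + mul_b(a_lo, x_lo))
```

Because aH xH and aL xL occupy disjoint bit positions, they can share one row, leaving only three rows to add; this is why the next slide pairs the 2b × 2b case with (3; 2)-counters.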


General Structure of a Recursive Multiplier

2b × 2b use (3; 2)-counters
3b × 3b use (5; 2)-counters
4b × 4b use (7; 2)-counters

Using b × b multipliers to synthesize 2b × 2b, 3b × 3b, and 4b × 4b multipliers.

[Figure: dot-notation layouts for the b × b, 2b × 2b, 3b × 3b, and 4b × 4b cases]


An 8 × 8 Multiplier Using 4 × 4 Multipliers

[Figure: four 4 × 4 Multiply blocks compute aH xH, aL xH, aH xL, and aL xL from the operand halves a[4, 7], a[0, 3], x[4, 7], x[0, 3]; their 8-bit outputs cover bit ranges [8, 15], [4, 11], [4, 11], and [0, 7]; Add blocks (8 and 12 bits wide, with zero-padded inputs) combine them into p[12, 15], p[8, 11], p[4, 7], and p[0, 3]]


Additive Multiply Modules

Additive multiply module with 2 × 4 multiplier (ax) plus 4-bit and 2-bit additive inputs (y and z).

[Figure: (a) block diagram with inputs a, x, y, z and carry-in, a 4-bit adder, and output p; (b) the same module in dot notation]

p = ax + y + z

(a) Block diagram (b) Dot notation

b × c AMM
b-bit and c-bit multiplicative inputs
b-bit and c-bit additive inputs
(b + c)-bit output

$(2^b - 1) \times (2^c - 1) + (2^b - 1) + (2^c - 1) = 2^{b+c} - 1$
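The identity above says that the largest possible result exactly fills the (b + c)-bit output. A small behavioral sketch makes this concrete; the function name `amm` and its argument order are illustrative, not taken from the slides.

```python
def amm(a, x, y, z, b, c):
    """Behavioral model of a b-by-c additive multiply module: p = a*x + y + z."""
    assert 0 <= a < 2 ** b and 0 <= y < 2 ** b   # b-bit inputs
    assert 0 <= x < 2 ** c and 0 <= z < 2 ** c   # c-bit inputs
    p = a * x + y + z
    assert p < 2 ** (b + c)                      # always fits in b + c bits
    return p
```

With all four inputs at their maximum values, the output is all 1s, so no output bit is wasted and no overflow can occur.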


Multiplier Built of AMMs

An 8 × 8 multiplier built of 4×2 AMMs. Inputs marked with an asterisk carry 0s.

[Figure: two columns of 4 × 2 AMMs; one column multiplies a[0, 3] and the other a[4, 7] by the 2-bit multiplier groups x[0, 1], x[2, 3], x[4, 5], x[6, 7]; intermediate results cover bit ranges such as [2, 5], [4, 7], [6, 9], [8, 11], and [10, 13]; outputs are p[0, 1], p[2, 3], p[4, 5], p[6, 7], p[8, 9], p[10, 11], and p[12, 15]. Legend: thin lines carry 2 bits, thick lines carry 4 bits.]

Understanding an 8 × 8 multiplier built of 4 × 2 AMMs using dot notation


Bit-Serial Multipliers

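[Figure: bit-serial adder (LSB first), built from one FA and one FF, with inputs x0 y0, x1 y1, x2 y2, … and outputs s0, s1, s2, …; bit-serial multiplier (contents marked "?") with inputs a0 x0, a1 x1, a2 x2, … and outputs p0, p1, p2, …]

Systolic arrays: synchronous arrays of processing elements that are interconnected by only short, local wires, thus allowing very high clock rates.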


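Semisystolic Serial-Parallel Multiplier

[Figure: a row of four FA cells with sum and carry flip-flops; the multiplicand bits a3 a2 a1 a0 enter in parallel, the multiplier bits x0 x1 x2 x3 enter serially (LSB first) and are broadcast to all cells; the product leaves serially from the rightmost cell]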

Semi-systolic circuit for 4 × 4 multiplication in 8 clock cycles.

This is called “semisystolic” because it has a large signal fan-out of k (k-way broadcasting) and a long wire spanning all k positions
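A cycle-by-cycle model helps show why 2k clock cycles are needed. The sketch below is a behavioral approximation, not a gate-accurate copy of the figure: the function name `serial_parallel_multiply` and the register arrays `s` and `c` are illustrative. Each cell adds its partial-product bit, the stored sum of its left neighbor, and its own stored carry.

```python
def serial_parallel_multiply(a, x, k=4):
    """Carry-save serial-parallel multiplication of two k-bit unsigned numbers."""
    a_bits = [(a >> i) & 1 for i in range(k)]
    s = [0] * (k + 1)   # sum flip-flops; s[k] stays 0 (nothing left of cell k-1)
    c = [0] * k         # carry flip-flops, one per cell
    product = 0
    for cycle in range(2 * k):
        # Multiplier bits arrive LSB first; zeros are fed after the k-th bit.
        x_bit = (x >> cycle) & 1 if cycle < k else 0
        new_s = [0] * (k + 1)
        new_c = [0] * k
        for i in range(k):
            # x_bit is broadcast to every cell (the k-way fan-out noted above).
            total = (a_bits[i] & x_bit) + s[i + 1] + c[i]
            new_s[i] = total & 1
            new_c[i] = total >> 1
        s, c = new_s, new_c
        product |= s[0] << cycle   # cell 0 emits one product bit per cycle
    return product
```

For k = 4 the loop runs 8 times, matching the 8 clock cycles in the caption.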


Systolic Retiming as a Design Tool

Example of retiming by delaying the inputs to CL and advancing the outputs from CL by d units

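[Figure: a cut separates subcircuit CL from CR. Original delays: e and f on the signals entering CL, g and h on the signals leaving CL. Adjusted delays: e+d and f+d on the signals entering CL, g–d and h–d on the signals leaving CL.]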

A semisystolic circuit can be converted to a systolic circuit via retiming, which involves advancing and retarding signals by means of delay removal and delay insertion in such a way that the relative timings of various parts are unaffected


A First Attempt at Retiming

A retimed version of our semi-systolic multiplier.

[Figure: the semisystolic multiplier (multiplicand a3 a2 a1 a0 in parallel, multiplier x0 x1 x2 x3 serial in, LSB first, product serial out) with Cut 1, Cut 2, and Cut 3 marked, shown next to its retimed version]


Deriving a Fully Systolic Multiplier

A retimed version of our semi-systolic multiplier.

[Figure: the retimed multiplier from the previous slide and the fully systolic multiplier derived from it; both take the multiplicand a3 a2 a1 a0 in parallel and the multiplier x0 x1 x2 x3 serially (LSB first), and deliver the product serially]


A Direct Design for a Bit-Serial Multiplier

Fig. 12.13 Bit-serial multiplier design in dot notation.

[Figure: (a) Structure of the bit-matrix; (b) Reduction after each input bit. In step i, the new bits a_i and x_i contribute the terms a_i x_i, a_i x^(i–1), and x_i a^(i–1), which are added to the part already accumulated into three numbers; the result 2p^(i) is shifted right to obtain p^(i), and one more product bit joins those already output.]

[Figure: cell built around a (5; 3)-counter and a Mux, with signals a_i, x_i, s_in/s_out, c_in/c_out, t_in/t_out, and product output p_i; identical cells are chained starting from the LSB cell]

Building block for a latency-free bit-serial multiplier.

The cellular structure of the bit-serial multiplier based on the cell in Fig. 12.11.


Modular Multipliers

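[Figure: a row of FA cells in which the carry out of the most significant position wraps around to the least significant position; a tree of 4-bit mod-15 CSAs followed by a mod-15 CPA; bits of weight 16 are divided by 16 and re-enter at weight 1]

Modulo-$(2^b - 1)$ carry-save adder.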

Design of a 4 × 4 modulo-15 multiplier.
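The wrap-around works because $2^b \equiv 1 \pmod{2^b - 1}$, so any bit of weight 16 can be moved to weight 1. The following arithmetic-level sketch (not a gate-level model of the CSA tree; the names `fold_mod` and `modmul15` are illustrative) applies that folding to an ordinary product.

```python
def fold_mod(v, b=4):
    """Reduce v modulo 2^b - 1 by repeatedly adding the high part to the low part."""
    m = (1 << b) - 1
    while v > m:
        v = (v & m) + (v >> b)   # bits of weight 2^b re-enter at weight 1
    return 0 if v == m else v    # the all-1s pattern is a second encoding of 0


def modmul15(a, x):
    """4-by-4 multiplication modulo 15."""
    return fold_mod(a * x, 4)
```

In hardware the folding is done on the partial-product dots before any carry propagation, which is what keeps every CSA in the tree only 4 bits wide.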


Other Examples of Modular Multiplication

One way to design a 4 × 4 modulo-13 multiplier.

16 mod 13 = 3, so a dot of weight 16 is replaced by two dots, of weights 2 and 1.


Squaring

[Figure: Multiply x by x. The 5 × 5 partial-product matrix with terms x_i x_j is simplified using x_i x_i = x_i and x_i x_j + x_j x_i = 2 x_i x_j, which moves each merged term one column to the left. The result has product bits p9 … p2, followed by 0 in position 1 and x0 in position 0.]

Design of a 5-bit squarer.

x1x0 –x1x0
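The simplification can be verified with a short sketch. The function name `square_folded` is illustrative; the code only uses the two identities $x_i x_i = x_i$ and $x_i x_j + x_j x_i = 2 x_i x_j$, so it sums roughly half as many terms as a general multiplier would.

```python
def square_folded(x, k=5):
    """Square a k-bit unsigned number from the folded partial-product matrix."""
    bits = [(x >> i) & 1 for i in range(k)]
    total = 0
    for i in range(k):
        total += bits[i] << (2 * i)                       # diagonal: x_i x_i = x_i
        for j in range(i + 1, k):
            total += (bits[i] & bits[j]) << (i + j + 1)   # pair: 2 x_i x_j
    return total
```

No term lands in position 1, and only x0 lands in position 0, which is why the two low-order output bits are 0 and x0.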


Constant Multiplier

Source: Ercegovac and Lang, “Digital Arithmetic”, p. 224


Multiple Constant Multiplier

Source: Ercegovac and Lang, “Digital Arithmetic”, pp. 225

Computer Arithmetic 4, Dept. of EE, Fu Jen Catholic University, Taiwan

Division

Instructor: Kuan Jen Lin
E-Mail: [email protected]
Dept. of EE, FJU, Taiwan
Room: SF 727B

Most slides are revisions of PowerPoint files obtained from the textbook website.


Division

Topics in This Part

Chapter 13 Basic Division Schemes
Chapter 14 High-Radix Dividers
Chapter 15 Variations in Dividers
Chapter 16 Division by Convergence

Review division schemes and various speedup methods
• Hardest basic operation (fortunately, also the rarest)
• Division speedup methods: high-radix, array, . . .
• Combined multiplication/division hardware
• Digit-recurrence vs convergence division schemes


13 Basic Division Schemes

Chapter Goals
Study shift/subtract or bit-at-a-time dividers and set the stage for faster methods and variations to be covered in Chapters 14-16

Chapter Highlights
Shift/subtract divide vs shift/add multiply
Hardware, firmware, software algorithms
Dividing 2’s-complement numbers
The special case of a constant divisor


Shift/Subtract Division Algorithms

Notation for our discussion of division algorithms:

z    Dividend                   $z_{2k-1} z_{2k-2} \ldots z_3 z_2 z_1 z_0$
d    Divisor                    $d_{k-1} d_{k-2} \ldots d_1 d_0$
q    Quotient                   $q_{k-1} q_{k-2} \ldots q_1 q_0$
s    Remainder, z – (d × q)     $s_{k-1} s_{k-2} \ldots s_1 s_0$

Initially, we assume unsigned operands

Division of an 8-bit number by a 4-bit number in dot notation.

[Figure: the dividend z, minus the subtracted bit-matrix with rows $-q_3 d\,2^3$, $-q_2 d\,2^2$, $-q_1 d\,2^1$, $-q_0 d\,2^0$, leaves the remainder s; the divisor d and quotient q are shown alongside]


Division versus Multiplication (1/2)

Division is more complex than multiplication:

Need for quotient digit selection or estimation

Overflow possibility: the high-order k bits of z must be strictly less than d; the quotient of a 2k bit number divided by a k bit number may have a width of more than k bits.



Division versus Multiplication (2/2)

Pentium III latencies

Instruction                   Latency    Cycles/Issue
Load / Store                  3          1
Integer Multiply              4          1
Integer Divide                36         36
Double/Single FP Multiply     5          2
Double/Single FP Add          3          1
Double/Single FP Divide       38         38


Division Recurrence

Division with left shifts

$s^{(j)} = 2s^{(j-1)} - q_{k-j}(2^k d)$ with $s^{(0)} = z$ and $s^{(k)} = 2^k s$

Here the doubling $2s^{(j-1)}$ is the shift, and removing $q_{k-j}(2^k d)$ from it is the subtract.

(There is no corresponding right-shift algorithm)


Integer division is characterized by z = d × q + s

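$2^{-2k} z = (2^{-k} d) \times (2^{-k} q) + 2^{-2k} s$

$z_{frac} = d_{frac} \times q_{frac} + 2^{-k} s_{frac}$

Divide fractions like integers; adjust the remainder

No-overflow condition for fractions is:

$z_{frac} < d_{frac}$

[Figure: alignment of the dividend (two k-bit halves) with $2^k d$, the divisor followed by k zeros]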
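The recurrence translates almost line for line into code. This is a minimal sketch for unsigned integers; the function name `shift_subtract_divide` is illustrative, and the quotient digit is chosen by a plain comparison, which is only one of several possible selection rules.

```python
def shift_subtract_divide(z, d, k):
    """Divide a 2k-bit unsigned z by a k-bit unsigned d using left shifts."""
    if d == 0:
        raise ZeroDivisionError("divisor is zero")
    if (z >> k) >= d:
        raise OverflowError("quotient would need more than k bits")
    s = z                      # s(0) = z
    q = 0
    for _ in range(k):
        s = 2 * s              # shift
        digit = 1 if s >= (d << k) else 0
        s -= digit * (d << k)  # subtract q_{k-j} * 2^k * d
        q = (q << 1) | digit
    return q, s >> k           # s(k) = 2^k * s, so shift the remainder back
```

Calling `shift_subtract_divide(117, 10, 4)` follows the recurrence for z = (0111 0101)two and d = (1010)two and returns the quotient 11 and remainder 7.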


Division Recurrence Steps

Initialization

Iterations
    One-digit arithmetic left shift of s(j) to produce rs(j)
    Determination of the quotient digit q_j+1 by the quotient-digit selection function (the index of q could be different)
    Generation of the divisor multiple d × q_j+1
    Subtraction of d q_j+1 from rs(j)
    On-the-fly conversion of the quotient (or done in the termination step)

Termination: make sign(s) = sign(d), conversion


Examples of Basic Division

Integer division
======================================
z             0 1 1 1 0 1 0 1
2^4 d         1 0 1 0
======================================
s(0)          0 1 1 1 0 1 0 1
2s(0)       0 1 1 1 0 1 0 1
–q3 2^4 d     1 0 1 0              {q3 = 1}
--------------------------------------
s(1)          0 1 0 0 1 0 1
2s(1)       0 1 0 0 1 0 1
–q2 2^4 d     0 0 0 0              {q2 = 0}
--------------------------------------
s(2)          1 0 0 1 0 1
2s(2)       1 0 0 1 0 1
–q1 2^4 d     1 0 1 0              {q1 = 1}
--------------------------------------
s(3)          1 0 0 0 1
2s(3)       1 0 0 0 1
–q0 2^4 d     1 0 1 0              {q0 = 1}
--------------------------------------
s(4)          0 1 1 1
s                     0 1 1 1
q                     1 0 1 1
======================================

Fractional division
======================================
z_frac        . 0 1 1 1 0 1 0 1
d_frac        . 1 0 1 0
======================================
s(0)          . 0 1 1 1 0 1 0 1
2s(0)       0 . 1 1 1 0 1 0 1
–q–1 d        . 1 0 1 0            {q–1 = 1}
--------------------------------------
s(1)          . 0 1 0 0 1 0 1
2s(1)       0 . 1 0 0 1 0 1
–q–2 d        . 0 0 0 0            {q–2 = 0}
--------------------------------------
s(2)          . 1 0 0 1 0 1
2s(2)       1 . 0 0 1 0 1
–q–3 d        . 1 0 1 0            {q–3 = 1}
--------------------------------------
s(3)          . 1 0 0 0 1
2s(3)       1 . 0 0 0 1
–q–4 d        . 1 0 1 0            {q–4 = 1}
--------------------------------------
s(4)          . 0 1 1 1
s_frac      0 . 0 0 0 0 0 1 1 1
q_frac        . 1 0 1 1
======================================

Notice the index of q

What is the residual of (0.011)two / 0.1?


Main Factors Affecting the Overall Execution Time and Cost

Radix r
Quotient-digit set
    Redundant signed digit?
Representation of the residual
    CSA?
Quotient-digit selection


Programmed Division

Register usage for programmed division.

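[Figure: registers Rs and Rq, side by side, hold the shifted partial remainder (partial remainder, 2k – j bits) and the shifted partial quotient (partial quotient, j bits), with the carry flag to the left of Rs; the next quotient digit is inserted at the right end of Rq; register Rd holds the divisor d, which is aligned as $2^k d$ (d followed by k zeros)]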


Assembly Language Program for Division

Programmed division using left shifts.

    {Using left shifts, divide unsigned 2k-bit dividend,
     z_high|z_low, storing the k-bit quotient and remainder.
     Registers: R0 holds 0        Rc for counter
                Rd for divisor    Rs for z_high & remainder
                Rq for z_low & quotient}

    {Load operands into registers Rd, Rs, and Rq}
    div:     load   Rd with divisor
             load   Rs with z_high
             load   Rq with z_low

    {Check for exceptions}
             branch d_by_0 if Rd = R0
             branch d_ovfl if Rs ≥ Rd

    {Initialize counter}
             load   k into Rc

    {Begin division loop}
    d_loop:  shift  Rq left 1      {zero to LSB, MSB to carry}
             rotate Rs left 1      {carry to LSB, MSB to carry}
             skip   if carry = 1
             branch no_sub if Rs < Rd
             sub    Rd from Rs
             incr   Rq             {set quotient digit to 1}
    no_sub:  decr   Rc             {decrement counter by 1}
             branch d_loop if Rc ≠ 0

    {Store the quotient and remainder}
             store  Rq into quotient
             store  Rs into remainder

    d_by_0:  ...
    d_ovfl:  ...
    d_done:  ...
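The register-level behavior of the loop can be simulated to check it against ordinary division. The sketch below is an illustrative model, not the slide's machine: the function name `programmed_divide` and the Python exceptions are assumptions, and the overflow test uses "greater than or equal", following the requirement that the high-order k bits of z be strictly less than d.

```python
def programmed_divide(z_high, z_low, d, k):
    """Simulate the shift/rotate/subtract loop on k-bit registers Rs, Rq, Rd."""
    mask = (1 << k) - 1
    rd, rs, rq = d, z_high, z_low
    if rd == 0:
        raise ZeroDivisionError("d_by_0")
    if rs >= rd:
        raise OverflowError("d_ovfl")
    for _ in range(k):
        carry = rq >> (k - 1)             # shift Rq left: MSB goes to carry
        rq = (rq << 1) & mask             # zero enters the LSB
        msb = rs >> (k - 1)
        rs = ((rs << 1) | carry) & mask   # rotate Rs left: carry enters the LSB
        carry = msb                       # old MSB of Rs is now the carry
        # Subtract if the (k+1)-bit value carry:Rs is at least Rd.
        if carry == 1 or rs >= rd:
            rs = (rs - rd) & mask
            rq += 1                       # set quotient digit to 1
    return rq, rs                         # quotient, remainder
```

The `skip if carry = 1` instruction corresponds to the `carry == 1` test: when a 1 is shifted out of Rs, the true partial remainder exceeds any k-bit divisor, so the subtraction must happen regardless of how the k-bit registers compare.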



Time Complexity of Programmed Division

Assume k-bit words

k iterations of the main loop
6 or 8 instructions per iteration, depending on the quotient bit

Thus, 6k + 3 to 8k + 3 machine instructions, ignoring operand loads and result store

k = 32 implies 220+ instructions on average

This is too slow for many modern applications!

Microprogrammed division would be somewhat better


Restoring Hardware Dividers

Shift/subtract sequential restoring divider.

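[Figure blocks: quotient register q (shift); partial remainder register s (initial value z; shift, load); divisor register d; Mux with inputs 0 and 1; k-bit adder with c_in = 1 and c_out, producing the trial difference; quotient digit selector, which takes c_out and the MSB of $2s^{(j-1)}$ and produces $q_{k-j}$]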


Indirect Signed DivisionIn division with signed operands, q and s are defined by

z = d × q + s sign(s) = sign(z) |s | < |d |

Examples of division with signed operands

z = 5    d = 3    ⇒   q = 1    s = 2
z = 5    d = –3   ⇒   q = –1   s = 2
z = –5   d = 3    ⇒   q = –1   s = –2   (not q = –2, s = +1)
z = –5   d = –3   ⇒   q = 1    s = –2

Magnitudes of q and s are unaffected by input signs
Signs of q and s are derivable from signs of z and d

Will discuss direct signed division later
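
A minimal sketch of the indirect method described above: divide the magnitudes with an unsigned divider, then attach the signs. The function name is hypothetical, and Python's built-in `divmod` stands in for the unsigned hardware divider.

```python
def indirect_signed_divide(z, d):
    """Signed division via unsigned division of magnitudes.

    Returns (q, s) with z = d*q + s, sign(s) = sign(z), |s| < |d|.
    """
    if d == 0:
        raise ZeroDivisionError("division by zero")
    q_mag, s_mag = divmod(abs(z), abs(d))      # unsigned division
    q = -q_mag if (z < 0) != (d < 0) else q_mag  # sign(q) from signs of z and d
    s = -s_mag if z < 0 else s_mag               # sign(s) follows the dividend
    return q, s


for z, d in [(5, 3), (5, -3), (-5, 3), (-5, -3)]:
    print(z, d, indirect_signed_divide(z, d))
```

The four calls reproduce the four examples listed on the slide.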


Example of Restoring Unsigned Division

=======================
z            0 1 1 1 0 1 0 1
2^4 d        0 1 0 1 0
–2^4 d       1 0 1 1 0
=======================
s(0)         0 0 1 1 1 0 1 0 1
2s(0)        0 1 1 1 0 1 0 1
+(–2^4 d)    1 0 1 1 0
––––––––––––––––––––––––
s(1)         0 0 1 0 0 1 0 1      Positive, so set q3 = 1
2s(1)        0 1 0 0 1 0 1
+(–2^4 d)    1 0 1 1 0
––––––––––––––––––––––––
s(2)         1 1 1 1 1 0 1        Negative, so set q2 = 0
s(2)=2s(1)   0 1 0 0 1 0 1        and restore
2s(2)        1 0 0 1 0 1
+(–2^4 d)    1 0 1 1 0
––––––––––––––––––––––––
s(3)         0 1 0 0 0 1          Positive, so set q1 = 1
2s(3)        1 0 0 0 1
+(–2^4 d)    1 0 1 1 0
––––––––––––––––––––––––
s(4)         0 0 1 1 1            Positive, so set q0 = 1
s            0 1 1 1
q            1 0 1 1
=======================

No overflow, because (0111)two < (1010)two
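
The restoring example above can be reproduced with a short sketch. The function name and the integer representation are assumptions for illustration: z is a 2k-bit unsigned dividend, d a k-bit unsigned divisor, and the partial remainder is compared against 2^k d instead of being held in shifting registers.

```python
def restoring_divide(z, d, k):
    """Unsigned restoring division of a 2k-bit z by a k-bit d."""
    if d == 0:
        raise ZeroDivisionError("division by zero")
    if (z >> k) >= d:
        raise OverflowError("quotient does not fit in k bits")
    s = z            # partial remainder s(0)
    D = d << k       # 2^k d
    q = 0
    for _ in range(k):
        trial = 2 * s - D          # trial difference
        if trial >= 0:
            s = trial              # keep the difference, quotient bit 1
            q = (q << 1) | 1
        else:
            s = 2 * s              # restore (keep shifted value), quotient bit 0
            q = q << 1
    return q, s >> k               # s(k) = 2^k s


print(restoring_divide(0b01110101, 0b1010, 4))  # → (11, 7)
```

The result (11, 7) corresponds to q = (1011)two and s = (0111)two in the table.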


Nonrestoring and Signed Division

The cycle time in restoring division must be long enough to allow:

Shifting the registers
Allowing signals to propagate through the adder
Determining and storing the next quotient digit
Storing the trial difference, if required

[Figure: the shift/subtract sequential restoring divider, repeated from the earlier slide.]

Nonrestoring division to the rescue!

Assume q_{k–j} = 1 and subtract
Store the result as the new partial remainder (PR)

(the partial remainder can become incorrect, hence the name “nonrestoring”)


Justification for Nonrestoring Division

Why is it acceptable to store an incorrect value in the partial-remainder register?

Shifted partial remainder at start of the cycle is u

Suppose subtraction yields the negative result u – 2^k d

Option 1: Restore the partial remainder to correct value u, shift left, and subtract to get 2u – 2^k d

Option 2: Keep the incorrect partial remainder u – 2^k d, shift left, and add to get 2(u – 2^k d) + 2^k d = 2u – 2^k d

Both options produce the same next partial remainder, so the restoration step can be skipped.


Example of Nonrestoring Unsigned Division

=======================
z            0 1 1 1 0 1 0 1
2^4 d        0 1 0 1 0
–2^4 d       1 0 1 1 0
=======================
s(0)         0 0 1 1 1 0 1 0 1
2s(0)        0 1 1 1 0 1 0 1      Positive,
+(–2^4 d)    1 0 1 1 0            so subtract
––––––––––––––––––––––––
s(1)         0 0 1 0 0 1 0 1
2s(1)        0 1 0 0 1 0 1        Positive, so set q3 = 1
+(–2^4 d)    1 0 1 1 0            and subtract
––––––––––––––––––––––––
s(2)         1 1 1 1 1 0 1
2s(2)        1 1 1 1 0 1          Negative, so set q2 = 0
+2^4 d       0 1 0 1 0            and add
––––––––––––––––––––––––
s(3)         0 1 0 0 0 1
2s(3)        1 0 0 0 1            Positive, so set q1 = 1
+(–2^4 d)    1 0 1 1 0            and subtract
––––––––––––––––––––––––
s(4)         0 0 1 1 1            Positive, so set q0 = 1
s            0 1 1 1
q            1 0 1 1
=======================

No overflow: (0111)two < (1010)two

Applying “if sign(s) = sign(d) then q_{k–j} = 1 else q_{k–j} = −1”, we get the digits 1 1 −1 1, which equal 1011 (8 + 4 − 2 + 1 = 11)
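
A minimal sketch of unsigned nonrestoring division, following the example above. The function name is hypothetical; the quotient bit is recorded as 1 when the new partial remainder is nonnegative, and a final corrective addition is applied if the last partial remainder is negative.

```python
def nonrestoring_divide(z, d, k):
    """Unsigned nonrestoring division of a 2k-bit z by a k-bit d."""
    if d == 0:
        raise ZeroDivisionError("division by zero")
    s = z            # partial remainder, may become negative
    D = d << k       # 2^k d
    q = 0
    for _ in range(k):
        if s >= 0:
            s = 2 * s - D   # positive: shift and subtract
        else:
            s = 2 * s + D   # negative: shift and add (no restoration)
        q = (q << 1) | (1 if s >= 0 else 0)
    if s < 0:
        s += D              # final correction of the remainder
    return q, s >> k


print(nonrestoring_divide(117, 10, 4))  # → (11, 7)
```

The intermediate values of s are 74, −12, 136, and 112, matching s(1) through s(4) in the table.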


Graphical Depiction of Nonrestoring Division

[Figure: Partial remainder variations for the example (0 1 1 1 0 1 0 1)two / (1 0 1 0)two, that is, (117)ten / (10)ten. Each step doubles the partial remainder and then subtracts or adds 2^4 d = 160.
(a) Restoring: 117 → 234 → 74 → 148 → –12 (restored to 148) → 296 → 136 → 272 → 112 = 16s
(b) Nonrestoring: 117 → 234 → 74 → 148 → –12 → –24 → 136 → 272 → 112 = 16s]


Nonrestoring Division with Signed Operands

Restoring division
q_{k–j} = 0 means no subtraction (or subtraction of 0)
q_{k–j} = 1 means subtraction of d

Nonrestoring division
We always subtract or add
It is as if quotient digits are selected from the set {1, −1}:
1 corresponds to subtraction
−1 corresponds to addition

Our goal is to end up with a remainder that matches the sign of the dividend

This idea of trying to match the sign of s with the sign of z leads to a direct signed division algorithm

if sign(s) = sign(d) then q_{k–j} = 1 else q_{k–j} = −1

Example: q = . . . 0 0 0 1 . . . is represented as . . . 1 −1 −1 −1 . . . (same value: 8 − 4 − 2 − 1 = 1)


Quotient Conversion and Final Correction

[Figure: Partial remainder variation and selected quotient digits during nonrestoring division with d > 0. Starting from z, each step doubles the partial remainder (×2) and then adds or subtracts d, keeping it between −d and d.]

Quotient with digits −1 and 1:     −1  1 −1 −1  1  1
Replace −1s with 0s:                0  1  0  0  1  1
Shift left, complement MSB, and set LSB to 1 to get the 2’s-complement quotient:   1 1 0 0 1 1 1

Check: −32 + 16 – 8 – 4 + 2 + 1 = −25 = −64 + 32 + 4 + 2 + 1

Final correction step if sign(s) ≠ sign(z):
Add d to, or subtract d from, s; subtract 1 from, or add 1 to, q

1 1 0 1 0 0 0
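
The conversion rule for a quotient with digits −1 and 1 can be checked with a short sketch. The function name and the list-of-digits representation (most significant digit first) are assumptions made for illustration.

```python
def bsd_to_twos_complement(digits):
    """Convert a k-digit quotient with digits in {-1, 1} to (k+1)-bit 2's complement.

    Returns (bits, value) where bits is the 2's-complement bit list, MSB first.
    """
    k = len(digits)
    p = [1 if x == 1 else 0 for x in digits]  # replace -1s with 0s
    bits = p + [1]                            # shift left, set LSB to 1
    bits[0] ^= 1                              # complement the MSB
    # Interpret bits as a (k+1)-bit 2's-complement number.
    value = -bits[0] * 2 ** k
    for i in range(1, k + 1):
        value += bits[i] << (k - i)
    return bits, value


digits = [-1, 1, -1, -1, 1, 1]
direct = sum(x << (len(digits) - 1 - i) for i, x in enumerate(digits))
print(direct)                          # → -25
print(bsd_to_twos_complement(digits))  # → ([1, 1, 0, 0, 1, 1, 1], -25)
```

The rule works because a digit string with p as its "−1 replaced by 0" image has the value 2p + 1 − 2^k.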


Example of Nonrestoring Signed Division

========================
z            0 0 1 0 0 0 0 1
2^4 d        1 1 0 0 1
–2^4 d       0 0 1 1 1
========================
s(0)         0 0 0 1 0 0 0 0 1
2s(0)        0 0 1 0 0 0 0 1      sign(s(0)) ≠ sign(d),
+2^4 d       1 1 0 0 1            so set q3 = −1 and add
––––––––––––––––––––––––
s(1)         1 1 1 0 1 0 0 1
2s(1)        1 1 0 1 0 0 1        sign(s(1)) = sign(d),
+(–2^4 d)    0 0 1 1 1            so set q2 = 1 and subtract
––––––––––––––––––––––––
s(2)         0 0 0 0 1 0 1
2s(2)        0 0 0 1 0 1          sign(s(2)) ≠ sign(d),
+2^4 d       1 1 0 0 1            so set q1 = −1 and add
––––––––––––––––––––––––
s(3)         1 1 0 1 1 1
2s(3)        1 0 1 1 1            sign(s(3)) = sign(d),
+(–2^4 d)    0 0 1 1 1            so set q0 = 1 and subtract
––––––––––––––––––––––––
s(4)         1 1 1 1 0            sign(s(4)) ≠ sign(z),
+(–2^4 d)    0 0 1 1 1            so perform corrective subtraction
––––––––––––––––––––––––
s(4)         0 0 1 0 1
s            0 1 0 1
q            −1 1 −1 1
========================

p =  0 1 0 1       Shift, complement MSB
   1 1 0 1 1       Add 1 to correct
     1 1 0 0       Check: 33/(−7) = −4
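
A sketch of direct signed nonrestoring division, following the example above. The function name is hypothetical, and two details are assumptions beyond the slide: a zero partial remainder is treated as positive, and the final correction is also applied when the remainder magnitude equals |d| (which can occur for exact divisions), so that |s| < |d| always holds.

```python
def nonrestoring_signed_divide(z, d, k):
    """Signed nonrestoring division; returns (q, s) with sign(s) = sign(z)."""
    if d == 0:
        raise ZeroDivisionError("division by zero")
    D = d << k          # 2^k d (keeps the sign of d)
    s = z
    q = 0
    for _ in range(k):
        s = 2 * s
        if (s < 0) == (d < 0):   # sign(s) = sign(d): digit 1, subtract
            digit = 1
            s -= D
        else:                    # signs differ: digit -1, add
            digit = -1
            s += D
        q = 2 * q + digit
    wrong_sign = s != 0 and (s < 0) != (z < 0)
    full_divisor = abs(s) == abs(D)   # assumption: also fix |s| = |d|
    if wrong_sign or full_divisor:
        if (s < 0) == (d < 0):
            s -= D
            q += 1
        else:
            s += D
            q -= 1
    return q, s >> k             # s(k) = 2^k s, an exact multiple of 2^k


print(nonrestoring_signed_divide(33, -7, 4))  # → (-4, 5)
```

For z = 33 and d = −7 the uncorrected digits are −1 1 −1 1 (value −5), and the corrective subtraction yields q = −4, s = 5, as in the table.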


On-The-Fly Conversion

Source: Ercegovac and Lang, “Digital Arithmetic”, p. 257


Nonrestoring Hardware Divider

Shift-subtract sequential nonrestoring divider.

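[Figure: block diagram with quotient and partial remainder registers, a divisor register, a complement stage controlled by add/sub, and a k-bit adder (cin, cout). The divisor sign and the complement of the partial remainder sign (MSB of 2s(j–1)) determine q_{k–j} and the add/sub control.]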


Division by Constants

Software and hardware aspects: As was the case for multiplications by constants, optimizing compilers may replace some divisions by shifts/adds/subs; likewise, in custom VLSI circuits, hardware dividers may be replaced by simpler adders

Method 1: Find the reciprocal of the constant and multiply (particularly efficient if several numbers must be divided by the same divisor)

Method 2: Use the property that for each odd integer d, there exists an odd integer m such that d × m = 2^n – 1; hence, d = (2^n – 1)/m and

$$\frac{z}{d} = \frac{zm}{2^n - 1} = \frac{zm}{2^n (1 - 2^{-n})} = \frac{zm}{2^n}\,(1 + 2^{-n})(1 + 2^{-2n})(1 + 2^{-4n}) \cdots$$

The factor zm is a multiplication by a constant; the factors (1 + 2^{-n}), (1 + 2^{-2n}), . . . are realized by shift-adds

Number of shift-adds required is proportional to log k


Example: Division by a Constant

$$\frac{z}{d} = \frac{zm}{2^n - 1} = \frac{zm}{2^n (1 - 2^{-n})} = \frac{zm}{2^n}\,(1 + 2^{-n})(1 + 2^{-2n})(1 + 2^{-4n}) \cdots$$

Example: Dividing the number z by 5, assuming 24 bits of precision. We have d = 5, m = 3, n = 4; 5 × 3 = 2^4 – 1

$$\frac{z}{5} = \frac{3z}{2^4 - 1} = \frac{3z}{2^4 (1 - 2^{-4})} = \frac{3z}{16}\,(1 + 2^{-4})(1 + 2^{-8})(1 + 2^{-16}) \cdots$$

Instruction sequence for division by 5 (5 shifts, 4 adds)

q ← z + z shift-left 1       {3z computed}
q ← q + q shift-right 4      {3z(1 + 2^–4) computed}
q ← q + q shift-right 8      {3z(1 + 2^–4)(1 + 2^–8) computed}
q ← q + q shift-right 16     {3z(1 + 2^–4)(1 + 2^–8)(1 + 2^–16) computed}
q ← q shift-right 4          {3z(1 + 2^–4)(1 + 2^–8)(1 + 2^–16)/16 computed}
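
The instruction sequence can be tried out with a short sketch on nonnegative integers. The function names are hypothetical. Because every right shift truncates, the raw result may fall one below the exact integer quotient; the second function adds a remainder check, which is an addition made here and not part of the slide's sequence.

```python
def divide_by_5_shift_add(z):
    """Approximate z / 5 with 5 shifts and 4 adds (z a nonnegative integer)."""
    q = z + (z << 1)     # 3z
    q = q + (q >> 4)     # 3z(1 + 2^-4)
    q = q + (q >> 8)     # 3z(1 + 2^-4)(1 + 2^-8)
    q = q + (q >> 16)    # 3z(1 + 2^-4)(1 + 2^-8)(1 + 2^-16)
    return q >> 4        # divide by 16


def divide_by_5_corrected(z):
    """Same sequence followed by a remainder check (added assumption)."""
    q = divide_by_5_shift_add(z)
    if z - 5 * q >= 5:   # truncation made the estimate one too small
        q += 1
    return q


print(divide_by_5_shift_add(100), divide_by_5_corrected(100))  # → 19 20
```

For z = 100 the raw sequence gives 19 because of truncation in the shifts, and the remainder check restores the exact quotient 20.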


Preview of Fast Dividers

As with multiplication, there are only two ways to speed up division:
a. Reducing the number of operands (divide in a higher radix)
b. Adding them faster (keep partial remainder in carry-save form)

[Figure: dot-notation diagrams of (a) k × k integer multiplication, where the partial products x0 a 2^0, x1 a 2^1, x2 a 2^2, x3 a 2^3 are summed to give p, and (b) 2k / k integer division, where the terms –q3 d 2^3, –q2 d 2^2, –q1 d 2^1, –q0 d 2^0 are subtracted from z to give s.]

Both (a) multiplication and (b) division can be considered as multioperand addition problems.

There is one complication that makes division inherently more difficult: The terms to be subtracted from (added to) the dividend are not known a priori but become known as quotient digits are computed; quotient digits in turn depend on partial remainders


14 High-Radix Dividers

Chapter Goals
Study techniques that allow us to obtain more than one quotient bit in each cycle (two bits in radix 4, three in radix 8, . . .)

Chapter Highlights
Radix > 2 ⇒ quotient digit selection harder
Remedy: redundant quotient representation
Carry-save addition reduces cycle time
Implementation methods and tradeoffs


Basics of High-Radix Division

Division with left shifts

s(j) = r s(j–1) – q_{k–j} (r^k d)   with s(0) = z and s(k) = r^k s

Forming r s(j–1) is the shift step; removing q_{k–j} (r^k d) is the subtract step.

[Figure: Radix-4 division in dot notation. The terms (q3 q2)two d 4^1 and (q1 q0)two d 4^0 are subtracted from the dividend z, leaving the remainder s; the quotient q and the divisor d each have k digits.]


Examples of High-Radix Division

Radix-4 integer division
======================
z            0 1 2 3 1 1 2 3
4^4 d        1 2 0 3
======================
s(0)         0 1 2 3 1 1 2 3
4s(0)        0 1 2 3 1 1 2 3
–q3 4^4 d    0 1 2 0 3            {q3 = 1}
–––––––––––––––––––––––
s(1)         0 0 2 2 1 2 3
4s(1)        0 0 2 2 1 2 3
–q2 4^4 d    0 0 0 0 0            {q2 = 0}
–––––––––––––––––––––––
s(2)         0 2 2 1 2 3
4s(2)        0 2 2 1 2 3
–q1 4^4 d    0 1 2 0 3            {q1 = 1}
–––––––––––––––––––––––
s(3)         1 0 0 3 3
4s(3)        1 0 0 3 3
–q0 4^4 d    0 3 0 1 2            {q0 = 2}
–––––––––––––––––––––––
s(4)         1 0 2 1
s            1 0 2 1
q            1 0 1 2
======================

Radix-10 fractional division
=================
zfrac        . 7 0 0 3
dfrac        . 9 9
=================
s(0)         . 7 0 0 3
10s(0)       7 . 0 0 3
–q–1 d       6 . 9 3              {q–1 = 7}
––––––––––––––––––
s(1)         . 0 7 3
10s(1)       0 . 7 3
–q–2 d       0 . 0 0              {q–2 = 0}
––––––––––––––––––
s(2)         . 7 3
sfrac        . 0 0 7 3
qfrac        . 7 0
=================
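
Both examples follow the same recurrence, which a short sketch can reproduce on integers. The function name is hypothetical, and the quotient digit is found here with an exact integer division, whereas a hardware divider would use a digit-selection circuit.

```python
def high_radix_divide(z, d, k, r):
    """Radix-r division of a 2k-digit z by a k-digit d (unsigned integers).

    Returns (digits, s): the k quotient digits, most significant first,
    and the remainder s.
    """
    if d == 0:
        raise ZeroDivisionError("division by zero")
    s = z
    D = d * r ** k                 # r^k d
    digits = []
    for _ in range(k):
        s *= r                     # shift left by one radix-r digit
        digit = s // D             # exact digit selection (illustrative)
        if digit >= r:
            raise OverflowError("quotient does not fit in k digits")
        digits.append(digit)
        s -= digit * D             # subtract q_{k-j} (r^k d)
    return digits, s // r ** k     # s(k) = r^k s


# (0123 1123)four = 7003, (1203)four = 99
print(high_radix_divide(7003, 99, 4, 4))   # → ([1, 0, 1, 2], 73)
# .7003 / .99 in radix 10, scaled to integers
print(high_radix_divide(7003, 99, 2, 10))  # → ([7, 0], 73)
```

Here 73 equals (1021)four in the radix-4 example and corresponds to the remainder .0073 in the radix-10 example.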


Difficulty of Quotient Digit Selection

What is the first quotient digit in the following radix-10 division?

        _______________
2 0 4 3 | 1 2 2 5 7 9 6 8

The problem with the pencil-and-paper division algorithm is that there is no room for error in choosing the next quotient digit

In the worst case, all k digits of the divisor and k + 1 digits in the partial remainder are needed to make a correct choice

12 / 2 = 6
122 / 20 = 6
1225 / 204 = 6
12257 / 2043 = 5

Suppose we used the redundant signed digit set [–9, 9] in radix 10

Then, we could choose 6 as the next quotient digit, knowing that we can recover from an incorrect choice by using negative digits: 5 9 = 6 (–1), that is, 59 = 60 – 1
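
The four estimates above can be generated with a short sketch that uses progressively more leading digits of the divisor and the dividend. The function name and the string representation of the operands are assumptions for illustration.

```python
def truncated_quotient_estimates(dividend_digits, divisor_digits):
    """First-quotient-digit estimates from n divisor digits and n + 1 dividend digits."""
    estimates = []
    for n in range(1, len(divisor_digits) + 1):
        partial_dividend = int(dividend_digits[: n + 1])
        partial_divisor = int(divisor_digits[:n])
        estimates.append(partial_dividend // partial_divisor)
    return estimates


print(truncated_quotient_estimates("12257968", "2043"))  # → [6, 6, 6, 5]
```

Only the last estimate, which uses all four divisor digits, yields the correct digit 5.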


Radix-2 SRT Division (1/3)

[Figure: The new partial remainder, s(j), as a function of the shifted old partial remainder, 2s(j–1), in radix-2 nonrestoring division. The horizontal axis 2s(j–1) runs from –2d to 2d and the vertical axis s(j) from –d to d, with one line for q_{–j} = –1 and one for q_{–j} = 1.]

Algorithm in Ch 13.4

s(j) = 2s(j–1) – q_{–j} d   with s(0) = z, s(k) = 2^k s, and q_{–j} ∈ {−1, 1}


Diagrams for Quotient Selection

Robertson’s Diagram
Axes: the shifted residual 2s(j–1) and the next residual s(j)
It shows the possible choices of q that keep the next residual bounded.

P-D Diagram
Axes: the shifted residual (partial remainder) vs. the divisor


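Radix-2 SRT Division (2/3)

Allowing 0 as a quotient digit in nonrestoring division: q_{–j} = 0 for –d ≤ 2s(j–1) < d

s(j) = 2s(j–1) – q_{–j} d   with s(0) = z, s(k) = 2^k s, and q_{–j} ∈ {−1, 0, 1}

[Figure: The new partial remainder s(j) versus the shifted old partial remainder 2s(j–1), with lines for q_{–j} = –1, 0, and 1. The horizontal axis runs from –2d to 2d and the vertical axis from –d to d.]

q_{–j} = 0 requires shifting only, which was faster than shift-and-subtract
But how can you tell if –d ≤ 2s(j–1) < d?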


Radix-2 SRT Division (3/3)

[Figure: The relationship between new and old partial remainders in radix-2 SRT division. Lines for q_{–j} = –1, 0, and 1 are shown, with the comparison constants –1/2 and 1/2 and the limits –1 and 1 marked.]

Comparison with constants −½ and ½ is quite simple
2s ≥ +½ means 2s = (0.1xxxxxxxx)2’s-compl
2s < −½ means 2s = (1.0xxxxxxxx)2’s-compl

if 2s(j–1) < −½
    then q_{–j} = −1
    else if 2s(j–1) ≥ ½
        then q_{–j} = 1
        else q_{–j} = 0
    endif
endif


Radix-2 SRT Division with Variable Shifts

s(0) is adjusted to be in [−½, ½).
We use the comparison constants −½ and ½ for quotient digit selection

For 2s ≥ +½ or 2s = (0.1xxxxxxxx)2’s-compl choose q_{–j} = 1
For 2s < −½ or 2s = (1.0xxxxxxxx)2’s-compl choose q_{–j} = −1

Choose q_{–j} = 0 in other cases, that is, for:
0 ≤ 2s < +½ or 2s = (0.0xxxxxxxx)2’s-compl
−½ ≤ 2s < 0 or 2s = (1.1xxxxxxxx)2’s-compl

Observation: What happens when the magnitude of 2s is fairly small?

2s = (0.00001xxxx)2’s-compl: Choosing q_{–j} = 0 would lead to the same condition in the next step; generate 5 quotient digits 0 0 0 0 1

2s = (1.1110xxxxx)2’s-compl: Generate 4 quotient digits 0 0 0 −1

Use a leading 0s or leading 1s detection circuit to determine how many quotient digits can be produced at once
Statistically, the average skipping distance will be 2.67 bits


Example Unsigned Radix-2 SRT Division

========================
z              . 0 1 0 0 0 1 0 1      In [−½, ½), so okay
d            0 . 1 0 1 0
–d           1 . 0 1 1 0
========================
s(0)         0 . 0 1 0 0 0 1 0 1
2s(0)        0 . 1 0 0 0 1 0 1        ≥ ½, so set q−1 = 1
+(−d)        1 . 0 1 1 0              and subtract
––––––––––––––––––––––––
s(1)         1 . 1 1 1 0 1 0 1
2s(1)        1 . 1 1 0 1 0 1          In [−½, ½), so set q−2 = 0
––––––––––––––––––––––––
s(2)=2s(1)   1 . 1 1 0 1 0 1
2s(2)        1 . 1 0 1 0 1            In [−½, ½), so set q−3 = 0
––––––––––––––––––––––––
s(3)=2s(2)   1 . 1 0 1 0 1
2s(3)        1 . 0 1 0 1              < −½, so set q−4 = −1
+d           0 . 1 0 1 0              and add
––––––––––––––––––––––––
s(4)         1 . 1 1 1 1              Negative,
+d           0 . 1 0 1 0              so add to correct
––––––––––––––––––––––––
s(4)         0 . 1 0 0 1
s            0 . 0 0 0 0 1 0 0 1
q            0 . 1 0 0 −1             Uncorrected BSD quotient
q            0 . 0 1 1 0              Convert and subtract ulp
========================

Conversion of the BSD quotient: 0.1000 − 0.0001 = 0.0111
Subtracting ulp for the correction: 0.0111 − 0.0001 = 0.0110
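
The example can be reproduced with a short sketch that uses exact fractions. The function name is hypothetical; the sketch assumes an unsigned fractional dividend in [0, ½) and a divisor in [½, 1), and applies the final correction only when the last partial remainder is negative.

```python
from fractions import Fraction


def srt_radix2_divide(z, d, k):
    """Radix-2 SRT division with comparison constants -1/2 and 1/2.

    Returns (digits, q, s): the BSD quotient digits, the corrected
    quotient, and the remainder, all as exact fractions.
    """
    half = Fraction(1, 2)
    s = Fraction(z)
    digits = []
    for _ in range(k):
        shifted = 2 * s
        if shifted >= half:
            digit = 1
        elif shifted < -half:
            digit = -1
        else:
            digit = 0
        digits.append(digit)
        s = shifted - digit * d
    q = sum(Fraction(digit, 2 ** (j + 1)) for j, digit in enumerate(digits))
    if s < 0:                        # final correction step
        s += d
        q -= Fraction(1, 2 ** k)     # subtract ulp
    return digits, q, s / 2 ** k     # s(k) = 2^k s


z = Fraction(0b01000101, 256)   # .0100 0101
d = Fraction(0b1010, 16)        # .1010
digits, q, s = srt_radix2_divide(z, d, 4)
print(digits, q, s)             # → [1, 0, 0, -1] 3/8 9/256
```

The quotient 3/8 is (0.0110)two and the remainder 9/256 is (0.0000 1001)two, and z = q × d + s holds exactly.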


Using Carry-Save Adders

[Figure: Constant thresholds used for quotient digit selection in radix-2 division with q_{k–j} in {–1, 0, 1}. On the 2s(j–1) axis, –1 is chosen below –1/2, 0 is chosen in [–1/2, 0), and 1 is chosen at or above 0; the –1/0 and 0/+1 overlap regions are marked.]

In an overlap region either adjacent digit can be chosen, for example 0 or 1 in the 0/+1 overlap region


Quotient Digit Selection Based on Truncated PR

Sum part of 2s(j–1): u = (u1 u0 . u–1 u–2 . . .)2’s-compl
Carry part of 2s(j–1): v = (v1 v0 . v–1 v–2 . . .)2’s-compl

Approximation to the partial remainder:

t = u[–2,1] + v[–2,1] {Add the 4 MSBs of u and v}

t := u[–2,1] + v[–2,1]
if t < –½
    then q_{–j} = –1
    else if t ≥ 0
        then q_{–j} = 1
        else q_{–j} = 0
    endif
endif

[Figure: the constant-threshold selection diagram, with thresholds –1/2 and 0 and the –1/0 and 0/+1 overlap regions.]
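
A minimal sketch of this selection rule, assuming the sum and carry words are given as bit patterns with 2 integer bits and `frac_bits` fractional bits (at least 2). The function names and this representation are assumptions for illustration.

```python
def to_pattern(value_in_ulps, frac_bits):
    """Two's-complement bit pattern of a (2 + frac_bits)-bit word."""
    return value_in_ulps & ((1 << (frac_bits + 2)) - 1)


def select_digit_from_carry_save(u, v, frac_bits):
    """Choose q_{-j} from the 4 MSBs of the sum word u and carry word v."""
    drop = frac_bits - 2                      # keep bits 1, 0, -1, -2
    t_bits = ((u >> drop) + (v >> drop)) & 0xF
    t = t_bits - 16 if t_bits >= 8 else t_bits  # t in units of 1/4
    if t < -2:        # t < -1/2
        return -1
    if t >= 0:
        return 1
    return 0


# With 4 fractional bits, one ulp is 1/16.
print(select_digit_from_carry_save(to_pattern(12, 4), 0, 4))                  # u = 0.75
print(select_digit_from_carry_save(to_pattern(-20, 4), to_pattern(8, 4), 4))  # u + v = -0.75
print(select_digit_from_carry_save(to_pattern(-4, 4), 0, 4))                  # u = -0.25
```

The three calls select the digits 1, −1, and 0, respectively.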


Error in t

The 4-bit number t = (t1 t0 . t–1 t–2)2’s-compl can be compared to the constants –½ and 0 based on only the three bit values t1, t0, and t–1. Regardless of sign, truncating u and v makes t underestimate 2s(j–1) by less than ½ (each truncated operand loses less than ¼). The choices therefore stay within the overlap regions:

If t < –½, the true value of 2s(j–1) is guaranteed to be less than 0.

If t < 0, we are guaranteed to have 2s(j–1) < ½ ≤ d.


Divider with Partial Remainder in Carry-Save Form

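[Figure: block diagram. The partial remainder is held as a sum word u and a carry word v. Their 4 most significant bits feed the "Select q_{–j}" block, whose Non0 (enable) and Sign (select) outputs control a mux that supplies 0, d, or d′ (with +ulp for 2’s complement) to a carry-save adder, together with the left-shifted 2s. A k-bit adder is also shown.]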


Why We Cannot Use Carry-Save PR with SRT Division

[Figure: Overlap regions in radix-2 SRT division. The plot of s(j) versus 2s(j–1) shows the lines for q_{–j} = –1, 0, and 1, the comparison constants –1/2 and 1/2, and overlap regions labeled 1 – d.]

The overlap can become arbitrarily small as d approaches 1.


Choosing the Quotient Digits

[Figure: A p-d plot for radix-2 division with d ∈ [1/2,1), partial remainder in [–d, d), and quotient digits in [–1, 1]. The horizontal axis is d (.100 to 1.) and the vertical axis is p (−10.0 to 10.0). The lines p = ±d and p = ±2d bound the regions "Choose −1", "Choose 0", and "Choose 1" and their overlaps; p cannot be ≥ 2d or < −2d (infeasible regions). The worst-case error margin in comparison is marked.]

Use the p-d plot to understand the q selection and derive the needed precision (number of bits to look at).


Design of the Quotient Digit Selection Logic

[Figure: a 4-bit adder followed by combinational logic producing Non0 and Sign.]

Shifted sum = (u1 u0 . u−1 u−2 . . .)2’s-compl
Shifted carry = (v1 v0 . v−1 v−2 . . .)2’s-compl
Approx shifted PR = (t1 t0 . t−1 t−2)2’s-compl

Non0 = t1′ ∨ t0′ ∨ t−1′ = (t1 t0 t−1)′
Sign = t1 (t0′ ∨ t−1′)
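
The two logic expressions can be checked exhaustively against the comparison rule (q = −1 for t < −½, q = 1 for t ≥ 0, q = 0 otherwise). The function names are hypothetical, and t is represented in units of ¼.

```python
def select_by_comparison(t_quarters):
    """Quotient digit from the comparison rule; t is given in units of 1/4."""
    if t_quarters < -2:
        return -1
    if t_quarters >= 0:
        return 1
    return 0


def select_by_logic(t1, t0, tm1):
    """Non0 and Sign from the three MSBs of t."""
    non0 = 1 - (t1 & t0 & tm1)               # (t1 t0 t-1)'
    sign = t1 & ((1 - t0) | (1 - tm1))       # t1 (t0' OR t-1')
    return non0, sign


def logic_matches_comparison():
    for pattern in range(16):                # all 4-bit values of t
        t1 = (pattern >> 3) & 1
        t0 = (pattern >> 2) & 1
        tm1 = (pattern >> 1) & 1
        value = pattern - 16 if pattern >= 8 else pattern
        q = select_by_comparison(value)
        expected = (1 if q != 0 else 0, 1 if q == -1 else 0)
        if select_by_logic(t1, t0, tm1) != expected:
            return False
    return True


print(logic_matches_comparison())  # → True
```

The only patterns with Non0 = 0 are 11.10 and 11.11, that is, t = −½ and t = −¼.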


Radix-4 SRT Division

Radix-4 fractional division with left shifts and q_{–j} ∈ [–3, 3]

s(j) = 4s(j–1) – q_{–j} d   with s(0) = z and s(k) = 4^k s

Forming 4s(j–1) is the shift step; removing q_{–j} d is the subtract step.

Two difficulties:
How do you choose from among the 7 possible values for q_{−j}?
If the choice is +3 or −3, how do you form 3d?

[Figure: New versus shifted old partial remainder in radix-4 division with q_{–j} in [–3, 3]. The horizontal axis 4s(j–1) runs from –4d to 4d and the vertical axis s(j) from –d to d, with one line for each digit –3, –2, –1, 0, +1, +2, +3.]


Building the p-d Plot for Radix-4 Division

[Figure: A p-d plot for radix-4 SRT division with quotient digit set [–3, 3]. The horizontal axis is d (.100 to .111) and the vertical axis is p (00.0 to 11.1). The lines p = d, 2d, 3d, and 4d bound the regions "Choose 0", "Choose 1", "Choose 2", and "Choose 3" and their overlaps; p cannot be ≥ 4d (infeasible region). Uncertainty regions are marked.]

Uncertainty region: arises because of truncation.

The choice between q = 3 and q = 2 depends not only on p but also on one bit of the divisor, d–2.


Restricting the Quotient Digit Set in Radix 4

Radix-4 fractional division with left shifts and q_{–j} ∈ [–2, 2]

s(j) = 4s(j–1) – q_{–j} d   with s(0) = z and s(k) = 4^k s

[Figure: Fig. 14.13 New versus shifted old partial remainder in radix-4 division with q_{–j} in [–2, 2]. The points ±8d/3 are marked on the 4s(j–1) axis and ±2d/3 on the s(j) axis.]

For this restriction to be feasible, we must have:
s ∈ [−hd, hd) for some h < 1, and 4hd – 2d ≤ hd
This yields h ≤ 2/3 (choose h = 2/3 to minimize the restriction)


Building the p-d Plot with Restricted Radix-4 Digit Set

[Figure: A p-d plot for radix-4 SRT division with quotient digit set [–2, 2]. The horizontal axis is d (.100 to .111) and the vertical axis is p (00.0 to 11.1). The lines p = d/3, 2d/3, 4d/3, 5d/3, and 8d/3 bound the regions "Choose 0", "Choose 1", and "Choose 2" and their overlaps; p cannot be ≥ 8d/3 (infeasible region). One part of the selection boundary is annotated "Depends on d".]


General High-Radix Dividers

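[Figure: block diagram of a general high-radix divider. The partial remainder is kept as sum u and carry v; a "Select q_{–j}" block controls multiple generation/selection, which produces q_{–j} d or its complement from the divisor d; a CSA tree combines this with the left-shifted partial remainder; a k-bit adder is also shown.]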

Process to derive the details:

Radix r

Digit set [–α, α] for q–j

Number of bits of p (v and u) and d to be inspected

Quotient digit selection unit (table or logic)

Multiple generation/selection scheme

Conversion of redundant q to 2’s complement


15 Variations in Dividers

Chapter Goals
Discuss practical aspects of designing high-radix division schemes and cover other types of fast hardware dividers

Chapter Highlights
Building and using p-d plots in practice
Prescaling simplifies q digit selection
Parallel hardware (array) dividers
Shared hardware in multipliers/dividers
Square-rooting not special case of division


Quotient Digit Selection Revisited

Radix-r division with quotient digit set [–α, α], α < r – 1
Restrict the partial remainder range, say to [–hd, hd)
From the solid rectangle in Fig. 15.1, we get rhd – αd ≤ hd or h ≤ α/(r – 1)
To minimize the range restriction, we choose h = α/(r – 1)

[Figure: The relationship between new and shifted old partial remainders in radix-r division with quotient digits in [–α, +α]. The horizontal axis r s(j–1) is marked at ±rhd, ±rd, ±αd, and ±d, and the vertical axis s(j) at ±hd and ±d; one line is drawn for each digit from –r + 1 to r – 1, with the lines for –α, –1, 0, 1, and α labeled.]


Why Using Truncated p and d Values Is Acceptable

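[Figure: A part of the p-d plot showing the overlap region for choosing the quotient digit value β or β + 1 in radix-r division with quotient digit set [–α, α]. The lines (–h + β)d, (h + β)d, (–h + β + 1)d, and (h + β + 1)d are drawn starting at d min; the overlap region between "Choose β" and "Choose β + 1" lies between (–h + β + 1)d and (h + β)d. Two grids are illustrated, with points A and B: 4 bits of p with 3 bits of d, and 3 bits of p with 4 bits of d.]

Note: h = α / (r – 1)

Standard p: xx.xxxx
Carry-save p: xx.xxxxx and xx.xxxxx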


Table Entries in the Quotient Digit Selection Logic

We want to make the uncertainty rectangle as large as possible, to minimize the number of bits in p and d needed for choosing the quotient digits.

[Figure: part of a p-d plot, starting at the origin, around the overlap region between the lines (–h + β + 1)d and (h + β)d. Each grid cell (uncertainty rectangle) is labeled β, β + 1, or "β or β + 1", which yields a staircase-like selection boundary.]

Note: h = α/(r – 1)


Using p-d Plots in Practice

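[Figure: Establishing upper bounds on the dimensions of uncertainty rectangles. Near d min, the overlap region between "Choose α − 1" (below the line (h + α − 1)d) and "Choose α" (above the line (−h + α)d) contains a rectangle of width Δd, from d min to d min + Δd, and height Δp, from (−h + α) d min to (h + α − 1) d min.]

Smallest Δd occurs for the overlap region of α and α – 1

$$\Delta d = d^{\min}\,\frac{2h - 1}{-h + \alpha}$$

$$\Delta p = d^{\min}\,(2h - 1)$$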


Example: Lower Bounds on Precision

[Figure: Fig. 15.4, the overlap region between "Choose α − 1" and "Choose α" near d min, with the uncertainty rectangle of width Δd and height Δp.]

$$\Delta d = d^{\min}\,\frac{2h - 1}{-h + \alpha} \qquad \Delta p = d^{\min}\,(2h - 1)$$

For r = 4, divisor range [0.5, 1), digit set [–2, 2], we have α = 2, d min = 1/2, h = α/(r – 1) = 2/3

$$\Delta d = (1/2)\,\frac{4/3 - 1}{-2/3 + 2} = 1/8 \qquad \Delta p = (1/2)(4/3 - 1) = 1/6$$

Because 1/8 = 2^–3 and 2^–3 ≤ 1/6 < 2^–2, we must inspect at least 3 bits of d (2, given its leading 1) and 3 bits of p
These are lower bounds (not truncated bits) and may prove inadequate
In fact, 3 bits of p and 4 (3) bits of d are required
With p in carry-save form, 4 bits of each component must be inspected
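
The bounds in this example can be recomputed with exact fractions. The function names are hypothetical; `min_fraction_bits` returns the smallest n with 2^–n ≤ Δ, which is how the slide turns Δd and Δp into a lower bound on the number of fractional bits.

```python
from fractions import Fraction


def uncertainty_rectangle(r, alpha, d_min):
    """Return (h, delta_d, delta_p) for radix r and digit set [-alpha, alpha]."""
    h = Fraction(alpha, r - 1)
    delta_d = d_min * (2 * h - 1) / (alpha - h)
    delta_p = d_min * (2 * h - 1)
    return h, delta_d, delta_p


def min_fraction_bits(delta):
    """Smallest n such that 2^-n <= delta."""
    n = 0
    while Fraction(1, 2 ** n) > delta:
        n += 1
    return n


h, delta_d, delta_p = uncertainty_rectangle(4, 2, Fraction(1, 2))
print(h, delta_d, delta_p)                                      # → 2/3 1/8 1/6
print(min_fraction_bits(delta_d), min_fraction_bits(delta_p))   # → 3 3
```

These are only the lower bounds discussed above; as the slide notes, the actual design needs more bits.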


Upper Bounds for Precision

Theorem: Once lower bounds on precision are determined based on Δd and Δp, one more bit of precision in each direction is always adequate

[Figure: the overlap region between "Choose a − 1" (below the line (a − 1 + h)d) and "Choose a" (above the line (a − h)d) near d min, with the Δd by Δp rectangle, points A and B, the vertical grid spacing w, and the distances u and v.]

Proof: Let w be the spacing of vertical grid lines
w ≤ Δd/2 ⇒ v ≤ Δp/2 ⇒ u ≥ Δp/2


Some Implementation Details

The asymmetry of quotient digit selection process.

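[Figure: p-d plot between d min and d max showing the regions "Choose β + 1", "Choose β", "Choose −β + 1", and "Choose −β", with points A and B marked.]

[Figure: p-d plot with grid cells labeled β, β + 1, "β or β + 1", δ, and δ + 1; four cells are marked with asterisks.]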

Example of p-d plot allowing larger uncertainty rectangles, if the 4 cases marked with asterisks are handled as exceptions.


[Figure: p-d plot for radix-4 division. The horizontal axis is d and the vertical axis is p (from 10.10 to 01.10); the region boundaries are the lines ±d/3, ±2d/3, ±4d/3, and ±5d/3, and grid cells are labeled with the quotient digits 1, 2, or "1,2".]

Radix r = 4
q_{–j} in [–2, 2]
d in [1/2, 1)
p in [–8/3, 8/3]

The Pentium chip division bug


Division with Prescaling

Restricting the divisor to the shaded area simplifies quotient digit selection.

[Figure: p-d plot between d min and d max showing the regions "Choose β + 1", "Choose β", "Choose −β + 1", and "Choose −β", with the shaded area near d max.]

Overlap regions of a p-d plot are wider toward the high end of the divisor range

If we can restrict the magnitude of the divisor to an interval close to d max (say 1 – ε < d < 1 + δ, when d max = 1), quotient digit selection may become simpler

Thus, we perform the division (zm)/(dm) for a suitably chosen scale factor m (m > 1)

Prescaling (multiplying z and d by m) should be done without real multiplications


Modular Dividers and Reducers

Given dividend z and divisor d, with d ≥ 0, a modular divider computes

q = ⎣z / d⎦ and s = z mod d = ⟨z⟩d

The quotient q is, by definition, an integer but the inputs z and d do not have to be integers; the modular remainder is never negative

Example:

⎣–3.76 / 1.23⎦ = –4 and ⟨–3.76⟩1.23 = 1.16

The quotient and remainder of ordinary division are −3 and −0.07

A modular reducer computes only the modular remainder and is in many cases simpler than a full-blown divider

$$\langle z \rangle_d = \langle z_H 2^k + z_L \rangle_d = \langle z_H (2^k - 1) + z_H + z_L \rangle_d$$
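The example above can be reproduced with a short sketch. It is illustrative only: the function names are hypothetical, d > 0 is assumed, and `Fraction` is used so that the decimal inputs are exact. The last function applies the identity above to the special case d = 2^k − 1 with a nonnegative integer z, where the term z_H(2^k − 1) vanishes modulo d; that special case is an assumption of this sketch, not something stated on the slide.

```python
from fractions import Fraction
import math

def modular_divide(z, d):
    """Return (q, s) with q = floor(z / d) and s = z mod d; assumes d > 0."""
    if d <= 0:
        raise ValueError("this sketch assumes d > 0")
    q = math.floor(z / d)
    return q, z - q * d

def ordinary_divide(z, d):
    """Quotient truncated toward zero; the remainder takes the sign of z."""
    q = math.trunc(z / d)
    return q, z - q * d

def reduce_mod_2k_minus_1(z, k):
    """<z> mod (2**k - 1) for integer z >= 0, by folding z_H into z_L."""
    d = (1 << k) - 1
    while z > d:
        z = (z >> k) + (z & d)      # <z_H 2^k + z_L>_d = <z_H + z_L>_d when d = 2^k - 1
    return 0 if z == d else z

z, d = Fraction("-3.76"), Fraction("1.23")
print(modular_divide(z, d))         # (-4, Fraction(29, 25)), i.e. remainder 1.16
print(ordinary_divide(z, d))        # (-3, Fraction(-7, 100)), i.e. remainder -0.07
print(reduce_mod_2k_minus_1(100, 4), 100 % 15)   # 10 10
```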


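Array Dividers

Restoring array divider composed of controlled subtractor cells.

[Figure: array of controlled subtractor cells (each built around a full subtractor, FS), with the dividend bits z and divisor bits d as inputs and the quotient bits q and remainder bits s as outputs.]

Dividend z = $.z_{-1} z_{-2} z_{-3} z_{-4} z_{-5} z_{-6}$
Divisor d = $.d_{-1} d_{-2} d_{-3}$
Quotient q = $.q_{-1} q_{-2} q_{-3}$
Remainder s = $.0\,0\,0\,s_{-4} s_{-5} s_{-6}$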


Nonrestoring Array Divider

Nonrestoring array divider built of controlled add/subtract cells.

Similarity to array multiplier is deceiving

[Figure: array of controlled add/subtract cells (each a full adder, FA, with an XOR gate), with the dividend bits z and divisor bits d as inputs and the quotient bits q and remainder bits s as outputs; the critical path is marked.]

Dividend z = $z_0 . z_{-1} z_{-2} z_{-3} z_{-4} z_{-5} z_{-6}$
Divisor d = $d_0 . d_{-1} d_{-2} d_{-3}$
Quotient q = $q_0 . q_{-1} q_{-2} q_{-3}$
Remainder s = $0 . 0\,0\,s_{-3} s_{-4} s_{-5} s_{-6}$


Speedup Methods for Array Dividers

Idea: Pass the partial remainder downward in carry-save form to speed up the operation of each row

However, we still need to know the carry/borrow-out from each row

Solution: Insert a carry-lookahead circuit between successive rows

Not very cost-effective; thus not used in practice

[Fig. 15.8: the nonrestoring array divider of controlled add/subtract cells (FA with XOR), shown again with its critical path marked.]


Combined Multiply/Divide Units

[Fig. 9.4: sequential multiplier with multiplier x, multiplicand a, double-width partial product p, a mux, and a k-bit adder.]

[Fig. 13.10: sequential divider with quotient register, partial remainder, divisor, a complement block controlled by add/sub, and a k-bit adder; the quotient digit is derived from the divisor sign and the complement of the partial remainder sign.]

Similarity of blocks in multipliers and dividers (only shift direction is different)


Single Unit for Sequential Multiplication and Division

The control unit proceeds through necessary steps for multiplication or division (including using the appropriate shift direction)

Sequential radix-2 multiply/divide unit.

[Figure: shared datapath with one register for the multiplier x or quotient q, one for the partial product p or partial remainder s, and one for the multiplicand a or divisor d, plus a mux, a k-bit adder, and shift control; a multiply/divide control block (Mul/Div select) drives the shift, enable, and select signals.]

The slight speed penalty owing to a more complex control unit is insignificant


Single Unit for Array Multiplication and Division

Each cell within the array can act as a modified adder or modified subtractor based on control input values

I/O specification of a universal circuit that can act as an array multiplier or array divider.

In some designs, squaring and square-rooting functions are also included within the same array

[Figure: block with ports labeled "Multiplicand or divisor", "Multiplier", "Additive input or dividend", "Mul/Div", "Product or remainder", and "Quotient".]


16 Division by Convergence

Chapter Goals
Show how, by using multiplication as the basic operation in each division step, the number of iterations can be reduced

Chapter Highlights
Digit-recurrence as convergence method
Convergence by Newton-Raphson iteration
Computing the reciprocal of a number
Hardware implementation and fine tuning


General Convergence Methods

u (i+1) = f(u (i), v (i), w (i))
v (i+1) = g(u (i), v (i), w (i))
w (i+1) = h(u (i), v (i), w (i))

u (i+1) = f(u (i), v (i))
v (i+1) = g(u (i), v (i))

The complexity of this method depends on two factors:

a. Ease of evaluating f and g (and h)
b. Rate of convergence (number of iterations needed)

Guide the iteration such that one of the values converges to a constant (usually 0 or 1)

The other value then converges to the desired function


Division by Repeated Multiplications

Remainder often not needed, but can be obtained by another multiplication if desired: s = z – qd

Motivation: Suppose add takes 1 clock and multiply 3 clocks

64-bit divide takes 64 clocks in radix 2, 32 in radix 4

Division via multiplications is faster if 10 or fewer multiplications are needed

Idea:

$$q = \frac{z}{d} = \frac{z\,x^{(0)} x^{(1)} \cdots x^{(m-1)}}{d\,x^{(0)} x^{(1)} \cdots x^{(m-1)}}$$

The denominator is forced to 1, so the numerator converges to q

To turn the identity into a division algorithm, we face three questions:

1. How to select the multipliers x(i)?
2. How many iterations (pairs of multiplications)?
3. How to implement in hardware?


Formulation as a Convergence Computation

Idea:

$$q = \frac{z}{d} = \frac{z\,x^{(0)} x^{(1)} \cdots x^{(m-1)}}{d\,x^{(0)} x^{(1)} \cdots x^{(m-1)}}$$

The denominator is forced to 1, so the numerator converges to q

$d^{(i+1)} = d^{(i)} x^{(i)}$   Set $d^{(0)} = d$; make $d^{(m)}$ converge to 1
$z^{(i+1)} = z^{(i)} x^{(i)}$   Set $z^{(0)} = z$; obtain $z/d = q \cong z^{(m)}$

Question 1: How to select the multipliers x(i)?   $x^{(i)} = 2 - d^{(i)}$

This choice transforms the recurrence equations into:

$d^{(i+1)} = d^{(i)} (2 - d^{(i)})$   Set $d^{(0)} = d$; iterate until $d^{(m)} \cong 1$
$z^{(i+1)} = z^{(i)} (2 - d^{(i)})$   Set $z^{(0)} = z$; obtain $z/d = q \cong z^{(m)}$

Fits the general form

u (i+1) = f(u (i), v (i))
v (i+1) = g(u (i), v (i))
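The recurrences with x(i) = 2 − d(i) can be tried out directly. This is a minimal software sketch, assuming a divisor already normalized to [1/2, 1) and exact rational arithmetic in place of fixed-width hardware multipliers; the function name is hypothetical.

```python
from fractions import Fraction

def divide_by_repeated_multiplication(z, d, m):
    """Approximate z / d with m iterations; assumes 1/2 <= d < 1."""
    if not (Fraction(1, 2) <= d < 1):
        raise ValueError("d is assumed to be normalized to [1/2, 1)")
    for _ in range(m):
        x = 2 - d       # multiplier x(i) = 2 - d(i): a 2's complementation
        d = d * x       # d(i+1) = d(i) x(i), driven toward 1
        z = z * x       # z(i+1) = z(i) x(i), converges to q
    return z, d

q_approx, d_final = divide_by_repeated_multiplication(Fraction(3, 8), Fraction(3, 4), 4)
print(float(q_approx))      # close to the true quotient 0.5
print(1 - d_final)          # 1/4294967296, i.e. 2**-32 after 4 iterations
```

With d = 3/4 the initial error 1 − d is 1/4, so four squarings leave an error of (1/4)^16 = 2^–32 in the denominator.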


Determining the Rate of Convergence

d (i+1) = d (i) x (i)   Set d (0) = d; make d (m) converge to 1
z (i+1) = z (i) x (i)   Set z (0) = z; obtain z/d = q ≅ z (m)

Question 2: How quickly does d (i) converge to 1?

We can relate the error in step i + 1 to the error in step i:

$$d^{(i+1)} = d^{(i)} (2 - d^{(i)}) = 1 - (1 - d^{(i)})^2$$

$$1 - d^{(i+1)} = (1 - d^{(i)})^2$$

For $1 - d^{(i)} \le \varepsilon$, we get $1 - d^{(i+1)} \le \varepsilon^2$: Quadratic convergence

In general, for k-bit operands, we need

2m – 1 multiplications and m 2's complementations

where $m = \lceil \log_2 k \rceil$
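The operation counts follow directly from m. A small sketch, assuming k ≥ 1 and using the integer identity ⌈log2 k⌉ = (k − 1).bit_length() to avoid floating-point logarithms; the function name is hypothetical.

```python
def convergence_division_cost(k):
    """Return (m, multiplications, complementations) for k-bit operands, k >= 1."""
    m = (k - 1).bit_length()        # m = ceil(log2 k), computed with integers only
    return m, 2 * m - 1, m

for k in (16, 32, 64):
    print(k, convergence_division_cost(k))
# 16 (4, 7, 4)
# 32 (5, 9, 5)
# 64 (6, 11, 6)
```

The count is 2m − 1 rather than 2m, presumably because the final update of d is not needed once z(m) has been obtained.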


Quadratic Convergence

Table: Quadratic convergence in computing z/d by repeated multiplications, where 1/2 ≤ d = 1 – y < 1

–––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––
i    d (i) = d (i–1) x (i–1), with d (0) = d                   x (i) = 2 – d (i)
–––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––
0    1 – y    = (.1xxx xxxx xxxx xxxx)two ≥ 1/2               1 + y
1    1 – y^2  = (.11xx xxxx xxxx xxxx)two ≥ 3/4               1 + y^2
2    1 – y^4  = (.1111 xxxx xxxx xxxx)two ≥ 15/16             1 + y^4
3    1 – y^8  = (.1111 1111 xxxx xxxx)two ≥ 255/256           1 + y^8
4    1 – y^16 = (.1111 1111 1111 1111)two = 1 – ulp
–––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––

Each iteration doubles the number of guaranteed leading 1s (convergence to 1 is from below)

Beginning with a single 1 (d ≥ ½), after log2 k iterations we get as close to 1 as is possible in a fractional representation


Graphical Depiction of Convergence to q

Question 3 (implementation in hardware) to be discussed later

[Figure: d (i) and z (i) plotted against iteration i (0 to 6), starting from d and z; d (i) rises toward 1 – ulp and z (i) rises toward q – ε.]


Division by Reciprocation

Convergence to a root of f(x) = 0 in the Newton-Raphson method.

The Newton-Raphson method can be used for finding a root of f (x) = 0

[Figure: curve f(x) with the tangent at x(i), at angle α(i), meeting the x axis at x(i+1); the estimates x(i), x(i+1), x(i+2) approach the root.]

Start with an initial estimate x(0) for the root

Iteratively refine the estimate via the recurrence

x(i+1) = x(i) – f (x(i)) / f ′(x(i))

Justification:

tan α(i) = f ′(x(i))= f (x(i)) / (x(i) – x(i+1))


Computing 1/d by Convergence

1/d is the root of f (x) = 1/x – d

f ′(x) = –1/x^2

Substitute in the Newton-Raphson recurrence x(i+1) = x(i) – f (x(i)) / f ′(x(i)) to get:

x (i+1) = x (i) (2 − x (i)d)

One iteration = Two multiplications + One 2’s complementation

Error analysis: Let δ (i) = 1/d – x(i) be the error at the ith iteration

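$$\delta^{(i+1)} = 1/d - x^{(i+1)} = 1/d - x^{(i)} (2 - x^{(i)} d) = d\,(1/d - x^{(i)})^2 = d\,(\delta^{(i)})^2$$

Because d < 1, we have $\delta^{(i+1)} < (\delta^{(i)})^2$

[Figure: plot of f(x) = 1/x – d, which crosses the x axis at x = 1/d and approaches –d for large x.]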
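The reciprocal recurrence and its error relation can be checked numerically. A minimal sketch with exact rationals, assuming d in [1/2, 1) and a starting value inside the convergence range; the function name and the choice of d = 3/4 are illustrative only.

```python
from fractions import Fraction

def reciprocal_newton_raphson(d, x0, iterations):
    """Iterate x(i+1) = x(i) (2 - x(i) d); return the final x and the errors 1/d - x(i)."""
    x = x0
    errors = [1 / d - x]
    for _ in range(iterations):
        x = x * (2 - x * d)         # two multiplications and one 2's complementation
        errors.append(1 / d - x)
    return x, errors

d = Fraction(3, 4)
x, errors = reciprocal_newton_raphson(d, Fraction(3, 2), 4)
print(float(x))                     # approaches 1/d = 1.3333...
for before, after in zip(errors, errors[1:]):
    assert after == d * before ** 2  # delta(i+1) = d * delta(i)**2, exactly
```

Starting from x(0) = 3/2, the error goes from −1/6 to 1/48 after one step, and it stays nonnegative from then on because d δ² cannot be negative.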


Choosing the Initial Approximation to 1/d

With x(0) in the range 0 < x(0) < 2/d, convergence is guaranteed

Justification: $|\delta^{(0)}| = |x^{(0)} - 1/d| < 1/d$

$|\delta^{(1)}| = |x^{(1)} - 1/d| = d\,(\delta^{(0)})^2 = (d\,|\delta^{(0)}|)\,|\delta^{(0)}| < |\delta^{(0)}|$

[Figure: plot of 1/x.]

For d in [1/2, 1):

Simple choice x(0) = 1.5

Max error = 0.5 < 1/d

Better approx. x(0) = 4(√3 – 1) – 2d = 2.9282 – 2d

Max error ≅ 0.1
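The two maximum errors quoted above can be estimated by sweeping d over [1/2, 1). This is a rough numerical sketch using floating point and uniform sampling (both assumptions of the sketch, not of the slides); the names are hypothetical.

```python
import math

def max_abs_error(x0, samples=10_000):
    """Largest |x0(d) - 1/d| over sampled d in [1/2, 1)."""
    worst = 0.0
    for n in range(samples):
        d = 0.5 + 0.5 * n / samples
        worst = max(worst, abs(x0(d) - 1.0 / d))
    return worst

def constant_start(d):
    return 1.5                                   # simple choice x(0) = 1.5

def linear_start(d):
    return 4 * (math.sqrt(3) - 1) - 2 * d        # x(0) = 2.9282 - 2d

print(max_abs_error(constant_start))             # 0.5, reached at d = 1/2
print(round(max_abs_error(linear_start), 3))     # about 0.1
```

Both errors are below 1/d (which exceeds 1 on this range), so either starting value satisfies the convergence condition stated above.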


Speedup of Convergence Division

Division can be performed via $2\lceil \log_2 k \rceil - 1$ multiplications

This is not yet very impressive: 64-bit numbers, 3-ns multiplier ⇒ 33-ns division

Three types of speedup are possible:

Fewer multiplications (reduce m)
Narrower multiplications (reduce the width of some x(i)s)
Faster multiplications

$$q = \frac{z}{d} = \frac{z\,x^{(0)} x^{(1)} \cdots x^{(m-1)}}{d\,x^{(0)} x^{(1)} \cdots x^{(m-1)}}$$

Or: compute y = 1/d, then do the multiplication yz


Initial Approximation via Table Lookup

Convergence is slow in the beginning: it takes 6 multiplications to get 8 bits of convergence and another 5 to go from 8 bits to 64 bits

$$d\,x^{(0)} x^{(1)} x^{(2)} = (0.1111\ 1111 \ldots)_{two}$$

Here x(0) is an approximation to 1/d and the product x(0) x(1) x(2) is a better approximation

Read this value, x(0+), directly from a table, thereby reducing 6 multiplications to 2

A 2^w × w lookup table is necessary and sufficient for w bits of convergence after 2 multiplications

Example with 4-bit lookup: d = 0.1011 xxxx . . . (11/16 ≤ d < 12/16)

Inverses of the two extremes are 16/11 ≅ 1.0111 and 16/12 ≅ 1.0101

So, 1.0110 is a good estimate for 1/d

1.0110 × 0.1011 = (11/8) × (11/16) = 121/128 = 0.1111001
1.0110 × 0.1100 = (11/8) × (3/4) = 33/32 = 1.000010
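The arithmetic in the 4-bit lookup example can be verified mechanically. A small sketch with exact rationals; `to_binary` is a hypothetical helper that truncates a nonnegative value to a given number of fractional bits.

```python
from fractions import Fraction

def to_binary(value, frac_bits):
    """Format a nonnegative Fraction in binary, truncated to frac_bits fractional bits."""
    scaled = int(value * 2 ** frac_bits)                  # truncate toward zero
    int_part, frac_part = divmod(scaled, 2 ** frac_bits)
    return f"{int_part:b}.{frac_part:0{frac_bits}b}"

estimate = Fraction(0b10110, 16)                          # 1.0110 = 11/8
d_low, d_high = Fraction(0b1011, 16), Fraction(0b1100, 16)

print(to_binary(Fraction(16, 11), 4), to_binary(Fraction(16, 12), 4))   # 1.0111 1.0101
print(estimate * d_low, to_binary(estimate * d_low, 7))                 # 121/128 0.1111001
print(estimate * d_high, to_binary(estimate * d_high, 6))               # 33/32 1.000010
```

Both products lie within 2^–4 of 1, which is what 4 bits of convergence requires at the two ends of the interval.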


Visualizing the Convergence with Table Lookup

Convergence in division by repeated multiplications with initial table lookup.

[Figure: d and z plotted against iterations; d approaches 1 – ulp and z approaches q – ε. One point is marked "After table lookup and 1st pair of multiplications, replacing several iterations" and the next "After the 2nd pair of multiplications".]


Convergence Does Not Have to Be from Below

[Figure: d and z plotted against iterations; d approaches 1 ± ulp and z approaches q ± ε.]


Using Truncated Multiplicative Factors

Fig. 16.4 One step in convergence division with truncated multiplicative factors.

[Figure: values near 1 at iterations i and i + 1. From d x(0) x(1) ... x(i), the precise iteration gives d x(0) x(1) ... x(i) x(i+1) and the approximate iteration gives d x(0) x(1) ... x(i) (x(i+1))_T; points A and B and a gap labeled < 2^−a are marked.]

Example (64-bit multiplication)
Initial step: Table of size 256 × 8 = 2K bits
Middle steps: Multiplication pairs, with 9-, 17-, and 33-bit multipliers
Final step: Full 64 × 64 multiplication

Problem 16.9a
A truncated denominator d (i), with a identical leading bits and b extra bits (b ≤ a), leads to a new denominator d (i+1) with a + b identical leading bits


Hardware Implementation

Repeated multiplications: Each pair of ops involves the same multiplier

d (i+1) = d (i) (2 − d (i))   Set d (0) = d; iterate until d (m) ≅ 1
z (i+1) = z (i) (2 − d (i))   Set z (0) = z; obtain z/d = q ≅ z (m)

Two multiplications fully overlapped in a 2-stage pipelined multiplier.

[Figure: pipeline timing. The products d (i) x (i) and z (i) x (i) are overlapped in the two stages; d (i+1) passes through a 2's complement unit to give x (i+1), which is then used for d (i+1) x (i+1) and z (i+1) x (i+1).]


Implementing Division with Reciprocation

Reciprocation: Multiplication pairs are data-dependent, so they cannot be pipelined or performed in parallel

x (i+1) = x (i) (2 − x (i)d)

Options for speedup via a better initial approximation

Consult a larger table
Resort to a bipartite or multipartite table (see Chapter 24)
Use table lookup, followed with interpolation
Compute the approximation via multioperand addition

Unless several multiplications by the same multiplier are needed, division by repeated multiplications is more efficient

However, given a fast method for reciprocation (see Section 24.6), using a reciprocation unit with a standard multiplier is often preferred


Analysis of Lookup Table Size

Table: Sample entries in the lookup table replacing the first four multiplications in division by repeated multiplications

–––––––––––––––––––––––––––––––––––––––––––––––––––––––
Address    d = 0.1 xxxx xxxx    x (0+) = 1. xxxx xxxx
–––––––––––––––––––––––––––––––––––––––––––––––––––––––
55         0011 0111            1010 0101
64         0100 0000            1001 1001
–––––––––––––––––––––––––––––––––––––––––––––––––––––––

Example: Table entry at address 55 (311/512 ≤ d < 312/512)

For 8 bits of convergence, the table entry f must satisfy

(311/512)(1 + .f) ≥ 1 – 2^–8
(312/512)(1 + .f) ≤ 1 + 2^–8

199/311 ≤ .f ≤ 101/156 or 163.81 ≤ 256 × . f ≤ 165.74

Two choices: 164 = (1010 0100)two or 165 = (1010 0101)two
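The same bounds can be computed for any table address. The sketch below generalizes the address-55 calculation under the assumption that address A with w bits corresponds to d in [(2^w + A)/2^(w+1), (2^w + A + 1)/2^(w+1)); the function name is hypothetical.

```python
from fractions import Fraction
import math

def valid_table_entries(address, w=8):
    """Integers v = 2**w * .f that give w bits of convergence at this address."""
    scale = 2 ** (w + 1)
    d_low = Fraction(2 ** w + address, scale)         # d = 0.1 followed by the address bits
    d_high = Fraction(2 ** w + address + 1, scale)    # upper end of the interval
    eps = Fraction(1, 2 ** w)
    f_low = (1 - eps) / d_low - 1                     # from d_low (1 + .f) >= 1 - 2**-w
    f_high = (1 + eps) / d_high - 1                   # from d_high (1 + .f) <= 1 + 2**-w
    return list(range(math.ceil(f_low * 2 ** w), math.floor(f_high * 2 ** w) + 1))

print(valid_table_entries(55))    # [164, 165]
print(valid_table_entries(64))    # [152, 153]
```

The table entries listed above, 1010 0101 = 165 at address 55 and 1001 1001 = 153 at address 64, both fall inside the computed ranges.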


A General Result for Table Size

Theorem 16.1: To get w ≥ 5 bits of convergence after the first iteration of division by repeated multiplications, w bits of d (beyond the mandatory 1) must be inspected. The factor x(0+) read out from the table is of the form (1.xxx . . . xxx)two, with w bits after the radix point

Proof strategy for sufficiency: Represent the table entry 1.f as the integer v = 2^w × .f and derive upper and lower bound expressions for it. Then, show that at least one integer exists between v_lb and v_ub

Proof strategy for necessity: Show that the derived conditions cannot be met if the table is of size 2^(w–1) (no matter how wide) or if it is of width w – 1 (no matter how large)

Excluded cases, w < 5: Practically uninteresting (allow smaller table)

General radix r : Same analysis method, and results, apply