chapter 3 arithmetic for computers(revised) (1)

7/28/2019 Chapter 3 Arithmetic for Computers(Revised) (1)

1/64

Chapter 3Arithmetic for Computers


2/64

Chapter 3 Arithmetic for Computers 2

Arithmetic for Computers

Operations on integers Addition and subtraction

Multiplication and division

Dealing with overflow

Floating-point real numbers

Representation and operations

3.1Introduction


3/64

3

Arithmetic

Where we've been: Performance (seconds, cycles, instructions)

Abstractions:Instruction Set ArchitectureAssembly Language and Machine Language

What's up ahead:

Implementing the Architecture

32

32

32

operation

result

a

b

ALU


4/64

4

Bits are just bits (no inherent meaning) conventions define relationship between bits and numbers

Binary numbers (base 2)0000 0001 0010 0011 0100 0101 0110 0111 1000 1001...decimal: 0...2n-1

Of course it gets more complicated:

numbers are finite (overflow)fractions and real numbersnegative numberse.g., no MIPS subi instruction; addi can add a negative number)

How do we represent negative numbers?i.e., which bit patterns will represent which numbers?

Numbers


5/64

5

Sign Magnitude: One's Complement Two's Complement

000 = +0 000 = +0 000 = +0001 = +1 001 = +1 001 = +1010 = +2 010 = +2 010 = +2011 = +3 011 = +3 011 = +3100 = -0 100 = -3 100 = -4101 = -1 101 = -2 101 = -3110 = -2 110 = -1 110 = -2111 = -3 111 = -0 111 = -1

Issues: balance, number of zeros, ease of operations

Which one is best? Why?

Possible Representations


6/64

6

32 bit signed numbers:

0000 0000 0000 0000 0000 0000 0000 0000two = 0ten0000 0000 0000 0000 0000 0000 0000 0001two = + 1ten0000 0000 0000 0000 0000 0000 0000 0010two = + 2ten...0111 1111 1111 1111 1111 1111 1111 1110two = + 2,147,483,646ten0111 1111 1111 1111 1111 1111 1111 1111two = + 2,147,483,647ten1000 0000 0000 0000 0000 0000 0000 0000two = 2,147,483,648ten1000 0000 0000 0000 0000 0000 0000 0001two = 2,147,483,647ten1000 0000 0000 0000 0000 0000 0000 0010two = 2,147,483,646ten...1111 1111 1111 1111 1111 1111 1111 1101two = 3ten1111 1111 1111 1111 1111 1111 1111 1110two = 2ten1111 1111 1111 1111 1111 1111 1111 1111two = 1ten

maxint

minint

MIPS


7/64

7

Negating a two's complement number: invert all bits and add 1 remember: negate (+/-) and invert (1/0) are quite different!

Converting n bit numbers into numbers with more than n bits:

MIPS 16 bit immediate gets converted to 32 bits for arithmetic

copy the most significant bit (the sign bit) into the other bits0010 -> 0000 0010

1010 -> 1111 1010

"sign extension" (lbu vs. lb)

Two's Complement Operations


8/64

8

Just like in grade school (carry/borrow 1s)

0111 0111 0110

+ 0110 - 0110 - 0101

Two's complement operations easy

subtraction using addition of negative numbers0111

+ 1010

Overflow (result too large for finite computer word):

e.g., adding two n-bit numbers does not yield an n-bit number0111

+ 0001 note that overf low term is somewhat misleading,

1000 it does not mean a carry overflowed

Addition & Subtraction


9/64

9

No overflow when adding a positive and a negative number

No overflow when signs are the same for subtraction

Overflow occurs when the value affects the sign:

overflow when adding two positives yields a negative

or, adding two negatives gives a positive

or, subtract a negative from a positive and get a negative or, subtract a positive from a negative and get a positive

Consider the operations A + B, and A B

Can overflow occur if B is 0 ?

Can overflow occur if A is 0 ?

Detecting Overflow


10/64

10

An exception (interrupt) occurs

Control jumps to predefined address for exception

Interrupted address is saved for possible resumption

Details based on software system / language

example: flight control vs. homework assignment

Don't always want to detect overflow

new MIPS instructions: addu, addiu, subu

note: addiu sti l l sign-extends!note: sltu, sltiu for unsigned comparisons

Roll over: circular buffers Saturation: pixel l ightness contr ol

Effects of Overflow


11/64

11

Problem: Consider a logic function with three inputs: A, B, and C.

Output D is true if at least one input is trueOutput E is true if exactly two inputs are trueOutput F is true only if all three inputs are true

Show the truth table for these three functions.

Show the Boolean equations for these three functions.

Show an implementation consisting of inverters, AND, and OR gates.

Review: Boolean Algebra & Gates


12/64

12

Let's build an ALU to support the andi and ori instructions

we'll just build a 1 bit ALU, and use 32 of them

Possible Implementation (sum-of-products):

b

a

operation

result

op a b res

An ALU (arithmetic logic unit)


13/64

13

Appendix: C.5


14/64

14

Selects one of the inputs to be the output, based on a control input

Lets build our ALU using a MUX:

S

CA

B0

1

Review: The Multiplexor

note: we call this a 2-input mux

even though i t has 3 inputs!


15/64

15

Not easy to decide the best way to build something

Don't want too many inputs to a single gate

Dont want to have to go through too many gates

for our purposes, ease of comprehension is important

Let's look at a 1-bit ALU for addition:

How could we build a 1-bit ALU for add, and, and or?

How could we build a 32-bit ALU?

Different Implementations

cout = a b + a cin + b cinsum = a xor b xor cin

Sum

CarryIn

CarryOut

a

b


16/64

16

Building a 32 bit ALU

b

0

2

Result

Operation

a

1

CarryIn

CarryOut

Result31

a31

b31

Result0

CarryIn

a0

b0

Result1

a1

b1

Result2

a2

b2

Operation

ALU0

CarryIn

CarryOut

ALU1

CarryIn

CarryOut

ALU2

CarryIn

CarryOut

ALU31

CarryIn

R0 = a ^ b;

R1 = a V b;

R2 = a + b;

Case (Op) {

0: R = R0;

1: R = R1;

2: R = R2 }

Case (Op) {

0: R = a ^ b;

1: R = a V b;

2: R = a + b}


17/64

17

Two's complement approach: just negate b and add.

How do we negate?

A very clever solution:

Result = A + (~B) + 1

What about subtraction (a b) ?

0

2

Result

Operation

a

1

CarryIn

CarryOut

0

1

Binvert

b


18/64

18

Need to support the set-on-less-than instruction (slt) remember: slt is an arithmetic instruction

produces a 1 if rs < rt and 0 otherwise

use subtraction: (a-b) < 0 implies a < b

Need to support test for equality (beq $t5, $t6, $t7) use subtraction: (a-b) = 0 implies a = b

Tailoring the ALU to the MIPS


19/64

Supporting slt

Can we figure out the idea?0

3

Result

Operation

a

1

CarryIn

CarryOut

0

1

Binvert

b 2

Less

a.

0

3

Result

Operation

a

1

CarryIn

0

1

Binvert

b 2

Less

Set

Overflowdetection

Overflow

b.


20/64

20

Set

a31

0

ALU0 Result0

CarryIn

a0

Result1

a1

0

Result2

a2

0

Operation

b31

b0

b1

b2

Result31

Overflow

Binvert

CarryIn

Less

CarryIn

CarryOut

ALU1

Less

CarryIn

CarryOut

ALU2

Less

CarryIn

CarryOut

ALU31

Less

CarryIn


21/64

21

Test for equality

Notice control lines:

000 = and

001 = or

010 = add

110 = subtract

111 = slt

Note: zero is a 1 when the resul t is zero!

Seta31

0

Result0a0

Result1a1

0

Result2a2

0

Operation

b31

b0

b1

b2

Result31

Overflow

Bnegate

Zero

ALU0

Less

CarryIn

CarryOut

ALU1

Less

CarryIn

CarryOut

ALU2

Less

CarryIn

CarryOut

ALU31

Less

CarryIn

0

3

Result

Operation

a

1

CarryIn

CarryOut

0

1

Binvert

b 2

Less

a.


22/64

22

Conclusion

We can build an ALU to support the MIPS instruction set

key idea: use multiplexor to select the output we want

we can efficiently perform subtraction using twos complement

we can replicate a 1-bit ALU to produce a 32-bit ALU

Important points about hardware

all of the gates are always working the speed of a gate is affected by the number of inputs to the gate

the speed of a circuit is affected by the number of gates in series

(on the critical path or the deepest level of logic)

Our primary focus: comprehension, however,

Clever changes to organization can improve performance(similar to using better algorithms in software)

well look at two examples for addition and multiplication


23/64

23

Is a 32-bit ALU as fast as a 1-bit ALU?

Is there more than one way to do addition?

two extremes: ripple carry and sum-of-products

Can you see the ripple? How could you get rid of it?

c1 = b0c0 + a0c0 +a0b0 c2 = b1c1 + a1c1 +a1b1c3 = b2c2 + a2c2 +a2b2 c4 = b3c3 + a3c3 +a3b3

Expanding the carry chainsc2 = a1a0b0+a1a0c0+a1b0c0+b1a0b0+b1a0c0+b1b0c0+a1b1

c3 = ??? c4 = ???

Not feasible! Why?

Cn has 2P terms (P=0n)

Problem: ripple carry adder is slow


24/64

24

Appendix: C.6


25/64

25

An approach in-between our two extremes

Motivation:

If we didn't know the value of carry-in, what could we do?

When would we always generate a carry? gi = aibi When would we propagate the carry? pi = ai + bi

Did we get rid of the ripple?

c1 = g0 + p0c0 c2 = g1 + p1c1c3 = g2 + p2c2 c4 = g3 + p3c3

Expanding the carry chainsc2 = g1+p1g0+p1p0c0

Feasible! Why?

The carry chain does not disappear, but is much smaller:

Cn has n+1 terms

Carry-lookahead adder


26/64

26

How about constructing a 16-bit adder in CLA way?

Cant build a 16 bit adder this way... (too big) Could use ripple carry of 4-bit CLA adders

Better: use the CLA principle again!

Use principle to build bigger adders

CarryIn

Result0--3

ALU0

CarryIn

Result4--7

ALU1

CarryIn

Result8--11

ALU2

CarryIn

CarryOut

Result12--15

ALU3

CarryIn

C1

C2

C3

C4

P0G0

P1

G1

P2G2

P3G3

pigi

pi + 1

gi + 1

ci + 1

ci + 2

ci + 3

ci + 4

pi + 2gi + 2

pi + 3gi + 3

a0

b0a1b1a2b2a3b3

a4b4a5b5a6

b6a7b7

a8b8a9b9

a10b10a11b11

a12b12a13b13a14b14a15b15

Carry-lookahead unit

P0 = p3 p2 p1 p0

P1 = p7 p6 p5 p4

P2 =p11 p10 p9 p8

P3 = p15 p14 p13 p12

G0 = g3+p3g2+p3p2g1+p3p2p1g0

G1 = g7+p7g6+p7p6g5+p7p6p5g4

G2 = g11+p11g10+p11p10g9+p11p10p9g8

G3 = g15+p15g14+p15p14g13+p15p14p13g12

C1 = G0 +(P0

c0)

C2 = G1+(P1 G0)+(P1P0c0)

C3 = G2+(P2 G1)+(P2P1G0)+(P2P1P0c0)

C4 = G3+(P3 G2)+(P3P2G1)+(P3P2P1G0)

+(P3P2P1P0c0)

3


27/64


Multiplication

Start with long-multiplication approach

1000 1001

10000000000010001001000

Length of product isthe sum of operandlengths

multiplicand

multiplier

product

3.3Multiplica

tion


28/64


Multiplication Hardware

Initially 0


29/64


Optimized Multiplier

Perform steps in parallel: add/shift

One cycle per partial-product addition

Thats ok, if frequency of multiplications is low


30/64


Faster Multiplier

Uses multiple adders

Cost/performance tradeoff

Can be pipelined

Several multiplication performed in parallel


31/64


MIPS Multiplication

Two 32-bit registers for product

HI: most-significant 32 bits

LO: least-significant 32-bits

Instructions

mult rs, rt / multu rs, rt 64-bit product in HI/LO

mfhi rd / mflo rd

Move from HI/LO to rd

Can test HI value to see if product overflows 32 bits mul rd, rs, rt

Least-significant 32 bits of product> rd

3


32/64


Division

Check for 0 divisor

Long division approach If divisor dividend bits

1 bit in quotient, subtract

Otherwise 0 bit in quotient, bring down next

dividend bit

Restoring division Do the subtract, and if remainder

goes < 0, add divisor back

Signed division Divide using absolute values

Adjust sign of quotient and remainderas required

100101000 10010100

-1000101011010-1000

100

n-bit operands yield (n+1)-bitquotient and n-bit remainder

quotient

dividend

remainder

divisor

3.4Division


33/64


Division Hardware

Initially dividend

Initially divisorin left half


34/64


Optimized Divider

One cycle per partial-remainder subtraction Looks a lot like a multiplier!

Same hardware can be used for both


35/64


Faster Division

Cant use parallel hardware as in multiplier

Subtraction is conditional on sign of remainder

Faster dividers (e.g. SRT devision)generate multiple quotient bits per step

Still require multiple steps


36/64


MIPS Division

Use HI/LO registers for result

HI: 32-bit remainder

LO: 32-bit quotient

Instructions

div rs, rt / divu rs, rt

No overflow or divide-by-0 checking

Software must perform checks if required

Use mfhi, mflo to access result

3


37/64


Floating Point

Representation for non-integral numbers

Including very small and very large numbers

Like scientific notation

2.34 1056

+0.002 104

+987.02 109

In binary

1.xxxxxxx2 2yyyy

Types float and double in C

normalized

not normalized

3.5FloatingP

oint


38/64


Floating Point Standard

Defined by IEEE Std 754-1985

Developed in response to divergence ofrepresentations

Portability issues for scientific code

Now almost universally adopted

Two representations

Single precision (32-bit) Double precision (64-bit)


39/64


IEEE Floating-Point Format

S: sign bit (0 non-negative, 1 negative) Normalize significand: 1.0 |significand| < 2.0

Always has a leading pre-binary-point 1 bit, so no need torepresent it explicitly (hidden bit)

Significand is Fraction with the 1. restored

Exponent: excess representation: actual exponent + Bias Ensures exponent is unsigned Single: Bias = 127; Double: Bias = 1203

S Exponent Fraction

single: 8 bits

double: 11 bits

single: 23 bits

double: 52 bits

Bias)(ExponentS 2Fraction)(11)(x


40/64


Single-Precision Range

Exponents 00000000 and 11111111 reserved

Smallest value

Exponent: 00000001 actual exponent = 1 127 =126

Fraction: 00000 significand = 1.0 1.0 2126 1.2 1038

Largest value

exponent: 11111110 actual exponent = 254 127 = +127

Fraction: 11111significand 2.0

2.0 2+127 3.4 10+38


41/64


Double-Precision Range

Exponents 000000 and 111111 reserved

Smallest value

Exponent: 00000000001 actual exponent = 1 1023 =1022

Fraction: 00000 significand = 1.0 1.0 21022 2.2 10308

Largest value

Exponent: 11111111110 actual exponent = 2046 1023 = +1023

Fraction: 11111significand 2.0

2.0 2+1023 1.8 10+308

Fl i P i P i i


42/64


Floating-Point Precision

Relative precision

all fraction bits are significant

Single: approx 223

Equivalent to 23 log102 23 0.3 6 decimal

digits of precision

Double: approx 252

Equivalent to 52 log102 52 0.3 16 decimaldigits of precision

Fl ti P i t E l


43/64


Floating-Point Example

Represent0.75

0.75 = (1)1 1.12 21

S = 1

Fraction = 1000002

Exponent =1 + Bias

Single:1 + 127 = 126 = 011111102

Double:1 + 1023 = 1022 = 011111111102

Single: 101111110100000 Double: 101111111110100000

Fl ti P i t E l


44/64


Floating-Point Example

What number is represented by the single-precision float1100000010100000 S = 1

Fraction = 01000002 Fxponent = 100000012 = 129

x = (1)1 (1 + 012) 2(129 127)

= (1) 1.25 22

=5.0

IEEE 754 di f fl ti i t b


45/64

51

IEEE 754 encoding of floating-point numbers

Single precision Double precision Object represented

Exponent Fraction Exponent Fraction

0 0 0 0 0

0 Nonzero 0 Nonzero Denomalized number

1-254 Anything 1-2046 Anything Floating-point number

255 0 2047 0 Infinity

255 Nonzero 2047 Nonzero NaN (Not a Number)

+-+-

+-

Fl ti P i t Additi


46/64


Floating-Point Addition

Consider a 4-digit decimal example 9.999 101 + 1.610 101

1. Align decimal points Shift number with smaller exponent

9.999 101 + 0.016 101

2. Add significands 9.999 101 + 0.016 101 = 10.015 101

3. Normalize result & check for over/underflow

1.0015 102

4. Round and renormalize if necessary 1.002 102

Fl ti P i t Additi


47/64


Floating-Point Addition

Now consider a 4-digit binary example 1.0002 21 +1.1102 22 (0.5 +0.4375)

1. Align binary points Shift number with smaller exponent

1.0002 21 +0.1112 2

1

2. Add significands 1.0002 2

1 +0.1112 21 = 0.0012 2

1

3. Normalize result & check for over/underflow

1.0002 24

, with no over/underflow 4. Round and renormalize if necessary

1.0002 24 (no change) = 0.0625

FP Add H d


48/64


FP Adder Hardware

Much more complex than integer adder

Doing it in one clock cycle would take toolong

Much longer than integer operations

Slower clock would penalize all instructions

FP adder usually takes several cycles

Can be pipelined

FP Add H d


49/64


FP Adder Hardware

Step 1

Step 2

Step 3

Step 4

FP A ith ti H d


50/64


FP Arithmetic Hardware

FP multiplier is of similar complexity to FPadder But uses a multiplier for significands instead of

an adder

FP arithmetic hardware usually does Addition, subtraction, multiplication, division,

reciprocal, square-root

FP integer conversion

Operations usually takes several cycles Can be pipelined

FP I t ti i MIPS


51/64


FP Instructions in MIPS

FP hardware is coprocessor 1 Adjunct processor that extends the ISA

Separate FP registers 32 single-precision: $f0, $f1, $f31 Paired for double-precision: $f0/$f1, $f2/$f3,

Release 2 of MIPs ISA supports 32

64-bit FP regs FP instructions operate only on FP registers

Programs generally dont do integer ops on FP data,or vice versa

More registers with minimal code-size impact

FP load and store instructions lwc1, ldc1, swc1, sdc1

e.g., ldc1 $f8, 32($sp)

FP I t ti i MIPS


52/64


FP Instructions in MIPS

Single-precision arithmetic add.s, sub.s, mul.s, div.s

e.g., add.s $f0, $f1, $f6

Double-precision arithmetic add.d, sub.d, mul.d, div.d

e.g., mul.d $f4, $f4, $f6

Single- and double-precision comparison c.xx.s, c.xx.d (xxis eq, lt, le, ) Sets or clears FP condition-code bit

e.g. c.lt.s $f3, $f4

Branch on FP condition code true or false bc1t, bc1f

e.g., bc1t TargetLabel

FP E ample F to C


53/64


FP Example: F to C

C code:

float f2c (float fahr) {return ((5.0/9.0)*(fahr - 32.0));

}

fahr in $f12, result in $f0, literals in global memory

space Compiled MIPS code:f2c: lwc1 $f16, const5($gp)

lwc2 $f18, const9($gp)div.s $f16, $f16, $f18lwc1 $f18, const32($gp)sub.s $f18, $f12, $f18mul.s $f0, $f16, $f18jr $ra

FP E l A M lti li ti


54/64


FP Example: Array Multiplication

X = X + Y Z All 32 32 matrices, 64-bit double-precision elements

C code:void mm (double x[][],

double y[][], double z[][]) {

int i, j, k;for (i = 0; i! = 32; i = i + 1)

for (j = 0; j! = 32; j = j + 1)for (k = 0; k! = 32; k = k + 1)x[i][j] = x[i][j]

+ y[i][k] * z[k][j];}

Addresses ofx, y, z in $a0, $a1, $a2, andi, j, k in $s0, $s1, $s2



55/64



MIPS code:li $t1, 32 # $t1 = 32 (row size/loop end)

li $s0, 0 # i = 0; initialize 1st for loop

L1: li $s1, 0 # j = 0; restart 2nd for loop

L2: li $s2, 0 # k = 0; restart 3rd for loop

sll $t2, $s0, 5 # $t2 = i * 32 (size of row of x)

addu $t2, $t2, $s1 # $t2 = i * size(row) + jsll $t2, $t2, 3 # $t2 = byte offset of [i][j]

addu $t2, $a0, $t2 # $t2 = byte address of x[i][j]

l.d $f4, 0($t2) # $f4 = 8 bytes of x[i][j]

L3: sll $t0, $s2, 5 # $t0 = k * 32 (size of row of z)

addu $t0, $t0, $s1 # $t0 = k * size(row) + jsll $t0, $t0, 3 # $t0 = byte offset of [k][j]

addu $t0, $a2, $t0 # $t0 = byte address of z[k][j]

l.d $f16, 0($t0) # $f16 = 8 bytes of z[k][j]



56/64



sll $t0, $s0, 5 # $t0 = i*32 (size of row of y)addu $t0, $t0, $s2 # $t0 = i*size(row) + k

sll $t0, $t0, 3 # $t0 = byte offset of [i][k]

addu $t0, $a1, $t0 # $t0 = byte address of y[i][k]

l.d $f18, 0($t0) # $f18 = 8 bytes of y[i][k]

mul.d $f16, $f18, $f16 # $f16 = y[i][k] * z[k][j]

add.d $f4, $f4, $f16 # f4=x[i][j] + y[i][k]*z[k][j]

addiu $s2, $s2, 1 # $k k + 1

bne $s2, $t1, L3 # if (k != 32) go to L3

s.d $f4, 0($t2) # x[i][j] = $f4

addiu $s1, $s1, 1 # $j = j + 1

bne $s1, $t1, L2 # if (j != 32) go to L2addiu $s0, $s0, 1 # $i = i + 1

bne $s0, $t1, L1 # if (i != 32) go to L1

Interpretation of Data


57/64


Interpretation of Data

Bits have no inherent meaning

Interpretation depends on the instructions

applied Computer representations of numbers

Finite range and precision

Need to account for this in programs

The BIG Picture

Associativity3.6


58/64


Associativity

Parallel programs may interleaveoperations in unexpected orders

Assumptions of associativity may fail

6Parallelism

andComputerArithmetic:Ass

ociativity

(x+y)+z x+(y+z)

x -1.50E+38 -1.50E+38

y 1.50E+38

z 1.0 1.0

1.00E+00 0.00E+00

0.00E+00

1.50E+38

Need to validate parallel programs undervarying degrees of parallelism

x86 FP Architecture3.7


59/64


x86 FP Architecture

Originally based on 8087 FP coprocessor

8 80-bit extended-precision registers

Used as a push-down stack

Registers indexed from TOS: ST(0), ST(1),

FP values are 32-bit or 64 in memory Converted on load/store of memory operand

Integer operands can also be convertedon load/store

Very difficult to generate and optimize code Result: poor FP performance

7RealStuff

:FloatingPointinthex86

x86 FP Instructions


60/64


x86 FP Instructions

Optional variations I: integer operand P: pop operand from stack R: reverse operand order But not all combinations allowed

Data transfer Arithmetic Compare TranscendentalFILD mem/ST(i)

FISTP mem/ST(i)

FLDPI

FLD1

FLDZ

FIADDP mem/ST(i)

FISUBRP mem/ST(i)FIMULP mem/ST(i)FIDIVRP mem/ST(i)

FSQRT

FABSFRNDINT

FICOMP

FIUCOMP

FSTSW AX/mem

FPATAN

F2XMI

FCOS

FPTAN

FPREM

FPSIN

FYL2X

St i SIMD E t i 2 (SSE2)


61/64


Streaming SIMD Extension 2 (SSE2)

Adds 4 128-bit registers

Extended to 8 registers in AMD64/EM64T

Can be used for multiple FP operands

2 64-bit double precision

4 32-bit double precision

Instructions operate on them simultaneously

Single-Instruction Multiple-Data

Right Shift and Division3.8


62/64


Right Shift and Division

Left shift by iplaces multiplies an integer

by 2i Right shift divides by 2i?

Only for unsigned integers

For signed integers Arithmetic right shift: replicate the sign bit

e.g.,5 / 4 11111011

2>> 2 = 11111110

2=2

Rounds toward

c.f. 111110112 >>> 2 = 001111102 = +62

FallaciesandPitfalls

Who Cares About FP Accuracy?


63/64


Who Cares About FP Accuracy?

Important for scientific code

But for everyday consumer use?

My bank balance is out by 0.0002!

The Intel Pentium FDIV bug

The market expects accuracy

See Colwell, The Pentium Chronicles

Concluding Remarks3.9


64/64

Concluding Remarks

ISAs support arithmetic

Signed and unsigned integers

Floating-point approximation to reals

Bounded range and precision

Operations can overflow and underflow

MIPS ISA

Core instructions: 54 most frequently used

100% of SPECINT, 97% of SPECFP

Other instructions: less frequent

Concludin

gRemarks

chapter 3 arithmetic for computers(revised) (1)

Documents