Source: csce.uark.edu/.../lecture8-arithmetic-for-computers.pdf · 2020-02-20
TRANSCRIPT
Arithmetic for Computers
Alexander Nelson
February 20, 2020
University of Arkansas - Department of Computer Science and Computer Engineering
Arithmetic for Computers
Computers are digital – discrete
i.e. Cannot represent infinite precision
Computers use finite-length registers and arithmetic units
i.e. Must take care of overflow/underflow
Two fundamental concerns:
1. Operations on integers
2. Operations on floating-point real numbers
1
Operations on Integers
Integer Addition
Comparable to base-10 addition formula
Carry overflowing bits
1 (carry)
87
+ 25
====
112
(00110) (carry)
000111
+000110
========
001101
Overflow if result is out of range
2
Integer Subtraction
Add negation of second operand
Example: 7-6 = 7 +(-6)
carry: 1111 1111 1111 1100
+7: 0000 0000 0000 0111
-6: 1111 1111 1111 1010
===========================
(1)0000 0000 0000 0001
Overflow?
3
Detecting Overflow
When does overflow occur?
• Add two positive values with resulting sign 1
• Add two negative values with resulting sign 0
• Subtract pos. number from neg. with res. sign 0
• Subtract neg. number from pos. with res. sign 1
Did overflow occur in our subtract example?
No – Subtracted positive number from positive number
4
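The four rules above can be checked directly on n-bit two's-complement values; a minimal Python sketch (the function name and bit-width parameter are illustrative):

```python
def add_overflows(a, b, bits=32):
    """Detect signed overflow when adding two `bits`-wide two's-complement
    values: overflow occurs only when both operands share a sign and the
    truncated sum has the opposite sign."""
    mask = (1 << bits) - 1
    total = (a + b) & mask              # hardware-style truncated sum
    sign = 1 << (bits - 1)
    same_sign_in = not ((a ^ b) & sign)  # operands agree in sign
    flipped = bool((a ^ total) & sign)   # sum sign differs from the operands'
    return same_sign_in and flipped
```

With 4-bit operands, `add_overflows(6, 2, bits=4)` reports overflow (6 + 2 = 8 is out of range), while 7 + (−6) does not.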
Dealing with Overflow
Some languages (e.g. C) ignore overflow
i.e. Programmer must deal with it
Compiler uses MIPS addu, addiu, subu instructions
Other languages (e.g. Ada, Fortran) require raising an exception
Use MIPS add, addi, sub instructions
On overflow – invoke an exception handler
• Save PC in exception program counter register
• Jump to predefined handler address
• mfc0 – move from coprocessor reg – retrieve EPC value to
return after correction
5
Trapping Overflow w/o Exception
addu $t0, $t1, $t2 # $t0 = sum, but don’t trap
xor $t3, $t1, $t2 # Check if signs differ
slt $t3, $t3, $zero # $t3 == 1 if signs differ
bne $t3, $zero, NO_OF # Branch depending on results
xor $t3, $t0, $t1 # Signs equal, does sign of sum match?
# $t3 negative if sum sign different
slt $t3, $t3, $zero # $t3 == 1 if sum sign different
bne $t3, $zero, OF # If all signs not same, goto overflow
Try it yourself with 6+2 for 4-bit adder arithmetic
6
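The assembly sequence above can be mirrored step-for-step in Python to watch the sign tests at work; a sketch assuming n-bit two's-complement inputs (the function name is made up here):

```python
def trap_overflow(t1, t2, bits=32):
    """Mirror the addu/xor/slt/bne sequence: compute the truncated sum,
    return it when overflow is impossible, raise when overflow occurred."""
    mask = (1 << bits) - 1
    t0 = (t1 + t2) & mask          # addu: sum, but don't trap
    t3 = (t1 ^ t2) & mask          # xor: sign bit set if operand signs differ
    if t3 >> (bits - 1):           # slt/bne: signs differ -> no overflow possible
        return t0
    t3 = (t0 ^ t1) & mask          # xor: does the sum's sign match the operands'?
    if t3 >> (bits - 1):           # slt/bne: mismatch -> overflow
        raise OverflowError("signed overflow")
    return t0
```

Trying the suggested 6 + 2 with 4-bit arithmetic raises, since the operands are positive but the 4-bit sum 1000 is negative.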
Arithmetic for Multimedia
Graphics/Media processing operates on vectors of 8-bit and 16-bit
data
• Use 64-bit adder w/ partitioned carry chain
• Operate on 8x8-bit, 4x16-bit, or 2x32-bit vectors
• SIMD (single-instruction, multiple-data)
Saturating Operations
• On overflow, result is largest representable value
• Clipping in audio, saturation in video
7
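A saturating operation is just a clamp on the wide result instead of a wrap-around; a sketch for signed 8-bit values:

```python
def saturating_add8(a, b):
    """Saturating signed 8-bit add: on overflow the result sticks at the
    largest or smallest representable value (clipping in audio)."""
    total = a + b
    return max(-128, min(127, total))
```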
Multiplication
How do you multiply?
Consider how you multiplied when you were little!
3 x 5 = 3 + 3 + 3 + 3 + 3 = 15
How many operations?
8
Multiplication
Long-Multiplication Approach, how you multiplied next:
13
x 22
=====
26
+260
=====
286
1000
x 1001
=======
1000
0000
0000
+1000
========
1001000
How many operations?
9
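Long multiplication in binary reduces to one shifted add per 1 bit of the multiplier; a Python sketch:

```python
def shift_add_multiply(multiplicand, multiplier):
    """Binary long multiplication: for each 1 bit of the multiplier, add a
    correspondingly shifted copy of the multiplicand (one partial product
    per bit), assuming non-negative operands."""
    product = 0
    shift = 0
    while multiplier:
        if multiplier & 1:                     # this bit contributes a partial product
            product += multiplicand << shift
        multiplier >>= 1
        shift += 1
    return product
```

This reproduces both worked examples: 13 × 22 = 286 and 1000₂ × 1001₂ = 1001000₂.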
Multiplication in Hardware
10
Optimized Multiplier
Perform steps in parallel: add/shift
One cycle per partial-product addition
OK if frequency of multiplication is low!
11
Faster Multiplier
Use multiple adders!
• Cost/performance tradeoff
Can be pipelined – several multiplications performed in parallel
12
MIPS Multiplication
Two 32-bit registers for product
• HI: Most significant 32-bits
• LO: least-significant 32-bits
Instructions:
• mult rs, rt / multu rs, rt
• 64-bit product in HI/LO
• mfhi rd / mflo rd
• Move from HI/LO to rd
• Can test HI value to see if product overflows 32 bits
• mul rd, rs, rt – least significant 32 bits of product in rd
13
Division
How do you divide?
1. Check for 0 divisor
2. Long division approach
• If divisor ≤ current dividend bits – 1 in quotient, subtract
• Otherwise – 0 bit in quotient, bring down next dividend bit
3. Restoring division – i.e. HW doesn’t know if it can divide, so try then restore if wrong
• Do subtraction, and if remainder goes < 0, add divisor back
4. Signed Division
• Divide using absolute values
• Adjust sign of quotient and remainder as required
14
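Restoring division as described in step 3 can be sketched for unsigned operands (the bit width is a parameter here):

```python
def restoring_divide(dividend, divisor, bits=32):
    """Restoring division: each step brings down one dividend bit, tries the
    subtraction, and adds the divisor back if the remainder went negative."""
    if divisor == 0:
        raise ZeroDivisionError("check for 0 divisor first")
    quotient, remainder = 0, 0
    for i in reversed(range(bits)):
        remainder = (remainder << 1) | ((dividend >> i) & 1)  # bring down next bit
        remainder -= divisor                                  # try the subtraction
        if remainder < 0:
            remainder += divisor                              # restore: it didn't fit
            quotient = quotient << 1                          # 0 in quotient
        else:
            quotient = (quotient << 1) | 1                    # 1 in quotient
    return quotient, remainder
```

Running it on the example that follows, 74 ÷ 8 yields quotient 9 and remainder 2.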
Division
Example:
Divide 74 (1001010₂) by 8 (1000₂): quotient 9, remainder 2
1. 1000 ≤ 1001 – 1 in quotient
2. 1000 > 10 – 0 in quotient
3. 1000 > 101 – 0 in quotient
4. 1000 ≤ 1010 – 1 in quotient
5. Quotient = 1001, Remainder = 10
15
Division in Hardware
Divisor reg, ALU, Remainder all
64 bits wide (remainder should
be 65 for carry out)
16
Optimized Divider
By shifting operands and quotient simultaneously with subtraction,
algorithm & hardware can be refined
Combine quotient register with right half of remainder register
17
Faster Division
Cannot use parallel hardware as in multiplier
Subtraction conditional on sign of remainder
Faster dividers (e.g. SRT division) generate multiple quotient bits
per step
• Still requires multiple steps
18
MIPS Division
Use HI/LO registers for result
• HI: 32-bit remainder
• LO: 32-bit quotient
Instructions:
• div rs, rt & divu rs, rt
• No overflow or divide-by-0 checking!
• Software must perform checks if required
• Use mfhi, mflo to access result
19
Signed Division
It is easier to assume absolute values & restore sign
Caveat: Must keep track of remainder!
Division must always satisfy the following:
Dividend = Quotient × Divisor + Remainder
Consider -7÷2 == -3 remainder -1
Or: -7÷2 = -4 remainder +1
Quotient cannot depend on sign of dividend/divisor:
−(x ÷ y) = (−x)÷ y – Must hold
Enforce that dividend and remainder must have the same signs
20
Signed Division
Takeaway:
Correct signed division negates quotient if signs of operands are
opposite
Makes sign of nonzero remainder match the dividend
21
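The sign fix-ups in the takeaway can be expressed over Python's divmod (Python's own division floors, so the adjustments below restore the truncating, MIPS-style semantics):

```python
def signed_divide(dividend, divisor):
    """Divide using absolute values, then adjust: negate the quotient when
    operand signs differ, and give a nonzero remainder the sign of the
    dividend, so Dividend = Quotient * Divisor + Remainder always holds."""
    q, r = divmod(abs(dividend), abs(divisor))
    if (dividend < 0) != (divisor < 0):
        q = -q                      # operand signs differ -> negative quotient
    if dividend < 0:
        r = -r                      # remainder matches the dividend's sign
    return q, r
```

This gives −7 ÷ 2 = −3 remainder −1, the convention preferred on the previous slide.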
Floating Point
Floating Point
Questions:
How do you represent non-integer values?
How do you represent very small or very large numbers?
Consider scientific notation:
−2.34 × 10^56 – normalized
0.002 × 10^−4 – not normalized
987.02 × 10^9 – not normalized
22
Floating Point
Scientific notation in binary
±1.xxxxxxxx₂ × 2^yyyy
Types – float and double in C
Floating Point Standard – Defined by IEEE Std. 754-1985
Developed in response to divergence of representations
• Portability for scientific code
Almost universally adopted
Two representations – single precision (32 bits), double precision
(64 bits)
23
IEEE Floating-Point Format
Single: Exponent = 8 bits, Fraction = 23 bits
Double: Exponent = 11 bits, Fraction = 52 bits
x = (−1)^S × (1 + Fraction) × 2^(Exponent − Bias)
• S: sign bit (0 = non-negative, 1 = negative)
• Normalize significand: 1.0 ≤ |significand| < 2.0
• Always has leading pre-binary-point 1 bit, no need to represent
explicitly
• Significand is the Fraction with the 1. restored
• Exponent – Excess representation: actual exponent + Bias
• Ensures exponent is unsigned
• Single: Bias = 127; Double: Bias = 1023
24
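The field split can be checked with Python's struct module, which rounds a Python float to the nearest single-precision value before unpacking the raw bits:

```python
import struct

def float32_fields(x):
    """Split a value into its IEEE 754 single-precision sign, exponent, and
    fraction fields (1, 8, and 23 bits respectively)."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    sign = bits >> 31
    exponent = (bits >> 23) & 0xFF      # biased exponent field
    fraction = bits & 0x7FFFFF          # fraction without the hidden 1
    return sign, exponent, fraction
```

For example, 1.0 comes back as sign 0, exponent 127 (the bias itself, since the actual exponent is 0), fraction 0.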
Single Precision
Single: Exponent = 8 bits, Fraction = 23 bits
Double: Exponent = 11 bits, Fraction = 52 bits
x = (−1)^S × (1 + Fraction) × 2^(Exponent − Bias)
What is the range of a single precision float?
25
Single Precision Range
Single: Exponent = 8 bits, Fraction = 23 bits
Double: Exponent = 11 bits, Fraction = 52 bits
x = (−1)^S × (1 + Fraction) × 2^(Exponent − Bias)
Smallest Value:
• Exponent = 0b00000001 == 1 − 127 = −126
• Fraction = 0b00000000000000000000000 == significand = 1.0
• Result = ±1.0 × 2^−126 ≈ ±1.2 × 10^−38
26
Single Precision Range
Single: Exponent = 8 bits, Fraction = 23 bits
Double: Exponent = 11 bits, Fraction = 52 bits
x = (−1)^S × (1 + Fraction) × 2^(Exponent − Bias)
Largest Value:
• Exponent = 0b11111110 == 254 − 127 = +127
• Fraction = 0b11111111111111111111111 == significand ≈ 2.0
• Result = ±2.0 × 2^+127 ≈ ±3.4 × 10^38
27
Double Precision Range
Single: Exponent = 8 bits, Fraction = 23 bits
Double: Exponent = 11 bits, Fraction = 52 bits
x = (−1)^S × (1 + Fraction) × 2^(Exponent − Bias)
Smallest Value:
• Exponent = 0b00000000001 == 1 − 1023 = −1022
• Fraction = 0b00...0000 == significand = 1.0
• Result = ±1.0 × 2^−1022 ≈ ±2.2 × 10^−308
28
Double Precision Range
Single: Exponent = 8 bits, Fraction = 23 bits
Double: Exponent = 11 bits, Fraction = 52 bits
x = (−1)^S × (1 + Fraction) × 2^(Exponent − Bias)
Largest Value:
• Exponent = 0b11111111110 == 2046 − 1023 = +1023
• Fraction = 0b11...1111 == significand ≈ 2.0
• Result = ±2.0 × 2^+1023 ≈ ±1.8 × 10^308
29
Floating-Point Precision
Relative Precision
• All fraction bits are significant
• Single: approximately 2^−23
• Equivalent to 23 × log10(2) ≈ 23 × 0.3 ≈ 6 decimal digits of
precision
• Double: approximately 2^−52
• Equivalent to 52 × log10(2) ≈ 52 × 0.3 ≈ 16 decimal digits of
precision
30
Floating Point Example
Represent -0.75
• −0.75 = (−1)^1 × 1.1₂ × 2^−1
• S = 1
• Fraction = 1000...00₂
• Exponent = −1 + Bias
• Single: −1 + 127 = 126 = 01111110₂
• Double: −1 + 1023 = 1022 = 01111111110₂
Single: 1 01111110 1000000...000000
Double: 1 01111111110 100000...000
31
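The hand-derived single-precision pattern can be verified with struct (a quick check, not part of the slides):

```python
import struct

# Unpack the raw bits of -0.75 rounded to single precision.
bits = struct.unpack(">I", struct.pack(">f", -0.75))[0]
pattern = f"{bits:032b}"   # sign(1) | exponent(8) | fraction(23)
```

`pattern` comes out as sign 1, exponent 01111110, then fraction 1 followed by 22 zeros, matching the derivation above.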
Floating Point Example
Let’s go the other way:
What number is represented by the single-precision float:
1 10000001 01000000...000000
• S = 1
• Fraction = 01000...00₂
• Exponent = 10000001₂ = 129 == 129 − 127
x = (−1)^1 × (1 + 0.01₂) × 2^(129−127)
x = (−1) × (1.25) × 2^2
x = −5.0
32
IEEE 754 Encoding
Why are exponent 0 and exponent max special?
Special encodings:
Single Precision        Double Precision        Object Represented
Exponent  Fraction      Exponent  Fraction
0         0             0         0             0
0         Nonzero       0         Nonzero       ± denormalized number
1–254     Anything      1–2046    Anything      ± floating-point number
255       0             2047      0             ± infinity
255       Nonzero       2047      Nonzero       NaN (Not a Number)
33
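The table translates into a straightforward classifier on the raw fields; a sketch for single precision:

```python
def classify_float32(exponent, fraction):
    """Classify a single-precision value from its raw 8-bit exponent and
    23-bit fraction fields, per the IEEE 754 special encodings."""
    if exponent == 0:
        return "zero" if fraction == 0 else "denormalized"
    if exponent == 255:                 # all-ones exponent is reserved
        return "infinity" if fraction == 0 else "NaN"
    return "normalized"
```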
Denormal Numbers
Notice that if the exponent is zero, that produces a “denormalized
number”
This means that the “hidden bit” (i.e. the leading 1 in the
fraction), is instead a 0
x = (−1)^S × (0 + 0.Fraction) × 2^(1 − Bias)
Smaller than normal numbers
• Allows for gradual underflow with diminishing precision
34
Infinity & NaNs
Exponent = 111...1, Fraction = 000000
• ± infinity
• Can be used in subsequent calculations, avoiding need for
overflow check
Exponent = 111...1, Fraction ≠ 000...0
• Not-a-Number (NaN)
• Indicates illegal or undefined result (e.g. 0.0/0.0)
• Can be used in subsequent calculations
35
Floating Point Arithmetic
Floating-Point Addition (Decimal)
Consider a 4-digit decimal example1:
9.999 × 10^1 + 1.610 × 10^−1
How do we do this?
1. Align decimal points
• Shift number with smaller exponent
• 9.999 × 10^1 + 0.016 × 10^1
2. Add significands
• 9.999 × 10^1 + 0.016 × 10^1 = 10.015 × 10^1
3. Normalize result, check for over/underflow
• 1.0015 × 10^2
4. Round and renormalize if necessary
• 1.002 × 10^2
1Examples directly from book so you can follow along
36
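The four steps can be sketched for decimal significands; this toy version (the function name and rounding details are simplifications, not IEEE-exact) reproduces the worked example:

```python
def fp_add_decimal(s1, e1, s2, e2, digits=4):
    """Add two decimal floating-point numbers s1*10^e1 + s2*10^e2 keeping
    `digits` significant digits: align, add, normalize, round."""
    # 1. Align decimal points: shift the number with the smaller exponent.
    if e1 < e2:
        s1, e1, s2, e2 = s2, e2, s1, e1
    s2 /= 10 ** (e1 - e2)
    # 2. Add significands.
    s, e = s1 + s2, e1
    # 3. Normalize the result (1.0 <= |s| < 10).
    while abs(s) >= 10:
        s /= 10
        e += 1
    while s != 0 and abs(s) < 1:
        s *= 10
        e -= 1
    # 4. Round to the available digits and renormalize if needed.
    s = round(s, digits - 1)
    if abs(s) >= 10:
        s /= 10
        e += 1
    return s, e
```

Feeding in 9.999 × 10^1 and 1.610 × 10^−1 yields 1.002 × 10^2, matching the slide.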
Floating-Point Addition (Binary)
Consider a 4-digit binary example:
1.000₂ × 2^−1 + (−1.110₂ × 2^−2) (i.e. 0.5 + (−0.4375))
How do we do this?
1. Align binary points
• Shift number with smaller exponent
• 1.000₂ × 2^−1 + (−0.111₂ × 2^−1)
2. Add significands
• 1.000₂ × 2^−1 + (−0.111₂ × 2^−1) = 0.001₂ × 2^−1
3. Normalize result, check for over/underflow
• 1.000₂ × 2^−4, with no over/underflow
4. Round and renormalize if necessary
• 1.000₂ × 2^−4 (no change) = 0.0625
37
Floating Point Adder Hardware
Much more complex than integer
add/subtraction
Doing it in one clock cycle would take
too long
• Much longer than integer
operations
• Slower clock would penalize all
instructions
FP Adder usually takes several cycles
• Can be pipelined
38
Floating Point Adder Hardware
39
What about Multiplication?
FP multiplier is similar complexity to FP adder
• Uses a multiplier for significands instead of an adder
FP arithmetic hardware usually does
• Addition, subtraction, multiplication, division, reciprocal,
square-root
• FP ↔ integer conversion
40
FP Instructions in MIPS
FP Hardware is coprocessor 1
• Adjunct processor that extends the ISA
Separate FP registers
• 32 single-precision: $f0, $f1, ..., $f31
• Paired for double-precision: $f0/$f1, $f2/$f3, ...
• Release 2 of MIPS ISA supports 32x64-bit FP registers
FP instructions operate only on FP registers
• Programs generally don’t do integer operations on FP data or
vice versa
• More registers with minimal code-size impact
FP load and store instructions
• lwc1, ldc1, swc1, sdc1
41
FP Instructions in MIPS
Single-precision arithmetic
• add.s, sub.s, mul.s, div.s
• e.g. add.s $f0, $f1, $f6
Double-precision arithmetic
• add.d, sub.d, mul.d, div.d
Single and double-precision comparison
• c.xx.s, c.xx.d (xx is eq, lt, le, ...)
• Sets or clears FP condition-code bit
Branch on FP condition code true or false
• bc1t, bc1f – branch on true/false
42
FP Example: Fahrenheit to Celsius
C Code:
float f2c (float fahr){
return ((5.0/9.0)*(fahr-32.0));
}
fahr in $f12, result in $f0, literals in global memory space
f2c: lwc1 $f16, const5($gp)
lwc1 $f18, const9($gp)
div.s $f16, $f16, $f18
lwc1 $f18, const32($gp)
sub.s $f18, $f12, $f18
mul.s $f0, $f16, $f18
jr $ra
43
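A Python rendering of f2c that forces each intermediate to single precision (mimicking what the .s instructions compute; the rounding helper is an illustration, not MIPS code) shows the rounding the hardware would do:

```python
import struct

def f2c(fahr):
    """Fahrenheit to Celsius with every intermediate rounded to the nearest
    IEEE 754 single-precision value, as the .s instructions would."""
    def to_f32(x):
        # Round-trip through a 4-byte float to round to single precision.
        return struct.unpack(">f", struct.pack(">f", x))[0]
    const = to_f32(to_f32(5.0) / to_f32(9.0))   # div.s: 5.0/9.0 in float32
    diff = to_f32(fahr - 32.0)                   # sub.s: fahr - 32.0
    return to_f32(const * diff)                  # mul.s: final product
```

Because 5.0/9.0 is not exact in single precision, f2c(212.0) lands very close to, but not exactly on, 100.0.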
Accurate Arithmetic
IEEE Std 754 specifies additional rounding control
• Extra bits of precision (guard, round, sticky)
• Choice of rounding modes
• Allows programmer to fine-tune numerical behavior of
computation
Not all FP units implement all options
• Most programming languages and FP libraries just use defaults
Trade-off between hardware complexity, performance, and market
requirements
44
Who cares about FP Accuracy?
Important for scientific code!
What about everyday consumer use?
• Financial interactions on large scales (e.g. stock market
transfers)
• Video games (miss your target because of rounding issues)
Bottom line: The market expects accuracy from computers, and
generally doesn’t understand precision
45