chapter 3 arithmetic for computers(revised) (1)

Upload: yrikki

Post on 03-Apr-2018

237 views

Category:

Documents


1 download

TRANSCRIPT

  • 7/28/2019 Chapter 3 Arithmetic for Computers(Revised) (1)

    1/64

    Chapter 3Arithmetic for Computers

  • 7/28/2019 Chapter 3 Arithmetic for Computers(Revised) (1)

    2/64

    Chapter 3 Arithmetic for Computers 2

    Arithmetic for Computers

    Operations on integers Addition and subtraction

    Multiplication and division

    Dealing with overflow

    Floating-point real numbers

    Representation and operations

    3.1Introduction

  • 7/28/2019 Chapter 3 Arithmetic for Computers(Revised) (1)

    3/64

    3

    Arithmetic

    Where we've been: Performance (seconds, cycles, instructions)

    Abstractions:Instruction Set ArchitectureAssembly Language and Machine Language

    What's up ahead:

    Implementing the Architecture

    32

    32

    32

    operation

    result

    a

    b

    ALU

  • 7/28/2019 Chapter 3 Arithmetic for Computers(Revised) (1)

    4/64

    4

    Bits are just bits (no inherent meaning) conventions define relationship between bits and numbers

    Binary numbers (base 2)0000 0001 0010 0011 0100 0101 0110 0111 1000 1001...decimal: 0...2n-1

    Of course it gets more complicated:

    numbers are finite (overflow)fractions and real numbersnegative numberse.g., no MIPS subi instruction; addi can add a negative number)

    How do we represent negative numbers?i.e., which bit patterns will represent which numbers?

    Numbers

  • 7/28/2019 Chapter 3 Arithmetic for Computers(Revised) (1)

    5/64

    5

    Sign Magnitude: One's Complement Two's Complement

    000 = +0 000 = +0 000 = +0001 = +1 001 = +1 001 = +1010 = +2 010 = +2 010 = +2011 = +3 011 = +3 011 = +3100 = -0 100 = -3 100 = -4101 = -1 101 = -2 101 = -3110 = -2 110 = -1 110 = -2111 = -3 111 = -0 111 = -1

    Issues: balance, number of zeros, ease of operations

    Which one is best? Why?

    Possible Representations

  • 7/28/2019 Chapter 3 Arithmetic for Computers(Revised) (1)

    6/64

    6

    32 bit signed numbers:

    0000 0000 0000 0000 0000 0000 0000 0000two = 0ten0000 0000 0000 0000 0000 0000 0000 0001two = + 1ten0000 0000 0000 0000 0000 0000 0000 0010two = + 2ten...0111 1111 1111 1111 1111 1111 1111 1110two = + 2,147,483,646ten0111 1111 1111 1111 1111 1111 1111 1111two = + 2,147,483,647ten1000 0000 0000 0000 0000 0000 0000 0000two = 2,147,483,648ten1000 0000 0000 0000 0000 0000 0000 0001two = 2,147,483,647ten1000 0000 0000 0000 0000 0000 0000 0010two = 2,147,483,646ten...1111 1111 1111 1111 1111 1111 1111 1101two = 3ten1111 1111 1111 1111 1111 1111 1111 1110two = 2ten1111 1111 1111 1111 1111 1111 1111 1111two = 1ten

    maxint

    minint

    MIPS

  • 7/28/2019 Chapter 3 Arithmetic for Computers(Revised) (1)

    7/64

    7

    Negating a two's complement number: invert all bits and add 1 remember: negate (+/-) and invert (1/0) are quite different!

    Converting n bit numbers into numbers with more than n bits:

    MIPS 16 bit immediate gets converted to 32 bits for arithmetic

    copy the most significant bit (the sign bit) into the other bits0010 -> 0000 0010

    1010 -> 1111 1010

    "sign extension" (lbu vs. lb)

    Two's Complement Operations

  • 7/28/2019 Chapter 3 Arithmetic for Computers(Revised) (1)

    8/64

    8

    Just like in grade school (carry/borrow 1s)

    0111 0111 0110

    + 0110 - 0110 - 0101

    Two's complement operations easy

    subtraction using addition of negative numbers0111

    + 1010

    Overflow (result too large for finite computer word):

    e.g., adding two n-bit numbers does not yield an n-bit number0111

    + 0001 note that overf low term is somewhat misleading,

    1000 it does not mean a carry overflowed

    Addition & Subtraction

  • 7/28/2019 Chapter 3 Arithmetic for Computers(Revised) (1)

    9/64

    9

    No overflow when adding a positive and a negative number

    No overflow when signs are the same for subtraction

    Overflow occurs when the value affects the sign:

    overflow when adding two positives yields a negative

    or, adding two negatives gives a positive

    or, subtract a negative from a positive and get a negative or, subtract a positive from a negative and get a positive

    Consider the operations A + B, and A B

    Can overflow occur if B is 0 ?

    Can overflow occur if A is 0 ?

    Detecting Overflow

  • 7/28/2019 Chapter 3 Arithmetic for Computers(Revised) (1)

    10/64

    10

    An exception (interrupt) occurs

    Control jumps to predefined address for exception

    Interrupted address is saved for possible resumption

    Details based on software system / language

    example: flight control vs. homework assignment

    Don't always want to detect overflow

    new MIPS instructions: addu, addiu, subu

    note: addiu sti l l sign-extends!note: sltu, sltiu for unsigned comparisons

    Roll over: circular buffers Saturation: pixel l ightness contr ol

    Effects of Overflow

  • 7/28/2019 Chapter 3 Arithmetic for Computers(Revised) (1)

    11/64

    11

    Problem: Consider a logic function with three inputs: A, B, and C.

    Output D is true if at least one input is trueOutput E is true if exactly two inputs are trueOutput F is true only if all three inputs are true

    Show the truth table for these three functions.

    Show the Boolean equations for these three functions.

    Show an implementation consisting of inverters, AND, and OR gates.

    Review: Boolean Algebra & Gates

  • 7/28/2019 Chapter 3 Arithmetic for Computers(Revised) (1)

    12/64

    12

    Let's build an ALU to support the andi and ori instructions

    we'll just build a 1 bit ALU, and use 32 of them

    Possible Implementation (sum-of-products):

    b

    a

    operation

    result

    op a b res

    An ALU (arithmetic logic unit)

  • 7/28/2019 Chapter 3 Arithmetic for Computers(Revised) (1)

    13/64

    13

    Appendix: C.5

  • 7/28/2019 Chapter 3 Arithmetic for Computers(Revised) (1)

    14/64

    14

    Selects one of the inputs to be the output, based on a control input

    Lets build our ALU using a MUX:

    S

    CA

    B0

    1

    Review: The Multiplexor

    note: we call this a 2-input mux

    even though i t has 3 inputs!

  • 7/28/2019 Chapter 3 Arithmetic for Computers(Revised) (1)

    15/64

    15

    Not easy to decide the best way to build something

    Don't want too many inputs to a single gate

    Dont want to have to go through too many gates

    for our purposes, ease of comprehension is important

    Let's look at a 1-bit ALU for addition:

    How could we build a 1-bit ALU for add, and, and or?

    How could we build a 32-bit ALU?

    Different Implementations

    cout = a b + a cin + b cinsum = a xor b xor cin

    Sum

    CarryIn

    CarryOut

    a

    b

  • 7/28/2019 Chapter 3 Arithmetic for Computers(Revised) (1)

    16/64

    16

    Building a 32 bit ALU

    b

    0

    2

    Result

    Operation

    a

    1

    CarryIn

    CarryOut

    Result31

    a31

    b31

    Result0

    CarryIn

    a0

    b0

    Result1

    a1

    b1

    Result2

    a2

    b2

    Operation

    ALU0

    CarryIn

    CarryOut

    ALU1

    CarryIn

    CarryOut

    ALU2

    CarryIn

    CarryOut

    ALU31

    CarryIn

    R0 = a ^ b;

    R1 = a V b;

    R2 = a + b;

    Case (Op) {

    0: R = R0;

    1: R = R1;

    2: R = R2 }

    Case (Op) {

    0: R = a ^ b;

    1: R = a V b;

    2: R = a + b}

  • 7/28/2019 Chapter 3 Arithmetic for Computers(Revised) (1)

    17/64

    17

    Two's complement approach: just negate b and add.

    How do we negate?

    A very clever solution:

    Result = A + (~B) + 1

    What about subtraction (a b) ?

    0

    2

    Result

    Operation

    a

    1

    CarryIn

    CarryOut

    0

    1

    Binvert

    b

  • 7/28/2019 Chapter 3 Arithmetic for Computers(Revised) (1)

    18/64

    18

    Need to support the set-on-less-than instruction (slt) remember: slt is an arithmetic instruction

    produces a 1 if rs < rt and 0 otherwise

    use subtraction: (a-b) < 0 implies a < b

    Need to support test for equality (beq $t5, $t6, $t7) use subtraction: (a-b) = 0 implies a = b

    Tailoring the ALU to the MIPS

  • 7/28/2019 Chapter 3 Arithmetic for Computers(Revised) (1)

    19/64

    Supporting slt

    Can we figure out the idea?0

    3

    Result

    Operation

    a

    1

    CarryIn

    CarryOut

    0

    1

    Binvert

    b 2

    Less

    a.

    0

    3

    Result

    Operation

    a

    1

    CarryIn

    0

    1

    Binvert

    b 2

    Less

    Set

    Overflowdetection

    Overflow

    b.

  • 7/28/2019 Chapter 3 Arithmetic for Computers(Revised) (1)

    20/64

    20

    Set

    a31

    0

    ALU0 Result0

    CarryIn

    a0

    Result1

    a1

    0

    Result2

    a2

    0

    Operation

    b31

    b0

    b1

    b2

    Result31

    Overflow

    Binvert

    CarryIn

    Less

    CarryIn

    CarryOut

    ALU1

    Less

    CarryIn

    CarryOut

    ALU2

    Less

    CarryIn

    CarryOut

    ALU31

    Less

    CarryIn

  • 7/28/2019 Chapter 3 Arithmetic for Computers(Revised) (1)

    21/64

    21

    Test for equality

    Notice control lines:

    000 = and

    001 = or

    010 = add

    110 = subtract

    111 = slt

    Note: zero is a 1 when the resul t is zero!

    Seta31

    0

    Result0a0

    Result1a1

    0

    Result2a2

    0

    Operation

    b31

    b0

    b1

    b2

    Result31

    Overflow

    Bnegate

    Zero

    ALU0

    Less

    CarryIn

    CarryOut

    ALU1

    Less

    CarryIn

    CarryOut

    ALU2

    Less

    CarryIn

    CarryOut

    ALU31

    Less

    CarryIn

    0

    3

    Result

    Operation

    a

    1

    CarryIn

    CarryOut

    0

    1

    Binvert

    b 2

    Less

    a.

  • 7/28/2019 Chapter 3 Arithmetic for Computers(Revised) (1)

    22/64

    22

    Conclusion

    We can build an ALU to support the MIPS instruction set

    key idea: use multiplexor to select the output we want

    we can efficiently perform subtraction using twos complement

    we can replicate a 1-bit ALU to produce a 32-bit ALU

    Important points about hardware

    all of the gates are always working the speed of a gate is affected by the number of inputs to the gate

    the speed of a circuit is affected by the number of gates in series

    (on the critical path or the deepest level of logic)

    Our primary focus: comprehension, however,

    Clever changes to organization can improve performance(similar to using better algorithms in software)

    well look at two examples for addition and multiplication

  • 7/28/2019 Chapter 3 Arithmetic for Computers(Revised) (1)

    23/64

    23

    Is a 32-bit ALU as fast as a 1-bit ALU?

    Is there more than one way to do addition?

    two extremes: ripple carry and sum-of-products

    Can you see the ripple? How could you get rid of it?

    c1 = b0c0 + a0c0 +a0b0 c2 = b1c1 + a1c1 +a1b1c3 = b2c2 + a2c2 +a2b2 c4 = b3c3 + a3c3 +a3b3

    Expanding the carry chainsc2 = a1a0b0+a1a0c0+a1b0c0+b1a0b0+b1a0c0+b1b0c0+a1b1

    c3 = ??? c4 = ???

    Not feasible! Why?

    Cn has 2P terms (P=0n)

    Problem: ripple carry adder is slow

  • 7/28/2019 Chapter 3 Arithmetic for Computers(Revised) (1)

    24/64

    24

    Appendix: C.6

  • 7/28/2019 Chapter 3 Arithmetic for Computers(Revised) (1)

    25/64

    25

    An approach in-between our two extremes

    Motivation:

    If we didn't know the value of carry-in, what could we do?

    When would we always generate a carry? gi = aibi When would we propagate the carry? pi = ai + bi

    Did we get rid of the ripple?

    c1 = g0 + p0c0 c2 = g1 + p1c1c3 = g2 + p2c2 c4 = g3 + p3c3

    Expanding the carry chainsc2 = g1+p1g0+p1p0c0

    Feasible! Why?

    The carry chain does not disappear, but is much smaller:

    Cn has n+1 terms

    Carry-lookahead adder

  • 7/28/2019 Chapter 3 Arithmetic for Computers(Revised) (1)

    26/64

    26

    How about constructing a 16-bit adder in CLA way?

    Cant build a 16 bit adder this way... (too big) Could use ripple carry of 4-bit CLA adders

    Better: use the CLA principle again!

    Use principle to build bigger adders

    CarryIn

    Result0--3

    ALU0

    CarryIn

    Result4--7

    ALU1

    CarryIn

    Result8--11

    ALU2

    CarryIn

    CarryOut

    Result12--15

    ALU3

    CarryIn

    C1

    C2

    C3

    C4

    P0G0

    P1

    G1

    P2G2

    P3G3

    pigi

    pi + 1

    gi + 1

    ci + 1

    ci + 2

    ci + 3

    ci + 4

    pi + 2gi + 2

    pi + 3gi + 3

    a0

    b0a1b1a2b2a3b3

    a4b4a5b5a6

    b6a7b7

    a8b8a9b9

    a10b10a11b11

    a12b12a13b13a14b14a15b15

    Carry-lookahead unit

    P0 = p3 p2 p1 p0

    P1 = p7 p6 p5 p4

    P2 =p11 p10 p9 p8

    P3 = p15 p14 p13 p12

    G0 = g3+p3g2+p3p2g1+p3p2p1g0

    G1 = g7+p7g6+p7p6g5+p7p6p5g4

    G2 = g11+p11g10+p11p10g9+p11p10p9g8

    G3 = g15+p15g14+p15p14g13+p15p14p13g12

    C1 = G0 +(P0

    c0)

    C2 = G1+(P1 G0)+(P1P0c0)

    C3 = G2+(P2 G1)+(P2P1G0)+(P2P1P0c0)

    C4 = G3+(P3 G2)+(P3P2G1)+(P3P2P1G0)

    +(P3P2P1P0c0)

    3

  • 7/28/2019 Chapter 3 Arithmetic for Computers(Revised) (1)

    27/64

    Chapter 3 Arithmetic for Computers 31

    Multiplication

    Start with long-multiplication approach

    1000 1001

    10000000000010001001000

    Length of product isthe sum of operandlengths

    multiplicand

    multiplier

    product

    3.3Multiplica

    tion

  • 7/28/2019 Chapter 3 Arithmetic for Computers(Revised) (1)

    28/64

    Chapter 3 Arithmetic for Computers 32

    Multiplication Hardware

    Initially 0

  • 7/28/2019 Chapter 3 Arithmetic for Computers(Revised) (1)

    29/64

    Chapter 3 Arithmetic for Computers 33

    Optimized Multiplier

    Perform steps in parallel: add/shift

    One cycle per partial-product addition

    Thats ok, if frequency of multiplications is low

  • 7/28/2019 Chapter 3 Arithmetic for Computers(Revised) (1)

    30/64

    Chapter 3 Arithmetic for Computers 34

    Faster Multiplier

    Uses multiple adders

    Cost/performance tradeoff

    Can be pipelined

    Several multiplication performed in parallel

  • 7/28/2019 Chapter 3 Arithmetic for Computers(Revised) (1)

    31/64

    Chapter 3 Arithmetic for Computers 35

    MIPS Multiplication

    Two 32-bit registers for product

    HI: most-significant 32 bits

    LO: least-significant 32-bits

    Instructions

    mult rs, rt / multu rs, rt 64-bit product in HI/LO

    mfhi rd / mflo rd

    Move from HI/LO to rd

    Can test HI value to see if product overflows 32 bits mul rd, rs, rt

    Least-significant 32 bits of product> rd

    3

  • 7/28/2019 Chapter 3 Arithmetic for Computers(Revised) (1)

    32/64

    Chapter 3 Arithmetic for Computers 36

    Division

    Check for 0 divisor

    Long division approach If divisor dividend bits

    1 bit in quotient, subtract

    Otherwise 0 bit in quotient, bring down next

    dividend bit

    Restoring division Do the subtract, and if remainder

    goes < 0, add divisor back

    Signed division Divide using absolute values

    Adjust sign of quotient and remainderas required

    100101000 10010100

    -1000101011010-1000

    100

    n-bit operands yield (n+1)-bitquotient and n-bit remainder

    quotient

    dividend

    remainder

    divisor

    3.4Division

  • 7/28/2019 Chapter 3 Arithmetic for Computers(Revised) (1)

    33/64

    Chapter 3 Arithmetic for Computers 37

    Division Hardware

    Initially dividend

    Initially divisorin left half

  • 7/28/2019 Chapter 3 Arithmetic for Computers(Revised) (1)

    34/64

    Chapter 3 Arithmetic for Computers 38

    Optimized Divider

    One cycle per partial-remainder subtraction Looks a lot like a multiplier!

    Same hardware can be used for both

  • 7/28/2019 Chapter 3 Arithmetic for Computers(Revised) (1)

    35/64

    Chapter 3 Arithmetic for Computers 39

    Faster Division

    Cant use parallel hardware as in multiplier

    Subtraction is conditional on sign of remainder

    Faster dividers (e.g. SRT devision)generate multiple quotient bits per step

    Still require multiple steps

  • 7/28/2019 Chapter 3 Arithmetic for Computers(Revised) (1)

    36/64

    Chapter 3 Arithmetic for Computers 40

    MIPS Division

    Use HI/LO registers for result

    HI: 32-bit remainder

    LO: 32-bit quotient

    Instructions

    div rs, rt / divu rs, rt

    No overflow or divide-by-0 checking

    Software must perform checks if required

    Use mfhi, mflo to access result

    3

  • 7/28/2019 Chapter 3 Arithmetic for Computers(Revised) (1)

    37/64

    Chapter 3 Arithmetic for Computers 41

    Floating Point

    Representation for non-integral numbers

    Including very small and very large numbers

    Like scientific notation

    2.34 1056

    +0.002 104

    +987.02 109

    In binary

    1.xxxxxxx2 2yyyy

    Types float and double in C

    normalized

    not normalized

    3.5FloatingP

    oint

  • 7/28/2019 Chapter 3 Arithmetic for Computers(Revised) (1)

    38/64

    Chapter 3 Arithmetic for Computers 42

    Floating Point Standard

    Defined by IEEE Std 754-1985

    Developed in response to divergence ofrepresentations

    Portability issues for scientific code

    Now almost universally adopted

    Two representations

    Single precision (32-bit) Double precision (64-bit)

  • 7/28/2019 Chapter 3 Arithmetic for Computers(Revised) (1)

    39/64

    Chapter 3 Arithmetic for Computers 43

    IEEE Floating-Point Format

    S: sign bit (0 non-negative, 1 negative) Normalize significand: 1.0 |significand| < 2.0

    Always has a leading pre-binary-point 1 bit, so no need torepresent it explicitly (hidden bit)

    Significand is Fraction with the 1. restored

    Exponent: excess representation: actual exponent + Bias Ensures exponent is unsigned Single: Bias = 127; Double: Bias = 1203

    S Exponent Fraction

    single: 8 bits

    double: 11 bits

    single: 23 bits

    double: 52 bits

    Bias)(ExponentS 2Fraction)(11)(x

  • 7/28/2019 Chapter 3 Arithmetic for Computers(Revised) (1)

    40/64

    Chapter 3 Arithmetic for Computers 44

    Single-Precision Range

    Exponents 00000000 and 11111111 reserved

    Smallest value

    Exponent: 00000001 actual exponent = 1 127 =126

    Fraction: 00000 significand = 1.0 1.0 2126 1.2 1038

    Largest value

    exponent: 11111110 actual exponent = 254 127 = +127

    Fraction: 11111significand 2.0

    2.0 2+127 3.4 10+38

  • 7/28/2019 Chapter 3 Arithmetic for Computers(Revised) (1)

    41/64

    Chapter 3 Arithmetic for Computers 45

    Double-Precision Range

    Exponents 000000 and 111111 reserved

    Smallest value

    Exponent: 00000000001 actual exponent = 1 1023 =1022

    Fraction: 00000 significand = 1.0 1.0 21022 2.2 10308

    Largest value

    Exponent: 11111111110 actual exponent = 2046 1023 = +1023

    Fraction: 11111significand 2.0

    2.0 2+1023 1.8 10+308

    Fl i P i P i i

  • 7/28/2019 Chapter 3 Arithmetic for Computers(Revised) (1)

    42/64

    Chapter 3 Arithmetic for Computers 46

    Floating-Point Precision

    Relative precision

    all fraction bits are significant

    Single: approx 223

    Equivalent to 23 log102 23 0.3 6 decimal

    digits of precision

    Double: approx 252

    Equivalent to 52 log102 52 0.3 16 decimaldigits of precision

    Fl ti P i t E l

  • 7/28/2019 Chapter 3 Arithmetic for Computers(Revised) (1)

    43/64

    Chapter 3 Arithmetic for Computers 47

    Floating-Point Example

    Represent0.75

    0.75 = (1)1 1.12 21

    S = 1

    Fraction = 1000002

    Exponent =1 + Bias

    Single:1 + 127 = 126 = 011111102

    Double:1 + 1023 = 1022 = 011111111102

    Single: 101111110100000 Double: 101111111110100000

    Fl ti P i t E l

  • 7/28/2019 Chapter 3 Arithmetic for Computers(Revised) (1)

    44/64

    Chapter 3 Arithmetic for Computers 48

    Floating-Point Example

    What number is represented by the single-precision float1100000010100000 S = 1

    Fraction = 01000002 Fxponent = 100000012 = 129

    x = (1)1 (1 + 012) 2(129 127)

    = (1) 1.25 22

    =5.0

    IEEE 754 di f fl ti i t b

  • 7/28/2019 Chapter 3 Arithmetic for Computers(Revised) (1)

    45/64

    51

    IEEE 754 encoding of floating-point numbers

    Single precision Double precision Object represented

    Exponent Fraction Exponent Fraction

    0 0 0 0 0

    0 Nonzero 0 Nonzero Denomalized number

    1-254 Anything 1-2046 Anything Floating-point number

    255 0 2047 0 Infinity

    255 Nonzero 2047 Nonzero NaN (Not a Number)

    +-+-

    +-

    Fl ti P i t Additi

  • 7/28/2019 Chapter 3 Arithmetic for Computers(Revised) (1)

    46/64

    Chapter 3 Arithmetic for Computers 52

    Floating-Point Addition

    Consider a 4-digit decimal example 9.999 101 + 1.610 101

    1. Align decimal points Shift number with smaller exponent

    9.999 101 + 0.016 101

    2. Add significands 9.999 101 + 0.016 101 = 10.015 101

    3. Normalize result & check for over/underflow

    1.0015 102

    4. Round and renormalize if necessary 1.002 102

    Fl ti P i t Additi

  • 7/28/2019 Chapter 3 Arithmetic for Computers(Revised) (1)

    47/64

    Chapter 3 Arithmetic for Computers 53

    Floating-Point Addition

    Now consider a 4-digit binary example 1.0002 21 +1.1102 22 (0.5 +0.4375)

    1. Align binary points Shift number with smaller exponent

    1.0002 21 +0.1112 2

    1

    2. Add significands 1.0002 2

    1 +0.1112 21 = 0.0012 2

    1

    3. Normalize result & check for over/underflow

    1.0002 24

    , with no over/underflow 4. Round and renormalize if necessary

    1.0002 24 (no change) = 0.0625

    FP Add H d

  • 7/28/2019 Chapter 3 Arithmetic for Computers(Revised) (1)

    48/64

    Chapter 3 Arithmetic for Computers 54

    FP Adder Hardware

    Much more complex than integer adder

    Doing it in one clock cycle would take toolong

    Much longer than integer operations

    Slower clock would penalize all instructions

    FP adder usually takes several cycles

    Can be pipelined

    FP Add H d

  • 7/28/2019 Chapter 3 Arithmetic for Computers(Revised) (1)

    49/64

    Chapter 3 Arithmetic for Computers 55

    FP Adder Hardware

    Step 1

    Step 2

    Step 3

    Step 4

    FP A ith ti H d

  • 7/28/2019 Chapter 3 Arithmetic for Computers(Revised) (1)

    50/64

    Chapter 3 Arithmetic for Computers 58

    FP Arithmetic Hardware

    FP multiplier is of similar complexity to FPadder But uses a multiplier for significands instead of

    an adder

    FP arithmetic hardware usually does Addition, subtraction, multiplication, division,

    reciprocal, square-root

    FP integer conversion

    Operations usually takes several cycles Can be pipelined

    FP I t ti i MIPS

  • 7/28/2019 Chapter 3 Arithmetic for Computers(Revised) (1)

    51/64

    Chapter 3 Arithmetic for Computers 59

    FP Instructions in MIPS

    FP hardware is coprocessor 1 Adjunct processor that extends the ISA

    Separate FP registers 32 single-precision: $f0, $f1, $f31 Paired for double-precision: $f0/$f1, $f2/$f3,

    Release 2 of MIPs ISA supports 32

    64-bit FP regs FP instructions operate only on FP registers

    Programs generally dont do integer ops on FP data,or vice versa

    More registers with minimal code-size impact

    FP load and store instructions lwc1, ldc1, swc1, sdc1

    e.g., ldc1 $f8, 32($sp)

    FP I t ti i MIPS

  • 7/28/2019 Chapter 3 Arithmetic for Computers(Revised) (1)

    52/64

    Chapter 3 Arithmetic for Computers 60

    FP Instructions in MIPS

    Single-precision arithmetic add.s, sub.s, mul.s, div.s

    e.g., add.s $f0, $f1, $f6

    Double-precision arithmetic add.d, sub.d, mul.d, div.d

    e.g., mul.d $f4, $f4, $f6

    Single- and double-precision comparison c.xx.s, c.xx.d (xxis eq, lt, le, ) Sets or clears FP condition-code bit

    e.g. c.lt.s $f3, $f4

    Branch on FP condition code true or false bc1t, bc1f

    e.g., bc1t TargetLabel

    FP E ample F to C

  • 7/28/2019 Chapter 3 Arithmetic for Computers(Revised) (1)

    53/64

    Chapter 3 Arithmetic for Computers 61

    FP Example: F to C

    C code:

    float f2c (float fahr) {return ((5.0/9.0)*(fahr - 32.0));

    }

    fahr in $f12, result in $f0, literals in global memory

    space Compiled MIPS code:f2c: lwc1 $f16, const5($gp)

    lwc2 $f18, const9($gp)div.s $f16, $f16, $f18lwc1 $f18, const32($gp)sub.s $f18, $f12, $f18mul.s $f0, $f16, $f18jr $ra

    FP E l A M lti li ti

  • 7/28/2019 Chapter 3 Arithmetic for Computers(Revised) (1)

    54/64

    Chapter 3 Arithmetic for Computers 62

    FP Example: Array Multiplication

    X = X + Y Z All 32 32 matrices, 64-bit double-precision elements

    C code:void mm (double x[][],

    double y[][], double z[][]) {

    int i, j, k;for (i = 0; i! = 32; i = i + 1)

    for (j = 0; j! = 32; j = j + 1)for (k = 0; k! = 32; k = k + 1)x[i][j] = x[i][j]

    + y[i][k] * z[k][j];}

    Addresses ofx, y, z in $a0, $a1, $a2, andi, j, k in $s0, $s1, $s2

    FP E l A M lti li ti

  • 7/28/2019 Chapter 3 Arithmetic for Computers(Revised) (1)

    55/64

    Chapter 3 Arithmetic for Computers 63

    FP Example: Array Multiplication

    MIPS code:li $t1, 32 # $t1 = 32 (row size/loop end)

    li $s0, 0 # i = 0; initialize 1st for loop

    L1: li $s1, 0 # j = 0; restart 2nd for loop

    L2: li $s2, 0 # k = 0; restart 3rd for loop

    sll $t2, $s0, 5 # $t2 = i * 32 (size of row of x)

    addu $t2, $t2, $s1 # $t2 = i * size(row) + jsll $t2, $t2, 3 # $t2 = byte offset of [i][j]

    addu $t2, $a0, $t2 # $t2 = byte address of x[i][j]

    l.d $f4, 0($t2) # $f4 = 8 bytes of x[i][j]

    L3: sll $t0, $s2, 5 # $t0 = k * 32 (size of row of z)

    addu $t0, $t0, $s1 # $t0 = k * size(row) + jsll $t0, $t0, 3 # $t0 = byte offset of [k][j]

    addu $t0, $a2, $t0 # $t0 = byte address of z[k][j]

    l.d $f16, 0($t0) # $f16 = 8 bytes of z[k][j]

    FP E l A M lti li ti

  • 7/28/2019 Chapter 3 Arithmetic for Computers(Revised) (1)

    56/64

    Chapter 3 Arithmetic for Computers 64

    FP Example: Array Multiplication

    sll $t0, $s0, 5 # $t0 = i*32 (size of row of y)addu $t0, $t0, $s2 # $t0 = i*size(row) + k

    sll $t0, $t0, 3 # $t0 = byte offset of [i][k]

    addu $t0, $a1, $t0 # $t0 = byte address of y[i][k]

    l.d $f18, 0($t0) # $f18 = 8 bytes of y[i][k]

    mul.d $f16, $f18, $f16 # $f16 = y[i][k] * z[k][j]

    add.d $f4, $f4, $f16 # f4=x[i][j] + y[i][k]*z[k][j]

    addiu $s2, $s2, 1 # $k k + 1

    bne $s2, $t1, L3 # if (k != 32) go to L3

    s.d $f4, 0($t2) # x[i][j] = $f4

    addiu $s1, $s1, 1 # $j = j + 1

    bne $s1, $t1, L2 # if (j != 32) go to L2addiu $s0, $s0, 1 # $i = i + 1

    bne $s0, $t1, L1 # if (i != 32) go to L1

    Interpretation of Data

  • 7/28/2019 Chapter 3 Arithmetic for Computers(Revised) (1)

    57/64

    Chapter 3 Arithmetic for Computers 66

    Interpretation of Data

    Bits have no inherent meaning

    Interpretation depends on the instructions

    applied Computer representations of numbers

    Finite range and precision

    Need to account for this in programs

    The BIG Picture

    Associativity3.6

  • 7/28/2019 Chapter 3 Arithmetic for Computers(Revised) (1)

    58/64

    Chapter 3 Arithmetic for Computers 67

    Associativity

    Parallel programs may interleaveoperations in unexpected orders

    Assumptions of associativity may fail

    6Parallelism

    andComputerArithmetic:Ass

    ociativity

    (x+y)+z x+(y+z)

    x -1.50E+38 -1.50E+38

    y 1.50E+38

    z 1.0 1.0

    1.00E+00 0.00E+00

    0.00E+00

    1.50E+38

    Need to validate parallel programs undervarying degrees of parallelism

    x86 FP Architecture3.7

  • 7/28/2019 Chapter 3 Arithmetic for Computers(Revised) (1)

    59/64

    Chapter 3 Arithmetic for Computers 68

    x86 FP Architecture

    Originally based on 8087 FP coprocessor

    8 80-bit extended-precision registers

    Used as a push-down stack

    Registers indexed from TOS: ST(0), ST(1),

    FP values are 32-bit or 64 in memory Converted on load/store of memory operand

    Integer operands can also be convertedon load/store

    Very difficult to generate and optimize code Result: poor FP performance

    7RealStuff

    :FloatingPointinthex86

    x86 FP Instructions

  • 7/28/2019 Chapter 3 Arithmetic for Computers(Revised) (1)

    60/64

    Chapter 3 Arithmetic for Computers 69

    x86 FP Instructions

    Optional variations I: integer operand P: pop operand from stack R: reverse operand order But not all combinations allowed

    Data transfer Arithmetic Compare TranscendentalFILD mem/ST(i)

    FISTP mem/ST(i)

    FLDPI

    FLD1

    FLDZ

    FIADDP mem/ST(i)

    FISUBRP mem/ST(i)FIMULP mem/ST(i)FIDIVRP mem/ST(i)

    FSQRT

    FABSFRNDINT

    FICOMP

    FIUCOMP

    FSTSW AX/mem

    FPATAN

    F2XMI

    FCOS

    FPTAN

    FPREM

    FPSIN

    FYL2X

    St i SIMD E t i 2 (SSE2)

  • 7/28/2019 Chapter 3 Arithmetic for Computers(Revised) (1)

    61/64

    Chapter 3 Arithmetic for Computers 70

    Streaming SIMD Extension 2 (SSE2)

    Adds 4 128-bit registers

    Extended to 8 registers in AMD64/EM64T

    Can be used for multiple FP operands

    2 64-bit double precision

    4 32-bit double precision

    Instructions operate on them simultaneously

    Single-Instruction Multiple-Data

    Right Shift and Division3.8

  • 7/28/2019 Chapter 3 Arithmetic for Computers(Revised) (1)

    62/64

    Chapter 3 Arithmetic for Computers 71

    Right Shift and Division

    Left shift by iplaces multiplies an integer

    by 2i Right shift divides by 2i?

    Only for unsigned integers

    For signed integers Arithmetic right shift: replicate the sign bit

    e.g.,5 / 4 11111011

    2>> 2 = 11111110

    2=2

    Rounds toward

    c.f. 111110112 >>> 2 = 001111102 = +62

    FallaciesandPitfalls

    Who Cares About FP Accuracy?

  • 7/28/2019 Chapter 3 Arithmetic for Computers(Revised) (1)

    63/64

    Chapter 3 Arithmetic for Computers 72

    Who Cares About FP Accuracy?

    Important for scientific code

    But for everyday consumer use?

    My bank balance is out by 0.0002!

    The Intel Pentium FDIV bug

    The market expects accuracy

    See Colwell, The Pentium Chronicles

    Concluding Remarks3.9

  • 7/28/2019 Chapter 3 Arithmetic for Computers(Revised) (1)

    64/64

    Concluding Remarks

    ISAs support arithmetic

    Signed and unsigned integers

    Floating-point approximation to reals

    Bounded range and precision

    Operations can overflow and underflow

    MIPS ISA

    Core instructions: 54 most frequently used

    100% of SPECINT, 97% of SPECFP

    Other instructions: less frequent

    Concludin

    gRemarks