lecture 2 data representation in computer systems lecture duration: 2 hours
TRANSCRIPT
Lecture 2
Data Representation in Computer Systems
Lecture Duration: 2 Hours
Prepared by Dr. Hassan SALTI - 2012 2
Lecture Overview
Introduction
Positional Numbering System
Decimal to binary conversion
Signed integer representation
Floating-point representation
Introduction: Some notations – A reminder (1/2)
Bit: The most basic unit of information in a digital computer (On/Off ; 0/1 state)
Byte: A set of 8 bits
Word: Two or more adjacent bytes that are manipulated collectively
Word size: The size of a word in bits depends on the computer organization (16, 32, 64 bits, …)
Nibbles (or nybbles): A set of 4 bits – Usually a set of 8 bits is divided into two nibbles, a low order nibble and a high order nibble
Introduction: Some notations – A reminder (2/2)
Example: the 16-bit word 0110 0111 1000 1101
[Figure: the 16-bit word is made of two bytes; each byte is split into a high order nibble and a low order nibble; the leftmost bit is the Most Significant bit (MSB) and the rightmost bit is the Least Significant bit (LSB).]
Positional Numbering System (1/3)
Any numeric value is represented through increasing powers of a radix (or base)
The set of valid numerals (digits) is equal in size to the radix of that system
The least numeral is 0 and the highest one is 1 smaller than the radix
Example:
• In the decimal system (base 10)
- The radix is 10
- The number of valid numerals is 10 (equal to the radix)
- The set of valid numerals is: {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}
Positional Numbering System (2/3)
The most important radices (bases) in computer science are:
• Binary
- Radix 2 or base 2
- Numerals: {0, 1}
• Octal
- Radix 8 or base 8
- Numerals: {0, 1, 2, 3, 4, 5, 6, 7}
• Hexadecimal
- Radix 16 or base 16
- Numerals: {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, A, B, C, D, E, F}
Positional Numbering System (3/3)
Any numeric value is represented through increasing powers of a radix (or base)
Examples
• $243.51_{10} = 2 \times 10^{2} + 4 \times 10^{1} + 3 \times 10^{0} + 5 \times 10^{-1} + 1 \times 10^{-2}$
• $212_{3} = 2 \times 3^{2} + 1 \times 3^{1} + 2 \times 3^{0} = 23_{10}$
• $10110.01_{2} = 1 \times 2^{4} + 0 \times 2^{3} + 1 \times 2^{2} + 1 \times 2^{1} + 0 \times 2^{0} + 0 \times 2^{-1} + 1 \times 2^{-2} = 22.25_{10}$
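The expansions above can be checked mechanically. The following is a minimal sketch, not part of the original slides: the function name `to_decimal` and the digit table `DIGITS` are illustrative choices, and exact fractions are used so that the fractional digits are not affected by rounding.

```python
from fractions import Fraction

DIGITS = "0123456789ABCDEF"  # illustrative digit table, valid up to radix 16

def to_decimal(numeral: str, radix: int) -> Fraction:
    """Evaluate a positional numeral such as '10110.01' in the given radix."""
    whole, _, frac = numeral.upper().partition(".")
    value = Fraction(0)
    # Integer part: the rightmost digit is weighted by radix^0, the next by radix^1, ...
    for position, digit in enumerate(reversed(whole)):
        d = DIGITS.index(digit)
        if d >= radix:
            raise ValueError(f"digit {digit!r} is not valid in base {radix}")
        value += d * Fraction(radix) ** position
    # Fractional part: the first digit after the radix point is weighted by radix^-1, ...
    for position, digit in enumerate(frac, start=1):
        d = DIGITS.index(digit)
        if d >= radix:
            raise ValueError(f"digit {digit!r} is not valid in base {radix}")
        value += d * Fraction(radix) ** -position
    return value

print(float(to_decimal("212", 3)))       # 23.0
print(float(to_decimal("10110.01", 2)))  # 22.25
```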
Lecture Overview
Introduction
Positional Numbering System
Decimal to binary conversion
• Converting Unsigned Whole Numbers
• Converting fractions
• Converting between Power-of-Two Radices
Signed integer representation
Floating-point representation
Decimal to binary conversion: Some numbers to remember (1/1)
Keep in mind the following tables or how to obtain them!
Decimal to binary conversion: Converting Unsigned Whole Numbers (1/6)
A real number can take any value (ex. 10323.7643 ; -16813.5322703)
Whole number: No fractions (ex: 10, 1231, 3543, …, -12, -12334,…)
Unsigned number: Only non-negative numbers, i.e. zero or positive (ex: 102313.43234, 1231.56234, 12357, …)
Unsigned whole numbers: No fraction and no negative numbers
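Decimal to binary conversion: Converting Unsigned Whole Numbers (2/6)
Convert the decimal number $113_{10}$ to binary: $113_{10} = (?)_{2}$
Method 1: Repeated subtraction (subtract the largest power of 2 that fits; each power used gives a 1, each power skipped gives a 0)
113 - 64 = 49 ($2^{6}$ used: 1)
49 - 32 = 17 ($2^{5}$ used: 1)
17 - 16 = 1 ($2^{4}$ used: 1)
$2^{3}$, $2^{2}$ and $2^{1}$ do not fit into 1: 0 0 0
1 - 1 = 0 ($2^{0}$ used: 1)
Bits from $2^{6}$ down to $2^{0}$: 1110001
$113_{10} = 1110001_{2}$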
Decimal to binary conversion: Converting Unsigned Whole Numbers (3/6)
Method 2: Division-remainder
113 ÷ 2 = 56, Remainder 1 (LSB)
56 ÷ 2 = 28, Remainder 0
28 ÷ 2 = 14, Remainder 0
14 ÷ 2 = 7, Remainder 0
7 ÷ 2 = 3, Remainder 1
3 ÷ 2 = 1, Remainder 1
1 ÷ 2 = 0, Remainder 1 (MSB)
Reading the remainders from the last one (MSB) to the first one (LSB): $113_{10} = 1110001_{2}$
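The division-remainder method translates almost line for line into a loop. This is a small sketch under the assumption that digits above 9 are written A to F; the name `to_base` is illustrative and does not come from the slides.

```python
DIGITS = "0123456789ABCDEF"  # illustrative digit table, valid up to radix 16

def to_base(n: int, radix: int) -> str:
    """Convert a non-negative whole number to the given radix by repeated division."""
    if n < 0 or not 2 <= radix <= 16:
        raise ValueError("expects n >= 0 and a radix between 2 and 16")
    if n == 0:
        return "0"
    remainders = []
    while n > 0:
        n, remainder = divmod(n, radix)   # quotient feeds the next step
        remainders.append(DIGITS[remainder])
    # The first remainder is the LSB, so the list is read backwards.
    return "".join(reversed(remainders))

print(to_base(113, 2))  # 1110001
print(to_base(104, 3))  # 10212
```

The same function covers any destination radix, which is the point made on the slides about this method being more general than repeated subtraction.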
Decimal to binary conversion: Converting Unsigned Whole Numbers (4/6)
A binary number with N bits can represent $2^{N}$ unsigned integers from 0 to $2^{N} - 1$
Example:
• Having N = 4 bits, we can represent $2^{4} = 16$ unsigned integers from 0 to $2^{4} - 1 = 16 - 1 = 15$
• The number 16 CANNOT be represented with only 4 bits!!
Decimal to binary conversion: Converting Unsigned Whole Numbers (5/6)
The subtraction method is cumbersome.
The subtraction method requires a familiarity with the powers of the radix being used.
The division-remainder method is faster and easier than the repeated subtraction method.
The division-remainder method can be used to convert from decimal to any other base system (not only to base 2).
Decimal to binary conversion: Converting Unsigned Whole Numbers (6/6)
Example: Convert $104_{10}$ to base 3 using the division-remainder method.
104 ÷ 3 = 34, Remainder 2 (least significant digit)
34 ÷ 3 = 11, Remainder 1
11 ÷ 3 = 3, Remainder 2
3 ÷ 3 = 1, Remainder 0
1 ÷ 3 = 0, Remainder 1 (most significant digit)
$104_{10} = 10212_{3}$
Decimal to binary conversion: Converting fractions (1/5)
Fractions in a decimal system can be converted/approximated to fractions in any other radix system
Radix points separate the integer part of a number from its fractional part
Examples of fractions (the integer part is to the left of the radix point and the fractional part is to the right)
• Base 10 : 2390167.1208
• Base 3 : 2012.11022
• Base 2 : 1011110.111011
The “radix point” is called a “decimal point” in a decimal system, a “binary point” in a binary system, and so on…
Decimal to binary conversion: Converting fractions (2/5)
To convert fractions from decimal to any other base system we repeatedly multiply by the destination radix
Example: Convert $0.4304_{10}$ to base 5.
0.4304 × 5 = 2.1520 The integer part is 2
0.1520 × 5 = 0.7600 The integer part is 0
0.7600 × 5 = 3.8000 The integer part is 3
0.8000 × 5 = 4.0000 The integer part is 4, the fractional part is zero, we are done
$0.4304_{10} = 0.2034_{5}$
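The repeated-multiplication procedure can be sketched as below. This is an illustration, not the lecturer's code: the function name `fraction_to_base` is an assumption, the input is taken as text so that it can be held as an exact fraction, and `max_digits` caps the output for fractions that never terminate in the destination radix.

```python
from fractions import Fraction

DIGITS = "0123456789ABCDEF"  # illustrative digit table, valid up to radix 16

def fraction_to_base(value: str, radix: int, max_digits: int) -> str:
    """Convert a decimal fraction given as text (e.g. '0.4304') to the given radix."""
    frac = Fraction(value)
    if not 0 <= frac < 1:
        raise ValueError("expects a value in the range [0, 1)")
    digits = []
    while frac != 0 and len(digits) < max_digits:
        frac *= radix
        integer_part = int(frac)        # the integer part is the next digit
        digits.append(DIGITS[integer_part])
        frac -= integer_part            # keep only the fractional part
    return "0." + ("".join(digits) or "0")

print(fraction_to_base("0.4304", 5, 10))  # 0.2034
print(fraction_to_base("0.34375", 2, 4))  # 0.0101 (cut off after 4 bits)
```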
Decimal to binary conversion: Converting fractions (3/5)
Some fractions in one base could be indeterminate (non-terminating)
• Fractions that contain repeating strings of digits to the right of the radix point
• Example: $(2/3)_{10} = (0.666\ldots)_{10}$
An indeterminate fraction in one base could be determinate in another base (and vice-versa).
• Example: $(2/3)_{10} = (0.666\ldots)_{10} = 0.2_{3}$
- 2/3 is indeterminate in base 10 but determinate in base 3.
When a fraction is indeterminate, an approximation is needed
• We fix the number of digits to the right of the radix point
Also, approximation is needed due to the limited computing resources (example: limited size of the processor’s registers)
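Decimal to binary conversion: Converting fractions (4/5)
Example: Convert $0.34375_{10}$ to binary with 4 bits to the right of the binary point.
0.34375 × 2 = 0.68750 The integer part is 0
0.68750 × 2 = 1.37500 The integer part is 1
0.37500 × 2 = 0.75000 The integer part is 0
0.75000 × 2 = 1.50000 The integer part is 1. This is our fourth bit. We will stop here.
$0.34375_{10} \approx 0.0101_{2}$ (the remaining fractional part, 0.5, is dropped)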
Decimal to binary conversion: Converting fractions (5/5)
Convert 26.78125 to binary: $26.78125_{10} = (?)_{2}$
By using the methods just described we will have:
$26_{10} = 11010_{2}$ and $0.78125_{10} = 0.11001_{2}$
So $26.78125_{10} = 11010.11001_{2}$
Decimal to binary conversion: Going back to positional numbering system (1/1)
Any unsigned whole or fractional number could be converted to decimal by using the “Positional Numbering System” described previously
Examples:
$0.0101_{2} = 0 \times 2^{-1} + 1 \times 2^{-2} + 0 \times 2^{-3} + 1 \times 2^{-4} = 0 + 0.25 + 0 + 0.0625 = 0.3125_{10}$
$134.2034_{5} = 1 \times 5^{2} + 3 \times 5^{1} + 4 \times 5^{0} + 2 \times 5^{-1} + 0 \times 5^{-2} + 3 \times 5^{-3} + 4 \times 5^{-4} = 44.4304_{10}$
Decimal to binary conversion: Converting between Power-of-Two Radices (1/4)
To convert from any base to any other base (neither of them being base 10), it is easier to pass through base 10.
• Example: $3121_{4} = (?)_{3}$
• First step: $3121_{4} = 3 \times 4^{3} + 1 \times 4^{2} + 2 \times 4^{1} + 1 \times 4^{0} = 217_{10}$
• Second step: by using the division-remainder method: $217_{10} = 22001_{3}$
• So $3121_{4} = 22001_{3}$
Working between bases that are powers of two is much easier.
Decimal to binary conversion: Converting between Power-of-Two Radices (2/4)
The most famous power-of-two radices are: binary (base 2), octal (base $2^{3}$ / base 8) and hexadecimal (base $2^{4}$ / base 16).
Each octal digit is equivalent to a group of 3 binary digits called an octet (see note)
Each hexadecimal digit is equivalent to a group of 4 binary digits called a hextet
We convert from binary to octal and from binary to hexadecimal by simply grouping bits
Note: The term “Octet” could also be used in the literature to describe a set of 8 bits.
Decimal to binary conversion: Converting between Power-of-Two Radices (3/4)
Example: Convert $10110010011101_{2}$ to octal
• Make groups of 3 bits (from right to left):
- 10 110 010 011 101
• Add zero(s) on the left to complete the last octet
- 010 110 010 011 101
• Convert each octet to its corresponding octal digit
- 010 110 010 011 101 → 2 6 2 3 5
• Finally: $10110010011101_{2} = 26235_{8}$
Decimal to binary conversion: Converting between Power-of-Two Radices (4/4)
Example: Convert $10110010011101_{2}$ to hexadecimal
• Make groups of 4 bits (from right to left):
- 10 1100 1001 1101
• Add zero(s) on the left to complete the last hextet
- 0010 1100 1001 1101
• Convert each hextet to its corresponding hexadecimal digit
- 0010 1100 1001 1101 → 2 C 9 D
• Finally: $10110010011101_{2} = 2C9D_{16}$
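The grouping steps for octal and hexadecimal are the same apart from the group size, so one helper can cover both. A minimal sketch, with the hypothetical name `regroup`:

```python
def regroup(bits: str, group_size: int) -> str:
    """Convert a binary string to octal (group_size=3) or hexadecimal (group_size=4)."""
    digits = "0123456789ABCDEF"
    # Add zeros on the left so the length is a multiple of the group size.
    padding = (-len(bits)) % group_size
    padded = "0" * padding + bits
    groups = [padded[i:i + group_size] for i in range(0, len(padded), group_size)]
    # Each group of bits maps to exactly one digit of the destination radix.
    return "".join(digits[int(group, 2)] for group in groups)

print(regroup("10110010011101", 3))  # 26235
print(regroup("10110010011101", 4))  # 2C9D
```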
Lecture Overview
Introduction
Positional Numbering System
Decimal to binary conversion
Signed integer representation
• Signed Magnitude
• Complement system
Floating-point representation
Signed integer representation
An integer is a whole number
Signed integers are the set of positive and negative whole numbers
How should we encode and deal with the actual sign of the number?
Two concepts are used
• Signed Magnitude concept
• Complement concept
Signed integer representation: Signed Magnitude (1/13)
Signed magnitude is the most intuitive method
The MSB (Most Significant Bit) of a binary number is kept as the “sign” of the number
• MSB = 1: negative number
• MSB = 0: positive number
The remaining bits represent the magnitude (or absolute value) of the numeric value
Signed integer representation: Signed Magnitude (2/13)
Example: In an 8-bit word signed magnitude system give the decimal representation of the following numbers
• 00000001?
- The MSB is 0: The number is positive
- The remaining 7 bits are: $0000001_{2} = 1_{10}$
- The decimal number is +1
• 10000001?
- The MSB is 1: The number is negative
- The remaining 7 bits are: $0000001_{2} = 1_{10}$
- The decimal number is -1
Signed integer representation: Signed Magnitude (3/13)
Example: In an 8-bit word signed magnitude system give the decimal representation of the following numbers
• 10001001?
- The MSB is 1: The number is negative
- The remaining 7 bits are: $0001001_{2} = 9_{10}$
- The decimal number is -9
• 01000001?
- The MSB is 0: The number is positive
- The remaining 7 bits are: $1000001_{2} = 65_{10}$
- The decimal number is +65
Signed integer representation: Signed Magnitude (4/13)
In an N-bit word signed magnitude system
• 1 bit is used for the sign of the number
• N-1 bits are used for the magnitude of the number
• The largest integer is $2^{N-1} - 1$
• The smallest integer is $-(2^{N-1} - 1)$
Example: in an 8-bit word signed magnitude system
• The largest integer is $01111111_{2} = 2^{7} - 1 = 127_{10}$
• The smallest integer is $11111111_{2} = -(2^{7} - 1) = -127_{10}$
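To make the encoding and its range concrete, here is a short sketch of signed-magnitude encoding and decoding. The names `sm_encode` and `sm_decode` are illustrative, and bit patterns are handled as text for readability.

```python
def sm_encode(value: int, n_bits: int = 8) -> str:
    """Encode an integer as a signed-magnitude bit string of n_bits."""
    limit = 2 ** (n_bits - 1) - 1          # largest magnitude that fits
    if abs(value) > limit:
        raise ValueError(f"{value} does not fit in {n_bits} bits")
    sign = "1" if value < 0 else "0"
    return sign + format(abs(value), f"0{n_bits - 1}b")

def sm_decode(bits: str) -> int:
    """Decode a signed-magnitude bit string: the MSB is the sign, the rest the magnitude."""
    magnitude = int(bits[1:], 2)
    return -magnitude if bits[0] == "1" else magnitude

print(sm_decode("10001001"))  # -9
print(sm_decode("01000001"))  # 65
print(sm_encode(-127))        # 11111111
```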
Signed integer representation: Signed Magnitude (5/13)
Computers should be able to carry out mathematical operations
Signed-magnitude arithmetic is carried out using essentially the same methods as humans
• At first we look at the signs of the two operands
• We arrange the operands in a certain way based on their signs
• We perform the calculation without regard to the signs
• Finally, we supply the sign as appropriate
Signed integer representation: Signed Magnitude (6/13)
Adding operands that have the same sign
Example: Add $01001111_{2}$ to $00100011_{2}$ using signed-magnitude arithmetic.
          1 1 1 1        ⇐ carries
  0   1 0 0 1 1 1 1      (79)
+ 0   0 1 0 0 0 1 1    + (35)
  0   1 1 1 0 0 1 0      (114)
The leftmost bit is the sign; only the 7 magnitude bits are added.
We find $01001111_{2} + 00100011_{2} = 01110010_{2}$ in signed-magnitude representation.
Signed integer representation: Signed Magnitude (7/13)
Overflow condition
• In the last example, adding the seventh bits (the leftmost magnitude bits) gives no carry
• If there is a carry, we say that we have an overflow condition and the carry is discarded, resulting in an incorrect sum.
Example: Add $01000001_{2}$ to $01100001_{2}$ using signed-magnitude arithmetic
Signed integer representation: Signed Magnitude (8/13)
    1           1        ⇐ carries
  0   1 0 0 0 0 0 1      (65)
+ 0   1 1 0 0 0 0 1    + (97)
  0   0 1 0 0 0 1 0      (34) ✗
The addition overflows
The last carry is discarded
The sum’s result is incorrect
Signed integer representation: Signed Magnitude (9/13)
Signed-magnitude subtraction is carried out in a manner similar to pencil and paper decimal arithmetic
Example 1: Subtract $01001111_{2}$ (79) from $01100011_{2}$ (99) using signed-magnitude arithmetic.
        0 1 1 2          ⇐ borrows
  0   1 1 0 0 0 1 1      (99)
- 0   1 0 0 1 1 1 1    - (79)
  0   0 0 1 0 1 0 0      (20)
We find $01100011_{2} - 01001111_{2} = 00010100_{2}$ in signed-magnitude representation.
Signed integer representation: Signed Magnitude (10/13)
Example 2: Subtract $01100011_{2}$ (99) from $01001111_{2}$ (79) using signed-magnitude arithmetic.
• Here the subtrahend, 01100011, is larger than the minuend, 01001111.
• With the result obtained in Example 1, we know that the difference of these two magnitudes is $0010100_{2}$.
• Because the subtrahend is larger than the minuend, all that we need to do is change the sign of the difference.
• So we find $01001111_{2} - 01100011_{2} = 10010100_{2}$ in signed-magnitude representation
Signed integer representation: Signed Magnitude (11/13)
Example 3: Add $10010011_{2}$ (-19) to $00001101_{2}$ (+13) using signed-magnitude arithmetic.
• The result is negative
• We subtract 13 from 19
• The result of the binary subtraction is: $10000110_{2}$ (-6)
Example 4: Subtract $10011000_{2}$ (-24) from $10101011_{2}$ (-43) using signed-magnitude arithmetic.
• This is equivalent to adding -43 to 24
• The result is negative
• We subtract 24 from 43
• The result of the binary subtraction is: $10010011_{2}$ (-19)
Signed integer representation: Signed Magnitude (12/13)
General rules when operands have different signs
• Determine which operand has the larger magnitude
• The sign of the result is the same as the sign of the operand with the larger magnitude
• The magnitude must be obtained by subtracting (not adding) the smaller one from the larger one
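The rules for same-sign and different-sign operands can be put together in one routine. The following is only a sketch of the procedure described on the slides: the name `sm_add` is illustrative, operands are bit strings of equal width, and overflow is reported as a flag next to the (then incorrect) result.

```python
def sm_add(a: str, b: str) -> tuple[str, bool]:
    """Add two signed-magnitude bit strings; returns (result bits, overflow flag)."""
    width = len(a) - 1                      # number of magnitude bits
    sign_a, mag_a = a[0], int(a[1:], 2)
    sign_b, mag_b = b[0], int(b[1:], 2)
    overflow = False
    if sign_a == sign_b:
        # Same sign: add the magnitudes and keep the common sign.
        total = mag_a + mag_b
        overflow = total >= 2 ** width      # carry out of the leftmost magnitude bit
        total %= 2 ** width                 # that carry is discarded
        sign = sign_a
    elif mag_a >= mag_b:
        # Different signs: subtract the smaller magnitude from the larger one
        # and take the sign of the operand with the larger magnitude.
        sign, total = sign_a, mag_a - mag_b
    else:
        sign, total = sign_b, mag_b - mag_a
    return sign + format(total, f"0{width}b"), overflow

print(sm_add("01001111", "00100011"))  # ('01110010', False)  79 + 35 = 114
print(sm_add("01000001", "01100001"))  # ('00100010', True)   65 + 97 overflows
print(sm_add("10010011", "00001101"))  # ('10000110', False)  -19 + 13 = -6
```

Note that when the two magnitudes are equal and the signs differ, this sketch keeps the sign of the first operand, so it can produce the "negative zero" pattern; that is one of the drawbacks of the representation.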
Signed integer representation: Signed Magnitude (13/13)
Problems related to signed magnitude
• Too many decisions to make (larger number? ; borrows? ; what signs?).
• The number 0 has two representations: 10000000 and 00000000.
• Complicated method
• Expensive circuits
Signed integer representation: Complement system (1/19)
In a complement system, only negative numbers are converted; positive numbers keep their ordinary binary representation
When using a complement system, subtraction is converted to an addition
Advantages of complement system
• Simplify computer arithmetic
• No need to process sign bits separately
• The sign of a number is easily checked by looking at its high-order bit (MSB).
Signed integer representation: Complement system (2/19)
In base 10, “Casting out 9s” was used to subtract numbers
Let’s say we wanted to find 167 - 52
• At first, 999 - 52 is calculated: 999 - 52 = 947
• 947 is then added to 167 and the last carry is added to the sum:
    1 6 7
  + 9 4 7
(1) 1 1 4
• Dropping the last carry (1) and adding it back to the sum: 114 + 1 = 115, so 167 - 52 = 115
Signed integer representation: Complement system (3/19)
The last method uses a “diminished radix complement”
Working in base r (radix), the diminished radix is given by: r-1
Example: Base 10 ; r = 10
• The diminished radix is r-1 = 10 - 1 = 9
• We say that a negative number is converted to its 9’s complement
• For example, $-2468_{10}$ is converted to its nine’s complement as follows: $-2468_{10} \rightarrow 9999 - 2468 = 7531_{C9}$
Signed integer representation: Complement system (4/19)
In a binary system r = 2
• The diminished radix is r-1 = 1
• We say that we work in one’s complement (C1)
• To convert a negative number to its one’s complement, its magnitude is subtracted from all ones
• A positive number is directly converted to its binary representation
• Example:
- The one’s complement of $0101_{2}$ is $1111_{2} - 0101_{2} = 1010_{C1}$
- It is nothing more than switching all of the 1s with 0s and vice versa!!
Signed integer representation: Complement system (5/19)
Example: Express $23_{10}$ and $-9_{10}$ in 8-bit binary one’s complement form.
$23_{10} = +(00010111_{2}) = 00010111_{C1}$
$-9_{10} = -(00001001_{2}) = 11110110_{C1}$
Signed integer representation: Complement system (6/19)
In one’s complement the subtraction is converted into addition
• Example: $23_{10} - 9_{10} = 23_{10} + (-9_{10})$
Example: Add $23_{10}$ to $-9_{10}$ using 8-bit binary one’s complement arithmetic.
  0 0 0 1 0 1 1 1      (23)
+ 1 1 1 1 0 1 1 0    + (-9)
1 0 0 0 0 1 1 0 1      there is a carry out of the MSB
              + 1      the last carry is added back to the sum
  0 0 0 0 1 1 1 0      (14)
The result is $00001110_{C1} = +(00001110_{2}) = 14_{10}$
Signed integer representation: Complement system (7/19)
Example: Add $9_{10}$ to $-23_{10}$ using 8-bit binary one’s complement arithmetic.
$-23_{10} = -(00010111_{2}) = 11101000_{C1}$
$9_{10} = +(00001001_{2}) = 00001001_{C1}$
$9_{10} + (-23_{10}) = 00001001_{C1} + 11101000_{C1}$
  0 0 0 0 1 0 0 1      (9)
+ 1 1 1 0 1 0 0 0    + (-23)
  1 1 1 1 0 0 0 1      (-14)
There is no carry out of the MSB, so nothing is added back.
Result: $11110001_{C1} = -(00001110_{2}) = -14_{10}$
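The two one’s complement additions above differ only in whether a carry leaves the MSB. A compact sketch of the procedure follows; the function names are illustrative and the patterns are held as plain integers masked to `n_bits`.

```python
def c1_encode(value: int, n_bits: int = 8) -> int:
    """One's complement pattern: negatives are subtracted from all ones."""
    all_ones = (1 << n_bits) - 1
    return value if value >= 0 else all_ones - (-value)

def c1_decode(pattern: int, n_bits: int = 8) -> int:
    """Read a one's complement pattern back as a signed integer."""
    all_ones = (1 << n_bits) - 1
    if pattern >> (n_bits - 1):             # MSB set: negative number
        return -(all_ones - pattern)
    return pattern

def c1_add(a: int, b: int, n_bits: int = 8) -> int:
    """Add two one's complement patterns; the last carry is added back to the sum."""
    all_ones = (1 << n_bits) - 1
    total = a + b
    if total > all_ones:                    # a carry left the MSB
        total = (total & all_ones) + 1      # end-around carry
    return total & all_ones

result = c1_add(c1_encode(23), c1_encode(-9))
print(format(result, "08b"), c1_decode(result))   # 00001110 14
result = c1_add(c1_encode(9), c1_encode(-23))
print(format(result, "08b"), c1_decode(result))   # 11110001 -14
```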
Signed integer representation: Complement system (8/19)
In One’s complement, we still have two representations for zero: 00000000 and 11111111
Computer engineers long ago stopped using one’s complement
A more efficient representation for binary numbers is the two’s complement
Signed integer representation: Complement system (9/19)
Two’s complement is an example of a radix complement
No need to subtract one from the radix r when working in a radix complement.
Example: Base 10 ; r = 10
• We say that a negative number is converted to its 10’s complement
• For example, $-2468_{10}$ is converted to its ten’s complement as follows: $-2468_{10} \rightarrow 10000 - 2468 = 7532_{C10}$
Signed integer representation: Complement system (10/19)
In a binary system r = 2
• The radix is r = 2 (it is not diminished)
• We say that we work in two’s complement (C2)
• Consider “d” is the number of digits
• To convert a negative number “-N” to its two’s complement, its magnitude N is subtracted from $r^{d} = 2^{d}$: $-N_{10} \rightarrow (2^{d} - N)_{C2}$
• A positive number is directly converted to its binary representation
Signed integer representation: Complement system (11/19)
Example:
• In a 4-bit system: d = 4
• All negative numbers are converted by being subtracted from $2^{d} = 2^{4} = 16_{10} = 10000_{2}$
• The two’s complement of $0011_{2}$ is $10000_{2} - 0011_{2} = 1101_{C2}$
• It is nothing more than one’s complement incremented by 1!!
Signed integer representation: Complement system (12/19)
Example: Express $23_{10}$, $-23_{10}$, and $-9_{10}$ in 8-bit binary two’s complement form.
• $23_{10} = +(00010111_{2}) = 00010111_{2}$
• $-23_{10} = -(00010111_{2}) = 11101000_{2} + 1 = 11101001_{2}$
• $-9_{10} = -(00001001_{2}) = 11110110_{2} + 1 = 11110111_{2}$
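The conversion rule (subtract the magnitude from $2^{d}$, or equivalently flip the bits and add 1) can be sketched as follows. The names `c2_encode` and `c2_decode` are illustrative; the modulo operation is used here because for a negative value it yields exactly $2^{d} - N$.

```python
def c2_encode(value: int, n_bits: int = 8) -> str:
    """Two's complement bit string of a signed integer."""
    if not -(1 << (n_bits - 1)) <= value < (1 << (n_bits - 1)):
        raise ValueError(f"{value} does not fit in {n_bits} bits")
    # For a negative value, value % 2^d equals 2^d - |value|.
    return format(value % (1 << n_bits), f"0{n_bits}b")

def c2_decode(bits: str) -> int:
    """Signed value of a two's complement bit string."""
    raw = int(bits, 2)
    # MSB set means negative: subtract 2^d to recover the signed value.
    return raw - (1 << len(bits)) if bits[0] == "1" else raw

print(c2_encode(23))          # 00010111
print(c2_encode(-23))         # 11101001
print(c2_encode(-9))          # 11110111
print(c2_decode("11110010"))  # -14
```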
Signed integer representation: Complement system (13/19)
Unlike C1 arithmetic, in C2 the last carry is discarded
Example 1: Add $9_{10}$ to $-23_{10}$ using two’s complement arithmetic.
  0 0 0 0 1 0 0 1      (9)
+ 1 1 1 0 1 0 0 1    + (-23)
  1 1 1 1 0 0 1 0      (-14)
The result is $11110010_{C2} = -(00001110_{2}) = -14_{10}$
Signed integer representation: Complement system (14/19)
Note how a negative binary number in C2 is converted to decimal
• At first all 0s and 1s in the C2 number are switched: 11110010 → 00001101
• A “1” is then added to the last number: 00001101 + 1 = 00001110
• So $11110010_{C2} = -(00001110_{2}) = -14_{10}$
Signed integer representation: Complement system (15/19)
Example 2: Find the sum of $23_{10}$ and $-9_{10}$ in binary using two’s complement arithmetic.
$23_{10} = +(00010111_{2}) = 00010111_{C2}$
$-9_{10} = -(00001001_{2}) = 11110111_{C2}$
$23_{10} + (-9_{10}) = 00010111_{C2} + 11110111_{C2}$
  0 0 0 1 0 1 1 1      (23)
+ 1 1 1 1 0 1 1 1    + (-9)
1 0 0 0 0 1 1 1 0      the carry out of the MSB is discarded
  0 0 0 0 1 1 1 0      (14)
Result: $00001110_{C2} = +(00001110_{2}) = 14_{10}$
Signed integer representation: Complement system (16/19)
Advantages of two’s complement
• It is the most popular choice for representing signed numbers
• The algorithm for adding and subtracting is quite easy
• It has the best representation for 0 (all 0 bits)
• It is self-inverting
• It is easily extended to larger numbers of bits.
Signed integer representation: Complement system (17/19)
Drawback
• The asymmetry seen in the range of values that can be represented by N bits.
• Examples:
- With signed-magnitude, 4 bits allow us to represent the values -7 ($1111_{2}$) through +7 ($0111_{2}$).
- Using two’s complement, we can represent the values: -8 ($1000_{C2}$) through +7 ($0111_{C2}$)
Signed integer representation: Complement system (18/19)
Overflow in complement systems (C1 and C2)
• An overflow occurs if two positive numbers are added and the result is negative
• or if two negative numbers are added and the result is positive.
• It is not possible to have an overflow when a positive and a negative number are being added together.
Signed integer representation: Complement system (19/19)
To Detect Overflow
• Check the last two carries (the carry into the MSB and the carry out of the MSB)
- If these are different: there is an overflow
- If these are equal: there is no overflow
Example 1: Find the sum of $126_{10}$ and $8_{10}$ in binary using two’s complement arithmetic.
  0 1 1 1 1 1 1 0      (126)
+ 0 0 0 0 1 0 0 0    + (8)
  1 0 0 0 0 1 1 0      (-122) ✗
The result is $10000110_{C2} = -(01111010_{2}) = -122_{10}$!!!
Note that the last two carries are different: the carry into the MSB is 1 and the carry out of the MSB is 0
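The "last two carries" test can be written directly with integer masks. This is a minimal sketch with the hypothetical name `c2_add`; it takes ordinary Python integers, reduces them to `n_bits` patterns, and reports the overflow flag alongside the result bits.

```python
def c2_add(a: int, b: int, n_bits: int = 8) -> tuple[str, bool]:
    """Two's complement addition; returns (result bits, overflow flag)."""
    mask = (1 << n_bits) - 1
    a_bits, b_bits = a & mask, b & mask     # two's complement patterns of a and b
    low_mask = mask >> 1                    # every bit except the MSB
    # Carry into the MSB: what the lower n-1 bits produce on their own.
    carry_in = ((a_bits & low_mask) + (b_bits & low_mask)) >> (n_bits - 1)
    # Carry out of the MSB: whatever does not fit in n bits (it is discarded).
    carry_out = (a_bits + b_bits) >> n_bits
    result = (a_bits + b_bits) & mask
    return format(result, f"0{n_bits}b"), carry_in != carry_out

print(c2_add(126, 8))   # ('10000110', True)   overflow: the pattern reads as -122
print(c2_add(9, -23))   # ('11110010', False)  -14
print(c2_add(23, -9))   # ('00001110', False)  14
```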
Lecture Overview
Introduction
Positional Numbering System
Decimal to binary conversion
Signed integer representation
Floating-point representation
• A simple model
• Floating-point arithmetic
• Floating point errors
Floating-point representation (1/1)
A computer is supposed to solve all problems
Huge and fractional numbers and complicated mathematical operations could be involved
The floating-point representation is an optimized solution that gives a good ratio of “biggest number / word size”
Computers use a form of scientific notation for floating-point representation
Numbers written in scientific notation have three components: a sign, a mantissa and an exponent
Scientific notation in base 10: $+0.579 \times 10^{7}$
Scientific notation in base 2: $+0.101101 \times 2^{3}$
Floating-point representation: A simple model (1/8)
In digital computers, floating-point numbers consist of three parts:
• A sign bit,
• an exponent part: representing the exponent on a power of 2,
• a fractional part called a significand: which is a fancy word for a mantissa.
Floating-point representation: A simple model (2/8)
Using more bits for the exponent increases the range of numbers
Using more bits for the significand increases the precision
For simplicity, throughout this course we will use a simplified 14-bit model
• Sign bit: 1 bit
• Exponent: 5 bits
• Significand: 8 bits
Floating-point representation: A simple model (3/8)
Exercise 1: Represent the number 17 in a 14-bit floating point representation
• $17 = 17.0 \times 10^{0} = 1.7 \times 10^{1} = 0.17 \times 10^{2}$
• Analogously in binary:
• $17_{10} = 10001_{2} \times 2^{0} = 1000.1_{2} \times 2^{1} = 100.01_{2} \times 2^{2} = 10.001_{2} \times 2^{3} = 1.0001_{2} \times 2^{4} = 0.10001_{2} \times 2^{5} = 0.010001_{2} \times 2^{6} = 0.0010001_{2} \times 2^{7} = \ldots$
• As a convention, we choose the form whose integer part is 0 and whose significand has an MSB equal to “1”: $0.10001_{2} \times 2^{5}$
• The exponent is $5_{10} = 00101_{2}$
• The significand is: $10001_{2} \rightarrow 10001000_{2}$ (padded with zeros on the right to 8 bits)
• So (sign | exponent | significand): 0 00101 10001000
Floating-point representation: A simple model (4/8)
The last floating point representation is not suitable for negative exponents
• Example:
- the number $0.25 = 0.01_{2} = 0.1_{2} \times 2^{-1}$
- How to represent the negative exponent -1?!
To solve such problems we use an excess-16 bias
• 16 is added to every exponent, negative or positive
• We say that the real exponent is replaced by a biased exponent
• All exponents are converted to non-negative biased exponents
Floating-point representation: A simple model (5/8)
With an excess-16 bias
• Exponent values less than 16 will indicate negative exponent values
• Exponent values more than 16 will indicate positive exponent values
• Exponents of all zeros or all ones are typically reserved for special numbers (such as zero or infinity).
Floating-point representation: A simple model (6/8)
Example 1: Represent the number 17 in a 14-bit floating point form with excess-16 bias
• The number is positive: sign bit is “0”
• $17_{10} = 0.10001_{2} \times 2^{5}$
• The exponent is $5_{10} \rightarrow (5 + 16)_{10} = 21_{10} = 10101_{2}$
• The significand is: $10001_{2} \rightarrow 10001000_{2}$
• So 17 in floating point form with excess-16 bias is (sign | exponent | significand): 0 10101 10001000
Floating-point representation: A simple model (7/8)
Example 2: Represent the number $0.25_{10}$ in a 14-bit floating point form with excess-16 bias.
• The number is positive: sign bit is “0”
• $0.25 = 0.01_{2} \times 2^{0} = 0.1_{2} \times 2^{-1}$
• The exponent is $-1_{10} \rightarrow (-1 + 16)_{10} = 15_{10} = 01111_{2}$
• The significand is $1_{2} \rightarrow 10000000_{2}$
• So 0.25 in floating point form with excess-16 bias is (sign | exponent | significand): 0 01111 10000000
Floating-point representation: A simple model (8/8)
Example 3: Express $-0.03125_{10}$ in normalized floating-point form with excess-16 bias.
• The number is negative: sign bit is “1”
• $0.03125_{10} = 0.00001_{2} = 0.00001_{2} \times 2^{0} = 0.0001_{2} \times 2^{-1} = \ldots = 0.1_{2} \times 2^{-4}$
• The exponent is $-4_{10} \rightarrow (-4 + 16)_{10} = 12_{10} = 01100_{2}$
• The significand is $1_{2} \rightarrow 10000000_{2}$
• So -0.03125 in floating point form with excess-16 bias is (sign | exponent | significand): 1 01100 10000000
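The three worked examples follow the same recipe, which can be sketched in a few lines. This is an illustration of the lecture's simplified 14-bit model only, not of any real hardware format. The names `encode` and `decode` are assumptions, the significand is truncated (not rounded) to 8 bits, and zero is rejected because the slides do not fix its special pattern. `math.frexp` is used because it returns a mantissa in [0.5, 1), which is exactly the normalized form $0.1\ldots_{2} \times 2^{e}$.

```python
import math

BIAS = 16       # excess-16 bias
EXP_BITS = 5
SIG_BITS = 8

def encode(x: float) -> str:
    """Return 'sign exponent significand' for the simplified 14-bit model."""
    if x == 0:
        raise ValueError("zero needs a reserved pattern, not modelled in this sketch")
    sign = "1" if x < 0 else "0"
    mantissa, exponent = math.frexp(abs(x))   # abs(x) = mantissa * 2**exponent
    biased = exponent + BIAS
    if not 0 <= biased < 2 ** EXP_BITS:
        raise ValueError("exponent does not fit in 5 bits")
    significand = int(mantissa * 2 ** SIG_BITS)   # keep the first 8 bits only
    return f"{sign} {biased:0{EXP_BITS}b} {significand:0{SIG_BITS}b}"

def decode(pattern: str) -> float:
    """Inverse of encode for patterns written as 'S EEEEE MMMMMMMM'."""
    sign, exp_bits, sig_bits = pattern.split()
    value = int(sig_bits, 2) / 2 ** SIG_BITS * 2.0 ** (int(exp_bits, 2) - BIAS)
    return -value if sign == "1" else value

print(encode(17))        # 0 10101 10001000
print(encode(0.25))      # 0 01111 10000000
print(encode(-0.03125))  # 1 01100 10000000
```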
Floating-point representation: Floating point arithmetic (1/2)
To add/subtract two numbers in floating point form
• Both numbers should have the same exponent
• If exponents are different
1. We change one of the numbers so that both of them are expressed in the same power of the base
2. We add the binary numbers
3. We represent the result in a normalized floating point form
Floating-point representation: Floating point arithmetic (2/2)
Example: Add the following binary numbers as represented in a normalized 14-bit format with an excess-16 bias.
  0 10010 11001000
+ 0 10000 10011010
The first exponent is $18_{10} \rightarrow 2_{10}$ and the second exponent is $16_{10} \rightarrow 0_{10}$
The second number is $0.10011010_{2} \times 2^{0}$
The first number is $0.11001000_{2} \times 2^{2} = 11.001000_{2} \times 2^{0}$
Now $0.10011010_{2} + 11.001000_{2}$:
    0.1 0 0 1 1 0 1 0
+ 1 1.0 0 1 0 0 0 0 0
  1 1.1 0 1 1 1 0 1 0
The result is $11.10111010_{2} \times 2^{0} = 0.1110111010_{2} \times 2^{2}$
In floating point form with excess-16 (only the first 8 bits of the significand are kept):
0 10010 11101110
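The three steps (align the exponents, add, normalize) can be sketched for the simplified 14-bit model as follows. This is only an illustration under stated assumptions: the name `fp_add` is hypothetical, only positive operands are handled, inputs are assumed to be normalized, and bits that no longer fit in the 8-bit significand are truncated.

```python
def fp_add(a: str, b: str) -> str:
    """Add two positive numbers written as 'S EEEEE MMMMMMMM' (excess-16 bias)."""
    sign_a, exp_a, sig_a = a.split()
    sign_b, exp_b, sig_b = b.split()
    if sign_a != "0" or sign_b != "0":
        raise ValueError("this sketch only handles positive operands")
    exp_a, exp_b = int(exp_a, 2), int(exp_b, 2)
    sig_a, sig_b = int(sig_a, 2), int(sig_b, 2)
    # Step 1: express both numbers in the same power of 2 (the smaller exponent).
    low = min(exp_a, exp_b)
    # Step 2: add the aligned significands as plain binary integers.
    total = (sig_a << (exp_a - low)) + (sig_b << (exp_b - low))
    # Step 3: normalize back to 8 significand bits, dropping the extra low bits.
    shift = max(total.bit_length() - 8, 0)
    total >>= shift
    exponent = low + shift
    if exponent > 31:
        raise OverflowError("exponent does not fit in 5 bits")
    return f"0 {exponent:05b} {total:08b}"

print(fp_add("0 10010 11001000", "0 10000 10011010"))  # 0 10010 11101110
```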
Floating-point representation: Floating Point Errors (1/2)
Computers are finite systems
When dealing with floating-point form, we are modeling the infinite system of real numbers in a finite system of integers
What we have, in truth, is an approximation of the real number system
The more bits we use, the better the approximation
However, there is always some element of error
Such errors can propagate through a lengthy calculation, causing substantial loss of precision
Floating-point representation: Floating Point Errors (2/2)
Example:
• In our previous simple model
- we are limited to the range $-0.11111111_{2} \times 2^{15}$ through $+0.11111111_{2} \times 2^{15}$.
- we cannot store $2^{-19}$ or $2^{128}$; they simply don’t fit.
- Also, 128.5 cannot be accurately stored even if it is well within our range
→ $128.5_{10} = 10000000.1_{2} = 0.100000001_{2} \times 2^{8}$
→ The significand is expressed with more than 8 bits!
→ In practice we store only the first 8 bits: 10000000
→ We actually store 128 and not 128.5, with an absolute error of 0.5
→ The relative error is: $(128.5 - 128) / 128.5 \approx 0.00389 = 0.39\%$
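The error figures for 128.5 can be reproduced with a short calculation. The sketch below assumes, as in the example, that the significand is simply cut off after 8 bits; the function name `store_with_8_bit_significand` is illustrative and handles positive values only.

```python
import math

def store_with_8_bit_significand(x: float) -> float:
    """Value actually kept when only the first 8 significand bits are stored."""
    mantissa, exponent = math.frexp(x)            # x = mantissa * 2**exponent
    truncated = math.floor(mantissa * 256) / 256  # keep 8 bits after the binary point
    return math.ldexp(truncated, exponent)

exact = 128.5
stored = store_with_8_bit_significand(exact)
absolute_error = abs(exact - stored)
relative_error = absolute_error / exact

print(stored)                      # 128.0
print(absolute_error)              # 0.5
print(f"{relative_error:.2%}")     # 0.39%
```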
End of lecture 2
Try to solve all exercises related to lecture 2