Chapter 3

Data Representation

Chapter goals

Describe numbering systems and their use in data representation

Compare and contrast various data representation methods

Describe how nonnumeric data is represented

Data representation

Humans have many symbolic forms to represent information: alphabets, numbers, pictograms

A computer can only represent information with electrical signals: is a circuit on or off?

Computers, numbers, and binary data

Computers only use on/off signals to represent information

These signals can only represent numeric data

Even character based data is represented as a number

Why binary data?

An electrical circuit has two easily distinguished states, on and off: on = 1, off = 0

Binary numbers only have 0s and 1s

Data is stored as collections of binary numbers

Binary numbers are “computer friendly”

Binary numbers are signals that can easily be transported

Binary numbers can be easily processed (transformed) by two-state electrical devices that are easy to design and fabricate

These devices (AND/OR gates, adders) are strung together like an assembly line to carry out a function

Logic gates

Boolean algebra

System of logic developed by George Boole (19th century mathematician) that works with true/false values and can be used to determine if two values are equal, not equal, less than, greater than, etc.

Boolean algebra allows the CPU to carry out binary arithmetic (see White p.36-37)
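
As a rough illustration of how gates can be strung together to do binary arithmetic, the sketch below builds an adder from single-bit gate functions. The function names and the XOR gate are assumptions added for this example; they are not taken from the slides or from White.

```python
# Single-bit gates; inputs and outputs are 0 or 1.
def and_gate(a, b):
    return a & b

def or_gate(a, b):
    return a | b

def xor_gate(a, b):  # assumed helper gate, not named in the slides
    return a ^ b

def full_adder(a, b, carry_in):
    """Add three bits and return (sum_bit, carry_out)."""
    partial = xor_gate(a, b)
    sum_bit = xor_gate(partial, carry_in)
    carry_out = or_gate(and_gate(a, b), and_gate(partial, carry_in))
    return sum_bit, carry_out

def add_bits(x_bits, y_bits):
    """Add two equal-length bit lists (high order bit first)."""
    result, carry = [], 0
    # Work from the low order bit to the high order bit.
    for a, b in zip(reversed(x_bits), reversed(y_bits)):
        sum_bit, carry = full_adder(a, b, carry)
        result.append(sum_bit)
    result.append(carry)  # final carry becomes the new high order bit
    return list(reversed(result))
```

For example, `add_bits([0, 1, 0, 1], [0, 0, 1, 1])` adds 5 and 3 one bit position at a time.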

Binary numbers

Can be combined into a positional numbering system

Base for decimal numbers is 10, base for binary numbers is 2

Each position to the left is worth 2 times the position to its right

Terminology for number systems

Base is also referred to as the radix

Binary numbers have a radix of 2

Decimal numbers have a radix of 10

Radix point separates whole values from fractional values

Decimal point is a kind of radix point

Base 2 positional example
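
The figure for this slide is not available here. As a stand-in, the sketch below evaluates a digit string position by position, including digits to the right of the radix point. The function name and the sample value `101.101` are assumptions chosen for illustration.

```python
def positional_value(digits, base=2):
    """Evaluate a digit string such as '101.101' in the given base."""
    whole, _, fraction = digits.partition(".")
    value = 0
    # Positions left of the radix point: base**0, base**1, base**2, ...
    for position, digit in enumerate(reversed(whole)):
        value += int(digit, base) * base ** position
    # Positions right of the radix point: base**-1, base**-2, ...
    for position, digit in enumerate(fraction, start=1):
        value += int(digit, base) * base ** -position
    return value
```

With base 2, `101.101` works out as 4 + 0 + 1 + 1/2 + 0 + 1/8.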

Numbering systems

Higher base (radix) means fewer positions are needed to represent a number

Base 2 needs many more positions than base 10

Base 16 (hexadecimal) is often used to represent binary numbers
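
The relationship between base and number of positions can be checked with a small counting routine. This is a minimal sketch; `digits_needed` is a name made up for the example.

```python
def digits_needed(value, base):
    """Count the positions needed to write a non-negative integer in a base."""
    if value == 0:
        return 1
    count = 0
    while value > 0:
        value //= base  # drop the lowest position
        count += 1
    return count
```

Comparing `digits_needed(1000, 2)`, `digits_needed(1000, 10)` and `digits_needed(1000, 16)` shows the count shrinking as the base grows.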

Computers & binary numbers

Each digit of a binary number is called a bit

Bit string – group of bits that describes a single value

Bit strings

Left most bit (most significant bit) called high order bit

Right most bit (least significant bit) called low order bit

8 bits make a byte

Programming languages/spreadsheets/etc. automatically translate from base 10 to base 2 and back again
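
A short sketch of these terms in practice, assuming an 8 bit (one byte) width. The helper names are hypothetical.

```python
def to_bit_string(value, width=8):
    """Translate a base 10 integer to a fixed-width base 2 string."""
    return format(value, f"0{width}b")

def high_order_bit(value, width=8):
    """Return the left most (most significant) bit."""
    return (value >> (width - 1)) & 1

def low_order_bit(value):
    """Return the right most (least significant) bit."""
    return value & 1
```

Going the other way, Python's built-in `int("00000101", 2)` translates a bit string back to base 10.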

Hexadecimal notation

Base or radix is 16

More compact than binary

Symbols used are 0-9, A-F

One hexadecimal position corresponds to 4 bits

Used to designate memory locations, colors (HTML & VB)
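
The 4 bits per hexadecimal position rule, and the use of hexadecimal for colors, can be sketched as follows. The function names and the sample color `#FF8000` are assumptions for illustration.

```python
def hex_to_bits(hex_digits):
    """Expand each hexadecimal digit into its group of 4 bits."""
    return " ".join(format(int(digit, 16), "04b") for digit in hex_digits)

def split_html_color(color):
    """Split an HTML color such as '#FF8000' into (red, green, blue)."""
    digits = color.lstrip("#")
    # Two hexadecimal positions (8 bits) per color component.
    return tuple(int(digits[i:i + 2], 16) for i in (0, 2, 4))
```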

Goals of computer data representation

Any representation format for numeric data represents a balance among several factors, including:

Compactness

Accuracy

Range

Ease of manipulation

Standardization

Balancing objectives

Compactness and range are inversely related: the more compact the format, the smaller the range

Accuracy increases with the number of bits used, especially for real numbers. Example: 1/3 = 0.33333333... (non-terminating fraction)
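
Both trade-offs can be observed directly. The sketch below assumes the 32 bit and 64 bit IEEE 754 formats exposed by Python's `struct` module; the helper names are made up for the example.

```python
import struct

def unsigned_range(bits):
    """Smallest and largest unsigned integer that fit in the given bits."""
    return 0, 2 ** bits - 1

def store_as(fmt, value):
    """Round-trip a value through a fixed-width binary float format."""
    return struct.unpack(fmt, struct.pack(fmt, value))[0]

third_32 = store_as(">f", 1 / 3)  # 32 bit format
third_64 = store_as(">d", 1 / 3)  # 64 bit format

error_32 = abs(third_32 - 1 / 3)  # small but nonzero
error_64 = abs(third_64 - 1 / 3)  # zero relative to Python's own 64 bit float
```

Neither format stores 1/3 exactly; the 64 bit format simply carries the approximation further.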

Other objectives

Does the information format make it easier for the processor to perform operations?

Is data in a standard format, allowing simple transfer between computers?

CPU standard data types

Integer

Real number

Character

Boolean

Memory address

Integer data types

Unsigned – assumed to be positive (or zero)

Signed – uses one bit (usually high order bit) to indicate sign: 0 is positive, 1 is negative

Representing negative integers

Excess notation and twos complement

Allow subtraction to be carried out as addition

In twos complement, the number to be subtracted is converted to its complement (every bit is inverted), then 1 is added to the result

When the result is added to another binary number, the carry bit is ignored
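
The twos complement steps can be sketched as below, assuming an 8 bit width for readability. `WIDTH`, `MASK` and the function names are assumptions made for this example.

```python
WIDTH = 8                  # assumed width for illustration
MASK = (1 << WIDTH) - 1    # 0b11111111

def twos_complement(value):
    """Invert every bit, then add 1, keeping only WIDTH bits."""
    return ((value ^ MASK) + 1) & MASK

def subtract(a, b):
    """Compute a - b as a + (twos complement of b), ignoring the carry bit."""
    return (a + twos_complement(b)) & MASK

def to_signed(bits):
    """Interpret a WIDTH-bit pattern as a signed value (high order bit = sign)."""
    if bits & (1 << (WIDTH - 1)):
        return bits - (1 << WIDTH)
    return bits
```

`subtract(12, 5)` adds 12 to the complement of 5 and drops the carry; `to_signed(subtract(5, 12))` shows how a negative result appears with the high order bit set.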

Range and overflow

Most CPUs use a fixed width of 32 or 64 bits to represent an integer

For small numbers the format is padded with leading zeros

Machine processes fixed width information more easily than variable width

Integer overflow

If a number is too big for the fixed width integer format, the CPU signals an overflow error

Integer format width is a tradeoff between overflow and wasted space (padded zeros)

CPUs often use double precision data types for arithmetic operations
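
Python integers grow as needed, so overflow has to be simulated. The sketch below assumes a signed twos complement format and raises an error where a fixed width CPU format would overflow; the function names are hypothetical.

```python
def signed_range(bits):
    """Smallest and largest value of a signed twos complement integer."""
    return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1

def checked_add(a, b, bits=32):
    """Add two integers, reporting overflow for the given fixed width."""
    low, high = signed_range(bits)
    result = a + b
    if not low <= result <= high:
        raise OverflowError(f"{result} does not fit in {bits} signed bits")
    return result
```

Widening the format, for example `checked_add(2147483647, 1, bits=64)`, removes the overflow at the cost of more padded zeros for small values.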

Representing real numbers

More complicated problem than storing integers

Real numbers contain whole & fractional components

How to represent both parts together in one format?

Fixed format for real numbers

Floating point notation

Any real number can be re-written using floating point (scientific notation)

12.555 becomes 1.2555 X 10¹

Format stores 12555 (mantissa), 1 (exponent), and sign (+)

-143.99 becomes -1.4399 X 10²

Format stores 14399 (mantissa), 2 (exponent), and sign (-)
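
The decomposition into sign, mantissa and exponent can be reproduced in decimal with Python's `decimal` module. This is a sketch in base 10 only, mirroring the two examples; the function name is an assumption, and real CPU formats store these parts in binary.

```python
from decimal import Decimal

def to_floating_point_parts(text):
    """Split a decimal string into (sign, mantissa digits, exponent)."""
    sign, digits, exponent = Decimal(text).as_tuple()
    mantissa = "".join(str(digit) for digit in digits)
    # Exponent once the radix point sits after the first digit.
    scientific_exponent = len(digits) + exponent - 1
    return ("-" if sign else "+"), mantissa, scientific_exponent
```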

IEEE floating point format for real numbers
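
The figure for this slide is not available here. As a stand-in, the sketch below pulls apart the three fields of a 32 bit IEEE 754 single precision value: 1 sign bit, 8 exponent bits and 23 fraction (mantissa) bits. The field widths come from the IEEE 754 standard rather than from the slides, and the function name is made up for the example.

```python
import struct

def ieee754_single_fields(value):
    """Return (sign, stored exponent, fraction) of a 32 bit float."""
    # Reinterpret the 4 bytes of the float as an unsigned 32 bit integer.
    (bits,) = struct.unpack(">I", struct.pack(">f", value))
    sign = bits >> 31                 # 1 bit
    exponent = (bits >> 23) & 0xFF    # 8 bits, stored with a bias of 127
    fraction = bits & 0x7FFFFF        # 23 bits
    return sign, exponent, fraction
```

The stored exponent is biased: a value of 127 means 2 to the power 0.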

Floating point range

Number of bits in the floating point format limits the range of the exponent and the mantissa

Overflow (too large a number) always occurs in the exponent

Underflow (too small a number, i.e. negative exponent does not fit) also occurs in the exponent
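
Both conditions can be provoked with Python's built-in 64 bit floats. This is a small sketch; the variable names are assumptions, and how overflow is reported (an infinity here) depends on the language and hardware.

```python
import math
import sys

largest = sys.float_info.max      # largest finite 64 bit float

overflowed = largest * 10         # exponent too large: becomes infinity
underflowed = 1e-200 * 1e-200     # exponent too negative: becomes 0.0

overflow_detected = math.isinf(overflowed)
underflow_detected = underflowed == 0.0
```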

Range for mantissa

Number of bits for the mantissa limits the number of significant digits stored for a real number

23 bits allows for approx. 7 significant decimal digits of precision

Mantissa is stored using truncation or rounding (information that does not fit is discarded)

Does not throw an overflow condition
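
The limit on significant digits shows up once a value needs more bits than the mantissa holds. The sketch below assumes the 32 bit IEEE 754 format as exposed by Python's `struct` module; `store_single` is a hypothetical helper.

```python
import struct

def store_single(value):
    """Round-trip a value through the 32 bit floating point format."""
    return struct.unpack(">f", struct.pack(">f", value))[0]

exact = store_single(16777216.0)    # 2**24 still fits
inexact = store_single(16777217.0)  # 2**24 + 1 needs one more mantissa bit
```

The second value comes back changed, and no error is raised: the extra information is silently discarded.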

Processing complexity

As a rule of thumb, floating point operations (+, -, *, etc.) take the CPU about twice as long as the same operations on integers (binary)

Floating Point Operations Per Second (FLOPS) is a measure of processor speed

Character data

Alphabetic letters (upper & lower case), numerals, punctuation marks, special symbols are called characters

A variable of type character contains only one symbol

A sequence of symbols forming words, sentences, etc. is called a string

How computers store characters

Character data cannot be directly processed by a computer

Must be translated into a number

Characters are converted into numbers using a table of correspondences between a character and a bit string
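
Python's built-in `ord` and `chr` expose such a table (Unicode, which agrees with ASCII for the first 128 codes). The wrapper functions below are a sketch with made-up names.

```python
def encode_text(text):
    """Map each character to (character, code number, 8 bit string)."""
    return [(ch, ord(ch), format(ord(ch), "08b")) for ch in text]

def decode_codes(codes):
    """Map code numbers back to a string of characters."""
    return "".join(chr(code) for code in codes)
```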

Design issues for character coding schemes

Table must be publicly available and all users must use the same table

Coding scheme is a tradeoff among compactness, ease of manipulation, accuracy, range, & standardization

Examples of character coding schemes

BCD and EBCDIC – older IBM mainframe computers

ASCII – PCs

Unicode – larger format allows for expanded and international alphabets (Java and internet applications)

ASCII coding scheme

7 bit format leaves room in a byte for a parity bit (used to check for errors over transmission lines)

Has unique codes for all uppercase & lowercase letters, numbers, other printable characters

Also includes codes for device control
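
A parity bit can be sketched as follows. The choice of even parity and of the high order bit as the parity position are assumptions made for this example; the slides do not specify either.

```python
def add_even_parity(code):
    """Return 8 bits: a 7 bit ASCII code with an even parity bit on top."""
    if not 0 <= code < 128:
        raise ValueError("expected a 7 bit ASCII code")
    parity = bin(code).count("1") % 2  # 1 if the count of 1 bits is odd
    return (parity << 7) | code

def parity_ok(byte):
    """Check that a received byte still has an even number of 1 bits."""
    return bin(byte).count("1") % 2 == 0
```

If a single bit flips in transit, `parity_ok` returns False; two flipped bits would go unnoticed.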

Device control

In many applications that handle text, formatting & commands to a device are included in the same stream of data as the text

Examples of applications: word processors (reveal codes), HTML tags

Examples of control codes: CR (carriage return), tab, form feed

Limitations to ASCII

Not robust enough to represent multiple languages and symbols

7 bit format allows for 128 unique codes; some languages have thousands of symbols

Unicode (original 16 bit format) has 65,536 entries
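
The two table sizes follow from the bit widths, and a quick check shows which text fits in the smaller table. This is a sketch; the constant and function names are assumptions.

```python
ASCII_CODES = 2 ** 7         # 128 codes in a 7 bit format
UNICODE_16_CODES = 2 ** 16   # 65,536 codes in a 16 bit format

def fits_in_ascii(text):
    """True if every character has a code below 128."""
    return all(ord(ch) < ASCII_CODES for ch in text)
```

For instance, `fits_in_ascii("hello")` holds, while text containing an accented letter or a Chinese character does not fit.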

Boolean data

Data type has only two values, true and false

Can be stored with one bit

The results of many CPU operations (comparisons) generate a Boolean value stored in a register

Memory addresses

Primary storage is a series of contiguous bytes

CPU must be able to access sections of memory directly

Sections of memory are accessed by their address (location)

Formats for memory addresses

Flat memory model – memory starts at address 0, goes to maximum capacity – 1

Simple integers used to store address

Segmented memory model – memory is divided into equal sized segments called pages

Address has two parts

Example: 00FA:0034 gives the number for the page, and the location within the page
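
A two part address can be split and, given a page size, mapped to a flat address. The page size of 0x10000 bytes below is purely an assumption for illustration (the slides do not give one), and the function names are hypothetical.

```python
def parse_segmented(address):
    """Split 'PAGE:OFFSET' (hexadecimal) into two integers."""
    page_text, offset_text = address.split(":")
    return int(page_text, 16), int(offset_text, 16)

def to_flat(address, page_size=0x10000):
    """Convert a segmented address to a flat one (assumed page size)."""
    page, offset = parse_segmented(address)
    if offset >= page_size:
        raise ValueError("offset lies outside the page")
    return page * page_size + offset
```

Real segmented architectures may combine the two parts differently, so this mapping should be read as one possible scheme.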

Data structures

These five primitive types are quite limited for representing real world data: words, sentences, dates, database tables

More complex data structures are constructed from these five primitive types

Data Structures

Chapter summary

To be processed by any device, data must be converted from its native format into a form suitable for the processing device.

All data, including nonnumeric data, are represented within a modern computer system as strings of binary digits, or bits.

Each bit string has a specific data format and coding method.

Summary (cont.)

Numeric data is stored using integer, real number, and floating point formats.

Characters are converted to numbers by means of a coding table.

Boolean values can have only two values, true and false.

Programs often need to define and manipulate data in larger and more complex units than primitive CPU data types.
