Post on 20-Dec-2015
Chapter 3
Data Representation
Chapter goals
Describe numbering systems and their use in data representation.
Compare and contrast various data representation methods.
Describe how nonnumeric data is represented.
Data representation
Humans have many symbolic forms to represent information: alphabets, numbers, pictograms.
Computers can only represent information with electrical signals: is a circuit on or off?
Computers, numbers, and binary data
Computers only use on/off signals to represent information.
These signals can only represent numeric data.
Even character-based data is represented as a number.
Why binary data?
Electricity has two states, on and off: on = 1, off = 0.
Binary numbers only have 0s and 1s.
Data is stored as collections of binary numbers.
Binary numbers are “computer friendly”
Binary numbers are signals that can easily be transported.
Binary numbers can be easily processed (transformed) by two-state electrical devices that are easy to design and fabricate.
These devices (AND/OR gates, adders) are strung together like an assembly line to carry out a function.
Logic gates
Boolean algebra
System developed by George Boole (19th century mathematician) for reasoning about values that are either true or false.
Comparisons between two values (equal, not equal, less than, greater than, etc.) produce such true/false results.
Boolean algebra allows the CPU to carry out binary arithmetic (see White p.36-37).
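As a rough illustration of how gates can be strung together to do binary arithmetic, here is a minimal sketch of a full adder built from gate functions. The function names are hypothetical, and the XOR gate is an assumption: the slides only name AND/OR gates and adders.

```python
# Gates modelled as functions on single bits (0 or 1).
def and_gate(a, b):
    return a & b

def or_gate(a, b):
    return a | b

def xor_gate(a, b):  # assumption: XOR is not named on the slides
    return a ^ b

def full_adder(a, b, carry_in):
    """Add three bits; return (sum_bit, carry_out)."""
    partial = xor_gate(a, b)
    sum_bit = xor_gate(partial, carry_in)
    carry_out = or_gate(and_gate(a, b), and_gate(partial, carry_in))
    return sum_bit, carry_out

def add_bits(x_bits, y_bits):
    """Add two equal-length bit lists, least significant bit first."""
    result, carry = [], 0
    for a, b in zip(x_bits, y_bits):
        sum_bit, carry = full_adder(a, b, carry)
        result.append(sum_bit)
    return result, carry

# 5 + 3, written least significant bit first
print(add_bits([1, 0, 1, 0], [1, 1, 0, 0]))
```

Chaining one full adder per bit position, with each carry feeding the next position, is the "assembly line" idea in miniature.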
Binary numbers
Can be combined into a positional numbering system.
Base for decimal numbers is 10, base for binary numbers is 2.
Each position to the left represents the next higher power of 2.
Terminology for number systems
Base is also referred to as the radix.
Binary numbers have a radix of 2.
Decimal numbers have a radix of 10.
Radix point separates whole values from fractional values.
Decimal point is a kind of radix point.
Base 2 positional example
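The original figure for this slide is not available in the transcript. As a stand-in, the sketch below expands a digit string position by position; `positional_value` is a hypothetical helper name, and the sample value 1101 is chosen only for illustration.

```python
def positional_value(digits, base):
    """Sum digit * base**position, counting positions from the right."""
    total = 0
    for position, digit in enumerate(reversed(digits)):
        total += int(digit, base) * base ** position
    return total

# 1101 in base 2 = 1*8 + 1*4 + 0*2 + 1*1
print(positional_value("1101", 2))   # 13
# The same digits read in base 10
print(positional_value("1101", 10))  # 1101
```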
Numbering systems
Higher base (radix) means fewer positions are needed to represent a number.
Base 2 needs many more positions than base 10.
Base 16 (hexadecimal) is often used to represent binary numbers.
Computers & binary numbers
Each digit of a binary number is called a bit.
Bit string – group of bits that describes a single value.
Bit strings
Leftmost bit (most significant bit) is called the high-order bit.
Rightmost bit (least significant bit) is called the low-order bit.
8 bits make a byte.
Programming languages, spreadsheets, etc. automatically translate from base 10 to base 2 and back again.
Hexadecimal notation
Base or radix is 16.
More compact than binary.
Symbols used are 0-9, A-F.
One hexadecimal position corresponds to 4 bits.
Used to designate memory locations and colors (HTML & VB).
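A short sketch of the 4-bits-per-hex-digit correspondence, plus an HTML-style color split into its three bytes. The helper names and the sample color are illustrative assumptions.

```python
def hex_to_bits(hex_string):
    """Expand each hexadecimal digit into its 4-bit group."""
    return " ".join(format(int(digit, 16), "04b") for digit in hex_string)

def parse_color(html_color):
    """Split an HTML color such as '#FF8000' into (red, green, blue)."""
    value = html_color.lstrip("#")
    return tuple(int(value[i:i + 2], 16) for i in (0, 2, 4))

print(hex_to_bits("FA"))       # 1111 1010
print(parse_color("#FF8000"))  # (255, 128, 0)
```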
Goals of computer data representation
Any representation format for numeric data represents a balance among several factors, including compactness, accuracy, range, ease of manipulation, and standardization.
Balancing objectives
Compactness and range are inversely related: the more compact the format, the smaller the range.
Accuracy increases with the number of bits used, especially for real numbers. Example: 1/3 = 0.33333333... is a non-terminating fraction, so any fixed number of digits only approximates it.
Other objectives
Does the information format make it easier for the processor to perform operations?
Is the data in a standard format, allowing simple transfer between computers?
CPU standard data types
Integer, real number, character, Boolean, memory address.
Integer data types
Unsigned – assumed to be positive (or zero).
Signed – uses one bit (usually the high-order bit) to indicate sign: 0 is positive, 1 is negative.
Representing negative integers
Excess notation and twos complement are two common formats.
Twos complement allows subtraction to be carried out as addition.
To negate a number, every bit is inverted (its complement) and 1 is added to the result.
When that result is added to another binary number, the carry out of the high-order bit is ignored.
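The steps above can be sketched as follows. The 8-bit width is an assumption chosen to keep the bit strings short, and the function names are hypothetical.

```python
WIDTH = 8                  # assumed word size for this sketch
MASK = (1 << WIDTH) - 1    # 0b11111111

def twos_complement(value):
    """Invert every bit, then add 1, keeping only WIDTH bits."""
    return ((value ^ MASK) + 1) & MASK

def subtract(a, b):
    """Compute a - b as a + (twos complement of b), ignoring the carry."""
    return (a + twos_complement(b)) & MASK

print(format(twos_complement(5), "08b"))  # 11111011
print(subtract(12, 5))                    # 7
```

Masking with `MASK` is what "ignoring the carry bit" amounts to here: 12 + 251 is 263, and dropping the ninth bit leaves 7.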
Range and overflow
Most CPUs use a fixed width of 32 or 64 bits to represent an integer.
For small numbers the format is padded with leading zeros.
The machine processes fixed-width information more easily than variable-width information.
Integer overflow
If a number is too big for the fixed-width integer format, the CPU signals an overflow error.
Integer format width is a tradeoff between overflow and wasted space (padded zeros).
CPUs often use double-precision data types for arithmetic operations.
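Python integers grow as needed, so overflow has to be simulated. The sketch below checks a result against the range of a signed fixed-width format; the 8-bit default and the function name are assumptions made for illustration, not how any particular CPU reports overflow.

```python
def add_signed(a, b, width=8):
    """Add two integers, raising if the sum does not fit in `width` signed bits."""
    lowest = -(1 << (width - 1))
    highest = (1 << (width - 1)) - 1
    result = a + b
    if not lowest <= result <= highest:
        raise OverflowError(f"{result} does not fit in {width} signed bits")
    return result

print(add_signed(100, 27))   # 127, the largest 8-bit signed value
print(format(5, "032b"))     # a small number padded with leading zeros to 32 bits

try:
    add_signed(100, 28)
except OverflowError as error:
    print("overflow:", error)
```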
Representing real numbers
More complicated problem than storing integers.
Real numbers contain whole & fractional components.
How to represent both parts together in one format?
Fixed format for real numbers
Floating point notation
Any real number can be rewritten using floating point (scientific) notation.
12.555 becomes 1.2555 × 10¹. Format stores 12555 (mantissa), 1 (exponent), and sign (+).
-143.99 becomes -1.4399 × 10². Format stores 14399 (mantissa), 2 (exponent), and sign (-).
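The decimal decomposition on this slide can be reproduced with Python's `decimal` module. This is only a sketch of the idea in base 10; real hardware formats work in base 2, and `to_scientific` is a hypothetical helper name.

```python
from decimal import Decimal

def to_scientific(text):
    """Return (sign, mantissa digits, exponent) for a decimal string."""
    sign, digits, exponent = Decimal(text).as_tuple()
    mantissa = "".join(str(d) for d in digits)
    # Exponent once the radix point sits after the first digit.
    normalized_exponent = len(digits) + exponent - 1
    return ("-" if sign else "+", mantissa, normalized_exponent)

print(to_scientific("12.555"))   # ('+', '12555', 1)
print(to_scientific("-143.99"))  # ('-', '14399', 2)
```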
IEEE floating point format for real numbers
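The figure for this slide did not survive in the transcript. As a hedged stand-in, the sketch below pulls apart a 32-bit IEEE 754 single-precision value using the standard layout of 1 sign bit, 8 exponent bits (stored with a bias of 127), and 23 fraction bits. The function name is hypothetical.

```python
import struct

def float32_fields(x):
    """Return (sign, biased exponent, fraction) of x stored as IEEE 754 single."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    sign = bits >> 31
    exponent = (bits >> 23) & 0xFF
    fraction = bits & 0x7FFFFF
    return sign, exponent, fraction

print(float32_fields(1.0))      # (0, 127, 0)
print(float32_fields(-2.0))     # (1, 128, 0)
print(float32_fields(0.15625))  # (0, 124, 2097152)
```

The last value is 1.25 × 2⁻³: the stored exponent is 127 − 3 = 124, and the fraction field holds the .25 part.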
Floating point range
Number of bits in the floating point format limits the range of the exponent and mantissa.
Overflow (too large a number) always occurs in the exponent.
Underflow (too small a number) occurs when a negative exponent does not fit.
Range for mantissa
Number of bits for the mantissa limits the number of significant digits stored for a real number.
23 bits allows for approx. 7 significant decimal digits of precision.
Mantissa digits that do not fit are discarded (truncated, or rounded to the nearest representable value in IEEE formats). This does not raise an overflow condition.
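To see the limited mantissa in practice, a value can be squeezed through a 32-bit float and read back. This sketch uses Python's `struct` module, which rounds to the nearest representable value rather than truncating; `to_float32` is a hypothetical helper name.

```python
import struct

def to_float32(x):
    """Store x in a 32-bit float and read it back as a Python float."""
    return struct.unpack(">f", struct.pack(">f", x))[0]

third = to_float32(1 / 3)
print(third)                # agrees with 1/3 only for about the first 7 digits
print(abs(third - 1 / 3))   # small but nonzero error, and no overflow

# Above 2**24, not every whole number fits in the mantissa.
print(to_float32(16777217.0) == 16777216.0)  # True
```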
Processing complexity
As a rough rule, floating point operations (+, -, *, etc.) take the CPU about twice as long as the equivalent integer operations.
Floating Point Operations Per Second (FLOPS) is a measure of processor speed.
Character data
Alphabetic letters (upper & lower case), numerals, punctuation marks, and special symbols are called characters.
A variable of type character contains only one symbol.
A sequence of symbols forming words, sentences, etc. is called a string.
How computers store characters
Character data cannot be directly processed by a computer.
Must be translated into a number.
Characters are converted into numbers using a table of correspondences between a character and a bit string.
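Python's built-in `ord` and `chr` expose such a table directly, so the translation in both directions can be sketched in a few lines. The `encode` and `decode` names are hypothetical, and the 8-bit padding is an assumption that only suits codes below 256.

```python
def encode(text):
    """Look up each character's code and show it as an 8-bit string."""
    return [format(ord(ch), "08b") for ch in text]

def decode(bit_strings):
    """Turn each bit string back into its character."""
    return "".join(chr(int(bits, 2)) for bits in bit_strings)

codes = encode("Hi")
print(codes)          # ['01001000', '01101001']
print(decode(codes))  # Hi
```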
Design issues for character coding schemes
Table must be publicly available and all users must use the same table.
Coding scheme is a tradeoff among compactness, ease of manipulation, accuracy, range, & standardization.
Examples of character coding schemes
BCD and EBCDIC – older IBM mainframe computers.
ASCII – PCs.
Unicode – larger format allows for expanded and international alphabets (Java and internet applications).
ASCII coding scheme
7-bit format leaves the eighth bit of a byte free for a parity bit (used to check for errors over transmission lines).
Has unique codes for all uppercase & lowercase letters, numerals, and other printable characters.
Also includes codes for device control.
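A minimal sketch of how a parity bit might ride in the eighth bit of a byte. Even parity is an assumption here (the slides do not say which kind is used), and the function names are hypothetical.

```python
def with_even_parity(ch):
    """Return the 7-bit ASCII code of ch with an even-parity bit in bit 7."""
    code = ord(ch)
    if code > 127:
        raise ValueError("not a 7-bit ASCII character")
    parity = bin(code).count("1") % 2   # 1 if the count of 1 bits is odd
    return (parity << 7) | code

def parity_ok(byte):
    """A received byte passes if its total number of 1 bits is even."""
    return bin(byte).count("1") % 2 == 0

sent = with_even_parity("C")            # 'C' is 1000011, three 1 bits
print(format(sent, "08b"))              # 11000011
print(parity_ok(sent))                  # True
print(parity_ok(sent ^ 0b00000100))     # False: one flipped bit is detected
```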
Device control
In many applications that handle text, formatting & commands to a device are included in the same stream of data as the text.
Examples of such applications: word processors (reveal codes), HTML tags.
Examples of control codes: CR (carriage return), tab, form feed.
Limitations to ASCII
Not robust enough to represent multiple languages and symbols.
7-bit format allows for 128 unique codes; some languages have thousands of symbols.
Unicode in its 16-bit form has 65,536 entries.
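The code counts follow from the bit widths, and Python's string encoders show the limitation directly. The sample character below is an arbitrary choice for illustration.

```python
print(2 ** 7)    # 128 codes in 7-bit ASCII
print(2 ** 16)   # 65536 codes in a 16-bit format

symbol = "中"     # sample character outside the ASCII table
print(ord(symbol))                       # 20013, far above 127
print(len(symbol.encode("utf-16-be")))   # 2 bytes, i.e. 16 bits

try:
    symbol.encode("ascii")
except UnicodeEncodeError:
    print("no ASCII code exists for this character")
```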
Boolean data
Data type has two values, true and false.
Can be stored with one bit.
The results of many CPU operations (comparisons) generate a Boolean value stored in a register.
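Because each Boolean needs only one bit, several comparison results can share a single byte. The sketch below is one possible packing scheme, with hypothetical function names; it is not a description of how any particular CPU register is laid out.

```python
def pack_flags(flags):
    """Pack up to 8 Booleans into one byte, flag 0 in the low-order bit."""
    if len(flags) > 8:
        raise ValueError("a byte holds at most 8 flags")
    byte = 0
    for position, flag in enumerate(flags):
        if flag:
            byte |= 1 << position
    return byte

def read_flag(byte, position):
    """Read one Boolean back out of the packed byte."""
    return bool((byte >> position) & 1)

results = [3 < 5, 3 == 5, 3 != 5]   # comparisons produce Boolean values
packed = pack_flags(results)
print(format(packed, "08b"))        # 00000101
print(read_flag(packed, 1))         # False
```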
Memory addresses
Primary storage is a series of contiguous bytes.
CPU must be able to access sections of memory directly.
Sections of memory are accessed by their address (location).
Formats for memory addresses
Flat memory model – memory starts at address 0 and goes to maximum capacity – 1. Simple integers are used to store addresses.
Segmented memory model – memory is divided into equal-sized segments called pages. An address has two parts: a number for the page and a location within the page, e.g. 00FA:0034.
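A small sketch of reading the two parts of such an address and mapping it onto a flat address. The page size of 65,536 bytes is purely an assumption for illustration (the slides do not give one, and real architectures differ), as are the function names.

```python
PAGE_SIZE = 0x10000   # assumed page size; not stated in the slides

def parse_segmented(address):
    """Split 'PAGE:OFFSET' (both hexadecimal) into two integers."""
    page_text, offset_text = address.split(":")
    return int(page_text, 16), int(offset_text, 16)

def to_flat(address):
    """Flat address under the assumed PAGE_SIZE."""
    page, offset = parse_segmented(address)
    return page * PAGE_SIZE + offset

print(parse_segmented("00FA:0034"))  # (250, 52)
print(to_flat("00FA:0034"))          # 16384052
```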
Data structures
These five primitive types are quite limited for representing real world data: words, sentences, dates, database tables.
More complex data structures are constructed from these five primitive types.
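As one hedged illustration of building larger units from primitives, the sketch below composes a date from three integers and a record from a string, a date, and a Boolean. The `Date` and `Employee` types and their fields are hypothetical examples, not taken from the slides.

```python
from dataclasses import dataclass

@dataclass
class Date:
    year: int    # integer primitive
    month: int
    day: int

@dataclass
class Employee:          # hypothetical row of a database table
    name: str            # a string is a sequence of characters
    hired: Date          # a structure nested inside another structure
    active: bool         # Boolean primitive

row = Employee(name="Ada", hired=Date(2001, 2, 3), active=True)
print(row.hired.year)    # 2001
```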
Data Structures
Chapter summary
To be processed by any device, data must be converted from its native format into a form suitable for the processing device.
All data, including nonnumeric data, are represented within a modern computer system as strings of binary digits, or bits.
Each bit string has a specific data format and coding method.
Summary (cont.)
Numeric data is stored using integer, real number, and floating point formats.
Characters are converted to numbers by means of a coding table.
Boolean values can have only two values, true and false.
Programs often need to define and manipulate data in larger and more complex units than primitive CPU data types.