cs ill ch3

Click here to load reader

Post on 01-Nov-2014




0 download

Embed Size (px)




  • 1. Data Representation Chapter 3 When you go on a trip, you probably follow a road map. The map is not the land over which you travel; it is a representation of that land. The map has captured the essential information needed to accomplish the goal of getting from one place to another. Likewise, the data we need to store and manage on a computer must be represented in a way that captures the essence of the infor- mation, and it must do so in a form convenient for computer processing. Building on the fundamental concepts of the binary number system established in Chapter 2, this chapter explores how we represent and store the various kinds of information a computer manages. 51
  • 2. Multimedia Several different media types Data compression Reducing the amount of space needed to store a piece of data Bandwidth The number of bits or bytes that can be transmitted from one place to another in a fixed amount of time 52 Chapter 3 Data Representation Goals After studying this chapter, you should be able to: I distinguish between analog and digital information. I explain data compression and calculate compression ratios. I explain the binary formats for negative and floating-point values. I describe the characteristics of the ASCII and Unicode character sets. I perform various types of text compression. I explain the nature of sound and its representation. I explain how RGB values define a color. I distinguish between raster and vector graphics. I explain temporal and spatial video compression. 3.1 Data and Computers Without data, computers would be useless. Every task a computer under- takes deals with managing data in some way. Therefore, our need to repre- sent and organize that data in appropriate ways is paramount. In the not-so-distant past, computers dealt almost exclusively with numeric and textual data, but now computers are truly multimedia devices, dealing with a vast array of information categories. Computers store, present, and help us modify many different types of data, including: I Numbers I Text I Audio I Images and graphics I Video Ultimately, all of this data is stored as binary digits. Each document, picture, and sound bite is somehow represented as strings of 1s and 0s. This chapter explores each of these types of data in turn and discusses the basic ideas behind the ways in which we represent these types of data on a computer. We cant discuss data representation without also talking about data compressionreducing the amount of space needed to store a piece of data. In the past we needed to keep data small because of storage limita- tions. Today, computer storage is relatively cheap; but now we have an even more pressing reason to shrink our data: the need to share it with others. The Web and its underlying networks have inherent bandwidth
  • 3. Compression ratio The size of the compressed data divided by the size of the uncompressed data Lossless compression A technique in which there is no loss of infor- mation Lossy compression A technique in which there is loss of informa- tion Analog data Informa- tion represented in a continuous form Digital data Information represented in a discrete form 3.1 Data and Computers 53 restrictions that define the maximum number of bits or bytes that can be transmitted from one place to another in a fixed amount of time. The compression ratio gives an indication of how much compression occurs. The compression ratio is the size of the compressed data divided by the size of the original data. The values could be in bits or characters or whatever is appropriate as long as both values are measuring the same thing. The ratio should result in a number between 0 and 1. The closer the ratio is to zero, the tighter the compression. A data compression technique can be lossless, which means the data can be retrieved without losing any of the original information. Or it can be lossy, in which case some information is lost in the process of compaction. Although we never want to lose information, in some cases the loss is acceptable. When dealing with data representation and compres- sion, we always face a tradeoff between accuracy and size. Analog and Digital Information The natural world, for the most part, is continuous and infinite. A number line is continuous, with values growing infinitely large and small. That is, you can always come up with a number larger or smaller than any given number. And the numeric space between two integers is infinite. For instance, any number can be divided in half. But the world is not just infi- nite in a mathematical sense. The spectrum of colors is a continuous rainbow of infinite shades. Objects in the real world move through contin- uous and infinite space. Theoretically, you could always close the distance between you and a wall by half, and you would never actually reach the wall. Computers, on the other hand, are finite. Computer memory and other hardware devices have only so much room to store and manipulate a certain amount of data. We always fail in our attempt to represent an infi- nite world on a finite machine. The goal, then, is to represent enough of the world to satisfy our computational needs and our senses of sight and sound. We want to make our representations good enough to get the job done, whatever that job might be. Information can be represented in one of two ways: analog or digital. Analog data is a continuous representation, analogous to the actual infor- mation it represents. Digital data is a discrete representation, breaking the information up into separate elements. A mercury thermometer is an analog device. The mercury rises in a continuous flow in the tube in direct proportion to the temperature. We calibrate and mark the tube so that we can read the current temperature, usually as an integer such as 75 degrees Fahrenheit. However, a mercury thermometer is actually rising in a continuous manner between degrees. So at some point in time the temperature is actually 74.568 degrees
  • 4. Digitize The act of breaking information down into discrete pieces 54 Chapter 3 Data Representation 73 75 74 76 77 78 Figure 3.1 A mercury thermometer continually rises in direct proportion to the tempera- ture Fahrenheit, and the mercury is accurately indicating that, even if our markings are not fine enough to note such small changes. See Figure 3.1. Analog information is directly proportional to the continuous, infinite world around us. Computers, therefore, cannot work well with analog information. So instead, we digitize information by breaking it into pieces and representing those pieces separately. Each of the representations we discuss in this chapter has found an appropriate way to take a continuous entity and separate it into discrete elements. Those discrete elements are then individually represented using binary digits. But why do we use binary? We know from Chapter 2 that binary is just one of many equivalent number systems. Couldnt we use, say, the decimal number system, with which we are already more familiar? We could. In fact, its been done. Computers have been built that are based on other number systems. However, modern computers are designed to use and manage binary values because the devices that store and manage the data are far less expensive and far more reliable if they only have to represent one of two possible values. Also, electronic signals are far easier to maintain if they transfer only binary data. An analog signal continually fluctuates in voltage up and down. But a digital signal has only a high or low state, corresponding to the two binary digits. See Figure 3.2. All electronic signals (both analog and digital) degrade as they move down a line. That is, the voltage of the signal fluctuates due to environ- mental effects. The trouble is that as soon as an analog signal degrades,
  • 5. Pulse-code modulation Variation in a signal that jumps sharply between two extremes Reclock The act of reasserting an original digital signal before too much degradation occurs 3.1 Data and Computers 55 Figure 3.2 An analog and a digital signal Threshhold Figure 3.3 Degradation of analog and digital signals information is lost. Since any voltage level within the range is valid, its impossible to know what the original signal state was or even that it changed at all. Digital signals, on the other hand, jump sharply between two extremes. This is referred to as pulse-code modulation (PCM). A digital signal can degrade quite a bit before any information is lost, because any voltage value above a certain threshold is considered a high value, and any value below that threshold is considered a low value. Periodically, a digital signal is reclocked to regain its original shape. As long as it is reclocked before too much degradation occurs, no information is lost. Figure 3.3 shows the degradation effects of analog and digital signals. Binary Representations As we undertake the details of representing particular types of data, its important to remember the inherent nature of using binary. One bit can be either 0 or 1. There are no other possibilities. Therefore, one bit can repre- sent only two things. For example, if we wanted to classify a food as being either sweet or sour, we would need only one bit to do it. We could say that if the bit is 0, the food is sweet, and if the bit is 1, the food is sour. But if we want to have additional classifications (such as spicy), one bit is not sufficient. To represent more than two things, we need multiple bits. Two bits can represent four things because there are four combinations of 0 and 1 that can be made from two bits: 00, 01, 10, and 11. So, for instance, if we want to represent which of four possible gears a car is in (park, drive, reverse, or neutral), we need only two bits. Park could be represented by 00, drive by 01, reverse by 10, and neutral by 11. The actual mapping between bit combinations and the thing each combination represents is sometimes irrelevant (00 could be used to represent reverse, if you prefer), though
  • 6. 56 Chapter 3 Data Representation Figure 3.4 Bit combinations 1 Bit 0 1 2 Bits 00 01 10 11 3 Bits 000 001 010 011 100 101 110 111 4 Bits 0000 0001 0010 0011 0100 0101 0110 0111 1000 1001 1010 1011 1100 1101 1110 1111 5 Bits 00000 00001 00010 00011 00100 00101 00110 00111 01000 01001 01010 01011 01100 01101 01110 01111 10000 10001 10010 10011 10100 10101 10110 10111 11000 11001 11010 11011 11100 11101 11110 11111 sometimes the mapping can be meaningful and important, as we discuss in later sections of this chapter. If we want to represent more than four things, we need more than two bits. Three bits can represent eight things because there are eight combina- tions of 0 and 1 that can be made from three bits. Likewise, four bits can represent 16 things, five bits can represent 32 things, and so on. See Figure 3.4. Note that the bit combinations are simply counting in binary as you move down a column.
  • 7. Sign-magnitude repre- sentation Number representation in which the sign represents the ordering of the number (negative and positive) and the value represents the magnitude 3.2 Representing Numeric Data 57 In general, n bits can represent 2n things because there are 2n combina- tions of 0 and 1 that can be made from n bits. Note that every time we increase the number of available bits by 1, we double the number of things we can represent. Lets turn the question around. How many bits do you need to repre- sent, say, 25 unique things? Well, four bits wouldnt be enough because four bits can represent only 16 things. We would have to use at least five bits, which would allow us to represent 32 things. Since we only need to represent 25 things, some of the bit combinations would not have a valid interpretation. Keep in mind that even though we may technically need only a certain minimum number of bits to represent a set of items, we may allocate more than that for the storage of that data. There is a minimum number of bits that a computer architecture can address and move around at one time, and it is usually a power of two, such as 8, 16, or 32 bits. Therefore, the minimum amount of storage given to any type of data is in multiples of that value. 3.2 Representing Numeric Data Numeric values are the most prevalent type of data used in a computer system. Unlike other types of data, there may seem to be no need to come up with a clever mapping between binary codes and numeric data. Since binary is a number system, there is a natural relationship between the numeric information and the binary values that we store to represent them. This is true, in general, for positive integer data. The basic issues regarding integer conversions were covered in Chapter 2 in the general discussion of the binary system and its equivalence to other bases. However, there are other issues regarding the representation of numeric information to consider at this point. Integers are just the beginning in terms of numeric data. This section discusses the representation of negative and noninteger values. Representing Negative Values Arent negative numbers just numbers with a minus sign in front? Perhaps. That is certainly one valid way to think about them. Lets explore the issue of negative numbers, and discuss appropriate ways to represent them on a computer. Signed-Magnitude Representation You have used the signed-magnitude representation of numbers since you first learned about negative numbers in grade school. In the traditional
  • 8. 58 Chapter 3 Data Representation decimal system, a sign (+ or ) is placed before a numbers value, though the positive sign is often assumed. The sign represents the ordering, and the digits represent the magnitude of the number. The classic number line looks something like this, in which a negative sign meant that the number was to the left of zero and the positive number was to the right of zero. Performing addition and subtraction with signed integer numbers can be described as moving a certain number of units in one direction or another. To add two numbers you find the first number on the scale and move in the direction of the sign of the second as many units as specified. Subtrac- tion was done in a similar way, moving along the number line as dictated by the sign and the operation. In grade school you soon graduated to doing addition and subtraction without...