
Shannon Revisited: Information in Terms of Uncertainty

Charles Cole
Department of Information Studies, University of Sheffield, Sheffield S10 2TN, England

Received September 19, 1991; revised September 14, 1992 and October 29, 1992; accepted December 9, 1992.

© 1993 John Wiley & Sons, Inc.

Shannon’s theory of communication is discussed from the point of view of his concept of uncertainty. It is suggested that there are two information concepts in Shannon, two different uncertainties, and at least two different entropy concepts. Information science focuses on the uncertainty associated with the transmission of the signal rather than the uncertainty associated with the selection of a message from a set of possible messages. The author believes the latter information concept, which is from the sender’s point of view, has more to say to information science about what information is than the former, which is from the receiver’s point of view and is mainly concerned with “noise” reduction.

Introduction

Shannon’s mathematical theory of communication, written in 1948 as a solution to an engineering problem, has been termed “the most commonly proposed information concept for information science . . . almost the only formalized, mathematical, and successfully implemented information concept ever proposed for any purpose” (Belkin, 1978). It has been extensively explored by information science (see Zunde, 1984, for a 400-item bibliography); and it is still of immense use in other fields, ranging from literary theory (Iser, 1978) to quantum mechanics (Horgan, 1992).

However, when defined narrowly in terms of its stated purpose, Shannon’s theory of information appears to be at odds with information science’s traditional view of what information is, or of what a central component of any concept of information for information science should be. Shannon was concerned with defining information statistically rather than conceptually, with measuring information in bits instead of describing what information is in terms of its value or the effect it might have on the person receiving it, and he explicitly excluded from his analysis the more usual information science concepts derived from psychology and sociology in favor of concepts used in physics such as entropy, probability, and the probabilistic notion of uncertainty. Some consider the gap between Shannon’s and information science’s concept of information insurmountable (Machlup, 1983), while others believe that information science has not taken advantage of all that Shannon has said about information (Zunde, 1981). A major tendency in information science, however, is to bridge the gap too easily by invoking Shannon in support of a psychological notion of information that hinges on an influential and well-argued extension of Shannon’s theory by the psychologist Garner (1962).

The enigma at the heart of Shannon is: Whose information does the theory refer to: the information “possessed by the sender” or “the ignorance of the receiver that is removed by receipt of the message?” (Jaynes, 1978). Both perspectives are addressed in Shannon’s theory. There is a stimulating but confusing fluidity between the two perspectives centering on the use of the same two terms, “uncertainty” and “entropy,” to measure (i) the information content of a message and (ii) the anti-information content of “noise.”

The result is a paradox, or what seems to be a paradox. When Shannon is discussing information from the perspective of the sender of the message, the following equation holds true:

information = uncertainty = entropy;

when he is discussing information from the perspective of the receiver of the message, the opposite equation holds true:

information = reduction of uncertainty/entropy.

For the sake of convenience, we will label these two apparently opposed information concepts “Shannon’s information concept I” and “Shannon’s information concept II,” respectively.

Information Science’s “Take” on Shannon

Concerned with the task of accessing information for someone else (a client, a patron, a university student), information science has traditionally approached Shannon’s information theory from the point of view of the receiver, thus highlighting information concept II, to the point where it has been called:

. . . the classical information-theoretic notion of information as a measure of the reduction of uncertainty. (Barnes, 1975)

An example of Shannon being used in this way is a recent study by Meghabghab and Bilal (1991). They propose:

. . . an optimal questioning strategy (OQS) based on Shannon’s probability theory [which] can assist in determining the number of questions that should be asked of a patron in order to reduce uncertainty in the query and identify the patron’s information need. (Meghabghab & Bilal, 1991)

There are four information concepts here, and at least four uncertainties. The acknowledged uncertainty is the uncertainty “in the query,” and the purpose of the questioning strategy is “to reduce uncertainty in the query.” We are told an “uncertain query” is a query that is “poorly formulated, unclear, imprecise, and sometimes incomplete” (Meghabghab & Bilal, 1991), which is a broadly defined version of Shannon’s concept of “noise.” The authors are therefore using the classical theoretic notion which equates information (the questions the librarian asks) with the reduction of uncertainty (the “noise”).

The three unacknowledged uncertainties in the information situation as given are (a) the uncertainty of the patron indicated by the word “need,” which is to some extent a psychological concept; (b) the uncertainty of the librarian vis-à-vis the query and the patron, which also has some psychological overtones; and (c) the uncertainty associated with the information content of the query itself, before it is sent, without “noise” considerations.

These acknowledged and unacknowledged uncertainties play out in various ways in the information science literature when Shannon is being discussed, interpreted, or explained:

(a) Shannon is interpreted as emphasizing the uncertainty the library patron or sender of the message feels, which drives him to seek information, even though psychological concerns are explicitly excluded from the theory by Shannon. [For a discussion of Shannon’s concept of uncertainty being used as a psychological concept driving people to seek information, see Berlyne (1960, 1965); for a discussion of Berlyne in terms of information science, see Belkin & Vickery (1985).]

(b) Shannon is interpreted by some information scientists as emphasizing the psychological uncertainty or feeling of surprise of the librarian-receiver vis-à-vis the incoming message instead of the statistical uncertainty in the query itself (the line between the two is a fine one). [For examples, see Hollnagel (1980); Yovits, Foulk, & Rose (1981); and Belkin & Vickery (1985).]

(c) Shannon is interpreted by some information scientists as emphasizing the uncertainty associated with the transmission of the signal and the reduction of uncertainty (“noise”), but the uncertainty associated with the message is referred to explicitly as well (Belzer, 1973), leaving the reader confused by the apparent contradiction between two information equations:

information = uncertainty; and
information = the reduction of uncertainty.

An example of a type (c) interpretation of Shannon is the study, already quoted, by Meghabghab and Bilal (1991). There, the Shannon uncertainty focused on is the degradation of the signal (or query) due to “noise.” The librarian-receiver must get at the information by reducing the uncertainty due to “noise” by asking questions of the patron about the query, leading, as we have already said, to the classical theoretic notion:

information = the reduction of uncertainty.

However, the uncertainty due to noise is not explicitly separated out from the uncertainty associated with the message-query before it is sent, that is, before it has had a chance to degrade as a result of “noise.” This other uncertainty, however, is there in the query and is the uncertainty that will most affect the number of questions the librarian must ask. For instance, the above quote from Meghabghab and Bilal (1991) can be analyzed in the following way:

The information in the situation (ignoring the added uncertainty due to “noise”) is a function of the “number of questions” the librarian has to ask the patron in order to reduce the uncertainty in the query; this varies according to the amount of uncertainty in the query itself, which is a function of the number of possible alternative messages in the set from which the message has been selected. If there is more uncertainty in the query, more questions need to be asked; therefore, the more uncertainty in the query, the more information in the situation. This gives us:

information = uncertainty,

which seems to be the exact opposite of our original reading of the Meghabghab and Bilal quote, which was:

information = the reduction of uncertainty.
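
The link between the number of questions and the amount of uncertainty can be sketched numerically. The following fragment is a hypothetical illustration only; the numbers, and the simplification to equally likely alternatives, are ours, not Meghabghab and Bilal’s:

```python
import math

# Hypothetical illustration (numbers are ours, not Meghabghab & Bilal's):
# if a patron's query could refer to any one of n equally likely
# information needs, the uncertainty in the query is log2(n) bits, and an
# ideal yes/no questioning strategy needs about that many questions.
for n in (4, 16, 64):
    print(n, "alternatives ->", math.log2(n), "bits / yes-no questions")
```

The more uncertainty there is in the query, the more questions are needed; this is the sense in which the same situation can be read both as information = uncertainty and as information = the reduction of uncertainty.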

Artandi (1973) refers to this apparent contradiction between the two information equations as a “logical opposition”:

. . . between uncertainty and removal of uncertainty, which can make it difficult to differentiate between the concept of information as a measure of uncertainty and as a means of removing uncertainty. (Artandi, 1973)

Meghabghab and Bilal (1991) make explicit reference to both information equations, but they do not confront the opposition clearly. Instead they cite Klir and Folger (1988), who have at least implicitly confronted the logical opposition by adding the word “potential” in front of the first information equation, changing the two equations to:

“potential” information = uncertainty; and
information = the reduction of uncertainty.


What does “potential” information actually mean? Artandi (1973), citing Nauta (1972), describes “potential” information as the difference between (1) what can be communicated (involving the total number of possible alternative messages that could have been chosen at the time the actual message was chosen and sent), which is the a priori probability of the message being selected; and (2) what is communicated, which is the a posteriori probability or the revised probability of the received message taking into account the noise structure of the particular transmission (if there is noise). “Potential information,” therefore, could be linked to Weaver’s (1949, p. 95n) reference to “missing information,” the missing information in the transmission system due to entropy or entropy-like effects including “noise” (Dubois & Prade, 1991; Zunde, 1981). As well, the calculation of this “missing information” can be done in terms of the difference between the a priori entropy of the ensemble of messages that might have been selected and the a posteriori entropy after the message is received, which has been “reduced by noise so that less information is conveyed by the message” (Rothstein, 1990).
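
The entropy bookkeeping described above can be made concrete with a small sketch; the probabilities below are invented for illustration and are not taken from any of the works cited:

```python
import math

def entropy(probs):
    """Shannon entropy in bits: H = -sum(p * log2 p)."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# A priori: four possible messages, all equally likely (invented numbers).
prior = [0.25, 0.25, 0.25, 0.25]
# A posteriori: after a noisy signal arrives, one message is far more
# likely, but noise leaves some residual doubt.
posterior = [0.85, 0.05, 0.05, 0.05]

h_prior = entropy(prior)       # 2.00 bits
h_post = entropy(posterior)    # about 0.85 bits still unresolved
print(h_prior - h_post)        # about 1.15 bits of uncertainty removed
```

The difference between the two entropies is the reduction in uncertainty the received signal provides; the residual a posteriori entropy is what the noise has left unresolved.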

There have been various other ways of explaining the “logical opposition” between the two uncertainty equations. Bookstein and Klein (1990) suggest attributing one uncertainty to the “message set” and one uncertainty to the “information content of the transmission,” in effect dividing Shannon into two information concepts along the lines set out in the Introduction. As this is a clear-cut distinction, we will use the message-transmission distinction to simplify the information science position on Shannon’s two concepts of information as follows:

I The information in the message at the source, where information = uncertainty.

II The information in the transmission from the receiver-librarian’s point of view, where information = the reduction of uncertainty.

As information concept II has received the bulk of the attention in information science, Shannon’s information concept I will now be looked at in more detail.

Shannon Revisited: Information Concept I: Message

Shannon’s starting premise is that a message is important not only for itself but for the context of its selection from a set of possible alternative messages that could have been selected but were not:

(1) The significant aspect is that the actual message is one selected from a set of possible messages. [Shannon & Weaver (1949, p. 3); this page number is from the 8th printing, 1959; pagination varies from printing to printing.]

From this, he goes on to give an operational definition of information:

(2) If the number of messages in the set is finite then this number or any monotonic function of this number can be regarded as a measure of information produced when one message is chosen from the set, all choices being equally likely. (Shannon & Weaver, 1949, p. 3)

He also states that the “number” should be a “logarithmic measure.” If we were to put the operational definition given in the above two sentences into equation form it might look something like this:

information (operational definition) = the logarithm of the number of possible messages in the set when one message is chosen from the set, all messages being equally likely to be chosen.

An example of a set of messages is the English alphabet. In the unlikely case where all letters are equally likely to be chosen, the amount of information contained in the selection or choice of one letter from the set is the logarithm of the number “26.” When we create a message to be sent or communicated, we usually do not think of this process as choosing one letter after another from a set of possible messages. What does Shannon mean by “choosing” and “choice?”
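
To put numbers to Shannon’s own example (using a base-2 logarithm, so that the unit is the bit): if one letter is selected from the 26-letter alphabet and every letter is equally likely to be chosen, the information produced by the selection is

information = log2 26 ≈ 4.7 bits per selected letter.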

Shannon’s Concept of Choice

In the operational definition given above, Shannon’s theory hinges on the selection or choice of one message from a set of possible messages, all messages being equally likely to be chosen, a situation that is theoretically possible but, outside games of chance, unlikely in the real everyday world.

In a mathematical theory, “choice” is a statistical concept indicating the degree of freedom or the degree of constraint in a selection situation if the choice is made from a set where all messages are not equally likely to be chosen. When throwing dice, for instance, “choice” means how often a number comes up. With fair dice, each number has an equal chance of “coming up” or being “chosen” (by the objective world); therefore, freedom and constraint are in balance for each of the possible selections in the set, each with a probability of .167, which in terms of the overall system gives maximum choice because no one choice is favored over the others. With loaded dice certain numbers are favored over others. In the English language, certain letters are favored over others: some letters are chosen only rarely, and certain combinations of letters are favored over other combinations because of what Shannon calls “residue of influence,” that is, the first letter to be chosen influences the choice of the next. In these cases the letters are not randomly selected or freely chosen.
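
A minimal numerical sketch of this statistical sense of “choice,” using the measure Shannon will shortly call H (the loaded-die probabilities are invented for illustration):

```python
import math

def entropy(probs):
    """Shannon's H in bits; here read as the amount of 'choice' in the system."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

fair_die = [1/6] * 6                          # each face ~.167, none favored
loaded_die = [0.5, 0.1, 0.1, 0.1, 0.1, 0.1]   # invented bias favoring one face

print(round(entropy(fair_die), 2))    # 2.58 bits: maximum choice
print(round(entropy(loaded_die), 2))  # 2.16 bits: constraint has reduced choice
```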

Factors Inhibiting Freedom of Choice

Given the statistical structure of the English language, all choices are not equally likely; there is a statistical interdependence between the letters or words in a string of letters or words that affects the statistical independence of each choice. To explain the inhibiting factor at work in choosing a message made up of letters from the English alphabet, Shannon brings in his concept of redundancy:


The redundancy of ordinary English, not considering statistical structure over greater distances than about eight letters, is roughly 50%. This means that when we write English half of what we write is determined by the structure of the language and half is chosen freely. (Shannon & Weaver, 1949, pp. 25-26)

In these two sentences Shannon makes an equivalency that is very helpful in understanding what his conceptual notion of information might be (as opposed to his operational definition of information, which he does give). He also sets up the sentences (50% this, 50% that) so that we, the readers, think we are getting dichotomous terms when we are not. We supply them ourselves, almost as a reflex, but this will prove useful.

In the second of the two sentences just quoted, Shannon says that half of what we write is “chosen freely,” and the other half is determined by the structure of the language. The immediate implication here is that the structure of the language “determines” or “influences” the choice, giving us:

50% = chosen freely, and
50% = influenced,

which we can relate, in the first sentence, to the 50% of ordinary English that Shannon calls “redundant.” Therefore, by joining the two sentences together and adding the dichotomous terms ourselves, it is possible to draw the following conclusion about the statistical structure of the English language (not more than eight letters) as Shannon means us to see it:

50% = redundant = influenced, and
50% = chosen freely.

The actual dichotomous terms in the above equations are “chosen freely” and “influenced,” and as Shannon uses the term “residue of influence” himself, it might be helpful in our analysis to separate this out as a corollary to his principle of redundancy, because he is really talking about “influenced choosing.”

Shannon now introduces the concept of probability to statistically measure the “residue of influence” factor. Probability is a predictive tool, a way of extrapolating from past experience the statistical dependence of one event (for instance, the choosing of a letter from the alphabet) given what has already happened. In selecting letters from the English alphabet, each choice is conditional on letters already chosen in the message string; a conditional probability can be predetermined: for example, given x, the probability of y being chosen is .8.

As a more specific example, if the English alphabet is the set of symbols being used in the message to be sent, certain letters are used more than others, the letter E more than W, for instance, because of the structure of the language (grammar, tradition, pronunciation limitations, etc.). The letter E “is chosen with probability .12 . . . and W with probability .02” (Shannon & Weaver, 1949, p. 13). Therefore, when encoding, time and channel capacity can be saved if, as is done in telegraphy, E is encoded into a single dot, “while the infrequent letters, Q, X, Z, are represented by longer sequences of dots and dashes” (Shannon & Weaver, 1949, p. 10). The single letter frequency is what Shannon calls “first order approximation.” Second order approximation is “after a letter is chosen, the next one is chosen in accordance with the frequencies with which the various letters follow the first one” (Shannon & Weaver, 1949, p. 13). An example of second order approximation is the letter H, which is more likely to follow the letter T in English than the letter P is to follow X (Shannon & Weaver, 1949, p. 10). Third order approximation is the third letter in relation to the first two used, etc. The same sort of probability calculations apply to word units and whole sequences of words.
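
The savings at stake can be sketched with the single-letter probabilities just quoted; reading -log2 p as the number of bits an efficient code spends on a letter of probability p is a standard consequence of Shannon’s measure, although the framing of this small example is ours:

```python
import math

# Single-letter probabilities quoted from Shannon & Weaver (1949, p. 13).
p_E, p_W = 0.12, 0.02

# Under a first order approximation, a letter chosen with probability p
# carries -log2(p) bits, so an efficient code (as in telegraphy) gives the
# frequent E a short codeword and the rare W a longer one.
print(round(-math.log2(p_E), 1))  # about 3.1 bits for E
print(round(-math.log2(p_W), 1))  # about 5.6 bits for W
```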

There is, however, another more interesting part to Shannon’s principle of redundancy which he only implies but which gives a certain insight into his thinking about what information is. The dictionary definition of redundant is:

that [which] can be omitted without loss of significance. (Oxford Dictionary, 1985)

This is the meat of Shannon’s principle of redundancy as it concerns his concept of information. It implies that that part of the choosing of a message which is “influenced” can be omitted without loss of significance; it is not necessary from the point of view of the information content of the message before it is sent. (Redundancy or “extra” context is only necessary to the receiver to reconstruct information lost during transmission (Shannon & Weaver, 1949, p. 43) so that the message can be understood.)

We have separated out “residue of influence” from Shannon’s principle of redundancy, which allowed us to add words from the dictionary. Joining them together again gives us the following expanded form of the above equation:

50% = redundant = influenced = not necessary
50% = chosen freely = not redundant = necessary,

and the following relations among the Shannon concepts:

information (high) = freedom of choice (high) = degree of influence (low) = a priori probability (low),

which gives us what we have not been given by Shannon, the foundations of a conceptual definition of information, something like:

That part of the message which is freely chosen is information; and that which is more or less freely chosen is more or less information depending on the principle of redundancy (including the “residue of influence” corollary), which determines what part of a message is necessary and what is unnecessary using probability ratios. A message that has high probability contains more redundancy, and therefore less information content.
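
Shannon gives this a quantitative form: redundancy is one minus the “relative entropy,” the ratio of the source’s actual entropy H to the maximum entropy it could have if every choice were free and equally likely (Shannon & Weaver, 1949). A minimal sketch with invented probabilities:

```python
import math

def entropy(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

# A toy source with four possible messages and invented probabilities.
probs = [0.7, 0.1, 0.1, 0.1]

h = entropy(probs)               # about 1.36 bits actually produced per choice
h_max = math.log2(len(probs))    # 2 bits if every choice were free and equal
redundancy = 1 - h / h_max       # about 0.32

print(round(h, 2), h_max, round(redundancy, 2))
# The more the probabilities favor some messages over others, the higher the
# redundancy and the less information each selection carries.
```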

To summarize this point: in terms of how best to transmit a message from the point of view of savings in time and channel capacity, which is the purpose of Shannon’s theory, the “residue of influence” corollary of Shannon’s principle of redundancy means that letters or words are more or less likely to be freely chosen and more or less influenced. This leads into the main part of the principle of redundancy: that letters or words which are more or less freely chosen and more or less influenced can also be thought of as more or less redundant or unnecessary. The unnecessary part can be deleted from the message (conceptually, via efficient coding), measured out using concepts from probability and physics that Shannon links: uncertainty and entropy.

Shannon’s Concept of Uncertainty

In answer to the question:

Can we define a quantity which will measure, in some sense, how much information is “produced” by such a process, or better, at what rate information is produced? (Shannon & Weaver, 1949, p. 18)

Shannon introduces a new concept, the concept of “uncertainty,” which he immediately links with “choice” and then leaves, rephrasing the question:

Can we find a measure of how much “choice” is involved in the selection of the event or of how uncertain we are of the outcome? (Shannon & Weaver, 1949, p. 18)

Shannon finds the “measure” of “choice” and “uncertainty” in the concept of entropy: “entropy as a measure of uncertainty” (Shannon & Weaver, 1949, p. 36), borrowing it from “Boltzmann’s famous H theorem” (Shannon & Weaver, 1949, p. 20). The symbol for information entropy is the letter H. “The quantity H,” Shannon writes, “has a number of interesting properties which further substantiate it as a reasonable measure of choice or information” (Shannon & Weaver, 1949, p. 20).
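
For reference, for a set of n possible choices with probabilities p1, p2, . . . , pn, the quantity is (written here with base-2 logarithms so that H is in bits)

H = -(p1 log2 p1 + p2 log2 p2 + . . . + pn log2 pn),

so that H depends only on the probabilities of the choices, not on what the choices are about.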

Shannon’s Concept of Entropy

The dictionary meaning of the word “entropy” is:

the measure of the degradation or disorganization of the universe (Oxford Dictionary, 1985)

Weaver, Shannon’s collaborator, gives a somewhat fuller picture of what is meant by entropy:

In the physical sciences, the entropy associated with a situation is a measure of the degree of randomness, or of “shuffledness” if you will, in the situation; and the tendency of physical systems to become less and less organized, to become more and more perfectly shuffled . . . (Weaver, 1949, p. 103)

Shannon gives examples of both maximum possible entropy and zero entropy. The maximum possible entropy, says Shannon, is white noise (Shannon & Weaver, 1949, p. 59), where “For a given n, H is a maximum and equal to log n where all the p are equal, i.e., 1/n. This is also intuitively the most uncertain situation” (Shannon & Weaver, 1949, p. 21).

In the opposite situation, the most certain: “If a source can produce only one particular message its entropy is zero” (Shannon & Weaver, 1949, p. 31). Weaver expands on this (and brings probability into the equation):

In the limiting case where one probability is unity (certainty) and all the others zero (impossibility), then H is zero (no uncertainty at all - no freedom of choice - no information). (Weaver, 1949, p. 105)
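
The two limiting cases can be checked numerically; a minimal sketch (the set size n = 8 is arbitrary):

```python
import math

def entropy(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

n = 8
white_noise_like = [1 / n] * n               # every alternative equally likely
single_message = [1.0] + [0.0] * (n - 1)     # only one message can be produced

print(entropy(white_noise_like), math.log2(n))  # 3.0 3.0 -> H = log n, the maximum
print(entropy(single_message))                  # prints -0.0, i.e. zero: no choice,
                                                # no uncertainty, no information
```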

From (a) white noise and (b) a single message producing machine, white noise’s opposite, it is possible to sketch in the following relationships between the various concepts:

(a) white noise = entropy (maximum) = uncertainty (maximum) = probability (even distribution) = maximum information (???)

(b) single message producing machine = entropy (zero) = uncertainty (zero) = probability (1) = “no information”

However, it is very difficult to make sense of this. First, the dictionary definition of entropy and the explanation of entropy provided by Weaver do not seem to be in accord with Shannon’s previous definition of entropy as a measure of information. Now it seems to be a measure of the disorganization of a system.

Second, we are told by Weaver that white noise’s opposite, a single message producing machine, is a “no information” situation. From this, the logical thing to do would be to call white noise a “maximum information” situation, but it is difficult to relate white noise to maximum information. It seems paradoxical.

It is difficult to relate white noise to maximum uncertainty, as Shannon asks us to do when he calls it “intuitively the most uncertain situation” (Shannon & Weaver, 1949, p. 21), because previously we have been told that uncertainty is a measure of information.

How can these “apparent” discrepancies be explained? As we have seen earlier in the analysis, Shannon’s use of the term “uncertainty” from probability theory measures that part of a message that has not been “influenced” when it is selected from a set of possible messages. Uncertainty is therefore a positive measure of information in a message; the degree of uncertainty is a measure of the statistical independence or the degree of freedom of choice present in the selection of the message: the greater the freedom of choice, the greater the uncertainty. The selection with the greatest possible uncertainty is the one that is the least probable.

This is not the case in the white noise situation given above. There the entropy is at its maximum, which means the degree of probability is the same for all the possible messages in the system (with white noise, Shannon has switched the discussion to a global property of the system rather than a message-by-message choice between alternative messages); the probability is not 0 (or as close to 0 as probability can be), which would be the selection with the most uncertainty. Therefore, in the white noise situation, as well as in the descriptions of the relationships of the various concepts given by Shannon and Weaver above, uncertainty and probability have parted company, and uncertainty and information have parted company.
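
The distinction being drawn here, between a global, averaged property of the whole system and the improbability of one particular selection, can be made concrete in a short sketch (the term “surprisal” and the numbers are ours, not Shannon’s):

```python
import math

def entropy(probs):
    """H: the averaged uncertainty of the whole system of choices."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def surprisal(p):
    """The information in one particular selection of probability p."""
    return -math.log2(p)

uniform = [0.25] * 4                # "white noise"-like: H is at its maximum
skewed = [0.97, 0.01, 0.01, 0.01]   # invented source favoring one message

print(entropy(uniform), surprisal(0.25))                     # 2.0 bits and 2.0 bits
print(round(entropy(skewed), 2), round(surprisal(0.01), 2))  # 0.24 bits vs 6.64 bits
```

With the uniform, “white noise”-like distribution the system-wide H is at its maximum, yet no individual selection is especially improbable; with the skewed source H is low, yet the rare selection carries far more information than any selection from the uniform source. Uncertainty as a global property of the system and uncertainty as the improbability of a particular choice are, as argued above, not the same thing.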


Weaver seems to agree with us on this particular point (the exclamation mark at the end of the following quote is his):

But if [with noise] the uncertainty is increased, the information is increased, and this sounds as though the noise were beneficial! (Weaver, 1949, p. 109)

He then goes on to point out that the paradox is actually caused by a “semantic trap” inherent in the word “uncertainty”:

This is a situation which beautifully illustrates the se- mantic trap into which one can fall if he does not remember that “information” is used here with a special meaning that measures freedom of choice and hence uncertainty as to what choice has been made. (Weaver, 1949, p. 109)

Thus, Weaver links information to the uncertainty associated with the degree of freedom of choice in selecting a message, as we have done, but he goes further. He divides Shannon’s uncertainty concept into two: that which is associated with information; and that which is associated with noise (and possibly errors due to the encoding and decoding of the message). Weaver calls these two kinds of uncertainties “desirable uncertainty” and “undesirable uncertainty” (Weaver, 1949, p. 109).

The division of uncertainty into two (desirable uncertainty associated with information and undesirable uncertainty associated with noise) allows us to define information concept I in conceptual terms (which, to repeat, Shannon does not do and does not wish to do):

Information = the uncertainty associated with the degree of freedom of the act or process whereby one message is chosen or selected from a set of possible messages; the greater the probability of one message being selected over the others the less the uncertainty, and hence the less the information content of the message to be sent.
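
Expressed in the equation style used above, the per-message form of this definition (the standard way of writing the information in a single selection under Shannon’s measure, though the explicit formula is ours rather than quoted from Shannon) is:

information (in the chosen message) = -log2 (probability of that message being chosen),

so the more probable the choice, the smaller this quantity, in agreement with the conceptual definition just given.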

Conclusion

Shannon’s concept of uncertainty has a divided or double-edged meaning in his information theory. According to Weaver, there is “desirable” uncertainty and “undesirable” uncertainty. Each is associated with a different information concept in Shannon. “Desirable” uncertainty is associated with what we have called Shannon’s information concept I, the information associated with the selection of the message at the source. “Undesirable” uncertainty is associated with what we have called Shannon’s information concept II: information from the point of view of the receiver of the message and the reduction of uncertainty from the signal to recover information lost due to noise.

We can appreciate the attractiveness of Shannon’s information concept II. Not only does it offer a common sense explanation of what we intuitively think information is, but there are other things as well that favor it, which will be briefly dealt with here, namely the double-edged meanings of the words “uncertainty” and “entropy.”

As we have seen, Shannon’s concept of uncertainty divides into desirable and undesirable uncertainty, but the word “uncertainty,” too, has psychological connotations, not only in everyday language but in probability theory.

There are two schools of thought in present-day probability theory: the subjectivists (or Bayesians) and the objectivists (or frequentists). The objectivists base their theory on “the relative frequency with which an event occurs within a sequence of occurrences” (Popper, 1959); while, for the subjectivists, the purpose of enquiry is to make “our belief or expectation of the event more steady and secure” (Hume, 1975, p. 56, Sec. 46).

Shannon’s stated purpose in writing his theory is, following Hartley (whose study Shannon cites, along with Nyquist’s, as being the basis for his communication theory), explicitly objectivist:

. . . to eliminate the psychological factors involved and to establish a measure of information in terms of purely physical quantities. (Hartley, 1928)

However, in his use of the word “uncertainty,” Shannon, on a number of occasions, leaves the impression that he is referring to the feeling of certainty rather than the certainty of the event occurring (e.g., see Shannon & Weaver, 1949, pp. 18, 36). Whether or not Shannon is at heart a subjectivist, as some people think he is (Jaynes, 1978), he is generally considered to be an objectivist. However, there is some indication that he is guilty of what Carnap (1950) calls “primitive psychologism” (see also Freudenthal, 1968; Shafer, 1976). The point we wish to make here is that this subjectivist tendency in his use of the word “uncertainty” favors information concept II and the reduction of uncertainty over information concept I, because it accords with our instinctive understanding of what information does (i.e., relieve anxiety).

Shannon’s concept of entropy divides similarly: it has one meaning as an information measure, when he talks about the information associated with the message before it is sent; and it acquires another meaning when “noise” is introduced into the theory. As a measure of the information content of the message before it is sent, Shannon is said to have used the term entropy because it had a mathematical formula similar to his information measure (Brillouin, 1962; Rothstein, 1990). There is still controversy about whether or not Shannon’s information entropy can be linked to physical entropy in any real way, or whether it is simply “analogic.” There are those who say it can be, notably Brillouin (1962), and there are those who say it cannot (Bartlett, 1975; Boulding, 1956; Denbigh, 1990; Jauch & Baron, 1990; MacKay, 1983; Weinberg, 1990; Zunde, 1981). This debate centers on a study by Szilard (1929/1964, originally published in German in 1929) and the term “missing information,” both of which are cited by Weaver in a footnote and presumably influenced Shannon’s thinking. Following Brillouin’s lead, the negative form of entropy, negentropy, is used today in information science.


Szilard’s 1929 study offered a possible solution to a famous thought experiment involving Maxwell’s Demon. The Demon, using information obtained by a measuring device, was able to choose which particles entered a chamber, thus bringing about a contravention of the second law of thermodynamics (the tendency of a closed physical system toward complete disorganization, maximum randomness, and even distribution, which is its most probable state). The information obtained by the Demon’s act of measuring “interfered” with the natural propensity of the system toward maximum entropy.

Such measurements are not harmless interventions. A system in which such measurements occur shows a sort of memory faculty. (Szilard, 1929/1964).

Information (obtained by the Demon’s act of measuring), therefore, reduces the entropy in a closed physical system, leading us once again to what we have called Shannon’s information concept II equation:

information = the reduction of entropy.

However, in an article written three years later, Shannon reaffirmed his commitment to entropy as a measure of information:

The entropy is a statistical parameter which measures, in a certain sense, how much information is produced, on the average, for each letter of text in the language. (Shannon, 1951, p. 50)

Brillouin says it is a mistake:

. . . he [Shannon] defined entropy with a sign just the opposite to that of the standard thermodynamical definition. Hence, what Shannon called entropy of information actually represents negentropy. (Brillouin, 1962)

Did Shannon deliberately change a sign (for convenience, or because it was not important to his purposes?), or has Brillouin, by linking Shannon’s information entropy so literally to “the standard thermodynamical definition,” reinforced, like the psychological overtones of the word “uncertainty,” what we have labeled Shannon’s information concept II (the intuitively plausible notion that information reduces disorder, entropy, uncertainty, noise in the signal) to the detriment of Shannon’s information concept I, the information associated with the selection of the message at the source? Noise considerations are perhaps Shannon’s overall objective (see the opening sentence in Shannon’s Introduction), but they are not, we argue, the center of what Shannon has to say to information science about what information is.

Finally, Shannon’s information concept I is interesting for its own sake, for the way it uses and links various concepts in an unusual, thought-provoking way.

If we go back to our “extension” of Shannon’s operational definition to a conceptual definition of information:

Information = the uncertainty associated with the degree of freedom of the act or process whereby one message is chosen or selected from a set of possible messages; the greater the probability of one message being selected over the others the less the uncertainty, the less the information content of the message to be sent.

There are two things of interest to be drawn from the relationships Shannon makes linking these various concepts to information.

The first thing of interest is Shannon’s notion that the least probable choice from a set of possible alternatives contains more information than the most probable choice. This is similar to Popper’s contention that a scientific theory or hypothesis that is improbable has high information content (Popper, 1959).

The second thing of interest is Shannon’s notion that information has something to do with selecting or choosing a message from a set of possible alternatives. We have extended this notion in the conceptual definition above by inserting the words “act or process” to bring it in line with information science’s current discussion of what information is (Buckland, 1991).

If we could conceive of the set of messages as mental representations of concepts or knowledge structures inside a person’s memory, what could the “set” of possible messages be when that person becomes informed? And of what would the informing process consist when selecting one of these messages? If a person reads something that sets off the informing process, this process might automatically bring forth a set of possible message-type knowledge structures from memory, each of which provides a different explanation for what has just been read. The person then selects one. The more improbable the selection, the more information associated with the particular informing process. An improbable selection might lead to things like insight or a turning point in the person’s research.

Acknowledgments

The author gratefully acknowledges the constructive comments of Professor T. D. Wilson and Dr. David Ellis of the University of Sheffield, and two unknown referees.

References

Artandi, S. (1973). Information concepts and their utility. Journal of the American Society for Information Science, 24, 242-245.

Barnes, R. F., Jr. (1975). Information and decision. In A. Debons and W. J. Cameron (Eds.), Perspectives in information science (pp. 105-117). Leyden, The Netherlands: Noordhoff.

Bartlett, M. S. (1975). Probability, statistics and time. London: Chapman & Hall.

Belkin, N. J. (1978). Information concepts for information science. Journal of Documentation, 34, 55-85.

Belkin, N. J., & Vickery, A. (1985). Interaction in information systems: A review of research from document to knowledge-based systems. Library and information research report 35. London: British Library.

Belzer, J. (1973). Information theory as a measure of information content. Journal of the American Society for Information Science, 24, 300-304.


Berlyne, D. E. (1960). Conflict, arousal, and curiosity. New York: McGraw-Hill.

Berlyne, D. E. (1965). Structure and direction in thinking. New York: Wiley.

Bookstein, A., & Klein, S. T. (1990). Compression, information theory, and grammars: A unified approach. ACM Transactions on Information Systems, 8, 27-49.

Boulding, K. E. (1956). The image. Ann Arbor, MI: University of Michigan Press.

Brillouin, L. (1962). Science and information theory. New York: Academic Press.

Buckland, M. K. (1991). Information as thing. Journal of the American Society for Information Science, 42, 351-360.

Carnap, R. (1950). Logical foundations of probability. Chicago: The University of Chicago Press.

Denbigh, K. (1990). How subjective is entropy? In H. S. Leff and A. F. Rex (Eds.), Maxwell’s Demon: Entropy, information, computing (pp. 109-115). Princeton, NJ: Princeton University Press.

Dubois, D., & Prade, H. (1991). Measuring and updating information. Information Sciences, 57/58, 181-195.

Freudenthal, H. (1968). Realistic models in probability. In I. Lakatos (Ed.), The problem of inductive logic. Amsterdam: North-Holland.

Garner, W. R. (1962). Uncertainty and structure as psychological concepts. New York: Wiley.

Hartley, R. V. L. (1928). Transmission of information. Bell System Technical Journal, 7, 535-563.

Hollnagel, E. (1980). Is information science an anomalous state of knowledge? Journal of Information Science, 2, 183-187.

Horgan, J. (1992). Quantum philosophy. Scientific American, 267, 94-104.

Hume, D. (1975). Enquiries concerning human understanding and concerning the principles of morals. Oxford: Clarendon Press.

Iser, W. (1978). The act of reading: A theory of aesthetic response. Baltimore, MD: The Johns Hopkins University Press.

Jauch, J. M., & Baron, J. G. (1990). Entropy, information and Szilard’s Paradox. In H. S. Leff and A. F. Rex (Eds.), Maxwell’s Demon: Entropy, information, computing (pp. 160-172). Princeton, NJ: Princeton University Press.

Jaynes, E. (1978). Where do we stand on maximum entropy? In R. D. Levine and M. Tribus (Eds.), The maximum entropy formalism: A conference held at the Massachusetts Institute of Technology on May 2-4, 1978. Cambridge, MA: The MIT Press.

Klir, G. J., & Folger, T. (1988). Fuzzy sets, uncertainty, and informa- tion. Englewood Cliffs, NJ: Prentice-Hall.

Leff, H.S., & Rex, A. F. (Eds.) (1990). Maxwell’s Demon: Entropy, information, computing. Princeton, NJ: Princeton University Press.

Machlup, F. (1983). Semantic quirks in studies of information. In F. Machlup and U. Mansfield (Eds.), The study of information: Interdisciplinary messages (pp. 641-671). New York: Wiley.

MacKay, D. M. (1983). The wider scope of information theory. In F. Machlup and U. Mansfield (Eds.), The study of information: Interdisciplinary messages (pp. 485-492). New York: Wiley.

Meghabghab, G., & Bilal, D. (1991). Application of information theory to query negotiation: Toward an optimal questioning strategy. Journal of the American Society for Information Science, 42, 457-462.

Nauta, D. (1972). The meaning of information. The Hague: Mouton.

Oxford dictionary of current English (1985). Oxford: Oxford University Press.

Popper, K. R. (1959). The logic of scientific discovery. Toronto: University of Toronto Press.

Rothstein, J. (1990). Information, measurement, and quantum mechanics. In H. S. Leff and A. F. Rex (Eds.), Maxwell’s Demon: Entropy, information, computing (pp. 104-108). Princeton, NJ: Princeton University Press.

Shafer, G. (1976). A mathematical theory of evidence. Princeton, NJ: Princeton University Press.

Shannon, C. E. (1951). Prediction and entropy of printed English. Bell System Technical Journal, 30, 50-62.

Shannon, C.E., & Weaver, W. (1949). The mathematical theory of communication. Urbana, IL: The University of Illinois Press.

Szilard, L. (1929/1964). On the decrease of entropy in a thermodynamic system by the intervention of intelligent beings. Behavioral Science, 9, 301-310.

Weaver, W. (1949). Recent contributions to the mathematical theory of communication. In C. E. Shannon & W. Weaver, The mathematical theory of communication (pp. 94-117). Urbana, IL: The University of Illinois Press.

Weinberg, A.M. (1990). On the relation between information and energy systems: A family of Maxwell’s Demons. In H.S. Leff and A.F. Rex (Eds.), Maxwell’s Demon: Entropy, information, computing (pp. 116-121). Princeton, NJ: Princeton University Press.

Yovits, M. C., Foulk, C. R., & Rose, L. (1981). Information flow and analysis: Theory, simulation and experiments. I. Basic theoretical and conceptual developments. Journal of the American Society for Information Science, 32, 187-202.

Zunde, P. (1981). Information theory and information science. Information Processing and Management, 17, 341-347.

Zunde, P. (1984). Selected bibliography on information theory applications to information science and related subject areas. Information Processing and Management, 20, 417-497.
