

A Mathematical Theory of Communication*

C. E. Shannon

INTRODUCTION

THE recent development of various methods of modulation such as PCM and PPM which exchange bandwidth for signal-to-noise ratio has intensified the interest in a general theory of communication. A basis for such a theory is contained in the important papers of Nyquist¹ and Hartley² on this subject. In the present paper we will extend the theory to include a number of new factors, in particular the effect of noise in the channel, and the savings possible due to the statistical structure of the original message and due to the nature of the final destination of the information.

The fundamental problem of communication is that of reproducing at one point either exactly or approximately a message selected at another point. Frequently the messages have meaning; that is they refer to or are correlated according to some system with certain physical or conceptual entities. These semantic aspects of communication are irrelevant to the engineering problem. The significant aspect is that the actual message is one selected from a set of possible messages. The system must be designed to operate for each possible selection, not just the one which will actually be chosen since this is unknown at the time of design.

If the number of messages in the set is finite then this number or any monotonic function of this number can be regarded as a measure of the information produced when one message is chosen from the set, all choices being equally likely. As was pointed out by Hartley the most natural choice is the logarithmic function. Although this definition must be generalized considerably when we consider the influence of the statistics of the message and when we have a continuous range of messages, we will in all cases use an essentially logarithmic measure.

The logarithmic measure is more convenient for various reasons:

1. It is practically more useful. Parameters of engineering importance such as time, bandwidth, number of relays, etc., tend to vary linearly with the logarithm of the number of possibilities. For example, adding one relay to a group doubles the number of possible states of the relays. It adds 1 to the base 2 logarithm of this number. Doubling the time roughly squares the number of possible messages, or doubles the logarithm, etc.

2. It is nearer to our intuitive feeling as to the proper measure. This is closely related to (1) since we intuitively measure entities by linear comparison with common standards. One feels, for example, that two punched cards should have twice the capacity of one for information storage, and two identical channels twice the capacity of one for transmitting information.

3. It is mathematically more suitable. Many of the limiting operations are simple in terms of the logarithm but would require clumsy restatement in terms of the number of possibilities.

The choice of a logarithmic base corresponds to the choice of a unit for measuring information. If the base 2 is used the resulting units may be called binary digits, or more briefly bits, a word suggested by J. W. Tukey. A device with two stable positions, such as a relay or a flip-flop circuit, can store one bit of information. N such devices can store N bits, since the total number of possible states is 2^N and log2 2^N = N. If the base 10 is used the units may be called decimal digits. Since

log2 M = log10 M / log10 2 = 3.32 log10 M,

*Reprinted from the Bell System Technical Journal with corrections. Copyright 1948, Lucent Technologies Inc. All rights reserved.

¹Nyquist, H., "Certain Factors Affecting Telegraph Speed," Bell System Technical Journal, April 1924, p. 324; "Certain Topics in Telegraph Transmission Theory," A.I.E.E. Trans., v. 47, April 1928, p. 617.

²Hartley, R. V. L., "Transmission of Information," Bell System Technical Journal, July 1928, p. 535.

Mobile Computing and Communications Review, Volume 5, Number 1


[Fig. 1 shows boxes for INFORMATION SOURCE, TRANSMITTER, RECEIVER and DESTINATION, with MESSAGE entering the transmitter and leaving the receiver, SIGNAL and RECEIVED SIGNAL on the channel, and a NOISE SOURCE feeding the channel.]

Fig. 1--Schematic diagram of a general communication system.

a decimal digit is about 3 1/3 bits. A digit wheel on a desk computing machine has ten stable positions and therefore has a storage capacity of one decimal digit. In analytical work where integration and differentiation are involved the base e is sometimes useful. The resulting units of information will be called natural units. Change from the base a to base b merely requires multiplication by log_b a.
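As a quick numerical check of these unit conversions, the following is a minimal sketch in Python (the function name is ours, not the paper's):

```python
import math

def convert_information(amount, base_from, base_to):
    """Convert an information quantity between logarithmic units.
    Changing from base a to base b multiplies by log_b(a)."""
    return amount * math.log(base_from, base_to)

# One decimal digit expressed in bits: log2(10), about 3.32 bits.
bits_per_decimal_digit = convert_information(1, 10, 2)
print(round(bits_per_decimal_digit, 2))

# One bit expressed in natural units: ln 2, about 0.693 nats.
print(round(convert_information(1, 2, math.e), 3))
```

The same multiplication by log_b a converts any of the paper's later capacity figures between bits, decimal digits, and natural units.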

By a communication system we will mean a system of the type indicated schematically in Fig. 1. It consists of essentially five parts:

1. An information source which produces a message or sequence of messages to be communicated to the receiving terminal. The message may be of various types: (a) A sequence of letters as in a telegraph or teletype system; (b) A single function of time f(t) as in radio or telephony; (c) A function of time and other variables as in black and white television--here the message may be thought of as a function f(x, y, t) of two space coordinates and time, the light intensity at point (x, y) and time t on a pickup tube plate; (d) Two or more functions of time, say f(t), g(t), h(t)--this is the case in "three-dimensional" sound transmission or if the system is intended to service several individual channels in multiplex; (e) Several functions of several variables--in color television the message consists of three functions f(x, y, t), g(x, y, t), h(x, y, t) defined in a three-dimensional continuum--we may also think of these three functions as components of a vector field defined in the region--similarly, several black and white television sources would produce "messages" consisting of a number of functions of three variables; (f) Various combinations also occur, for example in television with an associated audio channel.

2. A transmitter which operates on the message in some way to produce a signal suitable for transmission over the channel. In telephony this operation consists merely of changing sound pressure into a proportional electrical current. In telegraphy we have an encoding operation which produces a sequence of dots, dashes and spaces on the channel corresponding to the message. In a multiplex PCM system the different speech functions must be sampled, compressed, quantized and encoded, and finally interleaved properly to construct the signal. Vocoder systems, television and frequency modulation are other examples of complex operations applied to the message to obtain the signal.

3. The channel is merely the medium used to transmit the signal from transmitter to receiver. It may be a pair of wires, a coaxial cable, a band of radio frequencies, a beam of light, etc.

4. The receiver ordinarily performs the inverse operation of that done by the transmitter, reconstructing the message from the signal.

5. The destination is the person (or thing) for whom the message is intended.

We wish to consider certain general problems involving communication systems. To do this it is first necessary to represent the various elements involved as mathematical entities, suitably idealized from their physical counterparts. We may roughly classify communication systems into three main categories: discrete, continuous and mixed. By a discrete system we will mean one in which both the message and the signal are a sequence of discrete symbols. A typical case is telegraphy where the message is a sequence of letters and the signal a sequence of dots, dashes and spaces. A continuous system is one in which the message and signal are both treated



as continuous functions, e.g., radio or television. A mixed system is one in which both discrete and continuous variables appear, e.g., PCM transmission of speech.

We first consider the discrete case. This case has applications not only in communication theory, but also in the theory of computing machines, the design of telephone exchanges and other fields. In addition the discrete case forms a foundation for the continuous and mixed cases which will be treated in the second half of the paper.

PART I: DISCRETE NOISELESS SYSTEMS

I. THE DISCRETE NOISELESS CHANNEL

Teletype and telegraphy are two simple examples of a discrete channel for transmitting information. Generally, a discrete channel will mean a system whereby a sequence of choices from a finite set of elementary symbols S1, ..., Sn can be transmitted from one point to another. Each of the symbols Si is assumed to have a certain duration in time ti seconds (not necessarily the same for different Si, for example the dots and dashes in telegraphy). It is not required that all possible sequences of the Si be capable of transmission on the system; certain sequences only may be allowed. These will be possible signals for the channel. Thus in telegraphy suppose the symbols are: (1) A dot, consisting of line closure for a unit of time and then line open for a unit of time; (2) A dash, consisting of three time units of closure and one unit open; (3) A letter space consisting of, say, three units of line open; (4) A word space of six units of line open. We might place the restriction on allowable sequences that no spaces follow each other (for if two letter spaces are adjacent, it is identical with a word space). The question we now consider is how one can measure the capacity of such a channel to transmit information.

In the teletype case where all symbols are of the same duration, and any sequence of the 32 symbols is allowed the answer is easy. Each symbol represents five bits of information. If the system transmits n symbols per second it is natural to say that the channel has a capacity of 5n bits per second. This does not mean that the teletype channel will always be transmitting information at this rate--this is the maximum possible rate and whether or not the actual rate reaches this maximum depends on the source of information which feeds the channel, as will appear later.

In the more general case with different lengths of symbols and constraints on the allowed sequences, we make the following definition:

Definition: The capacity C of a discrete channel is given by

C = lim_{T → ∞} log N(T) / T

where N(T) is the number of allowed signals of duration T.

It is easily seen that in the teletype case this reduces to the previous result. It can be shown that the limit in question will exist as a finite number in most cases of interest. Suppose all sequences of the symbols S1, ..., Sn are allowed and these symbols have durations t1, ..., tn. What is the channel capacity? If N(t) represents the number of sequences of duration t we have

N(t) = N(t - t1) + N(t - t2) + ... + N(t - tn).

The total number is equal to the sum of the numbers of sequences ending in S1, S2, ..., Sn and these are N(t - t1), N(t - t2), ..., N(t - tn), respectively. According to a well-known result in finite differences, N(t) is then asymptotic for large t to X0^t where X0 is the largest real solution of the characteristic equation:

X^-t1 + X^-t2 + ... + X^-tn = 1

and therefore

C = log X0.

In case there are restrictions on allowed sequences we may still often obtain a difference equation of this type

and find C from the characteristic equation. In the telegraphy case mentioned above

N(t) = N(t - 2) + N(t - 4) + N(t - 5) + N(t - 7) + N(t - 8) + N(t - 10)



as we see by counting sequences of symbols according to the last or next to the last symbol occurring. Hence C is -log μ0 where μ0 is the positive root of 1 = μ^2 + μ^4 + μ^5 + μ^7 + μ^8 + μ^10. Solving this we find C = 0.539.
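This capacity can be checked numerically: count the allowed telegraph sequences N(t) from the difference equation, and separately bisect for μ0 in the characteristic equation. The following is a sketch in Python (base 2 throughout; the function names and the N(0) = 1 convention are ours):

```python
import math

# Effective durations from the telegraph difference equation:
# N(t) = N(t-2) + N(t-4) + N(t-5) + N(t-7) + N(t-8) + N(t-10)
DURATIONS = [2, 4, 5, 7, 8, 10]

def count_sequences(t):
    """N(t): number of allowed signals of duration exactly t,
    computed from the difference equation with N(0) = 1."""
    counts = [0] * (t + 1)
    counts[0] = 1
    for u in range(1, t + 1):
        counts[u] = sum(counts[u - d] for d in DURATIONS if u >= d)
    return counts[t]

def capacity_from_root(durations, lo=1e-9, hi=1.0, iters=200):
    """C = -log2(mu0), where mu0 is the positive root of
    sum(mu**d) = 1, found by bisection (the sum grows with mu)."""
    f = lambda mu: sum(mu ** d for d in durations) - 1
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(mid) > 0:
            hi = mid
        else:
            lo = mid
    return -math.log2((lo + hi) / 2)

C = capacity_from_root(DURATIONS)
print(round(C, 3))                                 # about 0.539
print(round(math.log2(count_sequences(600)) / 600, 3))  # approaches C
```

The second printed value illustrates the definition C = lim log N(T)/T directly: log2 N(t)/t closes in on -log2 μ0 as t grows.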

A very general type of restriction which may be placed on allowed sequences is the following: We imagine a number of possible states a1, a2, ..., am. For each state only certain symbols from the set S1, ..., Sn can be transmitted (different subsets for the different states). When one of these has been transmitted the state changes to a new state depending both on the old state and the particular symbol transmitted. The telegraph case is a simple example of this. There are two states depending on whether or not a space was the last symbol transmitted. If so, then only a dot or a dash can be sent next and the state always changes. If not, any symbol can be transmitted and the state changes if a space is sent, otherwise it remains the same. The conditions can be indicated in a linear graph as shown in Fig. 2. The junction points correspond to the states and the lines indicate the symbols possible

[Fig. 2 shows two states joined by lines labeled DOT, DASH, LETTER SPACE and WORD SPACE.]

Fig. 2--Graphical representation of the constraints on telegraph symbols.

in a state and the resulting state. In Appendix 1 it is shown that if the conditions on allowed sequences can be described in this form C will exist and can be calculated in accordance with the following result:

Theorem 1: Let b_ij^(s) be the duration of the s-th symbol which is allowable in state i and leads to state j. Then the channel capacity C is equal to log W where W is the largest real root of the determinant equation:

| Σ_s W^(-b_ij^(s)) - δ_ij | = 0

where δ_ij = 1 if i = j and is zero otherwise.

For example, in the telegraph case (Fig. 2) the determinant is:

| -1                (W^-2 + W^-4)     |
| (W^-3 + W^-6)     (W^-2 + W^-4 - 1) | = 0.

On expansion this leads to the equation given above for this case.
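Theorem 1 can be exercised concretely by evaluating the two-state telegraph determinant as a function of W and bisecting for its largest real root; log2 of that root reproduces the capacity found earlier from the characteristic equation. A sketch in Python (our function names, not the paper's):

```python
import math

def telegraph_det(W):
    """Determinant | sum_s W^(-b_ij^(s)) - delta_ij | for the
    two-state telegraph graph of Fig. 2."""
    a11 = -1.0                    # no symbol keeps the space state
    a12 = W**-2 + W**-4           # dot, dash: space state -> letter state
    a21 = W**-3 + W**-6           # letter/word space: letter -> space state
    a22 = W**-2 + W**-4 - 1.0     # dot, dash: letter state -> itself
    return a11 * a22 - a12 * a21

def largest_root(f, lo=1.0001, hi=4.0, iters=200):
    """Bisect for the root of f in (lo, hi); this determinant is
    negative near W = 1 and positive for large W, with one crossing."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

W0 = largest_root(telegraph_det)
print(round(math.log2(W0), 3))   # C = log2(W0), about 0.539
```

Expanding the determinant recovers 1 = W^-2 + W^-4 + W^-5 + W^-7 + W^-8 + W^-10, so W0 is the reciprocal of the μ0 found before and both routes give the same C.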

II. THE DISCRETE SOURCE OF INFORMATION

We have seen that under very general conditions the logarithm of the number of possible signals in a discrete channel increases linearly with time. The capacity to transmit information can be specified by giving this rate of increase, the number of bits per second required to specify the particular signal used.

We now consider the information source. How is an information source to be described mathematically, and how much information in bits per second is produced in a given source? The main point at issue is the effect of statistical knowledge about the source in reducing the required capacity of the channel, by the use of proper encoding of the information. In telegraphy, for example, the messages to be transmitted consist of sequences of letters. These sequences, however, are not completely random. In general, they form sentences and have the statistical structure of, say, English. The letter E occurs more frequently than Q, the sequence TH more frequently than XP, etc. The existence of this structure allows one to make a saving in time (or channel capacity) by properly encoding the message sequences into signal sequences. This is already done to a limited extent in telegraphy by using the shortest channel symbol, a dot, for the most common English letter E; while the infrequent letters, Q, X, Z are represented by longer sequences of dots and dashes. This idea is carried still further in certain commercial codes where common words and phrases are represented by four- or five-letter code groups with a considerable saving in average time. The standardized greeting and anniversary telegrams now in use extend this to the point of encoding a sentence or two into a relatively short sequence of numbers.


We can think of a discrete source as generating the message, symbol by symbol. It will choose successive symbols according to certain probabilities depending, in general, on preceding choices as well as the particular symbols in question. A physical system, or a mathematical model of a system which produces such a sequence of symbols governed by a set of probabilities, is known as a stochastic process.³ We may consider a discrete source, therefore, to be represented by a stochastic process. Conversely, any stochastic process which produces a discrete sequence of symbols chosen from a finite set may be considered a discrete source. This will include such cases as:

1. Natural written languages such as English, German, Chinese.

2. Continuous information sources that have been rendered discrete by some quantizing process. For example, the quantized speech from a PCM transmitter, or a quantized television signal.

3. Mathematical cases where we merely define abstractly a stochastic process which generates a sequence of symbols. The following are examples of this last type of source.

(A) Suppose we have five letters A, B, C, D, E which are chosen each with probability .2, successive choices being independent. This would lead to a sequence of which the following is a typical example.

B D C B C E C C C A D C B D D A A E C E E A A B B D A E E C A C E E B A E E C B C E A D.

This was constructed with the use of a table of random numbers.⁴

(B) Using the same five letters let the probabilities be .4, .1, .2, .2, .1, respectively, with successive choices independent. A typical message from this source is then:

A A A C D C B D C E A A D A D A C E D A E A D C A B E D A D D C E C A A A A A D.
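Sources of types (A) and (B) are straightforward to simulate. A sketch in Python, using example (B)'s probabilities (the function name and seed are ours):

```python
import random

LETTERS = ["A", "B", "C", "D", "E"]
PROBS_B = [0.4, 0.1, 0.2, 0.2, 0.1]   # example (B); example (A) uses 0.2 each

def independent_source(letters, probs, n, seed=0):
    """Emit n symbols, each drawn independently with the given
    probabilities (a zero-memory discrete source)."""
    rng = random.Random(seed)
    return " ".join(rng.choices(letters, weights=probs, k=n))

print(independent_source(LETTERS, PROBS_B, 40))
```

Over a long run the letter frequencies converge to the specified probabilities, which is all the structure such a source has.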

(C) A more complicated structure is obtained if successive symbols are not chosen independently but their probabilities depend on preceding letters. In the simplest case of this type a choice depends only on the preceding letter and not on ones before that. The statistical structure can then be described by a set of transition probabilities p_i(j), the probability that letter i is followed by letter j. The indices i and j range over all the possible symbols. A second equivalent way of specifying the structure is to give the "digram" probabilities p(i, j), i.e., the relative frequency of the digram i j. The letter frequencies p(i), (the probability of letter i), the transition probabilities p_i(j) and the digram probabilities p(i, j) are related by the following formulas:

p(i) = Σ_j p(i, j) = Σ_j p(j, i) = Σ_j p(j) p_j(i)

p(i, j) = p(i) p_i(j)

Σ_j p_i(j) = Σ_i p(i) = Σ_{i,j} p(i, j) = 1.

As a specific example suppose there are three letters A, B, C with the probability tables:

p_i(j)        j                p(i)            p(i, j)       j
           A     B     C                                A      B      C
  i  A     0    4/5   1/5      A   9/27        i  A     0     4/15   1/15
     B    1/2   1/2    0       B  16/27           B    8/27   8/27    0
     C    1/2   2/5   1/10     C   2/27           C    1/27   4/135  1/135
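The identities relating p(i), p_i(j) and p(i, j) can be checked mechanically for this example. A sketch in Python (exact fractions avoid rounding; the variable names are ours):

```python
from fractions import Fraction as F

# Transition probabilities p_i(j) for the three-letter example.
P_TRANS = {
    "A": {"A": F(0), "B": F(4, 5), "C": F(1, 5)},
    "B": {"A": F(1, 2), "B": F(1, 2), "C": F(0)},
    "C": {"A": F(1, 2), "B": F(2, 5), "C": F(1, 10)},
}
P_LETTER = {"A": F(9, 27), "B": F(16, 27), "C": F(2, 27)}

# Digram probabilities p(i, j) = p(i) p_i(j).
p_digram = {(i, j): P_LETTER[i] * P_TRANS[i][j]
            for i in P_TRANS for j in P_TRANS[i]}

# p(i) = sum_j p(i, j) = sum_j p(j, i): both marginals recover p(i).
for i in P_LETTER:
    assert sum(p_digram[(i, j)] for j in P_LETTER) == P_LETTER[i]
    assert sum(p_digram[(j, i)] for j in P_LETTER) == P_LETTER[i]

# Everything sums to 1.
assert sum(p_digram.values()) == 1
print(p_digram[("B", "A")])   # 8/27, matching the p(i, j) table
```

The letter frequencies 9/27, 16/27, 2/27 are exactly the stationary distribution of the transition table, which is why both row and column marginals of p(i, j) recover them.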

A typical message from this source is the following:

³See, for example, S. Chandrasekhar, "Stochastic Problems in Physics and Astronomy," Reviews of Modern Physics, v. 15, No. 1, January 1943, p. 1.

⁴Kendall and Smith, Tables of Random Sampling Numbers, Cambridge, 1939.



A B B A B A B A B A B A B A B B B A B B B B B A B A B A B A B A B B B A C A C A B B A B B B B A B B A B A C B B B A B A.

The next increase in complexity would involve trigram frequencies but no more. The choice of a letter would depend on the preceding two letters but not on the message before that point. A set of trigram frequencies p(i, j, k) or equivalently a set of transition probabilities p_ij(k) would be required. Continuing in this way one obtains successively more complicated stochastic processes. In the general n-gram case a set of n-gram probabilities p(i1, i2, ..., in) or of transition probabilities p_{i1, i2, ..., i(n-1)}(in) is required to specify the statistical structure.

(D) Stochastic processes can also be defined which produce a text consisting of a sequence of "words." Suppose there are five letters A, B, C, D, E and 16 "words" in the language with associated probabilities:

.10 A      .16 BEBE    .11 CABED   .04 DEB
.04 ADEB   .04 BED     .05 CEED    .15 DEED
.05 ADEE   .02 BEED    .08 DAB     .01 EAB
.01 BADD   .05 CA      .04 DAD     .05 EE

Suppose successive "words" are chosen independently and are separated by a space. A typical message might be:

DAB EE A BEBE DEED DEB ADEE ADEE EE DEB BEBE BEBE BEBE ADEE BED DEED DEED CEED ADEE A DEED DEED BEBE CABED BEBE BED DAB DEED ADEB.

If all the words are of finite length this process is equivalent to one of the preceding type, but the description may be simpler in terms of the word structure and probabilities. We may also generalize here and introduce transition probabilities between words, etc.

These artificial languages are useful in constructing simple problems and examples to illustrate various possibilities. We can also approximate to a natural language by means of a series of simple artificial languages. The zero-order approximation is obtained by choosing all letters with the same probability and independently. The first-order approximation is obtained by choosing successive letters independently but each letter having the same probability that it has in the natural language.⁵ Thus, in the first-order approximation to English, E is chosen with probability .12 (its frequency in normal English) and W with probability .02, but there is no influence between adjacent letters and no tendency to form the preferred digrams such as TH, ED, etc. In the second-order approximation, digram structure is introduced. After a letter is chosen, the next one is chosen in accordance with the frequencies with which the various letters follow the first one. This requires a table of digram frequencies p_i(j). In the third-order approximation, trigram structure is introduced. Each letter is chosen with probabilities which depend on the preceding two letters.

III. THE SERIES OF APPROXIMATIONS TO ENGLISH

To give a visual idea of how this series of processes approaches a language, typical sequences in the approximations to English have been constructed and are given below. In all cases we have assumed a 27-symbol "alphabet," the 26 letters and a space.

1. Zero-order approximation (symbols independent and equiprobable).

XFOML RXKHRJFFJUJ ZLPWCFWKCYJ FFJEYVKCQSGHYD QPAAMKBZAACIBZLHJQD.

2. First-order approximation (symbols independent but with frequencies of English text).

OCRO HLI RGWR NMIELWIS EU LL NBNESEBYA TH EEI ALHENHTTPA OOBTTVA NAH BRL.

3. Second-order approximation (digram structure as in English).

⁵Letter, digram and trigram frequencies are given in Secret and Urgent by Fletcher Pratt, Blue Ribbon Books, 1939. Word frequencies are tabulated in Relative Frequency of English Speech Sounds, G. Dewey, Harvard University Press, 1923.



ON IE ANTSOUTINYS ARE T INCTORE ST BE S DEAMY ACHIN D ILONASIVE TUCOOWE AT TEASONARE FUSO TIZIN ANDY TOBE SEACE CTISBE.

4. Third-order approximation (trigram structure as in English).

IN NO IST LAT WHEY CRATICT FROURE BIRS GROCID PONDENOME OF DEMONSTURES OF THE REPTAGIN IS REGOACTIONA OF CRE.

5. First-order word approximation. Rather than continue with tetragram, ..., n-gram structure it is easier and better to jump at this point to word units. Here words are chosen independently but with their appropriate frequencies.

REPRESENTING AND SPEEDILY IS AN GOOD APT OR COME CAN DIFFERENT NATURAL HERE HE THE A IN CAME THE TO OF TO EXPERT GRAY COME TO FURNISHES THE LINE MESSAGE HAD BE THESE.

6. Second-order word approximation. The word transition probabilities are correct but no further structure is included.

THE HEAD AND IN FRONTAL ATTACK ON AN ENGLISH WRITER THAT THE CHARACTER OF THIS POINT IS THEREFORE ANOTHER METHOD FOR THE LETTERS THAT THE TIME OF WHO EVER TOLD THE PROBLEM FOR AN UNEXPECTED.

The resemblance to ordinary English text increases quite noticeably at each of the above steps. Note that these samples have reasonably good structure out to about twice the range that is taken into account in their construction. Thus in (3) the statistical process insures reasonable text for two-letter sequences, but four-letter sequences from the sample can usually be fitted into good sentences. In (6) sequences of four or more words can easily be placed in sentences without unusual or strained constructions. The particular sequence of ten words "attack on an English writer that the character of this" is not at all unreasonable. It appears then that a sufficiently complex stochastic process will give a satisfactory representation of a discrete source.

The first two samples were constructed by the use of a book of random numbers in conjunction with (for example 2) a table of letter frequencies. This method might have been continued for (3), (4) and (5), since digram, trigram and word frequency tables are available, but a simpler equivalent method was used. To construct (3) for example, one opens a book at random and selects a letter at random on the page. This letter is recorded. The book is then opened to another page and one reads until this letter is encountered. The succeeding letter is then recorded. Turning to another page this second letter is searched for and the succeeding letter recorded, etc. A similar process was used for (4), (5) and (6). It would be interesting if further approximations could be constructed, but the labor involved becomes enormous at the next stage.
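The page-sampling procedure amounts to drawing each next letter with the digram frequencies of ordinary text. A minimal sketch of the same idea (the toy sample string below stands in for a real frequency table, and the function names are illustrative):

```python
import random
from collections import Counter, defaultdict

def digram_model(text):
    """For each letter, count the frequencies of the letter that follows it."""
    counts = defaultdict(Counter)
    for a, b in zip(text, text[1:]):
        counts[a][b] += 1
    return counts

def generate(model, start, n, rng=None):
    """Generate n letters; each is drawn with the digram frequencies of its predecessor."""
    rng = rng or random.Random(0)
    out = [start]
    for _ in range(n - 1):
        followers = model.get(out[-1])
        if not followers:                 # dead end: restart from the seed letter
            out.append(start)
            continue
        letters, weights = zip(*followers.items())
        out.append(rng.choices(letters, weights=weights)[0])
    return "".join(out)

# Hypothetical stand-in for a large English sample.
sample = "the quick brown fox jumps over the lazy dog and the cat sat on the mat"
model = digram_model(sample)
text = generate(model, "t", 40)
```

Every adjacent pair in the output occurs as a digram of the sample, which is exactly the second-order property of approximation (3).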

IV. GRAPHICAL REPRESENTATION OF A MARKOFF PROCESS

Stochastic processes of the type described above are known mathematically as discrete Markoff processes and have been extensively studied in the literature.6 The general case can be described as follows: There exist a finite number of possible "states" of a system; S1, S2, ..., Sn. In addition there is a set of transition probabilities, pi(j), the probability that if the system is in state Si it will next go to state Sj. To make this Markoff process into an information source we need only assume that a letter is produced for each transition from one state to another. The states will correspond to the "residue of influence" from preceding letters.

The situation can be represented graphically as shown in Figs. 3, 4 and 5. The "states" are the junction points in the graph and the probabilities and letters produced for a transition are given beside the corresponding line. Figure 3 is for the example B in Section 2, while Fig. 4 corresponds to the example C. In Fig. 3 there is only one state since successive letters are independent. In Fig. 4 there are as many states as letters. If a trigram example were constructed there would be at most n^2 states corresponding to the possible pairs of letters preceding the one being chosen. Figure 5 is a graph for the case of word structure in example D. Here S corresponds to the "space" symbol.

6For a detailed treatment see M. Fréchet, Méthode des fonctions arbitraires. Théorie des événements en chaîne dans le cas d'un nombre fini d'états possibles. Paris, Gauthier-Villars, 1938.


Fig. 3 -- A graph corresponding to the source in example B.

Fig. 4 -- A graph corresponding to the source in example C.

V. ERGODIC AND MIXED SOURCES

As we have indicated above a discrete source for our purposes can be considered to be represented by a Markoff process. Among the possible discrete Markoff processes there is a group with special properties of significance in communication theory. This special class consists of the "ergodic" processes and we shall call the corresponding sources ergodic sources. Although a rigorous definition of an ergodic process is somewhat involved, the general idea is simple. In an ergodic process every sequence produced by the process is the same in statistical properties. Thus the letter frequencies, digram frequencies, etc., obtained from particular sequences, will, as the lengths of the sequences increase, approach definite limits independent of the particular sequence. Actually this is not true of every sequence but the set for which it is false has probability zero. Roughly the ergodic property means statistical homogeneity.

All the examples of artificial languages given above are ergodic. This property is related to the structure of the corresponding graph. If the graph has the following two properties7 the corresponding process will be ergodic:

1. The graph does not consist of two isolated parts A and B such that it is impossible to go from junction points in part A to junction points in part B along lines of the graph in the direction of arrows and also impossible to go from junctions in part B to junctions in part A.

2. A closed series of lines in the graph with all arrows on the lines pointing in the same orientation will be called a "circuit." The "length" of a circuit is the number of lines in it. Thus in Fig. 5 series BEBES is a circuit of length 5. The second property required is that the greatest common divisor of the lengths of all circuits in the graph be one.

If the first condition is satisfied but the second one violated by having the greatest common divisor equal to d > 1, the sequences have a certain type of periodic structure. The various sequences fall into d different classes which are statistically the same apart from a shift of the origin (i.e., which letter in the sequence is called letter 1). By a shift of from 0 up to d - 1 any sequence can be made statistically equivalent to any other. A simple example with d = 2 is the following: There are three possible letters a, b, c. Letter a is followed with either b or c with probabilities 1/3 and 2/3 respectively. Either b or c is always followed by letter a. Thus a typical sequence is

a b a c a c a c a b a c a b a b a c a c.

This type of situation is not of much importance for our work.
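The second ergodicity condition can be checked mechanically: for a graph satisfying the first condition, the greatest common divisor of all circuit lengths equals the gcd of the quantities level(u) + 1 - level(v) taken over the edges met in a breadth-first search. A sketch for the d = 2 example above (the function name and graph encoding are illustrative):

```python
from math import gcd
from collections import deque

def period(graph, start):
    """gcd of the lengths of all circuits through a connected graph, via BFS
    levels: every edge u -> v contributes level(u) + 1 - level(v) to the gcd."""
    level = {start: 0}
    queue = deque([start])
    d = 0
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in level:
                level[v] = level[u] + 1
                queue.append(v)
            else:
                d = gcd(d, level[u] + 1 - level[v])
    return abs(d)

# The d = 2 example: a is followed by b or c; b and c are always followed by a.
graph = {"a": ["b", "c"], "b": ["a"], "c": ["a"]}
d = period(graph, "a")   # 2: letters alternate a, (b or c), a, ...
```

A gcd of 1 (for instance after adding a self-loop a -> a) would make the process ergodic.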

If the first condition is violated the graph may be separated into a set of subgraphs each of which satisfies the first condition. We will assume that the second condition is also satisfied for each subgraph. We have in this case

7These are restatements in terms of the graph of conditions given in Fréchet.


Fig. 5 -- A graph corresponding to the source in example D.

what may be called a "mixed" source made up of a number of pure components. The components correspond to the various subgraphs. If L1, L2, L3, ... are the component sources we may write

L = p1 L1 + p2 L2 + p3 L3 + ...

where pi is the probability of the component source Li.

Physically the situation represented is this: There are several different sources L1, L2, L3, ... which are each of homogeneous statistical structure (i.e., they are ergodic). We do not know a priori which is to be used, but once the sequence starts in a given pure component Li, it continues indefinitely according to the statistical structure of that component.

As an example one may take two of the processes defined above and assume p1 = .2 and p2 = .8. A sequence from the mixed source

L = .2 L1 + .8 L2

would be obtained by choosing first L1 or L2 with probabilities .2 and .8 and after this choice generating a sequence from whichever was chosen.

Except when the contrary is stated we shall assume a source to be ergodic. This assumption enables one to identify averages along a sequence with averages over the ensemble of possible sequences (the probability of a discrepancy being zero). For example the relative frequency of the letter A in a particular infinite sequence will be, with probability one, equal to its relative frequency in the ensemble of sequences.

If Pi is the probability of state i and pi(j) the transition probability to state j, then for the process to be stationary it is clear that the Pi must satisfy equilibrium conditions:

Pj = Σi Pi pi(j).

In the ergodic case it can be shown that with any starting conditions the probabilities Pj(N) of being in state j after N symbols approach the equilibrium values as N → ∞.
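A sketch of how the equilibrium values can be reached numerically: apply the transition probabilities repeatedly to any starting distribution (the two-state chain here is hypothetical, chosen only for illustration):

```python
def equilibrium(P, steps=200):
    """Iterate Pj(N+1) = sum_i Pi(N) p_i(j) from a uniform start until it settles."""
    n = len(P)
    probs = [1.0 / n] * n
    for _ in range(steps):
        probs = [sum(probs[i] * P[i][j] for i in range(n)) for j in range(n)]
    return probs

# Hypothetical two-state chain, for illustration only.
P = [[0.9, 0.1],
     [0.5, 0.5]]
pi = equilibrium(P)   # approaches (5/6, 1/6), which satisfies Pj = sum_i Pi p_i(j)
```

The result is independent of the starting distribution, which is the ergodic convergence stated above.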

VI. CHOICE, UNCERTAINTY AND ENTROPY

We have represented a discrete information source as a Markoff process. Can we define a quantity which will measure, in some sense, how much information is "produced" by such a process, or better, at what rate information is produced?

Suppose we have a set of possible events whose probabilities of occurrence are p1, p2, ..., pn. These probabilities are known but that is all we know concerning which event will occur. Can we find a measure of how much "choice" is involved in the selection of the event or of how uncertain we are of the outcome?

If there is such a measure, say H(p1, p2, ..., pn), it is reasonable to require of it the following properties:


1. H should be continuous in the pi.

2. If all the pi are equal, pi = 1/n, then H should be a monotonic increasing function of n. With equally likely events there is more choice, or uncertainty, when there are more possible events.

3. If a choice be broken down into two successive choices, the original H should be the weighted sum of the individual values of H. The meaning of this is illustrated in Fig. 6. At the left we have three possibilities

Fig. 6 -- Decomposition of a choice from three possibilities.

p1 = 1/2, p2 = 1/3, p3 = 1/6. On the right we first choose between two possibilities each with probability 1/2, and if the second occurs make another choice with probabilities 2/3, 1/3. The final results have the same probabilities as before. We require, in this special case, that

H(1/2, 1/3, 1/6) = H(1/2, 1/2) + (1/2) H(2/3, 1/3).

The coefficient 1/2 is because this second choice only occurs half the time.

In Appendix 2, the following result is established:

Theorem 2: The only H satisfying the three above assumptions is of the form:

H = -K Σi pi log pi

where K is a positive constant.

This theorem, and the assumptions required for its proof, are in no way necessary for the present theory. It is given chiefly to lend a certain plausibility to some of our later definitions. The real justification of these definitions, however, will reside in their implications.

Quantities of the form H = -Σ pi log pi (the constant K merely amounts to a choice of a unit of measure) play a central role in information theory as measures of information, choice and uncertainty. The form of H will be recognized as that of entropy as defined in certain formulations of statistical mechanics8 where pi is the probability of a system being in cell i of its phase space. H is then, for example, the H in Boltzmann's famous H theorem. We shall call H = -Σ pi log pi the entropy of the set of probabilities p1, ..., pn. If x is a chance variable we will write H(x) for its entropy; thus x is not an argument of a function but a label for a number, to differentiate it from H(y) say, the entropy of the chance variable y.
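The decomposition property of Fig. 6 is easy to verify numerically. A minimal sketch with base-2 logarithms, so that H is measured in bits:

```python
from math import log2

def H(*p):
    """Entropy in bits of a set of probabilities (terms with p = 0 contribute 0)."""
    assert abs(sum(p) - 1.0) < 1e-9
    return -sum(x * log2(x) for x in p if x > 0)

# The special case required of H in Fig. 6:
# H(1/2, 1/3, 1/6) = H(1/2, 1/2) + (1/2) H(2/3, 1/3)
lhs = H(1/2, 1/3, 1/6)
rhs = H(1/2, 1/2) + 0.5 * H(2/3, 1/3)
```

Both sides evaluate to 2/3 + (1/2) log2 3, confirming the weighted-sum requirement for this decomposition.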

The entropy in the case of two possibilities with probabilities p and q = 1 - p, namely

H = -(p log p + q log q)

is plotted in Fig. 7 as a function of p.

The quantity H has a number of interesting properties which further substantiate it as a reasonable measure of choice or information.

1. H = 0 if and only if all the pi but one are zero, this one having the value unity. Thus only when we are certain of the outcome does H vanish. Otherwise H is positive.

2. For a given n, H is a maximum and equal to log n when all the pi are equal (i.e., 1/n). This is also intuitively the most uncertain situation.

8See, for example, R. C. Tolman, Principles of Statistical Mechanics, Oxford, Clarendon, 1938.



Fig. 7 -- Entropy in the case of two possibilities with probabilities p and (1 - p).

3. Suppose there are two events, x and y, in question with m possibilities for the first and n for the second. Let p(i, j) be the probability of the joint occurrence of i for the first and j for the second. The entropy of the joint event is

H(x, y) = -Σi,j p(i, j) log p(i, j)

while

H(x) = -Σi,j p(i, j) log Σj p(i, j)

H(y) = -Σi,j p(i, j) log Σi p(i, j).

It is easily shown that

H(x, y) ≤ H(x) + H(y)

with equality only if the events are independent (i.e., p(i, j) = p(i)p(j)). The uncertainty of a joint event is less than or equal to the sum of the individual uncertainties.

4. Any change toward equalization of the probabilities p1, p2, ..., pn increases H. Thus if p1 < p2 and we increase p1, decreasing p2 an equal amount so that p1 and p2 are more nearly equal, then H increases. More generally, if we perform any "averaging" operation on the pi of the form

pi' = Σj aij pj

where Σi aij = Σj aij = 1, and all aij ≥ 0, then H increases (except in the special case where this transformation amounts to no more than a permutation of the pj with H of course remaining the same).

5. Suppose there are two chance events x and y as in 3, not necessarily independent. For any particular value i that x can assume there is a conditional probability pi(j) that y has the value j. This is given by

pi(j) = p(i, j) / Σj p(i, j).

We define the conditional entropy of y, Hx(y), as the average of the entropy of y for each value of x, weighted according to the probability of getting that particular x. That is

Hx(y) = -Σi,j p(i, j) log pi(j).



This quantity measures how uncertain we are of y on the average when we know x. Substituting the value of pi(j) we obtain

Hx(y) = -Σi,j p(i, j) log p(i, j) + Σi,j p(i, j) log Σj p(i, j)

= H(x, y) - H(x)

or

H(x, y) = H(x) + Hx(y).

The uncertainty (or entropy) of the joint event x, y is the uncertainty of x plus the uncertainty of y when x is known.

6. From 3 and 5 we have

H(x) + H(y) ≥ H(x, y) = H(x) + Hx(y).

Hence

H(y) ≥ Hx(y).

The uncertainty of y is never increased by knowledge of x. It will be decreased unless x and y are independent events, in which case it is not changed.
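Properties 3, 5 and 6 can be checked numerically on any joint distribution. A sketch using a hypothetical 2 by 2 table p(i, j) (the numbers are chosen only to make x and y dependent):

```python
from math import log2

def h(p):
    """Entropy in bits of a list of probabilities."""
    return -sum(x * log2(x) for x in p if x > 0)

# Hypothetical joint distribution p(i, j) for two dependent chance variables.
p = [[0.4, 0.1],
     [0.1, 0.4]]
px = [sum(row) for row in p]                              # marginal of x
py = [sum(p[i][j] for i in range(2)) for j in range(2)]   # marginal of y
Hxy = h([p[i][j] for i in range(2) for j in range(2)])
Hx, Hy = h(px), h(py)
# Conditional entropy Hx(y) = -sum p(i,j) log p_i(j), with p_i(j) = p(i,j)/p(i).
Hx_y = -sum(p[i][j] * log2(p[i][j] / px[i])
            for i in range(2) for j in range(2) if p[i][j] > 0)
```

For this table H(x, y) = H(x) + Hx(y) holds exactly, and both H(x, y) ≤ H(x) + H(y) and Hx(y) ≤ H(y) are strict because the events are dependent.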

VII. THE ENTROPY OF AN INFORMATION SOURCE

Consider a discrete source of the finite state type considered above. For each possible state i there will be a set of probabilities pi(j) of producing the various possible symbols j. Thus there is an entropy Hi for each state. The entropy of the source will be defined as the average of these Hi weighted in accordance with the probability of occurrence of the states in question:

H = Σi Pi Hi = -Σi,j Pi pi(j) log pi(j).

This is the entropy of the source per symbol of text. If the Markoff process is proceeding at a definite time rate there is also an entropy per second

H' = Σi fi Hi

where fi is the average frequency (occurrences per second) of state i. Clearly

H' = m H

where m is the average number of symbols produced per second. H or H' measures the amount of information generated by the source per symbol or per second. If the logarithmic base is 2, they will represent bits per symbol or per second.
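As a sketch, the per-symbol entropy of a hypothetical two-state source (the transition matrix and its equilibrium probabilities are chosen only for illustration; the equilibrium values satisfy the stationarity conditions of Section V):

```python
from math import log2

# Hypothetical two-state Markoff source, for illustration only.
P = [[0.9, 0.1],     # transition probabilities p_i(j)
     [0.5, 0.5]]
pi = [5/6, 1/6]      # equilibrium state probabilities for this chain

# Entropy per symbol: H = sum_i P_i H_i = -sum_{i,j} P_i p_i(j) log p_i(j)
H = -sum(pi[i] * P[i][j] * log2(P[i][j])
         for i in range(2) for j in range(2) if P[i][j] > 0)
```

State 0 contributes little uncertainty (its transitions are 0.9/0.1) while state 1 contributes a full bit, and H weights the two by how often each state occurs.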

If successive symbols are independent then H is simply -Σ pi log pi where pi is the probability of symbol i. Suppose in this case we consider a long message of N symbols. It will contain with high probability about p1 N occurrences of the first symbol, p2 N occurrences of the second, etc. Hence the probability of this particular message will be roughly

p = p1^{p1 N} p2^{p2 N} ... pn^{pn N}

or

log p ≈ N Σi pi log pi

log p ≈ -NH

H ≈ (log 1/p) / N.

NH is thus approximately the logarithm of the reciprocal probability of a typical long sequence divided by the number of symbols in the sequence. The same result holds for any source. Stated more precisely we have (see Appendix 3):


Theorem 3: Given any ε > 0 and δ > 0, we can find an N0 such that the sequences of any length N ≥ N0 fall into two classes:

1. A set whose total probability is less than ε.

2. The remainder, all of whose members have probabilities satisfying the inequality

| (log p^{-1}) / N - H | < δ.

In other words we are almost certain to have (log p^{-1})/N very close to H when N is large.

A closely related result deals with the number of sequences of various probabilities. Consider again the sequences of length N and let them be arranged in order of decreasing probability. We define n(q) to be the number we must take from this set starting with the most probable one in order to accumulate a total probability q for those taken.

Theorem 4:

Lim N→∞ (log n(q)) / N = H

when q does not equal 0 or 1.

We may interpret log n(q) as the number of bits required to specify the sequence when we consider only the most probable sequences with a total probability q. Then (log n(q))/N is the number of bits per symbol for the specification. The theorem says that for large N this will be independent of q and equal to H. The rate of growth of the logarithm of the number of reasonably probable sequences is given by H, regardless of our interpretation of "reasonably probable." Due to these results, which are proved in Appendix 3, it is possible for most purposes to treat the long sequences as though there were just 2^{HN} of them, each with a probability 2^{-HN}.
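For an independent source the concentration asserted by Theorems 3 and 4 is easy to observe: the quantity (log p^{-1})/N for a long random message settles near H. A sketch with a hypothetical three-symbol source:

```python
import random
from math import log2

# Hypothetical independent source, for illustration only.
probs = {"a": 0.5, "b": 0.25, "c": 0.25}
H = -sum(p * log2(p) for p in probs.values())    # 1.5 bits per symbol

rng = random.Random(1)
N = 100_000
msg = rng.choices(list(probs), weights=list(probs.values()), k=N)

# log of the reciprocal probability of this particular message
log_inv_p = -sum(log2(probs[s]) for s in msg)
estimate = log_inv_p / N                         # close to H for large N
```

Almost every long message drawn from this source has probability near 2^{-NH}; messages far from that value form the small-probability class of Theorem 3.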

The next two theorems show that H and H' can be determined by limiting operations directly from the statistics of the message sequences, without reference to the states and transition probabilities between states.

Theorem 5: Let p(Bi) be the probability of a sequence Bi of symbols from the source. Let

GN = -(1/N) Σi p(Bi) log p(Bi)

where the sum is over all sequences Bi containing N symbols. Then GN is a monotonic decreasing function of N and

Lim N→∞ GN = H.

Theorem 6: Let p(Bi, Sj) be the probability of sequence Bi followed by symbol Sj and pBi(Sj) = p(Bi, Sj)/p(Bi) be the conditional probability of Sj after Bi. Let

FN = -Σi,j p(Bi, Sj) log pBi(Sj)

where the sum is over all blocks Bi of N - 1 symbols and over all symbols Sj. Then FN is a monotonic decreasing function of N,

FN = N GN - (N - 1) G_{N-1},

GN = (1/N) Σ_{n=1}^{N} Fn,

FN ≤ GN,

and Lim N→∞ FN = H.


These results are derived in Appendix 3. They show that a series of approximations to H can be obtained by considering only the statistical structure of the sequences extending over 1, 2, ..., N symbols. FN is the better approximation. In fact FN is the entropy of the Nth order approximation to the source of the type discussed above. If there are no statistical influences extending over more than N symbols, that is if the conditional probability of the next symbol knowing the preceding (N - 1) is not changed by a knowledge of any before that, then FN = H. FN of course is the conditional entropy of the next symbol when the (N - 1) preceding ones are known, while GN is the entropy per symbol of blocks of N symbols.
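GN can be computed exactly for a small Markoff source by enumerating all blocks of N symbols. In this sketch (the two-letter source and its probabilities are hypothetical) GN decreases with N, and because the source has no statistical influence beyond one symbol, F3 = 3 G3 - 2 G2 already equals H:

```python
from math import log2
from itertools import product

# Hypothetical source: after "a", next is "a"/"b" with probs .9/.1; after "b", .5/.5.
trans = {"a": {"a": 0.9, "b": 0.1}, "b": {"a": 0.5, "b": 0.5}}
start = {"a": 5/6, "b": 1/6}          # equilibrium state probabilities

def block_prob(block):
    """Probability of an N-symbol block, starting from equilibrium."""
    p = start[block[0]]
    for u, v in zip(block, block[1:]):
        p *= trans[u][v]
    return p

def G(N):
    """G_N = -(1/N) * sum over all N-symbol blocks B of p(B) log p(B)."""
    return -sum(block_prob(b) * log2(block_prob(b))
                for b in product("ab", repeat=N)) / N

g1, g2, g3 = G(1), G(2), G(3)         # monotonic decreasing toward H
```

The quantity 3 g3 - 2 g2 recovers F3, the conditional entropy of a symbol given the two before it.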

The ratio of the entropy of a source to the maximum value it could have while still restricted to the same symbols will be called its relative entropy. This is the maximum compression possible when we encode into the same alphabet. One minus the relative entropy is the redundancy. The redundancy of ordinary English, not considering statistical structure over greater distances than about eight letters, is roughly 50%. This means that when we write English half of what we write is determined by the structure of the language and half is chosen freely. The figure 50% was found by several independent methods which all gave results in this neighborhood. One is by calculation of the entropy of the approximations to English. A second method is to delete a certain fraction of the letters from a sample of English text and then let someone attempt to restore them. If they can be restored when 50% are deleted the redundancy must be greater than 50%. A third method depends on certain known results in cryptography.
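A sketch of the first of these methods, reduced to first-order letter statistics only; since it ignores digram and longer-range structure, it gives a lower figure than the 50% quoted for English as a whole (the sample string is hypothetical):

```python
from math import log2
from collections import Counter

def redundancy(text):
    """One minus the ratio of the text's letter entropy to its maximum, log n.
    Uses first-order letter frequencies only, so it understates the full
    redundancy, which includes statistical structure over many letters."""
    counts = Counter(text)
    total = sum(counts.values())
    H1 = -sum(c / total * log2(c / total) for c in counts.values())
    Hmax = log2(len(counts))        # maximum entropy over the same symbols
    return 1 - H1 / Hmax

r = redundancy("this is a short sample of english text used only for illustration")
```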

Two extremes of redundancy in English prose are represented by Basic English and by James Joyce's book "Finnegans Wake". The Basic English vocabulary is limited to 850 words and the redundancy is very high. This is reflected in the expansion that occurs when a passage is translated into Basic English. Joyce on the other hand enlarges the vocabulary and is alleged to achieve a compression of semantic content.

The redundancy of a language is related to the existence of crossword puzzles. If the redundancy is zero any sequence of letters is a reasonable text in the language and any two-dimensional array of letters forms a crossword puzzle. If the redundancy is too high the language imposes too many constraints for large crossword puzzles to be possible. A more detailed analysis shows that if we assume the constraints imposed by the language are of a rather chaotic and random nature, large crossword puzzles are just possible when the redundancy is 50%. If the redundancy is 33%, three-dimensional crossword puzzles should be possible, etc.

VIII. REPRESENTATION OF THE ENCODING AND DECODING OPERATIONS

We have yet to represent mathematically the operations performed by the transmitter and receiver in encoding and decoding the information. Either of these will be called a discrete transducer. The input to the transducer is a sequence of input symbols and its output a sequence of output symbols. The transducer may have an internal memory so that its output depends not only on the present input symbol but also on the past history. We assume that the internal memory is finite, i.e., there exist a finite number m of possible states of the transducer and that its output is a function of the present state and the present input symbol. The next state will be a second function of these two quantities. Thus a transducer can be described by two functions:

yn = f(xn, αn)

αn+1 = g(xn, αn)

where

xn is the nth input symbol,

αn is the state of the transducer when the nth input symbol is introduced,

yn is the output symbol (or sequence of output symbols) produced when xn is introduced if the state is αn.

If the output symbols of one transducer can be identified with the input symbols of a second, they can be connected in tandem and the result is also a transducer. If there exists a second transducer which operates on the output of the first and recovers the original input, the first transducer will be called non-singular and the second will be called its inverse.
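A minimal sketch of such a transducer pair (the particular f and g below are hypothetical; any non-singular choice admits an inverse built the same way):

```python
def run(seq, f, g, state=0):
    """Drive a transducer (output function f, next-state function g) over a sequence."""
    out = []
    for x in seq:
        out.append(f(x, state))   # y_n = f(x_n, a_n)
        state = g(x, state)       # a_{n+1} = g(x_n, a_n)
    return out

# A one-bit-memory transducer: output is the XOR of the current and previous input.
f = lambda x, a: x ^ a            # output depends on input and state
g = lambda x, a: x                # next state remembers the current input

# The transducer is non-singular: this second transducer recovers the input.
f_inv = lambda y, a: y ^ a        # x = y XOR previously recovered input
g_inv = lambda y, a: y ^ a        # next state is the recovered input

msg = [1, 0, 1, 1, 0, 0, 1]
coded = run(msg, f, g)
decoded = run(coded, f_inv, g_inv)   # equals msg
```

Connecting the two in tandem gives the identity transducer, which is what makes the first one non-singular.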

Theorem 7: The output of a finite state transducer driven by a finite state statistical source is a finite state statistical source, with entropy (per unit time) less than or equal to that of the input. If the transducer is non-singular they are equal.


Let α represent the state of the source, which produces a sequence of symbols xi; and let β be the state of the transducer, which produces, in its output, blocks of symbols yj. The combined system can be represented by the "product state space" of pairs (α, β). Two points in the space, (α1, β1) and (α2, β2), are connected by a line if α1 can produce an x which changes β1 to β2, and this line is given the probability of that x in this case. The line is labeled with the block of yj symbols produced by the transducer. The entropy of the output can be calculated as the weighted sum over the states. If we sum first on β each resulting term is less than or equal to the corresponding term for α, hence the entropy is not increased. If the transducer is non-singular let its output be connected to the inverse transducer. If H1', H2' and H3' are the output entropies of the source, the first and second transducers respectively, then H1' ≥ H2' ≥ H3' = H1' and therefore H1' = H2'.

Suppose we have a system of constraints on possible sequences of the type which can be represented by a linear graph as in Fig. 2. If probabilities pij^{(s)} were assigned to the various lines connecting state i to state j this would become a source. There is one particular assignment which maximizes the resulting entropy (see Appendix 4).

Theorem 8: Let the system of constraints considered as a channel have a capacity C = log W. If we assign

p_{ij}^{(s)} = (B_j / B_i) W^{-l_{ij}^{(s)}}

where l_{ij}^{(s)} is the duration of the s-th symbol leading from state i to state j and the B_i satisfy

B_i = Σ_{s,j} B_j W^{-l_{ij}^{(s)}}

then H is maximized and equal to C.

By proper assignment of the transition probabilities the entropy of symbols on a channel can be maximized at the channel capacity.
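As a numerical sketch of Theorem 8 (the constraint graph and symbol durations here are illustrative assumptions, not an example from the paper), consider a single-state graph with two symbols of durations 1 and 2. Then W is the positive root of W^{-1} + W^{-2} = 1, the assignment of Theorem 8 reduces to p_s = W^{-l_s} (since B_i = B_j), and the entropy per unit time of the resulting source should equal C = log W:

```python
import math

def solve_W(durations, lo=1.0, hi=4.0, iters=200):
    # Bisection on f(W) = sum_s W^{-d_s} - 1, which is decreasing in W > 1.
    f = lambda w: sum(w ** -d for d in durations) - 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

durations = [1, 2]                 # assumed symbol durations, single state
W = solve_W(durations)             # root of W^-1 + W^-2 = 1: the golden ratio
C = math.log2(W)                   # channel capacity, bits per unit time

# Theorem 8's maximizing assignment (B_j/B_i = 1 here): p_s = W^{-d_s}
p = [W ** -d for d in durations]
H_per_symbol = -sum(ps * math.log2(ps) for ps in p)
avg_duration = sum(ps * d for ps, d in zip(p, durations))
H_per_time = H_per_symbol / avg_duration

print(C, H_per_time)  # the two agree (~0.694)
```

The entropy per unit time of the maximizing source coincides with the capacity computed from the characteristic equation, as the theorem asserts.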

IX. THE FUNDAMENTAL THEOREM FOR A NOISELESS CHANNEL

We will now justify our interpretation of H as the rate of generating information by proving that H determines the channel capacity required with most efficient coding.

Theorem 9: Let a source have entropy H (bits per symbol) and a channel have a capacity C (bits per second). Then it is possible to encode the output of the source in such a way as to transmit at the average rate C/H - ε symbols per second over the channel, where ε is arbitrarily small. It is not possible to transmit at an average rate greater than C/H.

The converse part of the theorem, that C/H cannot be exceeded, may be proved by noting that the entropy of the channel input per second is equal to that of the source, since the transmitter must be non-singular, and also this entropy cannot exceed the channel capacity. Hence H' <= C and the number of symbols per second = H'/H <= C/H.

The first part of the theorem will be proved in two different ways. The first method is to consider the set of all sequences of N symbols produced by the source. For N large we can divide these into two groups, one containing less than 2^{(H+η)N} members and the second containing less than 2^{RN} members (where R is the logarithm of the number of different symbols) and having a total probability less than μ. As N increases, η and μ approach zero. The number of signals of duration T in the channel is greater than 2^{(C-θ)T} with θ small when T is large. If we choose

T = (H/C + λ)N

then there will be a sufficient number of sequences of channel symbols for the high probability group when N and T are sufficiently large (however small λ) and also some additional ones. The high probability group is coded in an arbitrary one-to-one way into this set. The remaining sequences are represented by larger sequences, starting and ending with one of the sequences not used for the high probability group. This special sequence acts as a start


and stop signal for a different code. In between a sufficient time is allowed to give enough different sequences for all the low probability messages. This will require

T_1 = (R/C + φ)N

where φ is small. The mean rate of transmission in message symbols per second will then be greater than

[(1 - δ) T/N + δ T_1/N]^{-1} = [(1 - δ)(H/C + λ) + δ(R/C + φ)]^{-1}.

As N increases, δ, λ and φ approach zero and the rate approaches C/H.

Another method of performing this coding and thereby proving the theorem can be described as follows:

Arrange the messages of length N in order of decreasing probability and suppose their probabilities are p_1 >= p_2 >= p_3 >= ... >= p_n. Let P_s = Σ_{i=1}^{s-1} p_i; that is, P_s is the cumulative probability up to, but not including, p_s. We first encode into a binary system. The binary code for message s is obtained by expanding P_s as a binary number. The expansion is carried out to m_s places, where m_s is the integer satisfying:

log_2 (1/p_s) <= m_s < 1 + log_2 (1/p_s).

Thus the messages of high probability are represented by short codes and those of low probability by long codes. From these inequalities we have

1/2^{m_s} <= p_s < 1/2^{m_s - 1}.
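This construction is directly programmable. The sketch below (the function name is my own) expands each cumulative probability P_s to m_s binary places; applied to the four-letter source of Section X it reproduces the code table given there:

```python
import math

def shannon_code(probs):
    """Binary codes from the cumulative-probability construction above.
    probs must be sorted in decreasing order."""
    codes = []
    P = 0.0
    for p in probs:
        m = math.ceil(math.log2(1.0 / p))  # smallest integer >= log2(1/p)
        # Expand the cumulative probability P as a binary fraction to m places.
        code, frac = "", P
        for _ in range(m):
            frac *= 2
            bit = int(frac)
            code += str(bit)
            frac -= bit
        codes.append(code)
        P += p
    return codes

probs = [0.5, 0.25, 0.125, 0.125]   # the A, B, C, D source of Section X
print(shannon_code(probs))          # ['0', '10', '110', '111']
```

Because each code differs from all following ones in at least one of its m_s places, the resulting code is prefix-free, which is what makes unambiguous decoding possible.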


X. DISCUSSION AND EXAMPLES

In order to obtain the maximum power transfer from a generator to a load, a transformer must in general be introduced so that the generator as seen from the load has the load resistance. The situation here is roughly analogous. The transducer which does the encoding should match the source to the channel in a statistical sense. The source as seen from the channel through the transducer should have the same statistical structure as the source which maximizes the entropy in the channel. The content of Theorem 9 is that, although an exact match is not in general possible, we can approximate it as closely as desired. The ratio of the actual rate of transmission to the capacity C may be called the efficiency of the coding system. This is of course equal to the ratio of the actual entropy of the channel symbols to the maximum possible entropy.

In general, ideal or nearly ideal encoding requires a long delay in the transmitter and receiver. In the noiseless case which we have been considering, the main function of this delay is to allow reasonably good matching of probabilities to corresponding lengths of sequences. With a good code the logarithm of the reciprocal probability of a long message must be proportional to the duration of the corresponding signal, in fact

| (log p^{-1}) / T - C |

must be small for all but a small fraction of the long messages.

If a source can produce only one particular message its entropy is zero, and no channel is required. For example, a computing machine set up to calculate the successive digits of π produces a definite sequence with no chance element. No channel is required to "transmit" this to another point. One could construct a second machine to compute the same sequence at the point. However, this may be impractical. In such a case we can choose to ignore some or all of the statistical knowledge we have of the source. We might consider the digits of π to be a random sequence in that we construct a system capable of sending any sequence of digits. In a similar way we may choose to use some of our statistical knowledge of English in constructing a code, but not all of it. In such a case we consider the source with the maximum entropy subject to the statistical conditions we wish to retain. The entropy of this source determines the channel capacity which is necessary and sufficient. In the π example the only information retained is that all the digits are chosen from the set 0, 1, ..., 9. In the case of English one might wish to use the statistical saving possible due to letter frequencies, but nothing else. The maximum entropy source is then the first approximation to English and its entropy determines the required channel capacity.

As a simple example of some of these results consider a source which produces a sequence of letters chosen from among A, B, C, D with probabilities 1/2, 1/4, 1/8, 1/8, successive symbols being chosen independently. We have

H = -(1/2 log 1/2 + 1/4 log 1/4 + 2/8 log 1/8) = 7/4 bits per symbol.

Thus we can approximate a coding system to encode messages from this source into binary digits with an average of 7/4 binary digits per symbol. In this case we can actually achieve the limiting value by the following code (obtained by the method of the second proof of Theorem 9):

A   0
B   10
C   110
D   111

The average number of binary digits used in encoding a sequence of N symbols will be

N(1/2 x 1 + 1/4 x 2 + 2/8 x 3) = (7/4)N.

It is easily seen that the binary digits 0, 1 have probabilities 1/2, 1/2, so the H for the coded sequences is one bit per symbol. Since, on the average, we have 7/4 binary symbols per original letter, the entropies on a time basis are the same. The maximum possible entropy for the original set is log 4 = 2, occurring when A, B, C, D have


probabilities 1/4, 1/4, 1/4, 1/4. Hence the relative entropy is 7/8. We can translate the binary sequences into the original set of symbols on a two-to-one basis by the following table:

00   A'
01   B'
10   C'
11   D'

This double process then encodes the original message into the same symbols but with an average compression ratio 7/8.

As a second example consider a source which produces a sequence of A's and B's with probability p for A and q for B. If p


channel, i.e., the received signal, will be denoted by H(y). In the noiseless case H(y) = H(x). The joint entropy of input and output will be H(x, y). Finally there are two conditional entropies H_x(y) and H_y(x), the entropy of the output when the input is known and conversely. Among these quantities we have the relations

H(x, y) = H(x) + H_x(y) = H(y) + H_y(x).

All of these entropies can be measured on a per-second or a per-symbol basis.

XII. EQUIVOCATION AND CHANNEL CAPACITY

If the channel is noisy it is not in general possible to reconstruct the original message or the transmitted signal with certainty by any operation on the received signal E. There are, however, ways of transmitting the information which are optimal in combating noise. This is the problem which we now consider.

Suppose there are two possible symbols 0 and 1, and we are transmitting at a rate of 1000 symbols per second with probabilities p_0 = p_1 = 1/2. Thus our source is producing information at the rate of 1000 bits per second. During transmission the noise introduces errors so that, on the average, 1 in 100 is received incorrectly (a 0 as 1, or 1 as 0). What is the rate of transmission of information? Certainly less than 1000 bits per second since about 1% of the received symbols are incorrect. Our first impulse might be to say the rate is 990 bits per second, merely subtracting the expected number of errors. This is not satisfactory since it fails to take into account the recipient's lack of knowledge of where the errors occur. We may carry it to an extreme case and suppose the noise so great that the received symbols are entirely independent of the transmitted symbols. The probability of receiving 1 is 1/2 whatever was transmitted and similarly for 0. Then about half of the received symbols are correct due to chance alone, and we would be giving the system credit for transmitting 500 bits per second while actually no information is being transmitted at all. Equally "good" transmission would be obtained by dispensing with the channel entirely and flipping a coin at the receiving point.

Evidently the proper correction to apply to the amount of information transmitted is the amount of this information which is missing in the received signal, or alternatively the uncertainty when we have received a signal of what was actually sent. From our previous discussion of entropy as a measure of uncertainty it seems reasonable to use the conditional entropy of the message, knowing the received signal, as a measure of this missing information. This is indeed the proper definition, as we shall see later. Following this idea the rate of actual transmission, R, would be obtained by subtracting from the rate of production (i.e., the entropy of the source) the average rate of conditional entropy.

R = H(x) - H_y(x).

The conditional entropy H_y(x) will, for convenience, be called the equivocation. It measures the average

ambiguity of the received signal.

In the example considered above, if a 0 is received the a posteriori probability that a 0 was transmitted is .99,

and that a 1 was transmitted is .01. These figures are reversed if a 1 is received. Hence

H_y(x) = -[.99 log .99 + .01 log .01]
       = .081 bits/symbol

or 81 bits per second. We may say that the system is transmitting at a rate 1000 - 81 = 919 bits per second. In the extreme case where a 0 is equally likely to be received as a 0 or 1 and similarly for 1, the a posteriori probabilities are 1/2, 1/2, and

H_y(x) = -[1/2 log 1/2 + 1/2 log 1/2]
       = 1 bit per symbol

or 1000 bits per second. The rate of transmission is then 0 as it should be.

The following theorem gives a direct intuitive interpretation of the equivocation and also serves to justify it

as the unique appropriate measure. We consider a communication system and an observer (or auxiliary device) who can see both what is sent and what is recovered (with errors due to noise). This observer notes the errors in the recovered message and transmits data to the receiving point over a "correction channel" to enable the receiver to correct the errors. The situation is indicated schematically in Fig. 8.


[Figure: source, transmitter, receiver and correcting device in cascade; an observer comparing the sent and recovered messages supplies correction data to the correcting device.]

Fig. 8--Schematic diagram of a correction system.

Theorem 10: If the correction channel has a capacity equal to H_y(x) it is possible to so encode the correction data as to send it over this channel and correct all but an arbitrarily small fraction ε of the errors. This is not possible if the channel capacity is less than H_y(x).

Roughly then, H_y(x) is the amount of additional information that must be supplied per second at the receiving point to correct the received message.

To prove the first part, consider long sequences of received message M' and corresponding original message M. There will be logarithmically T H_y(x) of the M's which could reasonably have produced each M'. Thus we have T H_y(x) binary digits to send each T seconds. This can be done with ε frequency of errors on a channel of capacity H_y(x).

The second part can be proved by noting, first, that for any discrete chance variables x, y, z,

H_y(x, z) >= H_y(x).

The left-hand side can be expanded to give

H_y(z) + H_{yz}(x) >= H_y(x)
H_{yz}(x) >= H_y(x) - H_y(z) >= H_y(x) - H(z).

If we identify x as the output of the source, y as the received signal and z as the signal sent over the correction channel, then the right-hand side is the equivocation less the rate of transmission over the correction channel. If the capacity of this channel is less than the equivocation the right-hand side will be greater than zero and H_{yz}(x) > 0. But this is the uncertainty of what was sent, knowing both the received signal and the correction signal. If this is greater than zero the frequency of errors cannot be arbitrarily small.

Example:

Suppose the errors occur at random in a sequence of binary digits: probability p that a digit is wrong and q = 1 - p that it is right. These errors can be corrected if their position is known. Thus the correction channel need only send information as to these positions. This amounts to transmitting from a source which produces binary digits with probability p for 1 (incorrect) and q for 0 (correct). This requires a channel of capacity

-[p log p + q log q]

which is the equivocation of the original system.
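The same binary-entropy expression governs the 1000-symbol-per-second example above. A short numerical check (a sketch; the function name is my own) recovers the equivocation of 81 bits per second and the rate of 919 bits per second quoted there, as well as the zero rate in the extreme case p = 1/2:

```python
import math

def equivocation(p_err):
    """Per-symbol equivocation H_y(x) for a binary symmetric channel with
    equiprobable inputs and crossover probability p_err: -[p log p + q log q]."""
    if p_err in (0.0, 1.0):
        return 0.0
    q = 1.0 - p_err
    return -(q * math.log2(q) + p_err * math.log2(p_err))

rate = 1000  # symbols (and source bits) per second, as in the text
for p in (0.01, 0.5):
    Hyx = equivocation(p)
    R = rate * (1.0 - Hyx)  # R = H(x) - H_y(x), per second
    print(p, round(rate * Hyx), round(R))
```

For p = 0.01 this prints an equivocation of 81 bits per second and a rate of 919; for p = 1/2 the equivocation is the full 1000 bits per second and the rate is 0.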

The rate of transmission R can be written in two other forms due to the identities noted above. We have

R = H(x) - H_y(x)
  = H(y) - H_x(y)
  = H(x) + H(y) - H(x, y).



The first defining expression has already been interpreted as the amount of information sent less the uncertainty of what was sent. The second measures the amount received less the part of this which is due to noise. The third is the sum of the two amounts less the joint entropy and therefore in a sense is the number of bits per second common to the two. Thus all three expressions have a certain intuitive significance.
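The equality of the three expressions can be verified numerically on any joint distribution. The sketch below (the joint table P(x, y) is an illustrative assumption, not from the text) computes all three forms, using H_y(x) = H(x, y) - H(y) and H_x(y) = H(x, y) - H(x) from the identities above:

```python
import math

def H(dist):
    # Entropy in bits of a probability distribution given as a list of masses.
    return -sum(p * math.log2(p) for p in dist if p > 0)

# A small assumed joint distribution P(x, y) for a noisy binary channel.
Pxy = {(0, 0): 0.45, (0, 1): 0.05, (1, 0): 0.05, (1, 1): 0.45}

Px = [sum(p for (x, _), p in Pxy.items() if x == xv) for xv in (0, 1)]
Py = [sum(p for (_, y), p in Pxy.items() if y == yv) for yv in (0, 1)]
Hx, Hy, Hxy = H(Px), H(Py), H(list(Pxy.values()))

R1 = Hx - (Hxy - Hy)   # H(x) - H_y(x)
R2 = Hy - (Hxy - Hx)   # H(y) - H_x(y)
R3 = Hx + Hy - Hxy     # H(x) + H(y) - H(x, y)
print(R1, R2, R3)      # all three agree
```

All three numbers coincide, as the algebra guarantees; the common value is what later came to be called the mutual information of x and y.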

The capacity C of a noisy channel should be the maximum possible rate of transmission, i.e., the rate when the source is properly matched to the channel. We therefore define the channel capacity by

C = Max (H(x) - H_y(x))

where the maximum is with respect to all possible information sources used as input to the channel. If the channel is noiseless, H_y(x) = 0. The definition is then equivalent to that already given for a noiseless channel since the maximum entropy for the channel is its capacity.
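This definition can be exercised directly for a binary symmetric channel. In the sketch below (the function names and the brute-force grid search are my own devices, not the paper's method), the rate in the form H(y) - H_x(y) is maximized over input distributions; the maximum falls at the uniform input, giving the familiar C = 1 - H(p_e):

```python
import math

def h2(p):
    # Binary entropy function, in bits.
    return 0.0 if p in (0.0, 1.0) else -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

def bsc_rate(px1, pe):
    """R = H(y) - H_x(y) for a binary symmetric channel with input P(x=1) = px1
    and crossover probability pe; H_x(y) = h2(pe) regardless of the input."""
    py1 = px1 * (1 - pe) + (1 - px1) * pe
    return h2(py1) - h2(pe)

pe = 0.01
# Brute-force the maximization over input distributions (grid search).
C = max(bsc_rate(px1 / 1000, pe) for px1 in range(1001))
print(C)  # ~0.919 bits per symbol, i.e. 1 - h2(0.01), at the uniform input
```

Note the agreement with the equivocation example: at the capacity-achieving uniform input, C is exactly the 919 bits per second figure divided by the 1000-symbol rate.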

XIII. THE FUNDAMENTAL THEOREM FOR A DISCRETE CHANNEL WITH NOISE

It may seem surprising that we should define a definite capacity C for a noisy channel since we can never send certain information in such a case. It is clear, however, that by sending the information in a redundant form the probability of errors can be reduced. For example, by repeating the message many times and by a statistical study of the different received versions of the message the probability of errors could be made very small. One would expect, however, that to make this probability of errors approach zero, the redundancy of the encoding must increase indefinitely, and the rate of transmission therefore approach zero. This is by no means true. If it were, there would not be a very well defined capacity, but only a capacity for a given frequency of errors, or a given equivocation; the capacity going down as the error requirements are made more stringent. Actually the capacity C defined above has a very definite significance. It is possible to send information at the rate C through the channel with as small a frequency of errors or equivocation as desired by proper encoding. This statement is not true for any rate greater than C. If an attempt is made to transmit at a higher rate than C, say C + R_1, then there will necessarily be an equivocation equal to or greater than the excess R_1. Nature takes payment by requiring just that much uncertainty, so that we are not actually getting any more than C through correctly.

    The situati