SOURCE CODING ALGORITHMS FOR FAST DATA COMPRESSION
A DISSERTATION
SUBMITTED TO THE DEPARTMENT OF ELECTRICAL ENGINEERING
AND THE COMMITTEE ON GRADUATE STUDIES
OF STANFORD UNIVERSITY
IN PARTIAL FULFILLMENT OF THE REQUIREMENTS
FOR THE DEGREE OF
DOCTOR OF PHILOSOPHY
BY
Richard Clark Pasco
May 1976
© Copyright 1976 by Richard Clark Pasco
ABSTRACT
Noiseless source coding, or noiseless data compression, is a one-
to-one mapping between data and a more compact representation. Invertible
arithmetic algorithms are presented which encode strings of random source
symbols with known conditional probabilities into strings of symbols for
a channel. One algorithm encodes blocks of fixed length into codewords
satisfying the prefix condition whose expected length exceeds the source
Shannon entropy by at most two symbols plus an exponentially decreasing
function of computational precision. The new process differs from pre-
vious coding algorithms, such as Huffman coding, in that computation time
grows only linearly with block length, permitting real-time compression
at rates arbitrarily close to the source entropy rate by using very long
blocks. A similar algorithm encodes variable length source strings into
fixed size codewords with comparable results. A generalized structure is
proposed to unify the new algorithms with those of Elias and Rissanen.
Acknowledgements
I wish to thank: Professor Thomas M. Cover, my advisor, whose
guidance, technical insight, and direction was invaluable; Professor
Robert M. Gray for his editorial assistance; Bell Telephone Laboratories
for salary and expenses during my study; U.S. Air Force Contract F44620-
74-C-0068 for computer time; my son, Matthew, for cheerful diversion
from my work; and Ms. Katherine Adams for typing the final manuscript.
Table of Contents

1. INTRODUCTION AND NOTATION
2. FIXED-TO-VARIABLE CODES
   2.1 Problem Statement and History
   2.2 The Fixed-to-Variable Algorithms
       2.2.1 The encoding algorithm
       2.2.2 The decoding algorithm
       2.2.3 Codeword set is proper
       2.2.4 Compression rate
       2.2.5 Computational complexity
3. VARIABLE-TO-FIXED CODES
   3.1 Problem Statement and History
   3.2 The Variable-to-Fixed Algorithms
4. EXPERIMENTAL IMPLEMENTATION
5. GENERALIZATION
Appendix A  EFFECTS OF QUANTIZATION OF PROBABILITIES
Appendix B  MAXIMIZING VF MESSAGE LENGTH DOES NOT MINIMIZE RATE
Appendix C  PROGRAM LISTINGS
Bibliography
CHAPTER 1
INTRODUCTION AND NOTATION
This paper presents new algorithms for variable rate noiseless source
coding of data from discrete-time, finite-alphabet sources. The new algo-
rithms are fast, achieving compression rates arbitrarily close to the
source entropy while requiring a computational effort of two multiplica-
tions and one addition per symbol to be encoded. This speed permits coding
operations to be performed in real-time, eliminating the need for storage
of tables of codewords.
To put the new algorithms in perspective, we explain some terms used
in the previous paragraph. Source coding is a translation from redundant
source data to a more compact channel representation. By noiseless, we
mean that the source data may be reconstructed exactly given the correct
channel data.
To facilitate the study of the new algorithms, we make some definitions.
A set S of finite strings of symbols from a finite discrete alphabet A
is proper [Jelinek and Schneider, 1972] if and only if no string in S is
the prefix of any other string in S , and S is complete [Gilbert and
Moore, 1959] if and only if every infinite string of symbols from A is
prefixed by some string in S . Table 1 illustrates examples drawn from
the binary alphabet A = {0,1} .

Table 1. Examples of Complete and Proper Sets of Binary Strings
[Table entries are not recoverable from the transcript.]
These concepts are well known in the theory of variable length codes:
a proper set of codewords is called a prefix or instantaneous code
[Abramson, 1963]; a complete set of codewords is called an exhaustive
code [Gilbert and Moore, 1959].
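Both properties are easy to check mechanically. The following Python sketch (my own illustration; the names are not from the thesis) tests a finite set of strings for properness, and for completeness up to the length of its longest member, which suffices for a finite set:

```python
from itertools import product

def is_proper(S):
    # Prefix-free: no member of S is the prefix of another member.
    return not any(a != b and b.startswith(a) for a in S for b in S)

def is_complete(S, alphabet):
    # Every infinite string over the alphabet must be prefixed by some
    # member of S.  For finite S it is enough to check every string
    # whose length equals that of the longest member.
    depth = max(len(m) for m in S)
    return all(
        any("".join(t).startswith(m) for m in S)
        for t in product(alphabet, repeat=depth)
    )
```

For example, over A = {0,1}, the set {0, 10, 11} is both complete and proper; {0, 11} is proper but not complete; and {0, 1, 10} is complete but not proper.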
Cohn [1976] has described a new formal structure for source codes.
The following definition is intended to agree with his use of the same
term: A noiseless block-to-block code is a one-to-one mapping from a com-
plete and proper message set of strings of source symbols into a proper
codeword set of strings of channel symbols. The elements of the domain
will be called messages; the elements of the range will be called codewords.
In this paper we will restrict our attention to noiseless block-to-
block codes. This restriction leads to some useful properties. A one-to-
one mapping is invertible; every codeword represents a unique message. A
complete message set guarantees that every possible source sequence can be
encoded. A proper message set allows the encoder to transmit a codeword
immediately upon receipt of the last symbol of a message. Finally, a proper
codeword set allows the decoder to output the decoded message instanta-
neously upon receipt of the last symbol of a codeword.
A performance measure associated with source codes is their rate, which
we shall define by

    R = EL / EN ,    (1)
where EN is the expected message length and EL is the expected codeword
length. This expression for block-to-block codes was presented by Cohn as
a theorem derived from a more general definition.
Two specific types of block-to-block codes are especially easy to im-
plement. If the message set is the set of all strings of fixed length N
source symbols, the code is called fixed-to-variable (FV). Alternately, if
the codeword set is constrained to contain only codewords of fixed length
L channel symbols, the code is called variable-to-fixed (VF). If neither
condition is met, the code is called variable-to-variable. It is perhaps
of historical interest to know that most early block-to-block codes were
of the FV category [Huffman, 1952, and Gilbert and Moore, 1959]. Variable-
to-variable [Golomb, 1966] and variable-to-fixed [Tunstall, 1968, Schalkwijk,
1972, and Jelinek and Schneider, 1972 and 1974] codes have appeared more
recently.
We will make the notational convention that if x is a singly-
infinite (hereafter shortened to infinite) sequence, x_i will denote the
i-th symbol of x and

    x(i) = x_1 x_2 ... x_i

will denote the i-symbol prefix of x .
Suppose an information source emits the infinite random sequence X
of random symbols from an M-ary alphabet I_M = {0, 1, ..., M-1} , and
let x denote a particular M-ary sequence. Assume that the conditional
probability distributions

    p_i(x_i) = P(X_i = x_i | X(i-1) = x(i-1))    (2)

are known for all i and for all x . The probability that a particular
finite string x(n) ∈ I_M^n prefixes X is given by

    P(x(n)) = P(X(n) = x(n)) = ∏_{i=1}^{n} p(x_i | x(i-1)) ,    (3)

where x(0) is the null string.
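The chain-rule product (3) can be accumulated one conditional factor at a time. In the sketch below (my own; `cond_prob` is a hypothetical callable standing in for the known source statistics), each factor is p(x_i | x(i-1)):

```python
def string_probability(x, cond_prob):
    # P(x(n)) per (3): the product of p(x_i | x(i-1)) over the string,
    # with x(0) the null prefix.
    p = 1.0
    for i, sym in enumerate(x):
        p *= cond_prob(sym, x[:i])
    return p

# A memoryless binary source with p(1) = 0.3, p(0) = 0.7:
bernoulli = lambda sym, prefix: 0.3 if sym == 1 else 0.7
```

For the memoryless source above, P(1 0 1) = 0.3 × 0.7 × 0.3 = 0.063.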
Suppose that a D-ary noiseless channel is available which can transmit
symbols from I_D = {0, 1, ..., D-1} with equal cost and without error. In-
tuitively, the average information content, measured in D-ary digits, of
the i-th source symbol X_i is the conditional entropy of X_i given the
past. The entropy of X_i conditioned on a specific past is

    H_D(X_i | x(i-1)) = - Σ_{m ∈ I_M} p(m | x(i-1)) log_D p(m | x(i-1))    (4)

and the conditional entropy of X_i given the past is

    H_D(X_i | X(i-1)) = Σ_{x(i-1)} P(x(i-1)) H_D(X_i | x(i-1)) .    (5)
Let B be a complete and proper message set from I_M and let
b(x) ∈ B be the unique message which prefixes x . For all b in B ,
let N(b) be its length and b_i for 1 ≤ i ≤ N(b) be the i-th symbol of
b . The probability that message b ∈ B prefixes X , given by (3) with
n = N(b) and x(N(b)) = b , is

    P(b) = ∏_{i=1}^{N(b)} p(b_i | b(i-1)) .    (6)

The expected length of the first message is

    EN_1 = Σ_{b ∈ B} P(b) N(b) .    (7)
Let L(b) denote the length of the codeword assigned to message b ∈ B .
The expected length of the first codeword is

    EL_1 = Σ_{b ∈ B} P(b) L(b) .    (8)

Finally, we may define the entropy, or average information content, of the
first message,

    H_D(b(X)) = - Σ_{b ∈ B} P(b) log_D P(b) .    (9)

Because messages and codewords are in a one-to-one correspondence, they
share a common distribution. Thus H_D(b(X)) is also the entropy of the
first codeword.
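Under assumed toy numbers (not from the thesis), the quantities (7)-(9) can be computed directly from a message set, its probabilities, and its codeword lengths:

```python
from math import log

def first_message_stats(messages, D=2):
    # messages: dict mapping message string -> (P(b), L(b)).
    # Returns EN1 per (7), EL1 per (8), and H_D(b(X)) per (9),
    # the entropy measured in D-ary digits.
    EN = sum(p * len(b) for b, (p, L) in messages.items())
    EL = sum(p * L for b, (p, L) in messages.items())
    H = -sum(p * log(p, D) for (p, L) in messages.values() if p > 0)
    return EN, EL, H

# Complete, proper binary message set {0, 10, 11}, coded into itself:
msgs = {"0": (0.5, 1), "10": (0.25, 2), "11": (0.25, 2)}
EN, EL, H = first_message_stats(msgs)
rate = EL / EN  # R per (1)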
In this paper, a new theory for the design of algorithms which quickly
encode data with known statistics at rates approaching the theoretical
minimum will be developed. In Chapter 2, shortcomings of traditional FV
methods will be illustrated and used to motivate the new work. A new FV
compression algorithm will be presented and analyzed in detail. In
Chapter 3, a modified algorithm for VF coding will be similarly analyzed.
In Chapter 4, demonstration implementations of the new algorithms will be
discussed. Finally, in Chapter 5, the theory will be generalized and re-
lated to other work in the field. For an overview of the new theory, it
is suggested that the reader go to Chapter 5 next, and then read Chapters
2 through 5 in sequence.
CHAPTER 2
FIXED-TO-VARIABLE CODES
2.1 Problem Statement and History
With fixed-to-variable (FV) codes, the message set is the set of all
M-ary strings of length N ,

    B = I_M^N .    (10)

The message which prefixes the infinite source string X is simply the
first N symbols from X ,

    b(X) = X(N) .    (11)

A FV code assigns a unique distinct codeword from a proper set to each
string in I_M^N . Let L be a random variable whose value is the length
of the codeword assigned to b(X) and hence to X(N) . The rate expres-
sion (1) becomes for FV codes

    R = EL / N .    (12)
The FV coding problem is the design of algorithms which assign codewords
to messages in I_M^N with minimal rate R and reasonable computational
complexity.
Shannon [1948] and Fano [1948] proved that there exists no FV code
which maps message set I_M^N into a codeword set with expected codeword
length less than H_D(X(N)) and that there always exists a code with ex-
pected codeword length bounded by

    EL < H_D(X(N)) + 1 ,    (13)
and Huffman [1952] presented an algorithm for finding the optimum such
code. By making N arbitrarily large, a compression rate EL/N approaching
the per-symbol entropy H_D(X(N))/N could be achieved, but this is not
practical when codes need to be computed in real-time. Real-time coding
is necessary when little memory is available for table storage and the
dependence of the conditional distribution of the source on the past is
complex. The impracticality of real-time Huffman coding with large N
lies in the number of computations required to assign codewords to source
strings.
Huffman's algorithm maps a probability distribution over a message
set into a set of codewords with integer lengths. To make the integer-
length constraint have negligible effect on the rate of the code, the
messages and codewords are made very long. The number of messages n
grows exponentially with the message length N . The computation time
grows faster yet: Van Voorhis [1975] has shown that a straightforward
implementation of Huffman's algorithm on a set of n messages requires
O(n^2) calculations to assign codewords, where the notation f(x) =
O(g(x)) means that the ratio f(x)/g(x) is bounded. A clever implemen-
tation of Huffman's algorithm still requires O(n log n) steps. With
n = M^N this means that O(N M^N) operations per block are needed. It is
clearly impractical to implement these algorithms as N grows.
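The O(n log n) figure corresponds to a priority-queue construction. The sketch below is my own illustration (not the thesis's): it builds binary Huffman codeword lengths by repeatedly merging the two least probable subtrees; only the heap operations match the O(n log n) count, since the per-merge list bookkeeping is glossed over:

```python
import heapq

def huffman_lengths(probs):
    # Binary (D = 2) Huffman construction: each merge of the two least
    # probable subtrees deepens every leaf inside them by one digit.
    heap = [(p, [i]) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    lengths = [0] * len(probs)
    while len(heap) > 1:
        p1, s1 = heapq.heappop(heap)
        p2, s2 = heapq.heappop(heap)
        for i in s1 + s2:
            lengths[i] += 1
        heapq.heappush(heap, (p1 + p2, s1 + s2))
    return lengths
```

For the dyadic distribution (1/2, 1/4, 1/8, 1/8) this yields lengths (1, 2, 3, 3), so EL equals the entropy exactly; the point of the passage above is that running this on all n = M^N messages is hopeless for large N.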
In this paper, however, a different approach is taken. With arithmetic
coding, instead of coding over the set of all source strings of
length N , single source symbols are encoded individually and the code
elements are combined arithmetically into a codeword. Various ways in
which the code elements may be combined are discussed in Chapter 5.
Consider the communications system shown in Figure 1.
Figure 1. Communication System Block Diagram
[Block diagram not recoverable from the transcript: source → predictor
and encoder (compressor) → noiseless channel → predictor and decoder
(expander) → user.]
The compressor, consisting of a predictor and an encoder, implements the
function defined as a code in the previous section. The predictor con-
tains the knowledge of the source statistics. By observing previous source
symbols, the predictor informs the encoder of the conditional probability
distribution of the next symbol to be emitted. The encoder grows a code-
word by combining this information with the actual symbol emitted. By an
analogy to be developed, the encoder maps source symbols into code ele-
ments whose lengths are not constrained to be integral numbers of channel
symbols. A message maps into N of these elements which are packed
snugly into a large codeword, necessarily of integer length for trans-
mission. The codeword is transmitted through a noiseless channel to the
expander, which inverts the work of the compressor. The expander contains
a replica of the predictor in the encoder, and a decoder which uses the
predictor's output to dismantle the codeword. This is possible because
the decoder needs to know only the distribution for the symbol it is
working on, and it outputs symbols sequentially as it extracts them from
the codeword. The integer-length constraint is made to have negligible
effect on the rate by increasing the block length. This is practical
because the computational complexity grows only linearly with N , the
number of source symbols encoded in the block.
Arithmetic coding is a generalization of a procedure due to Elias
(unpublished result, explained by Jelinek in [1968a] and [1968b]). Briefly,
Elias' idea is to define a cumulative distribution function (CDF) on a
lexicographic ordering of the set of all strings of length N . To encode
a given message, the CDF is evaluated at the message. In our notation,
we say of sequences x and y that x < y if and only if x_i < y_i for
the minimum i such that x_i ≠ y_i . Define the symbol CDF

    C(x_i) = Σ_{m < x_i} p(m | x(i-1))    (14)

and the string CDF

    F(x(N)) = Σ_{y(N) < x(N)} P(y(N)) .    (15)
Elias observed that F(x(N)) is the sum of the probabilities of N sets
of strings, where the i-th set contains all strings which first differ
from x by being less in the i-th symbol, so

    F(x(N)) = Σ_{i=1}^{N} P(x(i-1)) C(x_i) .    (16)

The probability of a string may be calculated from the probability of a
string one symbol shorter and the conditional probability of the last
symbol,

    P(x(i)) = P(x(i-1)) p(x_i | x(i-1)) .    (17)
Elias exhibited a sequential algorithm based on (16) and (17) for comput-
ing the message CDF F(x(N)) . Elias coding is a special case of
arithmetic coding because the symbol CDF (14) maps source symbols into
code elements and the sum (16) combines them into a codeword. By its
definition, F(x(N)) is a monotonic function of x(N) and therefore an
invertible function. At each message x(N) , F(x(N)) increases by a
step of height P(x(N)) ; hence specifying any point in the interval
[F(x(N)), F(x(N)) + P(x(N))) uniquely specifies x(N) . Such a point
may be obtained by truncating the D-ary expansion of F(x(N)) + P(x(N))
to ⌈-log_D P(x(N))⌉ digits, where ⌈z⌉ = the least integer not less than
z . By (9) the expected number of digits lies in the interval
[H_D(X(N)), H_D(X(N)) + 1) .

Unfortunately, Jelinek's statement that the computational complexity
of Elias' algorithm grows linearly with the message length N is only
correct for small N , where the product in (17) can be represented in
one computer word. For large N , multiprecision techniques must be
used to accurately represent P(x(i)) . Suppose that the conditional
probability p(x_i | x(i-1)) is expressed in J digits for all i . Then,
by (17), P(x(i)) has J more digits than P(x(i-1)) , or a total of
iJ digits. The complexity of each of the N multiplications grows
linearly toward NJ . Therefore the total complexity of encoding a mes-
sage of length N is O(N^2 J) ; i.e., it grows as the square of the
message length.
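The digit growth is easy to observe with exact rationals: each multiplication by a J-digit conditional probability appends J digits to the exact representation of P(x(i)). A small demonstration of my own, with an assumed memoryless p = 0.3 source (J = 1 decimal digit):

```python
from fractions import Fraction

p = Fraction(3, 10)   # a J = 1 decimal-digit conditional probability
P = Fraction(1)
denominator_digits = []
for i in range(1, 11):
    P *= p            # exact P(x(i)) = (3/10)**i
    denominator_digits.append(len(str(P.denominator)))
# The exact representation grows by one decimal digit per symbol, so N
# exact multiplications cost O(N**2 * J) digit operations altogether.
```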
Another problem with Elias' algorithm is that the resulting code-
words do not satisfy the prefix condition; the codeword set is not proper.
Therefore it is necessary either to attach a length-indicating prefix to
each codeword or to insert a comma between codewords. Either technique
introduces an inefficiency, lengthening the average codeword by approxi-
mately logD N digits.
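Before these difficulties are addressed, Elias' procedure itself can be sketched in exact rational arithmetic (accepting the quadratic cost just described). This memoryless rendering is my own: it accumulates F(x(N)) and P(x(N)) by (16) and (17), then truncates F + P to ⌈-log_D P⌉ digits:

```python
from fractions import Fraction
from math import ceil, log

def elias_encode(x, probs, D=10):
    # probs: list of Fraction symbol probabilities (memoryless source
    # for brevity; the scheme extends to conditional distributions).
    F = Fraction(0)  # string CDF, built up by (16)
    P = Fraction(1)  # string probability, built up by (17)
    for sym in x:
        C = sum(probs[:sym], Fraction(0))  # symbol CDF (14)
        F += P * C
        P *= probs[sym]
    L = ceil(-log(P) / log(D))   # number of digits kept
    W = int((F + P) * D**L)      # truncate F + P to L digits
    return W, L, F, P

half = Fraction(1, 2)
W, L, F, P = elias_encode([1, 0, 1], [half, half])
# The point W / D**L lies in [F, F + P), so it identifies the message.
```

Because the codewords carry no length information, a real Elias decoder would still need the comma or length prefix discussed above.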
2.2 The Fixed-to-Variable Algorithms
In this section an encoding algorithm, which maps messages of N
source symbols into variable-length codewords, and its inverse for decod-
ing are presented. The length L of the codeword W depends on the
message and the source statistics, with expected length bounded by

    H_D(X(N)) + 1 ≤ EL < H_D(X(N)) + 2 + N·V_D(K) ,

where V_D(K) is an exponentially decreasing function of the computational
precision K , and can be made negligibly small with convenient values
of K . The compression rate can therefore be made arbitrarily close to
H_D(X(N))/N by making N large enough. This is done without penalty,
for computational complexity grows linearly with N . More specifically
it grows as O(NJ(K+M)) , where N is the message length, J is the
number of digits in the D-ary representation of the source probabilities,
K is the precision in D-ary digits of internal arithmetic, and M is
the source alphabet size. The algorithms require that radix-D notation
be used for number representation and arithmetic. Therefore, although D
is arbitrary, implementation on a digital computer is easiest when D is
a power of 2 .

Although the structure of the encoding algorithm presented in this
section strongly resembles Elias coding, the codeword is no longer sub-
ject to interpretation as a cumulative distribution function on the set
of all messages of length N . This is because finite-precision arith-
metic is used to achieve the linear growth of computational complexity
with message length. It was experimentally confirmed that the obvious
scheme of rounding the results of all calculations in Elias' algorithm
fails; losses in precision prohibit all but the first few symbols from
being decoded correctly. The significant result here is that by using
the proper strategy for limiting precision in encoder and decoder, the
message may be decoded without error, and only a small penalty in code-
word length is paid.
Another new result is that by making the codewords just one digit
longer than with Elias coding, a proper codeword set results. Therefore
no commas or length-indicating prefixes with their inherent inefficiencies
are needed.
2.2.1 The encoding algorithm
We will begin our study with the encoding algorithm. First we will
examine the data structures in the computer memory during the execution of
the algorithm. The encoding algorithm is initially presented in the struc-
tured form of a high-level computer language, then translated into a
recursive notation for analysis. Following this we will develop an in-
tuitive understanding of its operation, and finally mathematically
formalize this understanding.
Four variables, Q , C , F , and T , will represent the data struc-
tures for the encoding algorithm.
Let Q be an array containing M fixed-point J-digit fractional
D-ary numbers. Each number Q(m) , where m ∈ I_M , has a rational value
defined by

    Q(m) = Σ_{j=1}^{J} q_mj D^{-j} ,

where the q_mj ∈ I_D are the digits in its D-ary expansion. Assuming
that the p(x_i | x(i-1)) are multiples of D^{-J} for all x_i ∈ I_M , array
Q will hold conditional probabilities for the i-th symbol to be encoded
provided by the predictor,

    Q_i(x_i) = p(x_i | x(i-1)) for all x_i ∈ I_M .    (18)
If for some i the p(x_i | x(i-1)) are not multiples of D^{-J} , then the
Q_i can be made arbitrarily close to the actual probabilities by increas-
ing their precision J . Appendix A finds a precision J sufficient to
place a given bound on the added codeword length due to this quantization.
Typically, J will be on the order of a computer word length. The Q_i
must satisfy

    0 ≤ Q_i(x_i) < 1 for all x_i ∈ I_M ,    (19)

    Q_i(x_i) is a multiple of D^{-J} for all x_i ∈ I_M ,    (20)

and

    Σ_{m ∈ I_M} Q_i(m) = 1 .    (21)
Let C be another array, formatted exactly like Q . Array C is
filled with the cumulative distribution function given by C_i(0) = 0
and

    C_i(x_i) = Σ_{m < x_i} Q_i(m) for all positive x_i ∈ I_M .    (22)

Therefore (21) and (22) imply that for all x_i ∈ I_M ,

    C_i(x_i) + Q_i(x_i) ≤ 1 .    (23)
Let F be a multiprecision fixed-point fractional D-ary number.
While it will rarely be needed, enough storage for JN digits should be
available. A pointer may be used to indicate the least significant
nonzero digit. The rational value of F is defined by

    F = Σ_j f_j D^{-j} ,    (24)

where the f_j ∈ I_D are the digits in its D-ary expansion. We will soon
see how F is used.
Let T be a normalized floating point number, consisting of two
parts: a significand field containing K D-ary digits t_0 t_1 t_2 ... t_{K-1}
satisfying t_k ∈ I_D for 0 ≤ k ≤ K-1 and t_0 ≠ 0 , and an exponent field
containing the nonnegative integer τ . Later we shall establish require-
ments for K and bounds on τ . Typically K will be on the order of a
computer word length. The rational value of T is defined to be

    T = D^{-τ} Σ_{k=0}^{K-1} t_k D^{-k} .    (25)

Using positional scientific notation, T would be written
t_0.t_1 t_2 ... t_{K-1} D-τ . From this definition it follows that for any
positive integer n ,

    τ > n if and only if T < D^{-n} ,

    τ = n if and only if D^{-n} ≤ T < D^{-n+1} ,    (26)

and

    τ < n if and only if D^{-n+1} ≤ T .
The encoding algorithm follows.
Algorithm FVE (Fixed-to-Variable Encoding):

(1) Set F ← 0 .
(2) Set T ← 1 .
(3) For i ← 1, 2, 3, ... N do:
    (3a) Load the Q and C arrays from the predictor.
    (3b) Get symbol x_i from the source.
    (3c) Set F ← F + T·C_i(x_i) .
    (3d) Set T ← T·Q_i(x_i) truncated to K significant digits.
(4) Set L ← τ + 1 .
(5) Truncate F to L digits and add D^{-L} (i.e. set F ← D^{-L}(⌊D^L F⌋ + 1) ).
(6) Transmit the L digits of F .
(7) Go to 1 and begin encoding the next message.
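Algorithm FVE can be prototyped directly in exact rational arithmetic, with the K-significant-digit truncation of step 3d made explicit. The sketch below is my own rendering of the steps above, assuming for brevity a fixed, memoryless predictor (the same Q and C arrays at every i); fed a six-symbol distribution inferred to be consistent with the worked example later in this section (the printed table is garbled), it reproduces the codeword 2198:

```python
from fractions import Fraction

def truncate_sig(z, K, D):
    # Truncate 0 < z <= 1 to K significant D-ary digits; returns the
    # truncated value and its exponent tau, with D**-tau <= value.
    tau = 0
    while z < Fraction(1, D**tau):
        tau += 1
    scale = D**(tau + K - 1)
    return Fraction(int(z * scale), scale), tau

def fve_encode(x, Q, C, K, D=10):
    # Algorithm FVE with a fixed memoryless predictor: Q[m] and C[m]
    # play the roles of Q_i and C_i in (18)-(22) for every i.
    F = Fraction(0)                              # step 1
    T, tau = Fraction(1), 0                      # step 2
    for sym in x:
        F += T * C[sym]                          # step 3c
        T, tau = truncate_sig(T * Q[sym], K, D)  # step 3d
    L = tau + 1                                  # step 4
    W = int(F * D**L) + 1                        # step 5: truncate, add D**-L
    return W, L                                  # codeword digits and length

# Distribution inferred to be consistent with the worked example:
Q = [Fraction(n, 100) for n in (1, 8, 1, 80, 1, 9)]
C = [sum(Q[:m], Fraction(0)) for m in range(6)]
W, L = fve_encode([3, 3, 1, 3, 3, 3, 3, 5, 3, 3], Q, C, K=2)
```

Exact `Fraction` arithmetic stands in for the thesis's fixed-point registers; the only deliberate precision loss is the truncation of T, exactly as in step 3d.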
For analysis it is desirable to attach a distinct label to each value
taken on by the variables in Algorithm FVE. Let F_i and T_i denote the
values attained by F and T respectively after the i-th iteration of
step FVE-3. Thus

    F_0 = 0    (27)

and

    T_0 = 1 ,    (28)

and, for 1 ≤ i ≤ N ,

    F_i = F_{i-1} + T_{i-1} C_i(x_i)    (29)

and

    T_i = T_{i-1} Q_i(x_i) truncated to K significant digits.    (30)

Note that because Q_i(x_i) < 1 , (30) implies that {T_i} is a decreasing
sequence. Let τ_N denote the value contained in the exponent field of
T after the N-th iteration of step FVE-3, and let W denote the result
of step FVE-5. In this notation, steps FVE-4 and FVE-5 become

    L = τ_N + 1    (31)

and

    W = D^{-L} (⌊D^L F_N⌋ + 1) ,    (32)

where ⌊z⌋ = the greatest integer not exceeding z .

To develop an intuitive understanding of the operation of Algorithm
FVE, suppose that K → ∞ and there is no truncation in step FVE-3d. Then

    T_i = P(x(i))

and

    F_i = F(x(i)) ,

where the right-hand quantities are as defined for Elias coding. When T
is truncated to K digits, however, this interpretation is invalid. One
valid interpretation is that F is an overlapped concatentation of code
elements C(xi) , with successive code elements assigned less significant
digits of F . T is then a pointer indicating where in accumulator F
the next code element may be placed without overlapping its predecessor
too much. Another interpretation is that F is a sum of scaled code
elements, and T is a scale factor which places a sufficiently small
weight on each code element so that it does not interfere with its pre-
decessor. Under these interpretations, the necessity of truncation rather
than rounding is clear. If T were rounded up in step FVE-3d, successive
code elements might be weighted so heavily that they could interfere with
preceding code elements.
In the following example, Algorithm FVE encodes the sequence
3 3 1 3 3 3 3 5 3 3 from I_6 into the decimal codeword 2198. Let M = 6 ,
D = 10 , N = 10 , J = 2 , and K = 2 . Suppose the source emits independent
symbols with identical distributions. By (4), the entropy of each symbol
is H_10(X_i) = 0.319 digits/symbol for all i , and by (9) the entropy of
a message of length 10 is H_10(X(10)) = 3.19 digits/message. (By (45) we
have V_10(2) = .0458 digits/symbol, and the expected codeword length, by
(51), is between 4.19 and 5.65 digits.) We encode the typical sequence
3 3 1 3 3 3 3 5 3 3 . Step FVE-1 sets F_0 = .0 and step FVE-2 sets
T_0 = 1.0D-0 , where the notation a.bD-c means a.b × D^{-c} , the D
being used to separate the significand field of T from its exponent
field τ . Table 2 lists the results of step FVE-3.

Table 2. Example of Step FVE-3
[Table rows (i, x_i, F_i, T_i) not recoverable from the transcript.]
Working through this example may clarify the roles of F as an accumu-
lator containing a sum of scaled code elements and T as a pointer
containing a scale factor. For example, partial sum F_3 = .1864 results
when the code element of the third symbol, C(x_3) = C(1) = .01 , is mul-
tiplied by the scale factor from the previous iteration, T_2 = 6.4D-1 =
.64 , and the product .0064 is added to the previous sum F_2 = .18 . The
new pointer T_3 = 5.1D-2 results when the previous scale factor,
T_2 = 6.4D-1 , is multiplied by the probability of the third symbol,
Q(x_3) = Q(1) = .08 , and the product 5.12D-2 is truncated to two signi-
ficant digits. After the tenth iteration, step FVE-4 sets L = 3+1 = 4 .
Step FVE-5 sets W = .2198 and the codeword transmitted is 2198.
Let us analyze the properties of Algorithm FVE. We begin by build-
ing the necessary theory to show that the codeword can be expressed in L
digits, that the message may be decoded correctly, and that the codeword
set is proper.
Boundary (27) and recursion (29) imply that F_N is a sum of scaled
code elements,

    F_N = Σ_{i=1}^{N} T_{i-1} C_i(x_i) .    (33)

The truncation in (30) implies the bound

    T_i ≤ T_{i-1} Q_i(x_i) < T_i + D^{-(τ_i + K - 1)} .    (34)

A direct consequence of the final truncation and addition (32) is

    F_N < W ≤ F_N + D^{-L} ,    (35)

and a useful relation is obtained by applying (31) and (25),

    D^{-L} ≤ T_N / D .    (36)
An essential characteristic of Algorithm FVE is that the scale factor T
after each iteration of step FVE-3 bounds from above the scaled sum of
all subsequent code elements; i.e., if a and b are integers and
1 ≤ a ≤ b , then

    Σ_{i=a}^{b} T_{i-1} C_i(x_i) < T_{a-1} .

This result is established in a slightly different form by the following
lemma.

Lemma: For any integers a , b , if 1 ≤ a ≤ b then

    Σ_{i=a}^{b} T_{i-1} C_i(x_i) ≤ T_{a-1} - T_b .    (37)

Proof (by induction on b ):
(1) If b = a , the inequality follows directly from (23) and (34).
(2) If b > a , the inductive step is taken by the following chain of
inequalities, for the reasons noted below:

    Σ_{i=a}^{b-1} T_{i-1} C_i(x_i)
        ≤ T_{a-1} - T_{b-1}                                   (a)
        ≤ T_{a-1} - T_{b-1} C_b(x_b) - T_{b-1} Q_b(x_b)       (b)
        ≤ T_{a-1} - T_{b-1} C_b(x_b) - T_b ,                  (c)

where (a) is the inductive hypothesis; (b) follows from bound (23);
(c) follows from the truncation inequality (34); and (d) results when the
term T_{b-1} C_b(x_b) is included in the summation, giving

    Σ_{i=a}^{b} T_{i-1} C_i(x_i) ≤ T_{a-1} - T_b .    Q.E.D.
We need assurance that the operations of Algorithm FVE did not cause
a carry overflow out of the most significant digit of F , and
that the codeword can therefore be expressed exactly in L digits as in-
tended.
Theorem: Codeword W can be expressed exactly in L digits.

Proof: Because W is a multiple of D^{-L} it is enough to show that
0 ≤ W < 1 . That W ≥ 0 follows from (35) and the fact that all terms
in (33) are nonnegative. That W < 1 is given by

    W ≤ F_N + D^{-L}                                          (a)
      < F_N + T_N                                             (b)
      = T_0 C_1(x_1) + Σ_{i=2}^{N} T_{i-1} C_i(x_i) + T_N     (c)
      ≤ T_0 C_1(x_1) + T_1                                    (d)
      ≤ C_1(x_1) + Q_1(x_1)                                   (e)
      ≤ 1 ,                                                   (f)

where (a) is the upper bound of (35); (b) follows from (36) and D ≥ 2 ;
(c) follows from sum (33); (d) results from Lemma (37) with a = 2 ,
b = N ; (e) follows because (28) and (30) imply T_0 = 1 and
T_1 ≤ Q_1(x_1) ; and (f) is bound (23). Q.E.D.
2.2.2 The decoding algorithm
We have seen the encoding algorithm and some of its properties.
Algorithm FVE maps messages of N source symbols into codewords of vari-
able length L digits. Next we will investigate a decoding algorithm
which maps an infinite string of channel digits into a sequence of mes-
sages.
Algorithm FVD (Fixed-to-Variable Decoding):

(1) Set F ← transmitted string.
(2) Set T ← 1 .
(3) For i ← 1, 2, 3, ... N do:
    (3a) Load the Q and C arrays from the predictor.
    (3b) Set y_i ← max { y : y ∈ I_M and C_i(y) ≤ F/T } .
    (3c) Output decoded symbol y_i .
    (3d) Set F ← F - T·C_i(y_i) .
    (3e) Set T ← T·Q_i(y_i) truncated to K significant digits.
(4) Set L ← τ + 1 , and begin decoding the next codeword, L channel
digits after the beginning of the present codeword.
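A matching decoder sketch (again my own, with the same fixed-memoryless-predictor assumption as the encoder sketch) reads a prefix of the channel string as the initial F and peels off one code element per iteration; on the channel string 2198314159 it reproduces the message of the worked example:

```python
from fractions import Fraction

def truncate_sig(z, K, D):
    # Same K-significant-digit truncation used by the encoder (FVE-3d).
    tau = 0
    while z < Fraction(1, D**tau):
        tau += 1
    scale = D**(tau + K - 1)
    return Fraction(int(z * scale), scale), tau

def fvd_decode(channel, N, Q, C, K, D=10):
    # Algorithm FVD with a fixed memoryless predictor.  `channel` is a
    # string of D-ary digits beginning with the codeword; reading the
    # whole prefix over-fulfils the tau + (K-1) + J digit requirement.
    F = Fraction(int(channel), D**len(channel))                # step 1
    T, tau = Fraction(1), 0                                    # step 2
    y = []
    for _ in range(N):
        sym = max(m for m in range(len(Q)) if C[m] <= F / T)   # step 3b
        y.append(sym)                                          # step 3c
        F -= T * C[sym]                                        # step 3d
        T, tau = truncate_sig(T * Q[sym], K, D)                # step 3e
    L = tau + 1                                                # step 4
    return y, L

# Distribution inferred to be consistent with the thesis example
# (the printed table is garbled):
Q = [Fraction(n, 100) for n in (1, 8, 1, 80, 1, 9)]
C = [sum(Q[:m], Fraction(0)) for m in range(6)]
msg, L = fvd_decode("2198314159", 10, Q, C, K=2)
```

Note that the decoder never needs the codeword boundary in advance: L emerges from the same T recursion the encoder ran, which is exactly the point of the correctness argument that follows.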
In step FVD-3b, only the first J digits to the right of the radix
point of the quotient F/T need be calculated to make the indicated com-
parison, because C is exact in J digits. A brief examination of
elementary long division algorithms shows that this can be done by ex-
amining only J digits of F beyond the least significant nonzero digit
of T . Considering the formats in which F and T are represented,
we see that F need only be known to τ + (K-1) + J digits. Thus in
practice it will be unnecessary to fill the entire buffer at step FVD-1;
rather, digits of F can be read in from the channel as they are refer-
enced.
Let G_0 denote the value of F after the execution of step FVD-1
and let G_i denote the value of F after the i-th iteration of step
FVD-3. By the argument just advanced, only the first digits of G are
explicitly represented in the decoder; the remainder are still in the
channel.
No distinct symbol is needed to denote the values of T during de-
coding, for it will be shown that y_i = x_i , so that the values of T
determined by FVD-2 and FVD-3e will always equal the values determined
by FVE-2 and FVE-3d. Thus T_i may also denote the contents of T after
the i-th iteration of FVD-3.
Briefly, Algorithm FVD correctly decodes the message by correctly
decoding each symbol one at a time. When one symbol is being decoded,
the sum of subsequent scaled code elements is bounded, as shown by Lemma
(37). This bound is small enough so that the current symbol is decoded
correctly and its exact code element is subtracted, exposing the next
code element for decoding. To clarify this, we will use Algorithm FVD
to parse the codeword 2198 generated in the previous example from the
beginning of a decimal channel sequence and to decode the message cor-
rectly.
Again, let M = 6 , D = 10 , N = 10 , J = 2 , and K = 2 . As required, the
predictor provides the same independent, identical distribution as used
in the previous example. We decode the channel string

    2 1 9 8 3 1 4 1 5 9 ... .

Step FVD-1 sets G_0 = .219+ , where + denotes that subsequent digits
have not yet been read in from the channel. Step FVD-2 sets T_0 = 1.0D-0 .
Table 3 lists the results of step FVD-3.

Table 3. Example of Step FVD-3
[Table rows not recoverable from the transcript.]
Let us examine how the decision y_2 = 3 is made. The quotient
G_1/T_1 = .1198+/.80 is computed to two places, resulting in .14 . Since
this is at least as large as C(3) = .10 but not as large as C(4) =
.90 , step FVD-3b sets y_2 = 3 . After the tenth iteration of step
FVD-3, step FVD-4 sets L = 4 and begins the next codeword with
G_0 = .314+ .
We will next formally prove that Algorithm FVD always correctly de-
codes the message. We will see that channel data after the codeword has
too small an effect on G_0 to cause decoding errors.
The decoder sees the codeword W followed by the beginning of the
next codeword. Because the decoder cannot immediately determine where
codeword W ends, step FVD-1 cannot set G_0 = W exactly but can only
bound it by

    W ≤ G_0 < W + D^{-L} .    (38)

Nesting this into bounds (35) on W , we obtain

    F_N < G_0 < F_N + 2 D^{-L} ,

and by (36) this becomes

    F_N < G_0 < F_N + (2/D) T_N .

Since D ≥ 2 ,

    F_N < G_0 < F_N + T_N .

Steps FVD-3b and FVD-3d imply

    y_i = max { y ∈ I_M : C_i(y) ≤ G_{i-1} / T_{i-1} }    (39)

and

    G_i = G_{i-1} - T_{i-1} C_i(y_i) .    (40)
Theorem: Algorithm FVD produces an exact copy of the message,

    y_i = x_i for 1 ≤ i ≤ N .    (41)

Proof (by induction on i ):
Let 1 ≤ j ≤ N and suppose that the first j - 1 symbols have been
correctly decoded. Given y(j-1) = x(j-1) , we will show that y_j = x_j .
That y_1 = x_1 will follow with j = 1 . We will first compute
two bounds on the partial sum G_{j-1} , then will observe that the decision
rule (39) sets y_j = x_j . The reasons for each link in the chain of in-
equalities follow afterward. The lower bound is

    G_{j-1} = G_0 - Σ_{i=1}^{j-1} T_{i-1} C_i(x_i)    (a)
            > F_N - Σ_{i=1}^{j-1} T_{i-1} C_i(x_i)    (b)
            = Σ_{i=j}^{N} T_{i-1} C_i(x_i)            (c)
            ≥ T_{j-1} C_j(x_j) ,                      (d)

where (a) follows from recursion (40) and the inductive hypothesis; (b)
follows from the lower bound F_N < G_0 derived from (38); (c) results from
cancelling terms in sum (33); and (d) results when some nonnegative terms
are dropped. The upper bound is given by

    G_{j-1} = G_0 - Σ_{i=1}^{j-1} T_{i-1} C_i(x_i)                        (a)
            < F_N + T_N - Σ_{i=1}^{j-1} T_{i-1} C_i(x_i)                  (b)
            = T_{j-1} C_j(x_j) + Σ_{i=j+1}^{N} T_{i-1} C_i(x_i) + T_N     (c)
            ≤ T_{j-1} C_j(x_j) + T_j                                      (d)
            ≤ T_{j-1} C_j(x_j) + T_{j-1} Q_j(x_j) ,                       (e)

where (a) follows from recursion (40) and the inductive hypothesis; (b)
follows from the upper bound G_0 < F_N + T_N derived from (38); (c) results
from cancelling terms with (33) and separating the first term from the
sum; (d) results from Lemma (37) with a = j+1 and b = N ; and (e)
follows from the truncation inequality (34). Thus we have

    T_{j-1} C_j(x_j) ≤ G_{j-1} < T_{j-1} ( C_j(x_j) + Q_j(x_j) ) .

Dividing by T_{j-1} yields

    C_j(x_j) ≤ G_{j-1} / T_{j-1} < C_j(x_j) + Q_j(x_j) .

Note that if x_j = M-1 the upper bound is 1 , by (21) and (22); other-
wise the upper bound is C_j(x_j + 1) , by (22). In either case the decision
rule (39) correctly sets y_j = x_j . Q.E.D.
2.2.3 Codeword set is proper
We have seen that the decoding algorithm reproduces the message
correctly regardless of the channel data following the codeword, although
it does examine some of that data during the decoding process. We will
next compute a bound on the number of channel symbols examined beyond the
codeword end, and argue that the codeword set is proper.
In the discussion following the presentation of Algorithm FVD, it
was demonstrated that y_N can be determined by examining only the first
τ_(N-1) + (K-1) + J digits of the channel sequence. Since τ_(N-1) ≤ τ_N and
L = τ_N + 1 , this means that at most L + K + J - 2 channel digits will
be read by Algorithm FVD before the message is completely decoded. L of
these are the codeword; therefore at most J + K - 2 must be saved for
decoding subsequent messages.
That the codeword set is proper (the code is instantaneous) is
established by considering the following modification to the decoder:
Each time a new digit of G is needed, but before reading any actual
channel symbols, the decoder could append an arbitrary string of J + K - 2
digits and attempt to complete decoding. If the decoder determined that
the codeword extended into the trial string, then another channel symbol
would be read. But as soon as the codeword ended with the last symbol
already read in, the decoder would immediately output the message, which
would be correct by Theorem (41). Thus the fact that Algorithm FVD ex-
amines channel data after the codeword during the decoding process is
only a property of the algorithm. By sketching the design of an instan-
taneous decoder, we have shown that the set of codewords satisfies the
prefix condition and therefore is proper.
2.2.4 Compression rate
Let us develop the tools necessary to examine the expected behavior
of the codeword length L .
Lemma: Let Z be a real number satisfying 0 < Z < 1 and let T be
the truncation to K significant digits of the D-ary expansion of Z . Then

    T/Z > 1 - D^(1-K) .

Proof: Represent Z and T in D-ary normalized floating-point form;
i.e., let

    (a)  Z = Σ_(i=0)^∞ t_i D^(-r-i)
and
    (b)  T = Σ_(i=0)^(K-1) t_i D^(-r-i) ,

where t_i ∈ I_D and t_0 ≥ 1 .
By (a) and (b), in analogy to (25),

    (c)  D^(-r) ≤ T ≤ Z < D^(1-r) .

Subtracting (b) from (a),

    (d)  Z - T = Σ_(i=K)^∞ t_i D^(-r-i) < D^(-r-K+1) .

Dividing, applying upper bound (d) to the numerator and lower bound (c)
to the denominator,

    (Z - T)/Z < D^(-r-K+1) / D^(-r) = D^(1-K) .

Subtracting from 1 = Z/Z ,

    T/Z > 1 - D^(1-K) .     Q.E.D.
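The lemma is easy to check numerically. A Python sketch (a modern illustration, not part of the original implementation; `truncate_sig` is a hypothetical helper, and ordinary floating point stands in for exact D-ary truncation):

```python
import math

def truncate_sig(z, D, K):
    """Truncate 0 < z < 1 to K significant digits of its D-ary expansion."""
    # r locates the leading nonzero digit: D**-r <= z < D**(1-r)
    r = 1
    while z < D ** (-r):
        r += 1
    scale = D ** (r + K - 1)            # keep digits down to position D**-(r+K-1)
    return math.floor(z * scale) / scale

D, K = 10, 3
for z in [0.12345, 0.5, 0.9999, 0.0031416]:
    t = truncate_sig(z, D, K)
    assert t <= z                       # truncation never rounds up
    assert t / z > 1 - D ** (1 - K)     # the lemma: T/Z > 1 - D^(1-K)
```

For D = 10 and K = 3 the lemma guarantees that truncation loses less than one percent of the value, as the asserts confirm.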
This lemma and recursion (30) establish a lower bound complementary
to the upper bound given by the truncation inequality (34),

    T_i > ( 1 - D^(1-K) ) Q_i(x_i) T_(i-1) .     (43)

Define

    l_i = log_D T_(i-1) - log_D T_i     (44)
and
    V_D(K) = -log_D ( 1 - D^(1-K) ) .     (45)

Taking logarithms of (43) yields

    log_D ( 1/Q_i(x_i) ) ≤ l_i < log_D ( 1/Q_i(x_i) ) + V_D(K) .     (46)
Roughly speaking, li measures the length of the codeword dedicated
to the i-th symbol from the source, and VD(K) is an upper bound to the
length of the codeword wasted by each truncation in step FVE-3d.
Table 4 shows typical values of V_D(K) . For example, V_2(16) =
4.40 × 10^(-5) means that with a double-precision implementation on an
eight-bit binary microprocessor, less than one ten-thousandth of a code-
word bit is wasted for each source symbol by using finite precision
arithmetic. Similarly, V_16(6) = 3.44 × 10^(-7) means that with a standard
precision 6-digit floating-point hexadecimal representation such as used
on the IBM 370, less than one one-millionth of a hexadecimal channel
digit is wasted per source symbol.
Table 4
Typical values of V_D(K) = -log_D (1 - D^(1-K))
[Table body lost in transcription; the columns list the precision K, in
bits and in hexadecimal digits, against the corresponding values of V_D(K).]
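The quoted entries can be regenerated from the definition of V_D(K). A Python sketch (illustrative only; not part of the thesis's FORTRAN implementation):

```python
import math

def V(D, K):
    """Per-symbol codeword waste V_D(K) = -log_D(1 - D**(1-K))."""
    return -math.log(1 - D ** (1 - K)) / math.log(D)

print(f"V_2(16) = {V(2, 16):.3g}")    # about 4.40e-05 bits per symbol
print(f"V_16(6) = {V(16, 6):.3g}")    # about 3.44e-07 hex digits per symbol

# V_D(K) roughly halves with each extra bit of precision (D = 2),
# the exponential decay asserted by the theorem that follows:
assert V(2, 17) < V(2, 16) / 1.9
```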
Theorem: The function V_D(K) = -log_D (1 - D^(1-K)) decays exponentially with
K . That is, for any c > 1 there exists a K_c such that K ≥ K_c
implies     (47)

    (D / ln D) D^(-K)  ≤  V_D(K)  ≤  c (D / ln D) D^(-K) ,
          (a)                              (b)

where (a) and (b) label the lower and upper bounds.
Proof of (a): Recall the well-known bound, for all z > 0 ,

    ln z ≤ z - 1 .

Let z = 1 - D^(1-K) . Observe that z is positive if K > 1 , which is
assumed. Substituting for z ,

    ln ( 1 - D^(1-K) ) ≤ -D^(1-K) ,

and dividing by -ln D ,

    (D / ln D) D^(-K) ≤ V_D(K) .     Q.E.D. (a).
Proof of (b): Let c > 1 and g(z) = 1 - z - e^(-cz) . Then g'(z) =
-1 + c e^(-cz) . Note that g(0) = 0 , g'(z) is continuous, and g'(0) =
c - 1 > 0 . Thus g(z) is increasing as it passes through the origin.
Therefore there exists a z_c > 0 such that z ∈ [0, z_c] implies g(z) ≥ 0 .
By definition of g(z) this implies that for z ∈ [0, z_c] ,

    1 - z ≥ e^(-cz) .

Taking logarithms to the base D ,

    log_D (1 - z) ≥ -cz / ln D .

Let z = D^(1-K) and K_c = 1 - log_D z_c . If K ≥ K_c then z ∈ [0, z_c] and

    log_D ( 1 - D^(1-K) ) ≥ -c D^(1-K) / ln D .

Reversing signs, K ≥ K_c implies

    V_D(K) ≤ c (D / ln D) D^(-K) .     Q.E.D. (b)
We are now in a position to compute bounds on the rate. By (18),
(46) becomes

    log_D ( 1/P(x_i | x^(i-1)) ) ≤ l_i < log_D ( 1/P(x_i | x^(i-1)) ) + V_D(K) .

By (6), summing this relation over i = 1, 2, 3, ..., N yields

    log_D ( 1/P(x^(N)) ) ≤ Σ_(i=1)^N l_i < log_D ( 1/P(x^(N)) ) + N V_D(K) .     (49)

Codeword length L is given by

    L  =  τ_N + 1                                               (a)
       =  ⌈ -log_D T_N ⌉ + 1                                     (b)
       =  ⌈ log_D T_0 - log_D T_N ⌉ + 1                          (c), (d)
       =  ⌈ Σ_(i=1)^N ( log_D T_(i-1) - log_D T_i ) ⌉ + 1        (e)
       =  ⌈ Σ_(i=1)^N l_i ⌉ + 1 ,                                (f)     (50)

where (a) is (31); (b) follows from (25); (c) is the symmetry between
ceiling and floor; (d) follows because T_0 = 1 in boundary (28); in (e)
the expression is written as a telescoping sum; and (f) follows from the
definition (44) of l_i . Because z ≤ ⌈z⌉ < z + 1 , (50) implies

    Σ_(i=1)^N l_i + 1 ≤ L < Σ_(i=1)^N l_i + 2 .

Nesting this into (49) implies

    log_D ( 1/P(x^(N)) ) + 1 ≤ L < log_D ( 1/P(x^(N)) ) + N V_D(K) + 2 .

Taking expectation yields

    H_D(X^(N)) + 1 ≤ EL < H_D(X^(N)) + N V_D(K) + 2 .     (51)

Dividing by N yields the desired bounds on the rate.
Theorem: Algorithm FVE achieves a compression rate bounded by

    (1/N) H_D(X^(N)) ≤ R < (1/N) H_D(X^(N)) + V_D(K) + 2/N ,

where N is the message length, K is the precision of the internal
register T , and V_D(K) = -log_D (1 - D^(1-K)) is an exponentially decreas-
ing function of K .
Proof: This is a direct consequence of the preceding development, and
results when each term in (51) is divided by N . Q.E.D.
By making N sufficiently large, R can be made arbitrarily close
to the average entropy per symbol of the first N symbols. Next we
shall see that this can be done with reasonable computational complexity.
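The excess of rate over entropy is at most V_D(K) + 2/N per symbol: two channel symbols per block plus the truncation waste. A Python sketch tabulating this overhead for a binary example (the Bernoulli(0.9) source and K = 16 are illustrative assumptions, not parameters from the thesis):

```python
import math

def h2(p):
    """Binary entropy in bits per symbol."""
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def overhead(D, K, N):
    """Upper bound on excess rate over entropy: V_D(K) + 2/N per symbol."""
    V = -math.log(1 - D ** (1 - K)) / math.log(D)
    return V + 2 / N

H = h2(0.9)                                  # about 0.469 bits/symbol
for N in (10, 100, 1000, 10000):
    print(N, H, H + overhead(2, 16, N))      # upper bound shrinks toward H
```

With K = 16 the truncation term is negligible, so the overhead is dominated by 2/N and vanishes as the block length grows, which is the point of the theorem.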
2.2.5 Computational complexity
Let us investigate how the computational complexity of Algorithms
FVE and FVD grows with precisions J and K and block length N .
Step FVE-1 requires a small fixed amount of work once per block.
Even though a large storage area is allocated for F , this step need
not clear it if a pointer Φ is kept to indicate the end of the occupied
area. The memory can be cleared as Φ is advanced when step FVE-3c
needs more storage, as will be discussed below. Step FVE-1 need only ini-
tialize F = 0 and Φ = 1 ; the complexity of this is O(1) .
Step FVE-2 initializes the K significant digits of T and its
exponent field τ ; the complexity is thus O(K) . Step FVE-3 is
executed N times. We compute the complexity of each
iteration. Step FVE-3a requires that 2MJ digits be loaded, although in
some applications this can be reduced. If the source outputs are inde-
pendent and identically distributed (IID), step FVE-3a could be eliminated
entirely, with Q and C held constant. For a finite state Markov
source, several Q and C arrays could be fixed and a pointer switched
to the correct arrays for the state of the source. But for this worst-
case analysis, we shall assume the complexity of FVE-3a is O(MJ) . Step
FVE-3b is trivial. Step FVE-3c requires two stages; the first is calcu-
lation of a J + K digit product from J-digit and K-digit factors. If
J and K are the computer word size, this can usually be done in one
operation. A straightforward multiprecision approach would require JK
operations, although Knuth [1971] has shown more efficient procedures.
We shall assume a complexity of O(JK) for this stage. The second stage
requires the addition of the J + K digit product to a multiprecision
number, requiring J + K operations for the initial addition and a few
more to propagate the carry. That the carry propagation is negligible
is established by the following argument: At worst, suppose the initial
addition always generates a carry. F becomes the codeword of an efficient
source code; therefore its digits are nearly uniformly distributed
over I_D . Propagation of the carry through n additional places re-
quires that n consecutive digits of F have value D - 1 ; this occurs
with probability D^(-n) . Hence the additional distance of carry propaga-
tion is approximately geometrically distributed with mean 1/D , which
is small compared to J + K . If pointer Φ is used to keep track of
the occupied area of F , it need be advanced no more than J digits
from the previous iteration. This is because T is set by FVE-3d and
Q(x_i) is at least D^(-J) . Therefore the overall complexity of step FVE-3c
is the complexity of the multiplication, O(JK) . Step FVE-3d requires
that another J +K digit product be calculated; this complexity is also
taken to be O(JK) . The net complexity of step FVE-3 grows as the sum
of the complexities of its parts, for a total of O(MJ + JK) per itera-
tion. For N iterations this becomes O(NJ(M+K)) . Step FVE-4 is trivial.
Step FVE-5 simply requires that pointer Φ be moved, a digit incre-
mented, and a carry propagated. By the argument established before,
this represents a negligible amount of work.
The computational complexity of Algorithm FVE is summarized on the
next page.
Step              Complexity
FVE-1             O(1)
FVE-2             O(K)
FVE-3             O(NJ(M+K))
  Per iteration:  O(J(M+K))
  FVE-3a          O(MJ)
  FVE-3b          O(1)
  FVE-3c          O(JK)
  FVE-3d          O(JK)
FVE-4             O(1)
FVE-5             O(1)
Total             O(NJ(M+K))
When J and K are one computer word length, and the source model
is such that the Q and C arrays do not have to be computed for each
symbol, then only two single-precision multiplications and one single-
precision addition per symbol perform the encoding functions of step FVE-3.
Algorithm FVE is fast because these are easy operations; supportive exper-
imental results will be discussed in Chapter 4.
We shall now see that decoding is no more complex than encoding. The
complexity of Algorithm FVD grows as for Algorithm FVE because the only
step in Algorithm FVD without a corresponding step in FVE is FVD-3b. We
examine its complexity. As was noted, the division F/T only needs to be
carried out to J digits and the remainder dropped. Dividing a J + K
digit dividend by a K digit divisor to obtain a J digit quotient re-
quires O(JK) operations. The search for y can be conducted as a binary
search when M is large, requiring log M comparisons of at most J
digits each. The search therefore requires O(J logM) operations. Since
FVD-3b is iterated N times, its total complexity grows as O(NJ(K+logM));
this is absorbed into the complexity of the remaining steps for an overall
complexity of O(NJ(K+M)) , the same as for Algorithm FVE.
The complexity of encoding, transmitting, and decoding a block is
thus O(NJ(K+M)) . For constant J , K , and M , this grows linearly
with N .
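Under the assumption that the C array is kept as a sorted table of cumulative probabilities, the FVD-3b search is a textbook bisection. A Python sketch with a hypothetical four-symbol table (floats stand in for the thesis's fixed-point formats):

```python
import bisect

# Hypothetical cumulative table for an M = 4 alphabet: C[y] = sum of Q(0..y-1).
C = [0.0, 0.5, 0.75, 0.875]

def decode_symbol(F, T):
    """Step FVD-3b: y = max{ y : C(y) <= F/T }, in O(log M) comparisons."""
    ratio = F / T
    return bisect.bisect_right(C, ratio) - 1

assert decode_symbol(0.6, 1.0) == 1     # 0.5 <= 0.6 < 0.75
assert decode_symbol(0.1, 1.0) == 0
assert decode_symbol(0.9, 1.0) == 3
```

The linear scan used in the demonstration programs of Chapter 4 computes the same maximum; bisection merely replaces its O(M) cost with O(log M).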
CHAPTER 3
VARIABLE-TO-FIXED CODES
3.1 Problem Statement and History
For variable-to-fixed (VF) codes, the rate expression (1) becomes

    R = L / EN ,

where codeword length L is fixed. The optimal VF code maximizes the
expected message length EN .
We shall see that the problem of finding optimal VF codes for con-
ditional sources is much more difficult than the problem of finding optimal
VF codes for memoryless sources.
Tunstall [1968, quoted in Jelinek and Schneider, 1972] showed an
optimal design procedure for selecting message sets for VF codes for
memoryless sources. His algorithm is in a sense a dual to Huffman's FV
algorithm, and its computational complexity is comparably large when the
number of messages is large.
Lynch [1966], Davisson [1966], and Schalkwijk [1972] exhibited a
VF coding algorithm which does a good job with typical sequences from a
memoryless source, but as pointed out by Cohn [1976] is suboptimal because
of its treatment of atypical sequences. Essentially a dual to Elias
coding, their algorithm suffers from the same computational complexity. A
codeword is a sum of binomial coefficients. For long messages, the bi-
nomial coefficients become very large, and there are many of them in the
sum. The computational complexity grows as the square of the message
length, rendering the algorithm impractical.
There does not appear to have been any published solution to the
problem of selecting optimal message sets for VF codes for conditional
sources. (That Tunstall's procedure is not always optimal for conditional
sources is shown by the example in Appendix B. Code set A was designed
by Tunstall's algorithm and is suboptimal.) Jelinek and Schneider [1974]
have considered VF codes for Markov sources, but their goal is mini-
mizing the probability of error due to buffer overflow rather than
maximizing expected message length. That VF codes for conditional
sources have not yet been theoretically explored as fully as FV codes
is understandable because their analysis is much more complicated.
Because of the fixed message length N of FV codes, the j-th
message always begins at time jN + 1 . When codewords are assigned to
the j-th message, it does not matter how codewords were assigned to the
first j-1 messages. The code designer is constrained to a given par-
sing of the source string into messages and may therefore independently
code each message.
With VF coding, however, the time when each message begins depends
on the source string and the message sets from which previous messages
were selected. A naive designer might choose a message set which maxi-
mizes the expected message length of the first message but causes the
second message to begin at a time when all possible codes are inefficient.
A simple example of this interaction appears in Appendix B and will be
discussed when the VF algorithms of Section 3.2 are analyzed.
We will begin by heuristically modifying the FV algorithms of
Chapter 2 for VF coding. Then we shall analyze their performance and
argue that while not necessarily optimal they are efficient and asymp-
totically approach the optimal, which is possible because computational
complexity grows linearly with the codeword length.
3.2 The Variable-to-Fixed Algorithms
With only minor modifications, Algorithms FVE and FVD may be con-
verted for variable-to-fixed coding. The data structures for the VF
algorithms will be the same Q , C , F , and T as were used for the
FV code, and they will satisfy (18) - (26). The symbol-by-symbol en-
coding is unchanged; the difference lies in how the decision that a block
is complete is made. In the FV algorithms a block was complete after
N symbols had been encoded. But in the VF case it is desirable to en-
code as many symbols into a codeword of fixed length L as will fit.
Because the stopping decision will be the same in both the encoder and
decoder, and because the decoder only has access to previously decoded
symbols, it is necessary that the encoder decide whether to accept another
source symbol for encoding based only on symbols already encoded and not
on the new candidate. Under the interpretation of F as an overlapped
concatenation of code elements and T as a pointer, a new symbol may
only be accepted if T is sufficiently far from the least-significant
end of the allocated codeword so that even the longest code element will
fit. Alternately, interpreting F as a sum of scaled code elements and
T as a scale factor, a new symbol may only be encoded if T is large
enough so that the scaled code element of any new symbol can be accurately
represented within the L digits allowed.
Under either interpretation, a new symbol is accepted if T is not
below some threshold. For speed, only the exponent field T of T is
examined to make the comparison. This decision is represented by the
"while" clause in the encoding algorithm.
Algorithm VFE (Variable-to-Fixed Encoding):
(1) Set F ← 0 .
(2) Set T ← 1 .
(3) While τ < L - J do:
    (3a) Load the Q and C arrays from the predictor.
    (3b) Get symbol x_i from the source.
    (3c) Set F ← F + T·C_i(x_i) .
    (3d) Set T ← T·Q_i(x_i) truncated to K significant digits.
(4) Truncate F to L digits and add D^(-L) (i.e., set F ← (⌊D^L F⌋ + 1) D^(-L) ).
(5) Transmit the L digits of F .
(6) Go to (1) and begin encoding the next message.
As in the case of the FV algorithms, we attach distinct labels to
successive values taken on by variables in Algorithm VFE, letting F_i and
T_i denote the values attained by F and T respectively after the i-th
iteration of step VFE-3. Let N denote the total number of iterations,
and hence the message length. Now N is a function of the random source
data X . Even so, relations (27) - (30) and (32) - (35) are still valid,
as is Lemma (37) and its proof. The dependence of codeword length L on
τ_N given by (31) no longer applies, however. Instead, the threshold
L - J for τ determines N . We will investigate why this threshold
guarantees correct decoding. Recall that bounds (38) were used to prove
correct decoding for the FV code. In the VF case, however, we cannot
control the codeword precision; it is fixed at L digits. Therefore to
satisfy (38) the VF encoder must insure that TN is not too small.
Since the encoder must decide to accept the N-th symbol based on T_(N-1) ,
the following lemma is needed.
Lemma: The final value of T is related to the previous value by

    T_N ≥ D^(-J) T_(N-1) ,     (54)

with equality if and only if Q_N(x_N) = D^(-J) .
Proof: Recall recursion (30) and that by (19) and (20) Q_N(x_N) is a
positive integer multiple of D^(-J) . There are two cases depending on
whether the integer is 1 or more.
Case 1: If Q_N(x_N) = D^(-J) then T_N = D^(-J) T_(N-1) exactly. The truncation
in step VFE-3d has no effect because Q_N(x_N) is a power of D .
Case 2: If Q_N(x_N) ≥ 2 D^(-J) , then

    T_N  ≥  (1 - D^(1-K)) Q_N(x_N) T_(N-1)  ≥  (1 - D^(1-K)) 2 D^(-J) T_(N-1)  ≥  D^(-J) T_(N-1) ,
        (a)                                 (b)                                (c)

where (a) is the lower bound of (43); (b) is the case 2 premise; and (c)
follows because D ≥ 2 and K ≥ 2 imply D^(K-1) ≥ 2 and hence
(1 - D^(1-K)) ≥ 1/2 . Q.E.D.
With the aid of the above lemma we may calculate absolute bounds
on T_N .
Theorem: The final value of T is bounded,

    D^(1-L) ≤ T_N < D^(J-L+1) .     (55)

Proof: The last iteration of step VFE-3 began because τ_(N-1) < L - J . With
n = L - J in (26) this implies

    T_(N-1) ≥ D^(J-L+1) .

Substituting this into (54) proves the lower bound. Step VFE-3 declined
to begin an (N+1)st iteration because τ_N ≥ L - J . With n = L - J in
(24) and (25), this implies the upper bound. Q.E.D.
The lower bound of (55) and the fact that D^(-L) < D^(1-L) imply

    T_N > D^(-L) ,     (56)

which by (35) implies

    F_N ≤ W < F_N + T_N .     (57)

Once again W can be expressed in L digits; the proof is unchanged
except that step (b) now follows from (57).
The VF decoding algorithm is obtained by modifying Algorithm FVD
in the same way as Algorithm FVE was modified to obtain Algorithm VFE.
Algorithm VFD (Variable-to-Fixed Decoding):
(1) Set F ← the first L channel digits.
(2) Set T ← 1 .
(3) While τ < L - J do:
    (3a) Load the Q and C arrays from the predictor.
    (3b) Set y_i = max{ y : y ∈ I_M and C_i(y) ≤ F/T } .
    (3c) Output the decoded symbol y_i .
    (3d) Set F ← F - T·C_i(y_i) .
    (3e) Set T ← T·Q_i(y_i) truncated to K significant digits.
(4) Go to (1) and begin decoding the next codeword, L channel digits
after the beginning of the present codeword.
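The VF encoder and decoder can be exercised end to end. The sketch below is a modern Python illustration, not the thesis's implementation: exact rational arithmetic (fractions.Fraction) stands in for the fixed-point registers, and the radix D = 10, precisions J = 2 and K = 3, codeword length L = 8, and the three-symbol memoryless distribution are all arbitrary choices for the demonstration.

```python
from fractions import Fraction

D, J, K, L = 10, 2, 3, 8     # radix, probability precision, T precision, codeword length
Q = [Fraction(70, 100), Fraction(25, 100), Fraction(5, 100)]   # multiples of D**-J
C = [Fraction(0), Fraction(70, 100), Fraction(95, 100)]        # cumulative sums of Q

def tau(T):
    """Exponent field of T: the tau with D**-tau <= T < D**(1-tau)."""
    t = 0
    while T < Fraction(1, D ** t):
        t += 1
    return t

def truncate(T):
    """Keep K significant D-ary digits of T (steps VFE-3d, VFD-3e)."""
    s = D ** (tau(T) + K - 1)
    return Fraction(int(T * s), s)

def vfe_encode(source):
    """Algorithm VFE: accept symbols while tau < L - J, then emit L digits."""
    F, T, n = Fraction(0), Fraction(1), 0
    while tau(T) < L - J:
        x = source[n]; n += 1
        F += T * C[x]                    # VFE-3c
        T = truncate(T * Q[x])           # VFE-3d
    W = int(F * D ** L) + 1              # VFE-4: truncate to L digits, add D**-L
    return W, n                          # codeword as an L-digit base-D integer

def vfd_decode(W):
    """Algorithm VFD: recover the message from one codeword."""
    F, T, out = Fraction(W, D ** L), Fraction(1), []
    while tau(T) < L - J:
        y = max(s for s in range(len(Q)) if C[s] <= F / T)   # VFD-3b
        out.append(y)                    # VFD-3c
        F -= T * C[y]                    # VFD-3d
        T = truncate(T * Q[y])           # VFD-3e
    return out

message = [0, 1, 0, 2] + [0] * 60        # more symbols than one codeword will hold
W, n = vfe_encode(message)
assert vfd_decode(W) == message[:n]      # exact round trip for the accepted prefix
```

The `while tau(T) < L - J` clause is the stopping decision discussed above; because the decoder's T follows the same trajectory as the encoder's, both sides agree on the message length N without any explicit length field.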
Let G_i denote the value of F after the i-th iteration of VFD-3.
Since the codeword length is constant, step VFD-1 sets G_0 = W exactly.
By (57) this means

    F_N ≤ G_0 < F_N + T_N ,     (58)

which confirms (38) for the VF code. The decoding rule (39) and recur-
sion (40) apply directly.
The correct-decoding theorem (41) and its proof hold for the VF
code also, with the additional verification that the decoder outputs N
symbols, the correct number. This is because the decision that a block
is complete is based on the same values of T in encoder and decoder.
The rate definition (1) becomes for a VF code

    R = L / EN .

We will explore the properties of EN . As for the FV code, define
l_i and V_D(K) according to (44) and (45). Then (46) follows directly.
For initial analysis suppose that the source is memoryless (single-state).
In this case the l_i are independent, identically distributed (IID)
positive-real-valued random variables. We pause to develop a needed
theorem.
Consider a random process {l_1, l_2, l_3, ...} where the l_i are IID
positive-real-valued random variables. Define the maximum and minimum
possible values of l_i ,

    l_max = sup{ z : P(l_i ≥ z) > 0 }
and
    l_min = inf{ z : P(l_i ≤ z) > 0 } .

These definitions allow (but do not require) the random variables l_i
to be continuous. Define the cumulative independent-increment process

    L_n = l_1 + l_2 + ... + l_n .

The time to reach some fixed positive threshold L' ,

    N = min{ n : L_n > L' } ,     (59)

is an integer-valued random variable determined by the random process.
We seek bounds on the expected value of N .
Lemma:

    L' < E L_N ≤ L' + l_max .     (60)

Proof: We prove the stronger result L' < L_N ≤ L' + l_max . That
L' < L_N follows by definition of N . The upper bound is proved by con-
tradiction. Assume that for some trial L_N > L' + l_max . Then
L_(N-1) ≥ L_N - l_max > L' , which contradicts the definition of N . Q.E.D.
Lemma:

    E L_N = EN · E l .     (61)

Proof: [Wald, 1947, Section 3.5] The cumulative process L_n will take
the longest to reach the threshold when all steps l_i attain their
minimum value. In that case, by definition of N ,

    N ≤ L'/l_min + 1 .

Smaller values of N result when some of the l_i exceed l_min . In
general, let m = ⌊L'/l_min⌋ + 1 so that N ≤ m . Then

    m E l = E L_N + (m - EN) E l ,

where (a) is because the l_i are IID; in (b) the sum is separated; (c)
is by linearity of expectation; (d) is by definition of L_N ; (e) follows
from conditioning on N knowing 1 ≤ N ≤ m ; (f) is because l_(N+1)
through l_m are independent of l_1 through l_N ; (g) is because the l_i
are IID; (h) is a refactoring; and (i) follows because the first sum is 1
and the second sum is the definition of EN . Adding E l · EN - m E l to
both sides completes the proof. Q.E.D.
Theorem: For the random process described in the text, the expected value
of N is bounded by

    L'/E l < EN ≤ (L' + l_max)/E l .     (62)

Proof: Substituting (61) into (60) yields

    L' < EN · E l ≤ L' + l_max .

Dividing by E l gives bounds on EN ,

    L'/E l < EN ≤ (L' + l_max)/E l .     Q.E.D.
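The stopped-sum bound L' < L_N ≤ L' + l_max behind this theorem is easy to exercise by simulation. A Python sketch (the two-valued step distribution and the threshold are arbitrary illustrative choices):

```python
import random

random.seed(1)
L_threshold = 30.0               # the threshold L'
steps = (1.0, 2.0)               # possible IID step values l_i
l_max = max(steps)

for _ in range(1000):
    total, n = 0.0, 0
    while total <= L_threshold:  # N = first n with L_n > L'
        total += random.choice(steps)
        n += 1
    # Lemma (60): the stopped sum always lands in (L', L' + l_max]
    assert L_threshold < total <= L_threshold + l_max
```

Since the overshoot past L' never exceeds l_max, dividing through by E l gives exactly the bounds on EN stated in the theorem.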
We now apply this result to finding the rate of the VF code for
memoryless sources. Taking expectation on (42),

    H_D(X_i) ≤ E l_i < H_D(X_i) + V_D(K) .     (63)

Let

    l_max = log_D ( 1 / D^(-J) ) + V_D(K) = J + V_D(K) .     (64)

Recall from (50) that

    Σ_(i=1)^N l_i = log_D ( 1/T_N ) .

With the upper bound of (55) this implies

    Σ_(i=1)^N l_i > L - J - 1 ,     (65)

and similarly by (56),

    Σ_(i=1)^N l_i < L .

Therefore setting L' = L - J - 1 satisfies (59). Substituting this,
(63), and (64) into (62) yields

    (L - J - 1) / ( H_D(X_i) + V_D(K) )  <  EN  ≤  ( L - 1 + V_D(K) ) / H_D(X_i) .     (66)
Theorem: Algorithm VFE achieves with memoryless sources a compression
rate bounded by

    ( L / ( L - 1 + V_D(K) ) ) H_D(X_i)  ≤  R  <  ( L / (L - J - 1) ) ( H_D(X_i) + V_D(K) ) ,     (67)

where L is the codeword length, J is the number of digits in the
D-ary representation of the source probabilities, H_D(X_i) is the
entropy of each independent, identically distributed source symbol,
K is the number of digits of precision in the internal register T ,
and V_D(K) = -log_D ( 1 - D^(1-K) ) .
Proof: This is a direct consequence of the preceding development,
obtained by dividing all terms in (66) by L and inverting. Q.E.D.
Because the necessary theory for calculating the rate of a VF
code for a conditional source (a source with memory) is not yet
fully developed, we shall not attempt to find any bounds on the
rate that Algorithm VFE achieves for conditional sources. It would
also be meaningless to try to compute the expected length of the
first message, because as shown by the example in Appendix B,
maximizing the expected message length for every history may be a
suboptimal strategy. Instead, we will examine how effectively
Algorithm VFE uses each codeword to contain information.
Recall that X denotes the infinite random source string to be
emitted, and B denotes a complete and proper message set. Because
exactly one message b(X) ∈ B prefixes X , the message set B parti-
tions the range of X into |B| categories. We are to transmit one
codeword W from a set of equal-length codewords in a one-to-one corres-
pondence with B . Because there is a fixed cost associated with the
transmission of W , we want to maximize the information contained in W
about X , or I(W;X) . Because information is symmetric [Ash, 1965],
this quantity equals I(X;W) which in turn equals I(X;b(X)) because
codewords and messages are in one-to-one correspondence. But I(X;b(X)) =
H(b(X)) - H(b(X)|X) , where H(b(X)|X) = 0 because b(x) is uniquely
determined by x for all infinite strings x . Therefore we want to
maximize H(b(X)) . This is done by selecting B so that all messages
are as nearly equiprobable as possible.
Suppose message b of length N(b) is encoded. Then

    P(b) = Π_(i=1)^(N(b)) P( b_i | b^(i-1) ) .

With this, (42) implies

    l_i ≤ log_D ( 1 / P( b_i | b^(i-1) ) ) + V_D(K) .

Summing over i = 1, 2, 3, ..., N(b) with (6) yields

    Σ_(i=1)^(N(b)) l_i ≤ log_D ( 1/P(b) ) + N(b) V_D(K) .

By (65) this becomes

    L - J - 1 < log_D ( 1/P(b) ) + N_max V_D(K) ,

where N_max is the length of the longest message. Exponentiating and re-
arranging terms yields

    P(b) < D^( -( L - J - 1 - N_max V_D(K) ) ) .     (68)
We will now show that there is no VF code with a codeword length
less than L - J - 1 - N_max V_D(K) which could encode the same message set
as Algorithm VFE (or any other message set which conveys as much informa-
tion).
Theorem: The message entropy achieved by Algorithm VFE is bounded below
by

    H_D(b(X)) > L - J - 1 - N_max V_D(K) ,

where L is the codeword length, J is the number of digits in the D-ary
representation of each source conditional probability, N_max is the length
of the longest message, and V_D(K) = -log_D ( 1 - D^(1-K) ) decays exponentially
with the number K of significant digits in register T .
Proof: We have

    H_D(b(X))  =  Σ_(b∈B) P(b) log_D ( 1/P(b) )                    (a)
               >  Σ_(b∈B) P(b) ( L - J - 1 - N_max V_D(K) )        (b)
               =  L - J - 1 - N_max V_D(K) ,                       (c)

where (a) is the definition of message entropy (9); (b) follows from (68)
and the monotonicity of the logarithm function; and (c) results from
factoring and replacing the sum of the probabilities with 1. The behavior
of V_D(K) is given by (47). Q.E.D.
Note that by letting L become very large, the entropy per codeword
symbol,
approaches 1 digit; thus Algorithm VFE is optimal in the limit.
CHAPTER 4
EXPERIMENTAL IMPLEMENTATION
To demonstrate the speed of Algorithms FVE, FVD, VFE, and VFD, they
were implemented on an Interdata model 7/16 minicomputer. This is a
microprogrammed machine with a 16-bit word length and average instruction
execution time of about two microseconds. We will not go into detail
about the machine's architecture and instruction set as these are covered
in the manufacturer's documentation [Interdata, 1971]. For compatibility
of nomenclature with their larger machines, Interdata calls the 16-bit
word a halfword; this convention will be followed here.
Parameters chosen for the experimental implementation were selected
both to simplify programming and to accurately reflect a typical applica-
tion. A binary channel (D =2) is frequently encountered in practice;
since binary arithmetic is natural to the host machine this parameter was
chosen. The precisions J and K of the probabilities Q and pointer
T were both chosen to be 12 bits. It seems unlikely that any source
would have a probability distribution known more accurately than to one
part in 4096. That V_2(12) = 7.05 × 10^(-4) insures that keeping 12 bits
of pointer T is sufficient to limit codeword waste to under one one-
thousandth of a bit per symbol. Conveniently, 12-bit numbers fit within
the 16-bit halfword without occupying the most significant bit, which is
used as the sign bit of two's complement notation for signed numbers.
Although "Multiply Halfword Unsigned" is a standard microcoded instruc-
tion, there is no corresponding divide instruction which does not use
two's complement notation.
Although Interdata specifies a floating-point hexadecimal format,
the minicomputer used in this experiment has no hardware or microcode
to perform floating point arithmetic. When conventional floating point
operations are needed, they are normally performed by subroutines in the
operating system. Because these subroutines provide many features not
needed in the coding algorithms (recall that T is always positive and
at most one), they are slow. Consequently a simpler format was chosen
to represent pointer T . Two halfwords are allocated. The first con-
tains the 12 significant bits of T right adjusted, and the second con-
tains the binary representation of the exponent field.
The Q and C arrays are allocated as described in Section 2.2.
Each element occupies the 12 least significant bits of one halfword.
The multiprecision accumulator F is implemented as a large array,
packed 16 bits per halfword.
For ease of programming, especially for communications with the ex-
perimenter, the test programs were written in FORTRAN. However, the key
steps involving arithmetic with Q , C , F , and T are not easily
coded in FORTRAN because of the format of these numbers. Special FORTRAN-
callable assembly-language subroutines were written to perform these
steps.
Program listings and results appear in Appendix C. The first four
listings are the assembly language subroutines. Subroutine NEXTT performs
the multiplication and truncation in step FVE-3d. Subroutine SCADD per-
forms the multiplication and addition in step FVE-3c. Subroutine SCGET
returns the quotient in step FVD-3b. Subroutine SCSUB performs the multi-
plication and subtraction in step FVD-3d. Then two FORTRAN-coded routines
are listed. DUMP provides a formatted hexadecimal list of the first N
halfwords beginning at a specified address. DUMPV is similar but the
number of digits allocated per halfword depends on the maximum value to
be represented.
The subroutine listings are followed by Program BFV, a demonstration
of Algorithms FVE and FVD for Bernoulli (memoryless binary) sources. The
bulk of the code provides for interactively obtaining the source statistics
and message length from the experimenter. The program then uses a system-
supplied random number generator to generate a source sequence according
to the desired statistics. Comments distinguish the steps of Algorithms
FVE and FVD. Following the program listing is a sample printout.
Program MFV is an extension of Program BFV to encode first order
Markov sources (sources whose output is a Markov chain) with arbitrary
alphabet size. A sample printout follows the program listing.
Program VF is a demonstration of Algorithms VFE and VFD for memory-
less sources with arbitrary alphabet size; again it is followed by a
sample printout.
Each of the three programs encoded several million source symbols
and decoded them without error. The execution time of the programs was
found to grow linearly with the message length, with typical time being
30 seconds to encode, decode, and compare a message of 30,000 symbols.
The speed of 1000 symbols per second approximately applied to all three
programs; however, a large alphabet slowed the latter two because a sub-
optimal linear search in step FVD-3b was selected to simplify programming.
The author's experience indicates that for the Interdata 7/16,
programs written in FORTRAN run much slower than programs written in
assembly language, with a speed ratio as great as ten or more. It is
suspected that a carefully engineered assembly language implementation for
a specific application could yield an even greater increase in speed over
this demonstration.
CHAPTER 5
GENERALIZATION
The algorithms explored in this paper may be considered special
cases of a family of algorithms in which the codeword is a sum of code
elements which have been scaled or shifted so that they do not interfere
with one another when added. There is a one-to-one correspondence between
source symbols and code elements, and each code element has associated
with it a measure of its precision or length used in computing the scale
factors.
To illustrate this structure, consider a simple classical Huffman
coding problem. Suppose a source emits symbols 0 , 1 , and 2 with
probabilities 1/2 , 1/4 , and 1/4 which are assigned code elements 0 ,
10 , and 11 respectively. The code string for the sequence 01201 is
01011010, where the code elements have been concatenated.
Any concatenation is equivalent to displacing each code element to
the right of its predecessor by a distance equal to the length of the pre-
decessor, or to displacing the codestring left by a distance equal to the
length of a new code element in order to make room for it. In the gen-
eralized case where code elements need not have integer "lengths" these
two displacement modes are not equivalent. We consider strings of D-ary
symbols to be radix-D numbers and introduce a radix point as a reference.
Shifting a number left by p places then corresponds to multiplying by
D^p ; negative p implies right shifts. We illustrate with a decimal
(D = 10) example. Suppose that we wish to concatenate the numbers 2 and
5 . If the distance p between them is to be one place, we could multiply
2 by 10^1 and add 5 , giving 25 . Or we could start with 2 and add
5 × 10^(-1) , giving 2.5 . But now suppose that the numbers 2 and 5
are to be separated by p = 1½ digits. The first method yields
2 × 10^(1½) + 5 = 68.24555 , but the second method yields
2 + 5 × 10^(-1½) = 2.158114 , which differs markedly from 68.24555 .
A machine implementation requires that, in addition to a table of
code elements and their lengths, two registers be provided. The accumu-
lator contains the running sum of scaled code elements, and the pointer
provides the scale factors indicating the displacements of the code ele-
ments in the accumulator. To illustrate these definitions, consider
Algorithm FVE. Here array C contains the code elements. The values of
Q may be called the precisions of C because C(x) + z where
0 ≤ z < Q(x) uniquely specifies x . F is the accumulator and T is
the pointer. We will represent T in fixed-point form so that its use
as a pointer is clear. Set M = 3 , D = 2 , N = 5 , J = 2 , K = 1 and encode
the same string 01201.
We set L = 8 + 1 = 9 and the codeword is 010110101 . Coincidentally,
the C(x) are the Huffman code elements and the code string is the
concatenated Huffman code string with a trailing 1 attached.
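The sum-of-scaled-code-elements view can be sketched with exact rationals standing in for the algorithm's fixed-point registers (the names F and T follow the text; the dyadic probability and code-element tables are the Huffman example's, not the algorithm's packed arrays). With dyadic probabilities the running sum reproduces the concatenated Huffman string:

```python
from fractions import Fraction as Fr

# Sketch of the generalized structure: the codeword is a running sum F of
# code elements scaled by a pointer T, where T shrinks by each symbol's
# probability.  Exact rationals replace the fixed-point registers.
P = {0: Fr(1, 2), 1: Fr(1, 4), 2: Fr(1, 4)}   # symbol probabilities
C = {0: Fr(0, 4), 1: Fr(2, 4), 2: Fr(3, 4)}   # code elements .00, .10, .11

def encode(symbols):
    F, T = Fr(0), Fr(1)          # accumulator and scale-factor pointer
    for s in symbols:
        F += T * C[s]            # add the scaled code element
        T *= P[s]                # shrink the scale factor
    return F

def to_bits(f, n):
    """First n binary digits of a fraction in [0, 1)."""
    bits = ""
    for _ in range(n):
        f *= 2
        bits += str(int(f))      # integer part is the next bit
        f -= int(f)
    return bits

F = encode([0, 1, 2, 0, 1])
print(to_bits(F, 8))             # -> 01011010, the concatenated string
```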
It is not necessary that the displacement of code elements be accom-
plished as in algorithms FVE and VFE. In fact there are eight types of
arithmetic coding algorithms in the family as a result of three choices.
First, instead of shifting the code element, it would be possible to
shift the contents of the accumulator in the opposite direction. Second,
the code elements could be added to the left (most significant) end of
the accumulator instead of to the right. Third, the pointer can contain
either the actual scale factors as in this work, or their logarithms. In
the latter case, the pointer is moved by adding the displacement, and the
scale factors are obtained by exponentiation. Table 5 summarizes the en-
coding algorithms belonging to the family.
One algorithm of the family was developed independently by Rissanen
[1976]. It shifts the code element to the most significant end of the
accumulator, using a pointer obtained by addition and exponentiation. We
shall now compare the alternatives in the three choices, and see that it
is preferable to shift the code element rather than the accumulator, and
to add code elements to the least significant end of the accumulator.
Scaling the new code element is preferable to scaling the accumulator
because a smaller volume of data must be manipulated. Code elements are
of fixed length J digits, but the significant length of the accumulator
grows until it reaches the codeword length L . If code elements of
length J are scaled by a pointer of length K digits, each of the N
multiplications is between a J-digit and a K-digit factor. If the
accumulator is scaled, however, the N multiplications grow in complexity
as the length of the accumulator increases. The last multiplication
involves L-digit and K-digit factors. Thus the complexity would grow as
O(NL) , proportional to the square of the message length. Because of
this intolerable complexity, we shall hereafter assume that the code
element is scaled.

Table 5
Encoding Algorithm Summary

Let p_i = probability of symbol i
    C_i = code element for symbol i , with length L_i = log_D (1/p_i)
    F = accumulator
    T = scale-factor pointer, or L = logarithmic pointer
Initialize F = 0 , T = 1 , L = 0 .

I.  Shifting the Code Element
    A. Right
       1. By multiplication (Elias, Pasco):  F = F + T·C_i ;  T = T·p_i
       2. By exponentiation:                 F = F + D^-L·C_i ;  L = L + L_i
    B. Left
       1. By multiplication:                 T = T/p_i ;  F = F + T·C_i
       2. By exponentiation (Rissanen):      L = L + L_i ;  F = F + D^L·C_i
II. Shifting the Accumulator
    A. Left
       1. By multiplication:                 F = F + C_i ;  F = F/p_i
       2. By exponentiation:                 F = F + C_i ;  F = F·D^L_i
    B. Right
       1. By multiplication:                 F = F·p_i ;  F = F + C_i
       2. By exponentiation:                 F = F·D^-L_i ;  F = F + C_i
We say a codeword is grown to the right if successive code elements
are added to the least significant end of the accumulator, and grown to
the left if they are added to the most significant end. We will contrast
these two techniques.
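The contrast can be sketched numerically. Below, grow_right follows the Elias/Pasco update (shift the code element right by the pointer) and grow_left the accumulator-shifting update F = F·p_i + C_i ; both are illustrative sketches over exact rationals, not the fixed-point algorithms. The left-grown codeword comes out as the reversed concatenation, which is one way to see why it must be decoded in reverse order:

```python
from fractions import Fraction as Fr

P = {0: Fr(1, 2), 1: Fr(1, 4), 2: Fr(1, 4)}   # symbol probabilities
C = {0: Fr(0, 4), 1: Fr(2, 4), 2: Fr(3, 4)}   # Huffman elements .00, .10, .11

def grow_right(symbols):
    """Shift each new code element right by the scale-factor pointer T."""
    F, T = Fr(0), Fr(1)
    for s in symbols:
        F += T * C[s]
        T *= P[s]
    return F

def grow_left(symbols):
    """Shift the accumulator right, adding each element at the top."""
    F = Fr(0)
    for s in symbols:
        F = F * P[s] + C[s]
    return F

def bits(f, n):
    """First n binary digits of a fraction in [0, 1)."""
    out = ""
    for _ in range(n):
        f *= 2
        out += str(int(f))
        f -= int(f)
    return out

msg = [0, 1, 2, 0, 1]
print(bits(grow_right(msg), 8))  # -> 01011010  (concatenation 0|10|11|0|10)
print(bits(grow_left(msg), 8))   # -> 10011100  (reversed: 10|0|11|10|0)
```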
With codewords grown to the right, the radix point is either at the
left end of the accumulator or a fixed distance from it. With codewords
grown to the left, the reverse is true. This difference is important at
decoding time. The decoder sees an infinite string of channel digits,
from which it must load the accumulator with a codeword correctly justi-
fied for the ensuing arithmetic operations of decoding. Fixed length
codewords present no problem if the order of transmission (e.g. most
significant digit first) and the radix point location are agreed upon.
With variable length codewords, however, the problem depends on how the
codeword was grown with respect to the radix point. If the codeword is
grown to the right and transmitted most significant digit first, then the
radix point may be placed and the decoding begun before the end of the
codeword is known, as in the case of algorithm FVD. If the codeword is
grown to the left, however, problems result. The most obvious is that
there is no way of determining the number of digits which are to be placed
to the left of the radix point. But the problem is much more subtle and
extends even to VF codes.
Decision rules for decoding involve comparing the (possibly scaled)
magnitude of the accumulator with some other quantity. The most signifi-
cant digits of the accumulator will most heavily influence the results
of the comparison. This dictates that the symbols encoded in the most
significant digits of the codeword be decoded first, so that their heavily
weighted code elements may be subtracted. Codewords grown to the left
must therefore be decoded in the reverse order from which they were en-
coded, while codewords grown to the right may be decoded in their original
order. In order that the leftmost code element may be decoded correctly,
its scale factor must be known. For codewords grown to the right this
scale factor is as initialized by the algorithm; in algorithm FVE it is 1 .
But for codewords grown to the left the scale factor of the leftmost code
element is a function of the preceding source symbols in the message,
which are unknown to the decoder. Therefore whenever a codeword grown to
the left is being used, the scale factor of the most significant code
element must be transmitted in addition to the codeword itself. In other
words, the contents of the pointer as well as the accumulator must be
transmitted. Typically, about log_D N additional digits are required
for the pointer. This necessity, which Rissanen recognized, limits the
compression rate achievable by such algorithms.
There is another reason, perhaps of greater significance, for pre-
ferring algorithms which grow codewords to the right. Recall that a
necessary assumption in the discussion of the operation of the predictor
for conditional sources is that the predictor in the expander, like the
predictor in the compressor, provides the probability distribution for
the current symbol conditioned on all previous symbols. But we have seen
that codewords grown to the left must be decoded in the reverse order
from which they were encoded, and thus the previous symbols are
unavailable to the predictor in the expander. This prohibits the use of
algorithms whose codewords grow to the left for compression of data from
conditional sources.
There does not appear to be any fundamental reason to select either
the multiplicative or additive-with-exponentiation technique for moving
the pointer, and practical reasons may dictate the selection. Many
computers have hardware which can quickly perform the multiplications
used in this paper, but some do not. Rissanen has shown that the
exponentiation can be quickly done by table look-up, making this
technique competitive.
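A hypothetical sketch of the table-lookup idea (the names FRAC and TABLE and the particular quantization are assumptions for illustration, not Rissanen's actual construction): keep the pointer as a fixed-point sum of code-element lengths and recover the scale factor from a small precomputed table indexed by the fractional part:

```python
import math

# Logarithmic pointer sketch: instead of multiplying the scale factor T
# by p_i at each step, accumulate L = L + L_i with L_i ~= -log2(p_i)
# quantized to FRAC fractional bits, and recover the scale factor
# 2**(-L) by a shift (whole part) plus a table lookup (fractional part).
FRAC = 8                                     # fractional bits of the pointer
TABLE = [2.0 ** (-(k / 2 ** FRAC)) for k in range(2 ** FRAC)]

def log_length(p):
    """Quantized code-element length L_i, in units of 2**-FRAC bits."""
    return round(-math.log2(p) * 2 ** FRAC)

def scale_factor(L):
    """Approximate 2**(-L / 2**FRAC) via shift and table lookup."""
    whole, frac = divmod(L, 2 ** FRAC)
    return TABLE[frac] / 2.0 ** whole

L = 0
for p in [0.5, 0.25, 0.25, 0.5, 0.25]:       # probabilities of 01201
    L += log_length(p)                       # one add per symbol, no multiply
# After five symbols the pointer holds 8 bits of accumulated length,
# and the table gives back the scale factor 2**-8.
print(L / 2 ** FRAC, scale_factor(L))        # -> 8.0 0.00390625
```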
APPENDIX A
EFFECTS OF QUANTIZATION OF PROBABILITIES
We are to investigate the effect on compression rate of the necessary
quantization of the source conditional probabilities. To simplify the
notation we will concentrate on some specific time index i and some
specific history x^(i-1) . We may then define

q(x_i) = q(x_i | x^(i-1))     (A1)

and

p(x_i) = p(x_i | x^(i-1)) .   (A2)

Under these assumptions, if both (18) and (20) are satisfied, then
(48) implies

l_i = log_D (1/p(x_i)) .      (A3)
Suppose now that p(x_i) is not a multiple of D^-J for some x_i ∈ I_M .
Then (18) is not satisfied and (48) no longer applies. Let l'_i denote
the contribution of x_i to the codeword length under these conditions.
By (46),

l'_i ≤ log_D (1/q(x_i)) + V_D(K) .   (A4)

The extra codeword length because of the quantization is bounded above by
subtracting (A3) from (A4):

l'_i - l_i ≤ log_D (p(x_i)/q(x_i)) + V_D(K) .   (A5)
The expected value

I_D(p,q) = Σ_{x_i ∈ I_M} p(x_i) log_D (p(x_i)/q(x_i))   (A6)

is the Kullback-Leibler [1951] distance from distribution p to
distribution q . Kullback and Leibler have shown that I_D(p,q) ≥ 0 with
equality if and only if p(x_i) = q(x_i) for all x_i ∈ I_M .
We have already seen that V_D(K) decays exponentially with K . We
will now see that the precision J of q(x_i) sufficient to bound
log_D (p(x_i)/q(x_i)) arbitrarily closely from above grows only as the
logarithm of the inverse of the bound, and then show by example that a
much smaller precision is sometimes sufficient to bound the expected
value I_D(p,q) .

Theorem (A7): For any probability distribution p and any real number δ
satisfying 0 < δ < 1 , there exists an integer J given by

J = ⌈ log_D (1/p_min) + log_D (1/δ) + 1 ⌉   (A8)

and a quantized approximation q to p satisfying

q(x_i) ≥ 0 for all x_i ∈ I_M ,   (A9)

q(x_i) is a multiple of D^-J for all x_i ∈ I_M ,   (A10)

such that

log_D (p(x_i)/q(x_i)) < δ for all x_i ∈ I_M .   (A11)
Proof: Consider the interval [D^-δ p(x_i) , p(x_i)] . The width of the
interval is (1 - D^-δ) p(x_i) . If J = ⌈ log_D (1/p_min) + log_D (1/δ) + 1 ⌉ ,
then by Lemma (A12) appearing after this proof,

D^-J ≤ (δ/D) p_min ≤ (1 - D^-δ) p(x_i) ,

and hence there is some multiple of D^-J in the interval. Set q(x_i)
equal to the largest such multiple. Then

D^-δ p(x_i) ≤ q(x_i) ≤ p(x_i) .

In order that the q(x_i) may sum to 1 , multiples of D^-J may now be
arbitrarily added to various q(x_i) until Σ_{x_i ∈ I_M} q(x_i) = 1 .
Note that this action preserves the bound (A11). Q.E.D.
We now prove the lemma required in the previous proof.

Lemma (A12): If D ≥ 2 and 0 < δ < 1 , then

1 - D^-δ ≥ δ/D .

Proof: Let D ≥ 2 and 0 < δ < 1 . Let f(δ) = D - D^(1-δ) - δ . Then
f''(δ) = -(ln D)^2 D^(1-δ) < 0 . Observe that f(0) = 0 and f(1) =
D - 2 ≥ 0 . By concavity f(δ) ≥ 0 for all δ between 0 and 1 . Thus
D - D^(1-δ) ≥ δ . Dividing both sides by D and rearranging terms,

1 - D^-δ ≥ δ/D .   Q.E.D.
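The construction in the proof can be sketched directly (Python for illustration; the quantize helper and its patch-the-first-entry rule are assumptions of this sketch, and floating point stands in for exact radix-D arithmetic). For the decimal example later in this appendix it yields J = 6:

```python
import math

def quantize(p, D, delta):
    """Sketch of the proof's construction: choose J from the bound, set
    each q(x) to the largest multiple of D**-J in [D**-delta * p(x), p(x)],
    then add units of D**-J until the q(x) sum to one."""
    J = math.ceil(math.log(1 / min(p), D) + math.log(1 / delta, D) + 1)
    unit = D ** -J
    q = [math.floor(px / unit) * unit for px in p]
    deficit = round((1 - sum(q)) / unit)     # units of D**-J still missing
    q[0] += deficit * unit                   # patch the sum arbitrarily
    return J, q

p = [0.015, 0.015, 0.970]
J, q = quantize(p, D=10, delta=0.001)
assert abs(sum(q) - 1) < 1e-9                # q is a distribution
assert all(math.log10(px / qx) < 0.001 for px, qx in zip(p, q))
print(J)                                     # -> 6
```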
It is seldom necessary that bound (A11) be satisfied for all x_i ∈ I_M ;
rather it is often sufficient to bound only the expected value I_D(p,q) .
The following example shows that this can sometimes be done with a much
smaller J . Suppose that M = 3 and the source distribution p =
(.015, .015, .970) . We want the minimum J which will allow us to
encode for a decimal (D = 10) channel with the average quantization loss
I_10(p,q) less than 10^-3 . By formula (A5), setting J = 6 would be
sufficient. However, either distribution q_1 = (.01, .01, .98) or
q_2 = (.02, .02, .96) does the job with J = 2 , because I_10(p,q_1) =
9.62 × 10^-4 and I_10(p,q_2) = 6.17 × 10^-4 .
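These divergence values can be checked numerically:

```python
import math

# Numerical check of the decimal example: the Kullback-Leibler distance
# I_10(p, q) = sum p(x) * log10(p(x)/q(x)) for the two J = 2 candidates.
def kl10(p, q):
    return sum(pi * math.log10(pi / qi) for pi, qi in zip(p, q))

p  = [0.015, 0.015, 0.970]
q1 = [0.01, 0.01, 0.98]
q2 = [0.02, 0.02, 0.96]
print(round(kl10(p, q1), 6))   # -> 0.000962
print(round(kl10(p, q2), 6))   # -> 0.000617
```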
The results of this appendix are summarized in the following theorem.

Theorem: Let l_i be the contribution to the codeword length of the i-th
symbol x_i when the source conditional probabilities may be exactly
represented in the Q array, and let l'_i be the contribution of x_i when
the conditional probabilities are quantized to J digits. The codeword
length wasted by quantization may be bounded arbitrarily closely above:

l'_i - l_i ≤ δ + V_D(K) ,

where δ is an arbitrary real number satisfying 0 < δ < 1 , and V_D(K) =
-log_D (1 - D^(1-K)) is an exponentially decreasing function of the
precision K of register T . This is achieved by selecting a large enough
K and by selecting the number of digits J in the quantized probabilities
according to

J = ⌈ log_D (1/p_min) + log_D (1/δ) + 1 ⌉ .

Proof: This results from Theorem (A7) when (A11) is substituted into
(A5). The exponential behavior of V_D(K) is given by Theorem (47). Q.E.D.
APPENDIX B
MAXIMIZING VF MESSAGE LENGTH DOES NOT MINIMIZE RATE
Although the optimal minimum-rate VF code maximizes the steady-state
expected message length EN , it does not necessarily follow for a
conditional source (a source with memory) that maximizing the expected
message length for every history will achieve this optimum. This
surprising result is illustrated by the following counterexample. Suppose
we wish to encode a binary Markov chain [Feller, 1968] with the transition
matrix

        [ .1  .9 ]
        [ .6  .4 ]

into codewords of length one ternary symbol. Define a code set to be a
set of codes (as defined in Chapter 1) and a rule for selecting one of
them based on the state of the source, where the state of a source is all
aspects of its history which influence its future behavior. For a Markov
chain, the state is simply the previous symbol. Let code set A assign
the three codewords to the message set {0, 10, 11} for state 0 but use
the message set {00, 01, 1} for state 1, and let code set B use the
message set {0, 10, 11} for either state. The expected message length
given state 0 is 1.9 source symbols for either code set; given state 1
it is 1.6 for code set A and 1.4 for code set B. We shall see that
despite this, code set B is better.
We first investigate code set A. According to (6) the message
probabilities are

(.1, .54, .36) for the messages {0, 10, 11} from state 0 ,
(.06, .54, .4) for the messages {00, 01, 1} from state 1 ,

from which the expected message lengths are, by (7),

E(N_A | 0) = 1.9 ,   E(N_A | 1) = 1.6 .

A code set induces a subchain of states with one transition of the
subchain defined as one message emitted by the source, and the state of
the subchain defined as the last symbol of the previous message [Jelinek
and Schneider, 1974]. The probability of a transition from state a to
state b is thus the total probability of all messages from state a ending
in symbol b . The state transition matrix under code set A is therefore

        [ .64  .36 ]
        [ .06  .94 ]

The subchain stationary distribution is (1/7, 6/7) . Thus the expected
message length is

E N_A = (1/7)(1.9) + (6/7)(1.6) = 1.64 .

Similarly for code set B, the state transition matrix is

        [ .64  .36 ]
        [ .84  .16 ]

with stationary distribution (.7, .3) , and

E N_B = (.7)(1.9) + (.3)(1.4) = 1.75 .

The surprising result is that although E(N_A | 0) = E(N_B | 0) and
E(N_A | 1) > E(N_B | 1) , the unconditioned expected message lengths are
related in the reverse way, E N_A < E N_B , and code set B has the lower
rate.
An intuitive justification for this result occurs when the message
and hence code entropies are compared. For state 1 , the message entropy
for code set A is H_3(.06, .54, .4) = 0.790 ternary digits, but for code
set B the message entropy is H_3(.6, .24, .16) = 0.858 ternary digits.
As argued in Chapter 3, this means that code set B packs more information
into each codeword.
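The counterexample's arithmetic can be checked numerically, assuming the transition probabilities p(0|0) = .1, p(1|0) = .9, p(0|1) = .6, p(1|1) = .4, which reproduce every figure quoted above (the stationary distributions below are hard-coded from solving the two subchains):

```python
import math

def h3(probs):
    """Entropy in ternary digits."""
    return -sum(p * math.log(p, 3) for p in probs)

# Conditional expected message lengths quoted in the text:
EA = {0: 1.9, 1: 1.6}          # code set A
EB = {0: 1.9, 1: 1.4}          # code set B

# Stationary distributions of the message subchains:
piA = (1 / 7, 6 / 7)           # solves pi = pi * [[.64, .36], [.06, .94]]
piB = (0.7, 0.3)               # solves pi = pi * [[.64, .36], [.84, .16]]

ENA = piA[0] * EA[0] + piA[1] * EA[1]
ENB = piB[0] * EB[0] + piB[1] * EB[1]
print(round(ENA, 3), round(ENB, 3))   # -> 1.643 1.75, so E N_A < E N_B
print(round(h3([.06, .54, .4]), 3))   # -> 0.79  (code set A, state 1)
print(round(h3([.6, .24, .16]), 3))   # -> 0.858 (code set B, state 1)
```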
APPENDIX C
PROGRAM LISTINGS
**************************************************
*
*   SUBROUTINE NEXTT (T, Q(X))
*
*   THIS SUBROUTINE PERFORMS THE MULTIPLICATION
*   T = T * Q(X) TRUNCATED TO 12 SIGNIFICANT BITS.
*   USED BY STEPS FVE-3d, FVD-3d, VFE-3d, AND VFD-3d.
*
*   CALLING SEQUENCE:
*       INTEGER*2 T(2), Q(ALPHABET SIZE), X
*       CALL NEXTT (T, Q(X))
*
*   FORMAT OF ARGUMENTS:
*   T    - FLOATING POINT SPECIAL FORMAT, TWO HALFWORDS.
*          T(1) CONTAINS SIGNIFICANT FIELD, 12 BITS, LEFT-
*          ADJUSTED, WITH IMPLIED RADIX POINT BETWEEN
*          THE 12TH AND 13TH BIT FROM RIGHT.
*          T(2) CONTAINS EXPONENT FIELD, A POSITIVE
*          INTEGER, RIGHT-ADJUSTED.
*   Q(X) - ONE HALFWORD CONTAINING A FIXED-POINT,
*          RIGHT-ADJUSTED FRACTION WITH IMPLIED RADIX
*          POINT LEFT OF MOST SIGNIFICANT BIT.
*
**************************************************
*   [Assembler source and object code omitted.]
**************************************************
*
*   SUBROUTINE SCADD (F, T, C(X))
*
*   THIS SUBROUTINE PERFORMS THE MULTIPLICATION
*   AND ADDITION F = F + T * C(X).
*   USED BY STEPS FVE-3c AND VFE-3c.
*
*   FORMAT OF ARGUMENTS:
*   F    - MULTIPRECISION FRACTION, PACKED 16 BITS PER
*          HALFWORD IN ARRAY, IMPLIED RADIX POINT LEFT
*          OF MOST SIGNIFICANT BIT IN MOST SIGNIFICANT
*          (FIRST) HALFWORD.
*   T    - FLOATING POINT SPECIAL FORMAT, TWO HALFWORDS.
*          T(1) CONTAINS SIGNIFICANT FIELD, 12 BITS, LEFT-
*          ADJUSTED, WITH IMPLIED RADIX POINT BETWEEN
*          THE 12TH AND 13TH BIT FROM RIGHT.
*          T(2) CONTAINS EXPONENT FIELD, A POSITIVE
*          INTEGER, RIGHT-ADJUSTED.
*   C(X) - ONE HALFWORD CONTAINING A FIXED-POINT,
*          RIGHT-ADJUSTED FRACTION WITH IMPLIED RADIX
*          POINT LEFT OF MOST SIGNIFICANT BIT.
*
**************************************************
*   [Assembler source and object code omitted.]
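For illustration, the stated function of SCADD can be sketched in Python (a hypothetical rendering of F = F + T * C(X) over packed halfwords, not a transcription of the assembly; the Q16 conventions here are simplified assumptions, e.g. 1.0 is representable in this sketch's significand, unlike the 16-bit register):

```python
# Hypothetical sketch of SCADD's function: add the product T * C into the
# multiprecision fraction F at the position selected by T's exponent,
# with carries propagated toward the most significant halfword.  F is a
# list of 16-bit halfwords, most significant first, radix point left of
# F[0]; T is (significand, exponent), both fractions in Q16.

def scadd(F, T, c, frac_bits=16):
    """F += (t_sig * 2**-frac_bits) * (c * 2**-frac_bits) * 2**-t_exp."""
    t_sig, t_exp = T
    prod = t_sig * c                       # integer product of fractions
    # prod has 2*frac_bits fractional bits, shifted right t_exp more.
    shift = 2 * frac_bits + t_exp
    total = sum(h << (16 * (len(F) - 1 - i)) for i, h in enumerate(F))
    total += (prod << (16 * len(F))) >> shift
    for i in range(len(F) - 1, -1, -1):    # repack into halfwords
        F[i] = total & 0xFFFF
        total >>= 16
    return F

# Example: F = 0, T = (1.0 in Q16, exponent 0), C = 0.5 in Q16.
F = [0, 0, 0]
scadd(F, (1 << 16, 0), 1 << 15)
print([hex(h) for h in F])                 # -> ['0x8000', '0x0', '0x0']
```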
**************************************************
*
*   SUBROUTINE SCGET (F, T, RATIO)
*
*   THIS SUBROUTINE RETURNS THE QUOTIENT RATIO = F / T.
*   USED BY STEPS FVD-3 AND VFD-3.
*
*   CALLING SEQUENCE:
*       INTEGER*2 F(...), T(2), RATIO
*       CALL SCGET (F, T, RATIO)
*
*   FORMAT OF INPUTS:
*   F     - MULTIPRECISION FRACTION, PACKED 16 BITS PER
*           HALFWORD IN ARRAY, IMPLIED RADIX POINT LEFT
*           OF MOST SIGNIFICANT BIT IN MOST SIGNIFICANT
*           (FIRST) HALFWORD.
*   T     - FLOATING POINT SPECIAL FORMAT, TWO HALFWORDS.
*           T(1) CONTAINS SIGNIFICANT FIELD, 12 BITS, LEFT-
*           ADJUSTED, WITH IMPLIED RADIX POINT BETWEEN
*           THE 12TH AND 13TH BIT FROM RIGHT.
*           T(2) CONTAINS EXPONENT FIELD, A POSITIVE
*           INTEGER, RIGHT-ADJUSTED.
*   RATIO - ONE HALFWORD CONTAINING A FIXED-POINT,
*           RIGHT-ADJUSTED FRACTION WITH IMPLIED RADIX
*           POINT LEFT OF MOST SIGNIFICANT BIT.
*
**************************************************
*   [Assembler source and object code omitted.]
**************************************************
*
*   SUBROUTINE SCSUB (F, T, C(X))
*
*   THIS SUBROUTINE PERFORMS THE MULTIPLICATION AND
*   SUBTRACTION F = F - T * C(X), SUBTRACTING THE
*   PRODUCT FROM THE PROPER PLACE IN THE F ARRAY.
*
*   FORMAT OF ARGUMENTS:
*   F    - MULTIPRECISION FRACTION, PACKED 16 BITS PER
*          HALFWORD IN ARRAY, IMPLIED RADIX POINT LEFT
*          OF MOST SIGNIFICANT BIT IN MOST SIGNIFICANT
*          (FIRST) HALFWORD.
*   T    - FLOATING POINT SPECIAL FORMAT, TWO HALFWORDS.
*          T(1) CONTAINS SIGNIFICANT FIELD, 12 BITS, LEFT-
*          ADJUSTED, WITH IMPLIED RADIX POINT BETWEEN
*          THE 12TH AND 13TH BIT FROM RIGHT.
*          T(2) CONTAINS EXPONENT FIELD, A POSITIVE
*          INTEGER, RIGHT-ADJUSTED.
*   C(X) - ONE HALFWORD CONTAINING A FIXED-POINT,
*          RIGHT-ADJUSTED FRACTION WITH IMPLIED RADIX
*          POINT LEFT OF MOST SIGNIFICANT BIT.
*
**************************************************
*   [Assembler source and object code omitted.]
      SUBROUTINE DUMP (LU, N, LOC)
C
C     THIS SUBROUTINE DUMPS THE FIRST N HALFWORDS IN ARRAY LOC
C     (OR N HALFWORDS BEGINNING WITH LOC) IN HEXADECIMAL ONTO
C     LOGICAL UNIT LU.
C
C     [FORTRAN listing omitted.]
C
      SUBROUTINE DUMPV (LU, N, LOC, ...)
C
C     THIS SUBROUTINE DUMPS THE FIRST N HALFWORDS IN ARRAY LOC
C     (OR N HALFWORDS BEGINNING WITH LOC) IN HEXADECIMAL ONTO
C     LOGICAL UNIT LU.
C
C     [FORTRAN listing omitted.]
C     PROGRAM NAME = RFV          LATEST REVISION 4/2/76
C     PROGRAMMER = R. C. PASCO
C
C     FIXED-TO-VARIABLE SOURCE CODING ALGORITHM DEMONSTRATION
C     FOR BERNOULLI SOURCES.  D = 2, J = K = 12 BITS.
C
C     LOGICAL UNITS:
C       0 - HUMAN INTERFACE
C       6 - PRINTER FOR DATA AND RESULTS LOG
C
C     VARIABLES OF MAJOR SIGNIFICANCE:
C       XPTR = NUMBER OF X SYMBOLS ALREADY USED.
C       YPTR = NUMBER OF SYMBOLS IN Y ARRAY.
C       L    = CODEWORD LENGTH.
C
C     THE PROGRAM PROMPTS FOR THE SOURCE PROBABILITY AND THE
C     MESSAGE LENGTH N, GENERATES A RANDOM SOURCE MESSAGE,
C     ENCODES IT BY STEPS FVE-1 THROUGH FVE-5 (CALLING SCADD
C     AND NEXTT FOR EACH SOURCE SYMBOL), DECODES THE CODEWORD,
C     AND COMPARES THE DECODED MESSAGE WITH THE SOURCE MESSAGE.
C
C     [FORTRAN listing omitted.]
SOlJKCL FIISSSACF ( 4096 SYMRO1,S) = FFFFFFFF I'Fr7EFFF FFFFFFFF FFFFFRrF FTFFvnF'F FFFFFFFT Ff"FFITcT PFFT'TFTr FPFFFFFF rFFFFFI'l? F F F T F F F fFRFFFTS FFFPFFFF FFFFFrYF TFTTTPFF FlTrFFF*F FF7FFFFF FFFFFFFF FTTVFFFFF FFFFFFrF "FrFFFFT FFFFrPrF Tl'T'fT'l'T17 I'rf!'rcl'l' FFT'PFFFF VFFFFFFF Ff'FFFFFF FFFFrFTF FPFFFFFr FFFFFFFP FFFPFFrT ! : r r 7 F W r F7PFFFFT' FFFFFFFI, FFFTFFFF I'FFFFFTF AFFT'FFFF TFFTFFFF TFl'FFrrT7 r T ' F F T F r FTFTFRFF I'T'FFFFFF FPTrFPFF FFFFFFTF FPFFFFFF 17FFFFT'TI: FrT'FvfFF T'FFYF-'['T'I' FFFFFFFF E'FFFFFFF FDTFFFFF FFFFFFEF FFFFFFFF FFFFFrFr FFTFITFF F r r F f r F FFFFFDFF FFFFFFFF FFFFFFFF FFFFFFFF Fr rFFFFP FFFFFFFF FFvFrFrT lTyFFYrr FFFFFFFR FFFFFFFF FFFFFFFF FFFFFRFE F F F r r F F F FFFFFFFF FFrFFFFr r F f F r r r F FTFFFFFF FFFFFFFF FFFFFFFF FFFFFFTF FFYFFFFF FFFFFFEF F r r F F F F r 7FTfTTT'T FPPFFFFF FFFFFTFF FFFFFF7F vFFFFFF7 FFFT'FT'FF FFFFPT'rr FT'PTI'PTT Y r r T r r r r PFFFFFFF FFFFFFFF FFFFFFFF PFFFFFF7 FFTFrT'TF ITFFFFFPT rTFrPFFF FrcFPTr T'FFFFFFF rFFFFFFF FFT17FFFF TFFFFFFF RFFFFFTF FFFFTFFF rlTTTFTF VFPFPIW FFFFFFRF F7FFFFFF FFFFFFFP FFFFPFFf T F F r r r F F DFFFFrTr FrE'rFFFF T F F " F r v FFPFFFFF FTFFFFFF FFFPFFFF FFF7FFTr TTrFFFFI) FFFFFFE'F rTFTFFFT rFFFI'T'I'1' FFFFFFFF FFFFFFFF FFFF7FFF FFFFDFTI: FFFTFFFF FFFFFFFF FFVTFPFF TFFF!'PT.'lT
TNE CODEWOlVl I S ( 3fl1 BITS) : 5Rhn49A1 lnEl35495 PR426333 52DF438F PF7A67DR A2ARE4/+l+ CAOlj(i7FO nA@QI(j1:?n n7 l a n 2 ~ 7 ~ 6 a 5
COMPARTSOIJ SIICCESSFITL
UTT< = 41.4175 RITS/SYI'fROl,
SOURCE MESSAGE ( 4@96 SYMRO1,S) = FFFFFFFF FFFDFFFP FFFFRFFF FFFFFFrr FFPFFFFF FFFFFFFF FFVFrFr r rFFFFfFF FFVFFFFF FFFFFFFF FFVTFFFF FFFFPF7" TF'FFrFFF FFFFFFFF TrTPFrFF WFaVFl'F FFFFFFFF FFFFFFFF FFFFFnFF PFFFFFf17 FFFFFFFF FFFFFFFF r r F c f F F n r F F F F F F FFFRFFFF FFFFFFFF FFFFFFFF FFFFFPFT FPFFFFFF FFFFFFFT TPFT7rFFP TVFFrT'F FFFFFFEF FFFFFFFF FFFFFFFF FFFFFFrF FFFFFrFF VFFFFrFF FFTFFFFF FrFTprPF FFFFFFFF FFFFFFBF FFFFFFFP FFFFFnFF FFFFFFFF TFFFFFTF FFFFVFT'O T'rFFFrrF FFFFFFF7 FFFFFFFF FFFFFFFF FFFFFFFr FFllrFFFF FFFFFFFF FFFFTFFr F F r T F r F r FFFFFFFF FFFFFFFF FFFFEFFF FFFFFFFF FrFFFFFF FFFFFFFF FFF7FFFt7 rrrFT'WV FFFFFFFR FFFFFFFF FFFFHFFF FFFFFFFE FFFFFFFF FFFFFFFF F V F F I ~ F ~ F r F r F F F r r FFFFFFFF FFF7FFFr PFFFFFFF FFFFPFF7 FFFTFFFF FFFFFTFF F ? ) r 7 r F r F FFTFTTl'T FFFFFFFF rFFFFFFF FFFFFFFF FFFFFFTF TT'FFFFFr I'FFFEFrr W7FFTFT I~CPTI7Fl'l' FFFFFFFF FFFI'FPT'F TFFrFFFF FFFFFFVF 17FFFFTFF FFFFFFTF FFFfTFPF FFrl-FFfF P7FFFFFF FFFFFFFF FDfFfFFF F7FFFFFF FfFFFF'oF TFFEFTET rT'FfrFOc FFFFrrPF FFFrFFFF FFFFFFFF FFFTFFFF FFFFFFFr BFFFFFFF FFFFFFTF FPFrTPFF FFFFRrJ'F FFFFFFFF T,T7FFF7FF FFFFFFFF FFFFFFFF DFFFFFFF FFFFTVFF r7FFFFTiU FI 'FT'Wrr FFFFFFFF FEFFFFFF FPFFFFFF FFFFFFFF F V F F F F F F F F F F F T WFPFFFF I ' rFFFFFr
THE CODCTrCIRD I S ( 707 RTTS): 5FQ)R141%1 I3flE51)fl41) C7FFQ)2 19 85CBDAC7 2PA765194 57n3flhl C A?66l\Dnl Dl '3Friznh 638CC9C7 7952E422
COFlPAPT SnFl SITCCESSFITT,
PATC = l.@fl@ RITS/SYWROI,
SOURCE MESSAGE ( 4096 SYMBOLS) = [hexadecimal dump of random symbols; illegible in transcription]
THE CODEWORD IS ( 4097 BITS): [hexadecimal dump illegible in transcription]
COMPARISON SUCCESSFUL
PROGRAM NAME = MFV          LATEST REVISION 4/2/76
PROGRAMMER = R.C.PASCO
FIXED-TO-VARIABLE SOURCE CODING ALGORITHM DEMONSTRATION
FOR 1ST ORDER MARKOV SOURCES.
THIS IMPLEMENTATION HAS PARAMETERS
D = 2, J = K = 12 BITS, AND L <= 4096 BITS.
VARIABLES OF MAJOR SIGNIFICANCE:
MALPH  = ALPHABET SIZE. (SYMBOLS RUN FROM 1 THROUGH MALPH).
XPTR   = NUMBER OF X SYMBOLS ALREADY USED.
YPTR   = NUMBER OF SYMBOLS IN Y ARRAY.
L      = CODEWORD LENGTH
Q(I,J) = PROBABILITY THAT SYMBOL I IS FOLLOWED BY J
COMMON /CHANNL/ F1, F
LOGICAL LOG, NOLOG
DOUBLE PRECISION A(4,4), B(4), WKAREA(4), QQ, H(4), EH
INTEGER*2 X(1000), XDIM, XPTR, Y(1000), YDIM, YPTR, FIRST, LAST
INTEGER*2 F1, F(129), FDIM, L, N, Q(4,4), QLEFT, C(4,4), T(2), FOVERT
INTEGER*2 ANSWER, YES, XPREV, YPREV, XPREV1, QDIM
DATA YES/1HY/
DATA FDIM, XDIM, YDIM /129, 1000, 1000/, QDIM/4/
F1 = 0
WRITE (0,001)
001 FORMAT ('MARKOV FV CODING DEMONSTRATION.')
CALL PROMPT (0, 'DO YOU WANT DATA LOG ON LU 6 (Y OR N)? ')
READ (0,004) ANSWER
004 FORMAT (A1)
LOG = (ANSWER.EQ.YES)
NOLOG = .NOT. LOG
ESTABLISH ALPHABET SIZE AND DISTRIBUTIONS:
CALL PROMPT (0, 'SOURCE ALPHABET SIZE = ? ')
002 READ (0,003) MALPH
003 FORMAT (I1)
IF ((MALPH.LE.0).OR.(MALPH.GT.QDIM)) GO TO 002
IF (LOG) WRITE (6,005) MALPH
005 FORMAT (/' ALPHABET SIZE =', I5)
DO 080 XPREV = 1, MALPH
C(XPREV,1) = 0
QLEFT = 4096
H(XPREV) = 0.
MALPH1 = MALPH - 1
XPREV1 = XPREV - 1
WRITE (0,010) XPREV1
010 FORMAT ('ENTER NUMERATORS OF PROBABILITIES OF SYMBOLS FOL',
1 'LOWING', I3)
DO 035 I = 1, MALPH1
I1 = I - 1
WRITE (0,012) QLEFT, I1
012 FORMAT ('SPACE REMAINING =', I5, '/4096; ENTER PROBABILITY NO.', I3)
019 CALL PROMPT (0, '?/4096 ')
READ (0,020) Q(XPREV,I)
020 FORMAT (I4)
IF ((Q(XPREV,I).GE.0).AND.(Q(XPREV,I).LE.QLEFT)) GO TO 030
WRITE (0,022)
022 FORMAT ('SORRY, OUT OF RANGE.  TRY AGAIN ON THAT ONE.')
GO TO 019
030 QLEFT = QLEFT - Q(XPREV,I)
IF (Q(XPREV,I).EQ.0) GO TO 035
A(I,XPREV) = DBLE(Q(XPREV,I))/4096.
H(XPREV) = H(XPREV) + A(I,XPREV)*DLOG(A(I,XPREV))
C(XPREV,I+1) = C(XPREV,I) + Q(XPREV,I)
035 CONTINUE
Q(XPREV,MALPH) = QLEFT
IF (QLEFT.EQ.0) GO TO 037
QQ = DBLE(QLEFT)/4096.
H(XPREV) = H(XPREV) + QQ*DLOG(QQ)
037 H(XPREV) = H(XPREV)/DLOG(0.5D0)
A(MALPH,XPREV) = 1.0D0
IF (NOLOG) GO TO 080
WRITE (6,070) (Q(XPREV,I), C(XPREV,I), I=1,MALPH)
070 FORMAT (/' PROBABILITY Q(X)  CUMULATIVE C(X)'/(2(5X,I6,'/4096')))
WRITE (6,071) H(XPREV)
071 FORMAT (/'ENTROPY =', F6.3, ' BITS/SYMBOL.'/)
080 CONTINUE
IF (NOLOG) GO TO 081
DO 073 I = 1, MALPH1
A(I,I) = A(I,I) - 1.D0
073 B(I) = 0.D0
B(MALPH) = 1.D0
CALL LEQT1F (A, 1, MALPH, QDIM, B, 0, WKAREA, IER)
WRITE (6,074) (B(I), I=1,MALPH)
074 FORMAT ('STATIONARY DISTRIBUTION'/4(F10.4))
EH = 0.D0
DO 075 I = 1, MALPH
075 EH = EH + H(I)*B(I)
WRITE (6,076) EH
076 FORMAT ('AVERAGE SOURCE ENTROPY =', F6.3, ' BITS/SYMBOL'/)
OBTAIN BLOCK LENGTH.
081 CALL PROMPT (0, 'BLOCK LENGTH N = ? ')
READ (0,020) N
SET UP SOURCE ARRAY ACCORDING TO PROBABILITIES
WRITE (0,091)
091 FORMAT (/'GENERATING SOURCE')
XPREV = 1
DO 097 XPTR = 1, XDIM
I = IRAND(4096)
ISYM = MALPH
095 IF (C(XPREV,ISYM).LE.I) GO TO 096
ISYM = ISYM - 1
GO TO 095
096 X(XPTR) = ISYM
097 XPREV = ISYM
XPREV = 1
YPREV = 1
LAST = 0
090 FIRST = LAST + 1
LAST = FIRST + N - 1
IF (LAST.GT.XDIM) GO TO 901
WRITE (0,210)
210 FORMAT ('DECODING')
FVD-2:
T(1) = 2048
T(2) = 0
FVD-3:
DO 250 YPTR = FIRST, LAST
FVD-3A:
CALL SCGET (F, T, FOVERT)
ISYM = MALPH
225 IF (FOVERT.GE.C(YPREV,ISYM)) GO TO 230
ISYM = ISYM - 1
GO TO 225
FVD-3B:
230 Y(YPTR) = ISYM
FVD-3C:
CALL SCSUB (F, T, C(YPREV,ISYM))
FVD-3D:
CALL NEXTT (T, Q(YPREV,ISYM))
250 YPREV = ISYM
WRITE (0,305)
305 FORMAT ('DECODING COMPLETE.')
WRITE (0,401)
401 FORMAT ('BEGIN COMPARING')
IF (YPTR.NE.LAST) GO TO 903
DO 400 I = FIRST, XPTR
IF (X(I).NE.Y(I)) GO TO 904
400 CONTINUE
WRITE (0,420)
IF (LOG) WRITE (6,420)
420 FORMAT ('COMPARISON SUCCESSFUL')
GO TO 100
901 WRITE (0,911)
GO TO 1001
902 WRITE (0,912)
GO TO 1001
903 WRITE (0,913)
GO TO 1001
904 WRITE (0,913)
GO TO 1001
911 FORMAT (/'RAN OUT OF SOURCE DATA.')
912 FORMAT (' OUTPUT BUFFER FULL')
913 FORMAT (' DECODING ERROR')
1001 WRITE (0,1002)
1002 FORMAT ('TO RESTART WITH SAME PARAMETERS, TYPE "GO"')
PAUSE
GO TO 090
END
WRITE (0,105)
105 FORMAT (//'ENCODING')
FVE-1:
DO 110 I = 1, FDIM
110 F(I) = 0
FVE-2:
T(1) = 2048
T(2) = 0
FVE-3:
DO 150 XPTR = FIRST, LAST
FVE-3B:
ISYM = X(XPTR)
FVE-3C:
CALL SCADD (F, T, C(XPREV,ISYM))
FVE-3D:
CALL NEXTT (T, Q(XPREV,ISYM))
150 XPREV = ISYM
FVE-4:
L = T(2) + 1
FVE-5:
TO ADD 2**(-L), WE ADD (1.0 * 2**(-L+1)) * (0.5)
T(1) = 2048
T(2) = L-1
CALL SCADD (F, T, 2048)
WRITE (0,178) N, L
178 FORMAT (I6, ' SYMBOLS ENCODED; CODEWORD LENGTH =', I5, ' BITS.')
IF (NOLOG) GO TO 200
RATE = FLOAT(L)/FLOAT(N)
WRITE (6,190) RATE
190 FORMAT (/'RATE =', F6.3, ' BITS/SYMBOL')
WRITE (6,179) N
179 FORMAT (/'SOURCE MESSAGE (', I5, ' SYMBOLS) =')
CALL DUMPV (6, N, X(FIRST), MALPH)
WRITE (6,180) L
180 FORMAT (/'THE CODEWORD IS (', I5, ' BITS):')
CALL DUMP (6, (L+15)/16, F)
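The SCADD/NEXTT encode loop and the SCGET/SCSUB decode loop of the MFV listing above admit a compact restatement. The sketch below is not the listing's 12-bit fixed-point channel arithmetic; it substitutes exact rationals for F and T, uses 0-based symbol indices, and assumes the 3-symbol transition probabilities of the demonstration run, all for illustration only.

```python
from fractions import Fraction

# Transition probabilities Q(s | p) as numerators out of 4096, one row per
# previous symbol p, as in the demonstration tables.  C holds the running
# cumulative sums, so C[p][s] plays the role of C(XPREV,ISYM) in the listing.
NUM = [[410, 3276, 410],
       [410, 410, 3276],
       [3276, 410, 410]]
Q = [[Fraction(n, 4096) for n in row] for row in NUM]
C = [[sum(row[:s], Fraction(0)) for s in range(4)] for row in Q]

def encode(msg, prev=0):
    """Fold a fixed-length block into one number F; T is the interval width."""
    F, T = Fraction(0), Fraction(1)
    for s in msg:
        F += C[prev][s] * T      # FVE-3C: CALL SCADD(F, T, C(XPREV,ISYM))
        T *= Q[prev][s]          # FVE-3D: CALL NEXTT(T, Q(XPREV,ISYM))
        prev = s
    return F

def decode(F, n, prev=0):
    """Recover the n symbols of the block from the codeword F."""
    T, out = Fraction(1), []
    for _ in range(n):
        r = F / T                # FVD-3A: CALL SCGET(F, T, FOVERT)
        s = max(k for k in range(3) if C[prev][k] <= r)
        out.append(s)
        F -= C[prev][s] * T      # FVD-3C: CALL SCSUB(F, T, C(YPREV,ISYM))
        T *= Q[prev][s]          # FVD-3D: CALL NEXTT
        prev = s
    return out

msg = [1, 1, 2, 0, 1, 2, 2, 1]
assert decode(encode(msg), len(msg)) == msg
```

Because F and T are exact here, the round trip is lossless for any block; the listing obtains the same effect with J = K = 12-bit truncated arithmetic at the cost of the small redundancy bounded in the text.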
 PROBABILITY Q(X)   CUMULATIVE C(X)
     410/4096            0/4096
    3276/4096          410/4096
     410/4096         3686/4096

ENTROPY = 0.923 BITS/SYMBOL.
 PROBABILITY Q(X)   CUMULATIVE C(X)
     410/4096            0/4096
     410/4096          410/4096
    3276/4096          820/4096
 PROBABILITY Q(X)   CUMULATIVE C(X)
    3276/4096            0/4096
     410/4096         3276/4096
     410/4096         3686/4096

ENTROPY = 0.923 BITS/SYMBOL.
STATIONARY DISTRIBUTION    0.3333    0.3333    0.3333
AVERAGE SOURCE ENTROPY = 0.923 BITS/SYMBOL
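The printed figures can be cross-checked: each row of the transition matrix is a permutation of (410, 3276, 410)/4096, so every state has the same conditional entropy, and the column sums also equal 4096, making the matrix doubly stochastic and the stationary distribution uniform. The check below assumes the row ordering implied by the tables above; any permutation of rows with unit column sums gives the same result.

```python
import math

# Transition numerators out of 4096, one row per previous symbol (assumed
# from the demonstration tables above).
rows = [[410, 3276, 410],
        [410, 410, 3276],
        [3276, 410, 410]]
P = [[n / 4096 for n in row] for row in rows]

# Conditional entropy of any one state (all rows are permutations of each
# other, so one row suffices); should reproduce ENTROPY = 0.923.
H = -sum(p * math.log2(p) for p in P[0])
assert abs(H - 0.923) < 1e-3

# Power iteration for the stationary distribution; a doubly stochastic
# matrix preserves the uniform vector, matching 0.3333 0.3333 0.3333.
pi = [1/3, 1/3, 1/3]
for _ in range(200):
    pi = [sum(pi[i] * P[i][j] for i in range(3)) for j in range(3)]
assert all(abs(v - 1/3) < 1e-9 for v in pi)
```

Since every state's conditional entropy is the same 0.923 bits, the stationary average EH printed by the program necessarily equals that common value.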
THE CODEWORD IS ( 906 BITS): [hexadecimal dump illegible in transcription]
COMPARISON SUCCESSFUL
PROGRAM NAME = VF           LATEST REVISION 4/2/76
PROGRAMMER = R.C.PASCO
VARIABLE-TO-FIXED SOURCE CODING ALGORITHM DEMONSTRATION
FOR INDEPENDENT, IDENTICALLY DISTRIBUTED SOURCE MODEL.
THIS IMPLEMENTATION HAS PARAMETERS
D = 2, J = K = 12 BITS, AND L <= 4096 BITS.
I/O: 0 - HUMAN INTERFACE.  6 - PRINTOUT - DATA AND RESULTS LOG.
MALPH = ALPHABET SIZE. (SYMBOLS RUN FROM 1 THROUGH MALPH).
LW    = CODEWORD LENGTH / 16
XPTR  = NUMBER OF X SYMBOLS ALREADY USED.
YPTR  = NUMBER OF SYMBOLS IN Y ARRAY.
L     = CODEWORD LENGTH (A MULTIPLE OF 16).
COMMON /CHANNL/ F1, F
LOGICAL LOG, NOLOG
INTEGER*2 X(4000), XDIM, XPTR, Y(4000), YDIM, YPTR, FIRST, LAST
INTEGER*2 F1, F(257), FDIM, LW, Q(99), QLEFT, C(99), T(2), FOVERT
INTEGER*2 N, ANSWER, YES
DATA YES/1HY/
DATA FDIM, XDIM, YDIM /257, 4000, 4000/
F1 = 0
WRITE (0,001)
001 FORMAT ('VARIABLE-TO-FIXED SOURCE CODING DEMONSTRATION.')
CALL PROMPT (0, 'DO YOU WANT DATA LOG ON LU 6 (Y OR N)? ')
READ (0,004) ANSWER
004 FORMAT (A1)
LOG = (ANSWER.EQ.YES)
NOLOG = .NOT. LOG
ESTABLISH ALPHABET SIZE AND DISTRIBUTIONS.
CALL PROMPT (0, 'MALPH = ? ')
002 READ (0,003) MALPH
003 FORMAT (I2)
IF ((MALPH.LE.0).OR.(MALPH.GT.99)) GO TO 002
IF (LOG) WRITE (6,005) MALPH
005 FORMAT (//' ALPHABET SIZE =', I5)
C(1) = 0
QLEFT = 4096
H = 0.
MALPH1 = MALPH - 1
WRITE (0,010) MALPH1
010 FORMAT ('ENTER', I3, ' NUMERATORS OF PROBABILITIES')
DO 035 I = 1, MALPH1
WRITE (0,012) QLEFT, I
012 FORMAT ('SPACE REMAINING =', I5, '/4096; ENTER PROBABILITY NO.', I3)
019 CALL PROMPT (0, '?/4096 ')
READ (0,020) Q(I)
020 FORMAT (I4)
IF ((Q(I).GT.0).AND.(Q(I).LT.QLEFT)) GO TO 030
WRITE (0,022)
022 FORMAT ('SORRY, OUT OF RANGE.  TRY AGAIN ON THAT ONE.')
GO TO 019
030 QLEFT = QLEFT - Q(I)
QQ = Q(I)/4096.
H = H + QQ*ALOG(QQ)
C(I+1) = C(I) + Q(I)
035 CONTINUE
Q(MALPH) = QLEFT
QQ = QLEFT/4096.
H = H + QQ*ALOG(QQ)
H = -H/ALOG(2.0)
IF (NOLOG) GO TO 080
WRITE (6,070) (Q(I), C(I), I=1,MALPH)
070 FORMAT (/' PROBABILITY Q(X)  CUMULATIVE C(X)'/(2(5X,I6,'/4096')))
WRITE (6,071) H
071 FORMAT (/'ENTROPY =', F6.3, ' BITS/SYMBOL.'//)
080 CALL PROMPT (0, 'CODEWORD LENGTH L = ? ')
READ (0,020) L
IF (MOD(L,16).NE.0) GO TO 080
LW = L/16
LW1 = LW + 1
IF ((LW1.LT.2).OR.(LW1.GT.FDIM)) GO TO 080
LMJ = L - 12
WRITE (0,091)
091 FORMAT ('GENERATING SOURCE')
DO 096 XPTR = 1, XDIM
I = IRAND(4096)
ISYM = MALPH
095 IF (C(ISYM).LE.I) GO TO 096
ISYM = ISYM - 1
GO TO 095
096 X(XPTR) = ISYM
LAST = 0
WRITE (0,105)
105 FORMAT (/'ENCODING')
VFE-1:
DO 110 I = 1, LW1
110 F(I) = 0
VFE-2:
T(1) = 2048
T(2) = 0
WHILE TAU < L-J DO VFE-3
VFE-3B:
120 XPTR = XPTR + 1
IF (XPTR.GT.XDIM) GO TO 901
ISYM = X(XPTR)
VFE-3C:
CALL SCADD (F, T, C(ISYM))
VFE-3D:
CALL NEXTT (T, Q(ISYM))
IF (T(2).LT.LMJ) GO TO 120
VFE-4:
F(LW1) = 0
TO ADD 2**(-L), WE ADD (1.0 * 2**(-L+1)) * (0.5)
T(1) = 2048
T(2) = L-1
CALL SCADD (F, T, 2048)
LAST = XPTR
N = LAST - FIRST + 1
WRITE (0,178) N, L
178 FORMAT (I6, ' SYMBOLS ENCODED; CODEWORD LENGTH =', I5, ' BITS.')
IF (NOLOG) GO TO 200
RATE = FLOAT(L)/FLOAT(N)
WRITE (6,190) RATE
190 FORMAT (/'RATE =', F6.3, ' BITS/SYMBOL.')
WRITE (6,179) N
179 FORMAT (/'SOURCE MESSAGE (', I5, ' SYMBOLS) =')
CALL DUMPV (6, N, X(FIRST), MALPH)
WRITE (6,180) L
180 FORMAT (/'THE CODEWORD IS (', I5, ' BITS):')
CALL DUMP (6, LW, F)
200 WRITE (0,210)
210 FORMAT ('DECODING')
VFD-2:
T(1) = 2048
T(2) = 0
WHILE TAU < L-J DO VFD-3
' ~ ) F D - : ~ B CHLL :SCGET (F 9 T 9 FOVERT) IS'r'M = MALPH I F CFOVERT . GE. C ( I SL.i'MM:r GO TO 23 0 I:S'fM = IS'fM-1 150 TO 225
VFD-3B:
230 YPTR = YPTR + 1
IF (YPTR.GT.YDIM) GO TO 902
Y(YPTR) = ISYM
VFD-3C:
CALL SCSUB (F, T, C(ISYM))
VFD-3D:
CALL NEXTT (T, Q(ISYM))
IF (T(2).LT.LMJ) GO TO 220
WRITE (0,305)
305 FORMAT ('DECODING COMPLETE.')
WRITE (0,401)
401 FORMAT ('BEGIN COMPARING')
IF (YPTR.NE.LAST) GO TO 903
DO 400 I = FIRST, LAST
IF (X(I).NE.Y(I)) GO TO 904
400 CONTINUE
WRITE (0,420)
IF (LOG) WRITE (6,420)
420 FORMAT ('COMPARISON SUCCESSFUL')
GO TO 100
901 WRITE (0,911)
GO TO 1001
902 WRITE (0,912)
GO TO 1001
903 WRITE (0,913)
GO TO 1001
904 WRITE (0,914) I
GO TO 1001
911 FORMAT (' RAN OUT OF SOURCE DATA.')
912 FORMAT (' OUTPUT BUFFER FULL')
913 FORMAT (' DECODING ERROR: LENGTH MISMATCH')
914 FORMAT (' DECODING ERROR: MISMATCH AT', I5, '-TH SYMBOL')
1001 WRITE (0,1002)
1002 FORMAT ('TO RESTART WITH SAME PARAMETERS, TYPE "GO"')
PAUSE
GO TO 090
END
THE CODEWORD IS ( 2048 BITS): [hexadecimal dump illegible in transcription]
COMPARISON SUCCESSFUL
ENTROPY = 2.697 BITS/SYMBOL.
RATE = 2.825 BITS/SYMBOL
SOURCE MESSAGE = [dump illegible in transcription]
THE CODEWORD IS ( 2048 BITS): [hexadecimal dump illegible in transcription]
COMPARISON SUCCESSFUL
SOURCE MESSAGE = [dump illegible in transcription]
Bibliography
Norman Abramson, Information Theory and Coding, McGraw-Hill, New York, 1963.
Robert Ash, Information Theory, Wiley Interscience, New York, 1965.
L.R. Bahl and H. Kobayashi, "Image Data Compression by Predictive Coding," IBM J. Res. Develop., March 1974, pp. 164-179.
H. Blasbalg and R. Van Blerkom, "Message Compression," IRE Trans. Space and Elec. Telem., Sept., 1962, pp. 228-238.
S.D. Bradley, "Optimizing a Scheme for Run Length Encoding," Proc. IEEE, Jan. 1969, pp. 108-109.
Larry Carter and John Gill, "Conjectures on Uniquely Decipherable Codes," IEEE Trans. Info. Theory, Vol. IT-20, No. 3, May 1974, pp. 394-396.
David L. Cohn, "Optimum Noiseless Source Codes with Fixed Dictionary Size," to be presented at IEEE International Symposium on Information Theory, Ronneby, Sweden, June 21-24, 1976.
Thomas M. Cover, "Enumerative Source Encoding", IEEE Trans. Info. Theory, Vol. IT-19, No. 1, Jan. 1973, pp. 73-77.
L.D. Davisson, "Comments on 'Sequence Time Coding for Data Compression"', Proc. IEEE, Vol. 54, Dec. 1966, p. 2010.
L.D. Davisson, "Comments on 'An Algorithm for Source Coding"', IEEE Trans. Info. Theory, Vol. IT-18, No. 6, Nov. 1972.
L.D. Davisson, "Universal Noiseless Coding," IEEE Trans. Info. Theory, Vol. IT-19, No. 6, Nov. 1973, pp. 783-795.
R.M. Fano, Technical Report No. 65, The Research Laboratory of Electronics, M.I.T., 1948.
William Feller, An Introduction to Probability Theory and its Applications, Vol. I, Third Edition, John Wiley and Sons, New York, 1968.
Robert G. Gallager, Information Theory and Reliable Communication, John Wiley and Sons, New York, 1968.
E.N. Gilbert, "Codes Based on Inaccurate Source Probabilities," IEEE Trans. Info. Theory, Vol. IT-17, No. 3, May 1971, pp.304-314.
E.N. Gilbert and E.F. Moore, "Variable-Length Binary Encodings," Bell System Tech. J., Vol. 38, July 1959, pp. 933-967.
Solomon W. Golomb, "Run-Length Encodings," IEEE Trans. Info. Theory, July 1966, pp. 399-401.
Thomas S. Huang, "An Upper Bound on the Entropy of Run-Length Coding," IEEE Trans. Info. Theory, Vol. IT-20, Sept. 1974, pp. 675-676.
David A. Huffman, "A Method for the Construction of Minimum-Redundancy Codes," Proc. IRE, Vol. 40, No. 9, Sept. 1952, pp. 1098-1101.
Interdata, Inc., User's Manual, Publication No. 29-261R01, Interdata, Inc., Oceanport, New Jersey, 1971.
Frederick Jelinek, Probabilistic Information Theory, McGraw Hill, 1968, pp. 476-489.
Frederick Jelinek, "Buffer Overflow in Variable Length Coding of Fixed Rate Sources," IEEE Trans. I.T., Vol. IT-14, No. 3, May 1968, pp. 490-501.
Frederick Jelinek and Kenneth S. Schneider, "On Variable-Length-to-Block Coding," IEEE Trans. Info. Theory, Vol. IT-18, No. 6, Nov. 1972, pp. 765-774.
Frederick Jelinek and Kenneth S. Schneider, "Variable-Length Encoding of Fixed-Rate Markov Sources for Fixed-Rate Channels," IEEE Trans. Info. Theory, Vol. IT-20, No. 6, Nov. 1974, pp. 750-755.
Donald E. Knuth, The Art of Computer Programming, Vol. 2, 1st. edition, Addison-Wesley, 1971.
S. Kullback and R.A. Leibler, "On Information and Sufficiency," The Annals of Mathematical Statistics, Vol. 22, No. 1, March 1951, pp. 79-86.
Thomas J. Lynch, "Sequence Time Coding for Data Compression," Proceedings IEEE, Vol. 54, Oct. 1966, pp. 1490-1491.
H. Meyr, Hans G. Rosdolsky, and Thomas S. Huang, "Optimum Run-Length Codes," IEEE Trans. on Communications, Vol. COM-22, No. 6, June 1974, pp. 826-835.
John I. Molinder, "Optimal Coding with a Single Standard Run Length," IEEE Trans. Infor. Theory, Vol. IT-20, No. 3, May 1974, pp. 336-
J. Rissanen, "Generalized Kraft Inequality and Arithmetic Coding of Strings," IBM J. Res. and Dev., May 1976, (to be published).
J. Pieter M. Schalkwijk, "An Algorithm for Source Coding," IEEE Trans. Info. Theory, Vol. IT-18, No. 3, May 1972, pp. 395-399.
C.E. Shannon, "A Mathematical Theory of Communication," Bell System Tech. J., Vol. 27, No. 3, July 1948, pp. 379-423 and pp. 623-656.
B.P. Tunstall, "Synthesis of Noiseless Compression Codes," Ph.D. dissertation, Georgia Inst. Tech., Atlanta, 1968, (quoted in Jelinek and Schneider, 1972).
David C. Van Voorhis, "Constructing Codes with Bounded Codeword Lengths," IEEE Trans. Info. Theory, March 1974, pp. 288-299.
David C. Van Voorhis, "Practical Noiseless Coding," talk presented at Stanford University EE-375 Information Systems Seminar, Oct. 16, 1975.
Abraham Wald, Sequential Analysis, John Wiley and Sons, 1947, and Dover Publications, New York, 1973.