
1412 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 60, NO. 3, MARCH 2014

Huffman Redundancy for Large Alphabet Sources

Hamed Narimani and Mohammadali Khosravifard, Member, IEEE

Abstract— The performance of optimal prefix-free encoding for memoryless sources with a large alphabet size is studied. It is shown that the redundancy of the Huffman code for almost all sources with a large alphabet size n is very close to that of the average distribution of the monotone sources with n symbols. This value lies between 0.02873 and 0.02877 bit for sufficiently large n.

Index Terms— Huffman redundancy, memoryless source, monotone source, average monotone source.

I. INTRODUCTION

REDUNDANCY is one of the most important measures for the evaluation of the performance of a code for a source. Therefore, one of the basic problems in the context of source coding is to find a code with minimum possible redundancy for a given source. It is well-known that in the case of memoryless sources, the optimal prefix-free code, i.e., the minimum redundancy code, can be easily constructed by the Huffman algorithm [1]. As our focus is on memoryless sources, throughout this paper we use the term source instead of memoryless source. Also, as in many cases we prefer that the codewords be decoded uniquely and instantaneously, we only consider the class of prefix-free codes (whose codelengths satisfy the Kraft inequality).

Now that the problem of finding the minimum redundancy code is solved, the first natural question is: "how much is the redundancy of the optimal code?" The first, simplest and most general answer to this question is that the Huffman redundancy, i.e., the redundancy of the Huffman code, lies in [0, 1). The statement is always true regardless of the number of source symbols. The Huffman redundancy is zero for dyadic sources, and can be arbitrarily close to 1 for low-entropy sources. In addition to the aforementioned general bounds, i.e., 0 and 1, several bounds are derived in the literature which depend on some particular characteristics of the source such as the smallest, the largest, or any arbitrary symbol probability [2]–[7].

Although these bounds reflect the performance of the Huffman coding scheme to a large extent, there are some questions which cannot be answered by these bounds. For instance, note that the Huffman redundancy is a continuous function over the set of sources with n symbols. In other words, for any $r \in [0, 1)$ and any arbitrary n, there exists a source with n symbols for which the Huffman redundancy is equal to r. Now, the question is: Are all values between 0 and 1 equally expectable for the Huffman redundancy? If not, does the Huffman redundancy take a large value (close to 1), a small value (close to 0) or a moderate value (close to 0.5), for most of the sources?

Manuscript received March 19, 2013; accepted November 5, 2013. Date of publication December 11, 2013; date of current version February 12, 2014.
The authors are with the Department of Electrical and Computer Engineering, Isfahan University of Technology, Isfahan 8415683111, Iran (e-mail: [email protected]; [email protected]).
Communicated by A. B. Wagner, Associate Editor for Shannon Theory.
Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/TIT.2013.2294832

Furthermore, the mentioned bounds may not be helpful for a comprehensive comparison between the Huffman coding scheme and any suboptimal method. Particularly, although we know that the Shannon redundancy is larger than or equal to the Huffman redundancy for any arbitrary source, the lower and upper bounds on the redundancy are the same (0 and 1, respectively) for both codes. Hence, it is not possible to evaluate the degree of superiority of the Huffman code based on these bounds.¹

These arguments suggest looking at the problem of evaluating the performance of a coding scheme (in particular, Huffman coding) from a point of view other than the upper and lower bounds on the redundancy. We should adopt a measure which evaluates the performance of a coding scheme in an overall manner. In other words, unlike the bounds on the redundancy which depend only on some specific sources with the smallest or the largest redundancy, we need an overall performance measure in which all underlying sources play their own role.

In order to make a reasonable judgment, the overall performance evaluation must be carried out over a set of sources that are comparable in a sense. In this paper we consider all sources with n symbols as our evaluation space. As we will see, with this choice of evaluation space some significant properties of the Huffman redundancy will be revealed. Also, since there is no preference for any of the n-tuple sources in the general case, they should all have the same weight of contribution in the adopted measure.

Therefore, the problem of evaluating the overall performance of the Huffman code over the set of sources with n symbols can be restated in a probabilistic framework, as follows. We consider the set of sources with n symbols as the sample space and assume a uniform distribution over this set. In this way, the Huffman redundancy is a random variable whose statistical properties reflect the overall performance of the Huffman coding scheme for n-tuple sources. For instance, the expected value of this random variable carries some information about the overall performance of the Huffman code, as it is influenced by all sources.

This approach was previously proposed and used in [8] in order to evaluate the overall performance of the Shannon code.

¹A similar argument was made in [8] for the Shannon coding scheme.



Fig. 1. The empirical histogram of the Huffman redundancy over the set of sources with n symbols for n = 10, 20, 50, 100, 200, 500 and 1000.

In this way, the average redundancy of the Shannon code over the set of all sources with n symbols was formulated and studied. Also, the asymptotic behaviors (i.e., as the alphabet size n tends to infinity) of the average and the variance of the Shannon redundancy were investigated. It was shown that for almost all sources with a large alphabet, the Shannon redundancy is very close to 0.5 bit.

Clearly, in order to have a complete evaluation of the performance of a coding scheme, it is ideal to have the probability density function of the redundancy, which includes all we need to know about the overall performance. However, since there is no explicit relationship between the symbol probabilities and the Huffman codelengths, it seems impossible to analytically derive a formula for the pdf of the Huffman redundancy. Thus, in the first step, to gain an insight into the issue, we employed simulation. To do so, for a given n, $10^8$ n-tuple sources were generated at random, and their corresponding Huffman redundancies were computed. The empirical histogram of the Huffman redundancy is illustrated in Fig. 1 for n = 10, 20, 50, 100, 200, 500 and 1000.
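The experiment is easy to reproduce on a smaller scale. The following Python sketch (ours, for illustration; the trial count is far below the $10^8$ used above, and the helper names are our own) samples sources uniformly from the simplex and computes the Huffman redundancy, using the fact that the average Huffman codelength equals the sum of the merged weights in the Huffman procedure.

```python
import heapq, math, random

def huffman_avg_len(p):
    """Average Huffman codelength: the sum of all merged (internal-node) weights."""
    heap = list(p)
    heapq.heapify(heap)
    total = 0.0
    while len(heap) > 1:
        a, b = heapq.heappop(heap), heapq.heappop(heap)
        total += a + b
        heapq.heappush(heap, a + b)
    return total

def huffman_redundancy(p):
    """R*(P) = L*(P) - H(P), in bits."""
    entropy = -sum(x * math.log2(x) for x in p if x > 0)
    return huffman_avg_len(p) - entropy

def random_source(n):
    """Uniform sample from the probability simplex: normalized i.i.d. Exp(1)."""
    x = [random.expovariate(1.0) for _ in range(n)]
    s = sum(x)
    return [xi / s for xi in x]

n, trials = 100, 2000   # illustrative; the paper uses 10^8 trials
mean_r = sum(huffman_redundancy(random_source(n)) for _ in range(trials)) / trials
print(mean_r)           # concentrates near 0.0288 bit as n grows
```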

Three major observations from Fig. 1 are:

1) In general, the Huffman redundancy is distributed in a bell-shaped form; very small and very large redundancies (close to 0 or 1) are not likely.

2) As n increases, the pdf tends to be more concentrated around its average. In other words, the variance of the redundancy decreases, and may tend to zero, as the alphabet size increases.

3) As n increases, the mean of the Huffman redundancy gets close to 0.0288 bit.

Although it seems very hard to formulate the pdf of the Huffman redundancy in the general case, based on the above observations the problem appears solvable in the asymptotic case (i.e., for large n). This is because, if the variance tends to zero, the asymptotic behavior of the pdf will depend only on its expected value, which in general seems to be more tractable than the pdf.

The goal of this paper is to verify observations 2) and 3). More precisely, we analytically prove that: "when n is sufficiently large, the Huffman redundancy is very close to 0.0288 bit for almost all sources with n symbols."

The main challenge of studying the average redundancy of the Huffman code in this paper is that there is no explicit relation between the symbol probabilities and the Huffman codelengths, while in the case of the Shannon code [8] the length of the i-th codeword is completely determined by the i-th symbol probability, i.e., $\ell_i = \lceil -\log p_i \rceil$.

The rest of this paper is organized as follows. In the next section, after providing the required definitions, some notations are introduced. The main results of this paper are briefly presented and discussed in Section III and their proofs are provided in Sections IV–VI.

II. DEFINITIONS AND NOTATIONS

In the rest of this paper, all random variables/vectors are denoted by boldface font; the expected value and the variance are denoted by $E[\cdot]$ and $\mathrm{Var}[\cdot]$, respectively. Also, for two random variables/vectors $\mathbf{X}$ and $\mathbf{Y}$, $\mathbf{X} \stackrel{d}{=} \mathbf{Y}$ stands for equality in distribution, and $f_{\mathbf{X}}(\cdot)$ denotes the probability density function (pdf) of $\mathbf{X}$.

Throughout this paper, n is the number of symbols (or the alphabet size) of the source/sources under consideration. Let $p_i$, $1 \le i \le n$, denote the probability that the source outputs the i-th symbol. Since $\sum_{i=1}^{n} p_i = 1$, a memoryless information source with n symbols is completely characterized by an (n−1)-tuple vector $P_n = (p_2, p_3, \ldots, p_n)$. In this way, we implicitly consider $p_1 \triangleq 1 - \sum_{i=2}^{n} p_i$.

For a source $P_n$, let $H(P_n)$, $L^*(P_n)$, $R^*(P_n)$ and $\ell_i^*(P_n)$ stand for, respectively, the entropy, the average Huffman codelength, the Huffman redundancy and the i-th Huffman codelength (corresponding to the i-th symbol) of $P_n$, i.e.,

$$H(P_n) \triangleq -\sum_{i=1}^{n} p_i \log p_i, \qquad L^*(P_n) \triangleq \sum_{i=1}^{n} p_i\,\ell_i^*(P_n),$$

and $R^*(P_n) \triangleq L^*(P_n) - H(P_n)$.

A memoryless source is monotone if the symbol probabilities are sorted in descending order. In this paper, the problem is defined on the set of ordinary sources while the solution is based on the set of monotone sources. Therefore, to distinguish between monotone sources and ordinary ones, we show the i-th symbol probability of a monotone source with n symbols by $q_i$, and the corresponding symbol probability vector by the (n−1)-tuple vector $Q_n = (q_2, q_3, \ldots, q_n)$. Also, we use the convention $q_1 \triangleq 1 - \sum_{i=2}^{n} q_i$. Thus, using the notation "q" instead of "p" (and "Q" instead of "P") inherently implies the monotonicity of the source, i.e., $q_1 \ge q_2 \ge \cdots \ge q_n$. We define $\Delta_n$ (resp. $\Theta_n$) as the set of all ordinary sources (resp. monotone sources) with n symbols. That is,

$$\Delta_n \triangleq \Big\{ (p_2, p_3, \ldots, p_n) \;\Big|\; \Big(1 - \sum_{i=2}^{n} p_i\Big),\, p_2, p_3, \ldots, p_n \ge 0 \Big\},$$

$$\Theta_n \triangleq \Big\{ (q_2, q_3, \ldots, q_n) \;\Big|\; \Big(1 - \sum_{i=2}^{n} q_i\Big) \ge q_2 \ge q_3 \ge \cdots \ge q_n \ge 0 \Big\}.$$


Defining $\mathrm{vol}(A)$ as the volume of a set A, it is known [9, Eq. (57)] that

$$\mathrm{vol}(\Delta_n) = n!\,\mathrm{vol}(\Theta_n) = \frac{1}{(n-1)!}. \tag{1}$$

Let $\mathbf{P}_n = (\mathbf{p}_2, \mathbf{p}_3, \ldots, \mathbf{p}_n)$ and $\mathbf{Q}_n = (\mathbf{q}_2, \mathbf{q}_3, \ldots, \mathbf{q}_n)$ denote two random vectors uniformly distributed over, respectively, $\Delta_n$ and $\Theta_n$. That is,

$$f_{\mathbf{P}_n}(P_n) = \frac{1}{\mathrm{vol}(\Delta_n)}\,\mathbb{I}(P_n \in \Delta_n), \qquad f_{\mathbf{Q}_n}(Q_n) = \frac{1}{\mathrm{vol}(\Theta_n)}\,\mathbb{I}(Q_n \in \Theta_n),$$

where $\mathbb{I}(\cdot)$ denotes the indicator function. Also we define the random variables $\mathbf{p}_1 \triangleq 1 - \sum_{i=2}^{n} \mathbf{p}_i$ and $\mathbf{q}_1 \triangleq 1 - \sum_{i=2}^{n} \mathbf{q}_i$.

Recall that the statistical properties of the random variable $R^*(\mathbf{P}_n)$ reflect the overall performance of the Huffman coding scheme for n-tuple sources.

A very special element of $\Theta_n$ which plays a crucial role in this paper is the average monotone source $\overline{Q}_n = (\bar q^{\,n}_2, \bar q^{\,n}_3, \ldots, \bar q^{\,n}_n)$ defined by

$$\bar q^{\,n}_i = \frac{1}{n}\sum_{k=i}^{n}\frac{1}{k} \qquad \text{for } 2 \le i \le n. \tag{2}$$

Note that $\bar q^{\,n}_1 \triangleq 1 - \sum_{i=2}^{n} \bar q^{\,n}_i$ implies $\bar q^{\,n}_1 = \frac{1}{n}\sum_{i=1}^{n}\frac{1}{i}$. The importance of $\overline{Q}_n$ is that it is actually the average of all monotone sources with n symbols [9], that is

$$\overline{Q}_n = E[\mathbf{Q}_n]. \tag{3}$$

Hence, it is called the average monotone source/distribution [9].

Throughout this paper, we use the following notations:

• $\gamma \approx 0.5772$: the Euler–Mascheroni constant.
• $e \approx 2.71828$: Euler's number.
• $\langle\!\langle x \rangle\!\rangle \triangleq \lceil x \rceil - x$.
• $\log(\cdot)$: logarithm in base 2.
• $\mathbb{I}(\cdot)$: indicator function.
• $\mathrm{vol}(A)$: volume of the set A.
• $\mathbb{R}$: the set of real numbers.
• $\mathbb{Q}^+$: the set of nonnegative rational numbers.
• $\mathbb{Z}^+$: the set of nonnegative integers.

For any sequence $\{a_k\}$, we use the convention $\sum_{k=i}^{j} a_k = 0$ if $i > j$. Also, for every source with n symbols we consider $p_i = 0$ for $i \ge n + 1$.
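As a quick numeric check of definition (2), the following sketch (ours; the helper name average_monotone_source and the use of exact rational arithmetic are our choices) builds $\overline{Q}_n$ and verifies that it is a monotone probability distribution.

```python
from fractions import Fraction

def average_monotone_source(n):
    """q̄^n_i = (1/n) * sum_{k=i}^{n} 1/k (Eq. (2)), returned as [q̄_1, ..., q̄_n]."""
    q, tail = [], Fraction(0)
    for i in range(n, 0, -1):        # accumulate the harmonic tail sum_{k=i}^{n} 1/k
        tail += Fraction(1, i)
        q.append(tail / n)
    q.reverse()
    return q

q = average_monotone_source(5)
assert sum(q) == 1                                       # a valid distribution
assert all(q[i] >= q[i + 1] for i in range(len(q) - 1))  # monotone
print([float(x) for x in q])
```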

III. MAIN RESULTS AND DISCUSSION

The main result of this paper is an analytic proof verifying observations 2) and 3) given in the Introduction. We prove them in two steps: Theorem 1 and Theorem 2. The proofs are provided in Sections V and VI.

Theorem 1: As n increases, the Huffman redundancy of $\mathbf{P}_n$ converges in probability to that of $\overline{Q}_n$ [defined in (2)], that is

$$\lim_{n\to\infty} R^*(\mathbf{P}_n) - R^*(\overline{Q}_n) = 0 \quad \text{in probability.} \tag{4}$$

Fig. 2. The behavior of $R^*(\overline{Q}_n)$ for $3 \le n \le 10^5$.

In fact, Theorem 1 states that the Huffman redundancy of almost all sources with large alphabet size n is very close to that of $\overline{Q}_n$. In the proof of Theorem 1 we will show that $R^*(\mathbf{P}_n) \stackrel{d}{=} R^*(\mathbf{Q}_n)$. On the other hand, the principal property of $\overline{Q}_n$ is that $\overline{Q}_n = E[\mathbf{Q}_n]$. Thus from (4), we actually have

$$\lim_{n\to\infty} R^*(\mathbf{Q}_n) - R^*(E[\mathbf{Q}_n]) = 0 \quad \text{in probability.}$$

This means that for sufficiently large n, the behavior of almost all monotone sources in the sense of Huffman redundancy is very close to that of their average, i.e., $\overline{Q}_n$. This point may seem expectable, with the justification that most elements of a set may behave like its average element in some senses. However, note that this is not the case for the set of ordinary sources with n symbols $\Delta_n$ and the average of its elements $E[\mathbf{P}_n] = (\frac{1}{n}, \frac{1}{n}, \ldots, \frac{1}{n})$, i.e.,

$$R^*(\mathbf{P}_n) - R^*(E[\mathbf{P}_n]) \not\to 0 \quad \text{in probability.}$$

This reveals the crucial role of monotone sources in our analysis.

Using Theorem 1 and noting that the Huffman redundancy is bounded, it is not hard to show that

$$\lim_{n\to\infty} E\big[R^*(\mathbf{P}_n)\big] - R^*(\overline{Q}_n) = 0 \tag{5}$$

and

$$\lim_{n\to\infty} \mathrm{Var}\big[R^*(\mathbf{P}_n)\big] = 0.$$

All the above arguments suggest that we should study the asymptotic behavior of $R^*(\overline{Q}_n)$. This is shown in Fig. 2 for $3 \le n \le 10^5$. It seems that $R^*(\overline{Q}_n)$ rapidly converges to a real number about 0.0288 bit as n increases. However, as Fig. 3 shows for $10^3 \le n \le 10^5$, there is a very small oscillation in $R^*(\overline{Q}_n)$ for large values of n. In fact, although $R^*(\overline{Q}_n)$ is very close to 0.0288 for large n, it is a divergent sequence in n. This point is proved in Theorem 2.
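This trend is easy to observe numerically. A sketch (ours; it reuses huffman_redundancy() from the simulation snippet above, with a float variant of average_monotone_source for speed):

```python
def avg_monotone_float(n):
    """Float version of Eq. (2), adequate for a quick numeric look."""
    q, tail = [0.0] * n, 0.0
    for i in range(n, 0, -1):
        tail += 1.0 / i
        q[i - 1] = tail / n
    return q

for n in (10, 100, 1000, 10000):
    print(n, round(huffman_redundancy(avg_monotone_float(n)), 7))  # settles near 0.0288 bit
```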

Theorem 2: For the Huffman redundancy of $\overline{Q}_n$ we have

$$0.0287376755 < \liminf_{n\to\infty} R^*(\overline{Q}_n) < \limsup_{n\to\infty} R^*(\overline{Q}_n) < 0.0287681321.$$


Fig. 3. Oscillatory behavior of $R^*(\overline{Q}_n)$ for large n.

Combining Theorem 1 and Theorem 2, we can conclude the main statement of this paper.

Corollary 1: If n is sufficiently large, the Huffman redundancy of almost all sources with n symbols lies between 0.0287376755 and 0.0287681321 bit.

Although the Huffman redundancy can potentially be close to 1 bit, it is concluded from Corollary 1 that the Huffman redundancy is likely to be as small as 0.0288 bit for large alphabet sizes. As we will see in Subsection VI-A, the real number 0.0288 is approximately equal to the constant

$$\log\log e - \gamma\log e - e^{-1} + \sum_{k=1}^{\infty}\big(1 - e^{-2^{-k}} - e^{-2^{k}}\big) \approx 0.0287675781.$$
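A direct numerical evaluation of this constant (an illustrative sketch; truncating the rapidly converging series at k = 59 is our choice and contributes an error far below double precision):

```python
import math

log2e = 1 / math.log(2)                     # log e (base-2 logarithm of e)
gamma = 0.5772156649015329                  # Euler-Mascheroni constant
tail = sum(1 - math.exp(-2.0**-k) - math.exp(-2.0**k) for k in range(1, 60))
const = math.log2(log2e) - gamma * log2e - math.exp(-1) + tail
print(const)                                # ~ 0.0287675781
```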

The average redundancy of the Huffman code over $\Delta_n$ was studied in [8]. It was proved that [8, Eq. (30)]

$$\limsup_{n\to\infty} E\big[R^*(\mathbf{P}_n)\big] < 0.086. \tag{6}$$

Moreover, in [8, Eq. (32)] and [10] it was conjectured (via computation and simulation) that

$$\limsup_{n\to\infty} E\big[R^*(\mathbf{P}_n)\big] < 0.029. \tag{7}$$

Now using the results of this paper, i.e., (5) and Theorem 2, we have

$$0.0287376755 < \liminf_{n\to\infty} E\big[R^*(\mathbf{P}_n)\big] < \limsup_{n\to\infty} E\big[R^*(\mathbf{P}_n)\big] < 0.0287681321$$

which justifies the conjecture in (7) and improves the bound of (6).

In [8] it is shown that the redundancy of the Shannon code lies between $0.5 - 1.6\times10^{-5}$ and $0.5 + 1.6\times10^{-5}$ bit for almost all n-tuple sources with sufficiently large n. Combining with Corollary 1, one can conclude that the Huffman code is approximately 0.47 bit better than the Shannon code for almost all sources with large alphabet. Moreover, the Huffman code and the Shannon code are similar in the sense that for both codes, the average redundancy is not a convergent sequence in the alphabet size n.

A real number close to 0.0288 has previously appeared in the context of Huffman redundancy. In [11], Gallager and Van Voorhis studied the Huffman redundancy for geometrically distributed integer alphabets $i = 0, 1, 2, \ldots$ with distribution $(1-\theta)\theta^i$, where $\theta$ is a real number in (0, 1). It is shown that as $\theta$ tends to 1, the Huffman redundancy oscillates around $2 - \log e - \log\log e \approx 0.0285386$ bit. However, the problem of [11] is basically different from ours in this paper.

The asymptotic redundancy of the Huffman block code for binomial sources is studied in [12], where Szpankowski considered a binary source generating a sequence of length n distributed as Binomial(n, p), with $p \le \frac{1}{2}$ being the probability of emitting 0 or 1. It is shown that, as the block length n increases, if $\log\frac{1-p}{p}$ is irrational then the Huffman redundancy tends to $\frac{3}{2} - \log e \approx 0.057304$; and if $\log\frac{1-p}{p}$ is rational then the Huffman redundancy oscillates around 0.057304. This value is very close to 2 × 0.0288. Nevertheless, note that the problem in [12] is basically different from that of this paper.

The remainder of this paper is organized as follows. In Section IV some basic properties of $\mathbf{P}_n$ and $\mathbf{Q}_n$ are presented. Using these properties, Theorem 1 is proved in Section V. Section VI is dedicated to proving Theorem 2.

IV. PRELIMINARY LEMMAS

In this section some important properties of the random sources $\mathbf{P}_n$ and $\mathbf{Q}_n$, which will be used in the proof of Theorem 1, are presented.

Lemma 1: We have

$$E[\mathbf{p}_i] = \frac{1}{n}, \qquad E\big[\mathbf{p}_i^2\big] = \frac{2}{n(n+1)} \qquad \text{and} \qquad E[\mathbf{p}_i\mathbf{p}_j] = \frac{1}{n(n+1)}$$

for every $1 \le i, j \le n$ and $i \ne j$.

The proof is straightforward (and thus omitted) using [8, Lemma 1], which says

$$f_{\mathbf{p}_i}(p) = \begin{cases} (n-1)(1-p)^{n-2} & \text{if } 0 \le p \le 1 \\ 0 & \text{otherwise} \end{cases} \tag{8}$$

and

$$f_{\mathbf{p}_i,\mathbf{p}_j}(p_i, p_j) = \begin{cases} (n-1)(n-2)(1 - p_i - p_j)^{n-3} & \text{if } p_i, p_j \ge 0 \text{ and } p_i + p_j \le 1 \\ 0 & \text{otherwise} \end{cases}$$

for every $1 \le i, j \le n$ and $i \ne j$.

In [9, Lemma 5], it is shown that if $T(x_1, x_2, \ldots, x_n)$ is a symmetric function, i.e., its value is invariant under any permutation of $x_1, x_2, \ldots, x_n$, then the averages of $T(\mathbf{p}_1, \mathbf{p}_2, \ldots, \mathbf{p}_n)$ and $T(\mathbf{q}_1, \mathbf{q}_2, \ldots, \mathbf{q}_n)$ are the same. The following lemma states a stronger property for such symmetric functions.

Lemma 2: If $T(\cdot)$ is a symmetric function, then

$$T(\mathbf{p}_1, \mathbf{p}_2, \mathbf{p}_3, \ldots, \mathbf{p}_n) \stackrel{d}{=} T(\mathbf{q}_1, \mathbf{q}_2, \mathbf{q}_3, \ldots, \mathbf{q}_n).$$

Proof: The proof is provided in Appendix A. ∎


Corollary 2: Since the entropy $H(\cdot)$, the average Huffman codelength $L^*(\cdot)$ and the Huffman redundancy $R^*(\cdot)$ are all symmetric functions of the symbol probabilities, we have

$$H(\mathbf{P}_n) \stackrel{d}{=} H(\mathbf{Q}_n), \qquad L^*(\mathbf{P}_n) \stackrel{d}{=} L^*(\mathbf{Q}_n), \qquad R^*(\mathbf{P}_n) \stackrel{d}{=} R^*(\mathbf{Q}_n).$$

This is one of the keys to proving Theorem 1. In this way, we deal with $\mathbf{Q}_n$ and $\Theta_n$ instead of $\mathbf{P}_n$ and $\Delta_n$. The following lemmas show the essential relation between $\mathbf{Q}_n$ and $\mathbf{P}_n$.

Lemma 3: Let $\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_n$ be n i.i.d. random variables with generic pdf $f_{\mathbf{x}}(x) = e^{-x}u(x)$. Defining

$$\mathbf{p}_i \triangleq \frac{\mathbf{x}_i}{\sum_{k=1}^{n}\mathbf{x}_k} \qquad \text{and} \qquad \mathbf{q}_i \triangleq \frac{\sum_{j=i}^{n}\frac{1}{j}\mathbf{x}_j}{\sum_{k=1}^{n}\mathbf{x}_k} \tag{9}$$

for $i = 1, 2, \ldots, n$, we have the following statements:

S1: $\mathbf{p}_1 = 1 - \sum_{i=2}^{n}\mathbf{p}_i$ and $(\mathbf{p}_2, \mathbf{p}_3, \ldots, \mathbf{p}_n)$ is uniformly distributed over $\Delta_n$.

S2: $\mathbf{q}_1 = 1 - \sum_{i=2}^{n}\mathbf{q}_i$ and $(\mathbf{q}_2, \mathbf{q}_3, \ldots, \mathbf{q}_n)$ is uniformly distributed over $\Theta_n$.

Proof: S1 is a well-known property in probability theory; for example, see [13, Theorem 4.1]. The proof of S2 is provided in Appendix A. ∎
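A sampling sketch of the construction in Lemma 3 (illustrative; variable names are ours). The backward pass computes the weighted tail sums of (9), and the assertions check the two properties claimed in S2:

```python
import random

def sample_ordinary_and_monotone(n):
    """Lemma 3: i.i.d. Exp(1) variables x_1..x_n yield, via Eq. (9),
    a uniform source over the simplex (S1) and a uniform monotone source (S2)."""
    x = [random.expovariate(1.0) for _ in range(n)]
    s = sum(x)
    p = [xi / s for xi in x]              # p_i = x_i / sum_k x_k
    q, tail = [0.0] * n, 0.0
    for i in range(n - 1, -1, -1):        # q_i = (sum_{j >= i} x_j / j) / s
        tail += x[i] / (i + 1)            # list index i holds x_{i+1} in 1-based terms
        q[i] = tail / s
    return p, q

p, q = sample_ordinary_and_monotone(6)
assert abs(sum(q) - 1) < 1e-12
assert all(q[i] >= q[i + 1] for i in range(len(q) - 1))   # monotonicity of S2
```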

Lemma 4: For the random sources $\mathbf{P}_n = (\mathbf{p}_2, \mathbf{p}_3, \ldots, \mathbf{p}_n)$ and $\mathbf{Q}_n = (\mathbf{q}_2, \mathbf{q}_3, \ldots, \mathbf{q}_n)$, we have

$$\mathbf{q}_i \stackrel{d}{=} \sum_{j=i}^{n}\frac{1}{j}\,\mathbf{p}_j \qquad \text{for } 1 \le i \le n. \tag{10}$$

Proof: Comparing the definitions of $\mathbf{P}_n$ and $\mathbf{Q}_n$ with the statements S1 and S2, we see that the variables constructed in (9) have the same distributions as the coordinates of $\mathbf{P}_n$ and $\mathbf{Q}_n$, that is,

$$\mathbf{p}_i \stackrel{d}{=} \mathbf{p}_i \quad \text{and} \quad \mathbf{q}_i \stackrel{d}{=} \mathbf{q}_i \qquad \text{for } 1 \le i \le n, \tag{11}$$

with the left-hand sides understood as in (9). On the other hand, (9) implies that

$$\mathbf{q}_i = \sum_{j=i}^{n}\frac{1}{j}\,\mathbf{p}_j \qquad \text{for } 1 \le i \le n. \tag{12}$$

Combining (11) and (12) gives (10). ∎

Noting $E[\mathbf{p}_i] = \frac{1}{n}$ (from Lemma 1) and (10), we conclude $E[\mathbf{q}_i] = \frac{1}{n}\sum_{j=i}^{n}\frac{1}{j}$ and hence $\overline{Q}_n = E[\mathbf{Q}_n]$. Therefore, $\overline{Q}_n$ is the average of n-tuple monotone distributions (monotone sources with n symbols). This relation was previously derived in [9] through a different approach.

In the rest of this section we use Lemma 4 to derive some properties of the random monotone source $\mathbf{Q}_n$.

Lemma 5: We have

$$\lim_{n\to\infty}\Pr\Big\{\mathbf{q}_n > \frac{1}{n^3}\Big\} = 1.$$

Proof: Using (8) and Lemma 4, which says $\mathbf{q}_n \stackrel{d}{=} \frac{1}{n}\mathbf{p}_n$, we can write

$$\Pr\Big\{\mathbf{q}_n > \frac{1}{n^3}\Big\} = \Pr\Big\{\mathbf{p}_n > \frac{1}{n^2}\Big\} = \int_{1/n^2}^{1}(n-1)(1-x)^{n-2}\,dx = \Big(1 - \frac{1}{n^2}\Big)^{n-1} \ge e^{-\frac{2(n-1)}{n^2}},$$

where the last expression comes from the fact that $e^{-2x} \le 1 - x$ for $0 \le x \le \frac{1}{2}$; the right-hand side tends to 1 as $n\to\infty$. ∎

Lemma 6: We have

$$E\Big[\sum_{i=1}^{n}\big(\mathbf{q}_i - \bar q^{\,n}_i\big)^2\Big] = -\frac{2}{n(n+1)} + \frac{1}{n^2}\sum_{k=1}^{n}\frac{1}{k} = O\Big(\frac{\ln n}{n^2}\Big). \tag{13}$$

Proof: As Lemma 4 implies $\mathbf{q}_i \stackrel{d}{=} \sum_{k=i}^{n}\frac{1}{k}\mathbf{p}_k$, we can write

$$E\Big[\big(\mathbf{q}_i - \bar q^{\,n}_i\big)^2\Big] = E\bigg[\Big(\sum_{k=i}^{n}\frac{1}{k}\mathbf{p}_k - \bar q^{\,n}_i\Big)^2\bigg] \stackrel{(a)}{=} E\bigg[\Big(\sum_{k=i}^{n}\frac{1}{k}\Big(\mathbf{p}_k - \frac{1}{n}\Big)\Big)^2\bigg] = \sum_{k=i}^{n}\sum_{m=i}^{n}\frac{1}{km}\,E\Big[\Big(\mathbf{p}_k - \frac{1}{n}\Big)\Big(\mathbf{p}_m - \frac{1}{n}\Big)\Big]$$

$$\stackrel{(b)}{=} \sum_{k=i}^{n}\sum_{m=i}^{n}\frac{1}{km}\Big(E[\mathbf{p}_k\mathbf{p}_m] - \frac{1}{n^2}\Big) = \sum_{k=i}^{n}\frac{1}{k^2}\Big(E\big[\mathbf{p}_k^2\big] - \frac{1}{n^2}\Big) + 2\sum_{k=i}^{n-1}\sum_{m=k+1}^{n}\frac{1}{km}\Big(E[\mathbf{p}_k\mathbf{p}_m] - \frac{1}{n^2}\Big)$$

$$\stackrel{(c)}{=} \Big(\frac{2}{n(n+1)} - \frac{1}{n^2}\Big)\sum_{k=i}^{n}\frac{1}{k^2} + 2\Big(\frac{1}{n(n+1)} - \frac{1}{n^2}\Big)\sum_{k=i}^{n-1}\sum_{m=k+1}^{n}\frac{1}{km} = \frac{n-1}{n^2(n+1)}\sum_{k=i}^{n}\frac{1}{k^2} - \frac{2}{n^2(n+1)}\sum_{k=i}^{n-1}\sum_{m=k+1}^{n}\frac{1}{km},$$

where (a) follows from (2), and (b) and (c) follow from Lemma 1. Thus we have

$$E\Big[\sum_{i=1}^{n}\big(\mathbf{q}_i - \bar q^{\,n}_i\big)^2\Big] = \sum_{i=1}^{n}E\Big[\big(\mathbf{q}_i - \bar q^{\,n}_i\big)^2\Big] = \frac{n-1}{n^2(n+1)}\sum_{i=1}^{n}\sum_{k=i}^{n}\frac{1}{k^2} - \frac{2}{n^2(n+1)}\sum_{i=1}^{n}\sum_{k=i}^{n-1}\sum_{m=k+1}^{n}\frac{1}{km}$$

$$\stackrel{(d)}{=} \frac{n-1}{n^2(n+1)}\sum_{j=1}^{n}\frac{1}{j} - \frac{2}{n^2(n+1)}\Big(n - \sum_{j=1}^{n}\frac{1}{j}\Big) = -\frac{2}{n(n+1)} + \frac{1}{n^2}\sum_{j=1}^{n}\frac{1}{j} = O\Big(\frac{\ln n}{n^2}\Big),$$

where (d) follows from the facts that

$$\sum_{i=1}^{n}\sum_{k=i}^{n-1}\sum_{m=k+1}^{n}\frac{1}{km} = n - \sum_{j=1}^{n}\frac{1}{j} \qquad \text{and} \qquad \sum_{i=1}^{n}\sum_{k=i}^{n}\frac{1}{k^2} = \sum_{j=1}^{n}\frac{1}{j},$$

which can be easily proved by induction. ∎

The following lemma demonstrates the asymptotic behavior of $\mathbf{Q}_n$ in the sense of variational distance from $\overline{Q}_n$.

Lemma 7: There exists $\rho_0 \in \mathbb{R}^+$ such that

$$\lim_{n\to\infty}\Pr\bigg\{\sum_{i=1}^{n}\big|\mathbf{q}_i - \bar q^{\,n}_i\big| \ge \sqrt{\rho_0\frac{\ln n}{\sqrt{n}}}\,\bigg\} = 0. \tag{14}$$

Proof: From (13), we know that there exists $\rho_0 \in \mathbb{R}^+$ such that

$$E\Big[\sum_{i=1}^{n}\big(\mathbf{q}_i - \bar q^{\,n}_i\big)^2\Big] \le \rho_0\frac{\ln n}{n^2}$$

for sufficiently large n. Thus, using the Markov inequality we can write

$$\frac{1}{\sqrt{n}} \ge \Pr\bigg\{\sum_{i=1}^{n}\big(\mathbf{q}_i - \bar q^{\,n}_i\big)^2 \ge \sqrt{n}\,E\Big[\sum_{i=1}^{n}\big(\mathbf{q}_i - \bar q^{\,n}_i\big)^2\Big]\bigg\} \ge \Pr\bigg\{\sum_{i=1}^{n}\big(\mathbf{q}_i - \bar q^{\,n}_i\big)^2 \ge \sqrt{n}\,\rho_0\frac{\ln n}{n^2}\bigg\}. \tag{15}$$

It is not hard to show that for n real numbers $r_1, r_2, \ldots, r_n$, we have

$$\sum_{i=1}^{n}|r_i| \le \sqrt{n\sum_{i=1}^{n}r_i^2}.$$

Thus, using (15) we can write

$$\frac{1}{\sqrt{n}} \ge \Pr\bigg\{\sum_{i=1}^{n}\big(\mathbf{q}_i - \bar q^{\,n}_i\big)^2 \ge \rho_0\frac{\ln n}{n\sqrt{n}}\bigg\} \ge \Pr\bigg\{\sum_{i=1}^{n}\big|\mathbf{q}_i - \bar q^{\,n}_i\big| \ge \sqrt{\rho_0\frac{\ln n}{\sqrt{n}}}\,\bigg\}. \;\blacksquare$$

V. PROOF OF THEOREM 1

In this section we provide the proof of Theorem 1. First, we prove the following lemma on the largest Huffman codelength.

Lemma 8: Let $\ell^*_{\max}(P_n)$ denote the largest Huffman codelength of a given source $P_n$. We have

$$\ell^*_{\max}(\overline{Q}_n) \le 3\log n \quad \text{for } n \ge 2 \tag{16}$$

and

$$\lim_{n\to\infty}\Pr\big\{\ell^*_{\max}(\mathbf{Q}_n) \le 5\log n\big\} = 1. \tag{17}$$

Proof: Let $p_i$ denote the i-th symbol probability of a source and $p_{\min} = \min_i p_i$. It is well-known that the largest Huffman codelength $\ell^*_{\max}(\cdot)$ is associated with the smallest symbol probability $p_{\min}$. Also, let $f_k$ denote the k-th Fibonacci number given by Binet's formula $f_k = \frac{\phi^k - (-\phi)^{-k}}{\sqrt{5}}$ where $\phi = \frac{1+\sqrt{5}}{2}$. In [14, Theorem 1] it is shown that $\ell^*_i \le m$ if $p_i \ge \frac{1}{f_{m+2}}$. Thus, noting that $f_{m+2} > \phi^m$, we have $\ell^*_{\max} \le m$ if $p_{\min} \ge \phi^{-m}$. Hence, in order to prove (16) and (17), it is enough to show that

$$\bar q^{\,n}_n > \phi^{-3\log n} \tag{18}$$

and

$$\lim_{n\to\infty}\Pr\big\{\mathbf{q}_n > \phi^{-5\log n}\big\} = 1, \tag{19}$$

respectively. Noting $\bar q^{\,n}_n = n^{-2}$ and $3\log\phi > 2$, (18) is verified and hence (16) is proved. Also, from $5\log\phi > 3$ we can write

$$\Pr\big\{\mathbf{q}_n > \phi^{-5\log n}\big\} > \Pr\Big\{\mathbf{q}_n > \frac{1}{n^3}\Big\}.$$

Thus Lemma 5 verifies (19) and consequently (17) is proved. ∎

Proof of Theorem 1: From [10, Eq. (9)] we have

$$\lim_{n\to\infty} H(\mathbf{P}_n) - E\big[H(\mathbf{P}_n)\big] = 0 \quad \text{in probability.}$$

On the other hand, it is proved in [9, Theorem 6] that

$$\lim_{n\to\infty} E\big[H(\mathbf{P}_n)\big] - H(\overline{Q}_n) = 0.$$

Combining these we have

$$\lim_{n\to\infty} H(\mathbf{P}_n) - H(\overline{Q}_n) = 0 \quad \text{in probability.} \tag{20}$$

Thus Theorem 1 is proved if we show that

$$\lim_{n\to\infty} L^*(\mathbf{P}_n) - L^*(\overline{Q}_n) = 0 \quad \text{in probability.} \tag{21}$$

Recall that Corollary 2 gives $L^*(\mathbf{P}_n) \stackrel{d}{=} L^*(\mathbf{Q}_n)$. Hence, (21) is equivalent to

$$\lim_{n\to\infty} L^*(\mathbf{Q}_n) - L^*(\overline{Q}_n) = 0 \quad \text{in probability.} \tag{22}$$


In the remainder of this section, we concentrate on proving (22), which completes the proof.

Noting the optimality of the Huffman code, we can write

$$L^*(\mathbf{Q}_n) - L^*(\overline{Q}_n) = \sum_{i=1}^{n}\mathbf{q}_i\,\ell^*_i(\mathbf{Q}_n) - \sum_{i=1}^{n}\bar q^{\,n}_i\,\ell^*_i(\overline{Q}_n) \le \sum_{i=1}^{n}\mathbf{q}_i\,\ell^*_i(\overline{Q}_n) - \sum_{i=1}^{n}\bar q^{\,n}_i\,\ell^*_i(\overline{Q}_n)$$

$$= \sum_{i=1}^{n}\big(\mathbf{q}_i - \bar q^{\,n}_i\big)\,\ell^*_i(\overline{Q}_n) \le \sum_{i=1}^{n}\big|\mathbf{q}_i - \bar q^{\,n}_i\big|\,\ell^*_i(\overline{Q}_n) \le 3\log n\sum_{i=1}^{n}\big|\mathbf{q}_i - \bar q^{\,n}_i\big|, \tag{23}$$

where the last expression follows from (16). Combining (23) and (14), we obtain

$$\lim_{n\to\infty}\Pr\bigg\{L^*(\mathbf{Q}_n) - L^*(\overline{Q}_n) > 3\log n\sqrt{\rho_0\frac{\ln n}{\sqrt{n}}}\,\bigg\} = 0. \tag{24}$$

On the other hand, we can write

$$L^*(\mathbf{Q}_n) - L^*(\overline{Q}_n) = \sum_{i=1}^{n}\mathbf{q}_i\,\ell^*_i(\mathbf{Q}_n) - \sum_{i=1}^{n}\bar q^{\,n}_i\,\ell^*_i(\overline{Q}_n) \ge \sum_{i=1}^{n}\mathbf{q}_i\,\ell^*_i(\mathbf{Q}_n) - \sum_{i=1}^{n}\bar q^{\,n}_i\,\ell^*_i(\mathbf{Q}_n)$$

$$= \sum_{i=1}^{n}\big(\mathbf{q}_i - \bar q^{\,n}_i\big)\,\ell^*_i(\mathbf{Q}_n) \ge -\sum_{i=1}^{n}\big|\mathbf{q}_i - \bar q^{\,n}_i\big|\,\ell^*_i(\mathbf{Q}_n) \ge -\ell^*_{\max}(\mathbf{Q}_n)\sum_{i=1}^{n}\big|\mathbf{q}_i - \bar q^{\,n}_i\big|. \tag{25}$$

Combining (25), (14) and (17) results in

$$\lim_{n\to\infty}\Pr\bigg\{L^*(\mathbf{Q}_n) - L^*(\overline{Q}_n) < -5\log n\sqrt{\rho_0\frac{\ln n}{\sqrt{n}}}\,\bigg\} = 0. \tag{26}$$

Noting (24), (26), and $\lim_{n\to\infty}\log n\sqrt{\rho_0\frac{\ln n}{\sqrt{n}}} = 0$, we conclude (22) and the proof of Theorem 1 is complete. ∎

VI. PROOF OF THEOREM 2

This section is dedicated to proving Theorem 2, which characterizes the asymptotic behavior of $R^*(\overline{Q}_n)$. In [9, Theorem 4], it is shown that

$$\lim_{n\to\infty}\log n - H(\overline{Q}_n) = (1-\gamma)\log e. \tag{27}$$

Noting (27) and $R^*(\overline{Q}_n) = L^*(\overline{Q}_n) - H(\overline{Q}_n)$, the problem reduces to analyzing the asymptotic behavior of $L^*(\overline{Q}_n) - \log n$. First, we define two useful representations for codelength sequences, which considerably simplify the subsequent formulation and analysis.

Definition 1: Consider an n-tuple prefix-free code with non-decreasing codelength sequence $\ell_1, \ell_2, \ldots, \ell_n$, i.e., $\ell_1 \le \ell_2 \le \cdots \le \ell_n$. The CM-representation (CM: Cumulative Multiplicity) of this sequence is the triple $\big[\ell_0, \Lambda; \{z_k\}_{k=0}^{\Lambda+1}\big]$, where $\ell_0$ and $\Lambda$ are two integers satisfying $\ell_0 \le \ell_1$ and $\Lambda + \ell_0 \ge \ell_n$, and

$$z_k = \big|\{\, i \mid \ell_i \le \ell_0 + k - 1 \,\}\big| \qquad \text{for } 0 \le k \le \Lambda + 1. \tag{28}$$

Alternatively, we use the NCM-representation (Normalized Cumulative Multiplicity) $\big[\ell_0, \Lambda; \{\alpha_k\}_{k=0}^{\Lambda+1}\big]$, where

$$\alpha_k = \frac{z_k}{n} \qquad \text{for } 0 \le k \le \Lambda + 1.$$

Regarding this definition, the following points are remarkable:

1) A codelength sequence has different CM-representations. For instance, both [1, 7; (0, 0, 2, 4, 7, 8, 10, 10, 10)] and [2, 4; (0, 2, 4, 7, 8, 10)] are CM-representations for the codelength sequence L0 = (2, 2, 3, 3, 4, 4, 4, 5, 6, 6).

2) In the literature, some authors prefer to use the multiplicity sequence instead of the codelength sequence. The CM-representation defined above actually reflects (though is not equal to) the cumulative sum of the multiplicity sequence. For instance, the multiplicity sequence of L0 is M0 = 0, 2, 2, 3, 1, 2, 0, 0, 0, . . . and the cumulative sum of M0 is 0, 2, 4, 7, 8, 10, 10, 10, . . ..

3) We always have $z_0 = \alpha_0 = 0$, $z_{\Lambda+1} = n$ and $\alpha_{\Lambda+1} = 1$.

4) The sequences $\{z_i\}_{i=0}^{\Lambda+1}$ and $\{\alpha_i\}_{i=0}^{\Lambda+1}$ are nondecreasing.

5) For each i, $1 \le i \le n$, we have $z_k < i \le z_{k+1}$ for some $k = 0, 1, 2, \ldots, \Lambda$. Also we have

$$\ell_i = \ell_0 + k \quad \text{iff} \quad z_k < i \le z_{k+1}. \tag{29}$$

6) The Kraft-sum of a codelength sequence can be reformulated as

$$\sum_{i=1}^{n}2^{-\ell_i} = \sum_{k=0}^{\Lambda}\sum_{i=z_k+1}^{z_{k+1}}2^{-\ell_i} = \sum_{k=0}^{\Lambda}\sum_{i=z_k+1}^{z_{k+1}}2^{-(\ell_0+k)} = \sum_{k=0}^{\Lambda}(z_{k+1} - z_k)\,2^{-(\ell_0+k)}$$

$$= 2^{-\ell_0}\Big(n2^{-\Lambda} + \sum_{k=1}^{\Lambda}z_k2^{-k}\Big) \tag{30}$$

$$= n2^{-\ell_0}\Big(2^{-\Lambda} + \sum_{k=1}^{\Lambda}\alpha_k2^{-k}\Big). \tag{31}$$
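To make the bookkeeping concrete, here is a small sketch (ours; the example reuses the sequence L0 from point 1 above) that computes CM-representations per (28) and evaluates the Kraft sum via (30):

```python
def cm_representation(lengths, l0, lam):
    """z_k = |{ i : l_i <= l0 + k - 1 }| for k = 0..lam+1 (Eq. (28))."""
    assert l0 <= min(lengths) and l0 + lam >= max(lengths)
    return [sum(1 for l in lengths if l <= l0 + k - 1) for k in range(lam + 2)]

def kraft_sum(z, l0, lam, n):
    """Kraft sum via Eq. (30): 2^{-l0} * (n 2^{-lam} + sum_{k=1}^{lam} z_k 2^{-k})."""
    return 2.0**-l0 * (n * 2.0**-lam + sum(z[k] * 2.0**-k for k in range(1, lam + 1)))

L0 = [2, 2, 3, 3, 4, 4, 4, 5, 6, 6]
for l0, lam in ((1, 7), (2, 4)):
    z = cm_representation(L0, l0, lam)
    print(z, kraft_sum(z, l0, lam, len(L0)))   # both CM-representations give Kraft sum 1.0
```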

Lemma 9: The average codeword length of a code for the source $\overline{Q}_n$ is given by

$$\sum_{i=1}^{n}\bar q^{\,n}_i\,\ell_i = \ell_0 + \Lambda - \sum_{k=1}^{\Lambda}z_k\Big(\frac{1}{n} + \bar q^{\,n}_{z_k+1}\Big) \tag{32}$$


and bounded as

$$\ell_0 + \Lambda - \sum_{k=1}^{\Lambda}\alpha_k(1 - \ln\alpha_k) \;\le\; \sum_{i=1}^{n}\bar q^{\,n}_i\,\ell_i \;\le\; \ell_0 + \Lambda - \sum_{k=1}^{\Lambda}\alpha_k\left(1 - \ln\frac{\alpha_k + \frac{1}{n}}{1 + \frac{1}{n}}\right). \tag{33}$$

Proof: First, note that for an arbitrary sequence $a_0, a_1, \ldots$ we have

$$\sum_{k=0}^{m}k\,(a_{k+1} - a_k) = \sum_{k=1}^{m}(a_{m+1} - a_k). \tag{34}$$

Also we have [9, Eqs. (32), (33)]

$$\sum_{i=1}^{z}\bar q^{\,n}_i = z\Big(\frac{1}{n} + \bar q^{\,n}_{z+1}\Big) \qquad \text{for } 1 \le z \le n. \tag{35}$$

Noting (29), we can write

$$\sum_{i=1}^{n}\bar q^{\,n}_i\,\ell_i = \sum_{k=0}^{\Lambda}\sum_{i=z_k+1}^{z_{k+1}}\bar q^{\,n}_i(\ell_0 + k) = \ell_0\sum_{i=1}^{n}\bar q^{\,n}_i + \sum_{k=0}^{\Lambda}k\Big(\sum_{i=1}^{z_{k+1}}\bar q^{\,n}_i - \sum_{i=1}^{z_k}\bar q^{\,n}_i\Big)$$

$$\stackrel{(a)}{=} \ell_0 + \sum_{k=1}^{\Lambda}\Big(\sum_{i=1}^{z_{\Lambda+1}}\bar q^{\,n}_i - \sum_{i=1}^{z_k}\bar q^{\,n}_i\Big) \stackrel{(b)}{=} \ell_0 + \sum_{k=1}^{\Lambda}\Big(1 - \sum_{i=1}^{z_k}\bar q^{\,n}_i\Big) \stackrel{(c)}{=} \ell_0 + \sum_{k=1}^{\Lambda}\Big[1 - z_k\Big(\frac{1}{n} + \bar q^{\,n}_{z_k+1}\Big)\Big] = \ell_0 + \Lambda - \sum_{k=1}^{\Lambda}z_k\Big(\frac{1}{n} + \bar q^{\,n}_{z_k+1}\Big),$$

where (a) follows from (34), (b) follows from $z_{\Lambda+1} = n$, and (c) follows from (35). Thus (32) is proved.

In order to derive (33), we use the inequality

$$\frac{1}{n}\ln\frac{n+1}{z+1} \le \bar q^{\,n}_{z+1} \le \frac{1}{n}\ln\frac{n}{z}$$

derived in [9, Eq. (11)]. Combining this with (32) completes the proof of (33). ∎
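A numeric spot-check of (32) (illustrative; it reuses average_monotone_source() and cm_representation() from the earlier sketches):

```python
n = 10
qbar = [float(x) for x in average_monotone_source(n)]
lengths = [2, 2, 3, 3, 4, 4, 4, 5, 6, 6]     # nondecreasing, Kraft sum 1
l0, lam = 2, 4
z = cm_representation(lengths, l0, lam)
direct = sum(qi * li for qi, li in zip(qbar, lengths))
qnext = lambda zk: qbar[zk] if zk < n else 0.0        # q̄_{z_k+1}, with q̄_{n+1} = 0
via_32 = l0 + lam - sum(z[k] * (1.0 / n + qnext(z[k])) for k in range(1, lam + 1))
assert abs(direct - via_32) < 1e-12                   # Eq. (32) matches the direct sum
```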

In the following subsections we use the aforementioned lemmas to derive acceptable lower and upper bounds on $L^*(\overline{Q}_n) - \log n$ and consequently $R^*(\overline{Q}_n)$.

A. Lower Bounding $\liminf R^*(\overline{Q}_n)$

The main goal of this subsection is to derive a lower bound on $R^*(\overline{Q}_n)$ and examine its asymptotic behavior. Recall that the Huffman code has the minimum average codelength, i.e.,

$$L^*(\overline{Q}_n) = \min_{\substack{\ell_1,\ell_2,\ldots,\ell_n\in\mathbb{N} \\ \text{s.t. } \sum_{i=1}^{n}2^{-\ell_i}\le1}} \;\sum_{i=1}^{n}\bar q^{\,n}_i\,\ell_i. \tag{36}$$

Since all Huffman codelengths of any arbitrary n-tuple source lie between 1 and n − 1, we set $\ell_0 = 1$ and $\Lambda = n - 2$, and use (30) and (32) to rewrite (36) as

$$L^*(\overline{Q}_n) = \min_{\substack{z_1,\ldots,z_{n-2}\in\mathbb{Z}^+ \\ \text{s.t. } n2^{1-n}+\sum_{k=1}^{n-2}z_k2^{-k-1}\le1}} \; n - 1 - \sum_{k=1}^{n-2}z_k\Big(\frac{1}{n} + \bar q^{\,n}_{z_k+1}\Big)$$

$$\stackrel{(a)}{\ge} \min_{\substack{\alpha_1,\ldots,\alpha_{n-2}\in\mathbb{R},\; n\alpha_k\in\mathbb{Z}^+ \\ \text{s.t. } n\left(2^{1-n}+\sum_{k=1}^{n-2}\alpha_k2^{-k-1}\right)\le1}} \; n - 1 - \sum_{k=1}^{n-2}\alpha_k(1 - \ln\alpha_k)$$

$$\ge \min_{\substack{\alpha_1,\ldots,\alpha_{n-2}\in\mathbb{R} \\ \text{s.t. } n\left(2^{1-n}+\sum_{k=1}^{n-2}\alpha_k2^{-k-1}\right)\le1}} \; n - 1 - \sum_{k=1}^{n-2}\alpha_k(1 - \ln\alpha_k), \tag{37}$$

where (a) follows from (31) and the left-hand side of (33). Note that since $\alpha_{n-1}$ and $z_{n-1}$ take fixed values (1 and n, respectively), they are not considered in the above minimization expressions.

The minimization problem in (37) can be solved using the method of Lagrange multipliers. In this way, if we consider the Lagrange function

$$\Phi(\alpha_1,\ldots,\alpha_{n-2};\lambda) = n - 1 - \sum_{k=1}^{n-2}\alpha_k(1 - \ln\alpha_k) + \lambda\Big[n\Big(2^{1-n} + \sum_{k=1}^{n-2}\alpha_k2^{-k-1}\Big) - 1\Big],$$

then $\nabla\Phi = 0$ occurs at

$$\alpha^*_k = e^{-n2^{-k-1}\lambda^*(n)} \qquad 1 \le k \le n - 2, \tag{38}$$

where $\lambda^*(n)$ satisfies

$$1 = n2^{1-n} + \sum_{k=1}^{n-2}n2^{-k-1}e^{-n2^{-k-1}\lambda^*(n)}. \tag{39}$$

Substituting (38) in the objective function of (37), we get

$$L^*(\overline{Q}_n) \ge n - 1 - \sum_{k=1}^{n-2}e^{-n\lambda2^{-k-1}}\big(1 + n\lambda2^{-k-1}\big) \tag{40}$$

for $\lambda = \lambda^*(n)$. Now, define

$$\mathrm{f}(x) \triangleq \sum_{k=1}^{\infty}x2^{-k}e^{-x2^{-k}}.$$


In Lemma 11 in Appendix B, we prove that

$$\xi < \mathrm{f}(x) \quad \text{for } x > 20, \tag{41}$$

where $\xi = 1.44268073$. Noting (41), we have

$$\xi < \mathrm{f}\Big(\frac{n\xi}{2}\Big) \quad \text{for } n \ge 28 > \frac{40}{\xi}.$$

Thus we can write for $n \ge 28$

$$1 < \frac{1}{\xi}\,\mathrm{f}\Big(\frac{n\xi}{2}\Big) = \sum_{k=1}^{\infty}n2^{-k-1}e^{-n\xi2^{-k-1}} = \sum_{k=n-1}^{\infty}n2^{-k-1}e^{-n\xi2^{-k-1}} + \sum_{k=1}^{n-2}n2^{-k-1}e^{-n\xi2^{-k-1}}$$

$$< \sum_{k=n-1}^{\infty}n2^{-k-1} + \sum_{k=1}^{n-2}n2^{-k-1}e^{-n\xi2^{-k-1}} = n2^{1-n} + \sum_{k=1}^{n-2}n2^{-k-1}e^{-n\xi2^{-k-1}},$$

which, with the fact that the right-hand side of (39) is a decreasing function of $\lambda^*$, implies $\xi < \lambda^*(n)$. On the other hand, (40) holds for any $\lambda \le \lambda^*(n)$, as the expression on its right-hand side is an increasing function of $\lambda$ for $\lambda > 0$. Hence, (40) is satisfied for $n \ge 28$ and $\lambda = \xi$. Therefore, defining

$$\mathrm{g}(x) \triangleq \sum_{k=1}^{\infty}\Big[1 - \big(1 + x2^{-k}\big)e^{-x2^{-k}}\Big] - \log x,$$

we can write

$$L^*(\overline{Q}_n) - \log n \ge n - 1 - \log n - \sum_{k=2}^{n-1}e^{-n\xi2^{-k}}\big(1 + n\xi2^{-k}\big) = 1 - \log n + \sum_{k=2}^{n-1}\Big[1 - \big(1 + n\xi2^{-k}\big)e^{-n\xi2^{-k}}\Big]$$

$$= \log\xi + \mathrm{g}\Big(\frac{n\xi}{2}\Big) - \sum_{k=n}^{\infty}\Big[1 - \big(1 + n\xi2^{-k}\big)e^{-n\xi2^{-k}}\Big] \ge \log\xi + \mathrm{g}\Big(\frac{n\xi}{2}\Big) - \sum_{k=n}^{\infty}\big(1 - e^{-n\xi2^{-k}}\big)$$

$$\stackrel{(a)}{\ge} \log\xi + \mathrm{g}\Big(\frac{n\xi}{2}\Big) - \sum_{k=n}^{\infty}n\xi2^{-k} = \log\xi + \mathrm{g}\Big(\frac{n\xi}{2}\Big) - \xi n2^{1-n}, \tag{42}$$

where (a) follows from the inequality $1 - e^{-x} \le x$ for $x \ge 0$. Lemma 12 in Appendix B states that for $x > 20$

$$\mathrm{g}(x) > \vartheta, \tag{43}$$

where $\vartheta = -1.10996329$. Combining with (42), we have

$$L^*(\overline{Q}_n) - \log n \ge \log\xi + \vartheta - \xi n2^{1-n}$$

for $n \ge 28$. Hence,

$$\liminf_{n\to\infty}R^*(\overline{Q}_n) = \liminf_{n\to\infty}\Big[L^*(\overline{Q}_n) - \log n\Big] + \Big[\log n - H(\overline{Q}_n)\Big] \ge \log\xi + \vartheta + (1-\gamma)\log e \ge 0.0287376755. \tag{44}$$

Remark: To have an idea of which mathematical constants constitute the lower bound 0.0287376755 on $\liminf_{n\to\infty}R^*(\overline{Q}_n)$, note that the proofs of Lemmas 11 and 12 (given in Appendix B) show

$$\xi \approx \log e \qquad \text{and} \qquad \vartheta \approx -\log e - e^{-1} + \sum_{k=1}^{\infty}\big(1 - e^{-2^{-k}} - e^{-2^{k}}\big).$$

Thus, the real number 0.0287376755 can be regarded as the approximate value of

$$\log\log e - \gamma\log e - e^{-1} + \sum_{k=1}^{\infty}\big(1 - e^{-2^{-k}} - e^{-2^{k}}\big).$$

B. Upper Bounding $\limsup R^*(\overline{Q}_n)$

Clearly, the average codelength of any n-tuple prefix-free code is an upper bound on the Huffman average codelength of $\overline{Q}_n$. Therefore, to derive an upper bound on $L^*(\overline{Q}_n)$, it is enough to appropriately find an n-tuple codelength vector, or equivalently its NCM-representation $\big[\ell_0, \Lambda; \{\alpha_i\}_{i=1}^{\Lambda+1}\big]$, which satisfies the Kraft inequality, and then evaluate the corresponding average codeword length with $\overline{Q}_n$. This is what we do in this subsection.

Definition 2: Let $\zeta(t) : [0, 1] \to [0, 1]$ denote the root of the function

$$\varphi_t(x) = 2^{-14} - 2^{t-4} + \sum_{i=0}^{14}2^{-i}x^{2^{-i}}, \tag{45}$$

that is,

$$\sum_{i=0}^{14}2^{-i}\zeta(t)^{2^{-i}} = 2^{t-4} - 2^{-14}. \tag{46}$$

Note that for $t \in [0, 1]$, $\varphi_t(x)$ has a unique root in the interval [0, 1] and hence $\zeta(t)$ is well-defined. This is because $\varphi_t(0) < 0$, $\varphi_t(1) > 0$ and $\varphi_t'(x) > 0$ for $x \ge 0$.

Definition 3: For a given $n \ge 64$, we define the codelengths of the S(n) code, i.e., $\ell^{S(n)}_1, \ell^{S(n)}_2, \ldots, \ell^{S(n)}_n$, by the NCM-representation $\big[\lceil\log n\rceil - 5,\, 15;\, \{\alpha^{S(n)}_k\}_{k=0}^{16}\big]$, where

$$\alpha^{S(n)}_i = \frac{\big\lfloor n\chi_n^{2^{1-i}}\big\rfloor}{n} \qquad \text{for } 1 \le i \le 15 \tag{47}$$

and $\chi_n$ is the root of $\varphi_t(x)$ for $t = \langle\!\langle\log n\rangle\!\rangle$, i.e.,

$$\chi_n = \zeta(\langle\!\langle\log n\rangle\!\rangle). \tag{48}$$

Clearly, we set $\alpha^{S(n)}_0 = 0$ and $\alpha^{S(n)}_{16} = 1$.
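Since $\varphi_t$ is strictly increasing on [0, 1], $\zeta(t)$ is straightforward to evaluate numerically. A bisection sketch (ours; the tolerance and the sample n are illustrative):

```python
import math

def zeta(t, tol=1e-15):
    """Root of phi_t(x) = 2^{-14} - 2^{t-4} + sum_{i=0}^{14} 2^{-i} x^{2^{-i}}
    on [0, 1], found by bisection (phi_t is strictly increasing there)."""
    def phi(x):
        return 2.0**-14 - 2.0**(t - 4) + sum(2.0**-i * x**(2.0**-i) for i in range(15))
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if phi(mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

n = 1000
t = math.ceil(math.log2(n)) - math.log2(n)                  # <<log n>> = ceil(log n) - log n
chi = zeta(t)
alphas = [math.floor(n * chi**(2.0**(1 - i))) / n for i in range(1, 16)]   # Eq. (47)
```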


Remark: In this remark we clarify why the S(n) code is designed as in Definition 3. From (33) we learned that $\ell_0 + \Lambda - \sum_{k=1}^{\Lambda}\alpha_k(1 - \ln\alpha_k)$ is an acceptable approximation of $\sum_{i=1}^{n}\bar q^{\,n}_i\,\ell_i$ for large n. Thus, in order to find an acceptable suboptimal code for $\overline{Q}_n$ (with large n) we can consider the following optimization problem

$$\arg\min_{\substack{\alpha_1,\ldots,\alpha_\Lambda\in\mathbb{R},\; n\alpha_k\in\mathbb{Z}^+ \\ \text{s.t. } n2^{-\ell_0}\left(2^{-\Lambda}+\sum_{k=1}^{\Lambda}\alpha_k2^{-k}\right)\le1}} \;-\sum_{k=1}^{\Lambda}\alpha_k(1 - \ln\alpha_k) \tag{49}$$

for some appropriate values of $\ell_0$ and $\Lambda$. Using trial and error we found that $\ell_0 = \lceil\log n\rceil - 5$ and $\Lambda = 15$ are appropriate choices which result in an acceptable suboptimal code for $\overline{Q}_n$. This observation is analytically shown in the rest of this subsection.

To find a suboptimal solution of (49) we can relax the constraints $n\alpha_k \in \mathbb{Z}^+$, $k = 1, 2, \ldots, \Lambda$, and solve the problem

$$\arg\min_{\substack{\alpha_1,\ldots,\alpha_{15}\in\mathbb{R} \\ \text{s.t. } -2^{\langle\!\langle\log n\rangle\!\rangle-5}+2^{-15}+\sum_{k=1}^{15}\alpha_k2^{-k}\le0}} \;-\sum_{k=1}^{15}\alpha_k(1 - \ln\alpha_k). \tag{50}$$

Using Lagrange multipliers, the solution of (50) is derived as $\alpha^*_k = \chi_n^{2^{1-k}}$, $k = 1, 2, \ldots, 15$. Note that (48) implies $\varphi_{\langle\!\langle\log n\rangle\!\rangle}(\chi_n) = 0$, which guarantees that the Kraft condition holds. Finally, to satisfy the constraint $n\alpha_k \in \mathbb{Z}^+$, we refine $\alpha^*_k$ as in (47). Lemma 10 shows that this refinement does not violate the Kraft inequality.

Lemma 10: The codelengths $\ell^{S(n)}_1, \ell^{S(n)}_2, \ldots, \ell^{S(n)}_n$ satisfy the Kraft inequality.

Proof: Using the definition of the S(n) code and noting (31), we can write

$$\sum_{k=1}^{n}2^{-\ell^{S(n)}_k} = n2^{-\lceil\log n\rceil+5}\bigg(2^{-15} + \sum_{i=1}^{15}\frac{\big\lfloor n\chi_n^{2^{1-i}}\big\rfloor}{n}2^{-i}\bigg) \le n2^{-\lceil\log n\rceil+5}\bigg(2^{-15} + \sum_{i=1}^{15}\chi_n^{2^{1-i}}2^{-i}\bigg)$$

$$= 2^{-\langle\!\langle\log n\rangle\!\rangle+5}\bigg(2^{-15} + \frac{1}{2}\sum_{i=0}^{14}\chi_n^{2^{-i}}2^{-i}\bigg) \stackrel{(a)}{=} 2^{-\langle\!\langle\log n\rangle\!\rangle+5}\Big(2^{-15} + \frac{1}{2}\times2^{-4}\times\big[2^{\langle\!\langle\log n\rangle\!\rangle} - 2^{-10}\big]\Big) = 1,$$

where (a) follows from $\chi_n = \zeta(\langle\!\langle\log n\rangle\!\rangle)$ and (46). ∎

Thus $L^*(\overline{Q}_n)$ is upper bounded by $\sum_{i=1}^{n}\bar q^{\,n}_i\,\ell^{S(n)}_i$. We use (47) and the right-hand side of (33) to write

$$L^*(\overline{Q}_n) - \log n \le -\log n + \sum_{i=1}^{n}\bar q^{\,n}_i\,\ell^{S(n)}_i \le -\log n + \lceil\log n\rceil + 10 - \sum_{i=1}^{15}\frac{\big\lfloor n\chi_n^{2^{1-i}}\big\rfloor}{n}\left(1 - \ln\frac{\frac{\lfloor n\chi_n^{2^{1-i}}\rfloor}{n} + \frac{1}{n}}{1 + \frac{1}{n}}\right)$$

$$\stackrel{(a)}{\le} \langle\!\langle\log n\rangle\!\rangle + 10 - \sum_{i=1}^{15}\frac{n\chi_n^{2^{1-i}} - 1}{n}\left(1 - \ln\frac{\frac{n\chi_n^{2^{1-i}}-1}{n} + \frac{1}{n}}{1 + \frac{1}{n}}\right) = \langle\!\langle\log n\rangle\!\rangle + 10 - \sum_{i=1}^{15}\Big(\chi_n^{2^{1-i}} - \frac{1}{n}\Big)\left(1 - \ln\frac{\chi_n^{2^{1-i}}}{1 + \frac{1}{n}}\right), \tag{51}$$

where (a) follows from the fact that

$$\nu(t) = t\left(1 - \ln\frac{t + \frac{1}{n}}{1 + \frac{1}{n}}\right)$$

is a nondecreasing function for $t \in [0, 1]$. Defining

$$h_n(t) \triangleq t + 10 - \sum_{i=0}^{14}\Big(\zeta(t)^{2^{-i}} - \frac{1}{n}\Big)\left(1 - \ln\frac{\zeta(t)^{2^{-i}}}{1 + \frac{1}{n}}\right),$$

it is not hard to show that the function sequence $\{h_n(t)\}_{n=1}^{\infty}$ uniformly converges to the function

$$h(t) \triangleq t + 10 - \sum_{i=0}^{14}\zeta(t)^{2^{-i}}\Big(1 - \ln\zeta(t)^{2^{-i}}\Big) \tag{52}$$

on the interval [0, 1], and hence [15, Theorem 7.9] implies that

$$\lim_{n\to\infty}\sup_{0\le t<1}\big|h_n(t) - h(t)\big| = 0. \tag{53}$$

Noting (51), we can write

$$L^*(\overline{Q}_n) - \log n \le h_n(\langle\!\langle\log n\rangle\!\rangle) \le \sup_{0\le t<1}h_n(t) \le \sup_{0\le t<1}h(t) + \sup_{0\le t<1}\big|h_n(t) - h(t)\big|, \tag{54}$$

and consequently (53) gives

$$\limsup_{n\to\infty}L^*(\overline{Q}_n) - \log n \le \sup_{0\le t<1}h(t).$$

Thus the problem of upper bounding $\limsup_{n\to\infty}R^*(\overline{Q}_n)$ reduces to determining $\sup_{0\le t<1}h(t)$. To do so, we should determine and compare the values of h(t) at its critical points and the marginal points t = 0, 1. This is accomplished in Lemma 13 (given in Appendix B), where we show that

$$\sup_{0\le t<1}h(t) < -0.58118073159. \tag{55}$$


Combining (55) and (27), we get

$$\limsup_{n\to\infty}R^*(\overline{Q}_n) = \limsup_{n\to\infty}L^*(\overline{Q}_n) - \log n + (1-\gamma)\log e \le \sup_{0\le t<1}h(t) + (1-\gamma)\log e < -0.58118073159 + (1-\gamma)\log e < 0.0287681321. \tag{56}$$

C. Divergency of $R^*(\overline{Q}_n)$

From (44) and (56), it can be seen that

$$\limsup_{n\to\infty}R^*(\overline{Q}_n) - \liminf_{n\to\infty}R^*(\overline{Q}_n) < 4\times10^{-5}.$$

Thus, one may guess that $\limsup_{n\to\infty}R^*(\overline{Q}_n) = \liminf_{n\to\infty}R^*(\overline{Q}_n)$, i.e., $R^*(\overline{Q}_n)$ is convergent. However, in this subsection we prove that $\limsup_{n\to\infty}R^*(\overline{Q}_n)$ is strictly larger than $\liminf_{n\to\infty}R^*(\overline{Q}_n)$. To do so, we show the following statements:

S1. There exists a sequence $\{n_k\}$, $k = 1, 2, 3, \ldots$, of positive integers such that

$$\limsup_{k\to\infty}R^*(\overline{Q}_{n_k}) > \eta_1, \tag{57}$$

where $\eta_1 = 0.02876639123$.

S2. There exists a sequence $\{n_k\}$, $k = 1, 2, 3, \ldots$, of positive integers such that

$$\liminf_{k\to\infty}R^*(\overline{Q}_{n_k}) < \eta_2, \tag{58}$$

where $\eta_2 = 0.02876489239$.

Noting that $\eta_1 > \eta_2$, the proof of Theorem 2 is complete. In the remainder of this subsection, we prove S1 and S2.

S1. Let $a^* \triangleq 20.3$ and consider the sequence $n_k \triangleq \lfloor a^*2^k/\xi\rfloor$. Fix an arbitrary real $\beta \in [20/a^*, 1)$. It is not hard to show that there exists $K(\beta)$ such that $\beta2^x < 2^x - 1$ for any $x > K(\beta)$. Substituting x with $k + \log\frac{a^*}{\xi}$, we have

$$\frac{\beta a^*2^k}{\xi} < \frac{a^*2^k}{\xi} - 1 < n_k \le \frac{a^*2^k}{\xi}$$

for $k > K(\beta) - \log\frac{a^*}{\xi}$, and consequently,

$$\beta a^*2^{k-1} < \frac{n_k\xi}{2} \le a^*2^{k-1}.$$

Hence, we have

$$\frac{n_k\xi}{2} = a_k2^{k-1} \quad \text{for some } a_k \in (\beta a^*, a^*] \subseteq (20, 20.3].$$

Combining with (42) and (71) (in Appendix B, Lemma 12), we obtain

$$L^*(\overline{Q}_{n_k}) - \log n_k \ge \log\xi + \mathrm{g}(a_k) - 5\times10^{-8} - \xi n_k2^{1-n_k} \ge \log\xi + \inf_{\beta a^*<a\le a^*}\mathrm{g}(a) - 5\times10^{-8} - \xi n_k2^{1-n_k},$$

which, as $k\to\infty$, gives

$$\limsup_{k\to\infty}L^*(\overline{Q}_{n_k}) - \log n_k \ge \log\xi + \inf_{\beta a^*<a\le a^*}\mathrm{g}(a) - 5\times10^{-8}.$$

Now, define

$$\underline{\mathrm{g}}(x) \triangleq \sum_{k=1}^{50}\Big[1 - \big(1 + x2^{-k}\big)e^{-x2^{-k}}\Big] - \log x.$$

Note that $\underline{\mathrm{g}}(x)$ is a finite series, and thus it can be exactly computed for any given x. Since $1 - (y + 1)e^{-y} > 0$ for $y > 0$, we have $\mathrm{g}(x) > \underline{\mathrm{g}}(x)$, and consequently,

$$\limsup_{k\to\infty}L^*(\overline{Q}_{n_k}) - \log n_k \ge \log\xi + \inf_{\beta a^*<a\le a^*}\underline{\mathrm{g}}(a) - 5\times10^{-8}.$$

On the other hand, since $\underline{\mathrm{g}}(x)$ is a continuous function of x, we have

$$\lim_{\beta\to1^-}\;\inf_{\beta a^*<a\le a^*}\underline{\mathrm{g}}(a) = \underline{\mathrm{g}}(a^*) \ge -1.10993448437.$$

Therefore, we can write

$$\limsup_{k\to\infty}R^*(\overline{Q}_{n_k}) = \limsup_{k\to\infty}\Big[L^*(\overline{Q}_{n_k}) - \log n_k\Big] + \lim_{k\to\infty}\Big[\log n_k - H(\overline{Q}_{n_k})\Big] \ge \log\xi + \underline{\mathrm{g}}(a^*) - 5\times10^{-8} + (1-\gamma)\log e \ge 0.02876639123.$$

S2. Let $t^* \triangleq 0.92$, and define $n_k \triangleq \lfloor2^{k-t^*}\rfloor$ for $k = 1, 2, 3, \ldots$. It is easy to show that for any $\varepsilon > 0$ there exists $K(\varepsilon)$ such that $2^{x-\varepsilon} < 2^x - 1$ for any $x > K(\varepsilon)$. Substituting x with $k - t^*$, we have

$$2^{k-t^*-\varepsilon} < 2^{k-t^*} - 1 < n_k \le 2^{k-t^*}$$

for $k > K(\varepsilon) + t^*$, and hence we can write

$$t^* \le \langle\!\langle\log n_k\rangle\!\rangle < t^* + \varepsilon$$

for $0 \le \varepsilon < 1 - t^*$. Combining with (54) we can write

$$L^*(\overline{Q}_{n_k}) - \log n_k \le \sup_{t^*\le t\le t^*+\varepsilon}h_{n_k}(t) \le \sup_{t^*\le t\le t^*+\varepsilon}h(t) + \sup_{t^*\le t\le t^*+\varepsilon}\big|h_{n_k}(t) - h(t)\big|,$$

and noting [15, Theorem 7.9],

$$\liminf_{k\to\infty}L^*(\overline{Q}_{n_k}) - \log n_k \le \sup_{t^*\le t\le t^*+\varepsilon}h(t).$$

On the other hand, since h(t) is continuous, we have

$$\lim_{\varepsilon\to0^+}\;\sup_{t^*\le t\le t^*+\varepsilon}h(t) = h(t^*) \le -0.58118397122.$$

Therefore, it is concluded that

$$\liminf_{k\to\infty}R^*(\overline{Q}_{n_k}) = \liminf_{k\to\infty}\Big[L^*(\overline{Q}_{n_k}) - \log n_k\Big] + \lim_{k\to\infty}\Big[\log n_k - H(\overline{Q}_{n_k})\Big] \le h(t^*) + (1-\gamma)\log e \le 0.02876489239,$$

and the proof is complete. ∎

Remark: The difference between $\eta_1$ and $\eta_2$ is less than $2\times10^{-6}$. Thus we need to compute their values, and in particular $\underline{\mathrm{g}}(a^*)$ and $h(t^*)$, to sufficiently high precision. To do so, we have used the method described in the Remark at the end of Appendix B. All computations are carried out with 100-digit precision.

APPENDIX A

Proof of Lemma 2: We need to show that

$$\Pr\{T(\mathbf{P}_n) \le x\} = \Pr\{T(\mathbf{Q}_n) \le x\} \qquad \text{for } x \in \mathbb{R}.$$

Let $\sigma = (\sigma(1), \sigma(2), \ldots, \sigma(n))$ denote a permutation of $(1, 2, \ldots, n)$ and $S_n$ denote the set of all such permutations. For $\sigma \in S_n$ and $A \subset \Theta_n$, using the convention $q_1 = 1 - \sum_{i=2}^{n}q_i$, we define

$$\sigma(A) = \Big\{(q_{\sigma(2)}, q_{\sigma(3)}, \ldots, q_{\sigma(n)}) \;\Big|\; (q_2, q_3, \ldots, q_n) \in A\Big\}.$$

Now, fix x and define $A \triangleq \{Q_n \mid Q_n \in \Theta_n,\, T(Q_n) \le x\}$. Clearly, for all $\sigma \in S_n$, we have $\sigma(A) \subset \sigma(\Theta_n)$ and hence

$$\sigma_1(A) \cap \sigma_2(A) \subset \sigma_1(\Theta_n) \cap \sigma_2(\Theta_n). \tag{59}$$

On the other hand, for $\sigma_1 \ne \sigma_2$, we have $\mathrm{vol}[\sigma_1(\Theta_n) \cap \sigma_2(\Theta_n)] = 0$, which together with (59) gives $\mathrm{vol}[\sigma_1(A) \cap \sigma_2(A)] = 0$. Thus we can write

$$\mathrm{vol}[\sigma_1(A) \cup \sigma_2(A)] = \mathrm{vol}[\sigma_1(A)] + \mathrm{vol}[\sigma_2(A)] \tag{60}$$

for $\sigma_1 \ne \sigma_2$. Also, we have

$$\mathrm{vol}[\sigma(A)] = \mathrm{vol}(A) \tag{61}$$

for all $\sigma \in S_n$. Since $T(\cdot)$ is a symmetric function, we can write

$$T(P_n) \le x \iff P_n \in \bigcup_{\sigma\in S_n}\sigma(A)$$

and hence

$$\Pr\{T(\mathbf{P}_n) \le x\} = \frac{\mathrm{vol}\big[\bigcup_{\sigma\in S_n}\sigma(A)\big]}{\mathrm{vol}(\Delta_n)} \stackrel{(a)}{=} \frac{\sum_{\sigma\in S_n}\mathrm{vol}[\sigma(A)]}{\mathrm{vol}(\Delta_n)} \stackrel{(b)}{=} \frac{|S_n|\cdot\mathrm{vol}(A)}{\mathrm{vol}(\Delta_n)} = \frac{n!\,\mathrm{vol}(A)}{\mathrm{vol}(\Delta_n)} \stackrel{(c)}{=} \frac{\mathrm{vol}(A)}{\mathrm{vol}(\Theta_n)} = \Pr\{T(\mathbf{Q}_n) \le x\},$$

where (a), (b) and (c) follow from (60), (61) and (1), respectively. ∎

Proof of Lemma 3: S1 is a well-known property in probability theory (see [13, Theorem 4.1]). Here we only provide the proof for S2. First, noting (9) we have

$$\sum_{i=1}^{n}\mathbf{q}_i = \sum_{i=1}^{n}\frac{\sum_{j=i}^{n}\frac{1}{j}\mathbf{x}_j}{\sum_{k=1}^{n}\mathbf{x}_k} = \frac{\sum_{i=1}^{n}\sum_{j=i}^{n}\frac{1}{j}\mathbf{x}_j}{\sum_{k=1}^{n}\mathbf{x}_k} = \frac{\sum_{k=1}^{n}\mathbf{x}_k}{\sum_{k=1}^{n}\mathbf{x}_k} = 1.$$

Thus we have $\mathbf{q}_1 = 1 - \sum_{i=2}^{n}\mathbf{q}_i$. Now, consider the transformation $T : (x_1, x_2, \ldots, x_n) \mapsto (q_2, q_3, \ldots, q_n, \varsigma)$ defined by

$$\varsigma = \sum_{k=1}^{n}x_k, \qquad q_i = \frac{\sum_{k=i}^{n}\frac{1}{k}x_k}{\sum_{k=1}^{n}x_k} \quad \text{for } i = 2, 3, \ldots, n. \tag{62}$$

Since we have

$$\frac{\partial q_i}{\partial x_j} = \begin{cases} -\dfrac{q_i}{\varsigma} & \text{if } i > j \\[2mm] \dfrac{\frac{1}{j} - q_i}{\varsigma} & \text{if } i \le j \end{cases} \qquad\text{and}\qquad \frac{\partial\varsigma}{\partial x_j} = 1 \quad \text{for } 1 \le j \le n,$$

the Jacobian matrix of T becomes

$$J = \frac{-1}{\varsigma}\begin{bmatrix} q_2 & q_2 - \frac{1}{2} & q_2 - \frac{1}{3} & \cdots & q_2 - \frac{1}{n} \\ q_3 & q_3 & q_3 - \frac{1}{3} & \cdots & q_3 - \frac{1}{n} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ q_n & q_n & q_n & \cdots & q_n - \frac{1}{n} \\ -\varsigma & -\varsigma & -\varsigma & \cdots & -\varsigma \end{bmatrix}.$$

By elementary row operations on J, it is not hard to show that

$$|\det(J)| = \frac{1}{n!}\,\varsigma^{1-n}.$$


Moreover, if we consider (62) as a system of linear equations (with the $x_i$'s as unknown variables), then its unique solution would be

$$x_k = \begin{cases} \Big[\Big(1 - \sum_{j=2}^{n}q_j\Big) - q_2\Big]\varsigma & k = 1, \\ k(q_k - q_{k+1})\varsigma & k = 2, \ldots, n-1, \\ nq_n\varsigma & k = n. \end{cases} \tag{63}$$

Since the solution is unique, we can write

$$f_{\mathbf{q}_2,\ldots,\mathbf{q}_n,\boldsymbol{\varsigma}}(q_2, \ldots, q_n, \varsigma) = \frac{f_{\mathbf{x}_1,\ldots,\mathbf{x}_n}(x_1, \ldots, x_n)}{|\det(J)|} = \frac{\prod_{i=1}^{n}f_{\mathbf{x}_i}(x_i)}{|\det(J)|} = n!\,\varsigma^{n-1}\prod_{i=1}^{n}e^{-x_i}u(x_i) = n!\,\varsigma^{n-1}e^{-\varsigma}\prod_{i=1}^{n}u(x_i). \tag{64}$$

Noting (62) and (63), we observe that

$$\prod_{i=1}^{n}u(x_i) = u(\varsigma)\cdot\mathbb{I}\big[(q_2, q_3, \ldots, q_n) \in \Theta_n\big].$$

This point together with (64) gives

$$f_{\mathbf{q}_2,\ldots,\mathbf{q}_n}(q_2, \ldots, q_n) = \mathbb{I}\big[(q_2, \ldots, q_n) \in \Theta_n\big]\,n!\int_{-\infty}^{\infty}\varsigma^{n-1}e^{-\varsigma}u(\varsigma)\,d\varsigma = \mathbb{I}\big[(q_2, \ldots, q_n) \in \Theta_n\big]\,n!\,(n-1)! = \mathbb{I}\big[(q_2, \ldots, q_n) \in \Theta_n\big]\,\frac{1}{\mathrm{vol}(\Theta_n)},$$

where the last line follows from (1). ∎

APPENDIX B

Lemma 11: Consider the function

$$\mathrm{f}(x) \triangleq \sum_{k=1}^{\infty}x2^{-k}e^{-x2^{-k}}.$$

For $x > 20$ we have

$$1.44268073 < \mathrm{f}(x) < 1.44270935.$$

Proof: It is easy to verify for a positive integer s that $\mathrm{f}(2^sa) = \mathrm{f}(a) + \delta_{\mathrm{f}}(s, a)$, where $\delta_{\mathrm{f}}(s, a) \triangleq \sum_{k=0}^{s-1}a2^ke^{-a2^k}$ is a positive term. Furthermore, for $a > \ln 2$ we can write

$$0 < \delta_{\mathrm{f}}(s, a) < \sum_{k=0}^{\infty}a2^ke^{-a2^k} < a\sum_{k=0}^{\infty}2^ke^{-a(k+1)} = \frac{a}{e^a - 2}.$$

For $x > 40$, let

$$s(x) \triangleq \Big\lfloor\log\frac{x}{20}\Big\rfloor \qquad\text{and}\qquad a(x) \triangleq x2^{-s(x)}. \tag{65}$$

Clearly, $s(x)$ is a positive integer and $20 \le a(x) < 40$. Hence we have

$$\mathrm{f}(x) = \mathrm{f}\big[2^{s(x)}a(x)\big] = \mathrm{f}[a(x)] + \delta_{\mathrm{f}}[s(x), a(x)], \tag{66}$$

where

$$0 < \delta_{\mathrm{f}}[s(x), a(x)] < \frac{20}{e^{20} - 2} < 5\times10^{-8}.$$

As a result, we get

$$\inf_{20\le a<40}\mathrm{f}(a) < \mathrm{f}(x) < \sup_{20\le a<40}\mathrm{f}(a) + 5\times10^{-8} \tag{67}$$

for $x > 40$. Thus, in order to complete the proof it only remains to find the infimum and supremum of $\mathrm{f}(x)$ over the interval [20, 40). This is what we do in the rest of the proof. Define

$$\underline{\mathrm{f}}(x) \triangleq \sum_{k=1}^{50}x2^{-k}e^{-x2^{-k}}.$$

Using $e^{-y} < 1 - \frac{y}{2}$ for $0 \le y \le 1$, we can write for $a \le 2^{51}$

$$0 < \mathrm{f}(a) - \underline{\mathrm{f}}(a) = \sum_{k=51}^{\infty}a2^{-k}e^{-a2^{-k}} < \sum_{k=51}^{\infty}a2^{-k}\Big(1 - \frac{a2^{-k}}{2}\Big) = a\sum_{k=51}^{\infty}2^{-k} - \frac{a^2}{2}\sum_{k=51}^{\infty}2^{-2k} = 2^{-50}a - \frac{1}{6}\times2^{-100}a^2,$$

and hence we have

$$0 < \mathrm{f}(a) - \underline{\mathrm{f}}(a) < 2^{-50}\times40 - \frac{1}{6}\times2^{-100}\times40^2 < 4\times10^{-13} \tag{68}$$

for $20 \le a < 40$. From (67) and (68) we can write

$$\inf_{20\le a<40}\underline{\mathrm{f}}(a) < \mathrm{f}(x) < \sup_{20\le a<40}\underline{\mathrm{f}}(a) + 5\times10^{-8} + 4\times10^{-13}$$

for $x > 40$. On the other hand, since $\underline{\mathrm{f}}(a)$ is a finite series, it is possible to compute its value for any given a. Computation (see the Remark at the end of this appendix) shows that

$$1.44268073 < \underline{\mathrm{f}}(a) < 1.44270930 \tag{69}$$

for $20 \le a \le 40$, and the proof is complete. ∎

Fig. 4 shows $\mathrm{f}(x)$ for $20 \le x \le 40$.

Remark: Fig. 4 shows that $\log e$ is in the middle of the bounds in (69). Hence, $\mathrm{f}(x)$ for $x > 20$ can be approximated by

$$\log e \approx 1.44269504$$

with error less than $1.5\times10^{-5}$.


Fig. 4. The value of f(x) for 20 ≤ x ≤ 40.

Also, it can be approximated by $\lim_{k\to\infty}\mathrm{f}(x_k)$ for any sequence $\{x_k\}$ in $[20, \infty)$, with error less than $3\times10^{-5}$. In particular, since $\mathrm{f}(2x) = xe^{-x} + \mathrm{f}(x)$, we have

$$\lim_{m\to\infty}\mathrm{f}(2^m) = \mathrm{f}(1) + e^{-1} + \sum_{k=1}^{\infty}2^ke^{-2^k} = e^{-1} + \sum_{k=1}^{\infty}\big(2^ke^{-2^k} + 2^{-k}e^{-2^{-k}}\big) \approx 1.442704207.$$

Therefore, $\mathrm{f}(x)$ for $x > 20$ can be approximated as

$$\log e \approx e^{-1} + \sum_{k=1}^{\infty}\big(2^ke^{-2^k} + 2^{-k}e^{-2^{-k}}\big) \tag{70}$$

with error less than $3\times10^{-5}$.

Lemma 12: Consider the function

$$\mathrm{g}(x) \triangleq \sum_{k=1}^{\infty}\Big[1 - \big(1 + x2^{-k}\big)e^{-x2^{-k}}\Big] - \log x.$$

For $x > 20$, we have

$$-1.10996329 < \mathrm{g}(x) < -1.10993448.$$

Proof: The proof is quite similar to that of Lemma 11. For a positive integer s we have $\mathrm{g}(2^sa) = \mathrm{g}(a) - \delta_{\mathrm{g}}(s, a)$, where $\delta_{\mathrm{g}}(s, a) \triangleq \sum_{k=0}^{s-1}(1 + a2^k)e^{-a2^k}$. For $a > \ln 2$ we can write

$$0 < \delta_{\mathrm{g}}(s, a) < \sum_{k=0}^{\infty}(1 + a2^k)e^{-a2^k} < \sum_{k=0}^{\infty}(1 + a2^k)e^{-a(k+1)} = e^{-a}\sum_{k=0}^{\infty}(e^{-a})^k + ae^{-a}\sum_{k=0}^{\infty}(2e^{-a})^k = \frac{e^{-a}}{1 - e^{-a}} + \frac{ae^{-a}}{1 - 2e^{-a}} < \frac{a + 1}{e^a - 2}.$$

With the same definition of $s(x)$ and $a(x)$ as in (65), for $x > 40$ we have

$$\mathrm{g}(x) = \mathrm{g}\big[2^{s(x)}a(x)\big] = \mathrm{g}[a(x)] - \delta_{\mathrm{g}}[s(x), a(x)],$$

where

$$0 < \delta_{\mathrm{g}}[s(x), a(x)] < \frac{20 + 1}{e^{20} - 2} < 5\times10^{-8}.$$

As a result, we get

$$\mathrm{g}[a(x)] - 5\times10^{-8} < \mathrm{g}(x) < \mathrm{g}[a(x)], \tag{71}$$

and thus,

$$\inf_{20\le a<40}\mathrm{g}(a) - 5\times10^{-8} < \mathrm{g}(x) < \sup_{20\le a<40}\mathrm{g}(a) \qquad \text{for } x > 40. \tag{72}$$

Thus, to complete the proof it is only required to find the infimum and supremum of $\mathrm{g}(x)$ over the interval [20, 40). Define

$$\underline{\mathrm{g}}(x) \triangleq \sum_{k=1}^{50}\Big[1 - \big(1 + x2^{-k}\big)e^{-x2^{-k}}\Big] - \log x.$$

Noting that $1 - (y + 1)e^{-y} > 0$ for $y > 0$, we have $\mathrm{g}(x) > \underline{\mathrm{g}}(x)$. Moreover, since $1 - y < e^{-y}$ for $0 \le y \le 1$, we have for $0 < a < 2^{51}$

$$\mathrm{g}(a) - \underline{\mathrm{g}}(a) = \sum_{k=51}^{\infty}\Big[1 - (1 + a2^{-k})e^{-a2^{-k}}\Big] \le \sum_{k=51}^{\infty}\Big[1 - (1 + a2^{-k})(1 - a2^{-k})\Big] = \sum_{k=51}^{\infty}a^24^{-k} < 40^2\times\frac{4}{3}\times4^{-51} < 5\times10^{-28}. \tag{73}$$

Combining (72) and (73), we get

$$\inf_{20\le a<40}\underline{\mathrm{g}}(a) - 5\times10^{-8} < \mathrm{g}(x) < \sup_{20\le a<40}\underline{\mathrm{g}}(a) + 5\times10^{-28}$$

for $x > 40$. On the other hand, since $\underline{\mathrm{g}}(a)$ is a finite series, it is possible to compute its value for any given a. Computation (see the Remark at the end of this appendix) shows that

$$-1.10996324 < \underline{\mathrm{g}}(a) < -1.10993448 \tag{74}$$

for $20 \le a \le 40$, and the proof is complete. ∎

Fig. 5 shows $\mathrm{g}(x)$ for $20 \le x \le 40$.

Remark: Lemma 12 shows that the value of $\mathrm{g}(x)$ for $x > 20$ can be approximated by $\lim_{k\to\infty}\mathrm{g}(x_k)$ for any sequence $\{x_k\}$ in $[20, \infty)$, with error less than $3\times10^{-5}$. In particular,


Fig. 5. The value of g(x) for 20 ≤ x ≤ 40.

since $\mathrm{g}(2x) = \mathrm{g}(x) - (1 + x)e^{-x}$, we have

$$\lim_{m\to\infty}\mathrm{g}(2^m) = \mathrm{g}(1) - \sum_{k=0}^{\infty}\big(1 + 2^k\big)e^{-2^k} = -2e^{-1} + \sum_{k=1}^{\infty}\Big[1 - \big(1 + 2^k\big)e^{-2^k} - \big(1 + 2^{-k}\big)e^{-2^{-k}}\Big]$$

$$= -e^{-1} + \sum_{k=1}^{\infty}\big(1 - e^{-2^k} - e^{-2^{-k}}\big) - e^{-1} - \sum_{k=1}^{\infty}\big(2^ke^{-2^k} + 2^{-k}e^{-2^{-k}}\big).$$

Combining with (70), it is concluded that $\mathrm{g}(x)$ for $x > 20$ can be approximated by

$$-e^{-1} + \sum_{k=1}^{\infty}\big(1 - e^{-2^k} - e^{-2^{-k}}\big) - \log e \approx -1.109947658456.$$

Lemma 13: We have

$$\sup_{0\le t<1}h(t) < -0.58118073159.$$

Proof: Noting (52) and using (46), we can write

$$h(t) = t + 10 - \sum_{i=0}^{14}\zeta(t)^{2^{-i}}\big[1 - 2^{-i}\ln\zeta(t)\big] = t + 10 - \sum_{i=0}^{14}\zeta(t)^{2^{-i}} + \ln\zeta(t)\sum_{i=0}^{14}2^{-i}\zeta(t)^{2^{-i}} = t + 10 - \sum_{i=0}^{14}\zeta(t)^{2^{-i}} + \big(2^{t-4} - 2^{-14}\big)\ln\zeta(t). \tag{75}$$

Thus the derivative of h(t) is given by

$$h'(t) = 1 - \sum_{i=0}^{14}2^{-i}\zeta'(t)\zeta(t)^{2^{-i}-1} + \big(2^{t-4} - 2^{-14}\big)\frac{\zeta'(t)}{\zeta(t)} + (\ln 2)\,2^{t-4}\ln\zeta(t)$$

$$= 1 - \frac{\zeta'(t)}{\zeta(t)}\sum_{i=0}^{14}2^{-i}\zeta(t)^{2^{-i}} + \big(2^{t-4} - 2^{-14}\big)\frac{\zeta'(t)}{\zeta(t)} + (\ln 2)\,2^{t-4}\ln\zeta(t) = 1 + (\ln 2)\,2^{t-4}\ln\zeta(t), \tag{76}$$

where the last equality follows from (46). Noting (76), we have

$$h'(t_0) = 0 \iff \zeta(t_0) = \Psi^{2^{4-t_0}},$$

where $\Psi \triangleq e^{-\log e}$. On the other hand, (46) implies

$$\zeta(t_0) = \Psi^{2^{4-t_0}} \iff \Upsilon(t_0) = 0, \tag{77}$$

where

$$\Upsilon(t) \triangleq 2^{-14} - 2^{t-4} + \sum_{i=0}^{14}2^{-i}\,\Psi^{2^{4-t-i}}.$$

Thus the problem of determining $\sup_{0\le t\le1}h(t)$ reduces to numerical computation of h(t) at $t_1 = 0$, $t_2 = 1$ and the roots of $\Upsilon(t)$ in the interval [0, 1].

Our computation (with 100-digit precision) shows that $\Upsilon(t)$ has two roots in [0, 1], namely $t_3$ and $t_4$. The values of $\zeta(t_i)$ and $h(t_i)$ for $i = 1, 2, 3$ and 4 are computed with 100-digit precision and presented in Table I.

TABLE I: The values of $t_i$, $\zeta(t_i)$ and $h(t_i)$ for $i = 1, 2, 3, 4$.

It can be seen that

$$\sup_{0\le t\le1}h(t) = \max_{t\in\{t_1,t_2,t_3,t_4\}}h(t) = h(t_3) < -0.58118073159,$$

and the proof is complete. ∎

Remark: In Appendix B, we found the infimum, the supremum, or the roots of the functions $\mathrm{f}(x)$, $\mathrm{g}(x)$, $\varphi_t(x)$, $\Upsilon(t)$ and $h(t)$ over a continuous interval of real numbers by computation. Since the intervals are continuous, it is not possible to compute the function values at all points of the interval, and unavoidably, we select some points in each interval and compute the function value at these points. The crucial issue is that we should do this selection so that the infimum/supremum/root of the functions is computed with sufficient precision. In order to guarantee that the computed infimum/supremum/root is sufficiently close to its true real value, we used the following approach.

Consider the case where we need to find the infimum of a function $r(x)$ on a continuous interval [a, b], i.e., $\inf_{a\le x\le b}r(x)$ (the approach is applicable for finding the supremum and the roots of a function as well). Suppose that there exists a nonnegative function $\bar r(x)$ such that

$$|r(x) - r(y)| \le |y - x|\cdot\bar r(x) \qquad \text{for all } x, y \in [a, b].$$

Also, assume that we have the values of $r(x)$ at m + 1 points $x_0, x_1, \ldots, x_m$, where $x_i = a + i\delta$ for $0 \le i \le m$ and $\delta = \frac{b-a}{m}$. Then, it is not hard to show that

$$\min_{0\le k\le m}\big[r(x_k) - \delta\bar r(x_k)\big] \le \inf_{a\le x\le b}r(x) \le \min_{0\le k\le m}r(x_k).$$

Moreover, $\inf_{a\le x\le b}r(x)$ occurs at some point in

$$\bigcup_{\substack{j\in\{0,1,\ldots,m\}\text{ s.t.}\\ r(x_j)-\delta\bar r(x_j)\le\min_{0\le k\le m}r(x_k)}}[x_j, x_j + \delta).$$

We can achieve arbitrary precision by iteratively choosing an appropriate interval [a, b] and δ. The above discussion can be generalized to multivariate functions.

For the functions $\mathrm{f}(x)$, $\mathrm{g}(x)$, $\varphi_t(x)$, $\Upsilon(t)$ and $h(t)$, it can be shown that

$$\big|\mathrm{f}(x + \Delta x) - \mathrm{f}(x)\big| \le 2\Delta x \tag{78}$$

for $x \ge 0$ and $0 < \Delta x \le 1$;

$$\big|\mathrm{g}(x + \Delta x) - \mathrm{g}(x)\big| \le 3\Delta x \tag{79}$$

for $x \ge 0$ and $0 < \Delta x \le 1$;

$$\big|\Upsilon(t + \Delta t) - \Upsilon(t)\big| \le \Delta t \tag{80}$$

for $0 \le t \le 1$ and $0 < \Delta t \le \log\log e$;

$$\big|\varphi_{t+\Delta t}(x + \Delta x) - \varphi_t(x)\big| \le 2^{t-4}\Delta t + \frac{4}{3}\,\frac{\Delta x}{x} \tag{81}$$

for $0 < x \le 1$, $0 < \Delta x \le x$ and $0 < \Delta t \le \log\log e$; and

$$\big|\mathcal{H}(t + \Delta t, z + \Delta z) - \mathcal{H}(t, z)\big| \le \Delta t\Big(1 - \ln\frac{z}{8}\Big) + 2.2\,\frac{\Delta z}{z} \tag{82}$$

for $0 < \Delta t \le \log\log e$ and $0 < \Delta z \le z$, where

$$\mathcal{H}(t, z) \triangleq t + 10 - \sum_{i=0}^{14}z^{2^{-i}} + \big(2^{t-4} - 2^{-14}\big)\ln z$$

and $h(t) = \mathcal{H}(t, \zeta(t))$. Using the above inequalities and the mentioned approach, we computed the infimum/supremum/root of the aforementioned functions to sufficient precision.
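A minimal single-variable sketch of this certification scheme (ours; the grid is far coarser than the 100-digit computations above), applied to the truncated series $\underline{\mathrm{g}}$ of Lemma 12 with the Lipschitz constant from (79):

```python
import math

def certified_inf(r, r_bound, a, b, m):
    """If |r(x) - r(y)| <= |y - x| * r_bound(x) on [a, b], then
    min_k [r(x_k) - d*r_bound(x_k)] <= inf r <= min_k r(x_k), where d = (b-a)/m."""
    d = (b - a) / m
    xs = [a + i * d for i in range(m + 1)]
    return (min(r(x) - d * r_bound(x) for x in xs), min(r(x) for x in xs))

def g_trunc(x):
    """The truncated series of Lemma 12 (50 terms)."""
    return sum(1 - (1 + x * 2.0**-k) * math.exp(-x * 2.0**-k)
               for k in range(1, 51)) - math.log2(x)

lo, hi = certified_inf(g_trunc, lambda x: 3.0, 20.0, 40.0, 20000)
print(lo, hi)    # certified bracket for inf of g_trunc on [20, 40]; cf. (74)
```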

REFERENCES

[1] D. A. Huffman, "A method for the construction of minimum-redundancy codes," in Proc. IRE, Sep. 1952, pp. 1098–1101.

[2] A. D. Santis and R. M. Capocelli, "Tight upper bounds on the redundancy of Huffman codes," IEEE Trans. Inf. Theory, vol. 35, no. 1, pp. 1084–1091, Sep. 1989.

[3] R. M. Capocelli and A. D. Santis, "New bounds on the redundancy of Huffman codes," IEEE Trans. Inf. Theory, vol. 37, no. 4, pp. 1095–1104, Jul. 1991.

[4] D. Manstetten, "Tight bounds on the redundancy of Huffman codes," IEEE Trans. Inf. Theory, vol. 38, no. 1, pp. 144–151, Jan. 1992.

[5] C. Ye and R. W. Yeung, "A simple upper bound on the redundancy of Huffman codes," IEEE Trans. Inf. Theory, vol. 48, no. 7, pp. 2132–2138, Jul. 2002.

[6] F. Cicalese and U. Vaccaro, "Bounding the average length of optimal source codes via majorization theory," IEEE Trans. Inf. Theory, vol. 50, no. 4, pp. 633–637, Apr. 2004.

[7] S. Mohajer, P. Pakzad, and A. Kakhbod, "Tight bounds on the redundancy of Huffman codes," IEEE Trans. Inf. Theory, vol. 58, no. 11, pp. 6737–6746, Nov. 2012.

[8] H. Narimani, M. Khosravifard, and T. A. Gulliver, "How suboptimal is the Shannon code?" IEEE Trans. Inf. Theory, vol. 59, no. 1, pp. 458–471, Jan. 2013.

[9] M. Khosravifard, H. Saidi, M. Esmaeili, and T. A. Gulliver, "The minimum average code for memoryless monotone sources," IEEE Trans. Inf. Theory, vol. 53, no. 1, pp. 955–975, Mar. 2008.

[10] H. Narimani, M. Khosravifard, and T. A. Gulliver, "Near-optimality of the minimum average redundancy code for almost all monotone sources," IEICE Trans. Fundam., vol. 94, no. 11, pp. 2092–2096, Nov. 2011.

[11] R. Gallager and D. V. Voorhis, "Optimal source codes for geometrically distributed alphabets," IEEE Trans. Inf. Theory, vol. 21, no. 2, pp. 228–230, Mar. 1975.

[12] W. Szpankowski, "Asymptotic average redundancy of Huffman (and other) block codes," IEEE Trans. Inf. Theory, vol. 46, no. 7, pp. 2434–2443, Mar. 2000.

[13] L. Devroye, Non-Uniform Random Variate Generation, 1st ed. New York, NY, USA: Springer-Verlag, 1986.

[14] G. O. H. Katona and T. O. H. Nemetz, "Huffman code and self-information," IEEE Trans. Inf. Theory, vol. 22, no. 3, pp. 337–340, May 1976.

[15] W. Rudin, Principles of Mathematical Analysis, 3rd ed. New York, NY, USA: McGraw-Hill, 1976.

Hamed Narimani received the B.Sc. and M.Sc. degrees in 2002 and 2004 from Sharif University of Technology, Iran, and the M.Sc. and Ph.D. degrees in 2009 and 2013 from Isfahan University of Technology, Iran, all in electrical engineering. He is currently an Assistant Professor with the Department of Electrical and Computer Engineering, Isfahan University of Technology. His main research interests are information theory and smart grid.

Mohammadali Khosravifard (M'07) received the B.Sc., M.Sc., and Ph.D. degrees in electrical engineering from Shiraz University, Sharif University of Technology, and Isfahan University of Technology, Iran, in 1996, 1998, and 2004, respectively. Since September 2004, he has been with the Department of Electrical and Computer Engineering, Isfahan University of Technology, where he is currently an Associate Professor. His research interests include information theory and image processing.