Large Deviations Performance of Interval
Algorithm for Random Number Generation
Akisato KIMURA ∗
Tomohiko UYEMATSU∗
February 22, 1999
No. AK-TR-1999-01
Abstract
We investigate the large deviations performance of the interval algorithm for random number generation. First, we show that the length of the input sequence per output symbol approaches the ratio of the entropies of the input and output distributions almost surely.
Next, we investigate the large deviations performance especially for intrinsic randomness. We show that the number of output fair random bits per input sample approaches the entropy of the input source almost surely, and we determine the exponent in this case.
Further, we consider obtaining a fixed number of fair random bits from an input sequence of fixed length. We show that the approximation error, measured by either the variational distance or the divergence, vanishes exponentially as the length of the input sequence tends to infinity, provided the number of output random bits per input sample is below the entropy of the source. Conversely, if the number of random bits per input sample is above the entropy of the source, the approximation error measured by the variational distance approaches two exponentially, and the approximation error measured by the divergence grows to infinity linearly.
∗Dept. of Electrical and Electronic Eng., Tokyo Institute of Technology, 2-12-1 Ookayama, Meguro-ku, Tokyo 152-8552, Japan
I. Introduction
Random number generation is the problem of simulating some prescribed target distribution by using a given source. This problem has been investigated in computer science, and has a close relation to information theory [1, 2, 3]. Several practical algorithms for random number generation have been proposed, e.g. [1, 3, 4, 5]. In this paper, we consider the interval algorithm proposed by Han and Hoshi [3].
The performance of the interval algorithm has already been investigated in [3, 6, 7]. Han and Hoshi [3] showed that the expected length of the input sequence per output symbol can be characterized by the ratio of the entropies of the input and output distributions. Uyematsu and Kanaya [6] investigated the large deviations performance of the interval algorithm when the distribution of the input source is uniform. Further, Uchida and Han [7] extended the result of Uyematsu and Kanaya to stationary ergodic Markov processes. We investigate the large deviations performance when the input and output distributions are both non-uniform.
First, we show that the length of the input sequence per output symbol approaches the ratio of the entropies of the input and output distributions almost surely.
Next, we investigate the large deviations performance especially for intrinsic randomness. We show that the number of output fair random bits per input sample approaches the entropy of the input source almost surely, and we determine the exponent in this case.
Further, we consider obtaining a fixed number of fair random bits from an input sequence of fixed length. We show that the approximation error, measured by either the variational distance or the divergence, vanishes exponentially as the length of the input sequence tends to infinity, provided the number of random bits per input sample is below the entropy of the source. Conversely, if the number of random bits per input sample is above the entropy of the source, the approximation error measured by the variational distance approaches two exponentially, and the approximation error measured by the divergence grows to infinity linearly.
II. Basic Definitions
(a) Discrete Memoryless Source
Let X be a finite set. We denote by M(X) the set of all probability distributions on X. Throughout this paper, by a source X with alphabet X, we mean a discrete memoryless source (DMS) with distribution P_X ∈ M(X). To denote a source, we will use the notations X and P_X interchangeably.
For a random variable X with distribution P_X, we denote its entropy by H(P_X) or H(X), interchangeably:

    H(P_X) ≜ −∑_{x∈X} P_X(x) log P_X(x).
Further, for arbitrary distributions P, Q ∈ M(X), we denote by D(P ‖ Q) the information divergence

    D(P ‖ Q) ≜ ∑_{x∈X} P(x) log (P(x)/Q(x)).
Lastly, we denote by d(P, Q) the variational distance or l1 distance between two distributions P and Q on X:

    d(P, Q) ≜ ∑_{x∈X} |P(x) − Q(x)|.
From now on, all logarithms and exponentials are to the base two.
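As a concrete illustration, the three quantities defined above translate directly into code (a minimal sketch; the function names are ours, and all logarithms are base two as stated):

```python
import math

def entropy(p):
    """Shannon entropy H(P) in bits."""
    return -sum(px * math.log2(px) for px in p if px > 0)

def divergence(p, q):
    """Information divergence D(P || Q) in bits.

    Assumes Q(x) > 0 wherever P(x) > 0, so the divergence is finite."""
    return sum(px * math.log2(px / qx) for px, qx in zip(p, q) if px > 0)

def variational_distance(p, q):
    """Variational (l1) distance d(P, Q)."""
    return sum(abs(px - qx) for px, qx in zip(p, q))
```

For example, entropy([0.5, 0.5]) equals 1.0 bit, and two distributions with disjoint supports are at variational distance 2, the maximum possible value.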
(b) Type of Sequence
The type of a sequence x ∈ X^n is defined as the distribution P_x ∈ M(X) given by

    P_x(a) = (1/n) · (number of occurrences of a ∈ X in x). (1)

We shall write P_n or P for the set of types of sequences in X^n. We denote by T^n_P or T_P the set of sequences of type P in X^n. Conversely, for a distribution P ∈ M(X), if T_P ≠ ∅ then we call P a type of sequences in X^n.
We introduce some well-known facts, cf. Csiszár and Körner [8]. For the set of types in X^n, we have

    |P_n| ≤ (n + 1)^{|X|} (2)

where | · | denotes the cardinality of a set. For the set of sequences of type P in X^n,

    (n + 1)^{−|X|} exp(nH(P)) ≤ |T_P| ≤ exp(nH(P)). (3)

If x ∈ T_P, then for any distribution Q ∈ M(X) we have

    Q^n(x) = exp[−n{D(P ‖ Q) + H(P)}]. (4)

From (3) and (4),

    (n + 1)^{−|X|} exp(−nD(P ‖ Q)) ≤ Q^n(T_P) ≤ exp(−nD(P ‖ Q)). (5)
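Identity (4) says that Q^n(x) depends on x only through its type; the following toy check verifies this numerically (a sketch; names are ours):

```python
import math
from collections import Counter

def type_of(x, alphabet):
    """Empirical distribution (type) P_x of a sequence x, as in (1)."""
    counts = Counter(x)
    return [counts[a] / len(x) for a in alphabet]

def entropy(p):
    return -sum(v * math.log2(v) for v in p if v > 0)

def divergence(p, q):
    return sum(pv * math.log2(pv / qv) for pv, qv in zip(p, q) if pv > 0)

alphabet = ["a", "b"]
Q = [0.7, 0.3]
x = ["a", "a", "b"]
P = type_of(x, alphabet)                   # P_x = (2/3, 1/3)
direct = Q[0] ** 2 * Q[1]                  # Q^n(x) computed symbol by symbol
via_type = 2 ** (-len(x) * (divergence(P, Q) + entropy(P)))  # right side of (4)
```

Both values equal 0.7² · 0.3 = 0.147 up to floating-point error.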
(c) Intrinsic Randomness
In this paper, we especially investigate the problem of generating a uniform random number of as large a size as possible from a general source X = {X^n}_{n=1}^∞. This problem is called the intrinsic randomness problem [9]. Here, we introduce basic definitions and a known result for the intrinsic randomness problem.
Definition 1: For an arbitrary source X = {X^n}_{n=1}^∞, a rate R is an achievable intrinsic randomness (IR) rate if and only if there exists a map ϕ_n : X^n → U_{M_n} such that

    liminf_{n→∞} (1/n) log M_n ≥ R,
    lim_{n→∞} d(U_{M_n}, ϕ_n(X^n)) = 0,

where U_{M_n} ≜ {1, 2, · · · , M_n} and U_{M_n} is the uniform distribution on U_{M_n}.
Definition 2 (sup achievable IR rate):

    S(X) = sup{R | R is an achievable IR rate}.
As for the characterization of the IR rate, Vembu and Verdú [2] proved the following fundamental theorem.
Theorem 1: For any stationary source X,
S(X) = H(X) (6)
where H(X) is the entropy rate of X.
III. Interval Algorithm
In this chapter, we introduce the interval algorithm for random numbergeneration, proposed by Han and Hoshi [3].
Let us consider producing an i.i.d. random sequence Y^n = (Y_1, Y_2, · · · , Y_n), where each random variable Y_i (i = 1, 2, · · · , n) is subject to a generic distribution q = (q_1, q_2, · · · , q_N). We generate this sequence by using an i.i.d. random sequence X_1, X_2, · · · with a generic distribution p = (p_1, p_2, · · · , p_M).
Interval Algorithm for Generating Random Process
1a) Partition the unit interval [0, 1) into N disjoint subintervals J(1), J(2), · · · , J(N) such that

    J(i) = [Q_{i−1}, Q_i), i = 1, 2, · · · , N,

where Q_i = ∑_{k=1}^{i} q_k (i = 1, 2, · · · , N) and Q_0 = 0.

1b) Set

    P_j = ∑_{k=1}^{j} p_k (j = 1, 2, · · · , M), P_0 = 0.
2) Set s = t = λ (null string), α_s = γ_t = 0, β_s = δ_t = 1, I(s) = [α_s, β_s), J(t) = [γ_t, δ_t), and m = 1.

3) Obtain a symbol a ∈ {1, 2, · · · , M} from the source X, and generate the subinterval of I(s)

    I(sa) = [α_{sa}, β_{sa})

where

    α_{sa} = α_s + (β_s − α_s)P_{a−1},
    β_{sa} = α_s + (β_s − α_s)P_a.
4a) If I(sa) is entirely contained in some J(ti) (i = 1, 2, · · · , N), then output i as the value of the mth random number Y_m and set t = ti. Otherwise, go to 5).

4b) If m = n, then stop the algorithm. Otherwise, partition the interval J(t) ≡ [γ_t, δ_t) into N disjoint subintervals J(t1), J(t2), · · · , J(tN) such that

    J(tj) = [γ_{tj}, δ_{tj}), j = 1, 2, · · · , N,

where

    γ_{tj} = γ_t + (δ_t − γ_t)Q_{j−1},
    δ_{tj} = γ_t + (δ_t − γ_t)Q_j,

and set m = m + 1 and go to 4a).
5) Set s = sa and go to 3).
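For concreteness, steps 1)-5) can be rendered in code. The sketch below uses our own names, 0-indexed symbols, and exact rational arithmetic so that the interval comparisons are not disturbed by rounding; it follows the algorithm literally:

```python
from fractions import Fraction

def interval_algorithm(inputs, p, q, n):
    """Generate n symbols with target distribution q from i.i.d. input
    symbols with distribution p, following steps 1)-5) of the text.
    `inputs` is an iterator of input symbols in {0, ..., M-1}.
    Returns (output sequence, number of input symbols consumed)."""
    # Steps 1a), 1b): cumulative distributions Q_i and P_j.
    P = [Fraction(0)]
    for pj in p:
        P.append(P[-1] + Fraction(pj))
    Q = [Fraction(0)]
    for qi in q:
        Q.append(Q[-1] + Fraction(qi))
    # Step 2): I(s) = [alpha, beta), J(t) = [gamma, delta).
    alpha, beta = Fraction(0), Fraction(1)
    gamma, delta = Fraction(0), Fraction(1)
    out, used = [], 0
    while len(out) < n:
        # Step 3): read one input symbol and shrink I(s) to I(sa).
        a = next(inputs)
        used += 1
        w = beta - alpha
        alpha, beta = alpha + w * P[a], alpha + w * P[a + 1]
        # Steps 4a), 4b): emit output symbols while I(s) fits in some J(ti).
        emitted = True
        while emitted and len(out) < n:
            emitted = False
            w = delta - gamma
            for i in range(len(q)):
                lo, hi = gamma + w * Q[i], gamma + w * Q[i + 1]
                if lo <= alpha and beta <= hi:
                    out.append(i)          # output i as Y_m, set t = ti
                    gamma, delta = lo, hi
                    emitted = True
                    break
    return out, used
```

With p = q = (1/2, 1/2) the algorithm passes each input bit straight through, consuming one input symbol per output symbol; generating a uniform 4-ary symbol from fair bits consumes two input bits.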
Han and Hoshi have shown that

    lim_{n→∞} E(L)/n = H(q)/H(p), (7)

where E(L) is the average length of the input sequence needed to obtain an output sequence of length n.
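As a quick sanity check of (7) (a worked instance of ours, not from the original): take the input source to be fair coin flips, p = (1/2, 1/2), and the target to be uniform four-ary symbols, q = (1/4, 1/4, 1/4, 1/4). Each output symbol carries two bits of entropy and each input symbol supplies one, so two input bits are needed per output symbol on average:

```latex
\lim_{n\to\infty} \frac{E(L)}{n} \;=\; \frac{H(q)}{H(p)}
  \;=\; \frac{\log 4}{\log 2} \;=\; 2 .
```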
IV. Almost Sure Convergence Theorem
We shall investigate the large deviations performance of the interval algorithm for random number generation.

Let us consider producing an i.i.d. random sequence Y^n = (Y_1, Y_2, · · · , Y_n), where each random variable Y_i (i = 1, 2, · · · , n) is subject to a generic distribution P_Y on Y. We generate this sequence by using an i.i.d. random sequence X_1, X_2, · · · with a generic distribution P_X on X. We denote by T_n(x, y) the length of the input sequence x necessary to generate y ∈ Y^n. Then, we obtain the following theorem:

Theorem 2:

    lim_{n→∞} (1/n) T_n(X, Y^n) = H(Y)/H(X) a.s. (8)
Before the proof of the theorem, we give a necessary definition and some lemmas for strongly typical sequences [14].

Definition 3: Let P_x be the type of a sequence x ∈ X^n. For δ > 0 and a distribution P ∈ M(X), if P_x satisfies D(P_x ‖ P) ≤ δ, then we call x ∈ X^n a P-typical sequence or strongly typical sequence. Further, when a random variable X with alphabet X has the distribution P, we also call x ∈ X^n an X-typical sequence.
We shall write T^n_δ(P) or T_δ(P) for the set of P-typical sequences in X^n, and T^n_δ(X) or T_δ(X) for the set of X-typical sequences in X^n.
Lemma 1: For every 0 < δ ≤ 1/8, if x ∈ T^n_δ(P) then

    |−(1/n) log P^n(x) − H(P)| ≤ γ′_n (9)

where

    γ′_n = δ − √(2δ) log(√(2δ)/|X|). (10)
Lemma 2: Suppose that a sequence {δ_n} satisfies lim_{n→∞} δ_n = 0 and lim_{n→∞} nδ_n = ∞. For every P ∈ M(X),

    P^n(T_{δ_n}(P)) ≥ 1 − ε_n (11)

where

    ε_n = exp{−n(δ_n − |X| log(n + 1)/n)}. (12)
Proof of Theorem 2: Suppose that a sequence {δ_n} satisfies lim_{n→∞} δ_n = 0 and lim_{n→∞} nδ_n = ∞.

Due to the nature of the interval algorithm, we can associate each y ∈ Y^n with a distinct subinterval J(y) of [0, 1) of width P_Y^n(y). On the other hand, we can also associate each x ∈ X^{nR} with a subinterval I(x) of width P_X^{nR}(x). If the subinterval I(x) is included in some J(y), then the input sequence corresponding to I(x) terminates the algorithm.

(a) Achievable part
Assume that the algorithm does not have to terminate for x ∉ T_{δ_{nR}}(X).
Then, by using Lemmas 1 and 2 and (2)-(5), we obtain

    Pr{(1/n) T_n(X, Y^n) ≥ R}
    ≤ ∑_{y∈Y^n: P_Y^n(y) ≥ min_{x∈T_{δ_{nR}}(X)} P_X^{nR}(x)} 2 max_{x∈T_{δ_{nR}}(X)} P_X^{nR}(x)
      + ∑_{y∈Y^n: P_Y^n(y) ≤ min_{x∈T_{δ_{nR}}(X)} P_X^{nR}(x)} P_Y^n(y)
      + ∑_{x∈X^{nR}: x∉T_{δ_{nR}}(X)} P_X^{nR}(x)
    ≤ ∑_{Q∈P_n: D(Q‖P_Y)+H(Q) ≤ R(H(X)+γ′_n)} 2 exp{−n(R H(X) − H(Q) − Rγ′_n)}
      + ∑_{Q∈P_n: D(Q‖P_Y)+H(Q) ≥ R(H(X)+γ′_n)} exp(−nD(Q ‖ P_Y)) + ε_n
    ≤ ∑_{Q∈P_n: D(Q‖P_Y)+H(Q) ≤ R(H(X)+γ′_n)} 2 exp{−n(R H(X) − H(Q) − Rγ′_n)}
      + ∑_{Q∈P_n: D(Q‖P_Y)+H(Q) ≥ R(H(X)+γ′_n)} exp{−n(D(Q ‖ P_Y) − 2Rγ′_n)} + ε_n
    ≤ ∑_{Q∈P_n} 2 exp[−n{D(Q ‖ P_Y) − 2Rγ′_n + |R H(X) − H(Q) − D(Q ‖ P_Y) + Rγ′_n|⁺}] + ε_n
    ≤ 2(n + 1)^{|Y|} exp[−n min_{Q∈M(Y)} {D(Q ‖ P_Y) − 2Rγ′_n + |R H(X) − H(Q) − D(Q ‖ P_Y) + Rγ′_n|⁺}] + ε_n

where |x|⁺ ≜ max{0, x}. Here, define

    E_r(R, P_X, P_Y) = min_{Q∈M(Y)} {D(Q ‖ P_Y) + |R H(X) − H(Q) − D(Q ‖ P_Y)|⁺}.

E_r(R, P_X, P_Y) = 0 if and only if Q = P_Y and R H(X) ≤ H(Q), i.e. R ≤ H(Y)/H(X). This implies that E_r(R, P_X, P_Y) > 0 if and only if R > H(Y)/H(X). From this, for every δ > 0 there exists a sufficiently large n_0 such that γ′_n < δH(X)/(H(Y)/H(X) + δ)
for all n ≥ n_0. Therefore, for all n ≥ n_0 we can see

    min_{Q∈M(Y)} {D(Q ‖ P_Y) − 2Rγ′_n + |R H(X) − H(Q) − D(Q ‖ P_Y) + Rγ′_n|⁺} |_{R = H(Y)/H(X)+δ}
    = min_{Q∈M(Y)} {D(Q ‖ P_Y) − 2(H(Y)/H(X) + δ)γ′_n
        + |H(Y) − H(Q) + δH(X) − D(Q ‖ P_Y) + (H(Y)/H(X) + δ)γ′_n|⁺}
    > 0.

Hence,

    ∑_{n=1}^{∞} Pr{(1/n) T_n(X, Y^n) − H(Y)/H(X) ≥ δ} < ∞. (13)
(b) Converse part
If n_1 is sufficiently large, we have δ_n ≤ 1/8 for all n ≥ n_1. Thus from Lemma 1, if x ∈ T_{δ_{nR}}(X) then

    exp{−nR(H(X) + γ′_{nR})} ≤ P_X^{nR}(x) ≤ exp{−nR(H(X) − γ′_{nR})}.

Let N(X^{nR}) be an integer such that

    exp{nR(H(X) + 2γ′_{nR})} ≤ N(X^{nR}) ≤ exp{nR(H(X) + 3γ′_{nR})}.

It is easy to see that P_X^{nR}(x) > 1/N(X^{nR}) for all x ∈ T_{δ_{nR}}(X). Assume that the algorithm terminates for all x ∉ T_{δ_{nR}}(X). By using Lemmas 1 and 2 and (2)-(5), we obtain
    Pr{(1/n) T_n(X, Y^n) ≤ R}
    ≤ ∑_{y∈Y^n: P_Y^n(y) ≥ 1/N(X^{nR})} P_Y^n(y) + ∑_{x∈X^{nR}: x∉T_{δ_{nR}}(X)} P_X^{nR}(x)
    ≤ ∑_{Q∈P_n: H(Q)+D(Q‖P_Y) ≤ (1/n) log N(X^{nR})} exp(−nD(Q ‖ P_Y)) + ε_n
    ≤ ∑_{Q∈P_n: H(Q)+D(Q‖P_Y) ≤ R(H(X)+3γ′_n)} exp(−nD(Q ‖ P_Y)) + ε_n
    ≤ (n + 1)^{|Y|} exp{−n min_{Q∈M(Y): H(Q)+D(Q‖P_Y) ≤ R(H(X)+3γ′_n)} D(Q ‖ P_Y)} + ε_n.
Here, define

    F(R, P_X, P_Y) = min_{Q∈M(Y): H(Q)+D(Q‖P_Y) ≤ R H(X)} D(Q ‖ P_Y).

F(R, P_X, P_Y) = 0 if and only if Q = P_Y and H(Q) ≤ R H(X), i.e. R ≥ H(Y)/H(X). This implies that F(R, P_X, P_Y) > 0 if and only if R < H(Y)/H(X). From this, for every δ > 0 there exists a sufficiently large n_2 satisfying γ′_n < δH(X)/(3(H(Y)/H(X) − δ)) for all n ≥ n_2. Therefore, for all n ≥ n_2 we can see

    min_{Q∈M(Y): H(Q)+D(Q‖P_Y) ≤ R(H(X)+3γ′_n)} D(Q ‖ P_Y) |_{R = H(Y)/H(X)−δ}
    = min_{Q∈M(Y): H(Q)+D(Q‖P_Y) ≤ H(Y) − δH(X) + 3γ′_n(H(Y)/H(X) − δ)} D(Q ‖ P_Y)
    > 0.

Hence,

    ∑_{n=1}^{∞} Pr{(1/n) T_n(X, Y^n) − H(Y)/H(X) ≤ −δ} < ∞. (14)
From (13) and (14), by the Borel-Cantelli lemma (e.g. [10]) we obtain (8). □
Conversely, let us consider generating an i.i.d. random sequence Y_1, Y_2, · · · by using an i.i.d. random sequence X^n = (X_1, X_2, · · · , X_n). We denote by L_n(X^n, Y) the length of the generated sequence. Then from Theorem 2, we immediately obtain the following corollary.

Corollary 1:

    lim_{n→∞} (1/n) L_n(X^n, Y) = H(X)/H(Y) a.s. (15)
V. Almost Sure Convergence of Number of Fair Bits per Input Sample

In the previous chapter, we showed that the length of the input sequence per output symbol converges to the ratio of the entropies of the input and output distributions almost surely.

To investigate further asymptotic properties, we consider a more restricted case. Let us consider producing a sequence of fair bits by using an i.i.d. random sequence X^n = (X_1, X_2, · · · , X_n) of length n, where each random variable X_i (i = 1, 2, · · · , n) is subject to a generic distribution P_X on X. We denote by L_n(x) the number of fair bits generated from the input sequence x ∈ X^n. Here, we define the following functions:
    E_r(R, P_X) = min_{Q∈M(X): H(Q)≤R} D(Q ‖ P_X), (16)

    E_sp(R, P_X) = min_{Q∈M(X): D(Q‖P_X)+H(Q)≤R} D(Q ‖ P_X), (17)

    F(R, P_X) = min_{Q∈M(X): D(Q‖P_X)+H(Q)≥R} D(Q ‖ P_X), (18)

    G(R, P_X) = min_{Q∈M(X): H(Q)≥R} D(Q ‖ P_X). (19)
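For a binary source these exponents can be evaluated numerically by brute force over Q (a rough sketch with our own names; `Er` implements (16), and the other three functions differ only in their constraint sets):

```python
import math

def h2(q):
    """Binary entropy H(Q) in bits for Q = (q, 1 - q)."""
    if q <= 0.0 or q >= 1.0:
        return 0.0
    return -q * math.log2(q) - (1 - q) * math.log2(1 - q)

def d2(q, p):
    """Binary divergence D(Q || PX) in bits, PX = (p, 1 - p)."""
    total = 0.0
    if q > 0:
        total += q * math.log2(q / p)
    if q < 1:
        total += (1 - q) * math.log2((1 - q) / (1 - p))
    return total

def Er(R, p, steps=100001):
    """Grid approximation of (16): minimize D(Q || PX) over H(Q) <= R."""
    best = float("inf")
    for i in range(steps):
        q = i / (steps - 1)
        if h2(q) <= R:
            best = min(best, d2(q, p))
    return best
```

Consistent with Theorem 3, Er(R, PX) vanishes once R reaches H(X) and is strictly positive below it.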
Then, we obtain the following large deviations performance of the interval algorithm:
Theorem 3: For R > 0,

    liminf_{n→∞} [−(1/n) log Pr{(1/n) L_n(X^n) ≤ R}] ≥ E_r(R, P_X). (20)

For R > R_min ≜ −max_{x∈X} log P_X(x),

    limsup_{n→∞} [−(1/n) log Pr{(1/n) L_n(X^n) ≤ R}] ≤ E_sp(R, P_X). (21)

Further, E_r(R, P_X) > 0 if and only if R < H(X), E_sp(R, P_X) > 0 if and only if R_min < R < H(X), and E_r(R, P_X) < E_sp(R, P_X) for R < H(X).
Proof: We can show this theorem in a similar manner to Theorem 2. We associate each x ∈ X^n with a distinct subinterval I(x) of [0, 1) of width P_X^n(x), and partition the unit interval [0, 1) into exp(nR) subintervals

    J_i ≜ [(i − 1) exp(−nR), i exp(−nR)), i = 1, 2, · · · , exp(nR).
First, we shall show (20). The number of input sequences for which the algorithm does not stop is at most exp(nR). Then we obtain

    Pr{(1/n) L_n(X^n) ≤ R}
    ≤ ∑_{Q∈P_n: x∈T_Q} min(|T_Q|, exp(nR)) P_X^n(x)
    ≤ ∑_{Q∈P_n} min[1, exp{−n(H(Q) − R)}] exp(−nD(Q ‖ P_X))
    = ∑_{Q∈P_n} exp[−n{D(Q ‖ P_X) + |H(Q) − R|⁺}]
    ≤ (n + 1)^{|X|} exp[−n min_{Q∈M(X)} {D(Q ‖ P_X) + |H(Q) − R|⁺}]

which implies

    liminf_{n→∞} [−(1/n) log Pr{(1/n) L_n(X^n) ≤ R}] ≥ min_{Q∈M(X)} {D(Q ‖ P_X) + |H(Q) − R|⁺}.
Note that D(Q ‖ P_X) + H(Q) − R is a linear function of Q, while H(Q) is a concave function of Q. Then,

    min_{Q∈M(X): H(Q)≥R} {D(Q ‖ P_X) + |H(Q) − R|⁺}
    = min_{Q∈M(X): H(Q)=R} {D(Q ‖ P_X) + |H(Q) − R|⁺}
    = min_{Q∈M(X): H(Q)=R} D(Q ‖ P_X).
Hence, we obtain (20).
E_r(R, P_X) = 0 if and only if Q = P_X and H(Q) ≤ R, i.e. R ≥ H(X). This implies that E_r(R, P_X) > 0 if and only if R < H(X).
Next, we show (21). We have

    Pr{(1/n) L_n(X^n) ≤ R}
    ≥ ∑_{x∈X^n: P_X^n(x) ≥ exp(−nR)} P_X^n(x)
    ≥ ∑_{Q∈P_n: H(Q)+D(Q‖P_X) ≤ R} (n + 1)^{−|X|} exp(−nD(Q ‖ P_X))
    ≥ (n + 1)^{−|X|} exp{−n min_{Q∈P_n: H(Q)+D(Q‖P_X) ≤ R} D(Q ‖ P_X)}

which implies (21) for R > R_min. It should be noted that the minimum in (17) is taken over a non-empty set of Q if R > R_min.

E_sp(R, P_X) = 0 if and only if Q = P_X and H(Q) ≤ R, i.e. R ≥ H(X). This implies that E_sp(R, P_X) > 0 if and only if R_min < R < H(X). □
Theorem 4: For 0 < R < R_max ≜ −min_{x∈X} log P_X(x),

    liminf_{n→∞} [−(1/n) log Pr{(1/n) L_n(X^n) ≥ R}] ≥ F(R, P_X). (22)

For 0 < R < log |X|,

    limsup_{n→∞} [−(1/n) log Pr{(1/n) L_n(X^n) ≥ R}] ≤ G(R, P_X). (23)

Further, F(R, P_X) > 0 if and only if H(X) < R < R_max, G(R, P_X) > 0 if and only if H(X) < R < log |X|, and F(R, P_X) < G(R, P_X) for R > H(X).
Proof: We can show this theorem in a similar manner to the proof of Theorem 3. First, we shall show (22). We have

    Pr{(1/n) L_n(X^n) ≥ R}
    ≤ ∑_{x∈X^n: P_X^n(x) ≤ exp(−nR)} P_X^n(x)
    ≤ ∑_{Q∈P_n: H(Q)+D(Q‖P_X) ≥ R} exp(−nD(Q ‖ P_X))
    ≤ (n + 1)^{|X|} exp{−nF(R, P_X)}

which implies (22) for R < R_max. It should be noted that the minimum in (18) is taken over a non-empty set of Q if R < R_max.

F(R, P_X) = 0 if and only if Q = P_X and H(Q) ≥ R, i.e. R ≤ H(X). This implies that F(R, P_X) > 0 if and only if H(X) < R < R_max.
Next, we show (23). We have

    Pr{(1/n) L_n(X^n) ≥ R}
    ≥ ∑_{Q∈P_n: x∈T_Q, |T_Q| ≥ 2 exp(nR)} (1/2)|T_Q| P_X^n(x)
    ≥ (1/2)(n + 1)^{−|X|} ∑_{Q∈P_n: |T_Q| ≥ 2 exp(nR)} exp(−nD(Q ‖ P_X))
    ≥ (1/2)(n + 1)^{−|X|} ∑_{Q∈P_n: H(Q) ≥ R + (1/n) log 2(n+1)^{|X|}} exp(−nD(Q ‖ P_X))
    ≥ (1/2)(n + 1)^{−|X|} exp{−n min_{Q∈P_n: H(Q) ≥ R + (1/n) log 2(n+1)^{|X|}} D(Q ‖ P_X)}.

By the continuity of the divergence, we can obtain (23) for R < log |X|. It should be noted that the minimum in (19) is taken over a non-empty set of Q if R < log |X|.

G(R, P_X) = 0 if and only if Q = P_X and H(Q) ≥ R, i.e. R ≤ H(X). This implies that G(R, P_X) > 0 if and only if H(X) < R < log |X|. □
Remark 1: Let us consider producing a specified number of fair bits by using a sequence from the source X. We denote by T_n(X) the length of the input sequence needed to obtain n fair bits. Then, we obtain relations similar to (20)-(23). For example, corresponding to (20), we have

    liminf_{n→∞} [−(1/n) log Pr{(1/n) T_n(X) ≥ R}] ≥ E_r(R, P_X) (24)

where

    E_r(R, P_X) = min_{Q∈M(X): H(Q)≤1/R} R D(Q ‖ P_X). (25)
VI. Error Exponent for Intrinsic Randomness
In this chapter, let us consider producing a fixed number of random bits from an input sequence of length n. In this case, we cannot generate fair bits exactly, but only approximately.

First, we modify the interval algorithm for generating a random process so that the algorithm outputs a specified sequence 11 · · · 1 ∈ Y^{nR} whenever it does not stop within an input sequence of length n, where Y = {0, 1}. The modified algorithm is described below.
Modified Interval Algorithm for Generating Fair Bits with Fixed Input Length
1a) Partition the unit interval [0, 1) into disjoint subintervals J(0), J(1) such that

    J(i) = [i/2, (i + 1)/2), i = 0, 1.

1b) Set

    P_j = ∑_{k=1}^{j} p_k (j = 1, 2, · · · , M), P_0 = 0.
2) Set s = t = λ (null string), α_s = γ_t = 0, β_s = δ_t = 1, I(s) = [α_s, β_s), J(t) = [γ_t, δ_t), l = 0, and m = 1.

3) If l = n, then output 11 · · · 1 as the output sequence Y^{nR} and stop the algorithm. Otherwise, obtain an input symbol a ∈ {1, 2, · · · , M} from the source X, and generate the subinterval of I(s)

    I(sa) = [α_{sa}, β_{sa})

where

    α_{sa} = α_s + (β_s − α_s)P_{a−1},
    β_{sa} = α_s + (β_s − α_s)P_a,

and set l = l + 1.
4a) If I(sa) is entirely contained in some J(ti) (i = 0, 1), then set t = ti. Otherwise, go to 5).

4b) If m = nR, then output t as the output sequence Y^{nR} and stop the algorithm. Otherwise, partition the interval J(t) ≡ [γ_t, δ_t) into disjoint subintervals J(t0), J(t1) such that

    J(tj) = [γ_{tj}, δ_{tj}), j = 0, 1,

where

    γ_{tj} = γ_t + (j/2)(δ_t − γ_t),
    δ_{tj} = γ_t + ((j + 1)/2)(δ_t − γ_t),

and set m = m + 1 and go to 4a).
5) Set s = sa and go to 3).
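The modified algorithm can likewise be sketched in code (our own names; bits are emitted by repeated halving of J(t), matching steps 1a) and 4b), and the all-ones fallback of step 3) applies when the n input symbols are exhausted; input symbols are 0-indexed):

```python
from fractions import Fraction

def modified_interval_algorithm(x, p, n_bits):
    """Consume the fixed input sequence x (i.i.d. symbols in {0,...,M-1}
    with distribution p) and try to emit n_bits fair bits; if the input
    is exhausted first, output the specified all-ones sequence."""
    P = [Fraction(0)]
    for pj in p:
        P.append(P[-1] + Fraction(pj))
    alpha, beta = Fraction(0), Fraction(1)    # I(s)
    gamma, delta = Fraction(0), Fraction(1)   # J(t)
    out = []
    for a in x:                               # step 3): one input symbol per pass
        w = beta - alpha
        alpha, beta = alpha + w * P[a], alpha + w * P[a + 1]
        while len(out) < n_bits:              # steps 4a), 4b): halve J(t)
            mid = (gamma + delta) / 2
            if beta <= mid:                   # I(s) inside the left half J(t0)
                out.append(0)
                delta = mid
            elif alpha >= mid:                # I(s) inside the right half J(t1)
                out.append(1)
                gamma = mid
            else:
                break
        if len(out) == n_bits:
            return out
    return [1] * n_bits                       # input exhausted: output 11...1
```

For example, with a fair-coin input the bits are passed through unchanged, while a too-short input yields the all-ones fallback.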
(a) Approximation Error by Variational Distance
We first measure the approximation error by the variational distance between the desired and the approximated output distributions. We obtain the following theorems:
Theorem 5: If the modified interval algorithm is used, then we have

    liminf_{n→∞} [−(1/n) log d(U_{exp(nR)}, P_Y^{nR})] ≥ E_r(R, P_X), (26)

where U_{exp(nR)} is the uniform distribution on U_{exp(nR)} = {1, 2, · · · , exp(nR)}, P_Y^{nR} denotes the output distribution of the modified interval algorithm, and E_r(R, P_X) is given by (16). Further, for R > R_min,

    limsup_{n→∞} [−(1/n) log d(U_{exp(nR)}, P_Y^{nR})] ≤ E_sp(R, P_X), (27)

where E_sp(R, P_X) is given by (17).
Proof: First, we shall show (26). The number of input sequences that lead to the specified sequence 11 · · · 1 ∈ Y^{nR} is at most exp(nR). Then, we obtain

    d(U_{exp(nR)}, P_Y^{nR})
    = ∑_{y∈Y^{nR}} |exp(−nR) − P_Y^{nR}(y)|
    = ∑_{y∈Y^{nR}: y≠11···1} |exp(−nR) − P_Y^{nR}(y)| + |∑_{y∈Y^{nR}: y≠11···1} (exp(−nR) − P_Y^{nR}(y))|
    = 2 ∑_{y∈Y^{nR}: y≠11···1} |exp(−nR) − P_Y^{nR}(y)|
    ≤ 2 ∑_{Q∈P_n: x∈T_Q} min(|T_Q|, exp(nR)) P_X^n(x)
    ≤ 2(n + 1)^{|X|} exp[−n min_{Q∈M(X)} {D(Q ‖ P_X) + |H(Q) − R|⁺}]

which implies (26).
Next, we show (27). In a similar manner to the proof of (26), we obtain

    d(U_{exp(nR)}, P_Y^{nR})
    = 2 ∑_{y∈Y^{nR}: y≠11···1} |exp(−nR) − P_Y^{nR}(y)|
    ≥ 2 ∑_{x∈X^n: P_X^n(x) ≥ exp(−nR)} P_X^n(x)
    ≥ 2(n + 1)^{−|X|} exp{−n min_{Q∈P_n: D(Q‖P_X)+H(Q) ≤ R} D(Q ‖ P_X)}

which implies (27) for R > R_min. □
This theorem implies that if the number of output bits per input sample is below the entropy of the source, the approximation error measured by the variational distance vanishes exponentially as the length of the input sequence tends to infinity, when the modified interval algorithm is used.
The next theorem gives an upper bound on the error exponent.
Theorem 6: Let P_Y^{nR} denote a distribution on U_{exp(nR)} produced by any algorithm for random number generation with fixed input length n. Then for R > R_min,

    limsup_{n→∞} [−(1/n) log d(U_{exp(nR)}, P_Y^{nR})] ≤ E_sp(R, P_X), (28)

where E_sp(R, P_X) is given by (17).
Proof: It should be noted that |exp(−nR) − P_Y^{nR}(y)| ≥ (1/2)P_Y^{nR}(y) if P_Y^{nR}(y) ≥ 2 exp(−nR). Then, we have

    d(U_{exp(nR)}, P_Y^{nR})
    = ∑_{y∈Y^{nR}} |exp(−nR) − P_Y^{nR}(y)|
    ≥ ∑_{x∈X^n: P_X^n(x) ≥ 2 exp(−nR)} (1/2) P_X^n(x)
    ≥ (1/2)(n + 1)^{−|X|} ∑_{Q∈P_n: D(Q‖P_X)+H(Q) ≤ R − 1/n} exp(−nD(Q ‖ P_X))
    ≥ (1/2)(n + 1)^{−|X|} exp{−n min_{Q∈P_n: D(Q‖P_X)+H(Q) ≤ R − 1/n} D(Q ‖ P_X)}.

From the continuity of the divergence, we can obtain (28). □
Note that E_r(R, P_X) < E_sp(R, P_X) for R < H(X). Hence, it is still an open problem to obtain the exact error exponent of the proposed algorithm.
The next theorem shows the converse result.
Theorem 7: If the modified interval algorithm is used, then for R < R_max,

    liminf_{n→∞} [−(1/n) log{2 − d(U_{exp(nR)}, P_Y^{nR})}] ≥ F(R, P_X), (29)

where F(R, P_X) is given by (18). Further, for R < log |X|,

    limsup_{n→∞} [−(1/n) log{2 − d(U_{exp(nR)}, P_Y^{nR})}] ≤ G(R, P_X), (30)

where G(R, P_X) is given by (19).
Proof: First, we shall show (29). From the identity a + b = |a − b| + 2 min(a, b), we obtain

    2 − d(U_{exp(nR)}, P_Y^{nR})
    = 2 − ∑_{y∈Y^{nR}} |exp(−nR) − P_Y^{nR}(y)|
    = 2 ∑_{y∈Y^{nR}} min(exp(−nR), P_Y^{nR}(y))
    ≤ 2 ∑_{y∈Y^{nR}: y≠11···1} P_Y^{nR}(y) + 2 exp(−nR)
    ≤ 2 ∑_{x∈X^n: P_X^n(x) ≤ exp(−nR)} P_X^n(x) + 2 exp(−nR)
    ≤ 2(n + 1)^{|X|} exp{−n min_{Q∈M(X): D(Q‖P_X)+H(Q) ≥ R} D(Q ‖ P_X)} + 2 exp(−nR).

Now, F(H(X), P_X) = 0, F(R, P_X) is monotonically increasing for R ≥ H(X), and

    D(Q ‖ P_X) = ∑_{x∈X} Q(x) log(Q(x)/P_X(x))
    ≤ −∑_{x∈X} Q(x) log P_X(x)
    ≤ −∑_{x∈X} Q(x) log min_{x̂∈X} P_X(x̂)
    = −log min_{x̂∈X} P_X(x̂)
    = R_max,

so F(R, P_X) < R for R < R_max. Hence, from the convexity of the divergence we can obtain (29) for R < R_max.
Next, we show (30). In a similar manner to the proof of (29), we have

    2 − d(U_{exp(nR)}, P_Y^{nR})
    = 2 ∑_{y∈Y^{nR}} min(exp(−nR), P_Y^{nR}(y))
    ≥ 2 ∑_{Q∈P_n: x∈T_Q, |T_Q| ≥ 2 exp(nR)} (1/2)|T_Q| P_X^n(x)
    ≥ (n + 1)^{−|X|} exp{−n min_{Q∈P_n: H(Q) ≥ R + (1/n) log 2(n+1)^{|X|}} D(Q ‖ P_X)}

which implies (30) for R < log |X|. □
This theorem implies that if the number of output bits per input sample is above the entropy of the source, the approximation error measured by the variational distance approaches two exponentially as the length of the input sequence tends to infinity.
The next theorem is due to Ohama [5].
Theorem 8 [5]: Consider the optimum algorithm for random number generation with fixed input length n, and let P_Y^{nR} denote the distribution on Y^{nR} which minimizes the variational distance. Then, we have

    lim_{n→∞} [−(1/n) log{2 − d(U_{exp(nR)}, P_Y^{nR})}] = F′(R, P_X) (31)

where

    F′(R, P_X) = min_{Q∈M(X)} {D(Q ‖ P_X) + |R − H(Q) − D(Q ‖ P_X)|⁺}. (32)

Further, F′(R, P_X) ≤ F(R, P_X), and equality holds for R ≤ R_0, where

    R_0 ≜ D(U_{|X|} ‖ P_X) + log |X|. (33)
Proof: (a) Converse part
For each x ∈ X^n such that P_X^n(x) ≥ exp(−nR), we assign x to a distinct y ∈ Y^{nR} one by one. We denote the set of these y ∈ Y^{nR} by A. Also, for x ∈ X^n such that P_X^n(x) ≤ exp(−nR), we assign as many x as possible to each y ∈ A^c, on the condition that the sum of the probabilities of the assigned x does not exceed the probability of y. We denote the set of these y ∈ A^c by B_1. If some x are assigned to no y, we assign each of these x to a suitable y ∈ B_1 one by one. We denote the set of these y ∈ B_1 by B_2.
Then, we have

    2 − d(U_{exp(nR)}, P_Y^{nR})
    = 2 ∑_{y∈Y^{nR}} min(exp(−nR), P_Y^{nR}(y))
    = 2 ∑_{y∈A} exp(−nR) + 2 ∑_{y∈B_1∩B_2^c} P_Y^{nR}(y) + 2 ∑_{y∈B_2} exp(−nR)
    ≤ 2 ∑_{y∈A} exp(−nR) + 2 ∑_{y∈B_1} P_Y^{nR}(y)
    = 2 ∑_{x∈X^n: P_X^n(x) ≥ exp(−nR)} exp(−nR) + 2 ∑_{x∈X^n: P_X^n(x) ≤ exp(−nR)} P_X^n(x)
    ≤ 2 ∑_{Q∈P_n: D(Q‖P_X)+H(Q) ≤ R} exp{−n(R − H(Q))} + 2 ∑_{Q∈P_n: D(Q‖P_X)+H(Q) ≥ R} exp(−nD(Q ‖ P_X))
    = 2 ∑_{Q∈P_n} exp[−n{D(Q ‖ P_X) + |R − H(Q) − D(Q ‖ P_X)|⁺}]
    ≤ 2(n + 1)^{|X|} exp[−n min_{Q∈M(X)} {D(Q ‖ P_X) + |R − H(Q) − D(Q ‖ P_X)|⁺}]

which implies

    liminf_{n→∞} [−(1/n) log{2 − d(U_{exp(nR)}, P_Y^{nR})}] ≥ F′(R, P_X).
(b) Achievable part
(b-i) Suppose that R ≥ R_0. We assign each x ∈ X^n such that P_X^n(x) ≥ exp(−nR) to a distinct y ∈ Y^{nR} one by one, and arbitrarily for the other x ∈ X^n. Then, we have

    2 − d(U_{exp(nR)}, P_Y^{nR})
    = 2 ∑_{y∈Y^{nR}} min(exp(−nR), P_Y^{nR}(y))
    ≥ 2 ∑_{x∈X^n: P_X^n(x) ≥ exp(−nR)} exp(−nR)
    ≥ 2(n + 1)^{−|X|} ∑_{Q∈P_n: D(Q‖P_X)+H(Q) ≤ R} exp{−n(R − H(Q))}
    ≥ 2(n + 1)^{−|X|} exp{−n min_{Q∈P_n: D(Q‖P_X)+H(Q) ≤ R} (R − H(Q))}.
Since R ≥ R_0, from the convexity of R − H(Q) we have

    min_{Q∈M(X): D(Q‖P_X)+H(Q) ≤ R} (R − H(Q)) = R − log |X|.

Therefore, we obtain

    min_{Q∈M(X)} {D(Q ‖ P_X) + |R − H(Q) − D(Q ‖ P_X)|⁺}
    = min[ min_{Q∈M(X): D(Q‖P_X)+H(Q) ≤ R} (R − H(Q)), min_{Q∈M(X): D(Q‖P_X)+H(Q) ≥ R} D(Q ‖ P_X) ]
    = min[ R − log |X|, min_{Q∈M(X): D(Q‖P_X)+H(Q) ≥ R} D(Q ‖ P_X) ].

Here, let

    Q* = arg min_{Q∈M(X): D(Q‖P_X)+H(Q) ≥ R} D(Q ‖ P_X).

Then, we have

    D(Q* ‖ P_X) − (R − log |X|) ≥ R − H(Q*) − (R − log |X|) = log |X| − H(Q*) ≥ 0,

which implies that

    min_{Q∈M(X)} {D(Q ‖ P_X) + |R − H(Q) − D(Q ‖ P_X)|⁺} = R − log |X|

for R ≥ R_0. Hence, we have

    limsup_{n→∞} [−(1/n) log{2 − d(U_{exp(nR)}, P_Y^{nR})}] ≤ F′(R, P_X)

for R ≥ R_0.
(b-ii) Suppose that R < R_0. We can select a type Q′ such that D(Q′ ‖ P_X) + H(Q′) ≥ R and Q′ minimizes D(Q′ ‖ P_X) > 0. Then, we assign as many x ∈ T_{Q′} as possible to each y ∈ Y^{nR}, on the condition that the sum of the probabilities of the assigned x does not exceed the probability of y, and arbitrarily for x ∉ T_{Q′}. In this case, the number of x corresponding to each y is

    k = ⌊exp(−nR) / exp[−n{D(Q′ ‖ P_X) + H(Q′)}]⌋.

Thus, for a sufficiently large n_0 and all n ≥ n_0, the number of y to be assigned is upper bounded as follows:

    ⌈exp(nH(Q′))/k⌉ ≤ exp(nH(Q′))/k + 1
    ≤ exp(nH(Q′)) / (exp[−n{R − H(Q′) − D(Q′ ‖ P_X)}] − 1) + 1
    ≤ 2 exp(nH(Q′)) exp[n{R − H(Q′) − D(Q′ ‖ P_X)}] + 1
    = 2 exp[n{R − D(Q′ ‖ P_X)}] + 1
    ≤ exp(nR).
Therefore, we have

    2 − d(U_{exp(nR)}, P_Y^{nR})
    = 2 ∑_{y∈Y^{nR}} min(exp(−nR), P_Y^{nR}(y))
    ≥ 2 ∑_{x∈X^n: x∈T_{Q′}} P_X^n(x)
    ≥ 2(n + 1)^{−|X|} exp{−n min_{Q∈P_n: D(Q‖P_X)+H(Q) ≥ R} D(Q ‖ P_X)}.

Note that R − H(Q) is a convex function of Q, while D(Q ‖ P_X) + H(Q) is a linear function of Q. Thus, for R < R_0, min_{Q∈M(X): D(Q‖P_X)+H(Q) ≤ R} (R − H(Q)) is attained on the boundary, that is,

    min_{Q∈M(X): D(Q‖P_X)+H(Q) ≤ R} (R − H(Q))
    = min_{Q∈M(X): D(Q‖P_X)+H(Q) = R} (R − H(Q))
    = min_{Q∈M(X): D(Q‖P_X)+H(Q) = R} D(Q ‖ P_X).
This implies that for R < R_0,

    min_{Q∈M(X)} {D(Q ‖ P_X) + |R − H(Q) − D(Q ‖ P_X)|⁺} = min_{Q∈M(X): D(Q‖P_X)+H(Q) ≥ R} D(Q ‖ P_X).

Hence, we have

    limsup_{n→∞} [−(1/n) log{2 − d(U_{exp(nR)}, P_Y^{nR})}] ≤ F′(R, P_X)

for R < R_0.

From (a), (b-i) and (b-ii), we obtain (31).

F′(R, P_X) = 0 if and only if Q = P_X and R ≤ H(Q), i.e. R ≤ H(X). This implies that F′(R, P_X) > 0 if and only if R > H(X). □

Theorems 7 and 8 imply that the modified interval algorithm is not optimal if R ≥ R_0.
(b) Approximation Error by Divergence
Next, we consider measuring the approximation error by the divergence between the desired and approximated output distributions. First, we show the following lemma.
Lemma 3: Let P^n, Q^n be arbitrary distributions on X^n. If

    d(P^n, Q^n) ≤ ε,

then

    D(P^n ‖ Q^n) ≤ −ε log P^n_min Q^n_min,

where P^n_min (resp. Q^n_min) is the minimum of P^n (resp. Q^n) on X^n.
Proof: Using the inequality

    −∑_{x∈X^n} Q^n(x) log Q^n(x) ≤ −∑_{x∈X^n} Q^n(x) log P^n(x), (34)

we have

    ∑_{x∈X^n} (P^n(x) log P^n(x) − Q^n(x) log Q^n(x))
    ≤ ∑_{x∈X^n} (P^n(x) − Q^n(x)) log P^n(x)
    ≤ −log P^n_min ∑_{x∈X^n} |P^n(x) − Q^n(x)|
    = −d(P^n, Q^n) log P^n_min
    ≤ −ε log P^n_min.
Hence, we obtain

    D(P^n ‖ Q^n) = ∑_{x∈X^n} P^n(x) log(P^n(x)/Q^n(x))
    = ∑_{x∈X^n} P^n(x) log P^n(x) − ∑_{x∈X^n} P^n(x) log Q^n(x)
    ≤ ∑_{x∈X^n} Q^n(x) log Q^n(x) − ε log P^n_min − ∑_{x∈X^n} P^n(x) log Q^n(x)
    = ∑_{x∈X^n} (Q^n(x) − P^n(x)) log Q^n(x) − ε log P^n_min
    ≤ −log Q^n_min ∑_{x∈X^n} |P^n(x) − Q^n(x)| − ε log P^n_min
    ≤ −ε log P^n_min Q^n_min. □
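Lemma 3 is easy to probe numerically; the following sketch (our names, an arbitrary toy pair of distributions) checks the bound on a single-letter alphabet, i.e. X^n with n = 1:

```python
import math

def divergence(p, q):
    """D(P || Q) in bits."""
    return sum(pv * math.log2(pv / qv) for pv, qv in zip(p, q) if pv > 0)

def vdist(p, q):
    """Variational distance d(P, Q)."""
    return sum(abs(a - b) for a, b in zip(p, q))

P = [0.5, 0.3, 0.2]
Q = [0.4, 0.4, 0.2]
eps = vdist(P, Q)                          # here d(P, Q) = eps exactly
bound = -eps * math.log2(min(P) * min(Q))  # -eps * log(Pmin * Qmin)
```

For this pair, D(P ‖ Q) ≈ 0.036 bits while the bound evaluates to about 0.93 bits, so the inequality holds with room to spare.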
From Theorem 5 and Lemma 3, we immediately obtain the following corollary.

Corollary 2: If the modified interval algorithm is used, then we have

    liminf_{n→∞} [−(1/n) log D(U_{exp(nR)} ‖ P_Y^{nR})] ≥ E_r(R, P_X), (35)

where E_r(R, P_X) is given by (16).
This corollary implies that if the number of output bits per input sample is below the entropy of the source, the approximation error measured by the divergence also vanishes exponentially as the length of the input sequence tends to infinity.
Remark 2: Han [9] showed that there exists an algorithm for random number generation whose normalized divergence vanishes. However, as shown in Corollary 2, for a DMS (more generally, for finite-state unifilar sources), even the unnormalized divergence vanishes as the length of the input sequence tends to infinity.
Next, we show the following lemmas.
Lemma 4: Let P^n, Q^n be arbitrary distributions on X^n. If

    2 − d(P^n, Q^n) ≤ ε,

then

    D(P^n ‖ Q^n) + D(Q^n ‖ P^n) ≥ −log ε.
Proof: Using the log-sum inequality [8] and (34), we obtain

    D(P^n ‖ Q^n) + D(Q^n ‖ P^n)
    = ∑_{x∈X^n} P^n(x) log(P^n(x)/Q^n(x)) + ∑_{x∈X^n} Q^n(x) log(Q^n(x)/P^n(x))
    = ∑_{x∈X^n} (P^n(x) log P^n(x) + Q^n(x) log Q^n(x)) − ∑_{x∈X^n} (P^n(x) log Q^n(x) + Q^n(x) log P^n(x))
    ≥ ∑_{x∈X^n} (P^n(x) + Q^n(x)) |log P^n(x) − log Q^n(x)|
    = ∑_{x∈X^n} (P^n(x) + Q^n(x)) |log(P^n(x)/Q^n(x))|
    = ∑_{x∈X^n} (P^n(x) + Q^n(x)) log(max(P^n(x), Q^n(x))/min(P^n(x), Q^n(x)))
    ≥ ∑_{x∈X^n} P^n(x) log(P^n(x)/min(P^n(x), Q^n(x)))
    ≥ log(1/∑_{x∈X^n} min(P^n(x), Q^n(x)))
    ≥ −log ε. □
Lemma 5: Let P^n, Q^n be arbitrary distributions on X^n. If

    2 − d(P^n, Q^n) ≥ ε,

then

    D(P^n ‖ Q^n) ≤ −(2 − ε) log P^n_min Q^n_min.

Proof: Note that d(P^n, Q^n) ≤ 2 − ε. Then, in a similar manner to the proof of Lemma 3, we obtain
    D(P^n ‖ Q^n)
    = ∑_{x∈X^n} P^n(x) log(P^n(x)/Q^n(x))
    = ∑_{x∈X^n} P^n(x) log P^n(x) − ∑_{x∈X^n} P^n(x) log Q^n(x)
    ≤ ∑_{x∈X^n} Q^n(x) log Q^n(x) − (2 − ε) log P^n_min − ∑_{x∈X^n} P^n(x) log Q^n(x)
    = ∑_{x∈X^n} (Q^n(x) − P^n(x)) log Q^n(x) − (2 − ε) log P^n_min
    ≤ −log Q^n_min ∑_{x∈X^n} |P^n(x) − Q^n(x)| − (2 − ε) log P^n_min
    ≤ −(2 − ε) log P^n_min Q^n_min. □
From these lemmas and Theorem 7, we immediately obtain the following corollary.

Corollary 3: If the modified interval algorithm is used, then for R < R_max,

    liminf_{n→∞} (1/n) {D(U_{exp(nR)} ‖ P_Y^{nR}) + D(P_Y^{nR} ‖ U_{exp(nR)})} ≥ F(R, P_X), (36)

where F(R, P_X) is given by (18). Further, for R < log |X|,

    limsup_{n→∞} (1/n) D(U_{exp(nR)} ‖ P_Y^{nR}) ≤ R(1 − log min_{x∈X} P_X(x)). (37)
This corollary implies that if the number of output bits per input sample is above the entropy of the source, the approximation error measured by the divergence grows linearly to infinity as the length of the input sequence tends to infinity.
Remark 3: We can easily extend the results of Chapters IV, V and VI to stationary ergodic Markov sources by using Markov types [11]. Further, we can extend these results to finite-state unifilar Markov sources by using the definition of type in [12, 13].
VII. Conclusion
We have investigated the large deviations performance of the interval algorithm for random number generation. We have shown almost-sure convergence results for i.i.d. random sequences, and have clarified some asymptotic properties when the target random number is subject to a uniform distribution. As future research, we plan to generalize our results to more complex sources.
References
[1] D. Knuth and A. Yao, "The complexity of nonuniform random number generation," Algorithms and Complexity: New Directions and Recent Results, pp. 357-428, ed. J. F. Traub, Academic Press, New York, 1976.
[2] S. Vembu and S. Verdú, "Generating random bits from an arbitrary source: Fundamental limits," IEEE Trans. on Inform. Theory, vol. IT-41, pp. 1322-1332, Sep. 1995.
[3] T. S. Han and M. Hoshi, "Interval algorithm for random number generation," IEEE Trans. on Inform. Theory, vol. 43, pp. 599-611, Mar. 1997.
[4] F. Kanaya, "An asymptotically optimal algorithm for generating Markov random sequences," Proc. of SITA'97, pp. 77-80, Matsuyama, Japan, Dec. 1997 (in Japanese).
[5] Y. Ohama, "Fixed to fixed length random number generation using one dimensional piecewise linear maps," Proc. of SITA'98, pp. 57-60, Gifu, Japan, Dec. 1998 (in Japanese).
[6] T. Uyematsu and F. Kanaya, "Methods of channel simulation achieving conditional resolvability by statistically stable transformation," submitted to IEEE Trans. on Inform. Theory.
[7] O. Uchida and T. S. Han, "Performance analysis of interval algorithm for generating Markov processes," Proc. of SITA'98, pp. 65-68, Gifu, Japan, Dec. 1998.
[8] I. Csiszár and J. Körner, Information Theory: Coding Theorems for Discrete Memoryless Systems. New York: Academic, 1981.
[9] T. S. Han, Information-Spectrum Methods in Information Theory, Baifukan, Tokyo, 1998 (in Japanese).
[10] P. C. Shields, The Ergodic Theory of Discrete Sample Paths, Graduate Studies in Math., vol. 13, American Math. Soc., 1996.
[11] L. D. Davisson, G. Longo and A. Sgarro, "Error exponent for the noiseless encoding of finite ergodic Markov sources," IEEE Trans. on Inform. Theory, vol. IT-27, pp. 431-438, Jul. 1981.
[12] N. Merhav, "Universal coding with minimum probability of codeword length overflow," IEEE Trans. on Inform. Theory, vol. 37, pp. 556-563, May 1991.
[13] N. Merhav and D. L. Neuhoff, "Variable-to-fixed length codes provide better large deviations performance than fixed-to-variable length codes," IEEE Trans. on Inform. Theory, vol. 38, pp. 135-140, Jan. 1992.
[14] T. Uyematsu, Today's Shannon Theory, Baifukan, Tokyo, 1998 (in Japanese).