
5088 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 59, NO. 8, AUGUST 2013

Polar Write Once Memory Codes

David Burshtein, Senior Member, IEEE, and Alona Strugatski

Abstract—A coding scheme for write once memory (WOM) using polar codes is presented. It is shown that the scheme achieves the capacity region of noiseless WOMs when an arbitrary number of multiple writes is permitted. The encoding and decoding complexities scale as $O(N \log N)$, where $N$ is the blocklength. For $N$ sufficiently large, the error probability decreases subexponentially in $N$. The results can be generalized from binary to generalized WOMs, described by an arbitrary directed acyclic graph, using nonbinary polar codes. In the derivation, we also obtain results on the typical distortion of polar codes for lossy source coding. Some simulation results with finite length codes are presented.

Index Terms—Polar codes, write once memory codes (WOMs).

I. INTRODUCTION

THE model of a write once memory (WOM) was proposed by Rivest and Shamir [1]. In WOMs, writing may be irreversible in the sense that once a memory cell is in some state, it cannot easily convert to a preceding state. Flash memory is an important example since the charge level of each memory cell can only increase, and it is not possible to erase a single memory cell. It is possible to erase together a complete block of cells which comprises a large number of cells, but this is a costly operation and it reduces the life cycle of the device.

Consider a binary WOM which is comprised of $N$ memory cells. Suppose that we write on the device $t$ times and denote the number of possible messages in the $l$th write by $M_l = 2^{N R_l}$. The number of bits that are written in the $l$th write is $\log M_l = N R_l$ and the corresponding code rate is $R_l$. Let $\mathbf{s}_l$ denote the $N$-dimensional state vector of the WOM at time (generation) $l$ for $l = 0, 1, \ldots, t$, and suppose that $\mathbf{s}_0 = \mathbf{0}$. For $l = 1, \ldots, t$, the binary message vector is $\mathbf{a}_l$ ($N R_l$ bits). Given $\mathbf{a}_l$ and the memory state $\mathbf{s}_{l-1}$, the encoder computes $\mathbf{s}_l$ using an encoding function $\mathbf{s}_l = E_l(\mathbf{a}_l, \mathbf{s}_{l-1})$ and writes the result on the WOM. The WOM constraints can be expressed by $\mathbf{s}_l \ge \mathbf{s}_{l-1}$, where the vector inequality applies componentwise. Since the WOM is binary, $\mathbf{s}_l$ and $\mathbf{s}_{l-1}$ are binary vectors, so that if $s_{l-1,j} = 1$ for some component $j$, then $s_{l,j} = 1$. The decoder uses a decoding function $\hat{\mathbf{a}}_l = D_l(\mathbf{s}_l)$ to compute the decoded message $\hat{\mathbf{a}}_l$. The goal is to design a low complexity read-write scheme that satisfies the WOM constraints and achieves

Manuscript received September 22, 2012; revised March 17, 2013; accepted March 21, 2013. Date of publication April 01, 2013; date of current version July 10, 2013. This work was supported by the Israel Science Foundation under Grant 772/09. This paper was presented in part at the 2012 IEEE International Symposium on Information Theory.
The authors are with the School of Electrical Engineering, Tel-Aviv University, Tel-Aviv 69978, Israel (e-mail: [email protected]; [email protected]).
Communicated by E. Arıkan, Associate Editor for Coding Theory.
Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/TIT.2013.2255732

$\hat{\mathbf{a}}_l = \mathbf{a}_l$ for $l = 1, 2, \ldots, t$ with high probability for any set of messages $\mathbf{a}_1, \mathbf{a}_2, \ldots, \mathbf{a}_t$. As is commonly done in the literature, we assume that the generation number on each write and read is known (see, e.g., [2] where it is explained why this assumption does not affect the WOM rate).

The capacity region of the WOM is [3]

$$\mathcal{C}_t = \Bigl\{ (R_1, \ldots, R_t) :\ R_1 \le h(\epsilon_1),\ R_l \le \Bigl( \prod_{j=1}^{l-1} (1 - \epsilon_j) \Bigr) h(\epsilon_l) \text{ for } 2 \le l \le t-1,\ R_t \le \prod_{j=1}^{t-1} (1 - \epsilon_j),\ 0 \le \epsilon_1, \ldots, \epsilon_{t-1} \le 1/2 \Bigr\} \quad (1)$$

($\boldsymbol{\epsilon}$ denotes a $(t-1)$-dimensional vector with positive elements $\epsilon_1, \ldots, \epsilon_{t-1}$; $h(x) = -x \log x - (1-x) \log (1-x)$ is the binary entropy function). Note that this is both the zero-error capacity region and the $\epsilon$-error capacity region¹ (see the comment after the statement of Theorem 4 in [3]). We also define the maximum average rate,

$$R_{\mathrm{av}}(t) = \max_{(R_1, \ldots, R_t) \in \mathcal{C}_t} \frac{1}{t} \sum_{l=1}^{t} R_l .$$

The maximum average rate was shown to be [3] $R_{\mathrm{av}}(t) = \log(t+1)/t$. This means that the total number of bits that can be stored on $N$ WOM cells in $t$ writes is $N \log(t+1)$, which is significantly higher than $N$. The maximum fixed rate, defined by $R_{\mathrm{f}}(t) = \max \{ R : (R, R, \ldots, R) \in \mathcal{C}_t \}$, was also obtained [3]. WOM codes were proposed in the past by various authors, e.g., [1], [2], [4]–[9], and references therein. For the case where there are two writes, $t = 2$, the method in [9] can approach capacity with computational complexity which is polynomial in the blocklength. To the best of our knowledge, this was the first solution with this property.

In this study, which is an expanded version of [10], we propose a new family of WOM codes based on polar codes [11]. The method relies on the fact that polar codes are asymptotically optimal for lossy source coding [12] and can be encoded and decoded efficiently ($O(N \log N)$ operations where $N$ is the blocklength). We show that our method can achieve any point in the capacity region of noiseless WOMs when an arbitrary number of multiple writes is permitted. The encoding

¹The vector of rates $(R_1, \ldots, R_t)$ is in the zero-error ($\epsilon$-error, respectively) capacity region if the corresponding rates are achievable with zero error (with a vanishing, with the blocklength, error, respectively).

0018-9448/$31.00 © 2013 IEEE


and decoding complexities scale as $O(N \log N)$. For $N$ sufficiently large, the error probability is at most $2^{-N^\beta}$ for any $0 < \beta < 1/2$. We demonstrate that this method can be used to construct actual practical WOM codes. We also show that our results can be applied to generalized WOMs, described by an arbitrary directed acyclic graph (DAG), using nonbinary polar codes. In the derivation, we also obtain results on the typical distortion of polar codes for lossy source coding.

Recently, another WOM code was proposed [13], which can approach any point in the capacity region of noiseless WOMs in computational complexity that scales polynomially with the blocklength. On one hand, the method in [13] is deterministic and guarantees zero error, while our method is probabilistic and only guarantees a vanishing (with the blocklength) error probability. On the other hand, as discussed in [13] (see also Section VI), the method in [13] requires excessively long blocklength sizes, while our method can be used to construct actual WOM codes with reasonable blocklength sizes. Even though there is some small error probability (that approaches zero as the blocklength increases), in an actual WOM (e.g., flash memory), there is also some channel noise. Hence, there is always some small inevitable error.

The rest of this paper is organized as follows. In Section II, we provide some background on polar codes for channel and lossy source coding. In Section III, we provide extended results on polar codes for lossy source coding that will be required later. In Section IV, we present the new proposed polar WOM code for the binary case and analyze its performance. In Section V, we present a generalization of our solution to generalized WOMs, described by an arbitrary DAG, using nonbinary polar codes. In Section VI, we present some simulation results. Finally, Section VII concludes this paper.

II. BACKGROUND ON POLAR CODES

In his seminal work [11], Arikan has introduced polar codes for channel coding and showed that they can achieve the symmetric capacity (i.e., the capacity under uniform input distribution) of an arbitrary binary-input channel. In [14], it was shown that the results can be generalized to arbitrary discrete memoryless channels. We will follow the notation in [12]. Let

$$G_2 = \begin{pmatrix} 1 & 0 \\ 1 & 1 \end{pmatrix}$$

and let its $n$th Kronecker product be $G_2^{\otimes n}$. Also denote $N = 2^n$. Let $\mathbf{u}$ be an $N$-dimensional binary message vector, and let $\mathbf{x} = \mathbf{u} G_2^{\otimes n}$, where the matrix multiplication is over GF(2). Suppose that we transmit $\mathbf{x}$ over a memoryless binary-input channel, $W$, with transition probability $W(y \mid x)$ and channel output vector $\mathbf{y}$. If $\mathbf{u}$ is chosen at random with uniform probability, then the resulting probability distribution $P(\mathbf{u}, \mathbf{x}, \mathbf{y})$ is given by

$$P(\mathbf{u}, \mathbf{x}, \mathbf{y}) = \frac{1}{2^N}\, \mathbb{1}_{\{\mathbf{x} = \mathbf{u} G_2^{\otimes n}\}} \prod_{i=1}^{N} W(y_i \mid x_i) \quad (2)$$

Define the following subchannels $W_N^{(i)}$:

$$W_N^{(i)}\bigl(\mathbf{y}, \mathbf{u}_1^{i-1} \,\big|\, u_i\bigr) = \sum_{\mathbf{u}_{i+1}^{N} \in \{0,1\}^{N-i}} \frac{1}{2^{N-1}}\, P(\mathbf{y} \mid \mathbf{u})$$

Denote by $I(W)$ the symmetric capacity of the channel (it is the channel capacity when the channel is memoryless binary-input output-symmetric (MBIOS)) and by $Z\bigl(W_N^{(i)}\bigr)$ the Bhattacharyya parameters of the subchannels $W_N^{(i)}$ (for a definition of the Bhattacharyya parameter, see, e.g., [11, Sec. I.A]). In [11] and [15], it was shown that asymptotically in $N$, a fraction $I(W)$ of the subchannels satisfy $Z\bigl(W_N^{(i)}\bigr) < 2^{-N^\beta}$ for any $\beta < 1/2$. Based on this result, the following communication scheme was proposed. Let $R$ be the code rate. Denote by $F$ the set of $N(1-R)$ subchannels with the highest values of $Z\bigl(W_N^{(i)}\bigr)$ (denoted in the sequel as the frozen set), and by $F^c$ the remaining subchannels. Fix the input to the subchannels in $F$ to some arbitrary frozen vector $\mathbf{u}_F$ (known both to the encoder and to the decoder) and use the channels in $F^c$ to transmit information. The encoder then transmits $\mathbf{x} = \mathbf{u} G_2^{\otimes n}$ over the channel. The decoder applies the following successive cancelation (SC) scheme. For $i = 1, 2, \ldots, N$, if $i \in F$, then $\hat{u}_i = u_i$ ($u_i$ is common knowledge), otherwise

$$\hat{u}_i = \begin{cases} 0 & \text{if } L_N^{(i)} \ge 1 \\ 1 & \text{else} \end{cases}$$

where

$$L_N^{(i)} = \frac{W_N^{(i)}\bigl(\mathbf{y}, \hat{\mathbf{u}}_1^{i-1} \,\big|\, 0\bigr)}{W_N^{(i)}\bigl(\mathbf{y}, \hat{\mathbf{u}}_1^{i-1} \,\big|\, 1\bigr)}$$

Asymptotically, reliable communication under SC decoding is possible for any rate $R < I(W)$. The error probability is upper bounded by $2^{-N^\beta}$ for any $\beta < 1/2$, and the SC decoder can be implemented in complexity $O(N \log N)$.
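As a concrete illustration of the transform used above, the following self-contained Python sketch (our own illustration, not the paper's implementation; the function name is ours) computes $\mathbf{x} = \mathbf{u} G_2^{\otimes n}$ over GF(2) with the usual $O(N \log N)$ butterfly recursion, and checks that the transform is its own inverse over GF(2), a property that the WOM decoder of Section IV relies on.

```python
def polar_transform(u):
    """Compute x = u * G2^{(tensor)n} over GF(2), where len(u) = 2^n.

    Uses the block structure G2^{(tensor)n} = [[G', 0], [G', G']] with
    G' = G2^{(tensor)(n-1)}: the first output half is (u_first XOR u_second)*G'
    and the second half is u_second*G'. Complexity is O(N log N).
    """
    N = len(u)
    if N == 1:
        return u[:]
    h = N // 2
    left = polar_transform([u[i] ^ u[i + h] for i in range(h)])
    right = polar_transform(u[h:])
    return left + right

# G2^{(tensor)n} is self-inverse over GF(2): applying the transform twice
# recovers the original vector.
import random
rng = random.Random(7)
u = [rng.randint(0, 1) for _ in range(16)]
assert polar_transform(polar_transform(u)) == u
```

The involution check is exactly the identity the decoder in Section IV exploits when it inverts $\hat{\mathbf{x}} = \hat{\mathbf{u}} G_2^{\otimes n}$.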

Polar codes can also be used for lossy source coding [12]. Consider a binary symmetric source (BSS), i.e., a random binary vector $\mathbf{X}$ uniformly distributed over all $N$-dimensional binary vectors. Let $d(\mathbf{x}, \hat{\mathbf{x}})$ be a distance measure between two binary vectors, $\mathbf{x}$ and $\hat{\mathbf{x}}$, such that $d(\mathbf{x}, \hat{\mathbf{x}}) = \sum_{i=1}^{N} d(x_i, \hat{x}_i)$, where $d(0,0) = d(1,1) = 0$ and $d(0,1) = d(1,0) = 1$. For a given distortion level, $D$, one would like to design a code with rate $R$ and blocklength $N$, an encoder and a decoder that operate as follows. Given a realization $\mathbf{x}$ of the source, the encoder obtains a codeword $\hat{\mathbf{x}}$, represented by its index, such that $d(\mathbf{x}, \hat{\mathbf{x}}) \approx N D$. Given the codeword index, the decoder can then obtain the codeword $\hat{\mathbf{x}}$.

The polar source coding scheme for a BSS is described as follows [12]. Define a binary symmetric channel (BSC) $W(y \mid x)$ with crossover parameter $D$. This channel is the test channel [16] corresponding to a BSS and distortion level $D$. Given the channel, $W$, construct a polar code with frozen set $F$ that consists of the $N(1-R)$ subchannels $W_N^{(i)}$ with the largest values of $Z\bigl(W_N^{(i)}\bigr)$. This code uses some arbitrary frozen vector $\mathbf{u}_F$ which is known both to the encoder and to the decoder (e.g., $\mathbf{u}_F = \mathbf{0}$) and has rate $R = |F^c|/N$. Given the source realization $\mathbf{y}$, the SC encoder applies the following scheme based on the duality between source and channel coding. For $i = 1, 2, \ldots, N$, if $i \in F$ then $\hat{u}_i = u_i$, otherwise

$$\hat{u}_i = \begin{cases} 0 & \text{w.p. } \dfrac{W_N^{(i)}\bigl(\mathbf{y}, \hat{\mathbf{u}}_1^{i-1} \mid 0\bigr)}{W_N^{(i)}\bigl(\mathbf{y}, \hat{\mathbf{u}}_1^{i-1} \mid 0\bigr) + W_N^{(i)}\bigl(\mathbf{y}, \hat{\mathbf{u}}_1^{i-1} \mid 1\bigr)} \\[2ex] 1 & \text{w.p. } \dfrac{W_N^{(i)}\bigl(\mathbf{y}, \hat{\mathbf{u}}_1^{i-1} \mid 1\bigr)}{W_N^{(i)}\bigl(\mathbf{y}, \hat{\mathbf{u}}_1^{i-1} \mid 0\bigr) + W_N^{(i)}\bigl(\mathbf{y}, \hat{\mathbf{u}}_1^{i-1} \mid 1\bigr)} \end{cases} \quad (3)$$


(w.p. denotes with probability). It can be shown that $\hat{u}_i$ is set to zero (one, respectively) with the posterior probability that $U_i = 0$ ($U_i = 1$, respectively) given that $\mathbf{Y} = \mathbf{y}$ and $\mathbf{U}_1^{i-1} = \hat{\mathbf{u}}_1^{i-1}$. As explained in [12], unlike polar channel coding, the decoder uses a randomized decision rule, termed randomized rounding in [12]. The randomized rounding is essential to the analysis of the source encoder. The complexity of the encoding scheme is $O(N \log N)$. Since $\mathbf{u}_F$ is common knowledge, the decoder only needs to obtain $\hat{\mathbf{u}}_{F^c}$ from the encoder. It can then reconstruct the approximating source codeword using $\hat{\mathbf{x}} = \hat{\mathbf{u}} G_2^{\otimes n}$. Let

$$D_N = \frac{1}{N}\, \mathbb{E}\, d(\mathbf{X}, \hat{\mathbf{X}})$$

be the average distortion of this polar code (the averaging is over both the source vector, $\mathbf{X}$, and over the approximating source codeword, $\hat{\mathbf{X}}$, which is determined at random from $\mathbf{X}$). Also denote by $R(D) = 1 - h(D)$ the rate distortion function [16]. In [12], it was shown, given any $0 < D < 1/2$ and $\beta < 1/2$, that for $n$ (i.e., $N$) sufficiently large, rate $R > R(D)$, and any frozen vector $\mathbf{u}_F$, the polar code with rate $R$ under SC encoding satisfies

$$D_N \le D + O\bigl(2^{-N^\beta}\bigr) \quad (4)$$

In fact, as noted in [12], the proof of (4) is not restricted to a BSS and extends to general sources, e.g., a binary erasure source [12].
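To make the randomized-rounding encoder concrete, here is a small self-contained Python sketch (our own illustration, not the paper's code; all names are ours). For a toy blocklength it evaluates the subchannel probabilities $W_N^{(i)}(\mathbf{y}, \hat{\mathbf{u}}_1^{i-1} \mid u_i)$ by brute-force enumeration over all completions of $\mathbf{u}$ — exponential in $N$ and for illustration only, whereas the real SC encoder runs in $O(N \log N)$ — and then draws each unfrozen bit according to (3).

```python
import random
from itertools import product

def polar_transform(u):
    # x = u * G2^{(tensor)n} over GF(2)
    N = len(u)
    if N == 1:
        return u[:]
    h = N // 2
    return polar_transform([u[i] ^ u[i + h] for i in range(h)]) + polar_transform(u[h:])

def subchannel_probs(y, prefix, D):
    """Return (p0, p1) proportional to W_N^{(i)}(y, prefix | 0) and (... | 1)
    for a BSC(D) test channel, by summing P(y | u) over all completions of u."""
    N, i = len(y), len(prefix)
    out = []
    for b in (0, 1):
        total = 0.0
        for tail in product((0, 1), repeat=N - i - 1):
            x = polar_transform(list(prefix) + [b] + list(tail))
            p = 1.0
            for xj, yj in zip(x, y):
                p *= (1.0 - D) if xj == yj else D
            total += p
        out.append(total)
    return out

def sc_encode(y, frozen, D, rng):
    """SC source encoding with randomized rounding, cf. eq. (3).
    `frozen` maps a subchannel index to its frozen bit value."""
    u = []
    for i in range(len(y)):
        if i in frozen:
            u.append(frozen[i])
        else:
            p0, p1 = subchannel_probs(y, u, D)
            u.append(0 if rng.random() < p0 / (p0 + p1) else 1)
    return u

rng = random.Random(0)
y = [1, 0, 1, 1, 0, 0, 1, 0]
u_hat = sc_encode(y, frozen={}, D=1e-6, rng=rng)
x_hat = polar_transform(u_hat)
# With no frozen bits and an almost noiseless test channel, the posterior in
# (3) concentrates on the exact preimage of y, so the encoder reproduces it.
assert x_hat == y
```

With a nontrivial frozen set and a larger crossover $D$, the same routine produces an approximating codeword whose distortion concentrates around $ND$, which is what Sections III and IV build on.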

III. EXTENDED RESULTS FOR POLAR SOURCE CODES

Although the result in [12] is concerned only with the average distortion, one may strengthen it by combining it with the strong converse result of the rate distortion theorem in [17, p. 127]. The strong converse asserts that for any $\delta > 0$, if $\epsilon > 0$ is chosen sufficiently small and $R \le R(D) + \epsilon$, then $P\bigl\{ d(\mathbf{X}, \hat{\mathbf{X}}) \le N(D - \delta) \bigr\}$ can be made arbitrarily small by choosing $N$ sufficiently large. Combining this with (4), we can conclude, for a polar code designed for a BSC with crossover parameter $D$, with rate $R = R(D) + \epsilon$ and $\epsilon$ sufficiently small, that

$$P\bigl\{ d(\mathbf{X}, \hat{\mathbf{X}}) > N(D + \delta) \bigr\} \le \delta_1 \quad (5)$$

for any $\delta, \delta_1 > 0$ and $N$ sufficiently large.

We now extend the analysis of polar source codes by showing, with high probability, a joint typicality relation between a given source realization and the approximating codeword under SC encoding. Using this joint typicality result, it will be possible to extend the result in (5) and obtain an improved upper bound estimate (as a function of $N$) on the considered probability. We note, however, that only the joint typicality result (and not the improvement to (5)) is necessary for our WOM code analysis.

The following discussion is valid for an arbitrary discrete MBIOS channel, $W(y \mid x)$, in (2). As in [12], we construct a polar source code, which uses the SC encoding scheme (3), with frozen set $F$ defined by

$$F = \bigl\{ i \in \{1, \ldots, N\} : Z\bigl(W_N^{(i)}\bigr) \ge 1 - 2\delta_N^2 \bigr\} \quad (6)$$

(note that $F$ depends on $N$; however, for simplicity, our notation does not show this dependence explicitly) and

$$\delta_N = \frac{2^{-N^\beta}}{2N}, \qquad 0 < \beta < 1/2 \quad (7)$$

By [12, Th. 19 and eq. (22)] (see also [12, eq. (12)]),

$$\lim_{N \to \infty} \frac{|F|}{N} = 1 - I(W) .$$

Hence, for any $\epsilon > 0$, if $N$ is large enough, then the rate of the code satisfies

$$R = \frac{|F^c|}{N} \le I(W) + \epsilon .$$

Let $\mathbf{Y}$ be a source vector produced by a sequence of independent identically distributed (i.i.d.) realizations of $Y$. If $\mathbf{u}_F$ is chosen at random with uniform probability, then the vector $\hat{\mathbf{u}}$ produced by the SC encoder (that utilizes (3)) has a conditional probability distribution given by [12]

$$Q(\hat{\mathbf{u}} \mid \mathbf{y}) = \prod_{i=1}^{N} Q\bigl(\hat{u}_i \mid \mathbf{y}, \hat{\mathbf{u}}_1^{i-1}\bigr) \quad (8)$$

where

$$Q\bigl(\hat{u}_i \mid \mathbf{y}, \hat{\mathbf{u}}_1^{i-1}\bigr) = \begin{cases} 1/2 & \text{if } i \in F \\[1ex] \dfrac{W_N^{(i)}\bigl(\mathbf{y}, \hat{\mathbf{u}}_1^{i-1} \mid \hat{u}_i\bigr)}{\sum_{u \in \{0,1\}} W_N^{(i)}\bigl(\mathbf{y}, \hat{\mathbf{u}}_1^{i-1} \mid u\bigr)} & \text{if } i \in F^c \end{cases} \quad (9)$$

On the other hand, the conditional probability of $\hat{\mathbf{u}}$ given $\mathbf{y}$ corresponding to (2) is

$$P(\hat{\mathbf{u}} \mid \mathbf{y}) = \prod_{i=1}^{N} P\bigl(\hat{u}_i \mid \mathbf{y}, \hat{\mathbf{u}}_1^{i-1}\bigr) .$$

Recall that the SC encoder sets $\hat{u}_i$ to zero (one, respectively) with the posterior probability that $U_i = 0$ ($U_i = 1$, respectively) given that $\mathbf{Y} = \mathbf{y}$ and $\mathbf{U}_1^{i-1} = \hat{\mathbf{u}}_1^{i-1}$. This explains the second line in (9). The first line in (9) is due to the fact that $\mathbf{u}_F$ is chosen at random with uniform probability. Hence, the probability distributions $Q\bigl(\hat{u}_i \mid \mathbf{y}, \hat{\mathbf{u}}_1^{i-1}\bigr)$ and $P\bigl(\hat{u}_i \mid \mathbf{y}, \hat{\mathbf{u}}_1^{i-1}\bigr)$ are different only for $i \in F$. Now, since the frozen set $F$ is chosen such that it comprises the indices associated with bad subchannels (i.e., subchannels $W_N^{(i)}$ with Bhattacharyya parameters close to one), this implies that for $i \in F$, $P\bigl(\hat{u}_i \mid \mathbf{y}, \hat{\mathbf{u}}_1^{i-1}\bigr)$ is close to 1/2. Hence, the probability distributions $Q$ and $P$ are close to each other. An explicit upper bound on the gap between $Q$ and $P$ was obtained in [12] and will be used below.

In the sequel, we employ standard strong typicality arguments. Similarly to the notation in [16, Sec. 10.6, pp. 325–326], we define an $\epsilon$-strongly typical sequence $\mathbf{x} \in \mathcal{X}^N$ with respect to a distribution $p(x)$ on the finite set $\mathcal{X}$, and denote it by $\mathbf{x} \in T_\epsilon^{(N)}(X)$ (or $\mathbf{x} \in T_\epsilon$ for short), as follows. Let $N(a \mid \mathbf{x})$ denote the number of occurrences of the symbol $a$ in the sequence $\mathbf{x}$. Then, $\mathbf{x} \in T_\epsilon$ if the following two conditions hold. First, $\bigl| N(a \mid \mathbf{x})/N - p(a) \bigr| < \epsilon / |\mathcal{X}|$ for all $a \in \mathcal{X}$ with $p(a) > 0$. Second, $N(a \mid \mathbf{x}) = 0$ for all $a \in \mathcal{X}$ with $p(a) = 0$. Similarly, we define $\epsilon$-strongly typical sequences $(\mathbf{x}, \mathbf{y}) \in \mathcal{X}^N \times \mathcal{Y}^N$ with respect to a distribution $p(x, y)$ on the finite set $\mathcal{X} \times \mathcal{Y}$, and denote it by $(\mathbf{x}, \mathbf{y}) \in T_\epsilon^{(N)}(X, Y)$ (or $(\mathbf{x}, \mathbf{y}) \in T_\epsilon$ for short). We denote by $N(a, b \mid \mathbf{x}, \mathbf{y})$ the number of occurrences of $(a, b)$ in $(\mathbf{x}, \mathbf{y})$,


and require the following. First, $\bigl| N(a, b \mid \mathbf{x}, \mathbf{y})/N - p(a, b) \bigr| < \epsilon / \bigl( |\mathcal{X}| \cdot |\mathcal{Y}| \bigr)$ for all $(a, b) \in \mathcal{X} \times \mathcal{Y}$ with $p(a, b) > 0$. Second, $N(a, b \mid \mathbf{x}, \mathbf{y}) = 0$ for all $(a, b) \in \mathcal{X} \times \mathcal{Y}$ with $p(a, b) = 0$. The definition of $\epsilon$-strong typicality can be extended to more than two sequences in the obvious way.
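The two conditions above translate directly into code. The following Python sketch (our own illustration; all names are ours) checks $\epsilon$-strong typicality of a single sequence against a distribution $p$, using the same $\epsilon/|\mathcal{X}|$ normalization; the joint version is obtained by counting symbol pairs instead.

```python
from collections import Counter

def is_strongly_typical(seq, p, eps):
    """Check eps-strong typicality of seq w.r.t. the distribution p
    (a dict mapping each symbol of the finite alphabet to its probability).

    Condition 1: |N(a|seq)/N - p(a)| < eps/|alphabet| for all a with p(a) > 0.
    Condition 2: N(a|seq) = 0 for all a with p(a) = 0.
    """
    N = len(seq)
    counts = Counter(seq)
    for a, pa in p.items():
        if pa == 0:
            if counts.get(a, 0) > 0:
                return False           # a zero-probability symbol occurred
        elif abs(counts.get(a, 0) / N - pa) >= eps / len(p):
            return False               # empirical frequency deviates too much
    return True

# A balanced binary sequence is typical for the uniform distribution ...
assert is_strongly_typical([0, 1] * 50, {0: 0.5, 1: 0.5}, 0.1)
# ... an all-zeros sequence is not ...
assert not is_strongly_typical([0] * 100, {0: 0.5, 1: 0.5}, 0.1)
# ... and one occurrence of a zero-probability symbol breaks typicality.
assert not is_strongly_typical([0] * 99 + [2], {0: 1.0, 2: 0.0}, 0.1)
```

The second (zero-count) condition is the one that does the work in Lemma 1 below: pairs with zero probability under the test channel simply cannot appear in a strongly typical pair of sequences.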

In our case, $\mathcal{X} = \mathcal{Y} = \{0, 1\}$. Note that $G_2^{\otimes n}$ is a full rank matrix. Therefore, each vector $\mathbf{u}$ corresponds to exactly one vector $\mathbf{x} = \mathbf{u} G_2^{\otimes n}$. We say that $(\mathbf{u}, \mathbf{y}) \in T_\epsilon$ if $(\mathbf{x}, \mathbf{y}) \in T_\epsilon$ with respect to the probability distribution $p(x, y) = W(y \mid x)/2$ (see (2)).

Theorem 1: Consider a discrete MBIOS channel, $W(y \mid x)$. Suppose that the input binary random variable $X$ is uniformly distributed (i.e., $X = 0$ w.p. $1/2$ and $X = 1$ w.p. $1/2$), and denote the channel output random variable by $Y$. Let the source vector random variable $\mathbf{Y}$ be created by a sequence of i.i.d. realizations of $Y$. Consider a polar code for source coding [12] with blocklength $N$ and a frozen set $F$ defined by (6) and (7) (whose rate approaches $I(W)$ asymptotically) as described above. Let $\hat{\mathbf{U}}$ be the random variable denoting the output of the SC encoder. Then for any $\epsilon > 0$, $\beta < 1/2$, and $n$ (i.e., $N = 2^n$) sufficiently large, $(\hat{\mathbf{U}}, \mathbf{Y}) \in T_\epsilon$ w.p. at least $1 - 2^{-N^\beta}$.

Recall that the SC encoder's output $\hat{\mathbf{U}}$ has conditional probability distribution $Q$ given by (8) and (9). Hence, Theorem 1 asserts that, for $N$ sufficiently large, $Q\bigl\{ (\hat{\mathbf{U}}, \mathbf{Y}) \in T_\epsilon \bigr\} \ge 1 - 2^{-N^\beta}$.

Proof: To prove the theorem, we use the following result of [12, Lemmas 5 and 7]:

$$\sum_{\hat{\mathbf{u}}, \mathbf{y}} \bigl| Q(\hat{\mathbf{u}}, \mathbf{y}) - P(\hat{\mathbf{u}}, \mathbf{y}) \bigr| \le 2 |F| \delta_N \quad (10)$$

Hence,

$$Q\bigl\{ (\hat{\mathbf{U}}, \mathbf{Y}) \notin T_\epsilon \bigr\} \le P\bigl\{ (\hat{\mathbf{U}}, \mathbf{Y}) \notin T_\epsilon \bigr\} + 2 |F| \delta_N \quad (11)$$

In addition, we claim the following:

$$P\bigl\{ (\hat{\mathbf{U}}, \mathbf{Y}) \notin T_\epsilon \bigr\} \le 2^{-cN} \quad (12)$$

for some constant $c > 0$ (that can depend on $\epsilon$). We now prove (12):

$$P\bigl\{ (\hat{\mathbf{U}}, \mathbf{Y}) \notin T_\epsilon \bigr\} = P\bigl\{ (\hat{\mathbf{X}}, \mathbf{Y}) \notin T_\epsilon \bigr\} \le \sum_{(a, b) \in \mathcal{X} \times \mathcal{Y}} P\left\{ \left| \frac{N(a, b \mid \hat{\mathbf{X}}, \mathbf{Y})}{N} - p(a, b) \right| \ge \frac{\epsilon}{|\mathcal{X}| \cdot |\mathcal{Y}|} \right\} \quad (13)$$

In the first equality, we have used the fact that, by definition, $(\hat{\mathbf{u}}, \mathbf{y}) \in T_\epsilon$ implies $(\hat{\mathbf{x}}, \mathbf{y}) \in T_\epsilon$ (and vice versa), where $\hat{\mathbf{x}} = \hat{\mathbf{u}} G_2^{\otimes n}$; note also that under $P$ the pairs $(\hat{X}_j, Y_j)$ are i.i.d. with distribution $p(a, b)$, so that pairs with $p(a, b) = 0$ never occur. Let $V_j = V_j(a, b)$ be a binary random variable such that $V_j = 1$ if $(\hat{X}_j, Y_j) = (a, b)$ and $V_j = 0$ otherwise. Then,

$$N(a, b \mid \hat{\mathbf{X}}, \mathbf{Y}) = \sum_{j=1}^{N} V_j$$

Therefore,

$$P\left\{ \left| \frac{1}{N} \sum_{j=1}^{N} V_j - p(a, b) \right| \ge \frac{\epsilon}{|\mathcal{X}| \cdot |\mathcal{Y}|} \right\} \le 2 e^{-2 N \epsilon^2 / (|\mathcal{X}| \cdot |\mathcal{Y}|)^2} \quad (14)$$

where the inequality is due to Hoeffding's inequality (using the fact that $V_1, \ldots, V_N$ are i.i.d. with $\mathbb{E} V_j = p(a, b)$). Hence,

$$P\bigl\{ (\hat{\mathbf{X}}, \mathbf{Y}) \notin T_\epsilon \bigr\} \le 2 |\mathcal{X}| \cdot |\mathcal{Y}| \, e^{-2 N \epsilon^2 / (|\mathcal{X}| \cdot |\mathcal{Y}|)^2}$$

which, together with (13), proves (12). From (11),

$$Q\bigl\{ (\hat{\mathbf{U}}, \mathbf{Y}) \notin T_\epsilon \bigr\} \le P\bigl\{ (\hat{\mathbf{U}}, \mathbf{Y}) \notin T_\epsilon \bigr\} + 2 |F| \delta_N$$

Combining this with (12), we get

$$Q\bigl\{ (\hat{\mathbf{U}}, \mathbf{Y}) \notin T_\epsilon \bigr\} \le 2^{-cN} + 2 |F| \delta_N$$

Recalling the definition of $\delta_N$, (7), the theorem follows immediately.

Although not needed in the rest of the paper, we can now

improve the inequality (5) using the following theorem.

Theorem 2: Let $\mathbf{X}$ be a random vector, uniformly distributed over all $N$-dimensional binary vectors. Consider a polar code for source coding [12] designed for a BSC with crossover parameter $D$. Let $\hat{\mathbf{X}}$ be the reconstructed source codeword given $\mathbf{X}$. Then, for any $\delta > 0$, $\beta < 1/2$, and $N$ sufficiently large,

$$P\bigl\{ d(\mathbf{X}, \hat{\mathbf{X}}) > N(D + \delta) \bigr\} \le 2^{-N^\beta} \quad (15)$$

The code rate approaches the rate distortion function, $R(D) = 1 - h(D)$, for $N$ sufficiently large.

Proof: Since $d(\mathbf{X}, \hat{\mathbf{X}}) = N(0, 1 \mid \hat{\mathbf{X}}, \mathbf{X}) + N(1, 0 \mid \hat{\mathbf{X}}, \mathbf{X})$ and $p(0, 1) + p(1, 0) = D$ for the test channel, then, if $(\hat{\mathbf{X}}, \mathbf{X}) \in T_\epsilon$ with $\epsilon$ sufficiently small, $d(\mathbf{X}, \hat{\mathbf{X}}) \le N(D + \delta)$. Denote by $A$ the event $\{ (\hat{\mathbf{X}}, \mathbf{X}) \in T_\epsilon \}$ and by $A^c$ its complement.


Fig. 1. The probability transition function of the $l$th channel.

Then, for $N$ sufficiently large,

$$P\bigl\{ d(\mathbf{X}, \hat{\mathbf{X}}) > N(D + \delta) \bigr\} \le P(A^c) \le 2^{-N^\beta} \quad (16)$$

The last inequality is due to Theorem 1. This proves (15) (since (16) holds for any $\beta < 1/2$).

IV. THE PROPOSED POLAR WOM CODE

Consider the binary WOM problem that was defined in Section I. Given some set of parameters $t$ and $\epsilon_1, \ldots, \epsilon_{t-1} \in [0, 1/2]$, and setting $\epsilon_t = 1/2$, we wish to show that we can construct a reliable polar coding scheme for any set of WOM rates in the capacity region (1). That is, the rates satisfy

$$R_l < \alpha_{l-1} h(\epsilon_l), \qquad l = 1, 2, \ldots, t$$

where

$$\alpha_0 = 1, \qquad \alpha_l = \prod_{j=1}^{l} (1 - \epsilon_j) \quad (17)$$

(note that $R_t < \alpha_{t-1} h(\epsilon_t) = \alpha_{t-1}$ since $\epsilon_t = 1/2$). For that purpose, we consider the following test channels. The input set of each channel is $\mathcal{X} = \{0, 1\}$. The output set is $\mathcal{Y} = \{0, 1\} \times \{0, 1\}$. Denote the input random variable by $X$ and the output by $Y = (S, B)$. The probability transition function of the $l$th channel is defined by

$$W_l\bigl( y = (s, b) \mid x \bigr) = f_l(s, b \oplus x) \quad (18)$$

where

$$f_l(s, c) = \begin{cases} \alpha_{l-1} (1 - \epsilon_l) & \text{if } (s, c) = (0, 0) \\ \alpha_{l-1} \epsilon_l & \text{if } (s, c) = (0, 1) \\ 1 - \alpha_{l-1} & \text{if } (s, c) = (1, 0) \\ 0 & \text{if } (s, c) = (1, 1) . \end{cases} \quad (19)$$

This channel is also shown in Fig. 1. It is easy to verify that the capacity of this channel is $1 - \alpha_{l-1} h(\epsilon_l)$ and that the capacity achieving input distribution is symmetric, i.e., $P(X = 0) = P(X = 1) = 1/2$.

For each channel $l$, we design a polar code with blocklength $N$ and frozen set $F_l$ of subchannels defined by (6). The rate is

$$R_l^{(s)} = \frac{|F_l^c|}{N} = 1 - \alpha_{l-1} h(\epsilon_l) + \delta_0 \quad (20)$$

where $\delta_0$ is arbitrarily small for $N$ sufficiently large. This code will be used as a source code.

Denote the information sequence by $\mathbf{a}_1, \ldots, \mathbf{a}_t$ and the sequence of WOM states by $\mathbf{s}_0 = \mathbf{0}, \mathbf{s}_1, \ldots, \mathbf{s}_t$. Also let $E_l(\mathbf{s}_{l-1}, \mathbf{a}_l)$ denote the $l$th encoding function for a WOM state vector $\mathbf{s}_{l-1}$ and information vector $\mathbf{a}_l$. Similarly, let $D_l(\mathbf{s}_l)$ denote the $l$th decoding function for a WOM state vector $\mathbf{s}_l$. Hence, $\mathbf{s}_l = E_l(\mathbf{s}_{l-1}, \mathbf{a}_l)$ and $\hat{\mathbf{a}}_l = D_l(\mathbf{s}_l)$, where $\hat{\mathbf{a}}_l$ is the retrieved information sequence. We define $E_l(\mathbf{s}_{l-1}, \mathbf{a}_l)$ and $D_l(\mathbf{s}_l)$ as follows.

Encoding function, $\mathbf{s}_l = E_l(\mathbf{s}_{l-1}, \mathbf{a}_l)$:
1) Let $\mathbf{v} = \mathbf{s}_{l-1} \oplus \mathbf{g}$, where $\oplus$ denotes bitwise XOR and $\mathbf{g}$ is a sample from an $N$-dimensional uniformly distributed random binary vector. The vector $\mathbf{g}$ is a common randomness source (dither), known both to the encoder and to the decoder.
2) Let $y_j = (s_{l-1,j}, v_j)$ and $\mathbf{y} = (y_1, \ldots, y_N)$. Compress the vector $\mathbf{y}$ using the $l$th polar code with $\hat{\mathbf{u}}_{F_l} = \mathbf{a}_l$. This results in a vector $\hat{\mathbf{u}}$ and a vector $\hat{\mathbf{x}} = \hat{\mathbf{u}} G_2^{\otimes n}$.
3) Finally, $\mathbf{s}_l = \hat{\mathbf{x}} \oplus \mathbf{g}$.

Decoding function, $\hat{\mathbf{a}}_l = D_l(\mathbf{s}_l)$:
1) Let $\hat{\mathbf{u}} = (\mathbf{s}_l \oplus \mathbf{g}) \bigl( G_2^{\otimes n} \bigr)^{-1}$.
2) $\hat{\mathbf{a}}_l = \hat{\mathbf{u}}_{F_l}$, where $\hat{\mathbf{u}}_{F_l}$ denotes the elements of the vector $\hat{\mathbf{u}}$ in the set $F_l$.

Note that the information is embedded within the set $F_l$.
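The encoding and decoding steps above can be sketched as follows (a minimal self-contained Python illustration, not the paper's code; all names are ours). The polar source encoder is replaced by a stand-in quantizer stub — any map returning a vector $\hat{\mathbf{u}}$ with $\hat{\mathbf{u}}_{F_l} = \mathbf{a}_l$ — which is enough to exhibit the dither algebra: since $G_2^{\otimes n}$ is its own inverse over GF(2), the decoder always recovers the message embedded in the frozen positions. Whether the WOM constraint $\mathbf{s}_l \ge \mathbf{s}_{l-1}$ holds depends, of course, on the quality of the actual quantizer.

```python
import random

def polar_transform(u):
    # x = u * G2^{(tensor)n} over GF(2); the transform is its own inverse.
    N = len(u)
    if N == 1:
        return u[:]
    h = N // 2
    return polar_transform([u[i] ^ u[i + h] for i in range(h)]) + polar_transform(u[h:])

def wom_encode(a, s_prev, g, frozen, quantize):
    v = [sj ^ gj for sj, gj in zip(s_prev, g)]    # step 1: dither the state
    y = list(zip(s_prev, v))                      # step 2: y_j = (s_j, v_j)
    u_hat = quantize(y, frozen, a)                # compress with u_F = a
    x_hat = polar_transform(u_hat)
    return [xj ^ gj for xj, gj in zip(x_hat, g)]  # step 3: s_l = x_hat XOR g

def wom_decode(s, g, frozen):
    u_hat = polar_transform([sj ^ gj for sj, gj in zip(s, g)])  # self-inverse
    return [u_hat[i] for i in frozen]             # message sits in frozen set

def dummy_quantize(y, frozen, a):
    # Stand-in for the SC polar source encoder: any u with u_F = a will do
    # to demonstrate message recovery (it ignores the source word y).
    rng = random.Random(123)
    u = [rng.randint(0, 1) for _ in y]
    for bit, i in zip(a, frozen):
        u[i] = bit
    return u

rng = random.Random(5)
N, frozen = 8, [1, 2, 5]
g = [rng.randint(0, 1) for _ in range(N)]
s_prev = [rng.randint(0, 1) for _ in range(N)]
a = [1, 0, 1]
s_new = wom_encode(a, s_prev, g, frozen, dummy_quantize)
assert wom_decode(s_new, g, frozen) == a   # the message is always recovered
```

In the actual scheme, `quantize` is the SC randomized-rounding source encoder of Section II designed for the test channel (18) and (19), and it is the source-coding guarantee (Lemma 1 below) that makes the written state compatible with the previous one.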

Hence, when considered as a WOM code, our code has rate $R_l = |F_l|/N = 1 - R_l^{(s)}$, where $R_l^{(s)}$ is the rate of the polar source code.

For the sake of the analysis of our proposed method, we slightly modify the coding scheme as follows:

(M1) The definition of the $l$th channel is modified such that in (19), we use $\epsilon_l'$ instead of $\epsilon_l$, where $\epsilon_l - \epsilon_l' > 0$ will be chosen sufficiently small. We will show that any set of rates $(R_1, \ldots, R_t)$ that satisfy

$$R_l < \alpha_{l-1} h(\epsilon_l'), \qquad l = 1, 2, \ldots, t$$

is achievable in our scheme. Setting $\epsilon_l - \epsilon_l'$ sufficiently small then shows that any point in the capacity region (1) is achievable using polar WOM codes.

(M2) The encoder sets $\hat{\mathbf{u}}_{F_l} = \mathbf{a}_l \oplus \mathbf{g}_l'$ instead of $\hat{\mathbf{u}}_{F_l} = \mathbf{a}_l$, where $\mathbf{g}_l'$ is an $|F_l|$-dimensional uniformly distributed binary (dither) vector known both at the encoder and decoder. In this way, the assumption that $\hat{\mathbf{u}}_{F_l}$ is uniformly distributed holds. Similarly, the decoder modifies its operation to $\hat{\mathbf{a}}_l = \hat{\mathbf{u}}_{F_l} \oplus \mathbf{g}_l'$.

(M3) We assume a random permutation of the input vector prior to quantization in each polar code. These random permutations are known both at the encoder and decoder. More precisely, in step 2, the encoder applies the permutation, $\pi$, on $\mathbf{y}$. Then, it compresses the permuted $\mathbf{y}$ and obtains some polar codeword. Finally, it applies the inverse permutation, $\pi^{-1}$, on this codeword to produce $\hat{\mathbf{x}}$ and proceeds to step 3. The decoder, in the end of step 1, uses the permutation, $\pi$, to permute $\mathbf{s}_l \oplus \mathbf{g}$, and then uses this permuted vector (instead of $\mathbf{s}_l \oplus \mathbf{g}$) in step 2.

(M4) Denote the Hamming weight of the WOM state $\mathbf{s}_l$ after $l$ writes by $w_l$. Also denote the binomial distribution with $N$ trials and success probability $p$ by $B(N, p)$, such that $k \sim B(N, p)$ if $P(k) = \binom{N}{k} p^k (1 - p)^{N - k}$ for $k = 0, 1, \ldots, N$. After we obtain $\mathbf{s}_l$ in step 3 of the encoder for the $l$th write, we draw at random a number $k$ from the distribution $B(N, 1 - \alpha_l)$. If $k > w_l$, then we flip $k - w_l$ elements in $\mathbf{s}_l$ from 0 to 1. This modified value of $\mathbf{s}_l$ is finally written onto the WOM.
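Modification (M4) amounts to the following small routine (our own sketch; the binomial parameter is written generically as `target_p`, which in the scheme would be $1 - \alpha_l$). Note that it only ever flips zeros to ones, so it never violates the WOM constraint.

```python
import random

def reweight(s, target_p, rng):
    """(M4) sketch: draw k ~ Binomial(N, target_p); if k exceeds the current
    Hamming weight w(s), flip k - w(s) randomly chosen zeros of s to ones."""
    N = len(s)
    k = sum(rng.random() < target_p for _ in range(N))  # Binomial(N, p) sample
    w = sum(s)
    out = list(s)
    if k > w:
        zeros = [j for j, bit in enumerate(out) if bit == 0]
        for j in rng.sample(zeros, k - w):
            out[j] = 1
    return out

rng = random.Random(3)
s = [0] * 90 + [1] * 10
s_mod = reweight(s, 0.5, rng)
# Only 0 -> 1 flips occur: the modified state dominates the original
# componentwise, so it can always be written over the previous state.
assert all(a <= b for a, b in zip(s, s_mod))
```

The point of the modification, as the proof below makes precise, is that the state weight after the $l$th write then has exactly the binomial distribution assumed by the induction argument.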

Clearly, the complexities of the original and modified encoders and decoders are $O(N \log N)$. The significance of the modifications (M1)–(M4) will be pointed out in the proof of the following theorem.

Theorem 3: Consider an arbitrary information sequence $\mathbf{a}_1, \ldots, \mathbf{a}_t$ with rates $R_1, \ldots, R_t$ that are inside the capacity region (1) of the binary WOM. For any $0 < \beta < 1/2$ and $N$ sufficiently large, the binary polar WOM coding scheme defined by $E_1, \ldots, E_t$, $D_1, \ldots, D_t$ and (M1)–(M4) can be used to write this sequence reliably over the WOM w.p. at least $1 - 2^{-N^\beta}$ in encoding and decoding complexities $O(N \log N)$.

To prove the theorem, we need the following lemma.²

Consider an i.i.d. source $Y = (S, B)$ with the following probability distribution:

$$P\bigl( y = (s, b) \bigr) = \begin{cases} \alpha_{l-1}/2 & \text{if } (s, b) = (0, 0) \\ \alpha_{l-1}/2 & \text{if } (s, b) = (0, 1) \\ (1 - \alpha_{l-1})/2 & \text{if } (s, b) = (1, 0) \\ (1 - \alpha_{l-1})/2 & \text{if } (s, b) = (1, 1) . \end{cases} \quad (21)$$

Note that this source has the marginal distribution of the output of the $l$th channel defined by (18) and (19) under a symmetric input distribution.

Lemma 1: Consider a polar code designed for the $l$th channel defined by (18) and (19) as described above. The code has rate $R_l^{(s)}$ defined in (20), a frozen set of subchannels, $F_l$, and some frozen vector $\hat{\mathbf{u}}_{F_l}$ which is uniformly distributed over all $|F_l|$-dimensional binary vectors. The code is used to encode a random vector $\mathbf{y}$ drawn by i.i.d. sampling from the distribution (21) using the SC encoder. Denote by $\hat{\mathbf{x}}$ the encoded codeword. Then, for any $\delta > 0$, $0 < \beta < 1/2$, and $N$ sufficiently large, the following holds w.p. at least $1 - 2^{-N^\beta}$:

$$\bigl| \{ j : s_j = 0 \text{ and } \hat{x}_j \oplus b_j = 1 \} \bigr| \le N \bigl( \alpha_{l-1} \epsilon_l + \delta \bigr) \quad (22)$$

$$\bigl| \{ j : s_j = 1 \text{ and } \hat{x}_j \oplus b_j = 1 \} \bigr| = 0 \quad (23)$$

Proof: According to Theorem 1, for $n$ (i.e., $N$) large enough, $(\hat{\mathbf{x}}, \mathbf{y}) \in T_\epsilon$ w.p. at least $1 - 2^{-N^\beta}$. Consider all possible triples $(x, (s, b))$, where $(s, b) \in \mathcal{Y}$ and $x \in \mathcal{X}$. From the definition of $T_\epsilon$, if $p\bigl(x, (s, b)\bigr) > 0$, then (w.p. at least $1 - 2^{-N^\beta}$),

$$\left| \frac{N\bigl(x, (s, b) \mid \hat{\mathbf{x}}, \mathbf{y}\bigr)}{N} - p\bigl(x, (s, b)\bigr) \right| < \frac{\epsilon}{8} \quad (24)$$

and if $p\bigl(x, (s, b)\bigr) = 0$, then

$$N\bigl(x, (s, b) \mid \hat{\mathbf{x}}, \mathbf{y}\bigr) = 0 \quad (25)$$

In addition, using $p\bigl(x, (s, b)\bigr) = \frac{1}{2} f_l(s, b \oplus x)$ and the channel definition (18) and (19), we have

$$p\bigl(0, (0, 1)\bigr) + p\bigl(1, (0, 0)\bigr) = \alpha_{l-1} \epsilon_l$$

Combining this with (24), we obtain

$$N\bigl(0, (0, 1) \mid \hat{\mathbf{x}}, \mathbf{y}\bigr) + N\bigl(1, (0, 0) \mid \hat{\mathbf{x}}, \mathbf{y}\bigr) \le N \bigl( \alpha_{l-1} \epsilon_l + \epsilon/4 \bigr)$$

Hence,

$$\bigl| \{ j : s_j = 0 \text{ and } \hat{x}_j \ne b_j \} \bigr| \le N \bigl( \alpha_{l-1} \epsilon_l + \epsilon/4 \bigr)$$

This proves (22) (taking $\epsilon \le 4\delta$). Similarly, (23) is due to (25), since $p\bigl(x, (1, b)\bigr) = 0$ whenever $x \ne b$ from the definition of the channel.

²This lemma is formulated for the original channel with parameter $\epsilon_l$, and not for the (M1) modified channel with parameter $\epsilon_l'$.

channel.We proceed to the proof of Theorem 3. We denote by

, and the random variables corre-sponding to , and .

Proof of Theorem 3: Note that since the WOM is noise-less, we only need to prove successful encoding. That is, by thedefinition of the decoding function, , since is known atthe decoder, the decoder always retrieves the correct informa-tion vector, , from the WOM state . Therefore, we only needto show that, w.p. at least (for any ),there is no encoding error in the th writing for all .An encoding error occurs if at some stage the new state vector,, obtained by the encoder, is not compatible with the currentstate vector, , and thus, cannot be written over by .Recall our definition . We will prove the the-

orem by showing, using induction on , that (w.p. at least) the th encoding is successful and .

For , this assertion holds since and . Hence,we only need to prove the th induction step. For notational sim-plicity, we use instead of . We also denote by the valueof before applying the procedure described in (M4), and bythe value of after applying the procedure described in (M4).Finally, , and . Supposethen that (the induction assumption). Toprove the th induction step, we need to show that this implies(w.p. at least ) that the th encoding is successful and

. Our main claim is that, under the induc-tion assumption, for sufficiently small, and sufficientlylarge, the th encoding will be successful and(w.p. at least ). Once the main claim is proved, the thinduction step is also proved. This is due to the fact that, for any

in (M4) is greater than w.p. at leastfor some constant independent of

(this follows from the Chernoff bound). Hence, (w.p.at least ). After we apply the procedure in (M4), wethus obtain .


We now prove our main claim above. Considering step 1 of the encoding, we see that $\mathbf{y} = (\mathbf{s}_{l-1}, \mathbf{v})$, after the random permutation described in (M3), can be considered as i.i.d. sampling of the source $Y$ defined in (21) (by our assumption that $w_{l-1} \sim B(N, 1 - \alpha_{l-1})$, and since $\mathbf{g}$ is uniformly distributed). Now, in step 2 of the encoding, we compress the vector $\mathbf{y}$. Due to modification (M2), we can assume that the frozen vector $\hat{\mathbf{u}}_{F_l}$ in the source polar code is uniformly distributed. Hence, we can apply Lemma 1 and obtain that the compression of $\mathbf{y}$ in step 2 satisfies the following for any $\delta > 0$ and $N$ sufficiently large w.p. at least $1 - 2^{-N^\beta}$:
1) If $s_{l-1,j} = 1$, then $\hat{x}_j = v_j$.
2) For at most $N(\alpha_{l-1} \epsilon' + \delta)$ components $j$, we have $s_{l-1,j} = 0$ and $\hat{x}_j \ne v_j$.
Hence, in step 3 of the encoding, if $s_{l-1,j} = 1$, then $s_{l,j} = \hat{x}_j \oplus g_j = v_j \oplus g_j = s_{l-1,j} = 1$ (i.e., the WOM constraints are satisfied). In addition, there are at most $N(\alpha_{l-1} \epsilon' + \delta)$ components $j$ for which $s_{l-1,j} = 0$ and $s_{l,j} = 1$. Therefore, w.p. at least $1 - 2^{-N^\beta}$, the vectors $\mathbf{s}_{l-1}$ and $\tilde{\mathbf{s}}_l$ satisfy the WOM constraints and

$$\tilde{w}_l \le w_{l-1} + N(\alpha_{l-1} \epsilon' + \delta) \le N(1 - \alpha_{l-1} + \delta) + N(\alpha_{l-1} \epsilon' + \delta) = N\bigl( 1 - \alpha_{l-1}(1 - \epsilon') + 2\delta \bigr) \quad (26)$$

(in the second inequality, we have used the fact that for $N$ sufficiently large, $w_{l-1} \le N(1 - \alpha_{l-1} + \delta)$ w.p. at least $1 - e^{-cN}$ for some constant $c > 0$ independent of $N$, as follows from the Chernoff bound). Setting $\delta < \alpha_{l-1}(\epsilon - \epsilon')/4$ concludes the proof of our main claim above and the proof of the $l$th induction step. Note that modification (M1) was required since $\epsilon' < \epsilon$ is used for concluding that $\tilde{w}_l \le N(1 - \alpha_l - \delta_2)$ (for some $\delta_2 > 0$) from (26). The complexity claim is due to the results in [11].

Notes:
1) The test channel in the first write is actually a BSC (since $\alpha_0 = 1$, so that $P(S = 1) = 0$ in Fig. 1). Similarly, in the last write, we can merge together the source symbols $(0, 0)$ and $(0, 1)$ (note that $\epsilon_t = 1/2$, so that $B$ and $X$ are statistically independent given $S = 0$), thus obtaining a test channel which is a binary erasure channel.

2) Consider, for example, a flash memory device. In practice, the dither, $\mathbf{g}$, can be determined from the address of the word (e.g., the address is used as a seed value to a random number generator).

3) In the rare event where an encoding error has occurred, the encoder may reapply the encoding using another dither vector value. Furthermore, the decoder can realize which value of the dither vector should be used in various ways. One possibility is that this information is communicated, similarly to the assumption that the generation number is known. Another possibility is that the decoder will switch to the next value of the dither upon detecting a decoding failure, e.g., by using CRC information. By repeating this procedure of re-encoding upon a failure event at the encoder several times, one can reduce the error probability as much as required.
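Note 2 above can be realized, e.g., by seeding a pseudorandom generator with the word address (a hypothetical sketch; the function name is ours, and a real flash controller would use its own generator):

```python
import random

def dither_for_address(address, N):
    """Derive the N-bit dither vector g deterministically from a word
    address, using the address as the PRNG seed (cf. note 2 above)."""
    rng = random.Random(address)
    return [rng.randint(0, 1) for _ in range(N)]

# Encoder and decoder derive the identical dither from the same address,
# so no dither needs to be stored or communicated.
assert dither_for_address(42, 16) == dither_for_address(42, 16)
```

A retry counter (note 3) could simply be mixed into the seed, giving the encoder a fresh dither on each re-encoding attempt.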

V. GENERALIZATION TO NONBINARY POLAR WOM CODES

A. Nonbinary Polar Codes

Nonbinary polar codes over a $q$-ary alphabet for channel coding over arbitrary discrete memoryless channels were proposed in [14]. Nonbinary polar codes over a $q$-ary alphabet for lossy source coding of a memoryless source were proposed in [18]. First suppose that $q$ is prime. Similarly to the binary case, the codeword $\mathbf{x}$ of a $q$-ary polar code is related to the $N$-dimensional message vector $\mathbf{u}$ by the relation $\mathbf{x} = \mathbf{u} G_N$, where the matrix $G_N$ is the same as in the binary case. However, now $\mathbf{u}, \mathbf{x} \in \{0, 1, \ldots, q-1\}^N$, and the matrix multiplication is performed modulo $q$. Suppose that we transmit $\mathbf{x}$ over a memoryless channel with transition probability $W(y \mid x)$ and channel output vector $\mathbf{y}$. If $\mathbf{u}$ is chosen at random with uniform probability over $\{0, 1, \ldots, q-1\}^N$, then the resulting probability distribution is given by

$$P(\mathbf{u}, \mathbf{y}) = q^{-N} \prod_{i=1}^{N} W(y_i \mid x_i), \quad \mathbf{x} = \mathbf{u} G_N. \qquad (27)$$
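As an illustration of the relation $\mathbf{x} = \mathbf{u} G_N$ with arithmetic modulo $q$, the transform can be computed in $O(N \log N)$ by the usual butterfly recursion. The helper below is our sketch (not from the paper), using one common ordering convention for $G_N = B_N F^{\otimes n}$:

```python
import numpy as np

def polar_transform_q(u, q):
    """Compute x = u * G_N (mod q) recursively, where G_N = B_N F^{(x)n}
    and F = [[1, 0], [1, 1]]; len(u) must be a power of 2."""
    u = np.asarray(u) % q
    if len(u) == 1:
        return u
    # One polarization step on pairs: (u1, u2) -> (u1 + u2, u2) mod q,
    # then recurse on the two halves (bit reversal is implicit).
    top = polar_transform_q((u[0::2] + u[1::2]) % q, q)
    bot = polar_transform_q(u[1::2], q)
    return np.concatenate([top, bot])
```

For $q = 2$ this reduces to Arikan's transform, which is its own inverse over GF(2).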

Define the following subchannels:

$$W_N^{(i)}\big(\mathbf{y}, u^{i-1} \,\big|\, u_i\big) = \sum_{u_{i+1}^{N}} \frac{1}{q^{N-1}} \, W_N\big(\mathbf{y} \,\big|\, \mathbf{u}\big), \quad i = 1, \ldots, N.$$

We denote by $I(W)$ and $I\big(W_N^{(i)}\big)$, respectively, the symmetric capacity parameters of $W$ and $W_N^{(i)}$. In [14], it was shown that the subchannels polarize as in the binary case with the same asymptotic polarization rate. The frozen set is chosen similarly to the binary case. Asymptotically, reliable communication under SC decoding is possible for any rate $R < I(W)$. The error probability is upper bounded by $2^{-N^\beta}$ for any $\beta < 1/2$, and the decoder can be implemented in complexity $O(N \log N)$. Nonbinary polar codes were also proposed for lossy source

coding [18]. Consider some random variable $Y$ taking values in a set $\mathcal{Y}$. For simplicity, we assume that $\mathcal{Y}$ is finite. Also denote $\mathcal{X} = \{0, 1, \ldots, q-1\}$. Let the source vector random variable $\mathbf{y}$ be created by a sequence of $N$ i.i.d. realizations of $Y$. Let $d(x, y)$ be some (finite) distance measure between $x \in \mathcal{X}$ and $y \in \mathcal{Y}$. Furthermore, for $\mathbf{x} \in \mathcal{X}^N$ and $\mathbf{y} \in \mathcal{Y}^N$, we define $d(\mathbf{x}, \mathbf{y}) = \sum_{i=1}^{N} d(x_i, y_i)$. Given some distortion level, $D$, let $W(y \mid x)$ be the test channel that achieves the symmetric rate distortion bound, $R_s(D)$ (i.e., the rate distortion bound under the constraint that $X$ is uniformly distributed over $\mathcal{X}$), for the source at distortion level $D$. Using that channel, $W$, we construct a polar code with frozen set $F$ defined by [18]

$$F = \Big\{ i \in \{1, \ldots, N\} : Z\big(W_N^{(i)}\big) \ge 1 - \delta_N \Big\} \qquad (28)$$

where $\delta_N = 2^{-N^\beta}$ for some $0 < \beta < 1/2$. Given the source word $\mathbf{y}$, the SC encoder applies the following scheme. For $i = 1, \ldots, N$: if $i \in F$, then $u_i$ is set to the (frozen) value $\bar{u}_i$; otherwise, $u_i$ is drawn at random according to the posterior,

$$u_i = m \quad \text{w.p. } P\big(u_i = m \,\big|\, u^{i-1}, \mathbf{y}\big), \quad m \in \{0, 1, \ldots, q-1\}.$$
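The randomized SC encoding rule can be illustrated by the following brute-force sketch (exponential in $N$, for tiny examples only; all helper names are ours, not the paper's). It draws each unfrozen $u_i$ from its exact posterior under the test channel $W$ by marginalizing over all suffixes:

```python
import itertools
import numpy as np

def polar_transform_q(u, q):
    # x = u * G_N (mod q) via the butterfly recursion.
    u = np.asarray(u) % q
    if len(u) == 1:
        return u
    return np.concatenate([
        polar_transform_q((u[0::2] + u[1::2]) % q, q),
        polar_transform_q(u[1::2], q),
    ])

def sc_source_encode(y, W, frozen, q, rng):
    """SC encoder for lossy source coding (brute force).
    y      : source word (indices into the output alphabet of W)
    W      : W[x][y'] = test-channel probability of output y' given input x
    frozen : dict {index: frozen q-ary value}
    For i in the frozen set, u_i is fixed; otherwise u_i is sampled from
    P(u_i | u_1..u_{i-1}, y)."""
    N = len(y)
    u = []
    for i in range(N):
        if i in frozen:
            u.append(frozen[i])
            continue
        post = np.zeros(q)
        for v in range(q):
            for tail in itertools.product(range(q), repeat=N - i - 1):
                x = polar_transform_q(u + [v] + list(tail), q)
                p = 1.0
                for xj, yj in zip(x, y):
                    p *= W[xj][yj]
                post[v] += p
        post /= post.sum()
        u.append(int(rng.choice(q, p=post)))
    return np.array(u)
```

A practical implementation replaces the exponential marginalization by the $O(N \log N)$ SC recursion; the sketch only makes the sampling rule explicit.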


The complexity of this scheme is $O(N \log N)$. It was shown in [18] that, for $N$ sufficiently large, the rate of the code, $R = (N - |F|)/N$, approaches $R_s(D)$. Furthermore, for any frozen vector, $\bar{u}_F$, the average distortion $D_N$ approaches $D$ under SC encoding.

In fact, using the results in [18], the statements in Section III immediately extend to the nonbinary case. Consider a polar code constructed using some discrete channel $W(y \mid x)$ with frozen set $F$ defined in (28). Suppose that $\bar{u}_F$ is chosen at random with uniform probability. Then, similarly to (8) and (9), the vector $\mathbf{u}$ produced by the SC encoder has a conditional probability distribution given by

$$Q\big(\mathbf{u} \mid \mathbf{y}\big) = \prod_{i=1}^{N} Q\big(u_i \mid u^{i-1}, \mathbf{y}\big) \qquad (29)$$

where

$$Q\big(u_i \mid u^{i-1}, \mathbf{y}\big) = \begin{cases} 1/q & \text{if } i \in F \\ P\big(u_i \mid u^{i-1}, \mathbf{y}\big) & \text{if } i \notin F. \end{cases} \qquad (30)$$

On the other hand, the conditional probability of $\mathbf{u}$ given $\mathbf{y}$ corresponding to (27) is

$$P\big(\mathbf{u} \mid \mathbf{y}\big) = \prod_{i=1}^{N} P\big(u_i \mid u^{i-1}, \mathbf{y}\big).$$

Similarly to (10) above, it was shown in [18, Lemmas 2 and 5] that the two distributions are asymptotically indistinguishable:

$$\sum_{\mathbf{u}, \mathbf{y}} \big| Q(\mathbf{u}, \mathbf{y}) - P(\mathbf{u}, \mathbf{y}) \big| \le 2^{-N^\beta}. \qquad (31)$$

Combining (31) with exactly the same arguments that were presented in Theorem 1 yields the following generalization of Theorem 1.

Theorem 4: Consider a discrete channel, $W(y \mid x)$, where $x \in \mathcal{X} = \{0, 1, \ldots, q-1\}$, $y \in \mathcal{Y}$, and where $q$ is prime. Suppose that the input random variable $X$ is uniformly distributed over $\mathcal{X}$, and denote the channel output random variable by $Y$. Let the source vector random variable $\mathbf{y}$ be created by a sequence of $N$ i.i.d. realizations of $Y$. Consider a polar code for source coding [18] with blocklength $N$ and a frozen set $F$ defined by (28) (whose rate approaches $I(X; Y)$ asymptotically). Let $\mathbf{u}$ be the random variable denoting the output of the SC encoder, and let $\mathbf{x} = \mathbf{u} G_N$. Then, for any $\delta > 0$, $0 < \beta < 1/2$, and $N$ sufficiently large,

$$\left| \frac{\big| \{ i : x_i = x, \ y_i = y \} \big|}{N} - \frac{1}{q} W(y \mid x) \right| \le \delta \quad \text{for all } x \in \mathcal{X}, \ y \in \mathcal{Y},$$

w.p. at least $1 - 2^{-N^\beta}$.

Although not needed in the sequel, Theorem 2 also generalizes to the $q$-ary case:

Theorem 5: Consider some random variable $Y$ taking values in $\mathcal{Y}$ and let $\mathcal{X} = \{0, 1, \ldots, q-1\}$, where $q$ is prime. Let the source vector random variable $\mathbf{y}$ be created by a sequence of $N$ i.i.d. realizations of $Y$. Let $d(x, y)$ be some (finite) distance measure between $x \in \mathcal{X}$ and $y \in \mathcal{Y}$. Let $W(y \mid x)$ be the test channel that achieves the symmetric rate distortion bound for distance measure $d$ and some distortion level, $D$. Consider a polar code for source coding [18] designed for $W(y \mid x)$ as described above. Let $\mathbf{x}$ be the reconstructed source codeword given $\mathbf{y}$. Then, for any $\epsilon > 0$ and $N$ sufficiently large,

$$P\big( d(\mathbf{x}, \mathbf{y}) > N(D + \epsilon) \big) \le 2^{-N^\beta}. \qquad (32)$$

The code rate approaches the symmetric rate distortion function, $R_s(D)$, for $N$ sufficiently large.

Proof: Given some $\epsilon > 0$, we set $\delta$ sufficiently small and $N$ sufficiently large. The claim then follows from Theorem 4 and the fact that if the empirical joint distribution of $(x_i, y_i)$ is $\delta$-close to $\frac{1}{q} W(y \mid x)$, for $\delta$ sufficiently small and $N$ sufficiently large, then $d(\mathbf{x}, \mathbf{y})/N \le D + \epsilon$.

When $q$ is not prime, the results in this section still apply

provided that the polarization transformation is modified as described in [14]. In each step of the transformation, instead of

$$(u_1, u_2) \mapsto (u_1 + u_2, \ u_2)$$

we use

$$(u_1, u_2) \mapsto (u_1 + \pi(u_2), \ u_2),$$

where $\pi$ is a permutation, chosen at random with uniform probability over the set of permutations of $\{0, 1, \ldots, q-1\}$.
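A minimal sketch of this modified combining step (names ours): with the identity permutation it reduces to the original map $(u_1, u_2) \mapsto (u_1 + u_2, u_2)$.

```python
import random

def combine_step(u1, u2, pi, q):
    """One polarization step for non-prime q: (u1, u2) -> (u1 + pi(u2), u2),
    with pi a permutation of {0, ..., q-1} drawn uniformly at random."""
    return ((u1 + pi[u2]) % q, u2)

q = 4
pi = list(range(q))
random.shuffle(pi)  # a fresh uniform permutation for this step
```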

B. The Generalized WOM Problem

Following [19], the generalized WOM is described by a rooted DAG, represented by its set of states (vertices) $V$ and by its set of edges $E$. The set $V$ represents the possible states of each memory cell. We say that there exists a path from state $s$ to state $t$ in the WOM, and denote it by $s \to t$, if, for some $k \ge 0$, there exist vertices $s = v_0, v_1, \ldots, v_k = t$ such that, for $l = 0, \ldots, k-1$, $v_l$ is connected to $v_{l+1}$ by an edge in $E$ (in particular, $s \to s$). The root of the DAG, which represents the initial state of the WOM, is vertex 0. While updating the WOM, only transitions from state $s$ to state $t$ where $s \to t$ are possible. As an example [19], consider the case where $V = \{0, 1, 2, 3\}$ and the edges form the chain $0 \to 1 \to 2 \to 3$. In this case, we can update a memory cell from state 0 to any other state. We can update from state 1 to either 1, 2, or 3. We can update from state 2 to either 2 or 3. A memory cell in state 3 will remain in this state forever. For two vectors $\mathbf{s}, \mathbf{t} \in V^N$, we denote $\mathbf{s} \to \mathbf{t}$ if and only if $s_l \to t_l$ for $l = 1, \ldots, N$. Furthermore, consider two random variables, $S$ and $T$, that


take values in $V$. We denote $S \to T$ if, for all $s, t \in V$, $P(S = s) > 0$ and $P(T = t \mid S = s) > 0$ implies $s \to t$.
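The path relation and the induced componentwise relation on state vectors can be checked directly. The sketch below (helper names ours) uses a chain DAG on $V = \{0, 1, 2, 3\}$ consistent with the update behavior described in the example above:

```python
def reachable(edges, s):
    """States t with s -> t: reflexive-transitive closure via DFS."""
    seen, stack = {s}, [s]
    while stack:
        v = stack.pop()
        for a, b in edges:
            if a == v and b not in seen:
                seen.add(b)
                stack.append(b)
    return seen

def wom_leq(s_vec, t_vec, edges):
    """Vector relation s -> t: cellwise s_l -> t_l for every cell l."""
    return all(t in reachable(edges, s) for s, t in zip(s_vec, t_vec))

E = [(0, 1), (1, 2), (2, 3)]  # chain 0 -> 1 -> 2 -> 3
```

For instance, `wom_leq([0, 1, 2], [3, 1, 3], E)` holds, while any update that moves a cell backward (e.g., from state 2 to state 1) is rejected.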

The capacity region of the $t$-write WOM is [3], [19]

$$\mathcal{C}_t = \Big\{ (R_1, \ldots, R_t) : R_j \le H(S_j \mid S_{j-1}), \ 1 \le j \le t, \ \text{for some } S_0 \to S_1 \to \cdots \to S_t, \ S_0 \equiv 0 \Big\} \qquad (33)$$

where $H(\cdot \mid \cdot)$ denotes conditional entropy.

Consider some set of random variables $S_0 \to S_1 \to \cdots \to S_t$, with $S_0 \equiv 0$, such that

(34)

Define

(35)

and

(36)

It follows that

(37)

and

(38)

where, for a $q$-dimensional probability vector $\mathbf{p} = (p_0, \ldots, p_{q-1})$ (i.e., $p_m \ge 0$ and $\sum_{m=0}^{q-1} p_m = 1$), the entropy function is defined by

$$h(\mathbf{p}) = -\sum_{m=0}^{q-1} p_m \log p_m$$

(in this section, the base of all the logarithms is $q$, so that code rate is measured with respect to $q$-ary information symbols).
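The base-$q$ entropy convention can be made concrete (a small sketch, names ours):

```python
import math

def h_q(p, q=None):
    """Entropy of a probability vector p, in base q (default: q = len(p)),
    so that rates are measured in q-ary information symbols."""
    q = q if q is not None else len(p)
    return -sum(pi * math.log(pi, q) for pi in p if pi > 0)
```

With this convention a uniform $q$-ary symbol carries exactly one $q$-ary information symbol of entropy.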

C. The Proposed Nonbinary Polar WOM Code

Given a set of random variables satisfying (34), with parameters defined in (35) and (36), we wish to show that we can construct a reliable polar WOM coding scheme with WOM rates that satisfy $R_j < H(S_j \mid S_{j-1})$ for $j = 1, \ldots, t$, corresponding to the capacity region (33). For that purpose, we consider the following test channels. The input set of each channel is $\{0, 1, \ldots, q-1\}$. The output set is $V \times \{0, 1, \ldots, q-1\}$. Denote the input random variable by $X$ and the output by $Y$. The probability transition function of the $j$th channel is defined by

(39)

where the additions are modulo $q$.

This channel is symmetric in the following sense [20, p. 94].

The set of outputs can be partitioned into subsets (the outputs with equal value of the state component) such that in the matrix of transition probabilities of each subset, each row (column, respectively) is a permutation of any other row (column). Hence, by [20, Th. 4.5.2], the capacity achieving distribution is the uniform distribution, and the symmetric capacity of this channel is in fact the capacity. For $X$ uniformly distributed over $\{0, 1, \ldots, q-1\}$, we obtain (see Appendix A)

(40)

For each channel $j$, we design a polar code with blocklength $N$ and frozen set of subchannels $F_j$ defined by (28). The rate is

(41)

where the gap to the symmetric capacity is arbitrarily small for $N$ sufficiently large. This code will be used as a source code.

Denote the information sequence by $a_1, \ldots, a_t$ and the

sequence of WOM states by $s_0, s_1, \ldots, s_t$. Also let $E_j$ denote the $j$th encoding function for a WOM state vector $s$ and information vector $a$. Similarly, let $D_j$ denote the $j$th decoding function for a WOM state vector $s$. Hence, $s_j = E_j(s_{j-1}, a_j)$ and $\hat{a}_j = D_j(s_j)$, where $\hat{a}_j$ is the retrieved information sequence. We define $E_j$ and $D_j$ as follows. All the additions (and subtractions) are performed modulo $q$.

Encoding function, $E_j(s_{j-1}, a_j)$:
1) Let $v = s_{j-1} + g$, where $g$ is a sample from an $N$-dimensional uniformly distributed random $q$-ary vector. The vector $g$ is a common randomness source (dither), known both to the encoder and to the decoder.
2) Form the source vector $\mathbf{y}$ from $s_{j-1}$ and $v$, and compress it using the $j$th polar code with $u_{F_j} = a_j$. This results in a vector $\mathbf{u}$ and a vector $\mathbf{x} = \mathbf{u} G_N$.
3) Finally, write $s_j = \mathbf{x} + g$.

Decoding function, $D_j(s_j)$:
1) Let $\mathbf{x} = s_j - g$.
2) $\hat{a}_j = \big(\mathbf{x} G_N^{-1}\big)_{F_j}$.
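The structure of the decoder (subtract the dither, invert $G_N$, read off the frozen coordinates) can be sketched as follows. The forward/inverse transform pair and the schematic state update are our illustration, not the paper's exact mappings:

```python
import numpy as np

def T(u, q):
    """Forward q-ary polar transform x = u * G_N (mod q)."""
    u = np.asarray(u) % q
    if len(u) == 1:
        return u
    return np.concatenate([T((u[0::2] + u[1::2]) % q, q), T(u[1::2], q)])

def T_inv(x, q):
    """Inverse transform: recover u from x = u * G_N (mod q)."""
    x = np.asarray(x) % q
    if len(x) == 1:
        return x
    h = len(x) // 2
    a, b = T_inv(x[:h], q), T_inv(x[h:], q)  # a = evens + odds, b = odds
    u = np.empty(len(x), dtype=int)
    u[1::2] = b
    u[0::2] = (a - b) % q
    return u

# Schematic round trip: write x + g, decode by subtracting the dither g
# and inverting G_N; the message sits in the frozen coordinates of u.
q, N = 3, 8
rng = np.random.default_rng(1)
u = rng.integers(0, q, N)
g = rng.integers(0, q, N)        # dither
s = (T(u, q) + g) % q            # state written on the WOM (schematic)
u_hat = T_inv((s - g) % q, q)    # decoder side
assert np.array_equal(u_hat, u)
```

The round trip shows why the decoder cannot fail: subtracting the dither and inverting the (invertible) transform always recovers $\mathbf{u}$, hence the frozen coordinates carrying the message.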

As in the binary case, the information is embedded within the set $F_j$. Hence, when considered as a WOM code, our code has rate $R_j = |F_j|/N = 1 - R_{s,j}$, where $R_{s,j}$ is the rate of the source code.

For the sake of the analysis of our proposed method, we slightly modify the coding scheme similarly to the binary case (modifications (M1)–(M4)). More precisely:

(M’1) The definition of the $j$th channel is modified such that in (39), we use slightly perturbed parameters instead of the original ones. The perturbed parameters are defined as follows:

(42)

for some $\epsilon > 0$ which will be chosen arbitrarily small. In order to obtain a valid set of parameters, we first argue that we can assume, without loss of generality, that

(43)

If the required condition (43) is not satisfied, then, since vertex 0 is the root of our DAG, we can slightly shift the probabilities such that (43) does hold (all the other transition probabilities, for


, remain the same). Suppose we can prove the theorem for the shifted parameters. That is, we assume that we can prove the theorem for rates inside the capacity region (33) defined with the shifted parameters. Then, by continuity arguments, if we make the difference between the original and shifted parameters sufficiently small, this will also prove the theorem for rates inside the capacity region defined with the original parameters.

(M’2) The encoder sets the frozen vector to the information vector plus a dither instead of the information vector itself, where the dither is uniformly distributed over the $q$-ary vectors and known both at the encoder and decoder. In this way, the assumption that the frozen vector is uniformly distributed holds. Similarly, the decoder subtracts this dither from its output.

(M’3) This modification is identical to modification (M3) above.

(M’4) Denote by $s_{j,l}$ the $l$th element of the WOM state after $j$ writes. Consider the multinomial distribution with the appropriate number of trials and the probabilities derived from (42). After we obtain $s_j$ in step 3 of the encoder for the $j$th write, we draw at random a vector from this distribution. If the drawn counts exceed the empirical ones, then we flip the corresponding number of elements in $s_j$ from 0 to the appropriate nonzero states. This modified value of $s_j$ is finally written onto the WOM.

Clearly, the complexities of the original and modified encoders and decoders are $O(N \log N)$.

Theorem 6: Consider an arbitrary information sequence $a_1, \ldots, a_t$ with rates $R_1, \ldots, R_t$ that are inside the capacity region (33) of the $q$-ary WOM. For any $0 < \beta < 1/2$ and $N$ sufficiently large, the nonbinary polar WOM coding scheme defined by $E_j$, $D_j$, and (M’1)–(M’4) can be used to write this sequence reliably over the WOM w.p. at least $1 - 2^{-N^\beta}$ in encoding and decoding complexities $O(N \log N)$.

To prove the theorem we need the following lemma.³ Consider an i.i.d. source with the following probability distribution:

(44)

Note that this source has the marginal distribution of the output of the $j$th channel defined by (39) under a symmetric input distribution.

Lemma 2: Consider a $q$-ary polar code designed for the $j$th channel defined by (39) as described above. The code has rate defined in (41), a frozen set of subchannels, $F_j$, and some frozen vector which is uniformly distributed over all $|F_j|$-dimensional $q$-ary vectors. The code is used to encode a random vector

³This lemma is formulated for the original channel with the original parameters (and not for the (M’1) modified channel with the perturbed parameters).

$\mathbf{y}$, drawn by i.i.d. sampling from the distribution (44), using the SC encoder. Denote by $\mathbf{x}$ the encoded codeword. Then, for any $\delta > 0$, $0 < \beta < 1/2$, and $N$ sufficiently large, the following holds w.p. at least $1 - 2^{-N^\beta}$ (in the following expressions, additions are modulo $q$):

(45)

Furthermore, if , then

(46)

Proof: According to Theorem 4, for $N$ large enough,

w.p. at least $1 - 2^{-N^\beta}$. Consider all possible triples of input and output values. From the definitions, we have (w.p. at least $1 - 2^{-N^\beta}$)

(47)

Furthermore, if the corresponding condition holds, then (47) can be strengthened to

(48)

In addition, using the above and the channel definition (39), we have

where here, and in the following expressions, the relevant quantities are calculated modulo $q$. Combining this with (47), we obtain

Hence,

(also calculated modulo $q$). This proves (45). Similarly, (46) is due to (48), since the former implies, from the definition of the channel, that the corresponding probabilities vanish.

We proceed to the proof of Theorem 6. We denote by capital letters the random variables corresponding to the vectors defined above.

Proof of Theorem 6: The proof is similar to the proof of Theorem 3. Therefore, we provide a shorter proof version with emphasis on the differences. As was noted in the binary case, we only need to prove successful encoding.

Recall the definitions above, and suppose that the first $j - 1$ writes were successful. Our main claim is


that under this assumption, for $\epsilon$ sufficiently small and $N$ sufficiently large, w.p. at least $1 - 2^{-N^\beta}$, the encoding will be successful. For notational simplicity, we drop the write index $j$ where possible. We also denote by $\tilde{s}$ the value of the new state before applying the procedure described in (M’4). Considering step 1 of the encoding, we see that the compressed vector, after the random permutation described in (M’3), can be considered as i.i.d. sampling of the source defined in (44). Hence, by Lemma 2 and (M’2), the compression of this vector in step 2 satisfies the following for any $\delta > 0$ and sufficiently large $N$, w.p. at least $1 - 2^{-N^\beta}$.
1) The empirical statistics of the compressed vector are within $\delta$ of those dictated by the test channel; hence, we conclude, under the above assumption, that the corresponding state transitions are legitimate WOM transitions.
2) For at most a fraction $O(\delta)$ of the components, the value differs from the typical one.

Hence, the WOM constraints are satisfied, and there are at most $O(\delta) N$ components with atypical values. Therefore, w.p. at least $1 - 2^{-N^\beta}$, the vectors $s_{j-1}$ and $\tilde{s}$ satisfy the WOM constraints and have the required empirical statistics.

Now, recalling (42), we obtain

(49)

where the equality is due to (37). Setting $\epsilon$ sufficiently small (which is possible due to (43)) yields our main claim.

From (49), we know that the vector drawn in (M’4) will indeed satisfy the required condition, w.p. at least $1 - 2^{-N^\beta}$. The proof of the theorem now follows by using induction on $j$ to conclude that (w.p. at least $1 - 2^{-N^\beta}$) the $j$th encoding is successful.

VI. SIMULATION RESULTS

To demonstrate the performance of our coding scheme for finite length codes, we performed experiments with polar WOM codes with , 12, 14, 16. Each polar code was constructed using the test channel in Fig. 1 with the appropriate parameters. To learn the frozen set of each code, we used the Monte Carlo approach that was described in [21] (which is a variant of the method proposed by Arikan [11]). Fig. 2 describes our experiments with two-write WOMs designed to maximize the average rate. Using the results in [3], we set

. Hence, by (17), . Each point in each graph was determined by averaging the results of 1000 Monte Carlo experiments. Fig. 2 (left) shows the success rate of the first write as a function of the rate loss compared to the optimum for each blocklength. Here, success is defined by a condition on the empirical statistics of the written vector. Note that even if this condition is violated, the first write always produces a valid state vector, i.e., a state vector, $s_1$, compatible with the initial state, $s_0$. The motivation for this definition follows from the proof of Theorem 3. Fig. 2 (right) shows the success rate of the second write as a function of the rate loss compared to the optimum. Here, we declare a success if the WOM constraints are satisfied, i.e., if in the second write the encoder obtains a new state vector, $s_2$, which is compatible with the current WOM state, $s_1$. As was noted in the proof of Theorem 3, this is the only requirement for success since the decoder cannot fail (in addition, the first write is always compatible with the initial state). Each experiment in the second write was performed by using a first write with a fixed rate loss compared to the optimum. For the smaller blocklengths, this first-write rate loss should be higher, but this is compensated by using higher values of the second-write rate loss. As an alternative, we could have used a higher rate loss for the smaller blocklengths, in which case the required second-write loss decreases. In terms of total rate loss, both options yielded similar results. As was noted earlier, the success rate can be increased if, upon detecting an encoding error, the encoder repeats the encoding with another dither value.

We have also experimented with a three-write WOM. We

used polar codes with , 14, 16 and set the parameters to maximize the average rate in accordance with [3]. To find the frozen set of each code, we used density evolution as described in [22] and [21]. We applied the quantized version of density evolution described in [21, Sec. 2.5.2], with a quantization bin size of 0.25 for the quantized messages. The advantage of the density evolution approach over the Monte Carlo approach is that, with a sufficiently small quantization level, it can learn the frozen set deterministically without resorting to long, time-consuming Monte Carlo simulations. For the case with the larger blocklengths, the Monte Carlo estimation required a very large number of simulations. The maximum average rate is obtained for the parameter values stated above. The actual information rates are presented in Table I, where in read/write experiments all information triples were encoded (and decoded) successfully.

Comparing our results to the recently proposed method in

[13] shows the following. We applied [13, Secs. 2.3 and 2.4], with , 0.4 ( is a parameter that controls the gap between the maximum and actual sum rate) and design rates and . The parameters, and (the blocklength), are set as described in [13]. The parameter controls the tradeoff between the blocklength, , and the encoding and decoding complexities. Setting , the encoding complexity is and the decoding complexity is . The parameter in the scheme satisfies . Now, the actual rate in the first write, , satisfies

(50)

The actual rates in the $j$th write satisfy

(51)


Fig. 2. The success probabilities of the first (left) and second (right) writes to the binary WOM as a function of the rate losses.

TABLE I: THE PERFORMANCE OF WRITE POLAR WOMS WITH , 14, 16

For ( , respectively), the method in [13] guarantees that the gap between the maximum and actual sum rate is at most ( , respectively). Moreover, by (50) and (51), it turns out that the gap is at least 0.1 (0.2, respectively). However, the blocklengths in these cases are huge ( and , respectively).

Finally, we performed experiments with generalized WOMs for the case where the DAG representing the WOM is the following. The vertices are and the edges are . We consider the case where there are two writes, i.e., $t = 2$. By [19, Proof of Theorem 3.2], the maximum total number of 3-ary information symbols that can be stored in one storage cell of the WOM is 1.6309. Furthermore, by [19, Proof of Theorem 3.2], the maximum value is achieved for the following parameters, which need to be used in (33):

Our scheme uses polar codes with , 12, 14. Using the above parameters in (39), we obtain the definition of the first test channel, and similarly that of the second test channel. We see that, given the channel input, the corresponding output components of the second test channel are statistically independent. Hence, we can simplify this channel by merging the three output symbols into one symbol. To learn the frozen set of each code, we used the Monte Carlo approach that was described in [21].

Fig. 3 presents the success rate of the first and second writes as a function of the rate losses compared to the optimal rates. This is shown for polar codes with the blocklengths above. Each point in the graph was obtained by averaging the results of 10 000 Monte Carlo experiments. In the first write, we declare a success if the fraction of “1” (“2,” respectively) in the written vector is less than


Fig. 3. The success probabilities of the first (left) and second (right) writes to the nonbinary WOM as a function of the rate losses.

or equal to the corresponding design fraction ( , respectively). In the second write, we declare a success if all the WOM constraints are satisfied. Each experiment in the second write was performed following a successful first write.

VII. CONCLUSION

We have presented a new family of WOM codes based on the recently proposed polar codes. These codes achieve the capacity region of noiseless WOMs when an arbitrary number of multiple writes is permitted. The encoding and decoding complexities scale as $O(N \log N)$, where $N$ is the blocklength. For $N$ sufficiently large, the error probability decreases subexponentially in $N$. The results apply both for binary and for generalized WOMs, described by an arbitrary DAG.

There are various directions in which our work can be generalized. The first is the design of codes for noisy WOMs. It should be noted that there are various models for noisy WOMs. The capacity region of the most general model, proposed in [3], is yet unknown. However, for certain special cases [3], [23], the maximum average rate and in some cases even the capacity region are known. The achievable rate regions of some noisy WOM models, presented in [3] and [23], are based on coding for Gelfand-Pinsker side information channels. Hence, in this case, one may wish to consider the results in [12] for polar coding over a binary side information channel, and combine them with our method.

Another possibility for further research is the consideration of other codes or decoding methods in our scheme. For example, instead of polar source codes, one may consider low-density generating-matrix codes, which were shown in the past to be useful for lossy source coding. Even if polar codes are kept in our scheme, it may be possible to improve performance by using iterative encoding combined with decimation instead of SC encoding. This is due to the fact that iterative decoding usually yields better results compared to SC decoding of polar codes

[11], [21]. One may also consider using list decoding of polar codes as proposed in [24].

Finally, all the theoretical results on polar codes, including our results for polar WOM codes, are formulated for a blocklength which is sufficiently large. It would be interesting to derive the blocklength required in order to achieve a prescribed gap between the maximum sum rate and the actual sum rate. This problem is closely related to the blocklength required to achieve a prescribed gap between the rate distortion function and the actual rate of a polar source code for some given distortion level. This was indicated as an open problem in [12, Sec. IX]. The method in [13] for WOM codes asserts a blocklength of , where is the number of writes, is the gap to capacity, and the encoding and decoding complexities are . In our experiments, our method required dramatically shorter blocklengths compared to the method in [13]. However, it would be interesting to complement these experimental results with a theoretical result on the dependence of the blocklength on the gap to capacity.

APPENDIX
DERIVATION OF (40)

For $X$ uniformly distributed over $\{0, 1, \ldots, q-1\}$ we have $P(X = x) = 1/q$ for every $x$. In addition,

where

Page 14: Polar Write Once Memory Codes

BURSHTEIN AND STRUGATSKI: POLAR WRITE ONCE MEMORY CODES 5101

Now,

Hence,

In addition,

Hence,

where the last equality is due to (38). Thus, we conclude that

and we have obtained (40).

ACKNOWLEDGMENT

The authors would like to thank the anonymous reviewers for their comments that helped to improve the presentation of the paper.

REFERENCES

[1] R. L. Rivest and A. Shamir, "How to reuse a write-once memory," Inf. Control, vol. 55, no. 1-3, pp. 1-19, 1982.
[2] E. Yaakobi, S. Kayser, P. H. Siegel, A. Vardy, and J. K. Wolf, "Codes for write-once memories," IEEE Trans. Inf. Theory, vol. 58, no. 9, pp. 5985-5999, Sep. 2012.
[3] C. Heegard, "On the capacity of permanent memory," IEEE Trans. Inf. Theory, vol. IT-31, no. 1, pp. 34-42, Jan. 1985.
[4] G. Cohen, P. Godlewski, and F. Merkx, "Linear binary code for write-once memories," IEEE Trans. Inf. Theory, vol. IT-32, no. 5, pp. 697-700, Sep. 1986.
[5] G. Zemor and G. D. Cohen, "Error-correcting WOM-codes," IEEE Trans. Inf. Theory, vol. 37, no. 3, pp. 730-734, May 1991.
[6] A. Jiang and J. Bruck, "Joint coding for flash memory storage," in Proc. IEEE Int. Symp. Inf. Theory, Toronto, ON, Canada, Jul. 2008, pp. 1741-1745.
[7] Y. Wu, "Low complexity codes for writing a write-once memory twice," in Proc. IEEE Int. Symp. Inf. Theory, Austin, TX, USA, Jun. 2010, pp. 1928-1932.
[8] Y. Wu and A. Jiang, "Position modulation code for rewriting write-once memories," IEEE Trans. Inf. Theory, vol. 57, no. 6, pp. 3692-3697, Jun. 2011.
[9] A. Shpilka, "New constructions of WOM codes using the Wozencraft ensemble," Oct. 2011, arXiv preprint arXiv:1110.6590.
[10] D. Burshtein and A. Strugatski, "Polar write once memory codes," in Proc. IEEE Int. Symp. Inf. Theory, Boston, MA, USA, Jul. 2012, pp. 1972-1976.
[11] E. Arikan, "Channel polarization: A method for constructing capacity-achieving codes for symmetric binary-input memoryless channels," IEEE Trans. Inf. Theory, vol. 55, no. 7, pp. 3051-3073, Jul. 2009.
[12] S. B. Korada and R. L. Urbanke, "Polar codes are optimal for lossy source coding," IEEE Trans. Inf. Theory, vol. 56, no. 4, pp. 1751-1768, Apr. 2010.
[13] A. Shpilka, "Capacity achieving multiwrite WOM codes," Sep. 2012, arXiv preprint arXiv:1209.1128.
[14] E. Sasoglu, E. Telatar, and E. Arikan, "Polarization for arbitrary discrete memoryless channels," in Proc. IEEE Inf. Theory Workshop, 2009, pp. 144-148.
[15] E. Arikan and E. Telatar, "On the rate of channel polarization," in Proc. IEEE Int. Symp. Inf. Theory, 2009, pp. 1493-1495.
[16] T. M. Cover and J. A. Thomas, Elements of Information Theory, 2nd ed. New York, NY, USA: Wiley, 2006.
[17] I. Csiszár and J. Körner, Information Theory: Coding Theorems for Discrete Memoryless Systems, 2nd ed. Budapest, Hungary: Akadémiai Kiadó, Dec. 1997.
[18] M. Karzand and E. Telatar, "Polar codes for q-ary source coding," in Proc. IEEE Int. Symp. Inf. Theory, Austin, TX, USA, Jun. 2010, pp. 909-912.
[19] F. W. Fu and A. J. Han Vinck, "On the capacity of generalized write-once memory with state transitions described by an arbitrary directed acyclic graph," IEEE Trans. Inf. Theory, vol. 45, no. 1, pp. 308-313, Jan. 1999.
[20] R. G. Gallager, Information Theory and Reliable Communication. New York, NY, USA: Wiley, 1968.
[21] S. B. Korada, "Polar codes for channel and source coding," Ph.D. dissertation, EPFL, Lausanne, Switzerland, 2009.
[22] R. Mori and T. Tanaka, "Performance and construction of polar codes on symmetric binary-input memoryless channels," in Proc. IEEE Int. Symp. Inf. Theory, Seoul, Korea, Jun. 2009, pp. 1496-1500.
[23] L. Wang and Y. H. Kim, "Sum-capacity of multiple-write noisy memory," in Proc. IEEE Int. Symp. Inf. Theory, Saint Petersburg, Russia, Aug. 2011, pp. 2527-2531.
[24] I. Tal and A. Vardy, "List decoding of polar codes," in Proc. IEEE Int. Symp. Inf. Theory, Saint Petersburg, Russia, Aug. 2011, pp. 1-5.

David Burshtein (M'92-SM'99) received the B.Sc. and Ph.D. degrees in electrical engineering in 1982 and 1987, respectively, from Tel-Aviv University, Tel-Aviv, Israel. During 1988-1989, he was a Research Staff Member in the Speech Recognition Group of IBM, T. J. Watson Research Center, Yorktown Heights, NY. In 1989, he joined the School of Electrical Engineering, Tel-Aviv University, where he is presently a Professor. His research interests include information theory, coding theory, codes on graphs, and iterative decoding algorithms. Dr. Burshtein serves as an Associate Editor for Coding Techniques for the IEEE TRANSACTIONS ON INFORMATION THEORY.

Alona Strugatski received the B.Sc. degree in electrical engineering and physics in 2009 from the Israel Institute of Technology (Technion). Since 2009, she has been with the Israel Defense Forces, working as an engineer. She is currently working toward the M.Sc. degree at the Department of Electrical Engineering-Systems at Tel-Aviv University, Tel-Aviv, Israel. Her research interests include codes on graphs and iterative decoding algorithms, coding theory, and information theory.