Module 4, Data Compression, LISA, NTPU
Module 4: Arithmetic Coding
Prof. Hung-Ta Pai
Reals in Binary
Any real number x in the interval [0, 1) can be represented in binary as .b1b2b3..., where each bi is a bit.
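For reference, the bits are the coefficients of the negative powers of two. The two numeric values below are standard illustrations, not taken from the slides:

$$
x = .b_1 b_2 b_3 \ldots = \sum_{i \ge 1} b_i \, 2^{-i},
\qquad \tfrac{5}{8} = \tfrac{1}{2} + \tfrac{1}{8} = .101,
\qquad \tfrac{1}{3} = .010101\ldots
$$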
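First Conversion
L := 0; R := 1; i := 1;
while x > L do *
    if x < (L+R)/2 then bi := 0; R := (L+R)/2;
    else { x ≥ (L+R)/2 } bi := 1; L := (L+R)/2;
    i := i + 1;
end{while}
bj := 0 for all j ≥ i;
* Invariant: x is always in the interval [L, R). If x has no finite binary expansion (for example, x = 1/3), the loop never terminates and produces the expansion bit by bit.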
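A minimal Python sketch of the conversion above, assuming exact rational input via `fractions.Fraction` so that the repeated halving never suffers rounding. The function name `to_binary` and the `max_bits` cap are illustrative additions, not part of the slide; the cap stops the loop for values such as 1/3 whose expansion never ends. Trailing zero bits (the `bj := 0` step) are left implicit in the returned string.

```python
from fractions import Fraction


def to_binary(x: Fraction, max_bits: int = 32) -> str:
    """Bits b1 b2 ... of x in [0, 1) by interval halving.

    Invariant: x stays in [L, R). Stops when x == L (all remaining
    bits are 0) or after max_bits bits (hypothetical safety cap).
    """
    assert 0 <= x < 1
    L, R = Fraction(0), Fraction(1)
    bits = []
    while x > L and len(bits) < max_bits:
        mid = (L + R) / 2
        if x < mid:          # x in the lower half: bit 0
            bits.append("0")
            R = mid
        else:                # x in the upper half: bit 1
            bits.append("1")
            L = mid
    return "".join(bits)
```

For example, `to_binary(Fraction(5, 8))` returns `'101'`, and `to_binary(Fraction(1, 3), 8)` returns `'01010101'`.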
Basic Ideas
Represent each string x of length n by a unique interval [L, R) in [0, 1).
The width R - L of the interval represents the probability of x occurring.
The interval [L, R) can itself be represented by any number within the half-open interval; this number is called a tag.
The first k significant bits t1t2...tk of the tag .t1t2t3... form the code of x.
That is, .t1t2t3...tk000... is in the interval [L, R).
Example
1. The tag must be in the half-open interval.
2. The tag can be chosen to be (L+R)/2.
3. The code is the significant bits of the tag.
Better Tag
Example of Codes
P(a) = 1/3, P(b) = 2/3
Code Generation from Tag
If the binary tag is .t1t2t3... = (L+R)/2 in [L, R), then we want to choose k to form the code t1t2...tk.
Short code: choose k as small as possible so that L ≤ .t1t2...tk000... < R.
Guaranteed code: choose k = ⌈log2(1/(R-L))⌉ + 1. Then L ≤ .t1t2...tkb1b2b3... < R for any bits b1b2b3..., so for fixed-length strings this provides a good prefix code. This works because 2^(-k) ≤ (R-L)/2: truncating the tag (L+R)/2 to k bits lowers it by less than 2^(-k), and appending any bits raises it again by at most 2^(-k).
Example: [.000000000..., .000010010...), tag = .000001001...
Short code: 0
Guaranteed code: 000001
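Both rules can be checked with exact arithmetic. The sketch below is a hypothetical helper set (`first_bits`, `short_code`, `guaranteed_code`), written under the assumption that L and R are given as `Fraction` values with 0 ≤ L < R ≤ 1; it reproduces the example on this slide.

```python
import math
from fractions import Fraction


def first_bits(x, k):
    """First k bits t1...tk of the binary expansion of x in [0, 1)."""
    return format(math.floor(x * 2 ** k), f"0{k}b")


def short_code(L, R):
    """Smallest k >= 1 with L <= .t1...tk000... < R, where t = (L + R) / 2."""
    tag = (L + R) / 2
    k = 1
    while not (L <= Fraction(math.floor(tag * 2 ** k), 2 ** k) < R):
        k += 1
    return first_bits(tag, k)


def guaranteed_code(L, R):
    """k = ceil(log2(1 / (R - L))) + 1, computed exactly on fractions."""
    tag, width = (L + R) / 2, R - L
    k0 = 0
    while Fraction(1, 2 ** k0) > width:   # smallest k0 with 2**-k0 <= width
        k0 += 1
    return first_bits(tag, k0 + 1)
```

With `L = Fraction(0)` and `R = Fraction(9, 256)` (that is, .000010010...), `short_code` returns `'0'` and `guaranteed_code` returns `'000001'`, matching the slide.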
Guaranteed Code Example
P(a) = 1/3, P(b) = 2/3
Guaranteed code → prefix code
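To see the guaranteed code acting as a prefix code, the sketch below computes the guaranteed codes for all strings of length 2 under P(a) = 1/3, P(b) = 2/3 and checks that no codeword is a prefix of another. The string length 2 is an assumption chosen for brevity (the slide's own table is not reproduced here), and the helper names are hypothetical. Intervals use the recursion L := L + W*C(x), R := L + W*P(x) with C(a) = 0, C(b) = 1/3.

```python
import math
from fractions import Fraction
from itertools import permutations, product

P = {"a": Fraction(1, 3), "b": Fraction(2, 3)}
C = {"a": Fraction(0), "b": Fraction(1, 3)}   # cumulative probabilities


def interval(string):
    """[L, R) for a string: narrow the interval once per symbol."""
    L, R = Fraction(0), Fraction(1)
    for x in string:
        W = R - L
        L = L + W * C[x]
        R = L + W * P[x]
    return L, R


def guaranteed_code(L, R):
    tag, width, k0 = (L + R) / 2, R - L, 0
    while Fraction(1, 2 ** k0) > width:       # k0 = ceil(log2(1 / width))
        k0 += 1
    k = k0 + 1
    return format(math.floor(tag * 2 ** k), f"0{k}b")


codes = {"".join(s): guaranteed_code(*interval(s)) for s in product("ab", repeat=2)}
# Prefix-free: no codeword is a prefix of a different codeword.
prefix_free = all(not v.startswith(u) for u, v in permutations(codes.values(), 2))
```

The resulting codes are aa → 00001, ab → 0011, ba → 0111, bb → 110, and `prefix_free` is `True`.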
Coding Algorithm
Given P(a1), P(a2), ..., P(am), let C(ai) = P(a1) + P(a2) + ... + P(ai-1).
Encode x1x2...xn:
    L := 0; R := 1;
    for i = 1 to n do
        W := R - L;
        L := L + W * C(xi);
        R := L + W * P(xi);   { uses the updated L }
    end;
    t := (L+R)/2; choose the code for the tag t.
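The encoding loop translates directly into Python. This sketch assumes probabilities are given as a dict of `Fraction` values whose insertion order fixes the order a1, a2, ..., am used for C(ai); the names `cumulative` and `encode_interval` are hypothetical, and choosing the code bits from the tag is left to the rules on the Code Generation slide.

```python
from fractions import Fraction


def cumulative(P):
    """C(a_i) = P(a_1) + ... + P(a_{i-1}), using the dict's insertion order."""
    C, acc = {}, Fraction(0)
    for symbol, p in P.items():
        C[symbol] = acc
        acc += p
    return C


def encode_interval(message, P):
    """Narrow [L, R) once per symbol and return the final interval."""
    C = cumulative(P)
    L, R = Fraction(0), Fraction(1)
    for x in message:
        W = R - L
        L = L + W * C[x]
        R = L + W * P[x]   # uses the updated L, as in the pseudocode
    return L, R


P = {"a": Fraction(1, 4), "b": Fraction(1, 2), "c": Fraction(1, 4)}
L, R = encode_interval("abca", P)
tag = (L + R) / 2
```

For the coding example abca, this gives [L, R) = [5/32, 21/128) and the tag 41/256.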
Coding Example
P(a) = 1/4, P(b) = 1/2, P(c) = 1/4
C(a) = 0, C(b) = 1/4, C(c) = 3/4
Encode: abca
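Working the coding algorithm by hand for abca, each step narrows [L, R) with W := R - L, L := L + W * C(x), R := L + W * P(x):

$$
\begin{aligned}
a&:\; W = 1, & [L, R) &= [0,\, \tfrac{1}{4})\\
b&:\; W = \tfrac{1}{4}, & [L, R) &= [\tfrac{1}{16},\, \tfrac{3}{16})\\
c&:\; W = \tfrac{1}{8}, & [L, R) &= [\tfrac{5}{32},\, \tfrac{3}{16})\\
a&:\; W = \tfrac{1}{32}, & [L, R) &= [\tfrac{5}{32},\, \tfrac{21}{128})
\end{aligned}
\qquad
t = \frac{L + R}{2} = \frac{41}{256} = .00101001
$$

Since R - L = 1/128, the guaranteed code uses k = ⌈log2 128⌉ + 1 = 8 bits, giving 00101001. The short code is 00101, because .00101000... = 5/32 = L already lies in [L, R).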
Coding Exercise
P(a) = 1/4, P(b) = 1/2, P(c) = 1/4
C(a) = 0, C(b) = 1/4, C(c) = 3/4
Encode: bbbb
Decoding
Assume the length is known to be 3. The code 0001 converts to the tag .0001000...
Decoding Algorithm
Given P(a1), P(a2), ..., P(am) and C(ai) = P(a1) + P(a2) + ... + P(ai-1), decode the code bits b1b2...bk when the number of symbols is n:
    L := 0; R := 1;
    t := .b1b2...bk000...;
    for i = 1 to n do
        W := R - L;
        find j such that L + W * C(aj) ≤ t < L + W * (C(aj) + P(aj));
        output aj;
        L := L + W * C(aj); R := L + W * P(aj);
    end;
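The decoder mirrors the encoder: it repeats the same interval narrowing and, at each step, picks the subinterval that contains the tag. This is a sketch under the same assumptions as before (exact `Fraction` arithmetic, dict order fixing C(aj), n known in advance); `decode` is a hypothetical name.

```python
from fractions import Fraction


def decode(bits, n, P):
    """Decode n symbols from the code string bits (e.g. "00101")."""
    C, acc = {}, Fraction(0)
    for symbol, p in P.items():          # C(a_j) in the dict's insertion order
        C[symbol] = acc
        acc += p
    # t = .b1 b2 ... bk 000...
    t = Fraction(int(bits, 2), 2 ** len(bits)) if bits else Fraction(0)
    L, R = Fraction(0), Fraction(1)
    out = []
    for _ in range(n):
        W = R - L
        for a in P:                      # find j with L + W*C(a_j) <= t < L + W*(C(a_j) + P(a_j))
            if L + W * C[a] <= t < L + W * (C[a] + P[a]):
                break
        else:
            raise ValueError("tag lies outside [L, R)")
        out.append(a)
        L = L + W * C[a]
        R = L + W * P[a]
    return "".join(out)
```

With P(a) = 1/4, P(b) = 1/2, P(c) = 1/4, `decode("00101", 4, P)` returns `'abca'`, assuming the decoder is told that n = 4.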
Decoding Example
P(a) = 1/4, P(b) = 1/2, P(c) = 1/4
C(a) = 0, C(b) = 1/4, C(c) = 3/4
Decode: 00101
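Assuming n = 4 symbols (the slide does not state the length here), the tag is t = .00101000... = 5/32, and each step selects the subinterval of [L, R) that contains t:

$$
\begin{aligned}
i = 1&:\; W = 1, & t &\in [0,\, \tfrac{1}{4}) &&\Rightarrow a\\
i = 2&:\; W = \tfrac{1}{4}, & t &\in [\tfrac{1}{16},\, \tfrac{3}{16}) &&\Rightarrow b\\
i = 3&:\; W = \tfrac{1}{8}, & t &\in [\tfrac{5}{32},\, \tfrac{3}{16}) &&\Rightarrow c\\
i = 4&:\; W = \tfrac{1}{32}, & t &\in [\tfrac{5}{32},\, \tfrac{21}{128}) &&\Rightarrow a
\end{aligned}
$$

The decoded string is abca, the same string as in the coding example; 00101 is its short code.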
Decoding Issues
There are two ways for the decoder to know when to stop decoding:
1. Transmit the length of the string.
2. Transmit a unique end-of-string symbol.
Practical Arithmetic Coding
Scaling:
By scaling, we can keep L and R in a reasonable range of values so that W = R - L does not underflow.
The code can be produced progressively rather than only at the end.
Scaling complicates decoding somewhat.
Integer arithmetic coding avoids floating point altogether.
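Scaling and integer arithmetic are usually combined. The sketch below follows the widely used integer scheme of Witten, Neal and Cleary (1987): L and R become integers `low` and `high`; whenever the interval lies entirely in the lower or upper half, its leading bit is emitted and the interval is doubled, and when it straddles the midpoint inside the middle half, the bit is deferred (`pending`) until it is known. The 16-bit width, the fixed frequency table, and all names are assumptions for illustration; the total frequency must not exceed 2**14 so that every symbol keeps a non-empty subinterval.

```python
PRECISION = 16                       # assumed register width in bits
FULL = (1 << PRECISION) - 1
HALF = 1 << (PRECISION - 1)
QUARTER = 1 << (PRECISION - 2)


def cumulative(freqs):
    """Map each symbol to its integer count range [lo, hi), in dict order."""
    ranges, total = {}, 0
    for symbol, f in freqs.items():
        ranges[symbol] = (total, total + f)
        total += f
    assert total <= QUARTER, "total frequency too large for this precision"
    return ranges, total


def narrow(low, high, lo, hi, total):
    """Integer version of L := L + W*C, R := L + W*P (high is inclusive)."""
    width = high - low + 1
    return low + width * lo // total, low + width * hi // total - 1


def encode(message, freqs):
    ranges, total = cumulative(freqs)
    low, high, pending, out = 0, FULL, 0, []

    def emit(bit):
        nonlocal pending
        out.append(bit)
        out.extend([1 - bit] * pending)   # settle deferred middle-half bits
        pending = 0

    for s in message:
        low, high = narrow(low, high, *ranges[s], total)
        while True:
            if high < HALF:               # lower half: next bit is 0
                emit(0)
            elif low >= HALF:             # upper half: next bit is 1
                emit(1)
                low, high = low - HALF, high - HALF
            elif low >= QUARTER and high < 3 * QUARTER:
                pending += 1              # middle half: defer the bit
                low, high = low - QUARTER, high - QUARTER
            else:
                break
            low, high = 2 * low, 2 * high + 1   # rescale so the width stays large
    pending += 1                          # flush two bits that pin the final interval
    emit(0 if low < QUARTER else 1)
    return out


def decode(bits, freqs, n):
    ranges, total = cumulative(freqs)
    stream = iter(bits)

    def next_bit():
        return next(stream, 0)            # pad with zeros past the end

    value = 0
    for _ in range(PRECISION):
        value = 2 * value + next_bit()
    low, high, out = 0, FULL, []
    for _ in range(n):
        count = ((value - low + 1) * total - 1) // (high - low + 1)
        s = next(sym for sym, (lo, hi) in ranges.items() if lo <= count < hi)
        out.append(s)
        low, high = narrow(low, high, *ranges[s], total)
        while True:                       # apply the same scalings as the encoder
            if high < HALF:
                pass
            elif low >= HALF:
                value, low, high = value - HALF, low - HALF, high - HALF
            elif low >= QUARTER and high < 3 * QUARTER:
                value, low, high = value - QUARTER, low - QUARTER, high - QUARTER
            else:
                break
            low, high = 2 * low, 2 * high + 1
            value = 2 * value + next_bit()
    return "".join(out)
```

A round trip such as `decode(encode("abracadabra", freqs), freqs, 11)` with `freqs = {"a": 5, "b": 2, "r": 2, "c": 1, "d": 1}` should return the original string. Bits leave the encoder as soon as they are settled, which is the progressive output mentioned on the slide.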
Adaptation
Simple solution: equally probable model
Initially, all symbols have frequency 1.
After symbol x is coded, increment its frequency by 1.
Use the new model for coding the next symbol.
Example in alphabet a, b, c, d
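The adaptive model can be sketched with exact fractions. This is an illustration under assumptions: the cumulative counts follow the order a, b, c, d, and `adaptive_encode` and `adaptive_decode` are hypothetical names. The key point is that encoder and decoder update the counts identically, so the model never has to be transmitted.

```python
from fractions import Fraction


def adaptive_encode(message, alphabet="abcd"):
    """Final [L, R): every count starts at 1 and is incremented after coding."""
    counts = {s: 1 for s in alphabet}
    L, R = Fraction(0), Fraction(1)
    for x in message:
        total = sum(counts.values())
        cum = sum(counts[s] for s in alphabet[: alphabet.index(x)])  # C(x) * total
        W = R - L
        L = L + W * Fraction(cum, total)
        R = L + W * Fraction(counts[x], total)
        counts[x] += 1                       # update the model after coding x
    return L, R


def adaptive_decode(t, n, alphabet="abcd"):
    """Decode n symbols from tag t, mirroring the encoder's model updates."""
    counts = {s: 1 for s in alphabet}
    L, R = Fraction(0), Fraction(1)
    out = []
    for _ in range(n):
        total = sum(counts.values())
        W = R - L
        cum = 0
        for s in alphabet:                   # find the subinterval containing t
            lo = L + W * Fraction(cum, total)
            hi = lo + W * Fraction(counts[s], total)
            if lo <= t < hi:
                break
            cum += counts[s]
        else:
            raise ValueError("tag lies outside [L, R)")
        out.append(s)
        L, R = lo, hi
        counts[s] += 1
    return "".join(out)
```

For aabd the coded probabilities are 1/4, 2/5, 1/6 and 1/7, so the final interval has width 1/420, and decoding its midpoint with n = 4 recovers aabd.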
Zero Frequency Problem
How do we weight symbols that have not occurred yet?
Equal weight? Not so good with many symbols.
Escape symbol, but what should its weight be?
When a new symbol is encountered, send <esc> followed by the symbol in the equally probable model (both encoded arithmetically).
End of File Problem
Similar to the zero frequency problem.
Reasonable solution:
Add EOF to the post-ESC equally probable model.
When done compressing, first send ESC, then send EOF.
What’s the cost of this approach?
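One way to put a number on that cost is to add up ideal code lengths, log2(1/p) bits per coded event. The sketch below rests on assumptions the slides leave open: the escape count is fixed at 1, symbols already seen use their adaptive counts, and the post-ESC model is equally probable over the whole alphabet plus EOF. `escape_cost_bits` is a hypothetical helper that returns the total and the part spent on the final ESC + EOF.

```python
import math


def escape_cost_bits(message, alphabet):
    """Ideal code length in bits of message followed by ESC, EOF.

    Assumed model: seen symbols are coded with their counts; ESC has a
    fixed count of 1; after ESC, a new symbol (or EOF) is coded in the
    equally probable model over alphabet + EOF.
    """
    counts = {}                               # frequencies of symbols seen so far
    esc = 1                                   # ASSUMPTION: escape weight fixed at 1
    flat = math.log2(len(alphabet) + 1)       # post-ESC model: |alphabet| + EOF
    bits = 0.0
    for x in message:
        total = sum(counts.values()) + esc
        if x in counts:
            bits += math.log2(total / counts[x])
        else:
            bits += math.log2(total / esc) + flat   # ESC, then the new symbol
            counts[x] = 0
        counts[x] += 1
    eof_bits = math.log2((sum(counts.values()) + esc) / esc) + flat  # ESC, then EOF
    return bits + eof_bits, eof_bits
```

Under these assumptions, ending the message ab over alphabet a, b, c, d costs log2 3 + log2 5 = log2 15, about 3.9 bits.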
Arithmetic vs. Huffman
Both compress very well. For m-symbol grouping:
Huffman is within 1/m bits per symbol of the entropy.
Arithmetic is within 2/m bits per symbol of the entropy.
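The 2/m bound follows from the guaranteed code length, assuming an i.i.d. source with entropy H(X) and one code per block x of m symbols, whose interval width R - L equals P(x):

$$
k(\mathbf{x}) = \left\lceil \log_2 \frac{1}{P(\mathbf{x})} \right\rceil + 1 < \log_2 \frac{1}{P(\mathbf{x})} + 2
\;\Rightarrow\;
\mathbb{E}[k] < H(X^m) + 2 = m\,H(X) + 2
\;\Rightarrow\;
\frac{\mathbb{E}[k]}{m} < H(X) + \frac{2}{m}.
$$

The Huffman bound comes the same way from H(X^m) ≤ E[ℓ] < H(X^m) + 1 for a Huffman code on m-symbol blocks, which leaves a per-symbol excess below 1/m.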
Context: Huffman needs a tree for every context; arithmetic needs only a small table of frequencies for every context.
Adaptation: Huffman has an elaborate adaptive algorithm; arithmetic has a simple adaptive mechanism.