Universal Information Processing

Young-Han Kim

Department of Electrical and Computer Engineering

University of California, San Diego

Pattern Recognition & Machine Learning Summer School

August

Information processing system

X → System → Y

∙ Tasks: Compression, prediction, portfolio selection, filtering, estimation, denoising, decoding, classification, closeness testing, control, . . .

∙ Algorithms:

  ▸ Causal vs. noncausal
  ▸ Deterministic vs. randomized

∙ Universality:

  ▸ Probabilistic setting: “Optimal” performance when the source distribution is unknown
  ▸ Deterministic setting: “Optimal” performance among a class of algorithms

∙ Questions:

  ▸ Existence: Does there exist a universal algorithm for a given task?
  ▸ Complexity: Is it practical to implement?
  ▸ Rate of convergence and nonasymptotic behavior: Does it perform “well” in real life?

Probabilistic setting

∙ Source:

  ▸ Parametric: X ∼ p_θ(x), θ ∈ T
  ▸ Nonparametric: i.i.d., Markov, hidden Markov, stationary ergodic, . . .

∙ Game: Nature chooses the distribution

∙ Performance: Measured by an expected cost or reward

∙ Goal: Achieve the performance of the optimal algorithm without prior knowledge of X

Example: Universal compression

X^n → Encoder → Decoder → X^n

∙ Let X = {X_i} be a stationary ergodic source over a finite alphabet X

∙ Minimum compression rate (average number of bits per source symbol):

  H(X) = lim_{n→∞} (1/n) H(X^n)

∙ Universal compression algorithms achieve H(X) for all stationary ergodic X

∙ Examples: Lempel–Ziv, Burrows–Wheeler transform, context-tree weighting

∙ These algorithms can be implemented efficiently and perform well on real data (a toy LZ78 parse is sketched below):

  ▸ LZ78: compress, GIF, and TIFF
  ▸ LZ77: gzip, PNG, and PDF
  ▸ BWT: bzip2
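
The Lempel–Ziv schemes parse the input into phrases, each extending a previously seen phrase by one new symbol; the phrase count governs the code length. As a rough illustration of the idea (not the slides' own material), a minimal LZ78-style parser in Python:

    def lz78_parse(s):
        """Parse s into LZ78 phrases: (index of longest previously
        seen prefix, next new symbol)."""
        dictionary = {"": 0}            # phrase -> index
        phrases, w = [], ""
        for c in s:
            if w + c in dictionary:
                w += c                  # keep extending the match
            else:
                phrases.append((dictionary[w], c))
                dictionary[w + c] = len(dictionary)
                w = ""
        if w:                           # flush a trailing match
            phrases.append((dictionary[w], None))
        return phrases

    # Repetitive data yields few (long) phrases, hence strong compression:
    print(lz78_parse("abababababab"))
    # [(0, 'a'), (0, 'b'), (1, 'b'), (3, 'a'), (2, 'a'), (5, 'b')]

For a stationary ergodic source, the per-symbol cost of describing the phrase list approaches the entropy rate H(X), which is the universality claim above.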

Deterministic setting

∙ Source:

  ▸ An arbitrary sequence
  ▸ No statistical assumptions (real-world applications)

∙ Game: Nature chooses the sequence (often maliciously)

∙ Performance: Measured by a cost or reward for the specific sequence

∙ Reference class of algorithms:

  ▸ Memoryless, sliding-window, finite-state, . . .
  ▸ More generally, any class of “experts”

∙ Goal: Achieve the performance of the optimal algorithm for each sequence

Example: Sequential prediction

∙ Multiple-choice exam: Guess the answers one by one

∙ True answers (A, B, C, or D) are revealed right after each guess

∙ Reference class of algorithms: Constant testers

∙ Universal prediction algorithms achieve the score of the best constant tester for every exam (a sketch in this spirit follows below)

∙ Examples: Cover’s binary prediction, the Feder–Merhav–Gutman algorithm
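
Cover's and the Feder–Merhav–Gutman schemes are randomized; as a simple stand-in (not the algorithms named above), here is an exponential-weights predictor over the four constant experts. With 0–1 losses and the usual learning rate, its expected shortfall against the best constant answer is O(√(n log 4)):

    import math, random

    def hedge_predict(answers, choices="ABCD", seed=0):
        """Randomized exponential-weights (Hedge) prediction over the
        constant experts 'always A', ..., 'always D'; returns the score."""
        rng = random.Random(seed)
        eta = math.sqrt(8 * math.log(len(choices)) / max(len(answers), 1))
        weights = {c: 1.0 for c in choices}
        score = 0
        for truth in answers:
            r, acc, guess = rng.random() * sum(weights.values()), 0.0, choices[-1]
            for c in choices:               # sample a guess ~ weights
                acc += weights[c]
                if r <= acc:
                    guess = c
                    break
            score += (guess == truth)
            for c in choices:               # penalize experts that were wrong
                weights[c] *= math.exp(-eta * (c != truth))
        return score

    exam = list("ABCA" * 25)                # 100 questions; "always A" scores 50
    print(hedge_predict(exam))              # close to 50, up to O(sqrt(n)) regret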

This tutorial

∙ Universal information processing: Get something out of nothing!

  ▸ Intersection of statistics, information theory, and learning theory
  ▸ Many different terminologies, problems, approaches, and flavors

∙ Goal: Provide a gentle overview of information-theoretic approaches

Outline

∙ Review of information measures

∙ Lossless compression and probability assignment (probabilistic / deterministic)

∙ Portfolio selection (deterministic)

∙ Sequential prediction (probabilistic)


Entropy

∙ Entropy of a discrete random variable X ∼ p(x) (pmf), X ∈ X:

  H(X) = −∑_x p(x) log p(x) = −E_X[log p(X)]

  ▸ Nonnegative and concave function of p(x)
  ▸ H(X) ≤ log |X| (by Jensen’s inequality)

∙ Binary entropy function: For p ∈ [0, 1], H(p) = −p log p − (1 − p) log(1 − p)

∙ Conditional entropy (equivocation): If X ∼ F(x) and Y | {X = x} ∼ p(y|x), then

  H(Y|X) = ∫ H(Y|X = x) dF(x) = −E_{X,Y}[log p(Y|X)]

  ▸ H(Y|X) ≤ H(Y) (with equality if X and Y are independent)
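
These quantities are one-liners to evaluate numerically. A small sketch (log base 2, so entropy is in bits):

    import math

    def entropy(p):
        """H(X) = -sum_x p(x) log2 p(x); zero-probability terms contribute 0."""
        return -sum(px * math.log2(px) for px in p if px > 0)

    def binary_entropy(p):
        return entropy([p, 1 - p])

    print(entropy([0.25] * 4))     # 2.0 bits = log2|X| for a uniform 4-ary pmf
    print(binary_entropy(0.11))    # ~0.5 bits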

Entropy

∙ Joint entropy: If (X, Y) ∼ p(x, y), then H(X, Y) = H(X) + H(Y|X) = H(Y) + H(X|Y)

∙ Chain rule:

  H(X^n) = ∑_{i=1}^n H(X_i | X^{i−1})

  ▸ H(X^n) ≤ ∑_{i=1}^n H(X_i) (with equality if X_1, X_2, . . . , X_n are independent)

∙ Entropy rate: For a stationary random process X = {X_i},

  H(X) = lim_{n→∞} (1/n) H(X^n) = lim_{n→∞} H(X_n | X^{n−1})

Relative entropy

∙ Relative entropy (Kullback–Leibler divergence) between a pair of pmfs p(x) and q(x):

  D(p‖q) = D(p(x)‖q(x)) = ∑_x p(x) log (p(x)/q(x)) = E[log (p(X)/q(X))]

  ▸ Nonnegativity: D(p‖q) ≥ 0, with D(p‖q) = 0 iff p ≡ q
  ▸ Convexity: D(p‖q) is convex in (p, q); i.e., for any (p_1, q_1), (p_2, q_2), and λ ∈ [0, 1] (writing λ̄ = 1 − λ),

    λD(p_1‖q_1) + λ̄D(p_2‖q_2) ≥ D(λp_1 + λ̄p_2 ‖ λq_1 + λ̄q_2)

  ▸ Chain rule: For any p(x, y) and q(x, y),

    D(p(x, y)‖q(x, y)) = D(p(x)‖q(x)) + ∑_x p(x) D(p(y|x)‖q(y|x))

Mutual information

∙ Mutual information between (X, Y) ∼ p(x, y):

  I(X; Y) = D(p(x, y)‖p(x)p(y))
          = ∑_{x,y} p(x, y) log (p(x, y)/(p(x)p(y)))
          = ∑_x p(x) D(p(y|x)‖p(y))   (chain rule)
          = H(X) − H(X|Y)
          = H(Y) − H(Y|X)

  ▸ Nonnegative function of p(x, y) = p(x)p(y|x)
  ▸ Concave in p(x) for fixed p(y|x) and convex in p(y|x) for fixed p(x)
  ▸ Information capacity of a channel (conditional pmf) p(y|x):

    max_{p(x)} I(X; Y) = max_{p(x)} min_{q(y)} ∑_x p(x) D(p(y|x)‖q(y))   (nonnegativity)
                       = min_{q(y)} max_{p(x)} ∑_x p(x) D(p(y|x)‖q(y))   (minimax theorem)
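
Both D and I are easy to check numerically; a small sketch for pmfs given as arrays (log base 2):

    import numpy as np

    def kl(p, q):
        """D(p||q) = sum_x p(x) log2(p(x)/q(x)); assumes q > 0 wherever p > 0."""
        p, q = np.asarray(p, float), np.asarray(q, float)
        m = p > 0
        return float(np.sum(p[m] * np.log2(p[m] / q[m])))

    def mutual_information(pxy):
        """I(X;Y) = D(p(x,y) || p(x)p(y)) for a joint pmf matrix."""
        pxy = np.asarray(pxy, float)
        px = pxy.sum(axis=1, keepdims=True)
        py = pxy.sum(axis=0, keepdims=True)
        return kl(pxy.ravel(), (px * py).ravel())

    # Binary symmetric channel with crossover 0.11 and uniform input:
    pxy = np.array([[0.445, 0.055],
                    [0.055, 0.445]])
    print(mutual_information(pxy))   # ~ 1 - H(0.11) ~ 0.5 bits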

Outline

∙ Review of information measures

∙ Lossless compression and probability assignment (probabilistic / deterministic)

∙ Portfolio selection (deterministic)

∙ Sequential prediction (probabilistic)


Lossless source codes

∙ Let X be a finite alphabet

∙ Binary source code C : X* → {0, 1}*

∙ Prefix (instantaneous) code C : X → {0, 1}*

  ▸ No codeword is a prefix of another

∙ Example:

  ▸ X = {a, b, c}, C(a) = 0, C(b) = 10, C(c) = 110
  ▸ C(acab) = 0 110 0 10

∙ Kraft inequality: l(x) = |C(x)| is the length function of a prefix code iff

  ∑_x 2^{−l(x)} ≤ 1

∙ Example: 1/2 + 1/4 + 1/8 < 1
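
A quick check of the Kraft inequality and the prefix property for the example code (the specific codewords are a reconstruction consistent with the strict inequality above):

    def kraft_sum(lengths):
        """sum_x 2^{-l(x)}; realizable by a prefix code iff <= 1."""
        return sum(2.0 ** -l for l in lengths)

    code = {"a": "0", "b": "10", "c": "110"}
    words = list(code.values())
    prefix_free = all(not v.startswith(u)
                      for u in words for v in words if u != v)

    print(kraft_sum(len(w) for w in words))   # 0.875 < 1
    print(prefix_free)                        # True
    print("".join(code[s] for s in "acab"))   # 0110010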

Average codeword length

∙ The performance of a prefix code is measured by its average codeword length

  L = E[l(X)] = ∑_x p(x) l(x)

∙ Relationship between l(x) and p(x):

  ▸ l(x) should be small for large p(x) and large for small p(x)
  ▸ The Kraft inequality suggests the correspondence

    p(x) ⇔ 2^{−l(x)},   or equivalently   log(1/p(x)) ⇔ l(x)

Average codeword length of the optimal code

Theorem

If C*(x) is a prefix code that minimizes L = E[l(X)] = ∑_x p(x) l(x), then

  H(X) ≤ L* < H(X) + 1

∙ Entropy is the fundamental limit for prefix codes (up to 1 bit)

∙ If X = {X_i} is stationary ergodic, then L*(X^n)/n → H(X) (entropy rate)

∙ Lower bound: Kraft inequality and nonnegativity of relative entropy

∙ Upper bound: Consider l(x) = ⌈log(1/p(x))⌉ < log(1/p(x)) + 1

∙ Equivalence between compression and probability assignment:

  2^{−l*(x)} ≃ p(x), or equivalently l*(x) ≃ log(1/p(x))

∙ Arithmetic coding: Efficient translation of probabilities p(x_i|x^{i−1}) into code phrases
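
The upper-bound argument can be checked directly: the Shannon lengths ⌈log(1/p(x))⌉ satisfy the Kraft inequality and come within one bit of entropy. A small sketch:

    import math

    p = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}

    H = -sum(px * math.log2(px) for px in p.values())
    lengths = {x: math.ceil(math.log2(1 / px)) for x, px in p.items()}
    L = sum(px * lengths[x] for x, px in p.items())

    print(sum(2.0 ** -l for l in lengths.values()) <= 1)   # Kraft holds
    print(H, L)            # dyadic pmf, so L = H = 1.75 bits exactly
    print(H <= L < H + 1)  # True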

Source coding with mismatch

∙ Suppose X ∼ p(x)

∙ But we design the code assuming X ∼ q(x), i.e., |C(x)| = log(1/q(x))

∙ What is the regret (redundancy) of this mismatch?

∙ Let

  L = ∑_x p(x) log (1/q(x)),   L* = ∑_x p(x) log (1/p(x)) = H(X)

  Then

  R := L − L* = ∑_x p(x) log (p(x)/q(x)) = D(p‖q) ≥ 0

Minimax redundancy

∙ Now suppose X ∼ p_θ for some unknown θ ∈ T

∙ How should we design the code (i.e., choose q(x))?

Minimax redundancy theorem

  R* = min_{q(x)} max_{θ∈T} D(p_θ‖q) = max_{F(θ)} I(Θ; X)

Moreover, the optimum is attained by the mixture

  q*(x) = ∫ p_θ(x) dF*(θ)

∙ Lagrange duality (KKT conditions)

∙ Each q(x) leads to an upper bound on R*, while each F(θ) leads to a lower bound

Minimax redundancy of Bernoulli sources

∙ Let X_1, X_2, . . . be i.i.d. ∼ Bern(θ), θ ∈ [0, 1]

∙ If θ is known, then L*(X^n) = nH(θ)

∙ What is the best performance with no prior knowledge of θ?

Theorem

  R*(X^n) = (1/2) log n + O(1)

∙ We focus on the upper bound on R*

Laplace mixture

∙ We first try Θ ∼ Unif[0, 1]

∙ Since

  p_θ(x^n) = θ^{k(x^n)} (1 − θ)^{n−k(x^n)},

  where k = k(x^n) is the number of 1s in x^n, the mixture probability is

  q_L(x^n) = ∫_0^1 θ^k (1 − θ)^{n−k} dθ = 1 / ((n + 1) (n choose k))

∙ This mixture has a simple horizon-free sequential probability assignment, namely

  q_L(1|x^n) = (k + 1)/(n + 2)

  (What is the probability that the sun will rise tomorrow morning?)
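
The closed form can be verified by chaining the conditionals: multiplying the add-one estimates (k + 1)/(n + 2) along any sequence with k ones reproduces 1/((n + 1)(n choose k)). A small check:

    from math import comb

    def laplace_prob(x):
        """q_L(x^n) as a product of add-one conditionals (k+1)/(n+2)."""
        q, k = 1.0, 0
        for n, bit in enumerate(x):
            p1 = (k + 1) / (n + 2)       # Laplace's rule of succession
            q *= p1 if bit else 1 - p1
            k += bit
        return q

    x = [1, 0, 1, 1, 0, 1]               # n = 6, k = 4
    n, k = len(x), sum(x)
    print(laplace_prob(x))               # 0.009523... = 1/105
    print(1 / ((n + 1) * comb(n, k)))    # identical: 1/(7 * 15)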

Laplace mixture

∙ Corresponding redundancy:

  R = max_θ D(p_θ‖q_L)
    = max_θ ∑_{x^n} p_θ(x^n) log (p_θ(x^n)/q_L(x^n))
    = max_θ ∑_{x^n} θ^k (1 − θ)^{n−k} log (θ^k (1 − θ)^{n−k} (n choose k) (n + 1))
    = max_θ { ∑_{k=0}^n (n choose k) θ^k (1 − θ)^{n−k} log (θ^k (1 − θ)^{n−k} (n choose k)) } + log(n + 1)
    = max_θ [−H(Binom(n, θ))] + log(n + 1)
    ≤ log(n + 1)

Jeffreys–Krichevsky–Trofimov mixture

∙ We now try Θ ∼ Beta(1/2, 1/2), i.e., f(θ) = 1/(π√(θ(1 − θ)))

∙ Then, by Stirling’s approximation,

  q_JKT(x^n) = ∫_0^1 θ^k (1 − θ)^{n−k} f(θ) dθ ≥ (1/(2√n)) (k/n)^k ((n − k)/n)^{n−k}

∙ Corresponding redundancy:

  R ≤ log(2√n) = (1/2) log n + 1

  (Essentially optimal!)

∙ Horizon-free sequential probability assignment:

  q_JKT(1|x^n) = (k + 1/2)/(n + 1)
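
As with the Laplace mixture, the KT conditionals chain to the mixture probability, and the pointwise overhead log(1/q_JKT(x^n)) − min_θ log(1/p_θ(x^n)) stays below (1/2) log n + 1. A brute-force check over all binary sequences of a modest length (log base 2):

    from itertools import product
    from math import log2

    def kt_prob(x):
        """q_JKT(x^n) via the KT conditionals (k + 1/2)/(n + 1)."""
        q, k = 1.0, 0
        for n, bit in enumerate(x):
            p1 = (k + 0.5) / (n + 1)
            q *= p1 if bit else 1 - p1
            k += bit
        return q

    def ml_prob(x):
        """max_theta p_theta(x^n) = (k/n)^k ((n-k)/n)^(n-k)."""
        n, k = len(x), sum(x)
        return (k / n) ** k * ((n - k) / n) ** (n - k)   # 0.0**0 == 1.0 in Python

    n = 12
    worst = max(log2(ml_prob(x) / kt_prob(x))
                for x in product([0, 1], repeat=n))
    print(worst, 0.5 * log2(n) + 1)   # ~2.63 <= ~2.79: the bound holds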

Extensions

∙ Nonbinary alphabet: For an m-ary memoryless source,

  R* ≃ ((m − 1)/2) log n,

  achieved by Θ ∼ Dirichlet(1/2, 1/2, . . . , 1/2)

∙ Memory:

  ▸ There exists q(x^n) such that

    lim_{n→∞} (1/n) D(p(x^n)‖q(x^n)) = 0

    for every stationary ergodic p(x^n)!

  ▸ Arbitrarily slow convergence

Universal compression of individual sequences

∙ Let {p_θ : θ ∈ T} be a collection of codes (probability assignments)

∙ Let q(x^n) be our code

∙ Regret for a given sequence x^n:

  R(q, x^n) = log (1/q(x^n)) − min_θ log (1/p_θ(x^n))

∙ Minimax regret:

  R* = min_{q(x^n)} max_{x^n} R(q, x^n) = min_{q(x^n)} max_θ max_{x^n} log (p_θ(x^n)/q(x^n))

∙ Note: Minimax redundancy in the probabilistic setting:

  R* = min_{q(x^n)} max_θ D(p_θ‖q) = min_{q(x^n)} max_θ ∑_{x^n} p_θ(x^n) log (p_θ(x^n)/q(x^n))

Normalized maximum likelihood code

∙ Normalized maximum likelihood (NML) code:

  q_NML(x^n) ∝ max_θ p_θ(x^n),   i.e.,   q_NML(x^n) = max_θ p_θ(x^n) / ∑_{y^n} max_θ p_θ(y^n)

∙ Corresponding regret:

  R(q_NML, x^n) = log (1/q_NML(x^n)) − min_θ log (1/p_θ(x^n)) = log ∑_{y^n} max_θ p_θ(y^n)

∙ Equalizer: The regret is independent of x^n!

Theorem

  R* = R(q_NML, x^n)

∙ q_NML is horizon-dependent and difficult to make sequential

Example: Memoryless codes

∙ Let X = {0, 1} and p_θ(x^n) = θ^{k(x^n)} (1 − θ)^{n−k(x^n)} (memoryless codes)

∙ Normalized maximum likelihood code:

  q_NML(x^n) = max_θ p_θ(x^n) / ∑_{y^n} max_θ p_θ(y^n)
             = (k(x^n)/n)^{k(x^n)} ((n − k(x^n))/n)^{n−k(x^n)} / ∑_{y^n} (k(y^n)/n)^{k(y^n)} ((n − k(y^n))/n)^{n−k(y^n)}

∙ Minimax regret:

  R* = log ∑_{y^n} (k(y^n)/n)^{k(y^n)} ((n − k(y^n))/n)^{n−k(y^n)}
     = log ∑_{k=0}^n (n choose k) (k/n)^k ((n − k)/n)^{n−k}
     ≈ (1/2) log (πn/2)
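
The normalizer (the Shtarkov sum) is simple to evaluate, and for moderate n it already matches the (1/2) log(πn/2) asymptotics; a small numerical check (log base 2):

    from math import comb, log2, pi

    def nml_regret(n):
        """R* = log2 sum_k C(n,k) (k/n)^k ((n-k)/n)^(n-k)."""
        s = sum(comb(n, k) * (k / n) ** k * ((n - k) / n) ** (n - k)
                for k in range(n + 1))
        return log2(s)

    for n in [10, 100, 1000]:
        print(n, nml_regret(n), 0.5 * log2(pi * n / 2))
    # the two columns agree to within a fraction of a bit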

Example: Memoryless codes

∙ As in the probabilistic setting, we now try the mixture approach

∙ Recall

  q_JKT(x^n) ≥ (1/(2√n)) (k/n)^k ((n − k)/n)^{n−k}

∙ Note the maximum likelihood estimate θ̂ = k/n:

  max_θ θ^k (1 − θ)^{n−k} = (k/n)^k ((n − k)/n)^{n−k}

∙ Corresponding redundancy:

  R ≤ (1/2) log n + 1

  (Again essentially optimal!)

Summary and extensions

∙ Normalized regret (in both the probabilistic and deterministic settings):

  lim_{n→∞} R/n = lim_{n→∞} (log n)/(2n) = 0

∙ Duality between the probabilistic and deterministic settings:

  memoryless sources ↔ constant codes

∙ Jeffreys–Krichevsky–Trofimov mixture:

  ▸ Essentially optimal redundancy (up to a constant)
  ▸ Sequential (causal) probability assignment (arithmetic coding)

∙ Nonbinary alphabet: R* ≃ ((m − 1)/2) log n

∙ There exists q(x^n) such that

  lim_{n→∞} max_{x^n} (1/n) log (p(x^n)/q(x^n)) = 0

  for all finite-state probability assignments p(x^n)!

Outline

∙ Review of information measures

∙ Lossless compression and probability assignment (probabilistic / deterministic)

∙ Portfolio selection (deterministic)

∙ Sequential prediction (probabilistic)

Investment in a stock market

∙ Consider a stock market with m stocks

∙ Let x^n denote an arbitrary sequence of price relative vectors

∙ Let a_i(x^{i−1}), i = 1, 2, . . ., be a sequence of portfolios (investment strategies)

∙ Wealth at time n:

S_n(a, x^n) = ∏_{i=1}^n a_i^T(x^{i−1}) x_i = ∏_{i=1}^n ∑_{j=1}^m a_{ij}(x^{i−1}) x_{ij}

∙ Minimax regret:

R* = min_b max_{x^n} max_a [log S_n(a, x^n) − log S_n(b, x^n)] = min_b max_a max_{x^n} log (S_n(a, x^n)/S_n(b, x^n))
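The wealth recursion is easy to simulate. Below is a minimal numerical sketch (mine, not from the slides): it draws hypothetical price relatives (end-of-day price divided by start-of-day price), computes the wealth S_n(a, x^n) of a constant rebalanced portfolio, and grid-searches the best constant portfolio in hindsight. All data, seeds, and weights are made up for illustration.

import numpy as np

rng = np.random.default_rng(0)
n, m = 10, 2                        # horizon and number of stocks (illustrative)
x = rng.uniform(0.5, 1.5, (n, m))   # hypothetical price relative vectors x_1, ..., x_n

def wealth(a, x):
    # S_n(a, x^n) = prod_i a^T x_i for a constant rebalanced portfolio a
    return float(np.prod(x @ a))

print(wealth(np.array([0.3, 0.7]), x))

# Best constant rebalanced portfolio in hindsight, max_a S_n(a, x^n),
# found by a grid search over the two-stock simplex
grid = np.linspace(0.0, 1.0, 1001)
print(max(wealth(np.array([b, 1.0 - b]), x) for b in grid))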

Constant rebalanced portfolios

∙ Reference class of portfolios: a_i(x^{i−1}) ≡ a, i = 1, 2, . . .

∙ Minimax regret:

R* = min_b max_a max_{x^n} log (S_n(a, x^n)/S_n(b, x^n))
   = log [ ∑_{n_1+n_2+⋯+n_m=n} C(n; n_1, n_2, . . . , n_m) ∏_{i=1}^m (n_i/n)^{n_i} ]
   ≃ ((m − 1)/2) log (n/2π) + log (Γ(1/2)^m / Γ(m/2))

(the minimax regret for compression of m-ary sequences!)

∙ For m = 2,

R* = log ∑_{j=0}^n C(n, j) (j/n)^j ((n − j)/n)^{n−j} ≃ (1/2) log (πn/2)
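As a quick sanity check (my own, not on the slides), the exact m = 2 minimax regret can be computed directly from the sum above and compared with the asymptote (1/2) log(πn/2); logs are taken base 2, so regret is in bits.

from math import comb, log2, pi

def minimax_regret(n):
    # R* = log sum_j C(n, j) (j/n)^j ((n-j)/n)^(n-j); 0.0**0 == 1.0 covers j = 0, n
    total = sum(comb(n, j) * (j / n) ** j * ((n - j) / n) ** (n - j)
                for j in range(n + 1))
    return log2(total)

for n in (10, 100, 1000):
    print(n, minimax_regret(n), 0.5 * log2(pi * n / 2))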

Probability assignment and portfolio

∙ Each probability assignment p(y^n) on {1, 2, . . . ,m}^n induces a portfolio

a_{i y_i}(x^{i−1}) = ∑_{y^{i−1}} p(y^i) x(y^{i−1}) / ∑_{y^{i−1}} p(y^{i−1}) x(y^{i−1})

where x(y^n) = ∏_{i=1}^n x_{i y_i}

∙ By telescoping,

S_n(a, x^n) = ∑_{y^n} p(y^n) x(y^n)

∙ Fund of funds: Each p(y^n) invests all money in stock y_i on day i

∙ Conversely, each portfolio a induces a probability assignment p(y^n): a ⇔ p, b ⇔ q

∙ Thus, the question boils down to choosing the right q(y^n)
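The telescoping identity can be verified by brute force on a tiny market. The sketch below (my construction, with an i.i.d. Bernoulli assignment and hypothetical price relatives) builds the induced portfolio day by day and checks that its wealth equals ∑_{y^n} p(y^n) x(y^n).

import itertools
import numpy as np

rng = np.random.default_rng(1)
n, m = 4, 2                          # small enough to enumerate {0,1}^n
x = rng.uniform(0.5, 1.5, (n, m))    # hypothetical price relatives
theta = 0.3                          # i.i.d. assignment: p(y_i = 1) = theta

def p(y):                            # p(y^k) for a prefix y
    return np.prod([theta if yi == 1 else 1.0 - theta for yi in y])

def xprod(y):                        # x(y^k) = prod_i x_{i, y_i}
    return np.prod([x[i, yi] for i, yi in enumerate(y)])

wealth = 1.0
for i in range(n):
    # induced portfolio: a_{i,j} proportional to sum_{y^{i-1}} p(y^{i-1}, j) x(y^{i-1})
    num = np.zeros(m)
    for y in itertools.product(range(m), repeat=i + 1):
        num[y[-1]] += p(y) * xprod(y[:-1])
    a = num / num.sum()
    wealth *= a @ x[i]

direct = sum(p(y) * xprod(y) for y in itertools.product(range(m), repeat=n))
print(wealth, direct)                # the two agree up to rounding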

Minimax portfolio

∙ Let q_NML(y^n) be the normalized maximum likelihood pmf

∙ For m = 2 (with k(x^n) the number of ones in x^n),

q_NML(x^n) = (k(x^n)/n)^{k(x^n)} ((n − k(x^n))/n)^{n−k(x^n)} / ∑_{y^n} (k(y^n)/n)^{k(y^n)} ((n − k(y^n))/n)^{n−k(y^n)}

∙ Then

max_{x^n} R(b_NML, x^n) = max_{x^n} max_a log (S_n(a, x^n)/S_n(b_NML, x^n))
  = max_{x^n} max_p log (∑_{y^n} p(y^n) x(y^n) / ∑_{y^n} q_NML(y^n) x(y^n))
  ≤ max_p max_{y^n} log (p(y^n)/q_NML(y^n))
  ≃ (1/2) log (πn/2)
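Since q_NML(x^n) depends on x^n only through k(x^n), it is cheap to compute exactly for small n. A short sketch (mine; the function names are illustrative):

from math import comb

def q_nml(n):
    # q_NML(x^n) as a function of k = k(x^n), the number of ones in x^n
    ml = lambda k: (k / n) ** k * ((n - k) / n) ** (n - k)  # maximized likelihood
    Z = sum(comb(n, k) * ml(k) for k in range(n + 1))       # normalizer (Shtarkov sum)
    return lambda k: ml(k) / Z

q = q_nml(10)
print(sum(comb(10, k) * q(k) for k in range(11)))           # sums to 1 over {0,1}^10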

Horizon-free portfolios

∙ Laplace mixture:

R(b_L, x^n) = log C(m + n − 1, m − 1)

∙ Jeffreys–Krichevsky–Trofimov mixture:

R(b_JKT, x^n) = ((m − 1)/2) log n + log (Γ(1/2)^m / Γ(m/2)) + o(1)

(Essentially optimal!)

∙ For any mixture q(y^n) = ∫ ∏_{i=1}^n p(y_i) dF(p),

b_{i y_i} = ∑_{y^{i−1}} q(y^i) x(y^{i−1}) / ∑_{y^{i−1}} q(y^{i−1}) x(y^{i−1}) = ∫ p(y_i) S_{i−1}(p, x^{i−1}) dF(p) / ∫ S_{i−1}(p, x^{i−1}) dF(p)

∙ Fund of funds:

S_n(b, x^n) = ∫ S_n(p, x^n) dF(p)
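For m = 2, the mixture integral can be discretized on a grid over the simplex. The sketch below (my own, under hypothetical market data) runs the Dirichlet(1/2, 1/2) mixture portfolio, where each day's portfolio is the wealth-weighted average of the constant rebalanced portfolios (b, 1 − b), and reports the regret against the best constant portfolio in hindsight.

import numpy as np

rng = np.random.default_rng(2)
n = 50
x = rng.uniform(0.5, 1.5, (n, 2))      # hypothetical price relatives, m = 2

bs = np.linspace(0.001, 0.999, 999)    # grid over portfolios (b, 1 - b)
w = 1.0 / np.sqrt(bs * (1.0 - bs))     # Dirichlet(1/2, 1/2) (Jeffreys) weights
w /= w.sum()
S = np.ones_like(bs)                   # running wealths S_{i-1}(p, x^{i-1})
wealth = 1.0
for i in range(n):
    b1 = np.sum(w * S * bs) / np.sum(w * S)        # wealth-weighted average portfolio
    wealth *= b1 * x[i, 0] + (1.0 - b1) * x[i, 1]
    S *= bs * x[i, 0] + (1.0 - bs) * x[i, 1]

best = max(float(np.prod(x @ np.array([b, 1.0 - b]))) for b in bs)
print(np.log2(best / wealth))          # regret vs. best CRP, about (1/2) log n + const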

Outline

∙ Review of information measures

∙ Lossless compression and probability assignment (probabilistic / deterministic)

∙ Portfolio selection (deterministic)

∙ Sequential prediction (probabilistic)

Sequential prediction

∙ Observation: x^n = (x_1, x_2, . . . , x_n), x_i ∈ X

∙ Action: a^n = (a_1, a_2, . . . , a_n), a_i ∈ A

∙ Loss function: l : X × A → [0, ∞)

∙ Cumulative loss: l(x^n, a^n) = ∑_{i=1}^n l(x_i, a_i)

∙ Deterministic strategy: a_i(x^{i−1}, a^{i−1}) = a_i(x^{i−1}), i ∈ [1 : n]

∙ Randomized strategy: F(a_i | x^{i−1}, a^{i−1}), i ∈ [1 : n]

∙ Examples:

é Probability assignment (log loss): a ∈ P(X) and l(x, a) = − log a(x)

é Portfolio selection: a ∈ P(X) and l(x, a) = − log a^T x
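A generic sequential-prediction loop under this setup is a few lines of code. The sketch below (mine) uses the log-loss example: each action is a pmf on X, produced here by an add-half (KT) estimate; the i.i.d. source and alphabet size are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(3)
A = 4                                    # alphabet size |X| (illustrative)
p = np.array([0.1, 0.2, 0.3, 0.4])       # source, unknown to the predictor
xs = rng.choice(A, size=2000, p=p)

def strategy(past):                      # a_i(x^{i-1}): add-half (KT) estimate
    counts = np.bincount(past, minlength=A) + 0.5
    return counts / counts.sum()

loss = 0.0
for i, xi in enumerate(xs):
    a = strategy(xs[:i])                 # action is a pmf on X
    loss += -np.log2(a[xi])              # log loss l(x, a) = -log a(x)
print(loss / len(xs))                    # per-symbol cumulative loss
print(-(p * np.log2(p)).sum())           # entropy H(X) for comparison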

Bayes envelope

∙ Let X ∼ p(x)

∙ Bayes envelope:

U(X) = U(p) = inf_a E[l(X, a)]

é Concave in p(x) (since it is the infimum over linear functions of p(x))

∙ Bayes response a*(X) = a*(p)

é Any a that attains U(X)

é Assume without loss of generality deterministic

∙ Example

é For a ∈ P(X) and l(x, a) = − log a(x), U(X) = H(X), which is attained by a* = p

∙ Every admissible strategy is a Bayes response to some q(x)

∙ Conversely, for every q(x^n), a_i(x^{i−1}) = a*(q(x_i | x^{i−1})) is a valid prediction strategy
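The log-loss example is easy to verify numerically: the expected loss of any action a is H(p) + D(p‖a) ≥ H(p), so the envelope is the entropy, attained at a* = p. A small check (my own, with an arbitrary pmf):

import numpy as np

rng = np.random.default_rng(4)
p = np.array([0.5, 0.3, 0.2])

def expected_log_loss(p, a):
    # E_p[l(X, a)] with l(x, a) = -log a(x)
    return float(-(p * np.log2(a)).sum())

print(expected_log_loss(p, p))           # U(p) = H(X), attained by a* = p
for _ in range(3):
    a = rng.dirichlet(np.ones(3))        # random suboptimal actions
    print(expected_log_loss(p, a))       # = H(p) + D(p||a) >= H(p)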

Generalized divergence

∙ Suppose X ∼ p(x)

∙ Let a*(q) be the Bayes response to X ∼ q(x)

∙ Regret:

E_p[l(X, a*(q)) − l(X, a*(p))] = E_p[l(X, a*(q))] − U(p) =: Δ(p‖q)

∙ Δ(p‖q) ≥ 0 is referred to as the generalized divergence

Lemma
Δ(p‖q) ≤ l_max √(2(ln 2) D(p‖q)), where l_max = max_{a,x} l(x, a)

∙ Proof idea: Pinsker's inequality, ∑_x |p(x) − q(x)| ≤ √(2(ln 2) D(p‖q))
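A numerical spot-check of the lemma (my own setup, not from the slides): a random bounded loss table over a small alphabet, with the Bayes response found by enumerating a finite action set; the KL divergence is taken in bits to match the (ln 2) factor.

import numpy as np

rng = np.random.default_rng(5)
nx, na = 4, 8
l = rng.uniform(0.0, 1.0, (na, nx))      # random bounded loss table, l_max <= 1

def bayes_action(r):
    return int(np.argmin(l @ r))         # a*(r) minimizes E_r[l(X, a)]

for _ in range(5):
    p = rng.dirichlet(np.ones(nx))
    q = rng.dirichlet(np.ones(nx))
    delta = l[bayes_action(q)] @ p - l[bayes_action(p)] @ p   # Delta(p||q)
    D = float((p * np.log2(p / q)).sum())                     # KL divergence in bits
    print(delta <= l.max() * np.sqrt(2.0 * np.log(2) * D))    # lemma holds: True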

Consistency of the plug-in strategy

Theorem
If lim_{n→∞} (1/n) D(p(x^n)‖q(x^n)) = 0 for every p(x^n) ∈ P, then

lim_{n→∞} (1/n) Δ(p(x^n)‖q(x^n)) = 0

for every p(x^n) ∈ P

∙ Example (multiple choice exam):

é For Θ ∼ Dirichlet(1/2, 1/2, 1/2, 1/2), (1/n) D(p(x^n)‖q(x^n)) = O((log n)/n)

é R = O(√((log n)/n)) → 0 is achievable

é In other words, the plug-in strategy can track optimal performance

é Can we do better?
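The sketch below is my reconstruction of the multiple-choice flavor of this example: the plug-in strategy estimates q(x_i | x^{i−1}) with Dirichlet(1/2, 1/2, 1/2, 1/2) pseudocounts and plays the Bayes response under a 0-1 loss (guess the most likely answer); the loss function and source distribution are my assumptions for illustration. Its per-symbol loss approaches the Bayes envelope U(p) = 1 − max_x p(x).

import numpy as np

rng = np.random.default_rng(6)
p = np.array([0.1, 0.2, 0.3, 0.4])       # unknown answer distribution (illustrative)
xs = rng.choice(4, size=5000, p=p)

counts = np.full(4, 0.5)                 # Dirichlet(1/2, 1/2, 1/2, 1/2) pseudocounts
loss = 0
for xi in xs:
    a = int(np.argmax(counts))           # Bayes response to the plug-in estimate
    loss += int(a != xi)                 # 0-1 loss: a wrong guess costs 1
    counts[xi] += 1
print(loss / len(xs), 1.0 - p.max())     # plug-in loss vs. Bayes envelope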

Concluding remarks

∙ Compression = probability assignment

∙ Investment ≃ probability assignment

∙ “Good” probability assignment ⇒ “good” prediction algorithm

∙ JKT/Dirichlet mixture: Best asymptotic probability assignment

∙ We can actually do better than the plug-in strategy

∙ Regrets for probabilistic and deterministic settings can be different

∙ Many more tasks, approaches, and algorithms

To learn more

∙ Cover and Thomas (2006), Elements of Information Theory, 2nd ed., Wiley

∙ Cesa-Bianchi and Lugosi (2006), Prediction, Learning, and Games, Cambridge University Press

∙ Merhav and Feder (1998), “Universal prediction,” IEEE Trans. Inf. Theory

∙ Willems, “Lossless source coding algorithms,” IEEE Int. Symp. Inf. Theory
