Theory and Practice of Differential EntropyEstimation
Yanjun Han (Stanford EE)
Joint work with:
- Weihao Gao (UIUC ECE)
- Jiantao Jiao (Stanford EE)
- Tsachy Weissman (Stanford EE)
- Yihong Wu (Yale Stats)
July 17, 2018
Outline

- Introduction
  - Problem Setup
  - Related Works
- Theory: Optimal Estimation
  - Estimator Construction
  - Estimator Analysis
- Practice: Adaptive Estimation
  - Idea of Nearest Neighbor
  - Estimator Analysis
  - Numerical Results
- Conclusion
Motivation
Information-theoretic measures:

- entropy H(X)
- mutual information I(X; Y)
- Kullback–Leibler divergence D(P‖Q)

Subroutine for many fields and applications:

- machine learning: classification, clustering, feature selection
- causal inference: network flow
- sociology
- computational biology
- ...
Problem Formulation
Problem:

- let f be a continuous density supported on [0, 1]^d, belonging to some function class F
- observe X^n = (X_1, …, X_n) i.i.d. ∼ f
- estimate the differential entropy of f based on X^n:

  h(f) = ∫_{[0,1]^d} −f(x) log f(x) dx

Target: characterize the minimax risk

  inf_ĥ sup_{f ∈ F} E_f |ĥ(X^n) − h(f)|
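To make the setup concrete, here is a minimal baseline sketch (not the estimator developed in these slides): a naive histogram plug-in that bins the samples on [0, 1]^d, forms an empirical density, and substitutes it into the entropy integral. The function name and bin count are illustrative choices.

```python
import numpy as np

def histogram_entropy(samples, num_bins):
    """Naive plug-in estimate of h(f) for samples on [0, 1]^d.

    Partitions [0, 1]^d into num_bins^d equal cells, estimates the
    density as (cell probability) / (cell volume), and plugs it into
    h(f) = -sum over cells of vol * fhat * log fhat.
    """
    samples = np.atleast_2d(samples)
    n, d = samples.shape
    edges = [np.linspace(0.0, 1.0, num_bins + 1)] * d
    counts, _ = np.histogramdd(samples, bins=edges)
    p = counts.ravel() / n                 # empirical cell probabilities
    p = p[p > 0]                           # empty cells contribute 0
    cell_volume = (1.0 / num_bins) ** d
    # -sum vol * (p/vol) * log(p/vol) simplifies to -sum p * log(p/vol)
    return float(-np.sum(p * np.log(p / cell_volume)))

rng = np.random.default_rng(0)
x = rng.uniform(size=(5000, 1))            # uniform on [0, 1]: true h(f) = 0
h_hat = histogram_entropy(x, num_bins=20)
print(h_hat)
```

Such plug-in estimates are consistent here but, as the rest of the slides show, bias control is the real difficulty once f is not bounded away from zero.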
Choice of Function Class
Hölder ball H^s_d(L)

- 0 < s ≤ 1: |f(x) − f(y)| ≤ L‖x − y‖^s
- 1 < s ≤ 2: ‖∇f(x) − ∇f(y)‖ ≤ L‖x − y‖^{s−1}
- intuition: ‖f^{(s)}‖_∞ ≤ L

Lipschitz ball (or Besov ball) Lip^s_{p,d}(L)

- definition: for any t ∈ R^d,

  ‖f(· + t) + f(· − t) − 2f(·)‖_p ≤ L‖t‖^s

- intuition: ‖f^{(s)}‖_p ≤ L
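A quick numerical sanity check of the Hölder condition, using an example not from the slides: f(x) = √x on [0, 1] lies in H^{1/2}_1(1), since (√x − √y)² = x + y − 2√(xy) ≤ |x − y|.

```python
import numpy as np

# f(x) = sqrt(x) satisfies |f(x) - f(y)| <= L * |x - y|^s
# with s = 1/2 and L = 1; check the ratio over random pairs.
rng = np.random.default_rng(1)
x, y = rng.uniform(size=(2, 100_000))
ratio = np.abs(np.sqrt(x) - np.sqrt(y)) / np.abs(x - y) ** 0.5
worst = ratio.max()
print(worst)  # stays at or below L = 1, approaching 1 as min(x, y) -> 0
```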
Parameters
Reminder of important parameters:

- n: sample size
- d: dimension of the support of f
- s ∈ (0, 2]: smoothness parameter of F
Nonparametric Functional Estimation
General Problem
Given X_1, …, X_n ∼ f, we would like to estimate a functional of the form

  I(f) = ∫ w(f(x)) dx

Example

- quadratic functional: I(f) = ∫ f(x)² dx
- cubic functional: I(f) = ∫ f(x)³ dx
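For the quadratic functional, a standard pairwise (U-statistic) kernel estimate can be sketched in a few lines; this one-dimensional box-kernel version is illustrative and not taken from the slides. The pairwise average is exactly unbiased for ∫ f (f ∗ K_h), which approaches ∫ f² as h → 0.

```python
import numpy as np

def quadratic_functional_estimate(x, h):
    """U-statistic estimate of I(f) = ∫ f(x)^2 dx in one dimension.

    With the box kernel K_h(u) = 1/(2h) on |u| <= h, the statistic
    (1/(n(n-1))) * sum_{i != j} K_h(X_i - X_j)
    has expectation ∫ f(x) (f * K_h)(x) dx, close to ∫ f^2 for small h.
    """
    n = x.size
    diff = np.abs(x[:, None] - x[None, :])
    count = (diff <= h).sum() - n          # drop the diagonal (i == j)
    return count / (n * (n - 1) * 2 * h)

rng = np.random.default_rng(2)
x = rng.uniform(size=2000)                 # uniform on [0, 1]: ∫ f^2 = 1
i_hat = quadratic_functional_estimate(x, h=0.01)
print(i_hat)   # close to 1, with a small boundary-induced bias
```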
Smooth Functional
- quadratic functional: elbow effect

Theorem (Bickel–Ritov '88)

  inf_Î sup_{f ∈ H^s_d} E_f |Î − I(f)| ≍ n^{−4s/(4s+d)} + n^{−1/2}

- cubic functional: the same result, with a much more involved estimator construction (Kerkyacharian–Picard '96, Tchetgen et al. '08)
- smooth functional: reduce to linear, quadratic and cubic pieces via Taylor expansion (Mukherjee–Newey–Robins '17)
- almost nothing is known for nonsmooth functionals
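The elbow can be read off the two exponents: the parametric term n^{−1/2} dominates exactly when 4s/(4s+d) ≥ 1/2, i.e. s ≥ d/4. A tiny illustrative check (helper name assumed):

```python
def dominant_term(s, d):
    """Which term of the Bickel-Ritov rate n^{-4s/(4s+d)} + n^{-1/2}
    decays more slowly (and hence dominates) as n grows."""
    return "parametric" if 4 * s / (4 * s + d) >= 0.5 else "nonparametric"

print(dominant_term(0.5, 1))   # s >= d/4: parametric rate n^{-1/2}
print(dominant_term(0.2, 1))   # s <  d/4: nonparametric rate wins
```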
Differential Entropy Estimation
Kernel-based methods:

- Joe '89
- Györfi–van der Meulen '91
- Hall–Morton '93
- Paninski–Yajima '08
- Kandasamy et al. '15
- ...

Nearest neighbor methods:

- Tsybakov–van der Meulen '96
- Sricharan–Raich–Hero '12
- Singh–Póczos '16
- Berrett–Samworth–Yuan '16
- Delattre–Fournier '17
- Gao–Oh–Viswanath '17
- ...
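One classical member of the nearest-neighbor family is the Kozachenko–Leonenko estimator. A minimal one-dimensional sketch (brute-force distance matrix and an integer-argument digamma; a k-d tree would be the practical choice in higher dimensions):

```python
import numpy as np

def kl_entropy_1d(x, k=3):
    """Kozachenko-Leonenko k-nearest-neighbor entropy estimate in 1-D.

    h_hat = psi(n) - psi(k) + log(c_1) + (1/n) * sum_i log(eps_i),
    where eps_i is the distance from X_i to its k-th nearest neighbor
    and c_1 = 2 is the volume of the unit ball in one dimension.
    """
    n = x.size
    dist = np.sort(np.abs(x[:, None] - x[None, :]), axis=1)
    eps = dist[:, k]                  # column 0 is the point itself
    gamma = 0.5772156649015329        # Euler-Mascheroni constant
    def psi(m):                       # digamma at a positive integer:
        return -gamma + np.sum(1.0 / np.arange(1, m))   # -gamma + H_{m-1}
    return psi(n) - psi(k) + np.log(2.0) + np.mean(np.log(eps))

rng = np.random.default_rng(3)
x = rng.uniform(size=2000)            # uniform on [0, 1]: h(f) = 0
h_hat = kl_entropy_1d(x)
print(h_hat)
```

Note that this estimator needs no tuning of a bandwidth, which is why the practice part of the talk builds on the nearest-neighbor idea.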
Differential Entropy Estimation (Cont’d)
Drawbacks of previous works:

- extra assumption: the density f is lower bounded by a positive universal constant, e.g., f(x) ≥ 0.01 everywhere
- only prove consistency
- no new lower bound beyond the quadratic case
Main Result
Theorem
For any d and p ∈ [2, ∞), s ∈ (0, 2], we have

  inf_ĥ sup_{f ∈ Lip^s_{p,d}} E_f |ĥ − h(f)| ≍ (n log n)^{−s/(s+d)} + n^{−1/2}

Significance

- first exact expression for the minimax rate, including the sharp exponent and the exact logarithmic factor
- the parametric rate n^{−1/2} requires s ≥ d
- does not use any extra assumption (e.g., boundedness of f)
- improves the best known lower bound
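The stated rate is easy to evaluate numerically; this small helper (name assumed) makes the s ≥ d threshold for the parametric rate visible:

```python
import numpy as np

def entropy_minimax_rate(n, s, d):
    """The minimax rate (n log n)^{-s/(s+d)} + n^{-1/2} from the theorem.

    The nonparametric term decays faster than n^{-1/2} exactly when
    s/(s+d) >= 1/2, i.e. s >= d.
    """
    return (n * np.log(n)) ** (-s / (s + d)) + n ** (-0.5)

# smoother densities are strictly easier; past s = d the n^{-1/2}
# term dominates and extra smoothness barely helps
for s in (0.5, 1.0, 2.0):
    print(s, entropy_minimax_rate(10**6, s, d=1))
```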
Idea: Two-stage Approximation
Recall

  h(f) = ∫_{[0,1]^d} −f(x) log f(x) dx

- can estimate −f(x) log f(x) for every x and then integrate
- involves both the function f(x) and the functional y ↦ −y log y
- two-stage approximation: first approximate the function, then approximate the functional
First Stage
How to estimate f(x) at a given x?

- no unbiased estimator...
- first-stage approximation: consider f_h = f ∗ K_h instead, where K_h is some kernel with bandwidth h

Example
When K_h(x) = (1/2h) 1_{[−h,h]}(x), we have

  f_h(x) = (1/2h) ∫_{x−h}^{x+h} f(y) dy
First Stage (Cont’d)
Advantages of f_h:

- close to f for small bandwidth: ‖f_h − f‖_p ≲ h^s
- admits an unbiased estimator:

  E[(1/n) Σ_{i=1}^n K_h(x − X_i)] = (1/n) Σ_{i=1}^n E[K_h(x − X_i)]
                                  = (1/n) Σ_{i=1}^n ∫ K_h(x − y) f(y) dy
                                  = f ∗ K_h(x)

First-stage approximation
Estimate h(f_h) instead of h(f)!
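The unbiasedness is easy to verify by simulation. A sketch with an assumed density f(y) = 2y on [0, 1], for which the box kernel gives the closed form f_h(x) = 2x whenever x ∈ [h, 1 − h]:

```python
import numpy as np

# Monte Carlo check that (1/n) * sum_i K_h(x - X_i) is unbiased for
# f_h(x) = (f * K_h)(x), using the box kernel K_h = 1/(2h) on [-h, h].
rng = np.random.default_rng(4)
h, x = 0.05, 0.5
# for f(y) = 2y: f_h(x) = (1/2h) * ((x+h)^2 - (x-h)^2) = 2x, so f_h(0.5) = 1
target = 2 * x
estimates = []
for _ in range(200):
    samples = rng.uniform(size=4000) ** 0.5   # inverse-CDF draw from f(y) = 2y
    kernel_vals = (np.abs(x - samples) <= h) / (2 * h)
    estimates.append(kernel_vals.mean())      # the unbiased estimator of f_h(x)
mc_mean = float(np.mean(estimates))
print(mc_mean)   # concentrates around f_h(0.5) = 1.0
```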
Second Stage

There exists an unbiased estimator for $f_h(x)^k$ for any $k = 1, 2, \cdots, n$:

$$\mathbb{E}[K_h(x - X_1) K_h(x - X_2) \cdots K_h(x - X_k)] = \int \cdots \int K_h(x - y_1) \cdots K_h(x - y_k)\, f(y_1) \cdots f(y_k)\, dy_1 \cdots dy_k = \left(\int K_h(x - y) f(y)\,dy\right)^{k} = f_h(x)^k$$

U-statistics

$$U_k(x) = \frac{1}{\binom{n}{k}} \sum_{1 \le i_1 < i_2 < \cdots < i_k \le n} \prod_{j=1}^{k} K_h(x - X_{i_j})$$

- efficiently computable via Newton's identities

18 / 42
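On the "efficiently computable" point: $U_k(x) = e_k(v_1,\dots,v_n)/\binom{n}{k}$ with $v_i = K_h(x - X_i)$ and $e_k$ the elementary symmetric polynomial, so all of $U_0,\dots,U_K$ follow from the power sums $p_i = \sum_j v_j^i$ via Newton's identities in $O(nK + K^2)$ time rather than by summing over $\binom{n}{k}$ tuples. A sketch (the function name is mine):

```python
from math import comb

def u_statistics(v, K):
    """U_k = e_k(v) / C(n, k) for k = 0..K, where v_i = K_h(x - X_i).

    Elementary symmetric polynomials e_k are computed from power sums p_i
    via Newton's identities: k * e_k = sum_{i=1}^{k} (-1)^(i-1) e_{k-i} p_i.
    """
    n = len(v)
    p = [sum(x ** i for x in v) for i in range(K + 1)]  # power sums (p[0] unused)
    e = [1.0] + [0.0] * K
    for k in range(1, K + 1):
        e[k] = sum((-1) ** (i - 1) * e[k - i] * p[i] for i in range(1, k + 1)) / k
    return [e[k] / comb(n, k) for k in range(K + 1)]

# tiny example: v = (1, 2, 3) gives e_1 = 6, e_2 = 11
print(u_statistics([1.0, 2.0, 3.0], 2))
```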
Second Stage (Cont'd)

How to estimate $-f_h(x)\log f_h(x)$ at a given $x$?

- no unbiased estimator either...
- but we have an unbiased estimator for all polynomials of $f_h(x)$!

Second-stage approximation

Approximate the objective functional by a polynomial,

$$-f_h(x)\log f_h(x) \approx \sum_{k=0}^{K} a_k f_h(x)^k,$$

then $\hat H(x) = \sum_{k=0}^{K} a_k U_k(x)$ is an unbiased estimator of the polynomial approximation.

19 / 42
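As an illustration of how good such a polynomial approximation can be, the sketch below interpolates $-y\log y$ at Chebyshev nodes on $[0,1]$; this is one standard construction for a near-best polynomial approximation, not necessarily the one used in the estimator:

```python
import math

def neg_y_log_y(y):
    # the objective -y*log(y), with the continuous extension 0 at y = 0
    return 0.0 if y <= 0.0 else -y * math.log(y)

def cheb_nodes(K, a=0.0, b=1.0):
    # K+1 Chebyshev nodes mapped to [a, b]
    return [(a + b) / 2 + (b - a) / 2 * math.cos((2 * j + 1) * math.pi / (2 * K + 2))
            for j in range(K + 1)]

def newton_interp(xs, ys):
    # Newton-form interpolating polynomial via divided differences
    c = list(ys)
    for j in range(1, len(xs)):
        for i in range(len(xs) - 1, j - 1, -1):
            c[i] = (c[i] - c[i - 1]) / (xs[i] - xs[i - j])
    def p(x):
        v = c[-1]
        for i in range(len(c) - 2, -1, -1):
            v = v * (x - xs[i]) + c[i]
        return v
    return p

K = 8
xs = cheb_nodes(K)
p = newton_interp(xs, [neg_y_log_y(x) for x in xs])
err = max(abs(p(t / 1000) - neg_y_log_y(t / 1000)) for t in range(1001))
print(err)  # sup-norm error of the degree-K approximation on [0, 1]
```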
Estimator Construction

- choose a suitable kernel $K_h$ with bandwidth $h$, and set $f_h \triangleq f * K_h$
- for every $x$, we aim to estimate $-f_h(x)\log f_h(x)$:
  1. if $f_h(x)$ is small, apply the previous unbiased estimator of the polynomial approximation of $y \mapsto -y\log y$
  2. if $f_h(x)$ is large, just plug in $-f_h(x)\log f_h(x)$
- integrate the pointwise estimates

20 / 42
Outline

- Introduction (Problem Setup; Related Works)
- Theory: Optimal Estimation (Estimator Construction; Estimator Analysis)
- Practice: Adaptive Estimation (Idea of Nearest Neighbor; Estimator Analysis; Numerical Results)
- Conclusion

21 / 42
Error Decomposition

$$\mathbb{E}_f|\hat h - h(f)| \le |h(f) - h(f_h)| + \mathbb{E}_f|\hat h - h(f_h)| \le |h(f) - h(f_h)| + |\mathbb{E}_f \hat h - h(f_h)| + \sqrt{\mathrm{Var}_f(\hat h)}$$

$$= \text{approx. error} + \text{bias} + \text{std} \lesssim h^s + \frac{\log n}{n h^d K^2} + \frac{2^K}{n\sqrt{h^d}}$$

Choosing $h \asymp (n\log n)^{-\frac{1}{s+d}}$, $K \asymp \log n$ completes the proof.

22 / 42
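Substituting the stated choices into the bound makes the resulting rate visible (a derivation sketch based on the terms above, tracking logarithmic factors only loosely):

```latex
% With h \asymp (n \log n)^{-1/(s+d)}, the approximation error is
%   h^s \asymp (n \log n)^{-s/(s+d)},
% and with K \asymp \log n the bias term matches it up to log factors:
\frac{\log n}{n h^d K^2}
  \asymp \frac{\log n}{n (\log n)^2}\,(n \log n)^{\frac{d}{s+d}}
  = n^{-\frac{s}{s+d}} (\log n)^{\frac{d}{s+d} - 1},
% while for K = c \log n one has 2^K = n^{c \log 2}, so for a small enough
% constant c the standard-deviation term 2^K / (n \sqrt{h^d}) is of lower order.
```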
Key Lemma in Bounding $|h(f_h) - h(f)|$

Inequality of Fisher information

Let $f \in C^2(\mathbb{R})$ be supported on $[0,1]$, with $f \ge 0$ everywhere. The following inequality holds:

$$J(f) \triangleq \int \frac{(f')^2}{f} \le C_p \|f''\|_p,$$

where $1 < p \le \infty$.

23 / 42
Proof of Key Lemma

Non-negativity of $f$:

$$0 \le f(x+h) \le f(x) + h f'(x) + h \int_{x}^{x+h} |f''(y)|\,dy$$

$$0 \le f(x-h) \le f(x) - h f'(x) + h \int_{x-h}^{x} |f''(y)|\,dy$$

Rearranging:

$$|f'(x)| \le \inf_{h>0}\left[\frac{2f(x)}{h} + 2h \cdot \frac{1}{2h}\int_{x-h}^{x+h} |f''(y)|\,dy\right] \le \inf_{h>0}\left[\frac{2f(x)}{h} + 2h \cdot \sup_{r>0}\frac{1}{2r}\int_{x-r}^{x+r} |f''(y)|\,dy\right] = 2\sqrt{f(x)\, M[|f''|](x)},$$

where the last step optimizes the bound over $h > 0$.

24 / 42
Maximal Function

Definition (Hardy–Littlewood maximal function)

For a non-negative function $h$, the Hardy–Littlewood maximal function $M[h]$ is defined as

$$M[h](x) \triangleq \sup_{r>0} \frac{1}{|B(x; r)|} \int_{B(x; r)} h(y)\,dy.$$

Theorem (Hardy–Littlewood maximal inequality)

The following tail bound holds:

$$\mathrm{Vol}\{x \in \mathbb{R}^d : M[h](x) > t\} \le \frac{C_d}{t} \int h(x)\,dx.$$

Consequently, $\|M[h]\|_p \le C_p \|h\|_p$ for any $p \in (1, \infty]$.

25 / 42
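A discrete 1-D analogue may help make the definition concrete; the sketch below (my own toy version, with windows truncated at the grid boundary) maximizes window averages around each grid point:

```python
def maximal_1d(vals):
    # discrete 1-D Hardy-Littlewood maximal function on a uniform grid:
    # M[h](i) = max over symmetric windows around i of the window average of h
    n = len(vals)
    out = []
    for i in range(n):
        best = 0.0
        for r in range(n + 1):  # r = 0 is the degenerate window {i} itself
            lo, hi = max(0, i - r), min(n, i + r + 1)
            best = max(best, sum(vals[lo:hi]) / (hi - lo))
        out.append(best)
    return out

h = [0.0, 0.0, 1.0, 0.0, 0.0]  # a point mass on the grid
print(maximal_1d(h))  # dominates h pointwise, and spreads mass off the spike
```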
Proof of Key Lemma (Cont'd)

Recall

$$|f'(x)| \le 2\sqrt{f(x)\, M[|f''|](x)}.$$

Consequently, since $f$ is supported on $[0,1]$ (so $\|\cdot\|_1 \le \|\cdot\|_p$ by Hölder),

$$\int \frac{(f')^2}{f} \le 4\|M[|f''|]\|_1 \le 4\|M[|f''|]\|_p \le 4 C_p \|f''\|_p.$$

26 / 42
Applications of Maximal Function

- Doob's martingale inequality
- Lebesgue differentiation theorem
- Birkhoff's pointwise ergodic theorem

27 / 42
Summary

- two-stage approximation is optimal for differential entropy estimation
- polynomial-time estimator
- need to tune the parameters $h$, $K$ in practice
- requires knowledge of $s$

28 / 42
Outline

- Introduction (Problem Setup; Related Works)
- Theory: Optimal Estimation (Estimator Construction; Estimator Analysis)
- Practice: Adaptive Estimation (Idea of Nearest Neighbor; Estimator Analysis; Numerical Results)
- Conclusion

29 / 42
Another View of Differential Entropy

$$h(f) = \int -f(x)\log f(x)\,dx = \mathbb{E}_f[-\log f(X)] \approx \frac{1}{n}\sum_{i=1}^{n} -\log f(X_i) \approx \frac{1}{n}\sum_{i=1}^{n} -\log \hat f(X_i)$$

Question

How to find a good estimator $\hat f(X_i)$?

30 / 42
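A quick illustration of this resubstitution idea, with choices of my own (leave-one-out boxcar KDE for $\hat f$, standard normal data, whose true differential entropy is $\tfrac{1}{2}\log(2\pi e) \approx 1.4189$):

```python
import bisect
import math
import random

random.seed(0)
n, h = 20_000, 0.25
xs = sorted(random.gauss(0.0, 1.0) for _ in range(n))

def f_hat(i):
    # leave-one-out boxcar KDE evaluated at the sample point xs[i]
    lo = bisect.bisect_left(xs, xs[i] - h)
    hi = bisect.bisect_right(xs, xs[i] + h)
    count = max(hi - lo - 1, 1)  # exclude xs[i] itself; clip to avoid log(0)
    return count / ((n - 1) * 2 * h)

# resubstitution estimate: average of -log f_hat(X_i)
h_est = sum(-math.log(f_hat(i)) for i in range(n)) / n
print(h_est)  # true value: 0.5 * log(2 * pi * e) ≈ 1.4189
```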
![Page 87: Theory and Practice of Di erential Entropy Estimationyjhan/diff_entropy.pdf · Theory and Practice of Di erential Entropy Estimation Introduction Theory: Optimal Estimation Practice:](https://reader035.vdocument.in/reader035/viewer/2022070908/5f8916f0226ec04e4f13d318/html5/thumbnails/87.jpg)
Theory and Practice of Differential Entropy Estimation
Introduction Theory: Optimal Estimation Practice: Adaptive Estimation Conclusion
Another View of Differential Entropy
h(f ) =
∫−f (x) log f (x)dx
= Ef [− log f (X )]
≈ 1
n
n∑i=1
− log f (Xi )
≈ 1
n
n∑i=1
− log f (Xi )
QuestionHow to find a good estimator f (Xi )?
30 / 42
![Page 88: Theory and Practice of Di erential Entropy Estimationyjhan/diff_entropy.pdf · Theory and Practice of Di erential Entropy Estimation Introduction Theory: Optimal Estimation Practice:](https://reader035.vdocument.in/reader035/viewer/2022070908/5f8916f0226ec04e4f13d318/html5/thumbnails/88.jpg)
Theory and Practice of Differential Entropy Estimation
Introduction Theory: Optimal Estimation Practice: Adaptive Estimation Conclusion
Another View of Differential Entropy
h(f ) =
∫−f (x) log f (x)dx
= Ef [− log f (X )]
≈ 1
n
n∑i=1
− log f (Xi )
≈ 1
n
n∑i=1
− log f (Xi )
QuestionHow to find a good estimator f (Xi )?
30 / 42
![Page 89: Theory and Practice of Di erential Entropy Estimationyjhan/diff_entropy.pdf · Theory and Practice of Di erential Entropy Estimation Introduction Theory: Optimal Estimation Practice:](https://reader035.vdocument.in/reader035/viewer/2022070908/5f8916f0226ec04e4f13d318/html5/thumbnails/89.jpg)
Theory and Practice of Differential Entropy Estimation
Introduction Theory: Optimal Estimation Practice: Adaptive Estimation Conclusion
Another View of Differential Entropy
h(f ) =
∫−f (x) log f (x)dx
= Ef [− log f (X )]
≈ 1
n
n∑i=1
− log f (Xi )
≈ 1
n
n∑i=1
− log f (Xi )
QuestionHow to find a good estimator f (Xi )?
30 / 42
![Page 90: Theory and Practice of Di erential Entropy Estimationyjhan/diff_entropy.pdf · Theory and Practice of Di erential Entropy Estimation Introduction Theory: Optimal Estimation Practice:](https://reader035.vdocument.in/reader035/viewer/2022070908/5f8916f0226ec04e4f13d318/html5/thumbnails/90.jpg)
Theory and Practice of Differential Entropy Estimation
Introduction Theory: Optimal Estimation Practice: Adaptive Estimation Conclusion
Another View of Differential Entropy
h(f ) =
∫−f (x) log f (x)dx
= Ef [− log f (X )]
≈ 1
n
n∑i=1
− log f (Xi )
≈ 1
n
n∑i=1
− log f (Xi )
QuestionHow to find a good estimator f (Xi )?
30 / 42
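The resubstitution idea above can be sketched in a few lines. As a sanity check (a hypothetical example, not from the slides), plugging in the true standard normal density should recover $h = \frac{1}{2}\log(2\pi e) \approx 1.419$ nats:

```python
import numpy as np

def resubstitution_entropy(samples, density):
    """Plug-in estimate: h(f) ~ -(1/n) * sum_i log f_hat(X_i)."""
    return -np.mean(np.log(density(samples)))

# Oracle sanity check with the true N(0, 1) density (hypothetical example):
rng = np.random.default_rng(0)
x = rng.normal(size=100_000)
std_normal_pdf = lambda t: np.exp(-t**2 / 2) / np.sqrt(2 * np.pi)
h_hat = resubstitution_entropy(x, std_normal_pdf)
# True value: 0.5 * log(2*pi*e) = 1.4189... nats
```

In practice the true density is unknown, which is exactly the question the slide poses: the quality of the estimator hinges on the choice of $\hat{f}$.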
Nearest Neighbor Estimator

Let $h_i$ be the distance from $X_i$ to its nearest neighbor; we set
$$
\hat{f}(X_i) \cdot \mathrm{Vol}(B(X_i; h_i)) = \frac{1}{n}.
$$

Kozachenko–Leonenko (KL) Nearest Neighbor Estimator:
$$
\hat{h}_{\mathrm{KL}} = \frac{1}{n}\sum_{i=1}^n \log\left[n\,\mathrm{Vol}(B(X_i; h_i))\right] + \gamma,
$$
where $\gamma \approx 0.577$ is Euler's constant.
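In one dimension $\mathrm{Vol}(B(x; h)) = 2h$, so the estimator reduces to a few lines. A minimal sketch (the sorted-gap trick for nearest-neighbor distances is an implementation choice, not from the slides):

```python
import numpy as np

EULER_GAMMA = 0.5772156649015329

def kl_entropy_1d(samples):
    """Kozachenko-Leonenko 1-NN entropy estimate for d = 1,
    where Vol(B(x; h)) = 2h."""
    x = np.sort(np.asarray(samples, dtype=float))
    n = len(x)
    gaps = np.diff(x)
    nn = np.empty(n)                       # nearest-neighbor distances
    nn[0], nn[-1] = gaps[0], gaps[-1]
    nn[1:-1] = np.minimum(gaps[:-1], gaps[1:])
    return np.mean(np.log(n * 2 * nn)) + EULER_GAMMA

rng = np.random.default_rng(1)
h_hat = kl_entropy_1d(rng.normal(size=50_000))
# Should be close to 0.5 * log(2*pi*e) = 1.4189... nats for N(0, 1)
```

Note there is no bandwidth or other tuning parameter: the radius $h_i$ adapts to the local sample density automatically.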
Insights behind the KL Estimator

Key Observation: for each $i$,
$$
\int_{B(X_i; h_i)} f(y)\,dy \sim \mathrm{Beta}(1, n-1).
$$

Consequence: define
$$
f_h(x) = \frac{1}{\mathrm{Vol}(B(x; h))}\int_{B(x; h)} f(y)\,dy;
$$
then
$$
\mathbb{E}_f[\hat{h}_{\mathrm{KL}}] - h(f) = \mathbb{E}_f\left[\log \frac{f(X)}{f_{h(X)}(X)}\right] + \underbrace{\mathbb{E}\log[n \cdot \mathrm{Beta}(1, n-1)] + \gamma}_{=O(n^{-1})}.
$$
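The $O(n^{-1})$ claim for the second term can be checked directly: since $\mathbb{E}\log U = \psi(1) - \psi(n) = -\gamma - \psi(n)$ for $U \sim \mathrm{Beta}(1, n-1)$, the term equals $\log n - \psi(n) \approx \frac{1}{2n}$. A quick Monte Carlo sanity check (hypothetical, not from the slides):

```python
import numpy as np

n = 1_000
rng = np.random.default_rng(2)
u = rng.beta(1, n - 1, size=2_000_000)     # samples of Beta(1, n-1)
EULER_GAMMA = 0.5772156649015329
residual = np.mean(np.log(n * u)) + EULER_GAMMA
# Analytically residual = log(n) - digamma(n), roughly 1/(2n) = 5e-4 here
```

So the bias of the KL estimator is driven entirely by the first term, i.e., by how far the local average $f_{h(X)}(X)$ is from $f(X)$.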
Main Result

Theorem. For any $d > 0$ and $s \in (0, 2]$, the KL estimator satisfies
$$
\sup_{f \in \mathcal{H}^s_d} \mathbb{E}_f\left|\hat{h}_{\mathrm{KL}} - h(f)\right| \lesssim n^{-\frac{s}{s+d}}\log n + n^{-\frac{1}{2}}.
$$

Significance:

- optimal up to a logarithmic factor
- requires no extra assumptions (e.g., boundedness of $f$)
- adaptive to the smoothness $s$
- no tuning parameters needed
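Ignoring the logarithmic factor, the two terms in the bound trade off exactly at $s = d$; this follows from comparing the exponents (our observation, not stated on the slide):

```latex
n^{-\frac{s}{s+d}} \ge n^{-\frac{1}{2}}
\iff \frac{s}{s+d} \le \frac{1}{2}
\iff s \le d.
```

So the nonparametric term $n^{-s/(s+d)}\log n$ dominates whenever $s \le d$, and the parametric rate $n^{-1/2}$ takes over only when the density is smoother than the dimension ($s > d$).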
Practice: Adaptive Estimation - Estimator Analysis
Error Analysis

- Variance of $\hat{h}_{\mathrm{KL}}$: easy to control.
- Bias of $\hat{h}_{\mathrm{KL}}$: it suffices to upper bound $\left|\mathbb{E}_f \log \frac{f(X)}{f_{h(X)}(X)}\right|$.
  1. Upper bounding $\mathbb{E}_f \log \frac{f_{h(X)}(X)}{f(X)} = \int f(x)\,\mathbb{E}\log \frac{f_{h(x)}(x)}{f(x)}\,dx$: easy.
  2. Upper bounding $\mathbb{E}_f \log \frac{f(X)}{f_{h(X)}(X)} = \int f(x)\,\mathbb{E}\log \frac{f(x)}{f_{h(x)}(x)}\,dx$:
     - easy when $f_{h(x)}(x)$ is large;
     - hard when $f_{h(x)}(x)$ is small.

Question: for small $\varepsilon > 0$, find a good upper bound on
$$
\mathbb{E}\left[\int f(x)\,\mathbf{1}\big(f_{h(x)}(x) \le \varepsilon\big)\,dx\right].
$$
Minimal Function

Definition (Minimal Function). For a non-negative function $f$ supported on $[0, 1]^d$, the minimal function $m[f]$ is defined as
$$
m[f](x) = \inf_{0 < r \le 1} \frac{1}{\mathrm{Vol}(B(x; r))}\int_{B(x; r)} f(y)\,dy.
$$

Observation:
$$
\mathbb{E}\left[\int f(x)\,\mathbf{1}\big(f_{h(x)}(x) \le \varepsilon\big)\,dx\right] \le \int f(x)\,\mathbf{1}\big(m[f](x) \le \varepsilon\big)\,dx.
$$
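The minimal function and the resulting bound can be explored numerically. A sketch on a hypothetical triangular density $f(x) = 2x$ on $[0, 1]$ (the grid discretization and the choice of density are ours, not from the slides); the set $\{m[f] \le \varepsilon\}$ is roughly $[0, \varepsilon/2]$, so the left-hand side is about $\varepsilon^2/4$, far below the $C_d\,\varepsilon$ bound of the next slide:

```python
import numpy as np

# Hypothetical example: triangular density f(x) = 2x on [0, 1], d = 1.
xs = np.linspace(0.0, 1.0, 1001)
dx = xs[1] - xs[0]
f = 2 * xs

def ball_average(x, r):
    """(1 / Vol B(x; r)) * integral of f over B(x; r), with f = 0
    outside [0, 1] and Vol(B(x; r)) = 2r in one dimension."""
    mask = np.abs(xs - x) <= r
    return f[mask].sum() * dx / (2 * r)

# Approximate m[f](x) = inf over radii of the ball average.
radii = np.linspace(0.01, 1.0, 50)
m = np.array([min(ball_average(x, r) for r in radii) for x in xs])

eps = 0.1
lhs = (f * (m <= eps)).sum() * dx   # integral of f over {m[f] <= eps}
```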
Generalized Maximal Inequality

Theorem (Generalized Maximal Inequality). Let $\mu_1, \mu_2$ be two Borel measures on a metric space $\Omega \subset \mathbb{R}^d$; then for any $t > 0$,
$$
\mu_2\left\{x \in \Omega : \sup_{r > 0} \frac{\mu_1(B(x; r))}{\mu_2(B(x; r))} > t\right\} \le \frac{C_d}{t}\cdot \mu_1(\Omega).
$$

Corollary. Choosing $\mu_1$ to be the Lebesgue measure and $\frac{d\mu_2}{d\mu_1} = f$, we have
$$
\int f(x)\,\mathbf{1}\big(m[f](x) \le \varepsilon\big)\,dx \le \mu_2\left\{x \in [0, 1]^d : \sup_{r > 0} \frac{\mu_1(B(x; r))}{\mu_2(B(x; r))} > \frac{1}{\varepsilon}\right\} \le C_d \cdot \varepsilon.
$$
Practice: Adaptive Estimation - Numerical Results
Dimensionality d

[Log-log plot: estimation error (axis spanning roughly 0.01 to 0.1) versus sample size n (roughly 100 to 1000) for k = 5, s = 2, with curves for d = 1, 2, 4.]

Smoothness s

[Log-log plot: estimation error (axis spanning roughly 0.01 to 0.1) versus sample size n (roughly 100 to 1000) for k = 5, d = 1, with curves for s = 0.5, 1, 1.5, 2.]
Conclusion

Take-home messages:

- Two-stage approximation (first approximate the function, then approximate the functional) is optimal.
- The nearest neighbor estimator is near-optimal and adaptive to the smoothness parameter.
- The Hardy–Littlewood maximal inequality is crucial for handling densities close to zero.
References

- Yanjun Han, Jiantao Jiao, Tsachy Weissman, and Yihong Wu, "Optimal Rates of Entropy Estimation over Lipschitz Balls", arXiv preprint arXiv:1711.02141.
- Yanjun Han, Jiantao Jiao, Rajarshi Mukherjee, and Tsachy Weissman, "On Estimation of $L_r$-Norms in Gaussian White Noise Models", arXiv preprint arXiv:1710.03863.
- Jiantao Jiao, Weihao Gao, and Yanjun Han, "The Nearest Neighbor Information Estimator is Adaptively Near Minimax Rate-Optimal", arXiv preprint arXiv:1711.08824.