Entropy and Dependence Estimation. Barnabás Póczos, Department of Computing Science, University of Alberta, Canada. Eötvös Loránd University, Neural Information Processing Group. July 2, 2009.


Page 1: Entropy and Dependence Estimation, Barnabás Póczos (bapoczos/other_presentations/ELTE_02_07_2009.pdf)

Entropy and Dependence Estimation

Barnabás Póczos Department of Computing Science,

University of Alberta, Canada

Eötvös Loránd University

Neural Information Processing Group

July 2, 2009

Page 2:

Contents

• Entropy estimation
 – Spacing
 – Leonenko and Kozachenko
 – Euclidean graphs + combinatorial optimization

• Kernel Mutual Information

• Copula framework

• Copula methods for ICA
 – Schweizer-Wolff dependence

• Copula methods for MI estimation

• Results

Page 3:

Applications of Mutual Information

• Information theory

• Feature selection

• Clustering

• Image registration

• Independent Component Analysis

• Independent Subspace Analysis

• Optimal experiment design

• Structure learning

• Causality detection

Page 4:

Independent Component Analysis

and

Independent Subspace Analysis

Page 5:

ICA Cost Functions

J_ICA(W) = I(y_1, …, y_M) = ∫ p(y_1, …, y_M) log [ p(y_1, …, y_M) / (p(y_1) ⋯ p(y_M)) ] dy

Equivalently, on whitened data with W constrained to be orthogonal (so the joint entropy is constant):

J_ICA(W) = H(y_1) + ⋯ + H(y_M)

Page 6:

ISA Cost Functions

Mutual information (the y^m are the multi-dimensional subspace components):

I(y^1, …, y^M) = ∫ p(y) log [ p(y) / (p(y^1) ⋯ p(y^M)) ] dy

Shannon entropy:

H(y) = −∫ p(y) log p(y) dy

Page 7:

ISA Cost Functions

Page 8:

Entropy Estimation

Page 9:

1D Spacing Methods

• Vasicek 1976: test for normality

• Used in RADICAL (E. Learned-Miller)
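A 1-D m-spacing entropy estimate of the kind Vasicek introduced can be sketched in a few lines. This is our own minimal illustration, not code from the talk; the function name and the m ≈ √n heuristic are our choices.

```python
import math
import random

def vasicek_entropy(sample, m=None):
    """Vasicek-style m-spacing estimate of the differential entropy
    of a 1-D sample (a sketch; edge spacings are simply truncated)."""
    n = len(sample)
    if m is None:
        m = max(1, int(round(math.sqrt(n))))  # common heuristic: m ~ sqrt(n)
    s = sorted(sample)
    total = 0.0
    for i in range(n):
        lo = s[max(i - m, 0)]          # truncate the spacing at the edges
        hi = s[min(i + m, n - 1)]
        total += math.log(n * (hi - lo) / (2.0 * m))
    return total / n

random.seed(0)
data = [random.gauss(0.0, 1.0) for _ in range(5000)]
est = vasicek_entropy(data)
print(est)  # close to the Gaussian entropy 0.5*log(2*pi*e) ≈ 1.42
```

For a standard Gaussian sample the estimate lands near the true differential entropy 0.5·log(2πe).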

Page 10:

Multi-dimensional Entropy Estimation: the Method of Kozachenko and Leonenko

Let {z_1, …, z_n}, z_j ∈ R^d, be an i.i.d. sample, and let N_{1,j} denote the nearest neighbour of z_j. The nearest-neighbour entropy estimate is

Ĥ(z) = (d/n) Σ_{j=1}^n log ‖N_{1,j} − z_j‖ + log(n−1) + log V_d + C_E,

where V_d is the volume of the unit ball in R^d and C_E = −∫_0^∞ e^(−t) log t dt is the Euler constant.

This estimate is mean-square consistent, but not robust. Let us try to use more neighbours!
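The Kozachenko-Leonenko nearest-neighbour estimator above can be sketched directly. This is our illustration (brute-force O(n²) neighbour search, names ours), not the talk's implementation.

```python
import math
import random

def kl_entropy(points):
    """Kozachenko-Leonenko 1-nearest-neighbour entropy estimate in R^d.
    Brute-force neighbour search; a sketch for small samples."""
    n = len(points)
    d = len(points[0])
    euler = 0.5772156649015329                        # C_E, the Euler constant
    log_vd = (d / 2.0) * math.log(math.pi) - math.lgamma(d / 2.0 + 1.0)  # log V_d
    total = 0.0
    for i, p in enumerate(points):
        r = min(math.dist(p, q) for j, q in enumerate(points) if j != i)
        total += math.log(r)                          # log nearest-neighbour distance
    return d * total / n + log_vd + euler + math.log(n - 1)

random.seed(1)
pts = [(random.random(), random.random()) for _ in range(1200)]
est = kl_entropy(pts)
print(est)  # near 0, the entropy of Uniform([0,1]^2)
```

For the uniform distribution on the unit square the true entropy is 0, and the estimate should be close to it.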

Page 11:

Multi-dimensional Rényi Entropy Estimation

Let us apply the Rényi entropy for estimating the Shannon entropy:

H_α = (1/(1−α)) log ∫ f^α(z) dz,  and as α → 1,  H_α → H = −∫ f(z) log f(z) dz.

Let us use k-nearest neighbours and geodesic spanning trees for estimating the multi-dimensional Rényi entropy. (It could be much more general…)

Page 12:

Beardwood - Halton - Hammersley Theorem for kNN graphs

Let {z_1, …, z_n}, z_j ∈ R^d, and let N_{k,j} denote the k nearest neighbours of z_j. With γ = d(1−α):

(1/(1−α)) log [ (1/n^α) Σ_{j=1}^n Σ_{v ∈ N_{k,j}} ‖v − z_j‖^γ ] → H_α(z) + c   as n → ∞.

Many other graphs, e.g. the MST, TSP tours, minimal matchings, Steiner trees, etc., could be used as well.

Page 13:

Multi-dimensional Entropy Estimations Using Geodesic Spanning Forests

• First build a Euclidean neighbourhood graph
 – use the edges to the k nearest nodes of each node

• Find geodesic spanning forests on this graph
 – (minimal spanning forests of the Euclidean neighbourhood graph)

Page 14:

Euclidean Graphs

Let {z_1, …, z_n}, z_j ∈ R^d.

– Euclidean neighbourhood graph: edge set E = { e = (z_p, z_q) : z_q ∈ N_{k,p} }

– Weight of the minimal (γ-weighted) Euclidean spanning forest:

 L_γ(z) = min_{T ∈ 𝒯} Σ_{e ∈ T} ‖e‖^γ,

 where 𝒯 is the set of all γ-weighted Euclidean spanning forests.

– With γ = d(1−α), 0 < γ < d:

 (1/(1−α)) log( L_γ(z) / n^α ) → H_α(z) + c   as n → ∞.
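The L_γ statistic for the plain Euclidean MST case can be computed with Prim's algorithm. A sketch under our own naming; the bias constant c is not computed, so the statistic estimates H_α only up to that additive constant.

```python
import math
import random

def mst_gamma_weight(points, gamma):
    """Total gamma-weighted edge length of the Euclidean minimum
    spanning tree, via Prim's O(n^2) algorithm (a sketch)."""
    n = len(points)
    in_tree = [False] * n
    best = [math.inf] * n       # cheapest connection cost to the tree
    best[0] = 0.0
    total = 0.0
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=best.__getitem__)
        in_tree[u] = True
        if best[u] > 0.0:
            total += best[u] ** gamma
        for v in range(n):
            if not in_tree[v]:
                dist = math.dist(points[u], points[v])
                if dist < best[v]:
                    best[v] = dist
    return total

random.seed(2)
n, d, alpha = 400, 2, 0.5
pts = [(random.random(), random.random()) for _ in range(n)]
L = mst_gamma_weight(pts, gamma=d * (1 - alpha))   # gamma = d(1-alpha) = 1
stat = math.log(L / n ** alpha) / (1 - alpha)      # estimates H_alpha(z) + c
ratio = L / math.sqrt(n)
print(ratio, stat)  # ratio is roughly 0.6 for uniform points on the unit square
```

For uniform points on the unit square, L/√n converges to the (density-independent) MST constant, which is roughly 0.6 in 2-D.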

Page 15:

Estimation of the Shannon-entropy

Page 16:

Examples (J. A. Costa and A. O. Hero)

Page 17:

Dependence Estimation Using Kernel Methods

Page 18:

Kernel covariance (KC) A. Gretton, R. Herbrich, A. Smola, F. Bach, M. Jordan

The calculation of the supremum over function sets is extremely difficult. Reproducing Kernel Hilbert Spaces make it easier.

Page 19:

RKHS construction for random variables x and y.

Page 20:

Kernel covariance (KC)

Moreover, after some calculation we obtain:

Page 21:

Dependence Estimation Using Copula Methods

Page 22:

Copula

• A function that couples a multivariate distribution function to its marginal distribution functions
• A multivariate distribution function whose marginals are uniform on [0,1]

Copulas are useful for:
• Studying scale-free measures of dependence
 – nonparametric independence tests
 – financial risk management
• Constructing families of multivariate distributions
• Sampling from multivariate distributions
• Studying Chapman-Kolmogorov equations in Markov processes

Page 23:

Brief History

• Hoeffding 1940: studied distributions with support on [−1/2, 1/2]² and uniform marginals.
 "Had Hoeffding chosen the unit square [0,1]² instead of [−1/2, 1/2]², he would have discovered copulas…" (Schweizer)

• Fréchet 1951: independently rediscovered many of the same results ⇒ Fréchet-Hoeffding bounds.

• Féron 1956: introduced the word "copula" (a grammatical term that links subject and predicate).

• Sklar 1959: Sklar's theorem.

• Schweizer-Wolff 1981: Schweizer-Wolff's σ as a measure of dependence.

Page 24:

Copula

A bivariate copula C is a multivariate distribution function (cdf) defined on the unit square with uniform univariate marginals:

• Uniform margins: C(u, 1) = u and C(1, v) = v
• Grounded: C(u, 0) = C(0, v) = 0
• 2-increasing: C(u_2, v_2) − C(u_2, v_1) − C(u_1, v_2) + C(u_1, v_1) ≥ 0 for u_1 ≤ u_2, v_1 ≤ v_2

Fig. from N. Baeva

Page 25:

Fréchet-Hoeffding Bounds: the M, W, and Π Copulas

Figs. from Nielsen

Page 26:

Sklar's Theorem [Sklar, 1959]

F(x, y) = C(F_X(x), F_Y(y)): the copula couples the joint distribution to its univariate margins.

Page 27:

Independence Copula (Π)

Π(u, v) = uv: the copula density is uniform iff the distribution is independent.

Proof: apply Sklar's theorem: F(x, y) = Π(F_X(x), F_Y(y)) = F_X(x) F_Y(y).

Page 28:

Copula properties

• Invariance under strictly monotone transformations

• Order statistics can be expressed with copulas

This is important when we need a dependence measure that is invariant under strictly monotone rescaling of the variables.

Page 29:

Basic Properties

• Fréchet-Hoeffding bounds

• Lipschitz continuity

• Convex linear combination of copulas

Page 30:

Concordance

X and Y are concordant if large (small) values of one tend to be associated with large (small) values of the other.

Page 31:

Measure of Concordance

Definition (concordance): κ is an association measure between continuous variables X and Y with copula C such that the concordance axioms hold.

Theorem:
• Y an increasing function of X ⇒ κ_{X,Y} = κ_M = 1
• Y a decreasing function of X ⇒ κ_{X,Y} = κ_W = −1
• α, β monotone increasing ⇒ κ_{α(X),β(Y)} = κ_{X,Y}

Theorem: Kendall's τ, Spearman's ρ, Gini's γ, and Blomqvist's β are measures of concordance.

Page 32:

Kendall's τ

c = number of concordant pairs, d = number of discordant pairs.

Sample version: τ̂ = (c − d) / (c + d)

Population version: τ = P[(X_1 − X_2)(Y_1 − Y_2) > 0] − P[(X_1 − X_2)(Y_1 − Y_2) < 0]

Theorem (Kendall's τ with copulas): τ = 4 ∫∫ C(u, v) dC(u, v) − 1
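The sample version of Kendall's τ can be computed by brute force over all pairs. A small sketch of ours for illustration; it assumes no ties.

```python
def kendall_tau(x, y):
    """Sample Kendall's tau, (c - d)/(c + d), by brute force over
    all pairs (no-ties case; O(n^2))."""
    c = d = 0
    for i in range(len(x)):
        for j in range(i + 1, len(x)):
            s = (x[i] - x[j]) * (y[i] - y[j])
            if s > 0:
                c += 1          # concordant pair
            elif s < 0:
                d += 1          # discordant pair
    return (c - d) / (c + d)

print(kendall_tau([1, 2, 3, 4], [10, 20, 30, 40]))  # 1.0, all pairs concordant
print(kendall_tau([1, 2, 3, 4], [40, 30, 20, 10]))  # -1.0, all pairs discordant
```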

Page 33:

Spearman's ρ

Population version: ρ_S = 12 ∫∫ C(u, v) du dv − 3

Sample version: Pearson's linear correlation between the ranks.

Theorem (Spearman's ρ with copulas): ρ_S = 12 ∫∫ uv dC(u, v) − 3

Page 34:

Measure of Dependence

δ, an association between continuous variables X and Y with copula C, is a measure of dependence if it satisfies the corresponding axioms; in particular, δ = 0 if and only if X and Y are independent.

Problem with concordance: κ_{X,Y} = 0 does not imply that X and Y are independent.

Page 35:

Schweizer-Wolff Measures of Dependence [Schweizer and Wolff, 81]

The L_p norm between the copula of the distribution and the independence copula. The L_1 case:

σ(X, Y) = 12 ∫∫ |C(u, v) − uv| du dv

Special cases: p = 1 gives σ above; p = 2 and p = ∞ give related dependence measures.

Multivariate L_1 version: σ(X_1, …, X_M) ∝ ∫_{[0,1]^M} |C(u) − u_1 ⋯ u_M| du
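A sample version of the Schweizer-Wolff σ can be computed from the empirical copula on the rank grid. This is our sketch; the 12/(n²−1) normalisation (which makes σ exactly 1 for a strictly monotone relationship) is our assumption if the talk used a different constant.

```python
import random

def ranks(v):
    """1-based ranks of a list with distinct values."""
    order = sorted(range(len(v)), key=v.__getitem__)
    r = [0] * len(v)
    for pos, k in enumerate(order):
        r[k] = pos + 1
    return r

def schweizer_wolff_sigma(x, y):
    """Sample Schweizer-Wolff sigma:
    12/(n^2-1) * sum_{i,j} |C_n(i/n, j/n) - ij/n^2| over the rank grid."""
    n = len(x)
    rx, ry = ranks(x), ranks(y)
    # cumulative counts: grid[i][j] = #{k : rank_x(k) <= i and rank_y(k) <= j}
    grid = [[0] * (n + 1) for _ in range(n + 1)]
    for a, b in zip(rx, ry):
        grid[a][b] += 1
    total = 0.0
    for i in range(1, n + 1):
        for j in range(1, n + 1):
            grid[i][j] += grid[i - 1][j] + grid[i][j - 1] - grid[i - 1][j - 1]
            total += abs(grid[i][j] / n - (i * j) / n ** 2)
    return 12.0 * total / (n * n - 1)

xs = list(range(100))
s_mono = schweizer_wolff_sigma(xs, [v ** 3 for v in xs])  # strictly increasing
random.seed(4)
ys = xs[:]
random.shuffle(ys)
s_ind = schweizer_wolff_sigma(xs, ys)                     # near-independent
print(s_mono, s_ind)  # s_mono is 1.0 under this normalisation; s_ind is small
```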

Page 36:

From Ranks to Copula

Ranks and copulas:
• Invariant under monotonic transformations
• Not very sensitive to outliers (e.g., the empirical covariance is very sensitive)
• The copula is the CDF of the population distribution over ranks

distribution of ranks = empirical copula density

Page 37:

Empirical Copula Distribution [Deheuvels, 1979]

C_n(i/n, j/n) = (1/n) #{ k : rank(x_k) ≤ i and rank(y_k) ≤ j }
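The rank transform underlying the empirical copula maps each margin to normalised ranks. A small helper sketch of ours:

```python
def pseudo_observations(sample):
    """Normalised ranks u_k = rank(x_k)/n: the points on which the
    empirical copula lives (assumes distinct values)."""
    n = len(sample)
    order = sorted(range(n), key=sample.__getitem__)
    u = [0.0] * n
    for pos, k in enumerate(order):
        u[k] = (pos + 1) / n
    return u

u = pseudo_observations([10.0, 3.5, 7.2])
print(u)  # [1.0, 0.333..., 0.666...]
```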

Page 38:

Using the Empirical Copula

Useful for computing sample versions of functions on copulas

Page 39:

Contribution: New ICA Contrast

• Uses a measure of dependence based on copulas, i.e. joint distributions over ranks

• Properties
 – Does not require density estimation
 – Non-parametric

• Advantages
 – Very robust to outliers
 – Frequently performs as well as state-of-the-art algorithms (and sometimes better)
 – The contrast can be used with ISA (… as opposed to RADICAL)

• Can be used to estimate dependence
 – Easy to implement (code publicly available)

• Disadvantages
 – Somewhat slow (although not prohibitively so)
 – Needs more samples to demix near-Gaussian sources

Page 40:

Schweizer-Wolff ICA (SWICA)

Inputs: X, a 2 × N matrix of signals; K, the number of evaluation angles.

For θ = 0, π/(2K), …, (K−1)π/(2K):
• Compute the rotation matrix W(θ)  [O(1)]
• Compute the rotated signals Y(θ) = W(θ)X  [O(N)]
• Compute s(Y(θ)), a sample estimate of σ:
 − Sort  [O(N log N)]
 − Compute s  [O(N²)]

Find the best angle θ_m = argmin_θ s(Y(θ))  [O(K)]

Output: rotation matrix W = W(θ_m), demixed signal Y = Y(θ_m), and estimated dependence measure s = s(Y(θ_m)).

Overall cost: O(KN²).
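A 2-D angle-scan of this kind can be sketched end to end: rotate, estimate σ from the empirical copula, keep the angle with the smallest dependence. This is our own minimal reimplementation (names and the 12/(n²−1) normalisation are our choices), not the released SWICA code.

```python
import math
import random

def ranks(v):
    order = sorted(range(len(v)), key=v.__getitem__)
    r = [0] * len(v)
    for pos, k in enumerate(order):
        r[k] = pos + 1
    return r

def sw_sigma(x, y):
    """Sample Schweizer-Wolff sigma via the empirical copula on the rank grid."""
    n = len(x)
    rx, ry = ranks(x), ranks(y)
    grid = [[0] * (n + 1) for _ in range(n + 1)]
    for a, b in zip(rx, ry):
        grid[a][b] += 1
    total = 0.0
    for i in range(1, n + 1):
        for j in range(1, n + 1):
            grid[i][j] += grid[i - 1][j] + grid[i][j - 1] - grid[i - 1][j - 1]
            total += abs(grid[i][j] / n - (i * j) / n ** 2)
    return 12.0 * total / (n * n - 1)

def swica_2d(x, y, K=45):
    """Scan K angles in [0, pi/2); return the rotation minimising sigma."""
    best_s, best_t = math.inf, 0.0
    for k in range(K):
        t = k * math.pi / (2 * K)
        xr = [math.cos(t) * a + math.sin(t) * b for a, b in zip(x, y)]
        yr = [-math.sin(t) * a + math.cos(t) * b for a, b in zip(x, y)]
        s = sw_sigma(xr, yr)
        if s < best_s:
            best_s, best_t = s, t
    return best_t, best_s

# demo: mix two independent uniform sources by an unknown rotation theta0
random.seed(3)
s1 = [random.uniform(-1, 1) for _ in range(300)]
s2 = [random.uniform(-1, 1) for _ in range(300)]
theta0 = 0.4
x = [math.cos(theta0) * a - math.sin(theta0) * b for a, b in zip(s1, s2)]
y = [math.sin(theta0) * a + math.cos(theta0) * b for a, b in zip(s1, s2)]
theta, s = swica_2d(x, y)
print(theta, s)  # theta should land near theta0 = 0.4 (mod pi/2)
```

Note that a concordance measure such as Kendall's τ would not work here: for symmetric sources, a 45° rotation still gives τ ≈ 0, which is exactly the "problem with concordance" from the earlier slide.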

Page 41:

Amari Error

• Measures how close a square matrix is to a permutation matrix

B = WA   (W: demixing matrix, A: mixing matrix)
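The Amari error can be sketched in a few lines. Conventions differ across papers, so treat this particular normalisation (which gives 0 for a scaled permutation and at most 1) as our assumption.

```python
def amari_error(P):
    """Amari index of a square matrix P = W A: 0 iff P is a (scaled)
    permutation matrix. One common normalisation; constants vary."""
    a = [[abs(v) for v in row] for row in P]
    m = len(a)
    # each row/column should be dominated by a single entry
    row_term = sum(sum(row) / max(row) - 1.0 for row in a)
    col_term = sum(sum(col) / max(col) - 1.0 for col in zip(*a))
    return (row_term + col_term) / (2.0 * m * (m - 1))

print(amari_error([[0, 2, 0], [0, 0, 1], [3, 0, 0]]))  # 0.0, scaled permutation
print(amari_error([[1, 1], [1, 1]]))                   # 1.0, maximally mixed
```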

Page 42:

Synthetic Marginal Distributions[Bach and Jordan 02]

Page 43:

ICA Comparison (M=2, N=1000, 1000 repetitions)

Bold: error ∈ [best, best + 0.2]

Page 44:

Outliers (M=2, N=1000, 10000 repetitions)

[Figure: Amari error vs. number of post-whitening outliers, comparing SWICA, RADICAL, Kernel-ICA, FastICA, and JADE.]

Page 45:

Corrupted Music

Page 46:

Outliers: Multidimensional Comparison

[Figure: Amari error vs. number of post-whitening outliers.]

M=4, N=2000, 1000 repetitions M=8, N=5000, 100 repetitions

Page 47:

Unmixing Image Sources with Outliers (3% outliers)

Original Mixed SWICA FastICA

[http://www.cis.hut.fi/projects/ica/data/images/]

Page 48:

Independent Subspace Analysis (ISA, the Woodstock Problem)

Sources Observation Estimation

Find W, recover Wx

Page 49:

Independent Subspace Analysis

Page 50:

Summary

• New contrast based on a measure of dependence for distributions over ranks
 – Robust to outliers
 – Comparable performance to state-of-the-art algorithms
 – Can handle a moderate number of sources (M = 20)
 – Can be used to solve ISA

• Future work
 – Further acceleration of SWICA
 – What types of sources does SWICA do well on, and why?

Page 51:

Copula Methods for Mutual Information Estimation

Page 52:

Non-parametric MI estimation using copulas

Mutual information is the negentropy of the copula: I(X_1, …, X_d) = −H(c), where c is the copula density.

Rényi information is the negative Rényi entropy of the copula: I_α = −H_α(c).
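Combining the rank transform with the nearest-neighbour entropy estimator gives a copula-based MI estimate: map each margin to normalised ranks, then estimate the negentropy of the resulting copula sample. A sketch of ours (the 1-NN estimator stands in for whichever entropy estimator the talk used, and boundary bias is ignored):

```python
import math
import random

def pseudo_obs(v):
    """Normalised ranks u_k = rank(x_k)/(n+1)."""
    n = len(v)
    order = sorted(range(n), key=v.__getitem__)
    u = [0.0] * n
    for pos, k in enumerate(order):
        u[k] = (pos + 1) / (n + 1)
    return u

def kl_entropy_2d(points):
    """Kozachenko-Leonenko 1-NN entropy estimate for d = 2 (brute force)."""
    n = len(points)
    euler = 0.5772156649015329
    total = 0.0
    for i, p in enumerate(points):
        r = min(math.dist(p, q) for j, q in enumerate(points) if j != i)
        total += math.log(r)
    return 2.0 * total / n + math.log(math.pi) + euler + math.log(n - 1)

def copula_mi(x, y):
    """MI as the negentropy of the copula sample."""
    return -kl_entropy_2d(list(zip(pseudo_obs(x), pseudo_obs(y))))

random.seed(7)
n = 800
z = [random.gauss(0, 1) for _ in range(n)]
e = [random.gauss(0, 1) for _ in range(n)]
x = z
y = [0.9 * a + math.sqrt(1 - 0.81) * b for a, b in zip(z, e)]  # corr 0.9 with x
w = [random.gauss(0, 1) for _ in range(n)]
mi_dep = copula_mi(x, y)   # true MI = -0.5*log(1 - 0.9^2) ≈ 0.83
mi_ind = copula_mi(e, w)   # true MI = 0
print(mi_dep, mi_ind)
```

Note that no marginal density estimation is involved: only ranks and pairwise distances on the unit square.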

Page 53:

Algorithms

Algorithm for (Rényi) MI estimation:
1. Estimate the copula using the empirical copula
2. Estimate its (Rényi) negentropy

Other methods for the entropy estimation step:
• k-nearest neighbours
• kernel density estimation
• histogram (bin) based estimation

Page 54:

Consistency in 2D

Page 55:

Consistency in 2D

(with Gamma marginals)

Page 56:

Rotated 2D sources

Page 57:

Rotated 2D uniform distribution

Page 58:

Rotated 2D uniform distribution in the presence of outliers

Page 59:

4D sources

Page 60:

Image Registration

Page 61:

ISA

Page 62:

ISA in the Presence of Outliers

Page 63:

Conclusion

Use Copulas,

They can be useful…

Thanks for your attention!