
mutual information as a measure of dependence

Erik-Jan van Kesteren
October 5, 2017

Methods & Statistics Data Science lab


outline

Information

Mutual Information

Maximal Information Coefficient

Questions?

Let’s play!


information


entropy

Entropy is a measure of uncertainty about the value of a random variable.

Formalised by Shannon (1948) at Bell Labs.

Its unit is commonly the shannon (bit) or the nat.

In general (discrete case):

H(X) = -\sum_{x \in X} p(x) \log p(x)


entropy

Let X be the outcome of a coin flip:

X \sim \mathrm{Bernoulli}(p)

then: H(X) = -p \log p - (1 - p) \log(1 - p)


entropy

coinEntropy <- function(p) -p * log(p) - (1 - p) * log(1 - p)
curve(coinEntropy, 0, 1)

[Figure: the coin-flip entropy (in nats) plotted against P(heads); it peaks at P(heads) = 0.5.]


entropy

When we use 2 as the base of the log, the unit will be the shannon, or bits.
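A base-2 version of the entropy function from the previous slide gives the curve in bits. This is a minimal sketch; the name coinEntropy2 is ours, not from the slides.

coinEntropy2 <- function(p) -p * log2(p) - (1 - p) * log2(1 - p)  # base-2 analogue of coinEntropy (name is ours)
curve(coinEntropy2, 0, 1)  # peaks at exactly 1 bit at p = 0.5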

[Figure: the same curve with base-2 logs; entropy in bits, peaking at 1 bit at P(heads) = 0.5.]


information

Uncertainty = Information

“the amount of information we gain when we observe the result of an experiment is equal to the amount of uncertainty about the outcome before we carry out the experiment” (Rényi, 1961)

[Figure: entropy (bits) of the coin flip against P(heads), repeated from the previous slide.]


joint entropy

We can also do this for multivariate probability mass functions:

H(X, Y) = -\sum_{x \in X} \sum_{y \in Y} p(x, y) \log p(x, y)


mutual information


mutual information

Mutual Information is the information that a variable X carries about a variable Y (or vice versa):

I(X; Y) = H(X) + H(Y) - H(X, Y)
        = -\sum_{x \in X} p(x) \log p(x) - \sum_{y \in Y} p(y) \log p(y) + \sum_{x \in X} \sum_{y \in Y} p(x, y) \log p(x, y)
        = \sum_{x \in X} \sum_{y \in Y} p(x, y) \log \left( \frac{p(x, y)}{p(x)\, p(y)} \right)


mutual information

I(X; Y) is a measure of association between two random variables which captures linear and nonlinear relations.

If X \sim N(\mu_1, \sigma_1) and Y \sim N(\mu_2, \sigma_2) with correlation \rho, then

I(X; Y) \geq -\tfrac{1}{2} \log(1 - \rho^2)

(Krafft, 2013)
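For jointly Gaussian variables the bound holds with equality, so the sample correlation gives a quick plug-in value for the mutual information. A minimal sketch (assuming the MASS package is available for simulation; none of this code is from the slides):

library(MASS)  # assumed available, provides mvrnorm

set.seed(1)
rho <- 0.7
xy  <- mvrnorm(1e5, mu = c(0, 0), Sigma = matrix(c(1, rho, rho, 1), 2, 2))

# Plug the estimated correlation into -0.5 * log(1 - rho^2)
rho_hat <- cor(xy[, 1], xy[, 2])
-0.5 * log(1 - rho_hat^2)  # close to -0.5 * log(1 - 0.49), about 0.34 nats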

11

Page 13: Mutual information as a measure of dependence · Erik-Jan van Kesteren October 5, 2017 Methods & Statistics Data Science lab. outline Information Mutual Information Maximal Information

estimating mi in the continuous case

Common estimation method: discretize X and Y, then calculate I(X; Y) on the resulting bins (see the sketch below).

Other option: kernel density estimation (KDE) followed by numerical integration.

This is an active field of research in ML (e.g., Gao et al., 2017).
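A minimal sketch of the discretize-then-calculate approach. The helper name mi_discretized, the bin count, and the test relation are our choices, not from the slides; no bias correction is attempted.

# Plug-in estimate: bin both variables, form the empirical joint pmf,
# and apply the discrete MI formula.
mi_discretized <- function(x, y, bins = 10) {
  xb  <- cut(x, breaks = bins)
  yb  <- cut(y, breaks = bins)
  pxy <- table(xb, yb) / length(x)   # empirical joint pmf
  px  <- rowSums(pxy)                # marginal of X
  py  <- colSums(pxy)                # marginal of Y
  nz  <- pxy > 0                     # skip empty cells to avoid log(0)
  sum(pxy[nz] * log(pxy[nz] / outer(px, py)[nz]))   # in nats
}

set.seed(142857)
x <- rnorm(1000)
y <- x^2 + rnorm(1000, sd = 0.5)   # nonlinear, so cor(x, y) is near 0
mi_discretized(x, y)               # clearly positive: MI picks up the relation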


maximal information coefficient


maximal information coefficient

We need a measure of dependence that is equitable: its value should depend only on the amount of noise and not on the functional form of the relation between X and Y. (Reshef et al., 2011, paraphrased)


example

[Figure: example scatterplot of X and Y on the unit square.]


example

[Figure: the same scatterplot with a grid overlaid, discretizing X into 3 bins and Y into 4 bins (the bins used in the calculation below).]


example

H(X) = -0.3 log 0.3 - 0.3 log 0.3 - 0.4 log 0.4 = 1.09
H(Y) = -0.2 log 0.2 - 0.4 log 0.4 - 0.3 log 0.3 - 0.1 log 0.1 = 1.28

H(X, Y) = -0.6 log 0.1 - 0.4 log 0.2 = 2.03

I(X; Y) = H(X) + H(Y) - H(X, Y) = 0.34

Then, normalise so that I_n(X; Y) \in [0, 1]:

I_n(X; Y) = \frac{I(X; Y)}{\log \min(n_x, n_y)} = \frac{0.34}{\log 3} = 0.31
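These numbers can be checked in a few lines of R. The joint table below is one arrangement of six 0.1 cells and two 0.2 cells that matches the marginals above; the exact layout in the figure is not recoverable here, but the entropies depend only on the cell values, so the result is the same.

# A 3 x 4 joint pmf consistent with marginals (0.3, 0.3, 0.4) and
# (0.2, 0.4, 0.3, 0.1); the specific placement of the cells is an assumption.
pxy <- matrix(c(0.1, 0.2, 0.0, 0.0,
                0.0, 0.1, 0.1, 0.1,
                0.1, 0.1, 0.2, 0.0), nrow = 3, byrow = TRUE)
H <- function(p) -sum(p[p > 0] * log(p[p > 0]))   # entropy of a pmf, in nats

Hx  <- H(rowSums(pxy))            # 1.09
Hy  <- H(colSums(pxy))            # 1.28
Hxy <- H(pxy)                     # 2.03
mi  <- Hx + Hy - Hxy              # 0.34
mi / log(min(dim(pxy)))           # 0.31, the normalised MI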


maximal information coefficient

How to calculate the Maximal Information Coefficient (MIC):

1. For every grid resolution n_x × n_y with n_x · n_y ≤ N^{0.6}, calculate the maximum normalised MI over all grids of that resolution.

2. MIC is the largest of these normalised MIs (a rough sketch of the idea follows below).
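A rough sketch of the idea, not the real algorithm: the minerva package used later in the deck performs the proper optimisation over grid placements, whereas this version only tries equal-width bins at each resolution, and the helper names are ours.

# Normalised MI for one fixed equal-width binning of x and y.
norm_mi <- function(x, y, nx, ny) {
  pxy <- table(cut(x, nx), cut(y, ny)) / length(x)
  H   <- function(p) -sum(p[p > 0] * log(p[p > 0]))
  mi  <- H(rowSums(pxy)) + H(colSums(pxy)) - H(pxy)
  mi / log(min(nx, ny))
}

# Crude MIC-like score: maximise over resolutions with nx * ny <= N^0.6.
mic_sketch <- function(x, y) {
  N    <- length(x)
  best <- 0
  for (nx in 2:floor(N^0.6 / 2)) {
    for (ny in 2:floor(N^0.6 / nx)) {
      best <- max(best, norm_mi(x, y, nx, ny))
    }
  }
  best
}
# e.g. compare mic_sketch(x, f(x)) with minerva's mine(x, f(x))$MIC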


equitability

[Figure: six scatterplots illustrating equitability. Panel titles: rsq: 1, MIC: 1 | rsq: 0.68, MIC: 0.58 | rsq: 0.42, MIC: 0.37 | rsq: 0, MIC: 1 | rsq: 0, MIC: 0.52 | rsq: 0, MIC: 0.33. As noise increases, MIC drops similarly for the linear relation (where R² also drops) and for a nonlinear relation (where R² stays 0).]


functional forms


questions?


let’s play!


get your laptops out!

install.packages("minerva")
library("minerva")
set.seed(142857)
x <- rnorm(300)

# Define functional form
f <- function(x) log(abs(x))

# Get the MIC
mine(x, f(x))$MIC


the rules

1. Don’t add errors! The goal is to cheat the system!
2. You can only use x once in f(x).
3. f(x) can only perform 2 operations.
4. Any number in f(x) needs to be a 9.
5. Top tip: plot(x, f(x)).
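As an illustration only (this is not one of the presenter's entries), here is one function that obeys the rules, using the x and minerva setup from the previous slide: x appears once, there are two operations, and the only literal is a 9.

f <- function(x) sin(x * 9)   # two operations (* and sin), one x, only 9s
plot(x, f(x))
mine(x, f(x))$MIC             # how low can you get it?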


references

Gao, W., Kannan, S., Oh, S., and Viswanath, P. (2017). Estimating Mutual Information for Discrete-Continuous Mixtures. pages 1–25.

Krafft, P. (2013). Correlation and mutual information – building intelligent probabilistic systems.

Rényi, A. (1961). On measures of entropy and information. Fourth Berkeley Symposium on Mathematical Statistics and Probability, 1(c):547–561.

Reshef, D., Reshef, Y., Finucane, H., Grossman, S., McVean, G., Turnbaugh, P., Lander, E., Mitzenmacher, M., and Sabeti, P. (2011). Detecting Novel Associations in Large Data Sets. Science, 334(6062):1518–1524.

Shannon, C. E. (1948). A mathematical theory of communication. The Bell System Technical Journal, 27:379–423.

Read more: http://science.sciencemag.org/content/334/6062/1502.full


my top function

f <- function(x) abs(9 %% x)
mine(x, f(x))$MIC
# [1] 0.4969735

[Figure: scatterplot of f(x) = abs(9 %% x) against x.]
