mutual information as a measure of dependence
Erik-Jan van Kesteren
October 5, 2017
Methods & Statistics Data Science lab
outline
Information
Mutual Information
Maximal Information Coefficient
Questions?
Let’s play!
information
entropy
Entropy is a measure of uncertainty about the value of a random variable.
Formalised by Shannon (1948) at Bell Labs.
Its common units are shannons (bits) and nats.
In general (discrete case):
H(X) = −∑_{x∈X} p(x) log p(x)
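A minimal R sketch of this formula (the helper name entropy is ours, not from the slides, and it assumes all probabilities are strictly positive):

# Entropy in nats of a discrete distribution given as a probability vector
entropy <- function(p) -sum(p * log(p))
entropy(c(0.5, 0.5))  # log(2) ≈ 0.69 nats for a fair coin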
entropy
Let X be the outcome of a coin flip:
X ∼ Bernoulli(p)
then:
H(X) = −p log p − (1 − p) log(1 − p)
entropy
# Entropy (nats) of a coin flip with P(heads) = p
coinEntropy <- function(p) -p * log(p) - (1 - p) * log(1 - p)
curve(coinEntropy, 0, 1)
[Plot: Entropy (nats) vs. P(heads)]
entropy
When we use 2 as the base of the log, the unit will be in shannons or bits.
[Plot: Entropy (bits) vs. P(heads)]
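The curve above is a one-line variation on the earlier coinEntropy code, swapping log for log2 (coinEntropy2 is our name):

# Coin-flip entropy in bits (base-2 log); maximum of 1 bit at p = 0.5
coinEntropy2 <- function(p) -p * log2(p) - (1 - p) * log2(1 - p)
curve(coinEntropy2, 0, 1)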
information
Uncertainty = Information
“the amount of information we gain when we observe the result of an experiment is equal to the amount of uncertainty about the outcome before we carry out the experiment” (Rényi, 1961)
[Plot repeated from the previous slide: Entropy (bits) vs. P(heads)]
joint entropy
We can also do this for multivariate probability mass functions:
H(X, Y) = −∑_{x∈X} ∑_{y∈Y} p(x, y) log p(x, y)
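A minimal sketch, assuming the joint pmf is stored as a matrix of cell probabilities (jointEntropy is our name, not from the slides):

# Joint entropy in nats; na.rm = TRUE drops the NaN that 0 * log(0) yields in R
jointEntropy <- function(pxy) -sum(pxy * log(pxy), na.rm = TRUE)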
mutual information
mutual information
Mutual Information is the information that a variable X carries about a variable Y (or vice versa):
I(X; Y) = H(X) + H(Y) − H(X, Y)
        = −∑_{x∈X} p(x) log p(x) − ∑_{y∈Y} p(y) log p(y) + ∑_{x∈X} ∑_{y∈Y} p(x, y) log p(x, y)
        = ∑_{x∈X} ∑_{y∈Y} p(x, y) log( p(x, y) / (p(x) p(y)) )
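The last line translates directly to R; a sketch under the same joint-pmf-matrix assumption as above (mutualInfo is our name):

# Mutual information in nats from a joint pmf matrix
mutualInfo <- function(pxy) {
  px <- rowSums(pxy)  # marginal distribution of X
  py <- colSums(pxy)  # marginal distribution of Y
  sum(pxy * log(pxy / outer(px, py)), na.rm = TRUE)
}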
mutual information
I(X; Y) is a measure of association between two random variables which captures linear and nonlinear relations.
If X ∼ N(µ₁, σ₁) and Y ∼ N(µ₂, σ₂), then
I(X; Y) ≥ −(1/2) log(1 − ρ²)
(Krafft, 2013)
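For jointly normal X and Y the bound holds with equality, which makes the right-hand side easy to evaluate (ρ = 0.8 is an arbitrary illustration):

# MI of a bivariate normal with correlation rho, in nats
rho <- 0.8
-0.5 * log(1 - rho^2)  # ≈ 0.51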
estimating mi in the continuous case
Common estimation method: discretize X and Y, then calculate I(X; Y) with the discrete formula (a sketch follows below).
Other option: kernel density estimation (KDE), then numerical integration.
This is an active field of research in machine learning (e.g., Gao et al., 2017).
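A minimal version of the discretize-then-calculate approach, using equal-width bins (miBinned and the bin count are illustrative choices, not a recommended estimator):

# Plug-in MI estimate: bin both variables, then apply the discrete formula
miBinned <- function(x, y, bins = 10) {
  pxy <- table(cut(x, bins), cut(y, bins)) / length(x)
  px <- rowSums(pxy)
  py <- colSums(pxy)
  sum(pxy * log(pxy / outer(px, py)), na.rm = TRUE)
}
set.seed(45)
x <- rnorm(1000)
miBinned(x, x^2 + rnorm(1000, sd = 0.5))  # detects a purely nonlinear relation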
maximal information coefficient
maximal information coefficient
“We need a measure of dependence that is equitable: its value should depend only on the amount of noise and not on the functional form of the relation between X and Y.” (Reshef et al., 2011, paraphrased)
example
[Scatterplot of example data; both axes run from 0 to 1]
example
[The same example data, now overlaid with a grid for discretization]
example
H(X) = −0.3 log 0.3 − 0.3 log 0.3 − 0.4 log 0.4 = 1.09
H(Y) = −0.2 log 0.2 − 0.4 log 0.4 − 0.3 log 0.3 − 0.1 log 0.1 = 1.28
H(X, Y) = −0.6 log 0.1 − 0.4 log 0.2 = 2.03
I(X; Y) = H(X) + H(Y) − H(X, Y) = 0.34

Then, normalise so that I_n(X; Y) ∈ [0, 1]:

I_n(X; Y) = I(X; Y) / log min(n_x, n_y) = 0.34 / log 3 = 0.31
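The arithmetic can be checked in a few lines of R (natural logarithms):

px <- c(0.3, 0.3, 0.4)
py <- c(0.2, 0.4, 0.3, 0.1)
hx <- -sum(px * log(px))                   # 1.09
hy <- -sum(py * log(py))                   # 1.28
hxy <- -(0.6 * log(0.1) + 0.4 * log(0.2))  # 2.03
(hx + hy - hxy) / log(3)                   # 0.31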
maximal information coefficient
How to calculate the Maximal Information Coefficient (MIC) (a usage sketch follows below):
1. For every grid size n_x × n_y with n_x · n_y ≤ N^0.6, calculate the maximum normalised MI over possible grid placements.
2. Pick the maximum value of these normalised MIs.
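In practice this search is handled by software; for instance, the minerva package used later in these slides exposes it through mine():

# MIC via minerva; close to 1 for a noiseless functional relation
library(minerva)
set.seed(1)
x <- runif(200)
mine(x, sin(4 * pi * x))$MIC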
equitability
[Six scatterplots comparing R² and MIC as noise increases: (R² = 1, MIC = 1), (R² = 0.68, MIC = 0.58), (R² = 0.42, MIC = 0.37) for a linear relation; (R² = 0, MIC = 1), (R² = 0, MIC = 0.52), (R² = 0, MIC = 0.33) for a nonlinear one]
functional forms
questions?
let’s play!
get your laptops out!
install.packages("minerva")
library("minerva")
set.seed(142857)
x <- rnorm(300)

# Define functional form
f <- function(x) log(abs(x))

# Get the MIC
mine(x, f(x))$MIC
the rules
1. Don’t add errors! The goal is to cheat the system!
2. You can only use x once in f(x).
3. f(x) can only perform 2 operations.
4. Any number in f(x) needs to be a 9.
5. Top tip: plot(x, f(x)).
references
Gao, W., Kannan, S., Oh, S., and Viswanath, P. (2017). Estimating Mutual Information for Discrete-Continuous Mixtures. pages 1–25.
Krafft, P. (2013). Correlation and mutual information – Building intelligent probabilistic systems.
Rényi, A. (1961). On measures of entropy and information. Fourth Berkeley Symposium on Mathematical Statistics and Probability, 1:547–561.
Reshef, D., Reshef, Y., Finucane, H., Grossman, S., McVean, G., Turnbaugh, P., Lander, E., Mitzenmacher, M., and Sabeti, P. (2011). Detecting Novel Associations in Large Data Sets. Science, 334(6062):1518–1524.
Shannon, C. E. (1948). A mathematical theory of communication. The Bell System Technical Journal, 27(3):379–423.
Read more: http://science.sciencemag.org/content/334/6062/1502.full
my top function
f <- function(x) abs(9 %% x)
mine(x, f(x))$MIC
# [1] 0.4969735
[Plot: f(x) = abs(9 %% x) against x]