arthur gretton - university college londongretton/coursefiles/paris18_2.pdf · coco\...
TRANSCRIPT
![Page 1: Arthur Gretton - University College Londongretton/coursefiles/paris18_2.pdf · COCO\ islargesteigenvalue max of " 0 1 n KL 1 n LK 0 #" # = " K 0 0 L #" #: K ij = k( x i; x j) andL](https://reader036.vdocument.in/reader036/viewer/2022081400/60603bfccbeac33dbf60bd2e/html5/thumbnails/1.jpg)
Representing and comparing probabilities: Part 2
Arthur Gretton
Gatsby Computational Neuroscience Unit,University College London
Paris, 2018
1/48
![Page 2: Arthur Gretton - University College Londongretton/coursefiles/paris18_2.pdf · COCO\ islargesteigenvalue max of " 0 1 n KL 1 n LK 0 #" # = " K 0 0 L #" #: K ij = k( x i; x j) andL](https://reader036.vdocument.in/reader036/viewer/2022081400/60603bfccbeac33dbf60bd2e/html5/thumbnails/2.jpg)
Testing against a probabilistic model
2/48
![Page 3: Arthur Gretton - University College Londongretton/coursefiles/paris18_2.pdf · COCO\ islargesteigenvalue max of " 0 1 n KL 1 n LK 0 #" # = " K 0 0 L #" #: K ij = k( x i; x j) andL](https://reader036.vdocument.in/reader036/viewer/2022081400/60603bfccbeac33dbf60bd2e/html5/thumbnails/3.jpg)
Statistical model criticism
MMD(P ;Q) = kf �k2 = supkf kF�1[EQ f � Epf ]
-4 -2 2 4
-0.3
-0.2
-0.1
0.1
0.2
0.3
0.4
p(x)
q(x)
f *(x)
f �(x ) is the witness functionCan we compute MMD with samples from Q and a model P?Problem: usualy can’t compute Epf in closed form.
3/48
![Page 4: Arthur Gretton - University College Londongretton/coursefiles/paris18_2.pdf · COCO\ islargesteigenvalue max of " 0 1 n KL 1 n LK 0 #" # = " K 0 0 L #" #: K ij = k( x i; x j) andL](https://reader036.vdocument.in/reader036/viewer/2022081400/60603bfccbeac33dbf60bd2e/html5/thumbnails/4.jpg)
Stein idea
To get rid of Epf insup
kf kF�1[Eq f � Epf ]
we define the Stein operator
[Tpf ] (x ) =1
p(x )ddx
(f (x )p(x ))
ThenEPTP f = 0
subject to appropriate boundary conditions. (Oates, Girolami, Chopin, 2016)
4/48
![Page 5: Arthur Gretton - University College Londongretton/coursefiles/paris18_2.pdf · COCO\ islargesteigenvalue max of " 0 1 n KL 1 n LK 0 #" # = " K 0 0 L #" #: K ij = k( x i; x j) andL](https://reader036.vdocument.in/reader036/viewer/2022081400/60603bfccbeac33dbf60bd2e/html5/thumbnails/5.jpg)
Stein idea: proof
Ep [Tpf ] =Z �
1p(x )
ddx
(f (x )p(x ))�p(x )dxZ �
ddx
(f (x )p(x ))�dx
= [f (x )p(x )]1�1= 0
5/48
![Page 6: Arthur Gretton - University College Londongretton/coursefiles/paris18_2.pdf · COCO\ islargesteigenvalue max of " 0 1 n KL 1 n LK 0 #" # = " K 0 0 L #" #: K ij = k( x i; x j) andL](https://reader036.vdocument.in/reader036/viewer/2022081400/60603bfccbeac33dbf60bd2e/html5/thumbnails/6.jpg)
Stein idea: proof
Ep [Tpf ] =Z �
1
���p(x )ddx
(f (x )p(x ))��
��p(x )dxZ �ddx
(f (x )p(x ))�dx
= [f (x )p(x )]1�1= 0
5/48
![Page 7: Arthur Gretton - University College Londongretton/coursefiles/paris18_2.pdf · COCO\ islargesteigenvalue max of " 0 1 n KL 1 n LK 0 #" # = " K 0 0 L #" #: K ij = k( x i; x j) andL](https://reader036.vdocument.in/reader036/viewer/2022081400/60603bfccbeac33dbf60bd2e/html5/thumbnails/7.jpg)
Stein idea: proof
Ep [Tpf ] =Z �
1
���p(x )ddx
(f (x )p(x ))��
��p(x )dxZ �ddx
(f (x )p(x ))�dx
= [f (x )p(x )]1�1= 0
5/48
![Page 8: Arthur Gretton - University College Londongretton/coursefiles/paris18_2.pdf · COCO\ islargesteigenvalue max of " 0 1 n KL 1 n LK 0 #" # = " K 0 0 L #" #: K ij = k( x i; x j) andL](https://reader036.vdocument.in/reader036/viewer/2022081400/60603bfccbeac33dbf60bd2e/html5/thumbnails/8.jpg)
Stein idea: proof
Ep [Tpf ] =Z �
1
���p(x )ddx
(f (x )p(x ))��
��p(x )dxZ �ddx
(f (x )p(x ))�dx
= [f (x )p(x )]1�1= 0
5/48
![Page 9: Arthur Gretton - University College Londongretton/coursefiles/paris18_2.pdf · COCO\ islargesteigenvalue max of " 0 1 n KL 1 n LK 0 #" # = " K 0 0 L #" #: K ij = k( x i; x j) andL](https://reader036.vdocument.in/reader036/viewer/2022081400/60603bfccbeac33dbf60bd2e/html5/thumbnails/9.jpg)
Stein idea: proof
Ep [Tpf ] =Z �
1
���p(x )ddx
(f (x )p(x ))��
��p(x )dxZ �ddx
(f (x )p(x ))�dx
= [f (x )p(x )]1�1= 0
5/48
![Page 10: Arthur Gretton - University College Londongretton/coursefiles/paris18_2.pdf · COCO\ islargesteigenvalue max of " 0 1 n KL 1 n LK 0 #" # = " K 0 0 L #" #: K ij = k( x i; x j) andL](https://reader036.vdocument.in/reader036/viewer/2022081400/60603bfccbeac33dbf60bd2e/html5/thumbnails/10.jpg)
Kernel Stein DiscrepancyStein operator
Tpf =1
p(x )ddx
(f (x )p(x ))
Kernel Stein Discrepancy (KSD)
KSD(p; q ;F) = supkgkF�1
EqTpg � EpTpg
6/48
![Page 11: Arthur Gretton - University College Londongretton/coursefiles/paris18_2.pdf · COCO\ islargesteigenvalue max of " 0 1 n KL 1 n LK 0 #" # = " K 0 0 L #" #: K ij = k( x i; x j) andL](https://reader036.vdocument.in/reader036/viewer/2022081400/60603bfccbeac33dbf60bd2e/html5/thumbnails/11.jpg)
Kernel Stein DiscrepancyStein operator
Tpf =1
p(x )ddx
(f (x )p(x ))
Kernel Stein Discrepancy (KSD)
KSD(p; q ;F) = supkgkF�1
EqTpg �����EpTpg = supkgkF�1
EqTpg
6/48
![Page 12: Arthur Gretton - University College Londongretton/coursefiles/paris18_2.pdf · COCO\ islargesteigenvalue max of " 0 1 n KL 1 n LK 0 #" # = " K 0 0 L #" #: K ij = k( x i; x j) andL](https://reader036.vdocument.in/reader036/viewer/2022081400/60603bfccbeac33dbf60bd2e/html5/thumbnails/12.jpg)
Kernel Stein DiscrepancyStein operator
Tpf =1
p(x )ddx
(f (x )p(x ))
Kernel Stein Discrepancy (KSD)
KSD(p; q ;F) = supkgkF�1
EqTpg �����EpTpg = supkgkF�1
EqTpg
-4 -2 2 4
-0.6
-0.4
-0.2
0.2
0.4
p(x)
q(x)
g *(x)
6/48
![Page 13: Arthur Gretton - University College Londongretton/coursefiles/paris18_2.pdf · COCO\ islargesteigenvalue max of " 0 1 n KL 1 n LK 0 #" # = " K 0 0 L #" #: K ij = k( x i; x j) andL](https://reader036.vdocument.in/reader036/viewer/2022081400/60603bfccbeac33dbf60bd2e/html5/thumbnails/13.jpg)
Kernel Stein DiscrepancyStein operator
Tpf =1
p(x )ddx
(f (x )p(x ))
Kernel Stein Discrepancy (KSD)
KSD(p; q ;F) = supkgkF�1
EqTpg �����EpTpg = supkgkF�1
EqTpg
-4 -2 2 4
0.1
0.2
0.3
0.4
p(x)
q(x)
g *(x)
6/48
![Page 14: Arthur Gretton - University College Londongretton/coursefiles/paris18_2.pdf · COCO\ islargesteigenvalue max of " 0 1 n KL 1 n LK 0 #" # = " K 0 0 L #" #: K ij = k( x i; x j) andL](https://reader036.vdocument.in/reader036/viewer/2022081400/60603bfccbeac33dbf60bd2e/html5/thumbnails/14.jpg)
Kernel stein discrepancyClosed-form expression for KSD: given Z ;Z 0 � q , then(Chwialkowski, Strathmann, G., ICML 2016) (Liu, Lee, Jordan ICML 2016)
KSD(p; q ;F) = Eqhp(Z ;Z 0)
where
hp(x ; y) := @x log p(x )@x log p(y)k(x ; y)
+ @y log p(y)@xk(x ; y)
+ @x log p(x )@yk(x ; y)
+ @x@yk(x ; y)
and k is RKHS kernel for F
Only depends on kernel and @x log p(x ). Do not need tonormalize p, or sample from it.
7/48
![Page 15: Arthur Gretton - University College Londongretton/coursefiles/paris18_2.pdf · COCO\ islargesteigenvalue max of " 0 1 n KL 1 n LK 0 #" # = " K 0 0 L #" #: K ij = k( x i; x j) andL](https://reader036.vdocument.in/reader036/viewer/2022081400/60603bfccbeac33dbf60bd2e/html5/thumbnails/15.jpg)
Statistical model criticism
Chicago crime data
8/48
![Page 16: Arthur Gretton - University College Londongretton/coursefiles/paris18_2.pdf · COCO\ islargesteigenvalue max of " 0 1 n KL 1 n LK 0 #" # = " K 0 0 L #" #: K ij = k( x i; x j) andL](https://reader036.vdocument.in/reader036/viewer/2022081400/60603bfccbeac33dbf60bd2e/html5/thumbnails/16.jpg)
Statistical model criticism
Chicago crime dataModel is Gaussian mixture with two components.
8/48
![Page 17: Arthur Gretton - University College Londongretton/coursefiles/paris18_2.pdf · COCO\ islargesteigenvalue max of " 0 1 n KL 1 n LK 0 #" # = " K 0 0 L #" #: K ij = k( x i; x j) andL](https://reader036.vdocument.in/reader036/viewer/2022081400/60603bfccbeac33dbf60bd2e/html5/thumbnails/17.jpg)
Statistical model criticism
Chicago crime dataModel is Gaussian mixture with two components
Stein witness function
8/48
![Page 18: Arthur Gretton - University College Londongretton/coursefiles/paris18_2.pdf · COCO\ islargesteigenvalue max of " 0 1 n KL 1 n LK 0 #" # = " K 0 0 L #" #: K ij = k( x i; x j) andL](https://reader036.vdocument.in/reader036/viewer/2022081400/60603bfccbeac33dbf60bd2e/html5/thumbnails/18.jpg)
Statistical model criticism
Chicago crime dataModel is Gaussian mixture with ten components.
8/48
![Page 19: Arthur Gretton - University College Londongretton/coursefiles/paris18_2.pdf · COCO\ islargesteigenvalue max of " 0 1 n KL 1 n LK 0 #" # = " K 0 0 L #" #: K ij = k( x i; x j) andL](https://reader036.vdocument.in/reader036/viewer/2022081400/60603bfccbeac33dbf60bd2e/html5/thumbnails/19.jpg)
Statistical model criticism
Chicago crime dataModel is Gaussian mixture with ten components
Stein witness functionCode: https://github.com/karlnapf/kernel_goodness_of_fit 8/48
![Page 20: Arthur Gretton - University College Londongretton/coursefiles/paris18_2.pdf · COCO\ islargesteigenvalue max of " 0 1 n KL 1 n LK 0 #" # = " K 0 0 L #" #: K ij = k( x i; x j) andL](https://reader036.vdocument.in/reader036/viewer/2022081400/60603bfccbeac33dbf60bd2e/html5/thumbnails/20.jpg)
Kernel stein discrepancyFurther applications:
Evaluation of approximate MCMC methods.(Chwialkowski, Strathmann, G., ICML 2016; Gorham, Mackey, ICML 2017)
What kernel to use?
The inverse multiquadric kernel,
k(x ; y) =�c + kx � yk2
2
��for � 2 (�1; 0).
9/48
![Page 21: Arthur Gretton - University College Londongretton/coursefiles/paris18_2.pdf · COCO\ islargesteigenvalue max of " 0 1 n KL 1 n LK 0 #" # = " K 0 0 L #" #: K ij = k( x i; x j) andL](https://reader036.vdocument.in/reader036/viewer/2022081400/60603bfccbeac33dbf60bd2e/html5/thumbnails/21.jpg)
Testing statistical dependence
10/48
![Page 22: Arthur Gretton - University College Londongretton/coursefiles/paris18_2.pdf · COCO\ islargesteigenvalue max of " 0 1 n KL 1 n LK 0 #" # = " K 0 0 L #" #: K ij = k( x i; x j) andL](https://reader036.vdocument.in/reader036/viewer/2022081400/60603bfccbeac33dbf60bd2e/html5/thumbnails/22.jpg)
Dependence testing
Given: Samples from a distribution PXY
Goal: Are X and Y independent?
Theirnosesguidethemthroughlife,andthey'reneverhappierthanwhenfollowinganinterestingscent.
Alargeanimalwhoslingsslobber,exudesadistinctivehoundy odor,andwantsnothingmorethantofollowhisnose.
Textfromdogtime.com andpetfinder.com
A responsive,interactivepet,onethatwillblowinyourearandfollowyoueverywhere.
YX
11/48
![Page 23: Arthur Gretton - University College Londongretton/coursefiles/paris18_2.pdf · COCO\ islargesteigenvalue max of " 0 1 n KL 1 n LK 0 #" # = " K 0 0 L #" #: K ij = k( x i; x j) andL](https://reader036.vdocument.in/reader036/viewer/2022081400/60603bfccbeac33dbf60bd2e/html5/thumbnails/23.jpg)
MMD as a dependence measure?
Could we use MMD?
MMD(PXY| {z }P
;PXPY| {z }Q
;H�)
We don’t have samples from Q := PXPY , only pairsf(xi ; yig
ni=1
i:i:d:� PXY
� Solution: simulate Q with pairs (xi ; yj ) for j 6= i .
What kernel � to use for the RKHS H�?
12/48
![Page 24: Arthur Gretton - University College Londongretton/coursefiles/paris18_2.pdf · COCO\ islargesteigenvalue max of " 0 1 n KL 1 n LK 0 #" # = " K 0 0 L #" #: K ij = k( x i; x j) andL](https://reader036.vdocument.in/reader036/viewer/2022081400/60603bfccbeac33dbf60bd2e/html5/thumbnails/24.jpg)
MMD as a dependence measure?
Could we use MMD?
MMD(PXY| {z }P
;PXPY| {z }Q
;H�)
We don’t have samples from Q := PXPY , only pairsf(xi ; yig
ni=1
i:i:d:� PXY
� Solution: simulate Q with pairs (xi ; yj ) for j 6= i .
What kernel � to use for the RKHS H�?
12/48
![Page 25: Arthur Gretton - University College Londongretton/coursefiles/paris18_2.pdf · COCO\ islargesteigenvalue max of " 0 1 n KL 1 n LK 0 #" # = " K 0 0 L #" #: K ij = k( x i; x j) andL](https://reader036.vdocument.in/reader036/viewer/2022081400/60603bfccbeac33dbf60bd2e/html5/thumbnails/25.jpg)
MMD as a dependence measure?
Could we use MMD?
MMD(PXY| {z }P
;PXPY| {z }Q
;H�)
We don’t have samples from Q := PXPY , only pairsf(xi ; yig
ni=1
i:i:d:� PXY
� Solution: simulate Q with pairs (xi ; yj ) for j 6= i .
What kernel � to use for the RKHS H�?
12/48
![Page 26: Arthur Gretton - University College Londongretton/coursefiles/paris18_2.pdf · COCO\ islargesteigenvalue max of " 0 1 n KL 1 n LK 0 #" # = " K 0 0 L #" #: K ij = k( x i; x j) andL](https://reader036.vdocument.in/reader036/viewer/2022081400/60603bfccbeac33dbf60bd2e/html5/thumbnails/26.jpg)
MMD as a dependence measureKernel k on images with feature space F ,
Kernel l on captions with feature space G,
13/48
![Page 27: Arthur Gretton - University College Londongretton/coursefiles/paris18_2.pdf · COCO\ islargesteigenvalue max of " 0 1 n KL 1 n LK 0 #" # = " K 0 0 L #" #: K ij = k( x i; x j) andL](https://reader036.vdocument.in/reader036/viewer/2022081400/60603bfccbeac33dbf60bd2e/html5/thumbnails/27.jpg)
MMD as a dependence measureKernel k on images with feature space F ,
Kernel l on captions with feature space G,
Kernel � on image-text pairs: are images and captions similar?
13/48
![Page 28: Arthur Gretton - University College Londongretton/coursefiles/paris18_2.pdf · COCO\ islargesteigenvalue max of " 0 1 n KL 1 n LK 0 #" # = " K 0 0 L #" #: K ij = k( x i; x j) andL](https://reader036.vdocument.in/reader036/viewer/2022081400/60603bfccbeac33dbf60bd2e/html5/thumbnails/28.jpg)
MMD as a dependence measure
Given: Samples from a distribution PXY
Goal: Are X and Y independent?
MMD2( bPXY ; bPX bPY ;H�) :=1n2 trace(KL)
( K, L column centered)
14/48
![Page 29: Arthur Gretton - University College Londongretton/coursefiles/paris18_2.pdf · COCO\ islargesteigenvalue max of " 0 1 n KL 1 n LK 0 #" # = " K 0 0 L #" #: K ij = k( x i; x j) andL](https://reader036.vdocument.in/reader036/viewer/2022081400/60603bfccbeac33dbf60bd2e/html5/thumbnails/29.jpg)
MMD as a dependence measure
Given: Samples from a distribution PXY
Goal: Are X and Y independent?
MMD2( bPXY ; bPX bPY ;H�) :=1n2 trace(KL)
14/48
![Page 30: Arthur Gretton - University College Londongretton/coursefiles/paris18_2.pdf · COCO\ islargesteigenvalue max of " 0 1 n KL 1 n LK 0 #" # = " K 0 0 L #" #: K ij = k( x i; x j) andL](https://reader036.vdocument.in/reader036/viewer/2022081400/60603bfccbeac33dbf60bd2e/html5/thumbnails/30.jpg)
MMD as a dependence measure
Two questions:
Why the product kernel? Many ways to combine kernels - why noteg a sum?
Is there a more interpretable way of defining this dependencemeasure?
15/48
![Page 31: Arthur Gretton - University College Londongretton/coursefiles/paris18_2.pdf · COCO\ islargesteigenvalue max of " 0 1 n KL 1 n LK 0 #" # = " K 0 0 L #" #: K ij = k( x i; x j) andL](https://reader036.vdocument.in/reader036/viewer/2022081400/60603bfccbeac33dbf60bd2e/html5/thumbnails/31.jpg)
Illustration: dependence 6=correlation
Given: Samples from a distribution PXY
Goal: Are X and Y dependent?
0 0.2 0.4 0.6 0.8 1
0
0.5
1
1.5Correlation: 0.88
16/48
![Page 32: Arthur Gretton - University College Londongretton/coursefiles/paris18_2.pdf · COCO\ islargesteigenvalue max of " 0 1 n KL 1 n LK 0 #" # = " K 0 0 L #" #: K ij = k( x i; x j) andL](https://reader036.vdocument.in/reader036/viewer/2022081400/60603bfccbeac33dbf60bd2e/html5/thumbnails/32.jpg)
Illustration: dependence 6=correlation
Given: Samples from a distribution PXY
Goal: Are X and Y dependent?
0 0.2 0.4 0.6 0.8 1
-2
-1
0
1
2Correlation: 0.07
16/48
![Page 33: Arthur Gretton - University College Londongretton/coursefiles/paris18_2.pdf · COCO\ islargesteigenvalue max of " 0 1 n KL 1 n LK 0 #" # = " K 0 0 L #" #: K ij = k( x i; x j) andL](https://reader036.vdocument.in/reader036/viewer/2022081400/60603bfccbeac33dbf60bd2e/html5/thumbnails/33.jpg)
Illustration: dependence 6=correlation
Given: Samples from a distribution PXY
Goal: Are X and Y dependent?
-2 -1 0 1 2
-1.5
-1
-0.5
0
0.5
1
1.5Correlation: 0.00
16/48
![Page 34: Arthur Gretton - University College Londongretton/coursefiles/paris18_2.pdf · COCO\ islargesteigenvalue max of " 0 1 n KL 1 n LK 0 #" # = " K 0 0 L #" #: K ij = k( x i; x j) andL](https://reader036.vdocument.in/reader036/viewer/2022081400/60603bfccbeac33dbf60bd2e/html5/thumbnails/34.jpg)
Finding covariance with smooth transformations
Illustration: two variables with no correlation but strong dependence.
-2 -1 0 1 2
-1.5
-1
-0.5
0
0.5
1
1.5Correlation: 0.00
17/48
![Page 35: Arthur Gretton - University College Londongretton/coursefiles/paris18_2.pdf · COCO\ islargesteigenvalue max of " 0 1 n KL 1 n LK 0 #" # = " K 0 0 L #" #: K ij = k( x i; x j) andL](https://reader036.vdocument.in/reader036/viewer/2022081400/60603bfccbeac33dbf60bd2e/html5/thumbnails/35.jpg)
Finding covariance with smooth transformations
Illustration: two variables with no correlation but strong dependence.
-2 -1 0 1 2
-1.5
-1
-0.5
0
0.5
1
1.5Correlation: 0.00
-2 0 2
-1
-0.5
0
0.5
-2 0 2
-1
-0.5
0
0.5
17/48
![Page 36: Arthur Gretton - University College Londongretton/coursefiles/paris18_2.pdf · COCO\ islargesteigenvalue max of " 0 1 n KL 1 n LK 0 #" # = " K 0 0 L #" #: K ij = k( x i; x j) andL](https://reader036.vdocument.in/reader036/viewer/2022081400/60603bfccbeac33dbf60bd2e/html5/thumbnails/36.jpg)
Finding covariance with smooth transformations
Illustration: two variables with no correlation but strong dependence.
-2 -1 0 1 2
-1.5
-1
-0.5
0
0.5
1
1.5Correlation: 0.00
-2 0 2
-1
-0.5
0
0.5
-2 0 2
-1
-0.5
0
0.5
-1 -0.5 0 0.5
-1
-0.5
0
0.5Correlation: 0.90
17/48
![Page 37: Arthur Gretton - University College Londongretton/coursefiles/paris18_2.pdf · COCO\ islargesteigenvalue max of " 0 1 n KL 1 n LK 0 #" # = " K 0 0 L #" #: K ij = k( x i; x j) andL](https://reader036.vdocument.in/reader036/viewer/2022081400/60603bfccbeac33dbf60bd2e/html5/thumbnails/37.jpg)
Define two spaces, one for each witness
Function in F
f (x ) =1X
j=1
fj'j (x )
Feature map
Kernel for RKHS F on X :
k(x ; x 0) = h'(x ); '(x 0)iF
Function in G
g(y) =1X
j=1
gj�j (y)
Feature map
Kernel for RKHS G on Y:
l(x ; x 0) = h�(y); �(y 0)iG18/48
![Page 38: Arthur Gretton - University College Londongretton/coursefiles/paris18_2.pdf · COCO\ islargesteigenvalue max of " 0 1 n KL 1 n LK 0 #" # = " K 0 0 L #" #: K ij = k( x i; x j) andL](https://reader036.vdocument.in/reader036/viewer/2022081400/60603bfccbeac33dbf60bd2e/html5/thumbnails/38.jpg)
The constrained covarianceThe constrained covariance is
COCO(PXY ) = supkf kF � 1kgkG � 1
cov[f (x )g(y)]
-2 -1 0 1 2
-1.5
-1
-0.5
0
0.5
1
1.5Correlation: 0.00
-2 0 2
-1
-0.5
0
0.5
-2 0 2
-1
-0.5
0
0.5
-1 -0.5 0 0.5
-1
-0.5
0
0.5Correlation: 0.90
19/48
![Page 39: Arthur Gretton - University College Londongretton/coursefiles/paris18_2.pdf · COCO\ islargesteigenvalue max of " 0 1 n KL 1 n LK 0 #" # = " K 0 0 L #" #: K ij = k( x i; x j) andL](https://reader036.vdocument.in/reader036/viewer/2022081400/60603bfccbeac33dbf60bd2e/html5/thumbnails/39.jpg)
The constrained covarianceThe constrained covariance is
COCO(PXY ) = supkf kF � 1kgkG � 1
cov
240@ 1Xj=1
fj'j (x )
1A0@ 1Xj=1
gj�j (y)
1A35
19/48
![Page 40: Arthur Gretton - University College Londongretton/coursefiles/paris18_2.pdf · COCO\ islargesteigenvalue max of " 0 1 n KL 1 n LK 0 #" # = " K 0 0 L #" #: K ij = k( x i; x j) andL](https://reader036.vdocument.in/reader036/viewer/2022081400/60603bfccbeac33dbf60bd2e/html5/thumbnails/40.jpg)
The constrained covarianceThe constrained covariance is
COCO(PXY ) = supkf kF � 1kgkG � 1
Exy
240@ 1Xj=1
fj'j (x )
1A0@ 1Xj=1
gj�j (y)
1A35
Fine print: feature mappings '(x ) and �(y) assumed to have zero mean.
19/48
![Page 41: Arthur Gretton - University College Londongretton/coursefiles/paris18_2.pdf · COCO\ islargesteigenvalue max of " 0 1 n KL 1 n LK 0 #" # = " K 0 0 L #" #: K ij = k( x i; x j) andL](https://reader036.vdocument.in/reader036/viewer/2022081400/60603bfccbeac33dbf60bd2e/html5/thumbnails/41.jpg)
The constrained covarianceThe constrained covariance is
COCO(PXY ) = supkf kF � 1kgkG � 1
Exy
240@ 1Xj=1
fj'j (x )
1A0@ 1Xj=1
gj�j (y)
1A35
Fine print: feature mappings '(x ) and �(y) assumed to have zero mean.
Rewriting:
Exy [f (x )g(y)]
=
2664f1f2...
3775>
Exy
0BB@2664'1(x )'2(x )
...
3775 h �1(y) �2(y) : : :i1CCA
| {z }C'(x)�(y)
2664g1
g2...
3775
19/48
![Page 42: Arthur Gretton - University College Londongretton/coursefiles/paris18_2.pdf · COCO\ islargesteigenvalue max of " 0 1 n KL 1 n LK 0 #" # = " K 0 0 L #" #: K ij = k( x i; x j) andL](https://reader036.vdocument.in/reader036/viewer/2022081400/60603bfccbeac33dbf60bd2e/html5/thumbnails/42.jpg)
The constrained covarianceThe constrained covariance is
COCO(PXY ) = supkf kF � 1kgkG � 1
Exy
240@ 1Xj=1
fj'j (x )
1A0@ 1Xj=1
gj�j (y)
1A35
Fine print: feature mappings '(x ) and �(y) assumed to have zero mean.
Rewriting:
Exy [f (x )g(y)]
=
2664f1f2...
3775>
Exy
0BB@2664'1(x )'2(x )
...
3775 h �1(y) �2(y) : : :i1CCA
| {z }C'(x)�(y)
2664g1
g2...
3775
COCO: max singular value of feature covariance C'(x )�(y)19/48
![Page 43: Arthur Gretton - University College Londongretton/coursefiles/paris18_2.pdf · COCO\ islargesteigenvalue max of " 0 1 n KL 1 n LK 0 #" # = " K 0 0 L #" #: K ij = k( x i; x j) andL](https://reader036.vdocument.in/reader036/viewer/2022081400/60603bfccbeac33dbf60bd2e/html5/thumbnails/43.jpg)
Computing COCO in practiceGiven sample f(xi ; yi )g
ni=1
i:i:d:� PXY , what is empirical \COCO ?
kernels are computed with empirically centered features '(x )� 1n
Pni=1 '(xi ) and
�(y)� 1n
Pni=1 �(yi ).
G., Smola., Bousquet, Herbrich, Belitski, Augath, Murayama, Pauls, Schoelkopf, and Logothetis,
AISTATS’05
20/48
![Page 44: Arthur Gretton - University College Londongretton/coursefiles/paris18_2.pdf · COCO\ islargesteigenvalue max of " 0 1 n KL 1 n LK 0 #" # = " K 0 0 L #" #: K ij = k( x i; x j) andL](https://reader036.vdocument.in/reader036/viewer/2022081400/60603bfccbeac33dbf60bd2e/html5/thumbnails/44.jpg)
Computing COCO in practiceGiven sample f(xi ; yi )g
ni=1
i:i:d:� PXY , what is empirical \COCO ?
\COCO is largest eigenvalue max of"0 1
n KL1n LK 0
# "�
�
#=
"K 00 L
# "�
�
#:
Kij = k(xi ; xj ) and Lij = l(yi ; yj ).
Fine print: kernels are computed with empirically centered features '(x )� 1n
Pni=1 '(xi )
and �(y)� 1n
Pni=1 �(yi ).
G., Smola., Bousquet, Herbrich, Belitski, Augath, Murayama, Pauls, Schoelkopf, and Logothetis,
AISTATS’05
20/48
![Page 45: Arthur Gretton - University College Londongretton/coursefiles/paris18_2.pdf · COCO\ islargesteigenvalue max of " 0 1 n KL 1 n LK 0 #" # = " K 0 0 L #" #: K ij = k( x i; x j) andL](https://reader036.vdocument.in/reader036/viewer/2022081400/60603bfccbeac33dbf60bd2e/html5/thumbnails/45.jpg)
Computing COCO in practiceGiven sample f(xi ; yi )g
ni=1
i:i:d:� PXY , what is empirical \COCO ?
\COCO is largest eigenvalue max of"0 1
n KL1n LK 0
# "�
�
#=
"K 00 L
# "�
�
#:
Kij = k(xi ; xj ) and Lij = l(yi ; yj ).
Witness functions (singular vectors):
f (x ) /nX
i=1
�ik(xi ; x ) g(y) /nX
i=1
�i l(yi ; y)
Fine print: kernels are computed with empirically centered features '(x )� 1n
Pni=1 '(xi )
and �(y)� 1n
Pni=1 �(yi ).
G., Smola., Bousquet, Herbrich, Belitski, Augath, Murayama, Pauls, Schoelkopf, and Logothetis,
AISTATS’05
20/48
![Page 46: Arthur Gretton - University College Londongretton/coursefiles/paris18_2.pdf · COCO\ islargesteigenvalue max of " 0 1 n KL 1 n LK 0 #" # = " K 0 0 L #" #: K ij = k( x i; x j) andL](https://reader036.vdocument.in/reader036/viewer/2022081400/60603bfccbeac33dbf60bd2e/html5/thumbnails/46.jpg)
Empirical COCO: proof (1)The Lagrangian is
L(f ; g ; �; ) =1n
nXi=1
[f (xi )g(yi )]| {z }covariance
��
2
�kf k2
F � 1��
2
�kgk2
G � 1�
| {z }smoothness constraints
:
Fine print: f (xi )g(yi ) centered to have zero empirical mean.
21/48
![Page 47: Arthur Gretton - University College Londongretton/coursefiles/paris18_2.pdf · COCO\ islargesteigenvalue max of " 0 1 n KL 1 n LK 0 #" # = " K 0 0 L #" #: K ij = k( x i; x j) andL](https://reader036.vdocument.in/reader036/viewer/2022081400/60603bfccbeac33dbf60bd2e/html5/thumbnails/47.jpg)
Empirical COCO: proof (1)The Lagrangian is
L(f ; g ; �; ) =1n
nXi=1
[f (xi )g(yi )]| {z }covariance
��
2
�kf k2
F � 1��
2
�kgk2
G � 1�
| {z }smoothness constraints
:
Fine print: f (xi )g(yi ) centered to have zero empirical mean.
Assume (cf representer theorem):
f =nX
i=1
�i'(xi ) g =nX
i=1
�i (yi )
for centered '(xi ); �(yi ).
21/48
![Page 48: Arthur Gretton - University College Londongretton/coursefiles/paris18_2.pdf · COCO\ islargesteigenvalue max of " 0 1 n KL 1 n LK 0 #" # = " K 0 0 L #" #: K ij = k( x i; x j) andL](https://reader036.vdocument.in/reader036/viewer/2022081400/60603bfccbeac33dbf60bd2e/html5/thumbnails/48.jpg)
Empirical COCO: proof (1)The Lagrangian is
L(f ; g ; �; ) =1n
nXi=1
[f (xi )g(yi )]| {z }covariance
��
2
�kf k2
F � 1��
2
�kgk2
G � 1�
| {z }smoothness constraints
:
Fine print: f (xi )g(yi ) centered to have zero empirical mean.
Assume (cf representer theorem):
f =nX
i=1
�i'(xi ) g =nX
i=1
�i (yi )
for centered '(xi ); �(yi ).First step is smoothness constraint:
kf k2F � 1 = hf ; f iF � 1
21/48
![Page 49: Arthur Gretton - University College Londongretton/coursefiles/paris18_2.pdf · COCO\ islargesteigenvalue max of " 0 1 n KL 1 n LK 0 #" # = " K 0 0 L #" #: K ij = k( x i; x j) andL](https://reader036.vdocument.in/reader036/viewer/2022081400/60603bfccbeac33dbf60bd2e/html5/thumbnails/49.jpg)
Empirical COCO: proof (1)The Lagrangian is
L(f ; g ; �; ) =1n
nXi=1
[f (xi )g(yi )]| {z }covariance
��
2
�kf k2
F � 1��
2
�kgk2
G � 1�
| {z }smoothness constraints
:
Fine print: f (xi )g(yi ) centered to have zero empirical mean.
Assume (cf representer theorem):
f =nX
i=1
�i'(xi ) g =nX
i=1
�i (yi )
for centered '(xi ); �(yi ).First step is smoothness constraint:
kf k2F � 1 = hf ; f iF � 1
=
* nXi=1
�i'(xi );nX
i=1
�i'(xi )
+F� 1
21/48
![Page 50: Arthur Gretton - University College Londongretton/coursefiles/paris18_2.pdf · COCO\ islargesteigenvalue max of " 0 1 n KL 1 n LK 0 #" # = " K 0 0 L #" #: K ij = k( x i; x j) andL](https://reader036.vdocument.in/reader036/viewer/2022081400/60603bfccbeac33dbf60bd2e/html5/thumbnails/50.jpg)
Empirical COCO: proof (1)The Lagrangian is
L(f ; g ; �; ) =1n
nXi=1
[f (xi )g(yi )]| {z }covariance
��
2
�kf k2
F � 1��
2
�kgk2
G � 1�
| {z }smoothness constraints
:
Fine print: f (xi )g(yi ) centered to have zero empirical mean.
Assume (cf representer theorem):
f =nX
i=1
�i'(xi ) g =nX
i=1
�i (yi )
for centered '(xi ); �(yi ).First step is smoothness constraint:
kf k2F � 1 = hf ; f iF � 1
=
* nXi=1
�i'(xi );nX
i=1
�i'(xi )
+F� 1
= �>K�� 121/48
![Page 51: Arthur Gretton - University College Londongretton/coursefiles/paris18_2.pdf · COCO\ islargesteigenvalue max of " 0 1 n KL 1 n LK 0 #" # = " K 0 0 L #" #: K ij = k( x i; x j) andL](https://reader036.vdocument.in/reader036/viewer/2022081400/60603bfccbeac33dbf60bd2e/html5/thumbnails/51.jpg)
Proof sketch (2)Second step is covariance:
1n
nXi=1
[f (xi )g(yi )] =1n
nXi=1
hf ; '(xi )iF hg ; '(yi )iG
=1n
nXi=1
* nX`=1
�`'(x`); '(xi )
+Fhg ; '(yi )iG
=1n�>KL�
22/48
![Page 52: Arthur Gretton - University College Londongretton/coursefiles/paris18_2.pdf · COCO\ islargesteigenvalue max of " 0 1 n KL 1 n LK 0 #" # = " K 0 0 L #" #: K ij = k( x i; x j) andL](https://reader036.vdocument.in/reader036/viewer/2022081400/60603bfccbeac33dbf60bd2e/html5/thumbnails/52.jpg)
Proof sketch (2)Second step is covariance:
1n
nXi=1
[f (xi )g(yi )] =1n
nXi=1
hf ; '(xi )iF hg ; '(yi )iG
=1n
nXi=1
* nX`=1
�`'(x`); '(xi )
+Fhg ; '(yi )iG
=1n�>KL�
22/48
![Page 53: Arthur Gretton - University College Londongretton/coursefiles/paris18_2.pdf · COCO\ islargesteigenvalue max of " 0 1 n KL 1 n LK 0 #" # = " K 0 0 L #" #: K ij = k( x i; x j) andL](https://reader036.vdocument.in/reader036/viewer/2022081400/60603bfccbeac33dbf60bd2e/html5/thumbnails/53.jpg)
Proof sketch (2)
Second step is covariance:
1n
nXi=1
[f (xi )g(yi )] =1n
nXi=1
hf ; '(xi )iF hg ; '(yi )iG
=1n
nXi=1
* nX`=1
�`'(x`); '(xi )
+Fhg ; '(yi )iG
=1n�>KL�
where Kij = k(xi ; xj )=h'(xi ); '(xj )iF Lij = l(yi ; yj ).
22/48
![Page 54: Arthur Gretton - University College Londongretton/coursefiles/paris18_2.pdf · COCO\ islargesteigenvalue max of " 0 1 n KL 1 n LK 0 #" # = " K 0 0 L #" #: K ij = k( x i; x j) andL](https://reader036.vdocument.in/reader036/viewer/2022081400/60603bfccbeac33dbf60bd2e/html5/thumbnails/54.jpg)
Proof sketch (2)Second step is covariance:
1n
nXi=1
[f (xi )g(yi )] =1n
nXi=1
hf ; '(xi )iF hg ; '(yi )iG
=1n
nXi=1
* nX`=1
�`'(x`); '(xi )
+Fhg ; '(yi )iG
=1n�>KL�
where Kij = k(xi ; xj )=h'(xi ); '(xj )iF Lij = l(yi ; yj ).
The Lagranian is now:
L(f ; g ; �; ) =1n�>KL� �
�
2
��>K�� 1
��
2
��>L� � 1
�
22/48
![Page 55: Arthur Gretton - University College Londongretton/coursefiles/paris18_2.pdf · COCO\ islargesteigenvalue max of " 0 1 n KL 1 n LK 0 #" # = " K 0 0 L #" #: K ij = k( x i; x j) andL](https://reader036.vdocument.in/reader036/viewer/2022081400/60603bfccbeac33dbf60bd2e/html5/thumbnails/55.jpg)
What is a large dependence with COCO?
−2 0 2−3
−2
−1
0
1
2
3
X
Y
Smooth density
−4 −2 0 2 4−4
−2
0
2
4
X
Y
500 Samples, smooth density
−2 0 2−3
−2
−1
0
1
2
3
XY
Rough density
−4 −2 0 2 4−4
−2
0
2
4
X
Y
500 samples, rough density
Density takes the form:
PXY / 1+sin(!x ) sin(!y)
Which of these is the more “dependent”?
23/48
![Page 56: Arthur Gretton - University College Londongretton/coursefiles/paris18_2.pdf · COCO\ islargesteigenvalue max of " 0 1 n KL 1 n LK 0 #" # = " K 0 0 L #" #: K ij = k( x i; x j) andL](https://reader036.vdocument.in/reader036/viewer/2022081400/60603bfccbeac33dbf60bd2e/html5/thumbnails/56.jpg)
Finding covariance with smooth transformations
Case of ! = 1:
-4 -2 0 2 4
-4
-2
0
2
4Correlation: 0.31
-2 0 2
-1
-0.5
0
0.5
1
-2 0 2
-1
-0.5
0
0.5
1
-0.5 0 0.5
-0.5
0
0.5
Correlation: 0.50 COCO: 0.09
24/48
![Page 57: Arthur Gretton - University College Londongretton/coursefiles/paris18_2.pdf · COCO\ islargesteigenvalue max of " 0 1 n KL 1 n LK 0 #" # = " K 0 0 L #" #: K ij = k( x i; x j) andL](https://reader036.vdocument.in/reader036/viewer/2022081400/60603bfccbeac33dbf60bd2e/html5/thumbnails/57.jpg)
Finding covariance with smooth transformations
Case of ! = 2:
-4 -2 0 2 4
-4
-2
0
2
4Correlation: 0.02
-2 0 2
-1
-0.5
0
0.5
1
-2 0 2
-1
-0.5
0
0.5
1
-0.5 0 0.5
-0.5
0
0.5
Correlation: 0.54 COCO: 0.07
25/48
![Page 58: Arthur Gretton - University College Londongretton/coursefiles/paris18_2.pdf · COCO\ islargesteigenvalue max of " 0 1 n KL 1 n LK 0 #" # = " K 0 0 L #" #: K ij = k( x i; x j) andL](https://reader036.vdocument.in/reader036/viewer/2022081400/60603bfccbeac33dbf60bd2e/html5/thumbnails/58.jpg)
Finding covariance with smooth transformations
Case of ! = 3:
-4 -2 0 2 4
-4
-2
0
2
4Correlation: 0.03
-2 0 2
-1
-0.5
0
0.5
1
-2 0 2
-1
-0.5
0
0.5
1
-0.5 0 0.5
-0.5
0
0.5
Correlation: 0.44 COCO: 0.04
26/48
![Page 59: Arthur Gretton - University College Londongretton/coursefiles/paris18_2.pdf · COCO\ islargesteigenvalue max of " 0 1 n KL 1 n LK 0 #" # = " K 0 0 L #" #: K ij = k( x i; x j) andL](https://reader036.vdocument.in/reader036/viewer/2022081400/60603bfccbeac33dbf60bd2e/html5/thumbnails/59.jpg)
Finding covariance with smooth transformations
Case of ! = 4:
-4 -2 0 2 4
-4
-2
0
2
4Correlation: 0.05
-2 0 2
-1
-0.5
0
0.5
1
-2 0 2
-1
-0.5
0
0.5
1
-0.5 0 0.5
-0.5
0
0.5
Correlation: 0.25 COCO: 0.02
27/48
![Page 60: Arthur Gretton - University College Londongretton/coursefiles/paris18_2.pdf · COCO\ islargesteigenvalue max of " 0 1 n KL 1 n LK 0 #" # = " K 0 0 L #" #: K ij = k( x i; x j) andL](https://reader036.vdocument.in/reader036/viewer/2022081400/60603bfccbeac33dbf60bd2e/html5/thumbnails/60.jpg)
Finding covariance with smooth transformations
Case of ! =??:
-4 -2 0 2 4
-4
-2
0
2
4Correlation: 0.01
-2 0 2
-1
-0.5
0
0.5
1
-2 0 2
-1
-0.5
0
0.5
1
-0.5 0 0.5
-0.5
0
0.5
Correlation: 0.14 COCO: 0.02
28/48
![Page 61: Arthur Gretton - University College Londongretton/coursefiles/paris18_2.pdf · COCO\ islargesteigenvalue max of " 0 1 n KL 1 n LK 0 #" # = " K 0 0 L #" #: K ij = k( x i; x j) andL](https://reader036.vdocument.in/reader036/viewer/2022081400/60603bfccbeac33dbf60bd2e/html5/thumbnails/61.jpg)
Finding covariance with smooth transformations
Case of ! = 0: uniform noise! (shows bias)
-4 -2 0 2 4
-4
-2
0
2
4Correlation: 0.01
-2 0 2
-1
-0.5
0
0.5
1
-2 0 2
-1
-0.5
0
0.5
1
-0.5 0 0.5
-0.5
0
0.5
Correlation: 0.14 COCO: 0.02
29/48
![Page 62: Arthur Gretton - University College Londongretton/coursefiles/paris18_2.pdf · COCO\ islargesteigenvalue max of " 0 1 n KL 1 n LK 0 #" # = " K 0 0 L #" #: K ij = k( x i; x j) andL](https://reader036.vdocument.in/reader036/viewer/2022081400/60603bfccbeac33dbf60bd2e/html5/thumbnails/62.jpg)
Dependence largest when at “low” frequencies
As dependence is encoded at higher frequencies, the smoothmappings f ; g achieve lower linear dependence.
Even for independent variables, COCO will not be zero at finitesample sizes, since some mild linear dependence will be found by f,g(bias)
This bias will decrease with increasing sample size.
30/48
![Page 63: Arthur Gretton - University College Londongretton/coursefiles/paris18_2.pdf · COCO\ islargesteigenvalue max of " 0 1 n KL 1 n LK 0 #" # = " K 0 0 L #" #: K ij = k( x i; x j) andL](https://reader036.vdocument.in/reader036/viewer/2022081400/60603bfccbeac33dbf60bd2e/html5/thumbnails/63.jpg)
Can we do better than COCO?A second example with zero correlation.First singular value of feature covariance C'(x )�(y):
-1 -0.5 0 0.5 1
-1
-0.5
0
0.5
1
Correlation: 0.00
-2 0 2
-1
-0.5
0
0.5
-2 0 2
-1
-0.5
0
0.5
-1 -0.5 0 0.5
-0.8
-0.6
-0.4
-0.2
0
0.2
0.4
Correlation: 0.80 COCO1: 0.11
31/48
![Page 64: Arthur Gretton - University College Londongretton/coursefiles/paris18_2.pdf · COCO\ islargesteigenvalue max of " 0 1 n KL 1 n LK 0 #" # = " K 0 0 L #" #: K ij = k( x i; x j) andL](https://reader036.vdocument.in/reader036/viewer/2022081400/60603bfccbeac33dbf60bd2e/html5/thumbnails/64.jpg)
Can we do better than COCO?A second example with zero correlation.Second singular value of feature covariance C'(x )�(y):
-1 -0.5 0 0.5 1
-1
-0.5
0
0.5
1
Correlation: 0.00
-2 0 2
-1
-0.5
0
0.5
1
-2 0 2
-1
-0.5
0
0.5
1
31/48
![Page 65: Arthur Gretton - University College Londongretton/coursefiles/paris18_2.pdf · COCO\ islargesteigenvalue max of " 0 1 n KL 1 n LK 0 #" # = " K 0 0 L #" #: K ij = k( x i; x j) andL](https://reader036.vdocument.in/reader036/viewer/2022081400/60603bfccbeac33dbf60bd2e/html5/thumbnails/65.jpg)
Can we do better than COCO?A second example with zero correlation.Second singular value of feature covariance C'(x )�(y):
-1 -0.5 0 0.5 1
-1
-0.5
0
0.5
1
Correlation: 0.00
-2 0 2
-1
-0.5
0
0.5
1
-2 0 2
-1
-0.5
0
0.5
1
-1 -0.5 0 0.5 1
-0.5
0
0.5
Correlation: 0.37 COCO2: 0.06
31/48
![Page 66: Arthur Gretton - University College Londongretton/coursefiles/paris18_2.pdf · COCO\ islargesteigenvalue max of " 0 1 n KL 1 n LK 0 #" # = " K 0 0 L #" #: K ij = k( x i; x j) andL](https://reader036.vdocument.in/reader036/viewer/2022081400/60603bfccbeac33dbf60bd2e/html5/thumbnails/66.jpg)
The Hilbert-Schmidt Independence Criterion
Writing the ith singular value of the feature covariance C'(x )�(y) as
i := COCOi (PXY ;F ;G);
define Hilbert-Schmidt Independence Criterion (HSIC)
HSIC 2(PXY ;F ;G) =1Xi=1
2i :
G, Bousquet , Smola., and Schoelkopf, ALT05; G,., Fukumizu, Teo., Song., Schoelkopf., and Smola,NIPS 2007,.
32/48
![Page 67: Arthur Gretton - University College Londongretton/coursefiles/paris18_2.pdf · COCO\ islargesteigenvalue max of " 0 1 n KL 1 n LK 0 #" # = " K 0 0 L #" #: K ij = k( x i; x j) andL](https://reader036.vdocument.in/reader036/viewer/2022081400/60603bfccbeac33dbf60bd2e/html5/thumbnails/67.jpg)
The Hilbert-Schmidt Independence Criterion
Writing the ith singular value of the feature covariance C'(x )�(y) as
i := COCOi (PXY ;F ;G);
define Hilbert-Schmidt Independence Criterion (HSIC)
HSIC 2(PXY ;F ;G) =1Xi=1
2i :
G, Bousquet , Smola., and Schoelkopf, ALT05; G,., Fukumizu, Teo., Song., Schoelkopf., and Smola,NIPS 2007,.
HSIC is MMD with product kernel!
HSIC 2(PXY ;F ;G) = MMD2(PXY ;PXPY ;H�)
where �((x ; y); (x 0; y 0)) = k(x ; x 0)l(y ; y 0).
32/48
![Page 68: Arthur Gretton - University College Londongretton/coursefiles/paris18_2.pdf · COCO\ islargesteigenvalue max of " 0 1 n KL 1 n LK 0 #" # = " K 0 0 L #" #: K ij = k( x i; x j) andL](https://reader036.vdocument.in/reader036/viewer/2022081400/60603bfccbeac33dbf60bd2e/html5/thumbnails/68.jpg)
Asymptotics of HSIC under independence
Given sample f(xi ; yigni=1
i:i:d:� PXY , what is empirical\HSIC ?
Empirical HSIC (biased)
\HSIC =1n2 trace(KL)
Kij = k(xi ; xj ) and Lij = l(yiyj ) (K and L computed withempirically centered features)
Statistical testing: given PXY = PXPY , what is the threshold c�such that P(\HSIC > c�) < � for small �?Asymptotics of\HSIC when PXY = PXPY :
n\HSIC D!
1Xl=1
�lz 2l ; zl � N (0; 1)i:i:d:
where �l l (zj ) =R
hijqr l (zi )dFi;q;r ; hijqr = 14!
P(i;j ;q;r)(t;u;v ;w)
ktu ltu + ktu lvw � 2ktu ltv
33/48
![Page 69: Arthur Gretton - University College Londongretton/coursefiles/paris18_2.pdf · COCO\ islargesteigenvalue max of " 0 1 n KL 1 n LK 0 #" # = " K 0 0 L #" #: K ij = k( x i; x j) andL](https://reader036.vdocument.in/reader036/viewer/2022081400/60603bfccbeac33dbf60bd2e/html5/thumbnails/69.jpg)
Asymptotics of HSIC under independence
Given sample f(xi ; yigni=1
i:i:d:� PXY , what is empirical\HSIC ?
Empirical HSIC (biased)
\HSIC =1n2 trace(KL)
Kij = k(xi ; xj ) and Lij = l(yiyj ) (K and L computed withempirically centered features)
Statistical testing: given PXY = PXPY , what is the threshold c�such that P(\HSIC > c�) < � for small �?Asymptotics of\HSIC when PXY = PXPY :
n\HSIC D!
1Xl=1
�lz 2l ; zl � N (0; 1)i:i:d:
where �l l (zj ) =R
hijqr l (zi )dFi;q;r ; hijqr = 14!
P(i;j ;q;r)(t;u;v ;w)
ktu ltu + ktu lvw � 2ktu ltv
33/48
![Page 70: Arthur Gretton - University College Londongretton/coursefiles/paris18_2.pdf · COCO\ islargesteigenvalue max of " 0 1 n KL 1 n LK 0 #" # = " K 0 0 L #" #: K ij = k( x i; x j) andL](https://reader036.vdocument.in/reader036/viewer/2022081400/60603bfccbeac33dbf60bd2e/html5/thumbnails/70.jpg)
Asymptotics of HSIC under independence
Given sample f(xi ; yigni=1
i:i:d:� PXY , what is empirical\HSIC ?
Empirical HSIC (biased)
\HSIC =1n2 trace(KL)
Kij = k(xi ; xj ) and Lij = l(yiyj ) (K and L computed withempirically centered features)
Statistical testing: given PXY = PXPY , what is the threshold c�such that P(\HSIC > c�) < � for small �?Asymptotics of\HSIC when PXY = PXPY :
n\HSIC D!
1Xl=1
�lz 2l ; zl � N (0; 1)i:i:d:
where �l l (zj ) =R
hijqr l (zi )dFi;q;r ; hijqr = 14!
P(i;j ;q;r)(t;u;v ;w)
ktu ltu + ktu lvw � 2ktu ltv
33/48
![Page 71: Arthur Gretton - University College Londongretton/coursefiles/paris18_2.pdf · COCO\ islargesteigenvalue max of " 0 1 n KL 1 n LK 0 #" # = " K 0 0 L #" #: K ij = k( x i; x j) andL](https://reader036.vdocument.in/reader036/viewer/2022081400/60603bfccbeac33dbf60bd2e/html5/thumbnails/71.jpg)
Asymptotics of HSIC under independence
Given sample f(xi ; yigni=1
i:i:d:� PXY , what is empirical\HSIC ?
Empirical HSIC (biased)
\HSIC =1n2 trace(KL)
Kij = k(xi ; xj ) and Lij = l(yiyj ) (K and L computed withempirically centered features)
Statistical testing: given PXY = PXPY , what is the threshold c�such that P(\HSIC > c�) < � for small �?Asymptotics of\HSIC when PXY = PXPY :
n\HSIC D!
1Xl=1
�lz 2l ; zl � N (0; 1)i:i:d:
where �l l (zj ) =R
hijqr l (zi )dFi;q;r ; hijqr = 14!
P(i;j ;q;r)(t;u;v ;w)
ktu ltu + ktu lvw � 2ktu ltv
33/48
![Page 72: Arthur Gretton - University College Londongretton/coursefiles/paris18_2.pdf · COCO\ islargesteigenvalue max of " 0 1 n KL 1 n LK 0 #" # = " K 0 0 L #" #: K ij = k( x i; x j) andL](https://reader036.vdocument.in/reader036/viewer/2022081400/60603bfccbeac33dbf60bd2e/html5/thumbnails/72.jpg)
A statistical test
Given PXY = PXPY , what is the threshold c� such thatP(\HSIC > c�) < � for small � (prob. of false positive)?
Original time series:
X1 X2 X3 X4 X5 X6 X7 X8 X9 X10
Y1 Y2 Y3 Y4 Y5 Y6 Y7 Y8 Y9 Y10
Permutation:
X1 X2 X3 X4 X5 X6 X7 X8 X9 X10
Y7 Y3 Y9 Y2 Y4 Y8 Y5 Y1 Y6 Y10
Null distribution via permutation� Compute HSIC for fxi ; y�(i)gn
i=1 for random permutation � of indicesf1; : : : ;ng. This gives HSIC for independent variables.
� Repeat for many different permutations, get empirical CDF� Threshold c� is 1� � quantile of empirical CDF 34/48
![Page 73: Arthur Gretton - University College Londongretton/coursefiles/paris18_2.pdf · COCO\ islargesteigenvalue max of " 0 1 n KL 1 n LK 0 #" # = " K 0 0 L #" #: K ij = k( x i; x j) andL](https://reader036.vdocument.in/reader036/viewer/2022081400/60603bfccbeac33dbf60bd2e/html5/thumbnails/73.jpg)
A statistical test
Given PXY = PXPY , what is the threshold c� such thatP(\HSIC > c�) < � for small � (prob. of false positive)?
Original time series:
X1 X2 X3 X4 X5 X6 X7 X8 X9 X10
Y1 Y2 Y3 Y4 Y5 Y6 Y7 Y8 Y9 Y10
Permutation:
X1 X2 X3 X4 X5 X6 X7 X8 X9 X10
Y7 Y3 Y9 Y2 Y4 Y8 Y5 Y1 Y6 Y10
Null distribution via permutation� Compute HSIC for fxi ; y�(i)gn
i=1 for random permutation � of indicesf1; : : : ;ng. This gives HSIC for independent variables.
� Repeat for many different permutations, get empirical CDF� Threshold c� is 1� � quantile of empirical CDF 34/48
![Page 74: Arthur Gretton - University College Londongretton/coursefiles/paris18_2.pdf · COCO\ islargesteigenvalue max of " 0 1 n KL 1 n LK 0 #" # = " K 0 0 L #" #: K ij = k( x i; x j) andL](https://reader036.vdocument.in/reader036/viewer/2022081400/60603bfccbeac33dbf60bd2e/html5/thumbnails/74.jpg)
A statistical test
Given PXY = PXPY , what is the threshold c� such thatP(\HSIC > c�) < � for small � (prob. of false positive)?
Original time series:
X1 X2 X3 X4 X5 X6 X7 X8 X9 X10
Y1 Y2 Y3 Y4 Y5 Y6 Y7 Y8 Y9 Y10
Permutation:
X1 X2 X3 X4 X5 X6 X7 X8 X9 X10
Y7 Y3 Y9 Y2 Y4 Y8 Y5 Y1 Y6 Y10
Null distribution via permutation� Compute HSIC for fxi ; y�(i)gn
i=1 for random permutation � of indicesf1; : : : ;ng. This gives HSIC for independent variables.
� Repeat for many different permutations, get empirical CDF� Threshold c� is 1� � quantile of empirical CDF 34/48
![Page 75: Arthur Gretton - University College Londongretton/coursefiles/paris18_2.pdf · COCO\ islargesteigenvalue max of " 0 1 n KL 1 n LK 0 #" # = " K 0 0 L #" #: K ij = k( x i; x j) andL](https://reader036.vdocument.in/reader036/viewer/2022081400/60603bfccbeac33dbf60bd2e/html5/thumbnails/75.jpg)
Application: dependence detection across languagesTesting task: detect dependence between English and French text
Lesordresdegouvernementsprovinciauxetmunicipauxsubissentdefortespressions
Honourablesenators,IhaveaquestionfortheLeaderoftheGovernmentintheSenate
Textfromthealignedhansards ofthe36th parliamentofcanada,https://www.isi.edu/natural-language/download/hansard/
YXHonorablessénateurs,maquestions’adresseauleaderdugouvernementauSénat
Aucontraire,nousavonsaugmentelefinancementfédéralpourledéveloppementdesjeunes
Nodoubtthereisgreatpressureonprovincialandmunicipalgovernments
Infact,wehaveincreasedfederalinvestmentsforearlychildhooddevelopment.
......
35/48
![Page 76: Arthur Gretton - University College Londongretton/coursefiles/paris18_2.pdf · COCO\ islargesteigenvalue max of " 0 1 n KL 1 n LK 0 #" # = " K 0 0 L #" #: K ij = k( x i; x j) andL](https://reader036.vdocument.in/reader036/viewer/2022081400/60603bfccbeac33dbf60bd2e/html5/thumbnails/76.jpg)
Application: dependence detection across languagesTesting task: detect dependence between English and French textk -spectrum kernel, k = 10, sample size n = 10
\HSIC =1n2 trace(KL)
(K and L column centered) 36/48
![Page 77: Arthur Gretton - University College Londongretton/coursefiles/paris18_2.pdf · COCO\ islargesteigenvalue max of " 0 1 n KL 1 n LK 0 #" # = " K 0 0 L #" #: K ij = k( x i; x j) andL](https://reader036.vdocument.in/reader036/viewer/2022081400/60603bfccbeac33dbf60bd2e/html5/thumbnails/77.jpg)
Application:Dependence detection across languages
Results (for � = 0:05)
k-spectrum kernel: average Type II error 0
Bag of words kernel: average Type II error 0.18
Settings: Five line extracts, averaged over 300 repetitions, for“Agriculture” transcripts. Similar results for Fisheries andImmigration transcripts.
37/48
![Page 78: Arthur Gretton - University College Londongretton/coursefiles/paris18_2.pdf · COCO\ islargesteigenvalue max of " 0 1 n KL 1 n LK 0 #" # = " K 0 0 L #" #: K ij = k( x i; x j) andL](https://reader036.vdocument.in/reader036/viewer/2022081400/60603bfccbeac33dbf60bd2e/html5/thumbnails/78.jpg)
Testing higher order interactions
38/48
![Page 79: Arthur Gretton - University College Londongretton/coursefiles/paris18_2.pdf · COCO\ islargesteigenvalue max of " 0 1 n KL 1 n LK 0 #" # = " K 0 0 L #" #: K ij = k( x i; x j) andL](https://reader036.vdocument.in/reader036/viewer/2022081400/60603bfccbeac33dbf60bd2e/html5/thumbnails/79.jpg)
Detecting higher order interaction
How to detect V-structures with pairwise weak individualdependence?
X Y
Z
39/48
![Page 80: Arthur Gretton - University College Londongretton/coursefiles/paris18_2.pdf · COCO\ islargesteigenvalue max of " 0 1 n KL 1 n LK 0 #" # = " K 0 0 L #" #: K ij = k( x i; x j) andL](https://reader036.vdocument.in/reader036/viewer/2022081400/60603bfccbeac33dbf60bd2e/html5/thumbnails/80.jpg)
Detecting higher order interaction
How to detect V-structures with pairwise weak individualdependence?
39/48
![Page 81: Arthur Gretton - University College Londongretton/coursefiles/paris18_2.pdf · COCO\ islargesteigenvalue max of " 0 1 n KL 1 n LK 0 #" # = " K 0 0 L #" #: K ij = k( x i; x j) andL](https://reader036.vdocument.in/reader036/viewer/2022081400/60603bfccbeac33dbf60bd2e/html5/thumbnails/81.jpg)
Detecting higher order interactionHow to detect V-structures with pairwise weak individualdependence?
X ?? Y ;Y ?? Z ;X ?? ZX1 vs Y1 Y1 vs Z1
X1 vs Z1 X1*Y1 vs Z1
X Y
Z
X ;Y i:i:d:� N (0; 1)
Z j X ;Y � sign(XY )Exp( 1p2)
Fine print: Faithfulness violated here!
39/48
![Page 82: Arthur Gretton - University College Londongretton/coursefiles/paris18_2.pdf · COCO\ islargesteigenvalue max of " 0 1 n KL 1 n LK 0 #" # = " K 0 0 L #" #: K ij = k( x i; x j) andL](https://reader036.vdocument.in/reader036/viewer/2022081400/60603bfccbeac33dbf60bd2e/html5/thumbnails/82.jpg)
V-structure discovery
X Y
Z
Assume X ?? Y has been established.V-structure can then be detected by:
Consistent CI test: H0 : X ?? Y jZ [Fukumizu et al. 2008, Zhang et al. 2011]
Factorisation test: H0 : (X ;Y ) ?? Z _ (X ;Z ) ?? Y _ (Y ;Z ) ?? X(multiple standard two-variable tests)
How well do these work?40/48
![Page 83: Arthur Gretton - University College Londongretton/coursefiles/paris18_2.pdf · COCO\ islargesteigenvalue max of " 0 1 n KL 1 n LK 0 #" # = " K 0 0 L #" #: K ij = k( x i; x j) andL](https://reader036.vdocument.in/reader036/viewer/2022081400/60603bfccbeac33dbf60bd2e/html5/thumbnails/83.jpg)
Detecting higher order interaction
Generalise earlier example to p dimensions
X ?? Y ;Y ?? Z ;X ?? ZX1 vs Y1 Y1 vs Z1
X1 vs Z1 X1*Y1 vs Z1
X Y
Z
X ;Y i:i:d:� N (0; 1)
Z j X ;Y � sign(XY )Exp( 1p2)
X2:p ;Y2:p ;Z2:pi :i :d :� N (0; Ip�1)
Fine print: Faithfulness violated here!
41/48
![Page 84: Arthur Gretton - University College Londongretton/coursefiles/paris18_2.pdf · COCO\ islargesteigenvalue max of " 0 1 n KL 1 n LK 0 #" # = " K 0 0 L #" #: K ij = k( x i; x j) andL](https://reader036.vdocument.in/reader036/viewer/2022081400/60603bfccbeac33dbf60bd2e/html5/thumbnails/84.jpg)
V-structure discovery
CI test for X ?? Y jZ from Zhang et al. (2011), and a factorisation test,n = 500
42/48
![Page 85: Arthur Gretton - University College Londongretton/coursefiles/paris18_2.pdf · COCO\ islargesteigenvalue max of " 0 1 n KL 1 n LK 0 #" # = " K 0 0 L #" #: K ij = k( x i; x j) andL](https://reader036.vdocument.in/reader036/viewer/2022081400/60603bfccbeac33dbf60bd2e/html5/thumbnails/85.jpg)
Lancaster interaction measureLancaster interaction measure of (X1; : : : ;XD) � P is a signedmeasure �P that vanishes whenever P can be factorised non-trivially.
D = 2 : �LP = PXY � PXPY
43/48
![Page 86: Arthur Gretton - University College Londongretton/coursefiles/paris18_2.pdf · COCO\ islargesteigenvalue max of " 0 1 n KL 1 n LK 0 #" # = " K 0 0 L #" #: K ij = k( x i; x j) andL](https://reader036.vdocument.in/reader036/viewer/2022081400/60603bfccbeac33dbf60bd2e/html5/thumbnails/86.jpg)
Lancaster interaction measureLancaster interaction measure of (X1; : : : ;XD) � P is a signedmeasure �P that vanishes whenever P can be factorised non-trivially.
D = 2 : �LP = PXY � PXPY
D = 3 : �LP = PXYZ �PXPYZ �PY PXZ �PZPXY +2PXPY PZ
43/48
![Page 87: Arthur Gretton - University College Londongretton/coursefiles/paris18_2.pdf · COCO\ islargesteigenvalue max of " 0 1 n KL 1 n LK 0 #" # = " K 0 0 L #" #: K ij = k( x i; x j) andL](https://reader036.vdocument.in/reader036/viewer/2022081400/60603bfccbeac33dbf60bd2e/html5/thumbnails/87.jpg)
Lancaster interaction measure
Lancaster interaction measure of (X1; : : : ;XD) � P is a signedmeasure �P that vanishes whenever P can be factorised non-trivially.
D = 2 : �LP = PXY � PXPY
D = 3 : �LP = PXYZ �PXPYZ �PY PXZ �PZPXY +2PXPY PZ
X Y
Z
X Y
Z
X Y
Z
X Y
Z
PXY Z −PXPY Z −PY PXZ −PZPXY +2PXPY PZ
∆LP =
43/48
![Page 88: Arthur Gretton - University College Londongretton/coursefiles/paris18_2.pdf · COCO\ islargesteigenvalue max of " 0 1 n KL 1 n LK 0 #" # = " K 0 0 L #" #: K ij = k( x i; x j) andL](https://reader036.vdocument.in/reader036/viewer/2022081400/60603bfccbeac33dbf60bd2e/html5/thumbnails/88.jpg)
Lancaster interaction measureLancaster interaction measure of (X1; : : : ;XD) � P is a signedmeasure �P that vanishes whenever P can be factorised non-trivially.
D = 2 : �LP = PXY � PXPY
D = 3 : �LP = PXYZ �PXPYZ �PY PXZ �PZPXY +2PXPY PZ
X Y
Z
X Y
Z
X Y
Z
X Y
Z
PXY Z −PXPY Z −PXZPY −PXYPZ +2PXPY PZ
∆LP = 0
Case of PX ?? PYZ
43/48
![Page 89: Arthur Gretton - University College Londongretton/coursefiles/paris18_2.pdf · COCO\ islargesteigenvalue max of " 0 1 n KL 1 n LK 0 #" # = " K 0 0 L #" #: K ij = k( x i; x j) andL](https://reader036.vdocument.in/reader036/viewer/2022081400/60603bfccbeac33dbf60bd2e/html5/thumbnails/89.jpg)
Lancaster interaction measure
Lancaster interaction measure of (X1; : : : ;XD) � P is a signedmeasure �P that vanishes whenever P can be factorised non-trivially.
D = 2 : �LP = PXY � PXPY
D = 3 : �LP = PXYZ �PXPYZ �PY PXZ �PZPXY +2PXPY PZ
(X ;Y ) ?? Z _ (X ;Z ) ?? Y _ (Y ;Z ) ?? X ) �LP = 0:
...so what might be missed?
43/48
![Page 90: Arthur Gretton - University College Londongretton/coursefiles/paris18_2.pdf · COCO\ islargesteigenvalue max of " 0 1 n KL 1 n LK 0 #" # = " K 0 0 L #" #: K ij = k( x i; x j) andL](https://reader036.vdocument.in/reader036/viewer/2022081400/60603bfccbeac33dbf60bd2e/html5/thumbnails/90.jpg)
Lancaster interaction measureLancaster interaction measure of (X1; : : : ;XD) � P is a signedmeasure �P that vanishes whenever P can be factorised non-trivially.
D = 2 : �LP = PXY � PXPY
D = 3 : �LP = PXYZ �PXPYZ �PY PXZ �PZPXY +2PXPY PZ
�LP = 0; (X ;Y ) ?? Z _ (X ;Z ) ?? Y _ (Y ;Z ) ?? X
Example:
P(0; 0; 0) = 0:2 P(0; 0; 1) = 0:1 P(1; 0; 0) = 0:1 P(1; 0; 1) = 0:1P(0; 1; 0) = 0:1 P(0; 1; 1) = 0:1 P(1; 1; 0) = 0:1 P(1; 1; 1) = 0:2
43/48
![Page 91: Arthur Gretton - University College Londongretton/coursefiles/paris18_2.pdf · COCO\ islargesteigenvalue max of " 0 1 n KL 1 n LK 0 #" # = " K 0 0 L #" #: K ij = k( x i; x j) andL](https://reader036.vdocument.in/reader036/viewer/2022081400/60603bfccbeac33dbf60bd2e/html5/thumbnails/91.jpg)
A kernel test statistic using Lancaster Measure
Construct a test by estimating k�� (�LP)k2H�; where � = k l m :
k��(PXYZ � PXY PZ � � � � )k2H�
=
h��PXYZ ; ��PXYZ iH�� 2 h��PXYZ ; ��PXY PZ iH�
� � �
44/48
![Page 92: Arthur Gretton - University College Londongretton/coursefiles/paris18_2.pdf · COCO\ islargesteigenvalue max of " 0 1 n KL 1 n LK 0 #" # = " K 0 0 L #" #: K ij = k( x i; x j) andL](https://reader036.vdocument.in/reader036/viewer/2022081400/60603bfccbeac33dbf60bd2e/html5/thumbnails/92.jpg)
A kernel test statistic using Lancaster Measure
Table: V -statistic estimators of h���; ��� 0iH�
(without terms PXPY PZ ). His centering matrix I � n�1
Lancaster interaction statistic: Sejdinovic, G, Bergsma, NIPS13
k�� (�LP)k2H�
=1n2 (HKH �HLH �HMH )++ :
Empirical joint central moment in the feature space 45/48
![Page 93: Arthur Gretton - University College Londongretton/coursefiles/paris18_2.pdf · COCO\ islargesteigenvalue max of " 0 1 n KL 1 n LK 0 #" # = " K 0 0 L #" #: K ij = k( x i; x j) andL](https://reader036.vdocument.in/reader036/viewer/2022081400/60603bfccbeac33dbf60bd2e/html5/thumbnails/93.jpg)
A kernel test statistic using Lancaster Measure
Table: V -statistic estimators of h���; ��� 0iH�
(without terms PXPY PZ ). His centering matrix I � n�1
Lancaster interaction statistic: Sejdinovic, G, Bergsma, NIPS13
k�� (�LP)k2H�
=1n2 (HKH �HLH �HMH )++ :
Empirical joint central moment in the feature space 45/48
![Page 94: Arthur Gretton - University College Londongretton/coursefiles/paris18_2.pdf · COCO\ islargesteigenvalue max of " 0 1 n KL 1 n LK 0 #" # = " K 0 0 L #" #: K ij = k( x i; x j) andL](https://reader036.vdocument.in/reader036/viewer/2022081400/60603bfccbeac33dbf60bd2e/html5/thumbnails/94.jpg)
V-structure discovery
Lancaster test, CI test for X ?? Y jZ from Zhang et al. (2011), and afactorisation test, n = 500 46/48
![Page 95: Arthur Gretton - University College Londongretton/coursefiles/paris18_2.pdf · COCO\ islargesteigenvalue max of " 0 1 n KL 1 n LK 0 #" # = " K 0 0 L #" #: K ij = k( x i; x j) andL](https://reader036.vdocument.in/reader036/viewer/2022081400/60603bfccbeac33dbf60bd2e/html5/thumbnails/95.jpg)
Interaction for D � 4
Interaction measure valid for all D :(Streitberg, 1990)
�SP =X�
(�1)j�j�1 (j�j � 1)!J�P
� For a partition �, J� associates to thejoint the corresponding factorisation,e.g., J13j2j4P = PX1X3PX2PX4 :
47/48
![Page 96: Arthur Gretton - University College Londongretton/coursefiles/paris18_2.pdf · COCO\ islargesteigenvalue max of " 0 1 n KL 1 n LK 0 #" # = " K 0 0 L #" #: K ij = k( x i; x j) andL](https://reader036.vdocument.in/reader036/viewer/2022081400/60603bfccbeac33dbf60bd2e/html5/thumbnails/96.jpg)
Interaction for D � 4
Interaction measure valid for all D :(Streitberg, 1990)
�SP =X�
(�1)j�j�1 (j�j � 1)!J�P
� For a partition �, J� associates to thejoint the corresponding factorisation,e.g., J13j2j4P = PX1X3PX2PX4 :
47/48
![Page 97: Arthur Gretton - University College Londongretton/coursefiles/paris18_2.pdf · COCO\ islargesteigenvalue max of " 0 1 n KL 1 n LK 0 #" # = " K 0 0 L #" #: K ij = k( x i; x j) andL](https://reader036.vdocument.in/reader036/viewer/2022081400/60603bfccbeac33dbf60bd2e/html5/thumbnails/97.jpg)
Interaction for D � 4
Interaction measure valid for all D :(Streitberg, 1990)
�SP =X�
(�1)j�j�1 (j�j � 1)!J�P
� For a partition �, J� associates to thejoint the corresponding factorisation,e.g., J13j2j4P = PX1X3PX2PX4 :
1e+04
1e+09
1e+14
1e+19
1 3 5 7 9 11 13 15 17 19 21 23 25D
Nu
mb
er
of
pa
rtitio
ns o
f {1
,...
,D} Bell numbers growth
47/48
![Page 98: Arthur Gretton - University College Londongretton/coursefiles/paris18_2.pdf · COCO\ islargesteigenvalue max of " 0 1 n KL 1 n LK 0 #" # = " K 0 0 L #" #: K ij = k( x i; x j) andL](https://reader036.vdocument.in/reader036/viewer/2022081400/60603bfccbeac33dbf60bd2e/html5/thumbnails/98.jpg)
Co-authorsFrom Gatsby:Mikolaj Binkowski
Kacper Chwialkowski
Wittawat Jitkrittum
Heiko Strathmann
Dougal Sutherland
Wenkai Xu
External collaborators:Kenji Fukumizu
Bernhard Schoelkopf
Bharath Sriperumbudur
Alex Smola
Zoltan Szabo
Questions?
48/48