Density estimation in linear time (+approximating L1-distances)

Satyaki Mahalanabis, Daniel Štefankovič
University of Rochester
Density estimation

Input: DATA + F, where F = a family of densities (f1, ..., f6 in the figure).
Output: a density.
Density estimation - example

DATA: 0.418974, 0.848565, 1.73705, 1.59579, -1.18767, -1.05573, -1.36625
F = the family of normal densities N(μ,σ) with σ = 1
Measure of quality:
L1 distance from the truth (g = TRUTH, f = OUTPUT)

|f-g|₁ = ∫ |f(x)-g(x)| dx

Why L1?
1) small L1 ⇒ all events estimated with small additive error
2) scale invariant
Obstacles to "quality":

1) a weak class of densities F (measured by dist₁(g,F))
2) bad data
What is bad data?

g = TRUTH, h = DATA (empirical density). The data are bad when

Δ = 2 max_{A ∈ Y(F)} |h(A)-g(A)|

is large (note Δ ≤ |h-g|₁).

Y(F) = the Yatracos class of F: the sets Aij = { x | fi(x) > fj(x) }
(for F = {f1, f2, f3} these are A12, A13, A23)
Density estimation

Given DATA (h) + F, output f ∈ F with small |f-g|₁, assuming these are small:
  dist₁(g,F)
  Δ = 2 max_{A ∈ Y(F)} |h(A)-g(A)|
Why would these be small???

Δ = 2 max_{A ∈ Y(F)} |h(A)-g(A)| and dist₁(g,F) will be small if:
1) we pick a large enough F (so dist₁(g,F) is small)
2) we pick a small enough F, so that the VC-dimension of Y(F) is small
3) the data are i.i.d. (so h is close to the truth)

Theorem (Haussler; Dudley; Vapnik, Chervonenkis):
E[ max_{A ∈ Y} |h(A)-g(A)| ] = O( √( VC(Y) / #samples ) )
How to choose from 2 densities?

Given f1, f2 and the data, take the test function T = sgn(f1 - f2)
(+1 where f1 > f2, -1 where f2 > f1) and compare

∫ T f1,  ∫ T f2,  and  ∫ T dh  (the empirical average of T)

Scheffé test: if ∫ T dh > ∫ T (f1+f2)/2, output f1; else output f2.

Theorem (see DL'01): |f-g|₁ ≤ 3 dist₁(g,F) + 2Δ
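The Scheffé test is easy to sketch numerically. This is a minimal illustration, not the talk's implementation: the densities are evaluated on a uniform grid, the helper name `scheffe_test` and the toy candidates N(0,1) vs N(3,1) are my own choices.

```python
import numpy as np

def scheffe_test(f1, f2, samples, grid):
    """Scheffé test between two densities given by their values on a
    uniformly spaced grid; returns the values of the winning density."""
    dx = grid[1] - grid[0]
    T = np.sign(f1 - f2)                        # T = sgn(f1 - f2)
    T_h = np.mean(np.interp(samples, grid, T))  # empirical integral of T
    T_mid = np.sum(T * (f1 + f2) / 2) * dx      # Riemann sum of T (f1+f2)/2
    return f1 if T_h > T_mid else f2

# toy example: truth N(0,1); candidates N(0,1) and N(3,1)
rng = np.random.default_rng(0)
grid = np.linspace(-8, 8, 2001)
norm = lambda mu: np.exp(-(grid - mu) ** 2 / 2) / np.sqrt(2 * np.pi)
data = rng.normal(0.0, 1.0, 500)
winner = scheffe_test(norm(0.0), norm(3.0), data, grid)
```

With 500 samples from the truth, the test reliably returns the N(0,1) candidate.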
Density estimation (recap)

Given DATA (h) + F, output f ∈ F with small |f-g|₁, assuming these are small:
  dist₁(g,F)
  Δ = 2 max_{A ∈ Y(F)} |h(A)-g(A)|
Test functions

F = {f1, f2, ..., fN},  Tij(x) = sgn(fi(x) - fj(x))

∫ Tij (fi - fj) = ∫ (fi - fj) sgn(fi - fj) = |fi - fj|₁

Compare ∫ Tij dh with ∫ Tij fi and ∫ Tij fj: fi wins if ∫ Tij dh is closer to
∫ Tij fi, and fj wins otherwise.
Density estimation algorithms

Scheffé tournament (≈ N² tests): pick the density with the most wins.
Theorem (DL'01): |f-g|₁ ≤ 9 dist₁(g,F) + 8Δ

Minimum distance estimate (Y'85) (≈ N³ work): output the fk ∈ F that
minimizes max_{i,j} |∫ (fk - h) Tij|.
Theorem (DL'01): |f-g|₁ ≤ 3 dist₁(g,F) + 2Δ
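The Scheffé tournament can be sketched under the same grid-discretization assumptions as before (the function name and the toy family of shifted normals are illustrative, not from the talk):

```python
import numpy as np

def scheffe_tournament(densities, samples, grid):
    """Run the Scheffé test on all pairs; return the index of the density
    with the most wins."""
    dx = grid[1] - grid[0]
    wins = [0] * len(densities)
    for i in range(len(densities)):
        for j in range(i + 1, len(densities)):
            T = np.sign(densities[i] - densities[j])    # T_ij
            T_h = np.mean(np.interp(samples, grid, T))  # ∫ T_ij dh
            T_mid = np.sum(T * (densities[i] + densities[j]) / 2) * dx
            wins[i if T_h > T_mid else j] += 1
    return max(range(len(densities)), key=wins.__getitem__)

rng = np.random.default_rng(1)
grid = np.linspace(-10, 10, 4001)
norm = lambda mu: np.exp(-(grid - mu) ** 2 / 2) / np.sqrt(2 * np.pi)
F = [norm(m) for m in (-2.0, 0.0, 2.0, 4.0)]
data = rng.normal(0.0, 1.0, 800)
best = scheffe_tournament(F, data, grid)  # expect the N(0,1) candidate
```

The double loop is exactly the Θ(N²) test count noted above.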
Can we do better?
Our algorithm: Efficient minimum loss-weight

repeat until one distribution is left:
1) pick the pair of distributions in F that are furthest apart (in L1)
2) eliminate the loser (of the Scheffé test)

Theorem [MS'08]: |f-g|₁ ≤ 3 dist₁(g,F) + 2Δ, with a linear number of tests*

Take the most "discriminative" action.

* after preprocessing F
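The elimination loop can be sketched as follows, assuming the pairwise L1 distances and a Scheffé-comparison oracle are given; sorting all pairs once corresponds to the O(|F|² log |F|)-preprocessing variant discussed below. The function names and the toy `beats` oracle are my own illustration:

```python
def minimum_loss_weight(distances, beats):
    """Efficient minimum loss-weight (sketch).

    distances: symmetric matrix of pairwise L1 distances.
    beats(i, j): Scheffé comparison; True if density i beats density j.
    Repeatedly take the furthest-apart surviving pair and drop the loser.
    """
    n = len(distances)
    alive = set(range(n))
    # sort all pairs by distance once (the O(|F|^2 log |F|) preprocessing)
    pairs = sorted(((distances[i][j], i, j)
                    for i in range(n) for j in range(i + 1, n)),
                   reverse=True)
    for _, i, j in pairs:
        if i in alive and j in alive:       # furthest surviving pair
            alive.discard(j if beats(i, j) else i)
        if len(alive) == 1:
            break
    return alive.pop()

# toy family: real numbers standing in for densities, truth at 0;
# "i beats j" iff i is closer to the truth, distance = |x_i - x_j|
xs = [-3.0, -1.0, 0.5, 2.0]
D = [[abs(a - b) for b in xs] for a in xs]
beats = lambda i, j: abs(xs[i]) < abs(xs[j])
best = minimum_loss_weight(D, beats)      # index of the survivor
```

Scanning the sorted pair list and skipping eliminated entries yields exactly the maximum-distance surviving pair at each round.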
Tournament revelation problem

INPUT: a weighted undirected graph G (wlog all edge-weights distinct)
OUTPUT:
  REPORT: the heaviest edge {u1,v1} in G; the ADVERSARY eliminates u1 or v1 → G1
  REPORT: the heaviest edge {u2,v2} in G1; the ADVERSARY eliminates u2 or v2 → G2
  ...
OBJECTIVE: minimize the total time spent generating reports
Tournament revelation problem - example

Four vertices A, B, C, D; six edges with distinct weights 1-6.

report the heaviest edge: BC; the adversary eliminates B
report the heaviest edge: AD; the adversary eliminates A
report the heaviest edge: CD

Precompute the tree of possible runs: the root reports BC; each branch
(B eliminated / C eliminated) leads to the next report (AD, BD, ...), and so
on down to the leaves.

2^O(|F|) preprocessing: O(|F|) run-time
O(|F|² log |F|) preprocessing: O(|F|²) run-time

WE DO NOT KNOW: can we get O(|F|) run-time with polynomial preprocessing?
Efficient minimum loss-weight

repeat until one distribution is left:
1) pick the pair of distributions that are furthest apart (in L1)
2) eliminate the loser

2^O(|F|) preprocessing: O(|F|) run-time
O(|F|² log |F|) preprocessing: O(|F|²) run-time
WE DO NOT KNOW: can we get O(|F|) run-time with polynomial preprocessing?

(in practice step 2 is more costly)
Efficient minimum loss-weight

repeat until one distribution is left:
1) pick the pair of distributions that are furthest apart (in L1)
2) eliminate the loser

Theorem: |f-g|₁ ≤ 3 dist₁(g,F) + 2Δ, with a linear number of tests.
Proof idea: for every f' to which f loses,

|f-f'|₁ ≤ max { |f'-f''|₁ : f' loses to f'' }

("that guy lost even more badly!")
Proof: for every f' to which f loses,
|f-f'|₁ ≤ max { |f'-f''|₁ : f' loses to f'' }
("that guy lost even more badly!")

Say f1 is the output, BEST = f2, and f2 suffers a bad loss to f3. The key
inequalities:

2 ∫ h T23 ≤ ∫ f2 T23 + ∫ f3 T23     (f2 loses the Scheffé test to f3)
∫ (f1-f2) T12 ≤ ∫ (f2-f3) T23       (f2, f3 are the furthest-apart pair)
∫ (fi-fj)(Tij - Tkl) ≥ 0            (Tij maximizes ∫ (fi-fj) T over |T| ≤ 1)

Combining these gives |f1-g|₁ ≤ 3 |f2-g|₁ + 2Δ.
Application: kernel density estimates (Akaike'54, Parzen'62, Rosenblatt'56)

K = kernel, used to smooth the empirical density.
(In this part h is the true density, x1, x2, ..., xn are i.i.d. samples from h,
and g is their empirical density.)

g * K = (1/n) Σ_{i=1}^n K(y-xi)  →  h * K   as n → ∞
g * K = (1/n) Σ_{i=1}^n K(y-xi)  →  h * K   as n → ∞

What K should we choose?
Dirac would be good (h * Dirac = h), but Dirac is not good (no smoothing of g).
Something in-between: bandwidth selection for kernel density estimates:

Ks(x) = K(x/s)/s,  and as s → 0, Ks → Dirac.

Theorem (see DL'01): as s → 0 with sn → ∞, |g * Ks - h|₁ → 0.
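A kernel density estimate with the scaled kernel Ks can be sketched as follows; the uniform kernel on [-1/2, 1/2], the grid evaluation, and the choice s = 0.5 are illustrative assumptions, not the talk's:

```python
import numpy as np

def kde(samples, grid, s):
    """(1/(n s)) sum_i K((y - x_i)/s) on a grid, with K = uniform
    on [-1/2, 1/2] (one concrete piecewise uniform kernel)."""
    diffs = (grid[None, :] - samples[:, None]) / s
    K = (np.abs(diffs) <= 0.5).astype(float)   # K((y - x_i)/s)
    return K.mean(axis=0) / s

rng = np.random.default_rng(2)
grid = np.linspace(-6, 6, 1201)
dx = grid[1] - grid[0]
truth = np.exp(-grid ** 2 / 2) / np.sqrt(2 * np.pi)   # h = N(0,1)
data = rng.normal(0.0, 1.0, 2000)
est = kde(data, grid, s=0.5)
err = np.sum(np.abs(est - truth)) * dx   # numerical |g * K_s - h|_1
```

With n = 2000 samples the L1 error is already small, consistent with the s → 0, sn → ∞ theorem.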
Data splitting methods for kernel density estimates

How to pick the smoothing factor s in (1/(ns)) Σ_{i=1}^n K((y-xi)/s)?

Split the data x1, x2, ..., xn:
- from x1, ..., x_{n-m} build the candidates
  fs = (1/((n-m)s)) Σ_{i=1}^{n-m} K((y-xi)/s)
- use x_{n-m+1}, ..., xn to choose s via density estimation
Kernels we will use in (1/(ns)) Σ K((y-xi)/s): piecewise uniform and
piecewise linear.
Bandwidth selection for uniform kernels

N distributions, each piecewise uniform with n pieces; m datapoints.
E.g. N ≈ n^{1/2}, m ≈ n^{5/4}.
Goal: run the density estimation algorithm efficiently.

Quantities needed: ∫ g Tij vs ∫ (fi+fj) Tij / 2, the distances |fi-fj|₁, and
∫ (fk-h) Tkj. Cost: EMLW needs N² distance computations and N tests; MD needs
N² tests; a test takes time ≈ n + m log n.
Can we speed this up?
Absolute error: bad. Relative error: good.
Approximating L1-distances between distributions

N piecewise uniform densities (each with n pieces).
TRIVIAL (exact): N²n
WE WILL DO: (N² + Nn)(log N)²
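The trivial Θ(N²n) baseline is a direct double loop over pairs; a minimal sketch (the grid representation and function name are my own assumptions):

```python
import numpy as np

def all_pairs_l1(F, dx):
    """Exact all-pairs L1 distances, Theta(N^2 n) time: F holds N densities,
    each as n values on a common grid with spacing dx."""
    N = len(F)
    D = np.zeros((N, N))
    for i in range(N):
        for j in range(i + 1, N):
            D[i, j] = D[j, i] = np.sum(np.abs(F[i] - F[j])) * dx
    return D

# uniform on [0,1) vs uniform on [0.5,1.5): L1 distance 1
grid = np.arange(0.0, 2.0, 0.01)
f1 = ((grid >= 0.0) & (grid < 1.0)).astype(float)
f2 = ((grid >= 0.5) & (grid < 1.5)).astype(float)
D = all_pairs_l1([f1, f2], dx=0.01)
```

Each of the N(N-1)/2 pairs costs a full pass over the n pieces, which is what the (N² + Nn)(log N)² result improves on.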
Dimension reduction for L2

Johnson-Lindenstrauss Lemma ('82): there is φ: L2 → L2^t with t = O(ε⁻² ln n)
such that for all x, y ∈ S, |S| = n:

d(x,y) ≤ d(φ(x),φ(y)) ≤ (1+ε) d(x,y)

(φ = a random linear map with i.i.d. N(0, t^{-1/2}) entries)
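A quick empirical illustration of the Gaussian projection (the dimensions d, t and seed are arbitrary choices for the demo):

```python
import numpy as np

# Dimension reduction for L2: project with a t x d matrix of i.i.d.
# Gaussian entries of standard deviation t^(-1/2); with t ~ eps^-2 ln n
# the pairwise L2 distances of n points survive up to a (1+eps) factor.
rng = np.random.default_rng(3)
d, t = 2000, 500
P = rng.normal(0.0, 1.0 / np.sqrt(t), size=(t, d))  # projection matrix
x, y = rng.normal(size=d), rng.normal(size=d)
ratio = np.linalg.norm(P @ (x - y)) / np.linalg.norm(x - y)
```

For t = 500 the projected distance concentrates tightly around the true one.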
Dimension reduction for L1

Cauchy Random Projection (Indyk'00): there is φ: L1 → L1^t with t = O(ε⁻² ln n)
such that for all x, y ∈ S, |S| = n:

d(x,y) ≤ est(φ(x),φ(y)) ≤ (1+ε) d(x,y)

(entries i.i.d. C(0,1/t) instead of N(0, t^{-1/2}); est is a nonlinear
estimator, e.g. a median)

(Charikar, Brinkman'03: cannot replace est by a metric d)
Cauchy distribution C(0,1), density function: 1/(π(1+x²))

FACTS:
X ~ C(0,1) ⇒ aX ~ C(0,|a|)
X ~ C(0,a), Y ~ C(0,b) independent ⇒ X+Y ~ C(0,a+b)
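The stability facts can be checked empirically. Since a Cauchy variable has no mean, the scale is read off as the median of absolute values (the median of |C(0,γ)| equals γ); the sample size and seed are arbitrary demo choices:

```python
import numpy as np

# Check: X ~ C(0,2) and Y ~ C(0,3) independent, so X + Y ~ C(0,5).
rng = np.random.default_rng(4)
n = 200_000
X = 2.0 * rng.standard_cauchy(n)   # C(0,2), using aX ~ C(0,|a|)
Y = 3.0 * rng.standard_cauchy(n)   # C(0,3)
scale = np.median(np.abs(X + Y))   # should be close to 2 + 3 = 5
```

This median-of-absolute-values readout is the same idea behind the `est` estimator above.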
Cauchy random projection for L1 (Indyk'00)

Cut the line into pieces and assign independent Cauchy variables to them:
a piece of length z gets X ~ C(0,z) (in the figure, pieces X1, ..., X9).

A piecewise uniform density projects to the weighted sum of its pieces'
variables, e.g. A(X2+X3) + B(X5+X6+X7+X8) for a density with values A and B,
or D(X1+X2+...+X8+X9) for a density with constant value D.

By the stability facts, the difference of the projections of two densities is
distributed as Cauchy(0, d), where d is their L1 distance.
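This projection scheme for piecewise uniform densities can be sketched directly; the helper name, the 100-piece discretization, and t = 4000 projections are my own illustrative choices:

```python
import numpy as np

def cauchy_sketch(F, widths, t, rng):
    """t Cauchy projections of piecewise uniform densities sharing pieces of
    the given widths: each piece gets an independent C(0, width) variable,
    and a density maps to the weighted sum of its pieces' variables."""
    X = rng.standard_cauchy((t, len(widths))) * widths  # X_p ~ C(0, width_p)
    return X @ np.asarray(F).T                          # shape (t, N)

rng = np.random.default_rng(5)
widths = np.full(100, 0.01)                    # 100 pieces of width 0.01
f1 = np.where(np.arange(100) < 50, 2.0, 0.0)   # uniform on the left half
f2 = np.where(np.arange(100) >= 50, 2.0, 0.0)  # uniform on the right half
S = cauchy_sketch([f1, f2], widths, t=4000, rng=rng)
# sketch differences are Cauchy(0, L1 distance); the median of the
# absolute values recovers the scale
est = np.median(np.abs(S[:, 0] - S[:, 1]))     # |f1 - f2|_1 = 2 here
```

All N densities share the same Cauchy variables, so the sketches can be compared pairwise after one projection pass.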
All pairs L1-distances: piecewise linear densities

Split a piece into two halves with X1, X2 ~ C(0,1/2). A linear piece projects
to a weighted combination of the halves, e.g.
R = (3/4)X1 + (1/4)X2,  B = (3/4)X2 + (1/4)X1,
and then R-B = (1/2)(X1-X2) ~ C(0,1/2).
All pairs L1-distances: piecewise linear densities
Problem: too many intersections!
Solution: cut into even smaller pieces!
Stochastic measures are useful.
Brownian motion: increments have density (1/(2π)^{1/2}) exp(-x²/2).
Cauchy motion: increments have density 1/(π(1+x²)).

[figures: sample paths of Brownian motion and Cauchy motion on [0,1]]
Brownian motion (increment density (1/(2π)^{1/2}) exp(-x²/2)):

∫ f dL = Y ~ N(0,Σ),  f: R → R^d    computing integrals is easy
Cauchy motion (increment density 1/(π(1+x²))):

∫ f dL = Y ~ C(0,s) for d=1,  f: R → R^d    computing integrals is easy
for d > 1 computing integrals* is hard
(* obtaining an explicit expression for the density)
What were we doing?

[figure: pieces X1, ..., X9]
∫ (f1, f2, f3) dL = ((w1)₁, (w2)₁, (w3)₁)
Can we efficiently compute integrals ∫ f dL for piecewise linear f?
φ: R → R²,  φ(z) = (1,z)
(X,Y) = ∫ φ dL
φ: R → R²,  φ(z) = (1,z),  (X,Y) = ∫ φ dL

(2(X-Y), 2Y) has density at ((u+v)/2, (u-v)/2)
All pairs L1-distances for mixtures of uniform densities in time
O((N² + Nn)(log N)²).

All pairs L1-distances for piecewise linear densities in time
O((N² + Nn)(log N)²).
QUESTIONS

1) φ: R → R³, φ(z) = (1, z, z²), (X,Y,Z) = ∫ φ dL ?
2) higher dimensions?