
Distributions of Persistence Diagrams and Approximations

Vasileios Maroulas

Department of Mathematics
Department of Business Analytics and Statistics

University of Tennessee

August 31, 2018


Data Analysis using Persistence Homology Distribution of random persistence diagrams Kernel Density Estimation Conclusion

Thanks

V. Maroulas (UTK) KDE of random persistence diagrams August 31, 2018 2 / 45


Joint with: Funded by:

• Josh Mike (now at Michigan State)
• Andrew Marchese (now at Plated)
• John Sgouralis (now at Arizona State)
• Chris Oballe


Acoustic Signals at the ARL

• Two classes representing two different types of weapons.

• The goal is to help military officers make tactical decisions based on the type of weapon system.


Signals from ARL Dataset


• Merge statistics and topology to understand the geometry of signals and classify them.

• TDA has recently been introduced to the field of signal and time-series classification.
  ◦ Biological Signals (Zhang et al. (2015))
  ◦ Action Recognition (Venkataraman et al. (2016))
  ◦ Wheeze Detection (Emrani et al. (2014))


Motivation

• Data has shape and shape matters.

• Latent topological features in scientific data.
  ◦ VM and A. Nebenführ, Tracking rapid intracellular movements: a Bayesian random set approach, Annals of Applied Statistics, 2015.
  ◦ I. Sgouralis, A. Nebenführ and VM, A Bayesian Topological Framework for the Identification and Reconstruction of Subcellular Motion, SIAM Journal on Imaging Sciences, 2017.
  ◦ J. Mike, C. Sumrall, VM and F. Schwartz, Non-Landmark Classification in Paleobiology, Paleobiology, 2016.


Key Picture

• Data lies in a topological space.
• Take measurements, sampling that space.
• Reconstruct it by using an approximation.
• Compute the invariants to understand it.


From Signals to Point Clouds (Takens' Theorem)

• Suppose that w : [0, T] → R is a realization of a discrete time series on [0, T].
• Consider a set of delay indices τ1, τ2, ..., τn−1.
• The n-dimensional delay embedding of w is the concatenation of time-delayed samples:

W(t) = (w(t), w(t + τ1), w(t + τ2), ..., w(t + τn−1))   (1)

Figure: Signal evolution in time domain (horizontal axis: Time (t); vertical axis: w(t))

Figure: 3D delay embedding (point cloud with axes w(t), w(t+τ1), w(t+τ2))
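The delay embedding in Eq. (1) can be sketched in a few lines of NumPy. This is only an illustration: the function name `delay_embedding`, the synthetic sine signal, and the delays 5 and 10 are assumptions made for the example, not the ARL data or the delays used in the talk.

```python
import numpy as np

def delay_embedding(w, delays):
    """Return the points (w[t], w[t + tau_1], ..., w[t + tau_{n-1}]) as rows."""
    w = np.asarray(w, dtype=float)
    # Offset 0 is the undelayed coordinate w(t); the rest are the delays tau_i.
    offsets = np.concatenate(([0], np.asarray(delays, dtype=int)))
    n_points = len(w) - offsets.max()
    if n_points <= 0:
        raise ValueError("signal is too short for the requested delays")
    # One column per coordinate of W(t), one row per time index t.
    return np.stack([w[o:o + n_points] for o in offsets], axis=1)

# Hypothetical periodic signal standing in for a measured time series.
t = np.arange(300)
w = 3.6 + 0.8 * np.sin(2 * np.pi * t / 50)
cloud = delay_embedding(w, delays=[5, 10])  # 3D point cloud
```

Each row of `cloud` is one point W(t). For a periodic signal such as this one the points trace out a closed loop, which is the kind of shape the later slides detect with homology.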


From Point Clouds to Inference and Classification

• Now we have turned our time series into a point cloud living in n-dimensional space.

Figure: Point Cloud (axes w(t), w(t+τ1), w(t+τ2))

• How can we extract information from this data and use it for classification and statistical inference?


Outline

Data Analysis using Persistence Homology

Distribution of random persistence diagrams

Kernel Density Estimation

Conclusion


Simplicial Complex

• Simplicial complexes are discretizations of real-life shapes.

• Generalization of graphs with higher-order relationships among the nodes.

• A simplicial complex is the union of simple pieces (simplices), i.e. vertices, edges, triangles, etc.

• The faces of a k-simplex are the simplices spanned by subsets of its vertices; in particular, its boundary is made of (k − 1)-simplices.

• Two simplices must intersect at a common face or not at all.
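The face condition above can be checked mechanically for an abstract simplicial complex, where a simplex is just a set of vertex labels. The sketch below is an assumption-laden illustration (the helper names `faces` and `is_simplicial_complex` are made up here). In the abstract setting, closure under taking faces is the defining property, and the intersection condition then holds automatically.

```python
from itertools import combinations

def faces(simplex):
    """All nonempty proper faces of a simplex given as a collection of vertices."""
    verts = sorted(simplex)
    return {frozenset(c)
            for k in range(1, len(verts))
            for c in combinations(verts, k)}

def is_simplicial_complex(simplices):
    """Check that every face of every simplex also belongs to the collection."""
    complex_ = {frozenset(s) for s in simplices}
    return all(faces(s) <= complex_ for s in complex_)

# A filled triangle with all of its edges and vertices.
filled_triangle = [(0,), (1,), (2,), (0, 1), (0, 2), (1, 2), (0, 1, 2)]
# The same triangle with the edge (1, 2) missing, so it is not a complex.
missing_edge = [(0,), (1,), (2,), (0, 1), (0, 2), (0, 1, 2)]
```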


Construction of Simplicial complexes for data

Start with a point cloud Π and create an abstract vertex for each point in Π.

Figure: Left: Point Cloud; Right: Simplicial Complex


Create "spheres" of radius r centered at each point.


Increase radius r.


Add edges between vertices vi and vj if the corresponding circles intersect.


Add triangles between vertices vi, vj and vk if all three circles intersect, etc.
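The construction steps above can be sketched as follows. One labeled substitution: the slides add a triangle when all three circles share a common intersection (the Čech condition), while this sketch uses the simpler Vietoris-Rips rule, which adds a triangle whenever its three edges are present. The function name `rips_complex` and the square example are assumptions for illustration.

```python
from itertools import combinations
import numpy as np

def rips_complex(points, r):
    """Vertices, edges and triangles of a Vietoris-Rips complex at radius r."""
    pts = np.asarray(points, dtype=float)
    n = len(pts)
    dist = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
    vertices = [(i,) for i in range(n)]
    # Two balls of radius r intersect when their centers are within 2r.
    edges = [(i, j) for i, j in combinations(range(n), 2)
             if dist[i, j] <= 2 * r]
    edge_set = set(edges)
    # Rips rule (an assumption here): a triangle enters once all three edges exist.
    triangles = [(i, j, k) for i, j, k in combinations(range(n), 3)
                 if {(i, j), (i, k), (j, k)} <= edge_set]
    return vertices, edges, triangles

# Hypothetical point cloud: the corners of a unit square.
square = [(0, 0), (1, 0), (1, 1), (0, 1)]
_, edges_small, triangles_small = rips_complex(square, r=0.6)
_, edges_large, triangles_large = rips_complex(square, r=0.75)
```

At r = 0.6 only the four sides of the square are connected, leaving a hole in the middle; at r = 0.75 the diagonals appear as well and the hole is filled by triangles.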


• Based on the simplicial complex we retrieve the Betti numbers (the dimensions of the homology vector spaces associated with our topological space).
  ◦ Betti 0: number of "clusters" (connected components)
  ◦ Betti 1: number of holes

Figure: β0 = 2, β1 = 0

Figure: β0 = 2, β1 = 1
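For a small complex, the first two Betti numbers can be computed from the ranks of the boundary matrices: β0 = #vertices − rank ∂1 and β1 = #edges − rank ∂1 − rank ∂2. The sketch below works with real coefficients, which is an assumption that suffices for these examples. The function name `betti_numbers` and the two test complexes (a triangle plus an isolated vertex) are chosen to mirror the two figure captions, not taken from the slides.

```python
import numpy as np

def betti_numbers(n_vertices, edges, triangles):
    """Return (beta_0, beta_1) of a complex with the given edges and triangles."""
    edges = [tuple(sorted(e)) for e in edges]
    triangles = [tuple(sorted(t)) for t in triangles]
    edge_index = {e: i for i, e in enumerate(edges)}
    # Boundary of an edge (i, j) is vertex j minus vertex i.
    d1 = np.zeros((n_vertices, len(edges)))
    for col, (i, j) in enumerate(edges):
        d1[i, col], d1[j, col] = -1.0, 1.0
    # Boundary of a triangle (i, j, k) is (j, k) - (i, k) + (i, j).
    d2 = np.zeros((len(edges), len(triangles)))
    for col, (i, j, k) in enumerate(triangles):
        d2[edge_index[(j, k)], col] = 1.0
        d2[edge_index[(i, k)], col] = -1.0
        d2[edge_index[(i, j)], col] = 1.0
    rank1 = np.linalg.matrix_rank(d1) if edges else 0
    rank2 = np.linalg.matrix_rank(d2) if triangles else 0
    return n_vertices - rank1, len(edges) - rank1 - rank2

triangle_edges = [(0, 1), (0, 2), (1, 2)]
# Filled triangle plus an isolated vertex 3: two clusters, no hole.
filled = betti_numbers(4, triangle_edges, [(0, 1, 2)])
# Hollow triangle plus an isolated vertex 3: two clusters, one hole.
hollow = betti_numbers(4, triangle_edges, [])
```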


Persistence Diagrams

• We are interested in the persistence of the Betti numbers.
• When do different connected components/holes form and how long do they last (with respect to r)?
• The Betti numbers are compactly encoded in a 2-dimensional plot which shows the birth time vs. the death time of these features.
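For connected components (Betti 0), birth and death scales can be tracked with a union-find structure: every point is born at scale 0, and a component dies when an edge merges it into another one. The sketch below is an illustration under stated assumptions: it uses the pairwise distance as the scale (halve the values if the scale is the ball radius r), and the function name `h0_persistence` is hypothetical. Holes (Betti 1) need a full persistent homology algorithm and are not covered here.

```python
from itertools import combinations
import numpy as np

def h0_persistence(points):
    """(birth, death) pairs of connected components as the scale grows."""
    pts = np.asarray(points, dtype=float)
    n = len(pts)
    parent = list(range(n))

    def find(i):
        # Follow parent links to the representative, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Process candidate edges from shortest to longest.
    edges = sorted((np.linalg.norm(pts[i] - pts[j]), i, j)
                   for i, j in combinations(range(n), 2))
    diagram = []
    for length, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj                       # two components merge
            diagram.append((0.0, float(length)))  # one of them dies here
    diagram.append((0.0, float("inf")))           # the last component never dies
    return diagram

# Hypothetical cloud: two nearby points and one far away.
diagram = h0_persistence([(0, 0), (1, 0), (5, 0)])
```

The short-lived pair (0, 1) reflects the two nearby points merging early, while the pair (0, 4) records that the far point stays a separate cluster for much longer.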


Results on Signals from ARL Dataset


• A. Marchese and VM. Signal classification with a point process distance on the space of persistence diagrams. Advances in Data Analysis and Classification, pp. 1-26, 2017.


Classifier


Statistics and Persistence Diagrams

• Summary statistics such as center and variance (Bobrowski et al., 2014; Mileyko et al., 2011; Turner et al., 2014; Marchese and VM, 2017)

• Birth and death estimates (Emmett et al., 2014)

• Confidence sets (Fasy et al., 2014)

• Need a framework to understand the above summary statistics through a single viewpoint


Novel framework

• A complete and consistent framework for constructing distributions of persistence diagrams.

• Capture the important information of these diagrams in terms of their inherent set properties:
  ◦ Set membership
  ◦ Cardinality


Setup

• Take data X = {xj} generated by some random process

• Associated (random) persistence diagram D with features ξi = (bi, di), such that a "hole" appears at scale bi and is filled at scale di


Lemma 2.1 (J. Mike & VM, 2018)

Consider a multiset of independent singleton random persistence diagrams $\{D_j\}_{j=1}^{M}$. If each singleton $D_j$ is described by the value $q^{(j)} = P[D_j \neq \emptyset]$ and the subsequent conditional pdf, $p^{(j)}(\xi)$, given $|D_j| = 1$, then the global pdf for $D = \cup_{j=1}^{M} D_j$ is given by

$$f_D(\xi_1, \dots, \xi_N) = \sum_{\gamma \in I(N,M)} Q(\gamma) \prod_{k=1}^{N} p^{(\gamma(k))}(\xi_k), \tag{2}$$

for each $N \in \{0, \dots, M\}$, where $I(N,M)$ consists of all increasing injections $\gamma : \{1, \dots, N\} \to \{1, \dots, M\}$, and $Q(\gamma)$ is a weight described below.

• The sum over γ ∈ I(N,M) in Eq. (2) accounts for each possible combination of singleton presence.

• The weights Q(γ) are proportional to the probability for each singleton to be either present, q(j), or absent, 1 − q(j), for each j.

• J. Mike and VM. Nonparametric Estimation of Probability Density Functions of Random Persistence Diagrams. arXiv:1803.02739


Example 2.2

• Consider two 1-dimensional singleton diagrams, D1 and D2, with probabilities of being nonempty q(1) = 0.6 and q(2) = 0.8.

• Local densities when nonempty: $p^{(1)}(x) = \frac{1}{\sqrt{2\pi}} e^{-(x+1)^2/2}$ and $p^{(2)}(x) = \frac{1}{\sqrt{2\pi}} e^{-(x-1)^2/2}$.

• Global pdf for D = D1 ∪ D2 through a set of local densities {f0, f1(x), f2(x, y)} such that

$$f_0 = P[|D| = 0] = (1 - q^{(1)})(1 - q^{(2)}) = 0.08 \tag{3a}$$

$$f_1(x) = (1 - q^{(2)})\, q^{(1)} p^{(1)}(x) + (1 - q^{(1)})\, q^{(2)} p^{(2)}(x) = \frac{0.12}{\sqrt{2\pi}} e^{-(x+1)^2/2} + \frac{0.32}{\sqrt{2\pi}} e^{-(x-1)^2/2} \tag{3b}$$

$$f_2(x, y) = \frac{q^{(1)} q^{(2)}}{2} \left[ p^{(1)}(x) p^{(2)}(y) + p^{(1)}(y) p^{(2)}(x) \right] = \frac{0.24}{2\pi} \left( e^{-((x-1)^2 + (y+1)^2)/2} + e^{-((x+1)^2 + (y-1)^2)/2} \right) \tag{3c}$$
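A quick numerical check of Example 2.2: the three local densities should carry total probability 1, split as 0.08 (empty diagram), 0.44 (one feature) and 0.48 (two features). The sketch below integrates Eqs. (3b) and (3c) with a plain Riemann sum; the grid bounds and resolution are assumptions chosen for the illustration.

```python
import numpy as np

q1, q2 = 0.6, 0.8  # probabilities of being nonempty, from Example 2.2

def p1(x):
    """Local density of D1: standard normal shifted to -1."""
    return np.exp(-(x + 1.0) ** 2 / 2.0) / np.sqrt(2.0 * np.pi)

def p2(x):
    """Local density of D2: standard normal shifted to +1."""
    return np.exp(-(x - 1.0) ** 2 / 2.0) / np.sqrt(2.0 * np.pi)

f0 = (1 - q1) * (1 - q2)                                # Eq. (3a)

def f1(x):                                              # Eq. (3b)
    return (1 - q2) * q1 * p1(x) + (1 - q1) * q2 * p2(x)

def f2(x, y):                                           # Eq. (3c)
    return 0.5 * q1 * q2 * (p1(x) * p2(y) + p1(y) * p2(x))

# Riemann sums on a grid wide enough that the Gaussian tails are negligible.
grid = np.linspace(-10.0, 10.0, 801)
dx = grid[1] - grid[0]
mass1 = f1(grid).sum() * dx
X, Y = np.meshgrid(grid, grid)
mass2 = f2(X, Y).sum() * dx * dx
total = f0 + mass1 + mass2
```

The symmetrized form of f2 assigns the same density to (x, y) and (y, x), so integrating over all ordered pairs recovers P[|D| = 2] = q(1) q(2) = 0.48.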


Figure: Left: Plot of the local density f1(x) in Eq. (3b); Right: Contour plot of the local density f2(x, y) in Eq. (3c). These pdfs cover the different possible input dimensions and are symmetric under permutations of the input.


Past Studies-KDE

• Extensive work to devise various maps from persistence diagrams into Hilbert spaces

• By mapping into a Hilbert space, these studies allow the application of statistical learning methods such as principal component analysis, random forest, support vector machine, etc.
  ◦ Chepushtanova et al. (2015) discretizes persistence diagrams via bins, yielding vectors in a high-dimensional Euclidean space.
  ◦ Reininghaus et al. (2014) and Kusano et al. (2016) define kernels between persistence diagrams in a Reproducing Kernel Hilbert Space.
  ◦ Adler et al. (2017) utilizes Gibbs distributions in order to replicate similar persistence diagrams, e.g. for use in MCMC-type sampling.

• Kernel density estimation on the underlying data to estimate a target diagram
  ◦ Bobrowski et al. (2014) constructs an estimator for the target diagram.
  ◦ Fasy et al. (2014) defines a confidence set.


Building a Kernel Density

• Goal: kernel density Kσ(Z, D).
  ◦ Center diagram D
  ◦ Bandwidth σ
  ◦ Input Z

• Split D into upper and lower halves:
  ◦ Du = {(b, d) ∈ D : d − b > σ}
  ◦ Dℓ = {(b, d) ∈ D : d − b ≤ σ}

• Define random diagrams:
  ◦ 𝒟u centered at Du.
  ◦ 𝒟ℓ centered at Dℓ.
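The split into upper and lower halves is a simple threshold on persistence d − b. A minimal sketch, assuming diagrams are stored as lists of (birth, death) pairs; the function name `split_diagram` is hypothetical, and the input is the center diagram that appears later in Example 3.2 with σ = 1/2.

```python
def split_diagram(diagram, sigma):
    """Split a diagram into features with persistence above and at most sigma."""
    upper = [(b, d) for b, d in diagram if d - b > sigma]
    lower = [(b, d) for b, d in diagram if d - b <= sigma]
    return upper, lower

# Center diagram from Example 3.2.
D = [(1, 3), (2, 4), (1, 1.3), (3, 3.2)]
upper, lower = split_diagram(D, sigma=0.5)
```

Here the two long-lived features (1, 3) and (2, 4) land in the upper half, and the two features close to the diagonal land in the lower half.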


Building the Upper Density.

• Split into singletons: 𝒟u = ∪j 𝒟j,u

• Each 𝒟j,u is described by:
  ◦ qj = P[𝒟j,u ≠ ∅].
  ◦ Local pdf, pj(b, d),
  ◦ Restricted Gaussian


Building the Lower Density.

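• Lower cardinality Nℓ = |Dℓ|.
  ◦ Cardinality probability mass: ν(j).
  ◦ Chosen with mean Nℓ.

• Single kernel density pℓ:
  ◦ Project Dℓ onto the diagonal.
  ◦ Kernel estimate for these points.

• 𝒟ℓ has its number of features drawn according to ν, with the features drawn i.i.d. according to pℓ.

$$p_\ell(b, d) = \frac{1}{N_\ell} \sum_{(b_i, d_i) \in D_\ell} \frac{1}{\pi\sigma^2} \exp\left( -\frac{\left(b - \frac{b_i + d_i}{2}\right)^2 + \left(d - \frac{b_i + d_i}{2}\right)^2}{2\sigma^2} \right). \tag{4}$$

Global pdf of 𝒟ℓ:

$$f_{\mathcal{D}_\ell}(\xi_1, \dots, \xi_N) = \nu(N) \prod_{j=1}^{N} p_\ell(\xi_j). \tag{5}$$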
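Eq. (4) can be evaluated directly. The sketch below is an illustration, not the authors' code: the function name `lower_density` is hypothetical and the lower features are those of the center diagram in Example 3.2 with σ = 1/2. Since each Gaussian is centered on the diagonal, the prefactor 1/(πσ²) is what normalizes it over the half-plane d ≥ b, and the Riemann sum checks this numerically (grid bounds and step are assumptions).

```python
import numpy as np

def lower_density(b, d, lower_points, sigma):
    """Evaluate p_l(b, d) from Eq. (4) for the given lower features."""
    b = np.asarray(b, dtype=float)
    d = np.asarray(d, dtype=float)
    total = 0.0
    for bi, di in lower_points:
        m = 0.5 * (bi + di)  # projection of (bi, di) onto the diagonal
        total = total + np.exp(-((b - m) ** 2 + (d - m) ** 2)
                               / (2.0 * sigma ** 2)) / (np.pi * sigma ** 2)
    return total / len(lower_points)

lower_points = [(1, 1.3), (3, 3.2)]  # lower half of the diagram in Example 3.2
sigma = 0.5

# Integrate over the wedge d >= b; diagonal grid points get half weight.
axis = np.linspace(-3.0, 7.0, 1001)
h = axis[1] - axis[0]
B, Dd = np.meshgrid(axis, axis)
weights = (Dd > B) + 0.5 * (Dd == B)
mass = (lower_density(B, Dd, lower_points, sigma) * weights).sum() * h * h

peak = lower_density(1.15, 1.15, lower_points, sigma)
```

With σ = 1/2 and two lower features the prefactor becomes 2/π, which is the constant that appears in the worked example for input cardinality 1.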


The Kernel Density

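Theorem 3.1 (J. Mike & VM, 2018)

The random diagram 𝒟 = 𝒟u ∪ 𝒟ℓ, with 𝒟u and 𝒟ℓ defined according to the previous construction with center D and bandwidth σ, has the following global pdf, or kernel density, evaluated at Z = {ξ1, ..., ξN}:

$$K_\sigma(Z, D) = \sum_{j=0}^{N_u} \nu(N - j) \sum_{\gamma \in I(j, N_u)} Q(\gamma) \prod_{k=1}^{j} p_{\gamma(k)}(\xi_k) \prod_{k=j+1}^{N} p_\ell(\xi_k) \tag{6}$$

where $N_u = |D_u|$, $I(j, N_u) = \{\gamma : \{1, \dots, j\} \to \{1, \dots, N_u\} : \gamma \text{ is increasing}\}$, and

$$Q(\gamma) = \frac{\prod_{i=1}^{N_u} (1 - q_i)}{\prod_{k=1}^{j} (1 - q_{\gamma(k)})} \prod_{k=1}^{j} q_{\gamma(k)}. \tag{7}$$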


Example 3.2

Consider D = ((1, 3), (2, 4), (1, 1.3), (3, 3.2)) and σ = 1/2.


Kernel with input cardinality 1

Figure: Contour map for the kernel density restricted to a single input feature (Eq. (8)). The center diagram is indicated by red (upper) and green (lower) points. Scale bars at the right of each plot indicate the range of probability density in each shaded region.

The kernel Kσ((b1, d1), D) equals:

$$K_\sigma((b_1, d_1), D) = \nu(0) \left[ (1 - q^{(2)})\, q^{(1)} p^{(1)}(b_1, d_1) + (1 - q^{(1)})\, q^{(2)} p^{(2)}(b_1, d_1) \right] + \nu(1) \left[ (1 - q^{(1)})(1 - q^{(2)})\, p_\ell(b_1, d_1) \right] \tag{8}$$

$$p_\ell(b, d) = \frac{2}{\pi} \left[ e^{-2((b - 1.15)^2 + (d - 1.15)^2)} + e^{-2((b - 3.1)^2 + (d - 3.1)^2)} \right]$$

$$p^{(1)}(b_1, d_1) \propto e^{-2((b_1 - 2)^2 + (d_1 - 4)^2)}$$

$$p^{(2)}(b_1, d_1) \propto e^{-2((b_1 - 1)^2 + (d_1 - 3)^2)}$$


Kernel with input cardinality 2

Consider Z = (ξ1, ξ2) = ((b1, d1), (b2, d2))

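$$\begin{aligned}
K_\sigma((\xi_1, \xi_2), D) ={}& \nu(0)\, q^{(1)} q^{(2)} p^{(1)}(b_1, d_1)\, p^{(2)}(b_2, d_2) \\
&+ \nu(1) \left[ (1 - q^{(2)})\, q^{(1)} p^{(1)}(b_1, d_1) + (1 - q^{(1)})\, q^{(2)} p^{(2)}(b_1, d_1) \right] p_\ell(b_2, d_2) \\
&+ \nu(2) (1 - q^{(1)})(1 - q^{(2)})\, p_\ell(b_1, d_1)\, p_\ell(b_2, d_2)
\end{aligned} \tag{9}$$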


Figure: Contour maps for slices of the kernel density Kσ((ξ, ξ′2), D) with input cardinality 2. A single feature ξ′2, indicated by white crosshairs, is fixed to restrict to a 2D subspace as follows: (a) ξ′2 = (1, 3), (b) ξ′2 = (2, 4) and (c) ξ′2 = (2.5, 2.7). The center diagram is indicated by red (upper) and green (lower) points. Scale bars at the right of each plot indicate the range of probability density in each shaded region.


Kernel with input cardinality 3

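$$\begin{aligned}
K_\sigma((\xi_1, \xi_2, \xi_3), D) ={}& \nu(1) \left[ q^{(1)} q^{(2)} p^{(1)}(b_1, d_1)\, p^{(2)}(b_2, d_2) \right] p_\ell(b_3, d_3) \\
&+ \nu(2) \left[ (1 - q^{(2)})\, q^{(1)} p^{(1)}(b_1, d_1) + (1 - q^{(1)})\, q^{(2)} p^{(2)}(b_1, d_1) \right] p_\ell(b_2, d_2)\, p_\ell(b_3, d_3) \\
&+ \nu(3) (1 - q^{(1)})(1 - q^{(2)})\, p_\ell(b_1, d_1)\, p_\ell(b_2, d_2)\, p_\ell(b_3, d_3).
\end{aligned} \tag{10}$$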


Figure: Contour maps for slices of the kernel density Kσ((ξ, ξ′2, ξ′3), D) with input cardinality 3. A pair of features ξ′2 and ξ′3, indicated by white crosshairs, are fixed to restrict to a 2D subspace as follows: (a) (ξ′2, ξ′3) = ((1, 3), (2, 4)) and (b) (ξ′2, ξ′3) = ((1, 3), (2.5, 3.5)). The center diagram is indicated by red (upper) and green (lower) points. Scale bars at the right of each plot indicate the range of probability density in each shaded region.


Kernel Density Estimation

Theorem 3.3 (J. Mike & VM, 2018)

• Random persistence diagram with global pdf f:
  ◦ f must satisfy decay and boundedness conditions.
  ◦ Diagrams $\{D_i\}_{i=1}^{n}$ sampled i.i.d. according to f.

• Yield a KDE: $\hat{f}(Z) = \frac{1}{n} \sum_{i=1}^{n} K_\sigma(Z, D_i)$
  ◦ $\sigma = O(n^{-\alpha})$ chosen with $0 < \alpha \le \alpha_{2M}$,
  ◦ $\hat{f} \to f$ uniformly on compact subsets of the space of PDs.

J. Mike and VM. Nonparametric Estimation of Probability Density Functions of Random Persistence Diagrams. arXiv:1803.02739


Example 3.4

• Generate samples which each consist of 10 points uniformly from the unit circle with additive Gaussian noise, $N((0, 0), (\tfrac{1}{50})^2 I_2)$.

• Toy dataset for signal analysis.
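The toy dataset of Example 3.4 can be generated as below. This is a sketch under assumptions: the function name `noisy_circle_sample`, the seed, and the choice of 100 samples are illustrative and not taken from the talk. Computing the persistence diagram of each sample would then require a persistent homology library, which is not shown here.

```python
import numpy as np

def noisy_circle_sample(rng, n_points=10, noise_std=1 / 50):
    """Draw n_points uniformly on the unit circle and add isotropic Gaussian noise."""
    angles = rng.uniform(0.0, 2.0 * np.pi, size=n_points)
    circle = np.column_stack((np.cos(angles), np.sin(angles)))
    # Noise covariance is noise_std**2 times the 2x2 identity.
    return circle + rng.normal(0.0, noise_std, size=circle.shape)

rng = np.random.default_rng(0)  # fixed seed so the sketch is reproducible
samples = [noisy_circle_sample(rng) for _ in range(100)]
```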


• Plots of persistence diagram KDEs.

• Color indicates the probability density. White regions above the diagonal indicate portions of very low probability density.

• Each column is a particular slice, while each row is a particular KDE with a given n and σ.

• Left: Local KDEs f̂n,σ((b, d)) evaluated at a diagram with only one feature. The mode of the converged density is approximately (b′2, d′2) = (0.77, 0.98).

• Right: Local KDEs f̂n,σ((b, d), (0.77, 0.98)) evaluated at a diagram with two features and one feature fixed. These slices have two modes which are very close to the diagonal at (0, 0) and (1, 1).


Summary

• Considered the problem of estimating the distribution of persistence diagrams

• Established a novel kernel density

• We focused on set properties: membership and cardinality

• We established convergence and verified it on several synthetic examples.

• With a pdf at hand, we can start implementing Monte Carlo sampling and move on to further probabilistic settings
  ◦ Bayesian formulation

• Applications in biology, defense, material science and chemistry.


Bayesian Framework

In principle, we can compute posterior distributions using Bayes' theorem for random sets

π(DX | DY) ∝ ℓ(DY | DX) π(DX)


Posterior Approximation

• Mahler (2003)
• Singh, Vo, Baddeley and Zuyev (2007)
• Caron, Del Moral, Doucet, Pace (2011)

Figure: Intensity plots with Birth on the horizontal axis and Persistence on the vertical axis: (a) Prior, (b) Prior, (c) Posterior.


Thank you-Questions?
