Post on 21-Sep-2020

Structural Analysis of Multivariate Point Clouds using Simplicial Chains

Bastian Rieck, Heike Leitte

Interdisciplinary Center for Scientific Computing, Heidelberg University

Bastian Rieck Structural Analysis of Multivariate Point Clouds 1

Motivation

[Figure panels: SPLOM, MDS, Isomap]

Common workflow:

Use e.g. a scatterplot matrix (SPLOM) to make sense of the data.
Use dimensionality reduction methods to obtain a visualization for exploratory data analysis.

Beyond comparing projection scatterplots?

Typical goal

Quantify changes & differences

Our approach

Combine geometrical (distance-preserving) and topological (structure-preserving) methods.
Visualize internal connectivity of a data set as a graph.

Topology

b1 = 2, b2 = 1

Describe data sets by high-dimensional “holes”.
Hole ≈ inhomogeneous region in the data.
Detection using persistent homology.
Description using simplicial chains.
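
As a minimal sketch of what persistent homology tracks in the simplest case (dimension 0, i.e. connected components rather than higher-dimensional holes), the following pure-Python example records the scale at which components of a point cloud merge as a distance threshold grows. The function name `zero_dim_persistence` and the toy points are assumptions made for illustration; the slides do not specify an implementation, and detecting loops or voids needs a full boundary-matrix reduction, which this sketch does not attempt.

```python
import math
from itertools import combinations


def zero_dim_persistence(points):
    """Return (birth, death) pairs of connected components.

    Every point is born at scale 0. A component dies at the length of the
    edge that merges it into another component. One component per data set
    never dies, so its death value is infinity.
    """
    n = len(points)
    if n == 0:
        return []
    parent = list(range(n))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # All pairwise edges, sorted by Euclidean length (the filtration order).
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(n), 2)
    )

    pairs = []
    for length, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj            # two components merge: one of them dies
            pairs.append((0.0, length))
    pairs.append((0.0, math.inf))      # the surviving component
    return pairs


# Hypothetical toy data: two well-separated clusters on a line.
cloud = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0), (11.0, 0.0)]
diagram = zero_dim_persistence(cloud)
print(diagram)
```

In this toy example the two short-lived pairs correspond to points merging within a cluster, while the pair with the large death value reflects the gap between the two clusters.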

The need for geometrical information

Without geometrical information:

[Figure: What we want | What we get]

With geometrical information:

[Figure: What we want | What we get]

Finding loops

Localization of loops

Obtain unlocalized loops by traversal.
Solve all-pairs shortest path problem.
Obtain localized loops by traversal.
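
The idea of tightening loops with shortest paths can be sketched on a small weighted graph. This is not the authors' algorithm: it is a simplified stand-in that finds loop-closing edges with a spanning-forest traversal and then, instead of solving the all-pairs problem, runs a single-pair Dijkstra search per edge. The helper names `shortest_path` and `localized_loops` and the example graph are hypothetical.

```python
import heapq


def shortest_path(adj, source, target, skip_edge=None):
    """Dijkstra shortest path as a vertex list; optionally ignore one edge."""
    dist = {source: 0.0}
    prev = {}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == target:
            break
        if d > dist.get(u, float("inf")):
            continue                       # stale heap entry
        for v, w in adj[u].items():
            if skip_edge is not None and {u, v} == set(skip_edge):
                continue
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    if target not in dist:
        return None
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return path[::-1]


def localized_loops(adj):
    """Return one short cycle per non-tree edge of a spanning forest."""
    visited, tree_edges = set(), set()
    for root in adj:                       # traversal builds the forest
        if root in visited:
            continue
        visited.add(root)
        stack = [root]
        while stack:
            u = stack.pop()
            for v in adj[u]:
                if v not in visited:
                    visited.add(v)
                    tree_edges.add(frozenset((u, v)))
                    stack.append(v)

    loops, seen = [], set()
    for u in adj:
        for v in adj[u]:
            edge = frozenset((u, v))
            if edge in tree_edges or edge in seen:
                continue
            seen.add(edge)
            # Shortest detour from u to v; the edge (v, u) closes the loop.
            path = shortest_path(adj, u, v, skip_edge=(u, v))
            if path is not None:
                loops.append(path)
    return loops


# Hypothetical example: two triangles sharing vertex 2.
edges = [(0, 1, 1.0), (1, 2, 1.0), (2, 0, 1.0),
         (2, 3, 1.0), (3, 4, 1.0), (4, 2, 1.0)]
adj = {}
for a, b, w in edges:
    adj.setdefault(a, {})[b] = w
    adj.setdefault(b, {})[a] = w

loops = localized_loops(adj)
print(loops)
```

The number of loops returned equals the number of non-tree edges, which is E − V + C for a graph with E edges, V vertices, and C connected components. The sketch does not guarantee that the returned cycles are independent of each other.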

The simplicial chain graph

Idealized example

The simplicial chain graph

In practice

Data set: Voting data

Alice  +1  +1  . . .  +1
Bob    −1  −1  . . .  −1
Carol  +1  −1  . . .   0
Dave    0   0  . . .   0

Voting data from the United States House of Representatives.
About 430 points of dimensions 600–900.
Data sets from 1990–2011.

Data set: Voting data

[Figure panels: 2008, 2009; regions labelled Democrats and Republicans]

Boundaries between parties are defined by “dissenters” from the party line.

Data set: TAO

Tropical Atmosphere Ocean project

El Niño phenomenon.
Continuous data stream from all buoy moorings.
5-dimensional feature space: wind velocities, humidity, air temperature, and sea surface temperature.

Data set: TAO

1993, 1996: No El Niño phenomenon in data set.
1994–1995, 1997–1998: El Niño phenomenon.

Lessons learned

Features obtained via persistent homology are suitable for comparative analysis.
Visualization of internal point cloud structure quickly becomes too abstract.
Thus: Aim to quantify differences in persistent homology using well-defined metrics.
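
One commonly used metric between persistence diagrams is the bottleneck distance; the slides do not say which metric the authors use, so the following is only an illustrative sketch. It computes the distance by brute force over all matchings, which is feasible only for very small diagrams. The function name `bottleneck_distance` and the example diagrams are assumptions.

```python
from itertools import permutations


def bottleneck_distance(diagram_a, diagram_b):
    """Brute-force bottleneck distance between two tiny persistence diagrams.

    Each diagram is a list of finite (birth, death) pairs. The runtime is
    factorial in the total number of points, so this is for illustration only.
    """
    n, m = len(diagram_a), len(diagram_b)
    # Pad both sides with diagonal placeholders (None) so that every point
    # can be matched either to a point of the other diagram or to the diagonal.
    left = list(diagram_a) + [None] * m
    right = list(diagram_b) + [None] * n

    def cost(p, q):
        if p is None and q is None:
            return 0.0                          # diagonal to diagonal
        if p is None:
            return (q[1] - q[0]) / 2.0          # q is sent to the diagonal
        if q is None:
            return (p[1] - p[0]) / 2.0          # p is sent to the diagonal
        # L-infinity distance between two off-diagonal points.
        return max(abs(p[0] - q[0]), abs(p[1] - q[1]))

    best = float("inf")
    for perm in permutations(range(n + m)):
        worst = max(
            (cost(left[i], right[j]) for i, j in enumerate(perm)),
            default=0.0,
        )
        best = min(best, worst)
    return best


# Hypothetical diagrams: the second one lacks the short-lived feature.
d1 = [(0.0, 1.0), (0.0, 10.0)]
d2 = [(0.0, 10.0)]
print(bottleneck_distance(d1, d2))
```

Here the long-lived features are matched to each other at zero cost, and the short-lived feature is matched to the diagonal at half its persistence, so small-scale noise contributes little to the distance.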

Summarize data even further

Data sets → (persistent homology) → Persistence diagrams

Application

Quantifying differences in models for multivariate data

[Figure labels: bagged trees, regression trees, enet, enet tuned, knn, lm tuned, lm, m5, mars, pls, pls tuned, random forest, ridge, ridge tuned, rlm pca, rlm, rpart, svm, svm tuned, cubist]

Application

A framework for comparing dimensionality reduction methods

Define data descriptors on data set.
Calculate their persistent homology.
Compare their “topological fingerprints”.

Overall quality (axis labels in figure: Worse, Best)

HLLE    1.66
t-SNE   2.26
Isomap  2.26
PCA     3.42
RP      3.71
SPE    10.67

Conclusion

Simplicial chain graphs as a visual metaphor for structures in multivariate point clouds.
Aspects: Comparison and quantifying differences graphically.
Recent work concentrates on metric quantification using persistent homology.
Open question: What “interesting” structures in high-dimensional space can we capture?
