structural analysis of multivariate point clouds using simplicial … · 2019. 2. 10. · bastian...



Structural Analysis of Multivariate Point Clouds using Simplicial Chains

Bastian Rieck, Heike Leitte

Interdisciplinary Center for Scientific Computing, Heidelberg University

Bastian Rieck Structural Analysis of Multivariate Point Clouds 1


Motivation

[Figure panels: SPLOM, MDS, Isomap]

Common workflow:
Use e.g. a SPLOM to make sense of the data.
Use dimensionality reduction methods to obtain a visualization for exploratory data analysis.


Beyond comparing projection scatterplots?


Typical goal

Quantify changes & differences


Our approach

Combine geometrical (distance-preserving) and topological (structure-preserving) methods.
Visualize the internal connectivity of a data set as a graph.


Topology

b1 = 2, b2 = 1

Describe data sets by high-dimensional “holes”.
Hole ≈ inhomogeneous region in the data.
Detection using persistent homology.
Description using simplicial chains.
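To make "detection using persistent homology" concrete, here is a minimal sketch of the simplest case only: dimension 0, where the features are connected components rather than the higher-dimensional holes the slide refers to. It assumes a Vietoris-Rips filtration with Euclidean distances and uses a union-find structure; the function name `h0_persistence` is hypothetical, and the slides do not say which filtration or software was used. Higher-dimensional holes need boundary-matrix reduction, which this sketch omits.

```python
import math
from itertools import combinations


def h0_persistence(points):
    """Return (birth, death) pairs of connected components.

    Assumes a Vietoris-Rips filtration: every point is born at scale 0,
    and two components merge at the length of the shortest edge
    connecting them.
    """
    n = len(points)
    parent = list(range(n))

    def find(i):
        # Find the representative of i, with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # All pairwise edges, sorted by length (the filtration order).
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(n), 2)
    )

    pairs = []
    for length, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            # Two components merge: one of them dies at this scale.
            parent[ri] = rj
            pairs.append((0.0, length))

    if n > 0:
        # One component survives at every scale.
        pairs.append((0.0, math.inf))
    return pairs
```

Pairs with a large death value correspond to well-separated clusters; short-lived pairs are usually treated as noise.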


The need for geometrical information

Without geometrical information:
[Figure panels: What we want, What we get]

With geometrical information:
[Figure panels: What we want, What we get]


Finding loops


Localization of loops

Obtain unlocalized loops by traversal.
Solve all-pairs shortest path problem.
Obtain localized loops by traversal.
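The slide does not spell out how the shortest paths are used, so the following is only one plausible reading, not the authors' procedure. It assumes the loop is identified by a "closing" edge (u, v) in a weighted, undirected graph, solves all-pairs shortest paths with Floyd-Warshall, and then picks the shortest cycle of the form u … w … v that is closed by that edge (a candidate construction in the style of Horton's cycle-basis algorithm). All function names are hypothetical.

```python
import math


def all_pairs_shortest_paths(n, edges):
    """Floyd-Warshall on an undirected graph with vertices 0..n-1."""
    dist = [[math.inf] * n for _ in range(n)]
    nxt = [[None] * n for _ in range(n)]
    for i in range(n):
        dist[i][i] = 0.0
        nxt[i][i] = i
    for u, v, w in edges:
        if w < dist[u][v]:
            dist[u][v] = dist[v][u] = w
            nxt[u][v], nxt[v][u] = v, u
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if dist[i][k] + dist[k][j] < dist[i][j]:
                    dist[i][j] = dist[i][k] + dist[k][j]
                    nxt[i][j] = nxt[i][k]
    return dist, nxt


def shortest_path(nxt, s, t):
    """Reconstruct the vertex sequence s..t, or None if unreachable."""
    if nxt[s][t] is None:
        return None
    path = [s]
    while s != t:
        s = nxt[s][t]
        path.append(s)
    return path


def localized_loop(n, edges, closing_edge):
    """Shortest cycle u ... w ... v closed by the edge (u, v).

    Returns (length, vertices) or None if no such cycle exists.
    """
    u, v, w_uv = closing_edge
    dist, nxt = all_pairs_shortest_paths(n, edges)
    best = None
    for w in range(n):
        if w in (u, v):
            continue
        to_u = shortest_path(nxt, w, u)
        to_v = shortest_path(nxt, w, v)
        if to_u is None or to_v is None:
            continue
        # The two paths may share only w, otherwise the result is
        # not a simple cycle.
        if set(to_u) & set(to_v) != {w}:
            continue
        length = dist[w][u] + dist[w][v] + w_uv
        if best is None or length < best[0]:
            best = (length, to_u[::-1] + to_v[1:])
    return best
```

For example, in a graph where the edge (0, 1) lies both on a triangle 0-1-2 and on a longer detour 0-3-4-1, a traversal might report the long detour first; the sketch returns the triangle instead.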


The simplicial chain graph: Idealized example


The simplicial chain graph: In practice


Data set: Voting data

Alice   +1  +1  . . .  +1
Bob     −1  −1  . . .  −1
Carol   +1  −1  . . .   0
Dave     0   0  . . .   0

Voting data from the United States House of Representatives.
About 430 points of dimensions 600–900.
Data sets from 1990–2011.
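As a rough illustration of how such a table becomes a point cloud, the sketch below treats each representative as one point whose coordinates are their votes, and computes pairwise Euclidean distances. The three-column toy vectors are hypothetical stand-ins for the 600–900 real columns (the elided columns of the table are not reconstructed), and the choice of Euclidean distance is an assumption; the slides do not state which metric was used.

```python
import math

# Hypothetical toy data: one row per representative, one column per vote.
# The values +1, -1, 0 follow the coding shown in the table.
votes = {
    "Alice": [+1, +1, +1],
    "Bob":   [-1, -1, -1],
    "Carol": [+1, -1,  0],
    "Dave":  [ 0,  0,  0],
}


def euclidean(a, b):
    """Euclidean distance between two equally long vote vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def distance_matrix(votes):
    """Return (names, matrix) with names sorted alphabetically."""
    names = sorted(votes)
    matrix = [[euclidean(votes[p], votes[q]) for q in names] for p in names]
    return names, matrix


names, D = distance_matrix(votes)
```

A distance matrix like `D` is the usual input for building a simplicial complex over the data.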


Data set: Voting data

[Figure panels: 2008, 2009; regions labeled Democrats and Republicans]

Boundaries between parties are defined by “dissenters” from the party line.


Data set: TAO
Tropical Atmosphere Ocean project

El Niño phenomenon.
Continuous data stream from all buoy moorings.
5-dimensional feature space: wind velocities, humidity, air temperature, and sea surface temperature.


Data set: TAO

1993, 1996: No El Niño phenomenon in data set.
1994–1995, 1997–1998: El Niño phenomenon.


Lessons learned

Features obtained via persistent homology are suitable for comparative analysis.
Visualization of internal point cloud structure quickly becomes too abstract.
Thus: Aim to quantify differences in persistent homology using well-defined metrics.
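The slides do not name the metric. One standard choice for comparing persistence diagrams is the bottleneck distance, sketched below as an assumption rather than as the authors' method. The sketch is a brute-force search over all matchings, so it is only usable for very small diagrams; points are (birth, death) tuples with finite values, and the function name is hypothetical.

```python
import math
from itertools import permutations


def _diagonal_projection(p):
    """Closest point to p on the diagonal birth == death."""
    mid = (p[0] + p[1]) / 2
    return (mid, mid)


def bottleneck_distance(diagram_a, diagram_b):
    """Bottleneck distance between two small persistence diagrams.

    Each diagram is augmented with the diagonal projections of the
    other one, so that every point may also be matched to the diagonal.
    Brute force: factorial running time.
    """
    left = [(p, False) for p in diagram_a]
    left += [(_diagonal_projection(q), True) for q in diagram_b]
    right = [(q, False) for q in diagram_b]
    right += [(_diagonal_projection(p), True) for p in diagram_a]

    def cost(a, b):
        (p, p_on_diagonal), (q, q_on_diagonal) = a, b
        if p_on_diagonal and q_on_diagonal:
            return 0.0  # matching diagonal to diagonal is free
        # L-infinity distance in the birth-death plane
        return max(abs(p[0] - q[0]), abs(p[1] - q[1]))

    best = math.inf
    for perm in permutations(range(len(right))):
        worst = max(
            (cost(left[i], right[j]) for i, j in enumerate(perm)),
            default=0.0,
        )
        best = min(best, worst)
    return best
```

For real data, a dedicated topological data analysis library with a polynomial-time matching algorithm would be the better choice.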


Summarize data even further

[Diagram: Data sets → persistent homology → Persistence diagrams]


Application: Quantifying differences in models for multivariate data

[Figure labels: bagged trees, regression trees, enet, enet tuned, knn, lm tuned, lm, m5, mars, pls, pls tuned, random forest, ridge, ridge tuned, rlm pca, rlm, rpart, svm, svm tuned, cubist]


Application: A framework for comparing dimensionality reduction methods

Define data descriptors on data set.
Calculate their persistent homology.
Compare their “topological fingerprints”.

[Figure: methods placed on an "Overall quality" axis with ends labeled "Best" and "Worse"]

Method   Score
HLLE      1.66
t-SNE     2.26
Isomap    2.26
PCA       3.42
RP        3.71
SPE      10.67


Conclusion

Simplicial chain graphs as a visual metaphor for structures in multivariate point clouds.
Aspects: Comparison and quantifying differences graphically.
Recent work concentrates on metric quantification using persistent homology.
Open question: What “interesting” structures in high-dimensional space can we capture?
