Topological Data Analysis: visual presentation of multidimensional data sets
DESCRIPTION
Topological data analysis (TDA) is an unsupervised approach that may revolutionise the way data is mined and eventually drive a new generation of analytical tools. The idea behind TDA is to "measure" the shape of data and to find a compressed combinatorial representation of that shape. In ordinary topology, combinatorial representations serve to provide a compressed representation of high-dimensional data sets that retains information about the geometric relationships between data points. TDA can also be used as a very powerful clustering technique. Edward will present a comparison between TDA and other dimension reduction algorithms such as PCA, LLE, Isomap, MDS, and Spectral Embedding.
TRANSCRIPT
![Page 1: Topological Data Analysis: visual presentation of multidimensional data sets](https://reader033.vdocument.in/reader033/viewer/2022050920/54c3360e4a79595c528b45a1/html5/thumbnails/1.jpg)
Topological Data Analysis
Visual presentation of multidimensional data sets
![Page 2: Topological Data Analysis: visual presentation of multidimensional data sets](https://reader033.vdocument.in/reader033/viewer/2022050920/54c3360e4a79595c528b45a1/html5/thumbnails/2.jpg)
Current vs New SQL Topological Data Analysis
![Page 3: Topological Data Analysis: visual presentation of multidimensional data sets](https://reader033.vdocument.in/reader033/viewer/2022050920/54c3360e4a79595c528b45a1/html5/thumbnails/3.jpg)
Topology
The Seven Bridges of Königsberg, a problem solved by Leonhard Euler (1736).
The study of qualitative properties of certain objects (topological spaces) that are invariant under a certain kind of transformation (continuous map), especially those properties that are invariant under a certain kind of equivalence (homeomorphism).
![Page 4: Topological Data Analysis: visual presentation of multidimensional data sets](https://reader033.vdocument.in/reader033/viewer/2022050920/54c3360e4a79595c528b45a1/html5/thumbnails/4.jpg)
Topological Data Analysis Pipeline
a. First approximate the unknown space X by a combinatorial structure K
b. Then compute topological invariants of K
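The two steps above can be sketched on a toy point cloud. This is a minimal illustration, not the speaker's implementation: the sample (two circles), the scale `eps`, and the helper names `neighborhood_graph` and `count_components` are all assumptions made for the example. Step (a) builds a graph K that joins nearby points, and step (b) computes one simple invariant of K, its number of connected components.

```python
import math
from itertools import combinations

def neighborhood_graph(points, eps):
    """Step (a): approximate the space by a graph K joining points at distance <= eps."""
    return [(i, j) for i, j in combinations(range(len(points)), 2)
            if math.dist(points[i], points[j]) <= eps]

def count_components(n_vertices, edges):
    """Step (b): one topological invariant of K, its number of connected components."""
    parent = list(range(n_vertices))

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving keeps the trees shallow
            v = parent[v]
        return v

    for i, j in edges:
        parent[find(i)] = find(j)
    return len({find(v) for v in range(n_vertices)})

def circle(cx, cy, n=20):
    """Hypothetical sample: n evenly spaced points on a unit circle centred at (cx, cy)."""
    return [(cx + math.cos(2 * math.pi * k / n), cy + math.sin(2 * math.pi * k / n))
            for k in range(n)]

# Two well-separated circles stand in for the unknown space X.
points = circle(0.0, 0.0) + circle(5.0, 0.0)
edges = neighborhood_graph(points, eps=0.5)
n_components = count_components(len(points), edges)
print(n_components)
```

The count depends on the chosen scale: a very small `eps` leaves every point isolated, while a very large one merges everything into a single component.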
![Page 5: Topological Data Analysis: visual presentation of multidimensional data sets](https://reader033.vdocument.in/reader033/viewer/2022050920/54c3360e4a79595c528b45a1/html5/thumbnails/5.jpg)
Combinatorial Representations: The Čech Complex
![Page 6: Topological Data Analysis: visual presentation of multidimensional data sets](https://reader033.vdocument.in/reader033/viewer/2022050920/54c3360e4a79595c528b45a1/html5/thumbnails/6.jpg)
Combinatorial Representations
Alpha Complex
Vietoris-Rips Complex
Cubical Complex
Witness Complex
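To make the difference between two of these constructions concrete, here is a small sketch under stated assumptions. It uses the usual definitions: a Vietoris-Rips simplex needs all pairwise distances to be at most `eps`, while a Čech simplex needs the balls of radius `eps / 2` around its vertices to share a common point. The function names and the triangle example are illustrative, and the Čech test is written only for three points in the plane.

```python
import math
from itertools import combinations

def rips_complex(points, eps, max_dim=2):
    """Vietoris-Rips complex: keep a simplex when all its pairwise distances are <= eps."""
    n = len(points)

    def close(i, j):
        return math.dist(points[i], points[j]) <= eps

    simplices = [(i,) for i in range(n)]
    for k in range(2, max_dim + 2):  # k vertices span a (k - 1)-simplex
        for s in combinations(range(n), k):
            if all(close(i, j) for i, j in combinations(s, 2)):
                simplices.append(s)
    return simplices

def enclosing_radius(a, b, c):
    """Radius of the smallest disc containing three planar points."""
    x, y, z = sorted([math.dist(a, b), math.dist(b, c), math.dist(a, c)])
    if x * x + y * y <= z * z:  # right, obtuse or collinear: longest side is a diameter
        return z / 2
    s = (x + y + z) / 2
    area = math.sqrt(s * (s - x) * (s - y) * (s - z))  # Heron's formula
    return x * y * z / (4 * area)  # circumradius of an acute triangle

# Equilateral triangle with side 1, examined at a scale slightly above 1.
eps = 1.01
triangle = [(0.0, 0.0), (1.0, 0.0), (0.5, math.sqrt(3) / 2)]
rips = rips_complex(triangle, eps)
in_rips = (0, 1, 2) in rips
in_cech = enclosing_radius(*triangle) <= eps / 2
print(in_rips, in_cech)
```

The Rips complex fills the triangle as soon as its three edges are present, whereas the Čech complex leaves it hollow until the three balls actually meet. The Rips condition needs only pairwise distances, which makes it cheaper to compute.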
![Page 7: Topological Data Analysis: visual presentation of multidimensional data sets](https://reader033.vdocument.in/reader033/viewer/2022050920/54c3360e4a79595c528b45a1/html5/thumbnails/7.jpg)
Topological Invariants
A topological invariant is a map f that assigns the same object to homeomorphic spaces, that is: $X \cong Y \implies f(X) = f(Y)$
Homology is a machine that converts local data about a space into global algebraic structure.
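As a hedged illustration of homology turning local data (which simplices share which faces) into global numbers, the sketch below computes Betti numbers with Z/2 coefficients from boundary-matrix ranks. The function names are assumptions for this example, and the input is assumed to be a simplicial complex that contains every face of every simplex it lists.

```python
from itertools import combinations

def rank_mod2(rows):
    """Rank over Z/2 of a matrix whose rows are given as integer bitmasks."""
    pivots = {}  # leading bit position -> reduced row with that leading bit
    for row in rows:
        while row:
            top = row.bit_length() - 1
            if top not in pivots:
                pivots[top] = row
                break
            row ^= pivots[top]  # cancel the leading bit and keep reducing
    return len(pivots)

def betti_numbers(simplices):
    """Betti numbers (Z/2 coefficients) of a complex given as a list of vertex tuples."""
    simplices = [tuple(sorted(s)) for s in simplices]
    max_dim = max(len(s) for s in simplices) - 1
    by_dim = [sorted(s for s in simplices if len(s) == d + 1) for d in range(max_dim + 1)]
    index = [{s: i for i, s in enumerate(level)} for level in by_dim]
    # rank[d] is the rank of the boundary map from d-simplices to (d - 1)-simplices.
    rank = [0] * (max_dim + 2)
    for d in range(1, max_dim + 1):
        rows = []
        for s in by_dim[d]:
            mask = 0
            for face in combinations(s, d):  # every face obtained by dropping one vertex
                mask |= 1 << index[d - 1][face]
            rows.append(mask)
        rank[d] = rank_mod2(rows)
    # b_d = (number of d-simplices) - rank(boundary_d) - rank(boundary_{d+1})
    return [len(by_dim[d]) - rank[d] - rank[d + 1] for d in range(max_dim + 1)]

hollow = [(0,), (1,), (2,), (0, 1), (1, 2), (0, 2)]  # boundary of a triangle
filled = hollow + [(0, 1, 2)]                         # the same triangle, filled in
print(betti_numbers(hollow), betti_numbers(filled))
```

The hollow triangle has one connected component and one loop, and filling it in removes the loop. The two complexes are not homeomorphic, and the invariant tells them apart.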
Reference: Wikipedia, 2010.
![Page 8: Topological Data Analysis: visual presentation of multidimensional data sets](https://reader033.vdocument.in/reader033/viewer/2022050920/54c3360e4a79595c528b45a1/html5/thumbnails/8.jpg)
Morse Theory and Reeb Graph
Theorem: Suppose $h : X \to \mathbb{R}$ is a discrete Morse function. Then X is homotopy equivalent to a CW-complex with exactly one cell of dimension p for each critical simplex of dimension p.
Reference: Teng Ma, Zhuangzhi Wu, Pei Luo, Lu Feng. Reeb graph computation through spectral clustering, 2011.
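A Reeb graph records how the connected components of the level sets of a function h merge and split. For a point cloud, one common discrete stand-in is a Mapper-style construction: cover the range of h with overlapping intervals, cluster the points in each interval, and link clusters that share points. The sketch below follows that idea. It is not the spectral method of the reference above and not necessarily what the talk used, and the function names and parameters (`n_intervals`, `overlap`, `eps`) are assumptions for the example.

```python
import math
from itertools import combinations

def components(indices, points, eps):
    """Connected components of the eps-neighbourhood graph on the given point indices."""
    remaining, result = set(indices), []
    while remaining:
        stack = [remaining.pop()]
        comp = set(stack)
        while stack:
            i = stack.pop()
            near = {j for j in remaining if math.dist(points[i], points[j]) <= eps}
            remaining -= near
            comp |= near
            stack.extend(near)
        result.append(comp)
    return result

def discrete_reeb_graph(points, h, n_intervals, overlap, eps):
    """Nodes are clusters inside overlapping slices of h; edges join clusters sharing points."""
    values = [h(p) for p in points]
    lo, hi = min(values), max(values)
    step = (hi - lo) / n_intervals
    pad = overlap * step  # how far each slice extends into its neighbours
    nodes = []
    for k in range(n_intervals):
        a, b = lo + k * step - pad, lo + (k + 1) * step + pad
        inside = [i for i, v in enumerate(values) if a <= v <= b]
        nodes.extend(components(inside, points, eps))
    edges = [(u, v) for u, v in combinations(range(len(nodes)), 2) if nodes[u] & nodes[v]]
    return nodes, edges

# Hypothetical sample: 60 points on a unit circle, with the height y as the function h.
circle = [(math.cos(2 * math.pi * k / 60), math.sin(2 * math.pi * k / 60)) for k in range(60)]
nodes, edges = discrete_reeb_graph(circle, h=lambda p: p[1], n_intervals=4, overlap=0.3, eps=0.2)
print(len(nodes), len(edges))
```

For the circle, the resulting graph has as many edges as nodes and forms a single loop, so the loop in the data survives the compression to a handful of nodes.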
![Page 9: Topological Data Analysis: visual presentation of multidimensional data sets](https://reader033.vdocument.in/reader033/viewer/2022050920/54c3360e4a79595c528b45a1/html5/thumbnails/9.jpg)
Case study: Demographics
Data shape: [220:45]
![Page 10: Topological Data Analysis: visual presentation of multidimensional data sets](https://reader033.vdocument.in/reader033/viewer/2022050920/54c3360e4a79595c528b45a1/html5/thumbnails/10.jpg)
Case study: YT channel stats
Data shape: [1500:12]
![Page 11: Topological Data Analysis: visual presentation of multidimensional data sets](https://reader033.vdocument.in/reader033/viewer/2022050920/54c3360e4a79595c528b45a1/html5/thumbnails/11.jpg)
Case study: Netflix dataset
Data shape: [17770:480189], about 8.5 billion elements
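The element count follows directly from the shape. The memory estimate in the snippet below is only a back-of-the-envelope figure and assumes a dense matrix of 4-byte values, which the slides do not state.

```python
rows, cols = 17770, 480189

elements = rows * cols              # 8,532,958,530, i.e. about 8.5 billion
bytes_per_value = 4                 # assumption: 32-bit values in a dense matrix
dense_gib = elements * bytes_per_value / 2**30

print(f"{elements:,} elements, roughly {dense_gib:.1f} GiB if stored densely")
```

At this size a dense in-memory matrix is impractical on a typical workstation, so a sparse representation or some compression of the data is likely needed before analysis.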
![Page 12: Topological Data Analysis: visual presentation of multidimensional data sets](https://reader033.vdocument.in/reader033/viewer/2022050920/54c3360e4a79595c528b45a1/html5/thumbnails/12.jpg)
Case study: Netflix dataset
Music
Indian
Anime
French
Hong Kong
US Cartoons
Kids Movie
German
US Retro
Horror
![Page 13: Topological Data Analysis: visual presentation of multidimensional data sets](https://reader033.vdocument.in/reader033/viewer/2022050920/54c3360e4a79595c528b45a1/html5/thumbnails/13.jpg)
Case study: Netflix comparison
PCA Isomap
LLE
Spectral Embedding
LTSA Hessian LLE
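The six methods named on this slide all have implementations in scikit-learn, so a comparison of this kind can be sketched as follows. This is an illustration under assumptions, not the speaker's code: it runs on a small synthetic S-curve rather than the Netflix data, and the neighbour count `k` and the other parameter values are arbitrary choices.

```python
from sklearn.datasets import make_s_curve
from sklearn.decomposition import PCA
from sklearn.manifold import Isomap, LocallyLinearEmbedding, SpectralEmbedding

# Synthetic stand-in data: 300 points on a 2-D surface embedded in 3-D.
X, _ = make_s_curve(n_samples=300, random_state=0)

k = 12  # neighbourhood size shared by the graph-based methods (assumed value)

def lle(method):
    """LLE variants differ only in the `method` argument."""
    return LocallyLinearEmbedding(n_neighbors=k, n_components=2,
                                  method=method, eigen_solver="dense")

methods = {
    "PCA": PCA(n_components=2),
    "Isomap": Isomap(n_neighbors=k, n_components=2),
    "LLE": lle("standard"),
    "Spectral Embedding": SpectralEmbedding(n_components=2, n_neighbors=k, random_state=0),
    "LTSA": lle("ltsa"),
    "Hessian LLE": lle("hessian"),
}

# Each method maps the 3-D points to 2-D coordinates that can be plotted side by side.
embeddings = {name: model.fit_transform(X) for name, model in methods.items()}
for name, Y in embeddings.items():
    print(f"{name}: {Y.shape}")
```

Each of these methods returns coordinates for every point, whereas the TDA pipeline described earlier summarises the data as a combinatorial structure.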
![Page 14: Topological Data Analysis: visual presentation of multidimensional data sets](https://reader033.vdocument.in/reader033/viewer/2022050920/54c3360e4a79595c528b45a1/html5/thumbnails/14.jpg)
Case study: Netflix (music)
![Page 15: Topological Data Analysis: visual presentation of multidimensional data sets](https://reader033.vdocument.in/reader033/viewer/2022050920/54c3360e4a79595c528b45a1/html5/thumbnails/15.jpg)
Case study: Netflix (kids movie)
![Page 16: Topological Data Analysis: visual presentation of multidimensional data sets](https://reader033.vdocument.in/reader033/viewer/2022050920/54c3360e4a79595c528b45a1/html5/thumbnails/16.jpg)
Case study: Netflix (horror)
![Page 18: Topological Data Analysis: visual presentation of multidimensional data sets](https://reader033.vdocument.in/reader033/viewer/2022050920/54c3360e4a79595c528b45a1/html5/thumbnails/18.jpg)
Questions?