Algorithms for analyzing spatio-temporal data
PhD defense
Abhinandan Nath
Department of Computer Science, Duke University
Committee: Pankaj K. Agarwal (supervisor), Kamesh Munagala, Rong Ge, Yusu Wang
Introduction
The Data Deluge
“Mankind created 150 exabytes (billion gigabytes) of data in 2005. This year, it will create 1,200 exabytes.”
- The Economist, 2010
https://www.economist.com/node/15579717
Some (more) numbers ...
USGS National Elevation data (10 metre resolution) [Dewberry, 2012]
NYC taxi pickup and dropoff data, 2009-2016: 1.3 billion points [towardsdatascience.com]
Geometric flavor of data
● Many data sets geometric in nature
● Problems in other domains can be mapped to geometric domain
– e.g., SELECT query in relational databases
NAME AGE SALARY
Alice 26 30,000
Bob 30 35,000
Charlie 28 25,000
... ... ….
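To make the mapping concrete, here is a minimal Python sketch. It assumes each row is viewed as a point (age, salary) in the plane, so that a SELECT with range predicates becomes an axis-aligned rectangle query. The function name `range_select` and the query bounds are illustrative only; the three rows are the ones in the table above.

```python
from typing import List, Tuple

Row = Tuple[str, int, int]  # (name, age, salary)

ROWS: List[Row] = [
    ("Alice", 26, 30_000),
    ("Bob", 30, 35_000),
    ("Charlie", 28, 25_000),
]

def range_select(rows: List[Row], age_lo: int, age_hi: int,
                 sal_lo: int, sal_hi: int) -> List[str]:
    """Names of rows whose (age, salary) point lies in the query rectangle."""
    return [name for name, age, sal in rows
            if age_lo <= age <= age_hi and sal_lo <= sal <= sal_hi]

# Roughly: SELECT name FROM t
#          WHERE age BETWEEN 25 AND 29 AND salary BETWEEN 20000 AND 32000
result = range_select(ROWS, 25, 29, 20_000, 32_000)
```

This brute-force scan touches every row; the index structures discussed later in the talk aim to answer the same kind of query without doing so.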
Challenges
Massive data sets that are:
● Noisy [towardsdatascience.com]
● Affected by outliers
● Incomplete
● Time-varying, e.g., trajectories
My Research
● Use techniques from computational geometry and topology to tackle some of these challenges in geometric data sets
● Design algorithms that
– Are practical
– Have provable performance guarantees
Broad themes
● Distributed algorithms
– Inspired by frameworks like MapReduce [Dean & Ghemawat, 2008] and Spark [Zaharia et al., 2010]
● Succinct descriptors
– Concisely encode desired properties of big data sets
– Noise-robust proxies for data sets
– Clustering
At a glance
Distributed algorithms
Succinct descriptors
Indices to answer range and nearest-neighbor queries [AFMN, 2016]
Triangulation & contour tree of massive terrains [AFMN, 2016]
Comparing merge trees of real-valued functions [AFNSW, 2015]
Common movement patterns from trajectory data [AFMNPT, 2018]
Distributed model of computation
● Massively Parallel Communication (MPC) model [Beame et al., 2013]
● Captures salient features of modern frameworks like MapReduce [Dean & Ghemawat, 2008]
MPC model of computation
● A number of machines
● Input of size n, distributed across the machines
● Each machine has O(s) storage
Assume ,
for
Figure: input of size n spread over machines, each with O(s) storage, connected by a communication medium.
MPC model of computation
● Computation proceeds in rounds
– In each round, each machine computes on local data
● Communication between machines occurs between rounds
● No. of messages sent/received by any machine in a round bounded by
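The round structure can be illustrated with a toy, single-process simulation. This is only a sketch under stated assumptions: the helper names (`run_round`, `mpc_sum`) are hypothetical, and capping the messages per machine per round at `s` is an assumption made here for illustration, not a bound read off the slide. The example computes a global sum in one round by having every machine send its local sum to machine 0.

```python
from typing import Callable, List, Tuple

Message = Tuple[int, int]  # (destination machine id, payload)

def run_round(local: List[List[int]],
              compute: Callable[[int, List[int]], List[Message]],
              s: int) -> List[List[int]]:
    """One round: each machine computes on local data, then messages are delivered."""
    inbox: List[List[int]] = [[] for _ in local]
    for i, data in enumerate(local):
        outgoing = compute(i, data)  # uses only machine i's local data
        if len(outgoing) > s:        # assumed per-round cap on messages sent
            raise ValueError(f"machine {i} sends more than {s} messages")
        for dest, payload in outgoing:
            inbox[dest].append(payload)
    for j, received in enumerate(inbox):
        if len(received) > s:        # assumed per-round cap on messages received
            raise ValueError(f"machine {j} receives more than {s} messages")
    return inbox

def send_local_sum(i: int, data: List[int]) -> List[Message]:
    """Each machine sends the sum of its local data to machine 0."""
    return [(0, sum(data))]

def mpc_sum(values: List[int], m: int, s: int) -> int:
    """Distribute values over m machines with storage s, then sum in one round."""
    chunk = -(-len(values) // m)  # ceil(n / m) items per machine
    local = [values[i * chunk:(i + 1) * chunk] for i in range(m)]
    if any(len(part) > s for part in local):
        raise ValueError("input does not fit in the machines' storage")
    inbox = run_round(local, send_local_sum, s)
    return sum(inbox[0])

total = mpc_sum(list(range(1, 101)), m=5, s=20)
```

Machine 0 receives one message per machine, so this one-round scheme only works while the number of machines does not exceed the cap; with more machines the sums would have to be combined over several rounds.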
Performance measures
● No. of rounds of computation
● Running time, defined in terms of the running time of each machine in each round
● Total work
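As a hedged illustration, one common way to write such measures is to charge each round the time of its slowest machine, and to sum all machine times for the work. All symbols below are assumptions introduced here: R rounds, m machines, and t_{i,j} for the running time of machine i in round j. They may differ from the notation used on the slide.

```latex
T = \sum_{j=1}^{R} \max_{1 \le i \le m} t_{i,j},
\qquad
W = \sum_{j=1}^{R} \sum_{i=1}^{m} t_{i,j}
```

Under these definitions T is at most W, and W is at most m times T.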
Indices to answer range and nearest-neighbor queries [AFMN, 2016]
Joint work with Pankaj K. Agarwal, Kyle Fox & Kamesh Munagala
Indexing big data
● Query big data sets faster, but how?
– Build an index!
● Consider geometric queries
– Orthogonal range queries
– Nearest-neighbor queries
Previous work
● Work on conjunctive and join queries, graph processing in MapReduce and its variants [Lee et al., 2012; Qin et al., 2014; Malewicz et al., 2010; Beame et al., 2013; Koutris et al., 2018; ...]
● Geometric queries: MapReduce implementations for analyzing and querying spatial and geometric data [Eldawy et al., 2013, 2015; Arabi et al., 2014; …], but with no provable performance guarantees
Our work
Build and query distributed variants of the following classical data structures, with provable performance guarantees
– Orthogonal range searching
● Kd-tree [Bentley, 1975]
● Range tree [Bentley, 1980]
– Nearest-neighbor searching
● Balanced Box Decomposition (BBD)-tree [Arya et al., 1998]
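For readers less familiar with these structures, the following is a minimal sketch of the classical, sequential kd-tree in two dimensions with an orthogonal range query. It is not the distributed variant from this work; the names `Node`, `build`, and `range_query` are illustrative, and the build re-sorts at every level for brevity rather than efficiency.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Point = Tuple[float, float]

@dataclass
class Node:
    point: Point              # splitting point stored at this node
    axis: int                 # 0 = split on x, 1 = split on y
    left: Optional["Node"]    # points with coordinate <= point[axis]
    right: Optional["Node"]   # points with coordinate >= point[axis]

def build(points: List[Point], depth: int = 0) -> Optional[Node]:
    """Build a kd-tree by splitting at the median, alternating axes."""
    if not points:
        return None
    axis = depth % 2
    pts = sorted(points, key=lambda p: p[axis])
    mid = len(pts) // 2
    return Node(pts[mid], axis,
                build(pts[:mid], depth + 1),
                build(pts[mid + 1:], depth + 1))

def range_query(node: Optional[Node], lo: Point, hi: Point,
                out: List[Point]) -> None:
    """Append to `out` all points p with lo <= p <= hi in both coordinates."""
    if node is None:
        return
    p, a = node.point, node.axis
    if lo[0] <= p[0] <= hi[0] and lo[1] <= p[1] <= hi[1]:
        out.append(p)
    if lo[a] <= p[a]:   # the query range may extend into the left subtree
        range_query(node.left, lo, hi, out)
    if p[a] <= hi[a]:   # the query range may extend into the right subtree
        range_query(node.right, lo, hi, out)

points = [(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)]
tree = build(points)
found: List[Point] = []
range_query(tree, (3, 1), (8, 5), found)
```

The median split keeps the tree balanced, which is the property the random-sampling idea later in the talk tries to preserve approximately when no single machine can see all the points.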
Our results
: total no. of input points in
: total no. of points reported for a range query
: max no. of points reported by a machine for a range query
Our results
● Kd-tree:
– Construction: rounds, time, work
– Query: rounds, time, work (optimal if each point can be stored exactly once)
Also extends to partition trees [Chan, 2012] for simplex range searching
Our results
● Range tree:
– Construction: rounds, time, work
– Query: rounds, time, and work
● BBD-tree:
– Construction: rounds, time, work
– Query: rounds, time, and work
Key idea : random sampling
● Data structures based on balanced hierarchical partitioning of input points represented as a tree
● Approximate this partitioning using a small random sample of input!
Balanced partitioning on random sample leads to balanced partitioning on entire set!!
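A small experiment can make this claim tangible. The sketch below is a one-dimensional, single-level simplification, assumed here purely for illustration (the data structures in the talk partition points hierarchically and in higher dimensions). It picks splitters from a random sample and then counts how many of the full input's values land in each bucket. The names `sample_splitters` and `bucket_sizes`, and all parameter values, are hypothetical.

```python
import random
from bisect import bisect_right
from typing import List

def sample_splitters(values: List[float], k: int, sample_size: int,
                     rng: random.Random) -> List[float]:
    """Pick k-1 splitters as evenly spaced order statistics of a random sample."""
    sample = sorted(rng.sample(values, sample_size))
    return [sample[(i * sample_size) // k] for i in range(1, k)]

def bucket_sizes(values: List[float], splitters: List[float]) -> List[int]:
    """Count how many input values fall into each bucket defined by the splitters."""
    sizes = [0] * (len(splitters) + 1)
    for v in values:
        sizes[bisect_right(splitters, v)] += 1
    return sizes

rng = random.Random(0)
n, k = 100_000, 8
values = [rng.random() for _ in range(n)]
splitters = sample_splitters(values, k, sample_size=2_000, rng=rng)
sizes = bucket_sizes(values, splitters)
# A perfectly balanced split would put n / k = 12,500 values in every bucket;
# the sample-based split is expected to land close to that.
```

Only the 2,000 sampled values need to be gathered in one place to choose the splitters; every other value is routed independently, which is what makes the idea attractive when the input is spread over many machines.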
Triangulation & contour tree of massive terrains [AFMN, 2016]
Joint work with Pankaj K. Agarwal, Kyle Fox & Kamesh Munagala
Terrain modeling
Airborne LiDAR scanning [http://www.lgs.ie/airborne-lidar.shtml]
Raw elevation data (3D point cloud) [kellylab.berkeley.edu]
Digital Elevation Model (DEM) [gisgeography.com/free-global-dem-data-sources/]
From 3D point cloud to DEM
● Terrain: xy-monotone surface in ℝ³
● Graph of a height function
● Often stored as a triangulated irregular network (TIN)
● How to build TINs and perform terrain analysis in the MPC model?
Our Work
● Build TIN model, using Delaunay triangulation
● Compute the contour tree to succinctly encode all contours of terrain
Pipeline: input points → build terrain model → build contour tree → use contour tree
Many applications, e.g., waterflow prediction, climate model visualization
Prior Work
● Delaunay triangulation
– RAM and I/O model [Crauser et al., 2001]
– PRAM algorithms [Blelloch et al., 1999]
– Goodrich's algorithm [Goodrich, 1997] can be adapted to MPC model, but is too complicated
– SpatialHadoop [Eldawy et al., 2015]: no theoretical bounds
● Contour tree
– RAM and I/O model [Carr et al., 2003; Pascucci and Cole-McLaughlin, 2002; Agarwal et al., 2010; …]
– Distributed and parallel algorithms [Morozov and Weber, 2013, 2014; Pascucci and Cole-McLaughlin, 2003; Acharya and Natarajan, 2015; ...]
Our results
● Given points, compute its Delaunay triangulation in rounds, time, and work, with high probability
● Given a terrain of size , compute its contour tree in rounds, time, and work
Build terrain model
Delaunay Triangulation
● Given a set of points in the plane, a triangulation of the set is Delaunay if
– No triangle contains any input point in the interior of its circumcircle
● Many useful properties, e.g., avoids skinny triangles
[gamedev.stackexchange.com]
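The empty-circumcircle condition can be checked directly with the standard in-circle determinant. The sketch below is a brute-force checker for small inputs, assuming points in general position and plain floating-point arithmetic (no robust predicates); the function names and the four sample points are illustrative.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]

def orient(a: Point, b: Point, c: Point) -> float:
    """Twice the signed area of triangle abc (positive if counter-clockwise)."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])

def in_circumcircle(a: Point, b: Point, c: Point, d: Point) -> bool:
    """True if d lies strictly inside the circumcircle of triangle abc."""
    ax, ay = a[0] - d[0], a[1] - d[1]
    bx, by = b[0] - d[0], b[1] - d[1]
    cx, cy = c[0] - d[0], c[1] - d[1]
    det = ((ax * ax + ay * ay) * (bx * cy - by * cx)
           - (bx * bx + by * by) * (ax * cy - ay * cx)
           + (cx * cx + cy * cy) * (ax * by - ay * bx))
    # The determinant's sign flips with the triangle's orientation.
    return det * orient(a, b, c) > 0

def is_delaunay(points: Sequence[Point],
                triangles: List[Tuple[int, int, int]]) -> bool:
    """Check the empty-circumcircle property for every triangle (brute force)."""
    for i, j, k in triangles:
        for idx, p in enumerate(points):
            if idx in (i, j, k):
                continue
            if in_circumcircle(points[i], points[j], points[k], p):
                return False
    return True

pts = [(0, 0), (2, -1), (4, 0), (2, 3)]
good = [(0, 1, 3), (1, 2, 3)]   # quadrilateral split along the diagonal 1-3
bad = [(0, 1, 2), (0, 2, 3)]    # same quadrilateral split along the diagonal 0-2
```

Here `is_delaunay(pts, good)` holds while `is_delaunay(pts, bad)` does not: the second split creates the skinny triangle (0, 1, 2), whose circumcircle contains point 3.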
Basic idea
● Randomly sample a small set of points and compute the triangulation of the sample
● Use the triangulation of the sample to split the input into smaller chunks
● Recurse on each chunk in parallel
Algorithm
1. Given points stored across many machines, randomly sample a small subset and send it to one machine
2. Compute the triangulation of the sample, and use it to distribute the input points to disjoint machines
With slight changes, it can be shown that each chunk has size with high probability
3. Recursively compute the Delaunay triangulation of each chunk in parallel. Unnecessary triangles can be filtered out by simple geometric tests to get the Delaunay triangulation of the whole input
Analysis
● No. of levels of recursion is
● Each level takes rounds, time, and work
Build contour tree
Level sets and contours
● : triangulation of
● Height function
– Defined on each vertex
– Linearly interpolated within each face (triangle)
● Level set: set of all points whose height equals a given value
● Contour: connected component of a level set
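Because the height is linear inside each triangle, the part of a level set inside one triangle is a straight segment whose endpoints can be found by interpolating along the edges. The sketch below shows this for a single triangle; it assumes the query height differs from every vertex height, and the name `level_segment` and the sample triangle are illustrative.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]

def level_segment(tri: Sequence[Point], heights: Sequence[float],
                  h: float) -> List[Point]:
    """Points where the level set at height h crosses the edges of one triangle.

    Assumes h differs from all three vertex heights, so the result has
    either zero or two points.
    """
    crossings: List[Point] = []
    for i in range(3):
        j = (i + 1) % 3
        hi, hj = heights[i], heights[j]
        if (hi - h) * (hj - h) < 0:       # endpoints lie on opposite sides of h
            t = (h - hi) / (hj - hi)      # linear interpolation parameter
            (xi, yi), (xj, yj) = tri[i], tri[j]
            crossings.append((xi + t * (xj - xi), yi + t * (yj - yi)))
    return crossings

triangle = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0)]
vertex_heights = [0.0, 2.0, 4.0]          # matches the linear function x + 2y
segment = level_segment(triangle, vertex_heights, 1.0)
```

Stitching such segments together across adjacent triangles traces out the contours of the terrain at that height.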
Topology changes at saddle points
Image from [Agarwal et al., 2015]
Contour tree
● Obtained by contracting each contour of the terrain to a point
[Agarwal et al., 2015]
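A simpler relative of the contour tree can be computed with a sweep and a union-find structure: process vertices in increasing height and track when connected components of the part of the graph seen so far are born (at minima) and when they merge (at saddle-like vertices). The sketch below does only that bookkeeping on a small graph. It is a standard sequential technique shown for intuition, not the contour tree itself and not the MPC algorithm from this work; the function name and the example are illustrative.

```python
from typing import List, Sequence, Tuple

def sweep_critical_points(heights: Sequence[float],
                          edges: Sequence[Tuple[int, int]]
                          ) -> Tuple[List[int], List[int]]:
    """Sweep vertices by increasing height; report where components start and merge."""
    n = len(heights)
    adj: List[List[int]] = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)

    parent = list(range(n))

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    seen = [False] * n
    minima: List[int] = []
    merges: List[int] = []
    for v in sorted(range(n), key=lambda i: heights[i]):
        roots = {find(u) for u in adj[v] if seen[u]}
        if not roots:
            minima.append(v)        # a new component is born at v
        elif len(roots) >= 2:
            merges.append(v)        # v joins two or more existing components
        for r in roots:
            parent[r] = v           # v becomes the representative
        seen[v] = True
    return minima, merges

# A path 0-1-2-3-4 with a height at each vertex.
path_heights = [1, 3, 0, 4, 2]
path_edges = [(0, 1), (1, 2), (2, 3), (3, 4)]
minima, merges = sweep_critical_points(path_heights, path_edges)
```

On this path the sweep finds three minima (vertices 2, 0, 4, in sweep order) and two merge vertices (1 and 3), matching the three valleys and two ridges of the height profile.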
Our contribution
A simple and efficient divide-and-conquer algorithm to build and store the contour tree of a massive triangulated terrain in MPC model
Storage
● Contour tree stored in a distributed fashion
– Top subtree : a sized subtree stored on one machine
– Remaining subtrees stored on other machines, pointers to which stored with
Algorithm (divide step)
1. Split the terrain into smaller chunks
● Each chunk has the same no. of points and goes to a disjoint set of machines
Algorithm (conquer step)
2. Compute distributed contour trees of each chunk recursively in parallel
Algorithm (merge step)
3. Combine the contour trees of the chunks to get the contour tree of the whole terrain
– Minimize interaction between neighboring chunks
– Take advantage of data distribution and triangulation
![Page 68: Algorithms for analyzing spatio-temporal dataabhinath/defense_slides.pdfAlgorithms for analyzing spatio-temporal data PhD defense Abhinandan Nath Department of Computer Science Duke](https://reader033.vdocument.in/reader033/viewer/2022060508/5f242a1554ea1c6043507f51/html5/thumbnails/68.jpg)
68
Our main result
Given a terrain of size , designed algorithm to compute its contour tree in rounds, time, and work
● These bounds are worst-case optimal !
![Page 69: Algorithms for analyzing spatio-temporal dataabhinath/defense_slides.pdfAlgorithms for analyzing spatio-temporal data PhD defense Abhinandan Nath Department of Computer Science Duke](https://reader033.vdocument.in/reader033/viewer/2022060508/5f242a1554ea1c6043507f51/html5/thumbnails/69.jpg)
69
At a glance
Distributed algorithms
Succinct descriptors
Indices to answer range and nearest-neighbor queries [AFMN, 2016]
Triangulation & contour tree of massive terrains [AFMN, 2016]
Comparing merge trees of real-valued functions [AFNSW, 2015]
Common movement patterns from trajectory data [AFMNPT, 2018]
![Page 70: Algorithms for analyzing spatio-temporal dataabhinath/defense_slides.pdfAlgorithms for analyzing spatio-temporal data PhD defense Abhinandan Nath Department of Computer Science Duke](https://reader033.vdocument.in/reader033/viewer/2022060508/5f242a1554ea1c6043507f51/html5/thumbnails/70.jpg)
70
At a glance
Comparing merge trees of real-valued functions [AFNSW, 2015]
Joint work with Pankaj K. Agarwal, Kyle Fox, Tasos Sidiropoulos & Yusu Wang
Gonna skip!!
![Page 71: Algorithms for analyzing spatio-temporal dataabhinath/defense_slides.pdfAlgorithms for analyzing spatio-temporal data PhD defense Abhinandan Nath Department of Computer Science Duke](https://reader033.vdocument.in/reader033/viewer/2022060508/5f242a1554ea1c6043507f51/html5/thumbnails/71.jpg)
71
At a glance
Common movement patterns from trajectory data [AFMNPT, 2018]
Joint work with Pankaj K. Agarwal, Kyle Fox, Kamesh Munagala, Jiangwei Pan & Erin Taylor
![Page 72: Algorithms for analyzing spatio-temporal dataabhinath/defense_slides.pdfAlgorithms for analyzing spatio-temporal data PhD defense Abhinandan Nath Department of Computer Science Duke](https://reader033.vdocument.in/reader033/viewer/2022060508/5f242a1554ea1c6043507f51/html5/thumbnails/72.jpg)
72
Trajectory data
● Huge data available
– Improve decision making
– Gain insights
● Noisy and incomplete
● Several computational challenges
[https://www.sundried.com]
[developer.huawei.com]
![Page 73: Algorithms for analyzing spatio-temporal dataabhinath/defense_slides.pdfAlgorithms for analyzing spatio-temporal data PhD defense Abhinandan Nath Department of Computer Science Duke](https://reader033.vdocument.in/reader033/viewer/2022060508/5f242a1554ea1c6043507f51/html5/thumbnails/73.jpg)
73
Motivation
![Page 74: Algorithms for analyzing spatio-temporal dataabhinath/defense_slides.pdfAlgorithms for analyzing spatio-temporal data PhD defense Abhinandan Nath Department of Computer Science Duke](https://reader033.vdocument.in/reader033/viewer/2022060508/5f242a1554ea1c6043507f51/html5/thumbnails/74.jpg)
74
Motivation
![Page 75: Algorithms for analyzing spatio-temporal dataabhinath/defense_slides.pdfAlgorithms for analyzing spatio-temporal data PhD defense Abhinandan Nath Department of Computer Science Duke](https://reader033.vdocument.in/reader033/viewer/2022060508/5f242a1554ea1c6043507f51/html5/thumbnails/75.jpg)
75
Motivation
● Subtrajectory clusters capture common portions
● Different from clustering trajectories as a whole
![Page 76: Algorithms for analyzing spatio-temporal dataabhinath/defense_slides.pdfAlgorithms for analyzing spatio-temporal data PhD defense Abhinandan Nath Department of Computer Science Duke](https://reader033.vdocument.in/reader033/viewer/2022060508/5f242a1554ea1c6043507f51/html5/thumbnails/76.jpg)
76
Motivation
● Extract high-level shared structure from large trajectory data sets
![Page 78: Algorithms for analyzing spatio-temporal dataabhinath/defense_slides.pdfAlgorithms for analyzing spatio-temporal data PhD defense Abhinandan Nath Department of Computer Science Duke](https://reader033.vdocument.in/reader033/viewer/2022060508/5f242a1554ea1c6043507f51/html5/thumbnails/78.jpg)
78
Pathlet
Representative pathlet for each cluster
– Cluster “center”
– Pathlet is a curve, not necessarily part of the input
![Page 79: Algorithms for analyzing spatio-temporal dataabhinath/defense_slides.pdfAlgorithms for analyzing spatio-temporal data PhD defense Abhinandan Nath Department of Computer Science Duke](https://reader033.vdocument.in/reader033/viewer/2022060508/5f242a1554ea1c6043507f51/html5/thumbnails/79.jpg)
79
Application of pathlets
● Compression of large trajectory data [Chen et al. 2013]
– Hope that each trajectory can be reconstructed with small no. of pathlets
– Small pathlet dictionary - non-linear dimension reduction
● Reconstructing road network from trajectory data [Li et al. 2013; Buchin et al. 2017]
![Page 80: Algorithms for analyzing spatio-temporal dataabhinath/defense_slides.pdfAlgorithms for analyzing spatio-temporal data PhD defense Abhinandan Nath Department of Computer Science Duke](https://reader033.vdocument.in/reader033/viewer/2022060508/5f242a1554ea1c6043507f51/html5/thumbnails/80.jpg)
80
Our contribution
● Model for subtrajectory clustering
– Robust to noise and missing data
– Data-driven clusters and pathlets
● NP-hardness of subtrajectory clustering problem
● Provably-efficient approximation algorithms
– Faster algorithms for realistic inputs
● Experimental results
![Page 81: Algorithms for analyzing spatio-temporal dataabhinath/defense_slides.pdfAlgorithms for analyzing spatio-temporal data PhD defense Abhinandan Nath Department of Computer Science Duke](https://reader033.vdocument.in/reader033/viewer/2022060508/5f242a1554ea1c6043507f51/html5/thumbnails/81.jpg)
81
Previous work
● Graph setting – no noise or gaps [Chen et al. 2013]
● Based only on point density [Panagiotakis et al. 2012]
● Restricted to line segments [Lee et al. 2007]
● Search for pre-defined patterns [Fan et al. 2016; Tang et al. 2013; Wang et al. 2015; Zheng et al. 2013]
None of these have provable performance guarantees!!
![Page 82: Algorithms for analyzing spatio-temporal dataabhinath/defense_slides.pdfAlgorithms for analyzing spatio-temporal data PhD defense Abhinandan Nath Department of Computer Science Duke](https://reader033.vdocument.in/reader033/viewer/2022060508/5f242a1554ea1c6043507f51/html5/thumbnails/82.jpg)
82
Model and problem formulation
Model inputs :
– Trajectories :
– Each trajectory is a sequence of points in
● Subtrajectory is a subsequence of a trajectory
– Let be all trajectory points
![Page 83: Algorithms for analyzing spatio-temporal dataabhinath/defense_slides.pdfAlgorithms for analyzing spatio-temporal data PhD defense Abhinandan Nath Department of Computer Science Duke](https://reader033.vdocument.in/reader033/viewer/2022060508/5f242a1554ea1c6043507f51/html5/thumbnails/83.jpg)
83
Objective function
![Page 84: Algorithms for analyzing spatio-temporal dataabhinath/defense_slides.pdfAlgorithms for analyzing spatio-temporal data PhD defense Abhinandan Nath Department of Computer Science Duke](https://reader033.vdocument.in/reader033/viewer/2022060508/5f242a1554ea1c6043507f51/html5/thumbnails/84.jpg)
84
Objective function
![Page 85: Algorithms for analyzing spatio-temporal dataabhinath/defense_slides.pdfAlgorithms for analyzing spatio-temporal data PhD defense Abhinandan Nath Department of Computer Science Duke](https://reader033.vdocument.in/reader033/viewer/2022060508/5f242a1554ea1c6043507f51/html5/thumbnails/85.jpg)
85
Objective function
Need small # pathlets
Measure of cluster quality
![Page 87: Algorithms for analyzing spatio-temporal dataabhinath/defense_slides.pdfAlgorithms for analyzing spatio-temporal data PhD defense Abhinandan Nath Department of Computer Science Duke](https://reader033.vdocument.in/reader033/viewer/2022060508/5f242a1554ea1c6043507f51/html5/thumbnails/87.jpg)
87
Objective function
Need small # pathlets
Measure of cluster quality
Fraction of points unassigned for each trajectory : “gaps”
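The gap term can be made concrete with a short sketch. It assumes, for illustration only, that each assigned subtrajectory is given as an inclusive index range into its trajectory; the name `gap_fraction` and this representation are hypothetical and not taken from the slides.

```python
from typing import Iterable, Tuple


def gap_fraction(num_points: int, assigned: Iterable[Tuple[int, int]]) -> float:
    """Fraction of a trajectory's points not covered by any assigned subtrajectory.

    `assigned` holds inclusive (start, end) index ranges; ranges may overlap.
    """
    if num_points <= 0:
        raise ValueError("trajectory must contain at least one point")
    covered = [False] * num_points
    for start, end in assigned:
        for k in range(start, end + 1):
            covered[k] = True
    # Points that are still False belong to no cluster: they form the gaps.
    return covered.count(False) / num_points
```

For example, a 10-point trajectory with subtrajectories at indices 0..3 and 6..7 assigned leaves 4 points in gaps.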
![Page 88: Algorithms for analyzing spatio-temporal dataabhinath/defense_slides.pdfAlgorithms for analyzing spatio-temporal data PhD defense Abhinandan Nath Department of Computer Science Duke](https://reader033.vdocument.in/reader033/viewer/2022060508/5f242a1554ea1c6043507f51/html5/thumbnails/88.jpg)
88
Objective function
![Page 89: Algorithms for analyzing spatio-temporal dataabhinath/defense_slides.pdfAlgorithms for analyzing spatio-temporal data PhD defense Abhinandan Nath Department of Computer Science Duke](https://reader033.vdocument.in/reader033/viewer/2022060508/5f242a1554ea1c6043507f51/html5/thumbnails/89.jpg)
89
A note on the distance
We use discrete Fréchet distance
Given and
● Correspondence s.t. every pt. in at least one pair
● is monotone if for all ,
![Page 90: Algorithms for analyzing spatio-temporal dataabhinath/defense_slides.pdfAlgorithms for analyzing spatio-temporal data PhD defense Abhinandan Nath Department of Computer Science Duke](https://reader033.vdocument.in/reader033/viewer/2022060508/5f242a1554ea1c6043507f51/html5/thumbnails/90.jpg)
90
Discrete Fréchet distance
: Set of all monotone correspondences b/w ,
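As a minimal sketch, the discrete Fréchet distance (the smallest, over all monotone correspondences, of the largest distance between paired points) can be computed with the standard dynamic program over prefixes of the two sequences. The function name `discrete_frechet` and the use of Euclidean distance are assumptions made for illustration; the slides do not give an implementation.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def discrete_frechet(p: Sequence[Point], q: Sequence[Point]) -> float:
    """Discrete Fréchet distance between two non-empty point sequences."""
    if not p or not q:
        raise ValueError("both sequences must be non-empty")
    n, m = len(p), len(q)
    # dp[i][j]: best achievable maximum pair distance over monotone
    # correspondences between the prefixes p[:i+1] and q[:j+1].
    dp = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            d = math.dist(p[i], q[j])
            if i == 0 and j == 0:
                best_prev = 0.0
            elif i == 0:
                best_prev = dp[0][j - 1]
            elif j == 0:
                best_prev = dp[i - 1][0]
            else:
                # Monotonicity: the previous pair advances p, q, or both.
                best_prev = min(dp[i - 1][j], dp[i][j - 1], dp[i - 1][j - 1])
            dp[i][j] = max(d, best_prev)
    return dp[n - 1][m - 1]
```

The table has one entry per pair of points, so this sketch takes time and space proportional to the product of the two sequence lengths.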
![Page 91: Algorithms for analyzing spatio-temporal dataabhinath/defense_slides.pdfAlgorithms for analyzing spatio-temporal data PhD defense Abhinandan Nath Department of Computer Science Duke](https://reader033.vdocument.in/reader033/viewer/2022060508/5f242a1554ea1c6043507f51/html5/thumbnails/91.jpg)
91
Choosing pathlets
Given , goal is to choose from set of candidate pathlets to minimize objective function
● If is given as input : pathlet-cover problem
● If not given but assumed to be (uncountably) infinite set of all trajectories in plane : subtrajectory-clustering problem
![Page 92: Algorithms for analyzing spatio-temporal dataabhinath/defense_slides.pdfAlgorithms for analyzing spatio-temporal data PhD defense Abhinandan Nath Department of Computer Science Duke](https://reader033.vdocument.in/reader033/viewer/2022060508/5f242a1554ea1c6043507f51/html5/thumbnails/92.jpg)
92
Basic idea
● Reduce to set-cover
● Solve using greedy algorithm : gives approximation
● Challenge : implementing greedy step efficiently
![Page 93: Algorithms for analyzing spatio-temporal dataabhinath/defense_slides.pdfAlgorithms for analyzing spatio-temporal data PhD defense Abhinandan Nath Department of Computer Science Duke](https://reader033.vdocument.in/reader033/viewer/2022060508/5f242a1554ea1c6043507f51/html5/thumbnails/93.jpg)
93
Set-cover
Input :
● Set system
● Cost
Goal is to find of minimum total cost such that
![Page 94: Algorithms for analyzing spatio-temporal dataabhinath/defense_slides.pdfAlgorithms for analyzing spatio-temporal data PhD defense Abhinandan Nath Department of Computer Science Duke](https://reader033.vdocument.in/reader033/viewer/2022060508/5f242a1554ea1c6043507f51/html5/thumbnails/94.jpg)
94
From pathlet-cover to set-cover
●
● has two kinds of sets :
– For all , with
where
![Page 95: Algorithms for analyzing spatio-temporal dataabhinath/defense_slides.pdfAlgorithms for analyzing spatio-temporal data PhD defense Abhinandan Nath Department of Computer Science Duke](https://reader033.vdocument.in/reader033/viewer/2022060508/5f242a1554ea1c6043507f51/html5/thumbnails/95.jpg)
95
From pathlet-cover to set-cover
●
● has two kinds of sets :
– For all , with
where
Corresponds to treating as a gap in pathlet cover
![Page 96: Algorithms for analyzing spatio-temporal dataabhinath/defense_slides.pdfAlgorithms for analyzing spatio-temporal data PhD defense Abhinandan Nath Department of Computer Science Duke](https://reader033.vdocument.in/reader033/viewer/2022060508/5f242a1554ea1c6043507f51/html5/thumbnails/96.jpg)
96
From pathlet-cover to set-cover
●
● has two kinds of sets :
– For all and for any set of subtraj. ,
with
![Page 97: Algorithms for analyzing spatio-temporal dataabhinath/defense_slides.pdfAlgorithms for analyzing spatio-temporal data PhD defense Abhinandan Nath Department of Computer Science Duke](https://reader033.vdocument.in/reader033/viewer/2022060508/5f242a1554ea1c6043507f51/html5/thumbnails/97.jpg)
97
From pathlet-cover to set-cover
●
● has two kinds of sets :
– For all and for any set of subtraj. ,
with
Corresponds to assigning subtraj. in to
![Page 98: Algorithms for analyzing spatio-temporal dataabhinath/defense_slides.pdfAlgorithms for analyzing spatio-temporal data PhD defense Abhinandan Nath Department of Computer Science Duke](https://reader033.vdocument.in/reader033/viewer/2022060508/5f242a1554ea1c6043507f51/html5/thumbnails/98.jpg)
98
From pathlet-cover to set-cover
●
● has two kinds of sets :
– For all and for any set of subtraj. ,
with
Exponential # sets : cannot construct explicitly!!
![Page 99: Algorithms for analyzing spatio-temporal dataabhinath/defense_slides.pdfAlgorithms for analyzing spatio-temporal data PhD defense Abhinandan Nath Department of Computer Science Duke](https://reader033.vdocument.in/reader033/viewer/2022060508/5f242a1554ea1c6043507f51/html5/thumbnails/99.jpg)
99
From pathlet-cover to set-cover
Theorem : There exists bijection between feasible solutions of and with same cost across bijection
![Page 100: Algorithms for analyzing spatio-temporal dataabhinath/defense_slides.pdfAlgorithms for analyzing spatio-temporal data PhD defense Abhinandan Nath Department of Computer Science Duke](https://reader033.vdocument.in/reader033/viewer/2022060508/5f242a1554ea1c6043507f51/html5/thumbnails/100.jpg)
100
Greedy algorithm for set-cover
Initialize
● At each step add to the set in that maximizes the coverage-to-cost ratio
● Stop when all points are covered
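The greedy rule above can be sketched for a generic weighted set-cover instance whose sets are listed explicitly. This only illustrates the textbook algorithm: the pathlet-cover set system has exponentially many sets and cannot be enumerated this way. The names `greedy_set_cover`, `sets`, and `cost` are hypothetical, and costs are assumed to be positive.

```python
from typing import Dict, Hashable, Iterable, List, Set


def greedy_set_cover(
    universe: Iterable[Hashable],
    sets: Dict[str, Set[Hashable]],
    cost: Dict[str, float],
) -> List[str]:
    """Greedy weighted set cover: repeatedly pick the set with the best
    ratio of newly covered elements to cost (costs must be positive)."""
    uncovered = set(universe)
    chosen: List[str] = []
    while uncovered:
        best_name, best_ratio = None, 0.0
        for name, members in sets.items():
            gain = len(uncovered & members)  # only uncovered points count
            if gain == 0:
                continue
            ratio = gain / cost[name]
            if ratio > best_ratio:
                best_name, best_ratio = name, ratio
        if best_name is None:
            raise ValueError("the given sets do not cover the universe")
        chosen.append(best_name)
        uncovered -= sets[best_name]
    return chosen
```

Usage on a toy instance: with `A = {1, 2, 3}`, `B = {3, 4}`, `C = {4, 5}` at cost 1 each and `D = {1, 2, 3, 4, 5}` at cost 4, the sketch picks `A` first (ratio 3) and then `C` (ratio 2).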
![Page 101: Algorithms for analyzing spatio-temporal dataabhinath/defense_slides.pdfAlgorithms for analyzing spatio-temporal data PhD defense Abhinandan Nath Department of Computer Science Duke](https://reader033.vdocument.in/reader033/viewer/2022060508/5f242a1554ea1c6043507f51/html5/thumbnails/101.jpg)
101
Coverage-to-cost ratio
● For let denote coverage-to-cost ratio
![Page 102: Algorithms for analyzing spatio-temporal dataabhinath/defense_slides.pdfAlgorithms for analyzing spatio-temporal data PhD defense Abhinandan Nath Department of Computer Science Duke](https://reader033.vdocument.in/reader033/viewer/2022060508/5f242a1554ea1c6043507f51/html5/thumbnails/102.jpg)
102
Coverage-to-cost ratio
● For let denote coverage-to-cost ratio
where is set of uncovered pts. of
![Page 103: Algorithms for analyzing spatio-temporal dataabhinath/defense_slides.pdfAlgorithms for analyzing spatio-temporal data PhD defense Abhinandan Nath Department of Computer Science Duke](https://reader033.vdocument.in/reader033/viewer/2022060508/5f242a1554ea1c6043507f51/html5/thumbnails/103.jpg)
103
Coverage-to-cost ratio
● For let denote coverage-to-cost ratio
, if is not yet covered
, otherwise
![Page 104: Algorithms for analyzing spatio-temporal dataabhinath/defense_slides.pdfAlgorithms for analyzing spatio-temporal data PhD defense Abhinandan Nath Department of Computer Science Duke](https://reader033.vdocument.in/reader033/viewer/2022060508/5f242a1554ea1c6043507f51/html5/thumbnails/104.jpg)
104
Implementing greedy step
For each need to compute that maximizes
– Tricky, since we do not construct these sets at all!
![Page 105: Algorithms for analyzing spatio-temporal dataabhinath/defense_slides.pdfAlgorithms for analyzing spatio-temporal data PhD defense Abhinandan Nath Department of Computer Science Duke](https://reader033.vdocument.in/reader033/viewer/2022060508/5f242a1554ea1c6043507f51/html5/thumbnails/105.jpg)
105
Implementing greedy step
For each need to compute that maximizes
– Tricky, since we do not construct these sets at all!
● Best set for can be found in poly-time without explicitly constructing all the sets!!
– Can decompose into contribution corresponding to each traj.
– Independently choose “best” subtraj. from each traj.
![Page 106: Algorithms for analyzing spatio-temporal dataabhinath/defense_slides.pdfAlgorithms for analyzing spatio-temporal data PhD defense Abhinandan Nath Department of Computer Science Duke](https://reader033.vdocument.in/reader033/viewer/2022060508/5f242a1554ea1c6043507f51/html5/thumbnails/106.jpg)
106
Our result
Let ,
● Theorem : The greedy algorithm computes a -approximate solution to the pathlet-cover problem in time
![Page 107: Algorithms for analyzing spatio-temporal dataabhinath/defense_slides.pdfAlgorithms for analyzing spatio-temporal data PhD defense Abhinandan Nath Department of Computer Science Duke](https://reader033.vdocument.in/reader033/viewer/2022060508/5f242a1554ea1c6043507f51/html5/thumbnails/107.jpg)
107
Subtrajectory clustering
Set of candidate pathlets not given, assumed to be all possible trajectories
![Page 108: Algorithms for analyzing spatio-temporal dataabhinath/defense_slides.pdfAlgorithms for analyzing spatio-temporal data PhD defense Abhinandan Nath Department of Computer Science Duke](https://reader033.vdocument.in/reader033/viewer/2022060508/5f242a1554ea1c6043507f51/html5/thumbnails/108.jpg)
108
Reducing # candidate pathlets
● satisfies triangle inequality :
– Let candidate pathlets be subtraj. of input traj.
– # candidate pathlets is
– Optimal solution cost increases by factor of 2
![Page 109: Algorithms for analyzing spatio-temporal dataabhinath/defense_slides.pdfAlgorithms for analyzing spatio-temporal data PhD defense Abhinandan Nath Department of Computer Science Duke](https://reader033.vdocument.in/reader033/viewer/2022060508/5f242a1554ea1c6043507f51/html5/thumbnails/109.jpg)
109
Reducing # candidate pathlets
● satisfies triangle inequality :
– Let candidate pathlets be subtraj. of input traj.
– # candidate pathlets is
– Optimal solution cost increases by factor of 2
● :
– Can reduce # candidate pathlets to
– Cost increases by factor of
![Page 110: Algorithms for analyzing spatio-temporal dataabhinath/defense_slides.pdfAlgorithms for analyzing spatio-temporal data PhD defense Abhinandan Nath Department of Computer Science Duke](https://reader033.vdocument.in/reader033/viewer/2022060508/5f242a1554ea1c6043507f51/html5/thumbnails/110.jpg)
110
Improved running time
● For realistic inputs can achieve more speed-up
– For each pathlet only subtraj. assigned from each traj.
● Theorem : For realistic curves using Fréchet distance, can compute -approximate solution to the subtrajectory clustering problem in time
![Page 111: Algorithms for analyzing spatio-temporal dataabhinath/defense_slides.pdfAlgorithms for analyzing spatio-temporal data PhD defense Abhinandan Nath Department of Computer Science Duke](https://reader033.vdocument.in/reader033/viewer/2022060508/5f242a1554ea1c6043507f51/html5/thumbnails/111.jpg)
111
Experiments : data sets
Real data sets :
● Beijing taxi data [Tsinghua University]
– 28,000 cabs over 4 days
– 9 million points
– Incomplete and sparse
![Page 112: Algorithms for analyzing spatio-temporal dataabhinath/defense_slides.pdfAlgorithms for analyzing spatio-temporal data PhD defense Abhinandan Nath Department of Computer Science Duke](https://reader033.vdocument.in/reader033/viewer/2022060508/5f242a1554ea1c6043507f51/html5/thumbnails/112.jpg)
112
Experiments : data sets
Real data sets :
● GeoLife [Microsoft Research Asia]
– Pedestrian data of 182 users over 4 years
– ~2,600 trajectories
– ~1.5 million points
● Cycling
– 37 trajectories
– 106,000 points
– Has self-intersections and loops
![Page 113: Algorithms for analyzing spatio-temporal dataabhinath/defense_slides.pdfAlgorithms for analyzing spatio-temporal data PhD defense Abhinandan Nath Department of Computer Science Duke](https://reader033.vdocument.in/reader033/viewer/2022060508/5f242a1554ea1c6043507f51/html5/thumbnails/113.jpg)
113
Experiments : data sets
Synthetic data sets :
● RTP
– Traffic data generated by web-based tool [http://mntg.cs.umn.edu/tg/index.php]
– Research Triangle in NC
– ~20,000 trajectories
– ~1 million points
![Page 114: Algorithms for analyzing spatio-temporal dataabhinath/defense_slides.pdfAlgorithms for analyzing spatio-temporal data PhD defense Abhinandan Nath Department of Computer Science Duke](https://reader033.vdocument.in/reader033/viewer/2022060508/5f242a1554ea1c6043507f51/html5/thumbnails/114.jpg)
114
Dense & popular regions
![Page 115: Algorithms for analyzing spatio-temporal dataabhinath/defense_slides.pdfAlgorithms for analyzing spatio-temporal data PhD defense Abhinandan Nath Department of Computer Science Duke](https://reader033.vdocument.in/reader033/viewer/2022060508/5f242a1554ea1c6043507f51/html5/thumbnails/115.jpg)
115
Common trajectory portions
![Page 116: Algorithms for analyzing spatio-temporal dataabhinath/defense_slides.pdfAlgorithms for analyzing spatio-temporal data PhD defense Abhinandan Nath Department of Computer Science Duke](https://reader033.vdocument.in/reader033/viewer/2022060508/5f242a1554ea1c6043507f51/html5/thumbnails/116.jpg)
116
Handling noise
![Page 117: Algorithms for analyzing spatio-temporal dataabhinath/defense_slides.pdfAlgorithms for analyzing spatio-temporal data PhD defense Abhinandan Nath Department of Computer Science Duke](https://reader033.vdocument.in/reader033/viewer/2022060508/5f242a1554ea1c6043507f51/html5/thumbnails/117.jpg)
117
Gaps
![Page 118: Algorithms for analyzing spatio-temporal dataabhinath/defense_slides.pdfAlgorithms for analyzing spatio-temporal data PhD defense Abhinandan Nath Department of Computer Science Duke](https://reader033.vdocument.in/reader033/viewer/2022060508/5f242a1554ea1c6043507f51/html5/thumbnails/118.jpg)
118
Data-driven pathlets
![Page 119: Algorithms for analyzing spatio-temporal dataabhinath/defense_slides.pdfAlgorithms for analyzing spatio-temporal data PhD defense Abhinandan Nath Department of Computer Science Duke](https://reader033.vdocument.in/reader033/viewer/2022060508/5f242a1554ea1c6043507f51/html5/thumbnails/119.jpg)
119
Summary
● Indexing big data
● Massive terrain analysis
● Comparing merge trees - briefly
● Extracting common movement patterns from trajectories
![Page 120: Algorithms for analyzing spatio-temporal dataabhinath/defense_slides.pdfAlgorithms for analyzing spatio-temporal data PhD defense Abhinandan Nath Department of Computer Science Duke](https://reader033.vdocument.in/reader033/viewer/2022060508/5f242a1554ea1c6043507f51/html5/thumbnails/120.jpg)
120
Future directions
● MPC model
– Point location queries, multiway separators for planar graphs ...
– Big open problem – general graph connectivity in rounds
– Other open problems in parallel query processing in databases [Koutris et al. 2018]
● Gromov-Hausdorff distance
– Big gap b/w upper and lower bound :(
– More research into additive distortion of metric embeddings
![Page 121: Algorithms for analyzing spatio-temporal dataabhinath/defense_slides.pdfAlgorithms for analyzing spatio-temporal data PhD defense Abhinandan Nath Department of Computer Science Duke](https://reader033.vdocument.in/reader033/viewer/2022060508/5f242a1554ea1c6043507f51/html5/thumbnails/121.jpg)
121
Future directions
● Trajectory clustering
– Efficient -approx. to k-center, k-median, k-means for, say, Fréchet distance
– Stumbling block – infinite doubling dimension
– Work by [Driemel et al. 2016] on clustering time-series data
● Running time is exponential in complexity of cluster centers – assumed to be constant
● Is it a good assumption??
– What are good assumptions? Perturbation resilience? Stability?
● Can anything interesting be proved?
![Page 122: Algorithms for analyzing spatio-temporal dataabhinath/defense_slides.pdfAlgorithms for analyzing spatio-temporal data PhD defense Abhinandan Nath Department of Computer Science Duke](https://reader033.vdocument.in/reader033/viewer/2022060508/5f242a1554ea1c6043507f51/html5/thumbnails/122.jpg)
122
Acknowledgements
![Page 123: Algorithms for analyzing spatio-temporal dataabhinath/defense_slides.pdfAlgorithms for analyzing spatio-temporal data PhD defense Abhinandan Nath Department of Computer Science Duke](https://reader033.vdocument.in/reader033/viewer/2022060508/5f242a1554ea1c6043507f51/html5/thumbnails/123.jpg)
123
Committee
Pankaj
Kamesh Rong Yusu
![Page 124: Algorithms for analyzing spatio-temporal dataabhinath/defense_slides.pdfAlgorithms for analyzing spatio-temporal data PhD defense Abhinandan Nath Department of Computer Science Duke](https://reader033.vdocument.in/reader033/viewer/2022060508/5f242a1554ea1c6043507f51/html5/thumbnails/124.jpg)
124
Collaborators
Pankaj Kamesh Yusu Kyle Tasos
Jiangwei Erin
![Page 125: Algorithms for analyzing spatio-temporal dataabhinath/defense_slides.pdfAlgorithms for analyzing spatio-temporal data PhD defense Abhinandan Nath Department of Computer Science Duke](https://reader033.vdocument.in/reader033/viewer/2022060508/5f242a1554ea1c6043507f51/html5/thumbnails/125.jpg)
125
Theory group
![Page 126: Algorithms for analyzing spatio-temporal dataabhinath/defense_slides.pdfAlgorithms for analyzing spatio-temporal data PhD defense Abhinandan Nath Department of Computer Science Duke](https://reader033.vdocument.in/reader033/viewer/2022060508/5f242a1554ea1c6043507f51/html5/thumbnails/126.jpg)
126
CS@Duke
● Ergys and Cassie; other students ...
● Marilyn, Pam, Celeste, Alison, Kathleen …
● CS Lab staff
![Page 127: Algorithms for analyzing spatio-temporal dataabhinath/defense_slides.pdfAlgorithms for analyzing spatio-temporal data PhD defense Abhinandan Nath Department of Computer Science Duke](https://reader033.vdocument.in/reader033/viewer/2022060508/5f242a1554ea1c6043507f51/html5/thumbnails/127.jpg)
127
Outside Duke
![Page 128: Algorithms for analyzing spatio-temporal dataabhinath/defense_slides.pdfAlgorithms for analyzing spatio-temporal data PhD defense Abhinandan Nath Department of Computer Science Duke](https://reader033.vdocument.in/reader033/viewer/2022060508/5f242a1554ea1c6043507f51/html5/thumbnails/128.jpg)
128
![Page 129: Algorithms for analyzing spatio-temporal dataabhinath/defense_slides.pdfAlgorithms for analyzing spatio-temporal data PhD defense Abhinandan Nath Department of Computer Science Duke](https://reader033.vdocument.in/reader033/viewer/2022060508/5f242a1554ea1c6043507f51/html5/thumbnails/129.jpg)
129