DITA: Distributed In-Memory Trajectory Analytics
Zeyuan Shang (MIT), Guoliang Li (Tsinghua), Zhifeng Bao (RMIT)
Motivation
Trajectory data is getting bigger and bigger
2 billion Uber trips by 06/2016
62 million Uber trips in 06/2016
Motivation
Applications of trajectory analytics
● Trajectory Recommendation
● Road Planning
● Transportation Optimization
Motivation
Existing systems are limited in a number of ways
● Data locality
● Load balance
● Easy-to-use interface
● Versatility to support various trajectory similarity functions
○ Non-metric ones: DTW, LCSS, EDR
○ Metric ones: Fréchet
Background
● Trajectory: a sequence of multi-dimensional points
○ E.g., (1, 2) -> (2, 3) -> (3, 4) -> (5, 5)
● Distance Function between trajectories (e.g., Dynamic Time Warping)
Background
Trajectory Similarity
Given two trajectories T and Q, a trajectory-based distance function f (e.g., DTW), and a threshold τ, if f(T, Q) ≤ τ, we say that T and Q are similar.
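The definition above can be sketched in a few lines, taking DTW as the distance function f. This is a minimal illustration, not DITA's implementation: it assumes DTW accumulates the sum of Euclidean point distances, and the names `dtw`, `is_similar`, and `tau` (the threshold) are chosen here for illustration.

```python
import math

def dtw(t, q):
    """DTW distance between two point sequences (sum of Euclidean point distances)."""
    n, m = len(t), len(q)
    inf = float("inf")
    # dp[i][j] = DTW distance between the first i points of t and the first j points of q
    dp = [[inf] * (m + 1) for _ in range(n + 1)]
    dp[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = math.dist(t[i - 1], q[j - 1])
            # extend the cheapest of the three allowed alignments
            dp[i][j] = cost + min(dp[i - 1][j], dp[i][j - 1], dp[i - 1][j - 1])
    return dp[n][m]

def is_similar(t, q, tau):
    """T and Q are similar when f(T, Q) <= tau, with f = DTW here."""
    return dtw(t, q) <= tau

T = [(1, 2), (2, 3), (3, 4), (5, 5)]
Q = [(1, 2), (2, 3), (3, 4), (5, 6)]
print(dtw(T, Q), is_similar(T, Q, tau=1.5))
```

The two example trajectories differ only in their last point, so the diagonal alignment is optimal and the whole distance comes from that one pair.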
Overview of System
● Built on Spark SQL
● Support SQL and DataFrame
● Filter-verification framework
Overview of Methods
● Index
○ Partitioning
○ Global and Local Index
● Trajectory Similarity Search
○ Filter (global + local)
○ Verification
● Trajectory Similarity Join
○ Cost Models
○ Division-based Load Balancing
Indexing
Partitioning
[Figure: partition tree rooted at ROOT, with one level labeled "first point" and the next level labeled "last point"]
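A two-level partitioning by first point and then by last point could look roughly like the sketch below. It is only an illustration under loud assumptions: the slide does not say how groups are formed, so this version simply sorts on point coordinates and cuts the sorted list into `ng` nearly equal chunks per level. The helper names `chunk` and `partition` are hypothetical.

```python
def chunk(items, k):
    """Split a list into k contiguous chunks whose sizes differ by at most one."""
    base, extra = divmod(len(items), k)
    chunks, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        chunks.append(items[start:start + size])
        start += size
    return chunks

def partition(trajectories, ng):
    """Return {(f, l): [trajectory, ...]}: group by first point, then by last point."""
    parts = {}
    # Assumption: coordinate ordering stands in for a real spatial partitioner.
    by_first = sorted(trajectories, key=lambda t: t[0])
    for f, group in enumerate(chunk(by_first, ng)):
        by_last = sorted(group, key=lambda t: t[-1])
        for l, sub in enumerate(chunk(by_last, ng)):
            parts[(f, l)] = sub
    return parts

trajs = [[(i, 0), (i % 3, i)] for i in range(8)]
parts = partition(trajs, ng=2)
for key, members in sorted(parts.items()):
    print(key, members)
```

With `ng` groups per level this yields `ng * ng` partitions of similar size, which is the shape the (f, l) partition labels suggest.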
Indexing
Global Index
○ If MinDist(q, MBR) > τ, then Dist(p, q) > τ for every point p ∈ MBR
○ If MinDist(q, MBR_f) + MinDist(q, MBR_l) > τ, then the partition (f, l) doesn’t have trajectories similar to q
[Figure: two global-index trees, each rooted at ROOT with NG × NG leaf MBRs: first-point MBRs MBR^f_{1,1} … MBR^f_{NG,NG} and last-point MBRs MBR^l_{1,1} … MBR^l_{NG,NG}]
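The global pruning rule can be sketched as follows. This is a hedged illustration rather than DITA's code: it assumes 2-D points, axis-aligned MBRs stored as `((min_x, min_y), (max_x, max_y))`, and it reads the rule as comparing the query's first point against the first-point MBR and its last point against the last-point MBR. The `global_index` dictionary and both function names are hypothetical.

```python
import math

def min_dist(p, mbr):
    """Smallest Euclidean distance from point p to an axis-aligned rectangle (0 if inside)."""
    (lo_x, lo_y), (hi_x, hi_y) = mbr
    dx = max(lo_x - p[0], 0.0, p[0] - hi_x)
    dy = max(lo_y - p[1], 0.0, p[1] - hi_y)
    return math.hypot(dx, dy)

def candidate_partitions(query, global_index, tau):
    """Keep the partitions (f, l) that the first/last-point bound cannot rule out."""
    first, last = query[0], query[-1]
    return [key for key, (mbr_f, mbr_l) in global_index.items()
            if min_dist(first, mbr_f) + min_dist(last, mbr_l) <= tau]

# Hypothetical index: partition key -> (first-point MBR, last-point MBR)
global_index = {
    (0, 0): (((0, 0), (2, 2)), ((4, 4), (6, 6))),
    (0, 1): (((0, 0), (2, 2)), ((20, 20), (22, 22))),
}
query = [(1, 2), (2, 3), (3, 4), (5, 5)]
print(candidate_partitions(query, global_index, tau=1.0))
```

Because MinDist never exceeds the distance to any point inside the rectangle, a partition dropped here cannot contain a trajectory within the threshold, so the filter only removes true negatives.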
Indexing
● Pivot Point Based Distance Estimation
Indexing
Local Index
Trajectory Similarity Search
● Basic Idea
○ Global Pruning: find relevant partitions
○ Local Search: find similar trajectories
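The local search step follows a filter-then-verify pattern, sketched below for one partition. The cheap filter used here is a stand-in, not DITA's local index: it relies on the fact that DTW always aligns the first points with each other and the last points with each other, so the sum of those two distances is a lower bound. The names `lower_bound` and `local_search` are illustrative, and DTW is assumed to sum Euclidean point distances.

```python
import math

def dtw(t, q):
    """DTW distance (sum of Euclidean point distances), rolling one row at a time."""
    inf = float("inf")
    prev = [0.0] + [inf] * len(q)
    for p in t:
        cur = [inf]
        for j, r in enumerate(q, start=1):
            cur.append(math.dist(p, r) + min(prev[j], cur[j - 1], prev[j - 1]))
        prev = cur
    return prev[-1]

def lower_bound(t, q):
    """Cheap DTW lower bound from the first and last aligned point pairs."""
    if len(t) == 1 and len(q) == 1:
        return math.dist(t[0], q[0])  # first and last pair coincide
    return math.dist(t[0], q[0]) + math.dist(t[-1], q[-1])

def local_search(partition, query, tau):
    """Return (similar trajectories, number of exact DTW verifications performed)."""
    results, verified = [], 0
    for t in partition:
        if lower_bound(t, query) > tau:   # filter: cannot be within tau
            continue
        verified += 1
        if dtw(t, query) <= tau:          # verification: exact distance
            results.append(t)
    return results, verified

query = [(1, 2), (2, 3), (3, 4), (5, 5)]
partition = [
    [(1, 2), (2, 3), (3, 4), (5, 6)],
    [(1, 2), (9, 9), (5, 5)],
    [(10, 10), (11, 11)],
]
print(local_search(partition, query, tau=1.5))
```

In this example the third trajectory is discarded by the filter alone, the second passes the filter but fails verification, and only the first is returned.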
Trajectory Similarity Join
● Cost Models
● Join Graph
● Weight of edges (a->b)
○ a sends candidate trajectories to b
○ Transmission cost of a (data transmitted)
○ Computation cost of b (candidate pairs)
● Built by Sampling
Trajectory Similarity Join
● Cost Models
● Join Graph
● Weight of edges (a->b)
○ a sends candidate trajectories to b
○ Transmission cost of a (data transmitted)
○ Computation cost of b (candidate pairs)
● Built by Sampling
● Goal: minimize the maximum total cost
Trajectory Similarity Join
● Graph Orientation
[Figure: example join graph with nodes A and B]
Trajectory Similarity Join
Greedy Algorithm
○ Initialize
○ Find partition with largest total cost
○ Repeat
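One way to read the three steps above is as a local search over edge directions, sketched below. This is an assumption-laden illustration and not necessarily the exact procedure used in DITA: it starts from an arbitrary orientation, looks at the partition with the largest total cost, flips the incident edge that lowers the maximum total cost the most, and stops when no flip helps. The `weights` layout, mapping a directed edge `(src, dst)` to `(transmission cost of src, computation cost of dst)`, is hypothetical.

```python
def total_costs(nodes, weights, orient):
    """orient[(a, b)] is True when the edge is directed a -> b, False for b -> a."""
    cost = {n: 0.0 for n in nodes}
    for (a, b), forward in orient.items():
        src, dst = (a, b) if forward else (b, a)
        trans, comp = weights[(src, dst)]
        cost[src] += trans  # the sender pays the transmission cost
        cost[dst] += comp   # the receiver pays the computation cost
    return cost

def greedy_orientation(nodes, weights):
    edges = [(a, b) for (a, b) in weights if a < b]
    orient = {e: True for e in edges}              # Initialize (arbitrary directions)
    while True:
        cost = total_costs(nodes, weights, orient)
        worst = max(cost, key=cost.get)            # partition with largest total cost
        best_edge, best_max = None, cost[worst]
        for e in edges:
            if worst not in e:
                continue
            orient[e] = not orient[e]              # tentatively flip this edge
            new_max = max(total_costs(nodes, weights, orient).values())
            orient[e] = not orient[e]              # undo the tentative flip
            if new_max < best_max:
                best_edge, best_max = e, new_max
        if best_edge is None:                      # no flip lowers the maximum: stop
            return orient, cost
        orient[best_edge] = not orient[best_edge]  # Repeat with the improved orientation

nodes = ["A", "B", "C"]
weights = {
    ("A", "B"): (4, 10), ("B", "A"): (3, 2),
    ("B", "C"): (1, 8), ("C", "B"): (2, 5),
}
orient, cost = greedy_orientation(nodes, weights)
print(orient, cost)
```

Each accepted flip strictly lowers the maximum total cost, so the loop terminates. Like any greedy scheme, it can stop at an orientation that is not globally optimal.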
Trajectory Similarity Join
● Limitation of Graph Orientation
○ It is greedy
○ Doesn’t work well for partitions with inherently huge cost
Trajectory Similarity Join
● Division-based Load Balancing
○ Division unit: the 98% quantile of total cost
○ For partitions whose total cost is larger than the division unit, we divide them into the corresponding number of units
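The division rule can be sketched numerically as below. Two details are assumptions made for illustration: the 98% quantile is computed with the nearest-rank method, and "corresponding number of units" is read as the total cost divided by the division unit, rounded up. The function names are hypothetical, and costs are assumed to be positive.

```python
import math

def quantile(values, pct):
    """Nearest-rank quantile: smallest value with at least pct% of the data at or below it."""
    ordered = sorted(values)
    rank = max(1, (pct * len(ordered) + 99) // 100)  # integer ceil(pct * n / 100)
    return ordered[rank - 1]

def division_plan(costs, pct=98):
    """Return (division unit, {partition: number of units it is split into})."""
    unit = quantile(list(costs.values()), pct)
    return unit, {p: max(1, math.ceil(c / unit)) for p, c in costs.items()}

# 98 ordinary partitions and two heavy ones
costs = {f"p{i}": 10 for i in range(98)}
costs["p98"] = 20
costs["p99"] = 95
unit, plan = division_plan(costs)
print(unit, plan["p0"], plan["p98"], plan["p99"])
```

Partitions at or below the unit stay whole, while the heavy partitions are split into pieces whose cost is at most one unit each.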
Experimental Results
● Setup
○ 64 nodes, each with an 8-core Intel Xeon E5-2670 CPU and 24 GB RAM
○ Hadoop 2.6.0 and Spark 1.6.0
○ Datasets
Experimental Results
● Baseline Methods
○ Naive
○ Simba (SIGMOD 2016)
○ DFT (VLDB 2017)
Experimental Results
Search on Large Datasets (141M trajectories, 703GB)
Experimental Results
Join on Large Datasets (65M trajectories, 312GB)
Conclusion
DITA: Distributed In-memory Trajectory Analytics
● Support trajectory similarity search and join with SQL and DataFrame API
● Support most trajectory distance functions
● Filter-verification Framework
○ Global and Local Index
○ Optimizing Verification
● Experimental results show that DITA outperformed state-of-the-art approaches significantly
● Future Work