Query Processing of Massive Trajectory Data based on MapReduce Qiang Ma, Bin Yang (Fudan University) Weining Qian, Aoying Zhou (ECNU) Presented By: Xin Cao (Aalborg University)


Post on 17-Dec-2015


Query Processing of Massive Trajectory Data based on MapReduce

Qiang Ma, Bin Yang (Fudan University), Weining Qian, Aoying Zhou (ECNU)

Presented By: Xin Cao (Aalborg University)


Outline

• Introduction
• Preliminary
• Trajectory Processing
  – Execution Overview
  – Storage
  – Indexing Methods
  – Query Processing
• Experimental Study
• Future Works


Introduction

• Location-based services are playing important roles.

• Large volumes of trajectory data in diverse formats have accumulated.

• Traditional centralized technologies may not cope with such large numbers of trajectories.

• Cloud computing technologies such as GFS and MapReduce offer a promising paradigm for handling the explosion of trajectory data.


Challenge

• Huge volume, frequent updates, rapid growth.

• Trajectory data is “continuous”, i.e. ordered sequentially.

• Highly skewed.

• MapReduce is good at offline data analysis, but not efficient for online queries.


Our Contributions

• Extend the MapReduce framework to manage massive sequential data, such as trajectories of moving objects.

• Study which kinds of query processing methods are appropriate for large clusters.

• Provide two scalable indexing methods that make query processing efficient.


Preliminary

• Data Model: line segments model
  – A polyline in three-dimensional space.

• Query Types
  – Spatio-temporal Range Query: Q(Es, Et) → {Sk}
  – Trajectory-based Query: Q(O, Et) → {Sk}
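The data model and the two query types can be illustrated with a small in-memory sketch. It is not the authors' implementation. The slide does not define its symbols, so the sketch assumes that Es is an axis-aligned spatial rectangle, Et is a time interval, O is an object identifier, and each Sk is one line segment of a trajectory. The `Segment` class, the function names and the bounding-box intersection test are illustrative choices; the bounding-box test is a simplification that may return a segment whose box overlaps Es even though the segment itself does not.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Segment:
    """One line segment Sk of a trajectory: (x1, y1, t1) -> (x2, y2, t2), t1 <= t2."""
    oid: str
    x1: float
    y1: float
    t1: float
    x2: float
    y2: float
    t2: float


def range_query(segments: List[Segment],
                es: Tuple[float, float, float, float],
                et: Tuple[float, float]) -> List[Segment]:
    """Q(Es, Et) -> {Sk}. Assumes es = (xmin, ymin, xmax, ymax), et = (tmin, tmax)."""
    xmin, ymin, xmax, ymax = es
    tmin, tmax = et
    return [s for s in segments
            if min(s.x1, s.x2) <= xmax and max(s.x1, s.x2) >= xmin
            and min(s.y1, s.y2) <= ymax and max(s.y1, s.y2) >= ymin
            and s.t1 <= tmax and s.t2 >= tmin]


def trajectory_query(segments: List[Segment], oid: str,
                     et: Tuple[float, float]) -> List[Segment]:
    """Q(O, Et) -> {Sk}: segments of object `oid` that overlap the time interval."""
    tmin, tmax = et
    return [s for s in segments
            if s.oid == oid and s.t1 <= tmax and s.t2 >= tmin]


# Tiny made-up data set: object "a" has two consecutive segments, "b" has one.
segs = [
    Segment("a", 0, 0, 0, 1, 1, 10),
    Segment("a", 1, 1, 10, 5, 5, 20),
    Segment("b", 8, 8, 0, 9, 9, 10),
]
near_origin = range_query(segs, (0, 0, 2, 2), (0, 15))
late_a = trajectory_query(segs, "a", (12, 30))
```

Both functions scan every segment, which is exactly the cost the partitioning and indexing schemes in the following slides aim to avoid.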


Trajectory Processing

• Execution Overview


Storage

• Data are grouped by key and organized in data chunks in GFS-style storage.

• The whole data set is divided into several parts; each part is called a partition and is assigned to one data chunk for storage.

• Each trajectory is assigned to at least one partition according to its spatio-temporal information.
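One way to picture "assigned to at least one partition" is a static uniform grid over space and time. The sketch below is an assumption for illustration only: the slides do not say which partitioning function is used, and the cell sizes `CELL_XY` and `CELL_T` and the `i_j_k` partition-ID format are hypothetical. A segment is assigned to every grid cell that its bounding box overlaps, so it always receives at least one partition ID.

```python
from itertools import product
from math import floor

CELL_XY = 10.0  # hypothetical spatial cell width (same for x and y)
CELL_T = 60.0   # hypothetical temporal cell length


def cells(a, b, size):
    """Indices of all grid cells covered by the interval between a and b."""
    lo, hi = sorted((a, b))
    return range(floor(lo / size), floor(hi / size) + 1)


def partitions_for(x1, y1, t1, x2, y2, t2):
    """Partition IDs for the segment (x1, y1, t1) -> (x2, y2, t2).

    Every cell overlapped by the segment's bounding box is returned,
    so the result never is empty.
    """
    return [f"{i}_{j}_{k}"
            for i, j, k in product(cells(x1, x2, CELL_XY),
                                   cells(y1, y2, CELL_XY),
                                   cells(t1, t2, CELL_T))]


inside_one_cell = partitions_for(1, 1, 0, 2, 2, 30)
crossing_cells = partitions_for(8, 5, 50, 12, 5, 70)
```

A segment that crosses a cell boundary is stored in several partitions, which trades extra storage for the ability to answer a query from the partitions it overlaps.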


Storage

• A good spatio-temporal partitioning keeps the amount of data per chunk fairly uniform.

• Static partitioning strategies are easy to control and suitable for distributed scheduling, but may lead to load imbalance.

• Dynamic strategies can resolve load imbalance, but re-splitting data can cause large volumes of data to migrate between distant nodes in the cluster.

• Appropriate strategies should be trained.
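How uniform a partitioning is can be quantified by the standard deviation of the number of segments per partition. The helper below is a minimal sketch of that measure; the function name `load_stddev` and the sample assignments are made up for illustration, and the slides do not specify how the authors compute their figure.

```python
from collections import Counter
from statistics import pstdev


def load_stddev(assignments, all_partitions):
    """Population standard deviation of segment counts per partition.

    `assignments` holds one partition ID per stored segment;
    `all_partitions` lists every partition so that empty ones count as 0.
    """
    counts = Counter(assignments)
    return pstdev([counts.get(p, 0) for p in all_partitions])


parts = ["p0", "p1", "p2"]
balanced = ["p0", "p1", "p2", "p0", "p1", "p2"]  # 2 segments per partition
skewed = ["p0", "p0", "p0", "p0", "p1", "p2"]    # 4, 1 and 1 segments

balanced_dev = load_stddev(balanced, parts)
skewed_dev = load_stddev(skewed, parts)
```

A value of zero means every partition holds the same number of segments; skewed data under a static grid produces a larger value.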


PMI (Partition based Multilevel Index)

• Aims to speed up spatio-temporal range queries.

• Generate all candidate partitions by invoking the space partition strategy.

• Store together as key/value.
  – <PartitionID, Sk>

• Each data chunk contains only trajectory segments that belong to the same partition.

• A multilevel index for each node can be built locally (using traditional centralized methods).
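The PMI idea can be sketched as a map step that emits <PartitionID, Sk> pairs and a grouping step that collects the segments of each partition. This is a single-process illustration under assumptions, not the authors' Hadoop code: partitions are cells of a hypothetical uniform spatial grid of width `CELL`, segments are plain tuples `(oid, x1, y1, t1, x2, y2, t2)`, and the per-node multilevel index is omitted, so each candidate partition is scanned linearly.

```python
from collections import defaultdict
from math import floor

CELL = 10.0  # hypothetical spatial cell width; time is not used for partitioning here


def cell_range(a, b):
    """Indices of all grid cells covered by the interval between a and b."""
    lo, hi = sorted((a, b))
    return range(floor(lo / CELL), floor(hi / CELL) + 1)


def map_segment(seg):
    """Map step: emit (PartitionID, Sk) for every cell the segment's box overlaps."""
    _, x1, y1, _, x2, y2, _ = seg
    for i in cell_range(x1, x2):
        for j in cell_range(y1, y2):
            yield (i, j), seg


def build_pmi(segments):
    """Grouping step: collect all segments that share a PartitionID."""
    index = defaultdict(list)
    for seg in segments:
        for pid, s in map_segment(seg):
            index[pid].append(s)
    return dict(index)


def range_query(index, es, et):
    """Visit only the partitions that overlap Es, then filter their segments."""
    xmin, ymin, xmax, ymax = es
    tmin, tmax = et
    hits = set()  # a segment stored in several partitions is reported once
    for i in cell_range(xmin, xmax):
        for j in cell_range(ymin, ymax):
            for s in index.get((i, j), []):
                _, x1, y1, t1, x2, y2, t2 = s
                if (min(x1, x2) <= xmax and max(x1, x2) >= xmin
                        and min(y1, y2) <= ymax and max(y1, y2) >= ymin
                        and t1 <= tmax and t2 >= tmin):
                    hits.add(s)
    return sorted(hits)


segs = [
    ("a", 1, 1, 0, 4, 4, 10),
    ("a", 4, 4, 10, 14, 4, 20),
    ("b", 25, 25, 0, 26, 26, 5),
]
pmi = build_pmi(segs)
one_cell = range_query(pmi, (12, 0, 18, 9), (0, 30))
two_cells = range_query(pmi, (0, 0, 19, 9), (0, 30))
```

In this example the first query touches only partition (1, 0), so the segments stored in the other partitions are never read.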


OII (Object Inverted Index)

• Aims to speed up trajectory-based queries.

• Collect all historical trajectories of each object.

• Store together as key/value.
  – <OID, {PartitionID, T}>
  – Access according to key (object identifier).
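An inverted index of this shape can be sketched as a dictionary from object ID to a time-ordered posting list. The slide does not define T, so the sketch assumes it is the time interval that the object spent in the given partition. The record layout and the function names `build_oii` and `lookup` are hypothetical.

```python
from collections import defaultdict


def build_oii(records):
    """Build <OID, {PartitionID, T}> from (oid, partition_id, t_start, t_end) records."""
    oii = defaultdict(list)
    for oid, pid, t1, t2 in records:
        oii[oid].append((t1, t2, pid))
    for postings in oii.values():
        postings.sort()  # order each object's postings by start time
    return dict(oii)


def lookup(oii, oid, et):
    """Partitions holding segments of `oid` that overlap the interval et = (tmin, tmax)."""
    tmin, tmax = et
    matches = [pid for t1, t2, pid in oii.get(oid, [])
               if t1 <= tmax and t2 >= tmin]
    return list(dict.fromkeys(matches))  # drop duplicates, keep time order


records = [
    ("a", "p0", 0, 10),
    ("a", "p1", 10, 20),
    ("b", "p7", 0, 5),
    ("a", "p1", 20, 30),
]
oii = build_oii(records)
parts_for_a = lookup(oii, "a", (12, 25))
```

A trajectory-based query then reads only the partitions returned by `lookup` instead of scanning the whole data set.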


Data Insertion


Query Processing

• Trajectory-based Queries
  – Given any object ID, the system can locate the object's trajectory according to OII.

• Range Queries


Experimental Study

• Settings
  – Hadoop version 0.19.0
  – 8 PC nodes
    • Ubuntu Linux version 8.04
    • Pentium IV 1.7GHz CPU
    • 512 MB memory
  – Java SDK 1.42
  – Experiment data: Network-based Generator


Experiments – Load Balance

Figure panels: Standard Deviation of Partitioning; Load Balance of PRADASE


Experiments – Data Importing and Index Creating

Figure panels: Data Importing with PMI; Data Importing with OII


Experiments – Query Processing

Figure panels: Spatio-temporal Range Query Processing with PMI; Trajectory-based Query Processing with OII


Future Works

• More heuristic partitioning methods.

• Reducing data migration between nodes.

• Efficient real-time query processing on Cloud infrastructure.


Thanks!