trajectory clustering - traclus algorithm

Trajectory Clustering. BASED ON: TRAJECTORY CLUSTERING: A PARTITION-AND-GROUP FRAMEWORK, BY JAE-GIL LEE, JIAWEI HAN AND KYU-YOUNG WHANG. EDITED BY: IVAN SANCHEZ. EDUCATIONAL SLIDES ON TRACLUS, AN ALGORITHM FOR CLUSTERING TRAJECTORY DATA CREATED BY JAE-GIL LEE, JIAWEI HAN AND KYU-YOUNG WHANG, PUBLISHED AT SIGMOD’07. http://web.engr.illinois.edu/~hanj/pdf/sigmod07_jglee.pdf

Upload: ivan-sanchez-vera

Post on 21-Jul-2015


TRANSCRIPT

Page 1: Trajectory clustering - Traclus Algorithm

Trajectory Clustering

BASED ON: TRAJECTORY CLUSTERING: A PARTITION-AND-GROUP FRAMEWORK

EDITED BY: IVAN SANCHEZ

BY: JAE-GIL LEE

JIAWEI HAN

KYU-YOUNG WHANG

EDUCATIONAL SLIDES ON TRACLUS, AN ALGORITHM FOR CLUSTERING TRAJECTORY DATA CREATED BY JAE-GIL LEE, JIAWEI HAN AND KYU-YOUNG WHANG, PUBLISHED AT SIGMOD’07.

http://web.engr.illinois.edu/~hanj/pdf/sigmod07_jglee.pdf

Page 2: Trajectory clustering - Traclus Algorithm

Objective

To group similar trajectories together (clustering).

A trajectory is a sequence of multidimensional points Tr = p1, p2, p3, …, pn.

A point is a d-dimensional entity.

Most approaches consider only complete trajectories, and so miss valuable information about common subtrajectories.

Input: A set of trajectories S = (Tr1, Tr2, Tr3, …, Tri, …, TrnumTraj).

Output: A set of clusters C = (C1, C2, …, CnumClusters), where each cluster contains ε or more trajectories.

◦ Here ε is a threshold that sets the minimum number of trajectories needed to form a cluster. (This is a different use of the symbol from the neighborhood parameter ε of the clustering algorithm.)

◦ Each cluster is composed of a set of trajectories, e.g. C1 = (Tr3, Tr9, …, Trc1max).

Page 3: Trajectory clustering - Traclus Algorithm

Approaches

DBSCAN

◦ Uses density-based clustering.

◦ Works only on entire trajectories.

Partition and Group

◦ Also uses density-based clustering (helps to discover clusters of arbitrary shape and to filter out noise and outliers).

◦ Can discover common subtrajectories.

Page 4: Trajectory clustering - Traclus Algorithm

Partition and Group Framework

Two phases: partitioning and grouping.

Additionally calculates a representative trajectory per cluster.

Discovers common subtrajectories.

TRACLUS algorithm:

◦ Partition trajectories into segments. O(n), where n is the number of points in a trajectory.

◦ Group similar segments together (clustering). O(n log n), where n is the number of segments.

◦ Calculate a representative trajectory per cluster. O(n), where n is the number of trajectories.

A trajectory can belong to multiple clusters.

Page 5: Trajectory clustering - Traclus Algorithm

Overview

Page 6: Trajectory clustering - Traclus Algorithm

Overview

Page 7: Trajectory clustering - Traclus Algorithm

Partition Phase

Partition each trajectory into a set of line segments.

A trajectory partition is a line segment pipj, where i < j and both points belong to the same trajectory.

Similar line segments are later grouped together, which makes it possible to find common subtrajectories.

All segments from all trajectories are inserted into a common set D.

Time complexity: O(n), where n is the number of points in a trajectory.

Page 8: Trajectory clustering - Traclus Algorithm

How to partition a trajectory?

Characteristic points: points where the behavior of the trajectory changes rapidly.

From a trajectory Tr: p1, p2, p3, …, pj, …, plen, determine a set of characteristic points {pc1, pc2, pc3, …, pcPart}.

The trajectory is partitioned at every characteristic point, and each partition is represented by a line segment between two consecutive characteristic points.

Line segment = Trajectory partition.
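As a small illustration of the step above, the following sketch turns a trajectory and a list of characteristic-point indices into line segments. The function name `segments_from_characteristic_points` and the sample data are hypothetical; the characteristic points are assumed to be given already.

```python
import numpy as np

def segments_from_characteristic_points(trajectory, char_idx):
    """Build one line segment per pair of consecutive characteristic points."""
    pts = np.asarray(trajectory, dtype=float)
    # Each segment is a (start_point, end_point) pair of coordinate arrays.
    return [(pts[a], pts[b]) for a, b in zip(char_idx[:-1], char_idx[1:])]

# Hypothetical 2-D trajectory that turns sharply at index 2.
trajectory = [(0, 0), (1, 0.1), (2, 0), (2, 1), (2, 2)]
char_idx = [0, 2, 4]  # assumed to be known here
segments = segments_from_characteristic_points(trajectory, char_idx)
```

Five points collapse into two segments, which is the conciseness the partition phase is after.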

Page 9: Trajectory clustering - Traclus Algorithm

How to optimally partition a trajectory?

Properties:

◦ Preciseness: The difference between a trajectory and the set of its trajectory partitions should be as small as possible.

◦ Conciseness: The number of trajectory partitions should be as small as possible.

Balance preciseness and conciseness using MDL (minimum description length).

The best hypothesis H to explain D is the one that minimizes the sum of L(H) and L(D|H).

◦ L(H): Sum of the (log-scaled) lengths of all trajectory partitions. Measures conciseness.

◦ L(D|H): Sum of the differences between a trajectory and the set of its trajectory partitions. Measures preciseness.

◦ Finding the global optimum can be costly, so it is approximated by local optima: a partition pipj is extended for as long as MDLpart(pi,pj) <= MDLnopart(pi,pj), that is, while partitioning only at pi and pj costs no more than keeping every original point between them.

Time complexity: O(n).
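The greedy approximation can be sketched as follows. This is a minimal sketch, not the authors' code: the names `mdl_part`, `mdl_nopart` and `characteristic_points` are illustrative. It assumes L(H) is the log2 of the partition length, and that L(D|H) sums log2 of the perpendicular and angle distances of each original segment to the partition. Clamping values below 1 before taking the logarithm is an added assumption so that tiny distances cost zero bits.

```python
import math
import numpy as np

def _log2(x):
    # Assumption: clamp to 1 so distances below one unit cost 0 bits.
    return math.log2(max(x, 1.0))

def _perp_and_angle(a, b, c, d):
    """Perpendicular and angle distance of segment c-d against the line a-b."""
    ab, cd = b - a, d - c
    len_ab, len_cd = np.linalg.norm(ab), np.linalg.norm(cd)
    if len_ab == 0.0 or len_cd == 0.0:
        return 0.0, 0.0  # simplification for degenerate segments
    u = ab / len_ab
    l1 = np.linalg.norm((c - a) - np.dot(c - a, u) * u)
    l2 = np.linalg.norm((d - a) - np.dot(d - a, u) * u)
    d_perp = 0.0 if l1 + l2 == 0.0 else (l1**2 + l2**2) / (l1 + l2)
    cos_t = np.dot(ab, cd) / (len_ab * len_cd)
    sin_t = math.sqrt(max(0.0, 1.0 - cos_t**2))
    d_theta = len_cd if cos_t < 0 else len_cd * sin_t
    return d_perp, d_theta

def mdl_part(pts, i, j):
    """Cost when p_i and p_j are the only characteristic points in between."""
    cost = _log2(np.linalg.norm(pts[j] - pts[i]))          # L(H)
    for k in range(i, j):                                   # L(D|H)
        d_perp, d_theta = _perp_and_angle(pts[i], pts[j], pts[k], pts[k + 1])
        cost += _log2(d_perp) + _log2(d_theta)
    return cost

def mdl_nopart(pts, i, j):
    """Cost when every original point between p_i and p_j is kept."""
    return sum(_log2(np.linalg.norm(pts[k + 1] - pts[k])) for k in range(i, j))

def characteristic_points(trajectory):
    """Greedy scan; returns indices of the chosen characteristic points."""
    pts = np.asarray(trajectory, dtype=float)  # assumes at least two points
    n = len(pts)
    cps, start, length = [0], 0, 1
    while start + length < n:
        curr = start + length
        if mdl_part(pts, start, curr) > mdl_nopart(pts, start, curr):
            cps.append(curr - 1)          # partition at the previous point
            start, length = curr - 1, 1
        else:
            length += 1                   # keep extending the partition
    cps.append(n - 1)
    return cps
```

On a straight line such as `[(0, 0), (10, 0), (20, 0), (30, 0)]` the scan keeps only the two endpoints, while a right-angle turn produces an extra characteristic point at the corner. Each point is visited once by the outer loop, although this naive cost evaluation re-scans the current partition at every step.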

Page 10: Trajectory clustering - Traclus Algorithm

How to optimally partition a trajectory?

Page 11: Trajectory clustering - Traclus Algorithm

Distance Measure

Based on the projection of the points of one segment onto the other.

3 components:

◦ Perpendicular distance: the Lehmer mean of order 2 of the two perpendicular distances between the line segments. Each of these is the Euclidean distance between an endpoint of one segment and its projection onto the other segment.

◦ Parallel distance: the minimum distance between the projected points and the endpoints of the segment onto which the projection was made.

◦ Angle distance: based on the smaller intersecting angle between the segments. It accounts for the direction of the trajectories.
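Written out, the three components take the following form, as I recall them from the TRACLUS paper. The notation is an assumption of this note: L_i is the longer segment and L_j the shorter one, l_{\perp 1} and l_{\perp 2} are the distances from the endpoints of L_j to their projections on L_i, l_{\parallel 1} and l_{\parallel 2} are the distances from each projection point to the nearer endpoint of L_i, and \theta is the smaller intersecting angle.

```latex
\begin{aligned}
d_\perp(L_i, L_j) &= \frac{l_{\perp 1}^{2} + l_{\perp 2}^{2}}{l_{\perp 1} + l_{\perp 2}} \\
d_\parallel(L_i, L_j) &= \min(l_{\parallel 1},\, l_{\parallel 2}) \\
d_\theta(L_i, L_j) &=
  \begin{cases}
    \lVert L_j \rVert \sin\theta & \text{if } 0^\circ \le \theta < 90^\circ \\
    \lVert L_j \rVert            & \text{if } 90^\circ \le \theta \le 180^\circ
  \end{cases}
\end{aligned}
```

The first line is the Lehmer mean of order 2 of l_{\perp 1} and l_{\perp 2}.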

The distance measure can be calculated easily with vector operations.

The overall distance between two segments is given by the sum of the 3 components (a weighted sum in the original paper, with weights that default to 1).
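A possible vector-based implementation of the three components is sketched below. The function name `segment_distance`, the weight parameter names and the handling of zero-length segments are assumptions made for illustration. The longer segment is taken as the one projected onto.

```python
import math
import numpy as np

def segment_distance(s1, e1, s2, e2, w_perp=1.0, w_par=1.0, w_theta=1.0):
    """Weighted sum of perpendicular, parallel and angle distance."""
    s1, e1, s2, e2 = (np.asarray(p, dtype=float) for p in (s1, e1, s2, e2))
    # Make (s1, e1) the longer segment so the shorter one is projected onto it.
    if np.linalg.norm(e1 - s1) < np.linalg.norm(e2 - s2):
        s1, e1, s2, e2 = s2, e2, s1, e1
    v_i, v_j = e1 - s1, e2 - s2
    len_i, len_j = np.linalg.norm(v_i), np.linalg.norm(v_j)
    if len_i == 0.0:
        # Assumption: two zero-length segments fall back to point distance.
        return float(np.linalg.norm(s2 - s1))
    u = v_i / len_i
    # Projections of the shorter segment's endpoints onto the longer one.
    p_s = s1 + np.dot(s2 - s1, u) * u
    p_e = s1 + np.dot(e2 - s1, u) * u
    l_perp1 = np.linalg.norm(s2 - p_s)
    l_perp2 = np.linalg.norm(e2 - p_e)
    if l_perp1 + l_perp2 == 0.0:
        d_perp = 0.0
    else:
        d_perp = (l_perp1**2 + l_perp2**2) / (l_perp1 + l_perp2)
    # Parallel distance: nearest endpoint of the longer segment.
    l_par1 = min(np.linalg.norm(p_s - s1), np.linalg.norm(p_s - e1))
    l_par2 = min(np.linalg.norm(p_e - s1), np.linalg.norm(p_e - e1))
    d_par = min(l_par1, l_par2)
    # Angle distance: shorter length times sin(theta), or the full length
    # when the segments point in opposing directions.
    if len_j == 0.0:
        d_theta = 0.0
    else:
        cos_t = np.dot(v_i, v_j) / (len_i * len_j)
        sin_t = math.sqrt(max(0.0, 1.0 - cos_t**2))
        d_theta = len_j if cos_t < 0 else len_j * sin_t
    return w_perp * d_perp + w_par * d_par + w_theta * d_theta

# Two parallel segments 3 units apart, the shorter inset by 2 on each side.
d = segment_distance((0, 0), (10, 0), (2, 3), (8, 3))
```

In the example the perpendicular part is 3, the parallel part is 2 and the angle part is 0. Swapping the two segments gives the same value because the longer one is always chosen as the projection target.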

Page 12: Trajectory clustering - Traclus Algorithm

Distance Measure

Page 13: Trajectory clustering - Traclus Algorithm

Clustering Phase

Line segments in the same cluster are close to each other according to the distance measure.

Uses density-based clustering, as in DBSCAN.

Let D be the set of all line segments.

Page 14: Trajectory clustering - Traclus Algorithm

Density-Based Clustering

Page 15: Trajectory clustering - Traclus Algorithm

Clustering Algorithm

2 parameters:

◦ ε: the neighborhood radius of a segment.

◦ MinLns: the minimum number of line segments.

A trajectory cardinality check discards clusters whose segments come from too few distinct trajectories, which limits the number of clusters.

Turns a set of segments D into a set of clusters O.

Complexity:

◦ O(n log n), where n is the number of segments, when a spatial index is used.

◦ O(n²) for a number of dimensions >= 2.
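The grouping step can be sketched as a DBSCAN-style expansion over segments. This is a simplified sketch rather than the algorithm from the slides: neighborhoods are computed by brute force (the O(n²) case), the distance is passed in as a function, and `cluster_segments`, `traj_ids` and `midpoint_dist` are hypothetical names. The midpoint distance is only a stand-in to keep the example short; it is not the TRACLUS distance.

```python
import math
from collections import deque

UNCLASSIFIED, NOISE = -2, -1

def cluster_segments(segments, traj_ids, dist, eps, min_lns):
    """Return clusters as lists of segment indices."""
    n = len(segments)
    labels = [UNCLASSIFIED] * n

    def neighbors(i):
        # Brute-force epsilon-neighborhood (includes i itself).
        return [j for j in range(n) if dist(segments[i], segments[j]) <= eps]

    cluster_id = 0
    for i in range(n):
        if labels[i] != UNCLASSIFIED:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_lns:
            labels[i] = NOISE            # may later be absorbed by a cluster
            continue
        labels[i] = cluster_id
        queue = deque(seeds)
        while queue:
            m = queue.popleft()
            if labels[m] == NOISE:
                labels[m] = cluster_id   # border segment, not expanded
            if labels[m] != UNCLASSIFIED:
                continue
            labels[m] = cluster_id
            nb = neighbors(m)
            if len(nb) >= min_lns:       # m is a core segment: keep expanding
                queue.extend(nb)
        cluster_id += 1

    clusters = {}
    for idx, lab in enumerate(labels):
        if lab >= 0:
            clusters.setdefault(lab, []).append(idx)
    # Trajectory cardinality check: keep clusters drawn from enough
    # distinct trajectories (min_lns is reused as the threshold here).
    return [members for members in clusters.values()
            if len({traj_ids[k] for k in members}) >= min_lns]

def midpoint_dist(a, b):
    # Stand-in distance for the example only.
    ma = [(p + q) / 2 for p, q in zip(*a)]
    mb = [(p + q) / 2 for p, q in zip(*b)]
    return math.dist(ma, mb)

segments = [((0, 0), (1, 0)), ((0, 0.5), (1, 0.5)),
            ((0, 1), (1, 1)), ((10, 10), (11, 10))]
traj_ids = [0, 1, 2, 3]
clusters = cluster_segments(segments, traj_ids, midpoint_dist, eps=1.5, min_lns=3)
```

The three nearby segments form one cluster and the distant one is left as noise. Passing the distance as a parameter keeps the expansion logic independent of how segment similarity is measured.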

Page 16: Trajectory clustering - Traclus Algorithm

Algorithm

Page 17: Trajectory clustering - Traclus Algorithm

Algorithm

Page 18: Trajectory clustering - Traclus Algorithm

Representative Trajectories

An imaginary trajectory obtained from each cluster.

Like a regular trajectory, a representative trajectory is a sequence of points.

A representative trajectory indicates the major behavior of the segments of a cluster.

Representative trajectory = Common subtrajectory.
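The slides do not spell out how the representative trajectory is computed. The sketch below follows the sweep-line idea from the TRACLUS paper as I understand it, restricted to 2-D: rotate the axes so that x' follows the average direction of the cluster, sweep over the segment endpoints, and average the y' coordinates wherever at least MinLns segments are crossed. The name `representative_trajectory` and the spacing parameter `gamma` are assumptions.

```python
import numpy as np

def representative_trajectory(segments, min_lns, gamma=0.0):
    """Sweep-line sketch for 2-D segments given as ((x1, y1), (x2, y2))."""
    segs = np.asarray(segments, dtype=float)           # shape (n, 2, 2)
    avg_dir = (segs[:, 1] - segs[:, 0]).mean(axis=0)   # average direction vector
    norm = np.linalg.norm(avg_dir)
    if norm == 0.0:
        return np.empty((0, 2))
    c, s = avg_dir / norm
    rot = np.array([[c, s], [-s, c]])                  # maps avg_dir onto +x'
    rseg = segs @ rot.T                                # rotated endpoints
    lo = rseg[:, :, 0].min(axis=1)
    hi = rseg[:, :, 0].max(axis=1)

    rep, last_x = [], -np.inf
    for x in np.sort(rseg[:, :, 0].ravel()):
        hit = (lo <= x) & (x <= hi)                    # segments crossed at x'
        # Skip sparse positions and positions too close to the previous one.
        if hit.sum() < min_lns or x - last_x <= gamma:
            continue
        ys = []
        for (x1, y1), (x2, y2) in rseg[hit]:
            t = 0.5 if x1 == x2 else (x - x1) / (x2 - x1)
            ys.append(y1 + t * (y2 - y1))              # y' of the segment at x'
        rep.append((x, float(np.mean(ys))))
        last_x = x
    if not rep:
        return np.empty((0, 2))
    return np.asarray(rep) @ rot                       # undo the rotation

# Three parallel segments; the representative runs along the middle one.
cluster = [((0, 0), (10, 0)), ((0, 1), (10, 1)), ((0, 2), (10, 2))]
rep = representative_trajectory(cluster, min_lns=3)
```

For the three parallel segments the result runs from (0, 1) to (10, 1), the middle line of the cluster.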

Page 19: Trajectory clustering - Traclus Algorithm

End =)