Zhenhui Li, Jae-Gil Lee, Xiaolei Li, Jiawei Han Univ. of Illinois at Urbana-Champaign DASFAA Conference 2010 April, Tsukuba, Japan Incremental Clustering for Trajectories


Page 1

Zhenhui Li, Jae-Gil Lee, Xiaolei Li, Jiawei Han
Univ. of Illinois at Urbana-Champaign

DASFAA Conference 2010
April, Tsukuba, Japan

Incremental Clustering for Trajectories

Page 2

Outline

Motivation

Introducing previous work TRACLUS

Trajectory Clustering using Micro- and Macro-clustering

Experiment

Conclusion

Future Work

Page 4

Tracking by GPS/Sensor is becoming more common

Hurricanes, animals, vehicles

Page 5

Moving object data is accumulated fast

A taxi tracking system tracks 5,000 taxis in San Francisco

Location information is received from each taxi every minute

After a day, 7.2 million points are collected

After a week, 50.4 million points are collected

...
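The totals on this slide follow from simple arithmetic. A minimal sketch, assuming one report per taxi per minute around the clock; the helper name `points_collected` is hypothetical and used only for illustration:

```python
def points_collected(taxis: int, reports_per_hour: int, days: int) -> int:
    """Number of location points received over the given number of days."""
    return taxis * reports_per_hour * 24 * days


per_day = points_collected(taxis=5_000, reports_per_hour=60, days=1)
per_week = points_collected(taxis=5_000, reports_per_hour=60, days=7)
print(f"{per_day:,} points per day, {per_week:,} points per week")
# prints: 7,200,000 points per day, 50,400,000 points per week
```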

Page 6

Online monitoring demand

Trajectory clusters have applications in

discovering common hurricane paths

monitoring hot traffic paths

analyzing animals’ movement

As data is updated over time, the clustering result needs to be monitored online.

But it is inefficient to compute the trajectory clusters from scratch every time.

Page 7

New data will only affect local shifts

The key observation is that new data will only affect local shifts.

[Figure panels: Snapshot Time 1, Snapshot Time 2]

Page 9

TRACLUS: trajectory clustering

Clustering trajectories as a whole cannot detect similar portions of the trajectories (i.e., common sub-trajectories)

Example: if we cluster TR1~TR5 as a whole, we cannot discover the common behavior, since they move in totally different directions

Jae-Gil Lee, Jiawei Han, and Kyu-Young Whang, “Trajectory Clustering: A Partition-and-Group Framework”, in Proc. 2007 ACM SIGMOD Int. Conf. on Management of Data (SIGMOD'07), Beijing, China, June 2007.

[Figure: trajectories TR1 to TR5 and a common sub-trajectory]

Page 10

The Partition-and-Group Framework

Consists of two phases: partitioning and grouping

[Figure labels: a set of trajectories (TR1 to TR5); (1) Partition; a set of line segments; (2) Group; a cluster; a representative trajectory]

Note: a representative trajectory is a common sub-trajectory

Page 11

Partition

Identify the points where the behavior of a trajectory changes rapidly; such points are called characteristic points

A trajectory is partitioned at every characteristic point

A line segment between consecutive characteristic points is called a trajectory partition

[Figure: trajectory TR_i with points p_1 to p_8, characteristic points p_{c_1} to p_{c_4}, and the trajectory partitions between them]

Page 12

Group

Group line segments based on density

L1, L2, L3, L4, and L5 are core line segments

L2 (or L3) is directly density-reachable from L1

L6 is density-reachable from L1, but not vice versa

L1, L4, and L5 are all density-connected

[Figure: line segments L1 to L6 and their reachability, with MinLns = 3]
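The density notions on this slide can be sketched with a DBSCAN-style expansion. This is an illustration under stated assumptions, not the TRACLUS implementation: the function name `group_by_density`, the `eps` neighborhood radius, and the caller-supplied `distance` are all hypothetical, and 1-D numbers stand in for line segments so the example stays small. Only `MinLns` comes from the slide.

```python
from collections import deque
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")
NOISE = -1


def group_by_density(items: Sequence[T],
                     distance: Callable[[T, T], float],
                     eps: float,
                     min_lns: int) -> List[int]:
    """Return one cluster id per item, or NOISE for items in no cluster."""
    n = len(items)
    # eps-neighborhood of every item; an item counts as its own neighbor.
    neighbors = [[j for j in range(n) if distance(items[i], items[j]) <= eps]
                 for i in range(n)]
    # A core item has at least min_lns items in its neighborhood.
    is_core = [len(nb) >= min_lns for nb in neighbors]
    labels = [NOISE] * n
    cluster_id = 0
    for start in range(n):
        if labels[start] != NOISE or not is_core[start]:
            continue
        # Grow a new cluster from an unassigned core item.
        labels[start] = cluster_id
        queue = deque([start])
        while queue:
            i = queue.popleft()
            if not is_core[i]:
                continue  # border items join a cluster but do not extend it
            for j in neighbors[i]:
                if labels[j] == NOISE:
                    labels[j] = cluster_id
                    queue.append(j)
        cluster_id += 1
    return labels


# Toy data: numbers stand in for line segments (an assumption for brevity).
values = [0.0, 1.0, 2.0, 3.0, 10.0]
labels = group_by_density(values, lambda a, b: abs(a - b), eps=1.0, min_lns=3)
print(labels)  # prints: [0, 0, 0, 0, -1]
```

In the toy run, 1.0 and 2.0 are core items, while 0.0 and 3.0 are border items: they are reachable from a core item but cannot start or extend a cluster themselves, which mirrors the "but not vice versa" remark about L6.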

Page 14

TCMM Framework

Trajectories are received over time

Partition each trajectory into line segments

A micro-cluster stores a small group of close line segments

A macro-cluster is a cluster of micro-clusters

Page 15

Data Preprocess

Finding the optimal partitioning translates to finding the best hypothesis using the MDL principle

H: a set of trajectory partitions; D: a trajectory

L(H): the sum of the length of all trajectory partitions

L(D|H): the sum of the difference between a trajectory and a set of its trajectory partitions

L(H) measures conciseness; L(D|H) measures preciseness

$L(H) = \log_2(\mathrm{len}(p_1 p_4))$

$L(D|H) = \log_2(d_\perp(p_1 p_4, p_1 p_2) + d_\perp(p_1 p_4, p_2 p_3) + d_\perp(p_1 p_4, p_3 p_4)) + \log_2(d_\theta(p_1 p_4, p_1 p_2) + d_\theta(p_1 p_4, p_2 p_3) + d_\theta(p_1 p_4, p_3 p_4))$

[Figure: points p_1 to p_5 with characteristic points p_{c_1} and p_{c_2}]
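The MDL cost of replacing a run of points by a single trajectory partition can be sketched as follows. This is a minimal sketch under assumptions: the two distance terms inside L(D|H) are taken to be the perpendicular and angle distances of TRACLUS, the hypothesis segment is always used as the reference line, and `safe_log2` clamps values below 1 to a cost of 0 (a practical guard that is not stated on the slide). All function names are hypothetical.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def point_line_distance(p: Point, a: Point, b: Point) -> float:
    """Distance from p to the infinite line through a and b."""
    cross = (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])
    return abs(cross) / math.dist(a, b)


def perpendicular_distance(a: Point, b: Point, c: Point, d: Point) -> float:
    """(l1^2 + l2^2) / (l1 + l2) for the endpoints of cd against line ab."""
    l1 = point_line_distance(c, a, b)
    l2 = point_line_distance(d, a, b)
    return 0.0 if l1 + l2 == 0 else (l1 * l1 + l2 * l2) / (l1 + l2)


def angle_distance(a: Point, b: Point, c: Point, d: Point) -> float:
    """len(cd) * sin(theta) below 90 degrees, len(cd) otherwise."""
    ux, uy = b[0] - a[0], b[1] - a[1]
    vx, vy = d[0] - c[0], d[1] - c[1]
    len_v = math.hypot(vx, vy)
    cos_t = (ux * vx + uy * vy) / (math.hypot(ux, uy) * len_v)
    cos_t = max(-1.0, min(1.0, cos_t))
    if cos_t <= 0:
        return len_v
    return len_v * math.sqrt(1.0 - cos_t * cos_t)


def safe_log2(x: float) -> float:
    """Assumed guard: treat values below 1 as contributing zero cost."""
    return math.log2(x) if x >= 1.0 else 0.0


def mdl_cost(points: Sequence[Point], i: int, j: int) -> float:
    """L(H) + L(D|H) when points[i..j] is replaced by segment points[i]-points[j]."""
    a, b = points[i], points[j]
    l_h = safe_log2(math.dist(a, b))
    perp = sum(perpendicular_distance(a, b, points[k], points[k + 1])
               for k in range(i, j))
    ang = sum(angle_distance(a, b, points[k], points[k + 1])
              for k in range(i, j))
    return l_h + safe_log2(perp) + safe_log2(ang)


straight = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (4.0, 0.0)]
bent = [(0.0, 0.0), (2.0, 2.0), (4.0, 0.0)]
print(mdl_cost(straight, 0, 3), mdl_cost(bent, 0, 2))
```

The straight path costs only L(H), since one segment describes it exactly; the bent path pays an additional L(D|H), which is what pushes the partitioning step to place a characteristic point at the bend.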

Page 16

Micro-Cluster Definition

A micro-cluster maintains a fine-granularity clustering.

Each micro-cluster holds and summarizes the information of local partitioned trajectories.

A micro-cluster for a set of directed line segments is defined as a tuple consisting of:

the number of line segments

the linear sums of the line segments’ center points, angles and lengths

the squared sums of the line segments’ center points, angles and lengths
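Because the summary consists only of a count and sums, it can be updated and merged without revisiting the original line segments. A minimal sketch of such a structure follows; the class name `MicroCluster`, the field names `n`, `ls`, `ss`, and the flat `(center_x, center_y, angle, length)` encoding of a segment are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Tuple

# Assumed encoding of a directed line segment.
Segment = Tuple[float, float, float, float]  # (center_x, center_y, angle, length)


@dataclass(frozen=True)
class MicroCluster:
    n: int        # number of line segments
    ls: Segment   # linear sums of center_x, center_y, angle, length
    ss: Segment   # squared sums of the same four features

    @classmethod
    def from_segment(cls, seg: Segment) -> "MicroCluster":
        """A micro-cluster that contains exactly one segment."""
        return cls(1, tuple(seg), tuple(v * v for v in seg))

    def add(self, seg: Segment) -> "MicroCluster":
        """Absorb one more segment by updating the count and both sums."""
        return MicroCluster(
            self.n + 1,
            tuple(a + v for a, v in zip(self.ls, seg)),
            tuple(a + v * v for a, v in zip(self.ss, seg)),
        )

    def merge(self, other: "MicroCluster") -> "MicroCluster":
        """Merging is component-wise addition of the two summaries."""
        return MicroCluster(
            self.n + other.n,
            tuple(a + b for a, b in zip(self.ls, other.ls)),
            tuple(a + b for a, b in zip(self.ss, other.ss)),
        )


first = (0.0, 0.0, 0.0, 2.0)
second = (2.0, 0.0, 0.0, 4.0)
grown = MicroCluster.from_segment(first).add(second)
merged = MicroCluster.from_segment(first).merge(MicroCluster.from_segment(second))
print(grown)
```

Adding segments one by one and merging two single-segment micro-clusters give the same summary, which is the property that makes incremental maintenance cheap.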

Page 17

Distance between Micro-Clusters

Representative line segment of a micro-cluster

The distance between two micro-clusters can be defined as the distance between the representative line segments of the two micro-clusters
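The transcript does not retain the formula for the representative line segment. One plausible reading, assumed here, is that it has the average center, average angle, and average length of the member segments, which can all be derived from the linear sums. The function name and argument names are hypothetical.

```python
import math
from typing import Tuple

Point = Tuple[float, float]


def representative_segment(n: int,
                           ls_center: Point,
                           ls_angle: float,
                           ls_length: float) -> Tuple[Point, Point]:
    """Start and end point of a segment with the average center, angle, length."""
    cx, cy = ls_center[0] / n, ls_center[1] / n
    theta = ls_angle / n            # average direction in radians
    half = (ls_length / n) / 2.0    # half of the average length
    dx, dy = math.cos(theta) * half, math.sin(theta) * half
    return (cx - dx, cy - dy), (cx + dx, cy + dy)


# Two horizontal segments: centers (0, 0) and (2, 0), lengths 2 and 4.
start, end = representative_segment(n=2, ls_center=(2.0, 0.0),
                                    ls_angle=0.0, ls_length=6.0)
print(start, end)  # prints: (-0.5, 0.0) (2.5, 0.0)
```

Averaging raw angles ignores wrap-around near 0 and 2π; a real implementation would need to handle that case.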

Page 18

Creating and updating Micro-Cluster

When a new line segment is received:

Find the closest micro-cluster

If the distance between the new line segment and its closest micro-cluster is less than a threshold, add the new line segment into this micro-cluster and update the micro-cluster

If not, create a new micro-cluster that contains only this line segment: its linear sums are the center, angle, and length of this line segment, and its squared sums are the squares of the center, angle, and length of this line segment
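The steps above can be sketched as a single insertion routine. This is a sketch under assumptions: the threshold is called `d_max` after the parameter named on the sensitivity slide, each micro-cluster is a plain dict with hypothetical keys `n`, `ls`, `ss`, and Euclidean distance between feature vectors stands in for the actual distance between a line segment and a micro-cluster.

```python
import math
from typing import Callable, Dict, List, Sequence


def centroid(mc: Dict) -> List[float]:
    """Average feature vector of a micro-cluster, from its linear sums."""
    return [s / mc["n"] for s in mc["ls"]]


def insert_segment(clusters: List[Dict],
                   seg: Sequence[float],
                   d_max: float,
                   distance: Callable = math.dist) -> int:
    """Add seg to the closest micro-cluster or open a new one; return its index."""
    if clusters:
        # Linear scan for the closest micro-cluster.
        best = min(range(len(clusters)),
                   key=lambda i: distance(centroid(clusters[i]), seg))
        if distance(centroid(clusters[best]), seg) < d_max:
            mc = clusters[best]
            mc["n"] += 1
            mc["ls"] = [a + v for a, v in zip(mc["ls"], seg)]
            mc["ss"] = [a + v * v for a, v in zip(mc["ss"], seg)]
            return best
    # No micro-cluster is close enough: start a new one from this segment.
    clusters.append({"n": 1, "ls": list(seg), "ss": [v * v for v in seg]})
    return len(clusters) - 1


clusters: List[Dict] = []
# Features are (center_x, center_y, angle, length), an assumed encoding.
assigned = [insert_segment(clusters, seg, d_max=1.0)
            for seg in [(0.0, 0.0, 0.0, 1.0),
                        (0.5, 0.0, 0.0, 1.0),
                        (10.0, 0.0, 0.0, 1.0)]]
print(assigned)  # prints: [0, 0, 1]
```

The linear scan is the cost that grows with the number of micro-clusters, which is why the deck goes on to discuss merging.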

Page 19

Merging Micro-Clusters

Why merge micro-clusters?

If the number of micro-clusters is large, it is time-consuming to find the closest micro-cluster when a new line segment is received, and to do macro-clustering over the micro-clusters

Also, the memory might not be enough to store all the micro-clusters

Merge close micro-clusters to save storage space and computation time

“Closeness” can be simply defined as the distance between two micro-clusters

However, this does not consider the “tightness” of a micro-cluster

Page 20

Merging Micro-Clusters (cont.)

We prefer to merge loose micro-clusters rather than tight ones, to better preserve the “tightness” of micro-clusters.

More information is lost when merging two tight micro-clusters.

Page 21

Merging Micro-Clusters (cont.)

Introducing the “extent” of a micro-cluster

Extent defines the tightness of a micro-cluster in terms of center, angle and length

Page 22

Merging Micro-Clusters (cont.)

Distance between micro-clusters with extent

Center distance

Angle distance

Length distance
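The formulas for extent and for the extent-aware distances are not preserved in this transcript. The sketch below shows one way such quantities could be computed, and both choices are assumptions: extent is taken to be the standard deviation of a feature, derived from the stored count, linear sum, and squared sum, and a raw distance is reduced by the extents of both micro-clusters with a floor at zero. The function names are hypothetical.

```python
import math


def extent(n: int, linear_sum: float, squared_sum: float) -> float:
    """Standard deviation of one feature, from its running sums (assumed form)."""
    mean = linear_sum / n
    variance = max(squared_sum / n - mean * mean, 0.0)  # guard against rounding
    return math.sqrt(variance)


def distance_with_extent(raw_distance: float,
                         extent_a: float,
                         extent_b: float) -> float:
    """Raw distance reduced by both extents, never below zero (assumed form)."""
    return max(raw_distance - extent_a - extent_b, 0.0)


# Lengths 2 and 4 give n=2, linear sum 6, squared sum 20.
length_extent = extent(2, 6.0, 20.0)
print(length_extent)                          # prints: 1.0
print(distance_with_extent(5.0, 1.0, 1.5))    # prints: 2.5
print(distance_with_extent(1.0, 1.0, 1.5))    # prints: 0.0
```

Under this reading, loose micro-clusters (large extent) appear closer to each other than tight ones at the same raw distance, so they are merged first, which matches the stated preference for merging loose micro-clusters.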

Page 23

Micro-clustering summary

Page 24

Macro-Clustering

Macro-clustering is invoked only when it is called upon by the user

Macro-clustering is performed on the representative line segments of micro-clusters

Similar to the group step in the TRACLUS framework

Page 26

Experiment

Real taxi data in San Francisco: 7,000+ trajectories in a week, 100,000 points in total

Page 27

Experiment (cont.)

Effectiveness

SSQ (sum of squared distance) is the average squared distance from the line segments to the centroid of their macro-cluster

TCMM reaches quality similar to TRACLUS

Page 28

Experiment (cont.)

Efficiency

TCMM is much faster than TRACLUS

Page 29

Experiment (cont.)

Sensitivity to parameter:

When d_max is larger, the quality is lower but the efficiency is better

Page 31

Conclusion

We address the problem of incrementally clustering trajectories.

We propose the TCMM (Trajectory Clustering based on Micro- and Macro-clustering) framework.

The definition of extent is proposed to better capture the “tightness” of micro-clusters.

Experiments show that TCMM achieves quality similar to TRACLUS but is much faster.

Page 33

Future Work

Efficiency

Use an index to find the closest micro-cluster

Not easy because our distance function is non-metric

Parameter insensitivity

Make our algorithm less sensitive to parameter values

Temporal information

Take temporal information into account during clustering

Other applications

Incrementally discover outliers and patterns

Page 34

Thank you!