Zhenhui Li, Jae-Gil Lee, Xiaolei Li, Jiawei Han
Univ. of Illinois at Urbana-Champaign
DASFAA Conference 2010, April, Tsukuba, Japan
Incremental Clustering for Trajectories
Outline
Motivation
Introducing previous work TRACLUS
Trajectory Clustering using Micro- and Macro-clustering
Experiment
Conclusion
Future Work
Tracking by GPS/Sensor is becoming more common
Hurricane, Animals, Vehicles
Moving object data is accumulated fast
A taxi tracking system tracks 5,000 taxis in San Francisco
Location information is received from each taxi every minute
After a day, 7.2 million points are collected
After a week, 50.4 million points are collected
...
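A quick check of these volumes, assuming every one of the 5,000 taxis reports exactly once per minute around the clock (the variable names are illustrative only):

```python
# Assumption: each taxi sends exactly one location report per minute, all day.
TAXIS = 5_000
MINUTES_PER_DAY = 24 * 60

points_per_day = TAXIS * MINUTES_PER_DAY
points_per_week = points_per_day * 7

print(f"{points_per_day:,} points per day")
print(f"{points_per_week:,} points per week")
```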
Online monitoring demand
Trajectory clusters have applications in discovering common hurricane paths, monitoring hot traffic paths, and analyzing animals’ movement.
As data is updated over time, the clustering result needs to be monitored online.
However, it is inefficient to recompute the trajectory clusters from scratch every time.
New data will only affect local shifts
The key observation is that new data will only affect local shifts.
[Figure: Snapshot Time 1 and Snapshot Time 2]
TRACLUS: trajectory clustering
Clustering trajectories as a whole cannot detect similar portions of the trajectories (i.e., common sub-trajectories)
Example: if we cluster TR1~TR5 as a whole, we cannot discover the common behavior since they move in totally different directions
Jae-Gil Lee, Jiawei Han, and Kyu-Young Whang, “Trajectory Clustering: A Partition-and-Group Framework”, in Proc. 2007 ACM SIGMOD Int. Conf. on Management of Data (SIGMOD'07), Beijing, China, June 2007.
[Figure: trajectories TR1 to TR5 sharing a common sub-trajectory]
The Partition-and-Group Framework
Consists of two phases: partitioning and grouping
[Figure: a set of trajectories (TR1 to TR5) is (1) partitioned into a set of line segments, which are (2) grouped into a cluster with a representative trajectory]
Note: a representative trajectory is a common sub-trajectory
Partition
Identify the points where the behavior of a trajectory changes rapidly; such points are called characteristic points
A trajectory is partitioned at every characteristic point
A line segment between consecutive characteristic points is called a trajectory partition
[Figure: trajectory TR_i with points p_1 to p_8, characteristic points c_1 to c_4, and the trajectory partitions between them]
Group
Group line segments based on density
L1, L2, L3, L4, and L5 are core line segments
L2 (or L3) is directly density-reachable from L1
L6 is density-reachable from L1, but not vice versa
L1, L4, and L5 are all density-connected
[Figure: line segments L1 to L6 and their density-reachability]
MinLns = 3
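The grouping step can be sketched as a DBSCAN-style pass over line segments. This is only an illustration under stated assumptions: `group_segments`, `eps` and `midpoint_distance` are hypothetical names, and the midpoint distance is a stand-in chosen for brevity, not the TRACLUS distance function. A segment counts itself among its own neighbors.

```python
import math
from collections import deque

def midpoint_distance(a, b):
    """Stand-in distance (assumption): Euclidean distance between midpoints."""
    (ax1, ay1), (ax2, ay2) = a
    (bx1, by1), (bx2, by2) = b
    return math.hypot((ax1 + ax2) / 2 - (bx1 + bx2) / 2,
                      (ay1 + ay2) / 2 - (by1 + by2) / 2)

def group_segments(segments, eps, min_lns, dist=midpoint_distance):
    """Return one cluster id per segment; -1 marks noise."""
    n = len(segments)
    # eps-neighborhood of every segment (includes the segment itself)
    neighbors = [[j for j in range(n) if dist(segments[i], segments[j]) <= eps]
                 for i in range(n)]
    labels = [None] * n
    cluster_id = 0
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbors[i]) < min_lns:
            labels[i] = -1            # not a core line segment (may be claimed later)
            continue
        labels[i] = cluster_id        # core line segment: start a new cluster
        queue = deque(neighbors[i])
        while queue:
            j = queue.popleft()
            if labels[j] == -1:
                labels[j] = cluster_id    # density-reachable, but not core
            if labels[j] is not None:
                continue
            labels[j] = cluster_id
            if len(neighbors[j]) >= min_lns:
                queue.extend(neighbors[j])  # expand only through core segments
        cluster_id += 1
    return labels

segments = [((0, 0), (2, 0)), ((0, 0.5), (2, 0.5)),
            ((0, 1), (2, 1)), ((10, 10), (12, 10))]
labels = group_segments(segments, eps=1.5, min_lns=3)
print(labels)
```

The three nearby segments end up in one cluster and the distant one is labeled noise. Expanding only through core segments is what makes density-reachability asymmetric, as with L6 and L1 above.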
TCMM Framework
Trajectories are received over time
Each trajectory is partitioned into line segments
A micro-cluster stores a small group of close line segments
A macro-cluster is a cluster of micro-clusters
Data Preprocess
Finding the optimal partitioning translates to finding the best hypothesis using the MDL principle
H: a set of trajectory partitions; D: a trajectory
L(H): the sum of the lengths of all trajectory partitions
L(D|H): the sum of the differences between a trajectory and a set of its trajectory partitions
L(H) measures conciseness; L(D|H) measures preciseness
$L(H) = \log_2(\mathrm{len}(p_1p_4))$
$L(D|H) = \log_2(d_\perp(p_1p_4, p_1p_2) + d_\perp(p_1p_4, p_2p_3) + d_\perp(p_1p_4, p_3p_4)) + \log_2(d_\theta(p_1p_4, p_1p_2) + d_\theta(p_1p_4, p_2p_3) + d_\theta(p_1p_4, p_3p_4))$
[Figure: a trajectory with points p_1 to p_5 and characteristic points c_1 and c_2]
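A minimal sketch of how L(H) + L(D|H) could be evaluated for one candidate trajectory partition. Several things here are assumptions rather than content of the slides: the perpendicular and angle distances follow the definitions as recalled from the cited TRACLUS paper, the function names are hypothetical, and values below 1 are clamped before taking the logarithm so the cost is never negative or undefined.

```python
import math

def _point_to_line(p, s, e):
    """Distance from point p to the infinite line through s and e."""
    dx, dy = e[0] - s[0], e[1] - s[1]
    length = math.hypot(dx, dy)
    if length == 0.0:
        return math.hypot(p[0] - s[0], p[1] - s[1])
    return abs(dx * (p[1] - s[1]) - dy * (p[0] - s[0])) / length

def perpendicular_distance(s, e, a, b):
    """Assumed d_perp: (l1^2 + l2^2) / (l1 + l2) over the endpoints of (a, b)."""
    l1, l2 = _point_to_line(a, s, e), _point_to_line(b, s, e)
    return 0.0 if l1 + l2 == 0.0 else (l1 * l1 + l2 * l2) / (l1 + l2)

def angle_distance(s, e, a, b):
    """Assumed d_theta: |ab| * sin(theta) below 90 degrees, |ab| otherwise."""
    ux, uy = e[0] - s[0], e[1] - s[1]
    vx, vy = b[0] - a[0], b[1] - a[1]
    nu, nv = math.hypot(ux, uy), math.hypot(vx, vy)
    if nu == 0.0 or nv == 0.0:
        return 0.0
    cos_t = max(-1.0, min(1.0, (ux * vx + uy * vy) / (nu * nv)))
    if cos_t < 0.0:
        return nv
    return nv * math.sqrt(1.0 - cos_t * cos_t)

def log2_clamped(x):
    """Assumption: treat values <= 1 as costing zero bits."""
    return math.log2(x) if x > 1.0 else 0.0

def mdl_cost(points, i, j):
    """L(H) + L(D|H) when points[i..j] are replaced by the segment p_i p_j."""
    s, e = points[i], points[j]
    l_h = log2_clamped(math.hypot(e[0] - s[0], e[1] - s[1]))
    perp = sum(perpendicular_distance(s, e, points[k], points[k + 1])
               for k in range(i, j))
    ang = sum(angle_distance(s, e, points[k], points[k + 1])
              for k in range(i, j))
    return l_h + log2_clamped(perp) + log2_clamped(ang)

straight = [(0, 0), (1, 0), (2, 0), (3, 0)]
zigzag = [(0, 0), (1, 5), (2, -5), (3, 0)]
print(mdl_cost(straight, 0, 3), mdl_cost(zigzag, 0, 3))
```

A straight run of points costs only L(H), while the zigzag pays extra in L(D|H), which is the trade-off between conciseness and preciseness.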
Micro-Cluster Definition
Micro-clusters maintain a fine-granularity clustering.
Each micro-cluster holds and summarizes the information of local partitioned trajectories.
A micro-cluster for a set of directed line segments is defined as a tuple consisting of:
the number of line segments
the linear sums of the line segments’ center points, angles and lengths
the squared sums of the line segments’ center points, angles and lengths
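One possible in-memory form of this tuple, as a sketch. The class and field names are hypothetical, the center is summed per coordinate, and angles are summed linearly in radians (which ignores wrap-around at ±π); all of these are assumptions, not details given on the slide.

```python
import math
from dataclasses import dataclass, field

def segment_features(segment):
    """Return (center_x, center_y, angle, length) of a directed line segment."""
    (x1, y1), (x2, y2) = segment
    return ((x1 + x2) / 2, (y1 + y2) / 2,
            math.atan2(y2 - y1, x2 - x1), math.hypot(x2 - x1, y2 - y1))

@dataclass
class MicroCluster:
    n: int = 0                                                    # number of line segments
    linear_sum: list = field(default_factory=lambda: [0.0] * 4)   # sums of the features
    squared_sum: list = field(default_factory=lambda: [0.0] * 4)  # sums of their squares

    def add(self, segment):
        """Absorb one line segment by updating the sums."""
        for k, value in enumerate(segment_features(segment)):
            self.linear_sum[k] += value
            self.squared_sum[k] += value * value
        self.n += 1

    def merge(self, other):
        """The sums are additive, so merging two micro-clusters is element-wise."""
        self.n += other.n
        for k in range(4):
            self.linear_sum[k] += other.linear_sum[k]
            self.squared_sum[k] += other.squared_sum[k]

mc = MicroCluster()
mc.add(((0, 0), (2, 0)))
mc.add(((0, 2), (4, 2)))
print(mc)
```

Because only sums are kept, the original line segments never need to be stored, which is what makes the summary cheap to update.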
Distance between Micro-Clusters
Representative line segment of a micro-cluster
Distance between two micro-clusters can be defined as the distance between the representative line segments of the two micro-clusters
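A sketch of one way to derive a representative line segment, assuming it is built from the average center, average angle and average length of the micro-cluster. The function names and the `(n, ls_center, ls_angle, ls_length)` argument layout are hypothetical, and the segment distance is left as a parameter.

```python
import math

def representative_segment(n, ls_center, ls_angle, ls_length):
    """Build a segment from the averaged center, angle (radians) and length."""
    cx, cy = ls_center[0] / n, ls_center[1] / n
    theta = ls_angle / n
    half = (ls_length / n) / 2
    dx, dy = half * math.cos(theta), half * math.sin(theta)
    return (cx - dx, cy - dy), (cx + dx, cy + dy)

def micro_cluster_distance(mc_a, mc_b, segment_distance):
    """Distance between micro-clusters = distance between their representatives."""
    return segment_distance(representative_segment(*mc_a),
                            representative_segment(*mc_b))

# Two segments with centers (1, 0) and (2, 2), angle 0, lengths 2 and 4
rep = representative_segment(2, (3.0, 2.0), 0.0, 6.0)
print(rep)  # ((0.0, 1.0), (3.0, 1.0))
```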
Creating and Updating Micro-Clusters
When a new line segment is received:
Find the closest micro-cluster
If the distance between the new line segment and its closest micro-cluster is less than the threshold, add the new line segment into this micro-cluster and update the micro-cluster
If not, create a new micro-cluster that contains only this line segment
Its linear sums are the center, angle, and length of this line segment
Its squared sums are the squares of the center, angle, and length of this line segment
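The steps above can be sketched as follows. The names are hypothetical: `d_max` is borrowed from the parameter in the sensitivity experiment and assumed to be this threshold, and `center_distance` is a simple stand-in, not the distance used in TCMM.

```python
import math

def segment_features(segment):
    """Return (center_x, center_y, angle, length) of a directed line segment."""
    (x1, y1), (x2, y2) = segment
    return ((x1 + x2) / 2, (y1 + y2) / 2,
            math.atan2(y2 - y1, x2 - x1), math.hypot(x2 - x1, y2 - y1))

def center_distance(mc, features):
    """Stand-in distance (assumption): mean center of mc to the segment center."""
    return math.hypot(mc["ls"][0] / mc["n"] - features[0],
                      mc["ls"][1] / mc["n"] - features[1])

def insert_segment(micro_clusters, segment, d_max, distance=center_distance):
    """Absorb the segment into its closest micro-cluster, or open a new one."""
    f = segment_features(segment)
    if micro_clusters:
        closest = min(micro_clusters, key=lambda mc: distance(mc, f))
        if distance(closest, f) < d_max:
            closest["n"] += 1
            for k, value in enumerate(f):
                closest["ls"][k] += value
                closest["ss"][k] += value * value
            return closest
    # New micro-cluster: sums are the features of this single segment
    mc = {"n": 1, "ls": list(f), "ss": [value * value for value in f]}
    micro_clusters.append(mc)
    return mc

clusters = []
insert_segment(clusters, ((0, 0), (2, 0)), d_max=1.0)
insert_segment(clusters, ((0, 0.2), (2, 0.2)), d_max=1.0)    # absorbed
insert_segment(clusters, ((10, 10), (12, 10)), d_max=1.0)    # opens a new one
print(len(clusters), [mc["n"] for mc in clusters])
```

The linear scan for the closest micro-cluster is the cost that grows with the number of micro-clusters.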
Merging Micro-Clusters
Why merge micro-clusters?
If the number of micro-clusters is large, it is time-consuming to find the closest micro-cluster when a new line segment is received and to do macro-clustering over the micro-clusters
Memory might also not be enough to store all the micro-clusters
Merge close micro-clusters to save storage space and computation time
“Closeness” can be simply defined as the distance between two micro-clusters
However, this does not consider the “tightness” of a micro-cluster
Merging Micro-Clusters (cont.)
We prefer to merge loose micro-clusters rather than tight ones, to better preserve the “tightness” of micro-clusters.
More information is lost when merging two tight micro-clusters.
Merging Micro-Clusters (cont.)
Introducing the “extent” of a micro-cluster
Extent defines the tightness of a micro-cluster in terms of center, angle and length
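A sketch of how such an extent could be recovered from the stored sums alone, assuming it is the standard deviation of a feature (the slide does not give the formula). The function name is hypothetical, and it applies to one scalar feature at a time, such as angle, length or one center coordinate.

```python
import math

def extent(n, linear_sum, squared_sum):
    """Assumed extent: standard deviation computed as sqrt(SS/N - (LS/N)^2)."""
    mean = linear_sum / n
    variance = max(squared_sum / n - mean * mean, 0.0)  # guard against rounding
    return math.sqrt(variance)

# Lengths 2 and 4: linear sum 6, squared sum 20
print(extent(2, 6.0, 20.0))   # 1.0
# A single segment has no spread
print(extent(1, 5.0, 25.0))   # 0.0
```

A small extent means the line segments in the micro-cluster are nearly identical in that feature, i.e. the micro-cluster is tight.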
Merging Micro-Clusters (cont.)
Distance between micro-clusters with extent
Center distance, angle distance, length distance
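The formulas are not reproduced in this transcript. One plausible reading, shown purely as an assumption, is that the extents of both micro-clusters are subtracted from the raw distance and the result is clamped at zero, applied separately to the center, angle and length components. `distance_with_extent` is a hypothetical name.

```python
def distance_with_extent(raw_distance, extent_a, extent_b):
    """Assumed form: shrink the raw distance by both extents, never below zero."""
    return max(0.0, raw_distance - extent_a - extent_b)

# Same raw distance, different tightness
tight_pair = distance_with_extent(5.0, 0.1, 0.2)   # two tight micro-clusters
loose_pair = distance_with_extent(5.0, 1.0, 1.5)   # two loose micro-clusters
print(tight_pair, loose_pair)
```

Under this reading, loose micro-clusters appear closer than tight ones at the same raw distance, so they are merged first, which matches the stated preference.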
Micro-clustering summary
Macro-Clustering
Macro-clustering is invoked only when the user calls for it
Macro-clustering is performed on the representative line segments of the micro-clusters
Similar to the group step in the TRACLUS framework
Experiment
Real taxi data in San Francisco: 7,000+ trajectories in a week, 100,000 points in total
Experiment (cont.)
Effectiveness
SSQ (sum of squared distance) is the average, over all the line segments, of the squared distance to the centroid of their macro-cluster
TCMM reaches quality similar to TRACLUS
Experiment (cont.)
Efficiency
TCMM is much faster than TRACLUS
Experiment (cont.)
Sensitivity to the parameter d_max: when d_max is larger, the quality is lower but the efficiency is better
Conclusion
We address the problem of incrementally clustering trajectories.
We propose the TCMM (Trajectory Clustering based on Micro- and Macro-clustering) framework.
The definition of extent is proposed to better capture the “tightness” of micro-clusters.
Experiments show that TCMM achieves quality similar to TRACLUS but is much faster.
Future Work
Efficiency: use an index to find the closest micro-cluster. Not easy because our distance function is non-metric.
Parameter insensitivity: make our algorithm less sensitive to parameter values.
Temporal information: take temporal information into account during clustering.
Other applications: incrementally discover outliers and patterns.
Thank you!