Zhenhui Jessie Li 1
Mining Event Periodicity from Incomplete Observations
Zhenhui (Jessie) Li*, Jingjing Wang, Jiawei Han
University of Illinois at Urbana-Champaign
*Now at Penn State University
KDD 2012, Beijing, China
Prologue: Detect Periodicity in Movements [Li et al., KDD’10]
Problem: What is the periodicity of the movement?
Bee example: 8 hours in hive, 16 hours flying nearby
Prologue: Detect Periodicity in Movements [Li et al., KDD’10]
Observe the in-and-out movements from the reference spot (i.e., hive).
[Figure: binary signal (in hive / outside hive) plotted over time]
Two-Dimensional Movement → One-Dimensional Binary Sequence
Easy to see the periodicity.
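The reduction from two-dimensional movement to a binary sequence can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `to_binary_sequence`, the circular shape of the reference spot, and the `radius` parameter are assumptions introduced here.

```python
import math

def to_binary_sequence(track, spot, radius):
    """Map each (x, y) position to 1 (inside the reference spot) or 0 (outside).

    The reference spot is assumed to be a circle of the given radius.
    """
    sx, sy = spot
    return [1 if math.hypot(x - sx, y - sy) <= radius else 0 for x, y in track]

# Hypothetical positions sampled at consecutive timestamps.
track = [(0.0, 0.1), (0.2, 0.0), (3.0, 4.0), (5.0, 1.0), (0.1, 0.1)]
binary = to_binary_sequence(track, spot=(0.0, 0.0), radius=1.0)
print(binary)
```

Once the movement is expressed relative to one reference spot, periodicity detection only has to deal with a sequence of 0s and 1s.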
Challenge: Periodicity Detection for Incomplete Observations
• Two factors result in incomplete observations: inconsistent and low sampling rates
• Movement data collection in real scenarios:
  – Human movement data collected from cellphones: locations are reported only when making calls
  – Animal movement data: 2~3 locations in 3~5 days
2009-05-02 01:03 in
2009-05-03 11:30 out
2009-05-05 03:12 in
2009-05-09 12:03 in
2009-05-10 11:14 out
2009-05-11 02:15 in
…
[Figure: in hive / outside hive signal, Complete Observations vs. Incomplete Observations]
A Challenging Case of Detecting Periodicity for Incomplete Observations
Sparse Raw Data:
2009-05-02 01:03 in
2009-05-03 11:30 out
2009-05-05 03:12 in
2009-05-09 12:03 in
2009-05-10 11:14 out
2009-05-11 02:15 in
…
[Figure: the sparse observations (in, out, in, ...) placed on a timeline]
Any periodicity in the above sequence?
Mining Periodicity in Incomplete Data
• Event has a period of 20
• Occurrences of the event happen between 20k+5 and 20k+10
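To make this example concrete, the sketch below builds the complete sequence for an event with period 20 that occurs in [20k+5, 20k+10], then keeps a random subset of timestamps to mimic incomplete observations. The sequence length (200), the 10% keep rate, and the seed are assumptions chosen only for illustration.

```python
import random

PERIOD = 20  # period stated on the slide

def event_happens(t):
    """True when t falls in [20k+5, 20k+10] for some integer k."""
    return 5 <= t % PERIOD <= 10

rng = random.Random(0)  # fixed seed so the sketch is reproducible

# Complete observations: one (timestamp, label) pair per time unit.
complete = [(t, int(event_happens(t))) for t in range(200)]

# Incomplete observations: keep roughly 10% of the timestamps (assumed rate).
incomplete = [(t, x) for t, x in complete if rng.random() < 0.1]

print(len(complete), len(incomplete))
```

The labels in `incomplete` are still correct; what is lost is the regular spacing between observations, which is what makes the period hard to read off directly.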
A Probabilistic Model for Periodic Event
Example:
• Human daily periodicity of visiting the office
• Period: 24 (hours)
• Visits the office at 10-11am and 2-4pm
A Probabilistic Model for Periodic Event with Random Observation
[Figure: the periodic model generates observations at random timestamps, e.g. x(5)=1, x(62)=0]
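A model of this kind can be sketched as a length-T vector of probabilities plus a random choice of observation timestamps. The code below uses the office example (T = 24, visits at 10-11am and 2-4pm); the probability value 0.9, the 30-day span, the number of observations, and the names `p` and `observe` are assumptions made for illustration, not values from the talk.

```python
import random

T = 24  # daily period, in hours

# Hypothetical periodic distribution vector: p[i] is the probability that
# the event (being at the office) is positive at hour i of the day.
p = [0.0] * T
for hour in (10, 14, 15):  # 10-11am and 2-4pm
    p[hour] = 0.9          # assumed value

def observe(t, rng):
    """Draw x(t) as a Bernoulli sample with probability p[t mod T]."""
    return 1 if rng.random() < p[t % T] else 0

rng = random.Random(42)  # fixed seed, illustrative only

# Random observation: only 40 of the 720 hourly timestamps are observed.
timestamps = sorted(rng.sample(range(24 * 30), 40))
observations = [(t, observe(t, rng)) for t in timestamps]

print(observations[:5])
```

Each observation x(t) depends only on t mod T, so two timestamps that are a multiple of 24 apart share the same probability of being positive.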
Periodicity Detection by Overlaying Observations
• True period: the overlaid observations show a skewed distribution
• Wrong period: the overlaid observations show an even distribution
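Overlaying amounts to folding every timestamp onto its position within a candidate period and counting positives and negatives per position. The sketch below is an illustration under stated assumptions: the helper name `overlay` is hypothetical, and the test data reuses an event with period 20 occurring in [20k+5, 20k+10], observed only every 7th time unit so the output is reproducible.

```python
from collections import Counter

def overlay(observations, period):
    """Count positive and negative observations at each position t mod period."""
    pos, neg = Counter(), Counter()
    for t, x in observations:
        (pos if x == 1 else neg)[t % period] += 1
    return pos, neg

# Event with true period 20, observed at every 7th timestamp (assumed sampling).
observations = [(t, int(5 <= t % 20 <= 10)) for t in range(0, 1400, 7)]

pos20, _ = overlay(observations, 20)  # true period
pos19, _ = overlay(observations, 19)  # wrong period

print(sorted(pos20))  # positions that received positive observations
print(sorted(pos19))
```

Under the true period the positives pile up on a few positions, while under the wrong period they spread over many more positions.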
Relationship between Observation Ratio and Probabilistic Model
[Figure: Pos/Neg Ratio alongside the Periodic Distribution Vector]
Discrepancy Score to Measure Periodicity
If T (=24) is the correct period, the discrepancy score should be large for a certain set of timestamps.
If T (=23) is a wrong period, the discrepancy score is likely to be close to zero for any set of timestamps.
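One way to make this concrete is sketched below. It assumes the discrepancy of a set of positions is the fraction of positive observations falling in the set minus the fraction of negative observations falling in it, and that a period is scored by the best such set. The slides do not spell out the formula, so this definition, the function name `periodicity_score`, and the every-7-hours sampling are assumptions for illustration.

```python
from collections import Counter

OFFICE_HOURS = (10, 14, 15)  # 10-11am and 2-4pm, from the office example

def periodicity_score(observations, period):
    """Return (score, slots): the largest discrepancy and the positions achieving it.

    Assumed definition: discrepancy(I) = share of positives in I - share of negatives in I.
    """
    pos = Counter(t % period for t, x in observations if x == 1)
    neg = Counter(t % period for t, x in observations if x == 0)
    n_pos, n_neg = sum(pos.values()), sum(neg.values())
    if n_pos == 0 or n_neg == 0:
        return 0.0, set()
    # The best set keeps exactly the positions where positives are over-represented.
    slots = {i for i in range(period) if pos[i] / n_pos > neg[i] / n_neg}
    score = sum(pos[i] / n_pos - neg[i] / n_neg for i in slots)
    return score, slots

# Deterministic event with true period 24, observed only every 7 hours (assumed).
observations = [(t, int(t % 24 in OFFICE_HOURS)) for t in range(0, 7 * 552, 7)]

score24, slots24 = periodicity_score(observations, 24)
score23, slots23 = periodicity_score(observations, 23)
print(score24, sorted(slots24))
print(score23, sorted(slots23))
```

With T = 24 the positives and negatives occupy disjoint positions, so the score reaches its maximum; with T = 23 both are spread identically over the positions, so no set of positions yields a positive discrepancy.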
Periodicity Measure
Performance Comparisons
Sampling rate (ratio of observed points in the complete sequence)
Experiment on Real Human Data
One person’s visits to a specific location
Sampling rate: 20 min
Sampling rate: 1 hour
Problems with Using Fourier Transform to Detect Periodicity
T=4
T=16
Summary: Mining Event Periodicity from Incomplete Observations
• Motivation
  – Challenge of real data: incomplete observations (inconsistent + low sampling rate)
• Method
  – Overlay the segments and measure the “skewness” of the distribution
  – Theoretically prove the correctness of the method
• Application
  – Location prediction
  – 2nd place in Nokia Mobile Data Challenge 2012
  – Periodicity-based feature + SVM
Thanks! Questions?