A Topic Model for Traffic Speed Data Analysis
http://link.springer.com/chapter/10.1007%2F978-3-319-07467-2_8
![Page 2: A Topic Model for Traffic Speed Data Analysis](https://reader033.vdocument.in/reader033/viewer/2022052410/555dd16ad8b42aec698b5423/html5/thumbnails/2.jpg)
Real-Time Traffic Speed Data | NYC Open Data
https://data.cityofnewyork.us/Transportation/Real-Time-Traffic-Speed-Data/xsat-x5sa
Speed measurements at hundreds of sensors
(Regrettably, the data set no longer seems to be maintained.)
![Page 3: A Topic Model for Traffic Speed Data Analysis](https://reader033.vdocument.in/reader033/viewer/2022052410/555dd16ad8b42aec698b5423/html5/thumbnails/3.jpg)
Problem
• Traffic speed data show a periodicity with
a one-day period.
• However, speeds vary widely not only
between periods but also within a period.
• How can we analyze it?
![Page 4: A Topic Model for Traffic Speed Data Analysis](https://reader033.vdocument.in/reader033/viewer/2022052410/555dd16ad8b42aec698b5423/html5/thumbnails/4.jpg)
Solution
• We take intuition from topic models
in text mining.
![Page 5: A Topic Model for Traffic Speed Data Analysis](https://reader033.vdocument.in/reader033/viewer/2022052410/555dd16ad8b42aec698b5423/html5/thumbnails/5.jpg)
Topic models for documents
• We can assume that each document contains
multiple topics.
• That is, each document is modeled
– not as a single word probability distribution,
– but as a mixture of word probability distributions.
![Page 6: A Topic Model for Traffic Speed Data Analysis](https://reader033.vdocument.in/reader033/viewer/2022052410/555dd16ad8b42aec698b5423/html5/thumbnails/6.jpg)
Latent Dirichlet Allocation (LDA)
• LDA [Blei et al. 03]
topic <-> word probability distribution
document <-> mixing proportions of topics
• LDA models each document as follows:
![Page 7: A Topic Model for Traffic Speed Data Analysis](https://reader033.vdocument.in/reader033/viewer/2022052410/555dd16ad8b42aec698b5423/html5/thumbnails/7.jpg)
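[Figure: LDA generative process. Document j mixes topics t1, t2, t3 with proportions θj1, θj2, θj3. Each topic tk is a probability distribution (φk1, φk2, φk3, φk4) over the words v1, v2, v3, v4. The words of the document (here v3, v1, v3, v2, v2) are drawn from this mixture.]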
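The generative process in the figure can be sketched in a few lines of Python. This is only an illustration: the probabilities in `phi` and `theta_j` are made-up values, and the function name `sample_document` is hypothetical, not taken from the slides.

```python
import random

def sample_document(theta, phi, vocab, length, rng):
    """Draw `length` words: pick a topic from theta, then a word from that topic."""
    words = []
    for _ in range(length):
        # Choose a topic index k with probability theta[k].
        k = rng.choices(range(len(theta)), weights=theta)[0]
        # Choose a word with the per-topic word probabilities phi[k].
        words.append(rng.choices(vocab, weights=phi[k])[0])
    return words

vocab = ["v1", "v2", "v3", "v4"]
# Illustrative (assumed) per-topic word distributions; each row sums to 1.
phi = [
    [0.7, 0.1, 0.1, 0.1],  # topic t1
    [0.1, 0.7, 0.1, 0.1],  # topic t2
    [0.1, 0.1, 0.7, 0.1],  # topic t3
]
# Illustrative (assumed) mixing proportions of document j.
theta_j = [0.2, 0.3, 0.5]

doc = sample_document(theta_j, phi, vocab, 5, random.Random(0))
print(doc)
```

Because the topic is redrawn for every word, a single document can contain words from several topics, which is the "mixture of word probability distributions" described above.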
![Page 8: A Topic Model for Traffic Speed Data Analysis](https://reader033.vdocument.in/reader033/viewer/2022052410/555dd16ad8b42aec698b5423/html5/thumbnails/8.jpg)
An important difference
• Words are discrete entities.
– Therefore, LDA uses multinomial distributions for
modeling per-topic word distributions.
• Speeds (in mph) are continuous entities.
– We can’t use multinomial distributions.
![Page 9: A Topic Model for Traffic Speed Data Analysis](https://reader033.vdocument.in/reader033/viewer/2022052410/555dd16ad8b42aec698b5423/html5/thumbnails/9.jpg)
Gamma distribution
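For reference, the Gamma density can be evaluated as below. The slide text does not say which parameterization is used, so this sketch assumes the common shape/rate form, p(x) = rate^shape / Γ(shape) · x^(shape−1) · exp(−rate·x) for x > 0. The function name `gamma_pdf` and the example values are illustrative.

```python
import math

def gamma_pdf(x, shape, rate):
    """Gamma density in the shape/rate parameterization (an assumption here)."""
    if x <= 0:
        return 0.0  # the Gamma distribution has support on positive reals only
    # Work in log space to avoid overflow for large shape values.
    log_p = (shape * math.log(rate) - math.lgamma(shape)
             + (shape - 1.0) * math.log(x) - rate * x)
    return math.exp(log_p)

# Density of a 30 mph observation under a Gamma with mean shape / rate = 30.
print(gamma_pdf(30.0, shape=30.0, rate=1.0))
```

The support on positive reals is what makes the Gamma a natural candidate for speeds in mph, in contrast to a multinomial over a discrete vocabulary.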
![Page 10: A Topic Model for Traffic Speed Data Analysis](https://reader033.vdocument.in/reader033/viewer/2022052410/555dd16ad8b42aec698b5423/html5/thumbnails/10.jpg)
Comparing LDA with Patchy
• LDA <-> Patchy
– Word <-> Speed observation (in mph)
– Topic (multinomial) <-> Patch (Gamma)
– Document <-> Roll (from 0 AM to 12 PM)
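Under this correspondence, generating a roll might look like the sketch below: each speed observation first picks a patch according to the roll's mixing proportions, then draws a speed from that patch's Gamma distribution. The patch parameters, the shape/rate parameterization, and the name `sample_roll` are assumptions for illustration; priors and inference are omitted.

```python
import random

def sample_roll(theta, patches, n_obs, rng):
    """Draw n_obs speeds (mph) from a mixture of Gamma-distributed patches."""
    speeds = []
    for _ in range(n_obs):
        # Pick a patch with probability given by the roll's mixing proportions.
        k = rng.choices(range(len(theta)), weights=theta)[0]
        shape, rate = patches[k]
        # random.gammavariate takes (shape, scale), so convert rate to scale.
        speeds.append(rng.gammavariate(shape, 1.0 / rate))
    return speeds

# Illustrative (assumed) patches as (shape, rate) pairs; mean speed = shape / rate.
patches = [(20.0, 1.0), (90.0, 2.0)]   # roughly "congested" and "free-flowing"
theta_roll = [0.3, 0.7]                # assumed mixing proportions for one roll

roll = sample_roll(theta_roll, patches, 12, random.Random(1))
print([round(s, 1) for s in roll])
```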
![Page 11: A Topic Model for Traffic Speed Data Analysis](https://reader033.vdocument.in/reader033/viewer/2022052410/555dd16ad8b42aec698b5423/html5/thumbnails/11.jpg)
Full joint distribution of Patchy
• We estimate the parameters by variational
Bayesian inference.
![Page 12: A Topic Model for Traffic Speed Data Analysis](https://reader033.vdocument.in/reader033/viewer/2022052410/555dd16ad8b42aec698b5423/html5/thumbnails/12.jpg)
Variational Bayesian inference
• The posterior parameters are estimated
by maximizing the ELBO.
– ELBO = evidence lower bound, a lower bound on the log evidence
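In generic form, the bound being maximized can be written as follows. The notation is an assumption, not taken from the slides: x stands for the observed speeds, z for all latent variables, and q for the variational posterior. The Patchy-specific expression depends on the full joint distribution and is not reproduced here.

```latex
\log p(x)
  = \underbrace{\mathbb{E}_{q(z)}\!\left[\log p(x, z)\right]
      - \mathbb{E}_{q(z)}\!\left[\log q(z)\right]}_{\mathrm{ELBO}(q)}
  + \mathrm{KL}\!\left(q(z) \,\|\, p(z \mid x)\right)
  \;\ge\; \mathrm{ELBO}(q)
```

Since the KL term is nonnegative, raising the ELBO tightens the bound on the log evidence and pulls q toward the true posterior.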
![Page 13: A Topic Model for Traffic Speed Data Analysis](https://reader033.vdocument.in/reader033/viewer/2022052410/555dd16ad8b42aec698b5423/html5/thumbnails/13.jpg)
![Page 14: A Topic Model for Traffic Speed Data Analysis](https://reader033.vdocument.in/reader033/viewer/2022052410/555dd16ad8b42aec698b5423/html5/thumbnails/14.jpg)
Context dependency
Observations of the same mph
are assigned to different patches.
![Page 15: A Topic Model for Traffic Speed Data Analysis](https://reader033.vdocument.in/reader033/viewer/2022052410/555dd16ad8b42aec698b5423/html5/thumbnails/15.jpg)
Context dependency
• Context = mixing proportions of patches
– Which patch is dominant?
• Context-dependency
–Observations of the same speed can be
assigned to different patches depending on
their contexts.
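A small numeric sketch of this effect: with the patch distributions held fixed, the probability that a 45 mph observation belongs to each patch changes with the mixing proportions of its roll. The patch parameters, the shape/rate Gamma parameterization, and the simple Bayes-rule assignment are illustrative assumptions; Patchy itself estimates these quantities by variational inference.

```python
import math

def gamma_pdf(x, shape, rate):
    """Gamma density, shape/rate parameterization (assumed), for x > 0."""
    log_p = (shape * math.log(rate) - math.lgamma(shape)
             + (shape - 1.0) * math.log(x) - rate * x)
    return math.exp(log_p)

def responsibilities(speed, theta, patches):
    """Posterior probability of each patch for one observation (Bayes' rule)."""
    weights = [t * gamma_pdf(speed, s, r) for t, (s, r) in zip(theta, patches)]
    total = sum(weights)
    return [w / total for w in weights]

# Two overlapping hypothetical patches with means 40 mph and 50 mph.
patches = [(20.0, 0.5), (30.0, 0.6)]
speed = 45.0

# Same speed, two contexts: the first patch dominates in roll A, the second in roll B.
r_a = responsibilities(speed, [0.9, 0.1], patches)
r_b = responsibilities(speed, [0.1, 0.9], patches)
print(r_a, r_b)
```

The observation is identical in both calls; only the context (the mixing proportions) differs, and that alone flips which patch is the more probable assignment.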
![Page 16: A Topic Model for Traffic Speed Data Analysis](https://reader033.vdocument.in/reader033/viewer/2022052410/555dd16ad8b42aec698b5423/html5/thumbnails/16.jpg)
Context dependency
On May 27, this purple patch is
dominant.
On May 28, this yellow patch is
dominant.
![Page 17: A Topic Model for Traffic Speed Data Analysis](https://reader033.vdocument.in/reader033/viewer/2022052410/555dd16ad8b42aec698b5423/html5/thumbnails/17.jpg)
Evaluation
• Binary classification
–Weekdays / Weekends (Sat, Sun)
• Data
– Training: May 27 ~ June 16 (three weeks)
– Test: July 23 ~ August 5 (two weeks)
![Page 18: A Topic Model for Traffic Speed Data Analysis](https://reader033.vdocument.in/reader033/viewer/2022052410/555dd16ad8b42aec698b5423/html5/thumbnails/18.jpg)
Comparison
• Nearest neighbor
–Measure similarity by Euclidean distance
–Require timestamps
• Patchy
–Measure similarity by predictive probability
–Require no timestamps
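The difference in timestamp requirements can be illustrated as follows. Euclidean distance compares speeds position by position, so both vectors must be aligned to the same time slots. A likelihood-based score treats a roll as an unordered collection of speeds. The mixture log-likelihood below is only a stand-in for Patchy's predictive probability, whose exact form is not given in the slides; all parameter values and function names are assumptions.

```python
import math

def euclidean(a, b):
    """Distance between two speed vectors aligned slot by slot in time."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest_neighbor_label(query, train):
    """train: list of (speeds_by_time_slot, label) pairs."""
    return min(train, key=lambda item: euclidean(query, item[0]))[1]

def gamma_logpdf(x, shape, rate):
    """Log Gamma density, shape/rate parameterization (assumed), for x > 0."""
    return (shape * math.log(rate) - math.lgamma(shape)
            + (shape - 1.0) * math.log(x) - rate * x)

def mixture_loglik(speeds, theta, patches):
    """Log-likelihood of a bag of speeds under a mixture of Gamma patches."""
    return sum(
        math.log(sum(t * math.exp(gamma_logpdf(x, s, r))
                     for t, (s, r) in zip(theta, patches)))
        for x in speeds
    )

train = [([50.0, 30.0, 50.0, 30.0], "weekday"),
         ([55.0, 55.0, 55.0, 55.0], "weekend")]
query = [48.0, 32.0, 52.0, 29.0]
shuffled = list(reversed(query))  # same speeds, time order destroyed

print(nearest_neighbor_label(query, train))
# Reordering changes the Euclidean comparison but not the likelihood score.
print(euclidean(query, shuffled))
patches = [(30.0, 1.0), (100.0, 2.0)]  # hypothetical patches, means 30 and 50 mph
print(mixture_loglik(query, [0.5, 0.5], patches),
      mixture_loglik(shuffled, [0.5, 0.5], patches))
```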
![Page 19: A Topic Model for Traffic Speed Data Analysis](https://reader033.vdocument.in/reader033/viewer/2022052410/555dd16ad8b42aec698b5423/html5/thumbnails/19.jpg)
Classification results
Nearest neighbor
![Page 20: A Topic Model for Traffic Speed Data Analysis](https://reader033.vdocument.in/reader033/viewer/2022052410/555dd16ad8b42aec698b5423/html5/thumbnails/20.jpg)
Summary
• We proposed a topic model for traffic data analysis.
• Patchy can assign the observations of the same
traffic speed to different groups in a context-
dependent manner.
• Patchy achieved classification accuracy comparable
to nearest neighbor (NN) without using timestamps.
![Page 21: A Topic Model for Traffic Speed Data Analysis](https://reader033.vdocument.in/reader033/viewer/2022052410/555dd16ad8b42aec698b5423/html5/thumbnails/21.jpg)
Future work
• Model timestamps