![Page 1: Geographical Topic Discovery and Comparison](https://reader031.vdocument.in/reader031/viewer/2022021209/62063e308c2f7b173005c8c5/html5/thumbnails/1.jpg)
Geographical Topic Discovery and Comparison
Zhijun Yin, Liangliang Cao, Jiawei Han,
Chengxiang Zhai, Thomas Huang
UIUC
To appear in WWW’11
Presenter: Jeff Huang
![Page 2: Geographical Topic Discovery and Comparison](https://reader031.vdocument.in/reader031/viewer/2022021209/62063e308c2f7b173005c8c5/html5/thumbnails/2.jpg)
Outline
• Motivation
• Problem Formulation
• Solution Sketch
• Experiments
• Q/A
3/21/2011 2
![Page 3: Geographical Topic Discovery and Comparison](https://reader031.vdocument.in/reader031/viewer/2022021209/62063e308c2f7b173005c8c5/html5/thumbnails/3.jpg)
Motivation
• GPS records are popular on the Web
o Advanced cameras with GPS receivers can record the GPS location where each photo is taken.
o Applications such as Google Earth and Flickr provide interfaces for users to specify a location on the world map.
o People can record their locations with the GPS functions of their smartphones.
![Page 4: Geographical Topic Discovery and Comparison](https://reader031.vdocument.in/reader031/viewer/2022021209/62063e308c2f7b173005c8c5/html5/thumbnails/4.jpg)
Motivation (Cont.)
• Examples of GPS-associated documents
o Flickr: geo-tagged photos
o Twitter: tweets from iPhone
![Page 5: Geographical Topic Discovery and Comparison](https://reader031.vdocument.in/reader031/viewer/2022021209/62063e308c2f7b173005c8c5/html5/thumbnails/5.jpg)
Motivation (Cont.)
• What can we do?
o By analyzing the geographical distribution of food and festivals, we can compare cultural differences around the world.
o We can explore the hot topics about presidential election candidates in different places.
o We can compare the popularity of specific products in different regions to inform marketing strategy.
![Page 6: Geographical Topic Discovery and Comparison](https://reader031.vdocument.in/reader031/viewer/2022021209/62063e308c2f7b173005c8c5/html5/thumbnails/6.jpg)
Motivation (Cont.)
• Discovering different topics of interest that are coherent in geographical regions.
• Comparing several topics across different
geographical locations.
• Geographical topic discovery and comparison
![Page 7: Geographical Topic Discovery and Comparison](https://reader031.vdocument.in/reader031/viewer/2022021209/62063e308c2f7b173005c8c5/html5/thumbnails/7.jpg)
Problem Formulation
• A GPS-associated document is a text document associated with a GPS location.
• A geographical topic is a spatially coherent theme. In other words, words that often occur close together in space are clustered into one topic.
• An example of geographical topics
o Given a collection of geo-tagged Flickr photos related to festivals, with tags and locations, the desired geographical topics are the festivals in different areas, such as the Cherry Blossom Festival in Washington DC and the South by Southwest Festival in Austin.
![Page 8: Geographical Topic Discovery and Comparison](https://reader031.vdocument.in/reader031/viewer/2022021209/62063e308c2f7b173005c8c5/html5/thumbnails/8.jpg)
Problem Formulation (Cont.)
• Given a collection of GPS-associated documents
o Discover the geographical topics
o Compare the topics in different geographical locations
![Page 9: Geographical Topic Discovery and Comparison](https://reader031.vdocument.in/reader031/viewer/2022021209/62063e308c2f7b173005c8c5/html5/thumbnails/9.jpg)
Problem Formulation (Cont.)
• An example of geographical topic discovery and comparison
o Given a collection of geo-tagged Flickr photos related to food, with tags and locations, we would like to discover the geographical topics, i.e., what people eat in different areas. After we discover the food preferences, we would like to compare the food preference distributions in different geographical locations.
![Page 10: Geographical Topic Discovery and Comparison](https://reader031.vdocument.in/reader031/viewer/2022021209/62063e308c2f7b173005c8c5/html5/thumbnails/10.jpg)
Problem Formulation (Cont.)
• The topic distribution at a geographical location is the distribution of topics given that location.
o Formally, p(z|l) is the probability of topic z given location l = (x, y), where x is longitude and y is latitude.
![Page 11: Geographical Topic Discovery and Comparison](https://reader031.vdocument.in/reader031/viewer/2022021209/62063e308c2f7b173005c8c5/html5/thumbnails/11.jpg)
Geographical Topic Discovery and Comparison
• Given a collection of GPS-associated documents D and the number of topics K, we would like to discover K geographical topics, i.e., $\{\theta_z\}_{z \in Z}$, where Z is the topic set and a geographical topic z is represented by a word distribution $\theta_z = \{p(w|z)\}_{w \in V}$ s.t. $\sum_{w \in V} p(w|z) = 1$.
• Along with the discovered geographical topics, we would also like to know the topic distribution in different geographical locations for topic comparison, i.e., p(z|l) for all $z \in Z$ at location l.
![Page 12: Geographical Topic Discovery and Comparison](https://reader031.vdocument.in/reader031/viewer/2022021209/62063e308c2f7b173005c8c5/html5/thumbnails/12.jpg)
Solution
• Location-Driven Model (LDM)
• Text-Driven Model (TDM)
• Location-Text Joint Model: Latent Geographical Topic Analysis (LGTA)
![Page 13: Geographical Topic Discovery and Comparison](https://reader031.vdocument.in/reader031/viewer/2022021209/62063e308c2f7b173005c8c5/html5/thumbnails/13.jpg)
Location-Driven Model (LDM)
• LDM
o Cluster the documents by location
o Treat each location cluster as one topic
o Generate a topic description for each cluster
• Disadvantages
o The text does not guide the clustering.
o The data may have no spatial cluster pattern: a geographical topic may come from several different areas that are not close to each other.
• In the landscape dataset, mountains exist in different areas that are far apart.
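A minimal sketch of the LDM idea, assuming plain k-means over (longitude, latitude) pairs and a most-frequent-tags description per cluster. The function names, the toy documents, and the use of Euclidean distance on raw coordinates are illustrative assumptions; the slides do not say which clustering method or description scheme was used.

```python
from collections import Counter
from math import hypot

def kmeans(points, k, iters=20):
    """Plain k-means on (x, y) pairs; centers start at the first k points."""
    centers = list(points[:k])
    assign = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest center in Euclidean distance (a simplification
        # for lon/lat data, acceptable only for a small illustration).
        assign = [
            min(range(k), key=lambda c: hypot(p[0] - centers[c][0], p[1] - centers[c][1]))
            for p in points
        ]
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                centers[c] = (
                    sum(m[0] for m in members) / len(members),
                    sum(m[1] for m in members) / len(members),
                )
    return assign

def describe_clusters(docs, k, top_n=3):
    """docs: list of ((lon, lat), [tags]). Returns the top tags per location cluster."""
    assign = kmeans([loc for loc, _ in docs], k)
    counts = [Counter() for _ in range(k)]
    for (_, tags), a in zip(docs, assign):
        counts[a].update(tags)
    return [[w for w, _ in c.most_common(top_n)] for c in counts]

# Hypothetical geo-tagged photos: two near Austin, two near Washington DC.
docs = [
    ((-97.74, 30.27), ["sxsw", "music", "austin"]),
    ((-77.04, 38.90), ["cherryblossom", "dc"]),
    ((-97.75, 30.26), ["sxsw", "austin"]),
    ((-77.03, 38.89), ["cherryblossom", "festival"]),
]
topics = describe_clusters(docs, k=2)
```

Because the tags play no part in the assignment step, a theme such as "mountain" that occurs in several distant areas would be split across clusters, which is the disadvantage listed above.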
![Page 14: Geographical Topic Discovery and Comparison](https://reader031.vdocument.in/reader031/viewer/2022021209/62063e308c2f7b173005c8c5/html5/thumbnails/14.jpg)
Text-Driven Model (TDM)
• Discover the geographical topics using topic modeling
o Topic modeling with network regularization [Mei et al. WWW’08]
o Regularization based on the closeness in location between documents
• Disadvantages
o How should the document closeness w(u, v) be defined?
o How can the topic distribution of locations p(z|l) be obtained?
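To make the first open question concrete, here is one hypothetical choice of w(u, v): a Gaussian kernel on the great-circle distance between two documents. The kernel form and the `bandwidth_km` parameter are assumptions for illustration, not the definition used in the talk. The fact that such a bandwidth has to be picked by hand is part of the difficulty.

```python
from math import asin, cos, exp, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0

def haversine_km(a, b):
    """Great-circle distance in km between two (lon, lat) points given in degrees."""
    lon1, lat1, lon2, lat2 = map(radians, (a[0], a[1], b[0], b[1]))
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(h))

def closeness(u, v, bandwidth_km=50.0):
    """Hypothetical edge weight w(u, v): Gaussian kernel on distance, in [0, 1]."""
    d = haversine_km(u, v)
    return exp(-(d * d) / (2 * bandwidth_km ** 2))

austin = (-97.74, 30.27)
austin_nearby = (-97.75, 30.26)
washington_dc = (-77.04, 38.90)
w_near = closeness(austin, austin_nearby)
w_far = closeness(austin, washington_dc)
```

With a 50 km bandwidth the two Austin points get a weight close to 1 and the Austin/DC pair a weight close to 0. A different bandwidth would change which documents regularize each other.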
![Page 15: Geographical Topic Discovery and Comparison](https://reader031.vdocument.in/reader031/viewer/2022021209/62063e308c2f7b173005c8c5/html5/thumbnails/15.jpg)
LOCATION-TEXT JOINT MODEL
• Main insight: construct a model that encodes the spatial structure of words
o Words that are close in space are likely to be clustered into the same geographical topic.
• Assume there is a set of regions. The topics are generated from regions instead of documents.
o If two words are close to each other in space, they are more likely to belong to the same region.
o If two words are from the same region, they are more likely to be clustered into the same topic.
![Page 16: Geographical Topic Discovery and Comparison](https://reader031.vdocument.in/reader031/viewer/2022021209/62063e308c2f7b173005c8c5/html5/thumbnails/16.jpg)
Latent Geographical Topic Analysis (LGTA)
[Figure: LGTA model, with labels: region importance; p(z|d); p(w|z); location shape]
• Combines text and location information
• Adapts the region discovery process to the dataset
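One way to see how a region-based model can yield the location-level topic distribution p(z|l) is the following Bayes-rule sketch. It assumes that each region r has an importance p(r), a location density p(l|r), and a topic distribution p(z|r), and that topic and location are conditionally independent given the region. These symbols are assumptions for illustration; the slide text does not spell them out.

```latex
p(z \mid l)
  = \sum_{r} p(z \mid r)\, p(r \mid l)
  = \frac{\sum_{r} p(z \mid r)\, p(l \mid r)\, p(r)}
         {\sum_{r'} p(l \mid r')\, p(r')}
```

Under these assumptions, regions whose density is high at l dominate the mixture, so nearby locations receive similar topic distributions.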
![Page 17: Geographical Topic Discovery and Comparison](https://reader031.vdocument.in/reader031/viewer/2022021209/62063e308c2f7b173005c8c5/html5/thumbnails/17.jpg)
Parameter Estimation
• EM algorithm
• Iterations:
o Geo-clustering (region discovery) is based on both location and topic information.
o Topic modeling is based on the text and region information.
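The geo-clustering step can be sketched as a posterior over regions for one document, combining a location term and a text term. This is a simplified illustration under stated assumptions: each region is a 2-D Gaussian with diagonal covariance, p(w|r) is a mixture of topic word distributions, and all probabilities are already smoothed to be non-zero. The function names and dictionary keys are hypothetical, and the talk's actual update equations are not given in the slides.

```python
from math import exp, log, pi

def log_gaussian_diag(loc, mean, var):
    """Log density of a 2-D Gaussian with diagonal covariance (an assumption)."""
    return sum(
        -0.5 * (log(2 * pi * v) + (x - m) ** 2 / v)
        for x, m, v in zip(loc, mean, var)
    )

def region_posterior(doc, regions, p_w_given_z):
    """E-step sketch: p(r | d) proportional to p(r) * p(l_d | r) * p(w_d | r)."""
    loc, word_counts = doc
    log_scores = []
    for r in regions:
        # Location information: region importance times Gaussian density at l_d.
        score = log(r["weight"]) + log_gaussian_diag(loc, r["mean"], r["var"])
        # Topic information: p(w | r) mixes topic word distributions by p(z | r).
        for w, c in word_counts.items():
            p_w = sum(pz * p_w_given_z[z][w] for z, pz in enumerate(r["p_z"]))
            score += c * log(p_w)
        log_scores.append(score)
    # Normalize in log space for numerical stability.
    top = max(log_scores)
    unnorm = [exp(s - top) for s in log_scores]
    total = sum(unnorm)
    return [u / total for u in unnorm]

# Hypothetical model state: two topics, two regions (near Austin and near DC).
p_w_given_z = [{"sxsw": 0.8, "cherry": 0.2}, {"sxsw": 0.2, "cherry": 0.8}]
regions = [
    {"weight": 0.5, "mean": (-97.7, 30.3), "var": (1.0, 1.0), "p_z": [0.9, 0.1]},
    {"weight": 0.5, "mean": (-77.0, 38.9), "var": (1.0, 1.0), "p_z": [0.1, 0.9]},
]
doc = ((-97.74, 30.27), {"sxsw": 2})
post = region_posterior(doc, regions, p_w_given_z)
```

An M-step would then re-estimate the region parameters and the topic distributions from these posteriors, which is how location and text information inform each other across iterations.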
![Page 18: Geographical Topic Discovery and Comparison](https://reader031.vdocument.in/reader031/viewer/2022021209/62063e308c2f7b173005c8c5/html5/thumbnails/18.jpg)
Data Set
• Flickr images with GPS locations
o The Flickr API supports search criteria including tag, time, GPS range, etc.
![Page 19: Geographical Topic Discovery and Comparison](https://reader031.vdocument.in/reader031/viewer/2022021209/62063e308c2f7b173005c8c5/html5/thumbnails/19.jpg)
Compared Methods
• LDM: Location-driven model
• TDM: Text-driven model
• GeoFolk [Sizov WSDM’10]:
o A topic modeling method that uses both text and spatial information
o Models each region as an isolated topic
o Assumes the geographical distribution of each topic is Gaussian
• LGTA: Latent Geographical Topic Analysis
![Page 20: Geographical Topic Discovery and Comparison](https://reader031.vdocument.in/reader031/viewer/2022021209/62063e308c2f7b173005c8c5/html5/thumbnails/20.jpg)
Topic Discovery Comparison
• Festival dataset
o Topics related to the South by Southwest Festival
![Page 21: Geographical Topic Discovery and Comparison](https://reader031.vdocument.in/reader031/viewer/2022021209/62063e308c2f7b173005c8c5/html5/thumbnails/21.jpg)
Topic Discovery Comparison
• Activity dataset
![Page 22: Geographical Topic Discovery and Comparison](https://reader031.vdocument.in/reader031/viewer/2022021209/62063e308c2f7b173005c8c5/html5/thumbnails/22.jpg)
Topic Discovery Comparison
• Landscape dataset
[Figure: comparison of the coast, desert, and mountain topics across LDM, TDM, GeoFolk, and LGTA]
![Page 23: Geographical Topic Discovery and Comparison](https://reader031.vdocument.in/reader031/viewer/2022021209/62063e308c2f7b173005c8c5/html5/thumbnails/23.jpg)
Topic Quality Qualitative Comparison
• Average distance between the word distributions of all pairs of topics, measured by KL-divergence
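A minimal sketch of the average pairwise KL-divergence measure, assuming each topic is a dictionary of word probabilities over a shared vocabulary and that the average runs over ordered pairs of distinct topics. Both are assumptions: the slide does not state the direction or any symmetrization. Probabilities are assumed smoothed so that no word has zero mass in the second argument.

```python
from itertools import permutations
from math import log

def kl_divergence(p, q):
    """KL(p || q) for two word distributions over the same vocabulary."""
    return sum(pw * log(pw / q[w]) for w, pw in p.items() if pw > 0)

def average_pairwise_kl(topics):
    """Mean KL divergence over all ordered pairs of distinct topics."""
    pairs = list(permutations(topics, 2))
    return sum(kl_divergence(p, q) for p, q in pairs) / len(pairs)

# Hypothetical two-word vocabulary and two topics.
topics = [{"a": 0.5, "b": 0.5}, {"a": 0.9, "b": 0.1}]
avg_kl = average_pairwise_kl(topics)
```

A larger average means the discovered topics are more distinct from one another.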
![Page 24: Geographical Topic Discovery and Comparison](https://reader031.vdocument.in/reader031/viewer/2022021209/62063e308c2f7b173005c8c5/html5/thumbnails/24.jpg)
Topic Quality Qualitative Comparison
• Text Perplexity
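Text perplexity is commonly defined as the exponential of the negative average log-likelihood per word, so lower values indicate a better fit. The sketch below assumes that definition and a hypothetical callback `word_prob(i, w)` that returns the model's p(w | document i). The slide does not give the formula used in the experiments.

```python
from math import exp, log

def text_perplexity(docs, word_prob):
    """docs: list of {word: count}; word_prob(i, w) returns p(w | document i)."""
    log_lik = 0.0
    n_words = 0
    for i, counts in enumerate(docs):
        for w, c in counts.items():
            log_lik += c * log(word_prob(i, w))
            n_words += c
    return exp(-log_lik / n_words)

# A model that spreads mass uniformly over 4 words has perplexity 4.
uniform_perplexity = text_perplexity([{"a": 2, "b": 1}], lambda i, w: 0.25)
```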
![Page 25: Geographical Topic Discovery and Comparison](https://reader031.vdocument.in/reader031/viewer/2022021209/62063e308c2f7b173005c8c5/html5/thumbnails/25.jpg)
Topic Quality Qualitative Comparison
• Location/Text Perplexity
![Page 26: Geographical Topic Discovery and Comparison](https://reader031.vdocument.in/reader031/viewer/2022021209/62063e308c2f7b173005c8c5/html5/thumbnails/26.jpg)
Geographical Topic Comparison
![Page 27: Geographical Topic Discovery and Comparison](https://reader031.vdocument.in/reader031/viewer/2022021209/62063e308c2f7b173005c8c5/html5/thumbnails/27.jpg)
• Complicated model and parameter estimation
• How to set the number of regions and the number
of topics?
• What about estimating geographical locations for images that have no geo information?
o Generating representative photos for landmarks