The 8th International Conference on Web Information Systems Engineering (WISE2007)
Copyright (C) 2007 DENSO IT LABORATORY, INC. All Rights Reserved.
BEIRA: A geo-semantic clustering method for area summary
Osamu Masutani, Hirotoshi Iwasaki
Denso IT Laboratory, Inc.
Summary
Background
Concept
System architecture
Evaluation
Conclusions & future works
Background – Map service
Target
- Car navigation or PND (Personal Navigation Device)
- GPS mobile phone
- Web-based map service
Major functionalities of a map service
- View maps around the current position
- Search for a route to a destination
- Search for favorite POIs (Points of Interest)
A scenario: A visitor to Nancy
No previous knowledge about Nancy.
- Japanese
- A little interest in art
He has some free time.
- No plan.
- He can’t speak French.
- He has a GPS mobile phone.
The only available information is from a mobile map service.
- He’d like to search for POIs using the service.
- What is the problem?
Use cases: Searching POIs on mobile
3 ways to search
Location-based search
- Nearby area
Category-based search
- “Restaurant” / “Italian” / …
- “Public” / “Library” / …
Keyword-based search
- “chocolate cake”, “soccer”, “beautiful”, “calm”, …
Problem in location based search
Filtering by the specified area
Sometimes results are numerous
- In a central urban area
- When a broad area is chosen
Selection is very hard
- The UI is limited (especially on mobile).
Problem in category based search
Filtering by a specific category
Sometimes results are numerous
- When the user doesn’t specify a detailed category
Information awareness
- Once the user has chosen the “Museum” category, he can’t find “Place Stanislas”.
Figure labels: museum, park, Place Stanislas
Problem of keyword based search
Filtering by keyword match
Information awareness
- The user is required to know the keyword in advance
- “Art Nouveau” is a good keyword for finding Nancy’s features.
- But if the user mistakes the keyword for “Art Deco”, the results will be poor
Figure labels: Art Nouveau, Place Stanislas
Problems
Information overload
- Numerous candidates
- Millions of POIs in a mobile phone service
Information awareness
- Both fixed-category and free-keyword search have a similar problem.
Solution
- Reduce the candidates
- But keep information awareness
- Clustering and summarization of information
Figure labels: museum, park
Clustering and summarization
Similar concept
- Web search engine “Vivisimo”
- Displays clusters of search results and their topics
- Dynamic categories
Easy to choose but comprehensive
- The number of candidates is reduced, but the view remains comprehensive
Is Vivisimo enough?
It provides only a semantic (topic) view.
- With a map service
- Switching between semantic and geographic views would be complicated
Can these two views be combined?
- Use only the map view
- Cluster = area
BEIRA: Bird’s Eye Information Retrieval Application
Topic-based IR through a geographic view.
- Uses AOI (Area of Interest) instead of POI
- An AOI consists of an area (cluster) and its summary (a word list)
Figure labels: Art Nouveau, Area, Summary = word list
System architecture
POI database
- Address of POI
- Text of POI (guide text, reputation text, etc.)
Preprocessing
- Geocoding and topic vector generation.
Geo-semantic clustering and summarization
Display AOI
Diagram: POI database (POI ID, Address, Text, etc.) -> Geographic preprocessing (Latitude, Longitude) and Semantic preprocessing (Topic vector) -> Geo-semantic clustering -> Geo-semantic summarization -> AOI (AOI ID, Area polygon, Summary)
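The records named in the architecture diagram can be sketched as plain data structures. This is only an illustration under assumptions: the field names and types below are hypothetical, chosen to mirror the diagram labels (POI ID, address, text, latitude, longitude, topic vector, AOI ID, area polygon, summary), and the sample values are made up.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class POI:
    """One row of the POI database plus the values added by preprocessing."""
    poi_id: str
    address: str
    text: str                                   # guide text, reputation text, etc.
    latitude: Optional[float] = None            # set by geographic preprocessing (geocoding)
    longitude: Optional[float] = None
    topic_vector: List[float] = field(default_factory=list)  # set by semantic preprocessing


@dataclass
class AOI:
    """One output area: a polygon and its word-list summary."""
    aoi_id: str
    area_polygon: List[Tuple[float, float]]     # (latitude, longitude) vertices
    summary: List[str]


# Hypothetical values, for illustration only.
poi = POI(poi_id="p1", address="(address text)", text="(guide text)")
poi.latitude, poi.longitude = 48.69, 6.18       # what a geocoder might return
poi.topic_vector = [0.7, 0.1, 0.2]
aoi = AOI(aoi_id="a1",
          area_polygon=[(48.69, 6.18), (48.70, 6.18), (48.70, 6.19)],
          summary=["Art Nouveau"])
```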
Implementation
Combinations of GIS and Text mining tools
Geo-semantic clustering
Geographic clustering doesn’t reflect area topics: circular areas
Semantic clustering doesn’t consider the geographic view: scattered areas
Geo-semantic clustering solves these problems
Figure panels: Semantic clustering, G/S clustering, Geographic clustering
Geo-semantic clustering
Co-clustering with geographic and semantic features
- Geographic features: latitude, longitude
- Semantic features: high-dimensional matrix (latent semantic indexing)
G/S ratio R: the combination ratio
- R = Geographic bias / Semantic bias
Feature table (geographic columns are multiplied by R, semantic columns by 1):
POI ID    Latitude (*R)    Longitude (*R)    LSI1 (*1)    LSI2 (*1)    LSI3 (*1)
・・・       ・・・             ・・・              ・・・         ・・・         ・・・
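The weighting by R can be illustrated with a small sketch: geographic columns are multiplied by R, semantic columns by 1, and the combined rows are clustered. The clustering step here is a plain k-means with a deterministic farthest-point start, which is an assumption made for illustration; the slides do not describe the clustering algorithm in detail. The function names and the four toy POIs are hypothetical.

```python
import math


def combine_features(geo_rows, sem_rows, ratio):
    """Scale geographic columns by `ratio` (R) and semantic columns by 1."""
    return [[ratio * g for g in geo] + list(sem)
            for geo, sem in zip(geo_rows, sem_rows)]


def kmeans(rows, k, iterations=20):
    """Plain k-means; centers start at rows[0] and then the farthest rows."""
    centers = [list(rows[0])]
    while len(centers) < k:
        farthest = max(rows, key=lambda r: min(math.dist(r, c) for c in centers))
        centers.append(list(farthest))
    labels = [0] * len(rows)
    for _ in range(iterations):
        # Assign every row to its nearest center.
        labels = [min(range(k), key=lambda c: math.dist(row, centers[c]))
                  for row in rows]
        # Move each center to the mean of its members.
        for c in range(k):
            members = [row for row, lab in zip(rows, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Toy data: POIs 0,1 are close on the map, as are 2,3;
# POIs 0,2 share a topic, as do 1,3.
geo = [(0.0, 0.0), (0.0, 0.1), (5.0, 0.0), (5.0, 0.1)]
sem = [(1.0, 0.0), (0.0, 1.0), (1.0, 0.0), (0.0, 1.0)]

geographic_labels = kmeans(combine_features(geo, sem, 1.0e6), k=2)   # [0, 0, 1, 1]
semantic_labels = kmeans(combine_features(geo, sem, 1.0e-4), k=2)    # [0, 1, 0, 1]
```

With a very large R the clusters follow map position, and with a very small R they follow topic, which matches the two extremes of the G/S ratio.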
Evaluation: geo-semantic clustering
Dataset: Cafes in Shibuya
- Text contents: restaurant evaluation web site “asku.com”
- 272 cafes in the region (Shibuya ward).
Correct cluster data
- Generated manually
- 13 clusters in the region
- Evaluation measure: F measure
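The slide names the F measure but not the exact variant. As one possible reading, the sketch below computes a pairwise F measure: a pair of POIs counts as correct when both the produced clustering and the manual clustering put the two POIs in the same cluster. The pairwise definition, the function name, and the toy labels are assumptions for illustration.

```python
from itertools import combinations


def pairwise_f_measure(predicted, reference):
    """F measure over item pairs that share a cluster in each labeling."""
    pairs = list(combinations(range(len(predicted)), 2))
    same_pred = {(i, j) for i, j in pairs if predicted[i] == predicted[j]}
    same_ref = {(i, j) for i, j in pairs if reference[i] == reference[j]}
    true_pos = len(same_pred & same_ref)
    if true_pos == 0:
        return 0.0
    precision = true_pos / len(same_pred)
    recall = true_pos / len(same_ref)
    return 2 * precision * recall / (precision + recall)


manual = [0, 0, 1, 1]                                 # hypothetical correct clusters
perfect = pairwise_f_measure([0, 0, 1, 1], manual)    # 1.0
partial = pairwise_f_measure([0, 0, 0, 1], manual)    # precision 1/3, recall 1/2
crossed = pairwise_f_measure([0, 1, 0, 1], manual)    # 0.0
```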
Results of clustering
Geo-semantic clustering produces non-circular areas that follow their topics.
Figure panels: Semantic (R=1.0E-04), Geo-semantic (R=1.0E-02), Geographic (R=1.0E+06)
Evaluation of clustering
We confirmed that geo-semantic clustering performs better than either semantic-only or geographic-only clustering
- An intermediate ratio (R = 0.01) is optimal.
Figure: vertical axis from 0 to 0.6; horizontal axis R from 1.0E-04 (semantic) to 1.0E+06 (geographic); series: MLSA, Tensor-Kmeans
Area summarization
Document summarization
Term weighting: e.g. TF/IDF
- A term that occurs many times in a document is important (TF: term frequency)
- A term that is rare across the entire document set is important (IDF: inverse document frequency)
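The two rules above can be sketched in a few lines. This uses one common TF/IDF variant (raw term count times the natural log of N divided by document frequency); the slides do not state which variant was used, so that choice, the function name, and the toy documents are assumptions.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: weight} dict per document (each a list of terms)."""
    n_docs = len(documents)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for doc in documents for term in set(doc))
    idf = {term: math.log(n_docs / df) for term, df in doc_freq.items()}
    return [{term: count * idf[term] for term, count in Counter(doc).items()}
            for doc in documents]


# Hypothetical review texts, already split into terms.
docs = [
    ["cafe", "wedding", "wedding"],
    ["cafe", "onion"],
    ["cafe", "terrace"],
]
weights = tf_idf(docs)
# "cafe" appears in every document, so its IDF and weight are 0.
# "wedding" occurs twice in one document, so it outweighs "onion".
```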
Problem of IDF
Simple IDF cannot extract regionally characteristic words
- According to IDF, “onion” and “wedding” have nearly the same weight
- “wedding” should be regarded as more important because the areas where weddings are held are geographically biased.
         Normal term “onion”    Place name “Dogenzaka”    Area term “Wedding”
IDF      3.08                   3.51                      3.04
K        4.41                   54.0                      9.93
Location aware IDF
The geographic distribution of a word
- Term occurrence in geographic space
More concentrated terms are regarded as more important
- Measurement: K-value (a point distribution analysis method)
Weight: IDF * K
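As a worked example, the IDF and K values quoted on the slides can be multiplied directly. The sketch also includes a simplified concentration statistic in the style of Ripley's K function without edge correction; whether the slides' K-value is computed this way is an assumption, and `ripley_k`, its parameters, and the sample points are hypothetical.

```python
import math


def ripley_k(points, radius, area):
    """Simplified Ripley-style K: area * (ordered pairs within radius) / n^2."""
    n = len(points)
    if n < 2:
        return 0.0
    close_pairs = sum(1 for i, p in enumerate(points)
                      for j, q in enumerate(points)
                      if i != j and math.dist(p, q) <= radius)
    return area * close_pairs / (n * n)


# Occurrences of a term packed into one spot score higher than scattered ones.
clustered = ripley_k([(0.0, 0.0), (0.1, 0.0), (0.0, 0.1)], radius=0.5, area=100.0)
scattered = ripley_k([(0.0, 0.0), (5.0, 5.0), (9.0, 0.0)], radius=0.5, area=100.0)

# (IDF, K) pairs as quoted on the slides.
slide_values = {
    "onion": (3.08, 4.41),        # normal term
    "Dogenzaka": (3.51, 54.0),    # place name
    "Wedding": (3.04, 9.93),      # area term
}
location_aware = {term: idf * k for term, (idf, k) in slide_values.items()}
ranking = sorted(location_aware, key=location_aware.get, reverse=True)
```

The products are about 13.58 for “onion”, 189.54 for “Dogenzaka”, and 30.19 for “Wedding”, so IDF * K separates “Wedding” from “onion” even though their plain IDF values are almost equal.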
Evaluation of location aware IDF
Evaluation measure: extraction rate of location names
- Area-characteristic terms have a distribution similar to that of location names
Evaluation of location aware IDF
Evaluation data
- All words in the Shibuya area.
- Top 1,000 weighted terms
Location aware IDF (IDF*K) extracts location names more efficiently than conventional weightings
Figure: density of location names [%] (axis 0 to 30) versus rank (1 to 900); series: IDF, K, IDF*K
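One way to read the evaluation measure is as the share of location names among the top-ranked terms of a weighting. The sketch below computes that share as a percentage. The exact definition behind the chart is not given on the slide, so the cumulative top-N reading, the function name, and the tiny ranking are assumptions.

```python
def location_name_density(ranked_terms, location_names, top_n):
    """Percentage of the top_n ranked terms that are known location names."""
    top = ranked_terms[:top_n]
    if not top:
        return 0.0
    hits = sum(1 for term in top if term in location_names)
    return 100.0 * hits / len(top)


# Hypothetical ranking produced by some term weighting, best term first.
ranked = ["Dogenzaka", "Wedding", "Shibuya", "onion"]
known_locations = {"Dogenzaka", "Shibuya"}

density_top1 = location_name_density(ranked, known_locations, 1)   # 100.0
density_top2 = location_name_density(ranked, known_locations, 2)   # 50.0
density_top4 = location_name_density(ranked, known_locations, 4)   # 50.0
```

A weighting that pushes location names toward the top of the list yields a higher density at small ranks, which is the behavior the chart compares across IDF, K, and IDF*K.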
Conclusions
BEIRA addresses two issues of map services
- Information overload
- Information awareness
Combining geographic and semantic features and processing can be used to build a view of area characteristics.
Future work
- Automatic adaptation of the G/S ratio
- Evaluation on other contents
Image caption: Hokkai Takashima (1850-1931)
Thank you for your attention!