New Perspectives in Social Data Management
Sihem Amer-Yahia Research Director
CNRS @ LIG
TSUKUBA University May 20th, 2014
TSUKUBA Univ. – May 2014
Traditional data management stack
relational tables++ native XML backend
Physical Layer
Logical Layer
Logical and physical optimizations
Index creation
Application Layer
High-level specification
Collaborative data model
Item 1
Item 2
Item 3
Item 4
Item 5
Item 6
Item 7
boolean, rating, tag, sentiment…
User space (with attributes)
Item space (with attributes)
user1
user2
user3
user4
user5
user6
Let’s examine a canonical social application
Extracting travel itineraries from Flickr
Goal: extract the itinerary of each traveler by mapping photos into Points Of Interest (POIs) and aggregate actions of many travelers into coherent queryable itineraries
Automatic construction of travel itineraries using social breadcrumbs: with Munmun De Choudhury (Arizona State University), Moran Feldman (Technion), Nadav Golbandi, Ronny Lempel (Yahoo! Research), Cong Yu (Google Research). HyperText Conference 2010
Interactive Itinerary Planning: with Senjuti Basu Roy (Univ. of Washington), Gautam Das (Univ. of Texas at Arlington), Cong Yu (Google Research). ICDE 2011
Deployed on Yahoo! Mobile
Diagram labels: Photo Streams, Photo-POI Mapping, Timed Paths
• Identify photos of a given city
• Filter out residents of a city
• Validate photo timestamps
• Photo Streams Segmentation
o Split the stream whenever the time difference between two successive photos is “large”
• Distillation of Timed Visits
• <POI, start time, end time>
• Construction of Timed Paths
o A sequence of Timed Visits
• Extract Candidate POIs
o Lonely Planet/Y! Travel to extract landmarks
o Yahoo! Maps API to retrieve their geo-locations
• Tag & geo-based POI association
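The segmentation and timed-visit steps can be sketched in a few lines of Python. This is only an illustration: the 8-hour gap threshold, the function names, and the sample photos are assumptions, since the slide only says the gap must be “large”.

```python
from datetime import datetime, timedelta

def segment_stream(photos, max_gap=timedelta(hours=8)):
    """Split (timestamp, poi) photos into segments wherever the gap between
    two successive photos exceeds max_gap (assumed threshold)."""
    segments, current = [], []
    for ts, poi in sorted(photos):
        if current and ts - current[-1][0] > max_gap:
            segments.append(current)
            current = []
        current.append((ts, poi))
    if current:
        segments.append(current)
    return segments

def timed_visits(segment):
    """Collapse runs of consecutive photos at the same POI into
    <POI, start time, end time> triples."""
    visits = []
    for ts, poi in segment:
        if visits and visits[-1][0] == poi:
            visits[-1] = (poi, visits[-1][1], ts)
        else:
            visits.append((poi, ts, ts))
    return visits

# Hypothetical photo stream of one traveler, already mapped to POIs.
photos = [
    (datetime(2010, 6, 1, 9, 0), "Brooklyn Bridge"),
    (datetime(2010, 6, 1, 9, 40), "Brooklyn Bridge"),
    (datetime(2010, 6, 1, 13, 5), "Ellis Island"),
    (datetime(2010, 6, 3, 10, 0), "Brooklyn Bridge"),
    (datetime(2010, 6, 3, 10, 20), "Brooklyn Bridge"),
]
# Each timed path is the sequence of timed visits of one segment.
timed_paths = [timed_visits(s) for s in segment_stream(photos)]
```

Here the two-day gap splits the stream into two timed paths, the first with two visits and the second with one.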
Problem definition
• Definitions
– Each itinerary is a timed path
– The set of timed paths implies a weighted graph G over POIs
– An itinerary is a path in the graph G
– The value of an itinerary is the sum of popularities of its POIs
– The time of an itinerary is the sum of POI visit and transit times
• Problem Instance (“Orienteering”)
– Find an itinerary in G from a source POI to a target POI of budget (=time) at most B maximizing total value
– The time budget B is typically whole days
– Source and target POIs provided by user (e.g. hotel)
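To make the problem instance concrete, here is a minimal exhaustive-search sketch in Python. It is not the algorithm from the cited papers: it simply enumerates simple paths on a tiny graph, and all POI names, popularities, and times are made up for illustration. It also assumes the source and target differ.

```python
def best_itinerary(popularity, visit_time, transit, source, target, budget):
    """Exhaustive search over simple paths from source to target.
    Returns (value, path) with maximal summed popularity whose total
    visit + transit time stays within the budget."""
    best = (float("-inf"), None)

    def extend(path, time_used, value):
        nonlocal best
        node = path[-1]
        if node == target:
            if value > best[0]:
                best = (value, list(path))
            return
        for nxt, t in transit.get(node, {}).items():
            if nxt in path:
                continue
            new_time = time_used + t + visit_time[nxt]
            if new_time <= budget:
                path.append(nxt)
                extend(path, new_time, value + popularity[nxt])
                path.pop()

    if visit_time[source] <= budget:
        extend([source], visit_time[source], popularity[source])
    return best

# Hypothetical data: times in hours, popularity as an abstract score.
popularity = {"hotel": 0, "museum": 5, "bridge": 4, "park": 2, "station": 0}
visit_time = {"hotel": 0, "museum": 3, "bridge": 1, "park": 1, "station": 0}
transit = {
    "hotel": {"museum": 1, "bridge": 1, "park": 1, "station": 1},
    "museum": {"bridge": 1, "park": 1, "station": 1},
    "bridge": {"museum": 1, "park": 1, "station": 1},
    "park": {"museum": 1, "bridge": 1, "station": 1},
}
value, path = best_itinerary(popularity, visit_time, transit, "hotel", "station", budget=8)
```

Exhaustive search is exponential in the number of POIs, which is why heuristics and approximation algorithms are the practical route for the graph sizes reported later in the talk.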
Example itinerary for NYC (single-day)
Social data management stack
raw data
Data preparation
Search and Recommendation
Social Analytics
Application logic
Application evaluation
Architecture of a typical Data Mining system
Data Warehouse
Data cleaning & data integration
Filtering
Database
Data mining engine
Pattern evaluation
Graphical user interface
Knowledge-base
• Examined typical social applications: URL recommendation in Delicious, group recommendation in MovieLens, social analytics on Twitter, itinerary extraction in Flickr
– Data Collection
• mapping data into <u,i,label> triples
– Data Sanitation
• Pruning: cut long tails of user actions, remove photos taken by residents
– in Delicious, removing URLs tagged with fewer than 5 tags reduces input data to 27% of input size
• Text processing: topic extraction
• Normalization of ratings
– in MovieLens, critics are more moderate than less-active reviewers
• Enrichment: POI-to-photo association, named entity extraction, Twitter vocabulary expansion (e.g., using Yahoo! Boss interface), sentiment analysis
– Data Transformation
• from <u,i,label> to <u,i,label> and <u,{(v,w)}> …
SOCLE: A framework for social data preparation: with N. Ibrahim, C. Kamdem-Kengne, F. Uliana, M.C. Rousset. Submitted for publication
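Two of the sanitation steps are easy to sketch over <u,i,label> triples. The Python below is an assumption-laden illustration, not SOCLE code: pruning counts distinct labels per item, and normalization is taken to mean centering each user's ratings on that user's mean, which the slide does not specify.

```python
from collections import defaultdict

def prune_items(triples, min_labels=5):
    """Keep only triples whose item carries at least min_labels distinct labels."""
    labels = defaultdict(set)
    for _, item, label in triples:
        labels[item].add(label)
    return [t for t in triples if len(labels[t[1]]) >= min_labels]

def normalize_ratings(triples):
    """Center each user's ratings on that user's own mean (assumed scheme)."""
    by_user = defaultdict(list)
    for user, _, rating in triples:
        by_user[user].append(rating)
    mean = {u: sum(r) / len(r) for u, r in by_user.items()}
    return [(u, i, r - mean[u]) for u, i, r in triples]

# Hypothetical tagging triples: url_a has 5 distinct tags, url_b only one.
tags = [("u1", "url_a", "tag%d" % k) for k in range(5)] + [("u2", "url_b", "python")]
pruned = prune_items(tags)

# Hypothetical rating triples for a moderate critic and a more extreme reviewer.
ratings = [("critic", "m1", 3.0), ("critic", "m2", 4.0),
           ("casual", "m1", 5.0), ("casual", "m2", 1.0)]
normalized = normalize_ratings(ratings)
```

After centering, the two reviewers' ratings are expressed relative to their own habits, so a moderate critic and an extreme reviewer become comparable.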
Figure: Data Preparation pipeline. Raw social data enters Data Collection, which emits <u, i, label>; Data Sanitation (Pruning, Enrichment, Text Processing, Normalization) emits <u, i, label>; Data Transformation (User Similarity Functions, Network Construction Functions, Index Generation) emits the output <u, i, label> and <u, {(v,w)}>, which feeds Search & Recommendation and Social Analytics.
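The transformation from <u, i, label> triples to <u, {(v,w)}> can be illustrated with one possible user similarity function. The sketch below assumes Jaccard similarity over the sets of items each user acted on; the pipeline only names "User Similarity Functions" generically, so the choice of Jaccard and the function names are assumptions.

```python
from collections import defaultdict
from itertools import combinations

def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def build_network(triples, threshold=0.0):
    """Turn <u, i, label> triples into <u, {(v, w)}>: for each user u,
    the list of users v with similarity weight w above the threshold."""
    items = defaultdict(set)
    for user, item, _ in triples:
        items[user].add(item)
    network = {u: [] for u in items}
    for u, v in combinations(sorted(items), 2):
        w = jaccard(items[u], items[v])
        if w > threshold:
            network[u].append((v, w))
            network[v].append((u, w))
    return network

# Hypothetical triples: u1 and u2 share one item, u3 shares none.
triples = [("u1", "i1", "tag"), ("u1", "i2", "tag"),
           ("u2", "i2", "tag"), ("u2", "i3", "tag"),
           ("u3", "i4", "tag")]
network = build_network(triples)
```

The pairwise loop is quadratic in the number of users, which is one reason the pipeline lists index generation alongside network construction.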
SOCLE model
– Which data model? An extensible type system
– Which storage model?
SOCLE model and algebra
SocialScope: Enabling Information Discovery on Social Content Sites: with L. Lakshmanan and C. Yu. CIDR 2009
Enrich a node with attributes -> new node type
• Algebra operator: $\gamma^{N}_{C,d,att,A}(G)$
User node n1: {id=101; Name=John; Type=user,traveler, Vst={Paris, Grenoble, Pekin}}
Destination nodes n2, n3, n4:
{id=10; Name=Paris; Type=destination}
{id=11; Name=Grenoble; Type=destination}
{id=15; Name=Pekin; Type=destination}
Links L12, L13, L14:
L12 = {id=30, type=visit, ..}
L13 = {id=31, type=visit, ..}
L14 = {id=32, type=visit, ..}
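The enrichment in this example can be mimicked on a toy graph. The Python sketch below is only loosely inspired by the operator on the slide and does not reproduce its formal semantics: the `src` and `dst` link fields, the function name, and its parameters are all hypothetical.

```python
def enrich_nodes(nodes, links, link_type, att, source_att, new_type):
    """For every link of link_type, add to its source node an attribute `att`
    collecting `source_att` of the link targets, and tag the node with new_type.
    Returns a new node dictionary; the input is left untouched."""
    enriched = {nid: dict(attrs) for nid, attrs in nodes.items()}
    for link in links:
        if link["type"] != link_type:
            continue
        node = enriched[link["src"]]
        node[att] = node.get(att, set()) | {nodes[link["dst"]][source_att]}
        node["Type"] = node["Type"] | {new_type}
    return enriched

nodes = {
    101: {"Name": "John", "Type": {"user"}},
    10: {"Name": "Paris", "Type": {"destination"}},
    11: {"Name": "Grenoble", "Type": {"destination"}},
    15: {"Name": "Pekin", "Type": {"destination"}},
}
# src/dst are assumed fields; the slide elides the link contents.
links = [
    {"id": 30, "type": "visit", "src": 101, "dst": 10},
    {"id": 31, "type": "visit", "src": 101, "dst": 11},
    {"id": 32, "type": "visit", "src": 101, "dst": 15},
]
enriched = enrich_nodes(nodes, links, "visit", "Vst", "Name", "traveler")
```

After the call, John's node carries the aggregated `Vst` attribute and the additional `traveler` type, matching the shape of node n1 on the slide.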
Storage Model: native or relational++?
SOCLE algebra
– Examine how existing algebras/languages for querying social data can be used for data preparation
– Properties
• Declarativity
• Expressivity and closure
• Provenance
• Invertibility
What makes SDM different from DM?
• SDM needs a different data management stack: data preparation
• In social computing, analysts do not always know what to look for
• In social computing, application output must be evaluated
Social data exploration instances
• Since analysts do not know what to look for, let’s examine some social data exploration instances
– Rating exploration: MRI: Meaningful Interpretations of Collaborative Ratings, with M. Das, S. Thirumuruganathan, G. Das (UT Arlington), C. Yu (Google) at VLDB 2011
– Tag exploration: Who Tags What? An Analysis Framework, with M. Das, S. Thirumuruganathan, G. Das (UT Arlington), C. Yu (Google) at VLDB 2012
– Temporal exploration: Efficient Sentiment Correlation for Large-scale Demographics, with M. Tsytsarau and T. Palpanas (Univ. of Trento) at SIGMOD 2013
Rating exploration
Collaborative rating model
• Rating tuple: <item attributes, user attributes, rating>
• Group: a set of ratings describable by a set of attribute values – Based on data cubes in OLAP (for mining multidimensional data)
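| ID | Title | Genre | Director | Name | Gender | Location | Rating |
|----|-------|-------|----------|------|--------|----------|--------|
| 1 | Titanic | Drama | James Cameron | Amy | Female | New York | 8.5 |
| 2 | Schindler’s List | Drama | Steven Spielberg | John | Male | New York | 7.0 |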
Exploration space
Partial Rating Lattice for a Movie
(M:Male, Y:Young, CA:California, S:Student)
Each node (data cube/cuboid) in the lattice is a group
Example group: Gender: Male, Age: Young, Location: CA, Occupation: Student
Task: quickly identify “good” groups in the lattice that help users understand ratings effectively
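To see how large the exploration space gets, the lattice can be materialized for a handful of ratings. The Python sketch below is illustrative only: it represents each group as a frozenset of (attribute, value) pairs, and the attribute names and ratings are invented.

```python
from itertools import combinations

def enumerate_groups(ratings, attributes):
    """Map every cuboid (a frozenset of (attribute, value) pairs) to the list
    of ratings it contains. The empty cuboid is the lattice root."""
    groups = {}
    for rec in ratings:
        pairs = [(a, rec[a]) for a in attributes]
        for size in range(len(pairs) + 1):
            for combo in combinations(pairs, size):
                groups.setdefault(frozenset(combo), []).append(rec["rating"])
    return groups

attributes = ["gender", "age", "location", "occupation"]
# Hypothetical ratings of one movie.
ratings = [
    {"gender": "M", "age": "Y", "location": "CA", "occupation": "S", "rating": 8},
    {"gender": "M", "age": "Y", "location": "NY", "occupation": "S", "rating": 6},
    {"gender": "F", "age": "Y", "location": "CA", "occupation": "S", "rating": 9},
]
groups = enumerate_groups(ratings, attributes)
male_students = groups[frozenset({("gender", "M"), ("occupation", "S")})]
```

Each rating with n attributes belongs to 2^n cuboids, so full enumeration quickly becomes expensive, which motivates the search for a few "good" groups.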
DEM: Meaningful Description Mining
• For an input item covering $R_I$ ratings, return set C of cuboids, s.t.:
– description error is minimized, subject to:
• |C| ≤ k
• coverage ≥ α
Description Error: how well a cuboid’s average rating approximates the numerical score of each individual rating belonging to it
Coverage: percentage of ratings covered by the returned cuboids
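The two measures can be written down directly. The Python sketch below is one plausible reading, not necessarily the exact formula of the MRI paper: description error is taken as the mean absolute deviation between each rating and the average of each returned cuboid containing it.

```python
def matches(record, cuboid):
    """True if the rating record has every (attribute, value) pair of the cuboid."""
    return all(record[a] == v for a, v in cuboid)

def coverage(ratings, cuboids):
    """Fraction of ratings that belong to at least one cuboid."""
    return sum(any(matches(r, c) for c in cuboids) for r in ratings) / len(ratings)

def description_error(ratings, cuboids):
    """Mean absolute deviation of ratings from the average of their cuboid
    (assumed definition; a rating in several cuboids counts once per cuboid)."""
    total, count = 0.0, 0
    for c in cuboids:
        members = [r["rating"] for r in ratings if matches(r, c)]
        if members:
            avg = sum(members) / len(members)
            total += sum(abs(x - avg) for x in members)
            count += len(members)
    return total / count if count else 0.0

# Hypothetical ratings of one item.
ratings = [
    {"gender": "M", "location": "CA", "rating": 8},
    {"gender": "M", "location": "NY", "rating": 6},
    {"gender": "F", "location": "CA", "rating": 9},
    {"gender": "F", "location": "NY", "rating": 3},
]
C = [frozenset({("gender", "M")}), frozenset({("location", "CA")})]
cov = coverage(ratings, C)
err = description_error(ratings, C)
```

With these four ratings, C covers three of them, and the averages 7 (Male) and 8.5 (California) deviate from the member ratings by 0.75 on average.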
Identify groups of reviewers who consistently share similar ratings on items
To verify NP-completeness, we reduce the Exact 3-Set Cover problem (EC3) to the decision version of our problem. EC3 is the problem of finding an exact cover for a finite set U, where each of the subsets available for use contains exactly 3 elements.
DEM Algorithms
• Exact Algorithm (E-DEM)
– Brute-force enumeration of all possible combinations of cuboids in the lattice to return the exact (i.e., optimal) set as rating descriptions
• Random Restart Hill Climbing Algorithm
– Often fails to satisfy the Coverage constraint; large number of restarts required
– Need an algorithm that optimizes both Coverage and Description Error constraints simultaneously
• Randomized Hill Exploration Algorithm (RHE-DEM)
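The brute-force idea behind the exact algorithm can be sketched as follows. This is an illustration under assumptions rather than the authors' E-DEM implementation: the error definition (mean absolute deviation from cuboid averages), the helper names, and the data are all hypothetical.

```python
from itertools import combinations

def matches(record, cuboid):
    return all(record[a] == v for a, v in cuboid)

def coverage(ratings, cuboids):
    return sum(any(matches(r, c) for c in cuboids) for r in ratings) / len(ratings)

def description_error(ratings, cuboids):
    # Assumed definition: mean absolute deviation from each cuboid's average.
    total, count = 0.0, 0
    for c in cuboids:
        members = [r["rating"] for r in ratings if matches(r, c)]
        if members:
            avg = sum(members) / len(members)
            total += sum(abs(x - avg) for x in members)
            count += len(members)
    return total / count if count else 0.0

def candidate_cuboids(ratings, attributes):
    """All non-empty cuboids that contain at least one rating."""
    found = set()
    for rec in ratings:
        pairs = [(a, rec[a]) for a in attributes]
        for size in range(1, len(pairs) + 1):
            found.update(frozenset(c) for c in combinations(pairs, size))
    return sorted(found, key=sorted)

def exact_dem(ratings, attributes, k, alpha):
    """Try every set of at most k cuboids; keep the feasible one with least error."""
    best_error, best_set = float("inf"), None
    cuboids = candidate_cuboids(ratings, attributes)
    for size in range(1, k + 1):
        for combo in combinations(cuboids, size):
            if coverage(ratings, combo) >= alpha:
                err = description_error(ratings, combo)
                if err < best_error:
                    best_error, best_set = err, combo
    return best_error, best_set

ratings = [
    {"gender": "M", "location": "CA", "rating": 8},
    {"gender": "M", "location": "CA", "rating": 8},
    {"gender": "F", "location": "NY", "rating": 3},
    {"gender": "F", "location": "NY", "rating": 3},
    {"gender": "M", "location": "NY", "rating": 5},
]
best_error, best_set = exact_dem(ratings, ["gender", "location"], k=2, alpha=0.8)
```

The number of candidate sets grows combinatorially with the lattice size and k, which is why the talk turns to randomized local search.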
RHE-DEM Algorithm
Figure: Partial Rating Lattice for a Movie; k=2, α=80% (M: Male, Y: Young, CA: California, S: Student)
Two goals tracked at each step: Satisfy Coverage, Minimize Error
• Start: C = {Male, Student}, {California, Student}
• Say C does not satisfy the Coverage constraint
• Candidate moves: C = {Male}, {California, Student} or C = {Student}, {California, Student}
• Move to C = {Male}, {California, Student}; say C satisfies the Coverage constraint (Satisfy Coverage √)
• Continue to C = {Male}, {Student} (Satisfy Coverage √, Minimize Error √)
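The RHE-DEM slides suggest a two-phase local search: first move through the lattice until Coverage is satisfied, then reduce Description Error without losing Coverage. The Python sketch below is a simplified, deterministic reading of that idea, not the published algorithm: it uses a fixed start instead of random restarts, greedy moves to parent or child cuboids, and an assumed error definition. All names and data are hypothetical.

```python
def matches(record, cuboid):
    return all(record[a] == v for a, v in cuboid)

def coverage(ratings, cuboids):
    return sum(any(matches(r, c) for c in cuboids) for r in ratings) / len(ratings)

def description_error(ratings, cuboids):
    # Assumed definition: mean absolute deviation from each cuboid's average.
    total, count = 0.0, 0
    for c in cuboids:
        members = [r["rating"] for r in ratings if matches(r, c)]
        if members:
            avg = sum(members) / len(members)
            total += sum(abs(x - avg) for x in members)
            count += len(members)
    return total / count if count else 0.0

def neighbors(cuboids, domain):
    """Sets obtained by replacing one cuboid with a parent (drop one pair)
    or a child (add one pair on an attribute not yet constrained)."""
    for i, c in enumerate(cuboids):
        used = {a for a, _ in c}
        moves = [c - {p} for p in sorted(c) if len(c) > 1]
        moves += [c | {(a, v)} for a, vals in domain.items() if a not in used for v in vals]
        for m in moves:
            if m not in cuboids:
                yield cuboids[:i] + (m,) + cuboids[i + 1:]

def rhe_dem(ratings, domain, start, alpha, max_steps=100):
    current = tuple(start)
    # Phase 1: climb towards a set that satisfies the coverage constraint.
    for _ in range(max_steps):
        if coverage(ratings, current) >= alpha:
            break
        best = max(neighbors(current, domain), key=lambda n: coverage(ratings, n), default=None)
        if best is None or coverage(ratings, best) <= coverage(ratings, current):
            return current, False
        current = best
    # Phase 2: reduce error using only neighbors that keep coverage satisfied.
    for _ in range(max_steps):
        feasible = [n for n in neighbors(current, domain) if coverage(ratings, n) >= alpha]
        best = min(feasible, key=lambda n: description_error(ratings, n), default=None)
        if best is None or description_error(ratings, best) >= description_error(ratings, current):
            break
        current = best
    return current, coverage(ratings, current) >= alpha

domain = {"gender": ["M", "F"], "location": ["CA", "NY"], "occupation": ["S", "P"]}
ratings = [
    {"gender": "M", "location": "CA", "occupation": "S", "rating": 8},
    {"gender": "M", "location": "NY", "occupation": "S", "rating": 8},
    {"gender": "M", "location": "CA", "occupation": "P", "rating": 8},
    {"gender": "F", "location": "CA", "occupation": "S", "rating": 4},
    {"gender": "F", "location": "NY", "occupation": "S", "rating": 4},
]
start = (frozenset({("gender", "M"), ("occupation", "S")}),
         frozenset({("location", "CA"), ("occupation", "S")}))
result, satisfied = rhe_dem(ratings, domain, start, alpha=0.8)
```

A greedy search like this can stall when no single move improves coverage, in which case it reports failure; a randomized variant would restart from another set of cuboids.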
What makes SDM different from DM?
• SDM needs a different data management stack: data preparation
• In social computing, analysts do not always know what to look for
• In social computing, application output must be evaluated
| City | #POIs | #Timed Paths | Sample POIs |
|------|-------|--------------|-------------|
| Barcelona | 74 | 6,087 | Museu Picasso, Plaza Reial |
| London | 163 | 19,052 | Buckingham Palace, Churchill Museum, Tower Bridge |
| New York City | 100 | 3,991 | Brooklyn Bridge, Ellis Island |
| Paris | 114 | 10,651 | Tour Eiffel, Musee du Louvre |
| San Francisco | 80 | 12,308 | Aquarium of the Bay, Golden Gate Bridge, Lombard Street |

| City | Ground Truth Sources |
|------|----------------------|
| Barcelona | www.barcelona-tourist-guide.com |
| London | www.theoriginaltour.com |
| New York City | www.newyorksightseeing.com |
| Paris | www.carsrouges.com |
| San Francisco | www.allsanfranciscotours.com |
Global comparison
POI quality
Transit times
Comparative evaluation
Results for side-by-side comparison
Challenge 1: Filtering expert AMT workers
Multi-answer questions on “less-known” POIs
Challenge 2: How to better exploit the crowd?
Crowds, not drones: modeling human factors in crowdsourcing with S. B. Roy (U. of Washington), G. Das, S. Thirumuruganathan (UT Arlington), I. Lykourentzou (Tudor Institute and INRIA) at DBCrowd 2013
Summary
• There are three kinds of users in SDM
– End user who generates content of varying quality and demands high quality content
– Analyst (data scientist and application developer) who needs a better understanding of the underlying data and users
– Worker who helps relate to end user and evaluate content utility
• Data preparation tools and efficient social exploration would help analysts
– new opportunities for algebraic optimizations
– a collection of optimization problems with data-centric or analyst-centric goals
– often a reduction of hard problems with heuristics/approximation algorithms
– but also appropriate indexing
• Application validation could benefit from worker profiling and crowd indexing