Getting Semantics from the Crowd

DESCRIPTION

Talk given at the Dagstuhl seminar on Semantic Data Management, April 2012

TRANSCRIPT
Getting Semantics from the Crowd

Gianluca Demartini, eXascale Infolab, University of Fribourg, Switzerland
Semantic Web 2.0
• not the Web 3.0
• Getting semantics from (non-expert) people
  – From few publishers and many consumers (SW 1.0)
  – To many publishers and many consumers (SW 2.0)
27-Apr-12, Gianluca Demartini, eXascale Infolab
Read/write SW
• Wikidata: http://meta.wikimedia.org/wiki/Wikidata
• Semantics is about the meaning
• Get people in the loop!
• Social computing for SemWeb applications
Crowdsourcing
• Exploit human intelligence to solve
  – Tasks simple for humans, complex for machines
  – With a large number of humans (the Crowd)
  – Small problems: micro-tasks (Amazon MTurk)
• Examples
  – Wikipedia, Flickr
• Incentives
  – Financial, fun, visibility
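As a concrete illustration of how redundant micro-task answers are typically aggregated, here is a minimal majority-vote sketch in Python (the function and the example labels are illustrative, not something described in the talk):

```python
from collections import Counter

def majority_vote(answers):
    """Aggregate redundant worker answers for a single micro-task.

    answers: list of labels submitted by different workers.
    Returns the most frequent label.
    """
    label, _count = Counter(answers).most_common(1)[0]
    return label

# Three workers answer the same entity-linking micro-task:
result = majority_vote(["dbpedia:Fribourg", "dbpedia:Fribourg", "dbpedia:Freiburg"])
# result == "dbpedia:Fribourg"
```

Redundancy plus aggregation is the standard defense against individual worker error on platforms like MTurk.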
Crowdsourcing
• Success Stories
  – Training sets for ML
  – Image tagging
  – Document annotation/translation
  – IR evaluation [Blanco et al., SIGIR 2011]
  – CrowdDB [Franklin et al., SIGMOD 2011]
Crowd-powered SW apps
• Entity Linking [ZenCrowd at WWW 2012]
• Create/validate sameAs links
• Schema matching
• ... add your own favorite application!
ZenCrowd
• Combine both algorithmic and manual linking
• Automate manual linking via crowdsourcing
• Dynamically assess human workers with a probabilistic reasoning framework
[Diagram: HTML+RDFa pages linked to the LOD Cloud by both the Crowd and algorithmic Machines.]
ZenCrowd Architecture
[Architecture diagram: HTML and HTML+RDFa pages feed ZenCrowd entity extractors; algorithmic matchers query an LOD index over the LOD Open Data Cloud ("get entity"); a probabilistic network and decision engine combine algorithmic matches with worker decisions collected via a micro-task manager on the crowdsourcing platform. Input: micro matching tasks; output: decisions.]
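Read as a pipeline, the architecture above can be sketched as follows (all names, the threshold, and the string-similarity matcher are illustrative stand-ins, not the ZenCrowd implementation):

```python
from difflib import SequenceMatcher

CONFIDENT = 0.9  # illustrative threshold: trust algorithmic matches above it

def algorithmic_score(entity, candidate):
    """Stand-in algorithmic matcher: plain string similarity."""
    return SequenceMatcher(None, entity.lower(), candidate.lower()).ratio()

class StubCrowd:
    """Placeholder for the crowdsourcing platform / micro-task manager."""
    def ask(self, entity, candidates):
        # A real platform would publish a micro-task and collect worker votes.
        return candidates[0] if candidates else None

def link_entities(entities, lod_index, crowd):
    """Schematic ZenCrowd-style flow: match algorithmically first,
    send uncertain cases to the crowd, return one decision per entity."""
    decisions = {}
    for entity in entities:
        candidates = lod_index.get(entity, [])  # "get entity" lookup
        scored = [(c, algorithmic_score(entity, c)) for c in candidates]
        best, score = max(scored, key=lambda t: t[1], default=(None, 0.0))
        if score >= CONFIDENT:
            decisions[entity] = best            # machine decision
        else:
            decisions[entity] = crowd.ask(entity, candidates)  # micro-task
    return decisions
```

The design point is the routing: machines handle the confident cases cheaply, and only the uncertain remainder costs human effort.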
The micro-task
Entity Factor Graphs
• Graph components
  – Workers, links, clicks
  – Prior probabilities
  – Link factors
  – Constraints
• Probabilistic Inference
  – Select all links with posterior probability > τ
[Factor graph example: 2 workers (w1, w2), 6 observed clicks (c11...c23), 3 candidate links (l1, l2, l3); worker priors pw(), link priors pl(), link factors lf(), a SameAs constraint sa1-2() and a dataset unicity constraint u2-3(). Legend: link priors, worker priors, observed variables, link factors, SameAs constraints, dataset unicity constraints.]
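A simplified view of the inference step, assuming independent workers with known reliabilities (a sketch only: the actual ZenCrowd factor graph also encodes the SameAs and unicity constraints listed above):

```python
def link_posterior(votes, reliability, prior=0.5):
    """Posterior probability that a candidate link is valid, assuming
    workers vote independently and are correct with probability
    reliability[w].

    votes: {worker: True/False}; reliability: {worker: float in (0, 1)}.
    """
    p_valid, p_invalid = prior, 1.0 - prior
    for worker, vote in votes.items():
        r = reliability[worker]
        p_valid *= r if vote else (1.0 - r)
        p_invalid *= (1.0 - r) if vote else r
    return p_valid / (p_valid + p_invalid)

TAU = 0.5  # selection threshold τ from the slide
posterior = link_posterior({"w1": True, "w2": True}, {"w1": 0.8, "w2": 0.6})
selected = posterior > TAU  # 0.24 / 0.28 ≈ 0.86, so the link is kept
```

Weighting votes by per-worker reliability is what distinguishes this from plain majority voting: a reliable worker's vote moves the posterior more than an unreliable one's.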
ZenCrowd: Lessons Learnt
• Crowdsourcing + probabilistic reasoning works!
• But
  – Different worker communities perform differently
  – No differences with different contexts
  – Completion time may vary (based on reward)
  – Many low-quality workers + spam
ZenCrowd
• Worker Selection

[Plot: Worker Precision (0–1) vs. Number of Tasks (0–500) for US and IN workers; the top US worker is highlighted.]
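The selection policy suggested by the plot can be sketched as tracking each worker's precision on tasks with known answers and keeping only workers above a threshold (the class and threshold values are illustrative, not the talk's implementation):

```python
class WorkerStats:
    """Track per-worker precision on tasks whose correct answer is known."""

    def __init__(self):
        self.correct = {}
        self.total = {}

    def record(self, worker, answer, gold):
        self.total[worker] = self.total.get(worker, 0) + 1
        if answer == gold:
            self.correct[worker] = self.correct.get(worker, 0) + 1

    def precision(self, worker):
        return self.correct.get(worker, 0) / self.total[worker]

    def reliable_workers(self, min_precision=0.7, min_tasks=5):
        """Workers with enough answered tasks and high enough precision."""
        return [w for w, n in self.total.items()
                if n >= min_tasks and self.precision(w) >= min_precision]
```

Re-estimating these precisions dynamically is what lets a system route further tasks away from low-quality (or spamming) workers.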
Challenges for Crowd-SW
• How to design the micro-task
• Where to find the crowd
  – MTurk, Facebook (900M users)
• Evaluation
  – Which ground truth?!
• Quality control / spam
  – Need for spam benchmarks in crowdsourcing [Mechanical Cheat at CrowdSearch 2012]