aaai2012
KIT – University of the State of Baden-Wuerttemberg and National Research Center of the Helmholtz Association www.kit.edu
Institute of Applied Informatics and Formal Description Methods (AIFB)
Crowdsourcing tasks in open query answering
Elena Simperl,1 Barry Norton,2 Denny Vrandecic1
1Institute AIFB, Karlsruhe Institute of Technology, Germany
2Ontotext AD, Bulgaria
Institut für Angewandte Informatik und Formale Beschreibungsverfahren (AIFB)
2 07.06.2012
Background: what is Linked Data?
Linked Data: set of best practices to publish and connect structured data on the Web.
URIs to identify entities and concepts in the world
HTTP to access and retrieve resources and descriptions of these resources
RDF as generic graph-based data model to structure and link data
Taken together, Linked Data is said to form a ‘cloud’ of shared references and vocabularies.
http://linkeddata.org/faq
Background: why is Linked Data important?
Data.gov & public sector information: more transparency and accountability in governance
BBC & media: added value of content through interlinking
Google, Yahoo, Bing & schema.org: enhanced search
Crowdsourcing Linked Data management
Tasks requiring human contributions
  Interlinking
  Conceptual modeling
  Labeling and translation
  Classification
  Ordering
Crowdsourcing already in use
Example: open query answering
Query FOAF data using the vCard vocabulary
hp:Harry foaf:mbox <mailto:[email protected]> ;
    foaf:nick "Harry" ;
    foaf:familyName "Potter" .
SELECT ?name ?email WHERE
{ ?p vcard:email ?email ; vcard:fn ?name }
Answering the query as intended requires
  Vocabulary mapping and entity resolution (FOAF to vCard)
  Metadata completion (full name is “Harry Potter”)
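The two steps above can be sketched in Python, with plain tuples standing in for RDF triples. This is only an illustration under stated assumptions: the mapping table, the `crowd_full_name` value, and both helper functions are hypothetical, and in a crowdsourcing setting the mapping and the completed name would come from workers rather than being hard-coded. The mailbox address is a placeholder, not the one from the slide.

```python
# Triples as (subject, predicate, object) tuples; prefixes kept as plain strings.
foaf_data = [
    ("hp:Harry", "foaf:mbox", "mailto:harry@example.org"),  # placeholder address
    ("hp:Harry", "foaf:nick", "Harry"),
    ("hp:Harry", "foaf:familyName", "Potter"),
]

# Assumed result of a vocabulary-mapping task (FOAF property -> vCard property).
vocabulary_mapping = {"foaf:mbox": "vcard:email"}

# Assumed result of a metadata-completion task: the full name is not in the data.
crowd_full_name = {"hp:Harry": "Harry Potter"}


def to_vcard(triples, mapping, full_names):
    """Rewrite mapped FOAF triples to vCard and add crowd-supplied full names."""
    out = [(s, mapping[p], o) for s, p, o in triples if p in mapping]
    out += [(s, "vcard:fn", name) for s, name in full_names.items()]
    return out


def select_name_email(triples):
    """Evaluate { ?p vcard:email ?email ; vcard:fn ?name } over the triples.

    Simplification: at most one email and one name per subject are kept.
    """
    emails = {s: o for s, p, o in triples if p == "vcard:email"}
    names = {s: o for s, p, o in triples if p == "vcard:fn"}
    return [(names[s], emails[s]) for s in emails if s in names]


vcard_data = to_vcard(foaf_data, vocabulary_mapping, crowd_full_name)
rows = select_name_email(vcard_data)
```

Run against the raw FOAF triples, `select_name_email` returns no rows, because neither `vcard:email` nor `vcard:fn` occurs in the data; only after mapping and completion does the query have an answer.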
Crowdsourcing-enabled query answering
• Integral part of a query engine
  At design time, the application developer specifies which data portions workers can process and via which types of HITs (Human Intelligence Tasks)
  At run time
    The system materializes the data
    Workers process it
    Data and application are updated to reflect crowdsourcing results
• Formal, declarative description of the data and tasks using SPARQL patterns as a basis for the automatic design of HITs
• Reducing the number of tasks through automatic reasoning
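The design-time and run-time split described above might look roughly like the following Python sketch. Everything here is hypothetical: `HitTemplate`, `Hit`, `materialize_hits`, and `apply_answers` are illustrative names, not part of any system named on the slides, and worker answers are simulated with a dictionary. Skipping subjects that already have the output value is only the simplest way to avoid redundant tasks; it does not stand in for the automatic reasoning mentioned above.

```python
from dataclasses import dataclass


@dataclass
class HitTemplate:
    """Design-time declaration: which data workers may process, and how."""
    name: str
    input_predicates: tuple   # predicates shown to the worker
    output_predicate: str     # predicate the worker's answer fills in
    question: str


@dataclass
class Hit:
    subject: str
    shown: dict
    question: str
    output_predicate: str


def materialize_hits(triples, template):
    """Run time, step 1: one HIT per subject that has all inputs but no output."""
    by_subject = {}
    for s, p, o in triples:
        by_subject.setdefault(s, {})[p] = o
    hits = []
    for s, props in by_subject.items():
        has_inputs = all(p in props for p in template.input_predicates)
        if has_inputs and template.output_predicate not in props:
            shown = {p: props[p] for p in template.input_predicates}
            hits.append(Hit(s, shown, template.question, template.output_predicate))
    return hits


def apply_answers(triples, hits, answers):
    """Run time, step 3: write worker answers back as new triples."""
    new = [(h.subject, h.output_predicate, answers[h.subject])
           for h in hits if h.subject in answers]
    return triples + new


template = HitTemplate(
    name="complete-name",
    input_predicates=("foaf:nick", "foaf:familyName"),
    output_predicate="vcard:fn",
    question="What is this person's full name?",
)
data = [
    ("hp:Harry", "foaf:nick", "Harry"),
    ("hp:Harry", "foaf:familyName", "Potter"),
]
hits = materialize_hits(data, template)
# Step 2 (workers process the HITs) is simulated by a fixed answer.
data = apply_answers(data, hits, {"hp:Harry": "Harry Potter"})
```

Once the answer has been written back, materializing HITs again over the updated data yields no further task for the same subject.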
Example: Identity resolution
Identity resolution involves the creation of links, either by comparison of metadata or by investigation of links on the human Web.
Input:
{?station a metar:Station; rdfs:label ?slabel; wgs84:lat ?slat; wgs84:long ?slong .
 ?airport a dbp-owl:Airport; rdfs:label ?alabel; wgs84:lat ?alat; wgs84:long ?along}
Output:
{OPTIONAL {?airport owl:sameAs ?station}}
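As a rough illustration of how the input and output patterns relate, the Python sketch below pairs stations with airports and turns confirmed pairs into `owl:sameAs` triples. The coordinate threshold used to pre-select candidate pairs is an assumption added here to keep the number of tasks small; the slides do not prescribe it. The `ex:` identifiers, coordinates, and the worker vote are made-up sample values.

```python
def candidate_pairs(stations, airports, max_degrees=0.1):
    """Pair stations and airports whose lat/long each differ by at most max_degrees."""
    pairs = []
    for s_uri, s in stations.items():
        for a_uri, a in airports.items():
            close = (abs(s["lat"] - a["lat"]) <= max_degrees
                     and abs(s["long"] - a["long"]) <= max_degrees)
            if close:
                pairs.append((a_uri, s_uri))
    return pairs


def same_as_triples(pairs, votes):
    """Emit ?airport owl:sameAs ?station for every pair a worker confirmed."""
    return [(a, "owl:sameAs", s) for a, s in pairs if votes.get((a, s), False)]


# Made-up bindings for the input pattern (labels and coordinates are illustrative).
stations = {"ex:station1": {"label": "Station One", "lat": 50.0, "long": 8.5}}
airports = {
    "ex:airportA": {"label": "Airport A", "lat": 50.03, "long": 8.55},
    "ex:airportB": {"label": "Airport B", "lat": 48.3, "long": 11.8},
}

pairs = candidate_pairs(stations, airports)
# Simulated worker judgement for the single candidate pair.
links = same_as_triples(pairs, {("ex:airportA", "ex:station1"): True})
```

Pairs that no worker confirms produce no triple, which matches the `OPTIONAL` in the output pattern: the link may or may not exist for a given airport and station.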
Example: Classification
The classification of entities into classes cannot always be inferred automatically from the schema.
Input:
{?station a metar:Station; rdfs:label ?label; wgs84:lat ?lat; wgs84:long ?long}
Output:
{?station a ?type. ?type rdfs:subClassOf metar:Station}
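One possible way to turn worker answers into the output pattern is sketched below in Python. Majority voting is an assumption chosen for illustration; the slides do not say how answers are aggregated. The subclass names and votes are made up, and `classify_by_majority` is a hypothetical helper.

```python
from collections import Counter


def classify_by_majority(votes, allowed_subclasses):
    """Return (station, 'rdf:type', cls) for the most-voted allowed subclass.

    Answers outside allowed_subclasses are ignored; on a tie, the class that
    was voted for first wins.
    """
    triples = []
    for station, answers in votes.items():
        valid = [a for a in answers if a in allowed_subclasses]
        if not valid:
            continue  # no usable answer, leave the station unclassified
        winner, _count = Counter(valid).most_common(1)[0]
        triples.append((station, "rdf:type", winner))
    return triples


# Hypothetical subclasses of metar:Station offered as answer options in the HIT.
subclasses = {"ex:AirportStation", "ex:BuoyStation"}

# Simulated answers from three workers for station1 and one for station2.
votes = {
    "ex:station1": ["ex:AirportStation", "ex:BuoyStation", "ex:AirportStation"],
    "ex:station2": ["ex:Unknown"],
}

typed = classify_by_majority(votes, subclasses)
```

Restricting the answers to declared subclasses keeps the result consistent with the `?type rdfs:subClassOf metar:Station` part of the output pattern.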
Challenges
Decomposition of queries
  Query optimisation obfuscates what is used and should involve costs for human tasks
Query execution and caching
  Naively we can materialise HIT results into datasets
  How to deal with partial coverage and dynamic datasets
Appropriate level of granularity for HITs design for specific SPARQL constructs and typical functionality of Linked Data management components
Optimal user interfaces of graph-like content
  (Contextual) Rendering of LOD entities and tasks
Pricing and workers’ assignment
  Can we connect the end-users of an application and their wish for specific data to be consumed with the payment of workers and prioritization of HITs?
Dealing with spam / gaming
QUESTIONS