
Post on 25-Feb-2016

Page 1: Logics for Data and  Knowledge Representation

Logics for Data and Knowledge Representation

Introduction to Semantic Web

Fausto Giunchiglia
Feroz Farazi

Page 2: Logics for Data and  Knowledge Representation

Semantic Web: Definitions

An extension of the WWW, in which information is given well-defined meaning, better enabling computers and people to work in cooperation [T. Berners-Lee et al., 2001]

A new form of Web content that is computer comprehensible will open up a revolution of new possibilities [T. Berners-Lee et al., 2001]

An alternative approach to represent Web content in a machine-processable way, and to use intelligent techniques to take advantage of these representations [G. Antoniou and F. v. Harmelen, 2004]

An extra abstraction layer, a so-called semantic layer, to be built on top of the Web [F. Giunchiglia et al., 2010]

Page 3: Logics for Data and  Knowledge Representation

Semantic Web: Keys

Semantics
» Data and documents are assigned semantics
» Semantics are codified as metadata

Logic
» Logic as a tool for expressing knowledge and semantics

Ontology
» A set of terms and semantic relations among them
» For example, ZIP code and postal code are equivalent

Language and Vocabulary
» Semantic Web languages (e.g., RDF and OWL)
» Standard vocabularies (e.g., Dublin Core and FOAF)

Page 4: Logics for Data and  Knowledge Representation

World Wide Web

An enormous collection of data and documents
» Of any kind
» Mixed
» Keeps growing
» Open to all

Suffers from some well-known limitations in information
» Searching
» Extracting
» Maintaining
» Unveiling

Even with all these limitations and features, it is quite useful and interesting

Nevertheless, for a better user experience we want to build a more integrated and consistent Web

Page 5: Logics for Data and  Knowledge Representation

Dumb Web to Smart Web

Consider that you are planning a vacation to the major excavation region of Heraklion on the island of Crete
» You find a list of hotels by location
» The list shows that Aldemar, a hotel chain you know, has a branch there
» Unfortunately, you do not see that branch on Aldemar’s website
» What would you call it? Dumb? Here by dumb we mean inconsistent

Consider that you are planning a conference trip to Crete
» You find many branches of Aldemar in the surroundings of the conference venue
» You want to know which one is nearest (minimum walking distance)
» Many mapping sites (e.g., Google Maps) compute the distance from the addresses given as input
» You are the one spending time copying and pasting addresses into the site. Can we make it any better?

[D. Allemang and J. Hendler, 2008]

Page 6: Logics for Data and  Knowledge Representation

Dumb Web to Smart Web: Why

Suppose you want to know the municipalities of the Autonomous Province of Trento
» The municipalities of the province of Trento were reorganized in 2010
» Their number was reduced from 223 to 217
» Many sites still list the former figure instead of the latter
» This is because the information is hard-coded in the HTML pages, or retrieved from the authorities' databases and presented on the Web in a form meant for human consumption only
» It is not meant for machines, which prevents other parties from picking up changes automatically

Considering all the above, what should we build to get a smart Web?
» Smart applications or a smart Web infrastructure?

Page 7: Logics for Data and  Knowledge Representation

Smart Web Applications

The Web is overwhelmed with smart applications, and new ones appear day by day

Great advances have been made in implementing ideas once considered very hard, or even impossible, to realize

To name a few applications
» Search engines’ matches are non-trivial and seem deep and intuitive
» Commerce sites make intelligent recommendations based on customer purchase patterns
» Mapping sites can plan routes and provide detailed geographic information

What role can the Web infrastructure play?
» All these smart applications are only as smart as the data provided to them
» Inconsistent data will lead to dumb results even from smart applications
» The Web infrastructure needs to be improved to support better consistency of the data, so that smart applications can perform to their potential

Page 8: Logics for Data and  Knowledge Representation

Smarter Web

A Web with an infrastructure that enhances the whole Web experience by
» enabling connections among data
» letting users connect data to smart Web applications
» not surprising us with inconsistencies

In the case of the Aldemar hotel branch in the major excavation region of Heraklion
» we need coordination, at the level of data, between the Aldemar site and the site listing hotels by location
» that would help update the list when the location of a hotel changes

In the mapping site scenario
» we would like the site to understand the data from the conference and hotel sites
» without requiring human intervention in copying and pasting

Page 9: Logics for Data and  Knowledge Representation

Semantic Data and Web of Data

Semantic data is computer-understandable data
» e.g., representing hotels as real-world entities and their addresses as attributes, in Semantic Web languages using standard vocabularies
» e.g., representing each municipality of Trento as a part (part_meronym) of the province, which gives entity-entity connectivity within a dataset

The Semantic Web is a web of interconnected datasets
» one data element can point to another (through URIs), rather than one webpage pointing to another, forming a web of data
» the Web infrastructure provides a data model in which the description of a single entity can be distributed over the Web
» the coherence of the data model is part of the Web infrastructure
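A minimal Python sketch of these ideas, under stated assumptions: the example.org URIs and the property names are hypothetical, and plain tuples stand in for a real Semantic Web language. It shows a municipality linked to its province by a part-of statement, and the description of one entity being assembled from two separately published datasets.

```python
# Hypothetical identifiers: the example.org URIs are illustrative only.
TRENTO = "http://example.org/place/ProvinceOfTrento"
ROVERETO = "http://example.org/place/Rovereto"
PART_OF = "http://example.org/vocab/partOf"
NAME = "http://example.org/vocab/name"

# Two independently published datasets, each a set of
# (subject, predicate, object) statements.
dataset_a = {(ROVERETO, PART_OF, TRENTO)}
dataset_b = {
    (ROVERETO, NAME, "Rovereto"),
    (TRENTO, NAME, "Autonomous Province of Trento"),
}


def describe(uri, *datasets):
    """Collect every (predicate, object) pair stated about `uri` in any dataset."""
    return {(p, o) for ds in datasets for (s, p, o) in ds if s == uri}


# Because both datasets use the same URI for Rovereto, its description
# can be put together even though it is spread over two sources.
description = describe(ROVERETO, dataset_a, dataset_b)
```

The shared URI is what makes the merge possible; with page-to-page links only, a program would have no reliable way to tell that both sources talk about the same municipality.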

Page 10: Logics for Data and  Knowledge Representation

Linked Data

The Linked Data approach forms the basis of data publishing guidelines
» pinpointing how data from the government, public and private sectors can be made more valuable for consumers

The Linked Data approach came up with
» a set of principles
» the star rating system

Principles
» the use of HTTP URIs as the identifiers of things (concepts, entities and attributes)
» the provision of meaningful content, published in RDF, for each such URI reference
» the production of navigable content via links

Page 11: Logics for Data and  Knowledge Representation

Linked Open Data

The star rating system rates published data on a scale from 1 star to 5 stars
» Getting 1 star requires publishing data on the Web with an open license, regardless of format (e.g., datasets can be published as images); this is also called Open Data
» Producing 2-star data requires the Open Data to be made available in a structured format (e.g., Excel, which is proprietary) so that it becomes machine readable
» Producing 3-star data requires non-proprietary formats (e.g., CSV or TSV) on top of the previous rating levels
» Getting 4 stars requires publishing data using W3C open standards (e.g., RDF)
» Achieving 5 stars, the highest level in the rating spectrum, demands establishing links to RDF datasets published by others

A dataset that reaches 5 stars is also called Linked Open Data
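The cumulative nature of the star levels can be sketched in Python as follows. This is an illustrative reading of the slide, not an official scoring tool: the `Dataset` class and its field names are hypothetical, and each level is assumed to require all the previous ones.

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    """Hypothetical description of a published dataset."""
    on_web_with_open_license: bool   # 1 star
    structured_format: bool          # 2 stars (e.g., Excel)
    non_proprietary_format: bool     # 3 stars (e.g., CSV or TSV)
    uses_w3c_standards: bool         # 4 stars (e.g., RDF)
    links_to_other_rdf_data: bool    # 5 stars


def star_rating(d: Dataset) -> int:
    """Count the levels satisfied in order, stopping at the first one that fails."""
    criteria = [
        d.on_web_with_open_license,
        d.structured_format,
        d.non_proprietary_format,
        d.uses_w3c_standards,
        d.links_to_other_rdf_data,
    ]
    stars = 0
    for satisfied in criteria:
        if not satisfied:
            break
        stars += 1
    return stars


scanned_image = Dataset(True, False, False, False, False)
csv_file = Dataset(True, True, True, False, False)
linked_rdf = Dataset(True, True, True, True, True)
```

Under this reading, `star_rating(scanned_image)` is 1, `star_rating(csv_file)` is 3, and `star_rating(linked_rdf)` is 5, the level the slide calls Linked Open Data.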

Page 12: Logics for Data and  Knowledge Representation

A World of Entities

Entitypedia
Linked Entities

Page 13: Logics for Data and  Knowledge Representation

What is an entity?

We organize our world (ground) knowledge around entities
» Entities are objects so important in our everyday life that we refer to them by name
» Each entity has its own metadata (e.g., name, latitude, longitude, height, …)
» Each entity is in relation with many other entities (e.g., the Eiffel Tower is located in Paris, Fausto is a friend of Raffaella)
» There are relatively “few” commonsense entity types (person, …, event)
» There are many application/focus-dependent entities (artifacts, maths, …)

Eiffel Tower

Page 14: Logics for Data and  Knowledge Representation

Entitypedia – the key ideas

• Clear separation between the
  – knowledge (about entities/instances) and the
  – language (classes/concepts) used to express the knowledge
• Language as very carefully designed (1)
  – Linguistic resource (WordNet + (CoreLex + homographs) + multiple NLs)
  – … + a faceted domain knowledge organization infrastructure, developed using the analytico-synthetic approach (extending the Library Science PMEST/DEPA frameworks)
• Knowledge as very carefully designed (2)
  – Lattice of entity types (attributes, relations, services)
  – … unifying most (all?) standards (de jure, de facto) (Dublin Core, FOAF, Facebook, …)
• Direct linear time encoding into RDF/DL (3)
  – but (!) with fine-tuned, very fast data structures (for search, entity matching, …)
• (Relatively) large scale bootstrapping + continuous evolution (4)
  – via system-sourcing and crowd-sourcing (under study now)
• Data certification (5)
  – … via quality certification pipeline (under study now)

Page 15: Logics for Data and  Knowledge Representation

Natural language and formal language

AUTOMOBILE CAR MACCHINA

The same concept can be expressed in different ways in the same language and across languages

Different languages and terminology

Page 16: Logics for Data and  Knowledge Representation

Formal language: domains

DERA domains (D for Domain) organize the (formal concept) language into any number of domains (“any area of knowledge, chosen subjectively, that we want to reason or communicate about”). Examples: medicine, music, pop music, people, movies, skiing, my garden, …

[Figure: a fragment of the Space Domain, with the labels LOCATION, MONUMENT, BODY OF WATER, RIVER, EIFFEL TOWER, COLOSSEUM, GARDA LAKE, MISSISSIPPI and AMAZON RIVER]

» Inspired by Ranganathan's faceted approach
» Following precise design principles (analytico-synthetic approach)
» Organize entities as classes of similar objects
» Independent of the specific chosen domains
» Lattice of (overlapping) domains
» Top level domain = upper level ontology

Page 17: Logics for Data and  Knowledge Representation

Formal language: Facets

» A DERA Domain contains any number of facets (a facet is a hierarchy of terms, each denoting an atomic concept, often corresponding to a NL multiword)
» A DERA Facet is of one of three types (E for Entity, R for Relation, A for Attribute)

[Figure: a fragment of an entity facet in the Space Domain, with the labels LOCATION, MONUMENT, BODY OF WATER, RIVER, EIFFEL TOWER, COLOSSEUM, GARDA LAKE, MISSISSIPPI and AMAZON RIVER]

» Entity: see picture (classes of entities and entities)
» Relation: far, near, east, … with roles playing the double role of entity and relation
» Attribute: qualities / quantities (high, low, 23m), descriptive attributes (“India is a democratic country”)

Page 18: Logics for Data and  Knowledge Representation

User interface

Page 19: Logics for Data and  Knowledge Representation

Knowledge

» A set of entity types, each entity type defined in terms of:
  ˃ Attributes (e.g., height, latitude)
  ˃ Relations (e.g., locatedIn, friend)
  ˃ Services (e.g., computeAge, computeFoFs, computeInverseRelation, …)
  ˃ Many (categories of) meta-attributes (e.g., mandatory, identifying, permanent, timespan, provenance, …)

» Entity types organized in a lattice
  ˃ coherent with the domain lattice
  ˃ with an ordering on <attributes, relations, services> but also subsumption, value ranges, …

» Entities:
  ˃ A name and a URI
  ˃ Etype <attributes, relations, services> plus free
  ˃ One reference etype and many induced etypes

Page 20: Logics for Data and  Knowledge Representation

Knowledge services

» CRUD on entities
» EntitySearch(“metadata of E1”)    (* useful in NER *)
» EntityMatch(E1, E2)
» Etypes(“some element of an entity”)
» Extension(etype)    (* same as search(etype) *)
» Navigate(E1, R)    (* Navigate(Fausto, Friends) *)
» Distance(E1, E2, R)    (* Distance(Fausto, Obama, Friend) *)
» …
» … many etype and application dependent services
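To make two of these services concrete, here is a minimal Python sketch of Navigate and Distance. The slides give only the service names and example calls, so the semantics below are assumptions: Navigate is taken to return the entities directly related to E1 through R, and Distance the smallest number of R steps from E1 to E2, found with a breadth-first search. The relation data is made up, and "Carla" is a hypothetical entity.

```python
from collections import deque

# Made-up relation data: relation name -> entity -> directly related entities.
RELATIONS = {
    "Friend": {
        "Fausto": {"Raffaella"},
        "Raffaella": {"Fausto", "Carla"},
        "Carla": {"Raffaella"},
    }
}


def navigate(entity, relation):
    """Return the entities directly related to `entity` through `relation`."""
    return set(RELATIONS.get(relation, {}).get(entity, set()))


def distance(e1, e2, relation):
    """Return the fewest `relation` steps from e1 to e2, or None if unreachable."""
    if e1 == e2:
        return 0
    seen = {e1}
    queue = deque([(e1, 0)])
    while queue:
        current, steps = queue.popleft()
        for neighbour in navigate(current, relation):
            if neighbour == e2:
                return steps + 1
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, steps + 1))
    return None
```

With this data, `navigate("Fausto", "Friend")` yields only Raffaella, and `distance("Fausto", "Carla", "Friend")` is 2 because Carla is a friend of a friend. An entity that does not appear in the data is reported as unreachable (`None`) rather than as an error.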

Page 21: Logics for Data and  Knowledge Representation

Entity type lattice

Page 22: Logics for Data and  Knowledge Representation

Some examples of etypes

ENTITY
  Name          String [ ]
  Description   SString [ ]
  Part Of       <Entity>
  Homepage      URL [ ]
  Start         Moment
  End           Moment
  Duration      Duration

PHYSICAL ENTITY extends ENTITY
  Height        float
  Length        float
  Width         float
  Weight        float

LOCATION extends PHYSICAL ENTITY
  Latitude      float
  Longitude     float
  Altitude      float
  …

EVENT extends ABSTRACT ENTITY
  Participant   <Person> [ ] | <Organization> [ ]
  Location      <Location>
  Status        Enum <SString>
  …
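The "extends" chain of these etypes maps naturally onto class inheritance. The Python sketch below is only an illustration under assumptions: the class and field names are transliterated from the slide, `[ ]` is read as "list of values", SString and URL are approximated by plain strings, and the Start, End and Duration attributes are left out for brevity. Modelling "located in" as Part Of is also an assumption.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Entity:
    name: List[str] = field(default_factory=list)
    description: List[str] = field(default_factory=list)  # SString approximated by str
    part_of: Optional["Entity"] = None
    homepage: List[str] = field(default_factory=list)     # URL approximated by str


@dataclass
class PhysicalEntity(Entity):
    height: Optional[float] = None
    length: Optional[float] = None
    width: Optional[float] = None
    weight: Optional[float] = None


@dataclass
class Location(PhysicalEntity):
    latitude: Optional[float] = None
    longitude: Optional[float] = None
    altitude: Optional[float] = None


# A Location inherits the attributes of PhysicalEntity and Entity.
paris = Location(name=["Paris"])
eiffel_tower = Location(
    name=["Eiffel Tower"],
    part_of=paris,
    latitude=48.8584,
    longitude=2.2945,
)
```

Attributes that are not supplied stay `None`, which keeps "unknown" distinct from a real value such as 0.0.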

Page 23: Logics for Data and  Knowledge Representation

Example of entities

[Figure: a graph of entities. Entities: ETH Zurich, Albert Einstein, Mileva Maric, Ulm, Germany. Entity types: UNIVERSITY, SCIENTIST, PERSON, CITY, COUNTRY. Relations: part-of, birth place, spouse, affiliation]

Page 24: Logics for Data and  Knowledge Representation

A critical issue: dot-objects

[Figure: ETH Zurich as UNIVERSITY (as organization) and as UNIVERSITY (as building)]

Some entities have a clear inherent polysemy (Pustejovsky)

» According to the situation either one aspect or the other (typically the physical or abstract aspect) of the entity is emphasized. This generates polysemy in language.

» Since it depends on the situation, it would be wrong to permanently disambiguate it in one or the other way

» We need a systematic way to represent these entities

Page 25: Logics for Data and  Knowledge Representation

Encoding into RDF

» Choose (sub)domain
» E facet translates into TBox concept subsumption axioms (e.g., river ⊑ “body of water”)
» R facet translates into TBox role subsumption (e.g., parentOf ⊒ fatherOf)
» A facet translates into TBox subsumption (e.g., angularDistance ⊒ latitude)
» Entity properties translate into ABox axioms (e.g., livesIn(Fausto, Trento))

NOTE: Used only for interoperability, open data, …; reasoning is done on native data structures, as specific-purpose services
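A rough Python sketch of this translation, under stated assumptions. The slide's examples are read as: river is subsumed by body of water, fatherOf is a sub-role of parentOf, and latitude is a sub-attribute of angularDistance. The choice of `rdfs:subClassOf` and `rdfs:subPropertyOf` as target vocabulary is an assumption, since the slides do not name the RDF terms used, and the short names stand in for full URIs.

```python
# Assumed target vocabulary (RDF Schema terms, written as prefixed names).
SUBCLASS_OF = "rdfs:subClassOf"
SUBPROPERTY_OF = "rdfs:subPropertyOf"


def encode_facet(narrower_to_broader, predicate):
    """Emit one subsumption triple per (narrower, broader) pair in a facet."""
    return [
        (narrower, predicate, broader)
        for narrower, broader in narrower_to_broader.items()
    ]


# Facets as narrower -> broader mappings (illustrative fragments only).
e_facet = {"River": "BodyOfWater"}
r_facet = {"fatherOf": "parentOf"}
a_facet = {"latitude": "angularDistance"}

# Entity properties become plain assertions (ABox).
entity_properties = [("Fausto", "livesIn", "Trento")]

triples = (
    encode_facet(e_facet, SUBCLASS_OF)
    + encode_facet(r_facet, SUBPROPERTY_OF)
    + encode_facet(a_facet, SUBPROPERTY_OF)
    + entity_properties
)
```

Each facet edge and each entity property is visited once, so the number of steps grows linearly with the size of the input, in the spirit of the linear-time encoding mentioned earlier in the deck.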

Page 26: Logics for Data and  Knowledge Representation

Features of a Semantic Web

A radically new way of thinking about representing information for better results and better management

The Web is characterized by the AAA slogan (Anyone can say Anything about Any topic)

On the Semantic Web any individual has to be allowed to contribute a piece of data about some entity that can be linked to information from other sources

This requirement was taken into account while designing RDF
» a consequence is that there is always one more thing that could be known (something new that someone will express): the Open World Assumption
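The practical difference the Open World Assumption makes can be illustrated with a small Python sketch. The function names and the use of `None` for "unknown" are choices made for this example only; the second statement is a made-up contribution from another source.

```python
# Statements known so far, as (subject, predicate, object) tuples.
known = {("Fausto", "livesIn", "Trento")}


def closed_world(statement, facts):
    """Closed world: whatever is not stated is taken to be false."""
    return statement in facts


def open_world(statement, facts):
    """Open world: whatever is not stated is unknown (None), not false."""
    return True if statement in facts else None


question = ("Fausto", "worksIn", "Trento")

before_closed = closed_world(question, known)  # False
before_open = open_world(question, known)      # None, i.e. unknown

# Anyone can say Anything about Any topic: another source adds a statement.
known_later = known | {question}
after_open = open_world(question, known_later)  # True
```

Under the open-world reading, the later statement does not contradict the earlier answer; it only replaces "unknown" with "true".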

Page 27: Logics for Data and  Knowledge Representation

RDF

RDF (Resource Description Framework)
– A language for representing data in the Semantic Web
– A simple data model for making statements
– The capability to perform inference on the statements

Data model in RDF
– The data model in RDF is a graph data model
– An edge together with the two nodes it connects forms a triple
– Triple elements are subject, predicate and object

RDF representation
– URIs to identify subjects, objects and predicates
– Objects can also be literals
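The triple model can be sketched in a few lines of Python. This is a simplified illustration, not a complete RDF implementation: the `URI` and `Literal` classes are hypothetical helpers, the example.org URIs are made up, and the serializer covers only plain string literals in an N-Triples-like form.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class URI:
    value: str


@dataclass(frozen=True)
class Literal:
    value: str


def term(t):
    """Write a URI in angle brackets and a literal in double quotes."""
    if isinstance(t, URI):
        return f"<{t.value}>"
    escaped = t.value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def to_ntriples(triples):
    """Serialize (subject, predicate, object) triples, one statement per line."""
    return "\n".join(f"{term(s)} {term(p)} {term(o)} ." for s, p, o in triples)


einstein = URI("http://example.org/AlbertEinstein")   # hypothetical URI
ulm = URI("http://example.org/Ulm")                   # hypothetical URI
birth_place = URI("http://example.org/birthPlace")    # hypothetical property
foaf_name = URI("http://xmlns.com/foaf/0.1/name")     # FOAF vocabulary

graph = [
    (einstein, birth_place, ulm),                     # object is another node
    (einstein, foaf_name, Literal("Albert Einstein")),  # object is a literal
]

lines = to_ntriples(graph).splitlines()
```

The first statement is an edge between two URI-identified nodes, while the second ends in a literal value, matching the two kinds of object named on the slide.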

Page 28: Logics for Data and  Knowledge Representation

References

T. Berners-Lee, J. Hendler, and O. Lassila. The Semantic Web. Scientific American 284, 34–43, May 2001.

G. Antoniou and F. van Harmelen. A Semantic Web Primer (Cooperative Information Systems). MIT Press, Cambridge, MA, USA, 2004.

F. Giunchiglia, F. Farazi, L. Tanca, and R. D. Virgilio. The semantic web languages. In Semantic Web Information management, a model based perspective. Roberto de Virgilio, Fausto Giunchiglia, Letizia Tanca (Eds.), Springer, 2009.

D. Allemang and J. Hendler. Semantic web for the working ontologist: modeling in RDF, RDFS and OWL. Morgan Kaufmann Elsevier, Amsterdam, NL, 2008.

T. Berners-Lee. Linked Data. Design Issues for the World Wide Web - W3C, http://www.w3.org/DesignIssues/LinkedData.html, 2006.