Releasing Relational Data to the Semantic Web

Alex Miller


Post on 20-Aug-2015



• Semantic web
• Relational data
• Federation
• HREIW
• Analytics

2

3

There are things we wish to describe.

We need some way to identify each thing.

4

On the web, we identify things with a URI.

5

A URI is about "identifying" things; a URL is about "locating" them.

6

Prefix:
dbp: http://dbpedia.org/resource/

dbp:Chicago
dbp:The_Blues_Brothers_(film)
dbp:Wrigley_Field
dbp:Chicago_Cubs
dbp:Barack_Obama
dbp:Pizza
dbp:Chicago_(band)

Things are more interesting if we relate them.

Relationships are also described by a URI.

7

8

[Diagram: the resources dbp:Chicago, dbp:The_Blues_Brothers_(film), dbp:Wrigley_Field, dbp:Chicago_Cubs, dbp:Barack_Obama, dbp:Pizza, and dbp:Chicago_(band), connected by edges labeled movie:film_location, dbpo:location, dbpo:owner, and dbpo:residence.]

Prefixes:
dbp: http://dbpedia.org/resource/
dbpo: http://dbpedia.org/ontology/

Triple

<subject> <predicate> <object>

9

10

[Diagram: the graph of dbp: resources and labeled edges (movie:film_location, dbpo:location, dbpo:owner, dbpo:residence), with one edge annotated Subject, Predicate, Object.]

11

dbp:Wrigley_Field    dbpo:location    dbp:Chicago
<subject>            <predicate>      <object>
resource             resource         resource or value
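The triple above can be sketched in plain Python as a three-element tuple of full URIs. This is only an illustration: the `PREFIXES` table reuses the prefixes from the slides, while the `expand` helper is a hypothetical name, not part of any RDF library.

```python
# Prefixes taken from the slides.
PREFIXES = {
    "dbp": "http://dbpedia.org/resource/",
    "dbpo": "http://dbpedia.org/ontology/",
}


def expand(curie: str) -> str:
    """Expand a prefixed name such as 'dbp:Chicago' into a full URI."""
    prefix, local = curie.split(":", 1)
    return PREFIXES[prefix] + local


# One triple: (subject, predicate, object), each identified by a URI.
triple = tuple(
    expand(term)
    for term in ("dbp:Wrigley_Field", "dbpo:location", "dbp:Chicago")
)
subject, predicate, obj = triple
```

A set of such tuples is enough to model a small RDF graph for experimentation.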

12


13

Congratulations! You now know RDF.

14

If things and relationships can be defined by any URI, how do we know what we're talking about?

15

We need metadata.

16

Specifically, we need a vocabulary of common terms that describe our data.

17

A class describes a group of things that share common properties.

18

dbp:Chicago          is a    ex:City
dbp:Saint_Louis      is a    ex:City
dbp:San_Francisco    is a    ex:City

Prefixes:
dbp: http://dbpedia.org/resource/
ex: http://example.org/ontology/
rdf: http://www.w3.org/1999/02/22-rdf-syntax-ns#
rdfs: http://www.w3.org/2000/01/rdf-schema#

19

dbp:Chicago          rdf:type    ex:City
dbp:Saint_Louis      rdf:type    ex:City
dbp:San_Francisco    rdf:type    ex:City

20

dbp:Chicago          rdf:type    ex:City
dbp:Saint_Louis      rdf:type    ex:City
dbp:San_Francisco    rdf:type    ex:City
ex:City              rdf:type    rdfs:Class

21

ex:City        rdfs:subClassOf    ex:Location
ex:City        rdf:type           rdfs:Class
ex:Location    rdf:type           rdfs:Class
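One consequence of rdfs:subClassOf is that anything typed as ex:City can also be treated as an ex:Location. The sketch below illustrates that inference over tuples of prefixed names; the function names are hypothetical and this is not a full RDFS reasoner.

```python
def superclasses(cls, sub_class_of):
    """Return cls plus every class reachable through rdfs:subClassOf."""
    seen, stack = set(), [cls]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(sub_class_of.get(current, ()))
    return seen


def infer_types(triples, sub_class_of):
    """Add an rdf:type triple for every superclass of each stated type."""
    inferred = set(triples)
    for s, p, o in triples:
        if p == "rdf:type":
            for cls in superclasses(o, sub_class_of):
                inferred.add((s, "rdf:type", cls))
    return inferred


# Class hierarchy from the slide: ex:City rdfs:subClassOf ex:Location.
SUB_CLASS_OF = {"ex:City": ["ex:Location"]}
data = {("dbp:Chicago", "rdf:type", "ex:City")}
result = infer_types(data, SUB_CLASS_OF)
```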

22

Classes let us talk about kinds of things. Now we need some way to describe attributes.

23

dbp:Chicago    rdf:type      ex:City
dbp:Chicago    ex:founded    1837
dbp:Chicago    ex:country    dbp:United_States

24

dbp:Chicago    rdf:type       ex:City
dbp:Chicago    ex:founded     1837

ex:founded     rdf:type       rdf:Property
ex:founded     rdfs:domain    ex:City
ex:founded     rdfs:range     xsd:gYear

25

Congratulations! You now know RDF Schema.

26

How do we find stuff in this data?

SPARQL

27

dbp:Chicago_Cubs     dbpo:owner       dbp:Wrigley_Field
dbp:Wrigley_Field    dbpo:location    dbp:Chicago

dbp:Chicago_Cubs     rdf:type         ex:Baseball_Team
dbp:Wrigley_Field    rdf:type         ex:Stadium
dbp:Chicago          rdf:type         ex:City

28

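[Diagram: the same graph shape with variables in place of resources. ?owner, ?stadium, and ?city are linked by dbpo:owner and dbpo:location; ?stadium has rdf:type ex:Stadium and ?city has rdf:type ex:City.]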

29

?owner dbpo:owner ?stadium .
?stadium dbpo:location ?city .
?stadium rdf:type ex:Stadium .
?city rdf:type ex:City .

30

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dbpo: <http://dbpedia.org/ontology/>
PREFIX ex: <http://example.org/ontology/>

SELECT ?owner ?stadium ?city
WHERE {
  ?owner dbpo:owner ?stadium .
  ?stadium dbpo:location ?city .
  ?stadium rdf:type ex:Stadium .
  ?city rdf:type ex:City .
}
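To show what matching such a graph pattern involves, here is a minimal sketch of a pattern matcher in plain Python. It assumes triples are tuples of prefixed names and that variables start with `?`; the `match` and `solve` helpers are hypothetical, and a real SPARQL engine does far more (optimization, filters, and so on).

```python
def match(pattern, triple, binding):
    """Try to extend `binding` so that `pattern` matches `triple`."""
    new = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            # A variable must keep the value it was first bound to.
            if new.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return new


def solve(patterns, triples):
    """Return every variable binding that satisfies all patterns."""
    solutions = [{}]
    for pattern in patterns:
        extended = []
        for binding in solutions:
            for triple in triples:
                candidate = match(pattern, triple, binding)
                if candidate is not None:
                    extended.append(candidate)
        solutions = extended
    return solutions


# Data from the stadium slide.
GRAPH = [
    ("dbp:Chicago_Cubs", "dbpo:owner", "dbp:Wrigley_Field"),
    ("dbp:Wrigley_Field", "dbpo:location", "dbp:Chicago"),
    ("dbp:Chicago_Cubs", "rdf:type", "ex:Baseball_Team"),
    ("dbp:Wrigley_Field", "rdf:type", "ex:Stadium"),
    ("dbp:Chicago", "rdf:type", "ex:City"),
]

# The four triple patterns from the WHERE clause.
PATTERNS = [
    ("?owner", "dbpo:owner", "?stadium"),
    ("?stadium", "dbpo:location", "?city"),
    ("?stadium", "rdf:type", "ex:Stadium"),
    ("?city", "rdf:type", "ex:City"),
]

solutions = solve(PATTERNS, GRAPH)
```

Each solution is a dictionary from variable name to the resource it was bound to.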

31

SPARQL

• Unions
• Joins
• Outer joins
• Filter with criteria
• Project expressions
• Sort
• Duplicate removal
• Slice (limit / offset)
• Aggregates (grouping, etc)
• Subqueries


32

33

Sounds interesting. But my data is in a relational database!

Music Database

34

Musicians:

MID    First    Last         Inst_ID
1      Eddie    Van Halen    10
2      Yo Yo    Ma           20
3      Kenny    G            30

Instruments:

IID    Instrument    Type
10     Guitar        String
20     Cello         String
30     Saxophone     Woodwind

Musician Schema

35

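music:Musician      rdf:type       rdfs:Class
music:Instrument    rdf:type       rdfs:Class

Properties (each an rdf:Property):

music:firstName     rdfs:domain    music:Musician
music:lastName      rdfs:domain    music:Musician
music:plays         rdfs:domain    music:Musician
music:plays         rdfs:range     music:Instrument
music:instName      rdfs:domain    music:Instrument
music:instType      rdfs:domain    music:Instrument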

Triples From Tables

36

Turn each key into a resource and specify the proper type of each resource:

Musicians:

MID    First    Last         Inst_ID
1      Eddie    Van Halen    10
2      Yo Yo    Ma           20
3      Kenny    G            30

artist:1 rdf:type music:Musician
artist:2 rdf:type music:Musician
artist:3 rdf:type music:Musician

Instruments:

IID    Instrument    Type
10     Guitar        String
20     Cello         String
30     Saxophone     Woodwind

instrument:10 rdf:type music:Instrument
instrument:20 rdf:type music:Instrument
instrument:30 rdf:type music:Instrument

Triples From Tables

37

Turn each cell into a triple based on the key, property (mapped per column), and value:

Musicians:

MID    First    Last         Inst_ID
1      Eddie    Van Halen    10
2      Yo Yo    Ma           20
3      Kenny    G            30

artist:1 music:firstName "Eddie"
artist:1 music:lastName "Van Halen"
artist:2 music:firstName "Yo Yo"
artist:2 music:lastName "Ma"
artist:3 music:firstName "Kenny"
artist:3 music:lastName "G"

Instruments:

IID    Instrument    Type
10     Guitar        String
20     Cello         String
30     Saxophone     Woodwind

instrument:10 music:instName "Guitar"
instrument:10 music:instType "String"
instrument:20 music:instName "Cello"
instrument:20 music:instType "String"
instrument:30 music:instName "Saxophone"
instrument:30 music:instType "Woodwind"

Triples From Tables

38

Turn each foreign key reference into a relationship between the foreign and primary resources:

Musicians:

MID    First    Last         Inst_ID
1      Eddie    Van Halen    10
2      Yo Yo    Ma           20
3      Kenny    G            30

artist:1 music:plays instrument:10
artist:2 music:plays instrument:20
artist:3 music:plays instrument:30

Instruments:

IID    Instrument    Type
10     Guitar        String
20     Cello         String
30     Saxophone     Woodwind
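The three steps above (key to typed resource, cell to literal triple, foreign key to relationship) can be sketched for the Musicians table as follows. The row dictionaries mirror the table on the slide; the `musician_triples` function is a hypothetical helper written for this illustration.

```python
MUSICIANS = [
    {"MID": 1, "First": "Eddie", "Last": "Van Halen", "Inst_ID": 10},
    {"MID": 2, "First": "Yo Yo", "Last": "Ma", "Inst_ID": 20},
    {"MID": 3, "First": "Kenny", "Last": "G", "Inst_ID": 30},
]


def musician_triples(rows):
    """Convert Musicians rows into (subject, predicate, object) triples."""
    triples = []
    for row in rows:
        subject = f"artist:{row['MID']}"
        # Step 1: the primary key becomes a typed resource.
        triples.append((subject, "rdf:type", "music:Musician"))
        # Step 2: each value cell becomes a triple with a literal object.
        triples.append((subject, "music:firstName", row["First"]))
        triples.append((subject, "music:lastName", row["Last"]))
        # Step 3: the foreign key becomes a link to another resource.
        triples.append((subject, "music:plays", f"instrument:{row['Inst_ID']}"))
    return triples


triples = musician_triples(MUSICIANS)
```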

R2RML

39

• "Relational to RDF Mapping Language"

• RDB2RDF Working Group at W3C

• ETL "data transformation" use case

• Dynamic "query translation" use case

• SPARQL to SQL


R2RML Triple Mapping

40

Instruments:

IID    Instrument    Type
10     Guitar        String

Schema:

music:instName    rdfs:domain    music:Instrument
music:instType    rdfs:domain    music:Instrument

Mapping structure:

Triples Map (rr:tableName)
  Subject Map: "http://example.com/music/Inst-{iid}" (rr:class)
  Predicate Object Map
    Predicate Map (rr:predicate)
    Object Map (rr:column)

@prefix rr: <http://www.w3.org/ns/r2rml#> .
@prefix music: <http://example.com/music/> .
@prefix mapping: <http://example.com/ont/> .

# Vocabulary as shown on the slide. The final W3C R2RML Recommendation
# names these rr:TriplesMap and rr:logicalTable [ rr:tableName "..." ].
mapping:InstrumentMapping a rr:TriplesMapClass ;
    rr:tableName "Instruments" ;
    rr:subjectMap [
        rr:template "http://example.com/music/Inst-{iid}" ;
        rr:class music:Instrument
    ] ;
    rr:predicateObjectMap [
        rr:predicateMap [ rr:predicate music:instName ] ;
        rr:objectMap [ rr:column "instrument" ]
    ] ;
    rr:predicateObjectMap [
        rr:predicateMap [ rr:predicate music:instType ] ;
        rr:objectMap [ rr:column "type" ]
    ] .
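To make the mapping concrete, the sketch below applies the same information (table, subject template, class, and column-to-predicate pairs) to one row. The `MAPPING` dictionary and `apply_mapping` function are hypothetical stand-ins for illustration, not an R2RML processor; a real processor also handles details such as IRI-safe encoding of template values.

```python
# A plain-Python stand-in for the mapping above (not real R2RML).
MAPPING = {
    "table": "Instruments",
    "subject_template": "http://example.com/music/Inst-{iid}",
    "subject_class": "music:Instrument",
    # (predicate, column) pairs, one per predicate-object map.
    "predicate_object_maps": [
        ("music:instName", "instrument"),
        ("music:instType", "type"),
    ],
}


def apply_mapping(mapping, rows):
    """Yield the triples that the mapping produces for each row."""
    for row in rows:
        # Fill the {column} placeholders in the subject template.
        subject = mapping["subject_template"].format(**row)
        yield (subject, "rdf:type", mapping["subject_class"])
        for predicate, column in mapping["predicate_object_maps"]:
            yield (subject, predicate, row[column])


rows = [{"iid": 10, "instrument": "Guitar", "type": "String"}]
triples = list(apply_mapping(MAPPING, rows))
```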

41

SPARQL translation

42

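[Diagram: a SPARQL query is translated to SQL using the R2RML mapping; the SQL runs against the database, and the SQL results are returned as SPARQL solutions.]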
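As a rough illustration of query translation, the sketch below turns a single triple pattern into SQL and runs it with Python's built-in `sqlite3` module. The `PREDICATE_TO_COLUMN` table and `pattern_to_sql` function are assumptions made for this example; production translators handle joins, types, and full SPARQL algebra.

```python
import sqlite3

# Hypothetical mapping: predicate -> (table, key column, value column).
PREDICATE_TO_COLUMN = {
    "music:firstName": ("Musicians", "MID", "First"),
    "music:lastName": ("Musicians", "MID", "Last"),
}


def pattern_to_sql(predicate, literal=None):
    """Translate `?s <predicate> ?o` (or a fixed literal object) to SQL."""
    table, key, column = PREDICATE_TO_COLUMN[predicate]
    sql = f'SELECT "{key}", "{column}" FROM "{table}"'
    params = ()
    if literal is not None:
        # A fixed object becomes a WHERE criterion on the mapped column.
        sql += f' WHERE "{column}" = ?'
        params = (literal,)
    return sql, params


conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE "Musicians" '
    '("MID" INTEGER PRIMARY KEY, "First" TEXT, "Last" TEXT, "Inst_ID" INTEGER)'
)
conn.executemany(
    'INSERT INTO "Musicians" VALUES (?, ?, ?, ?)',
    [(1, "Eddie", "Van Halen", 10), (2, "Yo Yo", "Ma", 20), (3, "Kenny", "G", 30)],
)

# Pattern: ?s music:lastName "Ma"
sql, params = pattern_to_sql("music:lastName", literal="Ma")
# Turn SQL result rows back into SPARQL-style solutions.
solutions = [
    {"?s": f"artist:{mid}", "?o": value}
    for mid, value in conn.execute(sql, params)
]
```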


43

SPARQL Protocol

44

• Standard HTTP API for calling a SPARQL processor

• Supported by all major triple stores and query processors
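A SPARQL Protocol query can be sent as an HTTP GET with the query text in a `query` parameter. The sketch below only builds such a request with the standard library and does not send it; the endpoint URL `http://example.org/sparql` is a placeholder, not a real service.

```python
import urllib.parse
import urllib.request

ENDPOINT = "http://example.org/sparql"  # placeholder endpoint (assumption)

query = "SELECT ?s WHERE { ?s ?p ?o } LIMIT 5"

# The query string goes in the URL-encoded `query` parameter.
url = ENDPOINT + "?" + urllib.parse.urlencode({"query": query})

# Ask for results in the SPARQL JSON results format.
request = urllib.request.Request(
    url, headers={"Accept": "application/sparql-results+json"}
)

# To actually run it against a live endpoint:
#   with urllib.request.urlopen(request) as response:
#       body = response.read()
```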

SPARQL Federation

45

SELECT ?artist ?song ?buyLink
WHERE {
  SERVICE <http://listening> {
    ?listened rdf:type listen:event .
    ?listened listen:artist ?artist .
    ?listened listen:song ?song
  }
  OPTIONAL {
    SERVICE <http://amazon> {
      ?isbn rdf:type amaz:mp3 .
      ?isbn amaz:artist ?artist .
      ?isbn amaz:song ?song .
      ?isbn amaz:link ?buyLink
    }
  }
}

Call SPARQL endpoint that tracks your listening (like last.fm)

Call Amazon endpoint to get info on where to download the song.

Return Federated data
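Conceptually, the federator combines the solutions from the two services with a left outer join, which is what OPTIONAL means. Here is a minimal sketch of that join; the sample solutions and the `left_join` helper are made up for illustration.

```python
def compatible(a, b):
    """Two solutions are compatible if they agree on shared variables."""
    return all(a[k] == b[k] for k in a.keys() & b.keys())


def left_join(left, right):
    """Keep every left solution, extended by right matches when any exist."""
    joined = []
    for l in left:
        matches = [{**l, **r} for r in right if compatible(l, r)]
        joined.extend(matches or [l])
    return joined


# Hypothetical results from the listening service.
listened = [
    {"?artist": "Artist A", "?song": "Song X"},
    {"?artist": "Artist B", "?song": "Song Y"},
]
# Hypothetical results from the store service.
store = [
    {"?artist": "Artist A", "?song": "Song X",
     "?buyLink": "http://example.org/buy/1"},
]

federated = left_join(listened, store)
```

The second song has no match in the store, so it is returned without a `?buyLink` binding.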

Service Descriptions

46

Federator

47

[Diagram: the Federator and the sources it connects to. Components shown: R2RML Endpoint, Web Endpoint, SPARQL Endpoint (twice), Triple Store, Database, DBpedia, and an ontology and service registry.]

Named graph mapping

• Services can provide named graphs, described in their service description

• Federator lets you create federated named graphs that map to service named graphs

48

Data integration

• Performance - data volume from sources is key

• Source capabilities

• Source statistics

49

Performance concerns: data volume

50

Domain

SELECT ... FILTER (?age >= 24) ...

WHERE Person.age >= 24

Reduction factors:
• criteria
• minimal projection
• aggregation
• joins (sometimes)
• dup removal

[Diagram labels: Query, Results]
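The effect of pushing criteria down to the source can be sketched with `sqlite3`: filtering at the source returns fewer rows than fetching everything and filtering afterwards. The `Person` rows here are invented sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Person (name TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO Person VALUES (?, ?)",
    [("Ann", 31), ("Bob", 19), ("Cy", 24), ("Dee", 17)],  # sample data
)

# Without pushdown: every row crosses the wire, then gets filtered.
all_rows = conn.execute("SELECT name, age FROM Person").fetchall()
filtered_late = [row for row in all_rows if row[1] >= 24]

# With pushdown: the FILTER becomes a WHERE clause at the source.
pushed_down = conn.execute(
    "SELECT name, age FROM Person WHERE age >= 24"
).fetchall()

rows_saved = len(all_rows) - len(pushed_down)
```

Both approaches give the same answer; the difference is how much data leaves the source.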

Performance concerns: federated joins

51

Data source capabilities

• SQL support

• Function support

• Function translation

• Inverse functions

• Data type mappings and translations

52

Data source statistics

• Table cardinality

• Column selectivity

• Column null density

• Join selectivity

53


54

HREIW - HR Analysis

55

“Which Marines that speak French and/or French Creole have had at least six months since their last deployment?”

56

“How many discharges were the result of the Don’t Ask Don’t Tell policy per year?”

57

“What is the average length of service for soldiers deployed in Afghanistan vs Iraq?”

58

Where’s the data?

59

Ontologies

HR Domain

Mapping

Sources

HR Standards

60

Technologies

HR Domain

Mapping

Sources

HR Standards

61

Ontology development

Analytics

SPARQL Federation

SPARQL to database

Rules

Collaborative Ontologies

[Diagram: an ontologist and subject matter experts collaborate on the domain ontology. Activities shown: model, diagram, wiki, discuss.]

62

Ontology Visualization

63


64

RIF

• Rule Interchange Format, W3C recommendation

• Rule = IF - THEN statement

• Used to derive new triples from existing triples

• Dialects

  • Core

  • Framework for Logic Dialects (FLD)

  • Basic Logic Dialect (BLD)

  • Production Rules Dialect (PRD)

• Rex - Revelytix RIF Core implementation
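The idea of deriving new triples from an IF-THEN rule can be sketched with simple forward chaining. This is plain Python, not RIF syntax and not how Rex works; the rule and the `ex:StringPlayer` class are hypothetical examples built on the music data used earlier.

```python
def string_player_rule(facts):
    """IF ?x plays ?i AND ?i has type "String" THEN ?x is a StringPlayer."""
    derived = set()
    for s, p, o in facts:
        if p == "music:plays" and (o, "music:instType", "String") in facts:
            derived.add((s, "rdf:type", "ex:StringPlayer"))
    return derived


def forward_chain(triples, rules):
    """Apply the rules repeatedly until no new triples are produced."""
    facts = set(triples)
    while True:
        new = set().union(*(rule(facts) for rule in rules)) - facts
        if not new:
            return facts
        facts |= new


FACTS = {
    ("artist:1", "music:plays", "instrument:10"),
    ("instrument:10", "music:instType", "String"),
    ("artist:3", "music:plays", "instrument:30"),
    ("instrument:30", "music:instType", "Woodwind"),
}

closure = forward_chain(FACTS, [string_player_rule])
```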

65

Dashboards

66


Enterprise Semantic Web

69

• Knoodl - collaborative ontology creation

• OntVis - ontology visualization (OWL)

• Spyder - SPARQL to SQL (RDF, R2RML)

• Federator - SPARQL federation (SPARQL 1.1, SPARQL Federation extensions)

• Rex - entailment with rules (RIF)

• Dashboards - analytics, visualization

More information

70

• Revelytix - http://revelytix.com

• Knoodl - http://knoodl.com

• OntVis - http://bit.ly/hLm3sd

• Spyder - http://revelytix.com/content/spyder

• Federator - beta coming soon...

• Rex - beta coming soon...