lecture linked data cloud & sparql

COMP3725 Knowledge Enriched Information Systems
Lecture 11-12: Linked Data & SPARQL
Dhavalkumar Thakker (Dhaval), School of Computing, University of Leeds

Uploaded by dhavalkumar-thakker on 20-Aug-2015




Reading & Reflections

Reading: Bizer et al., "Linked Data – The Story So Far"

• What is Linked Data?
– Is it the same as the Web of Data?

• What excited you most about Linked Data while reading this article? Or, what did you find most interesting?

• Is Linked Data happening in real life? Have you seen it anywhere?


Outline

• What is Linked Data?
• Why Linked Data?
• How to publish as part of Linked Data
– Linked Data Principles
– Finding existing sources
– Possible software architectures
– Query Language: SPARQL


About:
• United States
• Barack Obama
• Presidential Election (Past)
• Some relevance to currently held
• Democrats & Republicans
• Winner & Loser
• Chicago
• Etc.

Web of Documents

THINGS
About:
• Location, Event, Places, Persons, Groups, Abstract concepts (winning, losing)


...people can parse documents and extract meaning.


The web of documents

• Analogy – a global file system

• Designed for – human consumption

• Primary objects – documents

• Links between – documents (or sub-parts of them)

• Semantics – implicit


The web of documents: Issues

• A Web of Documents, but primarily about data – the connection is only implicit

• Integration & querying – e.g. "Show me all the news stories by US Presidents coming from Chicago."


Semantic Web

• We need to help machines understand the Web, so that machines can help us understand things.

• If machines have access to data about things (i.e. knowledge), they can do a better job of processing documents.


Linked Data

An Introduction to Linked Data – Tom Heath, Talis

[Figure: Linking Things – things connected to one another by relationship links]


Linked Data…

• …is about creating a global database of linked things

• …refers to a set of best practices for publishing and interlinking data on the Web…

• …is a method of publishing data on the Web so that it can be interlinked and become more useful.


The Web of Linked Data

• Analogy – a global database

• Designed for – machines first, humans later

• Primary objects – things (or descriptions of things)

• Links between – things

• Semantics – explicit


Linked Data: Technologies

• Prerequisites
– URIs
– HTTP
– RDF
– (RDFS/OWL)


Linked Data Technologies: URIs

• Like URLs, but not just for Web pages
– for things (cars, people, places, organisations, coursework, etc.)

• "A Uniform Resource Identifier (URI) provides a simple and extensible means for identifying a resource." -- RFC 3986

• Many different schemes – http://, ftp://, mailto:

• Examples:
http://imash.leeds.ac.uk/ontologies/foaf/dhaval/me.rdf
http://dbpedia.org/resource/University_of_Leeds
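To see how a URI's scheme and parts can be told apart programmatically, here is a small sketch using Python's standard urllib.parse. The two HTTP URIs are the slide's examples; the mailto: address is a hypothetical one added only to show a non-HTTP scheme.

```python
from urllib.parse import urlsplit

examples = [
    "http://imash.leeds.ac.uk/ontologies/foaf/dhaval/me.rdf",
    "http://dbpedia.org/resource/University_of_Leeds",
    "mailto:someone@example.org",  # hypothetical address, for illustration only
]

for uri in examples:
    parts = urlsplit(uri)
    # The scheme decides how the identifier can be looked up;
    # a mailto: URI has no network location (netloc is empty).
    print(parts.scheme, parts.netloc, parts.path)
```

Only the two http:// URIs can be looked up over HTTP, which is why the Linked Data principles ask for HTTP URIs specifically.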


HTTP

• Data access mechanism between web browsers (clients) and servers

• HTTP messages consist of requests from clients to servers and responses from servers to clients

• HTTP request methods: GET, POST, etc.
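As a rough illustration of an HTTP request, the sketch below builds (but does not send) a GET request with Python's standard urllib.request. The Accept header asking for Turtle is an assumption: it relies on the server supporting content negotiation, which not every Linked Data server does.

```python
import urllib.request

uri = "http://dbpedia.org/resource/University_of_Leeds"

# Build a GET request; the Accept header states which representation we want.
request = urllib.request.Request(uri, headers={"Accept": "text/turtle"}, method="GET")

print(request.get_method())          # GET
print(request.full_url)
print(request.get_header("Accept"))  # text/turtle

# Sending it would be urllib.request.urlopen(request), which needs network access.
```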


RDF

• Data format to describe things and their interrelations

• Based on triples: subject, predicate, object
– e.g. <The sky> <has the colour> <blue>
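A minimal sketch of the triple idea in Python, assuming plain strings for every term. This is a simplification, since real RDF distinguishes URIs, literals and blank nodes. A graph is then just a set of triples.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str  # named "obj" to avoid confusion with Python's built-in object


# The slide's example statement as one triple.
sky = Triple("The sky", "has the colour", "blue")

# A graph is a set of triples, so stating the same fact twice adds nothing.
graph = {sky, Triple("The sky", "has the colour", "blue")}
print(len(graph))  # 1
print(sky.subject, "|", sky.predicate, "|", sky.obj)
```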


RDF

From my profile in RDF:

dt:dhaval rdf:type foaf:Person
dt:dhaval foaf:name "Dhaval Thakker"
dt:dhaval foaf:based_near dbpedia:Leeds

Prefixes:
dt: <http://imash.leeds.ac.uk/ontologies/foaf/dhaval/me.rdf#>
rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
foaf: <http://xmlns.com/foaf/0.1/>
dbpedia: <http://dbpedia.org/resource/>
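Prefixed names such as foaf:name are only abbreviations. The sketch below expands them with the slide's prefix table. The `expand` helper is a hypothetical illustration and is deliberately simplistic: it does no validation of the local part.

```python
PREFIXES = {
    "dt": "http://imash.leeds.ac.uk/ontologies/foaf/dhaval/me.rdf#",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "foaf": "http://xmlns.com/foaf/0.1/",
    "dbpedia": "http://dbpedia.org/resource/",
}


def expand(curie: str) -> str:
    """Expand a prefixed name such as 'foaf:name' into a full URI."""
    prefix, _, local = curie.partition(":")
    if prefix not in PREFIXES:
        raise KeyError(f"unknown prefix: {prefix}")
    return PREFIXES[prefix] + local


profile = [
    ("dt:dhaval", "rdf:type", "foaf:Person"),
    ("dt:dhaval", "foaf:based_near", "dbpedia:Leeds"),
]
for s, p, o in profile:
    print(expand(s), expand(p), expand(o))
```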


Data Merging with RDF

From my profile in RDF:

dt:dhaval rdf:type foaf:Person
dt:dhaval foaf:name "Dhaval Thakker"
dt:dhaval foaf:based_near dbpedia:Leeds

From DBpedia:

dbpedia:Leeds dbp-prop:population "751,500"
dbpedia:Leeds dbp-prop:isPartOf dbpedia:West_Yorkshire

Prefixes:
dt: <http://imash.leeds.ac.uk/ontologies/foaf/dhaval/me.rdf#>
rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
foaf: <http://xmlns.com/foaf/0.1/>
dbpedia: <http://dbpedia.org/resource/>
dbp-prop: <http://dbpedia.org/ontology/>
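Because both sources use the same URI for Leeds, merging them is just a set union of their triples. Here is a minimal Python sketch with the slide's triples as strings. It ignores blank nodes, which in real RDF merging would have to be renamed apart. The `objects` helper is hypothetical.

```python
profile = {
    ("dt:dhaval", "rdf:type", "foaf:Person"),
    ("dt:dhaval", "foaf:name", '"Dhaval Thakker"'),
    ("dt:dhaval", "foaf:based_near", "dbpedia:Leeds"),
}
dbpedia = {
    ("dbpedia:Leeds", "dbp-prop:population", '"751,500"'),
    ("dbpedia:Leeds", "dbp-prop:isPartOf", "dbpedia:West_Yorkshire"),
}

# Merging graphs = set union; the shared URI dbpedia:Leeds joins the two sources.
merged = profile | dbpedia


def objects(graph, subject, predicate):
    return {o for s, p, o in graph if s == subject and p == predicate}


# Follow the link from the person to the place, then read a fact about the place.
for place in objects(merged, "dt:dhaval", "foaf:based_near"):
    print(place, objects(merged, place, "dbp-prop:isPartOf"))
```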



Linked Data Principles

• Use URIs as names for things
– anything, not just documents

• Use HTTP URIs
– globally unique names, distributed ownership
– allows people to look up those names

• Provide useful information in RDF
– when someone looks up a URI

• Include RDF links to other URIs
– to enable discovery of related information

Tim Berners-Lee, 2007, http://www.w3.org/DesignIssues/LinkedData.html




Provide useful information in RDF

From my profile in RDF:

dt:me rdf:type foaf:Person
dt:me foaf:name "Dhaval Thakker"
dt:me foaf:based_near dbpedia:Leeds

Prefixes:
dt: <http://imash.leeds.ac.uk/ontologies/foaf/dhaval/me.rdf#>
rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
foaf: <http://xmlns.com/foaf/0.1/>
dbpedia: <http://dbpedia.org/resource/>

dt:me expands to the URI http://imash.leeds.ac.uk/ontologies/foaf/dhaval/me.rdf#me
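The URI dt:me is a "hash URI". HTTP clients do not send the fragment (#me) to the server. Looking it up therefore retrieves the document me.rdf, and the fragment then picks out the person described inside it. A small sketch with Python's urldefrag:

```python
from urllib.parse import urldefrag

person_uri = "http://imash.leeds.ac.uk/ontologies/foaf/dhaval/me.rdf#me"

# Split off the fragment: the part before '#' is what actually gets requested.
document_url, fragment = urldefrag(person_uri)
print(document_url)  # http://imash.leeds.ac.uk/ontologies/foaf/dhaval/me.rdf
print(fragment)      # me
```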


RDF Is a Data Model, Not a Serialisation Format

• RDF serialisation formats: RDF/XML, Turtle, N-Triples

– RDF/XML

<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:foaf="http://xmlns.com/foaf/0.1/">
  <foaf:Person rdf:ID="me">
    <foaf:name>Dhavalkumar Thakker</foaf:name>
    <foaf:title>Dr</foaf:title>
    <foaf:based_near rdf:resource="http://dbpedia.org/resource/Leeds"/>
  </foaf:Person>
</rdf:RDF>


RDF Is a Data Model, Not a Serialisation Format

• RDF serialisation formats: RDF/XML, Turtle, N-Triples

– Turtle

@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix dt: <http://imash.leeds.ac.uk/ontologies/foaf/dhaval/me.rdf#> .

dt:me rdf:type foaf:Person ;
    foaf:name "Dhavalkumar Thakker" ;
    foaf:title "Dr" .


RDF Is a Data Model, Not a Serialisation Format

• RDF serialisation formats: RDF/XML, Turtle, N-Triples

– N-Triples

<http://imash.leeds.ac.uk/ontologies/foaf/dhaval/me.rdf#me> <http://xmlns.com/foaf/0.1/name> "Dhavalkumar Thakker" .
<http://imash.leeds.ac.uk/ontologies/foaf/dhaval/me.rdf#me> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://xmlns.com/foaf/0.1/Person> .
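N-Triples is simple enough to produce by hand: full URIs in angle brackets, literals in double quotes, and one triple per line ending in " .". Here is a minimal Python sketch. It makes a simplifying assumption: any string starting with http:// or https:// is treated as a URI and everything else as a plain literal, with no language tags or datatypes.

```python
def nt_term(term: str) -> str:
    """Render a term in N-Triples: URIs in <...>, anything else as a plain literal."""
    if term.startswith("http://") or term.startswith("https://"):
        return f"<{term}>"
    # Escape backslashes first, then quotes and newlines inside the literal.
    escaped = term.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def to_ntriples(triples) -> str:
    # One triple per line, each terminated by " ."
    return "\n".join(f"{nt_term(s)} {nt_term(p)} {nt_term(o)} ." for s, p, o in triples)


ME = "http://imash.leeds.ac.uk/ontologies/foaf/dhaval/me.rdf#me"
triples = [
    (ME, "http://xmlns.com/foaf/0.1/name", "Dhavalkumar Thakker"),
    (ME, "http://www.w3.org/1999/02/22-rdf-syntax-ns#type", "http://xmlns.com/foaf/0.1/Person"),
]
print(to_ntriples(triples))
```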



Including Links to other Things: Relationship Links

• Relationship Links point at related things in other data sources, for instance, other people, places or genes.

• For example, relationship links enable people to point to background information about the place they live, or to bibliographic data about the publications they have written.


Including Links to other Things: Relationship Links

From my profile in RDF:

dt:dhaval rdf:type foaf:Person
dt:dhaval foaf:name "Dhaval Thakker"
dt:dhaval foaf:based_near dbpedia:Leeds   (relationship link into DBpedia)

From DBpedia:

dbpedia:Leeds dbp-prop:population "751,500"
dbpedia:Leeds dbp-prop:isPartOf dbpedia:West_Yorkshire

Prefixes:
dt: <http://imash.leeds.ac.uk/ontologies/foaf/dhaval/me.rdf#>
rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
foaf: <http://xmlns.com/foaf/0.1/>
dbpedia: <http://dbpedia.org/resource/>
dbp-prop: <http://dbpedia.org/ontology/>


Including Links to other Things: Identity Links

• Different URIs may refer to the same object:

<URI1> in one dataset is the same as <URI2> defined somewhere else

<http://dbpedia.org/resource/Kirkgate_Markets> <http://www.w3.org/2002/07/owl#sameAs> <http://rdf.freebase.com/ns/guid.9202a8c04000641f8000000000c5f680> .

• Such a need exists because of:
– Different opinions
– Traceability
– No central points of failure
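Since owl:sameAs is symmetric and transitive, a consumer can group linked URIs into equivalence classes and pick one canonical URI per class. Below is a minimal union-find sketch in Python. The `find`/`union` helpers are hypothetical, and only the Kirkgate Markets link from the slide is loaded.

```python
# Union-find over URIs connected by owl:sameAs links.
parent: dict[str, str] = {}


def find(uri: str) -> str:
    parent.setdefault(uri, uri)
    while parent[uri] != uri:
        parent[uri] = parent[parent[uri]]  # path halving keeps chains short
        uri = parent[uri]
    return uri


def union(a: str, b: str) -> None:
    ra, rb = find(a), find(b)
    if ra != rb:
        parent[rb] = ra


same_as_links = [
    ("http://dbpedia.org/resource/Kirkgate_Markets",
     "http://rdf.freebase.com/ns/guid.9202a8c04000641f8000000000c5f680"),
]
for a, b in same_as_links:
    union(a, b)

# Both URIs now resolve to the same canonical representative.
print(find("http://rdf.freebase.com/ns/guid.9202a8c04000641f8000000000c5f680"))
```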


Including Links to other Things: Vocabulary Links

• Reuse existing vocabularies to further specify your own

@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

<http://mydomain.co.uk/myvocab/enterprise#SmallMediumEnterprise>
    rdfs:subClassOf <http://dbpedia.org/ontology/Company> ;
    rdfs:subClassOf <http://umbel.org/umbel/sc/Business> ;
    rdfs:subClassOf <http://rdf.freebase.com/ns/m/0qb7t> .
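Vocabulary links pay off because rdfs:subClassOf is transitive: a reasoner can infer every superclass, not just the direct ones. The sketch below walks the subclass edges breadth-first in Python. The Company → Organisation edge is a hypothetical addition included only to show the transitive step.

```python
from collections import deque

SME = "http://mydomain.co.uk/myvocab/enterprise#SmallMediumEnterprise"
sub_class_of = {
    SME: {
        "http://dbpedia.org/ontology/Company",
        "http://umbel.org/umbel/sc/Business",
        "http://rdf.freebase.com/ns/m/0qb7t",
    },
    # Hypothetical extra edge, only to illustrate transitivity.
    "http://dbpedia.org/ontology/Company": {"http://dbpedia.org/ontology/Organisation"},
}


def all_superclasses(cls: str) -> set[str]:
    """Breadth-first walk over rdfs:subClassOf edges (transitive closure)."""
    seen: set[str] = set()
    queue = deque([cls])
    while queue:
        for sup in sub_class_of.get(queue.popleft(), set()):
            if sup not in seen:
                seen.add(sup)
                queue.append(sup)
    return seen


for sup in sorted(all_superclasses(SME)):
    print(sup)
```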


Linked Data Principles: Summary

• Use URIs as names for things
– anything, not just documents

• Use HTTP URIs
– globally unique names, distributed ownership
– allows people to look up those names

• Provide useful information in RDF
– when someone looks up a URI
– RDF serialisation formats: RDF/XML, N-Triples & Turtle

• Include RDF links to other URIs
– to enable discovery of related information
– include links: relationship, vocabulary & identity links


Finding Existing Datasets or Vocabularies

• All of the scenarios about including links to other things assume some knowledge of existing vocabularies/datasets

• Where to find such datasets?

• How to find such datasets?
– Two steps:
• Find datasets/vocabularies that contain certain things or concepts
• Once found, inspect their coverage and suitability


Where to Find: Web of Data

• A significant number of individuals and organisations have adopted Linked Data as a way to publish their data

• The result is a global data space we call the Web of Data

• The Web of Data forms a giant global graph consisting of billions of RDF triples from numerous sources covering all sorts of topics


Web of Data

http://richard.cyganiak.de/2007/10/lod/


Statistics about the Web of Data (2011)

Domain                   Datasets   Triples           Out-links     % of all out-links
Media                    25         1,841,852,061     50,440,705    10.01 %
Geographic               31         6,145,532,484     35,812,328    7.11 %
Government               49         13,315,009,400    19,343,519    3.84 %
Publications             87         2,950,720,693     139,925,218   27.76 %
Cross-domain             41         4,184,635,715     63,183,065    12.54 %
Life sciences            41         3,036,336,004     191,844,090   38.06 %
User-generated content   20         134,127,413       3,449,143     0.68 %
Total                    295        31,634,213,770    503,998,829

More statistics from: http://www4.wiwiss.fu-berlin.de/lodcloud/state/


Step 1: Finding existing datasets and vocabularies: publishing sites-> Data Hub

Available from: http://datahub.io/


Step 1: Finding existing datasets and vocabularies: search engines-> Sindice

Available from: http://sindice.com/



Step 1: Finding existing datasets and vocabularies: search engines-> Falcons

Available from: http://ws.nju.edu.cn/falcons/conceptsearch/index.jsp


Finding existing datasets and vocabularies: search engines-> Watson

Available from: http://kmi-web05.open.ac.uk/WatsonWUI/


Finding existing datasets and vocabularies: search engines-> Swoogle

Available from: http://swoogle.umbc.edu/


Step 1: Finding existing datasets and vocabularies: search engines-> SWSE

Available from: http://swse.deri.org/


Step 2: Once found, how to inspect further for coverage, suitability

• Linked Data sources usually provide a SPARQL endpoint for their dataset(s)

• A SPARQL endpoint is an access point to one or more datasets that can receive queries and return results

• If you have used MySQL, you might be familiar with phpMyAdmin
– SPARQL endpoints are similar in nature and functionality
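Under the SPARQL 1.1 Protocol, a query can be sent to an endpoint with HTTP GET, URL-encoded in a parameter named query. The Python sketch below builds such a request for DBpedia's endpoint (http://dbpedia.org/sparql) without sending it. Asking for the JSON results format via the Accept header is an assumption about what the endpoint will honour.

```python
import urllib.request
from urllib.parse import parse_qs, urlencode, urlsplit

ENDPOINT = "http://dbpedia.org/sparql"

query = """
PREFIX dbp-ont: <http://dbpedia.org/ontology/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?o WHERE { dbp-ont:Person rdfs:subClassOf ?o . }
"""

# The query text travels URL-encoded in the "query" parameter.
url = ENDPOINT + "?" + urlencode({"query": query})
request = urllib.request.Request(url, headers={"Accept": "application/sparql-results+json"})

# Check that the query survives encoding unchanged.
print(parse_qs(urlsplit(url).query)["query"][0] == query)  # True
# urllib.request.urlopen(request) would send it (network access required).
```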



DBpedia: Extracting Infobox

Wikipedia article: http://en.wikipedia.org/wiki/Calgary
DBpedia resource: http://dbpedia.org/resource/Calgary

<http://dbpedia.org/resource/Calgary>
    dbpedia:native_name "Calgary" ;
    dbpedia:altitude "1048" ;
    dbpedia:population_city "988193" ;
    dbpedia:population_metro "1079310" ;
    dbpedia:mayor_name dbpedia:Dave_Bronconnier ;
    dbpedia:governing_body dbpedia:Calgary_City_Council ;
    ...


DBpedia: SPARQL Endpoint

Web address: http://dbpedia.org/sparql


SPARQL

• Query language for RDF
– based on the RDF data model

• Possible to write complex joins of disparate datasets

• Implemented by all major RDF databases

See more: http://www.w3.org/TR/rdf-sparql-query/


Structure of a SPARQL Query


SELECT query: Find everything about the concept "Person" in DBpedia

#prefix declaration
PREFIX dbp-ont: <http://dbpedia.org/ontology/>
#result clause
SELECT *
#dataset definition
FROM <http://dbpedia.org>
#query pattern
WHERE {
  dbp-ont:Person ?p ?o .
}



SELECT query: Find the superclasses of the concept "Person" in DBpedia

#prefix declaration
PREFIX dbp-ont: <http://dbpedia.org/ontology/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
#result clause
SELECT ?o
#dataset definition
FROM <http://dbpedia.org>
#query pattern
WHERE {
  dbp-ont:Person rdfs:subClassOf ?o .
}


SELECT query: Find all persons in DBpedia

#prefix declaration
PREFIX dbp-ont: <http://dbpedia.org/ontology/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
#result clause
SELECT ?s
#dataset definition
FROM <http://dbpedia.org>
#query pattern
WHERE {
  ?s rdf:type dbp-ont:Person .
}


SELECT query: Find specific types of persons in DBpedia
Someone who is a Person and an Astronaut

#prefix declaration
PREFIX dbp-ont: <http://dbpedia.org/ontology/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
#result clause
SELECT ?s
#dataset definition
FROM <http://dbpedia.org>
#query pattern
WHERE {
  ?s rdf:type dbp-ont:Person .
  ?s rdf:type dbp-ont:Astronaut .
}


SELECT query: Find specific types of persons in DBpedia
Someone who is a Person, an Astronaut and Retired

#prefix declaration
PREFIX dbp-ont: <http://dbpedia.org/ontology/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
#result clause
SELECT ?s
#dataset definition
FROM <http://dbpedia.org>
#query pattern
WHERE {
  ?s rdf:type dbp-ont:Person .
  ?s rdf:type dbp-ont:Astronaut .
  ?s dbp-ont:status "Retired"@en .
}
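The WHERE clause above is a basic graph pattern. Every triple pattern must match, and the shared variable ?s joins them. A toy Python evaluator makes the join explicit. The ex: resources and their data are hypothetical, and real SPARQL engines use indexes and query planning rather than this nested loop.

```python
GRAPH = {  # hypothetical toy data
    ("ex:Alice", "rdf:type", "dbp-ont:Person"),
    ("ex:Alice", "rdf:type", "dbp-ont:Astronaut"),
    ("ex:Alice", "dbp-ont:status", '"Retired"@en'),
    ("ex:Bob", "rdf:type", "dbp-ont:Person"),
    ("ex:Bob", "rdf:type", "dbp-ont:Astronaut"),
    ("ex:Carol", "rdf:type", "dbp-ont:Person"),
}


def match(pattern, triple, binding):
    """Extend `binding` so that `pattern` matches `triple`, or return None."""
    new = dict(binding)
    for p_term, t_term in zip(pattern, triple):
        if p_term.startswith("?"):
            if new.setdefault(p_term, t_term) != t_term:
                return None  # variable already bound to a different value
        elif p_term != t_term:
            return None
    return new


def evaluate(patterns, graph):
    """Every triple pattern must match; shared variables act as join keys."""
    solutions = [{}]
    for pattern in patterns:
        solutions = [b2 for b in solutions for t in graph
                     if (b2 := match(pattern, t, b)) is not None]
    return solutions


where = [
    ("?s", "rdf:type", "dbp-ont:Person"),
    ("?s", "rdf:type", "dbp-ont:Astronaut"),
    ("?s", "dbp-ont:status", '"Retired"@en'),
]
print(sorted(b["?s"] for b in evaluate(where, GRAPH)))  # ['ex:Alice']
```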


SELECT query: Find 10 of these with LIMIT
Someone who is a Person, an Astronaut and Retired

#prefix declaration
PREFIX dbp-ont: <http://dbpedia.org/ontology/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
#result clause
SELECT ?s
#dataset definition
FROM <http://dbpedia.org>
#query pattern
WHERE {
  ?s rdf:type dbp-ont:Person .
  ?s rdf:type dbp-ont:Astronaut .
  ?s dbp-ont:status "Retired"@en .
}
LIMIT 10


SELECT query: Find 10 of these and order them by birth date with ORDER BY
Someone who is a Person, an Astronaut and Retired, youngest first

#prefix declaration
PREFIX dbp-ont: <http://dbpedia.org/ontology/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
#result clause
SELECT *
#dataset definition
FROM <http://dbpedia.org>
#query pattern
WHERE {
  ?s rdf:type dbp-ont:Person .
  ?s rdf:type dbp-ont:Astronaut .
  ?s dbp-ont:status "Retired"@en .
  ?s dbp-ont:birthDate ?date .
}
ORDER BY DESC(?date)
LIMIT 10
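ORDER BY is applied before LIMIT, and DESC puts the latest birth dates (the youngest people) first. A Python sketch of that ordering on hypothetical solution rows:

```python
from datetime import date

# Hypothetical solutions, as an engine might produce them before ORDER BY/LIMIT.
solutions = [
    {"?s": "ex:Alice", "?date": date(1950, 3, 1)},
    {"?s": "ex:Bob", "?date": date(1962, 7, 15)},
    {"?s": "ex:Carol", "?date": date(1941, 11, 30)},
]


def order_by_desc_limit(rows, var, limit):
    """Mimic `ORDER BY DESC(var) LIMIT n`: sort latest first, then keep n rows."""
    return sorted(rows, key=lambda row: row[var], reverse=True)[:limit]


for row in order_by_desc_limit(solutions, "?date", 2):
    print(row["?s"], row["?date"].isoformat())
# ex:Bob 1962-07-15
# ex:Alice 1950-03-01
```

Swapping the order (keeping 2 arbitrary rows, then sorting them) could drop the youngest person entirely, which is why LIMIT only makes sense after ORDER BY.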


Mathematical operations & filtering results

• Find me all landlocked countries with a population greater than 15 million, with the most populous country first

PREFIX type: <http://dbpedia.org/class/yago/>
PREFIX prop: <http://dbpedia.org/property/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?country_name ?population
WHERE {
  ?country a type:LandlockedCountries .
  ?country rdfs:label ?country_name .
  ?country prop:populationEstimate ?population .
  FILTER (?population > 15000000 && langMatches(lang(?country_name), "EN"))
}
ORDER BY DESC(?population)
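FILTER keeps only the solutions for which its expression is true. langMatches performs basic language-tag matching: it is case-insensitive and matches on "-" boundaries, so "en-GB" matches "EN". Below is a Python sketch of the same FILTER and ORDER BY on hypothetical rows. The `lang_matches` helper approximates the SPARQL function and is not an exact reimplementation.

```python
# Hypothetical candidate rows: (country name, language tag, population estimate).
rows = [
    ("Country A", "en", 25_000_000),
    ("Land A", "de", 25_000_000),
    ("Country B", "en", 9_000_000),
    ("Country C", "en-GB", 31_000_000),
]


def lang_matches(tag: str, pattern: str) -> bool:
    """Basic filtering: case-insensitive, prefix match on '-' boundaries."""
    tag, pattern = tag.lower(), pattern.lower()
    if pattern == "*":
        return tag != ""
    return tag == pattern or tag.startswith(pattern + "-")


# FILTER (?population > 15000000 && langMatches(lang(?country_name), "EN"))
kept = [r for r in rows if r[2] > 15_000_000 and lang_matches(r[1], "EN")]
# ORDER BY DESC(?population)
kept.sort(key=lambda r: r[2], reverse=True)
for name, _, population in kept:
    print(name, population)
```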


ASK query: Is India a landlocked country?

• Is India a landlocked country?
• ASK query:

PREFIX yago: <http://dbpedia.org/class/yago/>
PREFIX prop: <http://dbpedia.org/property/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
ASK
{ <http://dbpedia.org/resource/India> rdf:type yago:LandlockedCountries . }

Try replacing India with Afghanistan.

You do not have to specify WHERE: the keyword is optional.
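An ASK query returns a single boolean: true if the pattern has at least one solution, false otherwise. Here is a toy Python sketch for one fully specified triple pattern. The two triples in GRAPH are hypothetical stand-ins, not DBpedia's actual data.

```python
GRAPH = {  # hypothetical mini-graph using the slide's prefixed names
    ("dbpedia:Afghanistan", "rdf:type", "yago:LandlockedCountries"),
    ("dbpedia:India", "rdf:type", "yago:Country"),
}


def ask(pattern, graph) -> bool:
    """ASK with one ground triple pattern: true iff that triple is in the graph."""
    return pattern in graph


print(ask(("dbpedia:India", "rdf:type", "yago:LandlockedCountries"), GRAPH))        # False
print(ask(("dbpedia:Afghanistan", "rdf:type", "yago:LandlockedCountries"), GRAPH))  # True
```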


Exercise: Write a SPARQL query

• Write a SPARQL query to retrieve all the bands of the rock music genre whose hometown is the Republic of Ireland.

Prefix dbpedia: <http://dbpedia.org/resource/>
Prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
Prefix dbp-onto: <http://dbpedia.org/ontology/>

Use the following classes and properties:

dbp-onto:Band, dbp-onto:genre, dbpedia:Rock_music, dbpedia:Republic_of_Ireland, dbp-onto:hometown


Exercise: Write a SPARQL query (solution)

• Write a SPARQL query to retrieve all the bands of the rock music genre whose hometown is the Republic of Ireland.

PREFIX dbpedia: <http://dbpedia.org/resource/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dbp-onto: <http://dbpedia.org/ontology/>
SELECT * WHERE {
  ?s rdf:type dbp-onto:Band .
  ?s dbp-onto:genre dbpedia:Rock_music .
  ?s dbp-onto:hometown dbpedia:Republic_of_Ireland .
}
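When a query like this is sent to an endpoint and JSON is requested, the answer comes back in the W3C SPARQL 1.1 Query Results JSON Format. This consists of a "head" listing the variables and "results.bindings" holding one object per solution. A Python sketch that flattens such a response follows. The response is hand-written, and the band URI is a placeholder, not a real query result.

```python
import json

# Hand-written response in the SPARQL 1.1 Query Results JSON Format (placeholder data).
response_text = """
{
  "head": {"vars": ["s"]},
  "results": {"bindings": [
    {"s": {"type": "uri", "value": "http://dbpedia.org/resource/Example_Band"}}
  ]}
}
"""


def rows(result_json: str):
    """Yield one {variable: value} dict per solution, skipping unbound variables."""
    data = json.loads(result_json)
    variables = data["head"]["vars"]
    for binding in data["results"]["bindings"]:
        yield {v: binding[v]["value"] for v in variables if v in binding}


for row in rows(response_text):
    print(row["s"])
```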


Summary: Finding existing datasets/vocabularies

• Use search engines to find a dataset

• Use SPARQL endpoints to inspect the dataset further

• SPARQL queries
– SELECT query for selecting a set of results to display
– ASK query to ask a specific question about something
– Variations in terms of LIMIT, ORDER BY


Publishing Linked Data: Software Architecture Patterns

• Follow the Linked Data principles
– they are good-practice principles, NOT norms or rules

• The software architecture needs to support this way of publishing
– existing architectures using structured or unstructured data
– doing it from scratch
– publishing Linked Data is different when working with existing applications and infrastructure already in place


Architecture scenarios



Type of data

• Structured data
– Database tables
– XML documents

Example table:
Name   Address   Post code   Author of
A      ----      -------     Book B

• Unstructured data
– Textual documents
• News stories, reports, textual descriptions – as text files



Query-able Structured Data to Linked Data

• Example: a movie business that keeps its movie database in a relational database

• Such data can be published relatively easily as Linked Data by using relational-database-to-RDF wrappers.

• Wrappers map database schemas to RDF schemas
– Virtuoso RDF Views
– Triplify
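The core idea of a database-to-RDF wrapper is that each row becomes a resource and each column becomes a property. Below is a minimal Python sketch using an in-memory SQLite table. It follows the general spirit of the W3C Direct Mapping, not the exact behaviour of Virtuoso RDF Views or Triplify. The movie table, its rows and the namespace http://movies.example.org/ are all hypothetical, and literal values are not escaped.

```python
import sqlite3

# Hypothetical movie table standing in for the business's relational database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE movie (id INTEGER PRIMARY KEY, title TEXT, year INTEGER)")
conn.executemany("INSERT INTO movie VALUES (?, ?, ?)",
                 [(1, "Example Film", 1999), (2, "Another Film", 2005)])

BASE = "http://movies.example.org/"  # hypothetical namespace
RDF_TYPE = "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>"


def row_to_triples(table, pk, values, columns):
    """Map one row to triples: the row becomes a resource, each column a property."""
    subject = f"<{BASE}{table}/{pk}>"
    yield (subject, RDF_TYPE, f"<{BASE}{table.capitalize()}>")
    for column, value in zip(columns, values):
        yield (subject, f"<{BASE}{table}#{column}>", f'"{value}"')


cursor = conn.execute("SELECT id, title, year FROM movie ORDER BY id")
columns = [d[0] for d in cursor.description]
triples = [t for row in cursor for t in row_to_triples("movie", row[0], row[1:], columns[1:])]
for s, p, o in triples:
    print(s, p, o, ".")
```

Because the mapping runs over the live database, a wrapper can answer from current data rather than from a one-off export.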



Static Structured Data to Linked Data

• Example: a UK government department that holds performance data for each department in Excel spreadsheets

• Such data must undergo a conversion process that outputs static RDF files or loads the converted data directly into an RDF store.

• RDFizing tools – http://www.w3.org/wiki/ConverterToRdf
– tools to convert data from various formats to RDF
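A one-off conversion of this kind can be as simple as reading a spreadsheet exported to CSV and writing N-Triples that can be saved as a static file or loaded into a store. The Python sketch below uses only the standard csv module. The figures, column names and the namespace http://data.example.gov.uk/ are hypothetical, and real RDFizing tools handle far more (units, dates, reusing existing vocabularies).

```python
import csv
import io

# Hypothetical performance figures, as if exported from a spreadsheet to CSV.
CSV_TEXT = """department,indicator,value
Dept A,complaints_resolved,120
Dept B,complaints_resolved,95
"""

BASE = "http://data.example.gov.uk/"  # hypothetical namespace
XSD_INTEGER = "<http://www.w3.org/2001/XMLSchema#integer>"


def slug(text: str) -> str:
    return text.strip().replace(" ", "_")


def csv_to_ntriples(text: str) -> list[str]:
    lines = []
    for i, record in enumerate(csv.DictReader(io.StringIO(text)), start=1):
        obs = f"<{BASE}observation/{i}>"  # one resource per spreadsheet row
        dept = f"<{BASE}department/{slug(record['department'])}>"
        lines.append(f"{obs} <{BASE}def/department> {dept} .")
        value = int(record["value"])
        lines.append(f'{obs} <{BASE}def/{slug(record["indicator"])}> "{value}"^^{XSD_INTEGER} .')
    return lines


static_rdf = "\n".join(csv_to_ntriples(CSV_TEXT))
print(static_rdf)  # could be written to a .nt file or loaded into an RDF store
```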


RDF store

• Also called "triple store" or "semantic repository"

• They are engines similar to a DBMS: they allow for the storage, querying and management of structured data. Major differences:
– they use ontologies as semantic schemata, which allows them to reason automatically about the data
– they work with flexible and generic physical data models (e.g. graphs), which allows them to easily interpret and adopt new ontologies or metadata schemata "on the fly"

• Available RDF stores: OWLIM, AllegroGraph, Virtuoso, Sesame, Jena TDB
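One reason triple stores can answer graph patterns efficiently is that they keep several indexes over the same triples. A pattern with any bound position can then start from a small candidate set. The toy Python class below keeps three such indexes (SPO, POS, OSP). It is a hypothetical illustration of the idea only. The stores named above use their own, far more elaborate storage and indexing schemes.

```python
from collections import defaultdict


class TinyTripleStore:
    """Toy store with three indexes so a bound subject, predicate or object
    can be used to narrow the search."""

    def __init__(self):
        self.spo = defaultdict(set)  # subject -> {(predicate, object)}
        self.pos = defaultdict(set)  # predicate -> {(object, subject)}
        self.osp = defaultdict(set)  # object -> {(subject, predicate)}

    def add(self, s, p, o):
        self.spo[s].add((p, o))
        self.pos[p].add((o, s))
        self.osp[o].add((s, p))

    def triples(self, s=None, p=None, o=None):
        """Yield triples matching the pattern; None acts as a wildcard."""
        if s is not None:
            candidates = ((s, p2, o2) for p2, o2 in self.spo.get(s, ()))
        elif p is not None:
            candidates = ((s2, p, o2) for o2, s2 in self.pos.get(p, ()))
        elif o is not None:
            candidates = ((s2, p2, o) for s2, p2 in self.osp.get(o, ()))
        else:
            candidates = ((s2, p2, o2) for s2, pairs in self.spo.items() for p2, o2 in pairs)
        for t in candidates:
            if (p is None or t[1] == p) and (o is None or t[2] == o):
                yield t


store = TinyTripleStore()
store.add("dt:dhaval", "rdf:type", "foaf:Person")
store.add("dt:dhaval", "foaf:based_near", "dbpedia:Leeds")
print(list(store.triples(p="foaf:based_near")))
```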



From Text Documents to Linked Data

• Example: a news publisher with a corpus of news stories produced in the last month

• These documents can be passed through a Linked Data entity extractor, such as OpenCalais (http://www.opencalais.com/) or DBpedia Spotlight (http://dbpedia-spotlight.github.com/demo/index.html), which annotates documents with the Linked Data URIs of the entities they reference.
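To make the idea of annotation concrete, here is a deliberately naive Python stand-in. It looks up names from a fixed gazetteer and attaches DBpedia URIs to each mention. The gazetteer is hypothetical (built from entities mentioned earlier in this lecture). This is not how OpenCalais or DBpedia Spotlight work: they also spot unknown names and disambiguate between candidate entities.

```python
import re

# Hypothetical gazetteer mapping surface forms to DBpedia resource URIs.
GAZETTEER = {
    "Barack Obama": "http://dbpedia.org/resource/Barack_Obama",
    "Chicago": "http://dbpedia.org/resource/Chicago",
    "United States": "http://dbpedia.org/resource/United_States",
}


def annotate(text: str):
    """Return (start, end, surface form, URI) for each gazetteer name found in text."""
    # Try longer names first so the longest surface form wins on overlap.
    names = sorted(GAZETTEER, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, names)) + r")\b")
    return [(m.start(), m.end(), m.group(1), GAZETTEER[m.group(1)])
            for m in pattern.finditer(text)]


story = "Barack Obama returned to Chicago after the election."
for start, end, surface, uri in annotate(story):
    print(start, end, surface, uri)
```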


From Text Documents to Linked Data

• Publishing these annotations together with the documents
– increases the discoverability of the documents
– enables applications to use the referenced Linked Data sources as background knowledge to display complementary information on web pages
– or to enhance information retrieval tasks, for instance by offering faceted browsing instead of simple full-text search

• Applications like this will be presented in the next lecture(s)


Summary

• Linked Data is a way of publishing and interlinking structured data on the Web

• Linked Data principles to follow when creating such data

• How to find existing datasets: the Web of Data

• How to query existing datasets: SPARQL

• Possible software architecture patterns


Next Lecture

• Consuming Linked Data
– Linked Data applications
• What datasets they use from the Web of Data
• What software architecture they follow
– Benefits
• Integration – for organisations
• Browsing and interaction – for users


References

• Tom Heath. An Introduction to Linked Data. Linked Data Tutorial, Austin, Texas, 2009.

• Raimond et al. A Skim-Read Introduction to Linked Data.

• Tom Heath and Christian Bizer. Linked Data: Evolving the Web into a Global Data Space. Synthesis Lectures on the Semantic Web, Morgan & Claypool Publishers, 2011.

• Cambridge Semantics. SPARQL by Example.


TED talk by Tim Berners-Lee on Linked Data

• http://www.ted.com/talks/tim_berners_lee_on_the_next_web.html