
MASWS: Natural Language and the Semantic Web

Kate Byrne

School of Informatics

14 February 2011

Outline

Populating the Semantic Web
• why and how?
• text to RDF

Organising the Triples
• schema design
• open issues and problems

References

Populating the Semantic Web – why and how?

The Semantic Web needs data

• Semantic Web first proposed in 1999
• Why hasn't it taken off the way the WWW did?

[Figure: a vicious circle – not enough data, so queries aren't useful, so data owners delay adding their data.]

How to get data

[Figure: How do we populate the Semantic Web? Two routes – create new data, or convert existing data: structured, semi-structured and unstructured.]

Semi-structured content is everywhere

Why convert unstructured text?

• Extracting RDF triples from free text is not easy
• So why is it worth doing?
  • there was knowledge before the WWW (!)
  • high-quality material, often historical
  • big digitisation push in the 1990s (and again in 2011?)
  • but it is still challenging to search text effectively
  • natural language is our preferred way of communicating
• Can the web learn to talk in natural language...?


Populating the Semantic Web – text to RDF

Natural Language Processing

• Extraction of semantic content from natural language text
• Systems are now available – OpenCalais, Powerset, AlchemyAPI
• To explain the process: part of my PhD work

Think of RDF triples as declarative sentences: RDF nodes = nouns; RDF arcs = verbs.

Example: "Kate likes chocolate."

@prefix : <http://www.ltg.ed.ac.uk/> .
:kate :likes :chocolate .
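
To make the triple-as-sentence idea concrete, here is a minimal sketch using the Python rdflib library (my choice for illustration, not a tool mentioned in the talk) that builds and prints the same statement:

    from rdflib import Graph, Namespace

    # Namespace taken from the slide's example prefix
    LTG = Namespace("http://www.ltg.ed.ac.uk/")

    g = Graph()
    g.bind("", LTG)

    # "Kate likes chocolate." as subject-predicate-object
    g.add((LTG.kate, LTG.likes, LTG.chocolate))

    # Serialise to Turtle: prints the same triple shown above
    print(g.serialize(format="turtle"))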

Converting text to RDF

[Figure: text documents are turned into triples by "fancy NLP processing and RDFisation".]

Example – RCAHMS text (site 20)

"Evidence of a quartz knapping site was found within the confines of the stone circle, and in conjunction with several structures within the inner ring, strongly suggests a domestic site. Besides the quartz implements and corresponding waste, several other artifacts of local origin occurred, including a split pebble axe of greenstone with Shetland Early Bronze Age affinities. B Beveridge, 1972.

Field survey and excavation, as a response to continual wind and marine erosion, was carried out at the Sands of Breckon between 1982 and 1983. HP50NW 11.00 was recorded as a stone settings surrounded by occupational debris (Site 22). Excavation revealed midden deposits of an early Iron Age date and a surface scatter of artefacts of mixed dates. The stone settings were tentatively interpreted as the basal stones of long cists. Historic Scotland Archive Project (SW) 2002."

The txt2rdf process in a nutshell

• First find the important nouns, or "named entities" (a rough sketch of this step follows below):

Example: Mr. Peter Moar found a small hoard of 5 polished stone knives on the same patch in May 1946.

• Then look for relationships between those entities in the same sentence.
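
As a rough illustration of the first step, a generic off-the-shelf NER model can be run over the example sentence. This sketch uses NLTK (listed later among available tools) with its standard pretrained models rather than the domain-trained model from the talk, so it only finds generic categories such as PERSON:

    import nltk

    # One-off model downloads (standard NLTK models, assumed already fetched):
    # nltk.download("punkt"); nltk.download("averaged_perceptron_tagger")
    # nltk.download("maxent_ne_chunker"); nltk.download("words")

    sentence = ("Mr. Peter Moar found a small hoard of 5 polished stone "
                "knives on the same patch in May 1946.")

    tokens = nltk.word_tokenize(sentence)   # split into word tokens
    tagged = nltk.pos_tag(tokens)           # part-of-speech tags
    tree = nltk.ne_chunk(tagged)            # generic named-entity chunks

    # Print the entities the generic model finds; domain categories such as
    # ARTEFACT or SITENAME would need a custom-trained model.
    for subtree in tree.subtrees():
        if subtree.label() != "S":
            print(subtree.label(), " ".join(tok for tok, tag in subtree.leaves()))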

My txt2rdf pipeline

[Figure: pipeline diagram. Text documents → pre-processing (sentence and paragraph splitting, tokenisation, POS tagging, multi-word tokens and features) → Named Entity Recognition (trained NER model → list of NEs and their classes) → Relation Extraction (set of NE pairs and features, trained RE model → list of relations and their classes; unwanted relations removed, site ids attached) → triple generation and RDF translation → graph of triples.]
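
A structural sketch of that pipeline is below. Every function is a stub standing in for one stage of the diagram; the names and signatures are hypothetical placeholders, not the actual implementation:

    def preprocess(doc: str) -> list:
        """Split into sentences and tokens; add POS tags, multi-word tokens, features."""
        ...

    def recognise_entities(sentences: list) -> list:
        """Run the trained NER model; return entities with their classes."""
        ...

    def extract_relations(entities: list, sentences: list) -> list:
        """Pair up entities, run the trained RE model, drop unwanted relations."""
        ...

    def to_rdf(relations: list):
        """Generate triples (attaching site ids) and translate them to RDF."""
        ...

    def txt2rdf(doc: str):
        sentences = preprocess(doc)
        entities = recognise_entities(sentences)
        relations = extract_relations(entities, sentences)
        return to_rdf(relations)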

Dealing with EVENTs

Mr. Peter Moar found a small hoard of 5 polished stone knives on the same patch in May 1946.

• The FIND event is an n-ary relation:

eventRel(eventAgent, eventAgentRole, eventDate, eventPatient, eventPlace)

find123("Mr. Peter Moar", , "May 1946", "polished stone knives", "Hill of Shurton")

• Split the event relation into RDF triples:

:find123 :hasAgent :peterMoar .
:find123 :hasDate :may1946 .
:find123 :hasPatient :polishedStoneKnives .
:find123 :hasLocation :hillOfShurton .

Archaeological events

• The example was from RCAHMS data: http://canmore.rcahms.gov.uk/en/site/1102/details/hill+of+shurton/
• Extracting events allows structured searches (a sketch of such a query follows below)
• Question: what do we do with NEs that are not in relations?

For more information: Byrne and Klein (2009), Automatic Extraction of Archaeological Events from Text.
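
Once find events are stored as triples, a structured search such as "what did Peter Moar find, when and where?" becomes a graph query. A minimal sketch with rdflib and SPARQL over the toy triples from the previous slide (the namespace is an assumption for illustration):

    from rdflib import Graph, Namespace

    T = Namespace("http://www.ltg.ed.ac.uk/tether/")  # assumed namespace

    g = Graph()
    g.bind("t", T)
    g.add((T.find123, T.hasAgent, T.peterMoar))
    g.add((T.find123, T.hasDate, T.may1946))
    g.add((T.find123, T.hasPatient, T.polishedStoneKnives))
    g.add((T.find123, T.hasLocation, T.hillOfShurton))

    # Structured search: everything Peter Moar found, with date and place
    query = """
    PREFIX t: <http://www.ltg.ed.ac.uk/tether/>
    SELECT ?find ?what ?when ?where WHERE {
        ?find t:hasAgent    t:peterMoar ;
              t:hasPatient  ?what ;
              t:hasDate     ?when ;
              t:hasLocation ?where .
    }
    """
    for find, what, when, where in g.query(query):
        print(find, what, when, where)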

Available NLP systems

• OpenCalais: http://viewer.opencalais.com/
• Powerset: now part of Microsoft's Bing
• AlchemyAPI: http://www.alchemyapi.com/api/demo.html
• Trained on lots of (English) text, mostly news articles
• Many general-purpose NLP tools available online, e.g.
  • NLTK (http://www.nltk.org/)
  • Edinburgh LTG software (http://www.ltg.ed.ac.uk/software)
  • NaCTeM (http://nactem.ac.uk/)

Organising the Triples – schema design

Do we have a data schema?

[Figure: the earlier "How do we populate the Semantic Web?" diagram again – create vs convert; structured, semi-structured and unstructured data.]

A schema for free text

• So far we've looked at instance data (ABox statements)
• Where is the framework to slot the instances into?
• Named Entity categories: PERSON, PLACE, DATE, etc.
• NE categories become RDFS classes

Entities are typed by their NE class:

:peterMoar rdf:type :person .
:found rdf:type :event .
:polishedStoneKnives rdf:type :artefact .
:may1946 rdf:type :date .

RDFS class hierarchy

• NE categories are usually a flat list, not hierarchical
• My set: ORG, PERSNAME, ROLE, SITETYPE, ARTEFACT, PLACE, SITENAME, ADDRESS, PERIOD, DATE, EVENT
• Contrast with structured data
• Hand-designed hierarchy (with rdfs:subClassOf): http://www.ltg.ed.ac.uk/tether/

[Figure: the hand-built class tree – top-level classes agent (role, person, org), time (date, period), loc (address, sitename, place, grid), classn (sitetype, objtype), event (survey, excavation, find, visit, description, creation, alteration) and siteid.]
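
One payoff of the hierarchy shows up at query time: asking for all events should also return instances of subclasses such as excavation. A small rdflib sketch (toy triples, assumed namespace) that follows rdfs:subClassOf with a SPARQL property path rather than a reasoner:

    from rdflib import Graph, Namespace, RDF, RDFS

    T = Namespace("http://www.ltg.ed.ac.uk/tether/")  # assumed namespace

    g = Graph()
    g.bind("t", T)

    # A tiny slice of the hand-designed hierarchy
    g.add((T.excavation, RDFS.subClassOf, T.event))
    g.add((T.find, RDFS.subClassOf, T.event))

    # Toy instance data
    g.add((T.excavation20w158, RDF.type, T.excavation))
    g.add((T.find123, RDF.type, T.find))

    # Every instance of event or of any subclass of event
    query = """
    PREFIX t:    <http://www.ltg.ed.ac.uk/tether/>
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    SELECT ?x WHERE { ?x a/rdfs:subClassOf* t:event . }
    """
    for (instance,) in g.query(query):
        print(instance)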

Property labels

• Can we extract the RDF predicate set automatically from text?
  • e.g. by clustering commonly occurring verbs (a naive sketch follows below)
• But – schema design only has to be done once
• The property set and class hierarchy are related
• My set: hasEvent, hasAgent, hasAgentRole, hasPeriod, hasPatient, hasLocation, hasClassn, hasObject, partOf
• Plus "standard" ones like owl:sameAs, rdfs:seeAlso, skos:broader
• Flat list (no rdfs:subPropertyOf)
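
As a very rough starting point for the "commonly occurring verbs" idea, here is a naive frequency count over POS-tagged text. It is not real clustering and uses NLTK's generic tagger, purely to illustrate where candidate predicate labels might come from:

    from collections import Counter
    import nltk

    docs = [
        "Mr. Peter Moar found a small hoard of 5 polished stone knives in May 1946.",
        "Excavation revealed midden deposits of an early Iron Age date.",
    ]

    verb_counts = Counter()
    for doc in docs:
        for token, tag in nltk.pos_tag(nltk.word_tokenize(doc)):
            if tag.startswith("VB"):       # any verb form in the Penn tagset
                verb_counts[token.lower()] += 1

    # The most frequent verbs are candidate predicate labels (found, revealed, ...)
    print(verb_counts.most_common(10))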

Existing schemas and vocabularies

• Lots of vocabularies to choose from: http://esw.w3.org/topic/VocabularyMarket
• RDFS and OWL are fundamental
• SKOS – translate existing domain thesauri (http://www.w3.org/2004/02/skos/); a sketch follows below
• Reuse where possible; invent a local scheme where not
• The incentive to reuse is strong
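
For instance, a thesaurus entry such as the site-type "stone setting" (which reappears in the grounding figure later) could be translated to SKOS roughly as follows; the sitetype namespace here is an assumption for illustration:

    from rdflib import Graph, Literal, Namespace
    from rdflib.namespace import RDF, SKOS

    ST = Namespace("http://www.ltg.ed.ac.uk/tether/Classn/Sitetype#")  # assumed

    g = Graph()
    g.bind("sitetype", ST)
    g.bind("skos", SKOS)

    term = ST["stone+setting"]
    g.add((term, RDF.type, SKOS.Concept))
    g.add((term, SKOS.prefLabel, Literal("stone setting", lang="en")))
    g.add((term, SKOS.scopeNote,
           Literal("An arrangement of two or more standing stones", lang="en")))
    g.add((term, SKOS.broader, ST["religious+ritual+and+funerary"]))
    g.add((term, SKOS.related, ST["standing+stone"]))

    print(g.serialize(format="turtle"))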

Organising the Triples – open issues and problems

Issues around schema design

• Schema granularity
  • more categories to choose from ⇒ NLP is less accurate
  • so the class hierarchy is quite coarse and flat
• Schema discovery
  • tension between simplicity and query power
  • how will software agents "understand" your schema?

Issues around NLP accuracy

• Disambiguation of extracted NEs (Mr Peter Moar, P Moar, etc.)
• NLP is not 100% accurate!
  • how many false statements can the Semantic Web stand?
  • "the computer said it, so it must be true"
• Even given a canonical URI :peterMoar for our Peter Moar...
• ... can we distinguish him from others with the same name?

Grounding local data

• Grounding: tie unique canonical URIs to an authoritative reference
• Is http://www.ltg.ed.ac.uk/tether/Loc/Place#edinburgh the same as http://www.geonames.org/2650225/edinburgh.html?
• Grounding during NLP processing is impractical
• Use owl:sameAs? – redundant triples, more complex queries (a sketch follows below)
• Who maintains the authoritative lists of URIs?
• Specialist thesauri, e.g. archaeological classification terms like "Stone setting"
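
A minimal sketch of the owl:sameAs option, linking the local Edinburgh URI from the slide to the GeoNames record it asks about (whether that link is correct is exactly the grounding question):

    from rdflib import Graph, Namespace, URIRef
    from rdflib.namespace import OWL

    PLACE = Namespace("http://www.ltg.ed.ac.uk/tether/Loc/Place#")

    g = Graph()
    g.bind("owl", OWL)
    g.bind("place", PLACE)

    # The target is the URL quoted on the slide; in practice one would link to
    # the GeoNames resource URI rather than its HTML page. Every such link is
    # an extra triple to maintain, and queries must now follow owl:sameAs too.
    g.add((PLACE.edinburgh,
           OWL.sameAs,
           URIRef("http://www.geonames.org/2650225/edinburgh.html")))

    print(g.serialize(format="turtle"))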

Grounding specialist terminology

[Figure: RDF graph around siteid:site20, which links via :hasLocation to address:breckon, sitename:sands+of+breckon and address:hp50nw+11.01+hp+5304+0519, via :hasPeriod to date:1982, via :hasEvent to event:excavation20w158 (rdf:type event:excavation), and via :hasClassn to sitetype:stone+settings20w179 (rdf:type sitetype:stone+setting). The sitetype:stone+setting class is tied into the specialist thesaurus: rdfs:label "stone setting", skos:scopeNote "An arrangement of two or more standing stones", with skos:broader, skos:related and rdfs:subClassOf links to terms such as sitetype:religious+ritual+and+funerary, sitetype:standing+stone, sitetype:stone+circle and sitetype:stone+row.]

Aim: ground place names, people, organisations, etc.

Linked Data

Grounding turns data into Linked Data

Grounding local URIs against "authority" nodes is the next big challenge!

References

References to follow up

• NLP software for Semantic Web applications
  • OpenCalais: http://opencalais.com/
  • AlchemyAPI: http://www.alchemyapi.com/api/demo.html
• VocabularyMarket: http://esw.w3.org/topic/VocabularyMarket
• Case study
  • Byrne and Klein (2009) Automatic Extraction of Archaeological Events from Text. CAA 2009, Williamsburg, VA.
• General references for NLP
  • Bird, Klein and Loper, Natural Language Processing with Python (http://www.nltk.org/book)
  • Jurafsky and Martin, Speech and Language Processing (http://www.cs.colorado.edu/~martin/slp.html)
