Publishing and Consuming Linked Data. (Lessons learnt when using LOD in an application)

Marta Villegas

Universitat Pompeu Fabra Cercedillas, June 2015


IULA-UPF scenario


OLAC Language Resource Catalogue

OAI-PMH server

Metadata formats: Dublin Core, Metashare, OLAC

Metadata harvesting


IULA-UPF moving to LOD

Objectives:
- Displaying data to the user in a comprehensible way
- Aggregating external data in a sensible manner
- Making hidden implicit relations explicit

Triple store (Virtuoso): http://lodserver.iula.upf.edu
SPARQL server (Virtuoso): http://lodserver.iula.upf.edu/sparql
Web browser (RoR + SPARQL): http://lod.iula.upf.edu/

RDF


When the focus shifts from growing the cloud to deploying applications

• Complex types (identity resolution)
• Simple types (as instances)
• Linking data (linking vs. reusing)
• Data enrichment
• Approach: an incremental process (a first batch, followed by a curation process)

RDFying – index


RDFying – complex instances

<Document>

<Person>

<Organisation>

<Project>

<LangResourceInfo> <identificationInfo>

<distributionInfo>

<contactPerson>

<metadataInfo>

<validationInfo>

<resourceDocumentationInfo>

<resourceCreationInfo>

<resourceComponentType>

</LangResourceInfo>


RDFying – complex instances

<langResource-URI-1>

<langResource-URI-2>

<langResource-URI-3>

<langResource-URI-n>

<person-URI-1>

<person-URI-2>

<person-URI-3>

=?

Identity resolution


<contactPerson>
  <surname>Monachini</surname>
  <givenName>Monica</givenName>
  <communicationInfo>
    <email>[email protected]</email>
    <email>[email protected]</email>
    <url>http://www.ilc.cnr.it/</url>
    <address>Via Giuseppe Moruzzi</address>
    <zipCode>56124</zipCode>
    <city>Pisa</city>
    <country>Italy</country>
  </communicationInfo>
  <affiliation>
    <organizationName>………</organizationName>
    <departmentName>Istituto …</departmentName>
    <communicationInfo>…</communicationInfo>
  </affiliation>
</contactPerson>

http://…/Monica_Monachini



<fundingProject>
  <projectName>Platform for Automatic, Normalised Annotation and Cost-Effective Acquisition of Language Resources for Human Languages Technologies</projectName>
  <projectShortName>PANACEA</projectShortName>
  <url>http://panacea-lr.eu/</url>
  <fundingType>euFunds</fundingType>
  <funder>European Union</funder>
</fundingProject>
<organizationInfo>
  <organizationName>Consiglio Nazionale delle Ricerche. Istituto di Linguistica Computazionale “Antonio Zampolli”</organizationName>
  <organizationShortName>CNR</organizationShortName>
  …


For each embedded Project/Person/Organization:

1. Generate a "subject property URI" triple for the backwards relation. The URI is built as follows:
– If it is a Person, use “name_givenName”.
– If a “short name” exists, use the “shortname”.
– Otherwise, use the first 20 characters of the “long name”.

2. Generate "URI property object" triples as the union of all local declarations (where the union removes duplicate triples).
– This requires a final curation task that agrees on node values in case they differ.
– The preliminary version needs further curation (we used SPARQL SELECT DISTINCT to identify oddities).
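
The two steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code used at IULA-UPF: the base URI, the dictionary keys (`surname`, `givenName`, `shortName`, `name`) and the helper names are hypothetical, and “name_givenName” is read here as surname followed by given name.

```python
import re

BASE = "http://example.org/resources/"  # hypothetical base URI, not from the slides


def slug(text: str) -> str:
    """Collapse runs of non-alphanumeric characters into single underscores."""
    return re.sub(r"\W+", "_", text.strip()).strip("_")


def mint_uri(kind: str, record: dict) -> str:
    """Apply the three naming rules in order: person, short name, long name."""
    if kind == "Person":
        key = f"{record['surname']}_{record['givenName']}"
    elif record.get("shortName"):
        key = record["shortName"]
    else:
        key = record["name"][:20]  # first 20 characters of the long name
    return BASE + slug(key)


def merge_declarations(declarations):
    """Union the (property, value) pairs of every local declaration of an entity.

    Returns the de-duplicated triples plus the (uri, property) pairs that ended
    up with more than one value; those are the candidates for manual curation.
    """
    triples = set()  # a set removes duplicate triples, as in step 2
    for kind, record in declarations:
        uri = mint_uri(kind, record)
        for prop, value in record.items():
            triples.add((uri, prop, value))
    values = {}
    for uri, prop, value in triples:
        values.setdefault((uri, prop), set()).add(value)
    conflicts = {key: vals for key, vals in values.items() if len(vals) > 1}
    return triples, conflicts
```

Calling `merge_declarations` with two embedded descriptions of the same person yields one URI carrying the union of both descriptions; any property listed in `conflicts` is a node value that the curators still have to agree on.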


RDFying Documents:

- DBLP to get full RDF descriptions
- Google Scholar to get BibTeX descriptions
- For a small dataset this effort is affordable. For big datasets it requires a lot of work (some automatic tasks may be defined).

<document>Quochi V, Frontini F, Rubino F. A MWE Acquisition and Lexicon Builder Web Service. COLING 2012, 24th International Conference on Computational Linguistics, Proceedings of the Conference: Technical Papers, 8-15 December 2012, Mumbai, India</document>


RDFying - Where to stop?

BIBTEX:
@inproceedings{quochi2012mwe,
  title={A MWE Acquisition and Lexicon Builder Web Service.},
  author={Quochi, Valeria and Frontini, Francesca and Rubino, Francesco},
  booktitle={COLING},
  year={2012}
}

DBLP:
<http://dblp.uni-trier.de/rec/conf/coling/QuochiFR12>
  owl:sameAs <http://dblp.org/rec/conf/coling/QuochiFR12> ;
  dblp:title "A MWE Acquisition and Lexicon Builder Web Service" ;
  dblp:authoredBy <http://dblp.uni-trier.de/pers/q/Quochi:Valeria> ;
  dblp:authoredBy <http://dblp.uni-trier.de/pers/f/Frontini:Francesca> ;
  dblp:authoredBy <http://dblp.uni-trier.de/pers/r/Rubino:Francesco> ;
  dblp:publishedAsPartOf <http://dblp.uni-trier.de/rec/conf/coling/2012> ;
  dblp:yearOfPublication "2012" .


Article
  title
  creator: Mikel Forcada
  subject: discourse analysis, question answering
  keywords: NER, LMF, ...
  references: FreeLing, TreeBank, PANACEA ...
  language: English

RDFying – simple types


<subject>Gender Studies</subject> <usage>NER</usage> <format>XCES</format> <standard>LMF</standard>

Not only Enumerations but also string elements !!!

RDFying - simple types as instances


RDFying - simple types as instances

Value   Value counter   Resource counter
eng     518             476
en      215             174
EN      120             120
Spa     390             376
es      77              71
ES      10              10

Language codes in MS central node


Enumerations: object property + class + instances + checking existing vocabularies.

‘Free strings’:
1) Generate a datatype property + string value.
2) Run a curation process that:
   a) identifies ‘enumeration-like’ candidates (e.g. language) and chooses an appropriate vocabulary;
   b) matches value strings to relevant URIs (DBpedia).

RDFying - simple types as instances
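
Step 2a (spotting ‘enumeration-like’ string properties) could be approximated with a simple frequency check. The sketch below is an assumption about how such candidates might be found, not the procedure used in the talk: the thresholds `max_distinct` and `min_repeat` and the function name are hypothetical, and triples are plain Python tuples.

```python
from collections import Counter, defaultdict


def enumeration_candidates(triples, max_distinct=20, min_repeat=2.0):
    """Flag properties whose string values repeat often enough to look like an enumeration.

    A property qualifies when it has few distinct values (max_distinct) and each
    value is reused, on average, at least min_repeat times.
    """
    counts = defaultdict(Counter)
    for _subject, prop, value in triples:
        counts[prop][value] += 1
    candidates = {}
    for prop, counter in counts.items():
        total = sum(counter.values())
        distinct = len(counter)
        if distinct <= max_distinct and total / distinct >= min_repeat:
            # Keep the value frequencies so a curator can inspect them.
            candidates[prop] = counter.most_common()
    return candidates
```

A property such as language, with a handful of values repeated across hundreds of resources, would be flagged, whereas a title property with one distinct value per resource would not.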

Page 18: “Publishing and Consuming Linked Data. (Lessons learnt when using LOD in an application)”

SELECT DISTINCT ?language WHERE { ?s ms:languageId ?language }

Result: (eng, en, EN, …)

DELETE { ?s ms:language "EN" . }
INSERT { ?s ms:language <http://.../English> . }
WHERE { ?s ms:language "EN" . }

Curation using SPARQL

RDFying - simple types as instances
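
Since every spelling variant needs its own update, the curation statements can be generated rather than typed by hand. The following is a minimal sketch under assumptions: the alias table and the DBpedia URIs are illustrative choices, `ms:language` is taken from the slide, and the generated text follows the DELETE / INSERT / WHERE form of SPARQL 1.1 Update.

```python
# Illustrative mapping from a canonical code to one URI (an assumption, not from the slides).
LANGUAGE_URIS = {
    "eng": "http://dbpedia.org/resource/English_language",
    "spa": "http://dbpedia.org/resource/Spanish_language",
}
# Two-letter spellings folded onto the canonical three-letter code.
ALIASES = {"en": "eng", "es": "spa"}


def canonical(code: str) -> str:
    """Normalise case and whitespace, then resolve known aliases."""
    key = code.strip().lower()
    return ALIASES.get(key, key)


def merged_counts(counts: dict) -> dict:
    """Add up the counters of all spellings that denote the same language."""
    merged = {}
    for code, n in counts.items():
        merged[canonical(code)] = merged.get(canonical(code), 0) + n
    return merged


def curation_update(literal: str, prop: str = "ms:language"):
    """Build one SPARQL update replacing a string literal with a URI.

    Returns None for unknown spellings so they can be reviewed manually.
    Assumes the literal contains no double quotes.
    """
    uri = LANGUAGE_URIS.get(canonical(literal))
    if uri is None:
        return None
    return (
        f'DELETE {{ ?s {prop} "{literal}" . }}\n'
        f"INSERT {{ ?s {prop} <{uri}> . }}\n"
        f'WHERE  {{ ?s {prop} "{literal}" . }}'
    )
```

Applied to the value counters in the language-code table, `merged_counts` groups the six spellings into two languages, and `curation_update("EN")` returns the text of one update that could be sent to the SPARQL endpoint.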


Linking data!

Complex types (Person, Organization, Document, Project): VIAF, ORCID, DBLP

Simple types (enumerations, string-valued): vocabularies, DBpedia


Linking data !! – linking vs reusing

documentation sameAs

documentation


Linking data !! – linking vs reusing

http://lod.iula.upf.edu/resources/PAN_metadata_MW_ENV_IT
http://lod.iula.upf.edu/resources/doc_37

Local URIs: core concepts which belong to some ‘local’ class.

External URIs: instances which belong to some ‘external’ class:
• Person (FOAF)
• Document (BIBO)
• Organisation (FOAF)
• …

But, some functional reasons:


Why all this? Is it worth it?

- Displaying data to the user in a comprehensible way
- Aggregating external data in a sensible manner
- Making hidden implicit relations explicit


<usage>NER</usage> <format>XCES</format> <standard>LMF</standard>

Any good article or tool ?


NER

Projects

Services Articles

Reports

Named Entity

SELECT * WHERE { ?s ?p ms:NER }


IULA?

10!

Why all this ? – IULA at MS central node


IULA?

104


[Diagram: relational schema with the tables PERSON, ORGANISATION, RESOURCE, LICENSE, DOCUMENT and PROJECT (each with ID, name, description, ...), connected through "Has_" join tables (ID, ID).]

Everything about IULA?

SELECT * FROM WHERE { … ...} HELP!!


SELECT * WHERE { ?s ?p "IULA" }
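
To see why this single triple pattern answers "everything about IULA" without knowing any table or join structure, the pattern can be mimicked over an in-memory list of triples. This is a toy sketch, not a SPARQL engine: the sample triples, their prefixed names and the `match` helper are all hypothetical.

```python
# Hypothetical sample triples; the names are illustrative only.
TRIPLES = [
    ("res:corpus_1", "ms:creator", "IULA"),
    ("res:doc_37", "ms:publisher", "IULA"),
    ("res:project_9", "ms:partner", "IULA"),
    ("res:corpus_1", "ms:language", "eng"),
]


def match(triples, s=None, p=None, o=None):
    """Return the triples matching a pattern; None plays the role of a SPARQL variable."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


def backwards_relations(triples, value):
    """Group the subjects pointing at a value by the property that links them."""
    grouped = {}
    for subject, prop, _obj in match(triples, o=value):
        grouped.setdefault(prop, []).append(subject)
    return grouped
```

Here `match(TRIPLES, o="IULA")` plays the part of `?s ?p "IULA"`: it returns every resource related to the value, whatever the property, and `backwards_relations` arranges those hits by property for display.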



sample data (855 records)



Why all this ? – data Mashups


Backwards relations


• LOD opens new possibilities and SPARQL is a powerful tool

BUT

• The curation task is crucial and both effort- and time-consuming. You can address it as an incremental process.

Publishing LOD vs. deploying LOD applications

• Until now, the LOD community seems to have focused on “growing the cloud”.
• In this scenario, creating new URIs and mapping them to existing URIs is OK, but
• when the focus shifts from growing the cloud to developing applications, new problems arise: massive redundancy of URIs, trust in third-party servers/data, …

Conclusions & reflections