multilingualism in linked data

Multilingualism in Linked Data. Ontology Engineering Group (OEG), Artificial Intelligence Department, Universidad Politécnica de Madrid (UPM). W3C Multilingual Web Workshop, Rome, 12-13 March 2013. G. Aguado, J. Gracia, A. Gómez-Pérez, E. Montiel-Ponsoda, D. Vila


Page 1: Multilingualism in Linked Data

Multilingualism in Linked Data

Ontology Engineering Group (OEG) Artificial Intelligence Department

Universidad Politécnica de Madrid (UPM)

W3C Multilingual Web Workshop Rome, 12-13 March 2013

G. Aguado, J. Gracia, A. Gómez-Pérez, E. Montiel-Ponsoda, D. Vila

Page 2: Multilingualism in Linked Data

Foundations: the model, the data, URIs and links

RDF(S) models (ontologies) and data

[Diagram: in the data layer, Cervantes "is creator of" El Quijote; in the ontology layer, Person "is creator of" Work; Cervantes "is a" Person and El Quijote "is a" Work]

Unique identifiers: URIs identify or name a resource

Data: http://datos.bne.es/resource/XX1718747 http://datos.bne.es/resource/XX3383563

Ontology: http://iflastandards.info/ns/fr/frbr/frbrer/C1005 http://iflastandards.info/ns/fr/frbr/frbrer/C1001

Equivalence links to other datasets: Cervantes is linked by "Same As" to

http://viaf.org/viaf/17220427

http://dbpedia.org/resource/Miguel_de_Cervantes
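The foundations on this slide (triples, URIs as identifiers, and "Same As" equivalence links) can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the `http://example.org/resource/` namespace and the `isCreatorOf` property are hypothetical, and the FRBRer classes C1005 and C1001 are assumed here to stand for Person and Work. The VIAF and DBpedia URIs are the ones shown on the slide.

```python
from collections import defaultdict

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"

EX = "http://example.org/resource/"  # hypothetical namespace, not from the slides
FRBR = "http://iflastandards.info/ns/fr/frbr/frbrer/"
VIAF = "http://viaf.org/viaf/17220427"
DBPEDIA = "http://dbpedia.org/resource/Miguel_de_Cervantes"

# Each triple is (subject, predicate, object); every element is a URI.
triples = {
    (EX + "Cervantes", RDF_TYPE, FRBR + "C1005"),   # assumed: C1005 = Person
    (EX + "ElQuijote", RDF_TYPE, FRBR + "C1001"),   # assumed: C1001 = Work
    (EX + "Cervantes", EX + "isCreatorOf", EX + "ElQuijote"),
    (EX + "Cervantes", OWL_SAME_AS, VIAF),
    (VIAF, OWL_SAME_AS, DBPEDIA),
}


def equivalents(triples, uri):
    """Return every URI reachable from `uri` through sameAs links.

    sameAs is treated as symmetric and transitive, so the links are
    followed in both directions until no new URI is found.
    """
    neighbours = defaultdict(set)
    for s, p, o in triples:
        if p == OWL_SAME_AS:
            neighbours[s].add(o)
            neighbours[o].add(s)
    seen, stack = {uri}, [uri]
    while stack:
        for other in neighbours[stack.pop()]:
            if other not in seen:
                seen.add(other)
                stack.append(other)
    return seen


same_person = equivalents(triples, DBPEDIA)
```

Starting from any of the three identifiers, `equivalents` returns the same set, which is what lets an application merge what different datasets say about one resource.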

Asuncion Gomez-Perez Multilingualism in Linked Data. W3C Multilingual Web Workshop. Rome March 2013.

Page 3: Multilingualism in Linked Data

Sources of information in different languages

Sources:

Geographical Information (shp2RDF)

REST service annotation

Web 2.0

Library and Cultural Heritage

Diverse Information

Sensor Networks data

RDF Generation and Linking

Visualization:

Geographical Visualization

Linked Library Data Visualisation

Sensor Data Visualisation


Page 4: Multilingualism in Linked Data

Observatory of the Multilingual Web of Data

•  Analysis of BTC (Billion Triple Challenge) datasets

2011

•  Analyzed literals: 1,072,386,405
•  Total literals with lang tag: 116,058,734
•  % Literals with lang tag: 10.822 %
•  % Literals tagged as English: 94.68 %

2012

•  Analyzed literals: 543,933,327
•  Total literals with lang tag: 304,115,676
•  % Literals with lang tag: 55.91 %
•  % Literals tagged as English: 94.44 %
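The percentages above follow directly from the literal counts. A minimal sketch of that arithmetic, plus a hypothetical helper that tallies language tags over a small sample of literals, could look as follows. The sample data and the function names are illustrative assumptions, not the tooling used for the analysis.

```python
from collections import Counter


def percentage(part, total):
    """Share of `part` in `total`, expressed as a percentage."""
    return 100.0 * part / total


# Counts as reported on the slide for the two BTC snapshots.
btc = {
    2011: {"literals": 1_072_386_405, "tagged": 116_058_734},
    2012: {"literals": 543_933_327, "tagged": 304_115_676},
}
tagged_share = {
    year: round(percentage(c["tagged"], c["literals"]), 2)
    for year, c in btc.items()
}


def language_distribution(literals):
    """Count language tags in an iterable of (text, lang_or_None) pairs.

    Returns the per-language counts and the share of tagged literals.
    """
    literals = list(literals)
    tags = Counter(lang for _, lang in literals if lang)
    share = percentage(sum(tags.values()), len(literals)) if literals else 0.0
    return tags, share


# Illustrative sample: three tagged literals and one plain literal.
sample = [
    ("Miguel de Cervantes", "es"),
    ("ミゲル・デ・セルバンテス", "ja"),
    ("미겔 데 세르반테스", "ko"),
    ("Cervantes", None),
]
tags, share = language_distribution(sample)
```

Rounded to two decimals, the computed shares match the slide: about 10.82 % for 2011 and 55.91 % for 2012.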


[Chart: 2011 vs. 2012 comparison per language on a logarithmic scale (1 to 100,000,000); languages shown: English, French, German, English US, Spanish, Rumanian, Swedish, Chinese, Hungarian]

Page 5: Multilingualism in Linked Data

A motivating example for using multilingual LD [1]

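“Dame farmacias de guardia en Colonia que tengan Beglan” (*)

(*) "Give me the duty chemists in Cologne that have Beglan"

[Diagram: the query has to be answered across three datasets (medicine catalog, German chemists, cities), in which the relevant entries appear as "Köln" and "Serevent"]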

[1] J. Gracia, E. M. Ponsoda, P. Cimiano, A. G. Pérez, P. Buitelaar, and J. McCrae, "Challenges for the multilingual Web of Data," Journal of Web Semantics


Page 6: Multilingualism in Linked Data

Multilingualism and the Linked Data Process [2]

Specification

Modelling

RDF Generation

Publication

Links Generation

Exploitation

•  Monolingual or multilingual data resources
•  DB, documents, tables, etc.
•  Linguistic resources: Dictionaries, Lexicons, Thesauri, etc.

•  Ontology (TBox URIs)
http://phenomenontology.linkeddata.es/ontology/Municipio
http://iflastandards.info/ns/fr/frbr/frbrer/C1005

•  Data (ABox URIs)
http://geo.linkeddata.es/resource/Municipio/Madrid
http://datos.bne.es/resource/XX1718747

[2] Villazón-Terrazas, B. et al., Methodological Guidelines for Publishing Government Linked Data. In D. Wood, ed. Linking Government Data. Springer.


Page 7: Multilingualism in Linked Data


Multilingualism and the Linked Data Process

How can we adapt and translate the lexical/terminological layer of an existing ontology into other languages?

•  Multilingual labeling approach, if the languages involved share a single view on a certain domain: Ontology Localization Algorithms

•  Cross-lingual linking approach, if independent monolingual ontologies exist that cover the same or a similar subject domain (problems: conceptualization mismatches, or granularity and viewpoint differences): Cross-lingual Mapping Algorithms


Page 8: Multilingualism in Linked Data

Multilingualism and the Linked Data Process

How to represent multilingual Linked Data?
§  Traditional annotation properties for most cases
§  Richer models for more demanding applications: LexInfo, LIR, SKOS-XL

dbpedia:Miguel_de_Cervantes rdfs:label "Miguel de Cervantes"@es ,
    "ミゲル・デ・セルバンテス"@ja ,
    "미겔 데 세르반테스"@ko .
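With plain annotation properties such as rdfs:label, an application typically chooses which label to display from the language tags. A minimal sketch of that selection, assuming the labels have already been read into (text, language) pairs; the `pick_label` helper and its fallback behaviour are illustrative assumptions, not part of any model named on the slide.

```python
# Language-tagged labels for dbpedia:Miguel_de_Cervantes, as shown on the slide.
labels = [
    ("Miguel de Cervantes", "es"),
    ("ミゲル・デ・セルバンテス", "ja"),
    ("미겔 데 세르반테스", "ko"),
]


def pick_label(labels, preferred, default=None):
    """Return the first label whose language appears in `preferred`.

    `preferred` is an ordered list of language tags, most wanted first.
    If no label matches, `default` is returned.
    """
    by_lang = {lang: text for text, lang in labels}
    for lang in preferred:
        if lang in by_lang:
            return by_lang[lang]
    return default


korean = pick_label(labels, ["ko", "es"])
fallback = pick_label(labels, ["de", "ja"])   # no German label, falls back to Japanese
missing = pick_label(labels, ["fr"])          # nothing suitable
```

The richer models become relevant when a single string per language is not enough, for example when an application needs linguistic information attached to each label.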


Page 9: Multilingualism in Linked Data

Main issues of cross-lingual linking

•  How to discover cross-lingual links?

•  How to represent cross-lingual links?

•  How to store and reuse cross-lingual links?


Page 10: Multilingualism in Linked Data

Multilingualism and the Linked Data Process


How to discover correspondences between ontologies and between LD expressed in different natural languages?

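[Diagram: two health ontologies with their medicines catalogs, one in Spanish and one in German, connected by cross-lingual links. Spanish side: Beglan, rdf:type medicamento, principio activo salmeterol, via inhalador. German side: Serevent, rdf:type Medikament, Arzneistoff salmeterol, via der Inhalator. Cross-lingual links connect the corresponding elements of both sides.]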


Page 11: Multilingualism in Linked Data

Cross-lingual Link Discovery


1.  Projecting the lexical content of the ontology into a common language, then applying traditional ontology matching (OM) techniques

[Diagram: onto1 -> Translation -> onto1’; onto1’ and onto2 -> Monolingual OM -> Alignment]

2.  Comparing ontology entities directly by means of cross-lingual semantic measures (see CIDER-CL)

[Diagram: onto1 and onto2 -> Cross-lingual OM -> Alignment]
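The first approach (translate, then match monolingually) can be sketched as follows. This is a toy illustration under stated assumptions: the `ES_TO_DE` dictionary stands in for a real translation service, the string similarity from Python's `difflib` stands in for a real ontology matching technique, and the 0.8 threshold is arbitrary. The labels are taken from the medicines example earlier in the talk.

```python
from difflib import SequenceMatcher

# Hypothetical toy dictionary standing in for a translation service.
ES_TO_DE = {
    "medicamento": "Medikament",
    "inhalador": "Inhalator",
    "principio activo": "Arzneistoff",
}


def translate_labels(labels, dictionary):
    """Project labels into the target language; unknown labels are kept as-is."""
    return {label: dictionary.get(label, label) for label in labels}


def similarity(a, b):
    """Case-insensitive string similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def align(onto1_labels, onto2_labels, dictionary, threshold=0.8):
    """Translate onto1 labels, then pair each with its most similar onto2 label.

    Pairs whose similarity falls below `threshold` are discarded.
    """
    alignment = []
    for source, translated in translate_labels(onto1_labels, dictionary).items():
        best = max(onto2_labels, key=lambda target: similarity(translated, target))
        score = similarity(translated, best)
        if score >= threshold:
            alignment.append((source, best, round(score, 2)))
    return alignment


onto1 = ["medicamento", "inhalador", "principio activo"]   # Spanish labels
onto2 = ["Medikament", "der Inhalator", "Arzneistoff"]     # German labels
links = align(onto1, onto2, ES_TO_DE)
```

The second approach would skip the translation step and replace `similarity` with a measure that compares labels across languages directly.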


Page 12: Multilingualism in Linked Data

Cross-lingual Link Storage and Reuse

•  Links can be discovered:
§  at runtime -> need for scalable techniques
§  offline -> need for storage methods

•  Storage
§  Following Linked Data principles
§  Links can be stored jointly with some of the data sources that they relate (e.g., during LD generation)
§  Links can be stored in separate repositories to be accessed by semantic applications (e.g., for CL-Question Answering)
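As an illustration of the "separate repository" option, links discovered offline could be serialised on their own and loaded later by an application. The sketch below assumes the links are equivalence links expressed with owl:sameAs and written as N-Triples text; both function names are hypothetical, and the parser handles only the simple statements that `to_ntriples` produces.

```python
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def to_ntriples(links):
    """Serialise (source, target) URI pairs as owl:sameAs N-Triples statements."""
    return "".join(f"<{s}> <{OWL_SAME_AS}> <{o}> .\n" for s, o in sorted(links))


def from_ntriples(text):
    """Read back the sameAs pairs written by `to_ntriples`.

    Only handles statements of the form "<s> <p> <o> ." with URI objects,
    which is sufficient here because URIs contain no whitespace.
    """
    links = set()
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        s, p, o, _dot = line.split()
        if p == f"<{OWL_SAME_AS}>":
            links.add((s[1:-1], o[1:-1]))
    return links


# The two URIs are the ones the talk uses for Cervantes.
stored_links = {
    ("http://viaf.org/viaf/17220427",
     "http://dbpedia.org/resource/Miguel_de_Cervantes"),
}
serialised = to_ntriples(stored_links)
restored = from_ntriples(serialised)
```

Keeping the link set as its own document means it can be published, versioned, and reused independently of the datasets it connects.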


Page 13: Multilingualism in Linked Data

Multilingualism and the Linked Data Process

How can a user pose questions in their own language to be processed against the web of Linked Data?

1.  Multilingual query interpretation
2.  Query federation, ...

[Diagram: a semantic query built from the terms “Colonia” and “farmacia”]

How should the results of a semantic query be adapted to the linguistic and cultural background of a user?

1.  Adaptation and localization of user interfaces
2.  Natural language generation
3.  Presentation views adapted to specific linguistic and cultural contexts
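Multilingual query interpretation, the first step listed above, can be pictured as mapping the user's terms to language-independent identifiers before a semantic query is built. The sketch below is a deliberately naive keyword lookup: the lexicon, the `http://example.org/` URIs, and the German terms are hypothetical, and a real system would need morphological analysis and disambiguation.

```python
COLOGNE = "http://example.org/city/Cologne"      # hypothetical URI
PHARMACY = "http://example.org/class/Pharmacy"   # hypothetical URI

# Hypothetical multilingual lexicon: (language, lower-cased term) -> URI.
LEXICON = {
    ("es", "colonia"): COLOGNE,
    ("de", "köln"): COLOGNE,
    ("es", "farmacia"): PHARMACY,
    ("de", "apotheke"): PHARMACY,
}


def interpret(query, lang):
    """Map the known terms of a keyword query to URIs, in query order.

    Terms missing from the lexicon are silently ignored.
    """
    words = [w.strip('.,;:¿?¡!"“”').lower() for w in query.split()]
    return [LEXICON[(lang, w)] for w in words if (lang, w) in LEXICON]


spanish = interpret("“Colonia” “farmacia”", "es")
german = interpret("Apotheke Köln", "de")
```

Both queries resolve to the same two identifiers, so the same semantic query can be issued regardless of the language the user typed in.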


Page 14: Multilingualism in Linked Data

Services for the Multilingual Web of Data

Services for cross-lingual access

Users

Data silos ...

Services for generating multilingual Linked Data

Multilingual linguistic information

Services for cross-lingual linkage

Multilingual mappings

Linked Data

Services for translation and ontology localization


Page 15: Multilingualism in Linked Data

Thanks for your attention!


Page 16: Multilingualism in Linked Data

Research agenda on Multilingual LD at OEG

•  Ontology lexica representation: Elena Montiel, Lupe Aguado

•  Lexico-syntactic patterns: Elena Montiel, Lupe Aguado

•  Ontology localisation (translation): Elena Montiel, Jorge Gracia, Asun Gómez-Pérez

•  Exploratory analysis of the Multilingual Web of Data: Daniel Vila, Asun Gómez-Pérez, Jorge Gracia

•  Cross-lingual ontology and instance matching: Jorge Gracia, Daniel Vila

•  Query federation: Oscar Corcho
