Multilingual Access to Cultural Heritage Content on the Semantic Web Dana Dannells, Aarne Ranta, Ramona Enache, Mariana Damova, Maria Mateva [email protected] LATECH - ACL’2013 Sofia, August 2013

Upload: mariana-damova

Post on 06-May-2015


DESCRIPTION

Presentation of a method for NLP to Ontology interoperability for multilingual communication with museum content on the Semantic Web

TRANSCRIPT

Page 1: Multilingual Access to Cultural Heritage Content on the Semantic Web - Acl2013

Multilingual Access to Cultural Heritage Content on the Semantic Web

Dana Dannells, Aarne Ranta, Ramona Enache, Mariana Damova, Maria Mateva [email protected]

LATECH - ACL’2013, Sofia, August 2013

Page 2: Multilingual Access to Cultural Heritage Content on the Semantic Web - Acl2013

The aim …

To build an ontology-based application for communication of museum content on the Semantic Web and make it accessible in 15 languages.

Why? …
• Semantic Technologies
• Communicating with the Semantic Web is easier for computers than for humans
• Cultural heritage on the Web requires easy human-computer interaction

Natural Language

Page 3: Multilingual Access to Cultural Heritage Content on the Semantic Web - Acl2013

Semantic Technologies: Main Characteristics

• Easy integration of multiple data sources
– once the schemata of these sources are semantically aligned, the inference capabilities of the engine support the interlinking and combination of the facts from the different sources
• Easy querying against rich or diverse data schemata
– inference is applied to match the semantics of the query to the semantics of the data, regardless of the vocabulary and the data modeling patterns used for encoding the data
• Based on open standards (W3C)
– RDF, RDFS
– OWL
– SPARQL

"The Semantic Web is an extension of the current web in which information is given well-defined meaning, better enabling computers and people to work in cooperation." [Berners-Lee et al., "The Semantic Web", Scientific American, May 2001]
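The point about inference matching the semantics of a query to the semantics of the data can be illustrated with a minimal sketch. It assumes a toy class hierarchy and two toy facts; the names `SUBCLASS_OF`, `materialize`, `instances_of` and `Bust_X` are hypothetical and do not come from the presented system, which relies on a semantic repository rather than hand-written Python.

```python
# Hypothetical class hierarchy (child -> parent); illustrative only.
SUBCLASS_OF = {
    "Painting": "Artwork",
    "Sculpture": "Artwork",
    "Artwork": "MuseumObject",
}

# Explicit facts as (subject, predicate, object) triples.
EXPLICIT = {
    ("Guernica", "type", "Painting"),
    ("Bust_X", "type", "Sculpture"),  # placeholder object
}


def superclasses(cls):
    """Yield every ancestor of cls by following SUBCLASS_OF links."""
    while cls in SUBCLASS_OF:
        cls = SUBCLASS_OF[cls]
        yield cls


def materialize(triples):
    """Return the explicit triples plus the inferred 'type' triples."""
    inferred = set(triples)
    for s, p, o in triples:
        if p == "type":
            for ancestor in superclasses(o):
                inferred.add((s, "type", ancestor))
    return inferred


def instances_of(cls, triples):
    """Answer the query 'which subjects have type cls?'."""
    return sorted(s for s, p, o in triples if p == "type" and o == cls)


closure = materialize(EXPLICIT)
print(instances_of("Artwork", EXPLICIT))  # no explicit statement mentions Artwork
print(instances_of("Artwork", closure))   # both objects match after inference
```

The query asks for `Artwork`, a term the data never uses directly; the answer only appears once the subclass links have been applied, which is the effect the slide attributes to the inference engine.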

Page 4: Multilingual Access to Cultural Heritage Content on the Semantic Web - Acl2013

Linked Open Data: Driver of the Semantic Web

• graphs published on the web and explorable across servers in a manner similar to the way the HTML web is navigated
• combining facts and knowledge from different datasets is the ultimate goal of the Semantic Web

There is a need for convincing real-life use cases demonstrating the benefits of these technologies.

MacManus, the Founder and Editor-in-Chief of ReadWriteWeb, defined an exemplary test for the Semantic Web: cities around the world which have Modigliani art works.

Cultural Heritage

FactForge (factforge.net) gives the result based on information from 6 datasets.

Page 5: Multilingual Access to Cultural Heritage Content on the Semantic Web - Acl2013

Outline

http://molto-project.eu

• Museum Reason-able View: Interoperable cultural heritage knowledge bases

• Ontology-based multilingual grammar for retrieving and generating museum content:
– RDF to NL: well-formed descriptions
– NL to RDF: SPARQL query linearization

• Cross-language retrieval and representation system using Semantic Web technology


Page 6: Multilingual Access to Cultural Heritage Content on the Semantic Web - Acl2013

Museum Reason-able View: Interoperable cultural heritage knowledge base

Domain ontology: CIDOC Conceptual Reference Model (CRM) v. 5.0.1
Application ontologies: Painting ontology and Museum Artifacts Ontology (MAO)

Ontology            Classes  Properties
CIDOC-CRM                87         130
Painting ontology       197         107
MAO                      10          20
Total                   836         440

The museum Linked Open Data (LOD)
Gothenburg City Museum (GCM): Only two collections (GSM and GIM)
DBpedia: Larger amount of painting entities

Source    Amount of entities
GCM                       48
DBpedia                  614
Total                    662

Loaded into the OWLIM semantic repository, with inference with respect to OWL Horst
Amounting to 1,987,616 retrievable RDF triples, from 460,367 explicit statements including label translations in multiple languages

- Paintings from the Gothenburg museum acquired through donation
- Museum objects from Britain
- Museum artefacts preserved in the museum since 2005
- Paintings from the GSM collection
- Inventory numbers of the paintings from the GSM collection
- Location of the objects created by Anders Hafrin
- Paintings with length less than 1 m

PREFIX fb: <http://rdf.freebase.com/ns/>
PREFIX dbpedia: <http://dbpedia.org/resource/>
PREFIX dbp-ont: <http://dbpedia.org/ontology/>
PREFIX umbel-sc: <http://umbel.org/umbel/sc/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX ff: <http://factforge.net/>

SELECT DISTINCT ?painting_l ?owner_l ?city_fb_con ?city_db_l
WHERE {
  dbpedia:Amedeo_Modigliani fb:visual_art.visual_artist.artworks ?o .
  ?o fb:visual_art.artwork.owners [ fb:visual_art.artwork_owner_relationship.owner ?ow ] .
  ?o fb:type.object.name ?painting_l .
  ?ow ff:preferredLabel ?owner_l .
  OPTIONAL { ?ow fb:location.location.containedby [ rdf:type dbp-ont:City ; ff:preferredLabel ?city_fb_con ] } .
  OPTIONAL { ?ow dbp-ont:location [ rdf:type dbp-ont:City ; ff:preferredLabel ?city_db_l ] }
  FILTER ( bound(?city_fb_con) || bound(?city_db_l) )
}

NLP Interface required
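The last part of the Modigliani query combines two OPTIONAL patterns with a FILTER that keeps a result only if at least one city variable is bound. A minimal Python sketch of that logic follows. The rows are placeholders, not real query results, and `Row`, `keep` and `cities` are hypothetical names introduced for illustration.

```python
from typing import NamedTuple, Optional


class Row(NamedTuple):
    painting: str
    owner: str
    city_fb: Optional[str]  # bound by the first OPTIONAL pattern, else None
    city_db: Optional[str]  # bound by the second OPTIONAL pattern, else None


# Placeholder rows standing in for query solutions.
rows = [
    Row("Painting A", "Owner 1", "City X", None),
    Row("Painting B", "Owner 2", None, "City Y"),
    Row("Painting C", "Owner 3", None, None),
]


def keep(row):
    """Mirror FILTER ( bound(?city_fb_con) || bound(?city_db_l) )."""
    return row.city_fb is not None or row.city_db is not None


def cities(solutions):
    """Distinct cities of the surviving rows, preferring the first source."""
    return sorted({r.city_fb or r.city_db for r in solutions if keep(r)})


print([r.painting for r in rows if keep(r)])
print(cities(rows))
```

Even this small fragment needs knowledge of two vocabularies and of SPARQL's optional-matching rules, which is the motivation for the natural language interface.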

Page 7: Multilingual Access to Cultural Heritage Content on the Semantic Web - Acl2013

GF – Grammatical Framework

[Diagram labels: Text, Query, Answer, Data, YAQL, RGL, Lexicon, Application Grammar]

A grammar formalism, based on Martin-Löf’s type theory
Division between
- abstract syntax, i.e. the semantic representation of the domain
- concrete syntaxes, representing linearizations in various target languages
- resource library with the syntax of 30 languages, covering a couple of hundred syntactic structures

Multilingual coverage

Supported languages
Bulgarian, Catalan, Danish, Dutch, English, Finnish, French, Hebrew, Italian, German, Norwegian, Romanian, Russian, Spanish, Swedish
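The division between abstract and concrete syntax can be sketched in Python as one language-neutral tree with one linearization function per language. This is only an analogy under stated assumptions: GF builds sentences from its resource library, not from string templates, and `Painted`, `CONCRETE` and `linearize` are hypothetical names. The English and French wordings are modelled on the example outputs elsewhere in this deck.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Painted:
    """Abstract syntax node: language-neutral 'painting painted by painter'."""
    painting: str
    painter: str


# Concrete syntaxes: one linearization function per language.
# The string templates are illustrative stand-ins for real grammar rules.
CONCRETE = {
    "Eng": lambda t: f"{t.painting} was painted by {t.painter}.",
    "Fre": lambda t: f"{t.painting} a été peint par {t.painter}.",
}


def linearize(tree, lang):
    """Render the same abstract tree in the requested language."""
    return CONCRETE[lang](tree)


tree = Painted("Guernica", "Pablo Picasso")
for lang in CONCRETE:
    print(lang, linearize(tree, lang))
```

Adding a language then means adding one concrete syntax, while the abstract tree and everything that produces it stay unchanged.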

Page 8: Multilingual Access to Cultural Heritage Content on the Semantic Web - Acl2013

NL to RDF interoperability


Page 9: Multilingual Access to Cultural Heritage Content on the Semantic Web - Acl2013

Querying mechanism


Page 10: Multilingual Access to Cultural Heritage Content on the Semantic Web - Acl2013

Text grammar
Covers a subset of the ontology classes and properties
Nine classes: Title, Painter, Type, Colour, Size, Year, Material, Museum, Place

One function, three sentences
Painting description, e.g. query: Everything known about a painting
Guernica was painted by Pablo Picasso in 1937. It measures 349 by 776 cm. This painting is displayed at the Museo Nacional Centro de Arte Reina Sofía.
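The "one function, three sentences" idea can be sketched as a single Python function that turns one record into the three-sentence description shown on this slide. The record type `PaintingRecord`, its field names, and the reading of the two numbers as height and width are assumptions made for illustration; the actual text grammar is written in GF.

```python
from dataclasses import dataclass


@dataclass
class PaintingRecord:
    # Field names are hypothetical; they loosely follow the slide's classes.
    title: str
    painter: str
    year: int
    height_cm: int
    width_cm: int
    museum: str


def describe(p: PaintingRecord) -> str:
    """One function producing three sentences about one painting."""
    s1 = f"{p.title} was painted by {p.painter} in {p.year}."
    s2 = f"It measures {p.height_cm} by {p.width_cm} cm."          # pronoun
    s3 = f"This painting is displayed at the {p.museum}."          # noun phrase
    return " ".join([s1, s2, s3])


guernica = PaintingRecord(
    title="Guernica",
    painter="Pablo Picasso",
    year=1937,
    height_cm=349,
    width_cm=776,
    museum="Museo Nacional Centro de Arte Reina Sofía",
)
print(describe(guernica))
```

The second and third sentences refer back to the painting with a pronoun and a noun phrase instead of repeating the title, which is the coreference behaviour the grammar has to get right in every language.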

Page 11: Multilingual Access to Cultural Heritage Content on the Semantic Web - Acl2013

Lexicon grammar

Input:
createdBy (Guernica, Pablo Picasso)
isA (Pablo Picasso, Painter)
isA (Guernica, Painting)

Direct verbalization: Guernica is a painting. Guernica was created by Pablo Picasso. Pablo Picasso is a painter.

Actual realization: Guernica was painted by Pablo Picasso.
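The difference between direct verbalization and the actual realization can be sketched as follows. The sketch assumes a small hypothetical lexicon, `VERB_FOR`, that picks a more specific verb from the classes of the creator and the work; the function names are likewise illustrative, and the real lexicon grammar is expressed in GF.

```python
# The three input statements from the slide, as (predicate, arg1, arg2).
TRIPLES = [
    ("createdBy", "Guernica", "Pablo Picasso"),
    ("isA", "Pablo Picasso", "Painter"),
    ("isA", "Guernica", "Painting"),
]

# Hypothetical lexicon: (creator class, work class) -> past participle.
VERB_FOR = {("Painter", "Painting"): "painted"}


def direct(triples):
    """One sentence per triple, in input order, with no aggregation."""
    out = []
    for pred, a, b in triples:
        if pred == "createdBy":
            out.append(f"{a} was created by {b}.")
        elif pred == "isA":
            out.append(f"{a} is a {b.lower()}.")
    return " ".join(out)


def aggregated(triples):
    """Fold the isA facts into the verb choice of the createdBy sentence."""
    types = {a: b for pred, a, b in triples if pred == "isA"}
    out = []
    for pred, work, creator in triples:
        if pred == "createdBy":
            key = (types.get(creator), types.get(work))
            verb = VERB_FOR.get(key, "created")  # fall back to the generic verb
            out.append(f"{work} was {verb} by {creator}.")
    return " ".join(out)


print(direct(TRIPLES))
print(aggregated(TRIPLES))
```

The aggregated version conveys the same three facts in one sentence, because "painted" already implies that the work is a painting and its creator a painter.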

Page 12: Multilingual Access to Cultural Heritage Content on the Semantic Web - Acl2013

Multilingual data grammar
Contains ontology entities that were extracted from GCM and DBpedia
No adequate translations from DBpedia; entities of museum names are automatically translated from Wikipedia

Class    Entities
Title         662
Painter       116
Museum        104
Place          22
Total         904

Page 13: Multilingual Access to Cultural Heritage Content on the Semantic Web - Acl2013

Multilingual text generation

Ita. s1 : Text = mkText (mkS (mkCl painting (mkVP (mkVP (mkVP (mkVP dipinto_A) material.s) (SyntaxIta.mkAdv by8agent_Prep (title painter.long))) year.s))) ;

Fre. s1 : Text = mkText (mkS anteriorAnt (mkCl painting (mkVP (mkVP (mkVP (passiveVP paint_V2) material.s) (SyntaxFre.mkAdv by8agent_Prep (title painter.long))) year.s))) ;

Ger. s1 : Text = mkText (mkS pastTense (mkCl painting (mkVP (mkVP (mkVP (passiveVP paint_V2) year.s) (SyntaxGer.mkAdv von_Prep (title painter.long))) material.s))) ;

Rus. s1 : Text = mkText (mkS pastTense (mkCl painting (mkVP (mkVP (mkVP (passiveVP paint_V2) (SyntaxRus.mkAdv part_Prep (title painter.long masculine animate))) material.s) year.s))) ;
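The four rules above attach the same three modifiers (material, agent, year) to the verb phrase in a language-specific order. A minimal Python sketch of that ordering follows. The table records only the attachment order read off the rules, `SLOT_ORDER` and `order_slots` are hypothetical names, and the final surface word order in each language is still decided by the GF resource grammar.

```python
# Order in which the modifiers are attached in each s1 rule.
SLOT_ORDER = {
    "Ita": ["material", "agent", "year"],
    "Fre": ["material", "agent", "year"],
    "Ger": ["year", "agent", "material"],
    "Rus": ["agent", "material", "year"],
}


def order_slots(lang, slots):
    """Arrange already-rendered modifier phrases in the language's order."""
    return [slots[name] for name in SLOT_ORDER[lang]]


# Placeholder phrases; real values would come from the lexicon and data grammars.
slots = {"material": "<material>", "agent": "<agent>", "year": "<year>"}
for lang in SLOT_ORDER:
    print(lang, order_slots(lang, slots))
```

Keeping the order in a per-language table is one way to picture why the ordering of semantic elements has to be set separately for each concrete syntax.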

Page 14: Multilingual Access to Cultural Heritage Content on the Semantic Web - Acl2013

Example: Show everything you know about all paintings at the Louvre (English)

Unfinished portrait of General Bonaparte was painted on canvas by Jacques-Louis David in 1798. It measures 65 by 81 cm. This work is displayed at the Musée du Louvre.

Unfinished portrait of General Bonaparte a été peint sur canvas par Jacques-Louis David en 1798. Il est de 65 sur 81 cm. Cette oeuvre est exposée au Musée du Louvre.

http://museum.ontotext.com


Page 15: Multilingual Access to Cultural Heritage Content on the Semantic Web - Acl2013

Multilingual Challenges

• Lexicalizations
– Classes: compounds, multiword expressions
– Properties: verbs, adverbs, prepositions

• Order of semantic elements
– Material, Year

• Tense and voice
– Past, past participle, present, active/passive

• Aggregation
– Conjunction, relative clause, punctuation

• Coreference
– Pronoun, noun, empty reference

Page 16: Multilingual Access to Cultural Heritage Content on the Semantic Web - Acl2013

Related Work

Ion Androutsopoulos, Vassiliki Kokkinaki, Aggeliki Dimitromanolaki, Jo Calder, Jon Oberlander, and Elena Not. 2001. Generating multilingual personalized descriptions of museum exhibits: the M-PIRO project. In Proceedings of the International Conference on Computer Applications and Quantitative Methods in Archaeology.

Nadjet Bouayad-Agha, Gerard Casamayor, Simon Mille, Marco Rospocher, Horacio Saggion, Luciano Serafini, and Leo Wanner. 2012. From Ontology to NL: Generation of multilingual user-oriented environmental reports. Lecture Notes in Computer Science, 7337.

Basil Ell, Denny Vrandečić, and Elena Simperl. 2012. SPARTIQULATION – Verbalizing SPARQL queries. In Proceedings of ILD Workshop, ESWC 2012.

Axel-Cyrille Ngonga Ngomo, Lorenz Bühmann, Christina Unger, Jens Lehmann, and Daniel Gerber. 2013. Sorry, I don’t speak SPARQL: translating SPARQL queries into natural language. In Proceedings of the 22nd International Conference on World Wide Web, WWW ’13, pages 977–988, Republic and Canton of Geneva, Switzerland. International World Wide Web Conferences Steering Committee.

Page 17: Multilingual Access to Cultural Heritage Content on the Semantic Web - Acl2013

Conclusion and Future Work

• Method for NL to Ontology interoperability
• Large scale Semantic Web interlinked dataset
• Multilingual analysis and generation on the fly

• Experiment with the coverage of the NL grammars
• Incorporate in a real application, e.g. Europeana
• Scalability of GF
• Enlarge the knowledge base

Page 18: Multilingual Access to Cultural Heritage Content on the Semantic Web - Acl2013

References

http://molto-project.eu

Mariana Damova, Dana Dannells, Ramona Enache, Maria Mateva, Aarne Ranta. Natural Language Interaction with Semantic Web Knowledge Bases and LOD. In: Towards the Multilingual Semantic Web, Paul Buitelaar and Philipp Cimiano, eds., Springer, Heidelberg, Germany, 2013.

Dana Dannells, Mariana Damova, Ramona Enache, Milen Chechev. Multilingual Online Generation from Semantic Web Ontologies. In: Proceedings of WWW'2012, Lyon, France, April 2012.

Dana Dannells, Mariana Damova, Ramona Enache, Milen Chechev. A Framework for Improved Access to Museum Databases in the Semantic Web. In: Recent Advances in Natural Language Processing (RANLP). Language Technologies for Digital Humanities and Cultural Heritage (LaTeCH), Hissar, Bulgaria, 2011.

Mariana Damova, Dana Dannells. Reason-able View of Linked Data for Cultural Heritage. In: Proceedings of S3T'2011, Burgas, Bulgaria, September 2011, Advances in Intelligent and Soft Computing, ISSN: 1867-5662, Springer Verlag, Heidelberg, Germany, 2011.

Page 19: Multilingual Access to Cultural Heritage Content on the Semantic Web - Acl2013

Thank you for your attention

Questions

[email protected]