multilingual access to cultural heritage content on the semantic web - acl2013
DESCRIPTION
Presentation of a method for NLP to Ontology interoperability for multilingual communication with museum content on the Semantic Web
TRANSCRIPT
Multilingual Access to Cultural Heritage Content on the Semantic Web
Dana Dannells, Aarne Ranta, Ramona Enache, Mariana Damova, Maria Mateva
LATECH - ACL’2013
Sofia, August 2013
The aim …
To build an ontology-based application for communication of museum content on the Semantic Web and make it accessible in 15 languages.
Why? …
• Semantic Technologies
• Communicating with the Semantic Web is easier for computers than for humans
• Cultural heritage on the Web requires easy human-computer interaction
Natural Language
Semantic Technologies: Main Characteristics
• Easy integration of multiple data sources
– once the schemata of these sources are semantically aligned, the inference capabilities of the engine support the interlinking and combination of the facts from the different sources;
• Easy querying against rich or diverse data schemata
– inference is applied to match the semantics of the query to the semantics of the data, regardless of the vocabulary and the data modeling patterns used for encoding the data;
• Based on open standards (W3C)
– RDF, RDFS
– OWL
– SPARQL
"The Semantic Web is an extension of the current web in which information is given well-defined meaning, better enabling computers and people to work in cooperation." [Berners-Lee et al., "The Semantic Web", Scientific American, May 2001]
Linked Open Data: Driver of the Semantic Web
• graphs published on the web and explorable across servers in a manner similar to the way the HTML web is navigated
• combining facts and knowledge from different datasets is the ultimate goal of the Semantic Web
Need for convincing real-life use cases demonstrating the benefits of these technologies
MacManus, the founder and editor-in-chief of ReadWriteWeb, defined an exemplary test for the Semantic Web:
cities around the world which have Modigliani art works
Cultural Heritage
FactForge (factforge.net) gives the result based on information from 6 datasets
Outline
http://molto-project.eu
• Museum Reason-able View: Interoperable cultural heritage knowledge bases
• Ontology-based multilingual grammar for retrieving and generating museum content:
– RDF to NL: well-formed descriptions
– NL to RDF: SPARQL query linearization
• Cross-language retrieval and representation system using Semantic Web technology
Museum Reason-able View: Interoperable cultural heritage knowledge base
Domain ontology: CIDOC Conceptual Reference Model (CRM) v. 5.0.1
Application ontologies: Painting ontology and Museum Artifacts Ontology (MAO)

Ontology            Classes  Properties
CIDOC-CRM                87         130
Painting ontology       197         107
MAO                      10          20
Total                   836         440
The museum Linked Open Data (LOD)
Gothenburg City Museum (GCM): only two collections (GSM and GIM)
DBpedia: larger number of painting entities

Source   Number of entities
GCM                      48
DBpedia                 614
Total                   662
Loaded into the OWLIM semantic repository, with inference with respect to OWL Horst
Amounting to 1,987,616 retrievable RDF triples, from 460,367 explicit statements, including label translations in multiple languages
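The gap between explicit statements and retrievable triples comes from inference: the repository derives additional triples from the loaded ones. The Python sketch below illustrates the idea with only two RDFS-style rules (transitivity of rdfs:subClassOf and propagation of rdf:type along it), which is a small fraction of what OWL Horst covers. The `ex:` class and instance names are hypothetical and not taken from the museum ontologies, and the code says nothing about how OWLIM itself is implemented.

```python
# Toy forward chaining over RDF-like triples (illustrative only).
SUBCLASS = "rdfs:subClassOf"
TYPE = "rdf:type"

def infer(explicit):
    """Return the explicit triples plus those derived by two RDFS-style rules."""
    triples = set(explicit)
    changed = True
    while changed:
        changed = False
        derived = set()
        for s, p, o in triples:
            for s2, p2, o2 in triples:
                if o != s2 or p2 != SUBCLASS:
                    continue
                # (A subClassOf B), (B subClassOf C) => (A subClassOf C)
                if p == SUBCLASS:
                    derived.add((s, SUBCLASS, o2))
                # (x type A), (A subClassOf B) => (x type B)
                elif p == TYPE:
                    derived.add((s, TYPE, o2))
        new = derived - triples
        if new:
            triples |= new
            changed = True
    return triples

# Hypothetical explicit statements (not from the museum data).
explicit = {
    ("ex:Painting", SUBCLASS, "ex:MuseumObject"),
    ("ex:MuseumObject", SUBCLASS, "ex:PhysicalThing"),
    ("ex:Guernica", TYPE, "ex:Painting"),
}
closure = infer(explicit)
print(len(explicit), "explicit ->", len(closure), "retrievable")
# prints: 3 explicit -> 6 retrievable
```

Three explicit statements yield six retrievable ones here; the same mechanism, with a richer rule set, accounts for the difference between the two counts reported above.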
- Paintings from the Gothenburg museum acquired through donation
- Museum objects from Britain
- Museum artefacts preserved in the museum since 2005
- Paintings from the GSM collection
- Inventory numbers of the paintings from the GSM collection
- Location of the objects created by Anders Hafrin
- Paintings with length less than 1 m
PREFIX fb: <http://rdf.freebase.com/ns/>
PREFIX dbpedia: <http://dbpedia.org/resource/>
PREFIX dbp-ont: <http://dbpedia.org/ontology/>
PREFIX umbel-sc: <http://umbel.org/umbel/sc/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX ff: <http://factforge.net/>

SELECT DISTINCT ?painting_l ?owner_l ?city_fb_con ?city_db_l
WHERE {
  # Artworks by Modigliani and their owners (Freebase)
  dbpedia:Amedeo_Modigliani fb:visual_art.visual_artist.artworks ?o .
  ?o fb:visual_art.artwork.owners [ fb:visual_art.artwork_owner_relationship.owner ?ow ] .
  # Labels of the artwork and of its owner
  ?o fb:type.object.name ?painting_l .
  ?ow ff:preferredLabel ?owner_l .
  # City of the owner, via Freebase containment or DBpedia location
  OPTIONAL { ?ow fb:location.location.containedby [ rdf:type dbp-ont:City ; ff:preferredLabel ?city_fb_con ] } .
  OPTIONAL { ?ow dbp-ont:location [ rdf:type dbp-ont:City ; ff:preferredLabel ?city_db_l ] }
  # Keep only results for which at least one city was found
  FILTER ( bound(?city_fb_con) || bound(?city_db_l) )
}
NLP Interface required
GF – Grammatical Framework
[Figure: diagram with the labels Text, Query, Answer, Data, YAQL, RGL, Lexicon, Application Grammar]
A grammar formalism based on Martin-Löf’s type theory
Division between
- abstract syntax, i.e. the semantic representation of the domain
- concrete syntaxes, representing linearizations in various target languages
- resource library with the syntax of 30 languages, covering a couple of hundred syntactic structures
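The division between abstract and concrete syntax can be illustrated outside GF. The Python sketch below does not use GF or its resource library: it models one abstract tree and two hand-written linearization functions. The names `Painted`, `lin_eng` and `lin_fre` are hypothetical, and the sentence patterns are simplified from the generated examples in this talk.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Painted:
    """Abstract syntax: the language-independent content of one fact."""
    painting: str
    painter: str
    year: int

def lin_eng(t: Painted) -> str:
    """Concrete syntax for English (simplified string template)."""
    return f"{t.painting} was painted by {t.painter} in {t.year}."

def lin_fre(t: Painted) -> str:
    """Concrete syntax for French (simplified string template)."""
    return f"{t.painting} a été peint par {t.painter} en {t.year}."

# One abstract tree, several concrete syntaxes.
CONCRETE = {"Eng": lin_eng, "Fre": lin_fre}

tree = Painted("Guernica", "Pablo Picasso", 1937)
for lang, lin in CONCRETE.items():
    print(lang, lin(tree))
```

Adding a language means adding one more linearization function while the abstract tree stays unchanged. A real GF grammar obtains agreement and word order from the resource library rather than from string templates like these.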
Multilingual coverage
Supported languages
Bulgarian, Catalan, Danish, Dutch, English, Finnish, French, Hebrew, Italian, German, Norwegian, Romanian, Russian, Spanish, Swedish
NL to RDF interoperability
Querying mechanism
Text grammar
Covers a subset of the ontology classes and properties
Nine classes: Title, Painter, Type, Colour, Size, Year, Material, Museum, Place
One function, three sentences
Painting description, e.g. query: Everything known about a painting
Guernica was painted by Pablo Picasso in 1937. It measures 349 by 776 cm. This painting is displayed at the Museo Nacional Centro de Arte Reina Sofía.
Lexicon grammar
Input:
createdBy (Guernica, Pablo Picasso)
isA (Pablo Picasso, Painter)
isA (Guernica, Painting)
Direct verbalization: Guernica is a painting. Guernica was created by Pablo Picasso. Pablo Picasso is a painter.
Actual realization: Guernica was painted by Pablo Picasso.
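The contrast between direct verbalization and the aggregated realization can be sketched in Python. This is a simplified illustration, not the GF lexicon grammar: the rule that `createdBy` combined with a `Painter` type is realized with the verb "paint" is an assumption made here to reproduce the example, and the function names are hypothetical.

```python
TRIPLES = [
    ("createdBy", "Guernica", "Pablo Picasso"),
    ("isA", "Pablo Picasso", "Painter"),
    ("isA", "Guernica", "Painting"),
]

def article(noun: str) -> str:
    """Pick the English indefinite article by first letter (rough heuristic)."""
    return "an" if noun[0].lower() in "aeiou" else "a"

def direct(triples) -> str:
    """Verbalize one sentence per triple, in input order."""
    out = []
    for pred, subj, obj in triples:
        if pred == "isA":
            out.append(f"{subj} is {article(obj)} {obj.lower()}.")
        elif pred == "createdBy":
            out.append(f"{subj} was created by {obj}.")
    return " ".join(out)

def aggregated(triples) -> str:
    """Fold the creator's type into the verb: createdBy + Painter -> 'painted by'."""
    types = {s: o for p, s, o in triples if p == "isA"}
    out = []
    for pred, subj, obj in triples:
        if pred == "createdBy":
            verb = "painted" if types.get(obj) == "Painter" else "created"
            out.append(f"{subj} was {verb} by {obj}.")
    return " ".join(out)

print(direct(TRIPLES))
print(aggregated(TRIPLES))
```

The aggregated version drops the two `isA` sentences because their information is carried by the verb choice, which is why the realization is a single sentence.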
Multilingual data grammar
Contains ontology entities that were extracted from GCM and DBpedia
No adequate translations from DBpedia; entities of museum names are automatically translated from Wikipedia

Class    Entities
Title         662
Painter       116
Museum        104
Place          22
Total         904
Multilingual text generation
Ita. s1 : Text = mkText (mkS (mkCl painting (mkVP (mkVP (mkVP (mkVP dipinto_A) material.s) (SyntaxIta.mkAdv by8agent_Prep (title painter.long))) year.s))) ;
Fre. s1 : Text = mkText (mkS anteriorAnt (mkCl painting (mkVP (mkVP (mkVP (passiveVP paint_V2) material.s) (SyntaxFre.mkAdv by8agent_Prep (title painter.long))) year.s))) ;
Ger. s1 : Text = mkText (mkS pastTense (mkCl painting (mkVP (mkVP (mkVP (passiveVP paint_V2) year.s) (SyntaxGer.mkAdv von_Prep (title painter.long))) material.s))) ;
Rus. s1 : Text = mkText (mkS pastTense (mkCl painting (mkVP (mkVP (mkVP (passiveVP paint_V2) (SyntaxRus.mkAdv part_Prep (title painter.long masculine animate))) material.s) year.s))) ;
Example: Show everything you know about all paintings at the Louvre (English)
Unfinished portrait of General Bonaparte was painted on canvas by Jacques-Louis David in 1798. It measures 65 by 81 cm. This work is displayed at the Musée du Louvre.
Unfinished portrait of General Bonaparte a été peint sur canvas par Jacques-Louis David en 1798. Il est de 65 sur 81 cm. Cette oeuvre est exposée au Musée du Louvre.
http://museum.ontotext.com
Multilingual Challenges
• Lexicalizations
  Classes: compounds, multiword expressions
  Properties: verbs, adverbs, prepositions
• Order of semantic elements
  Material, Year
• Tense and voice
  Past, past participle, present, active/passive
• Aggregation
  Conjunction, relative clause, punctuation
• Coreference
  Pronoun, noun, empty reference
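The "order of semantic elements" challenge can be read off the four GF sentence definitions (Ita., Fre., Ger., Rus.) in this talk, where material, agent and year are attached to the verb phrase in different orders. The Python sketch below only records that attachment order; `SLOT_ORDER` and `order_slots` are hypothetical names, and the final surface word order in each language is decided by the GF resource grammar, which this sketch does not model.

```python
# Order in which the elements are attached after the verb in each definition.
SLOT_ORDER = {
    "Ita": ("material", "agent", "year"),
    "Fre": ("material", "agent", "year"),
    "Ger": ("year", "agent", "material"),
    "Rus": ("agent", "material", "year"),
}

def order_slots(lang: str, slots: dict) -> list:
    """Return the slot values in the attachment order of the given language."""
    return [slots[name] for name in SLOT_ORDER[lang]]

# Placeholders stand in for the actual phrases.
slots = {"material": "<material>", "agent": "<painter>", "year": "<year>"}
for lang in SLOT_ORDER:
    print(lang, " ".join(order_slots(lang, slots)))
```

Keeping the order in a per-language table mirrors the design of the grammar: the abstract content is shared, and only the concrete syntax decides where each element goes.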
Related Work
Ion Androutsopoulos, Vassiliki Kokkinaki, Aggeliki Dimitromanolaki, Jo Calder, Jon Oberlander, and Elena Not. 2001. Generating multilingual personalized descriptions of museum exhibits: the M-PIRO project. In Proceedings of the International Conference on Computer Applications and Quantitative Methods in Archaeology.
Nadjet Bouayad-Agha, Gerard Casamayor, Simon Mille, Marco Rospocher, Horacio Saggion, Luciano Serafini, and Leo Wanner. 2012. From Ontology to NL: Generation of multilingual user-oriented environmental reports. Lecture Notes in Computer Science, 7337.
Basil Ell, Denny Vrandečić, and Elena Simperl. 2012. SPARTIQULATION – Verbalizing SPARQL queries. In Proceedings of ILD Workshop, ESWC 2012.
Axel-Cyrille Ngonga Ngomo, Lorenz Bühmann, Christina Unger, Jens Lehmann, and Daniel Gerber. 2013. Sorry, I don’t speak SPARQL: translating SPARQL queries into natural language. In Proceedings of the 22nd International Conference on World Wide Web, WWW ’13, pages 977–988, Republic and Canton of Geneva, Switzerland. International World Wide Web Conferences Steering Committee.
Conclusion and Future Work
• Method for NL to Ontology interoperability
• Large scale Semantic Web interlinked dataset
• Multilingual analysis and generation on the fly

• Experiment with the coverage of the NL grammars
• Incorporate in a real application, i.e. Europeana
• Scalability of GF
• Enlarge the knowledge base
References
http://molto-project.eu
Mariana Damova, Dana Dannells, Ramona Enache, Maria Mateva, Aarne Ranta. Natural Language Interaction with Semantic Web Knowledge Bases and LOD. In: Towards the Multilingual Semantic Web, Paul Buitelaar and Philipp Cimiano, eds., Springer, Heidelberg, Germany, 2013.
Dana Dannells, Mariana Damova, Ramona Enache, Milen Chechev. Multilingual Online Generation from Semantic Web Ontologies. In: Proceedings of WWW'2012, Lyon, France, April 2012.
Dana Dannells, Mariana Damova, Ramona Enache, Milen Chechev. A Framework for Improved Access to Museum Databases in the Semantic Web. In: Recent Advances in Natural Language Processing (RANLP), Language Technologies for Digital Humanities and Cultural Heritage (LaTeCH), Hissar, Bulgaria, 2011.
Mariana Damova, Dana Dannells. Reason-able View of Linked Data for Cultural Heritage. In: Proceedings of S3T'2011, Burgas, Bulgaria, September 2011, Advances in Intelligent and Soft Computing, ISSN: 1867-5662, Springer Verlag, Heidelberg, Germany, 2011.