europeana datainaction nov2012

Europeana Semantic Data in Action (a Pilot Service based on OWLIM) http://europeana.ontotext.com Mariana Damova (PhD) (with contribution to the work by Antoine Isaac, Valentine Charles, Zdravko Tashev, Svetoslav Petrov) Europeana AGM November 2012

Upload: mariana-damova

Post on 20-Jun-2015


TRANSCRIPT

Page 1: Europeana datainaction nov2012

Europeana Semantic Data in Action (a Pilot Service based on OWLIM)

http://europeana.ontotext.com

Mariana Damova (PhD)

(with contribution to the work by Antoine Isaac, Valentine Charles, Zdravko Tashev, Svetoslav Petrov)

Europeana AGM

November 2012

Page 2: Europeana datainaction nov2012

September 2012

Page 3: Europeana datainaction nov2012

Europeana Data Standards

• Unified metadata

• ESE – Europeana Semantic Elements

  • Dublin Core & Europeana fields

  • 36 fields: flat, limited ability to express semantic links

    dc:title          europeana:provider
    dc:creator        europeana:dataProvider
    dc:subject        europeana:rights
    dc:description    europeana:type
    dc:publisher      europeana:isShownBy and/or europeana:isShownAt
    …                 …

• EDM – Europeana Data Model

  Basic data model

  Two contextual classes

Page 4: Europeana datainaction nov2012

Europeana Data in EDM

• 268GB of data in RDF

• 20M+ cultural objects data and linkages to other datasets, mainly DBpedia

• EDM model

• SKOS

Page 5: Europeana datainaction nov2012

Semantic Technologies – Main Features

• Semantic technologies (RDF, LOD) allow for an unprecedented ease of integration of heterogeneous data sources

– Already adopted in pharmaceuticals and publishing industries

BBC – when MySQL was replaced with OWLIM in their “Dynamic Semantic Publishing” architecture, the BBC team observed considerable reduction of complexity of database design, query specification, application development, and query evaluation time. BBC World Cup 2010 dynamic semantic publishing. Jem Rayfield, Senior Technical Architect BBC News and Knowledge.

http://www.bbc.co.uk/blogs/bbcinternet/2010/07/bbc_world_cup_2010_dynamic_sem.html

Page 6: Europeana datainaction nov2012

Linking Open Data

• Linking Open Data (LOD) W3C SWEO Community project http://esw.w3.org/topic/SweoIG/TaskForces/CommunityProjects/LinkingOpenData

• Initiative for publishing “linked data” – a set of principles which allows browsing of RDF data, spread across different servers, in the way HTML is browsed

Page 7: Europeana datainaction nov2012

Semantic Technologies and Cultural Heritage

• Combining facts and knowledge from different datasets

• Need for convincing real-life use cases demonstrating the benefits of these technologies

The cultural heritage domain can become a useful use case for the application of semantic technologies.

MacManus, the Founder and Editor-in-Chief of ReadWriteWeb, defined an exemplary test for the Semantic Web: find the cities around the world which have Modigliani art works

Page 8: Europeana datainaction nov2012

FactForge of Ontotext solves the Modigliani query by combining knowledge from 6 datasets from the Linked Open Data Cloud

http://factforge.net
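
The Modigliani test is essentially a join across datasets: one source knows who created an artwork, another knows which institution holds it, and a third knows which city that institution is in. A minimal sketch of that idea over in-memory triples follows. The identifiers (`ex:artwork1`, `ex:museumA`, `ex:cityX` and so on), the predicate names and the three toy datasets are invented placeholders for illustration; they are not FactForge data and do not reflect how FactForge models this query.

```python
# Toy illustration of answering one question by joining facts from several datasets.
# Every identifier and triple below is an invented placeholder, not real data.

dataset_art = {
    ("ex:artwork1", "ex:creator", "ex:Modigliani"),
    ("ex:artwork2", "ex:creator", "ex:Modigliani"),
    ("ex:artwork3", "ex:creator", "ex:OtherArtist"),
}
dataset_holdings = {
    ("ex:artwork1", "ex:heldBy", "ex:museumA"),
    ("ex:artwork2", "ex:heldBy", "ex:museumB"),
    ("ex:artwork3", "ex:heldBy", "ex:museumC"),
}
dataset_geo = {
    ("ex:museumA", "ex:locatedIn", "ex:cityX"),
    ("ex:museumB", "ex:locatedIn", "ex:cityY"),
    ("ex:museumC", "ex:locatedIn", "ex:cityZ"),
}


def objects(triples, subject, predicate):
    """Return all objects o such that (subject, predicate, o) is in triples."""
    return {o for s, p, o in triples if s == subject and p == predicate}


def subjects(triples, predicate, obj):
    """Return all subjects s such that (s, predicate, obj) is in triples."""
    return {s for s, p, o in triples if p == predicate and o == obj}


def cities_with_works_by(artist, *datasets):
    """Merge the datasets, then follow artist -> artwork -> holder -> city."""
    merged = set().union(*datasets)
    cities = set()
    for artwork in subjects(merged, "ex:creator", artist):
        for holder in objects(merged, artwork, "ex:heldBy"):
            cities |= objects(merged, holder, "ex:locatedIn")
    return cities


result = cities_with_works_by(
    "ex:Modigliani", dataset_art, dataset_holdings, dataset_geo
)
print(sorted(result))
```

No single toy dataset can answer the question on its own; the answer only appears once the three are merged, which is the point the Modigliani test is meant to make.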

Page 9: Europeana datainaction nov2012

OWLIM - a scalable, robust and efficient triple store

– Serving the two most important web-sites for the London Olympic Games

  • Official Olympics website

  • BBC Olympics website

– Performance highlights

  • OWLIM loads the 100M and the 200M datasets almost twice as fast as the next best product (17 min. for 100M)

  • Best query performance among those repositories that can handle update and multi-client query tasks (5,285 query mixes per hour, where a query mix contains 25 queries; e.g. about 100 queries/sec)

  • OWLIM v5 is 43% faster than v.4.3 on the BSBM Explore and Update scenario

  • OWLIM v5 requires between 25% and 70% less storage space

• OWL 2 RL-type languages have proven to be the only feasible approach for reasoning with billions of statements
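
As a quick sanity check on the throughput figure, query mixes per hour can be converted to queries per second. The sketch below assumes the rate is simply mixes times queries per mix, spread evenly over one hour; the function name is made up for this example. Under that reading the result is roughly 37 queries/sec, so the "about 100 queries/sec" on the slide presumably depends on benchmark details that this transcript does not capture.

```python
# Convert a "query mixes per hour" figure into queries per second.
QUERY_MIXES_PER_HOUR = 5285  # figure quoted on the slide
QUERIES_PER_MIX = 25         # figure quoted on the slide
SECONDS_PER_HOUR = 3600


def queries_per_second(mixes_per_hour, queries_per_mix):
    """Assumes queries are spread evenly over the hour (an assumption of this sketch)."""
    return mixes_per_hour * queries_per_mix / SECONDS_PER_HOUR


qps = queries_per_second(QUERY_MIXES_PER_HOUR, QUERIES_PER_MIX)
print(f"{qps:.1f} queries/sec")
```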

Page 10: Europeana datainaction nov2012

Reason-able View with Europeana data in EDM

• 268GB of data

• cultural objects data and linkages to other datasets

Loaded into OWLIM with inference wrt OWL-Horst Optimized

Dataset size:

NumberOfStatements=3,899,531,218

NumberOfExplicitStatements=993,332,911

NumberOfEntities=264,523,842

EDM model

SKOS
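
The repository statistics above give a rough picture of how much the inference step adds. The arithmetic below is a sketch that assumes NumberOfStatements counts explicit plus inferred statements, so that the difference between the two counters is the number of inferred statements; the variable names are chosen for this example only.

```python
# Figures taken from the slide.
total_statements = 3_899_531_218
explicit_statements = 993_332_911
entities = 264_523_842

# Assumption: total = explicit + inferred.
inferred_statements = total_statements - explicit_statements
expansion_ratio = total_statements / explicit_statements
statements_per_entity = total_statements / entities

print(f"inferred statements:   {inferred_statements:,}")
print(f"expansion ratio:       {expansion_ratio:.2f}x")
print(f"statements per entity: {statements_per_entity:.2f}")
```

Under that assumption, inference adds about 2.9 billion statements, close to three inferred statements for every explicit one.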

Page 11: Europeana datainaction nov2012

SPARQL endpoint

• http://europeana.ontotext.com
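
A SPARQL endpoint is normally queried over HTTP following the SPARQL 1.1 Protocol: the query text goes in a `query` parameter and the `Accept` header selects the result format. The sketch below only builds such a request with the Python standard library. The `/sparql` path is an assumption (the slide gives only the site root), and the pilot service may no longer be online, so the network call is left commented out.

```python
import urllib.parse
import urllib.request

# Assumption: the endpoint path. The slide only gives http://europeana.ontotext.com
ENDPOINT = "http://europeana.ontotext.com/sparql"


def build_sparql_request(query, endpoint=ENDPOINT):
    """Build a SPARQL Protocol GET request that asks for JSON results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )


sample_query = "SELECT * WHERE { ?s ?p ?o } LIMIT 5"
request = build_sparql_request(sample_query)
print(request.full_url)

# To actually send it (needs network access and a live endpoint):
# import json
# with urllib.request.urlopen(request, timeout=30) as response:
#     results = json.load(response)
```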

Page 12: Europeana datainaction nov2012
Page 13: Europeana datainaction nov2012

Semantic Queries over Structured Data

• Available objects with their aggregators

• Data providers that have contributed content to Europeana

• Datasets from Italy

• Objects from the 18th century provided to Europeana

• The original URL, the copyright and the Creative Commons rights of objects provided by The European Library

• Copyrights and Creative Commons rights of Europeana objects per provider

• Enrichment statements produced by Europeana for objects provided by institutions from the United Kingdom

• List of Europeana enriched objects from Sweden, their equivalents and related entities

• Time enrichment statements produced by Europeana for provided objects

• The complete ordered list of Europeana aggregators and the specific data providers they gather
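
Two of the queries listed above can be sketched as SPARQL text using standard EDM and ORE terms (`ore:Aggregation`, `edm:aggregatedCHO`, `edm:provider`, `edm:dataProvider`). This is a hedged illustration rather than the queries shipped with the pilot service: how the pilot dataset actually attaches providers to objects is an assumption, and the function names are made up for this example.

```python
EDM_PREFIXES = """\
PREFIX edm: <http://www.europeana.eu/schemas/edm/>
PREFIX ore: <http://www.openarchives.org/ore/terms/>
"""


def _check_limit(limit):
    """Reject non-positive limits before they end up in the query text."""
    if int(limit) <= 0:
        raise ValueError("limit must be a positive integer")
    return int(limit)


def data_providers_query(limit=100):
    """SPARQL text listing the distinct data providers found on aggregations."""
    return EDM_PREFIXES + f"""\
SELECT DISTINCT ?dataProvider
WHERE {{
  ?aggregation a ore:Aggregation ;
               edm:dataProvider ?dataProvider .
}}
ORDER BY ?dataProvider
LIMIT {_check_limit(limit)}
"""


def objects_with_aggregators_query(limit=100):
    """SPARQL text pairing each aggregated object with its provider (aggregator)."""
    return EDM_PREFIXES + f"""\
SELECT ?object ?aggregator
WHERE {{
  ?aggregation a ore:Aggregation ;
               edm:aggregatedCHO ?object ;
               edm:provider ?aggregator .
}}
LIMIT {_check_limit(limit)}
"""


providers_sparql = data_providers_query(limit=10)
print(providers_sparql)
```

The resulting strings can be sent to any SPARQL endpoint that holds EDM data; the `LIMIT` keeps exploratory queries cheap on a repository of this size.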

Page 14: Europeana datainaction nov2012

Europeana objects with their images

Page 15: Europeana datainaction nov2012

Other cultural heritage sources available for interlinking

Gothenburg City Museum objects

• Oil paintings from the GIM collection

• Paintings valued at less than 5000 Swedish kronor

• Paintings with a Gothenburg motif

• Portraits and their painters

• Museum objects from Swedish museums

• Museum objects more than 30 centimeters in height

• Paintings given as a present to the Gothenburg City Museum

http://museum.ontotext.com

Page 16: Europeana datainaction nov2012

Linking Open Data Cloud

Page 17: Europeana datainaction nov2012

Outlook …

Europeana Creative - PSP project led by the Austrian National Library

26 partners

Objective: experimenting with re-use of cultural content for creativity

Project: Europeana re-use framework and 6 pilots in different domains such as education, tourism, etc.

Ontotext: participates in the infrastructure for re-use with the semantic repository OWLIM, and data integration

Sofia, 13 March 2012

Page 18: Europeana datainaction nov2012

Ontotext

– Top-5 provider of core Semantic Technology

– Established in year 2000; offices in Bulgaria, UK, USA

– Active both in research and commercial projects (FP7 funding for 10 years)

• 360° semantic technology – unique portfolio:

– Semantic Databases: high-performance RDF DBMS, scalable reasoning

– Semantic Search: text-mining (IE), metadata generation, Information Retrieval (IR)

– Web Mining: focused crawling, screen scraping, data fusion

– Linked Data Management and Data Integration

Good recognition in the SemTech community

– Ontotext pages are ranked #1 for “semantic annotation” and “semantic repository” at GYM, #3 for “linked data management” at Google

Several joint ventures and subsidiaries

– Innovantage: leading online recruitment intelligence provider in UK

Page 19: Europeana datainaction nov2012

Ontotext Clients (selected)

British Broadcasting Corporation (BBC)

– Ran its World Cup 2010 sites on top of OWLIM

– Since Mar’12 BBC Sports

– 2012 Olympics sections are driven by OWLIM and a Concept Extraction service developed by Ontotext

Press Association (UK)

– Analysis of sports news

– Concept extraction

– Linked data generation

Top-3 USA media (not allowed to name)

The National Archives (UK) contracted Ontotext to implement semantic KB and semantic search for the Government Web Archive

British Museum (UK) Ontotext leads the development of Phase 3 of ResearchSpace project on collaborative research in cultural heritage; British Museum’s public SPARQL end-point is powered by OWLIM

Page 20: Europeana datainaction nov2012

Ontotext in the Cultural Heritage Domain

Selected commercial projects

ResearchSpace project funded by the Andrew W. Mellon Foundation: support for collaborative web-based research, information sharing and web publishing for the cultural heritage scholarly community. An Ontotext-led international consortium.

The Polish Digital National Museum aggregates artifacts from over 70 contributing cultural institutions in the Digital Libraries Federation PIONIER Network using the OWLIM repository of Ontotext.

LODAC (Linked Open Data in Academia): Japan's National Institute of Informatics aggregates various information across multiple Japanese resources as LOD. The system uses 8 OWLIM nodes and aggregates 19 collections with 700 000 entities and 15M triples.

SemTech for Cultural Heritage project funded by ITCC: semantic publishing of Bulgarian cultural heritage to Europeana; establishing a Bulgarian technical aggregator for Europeana.

Selected research projects

MOLTO FP7 project: a use case in cultural heritage for a semantic knowledge representation infrastructure for querying RDF and presenting query results; includes close to 9K museum objects from two collections of The Gothenburg City

Charisma (Cultural Heritage Advanced Research Infrastructures), an EU-funded integrating activity project with a consortium of 21 partners and metadata from 6 major European cultural institutions, has selected the OWLIM repository of Ontotext.

Page 21: Europeana datainaction nov2012

Thank you for your attention!


[email protected]