europeana datainaction nov2012

Post on 20-Jun-2015


Europeana Semantic Data in Action (a Pilot Service based on OWLIM)

http://europeana.ontotext.com

Mariana Damova (PhD)

(with contributions to the work by Antoine Isaac, Valentine Charles, Zdravko Tashev, Svetoslav Petrov)

Europeana AGM

November 2012

September 2012

Europeana Data Standards

• Unified metadata

• ESE – Europeana Semantic Elements
  – Dublin Core & Europeana fields
  – 36 fields: flat, with limited ability to express semantic links

dc:title          europeana:provider
dc:creator        europeana:dataProvider
dc:subject        europeana:rights
dc:description    europeana:type
dc:publisher      europeana:isShownBy and/or europeana:isShownAt
…                 …


• EDM – Europeana Data Model
  – Basic data model
  – Two contextual classes
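To make the contrast between a flat ESE record and EDM concrete, here is a minimal sketch in Python. It is not the official ESE-to-EDM mapping (which also involves proxies and further classes). It only illustrates the idea that descriptive fields stay on the cultural object while provider, rights and link fields move to an aggregation resource. The `#aggregation` URI pattern, the function name and the sample record are hypothetical.

```python
# Fields that, in this simplified sketch, are attached to the aggregation
# rather than to the cultural object itself.
AGGREGATION_FIELDS = {"provider", "dataProvider", "rights", "isShownBy", "isShownAt"}


def ese_to_edm_triples(object_uri, ese_record):
    """Turn a flat ESE-style dict into (subject, predicate, object) triples."""
    aggregation_uri = object_uri + "#aggregation"  # hypothetical URI pattern
    triples = [
        (object_uri, "rdf:type", "edm:ProvidedCHO"),
        (aggregation_uri, "rdf:type", "ore:Aggregation"),
        (aggregation_uri, "edm:aggregatedCHO", object_uri),
    ]
    for field, value in ese_record.items():
        prefix, _, local = field.partition(":")
        if prefix == "europeana" and local in AGGREGATION_FIELDS:
            # Provider, rights and link information describes the aggregation.
            triples.append((aggregation_uri, "edm:" + local, value))
        elif prefix == "europeana":
            # Remaining europeana: fields (e.g. type) stay on the object.
            triples.append((object_uri, "edm:" + local, value))
        else:
            # Dublin Core fields describe the object directly.
            triples.append((object_uri, field, value))
    return triples


# Hypothetical sample record, for illustration only.
sample = {
    "dc:title": "Portrait of a woman",
    "europeana:dataProvider": "Example Museum",
    "europeana:type": "IMAGE",
}
for triple in ese_to_edm_triples("http://example.org/item/1", sample):
    print(triple)
```

Once the record is expressed as triples, values such as `dc:creator` can be replaced by URIs of contextual resources, which is what the flat ESE format cannot express.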

Europeana Data in EDM

• 268GB of data in RDF

• Data on 20M+ cultural objects, with linkages to other datasets, mainly DBpedia

• EDM model

• SKOS

Semantic Technologies – Main Features

• Semantic technologies (RDF, LOD) allow for an unprecedented ease of integration of heterogeneous data sources
  – Already adopted in the pharmaceutical and publishing industries
  – BBC: when MySQL was replaced with OWLIM in their “Dynamic Semantic Publishing” architecture, the BBC team observed a considerable reduction in the complexity of database design, query specification and application development, and in query evaluation time.

Source: “BBC World Cup 2010 dynamic semantic publishing”, Jem Rayfield, Senior Technical Architect, BBC News and Knowledge.
http://www.bbc.co.uk/blogs/bbcinternet/2010/07/bbc_world_cup_2010_dynamic_sem.html

Linking Open Data

• Linking Open Data (LOD): W3C SWEO Community project
  http://esw.w3.org/topic/SweoIG/TaskForces/CommunityProjects/LinkingOpenData

• Initiative for publishing “linked data”: a set of principles that allows RDF data, spread across different servers, to be browsed the way HTML is browsed
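Browsing RDF "the way HTML is browsed" usually means dereferencing a resource URI over HTTP and asking for an RDF serialization through the `Accept` header. The sketch below only builds such a request with Python's standard library and does not send it. The DBpedia URI is an illustrative choice, and whether a given server honours `text/turtle` is an assumption to verify per server.

```python
from urllib.request import Request


def rdf_request(resource_uri, media_type="text/turtle"):
    """Build (but do not send) a GET request asking for an RDF representation."""
    return Request(resource_uri, headers={"Accept": media_type}, method="GET")


# Illustrative resource URI; any linked data URI could be used instead.
req = rdf_request("http://dbpedia.org/resource/Amedeo_Modigliani")
print(req.get_method(), req.full_url)
print("Accept:", req.get_header("Accept"))

# To actually fetch the data (requires network access):
#   from urllib.request import urlopen
#   with urlopen(req) as response:
#       turtle = response.read().decode("utf-8")
```

The URIs found in the returned RDF can be dereferenced in the same way, which is how a client follows links from one dataset to another.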

Semantic Technologies and Cultural Heritage

• Semantic technologies combine facts and knowledge from different datasets

• There is a need for convincing real-life use cases demonstrating the benefits of these technologies

• The cultural heritage domain can become a useful use case for the application of semantic technologies

MacManus, the Founder and Editor-in-Chief of ReadWriteWeb, defined an exemplary test for the Semantic Web: find the cities around the world which have Modigliani art works.

FactForge of Ontotext solves the Modigliani query by combining knowledge from 6 datasets from the Linked Open Data Cloud.

http://factforge.net

OWLIM - a scalable, robust and efficient triple store

– Serving the two most important websites for the London Olympic Games
  • Official Olympics website
  • BBC Olympics website

– Performance highlights
  • OWLIM loads the 100M and the 200M datasets almost twice as fast as the next best product (17 min. for 100M)
  • Best query performance among those repositories that can handle update and multi-client query tasks (5,285 query mixes per hour, where a query mix contains 25 queries; e.g. about 100 queries/sec)
  • OWLIM v5 is 43% faster than v4.3 on the BSBM Explore and Update scenario
  • OWLIM v5 requires between 25% and 70% less storage space

• OWL 2 RL-type languages have proven to be the only feasible approach for reasoning with billions of statements

Reason-able View with Europeana data in EDM

• 268GB of data
• Cultural objects data and linkages to other datasets
• EDM model
• SKOS

Loaded into OWLIM with inference wrt OWL-Horst Optimized

Dataset size:
NumberOfStatements = 3,899,531,218
NumberOfExplicitStatements = 993,332,911
NumberOfEntities = 264,523,842
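The dataset size figures give a rough idea of how much the inference adds. The short calculation below assumes that NumberOfStatements counts explicit and inferred statements together, so the difference is the number of inferred statements. That reading is an assumption, not something the slide states.

```python
# Figures taken from the dataset size listing.
total_statements = 3_899_531_218
explicit_statements = 993_332_911

# Assumption: total = explicit + inferred.
inferred_statements = total_statements - explicit_statements
expansion_ratio = total_statements / explicit_statements

print(f"Inferred statements: {inferred_statements:,}")   # 2,906,198,307
print(f"Total / explicit:    {expansion_ratio:.2f}")     # 3.93
```

Under that assumption, each explicit statement is accompanied on average by roughly three inferred ones.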

SPARQL endpoint

• http://europeana.ontotext.com

Semantic Queries over Structured Data

• Available objects with their aggregators

• Data providers that have contributed content to Europeana

• Datasets from Italy

• Objects from the 18th century provided to Europeana

• The original URL, the copyright and the Creative Commons rights of objects provided by The European Library

• Copyrights and Creative Commons rights of Europeana objects per provider

• Enrichment statements produced by Europeana for objects provided by institutions from the United Kingdom

• List of Europeana enriched objects from Sweden, their equivalents and related entities

• Time enrichment statements produced by Europeana for provided objects

• The complete ordered list of Europeana aggregators and the specific data providers they gather
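As an illustration of what one of the queries listed above might look like, the sketch below builds a SPARQL query for the data providers that contributed content and encodes it as a GET request URL, following the SPARQL protocol's `query` parameter. The `edm:` and `ore:` namespaces are those of the EDM and OAI-ORE vocabularies. The `/sparql` endpoint path, the `LIMIT` and the query shape itself are assumptions and not the queries shipped with the pilot service.

```python
from urllib.parse import urlencode

# Assumed endpoint path; the slides only give the host name.
ENDPOINT = "http://europeana.ontotext.com/sparql"

DATA_PROVIDERS_QUERY = """
PREFIX edm: <http://www.europeana.eu/schemas/edm/>
PREFIX ore: <http://www.openarchives.org/ore/terms/>

SELECT DISTINCT ?dataProvider
WHERE {
  ?aggregation a ore:Aggregation ;
               edm:dataProvider ?dataProvider .
}
ORDER BY ?dataProvider
LIMIT 100
""".strip()


def sparql_get_url(endpoint, query):
    """Encode a SPARQL query as a GET URL using the protocol's 'query' parameter."""
    return endpoint + "?" + urlencode({"query": query})


url = sparql_get_url(ENDPOINT, DATA_PROVIDERS_QUERY)
print(url)
```

The other queries in the list would follow the same pattern, with additional triple patterns, for example on `edm:rights` for the rights-related queries.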

Europeana objects with their images

Other cultural heritage sources available for interlinking

Gothenburg City Museum objects

• Oil paintings from the GIM collection

• Paintings valued at less than 5000 Swedish kronor

• Paintings with a Gothenburg motif

• Portraits and their painters

• Museum objects from Swedish museums

• Museum objects more than 30 centimeters high

• Paintings given as a present to the Gothenburg City Museum

http://museum.ontotext.com

Linking Open Data Cloud

Outlook …

Europeana Creative - PSP project led by the Austrian National Library

• 26 partners

• Objective: experimenting with re-use of cultural content for creativity

• Project: Europeana re-use framework and 6 pilots in different domains such as education, tourism, etc.

• Ontotext: participates in the infrastructure for re-use with the semantic repository OWLIM, and data integration

Ontotext

– Top-5 provider of core Semantic Technology

– Established in year 2000; offices in Bulgaria, UK, USA

– Active both in research and commercial projects (FP7 funding for 10 years)

• 360° semantic technology – unique portfolio:

– Semantic Databases: high-performance RDF DBMS, scalable reasoning

– Semantic Search: text-mining (IE), metadata generation, Information Retrieval (IR)

– Web Mining: focused crawling, screen scraping, data fusion

– Linked Data Management and Data Integration

Good recognition in the SemTech community

– Ontotext pages are ranked #1 for “semantic annotation” and “semantic repository” at GYM, #3 for “linked data management” at Google

Several joint ventures and subsidiaries

– Innovantage: leading online recruitment intelligence provider in UK

Ontotext Clients (selected)

British Broadcasting Corporation (BBC)

– Ran its World Cup 2010 sites on top of OWLIM

– Since Mar’12 BBC Sports

– 2012 Olympics sections are driven by OWLIM and a Concept Extraction service developed by Ontotext

Press Association (UK)

– Analysis of sports news

– Concept extraction

– Linked data generation

Top-3 USA media (not allowed to name)

The National Archives (UK): contracted Ontotext to implement a semantic KB and semantic search for the Government Web Archive

British Museum (UK): Ontotext leads the development of Phase 3 of the ResearchSpace project on collaborative research in cultural heritage; the British Museum’s public SPARQL endpoint is powered by OWLIM

Ontotext in the Cultural Heritage Domain

Selected commercial projects

ResearchSpace project, funded by the Andrew W. Mellon Foundation: support for collaborative web-based research, information sharing and web publishing for the cultural heritage scholarly community. An Ontotext-led international consortium.

The Polish Digital National Museum aggregates artifacts from over 70 contributing cultural institutions in the Digital Libraries Federation of the PIONIER Network, using the OWLIM repository of Ontotext.

LODAC (Linked Open Data in Academia), Japan's National Institute of Informatics: aggregates various information across multiple Japanese resources as LOD. The system uses 8 OWLIM nodes and aggregates 19 collections with 700 000 entities and 15M triples.

SemTech for Cultural Heritage project, funded by ITCC: semantic publishing of Bulgarian cultural heritage to Europeana; establishing a Bulgarian technical aggregator for Europeana.

Selected research projects

MOLTO FP7 project: a use case in cultural heritage for a semantic knowledge representation infrastructure for querying RDF and presenting query results; includes close to 9K museum objects from two collections of The Gothenburg City

Charisma (Cultural Heritage Advanced Research Infrastructures): an EU-funded integrating activity project with a consortium of 21 partners and metadata from 6 major European cultural institutions; it has selected the OWLIM repository of Ontotext

Thank you for your attention!


mariana.damova@ontotext.com
