Metadata Normalisation in Europeana

The Hague, 13 & 14 January 2009

Julie Verleyen

Scientific Coordinator, Europeana Office

EuropeanaLocal Knowledge Sharing Workshop

Session

A. Workflow

B. Metadata normalisation with ESE

C. Approach in practice: Demo of tools used

D. Knowledge Sharing Workshop: Discussion of the practice for EuropeanaLocal

A. Workflow

Stage #0: Content survey

Output: Specifications of content contribution

[Diagram: input and output of this stage. Labels: CONTENT SURVEY #0; Excel specs; questionnaire]

Stage #1: Harvesting and package creation

Output:

• Harvested data in XML

• Collection-specific analysis tool

• Sample of source data: 1000 records

• Mapping specifications template

[Diagram: input and output of this stage. Labels: Excel specs; XML raw data; HTML analysis tool; XML sample raw data; TXT mapping template]

Stage #2: Analysis and mapping specifications

[Diagram: input and output of this stage. Labels: Excel specs; TXT mapping specs; HTML analysis tool; XML sample raw data; TXT mapping template]

Stage #3: Mapping and normalisation

[Diagram: input and output of this stage. Labels: XML raw data; TXT mapping specs; XML normalised mapped data; XML profile; Quality check; NORMALISER, STAGE 3]

Stage #4: Database storage and indexing

[Diagram: input and output of this stage. Labels: XML normalised mapped data; DB; INDEX]

B. Metadata normalisation with ESE

Europeana Semantic Elements (ESE)

• Europeana “Schema” for the Prototype

• Based on Dublin Core Metadata Element Set (DCMES) (ISO )

• 49 Elements (26 Elements & 23 Refinements)

• Created through discussions in July/August 2008

ESE specialities

• europeana:country

• europeana:provider (dc:source)

• europeana:language (dc:language)

• europeana:type (dc:type, dc:format)

• europeana:year (dc:date)

• europeana:isShownBy (dc:relation)

• europeana:isShownAt (dc:relation)

• europeana:object

• europeana:uri (dc:identifier)

All normalised: syntax and value

Let’s examine their characteristics

Normalised ESE terms: Country

Definition: Country of content provider. If several countries: Europe

Format: String, ex: switzerland, germany,…

Reference: TEL controlled list. Supports TEL interface translation mechanism

Mechanism: Manual

In portal: Facet browsing of search results

Normalised ESE terms: Provider

Definition: Organisation sending the data to Europeana

Format: String, ex: Musées lausannois, Nasjonalbiblioteket,…

Reference: Europeana controlled list of content providers: <original_name>

Mechanism: Manual, but can potentially be automated

In portal: Facet browsing of search results

Normalised ESE terms: Language

Definition: Language of provider’s country (ESE: languages of the metadata)

Format: 2 letters, ex: it, no, fr, en, es,…

Reference: ISO 639-1 language codes

Exception: If several languages: “mul”

Mechanism: Manual, but can potentially be automated

In portal: Facet browsing of search results
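The language rule above (a two-letter ISO 639-1 code, or “mul” when there are several languages) could be sketched as follows. This is an illustration only, not the Europeana tooling: the function name `normalise_language` and the short `KNOWN_CODES` subset are assumptions made for the example.

```python
# Illustrative subset of ISO 639-1 codes (assumption); a real check
# would use the complete code list.
KNOWN_CODES = {"it", "no", "fr", "en", "es", "de", "nl"}


def normalise_language(values):
    """Return a single europeana:language value for a list of language codes."""
    # Trim, lower-case and de-duplicate the supplied codes.
    codes = {v.strip().lower() for v in values if v.strip()}
    if not codes:
        raise ValueError("no language given")
    unknown = codes - KNOWN_CODES
    if unknown:
        raise ValueError(f"not a known ISO 639-1 code: {sorted(unknown)}")
    # Several languages collapse to the exception value "mul".
    if len(codes) > 1:
        return "mul"
    return next(iter(codes))
```

Raising an error on unknown codes, rather than passing them through, is a design choice of this sketch; it keeps non-normalised values out of the facet.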

Normalised ESE terms: Type

Definition: Type of the original object

Format: String

Reference: 4 Europeana types: IMAGE, TEXT, SOUND, VIDEO

Mechanism: Manual: Mapping specified by content provider

In portal: Categorisation display; facet browsing of search results
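A type mapping of this kind can be pictured as a lookup table from provider values to the four Europeana types. The sketch below is hypothetical: the provider values in `TYPE_MAPPING` are invented for illustration, since in practice the content provider specifies the mapping.

```python
EUROPEANA_TYPES = {"IMAGE", "TEXT", "SOUND", "VIDEO"}

# Hypothetical provider values (assumption); the real mapping is
# specified by each content provider.
TYPE_MAPPING = {
    "photograph": "IMAGE",
    "painting": "IMAGE",
    "book": "TEXT",
    "film": "VIDEO",
    "recording": "SOUND",
}


def normalise_type(source_value, mapping=TYPE_MAPPING):
    """Return the Europeana type for a provider value, or None if unmapped."""
    target = mapping.get(source_value.strip().lower())
    # Guard against a mapping table that points outside the four types.
    if target is not None and target not in EUROPEANA_TYPES:
        raise ValueError(f"{target!r} is not a Europeana type")
    return target
```

Returning `None` for unmapped values lets a later step decide what to do with records that end up without a Europeana type.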

Normalised ESE terms: Year

Definition: Date of creation of the original object (analog or born digital)

Format: 4 digits [YYYY], ex: 1950

Reference: Europeana year

Mechanism: Automatic extraction with “YearExtractor” converter

In portal: Facet browsing of search results; browsing by time (timeline)
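The slides do not describe how the “YearExtractor” converter works internally. As a rough illustration of the idea, the sketch below pulls stand-alone four-digit numbers out of a free-text date value; the function name `extract_years` and the regular expression are assumptions, not the actual converter.

```python
import re

# A run of exactly four digits that is not part of a longer number.
_YEAR = re.compile(r"(?<!\d)\d{4}(?!\d)")


def extract_years(date_value):
    """Return all stand-alone 4-digit years found in a date string."""
    return _YEAR.findall(date_value)
```

With this sketch, `extract_years("ca. 1890-1895")` yields both years, while a value such as "19th century" yields an empty list and would need separate handling.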

Normalised ESE terms: isShownBy

Definition: URL to the digital object

Format: URL (http://...)

Mechanism: Automatic or manual

In portal: Linking

Normalised ESE terms: isShownAt

Definition: URL to the digital object with context

Format: URL (http://...)

Mechanism: Automatic or manual

In portal: Linking

Normalised ESE terms: Object

Definition: URL to the digital object as thumbnail

Format: URL (http://...)

Mechanism: Automatic or manual

In portal: Display

Normalised ESE terms: URI

Definition: Record identifier for Europeana system

Format: URI

Mechanism: Automatic: special algorithm guaranteeing uniqueness (and integrity) of records

Example: http://www.europeana.eu/resolve/record/91101/0BAF44EDF8B98F1322DEEAD4AB989778E6394418

In portal: MyEuropeana; full digital object view in Europeana
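The slides do not name the algorithm behind the record URI. The final segment of the example URI is 40 hexadecimal characters, which is consistent with the length of a SHA-1 digest, so the sketch below assumes SHA-1 over the provider's own identifier. Both the choice of hash and the input being hashed are assumptions; the function name `make_record_uri` is hypothetical.

```python
import hashlib

RESOLVE_BASE = "http://www.europeana.eu/resolve/record"


def make_record_uri(collection_id, source_identifier):
    """Build a record URI from a collection id and a source identifier.

    Assumption: the last path segment is an upper-case SHA-1 hex digest
    of the source identifier. The real algorithm is not documented here.
    """
    digest = hashlib.sha1(source_identifier.encode("utf-8")).hexdigest().upper()
    return f"{RESOLVE_BASE}/{collection_id}/{digest}"
```

A hash of this kind is deterministic, so re-ingesting the same record yields the same URI, and two different source identifiers are very unlikely to collide.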

C. Approach in practice: Demo of tools used

Metadata normalisation in practice

Demo of stage #3’s workflow:

1. Go through data of example collection #1

2. Practical exercise: let’s normalise example collection #2 for Europeana!!

3. Two examples of known issues

[Diagram: Stage #3, Mapping & normalisation. Labels: Subversion (SVN); collection folder; source XML; mapping specs TXT; mapping/norm. specs XML; output XML]

Example 1: “Midas” collection

83 moving image records from the Association des Cinémathèques Européennes

• Harvested data

• Fields mapping/Type values mapping specs

• Analysis file (source data)

• Mapping file

• Profile file

• Analysis file + sample (normalised data)

Example 2: “Outsider Art Museum” collection

4142 records from the Musées Lausannois

Known issues with mapping/profile files

1. Wrong syntax in the mapping file causes errors in profile.xml: if “=>” is used in a comment in mapping.txt, it creates a mapping entry in profile.xml!

Ex: ………

[Screenshots: BEFORE, AFTER]

Known issues with mapping/profile files

2. Wrong syntax in the mapping file causes errors in profile.xml: there should be 2 blanks between “=>” and “N/A”, not one; otherwise the mapping specification is not well formatted in XML in profile.xml.

Ex: ………………….

[Screenshots: MAPPING.TXT and PROFILE.XML (two pairs); caption: “profile.xml with error: 2 white spaces!”]
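Both known issues come from how lines containing “=>” are read. The sketch below shows one way a tolerant reader could avoid them. It is not the Europeana tool's parser: the `source => target` line layout, the `#` comment marker and the field names in the usage example are assumptions made for illustration only.

```python
def parse_mapping(text, comment_marker="#"):
    """Parse 'source => target' lines into (source, target) pairs.

    Assumptions: one mapping per line, comments start with
    `comment_marker`. Comment lines are skipped even when they contain
    "=>", and any amount of whitespace around "=>" is accepted.
    """
    entries = []
    for line in text.splitlines():
        stripped = line.strip()
        # Skip blank lines and comments before looking for "=>".
        if not stripped or stripped.startswith(comment_marker):
            continue
        if "=>" not in stripped:
            continue
        source, target = (part.strip() for part in stripped.split("=>", 1))
        entries.append((source, target))
    return entries
```

For example, a comment line such as `# title => dc:title is mapped below` produces no entry, and `notes => N/A` and `notes =>  N/A` both parse to `("notes", "N/A")`.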

Documentation in Europeana context

Europeana Semantic Elements (ESE) v3.1

“Europeana – Data Offline Preparation”

Commented version of “profile.xml”

“Quality Control Checklist”

D. Knowledge Sharing Workshop: Discussion of the practice for EuropeanaLocal

Discussion

Questions about the Europeana metadata ingestion/normalisation process?

Integration and/or compatibility of this process with the EuropeanaLocal content strategy:

• Where will normalisation take place?

• By whom?

Thank you

Julie.Verleyen@kb.nl

Discarding factors during normalisation

• Duplicated records

• Records without URLs to digital object

• Records without Europeana type (SOUND, TEXT, IMAGE, VIDEO)

• Records linking to copyright-protected digital objects
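The first three discarding factors can be sketched as a simple filter. This is a minimal illustration under stated assumptions: records are modelled as plain dictionaries with hypothetical keys `uri`, `isShownBy`, `isShownAt` and `type`, and the copyright factor is left out because the slides do not say how it is determined.

```python
EUROPEANA_TYPES = {"IMAGE", "TEXT", "SOUND", "VIDEO"}


def filter_records(records):
    """Split records into (kept, discarded), giving a reason for each discard."""
    kept, discarded = [], []
    seen = set()  # URIs of records already kept
    for record in records:
        if record.get("uri") in seen:
            discarded.append((record, "duplicate"))
        elif not (record.get("isShownBy") or record.get("isShownAt")):
            discarded.append((record, "no URL to digital object"))
        elif record.get("type") not in EUROPEANA_TYPES:
            discarded.append((record, "no Europeana type"))
        else:
            seen.add(record.get("uri"))
            kept.append(record)
    return kept, discarded
```

Keeping the reason alongside each discarded record makes it easier to report back to the content provider which records were dropped and why.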
