Metadata Normalisation in Europeana
The Hague, 13 & 14 January 2009
Julie Verleyen
Scientific Coordinator, Europeana Office
EuropeanaLocal Knowledge Sharing Workshop
Session
A. Workflow
B. Metadata normalisation with ESE
C. Approach in practice: Demo of tools used
D. Knowledge sharing workshop: Discussion of the practice for EuropeanaLocal
A. Workflow
Stage #0: Content survey
Input: Questionnaire
Output: Specifications of content contribution (Excel specs)
Stage #1: Harvesting and package creation
Input: Excel specs
Output:
• Harvested data in XML (XML raw data)
• Collection-specific analysis tool (HTML analysis tool)
• Sample of source data: 1000 records (XML sample raw data)
• Mapping specifications template (TXT mapping template)
Stage #2: Analysis and mapping specifications
Input: Excel specs, HTML analysis tool, XML sample raw data, TXT mapping template
Output: TXT mapping specs
Stage #3: Mapping and normalisation
Input: XML raw data, TXT mapping specs
Output: XML normalised mapped data, XML profile
Diagram labels: Normaliser, quality check
Stage #4: Database storage and indexing
Input: XML normalised mapped data
Output: DB, index
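The hand-over of artefacts between stages can be sketched as a small data structure. This is only an illustration: the assignment of artefacts to inputs and outputs is one reading of the diagram labels on the slides, and the names `Stage`, `STAGES` and `missing_inputs` are hypothetical, not part of any Europeana tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    number: int
    name: str
    inputs: tuple
    outputs: tuple


# Assumed reading of the workflow slides (illustration only).
STAGES = (
    Stage(0, "Content survey", ("questionnaire",), ("Excel specs",)),
    Stage(1, "Harvesting and package creation", ("Excel specs",),
          ("XML raw data", "HTML analysis tool",
           "XML sample raw data", "TXT mapping template")),
    Stage(2, "Analysis and mapping specifications",
          ("Excel specs", "HTML analysis tool",
           "XML sample raw data", "TXT mapping template"),
          ("TXT mapping specs",)),
    Stage(3, "Mapping and normalisation",
          ("XML raw data", "TXT mapping specs"),
          ("XML normalised mapped data", "XML profile")),
    Stage(4, "Database storage and indexing",
          ("XML normalised mapped data",), ("DB", "index")),
)


def missing_inputs(stages, external=("questionnaire",)):
    """Return (stage number, artefact) pairs for inputs that no earlier stage produced."""
    available = set(external)
    missing = []
    for stage in stages:
        for artefact in stage.inputs:
            if artefact not in available:
                missing.append((stage.number, artefact))
        available.update(stage.outputs)
    return missing
```

Running `missing_inputs(STAGES)` on the full chain returns an empty list, which is a quick way to confirm that every stage only consumes what an earlier stage produced.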
B. Metadata normalisation with ESE
Europeana Semantic Elements (ESE)
• Europeana “Schema” for the Prototype
• Based on Dublin Core Metadata Element Set (DCMES) (ISO )
• 49 elements (26 elements & 23 refinements)
• Created through discussions in July/August 2008
ESE specialities
• europeana:country
• europeana:provider (dc:source)
• europeana:language (dc:language)
• europeana:type (dc:type, dc:format)
• europeana:year (dc:date)
• europeana:isShownBy (dc:relation)
• europeana:isShownAt (dc:relation)
• europeana:object
• europeana:uri (dc:identifier)
All normalised:
• Syntax
• Value
Let’s examine their characteristics
Normalised ESE terms: Country
Definition: Country of content provider. If several countries: Europe
Format: String, e.g. switzerland, germany, …
Reference: TEL controlled list. Supports TEL interface translation mechanism
Mechanism: Manual
In portal: Facet browsing of search results
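The country rule can be sketched in a few lines. This is a minimal illustration, not the Europeana implementation: `TEL_COUNTRIES` is a hypothetical three-entry stand-in for the TEL controlled list, the function name is invented, and the lower-case spelling "europe" is assumed from the lower-case examples on the slide.

```python
# Hypothetical stand-in for the TEL controlled list of countries (illustration only).
TEL_COUNTRIES = {"switzerland", "germany", "norway"}


def normalise_country(countries):
    """Return a europeana:country value for a provider.

    A single country is lower-cased and checked against the controlled list;
    several distinct countries collapse to "europe".
    """
    cleaned = {c.strip().lower() for c in countries if c.strip()}
    if not cleaned:
        raise ValueError("no country given")
    if len(cleaned) > 1:
        return "europe"
    (country,) = cleaned
    if country not in TEL_COUNTRIES:
        raise ValueError(f"{country!r} is not in the controlled list")
    return country
```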
Normalised ESE terms: Provider
Definition: Organisation sending the data to Europeana
Format: String, e.g. Musées lausannois, Nasjonalbiblioteket, …
Reference: Europeana controlled list of content providers: <original_name>
Mechanism: Manual but potentially can be automated
In portal: Facet browsing of search results
Normalised ESE terms: Language
Definition: Language of provider’s country (ESE: languages of the metadata)
Format: 2 letters, e.g. it, no, fr, en, es, …
Reference: ISO 639-1 language codes
Exception: If several languages: “mul”
Mechanism: Manual but potentially can be automated
In portal: Facet browsing of search results
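Since the slide notes that this step could be automated, here is one way the rule might look in code. It is a sketch under assumptions: the function name is hypothetical, and the check only verifies the two-letter shape of a code, not its membership in ISO 639-1.

```python
import re


def normalise_language(codes):
    """Return a europeana:language value: one 2-letter code, or "mul" if several."""
    cleaned = sorted({c.strip().lower() for c in codes if c.strip()})
    if not cleaned:
        raise ValueError("no language given")
    for code in cleaned:
        # Shape check only; a real tool would look the code up in ISO 639-1.
        if not re.fullmatch(r"[a-z]{2}", code):
            raise ValueError(f"{code!r} is not a 2-letter code")
    return cleaned[0] if len(cleaned) == 1 else "mul"
```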
Normalised ESE terms: Type
Definition: Type of the original object
Format: String
Reference: 4 Europeana types: IMAGE, TEXT, SOUND, VIDEO
Mechanism: Manual: Mapping specified by content provider
In portal: Categorisation display; facet browsing of search results
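A provider-specified type mapping could be applied roughly as follows. The source values in `TYPE_MAPPING` are invented examples, and the function name is hypothetical; only the four target types come from the slide.

```python
EUROPEANA_TYPES = {"IMAGE", "TEXT", "SOUND", "VIDEO"}

# Hypothetical provider-supplied mapping from source type values to Europeana types.
TYPE_MAPPING = {"photograph": "IMAGE", "manuscript": "TEXT", "film": "VIDEO"}


def map_type(source_value, mapping=TYPE_MAPPING):
    """Return the Europeana type for a source value, or None if it is unmapped."""
    target = mapping.get(source_value.strip().lower())
    if target is not None and target not in EUROPEANA_TYPES:
        raise ValueError(f"{target!r} is not a Europeana type")
    return target
```

Returning `None` for an unmapped value leaves the decision about what to do with such records to the caller.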
Normalised ESE terms: Year
Definition: Date of creation of the original object (analog or born digital)
Format: 4 digits [YYYY], e.g. 1950
Reference: Europeana year
Mechanism: Automatic extraction with “YearExtractor” converter
In portal: Facet browsing of search results; browsing by time (timeline)
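The slides do not show how the “YearExtractor” converter works. The following is a guess at the general idea, pulling standalone four-digit groups out of a free-text date value; the function name and the regular expression are assumptions, not the converter's actual logic.

```python
import re

# Four consecutive digits that are not part of a longer run of digits.
_YEAR = re.compile(r"(?<!\d)(\d{4})(?!\d)")


def extract_years(date_value):
    """Return the distinct 4-digit years found in a date string, in order of appearance."""
    seen = []
    for match in _YEAR.finditer(date_value):
        year = match.group(1)
        if year not in seen:
            seen.append(year)
    return seen
```

A compact date such as `19500312` yields nothing with this pattern, so a real converter would need additional rules for such formats.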
Normalised ESE terms: isShownBy
Definition: URL to the digital object
Format: URL (http://...)
Mechanism: Automatic or manual
In portal: Linking
Normalised ESE terms: isShownAt
Definition: URL to the digital object with context
Format: URL (http://...)
Mechanism: Automatic or manual
In portal: Linking
Normalised ESE terms: Object
Definition: URL to the digital object as thumbnail
Format: URL (http://...)
Mechanism: Automatic or manual
In portal: Display
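The three URL-valued terms (isShownBy, isShownAt, object) share the same format requirement, which could be checked with the standard library. The function name is hypothetical, and accepting `https` alongside `http` is an assumption that goes beyond the slide.

```python
from urllib.parse import urlparse


def is_http_url(value):
    """Return True if value looks like an absolute http(s) URL with a host part."""
    parts = urlparse(value.strip())
    return parts.scheme in ("http", "https") and bool(parts.netloc)
```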
Normalised ESE terms: URI
Definition: Record identifier for Europeana system
Format: URI
Mechanism: Automatic: special algorithm guaranteeing uniqueness (and integrity) of records
Example: http://www.europeana.eu/resolve/record/91101/0BAF44EDF8B98F1322DEEAD4AB989778E6394418
In portal: MyEuropeana; full digital object view in Europeana
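The slide does not name the algorithm. The last path segment of the example URI is 40 hexadecimal characters, which is the length of a SHA-1 digest, so one plausible sketch hashes the provider's original identifier. Everything here is an assumption made for illustration: the choice of SHA-1, what gets hashed, the reading of `91101` as a collection number, and the function name.

```python
import hashlib

# Prefix copied from the example URI on the slide.
RESOLVE_BASE = "http://www.europeana.eu/resolve/record"


def make_record_uri(collection_id, original_identifier):
    """Build a record URI from a collection id and a hash of the source identifier.

    Assumption: SHA-1 over the UTF-8 bytes of the original identifier.
    """
    digest = hashlib.sha1(original_identifier.encode("utf-8")).hexdigest().upper()
    return f"{RESOLVE_BASE}/{collection_id}/{digest}"
```

A hash of this kind is deterministic, so re-ingesting the same source identifier would give the same URI, which fits the uniqueness goal stated on the slide.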
C. Approach in practice: Demo of tools used
Metadata normalisation in practice
Demo of stage #3’s workflow:
1. Go through data of example collection #1
2. Practical exercise: let’s normalise example collection #2 for Europeana!
3. Two examples of known issues
[Diagram: stage #3, mapping & normalisation. A collection folder in Subversion (SVN) holds the source XML, the mapping specs (TXT), the mapping/normalisation specs (XML) and the output XML.]
Example 1: “Midas” collection
83 moving image records from the Association des Cinémathèques Européennes
• Harvested data
• Fields mapping/Type values mapping specs
• Analysis file (source data)
• Mapping file
• Profile file
• Analysis file + sample (normalised data)
Example 2: “Outsider Art Museum” collection
4142 records from the Musées Lausannois
Known issues with mapping/profile files
1. Wrong syntax in mapping file causes errors in profile.xml:
Using “=>” inside a comment in mapping.txt creates a mapping entry in profile.xml!
Ex: ……… (BEFORE / AFTER)
Known issues with mapping/profile files
2. Wrong syntax in mapping file causes errors in profile.xml:
There should be two blanks, not one, between “=>” and “N/A”; otherwise the mapping specification is not well formatted in XML in profile.xml.
Ex: …………………. (MAPPING.TXT / PROFILE.XML)
profile.xml with error: 2 white spaces!
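Both known issues lend themselves to a check run on mapping.txt before profile.xml is generated. The sketch below follows the rules as worded on the slides. The comment marker of mapping.txt is not given in the source, so `comment_prefix` is a hypothetical parameter with an arbitrary default, and the function name is invented.

```python
import re

# "=>" followed by a run of blanks and then "N/A"; group 1 captures the blanks.
_NA_RULE = re.compile(r"=>( *)N/A")


def lint_mapping(lines, comment_prefix="#"):
    """Return (line number, message) pairs for the two known issues.

    comment_prefix is an assumed comment marker; adjust it to the real syntax.
    """
    problems = []
    for number, line in enumerate(lines, start=1):
        if line.lstrip().startswith(comment_prefix):
            if "=>" in line:
                problems.append((number, '"=>" inside a comment'))
            continue
        match = _NA_RULE.search(line)
        if match and len(match.group(1)) != 2:
            problems.append((number, 'expected two blanks between "=>" and "N/A"'))
    return problems
```

The function only reports line numbers and messages; it does not rewrite the file, so the mapping author stays in control of the fix.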
Documentation in Europeana context
Europeana Semantic Elements (ESE) v3.1
“Europeana – Data Offline Preparation”
Commented version of “profile.xml”
“Quality Control Checklist”
D. Knowledge sharing workshop: Discussion of the practice for EuropeanaLocal
Discussion
Questions about Europeana metadata ingestion/normalisation process?
Integration and/or compatibility of this process with EuropeanaLocal content strategy:
• Where will normalisation take place?
• By whom?
• …
Discarding factors during normalisation
• Duplicated records
• Records without URLs to digital object
• Records without Europeana type (SOUND, TEXT, IMAGE, VIDEO)
• Records pointing to copyright-protected digital objects
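The four discarding factors can be expressed as a simple filter over records. This is a sketch under assumptions: records are modelled as plain dictionaries, the keys `id`, `isShownBy`, `isShownAt`, `type` and `copyright_protected` are hypothetical, duplicates are detected by `id` only, and the type list uses the four Europeana types (IMAGE, TEXT, SOUND, VIDEO).

```python
EUROPEANA_TYPES = {"IMAGE", "TEXT", "SOUND", "VIDEO"}


def discard_reason(record, seen_ids):
    """Return why a record would be discarded, or None to keep it."""
    if record.get("id") in seen_ids:
        return "duplicate"
    if not (record.get("isShownBy") or record.get("isShownAt")):
        return "no URL to digital object"
    if record.get("type") not in EUROPEANA_TYPES:
        return "no Europeana type"
    if record.get("copyright_protected"):
        return "copyright-protected"
    return None


def filter_records(records):
    """Split records into kept records and (id, reason) pairs for discarded ones."""
    kept, discarded, seen_ids = [], [], set()
    for record in records:
        reason = discard_reason(record, seen_ids)
        if reason is None:
            kept.append(record)
            seen_ids.add(record.get("id"))
        else:
            discarded.append((record.get("id"), reason))
    return kept, discarded
```

Keeping the reason next to each discarded identifier makes it possible to report back to the content provider why a record did not make it into the portal.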