Controlled Vocabularies in TELPlus

Antoine ISAAC, Vrije Universiteit Amsterdam

EDLProject Workshop, 22-23 November 2007

Agenda

• TELPlus Context

• Improving subject access
  – 3 sub-tasks

• Services for TEL

TELPlus Context

• Started October 2007
• Running 27 months
• Content WPs
  – OCRing previously digitised material
  – Improving the usability of TEL through OAI-PMH compliance
  – Improving Access
  – Integrating services with TEL portal
  – User personalisation services
  – Extending TEL to Bulgaria & Romania

WP3 – Improving Access

• Task 1: Indexing for usability
  – Review/test state-of-the-art semantic search engines
    • On content of documents
• Task 2: Improving subject access
• Task 3: FRBR aggregation, search and browsing
  – Create/exploit FRBR metadata repositories
• Task 4: Focus on users
  – Focus groups on prototypes

WP 3 Task 2 – Improving Subject Access

• Improving subject access via semantic alignment between subjects
• Search through collections
  – Using metadata
  – In a controlled setting
• Paving the way for enhanced usages
  – Advanced treatments mentioned in TELplus need conceptual structures and links between these structures
    • E.g. clustering

WP 3 Task 2 – Improving Subject Access

• Improving subject access via semantic alignment between subjects
• Reference: MACS project
  – Manually-built semantic equivalences between Rameau, SWD & LCSH headings

MACS: Querying Collections

MACS: Query Reformulation Options

WP 3 Task 2 – Improving Subject Access

• Improving subject access via semantic alignment between subjects
• Reference: MACS project
  – Manual equivalences between Rameau, SWD, LCSH headings
• Here: an experiment on deploying automatic alignment techniques
  – Determining possible strategies
  – Assessing feasibility and usefulness
  – MACS context

WP3.2 Sub-tasks

• 3.2.1. Converting the subjects to standard representation language
  – Semantic web format (SKOS)
• 3.2.2. Aligning the vocabularies
  – Semantic correspondences between subjects
• 3.2.3. Deploying the alignment knowledge obtained into TEL framework
  – E.g. using links to reformulate queries from one subject list to the other

Converting subjects to standard representation language

Goal: solving syntactic heterogeneity between vocabularies

• Enabling the use of standard tools
  – E.g. for query (re)formulation
• Paving the way for dealing with semantic heterogeneity
  – Definitions of concepts expressed according to a common model

Converting subjects to standard representation language

Approach: Semantic Web and SKOS

• Semantic Web
  – Knowledge objects as web resources (URIs)
  – Description by linking resources (RDF)
  – Description using shared formal vocabularies (ontologies)
• SKOS
  – A standard Semantic Web model (ontology)
  – For knowledge organization systems (thesauri, subject heading lists…)

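SKOS: Example

[Figure: RDF graph of an Iconclass concept. As far as the extracted labels allow, the resource http://www.iconclass.nl/s_11F has rdf:type skos:Concept, skos:prefLabel “the Virgin Mary”@en, skos:prefLabel “la Vierge Marie”@fr, skos:broader http://www.iconclass.nl/s_11 and skos:inScheme http://www.iconclass.nl/; the resource http://www.iconclass.nl/ has rdf:type skos:ConceptScheme.]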
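
The SKOS example can also be written down as plain subject-predicate-object triples. The following is a minimal sketch using only Python tuples; the helper `objects` is a hypothetical name introduced for illustration, the assignment of edges to nodes is read off the labels of the slide, and an actual conversion would more likely rely on an RDF library.

```python
# Standard namespace URIs for SKOS and RDF.
SKOS = "http://www.w3.org/2004/02/skos/core#"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Resources named on the slide.
SCHEME = "http://www.iconclass.nl/"
PARENT = "http://www.iconclass.nl/s_11"
CONCEPT = "http://www.iconclass.nl/s_11F"

# Literals are modelled as (text, language tag) pairs.
triples = [
    (CONCEPT, RDF + "type", SKOS + "Concept"),
    (CONCEPT, SKOS + "prefLabel", ("the Virgin Mary", "en")),
    (CONCEPT, SKOS + "prefLabel", ("la Vierge Marie", "fr")),
    (CONCEPT, SKOS + "broader", PARENT),
    (CONCEPT, SKOS + "inScheme", SCHEME),
    (SCHEME, RDF + "type", SKOS + "ConceptScheme"),
]


def objects(subject, predicate):
    """Return the objects of all triples matching subject and predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


# Preferred labels of the concept, keyed by language.
labels = {lang: text for text, lang in objects(CONCEPT, SKOS + "prefLabel")}
```

Keeping one preferred label per language is what lets the same concept be shown to English and French users alike.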

Converting subjects to standard representation language - Process

• Getting processable versions from owners
  – E.g. XML

• Analyzing the models

• Converting to SKOS
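
The three steps of this process can be sketched as follows, assuming the owner delivers an XML export. The XML layout (`heading`, `label`, `broader`), the base URI and the function name `to_skos_triples` are all hypothetical; each real vocabulary has its own model, which is why the analysis step is needed before conversion.

```python
import xml.etree.ElementTree as ET

SKOS = "http://www.w3.org/2004/02/skos/core#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
BASE = "http://example.org/shl/"  # hypothetical base URI for the vocabulary

# Hypothetical export format; real subject heading lists will differ.
SAMPLE_XML = """
<headings>
  <heading id="h1"><label lang="en">Literature</label></heading>
  <heading id="h2">
    <label lang="en">Dutch literature</label>
    <broader ref="h1"/>
  </heading>
</headings>
"""


def to_skos_triples(xml_text):
    """Convert the hypothetical XML export into SKOS-style triples."""
    triples = []
    for heading in ET.fromstring(xml_text).iter("heading"):
        uri = BASE + heading.get("id")
        triples.append((uri, RDF_TYPE, SKOS + "Concept"))
        for label in heading.findall("label"):
            literal = (label.text, label.get("lang"))
            triples.append((uri, SKOS + "prefLabel", literal))
        for broader in heading.findall("broader"):
            triples.append((uri, SKOS + "broader", BASE + broader.get("ref")))
    return triples


skos_triples = to_skos_triples(SAMPLE_XML)
```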

Vocabulary Alignment

• Specifying required alignment format (links)
  – Type of mapping links: equivalence, broader
  – Cardinality: one-to-one, one-to-many
  – Taking application context (TEL) into account
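
One possible shape for such an alignment format is sketched below. The `Mapping` record, the `cardinality` helper and the concept identifiers are hypothetical and only illustrate the two dimensions named on the slide (link type and cardinality); the project's actual format is not specified here.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Mapping:
    source: str    # concept identifier in the source vocabulary
    target: str    # concept identifier in the target vocabulary
    relation: str  # "equivalence" or "broader"


def cardinality(mappings):
    """Classify each source concept by how many targets it is linked to."""
    targets = defaultdict(set)
    for m in mappings:
        targets[m.source].add(m.target)
    return {
        source: "one-to-one" if len(linked) == 1 else "one-to-many"
        for source, linked in targets.items()
    }


# Invented identifiers, for illustration only.
links = [
    Mapping("shl1:calendar", "shl2:calendars", "equivalence"),
    Mapping("shl1:dutch-literature", "shl2:dutch", "broader"),
    Mapping("shl1:dutch-literature", "shl2:literature", "broader"),
]
kinds = cardinality(links)
```

The classification is made from the point of view of the source concept only; a stricter one-to-one check would also verify that no other source concept shares the same target.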

Vocabulary Alignment

• Specifying required alignment format (links)

• Selecting (& running) alignment techniques/tools
  – Inspired by semantic web approaches

Vocabulary Alignment Techniques

• Similar to ontology alignment problem
• Existing approaches for (semi-)automatic ontology alignment
  – Using techniques from linguistics, computer science, statistics
• Problem: performance does not allow fully automatic alignment
• Problem: multilingual case
  – Some techniques cannot be used

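Potential Technique: Using Background Knowledge

• Using a shared conceptual reference to find links

[Figure: two subject heading lists, SHL 1 and SHL 2, both connected to a shared background knowledge source; example concepts “Calendar” and “Publication”.]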
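
A minimal sketch of this idea follows. It assumes the background source is a small broader-term hierarchy and that headings are anchored in it by a lower-cased label match; the hierarchy contents, the relation between “Calendar” and “Publication”, and the function names are assumptions made for illustration, not taken from the slides.

```python
# Hypothetical background knowledge: term -> its direct broader term.
BACKGROUND_BROADER = {"calendar": "publication", "publication": "document"}


def ancestors(term):
    """Return all broader terms of `term` found in the background source."""
    found = []
    while term in BACKGROUND_BROADER:
        term = BACKGROUND_BROADER[term]
        found.append(term)
    return found


def link_via_background(label1, label2):
    """Suggest how label2 relates to label1 using the background source."""
    a, b = label1.lower(), label2.lower()
    if a == b:
        return "equivalence"
    if b in ancestors(a):
        return "broader"   # label2 is broader than label1
    if a in ancestors(b):
        return "narrower"  # label2 is narrower than label1
    return None            # no link can be derived


suggestion = link_via_background("Calendar", "Publication")
```

Plain string matching as the anchoring step would not carry over to the multilingual case, where labels in the two lists are in different languages.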

Potential Technique: Statistical Alignment

• Object information (book indexing)

[Figure: two subject heading lists, SHL 1 and SHL 2, connected through dually-indexed books; example concepts “Dutch Literature” and “Dutch”.]
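
The intuition can be sketched as follows: two concepts are candidate matches when they index largely the same books. The slides do not name a specific measure; the Jaccard overlap used here, the threshold and the sample data are assumptions chosen for illustration.

```python
def jaccard(books_a, books_b):
    """Jaccard overlap between two sets of book identifiers."""
    union = books_a | books_b
    return len(books_a & books_b) / len(union) if union else 0.0


# Invented indexing data: concept -> identifiers of books indexed with it.
shl1_index = {"Dutch Literature": {"b1", "b2", "b3"}}
shl2_index = {"Dutch": {"b1", "b2", "b3", "b4"}, "Calendar": {"b5"}}


def candidate_links(index1, index2, threshold=0.5):
    """Return (concept1, concept2, score) for pairs reaching the threshold."""
    return [
        (c1, c2, jaccard(books1, books2))
        for c1, books1 in index1.items()
        for c2, books2 in index2.items()
        if jaccard(books1, books2) >= threshold
    ]


candidates = candidate_links(shl1_index, shl2_index)
```

Because it relies on shared books rather than labels, this kind of technique does not depend on the two lists being in the same language.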

Vocabulary Alignment

• Specifying required alignment format (links)

• Selection (& running) of tool/method

• Evaluation (& cleaning)
  – Considering application

Evaluation of Alignments

• MACS has produced mappings!
  – Possible gold standard
• But: has MACS produced all mappings?
  – Which proportion of the SHLs is covered?
  – Taking into account all indexing strings?
• Are MACS mappings the only interesting ones?
  – “Serendipity” mappings
    • Concepts that are not equivalent but could bring useful results when added to queries
  – Compensating for indexing variability
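
Using manual mappings as a gold standard usually means computing precision and recall of the automatically found mappings. The sketch below uses invented mapping pairs and a hypothetical function name; it is the standard set-based computation, not a measure prescribed by the slides.

```python
def precision_recall(found, gold):
    """Precision and recall of `found` mappings against `gold` mappings."""
    correct = found & gold
    precision = len(correct) / len(found) if found else 0.0
    recall = len(correct) / len(gold) if gold else 0.0
    return precision, recall


# Invented (source, target) pairs, for illustration only.
gold = {("a1", "b1"), ("a2", "b2"), ("a3", "b3"), ("a4", "b4")}
found = {("a1", "b1"), ("a2", "b2"), ("a5", "b9")}

precision, recall = precision_recall(found, gold)
```

The caveats listed above apply directly: if the gold standard is incomplete, a mapping counted as wrong here may still be correct, or a useful “serendipity” mapping.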

Evaluation of Alignments

• Several scenarios for using and evaluating alignments
  – Concept-based search
  – Re-indexing
  – Integration of one SHL into the other
  – SHL Merging
  – Free-text search
  – Navigation

Evaluation of Alignments

• Several scenarios for using and evaluating alignments
  – Concept-based search
    • Retrieving books indexed by SHL1 using SHL2 concepts
  – Re-indexing
  – Integration of one SHL into the other
  – SHL Merging
  – Free-text search
    • Matching user search terms to SHL1 and SHL2 concepts
  – Navigation
    • Browsing several collections using one SHL structure
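
The concept-based search scenario can be sketched as a query reformulation step: a query expressed with an SHL2 concept is translated through the alignment into SHL1 concepts, which are then used to retrieve books. The alignment, the catalogue and the function name below are invented for illustration.

```python
# Invented alignment: SHL2 concept -> aligned SHL1 concepts.
alignment = {"Dutch": ["Dutch Literature"], "Calendars": ["Calendar"]}

# Invented catalogue indexed with SHL1 headings.
books_by_shl1 = {
    "Dutch Literature": ["book-1", "book-2"],
    "Calendar": ["book-3"],
}


def search_with_shl2(concept):
    """Retrieve SHL1-indexed books for a query expressed as an SHL2 concept."""
    results = []
    for shl1_concept in alignment.get(concept, []):
        results.extend(books_by_shl1.get(shl1_concept, []))
    return results


hits = search_with_shl2("Dutch")
```

An SHL2 concept without any aligned SHL1 concept simply yields no results, which is one way alignment coverage shows up in this scenario.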

Evaluation of Alignments

• Several settings for a single scenario
  – Fully automatic reformulation vs assisted reformulation (candidates)
• Different evaluation measures
  – Good mappings vs acceptable ones
  – Number of candidates for reformulation
  – Semantic closeness to original query

Vocabulary Alignment

• Specifying required alignment format (links)

• Selection (& running) of tool/method

• Evaluation (& cleaning)

• Assessment of the approach
  – Effort required, quality, extensibility

Deploying the alignment knowledge obtained into TEL framework

• Observing integration of MACS data into TEL
  – Conceptual input for alignment requirements
• Integration of the obtained alignment in TEL
• Assessment of the alignment integration
  – Technical aspects, usage aspects

Reminder

• Alignment is a difficult problem
• Application-specific alignment pretty much unexplored in Semantic Web research

More a feasibility study than a complete solution to the problem

Practical goal: investigate how automatic techniques could help MACS-like initiatives

• Manual mapping is labour-intensive

WP4 – Integrating services with the European Library portal

Theo van Veen (KB)

Tasks:
• Identifying services that are going to give the user the greatest return
• Creating new services
• Integrating services within TEL…

WP4 – Some Services Mentioned

Preliminary inventory: no official commitment!

Services based on controlled vocabularies:
• Thesaurus and name authority service
  – Providing terms linked to query terms
• Semantic enrichment service
  – Users can annotate search results with terms
• Distance between terms and related terms

WP4 – Some Services Mentioned

Preliminary inventory: no official commitment!

Services based on controlled vocabularies:
• Thesaurus and name authority service
• Semantic enrichment service
• Distance between terms and related terms

Adding more value from controlled vocabularies and alignments between them

Thanks!
