![Page 1: The Value of Usage Scenarios for Thesaurus Alignment in Cultural Heritage Context](https://reader035.vdocument.in/reader035/viewer/2022081502/56815e81550346895dcd1264/html5/thumbnails/1.jpg)
The Value of Usage Scenarios for Thesaurus Alignment in Cultural Heritage Context
Antoine Isaac, Claus Zinn, Henk Matthezing, Lourens van der Meij, Stefan Schlobach, Shenghui Wang
Cultural Heritage on the Semantic Web Workshop, Oct. 12th, 2007
![Page 2: The Value of Usage Scenarios for Thesaurus Alignment in Cultural Heritage Context](https://reader035.vdocument.in/reader035/viewer/2022081502/56815e81550346895dcd1264/html5/thumbnails/2.jpg)
OAEI 2007: Results from the Library Track
Introduction
• One important problem in CH
  • Heterogeneity of description resources
    • Thesauri (at large)
    • Classification schemes, subject heading lists …
  • Hampers access across collections
![Page 3: The Value of Usage Scenarios for Thesaurus Alignment in Cultural Heritage Context](https://reader035.vdocument.in/reader035/viewer/2022081502/56815e81550346895dcd1264/html5/thumbnails/3.jpg)
Introduction
Ontology alignment can help
• Semantic links between ontology elements
  • o1:Cat owl:equivalentClass o2:Chat
• Using automatic tools
  • E.g. exploiting labels, structure
![Page 4: The Value of Usage Scenarios for Thesaurus Alignment in Cultural Heritage Context](https://reader035.vdocument.in/reader035/viewer/2022081502/56815e81550346895dcd1264/html5/thumbnails/4.jpg)
Introduction
• Problem: little work on alignment applications
• Need further research on context-specific alignment
  • Generation
  • Deployment
  • Evaluation
• Important context dimension: application scenarios
![Page 5: The Value of Usage Scenarios for Thesaurus Alignment in Cultural Heritage Context](https://reader035.vdocument.in/reader035/viewer/2022081502/56815e81550346895dcd1264/html5/thumbnails/5.jpg)
Agenda
• Introduction
• Dutch National Library Scenarios for Alignment
• Book Re-indexing
• Scenario-specific Evaluation
![Page 6: The Value of Usage Scenarios for Thesaurus Alignment in Cultural Heritage Context](https://reader035.vdocument.in/reader035/viewer/2022081502/56815e81550346895dcd1264/html5/thumbnails/6.jpg)
KB and Thesaurus Alignment
• National Library of the Netherlands (KB)
  • 2 main collections
  • Each described (indexed) by its own thesaurus
• Problem: is maintenance optimized with respect to redundancy/size?
[Diagram: Scientific Collection (1.4M books, indexed with GTT) and Depot (1M books, indexed with Brinkman); 250K books in the overlap]
![Page 7: The Value of Usage Scenarios for Thesaurus Alignment in Cultural Heritage Context](https://reader035.vdocument.in/reader035/viewer/2022081502/56815e81550346895dcd1264/html5/thumbnails/7.jpg)
Usage Scenarios for Thesaurus Alignment at KB
• Concept-based search
  • Retrieving GTT-indexed books using Brinkman concepts
• Re-indexing
  • Indexing GTT-indexed books with Brinkman concepts
• Integration of one thesaurus into the other
  • Inserting GTT elements into the Brinkman thesaurus
• Thesaurus merging
  • Building a new thesaurus from GTT and Brinkman
• Free-text search
  • Matching user search terms to GTT or Brinkman concepts
• Navigation
  • Browsing the 2 collections through a merged version of the thesauri
![Page 8: The Value of Usage Scenarios for Thesaurus Alignment in Cultural Heritage Context](https://reader035.vdocument.in/reader035/viewer/2022081502/56815e81550346895dcd1264/html5/thumbnails/8.jpg)
Agenda
• Introduction
• Dutch National Library Scenarios for Alignment
• Book Re-indexing
• Scenario-specific Evaluation
![Page 9: The Value of Usage Scenarios for Thesaurus Alignment in Cultural Heritage Context](https://reader035.vdocument.in/reader035/viewer/2022081502/56815e81550346895dcd1264/html5/thumbnails/9.jpg)
The Book Re-indexing Scenario
• Scenario: re-indexing of GTT-indexed books by Brinkman concepts
[Diagram: Scientific Collection (1.4M books, indexed with GTT) and Depot (1M books, indexed with Brinkman)]
![Page 10: The Value of Usage Scenarios for Thesaurus Alignment in Cultural Heritage Context](https://reader035.vdocument.in/reader035/viewer/2022081502/56815e81550346895dcd1264/html5/thumbnails/10.jpg)
The Book Re-indexing Scenario
• If one of the thesauri is dropped, legacy data has to be indexed according to the other vocabulary
  • Automatically
  • Semi-automatically, with users presented with candidate annotations
![Page 11: The Value of Usage Scenarios for Thesaurus Alignment in Cultural Heritage Context](https://reader035.vdocument.in/reader035/viewer/2022081502/56815e81550346895dcd1264/html5/thumbnails/11.jpg)
Scenario Requirements
• Mapping sets of GTT concepts to sets of Brinkman concepts
  • alignreindex: {g1,…,gm} → {b1,…,bn}
• Option where users select based on probabilities
  • Candidate concepts are given weights (e.g. in [0;1])
  • alignreindex’: {g1,…,gm} → {(b1,w1),…,(bn,wn)}
• Generated index should be generally small
  • 99.2% of depot books indexed with no more than 3 Brinkman concepts
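The two function signatures on this slide can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration: the rule table, the concept identifiers, the weights, and the `max_size` cap of 3 (loosely motivated by the 99.2% observation). It is not the authors' implementation.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

# Hypothetical rule table: a set of GTT concepts -> weighted Brinkman candidates.
Rules = Dict[FrozenSet[str], List[Tuple[str, float]]]


def align_reindex_weighted(gtt_index: Set[str], rules: Rules) -> List[Tuple[str, float]]:
    """Sketch of alignreindex': return (Brinkman concept, weight) pairs, best first."""
    best: Dict[str, float] = {}
    for left, candidates in rules.items():
        if left <= gtt_index:  # the rule applies when all its GTT concepts index the book
            for concept, weight in candidates:
                best[concept] = max(weight, best.get(concept, 0.0))
    return sorted(best.items(), key=lambda cw: (-cw[1], cw[0]))


def align_reindex(gtt_index: Set[str], rules: Rules, max_size: int = 3) -> Set[str]:
    """Sketch of alignreindex: drop the weights and cap the size of the new index."""
    ranked = align_reindex_weighted(gtt_index, rules)
    return {concept for concept, _ in ranked[:max_size]}


# Illustrative data only; the weights are made up.
rules: Rules = {
    frozenset({"gtt:Geography", "gtt:Netherlands"}): [("brinkman:Netherlands; Geography", 0.9)],
    frozenset({"gtt:Geography"}): [("brinkman:Geography", 0.6)],
}
book = {"gtt:Geography", "gtt:Netherlands"}
print(align_reindex_weighted(book, rules))
print(align_reindex(book, rules))
```

In a supervised setting the weighted variant would feed a ranked candidate list to the indexer; in an unsupervised one the unweighted variant would write the index directly.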
![Page 12: The Value of Usage Scenarios for Thesaurus Alignment in Cultural Heritage Context](https://reader035.vdocument.in/reader035/viewer/2022081502/56815e81550346895dcd1264/html5/thumbnails/12.jpg)
Semantic Interpretation of Re-indexing Function
1-1 case: g1 -> b1
• b1 is semantically equivalent to g1
  • OK
• b1 is more general than g1
  • Loss of information
  • Possible if b1 is the most specific subsumer of g1’s meanings
    • Indexing specificity rule
![Page 13: The Value of Usage Scenarios for Thesaurus Alignment in Cultural Heritage Context](https://reader035.vdocument.in/reader035/viewer/2022081502/56815e81550346895dcd1264/html5/thumbnails/13.jpg)
Semantic Interpretation of Re-indexing Function
Generic cases: combinations of concepts
• Considerations on semantic links are the same
• Combination matters
• Indexing is post-coordinated
  • {“Geography”; “the Netherlands”} in GTT -> book about geography of the Netherlands
• Different granularities/indexing points of view
  • Brinkman has “Netherlands; Geography”
![Page 14: The Value of Usage Scenarios for Thesaurus Alignment in Cultural Heritage Context](https://reader035.vdocument.in/reader035/viewer/2022081502/56815e81550346895dcd1264/html5/thumbnails/14.jpg)
Problem of Alignment Deployment
Results of existing tools may need re-interpretation
• Unclear semantics of mapping links
  • "=", "<"
  • weights
• Single concepts involved in mappings
![Page 15: The Value of Usage Scenarios for Thesaurus Alignment in Cultural Heritage Context](https://reader035.vdocument.in/reader035/viewer/2022081502/56815e81550346895dcd1264/html5/thumbnails/15.jpg)
Example of Alignment Deployment Approach
• Starting from similarity measures over both thesauri
  • sim(X,Y)=n
• Aggregation strategy: simple ranking
  • For a concept, take the top k similar concepts
  • Gather GTT concepts and Brinkman ones
• Re-indexing function specified by conditions for firing rules
  • E.g., if the book indexing contains the left part of the rule
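The deployment approach on this slide (similarity scores, top-k ranking, then rule firing) could be sketched as follows. This is only one possible reading: it assumes each rule has a single GTT concept as its left part, and the function names, concept identifiers and similarity values are all hypothetical.

```python
from typing import Dict, List, Set, Tuple

Candidates = List[Tuple[str, float]]


def top_k_rules(sim: Dict[Tuple[str, str], float], k: int) -> Dict[str, Candidates]:
    """For each GTT concept, keep the k most similar Brinkman concepts as a rule."""
    by_source: Dict[str, Candidates] = {}
    for (gtt, brinkman), score in sim.items():
        by_source.setdefault(gtt, []).append((brinkman, score))
    return {
        gtt: sorted(cands, key=lambda c: (-c[1], c[0]))[:k]
        for gtt, cands in by_source.items()
    }


def reindex(book_index: Set[str], rules: Dict[str, Candidates]) -> Candidates:
    """Fire every rule whose left part occurs in the book's GTT index."""
    fired: Dict[str, float] = {}
    for gtt in book_index:
        for brinkman, score in rules.get(gtt, []):
            fired[brinkman] = max(score, fired.get(brinkman, 0.0))
    return sorted(fired.items(), key=lambda c: (-c[1], c[0]))


# Made-up similarity values sim(X, Y) = n between GTT and Brinkman concepts.
sim = {
    ("gtt:cat", "br:cats"): 0.9,
    ("gtt:cat", "br:pets"): 0.4,
    ("gtt:cat", "br:animals"): 0.2,
    ("gtt:dog", "br:pets"): 0.5,
}
rules = top_k_rules(sim, k=2)
print(reindex({"gtt:cat", "gtt:dog"}, rules))
```

Keeping the maximum score per Brinkman concept is an arbitrary aggregation choice made for this sketch; summing or averaging scores would be equally defensible.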
![Page 16: The Value of Usage Scenarios for Thesaurus Alignment in Cultural Heritage Context](https://reader035.vdocument.in/reader035/viewer/2022081502/56815e81550346895dcd1264/html5/thumbnails/16.jpg)
Agenda
• Introduction
• Dutch National Library Scenarios for Alignment
• Book Re-indexing
• Scenario-specific Evaluation
![Page 17: The Value of Usage Scenarios for Thesaurus Alignment in Cultural Heritage Context](https://reader035.vdocument.in/reader035/viewer/2022081502/56815e81550346895dcd1264/html5/thumbnails/17.jpg)
Evaluation Design
• We do not assess the rules
• We assess their application on book indexing
• 2 classical aspects:
  • Correctness (cf. precision)
  • Completeness (cf. recall)
![Page 18: The Value of Usage Scenarios for Thesaurus Alignment in Cultural Heritage Context](https://reader035.vdocument.in/reader035/viewer/2022081502/56815e81550346895dcd1264/html5/thumbnails/18.jpg)
Evaluation Design: Different Variants and Settings
• Fully automatic evaluation
  • Using the set of dually indexed books as gold standard
• Manual evaluation 1
  • Human expert assesses candidate indices
  • Unsupervised setting: margin of error should be very low
  • Supervised setting: less strict, but size also matters
• Manual evaluation 2
  • Same as 1, but a first index has been produced by the expert
  • Distance between the two indices is assessed
  • Possibly changing the original index
![Page 19: The Value of Usage Scenarios for Thesaurus Alignment in Cultural Heritage Context](https://reader035.vdocument.in/reader035/viewer/2022081502/56815e81550346895dcd1264/html5/thumbnails/19.jpg)
Human Evaluation vs. Automatic Evaluation
Taking into account
• Indexing variability
  • Automatic evaluation compares with a specific indexing choice
  • Especially important if thesaurus doesn’t match book subject
• Evaluation variability
  • Only one expert judgment is considered per book
• Evaluation set bias
  • Dually-indexed books (may) present specific characteristics
![Page 20: The Value of Usage Scenarios for Thesaurus Alignment in Cultural Heritage Context](https://reader035.vdocument.in/reader035/viewer/2022081502/56815e81550346895dcd1264/html5/thumbnails/20.jpg)
New Developments, Outside of Paper!
• Reviews: you should add actual results of good general alignment systems as compared to your scenario
• Ontology Alignment Evaluation Initiative
  • http://oaei.ontologymatching.org/2007
  • State-of-the-art aligners applied to specific cases
• This paper: grounding for an OAEI Library track
  • KB vocabularies
  • Evaluation in re-indexing scenario
![Page 21: The Value of Usage Scenarios for Thesaurus Alignment in Cultural Heritage Context](https://reader035.vdocument.in/reader035/viewer/2022081502/56815e81550346895dcd1264/html5/thumbnails/21.jpg)
Automatic Evaluation
• There is a gold standard for re-indexing scenario
• General method: for dually indexed books, compare existing Brinkman annotations and new ones
[Diagram: Scientific Collection (1.4M books, indexed with GTT) and Depot (1M books, indexed with Brinkman); 250K books in the overlap]
![Page 22: The Value of Usage Scenarios for Thesaurus Alignment in Cultural Heritage Context](https://reader035.vdocument.in/reader035/viewer/2022081502/56815e81550346895dcd1264/html5/thumbnails/22.jpg)
Automatic Evaluation
• Book level: Precision & Recall for matched books
  • Books for which there is one good annotation
  • Minimal hint about users’ (dis)satisfaction
• Annotation level: P & R for candidate annotations
  • Note: counting over annotations and books, not rules and concepts
  • Rules & concepts used more often are more important
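A short Python sketch of how book-level and annotation-level precision and recall might be computed against the dually indexed gold standard. The exact definitions are an assumption: here a book counts as matched when at least one candidate annotation appears in its existing Brinkman index, book-level precision is taken over books that received any candidate, and book-level recall over all gold-standard books. Book identifiers and concepts are invented.

```python
from typing import Dict, Set


def evaluate(gold: Dict[str, Set[str]], candidates: Dict[str, Set[str]]) -> Dict[str, float]:
    """gold: book -> existing Brinkman index; candidates: book -> generated index."""
    fired = [b for b in gold if candidates.get(b)]           # books that got candidates
    matched = [b for b in fired if candidates[b] & gold[b]]  # at least one good annotation
    correct = sum(len(candidates[b] & gold[b]) for b in fired)
    proposed = sum(len(candidates[b]) for b in fired)
    expected = sum(len(gold[b]) for b in gold)
    return {
        "book_precision": len(matched) / len(fired) if fired else 0.0,
        "book_recall": len(matched) / len(gold) if gold else 0.0,
        "annotation_precision": correct / proposed if proposed else 0.0,
        "annotation_recall": correct / expected if expected else 0.0,
    }


# Invented example: four dually indexed books, candidates produced for two of them.
gold = {"b1": {"A", "B"}, "b2": {"C"}, "b3": {"D"}, "b4": {"E"}}
candidates = {"b1": {"A", "X"}, "b2": {"Y"}, "b3": set()}
scores = evaluate(gold, candidates)
print(scores)
```

Because the counts run over books and annotations, a rule that fires on many books weighs more in these figures than one that fires rarely, which matches the note on the slide.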
![Page 23: The Value of Usage Scenarios for Thesaurus Alignment in Cultural Heritage Context](https://reader035.vdocument.in/reader035/viewer/2022081502/56815e81550346895dcd1264/html5/thumbnails/23.jpg)
Automatic Evaluation Results
Notice: for exactMatch only
![Page 24: The Value of Usage Scenarios for Thesaurus Alignment in Cultural Heritage Context](https://reader035.vdocument.in/reader035/viewer/2022081502/56815e81550346895dcd1264/html5/thumbnails/24.jpg)
Manual Evaluation Method
Variant 1, for supervised setting
• Selection of 100 books
• 4 KB evaluators
• Paper forms + copy of books
![Page 25: The Value of Usage Scenarios for Thesaurus Alignment in Cultural Heritage Context](https://reader035.vdocument.in/reader035/viewer/2022081502/56815e81550346895dcd1264/html5/thumbnails/25.jpg)
Paper Forms
![Page 26: The Value of Usage Scenarios for Thesaurus Alignment in Cultural Heritage Context](https://reader035.vdocument.in/reader035/viewer/2022081502/56815e81550346895dcd1264/html5/thumbnails/26.jpg)
Annotation Transl.: Manual Evaluation Results
Research question: quality of candidate annotations
• Measures used: cf. automatic evaluation
• Performances are consistently higher
[Left: manual evaluation, Right: automatic evaluation]
![Page 27: The Value of Usage Scenarios for Thesaurus Alignment in Cultural Heritage Context](https://reader035.vdocument.in/reader035/viewer/2022081502/56815e81550346895dcd1264/html5/thumbnails/27.jpg)
Annotation Transl.: Manual Evaluation Results
Research question: evaluation variability
• Krippendorff’s agreement coefficient (alpha)
• High variability: overall alpha=0.62
  • <0.67, classic threshold for Computational Linguistics tasks
  • But indexing seems to be more variable than usual CL tasks
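For readers unfamiliar with the coefficient, here is a minimal sketch of Krippendorff's alpha for nominal labels, computed from the coincidence matrix as 1 minus the ratio of observed to expected disagreement. The slides do not say which distance metric or label set was used, so the nominal metric and the "accept"/"reject" labels below are assumptions.

```python
from collections import Counter
from itertools import permutations
from typing import List, Sequence


def krippendorff_alpha_nominal(units: Sequence[List[str]]) -> float:
    """units: one list per judged item, holding the labels given by the evaluators."""
    coincidences: Counter = Counter()
    for labels in units:
        m = len(labels)
        if m < 2:
            continue  # an item judged only once carries no agreement information
        for a, b in permutations(labels, 2):
            coincidences[(a, b)] += 1.0 / (m - 1)
    totals: Counter = Counter()
    for (a, _), weight in coincidences.items():
        totals[a] += weight
    n = sum(totals.values())
    observed = sum(w for (a, b), w in coincidences.items() if a != b)
    cross = sum(totals[a] * totals[b] for a in totals for b in totals if a != b)
    if n < 2 or cross == 0:
        raise ValueError("alpha is undefined without at least two distinct labels")
    return 1.0 - observed * (n - 1) / cross


# Invented judgments: two evaluators, four candidate annotations.
judgments = [
    ["accept", "accept"],
    ["accept", "accept"],
    ["reject", "reject"],
    ["accept", "reject"],
]
alpha = krippendorff_alpha_nominal(judgments)
print(round(alpha, 3))
```

Values near 1 indicate strong agreement and values near 0 indicate agreement no better than chance, which is why 0.62 falls short of the 0.67 threshold cited above.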
![Page 28: The Value of Usage Scenarios for Thesaurus Alignment in Cultural Heritage Context](https://reader035.vdocument.in/reader035/viewer/2022081502/56815e81550346895dcd1264/html5/thumbnails/28.jpg)
Annotation Transl.: Manual Evaluation Results
Research question: indexing variability
• Measuring acceptability of original book indices
• Krippendorff’s agreement for indices chosen by evaluators
  • 0.59 overall alpha confirms high variability
![Page 29: The Value of Usage Scenarios for Thesaurus Alignment in Cultural Heritage Context](https://reader035.vdocument.in/reader035/viewer/2022081502/56815e81550346895dcd1264/html5/thumbnails/29.jpg)
Conclusions
• Better characterization of alignment scenarios is needed
• For a single case there are many scenarios and variants
• Requires eliciting requirements
  • And evaluation criteria
![Page 30: The Value of Usage Scenarios for Thesaurus Alignment in Cultural Heritage Context](https://reader035.vdocument.in/reader035/viewer/2022081502/56815e81550346895dcd1264/html5/thumbnails/30.jpg)
Discussion: Differences between scenarios?
• Concept-based search
• Re-indexing
• Integration of one thesaurus into the other
• Thesaurus merging
• Free-text search aided by thesauri
• Navigation
![Page 31: The Value of Usage Scenarios for Thesaurus Alignment in Cultural Heritage Context](https://reader035.vdocument.in/reader035/viewer/2022081502/56815e81550346895dcd1264/html5/thumbnails/31.jpg)
Discussion: Differences between scenarios?
Semantics of alignment
• A core of primitives that are useful
  • broader/narrower, related
• Some constructs are more specific
  • “AND” combination for re-indexing
• Interpretation of equivalence?
  • Thesaurus merging: “excavation” = “excavation”
  • Query reformulation: “excavation” = “archeology; Netherlands”
![Page 32: The Value of Usage Scenarios for Thesaurus Alignment in Cultural Heritage Context](https://reader035.vdocument.in/reader035/viewer/2022081502/56815e81550346895dcd1264/html5/thumbnails/32.jpg)
Discussion: Differences between scenarios?
Multi-concept alignment
• Useful for re-indexing or concept-based search
  • Less for thesaurus re-engineering scenarios
• Combinations are not fully dealt with by thesaurus formats
• But simple links involving the same concept can be useful
  • C1 BT C2
  • C1 BT C3
![Page 33: The Value of Usage Scenarios for Thesaurus Alignment in Cultural Heritage Context](https://reader035.vdocument.in/reader035/viewer/2022081502/56815e81550346895dcd1264/html5/thumbnails/33.jpg)
Discussion: Differences between scenarios?
Precision and recall
• Browsing -> emphasis on recall
• For other scenarios, it depends on the setting
  • Supervised vs. unsupervised
![Page 34: The Value of Usage Scenarios for Thesaurus Alignment in Cultural Heritage Context](https://reader035.vdocument.in/reader035/viewer/2022081502/56815e81550346895dcd1264/html5/thumbnails/34.jpg)
Thanks!