Antoine Isaac, Dirk Kramer, Lourens van der Meij, Shenghui Wang, Stefan Schlobach, Johan Stapel


Post on 06-Jan-2016


DESCRIPTION

Vocabulary Matching for Book Indexing Suggestion in Linked Libraries – A Prototype Implementation & Evaluation. Antoine Isaac, Dirk Kramer, Lourens van der Meij, Shenghui Wang, Stefan Schlobach, Johan Stapel. Problem: subject indexing. Describing subjects of books

TRANSCRIPT

  • Vocabulary Matching for Book Indexing Suggestion in Linked Libraries – A Prototype Implementation & Evaluation

    Antoine Isaac, Dirk Kramer, Lourens van der Meij, Shenghui Wang, Stefan Schlobach, Johan Stapel

  • Problem: subject indexing

    Describing subjects of books

    Using concepts from vocabularies (e.g. thesauri)

  • Problem: re-indexing

    Describing a book that has already been described

    With a new vocabulary

    Fitting a different context (e.g., different libraries)

  • Why re-indexing at KB?

    The Dutch National Library (KB) holds many books that are also in other Dutch public libraries

    KB deposit uses Brinkman thesaurus for indexing

    Public Libraries use Biblion thesaurus

    [Diagram: the KB Deposit Collection, indexed with Brinkman, and the Dutch Public Libraries, indexed with Biblion, with an overlap between the two book collections]

  • A wider issue

    KB shares books with many other libraries

    All having their own description practices

    [Diagram: book collections and the knowledge organization systems (KOSs) used to describe them]

    Libraries and collections shown: KB (KB Deposit Coll., KB Scientific Coll.), Dutch Public Libraries, LC (US Nat. Lib), BnF (French Nat. Lib), DNB (German Nat. Lib), Dutch Book-trade

    KOSs shown: Biblion, NUR, BISAC subject codes, Brinkman, GTT, NBC class., UNESCO class., KB Corporatie + Persoon, RAMEAU subject headings, LCSH subject headings, DDC (Dewey decimal class.), SWD subject headings, Personennamendatei, LC authority file, Autorités BNF, Doelgroep (audience)

    Legend: overlap between book collections (thickness indicates degree of overlap); vertical adjustment between a collection and KOSs denotes that those KOSs are used to describe that collection

    KOS categories: other classifications, domain/discipline classifications, subject thesauri / subject heading lists, book collection datasets, person/corporation data

  • Room for improvement?

    Libraries devote large resources to indexing

    20 people at KB

    About 20,000 books per year

    Leveraging already existing descriptions for re-indexing can be beneficial for both sides

  • Alignment and re-indexing

    STITCH project: tackling semantic interoperability in Cultural Heritage, using ontology alignment

    Mappings between concepts from different vocabularies can be used for re-indexing

    Basic idea: replace concepts in descriptions by conceptually equivalent concepts
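The basic idea on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the STITCH implementation: the function name `reindex`, the `EQUIVALENT` table and the concept identifiers in it are hypothetical placeholders, not real Biblion or Brinkman mappings.

```python
# Hypothetical one-to-one alignment between two vocabularies.
# The identifiers are placeholders, not real Biblion/Brinkman concepts.
EQUIVALENT = {
    "source:history": "target:history",
    "source:cookery": "target:cooking",
}


def reindex(subjects, mapping):
    """Replace each source concept by its equivalent target concept.

    Concepts without a mapping are returned separately so that an
    indexer can still handle them by hand.
    """
    translated = [mapping[s] for s in subjects if s in mapping]
    unmapped = [s for s in subjects if s not in mapping]
    return translated, unmapped


translated, unmapped = reindex(["source:history", "source:poetry"], EQUIVALENT)
print(translated)  # ['target:history']
print(unmapped)    # ['source:poetry']
```

A plain lookup like this only covers one-to-one equivalences; it cannot express mappings between groups of concepts.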

  • Goal: a re-indexing prototype

    Past: preliminary experiments with KB data

    Now: building a prototype, plugging it into the KB production system, and having it evaluated by its potential users (indexers)

    Prototype case: Dutch public libraries / KB

    Suggesting Brinkman subjects based on Biblion ones

  • Alignment and re-indexing: requirements

    Subjects can be complex

    Mappings between groups of concepts: "Travel guides" + "Spain" → "Spain; travel guides"

    Concepts are used in descriptions

    Mappings taking into account extensional semantics: "Building engineering" → "Learning material ; building engineering"
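A mapping between groups of concepts can be modelled as a rule whose source side is a set, which fires only when the whole group occurs on a book. The sketch below is an assumption about one possible representation, not the prototype's actual data model: `Rule` and `suggest` are hypothetical names, the first rule reuses the example and the 0.982 value shown in the slides, and the second rule is invented for illustration.

```python
from typing import FrozenSet, List, NamedTuple, Set


class Rule(NamedTuple):
    source: FrozenSet[str]   # group of source-vocabulary concepts
    target: str              # suggested target-vocabulary subject
    confidence: float        # estimated probability that the target applies


def suggest(book_subjects: Set[str], rules: List[Rule]) -> List[Rule]:
    """Return the rules whose whole source group occurs on the book, best first."""
    fired = [r for r in rules if r.source <= book_subjects]
    return sorted(fired, key=lambda r: r.confidence, reverse=True)


rules = [
    # Example and confidence value taken from the slides.
    Rule(frozenset({"Travel guides", "Spain"}), "Spain; travel guides", 0.982),
    # Hypothetical lower-confidence rule, for illustration only.
    Rule(frozenset({"Spain"}), "Spain", 0.40),
]

for rule in suggest({"Travel guides", "Spain"}, rules):
    print(rule.target, rule.confidence)
```

Sorting by confidence gives the ranked suggestion list that the indexer sees; a book carrying only "Spain" would trigger the second rule but not the first.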

  • Obtaining re-indexing rules

    Lexical alignments are not good enough

    Probabilistic rules are calculated using the extension of concepts (existing indexing): simple probabilities, with ad hoc adjustment

    "Travel guides", "Spain" → "Spain; travel guides", 0.982

    Not only based on Biblion subjects

    AUT: main authors of books

    KAR: characteristic

    DGP: intellectual level/target group
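One way to read "simple probabilities" is as a conditional frequency over books that are indexed in both vocabularies: the share of books carrying a source concept group that also carry a given target subject. The sketch below assumes that reading. The function name `learn_rules`, the `max_group_size` parameter and the toy corpus are hypothetical, the ad hoc adjustment mentioned in the slides is not specified and is therefore left out, and only subjects are used as input (not AUT, KAR or DGP).

```python
from collections import Counter
from itertools import combinations


def learn_rules(books, max_group_size=2):
    """Estimate P(target subject | source concept group) from dually indexed books.

    `books` is a list of (source_concepts, target_subjects) pairs, both sets.
    Returns {(source_group, target_subject): probability}, where source_group
    is a sorted tuple of source concepts.
    """
    group_count = Counter()   # books carrying a given source group
    joint_count = Counter()   # books carrying the group AND the target subject
    for source, target in books:
        for size in range(1, max_group_size + 1):
            for group in combinations(sorted(source), size):
                group_count[group] += 1
                for subject in target:
                    joint_count[(group, subject)] += 1
    return {
        (group, subject): n / group_count[group]
        for (group, subject), n in joint_count.items()
    }


# Toy corpus, invented for illustration only.
books = [
    ({"Travel guides", "Spain"}, {"Spain; travel guides"}),
    ({"Travel guides", "Spain"}, {"Spain; travel guides"}),
    ({"Travel guides", "France"}, {"France; travel guides"}),
]
rules = learn_rules(books)
print(rules[(("Spain", "Travel guides"), "Spain; travel guides")])  # 1.0
print(rules[(("Travel guides",), "Spain; travel guides")])
```

In this toy corpus the two-concept group predicts the target subject for every book that carries it, while "Travel guides" alone does so for only two books out of three, which illustrates why rules over groups of concepts can rank higher than single-concept rules.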

  • Demo

    Doesn't work?

  • User study

    Quantitative aspect: how well does the tool compare to human subject indexing?

    Qualitative aspect: user satisfaction, improvement suggestions

  • Evaluation setting

    6 indexers, 6 weeks, 284 books

    Evaluation integrated in daily indexing work

    Pre-evaluation briefing

    Questionnaire during evaluation

    Post-evaluation de-briefing & questionnaire

  • User study results

    Top ranked mappings are indeed much better

    Individual book satisfaction level > 70%

    Suggestion class   # suggestions   Precision   Recall
    blue                         308       72.7%    47.9%
    purple                     1,188       10.7%    27.1%
    red                        2,525       1.11%    5.98%
    non suggested                 89                19.0%
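Precision and recall figures such as those in the table are conventionally computed by comparing the suggested subjects with the subjects the human indexer assigned. The sketch below shows the standard definitions only; the function name `precision_recall` and the example sets are hypothetical, and the slides do not state exactly how the per-class figures were aggregated over the 284 books.

```python
def precision_recall(suggested, assigned):
    """Precision and recall of suggested subjects against the indexer's choice."""
    suggested, assigned = set(suggested), set(assigned)
    correct = len(suggested & assigned)
    precision = correct / len(suggested) if suggested else 0.0
    recall = correct / len(assigned) if assigned else 0.0
    return precision, recall


# Invented example: 4 suggestions, 3 subjects kept by the indexer, 2 in common.
p, r = precision_recall({"a", "b", "c", "d"}, {"a", "b", "e"})
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.50 recall=0.67
```

Read this way, the recall column would be the share of indexer-assigned subjects that falls into each suggestion class, which would fit with the four recall values adding up to roughly 100%.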

  • User study results (1)

    But the general satisfaction is lower: only two out of six would use the tool as such

    Quality of suggestions: lower-level suggestions are often not meaningful

    Perception of suggestions' quality: long lists with wrong suggestions at the end are bad; ranking is appreciated, but it is not enough

  • User study results (2)

    Suggestions were found promising: bridging the indexing gap between collections

    Different indexing strategies: "Persian language" (Biblion) vs. "Iranian language and literature" (Brinkman)

    Lots of suggestions for improvement

    More re-indexing! Suggesting concepts from other vocabularies

    More context metadata as input

  • Conclusions

    Shows the potential of re-using data in a library network

    Alignment approach fitting indexing practice

    Concrete demonstration, in KB production environment

    Technology transfer: KB wants to continue efforts

    Flexibility: architecture ready to exploit other vocabularies (Linked data & SKOS)

  • Prototype components

    [Architecture diagram. Components shown: Indexer; WinIBW cataloguing interface; GGC cataloguing system; STITCH script (Visual Basic); IE; STITCH stylesheet (XSLT); suggestion service (SWI-Prolog); vocabulary service (Java/Tomcat); Database; lexical alignments Sesame RDF store; Sesame SKOS RDF store; LOD SPARQL endpoints]

  • Linked libraries?

    [Diagram: book collections, the knowledge organization systems (KOSs) used to describe them, and alignments between KOSs]

    Libraries and collections shown: KB (KB Deposit Coll., KB Scientific Coll.), Dutch Public Libraries, LC (US Nat. Lib), BnF (French Nat. Lib), DNB (German Nat. Lib), Dutch Book-trade

    KOSs and other resources shown: Biblion, NUR, BISAC subject codes, Brinkman, GTT, NBC class., UNESCO class., KB Corporatie + Persoon, RAMEAU subject headings, LCSH subject headings, DDC (Dewey decimal class.), SWD subject headings, Personennamendatei, LC authority file, Autorités BNF, Doelgroep (audience), wikipedia.nl, wikipedia.de, others

    Legend: existing KOS alignment; potential KOS alignment of interest; overlap between book collections (thickness indicates degree of overlap); vertical adjustment between a collection and KOSs denotes that those KOSs are used to describe that collection

    LCSH: currently available entry point to the LOD cloud

    KOS categories: other classifications, domain/discipline classifications, subject thesauri / subject heading lists, book collection datasets, person/corporation data

  • Thank you!

    Questions?

  • Screenshots

  • WinIBW production tool

  • STITCH suggestion tool

  • Original metadata

  • Concept suggestions

  • Comparing with human re-indexing

  • Complement: lexical alignments

  • Adding subjects using thesaurus access

  • Concept suggestions

  • Saving and back to WinIBW

  • Screenshots
