
Formats and FRBR Catalogues – Where's our focus? Trond Aalberg NTNU and BIBSYS Norway


Posted on 27-Mar-2015


TRANSCRIPT

Page 1: Formats and FRBR Catalogues – Where's our focus? Trond Aalberg NTNU and BIBSYS Norway

Formats and FRBR Catalogues – Where's our focus?

Trond Aalberg

NTNU and BIBSYS

Norway

Page 2

Topics

• FRBRizing existing catalogues
  – The BIBSYS FRBR project

• Internal FRBR structures
  – How to structure and store FRBR data internally

• Exchange
  – How to express and exchange FRBR data externally

• What kind of specification do we want/need FRBR to be for implementations?

Page 3

The BIBSYS FRBR project: a case study in the use of the FRBR model on the BIBSYS database

• BIBSYS
  – Norwegian service center for libraries: Norwegian university libraries, the National Library, all college libraries, and a number of research libraries
  – Bibliographic database with circa 3.8 million records (8 million holdings)
  – BIBSYSMARC ~ NORMARC (a subset, but not a proper subset, of USMARC)

• The project cooperates with
  – Norwegian University of Science and Technology (project management, modeling and implementations)
  – The National Library of Norway (mapping FRBR to BIBSYSMARC)
  – OCLC (running the Work-Set algorithm on the BIBSYS database)
  – The National Database Project of Norwegian University Museums (CRM)

• Funded by the Norwegian Archive, Library and Museum Authority (1 September 2004 – 31 August 2005) and part of the Norwegian Digital Library Initiative

Page 4

Motivation and objectives

• Large number of existing MARC-based bibliographic catalogues
  – FRBRizing existing catalogues is a major challenge and the key to a FRBRized bibliographic universe
  – Realistic FRBR prototypes can be used to validate the model

• “Holistic” view
  – Process the complete database (not an ideal subset)
  – From FRBR data model to test database and search prototype
  – Cover as much of the BIBSYS data as possible

• Findings
  – Possibilities and limitations
  – How to improve support for FRBR in BIBSYSMARC
  – Further research on specific problems

Page 5

FRBRizing existing catalogues

• Definition: to implement aspects of the FRBR model

• Two different strategies:
  – Presentation layer only
    • Adding a system component that enables generation of FRBR structures
    • Run-time or preprocessed
  – Presentation and storage layer
    • Convert data to a FRBR “compatible” model

Page 6

Levels of FRBRizing

• Different levels of FRBRizing
  – Implement group 1 entities and inherent relationships
  – Implement group 2 and 3 entities and inherent relationships
  – Implement other relationships
  – Implement FRBR attributes

Page 7

Implementing FRBR

[Diagram: MARC records (Record 1 to Record 5) converted into an internal FRBR data structure]

• Build on the ER approach

• Decompose and convert MARC to FRBR attributes
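A minimal sketch of what an ER-based internal structure could look like, assuming Python dataclasses. The class names, fields and the helpers `add_expression`, `embody` and `add_item` are hypothetical and not the BIBSYS prototype's schema. The cardinalities follow the FRBR group 1 relationships: a work is realized through one or more expressions, expressions and manifestations are related many-to-many, and each item exemplifies exactly one manifestation. The sample data is the "A doll's house" record on Page 8.

```python
from dataclasses import dataclass, field
from typing import Optional


# eq=False: entities compare by identity, which also avoids infinite
# recursion through the back-references between entities.
@dataclass(eq=False)
class Work:
    title: str
    creators: list[str] = field(default_factory=list)
    expressions: list["Expression"] = field(default_factory=list, repr=False)


@dataclass(eq=False)
class Expression:
    work: "Work" = field(repr=False)
    language: str
    contributors: list[str] = field(default_factory=list)
    manifestations: list["Manifestation"] = field(default_factory=list, repr=False)


@dataclass(eq=False)
class Manifestation:
    title: str
    isbn: Optional[str] = None
    expressions: list["Expression"] = field(default_factory=list, repr=False)
    items: list["Item"] = field(default_factory=list, repr=False)


@dataclass(eq=False)
class Item:
    manifestation: "Manifestation" = field(repr=False)
    library: str
    shelf_mark: str


def add_expression(work: Work, language: str, contributors: list[str]) -> Expression:
    """Work 1:N Expression: every expression realizes exactly one work."""
    expression = Expression(work=work, language=language, contributors=contributors)
    work.expressions.append(expression)
    return expression


def embody(expression: Expression, manifestation: Manifestation) -> None:
    """Expression N:M Manifestation: record the link in both directions."""
    expression.manifestations.append(manifestation)
    manifestation.expressions.append(expression)


def add_item(manifestation: Manifestation, library: str, shelf_mark: str) -> Item:
    """Manifestation 1:N Item: every item exemplifies exactly one manifestation."""
    item = Item(manifestation=manifestation, library=library, shelf_mark=shelf_mark)
    manifestation.items.append(item)
    return item


# The "A doll's house" record split into group 1 entities.
doll = Work(title="Et dukkehjem", creators=["Ibsen, Henrik"])
adaptation = add_expression(doll, "eng", ["McGuinness, Frank"])
edition = Manifestation(title="A doll's house", isbn="0-8222-1636-1")
embody(adaptation, edition)
add_item(edition, "NBO", "Småtr. 582")
add_item(edition, "NBO", "Ibsensenteret")
```

Storing links in both directions makes navigation from work to items (and back) cheap, at the cost of keeping the two sides consistent, which is why all links go through the helper functions.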

Page 8

BIBSYSMARC example record

*008 pv eng
*015 $a nf0113657
*020 $a 0-8222-1636-1$b h.
*082 $d 839.822[S]
*100 $a Ibsen, Henrik
*241 $a Et dukkehjem $w dukkehjem
*245 $a A doll's house $c by Henrik Ibsen ; adapted by Frank McGuinness
*260 $a New York $b Dramatists Play Service $c c1998
*300 $a 70 s.
*700 $a McGuinness, Frank
*096 $a NBO $c Småtr. 582 $n 02ga00027
*096 $a NBO $c Ibsensenteret $n 01ga20306

Work:
*100 $a Ibsen, Henrik
*241 $a Et dukkehjem $w dukkehjem

Expression:
*008 pv eng
*245 $a A doll's house $c by Henrik Ibsen ; adapted by Frank McGuinness
*700 $a McGuinness, Frank

Manifestation:
*020 $a 0-8222-1636-1$b h.
*245 $a A doll's house $c by Henrik Ibsen ; adapted by Frank McGuinness
*260 $a New York $b Dramatists Play Service $c c1998
*300 $a 70 s.
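The decomposition on this slide can be sketched in a few lines of Python. The tag-to-entity table below is hypothetical and read off this single example: 245 feeds both expression and manifestation, each 096 holdings field becomes an item, and 015 and 082 are left unmapped. A production mapping would work at field/subfield level and depend on context.

```python
import re
from collections import defaultdict

Subfields = list[tuple[str, str]]

# Hypothetical tag-to-entity table for this one example record.
ENTITY_BY_TAG = {
    "100": ("work",), "241": ("work",),
    "008": ("expression",), "700": ("expression",),
    "245": ("expression", "manifestation"),  # title statement feeds both
    "020": ("manifestation",), "260": ("manifestation",), "300": ("manifestation",),
    "096": ("item",),  # one 096 holdings field per item
}

RECORD = """\
*008 pv eng
*015 $a nf0113657
*020 $a 0-8222-1636-1$b h.
*082 $d 839.822[S]
*100 $a Ibsen, Henrik
*241 $a Et dukkehjem $w dukkehjem
*245 $a A doll's house $c by Henrik Ibsen ; adapted by Frank McGuinness
*260 $a New York $b Dramatists Play Service $c c1998
*300 $a 70 s.
*700 $a McGuinness, Frank
*096 $a NBO $c Småtr. 582 $n 02ga00027
*096 $a NBO $c Ibsensenteret $n 01ga20306"""


def parse_field(line: str) -> tuple[str, Subfields]:
    """Split '*TAG $a value $b value' into the tag and (code, value) pairs."""
    tag, rest = line[1:4], line[4:]
    if "$" not in rest:  # e.g. the fixed-length 008 field
        return tag, [("", rest.strip())]
    return tag, [(code, value.strip()) for code, value in re.findall(r"\$(\w)([^$]*)", rest)]


def decompose(record: str) -> dict[str, list[tuple[str, Subfields]]]:
    """Group the record's fields by the FRBR entity they describe."""
    entities: dict[str, list[tuple[str, Subfields]]] = defaultdict(list)
    for line in record.splitlines():
        tag, subfields = parse_field(line)
        for entity in ENTITY_BY_TAG.get(tag, ("unmapped",)):
            entities[entity].append((tag, subfields))
    return dict(entities)


parts = decompose(RECORD)
print({entity: [tag for tag, _ in fields] for entity, fields in parts.items()})
```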

Page 9

Whole/part records in BIBSYS for numbered series and multi-volume publications

*001 982396694
*008 pv eng
*241 $aNår vi døde vågner
*241 $aLille Eyolf
*241 $aJohn Gabriel Borkman
*245 $aLittle Eyolf ; John Gabriel Borkman ; When we dead awaken $cwith introductions by William Archer
*260 $c1907
*300 $aXXVIII, 456 s.
*491 $n91087302x$q11$v11
*096ga$aNBO$cIbsensenteret/11$n85ga06648
*096ga$aNBO$cIbsensenteret/11$n75ga29424
*096ga$aNBO$cIbsensenteret/11$n74ga02038

*001 980846714
*008 pv
*245 $aBrand$ctranslated and with introduction by C.H. Herford
*260 $c1906
*300 $aXIII, 262 s.
*491 $n91087302x$q3$v3
*700 $aHereford, C.H.
*096ga$aNBO$cNA/A 2001:579$n75ga27601
*096ga$aNBO$cIbsensenteret/3$n74ga02035
*096ga$aNBO$cIbsensenteret/3$n74ga02036

*001 91087302x
*008 pv eng
*015 $alc90186650
*082 $c839.82/26
*100 $aIbsen, Henrik
*240 $aVerker$lEngelsk
*245 $aThe collected works of Henrik Ibsen$c[entirely revised and edited by William Archer]$wcollected works of Henrik Ibsen
*250 $aCopyright ed.
*260 $aLondon$bHeinemann$c1906-1912
*300 $a12 b.
*700 $aArcher, William$d1856-1924
*580 $aDette er et lenket flerbindsverk  [Norwegian: “This is a linked multi-volume work”]
*096kj$aUHS$bISS$c839.82 Ibs:Col$n75k005729
*096ga$aNBO$cIbsensenteret$n75ga27600
*096ga$aNBO$n75ga29508
*096ga$aNBO$cIbsensenteret$n74ga02037
*096ga$aNBO$cIbsensenteret$n85ga06639

*491 is used to implement an isPartOf reference (approx. 20% of the records)
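A sketch of how the *491 references could be resolved into whole/part relationships. It assumes that 491 $n holds the 001 identifier of the whole record and $v the volume number; the role of $q is not interpreted here. Splitting on "*" works both for the run-together form and for one field per line, provided "*" never occurs inside field content, which holds for these examples. The function names are hypothetical.

```python
import re
from collections import defaultdict


def split_fields(record: str) -> list[tuple[str, str]]:
    """Split a record in the slides' '*TAG content' notation into (tag, content) pairs."""
    return [(chunk[:3], chunk[3:].strip()) for chunk in record.split("*") if chunk.strip()]


def subfields(content: str) -> dict[str, str]:
    return {code: value.strip() for code, value in re.findall(r"\$(\w)([^$]*)", content)}


def build_part_index(records: list[str]) -> dict[str, list[tuple[int, str]]]:
    """Map each whole record's 001 to its parts as (volume, part 001), sorted by volume.

    Assumption: 491 $n is the 001 of the whole and $v is the volume number.
    """
    parts: dict[str, list[tuple[int, str]]] = defaultdict(list)
    for record in records:
        fields = split_fields(record)
        record_id = next(content for tag, content in fields if tag == "001")
        for tag, content in fields:
            if tag == "491":
                sf = subfields(content)
                parts[sf["n"]].append((int(sf["v"]), record_id))
    return {whole: sorted(children) for whole, children in parts.items()}


# Abridged versions of the three records above.
RECORDS = [
    "*001982396694*008 pv eng*245 $aLittle Eyolf ; John Gabriel Borkman ; When we dead awaken*491 $n91087302x$q11$v11",
    "*001980846714*008 pv*245 $aBrand*491 $n91087302x$q3$v3",
    "*00191087302x*008 pv eng*100 $aIbsen, Henrik*245 $aThe collected works of Henrik Ibsen",
]

index = build_part_index(RECORDS)
print(index)
```

Because the link is stored only in the part records, finding the volumes of a multi-volume work requires an index like this one (or a reverse lookup in the database) rather than reading the whole record itself.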

Page 10

Preliminary results: W:E:M statistics from the BIBSYS database

[Chart: numbers of works, expressions and manifestations (y-axis from 1,000,000 to 4,000,000); labels 1:N and 1:1]

Page 11

Data quality problems

• Typical problems for “not normalized” data
  – Redundant information: the same information is duplicated in multiple records
  – Records are missing information
  – The same information is expressed in different ways

• Inherent problems with data quality

• Results from earlier work on the subset of “Ibsen” records (~3000)
  – Using manual inspection and correction of entries (language, titles, etc.)
  – Based on knowledge about authors, works, titles, etc.

• Compared to results from automatic processing

• The numbers indicate
  – a high level of imprecise information
  – that quality can be significantly improved

Works and expressions identified in the “Ibsen” subset:

                           Works   Expressions
With error corrections        84           747
Without error corrections    865          1354
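The size of the gap is easier to see as ratios. The arithmetic below uses only the four figures from the table, treating the manually corrected run as the reference.

```python
corrected = {"works": 84, "expressions": 747}
uncorrected = {"works": 865, "expressions": 1354}

# How many times more entities the uncorrected run produces
inflation = {k: uncorrected[k] / corrected[k] for k in corrected}

# Average number of expressions per work in each run
per_work = {
    "corrected": corrected["expressions"] / corrected["works"],
    "uncorrected": uncorrected["expressions"] / uncorrected["works"],
}

print({k: round(v, 1) for k, v in inflation.items()})  # {'works': 10.3, 'expressions': 1.8}
print({k: round(v, 1) for k, v in per_work.items()})   # {'corrected': 8.9, 'uncorrected': 1.6}
```

Without corrections the work count is about ten times higher, and each "work" carries about 1.6 expressions instead of about 8.9, which is what one would expect if variant forms of the same title split a single work into many.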

Page 12

Typical problems (from easy to difficult)

• Different capitalization

• Spelling errors

• Substrings

• Only selected values

• Indicative information

• Missing information
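The easy end of this scale can be handled by normalizing values before comparing them. The sketch below folds capitalization, punctuation and spacing, and treats near-identical strings as probable spelling errors using `difflib.SequenceMatcher` from the Python standard library. The 0.9 threshold and the function names are assumptions, and none of this addresses substrings, selected values, indicative or missing information.

```python
import difflib
import re
import unicodedata


def normalize_title(title: str) -> str:
    """Fold the 'easy' differences: capitalization, punctuation and spacing."""
    title = unicodedata.normalize("NFC", title).casefold()
    title = re.sub(r"[^\w\s]", "", title)  # drop punctuation; \w keeps letters such as å and ø
    return " ".join(title.split())  # collapse runs of whitespace


def same_title(a: str, b: str, threshold: float = 0.9) -> bool:
    """Equal after normalization, or similar enough to be a likely spelling error.

    The threshold is an assumption; fuzzy matching can also merge genuinely
    different titles, so matches should be reviewed.
    """
    na, nb = normalize_title(a), normalize_title(b)
    return na == nb or difflib.SequenceMatcher(None, na, nb).ratio() >= threshold


print(same_title("A Doll's House", "a doll's house"))  # capitalization: True
print(same_title("A doll's house", "A dolls hose"))    # spelling error: True
print(same_title("Brand", "Peer Gynt"))                # different titles: False
```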

Page 13

Conversion process outlined

• FRBR implementation model

• FRBR – BIBSYSMARC mapping

• Identify entities and relationships

• Convert or extract from MARC fields to FRBR attributes
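A much simplified sketch of the "identify entities" step for works: records are grouped on a key built from the main entry (100 $a) and the original title (241 $a) when present, falling back to the title proper (245 $a). This is not the OCLC Work-Set algorithm mentioned on Page 3; the flattened record dictionaries and their IDs are hypothetical.

```python
import re
from collections import defaultdict


def norm(text: str) -> str:
    """Lower-case, strip punctuation and collapse whitespace."""
    return " ".join(re.sub(r"[^\w\s]", "", text.casefold()).split())


def work_key(record: dict[str, str]) -> tuple[str, str]:
    """Author from 100$a; title from 241$a (original title) if present, else 245$a."""
    title = record.get("241a") or record.get("245a", "")
    return norm(record.get("100a", "")), norm(title)


def cluster_works(records: list[dict[str, str]]) -> dict[tuple[str, str], list[str]]:
    """Group record IDs (001) that share a work key into candidate works."""
    works: dict[tuple[str, str], list[str]] = defaultdict(list)
    for record in records:
        works[work_key(record)].append(record["001"])
    return dict(works)


# Hypothetical flattened records: keys are tag + subfield code.
records = [
    {"001": "r1", "100a": "Ibsen, Henrik", "241a": "Et dukkehjem", "245a": "A doll's house"},
    {"001": "r2", "100a": "Ibsen, Henrik", "245a": "Et dukkehjem"},
    {"001": "r3", "100a": "Ibsen, Henrik", "241a": "Et Dukkehjem", "245a": "A doll's house"},
    {"001": "r4", "100a": "Ibsen, Henrik", "245a": "Brand"},
]

for key, ids in cluster_works(records).items():
    print(key, ids)
```

A translation catalogued without a 241 field would land in a cluster of its own under its translated title, which is one way the missing-information problem inflates the number of works.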

Page 14

FRBR in MARC catalogues

[Diagram: the Work, Expression, Manifestation and Item entities related to the MARC record, with relationship cardinalities 1~1, N:1, N:1, N:N and 1:N; group 2 and 3 entities related N:N]

Page 15

FRBR attributes

• Each of the entities in the model has associated with it a set of characteristics or attributes

• Attributes serve as the means by which users formulate queries and interpret responses

• Derived from a logical analysis of the data that are typically reflected in bibliographic records

• Attributes are defined at a logical level

• Some are generally applicable, others are applicable only to subtypes

• Intended to be comprehensive but not exhaustive

• Not every instance will exhibit all attributes listed

Page 16

Mapping MARC to FRBR

• FRBR attributes are the bridge between FRBR and other formats

• Functional Analysis of the MARC 21 Bibliographic and Holdings Formats

• Local mapping tables are needed

• Mapping is easy, but conversion is difficult

• It depends on the purpose of the mapping
  – Full conversion of data
  – Enabling searching in different formats
  – Mapping tables need to be close to the conversion processes
  – Requires refinement of many FRBR attributes, and generalization of others

• What structures/formats do we implement?

Page 17

Example: Manifestation.title

• 245 TITLE
  – $a Title
  – $b Other title information
  – $n Number of part of work
  – $p Title of part of work

• 246 PARALLEL TITLE (R)
  – $a Title proper/short title
  – $b Other title information

• 740 ADDED ENTRY TITLE (R)
  – $a Title
  – $b Other title information
  – $n Number of part of work
  – $p Title of part of work

• 210 ABBREVIATED TITLE
  – $a Abbreviated title
  – $b Complementary information

• 222 KEY TITLE
  – $a Key title
  – $b Complementary information

• Field names are translations of BIBSYSMARC field names

• 740 is also mapped to expression and work title

• Complex data that maps to a single element

• Generic category of information, except for 740

• Somewhat comparable structure
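A minimal sketch of a local mapping table for Manifestation.title, assuming fields are already parsed into `(tag, [(code, value), ...])` pairs. The table, the function `map_titles` and the sample 740 value are hypothetical. The sketch shows complex field data collapsing to a single attribute value, 740 feeding three attributes, and each value keeping its source tag so the MARC origin is not lost.

```python
from collections import defaultdict

# Hypothetical mapping table: tag -> (FRBR attributes fed, subfields combined).
# Subfield sets follow the slide.
TITLE_MAPPING: dict[str, tuple[list[str], frozenset[str]]] = {
    "245": (["Manifestation.title"], frozenset("abnp")),
    "246": (["Manifestation.title"], frozenset("ab")),
    "740": (["Manifestation.title", "Expression.title", "Work.title"], frozenset("abnp")),
    "210": (["Manifestation.title"], frozenset("ab")),
    "222": (["Manifestation.title"], frozenset("ab")),
}


def map_titles(fields: list[tuple[str, list[tuple[str, str]]]]) -> dict[str, list[tuple[str, str]]]:
    """Collapse each title field into one value per FRBR attribute, tagged with its source field."""
    result: dict[str, list[tuple[str, str]]] = defaultdict(list)
    for tag, subfields in fields:
        if tag not in TITLE_MAPPING:
            continue
        attributes, keep = TITLE_MAPPING[tag]
        value = " ".join(text for code, text in subfields if code in keep)
        for attribute in attributes:
            result[attribute].append((tag, value))
    return dict(result)


# Illustrative input: 245 from the collected-works record plus a made-up 740.
sample = [
    ("245", [("a", "The collected works of Henrik Ibsen"),
             ("c", "[entirely revised and edited by William Archer]")]),
    ("740", [("a", "Brand")]),
    ("300", [("a", "12 b.")]),
]
titles = map_titles(sample)
print(titles)
```

Tagging each value with its source field is what allows a prototype to rebuild the MARC substructure for presentation while searching on the flattened attribute.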

Page 18

Example: Manifestation.identifier

• 020 ISBN (R)
  – $a ISBN
  – $z Invalid ISBN

• 022 ISSN (R)
  – $a ISSN
  – $y Invalid ISSN

• 024 ISMN and ISRC (R)
  – $a Number
  – $x Type of number
  – $y Invalid number

• And 027, 028, ...

• Complex data that maps to a single element

• 020 and 022 have comparable structure, but 024 does not
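The structural difference can be made concrete with per-field extraction rules. In this sketch (rules, class and function names are hypothetical), 020 and 022 imply their identifier scheme, while for 024 the scheme has to be read from $x in the record itself. The 020 value comes from the Page 8 example record; the 022 and 024 values are dummies.

```python
from dataclasses import dataclass

# Hypothetical per-field rules based on the slide: which subfield holds a valid
# number, which an invalid one, and where the identifier scheme comes from.
IDENTIFIER_RULES = {
    "020": {"valid": "a", "invalid": "z", "scheme": "ISBN"},
    "022": {"valid": "a", "invalid": "y", "scheme": "ISSN"},
    "024": {"valid": "a", "invalid": "y", "scheme": None},  # scheme given in $x
}


@dataclass(frozen=True)
class Identifier:
    scheme: str
    value: str
    valid: bool


def extract_identifiers(fields: list[tuple[str, list[tuple[str, str]]]]) -> list[Identifier]:
    identifiers = []
    for tag, subfields in fields:
        rule = IDENTIFIER_RULES.get(tag)
        if rule is None:
            continue
        # 020/022 imply their scheme; 024 must read it from the record.
        scheme = rule["scheme"] or next((v for c, v in subfields if c == "x"), "unknown")
        for code, value in subfields:
            if code == rule["valid"]:
                identifiers.append(Identifier(scheme, value, True))
            elif code == rule["invalid"]:
                identifiers.append(Identifier(scheme, value, False))
    return identifiers


sample = [
    ("020", [("a", "0-8222-1636-1"), ("b", "h.")]),
    ("022", [("y", "dummy-issn")]),
    ("024", [("a", "dummy-ismn"), ("x", "ISMN")]),
]
ids = extract_identifiers(sample)
print(ids)
```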

Page 19

Example: 300 PHYSICAL DESCRIPTION

• $a Extent = Extent of the Carrier
  ~ Form of Carrier
  ~ Presentation Format (Visual Projection)
  ~ Foliation (Hand-Printed Book)
  ~ Collation (Hand-Printed Book)

• $b Illustrations (Other physical details)
  ~ Capture Mode
  ~ Colour (Image)
  ~ Playing Speed (Sound Recording)
  ~ Kind of Sound (Sound Recording)

• $c Format (Dimensions of the carrier) = Dimensions of the Carrier

* Mapped to manifestation

• Some FRBR attributes are too specific!

Page 20

Prototype solution

• Substructure is not always important for searching

• Substructure is important for presentation

• Mix models (FRBR and MARC)?

• Classifying specific fields/subfields as belonging to a specific entity/attribute
  – Not possible for fields that map to several FRBR entities and/or attributes

• Decomposing record instances
  – Determine what belongs to which entity/attribute
  – Tag values in MARC records with the FRBR entity/attribute
    • E.g. extend MARCXML with attributes that identify the FRBR entity/attribute
  – Tag FRBR attribute values with the original MARC field/subfield

• Prototype solution using XML:
  – Different records for different entities
  – Maintain MARC substructure
  – To avoid runtime selection of work and expression entities
  – To facilitate error corrections and improve the overall FRBR group 1 structure
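A sketch of the "different records for different entities, maintain MARC substructure" idea using Python's standard `xml.etree.ElementTree`. The MARCXML namespace and its `record`, `datafield` and `subfield` elements are real; the `frbrEntity` and `frbrAttribute` attributes, the attribute names such as `Work.createdBy`, and the helper functions are hypothetical extensions, not part of the MARCXML schema.

```python
import xml.etree.ElementTree as ET

MARC_NS = "http://www.loc.gov/MARC21/slim"
ET.register_namespace("", MARC_NS)  # serialize MARCXML elements without a prefix


def q(name: str) -> str:
    """Qualify an element name with the MARCXML namespace."""
    return f"{{{MARC_NS}}}{name}"


def entity_record(entity: str, fields: list[tuple[str, str, list[tuple[str, str]]]]) -> ET.Element:
    """Build one MARCXML-style <record> per FRBR entity.

    fields: (tag, FRBR attribute, [(subfield code, value), ...]).
    frbrEntity/frbrAttribute are hypothetical extension attributes.
    """
    record = ET.Element(q("record"), {"frbrEntity": entity})
    for tag, attribute, subfields in fields:
        datafield = ET.SubElement(
            record, q("datafield"),
            {"tag": tag, "ind1": " ", "ind2": " ", "frbrAttribute": attribute},
        )
        for code, value in subfields:  # keep the MARC substructure intact
            ET.SubElement(datafield, q("subfield"), {"code": code}).text = value
    return record


work = entity_record("Work", [
    ("100", "Work.createdBy", [("a", "Ibsen, Henrik")]),
    ("241", "Work.title", [("a", "Et dukkehjem"), ("w", "dukkehjem")]),
])
xml_text = ET.tostring(work, encoding="unicode")
print(xml_text)
```

Storing work and expression records separately, as here, means they are built once during conversion and can be corrected directly, instead of being selected from manifestation records at run time.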

Page 21

FRBR as ontology

• FRBR is a conceptual model
  – Mainly interpreted as a reference model

• Can be formalized as an ontology, e.g. using W3C OWL:
  – “This is a FRBR.Expression and it has a FRBR.Translation relationship to another FRBR.Expression”

• Using Topic Maps and FRBR as a typology (example from another project)
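To make the example sentence concrete, here is a sketch of it as subject-predicate-object triples in plain Python, serialized in N-Triples syntax. Only the `rdf:type` URI is real; the `http://example.org/frbr#` namespace, the `isTranslationOf` property and the catalogue identifiers are hypothetical. A real OWL ontology would also declare FRBR.Expression as a class and the translation relationship as an object property, which is omitted here.

```python
from typing import NamedTuple

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
FRBR = "http://example.org/frbr#"  # hypothetical namespace, not a published vocabulary
EX = "http://example.org/catalogue/"  # hypothetical identifiers for catalogue entities


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str

    def to_ntriples(self) -> str:
        return f"<{self.subject}> <{self.predicate}> <{self.obj}> ."


original = EX + "expression/dukkehjem-original"
translation = EX + "expression/dukkehjem-english"

graph = {
    Triple(original, RDF_TYPE, FRBR + "Expression"),
    Triple(translation, RDF_TYPE, FRBR + "Expression"),
    Triple(translation, FRBR + "isTranslationOf", original),
}


def is_expression(graph: set[Triple], node: str) -> bool:
    return Triple(node, RDF_TYPE, FRBR + "Expression") in graph


def is_translation_of(graph: set[Triple], a: str, b: str) -> bool:
    """'a is a FRBR.Expression with a translation relationship to the FRBR.Expression b'."""
    return (is_expression(graph, a) and is_expression(graph, b)
            and Triple(a, FRBR + "isTranslationOf", b) in graph)


for triple in sorted(graph):
    print(triple.to_ntriples())
```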

Page 22

TM prototype

• FRBR as ontology for music information:
  – Works and creators
  – Artists and recorded performances
  – Navigation as the main discovery/search strategy

• Model and represent music information as distinct entities and relationships, using FRBR as “types”
  – Not including FRBR attributes

• Exchange and integrate fragments using P2P (TMRAP)

• Objectives
  – Explore and evaluate the use of FRBR entities and relationships
  – P2P exchange and integration of rich music information
  – Identifiers in the domain of music
  – The use of FRBR as an ontology in Topic Maps

* Examples are based on a demo version of the Omnigator software from Ontopia

Page 23

FRBR ontology

Page 24

Work example

Page 25

Conclusion

• What do we want FRBR to be?
  – A reference model for bibliographic catalogues
  – A conceptual model for understanding bibliographic records
  – An ontology for exchanging bibliographic information within the domain and with other domains