Formats and FRBR Catalogues – Where's our focus?
Trond Aalberg
NTNU and BIBSYS
Norway
Topics
• FRBRizing existing catalogues
  – The BIBSYS FRBR project
• Internal FRBR structures
  – How to structure and store FRBR data internally
• Exchange
  – How to express and exchange FRBR data externally
• What kind of specification do we want/need the FRBR to be for implementations?
The BIBSYS FRBR project: a case study in the use of the FRBR model on the BIBSYS database
• BIBSYS
  – Norwegian service center for libraries: Norwegian university libraries, the National Library, all college libraries, and a number of research libraries
  – Bibliographic database with circa 3.8 million records (8 million holdings)
  – BIBSYSMARC ~ NORMARC (subset but not proper subset of USMARC)
• Project cooperates with
  – Norwegian University of Science and Technology (project management, modeling and implementations)
  – The National Library of Norway (mapping FRBR – BIBSYSMARC)
  – OCLC (running the Work-Set algorithm on the BIBSYS database)
  – The National Database Project of Norwegian University Museums (CRM)
• Funded by the Norwegian Archive, Library and Museum Authority (1/9 2004 – 31/8 2005); part of the Norwegian Digital Library Initiative
Motivation and objectives
• Large number of existing MARC-based bibliographic catalogues
  – FRBRizing existing catalogues is a major challenge and the key to a FRBRized bibliographic universe
  – Realistic FRBR prototypes can be used to validate the model
• "Holistic" view
  – Process the complete database (not ideal subset)
  – From FRBR data model to test database and search prototype
  – Cover as much as possible of the BIBSYS data
• Findings
  – Possibilities and limitations
  – How to improve support for FRBR in BIBSYSMARC
  – Further research on specific problems
FRBRizing existing catalogues
• Definition:
  – to implement aspects of the FRBR model
• Two different strategies:
  – Presentation layer only
    • Adding system component that enables generation of FRBR
    • Run-time or preprocessed
  – Presentation and storage layer
    • Convert data to a FRBR "compatible" model
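The presentation-layer strategy can be illustrated with a minimal sketch that groups unchanged records under a work key at run time. The record shape, the sample data and the key (author plus original or main title) are assumptions for illustration; this is not the OCLC Work-Set algorithm.

```python
from collections import defaultdict

# Hypothetical flat records holding only the fields needed for grouping.
records = [
    {"id": "r1", "author": "Ibsen, Henrik", "uniform_title": "Et dukkehjem", "title": "A doll's house"},
    {"id": "r2", "author": "Ibsen, Henrik", "uniform_title": "Et dukkehjem", "title": "Nora"},
    {"id": "r3", "author": "Ibsen, Henrik", "uniform_title": None, "title": "Brand"},
]

def work_key(record):
    """Build a naive work key from author plus original (or else main) title."""
    title = record["uniform_title"] or record["title"]
    return (record["author"].casefold(), title.casefold())

def group_by_work(records):
    """Group record ids under their work key without changing the stored data."""
    groups = defaultdict(list)
    for record in records:
        groups[work_key(record)].append(record["id"])
    return dict(groups)

work_sets = group_by_work(records)
```

The same function could run as a preprocessing step instead, with the resulting groups cached next to the catalogue.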
Levels of FRBRizing
• Different levels of FRBRizing
  – Implement group 1 entities and inherent relationships
  – Implement group 2 and 3 entities and inherent relationships
  – Implement other relationships
  – Implement FRBR attributes
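As a rough picture of the first level (group 1 entities and their inherent relationships), the sketch below models Work, Expression, Manifestation and Item as nested Python dataclasses. The class and field names are illustrative assumptions, and the strict tree is a simplification, since FRBR allows a manifestation to embody more than one expression.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Item:
    call_number: str

@dataclass
class Manifestation:
    title: str
    items: List[Item] = field(default_factory=list)

@dataclass
class Expression:
    language: str
    manifestations: List[Manifestation] = field(default_factory=list)

@dataclass
class Work:
    title: str
    expressions: List[Expression] = field(default_factory=list)

# Build one Work -> Expression -> Manifestation -> Item chain.
work = Work("Et dukkehjem")
expression = Expression("eng")
manifestation = Manifestation("A doll's house")
manifestation.items.append(Item("NBO Ibsensenteret"))
expression.manifestations.append(manifestation)
work.expressions.append(expression)
```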
Implementing FRBR
[Figure: diagram showing Record 1 through Record 5]
Internal FRBR data structure
• Build on ER approach
• Decompose and convert MARC to FRBR attributes
BIBSYSMARC example record
*008 pv eng
*015 $a nf0113657
*020 $a 0-8222-1636-1$b h.
*082 $d 839.822[S]
*100 $a Ibsen, Henrik
*241 $a Et dukkehjem $w dukkehjem
*245 $a A doll's house $c by Henrik Ibsen ; adapted by Frank McGuinness
*260 $a New York $b Dramatists Play Service $c c1998
*300 $a 70 s.
*700 $a McGuinness, Frank
*096 $a NBO $c Småtr. 582 $n 02ga00027
*096 $a NBO $c Ibsensenteret $n 01ga20306

*100 $a Ibsen, Henrik
*241 $a Et dukkehjem $w dukkehjem

*008 pv eng
*245 $a A doll's house $c by Henrik Ibsen ; adapted by Frank McGuinness
*700 $a McGuinness, Frank

*020 $a 0-8222-1636-1$b h.
*245 $a A doll's house $c by Henrik Ibsen ; adapted by Frank McGuinness
*260 $a New York $b Dramatists Play Service $c c1998
*300 $a 70 s.
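The decomposition shown in the example can be sketched as a lookup from field tag to entity. The three groups of fields come from the example; reading them as work, expression and manifestation is an interpretation of the slide, and the table below is an assumption rather than the project's actual mapping.

```python
# Field-to-entity assignment read off the example split; an assumption,
# not the project's mapping table. Note that 245 lands under two entities.
FIELD_TO_ENTITIES = {
    "100": ["work"],
    "241": ["work"],
    "008": ["expression"],
    "700": ["expression"],
    "245": ["expression", "manifestation"],
    "020": ["manifestation"],
    "260": ["manifestation"],
    "300": ["manifestation"],
}

RECORD = """*008 pv eng
*020 $a 0-8222-1636-1$b h.
*100 $a Ibsen, Henrik
*241 $a Et dukkehjem $w dukkehjem
*245 $a A doll's house $c by Henrik Ibsen ; adapted by Frank McGuinness
*260 $a New York $b Dramatists Play Service $c c1998
*300 $a 70 s.
*700 $a McGuinness, Frank"""

def decompose(record_text):
    """Copy each field line into every entity bucket its tag is assigned to."""
    entities = {"work": [], "expression": [], "manifestation": []}
    for line in record_text.splitlines():
        tag = line[1:4]  # three-digit field tag after the leading '*'
        for entity in FIELD_TO_ENTITIES.get(tag, []):
            entities[entity].append(line)
    return entities

parts = decompose(RECORD)
```

Fields that are absent from the table (for example 015, 082 and 096) are simply dropped here; a real conversion would have to decide where they belong.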
Whole/Part records in BIBSYS for numbered series and multi-volume publications

*001982396694
*008 pv eng
*241 $aNår vi døde vågner
*241 $aLille Eyolf
*241 $aJohn Gabriel Borkman
*245 $aLittle Eyolf ; John Gabriel Borkman ; When we dead awaken $cwith introductions by William Archer
*260 $c1907
*300 $aXXVIII, 456 s.
*491 $n91087302x$q11$v11
*096ga$aNBO$cIbsensenteret/11$n85ga06648
*096ga$aNBO$cIbsensenteret/11$n75ga29424
*096ga$aNBO$cIbsensenteret/11$n74ga02038

*001980846714
*008 pv
*245 $aBrand$ctranslated and with introduction by C.H. Herford
*260 $c1906
*300 $aXIII, 262 s.
*491 $n91087302x$q3$v3
*700 $aHereford, C.H.
*096ga$aNBO$cNA/A 2001:579$n75ga27601
*096ga$aNBO$cIbsensenteret/3$n74ga02035
*096ga$aNBO$cIbsensenteret/3$n74ga02036

*00191087302x
*008 pv eng
*015 $alc90186650
*082 $c839.82/26
*100 $aIbsen, Henrik
*240 $aVerker$lEngelsk
*245 $aThe collected works of Henrik Ibsen$c[entirely revised and edited by William Archer]$wcollected works of Henrik Ibsen
*250 $aCopyright ed.
*260 $aLondon$bHeinemann$c1906-1912
*300 $a12 b.
*700 $aArcher, William$d1856-1924
*580 $aDette er et lenket flerbindsverk
*096kj$aUHS$bISS$c839.82 Ibs:Col$n75k005729
*096ga$aNBO$cIbsensenteret$n75ga27600
*096ga$aNBO$n75ga29508
*096ga$aNBO$cIbsensenteret$n74ga02037
*096ga$aNBO$cIbsensenteret$n85ga06639

*491 is used to implement an isPartOf reference
Approx. 20% of the records
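A minimal sketch of resolving the isPartOf reference: in the records shown, *491 $n carries the *001 identifier of the parent record. Treating $v as a numeric volume number, and the dictionary shape used for the records, are assumptions made for this illustration.

```python
# Minimal stand-ins for the three records: record id (*001) and, for parts,
# the $n and $v subfields of *491.
records = [
    {"001": "91087302x", "245a": "The collected works of Henrik Ibsen"},
    {"001": "982396694",
     "245a": "Little Eyolf ; John Gabriel Borkman ; When we dead awaken",
     "491": {"n": "91087302x", "v": "11"}},
    {"001": "980846714", "245a": "Brand", "491": {"n": "91087302x", "v": "3"}},
]

def parts_by_parent(records):
    """Index part records under the parent id named in *491 $n, sorted by $v."""
    index = {}
    for record in records:
        link = record.get("491")
        if link:
            # int() assumes $v is purely numeric, as it is in these examples.
            index.setdefault(link["n"], []).append((int(link["v"]), record["001"]))
    return {parent: [rid for _, rid in sorted(parts)] for parent, parts in index.items()}

children = parts_by_parent(records)
```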
Preliminary results: W:E:M statistics from the BIBSYS database
[Chart: numbers of Works, Expressions and Manifestations, on an axis from 1,000,000 to 4,000,000, with 1:1 and 1:N series]
Data quality problems
• Typical problems for "not normalized" data
  – Redundant information
    • The same information is duplicated in multiple records
  – Records are missing information
  – The same information is expressed in different ways
• Inherent problems with data quality
• Results from earlier work on the subset of "Ibsen" records (~3000)
  – Using manual inspection and corrections of entries (language, titles, etc.)
  – Based on knowledge about author, works, titles, ...
• Compared to results from automatic processing
• Numbers indicate
  – a high level of imprecise information
  – quality can significantly be improved
                             Works   Expressions
With error corrections          84           747
Without error corrections      865          1354
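To put the two rows side by side, the short calculation below gives the ratio of automatically generated entities to manually corrected ones. It only restates the four figures in the table.

```python
# Figures from the table: automatic processing vs. manual error correction.
works_auto, works_corrected = 865, 84
expr_auto, expr_corrected = 1354, 747

work_ratio = works_auto / works_corrected
expr_ratio = expr_auto / expr_corrected
print(f"works: {work_ratio:.1f}x, expressions: {expr_ratio:.1f}x")
# prints "works: 10.3x, expressions: 1.8x"
```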
Typical problems
• Different capitalization
• Spelling errors
• Substrings
• Only selected values
• Indicative information
• Missing information
[Scale alongside the list: Easy (top) to Difficult (bottom)]
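The easy end of this scale can be handled mechanically. The sketch below folds case and whitespace, then uses the standard library's `difflib` to flag likely spelling variants. The 0.9 threshold is an arbitrary assumption, and the harder cases (indicative or missing information) are not addressed.

```python
import difflib

def normalize(title):
    """Fold case and collapse whitespace: covers the 'different capitalization' case."""
    return " ".join(title.casefold().split())

def is_probable_match(a, b, threshold=0.9):
    """Flag near-identical titles as likely spelling variants (threshold is arbitrary)."""
    ratio = difflib.SequenceMatcher(None, normalize(a), normalize(b)).ratio()
    return ratio >= threshold
```

For example, `is_probable_match("Et dukkehjem", "Et dukkehiem")` is true under this threshold, while `is_probable_match("Brand", "Peer Gynt")` is not.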
Conversion process outlined
[Diagram labels: FRBR implementation model; FRBR – BIBSYSMARC mapping; Identify entities and relationships; Convert or extract from MARC fields to FRBR attributes]
FRBR in MARC catalogues
[Diagram: a MARC-record and its relationships to Work, Expression, Manifestation, Item, and Group 2 and 3 entities; cardinality labels on the diagram: 1~1, N:1, N:1, N:N, 1:N, N:N]
FRBR attributes
• Each of the entities in the model has associated with it a set of characteristics or attributes
• Attributes serve as the means by which users formulate queries and interpret responses
• Derived from a logical analysis of the data that are typically reflected in bibliographic records
• Attributes are defined at a logical level
• Some are generally applicable, others are applicable only to subtypes
• Intended to be comprehensive but not exhaustive
• Not every instance will exhibit all attributes listed
Mapping MARC to FRBR
• FRBR attributes are the bridge between FRBR and other formats
• Functional Analysis of the MARC 21 Bibliographic and Holdings Formats
• Local mapping tables are needed
• Mapping is easy but conversion is difficult
• Depending on the purpose of mapping
  – Full conversion of data
  – Enable searching in different formats
  – Mapping tables need to be close to conversion processes
  – Requires refinement of many FRBR attributes – and generalization of others
• What structures/formats do we implement?
Example: Manifestation.title
• 245 TITLE
  • $a – Title
  • $b – Other title information
  • $n – Number of part of work
  • $p – Title of part of work
• 246 PARALLEL TITLE (R)
  • $a – Title proper/short title
  • $b – Other title information
• 740 ADDED ENTRY TITLE (R)
  • $a – Title
  • $b – Other title information
  • $n – Number of part of work
  • $p – Title of part of work
• 210 – ABBREVIATED TITLE
  • $a – Abbreviated title
  • $b – Complementary information
• 222 – KEY TITLE
  • $a – Key title
  • $b – Complementary information
• Field names are translations of BIBSYSMARC field names
• 740 is also mapped to expression and work title
• Complex data that maps to a single element
• Generic category of information except for 740
• Somewhat comparable structure
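A mapping table of this kind can be written down directly as data. The sketch below encodes the field and subfield list for Manifestation.title and collects one title string per matching field. The parsed-record shape (a list of tag and subfield-dictionary pairs) and the space-joined output are assumptions for illustration.

```python
# Subfields contributing to Manifestation.title, per the field list above.
TITLE_SUBFIELDS = {
    "245": ["a", "b", "n", "p"],
    "246": ["a", "b"],
    "740": ["a", "b", "n", "p"],
    "210": ["a", "b"],
    "222": ["a", "b"],
}

def manifestation_titles(fields):
    """Collect one (tag, title) pair per mapped field occurrence.

    `fields` is a list of (tag, {subfield_code: value}) pairs, a hypothetical
    in-memory shape for a parsed record.
    """
    titles = []
    for tag, subfields in fields:
        codes = TITLE_SUBFIELDS.get(tag)
        if codes is None:
            continue
        parts = [subfields[c] for c in codes if c in subfields]
        if parts:
            titles.append((tag, " ".join(parts)))
    return titles

record = [
    ("245", {"a": "A doll's house", "c": "by Henrik Ibsen ; adapted by Frank McGuinness"}),
    ("260", {"a": "New York"}),
]
titles = manifestation_titles(record)
```

Keeping the tag next to each value preserves the link back to the original MARC field, which matters when the substructure is needed for presentation.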
Example: Manifestation.identifier
• 020 ISBN (R)
  – $a ISBN
  – $z Invalid ISBN
• 022 ISSN (R)
  – $a ISSN
  – $y Invalid ISSN
• 024 ISMN and ISRC (R)
  – $a Number
  – $x Type of number
  – $y Invalid number
• And 027, 028, ..
• Complex data that maps to a single element
• 020 and 022 comparable structure, but not 024
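The structural difference between 020/022 and 024 shows up as soon as the mapping is coded: for 024 the identifier type has to be read from $x in each record. The sketch below assumes the same hypothetical parsed-record shape of (tag, subfield dictionary) pairs, and the 024 value in the sample is a placeholder, not a real number.

```python
# tag -> (identifier type, valid subfield, invalid subfield), from the list above.
# For 024 the type comes from $x in the record itself, so it is None here.
IDENTIFIER_FIELDS = {
    "020": ("ISBN", "a", "z"),
    "022": ("ISSN", "a", "y"),
    "024": (None, "a", "y"),
}

def manifestation_identifiers(fields):
    """Return identifier dicts with type, value and a valid/invalid flag."""
    identifiers = []
    for tag, subfields in fields:
        if tag not in IDENTIFIER_FIELDS:
            continue
        id_type, valid_code, invalid_code = IDENTIFIER_FIELDS[tag]
        if id_type is None:
            id_type = subfields.get("x", "unknown")
        if valid_code in subfields:
            identifiers.append({"type": id_type, "value": subfields[valid_code], "valid": True})
        if invalid_code in subfields:
            identifiers.append({"type": id_type, "value": subfields[invalid_code], "valid": False})
    return identifiers

ids = manifestation_identifiers([
    ("020", {"a": "0-8222-1636-1", "b": "h."}),
    ("024", {"a": "PLACEHOLDER-NUMBER", "x": "ISMN"}),  # placeholder value
])
```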
Example: 300 PHYSICAL DESCRIPTION
• $a Extent
  = Extent of the Carrier
  ~ Form of Carrier
  ~ Presentation Format (Visual Projection)
  ~ Foliation (Hand-Printed Book)
  ~ Collation (Hand-Printed Book)
• $b Illustrations (Other physical details)
  ~ Capture Mode
  ~ Colour (Image)
  ~ Playing Speed (Sound Recording)
  ~ Kind of Sound (Sound Recording)
• $c Format (Dimensions of the carrier)
  = Dimensions of the Carrier
*Mapped to manifestation
• Some FRBR attributes are too specific!
Prototype solution
• Substructure is not always important for searching
• Substructure is important for presentation
• Mix models (FRBR and MARC)?
• Classifying specific fields/subfields as belonging to a specific entity/attribute
  – Not possible for fields that map to several FRBR entities and/or attributes
• Decomposing record instances
  – Determine what belongs to what entity/attribute
  – Tag values in MARC records with FRBR entity/attribute
    • E.g. extend MARCXML with attributes that identify FRBR entity/attribute
  – Tag FRBR attribute values with original MARC field/subfield
• Prototype solution using XML:
  – Different records for different entities
  – Maintain MARC substructure
  – To avoid runtime selection of work and expression entities
  – To facilitate error corrections and improve overall FRBR group 1 structure
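The idea of extending MARCXML with attributes that identify the FRBR entity and attribute might look like the sketch below, which uses Python's standard `xml.etree.ElementTree`. The MARCXML namespace is the Library of Congress one; the `frbr` namespace URI, the attribute names and the two-field classification table are hypothetical and not taken from the project.

```python
import xml.etree.ElementTree as ET

MARC_NS = "http://www.loc.gov/MARC21/slim"
FRBR_NS = "urn:example:frbr"  # hypothetical namespace for the added attributes

ET.register_namespace("marc", MARC_NS)
ET.register_namespace("frbr", FRBR_NS)

# Hypothetical tag -> (entity, attribute) classification for two fields.
CLASSIFICATION = {"241": ("work", "title"), "260": ("manifestation", "publication")}

def tag_datafields(record):
    """Annotate each classified datafield in place, keeping the MARC substructure."""
    for datafield in record.findall(f"{{{MARC_NS}}}datafield"):
        target = CLASSIFICATION.get(datafield.get("tag"))
        if target:
            datafield.set(f"{{{FRBR_NS}}}entity", target[0])
            datafield.set(f"{{{FRBR_NS}}}attribute", target[1])
    return record

record = ET.fromstring(
    f'<record xmlns="{MARC_NS}">'
    '<datafield tag="241" ind1=" " ind2=" "><subfield code="a">Et dukkehjem</subfield></datafield>'
    '<datafield tag="260" ind1=" " ind2=" "><subfield code="a">New York</subfield></datafield>'
    '</record>'
)
tagged = tag_datafields(record)
print(ET.tostring(tagged, encoding="unicode"))
```

Because the subfields are left untouched, the same tagged record can still be rendered with its original MARC substructure.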
FRBR as ontology
• FRBR is a conceptual model
  – Mainly interpreted as a reference model
• Can be formalized to an ontology, e.g. using W3C OWL:
  – "This is a FRBR.Expression and it has a FRBR.Translation relationship to another FRBR.Expression"
• Using Topic Maps and FRBR as typology (example from another project)
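The quoted statement can be pictured as typed subject-predicate-object statements. The sketch below uses plain Python tuples as a stand-in for RDF/OWL triples; it is not OWL syntax, and the URIs and the `translationOf` property name are hypothetical.

```python
# Hypothetical identifiers; FRBR itself does not define these URIs.
FRBR = "urn:example:frbr#"
EX = "urn:example:catalogue#"
RDF_TYPE = "rdf:type"

# "This is a FRBR.Expression and it has a FRBR.Translation relationship
# to another FRBR.Expression", written as three statements.
triples = {
    (EX + "expr-no", RDF_TYPE, FRBR + "Expression"),
    (EX + "expr-en", RDF_TYPE, FRBR + "Expression"),
    (EX + "expr-en", FRBR + "translationOf", EX + "expr-no"),
}

def objects(subject, predicate):
    """All objects stated for a given subject and predicate."""
    return sorted(o for s, p, o in triples if s == subject and p == predicate)

def translations_of(expression):
    """All expressions stated to be translations of the given expression."""
    return sorted(s for s, p, o in triples
                  if p == FRBR + "translationOf" and o == expression)
```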
TM prototype
• FRBR as ontology for music information:
  – Works and creators
  – Artists and recorded performances
  – Navigation as the main discovery/search strategy
• Model and represent music information as distinct entities and relationships using FRBR as "types"
  – Not including FRBR attributes
• Exchange and integrate fragments using P2P (TMRAP)
• Objective
  – Explore and evaluate the use of FRBR entities and relationships
  – P2P exchange and integration of rich music information
  – Identifiers in the domain of music
  – The use of FRBR as an ontology in Topic Maps
* Examples are based on demo version of Omnigator software from Ontopia
FRBR ontology
Work example
Conclusion
• What do we want FRBR to be?
  – A reference model for bibliographic catalogues
  – A conceptual model for understanding bibliographic records
  – An ontology for exchanging bibliographic information within the domain and with other domains