Optimising metadata workflows in a distributed information environment
R. John Robertson & Jane Barton
Centre for Digital Library Research
University of Strathclyde, UK
Overview
Introductions & definitions:
Metadata, workflow & optimisation
Diversity & the distributed information environment
Models and frameworks:
Generic models: repositories, objects & metadata
Existing models & frameworks
Developing a metadata lifecycle model
Using the metadata lifecycle model to optimise workflow
Moving forward
Metadata, workflow & optimisation
Metadata
= good quality metadata
= metadata that meets repository requirements
Metadata workflow
= quality assured metadata by design
= metadata creation & QA processes designed to meet repository requirements with available resources
Metadata workflow optimisation
= refining metadata workflow to improve quality & enhance metadata
Critical to functionality, interoperability & sustainability of repositories
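The definition of good quality metadata as "metadata that meets repository requirements" can be illustrated with a minimal sketch. The function name, the record layout (a plain dict of element name to value) and the example element names are assumptions made for illustration; they do not come from the presentation.

```python
def check_requirements(record, required_elements):
    """Return (ok, missing): ok is True when every required element
    is present in the record with a non-empty value."""
    # Hypothetical record layout: {"element name": "value", ...}
    missing = [name for name in required_elements if not record.get(name)]
    return (not missing, missing)


# Illustrative use with made-up element names.
record = {"title": "Cell biology notes", "creator": "", "identifier": "oai:example:1"}
ok, missing = check_requirements(record, ["title", "creator", "identifier"])
```

Here the empty creator value counts as missing, so the record would fail this repository's requirements until the workflow supplies it.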
Optimising metadata workflow
Determine required metadata quality
Determine target metadata quality
Design & implement workflow
Refine workflow
Review
Determine purpose of metadata
Local environment
Wider environment
Barton, J. & Robertson, R.J. Designing workflows for quality assured metadata. CETIS Metadata & Digital Repositories SIG Meeting, Edinburgh, 10th March 2005.
Diversity & the dIE
In the wider environment, there is considerable diversity
• of purpose
• of metadata requirements
• of metadata creation processes & priorities
Diversity presents challenges for interoperability between repositories
Diversity also offers potential for refinement of metadata workflow among repositories
Assumes/requires persistent object identifiers
Optimising metadata workflow in the dIE
Workflow optimisation requires a model of the dIE
• to facilitate strategic partnerships
• to inform allocation of resources
• to foster holistic approach to creation, augmentation & enhancement of metadata
To achieve this, two conditions must be met:
• local workflow must be articulated
• local workflow must be placed in context of wider environment
Reference models for workflow optimisation
Ecology of repositories
• provides a typology of repositories & associated services
• models the relationships between them & between their domains
Object lifecycle model
• profiles objects within repositories & their movement, transformation & adaptation within the dIE
Metadata lifecycle model
• profiles metadata within repositories & its movement, augmentation & enhancement within the dIE
Existing models & frameworks
Existing models that relate to (parts of) the reference models:
the E-Learning Framework
McLean & Blinco’s cosmic view
the JISC Information Environment
CORDRA
the work of Gonçalves et al
The E-Learning Framework (ELF)
A common approach to service oriented architectures for education via:
• a definitional model of service components
• standards & tools to support their interoperability
Addresses a specific domain & provides a typology of functions within that domain
(The E-Learning Framework. http://www.elframework.org)
McLean & Blinco’s cosmic view
A service domain typology of repositories:
• more comprehensive than ELF but less detailed
• highlights potential for cross-domain approach
• identifies need for better articulation of context & methodologies to deal with complex contextual issues
(McLean, N. The ecology of repository services: a cosmic view. ECDL, 2004. http://www.ecdl2004.org/presentations/mclean/)
The JISC Information Environment
Provides convenient access to a comprehensive collection of scholarly & educational materials:
• can be viewed as a specific implementation of ELF
• provides a superstructure to inform & co-ordinate technical infrastructure development
• focuses on technical solutions to support structural & syntactical interoperability
• taking a lead in addressing unresolved issues in the object lifecycle
(JISC. Strategic activities: Information Environment. 2004. http://www.jisc.ac.uk/about_info_env.html)
CORDRA
Enables access to wide range of learning object repositories through federated searching:
• high common denominator for participating LORs
• creates community of repositories behind interoperability boundary
• assumes federation as method of interaction, with metadata integration rather than interoperability, so little potential for metadata workflow optimisation
(Kraan, W. & Mason, J. Issues in federating repositories: a report on the first International CORDRA Workshop. D-Lib Magazine, 11(3), 2005.)
Gonçalves et al’s 5S
Complex formal taxonomy of repositories:
• comprehensively catalogues repositories from five perspectives
• engages with all three reference models but does not engage with interactions & offers only a static view
(Gonçalves, M.A. et al. Streams, structures, spaces, scenarios, societies (5S): a formal model for digital libraries. ACM Transactions on Information Systems, 22(2), 2004.)
Existing models & frameworks
In general, existing models
• address structural & syntactic interactions to a degree but do not address semantic interactions
• provide voices, vocabularies & grammar for repositories
• could usefully be extended to profile not only what repositories do but how they might interact with each other
Developing a metadata lifecycle model
A metadata lifecycle model (MLM) must:
• include profiles of each repository’s metadata, ideally at element level, more realistically in terms of structure, semantics & syntax
• distinguish between local requirements & those of the wider community
• enable clusters of similar repositories to be identified & relationships established
• include processes carried out as a result of these relationships, formal or informal
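The profiling requirements above can be sketched as a small data structure. This is an illustration under stated assumptions, not the model proposed in the presentation: the class name, its fields, the use of Jaccard overlap on vocabularies as a stand-in for "similar semantic approach", and the 0.5 threshold are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MetadataProfile:
    """Hypothetical profile of one repository's metadata."""
    repository: str
    structure: str                             # e.g. the schema in use
    syntax: str                                # e.g. the binding
    vocabularies: frozenset                    # rough proxy for semantics
    local_elements: frozenset = frozenset()    # local requirements
    shared_elements: frozenset = frozenset()   # wider community requirements


def semantic_overlap(a, b):
    """Jaccard similarity of the vocabularies two repositories use."""
    union = a.vocabularies | b.vocabularies
    if not union:
        return 0.0
    return len(a.vocabularies & b.vocabularies) / len(union)


def similar_pairs(profiles, threshold=0.5):
    """Pair repositories that share a structure and enough vocabulary."""
    pairs = []
    for i, a in enumerate(profiles):
        for b in profiles[i + 1:]:
            if a.structure == b.structure and semantic_overlap(a, b) >= threshold:
                pairs.append((a.repository, b.repository))
    return pairs


# Illustrative profiles with made-up repository names.
profiles = [
    MetadataProfile("lor-a", "LOM", "XML", frozenset({"MeSH", "LCSH"})),
    MetadataProfile("lor-b", "LOM", "XML", frozenset({"MeSH"})),
    MetadataProfile("ir", "DC", "XML", frozenset({"LCSH"})),
]
clusters = similar_pairs(profiles)
```

Keeping local and shared elements as separate fields mirrors the requirement to distinguish local needs from those of the wider community; a fuller profile would go down to element level where that is realistic.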
Components of the model
Using the MLM to optimise workflow
MLM enables repositories to optimise workflow by:
• exploiting known metadata sources elsewhere in the dIE via intelligent import or harvesting
• exploiting formal metadata relationships between repositories & services via negotiation & establishment of minimum standards
• providing a framework for assessing the cost/benefit of, e.g., implementing particular metadata elements or participating in consortia
Using the MLM: example
The NSDL is a centralised service harvesting metadata from multiple sources:
• breaks harvested metadata into elements & assigns provenance metadata to them
• creates optimum records by combining metadata elements from various sources
• creates metadata profiles of sources to enable these processes to be automated
• demonstrates that metadata workflow optimisation & intelligent harvesting can yield real benefits
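The element-level approach described above can be sketched roughly as follows. This is not the NSDL's implementation: the function names, the dict-based record layout and the per-source trust ranking used to pick the "optimum" value are assumptions made purely for illustration.

```python
def split_record(record, source):
    """Break a harvested record into elements, each tagged with its source.
    Empty values are dropped. The record layout is hypothetical."""
    return [
        {"element": name, "value": value, "source": source}
        for name, value in record.items()
        if value
    ]


def build_optimum_record(elements, source_trust):
    """For each element name, keep the value from the most trusted source.
    source_trust maps source name to a rank (higher is better); this
    ranking is an assumed stand-in for a real source profile."""
    best = {}
    for item in elements:
        current = best.get(item["element"])
        if current is None or (
            source_trust.get(item["source"], 0)
            > source_trust.get(current["source"], 0)
        ):
            best[item["element"]] = item
    return best


# Illustrative use with made-up sources and values.
harvested = (
    split_record({"title": "Cell biology", "subject": ""}, "ir")
    + split_record({"title": "Cell biology: an intro", "subject": "Cytology"}, "subject-repo")
)
optimum = build_optimum_record(harvested, {"ir": 2, "subject-repo": 1})
```

Because each element keeps its provenance, the combined record can still report which source supplied the title and which supplied the subject term.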
Using the MLM: use cases
LOR using LOM wants to harvest metadata records, has crosswalks & mappings for structure & syntax, seeks repositories with similar semantic approach
federated search service wants to dynamically select search targets that can support MeSH
departmental repository enhances its metadata by re-harvesting general subject terms from its IR & specialist subject terms from a subject repository
centralised service augments metadata automatically & original source re-harvests improved record
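Two of these use cases can be sketched in a few lines, assuming repository profiles are available in some registry. The function names, the profile layout (target name mapped to a set of supported vocabularies) and the record fields are hypothetical; matching records on a shared identifier reflects the earlier assumption of persistent object identifiers.

```python
def select_targets(profiles, vocabulary):
    """Return the names of search targets whose profile lists the vocabulary."""
    return sorted(name for name, vocabs in profiles.items() if vocabulary in vocabs)


def enhance_subjects(local, *harvested):
    """Return a copy of the local record with subject terms added from
    harvested records that describe the same object (same identifier)."""
    subjects = list(local.get("subjects", []))
    for record in harvested:
        if record.get("identifier") != local.get("identifier"):
            continue  # a different object, so ignore it
        for term in record.get("subjects", []):
            if term not in subjects:
                subjects.append(term)
    return {**local, "subjects": subjects}


# Illustrative use with made-up targets and records.
targets = select_targets(
    {"med-lor": {"MeSH", "LCSH"}, "arts-lor": {"AAT"}, "bio-ir": {"MeSH"}},
    "MeSH",
)
local = {"identifier": "obj-1", "subjects": ["Biology"]}
from_ir = {"identifier": "obj-1", "subjects": ["Biology", "Science"]}
from_subject_repo = {"identifier": "obj-1", "subjects": ["Cytology"]}
enhanced = enhance_subjects(local, from_ir, from_subject_repo)
```

The local record is not modified in place, which keeps the original available for comparison during QA.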
Moving forward…
In context of rapid repository development with limited resources, must use available resources as effectively as possible
Optimising metadata workflow across the dIE can enable repositories to:
• expand element sets without compromising on quality
• expand functionality
• improve ingest processes
• support more automatic metadata transformation & enhancement
Moving forward…
Development of the MLM to support metadata workflow optimisation requires:
• standard way of profiling repositories at repository, object & metadata level
• integration with registry projects for repositories, standards, application profiles & vocabularies
• at individual repository level, a method for the design of metadata workflows that makes reference to & exploits workflows elsewhere in the dIE
Optimising metadata workflow
Determine required metadata quality
Determine target metadata quality
Design and implement workflow
Refine workflow
Review
Determine purpose of metadata
Local environment
Wider environment
Barton, J. & Robertson, R.J. Designing workflows for quality assured metadata. CETIS Metadata & Digital Repositories SIG Meeting, Edinburgh, 10th March 2005.