

    Library Review, Vol. 54 No. 9, 2005, pp. 508-513
    © Emerald Group Publishing Limited
    ISSN 0024-2535
    DOI 10.1108/00242530510629515

    Received 25 May 2005. Reviewed 31 May 2005. Revised 1 June 2005. Accepted 2 June 2005.

    DIGITAL DIRECTIONS

    Strategies for managing digital content formats

    Andrew Williamson
    Researcher, Centre for Digital Library Research, University of Strathclyde, Glasgow, UK

    Abstract

    Purpose: With heavy ongoing investment in the creation, storage and delivery of electronic content, it is important to consider the long-term preservation of the resources produced.

    Design/methodology/approach: A viewpoint paper based on extensive practitioner experience with the management of digitisation, digital preservation, and quality assurance procedures.

    Findings: The choice of file and media formats for the content can have a significant effect on long-term access to electronic content.

    Practical implications: Gives some useful insights into some of the issues surrounding the choice of open or proprietary formats. The paper also examines some of the pitfalls of a proprietary approach and suggests some strategies that might be employed for managing digital content formats in the long term.

    Originality/value: An attempt to provide clear, experience-based strategies on how best to engage in the long-term management of digital content formats.

    Keywords Digital storage, Information systems, Digital libraries

    Paper type Viewpoint

    Introduction: a standards-based approach to digital preservation

    Digital content has become an increasingly important element in many library collections over recent years. At institutional, regional, national and international levels, large sums of money are being invested in the creation of such content, and in the means of storage and delivery to users. Google made headlines in 2004, pledging to spend between $150 million and $200 million over a decade on digitising some 15 million books from library collections in the USA and the UK (Riding, 2005). Also in the UK, the NOF-Digi programme has invested £50 million across 150 projects to produce and publish online material that supports lifelong learning for all (Nicholson and Macgregor, 2003).

    Consideration must be given at an early stage to ensuring the longevity of digital resources, in order to protect and maximise the return on the investment in content creation. One of the key components in ensuring resource longevity is the choice of file and media formats used to create, store, and deliver digital content, and the strategies that are employed to manage these in the long term.

    Guidance from funding bodies and advisory services now generally recommends, and in some cases mandates, a standards-based approach to the entire process, arguing that electronic content should be created, stored, maintained and disseminated using open standards whenever possible. An example of such guidance can be found in UKOLN (2003).

    The UK Joint Information Systems Committee (JISC) Quality Assurance Focus (QA Focus, 2003) identified the following as the characteristics of open standards:

    • They are the product of an open standards-making process;
    • Documentation of the standard is freely available;
    • The standard can be used unrestricted by patent or licence issues;
    • The standard is ratified by a recognised standards body, such as NISO.

    The Emerald Research Register for this journal is available at www.emeraldinsight.com/researchregister

    The current issue and full text archive of this journal is available at www.emeraldinsight.com/0024-2535.htm

    An open standards approach brings a wide range of benefits including:

    • Resources are freed from dependence on a single application, or particular hardware platforms;
    • Resources can be preserved and accessed over the long term.

    One open standard that is becoming ever more important and widespread is Extensible Markup Language (XML). Yeates (2002, p. 72) asks, "Why are so many librarians, archivists and museum curators talking about XML?" and then answers the question by illustrating the potential (and problems) of exporting data from legacy systems to XML in order to promote interoperability, resource discovery and, by implication, non-proprietary digital preservation.

    While preference should always be given to an open standards approach, it is important to realise that situations will arise where an open approach is not possible and proprietary formats will be chosen instead. These formats are owned by an organisation or group (e.g. Microsoft), may sometimes be accepted as de facto standards through sheer ubiquity, and might even be referred to as standards, but cannot be regarded as open since the owner could theoretically choose to change the format or the conditions of usage at any time.

    The main focus of this brief paper is on the proprietary approach. It considers some of the reasons why organisations may choose a proprietary format, the problems this might cause in the future, and some of the strategies which may be employed to manage digital content formats, both open and proprietary, in the long term.

    Why might organisations choose proprietary formats?

    Organisations or individuals may choose to utilise proprietary rather than open formats for a number of reasons:

    • Delayed development of open formats. For certain content types there may be no suitable open format available at the time that the content is being created;
    • Organisational expertise. Proprietary software and formats (e.g. Microsoft Office) may already be widely deployed within an organisation, with staff being trained and comfortable in their use;
    • Resourcing. There may be a reluctance to move to an open standards approach due to the additional training and software costs required, particularly when ubiquitous proprietary solutions are already easily available.

    What problems can the choice of proprietary formats cause?

    The choice of proprietary media and/or storage formats can lead to digital preservation problems in the future, arising from both the choice of digital media and the file formats encoded on that media.

    Media format issues

    When a physical media format is chosen for the storage of electronic content, consideration must be given to the possibility of that format becoming obsolete over time. This can particularly be a problem with new storage technologies, where a number of similar formats may be competing or coexisting in the marketplace, e.g. the competition between VHS and Betamax format video recorders, or the current market for recordable DVD technology, which sees several competing standards vying for dominance (D'Ambrise, 2004). There is always the possibility that one format will eventually dominate, whether through technological superiority or the power of marketing, thus marginalising competitors and, ultimately over time, rendering any opposing formats obsolete.

    Darlington et al. (2003) outline a famous example of media obsolescence: that of the BBC Domesday Project, a collection of digital content created in 1986 to mark the 900th anniversary of the original Domesday Book. The content was stored using a proprietary laser disk format, the media and players for which were no longer available, thus rendering the output of the innovative project virtually inaccessible. Darlington et al. outline the painstaking work undertaken in 2002 and 2003 to preserve the content, noting that the work had taken place just in time, while some original systems and hardware were still available and workable.

    It is clear that physical storage media (CDs, tapes, etc.), the associated storage hardware, and the necessary software for reading/writing the media must be considered and maintained together, as each becomes effectively useless without the others. If hardware develops faults over time, it may become impossible to retrieve the content from the media, and the faulty hardware may also damage the media, compounding the problems. Equally, pristine hardware cannot protect against data loss due to compromised media. As some degree of physical degradation is inevitable over time, the strategies outlined later should be employed to mitigate loss.

    File format issues

    The choice of proprietary file formats adds further complexity to ensuring long-term access to electronic content. Proprietary software applications are regularly updated with new versions. While functionality may not change markedly from one version to its immediate successor, cumulative changes to a file format may become more significant in the longer term, potentially jeopardising backwards compatibility.

    Maintaining copies of legacy software may seem desirable, but can be fraught with problems. Just like application software, operating systems are also periodically upgraded and may, in the long term, simply cease to support legacy packages as underlying system architectures develop. For example, the release of Service Pack 2 for Windows XP in 2004 witnessed reports of functionality problems with over 200 applications (Leyden, 2004). Maintaining older operating systems may not be an attractive solution, particularly in an online networked environment where there exists an increased risk of new security problems emerging in unsupported legacy systems.

    Strategies for managing digital formats

    As outlined above, the choice of media and file formats for the storage of electronic content could cause serious problems for the long-term accessibility of the materials, particularly where a proprietary format has been used. Whatever the choice of approach, strategies must be put in place to manage digital formats over the long term, in order to mitigate (or avoid altogether) the problems outlined earlier. It is not within the remit of this short paper to explore the intricacies of each strategy; references and further reading lists are provided for this purpose. Rather, the strategies outlined below serve to raise awareness among readers as to the options available to those practitioners engaging in the long-term management of digital resources.

    These strategies can be grouped under six headings. Most of the strategic elements within each are interlinked, and few will be successful in the long term if pursued in isolation. It is also worth noting that while some of these elements are more applicable to the proprietary approach, they are generally valid across all electronic content, regardless of format. Each of these strategic components might also be problematic within organisations, particularly those in project-funded environments, where staffing and other technical resources may not be readily available beyond the funded lifespan of a project.

    Strategy 1: documentation

    It is somewhat ironic that the preservation of digital resources begins with ensuring the preservation of staff knowledge and sound knowledge management practices. Quality documentation is a key component of any preservation strategy, and it is important that information about the technical decisions taken at each stage of the creation, storage and maintenance process is available in the long term, possibly after those staff who had direct knowledge and experience of the process have moved on.

    Strategy 2: migration

    Migration involves ensuring that all electronic content is held in a format which is useable and accessible by current software and hardware, keeping content up to date with the latest developments and guarding against format obsolescence. Where content is stored using a proprietary format, it is particularly desirable to migrate to a suitable open standard format, as and when one becomes available.

    Migration is potentially time-consuming, complex and expensive, and could represent a significant drain on organisational resources in the long term, particularly as the need to migrate may depend on the progress of a volatile technology industry. Moreover, migration can potentially impair functionality inherent in the original. Such costs must be balanced against the initial investment in content creation and the value of long-term access to the content.
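    A migration plan usually starts from knowing which formats a collection actually holds. The following Python sketch is one illustrative way to build such an inventory. It assumes that file extensions are an adequate proxy for format, which will not always hold, and the names format_inventory, migration_candidates and PROPRIETARY_EXTENSIONS are hypothetical, not drawn from any of the guidance cited in this paper.

```python
from collections import Counter
from pathlib import Path

# Illustrative assumption only: a real inventory would rely on a format
# registry or signature-based identification, not a hand-written list.
PROPRIETARY_EXTENSIONS = {".doc", ".xls", ".ppt", ".psd"}


def format_inventory(root):
    """Count the files under root by lower-cased file extension."""
    counts = Counter()
    for path in Path(root).rglob("*"):
        if path.is_file():
            counts[path.suffix.lower() or "(none)"] += 1
    return counts


def migration_candidates(counts):
    """Return the part of the inventory held in flagged extensions."""
    return {ext: n for ext, n in counts.items() if ext in PROPRIETARY_EXTENSIONS}
```

    Run periodically, an inventory of this kind could show how much content remains in formats flagged for migration, which may help in weighing migration costs against the value of the content.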

    Strategy 3: refreshment

    Refreshment is the periodic transfer of electronic content to newer storage media (e.g. CD/DVD/DAT tape). This helps to guard against data loss due to media degradation. The timing of refreshment cycles should be informed by manufacturers' information on, and practitioners' experience of, the typical lifespan of their physical media. It is advisable to check a random sample of used storage media on a regular basis (at least annually) to ensure that the physical media remain accessible and the contents remain intact. If problems emerge within the sample, then urgent refreshment action should be taken. A prudent strategy would be to ensure that content is on at least two types of digital media and in different physical locations.
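    The sampling check described above could be partly automated for media that can be mounted as a file system. The following Python sketch is illustrative only. It assumes that a manifest of SHA-256 checksums was recorded when the content was first written, and the function names (sha256_of, build_manifest, check_sample) are hypothetical rather than part of any cited procedure.

```python
import hashlib
import random
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file path (relative to root) to its checksum."""
    root = Path(root)
    return {
        str(p.relative_to(root)): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def check_sample(root, manifest, sample_size, seed=None):
    """Re-hash a random sample of manifest entries.

    Returns the names of sampled files that are missing or whose
    contents no longer match the recorded checksum.
    """
    rng = random.Random(seed)
    names = rng.sample(sorted(manifest), min(sample_size, len(manifest)))
    failures = []
    for name in names:
        path = Path(root) / name
        if not path.is_file() or sha256_of(path) != manifest[name]:
            failures.append(name)
    return failures
```

    A non-empty result from check_sample would correspond to the case in which problems emerge within the sample and urgent refreshment action is warranted.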

    Strategy 4: emulation

    In the event of system or media obsolescence, organisations may choose to create or use emulation software to mimic the behaviour of obsolete hardware and operating systems, and enable use of legacy software. The emergence of a significant market in legacy emulators would seem a real possibility as and when access problems begin to be widespread.


    However, it should be recognised that although emulation avoids the repeated costs associated with migration, the widespread deployment of specialist emulation software in libraries remains tentative, with further research and wider practical experimentation sorely required. New emulators can be costly unless there is scope to reap economies of scale, and such software often has to be created in parallel with significant computer paradigm shifts. Indeed, as Jones and Beagrie (2002) note, the resulting costs can quite feasibly surpass those incurred by a repeated migration strategy.

    Strategy 5: controlled storage

    To guard against the degradation of storage media and access devices, these should be stored and operated in suitable environmental conditions, ideally within the environmental tolerances specified by manufacturers. Storage media should be handled as infrequently as possible, with minimal movement that involves exposing the media to significantly different environmental conditions. Backup media should ideally be stored offsite, as a precaution against disasters that may damage onsite resources.

    Strategy 6: backup/recovery procedures

    Digital content is inherently vulnerable to loss or damage from hardware or software faults. Resources must therefore be allocated to the backup and recovery requirements of an organisation. Initial backups should be created at the time a resource is created, with a regular routine implemented so that further backups are created during the lifetime of the resource. The recovery phase must also be considered. Procedures for data recovery should be tested periodically to ensure that data can be restored from backup media, and that the media remain compatible with changes in backup technology.
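    A periodic recovery test of the kind recommended above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the backup is taken to be a plain directory copy, shutil.copytree stands in for whatever restore procedure an organisation actually uses, and the function names are hypothetical.

```python
import filecmp
import shutil
import tempfile
from pathlib import Path


def relative_files(root):
    """List every file under root as a sorted list of relative paths."""
    root = Path(root)
    return sorted(str(p.relative_to(root)) for p in root.rglob("*") if p.is_file())


def restore_matches_source(source, backup):
    """Restore the backup into a scratch directory and compare it
    byte for byte with the source. Returns True only if both hold
    the same set of files with identical contents."""
    with tempfile.TemporaryDirectory() as scratch:
        restored = Path(scratch) / "restored"
        # Assumption: copying the backup directory stands in for a real restore.
        shutil.copytree(backup, restored)
        names = relative_files(source)
        if names != relative_files(restored):
            return False
        # shallow=False forces a comparison of file contents, not just metadata.
        _, mismatch, errors = filecmp.cmpfiles(source, restored, names, shallow=False)
        return not mismatch and not errors
```

    Restoring to a scratch location rather than over the live copy means that a failed test cannot itself damage the resource being protected.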

    Concluding thoughts

    Whether as a result of organisational expertise issues, resource issues, or because a suitable open standard has yet to be developed, organisations will often be compelled to use, and will occasionally choose, a proprietary format for their digital resources. This brief paper has outlined the rationale behind such behaviour and has aimed to highlight some of the problems proprietary formats can cause for digital resource management, and suitable strategies for managing digital formats, both open and proprietary, in the long term. The huge sums being invested in the creation of electronic content have the potential to create a golden digital heritage for future generations. For this potential to be realised, however, attention must be given at all stages of the content creation, storage and delivery process to the digital content formats being employed, and steps must be taken to actively manage content formats over time, to guard against the dangers of creeping technical obsolescence or long-term degradation of resources.

    References

    D'Ambrise, R. (2004), "DVD update: from double layers to blue lasers", Computer Technology Review, Vol. 24 No. 5, pp. 30-2.

    Darlington, J., Finney, A. and Pearce, A. (2003), "Domesday Redux: the rescue of the BBC Domesday Project videodisc", Ariadne, No. 36, available at: www.ariadne.ac.uk/issue36/tna/ (accessed 2 June 2005).

    Jones, M. and Beagrie, N. (2002), Preservation Management of Digital Materials: A Handbook, British Library, London, available at: www.dpconline.org/graphics/handbook/ (accessed 2 June 2005).

    Leyden, J. (2004), "200 apps clash with XP SP2", The Register, 17 August 2004, available at: www.theregister.co.uk/2004/08/17/xp_sp2_glitches/ (accessed 2 June 2005).

    Nicholson, D. and Macgregor, G. (2003), "NOF-Digi: putting UK culture online", OCLC Systems and Services, Vol. 19 No. 3, pp. 96-9.

    QA Focus (2003), What are Open Standards?, UKOLN, University of Bath, available at: www.ukoln.ac.uk/qa-focus/documents/briefings/briefing-11/html/ (accessed 2 June 2005).

    Riding, A. (2005), "France detects a cultural threat in Google", The New York Times, 11 April.

    UKOLN (2003), Technical Guidelines for Digital Content Creation Programmes, Working Draft Version 0.05, UKOLN, University of Bath, available at: www.minervaeurope.org/structure/workinggroups/servprov/documents/techguid005draft.pdf (accessed 2 June 2005).

    Yeates, R. (2002), "An XML infrastructure for archives, libraries and museums: resource discovery in the COVAX project", Program: Electronic Library and Information Systems, Vol. 36 No. 2, pp. 72-88.

    Further reading

    Lin, L.S., Ramaiah, C.K. and Wal, P.K. (2003), "Problems in the preservation of electronic records", Library Review, Vol. 52 No. 3, pp. 117-25.

    New Opportunities Fund (2004), NOF-Digitise Programme Manual: Digital Preservation, NOF-Digitise Technical Advisory Service, University of Bath, available at: www.ukoln.ac.uk/nof/support/manual/digital-preservation/ (accessed 2 June 2005).

    Semple, N. (2004), "Developing a digital preservation strategy at Edinburgh University Library", VINE: The Journal of Information and Knowledge Management Systems, Vol. 34 No. 1, pp. 33-7.