1
CS 502: Computing Methods for Digital Libraries
Lecture 27
Preservation
2
Administration
Online survey
http://create.hci.cornell.edu/cssurvey.cfm
Course evaluations
at end of class today
3
Long-term preservation
Objective
Retain digital library materials over centuries
Longer than ...
• computer architectures (Wintel, Linux, 390, ...)
• magnetic storage (disks, tapes, ...)
• formats, protocols, applications (Unicode, Java, XML, ...)
• Internet or the web
for purposes that we have not yet considered
9
Levels of preservation
• Preserve full look and feel of digital material in its context
e.g., a video game with its hardware
• Preserve content with an access system but migrate the look and feel to new environments
e.g., successive versions of MS Windows
• Preserve raw content but no software system
e.g., UTF-8 text with XML/XSL mark-up, but no XML/XSL software
The complexity of preservation varies greatly with the level.
10
Challenges: user needs
Digital information differs from print
May be useless without its environment.
Creator and subscriber may not have copies.
Numerous versions.
Example: A scientific journal on-line
If the author does not subscribe - no access to own article.
If the library does not renew subscription - no access to anything.
11
Challenges: technical problems
Technical issues
Storage media have short life-span.
Formats and specifications change continually.
Computing environments are very complex.
Example: personal files
I have retained all my personal computer files since 1984, but have great difficulty in reading some of them.
12
Challenges: economic and legal
Legal
Archives require permission to save information.
Institutions:
Library of Congress, National Archives, etc. do not provide the same services for electronic information that they provide for physical artifacts.
Example: discontinued serials
What happens if a journal publisher goes bankrupt, or a scientific archive does not get its grant renewed?
13
Technical approaches: 1. Persistent storage
| Material | Approximate life (years) |
|---|---|
| Acid-free paper | 500+ |
| Microfilm | 300 |
| Optical disks | 100? |
| Color film | 25-50 |
| CDs | 20? |
| Magnetic disk and tape | 5 |
• Persistent storage preserves raw content only
• Research in high-volume, long-term digital media is lacking
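The lifetimes in the table translate directly into how many generations of media a collection must pass through. A rough sketch of that arithmetic follows. It assumes that content is recopied exactly once per media lifetime and that the retention target is chosen by the archive. The function names and the 300-year figure in the usage note are illustrative, not part of the lecture.

```python
import math

# Approximate media lifetimes in years, taken from the table above.
# Values marked "?" on the slide are uncertain estimates.
MEDIA_LIFE_YEARS = {
    "Microfilm": 300,
    "Optical disks": 100,
    "CDs": 20,
    "Magnetic disk and tape": 5,
}


def copies_needed(retention_years: int, media_life_years: int) -> int:
    """Number of successive media generations needed to span the retention period."""
    return math.ceil(retention_years / media_life_years)


def refresh_plan(retention_years: int) -> dict:
    """Map each medium to the number of generations needed for the target period."""
    return {
        medium: copies_needed(retention_years, life)
        for medium, life in MEDIA_LIFE_YEARS.items()
    }
```

Under these assumptions, `refresh_plan(300)` gives 1 generation of microfilm but 60 generations of magnetic disk or tape, which is why short-lived media require active management.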
14
Technical approaches: 2. Copying bits (refreshing)
Refreshing bits
Repeatedly copy bits from one storage medium to the next.
• A standard technique in data processing.
• Benefits from the rapid fall in prices of storage devices.
• Preserves raw content only.
Requires active management
Mirrors
Have many copies of the same information with independent management.
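Refreshing and mirroring can be sketched in a few lines. This is a minimal illustration, not a production archive tool: it copies one file to a new location, confirms with a SHA-256 hash that the copy is bit-for-bit identical, and checks whether a set of mirror copies still agree. The function names and the choice of SHA-256 are assumptions made for the example.

```python
import hashlib
import shutil
from pathlib import Path
from typing import Iterable


def sha256_of(path: Path) -> str:
    """Hash a file in chunks so that large files do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def refresh(source: Path, destination: Path) -> str:
    """Copy source to destination (the new medium) and verify the copy by hash."""
    expected = sha256_of(source)
    destination.parent.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(source, destination)
    actual = sha256_of(destination)
    if actual != expected:
        raise IOError(f"refresh of {source} failed verification")
    return actual


def mirrors_agree(copies: Iterable[Path]) -> bool:
    """True if every mirror copy has the same content hash."""
    return len({sha256_of(path) for path in copies}) == 1
```

A disagreement between mirrors shows only that at least one copy has changed. Deciding which copy is correct needs a recorded hash or a majority among independently managed copies.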
15
Technical approaches: 3. Migration of content
Migration
• Retain content but change formats and representations to keep current with technology
• Used by journal publishers
• Preserves content and an access system
Example. Pension funds
The Social Security Administration has records of every FICA payment, which migrate between systems over many years.
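A migration step can be sketched as a small conversion routine. The example below is hypothetical throughout: the fixed-width Latin-1 record layout, the field names, and the JSON target format are invented for illustration and do not describe any real Social Security Administration system. The point is that the content survives while the representation changes.

```python
import json

# Hypothetical legacy layout: fixed-width fields in a Latin-1 encoded record.
# Each entry is (field name, start offset, end offset).
LEGACY_FIELDS = [
    ("account", 0, 9),
    ("year", 9, 13),
    ("amount_cents", 13, 22),
]


def parse_legacy(record: bytes) -> dict:
    """Split a fixed-width legacy record into named text fields."""
    text = record.decode("latin-1")
    return {name: text[start:end].strip() for name, start, end in LEGACY_FIELDS}


def migrate(record: bytes) -> str:
    """Re-express a legacy record as JSON with typed values, keeping the content."""
    fields = parse_legacy(record)
    migrated = {
        "account": fields["account"],  # kept as text to preserve leading zeros
        "year": int(fields["year"]),
        "amount_cents": int(fields["amount_cents"]),
    }
    return json.dumps(migrated, sort_keys=True)
```

Each migration of this kind depends on knowing the old layout, which is one reason to keep format specifications alongside the data.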
16
Technical approaches: 4. Emulation
Concept
• Record a full specification of the computing environment in which the digital information was created
• At some time in the future, emulate the original computing environment
• Would preserve full look and feel
Clearly not practical for complex computing systems
• Emulation is never perfect
• Computing environments are remarkably complex
But may be useful for parts of systems
e.g., Java virtual machine
17
Technical approaches: 5. Digital archeology
After periods of neglect, archeologists are needed
• Recover data from old media
• Reverse engineer lost formats and specifications
• Experts in digital paleography (reading archaic scripts and formats)
Example. East Germany
German archivists are reconstructing the records of the East German state from worn-out tapes, broken computer systems, undocumented databases, and the recollections of staff.
18
Preservation at publication
This is a period of experimentation and change in formats, protocols, object models, etc.
Some information is easier to preserve than others.
Longevity is more likely if:
Formats are widely used, in important applications.
Methods are simple, without using obscure options.
Coding schemes are easy to interpret.
Example. Internet RFC Series
The Internet RFC Series uses plain ASCII text. The RFCs go back to 1969 and have no preservation problems. A few RFCs are in PostScript and are already hard to decipher.
19
Metadata
Digital information needs interpretation
• Self-documentation is always good
• Persistent identification is vital
• Simple, standard metadata has a chance of long life
• Authentication of material need not be complex (e.g., hash)
• History of changes (e.g., migration to different format)
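The points above can be combined in one small record. This is a sketch under stated assumptions: the `PreservationRecord` class, its field names, and the identifier scheme in the usage note are hypothetical, and SHA-256 stands in for whichever hash an archive actually adopts. The record carries a persistent identifier, a format description, a hash for authentication, and a history of migrations.

```python
import hashlib
from dataclasses import dataclass, field
from typing import List


@dataclass
class PreservationRecord:
    identifier: str  # persistent identifier; never changes across migrations
    format: str      # self-documentation: how to interpret the bits
    sha256: str      # hash of the current content, used for authentication
    history: List[str] = field(default_factory=list)  # log of changes


def describe(identifier: str, fmt: str, content: bytes) -> PreservationRecord:
    """Create the metadata record for newly deposited content."""
    return PreservationRecord(identifier, fmt, hashlib.sha256(content).hexdigest())


def is_authentic(record: PreservationRecord, content: bytes) -> bool:
    """Check that the content still matches the hash stored in the record."""
    return hashlib.sha256(content).hexdigest() == record.sha256


def record_migration(record: PreservationRecord, new_format: str,
                     new_content: bytes, note: str) -> PreservationRecord:
    """Return an updated record that logs the format change and the old hash."""
    entry = (f"{record.format} -> {new_format}: {note} "
             f"(previous sha256 {record.sha256})")
    return PreservationRecord(
        record.identifier,
        new_format,
        hashlib.sha256(new_content).hexdigest(),
        record.history + [entry],
    )
```

For example, a text deposited as Latin-1 under the made-up identifier `example:0001` keeps that identifier after it is re-encoded as UTF-8, while the history notes the change and the hash of the earlier version.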
20
Preservation of specifications
Digital information needs a context
Therefore store the specifications of:
• Formats
• Database designs
• Technical documentation
• User manuals
...on high-quality archival materials, e.g., paper.
21
Final word
Long-term preservation needs people
and organizations who want it!