Digital preservation
![Page 1: Digital presevation](https://reader033.vdocument.in/reader033/viewer/2022060117/5586053bd8b42a90638b481e/html5/thumbnails/1.jpg)
Digital preservation for ongoing access
Presentation for Council, July 2008
David Pearson
Manager, Digital Preservation Section
![Page 2: Digital presevation](https://reader033.vdocument.in/reader033/viewer/2022060117/5586053bd8b42a90638b481e/html5/thumbnails/2.jpg)
Overview
1. We have lots of “digital stuff” in our collections and it is growing
2. We will lose access to it unless we take action
3. We need to manage the process of keeping it accessible and usable
4. Solutions have to be scalable, reliable and automated
![Page 3: Digital presevation](https://reader033.vdocument.in/reader033/viewer/2022060117/5586053bd8b42a90638b481e/html5/thumbnails/3.jpg)
1. “Digital stuff” - many collections
Oral History
Pictures
Historical Newspapers
Maps
Manuscripts
Books
Web sites
Ephemera
Sheet music
Serials
![Page 4: Digital presevation](https://reader033.vdocument.in/reader033/viewer/2022060117/5586053bd8b42a90638b481e/html5/thumbnails/4.jpg)
How does it grow?
1. We collect it
– Physical carriers
– Online
  • PANDORA web archive
  • Australian web domain harvests
2. We create it
– Oral history interviews
– Photographs
– Publications
3. We convert it
– Digitise our collections
![Page 5: Digital presevation](https://reader033.vdocument.in/reader033/viewer/2022060117/5586053bd8b42a90638b481e/html5/thumbnails/5.jpg)
Web Archives
• Web sites are collected selectively
– Individually for access via PANDORA, or
– On a large scale via annual domain snapshots
• No control over content creation
• Lots of
– File formats
– Individual files (PANDORA ≈ 51 million, domain harvest ≈ 1.3 billion files)
– Links
– Software (browser, plug-ins, readers)
• Internet content changes over time
![Page 6: Digital presevation](https://reader033.vdocument.in/reader033/viewer/2022060117/5586053bd8b42a90638b481e/html5/thumbnails/6.jpg)
![Page 7: Digital presevation](https://reader033.vdocument.in/reader033/viewer/2022060117/5586053bd8b42a90638b481e/html5/thumbnails/7.jpg)
![Page 8: Digital presevation](https://reader033.vdocument.in/reader033/viewer/2022060117/5586053bd8b42a90638b481e/html5/thumbnails/8.jpg)
Digitisation
• Around 135,000 items digitised
• Newspaper project = 4 million pages by 2010
• Internally created so we can control
– Standards
– File formats (e.g. TIFF, JPEG, PDF)
– Metadata
– Workflows
• Issues
– Growing volume
![Page 9: Digital presevation](https://reader033.vdocument.in/reader033/viewer/2022060117/5586053bd8b42a90638b481e/html5/thumbnails/9.jpg)
Physical carriers
• Approx. 12,000 items – grows by 1,000 a year
Issues
• No control over creation
• Time lag before acquisition
• Variety of carriers (fragile) and file formats
• Require various hardware, software, operating systems, drivers to access
• Labour intensive to process and transfer to safe storage (growing backlog)
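One way to get a first picture of the file formats arriving on carriers is to look at the leading bytes of each file. The sketch below is illustrative only: it knows just the three formats named on the digitisation slide (TIFF, JPEG, PDF), and the names `SIGNATURES`, `identify_format` and `survey_formats` are hypothetical. A real workflow would rely on a full format signature registry.

```python
from collections import Counter
from pathlib import Path

# Leading "magic" bytes for the three formats named on the digitisation slide.
# This short list is an assumption made for illustration, not a full registry.
SIGNATURES = [
    (b"II*\x00", "TIFF"),       # little-endian TIFF
    (b"MM\x00*", "TIFF"),       # big-endian TIFF
    (b"\xff\xd8\xff", "JPEG"),
    (b"%PDF-", "PDF"),
]


def identify_format(header: bytes) -> str:
    """Return a format name based on the first bytes of a file."""
    for magic, name in SIGNATURES:
        if header.startswith(magic):
            return name
    return "unknown"


def survey_formats(root: Path) -> Counter:
    """Count identified formats for every regular file under root."""
    counts = Counter()
    for path in root.rglob("*"):
        if path.is_file():
            with path.open("rb") as handle:
                counts[identify_format(handle.read(8))] += 1
    return counts
```

Files reported as "unknown" are the ones most likely to need specialist hardware, software or manual investigation.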
![Page 10: Digital presevation](https://reader033.vdocument.in/reader033/viewer/2022060117/5586053bd8b42a90638b481e/html5/thumbnails/10.jpg)
Growth: digital collection storage
[Chart: storage size (terabytes), axis 0 to 350, Jan-03 to Jul-08; legend entries: Australian Web Harvests, Newspapers]
![Page 11: Digital presevation](https://reader033.vdocument.in/reader033/viewer/2022060117/5586053bd8b42a90638b481e/html5/thumbnails/11.jpg)
Type of Digital Collections 2008
• PANDORA: 3%
• Maps: 2%
• Sheet Music: 4%
• Manuscripts: 2%
• Pictures: 7%
• Oral History: 18%
• Other: 3%
• Historical Newspapers: 21%
• Australian Web Harvest: 40%
![Page 12: Digital presevation](https://reader033.vdocument.in/reader033/viewer/2022060117/5586053bd8b42a90638b481e/html5/thumbnails/12.jpg)
Growth: compared to books
[Chart: comparison of books collection & digital collection "book equivalents"; vertical axis "book equivalents" (millions), 0 to 6; horizontal axis year end June, 2005 to 2008; legend entries: Digital Collection (20 MB "book equivalents"), Books Collection]
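The chart legend treats one "book equivalent" as 20 MB of digital storage. A minimal sketch of that conversion follows, assuming decimal units (1 MB = 10^6 bytes, 1 TB = 10^12 bytes); the slide does not say which convention was used, and the function name and the 100 TB input are illustrative only.

```python
BYTES_PER_MB = 10**6      # assumption: decimal megabytes
BYTES_PER_TB = 10**12     # assumption: decimal terabytes
BOOK_EQUIVALENT_MB = 20   # from the chart legend: one "book equivalent" is 20 MB


def book_equivalents(storage_tb: int) -> int:
    """Number of whole 20 MB "book equivalents" in storage_tb terabytes."""
    return (storage_tb * BYTES_PER_TB) // (BOOK_EQUIVALENT_MB * BYTES_PER_MB)


# Illustrative input only, not a figure taken from the presentation.
print(book_equivalents(100))  # 5000000
```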
![Page 13: Digital presevation](https://reader033.vdocument.in/reader033/viewer/2022060117/5586053bd8b42a90638b481e/html5/thumbnails/13.jpg)
2. Act or risk losing it
• “Digital stuff” is dependent on technology at all stages
– Creation/capture
– Storage
– Access
• Technology changes rapidly, so software, hardware, media, file formats and operating systems become obsolete
• Unless it is managed, deterioration can occur rapidly, e.g. data can be corrupted or lost in storage or during transfer
![Page 14: Digital presevation](https://reader033.vdocument.in/reader033/viewer/2022060117/5586053bd8b42a90638b481e/html5/thumbnails/14.jpg)
Computer Museum
![Page 15: Digital presevation](https://reader033.vdocument.in/reader033/viewer/2022060117/5586053bd8b42a90638b481e/html5/thumbnails/15.jpg)
3. Managing to keep it
• “Not managing it” is not an option
• We need to
– Understand our “digital stuff” & associated risks
– Provide safe storage & ensure integrity
– Ensure access over time as technology changes
– Develop & implement preservation workflows, skills, standards, & strategies for ongoing access
– Enable content to be shared and used in different ways in the future
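"Ensure integrity" is commonly approached by recording a checksum for each file and re-checking it later. The sketch below shows one possible shape for such a check, assuming SHA-256 and an in-memory manifest; the function names are hypothetical and this is not a description of the Library's actual system.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large files do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Record a checksum for every regular file under root, keyed by relative path."""
    return {
        path.relative_to(root).as_posix(): sha256_of(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def audit(root: Path, manifest: dict[str, str]) -> dict[str, list[str]]:
    """Compare the files now under root against a previously stored manifest."""
    current = build_manifest(root)
    shared = manifest.keys() & current.keys()
    return {
        "missing": sorted(manifest.keys() - current.keys()),
        "changed": sorted(name for name in shared if manifest[name] != current[name]),
        "unexpected": sorted(current.keys() - manifest.keys()),
    }
```

Running `audit` on a schedule, and after every transfer between storage systems, would surface corruption or loss while a good copy may still exist elsewhere.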
![Page 16: Digital presevation](https://reader033.vdocument.in/reader033/viewer/2022060117/5586053bd8b42a90638b481e/html5/thumbnails/16.jpg)
4. Solutions and implications
• Large scale automated processes
• Original research & time to deliver the solutions
• Reasonably long lead times
• Audit processes and quality control monitoring are critical
• Significant resources are required
![Page 17: Digital presevation](https://reader033.vdocument.in/reader033/viewer/2022060117/5586053bd8b42a90638b481e/html5/thumbnails/17.jpg)
Conclusions
• We are responsible for a lot of “digital stuff”
• If we simply collect and store it, it will become unusable in a relatively short time as technologies change
• Maintaining the ability to access it requires a lot of good management, planning, & dedicated resources
• We have to find and use solutions that can be applied automatically and reliably to billions of digital files