![Page 1: Preserving webharvests at the National Library of New Zealand Te Puna Mātauranga o Aotearoa Peter McKinney Digital Preservation Policy Analyst National](https://reader035.vdocument.in/reader035/viewer/2022081519/56649db45503460f94aa498d/html5/thumbnails/1.jpg)
Preserving webharvests at the National Library of New Zealand Te Puna Mātauranga o Aotearoa
Peter McKinney, Digital Preservation Policy Analyst
National Library of New Zealand Te Puna Mātauranga o Aotearoa
Agenda
• Relationship with National Library of China
• National Library mandate
• Collection analysis
• Preservation of those collections
• Future plans
Agreement on Cooperation: NLC and NLNZ
- Drawn up in the spirit of friendship
- Principles of equity and reciprocity
- Purpose is to enhance the ability to contribute to improvement of cultural and economic life of their respective nations
- Cooperation on digital library matters
- Share knowledge and information in relation to Digital Preservation
Mandate
“The Minister may, by notice in the Gazette, authorise the National Librarian to make a copy, at any time or times and at his or her discretion, of public documents that are internet documents in accordance with any terms and conditions as to format, public access, or other matters that are specified in the notice.”
National Library Act 2003
Te Wehenga, by Cliff Whiting, 1974. CAC-readingroomdetail_00002493
Two web archiving programmes
http://topics.breitbart.com/fishing+pole/ http://www.trimarinegroup.com/operations/fleet.php
Domain Harvesting of the entire “New Zealand Internet” (2008, 2010, 2013)
Selective harvesting of specific websites or parts of websites (since 1999)
Collections
Selective Harvests (since 1999)

| Measure | Value |
| --- | --- |
| No. of websites | 14,943 |
| No. of ARC files | 71,374 |
| Size on disk (TB) | 5.63 |
| Total number of files in harvest IE (excluding those within the ARCs) | c. 291,000 |
| No. of files contained within ARCs | 87,112,551 |
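
As a rough sense of scale, the figures above work out to about 1,220 files per ARC and a little under 80 MB per ARC. The sketch below only repeats that arithmetic; it assumes the size on disk is in decimal terabytes (10**12 bytes) and ignores the c. 291,000 files held outside the ARCs, so the averages are indicative only.

```python
# Figures copied from the Selective Harvests collection statistics.
ARC_FILES = 71_374
FILES_WITHIN_ARCS = 87_112_551
SIZE_ON_DISK_TB = 5.63

# Assumption: the size is in decimal terabytes (10**12 bytes).
BYTES_PER_TB = 10**12

# Average number of harvested files packed into one ARC container.
files_per_arc = FILES_WITHIN_ARCS / ARC_FILES

# Average size of one ARC container, in megabytes (10**6 bytes).
mb_per_arc = SIZE_ON_DISK_TB * BYTES_PER_TB / ARC_FILES / 10**6

print(f"files per ARC: {files_per_arc:,.1f}")
print(f"MB per ARC:    {mb_per_arc:,.1f}")
```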
Collections
Whole of Domain Harvests
[Chart: data size and no. of files (within WARCs) for the 2008, 2010 and 2013 whole of domain harvests. Axes: terabytes (0 to 12) and no. of files (0 to 250,000,000).]
Digital preservation
Thoughts for today
1. How much information?
2. Whole of domain into preservation system
3. ARC to WARC migration
Digital preservation
Simply:
1. What are the technical characteristics of the content?
- What format is it in?
- What are the properties the file exhibits?
- Is it a valid expression of the specification?
2. What is the institution’s capability in terms of rendering the content?
Digital preservation
We use that information for:
• Repository analysis
• Risk assessment
• Preservation planning
• Preservation action
• Evaluation
Rice, Anne Estelle, 1879-1959. Artist unknown :[Embroidered Chinese silk shawl belonging to Katherine Mansfield. Made ca 1900. Detail of bird in flight]. Artist unknown :[Embroidered Chinese silk shawl belonging to Katherine Mansfield] [made ca 1900]. Ref: D-014-007-detail-2. Alexander Turnbull Library, Wellington, New Zealand. http://natlib.govt.nz/records/22688625
Validation stack - current
Validation stack - current
Thoughts
1. Mimetype is insufficient
2. Not enough information about the website files to support preservation management
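
One way to see why a mimetype alone is insufficient is to compare the declared type with what the first bytes of the payload suggest. The sketch below is illustrative only and is not the Library's validation stack: the signature table is a deliberately tiny, hypothetical stand-in for a real format registry, and the function names are invented for this example.

```python
from typing import Optional

# Hypothetical, deliberately small signature table (assumption):
# a real tool would consult a full format registry instead.
SIGNATURES = [
    (b"%PDF-", "application/pdf"),
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
    (b"\xff\xd8\xff", "image/jpeg"),
]


def sniff_format(payload: bytes) -> Optional[str]:
    """Return a MIME type guessed from leading bytes, or None if unknown."""
    for magic, mime in SIGNATURES:
        if payload.startswith(magic):
            return mime
    return None


def declared_matches_content(declared: str, payload: bytes) -> Optional[bool]:
    """Compare the declared type with the sniffed one.

    Returns None when the payload is not recognised, so "unknown"
    is kept distinct from "mismatch".
    """
    sniffed = sniff_format(payload)
    if sniffed is None:
        return None
    # Drop parameters such as "; charset=utf-8" before comparing.
    return declared.split(";")[0].strip().lower() == sniffed


# A PDF served with an HTML mimetype is reported as a mismatch.
print(declared_matches_content("text/html", b"%PDF-1.4\n"))
```

Even when the declared and sniffed types agree, this says nothing about the properties of the file or its validity against the specification, which is the second point above.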
Extracted information - mimetypes
Total number of mime types across selective harvests = 543
[Chart: Top 20 Mime Types - log scale. Axis: Qty (100,000 to 100,000,000). Types recoverable from the axis labels: text/html, image/jpeg, image/gif, image/png, application/pdf, text/plain, text/dns, text/css, application/x-javascript, text/xml, no-type, application/x-shockwave-flash, application/msword, text/javascript, application/javascript, application/atom+xml, application/xml, application/rss+xml, application/octet-stream.]
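
A tally like the one charted above can be sketched as follows. This is a minimal illustration, not the tool used for the analysis: it assumes the declared mimetypes have already been extracted from the ARC files into a plain list, and it treats a missing or empty value as "no-type", which is only a guess at what that bucket means in the chart.

```python
from collections import Counter
from typing import Iterable, List, Optional, Tuple


def normalise_mime(raw: Optional[str]) -> str:
    """Lower-case the type and drop parameters; empty values become 'no-type'."""
    if raw is None or not raw.strip():
        return "no-type"  # assumption: label for a missing declared type
    return raw.split(";")[0].strip().lower()


def top_mime_types(
    declared: Iterable[Optional[str]], n: int = 20
) -> List[Tuple[str, int]]:
    """Return the n most frequent normalised mimetypes with their counts."""
    counts = Counter(normalise_mime(value) for value in declared)
    return counts.most_common(n)


# Invented sample values, for illustration only.
sample = [
    "text/html; charset=utf-8",
    "TEXT/HTML",
    "image/jpeg",
    None,
    "application/x-javascript",
    "text/javascript",
]
for mime, qty in top_mime_types(sample, n=3):
    print(mime, qty)
```

Normalising case and parameters does not merge synonyms: the chart lists three separate labels for JavaScript, so a count of distinct mimetypes is likely to overstate the number of distinct formats.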
Validation stack – proposed
Validation stack – proposed
Thoughts
1. METS file is ‘large’
2. Take only format and fixity?
3. Take only format summary at ARC level?
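
The two reduced options above can be compared with a small sketch. Everything here is hypothetical: the `FileRecord` fields, the choice of SHA-256 for fixity and the format identifiers are assumptions made for illustration, not the Library's METS profile or checksum policy.

```python
import hashlib
from collections import Counter
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class FileRecord:
    """Option 2: keep only format and fixity for each file in an ARC."""
    url: str
    format_id: str  # hypothetical format identifier
    fixity: str     # hex digest of the payload


def make_record(url: str, format_id: str, payload: bytes,
                algorithm: str = "sha256") -> FileRecord:
    # Assumption: SHA-256; any algorithm known to hashlib can be named.
    digest = hashlib.new(algorithm, payload).hexdigest()
    return FileRecord(url=url, format_id=format_id, fixity=digest)


def arc_format_summary(records: Iterable[FileRecord]) -> Counter:
    """Option 3: keep only a per-ARC count of files by format."""
    return Counter(record.format_id for record in records)


# Invented example records.
records = [
    make_record("http://example.org/", "html", b"<html></html>"),
    make_record("http://example.org/a.png", "png", b"\x89PNG\r\n\x1a\n"),
    make_record("http://example.org/b.html", "html", b"<html>b</html>"),
]
print(arc_format_summary(records))
```

The per-file records grow with the number of files in the ARC, while the summary grows only with the number of distinct formats, at the cost of losing file-level fixity.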
Domain Harvests into preservation system
NDHA Overview
[Diagram: NDHA overview. Components shown: Web Curator Tool; Deposit; Staging; TA; Assessor; Arranger; Approver; Enrichment; Validation Stack; Rosetta; Permanent Repository; Management UI; Back-Office UI; Publishing; Delivery; Search Tools (e.g. Find / Tapuhi / Voyager / Beta).]
Whole of Domain legal considerations
- Films, Videos, and Publications Classification Act 1993
- Criminal Procedure Act 2011
- Defamation Act 1992
- Human Rights Act 1993
- Copyright Act 1994
- Privacy Act 1993
Whole of Domain legal considerations
Legal
- Objectionable/illegal material
- Copyright
- Is giving access “republishing”?
Technical/Intellectual
- One corpus?
- Full text search?
- Themed?
- What is the data model?
ARC to WARC conversion
Why?
• Take advantage of most up to date standard
• Normalise all web content
Critical considerations
- Stakeholder engagement
- Development of tools
- Proofs of conversion
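
One possible shape for a proof of conversion is a payload-level comparison: every URL captured in the source ARC should appear in the converted WARC with byte-identical content. The sketch below is an assumption about how such a check might look, not a description of the Library's tooling. It takes records that have already been read out of the containers as (URL, payload) pairs; parsing ARC or WARC files is out of scope and would need a dedicated library.

```python
import hashlib
from typing import Dict, Iterable, List, Tuple

Record = Tuple[str, bytes]  # (URL, payload bytes); hypothetical shape


def digest_index(records: Iterable[Record]) -> Dict[str, List[str]]:
    """Map each URL to the sorted SHA-256 digests of its captures."""
    index: Dict[str, List[str]] = {}
    for url, payload in records:
        index.setdefault(url, []).append(hashlib.sha256(payload).hexdigest())
    for digests in index.values():
        digests.sort()
    return index


def conversion_report(source: Iterable[Record],
                      converted: Iterable[Record]) -> Dict[str, List[str]]:
    """List URLs that went missing, appeared unexpectedly, or changed."""
    before = digest_index(source)
    after = digest_index(converted)
    shared = set(before) & set(after)
    return {
        "missing": sorted(set(before) - set(after)),
        "unexpected": sorted(set(after) - set(before)),
        "changed": sorted(u for u in shared if before[u] != after[u]),
    }


# Invented example: one payload is altered and one is dropped.
arc = [("http://a/", b"one"), ("http://b/", b"two"), ("http://c/", b"three")]
warc = [("http://a/", b"one"), ("http://b/", b"TWO")]
print(conversion_report(arc, warc))
```

An empty report is evidence that payloads survived unchanged; it does not show that headers and other record metadata were carried across correctly, which would need a separate check.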
Conclusions
• Scale of harvests challenges processes for description, access and preservation
• Preservation of harvests at NLNZ is at a basic level
• Want to move to fuller preservation management of the website files