How to Face the Challenges of Web Archiving?
The experiences of a small library on the edge.
Chloe Martin, Internet Memory
Catherine Ryan, National Library of Ireland
LIBER 2012 - 1
Context: National Library of Ireland
• Beginnings: Established by the Dublin Science and Art Museum Act, 1877
• Mission: “to collect, preserve, promote and make accessible the documentary and intellectual record of the life of Ireland”.
• The Digital Record: Born Digital Programme established in 2010, covering web archiving.
• Web Archive Projects: 2 pilot projects in 2011
Context: Internet Memory
European Archive / Internet Memory Foundation
• Established in 2004 in Amsterdam (offices also in Paris)
• Mission: to preserve Web content as a new medium for current and future generations
• Actions: awareness-raising, partnerships, R&D
• Open Access Collections: UK National Archives & Parliament, PRONI, CERN and the National Library of Ireland
Internet Memory Research
• Spin-off of IM established in June 2011 in Paris
• Missions: to operate large-scale or selective crawls and to develop new technologies (crawl, access, processing and extraction)
Web Archiving Project: Project Origins (National Library of Ireland)
Building a 21st Century Library:
– Born Digital
– Digitisation
– Single Integrated Catalogue
– Digital Repository
– OSCAIL, the Digital Library Programme
Born Digital Materials:
• Natural progression for NLI’s strong political, cultural and historical collections
• How best to approach this in a time of unprecedented financial difficulty?
• Born Digital Programme established to examine requirements and produce a policy document for the next steps
The Hand of History:
– Snap General Election
– Five Weeks
Just do it
How?
Collaborative Partnership:
A partner that suited our requirements and had experience working with other institutions in the cultural sector
Requirements:
– Technical skills existed in the NLI, but staff were committed to other projects; we needed these skills
– Leverage NLI’s strong curatorial experience, esp. in politics
– Fast!
Project phases:
– Project scoping and contract
– Site selection
– Permissions gathering
– QA (look and feel)
– Publication and promotion
Site Selection and Permissions (National Library of Ireland)
Selection Criteria:
– Website presence
– Technical reasons
– Cut-off date
– Women candidates
Permissions:
– All sites contacted and provided with a brief
– Pressurised but necessary phase
Scope of Projects (National Library of Ireland)
General Election:
– Crawl: 200 snapshots
– Scope: 100 seeds
– Frequency: 2 times
– Date: Feb. 2011
Presidential Election:
– Crawl: 80 snapshots
– Scope: 70 seeds
– Frequency: 3 times
– Date: Oct.–Nov. 2011
Crawl (Internet Memory)
• Seed validation: URLs, duplication, redirection, external links, dynamic websites
• Scope parameters: domain, host and path; social Web content; frequency; robots.txt exclusion; politeness
• Specific incidents required technical changes on the fly: modification of scope; pending crawls; adaptation of the politeness settings
• Improvements for the second crawl
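The crawl parameters above can be illustrated with a minimal Python sketch. It is not Internet Memory's crawler configuration, which the slides do not describe: the function names (`normalise_seed`, `validate_seeds`, `in_scope`, `load_robots`, `should_fetch`, `politeness_delay`), the three scope modes and the 1-second default delay are all hypothetical. Only the standard library is used, including `urllib.robotparser` for robots.txt rules and `Crawl-delay`.

```python
from urllib.parse import urlsplit, urlunsplit
from urllib.robotparser import RobotFileParser


def normalise_seed(url: str) -> str:
    """Hypothetical normalisation: default scheme, lower-case host, no fragment."""
    url = url.strip()
    if "://" not in url:
        url = "http://" + url  # assumption: seeds without a scheme are http
    parts = urlsplit(url)
    path = parts.path.rstrip("/") or "/"
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), path, parts.query, ""))


def validate_seeds(raw_seeds):
    """Normalise seeds and drop duplicates, keeping the first occurrence."""
    seen, seeds = set(), []
    for raw in raw_seeds:
        seed = normalise_seed(raw)
        if seed not in seen:
            seen.add(seed)
            seeds.append(seed)
    return seeds


def in_scope(url: str, seed: str, mode: str = "host") -> bool:
    """Hypothetical scope modes mirroring the slide: domain, host or path."""
    u, s = urlsplit(url), urlsplit(seed)
    uh, sh = (u.hostname or ""), (s.hostname or "")
    if mode == "domain":
        base = sh[4:] if sh.startswith("www.") else sh
        return uh == base or uh.endswith("." + base)
    if mode == "host":
        return uh == sh
    if mode == "path":
        prefix = s.path.rstrip("/") + "/"
        return uh == sh and (u.path == s.path or u.path.startswith(prefix))
    raise ValueError(f"unknown scope mode: {mode}")


def load_robots(lines) -> RobotFileParser:
    """Parse robots.txt lines that the crawler has already fetched."""
    rp = RobotFileParser()
    rp.parse(lines)
    return rp


def should_fetch(url, seed, rp, mode="host", respect_robots=True, agent="*"):
    """Fetch only URLs in scope and, if robots.txt is respected, allowed by it."""
    if not in_scope(url, seed, mode):
        return False
    return rp.can_fetch(agent, url) if respect_robots else True


def politeness_delay(rp, agent="*", default=1.0) -> float:
    """Seconds between requests to one host: Crawl-delay if given, else a default."""
    delay = rp.crawl_delay(agent)
    return float(delay) if delay is not None else default
```

Redirecting seeds and dynamic sites would need more handling than this; the sketch only shows that validation, scoping, robots.txt exclusion and politeness are separate, independently adjustable checks, which is what makes "technical changes on the fly" possible.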
Quality Assurance (QA) (National Library of Ireland)
• Manual QA
• Jira software
• IM – Technical QA
• NLI - ‘Look and Feel’ QA
• Multiple browsers
• Communication with site owners (building relationships and promotion)
Quality Assurance (QA) (Internet Memory)
• Why?
• How?
  – Manual and visual method: homepage + 2
  – Resolution of issues
• Temporal Coherence
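Temporal coherence concerns whether a page and the resources it embeds (images, stylesheets, scripts) were captured close enough in time that the replayed page shows one consistent version. The slides do not say how IM measured it; the following is a minimal sketch of one possible check, assuming 14-digit `YYYYMMDDhhmmss` capture timestamps (the form used in ARC record headers). The function names and the 24-hour tolerance are hypothetical.

```python
from datetime import datetime, timedelta

# Assumption: capture times are 14-digit YYYYMMDDhhmmss strings.
TS_FORMAT = "%Y%m%d%H%M%S"


def capture_time(ts: str) -> datetime:
    return datetime.strptime(ts, TS_FORMAT)


def incoherent_resources(page_ts, resource_ts, max_skew=timedelta(hours=24)):
    """Return embedded resources captured more than max_skew before or after the page.

    page_ts: capture timestamp of the page being checked.
    resource_ts: mapping of embedded resource URL -> its capture timestamp.
    max_skew: hypothetical tolerance, to be chosen per collection.
    """
    page_time = capture_time(page_ts)
    return sorted(
        url for url, ts in resource_ts.items()
        if abs(capture_time(ts) - page_time) > max_skew
    )


if __name__ == "__main__":
    flagged = incoherent_resources(
        "20110215120000",
        {
            "http://www.example.ie/style.css": "20110215130000",
            "http://www.example.ie/banner.jpg": "20110220090000",
        },
    )
    print(flagged)  # ['http://www.example.ie/banner.jpg']
```

A flagged resource is not necessarily wrong, but it marks the pages a manual "look and feel" reviewer should check first during a fast-moving election crawl.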
Access (National Library of Ireland)
• Available to the public
• Full text search
• IM website – search by keyword, URL
• NLI catalogue – keyword via widget developed by NLI IS team and IM
• Future – access through NLI’s own interfaces, issue of integrating results
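The slides do not describe how the IM full-text search or the NLI catalogue widget work internally. As a hypothetical illustration only, keyword and URL lookup over archived captures can be sketched with an in-memory inverted index; `ArchiveIndex` and its methods are invented names, and a production archive would rely on a dedicated search engine rather than this structure.

```python
import re
from collections import defaultdict

TOKEN = re.compile(r"\w+")  # Unicode-aware, so words with fadas (e.g. "Dáil") stay whole


class ArchiveIndex:
    """Hypothetical in-memory index of archived captures."""

    def __init__(self):
        self._postings = defaultdict(set)   # term -> {(url, timestamp), ...}
        self._captures = defaultdict(list)  # url -> [timestamp, ...]

    def add(self, url: str, timestamp: str, text: str) -> None:
        """Record one capture of a URL and index the words of its text."""
        self._captures[url].append(timestamp)
        for term in TOKEN.findall(text.lower()):
            self._postings[term].add((url, timestamp))

    def search_keywords(self, query: str):
        """Captures containing every query term (AND semantics), sorted."""
        terms = TOKEN.findall(query.lower())
        if not terms:
            return []
        hits = set.intersection(*(self._postings.get(t, set()) for t in terms))
        return sorted(hits)

    def search_url(self, url: str):
        """Capture timestamps for an exact URL; 14-digit stamps sort chronologically."""
        return sorted(self._captures.get(url, []))
```

Keyword search returns only the captures whose text contains all query words, while URL lookup returns the list of capture dates for one address, the starting point for a researcher comparing a candidate's site before and after polling day.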
Publication and Promotion (National Library of Ireland)
• NLI social media initiative (Twitter and blog)
• Project participants
• Print media (esp. in area of technology)
• And IM!
• Usage figures have increased, but the real value will become more apparent in 5–10 years
Usage Statistics of the Web Archive (National Library of Ireland)
21/09/2011: Official launch of NLI Web archives (Tweets)
26/10/2011: Blog post on nli.ie/blog and article in thejournal.ie
25/11/2011: Article on irishtimes.com
20/01/2012: Article on irishtimes.com
17/03/2012: Post on soundofthearchives.wordpress.com
04/05/2012: Article on irisheconomy.ie
Advantages of Web Archiving (National Library of Ireland)
Web archiving:
– New opportunities for delivery of materials to users
– Meets existing users’ expectations that content be online
– Reaches new audiences
Advantages of Web Archiving (National Library of Ireland)
Political web archives: Irish General Election
– Researchers can compare online content pre- and post-election
– Facilitates research into how ‘online’ this election was
– Assesses the impact of technological developments in campaign communications
– Record of campaign information
Benefits of Working Together (National Library of Ireland)
Pilot project for a long-term activity:
– Allowed us to enter a new collecting area despite a lack of technical expertise
– Facilitated collection of important material that no one else was collecting
– Collected material quickly
– Leveraged curatorial skills
– Gained new technical skills
Benefits of Working Together (Internet Memory)
• To support the development of Web archiving initiatives
• To enable rapid deployment of Web archives
• To address new challenges in this area:
  – Social media content
  – QA
  – Automation
Conclusion
General Election:
• 18,495,771 URLs
• 1.14 TB
• 10,405 ARCs
Presidential Election:
• 7,333,399 URLs
• 278.10 GB
• 2,513 ARCs
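The conclusion figures allow a quick check of average sizes. Assuming decimal units (1 TB = 10^12 bytes, 1 GB = 10^9 bytes; the slide does not state its convention), both collections work out to roughly 110 MB per ARC file, while the average stored size per archived URL is about 62 KB for the General Election and about 38 KB for the Presidential Election. The names `COLLECTIONS` and `averages` are illustrative only.

```python
# Figures from the conclusion slide; decimal units are an assumption.
COLLECTIONS = {
    "General Election": {"urls": 18_495_771, "bytes": 1.14e12, "arcs": 10_405},
    "Presidential Election": {"urls": 7_333_399, "bytes": 278.10e9, "arcs": 2_513},
}


def averages(stats: dict) -> dict:
    """Average megabytes per ARC file and kilobytes per archived URL."""
    return {
        "mb_per_arc": stats["bytes"] / stats["arcs"] / 1e6,
        "kb_per_url": stats["bytes"] / stats["urls"] / 1e3,
    }


for name, stats in COLLECTIONS.items():
    avg = averages(stats)
    print(f"{name}: {avg['mb_per_arc']:.1f} MB/ARC, {avg['kb_per_url']:.1f} KB/URL")
```

If the slide used binary units instead, each average would rise by roughly 7–10%, but the comparison between the two collections would not change.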
View the NLI collections at: http://www.nli.ie/en/udlist/digital-collections.aspx
View the Web archive blog entry at: http://www.nli.ie/blog/index.php/2011/10/26/general-election-2011-web-archiving/
View Internet Memory collections at: http://collections.europarchive.org/
To be continued…
Questions?
Thanks for your attention!
Chloe Martin, Internet Memory
http://internetmemory.org
@InternetMemory
Catherine Ryan, National Library of Ireland
@NLIreland