Making Enterprise-Level Archive Tools Accessible for Personal Web Archiving



Mat Kelly, Michele C. Weigle, and Michael L. Nelson
{mkelly, mweigle, mln}@cs.odu.edu

Department of Computer Science, Old Dominion University, Norfolk, Virginia USA

One-Click, User Instigated Preservation

• The “Archive Now!” button sets up the crawl, initiates it, and puts the archive file in the correct location to be indexed.

• Whether Wayback has consumed (indexed) the archive can be checked with the “Check Archive Status” button.

• Once indexed, the “View Archive” button shows all archives for the URL in the local Wayback.

• Selecting a date in the local Wayback displays the preserved webpage.
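
The workflow above amounts to three steps: produce a WARC, place it where the local Wayback looks for archives, then open the replay URL. A minimal Python sketch of the last two steps follows. The directory layout, the port 8080, and the `/wayback` path prefix are assumptions for illustration, not values taken from WAIL, and both helper names are hypothetical.

```python
import shutil
from pathlib import Path


def place_warc_for_indexing(warc_path, archive_dir):
    """Move a finished WARC into the directory the local Wayback indexes.

    Both paths are hypothetical; a real install defines its own layout.
    """
    archive_dir = Path(archive_dir)
    archive_dir.mkdir(parents=True, exist_ok=True)
    destination = archive_dir / Path(warc_path).name
    shutil.move(str(warc_path), str(destination))
    return destination


def replay_url(target_url, timestamp="*", base="http://localhost:8080/wayback"):
    """Build a Wayback-style replay URL.

    A timestamp of "*" asks for the list of all captures of target_url;
    a 14-digit timestamp (YYYYMMDDhhmmss) asks for a single capture.
    The base address is an assumption.
    """
    return f"{base}/{timestamp}/{target_url}"
```

Calling `replay_url("http://example.com/")` gives the address a "View Archive" style button could open to list every capture of that page.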

Features

• Collection of Archiving Tools

• Drag & Drop Installation And Removal

• All Tools Can Reside on a Single Machine

• Managed Through a Graphical User Interface (GUI)

Tools Installed Locally

• CREATE ARCHIVES: Heritrix (Crawler)

• REPLAY ARCHIVES: Wayback Machine

• INSPECT ARCHIVES: WARC-Proxy

• More to Come!
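
Inspecting an archive starts with reading its records. The sketch below is not how WARC-Proxy works; it is only a minimal illustration of the WARC record layout (a version line, named header fields, a blank line, then `Content-Length` bytes of content), and it assumes an uncompressed file. The function name is hypothetical.

```python
def iter_warc_headers(stream):
    """Yield the header fields of each record in an uncompressed WARC.

    stream must be a binary file object, e.g. open(path, "rb").
    """
    while True:
        line = stream.readline()
        if not line:
            return  # end of file
        if not line.strip():
            continue  # blank lines separate consecutive records
        if not line.startswith(b"WARC/"):
            raise ValueError("expected a WARC version line, got %r" % line)

        headers = {}
        while True:
            line = stream.readline()
            if line in (b"\r\n", b"\n", b""):
                break  # blank line ends the header section
            name, _, value = line.decode("utf-8").partition(":")
            headers[name.strip()] = value.strip()

        # Skip the content block; its size is given by Content-Length.
        stream.read(int(headers["Content-Length"]))
        yield headers
```

Printing `headers.get("WARC-Target-URI")` for each yielded record lists the URLs captured in the file.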

Advanced Options/Features

• Specify Multiple URLs to be Included in the Crawl

• Set Up Crawls and Allow for Customization Prior to Execution (e.g., crawl period)

• Start or Stop Services Not Currently Needed (e.g., initialize a long crawl but delay replay until later)
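
Specifying several URLs for one crawl comes down to handing the crawler a seed list. Below is a small sketch of preparing one, assuming a plain-text format with one URL per line. The function name and the default `http://` scheme are illustrative assumptions, not WAIL's behavior.

```python
def build_seed_list(raw_urls):
    """Return newline-terminated seeds: trimmed, de-duplicated, order kept."""
    seeds = []
    seen = set()
    for raw in raw_urls:
        url = raw.strip()
        if not url:
            continue  # ignore empty entries
        if "://" not in url:
            url = "http://" + url  # assumed default scheme
        if url not in seen:
            seen.add(url)
            seeds.append(url)
    if not seeds:
        return ""
    return "\n".join(seeds) + "\n"
```

The returned text can be written to whatever seed file the crawl configuration points at.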

Interface for Tweaking Support

• GENERATED ARCHIVES ARE SAFE
Web ARChives (WARCs) reside on your hard drive and can be backed up for safekeeping like any other file

• CROSS PLATFORM
Support for MacOS X, Windows and Linux

• WORKS WITH EXISTING WARCS
Just drop in and local Wayback will index for replay

• COMPATIBLE WITH OTHER ARCHIVING TOOLS
Use the WARC-generating preservation tool of your choice (e.g., WARCreate, Wget) in lieu of Heritrix
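
As one example of swapping in another WARC-generating tool, GNU Wget (version 1.14 or later) can write a WARC through its `--warc-file` option. The sketch below builds and optionally runs such a command. The function names, the output name, and the choice of `--page-requisites` are illustrative assumptions, and running it requires Wget to be installed.

```python
import subprocess


def wget_warc_command(url, warc_name):
    """Build a Wget command that saves url and its page requisites to a WARC.

    Wget adds the file extension (.warc.gz by default) to warc_name itself.
    """
    return ["wget", "--page-requisites", f"--warc-file={warc_name}", url]


def run_wget_warc(url, warc_name):
    """Run the command and return Wget's exit code without raising."""
    completed = subprocess.run(wget_warc_command(url, warc_name), check=False)
    return completed.returncode
```

The resulting file can then be dropped into the local Wayback's archive location like any other existing WARC.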

PDA 2013; College Park, MD; February 21, 2013 http://matkelly.com/wail