NetarchiveSuite Sabine Schostag The Netarchive sas@statsbiblioteket.dk

Post on 12-Jan-2016

Page 1: NetarchiveSuite Sabine Schostag The Netarchive sas@statsbiblioteket.dk

NetarchiveSuite

Sabine Schostag
The Netarchive

sas@statsbiblioteket.dk

Page 2

How we use NetarchiveSuite

Questions and answers on NetarchiveSuite:

lifecycle: What aspects of the web archiving life cycle model does the tool cover? What aspects of the model would you like to/do you intend to build into the tool? What functionality does the tool provide that isn't reflected in the model?

development: What resources are committed to the tool's ongoing development? What are major features in the roadmap? Is the code open source?

adoption: What is the user base for the tool? How environment-specific is the tool as opposed to readily reusable by other organizations?

functionality: What are the tool's unique features? What are its shortcomings?

Page 3

NetarchiveSuite

Lifecycle: What aspects of the web archiving life cycle model does the tool cover?

What aspects of the model would you like to/do you intend to build into the tool?

Extended documentation, search functions, time schedules ≤ 1 hour

What functionality does the tool provide that isn't reflected in the model?

Time schedules min: once an hour – max ??

Page 4

NetarchiveSuite

development: What resources are committed to the tool's ongoing development?

2,6 MP

What are major features in the roadmap?

Technical improvements
Upgrade to or support Heritrix 3
Replacing the current NetarchiveSuite Archive module
Better integration of documentation

Is the code open source? https://sbforge.org/display/NASDOC42/NetarchiveSuite+Overview

Page 5

NetarchiveSuite

adoption: What is the user base for the tool? How environment-specific is the tool as opposed to readily reusable by other organizations?

Even though the NetarchiveSuite software is developed in Java, and is therefore mostly platform independent, we do have a couple of external calls to the Unix sort command. The parts of our software that use this external command therefore only run on Linux/Unix, or on Windows with Cygwin installed.

See the installation manual: https://sbforge.org/display/NASDOC42/Installation+Overview
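To illustrate why an external call to sort ties otherwise portable Java code to a Unix-like environment, here is a minimal sketch. It is not taken from the NetarchiveSuite code base: the class name `ExternalSortSketch` and the methods `isWindows` and `sortWithUnixSort` are hypothetical, and the sketch assumes a Unix-style `sort` executable is on the PATH (natively on Linux/Unix, or via Cygwin on Windows).

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Illustrative sketch only; all names here are hypothetical and are
// not part of NetarchiveSuite.
final class ExternalSortSketch {

    // True when the reported OS name is a Windows variant, where a
    // Unix-style `sort` is only available through something like Cygwin.
    static boolean isWindows(String osName) {
        return osName.toLowerCase(Locale.ROOT).startsWith("windows");
    }

    // Runs the external `sort` command on a file and returns its output lines.
    static List<String> sortWithUnixSort(Path input)
            throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder("sort", input.toString());
        // Ask for byte-wise ordering so the result does not depend on the locale.
        pb.environment().put("LC_ALL", "C");
        // Merge stderr into stdout so the process cannot block on a full pipe.
        pb.redirectErrorStream(true);
        Process process = pb.start();

        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(process.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        int exitCode = process.waitFor();
        if (exitCode != 0) {
            throw new IOException("sort exited with status " + exitCode);
        }
        return lines;
    }

    public static void main(String[] args) throws Exception {
        if (isWindows(System.getProperty("os.name"))) {
            System.out.println("Windows host: a Unix-style sort (e.g. from Cygwin) must be on the PATH.");
        }
        Path tmp = Files.createTempFile("sort-sketch", ".txt");
        Files.write(tmp, List.of("dk", "com", "org"), StandardCharsets.UTF_8);
        System.out.println(sortWithUnixSort(tmp));
        Files.delete(tmp);
    }
}
```

The Java code itself compiles and starts anywhere, but `sortWithUnixSort` only behaves as intended where the `sort` found on the PATH is the Unix one, which is the kind of dependency the slide describes.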

Page 6

NetarchiveSuite

Functionality: What are the tool's unique features? What are its shortcomings?

Multifaceted application:

Selective Harvests
Snapshot Harvests
Domains
Schedules
Extended fields
Heritrix GUI Access
Global Crawler Traps
Harvest History
Harvester Templates
Quality Assurance
System State
Bit Preservation

See: https://sbforge.org/display/NASDOC42/User+Manual

Page 7

NetarchiveSuite

Netarchive use of NAS / overview

Broad crawls
Selective crawls

"Selective crawls"

Event crawls
Special crawls (e.g. at a scholar's request)
Focused crawls: social media (special templates), very big sites, ...