was uc3-nov2012wkshps-final

Post on 04-Dec-2014

Web Archiving Service (WAS)

Rosalie Lack
rosalie.lack@ucop.edu

Data Curation for Practitioners 2012 Workshop

Imagine a world …

This is our world …

WAS … is

A service of the UC Curation Center to collect, manage, preserve and publish websites and documents.

WAS Snapshot

53 public archives

120+ archives total

7,500+ sites

50+ TB

23 institutions

WAS Institutions

• Institute of Governmental Studies Library, UCB
• UC Berkeley Office of Public Affairs
• UC Davis Libraries
• UC Irvine Libraries
• UC Los Angeles Libraries
• UC Riverside Libraries
• UC San Diego Libraries
• UC San Francisco Libraries
• UC Santa Barbara
• UC Santa Cruz McHenry Library
• Emory University Library
• Institute for Research on Labor and Employment
• New York University
• Northwestern University Library
• Purdue University
• Stanford University Libraries
• Temple University
• University of Arkansas Libraries
• University of Illinois at Urbana-Champaign Libraries
• University of Michigan, Bentley Historical Library
• USDA Economic Research Service
• Water Resources Collections and Archives

WAS Overview

A) Curator Tools

Curator Workflow

1. Create Site

• Enter site name, URL and description
• Scope
• Capture frequency
• Robots.txt

2. Capture Sites

3. View Captures

• View captures
• QA
• Compare

4. Public Access

• Customize the archive
• Write description
• Create custom banner and icon

WAS Overview

B) Public Archives

Web Archive ‘home page’

Browse: Site List + Tags

Search: All Sites in an Archive

Integration with your Systems

How are people using WAS?

Institution’s website

• Preserve institutional history

• Capture university news and events

Geographically focused

Topical

Support special research collections

Event

• Sudden action required
• May need many selectors
• Start date / end date

Researcher’s Perspective

• Building collections for research
– Study the topic / event
– Study site change or web-based communication
– Websites are datasets for analysis and data mining

• Preservation of research
– Archive grant-funded websites
– Selected sites

• Create stable citations for publications

Get started!

• Each library has WAS administrator(s)

• Unlimited number of curators per account

• What’s the cost?
– UC does not pay a service fee
– Storage only: $1,040 per TB (the average site costs $1.46 annually); storage costs are expected to go down
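The storage figures above can be turned into a rough estimate. The sketch below is illustrative only: it assumes the $1,040 per TB rate is billed annually and that 1 TB = 1,000 GB, neither of which the slide states, and the function name is hypothetical rather than part of WAS.

```python
# Rough storage-cost sketch based on the two figures quoted on the slide.
# Assumptions (not stated in the slides): the per-TB rate is annual,
# and 1 TB is counted as 1,000 GB.
RATE_PER_TB_USD = 1040.0       # storage rate from the slide
AVERAGE_SITE_COST_USD = 1.46   # average annual cost per site from the slide
GB_PER_TB = 1000.0


def annual_storage_cost(size_gb: float, rate_per_tb: float = RATE_PER_TB_USD) -> float:
    """Estimated yearly storage cost in USD for an archive of size_gb gigabytes."""
    return size_gb / GB_PER_TB * rate_per_tb


# Working backwards: the average-site cost implies an average site size in GB.
implied_average_site_gb = AVERAGE_SITE_COST_USD / RATE_PER_TB_USD * GB_PER_TB

if __name__ == "__main__":
    print(f"Implied average site size: {implied_average_site_gb:.2f} GB")
    print(f"Estimated cost for 500 GB: ${annual_storage_cost(500):.2f}")
```

Under these assumptions the quoted average works out to a site of roughly 1.4 GB, which is a quick way to sanity-check a budget before starting a new archive.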

Challenges

• Shared collection development
• Metadata issues
• Workflow and cost models for faculty projects
• Time!
• Limitations of web crawlers
• Websites are messy

Contact me!

Rosalie Lack
WAS Service Manager
rosalie.lack@ucop.edu

Imagine a world …

“Imagine a world in which libraries and archives had never existed. No institutions had ever systematically collected or preserved our collective cultural past: every book, letter, or document was created, read and then immediately thrown away. What would we know about our past?”

This is our world …

“Yet, that is precisely what is happening with the web: more and more of our daily lives occur within the digital world, yet more than two decades after the birth of the modern web, the “libraries” and “archives” of this world are still just being formed.”

A Vision Of The Role And Future Of Web Archives
Kalev H. Leetaru, Graduate School of Library and Information Science, University of Illinois.
Presented as the keynote address at the 2012 IIPC General Assembly in Washington, DC.
http://netpreserve.org/sites/default/files/resources/VisionRoles.pdf
