Using Wayback Machine for Research

Nicholas Taylor, Repository Development Group

Post on 19-Nov-2014

DESCRIPTION

Presentation given at the Library of Congress on how to use Wayback Machine more effectively to answer historical research questions.

TRANSCRIPT

Page 1: Using Wayback Machine for Research

Nicholas Taylor, Repository Development Group

Using Wayback Machine for Research

Page 2: Using Wayback Machine for Research

What Is the Wayback Machine?

Page 3: Using Wayback Machine for Research

WABAC Machine?

Page 4: Using Wayback Machine for Research

Internet Archive’s Wayback Machine

Page 5: Using Wayback Machine for Research

not one, but many Wayback Machines

open source software to “replay” web archives

rewrites links to point to archived resources

allows for temporal navigation within archive

used by many web archiving institutions

33 out of 62 initiatives listed on Wikipedia

Page 6: Using Wayback Machine for Research

Government of Canada Web Archive

Page 7: Using Wayback Machine for Research

Government of Canada Web Archive

Page 8: Using Wayback Machine for Research

Portuguese Web Archive

Page 9: Using Wayback Machine for Research

Web Archive Singapore

Page 10: Using Wayback Machine for Research

Web Archive Singapore

Page 11: Using Wayback Machine for Research

Catalonian Web Archive

Page 12: Using Wayback Machine for Research

Catalonian Web Archive

Page 13: Using Wayback Machine for Research

California Digital Library Web Archiving Service

Page 14: Using Wayback Machine for Research

Harvard University Web Archive Collection Service

Page 15: Using Wayback Machine for Research

Common Limitations and Workarounds

Page 16: Using Wayback Machine for Research

limitation: banner displaces page elements

Page 17: Using Wayback Machine for Research

workaround: hide the banner

Page 18: Using Wayback Machine for Research

limitation: AJAX-enabled sites

Page 19: Using Wayback Machine for Research

limitation: AJAX-enabled sites

Page 20: Using Wayback Machine for Research

workaround: disable JavaScript

Page 21: Using Wayback Machine for Research

limitation: nav menu link errors

Page 22: Using Wayback Machine for Research

workaround: insert live site URL in archive

Page 23: Using Wayback Machine for Research

workaround: insert live site URL in archive

Page 24: Using Wayback Machine for Research

workaround: insert live site URL in archive

Page 25: Using Wayback Machine for Research

limitation: no full-text search

Page 26: Using Wayback Machine for Research

workaround: none yet, but R&D ongoing

Page 27: Using Wayback Machine for Research

Basic Mechanics

Page 28: Using Wayback Machine for Research

structure of a Wayback Machine URL

http://webarchiveqr.loc.gov/loc_sites/20120131201510/http://www.loc.gov/index.html

Wayback Machine URL: http://webarchiveqr.loc.gov/

collection: loc_sites

date/timestamp (YYYYMMDDHHMMSS): 20120131201510

URL of archived resource: http://www.loc.gov/index.html
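
The URL structure on this slide can be taken apart programmatically. The following is a minimal sketch, assuming every replay URL follows the layout shown above (replay prefix, 14-digit timestamp, original URL); the function name `parse_wayback_url` and the returned keys are illustrative, not part of any Wayback software.

```python
import re
from datetime import datetime

# Assumed layout: <replay prefix>/<14-digit timestamp>/<original URL>
WAYBACK_URL = re.compile(
    r"^(?P<prefix>.+?)/(?P<timestamp>\d{14})/(?P<original>.+)$"
)


def parse_wayback_url(url):
    """Split a replay URL into prefix, capture time, and original URL."""
    match = WAYBACK_URL.match(url)
    if match is None:
        raise ValueError(f"not a recognised Wayback URL: {url!r}")
    return {
        # Everything before the timestamp (host plus collection, if any).
        "prefix": match.group("prefix"),
        # YYYYMMDDHHMMSS converted to a datetime object.
        "captured": datetime.strptime(match.group("timestamp"), "%Y%m%d%H%M%S"),
        # The URL of the archived resource, unchanged.
        "original": match.group("original"),
    }


parts = parse_wayback_url(
    "http://webarchiveqr.loc.gov/loc_sites/20120131201510/http://www.loc.gov/index.html"
)
print(parts["captured"].isoformat(), parts["original"])
```

Replay URLs that carry extra modifiers after the timestamp would not match this pattern; the sketch only covers the plain form shown on the slide.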

Page 29: Using Wayback Machine for Research

URL-based access

Page 30: Using Wayback Machine for Research

URL-based access

Page 31: Using Wayback Machine for Research

date wildcarding

Page 32: Using Wayback Machine for Research

date wildcarding
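
Date wildcarding can be sketched as string construction: a partial timestamp followed by `*` takes the place of the full 14-digit timestamp. This assumes the archive accepts a wildcard in the timestamp position, as the Internet Archive's Wayback Machine does; the helper name `date_wildcard_url` and its parameters are hypothetical.

```python
def date_wildcard_url(replay_prefix, original_url, partial_timestamp=""):
    """Build a replay URL that matches every capture whose timestamp
    starts with partial_timestamp (for example "2012" or "201201").
    An empty partial_timestamp matches all captures."""
    if partial_timestamp and not partial_timestamp.isdigit():
        raise ValueError("timestamp prefix must contain digits only")
    if len(partial_timestamp) > 14:
        raise ValueError("timestamp prefix is longer than YYYYMMDDHHMMSS")
    return f"{replay_prefix.rstrip('/')}/{partial_timestamp}*/{original_url}"


# All captures of the page made during 2012 (assumed wildcard support).
print(date_wildcard_url("http://web.archive.org/web/", "http://www.loc.gov/", "2012"))
```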

Page 33: Using Wayback Machine for Research

document wildcarding

Page 34: Using Wayback Machine for Research

document wildcarding

Page 35: Using Wayback Machine for Research

document wildcarding
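
Document wildcarding works on the other end of the URL: a trailing `*` after a URL prefix asks the archive to list every captured document beneath it. A minimal sketch, assuming the archive supports a trailing wildcard on the original URL; `document_wildcard_url` is an illustrative name.

```python
def document_wildcard_url(replay_prefix, url_prefix):
    """Build a replay URL listing all captured documents whose original
    URL begins with url_prefix, across all capture dates."""
    # Drop any wildcard the caller already added so it is not doubled.
    cleaned = url_prefix.rstrip("*")
    return f"{replay_prefix.rstrip('/')}/*/{cleaned}*"


# Every archived document under the web archiving section of the site.
print(document_wildcard_url("http://web.archive.org/web/", "http://www.loc.gov/webarchiving/"))
```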

Page 36: Using Wayback Machine for Research

Strategies for Finding Missing Resources

Page 37: Using Wayback Machine for Research

removed or moved?

don’t start with the archive

missing resources have often just moved (Klein & Nelson, 2010)

Synchronicity for Firefox helps find new location

scrapes archived version for “fingerprint” keywords; uses them to query search engines
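
The “fingerprint” idea can be illustrated with a rough sketch: pick the most characteristic words from the archived page's text and turn them into a search query. Plain term frequency is used here as a stand-in; it is an assumption for illustration and not a description of how Synchronicity selects its keywords. The stopword list and function names are likewise hypothetical.

```python
import re
from collections import Counter
from urllib.parse import quote_plus

# A deliberately tiny stopword list, for illustration only.
STOPWORDS = {"the", "and", "for", "are", "was", "with", "that", "this", "from"}


def fingerprint_keywords(text, count=5):
    """Return the `count` most frequent words of three or more letters,
    ignoring stopwords. Ties are broken alphabetically."""
    words = re.findall(r"[a-z]{3,}", text.lower())
    frequencies = Counter(word for word in words if word not in STOPWORDS)
    ranked = sorted(frequencies.items(), key=lambda item: (-item[1], item[0]))
    return [word for word, _ in ranked[:count]]


def search_query(keywords):
    """URL-encode the keywords as a query string value for a search engine."""
    return quote_plus(" ".join(keywords))


archived_text = (
    "Federal IT Dashboard: the dashboard tracks federal IT spending. "
    "Dashboard data is public."
)
print(search_query(fingerprint_keywords(archived_text, count=3)))
```

The resulting string can be appended to whichever search engine's query URL is preferred; that step is left out because it differs between engines.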

Page 38: Using Wayback Machine for Research

MementoFox

Page 39: Using Wayback Machine for Research

MementoFox

Page 40: Using Wayback Machine for Research

find archives for a site whose URL has changed

website URL changed recently

historical URL is unknown

solution: use search engine to find historical URL then apply it in the archive

Page 41: Using Wayback Machine for Research

Federal IT Dashboard

Page 42: Using Wayback Machine for Research

check Internet Archive’s Wayback Machine

Page 43: Using Wayback Machine for Research

IA Wayback coverage goes back to July 2010

Page 44: Using Wayback Machine for Research

LCWA only goes back to June 2011

Page 45: Using Wayback Machine for Research

use search engine to find historical URL

Page 46: Using Wayback Machine for Research

use search engine to find historical URL

Page 47: Using Wayback Machine for Research

White House IT Dashboard announcement

Page 48: Using Wayback Machine for Research

note the redirect from http://it.usaspending.gov/

Page 49: Using Wayback Machine for Research

append URL to IA Wayback URL

Page 50: Using Wayback Machine for Research

append URL to LC Wayback URL
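
Appending the historical URL to each archive's replay prefix can be sketched as below. The Internet Archive prefix is its public one; the Library of Congress prefix is copied from the example URL on the earlier URL-structure slide and may not be the one used in this step. How a URL without a timestamp is resolved depends on the deployment, so treat the results as starting points to try in a browser.

```python
# Replay prefixes; the LC value is an assumption taken from the example
# URL shown earlier in this deck.
ARCHIVE_PREFIXES = {
    "IA Wayback": "http://web.archive.org/web/",
    "LC Wayback": "http://webarchiveqr.loc.gov/loc_sites/",
}


def archive_lookup_urls(historical_url, prefixes=ARCHIVE_PREFIXES):
    """Append the historical URL to each archive's replay prefix."""
    return {
        name: prefix.rstrip("/") + "/" + historical_url
        for name, prefix in prefixes.items()
    }


for name, url in archive_lookup_urls("http://it.usaspending.gov/").items():
    print(f"{name}: {url}")
```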

Page 51: Using Wayback Machine for Research

find archives for a site whose URL has changed

congressional committee hearings archive

live site URL doesn’t work in archive

solution: find a site in the archive that would link to the desired site, then navigate to contemporaneous snapshot

Page 52: Using Wayback Machine for Research

hearings archive only spans 2001-2006

Page 53: Using Wayback Machine for Research

hearings archive URL changed in 2011

Page 54: Using Wayback Machine for Research

truncate archival access URL
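
One reading of this step is to cut the archived resource's URL back to the site root while keeping the same capture timestamp, so that browsing can start from the archived home page. The sketch below assumes that interpretation and the plain URL layout from the earlier slide; `truncate_to_site_root` is a hypothetical helper.

```python
import re
from urllib.parse import urlsplit


def truncate_to_site_root(wayback_url):
    """Keep the replay prefix and timestamp, but reduce the archived
    resource's URL to its scheme and host."""
    match = re.match(r"^(.+?/\d{14}/)(.+)$", wayback_url)
    if match is None:
        raise ValueError(f"not a recognised Wayback URL: {wayback_url!r}")
    replay_part, original = match.groups()
    parts = urlsplit(original)
    return f"{replay_part}{parts.scheme}://{parts.netloc}/"


print(truncate_to_site_root(
    "http://webarchiveqr.loc.gov/loc_sites/20120131201510/http://www.loc.gov/index.html"
))
```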

Page 55: Using Wayback Machine for Research

snapshot from prior to site change

Page 56: Using Wayback Machine for Research

navigate to appropriate section

Page 57: Using Wayback Machine for Research

navigate to appropriate section

Page 58: Using Wayback Machine for Research

find archives for a previously accessible webpage

records currently stored in a password-protected part of the site may have previously been publicly accessible

conceptual site organization lasts longer than exact link construction

solution: figure out where desired resource would be on the live site, then navigate to analogous section on archived site

Page 59: Using Wayback Machine for Research

location of resources on live site

Page 60: Using Wayback Machine for Research

location of resources on live site

Page 61: Using Wayback Machine for Research

authentication required

Page 62: Using Wayback Machine for Research

check the site in the archive

Page 63: Using Wayback Machine for Research

navigate to an individual capture

Page 64: Using Wayback Machine for Research

navigate to appropriate section

Page 65: Using Wayback Machine for Research

navigate to appropriate section

Page 66: Using Wayback Machine for Research

How You Can Get Involved

Page 67: Using Wayback Machine for Research

what websites from today would you want to be able to consult in five, ten, twenty years’ time?

have you told us what is important to capture?

help us to help you

Page 68: Using Wayback Machine for Research

End of Term 2012 Web Archive

Page 69: Using Wayback Machine for Research

Other Useful Resources

Page 70: Using Wayback Machine for Research

End of Term 2008 Web Archive

Page 71: Using Wayback Machine for Research

CyberCemetery

Page 72: Using Wayback Machine for Research

LCWA

Page 73: Using Wayback Machine for Research

Project One Web Archives

Page 74: Using Wayback Machine for Research

links

Library of Congress Web Archiving Program: http://www.loc.gov/webarchiving/

Library of Congress Web Archives: http://loc.gov/lcwa/

International Internet Preservation Consortium: http://netpreserve.org/

National Digital Information Infrastructure and Preservation Program: http://www.digitalpreservation.gov/

Page 75: Using Wayback Machine for Research

questions?

[email protected]