Upload: robert-sanderson

Post on 11-May-2015

Page 1: Update on Memento (IIPC 2011 Plenary)

Memento Update

2011 IIPC General Assembly, The Hague

Update on Memento

http://www.mementoweb.org/

Herbert Van de Sompel Robert Sanderson Michael L. Nelson

This research funded by the Library of Congress

Towards Seamless Navigation of the Web of the Past

Page 2: Update on Memento (IIPC 2011 Plenary)

Overview of Memento Framework

Deployment Progress

Memento and Discovery

Memento and Branding

Alternative Web Archiving Strategies

Page 3: Update on Memento (IIPC 2011 Plenary)

Overview of Memento Framework

Deployment Progress

Memento and Discovery

Memento and Branding

Alternative Web Archiving Strategies

Page 4: Update on Memento (IIPC 2011 Plenary)

Memento wants to make it easy to navigate the Web of the Past.

Page 5: Update on Memento (IIPC 2011 Plenary)

[Figure: Tate Online today; selecting the date March 16 2008 shows Tate Online as of March 16 2008, from the National Archives]

Page 6: Update on Memento (IIPC 2011 Plenary)

Versions: Web vs CMS

Content Management Systems

•  Designed to be aware of all versions of a resource

•  Self-contained

•  Variety of proprietary version mechanisms

•  Versions interlinked using proprietary mechanisms

World Wide Web

•  Designed to forget about prior versions of a resource

•  Highly distributed

•  No standard version mechanisms

•  Standardized interlinking mechanisms

Page 7: Update on Memento (IIPC 2011 Plenary)

Versions are not Integrated

The Web Architecture has a hard time dealing with the versions that do exist:

•  Cannot talk about a resource as it used to exist

•  Cannot access a prior version given the current one

•  Cannot access the current version given a prior one

Page 8: Update on Memento (IIPC 2011 Plenary)

Memento Framework

•  Regards the Web as a big Content Management System

•  Introduces a uniform capability to access versions on the Web

•  Does not build new archives but leverages all systems that host versions

Page 9: Update on Memento (IIPC 2011 Plenary)

Memento Framework

•  Is distributed: versions may exist on several servers

•  Uses time as a global version indicator

•  Is based on the primitives of the Web: resource, resource state, representation, content negotiation, link
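To make "time as a global version indicator" plus "content negotiation" concrete, the sketch below builds the request header a client might send to ask for a resource as it existed at a given moment. The `Accept-Datetime` header name comes from the Memento Internet-Draft rather than from these slides, and the function name is hypothetical; treat this as an illustration, not the framework's reference client.

```python
from datetime import datetime, timezone
from email.utils import format_datetime


def accept_datetime_headers(when: datetime) -> dict:
    """Build request headers asking for a resource's state at `when`.

    Assumption: the header is named Accept-Datetime and carries an
    HTTP-date in GMT, as described in the Memento Internet-Draft.
    """
    if when.tzinfo is None:
        raise ValueError("datetime must be timezone-aware")
    # Normalize to UTC so the value can be rendered as an HTTP-date in GMT.
    utc = when.astimezone(timezone.utc)
    return {"Accept-Datetime": format_datetime(utc, usegmt=True)}


# The date used in the Tate Online example earlier in the talk.
headers = accept_datetime_headers(datetime(2008, 3, 16, tzinfo=timezone.utc))
print(headers)
```

A client would attach these headers to a request against a TimeGate for the Original Resource; the server then selects the version whose datetime best matches.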

Page 10: Update on Memento (IIPC 2011 Plenary)

Memento Interaction Overview

Page 11: Update on Memento (IIPC 2011 Plenary)

Original Resource and Versions

Page 12: Update on Memento (IIPC 2011 Plenary)

Bridge from Present to Past

Page 13: Update on Memento (IIPC 2011 Plenary)

Bridge from Past to Present

Page 14: Update on Memento (IIPC 2011 Plenary)

Memento Framework

Page 15: Update on Memento (IIPC 2011 Plenary)

Framework with Multiple Archives

Page 16: Update on Memento (IIPC 2011 Plenary)

Overview of Memento Framework

Deployment Progress

Memento and Discovery

Memento and Branding

Alternative Web Archiving Strategies

Page 17: Update on Memento (IIPC 2011 Plenary)

Significant progress has been made towards seamless navigation of the Web of the Past.

Page 18: Update on Memento (IIPC 2011 Plenary)

Standardization

•  Standardization process started via the IETF

•  Interest from IETF and W3C

•  Encouraged by major Web architects, including Tim Berners-Lee, Mark Nottingham, Michael Hausenblas

https://datatracker.ietf.org/doc/draft-vandesompel-memento/

Page 19: Update on Memento (IIPC 2011 Plenary)

Memento Clients

•  Several client tools developed by us and others

•  Add-ons for Firefox (operational) and Internet Explorer (experimental)

•  Applications for Android (operational) and iPhone/iPad (in development)

•  Paper in current issue of Code4Lib Journal

http://www.mementoweb.org/tools/

Page 20: Update on Memento (IIPC 2011 Plenary)

Memento Server Support

•  Memento-compliant Wayback software:

•  In use by Internet Archive

•  Available to Web archives, worldwide

•  Please experiment with this new 1.6 version!

http://www.mementoweb.org/tools/

Page 21: Update on Memento (IIPC 2011 Plenary)

Memento Server Support (2)

•  Plug-in for MediaWiki (operational)

•  Used on W3C’s main wiki

•  Please install it for your MediaWiki!

http://www.mementoweb.org/tools/

Page 22: Update on Memento (IIPC 2011 Plenary)

Memento Server Validator

•  Server-side client:

•  Attempts to perform all Memento actions against a given URI

•  Reports success/failure of the interactions and warnings for optional aspects

•  Kept up to date with the IETF Internet-Draft

http://www.mementoweb.org/tools/validator/

Page 23: Update on Memento (IIPC 2011 Plenary)

Memento Proxy Support

•  Several systems that host Mementos made Memento-compliant “by proxy”:

•  Many Web Archives that do not yet run Memento-compliant software

•  3,000+ MediaWiki systems, including Wikipedia, Wikia

•  We would love all of these to become natively Memento-compliant!

Page 24: Update on Memento (IIPC 2011 Plenary)

Memento Web Site

•  Ongoing effort to add materials that support understanding and adoption:

•  Introduction to Memento

•  How to recognize Mementos, TimeGates, Original Resources?

•  Guidelines for servers that host Mementos (Web Archives, CMS, snapshot archives, etc.)

http://www.mementoweb.org/guide/

Page 25: Update on Memento (IIPC 2011 Plenary)

Funding

•  2007-2010: US $250K grant from Library of Congress

•  Approx. $50K on Memento

•  2010-2011: US $1 Million follow-up grant from Library of Congress

•  For: Specification, outreach, tool development, further research

Page 26: Update on Memento (IIPC 2011 Plenary)

Overview of Memento Framework

Deployment Progress

Memento and Discovery

Memento and Branding

Alternative Web Archiving Strategies

Page 27: Update on Memento (IIPC 2011 Plenary)

Very few Web sites provide a “timegate” link.

Need additional mechanisms to support Discovery.

Page 28: Update on Memento (IIPC 2011 Plenary)

Batch Discovery: TimeMaps

A TimeMap minimally lists:

•  URI and datetime of Mementos known to an archive

•  URI of Original Resource

TimeMaps can be aggregated across systems that host Mementos
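As a rough illustration of aggregation, the sketch below merges TimeMaps from several hosts into one chronological list for a single Original Resource. The `TimeMap` and `MementoEntry` classes, the function name, and the example URIs are all hypothetical; the slides only state what a TimeMap minimally lists, not how it is represented.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class MementoEntry:
    uri: str            # URI of the Memento
    captured: datetime  # datetime of the Memento


@dataclass
class TimeMap:
    original: str                         # URI of the Original Resource
    mementos: list = field(default_factory=list)


def aggregate_timemaps(original_uri, timemaps):
    """Merge TimeMaps for one Original Resource, oldest Memento first."""
    merged = {}
    for timemap in timemaps:
        if timemap.original != original_uri:
            continue  # skip TimeMaps that describe another resource
        for entry in timemap.mementos:
            merged[entry.uri] = entry  # de-duplicate by Memento URI
    return sorted(merged.values(), key=lambda e: e.captured)


# Hypothetical data: two archives holding Mementos of the same page.
page = "http://www.example.org/"
archive_a = TimeMap(page, [
    MementoEntry("http://a.example.net/2008/page",
                 datetime(2008, 3, 16, tzinfo=timezone.utc)),
])
archive_b = TimeMap(page, [
    MementoEntry("http://b.example.net/2006/page",
                 datetime(2006, 1, 5, tzinfo=timezone.utc)),
])
combined = aggregate_timemaps(page, [archive_a, archive_b])
print([e.uri for e in combined])
```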

Page 29: Update on Memento (IIPC 2011 Plenary)

Batch Discovery: Feed of TimeMaps

A system that hosts Mementos exposes a Feed of TimeMaps so that applications can remain in sync with its collection:

•  One Atom entry per Original Resource

•  The entry links to or includes a TimeMap

•  The entry's updated value changes when additional Mementos become available

•  The ID of the entry is a tag URI based on the URI of the Original Resource

•  Can be protected, and include license information

•  Could be anonymized by an aggregating service
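A feed entry along these lines could be assembled as in the sketch below. The exact mapping from the Original Resource's URI to a tag URI is not given on the slide, so `tag_uri` shows just one plausible choice; the `rel="timemap"` link, the authority name, the minting year, and all URIs are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from urllib.parse import urlsplit

ATOM = "http://www.w3.org/2005/Atom"


def tag_uri(original_uri, authority, minted="2011"):
    """Derive a tag URI (tag:authority,date:specific) from an original URI.

    Assumption: host plus path of the Original Resource is used as the
    'specific' part; the slides do not prescribe this mapping.
    """
    parts = urlsplit(original_uri)
    return f"tag:{authority},{minted}:{parts.netloc}{parts.path}"


def timemap_entry(original_uri, timemap_uri, updated, authority):
    """Build one Atom entry that points at the TimeMap of one resource."""
    entry = ET.Element(f"{{{ATOM}}}entry")
    ET.SubElement(entry, f"{{{ATOM}}}id").text = tag_uri(original_uri, authority)
    ET.SubElement(entry, f"{{{ATOM}}}title").text = original_uri
    # `updated` must be timezone-aware; it changes when Mementos are added.
    stamp = updated.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    ET.SubElement(entry, f"{{{ATOM}}}updated").text = stamp
    ET.SubElement(entry, f"{{{ATOM}}}link", rel="timemap", href=timemap_uri)
    return entry


entry = timemap_entry(
    "http://www.example.org/page",
    "http://archive.example.net/timemap/http://www.example.org/page",
    datetime(2011, 5, 10, 12, 0, tzinfo=timezone.utc),
    authority="archive.example.net",
)
print(ET.tostring(entry, encoding="unicode"))
```

An application that polls the feed can compare each entry's updated value with the one it saw last and re-fetch only the TimeMaps that changed.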

Page 30: Update on Memento (IIPC 2011 Plenary)

Batch Discovery: robots.txt

•  robots.txt file is used by Web servers to convey crawling policies

•  Add directives to support discovery of TimeGates and Feeds of TimeMaps

TimeGate: http://dutch.archive.org/timegate/ Archived: .nl

TimeGate: http://all.archive.org/timegate/ Archived: *

TimeMapFeed: http://dutch.archive.org/feed/feed1.xml
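A client could read the proposed directives roughly as follows. This is only a sketch: the directive names are taken from the slide, but the grammar (an `Archived` value applying to the preceding `TimeGate`, `.nl` meaning a host suffix, `*` meaning everything) is an assumption, and the function names are hypothetical.

```python
import re

# Directive names come from the slide; the matching rules are assumed.
DIRECTIVE = re.compile(r"\b(TimeGate|Archived|TimeMapFeed):\s*(\S+)")


def parse_memento_directives(robots_txt):
    """Return ([(timegate_uri, archived_pattern)], [timemap_feed_uri])."""
    timegates = []
    feeds = []
    for name, value in DIRECTIVE.findall(robots_txt):
        if name == "TimeGate":
            timegates.append([value, None])
        elif name == "Archived" and timegates:
            # Assumed: Archived qualifies the most recent TimeGate.
            timegates[-1][1] = value
        elif name == "TimeMapFeed":
            feeds.append(value)
    return [tuple(t) for t in timegates], feeds


def timegates_for(host, timegates):
    """Pick TimeGates whose Archived pattern covers `host` (assumed rules)."""
    return [uri for uri, pattern in timegates
            if pattern == "*" or (pattern and host.endswith(pattern))]


SAMPLE = """\
TimeGate: http://dutch.archive.org/timegate/ Archived: .nl
TimeGate: http://all.archive.org/timegate/ Archived: *
TimeMapFeed: http://dutch.archive.org/feed/feed1.xml
"""

timegates, feeds = parse_memento_directives(SAMPLE)
print(timegates_for("www.example.nl", timegates))
print(timegates_for("www.example.org", timegates))
```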

Page 31: Update on Memento (IIPC 2011 Plenary)

Overview of Memento Framework

Deployment Progress

Memento and Discovery

Memento and Branding

Alternative Web Archiving Strategies

Page 32: Update on Memento (IIPC 2011 Plenary)

Memento can recreate pages using resources from different archives.

This poses a branding challenge.

Page 33: Update on Memento (IIPC 2011 Plenary)

Current Branding Practice for Web Archives

Page and embedded resources from same Web Archive

Branding for page and embedded resources from single archive

Page 34: Update on Memento (IIPC 2011 Plenary)

Branding for Web Archives in Memento Mode

Will be researched

Page and embedded resources from various Web Archives

HTML's branding

No branding

No branding

Page 35: Update on Memento (IIPC 2011 Plenary)

Overview of Memento Framework

Deployment Progress

Memento and Discovery

Memento and Branding

Alternative Web Archiving Strategies

Page 36: Update on Memento (IIPC 2011 Plenary)

Crawl-based Archives host distinct observations.

Transactional Archives never miss an update.

Page 37: Update on Memento (IIPC 2011 Plenary)

Crawl-Based Web Archives

Distinct Observations are Archived for Many Servers

Page 38: Update on Memento (IIPC 2011 Plenary)

Server-Side Transactional Web Archives

Entire Change History is Archived for a Single Server

Page 39: Update on Memento (IIPC 2011 Plenary)

Development of Transactional Web Archive Software

Access:

•  Online, real-time access via Memento TimeGates

•  Batch export via WARC files for long-term preservation

Capture:

•  Apache connection filter module captures URI, headers, body

•  POSTs in real time to the transactional archive
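Because a transactional archive records every change, a TimeGate in front of it can answer with the version that was live at the requested datetime rather than the nearest crawl observation. The sketch below shows that lookup under stated assumptions: the history is kept as a list of (datetime, version URI) pairs sorted by time, and the function name and URIs are hypothetical, not part of the software described on the slide.

```python
from bisect import bisect_right
from datetime import datetime, timezone


def version_at(history, when):
    """Return the URI of the version that was live at `when`.

    `history` is a list of (changed_at, version_uri) pairs sorted by
    changed_at; each version stays live until the next change.
    Returns None if `when` predates the first recorded version.
    """
    change_times = [changed_at for changed_at, _ in history]
    # Index of the first change strictly after `when`.
    i = bisect_right(change_times, when)
    return history[i - 1][1] if i else None


def utc(*args):
    return datetime(*args, tzinfo=timezone.utc)


# Hypothetical change history for one URI on a single server.
history = [
    (utc(2011, 1, 1), "http://ta.example.org/v/1"),
    (utc(2011, 3, 1), "http://ta.example.org/v/2"),
    (utc(2011, 4, 15), "http://ta.example.org/v/3"),
]
print(version_at(history, utc(2011, 4, 1)))
```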

Page 40: Update on Memento (IIPC 2011 Plenary)

Update on Memento http://mementoweb.org/

Herbert Van de Sompel Robert Sanderson Michael L. Nelson

Towards Seamless Navigation of the Web of the Past