ResourceSync - An Introduction
Todd Carpenter, Executive Director, NISO
ALCTS Continuing Resources Standards Forum, Sunday, June 24, 2012
With thanks to Herbert Van de Sompel and Robert Sanderson (LANL)
@TAC_NISO Twitter Highlights
• Presenting this morning on the ResourceSync project at ALCTS Continuing Resources Standards Forum #ALCTSCRS #ala12
• I’m pre-tweeting my slides during #rsync presentation. Slides here: _________ #ala12
• NISO's mission is to develop and maintain technical standards related to information, documentation, discovery and distribution of content #ala12
• Standards are all around us, even if we don't notice them, especially in books. Things like page numbers, paper, binding, even spelling are standardized. #NISO #ala12
• Machines don’t talk like people do. Then again some people don’t talk like other people do, particularly teenagers #ala12
• So where did the ResourceSync project start? #NISO approached OAI about updating the PMH protocol. #ala12
• The #NISO / OAI ResourceSync project was possible through the generous support of the Alfred P. Sloan Foundation. Thank you! #ala12
• What is RSync trying to solve? The Source server has resources that change. Destination servers want to leverage some or all of the Source on a regular, ongoing basis in near-real-time & at web scale. #ala12
• Synchronization can be good enough or perfect, and synchronization can be fast or fast enough. #ala12
• RSync is studying a number of existing protocols to determine which protocol (or combination of protocols) can best meet needs. We have a bias against developing a new spec from scratch. #ala12
• There are several models for synchronizing content: pull, push, conditional pull, mediated feed and pull, and a mix of feed/push/pull/service models. #ala12
• The goal of ResourceSync is to find the model that most efficiently distributes the content, while limiting the tax on the source system. #ala12
• This is very early days in the process of standards development. We’re still in the incubation stage. Consensus and adoption phases will come in 2013 and beyond. #ala12
• We hope to have a beta specification of ResourceSync available by the end of 2012 #ala12
About
Non-profit industry trade association accredited by ANSI
Mission of developing and maintaining technical standards related to information, documentation, discovery and distribution of published materials and media
Volunteer-driven organization: 400+ spread out across the world
June 23, 2012 ALCTS CRS Standards IG - ALA Annual 2012
Standards are familiar, even if you don’t notice
Machines don’t talk like people do
Machines talk like this
How did we get here?
• OAI-PMH Protocol
– Developed in 200X
– Developed by Herbert van de Sompel, Carl Lagoze and the OAI team
– Fairly wide adoption in scholarly community
• In spring 2011, NISO approached OAI to discuss updating PMH Protocol
• Response was “Let’s try something else more in line with more modern technology”
A partnership is born
• Agreement to launch RSync as a NISO standards process
• Partnership on grant application
• OAI team comprised core technology team
Special thanks are due to...
What are we trying to solve?
Consideration:
Source (server) A has resources that change over time: they get created, modified, deleted, moved, …
Destination (servers) X, Y, and Z leverage (some) resources of Source A.
Problem:
Destinations want to keep in step with the resource changes at Source A: resource synchronization.
Task of ResourceSync effort:
Design an approach for resource synchronization aligned with the Web Architecture that has a fair chance of adoption by different communities.
The approach must scale better than recurrent HTTP HEAD/GET on resources.
Use cases differ
How good is the synchronization? Perfect or good enough
How fast is the synchronization? Fast or fast enough
3 distinct needs regarding resource synchronization
Baseline matching: An approach to allow a Destination that wants to start synchronizing with a Source to perform an initial catch up – Dump.
Incremental resource synchronization: An approach to allow a Destination to remain up-to-date regarding changes at the Source.
Audit: An approach to allow checking whether a Destination is in sync with a Source – Inventory.
=> All 3 are considered in scope for the effort
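The three needs can be made concrete with a small sketch. It is illustrative only and not taken from any ResourceSync draft: Source and Destination are modeled as plain dictionaries mapping a URI to its content, and the function names and the use of MD5 digests for the inventory are assumptions.

```python
import hashlib

def digest(content: str) -> str:
    """Fingerprint of a resource's content (MD5 chosen only for brevity)."""
    return hashlib.md5(content.encode("utf-8")).hexdigest()

def baseline(source: dict) -> dict:
    """Baseline matching: copy everything the Source currently holds (a dump)."""
    return dict(source)

def apply_changes(destination: dict, changes: list) -> None:
    """Incremental synchronization: replay create/update/delete events in order."""
    for kind, uri, content in changes:
        if kind == "delete":
            destination.pop(uri, None)
        else:  # "create" or "update"
            destination[uri] = content

def audit(source: dict, destination: dict) -> set:
    """Audit: compare inventories (URI -> digest) and return the URIs that differ."""
    src_inv = {uri: digest(c) for uri, c in source.items()}
    dst_inv = {uri: digest(c) for uri, c in destination.items()}
    return {uri for uri in src_inv.keys() | dst_inv.keys()
            if src_inv.get(uri) != dst_inv.get(uri)}

source = {"/a": "alpha", "/b": "beta"}
destination = baseline(source)

# The Source changes: one update, one delete, one create.
source["/a"] = "alpha v2"
del source["/b"]
source["/c"] = "gamma"
out_of_sync_before = audit(source, destination)

# The same events are replayed at the Destination.
apply_changes(destination, [("update", "/a", "alpha v2"),
                            ("delete", "/b", None),
                            ("create", "/c", "gamma")])
out_of_sync_after = audit(source, destination)
```

Before the changes are replayed, the audit reports all three URIs as out of sync; afterwards it reports none.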
ResourceSync Working Group
Herbert Van de Sompel (Chair), Los Alamos National Laboratory
Todd Carpenter (Co-Chair), National Information Standards Organization (NISO)
Nettie Lagace, National Information Standards Organization (NISO)
Manuel Bernhardt, Delving B.V.
Kevin Ford, Library of Congress
Bernhard Haslhofer, Cornell University
Richard Jones, Joint Information Systems Committee (JISC)
Martin Klein, Los Alamos National Laboratory
Graham Klyne, Joint Information Systems Committee (JISC)
Carl Lagoze, Cornell University
Stuart Lewis, Joint Information Systems Committee (JISC)
Peter Murray, Lyrasis
Michael Nelson, Old Dominion University
David Rosenthal, Stanford University
Christian Sadilek, Red Hat
Shlomo Sanders, Ex Libris, Inc.
Robert Sanderson, Los Alamos National Laboratory
Sjoerd Siebinga, Delving B.V.
Ed Summers, Library of Congress
Simeon Warner, Cornell University
Jeff Young, OCLC Online Computer Library Center
8/23/11 Data Attribution and Citation Workshop
http://imgs.xkcd.com/comics/standards.png/
Change Notification - Protocols
• Atom PubSubHubbub (PuSH)
• XMPP
– PubSub extension
– BoSH (XMPP over HTTP)
• Comet / HTTP Streaming
– Open an HTTP connection and keep reading from it
– Bayeux Protocol
• Long Polling
– Keep HTTP connection open until a message, then reopen
– BoSH, Bayeux option
• WebSockets
– NullMQ / ZeroMQ
– XMPP over WebSockets?
Incremental Synchronization
Change Notification (CN)
Alert that something happened (create, update, delete)
Content Transfer (CT)
Transfer of just the change or the full resource
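A rough sketch of the distinction, under stated assumptions: the class names, the fields, and the choice of "text appended to the end" as the form of a partial change are all hypothetical and only serve to separate the two concerns.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ChangeNotification:
    """CN: says that something happened, carries no content."""
    kind: str   # "create", "update" or "delete"
    uri: str

@dataclass
class ContentTransfer:
    """CT: carries either the full resource or just the change."""
    uri: str
    full: Optional[str] = None       # the whole resource, or
    appended: Optional[str] = None   # just the change (here: appended text)

def apply_transfer(store: dict, ct: ContentTransfer) -> None:
    """Apply a content transfer to the Destination's local store."""
    if ct.full is not None:
        store[ct.uri] = ct.full
    elif ct.appended is not None:
        store[ct.uri] = store.get(ct.uri, "") + ct.appended

store = {"/log": "line 1\n"}
note = ChangeNotification("update", "/log")   # the alert alone changes nothing
apply_transfer(store, ContentTransfer("/log", appended="line 2\n"))  # just the change
apply_transfer(store, ContentTransfer("/page", full="hello"))        # full resource
```

A notification on its own leaves the Destination's copy untouched; only the content transfer, in either form, brings it up to date.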
Trivial versus Optimal Approaches
• Trivial Approach - Retrieve & Compare
• Optimal Approach - push only the change to only the destinations monitoring the resource
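The contrast between the two approaches can be sketched as follows. This is a toy in-memory model, not a protocol: `retrieve_and_compare` and `PushingSource` are hypothetical names, and counting fetches and messages stands in for network cost.

```python
from collections import defaultdict

def retrieve_and_compare(source: dict, copy: dict) -> int:
    """Trivial approach: fetch every resource and compare; returns the fetch count."""
    fetches = 0
    for uri, content in source.items():
        fetches += 1                    # one retrieval per resource, changed or not
        if copy.get(uri) != content:
            copy[uri] = content
    for uri in set(copy) - set(source):
        del copy[uri]                   # resource no longer exists at the Source
    return fetches

class PushingSource:
    """Optimal approach: push a change only to Destinations monitoring that URI."""
    def __init__(self):
        self.resources = {}
        self.watchers = defaultdict(list)   # uri -> list of Destination stores

    def subscribe(self, uri: str, destination: dict) -> None:
        self.watchers[uri].append(destination)

    def update(self, uri: str, content: str) -> int:
        self.resources[uri] = content
        for destination in self.watchers[uri]:
            destination[uri] = content
        return len(self.watchers[uri])      # number of messages sent

# Trivial: one change among 100 resources still costs 100 retrievals.
source = {f"/r{i}": "v1" for i in range(100)}
copy = dict(source)
source["/r7"] = "v2"
fetches = retrieve_and_compare(source, copy)

# Optimal: the same change costs one message, sent only to the interested party.
pusher = PushingSource()
x, y = {}, {}
pusher.subscribe("/r7", x)
pusher.subscribe("/r8", y)
messages = pusher.update("/r7", "v2")
```

In this model the trivial approach retrieves all 100 resources to discover one change, while the push model sends a single message and leaves the Destination watching a different resource alone.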
More advanced options
• Trivial Approach plus Conditional GET:
– Retrieve every resource if it has changed
– Essentially this is a Change Notification Pull
– Not scalable, strain on Source Systems, no way to know of new resources
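A minimal simulation of the Conditional GET variant, assuming an ETag-style validator: the `Source` class below is a stand-in for an HTTP server (no real network calls are made), and the 200/304 status codes mirror what an `If-None-Match` request would return.

```python
import hashlib
from typing import Optional

class Source:
    """Stand-in for an HTTP server that supports ETag validation."""
    def __init__(self, resources: dict):
        self.resources = resources
        self.bodies_sent = 0

    def get(self, uri: str, if_none_match: Optional[str] = None):
        """Return (status, etag, body); a 304 response carries no body."""
        body = self.resources[uri]
        etag = hashlib.sha1(body.encode("utf-8")).hexdigest()
        if if_none_match == etag:
            return 304, etag, None
        self.bodies_sent += 1
        return 200, etag, body

def poll(source: Source, known_uris: list, cache: dict) -> int:
    """Ask about every known URI; return the number of requests made."""
    for uri in known_uris:
        etag, _ = cache.get(uri, (None, None))
        status, new_etag, body = source.get(uri, etag)
        if status == 200:
            cache[uri] = (new_etag, body)
    return len(known_uris)

source = Source({"/a": "one", "/b": "two"})
cache = {}
known = ["/a", "/b"]
poll(source, known, cache)               # first pass: two full responses

source.resources["/a"] = "one v2"
source.resources["/c"] = "new"           # the Destination cannot learn about /c
requests = poll(source, known, cache)    # still one request per known resource
```

The second pass transfers only the changed body, but it still costs one request per known resource, and the newly created `/c` is never discovered.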
More advanced options
Simplest Workable Model:
Introduce a Feed of change notifications for all resources
Atom, RSS, OAI-PMH, SiteMaps, etc.
=> Still not efficient, no way to know when to pull
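The feed model might look like this in miniature. The feed is modeled as a list of dictionaries with a sequence number, which is an assumption made for illustration and not the format of Atom, RSS, OAI-PMH or Sitemaps.

```python
def read_feed(feed: list, last_seen: int) -> list:
    """Return the feed entries newer than the last sequence number processed."""
    return [entry for entry in feed if entry["seq"] > last_seen]

def sync_from_feed(feed: list, source: dict, copy: dict, last_seen: int) -> int:
    """Pull only the resources the feed names; return the new high-water mark."""
    for entry in read_feed(feed, last_seen):
        uri = entry["uri"]
        if entry["kind"] == "delete":
            copy.pop(uri, None)
        else:                            # "create" or "update"
            copy[uri] = source[uri]
        last_seen = entry["seq"]
    return last_seen

source = {"/a": "one v2", "/c": "new"}
copy = {"/a": "one", "/b": "two"}
feed = [
    {"seq": 1, "kind": "update", "uri": "/a"},
    {"seq": 2, "kind": "delete", "uri": "/b"},
    {"seq": 3, "kind": "create", "uri": "/c"},
]

last = sync_from_feed(feed, source, copy, 0)
again = sync_from_feed(feed, source, copy, last)   # nothing new: wasted poll
```

New and deleted resources are now visible to the Destination, but it still has to guess when to read the feed, so a poll that finds nothing new is wasted effort.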
More advanced options
Feed Extension Solution:
Continue the Feed paradigm, but introduce aggregating service and ping notification to re-pull (simulated push)
Only advantageous if Source already supports a Feed
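A sketch of the ping-then-re-pull idea, with loud caveats: `Hub` and `Destination` are hypothetical classes written for this example, not the API of PubSubHubbub or any other real service, and the "feed" is just a shared in-memory list.

```python
class Hub:
    """Aggregating service: relays a 'feed has changed' ping to subscribers."""
    def __init__(self):
        self.callbacks = []

    def subscribe(self, callback) -> None:
        self.callbacks.append(callback)

    def ping(self, feed_name: str) -> None:
        for callback in self.callbacks:
            callback(feed_name)

class Destination:
    """Re-pulls a feed only when pinged, instead of polling on a timer."""
    def __init__(self, feeds: dict):
        self.feeds = feeds       # feed name -> list of entries kept by the Source
        self.seen = 0            # number of entries already processed
        self.pulls = 0
        self.received = []

    def on_ping(self, feed_name: str) -> None:
        self.pulls += 1
        entries = self.feeds[feed_name]
        self.received.extend(entries[self.seen:])
        self.seen = len(entries)

feeds = {"changes": []}
hub = Hub()
dest = Destination(feeds)
hub.subscribe(dest.on_ping)

# The Source appends to its feed, then pings the hub.
feeds["changes"].append(("update", "/a"))
hub.ping("changes")
feeds["changes"].append(("create", "/c"))
hub.ping("changes")
```

Each ping triggers exactly one pull, so the Destination no longer needs to guess when to read the feed; the content itself is still pulled, which is why the slide calls this a simulated push.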
The lifecycle of standards
You are here
Ongoing Research
• Change Notification - XMPP & XMPP PubSub & bleeps
– LANL
– Ongoing Experiment with Live DBPedia
• Change Notification - Comet / HTTP Streaming & bleeps
– ODU
– Bayeux Protocol via Faye Implementation
• Change Notification - Change Simulator
– Cornell U
– Generate configurable change notifications
– Use as standardized input to different systems for testing
• Baseline Matching & Audit
– Cornell U
Timeline
• Project Launch = November 2011
• Approved work item = December 2011
• Working Group formed = February 2012
• Webinar on project = March 2012
• JCDL meeting, Washington DC = June 2012
• Alpha = ?? September 2012
• Beta/Draft for trial use = ?? December 2012
• Comment period = ?? December 2012 - March 2013
• Training = ?? May - July 2013
• Approval = ?? December 2013
Thank you!
Todd Carpenter, Executive Director
[email protected]
National Information Standards Organization (NISO)
One North Charles Street, Suite 1905
Baltimore, MD 21201 USA
+1 (301) 654-2512
www.niso.org
NOTE =>NISO IS MOVING IN JULY 2012 <=