LOTS OF COPIES KEEP STUFF SAFE
Lots of LOCKSS Keeping Stuff Safe: The Future of the LOCKSS Program
Nicholas Taylor (@nullhandle)
Program Manager for LOCKSS and Web Archiving
Stanford University Libraries
CNI Fall 2016 Membership Meeting
12 December 2016
why more LOCKSS?
• mature, community-validated technology
• research-based + built to a specific threat model
• web-centric preservation for web-centric scholarship
• community-centric preservation for collective challenges + opportunities
• robust, distributed digital preservation
“Cologne Love Padlocks” by orkomedix under CC BY-NC-SA 2.0
Program History
“Grant Park” by xelipe under CC BY-NC-SA 2.0
inception
• a serials librarian + a computer scientist
• print journals → Web
• conserve library’s role as preserver
• collect from publishers’ websites
• preserve w/ cheap, distributed, library-managed hardware
• disseminate when unavailable from publisher
Chris Dobson: “From Bright Idea to Beta Test”
philosophy + focus
• lots of copies keep stuff safe
• preservation is an active community effort
• lots of communities keep stuff safe
• enable communities to preserve + access their scholarly record
“Le Penseur” by Ian Abbott under CC BY-NC-SA 2.0
present day
• financially self-sustaining
• tens of networks
• hundreds of institutions
• all types of content
“LOCKSS | Lots of Copies Keep Stuff Safe”
looking forward
• organizational changes
• software evolution
• LOCKSS networks
• distributed digital preservation
“Looking for a brighter Future?” by Vincent Brassinne under CC BY-NC-ND 2.0
“Olympic Relay Handoff” by Dr. Mark Kubert under CC BY-NC-ND 2.0
Organizational Changes
David + Vicky
American Library Association: “Victoria Reich and David S. H. Rosenthal”
personal introduction
• 10 years in research libraries:
  • Stanford University Libraries (2013–present)
  • Library of Congress (2010–2013)
  • U.S. Supreme Court (2007–2010)
• professional background:
  • web archives
  • digital library services
  • library technology
• what I care about:
  • scalability + sustainability of PLNs, CLOCKSS
  • mainstreaming LOCKSS for digital preservation
  • building collaborative technical communities
SUL Web Archiving
• end-to-end service:
  • collect
  • preserve
  • make accessible
  • make discoverable
• integrate w/ collection development
• use cases:
  • scholarly inputs/outputs
  • institutional legacy/compliance
  • government information
Internet Archive: “Stanford University Homepage”
LOCKSS + DLSS administrativa
• LOCKSS integrating w/ SUL Digital Library Systems & Services (DLSS)
  • led by Tom Cramer, Director & Associate University Librarian
  • LOCKSS + SUL Web Archiving, under Nicholas Taylor
“SPO.101514.SLIDERlathrop.jpg” by Michael Hong
LOCKSS + DLSS synergies
• realize operational efficiencies
• adopt, drive shared engineering best practices
• promote API-oriented architectures
• streamline repository → PLN data hand-offs
• contribute upstream to shared tools
• broaden, diversify community outreach
Software Evolution
“Why we love our macs” by Jason Corneveaux under CC BY-NC-ND 2.0
new functionality
• supported by Mellon Foundation grant
• ingest/harvest
  • form-filling
  • AJAX
• dissemination
  • Memento
  • Shibboleth
• preservation
  • polling performance
“1.13.09: versatility” by Team Dalog under CC BY 2.0
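Memento (RFC 7089) lets a client request a resource as it existed at a given time: the client sends an Accept-Datetime header to a TimeGate, which redirects to a suitable archived copy (a memento). The following is a minimal Python sketch of the TimeGate's selection step, assuming a "closest capture" policy (the RFC leaves the exact policy to the server) and a hypothetical in-memory index of captures; the function name and archive URIs are illustrative, not LOCKSS code.

```python
from collections.abc import Mapping
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def select_memento(accept_datetime: str, captures: Mapping[datetime, str]) -> tuple[str, str]:
    """Pick the capture closest to the requested time (one possible TimeGate policy).

    `accept_datetime` is an RFC 1123 date as sent in the Accept-Datetime header.
    `captures` maps timezone-aware UTC capture times to memento URIs.
    Returns (memento URI, value for that memento's Memento-Datetime header).
    """
    target = parsedate_to_datetime(accept_datetime)  # "GMT" dates parse as UTC-aware
    closest = min(captures, key=lambda captured: abs(captured - target))
    return captures[closest], format_datetime(closest, usegmt=True)


# Hypothetical index of captures for http://example.com/ (illustrative URIs only).
captures = {
    datetime(2010, 1, 1, tzinfo=timezone.utc): "https://archive.example.org/20100101000000/http://example.com/",
    datetime(2015, 6, 1, tzinfo=timezone.utc): "https://archive.example.org/20150601000000/http://example.com/",
}
```

For example, `select_memento("Tue, 20 Mar 2012 20:35:00 GMT", captures)` picks the 2010 capture, since it is closer in time than the 2015 one, and returns its URI together with `"Fri, 01 Jan 2010 00:00:00 GMT"`.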
new architecture
• existing functionality
  • discrete components as web services
  • incorporate external software
“San Francisco Oakland Bay Bridge, East Spans New and Old” by Shanan under CC BY-NC 2.0
web services imperative
1. “All teams will henceforth expose their data and functionality through service interfaces.”
2. “Teams must communicate with each other through these interfaces.”
3. “There will be no other form of interprocess communication allowed: no direct linking, no direct reads of another team's data store, no shared-memory model, no back-doors whatsoever.”
4. “All service interfaces, without exception, must be designed from the ground up to be externalizable. That is to say, the team must plan and design to be able to expose the interface to developers in the outside world.”
5. “Anyone who doesn't do this will be fired.”
Steve Yegge: “Stevey's Google Platforms Rant”
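To make the imperative concrete for a LOCKSS-style component, here is a minimal sketch, using only the Python standard library, of a hypothetical repository service that exposes its data solely through an HTTP/JSON interface, plus a client that uses nothing but that interface. The `/artifacts/<id>` route, the record fields, and all function names are assumptions for illustration; they are not the LOCKSS web service API.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from urllib.request import urlopen

# Hypothetical data store owned by the "repository" team. Other components never
# read this dict directly; they only see what the HTTP interface returns.
_ARTIFACTS = {
    "a1": {"uri": "http://example.com/", "collected": "2016-12-01"},
}


class ArtifactHandler(BaseHTTPRequestHandler):
    """Serves GET /artifacts/<id> as JSON (a hypothetical route)."""

    def do_GET(self):
        prefix = "/artifacts/"
        artifact_id = self.path[len(prefix):] if self.path.startswith(prefix) else ""
        record = _ARTIFACTS.get(artifact_id)
        body = json.dumps(record if record is not None else {"error": "not found"}).encode("utf-8")
        self.send_response(200 if record is not None else 404)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence per-request logging in this sketch


def start_service(port: int = 0) -> ThreadingHTTPServer:
    """Start the service on localhost; port 0 asks the OS for a free port."""
    server = ThreadingHTTPServer(("127.0.0.1", port), ArtifactHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def fetch_artifact(base_url: str, artifact_id: str) -> dict:
    """Client side: talks to the repository only through its service interface."""
    with urlopen(f"{base_url}/artifacts/{artifact_id}") as response:
        return json.load(response)
```

Usage: `server = start_service()`, then `fetch_artifact(f"http://127.0.0.1:{server.server_address[1]}", "a1")` returns the stored record; `server.shutdown()` stops the service. Because the client depends only on the route and JSON shape, the repository could swap its storage backend without any client change, which is the point of rules 2 and 3.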
risk of large projects
small projects (<$1 million): 76% successful, 20% challenged, 4% failed
large projects (>$10 million): 10% successful, 52% challenged, 38% failed
Standish Group: “Chaos Manifesto 2013: Think Big, Act Small”
successful = on time, on budget; challenged = late, over budget, lacking functionality; failed = cancelled, or delivered and never used
Based on an 8-year survey of 50,000 software projects by the Standish Group.
why re-architect LOCKSS?
• reduce support + operations costs
  • leverage web-scale open-source software
  • align w/ web archiving mainstream
• de-silo components + enable external integration
  • metadata extraction
  • archive access via DOI + OpenURL
  • polling + repair protocol
• prepare to evolve w/ the Web
  • web services architecture as flexible foundation
integration opportunities
• polling + repair
  • repository replication layer
  • other distributed digital preservation systems
• access
  • Dockerized full-text search for web archives
  • DOI + OpenURL access to web archives
• metadata extraction
“A Different Kind of Weave” by Barbara Courouble under CC BY-NC 2.0
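The idea behind polling and repair can be shown with a deliberately simplified Python sketch: each peer hashes its copy together with a fresh poll nonce, the hashes are tallied, and peers in the minority are repaired from a copy that matches a strict majority. This is an illustration under stated assumptions, not the LOCKSS polling protocol, which samples peers, spreads polls over time, and defends against adversarial voters; the function names and the strict-majority rule are hypothetical.

```python
import hashlib
from collections import Counter


def vote(nonce: bytes, content: bytes) -> str:
    # Hashing a fresh per-poll nonce with the content means a peer cannot answer
    # from a previously computed hash; it has to hold the content itself.
    return hashlib.sha256(nonce + content).hexdigest()


def poll_and_repair(copies: dict[str, bytes], nonce: bytes) -> dict[str, bytes]:
    """Tally votes from every peer; repair peers that disagree with a strict majority.

    Returns a new mapping of peer name -> content. If no hash wins a strict
    majority, the copies are returned unchanged (a real system would raise an alarm).
    """
    votes = {peer: vote(nonce, content) for peer, content in copies.items()}
    winning_hash, count = Counter(votes.values()).most_common(1)[0]
    if count * 2 <= len(copies):
        return dict(copies)
    good_copy = next(copies[peer] for peer, h in votes.items() if h == winning_hash)
    return {
        peer: content if votes[peer] == winning_hash else good_copy
        for peer, content in copies.items()
    }
```

With three peers where one copy has suffered bit rot, the two agreeing peers form the majority and the damaged copy is replaced; with two peers that disagree, nothing is repaired because neither side has a majority.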
aligning with web archiving
• Web ARChive (WARC) format
• compatible technologies:
  • Heritrix
  • OpenWayback
  • WarcBase
  • Web Archiving Proxy
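The WARC format (ISO 28500) stores each capture as a record: a version line, named header fields, a blank line, the content block, and two CRLFs. A minimal Python sketch of serializing one uncompressed `response` record follows. It is illustrative only: crawlers such as Heritrix also write request and metadata records and usually gzip each record, and the function name is hypothetical.

```python
import base64
import hashlib
import uuid
from datetime import datetime, timezone
from typing import Optional


def warc_response_record(target_uri: str, http_response: bytes,
                         captured: Optional[datetime] = None) -> bytes:
    """Serialize one uncompressed WARC/1.0 'response' record."""
    captured = (captured or datetime.now(timezone.utc)).astimezone(timezone.utc)
    # Block digest in the commonly used "sha1:<base32>" form.
    digest = base64.b32encode(hashlib.sha1(http_response).digest()).decode("ascii")
    header_lines = [
        "WARC/1.0",
        "WARC-Type: response",
        f"WARC-Record-ID: <urn:uuid:{uuid.uuid4()}>",
        f"WARC-Date: {captured.strftime('%Y-%m-%dT%H:%M:%SZ')}",
        f"WARC-Target-URI: {target_uri}",
        f"WARC-Block-Digest: sha1:{digest}",
        "Content-Type: application/http; msgtype=response",
        f"Content-Length: {len(http_response)}",
    ]
    head = ("\r\n".join(header_lines) + "\r\n\r\n").encode("utf-8")
    # The content block is followed by two CRLFs that separate it from the next record.
    return head + http_response + b"\r\n\r\n"
```

Because the block is the raw HTTP response (status line, headers, and body), any WARC-aware tool, from OpenWayback to a LOCKSS repository, can replay it without knowing which crawler wrote it.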
web archiving system APIs (WASAPI)
leveraging community components
development progress
• access WARC-stored content via:
  • DOI
  • OpenURL
  • URL
  • Solr full-text search
• web services:
  • metadata extraction
  • metadata database
“Milestones” by Dheeraj Nagwani under CC BY-NC-ND 2.0
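Access "via DOI" and "via OpenURL" means accepting the link formats that libraries already emit. As a hedged illustration (not the LOCKSS implementation), this Python sketch builds a DOI proxy link and an OpenURL 1.0 (ANSI/NISO Z39.88-2004) key/encoded-value link for a journal article; the resolver base URL, the DOI, and the function names are hypothetical.

```python
from typing import Optional
from urllib.parse import quote, urlencode


def doi_link(doi: str) -> str:
    """Link through the DOI proxy, which redirects to the registered location."""
    return "https://doi.org/" + quote(doi, safe="/")


def openurl_link(resolver_base: str, doi: Optional[str] = None, **fields: str) -> str:
    """Build an OpenURL 1.0 KEV link for a journal article.

    Extra keyword arguments become rft.* keys, e.g. issn="1234-5678" -> rft.issn.
    """
    params = [
        ("url_ver", "Z39.88-2004"),
        ("url_ctx_fmt", "info:ofi/fmt:kev:mtx:ctx"),
        ("rft_val_fmt", "info:ofi/fmt:kev:mtx:journal"),
    ]
    if doi:
        params.append(("rft_id", f"info:doi/{doi}"))
    params.extend((f"rft.{key}", str(value)) for key, value in fields.items())
    return f"{resolver_base}?{urlencode(params)}"
```

For example, `openurl_link("https://resolver.example.org/openurl", doi="10.1234/example", issn="1234-5678", volume="22")` yields a query string containing `rft_id=info%3Adoi%2F10.1234%2Fexample` and `rft.volume=22`; an archive that understands such links can serve its preserved copy when the publisher's site is unavailable.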
product roadmap
• 2017
  • Docker-ize components
  • web harvest framework
  • polling + repair web service
  • release to PLNs
• 2018
  • IP address + Shibboleth access via OpenWayback
  • OpenWayback format negotiation framework
  • full-text search web service
  • release to GLN
“Printemps, work in progress” by Eric Gjerde under CC BY-NC 2.0
LOCKSS Networks
“Railroad Wye Switch” by Noel Hankamer under CC BY-NC-SA 2.0
Controlled LOCKSS (CLOCKSS)
• what is it?
  • library/publisher partnership
  • preserve the scholarly record
  • 12 globally-distributed nodes
  • dark until no longer accessible
  • triggered content world-accessible
• looking forward
  • expand capacity
  • increase pursuit of long tail
  • champion standards to simplify archiving (e.g., Signposting)
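Signposting exposes relationships of a scholarly object as typed HTTP `Link` headers, for example `rel="cite-as"` for the persistent identifier and `rel="item"` for content files, so an archiving crawler can find the right files without site-specific rules. A hedged Python sketch of reading those links follows; the regex-based parser is simplified (it does not handle commas or semicolons inside quoted parameter values), and the URLs in the usage are hypothetical.

```python
import re

# One link-value: <target> followed by zero or more ";param=value" pairs.
_LINK = re.compile(r'<([^>]*)>((?:\s*;\s*[^;,]+)*)')


def parse_link_header(value: str) -> list[tuple[str, dict[str, str]]]:
    """Split a Link header into (target URI, parameters) pairs."""
    links = []
    for target, raw_params in _LINK.findall(value):
        params = {}
        for param in raw_params.split(";"):
            if "=" in param:
                key, _, val = param.partition("=")
                params[key.strip().lower()] = val.strip().strip('"')
        links.append((target, params))
    return links


def links_with_rel(value: str, rel: str) -> list[str]:
    """Targets whose rel parameter includes `rel` (rel may list several types)."""
    return [target for target, params in parse_link_header(value)
            if rel in params.get("rel", "").split()]
```

Given a header such as `<https://doi.org/10.1234/example>; rel="cite-as", <https://publisher.example.org/article.pdf>; rel="item"; type="application/pdf"`, `links_with_rel(header, "item")` returns the PDF URL, which is exactly the file an archive would want to collect.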
Private LOCKSS Networks (PLNs)
• what are they?
  • community of interest
  • jointly designate content
  • run distributed nodes
  • establish governance
  • preservation via diverse technologies, institutions, networks
• looking forward
  • create documentation
  • enable self-setup
  • support community collaboration
  • preserve web archives
national networks
• what are they?
  • in-country preservation
  • local stewardship
  • perpetual access
  • non-consumptive use
• looking forward
  • more networks
  • preserving national long-tail content
“1951 World Map” by peonyandthistle under CC BY-NC-ND 2.0
Distributed Preservation
“Catho longtime [explored]” by Bill Collison under CC BY-NC 2.0
distributed preservation landscape
• better understanding of role of distributed dark archives
• next logical step beyond mature local preservation
• appealing option for those w/o mature local preservation
a greater role for LOCKSS?
• bolster existing efforts
  • undergird PLN service providers
• mainstream distributed digital preservation
“DSCN7867” by tyalis_2 under CC BY-NC-ND 2.0
LOCKSS for web archiving
• growth in web archiving
  • centralization in web archiving
• native WARC support
  • logical complement for web archive preservation
NDSA: “Web Archiving in the United States”
reliance on service provider
Local / External / Both, by survey year:
2011: local 25.40%, external 60.32%, both 14.29%
2013: local 19.51%, external 63.41%, both 15.85%
2015: local 4.81%, external 63.29%, both 30.38%
NDSA: “2016 NDSA Web Archiving Survey”
flat data transfer trend
Transfer data / Do not transfer data, by survey year:
2011: transfer data 19.15%, do not transfer data 80.85%
2013: transfer data 20.29%, do not transfer data 79.71%
2015: transfer data 20.27%, do not transfer data 79.73%
NDSA: “2016 NDSA Web Archiving Survey”
Recap
“Rearview” by jenkinson2455 under CC BY-NC 2.0
vision
• better ensure the preservation of web archives
• LOCKSS team more actively engaged in community-supported development efforts
• communities enabled to more easily contribute to LOCKSS software, or run it w/o our help
• a longer tail of institutions able to capitalize on distributed digital preservation
• LOCKSS components applied in contexts other than LOCKSS networks
Questions?
“stanford dish at sunset” by Dan under CC BY-NC-SA 2.0