Archiving Websites Containing Streaming Media
Howard Besser, NYU
http://besser.tsoa.nyu.edu/howard/Talks/
Besser-IS&TArchiving16/5/2017 1
Archiving Websites Containing Streaming Media
• Background issues and problems
• Archiving Young Composer websites
– Our Technical Collaboration
– Our Collaboration with Content Creators (preservation, access control, rights, agreements)
– Workflows
– How things will look
– Evaluation
• Impact beyond this Project
BACKGROUND ISSUES AND PROBLEMS
Web Archiving poses challenges
• Any given web page may be updated frequently
• Web links constantly break (404 errors)
• Few tools/services exist for “Curated” web archiving (Archive-It, CDL’s WAS), and they require significant training/experience to learn, but we do have int’l-accepted format (WARC)
Many parameters need to be set for Web Archiving
• Frequency of crawls
• Depth of crawls (# of hops)
• Starting points of crawls (seeds)
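To make the three parameters concrete, here is a minimal sketch of how seeds and hop depth bound a crawl. It walks a small in-memory link graph rather than the live web, and it is not how Archive-It or Heritrix are configured: `CrawlConfig`, `crawl`, the `SITE` graph, and the example.com-style URLs are all hypothetical names invented for illustration.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class CrawlConfig:
    seeds: list            # starting points of the crawl
    max_hops: int          # depth: how many links away from a seed to follow
    frequency_days: int    # how often the crawl would be repeated (not used by crawl())


def crawl(config, link_graph):
    """Breadth-first walk that stops max_hops links away from any seed.

    Returns a dict mapping each captured URL to its hop distance.
    """
    hops = {seed: 0 for seed in config.seeds}
    queue = deque(config.seeds)
    while queue:
        url = queue.popleft()
        if hops[url] == config.max_hops:
            continue  # at the depth limit: keep the page, do not follow its links
        for target in link_graph.get(url, []):
            if target not in hops:
                hops[target] = hops[url] + 1
                queue.append(target)
    return hops


# Hypothetical site: each key links to the pages in its list.
SITE = {
    "https://composer.example/": [
        "https://composer.example/works",
        "https://composer.example/bio",
    ],
    "https://composer.example/works": ["https://composer.example/works/audio"],
    "https://composer.example/works/audio": ["https://media.example/stream"],
}

config = CrawlConfig(seeds=["https://composer.example/"], max_hops=2, frequency_days=90)
captured = crawl(config, SITE)
```

With `max_hops=2` the audio page is captured, but the media it links to sits three hops from the seed and is left out. This is one way a depth setting alone can miss content hosted elsewhere.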
Other issues for developing good crawls
• Quality control/assurance
• Workflows
• Fidelity to original web pages
• How end user will navigate and view it
Archive-It
• The leading application/service for curated web archiving in North America
• Run by the Internet Archive, and is much more targeted and curated than their Wayback Machine
• Is based on Crawler software developed by IA (Heritrix) in 2003-2004
• Is very poor at capturing streaming audio or video as well as inserting it properly into a composed web page
Archive-It Issues w/ Streaming Media
Archive-It screenshots generated as part of our project
• By Lorena Ramirez-López
Archive-It Issues w/ Streaming Media
Firefox version 39.0. Screenshot of Tarik O’Regan’s site taken 2015/10/05
Firefox version 39.0. Screenshot of Ted Hearne’s website taken 2015/10/05
Some sources of streaming issues
• Problems with capturing resources residing on 3rd party services (YouTube, Vimeo, Soundcloud)
• Problems with how faithfully the A/V materials are captured and placed by Archive-It
• Problems with websites generated through site building platforms such as Squarespace
ARCHIVING YOUNG COMPOSER WEBSITES
Archiving Composer Websites
http://www.nyu.edu/about/news-publications/news/2015/03/27/nyu-libraries-to-team-with-internet-archive-to-preserve-high-quality-musical-content-on-the-web.html
• Collect, preserve, & make available Websites of Composers
• $480,000 grant from Mellon in 2015 to NYU Library/MIAP/Internet Archive
• Dealing with the issue that contemporary composer websites go up and down (and also incorporate relationship-building btwn composer and fans)
• Addressing the problems of collecting streaming media
• Also selectively collecting high-quality versions that are used to generate the streams, and allowing future researchers to see/hear the higher quality versions
Archiving Composer Websites
• Develop good and ongoing relationships btwn Libraries and Composers
• Develop Trust
– for developing collections, and continuing to add to them
– for Policy reasons
• Examine what type of errors take place
– how faithfully audiovisual materials are being captured
– how resources that reside on third-party web-services (YouTube, Vimeo, Soundcloud) are (not) displayed within Archive-It’s interface
– Issues w/ websites generated through site building platforms such as Squarespace
• Find ways to fix those errors
Metrics Accomplished (as of Jan 2017)
• 172 Composer sites crawled, scoped, assessed for quality, & analyzed for problems (feeding into IA development work)
• 800 QA/QC reports generated
• Initial web archiving agreement from 165 Composers (25 from NPR’s 100)
• Identified website infrastructures encountered and created a classification matrix
Website Infrastructure encountered
Project Team
• Jefferson Bailey (Internet Archive)
• Howard Besser (MIAP)
• Lori Donovan (Internet Archive)
• April Hathcock (Lib/ScholComm)
• Nicole Greenhouse (Lib/ACM)
• Carol Kassel (Lib/DLTS)
• Scott Statland (MIAP)
• Donald Mennerich (Lib/ACM/DLTS)
• David Millman (Lib/DLTS)
• Courtney Mumma (Internet Archive)
• Robin Preiss (Lib/AFC)
• Lorena Ramirez (MIAP) --- special thanks!
• Michael Stoller (Lib/C&RS)
• Kent Underwood (Lib/AFC)
• Chela Scott Weber (Lib/AFC)
OUR TECHNICAL COLLABORATION: CRAWLING
NYU/IA Collaboration
Traditional Crawlers
• Archive-It and other web archives use Heritrix
• Follow links, capture most web content
• Less successful with streaming video and dynamic content executed in the browser
• Umbra helps
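The gap between following links and executing content in a browser can be shown with a small sketch. It is a simplified stand-in for link-following and not Heritrix's actual extraction logic: it reads `href` and `src` attributes from the HTML as served, using Python's standard `html.parser`. The page, the `LinkExtractor` class, and the player URL are hypothetical.

```python
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collects URLs from href/src attributes in the HTML as served."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.urls.append(value)


# Hypothetical page: one ordinary link, plus a media player that only
# exists in the page after a browser has run the script.
PAGE = """
<html><body>
  <a href="/works.html">Works</a>
  <div id="player"></div>
  <script>
    var f = document.createElement("iframe");
    f.src = "https://player.example/embed/12345";
    document.getElementById("player").appendChild(f);
  </script>
</body></html>
"""

extractor = LinkExtractor()
extractor.feed(PAGE)
found = extractor.urls
```

Here `found` contains only `/works.html`. The embedded player never appears as markup, so a tool that reads the served HTML without running the script does not see it. Rendering the page in a real browser first is the approach the name Brozzler ("browser" plus "crawler") points to.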
BROZZLER!
“browser” | “crawler” = BROZZLER
Logo: Noah Levitt
Brozzler System Architecture v1
OUR COLLABORATION WITH CONTENT CREATORS
Young Composers Corpus
• Began with NPR’s 2011 list of “100 Composers Under 40”
• 91 of 100 have own self-contained sites
• As of 5/2016 have written agreements with 165 Composers (25 of them from NPR’s list)
• Will recruit 10 of them for enhanced archiving (uncompressed; better than what is on website)
– This will require an added appendix to contract/agreement (which may involve dark archiving and/or restricted access)
Building relationships with Composers
• Engage them with the idea of preserving their Website
• Are they willing to give us richer versions of content on their site?
• Are they willing to make all (or just part) of the content freely accessible? Do they want to embargo some content in a dark archive?
• Donor Agreement/Contract
Donor Agreement/Contract
• Have been working on this with lawyers for approximately one year
• Have had fairly stable language in it for 6 months, and 2 contracts already signed and returned
• Does default to allowing us complete rights for reformatting and for allowing researchers to see/hear all high quality versions at minimum on-site
– And thus far all Composers contacted have agreed to those principles (but not necessarily to the contractual language)
Long Term Preservation for Scholarship
Highest quality; future library processes
Composer indicates restrictions on Access
Who are certified users?
ARCHITECTURE & WORKFLOWS
Architecture & Workflows
• Full copy of all websites (incl richer content) stored in NYU Repository and accessible through NYU Finding Aids
• Metadata is in ArchivesSpace
• Connections built off of ArchivesSpace back-end API
Current Development work
• Supplying a separate audio player?
• Hiring a Digital Archivist
• Precise forms of navigation btwn ArchivesSpace, Archive-It, and richer content within NYU’s digital repository
• API
API
• What IA needs from NYU API
– API URL
– Credentials (username, password) -> Authentication Token()
– Repository ID
– Resource ID
• What IA will return as JSON array
– Unit Title
– Creator
– Data Expression
– Extent Statement
– Tech Characteristics
– [Something Based on Access Restriction, i.e. can it be streamed] ???
• We Speak Etruscan, 1993 May 21, 23.5 MB, 1 AIFF file Stereo uncompressed 16 bit/44.1K
• The Dream of Innocence III, 1998 March 26, 150 MB, 1 AIFF file Stereo uncompressed 16 bit/44.1K
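As a rough illustration of the record shape listed above, the following sketch parses a JSON array into typed records. The key spellings (`UnitTitle`, `DataExpression`, and so on) are assumptions based on the slide's field names, not a documented API, and `ResourceRecord` and `parse_records` are hypothetical helpers. The values are the two examples from the slide. `Creator` is left null because the slide's examples do not give it, and the access-restriction field is omitted because the slide marks it as undecided.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class ResourceRecord:
    unit_title: str
    creator: Optional[str]        # not given in the slide's examples
    data_expression: str          # field name as on the slide; holds a date in the examples
    extent_statement: str
    tech_characteristics: str


# Hypothetical response body: assumed key names, slide's example values.
SAMPLE_RESPONSE = """
[
  {"UnitTitle": "We Speak Etruscan", "Creator": null,
   "DataExpression": "1993 May 21", "ExtentStatement": "23.5 MB",
   "TechCharacteristics": "1 AIFF file Stereo uncompressed 16 bit/44.1K"},
  {"UnitTitle": "The Dream of Innocence III", "Creator": null,
   "DataExpression": "1998 March 26", "ExtentStatement": "150 MB",
   "TechCharacteristics": "1 AIFF file Stereo uncompressed 16 bit/44.1K"}
]
"""


def parse_records(body):
    """Turn the JSON array into a list of ResourceRecord objects."""
    return [
        ResourceRecord(
            unit_title=item["UnitTitle"],
            creator=item.get("Creator"),
            data_expression=item["DataExpression"],
            extent_statement=item["ExtentStatement"],
            tech_characteristics=item["TechCharacteristics"],
        )
        for item in json.loads(body)
    ]


records = parse_records(SAMPLE_RESPONSE)
```

A consumer could then render each record as a line such as "We Speak Etruscan, 1993 May 21, 23.5 MB", matching the examples on the slide.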
HOW THINGS MAY LOOK
User Queries
• User browses through Archive-It
• User sees that A/V content exists (and in some cases, it will include richer content, but some of that might be access-restricted)
• Archive-It hands off user to NYU (either directly to A/V content, or to Finding Aid)
One option for Queries
One option for high quality content
• On archived website page listing composer’s content, user sees a message that higher quality content is available, with:
– Access restrictions, if applicable
– Link to relevant finding aid
– (looking like following image)
EVALUATION
Evaluation for Improvement
• Composers and their satisfaction with the ways in which audiences will be able to view archives of their websites
• Researchers, and whether the content and functionality of these web archives works for them
• Tweaking what we do in order to better serve Creators and Researchers
Schedule and Methodology for Evaluation
• Dec 2017: Schedule one-on-one interviews with sets of composers and Researchers
• Jan-Mar 2018: One hour individual sessions with 10 Composers and also with 10 Researchers, having them look at the user interface and conduct queries
– Composers: Are they satisfied with how audiences will be able to view the archival copies of their websites? Is it better or worse than their own live sites? Are they satisfied with the audio and video placement and quality (as well as options)? Are they content with the Donor Agreement? What changes/improvements might be made to any of these?
– Researchers: Can they find what they need in the web archive? Is it difficult (clunky) to use? What parts don’t work well or aren’t intuitive? We want to identify what changes in the content, functionality, or navigation features would improve their user experience
• Apr-May 2018: Construction of Evaluation Summary containing the list of improvements/changes that should be made to the Archiving project
• June-Aug 2018: Implement the changes
IMPACT BEYOND THIS PROJECT
Impact Beyond this Project
• Archive-It will be able to better handle streaming media, and display it in proper context
• We will have architectures and workflows for Archive-It to interact with richer local resources (as well as examples of how interaction and navigation can proceed btwn Archive-It, ArchivesSpace, Finding Aids, and an internal digital repository)
• Models for interaction btwn creators and collecting organizations will have been developed (incl donor agreements)
• We will have preserved 100+++ websites of young composers
Archiving Websites Containing Streaming Media
• http://besser.tsoa.nyu.edu/howard/Talks/
• http://www.nyu.edu/about/news-publications/news/2015/03/27/nyu-libraries-to-team-with-internet-archive-to-preserve-high-quality-musical-content-on-the-web.html