
Page 1: Archiving Websites Containing Archiving Young Composer …besser.tsoa.nyu.edu/howard/Talks/17archiving-streaming... · 2017-05-16 · 5/16/17 3 Archive-It Issues w/Streaming Media

5/16/17

Archiving Websites Containing Streaming Media

Howard Besser, NYU
http://besser.tsoa.nyu.edu/howard/Talks/

Besser-IS&T Archiving 16/5/2017 1

Archiving Websites Containing Streaming Media

•  Background issues and problems
•  Archiving Young Composer websites
–  Our Technical Collaboration
–  Our Collaboration with Content Creators (preservation, access control, rights, agreements)
–  Workflows
–  How things will look
–  Evaluation
•  Impact beyond this Project

BACKGROUND ISSUES AND PROBLEMS

Web Archiving poses challenges

•  Any given web page may be updated frequently
•  Web links constantly break (404 errors)
•  Few tools/services exist for “Curated” web archiving (Archive-It, CDL’s WAS), and they require significant training/experience to learn, but we do have an int’l-accepted format (WARC)

Many parameters need to be set for Web Archiving

•  Frequency of crawls
•  Depth of crawls (# of hops)
•  Starting points of crawls (seeds)
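To make the three parameters concrete, here is a minimal sketch. It is a toy, in-memory illustration and not the configuration format of Archive-It or Heritrix; the names `CrawlConfig` and `pages_in_scope`, the example URLs, and the 90-day frequency are all hypothetical.

```python
from __future__ import annotations

from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class CrawlConfig:
    """Hypothetical container for the three crawl parameters."""
    seeds: tuple[str, ...]   # starting points of the crawl
    max_hops: int            # depth: how many links away from a seed to follow
    frequency_days: int      # how often the crawl is repeated


def pages_in_scope(config: CrawlConfig, links: dict[str, list[str]]) -> dict[str, int]:
    """Breadth-first walk from the seeds; maps each reachable page to its hop count."""
    hops = {seed: 0 for seed in config.seeds}
    queue = deque(config.seeds)
    while queue:
        page = queue.popleft()
        if hops[page] == config.max_hops:
            continue  # depth limit reached; do not follow links from this page
        for target in links.get(page, []):
            if target not in hops:
                hops[target] = hops[page] + 1
                queue.append(target)
    return hops


# Toy link graph standing in for a composer's site (hypothetical URLs).
LINKS = {
    "https://composer.example/": [
        "https://composer.example/works",
        "https://composer.example/bio",
    ],
    "https://composer.example/works": ["https://composer.example/works/quartet"],
    "https://composer.example/works/quartet": ["https://media.example/quartet.mp3"],
}

config = CrawlConfig(seeds=("https://composer.example/",), max_hops=2, frequency_days=90)
in_scope = pages_in_scope(config, LINKS)
```

In this toy graph the audio file sits three hops from the seed, so a depth of two leaves it out of scope. This is one way the choice of depth and seeds can decide whether media is captured at all.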

Other issues for developing good crawls

•  Quality control/assurance
•  Workflows
•  Fidelity to original web pages
•  How end user will navigate and view it

Archive-It

•  The leading application/service for curated web archiving in North America
•  Run by the Internet Archive, and is much more targeted and curated than their Wayback Machine
•  Is based on crawler software developed by IA (Heritrix) in 2003-2004
•  Is very poor at capturing streaming audio or video as well as inserting it properly into a composed web page

Archive-It Issues w/ Streaming Media

Archive-It screenshots generated as part of our project

•  By Lorena Ramirez-López

Archive-It Issues w/ Streaming Media
Firefox version 39.0. Screenshot of Tarik O’Regan’s site taken 2015/10/05

Archive-It Issues w/ Streaming Media
Firefox version 39.0. Screenshot of Ted Hearne’s website taken 2015/10/05

Some sources of streaming issues

•  Problems with capturing resources residing on 3rd party services (YouTube, Vimeo, SoundCloud)
•  Problems with how faithfully the A/V materials are captured and placed by Archive-It
•  Problems with websites generated through site-building platforms such as Squarespace

ARCHIVING YOUNG COMPOSER WEBSITES

Archiving Composer Websites
http://www.nyu.edu/about/news-publications/news/2015/03/27/nyu-libraries-to-team-with-internet-archive-to-preserve-high-quality-musical-content-on-the-web.html

•  Collect, preserve, & make available Websites of Composers
•  $480,000 grant from Mellon in 2015 to NYU Library/MIAP/Internet Archive
•  Dealing with the issue that contemporary composer websites go up and down (and also incorporate relationship-building btwn composer and fans)
•  Addressing the problems of collecting streaming media
•  Also selectively collecting high-quality versions that are used to generate the streams, and allowing future researchers to see/hear the higher quality versions

Archiving Composer Websites

•  Develop good and ongoing relationships btwn Libraries and Composers
•  Develop Trust
–  for developing collections, and continuing to add to them
–  for Policy reasons
•  Examine what type of errors take place
–  how faithfully audiovisual materials are being captured
–  how resources that reside on third-party web-services (YouTube, Vimeo, SoundCloud) are (not) displayed within Archive-It’s interface
–  Issues w/ websites generated through site-building platforms such as Squarespace
•  Find ways to fix those errors

Metrics Accomplished (as of Jan 2017)

•  172 Composer sites crawled, scoped, assessed for quality, & analyzed for problems (feeding into IA development work)
•  800 QA/QC reports generated
•  Initial web archiving agreement from 165 Composers (25 from NPR’s 100)
•  Identified website infrastructures encountered and created a classification matrix

Website Infrastructure encountered

Project Team

•  Jefferson Bailey (Internet Archive)
•  Howard Besser (MIAP)
•  Lori Donovan (Internet Archive)
•  April Hathcock (Lib/ScholComm)
•  Nicole Greenhouse (Lib/ACM)
•  Carol Kassel (Lib/DLTS)
•  Scott Statland (MIAP)
•  Donald Mennerich (Lib/ACM/DLTS)
•  David Millman (Lib/DLTS)
•  Courtney Mumma (Internet Archive)
•  Robin Preiss (Lib/AFC)
•  Lorena Ramirez (MIAP), special thanks!
•  Michael Stoller (Lib/C&RS)
•  Kent Underwood (Lib/AFC)
•  Chela Scott Weber (Lib/AFC)

OUR TECHNICAL COLLABORATION: CRAWLING

NYU/IA Collaboration

Traditional Crawlers

•  Archive-It and other web archives use Heritrix
•  Follow links, capture most web content
•  Less successful with streaming video and dynamic content executed in the browser
•  Umbra helps
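The gap between link-following and content executed in the browser can be illustrated with a small sketch. This is a simplified model of link extraction in general, not a description of how Heritrix or Umbra work internally; the class `LinkCollector`, the sample page, and its URLs are hypothetical.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects URLs that appear literally in href/src attributes of the markup."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.urls.append(value)


# Hypothetical page: one ordinary link, plus a media player that
# JavaScript only assembles when the page runs in a browser.
PAGE = """
<html><body>
  <a href="/works.html">Works</a>
  <div id="player"></div>
  <script>
    var host = "https://stream.example/";
    var track = "embed/12345";
    document.getElementById("player").innerHTML =
      '<iframe src="' + host + track + '"></iframe>';
  </script>
</body></html>
"""

collector = LinkCollector()
collector.feed(PAGE)
found = collector.urls  # only the link written directly in the markup
```

Reading the markup alone yields the ordinary link but not the player's address, because that address exists only after the script runs. A tool that loads the page in a real browser can observe the requests the script triggers.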

BROZZLER!

“browser” | “crawler” = BROZZLER

Logo: Noah Levitt

Brozzler System Architecture v1

OUR COLLABORATION WITH CONTENT CREATORS

Young Composers Corpus

•  Began with NPR’s 2011 list of “100 Composers Under 40”
•  91 of 100 have own self-contained sites
•  As of 5/2016 have written agreements with 165 Composers (25 of them from NPR’s list)
•  Will recruit 10 of them for enhanced archiving (uncompressed; better than what is on website)
–  This will require an added appendix to contract/agreement (which may involve dark archiving and/or restricted access)

Building relationships with Composers

•  Engage them with the idea of preserving their Website
•  Are they willing to give us richer versions of content on their site?
•  Are they willing to make all (or just part) of the content freely accessible? Do they want to embargo some content in a dark archive?
•  Donor Agreement/Contract

Donor Agreement/Contract

•  Have been working on this with lawyers for approximately one year
•  Have had fairly stable language in it for 6 months, and 2 contracts already signed and returned
•  Does default to allowing us complete rights for reformatting and for allowing researchers to see/hear all high quality versions at minimum on-site
–  And thus far all Composers contacted have agreed to those principles (but not necessarily to the contractual language)

Long Term Preservation for Scholarship

Highest quality; future library processes

Composer indicates restrictions on Access

Who are certified users?

ARCHITECTURE & WORKFLOWS

Architecture & Workflows

•  Full copy of all websites (incl richer content) stored in NYU Repository and accessible through NYU Finding Aids
•  Metadata is in ArchivesSpace
•  Connections built off of ArchivesSpace back-end API

Current Development work

•  Supplying a separate audio player?
•  Hiring a Digital Archivist
•  Precise forms of navigation btwn ArchivesSpace, Archive-It, and richer content within NYU’s digital repository
•  API

API

•  What IA needs from NYU API
–  API URL
–  Credentials (username, password) -> Authentication Token()
–  Repository ID
–  Resource ID
•  What IA will return as JSON array
–  Unit Title
–  Creator
–  Date Expression
–  Extent Statement
–  Tech Characteristics
–  [Something Based on Access Restriction, i.e. can it be streamed]???
•  We Speak Etruscan, 1993 May 21, 23.5 MB, 1 AIFF file Stereo uncompressed 16 bit/44.1K
•  The Dream of Innocence III, 1998 March 26, 150 MB, 1 AIFF file Stereo uncompressed 16 bit/44.1K
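The exchange listed on this slide could be modelled roughly as below. This is a sketch under stated assumptions, not the project's actual API: the slide gives only field labels, so every attribute and JSON key name here is hypothetical, no network call is made, and the sample payload simply restates the two example items from the slide.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ApiRequest:
    """The four inputs named on the slide; attribute names are hypothetical."""
    api_url: str
    auth_token: str      # obtained beforehand from username and password
    repository_id: int
    resource_id: int


@dataclass(frozen=True)
class RicherItem:
    """One entry of the returned JSON array; key names are hypothetical."""
    unit_title: str
    creator: Optional[str]
    date_expression: str          # e.g. "1993 May 21"
    extent_statement: str
    tech_characteristics: str
    streamable: Optional[bool]    # access-restriction field is still undecided on the slide


def parse_items(payload: str) -> list:
    """Turn the JSON array into RicherItem records; absent optional keys become None."""
    return [
        RicherItem(
            unit_title=row["unit_title"],
            creator=row.get("creator"),
            date_expression=row["date_expression"],
            extent_statement=row["extent_statement"],
            tech_characteristics=row["tech_characteristics"],
            streamable=row.get("streamable"),
        )
        for row in json.loads(payload)
    ]


# Sample payload restating the slide's two examples (creator not given there).
SAMPLE = """
[
  {"unit_title": "We Speak Etruscan", "date_expression": "1993 May 21",
   "extent_statement": "23.5 MB",
   "tech_characteristics": "1 AIFF file Stereo uncompressed 16 bit/44.1K"},
  {"unit_title": "The Dream of Innocence III", "date_expression": "1998 March 26",
   "extent_statement": "150 MB",
   "tech_characteristics": "1 AIFF file Stereo uncompressed 16 bit/44.1K"}
]
"""

items = parse_items(SAMPLE)
```

Leaving `streamable` optional mirrors the open question on the slide: a consumer can treat a missing value as "unknown" and fall back to linking to the finding aid rather than attempting playback.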

HOW THINGS MAY LOOK

User Queries

•  User browses through Archive-It
•  User sees that A/V content exists (and in some cases, it will include richer content, but some of that might be access-restricted)
•  Archive-It hands off user to NYU (either directly to A/V content, or to Finding Aid)

One option for Queries

One option for high quality content

•  On archived website page listing composer’s content, user sees a message that higher quality content is available, with:
–  Access restrictions, if applicable
–  Link to relevant finding aid
–  (looking like following image)

EVALUATION

Evaluation for Improvement

•  Composers and their satisfaction with the ways in which audiences will be able to view archives of their websites
•  Researchers, and whether the content and functionality of these web archives works for them
•  Tweaking what we do in order to better serve Creators and Researchers

Schedule and Methodology for Evaluation

•  Dec 2017: Schedule one-on-one interviews with sets of Composers and Researchers
•  Jan-Mar 2018: One hour individual sessions with 10 Composers and also with 10 Researchers, having them look at the user interface and conduct queries
–  Composers: Are they satisfied with how audiences will be able to view the archival copies of their websites? Is it better or worse than their own live sites? Are they satisfied with the audio and video placement and quality (as well as options)? Are they content with the Donor Agreement? What changes/improvements might be made to any of these?
–  Researchers: Can they find what they need in the web archive? Is it difficult (clunky) to use? What parts don’t work well or aren’t intuitive? We want to identify what changes in the content, functionality, or navigation features would improve their user experience
•  Apr-May 2018: Construction of Evaluation Summary containing the list of improvements/changes that should be made to the Archiving project
•  June-Aug 2018: Implement the changes

IMPACT BEYOND THIS PROJECT

Impact Beyond this Project

•  Archive-It will be able to better handle streaming media, and display it in proper context
•  We will have architectures and workflows for Archive-It to interact with richer local resources (as well as examples of how interaction and navigation can proceed btwn Archive-It, ArchivesSpace, Finding Aids, and an internal digital repository)
•  Models for interaction btwn creators and collecting organizations will have been developed (incl donor agreements)
•  We will have preserved 100+++ websites of young composers

Archiving Websites Containing Streaming Media

•  http://besser.tsoa.nyu.edu/howard/Talks/
•  http://www.nyu.edu/about/news-publications/news/2015/03/27/nyu-libraries-to-team-with-internet-archive-to-preserve-high-quality-musical-content-on-the-web.html