
Managing and Preserving Data and Complex Digital Objects
Katherine Skinner, PhD, Executive Director, Educopia Institute
August 9, 2017: ETD 2017 Workshop

Learning Objectives
After this workshop, you will be able to:
1. Demonstrate your familiarity with emerging methods and tools to support students’ digital object management
2. Anticipate the impact of expanding research formats and materials on lifecycle management practices for ETD programs
3. Describe what other institutions are doing to support students in managing (or learning to manage) their research outputs
4. Deliver workshops to students and/or faculty on your own campus to help them build their digital management skills

Agenda
09:00–09:20: Welcome, Introductions
09:20–09:40: Activity
09:40–10:00: Managing Data and Complex Digital Objects
10:00–10:15: Break
10:15–11:00: Data Organization, File Formats, and Storage
11:00–11:40: Connecting Student Needs to University Needs
11:45–12:00: Wrap-up

Activity
Using the sticky notes provided on your table, please answer the following question:

What content type(s) does your institution’s ETD program currently accept?

◦ Write down one file format type per sticky note.
◦ As you finish, please stick them in the middle of your table.
◦ Tally the number and write that down on a separate sticky.
◦ “I don’t know” is a perfectly reasonable answer.

Activity
Using the sticky notes provided on your table, please answer the following question:

What content types does your institution’s digital preservation program currently support?

◦ Write down one file format type per sticky note.
◦ As you finish, please stick them in the middle of your table.
◦ Tally the number and write that down on a separate sticky.
◦ “I don’t know” is a perfectly reasonable answer.

Reflections


ETD+
◦ audio-video files
◦ digital art
◦ GIS datasets, visualizations
◦ software code
◦ digital text
◦ research data

Managing Data and Complex Digital Objects

http://metaarchive.org

ETDplus team:
Educopia Institute
MetaArchive Cooperative
NDLTD
ProQuest
Carnegie Mellon University
Colorado State University
HBCU Library Alliance
Indiana State University
Oregon State University
Penn State University
Purdue University
University of Louisville
UNC School of Library and Information Science
University of North Texas
University of Tennessee-Knoxville
Virginia Tech University


Core research question: How can institutions best ensure the longevity and availability of ETD research data and complex digital objects (e.g., software, multimedia files) that are an integral component of student theses and dissertations?

Starting place: Surveys
§ Nine campuses
§ Ten IRBs (argh!)
§ Two surveys
§ 795 student responses
§ 35 administrators and staff


Participating Universities
Carnegie Mellon University
Colorado State University
Indiana State University
Oregon State University
Penn State University
Purdue University
University of Louisville
University of Tennessee Knoxville
Virginia Tech University

Fully 80% of 795 students report they will produce non-text files in their dissertation or thesis research.


Students report that these non-PDF files, like research data, video, digital art, and software code, are either as important as or more important than those submitted as PDFs to satisfy degree requirements.

[Chart: most important content to the author vs. most important content to others, as guessed by the author]


35%+ of respondents said that their non-text files are their most important files…

…but only 13% of respondents reported plans to actually submit those materials in their ETD package.

A student’s perception of value is not the key driver for what research outputs s/he chooses to submit in a thesis or dissertation package.

NEED TO CLOSE THIS GAP.

[Charts: responses to “Does your institution accept objects in addition to the PDF of the thesis or dissertation?” in the administrators’ survey and the student survey (Yes / No / Don’t know / Prefer not to answer)]

[Chart categories: Guidance documents, None, Software, Webinars, Workshops]

Three-pronged approach
1. Educate ETD program managers
2. Educate students
3. Educate faculty

Discussion


Agenda
09:00–09:20: Welcome, Introductions
09:20–09:40: Activity
09:40–10:00: Managing Data and Complex Digital Objects
10:00–10:15: Break
10:15–11:15: ETD+ Toolkit: Data Organization, File Formats, Storage
11:15–11:45: Connecting Student Needs to University Needs
11:45–12:00: Wrap-up

The ETD+ Toolkit helps the academic community train students to ensure the longevity and accessibility of their research outputs.

What is the Toolkit? An open set of six modules and evaluation instruments that prepares students to create, store, and maintain their research outputs.

MODULE 1: COPYRIGHT
How can students gain appropriate permissions, and how can students signal copyright for their own works?

MODULE 2: DATA ORGANIZATION
How can students structure, describe, store, and deposit data and research files for reuse and/or future access?

MODULE 3: FILE FORMATS
How will the formats students choose make future access to their research easier or more difficult?

MODULE 4: METADATA
How can students store information describing their files to make sure they can tell what they are in the future?

MODULE 5: STORAGE
How can students make well-informed choices about where to store their research materials?

MODULE 6: VERSION CONTROL
What mechanisms can students use to make it easier to see the history of a file with multiple versions?

Each module includes:
1. Learning Objectives
2. One-page Handout
3. Guidance Brief (customizable)
4. Slideshow with presenter notes
5. Evaluation survey

Who can use the Toolkit? Anyone may freely adopt and adapt this toolkit.

http://educopia.org/etdplustoolkit

LINK TO THE TOOLKIT

https://educopia.org/publications/etdplustoolkit


So let’s experience a couple of these!



Data Organization


Key takeaway: The decisions you make about how you organize and structure your data today will have implications for how you and others can access and make use (or sense!) of that data in the future.



Why is data hard to deal with?
• Data without data documentation (e.g., a data dictionary) is often impossible to understand.
• Without access to specific (often expensive) software, a data file may be impossible to view or use.
• IRB and funder requirements may affect the way you need to structure your data.
• As data usage increases, data often needs to be interoperable to enable sharing and reuse.

Structuring your data well helps you to…
• Reproduce results
• Reuse it in the future
• Share it with others
• Gain and retain credibility
• Comply with IRB/funder requirements


Questions to ask… repeatedly!
1. What are the data organization standards for your field?
2. What are the data export options in the software you are using?
3. What forms of the data will be needed for future access?


Providing context for your data
Document:
1. The data’s purpose
2. A list of the files in your data package
3. A data dictionary listing and describing all variables

Data organization principles
• Use one variable per column.
• Make one observation per row.
• Use human-readable column names.
• Include one table per tab.
• Indicate relationships between tables using a key.

Movie Title | Director | Distributor | Running Time | Budget | Released
Peter Pan | Herbert Brenon | Paramount Pictures | 105 minutes | 40,030 | Dec 29 1924
Girl Shy | Fred C. Newmeyer and Sam Taylor | Pathe Exchange | 82 minutes | 400,000 | Apr 20 1924
Greed | Eric Von Stroheim | Metro-Goldwyn-Mayer | 140 minutes | 665,603 | Dec 4, 1924

Do:
• Consider what your NULL values are and how they are represented.
• Consider what contextual documentation is required.
• Use standard data representations (e.g., YYYYMMDD for dates).

Do not:
• Use formatting to convey information.
• Place comments in cells.
• Use special characters in field names.
• Use blank spaces or symbols in column names.
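To make these principles concrete, here is a minimal Python sketch (standard library only) that tidies the three-film example table. It moves the running-time unit into the column name, turns budgets into plain integers, and normalizes release dates to YYYYMMDD. It also moves the two directors of Girl Shy into a separate table linked by a `film_id` key. The column names and the `film_id` key are illustrative assumptions, not part of the toolkit. Splitting directors on " and " is a simplification that would need review on real data.

```python
import csv
import io
from datetime import datetime

# Rows copied from the example table: units inside cells, mixed date styles,
# and two directors packed into one cell.
RAW_ROWS = [
    ("Peter Pan", "Herbert Brenon", "Paramount Pictures", "105 minutes", "40,030", "Dec 29 1924"),
    ("Girl Shy", "Fred C. Newmeyer and Sam Taylor", "Pathe Exchange", "82 minutes", "400,000", "Apr 20 1924"),
    ("Greed", "Eric Von Stroheim", "Metro-Goldwyn-Mayer", "140 minutes", "665,603", "Dec 4, 1924"),
]


def to_yyyymmdd(text):
    """Normalize dates such as 'Dec 4, 1924' or 'Dec 29 1924' to YYYYMMDD."""
    parsed = datetime.strptime(text.replace(",", ""), "%b %d %Y")
    return parsed.strftime("%Y%m%d")


def tidy(rows):
    """Split messy rows into a films table and a directors table linked by film_id."""
    films, directors = [], []
    for film_id, (title, director, distributor, runtime, budget, released) in enumerate(rows, start=1):
        films.append({
            "film_id": film_id,                               # hypothetical key column
            "title": title,
            "distributor": distributor,
            "running_time_minutes": int(runtime.split()[0]),  # unit moves into the header
            "budget": int(budget.replace(",", "")),           # currency is not stated in the source
            "release_date": to_yyyymmdd(released),
        })
        # One observation per row: each director gets a row of their own.
        for name in director.split(" and "):
            directors.append({"film_id": film_id, "director": name.strip()})
    return films, directors


def to_csv(records):
    """Serialize a list of dicts to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(records[0]))
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


films, directors = tidy(RAW_ROWS)
print(to_csv(films))
print(to_csv(directors))
```

Keeping directors in their own table avoids inventing "director_1" and "director_2" columns. A film with any number of directors still fits the one-observation-per-row rule.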

Discipline-based data repositories
• Social Sciences: ICPSR http://www.icpsr.umich.edu/icpsrweb/deposit/index.jsp
• Genomics: GenBank https://www.ncbi.nlm.nih.gov/genbank/
• Earth Sciences: NASA’s Earthdata https://earthdata.nasa.gov/
• Archaeology: tDAR http://www.tdar.org/
• Oceanography: NODC http://www.nodc.noaa.gov/
• BioSciences: Dryad https://datadryad.org/

Example hand-out
Source: Guidance Briefs: Managing Your ETD Research Files

Activity
• Choose one spreadsheet you are using for a current data-gathering project.
  o Using the “Data Organization Principles,” check to see if your file meets those requirements.
  o Create a data dictionary for the spreadsheet that describes the meaning of each column header.
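For the data dictionary step of this activity, a sketch like the following can draft one entry per column header from a CSV export. It guesses a simple type and counts blank cells, so you only need to fill in the descriptions. The function names, the `TODO` placeholder, and the sample spreadsheet are hypothetical. The type guess is deliberately crude and should be checked by hand.

```python
import csv
import io


def infer_type(values):
    """Guess a coarse type label from a column's non-blank values."""
    present = [v for v in values if v.strip()]
    if not present:
        return "empty"
    for label, cast in (("integer", int), ("decimal", float)):
        try:
            for value in present:
                cast(value)
        except ValueError:
            continue  # at least one value does not fit; try the next label
        return label
    return "text"


def draft_dictionary(csv_text, descriptions):
    """Return one data-dictionary entry per column header in the CSV text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    entries = []
    for name in reader.fieldnames or []:
        values = [row[name] or "" for row in rows]  # short rows yield None; treat as blank
        entries.append({
            "column": name,
            "type": infer_type(values),
            "blank_cells": sum(1 for v in values if not v.strip()),
            "description": descriptions.get(name, "TODO: describe this column"),
        })
    return entries


# Hypothetical spreadsheet export with one column already described.
SAMPLE = "site_id,temp_c,notes\nA1,12.5,\nA2,13.0,windy\n"
for entry in draft_dictionary(SAMPLE, {"temp_c": "Air temperature, degrees Celsius"}):
    print(entry)
```

The blank-cell count is a prompt to document what an empty cell means in that column, which ties back to the advice on NULL values.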

File Formats

Examples of File Formats
• Images: jpg, gif, tiff, png, ai, svg, ...
• Video: mpeg, m2tvs, flv, dv, ...
• GIS: kml, dxf, shp, tiff, ...
• CAD: dxf, dwg, pdf, …
• Data: csv, mdf, fp, spv, xlx, tsv, ...


Key Concepts
• Use software that imports and exports data in common formats.
• Ask advisors and colleagues what formats they use and why.
• Choose a format with functions that support your research needs.
• Save your content in multiple formats to spread your risk across software platforms (e.g., docx, pdf, & txt; or mp4, avi, & mpg).

Choosing a file format
• Sustainability of Digital Formats http://www.digitalpreservation.gov/formats/content/content_categories.shtml
• Recommended Formats Statement https://www.loc.gov/preservation/resources/rfs/

Saving Web resources
• Robust Links http://robustlinks.mementoweb.org/
• Archive-It https://archive-it.org/
• Wayback Machine http://waybackmachine.org/
• Screenshots


File Format Conversions
• Options include proprietary, freeware, and open source solutions.
• Formats in broad use usually have more available options for conversion.
• When you convert the file, recognize that the process may transform your content.
• Before you convert, identify what characteristics are most important to maintain in the conversion process.

PDF-specific advice
• Embed fonts.
• Embed hyperlinks.
• Stabilize hyperlinks.
• Store supplementary materials as separate files.
• Verify PDF/A compliance.
• Test EVERYTHING.

Many ETD programs favor PDF files. If you export research outputs to PDF, make sure you:
1. Embed your fonts
2. Embed (and test!) hyperlinks
3. Stabilize your web-based resources and citations (using a tool like Robust Links, Archive-It, or Perma.cc)
4. Store supplementary materials as separate files
5. Verify PDF/A compliance (use the Acrobat Pro “Preflight” feature under “Edit”)

Before you undertake any conversion, you need to identify what characteristics of your data are important to maintain during the conversion. For example, are the colors in a document or image important? Is the pagination essential? What about references? You will want to test these after your conversion is complete to ensure that you have a conversion that will meet your needs.
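One way to test a conversion, assuming your data is a simple table of text cells, is to write the same content in several formats and read each copy back. You then compare each copy cell by cell with the original. The sketch below does this for CSV, TSV, and JSON using only Python's standard library. The file stem `observations` is a hypothetical name. It checks only cell values, so characteristics such as colors, pagination, or formulas would need their own tests.

```python
import csv
import json
import tempfile
from pathlib import Path


def read_delimited(path, delimiter):
    """Read a delimited text file back into a list of rows."""
    with open(path, newline="", encoding="utf-8") as handle:
        return list(csv.reader(handle, delimiter=delimiter))


def save_copies(rows, stem):
    """Write the same table as CSV, TSV, and JSON next to `stem`; return the paths."""
    paths = {}
    for ext, delimiter in (("csv", ","), ("tsv", "\t")):
        path = stem.with_suffix("." + ext)
        with open(path, "w", newline="", encoding="utf-8") as handle:
            csv.writer(handle, delimiter=delimiter).writerows(rows)
        paths[ext] = path
    paths["json"] = stem.with_suffix(".json")
    paths["json"].write_text(json.dumps(rows, ensure_ascii=False, indent=2), encoding="utf-8")
    return paths


def verify_copies(rows, paths):
    """Re-read each copy and report whether every cell matches the original."""
    readers = {
        "csv": lambda p: read_delimited(p, ","),
        "tsv": lambda p: read_delimited(p, "\t"),
        "json": lambda p: json.loads(p.read_text(encoding="utf-8")),
    }
    return {ext: readers[ext](path) == rows for ext, path in paths.items()}


# Cells chosen to stress each format: a comma, a tab, and a non-ASCII character.
rows = [["site_id", "comment"], ["A1", "contains, a comma"], ["A2", "tab\there"], ["A3", "café"]]
with tempfile.TemporaryDirectory() as tmp:
    report = verify_copies(rows, save_copies(rows, Path(tmp) / "observations"))
print(report)  # {'csv': True, 'tsv': True, 'json': True}
```

The test cells matter more than the code. Pick values that are likely to break a given format, such as delimiters inside cells or accented characters. A passing check then tells you something real.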

Additional Resources:
● List of File Formats (Wikipedia)
● Recommended Formats Statement (Library of Congress)
● Evaluating Your File Formats (UK National Archives)
● Reformatting Guides (US National Archives)

File Formats

How to select file formats:
● Use software that imports and exports data in common formats.
● Ask advisors and colleagues what formats they use.
● Choose a format with functions that support your research needs.
● Save final versions of your content in multiple formats in order to spread your risk across multiple software platforms (e.g., docx, pdf, and txt; or mp4, avi, and mpg).

If you use website-based materials as evidence or references, take precautions to ensure that if the content moves, changes, or disappears, you still have evidence of its existence. Current tools to help you ensure the longevity of these materials include Robust Links and Archive-It. You can also take screenshots of important digital content in order to preserve the look and feel of an object.

There is no perfect file format. Each will have advantages and disadvantages depending on your research uses. Select a file format, or set of file formats, that helps you complete your research now and that you can access again in the future. This is important both for your research outputs (what you create) and your research inputs (materials you use in the research process).

Common file types include:
Images: jpg, gif, tiff, png, ai, svg, …
Video: mpeg, m2tvs, flv, dv, …
GIS: kml, dxf, shp, tiff, …
CAD: dxf, dwg, pdf, …
Data: csv, mdf, fp, spv, xlx, tsv, …
Text: txt, rtf, tvi, doc, pdf, …

Consider what might happen if you can no longer use your software. Whether the software publisher goes bankrupt, the latest version refuses to read older data, or you can’t afford a personal license for it after you graduate, the end result is the same. Losing access to your software can mean losing your data, especially if it is the only software that can read your data.

Example hand-out

Activity
Look at a folder of your research materials and answer the following questions.
• What software do you need to access these materials?
• Is there a risk of losing access to that software, now or later?
• Would a colleague be able to open and use your materials if you shared them?
• Can you submit your thesis/dissertation and its related research materials using file formats supported by your software?

Storage


Why Storing Copies in Multiple Locations Matters…
• Viruses
• File corruption
• Physical disasters
• Theft
• Storage device malfunctions
• Accidental erasure
• Malicious deletion
• Overwritten files
• Lost password or key
• Hacking
• Bit rot

Back-up
A copy of your digital content, ideally stored in a different location from the original, usually made to prevent data loss.


Common storage options
• Laptop
• Desktop
• External hard drive (spinning or SSD)
• Flash drive
• The “cloud”

Storage recommendations
• Maintain at least one local (i.e., non-cloud-based) copy of your content.
• Maintain at least three separate complete copies of your research content.
• Maintain at least one of those copies in a different geographic location.
• Maintain a history of changes in at least one location (e.g., using a “Time Capsule”).
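The four recommendations above can be treated as a checklist. This small Python sketch encodes them as checks against a description of where your copies live. The `Copy` fields and the example locations are hypothetical. Passing the checks says nothing about whether each copy is actually current.

```python
from dataclasses import dataclass


@dataclass
class Copy:
    label: str                   # what holds the copy, e.g. "laptop" (hypothetical)
    site: str                    # geographic location of the device or service
    cloud: bool                  # True for cloud-hosted copies
    keeps_history: bool = False  # True if older versions are retained
    complete: bool = True        # False for partial copies


def check_storage(copies):
    """Evaluate the basic storage recommendations; True means the check passes."""
    complete = [c for c in copies if c.complete]
    return {
        "at least three complete copies": len(complete) >= 3,
        "at least one local (non-cloud) copy": any(not c.cloud for c in complete),
        "a copy in a different geographic location": len({c.site for c in complete}) >= 2,
        "a history of changes in at least one location": any(c.keeps_history for c in copies),
    }


my_copies = [
    Copy("laptop", "home", cloud=False),
    Copy("external drive", "home", cloud=False, keeps_history=True),
    Copy("cloud storage", "remote data center", cloud=True),
]
for check, passed in check_storage(my_copies).items():
    print(("PASS " if passed else "FAIL ") + check)
```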

Preservation
“The series of managed activities necessary to ensure continued access to digital materials for as long as necessary”
- Digital Preservation Coalition


Managed activities and preservation
• Produce and maintain an inventory of all of your content, documenting file names, sizes, locations, types, and “checksums”.
• Create and regularly check “checksums” for your most important research files.
• Employ a tool like “Fixity” to scan specified folders or directories on a regular basis and report changes to you via email. https://github.com/avpreserve/fixity
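As an illustration of an inventory with checksums, the Python sketch below walks a project folder. It records each file's name, relative path, size, type (taken from the extension), and SHA-256 checksum, then writes the result to a CSV file. The choice of SHA-256 and the column names are assumptions. Any stable checksum algorithm recorded alongside the inventory serves the same purpose. Store the inventory outside the folder it describes so that it does not list itself.

```python
import csv
import hashlib
import tempfile
from pathlib import Path

FIELDS = ["file_name", "relative_path", "size_bytes", "type", "sha256"]


def sha256_of(path, chunk_size=1 << 20):
    """Compute a file's SHA-256 checksum, reading 1 MiB at a time."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_inventory(root):
    """Return one record per file under `root`."""
    root = Path(root)
    files = sorted(p for p in root.rglob("*") if p.is_file())
    return [
        {
            "file_name": p.name,
            "relative_path": p.relative_to(root).as_posix(),  # location within the project
            "size_bytes": p.stat().st_size,
            "type": p.suffix.lower() or "(none)",
            "sha256": sha256_of(p),
        }
        for p in files
    ]


def write_inventory(records, csv_path):
    """Save the inventory as a spreadsheet-friendly CSV file."""
    with open(csv_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(records)


# Demo on a throwaway folder; the inventory is written outside the scanned folder.
with tempfile.TemporaryDirectory() as tmp:
    project = Path(tmp) / "project"
    (project / "data").mkdir(parents=True)
    (project / "data" / "hello.txt").write_bytes(b"hello")
    demo_records = build_inventory(project)
    write_inventory(demo_records, Path(tmp) / "inventory.csv")
    demo_header = (Path(tmp) / "inventory.csv").read_text(encoding="utf-8").splitlines()[0]
print(demo_records)
```

On your own material, a call such as `write_inventory(build_inventory("thesis_project"), "inventory.csv")` produces the spreadsheet-style inventory. The folder name here is hypothetical. Rerunning it later and comparing checksums is how you "regularly check" them.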

More managed activities…
• Systematize your folder- and file-name conventions using human-identifiable names.
• Use naming conventions to mark versions of files (e.g., MusicofSocialChange-v12.csv).
• Make sure your file names are followed by the correct file extension (e.g., .txt, .csv).
• Avoid using special characters in all file and folder names (e.g., \?:*?<>{}[]&$,;.!).
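The naming conventions above can be checked mechanically. This sketch is illustrative rather than a toolkit tool. It flags names that lack an extension or contain any character from the list above, ignoring the dot before the extension. It also bumps a `-vNN` version suffix like the one in MusicofSocialChange-v12.csv. The `-v` pattern and the function names are assumptions based on that example.

```python
import re
from pathlib import PurePosixPath

# Characters the slide recommends avoiding in file and folder names.
AVOID = set("\\?:*<>{}[]&$,;.!")

# Hypothetical version pattern modeled on "MusicofSocialChange-v12.csv".
VERSION = re.compile(r"^(?P<base>.+)-v(?P<num>\d+)$")


def check_name(file_name):
    """Return a list of naming problems; an empty list means the name passes."""
    path = PurePosixPath(file_name)  # POSIX rules so "\" is treated as a character
    problems = []
    if not path.suffix:
        problems.append("missing file extension")
    bad = sorted(set(path.stem) & AVOID)  # the extension's own dot is not checked
    if bad:
        problems.append("avoid characters: " + " ".join(bad))
    return problems


def next_version(file_name):
    """Return the next versioned name ('-v12' -> '-v13'); unversioned names get '-v1'."""
    path = PurePosixPath(file_name)
    match = VERSION.match(path.stem)
    if match:
        stem = f"{match['base']}-v{int(match['num']) + 1}"
    else:
        stem = f"{path.stem}-v1"
    return stem + path.suffix


for name in ["MusicofSocialChange-v12.csv", "results, final!.csv", "README"]:
    print(name, check_name(name), next_version(name))
```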

Resources
• Jesus Vigo, “World Backup Day: Best Practices to Backup Your Data,” TechRepublic http://www.techrepublic.com/article/world-backup-day-best-practices-to-backup-your-data/
• For general info on archiving and backing up content, see the Personal Digital Archiving resources. http://digitalpreservation.gov/personalarchiving/

Storage
Back-up: A copy of your digital content, ideally stored in a different location from the original, usually made to prevent data loss.
Preservation: The “series of managed activities necessary to ensure continued access to digital materials for as long as necessary.” – Digital Preservation Coalition

Where and how you choose to store your research materials and writings will determine how long they survive. To reduce the risk of loss, make your own back-ups on a regular, formalized schedule (e.g., daily or weekly).

Threats to storage environments:
• Natural disaster
• Human error
• Human malice
• Drive failure
• Format obsolescence
• Media obsolescence
• Bit rot
• Business failure
• Software or hardware error

Advanced recommendations:
1. Produce and maintain an inventory of all of your content, documenting file names, sizes, locations, and types.
2. Create and regularly check “checksums” or digital signatures for your most important research files. Checksums can be generated by several open source tools and utilities, and they can be stored in your inventory.
3. Monitor your content to ensure missing, moved, and renamed files are automatically brought to your attention. A tool like “Fixity” can scan specified folders or directories on a regular basis and report changes to you via email.
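Fixity itself is a standalone tool with scheduling and email reports. The sketch below only illustrates the underlying idea under simplifying assumptions. It takes two checksum snapshots of a folder and classifies the differences as changed, moved or renamed (same checksum, new path), missing, or new. The function names are hypothetical. Reading whole files into memory is a shortcut that suits modest file sizes.

```python
import hashlib
import tempfile
from pathlib import Path


def snapshot(root):
    """Map each file's path (relative to root) to its SHA-256 checksum."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in root.rglob("*")
        if p.is_file()
    }


def compare(old, new):
    """Classify differences between two snapshots."""
    report = {"changed": [], "moved": [], "missing": [], "new": []}
    vanished = {p: d for p, d in old.items() if p not in new}  # paths no longer present
    for path, digest in sorted(new.items()):
        if path in old:
            if old[path] != digest:
                report["changed"].append(path)
            continue
        # A new path whose checksum matches a vanished file is treated as a move/rename.
        source = next((p for p, d in sorted(vanished.items()) if d == digest), None)
        if source is None:
            report["new"].append(path)
        else:
            report["moved"].append((source, path))
            del vanished[source]
    report["missing"] = sorted(vanished)
    return report


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for name, text in (("a.txt", "draft 1"), ("b.txt", "figures"), ("c.txt", "notes")):
        (root / name).write_text(text, encoding="utf-8")
    before = snapshot(root)
    (root / "a.txt").write_text("draft 2", encoding="utf-8")  # edited
    (root / "docs").mkdir()
    (root / "b.txt").rename(root / "docs" / "b.txt")          # moved
    (root / "c.txt").unlink()                                  # deleted
    (root / "d.txt").write_text("new data", encoding="utf-8")  # added
    demo_report = compare(before, snapshot(root))
print(demo_report)
```

In practice the "before" snapshot would be the checksums stored in your inventory. The comparison would then run on a schedule.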

Resources
1. For “back-up” advice, see Jesus Vigo, Best Practices to Backup Your Data
2. For more on cloud-based backups, please see Charles Beagrie Ltd., How Cloud Storage can address the need of public archives in the UK
3. For general information, see also Personal Digital Archiving

Basic recommendations:
1. Maintain at least one local (i.e., non-cloud-based) copy of your content
2. Maintain at least three separate complete copies of your research content
3. Maintain at least one copy in a different geographic location
4. Maintain a history of changes in at least one location (e.g., using a “Time Capsule” software package to automatically back up your content without deleting older copies)
5. Document in a text file how, when, and where you store and back up your materials
6. Systematize your folder- and file-name conventions using human-identifiable information
7. Use naming conventions to mark versions of files, such as consecutive numbers that track a file through all of its edits and revisions (e.g., filename-v12.txt)
8. Make sure your file names are followed by the correct file extension (e.g., .txt, .csv)
9. Avoid using special characters in all file and folder names (e.g., \?:*?<>{}[]&$,;.!)
10. Document the formats you are managing and the potential sustainability issues
11. Save a copy of your research files in nonproprietary formats, so that you don’t need a software license to render and use them.

Example hand-out

Activity
• Take one project you are working on now, and develop a spreadsheet-based inventory for the associated files indicating file names, sizes, types, and storage locations. (Use http://www.cdlib.org/services/dsc/contribute/docs/submission.inventory.rtf as a guide.)
• Establish a regular routine for backing up your content in at least one additional location. Make sure the routine includes a regular schedule, a way of storing content organized by the date of a backup, and a way to maintain multiple backups simultaneously.
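For the second part of this activity, a backup routine can be as simple as copying a folder into a new subfolder named by date and time. The Python sketch below does that and can optionally prune to the newest `keep` copies. By default it keeps every copy, which preserves the history of changes. The folder names are hypothetical, and pruning assumes the backup folder holds nothing but these dated copies. The schedule itself would come from a tool such as cron or Windows Task Scheduler.

```python
import shutil
import tempfile
from datetime import datetime
from pathlib import Path


def make_backup(source, backup_root, keep=None, now=None):
    """Copy `source` into a new date-stamped folder under `backup_root`.

    If `keep` is given, only the newest `keep` backups are retained; the default
    (None) keeps every backup so the full history survives.
    """
    if keep is not None and keep < 1:
        raise ValueError("keep must be at least 1")
    backup_root = Path(backup_root)
    stamp = (now or datetime.now()).strftime("%Y-%m-%d_%H%M%S")  # sorts chronologically
    target = backup_root / stamp
    shutil.copytree(source, target)  # raises FileExistsError if this stamp already exists
    if keep is not None:
        backups = sorted(p for p in backup_root.iterdir() if p.is_dir())
        for old in backups[:-keep]:
            shutil.rmtree(old)
    return target


# Demo: three daily backups, pruned to the two most recent.
with tempfile.TemporaryDirectory() as tmp:
    thesis = Path(tmp) / "thesis"
    thesis.mkdir()
    (thesis / "chapter1.txt").write_text("Draft", encoding="utf-8")
    backups = Path(tmp) / "backups"
    for day in (1, 2, 3):
        make_backup(thesis, backups, keep=2, now=datetime(2017, 8, day, 9, 0, 0))
    remaining = sorted(p.name for p in backups.iterdir())
    restored = (backups / "2017-08-03_090000" / "chapter1.txt").read_text(encoding="utf-8")
print(remaining)
```

Naming folders with a year-month-day timestamp means a plain alphabetical sort is also a chronological sort. That property is what makes both the pruning step and finding "the backup from last Tuesday" simple.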


Connecting Student and University Needs


Student needs
• Training opportunities
• Relevant examples
• University “seal of approval”
• Faith in institutional safety
• Guidelines and rewards

University needs
• Training materials
• Professional demonstrators
• Branding power
• Contributions to repository
• Standards and expectations


Questions?

Katherine Skinner, katherine@educopia.org

@educopia

Please let us know what you think!


https://educopia.org/publications/etdplustoolkit
