Policy Template Workbook – iRODS 4.2
DataNet Federation Consortium
Sheau-Yen Chen, Mike Conway, Jon Crabtree, Cal Lee, Sunitha Misra, Reagan W. Moore, Arcot Rajasekar, Terrell Russell, Isaac Simmons, Lisa Stillwell, Helen Tibbo, Hao Xu
August 25, 2015
Policy Workbook - iRODS 4.2 by Sheau-Yen Chen, Mike Conway, Jon Crabtree, Cal Lee, Sunitha Misra, Reagan W. Moore, Arcot Rajasekar, Terrell Russell, Isaac Simmons, Lisa Stillwell, Helen Tibbo, Hao Xu
Copyright 2015 by the iRODS Consortium. All rights reserved. Printed in the United States of America.
Published by the iRODS Consortium, 100 Europa Drive, Suite 540, Chapel Hill, North Carolina, 27517 USA.
September 2015
Acknowledgements
This research was supported by:
NSF ITR 0427196, Constraint-Based Knowledge Systems for Grids, Digital Libraries, and Persistent Archives (2004–2007).
NARA supplement to NSF SCI 0438741, Cyberinfrastructure; From Vision to Reality - Developing Scalable Data Management Infrastructure in a Data Grid-Enabled Digital Library System (2005–2006).
NARA supplement to NSF SCI 0438741, Cyberinfrastructure; From Vision to Reality - Research Prototype Persistent Archive Extension (2006–2007).
NSF SDCI 0910431, SDCI Data Improvement: Data Grids for Community Driven Applications (2007–2010).
NSF/NARA OCI 0848296, NARA Transcontinental Persistent Archive Prototype (2008–2010).
NSF OCI 1032732, SDCI Data Improvement: Improvement and Sustainability of iRODS Data Grid Software for Multi-Disciplinary Community Driven Applications (2010–2012).
NSF OCI 0940841, DataNet Federation Consortium (2011–2015).
The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the National Archives and Records Administration (NARA), the National Science Foundation (NSF), or the U.S. Government.
Abstract
Policy-based data management systems, such as the integrated Rule Oriented Data System, automate the enforcement of management policies, automate administrative tasks, and automate the validation of assessment criteria. This book presents policy sets applied in six types of data management applications: 1) data sharing; 2) student digital library; 3) production data centers; 4) preservation; 5) protected data management; and 6) NSF Data Management Plans.
Table of Contents
1 Introduction
1.1 Policy Library
1.2 Summary
2 Data Sharing Policy Set
2.1 Manage user creation (Policy 1)
2.2 Manage user deletion (Policy 2)
2.3 Manage renaming of a data grid (Policy 3)
2.4 Set the maximum number of I/O streams (Policy 4)
2.5 Bypass permission checks for registering a file (Policy 5)
2.6 Set policy for defining physical path name for a file (Policy 6)
2.7 Set number of execution threads used to process rules (Policy 7)
2.8 Set policy for processing files in bulk (Policy 8)
2.9 Manage indexing of the system state catalog (Policy 9)
2.10 Set storage quota policy (Policy 10)
2.11 Manage selection of storage resource (Policy 11)
3 Data Management Policy Set (SILS LifeTime Library)
3.1 Turn on storage quota enforcement (Policy 10)
3.1.1 Check for missing quotas
3.1.2 Calculate total storage usage
3.1.3 Identify persons who exceeded their quota
3.1.4 Periodically update quota check
3.2 Manage selection of storage resource (Policy 11)
3.3 Manage selection of storage resource for replication (Policy 12)
3.4 Enforce replication of each new file (Policy 13)
3.5 Manage access control policy (Policy 14)
4 Data Administration Policy Set (RDA Practical Policy working group)
4.1 Data access control policies (Policy 14)
4.1.1 Find the User_ID associated with a User_name:
4.1.2 Find the File_ID associated with a file name:
4.1.3 Set write access control for a user:
4.1.4 Set operations that are allowable for the user "public"
4.1.5 Check the access controls on a file:
4.2 Data format control policies (Policy 15)
4.2.1 Set format conversion flag
4.2.2 Invoke format conversion
4.2.3 Identify and archive specific file formats from a staging area
4.3 Notification Policies (Policy 16)
4.3.1 Notify on collection deletion
4.3.2 Notification of events
4.4 Use agreement policies (Policy 17)
4.4.1 Set receipt of signed use agreement
4.4.2 Identify users without signed use agreement
4.5 Integrity policy (Policy 18)
4.5.1 Verify access controls on files
4.5.2 Check integrity and number of replicas of files in a collection
4.6 Metadata extraction (Policy 19)
4.6.1 Load metadata from an XML file
4.6.2 Load metadata from a pipe-delimited file
4.6.3 Contextual metadata extraction through pattern recognition
4.6.4 Stripping metadata from a file
4.7 Data backup policies (Policy 20)
4.7.1 Data versioning policy
4.7.2 Data backup staging policy
4.7.3 Copy files to a federated staging area
4.8 Data retention policies (Policy 21)
4.8.1 Purge policy to free storage space
4.8.2 Data expiration policy
4.9 Disposition policy for expired files (Policy 22)
4.10 Restricted searching policy (Policy 23)
4.10.1 Strict access control
4.10.2 Controlled queries
4.11 Storage cost reports (Policy 24)
4.11.1 Usage report by user name and storage system
4.11.2 Cost report by user name and storage system
5 Odum Data Preservation Policy set
5.1 Automate access restrictions (Policy 14)
5.1.1 Set inheritance of access controls on a collection
5.1.2 Check whether a specific person has access to a collection
5.1.3 Identify all persons with access to files in a collection
5.1.4 Identify files that can be accessed by an account
5.1.5 Delete access to files for a specified account
5.1.6 Copy files, access control lists, and AVUs to a federated data grid
5.2 Normalize data to non-proprietary formats (Policy 15)
5.2.1 Detection of format type
5.2.2 Automate format type detection
5.2.3 Identify file format extensions in a collection
5.3 Creation of PREMIS event data (Policy 16)
5.3.1 Creating PREMIS event information
5.3.2 Sending messages over AMQP
5.4 Automation of user submission agreements (Policy 17)
5.4.1 Staging of files with a user submission agreement
5.5 Automatic Checksums (Policy 18)
5.5.1 Creating a BagIt file
5.6 Automated capture of Provenance / contextual metadata (Policy 19)
5.6.1 Provenance for administrative policies
5.7 Federation - periodically copy data (Policy 20)
5.8 De-identification of Data (Policy 25)
5.8.1 BitCurator based processing
5.9 Unique Identifiers for Data Sets (Policy 26)
5.9.1 Assigning a Handle to a File
5.9.2 Registering files in DataONE registry
5.10 Authentication identity management (Policy 27)
5.10.1 Verify access controls on each file
5.11 Automated Data Reviews (Policy 28)
5.11.1 Metadata Review
5.12 Mapping metadata across systems (Policy 29)
5.12.1 Validate HIVE vocabularies
5.13 Export Datasets in Multiple Formats (Policy 30)
5.13.1 Polyglot Format Conversion
5.14 Check for viruses (Policy 31)
5.14.1 Scan files and flag infected objects
5.15 Rule set management (Policy 32)
5.15.1 Deploy rule sets
5.16 Parse event trail for all persons accessing a collection (Policy 33)
6 Protected Data Policy Sets
6.1 Check for presence of PII on ingestion (Policy 34)
6.2 Check for viruses on ingestion (Policy 31)
6.2.1 Scan files and flag infected objects
6.2.2 Migrate files that pass the virus check
6.3 Check passwords for required attributes (Policy 35)
6.4 Encrypt data on ingestion (Policy 36)
6.5 Encrypt data transfers (Policy 37)
6.6 Federation - control data copies (Policy 38)
6.7 Federation - manage remote data grid interactions (Policy 32)
6.7.1 Updating rule base across servers
6.8 Federation - Copy Data from staging area (Policy 20)
6.9 Federation - manage data retrieval (Policy 39)
6.10 Generate checksum on ingestion (Policy 40)
6.11 Generate report of corrections to data sets or access controls (Policy 41)
6.12 Generate report for cost (time) required to audit events (Policy 42)
6.13 Generate report of types of protected assets (Policy 43)
6.14 Generate report of all security and corruption events (Policy 44)
6.15 Generate report of the policies applied to collections (Policy 45)
6.15.1 Deploy rule sets
6.15.2 Update rule sets
6.15.3 Print rule sets
6.16 List all storage systems being used (Policy 46)
6.17 List persons who can access a collection (Policy 47)
6.18 List staff by position and required training courses (Policy 48)
6.18.1 Set position and training
6.18.2 List staff by position and training
6.19 List versions of technology that are being used (Policy 49)
6.20 Maintain document on independent assessment of software (Policy 50)
6.21 Maintain log of all software changes, OS upgrades (Policy 51)
6.21.1 Version log files
6.22 Maintain log of disclosures (Policy 52)
6.23 Maintain password history on user name (Policy 53)
6.24 Parse event trail for all accessed systems (Policy 54)
6.25 Parse event trail for all persons accessing collection (Policy 33)
6.26 Parse event trail for all unsuccessful attempts to access data (Policy 55)
6.27 Parse event trail for changes to policies (Policy 56)
6.28 Parse event trail for inactivity (Policy 57)
6.29 Parse event trail for updates to rule bases (Policy 58)
6.30 Parse event trail to correlate data accesses with client actions (Policy 59)
6.31 Provide test environment to verify policies on new systems (Policy 60)
6.32 Provide test system for evaluating a recovery procedure (Policy 61)
6.33 Provide training courses for users (Policy 62)
6.34 Replicate data sets on ingestion (Policy 13)
6.35 Replicate iCAT periodically (Policy 63)
6.36 Set access approval flag (Policy 64)
6.36.1 Restrict access for "Protected" data
6.37 Set access controls (Policy 14)
6.37.1 Set access controls after proprietary period
6.38 Set access restriction until approval flag is set (Policy 65)
6.39 Set approval flag per collection for enabling bulk download (Policy 66)
6.40 Set asset protection classifier for data sets based on type of PII (Policy 67)
6.41 Set flag for whether tickets can be used on files in a collection (Policy 68)
6.41.1 Remove public and anonymous access
6.42 Set lockout flag and period on user name - counting number of tries (Policy 69)
6.42.1 Set lockout period on user name
6.43 Set password update flag on user name (Policy 70)
6.44 Set retention period for data reviews (Policy 71)
6.45 Set retention period on ingestion (Policy 21)
6.46 Track systems by type (server, laptop, router, ….) (Policy 72)
6.47 Verify approval flags within a collection (Policy 73)
6.48 Verify files have not been corrupted (Policy 18)
6.49 Verify presence of required replicas (Policy 74)
6.50 Verify that no controlled data have public or anonymous access (Policy 75)
6.50.1 Restrict access to "Protected" data
6.51 Verify that protected assets have been encrypted (Policy 76)
6.51.1 Check that files with ACCESS_APPROVAL = 0 are encrypted
7 Data Management Plan Example Rules
7.1 Staffing policies (Policy 48)
7.2 Cost reporting (Policy 24)
7.3 Collection creation planning (Policy 45)
7.4 Instrument control (Policy 77)
7.5 Event log for collection formation (Policy 54)
7.6 Collection reports (Policy 41)
7.7 Product formation (Policy 17)
7.8 Data category management (Policy 78)
7.9 Re-using existing data (Policy 79)
7.10 Quality control (Policy 80)
7.11 Analysis procedures (Policy 81)
7.12 Analysis collaborations (Policy 82)
7.13 Data dictionary (Policy 29)
7.14 Naming control (Policy 83)
7.15 Data format control (Policy 16)
7.16 Unique identifiers (Policy 27)
7.17 Metadata standard (Policy 29)
7.18 Metadata export (Policy 84)
7.19 Collection creation system (Policy 85)
7.20 Collection size (Policy 86)
7.21 Publication of original data (Policy 87)
7.22 Publication of data products (Policy 88)
7.23 Re-use policies (Policy 89)
7.24 Distribution policies (Policy 90)
7.25 Privacy access restrictions (Policy 14)
7.26 IPR restrictions (Policy 91)
7.27 Web access policies (Policy 92)
7.28 Data sharing system (Policy 93)
7.29 Code distribution system (Policy 94)
7.30 Retention period (Policy 21)
7.31 Curation plans (Policy 95)
7.32 Archive system (Policy 96)
7.33 Replication policy (Policy 13)
7.34 Backup policy (Policy 97)
7.35 Integrity verification (Policy 18)
7.36 Technology management policies (Policy 49)
7.37 Metadata catalog management (Policy 9)
7.38 Transformative migration (Policy 15)
8 Verifying Policy Sets:
8.1 Analysis of the integrated Rule Oriented Data System
8.2 Policy-enforcement points
8.3 Client invocation of policy-enforcement points
8.4 Procedures executed at each policy enforcement point
9 Summary:
10 Acknowledgements:
11 References:
Appendix A: Policy-enforcement Points
Appendix B: Client Invocation of Policy Enforcement Points
Appendix C: Micro-services
Appendix D: Persistent State Variables
Appendix E: Protected Data Requirements
Appendix F: Mauna Loa Sensor Data DMP
1 Introduction
The DataNet Federation Consortium (DFC) infrastructure enables communities to implement their preferred data management application. Partners within the DFC have implemented data sharing environments, data publication systems (digital libraries), data preservation systems (archives), data distribution systems, and data processing systems (processing pipelines). The DFC supports each type of data management application by specifying a set of policies that enforce the desired purpose for the collection.
A data sharing environment focuses on:
- Unified name spaces for users, files, collections, metadata
- Access controls
- Hierarchical arrangement
- Integrity
A digital library focuses on:
- Controlled name spaces for files, collections, metadata
- Descriptive metadata standards
- Standard data format
- PREMIS event data
An archive focuses on:
- Authenticity
- Integrity
- Chain of Custody
- Original arrangement
A data distribution system focuses on:
- Caching
- Replication
- Synchronization
- Access controls
A processing pipeline focuses on:
- Controlled name spaces for users, files, collections, metadata, and procedures
- Sharing of procedures, files
- Access controls
- Provenance of workflows
Each of these types of data management applications can build upon common data grid infrastructure by choosing an appropriate set of policies and procedures. The policies determine when and where the procedures are executed. Within the integrated Rule Oriented Data System (iRODS) data grid, policies can be automatically enforced at policy enforcement points, or policies can be executed interactively by a user or grid administrator, or policies can be scheduled for deferred and periodic execution. The policy enforcement points typically control management policies. Deferred and periodic execution are used for administrative tasks. Interactive execution may be used to validate assessment criteria.
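The three execution modes described above can be illustrated with a small model. The following is a minimal sketch in Python rather than in the iRODS rule language; the `RuleBase` class, the `post_put` enforcement-point name, and the policy functions are hypothetical names chosen for illustration and are not part of iRODS.

```python
from typing import Callable, Dict, List, Tuple

# A policy is modeled as a function that receives an event context.
Policy = Callable[[dict], None]


class RuleBase:
    """Toy rule base: maps enforcement-point names to ordered policies."""

    def __init__(self) -> None:
        self._rules: Dict[str, List[Policy]] = {}

    def register(self, pep: str, policy: Policy) -> None:
        self._rules.setdefault(pep, []).append(policy)

    def fire(self, pep: str, context: dict) -> int:
        """Run every policy registered for the PEP; return how many ran."""
        policies = self._rules.get(pep, [])
        for policy in policies:
            policy(context)
        return len(policies)


audit_log: List[str] = []


def log_upload(context: dict) -> None:
    # Hypothetical management policy applied when a file is stored.
    audit_log.append(f"put {context['path']} by {context['user']}")


def report_usage(context: dict) -> None:
    # Hypothetical administrative or assessment policy.
    audit_log.append(f"usage report for {context['collection']}")


rule_base = RuleBase()
rule_base.register("post_put", log_upload)


def put_file(user: str, path: str) -> None:
    # A client action; it is trapped at the "post_put" enforcement point.
    rule_base.fire("post_put", {"user": user, "path": path})


# Queue of (policy, context) pairs awaiting deferred execution.
deferred: List[Tuple[Policy, dict]] = []


def run_deferred() -> int:
    """Execute all queued policies in order; return how many ran."""
    ran = 0
    while deferred:
        policy, context = deferred.pop(0)
        policy(context)
        ran += 1
    return ran


# 1) Automatic enforcement: triggered by a client action.
put_file("alice", "/zone/home/alice/data.csv")
# 2) Interactive execution: an administrator invokes the policy directly.
report_usage({"collection": "/zone/home/alice"})
# 3) Deferred execution: the policy is queued and run later.
deferred.append((report_usage, {"collection": "/zone/home"}))
ran = run_deferred()
```

In this sketch the same policy function can be reached through any of the three modes; only the trigger differs.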
This book lists policy sets that have been implemented in an iRODS data grid, generated in academic classes on digital libraries, and provided by user communities. Figure 1 lists the basic concepts underlying policy-based data management. Given a specific data management purpose, a collection can be assembled that has desired properties such as integrity, authenticity, and access controls. The properties themselves may have associated requirements such as completeness (all files in the collection have each property), correctness (incorrect values for metadata have been identified and eliminated), consensus (the properties represent the combined desire of the group assembling the collection), and consistency (the same metadata and data format standards have been applied to all files in the collection).

Figure 1. Policy-based data management concept graph

Each desired property is enforced by a set of policies that determine when and where associated procedures are executed. Thus an integrity property may require policies for generating checksums and replicating files. The associated procedures are workflows composed by chaining together basic tasks or functions (also called micro-services). The functions apply basic operations such as generating a checksum, replicating a file, or setting the data type. The results of applying the functions are saved as persistent state information or metadata attributes on the files, users, storage systems, policies, and micro-services.
Clients interact with the system by requesting actions that are trapped at policy enforcement points (PEP). At each PEP, a rule base is examined to determine which policy to apply, and the associated procedure is executed. To implement assessment criteria, policies can be executed periodically to verify collection properties. We consider policy sets for the following purposes:

  Data sharing, implemented in the standard integrated Rule Oriented Data System (iRODS) release [1].
  Digital library management, implemented in the School of Information and Library Science LifeTime Library [2].
  Distributed data management, implemented in the Research Data Alliance Practical Policy working group [3].
  Data preservation, implemented in the DataNet Federation Consortium.
  Protected data management, defined in the UNC administrator manual, https://www.med.unc.edu/security/hipaa/documents/ADMIN0082%20Info%20Security.pdf
  Data Management Plans, defined at the Data Management Planning tool site, https://dmptool.org
For each policy set, we define a set of iRODS rules that can be used to enforce management policies, automate administrative functions, and validate assessment criteria. The rules are written in the iRODS rule language [4-5]. Each rule that is run interactively has a rule name, a rule body enclosed in braces that is written in the iRODS rule language, INPUT variables, and OUTPUT variables. An example rule to say “hello world” is:

  Mytestrule {
    # rule to write hello world
    writeLine("stdout", "$userNameClient says hello world");
  }
  INPUT null
  OUTPUT ruleExecOut

Note that “ruleExecOut” on an OUTPUT line will copy the output information written to "stdout" to the user’s screen. This enables retrieval of information generated through interactive execution of a rule. If the rule is executed at a policy enforcement point or executed periodically, the output should be written to a log file and saved within the data grid. The session variable “$userNameClient” contains the name of the person who executed the command. The result printed to the screen by running this rule from account rwmoore with the irule command is:

  rwmoore says hello world

The following examples include rules that can be run interactively by a user, rules that are run by a data grid administrator, rules that are enforced at Policy-Enforcement Points, and rules that run periodically under rule engine control.
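As an illustration only, the expansion of the session variable in the hello world rule can be mimicked outside of iRODS. The Python sketch below is not iRODS code: it relies on the standard string.Template class, and the function name expand_session_variables and the dictionary of session values are assumptions made for this example.

```python
from string import Template


def expand_session_variables(text, session):
    """Replace $name placeholders with values from a session dictionary.

    This only imitates the substitution that the rule engine performs
    for session variables such as $userNameClient.
    """
    # safe_substitute leaves unknown $names untouched instead of raising
    return Template(text).safe_substitute(session)


# Hypothetical session state for a rule run from the account rwmoore
session = {"userNameClient": "rwmoore"}
line = expand_session_variables("$userNameClient says hello world", session)
print(line)
```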
Rules that are applied at Policy-Enforcement Points have a standard rule name related to the specific action that is being controlled. The INPUT variables are typically replaced with session variables that track who is executing an external action. The INPUT variables may also be set through queries on the metadata catalog. Rules can query a metadata catalog to retrieve information about the collection, the users, the storage systems, and user-defined metadata. In many of the following examples, a query is made to the metadata catalog, a “foreach” loop is then used to process the rows returned from the query, parameters are extracted from the row structure using a “.” operator, and information is output to a log file using a writeLine micro-service. More information on the iRODS rule language can be found at http://irods.org and in the “iRODS Primer” [4].

Policies from all six policy sets are included in this document. There is substantial overlap between policies from the Practical Policy working group, the DFC preservation policy set, and the Data Management Plan set. The policies unique to the DFC preservation policy set require interaction with external systems, which are listed in Table 1. While many of the policies are supported within the iRODS data grid, policies may require the use of external technologies, such as the InCommon authentication system, the HIVE (Helping Interdisciplinary Vocabulary Engineering) system, the Polyglot format translation service, the Bitcurator data analysis system, and the Handle file identifier system. The policy sets are identified by the number in the leftmost column. When policies overlap across the six example areas, the policy number can be used to identify related policies. A total of 97 policy sets have been defined.

Table 1. Comparison of policy sets for data sharing, LifeTime Library, RDA data management, DFC preservation, Protected Data and Data Management Plans.
Column headings: Policies; iRODS (default policies for data sharing); sils (LifeTime Library policies); rda (Practical Policy WG policies for administration); odum (policy set for preservation); hipaa (Protected Data); dmp (Data Management Plans); Supporting Technology
1 User creation X iRODS
2 User deletion X iRODS
3 Rename data grid X iRODS
4 Set number of I/O streams X iRODS
5 Server Permission checks X iRODS
6 Physical path name X iRODS
7 Execution threads X iRODS
8 Bulk processing X iRODS
9 Catalog indexing X X iRODS
10 Storage quota X X iRODS
11 Select storage X X iRODS
12 Select replication resource X iRODS
13 Replicate files X X X iRODS
14 Access controls X X X X X iRODS
15 Data format control policies X X X iRODS, Polyglot
16 Notification policies X X iRODS, message bus
17 Use agreement policies X X X iRODS
18 Verify files have not been corrupted X X X X iRODS, SHA-128
19 Contextual metadata extraction policies X X iRODS
20 Federation - periodically copy data X X X iRODS
21 Data retention policies X X X iRODS
22 Data disposition policies X iRODS
23 Restricted searching policies X iRODS
24 Storage cost reports X X iRODS
25 De-identification of data. X Bitcurator, iRODS
26 Applying unique identifiers to data sets. X X Handle, iRODS
27 Authentication protocols for repository users. X InCommon, iRODS
28 Automated metadata review X X iRODS
29 Mapping metadata across systems. X HIVE, iRODS
30 Ability to export datasets in multiple formats X Polyglot, iRODS
31 Check for viruses on ingestion X X ClamScan, iRODS
32 Federation - manage remote data grid interactions X X iRODS
33 Parse event trail for all persons accessing collection X X iRODS, operations
34 Check for presence of PII on ingestion X Bitcurator, iRODS
35 Check passwords for required attributes X iRODS
36 Encrypt data on ingestion X iRODS
37 Encrypt data transfers X iRODS
38 Federation - control data copies X iRODS
39 Federation - manage data retrieval X iRODS
40 Generate checksum on ingestion X iRODS
41 Generate report by collection of corrections to data sets or access controls X iRODS
42 Generate report for cost (time) required to audit events X iRODS
43 Generate report of types of protected assets present within a collection X iRODS
44 Generate report of all security and corruption events X iRODS
45 Generate report of the policies that are applied to the collections X iRODS
46 List all storage systems being used X iRODS
47 List persons who can access a collection X iRODS
48 List staff by position and required training courses X X iRODS
49 List versions of technology that are being used X X iRODS, operations
50 Maintain document on independent assessment of software X iRODS, operations
51 Maintain log of all software changes, OS upgrades X iRODS, operations
52 Maintain log of disclosures X iRODS, operations
53 Maintain password history on user name X iRODS
54 Parse event trail for all accessed systems X X iRODS, operations
55 Parse event trail for all unsuccessful attempts to access data X Databook, iRODS
56 Parse event trail for changes to policies X Databook, iRODS
57 Parse event trail for inactivity X Databook, iRODS
58 Parse event trail for updates to rule bases X Databook, iRODS
59 Parse event trail to correlate data accesses with client actions X Databook, iRODS
60 Provide test environment to verify policies on new systems X iRODS, operations
61 Provide test system for evaluating a recovery procedure X iRODS, operations
62 Provide training courses for users X Operations
63 Replicate iCAT periodically X iRODS
64 Set access approval flag X iRODS
65 Set access restriction until approval flag is set X iRODS
66 Set approval flag per collection for enabling bulk download X iRODS
67 Set asset protection classifier for data sets based on type of PII X iRODS
68 Set flag for whether tickets can be used on files in a collection X iRODS
69 Set lockout flag and period on user name - counting number of tries X iRODS
70 Set password update flag on user name X iRODS
71 Set retention period for data reviews X iRODS
72 Track systems by type (server, laptop, router, ...) X iRODS, operations
73 Verify approval flags within a collection X iRODS
74 Verify presence of required replicas X iRODS
75 Verify that no controlled data collections have public or anonymous access X iRODS
76 Verify that protected assets have been encrypted X iRODS
77 Instrument Type X iRODS
78 Data category X iRODS
79 Use of existing data X iRODS
80 Quality control X iRODS
81 Analysis X iRODS
82 Data sharing during analysis X iRODS
83 Naming attributes X iRODS
84 Metadata export X iRODS
85 Collection location X iRODS
86 Size X iRODS
87 Make original data public X iRODS
88 Make data products public X iRODS
89 Re‐use X iRODS
90 Re‐distribution X iRODS
91 IPR X iRODS
92 Web access X iRODS
93 Data sharing system X iRODS
94 Code distribution system X iRODS
95 Curation X iRODS
96 Archive X iRODS
97 Backup frequency X iRODS
Typically, there is more than one way to provide the functions needed for a specific policy, and more than one way to implement a policy. In practice, policies are needed to initialize environmental variables, to enforce management decisions, and to validate assessment criteria. Thus each policy area may require the implementation of a set of policies for each user group or collection.
1.1 Policy Library

To simplify writing the policies, a library of standard policy functions has been developed, called dfc-functions.re. The operations that are supported are:

1. addAVUMetadata(*Path, *Attname, *Attvalue, *Aunit, *Status)
   Add AVU metadata to a file.
   *Path  The iRODS path to a file
   *Attname  The attribute name to be added
   *Attvalue  The attribute value to be added
   *Aunit  The attribute unit to be added
   *Status  The return status (“0” if successful)
2. addAVUMetadataToColl(*Coll, *Attname, *Attvalue, *Attunit, *Status)
   Add AVU metadata to a collection.
   *Coll  The iRODS collection name
   *Attname  The attribute name to be added
   *Attvalue  The attribute value to be added
   *Attunit  The attribute unit to be added
   *Status  The return status (“0” if successful)
3. addToList(*Name, *Usage, *Listnam, *Listuse, *Min, *Num)
   Add usage and name to a list in sorted order.
   *Name  A name to be added to a list which is sorted by usage
   *Usage  The usage associated with the name
   *Listnam  The return list of names that is sorted
   *Listuse  The return list of usage values associated with the names
   *Min  Set to the minimum usage value currently in the list
   *Num  The size of the list (fixed input value)
4. checkCollInput(*Coll)
   This checks whether the input variable is a collection.
   *Coll  The name of the collection to check. Fails if collection does not exist.
5. checkFileInput(*File)
   This checks whether the input variable is a file.
   *File  The name of the file to check. Fails if file does not exist.
6. checkMetaExistsColl(*Attname, *Coll, *Lfile, *Value)
   This checks whether a metadata attribute exists on a collection.
   *Attname  The name of a metadata attribute that should be present for the collection. Created if missing with value “0”.
   *Coll  The name of the collection that is being checked
   *Lfile  The name of the output buffer for error messages
   *Value  The value of the metadata attribute, set to zero if the attribute was missing
7. checkPathInput(*Path)
   This checks whether a valid path name exists.
   *Path  The iRODS path name to be verified (collection/file).
8. checkRescInput(*Res, *Zone)
   This checks whether the input variable is a storage resource in zone *Zone.
   *Res  The name of a storage resource to be checked.
   *Zone  The name of the iRODS zone which has the resource.
9. checkUserInput(*User, *Zone)
   This checks whether the input variable is a user in zone *Zone.
   *User  The USER_NAME of a user.
   *Zone  The USER_ZONE of a user.
10. checkZoneInput(*Zone)
   This checks whether the designated zone is accessible through federation.
   *Zone  The federated zone to be checked. Routine fails if the zone is not federated correctly.
11. contains(*list, *elem)
   Returns true if list contains the element.
   *list  The list that is checked.
   *elem  The element string which is tested for presence in the list.
12. createCollections(*coll, *cs)
   Create a sub-collection for each entry in list *cs under *coll.
   *coll  The full path to the parent collection.
   *cs  A list of subdirectories that are added to the parent collection.
13. createList(*Lista, *Num, *Val)
   Create a list of length *Num with default *Val.
   *Lista  The list that is being created.
   *Num  The number of default values to put in the list.
   *Val  The default value for each list item.
14. createLogFile(*Coll, *Sub, *Name, *Res, *LPath, *Lfile, *L_FD)
   This creates a log collection and a log file.
   *Coll  The full path to a collection.
   *Sub  The subdirectory that is created if necessary to hold the log file.
   *Name  The name of the log file to which a time stamp is appended
   *Res  The storage resource where the log file is stored.
   *LPath  Returns the full path to the log collection (*Coll/*Sub)
   *Lfile  Returns the name of the log file
   *L_FD  Returns the file descriptor for the log file.
15. createReplicas(*N, *Numrepl, *Lfile, *Ulist, *Rlist, *Jround, *Resource, *Coll, *File, *NumRepCreated)
   This creates *N replicas on a list of resources.
   *N  The number of replicas to create of a file.
   *Numrepl  The number of storage resources included in the list of resources.
   *Lfile  The output buffer name for writing error messages.
   *Ulist  A list that is set to “1” when a replica exists on a storage resource
   *Rlist  The corresponding list of storage replicas.
   *Jround  An index into the list of storage resources for the starting resource to use for replication.
   *Resource  The resource used as the source for the replica.
   *Coll  The collection name of the file being replicated.
   *File  The name of the file that is replicated.
   *NumRepCreated  A counter that is incremented as replicas are created.
16. deleteAVUMetadata(*Path, *Attname, *Attvalue, *Aunit, *Status)
   This deletes a metadata attribute and value from a file.
   *Path  The iRODS full path to a file.
   *Attname  The attribute name that will be deleted.
   *Attvalue  The attribute value that will be deleted.
   *Aunit  The attribute units that will be deleted.
   *Status  The return status result (“0” if successful).
17. ext(*p)
   Extracts extension by parsing string for letters after a dot.
   *p  The string that is being parsed.
18. findZoneHostName(*Zone, *Host, *Port)
   This returns the host name and port for a federated zone.
   *Zone  The name of the iRODS zone which is being accessed.
   *Host  Returns the host name extracted from ZONE_CONNECTION.
   *Port  Returns the port extracted from ZONE_CONNECTION.
19. getCollections(*filePaths)
   Returns list of collections by deleting the file name.
   *filePaths  Converts a list of paths into a list of collections.
20. getFiles(*localRoot, *localPaths)
   Returns list of files by stripping *localRoot from list *localPaths.
   *localRoot  The collection name that is stripped from the input paths.
   *localPaths  Returns the list of files
21. getNumSizeColl(*Coll, *colldataID, *Size, *Num)
   This counts the number of files and total size in a collection.
   *Coll  The full path to a collection.
   *colldataID  The number and size are calculated for all files in the collection with DATA_ID > *colldataID.
   *Size  Returns the total size of files in the collection.
   *Num  Returns the number of files in the collection.
22. getRescColl(*Coll, *Rlist, *Ulist, *Lfile, *Num)
   This creates a list of storage resources used by files in a collection.
   *Coll  The full path to a collection that is analyzed.
   *Rlist  Returns a list of resources on which files were stored.
   *Ulist  Returns a usage list initialized to “0”.
   *Lfile  The output buffer to which information is written.
   *Num  Returns the number of resources that were found.
23. isColl(*LPath, *Lfile, *Status)
   Check if collection exists and create if necessary.
   *LPath  The full path name for an iRODS collection.
   *Lfile  The output buffer to which information is written.
   *Status  Returns “0” if the collection does not exist.
24. isData(*Coll, *File, *Status)
   This checks whether a file already exists.
   *Coll  The full path name for an iRODS collection.
   *File  The name of a file that is tested for presence in the collection.
   *Status  Returns “0” if the file does not exist.
25. modAVUMetadata(*Path, *Attname, *Attvalue, *Aunit, *Status)
   This modifies an existing AVU attribute on a data file.
   *Path  The full path to a file in iRODS.
   *Attname  The attribute name that is being modified with a new value or unit.
   *Attvalue  The new value that is being inserted.
   *Aunit  The new unit that is being inserted.
   *Status  Returns the status of the operation.
26. selectRescUpdate(*Rlist, *Ulist, *Num, *Resource)
   This selects a resource to use from a list of storage resources.
   *Rlist  A list of storage resources.
   *Ulist  Corresponding list of usage with value “1” if the storage resource has a replica.
   *Num  The number of storage resources in the list.
   *Resource  Returns a resource that does not store a replica.
27. sendAccess(*AccessType, *UserName, *DataId, *DataType, *Time, *Description, *eventOutcome, *host, *queue)
   Generates an access event message and sends it using AMQP.
   *AccessType  Input type of access event.
   *UserName  Input name of user who caused the event.
   *DataId  Input DATA_ID of a file that was manipulated.
   *Time  Input date when the event occurred.
   *Description  Input description of the event.
   *eventOutcome  Input event outcome.
   *host  Input address of host where the event information is sent.
   *queue  Input queue where the message is sent.
28. sendLinkingEvent(*DataId, *AccessId, *host, *queue)
   Generate a JSON document describing a link between objects.
   *DataId  Input DATA_ID of file that was manipulated.
   *AccessId  Input event identifier value.
   *host  Input address of host where the information is sent.
   *queue  Input queue where the message is sent.
29. sendRelatedEvent(*relationshipType, *relationshipSubType, *DataIds, *AccessIds, *host, *queue)
   Creates a JSON document describing a related event between objects.
   *relationshipType  Input type of relationship.
   *relationshipSubType  Input subtype for relationship.
   *DataIds  List of DATA_IDs for files that are related.
   *AccessIds  List of access IDs for the files.
   *host  Input address of host for sending a message.
   *queue  Input queue where message is sent.
30. updateCollMeta(*Coll, *Attr, *OldValue, *NewValue, *Lfile)
   This updates a metadata attribute on a collection.
   *Coll  Path to a collection whose metadata is modified.
   *Attr  Collection attribute name whose value is modified.
   *OldValue  Original value for attribute.
   *NewValue  New value for attribute.
   *Lfile  Name of buffer where information is written.
31. uploadFiles(*localRoot, *localPaths, *coll)
   Moves files in *localPaths to the collection *coll.
   *localRoot  The collection name that is stripped from the input paths.
   *localPaths  List of file path names.
   *coll  Name of collection where files are copied.
32. verifyReplicaChksum(*Coll, *File, *Lfile, *Num, *Rlist, *Ulist0, *Ulist, *Numr, *NumBad)
   This verifies checksums on the replicas for a file.
   *Coll  Collection whose files will be checked for integrity.
   *File  The file in the collection checked for replicas.
   *Lfile  Name of output buffer where information is written.
   *Num  Number of storage resources in the storage resource list.
   *Rlist  List of storage resources used by the collection.
   *Ulist0  A list that has been initialized to “0”.
   *Ulist  Returns list of resources that were used to store a replica.
   *Numr  Returns the number of replicas that exist on the storage resources.
   *NumBad  Returns the number of files that have a bad checksum.
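The behavior of a few of the simpler list and path helpers can be sketched in Python from the descriptions above. This is a reading of those descriptions, not the code in dfc-functions.re: the Python names are hypothetical, the zone name tempZone is made up for the usage lines, and edge cases (for example a string without a dot, or taking the last dot when there are several) are handled by assumption.

```python
def ext(p):
    """Return the characters after the last dot in p, or "" if no dot."""
    _, dot, tail = p.rpartition(".")
    return tail if dot else ""


def contains(items, elem):
    """Return True if the list contains the element string."""
    return elem in items


def create_list(num, val):
    """Create a list of length num filled with the default value val."""
    return [val] * num


def get_collections(file_paths):
    """Convert a list of paths into collections by deleting the file name."""
    return [path.rpartition("/")[0] for path in file_paths]


def get_files(local_root, local_paths):
    """Strip the leading local_root collection from each path."""
    prefix = local_root.rstrip("/") + "/"
    return [p[len(prefix):] if p.startswith(prefix) else p
            for p in local_paths]


# Usage with hypothetical paths
paths = ["/tempZone/home/rods/a.txt", "/tempZone/home/rods/sub/b.csv"]
print(ext(paths[0]))
print(get_collections(paths))
print(get_files("/tempZone/home/rods", paths))
```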
The rule examples assume that the library of policy functions has been entered into the configuration file, /etc/irods/server_config.json, by addition to the re_rulebase_set:

  "re_rulebase_set": [{"filename": "core"}, {"filename": "dfc-functions"}]
The library of policy functions is called dfc-functions.re and is available for download at http://github.com/DICE-UNC/policy-workbook/dfc-functions.re. A policy function for encoding a string into JSON is available from the policy function file json-encode.re at http://github.com/DICE-UNC/policy-workbook.

1. jsonEncode(*str)
   This escapes all special characters in a string.
   *str  A string that is processed for special characters
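The kind of escaping that jsonEncode performs can be illustrated with the Python standard library. This is a sketch of the idea only: the function name json_encode is hypothetical, and the exact set of characters escaped by json-encode.re may differ from what json.dumps escapes (for example, json.dumps also converts non-ASCII characters to \u escapes by default).

```python
import json


def json_encode(text):
    """Escape the characters that are special inside a JSON string."""
    # json.dumps adds surrounding double quotes; strip them so that
    # only the escaped body is returned.
    return json.dumps(text)[1:-1]


escaped = json_encode('He said "hi"\n')
print(escaped)
```

Wrapping the returned text in double quotes gives a valid JSON string literal that decodes back to the original input.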
Each policy implements a workflow that relies upon input variables, session variables, and persistent state information to manage the workflow operations. Each policy is defined by the set of operations and variables that are applied. A copy of each policy written in the iRODS rule language is available at http://github.com/DICE-UNC/policy-workbook. Definitions of the workflow operations are given in Appendix C. Definitions of the persistent state variables are given in Appendix D.
1.2 Summary

This book presents templates for 130 policies. The resulting rules were analyzed to determine the tasks that were automated, the session variables that were used, the persistent state information that was used, and the operations that were performed. This presents a characterization of a “minimal” policy-based data management system that is capable of supporting:

  Data sharing
  Digital libraries
  Production data centers
  Preservation
  Protected data
  NSF Data Management Plans
The task list in Table 1 has been sorted to group similar tasks together.

Table 2a: Sorted task list
Ability to export datasets in multiple formats
Access controls
Analysis
Applying unique identifiers to data sets.
Archive
Authentication protocols for repository users.
Automated metadata review
Backup frequency
Bulk processing
Catalog indexing
Check for presence of PII on ingestion
Check for viruses on ingestion
Check passwords for required attributes
Code distribution system
Collection location
Contextual metadata extraction policies
Curation
Data category
Data disposition policies
Data format control policies
Data retention policies
Data sharing during analysis
Data sharing system
De-identification of data.
Encrypt data on ingestion
Encrypt data transfers
Execution threads
Federation - control data copies
Federation - manage remote data grid interactions
Federation - periodically copy data
Federation - manage data retrieval
Generate checksum on ingestion
Generate report by collection of corrections to data sets or access controls
Generate report for cost (time) required to audit events
Generate report of types of protected assets present within a collection
Generate report of all security and corruption events
Generate report of the policies that are applied to the collections
Instrument Type
IPR
List all storage systems being used
List persons who can access a collection
List staff by position and required training courses
List versions of technology that are being used
Maintain document on independent assessment of software
Maintain log of all software changes, OS upgrades
Maintain log of disclosures
Maintain password history on user name
Make data products public
Make original data public
Mapping metadata across systems.

Table 2b: Sorted task list
Metadata export
Naming attributes
Notification policies
Parse event trail for all accessed systems
Parse event trail for all persons accessing collection
Parse event trail for all unsuccessful attempts to access data
Parse event trail for changes to policies
Parse event trail for inactivity
Parse event trail for updates to rule bases
Parse event trail to correlate data accesses with client actions
Physical path name
Provide test environment to verify policies on new systems
Provide test system for evaluating a recovery procedure
Provide training courses for users
Quality control
Re-distribution
Re-use
Rename data grid
Replicate files
Replicate iCAT periodically
Restricted searching policies
Select replication resource
Select storage
Server Permission checks
Set access approval flag
Persistent state information for nine types of objects was used:

  Collections
  Data
  Metadata
  Quotas
  Resources
  Tickets
  Tokens
  Users
  Zones

A total of 50 persistent state information variables were accessed.

Table 3. Persistent State Information Variables Used in Policies
COLL_ACCESS_COLL_ID DATA_SIZE RESC_LOC
COLL_ACCESS_TYPE DATA_TYPE_NAME RESC_NAME
COLL_ACCESS_USER_ID META_COLL_ATTR_NAME TICKET_DATA_COLL_NAME
COLL_ID META_COLL_ATTR_VALUE TICKET_EXPIRY
COLL_NAME META_DATA_ATTR_ID TICKET_ID
DATA_ACCESS_DATA_ID META_DATA_ATTR_NAME TOKEN_ID
DATA_ACCESS_TYPE META_DATA_ATTR_UNITS TOKEN_NAME
DATA_ACCESS_USER_ID META_DATA_ATTR_VALUE TOKEN_NAMESPACE
DATA_CHECKSUM META_RESC_ATTR_NAME USER_GROUP_ID
DATA_CREATE_TIME META_RESC_ATTR_VALUE USER_ID
DATA_EXPIRY META_USER_ATTR_NAME USER_INFO
DATA_ID META_USER_ATTR_VALUE USER_NAME
DATA_MODIFY_TIME QUOTA_OVER USER_TYPE
DATA_NAME QUOTA_USAGE USER_ZONE
DATA_PATH QUOTA_USAGE_USER_ID ZONE_CONNECTION
DATA_REPL_NUM QUOTA_USER_ID ZONE_NAME
DATA_RESC_NAME RESC_ID
Only five session variables were used to track attributes about clients:

  $objPath
  $otherUserName
  $rodsZoneClient
  $rodsZoneProxy
  $userNameClient

A total of 123 operations were applied in automating the tasks. Almost a fifth of the operations were related to initializing default environment variables such as the number of parallel I/O streams, the number of processing threads, the default storage resource, the default replication resource, and the operations permitted by public users.

Table 4a. Operations Needed to Automate Tasks
. ‐ dot operator msiCurlGetStr
break msiCurlUrlEncodeString
cons msiDataObjChksum
delay msiDataObjClose
elem msiDataObjCopy
errorcode msiDataObjCreate
errormsg msiDataObjGet
execCmdArg msiDataObjLseek
fail msiDataObjOpen
failmsg msiDataObjPut
for msiDataObjRead
foreach msiDataObjRename
if msiDataObjRepl
irods_curl‐get msiDataObjTrim
list msiDataObjUnlink
msiAclPolicy msiDataObjWrite
msiAddKeyVal msiDeleteCollByAdmin
msiAddUserToGroup msiDeleteDisallowed
msiAdmInsertRulesFromStructIntoDB msiDeleteUser
msiAdmReadRulesFromFileIntoStruct msiEncrypt
msiAdmRetrieveRulesFromDBIntoStruct msiExecCmd
msiAdmShowIRB msiExecGenQuery
msiAdmWriteRulesFromStructIntoFile msiExecStrCondQuery
msiAssociateKeyValuePairsToObj msiExtractTemplateMDFromBuf
msiChksumRuleSet msiFreeBuffer
msiCollCreate msiGetContInxFromGenQueryOut
msiCollRsync msiGetFormattedSystemTime
msiCommit msiGetIcatTime
msiCreateUserAccountsFromDataObj msiGetMoreRows
msiCreateCollByAdmin msiGetObjType
msiCreateUser msiGetStderrInExecCmdOut
Table 4b. Operations Needed to Automate Tasks
msiGetStdoutInExecCmdOut msiSetDefaultResc
msiGetSystemTime msiSetGraftPathScheme
msiGetValByKey msiSetNumThreads
msiLoadMetadataFromDataObj msiSetPublicUserOpr
msiLoadMetadataFromXml msiSetRescQuotaPolicy
msiLoadUserModsFromDataObj msiSetReServerNumProc
msiMakeGenQuery msiSleep
msiMakeQuery msiSplitPath
msiMvRuleSet msiSplitPathByKey
msiNoChkFilePathPerm msiStoreVersionWithTS
msiOrbClose msiString2KeyValPair
msiOrbDecodePkt msiStripAVUs
msiOrbOpen msiSysChksumDataObj
msiOrbReap msiSysMetaModify
msiOrbSelect msiSysReplDataObj
msiQuota msiTarFileCreate
msiReadMDTemplateIntoTagStruct msiVacuum
msiReadRuleSet msiWriteRodsLog
msiRemoveKeyValuePairsFromObj remote
msiRenameCollection select
msiRenameLocalZone setelem
msiRollback split
msiRmRuleSet strlen
msiRuleSetExists substr
msiSendMail succeed
msiSetACL time
msiSetAVU while
msiSetBulkGetPostProcPolicy writeKeyValPairs
msiSetBulkPutPostProcPolicy writeLine
msiSetDataType writeString
msiSetDataTypeFromExt
2 Data Sharing Policy Set

The iRODS data grid distribution comes with 11 default policies that implement a data sharing environment. These policies are provided in a rule base, and are invoked automatically at policy-enforcement points within the data grid middleware. Actions initiated by clients are trapped at the policy-enforcement points, the rule base is accessed to determine the appropriate policy to apply, and an associated procedure is executed to enforce the policy. The policies invoked at these enforcement points in the standard iRODS release are given a name that corresponds to the policy-enforcement point (typically starting with “ac”). In iRODS version 4.0.3 there are 70 standard policy enforcement points. Additional policy enforcement points can be plugged into the architecture to control new actions. The default rule base is available at https://github.com/irods/irods/blob/master/packaging/core.re.template

2.1 Manage user creation (Policy 1)

This policy is invoked when a new user is created. The rule creates a home directory and a trash directory for each new user account, and adds the account to the user group “public”. If the account is “anonymous”, the home directory and trash directory are not created. The rule uses session variables to identify the data grid zone name ($rodsZoneProxy) and the account name ($otherUserName). Note that there are two versions of the acCreateUserF1 rule. If the condition for the first version is not satisfied, the second version of the rule is executed. If a task fails, the micro-service listed after the “:::” separator is executed. Thus interactions with the metadata catalog are “rolled back” if the registration attempt fails. The policy includes invocation of pre-processing and post-processing rules for user creation.

The policy implements a constraint:
  Applied at the acCreateUser policy enforcement point
  Test on User-name = anonymous

The policy uses session variables:
  $otherUserName
  $rodsZoneProxy

The operations that are performed are:
  msiAddUserToGroup
  msiCommit
  msiCreateCollByAdmin
  msiCreateUser
  msiRollback

The rule is available at https://github.com/irods/irods/blob/master/packaging/core.re.template
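The selection between two versions of a rule, and the use of a recovery micro-service after the “:::” separator, can be pictured with a small Python model. This is a sketch of the control flow described above, not the iRODS rule engine: every Python name is hypothetical, the step lists are simplified assumptions rather than a copy of core.re, and the actions only record what they would do.

```python
def run_rule(versions, context):
    """Apply the first rule version whose condition is satisfied.

    versions: list of (condition, steps); each step is (action, recovery).
    Returns True if all steps of the selected version succeed.
    """
    for condition, steps in versions:
        if not condition(context):
            continue  # condition not satisfied: try the next version
        for action, recovery in steps:
            try:
                action(context)
            except RuntimeError:
                recovery(context)  # plays the role of the step after ":::"
                return False
        return True
    return False  # no version applied


def record(name):
    """Build an action that only records its name in the context log."""
    def action(context):
        context["log"].append(name)
    return action


def failing(name):
    """Build an action that records its name and then fails."""
    def action(context):
        context["log"].append(name)
        raise RuntimeError(name)
    return action


def is_anonymous(context):
    return context["otherUserName"] == "anonymous"


def always(context):
    return True


# Hypothetical, simplified model of the two acCreateUserF1 versions
create_user_versions = [
    (is_anonymous, [(record("create user"), record("rollback")),
                    (record("commit"), record("rollback"))]),
    (always, [(record("create user"), record("rollback")),
              (record("create home and trash"), record("rollback")),
              (record("add to group public"), record("rollback")),
              (record("commit"), record("rollback"))]),
]

ctx = {"otherUserName": "anonymous", "log": []}
print(run_rule(create_user_versions, ctx), ctx["log"])
```

In this model a failing step triggers only its own recovery action and then ends the rule, which is the behavior stated in the text; the real rule engine may do more.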
2.2 Manage user deletion (Policy 2)

This policy is invoked when a user account is deleted. The rule deletes the home and trash collections associated with a user account. The rule uses session variables to identify the data grid zone name ($rodsZoneProxy) and the account name ($otherUserName). Note that preprocessing policies (acPreProcForDeleteUser) and postprocessing policies (acPostProcForDeleteUser) can also be defined. These might be used to migrate files to an archive, or to send e-mail to the user about the disposition of the files.

The policy implements a constraint:
  Applied at the acDeleteUser policy enforcement point

The policy uses session variables:
  $otherUserName
  $rodsZoneProxy

The operations that are performed are:
  msiCommit
  msiDeleteCollByAdmin
  msiDeleteUser
  msiRollback

The rule is available at https://github.com/irods/irods/blob/master/packaging/core.re.template
2.3 Manage renaming of a data grid (Policy 3)

This policy is invoked when an administrative command is executed to rename a data grid. The rule renames all of the collections within the original data grid. The rule uses two input parameters to identify the original zone name (*oldZone) and the new zone name (*newZone). Both the name of the collection representing the zone and the zone name are reset. The string concatenation operator “++” is used to create the home data grid collection from the home data grid name.

The policy implements a constraint:
  Applied at the acRenameLocalZone policy enforcement point

The policy uses input variables:
  *oldZone
  *newZone

The operations that are performed are:
  msiCommit
  msiRenameCollection
  msiRenameLocalZone
  msiRollback

The rule is available at https://github.com/irods/irods/blob/master/packaging/core.re.template
2.4 Set the maximum number of I/O streams (Policy 4)

This policy is invoked when file transport is done from a storage resource. The policy controls the number of I/O streams that are used to move files across a network. The rule supports conditions based on the session variable $rescName so that different policies can be set for different resources. Only one function can be used for this rule:

  msiSetNumThreads(sizePerThrInMb, maxNumThr, windowSize)
  This sets the number of threads and the TCP window size. The number of threads is based on the input parameter sizePerThrInMb (size per thread in MBytes). The number of threads is computed using:

    numThreads = fileSizeInMb / sizePerThrInMb + 1

  where sizePerThrInMb is an integer value in MBytes. It also accepts the word "default", which sets sizePerThrInMb to a default value of 32.

  maxNumThr - The maximum number of threads to use. It accepts integer values up to 16. It also accepts the word "default", which sets maxNumThr to a default value of 4. A value of 0 means no parallel I/O. This can be helpful to get around firewall issues.

  windowSize - The TCP window size in Bytes for the parallel transfer. A value of 0 or "default" means a default size of 1,048,576 Bytes.
ThemsiSetNumThreadsfunctionmustbepresentornoparallelthreadswillbeusedforalltransfers.Thepolicyimplementsaconstraint:
AppliedattheacSetNumThreadspolicyenforcementpoint
Theoperationsthatareperformedare:msiSetNumThreads
Theruleisavailableathttps://github.com/irods/irods/blob/master/packaging/core.re.template
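The thread-count formula above can be illustrated with a small Python sketch. This is not the server code: the function name is hypothetical, and both the use of integer division and the capping of the result at maxNumThr are assumptions made for illustration.

```python
def num_transfer_threads(file_size_mb, size_per_thr_mb=32, max_num_thr=4):
    """Estimate the number of parallel I/O streams for one transfer.

    Defaults mirror the "default" values quoted in the text
    (sizePerThrInMb = 32, maxNumThr = 4).
    """
    if max_num_thr == 0:
        # A value of 0 means no parallel I/O.
        return 0
    # numThreads = fileSizeInMb / sizePerThrInMb + 1
    # (integer division is an assumption of this sketch)
    threads = file_size_mb // size_per_thr_mb + 1
    # Assumption: the computed value never exceeds maxNumThr.
    return min(threads, max_num_thr)
```

Under these assumptions a 10 MByte file would use one stream with the default settings, while a 1000 MByte file would be limited by maxNumThr.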
2.5 Bypass permission checks for registering a file (Policy 5)
This policy is invoked when files are registered into the data grid. The rule determines whether file path permissions are checked when registering a physical file path using commands such as ireg. The rule also sets the policy for checking the file path when unregistering a data object without deleting the physical file. Normally, a rodsuser account cannot unregister a data object if the physical file is located in a resource vault. The msiNoChkFilePathPerm function allows this check to be bypassed. Only one function can be called:
    msiNoChkFilePathPerm() - Do not check file path permission when registering a file. WARNING - This function can create a security problem if used.
The policy implements a constraint:
    Applied at the acNoChkFilePathPerm policy enforcement point
The operations that are performed are:
    msiNoChkFilePathPerm
The rule is available at https://github.com/irods/irods/blob/master/packaging/core.re.template
2.6 Set policy for defining physical path name for a file (Policy 6)
This policy is invoked before a file is stored in a file system. The rule defines the physical path that will be used within the iRODS resource vault. Two functions can be called:
    msiSetGraftPathScheme(addUserName, trimDirCnt) - Set the VaultPath scheme to GRAFT_PATH: graft (add) the logical path to the vault path of the resource when generating the physical path for a data object. The first argument (addUserName) specifies whether the userName should be added to the physical path, e.g., $vaultPath/$userName/$logicalPath. "addUserName" can have two values: yes or no. The second argument (trimDirCnt) specifies the number of leading directory elements of the logical path to trim. A value of 0 or 1 is allowable. The default value is 1.
    msiSetRandomScheme() - Set the VaultPath scheme to RANDOM, meaning a randomly generated path is appended to the vaultPath when generating the physical path, e.g., $vaultPath/$userName/$randomPath. The advantage of the RANDOM scheme is that renaming operations (imv, irm) are much faster because there is no need to rename the corresponding physical path.
The default is the GRAFT_PATH scheme with addUserName == no and trimDirCnt == 1. Note: if trimDirCnt is greater than 1, the home or trash directory name will be taken out.
The policy implements a constraint:
    Applied at the acSetVaultPathPolicy policy enforcement point
The operations that are performed are:
    msiSetGraftPathScheme
The rule is available at https://github.com/irods/irods/blob/master/packaging/core.re.template
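The GRAFT_PATH scheme described above can be sketched in Python as follows. The function and its argument names are hypothetical and only illustrate the path construction as described in the text; the vault path and zone name in the usage note are made-up values.

```python
def graft_physical_path(vault_path, user_name, logical_path,
                        add_user_name=False, trim_dir_cnt=1):
    """Build a physical path by grafting a logical path onto a vault path.

    trim_dir_cnt leading directory elements of the logical path are dropped;
    the user name is inserted after the vault path when add_user_name is set.
    """
    # Split the logical path into its non-empty directory elements.
    elements = [part for part in logical_path.split("/") if part]
    kept = elements[trim_dir_cnt:]
    prefix = [vault_path.rstrip("/")]
    if add_user_name:
        prefix.append(user_name)
    return "/".join(prefix + kept)
```

With the default settings (addUserName == no, trimDirCnt == 1), a logical path such as /tempZone/home/alice/data.txt under a vault /var/lib/irods/Vault would map to /var/lib/irods/Vault/home/alice/data.txt in this sketch, because only the zone name is trimmed.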
2.7 Set number of execution threads used to process rules (Policy 7)
This policy specifies the number of processes to use when running jobs in the irodsReServer. The irodsReServer can multi-task such that one or two long running jobs cannot block the execution of other jobs. One function can be called:
    msiSetReServerNumProc(numProc) - numProc can be "default" or a number in the range 0-4. A value of 0 means no forking. The value of numProc will be set to 1 if "default" is input.
The policy implements a constraint:
    Applied at the acSetReServerNumProc policy enforcement point
The operations that are performed are:
    msiSetReServerNumProc
The rule is available at https://github.com/irods/irods/blob/master/packaging/core.re.template
2.8 Set policy for processing files in bulk (Policy 8)
This rule sets the policy for executing the postprocessing put rule (acPostProcForPut) for bulk put operations. Since the bulk put option is intended to improve the upload speed, executing acPostProcForPut for every file will slow down the upload. This rule provides an option to turn the postprocessing off. Only one function can be called:
    msiSetBulkPutPostProcPolicy(flag) - This micro-service sets whether the acPostProcForPut rule will be run on bulk put. Valid values for the flag are:
        "on" - enable execution of acPostProcForPut.
        "off" - disable execution of acPostProcForPut (default).
The policy implements a constraint:
    Applied at the acBulkPutPostProcPolicy policy enforcement point
The operations that are performed are:
    msiSetBulkPutPostProcPolicy
The rule is available at https://github.com/irods/irods/blob/master/packaging/core.re.template
2.9 Manage indexing of the system state catalog (Policy 9)
This rule controls the automated indexing of the metadata catalog. In the rule example, the indexing is delayed until a future time specified by the variable *arg1. Valid delay examples for *arg1 are:
    "<PLUSET>1s</PLUSET>" - delay execution for one second
    "<PLUSET>1m</PLUSET>" - delay execution for one minute
    "<PLUSET>1h</PLUSET>" - delay execution for one hour
    "<PLUSET>1d</PLUSET>" - delay execution for one day
    "<PLUSET>1y</PLUSET>" - delay execution for one year
    "<EA>ils.renci.org</EA>" - host address where execution is performed
This policy was provided in iRODS version 3.3, but has been deprecated in iRODS version 4.x.
The policy implemented a constraint:
    Applied at the acVacuum policy enforcement point
The operations that were performed are:
    delay
    msiVacuum
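To make the delay notation concrete, the following Python sketch converts a <PLUSET> string of the form listed above into a number of seconds. It handles only the single-unit examples shown in the text; the helper name is hypothetical, and treating a year as 365 days is an assumption of the sketch, not a statement about how iRODS schedules delayed rules.

```python
import re

# Seconds per unit; "y" as 365 days is an assumption of this sketch.
_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400, "y": 365 * 86400}


def pluset_seconds(delay_spec):
    """Return the delay in seconds for a string like '<PLUSET>1m</PLUSET>'."""
    match = re.fullmatch(r"<PLUSET>(\d+)([smhdy])</PLUSET>", delay_spec.strip())
    if match is None:
        raise ValueError("unsupported delay specification: %r" % delay_spec)
    count, unit = match.groups()
    return int(count) * _UNIT_SECONDS[unit]
```

For instance, the one-minute example from the list corresponds to 60 seconds and the one-day example to 86400 seconds.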
2.10 Set storage quota policy (Policy 10)
This rule can be used to turn on resource quota enforcement. The maximum storage space for each user can be set using the administrator command, iadmin. Quotas can be set for users and for groups of users, for either the total allowed storage or for the storage on a specific storage system. Only one function can be called:
    msiSetRescQuotaPolicy(flag) - This micro-service sets whether the Resource Quota should be enforced. Valid values for the flag are:
        "on" - enable Resource Quota enforcement.
        "off" - disable Resource Quota enforcement (default).
The policy implements a constraint:
    Applied at the acRescQuotaPolicy policy enforcement point
The operations that are performed are:
    msiSetRescQuotaPolicy
The rule is available at https://github.com/irods/irods/blob/master/packaging/core.re.template
2.11 Manage selection of storage resource (Policy 11)
This policy is invoked when creating a data object. The rule defines how resources are selected for storing files. This is a preprocessing rule that is executed before the object is created. It can be used to set the resource selection scheme when processing the put, copy and replicate operations. Currently, three preprocessing functions can be used by this rule:
    msiSetNoDirectRescInp(rescList) - sets a list of resources that cannot be used by a normal user directly. More than one resource can be input using the character "%" as separator, e.g., resc1%resc2%resc3. This function is optional, but if used, should be the first function to execute because it screens the resource input.
    msiSetDefaultResc(defaultRescList, optionStr) - sets the default resource. This function is no longer mandatory, but if it is used, it should be executed right after the screening function msiSetNoDirectRescInp.
        defaultResc - the resource to use if no resource is input. A "null" means there is no defaultResc. More than one resource can be input using the character "%" as separator.
        optionStr - Value can be "forced", "preferred" or "null". A "forced" input means the defaultResc will be used regardless of the user input. The forced action only applies to users with normal privilege, “rodsuser”.
    msiSetRescSortScheme(sortScheme) - sets the scheme for selecting the best resource to use when creating a data object.
        sortScheme - The sorting scheme. Valid schemes are "default", "random", "byLoad" and "byRescClass". The "byRescClass" scheme will put the cache class of resource on the top of the list. The "byLoad" scheme will put the least loaded resource on the top of the list. For "byLoad" to work properly, the Resource Monitoring system must be switched on in order to pick up the load information for each server in the resource group list. The schemes "random" and "byRescClass" can be applied in sequence, e.g.,
            msiSetRescSortScheme(random)
            msiSetRescSortScheme(byRescClass)
        will select randomly a cache class resource and put it on the top of the list.
The policy implements a constraint:
    Applied at the acSetRescSchemeForCreate policy enforcement point
The operations that are performed are:
    msiSetDefaultResc
The rule is available at https://github.com/irods/irods/blob/master/packaging/core.re.template
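The effect of applying the "random" and "byRescClass" schemes in sequence can be pictured with the Python sketch below. It is an illustration only: resources are modeled as hypothetical (name, class) pairs, the "byLoad" scheme is left out because it depends on monitoring data, and the stable-sort interpretation of "byRescClass" is an assumption.

```python
import random


def sort_resources(resources, schemes, rng=None):
    """Order candidate resources by applying sort schemes in sequence.

    resources: list of (name, resc_class) tuples, e.g. ("disk1", "cache").
    schemes:   list of scheme names, applied left to right.
    """
    rng = rng or random.Random()
    ordered = list(resources)
    for scheme in schemes:
        if scheme == "random":
            rng.shuffle(ordered)
        elif scheme == "byRescClass":
            # Stable sort: cache-class resources move to the top while the
            # order produced by earlier schemes is kept within each class.
            ordered.sort(key=lambda resc: 0 if resc[1] == "cache" else 1)
        elif scheme == "default":
            pass  # keep the order as given
        else:
            raise ValueError("scheme not covered by this sketch: %s" % scheme)
    return ordered
```

Calling sort_resources(resources, ["random", "byRescClass"]) therefore leaves a randomly chosen cache-class resource at the top of the list, which matches the behavior described in the text.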
3 Data Management Policy Set (SILS LifeTime Library)
The LifeTime Library uses five additional policies to control creation of personal digital libraries for students. One of these policies modifies the option for selecting the default storage resource. A second policy turns on quota enforcement. Thus only three policies represent new rules. The policies are:

3.1 Turn on storage quota enforcement (Policy 10)
This rule implements restrictions on the total amount of storage space that can be used by a student. When the quota is exceeded, a student will be able to read files, but will not be able to write new files. The quota values are set by running the iadmin command.
    iadmin suq UserName ResourceName - to set a quota on a storage resource
    iadmin suq UserName total - to set a total storage quota
The policy implements a constraint:
    Applied at the acRescQuotaPolicy policy enforcement point
The operations that are performed are:
    msiSetRescQuotaPolicy
The rule is available at https://github.com/DICE-UNC/policy-workbook/blob/master/acRescQuotaPolicy.re
3.1.1 Check for missing quotas
This policy identifies all accounts (user names) for which a quota has not been set.
The policy uses persistent state information:
    USER_ID
    USER_NAME
    QUOTA_USER_ID
The operations that are performed are:
    foreach
    if
    select
    writeLine
The rule is available at http://github.com/DICE-UNC/policy-workbook/sils-missing-quota.r
3.1.2 Calculate total storage usage
This policy calculates the total amount of storage used by each person and identifies the person who has stored the most data.
The policy uses persistent state information:
    USER_ID
    USER_NAME
    QUOTA_USAGE
    QUOTA_USAGE_USER_ID
The operations that are performed are:
    foreach
    if
    select
    writeLine
The rule is available at http://github.com/DICE-UNC/policy-workbook/sils-storageReport.r
3.1.3 Identify persons who exceeded their quota
This rule identifies the individuals who have exceeded their quota and lists the top 10 users of storage. This uses two policy functions:
    createList
    addToList
The policy uses persistent state information:
    USER_ID
    USER_NAME
    USER_ZONE
    QUOTA_OVER
    QUOTA_USER_ID
    QUOTA_USAGE
    QUOTA_USAGE_USER_ID
The operations that are performed are:
    break
    select
    foreach
    if
    writeLine
    strlen
    elem
The rule is available at http://github.com/DICE-UNC/policy-workbook/sils-checkQuota.r
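The reporting logic of this policy can be sketched outside the rule language. The Python function below works on rows that would come from a catalog query; the row layout and the function name are hypothetical, and interpreting a positive QUOTA_OVER value as "quota exceeded" is an assumption of the sketch.

```python
def quota_report(rows, top_n=10):
    """Summarize quota state from (user_name, quota_over, quota_usage) rows.

    Returns the sorted names of users over quota and the top_n
    (user_name, quota_usage) pairs ordered by usage, largest first.
    """
    # Assumption: QUOTA_OVER > 0 means the user has exceeded the quota.
    over_quota = sorted(name for name, quota_over, _usage in rows
                        if quota_over > 0)
    by_usage = sorted(rows, key=lambda row: row[2], reverse=True)
    top_users = [(name, usage) for name, _over, usage in by_usage[:top_n]]
    return over_quota, top_users
```

The rule itself builds its lists with the createList and addToList policy functions; the sketch replaces them with ordinary Python lists.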
3.1.4 Periodically update quota check
The storage usage is updated when the msiQuota micro-service is run. The usage can also be updated by running the administrative command:
    iadmin cu
This rule updates the usage every day.
The policy uses no persistent state information.
The operations that are performed are:
    delay
    msiQuota
    writeLine
The rule is available at http://github.com/DICE-UNC/policy-workbook/sils-missing-quota.r
3.2 Manage selection of storage resource (Policy 11)
This rule changes the name of the default storage system that is used for storing files within the LifeTime Library.
The policy implements a constraint:
    Applied at the acSetRescSchemeForCreate policy enforcement point
The operations that are performed are:
    msiSetDefaultResc
The rule is available at https://github.com/DICE-UNC/policy-workbook/blob/master/acSetRescSchemeForCreate.re
3.3 Manage selection of storage resource for replication (Policy 12)
This rule changes the default storage system name for replication of files within the LifeTime Library.
The policy implements a constraint:
    Applied at the acSetRescSchemeForRepl policy enforcement point
The operations that are performed are:
    msiSetDefaultResc
The rule is available at https://github.com/DICE-UNC/policy-workbook/blob/master/acSetRescSchemeForRepl.re
3.4 Enforce replication of each new file (Policy 13)
This rule implements an integrity requirement, ensuring that each file added to the LifeTime Library is replicated to a second storage system. The replication is queued for execution to minimize wait time on the original put action. Currently, three postprocessing functions can be used individually or in sequence in the acPostProcForPut rule:
    msiSysChksumDataObj - create a checksum on the file and store the checksum in the metadata catalog under the persistent state variable name “DATA_CHECKSUM”.
    msiExtractNaraMetadata - extract and register metadata from the just uploaded NARA files.
    msiSysReplDataObj(replResc, flag) - replicates the data object that was just uploaded or copied to the specified replica resource (replResc). Valid values for the "flag" input are "all", "updateRepl" and "rbudpTransfer". More than one flag value can be set using the "%" character as separator, e.g., "all%updateRepl". "updateRepl" means update an existing stale copy to the latest copy. The "all" flag means replicate to all resources in a resource group, or update all stale copies if the "updateRepl" flag is also set. "rbudpTransfer" means the RBUDP protocol will be used for the transfer. A "null" input means a single replica will be made in one of the resources in the resource group. It may be desirable to do replication only if the data object is stored in a resource group.
The policy implements a constraint:
    Applied at the acPostProcForPut policy enforcement point
    Checks for specific object path, like "/lifelibZone/home/*"
The session variables are:
    $objPath
The operations that are performed are:
    delay
    msiSysReplDataObj
The rule is available at https://github.com/DICE-UNC/policy-workbook/blob/master/acPostProcForPut-ReplSILS.re
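The "%"-separated flag convention used by msiSysReplDataObj can be illustrated with a short Python sketch. The helper is hypothetical and only mirrors the flag values named in the text; it says nothing about how the micro-service parses its input internally.

```python
VALID_REPL_FLAGS = {"all", "updateRepl", "rbudpTransfer"}


def parse_repl_flags(flag):
    """Split a flag string such as 'all%updateRepl' into a set of flags.

    A "null" (or empty) input yields an empty set, standing for the
    single-replica behavior described in the text.
    """
    if flag in ("", "null"):
        return set()
    flags = set(flag.split("%"))
    unknown = flags - VALID_REPL_FLAGS
    if unknown:
        raise ValueError("unknown replication flag(s): %s" % sorted(unknown))
    return flags
```

For example, "all%updateRepl" yields the two flags "all" and "updateRepl", the combination that updates all stale copies.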
3.5 Manage access control policy (Policy 14)
This rule keeps users from seeing the names of other users' files, and is needed to ensure that each student collection is private to that student. The rule sets the Access Control List policy. If the rule is not called or is called with an argument other than STRICT, the STANDARD setting is in effect, which is fine for many sites. By default, users are allowed to see certain metadata, for example the data-object and sub-collection names in each other's collections. When access controls are made STRICT by calling msiAclPolicy(STRICT), the General Query Access Control is applied on collections and data object metadata, which means that the list command, ils, will need 'read' access or better to the collection to return the collection contents (names of data-objects, sub-collections, etc.). The default is the normal, non-strict level, allowing users to see names of other collections. In all cases, access control to the data-objects is enforced. Even if a person can see file names in a collection, “read” access is required on a file to be able to read the file. Even with STRICT access control, however, the admin user is not restricted, so various micro-services and queries will still be able to evaluate system-wide information. The session variable “$userNameClient” can be used to limit actions to individual users. This is only secure in an irods-password environment (not GSI), but it makes rules for specific users possible:
    acAclPolicy { ON($userNameClient == "quickshare") { } }
    acAclPolicy { msiAclPolicy("STRICT"); }
which was requested by ARCS (Sean Fleming). See rsGenQuery.c for more information on $userNameClient. The typical use is to just set it strict or not for all users.
The policy implements a constraint:
    Applied at the acAclPolicy policy enforcement point
The operations that are performed are:
    msiAclPolicy
The rule is available at https://github.com/DICE-UNC/policy-workbook/blob/master/acAclPolicy-strict.re
4 Data Administration Policy Set (RDA Practical Policy working group)
The Research Data Alliance Practical Policy working group conducted a survey of 41 sites that were managing data collections. A set of 11 policy categories that were applied across most of the sites was identified. The policies include automation of administrative functions, enforcement of management decisions, and validation of assessment criteria. The policies are listed in Table 1 and have minimal overlap with the policy sets for data sharing and student digital libraries, except for policies to manage access controls. For each policy category, multiple policies may be defined.

4.1 Data access control policies (Policy 14)
Automated application of access restrictions based on metadata simplifies administration of a data grid. Every repository needs to be able to easily restrict various data sets to specific audiences (e.g., campus members are granted read access due to licensing, while write access is granted to creators of a collection). This information is stored as system metadata and is checked on all accesses. Access controls require the ability to assign a unique identifier to each person, validate the identity of each user, and then authorize each operation. Within the iRODS data grid, unique identifiers are assigned to users and files. The identifiers are used to associate access controls with a user name.
4.1.1 Find the User_ID associated with a User_name
Since identifiers for users may be set as either strings (USER_NAME) or integers (USER_ID), a policy that allows a person to find the USER_ID for their USER_NAME is useful. This policy queries a metadata catalog, and retrieves the USER_ID for the person who is running the rule. The policy can be applied interactively to files within a collection, or can be automated as part of a file ingestion process. For the interactive version of the policy, the output is written to the screen.
The policy uses persistent state information:
    USER_ID
    USER_NAME
The operations that are performed are:
    foreach
    select
    writeLine
The rule is available at http://github.com/DICE-UNC/policy-workbook/rda-userID.r
4.1.2 Find the File_ID associated with a file name
Since identifiers for files may also be set as either strings (DATA_NAME) or integers (DATA_ID), a policy that finds the DATA_ID for a file is useful. This policy queries a metadata catalog, and retrieves the DATA_ID for a specified file name that is input to the rule. The result is written to the screen.
The rule uses the policy functions:
    checkCollInput
    checkFileInput
The input variables are:
    *File                      a file name
    *RelativeCollectionName    a relative collection name
The session variables are:
    $rodsZoneClient
    $userNameClient
The policy uses persistent state information:
    COLL_ID
    COLL_NAME
    DATA_ID
    DATA_NAME
The operations that are performed are:
    fail
    foreach
    if
    select
    writeLine
The rule is available at http://github.com/DICE-UNC/policy-workbook/rda-fileID.r
4.1.3 Set write access control for a user
A person can set an access control on a file that they own by specifying the file name, the desired access control, and the user name that will be given access. This policy reads as input the user name, the collection and file on which the access control is set, and the desired access control. The metadata catalog is updated to record the change in access control. This is similar to the ichmod command.
This rule uses the policy functions:
    checkCollInput
    checkFileInput
    checkPathInput
    checkUserInput
    findZoneHostName
The input variables are:
    *Acl                   an access permission
    *File                  a file name
    *RelativeCollection    a relative collection name
    *User                  a user name
The session variables are:
    $rodsZoneClient
    $userNameClient
The policy uses the persistent state information:
    COLL_ID
    COLL_NAME
    DATA_ID
    DATA_NAME
    USER_ID
    USER_NAME
    USER_ZONE
    ZONE_CONNECTION
    ZONE_NAME
The operations that are performed are:
    fail
    foreach
    if
    msiSetACL
    msiSplitPath
    msiSplitPathByKey
    remote
    select
    writeLine
The rule is available at http://github.com/DICE-UNC/policy-workbook/rda-setACL.r
4.1.4 Set operations that are allowable for the user "public"
This policy controls the operations that “public” users are allowed to execute. Only two operations are allowed: "read" (read files) and "query" (browse some system level metadata). Both operations can be specified by using the separator “%”. The rule uses the micro-service “msiSetPublicUserOpr” to specify what types of public access operations are allowed. The micro-service is called from a policy enforcement point associated with setting the public user policy.
The policy implements a constraint:
    Applied at the acSetPublicUserPolicy policy enforcement point
The operations that are performed are:
    msiSetPublicUserOpr
The rule is available at https://github.com/DICE-UNC/policy-workbook/blob/master/acSetPublicUserPolicy.re
4.1.5 Check the access controls on a file
This policy checks each file in a collection for whether a specific user has access. This rule has input parameters for the names of a collection and user for which access controls will be checked. The desired access permission is compared with the access permissions set on the file. If the access control is not found, an error message is written. In practice, access control checks on files are enforced automatically by the iRODS framework.
This rule uses policy functions:
    checkCollInput
    checkUserInput
    findZoneHostName
The input variables are:
    *Coll    a relative collection name
    *User    a user name
The session variables are:
    $rodsZoneClient
    $userNameClient
The policy uses persistent state information:
    COLL_ID
    COLL_NAME
    DATA_ACCESS_DATA_ID
    DATA_ACCESS_TYPE
    DATA_ACCESS_USER_ID
    DATA_ID
    DATA_NAME
    TOKEN_ID
    TOKEN_NAME
    TOKEN_NAMESPACE
    USER_ID
    USER_NAME
    USER_ZONE
    ZONE_CONNECTION
    ZONE_NAME
The operations that are performed are:
    fail
    foreach
    if
    msiSplitPathByKey
    remote
    select
    writeLine
The rule is available at http://github.com/DICE-UNC/policy-workbook/rda-acl.r
4.2 Data format control policies (Policy 15)
Formats such as SPSS, SAS, and Stata will not be around forever, so data needs to be moved out of such formats into open and more durable formats. Policies are needed to identify the data formats that are present in a collection, and to transform obsolete data formats.
4.2.1 Set format conversion flag
A policy is needed to specify when format conversion is required. This policy sets a conversion flag when the data type is a specified format. The data type is normally defined for a file when it is loaded into the data grid. See the command:
    iput -D "datatype" file-name
The rule uses the policy function:
    checkCollInput
The input variables are:
    *Collrel    a relative collection name
    *Type       a data type
The session variables are:
    $rodsZoneClient
    $userNameClient
The policy uses persistent state information:
    COLL_ID
    COLL_NAME
    DATA_NAME
    DATA_TYPE_NAME
The operations that are performed are:
    fail
    foreach
    if
    msiAddKeyVal
    msiAssociateKeyValuePairsToObj
    select
    writeLine
The rule is available at http://github.com/DICE-UNC/policy-workbook/rda-setconv.r
4.2.2 Invoke format conversion
This policy invokes the NCSA Polyglot service to transform a data format. This external service is invoked by sending http requests to a server at Drexel University. Note that the file that is being converted will also be moved to Drexel, with the converted file returned over the network.
The rule uses the policy functions:
    addAVUMetadata
    deleteAVUMetadata
The rule has a constraint:
    *Aname must equal “ConvertMe”
The input variables are:
    *Aname       - flag with value "ConvertMe"
    *ItemName    - path of the file being converted
Output from the conversion program is:
    *out         - name of the converted file
The session variables are:
    $rodsZoneClient
    $userNameClient
The policy uses no persistent state information.
The operations that are performed are:
    if
    irods_curl-get
    msiRemoveKeyValuePairsFromObj
    msiSetAVU
    msiString2KeyValPair
The rule is available at http://github.com/DICE-UNC/policy-workbook/rda-convertfile.r
4.2.3 Identify and archive specific file formats from a staging area
File format type is stored in a state information variable called DATA_TYPE_NAME. Queries can be issued against the metadata catalog to retrieve files with a given format type. Operations are also supported for extracting the file format type of a file, based on the file extension. This policy examines a staging area for files with a specific format type. The file format is determined from the file extension. Files that have a desired extension, in this case an extension “.r”, are moved into a specified collection. This makes it possible to sort files by file format type. The collection that corresponds to the staging area and the collection that corresponds to the destination archive are read from input. Note that when a file is moved, the access controls must be reset.
This rule uses the policy functions:
    checkCollInput
    createLogFile
    isColl
The input variables are:
    *Coll     a relative collection name
    *Res      a storage resource
    *Stage    a relative collection name
The session variables are:
    $rodsZoneClient
    $userNameClient
The policy uses persistent state information:
    COLL_ID
    COLL_NAME
    DATA_NAME
The operations that are performed are:
    delay
    fail
    foreach
    if
    msiCollCreate
    msiDataObjCreate
    msiDataObjRename
    msiGetSystemTime
    msiSetACL
    select
    writeLine
The rule is available at http://github.com/DICE-UNC/policy-workbook/rda-stageformat.r
4.3 Notification Policies (Policy 16)
Events that occur within the data management system can be logged in an audit trail. The audit trail can be parsed to analyze what has happened. Events can also be monitored, with appropriate e-mail sent to an administrator. Events can also be tracked through notifications that are sent to an indexing server each time a specified action occurs. Automated creation of event metadata is needed as data sets and data collections are being processed. Currently this is being done manually for most collections at great cost and effort.
4.3.1 Notify on collection deletion
Notification policies are implemented at Policy Enforcement Points, either before an action occurs or after the action is completed. A rule can be created that specifies the type of notification that will be used. This policy sends e-mail to an administrator on deletion of a collection. A session variable, $collName, is used to identify which collection is being deleted.
The policy implements a constraint:
    Applied at the acPreprocForRmColl policy enforcement point
The operations that are performed are:
    msiSendMail
The rule is available at https://github.com/DICE-UNC/policy-workbook/blob/master/acPreProcForRmColl.re
4.3.2 Notification of events
Events can be detected at all policy enforcement points through use of a C++ version of the pluggable rule engine. The C++ version is fast enough to track all operations performed within the data management system. The detected events are documented in messages that are sent to a message queue for processing by an external indexing system. This capability will be available in version 4.2 of iRODS. Policies can then be associated with each micro-service plugin to automate event detection and auditing.
One application is the correlation of each change to the persistent state information with the event that caused the change. This requires mapping from client actions, to the policy enforcement points that are invoked, to the policies that are then enforced, to the micro-services that are executed, to the persistent state information attributes that are modified or changed. An example of how this can be done by hand is given in chapter 8. A similar approach can be used to audit all actions performed upon the data management system. Computer actionable policies for monitoring events are listed in Chapter 5.6.
The “rule_exists” function tells the rule engine plugin system which rules this plugin listens to. In this case it listens to any rule under the "audit_" namespace. The “exec_rule” function actually handles the auditing. It logs the name, arguments, and the condInputData field of the REI in-memory structure of an operation, etc. to the server log. The full code will be available on Github with the 4.2 release.

4.4 Use agreement policies (Policy 17)
The creation of a use agreement requires an interaction with each user, independently of the data grid. The resulting information can be captured as metadata that is associated with each file in a collection. It is then possible to track whether a use agreement has been received, and write policies that restrict access when files have no official use agreement.
4.4.1 Set receipt of signed use agreement
A metadata attribute can be defined for each user to designate receipt of a signed use agreement. This is an example of a user-defined metadata attribute that can be associated with each user name. The policy sets the use agreement for a specified user. This policy uses the metadata attribute “Use_Agreement” to store a value of “RECEIVED” when a use agreement is confirmed.
The rule uses the policy functions:
    checkUserInput
    findZoneHostName
The input variables are:
    *User    a user name
The session variables are:
    $rodsZoneClient
The policy uses the persistent state information:
    USER_ID
    USER_NAME
    USER_ZONE
    ZONE_CONNECTION
    ZONE_NAME
The operations that are performed are:
    fail
    foreach
    if
    msiAddKeyVal
    msiAssociateKeyValuePairsToObj
    msiSplitPathByKey
    remote
    select
    writeLine
The rule is available at http://github.com/DICE-UNC/policy-workbook/rda-useSet.r
4.4.2 Identify users without signed use agreement
This policy queries all user names to find users who either do not have a “Use_Agreement” metadata attribute name, or have a value that is not “RECEIVED”. If either case is found, a message is written to the screen.
There are no input variables.
There are no session variables.
The policy uses persistent state information:
    META_USER_ATTR_NAME
    META_USER_ATTR_VALUE
    USER_NAME
The operations that are performed are:
    foreach
    if
    select
    writeLine
The rule is available at http://github.com/DICE-UNC/policy-workbook/rda-useVerify.r
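The check performed by this policy can be expressed compactly over query results. The Python sketch below is an illustration under stated assumptions: the inputs stand in for rows a catalog query would return, and the function name and row layout are hypothetical.

```python
def users_missing_agreement(user_names, user_metadata):
    """Return users without a confirmed use agreement, sorted by name.

    user_names:    iterable of USER_NAME values.
    user_metadata: iterable of (user_name, attr_name, attr_value) tuples,
                   standing in for META_USER_ATTR_NAME / _VALUE rows.
    """
    confirmed = {user for user, attr, value in user_metadata
                 if attr == "Use_Agreement" and value == "RECEIVED"}
    # Users with no attribute at all and users with another value both
    # fall outside the confirmed set.
    return sorted(set(user_names) - confirmed)
```

A user whose “Use_Agreement” attribute holds a value other than “RECEIVED” is reported in the same way as a user with no attribute, matching the two cases described above.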
4.5 Integrity policy (Policy 18)
Policies are typically created to verify the integrity of files by comparing the current checksum with a saved value of the checksum. However, integrity policies can also be created to verify access controls on a collection, verify the presence of required metadata, verify file distribution, etc.
4.5.1 Verify access controls on files
This rule analyses the files in a collection to verify that a required access control is present on each file. The input includes the name of the collection that will be verified, the type of access control that is required, and the name of a person for which the access control is set. The rule verifies the collection name, retrieves a USER_ID for the named person, and retrieves a DATA_ACCESS_DATA_ID number for the type of access control. A loop is made over the files in the collection, with a sub-loop that verifies the access control on each file. The results are printed to the screen.
The rule uses the policy functions:
    checkCollInput
    checkUserInput
    findZoneHostName
The input variables are:
    *Acl     an access control
    *Coll    a relative collection name
    *User    a user name
The session variables are:
    $rodsZoneClient
    $userNameClient
The policy uses persistent state information:
    COLL_ID
    COLL_NAME
    DATA_ACCESS_DATA_ID
    DATA_ACCESS_TYPE
    DATA_ACCESS_USER_ID
    DATA_ID
    DATA_NAME
    TOKEN_ID
    TOKEN_NAME
    TOKEN_NAMESPACE
    USER_ID
    USER_NAME
    USER_ZONE
    ZONE_CONNECTION
    ZONE_NAME
The operations that are performed are:
    fail
    foreach
    if
    msiSplitPathByKey
    remote
    select
    writeLine
The rule is available at http://github.com/DICE-UNC/policy-workbook/rda-integrityACL.r
4.5.2 Check integrity and number of replicas of files in a collection
This policy implements 17 basic operations needed for a production quality rule for verifying the integrity of a collection. The basic operations include:
    1. Verifying all input parameters for consistency
    2. Retrieving state information from the metadata catalog on each execution
    3. Verifying integrity of each file by comparing the saved checksum with the computed checksum
    4. Updating all replicas to the most recent version
    5. Minimizing the load on the production services through a deadline scheduler
    6. Differentiating between the logical name for the file and the physical location of the replicas
    7. Identifying missing replicas and documenting their absence
    8. Creating new replicas to replace missing files
    9. Implementing load leveling to distribute files across available storage systems
    10. Creating a log file to record all repair operations and storing the log file in the data grid
    11. Tracking progress of the policy execution
    12. Initializing the rule for the first execution, including setting variables to track progress
    13. Enabling restart from the last checked file
    14. Manipulating files in batches of 256 files at a time to handle arbitrarily large collections
    15. Minimizing the number of sleep periods required by the deadline scheduler
    16. Checking new files that have been added on a restart
    17. Generating statistics about the execution rate and properties of the files that were checked
Implementing all 17 operations increases the size of the production policy substantially. However, it is possible to show that the average time spent per file is still less than a disk rotation period, implying that the production rule is suitable for verifying integrity across arbitrarily large collections.
The policy to periodically check integrity uses the policy functions:
    addAVUMetadataToColl
    checkCollInput
    checkMetaExistsColl
    checkRescInput
    createLogFile
    createReplicas
    findZoneHostName
    getNumSizeColl
    getRescColl
    isColl
    selectRescUpdate
    updateCollMeta
    verifyReplicaChksum
The input variables are:
    *Coll           a collection path name
    *Delt           a length of time to run in seconds
    *NumReplicas    number of replicas
    *Res            a storage resource
The session variables are:
    $rodsZoneClient
The policy uses persistent state information:
    COLL_ID
    COLL_NAME
    DATA_CHECKSUM
    DATA_ID
    DATA_NAME
    DATA_REPL_NUM
    DATA_RESC_NAME
    DATA_SIZE
    META_COLL_ATTR_NAME
    META_COLL_ATTR_VALUE
    RESC_ID
    RESC_NAME
    ZONE_CONNECTION
    ZONE_NAME
The operations that are performed are:
    break
    cons
    delay
    elem
    fail
    for
    foreach
    if
    list
    msiAssociateKeyValuePairsToObj
    msiCollCreate
    msiDataObjChksum
    msiDataObjCreate
    msiDataObjRepl
    msiGetSystemTime
    msiRemoveKeyValuePairsFromObj
    msiSetAVU
    msiSleep
    msiSplitPathByKey
    msiString2KeyValPair
    remote
    select
    setelem
    writeLine
The rule is available at http://github.com/DICE-UNC/policy-workbook/rda-integrityACL.r
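Two of the operations listed above, comparing saved and computed checksums (operation 3) and working in batches of 256 (operation 14), can be sketched in Python. This is a simplified illustration, not the production rule: the replica tuples, the read_bytes callback, and the choice of SHA-256 hex digests are all assumptions of the sketch.

```python
import hashlib


def batches(items, size=256):
    """Yield successive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def verify_replicas(replicas, read_bytes, batch_size=256):
    """Compare saved checksums with recomputed ones, batch by batch.

    replicas:   list of (logical_name, resource, saved_checksum) tuples.
    read_bytes: callable(logical_name, resource) returning the replica's
                content as bytes, or None when the replica is missing.
    Returns (corrupted, missing) lists of (logical_name, resource).
    """
    corrupted, missing = [], []
    for batch in batches(replicas, batch_size):
        for name, resource, saved in batch:
            data = read_bytes(name, resource)
            if data is None:
                missing.append((name, resource))
            elif hashlib.sha256(data).hexdigest() != saved:
                corrupted.append((name, resource))
    return corrupted, missing
```

The sketch keeps the logical name and the resource separate, in the spirit of operation 6, so that a missing or corrupted replica can be reported without implicating the other replicas of the same file.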
4.6 Metadata extraction (Policy 19)
A necessary task in building a digital library is the creation of provenance and descriptive metadata. This typically requires interactive creation of the descriptive metadata. For collections that have more than a thousand digital objects, this becomes a laborious task. If the metadata attributes can be aggregated into a standard format, then bulk loading of metadata may be appropriate. Examples include bulk loading from an XML file or a pipe-delimited file. An alternate approach is “feature-based” indexing, in which the digital object is examined for the presence of desired features. Information about a feature is extracted and registered as metadata on the digital object. An example is pattern-based recognition of descriptive metadata within a text file.
4.6.1 Load metadata from an XML file
Metadata can be loaded into a data grid directly from an XML file. This policy assumes a specific structure for the XML file of the form:

    <?xml version="1.0" encoding="UTF-8"?>
    <metadata>
      <AVU>
        <Target>/$rodsZoneClient/home/$userNameClient/XML/sample.xml</Target>
        <Attribute>OrderID</Attribute>
        <Value>889923</Value>
        <Unit/>
      </AVU>
      <AVU>
        <Target>/$rodsZoneClient/home/$userNameClient/XML/sample.xml</Target>
        <Attribute>OrderPerson</Attribute>
        <Value>John Smith</Value>
        <Unit/>
      </AVU>
    </metadata>

Note that this specifies the target file to which the metadata is added. Each metadata attribute, value, and unit is formed into an AVU that is attached as metadata to the file.
The rule uses the policy function:
    checkPathInput
The input variables are:
    *targetObj    a relative collection name
    *xmlObj       a relative collection name
The session variables are:
    $rodsZoneClient
    $userNameClient
The policy uses persistent state information:
    DATA_ID
    DATA_NAME
    COLL_NAME
The operations that are performed are:
    fail
    foreach
    if
    msiLoadMetadataFromXml
    msiSplitPath
    select
    writeLine
The rule is available at http://github.com/DICE-UNC/policy-workbook/rda-loadMetadataFromXml.r
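The XML layout described in this section can be explored outside the data grid. The following Python sketch is not part of the workbook rule. It only illustrates how each AVU element maps to a (target, attribute, value, unit) tuple. The function name parse_avus and the zone and user names in the sample are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Sample document in the layout shown above; "tempZone" and "alice" are
# hypothetical stand-ins for $rodsZoneClient and $userNameClient.
SAMPLE = """<?xml version="1.0" encoding="UTF-8"?>
<metadata>
  <AVU>
    <Target>/tempZone/home/alice/XML/sample.xml</Target>
    <Attribute>OrderID</Attribute>
    <Value>889923</Value>
    <Unit/>
  </AVU>
  <AVU>
    <Target>/tempZone/home/alice/XML/sample.xml</Target>
    <Attribute>OrderPerson</Attribute>
    <Value>John Smith</Value>
    <Unit/>
  </AVU>
</metadata>
"""


def parse_avus(xml_text):
    """Return a list of (target, attribute, value, unit) tuples."""
    root = ET.fromstring(xml_text.encode("utf-8"))
    avus = []
    for avu in root.findall("AVU"):
        # findtext returns None for an empty element such as <Unit/>,
        # so missing or empty text is normalized to an empty string.
        fields = [(avu.findtext(tag) or "").strip()
                  for tag in ("Target", "Attribute", "Value", "Unit")]
        avus.append(tuple(fields))
    return avus


if __name__ == "__main__":
    for target, attribute, value, unit in parse_avus(SAMPLE):
        print(target, attribute, value, unit)
```

An empty Unit element is treated as "no unit", which matches the sample file, where every AVU leaves the unit blank.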
4.6.2 Load metadata from a pipe-delimited file
Metadata can be loaded into a data grid directly from a pipe-delimited file. This policy assumes a specific structure for the pipe-delimited file of the form:

    File-name|attribute-name|attribute-value
    File-name|attribute-name|attribute-value|units
    C-collection-name|attribute-name|attribute-value
    C-collection-name|attribute-name|attribute-value|units

For the specified File-name or collection-name, the pipe-delimited values for the attribute name, the attribute value, and the attribute units or comments can be bulk loaded.
This rule uses the policy function:
    checkPathInput
The input variables are:
    *Coll    a relative collection name
The session variables are:
    $rodsZoneClient
    $userNameClient
The policy uses persistent state information:
    DATA_ID
    DATA_NAME
    COLL_NAME
The operations that are performed are:
    fail
    foreach
    if
    msiLoadMetadataFromDataObj
    msiSplitPath
    select
    writeLine
The rule is available at http://github.com/DICE-UNC/policy-workbook/rda-metaloadpipe.r
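The four line shapes listed in this section can be checked before a bulk load is attempted. The Python sketch below is an illustration only and is not the logic of msiLoadMetadataFromDataObj. The function name parse_line and the example paths are assumptions. It takes the "C-" prefix to mark a collection target, as the format above indicates.

```python
def parse_line(line):
    """Split one pipe-delimited metadata line into its parts.

    Returns (kind, target, attribute, value, units), where kind is
    "collection" for targets written with the "C-" prefix and
    "data object" otherwise. Units default to an empty string.
    """
    fields = [field.strip() for field in line.rstrip("\n").split("|")]
    if len(fields) not in (3, 4):
        raise ValueError("expected 3 or 4 pipe-delimited fields: %r" % line)
    target, attribute, value = fields[:3]
    units = fields[3] if len(fields) == 4 else ""
    if target.startswith("C-"):
        kind, target = "collection", target[2:]
    else:
        kind = "data object"
    return kind, target, attribute, value, units


if __name__ == "__main__":
    # Hypothetical paths used only for the demonstration.
    print(parse_line("/tempZone/home/alice/a.txt|length|12|pages"))
    print(parse_line("C-/tempZone/home/alice/photos|project|survey"))
```

A file whose name itself begins with "C-" would be read as a collection by this sketch, so a real loader needs a rule for that case.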
4.6.3 Contextual metadata extraction through pattern recognition
Pattern matching operations can be applied to text to extract contextual metadata. A template for pattern matching can be created that defines triplets:

    <pre-string-regexp, keyword, post-string-regexp>

The triplets are read into memory, and then used to search a data buffer. For each set of pre and post regular expressions, the string between them is associated with the specified keyword and can be stored as a metadata attribute on the file. In the example, the template file has the format:

    <PRETAG>X-Mailer:</PRETAG>MailerUser<POSTTAG></POSTTAG>
    <PRETAG>Date:</PRETAG>SentDate<POSTTAG></POSTTAG>
    <PRETAG>From:</PRETAG>Sender<POSTTAG></POSTTAG>
    <PRETAG>To:</PRETAG>PrimaryRecipient<POSTTAG></POSTTAG>
    <PRETAG>Cc:</PRETAG>OtherRecipient<POSTTAG></POSTTAG>
    <PRETAG>Subject:</PRETAG>Subject<POSTTAG></POSTTAG>
    <PRETAG>Content-Type:</PRETAG>ContentType<POSTTAG></POSTTAG>

The end tag is actually a "return" for Unix systems, or a "carriage-return/line feed" for Windows systems. The example rule reads a text file into a buffer in memory, reads in the template file that defines the regular expressions, and then parses the text in the buffer to identify the presence of a desired metadata attribute.
The rule uses the policy function:
    checkPathInput
The input variables are:
    *Len         number of bytes
    *Outfile     a relative path for a file
    *Pathfile    a relative path for a file
    *Tag         a relative path for a file
The session variables are:
    $rodsZoneClient
    $userNameClient
The policy uses persistent state information:
    DATA_ID
    DATA_NAME
    COLL_NAME
The operations that are performed are:
    fail
    foreach
    if
    msiAssociateKeyValuePairsToObj
    msiDataObjClose
    msiDataObjOpen
    msiDataObjRead
    msiExtractTemplateMDFromBuf
    msiGetObjType
    msiLoadMetadataFromDataObj
    msiReadMDTemplateIntoTagStruct
    msiSplitPath
    select
    writeKeyValPairs
    writeLine
The rule is available at http://github.com/DICE-UNC/policy-workbook/rda-metaload.r
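The triplet mechanism can be sketched in a few lines of Python. This is an illustration of the idea under stated assumptions, not the implementation of msiReadMDTemplateIntoTagStruct or msiExtractTemplateMDFromBuf. The names read_template and extract, the two-entry template, and the sample message are all hypothetical. The post-string is written explicitly as a newline, matching the remark that the end tag is a "return".

```python
import re

# One <PRETAG>pre</PRETAG>keyword<POSTTAG>post</POSTTAG> entry per match.
TRIPLET = re.compile(
    r"<PRETAG>(.*?)</PRETAG>(.*?)<POSTTAG>(.*?)</POSTTAG>", re.DOTALL)

# Hypothetical template: the post-string of each entry is a newline.
TEMPLATE = ("<PRETAG>From:</PRETAG>Sender<POSTTAG>\n</POSTTAG>\n"
            "<PRETAG>Subject:</PRETAG>Subject<POSTTAG>\n</POSTTAG>")

# Hypothetical text buffer to be searched.
MAIL = ("From: alice@example.org\n"
        "To: bob@example.org\n"
        "Subject: Test run\n"
        "\n"
        "Body text\n")


def read_template(text):
    """Return a list of (pre_regexp, keyword, post_regexp) triplets."""
    return [(pre, keyword.strip(), post)
            for pre, keyword, post in TRIPLET.findall(text)]


def extract(buffer, triplets):
    """Return (keyword, value) pairs found between each pre/post pair."""
    found = []
    for pre, keyword, post in triplets:
        # pre and post are treated as regular expressions, as in the text.
        pattern = re.compile("(?:%s)(.*?)(?:%s)" % (pre, post), re.DOTALL)
        for match in pattern.finditer(buffer):
            found.append((keyword, match.group(1).strip()))
    return found


if __name__ == "__main__":
    for keyword, value in extract(MAIL, read_template(TEMPLATE)):
        print(keyword, "=", value)
```

Each extracted pair corresponds to one attribute name and value that the rule could then attach to the file as an AVU.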
4.6.4 Stripping metadata from a file
It may be necessary to strip metadata from a file before adding the required metadata. The following rule takes as input the path to the file, and removes descriptive metadata.
The rule uses the policy function:
    checkPathInput
The input variables are:
    *Path    a relative path to a file
The session variables are:
    $rodsZoneClient
    $userNameClient
The policy uses persistent state information:
    DATA_ID
    DATA_NAME
    COLL_NAME
The operations that are performed are:
    fail
    foreach
    if
    msiSplitPath
    msiStripAVUs
    select
    writeLine
The rule is available at http://github.com/DICE-UNC/policy-workbook/rda-metastrip.r
4.7 Data backup policies (Policy 20)
Data backup can take multiple forms:
    • Time-stamped copies of digital objects that are saved in a separate collection
    • Replicas of digital objects that can be accessed when the original is unavailable
    • Copies of digital objects that are put into separate collections or data grids
The choice depends upon whether a time history of the evolution of the file is needed or whether recovery is needed when files are corrupted.
4.7.1 Data versioning policy
A version of a file can be created by adding a time stamp, and moving the version to an archive directory. This rule processes files in a collection, creating a version of each file that is stored in a destination directory called “SaveVersions”.
The rule uses the policy function:
    checkCollInput
The input variables are:
    *Dest          a relative collection name
    *SourceFile    a relative collection name
The session variables are:
    $rodsZoneClient
    $userNameClient
The policy uses persistent state information:
    COLL_ID
    COLL_NAME
    DATA_NAME
The operations that are performed are:
    fail
    foreach
    if
    msiDataObjCopy
    msiSetACL
    select
    writeLine
The rule is available at http://github.com/DICE-UNC/policy-workbook/rda-version.r

The version number can be inserted in the file name before the extension. This rule parses the file name, identifies an extension, and inserts the time stamp before the extension when the version name is created.
The rule uses the policy function:
    checkPathInput
The input variables are:
    *Fil    a file name
The session variables are:
    $rodsZoneClient
    $userNameClient
The policy uses persistent state information:
    COLL_ID
    COLL_NAME
    DATA_ID
    DATA_NAME
The operations that are performed are:
    break
    fail
    foreach
    if
    msiDataObjCopy
    msiGetSystemTime
    msiSetACL
    msiSplitPath
    strlen
    substr
    select
    while
    writeLine
The rule is available at http://github.com/DICE-UNC/policy-workbook/rda-versionfile.r
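The naming step of the second rule, inserting a time stamp ahead of the extension, can be sketched as follows. This Python fragment is an illustration, not the rule itself. The function name version_name and the YYYYMMDDhhmmss stamp format are assumptions, since the workbook does not state the format the rule produces.

```python
import os
import time


def version_name(file_name, stamp=None):
    """Insert a time stamp before the extension of file_name.

    If no stamp is supplied, the current UTC time is used in the
    assumed format YYYYMMDDhhmmss.
    """
    if stamp is None:
        stamp = time.strftime("%Y%m%d%H%M%S", time.gmtime())
    base, ext = os.path.splitext(file_name)
    # A name without an extension simply gets the stamp appended.
    return "%s.%s%s" % (base, stamp, ext)


if __name__ == "__main__":
    print(version_name("report.txt", "20150825120000"))
    print(version_name("README", "20150825120000"))
```

Only the last extension is treated as the extension, so a name such as archive.tar.gz receives the stamp between "archive.tar" and ".gz".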
4.7.2 Data backup staging policy
Within the iRODS data grid, backups, copies, and replicas can be supported. The difference is the set of state information that is needed for each type of entity. A backup is a time-stamped copy of a file. A replica is an additional copy of a file that is stored on a separate storage system. The replica number is tracked along with whether the original has been changed. Generic state information includes a creation time for the data object, the location where the data object is stored, the owner of the data object, modification time stamps, and access controls. An outcome of this approach is that it is possible to use the same client to access backups, copies, and replicas.
This rule creates a time-stamped backup directory, and copies all of the files from the source directory to the backup directory. The rule reads from input the collection for which the backup will be done, the storage location where the backups will be stored, and the destination collection that will hold the backup. Within the destination collection, a time-stamped sub-directory is created to hold each backup set. The rule checks the input, checks that each operation completes correctly, and writes information to a server log.
The rule uses the policy function:
    checkCollInput
The input variables are:
    *Collrel     a relative collection name
    *Destrel     a relative collection name
    *Resource    a storage resource
The session variables are:
    $rodsZoneClient
    $userNameClient
The policy uses persistent state information:
    COLL_ID
    COLL_NAME
The operations that are performed are:
    delay
    fail
    foreach
    if
    msiCollCreate
    msiCollRsync
    msiGetSystemTime
    select
    writeLine
The rule is available at http://github.com/DICE-UNC/policy-workbook/rda-backup.r
4.7.3 Copy files to a federated staging area
This rule takes all files in a “stage” directory on the first data grid, copies them to an “Archive” directory on the second data grid, and deletes the files from the first data grid. The rule also logs all of the actions and writes the log to a directory in the second data grid.
The rule uses the policy functions:
    checkCollInput
    checkRescInput
    createLogFile
    findZoneHostName
    isColl
The input variables are:
    *Coll        a relative collection name
    *DestZone    a zone name
    *Res         a storage resource
    *Stage       a relative collection name
The session variables are:
    $rodsZoneClient
    $userNameClient
The policy uses persistent state information:
    COLL_ID
    COLL_NAME
    DATA_CHECKSUM
    DATA_NAME
    RESC_ID
    RESC_NAME
    ZONE_CONNECTION
    ZONE_NAME
The operations that are performed are:
    fail
    foreach
    if
    msiCollCreate
    msiDataObjChksum
    msiDataObjCopy
    msiDataObjCreate
    msiDataObjUnlink
    msiGetSystemTime
    msiSetACL
    msiSplitPathByKey
    remote
    select
    writeLine
The rule is available at http://github.com/DICE-UNC/policy-workbook/rda-stage.r
4.8 Data retention policies (Policy 21)
Each file in a collection may have a different retention period, or all files in a collection may have the same retention period. The iRODS data grid specifies a data expiration date in the metadata attribute “DATA_EXPIRY”. The expiration date is stored as a Unix time variable. Information about the creation time of each file is stored in the metadata attribute DATA_CREATE_TIME.
4.8.1 Purge policy to free storage space
This policy manages a cache to ensure that a minimum amount of free space is available for deposition of new files. The policy runs periodically, every 24 hours. An information catalog is queried to find the total amount of storage space that is being used. This is compared to an input parameter that specifies the maximum allowed space. Additional input parameters specify the collection and the storage resource names. A second query retrieves information about the file names, file sizes, and creation time. The result set is ordered by the creation date, making it possible to loop over the files, deleting the oldest files until the required free space is available. This policy was developed by Jean-Yves Nief of the French National Institute for Nuclear Physics and Particle Physics Computer Center. This rule could be modified to purge old backup directories.
The rule uses the policy functions:
    checkCollInput
    checkRescInput
    findZoneHostName
The input variables are:
    *CacheRescName    a storage resource
    *Collection       a relative collection name
    *MaxSpAlwdTBs     size in terabytes
The session variables are:
    $rodsZoneClient
    $userNameClient
The policy uses persistent state information:
    COLL_ID
    COLL_NAME
    DATA_CREATE_TIME
    DATA_NAME
    DATA_RESC_NAME
    DATA_SIZE
    RESC_ID
    RESC_NAME
    ZONE_CONNECTION
    ZONE_NAME
The operations that are performed are:
    break
    delay
    fail
    foreach
    if
    msiDataObjTrim
    msiGetIcatTime
    msiSplitPathByKey
    remote
    select
    writeLine
The rule is available at http://github.com/DICE-UNC/policy-workbook/rda-purge.r
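The selection logic of the purge policy, oldest files first until usage falls under the limit, can be sketched independently of the data grid. The Python fragment below is an illustration under assumptions, not the rule written by Jean-Yves Nief. The function name select_for_purge and the tuple layout (name, size in bytes, creation time) are hypothetical, and the limit is given in bytes rather than terabytes for simplicity.

```python
def select_for_purge(files, max_bytes):
    """Choose the oldest files to delete until usage is within max_bytes.

    files is a list of (name, size_in_bytes, creation_time) tuples.
    Returns (names_to_delete, bytes_used_after_purge).
    """
    used = sum(size for _, size, _ in files)
    to_delete = []
    # Walk the files in order of creation time, oldest first.
    for name, size, _created in sorted(files, key=lambda item: item[2]):
        if used <= max_bytes:
            break
        to_delete.append(name)
        used -= size
    return to_delete, used


if __name__ == "__main__":
    cache = [("a.dat", 40, 100), ("b.dat", 30, 50), ("c.dat", 50, 200)]
    print(select_for_purge(cache, 80))
```

In the rule itself the deletion step is performed with msiDataObjTrim, and the whole check is rescheduled with delay so that it repeats every 24 hours.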
4.8.2 Data expiration policy
This policy checks the date specified by an expiration metadata attribute that has been assigned to the file, and creates a list of all files that have expired. Input parameters are used to specify the collection that is being checked and whether expired files should be found. A query is made to the information catalog to get a list of the DATA_EXPIRY date for each file. This is compared to the current Unix time. Files that have expired are listed and the total number is counted.
The rule uses the policy function:
    checkCollInput
The input variables are:
    *Coll    a relative collection name
    *Flag    a metadata flag
The session variables are:
    $rodsZoneClient
    $userNameClient
The policy uses persistent state information:
    COLL_ID
    COLL_NAME
    DATA_EXPIRY
    DATA_ID
    DATA_NAME
The operations that are performed are:
    fail
    foreach
    if
    msiGetIcatTime
    select
    writeLine
The rule is available at http://github.com/DICE-UNC/policy-workbook/rda-expiry.r
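The comparison at the heart of this policy is a test of each expiry value against the current Unix time. The Python sketch below illustrates it under assumptions and is not the workbook rule. The function name find_expired is hypothetical, expiry values are taken to be zero-padded decimal strings, and a value of zero is assumed to mean that no expiration date has been set.

```python
import time


def find_expired(expiry_by_file, now=None):
    """Return (sorted list of expired file names, count).

    expiry_by_file maps a file name to its expiry time in Unix seconds,
    given as an int or a decimal string. A value of 0 is assumed to
    mean "no expiration date set" and is never reported as expired.
    """
    if now is None:
        now = int(time.time())
    expired = sorted(name for name, expiry in expiry_by_file.items()
                     if 0 < int(expiry) <= now)
    return expired, len(expired)


if __name__ == "__main__":
    sample = {"a.txt": "01400000000",
              "b.txt": "01500000000",
              "c.txt": "00000000000"}
    print(find_expired(sample, now=1450000000))
```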
4.9 Disposition policy for expired files (Policy 22)
Files in the iRODS data grid can be tagged with additional metadata attributes. For example, a metadata attribute with the name “Retention_Flag” can be added to each file, along with a metadata attribute value such as “EXPIRED” or “NOT_EXPIRED”. By using metadata to track the status of each file, it is possible to separate the retention policy from the disposition policy. The retention policy can set the metadata attribute, and the disposition policy can read the metadata attribute.
This rule migrates files to an archive that have a metadata attribute with the name “Retention_Flag” that has the value “EXPIRED”. The rule reads as input the name of the collection that will be checked and the name of the destination collection. The collection names are verified. A query is then issued to the information catalog to retrieve the names of the files in the collection that have the “EXPIRED” value for the “Retention_Flag”. All of the returned files in the list are moved to the destination collection. Note that the access controls on the file will need to be reset after the move.
The rule uses the policy function:
    checkCollInput
The input variables are:
    *Archiverel    a relative collection name
    *Collrel       a relative collection name
The session variables are:
    $rodsZoneClient
    $userNameClient
The policy uses persistent state information:
    COLL_ID
    COLL_NAME
    DATA_ID
    DATA_NAME
    META_DATA_ATTR_NAME
    META_DATA_ATTR_VALUE
The operations that are performed are:
    fail
    foreach
    if
    msiDataObjRename
    select
    writeLine
The rule is available at http://github.com/DICE-UNC/policy-workbook/rda-disposition.r
4.10 Restricted searching policy (Policy 23)
Search policies may be applied to the names of files, or to the descriptive metadata, or to system state information. A data grid administrator may be able to examine all of the metadata and see all file names, but an individual user may only be able to see the content that they own. A new genquery interface is being developed for iRODS version 4.2 which will support access controls on metadata.
4.10.1 Strict access control
The most commonly requested restriction is to limit the ability of users to see any other user’s files. This can be applied to all users, or applied to a specific user. A strict access control is implemented through the Policy Enforcement Point called acAclPolicy. The micro-service msiAclPolicy implements the restriction.
The policy implements a constraint:
    Applied at the acAclPolicy policy enforcement point
The operations that are performed are:
    msiAclPolicy
The rule is available at https://github.com/DICE-UNC/policy-workbook/blob/master/acAclPolicy-strict.re
4.10.2 Controlled queries
A query to an external database can be created and registered as a database object. Clicking on the registered query will cause the query to be executed with the results returned as a file. This makes it possible to control interactions with search engines.
4.11 Storage cost reports (Policy 24)
Reports can be generated that summarize the use of any aspect of the data grid. The most common reports detail usage by user by storage system.
4.11.1 Usage report by user name and storage system
The basic approach is to calculate the amount of storage used on each storage device and then to generate a cost by multiplying usage by the charge per storage for the device type. This can be refined to implement a separate cost per storage device. The cost information can be stored as a metadata attribute that is associated with each storage resource.
This rule sums the amount of storage used for each device by each user. A query is issued to the information catalog that sums the storage for each home directory in the data grid. The result is written to the screen.
There are no input variables.
The session variables are:
    $rodsZoneClient
The policy uses persistent state information:
    COLL_NAME
    DATA_ID
    DATA_RESC_NAME
    DATA_SIZE
    USER_NAME
The operations that are performed are:
    foreach
    if
    select
    writeLine
The rule is available at http://github.com/DICE-UNC/policy-workbook/rda-storage.r
4.11.2 Cost report by user name and storage system
A cost algorithm is implemented by storing a “cost per byte” metadata attribute on each storage resource. The “cost per byte” attribute is stored as the metadata attribute called “Storage_Cost”, with the attribute value equal to the storage cost per byte. A query is issued to the information catalog to get a list of the users. Then for each user, a query is issued to sum the storage for each user for each storage device. The storage cost per byte is retrieved by a query, and the storage cost is calculated.
There are no input variables.
The session variables are:
    $rodsZoneClient
The policy uses persistent state information:
    DATA_RESC_NAME
    DATA_SIZE
    COLL_NAME
    META_RESC_ATTR_NAME
    META_RESC_ATTR_VALUE
    USER_NAME
The operations that are performed are:
    foreach
    select
    writeLine
The rule is available at http://github.com/DICE-UNC/policy-workbook/rda-storageCost.r
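The arithmetic of the cost report is a grouped sum followed by one multiplication per group. The following Python sketch illustrates it under assumptions and is not the workbook rule. The function name cost_report, the row layout (user, resource, size in bytes), and the sample cost values are hypothetical. In the rule the per-resource cost would come from the “Storage_Cost” attribute.

```python
from collections import defaultdict


def cost_report(usage_rows, cost_per_byte):
    """Sum storage per (user, resource) and apply a per-byte cost.

    usage_rows is an iterable of (user, resource, size_in_bytes).
    cost_per_byte maps a resource name to its cost per byte.
    Returns {(user, resource): (total_bytes, total_cost)}.
    """
    totals = defaultdict(int)
    for user, resource, size in usage_rows:
        totals[(user, resource)] += size
    return {key: (total, total * cost_per_byte[key[1]])
            for key, total in totals.items()}


if __name__ == "__main__":
    rows = [("alice", "disk", 100), ("alice", "disk", 50),
            ("alice", "tape", 200), ("bob", "disk", 10)]
    costs = {"disk": 0.5, "tape": 0.25}  # illustrative values only
    for (user, resource), (size, cost) in sorted(cost_report(rows, costs).items()):
        print(user, resource, size, cost)
```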
5 Odum Data Preservation Policy set
The preservation policies overlap with the RDA data management policies. Table 1 shows how the policy sets are related. The Odum data preservation policies typically required integration with additional software systems for implementation. Thus:

    De-identification of data                        Uses Bitcurator
    Applying unique data identifiers                 Uses Handle system
    Data normalization to non-proprietary formats    Uses Polyglot
    Authentication identity management               Uses InCommon
    Creation of PREMIS event data                    Uses message bus
    Assessment criteria validation                   Uses indexing technology
    Mapping metadata across systems                  Uses HIVE
    Automatic checksums                              Uses SHA-128
    Tracking use                                     Uses DataBook
5.1 Automate access restrictions (Policy 14)
One approach is to associate access restrictions with a collection, and then have all files within the collection inherit the access controls. When a file is put into the collection, the required access controls are automatically applied.
5.1.1 Set inheritance of access controls on a collection
Access controls on a file can be inherited from the collection into which the file is organized. This rule reads as input the collection name and then sets an “inherit” flag on the collection. Files that are deposited into the collection will “inherit” the access controls that were set on the collection.
The rule uses the policy function:
    checkCollInput
The input variables are:
    *Acl                   an access control
    *RelativeCollection    a relative collection name
    *User                  a user name
The session variables are:
    $rodsZoneClient
    $userNameClient
The policy uses persistent state information:
    COLL_ID
    COLL_NAME
The operations that are performed are:
    fail
    foreach
    if
    msiSetACL
    select
    writeLine
The rule is available at http://github.com/DICE-UNC/policy-workbook/odum-inherit.r
5.1.2 Check whether a specific person has access to a collection
The rule shown in section 4.1.5 checks each file in a collection to determine whether a specified person has access. The type of access control is displayed. The rule finds the person’s USER_ID and the DATA_ID for each file in the collection.
5.1.3 Identify all persons with access to files in a collection
This rule creates a list of all of the persons who have access to any file within a collection. The number of files that can be accessed and the total size of the accessible files are calculated.
The rule uses the policy function:
    contains
There are no input variables.
The session variables are:
    $rodsZoneClient
    $userNameClient
The policy uses persistent state information:
    COLL_NAME
    DATA_ACCESS_DATA_ID
    DATA_ACCESS_TYPE
    DATA_ACCESS_USER_ID
    DATA_ID
    DATA_SIZE
    TOKEN_ID
    TOKEN_NAME
    TOKEN_NAMESPACE
    USER_ID
    USER_NAME
The operations that are performed are:
    fail
    foreach
    if
    select
    strlen
    writeLine
The rule is available at http://github.com/DICE-UNC/policy-workbook/odum-list-ACL.r
5.1.4 Identify files that can be accessed by an account
Once a collection has been analyzed to determine which accounts have access, the list of account names can be examined to determine which account access should be deleted. The following rule lists all of the files that can be accessed by a specified account.
The rule uses the policy functions:
    checkUserInput
    findZoneHostName
The input variables are:
    *Usern    a user name
The session variables are:
    $rodsZoneClient
    $userNameClient
The policy uses persistent state information:
    COLL_NAME
    DATA_ACCESS_DATA_ID
    DATA_ACCESS_USER_ID
    DATA_ID
    DATA_NAME
    USER_ID
    USER_NAME
    USER_ZONE
    ZONE_CONNECTION
    ZONE_NAME
The operations that are performed are:
    fail
    foreach
    if
    msiSplitPathByKey
    remote
    select
    writeLine
The rule is available at http://github.com/DICE-UNC/policy-workbook/odum-list-ACL-files.r
5.1.5 Delete access to files for a specified account
The following rule sets the access for a specified account to “null” for all files within a collection. Only files that originally had access permissions set for the account are processed.
The rule uses the policy functions:
    checkUserInput
    findZoneHostName
The input variables are:
    *Usern    a user name
The session variables are:
    $rodsZoneClient
    $userNameClient
The policy uses persistent state information:
    COLL_NAME
    DATA_ACCESS_DATA_ID
    DATA_ACCESS_USER_ID
    DATA_ID
    DATA_NAME
    USER_ID
    USER_NAME
    USER_ZONE
    ZONE_CONNECTION
    ZONE_NAME
The operations that are performed are:
    fail
    foreach
    if
    msiSetACL
    msiSplitPathByKey
    remote
    select
    writeLine
The rule is available at http://github.com/DICE-UNC/policy-workbook/odum-delete-access.r
5.1.6 Copy files, access control lists, and AVUs to a federated data grid
One way to create an archive of a collection is to copy the files to an independent data grid, along with the access controls and descriptive metadata. This policy assumes that two data grids are federated, that the path naming for files in the second data grid is the same as the path name in the primary data grid, and that user accounts from the primary data grid have been established in the second data grid. The policy copies each file from the specified collection in the primary data grid into an equivalent directory in the second data grid, copies the access controls, and copies the metadata. If an account has not been set up in the federated data grid, the ACL is not set. Currently, the AVU copy does not work and units need to be copied.
The rule uses the policy functions:
    checkCollInput
    checkRescInput
    createLogFile
    findZoneHostName
    isColl
The input variables are:
    *Coll        a relative collection name
    *DestZone    a zone name
    *Res         a storage resource
    *Stage       a relative collection name
The session variables are:
    $rodsZoneClient
    $userNameClient
The policy uses persistent state information:
    COLL_ID
    COLL_NAME
    DATA_ACCESS_DATA_ID
    DATA_ACCESS_TYPE
    DATA_ACCESS_USER_ID
    DATA_ID
    DATA_NAME
    META_DATA_ATTR_NAME
    META_DATA_ATTR_UNITS
    META_DATA_ATTR_VALUE
    RESC_ID
    RESC_NAME
    TOKEN_ID
    TOKEN_NAME
    TOKEN_NAMESPACE
    USER_ID
    USER_NAME
    USER_ZONE
    ZONE_CONNECTION
    ZONE_NAME
The operations that are performed are:
    fail
    foreach
    if
    msiCollCreate
    msiDataObjCopy
    msiDataObjCreate
    msiDataObjUnlink
    msiGetSystemTime
    msiSetACL
    msiSetAVU
    msiSplitPathByKey
    remote
    select
    writeLine
The rule is available at http://github.com/DICE-UNC/policy-workbook/odum-copy-ACL-AVU.r
5.2 Normalize data to non-proprietary formats (Policy 15)
A preservation environment must ensure that the deposited records will be viewable in the future. Viable data formats will have non-proprietary or open source applications for parsing the data formats. Examples of open source formats include text files and pdf files. The archive will typically maintain a list of allowed data formats, check each file that is archived for the data format type, and create a version of the file in a sustainable format. Archives that manage persistent objects will still preserve the original data format, enabling migration to alternate data formats in the future.
5.2.1 Detection of format type
Files that have the format type included as an extension in the file name can be automatically analyzed to set the DATA_TYPE_NAME persistent state attribute. It is then possible to query DATA_TYPE_NAME to detect whether files are present with a defined data type. This policy guesses the data type based on the file extension, and then sets the DATA_TYPE_NAME persistent state variable for each file in a collection.
The rule uses the policy function:
    checkCollInput
The input variables are:
    *Collrel    a relative collection name
The session variables are:
    $rodsZoneClient
    $userNameClient
The policy uses persistent state information:
    COLL_ID
    COLL_NAME
    DATA_ID
    DATA_NAME
The operations that are performed are:
    fail
    foreach
    if
    msiSetDataType
    msiSplitPathByKey
    remote
    select
    writeLine
The rule is available at http://github.com/DICE-UNC/policy-workbook/odum-set-data-type.r
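Guessing a data type from an extension amounts to a table lookup. The Python sketch below illustrates the idea and is not the workbook rule. The function name guess_data_type, the mapping table, and the type names in it are hypothetical and are not the values that iRODS assigns to DATA_TYPE_NAME.

```python
import os

# Hypothetical extension-to-type table, for illustration only.
EXT_TO_TYPE = {
    ".txt": "text",
    ".pdf": "PDF document",
    ".xml": "xml",
    ".jpg": "jpeg image",
}


def guess_data_type(file_name, default="generic"):
    """Return a data type name guessed from the file extension."""
    ext = os.path.splitext(file_name)[1].lower()
    return EXT_TO_TYPE.get(ext, default)


if __name__ == "__main__":
    for name in ("notes.txt", "photo.JPG", "README"):
        print(name, "->", guess_data_type(name))
```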
5.2.2 Automate format type detection
The DATA_TYPE_NAME can be automatically set on every put of a file into the data grid. The rule uses the $objPath session variable to get the file name.
The policy implements a constraint:
    Applied at the acPostProcForPut policy enforcement point
The operations that are performed are:
    msiSetDataTypeFromExt
The rule is available at https://github.com/DICE-UNC/policy-workbook/blob/master/acPostProcForPut-datatype.re
5.2.3 Identify file format extensions in a collection
This policy generates a list of the format extensions that are used in a collection, counts the number of files with each extension, and sums the sizes of the files with each extension.
The rule uses the policy functions:
    contains
    ext
There are no input variables.
The session variables are:
    $rodsZoneClient
    $userNameClient
The policy uses persistent state information:
    COLL_NAME
    DATA_ID
    DATA_NAME
    DATA_SIZE
The operations that are performed are:
    foreach
    if
    select
    strlen
    writeLine
The rule is available at http://github.com/DICE-UNC/policy-workbook/odum-list-extensions.r
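The report produced by this policy is a count and a size total per extension. A minimal Python sketch of that aggregation follows. It is an illustration, not the workbook rule. The function name summarize_extensions, the (name, size) input layout, and the "(none)" label for files without an extension are assumptions.

```python
import os
from collections import defaultdict


def summarize_extensions(files):
    """Count files and sum sizes for each file extension.

    files is an iterable of (file_name, size_in_bytes).
    Returns {extension: (file_count, total_bytes)}; files without an
    extension are grouped under the label "(none)".
    """
    summary = defaultdict(lambda: [0, 0])
    for name, size in files:
        ext = os.path.splitext(name)[1].lower() or "(none)"
        summary[ext][0] += 1
        summary[ext][1] += size
    return {ext: tuple(values) for ext, values in summary.items()}


if __name__ == "__main__":
    listing = [("a.txt", 10), ("b.TXT", 5), ("c.pdf", 100), ("README", 1)]
    for ext, (count, total) in sorted(summarize_extensions(listing).items()):
        print(ext, count, total)
```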
5.3 Creation of PREMIS event data (Policy 16)
The PREMIS schema identifies events that are applied to records in an archive. The types of events include modifications to the record, usage of the record, and actions taken by the archive administrator. The pluggable architecture of iRODS version 4.1 allows each operation to be annotated with pre- and post-policy enforcement points. Information about the execution of the operation can be trapped and written to a log file. The log file can be processed to add PREMIS-style event metadata to each record. A scalable approach uses an external index to manage the PREMIS event metadata. PREMIS metadata includes information about:
    [1] Data record composition, location, creating application, creation date, dependencies, format, type, size, software dependencies
    [2] Environment, hardware, storage medium
    [3] Links to permission statements, intellectual entities
    [4] Messages
    [5] Related objects, relationship type
    [6] Signatures, signers
    [7] Event types, values, sequence
The events that occur within the data management environment can be mapped to PREMIS event information:
    relatedEventIdentifierType
    relatedEventIdentifierValue
    relatedEventSequence
This information can be kept in an external indexing system to enable analysis, identification of the types of events that occur within the data management system, and timelines of the events applied to a specific data record. Communication with the external indexing system is done through a message queue.
5.3.1 Creating PREMIS event information
The following rules are based on the Databook system for tracking event information about usage, data sets, and users. The rule creates a JSON document representing an access event encoded as PREMIS metadata and sends it via the Advanced Message Queuing Protocol to an external indexing system. The PREMIS event information is created using the policy functions:
    • genAccessId, which generates a URI representing this particular event.
    • jsonEncode, which encodes the data so that they can be concatenated with JSON strings.
    • sendAccess, which generates a message and sends it using AMQP.
    • sendRelatedEvent, which creates a JSON document describing a related event between objects.
    • sendLinkingEvent, which creates a JSON document describing a link between two objects.
5.3.2 Sending messages over AMQP
Many indexing systems respond to messages using the Advanced Message Queuing Protocol (AMQP). A library of policy functions, called dfc-amqp.re, has been implemented to support messages. The functions include:
1. amqpSend(*Host, *Queue, *Msg)
   Sends a message.
       *Host     Host address for message queue
       *Queue    Queue for receiving message
       *Msg      Message
2. amqpRecv(*Host, *Queue, *Emp, *Msg)
   Receives a message.
       *Host     Host address of the message queue
       *Queue    Queue that is queried for message
       *Emp      Flag for trimming end of line from message
       *Msg      Message that is received
3. startXmsgAmqpBridge(*Tic, *Log)
   Messages are of the format "Host:Queue:Msg", assuming that there is no ":" in Host or Queue. Messages are transferred every 30 seconds.
       *Tic      Ticket of message within Xmsg system
       *Log      Flag set to “true” to log message event on server log
4. XmsgAmqpBridge(*Tic, *Log)
   Transfers messages from Xmsg to AMQP.
       *Tic      Ticket of message within Xmsg system
       *Log      Flag set to “true” to log message event on server log
5. startAmqpXmsgBridge(*Host, *Queue, *Tic, *Log)
   AMQP to Xmsg bridge. Messages are read from *Queue on *Host, and written to the stream with ticket *Tic, every 30 seconds.
       *Host     Host of AMQP message queue
       *Queue    Queue used within AMQP
       *Tic      Ticket number of message in Xmsg system
       *Log      Flag set to “true” to log message event on server log
6. AmqpXmsgBridge(*Host, *Queue, *Tic, *Log)
   Bridge from AMQP message queue to Xmsg queue.
       *Host     Host of AMQP message queue
       *Queue    Queue used within AMQP
       *Tic      Ticket number of message in Xmsg system
       *Log      Flag set to “true” to log message event on server log
7. startXmsgAmqpBridgeOneQueue(*Tic, *Host, *Queue, *Log)
   Xmsg to AMQP bridge which sends all Xmsgs from a channel to a queue every 30 seconds.
       *Host     Host of AMQP message queue
       *Queue    Queue used within AMQP
       *Tic      Ticket number of message in Xmsg system
       *Log      Flag set to “true” to log message event on server log
8. XmsgAmqpBridgeOneQueue(*Tic, *Host, *Queue, *Log)
   Xmsg to AMQP bridge which sends all Xmsgs from a channel to a queue.
       *Host     Host of AMQP message queue
       *Queue    Queue used within AMQP
       *Tic      Ticket number of message in Xmsg system
       *Log      Flag set to “true” to log message event on server log
The library is available at: https://github.com/DICE-UNC/policy-workbook/blob/master/dfc-amqp.re
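The "Host:Queue:Msg" convention used by startXmsgAmqpBridge can be illustrated with a short Python sketch. It is not code from dfc-amqp.re. The function name parse_bridge_message and the sample host and queue names are assumptions. Because Host and Queue are stated to contain no ":", splitting on the first two colons leaves any later colons inside the message body intact.

```python
def parse_bridge_message(raw):
    """Split a "Host:Queue:Msg" string into (host, queue, msg).

    Only the first two colons are treated as separators, so the
    message body may itself contain colons (for example, JSON text).
    """
    parts = raw.split(":", 2)
    if len(parts) != 3:
        raise ValueError("expected Host:Queue:Msg, got %r" % raw)
    host, queue, msg = parts
    return host, queue, msg


if __name__ == "__main__":
    # Hypothetical host and queue names.
    print(parse_bridge_message('mq.example.org:events:{"type": "access"}'))
```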
5.4 Automation of user submission agreements (Policy 17)
When files are loaded into a staging area, processing steps can be applied before the file is moved to the archival location. An example is the acquisition of a signed user submission agreement. A user submission agreement typically specifies that the user owns the copyright to the file, has the authority to submit the file to an archive, and agrees to a set of access permissions for the file. This can be automated through use of E-mail, web forms, or formal hard copy submission agreements.
5.4.1 Staging of files with a user submission agreement
Files can be moved from a staging area into an archive once the presence of a user submission agreement has been checked. This policy assumes that a separate collection is formed within the staging area, and that the user submission agreement has been associated as an attribute on the collection name. As in the previous policy, the variable name “Use_Agreement” is checked to see if the value is “RECEIVED”. In this case, the collection name is checked instead of the USER_NAME.
The rule uses the policy function:
    checkCollInput
The input variables are:
    *Coll     a relative collection name
    *Stage    a relative collection name
The session variables are:
    $rodsZoneClient
    $userNameClient
The policy uses persistent state information:
    COLL_ID
    COLL_NAME
    DATA_NAME
    META_COLL_ATTR_NAME
    META_COLL_ATTR_VALUE
The operations that are performed are:
    fail
    foreach
    if
    msiDataObjRename
    select
    writeLine
The rule is available at http://github.com/DICE-UNC/policy-workbook/odum-stage-ag.r
5.5 Automatic Checksums (Policy 18)
The BagIt technology encapsulates data in a container before transport over the network. Within the container, a manifest file is added that provides a checksum for each enclosed file. The checksum can be extracted, compared to a new checksum generated upon receiving the file, and verified to ensure that the data were not corrupted on transport. The checksum can be recorded as a metadata attribute on the file, DATA_CHECKSUM, and used in the future to verify file integrity.
5.5.1 Creating a BagIt file
This rule generates a bag (tar file) containing a manifest, a list of checksums, and the files contained within a specified collection. The generateBagIt rule creates the equivalent of a Submission Information Package. Extensions would be the inclusion of descriptive metadata, provenance metadata, and structural metadata.
The rule uses the policy function:
    checkCollInput
The input variables are:
    *BAGITDATA        a collection name
    *NEWBAGITROOT     a collection name
The session variables are:
    $rodsZoneClient
    $userNameClient
The policy uses persistent state information:
    COLL_ID
    COLL_NAME
The operations that are performed are:
    fail
    foreach
    if
    msiCollCreate
    msiCollRsync
    msiDataObjChksum
    msiDataObjClose
    msiDataObjCreate
    msiDataObjWrite
    msiFreeBuffer
    msiExecGenQuery
    msiExecStrCondQuery
    msiGetContInxFromGenQueryOut
    msiGetValByKey
    msiMakeGenQuery
    msiMakeQuery
    msiSplitPath
    msiTarFileCreate
    msiWriteRodsLog
    select
    strlen
    substr
    while
    writeLine
    writeString
The rule is available at http://github.com/DICE-UNC/policy-workbook/odum-bagit.r
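The manifest-and-verify cycle that a bag supports can be sketched without a data grid. The Python fragment below is an illustration under assumptions and is not the generateBagIt rule. The function names manifest_lines and verify are hypothetical, file contents are held in memory as bytes, SHA-256 is chosen only for the example, and payload paths are written under a data/ prefix in the usual BagIt style.

```python
import hashlib


def manifest_lines(files, algorithm="sha256"):
    """Build manifest lines of the form "<checksum>  data/<path>".

    files maps a relative path to the file contents as bytes.
    """
    lines = []
    for path in sorted(files):
        digest = hashlib.new(algorithm, files[path]).hexdigest()
        lines.append("%s  data/%s" % (digest, path))
    return lines


def verify(files, lines, algorithm="sha256"):
    """Return the paths whose contents are missing or do not match."""
    mismatched = []
    for line in lines:
        digest, payload_path = line.split(None, 1)
        path = payload_path[len("data/"):]
        if (path not in files or
                hashlib.new(algorithm, files[path]).hexdigest() != digest):
            mismatched.append(path)
    return mismatched


if __name__ == "__main__":
    payload = {"a.txt": b"hello", "empty.dat": b""}
    manifest = manifest_lines(payload)
    print("\n".join(manifest))
    # Simulate corruption in transit and re-check against the manifest.
    received = dict(payload, **{"a.txt": b"hellO"})
    print("mismatched:", verify(received, manifest))
```

Recomputing each checksum on receipt and comparing it with the manifest entry is the check described in Section 5.5. The verified value is what would then be recorded in DATA_CHECKSUM.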
5.6 Automated capture of Provenance / contextual metadata (Policy 19)
Provenance and contextual metadata can be associated with files as metadata attributes. The source of the metadata may be an XML file, or a text file, or a structure within each data file. An automated process to acquire the metadata would parse the metadata source file, and load the metadata as attributes on each archived file. Examples of this approach are provided in Section 4.6.
5.6.1 Provenance for administrative policies
Provenance can also be tracked for execution of administrative policies. Workflow structured objects implement automated capture of provenance information for each execution of a workflow. The workflow file is of data type 'msso' and uses the dot-extension '.mss'. The workflow file is registered into iRODS and can be shared, executed, and re-executed. The workflow language is the same as that of the '.r' file used by the irule command, but need not have the INPUT and OUTPUT statements. Policies can be stored as workflows, with each execution of the workflow tracked by the data grid.

For each workflow file, one associates a structured object that implements an iRODS collection-type environment for tracking executions of the workflow. All files associated with a workflow execution are stored under this structured object, called the Workflow Structured Object (WSO). One can view the WSO as akin to an iRODS collection with a hierarchical structure. At the top level of this structure, one stores all the parameter files needed to run the workflow, as well as any input files and manifest files that are needed for the workflow execution. Beneath this level, a set of run directories is created which actually house the results of an execution. Hence, one can view the WSO as a complete structure that captures all aspects of a workflow execution.

In iRODS the WSO is created as a mount point in the iRODS logical collection hierarchy. This is similar to a mounted collection but of type "msso". One uses the imcoll command to create this mount point. We use WSO and MSSO (micro-service structured object) synonymously for historic reasons, since the need and idea for WSO/MSSO came from the usage experience for Micro-Service Objects (MSO).

Apart from the workflow file there is one other important file called the parameter file (with dot-extension '.mpf') which contains information needed for executing the workflow. We separated the parameter file from the workflow file so that one can associate multiple parameter files with a workflow and use them for executing with different input values. The parameter file contains values for workflow *variables that are used in the workflow execution. It also contains information about files that need to be staged in before the execution and staged out for archiving after the execution. It also contains directives for the workflow execution engine.

The parameter files as well as any input files can be ingested into the WSO using normal icommands such as iput. When a parameter file is ingested into a WSO, a run file is automatically created which can be used to run the parameter file with the associated workflow. When a workflow execution occurs a run directory is created for storing the results of this run. Depending upon the directives in the parameter file, older results are versioned out or discarded after a successful workflow execution. These version directories can be listed and accessed using the normal icommands such as ils and iget.

Workflows can be called from within other workflows. This feature allows one to chain workflows. This can be done in two ways. One is by opening another workflow parameter file inside a workflow and using the data returned from this as normally done for accessing files in iRODS. A second way of running a workflow inside another is to call it through a special policy called "acRunWorkFlow". The first way is useful if the output file from a workflow is very large and needs multiple buffer read calls to process. The second way is useful when the returned data is less than 32 MB in size. Samples of both versions are shown below.

Sample Workflow file: eCWkflow.mss is available at http://github.com/DICE-UNC/policy-workbook/odum-eCWkflow.mss
    # Input parameters:
    #   Name of *File1 - first output file written by the workflow
    #   Name of *File2 - second output file written by the workflow
    # Output parameter is:
    #   None
    # Output from running the example is:
    #   message about completion written to stdout
    #
    # This workflow executes the file called myWorkFlow twice with two different input values
    # This is an executable file that is located in bin/cmd directory of the iRODS server.
    # It creates an output file using the value given in the second argument.
    # The workflow also prints to stdout the statement about when the execution occurred.
    testWorkflow {
    # odum-eCWkflow.mss
      msiExecCmd("myWorkFlow", *File1, "null", "null", "null", *Result1);
      msiExecCmd("myWorkFlow", *File2, "null", "null", "null", *Result2);
      msiGetFormattedSystemTime(*myTime, "human", "%d-%d-%d %ldh:%ldm:%lds");
      writeLine("stdout", "Workflow Executed Successfully at *myTime");
    }
Sample parameter file used with eCWkflow.mss: eCWkflow.mpf

    # Comments
    #
    # FileName should be StarVariableName occurring
    # either in INPUT of the msso file or in INPARAM of this file.
    # Please identify all file names as they will be helpful for later metadata extraction
    # FILEPARAM fileStarVariableName
    # DIRPARAM collStarVariableName
    #
    # INPARAM paramName=paramValue
    # INPARAMINFO paramName, paramType=type, paramUnit=unit, valueSize=size, Comments=comments
    # parameters used by the workflow
    # In this case there are two files and another string value parameter.
    INPARAM *File1="OutFile3"
    INPARAM *File2="OutFile4"
    INPARAM *Aval="test"
    #
    # Identify files that are used in input params - needed to stage back outputs.
    FILEPARAM *File1
    FILEPARAM *File2
    #
    # Identify the stage area where the workflow execution is performed
    # by default it is performed at the "bin" directory of the iRODS server.
    # This is needed if one is using msiExecCmd micro-service as part of the workflow.
    #STAGEAREA bin
    #
    # Stage in files from anywhere in iRODS to the "stage area"
    # myData is a file located in the WSO and photo.JPG is a file somewhere else in iRODS.
    STAGEIN myData
    STAGEIN /raja8/home/rods/photo.JPG
    #
    # Stage back additional files created as part of run
    # COPYOUT - will leave a copy in the "stage area" and make a copy in iRODS WSO
    #         - helpful if it is needed by subsequent workflow execution
    # STAGEOUT - will move file from "stage area" to iRODS WSO
    # In this case we are archiving the two files myData and photo.JPG as well as the
    # "myWorkFlow" file used by the workflow execution.
    COPYOUT myWorkFlow
    STAGEOUT myData
    STAGEOUT photo.JPG
    #
    # The next set of statements provide directives to the workflow system.
    # CHECKFORCHANGE is used for testing whether the file being checked has changed since
    # the previous execution of the workflow. If the file is modified/touched then the workflow
    # is executed. If none of the files are changed, then the workflow is not executed. If
    # directed, the file from previous execution is "sent back" to the client.
    # NOVERSION is used when versioning of old results is not needed.
    # CLEANOUT is used to clear the stage area after execution.
    #CHECKFORCHANGE /raja8/home/rods/photo.JPG
    CHECKFORCHANGE myData
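The directive layout of a parameter file lends itself to a simple line-oriented reader. The following Python sketch is illustrative only and is not the parser used by iRODS. It assumes one directive per line and treats lines beginning with "#" as comments; the function name parse_mpf and the sample text are hypothetical.

```python
def parse_mpf(text):
    """Split .mpf-style text into INPARAM values, list directives, and flags.

    Hypothetical helper for illustration; not part of iRODS.
    """
    params = {}       # INPARAM name -> value
    directives = {}   # e.g. STAGEIN -> [paths]
    flags = set()     # directives without an argument, e.g. NOVERSION
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        keyword, _, rest = line.partition(" ")
        rest = rest.strip()
        if keyword == "INPARAM":
            name, _, value = rest.partition("=")
            params[name.strip()] = value.strip().strip('"')
        elif rest:
            directives.setdefault(keyword, []).append(rest)
        else:
            flags.add(keyword)
    return params, directives, flags


SAMPLE = """
# parameters used by the workflow
INPARAM *File1="OutFile3"
INPARAM *File2="OutFile4"
FILEPARAM *File1
STAGEIN myData
STAGEIN /raja8/home/rods/photo.JPG
COPYOUT myWorkFlow
STAGEOUT myData
CHECKFORCHANGE myData
"""

params, directives, flags = parse_mpf(SAMPLE)
print(params)
print(directives["STAGEIN"])
```

Keeping the INPARAM values separate from the staging directives mirrors the distinction the parameter file itself makes between workflow variables and instructions to the workflow engine.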
For completeness, the executable for myWorkFlow is also provided below.

    #!/bin/sh
    # Just a test to copy an existing file
    # one may look at this as taking a file and creating a new one possibly after conversion
    # mycp is a file that takes tt as input and creates a new output file
    cmd/mycp cmd/tt "$1"
Calling a workflow from another workflow is possible. The following example shows a workflow call embedded as an object open in the sample workflow shown above. This is available at http://github.com/DICE-UNC/policy-workbook/odum-testWorkflowCall1.mss

The next example shows the same action using a rule and is useful when reading small files. This is available at http://github.com/DICE-UNC/policy-workbook/odum-testWorkflowCall2.mss

The steps for using a workflow object are outlined below. First create a new collection and ingest the workflow file.

    imkdir /dfctest/home/rodsAdmin/workflow
    iput -D "msso file" ./dfcDemoWkFlow.mss /dfctest/home/rodsAdmin/workflow/dfcDemoWkFlow.mss

Create a new collection and mount that collection as a Workflow Structured Object associated with the workflow file. The collection that is mounted as an MSO for a workflow can be anywhere in iRODS. As can be seen, one can have more than one such structure mounted for a workflow file. The name of the collection need not be related to the name of the workflow file.

    imkdir /dfctest/home/rodsAdmin/workflow/dfcDemoWkFlow
    imcoll -m msso /dfctest/home/rodsAdmin/workflow/dfcDemoWkFlow.mss /dfctest/home/rodsAdmin/workflow/dfcDemoWkFlow

Ingest a parameter file in the WSO collection. One can also ingest more than one parameter file in the same WSO collection. A run file for each parametric file is automatically created.

    iput dfcDemoWkFlow.mpf /dfctest/home/rodsAdmin/workflow/dfcDemoWkFlow
    iput dfcDemoWkFlow2.mpf /dfctest/home/rodsAdmin/workflow/dfcDemoWkFlow
One can ingest other files (such as input files) that are needed for workflow execution.

    iput myData /dfctest/home/rodsAdmin/workflow/dfcDemoWkFlow/myData

One can perform ils on the WSO collection. It will show the two parameter files as well as run files that are automatically created for each of them. Note that the name of the run file is based on the file name of the parametric file.

    ils /dfctest/home/rodsAdmin/workflow/dfcDemoWkFlow
      dfcDemoWkFlow.run
      dfcDemoWkFlow.mpf
      dfcDemoWkFlow2.run
      dfcDemoWkFlow2.mpf
      myData

One can perform other icommands also on the WSO collection. The iget command will show the contents of the file.

    icd /dfctest/home/rodsAdmin/workflow/dfcDemoWkFlow
    ils -l
    iget ../dfcDemoWkFlow.mss -
    iget dfcDemoWkFlow.mpf -
    iget dfcDemoWkFlow2.mpf -
    iget myData -

To execute the workflow using a parametric file, perform an access on the associated run file. Instead of showing what is in the "run" file, this iget action executes the workflow using the associated parametric file and stores the results. The iget returns a file back to the client. By default the stdout from execution of the workflow is returned. If one needs a different file to be returned, one can set that up as part of the workflow file or the parametric file using the directive "SHOW".

    iget dfcDemoWkFlow.run -
    Workflow Executed Successfully at 2012-9-20 11h:28m

The execution of the workflow also creates a new directory as part of the WSO structure and stores the results of the execution (as per the directives in the .mpf parametric file). This can be seen by performing a listing of the directory, which will be named after the parametric file.

    ils
    /dfctest/home/rodsAdmin/workflow/dfcDemoWkFlow:
      dfcDemoWkFlow.run
      dfcDemoWkFlow.mpf
      dfcDemoWkFlow2.run
      dfcDemoWkFlow2.mpf
      myData
      C- /dfctest/home/rodsAdmin/workflow/dfcDemoWkFlow/dfcDemoWkFlow.runDir

Listing the runDir will show the results of the run. Compare this with the directive in the parametric file above.

    ils -l dfcDemoWkFlow.runDir
    /dfctest/home/rodsAdmin/workflow/dfcDemoWkFlow/dfcDemoWkFlow.runDir:
      rodsAdmin  mssoSt  demoResc       11  2012-09-20.11:28  &  myData
      rodsAdmin  mssoSt  demoResc       99  2012-09-20.11:28  &  myWorkFlow
      rodsAdmin  mssoSt  demoResc       20  2012-09-20.11:28  &  OutFile1
      rodsAdmin  mssoSt  demoResc       20  2012-09-20.11:28  &  OutFile2
      rodsAdmin  mssoSt  demoResc  1181588  2012-09-20.11:28  &  photo.JPG
      rodsAdmin  mssoSt  demoResc       52  2012-09-20.11:28  &  stdout

Any of the files in the runDir directory can be accessed using the iget command.
Also, one can have whole directories stored under the runDir. If you run the workflow again without changing the input, the workflow is not actually executed. Instead the contents of the old stdout are sent back to the client. Also, no new files will be created.

    iget dfcDemoWkFlow.run -
    Workflow Executed Successfully at 2012-9-20 11h:30m

This is because neither the input files nor the workflow system have changed and, as per the directive, the workflow will not be re-executed. If we overwrite one of the input files, the workflow will be executed. Since the NOVERSION directive is not in the parameter file, the older files will be versioned and the new files created in the runDir directory.

    iput -f myData2 /dfctest/home/rodsAdmin/workflow/dfcDemoWkFlow/myData
    iget dfcDemoWkFlow.run -
    Workflow Executed Successfully at 2012-9-20 11h:30m
    ils -l dfcDemoWkFlow.runDir
    /dfctest/home/rodsAdmin/workflow/dfcDemoWkFlow/dfcDemoWkFlow.runDir:
      rodsAdmin  mssoSt  demoResc       20  2012-09-20.11:30  &  OutFile1
      rodsAdmin  mssoSt  demoResc       20  2012-09-20.11:30  &  OutFile2
      rodsAdmin  mssoSt  demoResc  1181588  2012-09-20.11:30  &  photo.JPG
      rodsAdmin  mssoSt  demoResc       21  2012-09-20.11:30  &  myData
      rodsAdmin  mssoSt  demoResc       99  2012-09-20.11:30  &  myWorkFlow
      rodsAdmin  mssoSt  demoResc       52  2012-09-20.11:30  &  stdout

As can be seen below, the older execution files are stored under dfcDemoWkFlow.runDir0.

    ils -l dfcDemoWkFlow.runDir0
    /dfctest/home/rodsAdmin/workflow/dfcDemoWkFlow/dfcDemoWkFlow.runDir0:
      rodsAdmin  mssoSt  demoResc       11  2012-09-20.11:28  &  myData
      rodsAdmin  mssoSt  demoResc       99  2012-09-20.11:28  &  myWorkFlow
      rodsAdmin  mssoSt  demoResc       20  2012-09-20.11:28  &  OutFile1
      rodsAdmin  mssoSt  demoResc       20  2012-09-20.11:28  &  OutFile2
      rodsAdmin  mssoSt  demoResc  1181588  2012-09-20.11:28  &  photo.JPG
      rodsAdmin  mssoSt  demoResc       52  2012-09-20.11:28  &  stdout
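The re-execution and versioning behavior walked through above can be modeled in a few lines. This Python sketch is a simplified model and not the workflow engine's implementation. It assumes that change detection compares a recorded stamp per watched file and that versions are numbered runDir0, runDir1, and so on; only runDir0 appears in the walkthrough, so the continued numbering and all function names are assumptions.

```python
def inputs_changed(last_seen, current):
    """True if any watched file is new or carries a different stamp."""
    return any(last_seen.get(name) != stamp for name, stamp in current.items())


def next_version_name(run_dir, existing):
    """First unused numbered name: runDir0, runDir1, ... (assumed scheme)."""
    n = 0
    while "{}{}".format(run_dir, n) in existing:
        n += 1
    return "{}{}".format(run_dir, n)


def plan_run(last_seen, current, run_dir, existing, noversion=False):
    """Decide whether to execute and where the old results should go."""
    first_run = run_dir not in existing
    if not first_run and not inputs_changed(last_seen, current):
        # Nothing changed: the stored stdout is returned instead.
        return {"execute": False, "archive_as": None}
    if first_run or noversion:
        return {"execute": True, "archive_as": None}
    return {"execute": True, "archive_as": next_version_name(run_dir, existing)}


run_dir = "dfcDemoWkFlow.runDir"
print(plan_run({}, {"myData": "11:26"}, run_dir, set()))
print(plan_run({"myData": "11:26"}, {"myData": "11:26"}, run_dir, {run_dir}))
print(plan_run({"myData": "11:26"}, {"myData": "11:29"}, run_dir, {run_dir}))
```

The three calls correspond to the first execution, the repeated iget with unchanged input, and the execution after myData was overwritten.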
One can run the workflow with another parametric file and the results will be placed in a new directory.

    iget dfcDemoWkFlow2.run -
    Workflow Executed Successfully at 2012-9-20 11h:31m
    ils -l
    /dfctest/home/rodsAdmin/workflow/dfcDemoWkFlow:
      rodsAdmin  mssoSt  demoResc  33554412  2012-09-20.11:26  &  dfcDemoWkFlow.run
      rodsAdmin  mssoSt  demoResc       643  2012-09-20.11:26  &  dfcDemoWkFlow.mpf
      rodsAdmin  mssoSt  demoResc  33554412  2012-09-20.11:27  &  dfcDemoWkFlow2.run
      rodsAdmin  mssoSt  demoResc       647  2012-09-20.11:27  &  dfcDemoWkFlow2.mpf
      rodsAdmin  mssoSt  demoResc        21  2012-09-20.11:29  &  myData
      C- /dfctest/home/rodsAdmin/workflow/dfcDemoWkFlow/dfcDemoWkFlow.runDir   mssoStructFile
      C- /dfctest/home/rodsAdmin/workflow/dfcDemoWkFlow/dfcDemoWkFlow.runDir0  mssoStructFile
      C- /dfctest/home/rodsAdmin/workflow/dfcDemoWkFlow/dfcDemoWkFlow2.runDir  mssoStructFile
    ils -l dfcDemoWkFlow2.runDir
    /dfctest/home/rodsAdmin/workflow/dfcDemoWkFlow/dfcDemoWkFlow2.runDir:
      rodsAdmin  mssoSt  demoResc       20  2012-09-20.11:31  &  myOutFile3
      rodsAdmin  mssoSt  demoResc       20  2012-09-20.11:31  &  myOutFile4
      rodsAdmin  mssoSt  demoResc  1181588  2012-09-20.11:31  &  photo.JPG
      rodsAdmin  mssoSt  demoResc       21  2012-09-20.11:31  &  myData
      rodsAdmin  mssoSt  demoResc       99  2012-09-20.11:31  &  myWorkFlow
      rodsAdmin  mssoSt  demoResc       52  2012-09-20.11:31  &  stdout

Note that the names of the output files are different in the second run, as the names were changed in dfcDemoWkFlow2.mpf.
5.7 Federation - periodically copy data (Policy 20)
A policy for copying data between two federated data grids was provided in section 4.7.3. The policy can be turned into a periodically executed rule by adding a delay command that executes the policy every week. This rule takes all files in a "stage" directory on the first data grid, copies them to an "Archive" directory on the second data grid, and deletes the files from the first data grid. The rule also logs all of the actions and writes the log to a directory in the second data grid.

The rule uses the policy functions:
    checkCollInput
    checkRescInput
    createLogFile
    findZoneHostName
    isColl

The input variables are:
    *Dest      a collection name
    *DestZone  the destination zone
    *Res       a storage resource
    *Src       a collection name

The session variables are:
    $rodsZoneClient
    $userNameClient

The policy uses persistent state information:
    COLL_ID
    COLL_NAME
    DATA_CHECKSUM
    DATA_NAME
    RESC_ID
    RESC_NAME
    ZONE_CONNECTION
    ZONE_NAME

The operations that are performed are:
    delay
    fail
    foreach
    if
    msiCollCreate
    msiDataObjChksum
    msiDataObjCopy
    msiDataObjCreate
    msiGetSystemTime
    msiSetACL
    msiSplitPathByKey
    remote
    select
    strlen
    substr
    writeLine

The rule is available at http://github.com/DICE-UNC/policy-workbook/odum-stage-ag.r
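The copy, verify, delete, and log sequence described for this policy can be illustrated outside the data grid. The Python sketch below is not the rule itself: it uses local directories as stand-ins for the "stage" and "Archive" collections, SHA-256 as a stand-in for the grid checksum, and hypothetical function names. The weekly scheduling done by the delay command is not modeled.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def checksum(path):
    """SHA-256 of a file; stands in for the checksum recorded by the data grid."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def archive_staged_files(stage, archive, log_file):
    """Copy every file in stage to archive, verify it, then delete the source."""
    archive.mkdir(parents=True, exist_ok=True)
    log = []
    for src in sorted(p for p in stage.iterdir() if p.is_file()):
        dest = archive / src.name
        shutil.copy2(src, dest)
        if checksum(src) == checksum(dest):
            src.unlink()  # remove the staged copy only after verification
            log.append("archived " + src.name)
        else:
            log.append("checksum mismatch, kept " + src.name)
    log_file.write_text("\n".join(log) + "\n")
    return log


# Demonstration on throwaway local directories.
with tempfile.TemporaryDirectory() as tmp:
    stage = Path(tmp, "stage")
    stage.mkdir()
    (stage / "a.dat").write_text("alpha")
    (stage / "b.dat").write_text("beta")
    actions = archive_staged_files(stage, Path(tmp, "Archive"), Path(tmp, "copy.log"))
    print(actions)
```

Deleting the source only after the checksums agree means an interrupted or corrupted copy leaves the staged file in place for the next scheduled run.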
5.8 De-identification of Data (Policy 25)
This is crucial for all repositories in all fields when human subjects data are involved. Information related to addresses, social security numbers, and credit cards has to be identified and removed. The identification of personally identifiable data within submitted digital objects may be part of a user submission agreement. The ability to automate the detection is essential when researchers submit material.

5.8.1 BitCurator based processing
The BitCurator project brings a series of open source digital forensics tools and techniques to collecting institutions, to preserve their born-digital collections [6]. iRODS (integrated Rule-Oriented Data System) is a data-grid software system, where users can build sharable collections from data distributed across file systems and tape archives [9]. This project integrates the two technologies, allowing a user of iRODS to run the BitCurator tools in an iRODS environment and copy the resulting reports into the iRODS grid. This document lists the BitCurator tools that are integrated into iRODS and gives an overview of each tool along with a description of how to use it. The tools are run on an iRODS server, requiring an installation by the data grid administrator.

The prerequisite for running the BitCurator tools on a media device or any set of files is to use the tool "Guymager" (http://guymager.sourceforge.net/) to generate an image in the .aff or .E01 format.
5.8.1.1 Generate Digital Forensics XML file
This utility uses the BitCurator Fiwalk tool, takes an image in the .aff or E01 format, and generates an XML file. As per [7], "Digital Forensics XML (or DFXML) is a metadata schema designed to facilitate the sharing of structured information produced by forensic tools. DFXML is an attempt to standardize abstractions by providing a formalized language for describing forensic processes". Refer to [7] for more details. The command to be executed is located in the directory irods/server/bin/cmd/fiwalk. This rule invokes the Fiwalk tool to generate the XML output of the given disk image.

Command Structure:
    irule -F odum-bcGenerateXml.r "*outXmlFile='/Path/to/xmlfile'" "*image='/path/to/image.aff'"

The input variables are:
    *image       a file path name
    *outXmlFile  a file path name

The session variables are:
    $userNameClient

The policy uses persistent state information:
    COLL_NAME
    DATA_NAME
    DATA_PATH
    DATA_RESC_NAME
    RESC_LOC

The operations that are performed are:
    errorcode
    errormsg
    execCmdArg
    fail
    foreach
    if
    msiDataObjPut
    msiExecCmd
    msiGetStderrInExecCmdOut
    msiGetStdoutInExecCmdOut
    msiSplitPath
    remote
    select
    time
    writeLine

The rule is available at http://github.com/DICE-UNC/policy-workbook/odum-bcGenerateFiwalkRule.r

Command examples:
1. irule -F odum-bcGenerateFiwalkRule.r
   Default parameters can be modified by changing the following line with appropriate values:
   INPUT *outXmlFile="/AstroZone/home/pixel/bcfiles/xmlfile", *image="/AstroZone/home/pixel/bcfiles/charlie-workusb-2009-12-11.aff"

2. irule -F odum-bcGenerateFiwalkRule.r "*outXmlFile='/home/xmlfile'" "*image='/home/test.aff'"
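When these rules are driven from a script, the quoting of the "*name='value'" overrides is easy to get wrong. The following Python sketch assembles the argument list in the form used by the command examples; the helper name irule_command is hypothetical, and the sketch only builds the arguments rather than invoking irule.

```python
def irule_command(rule_file, **variables):
    """Build the argv list for: irule -F rule_file "*name='value'" ..."""
    argv = ["irule", "-F", rule_file]
    for name, value in variables.items():
        if "'" in value:
            # Escaping of embedded quotes is outside the scope of this sketch.
            raise ValueError("single quotes are not handled in this sketch")
        argv.append("*{}='{}'".format(name, value))
    return argv


cmd = irule_command(
    "odum-bcGenerateFiwalkRule.r",
    outXmlFile="/home/xmlfile",
    image="/home/test.aff",
)
for arg in cmd:
    print(arg)
```

On a host with the icommands installed, a list built this way could be handed to subprocess.run without going through a shell, which avoids a second layer of quoting.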
Files:
• Local File System:
    The following file resides on the Local File System:
    $iRODS/server/bin/cmd/fiwalk
• iRODS Grid:
    Executing this rule creates the following file on the grid:
    $iRODS_grid/<xmlfile>

Implementation notes:
The fiwalk tool, an executable file, is copied to the iRODS/server/bin/cmd directory:
    cp /usr/local/bin/fiwalk iRODS/server/bin/cmd/fiwalk
5.8.1.2 Bulk Extractor
"bulk_extractor is a computer forensics tool that scans a disk image, a file, or a directory of files and extracts useful information without parsing the file system or file system structures. The results can be easily inspected, parsed, or processed with automated tools." [8] This tool takes the disk image (the .aff file) as an input and generates an output directory in the specified location, containing a text file for each of the features located in the input image. For more information on Bulk Extractor scanners, refer to the following URLs:

    http://www.forensicswiki.org/wiki/Bulk_extractor
    http://wiki.bitcurator.net/index.php?title=Bulk_Extractor_Scanners

The command to be executed is located in directory
    irods/server/bin/cmd/bulk_extractor
The execution command is
    bulk_extractor <image.aff> -o <output directory>

Input Parameter is: Image File path
Output Parameter is: File Path for Feature Files

Command Structure:
    irule -F odum-bcExtractFeatureFilesRule.r "*image='/path/to/image.aff'" "*outFeatDir='/path/to/outdir'"
The input variables are:
    *image       a file path name
    *outFeatDir  a collection name

The session variables are:
    $userNameClient

The policy uses persistent state information:
    COLL_NAME
    DATA_ID
    DATA_NAME
    DATA_PATH
    DATA_RESC_NAME
    RESC_LOC

The operations that are performed are:
    errorcode
    errormsg
    execCmdArg
    fail
    foreach
    if
    msiDataObjPut
    msiExecCmd
    msiGetStderrInExecCmdOut
    msiGetStdoutInExecCmdOut
    remote
    select
    time
    writeLine

The rule is available at http://github.com/DICE-UNC/policy-workbook/odum-bcGenerateXml.r

Command examples:
1. irule -F odum-bcExtractFeatureFiles.r
   Default parameters can be modified by changing the following line:
   INPUT *image="/AstroZone/home/pixel/bcfiles/charlie-work-usb-2009-12-11.aff", *outFeatDir="/AstroZone/home/pixel/bcfiles/BeOutFeatDir"

2. irule -F odum-bcExtractFeatureFiles.r "*image='<image>.aff'" "*outFeatDir='/home/be_feature_dir'"

Files:
• Local File System:
    The following file resides on the Local File System:
    $iRODS/server/bin/cmd/bulk_extractor
• iRODS Grid:
    Executing this rule creates the following directory on the grid:
    $iRODS_grid/be_feature_dir
    The actual list of files within this directory depends on the features identified within the image file. Examples:
    $iRODS_grid/be_feature_dir/domain.txt
    $iRODS_grid/be_feature_dir/telephone.txt
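Feature files such as domain.txt are plain text and can be summarized with a short script once retrieved from the grid. The Python sketch below assumes the usual bulk_extractor layout of "#" comment lines followed by tab-separated offset, feature, and context fields; that layout, the function name read_features, and the sample lines are assumptions for illustration and are not taken from the workbook.

```python
from collections import Counter


def read_features(lines):
    """Yield (offset, feature, context) tuples from feature-file lines."""
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # header and comment lines carry no features
        parts = line.split("\t")
        offset = parts[0]  # kept as text; offsets are not always plain integers
        feature = parts[1] if len(parts) > 1 else ""
        context = parts[2] if len(parts) > 2 else ""
        yield offset, feature, context


# Invented sample lines in the assumed layout.
SAMPLE = [
    "# sample header line\n",
    "1024\texample.org\tsee http://example.org/ for\n",
    "2048\texample.org\tmailto:a@example.org\n",
    "4096\texample.net\thost example.net\n",
]

counts = Counter(feature for _, feature, _ in read_features(SAMPLE))
for feature, n in counts.most_common():
    print(feature, n)
```

A frequency count of this kind gives a quick indication of which domains or telephone numbers dominate an image before the annotated reports are generated.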
Implementation notes:
The following file is copied to the iRODS/server/bin/cmd directory:
    cp /usr/local/bin/bulk_extractor iRODS/server/bin/cmd/bulk_extractor
5.8.1.3 Generate Annotated Files (identify_filenames)
This tool takes the output files generated by bulk_extractor and the disk image file (.aff or E01 format) as the inputs and creates annotated versions of each of the feature files generated by bulk_extractor.

Input Parameters are:
    Image File path
    Bulk_extractor directory

Output Parameter is:
    Output directory annotatedFilesDir to store the annotated files.

Tool:
    identify_filenames --all --imagefile "path/to/imagefile.aff" "Path/to/beFeatDir" "Path/to/outAnnDir"

Command Structure:
    irule -F odum-bcAnnotateBeFiles.r "*image='/path/to/image.aff'" \
        "*beFeatDir='/path/to/beDir'" "*outAnnDir='/path/to/newdir'"
The input variables are:
    *beFeatDir  a collection name
    *image      a file path name
    *outAnnDir  a collection name

The session variables are:
    $userNameClient

The policy uses persistent state information:
    COLL_NAME
    DATA_NAME
    DATA_PATH
    DATA_RESC_NAME
    RESC_LOC

The operations that are performed are:
    break
    errorcode
    errormsg
    execCmdArg
    fail
    foreach
    if
    msiDataObjPut
    msiExecCmd
    msiGetStderrInExecCmdOut
    msiGetStdoutInExecCmdOut
    msiSplitPath
    remote
    select
    split
    time
    writeLine

The rule is available at http://github.com/DICE-UNC/policy-workbook/odum-bcAnnotateBeFiles.r

Command examples:
1. irule -F odum-bcAnnotateBeFiles.r
   The default parameters can be modified by changing the following line appropriately:
   INPUT *image="/AstroZone/home/pixel/bcfiles/charlie-work-usb-2009-12-11.aff", *beFeatDir="/AstroZone/home/pixel/bcfiles/beFeatDir", *outAnnDir="/AstroZone/home/pixel/bcfiles/outAnnDir"

2. irule -F odum-bcAnnotateBeFiles.r "*image='/home/test.aff'" "*beFeatDir='/home/beDir'" "*outAnnDir='/home/annotated_dir'"

Files:
• Local File System:
    The following file resides on the Local File System:
    $iRODS/server/bin/cmd/identify_filenames
• iRODS Grid:
    Executing this rule creates the following directory on the grid:
    $iRODS_grid/annotated_dir
    The actual list of files within this directory depends on the features identified within the image file. Examples:
    $iRODS_grid/annotated_dir/annotated_domain.txt
    $iRODS_grid/annotated_dir/annotated_telephone.txt

Implementation Notes:
The following files are copied to the iRODS/server/bin/cmd directory:
    ~/Research/Tools/bulk_extractor/python/fiwalk.py
    ~/Research/Tools/bulk_extractor/python/dfxml.py
    ~/Research/Tools/bulk_extractor/python/bulk_extractor_reader.py
    ~/Research/Tools/bulk_extractor/python/identify_filenames.py as identify_filenames
5.8.1.4 Generate BitCurator Reports
This tool takes the XML output of the Fiwalk tool and the annotated files created by identify_filenames as the inputs and produces various reports in Excel and PDF formats in the specified output directory. The Python script is located in irods/server/bin/cmd/bc_generate_reports.

Input Parameters are:
    Annotated Files Directory (generated by the rule rulemsiBcAnnotateBeFiles.r)
    XML file generated by the fiwalk tool (using the rule rulemsiBcGenerateXml.r)
    Configuration file

Output Parameter is:
    Output directory newBcReportsDir where the reports are generated.

Tool:
    bc_generate_reports --fiwalk_xmlfile </path/to/xmlfile/> --annotated_dir </path/to/annotatedDir/> --outdir </path/to/outdir/> --conf </path/to/configfile/>

Command Structure:
    irule -F odum-bcGenerateReportsRule.r "*fiwalkXmlFile='/Path/To/Xmlfile'" "*annotatedDir='/Path/To/annotated_directory'" "*outReportsDir='/Path/To/output_Reports_directory'" "*conf='/Path/To/Config_file'"
The input variables are:
    *annotatedDir   a collection name
    *conf           a file path name
    *fiwalkXmlFile  a file path name
    *outReportsDir  a collection name

The session variables are:
    $userNameClient

The policy uses persistent state information:
    COLL_NAME
    DATA_NAME
    DATA_PATH
    DATA_RESC_NAME
    RESC_LOC

The operations that are performed are:
    break
    errorcode
    errormsg
    execCmdArg
    fail
    foreach
    if
    msiDataObjPut
    msiExecCmd
    msiGetStderrInExecCmdOut
    msiGetStdoutInExecCmdOut
    msiSplitPath
    remote
    select
    split
    time
    writeLine

The rule is available at http://github.com/DICE-UNC/policy-workbook/odum-bcGenerateReportsRule.r
Command examples:
1. irule -F odum-bcGenerateReportRules.r
   The default parameters can be modified by changing the following line with appropriate parameters:
   INPUT *fiwalkXmlFile="/AstroZone/home/pixel/bcfiles/bcTestFiwalkXmlfile.xml", *annotatedDir="/AstroZone/home/pixel/bcfiles/bcTestBeAnnDir", *outReportsDir="/AstroZone/home/pixel/bcfiles/outReportsDir", *conf="/AstroZone/home/pixel/bcfiles/bcTestConfigFile"

2. irule -F odum-bcGenerateReportRules.r "*fiwalkXmlFile='/home/xmlfile'" "*annotatedDir='/home/annotated_directory'" "*outReportsDir='/grid/output_directory'" "*conf='/home/config_file'"

Files:
• Local File System:
    The following file resides on the Local File System:
    $iRODS/server/bin/cmd/bc_generate_reports
• iRODS Grid:
    Executing this rule creates the following directories and files on the grid:
    $iRODS_grid/outReportsDir:
    $iRODS_grid/outReportsDir/BeReport.pdf
    $iRODS_grid/outReportsDir/FiwalkDeletedFiles.pdf
    $iRODS_grid/outReportsDir/FiwalkReport.pdf
    $iRODS_grid/outReportsDir/bcTestFiwalkXmlfile.xml.xlsx
    $iRODS_grid/outReportsDir/bc_format_bargraph.pdf
    $iRODS_grid/outReportsDir/format_table.pdf
    $iRODS_grid/outReportsDir/bcfiles/outReportsDir/features
    The files under the features directory depend on the image. Examples are:
    $iRODS_grid/outReportsDir/bcfiles/outReportsDir/features/domain.xlsx
    $iRODS_grid/outReportsDir/bcfiles/outReportsDir/features/telephone.xlsx
    $iRODS_grid/outReportsDir/bcfiles/outReportsDir/features/domain.pdf
    $iRODS_grid/outReportsDir/bcfiles/outReportsDir/features/telephone.pdf

Implementation notes:
The following files are copied to the iRODS/server/bin/cmd directory:
    $BitCurator/python/bc_reports_tab.py as bc_reports_tab
    $BitCurator/python/generate_report.py as bc_generate_reports
    $BitCurator/python/bc_utils.py
    $BitCurator/python/bc_config.py
    $BitCurator/python/bc_pdf.py
    $BitCurator/python/bc_graph.py
    $BitCurator/python/bc_regress.py
    $BitCurator/python/bc_genrep_dfxml.py
    $BitCurator/python/bc_genrep_text.py
    $BitCurator/python/bc_genrep_xls.py
    $BitCurator/python/bc_gen_feature_rep_xls.py
    $BitCurator/python/bc_config_file
5.8.1.5 BitCurator GUI
BitCurator supports a Graphical User Interface with which users can launch the tools explained above. A rule has been written to launch this GUI, but more work needs to be done to make the GUI appear on the client screen rather than on the server.

No input variables are used.
No session variables are used.
The policy uses no persistent state information.

The operations that are performed are:
    errorcode
    errormsg
    if
    msiExecCmd
    msiGetStderrInExecCmdOut
    msiGetStdoutInExecCmdOut
    writeLine

The rule is available at http://github.com/DICE-UNC/policy-workbook/odum-bcGenerateReportsGuiRule.r

Command example:
    irule -F odum-bcGenerateReportsGuiRule.r
5.9 Unique Identifiers for Data Sets (Policy 26)
Multiple external repositories require the generation of a unique data ID. An example is DataONE, which uses the Handle system to assign a unique identifier to a data set. Not all repositories use the same type of identifier. For instance, the California Digital Library uses an ARK identifier.
5.9.1 Assigning a Handle to a File
The Handle system can use a local handle registry for assigning identifiers to files. The local handle registry, in turn, is assigned a unique identifier in a global handle system. The following rule creates a handle and registers it in the DFC handle server (the registration of the handle in our handle server indicates it is available for access from DataONE).

The policy implements a constraint:
    Applied at the acPostProcForPut policy enforcement point
    Restricted to collections like "nexrad"

The policy uses session variables:
    $userNameClient

The operations that are performed are:
    msiExecCmd
    msiGetStdoutInExecCmdOut
    msiWriteRodsLog

The rule is available at http://github.com/DICE-UNC/policy-workbook/acPostProcForPut-handle-nexrad.re.

The rule executes a shell script:

    #!/bin/bash
    if [ "$#" -ne 2 ]; then
        echo "Usage: create_handle <data object id> <data object url>"
        exit 1
    fi
    OID="$1"
    URL="$2"
    HANDLE=$(java -classpath ./irods-hs-tools.jar org.irods.dfc.CreateHandle ./admpriv.bin "$URL" "$OID")
    echo "$HANDLE"
    exit 0;
5.9.2 Registering files in DataONE registry
DataONE web services are used to automate registry of an iRODS collection in the DataONE registry. When the DataONE web service asks for a list of DataONE registered iRODS data objects, the member node web service responds by retrieving the list of objects that have been registered in the handle server. The harvesting is done periodically, with the result that an iRODS data collection can be discovered and accessed through the DataONE services.
5.10 Authentication identity management (Policy 27)
The iRODS data grid provides support for pluggable authentication environments. Each plug-in can also support pre- and post-policy enforcement points. A standard example is the use of an external certificate authority for recognizing users. Any certificate from that certificate authority is honored, and a corresponding user account is set up in the data grid. Policies control what the new users are allowed to do. This capability was implemented for the Australian Research Collaboration Service. The iRODS command line tools (icommands) and GridFTP interface can use GSI (Grid Security Infrastructure) authentication, which relies on limited lifetime proxy certificates. In addition, your GSI certificate must be mapped to your ARCS Data Fabric account. This is done automatically for ARCS SLCS certificates, and you can add additional mappings for other GSI certificates. A certificate can also be acquired from CILogon through the InCommon infrastructure. An iRODS data grid account can be set up with authentication based on the GSI certificate.

5.10.1 Verify access controls on each file
The data grid manages access control lists for each file. It is possible to query the iCAT catalog to check whether access permission has been given to individuals who should no longer have access. This typically happens when an administrator retires, or the access control policies for a collection have changed. The rule listed in section 4.1.5 identifies access controls on a file in a collection for a specific person.

5.11 Automated Data Reviews (Policy 28)
It is possible to review any of the state information that is stored for a file. A report can be generated which lists all of the non-compliant files within a collection.
5.11.1 Metadata Review
This policy compares the metadata schema that is assigned to a collection with the metadata attributes set on each file within the collection. The collection metadata schema is defined by setting a metadata attribute on the collection with an attribute value of "null".

No input variables are used.

The session variables are:
    $rodsZoneClient
    $userNameClient

The policy uses persistent state information:
    COLL_NAME
    DATA_NAME
    META_COLL_ATTR_NAME
    META_COLL_ATTR_VALUE
    META_DATA_ATTR_NAME
    META_DATA_ATTR_UNITS

The operations that are performed are:
    break
    foreach
    if
    select
    writeLine

The rule is available at http://github.com/DICE-UNC/policy-workbook/odum-listmetadata.r
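The comparison at the heart of this review can be expressed compactly. The Python sketch below is an illustration and not the rule: it uses in-memory dictionaries in place of iCAT query results, and the function names and sample attribute names are hypothetical. It follows the convention stated above that the schema consists of the collection attributes whose value is "null".

```python
def required_attributes(collection_avus):
    """Schema = names of collection attributes whose value is "null"."""
    return sorted(name for name, value in collection_avus if value == "null")


def review_metadata(required, files):
    """Return {file name: missing attribute names} for non-compliant files."""
    report = {}
    for name, attributes in files.items():
        missing = sorted(set(required) - set(attributes))
        if missing:
            report[name] = missing
    return report


# Invented sample data standing in for catalog query results.
collection_avus = [("creator", "null"), ("title", "null"), ("project", "DFC")]
files = {
    "survey1.csv": ["creator", "title"],
    "survey2.csv": ["title"],
    "notes.txt": [],
}

schema = required_attributes(collection_avus)
for file_name, missing in sorted(review_metadata(schema, files).items()):
    print(file_name, "is missing", ", ".join(missing))
```

Files that carry every required attribute do not appear in the report, so an empty result means the collection is compliant.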
5.12 Mapping metadata across systems (Policy 29)
The HIVE (Helping Interdisciplinary Vocabulary Engineering) technology is used to integrate vocabularies encoded with the Simple Knowledge Organization System (SKOS), a World Wide Web Consortium (W3C) standard. HIVE is a Linked Open Data (LOD) technology aligning with Linked Open Vocabularies (LOV) activities. The HIVE approach and technologies promote interoperability among data repositories, libraries, and archives, allowing scholarly works to be easily and quickly indexed across multiple disciplines. The HIVE system can be accessed from the iRODS Data Grid using an updated Curl micro-service. A REST service is available that can query for http:// URIs representing concepts in a SKOS vocabulary that is stored in the HIVE system. An example XML representation of a 'concept' in the UAT vocabulary for a given URI is:

    <hiveConcept uri="http://purl.org/astronomy/uat#T100">
      <label>Astroparticle physics</label>
      <altLabel>Particle astrophysics</altLabel>
      <broader uri="http://purl.org/astronomy/uat#T828">
        <label>"Interdisciplinary astronomy"</label></broader>
      <narrower uri="http://purl.org/astronomy/uat#T635">
        <label>"Gamma rays"</label></narrower>
      <narrower uri="http://purl.org/astronomy/uat#T351">
        <label>"Cosmological neutrinos"</label></narrower>
      <narrower uri="http://purl.org/astronomy/uat#T689">
        <label>"Gravitational waves"</label></narrower>
      <related uri="http://purl.org/astronomy/uat#T372">
        <label>"Dark matter"</label></related>
      <vocabName>uat</vocabName>
    </hiveConcept>

These URIs may be applied to iRODS data objects using the AVU mechanism, where the AVU attribute is the vocabulary URI, and the AVU unit is a special marker of the form 'iRODSUserTagging:HIVE:VocabularyTerm' that indicates that the AVU is a resolvable URI.
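A concept record of this shape can be read with a standard XML parser and turned into the AVU convention just described. The Python sketch below is illustrative: it embeds an abbreviated, well-formed copy of the concept record, the function names are hypothetical, and because the text does not say what the AVU value holds, the value is left as a caller-supplied argument.

```python
import xml.etree.ElementTree as ET

# Abbreviated copy of the concept record, with attribute values quoted.
CONCEPT_XML = """<hiveConcept uri="http://purl.org/astronomy/uat#T100">
  <label>Astroparticle physics</label>
  <broader uri="http://purl.org/astronomy/uat#T828"><label>Interdisciplinary astronomy</label></broader>
  <narrower uri="http://purl.org/astronomy/uat#T635"><label>Gamma rays</label></narrower>
  <narrower uri="http://purl.org/astronomy/uat#T689"><label>Gravitational waves</label></narrower>
  <vocabName>uat</vocabName>
</hiveConcept>"""

HIVE_UNIT = "iRODSUserTagging:HIVE:VocabularyTerm"


def concept_summary(xml_text):
    """Extract the URI, label, vocabulary, and narrower-term URIs."""
    root = ET.fromstring(xml_text)
    return {
        "uri": root.get("uri"),
        "label": root.findtext("label"),
        "vocabulary": root.findtext("vocabName"),
        "narrower": [n.get("uri") for n in root.findall("narrower")],
    }


def hive_avu(concept_uri, value):
    """Build an AVU with the URI as attribute and the HIVE marker as unit."""
    return {"attribute": concept_uri, "value": value, "unit": HIVE_UNIT}


summary = concept_summary(CONCEPT_XML)
print(summary["label"], summary["uri"])
print(hive_avu(summary["uri"], summary["label"]))
```

Using the concept label as the AVU value in the last line is only one possible choice made for the demonstration.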
5.12.1 Validate HIVE vocabularies
An example validation rule utilizes the REST service to iterate over iRODS collections, validating the terms as being valid SKOS references, and generating a report on invalid terms.

No input variables are used.

The session variables are:
    $rodsZoneClient
    $userNameClient

The policy uses persistent state information:
    COLL_NAME

The operations that are performed are:
    foreach
    if
    msiCurlGetStr
    msiCurlUrlEncodeString
    select
    writeLine

The rule is available at http://github.com/DICE-UNC/policy-workbook/odum-validateOntologies.r

Here is an example output when two data objects are annotated, one with an invalid term:

    test1@ubuntu:~/workspace/rule_workbench$ irule -F validate_data_object_ontologies.r
    Metadata validation report
    /fedZone1/home/rods/hive/libmsiCurlGetObj.cpp has uri http://purl.org/astronomy/uat#TT888 that is not in a valid ontology
5.13 Export Datasets in Multiple Formats (Policy 30)
The motivation for changing the format of a file may be to create a standard representation for preservation, or to create a preferred format for display. The ability to export or make available for download datasets in multiple formats such as Excel, CSV, SPSS, or Stata addresses both future user needs and immediate user needs. In other sciences this would include other formats, but the issue is the same: being able to go in and out of open and proprietary formats to aid preservation.

5.13.1 Polyglot Format Conversion
This policy invokes the NCSA Polyglot service to transform a data format. The original file is replaced with the modified file, and metadata attributes are updated. If an attribute named "ConvertMe" is present on the file, the file is converted. The name of the original file is then added as metadata.

The policy implements a constraint:
    Applied at the acPostProcForModifyAVUMetadata policy enforcement point
    Checks that the attribute name is "ConvertMe"

The input variables are:
    *Option    not used
    *ItemType  not used
    *ItemName  File or collection name
    *Aname     Attribute name
    *Avalue    Attribute value
    *Aunit     Attribute units

The policy functions are:
    deleteAVUMetadata
    modAVUMetadata

The operations that are performed are:
    if
    irods_curl_get

The rule is available at http://github.com/DICE-UNC/policy-workbook/acPostProcForModifyAVUMetadata.re.
5.14 Check for viruses (Policy 31)
All files in a staging area can be checked for the presence of a virus. When the check is complete, the files can then be moved into a collection. This uses the clamscan virus check routine, which is run as an external executable. The clamscan program must be installed on the iRODS server where the staging area is located, in the /usr/bin directory.

5.14.1 Scan files and flag infected objects
This rule runs the clamscan script on an external resource, which checks for the presence of viruses. Each file is flagged with a metadata attribute to record the status of the virus check. The clamscan python script is:
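    #!/usr/bin/python
    # Wrapper that runs clamscan on the given arguments and relays its output.
    import subprocess
    import sys

    # Merge stderr into stdout so the calling rule sees all scanner messages.
    proc = subprocess.Popen(['/usr/local/bin/clamscan'] + sys.argv[1:],
                            stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
    output = proc.communicate()[0]
    # Under Python 3 the output is bytes, so write to the binary stream if present.
    out = getattr(sys.stdout, 'buffer', sys.stdout)
    out.write(output)
    out.flush()
    # Pass the scanner's return code back to the caller as a non-negative value.
    sys.exit(abs(proc.returncode))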
The controlling policy can be invoked interactively, or added to the rule base and invoked after each file load.

The policy implements a constraint:
    Applied at the acScanFileAndFlagObject policy enforcement point

The input parameters are:
    *Objpath   iRODS file that is scanned
    *FilePath  Physical location of iRODS file
    *Resource  Resource holding physical copy of iRODS file

The operations that are performed are:
    if
    msiAddKeyVal
    msiAssociateKeyValuePairsToObj
    msiExecCmd
    msiGetStdoutInExecCmdOut
    msiGetSystemTime

The rule is available at http://github.com/DICE-UNC/policy-workbook/acScanFileAndFlagObject.re.
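The flagging step depends on interpreting the scanner's exit status. The Python sketch below maps the return codes documented for clamscan (0 for no virus found, 1 for virus found, 2 for an error) to a pair of metadata key-value entries. The attribute names VIRUS_SCAN_STATUS and VIRUS_SCAN_TIME and the function names are hypothetical and are not necessarily those used by the rule.

```python
CLAMSCAN_STATUS = {0: "clean", 1: "infected", 2: "error"}


def virus_flag(return_code, checked_at):
    """Translate a clamscan exit status into metadata key-value pairs.

    Attribute names are placeholders chosen for this sketch.
    """
    status = CLAMSCAN_STATUS.get(return_code, "error")
    return {"VIRUS_SCAN_STATUS": status, "VIRUS_SCAN_TIME": checked_at}


def may_leave_staging(flag):
    """Only files that scanned clean should be moved into a collection."""
    return flag["VIRUS_SCAN_STATUS"] == "clean"


for code in (0, 1, 2):
    flag = virus_flag(code, "2015-08-25.10:00:00")
    print(code, flag["VIRUS_SCAN_STATUS"], may_leave_staging(flag))
```

Treating any unrecognized status as an error keeps a failed scan from being mistaken for a clean one.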
5.15 Rule set management (Policy 32)
The iRODS data grid relies upon a distributed rule engine and distributed rule bases to implement policies. If a policy is changed, for consistency the revised rule base needs to be installed at each server location.

5.15.1 Deploy rule sets
This rule identifies the servers, and uploads a new version of the rule base to each server. The micro-services used by this rule are available at https://github.com/DICE-UNC/irods_rule_admin_micorservices

The input variables are:
    *ruleBaseName  list("core")
    *targets       list("localhost")

No session variables are used.
The policy uses no persistent state information.

The operations that are performed are:
    break
    errorcode
    failmsg
    foreach
    if
    msiChksumRuleSet
    msiMvRuleSet
    msiReadRuleSet
    msiRmRuleSet
    msiRuleSetExists
    remote
    while
    writeLine

The rule is available at http://github.com/DICE-UNC/policy-workbook/odum-copyRule.r
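Deciding which servers need the revised rule base comes down to comparing checksums. The Python sketch below models that decision locally; it is not the rule and does not call the rule-set micro-services. The function names, the use of SHA-256, and the sample server names are assumptions made for illustration.

```python
import hashlib


def checksum(text):
    """Checksum of a rule base held as text (SHA-256 chosen for this sketch)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def plan_deployment(new_rule_base, installed):
    """Return the servers whose rule base is missing or differs from the new one.

    installed maps server name -> current rule base text, or None if absent.
    """
    target = checksum(new_rule_base)
    return sorted(
        server
        for server, text in installed.items()
        if text is None or checksum(text) != target
    )


new_core = "acPostProcForPut { writeLine('serverLog', 'put'); }\n"
installed = {
    "localhost": new_core,                 # already up to date
    "server2.example.org": "old rules\n",  # out of date
    "server3.example.org": None,           # rule base not present
}
print(plan_deployment(new_core, installed))
```

Skipping servers that already hold an identical rule base keeps a repeated deployment from rewriting files unnecessarily.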
5.16 Parse event trail for all persons accessing a collection (Policy 33)
The DFC DataBook system provides a way to record information about events that occur on files within the data grid. This policy is implemented in the rule base, such that events are automatically tracked across all clients. The policies are available in the file iRODS/server/config/reConfigs/databook.re. The policy set modifies each of the policy enforcement point rules to add event tracking. The attributes that are tracked are:
    ATTR_ID
    ATTR_HAS_VERSION
    ATTR_PREVIEW
    ATTR_THUMB_PREVIEW
    ATTR_CONTRIBUTOR
    ATTR_RELATED
    ATTR_REPLACED_BY
    ATTR_REPLACES
    ATTR_TITLE
    ATTR_DESCRIPTION

The rule is available at https://github.com/DICE-UNC/policy-workbook/blob/master/databook.re
6 Protected Data Policy Sets
The UNC requirements for management of protected data sets have been analyzed for development of computer actionable policies that can automate management tasks. The data management requirements are abstracted from the document,
https://www.med.unc.edu/security/hipaa/documents/ADMIN0082%20Info%20Security.pdf
The requirements are listed in Appendix E. Each requirement has been evaluated for the feasibility of creating a computer actionable policy that automates enforcement. Policies are also defined to verify that each requirement has been enforced. A deep archive is proposed for managing data that contains “Protected” information at UNC. No access is permitted from the external world to the deep archive. Instead, processes running within the deep archive pull data records from a staging area. On the staging area, the data sets are checked for “Protected” information, encrypted, and stored into the deep archive, as shown in Figure 1.
Figure 1. Federated data grids for a deep archive
The “Protected” records may also be archived at an “off-site” location such as the Texas Advanced Computing Center to minimize risk of data loss. The iRODS data grid authenticates every user, authorizes every operation, manages interactions with the storage systems, and creates an event database detailing every interaction. Policies can parse the event database to verify compliance with policies over time, track unauthorized access attempts, and track data corruption events. Group permissions are defined for access to the data to simplify user management. The tasks for protected data are listed in Table 2.
Table 2. Protected data tasks requiring policy control
1   Check for presence of PII on ingestion
2   Check for viruses on ingestion
3   Check passwords for required attributes
4   Encrypt data on ingestion
5   Encrypt data transfers
6   Federation - control data copies (access control)
7   Federation - manage remote data grid interactions (update rule base)
8   Federation - periodically copy data
9   Federation - manage data retrieval (update access controls)
10  Generate checksum on ingestion
11  Generate report of corrections to data sets or access controls
12  Generate report for cost (time) required to audit events
13  Generate report of types of protected assets present within a collection
14  Generate report of all security and corruption events
15  Generate report of the policies that are applied to the collections
16  List all storage systems being used
17  List persons who can access a collection
18  List staff by position and required training courses
19  List versions of technology that are being used
20  Maintain document on independent assessment of software
21  Maintain log of all software changes, OS upgrades
22  Maintain log of disclosures
23  Maintain password history on user name
24  Parse event trail for all accessed systems
25  Parse event trail for all persons accessing collection
26  Parse event trail for all unsuccessful attempts to access data
27  Parse event trail for changes to policies
28  Parse event trail for inactivity
29  Parse event trail for updates to rule bases
30  Parse event trail to correlate data accesses with client actions
31  Provide test environment to verify policies on new systems
For each listed task, we demonstrate an iRODS policy that implements the associated data management functions.
6.1 Check for presence of PII on ingestion (Policy 34)
The BitCurator technology is able to parse binary images for personally identifiable information such as credit card numbers and social security numbers. The current implementation runs the bitcurator executable on the storage system holding the data. The BitCurator technology is described in section 5.8.
6.2 Check for viruses on ingestion (Policy 31)
All files in a staging area can be checked for the presence of a virus. When the check is complete, the files can then be moved into a collection. This uses the clamscan virus check routine, which is run as an external executable. The clamscan program must be installed in the /usr/bin directory on the iRODS server where the staging area is located.
Table 2 continued. Protected data tasks requiring policy control
32  Provide test system for evaluating a recovery procedure
33  Provide training courses for users
34  Replicate data sets on ingestion
35  Replicate iCAT periodically
36  Set access approval flag
37  Set access controls
38  Set access restriction until approval flag is set
39  Set approval flag per collection for enabling bulk download
40  Set asset protection classifier for data sets based on type of PII
41  Set flag for whether tickets can be used on files in a collection
42  Set lockout flag and period on user name - counting number of tries
43  Set password update flag on user name
44  Set retention period for data reviews
45  Set retention period on ingestion
46  Track systems by type (server, laptop, router, ...)
47  Verify approval flags within a collection
48  Verify files have not been corrupted
49  Verify presence of required replicas
50  Verify that no controlled data collections have public or anonymous access
51  Verify that protected assets have been encrypted
6.2.1 Scan files and flag infected objects
The rule for invoking virus detection is listed in Section 5.14.1. The rule runs the clamscan script on an external resource, which checks for the presence of viruses. Each file is flagged with a metadata attribute to record the status of the virus check.
6.2.2 Migrate files that pass the virus check
A query can be made to the catalog to identify files that have passed the virus check. The good files are migrated to the archive, and the virus flag is reset. No input variables are used. No session variables are used. The policy uses persistent state information:
  COLL_NAME
  DATA_NAME
  META_DATA_ATTR_NAME
  META_DATA_ATTR_VALUE
The operations that are performed are:
  foreach
  if
  msiAssociateKeyValuePairsToObj
  msiDataObjRename
  msiRemoveKeyValuePairsFromObj
  msiString2KeyValPair
  select
  writeLine
The rule is available at http://github.com/DICE-UNC/policy-workbook/hipaa-migrate-files.r
6.3 Check passwords for required attributes (Policy 35)
The policy enforcement point acCheckPasswordStrength, added after iRODS 3.2, checks password strength and is called when the admin or user sets a password. By default, this is a no-op, but the simple rule referenced below can be used to enforce a minimal password length. The password may also be required to contain at least one number. This check may be done by an external authentication manager instead of within iRODS. The policy implements a constraint:
  Applied at the acCheckPasswordStrength policy enforcement point
The input parameters are:
  *password   Password
The operations that are performed are:
  fail
  if
  strlen
  msiSplitPathByKey
  succeed
  writeLine
The rule is available at http://github.com/DICE-UNC/policy-workbook/acCheckPasswordStrength.re.
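The two checks described for this policy (a minimal length and at least one number) can be sketched outside of iRODS as follows. This is an illustrative Python sketch, not the acCheckPasswordStrength rule itself; the function name and the threshold of 8 characters are assumptions, since the workbook does not state a value.

```python
import re

MIN_LENGTH = 8  # assumed threshold; the workbook does not state a value


def check_password_strength(password, min_length=MIN_LENGTH, require_digit=True):
    """Return a list of problems; an empty list means the password passes."""
    problems = []
    if len(password) < min_length:
        problems.append("password shorter than %d characters" % min_length)
    if require_digit and not re.search(r"\d", password):
        problems.append("password contains no digit")
    return problems


if __name__ == "__main__":
    for candidate in ("short1", "longenoughbutnodigit", "longenough42"):
        print(candidate, check_password_strength(candidate))
```

Returning a list of problems rather than a single boolean lets a caller report every failed requirement at once; a rule would fail whenever the list is non-empty.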
6.4 Encrypt data on ingestion (Policy 36)
The iRODS data grid supports SSL encryption on data transfers. The same encryption can be accessed through a micro-service to encrypt data on storage. The example rule automates encryption on files submitted to the collection:
  /UNC-CH/home/HIPAA/Archive
The goal is to maintain data as an encrypted file during transport, as well as within storage. The rule is implemented as a policy that is enforced at the acPostProcForPut policy enforcement point. A flag is set on the file to denote that encryption has been done. The metadata attribute DATA_ENCRYPT value is set to 1. The policy implements a constraint:
  Applied at the acPostProcForPut policy enforcement point
  Checks that the collection is /UNC-CH/home/HIPAA/Archive
The session variables are:
  $objPath
The operations that are performed are:
  fail
  if
  msiAssociateKeyValuePairsToObj
  msiEncrypt
  msiSplitPath
  msiString2KeyValPair
The rule is available at http://github.com/DICE-UNC/policy-workbook/acPostProcForPut-encrypt.re.
6.5 Encrypt data transfers (Policy 37)
The iRODS data grid can be set up to use SSL, and automatically encrypt data transfers. This is a configuration setting that is controlled by environment variables:
  irodsSSLCertificateChainFile (server) - the file containing the server's certificate chain. The certificates must be in PEM format and must be sorted starting with the subject's certificate (actual client or server certificate), followed by intermediate CA certificates if applicable, and ending at the highest level (root) CA.
  irodsSSLCertificateKeyFile (server) - private key corresponding to the server's certificate in the certificate chain file.
  irodsSSLDHParamsFile (server) - the Diffie-Hellman parameter file location.
  irodsSSLVerifyServer (client) - what level of server certificate based authentication to perform. 'none' means not to perform any authentication at all. 'cert' means to verify the certificate validity (i.e. that it was signed by a trusted CA). 'hostname' means to validate the certificate and to verify that the irodsHost's FQDN matches either the common name or one of the subjectAltNames of the certificate. 'hostname' is the default setting.
  irodsSSLCACertificateFile (client) - location of a file of trusted CA certificates in PEM format. Note that the certificates in this file are used in conjunction with the system default trusted certificates.
  irodsSSLCACertificatePath (client) - location of a directory containing CA certificates in PEM format. The files each contain one CA certificate. The files are looked up by the CA subject name hash value, which must hence be available. If more than one CA certificate with the same name hash value exists, the extension must be different (e.g. 9d66eef0.0, 9d66eef0.1 etc). The search is performed in the ordering of the extension number, regardless of other properties of the certificates. Use the 'c_rehash' utility to create the necessary links.
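The three irodsSSLVerifyServer levels can be illustrated by analogy with Python's standard ssl module. The sketch below is not how the iRODS client is implemented; it only maps the client-side variables described above onto an ssl.SSLContext, and the function name client_ssl_context is hypothetical.

```python
import os
import ssl

VALID_LEVELS = ("none", "cert", "hostname")


def client_ssl_context(env=None):
    """Build an SSLContext that mirrors the client-side settings listed above."""
    env = os.environ if env is None else env
    # 'hostname' is the documented default when the variable is not set.
    level = env.get("irodsSSLVerifyServer", "hostname")
    if level not in VALID_LEVELS:
        raise ValueError("unknown irodsSSLVerifyServer value: %r" % level)

    # Start from the system default trusted certificates.
    context = ssl.create_default_context()

    # Extra CA certificates are used in conjunction with the system defaults.
    cafile = env.get("irodsSSLCACertificateFile")
    capath = env.get("irodsSSLCACertificatePath")
    if cafile or capath:
        context.load_verify_locations(cafile=cafile, capath=capath)

    # check_hostname must be switched off before certificate checks are disabled.
    context.check_hostname = (level == "hostname")
    context.verify_mode = ssl.CERT_NONE if level == "none" else ssl.CERT_REQUIRED
    return context


if __name__ == "__main__":
    ctx = client_ssl_context({"irodsSSLVerifyServer": "cert"})
    print(ctx.verify_mode, ctx.check_hostname)
```

Passing the environment as a dictionary keeps the sketch testable without modifying the process environment.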
6.6 Federation - control data copies (Policy 38)
A primary concern is that protected files in a federation retain appropriate access controls. One way to achieve this is to copy the metadata attributes for each file along with the data, and then run the same ACCESS_APPROVAL policies in the federated data grid. This rule copies access controls and metadata attributes for a file. This assumes that equivalent accounts exist in both data grids. This requires upgrades to msiCopyAVUMetadata and msiLoadACLFromDataObj to support a federated data grid. The rule uses the policy functions:
  checkCollInput
  isData
The input variables are:
  *Coll   a relative collection name
  *Zone   a zone name
The session variables are:
  $rodsZoneClient
  $userNameClient
The policy uses persistent state information:
  COLL_ID
  COLL_NAME
  DATA_ACCESS_DATA_ID
  DATA_ACCESS_TYPE
  DATA_ACCESS_USER_ID
  DATA_ID
  DATA_NAME
  META_DATA_ATTR_NAME
  META_DATA_ATTR_UNITS
  META_DATA_ATTR_VALUE
  TOKEN_ID
  TOKEN_NAME
  TOKEN_NAMESPACE
  USER_NAME
  USER_ZONE
The operations that are performed are:
  fail
  foreach
  if
  msiDataObjCopy
  msiDataObjUnlink
  msiSetACL
  msiSetAVU
  select
  writeLine
The rule is available at http://github.com/DICE-UNC/policy-workbook/odum-bcGenerateFiwalkRule.r
6.7 Federation - manage remote data grid interactions (Policy 32)
When two data grids are federated, decisions have to be made about compatibility of the data management policies. If the desire is to have both data grids implement the same policies, then the policies from the UNC grid will need to be loaded into the federated data grid. This is of particular importance for ensuring:
  Access controls
  Retention flags
  Protected information
  Encryption
  Approval flags
6.7.1 Updating rule base across servers
The rule engine in iRODS reads a local copy of the rule base to improve performance. Coordination of the multiple rule bases is needed when policies are updated. This rule set, developed by Chris Smith, stores the rules in the iCAT metadata catalog, extracts rules from the catalog into a file, and then updates each of the server rule bases.
6.7.1.1 Storing rules in the DB from a source file.
This rule is run on the master iCAT. It reads a file to load rules into the iCAT catalog. Once rules are loaded, they can be versioned but not deleted. The input variables are:
  *inFileName   an input file
  *ruleBase     a rule base
No session variables are used. The policy uses no persistent state information.
The operations that are performed are:
  msiAdmInsertRulesFromStructIntoDB
  msiAdmReadRulesFromFileIntoStruct
The rule is available at http://github.com/DICE-UNC/policy-workbook/hipaa-idsStore.r
6.7.1.2 Prime the iCAT's rule base
This rule is run on the master catalog. Rules are retrieved from the iCAT catalog, and written into a file for distribution.
The input variables are:
  *outFileName   a file name
  *rloc          host name
  *ruleBase      a rule base
No session variables are used. The policy uses no persistent state information.
The operations that are performed are:
  if
  msiAdmRetrieveRulesFromDBIntoStruct
  msiAdmWriteRulesFromStructIntoFile
  remote
The rule is available at http://github.com/DICE-UNC/policy-workbook/hipaa-idsApply.r
6.7.1.3 Push rules to resource servers
This rule pushes the rules to all the resource servers. For servers that don't host resources, a separate rule will need to be run at each server to prime the local rule base from the iCAT catalog. The input variables are:
  *outFileName   a file name
  *ruleBase      a rule base
No session variables are used. The policy uses persistent state information:
  RESC_LOC
The operations that are performed are:
  foreach
  if
  msiAdmRetrieveRulesFromDBIntoStruct
  msiAdmWriteRulesFromStructIntoFile
  msiGetContInxFromGenQueryOut
  msiGetMoreRows
  msiExecGenQuery
  msiGetValByKey
  msiMakeGenQuery
  remote
  while
  writeLine
The rule is available at http://github.com/DICE-UNC/policy-workbook/hipaa-idsPush.r
A second approach is to allow the federated data grid to implement a separate set of policies, but restrict file exchange between the data grids to data that does not require protection. This can be controlled by forcing all data exchanges to be done with data that have anonymous access. This restriction is implemented by not allowing any member of the federated data grid to have an account in the UNC data grid. This minimizes the opportunity to give inappropriate access to data within the UNC data grid.
6.8 Federation - Copy Data from staging area (Policy 20)
Files can be staged between two data grids. This rule recursively copies files from a staging area into a second data grid, checks that the files do not already exist in the second data grid, verifies checksums after the copy, and sets access permissions. The rule uses the policy functions:
  checkCollInput
  checkRescInput
  createLogFile
  findZoneHostName
  isColl
The input variables are:
  *Dest       a collection name
  *DestZone   a zone name
  *Owner      a user name
  *Res        a storage resource
  *Src        a collection name
The session variables are:
  $rodsZoneClient
  $userNameClient
The policy uses persistent state information:
  COLL_ID
  COLL_NAME
  DATA_CHECKSUM
  DATA_MODIFY_TIME
  DATA_NAME
  RESC_ID
  RESC_NAME
  ZONE_CONNECTION
  ZONE_NAME
The operations that are performed are:
  fail
  foreach
  if
  msiCollCreate
  msiDataObjChksum
  msiDataObjCopy
  msiDataObjCreate
  msiGetSystemTime
  msiSetACL
  msiSplitPathByKey
  remote
  select
  strlen
  substr
  writeLine
The rule is available at http://github.com/DICE-UNC/policy-workbook/hipaa-stageFederation.r
6.9 Federation - manage data retrieval (Policy 39)
Inappropriate data retrieval from a federation can be controlled by applying the same access controls and policies across the federated data grid. This is necessary because the federated data grid can be accessed directly, independently of the original data grid. If access is done through the original data grid, accounts can be established in the federated data grid to control data retrieval. The accounts reference the original data grid:
  Account name
  UNC-HIPAA#HIPAA
  UNC-HIPAA#public
  UNC-HIPAA#gridAdmin
Access controls can then be applied in the federated data grid for each account in the original data grid. This rule generates a pipe-delimited file of user accounts in the data grid. The rule uses the policy functions:
  checkRescInput
  createLogFile
  findZoneHostName
  isColl
The input variables are:
  *Accounts   a user name
  *Res        a storage resource
The session variables are:
  $rodsZoneClient
  $userNameClient
The policy uses persistent state information:
  COLL_ID
  COLL_NAME
  RESC_ID
  RESC_NAME
  USER_NAME
  USER_TYPE
  ZONE_CONNECTION
  ZONE_NAME
The operations that are performed are:
  fail
  foreach
  if
  msiCollCreate
  msiDataObjClose
  msiDataObjCreate
  msiGetSystemTime
  msiSplitPathByKey
  remote
  select
  writeLine
The rule is available at http://github.com/DICE-UNC/policy-workbook/hipaa-create-accounts.r
This rule reads an Account file to generate new accounts. Note that the account file needs to be copied into the federated data grid. The command must also be run in the federated data grid. The account names are created in the form
  User_name#zone_name
Note that the micro-service msiCreateUserAccountsFromDataObj is used to load the accounts. This micro-service is not yet ported to iRODS version 4.2. The rule uses the policy function:
  checkPathInput
The input variables are:
  *Path   a file path name
The session variables are:
  $rodsZoneClient
The policy uses persistent state information:
  COLL_NAME
  DATA_ID
  DATA_NAME
The operations that are performed are:
  fail
  foreach
  if
  msiCreateUserAccountsFromDataObj
  msiSplitPath
  select
  writeLine
The rule is available at http://github.com/DICE-UNC/policy-workbook/hipaa-accountImport.r
6.10 Generate checksum on ingestion (Policy 40)
A checksum is generated for every file that is put into the data grid. The policy implements a constraint:
  Applied at the acPostProcForPut policy enforcement point
The operations that are performed are:
  msiSysChksumDataObj
The rule is available at https://github.com/DICE-UNC/policy-workbook/blob/master/acPostProcForPut-checksum.re
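The idea of recording a checksum at ingestion and comparing against it later can be sketched in Python with the standard hashlib module. This is only an illustration of the technique; it does not reproduce msiSysChksumDataObj, and the choice of SHA-256 and the function names are assumptions.

```python
import hashlib
import io


def checksum_stream(stream, algorithm="sha256", chunk_size=1 << 20):
    """Compute a hex digest by reading the stream in fixed-size chunks."""
    digest = hashlib.new(algorithm)
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def verify_stream(stream, recorded_checksum, algorithm="sha256"):
    """Return True when the stream still matches the checksum recorded at ingestion."""
    return checksum_stream(stream, algorithm) == recorded_checksum


if __name__ == "__main__":
    recorded = checksum_stream(io.BytesIO(b"example file contents"))
    print(recorded)
    print(verify_stream(io.BytesIO(b"example file contents"), recorded))
    print(verify_stream(io.BytesIO(b"corrupted contents"), recorded))
```

Reading in chunks keeps memory use constant for large files; the same functions accept any binary file object, such as one returned by open(path, "rb").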
6.11 Generate report of corrections to data sets or access controls (Policy 41)
The audit log can be parsed to identify all changes to data sets or access controls. We assume that any file for which a new version has been created constitutes a correction to a data set.
The auditing capability depends on a set of external services and rules. The following services are used: Elasticsearch, OSGi, and AMQP. ServiceMix provides both OSGi and AMQP. On the iRODS server, auditing requires a set of iRODS rules, and client libraries for sending messages to the AMQP service. In addition, networking on the servers running these services must be configured to allow these services to communicate.
The rules that need to be installed include: databook_pep.re, databook.re, and amqp.re. The rule set databook_pep.re overrides the default iRODS PEPs so that messages are sent for auditing. This has the limitation that if you already have customized PEPs, you have to manually edit them. Alternatively, starting from iRODS 4.2, you can install the auditing plugin, which will allow you to avoid changing your customized PEPs. The rule set databook.re provides the main functionality for auditing. The rule set amqp.re provides rules for interacting with AMQP. In addition, Python libraries are used to send messages to AMQP. These can be set up using an automated setup script from the source repository, although customizing the script is usually necessary in order to achieve a particular setup.
Once the auditing services are installed, all system access information is stored in an Elasticsearch index. The index can be queried. An administrator can retrieve events based on the following parameters:
  fromDate: from which date
  toDate: to which date
  event: the event
  pid: uri filter
  start: starting index
  count: how many results to return
A Java program is used to interact with Elasticsearch. The following example generates the number of access events per file for reporting to DataONE. The results can be limited to a date range. The EventsEnum defines which type of event to monitor. The types of events that are monitored are listed in org.dataone.service.types.v1.Event.
  put             dataobjectput
  get             dataobjectget
  overwrite       dataobjectoverwrite
  delete          dataobjectdelete
  replicate       dataobjectreplicate
  synch_failure   dataobjectsynch_failure
The program is available at https://github.com/DICE-UNC/policy-workbook/blob/master/dfc-elasticsearch.java
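As a rough illustration of how the retrieval parameters listed above could be turned into an Elasticsearch query body, the Python sketch below builds a filtered, paginated query. It is not taken from the Java program referenced above; the index field names "timestamp", "event", and "pid" are hypothetical and would need to match the actual DataBook index mapping.

```python
import json


def build_event_query(from_date=None, to_date=None, event=None, pid=None,
                      start=0, count=10):
    """Build an Elasticsearch query body from the audit retrieval parameters."""
    filters = []

    # Date range filter; "timestamp" is an assumed field name.
    date_range = {}
    if from_date:
        date_range["gte"] = from_date
    if to_date:
        date_range["lte"] = to_date
    if date_range:
        filters.append({"range": {"timestamp": date_range}})

    # Exact-match filters; "event" and "pid" are assumed field names.
    if event:
        filters.append({"term": {"event": event}})
    if pid:
        filters.append({"term": {"pid": pid}})

    return {
        "from": start,   # starting index
        "size": count,   # how many results to return
        "query": {"bool": {"filter": filters}},
    }


if __name__ == "__main__":
    body = build_event_query("2015-01-01", "2015-08-25", event="get", count=50)
    print(json.dumps(body, indent=2))
```

The resulting dictionary can be serialized to JSON and sent to the search endpoint of the index by whatever client is in use.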
6.12 Generate report for cost (time) required to audit events (Policy 42)
This rule queries the event index to identify the amount of time needed to run an audit. The execution time of the Java program for accessing Elasticsearch is saved to create the cost report.
6.13 Generate report of types of protected assets (Policy 43)
A summary report can be generated that counts the number of files within a collection for each type of asset classifier:
  1 - Protected Health Information (PHI)
  2 - Personally Identifiable Information (PII), such as social security numbers
  3 - Payment Card Information (PCI), such as account numbers, cardholder name, expiration date, service code, CID, PINs
  4 - Legally restricted data (classified)
  5 - Proprietary information
The rule uses the policy function:
  checkCollInput
The input variables are:
  *Coll   a collection name
No session variables are used. The policy uses persistent state information:
  COLL_ID
  COLL_NAME
  DATA_ID
  META_DATA_ATTR_NAME
  META_DATA_ATTR_VALUE
The operations that are performed are:
  fail
  foreach
  if
  select
  writeLine
The rule is available at http://github.com/DICE-UNC/policy-workbook/hipaa-asset-report.r
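The counting step of such a summary report can be sketched in Python, assuming the catalog query has already returned (DATA_ID, META_DATA_ATTR_NAME, META_DATA_ATTR_VALUE) rows. The attribute name "ASSET_CLASS" is a hypothetical placeholder; the workbook does not name the attribute that holds the classifier.

```python
from collections import Counter

# Classifier values and labels as listed in this section.
ASSET_CLASSES = {
    "1": "Protected Health Information (PHI)",
    "2": "Personally Identifiable Information (PII)",
    "3": "Payment Card Information (PCI)",
    "4": "Legally restricted data",
    "5": "Proprietary information",
}


def asset_report(rows, attribute="ASSET_CLASS"):
    """Count files per asset classifier from (data_id, attr_name, attr_value) rows."""
    counts = Counter()
    for _data_id, attr_name, attr_value in rows:
        if attr_name == attribute:  # "ASSET_CLASS" is an assumed attribute name
            counts[attr_value] += 1
    return counts


def format_report(counts):
    """Return one report line per known classifier, including zero counts."""
    return ["%s - %s: %d" % (key, label, counts.get(key, 0))
            for key, label in sorted(ASSET_CLASSES.items())]


if __name__ == "__main__":
    sample = [("101", "ASSET_CLASS", "1"), ("102", "ASSET_CLASS", "2"),
              ("103", "ASSET_CLASS", "1"), ("103", "DATA_ENCRYPT", "1")]
    print("\n".join(format_report(asset_report(sample))))
```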
6.14 Generate report of all security and corruption events (Policy 44)
The audit log can be parsed to identify all access events, and correlate the access with an authentication event. If an access event cannot be correlated to an authentication event, a possible security event can be logged. For corruption events, use the policy in Section 14. This identifies and lists all files that have been corrupted.
6.15 Generate report of the policies applied to collections (Policy 45)
Within the iRODS data grid, policies are stored in the iCAT metadata catalog. The policies are versioned, such that each policy change creates a new version. The policies can be extracted from the catalog, distributed to each site where data are stored, and instantiated as a distributed rule base that controls operations within the data grid. The iRODS data grid relies upon a distributed rule engine and distributed rule bases to implement policies. If a policy is changed, for consistency the revised rule base needs to be installed at each server location.
6.15.1 Deploy rule sets
This rule identifies the servers, and uploads a new version of the rule base to each server. The micro-services used by this rule are available at https://github.com/DICE-UNC/irods_rule_admin_micorservices
The input variables are:
  *ruleBaseName   a list of rule bases
  *targets        a list of hosts
The policy functions include:
  writeRuleSet
No session variables are used. The policy does not use persistent state information.
The operations that are performed are:
  errorcode
  foreach
  if
  msiChksumRuleSet
  msiReadRuleSet
  writeLine
The rule is available at http://github.com/DICE-UNC/policy-workbook/hipaa-deploy-rules.r
6.15.2 Update rule sets
This policy function reads and writes rule sets that have been deposited into the iCAT catalog. The micro-services used by this policy function are available at https://github.com/DICE-UNC/irods_rule_admin_micorservices. The policy functions include:
1. writeRuleSet
   This includes functions to write and checksum rule sets
   *rbs     a list of rule bases
   *addrs   a list of host addresses
2. backupRuleSet
   This creates a rule set backup
   *rb      a rule base
   *rbak    a rule base
The policy functions are available at
  http://github.com/DICE-UNC/policy-workbook/hipaa-write-rules.r
  http://github.com/DICE-UNC/policy-workbook/hipaa-backup-rules.r
6.15.3 Print rule sets
This rule prints the rule set used by iRODS by listing the core.re file.
No input variables are used. No session variables are used. The policy does not use persistent state information.
The operations that are performed are:
  msiAdmShowIRB
The rule is available at http://github.com/DICE-UNC/policy-workbook/hipaa-print-rules.r
6.16 List all storage systems being used (Policy 46)
This rule lists the storage systems that are attached to the data grid. No input variables are used. No session variables are used. The policy uses persistent state information:
  RESC_NAME
The operations that are performed are:
  foreach
  select
  writeLine
The rule is available at http://github.com/DICE-UNC/policy-workbook/hipaa-list-storage.r
6.17 List persons who can access a collection (Policy 47)
For the specified collection, a list is generated of all persons who have access to files in the collection. The rule uses the policy function:
  checkCollInput
The input variables are:
  *Coll   a collection name
No session variables are used. The policy uses persistent state information:
  COLL_ID
  COLL_NAME
  DATA_ACCESS_DATA_ID
  DATA_ACCESS_ID
  DATA_ACCESS_USER_ID
  DATA_ID
  DATA_NAME
  USER_NAME
  USER_ID
The operations that are performed are:
  fail
  foreach
  if
  select
  writeLine
The rule is available at http://github.com/DICE-UNC/policy-workbook/hipaa-list-access.r
6.18 List staff by position and required training courses (Policy 48)
A list of all persons with accounts in the data grid can be generated. The USER_INFO field can be used to annotate the staff position and the last training course through XML tags:
  USER_INFO = "<Position>staff</Position><Training>course</Training>"
6.18.1 Set position and training
This policy modifies existing user accounts according to information in an iRODS object. The format of the account file is:
  user-name|field|new-value
where valid fields include:
  type
  zone
  comment
  info
  password
A file containing the desired updates is loaded into the Reports directory. The rule uses the policy function:
  checkPathInput
The input variables are:
  *Path   a file path name
No session variables are used. The policy uses persistent state information:
  COLL_NAME
  DATA_ID
  DATA_NAME
The operations that are performed are:
  fail
  foreach
  if
  msiLoadUserModsFromDataObj
  msiSplitPath
  select
  writeLine
The rule is available at http://github.com/DICE-UNC/policy-workbook/hipaa-update-user-info.r
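Before loading an update file, it can be useful to check that every line follows the user-name|field|new-value format. The Python sketch below is a hypothetical pre-check, not part of msiLoadUserModsFromDataObj; it treats the five fields listed above as the complete set, which is an assumption since the workbook says valid fields "include" them.

```python
# Field names listed in this section; assumed here to be the complete set.
VALID_FIELDS = {"type", "zone", "comment", "info", "password"}


def parse_user_mods(lines):
    """Parse 'user-name|field|new-value' lines into (user, field, value) tuples."""
    updates = []
    for number, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Split on the first two pipes only, so the new value may contain '|'.
        parts = line.split("|", 2)
        if len(parts) != 3:
            raise ValueError("line %d: expected user-name|field|new-value" % number)
        user, field, value = parts
        if field not in VALID_FIELDS:
            raise ValueError("line %d: unknown field %r" % (number, field))
        updates.append((user, field, value))
    return updates


if __name__ == "__main__":
    sample = [
        "alice|info|<Position>staff</Position><Training>course</Training>",
        "bob|comment|on leave",
    ]
    for update in parse_user_mods(sample):
        print(update)
```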
6.18.2 List staff by position and training
A report of all staff positions and the latest training can be generated. No input variables are used. No session variables are used. The policy uses persistent state information:
  USER_INFO
  USER_NAME
  USER_TYPE
The operations that are performed are:
  foreach
  select
  writeLine
The rule is available at http://github.com/DICE-UNC/policy-workbook/hipaa-list-training.r
6.19 List versions of technology that are being used (Policy 49)
A report can be kept in the data grid that identifies the current versions of the hardware and software technologies used in the preservation environment. This policy defines the collection location and file name used for the report.
  Technology report name   TechVersionReport
  Collection name          Reports
  Location                 /UNC-CH/home/HIPAA/Reports
No input variables are used. No session variables are used. The policy uses no persistent state information.
The operations that are performed are:
  msiDataObjGet
The rule is available at http://github.com/DICE-UNC/policy-workbook/hipaa-tech-report.r
In iRODS version 4.x, technologies are plugged into the iRODS framework. By listing all plug-ins, the versions of all hardware and software systems can be automatically tracked. The izonereport command generates a JSON file that lists the entire iRODS Zone configuration information. The command izonereport validates the information against the schemata found at https://schemas.irods.org.
6.20 Maintain document on independent assessment of software (Policy 50)
The report on software assessment can be managed within the data grid. This policy retrieves the specified document from the Reports directory.
  Software assessment report name   softwareAssessment
  Collection name                   Reports
  Location                          /UNC-CH/home/HIPAA/Reports
No input variables are used. No session variables are used. The policy uses no persistent state information.
The operations that are performed are:
  msiDataObjGet
The rule is available at http://github.com/DICE-UNC/policy-workbook/hipaa-assessment-report.r
6.21 Maintain log of all software changes, OS upgrades (Policy 51)
The log of software changes is maintained by the data grid operators. This policy defines the collection location and file name used for the report.
  Technology report name   LogSoftwareChanges
  Collection name          Reports
  Location                 /UNC-CH/home/HIPAA/Reports
No input variables are used. No session variables are used. The policy uses no persistent state information.
The operations that are performed are:
  msiDataObjGet
The rule is available at http://github.com/DICE-UNC/policy-workbook/hipaa-store-log.r
6.21.1 Version log files
Each version of a log file can be tracked. When a file is added to the system, a version labeled by the current timestamp is saved, ensuring that a history of changes can be maintained. The version is moved to an archive directory.
The version number can be inserted in the file name before the extension. This rule parses the file name, identifies an extension, and inserts the timestamp before the extension when the version name is created. The ownership of the file is set to the hipaaAdmin account. The rule is listed in section 4.7.1.
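The file name manipulation described above (find the extension, insert the timestamp before it) can be sketched in Python. This is an illustration only, not the rule from section 4.7.1; the dot separator and the timestamp layout are assumptions.

```python
import os
import time


def versioned_name(file_name, timestamp=None):
    """Insert a timestamp before the extension, e.g. log.txt -> log.<timestamp>.txt."""
    if timestamp is None:
        # Assumed layout: YYYYMMDDHHMMSS in local time.
        timestamp = time.strftime("%Y%m%d%H%M%S")
    stem, extension = os.path.splitext(file_name)
    # When there is no extension, the timestamp is simply appended.
    return "%s.%s%s" % (stem, timestamp, extension)


if __name__ == "__main__":
    print(versioned_name("LogSoftwareChanges.txt"))
    print(versioned_name("DisclosureReport", "20150825120000"))
```

Only the last extension is treated as the extension, so a name such as archive.tar.gz becomes archive.tar.<timestamp>.gz under this sketch.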
6.22 Maintain log of disclosures (Policy 52)
A disclosure log identifies all events associated with unauthorized access to files. The ways this may happen include:
  Incorrect setting of access controls on the files in a collection. One way to detect this is to log all files in a collection that do not have ACCESS_APPROVAL set to 1, but have anonymous or public access.
  Direct reading of the file on disk without going through the data grid. This may happen when a security vulnerability is present within the operating system that has not been patched. Detection of this type of access requires parsing the system log for the computer.
  Unauthorized use of an account. This requires that the unauthorized user learn the password associated with the account. This may happen when a password is shared or stolen. Detection of this type of access requires interaction with the account owner to determine whether they made the access.
In all three cases, a report can be generated that is updated externally to the data grid. The report can be stored in the data grid with versioning enabled, and deletion turned off. The version is stored in Reports/Backup. This policy defines the collection location and file name used for the report.
  Technology report name   DisclosureReport
  Collection name          Reports
  Location                 /UNC-CH/home/HIPAA/Reports
  Version                  /UNC-CH/home/HIPAA/Reports/Backup
A rule to store the report uses the policy functions:
  checkRescInput
  findZoneHostName
The input variables are:
  *destRescName   a storage resource
The session variables are:
  $rodsZoneClient
The policy uses persistent state information:
  COLL_NAME
  DATA_ID
  DATA_NAME
  RESC_ID
  RESC_NAME
  ZONE_CONNECTION
  ZONE_NAME
The operations that are performed are:
  fail
  foreach
  if
  msiDataObjPut
  msiSplitPathByKey
  msiStoreVersionWithTS
  remote
  select
  writeLine
The rule is available at http://github.com/DICE-UNC/policy-workbook/hipaa-version-report.r
To turn off deletion on collection /UNC-CH/home/HIPAA/Reports, set the policy enforcement point acDataDeletePolicy. The policy implements a constraint:
  Applied at the acDataDeletePolicy policy enforcement point
  A check is made that the object path is like "/UNC-CH/home/HIPAA/Reports/*"
The operations that are performed are:
  msiDeleteDisallowed
The rule is available at http://github.com/DICE-UNC/policy-workbook/acDataDeletePolicy.re.
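The path check behind the deletion constraint can be illustrated with Python's standard fnmatch module. This sketch is an analogy for the "like" comparison, not the acDataDeletePolicy rule; the function name deletion_allowed is hypothetical.

```python
from fnmatch import fnmatchcase

# Pattern taken from the constraint above.
PROTECTED_PATTERN = "/UNC-CH/home/HIPAA/Reports/*"


def deletion_allowed(object_path, pattern=PROTECTED_PATTERN):
    """Return False for objects whose path matches the protected pattern."""
    # fnmatchcase lets '*' match any characters, including '/', so objects
    # in sub-collections such as Reports/Backup are protected as well.
    return not fnmatchcase(object_path, pattern)


if __name__ == "__main__":
    for path in ("/UNC-CH/home/HIPAA/Reports/DisclosureReport",
                 "/UNC-CH/home/HIPAA/Reports/Backup/DisclosureReport.1",
                 "/UNC-CH/home/HIPAA/Archive/file1"):
        print(path, deletion_allowed(path))
```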
6.23 Maintain password history on user name (Policy 53)
A history of prior passwords can be kept as events in an external index. The challenge is that the current design does not generate an identity for the user until after acCheckPasswordStrength has been executed. One approach is to check the password history after the user name is defined, within the acSetPublicUserPolicy policy enforcement point. Metadata attributes for the prior passwords can then be checked. If a similar prior password is found, a request to change the password can be made and the rule can fail. The metadata attributes are:
  META_USER_ATTR_NAME    PasswordHist
  META_USER_ATTR_VALUE   prior password
  META_USER_ATTR_UNITS   Set to 0 for current password
This policy loads passwords as attributes on the USER_NAME. The policy implements a constraint:
  Applied at the acSetPublicUserPolicy policy enforcement point
The session variables that are used are:
  $userNameClient
The operations that are performed are:
  foreach
  if
  msiAssociateKeyValuePairsToObj
  msiRemoveKeyValuePairsFromObj
  msiString2KeyValPair
  select
  writeLine
The rule is available at http://github.com/DICE-UNC/policy-workbook/acSetPublicUserPolicy.re.
6.24 Parse event trail for all accessed systems (Policy 54)
The audit log can be queried to identify all accesses to the repository. For each access, the storage resource can be specified. The results can be summarized to identify all of the storage resources that were accessed.
6.25 Parse event trail for all persons accessing collection (Policy 33)
The audit log can be queried to identify all accesses to files in a collection. For each access, the identity of the account making the request is known. The results can be summarized to identify all persons who accessed the collection. See section 5.16.
6.26 Parse event trail for all unsuccessful attempts to access data (Policy 55)
Each access of the data grid is authenticated. If the authentication fails, an event can be generated if the requested operation was a read attempt. The audit log can then be queried to identify all unsuccessful access attempts to files in a collection. The results can be summarized to identify the accounts that had unsuccessful access attempts.
6.27 Parse event trail for changes to policies (Policy 56)
The iRODS data grid can maintain an event database that lists all events associated with managing or accessing the data system. The policies that record events generate messages that are sent to an external indexing system. By searching in the external index, events associated with the policy enforcement points can be identified:
  pep_PLUGINOPERATION_pre
  pep_PLUGINOPERATION_post
Changes to policies should be saved in the iCAT catalog as rule versions using the micro-services
  msiAdmReadRulesFromFileIntoStruct
  msiAdmInsertRulesFromStructIntoDB
The corresponding events in the event database are:
  pep_msiAdmReadRulesFromFileIntoStruct_pre
  pep_msiAdmReadRulesFromFileIntoStruct_post
  pep_msiAdmInsertRulesFromStructIntoDB_pre
  pep_msiAdmInsertRulesFromStructIntoDB_post
A query is issued against the event index through a libcurl call. The operations that are performed are:
  msiCurlGetStr
  writeLine
The rule is available at http://github.com/DICE-UNC/policy-workbook/hipaa-issue-url.r.
6.28 Parse event trail for inactivity (Policy 57)
Each access of the data grid is treated as a separate session. The user is authenticated and the operation is authorized. When the requested operation completes, the session is terminated. Thus users cannot be logged into the data grid without applying operations on the data. Users are only “logged” into the data grid while they are applying operations on their data. There is the possibility of long-running operations, such as validating checksums for all files in a collection. However, these are expected uses of the system.
6.29 Parse event trail for updates to rule bases (Policy 58)
The audit log can be queried to identify all updates made to the policies. Events can be generated that correspond to execution of the micro-service that creates new versions of rules that are registered into the iCAT catalog. The results can be written to a file or printed.
6.30 Parse event trail to correlate data accesses with client actions (Policy 59)
Events can be generated for accesses that include the type of client API that was used. Each client API interacts through a plug-in that can track usage events. Events that are tracked include:
  dataobjread
  dataobjectupdate
  dataobjectoverwrite
  dataobjectput
  dataobjectget
  dataobjread
  dataobjwrite
  dataobjcreate
  dataobjremove
6.31 Provide test environment to verify policies on new systems (Policy 60)
The test environment should be an independent iRODS data grid with a separate iCAT catalog, separate storage servers, and disjoint user accounts. The directory structure should be similar to the production environment. This policy downloads the rules from the test environment and stores them in a file. We assume the following:
    Test zone is called            uncTestZone
    Admin account is called        uncTestAdmin
    Test zone rule base is called  TestBase
    Rule file is called            NewRules
The input variables are:
    *FileName   a file name in the 'server/config/reConfigs/' directory with an .re extension
    *RuleBase   a rule base name
No session variables are used. No persistent state information is used.
The operations that are performed are:
    msiGetRulesFromDBIntoStruct
    msiAdmShowIRB
    msiAdmWriteRulesFromStructIntoFile
The rule is available at http://github.com/DICE-UNC/policy-workbook/hipaa-export-policies.r
A second rule reads the rules from the file NewRules and loads them into the production iCAT catalog. The input variables are:
    *FileName   a file name in the 'server/config/reConfigs/' directory with an .re extension
    *RuleBase   a rule base name
No session variables are used. No persistent state information is used.
The operations that are performed are:
    msiAdmInsertRulesFromStructIntoDB
    msiAdmReadRulesFromFileIntoStruct
    msiAdmShowIRB
The rule is available at http://github.com/DICE-UNC/policy-workbook/hipaa-import-policies.r
6.32 Provide test system for evaluating a recovery procedure (Policy 61)
A test system would ideally contain a complete set of records from the original data grid, including an up-to-date copy of the metadata catalog. A recovery procedure would then need to do the following steps:
    - Recreate the iCAT catalog from the test system. This would set accounts, define storage resources, define file names, and define collections.
    - A checksum on the files would then be run to detect any corrupted files. Corrupted files would be replaced from the test system.
    - A replication rule could be run to detect problems. If one of the replicas in the original data grid is still good, this should be sufficient. However, if no good replicas exist, then the file will need to be replaced from the test system. A replication rule is listed in section 4.5.2.
6.33 Provide training courses for users (Policy 62)
Information about training courses can be kept in a separate database. For each staff position, a set of required training courses can be defined. The list of required courses can be compared with the courses that were taken, and stored as USER_INFO.
6.34 Replicate data sets on ingestion (Policy 13)
When a file is put into the collection /UNC-CH/home/HIPAA/Archive, it will be replicated to a second storage system. The rule is enforced at the acPostProcForPut policy enforcement point. The policy implements a constraint:
    Applied at the acPostProcForPut policy enforcement point
    Checks that the collection is like "/UNC-ARCHIVE/home/Archive/*"
The session variables that are used are:
    $objPath
The operations that are performed are:
    msiSysReplDataObj
The rule is available at http://github.com/DICE-UNC/policy-workbook/acPostProcForPut-replicate.re.
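The collection constraint can be pictured with a short Python sketch. It only models the decision made at the enforcement point; the replication itself is performed inside iRODS by msiSysReplDataObj. The function name should_replicate and the sample paths are illustrative assumptions.

```python
from fnmatch import fnmatchcase

# Pattern taken from the constraint above; "*" matches any trailing path.
ARCHIVE_PATTERN = "/UNC-ARCHIVE/home/Archive/*"


def should_replicate(obj_path, pattern=ARCHIVE_PATTERN):
    """Return True when the deposited object falls under the archive collection."""
    return fnmatchcase(obj_path, pattern)


# Sample paths, invented for the example.
print(should_replicate("/UNC-ARCHIVE/home/Archive/scan-001.tif"))
print(should_replicate("/UNC-ARCHIVE/home/Staging/scan-001.tif"))
```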
6.35 Replicate iCAT periodically (Policy 63)
A typical approach to ensuring that the metadata attributes are appropriately backed up is to set up a mirror catalog, and use dynamic updates to the mirror catalog to maintain an active copy. This approach works as long as there are no errors in the original catalog. To enable recovery from propagated errors, an independent snapshot of the catalog can be periodically created. This provides a second recovery mechanism in case both catalogs are compromised. In addition to replication, the catalog indices need to be periodically optimized. This improves performance.
6.36 Set access approval flag (Policy 64)
This rule sets the ACCESS_APPROVAL flag to 1, and enables access by public and anonymous users. The rule uses the policy functions:
    addAVUMetadata
    checkCollInput
    deleteAVUMetadata
The input variables are:
    *Coll   a collection name
No session variables are used. The policy uses persistent state information:
    COLL_ID
    COLL_NAME
    DATA_ACCESS_DATA_ID
    DATA_ACCESS_USER_ID
    DATA_ID
    DATA_NAME
    META_DATA_ATTR_NAME
    META_DATA_ATTR_VALUE
    USER_ID
    USER_NAME
The operations that are performed are:
    fail
    foreach
    if
    msiRemoveKeyValuePairsFromObj
    msiSetACL
    msiString2KeyValPair
    select
    writeLine
The rule is available at http://github.com/DICE-UNC/policy-workbook/hipaa-access-set.r
6.36.1 Restrict access for “Protected” data
Each collection that contains “Protected” information will have an Approval flag, called
    ACCESS_APPROVAL
When the value of this attribute is set to “0”, no public or anonymous access is allowed to files within the collection. This rule sets the ACCESS_APPROVAL flag to 0 for every file in a collection, and restricts access by public and anonymous accounts. The rule uses the policy functions:
    addAVUMetadata
    checkCollInput
    deleteAVUMetadata
The input variables are:
    *Coll   a collection name
No session variables are used. The policy uses persistent state information:
    COLL_ID
    COLL_NAME
    DATA_ACCESS_DATA_ID
    DATA_ACCESS_USER_ID
    DATA_ID
    DATA_NAME
    META_DATA_ATTR_NAME
    META_DATA_ATTR_VALUE
    USER_ID
    USER_NAME
The operations that are performed are:
    fail
    foreach
    if
    msiRemoveKeyValuePairsFromObj
    msiSetACL
    msiString2KeyValPair
    select
    writeLine
The rule is available at http://github.com/DICE-UNC/policy-workbook/hipaa-restrict-access.r
6.37 Set access controls (Policy 14)
This rule keeps users from seeing the names of other users' files. The rule sets the Access Control List policy. If the rule is not called, or is called with an argument other than STRICT, the STANDARD setting is in effect, which is fine for many sites. By default, users are allowed to see certain metadata, for example the data-object and sub-collection names in each other's collections. When access controls are made STRICT by calling msiAclPolicy(STRICT), the General Query Access Control is applied on collections and data object metadata, which means that the list command, ils, will need 'read' access or better to the collection to return the collection contents (names of data-objects, sub-collections, etc.). The default is the normal, non-strict level, allowing users to see names of other collections. In all cases, access control to the data-objects is enforced. Even if a person can see file names in a collection, “read” access is required on a file to be able to read the file. Even with STRICT access control, however, the admin user is not restricted, so various micro-services and queries will still be able to evaluate system-wide information. The session variable $userNameClient can be used to limit actions to individual users. This is only secure in an irods-password environment (not GSI), but in that environment you can have rules for specific users:
    acAclPolicy {ON($userNameClient == "quickshare") { } }
    acAclPolicy {msiAclPolicy("STRICT"); }
which was requested by ARCS (Sean Fleming). See rsGenQuery.c for more information on $userNameClient. The typical use is to set the policy strict or not for all users. The policy can be updated in the iRODS core.re file. The policy implements a constraint:
    Applied at the acAclPolicy policy enforcement point
The operations that are performed are:
    msiAclPolicy
The rule is available at https://github.com/DICE-UNC/policy-workbook/blob/master/acAclPolicy-strict.re
6.37.1 Set access controls after proprietary period
This rule checks a flag for whether a proprietary period has elapsed, and then provides public access to the file. The flag ACL_EXPIRY defines the date and time after which the file becomes public. The rule uses the policy function:
    checkCollInput
The input variables are:
    *Coll   a relative collection name
The session variables are:
    $rodsZoneClient
    $userNameClient
The policy uses persistent state information:
    COLL_ID
    COLL_NAME
    DATA_ID
    DATA_NAME
    META_DATA_ATTR_NAME
    META_DATA_ATTR_VALUE
The operations that are performed are:
    fail
    foreach
    if
    msiSetACL
    msiRemoveKeyValuePairsFromObj
    msiString2KeyValPair
    select
    writeLine
The rule is available at http://github.com/DICE-UNC/policy-workbook/hipaa-set-ACL.r
6.38 Set access restriction until approval flag is set (Policy 65)
When a file is added to a collection, it normally can only be accessed by the owner, the person uploading the file. The file can inherit access controls from its collection if the sticky bit is enabled. This applies the access controls from the collection as the access controls on the file. A standard sequence is to:
    - Turn off the inherit flag on the collection
    - Load a file into the collection. The file can only be accessed by the owner of the file.
    - Explicitly add access controls for a group
        o Members of the group can then access the file
When the approval flag is set to one, then public access can be enabled. Public access allows access by all accounts within the data grid. For access by persons without an account in the data grid, anonymous access must also be enabled.
6.39 Set approval flag per collection for enabling bulk download (Policy 66)
Bulk downloads are initiated by a client, which manages a loop either over a specified file set or over the files in a collection. Restriction of bulk download requires a policy enforcement point, acBulkGetPreProcPolicy. Bulk download can be turned off in general. The policy implements a constraint:
    Applied at the acBulkGetPreProcPolicy policy enforcement point
The operations that are performed are:
    msiSetBulkGetPostProcPolicy
The rule is available at https://github.com/DICE-UNC/policy-workbook/blob/master/acBulkGetPreProcPolicy-off.re
Bulk processing can be turned off for a collection. The policy implements a constraint:
    Applied at the acBulkGetPreProcPolicy policy enforcement point
    A check is made for a specific collection "/UNC-CH/home/HIPAA"
The session variables are:
    $objPath
The operations that are performed are:
    if
    msiSetBulkGetPostProcPolicy
    msiSplitPath
The rule is available at https://github.com/DICE-UNC/policy-workbook/blob/master/acBulkGetPreProcPolicy-on.re
Bulk processing can be controlled for a collection that has a flag “BulkDownLoad” with a value “off”. The policy implements a constraint:
    Applied at the acBulkGetPreProcPolicy policy enforcement point
The session variables are:
    $objPath
The operations that are performed are:
    if
    foreach
    msiSetBulkGetPostProcPolicy
    msiSplitPath
    select
The rule is available at https://github.com/DICE-UNC/policy-workbook/blob/master/acBulkGetPreProcPolicy-flag.re
These policies can be updated in the iRODS core.re file.
6.40 Set asset protection classifier for data sets based on type of PII (Policy 67)
Each data set should be assigned a protection classifier that defines whether the file contains:
    1 - Protected Health Information (PHI)
    2 - Personally Identifiable Information (PII), such as social security numbers
    3 - Payment Card Information (PCI), such as account numbers, cardholder name, expiration date, service code, CID, PINs
    4 - Legally restricted data (classified)
    5 - Proprietary information
The classifier is stored in a metadata attribute for each file:
    META_DATA_ATTR_NAME = AssetProtectionClassifier
    META_DATA_ATTR_VALUE = “protection classifier value 1-5”
    META_DATA_ATTR_UNIT = “”
An approach is to use a BitCurator rule to assign the asset classifier for PII, PHI, and PCI.
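The classifier scheme can be sketched in Python as a lookup table plus a helper that builds the attribute, value, and unit triple described above. The helper name classifier_avu is hypothetical, and the sketch does not contact an iRODS server; it only shows how the values 1 to 5 map onto the metadata attribute.

```python
# Classifier values 1-5 as listed above.
PROTECTION_CLASSES = {
    1: "Protected Health Information (PHI)",
    2: "Personally Identifiable Information (PII)",
    3: "Payment Card Information (PCI)",
    4: "Legally restricted data",
    5: "Proprietary information",
}


def classifier_avu(value):
    """Build the attribute, value, unit triple for a protection classifier."""
    if value not in PROTECTION_CLASSES:
        raise ValueError(f"classifier must be 1-5, got {value!r}")
    # The unit is left empty, as in the attribute definition above.
    return ("AssetProtectionClassifier", str(value), "")


name, val, unit = classifier_avu(3)
print(name, val, PROTECTION_CLASSES[int(val)])
```

Rejecting values outside 1 to 5 at assignment time keeps later queries on the attribute simple.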
6.41 Set flag for whether tickets can be used on files in a collection (Policy 68)
The iRODS data grid supports the creation of tickets that enable access to specific data sets by persons who do not have an account. The tickets control the number of allowed accesses and the time period during which the access can be made. For collections that have the ACCESS_APPROVAL flag set to 0, ticket-based access is prohibited.
The policy implements a constraint:
    Applied at the acTicketPolicy policy enforcement point
The session variables are:
    $objPath
The operations that are performed are:
    if
    foreach
    msiSplitPath
    select
    writeLine
The rule is available at http://github.com/DICE-UNC/policy-workbook/acTicketPolicy.re.
6.41.1 Remove public and anonymous access
Ticket access requires that anonymous access permission be set. When the ACCESS_APPROVAL flag is set to 0, anonymous access is turned off. Thus ticket access can be controlled by setting the ACCESS_APPROVAL flag to 0. The rule listed in section 6.36.1 can be used to set the ACCESS_APPROVAL flag to 0.
6.42 Set lockout flag and period on user name - counting number of tries (Policy 69)
When a user exceeds the number of allowed attempts when trying to log on without success, a lockout flag will be set for a specified period of time. Ideally this is done by the authentication system.
6.42.1 Set lockout period on user name
The code that checks the user name will need to be augmented with a policy enforcement point (acChkUserLogon) that implements three metadata attributes for a user:
    META_USER_ATTR_NAME   NumberAttempts
    META_USER_ATTR_NAME   LockoutPeriod
    META_USER_ATTR_NAME   ResetPassword
The control point acChkUserLogon will need to be called for every controlled iCommand. Note that the NumberAttempts counter will need to be set back to “0” on a successful login. This rule increments the attempt counter, and sets an expiration time when the allowed number of attempts is exceeded. The policy implements a constraint:
    Applied at the acChkUserLogon policy enforcement point
The session variables are:
    $userNameClient
The operations that are performed are:
    foreach
    if
    msiAssociateKeyValuePairsToObj
    msiGetSystemTime
    msiRemoveKeyValuePairsFromObj
    msiString2KeyValPair
    select
    writeLine
The rule is available at http://github.com/DICE-UNC/policy-workbook/acChkUserLogon.re.
A second rule tests the expiration time to release the lockout flag. This rule could be added to the acSetPublicUserPolicy. The policy implements a constraint:
    Applied at the acSetPublicUserPolicy policy enforcement point
The session variables are:
    $userNameClient
The operations that are performed are:
    foreach
    if
    msiAssociateKeyValuePairsToObj
    msiGetSystemTime
    msiRemoveKeyValuePairsFromObj
    msiString2KeyValPair
    select
    writeLine
The rule is available at http://github.com/DICE-UNC/policy-workbook/acSetPublicUserPolicy-lockout.re.
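The bookkeeping behind the two lockout rules can be modeled in a few lines of Python. This is a sketch under stated assumptions rather than the logic of the published rules: the attempt limit, the lockout length, the choice to store LockoutPeriod as an expiration time, and the counter reset at lockout are all illustrative.

```python
# Hypothetical model of the lockout bookkeeping; the limits are assumed values.
MAX_ATTEMPTS = 3          # allowed failed attempts (assumption)
LOCKOUT_SECONDS = 900     # lockout length in seconds (assumption)


def record_logon(attrs, success, now):
    """Update a user's attributes after a logon attempt and report lock state.

    attrs holds the NumberAttempts and LockoutPeriod values for one user;
    LockoutPeriod is stored as the time at which the lockout expires.
    Returns True when the user is locked out after this attempt.
    """
    if now < attrs.get("LockoutPeriod", 0):
        return True                      # still locked, the attempt is rejected
    if success:
        attrs["NumberAttempts"] = 0      # reset the counter on a successful login
        attrs["LockoutPeriod"] = 0
        return False
    attrs["NumberAttempts"] = attrs.get("NumberAttempts", 0) + 1
    if attrs["NumberAttempts"] > MAX_ATTEMPTS:
        attrs["LockoutPeriod"] = now + LOCKOUT_SECONDS
        attrs["NumberAttempts"] = 0      # start a fresh count after the lockout
        return True
    return False


user = {}
results = [record_logon(user, False, t) for t in (0, 10, 20, 30)]
print(results, user)
```

The first check plays the role of the second rule: once the current time passes the stored expiration, the lockout is released without administrator action.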
6.43 Set password update flag on user name (Policy 70)
A flag is associated with each user name to specify whether the user needs to update their password. This uses the attribute:
    META_USER_ATTR_NAME   ResetPassword
The value can be set to ‘1’ for all users by the administrator. No input variables are used. No session variables are used. The policy uses persistent state information:
    META_USER_ATTR_NAME
    META_USER_ATTR_VALUE
    USER_NAME
The operations that are performed are:
    foreach
    if
    msiAssociateKeyValuePairsToObj
    msiRemoveKeyValuePairsFromObj
    msiString2KeyValPair
    select
    writeLine
The rule is available at http://github.com/DICE-UNC/policy-workbook/hipaa-passwordUpdate.r
Each time the acSetPublicUserPolicy policy enforcement point is executed, the ResetPassword flag can be checked and a message can be written to stdout. The policy implements a constraint:
    Applied at the acSetPublicUserPolicy policy enforcement point
The session variables are:
    $userNameClient
The operations that are performed are:
    foreach
    if
    msiAssociateKeyValuePairsToObj
    msiGetSystemTime
    msiRemoveKeyValuePairsFromObj
    msiString2KeyValPair
    select
    writeLine
The rule is available at http://github.com/DICE-UNC/policy-workbook/acSetPublicUserPolicy-reset.re.
6.44 Set retention period for data reviews (Policy 71)
The iRODS data grid provides a metadata attribute, DATA_EXPIRY, for a retention period. The choice of what to do when the retention period is over is governed by a disposition policy. One approach is to set DATA_EXPIRY for a data review. A query can then be issued to identify files that need to be reviewed. The rule uses the policy function:
    checkCollInput
The input variables are:
    *Coll   a collection name
No session variables are used. The policy uses persistent state information:
    COLL_ID
    COLL_NAME
    DATA_EXPIRY
    DATA_NAME
The operations that are performed are:
    foreach
    if
    msiGetSystemTime
    select
    writeLine
The rule is available at http://github.com/DICE-UNC/policy-workbook/hipaa-retention-review.r
6.45 Set retention period on ingestion (Policy 21)
A system attribute, DATA_EXPIRY, is used to define an expiration date for a digital object. This rule sets an expiration date a specified number of seconds greater than the ingestion time for a specified collection. The policy implements a constraint:
    Applied at the acPostProcForPut policy enforcement point
    Checks for collection equal to “/UNC-ARCHIVE/home/Archive”
The session variables are:
    $objPath
The operations that are performed are:
    if
    msiGetSystemTime
    msiSplitPath
    msiSysMetaModify
The rule is available at http://github.com/DICE-UNC/policy-workbook/acPostProcForPut-expiry.re.
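The expiration arithmetic is simple enough to show as a Python sketch. The five-year retention value and the function name expiry_for are assumptions for illustration; in iRODS the path split, the time lookup, and the metadata update are done by the micro-services listed above.

```python
import time

ARCHIVE_COLL = "/UNC-ARCHIVE/home/Archive"
RETENTION_SECONDS = 5 * 365 * 24 * 3600   # assumed five-year retention


def expiry_for(obj_path, ingest_time, retention=RETENTION_SECONDS):
    """Return the expiration time in seconds, or None outside the archive."""
    # Split the path into collection and file name, as msiSplitPath would.
    coll, _, _name = obj_path.rpartition("/")
    if coll != ARCHIVE_COLL:
        return None
    return int(ingest_time) + retention


now = int(time.time())
print(expiry_for("/UNC-ARCHIVE/home/Archive/report.pdf", now))
print(expiry_for("/UNC-ARCHIVE/home/Other/report.pdf", now))
```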
6.46 Track systems by type (server, laptop, router, ...) (Policy 72)
Each system used within the repository can be labeled by its type. The information can be kept in a file that is stored in the Reports folder. This policy defines the collection location and file name used for the report.
    Technology report name   LogSystemType
    Collection name          Reports
    Location                 /UNC-CH/home/HIPAA/Reports
The input variables are:
    *destRescName   a storage resource
No session variables are used. No persistent state information is used.
The operations that are performed are:
    msiDataObjPut
The rule is available at http://github.com/DICE-UNC/policy-workbook/hipaa-store-system-log.r
6.47 Verify approval flags within a collection (Policy 73)
This rule examines a collection to determine whether any of the files have not been approved for access, and lists all such files. The rule uses the policy function:
    checkCollInput
The input variables are:
    *Coll   a collection name
No session variables are used. The policy uses persistent state information:
    COLL_ID
    COLL_NAME
    DATA_ID
    DATA_NAME
    META_DATA_ATTR_NAME
    META_DATA_ATTR_VALUE
The operations that are performed are:
    fail
    foreach
    if
    select
    writeLine
The rule is available at http://github.com/DICE-UNC/policy-workbook/hipaa-check-access-approval.r
6.48 Verify files have not been corrupted (Policy 18)
The rule for verifying that files have not been corrupted can be combined with the rule to check existence of replicas. A version of the rule is listed in section 4.5.2.
6.49 Verify presence of required replicas (Policy 74)
A rule can be run periodically to verify that every file has a replica. This rule checks the existence of the required replica, validates the checksums, and replaces missing or corrupted files. A version of the rule is listed in section 4.5.2.
6.50 Verify that no controlled data have public or anonymous access (Policy 75)
Each collection that contains “Protected” information will have an Approval flag, called
    ACCESS_APPROVAL
When the value of this attribute is set to “0”, no public or anonymous access is allowed to files within the collection. When the flag is set to “1”, anonymous access is allowed.
6.50.1 Restrict access to “Protected” data
This rule checks the ACCESS_APPROVAL flag, and restricts access by public and anonymous accounts. No input variables are used. No session variables are used. The policy uses persistent state information:
    COLL_NAME
    DATA_ACCESS_DATA_ID
    DATA_ACCESS_USER_ID
    DATA_ID
    DATA_NAME
    META_COLL_ATTR_NAME
    META_COLL_ATTR_VALUE
    USER_ID
    USER_NAME
The operations that are performed are:
    foreach
    if
    msiSetACL
    select
    writeLine
The rule is available at http://github.com/DICE-UNC/policy-workbook/hipaa-verify-access-approval.r
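The verification step can be illustrated with a Python sketch that works on plain dictionaries instead of iCAT query results. The data layout, the function name access_violations, and the sample collections are assumptions; the sketch only reports violations, whereas the rule above also removes the offending access through msiSetACL.

```python
RESTRICTED_USERS = {"public", "anonymous"}


def access_violations(collections, acls):
    """List (file, user) pairs that break the ACCESS_APPROVAL = 0 restriction.

    collections maps a collection name to its ACCESS_APPROVAL value ("0" or "1").
    acls maps a file path to the set of account names that can access it.
    """
    violations = []
    for path, users in sorted(acls.items()):
        coll = path.rsplit("/", 1)[0]
        if collections.get(coll) == "0":
            for user in sorted(users & RESTRICTED_USERS):
                violations.append((path, user))
    return violations


# Sample catalog contents, invented for the example.
flags = {"/UNC-CH/home/HIPAA/Archive": "0", "/UNC-CH/home/HIPAA/Open": "1"}
acls = {
    "/UNC-CH/home/HIPAA/Archive/a.dat": {"alice", "public"},
    "/UNC-CH/home/HIPAA/Archive/b.dat": {"alice"},
    "/UNC-CH/home/HIPAA/Open/c.dat": {"anonymous"},
}
print(access_violations(flags, acls))
```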
6.51 Verify that protected assets have been encrypted (Policy 76)
Check that all files in the collection
    /UNC-CH/home/HIPAA/Archive
have the DATA_ENCRYPT flag set to 1. If the flag is missing or the value is not 1, write an output line and encrypt the file.
6.51.1 Check that files with ACCESS_APPROVAL = 0 are encrypted
This version of the rule looks for the ACCESS_APPROVAL flag. If the value is set to 0, then the file encryption is checked. If the file is not encrypted, an output line is written and the file is encrypted. No input variables are used. No session variables are used. The policy uses persistent state information:
    COLL_NAME
    DATA_NAME
    META_DATA_ATTR_NAME
    META_DATA_ATTR_VALUE
The operations that are performed are:
    foreach
    if
    msiAssociateKeyValuePairsToObj
    msiEncrypt
    msiRemoveKeyValuePairsFromObj
    msiString2KeyValPair
    select
    writeLine
The rule is available at http://github.com/DICE-UNC/policy-workbook/hipaa-encrypt-check.r
7 Data Management Plan Example Rules
Data management plans (DMPs) are required by the National Science Foundation and other federal agencies for every submitted proposal. The DMPs specify tasks related to formation of the digital collection, analysis, storage, publication, and archives. The expectation is that the tasks can be automated through policies that are either applied at policy enforcement points, or that are periodically executed. An analysis of NSF requirements for DMPs is shown in Table 7.1. A total of 38 tasks were identified, along with the type of environment variable needed as input for each task.
Table 7.1. Data Management Plan Tasks
 #   Category      DMP task                        Variable     Policy
 1   Collection    Managers & staff                Roles        48
 2                 Costs                           Budget       24
 3                 Collection plans                How, what    45
 4                 Instrument types                Type         77
 5                 Event log                       Event        54
 6                 Collection report               Event        41
 7                 Required data policies          Products     17
 8                 Data category                   Type         78
 9                 Use of existing data            Source       79
10   Analysis      Quality control                 Plans        80
11                 Analysis plans                  Plans        81
12                 Data sharing during analysis    Who          82
13                 Data dictionary / glossary      Type         29
14                 Naming includes                 Attributes   83
15                 Data format type                Type         16
16                 DOI for data sets               Type         27
17                 Metadata standard               Type         29
18                 Metadata export as              Type         84
19   Storage       Collection                      Location     85
20                 Size                            Size         86
21   Publication   Make original data public       When         87
22                 Make Data products public       When         88
23                 Re-use                          Policies     89
24                 Re-distribution                 Community    90
25                 Access restrictions             Privacy      14
26                 IPR                             Type         91
27                 Web access through              How          92
28                 Data sharing system             Type         93
29                 Code distribution system        Type         94
30   Archive       Retention period                Period       21
31                 Curation                        Plans        95
32                 Archive                         Location     96
33                 Number of replicas              #            13
34                 Backup frequency                Policies     97
35                 Integrity check frequency       Policies     18
36                 Technology evolution            Plans        49
37                 Catalog                         Metadata     9
38                 Transformative migration        Formats      15
Each directorate and division at NSF has selected different aspects to emphasize. These preferred tasks are indicated in Table 7.2.
Table 7.2. DMP tasks by NSF Directorate/Division
AGS AST CHE CISE DMR EAR EHR ENG GEN OCE PHY SBE
1 X X X X
2 X X X X
3 X X X X X X X X X X X
4 X
5 X
6 X
7 X X X X X X X X
8 X X X X X X X X X X X X
9 X X X X X X X X X X X X
10 X X X X X X X
11 X X X X X X
12 X X
13 X
14 X
15 X X X X X X X X X X X
16 Cite URL
17 X X X X X X X X X X X X
18 X X X X X X X X X
19 X X X X
20 X X X X X X X X X X X X
21 X X X X X < 2 yrs X X X X X X
22 X X X X X X X X X X X X
23 X X X X X X X X X X X
24 X X X X X X X X X X X
25 X X X X x X X X X X X X
26 X X X X X X X X X X X
27 X X X X X X X X X X X
28 X X X X X X X X X X X
29 X
30 X X X X X X X >3 yrs X X X X
31 X X X X X X X X X X X X
32 X X X X X EAR X X X X X X
33 X X X X X X X X X X
34 X X X X X X X X X X
35 X X X X X X X X X X
36 X X X
37 X X X X X X X X X X
38 X X X X X X X X X
To understand how actual DMPs were created, 18 Data Management Plans (DMPs) were compared to determine whether a common set of policies could be implemented for automating management tasks. The DMPs were acquired from the DataONE web site (example DMPs) and from the Data Management Planning tool (public DMPs from the California Digital Library). Each DMP was compared with the tasks determined from the NSF requirements. The expectation is that each task can be automated by creating a set of data management policies for setting environment variables (such as retention period), enforcing the policy, and verifying the policy. The tasks from the DMPs are listed in Tables 7.3A and 7.3B. The tasks specified in the DMPs varied dramatically. For the tasks that depended upon an environmental variable, the value of the variable was specified for each task for each plan.
Table 7.3A – Published data management plans
Task
Environment Variables
Cultural Objects
Mauna Loa CO2
Sensor
Surface weather data
Multimedia Text Annotation
Parietal Cortex
Biosignature Suites
Peer Power
Anthropod Responses
Andvari
NEH AGS AGS NEH BIO BIO CISE GEN NEH
1 Roles X X X
2 Budget
3 How, what X X X X X X
4 Type
5 Event X
6 Event X
7 Products
8 Type X X X X X X X
9 Source GHCN‐D Institutions
10 Plans X X
11 Plans X X Generate netCDF
12 Who
13 Type
14 Attributes timestamp time stamp
15 Type .txt CSV, text CSV,
netCDF
.plx,
.dvt, .avt
.pdf, .tif, .csv .txt .xsl, .csv
16 Type URL X X
17 Type METS X X X Dublin Core EML
18 Type XML XML images
19 Location
20 Size
21 When X X X X
22 When 6 months review project end 2 yrs publication review
23 Policies
24 Community
25 Privacy none creator, copyright CCL
26 Type none
27 How URLs URLs URLs URLs FTP website URL website
28 Type Dspace iRODS google docs Dspace
29 Type UCSC GitHub
30 Period forever forever 5 yr 10 yr forever long‐term project
31 Plans
32 Location UC3 ORNL ORNL IDEC, CCNP
UNM‐Dspace mySQL
33 # 1 3
34 Policies Daily Daily,
monthly periodic periodic daily
35 Policies
36 Plans
37 Metadata
38 Formats
Table 7.3B – Data Management Plans
Task
Making Data Count
Collaboration as a
means of retention
Meteorological measurements East Antarctica
Project 1 data management
Agent‐based
model of
population
Engineered Bioactive
Interfaces
Inquiring into
Engineering
Certain Stem
HydroShare
NSF IES AGS SBE SBE GEN ENG EHR AGS
1 X X X X X
2
3 X X X X X X
4
5
6
7
8 X X X X X
9 DataONE Yes Yes
Time series, Geospatial ‐ NASA,
USGS NWI
10 Yes
11
Web Map, WaterOneFlow, Web
Feature, Web Coverage
12
13
14 metadata source, date
15 .txt, .csv
.html, .txt, .csv, .xml ArcGIS .xsl, .tif, .txt
audio, .txt, .xsl
16 EZID X
17 COUNTER WMO FGDC X education WaterML, OGC, ISO,
INSPIRE
18 .txt CUAHSI HIS
19
20
21 Apache 2 X X X NSF Collaboration driven
22 project project
23 Use agreement
24
25 IRB IRB proprietary IRB IRB Research group driven
26
27 website website website website HTTP, FTP, DataONE
28 Merritt CUAHSI HIS, HUBzero,
iRODS
29 X gForge
30 10 yrs 3 yrs 3 yrs 10 yrs
31
32 GitHub OSF.io US ADC EPA Uknowl‐edge
DataCommons HydroShare
33 3 2 1 2
34
35 3 months
36 iRODS
37 DataONE
38
Task 27 specified the type of access client that could be used to interact with the data collection. Most DMPs planned to publish data through a local web site or to provide persistent URLs to enable remote access to the data sets. Task 30 specified the retention period. Some sites planned to keep the data forever, or as long as the designated repository was functional. Task 32 specified the repository where the data sets would be managed. The DMPs identified a wide variety of data management systems, from local disk caches, to local databases, to institutional repositories, to federal repositories. Most of the DMPs did not specify the resources where the collection would be assembled, and instead specified the final archive.
The most comprehensive DMP that was examined was the DataONE example Data Management Plan for “Atmospheric Concentrations, Mauna Loa Observatory, Hawaii, 2011-2013”. This plan included 16 of the policies. The Mauna Loa DMP is listed in Appendix F. We analyzed the plan to identify the data management requirements and extracted the following tasks:
    3. Plans for assembling the collection
    5. Maintenance of an event log recording changes to sensors
    6. Maintenance of a collection report
    8. Categorization as observational data
    10. Quality assessment
    11. Analysis plans
    14. Time stamp included in file name
    15. Data types are .csv, .txt
    16. DOI created for each file
    17. Metadata standard based on discipline
    18. Metadata exported as XML
    21. All original data is made public
    22. Data products are made public after 6 months and review
    27. Web access provided through URLs
    30. Data retained forever
    32. Data archived at ORNL
A similar analysis was done for administration of protected data at UNC, including PII, PHI, and PCI data types. A total of 48 tasks were identified, including password strength assessments, detection of the presence of protected data, characterization of the type of protected data, logging of access events, and analysis of audit trails. This indicated that the task list for data management plans is expected to expand as additional types of data are managed.
For each task, we create a computer actionable rule that can be used to automate execution. We use the integrated Rule Oriented Data System rule language to write the rules. The resulting rules are listed below for each task.
7.1 Staffing policies (Policy 48)
The roles needed to implement a data management plan include:
    1. administrator – person making the financial commitment for maintaining the repository
    2. collection manager – person maintaining the properties of the data collection (required metadata and data format standards, collection quality)
    3. data grid administrator – person maintaining the properties of the repository (repository software upgrades, drivers for storage systems, clients)
    4. information technology administrator – person maintaining the storage systems, network, authentication systems.
Typically, at least two persons are needed for each of the data grid and information technology administrator positions. This provides the redundancy needed to ensure access across vacations. The following policy counts the number of data grid administrators for a collection. The policy checks the number of users who can access a specified collection and lists their account names. There are no input variables. No session variables are used. The policy uses persistent state information:
    USER_NAME
    USER_TYPE
The operations that are performed are:
    foreach
    select
    writeLine
The rule is available at http://github.com/DICE-UNC/policy-workbook/dmp-list-admin.r
7.2 Cost reporting (Policy 24)
The cost of managing a data collection includes:
    1. Facility costs for floor space and power
    2. Equipment costs for storage systems, networks, and computer servers
    3. Media costs for tape
    4. Labor costs for operations
    5. Network costs for loading the collection and for collection access
The costs can be distributed across the files in the collection. However, the costs may be proportional to:
    - The number of files
    - The size of the files
    - The amount of metadata
A policy that aggregates costs across these three metrics is listed below. The rule uses the policy functions:
    checkCollInput
    checkRescInput
    createLogFile
    findZoneHostName
    isColl
The input variables are:
    *FacCount   Cost factor per million files
    *FacMeta    Cost factor per million attributes
    *FacSize    Cost factor per Gigabyte
    *Rep        a collection name
    *Res        a storage resource
    *Src        a collection name
The policy uses session variables:
    $rodsZoneClient
The policy uses persistent state information:
    COLL_ID
    COLL_NAME
    DATA_ID
    DATA_SIZE
    META_DATA_ATTR_ID
    RESC_ID
    RESC_NAME
    ZONE_CONNECTION
    ZONE_NAME
The operations that are performed are:
    fail
    foreach
    if
    msiCollCreate
    msiDataObjCreate
    msiGetSystemTime
    msiSplitPathByKey
    select
    writeLine
The rule is available at http://github.com/DICE-UNC/policy-workbook/dmp-cost-report.r
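The aggregation across the three metrics amounts to a weighted sum, which the Python sketch below works through with invented numbers. The function name collection_cost, the sample cost factors, and the reading of a gigabyte as 10**9 bytes are assumptions; the published rule obtains the counts and sizes from iCAT queries.

```python
def collection_cost(num_files, total_bytes, num_attrs,
                    fac_count, fac_size, fac_meta):
    """Aggregate cost from file count, size, and metadata volume.

    fac_count: cost per million files
    fac_size:  cost per gigabyte (taken here as 10**9 bytes, an assumption)
    fac_meta:  cost per million metadata attributes
    """
    return (fac_count * num_files / 1_000_000
            + fac_size * total_bytes / 10**9
            + fac_meta * num_attrs / 1_000_000)


# Invented example: 2 million files, 500 GB, 10 million attributes.
cost = collection_cost(num_files=2_000_000, total_bytes=500 * 10**9,
                       num_attrs=10_000_000,
                       fac_count=100.0, fac_size=0.5, fac_meta=20.0)
print(f"{cost:.2f}")
```

With these sample factors the three terms contribute 200, 250, and 200, so the size of the files dominates only slightly; different factors shift the balance toward file count or metadata volume.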
7.3 Collection creation planning (Policy 45)
Collection creation planning identifies the properties that will be associated with a collection. The properties are driven by assertions that the collection creators will claim about the digital entities, such as provenance, authenticity, quality, completeness. Collection planning also requires the identification of:
    - Mechanisms for ingesting sensor data into a collection
    - Naming conventions assigned to the files
    - Arrangement of files into collection
    - Identification of appropriate provenance metadata
    - Identification of appropriate description metadata
    - Assignment of access controls
    - Identification of procedures for generating derived data products
    - Quality control
The specific policies that automate these tasks depend upon the specific details of the collection formation process and the type of data that are being organized (observational, experimental, simulation, survey). Example policies for collection arrangement might be:
    - Organize by time period. Each month a new subcollection is started.
    - Organize by data type. Separate collections are made for sensor data, simulation data, documents.
    - Organize by contributor.
    - Organize by experiment.
The example policy listed below organizes data files by a time extension. Files are copied from a staging area into subcollections for each year. The rule uses the policy functions:
    checkCollInput
    isColl
The input variables are:
    *Destcoll   a collection name
    *Srccoll    a collection name
No session variables are used. The policy uses persistent state information:
    COLL_ID
    COLL_NAME
    DATA_NAME
The operations that are performed are:
    fail
    foreach
    if
    msiCollCreate
    msiDataObjRename
    msiSplitPathByKey
    select
    writeLine
The rule is available at http://github.com/DICE-UNC/policy-workbook/dmp-stage-time.r
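Arrangement by year can be sketched in Python as a function that derives the destination path from a file name. The naming convention (a four-digit year somewhere in the name), the sample file names, and the function name destination_path are assumptions for illustration; the published rule may parse the time extension differently.

```python
import re

# Assumed convention: the first four-digit year (19xx or 20xx) in the name.
YEAR_PATTERN = re.compile(r"(19|20)\d{2}")


def destination_path(dest_coll, file_name):
    """Return the yearly subcollection path for a staged file, or None."""
    match = YEAR_PATTERN.search(file_name)
    if match is None:
        return None                      # no year found; leave the file in staging
    return f"{dest_coll}/{match.group(0)}/{file_name}"


for name in ["co2_2011-03-15.csv", "co2_2013-01-02.txt", "readme.txt"]:
    print(name, "->", destination_path("/Mauna/home/atmos", name))
```

Returning None for names without a year lets a caller leave such files in the staging area instead of guessing a subcollection.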
7.4 Instrument control (Policy 77)
The control of the data streams from sensors requires identification of how frequently to harvest observational data, how to aggregate the sensor data into files, and how to archive the data streams. As an example, we illustrate the harvesting of sensor data from an external Antelope Real Time System. The planning requires identifying how frequently to harvest, the format to be used to store the data, and how to name the files. The rule harvests 100,000 packets from a specific sensor. The input variables are:
    *Coll       a collection name
    *Loc        a seek address within a file
    *modeln     a model number
    *Offset     a file offset
    *OrbHost    a host address
    *OrbParam   a parameter for a sensor
    *PKTNum     number of packets
    *Resc       flag for file create
    *Sensor     type of sensor
No session variables are used. The policy uses no persistent state information.
The operations that are performed are:
    for
    msiCollCreate
    msiDataObjClose
    msiDataObjCreate
    msiDataObjLseek
    msiDataObjOpen
    msiDataObjWrite
    msiFreeBuffer
    msiOrbClose
    msiOrbDecodePkt
    msiOrbOpen
    msiOrbReap
    msiOrbSelect
    select
    writeLine
The rule is available at http://github.com/DICE-UNC/policy-workbook/dmp-sensor-harvest.r
7.5 Event log for collection formation (Policy 54)
Errors may occur in the sensor data as they are being generated (missing values or bad calibration), when the sensor data are archived (transmission error), and after storage (data corruption). Detection of errors on generation requires analysis of the data stream, tests for values out of range, and tests for missing values. Detection of transmission errors can be handled with network protocols. Detection of errors after storage requires periodic validation of checksums. The following rule verifies the checksums of all files in the account /Mauna/home/atmos. Since the size of the collection is small, the rule does not need to monitor the load on the system. A log file is created that contains a time stamp for when the check was run, and that lists all corrupted files. The rule uses the policy functions:
    checkCollInput
    checkRescInput
    createLogFile
    findZoneHostName
    isColl
The input variables are:
    *Coll   a collection name
    *Res    a storage resource
The session variables are:
    $rodsZoneClient
    $userNameClient
The persistent state information is:
    COLL_ACCESS_COLL_ID
    COLL_ACCESS_USER_ID
    COLL_ID
    COLL_NAME
    DATA_CHECKSUM
    DATA_ID
    RESC_ID
    RESC_NAME
    ZONE_CONNECTION
    ZONE_NAME
The operations that are performed are:
    fail
    foreach
    if
    msiCollCreate
    msiDataObjChksum
    msiDataObjCreate
    msiGetSystemTime
    msiSplitPathByKey
    remote
    select
    writeLine
The rule is available at http://github.com/DICE-UNC/policy-workbook/dmp-validate-chksum.r
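Periodic checksum validation can be pictured with a Python sketch that recomputes a digest for each file, compares it with the registered value, and formats a log with a time stamp. The in-memory dictionaries, the choice of SHA-256, the sample file contents, and the function names are assumptions for illustration; the published rule relies on msiDataObjChksum and the DATA_CHECKSUM attribute in the catalog.

```python
import hashlib
import time


def find_corrupted(files, registered):
    """Compare recomputed checksums with registered values.

    files maps a logical path to its current bytes; registered maps the same
    path to the checksum stored in the catalog. SHA-256 is an assumed choice.
    """
    corrupted = []
    for path in sorted(files):
        current = hashlib.sha256(files[path]).hexdigest()
        if current != registered.get(path):
            corrupted.append(path)
    return corrupted


def log_lines(corrupted, run_time):
    """Format a log with the run time stamp followed by corrupted file names."""
    stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.gmtime(run_time))
    return [f"Checksum validation run at {stamp} UTC"] + corrupted


# Invented sample data: register checksums, then damage one file.
data = {"/Mauna/home/atmos/a.csv": b"400.1\n", "/Mauna/home/atmos/b.csv": b"401.3\n"}
catalog = {p: hashlib.sha256(b).hexdigest() for p, b in data.items()}
data["/Mauna/home/atmos/b.csv"] = b"999.9\n"     # simulate corruption after storage
print("\n".join(log_lines(find_corrupted(data, catalog), time.time())))
```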
7.6 Collection reports (Policy 41)
Information about the collection may include the number of files, the size of the data, the number of metadata values, the usage, when integrity checks were done, the uniformity of metadata across the files, the size distribution, etc. The information may be organized by each sub-collection, or by file type, or by year. Reports are generated by issuing queries to the iCAT catalog and formatting the results. This example policy lists the size of each collection and the number of files that are publicly accessible. The rule uses the policy functions:
checkCollInput
checkRescInput
createLogFile
findZoneHostName
isColl
The input variables are:
*PathColl    a collection name
*Res    a storage resource
The session variables are:
$rodsZoneClient
$userNameClient
The persistent state information is:
COLL_ACCESS_COLL_ID
COLL_ACCESS_USER_ID
COLL_ID
COLL_NAME
DATA_ID
DATA_SIZE
RESC_ID
RESC_NAME
USER_ID
USER_NAME
ZONE_CONNECTION
ZONE_NAME
The operations that are performed are:
fail
foreach
if
msiCollCreate
msiDataObjCreate
msiGetSystemTime
msiSplitPathByKey
remote
select
writeLine
The rule is available at http://github.com/DICE-UNC/policy-workbook/dmp-report.r
7.7 Product formation (Policy 17)
When processing observational data, communities generate three additional classes of data: 1) calibrated data, 2) physical variables, 3) gridded data. The processing steps can be aggregated into a processing pipeline that automatically generates each successive data class. The processing can be applied each time a file is deposited into a known directory, or applied in a batch mode at a remote compute server, or applied at the storage resource. The processing steps can also be captured in a workflow that is registered into the data grid. Each execution of the workflow can be tracked, associating the workflow input with the workflow output. The following rule illustrates processing that is automatically applied each time a file is deposited into a specified collection. In this case a report is amended to add information about each file that is deposited. The policy implements a constraint:
Applied at the acPostProcForPut policy enforcement point
The session variables are:
$objPath
The operations that are performed are:
foreach
if
msiDataObjChksum
msiDataObjOpen
msiDataObjLseek
msiGetSystemTime
msiSplitPath
select
writeLine
The rule is available at http://github.com/DICE-UNC/policy-workbook/acPostProcForPut-report.re
7.8 Data category management (Policy 78)
The categories of data include observational, experimental, simulation, survey, and publications. Different assertions can be made about each type of data. Thus observational data need to be calibrated, converted to physical variables, and mapped to a coordinate system. Experimental data may require additional provenance information that records the details of each experiment. Simulation data need close tracking of simulation version and input files. Publication data may have a release date that depends upon acceptance by a journal. In each case, a set of assertions is made about the data collection, and these are uniformly applied to all deposited files. Similarly to the Product Generation task, data category management can be expressed as a set of processing steps that enforce the assertions. An example policy is the automated application of a processing step on the storage system holding the data. This rule executes an application (called app) stored in the irods/server/bin/cmd directory. Two input arguments are set up for the app, and the temporary files are deleted. The rule uses the policy function:
checkPathInput
The input variables are:
*Cmd    an application command
*outXmlFile    a file path name
*Pathf    a file path name
No session variables are used. The persistent state information is:
COLL_NAME
DATA_ID
DATA_NAME
DATA_PATH
DATA_RESC_NAME
RESC_LOC
The operations that are performed are:
errorcode
errormsg
execCmdArg
fail
foreach
if
msiDataObjPut
msiExecCmd
msiGetStderrInExecCmdOut
msiGetStdoutInExecCmdOut
msiSplitPath
remote
select
time
writeLine
The rule is available at http://github.com/DICE-UNC/policy-workbook/dmp-external-process.r
7.9 Re-using existing data (Policy 79)
A data grid can access files from external repositories. A local copy can be made and used in processing steps. Most repositories provide web services for accessing files. This example rule retrieves a file from a specified URL and stores a copy of the file in the data grid. The input variables are:
*destObj    a file path name
*url    a URL
No session variables are used. No persistent state information is used.
The operations that are performed are:
msiCurlGetObj
writeLine
The rule is available at http://github.com/DICE-UNC/policy-workbook/dmp-get-object-url.r
7.10 Quality control (Policy 80)
Assertions about properties of a collection can be verified by periodically evaluating assessment criteria. The types of properties that can be verified include required metadata, required file type, integrity, distribution, etc.
The example rule takes the metadata attributes defined on a collection and checks that each file in the collection has the same metadata attributes defined. The rule uses the policy function:
checkCollInput
The input variables are:
*Coll    a collection name
No session variables are used. The persistent state information is:
COLL_ID
COLL_NAME
DATA_ID
DATA_NAME
META_COLL_ATTR_NAME
META_DATA_ATTR_ID
META_DATA_ATTR_NAME
The operations that are performed are:
fail
foreach
if
select
writeLine
The rule is available at http://github.com/DICE-UNC/policy-workbook/dmp-metadata-check-coll.r
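The comparison at the heart of this check is a set difference between the attribute names on the collection and the attribute names on each file. A minimal Python sketch follows; the function name, the sample attribute names, and the dictionaries standing in for the query results on META_COLL_ATTR_NAME and META_DATA_ATTR_NAME are hypothetical.

```python
def missing_attributes(collection_attrs, file_attrs):
    """Return {file name: sorted attribute names defined on the collection but not on the file}."""
    required = set(collection_attrs)
    report = {}
    for name in sorted(file_attrs):
        missing = sorted(required - set(file_attrs[name]))
        if missing:
            report[name] = missing
    return report


# Hypothetical query results: attributes on the collection, then attributes per file.
collection_attrs = ["instrument", "station", "units"]
file_attrs = {
    "co2_2015.csv": ["instrument", "station", "units"],
    "co2_2016.csv": ["station"],
}
report = missing_attributes(collection_attrs, file_attrs)
```

Only files that lack at least one attribute appear in the report, so an empty report means the assertion holds for the whole collection.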
7.11 Analysis procedures (Policy 81)
Each time a file is added to the system, a new file version is created. A version of a file can be created by adding a time stamp, and moving the version to an archive directory. This rule processes files in a collection, creating a version of each file that is stored in a destination directory called “SaveVersions”. The rule is called ruleversion.r and is listed in section 4.7.1.
The version number can be inserted in the file name before the extension. This rule parses the file name, identifies an extension, and inserts the time stamp before the extension when the version name is created. The rule is automatically executed within the acPostProcForPut policy enforcement point. Note that access controls have to be set on the versioned file. The rule is called ruleversionfile.r and is listed in section 4.7.1.
The rule “ruleversionfile.r” can be modified to enforce versioning at a Policy Enforcement Point. The following rule is applied every time a file is loaded into the data grid. The policy implements a constraint:
Applied at the acPostProcForPut policy enforcement point
Files are versioned to a specific collection
The session variables are:
$objPath
$rodsZoneClient
$userNameClient
The operations that are performed are:
msiDataObjCopy
msiGetSystemTime
msiSetACL
msiSplitPath
msiSplitPathByKey
The rule is available at http://github.com/DICE-UNC/policy-workbook/acPostProcForPut-version.re
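The construction of a version name, with the time stamp inserted before the extension, can be illustrated with a short Python sketch. The function names, the time stamp format, and the placement of the version collection as a sub-collection of the source collection are assumptions made for the illustration; the workbook rules may differ in each of these.

```python
import os
import time


def version_name(file_name, stamp=None):
    """Insert a time stamp before the extension: data.csv -> data.<stamp>.csv."""
    if stamp is None:
        stamp = time.strftime("%Y%m%d%H%M%S")  # assumed time stamp format
    base, ext = os.path.splitext(file_name)
    return base + "." + stamp + ext


def version_path(obj_path, stamp=None, version_coll="SaveVersions"):
    """Build the path of the versioned copy inside an assumed version sub-collection."""
    coll, name = obj_path.rsplit("/", 1)
    return coll + "/" + version_coll + "/" + version_name(name, stamp)


example = version_path("/Mauna/home/atmos/co2.csv", "20150825120000")
```

A file without an extension simply receives the time stamp as a suffix, which keeps the naming rule uniform across file types.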
7.12 Analysis collaborations (Policy 82)
When collaborations result in multiple persons updating a collection, a change log will be needed to determine when updates have been made to a collection. Two approaches are to analyze audit trails, or to periodically summarize the contents of the collection. A change log summarizes all changes made to the sensor data. The change log can be created by listing all of the files that are in the “/Mauna/home/atmos/version” directory. The rule uses the policy functions:
checkRescInput
createLogFile
findZoneHostName
isColl
The input variables are:
*Res    a storage resource
The session variables are:
$rodsZoneClient
The persistent state information is:
COLL_ID
COLL_NAME
DATA_NAME
RESC_ID
RESC_NAME
ZONE_CONNECTION
ZONE_NAME
The operations that are performed are:
fail
foreach
if
msiCollCreate
msiDataObjCreate
msiGetSystemTime
msiSplitPathByKey
remote
select
writeLine
The rule is available at http://github.com/DICE-UNC/policy-workbook/dmp-report-changes.r
7.13 Data dictionary (Policy 29)
A reserved vocabulary can be implemented for a collection using the HIVE (Helping Interdisciplinary Vocabulary Engineering) system. HIVE maintains an ontology for a discipline, defining relationships between words as well as a standard vocabulary. The descriptive metadata registered on files within a collection can be checked for compliance with the reserved vocabulary. This ensures that well-known terms can be used to query the collection and identify relevant material. An example validation rule utilizes a REST service to iterate over iRODS collections, validating the terms as being valid SKOS references, and generating a report on invalid terms. The rule is called validate-ontologies.r and is listed in section 5.12.1. An example output for when two data objects are annotated, one with an invalid term, is listed below.
test1@ubuntu:~/workspace/rule_workbench$ irule -F validate_data_object_ontologies.r
Metadata validation report
/fedZone1/home/rods/hive/libmsiCurlGetObj.cpp has uri http://purl.org/astronomy/uat#TT888 that is not in a valid ontology
7.14 Naming control (Policy 83)
The ingestion of data into the collection is governed by processes outside of iRODS. If an Antelope Real Time System is being used to manage the sensor data, then micro-services exist to automate the periodic ingestion of sensor records from ARTS into an iRODS collection. The update can be done periodically. Note that the attribute DATA_CREATE_TIME is automatically set each time a file is created, and DATA_MODIFY_TIME is automatically set each time a file is modified. The rule is called dmp-sensor-harvest.r and is listed in section 7.4.
7.15 Data format control (Policy 16)
A check can be made that the data type associated with each sensor data file is .csv. The rule uses the policy function:
checkCollInput
The input variables are:
*Coll    a collection name
No session variables are used. The persistent state information is:
COLL_ID
COLL_NAME
DATA_NAME
DATA_TYPE_NAME
The operations that are performed are:
fail
foreach
if
select
writeLine
The rule is available at http://github.com/DICE-UNC/policy-workbook/dmp-metadata-checkDataType.r
7.16 Unique identifiers (Policy 27)
A Digital Object Identifier can be generated automatically through an extension to the acPostProcForPut rule. The Handle system can use a local handle registry for assigning identifiers to files. The local handle registry, in turn, is assigned a unique identifier in a global handle system. The following rule creates a handle and registers it in the DFC handle server. (The registration of the handle in our handle server indicates that it is available for access from DataONE.) The policy implements a constraint:
Applied at the acPostProcForPut policy enforcement point
The session variables are:
$objPath
The operations that are performed are:
msiGetStdoutInExecCmdOut
msiExecCmd
The rule is available at http://github.com/DICE-UNC/policy-workbook/acPostProcForPut-handle.re
The rule executes a shell script:
#!/bin/bash
if [ "$#" -ne 2 ]; then
    echo "Usage: create_handle <data object id> <data object url>"
    exit 1
fi
OID="$1"
URL="$2"
HANDLE=$(java -classpath ./irods-hs-tools.jar org.irods.dfc.CreateHandle ./admpriv.bin "$URL" "$OID")
echo "$HANDLE"
exit 0;
7.17 Metadata standard (Policy 29)
The metadata attributes that will be created can be specified in a template. Depending upon the sensor data format, the attributes can be parsed from each sensor file and added as metadata on the file. Examples exist for parsing metadata from text files, netCDF files, XML files, etc. Pattern matching operations can be applied to text to extract contextual metadata. A template for pattern matching can be created that defines triplets:
<pre-string-regexp, keyword, post-string-regexp>.
The triplets are read into memory, and then used to search a data buffer. For each set of pre and post regular expressions, the string between them is associated with the specified keyword and can be stored as a metadata attribute on the file. In the example, the template file has the format:
<PRETAG>X-Mailer:</PRETAG>MailerUser<POSTTAG></POSTTAG>
<PRETAG>Date:</PRETAG>SentDate<POSTTAG></POSTTAG>
<PRETAG>From:</PRETAG>Sender<POSTTAG></POSTTAG>
<PRETAG>To:</PRETAG>PrimaryRecipient<POSTTAG></POSTTAG>
<PRETAG>Cc:</PRETAG>OtherRecipient<POSTTAG></POSTTAG>
<PRETAG>Subject:</PRETAG>Subject<POSTTAG></POSTTAG>
<PRETAG>Content-Type:</PRETAG>ContentType<POSTTAG></POSTTAG>
The end tag is actually a "return" for unix systems, or a "carriage-return/line feed" for Windows systems. The example rule reads a text file into a buffer in memory, reads in the template file that defines the regular expressions, and then parses the text in the buffer to identify the presence of a desired metadata attribute. The rule is called rulemetaload.r and is listed in section 4.6.3.
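The triplet matching can be sketched in Python with the standard re module. This is an illustration under stated assumptions rather than the logic of rulemetaload.r: the function names are hypothetical, the template is parsed with a simple regular expression, and an empty post tag is treated as "end of line" so that both unix and Windows line endings are accepted.

```python
import re

# One template entry: <PRETAG>pre</PRETAG>keyword<POSTTAG>post</POSTTAG>
TEMPLATE_ENTRY = re.compile(
    r"<PRETAG>(.*?)</PRETAG>(.*?)<POSTTAG>(.*?)</POSTTAG>", re.DOTALL
)


def load_triplets(template_text):
    """Parse the template into (pre regexp, keyword, post regexp) triplets."""
    return [
        (pre, keyword.strip(), post)
        for pre, keyword, post in TEMPLATE_ENTRY.findall(template_text)
    ]


def extract_metadata(buffer, triplets):
    """Return {keyword: text found between the pre and post expressions}."""
    found = {}
    for pre, keyword, post in triplets:
        # Assumption: an empty post tag means "up to the end of the line".
        end = post if post.strip() else r"\r?$"
        match = re.search(pre + r"(.*?)" + end, buffer, re.MULTILINE)
        if match:
            found[keyword] = match.group(1).strip()
    return found


template = (
    "<PRETAG>Date:</PRETAG>SentDate<POSTTAG></POSTTAG>\n"
    "<PRETAG>From:</PRETAG>Sender<POSTTAG></POSTTAG>\n"
    "<PRETAG>Cc:</PRETAG>OtherRecipient<POSTTAG></POSTTAG>\n"
)
message = "From: alice@example.org\nDate: Tue, 25 Aug 2015 10:00:00\nSubject: readings\n"
metadata = extract_metadata(message, load_triplets(template))
```

Keywords whose pre-string does not occur in the buffer (here the Cc: entry) are simply omitted, so the result lists only the attributes that can be registered on the file.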
7.18 Metadata export (Policy 84)
The descriptive metadata that are registered on each file can be extracted and written as an XML file. This rule creates an XML metadata file for each file in the /Mauna/home/atmos/sensor directory. The following structure is used:
<?xml version="1.0"?>
<catalog>
  <File path="COLL_NAME/DATA_NAME">
    <META_DATA_ATTR_NAME>META_DATA_ATTR_VALUE</META_DATA_ATTR_NAME>
  </File>
</catalog>
The name of the metadata file is created by appending .xml to the name of the sensor data file. The rule uses the policy functions:
checkCollInput
checkRescInput
findZoneHostName
The input variables are:
*Relcoll    a relative collection name
*Res    a storage resource
The session variables are:
$rodsZoneClient
$userNameClient
The persistent state information is:
COLL_ID
COLL_NAME
DATA_NAME
META_DATA_ATTR_NAME
META_DATA_ATTR_VALUE
RESC_ID
RESC_NAME
ZONE_CONNECTION
ZONE_NAME
The operations that are performed are:
fail
foreach
if
msiDataObjClose
msiDataObjCreate
msiSplitPathByKey
remote
select
writeLine
The rule is available at http://github.com/DICE-UNC/policy-workbook/dmp-createXML.r
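The XML structure can be generated with a few lines of Python using the standard xml.etree.ElementTree module. The sketch below is illustrative only: the function names and sample values are hypothetical, and it assumes that every metadata attribute name is a valid XML element name.

```python
import xml.etree.ElementTree as ET


def metadata_xml(coll_name, data_name, avus):
    """Build the catalog/File document for one file from (attribute name, value) pairs."""
    catalog = ET.Element("catalog")
    file_el = ET.SubElement(catalog, "File", {"path": coll_name + "/" + data_name})
    for attr_name, attr_value in avus:
        # Assumption: attr_name is usable as an XML element name.
        ET.SubElement(file_el, attr_name).text = attr_value
    return '<?xml version="1.0"?>\n' + ET.tostring(catalog, encoding="unicode")


def metadata_file_name(data_name):
    """The metadata file name is the data file name with .xml appended."""
    return data_name + ".xml"


doc = metadata_xml(
    "/Mauna/home/atmos/sensor",
    "co2.csv",
    [("station", "Mauna Loa"), ("units", "ppm")],
)
```

Building the document through an XML library rather than string concatenation ensures that special characters in metadata values are escaped.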
7.19 Collection creation system (Policy 85)
The data management plan should include information about the system that will be used to assemble the collection. This may be different from the system used to archive the collection. A collaboration environment facilitates collection creation. Each collaborating person is given an account, and permissions are set to allow deposition of files into the shared collection. This requires:
- Creating a shared collection name. This may be a separate account in the data grid.
- Setting write access controls on the shared collection. This may be done by creating a user group that is allowed to update the collection.
- Defining the desired naming convention for the files. This may require renaming each file as it is deposited.
- Defining the required provenance and descriptive metadata needed for each file. This may require extraction of header information from each file.
The following policy lists the names of the persons in each group that can update the shared collection. The rule uses the policy function:
checkCollInput
The input variables are:
*Coll    a collection name
No session variables are used. The persistent state information is:
COLL_ACCESS_COLL_ID
COLL_ACCESS_TYPE
COLL_ACCESS_USER_ID
COLL_ID
COLL_NAME
TOKEN_ID
TOKEN_NAME
TOKEN_NAMESPACE
USER_GROUP_ID
USER_ID
USER_NAME
The operations that are performed are:
fail
foreach
if
select
writeLine
The rule is available at http://github.com/DICE-UNC/policy-workbook/dmp-metadata-check-group.r
7.20 Collection size (Policy 86)
The total size of the collection can be found by querying the iCAT catalog. The total size should include the storage space for replicas, the storage space for intermediate products, and the storage space for published results. The example policy takes as input a collection name. The rule uses the policy functions:
checkCollInput
checkRescInput
createLogFile
findZoneHostName
isColl
The input variables are:
*Coll    a collection name
*PathColl    a collection name
*Res    a storage resource
The session variables are:
$rodsZoneClient
$userNameClient
The persistent state information is:
COLL_ID
COLL_NAME
DATA_ID
DATA_SIZE
RESC_ID
RESC_NAME
ZONE_CONNECTION
ZONE_NAME
The operations that are performed are:
fail
foreach
if
msiCollCreate
msiDataObjCreate
msiGetSystemTime
msiSplitPathByKey
remote
select
writeLine
The rule is available at http://github.com/DICE-UNC/policy-workbook/dmp-report-size.r
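The size calculation amounts to summing DATA_SIZE over every replica of every file in the collection and its sub-collections. The Python sketch below illustrates this; the function name and the row layout (collection name, file name, replica number, size) are assumptions standing in for the rows returned by a catalog query.

```python
def collection_size(rows, coll):
    """Sum the sizes of all replicas stored in coll or any of its sub-collections.

    rows: iterable of (COLL_NAME, DATA_NAME, replica number, DATA_SIZE) tuples,
    one per replica, as a catalog query might return them (assumed layout).
    """
    top = coll.rstrip("/")
    prefix = top + "/"
    total = 0
    for coll_name, _data_name, _replica, size in rows:
        if coll_name == top or coll_name.startswith(prefix):
            total += size
    return total


# Hypothetical catalog rows; a.csv has two replicas, so it is counted twice.
rows = [
    ("/Mauna/home/atmos/sensor", "a.csv", 0, 100),
    ("/Mauna/home/atmos/sensor", "a.csv", 1, 100),
    ("/Mauna/home/atmos/derived", "b.csv", 0, 40),
    ("/Mauna/home/atmosphere", "d.csv", 0, 9),
]
total = collection_size(rows, "/Mauna/home/atmos")
```

Matching on the collection name followed by a slash keeps a sibling collection such as /Mauna/home/atmosphere out of the total.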
7.21 Publication of original data (Policy 87)
A standard approach is to place the restricted access data in a collection, create user groups for allowed users, and restrict access to just the allowed user groups. There are three types of data managed by the Mauna Loa project: sensor data, derived data products, and research data. These can be handled by creating three collections:
/Mauna/home/atmos/sensor
/Mauna/home/atmos/derived
/Mauna/home/atmos/research
We will turn on inheritance in each collection, and set the access controls at the collection level. Public access is specified for all sensor data for the Mauna Loa data. In the iRODS data grid, public access is through the “anonymous” account. We turn on inheritance on the “sensor” data collection and give access to the “anonymous” account. The rule uses the policy function:
checkCollInput
The input variables are:
*RelativeCollection    a relative collection name
The session variables are:
$rodsZoneClient
$userNameClient
The persistent state information is:
COLL_ID
COLL_NAME
The operations that are performed are:
fail
foreach
if
msiSetACL
select
writeLine
The rule is available at http://github.com/DICE-UNC/policy-workbook/dmp-set-public.r
7.22 Publication of data products (Policy 88)
The time periods for holding data proprietary varied across the DMPs, and examples included 6 months, 2 years, until project end, until project review, and until research publication. For the Mauna Loa data, all derived data will be held private until a six month period has elapsed. At the end of this period we change the read access to public. The rule uses the policy function:
checkCollInput
The input variables are:
*RelativeCollection    a relative collection name
*Acl    an access control
The session variables are:
$rodsZoneClient
$userNameClient
The persistent state information is:
COLL_ID
COLL_NAME
DATA_ACCESS_DATA_ID
DATA_ACCESS_TYPE
DATA_ACCESS_USER_ID
DATA_CREATE_TIME
DATA_ID
DATA_NAME
The operations that are performed are:
fail
foreach
if
msiGetSystemTime
msiSetACL
select
writeLine
The rule is available at http://github.com/DICE-UNC/policy-workbook/dmp-proprietary-change.r
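The selection of files whose proprietary period has elapsed reduces to comparing each creation time with the current time. A minimal Python sketch is given below; the function name, the sample files, and the approximation of six months as 183 days are assumptions, and the sketch only selects the files without changing any access control.

```python
from datetime import datetime, timedelta

# Assumption: the six month proprietary period is approximated as 183 days.
PROPRIETARY_PERIOD = timedelta(days=183)


def ready_for_release(create_times, now):
    """Return the sorted names of files created at least one proprietary period ago.

    create_times: file name -> creation time (stand-in for DATA_CREATE_TIME)
    """
    return sorted(
        name
        for name, created in create_times.items()
        if now - created >= PROPRIETARY_PERIOD
    )


# Hypothetical derived data products and their creation times.
create_times = {
    "monthly_mean_2014.csv": datetime(2015, 1, 1),
    "monthly_mean_2015.csv": datetime(2015, 6, 1),
}
release = ready_for_release(create_times, datetime(2015, 8, 25))
```

In a data grid the returned names would then have their read access changed to public, which is the step the rule performs with msiSetACL.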
7.23 Re-use policies (Policy 89)
Collection re-use occurs when the collection is subsumed into another digital library, or processed through a new data processing pipeline, or archived at another site. Depending upon the type of data, re-use may entail multiple requirements:
- Access permission. All proprietary or confidential data require negotiation of access agreements. This may require anonymization of data files, or encryption of data files, or creation of access controls.
- Descriptive metadata. The context associated with each file is represented by a standard metadata schema. Re-use may require mapping from the chosen standard to another metadata schema. The HIVE technology provides the ability to map between ontologies to simplify this process.
- Integrity checks. Integrity should be verified on each shared data object. This implies the community that is re-using the data can verify checksums on each file.
- Policy-encoded objects. The policies that govern access and processing of a digital object can be encapsulated with the digital object. If these policies are automatically loaded into a controlling rule engine when the digital object is used, control can be maintained even when the digital object is re-used. The implementation will require:
  o Encryption of the digital object.
  o Negotiation between the institution that is re-using the digital object and the original repository for the encryption key.
  o Verification that the re-use institution is capable of enforcing the policies.
  o Extraction of the associated policies and their reloading into a re-use rule engine.
- Preservation of Digital Object Identifiers. The metadata used to identify the digital objects should be preserved by the re-use institution.
- Provenance trail. Digital objects that are derived from the original data should include metadata that denotes the source and the transformations that were applied to the original data. The transformations can be encapsulated in workflows that can be registered into the repository along with identifiers for the input files and the output files.
The implementation of these policies depends upon the technology used by the re-use institution. If data grid technology is used, many of these requirements may be implemented through federation of the original data grid and the re-use data grid.
7.24 Distribution policies (Policy 90)
Researchers prefer to have a local copy of the data sets they are analyzing. This minimizes latency in processing pipelines, ensures access, and enables tracking of versions of the data without disrupting the original collection. Distribution policies may be defined to:
- Cache data on a resource at a remote institution.
- Control which data sets may be re-used.
- Automate generation of copies at the remote site when files are added to a collection.
- Distribute files across institutions depending upon the type of data. An example is the distribution of sensor data to the institution that is working with a particular sensor.
- Apply transformative migration as the data sets are distributed to ensure the appropriate format is provided.
- Distribute workflows that can be used to process the data sets.
- Distribute applications within Docker virtual environment images that can be used to analyze the data sets.
- Distribute the descriptive metadata either as an XML file, or a CSV file, or a JSON file.
The following policy generates a JSON file containing the descriptive metadata for the data files in a collection. For each file, a JSON file is put into a sub-directory called “Metadata”. The rule uses the policy functions:
checkCollInput
checkRescInput
findZoneHostName
isColl
The input variables are:
*Coll    a collection name
*Res    a storage resource
The session variables are:
$rodsZoneClient
$userNameClient
The persistent state information is:
COLL_ID
COLL_NAME
DATA_NAME
META_DATA_ATTR_UNITS
META_DATA_ATTR_NAME
META_DATA_ATTR_VALUE
RESC_ID
RESC_NAME
ZONE_CONNECTION
ZONE_NAME
The operations that are performed are:
fail
foreach
if
msiCollCreate
msiDataObjClose
msiDataObjCreate
msiSplitPathByKey
remote
select
writeLine
The rule is available at http://github.com/DICE-UNC/policy-workbook/dmp-json.r
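The JSON export can be illustrated with the standard Python json module. In this sketch the function names, the key names inside the JSON record, and the convention of naming the output file by appending .json are all assumptions; only the “Metadata” sub-directory comes from the description above.

```python
import json


def metadata_json(coll_name, data_name, avus):
    """Serialize the (attribute name, value, units) triples of one file as JSON."""
    record = {
        "path": coll_name + "/" + data_name,
        "metadata": [
            {"attribute": name, "value": value, "units": units}
            for name, value, units in avus
        ],
    }
    return json.dumps(record, indent=2)


def json_path(coll_name, data_name):
    """Place the JSON file in the Metadata sub-directory (file naming is assumed)."""
    return coll_name + "/Metadata/" + data_name + ".json"


text = metadata_json(
    "/Mauna/home/atmos/sensor",
    "co2.csv",
    [("concentration", "400.1", "ppm"), ("station", "Mauna Loa", "")],
)
```

Keeping the units alongside each name and value preserves the full attribute-value-unit triple that the catalog stores for each file.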
7.25 Privacy access restrictions (Policy 14)
There are no restrictions on access for the Mauna Loa sensor data. Typical access restrictions for other DMPs include Institutional Review Board, proprietary data, and copyright. As before, the restrictions can be enforced by placing restricted data in a collection, creating user groups for the allowed users, and only permitting allowed groups to access the data. A standard task is to verify that the access controls have been set correctly. The rule uses the policy functions:
checkUserInput
contains
findZoneHostName
The input variables are:
*Group    a group name
The session variables are:
$rodsZoneClient
$userNameClient
The persistent state information is:
COLL_NAME
DATA_ACCESS_DATA_ID
DATA_ACCESS_TYPE
DATA_ACCESS_USER_ID
DATA_ID
DATA_SIZE
TOKEN_ID
TOKEN_NAME
TOKEN_NAMESPACE
USER_ID
USER_NAME
USER_ZONE
ZONE_CONNECTION
ZONE_NAME
The operations that are performed are:
fail
foreach
if
msiSplitPathByKey
remote
select
strlen
writeLine
The rule is available at http://github.com/DICE-UNC/policy-workbook/dmp-group-access.r
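One way to verify that access controls have been set correctly is to list every access entry granted to a name outside the allowed groups. The Python sketch below shows that check; it is an assumption about how such a verification might be organized, not a transcription of dmp-group-access.r, and the function name, group names, and row layout are hypothetical.

```python
def unexpected_access(acl_rows, allowed_groups):
    """Return the access entries whose user or group is not in the allowed set.

    acl_rows: iterable of (file path, user or group name, access type) tuples,
    an assumed flattening of the DATA_ACCESS_* and USER_NAME catalog columns.
    """
    allowed = set(allowed_groups)
    return sorted(row for row in acl_rows if row[1] not in allowed)


# Hypothetical access entries for a restricted collection.
acl_rows = [
    ("/Mauna/home/atmos/research/r1.csv", "atmos-research", "read object"),
    ("/Mauna/home/atmos/research/r1.csv", "anonymous", "read object"),
    ("/Mauna/home/atmos/research/r2.csv", "atmos-research", "modify object"),
]
violations = unexpected_access(acl_rows, ["atmos-research"])
```

An empty result indicates that only the allowed groups hold access on the restricted files.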
7.26 IPR restrictions (Policy 91)
We assume that files deposited into the research directory have been published. To ensure public access, we only need to set inheritance on the directory for the “anonymous” account. This can be done as shown for Task 1. This rule uses the policy function:
checkCollInput
The input variables are:
*Acl    an access control
*RelativeCollection    a relative collection name
*User    a user name
The session variables are:
$rodsZoneClient
$userNameClient
The persistent state information is:
COLL_ID
COLL_NAME
The operations that are performed are:
fail
foreach
if
msiSetACL
select
writeLine
The rule is available at http://github.com/DICE-UNC/odum-inherit.r
A more sophisticated rule would check for a metadata flag that specifies that publication has been done. This rule checks whether the value of a “PUBLICATION” flag is set to 1, and then provides public access. The rule uses the policy functions:
addAVUMetadata
checkCollInput
deleteAVUMetadata
The input variables are:
*Coll    a collection name
The session variables are:
$rodsZoneClient
$userNameClient
The persistent state information is:
COLL_ID
COLL_NAME
DATA_ACCESS_DATA_ID
DATA_ACCESS_USER_ID
DATA_ID
DATA_NAME
META_DATA_ATTR_NAME
META_DATA_ATTR_VALUE
USER_ID
USER_NAME
The operations that are performed are:
fail
foreach
if
msiRemoveKeyValuePairsFromObj
msiSetACL
msiSetAVU
msiString2KeyValPair
select
writeLine
The rule is available at http://github.com/DICE-UNC/dmp-publication.r
7.27 Web access policies (Policy 92)
A standard approach across the DMPs is to provide a persistent URL for accessing data sets. Within the iRODS data grid, either a URL can be created for public access, or a ticket can be created that defines a persistent URL, defines access controls, and also defines the time period over which the ticket is valid. Any person holding the ticket is allowed access to the data set. Tickets can be created by a web client, or can be created by running the iticket iCommand. A rule can be created to list tickets used within a collection. The rule uses the policy function:
checkCollInput
The input variables are:
*Coll    a collection name
The session variables are:
$rodsZoneClient
$userNameClient
The persistent state information is:
COLL_ID
COLL_NAME
TICKET_DATA_COLL_NAME
TICKET_EXPIRY
TICKET_ID
The operations that are performed are:
fail
foreach
if
select
writeLine
The rule is available at http://github.com/DICE-UNC/dmp-list-tickets.r
7.28 Data sharing system (Policy 93)
The choice of the data management system for sharing or publishing the data products depends on the type of data product. Most DMPs use GitHub to publish code, a database to publish information, and a data repository to publish data. In each of these cases, the data sets are typically publicly accessed. For finer grain access control, a digital repository or data grid is chosen. The data sharing system should provide the following capabilities:
- Collection hierarchy. This is needed to separate the generation of data from the publication of data.
- Access controls. Usually intermediate data products are not released to the public. Derived data products are usually held proprietary until they are verified for quality.
- Support for distributed data. Data products may be located at multiple sites and should be managed by the data sharing system.
7.29 Code distribution system (Policy 94)
The distribution of code may be done through an open source code repository such as GitHub, or through a web site, or even through a data repository. The major challenges are the management of versions, the development of documentation, and unit testing to verify all updates.
7.30 Retention period (Policy 21)
The retention period for the data products is usually measured in years. A challenge, then, is how to show that the data products were retained for the required length of time. One approach is to turn off deletion on the data collection. The policy implements a constraint:
Applied at the acDataDeletePolicy policy enforcement point
The operations that are performed are:
msiDeleteDisallowed
The rule is available at https://github.com/DICE-UNC/policy-workbook/blob/master/acDataDeletePolicy-collection.re
This prohibits deletion even by an administrator. The files in the collection can then be checked for whether their retention period has been passed. The rule to check retention period uses the policy function:
checkCollInput
The input variables are:
*Coll    a collection name
No session variables are used. The persistent state information is:
COLL_ID
COLL_NAME
DATA_EXPIRY
DATA_NAME
The operations that are performed are:
fail
foreach
if
msiGetSystemTime
select
writeLine
The rule is available at http://github.com/DICE-UNC/dmp-check-retention.r
7.31 Curation plans (Policy 95)
Curation activities include:
- Validation of descriptive metadata
- Validation of provenance metadata
- Setting of access controls
- Verification of data formats
The curation policies can be registered into the iCAT catalog. The policies can then be retrieved from the catalog and published as a report. The example policy lists all of the policies that are being enforced at policy-enforcement points within the iRODS data grid. The rule uses the policy functions:
checkCollInput
checkRescInput
createLogFile
findZoneHostName
isColl
The input variables are:
*Coll    a collection name
*Res    a storage resource
The session variables are:
$rodsZoneClient
The persistent state information is:
COLL_ID
COLL_NAME
RESC_ID
RESC_NAME
ZONE_CONNECTION
ZONE_NAME
The operations that are performed are:
fail
foreach
if
msiAdmShowIRB
msiCollCreate
msiDataObjClose
msiDataObjCreate
msiDataObjWrite
msiGetSystemTime
msiSplitPathByKey
remote
select
writeLine
The rule is available at http://github.com/DICE-UNC/dmp-pepRules.r
7.32 Archive system (Policy 96)
For long term storage, a deposition will be required into the remote archive. If two data grids are federated, then a rule can be run to archive all files from a selected collection into the remote storage location. The rule uses the policy functions:
checkCollInput
checkRescInput
createLogFile
findZoneHostName
isColl
The input variables are:
*Acct    a user name
*Dest    a collection name in *DestZone
*DestZone    a zone name
*Res    a storage resource
*Src    a collection name
The session variables are:
$rodsZoneClient
The persistent state information is:
COLL_ID
COLL_NAME
DATA_CHECKSUM
DATA_NAME
RESC_ID
RESC_NAME
ZONE_CONNECTION
ZONE_NAME
The operations that are performed are:
fail
foreach
if
msiCollCreate
msiDataObjChksum
msiDataObjCopy
msiDataObjCreate
msiGetSystemTime
msiSetACL
msiSplitPathByKey
remote
select
strlen
substr
writeLine
The rule is available at http://github.com/DICE-UNC/dmp-archive.r
7.33 Replication policy (Policy 13)
The number of replicas can be verified for each file in a collection. This rule lists all files for which the required number of replicas is not available. The rule uses the policy functions:
checkCollInput
checkRescInput
createLogFile
findZoneHostName
isColl
The input variables are:
*Coll    a collection name
*Numrep    number of replications
*Res    a storage resource
The session variables are:
$rodsZoneClient
$userNameClient
The persistent state information is:
COLL_ID
COLL_NAME
DATA_ID
DATA_NAME
The operations that are performed are:
fail
foreach
if
msiSetACL
select
writeLine
The rule is available at http://github.com/DICE-UNC/odum-check-replicas.r
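The replica check counts the catalog rows for each file and reports the files that fall short of the required number. A minimal Python sketch follows; the function name, the sample resource names, and the row layout (one row per replica) are assumptions standing in for the result of a catalog query.

```python
from collections import Counter


def under_replicated(replica_rows, required):
    """Return the sorted paths of files that have fewer than `required` replicas.

    replica_rows: iterable of (COLL_NAME, DATA_NAME, resource name) tuples,
    one per replica (assumed layout).
    """
    counts = Counter((coll, name) for coll, name, _resource in replica_rows)
    return sorted(
        coll + "/" + name
        for (coll, name), count in counts.items()
        if count < required
    )


# Hypothetical replica rows: a.csv has two replicas, b.csv has one.
replica_rows = [
    ("/Mauna/home/atmos/sensor", "a.csv", "resc1"),
    ("/Mauna/home/atmos/sensor", "a.csv", "resc2"),
    ("/Mauna/home/atmos/sensor", "b.csv", "resc1"),
]
short = under_replicated(replica_rows, 2)
```

With a required count of 2, only b.csv is listed, which corresponds to the files the rule writes to its report.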
7.34 Backup policy (Policy 97)
The time period between backups can be set by specifying a periodic rule execution for archiving data. We can turn the rule specified for Task 18 into a periodic rule that is executed every 7 days. The rule uses the policy functions:
checkCollInput
checkRescInput
createLogFile
findZoneHostName
isColl
isData
The input variables are:
*Acct    a user name
*Dest    a collection name
*DestZone    a zone name
*Res    a storage resource
*Src    a collection name
The session variables are:
$rodsZoneClient
$userNameClient
The persistent state information is:
COLL_ID
COLL_NAME
DATA_CHECKSUM
DATA_ID
DATA_NAME
RESC_ID
RESC_NAME
ZONE_CONNECTION
ZONE_NAME
The operations that are performed are:
delay
fail
foreach
if
msiCollCreate
msiDataObjChksum
msiDataObjCopy
msiDataObjCreate
msiGetSystemTime
msiSetACL
msiSplitPathByKey
remote
select
strlen
substr
writeLine
The rule is available at http://github.com/DICE-UNC/dmp-periodic-backup.r
7.35 Integrity verification (Policy 18)
Integrity checks should be performed periodically to catch failure modes such as media failure, storage system failure, data overwrites, operator error, etc. Even if both the hardware and software perform flawlessly, it is still possible for an operator error to delete or overwrite a file. The replication rule is turned into a rule that is executed every year. A production capable version of the rule is shown that is restartable, monitors the execution rate, checks the input variables, maintains a log file of all actions, repairs corrupted files, and replaces missing replicas. In the “delay” command, the execution frequency for repeating the rule needs to be set. An example for a test every 6 months would be:
delay("<PLUSET>1s</PLUSET><EF>6m</EF>") {
The rule is named “rda-replication-rule.r” and is listed in section 4.5.2. This rule uses the policy functions:
checkCollInput
checkRescInput
createLogFile
checkMetaExistsColl
findZoneHostName
getNumSizeColl
getRescColl
isColl
selectRescUpdate
createReplicas
updateCollMeta
7.36 Technology management policies (Policy 49)
The izonereport command lists the properties of the data grid, including both the iCAT catalog and storage servers. Updates about software versions and hardware versions can be tracked by periodically running the izonereport. The report includes information about micro-service plugins, policies, and storage systems.
7.37 Metadata catalog management (Policy 9)
The metadata catalog, iCAT, contains all of the state information for the data grid. To minimize risk, the metadata catalog should be replicated. Periodic backup dumps of the catalog should be saved outside of the data grid. The data grid uses schema indirection to store descriptive and provenance metadata attributes. Once a standard schema is chosen, the schema can be installed as a HIVE ontology. A rule can then be run to compare the descriptive metadata for each file with the standard schema. An example rule is called validate-ontologies.r and is listed in section 5.12.1.
7.38 Transformative migration (Policy 15)
The migration of data formats to new technology is supported through invocation of external transformation systems, such as NCSA Polyglot and Brown Dog. Access to these systems is invoked through a micro-service that issues http post and get commands. Examples for invoking external services are listed in sections 5.13.1 (acPostProcForModifyAVUMetadata.r), 6.27 (hipaa-issue-url.r), and 7.9 (dmp-get-object-url.r).
8 Verifying Policy Sets: Toverifyatheoryofpolicy‐baseddatamanagement,agenericcharacterizationofdatamanagementsystemsisneeded.Tobasethediscussiononwell‐knownconcepts,considerthecharacterizationoffilesystemsshowninFigure2.Thefilesystemcomprisesanenvironmentthatisdefinedbythestateinformationmaintainedabouteachfile.Interactionswiththefilesystemconsistofeventsthatspecifyanoperation.Eachoperationmanipulatesafileandchangestheassociatedstateinformation.Operationsmayrequireaccesstostateinformationsuchasfilelocation,orfilesize,orfileowner.Ifthestateinformationisconsistentlyupdatedoneachoperationappliedtofileswithinthefilesystem,theenvironmentcanhavepropertiessuchascompleteness,consistency,correctness,andclosure.Thesepropertiesdescribefouressentialelementsofdatamanagement:
1)Whatarethebasicbuildingblocksforcomposingprocedures?2)Whataretheconstraintsforprocedureauthoringanddeployment?3)Howareproceduresimplemented?4)Howistheoutputofprocedureshandled?
Completeness means that all operations for each managed file type are supported. Consistency means that there are no conflicting procedures. Correctness means that a given operation performs without error. Closure means that operations on files will generate files that are members of the system.

We can evaluate the properties of completeness, consistency, correctness, and closure by analyzing changes to the state information. Typical file system state information is listed in Table 2. The operations performed upon the file system may consist of create, open, close, read, write, update, seek, stat, chown, link, and unlink. An operation may be applied to a file or to a group of files. Interactions with the files are done through interactive execution of clients, which invoke the desired operation through a system call. This approach makes it possible to implement a standard data management approach on different types of hardware systems, which in turn enables the migration of files across storage systems.

Figure 2. File System Characterization

Table 2. File System State Information
File Name
File Location on disk
Creation time
Modification time
File size
Access control
Locks
Soft Link
Directory

We can generalize this model of data management by introducing policies that control the operations performed within the system. In Figure 3, we introduce three significant changes:
• Operations are replaced by policies.
• Files are replaced by objects.
• Updates on objects and on state information are implemented as procedures.
A given event may invoke multiple policies. Each policy controls the execution of a procedure that chains together multiple operations expressed as micro-services. The objects manipulated by the policies can include resources, users, digital objects, micro-services, rules, metadata, and the properties of the environment itself. For example, consider the addition of a file to the system. Even though the explicit event is a simple file addition, the response of the system may require the execution of multiple policies, with each policy potentially executing procedures that manipulate multiple types of objects. Policies that are executed may include:
1. Authentication of the person adding the file
2. Authorization for the addition of a file
3. Evaluation of a storage quota for the storage resource
4. Creation of a logical name for the file
5. Logical arrangement of the file as a member of a collection
6. Physical aggregation of the file into a container
7. Selection of a storage resource for the physical copy of the file
8. Creation of a physical file name on the storage resource
9. Inheritance of access controls from the collection access controls
10. Creation of a checksum
11. Replication of the file to a second storage location
12. Assignment of a retention period for the file
13. Assignment of a data type to the file based on the file extension
14. Storage of system level metadata (owner name, access controls, checksum, file size, replica location, retention period, file type)
15. Extraction and storage of descriptive metadata
16. Creation of an archival information package (aggregating metadata with the file)
17. Storage of the file

Figure 3. Policy-based Data Management

The response of the system is controlled by the policies that are enforced within the environment. A notable challenge is that policy-based data management systems have the ability to change the controlling policies, and therefore change the response of the system to external events. A process for validating the properties of the environment is needed to verify either that the new policies are compatible with prior policies and the properties of the environment have not changed, or that the impact of the new policies can be defined as approved changes to state information.

We can characterize interactions with the data management system in terms of the allowed events. Events may be initiated interactively by external users, by time-based procedures, or by changes of state information. In policy-based data management, events are detected at policy-enforcement points, which control the selection of policies that should be applied. The policies in turn control the execution of procedures that read/create/update state information and modify the objects in the system. Policies invoked at policy-enforcement points control how the environment responds to events. A mapping between events, the policy-enforcement points, the policies, the procedures, and associated changes to state information is necessary to describe the environment. If all changes to state information can be identified for all events, then the properties of the environment can be verified. We can build a characterization of a data management system in terms of the following concepts:
1. Events invoked by users of the system
   a. Create, modify, delete, access
2. Entities that are managed by the system
   a. Users, digital objects, resources (storage, compute), metadata, rules, micro-services, environment framework
3. Policies that control assertions about the environment
   a. Properties associated with each type of entity (provenance metadata, access control, audit trail, aggregation, retention period)
   b. Properties controlling environment operations (number of processing threads, number of I/O streams, choice of physical path name)
We can verify a theory of policy-based data management by analyzing the consistency, completeness, correctness, and closure of the state information after application of every supported event. To do this we will need to define the set of policies that are invoked by each event. For each policy we will need to define the procedures that are invoked, and the set of state information variables that are modified by each procedure. Note that procedures are composed by chaining together micro-services. We can then identify the sets of state information generated or modified by each micro-service. A verification policy can be defined that validates that the revised state information is consistent with the desired collection properties.

This approach can be applied for each data management domain (data sharing, digital library, preservation, processing pipeline) by analyzing the controlling policies and procedures. The results are domain dependent. An analysis needs to be done for each domain and for each change to the set of policies. However, the approach is generic, and the underlying infrastructure that is used to implement the policy-based data management is generic.

In a distributed environment that encompasses multiple storage locations, multiple network paths, and multiple administrative domains, correctness cannot be guaranteed. A storage system may have a media failure and corrupt the data bits. A network may become unavailable and a transfer may not complete. A remote administrator may choose to perform maintenance and take an entire system offline. This implies that the environment needs to be able to detect inconsistencies, and use periodic policies to correct the problems.

A simple example is the management of integrity. A standard approach is to generate a checksum for each file, and replicate the file across multiple storage systems. A policy can be executed periodically that verifies the integrity of each file by comparing the current checksum with the stored value. When a corrupted file is found, the system can delete the corrupted file, create a new replica from an uncorrupted copy, update the system metadata, and log the event.

A goal in a policy-based data management system is to implement policies that verify the desired properties of the environment, and that implement recovery procedures as needed to ensure compliance. An extended goal is to implement policies that ensure that desired properties are maintained as the environment evolves.
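The integrity example can be sketched outside of iRODS in a few lines of Python. This is only an illustration of the detect-and-repair idea under stated assumptions: replicas are modeled as an in-memory dictionary from a hypothetical resource name to the stored bytes, SHA-256 stands in for whatever checksum the data grid records, and the names checksum and verify_and_repair are invented for this sketch.

```python
import hashlib


def checksum(data: bytes) -> str:
    """Return the SHA-256 digest of a replica's bytes as a hex string."""
    return hashlib.sha256(data).hexdigest()


def verify_and_repair(replicas: dict, stored_checksum: str) -> list:
    """Compare each replica with the stored checksum and repair bad copies.

    replicas maps a (hypothetical) resource name to the bytes held there.
    Corrupted replicas are overwritten from the first uncorrupted copy,
    and one log entry is returned per repair.
    """
    good = [name for name, data in replicas.items()
            if checksum(data) == stored_checksum]
    log = []
    if not good:
        log.append("no uncorrupted copy found; manual recovery required")
        return log
    source = good[0]
    for name in replicas:
        if name not in good:
            replicas[name] = replicas[source]
            log.append(f"replaced corrupted replica on {name} using {source}")
    return log


# Illustrative data: one intact replica and one with a flipped character.
original = b"archival content"
stored = checksum(original)
replicas = {"rescA": original, "rescB": b"archival c0ntent"}
repair_log = verify_and_repair(replicas, stored)
print(repair_log)
```

Running the check a second time returns an empty log, which is the property a periodic verification policy relies on.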
8.1 Analysis of the integrated Rule Oriented Data System
The generality of the approach can be illustrated using the iRODS integrated Rule Oriented Data System [5,10]. The iRODS software implements virtualization mechanisms that enable the federation of existing data management systems, and the enforcement of desired environment properties across the federated systems. The iRODS data grid manages multiple types of entities independently of the choice of authentication environment, storage system, database, and administrative domain:
• Users (logical user name space)
• Digital objects (files, workflow structured objects, soft links)
• Resources (storage systems, repositories, compute systems)
• Metadata (system state information, provenance information, descriptive information)
• Rules (computer actionable policies that control the execution of procedures)
• Micro-services (computer executable functions that can be chained into procedures)
• Environment framework (the data grid itself).

Standard properties can be generated for each type of entity:

• Logical name (persistent identifier defined by the data grid)
• Access controls
• Aggregation (formation of groups)
• Descriptive metadata
• Audit trail of events and actions
The standard properties are reified as system state information that is stored in a relational database (the iCAT catalog). The impact of each event that accesses the system can be tracked through the corresponding changes to the state information. In iRODS, many of the state information attributes are updated by the iRODS server middleware to guarantee consistency. However, the data grid administrator can customize changes to the system by modifying the policies that are stored in the rule base. Since these policies reflect decisions by the data grid administrator, a procedure is needed that verifies the consistency of the data grid. We can generate a comprehensive assessment of the consistent update of state information by analyzing the mapping of:
• Events (client actions) to multiple policy-enforcement points
• Policies invoked at policy-enforcement points
• Procedures controlled by each policy
• Chain of micro-services invoked by a procedure
• Updates to state information generated by each micro-service
• Verification policy that monitors the state of the system
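The chain of mappings listed above can be pictured as three lookup tables that are walked in sequence. The Python sketch below is a minimal illustration, not part of iRODS: the table names and the function attributes_modified_by are hypothetical, and the entries are an abbreviated subset of the SILS LifeTime Library example analyzed later in this chapter (only three attributes are shown per micro-service).

```python
# Event -> policy-enforcement points it triggers (abbreviated).
EVENT_TO_PEPS = {
    "iput": ["acSetRescSchemeForCreate", "acPostProcForPut"],
}

# Policy-enforcement point -> micro-services chained by its policy.
PEP_TO_MICROSERVICES = {
    "acSetRescSchemeForCreate": ["msiSetDefaultResc"],
    "acPostProcForPut": ["delay", "msiSysReplDataObj"],
}

# Micro-service -> persistent state attributes it modifies (subset only).
MICROSERVICE_TO_ATTRIBUTES = {
    "msiSetDefaultResc": set(),  # changes an in-memory structure only
    "delay": {"RULE_EXEC_ID", "RULE_EXEC_NAME", "RULE_EXEC_TIME"},
    "msiSysReplDataObj": {"DATA_REPL_NUM", "DATA_RESC_NAME", "DATA_PATH"},
}


def attributes_modified_by(event: str) -> set:
    """Follow event -> enforcement point -> micro-service -> attribute."""
    modified = set()
    for pep in EVENT_TO_PEPS.get(event, []):
        for microservice in PEP_TO_MICROSERVICES.get(pep, []):
            modified |= MICROSERVICE_TO_ATTRIBUTES.get(microservice, set())
    return modified


iput_attributes = attributes_modified_by("iput")
print(sorted(iput_attributes))
```

A verification policy then only needs to examine the attributes returned for the events whose policies were changed.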
8.2 Policy-enforcement points
In Appendix A, we list the policy-enforcement points in iRODS. They can be loosely grouped into control points for manipulating files, users, resources, system state information, and environment parameters. While the iRODS data grid provides 71 policy-enforcement points, the standard data grid uses policies at only 11 points, which are listed in section 2.

In practice, sites add rules to enforce specific properties within the data grid. For example, in the SILS LifeTime Library [11] five additional/modified rules are used, listed in section 3. To verify that the LifeTime Library rule set enforces the required properties, we will need to examine which events invoke the policies, and then analyze changes to the state information for consistency.
8.3 Client invocation of policy-enforcement points
In Appendix B, we list events generated by the execution of the Unix shell commands provided with the iRODS data grid (icommands). The Unix shell commands are the most comprehensive interface for iRODS in terms of the policy-enforcement points that can be triggered. Each command invocation may cause policies at multiple policy-enforcement points to be executed. For the case of loading a file into the data grid, the following ten policy-enforcement points are triggered:
1. acChkHostAccessControl
2. acSetPublicUserPolicy
3. acAclPolicy
4. acSetRescSchemeForCreate
5. acRescQuotaPolicy
6. acSetVaultPathPolicy
7. acPreProcForModifyDataObjMeta
8. acPostProcForModifyDataObjMeta
9. acPostProcForCreate
10. acPostProcForPut
We immediately can see that four of the policies added for the SILS LifeTime Library will need to be verified for their impact on policy-enforcement points 3, 4, 5, and 10 in the above list. The additional policy for the LifeTime Library controls the preferred storage location for replications. An assertion about the properties of the LifeTime Library requires verifying that the new policies have not changed the data grid properties. We do this by checking whether changes to the state information for each of these rules maintain the desired completeness, correctness, closure, and consistency.

A total of 80 different client interactions are listed in Appendix B, along with the policy enforcement points that are triggered. For other events, a different set of policy enforcement points may be triggered. However, all clients (web browsers, load libraries, I/O libraries) will trigger the same policy enforcement points for the same events.
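The positions 3, 4, 5, and 10 can be reproduced by intersecting the ten enforcement points listed above with the set of enforcement points that the LifeTime Library modifies. The sketch below assumes that set is the five enforcement points examined in section 8.4; the variable names are hypothetical.

```python
# Policy-enforcement points triggered when a file is loaded, in order.
IPUT_PEPS = [
    "acChkHostAccessControl",
    "acSetPublicUserPolicy",
    "acAclPolicy",
    "acSetRescSchemeForCreate",
    "acRescQuotaPolicy",
    "acSetVaultPathPolicy",
    "acPreProcForModifyDataObjMeta",
    "acPostProcForModifyDataObjMeta",
    "acPostProcForCreate",
    "acPostProcForPut",
]

# Assumption: the five LifeTime Library rules are the ones analyzed in 8.4.
LIFETIME_LIBRARY_PEPS = {
    "acAclPolicy",
    "acSetRescSchemeForCreate",
    "acSetRescSchemeForRepl",
    "acRescQuotaPolicy",
    "acPostProcForPut",
}

# 1-based positions of the modified enforcement points in the list above.
affected = [(position, pep)
            for position, pep in enumerate(IPUT_PEPS, start=1)
            if pep in LIFETIME_LIBRARY_PEPS]

# Modified enforcement points that loading a file does not reach.
not_triggered = sorted(LIFETIME_LIBRARY_PEPS - set(IPUT_PEPS))

print(affected)
print(not_triggered)
```

The one modified enforcement point that is not reached, acSetRescSchemeForRepl, is the replication policy, which is consistent with the text treating it separately.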
8.4 Procedures executed at each policy enforcement point
The procedures executed within the iRODS data grid are composed by chaining together micro-services. Appendix C lists the available micro-services, organized alphabetically. Most of the micro-services do not affect the system state information, and instead are used to manage the workflow, interact with external systems, support string manipulation, support arithmetic operations, or support administrative functions. There are currently 348 micro-services available for use in rules. For each micro-service the set of system attributes that are read, modified, or written is identified. A list of queriable persistent state information attributes is given in Appendix D. If a persistent state information attribute is not included in Appendix C, then it is not read or modified by a micro-service. There are a total of 67 different sets of state information that may be modified. The sets are listed in Tables C.2, C.3, and C.4.

Of the list of 348 micro-services, only 103 modify state information. Out of a total of 338 system state attributes, 151 attributes are modified by the micro-services. The mapping challenge is therefore:
• 80 separate client events represented by icommand actions
• 71 policy enforcement points
• 103 micro-services that manipulate state information
• 151 persistent state attributes
The number of combinations that should be checked is

Number of client events * Number of policy enforcement points accessed by the event * Number of micro-services invoked at a policy enforcement point * Number of persistent state attributes modified by a micro-service.
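As a worked example of this product, the Python sketch below first multiplies the four totals quoted above, which gives a loose worst-case bound, and then counts the combinations for a single event by summing the attributes modified along its actual chain. The function names are hypothetical, and the per-micro-service counts (0, 12, and 19) are taken from the acSetRescSchemeForCreate and acPostProcForPut analyses later in this section.

```python
def worst_case_combinations(events: int, peps: int,
                            microservices: int, attributes: int) -> int:
    """Upper bound: every event reaches every point, service, and attribute."""
    return events * peps * microservices * attributes


def combinations_for_event(chain: dict) -> int:
    """Count (enforcement point, micro-service, attribute) triples.

    chain maps each enforcement point reached by one event to a list that
    holds, per micro-service invoked there, the number of persistent state
    attributes that micro-service modifies.
    """
    return sum(sum(counts) for counts in chain.values())


upper_bound = worst_case_combinations(80, 71, 103, 151)

# Loading a file, restricted to the two modified points that call
# micro-services able to touch persistent state.
iput_chain = {
    "acSetRescSchemeForCreate": [0],   # msiSetDefaultResc: memory only
    "acPostProcForPut": [12, 19],      # delay, msiSysReplDataObj
}
iput_checks = combinations_for_event(iput_chain)

print(upper_bound)
print(iput_checks)
```

The gap between the two numbers is why the analysis that follows ignores unmodified enforcement points and micro-services that are never invoked.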
In the following analysis, we ignore the policy-enforcement points that have not been modified, and the micro-services that are not invoked at a policy-enforcement point. We examine the impact of each policy for the SILS LifeTime Library:
• acAclPolicy enforcement point is used by 37 of the client actions.
   o This policy calls the msiAclPolicy("STRICT") micro-service.
   o The msiAclPolicy sets "STRICT" access in a structure in memory. The persistent state information is not changed directly.
   o To check enforcement of this policy, a listing of files in a non-public user account can be tried to verify that the files cannot be seen.
• acSetRescSchemeForCreate enforcement point is used by 7 of the client actions, basically each time a file is created.
   o This policy calls the msiSetDefaultResc("lifelibResc1","null") micro-service.
   o The msiSetDefaultResc defines the storage system to use for creating a file in a structure in memory. The persistent state information is not changed directly.
   o The impact of the policy can be monitored by running a rule that verifies that each file has a copy residing on "lifelibResc1":
ruleverifyFiles {
# Verify each file has a copy on a specified storage resource
  *Path = "/$rodsZoneClient/home/$userNameClient/%";
  *Q = select DATA_NAME, COLL_NAME where COLL_NAME like '*Path';
  *Count = 0;
  foreach (*R in *Q) {
    *F = *R.DATA_NAME;
    *C = *R.COLL_NAME;
    *Q2 = select count(DATA_ID) where COLL_NAME = '*C' and DATA_NAME = '*F' and DATA_RESC_NAME = '*Resc';
    foreach (*R2 in *Q2) {
      if (*R2.DATA_ID == "0") { *Count = *Count + 1; }
    }
  }
  writeLine("stdout", "A total of *Count files are not present on *Resc");
}
INPUT *Resc = "lifelibResc1"
OUTPUT ruleExecOut
• acSetRescSchemeForRepl enforcement point is used by 1 client action for creating a replica.
   o This policy also calls the msiSetDefaultResc("renci-unix1","null") micro-service.
   o The msiSetDefaultResc defines the storage system to use for replicating a file in a structure in memory. The persistent state information is not changed directly.
   o Enforcement of the policy can be monitored by running a rule that verifies that each file has a replica on "renci-unix1".
ruleverifyFiles {
# Verify each file has a copy on a specified storage resource
  *Path = "/$rodsZoneClient/home/$userNameClient/%";
  *Q = select DATA_NAME, COLL_NAME where COLL_NAME like '*Path';
  *Count = 0;
  foreach (*R in *Q) {
    *F = *R.DATA_NAME;
    *C = *R.COLL_NAME;
    *Q2 = select count(DATA_ID) where COLL_NAME = '*C' and DATA_NAME = '*F' and DATA_RESC_NAME = '*Resc';
    foreach (*R2 in *Q2) {
      if (*R2.DATA_ID == "0") { *Count = *Count + 1; }
    }
  }
  writeLine("stdout", "A total of *Count files are not present on *Resc");
}
INPUT *Resc = "renci-unix1"
OUTPUT ruleExecOut
• acRescQuotaPolicy enforcement point is not called by an icommand.
   o This policy calls the msiSetRescQuotaPolicy("on") micro-service.
   o The msiSetRescQuotaPolicy turns on the storage quota in a structure in memory. The persistent state information is not changed directly.
   o Enforcement of the policy can be checked by running a rule that checks the quota usage (QUOTA_OVER).

ruleQuota {
# Count number of users that exceed the quota
  *Q = select QUOTA_USER_NAME, QUOTA_OVER;
  *Count = 0;
  foreach (*R in *Q) {
    *Over = double(*R.QUOTA_OVER);
    if (*Over > 0.) {
      *Count = *Count + 1;
      *User = *R.QUOTA_USER_NAME;
      writeLine("stdout", "User *User exceeded quota");
    }
  }
  # Report the total once, after all users have been examined
  writeLine("stdout", "*Count persons exceed quota");
}
INPUT null
OUTPUT ruleExecOut
• acPostProcForPut enforcement point is used by 5 client actions.
   o The policy calls two micro-services:
      delay("<PLUSET>1s</PLUSET>")
         This uses persistent state variable set #60 to modify state information:
            o RULE_EVENT
            o RULE_EXEC_ADDRESS
            o RULE_EXEC_ESTIMATED_EXE_TIME
            o RULE_EXEC_FREQUENCY
            o RULE_EXEC_ID
            o RULE_EXEC_NAME
            o RULE_EXEC_NOTIFICATION_ADDR
            o RULE_EXEC_PRIORITY
            o RULE_EXEC_REI_FILE_PATH
            o RULE_EXEC_TIME
            o RULE_EXEC_USER_NAME
            o RULE_ID
      msiSysReplDataObj('renci-unix1','null')
         This reads the persistent state variables in set #18 to collect state information:
            o COLL_CREATE_TIME
            o COLL_ID
            o COLL_MODIFY_TIME
            o COLL_NAME
            o COLL_OWNER_NAME
            o COLL_OWNER_ZONE
            o DATA_ACCESS_DATA_ID
            o DATA_ACCESS_TYPE
            o DATA_ACCESS_USER_ID
            o TOKEN_ID
            o TOKEN_NAME
            o TOKEN_NAMESPACE
            o USER_GROUP_ID
            o USER_ID
            o USER_NAME
            o USER_TYPE
            o USER_ZONE
         This updates persistent state variables for the replica:
            o DATA_CHECKSUM
            o DATA_COLL_ID
            o DATA_COMMENTS
            o DATA_CREATE_TIME
            o DATA_EXPIRY
            o DATA_ID
            o DATA_MAP_ID
            o DATA_MODIFY_TIME
            o DATA_NAME
            o DATA_OWNER_NAME
            o DATA_OWNER_ZONE
            o DATA_PATH
            o DATA_REPL_NUM
            o DATA_RESC_GROUP_NAME
            o DATA_RESC_NAME
            o DATA_SIZE
            o DATA_STATUS
            o DATA_TYPE_NAME
            o DATA_VERSION
   o The creation of a replica can be verified by running a periodic rule that checks that a replica for each file exists, and that the integrity of the replica has not been compromised.
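The periodic check described in the last item can be sketched at the catalog level. The Python below is an illustration only: it works on plain dictionaries that stand in for catalog query results (using the attribute names DATA_NAME, COLL_NAME, DATA_RESC_NAME, and DATA_CHECKSUM from the lists above), compares the checksums recorded for each copy rather than recomputing them from storage, and uses a hypothetical function name replica_report and made-up sample rows.

```python
from collections import defaultdict


def replica_report(rows: list, required_resources: list) -> tuple:
    """Return (files missing a required replica, files whose recorded
    checksums disagree across replicas)."""
    copies_by_file = defaultdict(dict)
    for row in rows:
        key = (row["COLL_NAME"], row["DATA_NAME"])
        copies_by_file[key][row["DATA_RESC_NAME"]] = row["DATA_CHECKSUM"]

    missing, mismatched = [], []
    for key, copies in sorted(copies_by_file.items()):
        if not set(required_resources) <= set(copies):
            missing.append(key)
        elif len(set(copies.values())) > 1:
            mismatched.append(key)
    return missing, mismatched


def make_row(name: str, resource: str, digest: str) -> dict:
    """Build one sample catalog row in a hypothetical home collection."""
    return {"COLL_NAME": "/zone/home/alice", "DATA_NAME": name,
            "DATA_RESC_NAME": resource, "DATA_CHECKSUM": digest}


rows = [
    make_row("a.txt", "lifelibResc1", "abc"),
    make_row("a.txt", "renci-unix1", "abc"),
    make_row("b.txt", "lifelibResc1", "def"),   # never replicated
    make_row("c.txt", "lifelibResc1", "123"),
    make_row("c.txt", "renci-unix1", "999"),    # replica differs
]
missing, mismatched = replica_report(rows, ["lifelibResc1", "renci-unix1"])
print(missing)
print(mismatched)
```

A production rule would also recompute checksums from the stored bytes, since a recorded checksum can match while the file on disk has changed.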
9 Summary:
The impact of modifications to the policies used in a policy-based data management system can be assessed by analyzing changes to persistent state information. The process requires identifying the events (actions) executed by users of the system, and the responses made to the actions under policy-based control. The responses are mapped from the client events, through policy-enforcement points, to the policies that are enforced, to the micro-services that are executed, and finally to the persistent state information that is modified. Rules that analyze the consistency of the changed state information can then be periodically applied to verify system state. This approach requires an analysis rule for each policy that is changed. An example based on the SILS LifeTime Library policy set is presented.
10 Acknowledgements:
The development of the iRODS data grid and the research results in this paper were funded by the NSF OCI-1032732 grant, "SDCI Data Improvement: Improvement and Sustainability of iRODS Data Grid Software for Multi-Disciplinary Community Driven Applications," (2010-2013), and the NSF Cooperative Agreement OCI-0940841, "DataNet Federation Consortium", (2011-2015). We thank Shane Pusz, University of North Carolina at Chapel Hill, for generating the micro-service usage information for the iRODS state information attributes.
11 References:
1. http://irods.org/download/
2. Moore, R., A. Rajasekar, Michael Conway, Gary Marchionini, M. Nutt, K. Street, M. Sullivan, S. Trujillo, B. Wolfe, "LifeTime Library", JCDL Digital Libraries - Beyond the Desktop workshop, June 16-17, 2011, Ottawa, Canada.
3. Research Data Alliance File Depot, "Implementations: Practical Policy Working Group, September 2014".
4. Rajasekar, R., M. Wan, R. Moore, W. Schroeder, S.-Y. Chen, L. Gilbert, C.-Y. Hou, C. Lee, R. Marciano, P. Tooby, A. de Torcy, B. Zhu, "iRODS Primer: Integrated Rule-Oriented Data System", Morgan & Claypool, 2010.
5. Ward, J., M. Wan, W. Schroeder, A. Rajasekar, A. de Torcy, T. Russell, H. Xu, R. Moore, "The integrated Rule-Oriented Data System (iRODS 3.0) Micro-service Workbook", DICE Foundation, November 2011, ISBN: 9781466469129, Amazon.com.
6. BitCurator: http://www.bitcurator.net/
7. DFXML: http://wiki.bitcurator.net/index.php?title=Fiwalk_and_DFXML
8. Bulk Extractor: http://www.forensicswiki.org/wiki/Bulk_extractor
9. iRODS: https://www.irods.org
10. Rajasekar, R., Wan, M., Moore, R., Schroeder, W., Chen, S.-Y., Gilbert, L., Hou, C.-Y., Lee, C., Marciano, R., Tooby, P., de Torcy, A., and Zhu, B. 2010. iRODS Primer: Integrated Rule-Oriented Data System, Morgan & Claypool. DOI=10.2200/S00233ED1V01Y200912ICR012.
11. Moore, R., A. Rajasekar, Michael Conway, Gary Marchionini, M. Nutt, K. Street, M. Sullivan, S. Trujillo, B. Wolfe, "Life Time Library", JCDL Digital Libraries - Beyond the Desktop workshop, June 16-17, 2011, Ottawa, Canada.
Appendix A: Policy-enforcement Points
Each policy-enforcement point is named. A policy can be added to the rule base (core.re file) using the name of a policy-enforcement point to invoke a controlling procedure. Thus to set access control to strict (meaning that no one can see the names of anyone else's files), we add the policy:

   acAclPolicy { msiAclPolicy("STRICT"); }

The policy invokes the execution of the micro-service msiAclPolicy using the input parameter "STRICT". Three types of policy-enforcement points are used:
1. Provide control of the execution of a system function.
2. Provide pre-process control for defining input to the system function (acPreProc).
3. Provide post-process control for manipulating the output from the system function (acPostProc).
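The three types can usually be told apart from the name prefix alone. The Python sketch below is a naming heuristic only, under the assumption that the acPreProc and acPostProc prefixes are applied consistently apart from letter case (the list that follows contains both acPreProcFor... and acPreprocFor... spellings); the function name pep_type is hypothetical.

```python
def pep_type(name: str) -> str:
    """Classify a policy-enforcement point by its name prefix."""
    lowered = name.lower()  # tolerate acPreProc... and acPreproc...
    if lowered.startswith("acpreproc"):
        return "pre-process"
    if lowered.startswith("acpostproc"):
        return "post-process"
    return "system function"


for example in ("acAclPolicy", "acPreprocForCollCreate", "acPostProcForPut"):
    print(example, "->", pep_type(example))
```

A few enforcement points are described as pre- or post-processing rules without carrying the prefix (for example acDataDeletePolicy), so the table descriptions remain the authority.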
Table A.1 Policy Enforcement Points
Policy Enforcement Point  Policy
acAclPolicy  This rule sets Access Control List policy.
acBulkPutPostProcPolicy  This rule sets the policy for executing the post processing put rule (acPostProcForPut) for bulk put.
acCheckPasswordStrength  This is a policy point for checking password strength, called when the admin or user is setting a password.
acChkHostAccessControl  This rule checks the access control by host and user based on the policy given in the HostAccessControl file.
acCreateDefaultCollections  This rule controls creation of standard collections for a new user.
acCreateUser  This rule enables pre-process and post-process for creation of a user.
acDataDeletePolicy  This rule sets the policy for deleting data objects. This is the PreProcessing rule for delete.
acDeleteUser  This rule enables pre process and post process for user deletion.
acDeleteUserZoneCollections  This rule deletes standard user collections within a zone.
acGetUserByDN  This rule can be configured to do some special handling of GSI DNs.
acPostProcForCollCreate  This rule sets the post-processing policy for creating a collection.
acPostProcForCopy  Rule for post processing the copy operation.
acPostProcForCreate  Rule for post processing of data object create.
acPostProcForCreateResource  This rule sets the post-processing policy for creating a new resource.
acPostProcForCreateToken  This rule sets the post-processing policy for creating a new token.
acPostProcForCreateUser  This rule sets the post-processing policy for creating a new user.
acPostProcForDataObjRead  Rule for post processing the read buffer.
acPostProcForDataObjWrite  Rule for pre processing the write buffer.
acPostProcForDelete  This rule sets the post-processing policy for deleting data objects.
acPostProcForDeleteResource  This rule sets the post-processing policy for deleting an old resource.
acPostProcForDeleteToken  This rule sets the post-processing policy for deleting an old token.
acPostProcForDeleteUser  This rule sets the post-processing policy for deleting an old user.
acPostProcForFilePathReg  Rule for post processing the registration of a file path.
acPostProcForGenQuery  This rule sets the post-processing policy for general query.
acPostProcForModifyAccessControl  This rule sets the post-processing policy for access control modification.
acPostProcForModifyAVUmetadata  This rule sets the post-processing policy for adding/deleting and copying the AVU metadata for data, collection, resources, and user.
acPostProcForModifyCollMeta  This rule sets the post-processing policy for modifying system metadata of a collection.
acPostProcForModifyDataObjMeta  This rule sets the post-processing policy for modifying system metadata of a data object.
acPostProcForModifyResource  This rule sets the post-processing policy for modifying the properties of a resource.
acPostProcForModifyResourceGroup  This rule sets the post-processing policy for modifying membership of a resource group.
acPostProcForModifyUser  This rule sets the post-processing policy for modifying the properties of a user.
acPostProcForModifyUserGroup  This rule sets the post-processing policy for modifying membership of a user group.
acPostProcForObjRename  This rule sets the post-processing policy for renaming (logically moving) data and collections.
acPostProcForOpen  Rule for post processing of data object open.
acPostProcForPhymv  Rule for post processing of data object move of a physical file path (e.g. - ireg command).
acPostProcForPut  Rule for post processing the put operation.
acPostProcForRepl  Rule for post processing of data object replication.
acPostProcForRmColl  This rule sets the post-processing policy for removing a collection.
acPostProcForTarFileReg  Rule for post processing the registration of the extracted tar file (from ibun -x).
acPreprocForCollCreate  This is the PreProcessing rule for creating a collection.
acPreProcForCreateResource  This rule sets the pre-processing policy for creating a new resource.
acPreProcForCreateToken  This rule sets the pre-processing policy for creating a new token.
acPreProcForCreateUser  This rule sets the pre-processing policy for creating a new user.
acPreprocForDataObjOpen  Preprocess rule for opening an existing data object which is used by the get, copy and replicate operations.
acPreProcForDeleteResource  This rule sets the pre-processing policy for deleting an old resource.
acPreProcForDeleteToken  This rule sets the pre-processing policy for deleting an old token.
acPreProcForDeleteUser  This rule sets the pre-processing policy for deleting an old user.
acPreProcForExecCmd  Rule for pre processing when remotely executing a command.
acPreProcForGenQuery  This rule sets the pre-processing policy for general query.
acPreProcForModifyAccessControl  This rule sets the pre-processing policy for access control modification.
acPreProcForModifyAVUmetadata  This rule sets the pre-processing policy for adding/deleting and copying the AVU metadata for data, collection, resources, and user.
acPreProcForModifyCollMeta  This rule sets the pre-processing policy for modifying system metadata of a collection.
acPreProcForModifyDataObjMeta  This rule sets the pre-processing policy for modifying system metadata of a data object.
acPreProcForModifyResource  This rule sets the pre-processing policy for modifying the properties of a resource.
acPreProcForModifyResourceGroup  This rule sets the pre-processing policy for modifying membership of a resource group.
acPreProcForModifyUser  This rule sets the pre-processing policy for modifying the properties of a user.
acPreProcForModifyUserGroup  This rule sets the pre-processing policy for modifying membership of a user group.
acPreProcForObjRename  This rule sets the pre-processing policy for renaming (logically moving) data and collections.
acPreprocForRmColl  This is the PreProcessing rule for removing a collection. Currently there is no function written specifically for this rule.
acRenameLocalZone  This rule renames the zone and all collections within the zone.
acRescQuotaPolicy  This rule sets the policy for a resource quota.
acSetChkFilePathPerm  This rule manages mounting of collections.
acSetMultiReplPerResc  Preprocess rule for replicating an existing data object.
acSetNumThreads  Rule to set the number of threads for a data transfer.
acSetPublicUserPolicy  This rule sets the policy for the set of operations that are allowable for the user "public".
acSetRescSchemeForCreate  This is the preprocessing rule for creating a data object.
acSetRescSchemeForRepl  This is the preprocessing rule for replicating a data object.
acSetReServerNumProc  This rule sets the policy for the number of processes to use when running jobs in the irodsReServer.
acSetVaultPathPolicy  This rule sets the policy for creating the physical path in the iRODS resource vault.
acTicketPolicy  This is a policy point for ticket-based access control.
acTrashPolicy  This rule sets the policy for whether the trash can should be used.
Appendix B: Client Invocation of Policy Enforcement Points
Each policy enforcement point may be invoked by multiple client events. For events that manipulate files, up to 12 policy enforcement points are accessed for each interaction. In the following tables, the columns list the policy enforcement points. Client actions that invoke a policy enforcement point are listed in separate rows. Note that each table defines events that invoke different policy enforcement points.
Table B.1 File manipulation events
Columns (icommands, followed by the policy enforcement points in order): acChkHostAccessControl, acSetPublicUserPolicy, acAclPolicy, acSetRescSchemeForCreate, acRescQuotaPolicy, acSetVaultPathPolicy, acPreProcForModifyDataObjMeta, acPostProcForModifyDataObjMeta, acPreprocForDataObjOpen, acPostProcForOpen, acSetRescSchemeForRepl, acSetMultiReplPerResc, acPostProcForCreate, acPostProcForPut, acPostProcForCopy, acPostProcForRepl, acPostProcForPhymv, acPreProcForObjRename, acPostProcForObjRename, acPreProcForRmColl, acTrashPolicy, acDataDeletePolicy, acPreProcForCollCreate, acPostProcForCollCreate, acPostProcForFilePathReg, acPostProcForRmColl, acPostProcForDelete
icp  Copy a file  x x x x x x x x x x x x
icp -N 2  Copy a file using 2 I/O threads  x x x x x x x x x x x x
iphybun  Physically bundle a collection  x x x x x x x x x x
irepl  Replicate a file  x x x x x x x x x x
ibun -c D  Upload/download tar files  x x x x x x x x x x
iput  Put a file into the data grid  x x x x x x x x x x
iphymv  Physically move a file  x x x x x x x x x x
imv  Move a file  x x x x x x x x x
irm  Remove a file  x x x x x x x x x x x
irm -r collection  Recursively remove a collection  x x x x x x x x x x x x
ichksum  Checksum a file  x x x x x
iput -f  Overwrite an existing file  x x x x x x x x
irsync  Synchronize two collections  x x x x x x x x
irule - msiDataObjWrite  Write a file  x x x x x x x x
irule - msiDataObjRead  Read a file  x x x x x
idbo exec  Execute a database resource  x x x x x
iget  Get a file from the data grid  x x x x x
igetwild.sh  Get multiple files  x x x x x
imkdir  Make a directory  x x x x x
ireg  Register a file  x x x x x x
irmtrash  Empty trash  x x x x x x
Table B.2 Events that manipulate users and resources
Columns (icommands, followed by the policy enforcement points in order): acChkHostAccessControl, acSetPublicUserPolicy, acAclPolicy, acCreateUser, acPreProcForCreateUser, acCreateUserF1, acCreateDefaultCollections, acCreateUserZoneCollections, acCreateCollByAdmin, acCreateUserZoneCollections, acCreateDefaultCollections, acPostProcForCreateUser, acPreProcForModifyUser, acPostProcForModifyUser, acDeleteUser, acPreProcForDeleteUser, acDeleteUserF1, acDeleteDefaultCollections, acDeleteUserZoneCollections, acDeleteCollByAdmin, acPostProcForDeleteUser, acPreProcForCreateResource, acPostProcForCreateResource, acPreProcForDeleteResource, acPostProcForDeleteResource
iadmin mkuser  Make a user  x x x x x x x x x x x
iadmin mkgroup  Make a user group  x x x x x x x x x x x
iadmin moduser  Modify a user  x x x x
ipasswd  Create password  x x x x
iadmin rmuser  Remove user  x x x x x x x x x
iadmin mkresc  Make a resource  x x x x
iadmin rmresc  Remove a resource  x x x x
Table B.3 Administrative Operations
Columns (icommands, followed by the policy enforcement points in order): acChkHostAccessControl, acSetPublicUserPolicy, acAclPolicy, acPreProcForModifyResource, acPostProcForModifyResource, acPreProcForModifyUserGroup, acPostProcForModifyUserGroup, acPreProcForModifyResourceGroup, acPostProcForModifyResourceGroup, acPreProcForCreateToken, acPostProcForCreateToken, acPreProcForDeleteToken, acPostProcForDeleteToken, acVacuum, acPreProcForModifyAVUMetadata, acPostProcForModifyAVUMetadata, acPreProcForModifyAccessControl, acPostProcForModifyAccessControl, acPreProcForModifyCollMeta, acPostProcForModifyCollMeta, acRenameLocalZone, acGetIcatResults, acPurgeFiles
iadmin modresc  Modify a resource  x x x x
iadmin atg  Add user to group  x x x x
iadmin rfg  Remove user from group  x x x x
iadmin atrg  Add resource to resource group  x x x x
iadmin rfrg  Remove resource from resource group  x x x x
iadmin at  Add token  x x x x
iadmin rt  Remove token  x x x x
iadmin pv  Initiate database vacuum  x x x
imeta  List metadata  x x x x
ichmod  Change access  x x x x x
imcoll -m l  Mount a collection  x x x x x
iadmin modzone  Modify a zone  x x x
irule - acPurgeFiles  Purge deleted files  x x x x x
Table B.4 Operations on metadata, rules, and remote execution
Columns (icommands, followed by the policy enforcement points in order): acChkHostAccessControl, acSetPublicUserPolicy, acAclPolicy, acConvertToInt, acGetUserByDN, acSetNumThreads, acSetChkFilePathPerm, acSetReServerNumProc, acPreProcForGenQuery, acPostProcForGenQuery, acPostProcForDataObjWrite, acPostProcForDataObjRead
irule ‐ acConvertToInt Execute a rule to convert to integer x x x x
gsi authentication Authenticate using GSI x
irule ‐ acSetNumThreads Set number of threads for data transfer x x x
irule ‐ msiNoChkFilePathPerm.r Set permissions for registration x x x
irule ‐ acSetReServerNumProc Set number of execution threads x x x
PrePostProcForGenQueryFlag = 1 Execute general query x x
ReadWriteRuleState = ON_STATE Modify a data object x x
irule ‐ rulemsiExecGenQuery Execute a general query x x x
iinit Initialize access to the data grid x x
iadmin Administration interface x x
iadmin mkdir Make a directory x x
icd Change directory x x x
iexecmd Execute a remote command x x
ifsck Check consistency of data in vault x x x
ilocate Search for a file x x x
ils List files x x x
ilsresc List resources x x x
imiscsvrinfo List server information x x
ips Display connections for running agents x x
iqdel Delete rule from queue x x x
iqmod Modify rule in queue x x
iqstat List rules in queue x x x
iquest Query metadata catalog x x x
iquota Show information on iRODS quotas x x x
irule Execute a rule x x
iscan i: Check registration of local files x x x
isysmeta List system metadata x x
itrim Delete replicas x x x
iuserinfo List user information x x x
ixmsg Send a message
ienv List environment variables
ihelp List icommands
iadmin mkzone Make a data grid
iadmin rmzone Remove a data grid
iadmin asq Set an alias
iadmin rsq Remove an alias
ierror List error message
iexit Exit from the data grid
ipwd Change password
Appendix C: Micro-services
The micro-services encapsulate basic operations that may be useful when implementing a policy. The types of operations include manipulation of:

1. Collections
2. Data objects
3. Output files and strings
4. Rule base
5. Workflow
6. Messaging system
7. Environment
8. Metadata
9. External services
10. Remote database access
11. Soft links
12. HDF
13. Property lists
14. URLs
15. Web services
16. XML

For each micro-service, an identifier is provided that defines the set of persistent state variables read or modified by execution of the micro-service. The persistent state variable sets are listed in Table C.2. Note that micro-services that do not modify state information are listed with persistent state set "0".

Table C.1 List of micro-services available in iRODS version 4.0
Micro‐service Persistent State Set
‐ Negation operator for arithmetic 0
! Negation operator for boolean variables 0
!= Negation operation for conditional test 0
. Structure operator for extracting variables from structure 0
* Workflow variable 0
/ Division operator for arithmetic 0
&& And operator for query 0
% Modulus operator for arithmetic 0
%% Or operator for query 0
^ Exponentiation operator for arithmetic 0
^^ Calculate nth root for arithmetic 0
+ Addition operator for arithmetic 0
++ Addition operator for strings 0
< Less than operator for conditional tests 0
<= less than or equal operator for conditional tests 0
= Assignment operator for variables 0
== Equal operator for conditional tests 0
> Greater than operator for conditional tests 0
>= Greater than or equal operator for conditional tests 0
|| Or operator for query 0
abs Absolute value operator for arithmetic 0
applyAllRules Apply all rules 0
average Average operator for arithmetic 0
bool Boolean type operator 0
break Break loop execution operator for workflow 0
ceiling Calculate closest larger integer for arithmetic 0
cons List definition operator 0
cut No retry operator on failure for workflow 0
datetime Date‐time converter for workflow 0
datetimef Date‐time formatted converter for workflow 0
delay Delay execution of a rule 60
double Double type operator 0
elem List element operator 0
errorcode Trap error code operator for workflow 0
errormsg Trap error message operator for workflow 0
eval Evaluate code 0
execCmdArg Execute remote command with an argument 0
exp Exponentiation operator for arithmetic 0
fail Fail operator for workflow 0
floor Calculate closest lower integer for arithmetic 0
for For loop operator for workflow 0
foreach For each loop operator for workflow list 0
hd Calculate the head of a list 0
if Conditional test for workflow 0
int Integer type operator 0
let Define function variables in an expression 0
like Similarity operator for query 0
like regex Similarity operator for query 0
list List structure type 0
log Logarithm operator for arithmetic 0
match Matches a string against a regular expression 0
max Maximum operator for arithmetic 0
min Minimum operator for arithmetic 0
msiAclPolicy Set access control policy 0
msiAddConditionToGenQuery Add condition to a general query 0
msiAddKeyVal Add key‐value pair to an in‐memory structure 0
msiAddKeyValToMspStr Add key‐value pair to an in‐memory structure for concatenating command arguments 0
msiAddSelectFieldToGenQuery Add select field to a general query 0
msiAddToNcArray Modify an array in a netCDF file 0
msiAddUserToGroup Admin ‐ add a user to a group 66
msiAdmAddAppRuleStruct Admin ‐ add rules to an in‐memory structure 0
msiAdmAppendToTopOfCoreRE Admin ‐ append rules to the top of the rule base (core.re file) 0
msiAdmChangeCoreRE Admin ‐ change the rule base (core.re file) 0
msiAdmClearAppRuleStruct Admin ‐ clear rules from the in‐memory structure 0
msiAdmInsertDVMapsFromStructIntoDB Admin ‐ Insert persistent state name maps from memory structure into database 48
msiAdmInsertFNMapsFromStructIntoDB Admin ‐ Insert function name maps from memory structure into database 51
msiAdmInsertMSrvcsFromStructIntoDB Admin ‐ insert micro‐service names from in‐memory structure into database 54
msiAdmInsertRulesFromStructIntoDB Admin ‐ Insert rules from memory structure into database 58
msiAdmReadDVMapsFromFileIntoStruct Admin ‐ load persistent state name maps from file into memory structure 0
msiAdmReadFNMapsFromFileIntoStruct Admin ‐ Load function name maps from file into memory structure 0
msiAdmReadMSrvcsFromFileIntoStruct Admin ‐ Read micro‐service name maps from file into memory structure 0
msiAdmReadRulesFromFileIntoStruct Admin ‐ Read rules from file into memory structure 0
msiAdmRetrieveRulesFromDBIntoStruct Admin ‐ Load rules from database into a memory structure 59
msiAdmShowCoreRE Admin ‐ list rules from rule base (core.re file) 0
msiAdmShowDVM Admin ‐ list persistent state names 0
msiAdmShowFNM Admin ‐ list function names (micro‐services) 0
msiAdmWriteDVMapsFromStructIntoFile Admin ‐ write persistent state name maps from memory into a file 0
msiAdmWriteFNMapsFromStructIntoFile Admin ‐ write function name maps from memory into a file 0
msiAdmWriteMSrvcsFromStructIntoFile Admin ‐ write micro‐service names from memory into a file 0
msiAdmWriteRulesFromStructIntoFile Admin ‐ write rules from memory into a file 0
msiApplyDCMetadataTemplate Apply the Dublin Core template to set attribute‐value‐unit triplets on a digital object 27
msiAssociateKeyValuePairsToObj Add attribute‐value‐units to a digital object, specified as key‐value pairs 7
msiAutoReplicateService Verify integrity and repair corrupted digital objects 26
msiBytesBufToStr Format a buffer into a string 0
msiCheckAccess Check access control 28
msiCheckHostAccessControl Check host access control 65
msiCheckOwner Check owner of a digital object 0
msiCheckPermission Check access permissions 0
msiCloseGenQuery Close the memory structure for a general query 0
msiCollCreate Create a collection 24
msiCollectionSpider Apply workflow to digital objects in a collection 15
msiCollRepl Replicate a collection 18
msiCollRsync Recursively synchronize a source collection with a target collection 14
msiCommit Commit a change to the metadata catalog 0
msiConvertCurrency Get conversion rates for currencies from a web service 0
msiCopyAVUMetadata Copy attribute‐value‐units between digital objects 27
msiCreateCollByAdmin Admin ‐ create a collection 2
msiCreateUser Admin ‐ create a user 63
msiCreateUserAccountsFromDataObj Create user accounts specified in a list in a digital object 20
msiCreateXmsgInp Create an Xmsg packet from input parameters (messaging system) 0
msiCutBufferInHalf Decrease size of an in‐memory buffer 0
msiDataObjAutoMove Move a file into a destination collection 13
msiDataObjChksum Checksum a digital object 15
msiDataObjClose Close a digital object 47
msiDataObjCopy Copy a digital object 16
msiDataObjCreate Create a digital object 13
msiDataObjGet Get a digital object 13
msiDataObjLseek Seek to a location in a digital object 0
msiDataObjOpen Open a digital object 20
msiDataObjPhymv Physically move a digital object 22
msiDataObjPut Put a digital object into the data grid 0
msiDataObjRead Read a digital object 0
msiDataObjRename Rename a digital object 13
msiDataObjRepl Replicate a digital object 13
msiDataObjRsync Synchronize a digital object with an iRODS collection 15
msiDataObjTrim Delete selected replicas of a digital object 13
msiDataObjUnlink Delete a digital object 20
msiDataObjWrite Write a digital object 0
msiDboExec Execute a database resource object 56
msiDbrCommit Execute a database resource commit 56
msiDbrRollback Rollback a database resource object 56
msiDeleteCollByAdmin Admin ‐ delete a collection 36
msiDeleteDisallowed Turn off deletion for a digital object 0
msiDeleteUnusedAVUs Delete unused attribute‐value‐unit triplets 52
msiDeleteUser Delete a user 67
msiDeleteUsersFromDataObj Delete users specified in a list in a digital object 20
msiDigestMonStat Generate and store load factors for monitoring resources 61
msiDoSomething Template for constructing a new micro‐service 0
msiExecCmd Execute a remote command 0
msiExecGenQuery Execute general query user defined
msiExecStrCondQuery Convert a string to a query and execute user defined
msiExit Add a user explanation to the error stack 0
msiExportRecursiveCollMeta Recursively export collection metadata into a buffer using pipe‐delimited format 33
msiExtractTemplateMDFromBuf Use a template to apply pattern matching to a buffer and extract key‐value pairs 0
msiFlagDataObjwithAVU Add an attribute‐value‐unit to a digital object 27
msiFlagInfectedObjs Parse the output from clamscan and flag infected objects 20
msiFloatToString Convert a binary variable to a string 0
msiFlushMonStat Delete old usage monitoring statistics 0
msiFreeBuffer Free space allocated to an in‐memory buffer 0
msiFreeNcStruct Free an in‐memory structure used to process netCDF files 0
msiFtpGet Get a file from an FTP site 9
msiGetAuditTrailInfoByActionID Get audit trail information based on ActionID 1
msiGetAuditTrailInfoByKeywords Get audit trail information based on use of keywords 1
msiGetAuditTrailInfoByObjectID Get audit trail information based on ObjectIDs 1
msiGetAuditTrailInfoByTimeStamp Get audit trail information based on time stamps 1
msiGetAuditTrailInfoByUserID Get audit trail information based on userID 1
msiGetCollectionACL Get access controls for a collection 6
msiGetCollectionContentsReport Generate a report of collection contents 34
msiGetCollectionPSmeta Get attribute‐value‐units from a collection in pipe‐delimited format 38
msiGetCollectionSize Get the size of a collection 35
msiGetContInxFromGenQueryOut Get continuation index for whether additional rows are available for a query result 0
msiGetDataObjACL Get access control list for a digital object 19
msiGetDataObjAIP Create XML file containing system and descriptive metadata 12
msiGetDataObjAVUs Get attribute‐value‐units from a digital object 32
msiGetDataObjPSmeta Get attribute‐value‐units from a digital object in pipe‐delimited format 32
msiGetDiffTime Get the difference between two system times 0
msiGetDVMapsFromDBIntoStruct Load persistent state name maps from database into memory structure 49
msiGetFNMapsFromDBIntoStruct Load function name maps from database into memory structure 50
msiGetIcatTime Get the system time from the metadata catalog 0
msiGetMoreRows Get more query results 0
msiGetMSrvcsFromDBIntoStruct Load micro‐service names from database into memory structure 53
msiGetObjectPath Convert from in‐memory structure to string for printing 0
msiGetObjType Get the type of digital object (file, collection, user, resource) 31
msiGetQuote Get stock quotation by accessing external web service 0
msiGetRescAddr Get the IP address of a storage resource 0
msiGetRulesFromDBIntoStruct Load rules from database into a memory structure 59
msiGetSessionVarValue Get value of a session variable from in‐memory structure 0
msiGetStderrInExecCmdOut Retrieve standard error from remote command execution 0
msiGetStdoutInExecCmdOut Retrieve standard out from remote command execution 0
msiGetSystemTime Get the system time from the iRODS server 0
msiGetTaggedValueFromString Use pattern‐based extraction to retrieve a value for a tag from a string 0
msiGetUserACL Get access control list for a user 30
msiGetUserInfo Get information about a user 64
msiGetValByKey Extract a value from in‐memory structure that holds result of a query 0
msiGoodFailure Force failure in a workflow without initiating recovery procedures 0
msiGuessDataType Guess the data type based on the file extension 62
msiH5Dataset_read Read a dataset in an HDF5 file 0
msiH5Dataset_read_attribute Get attributes from an HDF5 file 0
msiH5File_close Close an HDF5 file 44
msiH5File_open Open an HDF5 file 25
msiH5Group_read_attribute Get group attributes from an HDF5 file 0
msiHumanToSystemTime Convert human time format to system time format 0
msiImageConvert Convert image format 0
msiImageGetProperties Get image properties from an image (Colors, ColorSpace, Depth, Format, Gamma, …) 0
msiIp2location Convert an IP address to a location using an external web service 0
msiIsColl Verify digital object is a collection 37
msiIsData Check if digital object is a file 31
msiListEnabledMS List enabled micro‐services 0
msiLoadACLFromDataObj Load access controls from a list in a digital object 20
msiLoadMetadataFromDataObj Load attribute‐value‐units from a list in a digital object 20
msiLoadMetadataFromXml Load metadata for digital objects from an XML file 11
msiLoadUserModsFromDataObj Load user information from a list in a digital object 20
msiMakeGenQuery Make a general query 0
msiMakeQuery Construct a query 0
msiMergeDataCopies Merge multiple collections to create an authoritative version 17
msiNccfGetVara Get variables from a netCDF file 0
msiNcClose Close a netCDF file 0
msiNcCreate Create a netCDF file 10
msiNcGetArrayLen Get array length from a netCDF file 0
msiNcGetAttNameInInqOut Get attribute names from a netCDF file 0
msiNcGetAttValStrInInqOut Get attribute values from a netCDF file 0
msiNcGetDataType Get data type from a netCDF file 0
msiNcGetDimLenInInqOut Get dimension length from a netCDF file 0
msiNcGetDimNameInInqOut Get dimension name from a netCDF file 0
msiNcGetElementInArray Get an element from an array in a netCDF file 0
msiNcGetFormatInInqOut Get the format of a netCDF file 0
msiNcGetGrpInInqOut Get group information from a netCDF file 0
msiNcGetNattsInInqOut Get the number of attributes in a netCDF file 0
msiNcGetNdimsInInqOut Get the number of dimensions in a netCDF file 0
msiNcGetNGrpsInInqOut Get the number of groups in a netCDF file 0
msiNcGetNumDim Get a dimension from a netCDF file 0
msiNcGetNvarsInInqOut Get the number of variables in a netCDF file 0
msiNcGetVarIdInInqOut Get a variable ID from a netCDF file 0
msiNcGetVarNameInInqOut Get a variable name from a netCDF file 0
msiNcGetVarsByType General variable sub‐setting function for a netCDF file 0
msiNcGetVarTypeInInqOut Get a variable type from a netCDF file 0
msiNcInq Query a netCDF file 0
msiNcInqGrps Get group paths for a given netCDF ID 0
msiNcInqId Get netCDF ID 0
msiNcInqWithId Query a netCDF file with a netCDF ID 0
msiNcIntDataTypeToStr Convert netCDF data type to a string 0
msiNcOpen Open a netCDF file 13
msiNcOpenGroup Open a group within a netCDF file 0
msiNcRegGlobalAttr Register a global attribute in a netCDF file 0
msiNcSubsetVar Subset a variable in a netCDF file 0
msiNcVarStat List variable information in a netCDF file 0
msiNoChkFilePathPerm Set policy for checking the file path permission when registering a physical file path 0
msiNoTrashCan Set policy for use of trash can 0
msiObjByName Retrieve astronomy images by name using web services 0
msiobjget_dbo Get a database object from a registered database resource 0
msiobjget_http Get an http page from a registered web site 0
msiobjget_irods Get a file from a registered iRODS path name 0
msiobjget_slink Get a digital object referenced by a soft link to an iRODS data grid 20
msiobjget_srb Get a file from a registered Storage Resource Broker path name 0
msiobjget_test Test the micro‐service object framework 0
msiobjget_z3950 Get an object from a registered Z39.50 site 0
msiobjput_dbo Write a registered database object resource 0
msiobjput_http Write a registered http page 0
msiobjput_irods Write a registered iRODS digital object 0
msiobjput_slink Write a registered iRODS digital object in a remote iRODS data grid 0
msiobjput_srb Write a registered Storage Resource Broker digital object 0
msiobjput_test Test the micro‐service object framework 0
msiobjput_z3950 Write a registered Z39.50 digital object 0
msiObjStat Get status of digital object for workflow 21
msiOprDisallowed Disallow an operation 0
msiPhyBundleColl Physically bundle a collection 23
msiPhyPathReg Register a physical path 0
msiPrintGenQueryInp Print a general query 0
msiPrintGenQueryOutToBuffer Write contents of output results from a general query into a buffer 0
msiPrintKeyValPair Print a key value pair returned from a query 0
msiPropertiesAdd Add properties to a list 0
msiPropertiesClear Clear properties from a list 0
msiPropertiesClone Clone a properties list 0
msiPropertiesExists Verify existence of properties in a list 0
msiPropertiesFromString Create a properties list from a string 0
msiPropertiesGet Get a property from a list 0
msiPropertiesNew Create a new property list 0
msiPropertiesRemove Remove properties from a list 0
msiPropertiesSet Set the value of a property in a list 0
msiPropertiesToString Convert a property list into a string buffer 0
msiQuota Admin ‐ calculate storage usage and check storage quotas 46
msiRcvXmsg Receive an Xmsg packet (messaging system) 0
msiReadMDTemplateIntoTagStruct Parse a buffer holding a tag template and store the tags in an in‐memory tag structure 0
msiRecursiveCollCopy Recursively copy a collection 5
msiRemoveKeyValuePairsFromObj Remove attribute‐value‐unit from digital object, specified as key‐value pair 28
msiRenameCollection Rename a collection 8
msiRenameLocalZone Admin ‐ Rename the local zone (data grid) 40
msiRmColl Remove a collection 39
msiRollback Roll back a database transaction 0
msiSdssImgCutout_GetJpeg Get an astronomy image cutout using a web service 0
msiSendMail Send e‐mail message 0
msiSendStdoutAsEmail Send standard output as an e‐mail message 0
msiSendXmsg Send an Xmsg packet (messaging system) 0
msiServerBackup Backup an iRODS server to a local vault 3
msiServerMonPerf Monitor server performance 57
msiSetACL Set an access control 4
msiSetBulkPutPostProcPolicy Control use of the acPostProcForPut policy when using a bulk put operation 0
msiSetChkFilePathPerm Disallow non‐admin user from registering files 0
msiSetDataObjAvoidResc Disallow use of a storage resource 0
msiSetDataObjPreferredResc Set the preferred storage resource 0
msiSetDataType Set the type of digital object (file, collection, user, resource) 41
msiSetDataTypeFromExt Set a recognized data type for a digital object based on its extension 42
msiSetDefaultResc Set the default storage resource 0
msiSetGraftPathScheme Define the physical path name for storing files 0
msiSetMultiReplPerResc Allow multiple replicas to exist on the same storage resource 0
msiSetNoDirectRescInp Define a list of resources that cannot be used by a normal user 0
msiSetNumThreads Set the number of threads used for parallel I/O 0
msiSetPublicUserOpr Set a list of operations that can be performed by the user "public" 0
msiSetQuota Set resource usage quota 55
msiSetRandomScheme Set the physical path name based on a randomly generated path 0
msiSetReplComment Set data object comment field 29
msiSetRescQuotaPolicy Turn resource quotas on or off 0
msiSetRescSortScheme Set the scheme used for selecting a storage resource 0
msiSetReServerNumProc Set the number of execution threads for processing rules 0
msiSetResource Set the resource to use within a workflow 0
msiSleep Sleep for a specified interval 0
msiSortDataObj Sort the order in which resources will be accessed to retrieve a replicated digital object 0
msiSplitPath Split a path into a collection and file name 0
msiSplitPathByKey Split a path based on a key (separate a file name from an extension) 0
msiStageDataObj Stage a digital object to a specified resource 0
msiStoreVersionWithTS Create a time‐stamped version of a digital object 20
msiStrArray2String Convert an array of strings to a list of strings separated by "%" 0
msiStrCat Concatenate a string to a target string 0
msiStrchop Remove the last character of a string 0
msiString2KeyValPair Convert a string to a key‐value pair in memory structure 0
msiString2StrArray Convert a list of strings separated by "%" to an in‐memory array of strings 0
msiStripAVUs Remove attribute‐value‐units from a digital object 28
msiStrlen Get the length of a string 0
msiStrToBytesBuf Load a string into an in‐memory buffer 0
msiStructFileBundle Create a bundle of files in a collection for export as a tar file 13
msiSysChksumDataObj Checksum a digital object 45
msiSysMetaModify Modify system metadata attributes 43
msiSysReplDataObj Admin ‐ replicate a digital object 18
msiTarFileCreate Create a tar file 47
msiTarFileExtract Extract files from a tar file 20
msiVacuum Optimize indices in the metadata catalog 0
msiWriteRodsLog Write a string into iRODS/server/log/rodsLog 0
msiXmlDocSchemaValidate Validate an XML document schema for adding attribute‐value‐unit triplets 13
msiXmsgCreateStream Create a message stream (messaging system) 0
msiXmsgServerConnect Connect to a message stream (messaging system) 0
msiXmsgServerDisConnect Disconnect from a message stream (messaging system) 0
msiXsltApply Apply an XSLT transformation to an XML document 13
msiz3950Submit Retrieve a record from a Z39.50 server 0
nop Null operation 0
not like Not like operator for query 0
not like regex Not like operator for query using regular expression 0
readXMsg Read a message stream (messaging system) 0
remote Execute rule at a remote site 0
setelem Set an element in a list 0
size Return the number of elements in a list 0
split Split a string 0
str Convert a variable to a string 0
strlen Return the length of a string 0
substr Create a specified sub‐string 0
succeed Cause a workflow to immediately succeed (workflow operator) 0
time Get the current time 0
timestr Convert a datetime variable to a string 0
timestrf Convert a datetime variable to a string using a format 0
tl Calculate the tail of a list 0
triml Trim a prefix of a string 0
trimr Trim a suffix of a string 0
while While loop (workflow operator) 0
writeBytesBuf Write a buffer to standard output or standard error 0
writeKeyValPairs Write key‐value pairs to standard output or standard error from an in‐memory structure 0
writeLine Write a line to standard output or standard error 0
writePosInt Write a positive integer to standard output or standard error 0
writeString Write a string to standard output or standard error 0
writeXMsg Write a message packet (messaging system) 0
The sets of persistent state information are listed in Table C:2. Each persistent state information set identifies whether a persistent state:
1 – attribute is read
2 – attribute is modified
3 – attribute is both read and modified.
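The three codes behave like a two‐bit mask, since 3 is the combination of 1 (read) and 2 (modified). The sketch below decodes them in Python under that assumption. The Access class, the helper names, and the sample row are hypothetical and serve only to illustrate the encoding; the sample values are not taken from the tables.

```python
from enum import IntFlag


class Access(IntFlag):
    """Bit flags matching the codes 1 (read) and 2 (modified)."""
    READ = 1
    MODIFY = 2


def describe(code):
    """Translate a table code (1, 2, or 3) into words."""
    access = Access(code)
    parts = []
    if Access.READ in access:
        parts.append("read")
    if Access.MODIFY in access:
        parts.append("modified")
    return " and ".join(parts)


def modified_attributes(row):
    """Return the attribute names whose code includes modification."""
    return sorted(name for name, code in row.items()
                  if Access.MODIFY in Access(code))


if __name__ == "__main__":
    # Hypothetical codes for one persistent state set.
    sample = {"COLL_NAME": 3, "COLL_CREATE_TIME": 2, "USER_NAME": 1}
    for name, code in sample.items():
        print(name, "is", describe(code))
    print("Modified:", modified_attributes(sample))
```

Treating the codes as flags keeps the check for "modified" to a single membership test, whether the table entry is 2 or 3.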
Table C:2 Persistent state attributes modified by micro‐services for files & collections
Persistent State Variable Sets 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
Number of micro‐services 1 1 1 1 1 1 1 1 1 1 1 10 1 3 1 1 2 1 11 1 1 1
COLL_ACCESS_COLL_ID 2 3 3 1 1 1 1 1 1
COLL_ACCESS_TYPE 2 3 3 1 1 1 1 1 1
COLL_ACCESS_USER_ID 2 3 3 1 1 1 1 1 1
COLL_CREATE_TIME 2 3 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
COLL_ID 3 3 1 1 1 1 1 3 1 1 1 1 1 1 1 1 1 1 1 1 1 1
COLL_INHERITANCE 1 1 1
COLL_MODIFY_TIME 2 3 1 1 2 1 1 1 1 1 1 1 1 3 1 1 1 1 1 1
COLL_NAME 3 3 1 1 1 1 3 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
COLL_OWNER_NAME 2 3 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
COLL_OWNER_ZONE 2 3 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
COLL_PARENT_NAME 2 2 3 1
DATA_ACCESS_DATA_ID 3 1 1 1 1 1 1 1 1 1 1 1 1 1 1
DATA_ACCESS_TYPE 3 1 1 1 1 1 1 1 1 1 1 1 1 1 1
DATA_ACCESS_USER_ID 3 1 1 1 1 1 1 1 1 1 1 1 1 1 1
DATA_CHECKSUM 1 3 2 1 1 1 3 3 3 1 3 1 1 1 2
DATA_COLL_ID 1 1 1 1 1 1 2 1 1 1 1 1 1 3 3 1 1 1
DATA_COMMENTS 1 1 1 1 1 1 1 1 3 1
DATA_CREATE_TIME 1 3 2 1 1 1 1 1 1 1 3 1 1 1
DATA_EXPIRY 1 1 1 1 1 1 1 1 3 1
DATA_ID 1 1 1 1 1 1 3 2 1 1 1 1 1 1 1 3 1 1 1
DATA_MAP_ID 1 1 1 1 1 1 1 1 3 1
DATA_MODIFY_TIME 1 3 2 1 1 1 1 1 3 3 3 1 1 1
DATA_NAME 1 1 1 1 1 1 3 2 1 1 1 1 1 1 1 3 1 1 1
DATA_OWNER_NAME 1 3 2 1 1 1 1 1 1 1 3 1 1 1
DATA_OWNER_ZONE 1 3 2 1 1 1 1 1 1 1 3 1 1 1
DATA_PATH 1 3 2 1 1 1 1 1 1 1 3 1 2
DATA_REPL_NUM 1 3 2 1 1 1 1 1 1 1 3 1
DATA_RESC_GROUP_NAME 1 2 2 1 1 1 1 1 1 1 3 1 2
DATA_RESC_NAME 1 3 2 1 1 1 1 1 1 1 3 1 2
DATA_SIZE 1 3 2 1 1 1 1 1 1 1 3 1 1 1 2
DATA_STATUS 1 1 1 1 1 1 1 1 3 1
DATA_TYPE_NAME 1 2 2 1 1 1 1 1 1 1 3 1
DATA_VERSION 1 2 2 1 1 1 1 1 1 1 3 1
META_COLL_ATTR_ID 2 3
META_COLL_ATTR_NAME 2 3
META_COLL_ATTR_UNITS 2 3
META_COLL_ATTR_VALUE 2 3
META_COLL_CREATE_TIME 2 3
META_COLL_MODIFY_TIME 2 3
META_DATA_ATTR_ID 2 3 3
META_DATA_ATTR_NAME 2 3 1 1
META_DATA_ATTR_UNITS 2 3 1 1
META_DATA_ATTR_VALUE 2 3 1 1
META_DATA_CREATE_TIME 2 3 2
META_DATA_MODIFY_TIME 2 3 2
TOKEN_ID 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
TOKEN_NAME 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
TOKEN_NAMESPACE 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
USER_GROUP_ID 1 1 1 1 1 1 1 1 1 1 1 1 1 1
USER_ID 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
USER_NAME 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
USER_TYPE 1 1 1 1 1 1 1 1 1 1 1 1 1 1
USER_ZONE 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
ZONE_NAME 1
ZONE_TYPE 1
Table C:3 Additional persistent state attribute sets for operations on files and collections.
Persistent State Variable Sets 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47
Number of micro‐services 1 1 1 3 3 1 1 2 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2
COLL_CREATE_TIME 1
COLL_ID 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
COLL_MODIFY_TIME 1 2 2
COLL_NAME 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2
COLL_OWNER_NAME 1
COLL_OWNER_ZONE 1 2
COLL_PARENT_NAME 1 2
DATA_ACCESS_DATA_ID 1 1 1 1 1 1 1 1 1 1 1
DATA_ACCESS_TYPE 1 1 1 1 1 1 1 1 1 1
DATA_ACCESS_USER_ID 1 1 1 1 1 1 1 1 1 1
DATA_CHECKSUM 1 3 2
DATA_COLL_ID 1 1 1 1 1 1 1 1 1 1 1 1
DATA_COMMENTS 1 1 2 2
DATA_CREATE_TIME 1 1
DATA_EXPIRY 1 1 2
DATA_ID 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
DATA_MAP_ID 1 1
DATA_MODIFY_TIME 1 1 2 2
DATA_NAME 1 1 1 1 1 1 1 1 1
DATA_OWNER_NAME 1 1 1
DATA_OWNER_ZONE 1 1 2 1
DATA_PATH 1 1 1
DATA_REPL_NUM 1 1 1 1 1 1 1
DATA_RESC_GROUP_NAME 1 1 1
DATA_RESC_NAME 1 1
DATA_SIZE 1 1 1 1 2
DATA_STATUS 1 1
DATA_TYPE_NAME 1 1 1 2 2 2
DATA_VERSION 1 1
META_COLL_ATTR_ID 1 1
META_COLL_ATTR_NAME 1 1
META_COLL_ATTR_UNITS 1 1
META_COLL_ATTR_VALUE 1 1
META_DATA_ATTR_ID 2 1 1
META_DATA_ATTR_NAME 2 1 1
META_DATA_ATTR_UNITS 2 1 1
META_DATA_ATTR_VALUE 2 1 1
META_DATA_CREATE_TIME 2
META_DATA_MODIFY_TIME 2
QUOTA_LIMIT 1
QUOTA_MODIFY_TIME 2
QUOTA_OVER 2
QUOTA_RESC_ID 3
QUOTA_USAGE 3
QUOTA_USAGE_RESC_ID 1
QUOTA_USAGE_USER_ID 1
QUOTA_USER_ID 3
RESC_ID 1
RESC_MODIFY_TIME 2
RESC_NAME 1 1
RESC_ZONE_NAME 2
RESC_VAULT_PATH 1
RULE_MODIFY_TIME 2
RULE_OWNER_ZONE 2
TOKEN_ID 1 1 1 1 1 1 1 1 1
TOKEN_NAME 1 1 1 1 1 1 1 1 1
TOKEN_NAMESPACE 1 1 1 1 1 1 1 1 1
USER_GROUP_ID 1 1 1 1 1 1 1 1 1 1 1
USER_ID 1 1 1 1 1 1 1 1 1 1 1 1
USER_MODIFY_TIME 2
USER_NAME 1 1 1 1 1 1 1 1 1 1 1 1
USER_TYPE 1 1 1 1 1 1 1 1 1 1
USER_ZONE 1 1 1 1 1 2 1 1 1 1 1
ZONE_ID 1
ZONE_MODIFY_TIME 2
ZONE_NAME 3
Table C:4 Persistent state attributes modified by micro‐services for audit trails, rules, and users
Persistent State Variable Sets 1 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67
Number of micro‐services 5 1 1 1 1 1 1 1 1 3 1 1 2 1 1 1 1 1 1 1 1
AUDIT_ACTION_ID 1
AUDIT_COMMENT 1
AUDIT_CREATE_TIME 1
AUDIT_MODIFY_TIME 1
AUDIT_OBJ_ID 1
AUDIT_USER_ID 1
DVM_BASE_MAP_BASE_NAME 3 1
DVM_BASE_MAP_CREATE_TIME 2
DVM_BASE_MAP_MODIFY_TIME 2
DVM_BASE_MAP_OWNER_NAME 2
DVM_BASE_MAP_OWNER_ZONE 2
DVM_BASE_MAP_VERSION 3 1
DVM_BASE_NAME 3
DVM_CONDITION 3 1
DVM_CREATE_TIME 2
DVM_EXT_VAR_NAME 3 1
DVM_ID 3 1
DVM_INT_MAP_PATH 3 1
DVM_MODIFY_TIME 2
DVM_OWNER_NAME 2
DVM_OWNER_ZONE 2
DVM_VERSION 2
FNM_BASE_MAP_BASE_NAME 1 2
FNM_BASE_MAP_CREATE_TIME 2
FNM_BASE_MAP_MODIFY_TIME 2
FNM_BASE_MAP_OWNER_NAME 2
FNM_BASE_MAP_OWNER_ZONE 2
FNM_BASE_MAP_VERSION 1 2
FNM_BASE_NAME 3
FNM_CREATE_TIME 2
FNM_EXT_FUNC_NAME 1 3
FNM_ID 1 3
FNM_INT_FUNC_NAME 1 3
FNM_MODIFY_TIME 2
FNM_OWNER_NAME 2
FNM_OWNER_ZONE 2
META_COLL_ATTR_ID 2
META_DATA_ATTR_ID 2
MSRVC_MODULE_NAME 1 2
MSRVC_NAME 1 2
MSRVC_SIGNATURE 1 2
MSRVC_VERSION 1 2
MSRVC_HOST 1 2
MSRVC_ID 1 2
MSRVC_LANGUAGE 1 2
MSRVC_LOCATION 1 2
MSRVC_STATUS 1 2
MSRVC_TYPE_NAME 1 2
QUOTA_LIMIT 3
QUOTA_MODIFY_TIME 2
QUOTA_OVER 2
QUOTA_RESC_ID 3
QUOTA_USAGE 1
QUOTA_USAGE_RESC_ID 1
QUOTA_USAGE_USER_ID 1
QUOTA_USER_ID 3
RESC_GROUP_RESC_ID 1
RESC_GROUP_NAME 1
RESC_ID 1 1
RESC_NAME 1 1 1
RESC_ZONE_NAME 1
RESC_VAULT_PATH 1
RULE_BASE_MAP_BASE_NAME 3 1
RULE_BASE_MAP_CREATE_TIME 2
RULE_BASE_MAP_MODIFY_TIME 2
RULE_BASE_MAP_OWNER_NAME 2
RULE_BASE_MAP_OWNER_ZONE 2
RULE_BASE_MAP_PRIORITY 2 1
RULE_BASE_MAP_VERSION 3 1
RULE_BASE_NAME 1
RULE_BODY 1 1
RULE_CONDITION 1 1
RULE_EVENT 1 1
RULE_EXEC_ADDRESS 2
RULE_EXEC_ESTIMATED_EXE_TIME 2
RULE_EXEC_FREQUENCY 2
RULE_EXEC_ID 2
RULE_EXEC_NAME 2
RULE_EXEC_NOTIFICATION_ADDR 2
RULE_EXEC_PRIORITY 2
RULE_EXEC_REI_FILE_PATH 2
RULE_EXEC_TIME 2
RULE_EXEC_USER_NAME 2
RULE_ID 3 1 1
RULE_NAME 1 1
RULE_RECOVERY 1 1
SLD_RESC_NAME 1
SLD_CREATE_TIME 1
TOKEN_ID 1
TOKEN_NAME 1 1
TOKEN_NAMESPACE 1
TOKEN_VALUE2 1
USER_COMMENT 1
USER_CREATE_TIME 2 1
USER_GROUP_ID 1 2 1 2
USER_ID 1 2 1 1 1 1
USER_INFO 1
USER_MODIFY_TIME 2 1
USER_NAME 1 2 1 1 1 1
USER_TYPE 1 2 1 1
USER_ZONE 1 2 1 1 1
ZONE_NAME 1 1
ZONE_TYPE 1 1
Appendix D: Persistent State Variables
The persistent state variables that can be queried are listed below. Note that many of the attributes are maintained and set by the iRODS servers, independently of the micro‐services and the policy‐enforcement points.
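Persistent state variables are typically queried with the iquest icommand, using a SELECT ... WHERE ... string. The Python sketch below assembles such a string from attribute names. The helpers build_query and iquest_command are hypothetical names introduced for this illustration, values are assumed to contain no quote characters, and no escaping is attempted.

```python
def build_query(select, where=None):
    """Build a query string from attribute names and simple conditions.

    select: list of persistent state variable names to return.
    where:  optional list of (attribute, operator, value) tuples.
    """
    query = "SELECT " + ", ".join(select)
    if where:
        conditions = [f"{attr} {op} '{value}'" for attr, op, value in where]
        query += " WHERE " + " AND ".join(conditions)
    return query


def iquest_command(select, where=None):
    """Return the argument list for running iquest with the query."""
    return ["iquest", build_query(select, where)]


if __name__ == "__main__":
    # List the access type names stored in the metadata catalog.
    print(iquest_command(["TOKEN_NAME"],
                         [("TOKEN_NAMESPACE", "=", "access_type")]))
```

The argument list could be passed to subprocess.run on a host where the icommands are installed and iinit has been run; that step is omitted here because it depends on a working data grid.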
Table D:1 Persistent State Variables
Persistent State Attribute Explanation
AUDIT_ACTION_ID Internal identifier for type of action that is audited
AUDIT_COMMENT Comment on audit action for this instance
AUDIT_CREATE_TIME Creation timestamp for audit action
AUDIT_MODIFY_TIME Modification timestamp for audit action
AUDIT_OBJ_ID Internal identifier of the object (data, collection, user, etc.) on which the audit action was performed
AUDIT_USER_ID Internal identity of user whose action was audited
COLL_ACCESS_COLL_ID Aliased collection identifier used for access control
COLL_ACCESS_NAME Access string for collection (cf. DATA_ACCESS_NAME)
COLL_ACCESS_TYPE Internal identifier for access name
COLL_ACCESS_USER_ID Internal identifier of the user whose action is audited.
COLL_COMMENTS Comments about the collection
COLL_CREATE_TIME Collection creation timestamp
COLL_FILEMETA_CREATE_TIME When a Unix directory is imported into iRODS from client‐side, the directory metadata in the file system is captured in the iCAT under COLL_FILEMETA. This is useful when getting the directory back into the client as the “original” metadata can be re‐created. The COLL_FILEMETA_CREATE_TIME variable holds the value when the directory metadata was inserted into iCAT
COLL_FILEMETA_CTIME Original Unix directory create time at the client‐side.
COLL_FILEMETA_GID Original Unix Group‐id for the directory (used for ACLs) at the client‐side.
COLL_FILEMETA_GROUP Original Unix Group name for the directory (used for ACLs) at the client‐side.
COLL_FILEMETA_MODE Original Unix ACL for the directory at the client‐side.
COLL_FILEMETA_MODIFY_TIME Value when the directory metadata was modified in iCAT
COLL_FILEMETA_MTIME Original Unix timestamp for last modification at the client‐side
COLL_FILEMETA_OBJ_ID Original Unix object_id for the directory at the client‐side.
COLL_FILEMETA_OWNER Original Unix owner for the directory at the client‐side.
COLL_FILEMETA_SOURCE_PATH Original Unix path for the directory at the client‐side.
COLL_FILEMETA_UID Original Unix user‐id of owner for the directory at the client‐side.
COLL_ID Collection internal identifier
COLL_INHERITANCE Attributes inherited by sub‐collections from parent‐collection: ACL, metadata, pins, locks
COLL_MAP_ID Internal identifier denoting the type of collection.
COLL_MODIFY_TIME Last modification timestamp for collection
COLL_NAME Logical collection name
COLL_OWNER_NAME Collection owner
COLL_OWNER_ZONE Home zone of the collection owner
COLL_PARENT_NAME Parent collection name
COLL_TOKEN_NAMESPACE See TOKEN_NAMESPACE (also DATA_TOKEN_NAMESPACE), not used
DATA_ACCESS_DATA_ID Internal identifier of the digital object for which access is defined
DATA_ACCESS_NAME Access string in iCAT used for data, collections, etc. (e.g. read object) iquest "SELECT TOKEN_NAME WHERE TOKEN_NAMESPACE = 'access_type'"
DATA_ACCESS_TYPE Internal iCAT identifier
DATA_ACCESS_USER_ID User or group (name) for which the access is defined on digital object
DATA_CHECKSUM Checksum stored as tagged list: <BINHEX>12344</BINHEX><MD5>22234422</MD5>
DATA_COLL_ID Collection internal identifier
DATA_COMMENTS Comments about the digital object
DATA_CREATE_TIME Creation timestamp for the digital object
DATA_EXPIRY Expiration date for the digital object
195
DATA_FILEMETA_CREATE_TIME  When a Unix file is imported into iRODS from the client side, the file metadata in the file system is captured in the iCAT under DATA_FILEMETA. This is useful when getting the file back to the client, as the “original” metadata can be re-created. The DATA_FILEMETA_CREATE_TIME variable holds the time when the file metadata was inserted into iCAT.
DATA_FILEMETA_CTIME  Original Unix file create time at the client side
DATA_FILEMETA_GID  Original Unix group ID for the file (used for ACLs) at the client side
DATA_FILEMETA_GROUP  Original Unix group name for the directory file (used for ACLs) at the client side
DATA_FILEMETA_MODE  Original Unix ACL for the file at the client side
DATA_FILEMETA_MODIFY_TIME  Time when the file metadata was modified in iCAT
DATA_FILEMETA_MTIME  Original Unix timestamp for last modification at the client side
DATA_FILEMETA_OBJ_ID  Original Unix object_id for the file at the client side
DATA_FILEMETA_OWNER  Original Unix owner for the file at the client side
DATA_FILEMETA_SOURCE_PATH  Original Unix path for the file at the client side
DATA_FILEMETA_UID  Original Unix user ID of the owner for the file at the client side
DATA_ID  Unique data internal identifier. A digital object is identified by (zone, collection, data name, replica, version). The identifier is the same across replicas and versions.
DATA_MAP_ID  Internal identifier denoting the type of data
DATA_MODIFY_TIME  Last modification timestamp for the digital object
DATA_NAME  Logical name of the digital object
DATA_OWNER_NAME  User who created the object
DATA_OWNER_ZONE  Home zone of the user who created the object
DATA_PATH  Physical path name for digital object in resource
DATA_REPL_NUM  Replica number starting with “1”
DATA_REPL_STATUS  Replica status: locked, is-deleted, pinned, hide
DATA_RESC_GROUP_NAME  Name of resource group in which data is stored
DATA_RESC_NAME  Logical name of storage resource
DATA_SIZE  Size of the digital object in bytes
DATA_STATUS  Digital object status: locked, is-deleted, pinned, hide
DATA_TOKEN_NAMESPACE  Namespace of the data token: e.g. data type; not used
DATA_TYPE_NAME  Type of data: jpeg image, PDF document
DATA_VERSION  Version string assigned to the digital object. Older versions of replicas have a negative replica number.
DVM_BASE_MAP_BASE_NAME  Name for the Data Base of Data Variable Set of Maps (e.g. “core” in core.dvm)
DVM_BASE_MAP_COMMENT  Comments for DVM_BASE_MAP
DVM_BASE_MAP_CREATE_TIME  Creation time for DVM_BASE_MAP
DVM_BASE_MAP_MODIFY_TIME  Last modification time for DVM_BASE_MAP
DVM_BASE_MAP_OWNER_NAME  Owner’s name of the DVM_BASE_MAP
DVM_BASE_MAP_OWNER_ZONE  Owner’s zone name of the DVM_BASE_MAP
DVM_BASE_MAP_VERSION  Version of the DVM_BASE_MAP (empty or 0 means current)
DVM_BASE_NAME  Foreign key reference to DVM_BASE_MAP_BASE_NAME
DVM_COMMENT  Comment for the DVM
DVM_CONDITION  Condition for applying the DVM Mapping corresponding to DVM_EXT_VAR_NAME
DVM_CREATE_TIME  Creation time of the DVM Mapping
DVM_EXT_VAR_NAME  External name for the Map (the actual $-variable)
DVM_ID  An internal identifier for DVM Mapping
DVM_INT_MAP_PATH  Internal structure path in REI corresponding to DVM_EXT_VAR_NAME
DVM_MODIFY_TIME  Last modification time for the DVM Mapping
DVM_OWNER_NAME  Owner’s name of the DVM Mapping
DVM_OWNER_ZONE  Owner’s zone name of the DVM Mapping
DVM_STATUS  Status of the DVM Mapping (empty is valid)
DVM_VERSION  Version for the DVM Mapping (empty or 0 means current)
FNM_BASE_MAP_BASE_NAME  Name for the Data Base of Function Name Set of Maps (e.g. “core” in core.fnm). This can be used for giving virtual names for micro-services and rules and for versioning names for the same.
FNM_BASE_MAP_COMMENT  Comments for FNM_BASE_MAP
FNM_BASE_MAP_CREATE_TIME  Creation time for FNM_BASE_MAP
FNM_BASE_MAP_MODIFY_TIME  Last modification time for FNM_BASE_MAP
FNM_BASE_MAP_OWNER_NAME  Owner’s name of the FNM_BASE_MAP
FNM_BASE_MAP_OWNER_ZONE  Owner’s zone name of the FNM_BASE_MAP
FNM_BASE_MAP_VERSION  Version of the FNM_BASE_MAP (empty or 0 means current)
FNM_BASE_NAME  Foreign key reference to FNM_BASE_MAP_BASE_NAME
FNM_COMMENT  Comment for the FNM Mapping
FNM_CREATE_TIME  Creation time of the FNM Mapping
FNM_EXT_FUNC_NAME  External name for the FNM Mapping
FNM_ID  An internal identifier for FNM Mapping
FNM_INT_FUNC_NAME  Internal structure path in REI corresponding to FNM_EXT_FUNC_NAME
FNM_MODIFY_TIME  Last modification time for the FNM Mapping
FNM_OWNER_NAME  Owner’s name of the FNM Mapping
FNM_OWNER_ZONE  Owner’s zone name of the FNM Mapping
FNM_STATUS  Status of the FNM Mapping (empty is valid)
FNM_VERSION  Version for the FNM Mapping (empty or 0 means current)
META_ACCESS_META_ID  Internal identifier of the (AVU) metadata for which access is defined
META_ACCESS_NAME  See DATA_ACCESS_NAME
META_ACCESS_TYPE  Internal iCAT identifier
META_ACCESS_USER_ID  User or group (name) for which the access is defined on metadata
META_COLL_ATTR_ID  Internal identifier for metadata attribute for collection
META_COLL_ATTR_NAME  Metadata attribute name for collection
META_COLL_ATTR_UNITS  Metadata attribute units for collection
META_COLL_ATTR_VALUE  Metadata attribute value for collection
META_COLL_CREATE_TIME  Creation time for the metadata for collections
META_COLL_MODIFY_TIME  Last modification time for the metadata for collections
META_DATA_ATTR_ID  Internal identifier for metadata attribute for digital object
META_DATA_ATTR_NAME  Metadata attribute name for digital object
META_DATA_ATTR_UNITS  Metadata attribute units for digital object
META_DATA_ATTR_VALUE  Metadata attribute value for digital object
META_DATA_CREATE_TIME  Timestamp when metadata was created
META_DATA_MODIFY_TIME  Timestamp when metadata was modified
META_MET2_ATTR_ID  Internal identifier for metadata attribute for metadata
META_MET2_ATTR_NAME  Metadata attribute name for metadata
META_MET2_ATTR_UNITS  Metadata attribute units for metadata
META_MET2_ATTR_VALUE  Metadata attribute value for metadata
META_MET2_CREATE_TIME  Creation time for the metadata for metadata
META_MET2_MODIFY_TIME  Last modification time for the metadata for metadata
META_MSRVC_ATTR_ID  Internal identifier for metadata attribute for micro-service
META_MSRVC_ATTR_NAME  Metadata attribute name for micro-service
META_MSRVC_ATTR_UNITS  Metadata attribute units for micro-service
META_MSRVC_ATTR_VALUE  Metadata attribute value for micro-service
META_MSRVC_CREATE_TIME  Creation time for the metadata for micro-service
META_MSRVC_MODIFY_TIME  Last modification time for the metadata for micro-service
META_NAMESPACE_COLL  Namespace of collection AVU-triplet attribute
META_NAMESPACE_DATA  Namespace of digital object AVU-triplet attribute
META_NAMESPACE_MET2  Namespace of metadata AVU-triplet attribute
META_NAMESPACE_MSRVC  Namespace of micro-service AVU-triplet attribute
META_NAMESPACE_RESC  Namespace of resource AVU-triplet attribute
META_NAMESPACE_RESC_GROUP  Namespace of resource-group AVU-triplet attribute
META_NAMESPACE_RULE  Namespace of rule AVU-triplet attribute
META_NAMESPACE_USER  Namespace of user AVU-triplet attribute
META_RESC_ATTR_ID  Internal identifier for metadata attribute for resource
META_RESC_ATTR_NAME  Metadata attribute name for resource
META_RESC_ATTR_UNITS  Metadata attribute units for resource
META_RESC_ATTR_VALUE  Metadata attribute value for resource
META_RESC_CREATE_TIME  Creation time for the metadata for resource
META_RESC_MODIFY_TIME  Last modification time for the metadata for resource
META_RESC_GROUP_ATTR_ID  Internal identifier for metadata attribute for resource group
META_RESC_GROUP_ATTR_NAME  Metadata attribute name for resource group
META_RESC_GROUP_ATTR_UNITS  Metadata attribute units for resource group
META_RESC_GROUP_ATTR_VALUE  Metadata attribute value for resource group
META_RESC_GROUP_CREATE_TIME  Creation time for the metadata for resource group
META_RESC_GROUP_MODIFY_TIME  Last modification time for the metadata for resource group
META_RULE_ATTR_ID  Internal identifier for metadata attribute for a rule
META_RULE_ATTR_NAME  Metadata attribute name for a rule
META_RULE_ATTR_UNITS  Metadata attribute units for a rule
META_RULE_ATTR_VALUE  Metadata attribute value for a rule
META_RULE_CREATE_TIME  Creation time for the metadata entry for a rule
META_RULE_MODIFY_TIME  Last modification time for the metadata for a rule
META_TOKEN_NAMESPACE  See TOKEN_NAMESPACE
META_USER_ATTR_ID  Internal identifier for metadata attribute for user
META_USER_ATTR_NAME  Metadata attribute name for user
META_USER_ATTR_UNITS  Metadata attribute units for user
META_USER_ATTR_VALUE  Metadata attribute value for user
META_USER_CREATE_TIME  Creation time for the metadata for user
META_USER_MODIFY_TIME  Last modification time for the metadata for user
MSRVC_ACCESS_MSRVC_ID  Internal identifier of the micro-service for which access is defined
MSRVC_ACCESS_NAME  See DATA_ACCESS_NAME
MSRVC_ACCESS_TYPE  Internal iCAT identifier
MSRVC_ACCESS_USER_ID  User or group (name) for which the access is defined on the micro-service
MSRVC_COMMENT  Comments for micro-service
MSRVC_CREATE_TIME  Creation time for the micro-service
MSRVC_DOXYGEN  Doxygen documentation for the micro-service
MSRVC_HOST  Host types at which the micro-service can be executed
MSRVC_ID  Internal ID for the micro-service
MSRVC_LANGUAGE  Language in which the micro-service is written
MSRVC_LOCATION  The location of the micro-service executable
MSRVC_MODIFY_TIME  Last modification time for the micro-service
MSRVC_MODULE_NAME  Module name for the micro-service
MSRVC_NAME  Name of the micro-service
MSRVC_OWNER_NAME  Owner name of the micro-service
MSRVC_OWNER_ZONE  Owner’s zone name of the micro-service
MSRVC_SIGNATURE  Digital signature (checksum) for the micro-service
MSRVC_STATUS  Status of the micro-service
MSRVC_TOKEN_NAMESPACE  See TOKEN_NAMESPACE
MSRVC_TYPE_NAME  Type of the micro-service
MSRVC_VARIATIONS  Variations (or forms) of the micro-service
MSRVC_VER_COMMENT  Comments on the micro-service
MSRVC_VER_CREATE_TIME  Creation time of version of the micro-service
MSRVC_VER_MODIFY_TIME  Last modification time of version of the micro-service
MSRVC_VER_OWNER_NAME  Owner name of the version of the micro-service
MSRVC_VER_OWNER_ZONE  Owner zone name of the version of the micro-service
MSRVC_VERSION  Version of the micro-service
QUOTA_LIMIT  High limit for quota for resource in QUOTA_RESC_ID for QUOTA_USER_ID
QUOTA_MODIFY_TIME  Last modification time of quota
QUOTA_OVER  Flag if quota is exceeded
QUOTA_RESC_ID  Internal resource ID for quota
QUOTA_RESC_NAME  Resource name for quota
QUOTA_USAGE  Name of usage for quota (normally write)
QUOTA_USAGE_MODIFY_TIME  Last modification time of quota usage
QUOTA_USAGE_RESC_ID  Internal resource ID for quota usage
QUOTA_USAGE_USER_ID  Internal user ID for quota usage
QUOTA_USER_ID  Internal user ID for quota
QUOTA_USER_NAME  User name for quota
QUOTA_USER_TYPE  User type name for quota
QUOTA_USER_ZONE  User zone name for quota
RESC_ACCESS_NAME  See DATA_ACCESS_NAME
RESC_ACCESS_RESC_ID  Internal identifier of the resource for which access is defined
RESC_ACCESS_TYPE  Internal iCAT identifier
RESC_ACCESS_USER_ID  User or group (name) for which the access is defined on resource
RESC_CLASS_NAME  Resource class: primary, secondary, archival
RESC_COMMENT  Comment about resource
RESC_CREATE_TIME  Creation timestamp of resource
RESC_FREE_SPACE  Free space available on resource
RESC_FREE_SPACE_TIME  Time at which free space was computed
RESC_GROUP_ID  Internal ID for resource group
RESC_GROUP_NAME  Logical name of the resource group
RESC_GROUP_RESC_ID  Internal identifier for the resource group
RESC_ID  Internal resource identifier for resource in the group
RESC_INFO  Tagged information list: <MAX_OBJ_SIZE>2GB</MAX_OBJ_SIZE><MIN_LATENCY>1msec</MIN_LATENCY>
RESC_LOC  Resource IP address
RESC_MODIFY_TIME  Last modification timestamp for resource
RESC_NAME  Logical name of the resource
RESC_STATUS  Operational status of resource
RESC_TOKEN_NAMESPACE  See TOKEN_NAMESPACE
RESC_TYPE_NAME  Resource type: HPSS, SamFS, database, orb
RESC_VAULT_PATH  Resource path for storing files
RESC_ZONE_NAME  Name of the iCAT, unique globally
RULE_ACCESS_NAME  See DATA_ACCESS_NAME
RULE_ACCESS_RULE_ID  Internal identifier of the iRODS rule for which access is defined
RULE_ACCESS_TYPE  Internal iCAT identifier
RULE_ACCESS_USER_ID  User or group (name) for which the access is defined on iRODS rule
RULE_BASE_MAP_BASE_NAME  Name for the Data Base of Rule Set of Maps (e.g. “core” in core.re)
RULE_BASE_MAP_COMMENT  Comments for RULE_BASE_MAP
RULE_BASE_MAP_CREATE_TIME  Creation time for RULE_BASE_MAP
RULE_BASE_MAP_MODIFY_TIME  Last modification time for RULE_BASE_MAP
RULE_BASE_MAP_OWNER_NAME  Owner’s name of the RULE_BASE_MAP
RULE_BASE_MAP_OWNER_ZONE  Owner’s zone name of the RULE_BASE_MAP
RULE_BASE_MAP_PRIORITY  Prioritization of the RULE_BASE_MAP (empty or 0 means current). This tells which map has priority over other maps. This can define a tree/forest.
RULE_BASE_MAP_VERSION  Version of the RULE_BASE_MAP (empty or 0 means current)
RULE_BASE_NAME  Rule base to which the rule is a member
RULE_BODY  Body of the rule
RULE_COMMENT  Comments on the rule
RULE_CONDITION  Condition of the rule
RULE_CREATE_TIME  Creation time of the rule
RULE_DESCR_1  Description of rule (1)
RULE_DESCR_2  Description of rule (2)
RULE_DOLLAR_VARS  Session variables used in the rule
RULE_EVENT  Event name of the rule (can be viewed as rule name)
RULE_EXEC_ADDRESS  Host name where the delayed rule will be executed
RULE_EXEC_ESTIMATED_EXE_TIME  Estimated execution time for the delayed rule
RULE_EXEC_FREQUENCY  Delayed rule execution frequency
RULE_EXEC_ID  Internal identifier for a delayed rule execution request
RULE_EXEC_LAST_EXE_TIME  Previous execution time for the delayed rule
RULE_EXEC_NAME  Logical name for a delayed rule execution request
RULE_EXEC_NOTIFICATION_ADDR  Notification address for delayed rule completion
RULE_EXEC_PRIORITY  Delayed rule execution priority
RULE_EXEC_REI_FILE_PATH  Path of the file where the context (REI) of the delayed rule is stored
RULE_EXEC_STATUS  Current status of the delayed rule
RULE_EXEC_TIME  Time when the delayed rule will be executed
RULE_EXEC_USER_NAME  User requesting a delayed rule execution
RULE_ICAT_ELEMENTS  Permanent (#-variables) affected by the rule
RULE_ID  Internal identifier for the rule
RULE_INPUT_PARAMS  Parameters used as input when invoking the rule
RULE_MODIFY_TIME  Last modification time of the rule
RULE_NAME  Name of the rule (can be different from RULE_EVENT)
RULE_OUTPUT_PARAMS  Output parameters set by the rule invocation
RULE_OWNER_NAME  Owner name of the rule
RULE_OWNER_ZONE  Owner’s zone name of the rule
RULE_RECOVERY  Recovery part of the rule
RULE_SIDEEFFECTS  Side effects (%-variables) – used as a semantic of what the rule does
RULE_STATUS  Status of the rule (valid/active or otherwise)
RULE_TOKEN_NAMESPACE  See TOKEN_NAMESPACE
RULE_VERSION  Version of the rule
SL_CPU_USED  Server load information: CPU used. Server load information is computed periodically for all servers in the grid, if enabled by the administrator.
SL_CREATE_TIME  Server load information: creation time of the entry
SL_DISK_SPACE  Server load information: disk space used
SL_HOST_NAME  Server load information: host name of the server
SL_MEM_USED  Server load information: memory used
SL_NET_INPUT  Server load information: network input load
SL_NET_OUTPUT  Server load information: network output load
SL_RESC_NAME  Server load information: resource for which disk space is provided
SL_RUNQ_LOAD  Server load information: run queue load
SL_SWAP_USED  Server load information: swap space used
SLD_CREATE_TIME  Server load digest information: digest creation time
SLD_LOAD_FACTOR  Server load digest information: load factor computed from server load information
SLD_RESC_NAME  Server load digest information: resource name for which the load factor is computed
TICKET_ALLOWED_GROUP_NAME  User group for which the ticket (TICKET_ALLOWED_GROUP_TICKET_ID) is valid
TICKET_ALLOWED_GROUP_TICKET_ID  Identifier for the ticket
TICKET_ALLOWED_HOST  Host for which the ticket (TICKET_ALLOWED_HOST_TICKET_ID) is valid. Allows invocation of the ticket-based access only from this host. Useful for scheduled jobs.
TICKET_ALLOWED_HOST_TICKET_ID  Identifier for the ticket
TICKET_ALLOWED_USER_NAME  User for which the ticket (TICKET_ALLOWED_USER_TICKET_ID) is valid
TICKET_ALLOWED_USER_TICKET_ID  Identifier for the ticket
TICKET_COLL_NAME  Collection name on which the ticket is issued
TICKET_CREATE_TIME  Ticket creation time
TICKET_DATA_COLL_NAME  Collection name of the object on which the ticket is issued
TICKET_DATA_NAME  Data name of the object on which the ticket is issued
TICKET_EXPIRY  Expiration date for a ticket
TICKET_ID  Identifier for the ticket
TICKET_MODIFY_TIME  Last modification time for the ticket
TICKET_OBJECT_ID  (Internal) object ID for the object on which the ticket is issued
TICKET_OBJECT_TYPE  Ticket may be for data, resource, user, rule, metadata, zone, collection, token
TICKET_OWNER_NAME  Name of the person who created the ticket
TICKET_OWNER_ZONE  Home zone of the person who created the ticket
TICKET_STRING  Human-readable name for the ticket
TICKET_TYPE  Type of ticket, either “read” or “write”
TICKET_USER_ID  Identifier of the person who is using the ticket
TICKET_USES_COUNT  Number of times a ticket has been used
TICKET_USES_LIMIT  Maximum number of times a ticket may be used
TICKET_WRITE_BYTE_COUNT  Number of bytes written for accesses through a given ticket
TICKET_WRITE_BYTE_LIMIT  Maximum number of bytes that may be written using a given ticket
TICKET_WRITE_FILE_COUNT  Number of files written for accesses through a given ticket
TICKET_WRITE_FILE_LIMIT  Maximum number of files that can be written using a given ticket
TOKEN_COMMENT  Comment on token
TOKEN_ID  Internal identifier for token name
TOKEN_NAME  A value in the token namespace; e.g. “jpg image”
TOKEN_NAMESPACE  Namespace for tokens; e.g. data type, resource_type, rule_type, …
TOKEN_VALUE  Additional token information string (e.g. dot extensions for jpg: jpg, .jpg2, jg)
TOKEN_VALUE2  Additional token information string
TOKEN_VALUE3  Additional token information string
USER_COMMENT  Comment about the user
USER_CREATE_TIME  Creation timestamp
USER_DN  Distinguished name in tagged list: <authType>distinguishedName</authType>
USER_GROUP_ID  Internal identifier for the user group
USER_GROUP_NAME  Logical name for the user group
USER_ID  User internal identifier
USER_INFO  Tagged information: <EMAIL>[email protected]</EMAIL><PHONE>5555555555</PHONE>
USER_MODIFY_TIME  Last modification timestamp
USER_NAME  User name
USER_TYPE  User role (rodsgroup, rodsadmin, rodsuser, domainadmin, groupadmin, storageadmin, rodscurators)
USER_ZONE  Home Data Grid of user
ZONE_COMMENT  Comment about the zone
ZONE_CONNECTION  Connection information in tagged list: <PASSWORD>RPS1</PASSWORD><GSI>DISTNAME</GSI>
ZONE_CREATE_TIME  Date and timestamp for creation of a data grid
ZONE_ID  Data Grid or zone identifier
ZONE_MODIFY_TIME  Date and timestamp for modification of a data grid
ZONE_NAME  Data Grid or zone name, name of the iCAT
ZONE_TYPE  Type of zone: local/remote/other
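
Several attributes in the table (DATA_CHECKSUM, RESC_INFO, USER_DN, USER_INFO, ZONE_CONNECTION) are described as tagged lists of the form <TAG>value</TAG>. The following Python sketch shows one way a client might split such a string into (tag, value) pairs once it has been retrieved. It assumes the format is exactly what the examples in the table suggest (flat, non-nested pairs whose closing tag repeats the opening tag); the function name parse_tagged_list is hypothetical and is not part of iRODS.

```python
import re

# One <TAG>value</TAG> pair. The back-reference \1 requires the closing tag
# to repeat the opening tag, so malformed pairs are skipped rather than guessed.
# Assumption: tags are flat (not nested) and use letters, digits, underscores.
_TAGGED_PAIR = re.compile(r"<([A-Za-z0-9_]+)>(.*?)</\1>", re.DOTALL)


def parse_tagged_list(text):
    """Return the (tag, value) pairs found in a tagged-list string, in order."""
    return [(m.group(1), m.group(2)) for m in _TAGGED_PAIR.finditer(text)]


# Example using the DATA_CHECKSUM value shown in the table.
checksums = dict(parse_tagged_list("<BINHEX>12344</BINHEX><MD5>22234422</MD5>"))
print(checksums["MD5"])
```

Returning a list of pairs rather than a dictionary keeps the original order and tolerates repeated tags; callers that expect unique tags can wrap the result in dict(), as the example does.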
Appendix E: Protected Data Requirements
The data management requirements are abstracted from the document https://www.med.unc.edu/security/hipaa/documents/ADMIN0082%20Info%20Security.pdf
Each requirement has been evaluated for the feasibility of creating a computer-actionable policy that automates enforcement.
• Document the policies
• Protect the confidentiality, integrity, and availability of information from accidental or intentional unauthorized modification, destruction, or disclosure
• Periodic risk assessment to document types of threats and vulnerabilities, and evaluate information assets and technology for data collection, storage, dissemination, and protection
• Protected assets include:
  o Payment card account numbers, cardholder name, expiration date, service code, and CID/PINs
  o Legally covered entities
  o Social Security Numbers and personal information
  o Protected Health Information – demographic, physical or mental health, provision of health care, health care payment that identifies the individual
• Protection tasks
  o Data available on demand by an authorized person
  o Data not accessible by unauthorized person or process
  o Encryption
  o Integrity
  o Identify involved persons
  o Identify involved computer systems
• Security Office
  o Monitor policy distribution to resources
  o Basic security support (accounts, access controls, OS upgrades)
  o Classification of computer resources
  o System design for security controls
  o Vulnerability detection, notification
  o Detection of unauthorized access (audit trails)
  o Training
  o Security audits
  o Reports
• Collection owners
  o Presence of HIPAA information
  o Data retention period
  o Application of policies and procedures for data protection
  o Authorizing access
  o Specifying controls, setting control policies
  o Reporting loss or misuse
  o Correcting problems
  o Training
  o Tracking approval processes for systems
• Data grid administrator – custodian
  o Provide physical safeguards – one-time passwords to access iCAT
  o Provide procedures for security
  o Control access to information
  o Release information through privacy procedures
  o Evaluate cost effectiveness of controls
  o Maintain policies and procedures
  o Promote education
  o Report loss or misuse
  o Respond to security incidents
• User management – projects
  o Review and approve requests for access
  o Update employees’ security records with position and job function changes
  o Update access on employee termination or transfer
  o Revoke physical access to terminated employees
  o Promote training
  o Report loss or misuse
  o Initiate corrective actions
  o Follow recommendations for purchase and implementation of systems
• User
  o Only access information for authorized job responsibilities
  o Comply with access controls
  o Report disclosures of PHI other than for treatment, payment, or health care
  o Keep personal authentication information confidential
  o Report loss or misuse
  o Initiate corrective actions
• Classify information
  o Protected health information
  o Confidential information – PCI, PI
  o Internal information – all information not PHI, Confidential, or Public
  o Public information
• Computer and information control
  o Ownership, licensing of software
  o Inventory of software and computers, users, managers
  o Virus protection, scan all files
  o Access controls
    ▪ authorization by supervisor
    ▪ context based – ticket
    ▪ authorization role based
    ▪ authorization user based
    ▪ authentication – unique user ID
      • Controlled passwords
      • Biometric
      • Tokens in conjunction with a PIN
    ▪ Password security
      • No re-use or multiple use
      • Minimum length, expiration, encryption during transmission, storage
      • Log unsuccessful attempts
      • Procedures for validating users who request password reset
    ▪ Automatic timeout after period of inactivity
    ▪ Log-off
  o Data integrity
    ▪ Transaction audit
    ▪ Replication
    ▪ Checksums
    ▪ Encryption in storage
    ▪ Digital signatures
    ▪ Data validation on entry
  o Transmission security
    ▪ Integrity – checksums
    ▪ Encryption in messaging systems
  o Remote access
    ▪ Only approved methods and pathways
  o Physical access
    ▪ Access controlled areas, HVAC
    ▪ Authentication to data grid and access controls
    ▪ Authentication to workstation, automatic screen savers
  o Facility access controls
    ▪ Contingency for emergency operations after disaster
    ▪ Facility security plan – policies and procedures
    ▪ Documented procedures to validate access
    ▪ Documented maintenance of facility
  o Emergency access
    ▪ Procedures for authorization, implementation, revocation
• Equipment and media controls
  o Media disposal
  o Track custody of media
  o Data backup
• Other media controls
  o Encryption for storage on removable media
  o Encryption, power-on passwords, auto logoff for mobile devices
  o Ownership of media for assigning responsibility
• Data transfer / printing
  o Approval for bulk download
  o De-identification of data – BitCurator
  o Encrypt data transfers
• Social Media
  o No PHI, confidential, or proprietary information
  o No patient identification information
  o No patient photographs
• Audit controls
  o Record activity by users and system administrators
  o Review activity logs
  o Preserve reviews for 6 years
• Evaluation
  o Verify procedures after each operational or environmental change
• Contingency plan
  o Enable recovery of data
    ▪ Document data backup plan
    ▪ Back up data offsite
    ▪ Manage access controls on replicas
  o Disaster recovery plan – procedure for restoring data
  o Emergency operation plan – for natural disasters
  o Procedures for testing contingency plans on revision
  o Identify critical components
• Password controls
  o No sharing of passwords
  o Single sign-on system for passwords
  o No passwords on PC
  o No dictionary words
  o Encrypt passwords
  o Maximum of 5 invalid passwords causes lockout for 30 minutes
  o Contain 1 uppercase, 1 lowercase, 1 number
  o Minimum length of 10 characters
  o Passwords changed annually
  o Maintain history of prior 6 passwords, prevent re-use
• Peer-to-peer
  o P2P file-sharing programs are prohibited
  o Internet storage may not be used for PHI and confidential information
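
Several of the password controls listed above are directly computer actionable. The following Python sketch checks a candidate password against the length, character-class, dictionary, and history requirements, and tracks the lockout rule. It is a minimal illustration under stated assumptions, not an iRODS feature: the names password_problems and LockoutTracker are hypothetical, "no dictionary words" is interpreted as "must not contain a listed word", and the lockout is assumed to start at the fifth consecutive invalid attempt.

```python
import re

MIN_LENGTH = 10            # "Minimum length of 10 characters"
HISTORY_DEPTH = 6          # "Maintain history of prior 6 passwords"
MAX_FAILURES = 5           # "Maximum of 5 invalid passwords"
LOCKOUT_SECONDS = 30 * 60  # "lockout for 30 minutes"


def password_problems(candidate, previous_passwords=(), dictionary_words=()):
    """Return a list of policy violations; an empty list means acceptable."""
    problems = []
    if len(candidate) < MIN_LENGTH:
        problems.append("shorter than 10 characters")
    if not re.search(r"[A-Z]", candidate):
        problems.append("no uppercase letter")
    if not re.search(r"[a-z]", candidate):
        problems.append("no lowercase letter")
    if not re.search(r"[0-9]", candidate):
        problems.append("no number")
    # Assumption: a password fails if it contains any listed dictionary word.
    lowered = candidate.lower()
    if any(word and word.lower() in lowered for word in dictionary_words):
        problems.append("contains a dictionary word")
    # Plain strings are compared here for brevity; since the policy also says
    # "Encrypt passwords", a real system would compare stored hashes instead.
    if candidate in list(previous_passwords)[-HISTORY_DEPTH:]:
        problems.append("matches one of the prior 6 passwords")
    return problems


class LockoutTracker:
    """Count consecutive invalid attempts and enforce the 30-minute lockout."""

    def __init__(self):
        self.failures = 0
        self.locked_until = 0.0

    def is_locked(self, now):
        return now < self.locked_until

    def record_failure(self, now):
        # Assumption: the account locks as soon as the fifth failure occurs.
        self.failures += 1
        if self.failures >= MAX_FAILURES:
            self.locked_until = now + LOCKOUT_SECONDS
            self.failures = 0

    def record_success(self):
        self.failures = 0
```

Timestamps are passed in explicitly (for example from time.time()) so that the lockout logic can be exercised without waiting. Controls such as "no sharing of passwords" or "passwords changed annually" need procedures or stored change dates and are not covered by this sketch.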
Appendix F: Mauna Loa Sensor Data DMP

Types of Data Produced
Air samples at Mauna Loa Observatory will be collected continuously from air intakes located at five towers – a central tower and four towers located at compass quadrants. Raw data files will contain continuously measured CO2 concentrations, calibration standards, reference standards, daily check standards, and blanks. The sample lines located at compass quadrants were used to examine the influence of source effects associated with wind directions [3, 4]. In addition to the CO2 data, we will record weather data (wind speed and direction, temperature, humidity, precipitation, and cloud cover). Site conditions at Mauna Loa Observatory will also be noted and retained.

The final data product will consist of 5-minute, 15-minute, hourly, daily, and monthly average atmospheric concentrations of CO2, in mole fraction in water-vapor-free air, measured at the Mauna Loa Observatory, Hawaii. Data are reported as a dry mole fraction, defined as the number of molecules of CO2 divided by the number of molecules of dry air, multiplied by one million (ppm). The final data product has been thoroughly documented in the open literature [2] and in Scripps Institution of Oceanography Internal Reports [1].

The data generated (raw CO2 measurements, meteorological data, calibration and reference standards) will be stored as comma-separated values in plain ASCII format, which remains readable over long time periods. The final data file will contain dates for each observation (time, day, month, and year) and the average CO2 concentration. The final data product distributed to most users will occupy less than 500 KB; raw and ancillary data, which will be distributed on request, comprise less than 10 MB.

Data and Metadata Standards
Metadata will be provided in two formats – contextual information about the data in a text-based document and ISO 19115 standard metadata in an XML file. These two formats for metadata were chosen to provide a full explanation of the data (text format) and to ensure compatibility with international standards (XML format). The standard XML file will be more complete; the document file will be a human-readable summary of the XML file.
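
The dry mole fraction and the averaged data products described under Types of Data Produced can be illustrated with a short Python sketch. It is only an illustration of the arithmetic: the function names, the (timestamp, ppm) sample layout, and the choice to group samples into fixed blocks aligned to multiples of the block length are assumptions, not the procedure used by the data producers.

```python
from statistics import mean


def dry_mole_fraction_ppm(co2_molecules, dry_air_molecules):
    """Molecules of CO2 per molecule of dry air, multiplied by one million."""
    if dry_air_molecules <= 0:
        raise ValueError("dry_air_molecules must be positive")
    return co2_molecules * 1_000_000 / dry_air_molecules


def block_averages(samples, block_seconds):
    """Average (timestamp_seconds, ppm) samples over fixed-length blocks.

    Returns a dict mapping each block's start time to the mean ppm of the
    samples that fall inside it. Use 300 for 5-minute averages, 900 for
    15-minute averages, 3600 for hourly averages.
    """
    blocks = {}
    for timestamp, ppm in samples:
        start = int(timestamp // block_seconds) * block_seconds
        blocks.setdefault(start, []).append(ppm)
    return {start: mean(values) for start, values in sorted(blocks.items())}


# Example: three readings grouped into two 5-minute blocks.
readings = [(0, 400.0), (60, 402.0), (300, 404.0)]
print(block_averages(readings, 300))
```

Calendar-based products such as daily and monthly averages would need date-aware grouping rather than fixed-length blocks, which this sketch does not attempt.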

Policies for Access and Sharing
The final data product will be released to the public as soon as the recalibration of standard gases has been completed and the data have been prepared, typically within six months of collection. There is no period of exclusive use by the data collectors. Users can access documentation and final monthly CO2 data files via the Scripps CO2 Program website (http://scrippsco2.ucsd.edu). The data will be made available via ftp download from the Scripps Institution of Oceanography Computer Center. Raw data (continuous concentration measurements, weather data, etc.) will be maintained on an internally accessible server and made available on request at no charge to the user.

Policies for Re-use, Distribution
Access to databases and associated software tools generated under the project will be available for educational, research, and non-profit purposes. Such access will be provided using web-based applications, as appropriate.
Materials generated under the project will be disseminated in accordance with University/Participating institutional and NSF policies. Depending on such policies, materials may be transferred to others under the terms of a material transfer agreement. Publication of data shall occur during the project, if appropriate, or at the end of the project, consistent with normal scientific practices. Research data that document, support, and validate research findings will be made available after the main findings from the final research data set have been accepted for publication.

Plans for Archiving and Preservation
Short Term: The data product will be updated monthly, reflecting updates to the record, revisions due to recalibration of standard gases, and identification and flagging of any errors. The date of the update will be included in the data file and will be part of the data file name. Versions of the data product that have been revised due to errors/updates (other than new data) will be retained in an archive system. A revision history document will describe the revisions made. Daily and monthly backups of the data files will be retained at the Keeling Group Lab (http://scrippsco2.ucsd.edu, accessed 05/2011), at the Scripps Institution of Oceanography Computer Center, and at the Woods Hole Oceanographic Institution’s Computer Center.

Long Term: Our intent is that the long-term, high-quality final data product generated by this project will be available for use by the research and policy communities in perpetuity. The raw supporting data will be available in perpetuity as well, for use by researchers to confirm the quality of the Mauna Loa Record. The investigators have made arrangements for long-term stewardship and curation at the Carbon Dioxide Information and Analysis Center (CDIAC), Oak Ridge National Laboratory (see letter of support). The standardized metadata records for the Mauna Loa CO2 data will be added to the metadata record database at CDIAC, so that interested users can discover the Mauna Loa CO2 record along with other related Earth science data. CDIAC has a standardized data product citation [5], including a DOI, that indicates the version of the Mauna Loa Data Product and how to obtain a copy of that product.