
Page 1: DataNet Federation Consortium Preservation Policy Toolkit. Reagan Moore, Arcot Rajasekar and Hao Xu

DataNet Federation Consortium

Preservation Policy Toolkit

Reagan Moore

Arcot (Raja) Rajasekar

Hao Xu

UNC-Chapel Hill

11/18/2015


Preservation

• Preservation is communication with the future

• Preservation requires management of communication from the past

– How does an archivist verify that the assertions made about an archives have been preserved?

– How are assertions preserved as technology flows through the archives?


Preservation Assertions

• Traditional preservation assertions are:

– Authenticity

– Integrity

– Chain of custody

– Original arrangement

• The DataNet Federation Consortium uses a policy-based data management system to preserve collection properties

– Integrated Rule Oriented Data System (iRODS)


Policy-Based Data Management

• Organize objects in collections

– Associate metadata with each object

• Provenance

• Descriptive

• Administrative

• Virtualize collection properties

– Manage properties independently of the technology choice

• Naming

• Arrangement

• Access controls

• Integrity

• Metadata


Policy-based System Concepts

• Purpose

– Reason the collection is formed

• Properties

– Assertions made about the collection

• Policies

– Control enforcement of properties

• Procedures

– Encapsulate operations applied to objects

• Persistent State

– Information generated by procedures

• Periodic assessment

– Verification of properties
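
The six concepts above can be sketched as a small data model. This is a minimal illustration in Python, not iRODS code: the class names, the helper functions, and the choice of a checksum-based integrity property are all assumptions made for the example.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Collection:
    purpose: str                                                     # reason the collection is formed
    objects: Dict[str, bytes] = field(default_factory=dict)          # object name -> content
    state: Dict[str, Dict[str, str]] = field(default_factory=dict)   # persistent state per object


@dataclass
class Policy:
    prop: str                                       # property (assertion) being enforced
    procedure: Callable[[Collection, str], None]    # operation applied to an object
    holds: Callable[[Collection, str], bool]        # test used during periodic assessment


def record_checksum(coll: Collection, name: str) -> None:
    """Procedure: store a checksum as persistent state (hypothetical integrity policy)."""
    digest = hashlib.sha256(coll.objects[name]).hexdigest()
    coll.state.setdefault(name, {})["checksum"] = digest


def checksum_matches(coll: Collection, name: str) -> bool:
    """Assessment test: the stored checksum still matches the content."""
    stored = coll.state.get(name, {}).get("checksum")
    return stored == hashlib.sha256(coll.objects[name]).hexdigest()


integrity = Policy("integrity", record_checksum, checksum_matches)


def ingest(coll: Collection, name: str, data: bytes, policies: List[Policy]) -> None:
    """Policies control enforcement: every procedure runs when an object is added."""
    coll.objects[name] = data
    for policy in policies:
        policy.procedure(coll, name)


def assess(coll: Collection, policies: List[Policy]) -> Dict[str, List[str]]:
    """Periodic assessment: for each property, list the objects that fail it."""
    return {p.prop: [n for n in coll.objects if not p.holds(coll, n)] for p in policies}
```

Calling `assess` after every object has been ingested through `ingest` should return an empty failure list for each property; an object whose content changes afterwards shows up under `"integrity"`.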


Projects Using iRODS Policy-based Data Management

1.  Astrophysics            Auger supernova search                                  Shared collection
2.  Atmospheric science     NASA Langley Atmospheric Sciences Center                Shared collection
3.  Biology                 Phylogenetics at CC IN2P3                               Shared collection
4.  Climate                 NOAA National Climatic Data Center                      Ingestion cache for archive
5.  Cognitive Science       Temporal Dynamics of Learning Center                    Shared collection
6.  Computer Science        GENI experimental network                               Archive
7.  Cosmic Ray              AMS experiment on the International Space Station       Shared collection
8.  Dark Matter Physics     Edelweiss II                                            Shared collection
9.  Earth Science           NASA Center for Climate Simulations                     Digital library
10. Ecology                 CEED Caveat Emptor Ecological Data                      Digital Library
11. Engineering             CIBER-U                                                 Digital Library
12. Genomics                Broad Institute, Wellcome Trust Sanger Institute, NGS   Digital library
13. High Energy Physics     BaBar / Stanford Linear Accelerator                     Shared collection / Archive
14. Hydrology               Institute for the Environment, UNC-CH; Hydroshare       Digital Library / portal
15. Information Science     SLS LifeTime Library, Carolina Digital Repository       Digital Library
16. Medicine                Lineberger Cancer Institute                             Patient data analysis
17. Neuroscience            International Neuroinformatics Coordinating Facility    Shared collection
18. Neutrino Physics        T2K and dChooz neutrino experiments                     Project collections
19. Oceanography            SciON                                                   Archive
20. Optical Astronomy       National Optical Astronomy Observatory                  Archive
21. Particle Physics        Indra multi-detector collaboration at IN2P3             Project collection
22. Plant genetics          the iPlant Collaborative                                Collaboration environment
23. Quantum Chromodynamics  IN2P3                                                   Project collection
24. Radio Astronomy         Cyber Square Kilometer Array, TREND, BAOradio           Digital library
25. Seismology              Southern California Earthquake Center                   Digital library
26. Social Science          Odum, TerraPop                                          Digital library


Policy Sets

DataNet Federation Consortium

• Data grids - Sharing data

• Student digital library - Organizing data

• Data centers - Managing data

• Preservation - Archiving data

• Protected data - Enforcing security

• NSF data management - DMP requirements


Define Tasks for each Property

• Preservation purpose defines the set of properties that are maintained over time

• ISO 16363 – Standard for Trusted Digital Repositories

• 4.6.1 The repository shall comply with Access Policies.

– Access policy for repository.

– Collection Development Policy.

– Definition of the Designated Community.

– Demonstrations and discussion with relevant staff of what occurs when a query results in 'Access Denied'.

– Documentation that illustrates the Access Policy is being carried out: Sign in sheets, logs of access, logs of successful and unsuccessful access to the system, follow up emails or help desk reports when 'access denials' received.


Tasks for Controlling Access

• Creating identifiers for persons, collections, and files.

• Assigning roles to persons.

• Assigning access controls to collections and files (in effect a relationship between the person identifier and the file identifier).

• Assigning inheritance of access controls on collections (files can inherit the access control of the collection).

• Checking access permissions on reads and for other actions on the file.

• Verifying the set of access controls applied to files in a collection.
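
The access-control tasks listed above can be modeled in a few lines. The following is a minimal Python sketch under stated assumptions, not iRODS behavior: the `Node` class, the function names, and the ordering of permission levels are hypothetical, and role assignment is left out for brevity.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Assumed ordering of permission levels, used only for this illustration.
LEVELS = {"null": 0, "read": 1, "write": 2, "own": 3}


@dataclass
class Node:
    """A collection or file, identified by its path."""
    path: str
    parent: Optional["Node"] = None
    inherit: bool = False                               # meaningful on collections only
    acl: Dict[str, str] = field(default_factory=dict)   # user identifier -> permission


def set_acl(node: Node, user_id: str, level: str) -> None:
    """Assign an access control: a relationship between a person and a file or collection."""
    if level not in LEVELS:
        raise ValueError(f"unknown permission level: {level}")
    node.acl[user_id] = level


def add_file(coll: Node, name: str) -> Node:
    """Create a file; with inheritance set, it starts with a copy of the collection's ACL."""
    new_file = Node(f"{coll.path}/{name}", parent=coll)
    if coll.inherit:
        new_file.acl = dict(coll.acl)
    return new_file


def can(node: Node, user_id: str, action: str) -> bool:
    """Check the access permission before a read or another action."""
    return LEVELS[node.acl.get(user_id, "null")] >= LEVELS[action]


def verify(files: List[Node], expected: Dict[str, str]) -> List[str]:
    """Return the paths of files whose ACL differs from the expected one."""
    return [f.path for f in files if f.acl != expected]
```

For example, a file added to a collection with `inherit=True` picks up the collection's ACL, and `verify` reports it only after its ACL is changed to differ from the expected set.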


Policy Templates

• Constraints

– Information needed to evaluate constraint

– Operations applied to enforce constraint

• Procedure

– Information needed to apply operations

– Operations that are needed


Policy Template

Policy type   Constraint                  State attributes for constraint
Access data   By role (type of person)    User_ID
                                          Role_type per User_ID
                                          Role_ACL
              By ACL (read permission)    User_ID
                                          File_name
                                          ACL per File_name per User_ID


Policy Template

Operations                       State attributes for operation
Set person name                  User_ID, User_name
Set file name                    File_ID, File_name
Set role per person              User_ID, Role_type
Set ACL on file                  File_ID, User_ID, ACL_type
Set sticky bit on collection     Collection_name, Sticky-bit_value
Set access on replication        File_ID, Replica_number, User_ID, ACL_type
Execution - check ACL on read    File_name, User_ID, ACL_type
Verify ACLs                      File_ID, Replica_number, User_ID, ACL_type
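
The operations half of the template can be treated as data. The sketch below, in Python, copies the operation names and state attributes from this slide; the dictionary layout and the two helper functions are assumptions introduced for illustration, not part of the toolkit.

```python
from typing import Dict, List

# Operation -> state attributes, transcribed from the access-control template.
OPERATIONS: Dict[str, List[str]] = {
    "Set person name": ["User_ID", "User_name"],
    "Set file name": ["File_ID", "File_name"],
    "Set role per person": ["User_ID", "Role_type"],
    "Set ACL on file": ["File_ID", "User_ID", "ACL_type"],
    "Set sticky bit on collection": ["Collection_name", "Sticky-bit_value"],
    "Set access on replication": ["File_ID", "Replica_number", "User_ID", "ACL_type"],
    "Execution - check ACL on read": ["File_name", "User_ID", "ACL_type"],
    "Verify ACLs": ["File_ID", "Replica_number", "User_ID", "ACL_type"],
}


def state_attributes(operations: Dict[str, List[str]]) -> List[str]:
    """Distinct state attributes the operations need, in sorted order."""
    return sorted({attr for attrs in operations.values() for attr in attrs})


def operations_using(operations: Dict[str, List[str]], attribute: str) -> List[str]:
    """Operations that read or set the given state attribute, in template order."""
    return [name for name, attrs in operations.items() if attribute in attrs]
```

Listing the distinct attributes this way shows how few persistent state attributes the eight access-control operations actually depend on.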


Generated Rules

• Data grids - (11 rules)

• Student digital library - (8 rules)

• Data centers - (27 rules)

• Preservation - (28 rules)

• Protected data - (66 rules)

• NSF data management - (35 rules)
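
As a quick consistency check, the per-set counts above can be summed and compared with the total of 175 rules reported under Standard Components. The snippet is plain Python arithmetic; the variable names are illustrative only.

```python
# Rule counts per policy set, as listed on this slide.
rules_per_policy_set = {
    "Data grids": 11,
    "Student digital library": 8,
    "Data centers": 27,
    "Preservation": 28,
    "Protected data": 66,
    "NSF data management": 35,
}

total_rules = sum(rules_per_policy_set.values())
print(total_rules)  # 175
```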


Standard Rule Structure

• Check input parameters

• Create log file

• Create query on persistent state information

• Loop over results

• Test for a condition

• Apply operation

• Write results
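
The seven steps can be sketched as a single function. This is an illustrative Python model rather than an iRODS rule: the catalog is a hypothetical list of dictionaries standing in for persistent state, the log is kept in memory instead of in a file, and the rule itself (granting read access where it is missing) is an invented example.

```python
from typing import Dict, List


def grant_read_rule(catalog: List[Dict], collection: str, user_id: str) -> List[str]:
    """Grant read access on every file in a collection that lacks an entry for user_id."""
    # 1. Check input parameters
    if not collection.startswith("/") or not user_id:
        raise ValueError("collection must be an absolute path and user_id non-empty")

    # 2. Create log file (modeled here as a list of lines)
    log = [f"grant_read_rule on {collection} for {user_id}"]

    # 3. Create query on persistent state information
    rows = [row for row in catalog if row["coll_name"] == collection]

    # 4. Loop over results
    for row in rows:
        # 5. Test for a condition
        if user_id not in row["acl"]:
            # 6. Apply operation
            row["acl"][user_id] = "read"
            log.append(f"granted read on {row['data_name']}")

    # 7. Write results
    log.append(f"checked {len(rows)} files")
    return log
```

Keeping the same skeleton for every rule means only steps 3, 5, and 6 (the query, the condition, and the operation) change from one policy to the next.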


Simple Inheritance Rule

setInheritance {
  # odum-inherit.r
  # Turn on inheritance of access controls for a collection in the caller's home collection
  *Home = "/$rodsZoneClient/home/$userNameClient/";
  # Path or file that will have ACL changed
  *Path = *Home ++ *RelativeCollection;
  # checkCollInput is a helper rule that is not shown on this slide
  checkCollInput(*Path);
  msiSetACL("recursive", *Acl, *User, *Path);
  writeLine("stdout", "Set inheritance of access on collection *Path");
}
INPUT *RelativeCollection="test", *Acl="inherit", *User=""
OUTPUT ruleExecOut


Standard Components

• Across the six types of data management applications

• Identified

– 97 policies

– 175 rules that automate tasks

– 123 operations

– 50 persistent state attributes

Collections
Files
Users
Metadata
Quotas       Storage limits
Resources    Storage systems
Tickets      Access URLs
Tokens       System parameters
Zones        Data grid federation


Books

Policy Templates Workbook
https://dfcweb.datafed.org/idrop-web2/home/link?irodsURI=irods%3A%2F%2Firen2.renci.org%3A1237%2Fdfcmain%2Fhome%2FDFC-public%2Fpapers%2FDFC-policy-template.pdf

Policy Examples Workbook
https://dfcweb.datafed.org/idrop-web2/home/link?irodsURI=irods%3A%2F%2Firen2.renci.org%3A1237%2Fdfcmain%2Fhome%2FDFC-public%2Fpapers%2FDFC-policy-examples.pdf

Contact

[email protected]

http://www.datafed.org