DataNet Federation Consortium Preservation Policy Toolkit. Reagan Moore, Arcot Rajasekar and Hao Xu
Upload: 12th-international-conference-on-digital-preservation-ipres-2015
DataNet Federation Consortium
Preservation Policy Toolkit
Reagan Moore
Arcot (Raja) Rajasekar
Hao Xu
UNC-Chapel Hill
11/18/2015
Preservation
• Preservation is communication with the future
• Preservation requires management of communication from the past
– How does an archivist verify that the assertions made about an archives have been preserved?
– How are assertions preserved as technology flows through the archives?
Preservation Assertions
• Traditional preservation assertions are:
– Authenticity
– Integrity
– Chain of custody
– Original arrangement
• The DataNet Federation Consortium uses a policy-based data management system to preserve collection properties
– Integrated Rule-Oriented Data System (iRODS)
Policy-Based Data Management
• Organize objects in collections
– Associate metadata with each object
• Provenance
• Descriptive
• Administrative
• Virtualize collection properties
– Manage properties independently of the technology choice
• Naming
• Arrangement
• Access controls
• Integrity
• Metadata
Policy-based System Concepts
• Purpose
– Reason the collection is formed
• Properties
– Assertions made about the collection
• Policies
– Control enforcement of properties
• Procedures
– Encapsulate operations applied to objects
• Persistent State
– Information generated by procedures
• Periodic assessment
– Verification of properties
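The six concepts can be read as a small data model. The sketch below is a minimal Python illustration, not iRODS code: the class names (`Policy`, `Collection`), the `ingest` and `assess` methods, and the "has description" property are all assumptions chosen for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Record = Dict[str, str]


@dataclass
class Policy:
    """Controls enforcement of one property (hypothetical model)."""
    property_name: str                    # assertion made about the collection
    procedure: Callable[[Record], None]   # operation applied to an object
    check: Callable[[Record], bool]       # verification used during assessment


@dataclass
class Collection:
    purpose: str                                   # reason the collection is formed
    policies: List[Policy] = field(default_factory=list)
    # Persistent state: object name -> information generated by procedures
    state: Dict[str, Record] = field(default_factory=dict)

    def ingest(self, name: str) -> None:
        """Apply every policy's procedure to a new object and store the result."""
        record: Record = {"name": name}
        for policy in self.policies:
            policy.procedure(record)
        self.state[name] = record

    def assess(self) -> Dict[str, List[str]]:
        """Periodic assessment: list the objects that violate each property."""
        return {
            p.property_name: [n for n, rec in self.state.items() if not p.check(rec)]
            for p in self.policies
        }


def add_description(record: Record) -> None:
    record["description"] = "ingested object " + record["name"]


described = Policy("has description", add_description,
                   lambda rec: bool(rec.get("description")))

archive = Collection(purpose="preservation", policies=[described])
archive.ingest("report.pdf")
archive.state["orphan.dat"] = {"name": "orphan.dat"}  # added outside the policy
report = archive.assess()
print(report)  # {'has description': ['orphan.dat']}
```

The assessment reads only the persistent state, so a property can be re-verified later without re-running the procedures that produced it.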
Projects Using iRODS Policy-based Data Management
1. Astrophysics | Auger supernova search | Shared collection
2. Atmospheric science | NASA Langley Atmospheric Sciences Center | Shared collection
3. Biology | Phylogenetics at CC IN2P3 | Shared collection
4. Climate | NOAA National Climatic Data Center | Ingestion cache for archive
5. Cognitive Science | Temporal Dynamics of Learning Center | Shared collection
6. Computer Science | GENI experimental network | Archive
7. Cosmic Ray | AMS experiment on the International Space Station | Shared collection
8. Dark Matter Physics | Edelweiss II | Shared collection
9. Earth Science | NASA Center for Climate Simulations | Digital library
10. Ecology | CEED Caveat Emptor Ecological Data | Digital library
11. Engineering | CIBER-U | Digital library
12. Genomics | Broad Institute, Wellcome Trust Sanger Institute, NGS | Digital library
13. High Energy Physics | BaBar / Stanford Linear Accelerator | Shared collection / Archive
14. Hydrology | Institute for the Environment, UNC-CH; Hydroshare | Digital library / portal
15. Information Science | SLS LifeTime Library, Carolina Digital Repository | Digital library
16. Medicine | Lineberger Cancer Institute | Patient data analysis
17. Neuroscience | International Neuroinformatics Coordinating Facility | Shared collection
18. Neutrino Physics | T2K and dChooz neutrino experiments | Project collections
19. Oceanography | SciON | Archive
20. Optical Astronomy | National Optical Astronomy Observatory | Archive
21. Particle Physics | Indra multi-detector collaboration at IN2P3 | Project collection
22. Plant genetics | the iPlant Collaborative | Collaboration environment
23. Quantum Chromodynamics | IN2P3 | Project collection
24. Radio Astronomy | Cyber Square Kilometer Array, TREND, BAOradio | Digital library
25. Seismology | Southern California Earthquake Center | Digital library
26. Social Science | Odum, TerraPop | Digital library
Policy Sets
DataNet Federation Consortium
• Data grids - Sharing data
• Student digital library - Organizing data
• Data centers - Managing data
• Preservation - Archiving data
• Protected data - Enforcing security
• NSF data management - DMP requirements
Define Tasks for each Property
• Preservation purpose defines the set of properties that are maintained over time
• ISO 16363 – Standard for Trusted Digital Repositories
• 4.6.1 The repository shall comply with Access Policies.
– Access policy for repository.
– Collection Development Policy.
– Definition of the Designated Community.
– Demonstrations and discussion with relevant staff of what occurs when a query results in 'Access Denied'.
– Documentation that illustrates the Access Policy is being carried out: sign-in sheets, logs of access, logs of successful and unsuccessful access to the system, follow-up emails or help desk reports when 'access denials' are received.
Tasks for Controlling Access
• Creating identifiers for persons, collections, and files.
• Assigning roles to persons.
• Assigning access controls to collections and files (in effect a relationship between the person identifier and the file identifier).
• Assigning inheritance of access controls on collections (files can inherit the access control of the collection).
• Checking access permissions on reads and for other actions on the file.
• Verifying the set of access controls applied to files in a collection.
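The tasks above can be sketched as a toy in-memory catalog. This is a hedged Python illustration rather than the iRODS implementation: the `AccessCatalog` class, its method names, the permission strings, and the example paths are assumptions, and inheritance is modeled simply as "a new file copies the ACL of its collection".

```python
from typing import Dict, List, Set


class AccessCatalog:
    """Toy model of the access-control tasks (hypothetical, not iRODS)."""

    def __init__(self) -> None:
        self.roles: Dict[str, str] = {}                # user -> role
        self.coll_acl: Dict[str, Dict[str, str]] = {}  # collection -> {user: permission}
        self.file_acl: Dict[str, Dict[str, str]] = {}  # file path -> {user: permission}
        self.inherit: Set[str] = set()                 # collections with inheritance on

    def add_user(self, user: str, role: str) -> None:
        self.roles[user] = role

    def add_collection(self, coll: str, inherit: bool = False) -> None:
        self.coll_acl[coll] = {}
        if inherit:
            self.inherit.add(coll)

    def set_collection_acl(self, coll: str, user: str, permission: str) -> None:
        self.coll_acl[coll][user] = permission

    def add_file(self, coll: str, name: str) -> str:
        path = f"{coll}/{name}"
        # With inheritance on, the file starts with a copy of the collection ACL.
        self.file_acl[path] = dict(self.coll_acl[coll]) if coll in self.inherit else {}
        return path

    def can_read(self, user: str, path: str) -> bool:
        """Check access permission on a read."""
        return self.file_acl.get(path, {}).get(user) in {"read", "write", "own"}

    def verify(self, coll: str, user: str, permission: str) -> List[str]:
        """Return files in the collection whose ACL for the user differs."""
        prefix = coll + "/"
        return sorted(p for p, acl in self.file_acl.items()
                      if p.startswith(prefix) and acl.get(user) != permission)


catalog = AccessCatalog()
catalog.add_user("alice", "curator")
catalog.add_collection("/zone/home/archive", inherit=True)
catalog.set_collection_acl("/zone/home/archive", "alice", "read")
inherited = catalog.add_file("/zone/home/archive", "a.txt")
catalog.add_collection("/zone/home/scratch")
catalog.set_collection_acl("/zone/home/scratch", "alice", "read")
plain = catalog.add_file("/zone/home/scratch", "b.txt")

print(catalog.can_read("alice", inherited))  # True
print(catalog.can_read("alice", plain))      # False
print(catalog.verify("/zone/home/scratch", "alice", "read"))
```

The `verify` step is the piece that supports periodic assessment: it reports files whose stored access control no longer matches what the policy asserts.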
Policy Templates
• Constraints
– Information needed to evaluate constraint
– Operations applied to enforce constraint
• Procedure
– Information needed to apply operations
– Operations that are needed
Policy Template
Policy type | Constraint | State attributes for Constraint
Access data | By role (type of person) | User_ID; Role_type per User_ID; Role_ACL
Access data | By ACL (read permission) | User_ID; File_name; ACL per File_name per User_ID
Policy Template
Operations | State Attributes for Operation
Set person name | User_ID; User_name
Set file name | File_ID; File_name
Set role per person | User_ID; Role_type
Set ACL on file | File_ID; User_ID; ACL_type
Set sticky bit on collection | Collection_name; Sticky-bit_value
Set access on replication | File_ID; Replica_number; User_ID; ACL_type
Execution - check ACL on read | File_name; User_ID; ACL_type
Verify ACLs | File_ID; Replica_number; User_ID; ACL_type
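One possible machine-readable form of the operations template is a mapping from each operation to the state attributes it needs. The Python sketch below transcribes the template rows; the helper names `required_attributes` and `operations_using` are hypothetical and exist only for this illustration.

```python
from typing import Dict, List

# Operation -> state attributes, transcribed from the operations template.
OPERATIONS: Dict[str, List[str]] = {
    "Set person name": ["User_ID", "User_name"],
    "Set file name": ["File_ID", "File_name"],
    "Set role per person": ["User_ID", "Role_type"],
    "Set ACL on file": ["File_ID", "User_ID", "ACL_type"],
    "Set sticky bit on collection": ["Collection_name", "Sticky-bit_value"],
    "Set access on replication": ["File_ID", "Replica_number", "User_ID", "ACL_type"],
    "Execution - check ACL on read": ["File_name", "User_ID", "ACL_type"],
    "Verify ACLs": ["File_ID", "Replica_number", "User_ID", "ACL_type"],
}


def required_attributes(ops: Dict[str, List[str]]) -> List[str]:
    """Distinct state attributes the catalog must hold to support all operations."""
    return sorted({attr for attrs in ops.values() for attr in attrs})


def operations_using(ops: Dict[str, List[str]], attribute: str) -> List[str]:
    """Operations that read or write the given state attribute."""
    return [op for op, attrs in ops.items() if attribute in attrs]


attributes = required_attributes(OPERATIONS)
print(len(attributes))  # 9
print(operations_using(OPERATIONS, "Replica_number"))
```

Listing the distinct attributes shows how a template determines the persistent state a policy set needs before any rule is written.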
Generated Rules
• Data grids - (11 rules)
• Student digital library - (8 rules)
• Data centers - (27 rules)
• Preservation - (28 rules)
• Protected data - (66 rules)
• NSF data management - (35 rules)
Standard Rule Structure
• Check input parameters
• Create log file
• Create query on persistent state information
• Loop over results
• Test for a condition
• Apply operation
• Write results
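The seven steps can be sketched as one generic function. This is a minimal Python illustration of the structure, not an iRODS rule: `run_rule`, the record layout, and the in-memory `catalog` standing in for the persistent state are assumptions.

```python
import io
from typing import Callable, Dict, Iterable, List, TextIO

Record = Dict[str, str]


def run_rule(collection: str,
             catalog: Iterable[Record],
             condition: Callable[[Record], bool],
             operation: Callable[[Record], None],
             log: TextIO) -> List[str]:
    # 1. Check input parameters.
    if not collection.startswith("/"):
        raise ValueError("collection must be an absolute path")
    # 2. Create the log (here: any writable text stream).
    log.write(f"rule started on {collection}\n")
    # 3. Query the persistent state for objects in the collection.
    rows = [r for r in catalog if r["collection"] == collection]
    changed: List[str] = []
    # 4. Loop over results.
    for row in rows:
        # 5. Test for a condition.
        if condition(row):
            # 6. Apply the operation.
            operation(row)
            changed.append(row["name"])
            log.write(f"applied operation to {row['name']}\n")
    # 7. Write results.
    log.write(f"{len(changed)} of {len(rows)} objects updated\n")
    return changed


catalog = [
    {"collection": "/zone/home/test", "name": "a.dat", "acl": ""},
    {"collection": "/zone/home/test", "name": "b.dat", "acl": "read"},
    {"collection": "/zone/home/other", "name": "c.dat", "acl": ""},
]
log = io.StringIO()
updated = run_rule("/zone/home/test", catalog,
                   condition=lambda r: r["acl"] == "",
                   operation=lambda r: r.update(acl="read"),
                   log=log)
print(updated)  # ['a.dat']
```

Passing the condition and operation as parameters keeps the skeleton fixed while the policy-specific parts vary, which is one way to read how a shared structure can serve many rules.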
Simple Inheritance Rule
setInheritance {
  # odum-inherit.r
  # Path or file that will have ACL changed
  *Home = "/$rodsZoneClient/home/$userNameClient/";
  *Path = *Home ++ *RelativeCollection;
  # Validate the collection path (checkCollInput is defined outside this listing)
  checkCollInput(*Path);
  # With *Acl = "inherit", turn on ACL inheritance for *Path, applied recursively
  msiSetACL("recursive", *Acl, *User, *Path);
  writeLine("stdout", "Set inheritance of access on collection *Path");
}
INPUT *RelativeCollection="test", *Acl="inherit", *User=""
OUTPUT ruleExecOut
Standard Components
• Across the six types of data management applications
• Identified
– 97 policies
– 175 rules that automate tasks
– 123 operations
– 50 persistent state attributes
Collections
Files
Users
Metadata
Quotas - Storage limits
Resources - Storage systems
Tickets - Access URLs
Tokens - System parameters
Zones - Data grid federation
Books
Policy Templates Workbook
https://dfcweb.datafed.org/idrop-web2/home/link?irodsURI=irods%3A%2F%2Firen2.renci.org%3A1237%2Fdfcmain%2Fhome%2FDFC-public%2Fpapers%2FDFC-policy-template.pdf
Policy Examples Workbook
https://dfcweb.datafed.org/idrop-web2/home/link?irodsURI=irods%3A%2F%2Firen2.renci.org%3A1237%2Fdfcmain%2Fhome%2FDFC-public%2Fpapers%2FDFC-policy-examples.pdf
Contact
http://www.datafed.org