DFC Vision

National Science Foundation Cooperative Agreement: OCI-0940841

Upload: milton

Post on 26-Feb-2016



TRANSCRIPT

Page 1: DFC Vision

National Science Foundation Cooperative Agreement: OCI-0940841

Page 2: DFC Vision

Compute Resources – HPC centers, institutional clusters

DFC Collaboration Environment – Data Grid

Community Resources – Repository, Catalog

DFC Vision

• Build collaboration environment
  – Sharing of data, information, and knowledge
• Form national data cyberinfrastructure
  – Federation of existing data management systems
• Support reproducible data-driven research
  – Encapsulate knowledge within shared workflows
• Enable student participation in research
  – Policy-controlled analysis of “live” data

NEW

Page 3: DFC Vision

Data Driven Science and Engineering Collaboration Environments

– Oceanography – Ocean Observatories Initiative
  • Archiving climatic data records from real-time sensor data streams
– Engineering – CIBER-U
  • Engineering Digital Library: Curating civil engineering data, materials data, archaeology data, student training materials
– Hydrology – EarthCube
  • Automating hydrology research workflows (data retrieval, transformation, analysis)
– Plant biology – the iPlant Collaborative
  • Enable collaborative research across existing data repositories
– Cognitive science – the Temporal Dynamics of Learning Center
  • Manage research data, apply IRB policies
– Social Science – the Odum Institute
  • Integrate policy-based data management with the existing Dataverse repository

Page 4: DFC Vision

Challenges

• Federated national data cyberinfrastructure

• Existing projects have web services, data repositories, digital libraries, archives, processing pipelines, science portals

• What are the interoperability mechanisms needed to enable federation of existing resources?

Page 5: DFC Vision

1. Astrophysics – Auger supernova search
2. Atmospheric science – NASA Langley Atmospheric Sciences Center
3. Biology – Phylogenetics at CC IN2P3
4. Climate – NOAA National Climatic Data Center
5. Cognitive Science – Temporal Dynamics of Learning Center
6. Computer Science – GENI experimental network
7. Cosmic Ray – AMS experiment on the International Space Station
8. Dark Matter Physics – Edelweiss II
9. Earth Science – NASA Center for Climate Simulations
10. Ecology – CEED Caveat Emptor Ecological Data
11. Engineering – CIBER-U
12. High Energy Physics – BaBar / Stanford Linear Accelerator
13. Hydrology – Institute for the Environment, UNC-CH; Hydroshare
14. Genomics – Broad Institute, Wellcome Trust Sanger Institute, NGS
15. Medicine – Sick Kids Hospital
16. Neuroscience – International Neuroinformatics Coordinating Facility
17. Neutrino Physics – T2K and dChooz neutrino experiments
18. Oceanography – Ocean Observatories Initiative
19. Optical Astronomy – National Optical Astronomy Observatory
20. Particle Physics – Indra multi-detector collaboration at IN2P3
21. Plant genetics – the iPlant Collaborative
22. Quantum Chromodynamics – IN2P3
23. Radio Astronomy – Cyber Square Kilometer Array, TREND, BAOradio
24. Seismology – Southern California Earthquake Center
25. Social Science – Odum, TerraPop

DFC Builds on the iRODS data grid (integrated Rule-Oriented Data System)

Page 6: DFC Vision

Policy Concept Graph

[Figure: concept graph linking a Collection and its Purpose (features: Completeness, Correctness, Consensus, Consistency) to Attributes, Digital Objects, Properties (Integrity, Authenticity, Access control), Policies (Replication, Checksum, Quota, Data Type), Procedures (GetUserACL, SetDataType, SetQuota, DataObjRepl, SysChksumDataObj), Persistent State Information (DATA_ID, DATA_REPL_NUM, DATA_CHECKSUM), Workflows, Functions, Operations, and Policy Enforcement Points triggered by a Client Action or a Periodic Assessment Criteria Policy. Edge labels: Defines, Has, HasFeature, Isa, Controls, Updates, Chains, Invokes, HasSubType.]
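The graph links the Integrity property to a checksum policy, a checksum procedure, and persistent state such as DATA_CHECKSUM and DATA_REPL_NUM. A periodic assessment criteria policy then checks that this state still holds. The Python sketch below illustrates that loop under stated assumptions: the `DataObject` record, `assess`, the two-replica minimum, and the use of SHA-256 are hypothetical choices for illustration, not the iRODS catalog schema or API.

```python
import hashlib
from dataclasses import dataclass, field

@dataclass
class DataObject:
    """Hypothetical catalog record mirroring DATA_ID and DATA_CHECKSUM."""
    data_id: int
    replicas: list[bytes] = field(default_factory=list)  # content of each replica
    checksum: str = ""  # stands in for DATA_CHECKSUM

def compute_checksum(payload: bytes) -> str:
    # Stand-in for a checksum procedure such as SysChksumDataObj.
    return hashlib.sha256(payload).hexdigest()

def assess(obj: DataObject, min_replicas: int = 2) -> list[str]:
    """Periodic assessment: return every violated criterion for one object."""
    problems = []
    if len(obj.replicas) < min_replicas:
        problems.append(f"object {obj.data_id}: {len(obj.replicas)} replica(s), need {min_replicas}")
    for repl_num, payload in enumerate(obj.replicas):  # repl_num plays DATA_REPL_NUM
        if compute_checksum(payload) != obj.checksum:
            problems.append(f"object {obj.data_id}: replica {repl_num} fails checksum")
    return problems

good = DataObject(1, [b"sensor data", b"sensor data"], compute_checksum(b"sensor data"))
bad = DataObject(2, [b"sensor data", b"corrupted"], compute_checksum(b"sensor data"))
print(assess(good))  # []
print(assess(bad))   # ['object 2: replica 1 fails checksum']
```

Returning a list of violations, rather than repairing in place, keeps assessment separate from the replication and checksum procedures that would fix them.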

Page 7: DFC Vision

Policy-based Data Management – Implementation in iRODS

[Figure: the policy concept graph annotated with iRODS counts: Collection Purpose (5 main types), Policy (11 default), Property (7 default), Procedure (11 default), Clients (50), Policy Enforcement Points (70), Micro-services (317), Persistent State Information (338). Purpose features: Completeness, Correctness, Consensus, Consistency. Properties: Integrity, Authenticity, Access control. Policies: Replication, Checksum, Quota, Data Type. Example micro-services: msiGetUserACL, msiSetDataType, msiSetQuota, msiDataObjRepl, msiSysChksumDataObj. Example persistent state: DATA_ID, DATA_REPL_NUM, DATA_CHECKSUM. Collection subtypes: Archive, Data grid, Collection, Digital Library, Processing Pipeline.]
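In this model a client action fires a policy enforcement point, which runs a policy as a chain of micro-services that update persistent state. A minimal Python sketch of that control flow follows. `put`, `fire`, `POLICIES`, and the three functions are hypothetical analogs of msiSetDataType, msiSysChksumDataObj, and msiDataObjRepl, not their real signatures; in iRODS itself such chains are written in the rule language and attached to enforcement points such as acPostProcForPut.

```python
import hashlib
from typing import Callable

State = dict  # hypothetical persistent state of one data object
MicroService = Callable[[State], None]

def set_data_type(state: State) -> None:
    # Analog of msiSetDataType: derive a data type from the file extension.
    name = state["path"].rsplit("/", 1)[-1]
    state["DATA_TYPE"] = name.rsplit(".", 1)[-1] if "." in name else "generic"

def checksum(state: State) -> None:
    # Analog of msiSysChksumDataObj: record a checksum of the content.
    state["DATA_CHECKSUM"] = hashlib.sha256(state["content"]).hexdigest()

def replicate(state: State) -> None:
    # Analog of msiDataObjRepl: add a replica; replica numbers start at 0.
    state["replicas"].append(state["content"])
    state["DATA_REPL_NUM"] = len(state["replicas"]) - 1  # number of the newest replica

# Each policy enforcement point maps to the chain of micro-services it invokes.
POLICIES: dict[str, list[MicroService]] = {
    "post_put": [set_data_type, checksum, replicate],
}

def fire(pep: str, state: State) -> State:
    """Run the micro-services chained to a policy enforcement point, in order."""
    for service in POLICIES.get(pep, []):
        service(state)
    return state

def put(path: str, content: bytes) -> State:
    """Client action: store replica 0, then trigger the post-put enforcement point."""
    state = {"path": path, "content": content, "replicas": [content], "DATA_REPL_NUM": 0}
    return fire("post_put", state)

obj = put("/dfcmain/home/alice/run1.nc", b"netcdf bytes")
print(obj["DATA_TYPE"], obj["DATA_REPL_NUM"])  # nc 1
```

Keeping the policy as data (a list of functions per enforcement point) means an administrator can change what happens on ingest without touching the client code path.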

Page 8: DFC Vision

DFC - CNI

Federation Approach

• Use middleware to implement unifying name spaces for:
  1. Users – Single sign-on
  2. Collections – Directories, workflow, time series
  3. Objects – Files, soft links, workflows
  4. Storage systems – Cloud, tape, file systems, objects
  5. Metadata – Provenance, description, state
  6. Policies – Management, assessment
  7. Micro-services – Procedures, interactions
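A unifying name space decouples the name a user sees from where the bytes live. The sketch below is a minimal illustration under assumed names (`LogicalNamespace`, `Replica`, and all paths are hypothetical). It maps one logical path to replicas on different storage systems and resolves it by preference, so it covers only the collection, object, and storage-system name spaces from the list.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Replica:
    storage: str        # hypothetical storage-system name, e.g. "cloud"
    physical_path: str  # location understood only by that storage system

class LogicalNamespace:
    """Minimal sketch: one logical name, many physical replicas."""

    def __init__(self) -> None:
        self._objects: dict[str, list[Replica]] = {}

    def register(self, logical_path: str, replica: Replica) -> None:
        self._objects.setdefault(logical_path, []).append(replica)

    def resolve(self, logical_path: str, preferred: tuple[str, ...] = ()) -> Replica:
        """Pick a replica, honoring a preferred storage order when given."""
        replicas = self._objects[logical_path]  # KeyError if the name is unknown
        for storage in preferred:
            for replica in replicas:
                if replica.storage == storage:
                    return replica
        return replicas[0]  # fall back to the first registered replica

    def list_collection(self, collection: str) -> list[str]:
        prefix = collection.rstrip("/") + "/"
        return sorted(p for p in self._objects if p.startswith(prefix))

ns = LogicalNamespace()
ns.register("/dfcmain/home/obs/ctd.nc", Replica("file-system", "/data/vault/ctd.nc"))
ns.register("/dfcmain/home/obs/ctd.nc", Replica("cloud", "s3://bucket/ctd.nc"))
print(ns.resolve("/dfcmain/home/obs/ctd.nc", preferred=("cloud",)).physical_path)  # s3://bucket/ctd.nc
print(ns.list_collection("/dfcmain/home/obs"))  # ['/dfcmain/home/obs/ctd.nc']
```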

Page 9: DFC Vision

DFC Federation Hub

Port: 1237, Zone: dfcmain

iCAT – iren2.renci.org
hydroResc – hydro.renci.org
res-bk15 – srbbrick15.ucsd.edu
res-dfcmain – iren2.renci.org
demoResc – iren2.renci.org

renci – iren2.renci.org: 1247
ooi – icat.oceanobservatories.org: 1247
TDLC – tdlc-01.sdsc.edu: 6688
odumMain – iodum1.irss.unc.edu: 1247
dfctest – dfctest.renci.org: 1248
engineering – irods.ischool.drexel.edu: 1247
hydrology – iren2.renci.org: 2823
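iRODS logical paths begin with the zone name (for example `/dfcmain/home/...`), which is what lets a federation hub tell which zone owns a path. As a hedged sketch, the table below transcribes zone, host, and port pairs from the hub slide. These are historical values, the pairing is read off a flattened diagram, and the renci entry is omitted. `route` is a hypothetical helper, not iRODS client behavior.

```python
# Zone table transcribed from the DFC Federation Hub slide (historical values;
# the zone/host pairing is an interpretation of a flattened diagram).
ZONES: dict[str, tuple[str, int]] = {
    "dfcmain": ("iren2.renci.org", 1237),
    "ooi": ("icat.oceanobservatories.org", 1247),
    "TDLC": ("tdlc-01.sdsc.edu", 6688),
    "odumMain": ("iodum1.irss.unc.edu", 1247),
    "dfctest": ("dfctest.renci.org", 1248),
    "engineering": ("irods.ischool.drexel.edu", 1247),
    "hydrology": ("iren2.renci.org", 2823),
}

def route(logical_path: str) -> tuple[str, str, int]:
    """Return (zone, host, port) for a logical path of the form /zone/..."""
    parts = logical_path.split("/")
    if len(parts) < 2 or parts[0] != "" or not parts[1]:
        raise ValueError(f"not an absolute logical path: {logical_path!r}")
    zone = parts[1]  # the first path component names the owning zone
    if zone not in ZONES:
        raise KeyError(f"zone {zone!r} is not federated with the hub")
    host, port = ZONES[zone]
    return zone, host, port

print(route("/hydrology/home/shared/streamflow.csv"))
# ('hydrology', 'iren2.renci.org', 2823)
```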

Page 10: DFC Vision

National Infrastructure

Research Environment - Portals, Applications, Workflows

DFC Collaboration Environment – Data Grid

Community Resource – Repository
Community Resource – Catalog
Community Resource – Services

Existing infrastructure

XSEDE, Kepler, OOI, TDLC, iPlant, CUAHSI, NCDC, Dataverse, GeoBrain, DataONE, NCSA Polyglot


Page 11: DFC Vision

The Challenge: Support reproducible data-driven research

PETABYTES DOUBLING EVERY TWO YEARS

Deliver the capability to manage, mine, and publish knowledge through collaboration environments.

Experiments, Archives, Sensors, Literature, Simulation

The Future: Reproducible Research
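Read as steady exponential growth (an assumption; the slide gives only the doubling time), the headline figure implies the following, with $V_0$ the present data volume, $t$ in years, and doubling time $T_d$:

```latex
V(t) = V_0 \, 2^{t/T_d}, \qquad T_d = 2\ \text{years}
\quad\Longrightarrow\quad
\frac{V(10\ \text{years})}{V_0} = 2^{10/2} = 2^{5} = 32
```

Equivalently, the annual growth factor is $2^{1/2} \approx 1.414$, roughly a 41% increase per year.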


Page 12: DFC Vision

National Infrastructure Approach

1. Build national data cyberinfrastructure prototype
   – Support multiple science and engineering domains by loosely coupling their existing infrastructure with a collaboration environment
2. Develop generic interoperability framework
   – Define the generic infrastructure needed for the national infrastructure to manage knowledge as well as data and information
3. Define interoperability mechanisms
   – Support access across the disparate types of infrastructure in common use
4. Define domain specific extensions
   – Support three levels: technical interoperability, project level policy, and end user usage requirements

Page 13: DFC Vision

Interoperability Mechanisms

Information
– Collection Registration: Soft Links
– Information Exchange: Message Queue
– Information Manipulation: Database Query

Data
– Data Access: Micro-services
– Data Manipulation: Storage Driver

Knowledge
– Knowledge Creation: Analysis Workflows
– Knowledge Management: Procedures (Micro-services)

Policies control execution of each interoperability mechanism.
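To make information exchange concrete, the sketch below passes collection-registration events through a message queue and lets a policy decide which events are acted on, registering a soft link rather than copying data. Python's in-process `queue.Queue` stands in for an AMQP broker or iRODS Xmsg; the event fields, the `allowed` policy, and the example URLs are hypothetical.

```python
import json
import queue
import threading

# In-process stand-in for an AMQP broker or iRODS Xmsg channel (assumption).
events = queue.Queue()
catalog: dict[str, str] = {}  # hypothetical external catalog: logical path -> soft link

def allowed(event: dict) -> bool:
    # Hypothetical policy: only collections marked public are exchanged.
    return event.get("access") == "public"

def publish(path: str, url: str, access: str) -> None:
    events.put(json.dumps({"path": path, "url": url, "access": access}))

def consume() -> None:
    # Process messages until the None sentinel arrives.
    while (message := events.get()) is not None:
        event = json.loads(message)
        if allowed(event):
            catalog[event["path"]] = event["url"]  # register a soft link; no data is copied

worker = threading.Thread(target=consume)
worker.start()
publish("/dfcmain/home/ooi/ctd", "https://example.org/ooi/ctd", "public")
publish("/dfcmain/home/tdlc/irb", "https://example.org/tdlc/irb", "restricted")
events.put(None)  # sentinel: stop the consumer
worker.join()
print(catalog)  # {'/dfcmain/home/ooi/ctd': 'https://example.org/ooi/ctd'}
```

The policy sits on the consuming side here, so each participating system can apply its own rules to the same stream of messages.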


Page 14: DFC Vision

DataNet Interoperability

Research Environment - Portals, Applications, Workflows

DFC Collaboration Environment

Message Queue
Web Service
DataONE Member Node
DataONE Coordinating Node
TerraPop Server
SEAD Portal (VIVO)
SEAD Engagement Center
SEAD Data
DFC Data Grid


Page 15: DFC Vision

DFC Interoperability Layers

Clients – OpenSocial, Web browsers, Web Services, Workflows, FUSE, Synchronization, MediaWiki
Authentication – PAM / GSSAPI: InCommon, GSI, Kerberos, Shibboleth, LDAP
Workflows – Micro-Services: Kepler, NCSA Cyberintegrator, Taverna, NCSA Polyglot
Data Manipulation – Format Drivers: NetCDF, HDF5, THREDDS, ERDDAP
Networks – Network Drivers: HTTPS, TCP/IP, Parallel TCP/IP, RBUDP
Data Access – Micro-Services: DataONE, Data Conservancy, CUAHSI, NCDC
Vocabulary – Micro-Services: HIVE, (Cheshire)
Messaging – Micro-Services: AMQP, iRODS Xmsg
Management – Policies: (RDA Policies), (ISO 16363 Criteria)
Storage Systems – Storage Drivers: File Systems, Tape Archives, Object Stores, Cloud Storage

Page 16: DFC Vision

Interoperability Mechanisms

• Drivers
  – Encapsulate the knowledge needed to support operations at the remote repository: partial I/O, parsing of formats, manipulation of data structures
  – Authentication, format, storage
• Micro-services
  – Encapsulate the knowledge needed to interact with an external system or with a data set using the remote protocol
  – Data access, external workflows, semantics, messaging
• Policies
  – Encapsulate the knowledge needed for management functions
  – Federation control, administrative tasks, validation checks
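Drivers hide each repository's protocol behind a common interface. A minimal sketch, assuming a hypothetical `StorageDriver` interface with a single partial-I/O `read(path, offset, length)` method, shows two drivers and a dispatcher. Real iRODS resource plugins expose many more operations than this.

```python
import os
import tempfile
from abc import ABC, abstractmethod

class StorageDriver(ABC):
    """Hypothetical driver interface: the middleware calls only these methods."""

    @abstractmethod
    def read(self, path: str, offset: int, length: int) -> bytes:
        """Partial I/O: return up to `length` bytes starting at `offset`."""

class MemoryDriver(StorageDriver):
    def __init__(self, blobs: dict[str, bytes]) -> None:
        self.blobs = blobs

    def read(self, path: str, offset: int, length: int) -> bytes:
        return self.blobs[path][offset:offset + length]

class FileSystemDriver(StorageDriver):
    def read(self, path: str, offset: int, length: int) -> bytes:
        with open(path, "rb") as handle:
            handle.seek(offset)  # fetch only the requested byte range
            return handle.read(length)

DRIVERS: dict[str, StorageDriver] = {
    "memory": MemoryDriver({"obj1": b"HDR:0123456789"}),
    "posix": FileSystemDriver(),
}

def partial_read(storage: str, path: str, offset: int, length: int) -> bytes:
    # The caller never needs to know which protocol the storage system speaks.
    return DRIVERS[storage].read(path, offset, length)

with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"HDR:abcdefghij")
print(partial_read("memory", "obj1", 4, 3))   # b'012'
print(partial_read("posix", tmp.name, 4, 3))  # b'abc'
os.remove(tmp.name)
```

Adding a new storage system then means writing one more driver class; nothing above the `DRIVERS` table changes.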

Page 17: DFC Vision

Assertion

• Three basic types of interoperability mechanisms are sufficient for assembling national data cyberinfrastructure

• Example: Linked software-defined networks to data grids
  – From an iRODS data grid, controlled the selection of three disjoint network paths to optimize data transport by adding appropriate policy enforcement points and micro-services

• Expect functionality currently in data grid middleware to migrate into network middleware
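The slide does not say how the data grid chose among the three disjoint paths. As a hedged illustration only, the sketch assumes a policy invoked before each transfer that picks the path with the lowest estimated transfer time (latency plus size over available bandwidth). The path names and figures are invented for the example.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class NetworkPath:
    name: str
    bandwidth_gbps: float  # currently available bandwidth (hypothetical measurement)
    latency_s: float       # fixed setup cost in seconds (hypothetical)

def transfer_time_s(path: NetworkPath, size_gb: float) -> float:
    # 1 GB = 8 Gb, so seconds = 8 * size / bandwidth, plus fixed latency.
    return path.latency_s + 8 * size_gb / path.bandwidth_gbps

def select_path(paths: list[NetworkPath], size_gb: float) -> NetworkPath:
    """Policy run before a transfer: choose the path with the lowest estimated time."""
    return min(paths, key=lambda p: transfer_time_s(p, size_gb))

paths = [
    NetworkPath("path-a", bandwidth_gbps=1.0, latency_s=0.01),
    NetworkPath("path-b", bandwidth_gbps=10.0, latency_s=0.5),
    NetworkPath("path-c", bandwidth_gbps=4.0, latency_s=0.05),
]
print(select_path(paths, size_gb=0.001).name)  # path-a
print(select_path(paths, size_gb=50).name)     # path-b
```

With these numbers, small files take the low-latency path and large files the high-bandwidth one, the kind of decision the assertion expects to migrate into network middleware.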

Page 18: DFC Vision

Future Architecture

Clients

Resources

Data Grid Middleware

Clients

Network Middleware

Data Grid Middleware

Resources

DFC Federation

GEMI - GENI

Virtual collection

Virtual network

Page 19: DFC Vision


Contacts

http://datafed.org

http://irods.org

Reagan W. [email protected]

National Science Foundation Cooperative Agreement: OCI-0940841