the data grid: towards an architecture for the distributed management and analysis of large...

Post on 12-Jan-2016


The Data Grid: Towards an Architecture for the Distributed Management and Analysis of Large Scientific Datasets

Caitlin Minteer & Kelly Clynes

The Data Grid

Large dataset size

Geographic distribution of users and resources

Computationally intensive analysis

No other architecture exists that allows us to apply technologies in large-scale application domains

The Data Grid

Data grid applications must frequently operate in wide-area, multi-institutional, heterogeneous environments

Design Architecture for The Data Grid

Mechanism Neutrality

Designed to be as independent as possible of low-level mechanisms

Defines interfaces that encapsulate the peculiarities of specific storage systems

Design Architecture for The Data Grid

Policy Neutrality

Structured so that design decisions with significant performance implications are exposed to the user

Design Architecture for The Data Grid

Compatibility with Grid Infrastructure

Takes advantage of fundamental Grid infrastructure

Compatible with lower-level Grid mechanisms

Design Architecture for The Data Grid

Uniformity of Information Infrastructure

The same data model and interface are used to access the grid's metadata

Design Architecture for The Data Grid

These four principles lead to the development of a layered architecture.

Lower layers provide high-performance access to an orthogonal set of basic mechanisms.

In data grids, the focus on simple, policy-independent mechanisms will encourage and enable wide use without limiting the range of applications that can be implemented.

Core Grid Data Services

Two fundamental services are required in a data grid architecture: data access and metadata access

Data Access

Provides mechanisms for accessing, managing, and initiating third-party transfers of data stored in storage systems

Metadata Access

Provides mechanisms for accessing and managing information about data stored in storage systems

Data Abstraction: Storage System

The basic grid component is the storage system, which provides functions for creating, destroying, reading, writing, and manipulating file instances

File instances are the basic unit of information in a storage system

A storage system can be implemented by any storage technology that can support the required access functions
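The storage system abstraction described on this slide can be sketched as a small interface. This is a minimal illustration rather than the architecture's actual API: the names `StorageSystem` and `InMemoryStorage` and the method signatures are assumptions made for the example, and only four of the listed functions are shown.

```python
from abc import ABC, abstractmethod


class StorageSystem(ABC):
    """Hypothetical interface: access functions a storage system must support."""

    @abstractmethod
    def create(self, name):
        """Create an empty file instance."""

    @abstractmethod
    def destroy(self, name):
        """Remove a file instance."""

    @abstractmethod
    def read(self, name):
        """Return the contents of a file instance as bytes."""

    @abstractmethod
    def write(self, name, data):
        """Replace the contents of an existing file instance."""


class InMemoryStorage(StorageSystem):
    """Toy backend; any technology offering the same functions would qualify."""

    def __init__(self):
        self._files = {}  # file instance name -> contents (bytes)

    def create(self, name):
        if name in self._files:
            raise FileExistsError(name)
        self._files[name] = b""

    def destroy(self, name):
        del self._files[name]  # raises KeyError if the instance is absent

    def read(self, name):
        return self._files[name]

    def write(self, name, data):
        if name not in self._files:
            raise FileNotFoundError(name)
        self._files[name] = bytes(data)
```

Because callers depend only on the interface, a tape archive or a distributed file system could stand behind the same four calls, which is the point of mechanism neutrality.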

Data Access:

Storage system access functions must be integrated with the security environment of each site to which remote access is required

Applications should be able to provide storage systems with hints concerning access patterns, network performance, etc., that the storage system can use to optimize performance

Data movement functions must be able to detect and report errors
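Two of the requirements above, application hints and error reporting, can be illustrated with a toy third-party transfer. Everything here is an assumption for the sake of the sketch: `AccessHints`, `TransferError`, and `third_party_transfer` are hypothetical names, the storage systems are plain dictionaries, and the security integration the slide calls for is left out.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class AccessHints:
    """Hypothetical hints an application might pass along with a request."""
    sequential: bool = True               # expected access pattern
    expected_bandwidth_mbps: float = 0.0  # 0.0 means unknown


class TransferError(Exception):
    """Raised when a data movement fails or cannot be verified."""


def third_party_transfer(source, destination, name, hints=None):
    """Copy `name` from source to destination (both plain dicts here).

    Returns the number of bytes moved; raises TransferError on any problem.
    """
    hints = hints or AccessHints()
    if name not in source:
        raise TransferError(f"{name!r} not found at source")
    data = source[name]

    # Use the access-pattern hint to pick a chunk size (values are arbitrary).
    chunk = 1 << 20 if hints.sequential else 1 << 12
    buffer = bytearray()
    for start in range(0, len(data), chunk):
        buffer += data[start:start + chunk]
    destination[name] = bytes(buffer)

    # Re-hash the destination copy. This cannot fail in an in-memory toy,
    # but it marks where a real mover would detect corruption in transit.
    if hashlib.sha256(destination[name]).digest() != hashlib.sha256(data).digest():
        raise TransferError(f"checksum mismatch for {name!r}")
    return len(data)
```

The caller that requests the transfer is neither the source nor the destination, which is what makes it a third-party transfer; failures surface as exceptions instead of being silently dropped.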

Metadata

Metadata is needed to manage the data grid itself

It covers information about file instances, the contents of file instances, and the various storage systems contained in the grid

The metadata service provides a means of publishing and accessing this metadata

Application Metadata

Describes the contents and structure of the data

Content represented by the file

Circumstances under which the data was obtained

Other information useful to applications that process the data

Replica Metadata

Used to manage replication of data objects

Includes information for mapping file instances to particular storage system locations

System Configuration Metadata

Describes the fabric of the grid itself, for example network connectivity and details about storage systems such as capacity and usage policy

Additional Requirements

Service must operate efficiently in a distributed environment

It must be scalable and robust, and it must let sites assert local control over their information

Hierarchical Distributed System

Because of these requirements, the metadata service must be a hierarchical, distributed system that achieves scalability, avoids single points of failure, and facilitates local control over data
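One way to picture a hierarchical, distributed metadata service is as a tree of directory nodes, each holding only the entries it controls and delegating everything else to a child. The sketch below is an assumption-laden toy: `MetadataNode`, the list-of-names path format, and the site and file names are all hypothetical, and real deployments would put each node on a different host.

```python
class MetadataNode:
    """One node in a hypothetical metadata hierarchy."""

    def __init__(self, name):
        self.name = name
        self.entries = {}   # metadata this node controls locally
        self.children = {}  # child name -> MetadataNode

    def add_child(self, child):
        self.children[child.name] = child
        return child

    def publish(self, path, attributes):
        """Store attributes at the node that owns the last path element."""
        if len(path) == 1:
            self.entries[path[0]] = dict(attributes)
            return
        # Delegate to the owning child; an unknown child raises KeyError.
        self.children[path[0]].publish(path[1:], attributes)

    def lookup(self, path):
        """Return the attributes for path, or None if nothing is published."""
        if len(path) == 1:
            return self.entries.get(path[0])
        child = self.children.get(path[0])
        return child.lookup(path[1:]) if child is not None else None


root = MetadataNode("root")
site_a = root.add_child(MetadataNode("siteA"))
root.publish(["siteA", "run42.dat"], {"size_bytes": 1024})
```

The entry for `run42.dat` lives only in `site_a.entries`, so the site keeps control of its own information while queries can still start at the root.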

Higher-Level Data Grid Components

Two types of representative components: replica management and replica selection

Replica Management

The replica manager creates copies of file instances, or replicas, within specified storage systems

Replicas offer better performance or availability for access to or from a particular location

Maintains a repository, or catalog, of replicas
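The replica manager's two jobs, creating copies and keeping a catalog of where they live, can be sketched as follows. The class name `ReplicaManager`, its methods, and the use of dictionaries as storage systems are assumptions made for this example, not the interface of any real replica management service.

```python
class ReplicaManager:
    """Hypothetical replica manager: copies file instances and records them."""

    def __init__(self, storage_systems):
        # location name -> {file name -> contents}; dicts stand in for storage
        self.storage = storage_systems
        # file name -> list of locations holding a replica
        self.catalog = {}

    def register(self, name, location):
        """Record that `location` holds a replica of `name`."""
        if name not in self.storage[location]:
            raise KeyError(f"{name!r} is not stored at {location!r}")
        locations = self.catalog.setdefault(name, [])
        if location not in locations:
            locations.append(location)

    def replicate(self, name, source, target):
        """Copy `name` from source to target and update the catalog."""
        self.storage[target][name] = self.storage[source][name]
        self.register(name, source)
        self.register(name, target)

    def locations(self, name):
        """Return a copy of the known replica locations for `name`."""
        return list(self.catalog.get(name, []))


manager = ReplicaManager({"siteA": {"run42.dat": b"payload"}, "siteB": {}})
manager.replicate("run42.dat", "siteA", "siteB")
```

After the call, `manager.locations("run42.dat")` lists both sites, which is the mapping from file instances to storage system locations that replica metadata has to provide.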

Replica Selection and Data Filtering

Replica selection is a high-level service provided in the data grid

It chooses the replica that best meets a desired performance criterion: speed, cost, or security

Replicas may be local or accessed remotely
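Replica selection can be sketched as picking the best candidate under one criterion. The `Replica` fields, the `select_replica` function, and the sample numbers are invented for illustration; a real selector would draw these values from replica and system configuration metadata.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Replica:
    """Hypothetical description of one replica of a file instance."""
    location: str
    bandwidth_mbps: float  # higher is faster
    cost_per_gb: float     # lower is cheaper
    secure: bool           # whether the access path meets the security policy


def select_replica(replicas, criterion="speed", require_secure=False):
    """Return the replica that is best under the given criterion."""
    candidates = [r for r in replicas if r.secure or not require_secure]
    if not candidates:
        raise LookupError("no replica satisfies the security requirement")
    if criterion == "speed":
        return max(candidates, key=lambda r: r.bandwidth_mbps)
    if criterion == "cost":
        return min(candidates, key=lambda r: r.cost_per_gb)
    raise ValueError(f"unknown criterion: {criterion!r}")


replicas = [
    Replica("local-disk", bandwidth_mbps=800.0, cost_per_gb=0.05, secure=False),
    Replica("remote-archive", bandwidth_mbps=100.0, cost_per_gb=0.01, secure=True),
]
fastest = select_replica(replicas, "speed")
cheapest = select_replica(replicas, "cost")
```

Here the local copy wins on speed and the remote archive wins on cost, so the policy choice stays with the application, in line with the policy neutrality principle.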

Summary

Architecture of the Data Grid: mechanism neutrality, policy neutrality, compatibility with Grid infrastructure, uniformity of information infrastructure

Data services: data access, metadata access

Replica management
