A Metadata Catalog Service for Data Intensive Applications Presented by Chin-Yi Tsai



Outline

Introduction

The Role of Metadata Services in Grid Data Management

Requirements for the Metadata Service

Components of a Metadata Service

MCS: A Metadata Catalog Service for Grids

Application Experiences

Scalability of the MCS

Summary


Data-intensive application

Experimental analyses and simulations in scientific disciplines

Massive datasets are shared by a community of hundreds or thousands of researchers

Purpose: to manage these large data sets efficiently

Metadata or descriptive information about the data needs to be managed


High Level Diagram of the Metadata Catalog Architecture

[Figure: a client application reaches the Metadata Catalog Service through a standard interface; the service consists of a web server with database connectivity and a MySQL metadata database.]


Introduction

Metadata is information that describes data.

The design of a Metadata Catalog Service (MCS) that provides a mechanism for storing and accessing descriptive metadata and allows users to query for data items based on desired attributes.

Accurate identification of desired data items is essential for correct analysis of experimental and simulation results.


Introduction (cont’d)

There are various types of metadata:

Replication metadata

Metadata that describe the contents of data items

Metadata that relate to the physical characteristics of data objects, such as size and access permissions

Distinguish between logical file metadata and physical file metadata.

[Figure: a query over logical file metadata leads to the physical file.]


A usage scenario of the Metadata Catalog Service

[Figure: a six-step usage scenario connecting the client application, the Metadata Catalog Service (MCS web server and MCS database), the Replica Location Service (replica index node and local replica catalog), and the physical storage system.]


Metadata types

Physical metadata: information about the characteristics of data on the physical storage system

Domain-independent metadata: applies regardless of the application domain or virtual organization in which the data sets are created and shared

Domain-specific metadata, virtual organization metadata, and user metadata: specific to an application domain, a virtual organization, or a particular user


The Role of Metadata Services in Grid Data Management

Metadata Services are services that maintain mappings between logical name attributes for data items and other descriptive metadata attributes, and that respond to queries about those mappings.

Metadata Services play a key role in the publication, discovery, and access of data sets.


Publication

Publication is the process by which data sets and their associated attributes are stored and made accessible to a user community.

Published attributes include domain-independent, domain-dependent, and virtual organization metadata attributes.

Users can then discover and access data sets according to these attributes.

Some members of the community may use the Metadata Service to annotate the data sets with their own observations using user attributes and make these annotations available to a controlled subset of the community.


Discovery and Access

Discovery is the process of identifying data items of interest to the user.

[Figure: the six-step usage scenario connecting the client application, the Metadata Catalog Service (MCS web server and MCS database), the Replica Location Service (replica index node and local replica catalog), and the physical storage system.]
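The discovery-then-access flow on this slide can be sketched in a few lines. This is only an illustration under stated assumptions, not the MCS or RLS API: the two dictionaries stand in for the Metadata Catalog Service and the Replica Location Service, and the function names `discover` and `locate`, the attribute keys, the file names, and the URLs are all hypothetical.

```python
# Hypothetical in-memory stand-ins for the two services (not the real MCS/RLS APIs).

# Metadata catalog: logical file name -> descriptive attributes.
METADATA_CATALOG = {
    "run42.dat": {"data_type": "timeseries", "instrument": "H1"},
    "run43.dat": {"data_type": "timeseries", "instrument": "L1"},
    "notes.txt": {"data_type": "text", "instrument": "H1"},
}

# Replica catalog: logical file name -> physical locations of its replicas.
REPLICA_CATALOG = {
    "run42.dat": [
        "gsiftp://site-a.example.org/data/run42.dat",
        "gsiftp://site-b.example.org/data/run42.dat",
    ],
    "run43.dat": ["gsiftp://site-a.example.org/data/run43.dat"],
}


def discover(wanted):
    """Discovery: return logical names whose attributes match every wanted pair."""
    return sorted(
        name
        for name, attrs in METADATA_CATALOG.items()
        if all(attrs.get(key) == value for key, value in wanted.items())
    )


def locate(logical_name):
    """Access: map a logical name to the physical replicas known for it."""
    return REPLICA_CATALOG.get(logical_name, [])


if __name__ == "__main__":
    # Step 1: query by attributes; step 2: resolve each logical name to replicas.
    for name in discover({"data_type": "timeseries", "instrument": "H1"}):
        print(name, "->", locate(name))
```

The point of the split is that the metadata catalog only ever returns logical names; physical locations come from a separate lookup, so replicas can move without touching the descriptive metadata.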


Requirements for the Metadata Service

The Metadata Service must provide a mechanism for associating logical name attributes with domain-independent metadata attributes.

The Metadata Service must support queries on its contents.

The Metadata Service must implement policies regarding the consistency guarantees, authentication, authorization, and auditing capabilities provided by the service.


Requirements for the Metadata Service (cont’d)

The Metadata Service may support the ability to aggregate metadata into collections or views by associating aggregation attributes with logical name attributes.

The Metadata Service should provide the ability to store attributes that record the transformations performed on a data set.

The Metadata Service should provide good performance and scalability.


Components of a Metadata Service

A data model that includes mechanisms for aggregation of metadata mappings

A standard schema for domain-independent metadata attributes with extensibility for additional user-defined attributes

A set of standard service behaviors

Query mechanisms for accessing the database

A set of standard interfaces and APIs for storing and accessing metadata

A set of policies for consistency, access control and authorization, and auditing


MCS: A Metadata Catalog Service for Grids (design and implementation)

The MCS data model

MCS Schema

MCS service implementation

MCS Query mechanism and APIs

MCS policies


The MCS Data Model

Logical file

Logical collection

Logical view


MCS Schema

Logical file metadata: main attributes of a logical file

Logical collection metadata: user-defined associations of logical files

Logical view metadata: user-defined aggregation of logical files, logical collections or other logical views

Authorization information is associated with both individual logical files and logical collections

User information

Audit metadata

User-defined metadata

Annotation attributes

Creation history

External catalog metadata
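The three aggregation levels of the data model can be sketched as follows. This is a minimal illustration, not the MCS implementation: the class and function names are hypothetical, attributes are reduced to a plain dictionary, and views are assumed not to contain themselves (no cycle detection).

```python
from dataclasses import dataclass, field
from typing import Dict, List, Union


@dataclass
class LogicalFile:
    """A logical file with its descriptive attributes."""
    name: str
    attributes: Dict[str, str] = field(default_factory=dict)


@dataclass
class LogicalCollection:
    """User-defined association of logical files."""
    name: str
    files: List[LogicalFile] = field(default_factory=list)


@dataclass
class LogicalView:
    """User-defined aggregation of files, collections, or other views."""
    name: str
    members: List[Union[LogicalFile, LogicalCollection, "LogicalView"]] = field(
        default_factory=list
    )


def files_in(item):
    """Recursively list the names of all logical files reachable from item."""
    if isinstance(item, LogicalFile):
        return [item.name]
    children = item.files if isinstance(item, LogicalCollection) else item.members
    names = []
    for child in children:
        for name in files_in(child):
            if name not in names:  # a file may be reachable through several paths
                names.append(name)
    return names


if __name__ == "__main__":
    a, b, c = LogicalFile("a.dat"), LogicalFile("b.dat"), LogicalFile("c.dat")
    run1 = LogicalCollection("run1", [a, b])
    favorites = LogicalView("favorites", [c, a])
    figures = LogicalView("paper-figures", [run1, favorites])
    print(files_in(figures))  # ['a.dat', 'b.dat', 'c.dat']
```

A view only references its members, so the same logical file can appear in a collection and in any number of views without being duplicated.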


Field name         Type          Remarks   Description
Data_id            Integer       Non null  The data identifier
Logical_name       Varchar(250)  Non null  The logical file name
Version            Integer                 The version of the data
Data_type          Varchar(250)            The type of data
Collection_id      Integer
Container_id       Integer
Container_Service  Varchar(250)
Is_valid           Integer       Non null
Creator_Dn         Varchar(250)  Non null
Last_Modifier_Dn   Varchar(250)
Create_Time        Date/Time     Non null
Last_Modify_Time   Date/Time
Master_Copy        Varchar(250)

Logical file metadata
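To make the logical file table concrete, the sketch below creates it in an in-memory SQLite database and runs one attribute query. It is an approximation under stated assumptions: SQLite stands in for the MySQL back end, `Container_Service` is written with an underscore, the primary key on `Data_id` is assumed, and the table name `logical_file` and the helper functions are hypothetical.

```python
import sqlite3

# Column names and "Non null" remarks follow the logical file metadata table.
# Assumptions: SQLite instead of MySQL, Data_id as primary key, table name.
SCHEMA = """
CREATE TABLE logical_file (
    Data_id           INTEGER      NOT NULL PRIMARY KEY,
    Logical_name      VARCHAR(250) NOT NULL,
    Version           INTEGER,
    Data_type         VARCHAR(250),
    Collection_id     INTEGER,
    Container_id      INTEGER,
    Container_Service VARCHAR(250),
    Is_valid          INTEGER      NOT NULL,
    Creator_Dn        VARCHAR(250) NOT NULL,
    Last_Modifier_Dn  VARCHAR(250),
    Create_Time       DATETIME     NOT NULL,
    Last_Modify_Time  DATETIME,
    Master_Copy       VARCHAR(250)
)
"""


def open_catalog():
    """Create an empty in-memory catalog with the logical file table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    return conn


def add_file(conn, data_id, logical_name, creator_dn, create_time, data_type=None):
    """Insert one logical file, marking it valid (Is_valid = 1)."""
    conn.execute(
        "INSERT INTO logical_file "
        "(Data_id, Logical_name, Data_type, Is_valid, Creator_Dn, Create_Time) "
        "VALUES (?, ?, ?, 1, ?, ?)",
        (data_id, logical_name, data_type, creator_dn, create_time),
    )


def names_by_type(conn, data_type):
    """Return the logical names of all files with the given Data_type."""
    rows = conn.execute(
        "SELECT Logical_name FROM logical_file "
        "WHERE Data_type = ? ORDER BY Logical_name",
        (data_type,),
    )
    return [row[0] for row in rows]


if __name__ == "__main__":
    conn = open_catalog()
    add_file(conn, 1, "run42.dat", "/O=Grid/CN=Alice", "2003-01-01 00:00:00", "timeseries")
    print(names_by_type(conn, "timeseries"))
```

Omitting a column marked "Non null", such as `Creator_Dn`, makes the insert fail with `sqlite3.IntegrityError`, which is how the remarks column is enforced in this sketch.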


Logical collection metadata


Logical view metadata


Authorization information


User information: about the writers or modifiers of the logical files in the database

Audit metadata: records information about actions that can be performed on the Metadata Service

User-defined metadata: different application domains have their own metadata schemas

Annotation attributes: comments

Creation history: information about how data items are generated

External catalog metadata: use this information to further query the external catalog


MCS Service Implementation

[Figure: Overview of the implementation. An application program (for example, Main() { mcsClient(); mcsCreate(x); }) calls the MCS client, whose SOAP engine communicates with the SOAP engine of the MCS server, which is backed by a MySQL database.]


MCS Query Mechanisms and APIs

The client API provides the following operations:

Querying the catalog for logical objects based on object attributes

Querying the static attributes of a logical object

Querying the user-defined attributes of a logical object

Querying the contents of a logical view or a logical collection

Creating a logical file, collection or view

Modifying the attributes of a logical object

Deleting a logical file, view or collection

Annotating a logical object

Adding logical objects to a view
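A few of the operations listed above can be mimicked with a toy in-memory catalog. This is a sketch for orientation only: the class `CatalogSketch` and every method name are hypothetical and do not reproduce the actual MCS client API, and collections, authorization, and the SOAP transport are left out.

```python
class CatalogSketch:
    """Toy in-memory catalog; method names are hypothetical, not the MCS API."""

    def __init__(self):
        self.files = {}  # logical name -> {"static": {}, "user": {}, "notes": []}
        self.views = {}  # view name -> ordered list of member logical names

    def create_file(self, name, **static_attrs):
        """Create a logical file with its static attributes."""
        if name in self.files:
            raise ValueError(f"logical file {name!r} already exists")
        self.files[name] = {"static": dict(static_attrs), "user": {}, "notes": []}

    def set_user_attribute(self, name, key, value):
        """Add or modify a user-defined attribute of a logical file."""
        self.files[name]["user"][key] = value

    def annotate(self, name, text):
        """Attach a free-text annotation to a logical file."""
        self.files[name]["notes"].append(text)

    def query(self, **wanted):
        """Return logical names whose static or user-defined attributes match."""
        hits = []
        for name, record in self.files.items():
            merged = {**record["static"], **record["user"]}
            if all(merged.get(key) == value for key, value in wanted.items()):
                hits.append(name)
        return sorted(hits)

    def create_view(self, view):
        """Create an empty logical view if it does not exist yet."""
        self.views.setdefault(view, [])

    def add_to_view(self, view, name):
        """Add an existing logical file to a view, ignoring duplicates."""
        if name not in self.files:
            raise KeyError(name)
        if name not in self.views[view]:
            self.views[view].append(name)

    def view_contents(self, view):
        """Return a copy of the member names of a view."""
        return list(self.views[view])

    def delete_file(self, name):
        """Delete a logical file and drop it from every view."""
        del self.files[name]
        for members in self.views.values():
            if name in members:
                members.remove(name)


if __name__ == "__main__":
    catalog = CatalogSketch()
    catalog.create_file("run42.dat", data_type="timeseries")
    catalog.set_user_attribute("run42.dat", "quality", "good")
    catalog.create_view("favorites")
    catalog.add_to_view("favorites", "run42.dat")
    print(catalog.query(data_type="timeseries", quality="good"))
    print(catalog.view_contents("favorites"))
```

Keeping static and user-defined attributes in separate dictionaries mirrors the distinction the operation list draws between the two kinds of attribute query, while `query` searches both at once.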


MCS Policies

The MCS provides authentication and authorization capabilities on logical file and logical collection attributes

The MCS provides auditing metadata, such as creation information and logs

To support other services, such as replica managers that maintain consistency among data items


Application Experiences

To integrate MCS into the software used by these applications:

The Pegasus/LIGO Application

The Earth System Grid Application


The Pegasus/LIGO Application

Pegasus is used to map complex application workflows onto the available Grid resources

Pegasus uses MCS to discover existing application data products.

Pegasus uses the MCS together with the Replica Location Service; the MCS stores only logical file names

Attributes that describe these data products, including the type of the data and the duration of data measurements, are stored in the MCS.

23 user-defined attributes

[Figure: the six-step usage scenario connecting the client application, the Metadata Catalog Service (MCS web server and MCS database), the Replica Location Service (replica index node and local replica catalog), and the physical storage system.]


The Earth System Grid Application

The MCS is one component in an ESG testbed

ESG scientists use the MCS to discover and query for ESG files based on metadata attributes


Scalability of the MCS

Database size (logical files)  Logical collections  Logical files per collection  User-defined attributes per file
100,000                        100                  1000                          10
1,000,000                      1000                 1000                          10
5,000,000                      5000                 1000                          10

Add and query operations
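The database sizes in the table above are consistent with multiplying the number of collections by the number of files in each. The short calculation below checks this and also estimates the number of user-defined attribute values; it assumes the last two columns are per collection and per file respectively, which is an interpretation of the slide rather than something it states. The function name `catalog_size` is hypothetical.

```python
# Each tuple: (logical collections, logical files per collection,
#              user-defined attributes per file), read off the table.
# Assumption: the last two columns are per-collection and per-file counts.
CONFIGS = [(100, 1000, 10), (1000, 1000, 10), (5000, 1000, 10)]


def catalog_size(collections, files_per_collection, attrs_per_file):
    """Return (total logical files, total user-defined attribute values)."""
    files = collections * files_per_collection
    return files, files * attrs_per_file


if __name__ == "__main__":
    for config in CONFIGS:
        files, attrs = catalog_size(*config)
        print(f"{files:>9,} logical files, {attrs:>10,} user-defined attribute values")
```

Under this reading, the largest configuration holds 5,000,000 logical files and 50,000,000 user-defined attribute values.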


Scalability of the MCS

With web interface

Web service overhead


Summary

The design and implementation of an MCS

Store, access, and query metadata

To make the service more extensible and to provide a more general query model

Use of other database back-end technologies