hai-ning wu academia sinica grid computing hnwu@twgrid


Post on 14-Jan-2016


Page 1: Hai-Ning Wu Academia Sinica Grid Computing hnwu@twgrid

EGEE-II INFSO-RI-031688

Enabling Grids for E-sciencE

www.eu-egee.org

Data Grid Services/SRB/SRM & Practical

Hai-Ning Wu

Academia Sinica Grid Computing

[email protected]

Page 2: Hai-Ning Wu Academia Sinica Grid Computing hnwu@twgrid


Outline

• Introduction
• Characteristics of data grid
• Storage Resource Management (SRM)
• Storage Resource Broker (SRB)
  – SRB Practical
• Summary

Page 3: Hai-Ning Wu Academia Sinica Grid Computing hnwu@twgrid


Data Storage in Large Scales

• Historically data has been STORED rather than MANAGED

• The amount of data grows so rapidly that traditional storage architectures are no longer suitable

• Data are distributed across many types of sources, which makes them hard to integrate and raises the barriers between users and storage systems

Page 4: Hai-Ning Wu Academia Sinica Grid Computing hnwu@twgrid


Challenges of Data Storage

• Large-scale data
• Distributed storage via network
• Heterogeneous data resources
• Managing data efficiently and safely
• Long-term preservation

Page 5: Hai-Ning Wu Academia Sinica Grid Computing hnwu@twgrid


The Solution: Data Grid

• Data virtualization
  – Manipulates data at a high level
  – Hides low-level details

• Provides a uniform interface to access the distributed data storage systems

Page 6: Hai-Ning Wu Academia Sinica Grid Computing hnwu@twgrid


Virtualization

• Data virtualization
• Trust virtualization
• Data grids are used to manage shared collections that are distributed across multiple sites and multiple storage systems

Page 7: Hai-Ning Wu Academia Sinica Grid Computing hnwu@twgrid


Data Grid - The Idea

[Figure: client users send a "Request for Data" to the Data Grid and receive "Data found" in return.]

Page 8: Hai-Ning Wu Academia Sinica Grid Computing hnwu@twgrid


Data Grid - The Idea

[Figure: client users send a "Request for Data" to the Data Grid System and receive "Data found" in return.]

Details are hidden. The data grid system finds out where the data are located.

Page 9: Hai-Ning Wu Academia Sinica Grid Computing hnwu@twgrid


Data Grid Transparencies

• Find data without knowing the identifier
  – Descriptive attributes
• Access data without knowing the location
  – Logical name space
• Access data without knowing the type of storage
  – Storage repository abstraction
• Retrieve data using your preferred API
  – Access abstraction
• Provide transformations for any data collection
  – Data behavior abstraction
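The first three transparencies can be illustrated with a small in-memory model. This is only a sketch under stated assumptions: `ToyDataGrid`, `MemoryStore`, and every method name are hypothetical and do not correspond to any real data grid API. It shows a logical name space that hides the storage location, a storage abstraction that hides the backend type, and a lookup by descriptive attributes.

```python
from dataclasses import dataclass
from itertools import count


class MemoryStore:
    """Hypothetical backend standing in for a file system, archive or database."""

    def __init__(self, label):
        self.label = label
        self._blobs = {}

    def write(self, key, data):
        self._blobs[key] = data

    def read(self, key):
        return self._blobs[key]


@dataclass
class Entry:
    store: MemoryStore   # where the bytes physically live
    key: str             # backend-specific identifier, hidden from users
    attributes: dict     # descriptive attributes used for discovery


class ToyDataGrid:
    """Illustrative only: maps logical names to (store, key) pairs."""

    def __init__(self):
        self._entries = {}
        self._ids = count()

    def put(self, logical_name, data, store, **attributes):
        key = f"obj-{next(self._ids)}"
        store.write(key, data)
        self._entries[logical_name] = Entry(store, key, attributes)

    def get(self, logical_name):
        # The caller never sees which store or key is used.
        entry = self._entries[logical_name]
        return entry.store.read(entry.key)

    def find(self, **attributes):
        # Discovery by descriptive attributes instead of by identifier.
        return sorted(
            name
            for name, entry in self._entries.items()
            if all(entry.attributes.get(k) == v for k, v in attributes.items())
        )


grid = ToyDataGrid()
disk, tape = MemoryStore("disk"), MemoryStore("tape")
grid.put("/demo/a.txt", b"alpha", disk, project="demo", kind="text")
grid.put("/demo/b.txt", b"beta", tape, project="demo", kind="text")
grid.put("/other/c.bin", b"gamma", disk, project="other", kind="binary")
print(grid.find(project="demo"))
print(grid.get("/demo/b.txt"))
```

The client code reads `/demo/b.txt` the same way regardless of whether the bytes sit on the "disk" or "tape" store, which is the point of the storage repository abstraction.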

Page 10: Hai-Ning Wu Academia Sinica Grid Computing hnwu@twgrid


Data Grid Components

• Federated client-server architecture
  – Servers can talk to each other independently of the client
• Infrastructure independent naming
  – Logical names for users, resources, files, applications
• Collective ownership of data
  – Collection-owned data, with infrastructure independent access control lists
• Context management
  – Record state information in a metadata catalog from data grid services such as replication
• Abstractions for dealing with heterogeneity

Page 11: Hai-Ning Wu Academia Sinica Grid Computing hnwu@twgrid


Data Grid Architecture

[Figure: data grid architecture diagram. Components shown:]

• Access interfaces: Application; C, C++, Java Libraries; Unix Shell; Linux I/O; DLL / Python, Perl; Java, NT Browsers; OAI, WSDL, OGSA; HTTP
• Federation Management
• Consistency & Metadata Management / Authorization-Authentication Audit
• Logical Name Space Management; Latency Management; Digital Component Transport; Metadata Transport
• Standard Storage System Operations Interface; Standard Database Interface
• Archives: Tape, HPSS, ADSM, UniTree, DMF, CASTOR, ADS
• File Systems: Unix, NT, Mac OSX
• Databases: DB2, Oracle, Sybase, SQLserver, Postgres, mySQL, Informix
• ORB
• Databases: DB2, Oracle, Sybase, Postgres, mySQL, Informix

Page 12: Hai-Ning Wu Academia Sinica Grid Computing hnwu@twgrid


SRM

The Data Grid Interface for EGEE Grid Middleware

Page 13: Hai-Ning Wu Academia Sinica Grid Computing hnwu@twgrid


gLite Services

Page 14: Hai-Ning Wu Academia Sinica Grid Computing hnwu@twgrid


SE

• Storage Element
  – The Storage Element is the service which allows a user or an application to store data
  – Data Channel Protocols
      File Transfer and File I/O

Page 15: Hai-Ning Wu Academia Sinica Grid Computing hnwu@twgrid


SRM (Storage Resource Management)

• What is SRM?
  – SRM is a protocol to manage storage resources (it is NOT a file access protocol!)
  – Provides a uniform interface for computing applications and client users to heterogeneous storage elements
  – Does not transfer files itself
  – Provides space management
  – Manages the lifetime of files

Page 16: Hai-Ning Wu Academia Sinica Grid Computing hnwu@twgrid


SRM & Grid

Page 17: Hai-Ning Wu Academia Sinica Grid Computing hnwu@twgrid


Grid files

• Grid Files
  – Files in the Grid can be referred to by different names:
      Logical File Name (LFN): An alias created by a user to refer to some item of data. For example, /grid/gilda/gridcamp/testFile.txt
      Grid Unique IDentifier (GUID): A non-human-readable unique identifier for an item of data. For example, 37afd0cc-c53b-4795-a873-6a9dde35a9cc
      Site URL (SURL): The location of an actual piece of data on a storage system. For example, srm://dpm01.grid.sinica.edu.tw/dpm/grid.sinica.edu.tw/home/twgrid/generated/2007-09-18/file4c4a5a6f-878d-4ef3-a73d-941ae6275383
      Transport URL (TURL): A temporary locator of a replica plus an access protocol, understood by an SE. For example, gsiftp://dpm01.grid.sinica.edu.tw/dpm01.grid.sinica.edu.tw:/path1/twgrid/2007-09-18/file4c4a5a6f-878d-4ef3-a73d-941ae6275383.168233.0
  – While the GUIDs and LFNs identify a file irrespective of its location, the SURLs and TURLs contain information about where a physical replica is located and how it can be accessed.
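The four kinds of name can be told apart from their shape alone. The following is a minimal sketch, not part of any grid middleware: `classify_grid_name` is a hypothetical helper, the `/grid/` prefix for LFNs is assumed from the example above, and `gsiftp` is the only transport scheme recognised because it is the only one shown on the slide.

```python
import re
from urllib.parse import urlparse

# 8-4-4-4-12 hexadecimal groups, as in the GUID example on the slide.
GUID_RE = re.compile(
    r"^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$",
    re.IGNORECASE,
)

# Assumption: only the scheme seen on the slide; extend as needed.
TRANSPORT_SCHEMES = {"gsiftp"}


def classify_grid_name(name: str) -> str:
    """Return 'GUID', 'SURL', 'TURL', 'LFN' or 'unknown' for a grid file name."""
    if GUID_RE.match(name):
        return "GUID"
    scheme = urlparse(name).scheme
    if scheme == "srm":
        return "SURL"
    if scheme in TRANSPORT_SCHEMES:
        return "TURL"
    if name.startswith("/grid/"):  # assumed LFN convention, taken from the example
        return "LFN"
    return "unknown"


examples = [
    "/grid/gilda/gridcamp/testFile.txt",
    "37afd0cc-c53b-4795-a873-6a9dde35a9cc",
    "srm://dpm01.grid.sinica.edu.tw/dpm/grid.sinica.edu.tw/home/twgrid/"
    "generated/2007-09-18/file4c4a5a6f-878d-4ef3-a73d-941ae6275383",
    "gsiftp://dpm01.grid.sinica.edu.tw/dpm01.grid.sinica.edu.tw:/path1/twgrid/"
    "2007-09-18/file4c4a5a6f-878d-4ef3-a73d-941ae6275383.168233.0",
]
kinds = [classify_grid_name(e) for e in examples]
print(kinds)
```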

Page 18: Hai-Ning Wu Academia Sinica Grid Computing hnwu@twgrid


LFC

• File Catalogue (LFC)
  – The mappings between LFNs, GUIDs and SURLs are kept in a File Catalogue service, while the files themselves are stored in Storage Elements.
  – The only file catalogue officially supported in WLCG/EGEE is the LCG File Catalogue (LFC).

[Figure: mapping by the “LFC” catalogue server]
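The mapping a file catalogue keeps can be pictured as two dictionaries. This is a toy model for illustration only: `ToyCatalogue` and its methods are hypothetical names and bear no relation to the real LFC client API, and the storage host names are made up.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class ToyCatalogue:
    """Illustrative LFN -> GUID -> SURLs mapping held in memory."""

    lfn_to_guid: dict = field(default_factory=dict)
    guid_to_surls: dict = field(default_factory=dict)

    def register(self, lfn: str, surl: str) -> str:
        """Record that `surl` holds a replica of the file known as `lfn`."""
        guid = self.lfn_to_guid.get(lfn)
        if guid is None:
            # First time this logical name is seen: mint a new GUID.
            guid = str(uuid.uuid4())
            self.lfn_to_guid[lfn] = guid
            self.guid_to_surls[guid] = []
        if surl not in self.guid_to_surls[guid]:
            self.guid_to_surls[guid].append(surl)
        return guid

    def replicas(self, lfn: str) -> list:
        """Return every SURL registered for the logical file name."""
        return list(self.guid_to_surls[self.lfn_to_guid[lfn]])


catalogue = ToyCatalogue()
lfn = "/grid/gilda/gridcamp/testFile.txt"
# Hypothetical storage hosts, used only to show two replicas of one file.
g1 = catalogue.register(lfn, "srm://se-a.example.org/data/testFile.txt")
g2 = catalogue.register(lfn, "srm://se-b.example.org/data/testFile.txt")
print(g1 == g2, catalogue.replicas(lfn))
```

One LFN resolves to one GUID, and that GUID may point at several SURLs, one per physical replica.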

Page 19: Hai-Ning Wu Academia Sinica Grid Computing hnwu@twgrid


Upload a file to a SE

CASE 1: User needs to store data in SE (from a UI)

1. Create a new LFN entry in LFC, return a SURL.
2. srmPrepareToPut (SURL)
3. Transfer the file
4. srmPutDone (SURL)

[Figure: Site A with a CE, worker nodes (WN) and an SE offering SRM, GridFTP and an I/O server, plus the LFC and the User Interface. The numbers 1 to 4 in the diagram mark the steps above.]

Page 20: Hai-Ning Wu Academia Sinica Grid Computing hnwu@twgrid


Upload a file to a SE

CASE 2: Application needs to store data in SE (from a WN)

1. Create a new LFN entry in LFC, return a SURL.
2. srmPrepareToPut (SURL)
3. Transfer the file
4. srmPutDone (SURL)

[Figure: Site A with a CE, worker nodes (WN) and an SE offering SRM, GridFTP and an I/O server, plus the LFC. The numbers 1 to 4 in the diagram mark the steps above.]
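The upload sequence is the same four steps whether it starts from a UI or a WN. The sketch below only records the order of the calls. `RecordingGrid` is a hypothetical stand-in for the LFC, the SRM service and the transfer tool, its method names are invented for illustration, and the assumption that the prepare step hands back a `gsiftp` transfer URL follows the TURL definition given earlier rather than any real client library.

```python
class RecordingGrid:
    """Hypothetical stub: records which operations were called, in order."""

    def __init__(self):
        self.calls = []

    def create_lfn_entry(self, lfn):
        self.calls.append("create LFN entry")
        # Assumed SURL layout on a made-up storage host.
        return "srm://se.example.org" + lfn

    def srm_prepare_to_put(self, surl):
        self.calls.append("srmPrepareToPut")
        # Assumption: the SE answers with a transfer URL for the chosen protocol.
        return surl.replace("srm://", "gsiftp://", 1)

    def transfer(self, local_path, turl):
        self.calls.append("transfer")

    def srm_put_done(self, surl):
        self.calls.append("srmPutDone")


def upload(grid, local_path, lfn):
    """Run the four upload steps in the order given on the slide."""
    surl = grid.create_lfn_entry(lfn)      # step 1: catalogue entry, get SURL
    turl = grid.srm_prepare_to_put(surl)   # step 2: ask SRM to prepare the space
    grid.transfer(local_path, turl)        # step 3: move the bytes (not via SRM)
    grid.srm_put_done(surl)                # step 4: tell SRM the write finished
    return surl


upload_grid = RecordingGrid()
stored_at = upload(upload_grid, "testFile.txt", "/grid/demo/testFile.txt")
print(upload_grid.calls, stored_at)
```

The sketch keeps SRM out of the data path: steps 2 and 4 are management calls, and only step 3 moves file content, which matches the statement that SRM does not transfer files itself.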

Page 21: Hai-Ning Wu Academia Sinica Grid Computing hnwu@twgrid


Download files from a SE

CASE 3: User needs to retrieve (onto the UI) data stored in an SE

1. Query the file catalog to retrieve the SURL from the LFN.
2. srmPrepareToGet (SURL)
3. Transfer the file (read)
4. srmReleaseFile (SURL)

[Figure: Site A with a CE, worker nodes (WN) and an SE offering SRM, GridFTP and an I/O server, plus the LFC and the User Interface. The numbers 1 to 4 in the diagram mark the steps above.]

Page 22: Hai-Ning Wu Academia Sinica Grid Computing hnwu@twgrid


Download files from a SE

CASE 4: Application needs to copy data locally (onto the WN) and use it.

1. Query the file catalog to retrieve the SURL from the LFN.
2. srmPrepareToGet (SURL)
3. Transfer the file (read)
4. srmReleaseFile (SURL)

[Figure: Site A with a CE, worker nodes (WN) and an SE offering SRM, GridFTP and an I/O server, plus the LFC. The numbers 1 to 4 in the diagram mark the steps above.]
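The download sequence mirrors the upload one. As before, this is a sketch that only records call order: `RecordingReader` and its method names are hypothetical, and releasing the file in a `finally` block even when the read fails is a design choice made for this example, not something the slides prescribe.

```python
class RecordingReader:
    """Hypothetical stub: records which operations were called, in order."""

    def __init__(self, fail_transfer=False):
        self.calls = []
        self.fail_transfer = fail_transfer

    def lookup_surl(self, lfn):
        self.calls.append("query catalogue")
        # Assumed SURL layout on a made-up storage host.
        return "srm://se.example.org" + lfn

    def srm_prepare_to_get(self, surl):
        self.calls.append("srmPrepareToGet")
        # Assumption: the SE answers with a transfer URL for the chosen protocol.
        return surl.replace("srm://", "gsiftp://", 1)

    def read(self, turl, local_path):
        self.calls.append("transfer (read)")
        if self.fail_transfer:
            raise OSError("simulated transfer failure")

    def srm_release_file(self, surl):
        self.calls.append("srmReleaseFile")


def download(grid, lfn, local_path):
    """Run the four download steps in the order given on the slide."""
    surl = grid.lookup_surl(lfn)           # step 1: LFN -> SURL via the catalogue
    turl = grid.srm_prepare_to_get(surl)   # step 2: ask SRM to make the file ready
    try:
        grid.read(turl, local_path)        # step 3: copy the bytes locally
    finally:
        grid.srm_release_file(surl)        # step 4: release, even after an error
    return surl


ok = RecordingReader()
download(ok, "/grid/demo/testFile.txt", "testFile.txt")

broken = RecordingReader(fail_transfer=True)
try:
    download(broken, "/grid/demo/testFile.txt", "testFile.txt")
except OSError:
    pass
print(ok.calls, broken.calls[-1])
```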

Page 23: Hai-Ning Wu Academia Sinica Grid Computing hnwu@twgrid


SRB

Storage Resource Broker

Page 24: Hai-Ning Wu Academia Sinica Grid Computing hnwu@twgrid


Storage Resource Broker

• Developed at San Diego Supercomputer Center
• A distributed file management system (Data Grid), based on a client-server architecture
• Provides a uniform interface to heterogeneous data storage resources
• Gives access to data and resources based upon their attributes rather than just their names or physical locations
• Supports many data storage systems
• Provides various types of client interfaces on different platforms

Page 25: Hai-Ning Wu Academia Sinica Grid Computing hnwu@twgrid


SRB Physical Structure

[Figure: SRB physical structure. Components shown:]

• User @ location X
• MCAT-Enabled Server: SRB Server with Oracle Client, connected to an Oracle RDBMS
• SRB Vault @ location A: SRB Server, Storage Driver, Storage Space
• SRB Vault @ location B: SRB Server, Storage Driver, Storage Space
• SRB Vault @ location D: SRB Server, Storage Driver, Storage Space

Page 26: Hai-Ning Wu Academia Sinica Grid Computing hnwu@twgrid


SRB Practical - inQ

• Download inQ 3.5.0 from http://www.sdsc.edu/srb/tarfiles/inQ350.zip
• Unzip inQ350.zip
• Execute inQ.exe

Page 27: Hai-Ning Wu Academia Sinica Grid Computing hnwu@twgrid


inQ – Login

• Name: srbusr+your number
• Host: tap07.grid.sinica.edu.tw
• Domain: ASGC
• Port: 6833
• Authorization: ENCRYPT1
• Password: The same as your user name

Page 28: Hai-Ning Wu Academia Sinica Grid Computing hnwu@twgrid


SRB Client Tool - inQ

Page 29: Hai-Ning Wu Academia Sinica Grid Computing hnwu@twgrid


SRB Demonstrations

• Use inQ to upload, download, and remove files.
• Use Scommands to upload, download, and remove files.
  – Sinit: log in to the SRB system. Syntax: Sinit
  – Sls: list directory content. Syntax: Sls
  – Sput: upload a file to the SRB server. Syntax: Sput filename
  – Sget: download a file from the SRB server. Syntax: Sget filename
  – Srm: remove a file stored on the SRB server. Syntax: Srm filename
  – Sreplicate: replicate data to another resource. Syntax: Sreplicate filename
  – Sexit: log out of the SRB system. Syntax: Sexit
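A typical practice session strings these commands together. The sketch below only builds the command lines as text and does not run anything, so it needs no SRB installation. `build_argv` and `session_script` are hypothetical helpers, and only the syntax listed above is used (no extra flags are assumed).

```python
import shlex

# Command name -> whether it takes a filename, exactly as listed on the slide.
SCOMMANDS = {
    "Sinit": False,
    "Sls": False,
    "Sput": True,
    "Sget": True,
    "Srm": True,
    "Sreplicate": True,
    "Sexit": False,
}


def build_argv(command, filename=None):
    """Return the argument list for one Scommand, validating the filename."""
    if command not in SCOMMANDS:
        raise ValueError(f"unknown Scommand: {command}")
    if SCOMMANDS[command]:
        if filename is None:
            raise ValueError(f"{command} needs a filename")
        return [command, filename]
    if filename is not None:
        raise ValueError(f"{command} takes no filename")
    return [command]


def session_script(filename):
    """Log in, upload, list, download, remove, log out: one line per command."""
    steps = [
        ("Sinit", None),
        ("Sput", filename),
        ("Sls", None),
        ("Sget", filename),
        ("Srm", filename),
        ("Sexit", None),
    ]
    # shlex.join quotes filenames that contain spaces or shell metacharacters.
    return [shlex.join(build_argv(cmd, name)) for cmd, name in steps]


script = session_script("test.txt")
print("\n".join(script))
```

To actually execute the session, each argument list could be passed to `subprocess.run` on a machine where the Scommands are installed and on the `PATH`.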

Page 30: Hai-Ning Wu Academia Sinica Grid Computing hnwu@twgrid


Data Grid Applications

• Digital Archiving
  – Long-term preservation
  – Heterogeneous backup
• Digital Library
  – Data sharing
• Scientific Computing

Page 31: Hai-Ning Wu Academia Sinica Grid Computing hnwu@twgrid


SRB Use Case

• Build Data Grid Management System
  – Data Grid services in Academia Sinica
  – NDAP cross-organization data backup project

Page 32: Hai-Ning Wu Academia Sinica Grid Computing hnwu@twgrid


SRB Data Grid Services in Academia Sinica (1)

• Objective
  – To provide Grid services for long-term preservation and unified data access
• Data Collection Status
  – File size: ~ 60 TB
  – File count: ~ 3.5 Million

Page 33: Hai-Ning Wu Academia Sinica Grid Computing hnwu@twgrid


SRB Data Grid Services in Academia Sinica (2)

[Figure: storage sites connected by the Campus Backbone Network]

• ASCC (20 TB)
• IIS (8 TB)
• IOE (8 TB)
• ITH (8 TB)
• IHP (8 TB)
• IMH (8 TB)
• IZAS (8 TB)
• Tape Library (500 TB)

Page 34: Hai-Ning Wu Academia Sinica Grid Computing hnwu@twgrid


NDAP Partners For Long-term Data Preservation

• Academia Sinica (AS)
• National Palace Museum (NPM)
• National Taiwan University (NTU)
• National Museum of History (NMH)
• Academia Historica (DRNH)
• National Central Library (NCL)
• National Museum of Natural Science (NMNS)
• Taiwan Historica (TH)

Page 35: Hai-Ning Wu Academia Sinica Grid Computing hnwu@twgrid


Data grid for NDAP LTP service

Page 36: Hai-Ning Wu Academia Sinica Grid Computing hnwu@twgrid


Summary

• Data grids provide a new solution for large-scale storage with the following features:
  – Distributed data storage
  – Efficient and safe management of data
  – A uniform interface to heterogeneous systems
  – Flexibility to adopt new storage technologies

Page 37: Hai-Ning Wu Academia Sinica Grid Computing hnwu@twgrid


SRM & SRB

• SRM
  – Used in gLite middleware
  – A uniform interface between different SEs and grid middleware
• SRB
  – Developed by SDSC
  – Supports many backend storage systems
  – Widely used data grid software

Page 38: Hai-Ning Wu Academia Sinica Grid Computing hnwu@twgrid


SRM & SRB

• SRM and SRB cannot interoperate unless they share a standard for communication
• A bridge between SRM and SRB is being constructed in order to:
  – Integrate SRB into the gLite environment
  – Bind resources from the two important data grid systems
• This project is currently being developed by ASGC

Page 39: Hai-Ning Wu Academia Sinica Grid Computing hnwu@twgrid


SRM & SRB

Page 40: Hai-Ning Wu Academia Sinica Grid Computing hnwu@twgrid


iRODS

• The next-generation data grid system after SRB, developed by SDSC
• A rule-oriented data grid system
• More flexibility for data management
• Current version: iRODS 1.0

Page 41: Hai-Ning Wu Academia Sinica Grid Computing hnwu@twgrid


iRODS Workshop

• Time: Tue 8 April 2008
• Location: 2nd Conference Room, 3F
• For more information, please check the ISGC 2008 website

Page 42: Hai-Ning Wu Academia Sinica Grid Computing hnwu@twgrid


References

[1] Use Cases on Data Services, Fu-Ming Tsai
[2] Building Preservation Environments with Data Grid Technology, R. Moore
[3] EGEE Middleware Architecture and Planning (Release 1)

Page 43: Hai-Ning Wu Academia Sinica Grid Computing hnwu@twgrid


Thanks for your attention!