Connecting arbitrary data sources to the grid

Post on 14-Jan-2016

Shunde Zhang
Australian Research Collaboration Service (ARCS)
eResearch SA
School of Computer Science, University of Adelaide

Background

Australian Research Collaboration Service
A successor of APAC
Services
– HPC
– Data
– Collaboration tools: AccessGrid, EVO, Plone, Drupal, Sakai

ARCS Data Fabric

ARCS Data Fabric (cont.)

A national service
Provided to all Australian researchers
Based on iRODS

The Problem

Interoperability with “The Grid”
– “The Grid”: Globus, gLite, Condor, etc.
– Data sources
  • GridFTP-compatible: dCache
  • Non GridFTP-compatible: iRODS, SRB

Possible solutions
– “Manual” copy (or do it in a PBS script)
– Copy queue

The Problem (cont.)

Movement of massive data
– Both ends use the same software (talk the same protocol)
– Different systems are used (talk different protocols)
– Efficiency

Possible solutions
– Transfer via an intermediate point

A solution - old fashioned

AWS Import/Export for Amazon S3
– Ship the hard disks by courier

Our Solution - GridFTP

De facto standard
– Compatible with the Grid, and many grid clients

Efficiency
– Parallel transfer
– Data channel reuse
– Large file transfer - in small blocks

Compatible with many file transfer services
– Monitoring
– Scheduling

An overview of GridFTP protocol

Based on FTP with extensions
Third-party transfer
– Intermediate point not needed
Security - GSI
Extended block mode
– Parallel transfer
– Striped transfer
– Partial transfer
Reliable and restartable
TCP and UDP
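Extended block mode is what makes parallel, striped and partial transfer possible: every data block carries its own length and file offset, so blocks can travel over any data channel and be written in any order. The sketch below encodes and decodes such a block header in Java. The layout used (one descriptor byte, a 64-bit byte count, a 64-bit offset, network byte order) and the descriptor values are my reading of the GridFTP v1 specification, not something stated on the slides, and should be checked against the specification before use. The class name `ModeEHeader` is hypothetical.

```java
import java.nio.ByteBuffer;

/** Sketch of the header that precedes each extended block mode (MODE E) data block. */
final class ModeEHeader {
    // Descriptor flags: assumed values, to be verified against the GridFTP v1 specification.
    static final int EOF = 64; // end of file reached
    static final int EOD = 8;  // no more data will follow on this data channel

    // Assumed layout: 1 descriptor byte + 8-byte count + 8-byte offset.
    static final int HEADER_LENGTH = 17;

    final int descriptor;
    final long byteCount; // number of payload bytes that follow the header
    final long offset;    // position of the payload within the file

    ModeEHeader(int descriptor, long byteCount, long offset) {
        this.descriptor = descriptor;
        this.byteCount = byteCount;
        this.offset = offset;
    }

    /** Serialises the header; ByteBuffer is big-endian by default, i.e. network byte order. */
    byte[] encode() {
        ByteBuffer buf = ByteBuffer.allocate(HEADER_LENGTH);
        buf.put((byte) descriptor);
        buf.putLong(byteCount);
        buf.putLong(offset);
        return buf.array();
    }

    /** Parses a header from the first HEADER_LENGTH bytes of raw. */
    static ModeEHeader decode(byte[] raw) {
        ByteBuffer buf = ByteBuffer.wrap(raw, 0, HEADER_LENGTH);
        int descriptor = buf.get() & 0xFF; // read the byte as an unsigned value
        long byteCount = buf.getLong();
        long offset = buf.getLong();
        return new ModeEHeader(descriptor, byteCount, offset);
    }

    boolean isEndOfData() {
        return (descriptor & EOD) != 0;
    }

    public static void main(String[] args) {
        // Fourth 64 KiB block of a file, sent on whichever data channel is free.
        ModeEHeader sent = new ModeEHeader(0, 65536L, 3L * 65536L);
        ModeEHeader received = decode(sent.encode());
        System.out.println(received.byteCount + " bytes at offset " + received.offset);
    }
}
```

Because the offset travels with each block, the receiver needs random access to the target, which is the role of the RandomAccessFileObject interface later in the deck.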

The Architecture

GridFTP interface

Generic File System Framework

Data Source Plugin

Data Source

Generic File System Framework

[Diagram: FileSystem creates FileSystemConnection, which creates FileObject, which creates RandomAccessFileObject]

FileSystem interface

public String getSeparator();
public void init() throws IOException;
public FileSystemConnection createFileSystemConnection(GSSCredential credential) throws FtpConfigException, IOException;
public void exit();

FileSystemConnection interface

public FileObject getFileObject(String path);

public String getHomeDir();

public String getUser();

public void close() throws IOException;

public boolean isConnected();

public long getFreeSpace(String path);

FileObject interface

public String getName();
public String getPath();
public boolean exists();
public boolean isFile();
public boolean isDirectory();
public int getPermission();
public String getCanonicalPath() throws IOException;
public FileObject[] listFiles();
public long length();
public long lastModified();
public RandomAccessFileObject getRandomAccessFileObject(String type) throws IOException;
public boolean delete();
public FileObject getParent();
public boolean mkdir();
public boolean renameTo(FileObject file);
public boolean setLastModified(long t);
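To illustrate what a data source plugin has to supply, here is a minimal sketch of a local file system adaptor that maps part of the FileObject interface onto `java.io.File`. It is not Griffin's actual code: `FileObjectSketch` is a trimmed stand-in that reproduces only some of the methods listed above, and `LocalFileObject`, its constructor and the exported root directory are assumptions made for the example.

```java
import java.io.File;

/** Hypothetical adaptor class: resolves a virtual path against an exported root directory. */
final class LocalFileObject implements FileObjectSketch {
    private final File root;   // directory the adaptor exports (assumption)
    private final String path; // virtual path seen by the client, "/"-separated

    LocalFileObject(File root, String path) {
        this.root = root;
        String p = (path == null || path.isEmpty()) ? "/" : path;
        this.path = p.startsWith("/") ? p : "/" + p;
    }

    // A real adaptor would also have to reject ".." segments that escape the root.
    private File target() {
        return "/".equals(path) ? root : new File(root, path.substring(1));
    }

    public String getName() { return "/".equals(path) ? "/" : target().getName(); }
    public String getPath() { return path; }
    public boolean exists() { return target().exists(); }
    public boolean isFile() { return target().isFile(); }
    public boolean isDirectory() { return target().isDirectory(); }
    public long length() { return target().length(); }
    public boolean mkdir() { return target().mkdir(); }
    public boolean delete() { return target().delete(); }

    public FileObjectSketch[] listFiles() {
        String[] names = target().list(); // null when the target is not a directory
        if (names == null) return new FileObjectSketch[0];
        String prefix = path.endsWith("/") ? path : path + "/";
        FileObjectSketch[] children = new FileObjectSketch[names.length];
        for (int i = 0; i < names.length; i++) {
            children[i] = new LocalFileObject(root, prefix + names[i]);
        }
        return children;
    }

    public FileObjectSketch getParent() {
        if ("/".equals(path)) return null; // the exported root has no parent
        int cut = path.lastIndexOf('/');
        return new LocalFileObject(root, cut <= 0 ? "/" : path.substring(0, cut));
    }

    public static void main(String[] args) {
        File root = new File(System.getProperty("java.io.tmpdir"));
        System.out.println(new LocalFileObject(root, "/").listFiles().length + " entries");
    }
}

/** Trimmed stand-in for the FileObject interface; only part of it is reproduced here. */
interface FileObjectSketch {
    String getName();
    String getPath();
    boolean exists();
    boolean isFile();
    boolean isDirectory();
    long length();
    FileObjectSketch[] listFiles();
    FileObjectSketch getParent();
    boolean mkdir();
    boolean delete();
}
```

An adaptor for another data source would keep the same method set and replace the `java.io.File` calls with calls to that source's client library, which is the point of putting the framework between the GridFTP interface and the plugin.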

RandomAccessFileObject interface

public void seek(long offset) throws IOException;
public int read() throws IOException;
public int read(byte[] b) throws IOException;
public int read(byte[] b, int off, int len) throws IOException;
public void close() throws IOException;
public String readLine() throws IOException;
public void write(int b) throws IOException;
public void write(byte[] b) throws IOException;
public void write(byte[] b, int off, int len) throws IOException;
public long length() throws IOException;
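The `seek` method is presumably there because blocks sent over parallel data channels can arrive out of order, so the server must be able to write each block at its own offset. The following sketch shows that pattern against a local file. `RandomAccessSketch` is a trimmed stand-in for the interface above, `LocalRandomAccess` and `BlockWriter` are hypothetical names, and treating the `type` argument as a `java.io.RandomAccessFile` mode string such as "rw" is an assumption.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

/** Writes blocks at their own offsets, in whatever order they arrive. */
final class BlockWriter {
    /** Seek and write are done under one lock so several data channels can share a target. */
    static void writeBlock(RandomAccessSketch target, long offset, byte[] data) throws IOException {
        synchronized (target) {
            target.seek(offset);
            target.write(data, 0, data.length);
        }
    }

    /** Writes two blocks in reverse order, then reads the whole file back. */
    static String demo() {
        try {
            File tmp = File.createTempFile("block-sketch", ".bin");
            tmp.deleteOnExit();
            RandomAccessSketch target = new LocalRandomAccess(tmp, "rw");
            try {
                writeBlock(target, 3, "def".getBytes(StandardCharsets.US_ASCII)); // second block first
                writeBlock(target, 0, "abc".getBytes(StandardCharsets.US_ASCII));
                byte[] all = new byte[(int) target.length()];
                target.seek(0);
                int filled = 0;
                while (filled < all.length) {
                    int n = target.read(all, filled, all.length - filled);
                    if (n < 0) break; // unexpected end of file
                    filled += n;
                }
                return new String(all, 0, filled, StandardCharsets.US_ASCII);
            } finally {
                target.close();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}

/** Trimmed stand-in for the RandomAccessFileObject interface. */
interface RandomAccessSketch {
    void seek(long offset) throws IOException;
    int read(byte[] b, int off, int len) throws IOException;
    void write(byte[] b, int off, int len) throws IOException;
    long length() throws IOException;
    void close() throws IOException;
}

/** Hypothetical local implementation that delegates to java.io.RandomAccessFile. */
final class LocalRandomAccess implements RandomAccessSketch {
    private final RandomAccessFile file;

    LocalRandomAccess(File target, String type) throws IOException {
        this.file = new RandomAccessFile(target, type); // type assumed to be "r" or "rw"
    }

    public void seek(long offset) throws IOException { file.seek(offset); }
    public int read(byte[] b, int off, int len) throws IOException { return file.read(b, off, len); }
    public void write(byte[] b, int off, int len) throws IOException { file.write(b, off, len); }
    public long length() throws IOException { return file.length(); }
    public void close() throws IOException { file.close(); }
}
```

A data source that cannot seek would need its plugin to buffer or reorder blocks itself; the slides do not say how Griffin's plugins handle that case.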

The Implementation - Griffin

[Diagram]
Clients: GridFTP client, Grid job submission system, Data transfer service
Griffin: GridFTP interface, Generic file system framework
Adaptors: Adaptor for iRODS, Adaptor for local file system, Other adaptors
Data sources: iRODS, Local File System, Other data source

Features

GridFTP protocol version 1
Java-based
– Spring framework
– OS-independent
Lightweight, stand-alone, self-contained
– No need to install Globus Toolkit
Two plugins included
– iRODS plugin
– Local file system plugin
Open source (Apache 2 & GPL)

Parallel transfer with Griffin

[Diagram: Client to Griffin over WAN; Griffin to Data Source over LAN/localhost]

Authentication

GSI
– iRODS plugin

User mapping
– Local file system plugin
– XML file
  • Maps GSI authentication (certificate DN) to internal user management system
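The user mapping for the local file system plugin can be pictured as a lookup from certificate DN to local account. The sketch below loads such a table from XML in Java. The slides do not show Griffin's actual file format, so the `<users>`/`<user>` elements, the `dn` and `name` attributes, the example DN and the `DnUserMap` class are all hypothetical.

```java
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Hypothetical DN-to-user table; the XML layout is invented for this example. */
final class DnUserMap {
    private final Map<String, String> userByDn = new HashMap<>();

    /** Parses entries of the assumed form <user dn="..." name="..."/>. */
    static DnUserMap fromXml(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            DnUserMap map = new DnUserMap();
            NodeList users = doc.getElementsByTagName("user");
            for (int i = 0; i < users.getLength(); i++) {
                Element user = (Element) users.item(i);
                map.userByDn.put(user.getAttribute("dn"), user.getAttribute("name"));
            }
            return map;
        } catch (Exception e) {
            throw new IllegalArgumentException("cannot parse user mapping", e);
        }
    }

    /** Returns the local user for a certificate DN, or null when the DN is not mapped. */
    String localUserFor(String dn) {
        return userByDn.get(dn);
    }

    public static void main(String[] args) {
        String xml = "<users>"
                + "<user dn=\"/C=AU/O=Example/CN=Jane Doe\" name=\"jane\"/>"
                + "</users>";
        DnUserMap map = fromXml(xml);
        System.out.println(map.localUserFor("/C=AU/O=Example/CN=Jane Doe"));
    }
}
```

In this sketch an unmapped DN yields `null`, which a server would treat as a refused login. The iRODS plugin needs no such table, since according to the slide it relies on GSI directly.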

Use case

Integration of the Grid and Data Fabric
– iRODS plugin for Data Fabric
– Third-party transfer to cluster (Globus GridFTP)

Tested with
– Globus.org
– globus-url-copy (5.0 and 4.x)
– Globus GridFTP GUI

Performance Evaluation

Server: two quad-core Xeon 3.16GHz CPUs, 16GB memory
Client: IBM xSeries 346 with two hyper-threaded Intel Xeon 3.20GHz CPUs, 4GB memory
Network: 1Gbps LAN
WAN: two 10Gbps links
Transfer: 256MB, 512MB, 1GB, 2GB, 4GB, 8GB, 16GB
– iCommands
– globus-url-copy

Evaluation Set up - Griffin vs iCommands

[Diagram: Client running globus-url-copy and iCommands; Griffin with Jargon Adaptor; iRODS; Local File System]

Evaluation Result Chart - Griffin vs iCommands

Evaluation Set up - Griffin vs Globus GridFTP

[Diagram: Client running globus-url-copy; Griffin with Local FS Adaptor; Globus GridFTP server; Local File System]

Evaluation Result Chart - Griffin vs Globus GridFTP

Related work

Client library
– SAGA/jSAGA
– Commons-vfs

Data transfer service
– Stork
– PAFTP

Globus
– XIO
– DSI

Griffin vs. Globus GridFTP

Griffin               Globus GridFTP
Java                  C
OS-independent        *nix
Simple, standalone    Complex

Conclusion

A generic solution to connect arbitrary data sources to the grid
– Data in/out of the grid
– Data transfer between different data sources

Java-based implementation
– Standalone, lightweight
– Pluggable
– Does not depend on Globus

Future work

Currently working on a plugin for MongoDB
Java NIO
UDP
Striped transfer

MongoDB plugin

MongoDB
– NoSQL database
– Stores JSON-style documents
– GridFS component
  • Stores files

Plugin for Griffin
– Read/write files via GridFS

Acknowledgements

ARCS funded

Current Status

ARCS production service
Used to transfer data in/out of ARCS Data Fabric
Website
– https://projects.arcs.org.au/trac/griffin

Thank you!

Questions/Comments?
