The JuxMem-Gfarm Collaboration: Enhancing the JuxMem Grid Data Sharing Service with Persistent Storage Using the Gfarm Global File System

Gabriel Antoniu, Loïc Cudennec, Majd Ghareeb (INRIA/IRISA, Rennes, France)
Osamu Tatebe (University of Tsukuba, Japan)


Page 1:

The JuxMem-Gfarm Collaboration

Enhancing the JuxMem Grid Data Sharing Service with Persistent Storage

Using the Gfarm Global File System

Gabriel Antoniu, Loïc Cudennec, Majd Ghareeb

INRIA/IRISA, Rennes, France

Osamu Tatebe

University of Tsukuba, Japan

Page 2:

Context: Grid Computing

Target architecture: cluster federations (e.g. Grid’5000)

Focus: large-scale data sharing

[Figure: example applications - solid mechanics, thermodynamics, optics, dynamics, satellite design]

Page 3:

Current Approaches for Data Management on Grids

Use of data catalogs: Globus GridFTP, Replica Location Service, etc.

Logistical networking of data: IBP (buffers available across the Internet)

Unified access to data: SRB (from file systems to tapes and databases)

Limitations
No transparency => increased complexity at large scale
No consistency guarantees for replicated data

Page 4:

Towards Transparent Access to Data

Desirable features
Uniform access to distributed data via global identifiers
Transparent data localization and transfer
Consistency models and protocols for replicated data

Examples of systems taking this approach
On clusters
Memory level: DSM systems (Ivy, TreadMarks, etc.)
File level: NFS-like systems
On grids
Memory level: data sharing services (JuxMem - INRIA Rennes, France)
File level: global file systems (Gfarm - AIST/University of Tsukuba, Japan)

Page 5:

Idea: Collaborative Research on Memory- and File-Level Data Sharing

Study possible interactions between
the JuxMem grid data sharing service
the Gfarm global file system

Goal
Enhance global data sharing functionality
Improve performance and reliability
Build a memory hierarchy for global data sharing by combining the memory level and the file system level

Approach
Enhance JuxMem with persistent storage using Gfarm

Support
The DISCUSS Sakura collaboration (2006-2007)


Page 6:

JuxMem: a Grid Data-Sharing Service

Generic grid data-sharing service
Grid scale: 10^3-10^4 nodes
Transparent data localization
Data consistency
Fault tolerance

JuxMem ~= DSM + P2P

Implementation
Multiple replication strategies
Configurable consistency protocols
Based on JXTA 2.0 (http://www.jxta.org/)

Integrated into 2 grid programming models
GridRPC (DIET, ENS Lyon)
Component models (CCM & CCA)

[Figure: JuxMem overlay - the JuxMem group contains cluster groups A, B and C and a data group D]

http://juxmem.gforge.inria.fr

Page 7:

JuxMem’s Data Group: a Fault-Tolerant, Self-Organizing Group

GDG: Global Data Group; LDG: Local Data Group

[Figure: a data group D organized as a GDG spanning several LDGs, accessed by a client]

Data availability despite failures is ensured through replication and fault-tolerant building blocks

Hierarchical self-organizing groups
Cluster level: Local Data Group (LDG)
Grid level: Global Data Group (GDG)

[Figure: a self-organizing group built on an adaptation layer over fault-tolerant building blocks - group membership, atomic multicast, consensus, failure detectors]

Page 8:

JuxMem: Memory Model and API

Memory model (currently): entry consistency
Explicit association of data to locks
Multiple Readers, Single Writer (MRSW)
Explicit lock acquire/release before/after access: juxmem_acquire, juxmem_acquire_read, juxmem_release

API
Allocate memory for JuxMem data: ptr = juxmem_malloc(size, #clusters, #replicas per cluster, &ID, …)
Map existing JuxMem data to local memory: ptr = juxmem_mmap(ID), juxmem_unmap(ptr)
Synchronization before/after data access: juxmem_acquire(ptr), juxmem_acquire_read(ptr), juxmem_release(ptr)
Read and write data: direct access through pointers! int n = *ptr; *ptr = …
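To make this API concrete, here is a small illustrative C fragment based only on the call names shown on this slide. The prototypes it declares are assumptions made so the sketch is self-contained (the real juxmem.h declarations may differ), and allocation via juxmem_malloc is omitted because its full parameter list is elided above.

/* Illustrative sketch only: the prototypes below are reconstructed from the
 * call names on this slide and are NOT the authoritative juxmem.h API. */
#include <stdio.h>

void *juxmem_mmap(const char *id);     /* map existing JuxMem data by its ID  */
void  juxmem_unmap(void *ptr);         /* detach it from local memory         */
void  juxmem_acquire(void *ptr);       /* lock for writing (single writer)    */
void  juxmem_acquire_read(void *ptr);  /* lock for reading (multiple readers) */
void  juxmem_release(void *ptr);       /* release the lock                    */

/* Increment a shared integer previously allocated with juxmem_malloc. */
void bump_counter(const char *data_id)
{
    int *ptr = juxmem_mmap(data_id);   /* attach by global identifier */

    juxmem_acquire(ptr);               /* enter the critical section  */
    *ptr = *ptr + 1;                   /* plain pointer access        */
    juxmem_release(ptr);               /* entry consistency: updates become
                                          visible to others on release */

    juxmem_acquire_read(ptr);          /* concurrent readers are allowed */
    printf("counter = %d\n", *ptr);
    juxmem_release(ptr);

    juxmem_unmap(ptr);
}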

Page 9:

Gfarm: a Global File System [CCGrid 2002]

Commodity-based distributed file system that federates the storage of each site
It can be mounted from all cluster nodes and clients
It provides scalable I/O performance with respect to the number of parallel processes and users
It supports fault tolerance and avoids access concentration by automatic replica selection

[Figure: Gfarm File System - the global namespace under /gfarm (directories such as ggf and jp/aist/gtrc, files file1-file4) is mapped onto physical file replicas, with file replica creation across storage nodes]

Page 10:


Gfarm: a Global File System (2)

Files can be shared among all nodes and clients
Physically, a file may be replicated and stored on any file system node
Applications can access it regardless of its location
File system nodes can be distributed

[Figure: a Gfarm file system spanning the US and Japan - a Gfarm metadata server, compute & file system nodes holding replicas of files A, B and C, and clients (client PC, note PC, GridFTP/samba/NFS servers) accessing /gfarm through the metadata]
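Since the Gfarm namespace can be mounted on every node and client, an application reaches a file through ordinary file I/O no matter which node stores the replica that gets selected. The sketch below assumes the namespace is mounted at /gfarm on the client; the mount point and the file path are illustrative only, not prescribed by Gfarm.

/* Location-transparent access sketch: assumes the Gfarm namespace is
 * mounted at /gfarm on this client (illustrative path, not mandated). */
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    /* Gfarm resolves this logical name to one of the physical replicas;
     * the application does not know or care which node serves it. */
    FILE *f = fopen("/gfarm/jp/aist/gtrc/file1", "rb");
    if (f == NULL) {
        perror("fopen");
        return EXIT_FAILURE;
    }

    char buf[4096];
    size_t n = fread(buf, 1, sizeof buf, f);
    printf("read %zu bytes through the global namespace\n", n);

    fclose(f);
    return EXIT_SUCCESS;
}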

Page 11:

Our Goal: Build a Memory Hierarchy for Global Data Sharing

Approach
Applications use JuxMem’s API (memory-level sharing)
Applications DO NOT use Gfarm directly
JuxMem uses Gfarm to enhance data persistence

Without Gfarm, JuxMem tolerates some crashes of memory providers thanks to the self-organizing groups

With Gfarm, persistence is further enhanced thanks to secondary storage

How does it work?
Basic principle: on each lock release, data can be flushed to Gfarm
The flush frequency can be tuned to trade efficiency against fault tolerance (see the sketch below)
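As a rough illustration of the flush-on-release idea, the following C sketch counts lock releases on a piece of data and writes a snapshot to a Gfarm-backed file every N-th release. All names here (the struct, flush_period, the provider-side hook) are hypothetical; this is not the actual JuxMem/Gfarm integration code.

/* Hypothetical provider-side hook: flush to Gfarm on lock release,
 * but only every flush_period-th release (tunable trade-off). */
#include <stdio.h>

struct shared_data {
    const char *gfarm_path;   /* e.g. a file under a Gfarm mount point */
    const void *bytes;        /* current contents held in memory       */
    size_t      size;
    unsigned    releases;     /* lock releases since the last flush    */
    unsigned    flush_period; /* flush every N-th release              */
};

/* Called by the provider after a writer releases the lock. */
static void on_lock_release(struct shared_data *d)
{
    if (++d->releases < d->flush_period)
        return;               /* skip: favor efficiency over fault tolerance */
    d->releases = 0;

    FILE *f = fopen(d->gfarm_path, "wb");  /* persistent copy in Gfarm */
    if (f == NULL) {
        perror("fopen");
        return;
    }
    fwrite(d->bytes, 1, d->size, f);
    fclose(f);
}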

Page 12:

Step 1: A Single Flush by One Provider

[Figure: two clusters; the JuxMem Global Data Group (GDG) spans JuxMem providers in both clusters; the GDG leader writes to Gfarm file system daemons (GFSDs)]

One particular JuxMem provider (GDG leader) flushes data to Gfarm

Then, other Gfarm copies can be created using Gfarm’s gfrep command
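A hedged sketch of this leader-only flush is shown below: only the provider that currently acts as GDG leader writes the persistent copy, and any further Gfarm replicas are then left to Gfarm itself (e.g. the gfrep command mentioned above). The is_gdg_leader predicate and write_snapshot_to_gfarm helper are hypothetical names, not JuxMem functions.

/* Step 1 sketch: only the GDG leader flushes to Gfarm. */
#include <stdbool.h>

bool is_gdg_leader(const char *data_id);            /* hypothetical */
int  write_snapshot_to_gfarm(const char *data_id);  /* hypothetical */

/* Invoked after the critical section of a write on 'data_id'. */
void step1_flush(const char *data_id)
{
    if (!is_gdg_leader(data_id))
        return;      /* non-leader providers keep their copy in memory only */

    /* A single persistent copy is created by the leader; extra Gfarm
     * replicas can then be made with Gfarm's gfrep command. */
    (void)write_snapshot_to_gfarm(data_id);
}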

Page 13:

Step 2: Parallel Flush by LDG Leaders

[Figure: two clusters, each with its own JuxMem Local Data Group (LDG #1, LDG #2); each LDG leader writes to a GFSD in its own cluster]

One particular JuxMem provider in each cluster (LDG leader) flushes data to Gfarm (parallel copy creation, one copy per cluster)

The copies are registered as the same Gfarm file
Then, other Gfarm copies can be created using Gfarm’s gfrep command

Page 14:

Step 3: Parallel Flush by All Providers

[Figure: two clusters; every JuxMem provider in the Global Data Group (GDG) writes to a co-located GFSD]

All JuxMem providers in each cluster flush data to Gfarm
All copies are registered as the same Gfarm file
Useful to create multiple copies of the Gfarm file per cluster
No further replication using gfrep is needed

Page 15:

Deployment issues

Application deployment on large-scale infrastructures
Reserve resources
Configure the nodes
Manage dependencies between processes
Start processes
Monitor and clean up the nodes

Mixed deployment of Gfarm and JuxMem
Manage dependencies between processes of both applications
Make the JuxMem provider able to act as a Gfarm client

Approach: use a generic deployment tool, ADAGE (INRIA Rennes, France)

Design specific plugins for Gfarm and JuxMem

Page 16:

ADAGE: Automatic Deployment of Applications in a Grid Environment

Developed by the PARIS research group (IRISA/INRIA, Rennes)
Deploy the same application on any kind of resources, from clusters to grids
Support for multi-middleware applications: MPI+CORBA+JXTA+GFARM...
Network topology description: latency and bandwidth hierarchy; NAT, non-IP networks; firewalls, asymmetric links
Planner as a plugin: round robin & random
Preliminary support for dynamic applications
Some successes: 29,000 JXTA peers on ~400 nodes; 4003 components on 974 processors on 7 sites

[Figure: ADAGE workflow - the Gfarm and JuxMem application descriptions are translated into a generic application description, which, together with a resource description and control parameters, feeds deployment planning, deployment plan execution, and application configuration]

Page 17:

Roadmap overview

Design of the common architecture (done)
Discussions on possible interactions between JuxMem and Gfarm
May 2006, Singapore (CCGRID 2006)
June 2006, Paris (HPDC 2006 and the NEGST workshop)
October 2006: Gabriel Antoniu and Loïc Cudennec visited the Gfarm team
First deployment tests of Gfarm on G5K
Overall Gfarm/JuxMem design
December 2006: Osamu Tatebe visited the JuxMem team
Refinement of the Gfarm/JuxMem design

Implementation of JuxMem on top of Gfarm (partially done)
April 2007: Gabriel Antoniu and Loïc Cudennec visited the Gfarm team
One JuxMem provider (GDG leader) flushes data to Gfarm after each critical section (step 1 done)
Work started on large-scale deployment of Gfarm using ADAGE

Future work: parallel flush of JuxMem data to Gfarm (steps 2 and 3)
Work in progress (Master’s thesis of Majd Ghareeb at INRIA Rennes)

Page 18:

Roadmap: deployment

Design the Gfarm plugin for ADAGE (April 2007 - done!)
Propose a specific application description language for Gfarm
Translate the specific description into a generic description
Start processes with respect to their dependencies
Transfer the Gfarm configuration files from:
the metadata server to the agents
the agents to their GFSDs and clients

Deployment of JuxMem on top of Gfarm (May 2007 - first prototype running on G5K!)
A simple configuration (one reader, one writer, one provider = one Gfarm client)
ADAGE deploys Gfarm, then JuxMem (separate deployments)
Limitation: the user still needs to indicate
the Gfarm client hostname
the Gfarm configuration file location

Future work: design a meta-plugin for ADAGE that automatically deploys a mixed description of a Gfarm+JuxMem configuration