Page 1: EGEE-II INFSO-RI-031688

EGEE-II INFSO-RI-031688

Enabling Grids for E-sciencE

www.eu-egee.org

EGEE and gLite are registered trademarks

A GRID based platform to host multiple repositories for digital content

Antonio Calanducci (1), J.M. González (3), R. Ramos (2), M. Rubio (2), D. Tcaci (3)

(1) INFN Catania, (2) CETA-CIEMAT, (3) MAAT-G Knowledge

3rd EGEE User Forum

11-14 February 2008, Clermont-Ferrand (France)

Page 2: EGEE-II INFSO-RI-031688

3rd EGEE User Forum, 13 Feb 08, Clermont-Ferrand


Introduction

• Need to offer a GRID based platform to host arbitrary repositories
• A digital repository is a set of annotated digitalized data offered to users in a structured manner
• Both digitalized data and annotations can vary greatly from one repository to another, but the following commonalities are acknowledged:
– There is a basic informational unit of digitalized data (a mammogram, a page of an ancient manuscript, a 3D model, …)
– There is metadata around each unit of digitalized data (patient info, diagnoses, translation, historical context, physical properties, …)
– Specific algorithms process the data (search for microcalcifications, automatic translation, …)
– Users browse, search and update the repository, and launch algorithms (GRID WMS)
– Data is stored in a federated way: each institution owns and manages its content
– Metadata goes to a DB, digitalized data to an archive (GRID SE)

Page 3: EGEE-II INFSO-RI-031688

Goals of gLibrary/DRI

• To host multiple repositories of arbitrary structure
• On a GRID infrastructure (security, federation, …)
• Reduce the “cost-to-deploy”, reach new communities
• Open architecture
• Easy to use platform, web based interface
• Collaboration between INFN and CETA-CIEMAT
• Builds on INFN gLibrary

Page 4: EGEE-II INFSO-RI-031688

INFN gLibrary

• Created by the GILDA team at INFN Catania
• Secure, robust, easy to use interface to handle digital assets stored in GRID SEs
• Interface to browse entries and find files in SEs
– “à la iTunes” browsing allows searches by mouse clicks
• Built on top of gLite GRID services: any SRM SE, LFC, AMGA, VOMS authorization
• Authentication/Authorization
– Via an applet, creating a proxy certificate on the user’s PC
– Proxy used to interact directly with GRID elements (LFC, SE, AMGA)
– Files transferred directly from SE to applet and vice versa

Page 5: EGEE-II INFSO-RI-031688

gLibrary screenshots

Page 6: EGEE-II INFSO-RI-031688

gLibrary/DRI

• Extends gLibrary by:
– Making it multirepository
– No predefined repository content structure: each repository describes itself
– Decoupling navigation and management from repository specifics
• DRI: Digital Repositories Infrastructure
• A repository must provide:
– A description of its navigational structures (trees, filters) and a viewer
– A description of its data model
– A storage engine (for data model persistence)
• The DRI API specification describes HOW this is provided
• A repository provider can:
– Make its own implementation of the specification
– Use (or extend) the default one provided

Page 7: EGEE-II INFSO-RI-031688

gLibrary/DRI web interface

Page 8: EGEE-II INFSO-RI-031688

DICOM viewer

Page 9: EGEE-II INFSO-RI-031688

gLibrary/DRI API specification

A repository has to provide:
• Data Model:
– XML format description of the repository’s data
– Relational data model supported
– Indication of which part of the data model is saved on the federated DB and which on the Storage System
• Storage Module:
– It takes care of data persistence
– Load() and Save() methods have to be provided for loading and saving instances of the data model
• User Interface Module:
– Definition of the navigational trees and filters
– Viewer for the specific repository

Page 10: EGEE-II INFSO-RI-031688

gLibrary/DRI API specification

• Contract between the gLibrary/DRI platform and specific repository implementations
• Each application must provide three Java modules implementing the following interfaces:
– DRIUIInterface for describing trees, filters and viewers
– DRIStorageInterface for storing and retrieving data
– DRINodeInterface for defining the repository data model
• The gLibrary/DRI engine orchestrates API calls to the different interface implementations

Page 11: EGEE-II INFSO-RI-031688

gLibrary/DRI UI API extract

import java.util.Arrays;
import java.util.Vector;

public interface DRIUIInterface {
    public Vector<Tree> getRepositoryTrees(String repositoryName);
    public TreeHierarchy getTreeHierarchy(String treeName);
    public Vector getFilterNameInstances();
    public Vector<FilterEntry> getFilterEntries(String filterName);
    public void loadViewer(String viewerClass);
}

public class MyRepositoryUI implements DRIUIInterface {
    public Vector<Tree> getRepositoryTrees(String repositoryName) {
        // access repository config file/db/etc. to get tree data
        …
        // Vector has no varargs constructor, so the trees are wrapped in a list first
        return new Vector<>(Arrays.asList(new Tree("By author"), new Tree("By date")));
    }
    …
}

Page 12: EGEE-II INFSO-RI-031688

gLibrary/DRI Engine Orchestration

Registered repositories

MGUI.getRepositoryTrees(): what are your navigation trees?

MGUI.getFilterNameInstances(): what are your filters?

MGUI.getFilterEntries(): what are the possible values for the selected filter?

MGUI.loadViewer(): return an applet with the viewer application to display and manipulate the selected repository item
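The question-and-answer flow between the engine and a registered repository can be sketched as follows. This is a minimal sketch under stated assumptions, not the actual engine: `DriEngine`, its `register` and `describe` methods, and `MammographyUI` are hypothetical names, the interface is reduced to three of the methods from the UI API extract, and plain strings stand in for the `Tree` and `FilterEntry` types. The `Gender` filter and its values are illustrative only.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Demo entry point: registers one repository and asks it the questions from the slide.
class OrchestrationDemo {
    public static void main(String[] args) {
        DriEngine engine = new DriEngine();
        engine.register("mgplus", new MammographyUI());
        System.out.println(engine.describe("mgplus"));
    }
}

// Reduced stand-in for DRIUIInterface (strings replace Tree and FilterEntry).
interface DRIUIInterface {
    List<String> getRepositoryTrees(String repositoryName);
    List<String> getFilterNameInstances();
    List<String> getFilterEntries(String filterName);
}

// Hypothetical repository-side module with hard-coded answers.
class MammographyUI implements DRIUIInterface {
    public List<String> getRepositoryTrees(String repositoryName) {
        return List.of("alphabetical", "pathologies");
    }

    public List<String> getFilterNameInstances() {
        return List.of("Gender");
    }

    public List<String> getFilterEntries(String filterName) {
        return filterName.equals("Gender") ? List.of("F", "M") : List.of();
    }
}

// Hypothetical engine: keeps a registry and queries whichever UI module is registered.
class DriEngine {
    private final Map<String, DRIUIInterface> registry = new LinkedHashMap<>();

    void register(String name, DRIUIInterface ui) {
        registry.put(name, ui);
    }

    // Asks the module for its trees, its filters, and the values of each filter.
    String describe(String name) {
        DRIUIInterface ui = registry.get(name);
        if (ui == null) {
            throw new IllegalArgumentException("Unknown repository: " + name);
        }
        StringBuilder out = new StringBuilder("trees=" + ui.getRepositoryTrees(name));
        for (String filter : ui.getFilterNameInstances()) {
            out.append(" ").append(filter).append("=").append(ui.getFilterEntries(filter));
        }
        return out.toString();
    }
}
```

The engine never depends on a concrete repository class, only on the interface, which is what lets one platform host repositories of arbitrary structure.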

Page 13: EGEE-II INFSO-RI-031688

gLibrary/DRI Storage API

public interface DRIStorageInterface {
    public DRIGenericNode Load(String Id);
    public void Remove(String Id);
    public void CreateNew(DRIGenericNode Node);
    public void Save(DRIGenericNode Node);
}

public class MyRepositoryStorage implements DRIStorageInterface {
    public MyRepositoryNode Load(String id) {
        // access db, GRID SE, etc. and assemble one instance of the data model
        …
        MyRepositoryNode node = new MyRepositoryNode(db, data, …);
        return node;
    }
    …
}
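To make the storage contract concrete, here is a minimal sketch of an implementation that keeps nodes in memory instead of a database or a GRID SE. The method names follow the interface on the slide; `InMemoryStorage` is a hypothetical class, and the `DRIGenericNode` shown here (an id plus a map of field values) is an assumed stand-in, since the slides do not show the real node type.

```java
import java.util.HashMap;
import java.util.Map;

class StorageDemo {
    public static void main(String[] args) {
        DRIStorageInterface storage = new InMemoryStorage();
        DRIGenericNode node = new DRIGenericNode("42");
        node.fields.put("PatientName", "Jane Doe");
        storage.CreateNew(node);
        System.out.println(storage.Load("42").fields.get("PatientName"));
        storage.Remove("42");
        System.out.println(storage.Load("42"));
    }
}

// Assumed stand-in for the platform's node type: an id plus named field values.
class DRIGenericNode {
    final String id;
    final Map<String, Object> fields = new HashMap<>();

    DRIGenericNode(String id) {
        this.id = id;
    }
}

// Same four operations as the interface on the slide.
interface DRIStorageInterface {
    DRIGenericNode Load(String Id);
    void Remove(String Id);
    void CreateNew(DRIGenericNode Node);
    void Save(DRIGenericNode Node);
}

// Hypothetical implementation backed by a map; a real module would talk to a DB and SEs.
class InMemoryStorage implements DRIStorageInterface {
    private final Map<String, DRIGenericNode> nodes = new HashMap<>();

    public DRIGenericNode Load(String Id) {
        return nodes.get(Id); // null when the id is unknown
    }

    public void Remove(String Id) {
        nodes.remove(Id);
    }

    public void CreateNew(DRIGenericNode Node) {
        if (nodes.containsKey(Node.id)) {
            throw new IllegalStateException("Node already exists: " + Node.id);
        }
        nodes.put(Node.id, Node);
    }

    public void Save(DRIGenericNode Node) {
        if (!nodes.containsKey(Node.id)) {
            throw new IllegalStateException("Node does not exist: " + Node.id);
        }
        nodes.put(Node.id, Node);
    }
}
```

Separating CreateNew from Save is a choice made for this sketch (one rejects duplicates, the other rejects unknown ids); the slides do not specify how the two differ.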

Page 14: EGEE-II INFSO-RI-031688

gLibrary/DRI default implementation

• We provide a default implementation for the UI and Storage APIs:
– public class DRIUIModule implements DRIUIInterface
– public class DRIStorageModule implements DRIStorageInterface
• UI default implementation:
– Loads repository trees from AMGA
– Loads filter definitions from AMGA
– Loads field display definitions from AMGA
• Storage default implementation:
– Reads the repository data model from an XML file
– Stores/loads the data model in AMGA and marked items in SEs
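Because a default implementation exists, a repository's own module can be an empty subclass of it. The sketch below illustrates that idea only: the interface is reduced to one method, and the `DRIUIModule` shown here reads from a fixed in-memory map that stands in for the AMGA configuration (the real default module loads the trees from AMGA, which is not reproduced here).

```java
import java.util.List;
import java.util.Map;

class DefaultImplDemo {
    public static void main(String[] args) {
        DRIUIInterface ui = new MyRepUIModule();
        System.out.println(ui.getRepositoryTrees("mgplus"));
    }
}

// Reduced stand-in for the interface from the UI API extract.
interface DRIUIInterface {
    List<String> getRepositoryTrees(String repositoryName);
}

// Stand-in for the default module: a fixed map plays the role of the AMGA configuration.
class DRIUIModule implements DRIUIInterface {
    private static final Map<String, List<String>> CONFIG =
        Map.of("mgplus", List.of("alphabetical", "pathologies"));

    public List<String> getRepositoryTrees(String repositoryName) {
        return CONFIG.getOrDefault(repositoryName, List.of());
    }
}

// The repository's own module: an empty subclass inherits all default behaviour.
class MyRepUIModule extends DRIUIModule {}
```

A provider that later needs different behaviour can override a single method in the subclass without touching the rest.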

Page 15: EGEE-II INFSO-RI-031688

XML Data model def example

DRI Storage module reads data model from XML files:

<TableName name="StoragePatient" primaryIdAttr="PatientID" foreignAttr="NULL">
  <attr name="PatientID"><dbAttrName>PatientID</dbAttrName><type>int</type><dbAttrType>int</dbAttrType></attr>
  <attr name="PatientName"><dbAttrName>PatientName</dbAttrName><type>String</type><dbAttrType>Varchar(80)</dbAttrType></attr>
  <attr name="AGE"><dbAttrName>PatientAge</dbAttrName><type>int</type><dbAttrType>int</dbAttrType></attr>
  <attr name="studies"><dbAttrName>studies</dbAttrName><type>Entity</type><refEntity>StorageStudy</refEntity></attr>
</TableName>

<TableName name="StorageStudy" primaryIdAttr="StorageID" foreignAttr="PatientID">
  <attr name="StorageID"><dbAttrName>StorageID</dbAttrName><type>int</type><dbAttrType>int</dbAttrType></attr>
  <attr name="Diagnose"><dbAttrName>Diagnose</dbAttrName><type>String</type><dbAttrType>Varchar(255)</dbAttrType></attr>
  <attr name="Mammogram"><dbAttrName>Mammogram</dbAttrName><type>LFN</type><dbAttrType>Varchar(255)</dbAttrType></attr>
</TableName>

• DRIStorageModule stores specially marked fields in a GRID Storage Element and registers them in the File Catalog
• DRIStorageModule stores regular fields in AMGA
• public class MyRepStorageModule extends DRIStorageModule {}
• public class MyRepNode extends DRIGenericNode
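The split between regular fields and specially marked fields can be sketched by parsing one of the definitions above. This is an illustrative sketch, not the DRIStorageModule code: it assumes that the marker is the `LFN` type (the only field in the example that looks like a file reference), it uses the JDK's DOM parser, and `FieldSplit` and `DataModelDemo` are hypothetical names. Attribute values are quoted so that the XML is well-formed.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

class DataModelDemo {
    // The StorageStudy definition from the slide, with attribute values quoted.
    static final String STUDY_XML =
        "<TableName name=\"StorageStudy\" primaryIdAttr=\"StorageID\" foreignAttr=\"PatientID\">"
        + "<attr name=\"StorageID\"><dbAttrName>StorageID</dbAttrName>"
        + "<type>int</type><dbAttrType>int</dbAttrType></attr>"
        + "<attr name=\"Diagnose\"><dbAttrName>Diagnose</dbAttrName>"
        + "<type>String</type><dbAttrType>Varchar(255)</dbAttrType></attr>"
        + "<attr name=\"Mammogram\"><dbAttrName>Mammogram</dbAttrName>"
        + "<type>LFN</type><dbAttrType>Varchar(255)</dbAttrType></attr>"
        + "</TableName>";

    public static void main(String[] args) {
        FieldSplit split = FieldSplit.parse(STUDY_XML);
        System.out.println("AMGA fields: " + split.amgaFields);
        System.out.println("SE fields: " + split.seFields);
    }
}

// Hypothetical helper: sorts attribute names by where their values would be stored.
class FieldSplit {
    final List<String> amgaFields = new ArrayList<>();
    final List<String> seFields = new ArrayList<>();

    static FieldSplit parse(String xml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Invalid data model XML", e);
        }
        FieldSplit split = new FieldSplit();
        NodeList attrs = doc.getDocumentElement().getElementsByTagName("attr");
        for (int i = 0; i < attrs.getLength(); i++) {
            Element attr = (Element) attrs.item(i);
            String type = attr.getElementsByTagName("type").item(0).getTextContent().trim();
            // Assumption: type LFN marks a field whose content lives in a Storage Element.
            if (type.equals("LFN")) {
                split.seFields.add(attr.getAttribute("name"));
            } else {
                split.amgaFields.add(attr.getAttribute("name"));
            }
        }
        return split;
    }
}
```

For the StorageStudy definition this puts StorageID and Diagnose on the metadata side and Mammogram on the Storage Element side.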

Page 16: EGEE-II INFSO-RI-031688


Using UI default implementation

public class MyRepUIModule extends DRIUIModule {} (not implements DRIUIInterface)

Note the EMPTY implementation

AMGA dump

Collection: /ceta/mgplus/config/trees
Content:
/ceta/mgplus/config/trees/alphabetical (Collection)
> ls
Query> getattr 0 tag parentid path filter fields
>> FromAtoD
>> FromEtoJ
>> FromKtoO
>> FromPtoU
>> FromVtoZ
/ceta/mgplus/config/trees/pathologies (Collection)
> ls
>> 0
Query> getattr 0 tag parentid path filter fields PathologyId
>> Benign
>> TumorMorphology
>> Spread
>> Microcalcifications
>> study
>> ‘/ceta/mgplus/data/patient/study:PathologyId=0 and /ceta/mgplus/data/patient:MGPlusPatientId=/ceta/mgplus/data/patient/study:MGPlusStudyId’
/ceta/mgplus/data/patient:MGPlusPatientId,PatientId,PatientName,Gender,AgeAtMenarche,AgeAtMenopause

Callouts on the dump:
• Where MGPLUS trees are stored
• Alphabetical patient tree definition
• Contents of the alphabetical tree
• Pathologies tree definition
• Contents of pathologies tree
• Filter definition for Microcalcification branch

Page 17: EGEE-II INFSO-RI-031688

Mammography repository example

• Goals: a GRID based repository for mammograms, patient history and collaborative diagnoses
• Uses the UI and Storage default implementations
• Provides its own viewer, which accepts an MGPlusNode:
– Based on the Open Source TUDOR DICOM viewer
– Adapted to comply with the DRI API
– Converted into an applet
– Extended functionality (display of specific patient data, annotations directly on the mammograms, etc.)
– Save() method retrieves data files directly from SEs using direct GridFTP transfers

Page 18: EGEE-II INFSO-RI-031688

Repository specific viewer

Page 19: EGEE-II INFSO-RI-031688

gLibrary/DRI architecture

Page 20: EGEE-II INFSO-RI-031688

Technologies

• Web 2.0 Web interface (AJAX)
• PHP 5 for the front-end engine
• Java Servlets for the back-end DRI engine
• Usage of Java-PHP bridge
• Applets
– For user authentication with their VO certificate
– For viewers implementation
• Java Introspection
• XML
• gLite Java APIs: AMGA, LFC wrappers, JGlobus GridFTP client
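The slides list Java introspection without saying how it is used. One plausible use, shown here purely as an assumption, is loading a repository's module by class name and checking that it honours the expected contract before using it. `ModuleLoader`, `StorageContract` and `MyRepStorageModule` are hypothetical names for this sketch; only standard `java.lang` reflection calls are used.

```java
class IntrospectionDemo {
    public static void main(String[] args) {
        // The class name would normally come from the repository's configuration.
        String className = MyRepStorageModule.class.getName();
        StorageContract module = ModuleLoader.load(className, StorageContract.class);
        System.out.println(module.name());
    }
}

// Hypothetical contract standing in for one of the DRI interfaces.
interface StorageContract {
    String name();
}

// Hypothetical repository module discovered at run time.
class MyRepStorageModule implements StorageContract {
    public String name() {
        return "my repository storage";
    }
}

class ModuleLoader {
    // Loads a class by name, verifies that it implements the contract, and instantiates it.
    static <T> T load(String className, Class<T> contract) {
        try {
            Class<?> cls = Class.forName(className);
            if (!contract.isAssignableFrom(cls)) {
                throw new IllegalArgumentException(
                    className + " does not implement " + contract.getName());
            }
            return contract.cast(cls.getDeclaredConstructor().newInstance());
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("Cannot load module " + className, e);
        }
    }
}
```

Checking `isAssignableFrom` before instantiating means a misconfigured class name fails with a clear message instead of a ClassCastException later on.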

Page 21: EGEE-II INFSO-RI-031688

Where we are

• Engine deployed and working, API and default implementation working
• MGPlus repository implemented on DRI
• Current work:
– Interface to launch and manage jobs on Grid WMS
– Generic uploader

Page 22: EGEE-II INFSO-RI-031688

Conclusions and future work

• Effectively reduced cost through the APIs and the default implementation
• New repository providers must:
– Provide empty implementations of UI and Storage (very easy)
– Describe their data model in XML (very easy)
– Adapt/make a viewer (difficult)
• Provides:
– Generic multirepository platform, making GRID facilities easily accessible
– Attracts new communities, ease of hosting
• Future work:
– SOA and JSR170 compliance
– Generic viewer and tree management interface (almost ZERO cost for repository providers)
• EELA-II Official Digital Library product

Page 23: EGEE-II INFSO-RI-031688

Contacts

• Mailing list: [email protected]
• Authors:
– [email protected]
– [email protected]
– [email protected]
– [email protected]
– [email protected]
• Prototypes:
– https://glibrary.ct.infn.it (INFN gLibrary platform)
– https://dri-dev.ceta-ciemat.es (gLibrary/DRI platform)

Page 24: EGEE-II INFSO-RI-031688

Questions?
