EGEE-II INFSO-RI-031688
Enabling Grids for E-sciencE
www.eu-egee.org
EGEE and gLite are registered trademarks
A GRID based platform to host multiple repositories for digital content
Antonio Calanducci (1)
J.M. González (3), R. Ramos (2), M. Rubio (2), D. Tcaci (3)
(1) INFN Catania, (2) CETA-CIEMAT, (3) MAAT-G Knowledge
3rd EGEE User Forum
11-14 February 2008, Clermont-Ferrand (France)
3rd EGEE User Forum, 13 Feb 08, Clermont-Ferrand
Introduction
• Need to offer a GRID based platform to host arbitrary repositories
• A digital repository is a set of annotated digitalized data offered to users in a structured manner
• Both digitalized data and annotations can vary greatly from one repository to another, but the following commonalities are acknowledged:
  – There is a basic informational unit of digitalized data (a mammogram, a page of an ancient manuscript, a 3D model, …)
  – There is metadata around each unit of digitalized data (patient info, diagnoses, translation, historical context, physical properties, …)
  – Specific algorithms process the data (search for microcalcifications, automatic translation, …)
  – Users browse, search and update the repository, and launch algorithms (GRID WMS)
  – Data is stored in a federated way: each institution owns and manages its content
  – Metadata goes to a DB, digitalized data to an archive (GRID SE)
Goals of gLibrary/DRI
• To host multiple repositories of arbitrary structure
• On a GRID infrastructure (security, federation, …)
• Reduce the “cost-to-deploy”, reach new communities
• Open architecture
• Easy to use platform, web based interface
• Collaboration between INFN and CETA-CIEMAT
• Builds on INFN gLibrary
INFN gLibrary
• Created by the GILDA team at INFN Catania
• Secure, robust, easy to use interface to handle digital assets stored in GRID SEs
• Interface to browse entries and find files in SEs
  – “à la iTunes” browsing allows searching by mouse clicks
• Built on top of gLite GRID services: any SRM SE, LFC, AMGA, VOMS authorization
• Authentication/Authorization
  – Via applet, creating a proxy cert on the user’s PC
  – Proxy used to interact directly with GRID elements (LFC, SE, AMGA)
  – Files transferred directly from SE to applet and vice versa
gLibrary screenshots
gLibrary/DRI
• Extends gLibrary by:
  – Making it multirepository
  – No predefined repository content structure: each repository describes itself
  – Decoupling navigation + management from repository specifics
• DRI: Digital Repositories Infrastructure
• A repository must provide:
  – A description of its navigational structures (trees, filters) and a viewer
  – A description of its data model
  – A storage engine (for data model persistence)
• The DRI API specification describes HOW this is provided
• A repository provider can:
  – Make its own implementation of the specification
  – Use (or extend) the default one provided
gLibrary/DRI web interface
DICOM viewer
gLibrary/DRI API specification
A repository has to provide:
• Data Model:
  – XML format description of the repository’s data
  – Relational data model supported
  – Indication of which part of the data model is saved on the federated DB and which on the Storage System
• Storage Module:
  – it takes care of data persistency
  – Load() and Save() methods have to be provided for loading and saving instances of the data model
• User Interface Module:
  – definition of the navigational trees and filters
  – viewer for the specific repository
gLibrary/DRI API specification
• Contract between the gLibrary/DRI platform and specific repository implementations
• Each application must provide three Java modules implementing the following interfaces:
  – DRIUIInterface for describing trees, filters and viewers
  – DRIStorageInterface for storing and retrieving data
  – DRINodeInterface for defining the repository data model
• The gLibrary/DRI engine orchestrates API calls to the different interface implementations
gLibrary/DRI UI API extract
import java.util.Arrays;
import java.util.Vector;

public interface DRIUIInterface {
    public Vector<Tree> getRepositoryTrees(String repositoryName);
    public TreeHierarchy getTreeHierarchy(String treeName);
    public Vector getFilterNameInstances();
    public Vector<FilterEntry> getFilterEntries(String filterName);
    public void loadViewer(String viewerClass);
}

public class MyRepositoryUI implements DRIUIInterface {
    public Vector<Tree> getRepositoryTrees(String repositoryName) {
        // access repository config file/db/etc. to get tree data
        …
        // Vector has no varargs constructor, so wrap the trees in a list first
        return new Vector<Tree>(Arrays.asList(new Tree("By author"), new Tree("By date")));
    }
    …
}
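As a rough illustration of the extract above, the following sketch fills in just enough to compile and run. It is not the DRI code: Tree is a hypothetical stand-in for the DRI type (the slides do not show it), MiniDRIUIInterface is a reduced, assumed version of DRIUIInterface with a single method, and the tree names are hard-coded instead of being read from a configuration source.

```java
import java.util.Arrays;
import java.util.Vector;

// Hypothetical stand-in for the DRI Tree type; the real class is not shown in the slides.
class Tree {
    private final String name;

    Tree(String name) {
        this.name = name;
    }

    String getName() {
        return name;
    }
}

// Reduced, assumed version of DRIUIInterface: only the method exercised here.
interface MiniDRIUIInterface {
    Vector<Tree> getRepositoryTrees(String repositoryName);
}

// A repository-specific UI module that answers "what are your navigation trees?".
class MyRepositoryUI implements MiniDRIUIInterface {
    @Override
    public Vector<Tree> getRepositoryTrees(String repositoryName) {
        // A real module would read this from a config file or a database.
        return new Vector<Tree>(Arrays.asList(new Tree("By author"), new Tree("By date")));
    }
}

class UiSketchDemo {
    public static void main(String[] args) {
        // "myrepo" is an illustrative repository name.
        for (Tree tree : new MyRepositoryUI().getRepositoryTrees("myrepo")) {
            System.out.println(tree.getName());
        }
    }
}
```

Wrapping the trees with Arrays.asList is needed because java.util.Vector offers a constructor taking a Collection, not a variable number of elements.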
gLibrary/DRI Engine Orchestration
Registered repositories
• MGUI.getRepositoryTrees(): what are your navigation trees?
• MGUI.getFilterNameInstances(): what are your filters?
• MGUI.LoadViewer(): return an applet with the viewer application to display and manipulate the selected repository item
• MGUI.getFilterEntries(): what are the possible values for the selected filter?
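The question-and-answer flow between the engine and a registered repository could look roughly like the sketch below. Everything in it is an assumption made for illustration: RepositoryUI is a simplified contract that returns plain strings, MiniEngine is a hypothetical engine, and the "Gender" filter with its two values is invented sample data. Only the tree names "alphabetical" and "pathologies" come from the slides.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Simplified, assumed UI contract: names are returned as plain strings.
interface RepositoryUI {
    List<String> getRepositoryTrees();
    List<String> getFilterNameInstances();
    List<String> getFilterEntries(String filterName);
}

// Hypothetical engine: keeps registered repositories and queries them in turn.
class MiniEngine {
    private final Map<String, RepositoryUI> registered = new LinkedHashMap<String, RepositoryUI>();

    void register(String name, RepositoryUI ui) {
        registered.put(name, ui);
    }

    // Collects one entry per tree and one per filter value for the given repository.
    List<String> describe(String name) {
        RepositoryUI ui = registered.get(name);
        if (ui == null) {
            throw new IllegalArgumentException("Unknown repository: " + name);
        }
        List<String> out = new ArrayList<String>();
        for (String tree : ui.getRepositoryTrees()) {
            out.add("tree:" + tree);
        }
        for (String filter : ui.getFilterNameInstances()) {
            for (String value : ui.getFilterEntries(filter)) {
                out.add("filter:" + filter + "=" + value);
            }
        }
        return out;
    }
}

// Sample repository module; the filter and its values are illustrative only.
class MammographyUI implements RepositoryUI {
    public List<String> getRepositoryTrees() {
        return Arrays.asList("alphabetical", "pathologies");
    }

    public List<String> getFilterNameInstances() {
        return Arrays.asList("Gender");
    }

    public List<String> getFilterEntries(String filterName) {
        if ("Gender".equals(filterName)) {
            return Arrays.asList("F", "M");
        }
        return new ArrayList<String>();
    }
}
```

The engine never needs to know what a mammography repository is; it only relies on the three calls of the contract, which is the decoupling the slides describe.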
gLibrary/DRI Storage API
public interface DRIStorageInterface {
    public DRIGenericNode Load(String Id);
    public void Remove(String Id);
    public void CreateNew(DRIGenericNode Node);
    public void Save(DRIGenericNode Node);
}

public class MyRepositoryStorage implements DRIStorageInterface {
    public MyRepositoryNode Load(String id) {
        // access db, GRID SE, etc. and assemble one instance of the data model
        …
        MyRepositoryNode node = new MyRepositoryNode(db, data, …);
        return node;
    }
    …
}
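To make the four storage operations concrete, here is a minimal in-memory sketch. It is an assumption-laden toy, not the DRI storage module: GenericNode stands in for DRIGenericNode (whose fields the slides do not show), MiniStorage mirrors the four operations with lower-camel-case names, and a HashMap replaces the database and the GRID SE.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for DRIGenericNode: an id plus named fields.
class GenericNode {
    final String id;
    final Map<String, String> fields = new HashMap<String, String>();

    GenericNode(String id) {
        this.id = id;
    }
}

// Same four operations as the storage interface, with assumed Java-style names.
interface MiniStorage {
    GenericNode load(String id);
    void remove(String id);
    void createNew(GenericNode node);
    void save(GenericNode node);
}

// Toy backend: a map replaces the federated DB and the Storage Elements.
class InMemoryStorage implements MiniStorage {
    private final Map<String, GenericNode> nodes = new HashMap<String, GenericNode>();

    public GenericNode load(String id) {
        return nodes.get(id); // null when no node has this id
    }

    public void remove(String id) {
        nodes.remove(id);
    }

    public void createNew(GenericNode node) {
        if (nodes.containsKey(node.id)) {
            throw new IllegalStateException("Node already exists: " + node.id);
        }
        nodes.put(node.id, node);
    }

    public void save(GenericNode node) {
        nodes.put(node.id, node); // insert or overwrite
    }
}
```

Separating createNew from save lets a backend reject accidental overwrites on creation while still allowing updates; whether the real module behaves this way is not stated in the slides.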
gLibrary/DRI default implementation
• We provide a default implementation for the UI and Storage APIs:
  – public class DRIUIModule implements DRIUIInterface
  – public class DRIStorageModule implements DRIStorageInterface
• UI default implementation:
  – Loads repository trees from AMGA
  – Loads filter definitions from AMGA
  – Loads field display definitions from AMGA
• Storage default implementation:
  – Reads the repository data model from an XML file
  – Stores/loads the data model in AMGA and marked items in SEs
XML Data model def example
DRI Storage module reads the data model from XML files:
<TableName name="StoragePatient" primaryIdAttr="PatientID" foreignAttr="NULL">
  <attr name="PatientID"><dbAttrName>PatientID</dbAttrName><type>int</type><dbAttrType>int</dbAttrType></attr>
  <attr name="PatientName"><dbAttrName>PatientName</dbAttrName><type>String</type><dbAttrType>Varchar(80)</dbAttrType></attr>
  <attr name="AGE"><dbAttrName>PatientAge</dbAttrName><type>Int</type><dbAttrType>Int</dbAttrType></attr>
  <attr name="studies"><dbAttrName>studies</dbAttrName><type>Entity</type><refEntity>StorageStudy</refEntity></attr>
</TableName>
<TableName name="StorageStudy" primaryIdAttr="StorageID" foreignAttr="PatientID">
  <attr name="StorageID"><dbAttrName>StorageID</dbAttrName><type>int</type><dbAttrType>int</dbAttrType></attr>
  <attr name="Diagnose"><dbAttrName>Diagnose</dbAttrName><type>String</type><dbAttrType>Varchar(255)</dbAttrType></attr>
  <attr name="Mammogram"><dbAttrName>Mammogram</dbAttrName><type>LFN</type><dbAttrType>Varchar(255)</dbAttrType></attr>
</TableName>
• DRIStorageModule stores regular fields in AMGA
• DRIStorageModule stores specially marked fields in a GRID Storage Element and registers them in the File Catalog
public class MyRepStorageModule extends DRIStorageModule {}
public class MyRepNode extends DRIGenericNode
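The split between regular fields (AMGA) and specially marked fields (Storage Element) can be sketched as a small routing step over the XML definition. This is only an illustration under stated assumptions: attribute values are quoted so the XML is well-formed, the LFN type is taken to be the "special mark", every other type is sent to AMGA (Entity references are not treated separately), and DataModelRouting is a hypothetical helper, not part of DRI. It uses the standard Java DOM parser.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper: decides where each field of a data model definition would be stored.
class DataModelRouting {
    // The StorageStudy table from the slide, with quoted attribute values (assumption).
    static final String SAMPLE =
        "<TableName name=\"StorageStudy\" primaryIdAttr=\"StorageID\" foreignAttr=\"PatientID\">"
        + "<attr name=\"StorageID\"><dbAttrName>StorageID</dbAttrName><type>int</type><dbAttrType>int</dbAttrType></attr>"
        + "<attr name=\"Diagnose\"><dbAttrName>Diagnose</dbAttrName><type>String</type><dbAttrType>Varchar(255)</dbAttrType></attr>"
        + "<attr name=\"Mammogram\"><dbAttrName>Mammogram</dbAttrName><type>LFN</type><dbAttrType>Varchar(255)</dbAttrType></attr>"
        + "</TableName>";

    // Maps each attribute name to "SE" (type LFN) or "AMGA" (any other type).
    static Map<String, String> route(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList attrs = doc.getElementsByTagName("attr");
            Map<String, String> result = new LinkedHashMap<String, String>();
            for (int i = 0; i < attrs.getLength(); i++) {
                Element attr = (Element) attrs.item(i);
                String type = attr.getElementsByTagName("type").item(0).getTextContent().trim();
                result.put(attr.getAttribute("name"), "LFN".equals(type) ? "SE" : "AMGA");
            }
            return result;
        } catch (Exception e) {
            throw new IllegalArgumentException("Invalid data model XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(route(SAMPLE));
    }
}
```

With the sample definition, only the Mammogram field is routed to a Storage Element; StorageID and Diagnose stay in the metadata catalogue.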
Using UI default implementation
public class MyRepUIModule extends DRIUIModule {} (not implements DRIUIInterface)

AMGA dump
Collection: /ceta/mgplus/config/trees
Content:
/ceta/mgplus/config/trees/alphabetical (Collection)
> ls
Query> getattr 0 tag parentid path filter fields
>> FromAtoD
>> FromEtoJ
>> FromKtoO
>> FromPtoU
>> FromVtoZ
/ceta/mgplus/config/trees/pathologies (Collection)
> ls
>> 0
Query> getattr 0 tag parentid path filter fields PathologyId
>> Benign
>> TumorMorphology
>> Spread
>> Microcalcifications
>> study
>> ‘/ceta/mgplus/data/patient/study:PathologyId=0 and /ceta/mgplus/data/patient:MGPlusPatientId=/ceta/mgplus/data/patient/study:MGPlusStudyId’
/ceta/mgplus/data/patient:MGPlusPatientId,PatientId,PatientName,Gender,AgeAtMenarche,AgeAtMenopause

Slide annotations:
• Note the EMPTY implementation
• Where MGPlus trees are stored
• Alphabetical patient tree definition
• Contents of the alphabetical tree
• Pathologies tree definition
• Contents of pathologies tree
• Filter definition for Microcalcification branch
Mammography repository example
• Goals: a GRID based repository for mammograms, patient history and collaborative diagnoses
• Uses the UI and Storage default implementations
• Provides its own viewer, which accepts an MGPlusNode:
  – Based on the Open Source TUDOR DICOM viewer
  – Adapted to comply with the DRI API
  – Converted into an applet
  – Extended functionality (display of specific patient data, annotations directly on the mammograms, etc.)
  – Save() method retrieves data files directly from SEs using GridFTP transfers
Repository specific viewer
gLibrary/DRI architecture
Technologies
• Web 2.0 Web interface (AJAX)
• PHP 5 for the front-end engine
• Java Servlets for the back-end DRI engine
• Usage of Java-PHP bridge
• Applets
  – For user authentication with their VO certificate
  – For viewers implementation
• Java Introspection
• XML
• gLite Java APIs: AMGA, LFC wrappers, JGlobus GridFTP client
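The slides list Java introspection without saying how it is used. One plausible use, shown here purely as an assumption, is loading a repository's module from its class name at run time so the engine needs no compile-time knowledge of each repository. StorageModule, AmgaBackedStorage and ModuleLoader are hypothetical names; only the standard java.lang reflection calls are real.

```java
// Hypothetical contract standing in for a DRI module interface.
interface StorageModule {
    String describe();
}

// Hypothetical module with an implicit no-argument constructor.
class AmgaBackedStorage implements StorageModule {
    public String describe() {
        return "default storage";
    }
}

// Instantiates a module from its class name via reflection.
class ModuleLoader {
    static StorageModule load(String className) {
        try {
            Class<?> cls = Class.forName(className);
            // Refuse classes that do not implement the expected contract.
            if (!StorageModule.class.isAssignableFrom(cls)) {
                throw new IllegalArgumentException(className + " is not a StorageModule");
            }
            return (StorageModule) cls.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("Cannot load module " + className, e);
        }
    }
}

class ModuleLoaderDemo {
    public static void main(String[] args) {
        // The class name would normally come from repository configuration.
        StorageModule module = ModuleLoader.load(AmgaBackedStorage.class.getName());
        System.out.println(module.describe());
    }
}
```

The isAssignableFrom check turns a late ClassCastException into an early, descriptive error when a configuration entry points at the wrong class.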
Where we are
• Engine deployed and working, API and default implementation working
• MGPlus repository implemented on DRI
• Current work:
  – Interface to launch and manage jobs on Grid WMS
  – Generic uploader
Conclusions and future work
• Effectively reduced cost through APIs and default implementation
• New repository providers must:
  – Provide empty implementations of UI and Storage (very easy)
  – Describe their data model in XML (very easy)
  – Adapt/make a viewer (difficult)
• Provides:
  – Generic multirepository platform, making GRID facilities easily accessible
  – Attracts new communities, ease of hosting
• Future work:
  – SOA and JSR170 compliance
  – Generic viewer and tree management interface (almost ZERO cost for repository providers)
• EELA-II Official Digital Library product
Contacts
• Mailing list:
• Authors:
  – [email protected]
  – [email protected]
  – [email protected]
  – [email protected]
  – [email protected]
• Prototypes:
  – https://glibrary.ct.infn.it (INFN gLibrary platform)
  – https://dri-dev.ceta-ciemat.es (gLibrary/DRI platform)
Questions?