Integration of the Biological Databases into Grid-Portal Environments
Michal Kosiedowski, Michal Malecki, Cezary Mazurek, Pawel Spychala, Marcin Wolski
Agenda
• Introduction
• PROGRESS Grid-Portal Environment
• Data Management System
• Enabling SRS resources within DMS
• Case study
• Conclusions
R&D Center
• PSNC (Poznań Supercomputing and Networking Center) was established in 1993 and is an R&D center working in:
  – New Generation Networks
    • POZMAN and PIONIER networks
    • 6-NET, ATRIUM, Muppet
  – HPC and Grids
    • GRIDLAB, CROSSGRID, VLAB, PROGRESS, Clusterix, HPCEuropa
  – Portals and Content Management Tools
    • Polish Educational Portal "Interkl@sa", Multimedia City Guide, Digital Library Framework, Interactive TV
PROGRESS (1)
• Project partners:
  – PSNC IBCh Poznan
  – SUN Microsystems Poland
  – Cyfronet AMM, Krakow
  – Technical University Lodz
• Co-funded by The State Committee for Scientific Research (KBN) and SUN Microsystems Poland
PROGRESS (2)
• The PROGRESS project produced a set of open source tools for use by:
  – Grid constructors
  – Computational application developers
  – Computing portal operators
PROGRESS (3)
• Cluster of 80 processors
• Networked storage of 1.3 TB
• Software: ORACLE, HPC Cluster Tools, Sun ONE, Sun Grid Engine, Globus
[Map figure; labels include Wrocław and Gdańsk]
Data Management Issues
• Hide the data management complexity from end users
• Use new standards defined by grid organizations
• Co-operate with different kinds of client applications
• Provide seamless access to data and information for grid computing
• Enable intuitive and efficient methods for resource exploration
• Provide a friendly data management interface for administrators and scientists
PROGRESS
Web Services and PROGRESS
[Diagram: Portlets, Grid Service Provider, Data Management and Grid Resource Broker, connected via Web Services (WS)]
Data Management System
• A distributed system enabling the management of grid data files
• Stores files in distributed storage modules of various types: generic filesystems, archivers, relational databases
• Uses metadata to describe files
• Allows access to data banks, such as a mirror of the Sequence Retrieval System (SRS)
• Exposes its functionality through the Data Broker Service
DMS Functionality
• Virtual file system keeping the data organized in a tree structure:
  – Metadirectories: organize other objects into a hierarchy
  – Metafiles: represent a logical view of computational data regardless of their physical location
• DMS provides its services in the form of a Web Services API (Data Broker Service)
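The tree of metadirectories and metafiles can be pictured as a small in-memory structure in which only metafiles carry physical locations, so a logical path stays stable while the physical copy can live anywhere. The following Python sketch is illustrative only; the class names `MetaDirectory` and `MetaFile`, the `resolve` method and the example host name are hypothetical and are not taken from the DMS code.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Union


@dataclass
class MetaFile:
    """Logical file: a name plus the physical replicas that back it (hypothetical model)."""

    name: str
    locations: List[str] = field(default_factory=list)


@dataclass
class MetaDirectory:
    """Logical directory: only groups other objects, holds no data itself."""

    name: str
    children: Dict[str, Union["MetaDirectory", MetaFile]] = field(default_factory=dict)

    def add(self, node: Union["MetaDirectory", MetaFile]) -> None:
        """Attach a child; names must be unique within one metadirectory."""
        if node.name in self.children:
            raise FileExistsError(node.name)
        self.children[node.name] = node

    def resolve(self, path: str) -> Union["MetaDirectory", MetaFile]:
        """Follow a slash-separated logical path starting at this directory."""
        node: Union["MetaDirectory", MetaFile] = self
        for part in filter(None, path.split("/")):
            if not isinstance(node, MetaDirectory) or part not in node.children:
                raise FileNotFoundError(path)
            node = node.children[part]
        return node


# Usage: clients see only the logical path; the physical location is metadata.
root = MetaDirectory("/")
home = MetaDirectory("home")
root.add(home)
home.add(MetaFile("input.fasta", ["gsiftp://storage01.example/data/42"]))
print(root.resolve("/home/input.fasta").locations)
```

Keeping locations as a list reflects the idea that one logical file may have several physical replicas; moving a replica only changes that list, not the path users navigate.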
DMS Functionality
• Web Services interface for storing, accessing, describing and delivering data:
  – directory management: e.g. add, remove and rename directories; retrieve the root and current path; change path
  – file management: e.g. add, remove and rename files; add, remove and retrieve physical file locations
  – metadata management: e.g. retrieve the list of schemes and attributes; assign schemes to files and edit values
  – external data source management: e.g. retrieve databank contents, resolve entries, explore databanks
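The metadata-management operations listed above (retrieve schemes and attributes, assign a scheme to a file, edit values) can be illustrated with a minimal in-memory model. This is a hypothetical Python sketch, not the Data Broker Service API: the `MetadataStore` class and its method names are invented here, and the example scheme simply borrows a few Dublin Core element names.

```python
from typing import Dict, List, Set


class MetadataStore:
    """Hypothetical in-memory stand-in for DMS metadata management."""

    def __init__(self) -> None:
        self._schemes: Dict[str, List[str]] = {}       # scheme -> attribute names
        self._assigned: Dict[str, Set[str]] = {}       # file path -> assigned schemes
        self._values: Dict[str, Dict[str, str]] = {}   # file path -> {"scheme.attr": value}

    def define_scheme(self, name: str, attributes: List[str]) -> None:
        self._schemes[name] = list(attributes)

    def list_schemes(self) -> Dict[str, List[str]]:
        """Retrieve the list of schemes together with their attributes."""
        return {name: list(attrs) for name, attrs in self._schemes.items()}

    def assign_scheme(self, path: str, scheme: str) -> None:
        if scheme not in self._schemes:
            raise KeyError(f"unknown scheme: {scheme}")
        self._assigned.setdefault(path, set()).add(scheme)

    def set_value(self, path: str, scheme: str, attribute: str, value: str) -> None:
        """Edit a value; only attributes of a scheme assigned to the file are accepted."""
        if scheme not in self._assigned.get(path, set()):
            raise ValueError(f"scheme {scheme} is not assigned to {path}")
        if attribute not in self._schemes[scheme]:
            raise ValueError(f"{attribute} is not an attribute of {scheme}")
        self._values.setdefault(path, {})[f"{scheme}.{attribute}"] = value

    def get_values(self, path: str) -> Dict[str, str]:
        return dict(self._values.get(path, {}))


store = MetadataStore()
store.define_scheme("DC", ["title", "creator", "date", "format"])
store.assign_scheme("/home/input.fasta", "DC")
store.set_value("/home/input.fasta", "DC", "format", "FASTA")
print(store.get_values("/home/input.fasta"))
```

Validating against the assigned scheme is one plausible design choice: it keeps free-form values out of the repository, at the cost of requiring a scheme assignment before any value can be stored.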
DMS Architecture
Data Broker
• Serves as an interface (Web Services) for external clients, such as the HPC Portal and the grid resource broker
• Mediates in the flow of all requests directed to the DMS
• Authorizes the client that submitted the request
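The broker's two duties, mediating every request and authorizing the caller before anything reaches the DMS internals, can be sketched as a dispatcher with an authorization check in front. This Python sketch assumes hypothetical names (`DataBroker`, `Request`, the `home_only` policy and the operation names); it does not reproduce the actual Data Broker Service, which is exposed via Web Services rather than local calls.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class Request:
    client: str      # identity of the caller, e.g. a portal user or the resource broker
    operation: str   # hypothetical operation name, e.g. "list_dir"
    path: str        # logical DMS path the operation targets


Handler = Callable[[Request], object]


class DataBroker:
    """Single entry point: every request is authorized, then routed to a handler."""

    def __init__(self, authorize: Callable[[Request], bool]) -> None:
        self._authorize = authorize
        self._handlers: Dict[str, Handler] = {}

    def register(self, operation: str, handler: Handler) -> None:
        self._handlers[operation] = handler

    def submit(self, request: Request) -> object:
        if not self._authorize(request):
            raise PermissionError(f"{request.client} may not {request.operation} {request.path}")
        handler = self._handlers.get(request.operation)
        if handler is None:
            raise NotImplementedError(request.operation)
        return handler(request)


def home_only(request: Request) -> bool:
    """Toy policy: a client may only touch its own home directory."""
    home = f"/home/{request.client}"
    return request.path == home or request.path.startswith(home + "/")


broker = DataBroker(home_only)
broker.register("list_dir", lambda r: ["input.fasta"])
print(broker.submit(Request("alice", "list_dir", "/home/alice")))
```

Passing the policy in as a callable keeps the routing logic independent of where rights are actually stored; in DMS that information lives in the Metadata Repository.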
Metadata Repository
• Central, single point of metadata management
• Responsible for all metadata operations and for metadata storage and maintenance
• Stores the following kinds of information:
  – metadata about resources: data files, their physical locations and the possible ways to access them
  – metadata about rights: all rights-related information, i.e. users, their groups and access rights
  – metadata describing the standards for file description, e.g. Dublin Core (DC)
  – metadata about services: data brokers, data containers
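To make the "metadata about rights" item concrete, here is a minimal sketch of how users, groups and per-resource access rights might combine into a permission check. It is a hypothetical Python model: the `RightsRegistry` name, the `user:`/`group:` principal prefixes and the `r`/`w` right letters are assumptions, not the repository's actual schema.

```python
from typing import Dict, Set, Tuple


class RightsRegistry:
    """Toy model of rights metadata: users, their groups and access rights."""

    def __init__(self) -> None:
        self._groups: Dict[str, Set[str]] = {}               # user -> group names
        self._grants: Dict[Tuple[str, str], Set[str]] = {}   # (resource, principal) -> rights

    def add_to_group(self, user: str, group: str) -> None:
        self._groups.setdefault(user, set()).add(group)

    def grant(self, resource: str, principal: str, rights: str) -> None:
        """Principal is "user:<name>" or "group:<name>"; rights is a string such as "r" or "rw"."""
        self._grants.setdefault((resource, principal), set()).update(rights)

    def allowed(self, user: str, resource: str, right: str) -> bool:
        """True if the right was granted to the user directly or to one of their groups."""
        principals = [f"user:{user}"] + [f"group:{g}" for g in self._groups.get(user, set())]
        return any(right in self._grants.get((resource, p), set()) for p in principals)


rights = RightsRegistry()
rights.add_to_group("alice", "biologists")
rights.grant("/shared/genbank", "group:biologists", "r")
rights.grant("/home/alice", "user:alice", "rw")
print(rights.allowed("alice", "/shared/genbank", "r"))  # True
print(rights.allowed("alice", "/shared/genbank", "w"))  # False
```

For brevity, rights here apply to exact resource paths only; a fuller model would probably also inherit rights from parent metadirectories.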
Data Container
• Enables access to physical data
• Data can be stored on various media types
• Data can be organized as files on generic filesystems, BLOBs in databases or files on data tapes
• All containers expose a uniform interface regardless of the media types they manage
• A container does not perform file transfers itself; it uses external services such as ftp, https, gass and gridftp
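The central design point here is that every container exposes the same interface and returns a location for an external transfer service instead of moving bytes itself. A minimal Python sketch of that idea follows; the class names, the `transfer_url` method, the host names and the path layouts are hypothetical. The mapping of GridFTP to the `gsiftp://` URL scheme is standard; GASS is simply left out of this toy mapping.

```python
from abc import ABC, abstractmethod
from typing import Dict


class DataContainer(ABC):
    """Uniform interface shared by all containers, whatever medium they manage."""

    # Transfer protocol -> URL scheme of the external service that moves the bytes.
    URL_SCHEMES: Dict[str, str] = {"ftp": "ftp", "https": "https", "gridftp": "gsiftp"}

    def __init__(self, host: str) -> None:
        self.host = host

    def transfer_url(self, file_id: str, protocol: str) -> str:
        """Say where an external service can fetch the file; no data is copied here."""
        scheme = self.URL_SCHEMES.get(protocol)
        if scheme is None:
            raise ValueError(f"unsupported protocol: {protocol}")
        return f"{scheme}://{self.host}{self._path(file_id)}"

    @abstractmethod
    def _path(self, file_id: str) -> str:
        """Map a file identifier to a path exported by the transfer service."""


class FilesystemContainer(DataContainer):
    """Files stored on a generic filesystem under a root directory."""

    def __init__(self, host: str, root: str) -> None:
        super().__init__(host)
        self.root = root.rstrip("/")

    def _path(self, file_id: str) -> str:
        return f"{self.root}/{file_id}"


class DatabaseBlobContainer(DataContainer):
    """BLOBs in a database, assumed here to be exported to a staging directory first."""

    def __init__(self, host: str, staged: Dict[str, str]) -> None:
        super().__init__(host)
        self.staged = staged  # file id -> staged export path

    def _path(self, file_id: str) -> str:
        return self.staged[file_id]


containers = [
    FilesystemContainer("fs01.example", "/data/"),
    DatabaseBlobContainer("db01.example", {"42": "/export/42.bin"}),
]
for container in containers:
    print(container.transfer_url("42", "gridftp"))
```

Callers never need to know which medium sits behind a container; they only ask for a URL in the protocol they can speak.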
Proxy (SRS Container)
• Enables access to external scientific databases
• Includes both Repository functionality (listing entries, retrieving attached metadata, building queries) and Data Container functionality (downloading files)
• DMS treats the Proxy as a separate, independent module that manages read-only data
• In the PROGRESS grid-portal environment, the Proxy (named SRS Container) enables access to SRS resources
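Because the Proxy combines repository-style operations (listing entries, queries) with container-style downloads over read-only data, one way to picture it is an adapter that implements the read operations and refuses every write. The Python sketch below is hypothetical: `SrsProxy`, its method names, the naive substring query and the in-memory databank dictionary are stand-ins for real SRS access.

```python
from typing import Dict, List


class ReadOnlyError(PermissionError):
    """Raised when a client tries to modify data behind the proxy."""


class SrsProxy:
    """Hypothetical read-only proxy: repository and container roles in one module."""

    def __init__(self, databanks: Dict[str, Dict[str, str]]) -> None:
        # databank name -> {entry id: entry text}; stands in for the external source
        self._databanks = databanks

    # Repository-like functionality
    def list_entries(self, databank: str) -> List[str]:
        return sorted(self._databanks[databank])

    def query(self, databank: str, text: str) -> List[str]:
        """Naive substring search, standing in for a real SRS query."""
        entries = sorted(self._databanks[databank].items())
        return [entry_id for entry_id, body in entries if text in body]

    # Container-like functionality
    def download(self, databank: str, entry_id: str) -> str:
        return self._databanks[databank][entry_id]

    # Write operations of a generic storage module are refused here.
    def add_file(self, *args: object, **kwargs: object) -> None:
        raise ReadOnlyError("SRS data is read-only")

    remove_file = add_file
    rename_file = add_file


proxy = SrsProxy({"swissprot": {"P2": "toy lyase entry", "P1": "toy kinase entry"}})
print(proxy.list_entries("swissprot"))
print(proxy.query("swissprot", "kinase"))
```

Copying an entry into a user's directory would then be a read from the proxy followed by a write to an ordinary, writable container.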
Administrative Portal
• Web application allowing users to handle grid data management with a web browser
• An intuitive interface for executing a superset of DMS services
• An effective way to explore the huge SRS resources
• On-line help
SRS Resources in PSNC
• GenBank
  – Release (about 32 million entries)
  – Updates (about 2 million entries)
• EMBL (European Molecular Biology Laboratory)
  – Release (about 42 million entries)
  – Updates (about 2 million entries)
• PDB (Protein Data Bank)
• Swissprot
  – Swissprot Release, Swissprot New, SPTREMBL, REMTREMBL
SRS Installation
• The installation uses multiple storage resources
• The data access interface is delivered via a common portal (srs.man.poznan.pl)
• Administrative tasks (data retrieval and preparation) are split across multiple machines
• Data are retrieved from remote resources in parallel
• Offline data indexing and packing run on a computational machine (0.5 TB storage)
• Online data are kept compressed (2 × 250 GB storage)
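The parallel retrieval step can be sketched with a thread pool that fetches several databank files at once and records per-file failures without aborting the whole batch. This is a Python illustration only: the `retrieve_all` helper, the `fetch` callable and the file names are hypothetical, and the source does not describe how the real retrieval scripts work.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Dict, Iterable, Tuple


def retrieve_all(
    names: Iterable[str],
    fetch: Callable[[str], bytes],
    workers: int = 4,
) -> Tuple[Dict[str, bytes], Dict[str, Exception]]:
    """Fetch every name concurrently; return (successful downloads, failures)."""
    done: Dict[str, bytes] = {}
    failed: Dict[str, Exception] = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {pool.submit(fetch, name): name for name in names}
        for future in as_completed(futures):
            name = futures[future]
            try:
                done[name] = future.result()
            except Exception as exc:  # keep going; one failing source should not stop the batch
                failed[name] = exc
    return done, failed


# Usage with a fake fetcher; a real one might wrap an FTP or HTTP download.
def fake_fetch(name: str) -> bytes:
    if name.endswith(".bad"):
        raise OSError(f"cannot fetch {name}")
    return name.encode()


ok, errors = retrieve_all(["gb1.seq", "gb2.seq", "x.bad"], fake_fetch)
print(sorted(ok), sorted(errors))
```

Threads suit this step because downloads are I/O-bound; the CPU-heavy indexing and packing happen afterwards, offline, on the computational machine.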
SRS Installation - Schema
[Diagram: SRS portal srs.man.poznan.pl; online part with index and flat files on storage 01 and storage 02; offline indexing part; hosts viola.man.poznan.pl and bellis-e.man.poznan.pl]
SRS Container
• Uses shell-based access to SRS
  – SRS operations are sent as shell commands
• Access interface based on Web Services
  – Internal functionality is delivered using SOAP communication
• Data access via ftp, gsiftp and gass protocols
  – Data are accessed using external file servers integrated with the SRS module
• Advanced caching system
  – Databanks and entries are cached and reused in subsequent user requests
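Shell-based access combined with caching can be sketched as a small client that builds an SRS command line, runs it through a pluggable runner, and keeps recent results in an LRU cache so repeated requests skip the shell call. This Python sketch is hypothetical throughout: the `SrsShellClient` class is invented, the `getz -e '[databank-id:entry]'` command form is an assumption modelled on the usual SRS query style and must be checked against the installed SRS version, and the cache size is arbitrary. Only entry lookups are shown; databank listings could be cached the same way.

```python
import subprocess
from collections import OrderedDict
from typing import Callable, List

Runner = Callable[[List[str]], str]


def run_shell(argv: List[str]) -> str:
    """Default runner: execute the command and return its standard output."""
    return subprocess.run(argv, capture_output=True, text=True, check=True).stdout


class SrsShellClient:
    """Sends SRS operations as shell commands and caches their results (LRU)."""

    def __init__(self, runner: Runner = run_shell, capacity: int = 128) -> None:
        self._run = runner
        self._capacity = capacity
        self._cache: "OrderedDict[str, str]" = OrderedDict()
        self.shell_calls = 0  # number of cache misses that reached the shell

    def _cached(self, key: str, argv: List[str]) -> str:
        if key in self._cache:
            self._cache.move_to_end(key)  # mark as most recently used
            return self._cache[key]
        result = self._run(argv)
        self.shell_calls += 1
        self._cache[key] = result
        if len(self._cache) > self._capacity:
            self._cache.popitem(last=False)  # evict the least recently used result
        return result

    def get_entry(self, databank: str, entry_id: str) -> str:
        # ASSUMPTION: query syntax and the -e flag; verify against your SRS installation.
        query = f"[{databank}-id:{entry_id}]"
        return self._cached(f"entry:{databank}:{entry_id}", ["getz", "-e", query])


# Usage with a fake runner, so no SRS installation is needed.
calls: List[List[str]] = []


def fake_runner(argv: List[str]) -> str:
    calls.append(argv)
    return f"entry for {argv[-1]}"


client = SrsShellClient(runner=fake_runner, capacity=2)
client.get_entry("swissprot", "P12345")
client.get_entry("swissprot", "P12345")  # second request is served from the cache
print(client.shell_calls)
```

Injecting the runner keeps the caching logic testable without SRS; passing the command as an argument list rather than a single string also avoids shell quoting problems with the bracketed query.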
Portal Interface – databanks list
Portal Interface – databank content
Portal Interface - searching
Portal Interface – search results
Portal Interface – copying entries
Portal Interface – file properties
DMS Installation Requirements
• Java virtual machine: Java(TM) 2 Runtime Environment, Standard Edition 1.4.1 or higher recommended
• Database server: DMS works with the Oracle and PostgreSQL engines:
  – Oracle: Oracle8i or higher recommended
  – PostgreSQL: version 7.3 or higher is required, with the additional extensions:
    • chkpass and tablefunc from the contrib package
    • PL/pgSQL support
Usage scenario: PROGRESS HPC Portal
• SRS resources can be used as input for grid jobs that are created, configured and submitted for execution in the grid with the PROGRESS HPC Portal
• An example application is AminSim (amino acid sequence similarity), developed by Prof. Jacek Blazewicz's group at the Institute of Computing Science, Poznan University of Technology
AminSim portlet (1)
AminSim portlet (2)
AminSim portlet (3)
AminSim portlet (4)
Conclusions
• SRS resources have been integrated with the distributed file structure of DMS and enabled for use within a grid-portal environment (PROGRESS HPC Portal)
• A web interface (DMS Portal) makes exploring SRS resources more efficient:
  – fast copying of interesting entries directly to the user's home directory
  – merging files
  – saving files in various formats (e.g. FASTA)
• The universal access layer to scientific databases can be used to connect other data sources to the Data Management System
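Saving entries as FASTA and merging several entries into one file are simple text transformations. The sketch below uses Python with hypothetical function names and toy sequences: each record becomes a `>` header line followed by the sequence wrapped at a fixed width. Wrapping at 60 residues per line is a common convention, not a requirement of the format, and this is not the DMS Portal's own exporter.

```python
from typing import Iterable, Tuple


def to_fasta(identifier: str, description: str, sequence: str, width: int = 60) -> str:
    """Format one record: '>' header line, then the sequence wrapped at `width`."""
    seq = "".join(sequence.split())  # drop spaces/newlines left over from flat-file layout
    header = f">{identifier} {description}".rstrip()
    lines = [seq[i:i + width] for i in range(0, len(seq), width)]
    return "\n".join([header] + lines) + "\n"


def merge_fasta(records: Iterable[Tuple[str, str, str]], width: int = 60) -> str:
    """Concatenate several (id, description, sequence) records into one multi-FASTA text."""
    return "".join(to_fasta(i, d, s, width) for i, d, s in records)


print(merge_fasta([("P1", "toy protein", "MKT AYIAK QR"), ("P2", "", "GAVL")], width=5), end="")
```

Merging is just concatenation because a multi-FASTA file is a sequence of independent records, each introduced by its own `>` line.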
Contact info
• Check http://dms.progress.psnc.pl for more information about DMS
• Check http://dms.progress.psnc.pl/docs/demo.htm for the DMS Portal demo
• Check http://progress.psnc.pl for more information about PROGRESS
• Download it now: http://progress.psnc.pl
• Mail DMS team: [email protected]
• Mail PROGRESS team: [email protected]