TRANSCRIPT
1
www.lpds.sztaki.hu/pgportal
[email protected]
P-GRADE Portal Family for e-Science Communities
Peter Kacsuk, MTA SZTAKI
Univ. of Westminster
2
The community aspects of e-science
• Web2 is about creating and supporting web communities
• Grid is about creating virtual organizations where e-science communities
– can share resources and
– can collaborate
• A portal should support e-science communities in their collaborations and resource sharing
• And even more: it should provide simultaneous access to any accessible resources, databases, legacy applications, workflows, etc., no matter in which grid they are operated
3
Who are the members of an e-science community?
Grid Portal Developers
• Develop the portal core services (job submission, etc.)
• Develop higher level portal services (workflow management, etc.)
• Develop specialized/customized portal services (grid testing, rendering, etc.)
• Write technical, user and installation manuals

End-users (e-scientists)
• Execute the published applications with custom input parameters by creating application instances, using the published applications as templates

Grid Application Developers
• Develop grid applications with the portal
• Publish the completed applications for end-users
4
What does an individual e-scientist need?
• Access to a large set of ready-to-run scientific applications (services) in an application repository
• Using a portal to parameterize and run these applications, by transparently accessing a large set of various IT resources from the e-science infrastructure: clouds, local clusters, supercomputers, and grid systems, i.e. desktop grids (DGs: BOINC, Condor, etc.), cluster based service grids (SGs: EGEE, OSG, etc.), and supercomputer based SGs (DEISA, TeraGrid)
5
What does an e-science community need?
• The same as an individual scientist, but in collaboration with other members of the community: e-scientists and application developers share the portal, the application repository, and the e-science infrastructure (clouds, local clusters, supercomputers and grid systems)
6
Collaboration between e-scientists and application developers
End-users (e-scientists)
• Specify the problem/application needs
• Execute the published applications via the portal with custom input parameters by creating application instances

Application Developers
• Develop e-science applications via the portal in collaboration with e-scientists
• Publish the completed applications for end-users via an application repository
7
Collaboration between application developers
• Application developers use the portal to develop complex applications (e.g. parameter sweep workflows) for the e-science infrastructure (clouds, local clusters, supercomputers and grid systems)
• They publish templates, legacy code applications and half-made applications in the repository, to be continued by other application developers
8
Collaboration between e-scientists
• Sharing parameterized applications via the repository
• Jointly running applications via the portal in the e-science infrastructure (clouds, local clusters, supercomputers and grid systems)
• Joint observation and control of application execution via the portal
9
Requirements for an e-science portal from the e-scientists’ point of view
It should be able to
• Support a large number of e-scientists (~100) with good response time
• Enable storing and sharing ready-to-run applications
• Enable parameterizing and running applications
• Enable observing and controlling application execution
• Provide a reliable application execution service even on top of unreliable infrastructures (such as grids)
• Provide specific user community views
• Enable access to the various components of an e-science infrastructure (grids, databases, clouds, local clusters, etc.)
• Support users' collaboration via sharing:
– Applications (legacy, workflow, etc.)
– Databases
10
Requirements for an e-science portal from the app. developers’ point of view
It should be able to
• Support a large number of application developers (~100) with good response time
• Enable storing and sharing half-made applications and application templates
• Provide graphical application development tools (e.g. a workflow editor) to develop new applications
• Enable parameterizing and running applications
• Enable observing and controlling application execution
• Provide methods and an API to customize the portal interface towards specific user community needs by creating user-specific portlets
• Enable access to the various components of an e-science infrastructure (grids, databases, clouds, local clusters, etc.)
• Support application developers' collaboration via sharing:
– Applications (legacy, workflow, etc.)
– Databases
• Enable the integration/call of other services
11
Choice of an e-science portal
• Basic question for a community:
– Buy a commercial portal? (Usually expensive)
– Download an OSS portal? (A good choice, but will the OSS project survive for a long time?)
– Develop your own portal? (Requires a long time and can become very costly)
• The best choice: download an OSS portal that has an active development community behind it
12
The role of the Grid portal developers’ community
Grid Portal Developers
• Jointly develop the portal core services (e.g. GridSphere, OGCE, Jetspeed-2, etc.)
• Jointly develop higher level portal services (workflow management, data management, etc.)
• Jointly develop specialized/customized portal services (grid testing, rendering, etc.)
• Never build a new portal from scratch; use the power of the community to create really good portals

• Unfortunately, we are not quite there:
– Hundreds of e-science portals have been developed
– Some of them are really good: Genius, Lead, etc.
– However, not many of them are OSS (see the SourceForge list on the next slide)
– Even fewer are actively maintained
– Even fewer satisfy the generic requirements of a good e-science portal
13
Downloadable Grid portals from SourceForge
Portal                     Generic     Since        Number of downloads   Active or finished
P-GRADE                    yes         2008-01-04   1468                  Active
SDSC Gridport              yes         2003-10-01   1266                  Finished 2004-01-15
Lunarc App.                yes         2006-10-05   783                   Active
GRIDPortal for NorduGrid   yes         2006-07-07   231                   Finished 2006-08-09
NCHC                       yes         2007-11-07   161                   Active
Telemed                    App. spec.  2007-11-15   283                   Active
14
P-GRADE portal family
• The goal of the P-GRADE portal family:
– To meet all the requirements of end-users and application developers listed above
– To provide a generic portal that can be used by a large set of e-science communities
– To provide a community code base on which the portal developers' community can start to develop specialized and customized portals
15
P-GRADE portal family
Basic concept, open source from Jan. 2008; development timeline 2008-2010:
• P-GRADE portal: 2.4 -> 2.5 (parameter sweep) -> 2.8 (current release) -> 2.9 (under development)
• NGS P-GRADE portal: adds GEMLCA (Grid Legacy Code Architecture)
• WS-PGRADE portal: beta release 3.3 -> release 3.4; GEMLCA, repository concept
16
P-GRADE Portal in a nutshell
• General purpose, workflow-oriented Grid portal. Supports the development and execution of workflow-based Grid applications: a tool for Grid orchestration
• Based on GridSphere-2
– Easy to expand with new portlets (e.g. application-specific portlets)
– Easy to tailor to end-user needs
• Basic Grid services supported by the portal:

Service                       EGEE grids (LCG-2/gLite)          Globus 2 grids
Job submission                Computing Element                 GRAM
File storage                  Storage Element, LFC              GridFTP server
Certificate management        MyProxy/VOMS
Information system            BDII                              MDS-2, MDS-4
Brokering                     WMS (Workload Management System)  GTbroker
Job monitoring                Mercury
Workflow & job visualization  PROVE
17
The typical user scenario
Part 1 - development phase
• Start the workflow editor (from the portal server)
• Open & edit or develop a workflow
• Save the workflow, upload local files
18
The typical user scenario
Part 2 - execution phase
• Download proxy certificates (from the certificate servers)
• Submit the workflow
• The portal server transfers files and submits jobs to the grid services
• Monitor jobs
• Visualize jobs and workflow progress
• Download (small) results
19
P-GRADE Portal architecture
• Client: web browser; Java Webstart workflow editor
• P-GRADE Portal server
– Frontend layer: Tomcat with the P-GRADE Portal portlets (JSR-168 GridSphere-2 portlets)
– Backend layer: DAGMan workflow manager and grid middleware clients (information system clients, CoG API & scripts, shell scripts)
• Grid: grid middleware services (gLite WMS, LFC, ...; Globus GRAM, ...), gLite and Globus information systems, MyProxy server & VOMS
20
P-GRADE portal in a nutshell
• Certificate and proxy management
• Grid and Grid resource management
• Graphical editor to define workflows and parametric studies
• Accessing resources in multiple VOs
• Built-in workflow manager and execution visualization
• GUI customizable to certain applications
21
What is a P-GRADE Portal workflow?
• A directed acyclic graph where
– Nodes represent jobs (batch programs to be executed on a computing element)
– Ports represent input/output files the jobs expect/produce
– Arcs represent file transfer operations and job dependencies
• Semantics of the workflow:
– A job can be executed if all of its input files are available
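These semantics can be sketched as a tiny scheduler over such a graph: repeatedly launch every job whose input files are all available. The job and file names below are invented for illustration; they are not part of the portal.

```python
def run_workflow(jobs, initial_files):
    """Launch every job whose input files are all available.

    jobs maps a job name to (input_files, output_files); arcs of the DAG
    are implied by shared file names: an arc A -> B exists when an output
    file of A is an input file of B.
    """
    available = set(initial_files)      # files present before execution
    done, order = set(), []
    progress = True
    while progress:
        progress = False
        for name, (inputs, outputs) in jobs.items():
            if name not in done and set(inputs) <= available:
                order.append(name)      # the job is "submitted" here
                available |= set(outputs)
                done.add(name)
                progress = True
    if len(done) != len(jobs):
        raise RuntimeError("unrunnable jobs: %s" % (set(jobs) - done))
    return order

# Hypothetical four-node workflow: gen feeds two independent jobs whose
# results are merged by coll.
wf = {
    "gen":  ([], ["a.dat", "b.dat"]),
    "simA": (["a.dat"], ["ra.dat"]),
    "simB": (["b.dat"], ["rb.dat"]),
    "coll": (["ra.dat", "rb.dat"], ["result.dat"]),
}
order = run_workflow(wf, [])
print(order)  # ['gen', 'simA', 'simB', 'coll']
```

Note how the dependency between simA/simB and coll is never stated explicitly; it emerges from the file names, exactly as the arcs of the workflow graph do.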
22
Introducing three levels of parallelism
• Parallel execution inside a workflow node: each job can be a parallel program
• Parallel execution among workflow nodes: multiple jobs run in parallel
• Parameter study execution of the workflow: multiple instances of the same workflow run with different data files
23
Parameter sweep (PS) workflow execution based on the black box concept
• One PS port with 4 instances of its input file, another PS port with 3 instances
• 1 PS workflow execution = 4 x 3 = 12 normal executable workflows (e-workflows)
• This provides the 3rd level of parallelism, resulting in a very large demand for Grid resources
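The black-box multiplication above is a plain cross product of the PS ports' file instances; the file names below are placeholders, not the portal's naming scheme.

```python
from itertools import product

port_a = [f"inA_{i}.dat" for i in range(4)]  # 4 instances on one PS port
port_b = [f"inB_{j}.dat" for j in range(3)]  # 3 instances on the other

# Each combination parameterizes one normal executable workflow (e-workflow).
e_workflows = list(product(port_a, port_b))
print(len(e_workflows))  # 12
```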
24
Workflow parameter studies in P-GRADE Portal
• Generator component(s): generate or cut the initial input data into smaller pieces
• Core workflow and its generated e-workflow instances read files from the same LFC catalog (e.g. /grid/gilda/sipos/myinputs)
• Collector component(s): aggregate the results, produced in the same catalog
25
Generic structure of PS workflows and their execution
• Core workflow to be executed as a PS, with generator jobs to generate the set of input files and collector jobs to collect and process the set of output files
• 1st phase: executing all Generators in parallel
• 2nd phase: executing all generated e-workflows in parallel
• 3rd phase: executing all Collectors in parallel
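The three phases can be sketched as follows; the generator, core workflow and collector bodies are trivial stand-ins, and threads stand in for grid resources.

```python
from concurrent.futures import ThreadPoolExecutor

def generator(initial_input, n):
    # 1st phase: cut the initial input into n smaller pieces.
    return [f"{initial_input}.piece{i}" for i in range(n)]

def e_workflow(piece):
    # 2nd phase: one instance of the core workflow per input piece.
    return f"result({piece})"

def collector(partial_results):
    # 3rd phase: aggregate the partial results.
    return sorted(partial_results)

with ThreadPoolExecutor() as pool:
    pieces = generator("input", 5)
    partial = list(pool.map(e_workflow, pieces))  # e-workflows in parallel
    final = collector(partial)
print(len(final))  # 5
```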
26
Integrating P-GRADE portal with DSpace repository
• Goal: to make workflow applications available for the whole P-GRADE portal user community
• Solution: integrating the P-GRADE portal with a DSpace repository
• Functions:
– App developers can publish their ready-to-use and half-made applications in the repository
– End-users can download, parameterize and execute the applications stored in the repository
• Advantages:
– Application developers can collaborate with end-users
– Members of a portal user community can share their WFs
– Different portal user communities can share their WFs
27
Integrating P-GRADE portal with DSpace repository
(Screenshots: uploading a WF to the DSpace repository and downloading a WF from DSpace)
28
Creating application specific portals from the generic P-GRADE portal
• Creating an application-specific portal does not mean developing it from scratch
• P-GRADE is a generic portal that can quickly and easily be customized to any application type
• Advantages:
– You do not have to develop the generic parts (WF editor, WF manager, job submission, monitoring, etc.)
– You can concentrate on the application-specific part
– Much shorter development time
29
Concept of creating application specific portals
• End users reach a custom user interface (written in Java, JSP, JSTL) through a web browser; this UI is built by the application developer
• The custom UI is connected, via the Application Specific Module (ASM), to the services of the P-GRADE Portal (workflow management, parameter study management, fault tolerance, ...), developed by the P-GRADE portal developers
• The P-GRADE Portal server submits to EGEE and Globus Grid services (gLite WMS, LFC, ...; Globus GRAM, ...)
30
Roles of people in creating and using customized P-GRADE portals
Grid Application Developer
• Develops a grid application with P-GRADE Portal
• Sends the application to the grid portal developer

Grid Portal Developer
• Creates new classes from the ASM for P-GRADE by changing the names of the classes
• Develops one or more GridSphere portlets that fit the application I/O pattern and the end users' needs
• Connects the GUI to P-GRADE Portal using the programming API of the P-GRADE ASM
• Using the ASM, publishes the grid application and its GUI for end users

End User
• Executes the published application with custom input parameters by creating application instances, using the published application as a template

They can be the same group.
31
Application Specific P-GRADE portals
Rendering portal by Univ. of Westminster
OMNeT++ portal by SZTAKI
Traffic simulation portal by Univ. of Westminster
32
Grid interoperation by P-GRADE portal
• P-GRADE Portal enables the simultaneous usage of several production Grids at workflow level
• Currently connectable grids:
– LCG-2 and gLite: EGEE, SEE-GRID, BalticGrid
– GT-2: UK NGS, US OSG, US TeraGrid
• In progress:
– Campus Grids with PBS or LSF
– BOINC desktop Grids
– ARC: NorduGrid
– UNICORE: D-Grid
33
Simultaneous use of production Grids at workflow level
• The user submits a workflow via the P-GRADE Portal running on the SZTAKI portal server
• Jobs of the same workflow run simultaneously on the UK NGS (GT2: Manchester, Leeds) and in the EGEE VOCE VO (gLite: Budapest, Athens, Brno), the latter through the WMS broker
• Supports both direct and brokered job submission
34
P-GRADE Portal references
• P-GRADE Portal services:
– SEE-GRID, BalticGrid
– Central European VO of EGEE
– GILDA: training VO of EGEE
– Many national Grids (UK, Ireland, Croatia, Turkey, Spain, Belgium, Malaysia, Kazakhstan, Switzerland, Australia, etc.)
– US Open Science Grid, TeraGrid
– Economy-Grid, Swiss BioGrid, Bio and Biomed EGEE VOs, MathGrid, etc.
• Portal services and account request: portal.p-grade.hu/index.php?m=5&s=0
35
Community based business model for the sustainability of P-GRADE portal
• Some of the developments are related to EU projects. Examples:
– PS feature: SEE-GRID-2
– Integration with DSpace: SEE-GRID-SCI
– Integration with BOINC: EDGeS, CancerGrid
• There is an open Portal Developer Alliance with the current active members:
– Middle East Technical Univ. (Ankara, Turkey)
• gLite file catalog management portlet
– Univ. of Westminster (London, UK)
• GEMLCA legacy code service extension
• SRB integration (workflow and portlet)
• OGSA-DAI integration (workflow and portlet)
• Embedding Taverna, Kepler and Triana WFs into the P-GRADE workflow
• All these features are available in the UK NGS P-GRADE portal
36
Business model for the sustainability of P-GRADE portal
• Some of the developments are ordered by customer academic institutes:
– Collaborative WF editor: Reading Univ. (UK)
– Accounting portlet: MIMOS (Malaysia)
– Separation of front-end and back-end: MIMOS
– Shibboleth integration: ETH Zurich
– ARC integration: ETH Zurich
• Benefits for the customer academic institutes:
– Basically they like the portal, but they have some special needs that require extra development
– Instead of developing a new portal from scratch (using many person-months), they pay only for the required small extension/modification of the portal
– Solving their problem gets priority
– They become experts in the internal structure of the portal and will be able to develop it further according to their needs
– Joint publications
37
Main features of NGS P-GRADE portal
• Extends P-GRADE portal with:
– GEMLCA legacy code architecture and repository
– SRB file management
– OGSA-DAI database access
– WF level interoperation of grid data resources
– Workflow interoperability support
• All these features are provided as a production service for the UK NGS
38
Interoperation of grid data resources
• A single workflow engine drives jobs (J1-J5) spread over Grid 1 and Grid 2
• The jobs read and write file storage systems (FS1, FS2) and database management systems (DB1, DB2) in both grids
• Legend: J: job; FS: file storage system, e.g. SRB or SRM; DB: database management system (based on OGSA-DAI)
39
Workflow level interoperation of local, SRB, SRM and GridFTP file systems
• Jobs can run in various grids and can read and write files stored in different grid systems managed by different file management systems
• In the example workflow, jobs running at NGS, EGEE and OSG take their inputs from NGS SRB, NGS GridFTP or local files, and write their outputs to EGEE SRM or NGS SRB
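One way to read this: each file reference names its storage system, and staging is dispatched on that name. The URL schemes and handler actions below are an illustrative sketch, not the portal's actual mechanism.

```python
def stage_file(url):
    """Pick a transfer handler from the storage scheme of the URL (sketch)."""
    handlers = {
        "srb":    lambda u: "SRB get "     + u,  # e.g. NGS SRB
        "srm":    lambda u: "SRM copy "    + u,  # e.g. EGEE SRM
        "gsiftp": lambda u: "GridFTP get " + u,  # GridFTP server
        "file":   lambda u: "local copy "  + u,  # local file system
    }
    scheme = url.split("://", 1)[0]
    return handlers[scheme](url)

print(stage_file("srm://se.example.org/data/out.dat"))
```

With such a dispatch layer, a job's code never needs to know which grid its inputs came from, which is what makes the mixed NGS/EGEE/OSG workflow above possible.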
40
WF interoperability: P-GRADE workflow embedding Triana, Taverna, and Kepler workflows
• A P-GRADE workflow hosts embedded Taverna, Kepler and Triana workflows as nodes
• Available for UK NGS users as a production service
41
WS-PGRADE and gUSE
• New product in the P-GRADE portal family: WS-PGRADE (Web Services Parallel Grid Runtime and Developer Environment)
• WS-PGRADE uses the high-level services of the gUSE (Grid User Support Environment) architecture
• Integrates and generalizes P-GRADE portal and NGS P-GRADE portal features:
– Advanced dataflows (PS features)
– GEMLCA
– Workflow repository
• gUSE features:
– Scalable architecture (can be installed on one or more servers)
– Various grid submission services (GT2, GT4, LCG-2, gLite, BOINC, local)
– Built-in inter-grid broker (seamless access to various types of resources)
• Comfort features:
– Different separated user views supported by the gUSE application repository
42
gUSE: service-oriented architecture
• Graphical user interface: WS-PGRADE (GridSphere portlets)
• Autonomous services (high level middleware service layer): workflow engine, workflow storage, file storage, application repository, logging, gUSE information system, meta-broker, submitters
• Resources (middleware service layer): local resources, service grid resources, desktop grid resources, web services, databases
43
Ergonomics
• Users can be grid application developers or end-users
• Application developers design sophisticated dataflow graphs
– Embedding to any depth, recursive invocations, conditional structures, generators and collectors at any position
– They publish applications in the repository at certain stages of work: applications, projects, concrete workflows, templates, graphs
• End-users see the WS-PGRADE portal as a science gateway
– List of ready-to-use applications in the gUSE repository
– Import and execute applications without knowledge of programming, dataflow or grid details
44
Dataflow programming concept for appl. developers
• Cross & dot product data-pairing
– Concept similar to Taverna
– All-to-all vs. one-to-one pairing of data items
• Any component can be a generator, PS node or collector; no ordering restriction
• Conditional execution based on equality of data
• Nesting, recursion
(Example dataflow graph on the slide: 7042 tasks)
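The two pairing modes map directly onto a cross product and an element-wise zip; the data item names below are placeholders.

```python
from itertools import product

xs = ["x0", "x1", "x2"]
ys = ["y0", "y1", "y2"]

cross = list(product(xs, ys))  # all-to-all pairing: 3 x 3 = 9 task instances
dot   = list(zip(xs, ys))      # one-to-one pairing: 3 task instances
print(len(cross), len(dot))    # 9 3
```

Cross product on every port is what blows a small graph up into thousands of tasks; dot product keeps the instance count equal to the shortest input list.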
45
Current users of gUSE beta release
• CancerGrid project
– Predicting various properties of molecules to find anti-cancer leads
– Creating science gateway for chemists
• EDGeS project (Enabling Desktop Grids for e-Science)
– Integrating EGEE with BOINC and XtremWeb technologies
– User interfaces and tools
• ProSim project
– In silico simulation of intermolecular recognition
– JISC ENGAGE program (UK)
46
The CancerGrid infrastructure
• Molecule database server: chemists browse molecules; executing workflows read and write the molecule database
• Portal and desktop grid server: the gUSE portal (with its portal storage) generates DG jobs; the 3G Bridge turns them into work units (WU 1..WU N) served by the BOINC server, while local jobs run on a local resource
• DG clients from all partners: each BOINC client executes its work units with GenWrapper for batch execution of the legacy application
47
CancerGrid workflow
• Two generator jobs fan out the inputs: N = 30K molecule inputs are expanded to NxM = 3 million task instances (M = 100)
• N = 30K, M = 100 -> about 0.5 year execution time
• Executed on a local desktop Grid
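The task count follows from plain multiplication; the implied throughput below is derived only from the slide's own figures (taking 0.5 year as 182.5 days), not from a measurement.

```python
N, M = 30_000, 100               # molecules x tasks per molecule (from the slide)
tasks = N * M                    # total desktop-grid work units
days = 0.5 * 365                 # "about 0.5 year execution time"
throughput = tasks / days        # implied average completed tasks per day
print(tasks, round(throughput))  # 3000000 16438
```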
48
Protein Molecule Simulation on the Grid
gUSE in the ProSim Project
Grid Computing team of Univ. of Westminster
49
The User Scenario
• Inputs: PDB file 1 (receptor) and PDB file 2 (ligand)
• Energy minimization (Gromacs)
• Validate and check (MolProbity)
• Perform docking (AutoDock)
• Molecular dynamics (Gromacs)
50
The Workflow in gUSE
• Parameter sweeps in phases 3 and 4
• Executed on 5 different sites of the UK NGS
51
The ProSim visualiser
52
P-GRADE portal family summary
Feature                          P-GRADE            NGS P-GRADE                 WS-PGRADE
Scalability                      ++                 +                           +++
Repository                       DSpace/WF          Job & legacy code services  WF (own development)
Graphical workflow editor        +                  +                           +
Parameter sweep support          +                  -                           ++
Access to various grids          GT2, LCG-2, gLite  GT2, LCG-2, gLite, GT4      GT2, LCG-2, gLite, GT4, BOINC, campus
Access to clouds                 In progress        -                           In progress
Access to databases              -                  via OGSA-DAI                SQL
Support for WF interoperability  -                  +                           In progress
53
Further information...
– Take a look at www.lpds.sztaki.hu/pgportal (manuals, slide shows, installation procedure, etc.)
– Visit or request a training event! (list of events is on the P-GRADE Portal homepage)
• Lectures, demos, hands-on tutorials, application development support
– Get an account for the GILDA P-GRADE Portal: www.portal.p-grade.hu/gilda
– Get an account for one of its production installations:
• Multi-grid portal (SZTAKI) for VOCE, SEEGRID, HUNGrid, Biomed VO, Compchem VO, ASTRO VO
• NGS P-GRADE portal (Univ. of Westminster) for UK NGS
– Install a portal for your community:
• If you are the administrator of a Grid/VO, download the portal from SourceForge (http://sourceforge.net/projects/pgportal/)
• SZTAKI is pleased to help you install a portal for your community!
54
Thank you for your attention!
Any questions?
www.portal.p-grade.hu www.wspgrade.hu