SARA Reken- en Netwerkdiensten
ToPoS: High-Throughput Parallel Processing Pipelines on the Grid
Pieter van Beek
SARA Computing and Networking Services
High Performance Computing and Visualization
e-Science Support
ToPoS | 23 October 2008
User experiences with gLite
Overhead for starting jobs is considerable
Determining the best chunk size is difficult:
Too small -> large overhead
Too large -> timeouts and throughput problems
Resource brokering is far from optimal
Jobs often fail and users create their own tools for administrative tasks
Resource Brokering
Submitted jobs are sent to a CE immediately.
When another CE becomes available later, jobs that are already submitted do not use it automatically.
Failing Jobs (1)
Common experiences:
Sorry, an Incomprehensible Error occurred
Your VOMS Credential has expired
What Job?
Success! (but there’s no output)
Failure! (but it ran just fine)
Out of Wall-time (but no CPU-time?)
A lot of “monitoring and resubmission” software is created again and again by many users.
Failing Jobs (2)
A real-world example:
27,000 jobs
Duration per job: approx. 4 hrs
Approx. 280 WNs (worker nodes)
Theoretical duration: 16 days
But with a success rate of 70% … approx. 9 resubmissions
“Practical” duration: >2 months
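The figures on this slide can be checked with a few lines of arithmetic. The sketch below is a rough model only: it assumes all worker nodes stay busy and that failures are independent, so each resubmission round leaves 30% of the remaining jobs unfinished. The variable names are illustrative. The “practical” duration is an observation from the slide and is not derived here.

```python
import math

jobs = 27_000          # number of jobs (from the slide)
hours_per_job = 4.0    # approximate duration of one job (from the slide)
worker_nodes = 280     # approximate number of worker nodes (from the slide)
success_rate = 0.70    # fraction of jobs that succeed per attempt (from the slide)

# Ideal case: every job succeeds and all worker nodes stay busy.
theoretical_days = jobs * hours_per_job / worker_nodes / 24

# Assumption: failures are independent, so after n resubmission rounds
# about jobs * (1 - success_rate) ** n jobs are still unfinished.
# Find the smallest n that leaves fewer than one unfinished job.
failure_rate = 1 - success_rate
resubmissions = math.ceil(math.log(jobs) / math.log(1 / failure_rate))

print(f"theoretical duration: {theoretical_days:.1f} days")
print(f"resubmission rounds needed: {resubmissions}")
```

Under these assumptions the model gives about 16 days and 9 resubmission rounds, in line with the slide.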
Pilot Jobs
[Figure: “Normal” jobs compared with pilot jobs]
Simplest possible solution: Topos I
An online counter, like a “page views” counter
Numbers are “leased” for some period
Leases must be renewed
Interfaced with HTTP (REST web service)
Can be used with any HTTP client (wget, browsers)
As little security as possible
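The leasing idea above can be sketched in memory as follows. This is not the Topos I implementation or its HTTP interface: the class `LeasedCounter`, its methods, and the injectable `clock` are hypothetical names chosen for illustration. A leased number is hidden from other clients until the lease runs out, and it comes back into circulation if the lease is not renewed.

```python
import time


class LeasedCounter:
    """Hands out numbers 0..size-1; a leased number is hidden until its lease expires."""

    def __init__(self, size, lease_seconds, clock=time.monotonic):
        self.lease_seconds = lease_seconds
        self.clock = clock  # injectable so the behaviour can be shown without waiting
        # number -> time at which its lease expires (-inf means "never leased")
        self.expiry = {n: float("-inf") for n in range(size)}

    def lease(self):
        """Return a number that is not currently leased, or None if all are taken."""
        now = self.clock()
        for number, expires in self.expiry.items():
            if expires <= now:
                self.expiry[number] = now + self.lease_seconds
                return number
        return None

    def renew(self, number):
        """Extend a lease that is still valid; return False if it already expired."""
        now = self.clock()
        if self.expiry.get(number, float("-inf")) > now:
            self.expiry[number] = now + self.lease_seconds
            return True
        return False

    def delete(self, number):
        """Remove a number for good once its task is finished."""
        self.expiry.pop(number, None)


# Demonstration with a fake clock (all values are illustrative).
fake_now = [0.0]
counter = LeasedCounter(size=2, lease_seconds=10, clock=lambda: fake_now[0])
first = counter.lease()      # lease valid until t=10
second = counter.lease()     # lease valid until t=10
fake_now[0] = 5.0
counter.renew(first)         # lease on `first` now valid until t=15
fake_now[0] = 11.0           # lease on `second` has expired
reissued = counter.lease()   # `second` is handed out again
print(first, second, reissued)
```

A client that crashes simply stops renewing, so its number is reissued without any explicit failure report.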
Pilot job flow
[Flowchart. Steps shown: submit pilot job; running pilot job; get unused token; pilot job with token; execute token task; affirm token use; decision “Finished?” (yes / no); delete token.]
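The flow above can be sketched as a loop. The code below is an in-memory illustration under stated assumptions, not the ToPoS client: `TokenPool`, `run_pilot`, and `max_tasks` are hypothetical names, `release` stands in for a lease that expires after a failure, and lease renewal (“affirm token use”) is left out for brevity.

```python
class TokenPool:
    """In-memory stand-in for a token pool server (hypothetical API)."""

    def __init__(self, tokens):
        self.unused = list(tokens)  # tokens nobody is working on
        self.in_use = set()         # tokens currently held by a pilot job

    def get_unused(self):
        """Hand out the next unused token, or None when the pool is empty."""
        if not self.unused:
            return None
        token = self.unused.pop(0)
        self.in_use.add(token)
        return token

    def release(self, token):
        """Stand-in for a lease expiring: the token becomes available again."""
        self.in_use.discard(token)
        self.unused.append(token)

    def delete(self, token):
        """Remove a token whose task finished successfully."""
        self.in_use.discard(token)


def run_pilot(pool, execute, max_tasks):
    """Process tokens until the pool is empty or the task budget is used up.

    max_tasks stands in for the wall-time limit of a real pilot job.
    """
    finished = 0
    for _ in range(max_tasks):
        token = pool.get_unused()
        if token is None:
            break
        try:
            execute(token)
        except Exception:
            pool.release(token)  # failed task: the token stays in the pool
        else:
            pool.delete(token)   # finished: the token is removed
            finished += 1
    return finished


# Demonstration: the task for token "b" fails on its first attempt only.
attempts = []

def flaky_task(token):
    attempts.append(token)
    if token == "b" and attempts.count("b") == 1:
        raise RuntimeError("simulated failure")

pool = TokenPool(["a", "b", "c"])
finished = run_pilot(pool, flaky_task, max_tasks=10)
print(finished, attempts)
```

The failed token is picked up again by the same loop, which is the sense in which resubmission becomes automatic: no token leaves the pool until its task has succeeded.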
Advantages
Simple design and use, based on HTTP REST
Automatic resubmissions
Less overhead for large number of jobs. One pilot job can execute several tasks in sequence.
Improved scheduling
Easy job administration by querying the Token Pool Server: progress, fail rate
Topos I screenshots
ToPoS | 14 November 2008
Topos 2.x
Interfaced with WebDAV instead of plain HTTP
Tokens are files, i.e. they have identity, content, a MIME type and properties
Token pools are directories
Tokens can be moved between directories
Allows users to build pipelines and workflows (high-level colored Petri nets)
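The pipeline idea can be illustrated with a local directory tree standing in for the WebDAV server. This is a sketch under that assumption, not the Topos 2 interface: the pool names `stage1`, `stage2`, `done` and the helper `run_stage` are hypothetical. In Petri-net terms, each directory plays the role of a place, each file is a token, and each processing stage is a transition that moves tokens from one place to the next.

```python
import tempfile
from pathlib import Path


def run_stage(src_pool: Path, dst_pool: Path, transform) -> int:
    """Process every token in src_pool and move it to dst_pool.

    The move marks the stage as completed for that token.
    """
    moved = 0
    for token in sorted(src_pool.iterdir()):
        token.write_text(transform(token.read_text()))
        token.rename(dst_pool / token.name)
        moved += 1
    return moved


with tempfile.TemporaryDirectory() as root:
    # Token pools are directories (names are illustrative).
    stage1, stage2, done = (Path(root) / name for name in ("stage1", "stage2", "done"))
    for pool in (stage1, stage2, done):
        pool.mkdir()

    # Tokens are files with a name (identity) and content.
    for i in range(3):
        (stage1 / f"token{i}").write_text(f"sequence {i}")

    # Two pipeline stages, each moving tokens to the next pool.
    n1 = run_stage(stage1, stage2, str.upper)
    n2 = run_stage(stage2, done, lambda text: text + " [scored]")

    results = {p.name: p.read_text() for p in sorted(done.iterdir())}

print(n1, n2, results)
```

On a real WebDAV server the move would be done with the protocol's MOVE method instead of a local rename.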
Topos 2 screenshot
“Portfolio”
SciaGrid: collaboration between SRON, KNMI, NIKHEF and SARA
Website where users can select satellite data (Sciamachy) data processors
Arnold Kuzniar and Jack Leunissen (WUR): BLAST protein sequence alignment
Bas Dutilh (CMBI): HAMMER sequence alignment (?)
Jan Bot (TUD)
Future directions
Documentation
Atom/RSS instead of WebDAV
Back to numbers instead of files
TODO