
Page 1: 1 Integrated Workload Management for Beowulf Clusters Bill DeSalvo – August 18, 2004 wdesalvo@platform.com


Integrated Workload Management for Beowulf Clusters

Bill DeSalvo – August 18, 2004

wdesalvo@platform.com


© Platform Computing Inc. 2003

What We’ll Cover

Platform LSF Family of Products

What is Platform LSF HPC

Key Features & Benefits

How it Works

Q&A


Platform’s Grid Solution Architecture


Technical Computing Product Family


Platform LSF Family of Products

Platform LSF: intelligent, policy-driven batch application workload processing

Manage & accelerate batch workloads for compute- and data-intensive applications

Platform LSF HPC: intelligent, policy-driven high performance computing (HPC) workload processing

Manage & accelerate mission-critical HPC workloads

Platform LSF MultiCluster: intelligent, policy-driven batch application workload processing across multiple Platform LSF clusters

Share resources between autonomously managed departments or organizations spanning geographic locations

Complementary Products

Platform LSF License Scheduler: intelligent, policy-driven application license optimization for Platform LSF clusters

Optimize the usage of all application licenses based on an organization’s established distribution policy

Platform LSF Analytics: intelligent delivery of precise information for better project decisions

Better coordinate projects, estimate project completion times and provision resources more accurately

Platform LSF Reports: intelligent cluster operation reporting for Platform LSF clusters

Visibility into cluster utilization


What Problems Are We Solving?

Solve large, complex “grand challenge” problems by optimizing the placement of workload in High Performance Computing environments


Platform LSF HPC

Intelligent, policy-driven high performance computing (HPC) workload processing

Parallel & sequential batch workload management for High Performance Computing (HPC)

Includes patent-pending topology-based scheduling

Intelligently schedules parallel batch jobs

Virtualizes resources

Prioritizes service levels based on policies

Based on Platform LSF:

Standards-based, OGSI-compliant, grid-enabled solution

Commercial, production-quality product


Platform Customers


Platform LSF HPC

Platform LSF HPC for AlphaServer SC

Platform LSF HPC for IBM

Platform LSF HPC for Linux

Platform LSF HPC for SGI

Platform LSF HPC for Cray


Extensive Hardware Support

HP

HP AlphaServer SC

HP XC

HP Superdome

HP-UX 11i

SGI

SGI IRIX

SGI TRIX

SGI Altix, SGI Propack

IBM

IBM RS/6000 AIX

IBM SP2/SP3

Linux

IA-64 systems with RedHat

Intel and AMD 32-bit systems with a Linux kernel

Sun

Sun Solaris

High Performance Interconnects

Myrinet with GM

Quadrics QsNet

SGI NUMAflex, SGI NUMAlink

IBM SP Switch


Platform LSF HPC – Linux Support

HP

HP XC Systems running Unlimited Linux

HP Itanium 2 systems running Linux 2.4.x kernel, glibc 2.2 with RMS on Quadrics QsNet/Elan3

HP Alpha/AXP systems running Linux 2.4.x kernel, glibc 2.2.x with RMS on Quadrics QsNet/Elan3

Linux

IA-64 systems, Kernel 2.4.x, compiled with glibc 2.2.x, tested on RedHat 7.3

x86 systems:

Kernel 2.2.x, compiled with glibc 2.1.x, tested on Debian 2.2, OpenLinux 2.4, RedHat 6.2 and 7.0, SuSE 6.4 and 7.0, TurboLinux 6.1

Kernel 2.4.x, compiled with glibc 2.1.x, tested on RedHat 7.x and 8.0, SuSE 7.0 and RedHat Linux Advanced Server 2.1

Clustermatic Linux 3.0, kernel 2.4.x, compiled with glibc 2.2.x, tested on RedHat 8.0

Scyld Linux, Kernel 2.4.x, compiled with glibc 2.2.x.

SGI

SGI Altix systems running Linux Kernel 2.4.x compiled with glibc 2.2.x and SGI Propack 2.2 and higher


Key Features and Benefits: Platform LSF HPC


Key Features

Optimized Application, System and Hardware Performance

Enhanced Accounting, Auditing & Control

Commercial Grade System Scalability & Reliability

Extensive Hardware Support

Comprehensive, Extensible and Standards-based Security


Key Features – Platform LSF HPC

Optimized Application, System and Hardware Performance

Enhanced Accounting, Auditing & Control

Commercial Grade System Scalability & Reliability

Comprehensive, Extensible and Standards-based Security


Adaptive Interconnect Performance Optimization

Scheduling that takes advantage of unique interconnect properties

IBM SP Switch at the POE software level

RMS on AlphaServer SC (Quadrics)

SGI topology hardware graph

Out-of-the-box functionality without any customization required


Generic Parallel Job Launcher

Generic support for many different types of parallel job launchers

LAM/MPI, MPICH-GM, MPICH-P4, POE, Scali, ChaMPIon/Pro, etc.

Customizable for any vendor or publicly available parallel solution

Control: ensures that no job processes can escape the workload management system


HPC Workload Scheduling

Dynamic load balancing supporting heterogeneous workloads

IBM SP switch aware scheduling

Scheduling of parallel jobs

Number of CPUs, min/max, node span

Backfill on processor & memory

Processor & memory reservation

Topology aware scheduling

Exclusive scheduling

Advance Reservation

Fairshare, Preemption

Accounting
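The “number of CPUs, min/max, node span” item can be illustrated with a small allocator. This is a hypothetical sketch, not Platform LSF's scheduler: it assumes a job asks for between `min_slots` and `max_slots` processors (as with `bsub -n min,max`) and treats `ptile` as a simple cap on slots per host, a simplification of LSF's `span[ptile=...]`; host names and free-slot counts are made up.

```python
from typing import Dict, Optional


def allocate_slots(free_slots: Dict[str, int], min_slots: int, max_slots: int,
                   ptile: Optional[int] = None) -> Optional[Dict[str, int]]:
    """Greedily take up to max_slots processors, at most `ptile` per host.

    Returns a host -> slots mapping, or None if fewer than min_slots are
    available (the job would stay pending).
    """
    allocation: Dict[str, int] = {}
    remaining = max_slots
    # Visit hosts with the most free slots first, to keep the node span small.
    for host, free in sorted(free_slots.items(), key=lambda kv: (-kv[1], kv[0])):
        if remaining == 0:
            break
        take = min(free, remaining)
        if ptile is not None:
            take = min(take, ptile)
        if take > 0:
            allocation[host] = take
            remaining -= take
    granted = max_slots - remaining
    return allocation if granted >= min_slots else None


if __name__ == "__main__":
    hosts = {"node1": 4, "node2": 2, "node3": 4}  # hypothetical free slots
    print(allocate_slots(hosts, min_slots=4, max_slots=8, ptile=2))
    # -> {'node1': 2, 'node3': 2, 'node2': 2}
```

With the per-host cap the job spreads over three hosts and gets 6 of the 8 slots it would like, which still satisfies its minimum of 4.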


Intelligent Scheduling Policies

Fairshare (User & Project-based)

Ensure job resources are used for the right work

Guarantees that resource allocations among users and projects are met

Coordinates access to the right number of resources for different users and projects according to pre-defined shares

Differentiation

Hierarchical & guaranteed

Policy-based Preemption

Maximizes throughput of high priority critical work based on priority and load conditions

Prevents starvation of lower priority work

Differentiation

Platform LSF supports multiple preemption policies

Goal-oriented SLA driven policies

Based on customer SLA driven goals: Deadline, Velocity, Throughput

Guarantees projects are completed on time

Reduces project and administration costs

Provides visibility into the progress of projects

Allows the admin to focus on “what work and when” needs to be done, not “how” the resources are to be allocated

(Diagram) Intelligent Scheduler modules: Fairshare, Preemption, Resource Reservation, Advance Reservation, SLA (Service Level Agreement) Scheduling, MultiCluster, License Scheduling, Plugin Schedulers, Other Scheduling Modules
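The fairshare policy listed above can be made concrete with a minimal sketch of share-based ordering. It is loosely modeled on the general form of LSF's fairshare dynamic priority (assigned shares divided by a weighted measure of recent usage), but the weights `cpu_w`, `run_w` and `slot_w` and the user data are illustrative assumptions, not LSF defaults.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ShareHolder:
    name: str
    shares: int          # configured share assignment
    cpu_time: float      # recent CPU time consumed (hours)
    run_time: float      # wall-clock run time of running jobs (hours)
    running_slots: int   # job slots currently in use


def dynamic_priority(u: ShareHolder, cpu_w: float = 1.0, run_w: float = 1.0,
                     slot_w: float = 2.0) -> float:
    """More shares raise priority; more recent usage lowers it."""
    usage = u.cpu_time * cpu_w + u.run_time * run_w + (1 + u.running_slots) * slot_w
    return u.shares / usage


def dispatch_order(holders: List[ShareHolder]) -> List[str]:
    """Share holders with the highest dynamic priority are served first."""
    return [h.name for h in sorted(holders, key=dynamic_priority, reverse=True)]


if __name__ == "__main__":
    users = [
        ShareHolder("bob", 10, 4, 2, 3),
        ShareHolder("carol", 20, 10, 5, 4),
        ShareHolder("alice", 10, 0, 0, 0),
    ]
    print(dispatch_order(users))  # -> ['alice', 'carol', 'bob']
```

Carol has used more resources than Bob but holds twice the shares, so she still ranks ahead of him; Alice, idle so far, goes first.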


Advanced Self-Management

Flexible, Comprehensive Resource Definitions

Resources can be defined per node, across an entire cluster or a subset of its nodes

Auto-detectable or user defined resources

Adaptive membership – nodes join and leave Platform LSF clusters dynamically and automatically without administration effort

Dynamic or static resources

Job Level Exception Management

Exception-based error detection to take automatic, configurable, corrective actions

Increased job reliability & predictability

Improved visibility into job and system errors & reduced administration overhead and costs

Automatic Job Migration and Requeue

Automatically migrate and requeue jobs based on policies in the event of host or network failures

Reduce user and administrator overhead in managing failures & reduce risk of running critical workloads

Master Scheduler Failover

Automatically fail over to another host if the master host is unavailable

Continuous scheduling service and job execution, eliminating manual intervention


Backfill

Policy configured at the queue level; applies to all jobs in the queue

Smaller sequential jobs are ‘backfilled’ onto idle processors being held for a larger pending parallel job, provided their run limits let them finish before that job's reservation starts

Improves hardware utilization

Users are given an estimated start time for their jobs
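The backfill rule can be sketched as a simple “EASY”-style check, a textbook variant rather than Platform LSF's actual implementation: a waiting job may start now only if it fits in the currently idle processors and either finishes, by its run limit, before the large job's reservation begins, or uses only processors that reservation does not need. All names and numbers here are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Job:
    name: str
    procs: int
    run_limit: float  # minutes, e.g. from a run limit such as bsub -W


def backfill(now: float, idle_procs: int, reservation_start: float,
             reserved_procs: int, queue: List[Job]) -> List[str]:
    """Return the names of queued jobs that can start immediately.

    reservation_start: time at which the blocked large parallel job will start
    reserved_procs: processors that job needs from the currently idle pool
    """
    started = []
    # Processors that may stay busy past reservation_start without delaying it.
    spare = idle_procs - reserved_procs
    for job in queue:
        if job.procs > idle_procs:
            continue  # does not fit right now
        ends_in_time = now + job.run_limit <= reservation_start
        if ends_in_time or job.procs <= spare:
            started.append(job.name)
            idle_procs -= job.procs
            if not ends_in_time:
                spare -= job.procs  # these processors are held past the start
    return started


if __name__ == "__main__":
    queue = [Job("small1", 2, 30), Job("big", 8, 10),
             Job("long", 2, 120), Job("long2", 1, 120)]
    print(backfill(0, 8, 60, 6, queue))  # -> ['small1', 'long']
```

This is why backfill depends on run limits: without an upper bound on a job's run time, the scheduler cannot tell whether it would delay the reserved job.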


Key New Features & Benefits: Platform LSF V6.0


Feature Overview

OGSI Compliance

Goal-Oriented SLA-Driven Scheduling

License-Aware Scheduling

Job-Level Exception Management (Self Management Enhancement)

Job Group Support

Other Scheduling Enhancements

Queue-Based Fairshare

User Fairshare by Queue Priority

Job Starvation Prevention plug-in


Feature Overview (Cont.)

HPC Enhancements

Dynamic ptile Enforcement

Resource Requirement Specification for Advance Reservation

Thread Limit Enforcement

General Parallel Support

Parallel Job Size Scheduling

Job Limit Enhancements

Non-normalized Job Run Limit

Resource Allocation Limit Display

Administration and Diagnostics

Scheduler Dynamic Debug

Administrator Action Messages


Goal-Oriented SLA-Driven Scheduling

What is it?

A new scheduling paradigm.

Unlike current scheduling policies based on configured shares or limits, SLA-driven scheduling is based on customer-provided goals:

Deadline-based goal: specify the deadline for a group of jobs.

Velocity-based goal: specify the number of jobs running at any one time.

Throughput-based goal: specify the number of finished jobs per hour.

This scheduling policy works on top of queues and host partitions.

Benefits

Guarantees projects are completed on time according to explicit SLA definitions.

Provides visibility into the progress of projects to see how well projects are tracking to SLAs

Allows the admin to focus on “what work and when” needs to be done, not “how” the resources are to be allocated.

Guarantees service-level delivery to the user community and reduces project risk and administration cost.


Use case

Problem: we need to finish all simulation jobs before 15:00.

Solution: configure a deadline service class in the lsb.serviceclasses file.

Begin ServiceClass
NAME = simulation
PRIORITY = 100
GOALS = [deadline timeWindow (13:00-15:00)]
DESCRIPTION = A simple deadline demo
End ServiceClass

Submitting and monitoring jobs:

$ bsub -sla simulation -W 10 -J "A[1-50]" mySimulation

$ date; bsla

Wed Aug 20 14:00:16 EDT 2003

SERVICE_CLASS_NAME: simulation
GOAL: DEADLINE ACTIVE_WINDOW: (13:00-15:00)
STATUS: Active:Ontime
DEAD_LINE: (Wed Aug 20 15:00)
ESTIMATED_FINISH_TIME: (Wed Aug 20 14:30)
Optimum Number of Running Jobs: 5

NJOBS  PEND  RUN  SSUSP  USUSP  FINISH
   50    25    5      0      0      20
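The “Optimum Number of Running Jobs: 5” in the bsla output is consistent with simple arithmetic, assuming each job uses its full 10-minute run limit (-W 10): at 14:00, 30 jobs (25 pending plus 5 running) remain and 60 minutes are left before the 15:00 deadline, so 30 × 10 / 60 = 5 jobs must run concurrently. The sketch below reproduces that calculation; it is one plausible reading of the output, not how bsla actually computes its estimate, and the function name is hypothetical.

```python
import math
from datetime import datetime


def optimum_running_jobs(remaining_jobs: int, run_limit_min: float,
                         now: datetime, deadline: datetime) -> int:
    """Concurrency needed to finish remaining_jobs by the deadline,
    assuming every job takes its full run limit."""
    minutes_left = (deadline - now).total_seconds() / 60
    if minutes_left <= 0:
        raise ValueError("deadline has already passed")
    return math.ceil(remaining_jobs * run_limit_min / minutes_left)


if __name__ == "__main__":
    now = datetime(2003, 8, 20, 14, 0)
    deadline = datetime(2003, 8, 20, 15, 0)
    # 25 pending + 5 running jobs remain out of 50
    print(optimum_running_jobs(25 + 5, 10, now, deadline))  # -> 5
```

Rounding up matters: one extra remaining job (31 × 10 / 60 ≈ 5.17) already requires 6 concurrent jobs to stay on time.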


Job-Level Exception Management (Self Management Enhancement)

What is it?

Platform LSF can monitor jobs and hosts for exceptional behavior and take action accordingly.

Benefits

Increased reliability of job execution

Improved visibility on job and system errors

Reduced administration overhead and costs

How it works

Platform LSF V6 handles the following exceptions:

“Job-eating” (or “black-hole”) host: for some reason, jobs keep exiting abnormally on a host (e.g. no processes can be started, the mount daemon dies, etc.)

Job underrun (job run time less than the configured minimum time)

Job overrun (job run time more than the configured maximum time)

Job idle (job running without its CPU usage increasing)
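The four exception types above can be sketched as simple checks over job and host statistics. This is a hypothetical illustration: the thresholds (`min_run`, `max_run`, `idle_samples`, `max_exits`) and the data layout are assumptions, and Platform LSF configures its own detection through its configuration files rather than code like this.

```python
from typing import Dict, List


def job_exceptions(run_minutes: float, cpu_samples: List[float],
                   min_run: float, max_run: float,
                   idle_samples: int = 3) -> List[str]:
    """Classify one job; cpu_samples are cumulative CPU seconds, oldest first."""
    found = []
    if run_minutes < min_run:
        found.append("underrun")
    if run_minutes > max_run:
        found.append("overrun")
    recent = cpu_samples[-idle_samples:]
    # Cumulative CPU time that has stopped increasing means the job is idle.
    if len(recent) == idle_samples and max(recent) == min(recent):
        found.append("idle")
    return found


def black_hole_hosts(abnormal_exits: Dict[str, int], max_exits: int) -> List[str]:
    """Hosts whose recent count of abnormal job exits exceeds a threshold."""
    return sorted(h for h, n in abnormal_exits.items() if n > max_exits)


if __name__ == "__main__":
    print(job_exceptions(90, [10.0, 10.0, 10.0], min_run=5, max_run=60))
    # -> ['overrun', 'idle']
    print(black_hole_hosts({"h1": 12, "h2": 1}, max_exits=10))  # -> ['h1']
```

Once an exception is detected, the corrective action (closing a host, killing or requeuing a job, notifying the administrator) is a separate, configurable policy.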


Job Starvation Prevention Plug-in

What is it?

An external scheduler plug-in that allows users to define their own equation for job priority

Benefits

Low-priority work is guaranteed to run after waiting for a specified time, ensuring that no job waits forever (i.e. starvation).

How it works

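By default, the scheduler uses the following calculation:

$$\text{Job priority} = A \cdot \mathit{q\_priority} \cdot \min\left(1, \left\lfloor \frac{\mathit{wait\_time}}{T_0} \right\rfloor\right) \cdot \left( B \cdot \mathit{requested\_processors} + \max\left(C \cdot \mathit{wait\_time} \cdot \left(1 + \frac{1}{\mathit{run\_time}}\right), D\right) + E \cdot \mathit{requested\_memory} \right)$$

where A, B, C, D and E are coefficients, T0 is the grace period, and run_time defaults to infinity (so 1/run_time = 0).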

Administrators can define different coefficients for each queue using the following format:

MANDATORY_EXTSCHED=JOBWEIGHT[A=val1; B=val2; …]
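The default weighting translates directly into code. This sketch assumes the variables mean what their names suggest; the coefficient values in the example are arbitrary, and the function illustrates the formula only, not the plug-in itself.

```python
import math


def job_priority(q_priority: float, wait_time: float, requested_processors: int,
                 requested_memory: float, A: float, B: float, C: float,
                 D: float, E: float, T0: float,
                 run_time: float = math.inf) -> float:
    """Starvation-prevention priority following the formula above."""
    # 0 until the job has waited at least one grace period T0, then 1.
    grace = min(1, int(wait_time / T0))
    # The waiting term grows with wait_time but never falls below D;
    # with the default run_time of infinity, 1/run_time is 0.
    wait_term = max(C * wait_time * (1 + 1 / run_time), D)
    return A * q_priority * grace * (B * requested_processors + wait_term
                                     + E * requested_memory)


if __name__ == "__main__":
    p = job_priority(q_priority=10, wait_time=120, requested_processors=4,
                     requested_memory=1000, A=1, B=1, C=0.5, D=10, E=0.01, T0=60)
    print(p)  # -> 740.0
```

Because the wait term keeps growing with wait_time, even a job in a low-priority queue eventually outranks newly submitted work, which is the starvation guarantee described above.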


Resource Requirement Specification For Advance Reservation

What is it?

Enables users to select hosts for an advance reservation based on a resource requirement.

Benefit

More flexibility in reserving host slots for mission-critical jobs.

How it works

The brsvadd command supports a select string: brsvadd -R "select[type==LINUX]" -n 4 -u xwei -b 10:00 -e 12:00
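Conceptually, the -R select string filters candidate hosts before slots are reserved. The sketch below mimics that for the example above (hosts of type LINUX, 4 slots, 10:00 to 12:00, using whole hours). The host table, the first-fit strategy and all names are hypothetical; this is not brsvadd's actual behavior.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Host:
    name: str
    host_type: str
    slots: int
    # existing reservations as (begin_hour, end_hour, slots)
    reserved: List[Tuple[int, int, int]] = field(default_factory=list)

    def free_between(self, begin: int, end: int) -> int:
        """Slots free in [begin, end), conservatively subtracting every
        reservation that overlaps the window."""
        overlapping = sum(s for b, e, s in self.reserved if b < end and begin < e)
        return self.slots - overlapping


def reserve(hosts: List[Host], wanted_type: str, n: int, begin: int,
            end: int) -> Optional[Dict[str, int]]:
    """First-fit reservation of n slots on hosts where host_type == wanted_type."""
    plan: Dict[str, int] = {}
    needed = n
    for h in hosts:
        if h.host_type != wanted_type or needed == 0:
            continue
        take = min(h.free_between(begin, end), needed)
        if take > 0:
            plan[h.name] = take
            needed -= take
    if needed:
        return None  # not enough matching slots in the window
    for h in hosts:
        if h.name in plan:
            h.reserved.append((begin, end, plan[h.name]))
    return plan


if __name__ == "__main__":
    cluster = [Host("a", "LINUX", 2), Host("b", "SOL", 8),
               Host("c", "LINUX", 4, [(9, 11, 2)])]
    print(reserve(cluster, "LINUX", 4, 10, 12))  # -> {'a': 2, 'c': 2}
```

Host b is skipped by the type filter even though it has the most free slots, which is exactly what the select string is for.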


Key Features – Platform LSF HPC

Enhanced Accounting, Auditing & Control

Optimized Application, System and Hardware Performance

Commercial Grade System Scalability & Reliability

Comprehensive, Extensible and Standards-based Security


Job Termination Reasons

Accounting log with detailed audit & error information for every job in the system

Indicates why a job was terminated

Distinguishes between abnormal job termination and termination caused by Platform LSF HPC


Key Features – Platform LSF HPC

Optimized Application, System and Hardware Performance

Enhanced Accounting, Auditing & Control

Comprehensive, Extensible and Standards-based Security

Commercial Grade System Scalability & Reliability


Enterprise Proven

Running on several of the world’s top supercomputers on the “TOP500” list (#3, 5, 9 and 11)

More than 250,000 licenses in use spanning 1,500 customer sites

Scales to over 100 clusters, 200,000 CPUs and 500,000 active jobs per cluster

11+ years of experience in distributed & grid computing

Risk-free investment – proven solution

Commercial production quality


Key Features – Platform LSF HPC

Optimized Application, System and Hardware Performance

Enhanced Accounting, Auditing & Control

Commercial Grade System Scalability & Reliability

Comprehensive, Extensible and Standards-based Security


Comprehensive, Extensible, Standards-based Security

Scalable scheduler architecture

Multiple scheduler plug-in API support

External executable support

Web GUI

Open source components

Risk-free investment – proven solution

Commercial grade

Scalability and flexibility as a business grows


How It Works: Platform LSF HPC


Master Host Election Process

(Diagram: Host 1 through Host N each run SBD and LIM, with /dev/kmem shown on each host. The LIMs exchange load information and each asks “Am I the master?”; the master host sends a master announcement and also runs MBD and MBSCHD.)
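The “Am I the master?” step can be pictured with a simple rule: every LIM knows the same ordered list of master candidates, and the first candidate that is currently up becomes master. This is a simplified, hypothetical model that ignores message exchange and timeouts; it also mirrors the master scheduler failover described earlier, where another host takes over if the master is unavailable.

```python
from typing import List, Optional, Set


def elect_master(candidates: List[str], up_hosts: Set[str]) -> Optional[str]:
    """First host in the agreed candidate order that is up; None if all are down."""
    return next((h for h in candidates if h in up_hosts), None)


def am_i_master(me: str, candidates: List[str], up_hosts: Set[str]) -> bool:
    """Each host can answer the question locally, since all share the same order."""
    return elect_master(candidates, up_hosts) == me


if __name__ == "__main__":
    order = ["host1", "host2", "host3"]
    print(elect_master(order, {"host1", "host2", "host3"}))  # -> host1
    print(elect_master(order, {"host2", "host3"}))           # -> host2 (failover)
```

Because every host applies the same deterministic rule to the same list, they agree on one master without a separate vote.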


Platform LSF Daemons

(Diagram: Host 1 through Host N each run SBD (slave batch daemon), LIM (Load Information Manager), RES (Remote Execution Server), MELIM, PIM (Process Information Manager) and SELIM, with /dev/kmem shown on each host. The LIMs exchange load information, and the master host also runs MBD (master batch daemon) and MBSCHD (master batch scheduler daemon).)


Grid-enabled, Scalable Architecture

Open, modular plug-in schedulers scale with the growth of your business


Multiple Scheduling Modules

(Diagram: an internal module and add-on modules 1 through N, each with the same four stages: pre-processing, matching/limits, order/allocation and post-processing.)

• Vendor-specific matching policies (without changing the existing scheduler)

• Support for external schedulers
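The multi-module structure can be sketched as a pipeline in which each module contributes hooks for the four stages. This is a hypothetical model of the idea only: the class names, method names and the one-job-per-host allocation are invented for illustration and are not Platform LSF's plug-in API.

```python
from typing import Dict, List

Job = Dict[str, object]


class Module:
    """Base module: every stage is a pass-through unless overridden."""
    def pre_process(self, jobs: List[Job]) -> List[Job]:
        return jobs

    def match(self, job: Job, hosts: List[str]) -> List[str]:
        return hosts

    def order(self, jobs: List[Job]) -> List[Job]:
        return jobs

    def post_process(self, decisions: Dict[str, str]) -> Dict[str, str]:
        return decisions


class VendorMatcher(Module):
    """Add-on module: a vendor-specific matching policy (hypothetical)."""
    def match(self, job: Job, hosts: List[str]) -> List[str]:
        prefix = str(job.get("host_prefix", ""))
        return [h for h in hosts if h.startswith(prefix)]


def schedule(jobs: List[Job], hosts: List[str],
             modules: List[Module]) -> Dict[str, str]:
    # Stage 1: pre-processing
    for m in modules:
        jobs = m.pre_process(jobs)
    # Stage 2: matching/limits; every module may narrow each job's host list
    candidates: Dict[str, List[str]] = {}
    for job in jobs:
        matched = list(hosts)
        for m in modules:
            matched = m.match(job, matched)
        candidates[str(job["name"])] = matched
    # Stage 3: order/allocation; modules reorder, then one job per free host
    for m in modules:
        jobs = m.order(jobs)
    decisions: Dict[str, str] = {}
    taken = set()
    for job in jobs:
        name = str(job["name"])
        free = [h for h in candidates[name] if h not in taken]
        if free:
            decisions[name] = free[0]
            taken.add(free[0])
    # Stage 4: post-processing
    for m in modules:
        decisions = m.post_process(decisions)
    return decisions


if __name__ == "__main__":
    jobs = [{"name": "j1"}, {"name": "j2", "host_prefix": "gpu"}]
    print(schedule(jobs, ["cpu1", "gpu1"], [Module(), VendorMatcher()]))
    # -> {'j1': 'cpu1', 'j2': 'gpu1'}
```

The point of the design is that the add-on module changes matching behavior without any change to the internal module or to schedule() itself.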


Maui Integration

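(Diagram: MBD and its scheduler framework (SCH_FM) exchange job, host and resource information, decisions and acknowledgements, and sync events with a MAUI plug-in event handler, which waits until a GO event. The Maui scheduler side shows RMGetInfo, UIProcessClients, QueueScheduleSJobs, QueueScheduleRJobs, QueueScheduleIJobs and QueueBackFill, alongside pre-processing, job ordering and post-processing stages.)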


Linux-specific Solutions


Controlling an MPI job

On a distributed system (Linux cluster) there are many problems to address:

1. Job launch across multiple nodes

2. Gather resource usage while job executes

3. Propagate signals

4. Job “clean-up” to eliminate “dangling” MPI processes

5. Comprehensive job accounting
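Problems 3 and 4 (signal propagation and clean-up) are commonly handled on a single POSIX node by starting each of a job's tasks in its own process group and signalling the group as a whole, so that children a task spawns are reached too. The sketch below shows that technique with Python's standard library; it is a local illustration under that assumption, not how Platform LSF HPC's pam or TaskStarter work across nodes, and the task commands are placeholders.

```python
import os
import signal
import subprocess
from typing import List


def launch_tasks(commands: List[List[str]]) -> List[subprocess.Popen]:
    """Start each task as the leader of a new session and process group, so any
    children it spawns (e.g. stray MPI helpers) share that group."""
    return [subprocess.Popen(cmd, start_new_session=True) for cmd in commands]


def signal_job(tasks: List[subprocess.Popen], sig: int) -> None:
    """Propagate a signal to every process group belonging to the job."""
    for t in tasks:
        try:
            os.killpg(t.pid, sig)  # with start_new_session, pgid == pid
        except ProcessLookupError:
            pass  # the whole group has already exited


def clean_up(tasks: List[subprocess.Popen], grace: float = 2.0) -> List[int]:
    """SIGTERM every task group, wait up to `grace` seconds per task, then
    SIGKILL whatever is left in each group (dangling children included).
    Returns the exit codes; PID reuse is ignored in this sketch."""
    signal_job(tasks, signal.SIGTERM)
    for t in tasks:
        try:
            t.wait(timeout=grace)
        except subprocess.TimeoutExpired:
            pass
    signal_job(tasks, signal.SIGKILL)
    return [t.wait() for t in tasks]


if __name__ == "__main__":
    job = launch_tasks([["sleep", "30"], ["sh", "-c", "sleep 30 & wait"]])
    print(clean_up(job))
```

Signalling the group rather than the single PID is what catches the background `sleep` in the second task, the local analogue of a “dangling” MPI process.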


“Traditional” MPI sequence

(Diagram labels: submit, resource manager, job script, mpirun, job launcher, a.out processes.)


Platform LSF HPC for Linux - MPICH-GM

(Diagram components: bsub, mbatchd, sbatchd, job script, mpirun, pam, gmmpirun_wrapper, res, TS (TaskStarter), PIM and a.out tasks.)


Platform LSF HPC for Linux/Myrinet - MPICH_GM

(Diagram. Submission host: bsub, esub, lsblib. Master host: master LIM, MBD, MBSCHD and the queues high, med and hpc_queue. Execution hosts H1 and H2: LIM, PIM, elim, SBD, SBD child, pam, mpirun.lsf, mpirun.ch_gm, gmmpirun_wrapper, root res, TaskStarter and a.out processes 1 and 2. Annotated flows: set LSF_PJL_TYPE to mpich_gm; report resource availability; signals and rusage collection; hostname & pid; rsh.)


Scyld Beowulf Integration

• Scyld Beowulf handles the systems management challenge effectively:

  • No OS to distribute / synchronize
  • Central point of control from the master
  • Single process space makes the cluster appear as a large SMP

• Platform LSF integrates with Scyld, treating the cluster as an SMP and allocating its resources:

  • Integrates with mpirun, mpprun or bpsh to start tasks
  • Collects resource usage from BProc
  • Collects load information via BProc APIs
  • Single user interface across Scyld & non-Scyld environments


Platform LSF HPC for Linux/BProc

(Diagram with numbered steps 1A through 6C. Submission host: bsub, esub (modify submission options), lsblib, job file. Master host: master LIM, MBD, MBSCHD and the queues high, med and low. BProc front-end node: LIM, PIM, SBD, and an SBD child that exec()s res. Computing nodes: bpsh/mpirun starts the user job processes on the allocated nodes.)


More info at:

• www.platform.com/customers

• www.platform.com/barriers


Q & A