virtual mpirun

Virtual mpirun: Jason Hale, Engineering 692 Project Presentation, Fall 2007

Upload: rainer

Post on 12-Feb-2016



Page 1: Virtual mpirun

Virtual mpirun

Jason Hale, Engineering 692

Project Presentation, Fall 2007

Page 2: Virtual mpirun

Rationale

• Compute cycles = money: Mimosa (250 nodes) costs $0.06 per CPU hour
• Wasted CPU cycles -> wasted money
• Wasted user time -> less research
• Not all parallel computations run efficiently

Goal of a supercomputing center: have users run on the maximum number of CPUs/nodes they can utilize efficiently.
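To make the "cycles = money" point concrete, the sketch below computes the dollar cost of idle allocated CPU time. It assumes that every allocated CPU hour is billed at a flat rate and that efficiency means CPU time used divided by (CPUs allocated × wall-clock hours); the function name `wastedDollars` is hypothetical.

```cpp
#include <cassert>

// Hypothetical helper: dollars spent on allocated CPU time that did no useful work.
// Assumes every allocated CPU hour is billed and that efficiency is
// (CPU time used) / (CPUs allocated * wall-clock hours), between 0 and 1.
double wastedDollars(int cpus, double wallHours, double efficiency,
                     double dollarsPerCpuHour) {
    assert(cpus > 0 && wallHours >= 0.0);
    assert(efficiency >= 0.0 && efficiency <= 1.0);
    const double allocatedCpuHours = cpus * wallHours;
    return allocatedCpuHours * (1.0 - efficiency) * dollarsPerCpuHour;
}
```

For example, an illustrative 32-CPU job that runs for 24 hours at 58% efficiency is allocated 768 CPU hours, 42% of which are idle; at the Mimosa rate of $0.06 per CPU hour, `wastedDollars(32, 24.0, 0.58, 0.06)` comes to about $19.35 for that single job.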

Page 3: Virtual mpirun

Percentage of MCSR Jobs Using g03sub (Gaussian)

g03sub PBS jobs: 26,691 (88%)
Other jobs: 3,724 (12%)

Page 4: Virtual mpirun

MCSR Initiatives to Improve Utilization

• g03sub: enhanced (virtualized?) wrapper for users submitting Gaussian calculations
• Back-end processes that poll the PBS batch scheduler to compute the utilization of parallel jobs, post the results to a database and the Web, and e-mail inefficient users
• Amber Alert System
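The utilization check performed by the back-end processes boils down to one ratio per job. A minimal sketch, assuming the poller can obtain each job's total CPU time, wall-clock time, and CPU count (on Mimosa, the total CPU time across nodes is exactly what PBSPro cannot supply); the `JobSample` struct, its field names, and the threshold passed in are hypothetical, not MCSR's actual values.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical record for one polled PBS job.
struct JobSample {
    std::string user;
    double cpuSeconds;   // CPU time summed over all of the job's processes
    double wallSeconds;  // elapsed wall-clock time
    int ncpus;           // CPUs allocated to the job
};

// Parallel efficiency: fraction of the allocated CPU time actually used.
double efficiency(const JobSample& j) {
    assert(j.ncpus > 0 && j.wallSeconds > 0.0);
    return j.cpuSeconds / (j.wallSeconds * j.ncpus);
}

// Distinct users with at least one job below the threshold (e-mail candidates).
std::set<std::string> inefficientUsers(const std::vector<JobSample>& jobs,
                                       double threshold) {
    std::set<std::string> users;
    for (const JobSample& j : jobs) {
        if (efficiency(j) < threshold) users.insert(j.user);
    }
    return users;
}
```

A 4-CPU job that accumulates 1,000 CPU seconds in 1,000 wall-clock seconds has an efficiency of 0.25: three of its four CPUs were effectively idle.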

Page 5: Virtual mpirun

Average Efficiency of Parallel G03 Calculations on Scalar MCSR Systems (Redwood, Sweetgum)

[Chart: average efficiency for 2005, 2006 and 2007; y-axis from 56% to 61%]

Page 6: Virtual mpirun

These Systems Don’t Work for Mimosa Cluster

PBSPro can’t accumulate CPU usage times from parallel processes distributed across compute nodes

Idea: Create a monitor process that will follow parallel processes to nodes, monitor their CPU performance, and report back.

Virtualization: users will not know about the monitor process. They will launch a virtual mpirun (or g03sub) without realizing that it is not the “real” one, and it will launch the real one along with the monitor.

Page 7: Virtual mpirun

Running an MPI Program on a Cluster

• Compile: cc myprogram.c -o myprogram.exe
• Batch script myscript.pbs:
    #PBS -l nodes=4
    mpirun -np 4 myprogram.exe
• Submit from the head node: qsub myscript.pbs

Virtual mpirun: the same mpirun line in myscript.pbs now starts the monitor alongside the program:
    mpirun -np 4 monitor.exe &
    mpirun -np 4 myprogram.exe

[Diagram: the head node submits myscript.pbs; monitor.exe and myprogram.exe processes run side by side on the compute nodes]

Page 8: Virtual mpirun

Design Goals

• Collect CPU utilization stats on cluster calculations
• No changes to user-end processes
• No significant performance degradation
• No side effects (Leave No Trash Behind)
• Monitor even non-MPI parallel codes (Gaussian 03)
• Generality and robustness for reuse potential

Page 9: Virtual mpirun

Components

• monitor: new C++ MPI program
• mpirun: new wrapper around the existing mpirun; calls the monitor and the existing “real” mpirun
• g03sub: existing batch script that launches Gaussian jobs on the cluster; MCSR's version was previously “virtualized” and is now modified to call the monitor program as well
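The wrapper's job can be sketched as building two command lines: one that starts a monitor per rank in the background, and one that runs the user's job unchanged, matching the two mpirun lines of the virtual mpirun example. This is a sketch under stated assumptions: the path of the “real” mpirun (`/opt/mpi/bin/mpirun.real`), the helper names, and the absence of any shell quoting are all hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <string>
#include <vector>

// Hypothetical locations of the "real" mpirun and of the monitor executable.
const std::string kRealMpirun = "/opt/mpi/bin/mpirun.real";
const std::string kMonitor = "monitor.exe";

// Return the value that follows "-np", or "" if the user did not pass one.
std::string findNp(const std::vector<std::string>& args) {
    for (std::size_t i = 0; i + 1 < args.size(); ++i) {
        if (args[i] == "-np") return args[i + 1];
    }
    return "";
}

// Command that starts one monitor per MPI rank, in the background.
std::string monitorCommand(const std::vector<std::string>& args) {
    const std::string np = findNp(args);
    assert(!np.empty());
    return kRealMpirun + " -np " + np + " " + kMonitor + " &";
}

// Command that runs the user's job unchanged through the real mpirun.
// Assumes no argument contains spaces or shell metacharacters.
std::string userCommand(const std::vector<std::string>& args) {
    std::string cmd = kRealMpirun;
    for (const std::string& a : args) cmd += " " + a;
    return cmd;
}

// The wrapper itself: monitor first, then the real job; returns the job's status.
int runVirtualMpirun(const std::vector<std::string>& args) {
    std::system(monitorCommand(args).c_str());
    return std::system(userCommand(args).c_str());
}
```

A `main(int argc, char** argv)` would simply call `runVirtualMpirun(std::vector<std::string>(argv + 1, argv + argc))`, so a user's existing `mpirun -np 4 myprogram.exe` line works unchanged.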

Page 10: Virtual mpirun

[Diagram: four nodes, each running monitor.exe alongside myprogram.exe; one monitor acts as the Manager Process and the others as Worker Processes]


Page 12: Virtual mpirun

Worker Process Logic

[Diagram: four nodes, each running monitor.exe alongside myprogram.exe; one monitor acts as the Manager Process and the others as Worker Processes]

While (NoTerminationMessageFromMaster)
    Sleep
    Wakeup
    Create Process Times File
    Read Process Times File
    If (ActiveProcesses)
        Update Process Times Data Structure
        SendCPUTimeMessageToMaster
    Else
        SendIdleMessageToMaster
    End If/Else
End While
Terminate
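The worker loop can be sketched in C++ as follows. This is not the project's monitor.cpp: the MPI traffic and the ps/file handling are hidden behind a hypothetical `WorkerIO` struct of callbacks (in an MPI build, `terminationRequested` might wrap a non-blocking probe such as `MPI_Iprobe`), so the control flow can be run and tested on its own.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <string>
#include <thread>

// Hypothetical stand-ins for the MPI and file operations of the real monitor.
struct WorkerIO {
    std::function<bool()> terminationRequested;     // has the manager said stop?
    std::function<bool(double&)> sampleCpuSeconds;  // create/read the process-times file and
                                                    // update the data structure; false = no
                                                    // active job processes on this node
    std::function<void(double)> sendCpuTime;        // CPU-time message to the manager
    std::function<void()> sendIdle;                 // idle message to the manager
};

// One worker's loop, following the pseudocode step by step.
void workerLoop(const WorkerIO& io, std::chrono::milliseconds interval) {
    assert(interval.count() >= 0);
    while (!io.terminationRequested()) {
        std::this_thread::sleep_for(interval);  // Sleep / Wakeup
        double cpuSeconds = 0.0;
        if (io.sampleCpuSeconds(cpuSeconds)) {  // If (ActiveProcesses)
            io.sendCpuTime(cpuSeconds);
        } else {
            io.sendIdle();
        }
    }
    // Terminate: returning lets the caller shut down (e.g. MPI_Finalize).
}
```

Injecting the callbacks keeps the loop identical whether the samples come from `ps` on a compute node or from a scripted test.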

Page 13: Virtual mpirun

Worker Process Logic

[Diagram: the Manager Process and three Worker Processes (monitor.exe alongside myprogram.exe); each worker node has its own process-times file, /tmp/ps_file]


Page 14: Virtual mpirun

Worker Process Logic

[Diagram: the Manager Process and three Worker Processes (monitor.exe alongside myprogram.exe); each worker node has its own process-times file, /tmp/ps_file]

While (NoTerminationMessageFromMaster)
    Sleep
    Wakeup
    Create Process Times File
    Read Process Times File
    Delete Process Times File
    If (ActiveProcesses)
        Update Process Times Data Structure
        SendCPUTimeMessageToMaster
    Else
        SendIdleMessageToMaster
    End If/Else
End While
Terminate
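The Delete Process Times File step also serves the "Leave No Trash Behind" design goal. One way to guarantee the file disappears even if reading it fails is a C++ RAII guard; this is an idiom we are suggesting, not necessarily how monitor.cpp does it, and the class name `ScopedFile` is hypothetical.

```cpp
#include <cassert>
#include <filesystem>
#include <fstream>
#include <string>
#include <system_error>
#include <utility>

// Hypothetical guard: removes the per-node process-times file when it goes
// out of scope, so no temporary file is left behind even if parsing throws.
class ScopedFile {
public:
    explicit ScopedFile(std::filesystem::path p) : path_(std::move(p)) {
        assert(!path_.empty());
    }
    ~ScopedFile() {
        std::error_code ec;  // error_code overload: never throw from a destructor
        std::filesystem::remove(path_, ec);
    }
    ScopedFile(const ScopedFile&) = delete;
    ScopedFile& operator=(const ScopedFile&) = delete;
    const std::filesystem::path& path() const { return path_; }

private:
    std::filesystem::path path_;
};
```

In the worker, the guard would be created right after the ps command writes the file, and the file would be read through `path()`; the delete happens automatically at the end of each iteration.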

Page 15: Virtual mpirun

Worker Process Logic

[Diagram: the Manager Process and three Worker Processes (monitor.exe alongside myprogram.exe); each worker node has its own process-times file, /tmp/ps_file]


pid   cputime
123   6 s
124   12 s
130   29 s
Total: 47 s
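The "Process Times Data Structure" can be as simple as a map from PID to the latest CPU time seen. The sketch below assumes one design choice not stated on the slides: entries are kept after a process exits, so a rank that finishes early still contributes its CPU time to the node total. The class name is hypothetical, and PID reuse within one job would be misattributed.

```cpp
#include <cassert>
#include <map>

// Hypothetical per-node structure: the latest CPU seconds seen for each PID.
// Entries are kept after a process exits, so its CPU time still counts.
class ProcessTimes {
public:
    void update(int pid, long cpuSeconds) {
        assert(cpuSeconds >= 0);
        long& known = times_[pid];                   // inserts 0 for a new PID
        if (cpuSeconds > known) known = cpuSeconds;  // CPU time never decreases
    }
    long totalSeconds() const {
        long total = 0;
        for (const auto& entry : times_) total += entry.second;
        return total;
    }

private:
    std::map<int, long> times_;
};
```

Feeding it the example above (123: 6 s, 124: 12 s, 130: 29 s) gives the 47 s node total; a later sample of 35 s for PID 130 raises the total to 53 s.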

Page 16: Virtual mpirun

Worker Process Logic

[Diagram: worker monitors sending CPU-time messages to the Manager Process]


pid   cputime
123   6 s
124   12 s
130   29 s
Total: 47 s

CPU-time messages reaching the Manager Process: 47 (this worker) and 9 (another worker)

Page 17: Virtual mpirun

Worker Process Logic

[Diagram: the Manager Process and three Worker Processes; on one worker node myprogram.exe has finished and only monitor.exe remains]


[Idle message from that worker to the Manager Process]

Page 18: Virtual mpirun

Manager Process Logic

While (Active Processes)
    MONITOR_LOCAL_PROCESSES
    If (LocalActiveProcesses)
        UpdateGlobalCPUTimeStructure
        UpdateActiveProcessStructure
    Else
        UpdateActiveProcessStructure
    End If
    For Each Slave
        WaitForMessage
        If (CPUMessage)
            UpdateGlobalCPUTimeStructure
            UpdateActiveProcessStructure
        Else If (IdleMessage)
            UpdateActiveProcessStructure
        End If
    End For
End While

WKR   cputime
0     25 s
1     35 s
2     9 s
3     47 s
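The manager's bookkeeping can be sketched without MPI by treating each received message as a small struct. Assumptions, all hypothetical: the `WorkerMessage` layout, the `ManagerState` class, and the convention that the manager feeds its own MONITOR_LOCAL_PROCESSES result in as a message from rank 0 (which matches the WKR 0 row of the table above). The actual receive (e.g. `MPI_Recv`) is left out.

```cpp
#include <cassert>
#include <map>
#include <set>

// Hypothetical message from a worker monitor (or the manager's own local check).
struct WorkerMessage {
    int worker;       // rank of the reporting monitor
    bool idle;        // true = no active job processes on that node
    long cpuSeconds;  // meaningful only when idle == false
};

// Manager-side structures mirroring the pseudocode.
class ManagerState {
public:
    explicit ManagerState(int nWorkers) {
        for (int w = 0; w < nWorkers; ++w) active_.insert(w);
    }
    void handle(const WorkerMessage& m) {
        assert(m.worker >= 0);
        if (m.idle) {
            active_.erase(m.worker);        // UpdateActiveProcessStructure
        } else {
            cpu_[m.worker] = m.cpuSeconds;  // UpdateGlobalCPUTimeStructure
            active_.insert(m.worker);       // UpdateActiveProcessStructure
        }
    }
    bool anyActive() const { return !active_.empty(); }  // While (Active Processes)
    long totalCpuSeconds() const {
        long total = 0;
        for (const auto& entry : cpu_) total += entry.second;
        return total;
    }

private:
    std::map<int, long> cpu_;  // worker rank -> latest CPU seconds reported
    std::set<int> active_;     // workers still reporting active processes
};
```

With the table's values (25, 35, 9 and 47 s), the job total is 116 CPU seconds, and the loop ends once every rank has reported Idle.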

Page 19: Virtual mpirun

Test MPI Script: Parallel Ultimate Virtual Collapse Program

• Reads a list of integers from a file
• Distributes the integers to all available worker nodes
• Each worker computes the ultimate collapse of its numbers

Control the length of processing time by:
• The number of numbers in the list (1,000,000)
• The size of the numbers in the list (1 to 7 digits)

Control the parallel efficiency by the order of the numbers in the list:
• Larger numbers grouped together: fewer nodes do most of the work
• Large numbers evenly distributed: nodes do about the same work
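How the order of the list controls efficiency can be illustrated with a toy load model. The slides do not define the "ultimate collapse" computation, so the sketch assumes a hypothetical cost of one unit per decimal digit (the slide only says number size controls processing time). It compares contiguous blocks, where grouped large numbers land on few workers, with round-robin dealing, one way of spreading large numbers evenly, and reports average load divided by the busiest worker's load.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical cost model: work grows with the number of decimal digits.
int digits(long n) {
    assert(n >= 0);
    int d = 1;
    while (n >= 10) { n /= 10; ++d; }
    return d;
}

// Work per worker: contiguous blocks (cyclic == false) or round-robin (cyclic == true).
std::vector<long> loads(const std::vector<long>& nums, int workers, bool cyclic) {
    assert(workers > 0);
    const std::size_t w = static_cast<std::size_t>(workers);
    std::vector<long> load(w, 0);
    const std::size_t block = (nums.size() + w - 1) / w;  // ceiling of size / workers
    for (std::size_t i = 0; i < nums.size(); ++i) {
        const std::size_t owner = cyclic ? i % w : i / block;
        load[owner] += digits(nums[i]);
    }
    return load;
}

// Efficiency of a distribution: average load / busiest worker's load.
double balance(const std::vector<long>& load) {
    assert(!load.empty());
    const long total = std::accumulate(load.begin(), load.end(), 0L);
    const long busiest = *std::max_element(load.begin(), load.end());
    if (busiest == 0) return 1.0;  // no work at all: trivially balanced
    return static_cast<double>(total) / (static_cast<double>(load.size()) * busiest);
}
```

With the sorted list {1, 2, 3, 4, 1000000, 2000000, 3000000, 4000000} on two workers, contiguous blocks give loads of 4 and 28 (efficiency about 0.57), while round-robin gives 16 and 16 (efficiency 1.0).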

Page 20: Virtual mpirun

Project Status

• Test program is written (Ultimate Collapse)
• Monitor program: partially complete; some work remains

    Sleep/Wakeup
    Create Process Times File
    Read Process Times File
    Delete Process Times File
    If (ActiveProcesses)
        Update Process Times Data Structure
        SendCPUTimeMessageToMaster
    Else
        SendIdleMessageToMaster
    End If/Else
    Terminate

Page 21: Virtual mpirun

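ps syntax from monitor.cpp

// Append this user's processes to myFileName, dropping lines that match ps, sh, mon.exe or grep
// and keeping only lines that match "mpirun" (filters as in the original pipeline).
std::string psCommand(" ps -u " + username + " --no-headers -o pid,cputime,etime,comm,user,c,pcpu | grep -v ps | grep -v sh | grep mpirun | grep -v mon.exe | grep -v grep >> " + myFileName);
std::system(psCommand.c_str());  // runs the pipeline through /bin/sh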

Page 22: Virtual mpirun

Example /tmp/ps_file from a node

pid     cputime    etime   comm    user     c    pcpu
32765   00:00:00   02:32   a.out   jghale   0    0.0
32764   00:00:00   02:32   a.out   jghale   0    0.0
305     00:00:00   02:32   a.out   jghale   0    0.0
300     00:02:31   02:32   a.out   jghale   99   99.8
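Reading this file back means converting the ps time fields to seconds. The sketch below assumes whitespace-separated columns in the order requested by the `-o pid,cputime,etime,comm,user,c,pcpu` option and the usual ps time layouts ("HH:MM:SS", "MM:SS", optionally prefixed by "DD-"); the `PsRecord` struct and function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>

// One line of the per-node ps output (pid,cputime,etime,comm,user,c,pcpu).
struct PsRecord {
    int pid = 0;
    long cpuSeconds = 0;
    long elapsedSeconds = 0;
    std::string comm, user;
    double pcpu = 0.0;
};

// Convert ps time fields such as "00:02:31", "02:32" or "1-02:03:04" to seconds.
long toSeconds(const std::string& t) {
    assert(!t.empty());
    long days = 0;
    std::string rest = t;
    const std::size_t dash = t.find('-');
    if (dash != std::string::npos) {
        days = std::stol(t.substr(0, dash));
        rest = t.substr(dash + 1);
    }
    long total = 0;
    std::istringstream in(rest);
    std::string field;
    while (std::getline(in, field, ':')) total = total * 60 + std::stol(field);
    return days * 86400 + total;
}

// Parse one line; returns false for malformed input such as the header line.
bool parsePsLine(const std::string& line, PsRecord& r) {
    std::istringstream in(line);
    std::string cpu, elapsed;
    int c = 0;  // ps "c" column, not used here
    if (!(in >> r.pid >> cpu >> elapsed >> r.comm >> r.user >> c >> r.pcpu)) return false;
    r.cpuSeconds = toSeconds(cpu);
    r.elapsedSeconds = toSeconds(elapsed);
    return true;
}
```

For the sample above, PID 300 has used 151 CPU seconds of 152 elapsed, while the other three a.out processes show no accumulated CPU time: one busy rank and three idle ones, the kind of imbalance the monitor is meant to expose.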