Kento Aida, Tokyo Institute of Technology
Grid Challenge - programming competition on the Grid -
Kento Aida
Tokyo Institute of Technology
22nd APAN Meeting in Singapore
What is Grid Challenge?
- programming competition to develop high-performance programs on the Grid
  - The organizer operates a Grid testbed.
  - Participants develop/run programs on the testbed.
- a special event in the Annual Symposium on Advanced Computing Systems and Infrastructures (SACSIS)
- history
  - 1st Grid Challenge in SACSIS 2005
  - 2nd Grid Challenge in SACSIS 2006
Category
- compulsory
  - programming competition on the Grid testbed
  - solving the problem provided by the organizer
    - Graph Partitioning Problem
  - students (university and high school)
- free
  - giving opportunities to perform experiments on the Grid
  - presentations during the conference
  - students, engineers and researchers
Compulsory
- Graph Partitioning Problem
  - For a given undirected graph G(V, E) with |V| = 2n, L and R are disjoint partitions obtained by dividing G equally, so that |L| = |R|.
  - Find the partition that minimizes the number of edges with one endpoint in L and the other in R.
[Figure: example graph with vertices 1 to 6 divided into partitions L and R]
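The problem statement above can be made concrete with a small sketch. It counts the edges crossing a split and finds the best equal split by exhaustive search, which is only feasible for tiny graphs. The function names and the 6-vertex example graph are assumptions for illustration; the graph is not the one drawn on the slide, and this is not the method used by any participant.

```python
from itertools import combinations


def cut_size(edges, left):
    """Count edges with exactly one endpoint in `left`."""
    left = set(left)
    return sum((u in left) != (v in left) for u, v in edges)


def best_bisection(vertices, edges):
    """Try every equal-size split and keep the one with the smallest cut."""
    vertices = sorted(vertices)
    half = len(vertices) // 2
    first = vertices[0]
    best = None
    # Keep the first vertex in L so that each split is examined only once.
    for rest in combinations(vertices[1:], half - 1):
        left = {first, *rest}
        size = cut_size(edges, left)
        if best is None or size < best[0]:
            best = (size, left)
    size, left = best
    return size, left, set(vertices) - left


# Hypothetical example: two triangles joined by the single edge (3, 4).
EXAMPLE_EDGES = [(1, 2), (1, 3), (2, 3), (4, 5), (4, 6), (5, 6), (3, 4)]
```

The number of equal splits grows exponentially with |V|, so at the problem sizes used in the competition a heuristic search would be needed instead.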
Compulsory (cont’d)
- qualifying runs (3 weeks): Solve early!
  - to find a solution within a given threshold
  - shared resources
  - problem size: |V| = 500 - 1500
- final runs (2 weeks): Solve fast!
  - dedicated time slots for finalists (2.5h per team)
  - to find a solution within a given period (10 min)
  - A finalist with the best solution will be a winner!
  - problem size: |V| = 30000 - 35000
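A run that must return its best answer within a fixed period can be sketched as an anytime search: keep a valid equal split at all times, try random swaps, and stop at the deadline. The swap-based hill climbing, the function names and the `budget_seconds` parameter are assumptions for illustration, and the sketch is sequential, whereas competition programs ran in parallel on the Grid.

```python
import random
import time


def cut_size(edges, left):
    """Count edges with exactly one endpoint in `left`."""
    left = set(left)
    return sum((u in left) != (v in left) for u, v in edges)


def anytime_bisection(vertices, edges, budget_seconds, seed=0):
    """Improve a random equal split by vertex swaps until the time budget ends."""
    rng = random.Random(seed)
    order = list(vertices)
    rng.shuffle(order)
    half = len(order) // 2
    left, right = order[:half], order[half:]
    best = cut_size(edges, left)
    deadline = time.monotonic() + budget_seconds
    while time.monotonic() < deadline and best > 0:
        i = rng.randrange(len(left))
        j = rng.randrange(len(right))
        left[i], right[j] = right[j], left[i]      # swap keeps |L| = |R|
        size = cut_size(edges, left)
        if size <= best:
            best = size                            # keep the swap
        else:
            left[i], right[j] = right[j], left[i]  # undo a worsening swap
    return best, set(left), set(right)
```

Recomputing the whole cut after each swap is simple but slow; an incremental update of the cut size would matter at |V| = 30000 and above.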
Free
- experiments of research projects (1 month)
  - shared resources
- projects
  - tools
    - a monitoring tool, a message passing system, a programming tool, volunteer computing
  - applications
    - physics simulation, bioinformatics, simulation of diesel engine, optimization problems
Participants
[Figure: charts of participants in each category]
- compulsory: D 2, M 12, U 6, H 1
- free: D 2, M 5, U 1
Testbed
- Grid Challenge Federation
  - AIST
  - Tokyo Institute of Technology
  - The University of Tokyo
  - Doshisha University
- more than 1,200 CPUs
Resources
- collection of PC clusters
- spec of a PC cluster
  - a gateway node: gateway, compiling
  - computing nodes: computation
  - global IP address/private IP address
  - NFS: “/home” is shared among nodes
Resources (cont’d)
name      | site                | computing node                                         | #computing nodes (#CPUs)
F32       | AIST (Tsukuba)      | Xeon 3GHz x2, 4GB mem., 1000BASE-T                     | 128 (256)
SAKURA    | AIST (Tsukuba)      | Opteron 1.8GHz x2, 3GB mem., 1000BASE-T                | 16 (32)
DIS       | TITECH (Yokohama)   | Athlon MP 2000+ 1.6GHz x2, 512MB mem., 100BASE-TX      | 50 (100)
PrestoIII | TITECH (Tokyo)      | Opteron 246/242 2/1.6GHz x2, 4/3/2GB mem., 1000BASE-T  | 103 (206)
Tau       | U. Tokyo (Tokyo)    | Xeon 2.4/2.8GHz x2, 2GB mem., 1000BASE-T               | 175 (350)
Chikayama | U. Tokyo (Chiba)    | Xeon 2.4GHz x2, 2GB mem., 1000BASE-T                   | 64 (128)
Xenia     | Doshisha U. (Kyoto) | Xeon 2.4GHz x2, 1GB mem., 100BASE-TX                   | 63 (126)
Internet Connection
[Figure: network diagram connecting the clusters (F32, SAKURA, PrestoIII, Chikayama, Tau, DIS, Xenia) through TsukubaWAN, SINET and WIDE]
Software
- Grid middleware: Globus Toolkit 2.4
- batch queueing system: Sun Grid Engine, PBS
- remote process invocation: SSH, GXP
- monitoring: Ganglia
- programming: MPICH 1.2.7, Ninf-G 2.4
GXP
- shell for distributed multi-cluster environment
  - fast simultaneous command submissions
  - parallel job pipes
  - interactive selection of nodes to execute commands
- no cumbersome per-node operations!
  - installation and deployment
  - invocation of parallel processes
  - monitoring, trouble diagnosis, debugging
  - dead processes clean-up
http://www.logos.ic.i.u-tokyo.ac.jp/phoenix/gxp_quick_man.shtml
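The idea of submitting one command to many nodes at once can be sketched with plain SSH and a thread pool. This is not GXP and does not use its commands; `ssh_run`, `run_everywhere` and their parameters are hypothetical names, and the sketch assumes public-key login is already set up so that `ssh` never prompts.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def ssh_run(host, command, timeout=30):
    """Run `command` on `host` with the ssh client; return (exit code, stdout)."""
    proc = subprocess.run(
        ["ssh", "-o", "BatchMode=yes", host, command],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return proc.returncode, proc.stdout


def run_everywhere(hosts, command, runner=ssh_run, max_workers=32):
    """Submit `command` to every host concurrently; return {host: result}."""
    hosts = list(hosts)
    if not hosts:
        return {}
    workers = min(max_workers, len(hosts))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {host: pool.submit(runner, host, command) for host in hosts}
        # result() re-raises any exception from the runner (e.g. a timeout).
        return {host: future.result() for host, future in futures.items()}
```

The `runner` argument is there so the fan-out logic can be exercised without a real cluster, for example `run_everywhere(["n1", "n2"], "uptime", runner=lambda h, c: (0, h))`.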
Ninf-G
- reference implementation of GridRPC
  - GridRPC: a simple RPC-based programming model for the Grid
  - Client invokes remote libraries installed on remote servers on the Grid.
  - utilizing task parallelism
http://ninf.apgrid.org/
[Figure: a client program calls grpc_call(…) to send data to server programs (libraries) on remote servers and receive the results]
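The task-parallel pattern described for GridRPC can be imitated locally to show its shape: the client holds one handle per server, issues independent calls concurrently, and gathers the results. This is an analogy only, not the Ninf-G or GridRPC API. `FunctionHandle`, `call_in_parallel` and the `sum_of_squares` library are hypothetical, and the "remote" functions run in local threads.

```python
from concurrent.futures import ThreadPoolExecutor

# Stand-in for libraries installed on remote servers (hypothetical).
REMOTE_LIBRARIES = {
    "sum_of_squares": lambda data: sum(x * x for x in data),
}


class FunctionHandle:
    """Binds a server name to a library function, loosely like an RPC handle."""

    def __init__(self, server, function_name):
        self.server = server
        self.function = REMOTE_LIBRARIES[function_name]

    def call(self, data):
        # A real RPC would send `data` to self.server and wait for the result.
        return self.function(data)


def call_in_parallel(handles, chunks):
    """Issue one call per (handle, chunk) pair concurrently; gather results."""
    with ThreadPoolExecutor(max_workers=max(1, len(handles))) as pool:
        futures = [pool.submit(h.call, c) for h, c in zip(handles, chunks)]
        return [f.result() for f in futures]
```

For example, two handles for servers named "serverA" and "serverB" could each process half of a data set, with the client summing the two partial results.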
Ganglia
- a distributed monitoring tool for high-performance computing systems such as PC clusters and Grids
  - CPU load
  - memory usage
  - network traffic
http://ganglia.sourceforge.net/
Operation
- The testbed is operated by volunteers!
  - researchers/technical staff/students
- What we need to do
  - installation and its training for students
  - user management
  - job management
User Management
- local account
  - the same UID and login name for a user on all sites
  - remote login via ssh
    - public key
- Globus account
  - temporary CA for the Grid Challenge
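Keeping the same UID for a user on all sites matters because files on NFS are owned by numeric UID. A check for this could be sketched as follows, assuming each site's account list is available as passwd(5)-format text (name:password:UID:GID:...). The function names and the way the text is collected are assumptions, not part of the testbed's actual tooling.

```python
def parse_passwd(text):
    """Map login name -> UID from passwd(5)-format text."""
    uids = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split(":")
        uids[fields[0]] = int(fields[2])
    return uids


def uid_mismatches(site_passwd, users):
    """Return {user: {site: uid or None}} for users that are inconsistent.

    A user is reported when the UID differs between sites or when the
    account is missing (None) on any site.
    """
    parsed = {site: parse_passwd(text) for site, text in site_passwd.items()}
    problems = {}
    for user in users:
        per_site = {site: uids.get(user) for site, uids in parsed.items()}
        values = set(per_site.values())
        if len(values) != 1 or None in values:
            problems[user] = per_site
    return problems
```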
Job Management
- interactive or batch
  - All sites provide both environments for job execution.
- dedicated slot
  - Finalists are assigned dedicated slots for their application runs.
  - the gentlemen’s agreement
Troubles …
- computing nodes
  - OS hang-ups, troubles on hard disc drives
- power supply
  - failure of balancing power supply
- servers
  - troubles on NFS, batch queueing systems
- monitoring
  - troubles collecting monitoring data on Ganglia
Troubles … (cont’d)
- jobs being out of control
  - waste of CPU/memory resources by jobs being out of control
- dedicated slots
  - jobs running beyond their slots
Operational Issue
- trouble on computing nodes
  - monitoring tools to identify computing nodes
- power supply
  - critical problem for small groups, e.g., a lab in a university
  - tools for power monitoring
  - low-power processor
- servers
  - redundancy
Operational Issue (cont’d)
- user/process management
  - tools to control user processes
    - monitoring user processes
    - detecting unusual behavior
    - suspending/killing jobs being out of control
  - tools for reservation
    - reserving dedicated slots for users
    - controlling user jobs
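The reservation and process-control tools called for above could start from a check like the following: given the dedicated slots and the jobs currently running, list the jobs whose owner has no slot covering the current time. The `Slot` and `Job` records and `jobs_to_stop` are hypothetical; how the job list is gathered and how a job is actually suspended or killed are left out.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Slot:
    user: str
    start: datetime
    end: datetime


@dataclass(frozen=True)
class Job:
    pid: int
    user: str


def jobs_to_stop(jobs, slots, now):
    """Return running jobs whose owner holds no dedicated slot covering `now`."""
    active_users = {s.user for s in slots if s.start <= now < s.end}
    return [job for job in jobs if job.user not in active_users]
```

With 2.5-hour slots for two teams back to back, a job left running by the first team shows up in the result as soon as the second team's slot begins.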
Snapshots
[Photos: qualifying runs and final runs]
Conclusions
- Grid Challenge is a programming competition to develop high-performance programs on the Grid.
  - compulsory and free categories
- Grid testbed for Grid Challenge
  - 6 sites, 7 PC clusters, >1200 CPUs
  - Globus, SGE, PBS, GXP, Ganglia, Ninf-G, MPICH, …
- discussion about operational issues
  - tools for monitoring, power supply, user/process management
Acknowledgements
- Information Processing Society of Japan
- Sun Microsystems
- Soum Corporation
- Grid Consortium Japan
Thank you.