A Hierarchical Framework for Cross-Domain MapReduce Execution
Yuan Luo1, Zhenhua Guo1, Yiming Sun1, Beth Plale1, Judy Qiu1, Wilfred W. Li 2 1 School of Informatics and Computing, Indiana University, Bloomington, IN, 47405
2 San Diego Supercomputer Center, University of California, San Diego, La Jolla, CA, 92093
{yuanluo, zhguo, yimsun, plale, xqiu}@indiana.edu, [email protected]

ABSTRACT
The MapReduce programming model provides an easy way to execute pleasantly parallel applications. Many data-intensive life science applications fit this programming model and benefit from the scalability it can deliver. One such application is AutoDock, which consists of a suite of automated tools for predicting the bound conformations of flexible ligands to macromolecular targets. However, researchers also need sufficient computation and storage resources to fully enjoy the benefits of MapReduce. For example, a typical AutoDock based virtual screening experiment usually consists of a very large number of docking processes from multiple ligands and is often time consuming to run on a single MapReduce cluster. Although commercial clouds can provide virtually unlimited computation and storage resources on demand, many researchers, due to financial, security, and other concerns, still run experiments on a number of small clusters with limited numbers of nodes that cannot unleash the full power of MapReduce. In this paper, we present a hierarchical MapReduce framework that gathers computation resources from different clusters and runs MapReduce jobs across them. The global controller in our framework splits the data set and dispatches the partitions to multiple "local" MapReduce clusters, and balances the workload by assigning tasks in accordance with the capabilities of each cluster and of each node. The local results are then returned to the global controller for global reduction. Our experimental evaluation of AutoDock over MapReduce shows that our load-balancing algorithm produces a well-balanced workload distribution across multiple clusters and thus minimizes the overall execution time of the entire MapReduce job.
Categories and Subject Descriptors C.2.4 [Computer-Communication Networks]: Distributed Systems – Distributed applications.
General Terms: Design, Experimentation, Performance
Keywords: AutoDock, Cloud, FutureGrid, Hierarchical MapReduce, Multi-Cluster
1. INTRODUCTION
Life science applications are often both compute intensive and data intensive. They consume large amounts of CPU cycles while processing massive data sets that are either organized as large groups of small files or naturally splittable. These kinds of applications fit ideally into the MapReduce [2] programming model. MapReduce differs from the traditional HPC model in that it does not distinguish computation nodes from storage nodes, so each node is responsible for both computation and storage. Obvious advantages include better fault tolerance, scalability, and data-locality-aware scheduling. The MapReduce model has been applied to life science applications by many researchers. Qiu et al. [15] describe their work implementing various clustering algorithms using MapReduce.
AutoDock [13] is a suite of automated docking tools for predicting the bound conformations of flexible ligands to macromolecular targets. It is designed to predict how small molecules, such as substrates or drug candidates, bind to a receptor of known 3D structure. Running AutoDock requires several pre-docking steps, e.g., ligand and receptor preparation and grid map calculations, before the actual docking process can take place. There are desktop GUI tools for processing the individual AutoDock steps, such as AutoDockTools (ADT) [13] and BDT [19], but they cannot efficiently process thousands to millions of docking processes. Ultimately, the goal of a docking experiment is to illustrate the docked result in the context of the macromolecule, explaining the docking in terms of the overall energy landscape. Each AutoDock calculation results in a docking log file containing information about the best docked ligand conformation found in each of the docking runs specified in the docking parameter file (dpf). The results can then be summarized interactively using desktop tools such as AutoDockTools or with a Python script. A typical AutoDock based virtual screening consists of a large number of docking processes over multiple targeted ligands and takes a large amount of time to finish. However, the docking processes are data independent, so if several CPU cores are available, these processes can be carried out in parallel to shorten the overall makespan of multiple AutoDock runs.
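Because the docking runs are mutually independent, they parallelize with no coordination between tasks. A minimal sketch of this (mine, for illustration; `dock` is a hypothetical stand-in for a real AutoDock invocation, and the energies it returns are fabricated from the ligand name):

```python
# Sketch of embarrassingly parallel docking: each ligand is scored by an
# independent worker, and no communication is needed between tasks.
from concurrent.futures import ThreadPoolExecutor

def dock(ligand):
    # hypothetical stand-in for one AutoDock run; returns (ligand, "energy")
    return ligand, -float(len(ligand))

ligands = ["lig1", "lig22", "lig333"]
with ThreadPoolExecutor(max_workers=3) as pool:
    energies = dict(pool.map(dock, ligands))
```

Scaling this pattern from a thread pool on one machine to map tasks across clusters is exactly the gap the rest of the paper addresses.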
Workflow based approaches can also be used to run multiple AutoDock instances; however, a MapReduce runtime can automate the data partitioning for parallel execution. Our paper therefore focuses on extending the MapReduce model for parallel execution of applications across multiple clusters.
Cloud computing can provide scalable computational and storage resources as needed. With the correct application model and implementation, clouds enable applications to scale out with relative ease. Because of the "pleasantly parallel" nature of the MapReduce programming model, it has become a popular model for deploying and executing applications in a cloud, and running multiple AutoDock jobs certainly fits MapReduce well. However, many researchers still shy away from clouds for various reasons. For example, some researchers may not feel comfortable letting their data sit in storage space shared with users worldwide, while others may have amounts of data and computation so large that moving them into the cloud would be financially prohibitive. It is more typical for a researcher to have access to several research clusters hosted at his/her lab or institute. These clusters usually consist of only a few nodes, and the nodes in one cluster may be very different from those in another cluster in various specifications, including CPU frequency, number of cores, cache size, memory size, and storage capacity. Commonly a MapReduce framework is deployed in a single cluster to run jobs, but any such individual cluster does not provide enough resources to deliver significant performance gains. For example, at Indiana University we have access to the IU Quarry, FutureGrid [5], and TeraGrid [17] clusters, but each cluster imposes a limit on the maximum number of nodes a user can use at any time. If these isolated clusters can work together, they collectively become much more powerful.
Unfortunately, users cannot directly deploy a MapReduce framework such as Hadoop on top of these clusters to form a single larger MapReduce cluster. Typically the internal nodes of a cluster are not directly reachable from outside, yet MapReduce requires the master node to communicate directly with any slave node, which is one of the reasons why MapReduce frameworks are usually deployed within a single cluster. Therefore, one challenge is to make multiple clusters act collaboratively as one so that MapReduce can run on them more efficiently. There are two possible approaches to address this challenge. One is to unify the underlying physical clusters as a single virtual cluster by adding a special infrastructure layer, and to run MapReduce on top of this virtual cluster. The other is to make the MapReduce framework itself work with multiple clusters directly, without needing an additional special infrastructure layer.
We propose a hierarchical MapReduce framework which takes the second approach to gather isolated cluster resources into a more capable one for running MapReduce jobs. Kavulya et al. characterize MapReduce jobs into four categories based on their execution patterns: map-only, map-mostly, shuffle-mostly, and reduce-mostly, and also find that 91% of the MapReduce jobs they have surveyed fall into the map-only and map-mostly categories [10]. Our framework partitions and distributes MapReduce jobs from these two categories (map-only and map-mostly) into multiple clusters to perform map-intensive computation, and collects and combines the outputs in the global node. Our framework also achieves load-balancing by assigning different task loads to different clusters based on the cluster size, current load, and specifications of the nodes. We have implemented the prototype framework using Apache Hadoop.
The rest of the paper is organized as follows. Section 2 presents related work. Section 3 gives an overview of our hierarchical MapReduce framework. Section 4 presents more details on running multiple AutoDock instances using MapReduce. Section 5 gives the experiment setup and result analysis. Conclusions and future work are given in Section 6.
2. RELATED WORKS
Researchers have put significant effort into the easy submission and optimal scheduling of massively parallel jobs in clusters, grids, and clouds. Conventional job schedulers, such as Condor [12], SGE [6], PBS [8], and LSF [23], aim to provide highly optimized resource allocation, job scheduling, and load balancing within a single cluster environment. On the other hand, grid brokers and metaschedulers, e.g., Condor-G [4], CSF [3], Nimrod/G [1], and GridWay [9], provide an entry point to multi-cluster grid environments. They enable transparent job submission to various distributed resource management systems without the user worrying about the locality of execution and the resources available there. With respect to AutoDock based virtual screening, our earlier efforts, presented at the National Biomedical Computation Resource (NBCR) [14] Summer Institute 2009, addressed the performance issue of massive docking processes by distributing the jobs to a grid environment. We used the CSF4 [3] meta-scheduler to split docking jobs across heterogeneous clusters, where the jobs were handled by local job schedulers including LSF, SGE, and PBS.
Clouds give users a notion of virtually unlimited, on-demand resources for computation and storage. Owing to the ease with which it executes pleasantly parallel applications, MapReduce has become a dominant programming model for running applications in a cloud. Researchers are discovering new ways to make MapReduce easier to deploy and manage, more efficient and scalable, and more capable of accomplishing complex data processing tasks. Hadoop On Demand (HOD) [7] uses the TORQUE resource manager [16] to provision and manage independent MapReduce and HDFS instances on shared physical nodes. The authors of [21] identify some fundamental performance limitations in Hadoop, and in the MapReduce model in general, that make job response times unacceptably long when multiple jobs are submitted; by substituting their own scheduler implementation, they are able to overcome these limitations and improve job throughput. CloudBATCH [22] is a prototype job queuing mechanism for managing and dispatching MapReduce jobs and command-line serial jobs in a uniform way. Traditionally a cluster must set aside MapReduce-enabled nodes because they are dedicated to MapReduce jobs and cannot run serial jobs; CloudBATCH instead uses HBase to keep various metadata on each job and uses Hadoop to wrap command-line serial jobs as MapReduce jobs, so that both types of jobs can be executed on the same set of cluster nodes. Map-Reduce-Merge [20] extends the conventional MapReduce model to accomplish common relational algebra operations over distributed heterogeneous data sets. In this extension, the Merge phase is a new concept that is more complex than the regular Map and Reduce phases and requires learning and understanding several new components, including the partition selector, processors, merger, and configurable iterators. The extension also modifies the standard MapReduce phases to expose data sources in support of relational algebra operations in the Merge phase.
Sky Computing [11] provides end users a virtual cluster interconnected with ViNe [18] across different domains. It aims to bring convenience by hiding the underlying details of the physical clusters. However, this transparency may cause an unbalanced workload if a job is dispatched over heterogeneous compute nodes in different physical domains.
Our hierarchical MapReduce framework aims to enable map-only and map-mostly jobs to run across a number of isolated clusters (even virtual clusters), so that these isolated resources can collectively provide a more powerful resource for the computation. It can easily achieve load balance because the different clusters are visible to the scheduler in our framework.
3. HIERARCHICAL MAPREDUCE
The hierarchical MapReduce framework we present in this paper consists of two layers. The top layer has a global controller that accepts user-submitted MapReduce jobs and distributes them across different local cluster domains. Upon receiving a user job, the global controller divides the job into sub-jobs according to the capability of each local cluster. If the input data has not been deployed onto the clusters already, the global controller also
partitions the input data proportionally to the sub-jobs and sends the partitions to these clusters. After the sub-jobs have all finished on the local clusters, the global controller collects the outputs and performs the final reduction using the global reducer, which is also supplied by the user. The bottom layer consists of multiple local clusters; each receives sub-jobs and input data partitions from the global controller, performs the local MapReduce computation, and sends the results back to the global controller.

Although on the surface our framework may appear structurally similar to the Map-Reduce-Merge model presented in [20], our framework is very different in nature. As discussed in the related work section, the Merge phase introduced in the Map-Reduce-Merge model is a new concept that is more complex than the conventional Map and Reduce, and programmers implementing jobs under this model must not only learn this new concept along with the components required by it, but also need to modify the Mappers and Reducers to expose data sources. Our framework, on the other hand, strictly uses the conventional Map and Reduce, and a programmer just needs to supply two Reducers – one local Reducer and one global Reducer – instead of just one for the regular MapReduce. The only requirement is that the programmer must make sure that the formats of the local Reducer output key/value pairs match those of the global Reducer input key/value pairs. However, if the job is map-only, the programmer does not need to supply any reducers, and the global controller simply collects the map results from all clusters and places them under a common directory.

3.1 Architecture
Figure 1 is a high-level architecture diagram of our hierarchical MapReduce framework. The top layer in our framework is the global controller, which consists of a job scheduler, a data transferer, a workload collector, and a user-supplied global reducer. The bottom layer consists of multiple clusters for running the distributed local MapReduce jobs, where each cluster has a MapReduce master node with a workload reporter and a job manager. The compute nodes inside each of the clusters are not accessible from the outside.

Figure 1. Hierarchical MapReduce Architecture

When a user submits a MapReduce job to the global controller, the job scheduler splits the job into a number of sub-jobs and assigns them to the local clusters based on several factors, including the current workload reported by the workload reporter of each local cluster, as well as the capability of the individual nodes making up each cluster. This is done to achieve load balance by ensuring that all clusters will finish their portion of the job in approximately the same time. The global controller partitions the input data in proportion to the sub-job sizes if the data have not been deployed before-hand. The data transferer then transfers the user-supplied jar and job configuration files, together with the input data partitions, to the clusters. As soon as the data transfer finishes for a particular cluster, the job scheduler at the global controller notifies the job manager of that cluster to start the local MapReduce job. Since data transfer is very expensive, we recommend that users only use the global controller to transfer data when the input data is small and the time spent transferring it is insignificant compared to the computation time. For large data sets, it would be more efficient and effective to deploy them before-hand, so that the jobs get the full benefit of parallelization and the overall time does not get dominated by data transfer. After the sub-jobs are finished on a local cluster, the cluster will transfer the output data back to the global controller if the application requires it. Upon receiving all the output data from all local clusters, the global controller invokes the global reducer to perform the final reduction task, unless the original job is map-only.

3.2 Programming Model
The programming model of our hierarchical MapReduce framework is the "Map-Reduce-GlobalReduce" model, where computations are expressed as three functions: Map, Reduce, and Global Reduce. We use the term "Global Reduce" to distinguish it from the "local" Reducer, but conceptually as well as syntactically, a Global Reducer is just another conventional Reducer. The Mapper, just as a conventional Mapper does, takes an input pair and produces an intermediate key/value pair; likewise, the Reducer, just as a conventional Reducer does, takes an intermediate input key and a set of corresponding values produced by the Map tasks, and outputs a different set of key/value pairs. Both the Mapper and the Reducer are executed on the local clusters. The Global Reducer is executed on the global controller using the output from the local clusters. Table 1 lists these 3 functions and their input and output data types. The formats of the local Reducer output key/value pairs must match those of the Global Reducer input key/value pairs.

Table 1. Input and output types of Map, Reduce, and Global Reduce functions

Function Name    Input             Output
Map              <k1, v1>          list(<k2, v2>)
Reduce           <k2, list(v2)>    list(<k3, v3>)
Global Reduce    <k3, list(v3)>    list(<k4, v4>)

Figure 2 uses a tree-like structure to show the data flow sequence among the Map, Reduce, and Global Reduce functions. In this diagram, the root node is the global controller on which the Global Reduce takes place, and the leaf nodes represent local clusters that perform the Map and Reduce functions. The circled numbers shown in Figure 2 indicate the order in which the steps occur, and the arrows indicate the directions in which the data sets (key/value pairs) flow. A job is submitted into the system in Step 1, and the input key/value pairs are passed from the root node (global controller) to the child nodes (local clusters) in Step 2, where Map tasks are launched; each Map consumes an input key/value pair and produces a set of intermediate key/value pairs. In Step 3, the sets of intermediate key/value pairs are passed to the Reduce tasks, which are also launched at the local clusters. Each Reduce task consumes an intermediate key with a set of corresponding values, and produces yet another set of key/value pairs as output.
In Step 4, the local reduce outputs are sent back to the global controller to perform the Global Reduce task. The Global Reduce task takes in a key and a set of corresponding values that were originally produced by the local Reducers, performs the computation, and produces the output in Step 5.

Theoretically, the model we present can be extended to more than just two hierarchical layers, i.e., the tree structure in Figure 2 can have more depth by turning the leaf clusters into intermediate controllers similar to the global controller, each of which would further divide its assigned jobs and run them on its own set of children clusters. But for all practical purposes, we do not see a need for more than two layers for the foreseeable future, because each additional layer would increase the complexity as well as the overhead. If a researcher has a large number of small clusters available, it is most likely more efficient to use them all to create a broader bottom layer than to increase the depth.

Figure 2. Programming Model

3.3 Job Scheduling and Data Partitioning
The main challenge of our work is how to balance the workload among the local MapReduce clusters, which is closely tied to how the datasets are partitioned.

The input dataset for a particular MapReduce job may be either submitted by the user to the global controller before execution, or pre-deployed on the local clusters and exposed via a metadata catalog to the user who runs the MapReduce job. The scheduler on the global controller takes the data locality into consideration when partitioning the datasets and scheduling the job.

In this paper, we focus on the situation where the input dataset is submitted by the user. If the user manually split the dataset and ran separate sub-jobs on different clusters, it would be time consuming and error-prone. Our global controller is able to automatically count the total number of records in the input dataset using the user-implemented InputFormat and RecordReader, and it divides the dataset and assigns the correct number of records to each cluster.

We make the assumption that all map tasks of a MapReduce application are computation intensive and take approximately the same amount of time to run – this is a reasonable assumption, as we will see in the next section that applying MapReduce to running multiple AutoDock instances displays exactly this kind of behavior. The scheduling algorithm we use for our framework is as follows. For cluster i, i ∈ {1, ..., n}, let MaxMapper_i be the maximum number of Mappers that can run concurrently on the cluster, RunningMapper_i be the number of Mappers currently running on it, and AvailMapper_i be the number of available Mappers that can be added for execution on it. We also use Core_i to denote the number of CPU cores of the cluster, that is,

    MaxMapper_i = θ_i × Core_i                                        (1)

Normally we set θ_i = 1 in the local MapReduce clusters for computation intensive jobs, so we get

    MaxMapper_i = Core_i                                              (2)

For simplicity, let

    AvailMapper_i = MaxMapper_i − RunningMapper_i                     (3)

The weight of each sub-job can be calculated from (4), where the factor φ_i reflects the computing power of each cluster, e.g., the CPU speed, memory size, storage capacity, etc. The actual φ_i varies depending on the characteristics of the jobs, i.e., whether they are computation intensive or I/O intensive:

    Weight_i = (φ_i × AvailMapper_i) / Σ_{j=1..n} (φ_j × AvailMapper_j)   (4)

Let NumMapTask(x) be the total number of Map tasks for a particular job x, which can be calculated from the number of keys in the input to the Map tasks, and NumMapTask_i(x) be the number of Map tasks a user job x schedules to cluster i, so that

    NumMapTask_i(x) = Weight_i × NumMapTask(x)                        (5)

After partitioning the MapReduce job into sub-MapReduce jobs using equation (5), we move the data items of the datasets accordingly, either from the global controller to the local clusters, or from local cluster to local cluster.
4. AUTODOCK MAPREDUCE
We apply the MapReduce paradigm to running multiple AutoDock instances using the hierarchical MapReduce framework to prove the feasibility of our approach. We take the outputs of AutoGrid (one tool in the AutoDock suite) as input to the AutoDock runs. The key/value pairs of the input of the Map tasks are ligand names and the locations of the ligand files. We designed a simple input file format for AutoDock MapReduce jobs. Each input record, which corresponds to a map task, has 7 fields, shown in Table 2.

Table 2. AutoDock MapReduce input fields and descriptions

Field                   Description
ligand_name             Name of the ligand
autodock_exe            Path to AutoDock executable
input_files             Input files of AutoDock
output_dir              Output directory of AutoDock
autodock_parameters     AutoDock parameters
summarize_exe           Path to summarize script
summarize_parameters    Summarize script parameters

For our AutoDock MapReduce, the Map, Reduce, and Global Reduce functions are implemented as follows:

1) Map: The Map task takes a ligand, runs the AutoDock binary executable against a shared receptor, and then runs a Python script summarize_result4.py to output the lowest energy result using a constant intermediate key.

2) Reduce: The Reduce task takes all the values corresponding to the constant intermediate key, sorts the values by the energy from low to high, and outputs the sorted results to a file using a local reducer intermediate key.

3) Global Reduce: The Global Reduce finally takes all the values of the local reducer intermediate keys, and sorts and combines them into a single file ordered by the energy from low to high.
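The three roles described in Section 4 can be sketched end to end (my illustration; a fixed lookup table of made-up energies stands in for the actual AutoDock and summarize_result4.py invocations):

```python
# Map emits each ligand's lowest binding energy under one constant key,
# Reduce sorts one cluster's energies from low to high, and Global Reduce
# merges the sorted per-cluster lists into a single ranking.
ENERGIES = {"lig_a": -7.2, "lig_b": -9.1, "lig_c": -6.5, "lig_d": -8.4}  # hypothetical

def autodock_map(ligand):
    # stands in for: run AutoDock, then summarize the lowest energy result
    return "all", (ENERGIES[ligand], ligand)

def autodock_reduce(pairs):
    return sorted(pairs)  # ascending energy: best binder first

def global_reduce(per_cluster):
    return sorted(p for result in per_cluster for p in result)

cluster1 = autodock_reduce([autodock_map(l)[1] for l in ["lig_a", "lig_b"]])
cluster2 = autodock_reduce([autodock_map(l)[1] for l in ["lig_c", "lig_d"]])
ranking = global_reduce([cluster1, cluster2])
```

The constant intermediate key means every cluster's Reduce sees all of that cluster's energies, so the Global Reduce only has to merge already-sorted lists.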
5. EVALUATIONS
We evaluate our model by prototyping a Hadoop based hierarchical MapReduce system. The system is written in Java and shell scripts. We use ssh and scp scripts to perform the data stage-in and stage-out. On the local clusters' side, the workload reporter is a component that exposes Hadoop cluster load information to be accessed by the global scheduler. Our original design was to make it a separate program without touching the Hadoop source code. Unfortunately, Hadoop does not expose the load information we need to external applications, and we had to modify the Hadoop code to add an additional daemon that collects load data using the Hadoop Java APIs.

In our evaluation, we use several clusters, including the IU Quarry cluster and two clusters in FutureGrid. IU Quarry is a classic HPC cluster which has several login nodes that are publicly accessible from outside. After a user logs in, he/she can perform various job-related tasks, including job submission, job status query, and job cancellation. The computation nodes, however, cannot be accessed from outside. Several distributed file systems (Lustre, GPFS) are mounted on each computation node for storing the input data accessed by the jobs. FutureGrid partitions its physical clusters into several parts, each of which provides a different testbed such as Eucalyptus, Nimbus, and HPC.

Table 3. Cluster Node Specifications

Cluster   CPU                  Cache size   Memory
Hotel     Intel Xeon 2.93GHz   8192KB       24GB
Alamo     Intel Xeon 2.67GHz   8192KB       12GB
Quarry    Intel Xeon 2.33GHz   6144KB       16GB

To deploy Hadoop to traditional HPC clusters, we first use the built-in job scheduler (PBS) to allocate nodes. To balance maintainability and performance, we install the Hadoop program in a shared directory while storing data in a local directory, because the Hadoop program (the Java jar files, etc.) is loaded only once by the Hadoop daemons whereas the HDFS data is accessed multiple times.

We use three clusters for the evaluations – IU Quarry, FutureGrid Hotel, and FutureGrid Alamo. Each cluster has 21 nodes. They all run Linux 2.6.18 SMP. Within each cluster, one node is a dedicated master node (HDFS namenode and MapReduce jobtracker) and the other nodes are data nodes and task trackers. Each node in these clusters has an 8-core CPU. The specifications of these cluster nodes are listed in Table 3.

Considering that AutoDock is a CPU-intensive application, we set θ = 1 per Section 3.3, so that the maximum number of map tasks on each node is equal to the number of cores on the node. The version of AutoDock we use is 4.2, which is the latest stable version. The global controller does not care about low-level execution details because our local job managers hide the complexity.

In our experiments, we use 6,000 ligands and 1 receptor. One of the most important configuration parameters is ga_num_evals - the number of evaluations. The larger its value is, the higher the probability that better results may be obtained. Based on prior experiences, ga_num_evals is typically set from 2,500,000 to 5,000,000. We configure it to 2,500,000 in our experiments.
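For reference, on the classic (0.20-era) Hadoop used at the time, the per-node cap described above would be expressed in mapred-site.xml roughly as follows (the value 8 matches the 8-core nodes; this snippet is illustrative, not taken from the authors' configuration):

```xml
<!-- cap concurrent map tasks per TaskTracker at the node's core count -->
<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>8</value>
</property>
```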
Figure 3: Number of running map tasks for an AutoDock MapReduce instance

Figure 3 plots the number of running map tasks within one cluster during the job execution. The cluster has 20 data nodes and task trackers, so the maximum number of running map tasks at any moment is 20 * 8 = 160. From the plot, we can see that the number of running map tasks quickly grows to 160 in the beginning and stays approximately constant for a long time. Towards the end of the job execution, it drops quickly to a small value (roughly 0 - 5). Notice there is a tail near the end, indicating that the node usage ratio is low. At this moment, if new MapReduce tasks come in, the available mappers will be occupied by those new tasks.

Table 4. MapReduce execution time on three clusters under different numbers of map tasks

Number of Map Tasks   Hotel       Alamo       Quarry
Per Cluster           (seconds)   (seconds)   (seconds)
100                   1004        821         1179
500                   1763        1771        2529
1000                  2986        2962        4370
1500                  4304        4251        6344
2000                  5942        5849        8778

Test Case 1:
Our first test case is a base test case without involving the Global Controller, to find out how each of our local Hadoop clusters performs under different numbers of map tasks. We ran AutoDock in Hadoop to process 100, 500, 1000, 1500, and 2000 ligand/receptor pairs in each of the three clusters. See Table 4 for the results.
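The Table 4 numbers can be checked directly for linearity; a least-squares fit (mine, for illustration) gives each cluster's marginal seconds per map task, and the Quarry/Hotel slope ratio comes out close to 1.5:

```python
# Fit a least-squares slope of execution time vs. number of map tasks for
# each cluster, using the measurements from Table 4.
tasks = [100, 500, 1000, 1500, 2000]
times = {
    "Hotel":  [1004, 1763, 2986, 4304, 5942],
    "Alamo":  [821, 1771, 2962, 4251, 5849],
    "Quarry": [1179, 2529, 4370, 6344, 8778],
}

def slope(xs, ys):
    """Least-squares slope: marginal seconds per additional map task."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
            / sum((x - mx) ** 2 for x in xs))

per_task = {name: slope(tasks, ts) for name, ts in times.items()}
ratio = per_task["Quarry"] / per_task["Hotel"]
```

Hotel and Alamo come out near 2.6 seconds per task and Quarry near 4.0, so the roughly-50%-slower observation holds at the margin as well as in the totals.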
g the Global oop clusters s. We ran 0, 1500 and s. See Table
ime vs. the r is close to Reduce jobs. uarry cluster
isTC
TOMcecsrudpscthbsc
Tligsajo
Running on the Quarry cluster is approximately 50% slower than running on Alamo and Hotel. The main reason is that the nodes of the Quarry cluster have slower CPUs compared with those of Alamo and Hotel.

Figure 4. Local cluster MapReduce execution time for different numbers of map tasks.

Test Case 2: Our second test case shows the performance of executing MapReduce jobs with equally-weighted partitioned datasets on different clusters, which is based on the following parameter setup. From equation (4) in Section 3.3, we keep C constant, and i ∈ {1, 2, 3} for our three clusters. Our calculation yields 160 for each cluster, given no MapReduce jobs are running beforehand. Therefore, the weight of map tasks for each cluster is 1/3. We then equally partition the dataset (apart from the shared dataset) into 3 pieces and stage the data, together with the jar executable and the job configuration files, to the local clusters for execution. The local MapReduce executions run in parallel. After the jobs finish, the output files are staged back to the global controller for the final global reduce. Figure 5 shows the data movement cost in the stage-in and stage-out contexts.

The input dataset of AutoDock contains 1 receptor and 6000 ligands. The receptor is described as a set of gridmap files totaling 35MB in size, and the 6000 ligands are stored in 6000 separate directories, each of which is approximately 5-6 KB large. In addition, the executable jar and the job configuration file together have a total of 300KB in size.

Figure 5. Two-way data movement cost of equally-weighted partitioned datasets: local MapReduce inputs and outputs.
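The partition-and-stage path described above can be sketched as follows. This is an illustrative reconstruction, not the authors' implementation; the directory layout, function names, and the round-robin partitioning helper are assumptions, and the resulting tarball would then be pushed to each cluster (e.g., via scp) and unpacked there:

```python
import tarfile
from pathlib import Path

def partition_equally(items, n_clusters):
    """Test case 2 policy: split the ligand directories into n equal shares
    (round-robin), one share per local cluster."""
    return [items[i::n_clusters] for i in range(n_clusters)]

def stage_in_tarball(receptor_dir, ligand_dirs, jar_path, conf_path, out_path):
    """Pack one cluster's share of the input (receptor gridmaps, the ligand
    directories assigned to that cluster, the executable jar, and the job
    configuration) into a single compressed tarball, mirroring the global
    controller's "data stage-in" step."""
    with tarfile.open(out_path, "w:gz") as tar:
        tar.add(receptor_dir, arcname="receptor")
        for d in ligand_dirs:
            tar.add(d, arcname=f"ligands/{Path(d).name}")
        tar.add(jar_path, arcname=Path(jar_path).name)
        tar.add(conf_path, arcname=Path(conf_path).name)
    return out_path
```

The stage-out direction is symmetric: each cluster packs its output and control files into a tarball and copies it back to the global controller.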
For each cluster, the global controller creates a compressed tarball containing 1 receptor file set, 2000 ligand directories, the executable jar, and the job configuration files, and transfers it to the destination cluster, where the tarball is decompressed. We call this global-to-local procedure "data stage-in." Similarly, when the local MapReduce jobs finish, the output files together with control files (typically 300-500KB in size) are compressed into a tarball and transferred back to the global controller. We call this local-to-global procedure "data stage-out." As we can see from Figure 5, the data stage-in procedure takes 13.88 to 17.3 seconds to finish, while the data stage-out procedure takes 2.28 to 2.52 seconds to finish. The Alamo cluster takes a little longer to transfer the data, but the difference is insignificant compared to the relatively long duration of the local MapReduce executions.

The time it takes to run 2000 map tasks on each of the local MapReduce clusters varies due to the different specifications of the clusters. The local MapReduce execution makespan, including data movement costs (both data stage-in and stage-out), is shown in Figure 6. The Hotel and Alamo clusters take a similar amount of time to finish their jobs, but the Quarry cluster takes approximately 3,000 more seconds to finish, about 50% more than Hotel and Alamo. The Global Reduce task is only invoked after all the local results are ready in the global controller, and it takes only 16 seconds to finish. Thus, the relatively poor performance on Quarry becomes the bottleneck of the current job distribution.

Figure 6. Local MapReduce turnaround time of equally-weighted datasets, including data movement cost.

Test Case 3: In our third test case, we evaluate the performance of executing MapReduce jobs with CPU-capability-weighted partitioned datasets on different clusters, which is based on the following setup. From test cases 1 and 2, we have observed that although all clusters are assigned the same number of compute nodes and cores and the same amount of data, they take significantly different amounts of time to finish. Among the three clusters, Quarry is much slower than Alamo and Hotel. The cores on Quarry, Alamo and Hotel are Intel(R) Xeon(R) E5410 2GHz, Intel(R) Xeon(R) X5550 2.67GHz, and Intel(R) Xeon(R) X5570 2.93GHz, respectively. The inverse ratio of CPU frequencies and the ratio of processing times match roughly, so we hypothesize that the difference in processing time is mainly due to the different core frequencies. Therefore, it is not enough to merely factor in the number of cores for load balancing; the computation capabilities of each core are also important. We refine our scheduling policy to add CPU frequency as a factor when setting the weights: here we set 2.93 for Hotel, 2.67 for Alamo, and 2 for Quarry.
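The refined, frequency-aware policy can be illustrated with a small sketch. This is an interpretation of the weighting described above, not the authors' code: with equal node and core counts, taking each cluster's weight proportional to its per-core clock frequency reproduces the reported weights to within about 0.001, and partitioning the 6000 ligands by the reported weights yields the per-cluster task counts:

```python
# Sketch of CPU-frequency-weighted partitioning (test case 3).
# Assumption: with equal node/core counts, a cluster's weight is taken
# proportional to its per-core clock frequency.
freqs = {"Hotel": 2.93, "Alamo": 2.67, "Quarry": 2.0}   # GHz
total = sum(freqs.values())
approx_weights = {c: f / total for c, f in freqs.items()}

# Weights reported in the text (computed from cluster capacities):
weights = {"Hotel": 0.386, "Alamo": 0.3505, "Quarry": 0.2635}

# Partition the 6000-ligand dataset proportionally to the weights.
n_ligands = 6000
tasks = {c: round(n_ligands * w) for c, w in weights.items()}
print(tasks)  # → {'Hotel': 2316, 'Alamo': 2103, 'Quarry': 1581}
```

Note that the simple frequency ratio is only an approximation of the capacity-based weights computed by the framework, but the two agree closely here.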
As for test case 2, we again calculated the capacities of the clusters beforehand. Thus, given no MapReduce jobs are running, the weights are 0.386, 0.3505, and 0.2635 for Hotel, Alamo, and Quarry respectively. The dataset is also partitioned according to the new weights. Table 5 shows how the dataset is partitioned.

Table 5. Number of Map Tasks and MapReduce Execution Time on Each Cluster

  Cluster    Number of Map Tasks    Execution Time (Seconds)
  Hotel      2316                   5915
  Alamo      2103                   5888
  Quarry     1581                   6395

Figure 7 shows the data movement cost in the weighted scenario. The variations in the size of the tarballs due to the different numbers of ligand sets are quite small: the difference is smaller than 2MB. As we can see from the graph, the data stage-in procedure takes 12.34 to 17.64 seconds to finish, while the data stage-out procedure takes 2.2 to 2.6 seconds to finish. Alamo takes a little bit longer to transfer the data, but the difference is again insignificant given the relatively long duration of the local MapReduce executions, as in the previous test case.

Figure 7. Two-way data movement cost of CPU-capability-weighted partitioned datasets: local MapReduce inputs and outputs.

Figure 8. Local MapReduce turnaround time of CPU-capability-weighted datasets, including data movement cost.

With weighted partitioning, the local MapReduce execution makespans, including data movement costs (both data stage-in and stage-out), are shown in Figure 8. All three clusters take a similar amount of time to finish the local MapReduce jobs. We can see that our refined scheduler configuration improves performance by balancing the workload among the clusters.
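The framework's final Global Reduce step, which merges the partial results returned by the local clusters, can be pictured with a small sketch. It is illustrative only; the (ligand, energy) record format is an assumption for the AutoDock use case:

```python
# Sketch of the Global Reduce step: merge per-ligand results returned by
# the local MapReduce clusters and sort them by best docking energy.
def global_reduce(local_results):
    """local_results: one list of (ligand_id, best_energy) per cluster."""
    merged = {}
    for cluster_output in local_results:
        for ligand, energy in cluster_output:
            # keep the lowest (best) energy seen for each ligand
            if ligand not in merged or energy < merged[ligand]:
                merged[ligand] = energy
    # rank ligands by ascending energy (strongest predicted binding first)
    return sorted(merged.items(), key=lambda kv: kv[1])

ranked = global_reduce([
    [("lig1", -7.2), ("lig2", -5.1)],   # e.g. returned by Hotel
    [("lig3", -9.4)],                   # e.g. returned by Alamo
    [("lig4", -6.0)],                   # e.g. returned by Quarry
])
print(ranked[0])  # → ('lig3', -9.4)
```

Because this step only merges and sorts small per-cluster summaries, it is cheap relative to the local MapReduce executions.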
In the final stage, the global reduction combines the partial results from the lower-level clusters and sorts the results. The average global reduce time after processing 6000 map tasks (ligand/receptor dockings) is 16 seconds.

6. CONCLUSION AND FUTURE WORK
In this paper, we have presented a hierarchical MapReduce framework that can gather computation resources from different clusters and run MapReduce jobs across them. The applications implemented in this framework adopt the "Map-Reduce-Global Reduce" model, where computations are expressed as three functions: Map, Reduce, and Global Reduce. The global controller in our framework splits the data set and maps it onto multiple "local" MapReduce clusters to run the Map and Reduce functions, and the local results are returned back to the global controller to run the Global Reduce function. We use a resource capacity-aware algorithm to balance the workload among clusters. We used multiple AutoDock runs as a test case to evaluate the performance of our framework. The results show that the workloads are well balanced and the total makespan is kept to a minimum.

There are several potential improvements we will address in our future work. Because of the compute-intensive nature of the application, our scheduling algorithm only takes the CPU specifications into consideration. This will not be the case when an application has larger data sets and data movement becomes significant; other scheduling metrics, such as disk I/O and network I/O, need to be considered. The remote job submission and data movement in our current prototype are built upon the combination of ssh and scp, which may not work well in a heterogeneous environment. However, they can be replaced by other solutions. One possible solution for remote job submission is to integrate our framework with a meta-scheduler, e.g., CSF and Nimrod/G. Data movement can also be switched to solutions that are more scalable and work well in heterogeneous environments, such as gridftp. As an alternative to transferring data explicitly from site to site, we will also explore the feasibility of using a shared file system to share data sets among the global controller and the local Hadoop clusters.

7. ACKNOWLEDGMENTS
This work is funded in part by the Pervasive Technology Institute and Microsoft. Our special thanks to Dr. Geoffrey Fox for providing us early access to FutureGrid and valuable feedback on our work. We also would like to express our thanks to Chathura Herath for discussions.

8. REFERENCES
[1] Buyya, R., Abramson, D., Giddy, J. Nimrod/G: an architecture for a resource management and scheduling system in a global computational grid. In Proceedings of HPC ASIA'2000, China, IEEE CS Press, USA, 2000.
[2] Dean, J. and Ghemawat, S. 2008. MapReduce: simplified data processing on large clusters. Commun. ACM 51, 1 (January 2008), 107-113. DOI=10.1145/1327452.1327492 http://doi.acm.org/10.1145/1327452.1327492
[3] Ding, Z., Wei, X., Luo, Y., Ma, D., Arzberger, P. W., Li, W. W. Customized Plug-in Modules in Metascheduler CSF4 for Life Sciences Applications. New Generation Computing, Volume 25, Number 4, 373-394, 2007. DOI: 10.1007/s00354-007-0024-6 http://dx.doi.org/10.1007/s00354-007-0024-6
[4] Frey, J., Tannenbaum, T., Livny, M., Foster, I., Tuecke, S. 2002. Condor-G: A Computation Management Agent for Multi-Institutional Grids. Cluster Computing 5, 3 (July 2002), 237-246. DOI=10.1023/A:1015617019423 http://dx.doi.org/10.1023/A:1015617019423
[5] FutureGrid, http://www.futuregrid.org
[6] Gentzsch, W. (Sun Microsystems). 2001. Sun Grid Engine: Towards Creating a Compute Power Grid. In Proceedings of the 1st International Symposium on Cluster Computing and the Grid (CCGRID '01). IEEE Computer Society, Washington, DC, USA, 35-39
[7] Hadoop On Demand, http://hadoop.apache.org/common/docs/r0.17.2/hod.html
[8] Henderson, R. L.. 1995. Job Scheduling Under the Portable Batch System. In Proceedings of the Workshop on Job Scheduling Strategies for Parallel Processing (IPPS '95), Dror G. Feitelson and Larry Rudolph (Eds.). Springer-Verlag, London, UK, 279-294.
[9] Huedo, E., Montero, R. S., and Llorente, I. M. 2004. A framework for adaptive execution in grids. Softw. Pract. Exper. 34, 7 (June 2004), 631-651. DOI=10.1002/spe.584 http://dx.doi.org/10.1002/spe.584
[10] Kavulya, S., Tan, J., Gandhi, R., and Narasimhan, P. 2010. An Analysis of Traces from a Production MapReduce Cluster. In Proceedings of the 2010 10th IEEE/ACM International Conference on Cluster, Cloud and Grid Computing (CCGRID '10). IEEE Computer Society, Washington, DC, USA, 94-103. DOI=10.1109/CCGRID.2010.112 http://dx.doi.org/10.1109/CCGRID.2010.112
[11] Keahey, K., Tsugawa, M., Matsunaga, A., and Fortes, J. 2009. Sky Computing. IEEE Internet Computing 13, 5 (September 2009), 43-51. DOI=10.1109/MIC.2009.94 http://dx.doi.org/10.1109/MIC.2009.94
[12] Litzkow, M. J., Livny, M., Mutka, M. W. Condor - A Hunter of Idle Workstations. ICDCS 1988:104-111
[13] Morris, G. M., Huey, R., Lindstrom, W., Sanner, M. F., Belew, R. K., Goodsell, D. S. and Olson, A. J. (2009), AutoDock4 and AutoDockTools4: Automated docking with selective receptor flexibility. Journal of Computational Chemistry, 30: 2785–2791. doi: 10.1002/jcc.21256
[14] National Biomedical Computation Resource, http://nbcr.net
[15] Qiu, J., Ekanayake, J., Gunarathne, T., Choi, J. Y., Bae, S., Ruan, Y., Ekanayake, S., Wu, S., Beason, S., Fox, G., Rho, M., Tang, H. "Data Intensive Computing for Bioinformatics". In Data Intensive Distributed Computing, IGI Publishers, 2010.
[16] Staples, G. 2006. TORQUE resource manager. In Proceedings of the 2006 ACM/IEEE Conference on Supercomputing (SC '06). ACM, New York, NY, USA, Article 8. DOI=10.1145/1188455.1188464 http://doi.acm.org/10.1145/1188455.1188464
[17] Teragrid, http://www.teragrid.org
[18] Tsugawa, M., and Fortes, J. A. B. 2006. A virtual network (ViNe) architecture for grid computing. In Proceedings of the 20th International Conference on Parallel and Distributed Processing (IPDPS'06). IEEE Computer Society, Washington, DC, USA, 148-148.
[19] Vaqué, M., Arola, A., Aliagas, C., and Pujadas, G. 2006. BDT: an easy-to-use front-end application for automation of massive docking tasks and complex docking strategies with AutoDock. Bioinformatics 22, 14 (July 2006), 1803-1804. DOI=10.1093/bioinformatics/btl197 http://dx.doi.org/10.1093/bioinformatics/btl197
[20] Yang, H., Dasdan, A., Hsiao, R., and Parker, D. S. 2007. Map-Reduce-Merge: Simplified Relational Data Processing on Large Clusters. In Proceedings of the 2007 ACM SIGMOD International Conference on Management of Data (SIGMOD '07). ACM, New York, NY, USA, 1029-1040. DOI=10.1145/1247480.1247602 http://doi.acm.org/10.1145/1247480.1247602
[21] Zaharia, M., Borthakur, D, Sarma, J. S., Elmeleegy, K., Shenker, S., and Stoica, I. Job Scheduling for Multi-User MapReduce Clusters, Technical Report UCB/EECS-2009-55, University of California at Berkeley, April 2009.
[22] Zhang, C. and De Sterck, H. 2010. CloudBATCH: A Batch Job Queuing System on Clouds with Hadoop and HBase. In Proceedings of the 2010 IEEE Second International Conference on Cloud Computing Technology and Science, 368-375.
[23] Zhou, S, Zheng, X., Wang, J., and Delisle, P. 1993. Utopia: a load sharing facility for large, heterogeneous distributed computer systems. Softw. Pract. Exper. 23, 12 (December 1993), 1305-1336. DOI=10.1002/spe.4380231203 http://dx.doi.org/10.1002/spe.4380231203