COSHH: A Classification and Optimization based Scheduler for Heterogeneous Hadoop Systems

Aysan Rasooli, Douglas G. Down
Department of Computing and Software, McMaster University, L8S 4K1, Canada
Preprint submitted to Elsevier, January 13, 2014

Abstract

A Hadoop system provides execution and multiplexing of many tasks in a common datacenter. There is a rising demand for sharing Hadoop clusters amongst various users, which leads to increasing system heterogeneity. However, heterogeneity is a neglected issue in most Hadoop schedulers. In this work we design and implement a new Hadoop scheduling system, named COSHH, which considers heterogeneity at both the application and cluster levels. The main objective of COSHH is to improve the mean completion time of jobs. However, as it is concerned with other key Hadoop performance metrics, our proposed scheduler also achieves competitive performance under minimum share satisfaction, fairness and locality metrics with respect to other well-known Hadoop schedulers.

Keywords: Hadoop System, Scheduling System, Heterogeneous Hadoop

1. Introduction

Hadoop systems were initially designed to optimize the performance of large batch jobs such as web index construction [1]. However, due to its advantages, the number of applications running on Hadoop is increasing, which leads to a growing demand for sharing Hadoop clusters amongst multiple users [1]. Various types of applications submitted by different users require the consideration of new aspects in designing a scheduling system for Hadoop. One of the most important aspects which should be considered is heterogeneity in the system. Heterogeneity can be at both the application and the cluster levels. Application level heterogeneity is taken into account in some recent research on Hadoop schedulers [2]. However, to the best of our knowledge, cluster level heterogeneity is a neglected aspect in designing Hadoop schedulers. In this work, we introduce a new scheduling system (called COSHH) designed and implemented for Hadoop, which considers heterogeneity at both application and cluster levels.

The main approach in our scheduling system is to use system information to make better scheduling decisions, which leads to improved performance. COSHH consists of several components. The component which gathers system information was first introduced in [3], and is further developed in [4], which provides a means to estimate the mean job execution time based on the structure of the job, and the number of map and reduce tasks in each job.

The main motivations for our scheduler are as follows:

- Scheduling based on fairness, minimum share requirements, and the heterogeneity of jobs and resources. In a Hadoop system, satisfying the minimum shares of users is the first critical issue. The next important issue is fairness. We design a scheduling algorithm which has two stages. In the first stage, the algorithm considers the satisfaction of the minimum share requirements for all users. Then, in the second stage, the algorithm considers fairness for all users. Most current Hadoop scheduling algorithms consider fairness and minimum share objectives without considering heterogeneity of the jobs and the resources. One of the advantages of COSHH is that while it addresses the fairness and the minimum share requirements, it does this in a way that makes efficient assignments, by considering the heterogeneity in the system. The system heterogeneity is defined based on job requirements (e.g., estimated execution time) and resource features (e.g., execution rate). Consequently, the proposed scheduler reduces the average completion time of the jobs.
- Reducing the communication cost in the Hadoop system. The Hadoop system distributes tasks among the resources to reduce a job's completion time. However, Hadoop does not consider communication costs. In a large cluster with heterogeneous resources, maximizing a task's distribution may result in overwhelmingly large communication overhead. As a result, a job's completion time will be increased. COSHH considers the heterogeneity and distribution of resources in the task assignment.

- Reducing the search overhead for matching jobs and resources. To find the best matching of jobs and resources in a heterogeneous Hadoop system, an exhaustive search is required. COSHH uses classification and optimization techniques to restrict the search space. Jobs are categorized based on their requirements. Every time a resource is available, it searches through the classes instead of the individual jobs to find the best matching (using optimization techniques). The solution of the optimization problem results in the set of suggested classes for each resource, used for making routing decisions. Moreover, to avoid adding significant overhead, COSHH limits the number of times that classification and optimization are performed in the scheduler.

- Increasing locality. In order to increase locality, we should increase the probability that tasks are assigned to resources which also store their input data. COSHH makes a scheduling decision based on the suggested set of job classes for each resource. Therefore, the required data of the suggested classes of a resource can be replicated on that resource. This can lead to increased locality, in particular in large Hadoop clusters, where locality is more critical.

We use a Hadoop simulator, MRSIM [5], and extend it to evaluate our proposed scheduler. The four most common performance metrics for Hadoop systems (locality, fairness, minimum share satisfaction, and average completion time) are implemented. The performance of COSHH is compared with two commonly used Hadoop scheduling algorithms, the FIFO algorithm and the Fair-sharing algorithm [1]. The results show that COSHH has significantly better performance in reducing the average completion time, and satisfying the required minimum shares. Moreover, its performance for the locality and the fairness metrics is very competitive with the other two schedulers. Furthermore, we demonstrate the scalability of COSHH based on the number of jobs and resources in the Hadoop system. The sensitivity of the proposed algorithm to errors in the estimated job execution times is examined. Our results show that even in a system with as much as 40% error in estimating job execution times, the COSHH algorithm significantly improves average completion times. To evaluate the overhead of the COSHH scheduler, we present its scheduling time, and compare it with the other schedulers. The improvement in average completion time is achieved at the cost of increasing the overhead of scheduling. However, the additional overhead of the COSHH algorithm, compared to the improvement in average completion time, is in most cases negligible.

The remainder of this paper is organized as follows. The high level architecture of COSHH is introduced in Section 2. Details of the two main components in our proposed scheduler are presented in Sections 3 and 4. In Section 5, we present details of the evaluation environment, and study the performance of COSHH with two well-known real Hadoop workloads. Section 6 provides further discussion about the performance of the COSHH scheduler, and analyzes it from sensitivity and scalability perspectives.
Implementation on an actual cluster is presented in Section 7. Current Hadoop scheduling algorithms are given in Section 8. We discuss future directions in the concluding section.

2. Proposed Hadoop Scheduling System

The high level architecture of COSHH is presented in Figure 1. In this section we present a brief overview of all of the components. We will provide greater detail for the main components in the next two sections.

Figure 1: The high level architecture of COSHH

A typical Hadoop scheduler receives two main messages from the Hadoop system: a message signalling a new job arrival from a user, and a heartbeat message from a free resource. Therefore, COSHH consists of two main processes, where each process is triggered by receiving one of these messages. Upon receiving a new job, the scheduler performs the queuing process to store the incoming job in an appropriate queue. Upon receiving a heartbeat message, the scheduler triggers the routing process to assign a job to the current free resource. In Figure 1, the flows of the job arrival and heartbeat messages are presented by solid and dashed lines, respectively.

The high level architecture of COSHH consists of four components: the Hadoop system, the task scheduling process, the queuing process, and the routing process. The task scheduling process estimates the execution time of an incoming job on all resources. These estimates are passed to the queuing process to choose an appropriate queue for the incoming job. The routing process selects a job for the available free resource, and sends it to the task scheduling process. Using the selected job's characteristics, the task scheduling process assigns tasks of the selected job to available slots of the free resource. The Hadoop system and task scheduling process are introduced in this section, and detailed descriptions of the queuing and routing processes are provided in Sections 3 and 4, respectively.
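
As an illustration only, the message-driven structure described above can be written as two event handlers that dispatch to the queuing and routing processes. This is a minimal sketch, not the authors' implementation; the class name, hook names and the trivial stand-ins at the bottom are all hypothetical.

```python
class COSHHScheduler:
    """Skeleton of the two message-triggered processes described above (hypothetical names)."""

    def __init__(self, estimate, enqueue, suggestions, select_jobs, assign):
        # The five hooks correspond to the components named in the text:
        # task scheduling process (estimate, assign), queuing process
        # (enqueue, suggestions) and routing process (select_jobs).
        self.estimate, self.enqueue = estimate, enqueue
        self.suggestions, self.select_jobs, self.assign = suggestions, select_jobs, assign

    def on_job_arrival(self, job):
        # New-job message: estimate execution times, then queue the job in its class.
        self.enqueue(job, self.estimate(job))

    def on_heartbeat(self, resource):
        # Heartbeat message: pick jobs for the free resource and hand them back to
        # the task scheduling process, which assigns tasks to the free slots.
        self.assign(self.select_jobs(resource, self.suggestions()), resource)

# Wiring it up with trivial stand-ins just to show the message flow:
sched = COSHHScheduler(
    estimate=lambda job: {"R1": 10.0},
    enqueue=lambda job, est: print("queued", job, est),
    suggestions=lambda: {"R1": {"C1"}},
    select_jobs=lambda res, sugg: ["J1"],
    assign=lambda jobs, res: print("assign", jobs, "to", res),
)
sched.on_job_arrival("J1")
sched.on_heartbeat("R1")
```
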
2.1. Hadoop System

The Hadoop system consists of a cluster, which is a group of linked resources. The data in the Hadoop system is organized into files. The users submit jobs to the system, where each job consists of a number of tasks. Each task is either a map task or a reduce task. The Hadoop components related to our research are described as follows:

1. The cluster consists of a set of resources, where each resource has a computation unit and a data storage unit. The computation unit consists of a set of slots (in most Hadoop systems, each CPU core is considered as one slot), and the data storage unit has a specific capacity. We assume a cluster with M resources:

      Cluster = {R_1, ..., R_M}
      R_j = <Slots_j, Mem_j>

   Slots_j is the set of slots in resource R_j, where each slot (slot_j^k) has a specific execution rate (exec_rate_j^k). Generally, slots belonging to one resource have the same execution rate. A resource R_j has the following set of s slots:

      Slots_j = {slot_j^1, ..., slot_j^s}

   Mem_j is the storage unit of resource R_j, which has a specific capacity (capacity_j) and data retrieval rate (retrieval_rate_j). The data retrieval rate of resource R_j depends on the bandwidth within the storage unit of this resource.

2. Data in the Hadoop system is organized into files, which are usually large. Each file is split into small pieces, which are called slices (usually, all slices in a system have the same size). We assume that there are f files in the system, and each file is divided into l slices:

      Files = {F_1, ..., F_f}
      F_i = {slice_i^1, ..., slice_i^l}

3. We assume that there are N users in the Hadoop system, where each user (U_i) submits a set of jobs to the system (Jobs_i):

      Users = {U_1, ..., U_N}
      U_i = <Jobs_i>
      Jobs_i = {J_i^1, ..., J_i^{n_i}},

   where J_i^d denotes job d submitted by user U_i, and n_i is the total number of jobs submitted by this user. The Hadoop system assigns a priority and a minimum share to each user based on a particular policy (e.g., the pricing policy of [6]).

   The priority is an integer which shows the relative importance of a user. Based on the priority (priority_i) of a user U_i, we define a corresponding weight (weight_i), where the weight can be any integer or fractional number. The number of slots assigned to user U_i depends on her weight (weight_i). The minimum share of a user U_i (min_share_i) is the minimum number of slots that the system must provide for user U_i at each point in time.

   In a Hadoop system, the set of submitted jobs of a user is dynamic, meaning that the set of submitted jobs for user U_i at time t1 may be completely different at time t2. Each job (J_i) in the system consists of a number of map tasks and reduce tasks. A job J_i is represented by

      J_i = Maps_i ∪ Reds_i,

   where Maps_i and Reds_i are the sets of map tasks and reduce tasks of this job, respectively. The set Maps_i of job J_i is denoted by

      Maps_i = {MT_i^1, ..., MT_i^{m_i}}.

   Here, m_i is the total number of map tasks, and MT_i^k is map task k of job J_i. Each map task MT_i^k performs some processing on the slice (slice_j^l ∈ F_j) where the required data for this task is located. The set Reds_i of job J_i is denoted by

      Reds_i = {RT_i^1, ..., RT_i^{r_i}}.

   Here, r_i is the total number of reduce tasks, and RT_i^k is reduce task k of job J_i. Each reduce task RT_i^k receives and processes the results of some of the map tasks of job J_i.

The value mean_execTime(J_i, R_j) defines the mean execution time of job J_i on resource R_j, and the corresponding execution rate is defined as

   mean_execRate(J_i, R_j) = 1 / mean_execTime(J_i, R_j).
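
To make the notation above tangible, the following sketch writes the same model down as data structures. It is illustrative only; none of these Python names come from COSHH itself, and the fields simply mirror the definitions in this section.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Resource:                   # R_j = <Slots_j, Mem_j>
    slot_exec_rates: List[float]  # one execution rate per slot in Slots_j
    capacity: float               # capacity_j of the storage unit Mem_j
    retrieval_rate: float         # retrieval_rate_j

@dataclass
class Job:                        # J_i = Maps_i ∪ Reds_i
    num_map_tasks: int            # m_i
    num_reduce_tasks: int         # r_i
    mean_exec_time: List[float]   # mean_execTime(J_i, R_j) for each resource R_j

    def mean_exec_rate(self, j: int) -> float:
        # mean_execRate(J_i, R_j) = 1 / mean_execTime(J_i, R_j)
        return 1.0 / self.mean_exec_time[j]

@dataclass
class User:                       # U_i with its dynamic job set Jobs_i
    priority: int
    weight: float
    min_share: int
    jobs: List[Job] = field(default_factory=list)

# Example instance: one resource with two slots, one user with one job.
r1 = Resource(slot_exec_rates=[1.0, 1.0], capacity=512.0, retrieval_rate=40.0)
u1 = User(priority=3, weight=3.0, min_share=50, jobs=[Job(4, 1, [10.0])])
print(u1.jobs[0].mean_exec_rate(0))
```
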

2.2. Task Scheduling Process

Upon a new job arrival, an estimate of its mean execution times on the resources is required. The task scheduling process component uses a task duration predictor to estimate the mean execution times of the incoming job on all resources (mean_execTime(J_i, R_j)). This component is a result of research in the AMP lab at UC Berkeley [4].

To define the prediction algorithm, first various analyses are performed in [4] to identify important log signals. Then, the prediction algorithm is introduced using these log signals, and finally the accuracy of the prediction algorithm is evaluated on real Hadoop workloads. The prediction algorithm should be able to make a decision within a matter of microseconds, with fairly high accuracy. To achieve this goal, the estimator consists of two parts. The first part, chrond, is a daemon running in the background. It is responsible for analyzing Hadoop history log files as well as monitoring cluster utilization at a specified time interval. For example, for Facebook and Yahoo! workloads, an interval of six hours is able to provide the desired accuracy [7]. Periodically, k-means clustering [8] is applied to this data, which keeps track of the cluster boundaries. The second part, Predictor, is an on-the-spot decision engine. Whenever a new job arrives, the Predictor classifies its tasks into various categories depending on the file they operate on, the total cluster utilization at that point in time, and the input bytes they read, by consulting the lookup table populated by chrond. Finally, it returns the mean job execution times.

The refined Predictor algorithm for COSHH is provided in [9]. Accuracy experiments for Chronos are provided in [4] on 46 GB of Yahoo! and Facebook cluster logs. The results show that around 90% of map tasks are predicted within 80% accuracy. Further, about 80% of reduce tasks are predicted within 80% accuracy. Most importantly, the addition of Chronos to the existing Hadoop schedulers did not result in any significant performance degradation [4].
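
Since chrond and Predictor are only described at a high level here, the following is a rough, hypothetical sketch of that division of labour: a background step that clusters historical task records with k-means, and a lookup step that maps a new task's signals (file, cluster utilization, input bytes) to the mean duration of the nearest cluster. The class name, the feature choice and the toy data are assumptions for illustration, not the published algorithm.

```python
import numpy as np
from sklearn.cluster import KMeans

class DurationPredictor:
    """Toy lookup-table predictor in the spirit of chrond/Predictor (hypothetical)."""

    def __init__(self, n_clusters=8):
        self.kmeans = KMeans(n_clusters=n_clusters, n_init=10)
        self.cluster_mean_duration = None

    def rebuild(self, history):
        # Background step (chrond-like): cluster historical task records.
        # history: array of rows [file_id, cluster_utilization, input_bytes, duration]
        features, durations = history[:, :3], history[:, 3]
        labels = self.kmeans.fit_predict(features)
        self.cluster_mean_duration = np.array(
            [durations[labels == c].mean() for c in range(self.kmeans.n_clusters)])

    def predict(self, file_id, utilization, input_bytes):
        # On-the-spot step (Predictor-like): return the nearest cluster's mean duration.
        c = self.kmeans.predict([[file_id, utilization, input_bytes]])[0]
        return self.cluster_mean_duration[c]

# Toy usage with random history records.
rng = np.random.default_rng(0)
history = rng.random((200, 4))           # [file_id, utilization, input_bytes, duration]
p = DurationPredictor()
p.rebuild(history)
print(p.predict(0.1, 0.5, 0.3))
```
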
3. Queuing Process

Figure 2 shows the stages of the queuing process. The two main approaches used in the queuing process are classification and optimization based approaches, introduced in detail in Sections 3.1 and 3.2, respectively. At a high level, when a new job arrives, the classification approach specifies the job class, and stores the job in the corresponding queue. If the job does not fit any of the current classes, the list of classes is updated to add a class for the incoming job. The optimization approach is used to find an appropriate matching of job classes and resources. An optimization problem is defined based on the properties of the job classes and features of the resources. The result of the queuing process, which is sent to the routing process, contains the list of job classes and the suggested set of classes for each resource.

Figure 2: The queuing process

The classification and optimization approaches are used to reduce the search space in finding an appropriate matching of resources and jobs. Moreover, by using these approaches, we consider the heterogeneity in the system, and reduce completion times. However, using these two approaches can add overhead to the scheduling process. In order to avoid significant overhead, we limit the number of times that these steps are performed. Also, we perform the classification and the optimization approaches using methods with small overheads. In the following, we first introduce the details of the classification and optimization approaches, and later we provide a complete algorithm for the queuing process.

3.1. Classification-based Approach

Investigations on real Hadoop workloads show that it is possible to determine classes of common jobs [10]. COSHH uses k-means, a well-known clustering method [8], for classification. This method is used for classifying jobs in real Hadoop workloads [10].

We designed our scheduler based on the fact that there are two critical criteria with different levels of importance in a Hadoop system. The first criterion, imposed by the Hadoop provider, is satisfying the minimum shares. The Hadoop providers guarantee that upon a user's request at any time, her minimum share will be provided immediately (if feasible). The second criterion, important to improve the overall system performance, is fairness. Considering fairness prevents starvation of any user, and divides the resources among the users in a fair manner. Minimum share satisfaction has higher criticality than fairness. Therefore, COSHH has two classifications, to consider these issues first for minimum share satisfaction, and then for fairness. In the primary classification (for minimum share satisfaction), only the jobs whose users have min_share > 0 are classified, and in the secondary classification (for fairness) all of the jobs in the system are considered. The jobs whose users have min_share > 0 are considered in both classifications. The reason is that when a user asks for more than her minimum share, first her minimum share is given to her immediately through the primary classification. Then, extra shares should be given to her in a fair way by considering all users through the secondary classification.

In both classifications, jobs are classified based on their features (i.e., priority, mean execution rate on the resources (mean_execRate(J_i, R_j)), and mean arrival rate). The set of classes generated in the primary classification is defined as JobClasses1, where an individual class is denoted by C_i. Each class C_i has a given priority, which is equal to the priority of the jobs in this class. The estimated mean arrival rate of the jobs in class C_i is denoted by α_i, and the estimated mean execution rate of the jobs in class C_i on resource R_j is denoted by μ_{i,j}. Hence, the heterogeneity of resources is completely addressed with μ_{i,j}. The total number of classes generated with this classification is assumed to be F, i.e.

   JobClasses1 = {C_1, ..., C_F}.

The secondary classification generates a set of classes defined as JobClasses2. As in the primary classification, each class, denoted by C′_i, has priority equal to the priority of the jobs in this class. The mean arrival rate of the jobs in class C′_i is equal to α′_i, and the mean execution rate of the jobs in class C′_i on resource R_j is denoted by μ′_{i,j}. We assume that the total number of classes generated with this classification is F′, i.e.

   JobClasses2 = {C′_1, ..., C′_{F′}}.
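
As a concrete illustration of the classification step (not code from the paper), the sketch below groups jobs with k-means using the features named above: priority, mean arrival rate, and the mean execution rates on the resources. Per the discussion later in this section, k is set to the number of users; the data layout, the per-job arrival-rate field and the example numbers are my own assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans

def classify_jobs(jobs, num_users):
    """Group jobs into classes with k-means, k = number of users (illustrative only).

    jobs: list of dicts with keys 'priority', 'arrival_rate' and
          'exec_rates' (mean execution rate of the job on each resource).
    Returns one class label per job.
    """
    features = np.array([[j["priority"], j["arrival_rate"], *j["exec_rates"]]
                         for j in jobs])
    return KMeans(n_clusters=num_users, n_init=10).fit_predict(features)

# Example: six jobs on two resources, classified into three classes.
example_jobs = [
    {"priority": 3, "arrival_rate": 0.5, "exec_rates": [9.0, 5.0]},
    {"priority": 3, "arrival_rate": 0.4, "exec_rates": [8.5, 5.2]},
    {"priority": 2, "arrival_rate": 1.0, "exec_rates": [2.0, 1.0]},
    {"priority": 2, "arrival_rate": 0.9, "exec_rates": [2.1, 1.1]},
    {"priority": 1, "arrival_rate": 0.2, "exec_rates": [4.0, 4.0]},
    {"priority": 1, "arrival_rate": 0.3, "exec_rates": [4.2, 3.9]},
]
print(classify_jobs(example_jobs, num_users=3))
```
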

For example, Yahoo! uses the Hadoop system in production for a variety of products (job types) [11]: Data Analytics, Content Optimization, Yahoo! Mail Anti-Spam, Ad Products, and several other applications. Typically, the Hadoop system defines a user for each job type, and the system assigns a minimum share and a priority to each user. For example, assume a Hadoop system (called Exp1) with the parameters in Table 1. The corresponding jobs at a given time t are given in Table 2, where the submitted jobs of a user are based on the user's job type (e.g., J4, submitted by User1, is an advertisement product, while job J5 is a search ranking job).

Table 1: The Hadoop system example (Exp1)

   User    Job Type                   min share   priority
   User1   Advertisement Products     50          3
   User2   Data Analytics             20          2
   User3   Advertisement Targeting    40          3
   User4   Search Ranking             30          2
   User5   Yahoo! Mail Anti-Spam      0           1
   User6   User Interest Prediction   0           2

Table 2: The job queues in Exp1 at time t

   User    Job Queue
   User1   {J4, J10, J13, J17}
   User2   {J1, J5, J9, J12, J18}
   User3   {J2, J8, J20}
   User4   {J6, J14, J16, J21}
   User5   {J7, J15}
   User6   {J3, J11, J19}

The primary classification of the jobs in the Exp1 system, at time t, is presented in Figure 3. Note that here we assume that there is just one resource in the system. The secondary classification of system Exp1, at time t, is shown in Figure 4. The parameter k (used for k-means clustering) is set in our systems to be the number of users. Based on studies of Facebook and Yahoo! workloads, as well as studies of other Hadoop workloads, jobs sent from a user to a Hadoop cluster can be classified as belonging to the same class [10]. Setting k to a larger number does not have a significant impact on performance, as the matching is defined based on the heterogeneity of jobs and resources.

Figure 3: The primary classification of the jobs in Exp1 at time t
Figure 4: The secondary classification of the jobs in Exp1 at time t

3.2. Optimization based Approach

After classifying the incoming jobs, and storing them in their appropriate classes, the scheduler finds a matching of jobs and resources. The optimization approach used in our scheduler first constructs a linear program (LP) which considers properties of the job classes and features of the resources. The scheduler then solves this LP to find a set of suggested classes for each resource.

An LP is defined for each of the classifications. The first LP is defined for classes in the set JobClasses1 as follows:

   max λ
   s.t.  Σ_{j=1..M} δ_{i,j} μ_{i,j} ≥ λ α_i,   for all i = 1, ..., F,                     (1)
         Σ_{i=1..F} δ_{i,j} ≤ 1,               for all j = 1, ..., M,                     (2)
         δ_{i,j} ≥ 0,                          for all i = 1, ..., F and j = 1, ..., M.   (3)

Here λ is interpreted as the maximum capacity of the system, and δ_{i,j} is the proportion of resource R_j which is allocated to class C_i. Moreover, M is the total number of resources, and F is the total number of classes generated in the primary classification (|JobClasses1|). This optimization problem increases the arrival rates of all the classes by a common factor λ, as much as possible while the system is kept stable (constraint (1)); the resulting allocation minimizes the load in the system. After solving this LP, we have the allocation matrix δ, whose (i, j) element is δ_{i,j}. Based on the results of this LP, we define the set SC_j for each resource R_j as

   SC_j = {C_i : δ_{i,j} ≠ 0}.

For example, consider a system with two classes of jobs and two resources (M = 2, F = 2), in which the arrival and execution rates are α = (2.45, 2.45) and

   μ = [ 9  5
         2  1 ],

respectively (rows index classes, columns index resources). Solving the above LP gives λ = 1.0204 and

   δ = [ 0  0.5
         1  0.5 ].

Therefore, the sets SC_1 and SC_2 for resources R_1 and R_2 will be {C_2} and {C_1, C_2}, respectively. These two sets define the suggested classes for each resource, i.e., upon receiving a heartbeat from resource R_1, a job from class C_2 should be selected.
However, upon receiving a heartbeat from resource R_2, either a job from class C_1 or C_2 should be chosen. Even though resource R_1 has the fastest rate for class C_1, the algorithm does not assign any jobs of class C_1 to it. If the system is highly loaded, it turns out that the average completion time of the jobs will decrease if resource R_1 only executes class C_2 jobs.
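
To make the LP concrete, here is a small sketch that builds constraints (1)-(3) and reproduces the two-class, two-resource example above. It uses scipy rather than the CPLEX solver the authors report using later in this section, and the function and variable names are mine; the same construction applies to the secondary-classification LP introduced next, with the primed parameters.

```python
import numpy as np
from scipy.optimize import linprog

def suggested_classes(alpha, mu):
    """Solve: max lambda  s.t.  sum_j delta_ij*mu_ij >= lambda*alpha_i,
    sum_i delta_ij <= 1, delta_ij >= 0 (constraints (1)-(3) above)."""
    F, M = mu.shape
    n = 1 + F * M                        # decision variables: [lambda, delta_11, ..., delta_FM]
    c = np.zeros(n)
    c[0] = -1.0                          # maximize lambda  <=>  minimize -lambda
    A, b = [], []
    for i in range(F):                   # (1)  lambda*alpha_i - sum_j mu_ij*delta_ij <= 0
        row = np.zeros(n)
        row[0] = alpha[i]
        row[1 + i * M:1 + (i + 1) * M] = -mu[i]
        A.append(row)
        b.append(0.0)
    for j in range(M):                   # (2)  sum_i delta_ij <= 1
        row = np.zeros(n)
        for i in range(F):
            row[1 + i * M + j] = 1.0
        A.append(row)
        b.append(1.0)
    res = linprog(c, A_ub=np.array(A), b_ub=np.array(b),
                  bounds=[(0, None)] * n, method="highs")   # (3) nonnegativity
    lam, delta = res.x[0], res.x[1:].reshape(F, M)
    return lam, [{i + 1 for i in range(F) if delta[i, j] > 1e-9} for j in range(M)]

lam, SC = suggested_classes(np.array([2.45, 2.45]), np.array([[9.0, 5.0], [2.0, 1.0]]))
print(round(lam, 4), SC)                 # ~1.0204, suggested classes [{2}, {1, 2}]
```
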

The second optimization problem is used for the secondary classification. The scheduler defines an LP similar to the previous one, for classes in the set JobClasses2. However, in this LP the parameters λ, δ_{i,j}, μ_{i,j}, α_i, and F are replaced by λ′, δ′_{i,j}, μ′_{i,j}, α′_i, and F′, respectively:

   max λ′
   s.t.  Σ_{j=1..M} δ′_{i,j} μ′_{i,j} ≥ λ′ α′_i,   for all i = 1, ..., F′,                    (4)
         Σ_{i=1..F′} δ′_{i,j} ≤ 1,                 for all j = 1, ..., M,                     (5)
         δ′_{i,j} ≥ 0,                             for all i = 1, ..., F′ and j = 1, ..., M.  (6)

After solving this LP, we will have the matrix δ′, whose (i, j) element is δ′_{i,j}. We define the set SC′_j for each resource R_j as the set of classes which are allocated to this resource based on the result of this LP, where

   SC′_j = {C′_i : δ′_{i,j} ≠ 0}.

COSHH uses the sets of suggested classes SC_R and SC′_R for both making scheduling decisions and improving locality in the Hadoop system. The scheduling decision is made by the routing process, and locality can be improved by replicating input data on multiple resources in the Hadoop system. Most current Hadoop schedulers randomly choose three resources for replicating each piece of input data [1, 12]. However, COSHH uses the sets of suggested classes, SC_R and SC′_R, to choose replication resources. For each piece of input data, the initial incoming jobs using this data are considered, and from all the suggested resources for these jobs, three are randomly selected for storing replicas of the corresponding input data. Since in this work we evaluate our algorithm on a small cluster, we only consider the initial incoming jobs to determine the replication resources. However, in large Hadoop clusters with a high variety of available network bandwidths, developing our proposed replication method to consider the updates caused by later incoming jobs could lead to significant improvement in locality. We leave this as future work.
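
As a toy illustration of that replica-placement rule (the function and variable names are hypothetical, and the interaction with real HDFS placement is glossed over), the three replica locations could be picked like this:

```python
import random

def replication_resources(initial_jobs, suggested, copies=3):
    """Pick replica locations for one piece of input data.

    initial_jobs: ids of the initial incoming jobs that read this data.
    suggested:    dict mapping job id -> set of resources suggested for that job
                  (derived from SC_R and SC'_R).
    """
    candidates = set().union(*(suggested[j] for j in initial_jobs))
    # Fall back gracefully if fewer than `copies` distinct resources are suggested.
    return random.sample(sorted(candidates), min(copies, len(candidates)))

print(replication_resources([1, 2], {1: {"R1", "R3"}, 2: {"R2", "R3"}}))
```
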

We used the IBM ILOG CPLEX optimizer [13] to solve the LPs. A key feature of this optimizer is its performance in solving very large optimization problems, and the speed required for highly interactive analytical decision support applications [14]. As a result, solving the optimization problems in COSHH does not add considerable overhead. Now that we have defined the two main approaches of our proposed queuing process, the complete algorithm is presented in Algorithm 1.

Algorithm 1 Queuing Process
  When a new job (say J) arrives:
    get the execution time of J from the task scheduling process
    if J fits in any class (say C_i) then
      add J to the queue of C_i
    else
      use k-means clustering to update the job classification
      find a class for J (say C_j), and add J to its queue
      solve the optimization problems, and get two sets of suggested classes, SC_R and SC′_R
    end if
    send SC_R, SC′_R and both sets of classes (JobClasses1 and JobClasses2) to the routing process

4. Routing Process

When the scheduler receives a heartbeat message from a free resource, say R_j, it triggers the routing process. The routing process receives the sets of suggested classes SC_R and SC′_R from the queuing process, and uses them to select a job for the current free resource. This process selects a job for each free slot in the resource R_j, and sends the selected job to the task scheduling process. The task scheduling process chooses a task of the selected job, and assigns the task to its corresponding slot.

Here, it should be noted that the scheduler does not limit each job to just one resource. When a job is selected, the task scheduling process assigns a number of appropriate tasks of this job to available slots of the current free resource. If the number of available slots is fewer than the number of uncompleted tasks for the selected job, the job will remain in the waiting queue. Therefore, at the next heartbeat message from a free resource, this job is considered in making the scheduling decision; however, tasks already assigned are no longer considered. When all tasks of a job are assigned, the job will be removed from the waiting queue.

Algorithm 2 presents the routing process. There are two stages in this algorithm to select jobs for the available slots of the current free resource. In the first stage, the jobs of classes in SC_R are considered, where the jobs are selected in the order of their minimum share satisfaction. This means that a user who has the largest distance to her minimum share will get a resource share sooner. In the second stage, jobs of classes in SC′_R are considered, and jobs are selected in the order defined by the current shares and priorities of their users. In this way, the scheduler addresses fairness amongst the users. In each stage, if there are two users with exactly the same conditions, we randomly choose between them.

It should be noted that COSHH is a dynamic scheduler. Based on any variation in the Hadoop workload and resources, the classification and LP solver components can update the scheduling decisions accordingly.

Algorithm 2 Routing Process
  When a heartbeat message is received from a resource (say R):
    NFS = number of free slots in R
    while NFS ≠ 0 and there is a job J whose user satisfies
          user.minShare − user.currentShare > 0, whose class ∈ SC_R,
          and for which (user.minShare − user.currentShare) × weight is maximum do
      add J to the set of selected jobs (Jselected)
      NFS = NFS − 1
    end while
    while NFS ≠ 0 and there is a job J whose class ∈ SC′_R
          and for which user.currentShare / weight is minimum do
      add J to the set of selected jobs (Jselected)
      NFS = NFS − 1
    end while
    send the set Jselected to the task scheduling process to choose a task for each free slot in R
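
The two ordering rules in Algorithm 2 can be made concrete with a small sketch (hypothetical data layout, not the authors' code): stage one repeatedly takes the job whose user has the largest weighted gap to her minimum share, and stage two takes the job whose user has the smallest share per unit weight. For brevity, the per-user share bookkeeping between selections is omitted.

```python
def routing_process(free_slots, jobs, SC_R, SC_prime_R):
    """Algorithm 2 (sketch). Each job is a dict with keys:
    'cls', 'min_share', 'current_share', 'weight'."""
    selected = []

    def take(candidates, key):
        nonlocal free_slots
        while free_slots and candidates:
            job = max(candidates, key=key)     # best job under the stage's ordering rule
            candidates.remove(job)
            selected.append(job)
            free_slots -= 1

    # Stage 1: minimum share satisfaction, largest (minShare - currentShare) * weight first.
    stage1 = [j for j in jobs
              if j["cls"] in SC_R and j["min_share"] - j["current_share"] > 0]
    take(stage1, key=lambda j: (j["min_share"] - j["current_share"]) * j["weight"])

    # Stage 2: fairness, smallest currentShare / weight first (negated so max() picks it).
    stage2 = [j for j in jobs if j["cls"] in SC_prime_R and j not in selected]
    take(stage2, key=lambda j: -j["current_share"] / j["weight"])

    return selected

jobs = [
    {"cls": "C1", "min_share": 50, "current_share": 10, "weight": 3.0},
    {"cls": "C2", "min_share": 0,  "current_share": 8,  "weight": 2.0},
    {"cls": "C2", "min_share": 0,  "current_share": 2,  "weight": 1.0},
]
print(routing_process(2, jobs, SC_R={"C1"}, SC_prime_R={"C1", "C2"}))
```
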

5. Experimental Results

In this section we evaluate our proposed scheduling system on real Hadoop workloads. First, we define the performance metrics considered. Later, we introduce our experimental environment and workload, and finally we present the evaluation results.

5.1. Performance Metrics

We define the function Demand(U, t) as the set of unassigned tasks for user U at time t. Also, the function AssignedSlots(U, t) is defined as the set of slots executing tasks from user U at time t. We consider an experiment which is run for time T. Using these definitions, we define four Hadoop performance metrics:

1. Average completion time is the average completion time of all completed jobs.

2. Dissatisfaction measures how successful the scheduling algorithm is in satisfying the minimum share requirements of the users. A user whose current demand is not zero (|Demand(U, t)| > 0), and whose current share is less than her minimum share (|AssignedSlots(U, t)|