Apache Hadoop India Summit 2011 talk: "Scheduling in MapReduce using Machine Learning Techniques"

TRANSCRIPT

Scheduling in MapReduce using Machine Learning Techniques

IIIT Hyderabad

Vasudeva Varma vv@iiit.ac.in
Radheshyam Nanduri radheshyam.nanduri@research.iiit.ac.in

Cloud Computing Group
Search and Information Extraction Lab

http://search.iiit.ac.in

Agenda

• Cloud Computing Group @ IIIT Hyderabad
• Admission Control
• Task Assignment
• Conclusion

2

Cloud Computing Group @ IIIT Hyderabad

• Search and Information Extraction
– Large datasets
– Clusters of machines
– Web crawling
– Data intensive applications

• MapReduce
– Apache Hadoop

3

Research Areas

• Resource management for MapReduce
– Scheduling
– Data placement

• Power aware resource management
• Data management in cloud
• Virtualization

4

Teaching

• Cloud Computing course
– Monsoon semester (2008 onwards)
– Special focus on Apache Hadoop
  • MapReduce and HDFS
  • Mahout
– Virtualization
– NoSQL databases
– Guest lectures from industry experts

5

Learning Based Admission Control and Task Assignment in MapReduce

• Learning based approach
• Admission Control
– Should we accept a job for execution in the cluster?
• Task Assignment
– Which task to choose for running on a given node?

6

Admission Control

Deciding if and which request to accept from a set of incoming requests

Critical in achieving better QoS
Important to prevent overcommitting
Needed to maximize the utility from the perspective of a service provider

7

MapReduce as a Service

• Web services interface for MR jobs
• Users search jobs through repositories
• Select one that matches their criteria
• Launch it on clusters managed by the service provider
• Service providers rent infrastructure from an IaaS provider

8

Utility Functions

Three phase
Soft and hard deadlines
Decay parameters
Provision for service provider penalty
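
The transcript names the shape of these utility functions but not their formula. Below is a minimal sketch of one plausible three-phase curve, assuming full utility up to the soft deadline, linear decay until the hard deadline, and a provider penalty afterwards; the exact form and parameter meanings are assumptions, not taken from the slides.

```python
def three_phase_utility(completion_time, soft_deadline, hard_deadline,
                        max_utility, decay, penalty):
    # Phase 1: job finishes before the soft deadline -> full utility.
    if completion_time <= soft_deadline:
        return max_utility
    # Phase 2: between soft and hard deadline the utility decays with the
    # overshoot (the decay parameter controls how fast it falls).
    if completion_time <= hard_deadline:
        return max_utility - decay * (completion_time - soft_deadline)
    # Phase 3: past the hard deadline the service provider pays a penalty.
    return -penalty
```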

9

Our Approach

Based on Expected Utility Hypothesis from decision theory
Accept a job that maximizes the expected utility
Use pattern classifier to classify incoming jobs
Two classes
Utility functions for prioritizing
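
A minimal sketch of an expected-utility admission decision, assuming the two-class classifier exposes a probability-of-good estimate and that each job carries utility values for the success and failure outcomes. Names such as `probability_of_good`, `utility_on_success` and `utility_on_failure` are illustrative, not the scheduler's actual API.

```python
def expected_utility(job, classifier):
    # P(good): probability, from the two-class classifier, that accepting
    # this job leads to the "good" outcome (e.g. deadline met, no overload).
    p_good = classifier.probability_of_good(job.features)
    return (p_good * job.utility_on_success
            + (1.0 - p_good) * job.utility_on_failure)

def admit_job(candidate_jobs, classifier):
    # Accept the incoming job whose acceptance maximizes expected utility.
    return max(candidate_jobs, key=lambda job: expected_utility(job, classifier))
```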

10

Feature Vector

Given as input to the classifier
Contains job specific and cluster specific parameters
Includes variables that might affect admission decision

Cluster specific
– Used map slots
– Used reduce slots
– Pending maps
– Pending reduces
– Finishing jobs
– Map time average
– Reduce time average

Job specific
– Number of maps
– Number of reduces
– Mean map task time
– Mean reduce task time
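
A sketch of how the listed parameters could be packed into a single vector for the admission classifier; the attribute names are illustrative placeholders for the slide's features, not the scheduler's actual fields.

```python
def admission_features(cluster, job):
    # Cluster-specific followed by job-specific parameters, in slide order.
    return [
        cluster.used_map_slots,
        cluster.used_reduce_slots,
        cluster.pending_maps,
        cluster.pending_reduces,
        cluster.finishing_jobs,
        cluster.map_time_average,
        cluster.reduce_time_average,
        job.num_maps,
        job.num_reduces,
        job.mean_map_task_time,
        job.mean_reduce_task_time,
    ]
```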

11

Bayesian Classifier

Naive Bayes assumption: conditionally independent parameters
Works well in practice
Use past events to predict future outcomes
Application of Bayes theorem while computing probabilities
Incremental learning – efficient w.r.t. memory usage
Simple to implement
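
A minimal, self-contained sketch of an incremental Naive Bayes classifier over discretized feature values, illustrating the points above: counts are updated one past event at a time (so memory stays small), and Bayes' theorem with the conditional-independence assumption gives the class scores. This is a sketch of the technique, not the authors' implementation.

```python
from collections import defaultdict

class IncrementalNaiveBayes:
    """Tiny Naive Bayes over discrete feature values, updated one event at a
    time; memory grows only with the distinct (feature, value, class) triples."""

    def __init__(self, classes=("good", "bad")):
        self.classes = classes
        self.class_counts = defaultdict(int)    # N(c)
        self.feature_counts = defaultdict(int)  # N(feature i has value v, class c)
        self.values_seen = defaultdict(set)     # distinct values per feature index
        self.total = 0

    def update(self, features, label):
        """Learn from one past event: a feature vector and its observed outcome."""
        self.total += 1
        self.class_counts[label] += 1
        for i, v in enumerate(features):
            self.feature_counts[(i, v, label)] += 1
            self.values_seen[i].add(v)

    def score(self, features, label):
        """Unnormalized P(label | features) via Bayes' theorem, assuming features
        are conditionally independent given the class (add-one smoothing)."""
        p = (self.class_counts[label] + 1) / (self.total + len(self.classes))
        for i, v in enumerate(features):
            num = self.feature_counts[(i, v, label)] + 1
            den = self.class_counts[label] + len(self.values_seen[i]) + 1
            p *= num / den
        return p

    def classify(self, features):
        """Pick the most probable class for this feature vector."""
        return max(self.classes, key=lambda c: self.score(features, c))
```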

12

Evaluation

Success/Failure criteria: load management
Simulation
Baseline:
– Myopic: immediately select the job that has maximum utility
– Random: randomly select one job from the candidate jobs
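
Sketches of the two baselines as described on the slide; the `utility` attribute is assumed to hold a job's immediate utility.

```python
import random

def myopic_baseline(candidate_jobs):
    # Myopic: immediately pick the job with the maximum immediate utility.
    return max(candidate_jobs, key=lambda job: job.utility)

def random_baseline(candidate_jobs):
    # Random: pick any candidate job uniformly at random.
    return random.choice(candidate_jobs)
```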

13

Algorithm Accuracy

14

Comparison with baseline

Algorithm        Achieved load average
Random           42.11
Myopic           42.09
Our algorithm     0.97

15

Meeting Deadlines

16

Task Assignment

Deciding if a task can be assigned to a node
Learning based technique
Extension of the work presented before

17

Learning Scheduler

18

Features of Learning Scheduler

• Flexible task assignment – based on state of resources
• Considers job profile while allocating
• Tries to avoid overloading task trackers
• Allows users to control assignment by specifying priority functions
• Incremental learning

19

Using Classifier

• Use a pattern classifier to classify candidate jobs
• Two classes: good and bad
• Good tasks don't overload task trackers
• Overload: a limit set on system load average by the admin
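
A one-line sketch of that overload rule: an observation is labelled good for the classifier only if the task tracker's load average stayed within the admin-configured limit (the parameter names are illustrative).

```python
def label_observation(load_average, admin_load_limit):
    # "good" = the task did not push the node past the admin-set load limit.
    return "good" if load_average <= admin_load_limit else "bad"
```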

20

Feature Vector

• Job features
– CPU, memory, network and disk usage of a job
• Node properties
– Static: number of processors, maximum physical and virtual memory, CPU frequency
– Dynamic: state of resources, number of running map tasks, number of running reduce tasks
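
A sketch of the task-assignment feature vector combining the job and node properties listed above; the attribute names are illustrative, and `load_average` stands in for the slide's "state of resources".

```python
def assignment_features(job, node):
    return [
        # Job features: resource usage profile of the job.
        job.cpu_usage, job.memory_usage, job.network_usage, job.disk_usage,
        # Static node properties.
        node.num_processors, node.max_physical_memory,
        node.max_virtual_memory, node.cpu_frequency,
        # Dynamic node properties.
        node.load_average, node.running_map_tasks, node.running_reduce_tasks,
    ]
```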

21

Job Selection

• From the candidates labelled as good, select the one with maximum priority
• Create a task of the selected job
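
A sketch of that selection step, assuming a predicate that wraps the good/bad classifier and a user-supplied priority function; `obtain_new_task` is a hypothetical task-creation call, not Hadoop's actual API.

```python
def select_job(candidate_jobs, node, is_good, priority):
    # Keep only jobs the classifier labels "good" for this node.
    good = [job for job in candidate_jobs if is_good(job, node)]
    if not good:
        return None                        # assign nothing rather than overload
    # Among the good candidates, pick the one with the highest priority.
    chosen = max(good, key=priority)
    return chosen.obtain_new_task(node)    # hypothetical task-creation call
```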

22

Priority (Utility) Functions

• Policy enforcement
– FIFO: U(J) = J.age
– Revenue oriented
• If the priority of all jobs is equal, the scheduler will always assign the task that has the maximum likelihood of being labelled good.
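
Sketches of two priority functions: FIFO exactly as given on the slide (U(J) = J.age), and a revenue-oriented policy whose formula the slide does not give, so the `revenue` attribute is an assumption.

```python
def fifo_priority(job):
    # FIFO policy from the slide: U(J) = J.age, so older jobs come first.
    return job.age

def revenue_priority(job):
    # Revenue-oriented policy (assumed form): prioritize by what the job pays.
    return job.revenue
```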

23

Job Profile

• Users submit 'hints' about job performance
• Estimate the job's resource consumption on a scale of 10, 10 being the highest
• This data is passed at job submission time through job parameters:
– learnsched.jobstat.map – "1:2:3:4"
• The scheduler is open-sourced at http://code.google.com/p/learnsched/
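
A sketch of how such a hint string might be decoded; the slide only shows the "1:2:3:4" format, so the mapping of positions to CPU, memory, network and disk is an assumption.

```python
def parse_job_hint(hint):
    # Split a learnsched.jobstat.* value such as "1:2:3:4" into per-resource
    # scores on a 1-10 scale (position order assumed: CPU, memory, network, disk).
    cpu, memory, network, disk = (int(x) for x in hint.split(":"))
    return {"cpu": cpu, "memory": memory, "network": network, "disk": disk}

# e.g. parse_job_hint("1:2:3:4") -> {'cpu': 1, 'memory': 2, 'network': 3, 'disk': 4}
```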

24

Evaluation

• Evaluation workload
– TextWriter
– WordCount
– WordCount + 10 ms delay
– URLGet
– URLToDisk
– CPU Activity

25

Learning Behaviour

26

Classifier Accuracy

27

Conclusions

Feedback-informed classifiers can be used effectively

Better QoS than naive approaches

Less runtime, happier users, more revenue for the service provider

28

Thank you

IIIT Hyderabad

Questions/Suggestions/Comments?
Vasudeva Varma vv@iiit.ac.in
Radheshyam Nanduri radheshyam.nanduri@research.iiit.ac.in

Cloud Computing Group
Search and Information Extraction Lab

http://search.iiit.ac.in
