Improving Hadoop performance with reliable opportunistic instances in OpenStack

Telles Nóbrega, Henrique Truta, Andrey Brito, Universidade Federal de Campina Grande

Upload: eubrasilcloudforum-

Post on 28-Jan-2018


Page 1: Improving Hadoop performance with reliable opportunistic instances in OpenStack

Improving Hadoop performance with reliable opportunistic instances in OpenStack

Telles Nóbrega, Henrique Truta, Andrey Brito

Universidade Federal de Campina Grande

Page 2

Cloud Computing

● Cheap and flexible way to get resources on demand

● Provides resources in a pay-as-you-go fashion

● Has three basic types of services:
  ○ Infrastructure-as-a-Service
  ○ Software-as-a-Service
  ○ Platform-as-a-Service


Page 3

Underutilization of resources

● Overall cluster utilization is around 60% for CPU and 50% for RAM

● The same behavior can be seen when we look at single machines of the cluster

● Nevertheless, the user's quota (especially on private clouds) cannot be oversized
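To make the utilization figures concrete, the following minimal sketch estimates how many extra instances would fit in the unused capacity of a single host. The host size and the flavor are hypothetical values chosen for illustration; only the 60% CPU and 50% RAM utilization figures come from the slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Host:
    """Capacity of one compute host (hypothetical numbers)."""
    vcpus: int
    ram_gb: int


@dataclass(frozen=True)
class Flavor:
    """Size of one instance (hypothetical, not an OpenStack default)."""
    vcpus: int
    ram_gb: int


def idle_instance_slots(host: Host, cpu_util_pct: int, ram_util_pct: int,
                        flavor: Flavor) -> int:
    """Return how many instances of `flavor` fit in the host's unused capacity.

    Utilization is given as an integer percentage so that the arithmetic
    stays exact.
    """
    free_vcpus = host.vcpus * (100 - cpu_util_pct) // 100
    free_ram_gb = host.ram_gb * (100 - ram_util_pct) // 100
    # The scarcer resource limits the number of slots.
    return min(free_vcpus // flavor.vcpus, free_ram_gb // flavor.ram_gb)


# Assumed host: 40 vCPUs and 128 GB RAM, at 60% CPU and 50% RAM utilization.
host = Host(vcpus=40, ram_gb=128)
flavor = Flavor(vcpus=2, ram_gb=4)
slots = idle_instance_slots(host, cpu_util_pct=60, ram_util_pct=50, flavor=flavor)
print(slots)  # 8: CPU is the limiting resource here
```

Under these assumptions the host could hold eight more instances of that flavor, which is the capacity a fixed user quota would leave idle.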


Page 4

Underutilization of resources (Google Open Data)


Page 5

Data Processing with Hadoop

● Hadoop is one of the most widely used data processing tools on the market

● Implements the MapReduce paradigm

● Fault tolerant

● Often moved to the cloud
  ○ Easy to scale
  ○ Lower costs
  ○ The gain in flexibility often compensates for the loss in performance
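As a reminder of what the MapReduce paradigm looks like, here is a minimal single-process sketch of a word count in plain Python. It does not use Hadoop's API; the function names are illustrative, and a real Hadoop job would distribute the map and reduce tasks across the cluster's workers.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values by key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: combine all values of one key into a single count."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle and reduce in sequence on the given lines."""
    mapped = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())


counts = word_count(["hadoop runs map tasks", "hadoop runs reduce tasks"])
print(counts)
```

Because each map call and each reduce call is independent of the others, adding workers lets more of them run in parallel, which is why extra instances can shorten a job.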


Page 6

Hadoop as a Service

● The PaaS model makes Hadoop processing in the cloud easier

● The user can request a configured Hadoop cluster and just submit their data and applications

● Amazon Elastic MapReduce is an example; OpenStack Sahara is an equivalent solution


Page 7

Opportunistic instances

● Cheaper type of instance (somewhat similar to AWS Spot Instances)

● Spawned on underused compute resources

● The goal is to use them to speed up Hadoop processing
  ○ However, these instances can be preempted if the resources are needed for higher-priority (regular) instances
  ○ Losing an instance in the middle of a job execution is extremely harmful to application performance
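The preemption rule above can be sketched as a toy placement routine: a regular instance always gets its capacity, and opportunistic instances are evicted when the host is full. This is an illustration, not OpenStack's scheduler; the class names are hypothetical, and evicting the most recently added opportunistic instance first is an assumption, since the slides do not say how victims are chosen.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Instance:
    name: str
    vcpus: int
    opportunistic: bool = False


@dataclass
class Host:
    total_vcpus: int
    instances: List[Instance] = field(default_factory=list)

    def free_vcpus(self) -> int:
        return self.total_vcpus - sum(i.vcpus for i in self.instances)

    def place_regular(self, new: Instance) -> List[Instance]:
        """Place a regular instance, preempting opportunistic ones if needed.

        Returns the preempted instances. Raises RuntimeError, leaving the
        host unchanged, if the instance does not fit even after preemption.
        """
        candidates = [i for i in self.instances if i.opportunistic]
        victims: List[Instance] = []
        free = self.free_vcpus()
        # Assumed policy: evict the most recently added opportunistic instance first.
        while free < new.vcpus and candidates:
            victim = candidates.pop()
            victims.append(victim)
            free += victim.vcpus
        if free < new.vcpus:
            raise RuntimeError("not enough capacity even after preemption")
        for victim in victims:
            self.instances.remove(victim)
        self.instances.append(new)
        return victims


host = Host(total_vcpus=8, instances=[
    Instance("regular-1", vcpus=4),
    Instance("hadoop-worker-opp", vcpus=4, opportunistic=True),
])
preempted = host.place_regular(Instance("regular-2", vcpus=4))
print([i.name for i in preempted])  # ['hadoop-worker-opp']
```

In this example the Hadoop worker is lost as soon as a second regular instance arrives, which is the situation the rest of the talk tries to avoid.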


Page 8


Page 9

The approach

● A framework that uses opportunistic instances in Hadoop processing with a higher guarantee that these instances will not be lost

● Uses a workload predictor to forecast cloud utilization over a short time window, estimating how many instances will be available

● If the predictor underestimates the workload, live migration is used so the VM is not lost
  ○ Moves the instance to a different host without turning it off
  ○ Downtime ranges from a few milliseconds to a couple of seconds
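The two steps above, forecasting and falling back to live migration, can be sketched as follows. The slides do not say which predictor the framework uses, so a simple moving average stands in for it here; the function names, the safety margin, and the "host with the most free vCPUs" target rule are all assumptions made for illustration. The sketch only picks a target host and does not call any OpenStack API.

```python
from typing import Dict, List, Optional


def forecast_used_vcpus(history: List[int], window: int = 3) -> float:
    """Stand-in predictor: moving average of the last `window` usage samples."""
    if not history:
        raise ValueError("history must contain at least one sample")
    recent = history[-window:]
    return sum(recent) / len(recent)


def predicted_slots(total_vcpus: int, history: List[int], flavor_vcpus: int,
                    safety_vcpus: int = 0) -> int:
    """Estimate how many opportunistic instances fit in the next time window."""
    predicted_free = total_vcpus - forecast_used_vcpus(history) - safety_vcpus
    return max(0, int(predicted_free // flavor_vcpus))


def pick_migration_target(free_by_host: Dict[str, int], source: str,
                          needed_vcpus: int) -> Optional[str]:
    """Choose a host to live-migrate to when the forecast was too optimistic.

    Returns None when no other host has room, in which case the instance
    would be lost.
    """
    candidates = {host: free for host, free in free_by_host.items()
                  if host != source and free >= needed_vcpus}
    if not candidates:
        return None
    # Assumed policy: prefer the host with the most free vCPUs.
    return max(candidates, key=candidates.get)


# Hypothetical samples of used vCPUs on a 40-vCPU cloud.
usage_history = [20, 22, 24, 26, 28]
slots = predicted_slots(40, usage_history, flavor_vcpus=4, safety_vcpus=2)
print(slots)  # 3

target = pick_migration_target({"host-a": 0, "host-b": 6, "host-c": 4},
                               source="host-a", needed_vcpus=4)
print(target)  # host-b
```

The safety margin trades some idle capacity for fewer mispredictions; migration then only has to cover the cases where the forecast was still too optimistic.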


Page 10

The approach


Page 11

Results with an extra instance (from 3 to 4)


Page 12

Results

● VM loss reduced with prediction

● Live migration causes almost no interference with other processes

● Better resource usage in the cloud without the risks of overprovisioning or increasing the quota


Page 13

Thank you!

Andrey Brito
Professor - Universidade Federal de Campina Grande

Telles Nóbrega
Master's Degree Student - Universidade Federal de Campina Grande
OpenStack ATC

Henrique Truta
Master's Degree Student - Universidade Federal de Campina Grande
OpenStack ATC
