topology aware resource allocation

Topology-Aware Resource Allocation

for Data-Intensive Workloads

SUJITH NAIR

12316EN005

Programme:

Part I : An Introduction to Cloud Computing.

Part II : Topology-Aware Resource Allocation (TARA).


“ …a style of computing in which dynamically scalable and often virtualized resources are provided as a service over the internet.”

-Wikipedia

Part I : An Introduction to Cloud Computing

Infrastructure-as-a-Service

- The revenue models in cloud systems vary. Our model of interest is the Infrastructure-as-a-Service (IaaS) model.

- IaaS providers rent computing resources on demand and bill on a pay-as-you-go basis.


Examples of IaaS providers include Amazon EC2, Joyent, RackSpace, etc.

The cloud users rent compute cycles, storage, and bandwidth with small minimum billing units (an hour or less for compute and per-MB for storage and bandwidth) and almost-instant provisioning latency (minutes or seconds).

Part II : Topology-Aware Resource Allocation (TARA)

The need for TARA.

Service providers rent computing resources on-demand, bill on a pay-as-you-go basis, and multiplex many users on the same physical infrastructure.

Current IaaS systems usually provide Virtual Machines (VMs) that are subsequently customized by the user.

The placement of these VMs within the cloud can significantly impact application performance.

The workload’s resource usage characteristics and the topology and utilization of the IaaS need to be carefully considered to come up with an optimized allocation policy.

However, IaaS providers today are unaware of the hosted application’s requirements, so they allocate resources independently of them.

As a result, the allocation may fall short of optimal performance and, in the worst case, cause performance anomalies.

For example, when dealing with communication-intensive workloads, allocating VMs without considering network topology reduces performance by requiring inter-VM traffic to traverse bottlenecked network paths.

It is therefore critical to optimize the initial resource allocation, since a poor initial allocation could be responsible for the majority of performance anomalies.

The following figure shows the dramatic difference that an incorrect placement makes.

For this experiment, a distributed sort was performed on 90 GB of data across 18 nodes.

Allocating the clusters in a way that inter-cluster traffic flows through a congested part of the network increases the benchmark completion time by almost 50% when compared to an allocation where there is no constraint on network traffic.

Figure. Difference in Sort Execution Times, by Cluster Allocation

Thus, both the workload's resource usage characteristics and the topology of the IaaS need to be carefully considered to determine an optimized allocation policy.

This is where the significance of TARA lies.

Architecture Of TARA

TARA is an architecture that adopts a ‘what-if’ methodology.

The prototype for TARA consists of:

- Prediction Engine : a lightweight simulator that estimates the performance of a given resource allocation.

- Search Algorithm : finds an optimized solution in the large search space.

Figure. TARA’s Architecture


Objective Function:

- The objective function defines the metric that TARA should optimize.

- Our prototype’s objective function uses job completion time as the optimization metric.

- The output value for the objective function is calculated using the MapReduce simulator.


Application Description:

1. the framework type, which identifies the framework model to use,

2. workload-specific parameters that describe the particular application’s resource usage, and

3. a request for resources, including the number of VMs, storage, etc.


Information on Available Resources:

- A resource snapshot of the IaaS data centre. This includes:

1. list of available servers,

2. current load,

3. available capacity on individual servers,

4. data-centre topology,

5. available bandwidth on each network link.
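The two inputs listed above (the application description and the resource snapshot) can be pictured as plain data records. The following is only a sketch: every class name, field name, and unit here is a hypothetical choice made for illustration and is not taken from the TARA prototype.

```python
from dataclasses import dataclass, field


@dataclass
class ApplicationDescription:
    """Hypothetical record for the application description."""
    framework_type: str      # identifies the framework model, e.g. "mapreduce"
    workload_params: dict    # workload-specific resource-usage parameters
    num_vms: int             # requested number of VMs
    storage_gb: float = 0.0  # requested storage (unit is an assumption)


@dataclass
class Server:
    """Hypothetical record for one physical server in the snapshot."""
    name: str
    rack: str                # coarse topology information: which rack hosts it
    current_load: float      # fraction of capacity in use, 0.0 to 1.0
    free_vm_slots: int       # available capacity, counted in VMs


@dataclass
class ResourceSnapshot:
    """Hypothetical snapshot of the IaaS data centre."""
    servers: list
    # Available bandwidth per network link, keyed by (endpoint_a, endpoint_b).
    link_bandwidth_mbps: dict = field(default_factory=dict)

    def available_servers(self):
        """Return the servers that can still host at least one VM."""
        return [s for s in self.servers if s.free_vm_slots > 0]
```

A snapshot built this way could be handed to a search routine together with the application description; how the prototype actually encodes these inputs is not stated in the slides.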

Figure. TARA’s Integration into the IaaS Stack

Search Algorithm

In any large IaaS system, a request for r VMs will have a large number of possible resource allocation candidates.

If n servers are each available to host at most one VM, the total number of possible combinations is nCr = n! / (r! (n − r)!).

Hence, exhaustively searching through all possible candidates for an optimal solution is not feasible.
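To get a feel for how quickly the search space grows, the count nCr can be evaluated directly. This is a small sketch; the values of n and r below are hypothetical and are not taken from the evaluation.

```python
import math


def candidate_count(n: int, r: int) -> int:
    """Number of ways to choose r hosting servers out of n (nCr)."""
    return math.comb(n, r)


# Hypothetical sizes, chosen only to illustrate the growth.
for n, r in [(10, 5), (50, 10), (100, 20)]:
    print(f"n={n}, r={r}: {candidate_count(n, r):.3e} candidates")
```

Even at these modest sizes the number of candidates grows far beyond what could be evaluated one by one.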

To help in efficiently identifying an approximate solution, a genetic algorithm (GA) is used.

Genetic algorithms are a search technique inspired by evolutionary biology for finding solutions to optimization and search problems.

To represent each possible candidate, we use a bit string where the string length is equal to n, the number of servers available to host a single VM.

For each bit in the string, a value of 1 represents the physical server being selected for hosting a VM and a 0 represents the server being excluded.
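The bit-string encoding can be sketched in a few lines. The helper names `encode` and `decode` are hypothetical, and servers are assumed to be identified by their index in the list of available servers.

```python
def encode(selected_servers, n):
    """Build a length-n bit string with a 1 at each selected server index."""
    bits = ["0"] * n
    for index in selected_servers:
        bits[index] = "1"
    return "".join(bits)


def decode(bit_string):
    """Return the indices of the servers selected to host a VM."""
    return [i for i, bit in enumerate(bit_string) if bit == "1"]


candidate = encode([0, 3], n=5)
print(candidate)          # 10010
print(decode(candidate))  # [0, 3]
```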

Initialization: The GA initializes a population of 100 random candidates.

Reproduction: Mutation, swap, or crossover operations are applied at random to the candidate population to create offspring.

- mutation : exchanges two single bits in a string.

- swap : swaps two substrings.

- crossover : combines portions of different candidates into a new offspring.
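The three reproduction operators could look roughly like the sketch below, with candidates held as lists of 0/1 integers. Several details are assumptions made here and not stated in the slides: the swapped substrings have equal length and do not overlap, crossover is one-point, and a repair step restores exactly r selected servers after crossover.

```python
import random


def mutate(bits, rng):
    """Mutation: exchange the values at two randomly chosen positions."""
    out = list(bits)
    i, j = rng.sample(range(len(out)), 2)
    out[i], out[j] = out[j], out[i]
    return out


def swap(bits, rng, length=2):
    """Swap two non-overlapping substrings of equal length (assumption)."""
    out = list(bits)
    n = len(out)
    i = rng.randint(0, n - 2 * length)        # start of the first substring
    j = rng.randint(i + length, n - length)   # start of the second substring
    out[i:i + length], out[j:j + length] = out[j:j + length], out[i:i + length]
    return out


def crossover(a, b, rng, r):
    """One-point crossover, then repair so exactly r bits are set (assumption)."""
    point = rng.randint(1, len(a) - 1)
    child = list(a[:point]) + list(b[point:])
    ones = [i for i, v in enumerate(child) if v == 1]
    zeros = [i for i, v in enumerate(child) if v == 0]
    rng.shuffle(ones)
    rng.shuffle(zeros)
    while len(ones) > r:          # too many servers selected: drop some
        child[ones.pop()] = 0
    while len(ones) < r:          # too few servers selected: add some
        index = zeros.pop()
        child[index] = 1
        ones.append(index)
    return child
```

Mutation and swap only move bits around, so they keep the number of selected servers unchanged; crossover does not, which is why this sketch adds the repair step.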

Selection: For each successive GA iteration, or generation, the prediction engine of TARA is used to evaluate the fitness of each candidate.

Termination: In TARA, the search algorithm terminates and returns the best candidate found when it reaches a tuneable time limit (60 seconds in the current prototype).
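Putting initialization, reproduction, selection, and termination together, a time-bounded search loop might be sketched as follows. This is an illustration under stated assumptions, not the prototype's code: only the bit-exchange mutation is used for reproduction, selection keeps the candidates with the lowest predicted completion time, and `toy_predict` is a made-up stand-in for the prediction engine.

```python
import random
import time


def random_candidate(n, r, rng):
    """Random candidate: a list of n bits with exactly r ones."""
    chosen = set(rng.sample(range(n), r))
    return [1 if i in chosen else 0 for i in range(n)]


def exchange_bits(bits, rng):
    """Mutation: exchange the values at two randomly chosen positions."""
    out = list(bits)
    i, j = rng.sample(range(len(out)), 2)
    out[i], out[j] = out[j], out[i]
    return out


def ga_search(n, r, predict, time_limit_s=60.0, pop_size=100, seed=None):
    """Return the best candidate found before the time limit expires."""
    rng = random.Random(seed)
    population = [random_candidate(n, r, rng) for _ in range(pop_size)]
    deadline = time.monotonic() + time_limit_s
    while time.monotonic() < deadline:
        offspring = [exchange_bits(rng.choice(population), rng)
                     for _ in range(pop_size)]
        # Selection (assumption): keep the pop_size candidates with the
        # lowest predicted completion time.
        population = sorted(population + offspring, key=predict)[:pop_size]
    return min(population, key=predict)


def toy_predict(bits):
    """Made-up fitness: pretends low-index servers are better connected."""
    return sum(i for i, bit in enumerate(bits) if bit == 1)
```

For example, `ga_search(20, 5, toy_predict, time_limit_s=0.2, seed=1)` returns a 20-bit candidate with exactly five servers selected. The defaults mirror the population size of 100 and the 60-second limit mentioned in the slides.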

Prediction Engine

The Prediction Engine is designed to be fast and lightweight.

The Prediction Engine is a C++ simulator of the Hadoop 0.18.3 MapReduce framework.

Based on the framework-specific configuration, the simulator creates a number of workers that will “execute” tasks.

The ‘worker’ nodes host Map & Reduce tasks.

For every map or reduce task, the simulator allocates CPU cycles that are proportional to the input size instead of performing the actual computation.

It also accounts for Hadoop's initialization overhead, if any, for each new task.
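The idea of charging CPU cycles in proportion to input size, plus a per-task initialization overhead, can be sketched as a simple cost model. The parameter names and numbers below are hypothetical, and the sketch covers only CPU time and task start-up. It is written in Python for brevity, although the slides state that the actual simulator is implemented in C++.

```python
from dataclasses import dataclass


@dataclass
class SimConfig:
    """Hypothetical simulator parameters."""
    cycles_per_byte: float       # CPU cost charged per byte of task input
    cpu_hz: float                # clock rate of the simulated worker
    task_init_overhead_s: float  # fixed start-up cost for each new task


def task_time_s(input_bytes, cfg):
    """Estimated task duration: overhead plus cycles proportional to input."""
    cycles = cfg.cycles_per_byte * input_bytes
    return cfg.task_init_overhead_s + cycles / cfg.cpu_hz


def worker_time_s(task_inputs, cfg):
    """Estimated time for one worker to run its tasks one after another."""
    return sum(task_time_s(size, cfg) for size in task_inputs)


cfg = SimConfig(cycles_per_byte=30.0, cpu_hz=3.0e9, task_init_overhead_s=1.0)
print(task_time_s(100_000_000, cfg))             # one task with 100 MB input
print(worker_time_s([100_000_000, 50_000_000], cfg))
```

Because no real computation is performed, evaluating a candidate allocation costs only a handful of arithmetic operations per task, which is what keeps such an estimate fast.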

Experimental Setup

The HP Labs Open Cirrus cluster was used to evaluate TARA.

The testbed was composed of 111 machines, each with a single-socket quad-core 3.0 GHz Intel Xeon X3370 processor, 8 GB of RAM, a Gigabit Ethernet port, and four 750 GB disks.

OS: Ubuntu 9.04 Linux

Virtualization Software: Xen

Alternate Allocation Schemes

RR-R: allocates VMs in a round-robin (RR) manner across racks (-R).

RR-S: allocates VMs in a round-robin (RR) manner across servers (-S).

H-1: A hybrid policy that combines RR-S and RR-R with a preference for selecting servers in the rack with the greatest available bandwidth but will only select a maximum of 20 servers per rack.

H-2: A hybrid policy similar to H-1 but only selects a maximum of 10 servers per rack.
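One possible reading of these baseline policies is sketched below. The slides do not spell out the exact selection rules, so the interpretation is an assumption: each server hosts one VM, RR-S walks the server list in order, RR-R takes one server from each rack in turn, and the hybrid policy fills racks in decreasing order of available bandwidth up to a per-rack cap. All function names and the input layout are hypothetical.

```python
from collections import defaultdict
from itertools import zip_longest


def group_by_rack(servers):
    """Map rack name -> list of server names, keeping input order."""
    by_rack = defaultdict(list)
    for name, rack in servers:
        by_rack[rack].append(name)
    return by_rack


def rr_servers(servers, r):
    """RR-S (one reading): take servers in listed order, one VM each."""
    return [name for name, _rack in servers[:r]]


def rr_racks(servers, r):
    """RR-R: cycle over the racks, taking one server from each in turn."""
    picked = []
    for round_of_servers in zip_longest(*group_by_rack(servers).values()):
        for name in round_of_servers:
            if name is not None and len(picked) < r:
                picked.append(name)
    return picked


def hybrid(servers, r, rack_bandwidth, max_per_rack):
    """H-1/H-2 style: prefer high-bandwidth racks, capped per rack."""
    by_rack = group_by_rack(servers)
    picked = []
    for rack in sorted(by_rack, key=lambda k: rack_bandwidth[k], reverse=True):
        picked.extend(by_rack[rack][:max_per_rack])
    return picked[:r]


servers = [("s0", "A"), ("s1", "A"), ("s2", "A"),
           ("s3", "B"), ("s4", "B"), ("s5", "B")]
print(rr_servers(servers, 3))                         # ['s0', 's1', 's2']
print(rr_racks(servers, 3))                           # ['s0', 's3', 's1']
print(hybrid(servers, 3, {"A": 1.0, "B": 5.0}, 2))    # ['s3', 's4', 's0']
```

Under this reading, H-1 and H-2 would differ only in the `max_per_rack` argument (20 and 10 respectively).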

Sort Benchmark

For the evaluation of TARA and its comparison with alternate schemes, a distributed Sort benchmark is used.

160 GB (2 GB/node) of random data is generated as input for this benchmark.

Figure. Sort Benchmark Results

Figure. Predicted vs. Actual Results

Conclusion

Cloud-based Infrastructure-as-a-Service models are gaining in popularity.

However, the potentially huge variation in performance caused by application-unaware resource allocation is a key challenge to their increased adoption.

This paper proposes and evaluates a topology-aware resource allocation solution that addresses this problem.

References

G. Lee, N. Tolia, P. Ranganathan and R. Katz. Topology-Aware Resource Allocation for Data-Intensive Workloads. ACM SIGCOMM Computer Communication Review, Vol. 41, January 2011.

J. Dean and S. Ghemawat. MapReduce: Simplified Data Processing on Large Clusters. In Proceedings of the 6th Symposium on Operating Systems Design and Implementation (OSDI), December 2004.

Thank You.
