an integer programming representation for data center power-aware management - report

An Integer Linear Programming Representation for DataCenter Power-Aware Management - ILP implementation and Heuristic design

2012

Ioanna Tsalouchidou

Arinto Murdopo

Josep Subirats

Communication Networks Optimization

2012


Table of Contents

1. Introduction
2. ILP problem formulation
2.1. Problem statement
2.2. Scheduling Approach
2.3. Revenue, Power cost, Migration cost and Quality of Service as factors
2.4. The final Model
3. Metaheuristic Design
3.1. GRASP Overview
3.2. GRASP Heuristic Implementation
3.3. Random Job Choosing improvement
4. Dataset Generation
4.1. Node capacity array generation
4.2. Min/Max job CPU requirement array generation
5. Experiment Result
5.1. Experiment Platform and Configuration
5.2. Execution Time
5.3. GRASP Heuristic Alpha Value
5.4. GRASP Heuristic Solution Rate Quality
5.5. Maximum Benefit Comparison - CPLEX and GRASP Heuristic
6. Conclusions
7. Future Work
7.1. Implementation of the Local Search Phase of GRASP
7.2. Simulations with Multiple Problem Instances
8. References


Table of Figures

Figure 1: Scheduling problem
Figure 2: Job scheduling
Figure 3: Simple scheduling approach
Figure 4: Final scheduling ILP model
Figure 5: General GRASP algorithm
Figure 6: Job's benefit evaluation
Figure 7: Execution time of the CPLEX solution
Figure 8: Execution time for 100H200J R
Figure 9: NR GRASP Heuristic results
Figure 10: R GRASP Heuristic results
Figure 11: Normalized benefit over time for NR GRASP Heuristic
Figure 12: Normalized benefit over time for R GRASP Heuristic
Figure 13: CPLEX and R/NR GRASP Heuristic results
Figure 14: Possible local search phase approach


1. Introduction

This project is based on the paper “An Integer Linear Programming Representation for DataCenter

Power-Aware Management” [1]. This paper describes the problem of the placement of jobs in a data-

centre in order to maximize its benefit, while taking into account the revenue obtained for executing the

jobs, SLA penalization costs, migration costs and energy consumption costs.

In this project, the integer linear problem described in the aforementioned paper has been

implemented using IBM ILOG CPLEX, in order to be able to obtain the optimal solution of different

problem instances. The main contribution of this project has been the implementation of a GRASP

metaheuristic (with two variants) which tries to find a good solution to the same problem in much less

time than the original ILP. The metaheuristic has been implemented using Java, and has been tested and

compared against the results provided by the CPLEX ILP implementation.

This document is structured as follows. Section 2 summarizes the problem statement as described in [1]. Section 3 gives an overview of the GRASP metaheuristic and explains our implementation of it. Section 4 explains the dataset generation procedure used to run the simulations. Section 5 presents the performance and quality results of the metaheuristic. Finally, Section 6 presents our conclusions and Section 7 outlines possible improvements.


2. ILP problem formulation

2.1. Problem statement

In the paper, a data-centre is modelled as a set of physical machines, processors and jobs. Each element is subject to a set of constraints that must be satisfied for a solution to be consistent with the model.

Solving the problem means finding an optimal assignment of resources (in the paper, only the number of required CPUs is considered) to jobs, taking into account the electrical power consumption of the resources, the migration costs (in case a job has to be moved to another physical machine), the quality of service penalizations and the revenue obtained by running these jobs. In the end, a positive benefit has to be obtained.

Accordingly, the problem is defined as an objective function to be maximized (the obtained benefit) that balances these four factors, subject to a set of constraints that govern how jobs are placed on resources and keep the solution feasible and realistic.

2.2. Scheduling Approach

The proposed model relies on virtualization technology, which allows data centres to run several jobs on one or multiple physical machines, to share resources and, above all, to migrate jobs from one machine to another. This technology enables the dynamic allocation of jobs along with consolidation, the scheduling policy that reduces the number of used resources by allocating the largest number of jobs to the smallest number of machines without degrading QoS or performance.

Figure 1: Scheduling problem


As shown in Figure 2, the problem to be solved at each scheduling round is deciding which resources will be assigned to each job in order to maximize the overall benefit, taking into consideration the job revenue, the migration costs, the power costs and the tolerable QoS loss.

Figure 2: Job scheduling

For the scheduling process, each job is assumed to have known resource requirements, such as CPU quota or memory space, in order to run properly. Figure 3 shows the constraints of the problem as initially formulated.

Figure 3: Simple scheduling approach

The scheduling problem is therefore described as a matrix (schedule[Hosts,Jobs]) indicating whether a particular job is allocated to a particular host (schedule[host,job]=1) or not (schedule[host,job]=0). The “Unique” constraint is restrictive in this case, given that it requires all the jobs to be allocated for the solution to be feasible. The “Capacity” constraint requires that the capacity of each host is not exceeded.
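The two constraints of this simple model can be checked directly on the schedule matrix. The following is a minimal sketch, assuming that “Unique” means each job is placed on exactly one host and that each job has a single CPU requirement; the class name and the `jobCpu` and `hostCapacity` arrays are illustrative names, not taken from [1].

```java
final class SimpleScheduleCheck {

    /** "Unique" (as assumed here): every job is placed on exactly one host. */
    static boolean satisfiesUnique(int[][] schedule) {
        int jobs = schedule.length == 0 ? 0 : schedule[0].length;
        for (int j = 0; j < jobs; j++) {
            int placements = 0;
            for (int[] hostRow : schedule) {
                placements += hostRow[j];
            }
            if (placements != 1) {
                return false;
            }
        }
        return true;
    }

    /** "Capacity": the CPUs required by the jobs on a host never exceed its capacity. */
    static boolean satisfiesCapacity(int[][] schedule, int[] jobCpu, int[] hostCapacity) {
        for (int h = 0; h < schedule.length; h++) {
            int used = 0;
            for (int j = 0; j < schedule[h].length; j++) {
                used += schedule[h][j] * jobCpu[j];
            }
            if (used > hostCapacity[h]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        int[][] schedule = { {1, 1, 0}, {0, 0, 1} }; // schedule[host][job]: 2 hosts, 3 jobs
        int[] jobCpu = {1, 2, 4};
        int[] hostCapacity = {4, 4};
        System.out.println(satisfiesUnique(schedule)
                && satisfiesCapacity(schedule, jobCpu, hostCapacity));
    }
}
```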


2.3. Revenue, Power cost, Migration cost and Quality of Service as factors

The next step is the introduction of more constraints that will help us to model the maximization of the

economic benefit. Jobs are translated to money according to the data-centre pricing policy in the Service

Level Agreement (SLA). The Benefit equation is defined as follows: Benefit = Revenue - Costs.

The next step towards the final model focuses on power cost and consumption, and on how to minimize it. The power curve for a given host grows logarithmically with the number of active processors, which means that two multi-processor machines with one active processor each consume much more power than a single machine with two active processors.
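The effect of a logarithmic power curve on consolidation can be illustrated with a small sketch. The constants and the exact shape of `hostPower` below are hypothetical and chosen only for illustration; the actual power values used in the model are those given in [1].

```java
final class PowerCurveSketch {
    // Hypothetical constants for illustration only; they are not taken from [1].
    static final double BASE_WATTS = 200.0;
    static final double SCALE_WATTS = 40.0;

    /** Power of one host: logarithmic in the number of active CPUs. */
    static double hostPower(int activeCpus) {
        if (activeCpus == 0) {
            return 0.0; // assumption: a host without jobs is switched off
        }
        return BASE_WATTS + SCALE_WATTS * Math.log(1 + activeCpus);
    }

    public static void main(String[] args) {
        double spread = hostPower(1) + hostPower(1); // two hosts, one active CPU each
        double consolidated = hostPower(2);          // one host, two active CPUs
        System.out.println("spread=" + spread + " W, consolidated=" + consolidated + " W");
    }
}
```

With any curve of this shape, the consolidated placement draws less power than the spread one, which is what makes consolidation attractive in the objective function.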

In the resulting model, the number of active CPUs in a host is a natural number, so that the scheduler accounts for each processor of a host separately. Moreover, the number of active CPUs in a host must not exceed the number of CPUs that the host has available.

Consolidation also introduces a drawback: the migration cost, which represents the time lost while moving a job from one host to another. During this time, no revenue is obtained from the execution of the job, so this cost must be subtracted from the overall benefit.

The final step is to define the Quality of Service as an input to the scheduling problem. The system allows some degradation in the provided QoS, as specified in the SLA. In order to improve consolidation and reduce power consumption, QoS can be relaxed at the cost of a penalization, which is specified in the Service Level Agreement. The schedule can therefore be altered while taking these economic consequences into consideration.

To measure how well the job goals and the SLA are met, the authors define the Health function, which in their implementation is a linear function of the assigned CPU between cpumin and cpumax. In other words, a CPU assignment of cpumin (which depends on each job) means a health of 0 (maximum QoS penalization), which corresponds to running the job under minimal conditions, whereas a CPU assignment of cpumax means a health of 1 (no QoS penalization), which corresponds to the optimal execution of the job.
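A linear function with these two endpoints can be written as (cpu - cpumin) / (cpumax - cpumin). The sketch below implements that reading; the class and method names are illustrative, and the handling of a job with cpumin equal to cpumax is our own assumption.

```java
final class HealthSketch {

    /** Linear health: 0 at cpuMin (maximum QoS penalization), 1 at cpuMax (no penalization). */
    static double health(int assignedCpu, int cpuMin, int cpuMax) {
        if (assignedCpu < cpuMin || assignedCpu > cpuMax) {
            throw new IllegalArgumentException("assignment outside [cpuMin, cpuMax]");
        }
        if (cpuMax == cpuMin) {
            return 1.0; // assumption: a job without slack is fully healthy at its only valid assignment
        }
        return (double) (assignedCpu - cpuMin) / (cpuMax - cpuMin);
    }

    public static void main(String[] args) {
        for (int cpu = 2; cpu <= 4; cpu++) {
            System.out.println("cpu=" + cpu + " health=" + health(cpu, 2, 4));
        }
    }
}
```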

2.4. The final Model

As a result of all the above steps the final model is obtained as in Figure 4. This is also the final version of

the model that has been implemented in CPLEX.


Figure 4: Final scheduling ILP model

The presented ILP is based on the primary model, but incorporates the aspects described in Section 2.3. The complete ILP model, with the constraints, variables and parameters that appear in the above formulas, is explained in depth in [1].

It is worth mentioning that a product of variables appears in the paper's formulation of the problem. This makes the problem non-linear, which is harder and slower to solve. To avoid this, the authors decompose the constraint that prevents exceeding the maximum number of CPUs of each physical machine into several constraints (“MaxCPU”, “Capacity”, and “QosAux1-4”). This also introduces the “quota” variable. Refer to [1] for more details of this decomposition.

It is also interesting to observe that the authors relax the “Unique” constraint in the final model. It is no longer mandatory to schedule all the jobs for the solution to be feasible, so some jobs might not be scheduled in a given scheduling round.


3. Metaheuristic Design

3.1. GRASP Overview

The metaheuristic chosen for this project is GRASP (Greedy Randomized Adaptive Search Procedure). This metaheuristic builds as many feasible solutions as specified by a maximum number of iterations (construction phase) and evaluates the benefit of each one using the objective function. A local search (local search phase) is applied to each of these solutions in order to find better solutions starting from a feasible one. From all the evaluated solutions, the one with the maximum (when maximizing) or minimum (when minimizing) objective value is selected. This algorithm is shown in Figure 5.

Figure 5: General GRASP algorithm

This metaheuristic introduces randomization in the construction phase. This phase starts with an empty Candidate List (CL) of possible placements for a given job (following the example of our problem). The CL is then filled using a greedy function that evaluates the individual benefit of each placement. From this list, a Restricted Candidate List (RCL) is built, containing the placements whose benefit satisfies Equation 1. From the RCL, an element is selected randomly.

Equation 1

This randomness ensures that different solutions can be built in each iteration, although the amount of randomness depends on Alpha, which determines the RCL size. If Alpha=0, the behaviour is purely deterministic, as the best placement is always selected (only one element in the RCL). On the other hand, Alpha=1 gives completely random behaviour (all the combinations are present in the RCL), given that any combination, regardless of its benefit, can be selected.
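The role of Alpha can be sketched in code. The sketch below assumes the usual GRASP threshold for a maximization problem, benefit >= max - Alpha * (max - min), which matches the behaviour described for Alpha=0 and Alpha=1; the exact form of Equation 1 is an assumption here, and all names are illustrative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

final class RclSketch {

    /** Indices of the candidates whose benefit reaches the Alpha-dependent threshold. */
    static List<Integer> buildRcl(double[] benefits, double alpha) {
        double max = Double.NEGATIVE_INFINITY;
        double min = Double.POSITIVE_INFINITY;
        for (double b : benefits) {
            max = Math.max(max, b);
            min = Math.min(min, b);
        }
        // Assumed threshold; Alpha=1 is special-cased so that rounding never drops the worst candidate.
        double threshold = (alpha >= 1.0) ? min : max - alpha * (max - min);
        List<Integer> rcl = new ArrayList<>();
        for (int i = 0; i < benefits.length; i++) {
            if (benefits[i] >= threshold) {
                rcl.add(i);
            }
        }
        return rcl;
    }

    /** Uniform random choice among the RCL entries. */
    static int pick(List<Integer> rcl, Random rng) {
        return rcl.get(rng.nextInt(rcl.size()));
    }

    public static void main(String[] args) {
        double[] benefits = {10.0, 8.0, 5.0, 0.0};
        for (double alpha : new double[] {0.0, 0.3, 1.0}) {
            System.out.println("alpha=" + alpha + " RCL=" + buildRcl(benefits, alpha));
        }
        System.out.println("picked index " + pick(buildRcl(benefits, 0.3), new Random()));
    }
}
```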


3.2. GRASP Heuristic Implementation

In this project, the general GRASP design has been followed. Although the local search phase has not

been implemented, a possible way to do it has been envisioned and is explained in the Future Work

section.

The interesting part of the GRASP design is the Greedy Function. In our case, the algorithm evaluates the individual benefit of placing each one of the jobs in each of the hosts of the infrastructure. The individual benefit takes into account the revenue obtained by running a given job, the power consumption costs incurred by the scheduled job, the QoS penalties when it is not given the maximum amount of CPU it requires, and the migration costs in case it is moved from its host in the initial schedule to a different one in the new schedule.

As an initial approach, all the jobs are evaluated in order (first job number 1, then job number 2 and so on). Given that each job has a minimum and a maximum CPU requirement, each placement on a particular host is evaluated with all the possible CPU assignments. Note that each of these CPU assignments provides a different benefit, given that a smaller assignment uses less power but incurs a larger SLA penalty. In case a particular placement is not possible because the destination host does not have enough free CPUs, an individual benefit of negative infinity is assigned. In this case, the combination is not included in the Candidate List, because it would greatly distort the threshold which is used later to build the Restricted Candidate List.
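The enumeration described above can be sketched as a double loop over hosts and CPU assignments. The sketch is ours, not the project's actual code: `GreedyFunction` is a hypothetical stand-in for the individual benefit computation, and infeasible combinations are skipped instead of being stored with a benefit of negative infinity.

```java
import java.util.ArrayList;
import java.util.List;

final class CandidateListSketch {

    /** Hypothetical stand-in for the greedy function (revenue minus power, QoS and migration costs). */
    interface GreedyFunction {
        double benefit(int job, int host, int cpus);
    }

    /** One possible placement of a job: destination host, assigned CPUs and individual benefit. */
    static final class Candidate {
        final int host;
        final int cpus;
        final double benefit;

        Candidate(int host, int cpus, double benefit) {
            this.host = host;
            this.cpus = cpus;
            this.benefit = benefit;
        }
    }

    /** Evaluates every host and every CPU assignment in [cpuMin, cpuMax] for one job. */
    static List<Candidate> buildCandidateList(int job, int cpuMin, int cpuMax,
                                              int[] freeCpus, GreedyFunction greedy) {
        List<Candidate> cl = new ArrayList<>();
        for (int host = 0; host < freeCpus.length; host++) {
            for (int cpus = cpuMin; cpus <= cpuMax; cpus++) {
                if (cpus > freeCpus[host]) {
                    continue; // infeasible placement: left out of the CL so it cannot distort the threshold
                }
                cl.add(new Candidate(host, cpus, greedy.benefit(job, host, cpus)));
            }
        }
        return cl;
    }

    public static void main(String[] args) {
        int[] freeCpus = {1, 3, 8};
        List<Candidate> cl = buildCandidateList(0, 2, 4, freeCpus, (job, host, cpus) -> 10.0 * cpus);
        System.out.println(cl.size() + " feasible candidates");
    }
}
```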

Figure 6: Job's benefit evaluation


When all the possible placements of a particular job have been evaluated, the CL is reduced to the RCL as explained in Section 3.1. Then, one element is picked at random, which becomes the chosen combination. As in the general GRASP algorithm, the randomized behaviour is introduced when selecting the adopted combination from the RCL, given that it is a random selection. This combination specifies the destination host for a particular job, as well as the assigned CPU (which is in the range between its minimum and maximum required CPU).

When all the jobs have been evaluated and the scheduling matrix has been built, it is returned to the

general GRASP algorithm, along with the global benefit of this particular placement. If the obtained

placement achieves a better benefit than previous placements, it is stored along with its benefit.
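The outer loop described above can be sketched as follows. This is a minimal illustration and not the project's actual code: `Construction` is a hypothetical interface standing in for the construction phase, and `Solution` is an illustrative container for a placement and its global benefit.

```java
import java.util.Random;

final class GraspLoopSketch {

    /** A complete placement (host of each job, illustrative encoding) and its global benefit. */
    static final class Solution {
        final int[] hostOfJob;
        final double benefit;

        Solution(int[] hostOfJob, double benefit) {
            this.hostOfJob = hostOfJob;
            this.benefit = benefit;
        }
    }

    /** Hypothetical construction phase: builds one feasible solution per call. */
    interface Construction {
        Solution build(Random rng);
    }

    /** Runs the construction phase maxIterations times and keeps the best solution found. */
    static Solution run(int maxIterations, Construction construction, Random rng) {
        Solution best = null;
        for (int i = 0; i < maxIterations; i++) {
            Solution candidate = construction.build(rng);
            // No local search phase here, as in the implementation described in this section.
            if (best == null || candidate.benefit > best.benefit) {
                best = candidate;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Toy construction phase that returns an empty placement with a random benefit.
        Solution best = run(100, rng -> new Solution(new int[0], rng.nextDouble()), new Random());
        System.out.println("best benefit found: " + best.benefit);
    }
}
```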

3.3. Random Job Choosing improvement

The first approach considered in this project took each of the jobs to be placed from a list, where the jobs were always kept in the same order. However, it is clear that the last jobs in the list will have fewer possible hosts to be placed on and fewer possible CPU assignments, as some of the hosts may already be full with other jobs. Therefore, the first jobs in the queue have an advantage over the last ones. We will refer to this approach as the NR GRASP heuristic in subsequent sections.

Taking this into account, a possible way to fix this is to pick the jobs to be placed from the job list in random order. Then, even if a job is at a disadvantage in one iteration of the GRASP heuristic because it is evaluated last, it might be evaluated first in another iteration. We will refer to this second approach as the R GRASP heuristic throughout this document.
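The difference between the two variants comes down to the order in which jobs are handed to the construction phase. A minimal sketch, with illustrative names and assuming a uniform shuffle for the R variant:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

final class JobOrderSketch {

    /** Job evaluation order: fixed for NR GRASP, reshuffled on every call for R GRASP. */
    static List<Integer> jobOrder(int jobCount, boolean randomOrder, Random rng) {
        List<Integer> order = new ArrayList<>();
        for (int j = 0; j < jobCount; j++) {
            order.add(j);
        }
        if (randomOrder) {
            Collections.shuffle(order, rng); // assumption: every permutation is equally likely
        }
        return order;
    }

    public static void main(String[] args) {
        Random rng = new Random();
        System.out.println("NR order: " + jobOrder(5, false, rng));
        System.out.println("R order:  " + jobOrder(5, true, rng));
    }
}
```

Calling `jobOrder` once per GRASP iteration gives each job a chance to be placed while the hosts are still empty.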


4. Dataset Generation

4.1. Node capacity array generation

In order to generate large datasets for large simulations, we implemented an automatic way to generate the parameters. One of these parameters is the capacity of each host in terms of number of CPUs. Although the model presented in the paper only considered nodes with up to 4 CPUs, we extended the power model to cope with nodes with up to 8 CPUs in order to obtain richer results.

The capacity of each node is expressed as an array where each position represents the CPU capacity of host “i”. It can have a capacity of 1, 2, 4 or 8 CPUs, which is assigned randomly.
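A capacity array of this kind can be generated as in the sketch below. The uniform choice among the four capacities is our assumption (the text only says the value is assigned randomly), and the names are illustrative.

```java
import java.util.Arrays;
import java.util.Random;

final class CapacityGenSketch {
    private static final int[] CAPACITIES = {1, 2, 4, 8};

    /** One capacity per host, drawn uniformly (an assumption) from {1, 2, 4, 8}. */
    static int[] generate(int hosts, Random rng) {
        int[] capacity = new int[hosts];
        for (int i = 0; i < hosts; i++) {
            capacity[i] = CAPACITIES[rng.nextInt(CAPACITIES.length)];
        }
        return capacity;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(generate(5, new Random())));
    }
}
```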

4.2. Min/Max job CPU requirement array generation

As in the CPU capacity array generation, the minimum job CPU requirement is assigned a random value

from 1 to 10. Therefore, each position of the “consMin” array represents the minimum CPU

requirement of job “j”, ranging from 1 to 9. Even though 9 and 10 CPUs are not available in any node, it

was considered that having jobs which can’t be run in the infrastructure and have to be refused was a

more realistic characterization.

Once the minimum CPU consumption array has been computed, the maximum CPU consumption array

is generated. Each job’s maximum CPU requirement is calculated as its minimum CPU requirement plus

an extra 1 or 2 CPUs, chosen randomly.
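The two requirement arrays can be generated as sketched below. The text gives both "1 to 10" and "1 to 9" for the minimum requirement, so the sketch takes the inclusive upper bound as a parameter and uses 9 in the example; this bound, the uniform distributions and the name `consMax` are our assumptions.

```java
import java.util.Arrays;
import java.util.Random;

final class JobCpuGenSketch {

    /** Minimum CPU requirement per job, drawn from 1..upperInclusive. */
    static int[] generateMin(int jobs, int upperInclusive, Random rng) {
        int[] consMin = new int[jobs];
        for (int j = 0; j < jobs; j++) {
            consMin[j] = 1 + rng.nextInt(upperInclusive);
        }
        return consMin;
    }

    /** Maximum CPU requirement per job: its minimum plus an extra 1 or 2 CPUs. */
    static int[] generateMax(int[] consMin, Random rng) {
        int[] consMax = new int[consMin.length];
        for (int j = 0; j < consMin.length; j++) {
            consMax[j] = consMin[j] + 1 + rng.nextInt(2);
        }
        return consMax;
    }

    public static void main(String[] args) {
        Random rng = new Random();
        int[] consMin = generateMin(10, 9, rng);
        int[] consMax = generateMax(consMin, rng);
        System.out.println("consMin=" + Arrays.toString(consMin));
        System.out.println("consMax=" + Arrays.toString(consMax));
    }
}
```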


5. Experiment Result

5.1. Experiment Platform and Configuration

We used the same computer to execute our CPLEX and Java heuristic code. It was a Dell Latitude E6410 with an

Intel i7 M640 @2.8 GHz CPU, 8 Gigabytes of RAM, and 64-bit Windows 7 Operating System. We used

IBM ILOG CPLEX Optimization Studio 12.4 and our GRASP Java heuristic code was executed on top of

JRE 1.6.0_24-b07 from Oracle.

We simulated the following problem sizes: 5H10J (5-hosts-10-jobs), 15H30J, 20H40J, 30H60J, 40H80J

and 100H200J. For each problem size, we experimented with multiple Alpha (α) values ranging from 0 to 1, in steps of 0.1 (0, 0.1, 0.2, 0.3, ..., 1).

For the heuristic executions, we used multiple iteration configurations as follows: 10, 100, 1000, 10000

and 100000 iterations. We measured different performance aspects of both R and NR heuristic types.

The obtained results are discussed in the following subsections.

5.2. Execution Time

First of all, we measured the execution time in CPLEX for different problem sizes and compared them. As

expected, CPLEX takes a very long time to complete when the problem size is big. In our case, CPLEX was

able to complete a 20H40J instance in 228 seconds. When the problem size was increased to 30H60J,

CPLEX didn’t finish its execution, even though we let it run for 66 hours. We used the available CPLEX

results to measure the quality of the GRASP heuristic solution. Figure 7 shows the execution time of our

CPLEX solution.

Figure 7: Execution time of the CPLEX solution


Next, we continued to measure the execution time of the GRASP heuristic. GRASP is able to run for

larger problem sizes. Its execution time strongly depends on the number of iterations and the

configured problem size. We measured the execution time for all of the aforementioned problem sizes

and iteration configurations. Figure 8 shows the execution time for the largest problem size, 100H200J.

Figure 8: Execution time for 100H200J R

The x-axis is displayed in logarithmic scale. On a linear scale, the heuristic execution time grows linearly with the number of iterations. It is interesting to observe that the GRASP heuristic performs much better than CPLEX in terms of execution time. For example, the execution time of the GRASP heuristic with 100H200J and 100000 iterations is 80 seconds shorter than that of CPLEX with 20H40J. However, the superiority of the GRASP heuristic cannot be judged in terms of execution time alone; the quality of the obtained solution also has to be taken into account. Therefore, we also need to measure the performance of our GRASP heuristic in terms of solution quality. This aspect is discussed in Section 5.5.

5.3. GRASP Heuristic Alpha Value

In this section, we look for the best Alpha value for the GRASP heuristic. For every problem size and every number of iterations, we obtained a diagram depicting the relation between Alpha, benefit, number of iterations, and type of GRASP heuristic. NR refers to non-random job selection GRASP (NR GRASP heuristic), and R refers to random job selection GRASP (R GRASP heuristic).


Figure 9: NR GRASP Heuristic results

Figure 9 shows the results for the NR GRASP heuristic with four problem sizes. Based on these results, the maximum benefit is achieved with low Alpha values (0.1, 0.2, 0.3). We can also see that the higher the number of iterations, the higher the obtained benefit. As explained in Section 3.1, a value of Alpha=0 gives completely deterministic behaviour, given that only the combination with the best benefit is included when building the Restricted Candidate List, and it is therefore always the one selected in the end. Note that no randomness is introduced in the job selection either, as jobs are always selected in order. This behaviour can be clearly observed in Figure 9: with Alpha=0, the obtained benefit is exactly the same regardless of the number of iterations.


Figure 10: R GRASP Heuristic results

Figure 10 shows the results for the R GRASP heuristic with the same four problem sizes used for the NR GRASP heuristic. They show the same trend as the NR GRASP heuristic, whereby low Alpha values (0, 0.1, 0.2, 0.3) result in a higher benefit than other Alpha values. The number of iterations also determines the quality of the solution: the higher the number of iterations, the higher the obtained benefit. It is also interesting to note that for large problem sizes (30H60J, 40H80J, 100H200J), the highest value is obtained when Alpha is 0. This suggests that early randomization in the order in which jobs are evaluated contributes more to producing good results than randomization in the selection of candidates from the Restricted Candidate List (whose size depends on the value of Alpha).

Both the NR and R GRASP heuristics therefore obtain better benefits when the Alpha value is small, typically between 0 and 0.3 inclusive.


5.4. GRASP Heuristic Solution Rate Quality

In this section, we analyze how fast our GRASP heuristic reaches a significant percentage of the best solution it provides. We define a “significant percentage” as more than 90% of the best solution that can be found using GRASP (which is sometimes the optimal solution).

To perform this analysis, we used the largest problem size configuration (100H200J). For NR GRASP

heuristic, we used an Alpha value of 0.1 and 100000 iterations, while for R GRASP heuristic we used an

Alpha value of 0 and 100000 iterations.

Figure 11: Normalized benefit over time for NR GRASP Heuristic

Figure 11 shows the normalized benefit over time for the NR GRASP heuristic. The y-axis is the normalized benefit, which is the percentage of benefit relative to the maximum benefit that can be obtained in this configuration. Note that since we were not able to use CPLEX to obtain the best benefit for 100H200J, we used the maximum benefit obtained by the NR GRASP heuristic to calculate the normalized benefit. The x-axis denotes the time in milliseconds. From the figure, we observe that within 12.377 seconds, our NR GRASP heuristic is able to reach around 99.95% of its maximum benefit.
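The normalized benefit and the time needed to reach a given percentage can be computed from a trace of the best benefit found so far. The sketch below assumes such a trace is available as two parallel arrays and that benefits are positive; the method name and the trace values in the example are made up for illustration and are not measurements from our experiments.

```java
final class NormalizedBenefitSketch {

    /**
     * First timestamp at which the best-so-far benefit reaches the given fraction of the final
     * maximum, or -1 if it never does. The trace is non-decreasing, so its last entry is the maximum.
     */
    static long timeToReach(long[] timesMs, double[] bestBenefit, double fraction) {
        if (timesMs.length == 0 || timesMs.length != bestBenefit.length) {
            throw new IllegalArgumentException("trace arrays must be non-empty and of equal length");
        }
        double max = bestBenefit[bestBenefit.length - 1];
        for (int i = 0; i < bestBenefit.length; i++) {
            double normalized = bestBenefit[i] / max; // assumes a positive maximum benefit
            if (normalized >= fraction) {
                return timesMs[i];
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        long[] timesMs = {100, 600, 9000};       // illustrative trace, not measured data
        double[] bestBenefit = {80.0, 93.0, 100.0};
        System.out.println("90% reached after " + timeToReach(timesMs, bestBenefit, 0.90) + " ms");
    }
}
```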


Figure 12: Normalized benefit over time for R GRASP Heuristic

Figure 12 shows the results of the R GRASP heuristic. As in the NR GRASP heuristic results, we used the

maximum benefit obtained by the R GRASP heuristic to calculate the normalized benefit. The R GRASP

heuristic is able to reach 93% of maximum benefit within 0.617 seconds and 97.3% of maximum benefit

within 8.813 seconds.

Both the NR and R GRASP heuristics quickly reach more than 90% of their respective maximum benefits.

5.5. Maximum Benefit Comparison - CPLEX and GRASP Heuristic

CPLEX is guaranteed to obtain the optimal result for our problem instances. However, its execution time is very long and becomes unaffordable for big problem sizes. Heuristics, on the other hand, generally have shorter execution times, but their results should be compared to those of CPLEX in order to evaluate their quality. Figure 13 shows the comparison between CPLEX and GRASP heuristic results for multiple problem sizes. It also presents two different configurations of the R GRASP heuristic, one with 10000 iterations and the other with 100000 iterations.


Figure 13: CPLEX and R/NR GRASP Heuristic results

For the smallest problem size (5H10J), all configurations reach the same maximum benefit. However, at 10H20J we already observe that NR GRASP with 100000 iterations is not able to reach the optimal solution obtained by CPLEX, whereas R GRASP with 10000 and 100000 iterations reaches CPLEX’s optimal benefit. The same trend is observed for the 15H30J problem size. At 20H40J, R GRASP with 10000 iterations is no longer able to reach the maximum value produced by CPLEX, which shows that the number of iterations also affects the maximum obtained benefit. Overall, our R GRASP heuristic with 100000 iterations produces good results: its maximum benefit equals CPLEX’s optimal benefit in all the simulations where CPLEX data was available.

For bigger problem sizes (30H60J, 40H80J, 100H200J) we were not able to obtain CPLEX results due to its extremely long execution times. We observe that the R GRASP heuristic with 100000 iterations produces the highest benefit among the GRASP configurations. The execution time of the 100000-iteration R GRASP heuristic is around 5 minutes, which is still reasonable. We could not confirm whether the resulting value is the optimal benefit, but based on the good comparison results with CPLEX for smaller problem sizes, we are confident that it is also a good solution. In this case, the quality of the solution can be improved by increasing the number of iterations, provided that enough time and resources are available.


6. Conclusions

In this project, we have implemented an ILP problem for data-center job scheduling and management. We used IBM ILOG CPLEX to solve our ILP model. We also implemented two variants of the GRASP heuristic to solve the scheduling problem: Non Random Job Selection GRASP (NR GRASP heuristic) and Random Job Selection GRASP (R GRASP heuristic). While implementing the GRASP heuristics, we found that complex ILP constraints can be translated into relatively simple heuristic Java code.

The R GRASP heuristic performs better than the NR GRASP heuristic. For the four small problem sizes for which CPLEX is able to find the optimal benefit, the R GRASP heuristic with 100000 iterations also obtains CPLEX’s optimal benefit. The NR GRASP heuristic, however, only reaches around 83.5% of the maximum benefit for 15H30J and 88.11% of the maximum benefit for 10H20J.

In both GRASP implementations, lower Alpha values produce better results. Interestingly, for the R GRASP heuristic the best result is obtained when Alpha is 0. We also observe that more iterations achieve better results, at the cost of a longer execution time.

Regarding the scalability of the solution, our experiments show that CPLEX does not scale well, as its execution time appears to grow exponentially with the problem size. When we increased the problem size to 30H60J, CPLEX was not able to finish its optimization process although we let it run for 66 hours. The GRASP heuristic, on the other hand, scales well. We were able to use it to solve bigger problem sizes with acceptable execution times and solution qualities. For the biggest problem size that CPLEX could solve, the R GRASP heuristic still finds the same benefit as CPLEX, which is the optimal one.


7. Future Work

7.1. Implementation of the Local Search Phase of GRASP

We could further improve our implementation of the GRASP heuristic by including the local search phase. One way to implement the local search is to compare the benefit of the solution obtained by the Greedy Function with that of a modification of the same solution in which some jobs are not migrated between nodes (see Figure 14). After the Greedy Function obtains a New Schedule, we create a local search candidate (a neighbour of the obtained solution) in which Job 1 and Job 5 are not migrated with respect to the Old Schedule. We can repeat this process to produce several local search candidates (neighbours of the original solution) and compare their benefits with that of the original New Schedule obtained by the Greedy Function.

Figure 14: Possible local search phase approach
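The envisioned local search could be sketched as follows. This is only one possible reading of the idea, since this phase was not implemented: schedules are encoded as an array giving the host of each job, `Evaluator` is a hypothetical interface for the global benefit (which would also have to reject neighbours that violate host capacities), and all names are illustrative.

```java
import java.util.Arrays;

final class LocalSearchSketch {

    /** Hypothetical global benefit of a schedule; expected to return negative infinity if infeasible. */
    interface Evaluator {
        double benefit(int[] hostOfJob);
    }

    /** Neighbour of the new schedule in which the listed jobs stay on their old host. */
    static int[] keepOnOldHost(int[] oldHostOfJob, int[] newHostOfJob, int[] jobsToKeep) {
        int[] neighbour = newHostOfJob.clone();
        for (int job : jobsToKeep) {
            neighbour[job] = oldHostOfJob[job];
        }
        return neighbour;
    }

    /** Compares the new schedule with several neighbours and returns the one with the highest benefit. */
    static int[] bestOf(int[] oldHostOfJob, int[] newHostOfJob, int[][] jobSetsToKeep, Evaluator eval) {
        int[] best = newHostOfJob;
        double bestBenefit = eval.benefit(newHostOfJob);
        for (int[] jobs : jobSetsToKeep) {
            int[] neighbour = keepOnOldHost(oldHostOfJob, newHostOfJob, jobs);
            double benefit = eval.benefit(neighbour);
            if (benefit > bestBenefit) {
                best = neighbour;
                bestBenefit = benefit;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        int[] oldSchedule = {0, 0, 1, 1, 2};
        int[] newSchedule = {1, 0, 1, 2, 0};
        // Job 1 and Job 5 of the text are indices 0 and 4 here.
        int[] neighbour = keepOnOldHost(oldSchedule, newSchedule, new int[] {0, 4});
        System.out.println(Arrays.toString(neighbour));
    }
}
```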

7.2. Simulations with Multiple Problem Instances

When evaluating the performance and quality of the solutions obtained by the GRASP heuristic, only one problem instance was executed for each problem size. This means that only one set of parameters (CPUs per host, minimum and maximum CPU requirements per job) was generated to perform the simulations for each problem size. In order to obtain a more realistic view of the GRASP performance, multiple sets of parameters could be generated for a given problem size, and the performance of the heuristic evaluated for different inputs of the same size.


8. References

[1] Josep Ll. Berral, Ricard Gavaldà, Jordi Torres. “An Integer Linear Programming Representation for DataCenter Power-Aware Management”. Research Report UPC-LSI-10-21-R, November 2010.
