

CHAPTER 7

IMPLEMENTATION OF DYNAMIC VOLTAGE SCALING IN LINUX SCHEDULER

7.1 INTRODUCTION

The proposed DVS algorithm is implemented on a DELL INSPIRON™ 6000 laptop with an Intel® Pentium® Mobile processor whose maximum frequency is 1.86 GHz. The algorithm is implemented as extension modules to the Fedora Core Linux kernel, version 2.6.5. The experimental results show that the utilization based frequency grading (UBFG) algorithm achieves better energy savings than the existing algorithms. This chapter explains the implementation and the experimental setup in detail.

7.2 LINUX SCHEDULER

Multitasking kernels such as Linux allow more than one process to exist at any given time, and each process is allowed to run as if it were the only process on the system. Processes do not need to be aware of any other processes unless they are explicitly designed to be. This makes programs easier to develop, maintain, and port. Although each CPU in a system can execute only one thread at a time, many threads from many processes appear to execute at the same time, because each thread is scheduled to run for a very short period before other threads are given a chance to run. A kernel’s scheduler enforces a thread scheduling policy, including when, for how long, and in some cases where (on Symmetric Multiprocessing (SMP) systems) threads can execute. Conceptually, the scheduler can be viewed as running in its own thread, which is woken up by a timer interrupt; otherwise it is invoked via a system call or by another kernel thread that wishes to yield the CPU. A thread is allowed to execute for a certain amount of time, then a context switch to the scheduler occurs, followed by another context switch to a thread of the scheduler’s choice. This cycle continues, and in this way a certain policy for CPU usage is carried out.

An important goal for the Linux scheduler is efficiency. This means that it must allow as much real work as possible to be done while staying within the constraints of other requirements. For example, since context switching is expensive, allowing tasks to run for longer periods of time increases efficiency. Also, since the scheduler’s code is run quite often, its own speed is an important factor in scheduling efficiency. The code making scheduling decisions should run as quickly and efficiently as possible. Efficiency is sometimes sacrificed for other goals such as interactivity, because interactivity essentially means having more frequent context switches. However, once all other requirements have been met, overall efficiency is the most important goal for the scheduler.

The Linux 2.6 scheduler does not contain any algorithm that runs in worse than O(1) time. That is, every part of the scheduler is guaranteed to execute within a constant amount of time regardless of how many tasks are on the system. This allows the Linux kernel to handle a massive number of tasks efficiently, without overhead costs increasing as the number of tasks grows. Two key data structures allow the Linux 2.6 scheduler to perform its duties in O(1) time, and its design revolves around them: run queues and priority arrays.
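The way these two structures give constant-time task selection can be sketched as follows. This is a simplified user-level model in Python, not the kernel code: the class and method names are chosen for this illustration, and only the general idea (one queue per priority level, a bitmap of non-empty queues, and an active/expired pair of arrays per run queue) is taken from the Linux 2.6 design.

```python
from collections import deque

NUM_PRIORITIES = 140  # Linux 2.6 uses 140 priority levels (0 is the highest)


class PriorityArray:
    """One FIFO queue per priority plus a bitmap of the non-empty queues."""

    def __init__(self):
        self.queues = [deque() for _ in range(NUM_PRIORITIES)]
        self.bitmap = 0  # bit p is set while queues[p] holds at least one task
        self.count = 0

    def enqueue(self, task, prio):
        self.queues[prio].append(task)
        self.bitmap |= 1 << prio
        self.count += 1

    def dequeue_highest(self):
        if self.bitmap == 0:
            return None
        # Isolate the lowest set bit; its index is the best runnable priority.
        prio = (self.bitmap & -self.bitmap).bit_length() - 1
        task = self.queues[prio].popleft()
        if not self.queues[prio]:
            self.bitmap &= ~(1 << prio)
        self.count -= 1
        return task


class RunQueue:
    """Per-CPU run queue holding an active and an expired priority array."""

    def __init__(self):
        self.active = PriorityArray()
        self.expired = PriorityArray()

    def pick_next(self):
        if self.active.count == 0:
            # Swapping two references costs the same however many tasks exist.
            self.active, self.expired = self.expired, self.active
        return self.active.dequeue_highest()
```

Selecting the next task never iterates over the tasks themselves: the cost depends only on the fixed number of priority levels, which is what keeps the selection time independent of the task count.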

The Linux scheduler takes care of scheduling the tasks for the processor to execute. Even in multiprocessor systems, only one process can execute on a given processor at any instant. The OS scheduler is the module that decides which process is to make use of the processor. The process switching, privilege handling and task management of the Linux scheduler had to be analyzed in detail in order to incorporate the DVS algorithm.

7.3 LINUX KERNEL

In Linux, a CPU can run in either user mode or kernel mode. When a program executes in user mode, it cannot directly access the kernel data structures or the kernel programs. Each CPU model provides special instructions to switch from user mode to kernel mode and vice versa. A program usually executes in user mode and switches to kernel mode only when requesting a service provided by the kernel. The kernel interacts with the input/output devices by means of device drivers, which are included in the kernel. Each driver interacts with the remaining parts of the kernel through a specific interface. A device driver can be written as a module that is dynamically loaded into the kernel without requiring the system to be rebooted. It is also possible to dynamically unload a module that is no longer needed.

To add new functionality to the Linux kernel, the new code can either be compiled as a module or be statically linked into the kernel. The kernel has two key tasks to perform in managing modules. The first task is to make sure the rest of the kernel can reach the module’s global symbols, such as the entry point to its main function. A module must also know the addresses of symbols in the kernel and in other modules, so references are resolved once and for all when the module is linked. The second task is to keep track of the use of modules, so that no module is unloaded while another module or another part of the kernel is using it. A simple reference count keeps track of each module’s usage.
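The reference counting rule can be illustrated with a small model. The sketch below is written in Python for brevity and does not reproduce the kernel’s module loader; the class, method and exception names are hypothetical.

```python
class ModuleInUse(Exception):
    """Raised when an unload is attempted while the module still has users."""


class Module:
    def __init__(self, name):
        self.name = name
        self.refcount = 0  # number of current users of this module
        self.loaded = True

    def get(self):
        """A user (another module or kernel component) starts using the module."""
        self.refcount += 1

    def put(self):
        """A user stops using the module."""
        if self.refcount == 0:
            raise ValueError("put() without a matching get()")
        self.refcount -= 1

    def unload(self):
        """Unloading succeeds only when no user holds a reference."""
        if self.refcount > 0:
            raise ModuleInUse(f"{self.name} is in use by {self.refcount} user(s)")
        self.loaded = False
```

In this model an unload request made while the count is non-zero is refused, which mirrors the requirement that a module must not disappear while other code depends on it.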


7.4 DVS PROCESSOR

The proposed DVS algorithm is implemented on a DELL INSPIRON™ 6000 laptop with an Intel® Pentium® Mobile processor whose maximum frequency is 1.86 GHz. This processor adopts SpeedStep® technology, under which it has five active states with frequencies ranging from 800 MHz to 1.86 GHz, as shown in Table 7.1. The Intel® Pentium® Mobile processor is a CPU frequency scaling processor that can switch between the defined frequencies and operating voltages on the fly without any kernel or user involvement. This feature guarantees very fast switching to frequencies high enough to serve user needs and low enough to save power.

Table 7.1 Voltage scaling capability of Intel Pentium Mobile processor with SpeedStep® technology

Frequency (MHz)      Voltage (mV)
800 (idle time)      1036
1067                 1164
1333                 1276
1600                 1420
1867                 1484
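To give a feel for what these operating points mean for energy, the following sketch applies the standard first-order CMOS approximation that dynamic power is proportional to f·V², so that energy per cycle is proportional to V². This approximation ignores leakage and the rest of the system, and it is not the power model used for the measurements in this chapter; the function names are chosen for this illustration only.

```python
# Operating points from Table 7.1: (frequency in MHz, voltage in mV)
OPERATING_POINTS = [(800, 1036), (1067, 1164), (1333, 1276), (1600, 1420), (1867, 1484)]


def relative_dynamic_power(points):
    """Return {freq_mhz: P / P_max} using the approximation P ~ f * V^2."""
    f_max, v_max = max(points)
    p_max = f_max * v_max ** 2
    return {f: (f * v ** 2) / p_max for f, v in points}


def relative_energy_per_cycle(points):
    """Return {freq_mhz: E / E_max}; energy per cycle scales as V^2 (E = P / f)."""
    _, v_max = max(points)
    return {f: (v / v_max) ** 2 for f, v in points}


if __name__ == "__main__":
    power = relative_dynamic_power(OPERATING_POINTS)
    energy = relative_energy_per_cycle(OPERATING_POINTS)
    for f, _ in OPERATING_POINTS:
        print(f"{f:5d} MHz  relative power {power[f]:.2f}  relative energy/cycle {energy[f]:.2f}")
```

Under this approximation the 800 MHz state draws roughly a fifth of the dynamic power of the 1867 MHz state, while the energy spent per cycle falls to about half, which is why running a task more slowly at a lower voltage can save energy even though it takes longer.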

7.5 IMPLEMENTATION

The DVS algorithm is implemented as extension modules to the Linux kernel, version 2.6.5. Although Linux is not a real time operating system, it is easily extended through modules and provides a robust development environment. The high level view of the software architecture for the implementation of the DVS algorithm is shown in Figure 7.1.

[Block diagram with the labels: Non-RT DVS, RT Task Set, Speed Step Module, Periodic RT Task Module, RT Scheduler with RT-DVS, Scheduler Hook and Linux Kernel, divided into User Level and Kernel Level.]

Figure 7.1 Software architecture for RT-DVS implementation

The kernel level code is implemented as Linux kernel modules, which can be loaded and unloaded at run time using the insmod and rmmod commands respectively. The software implementation comprises three modules: the real time task module, the task abstraction module and the algorithm module. The real time task module and the task abstraction module are inserted into the kernel after successful compilation. The real time module acts as a real-time abstraction layer that helps in simulating real time functionality for the operating system and provides the user interaction. It registers with the /proc file system and provides a file interface for writing and reading setting values. The registration of the intervals and worst case execution times of the real time tasks is shown in Figure 7.2. The set of real time tasks is invoked by running the rtt.c program, which holds the worst case execution time, period and deadline of each task. The real time module and the DVS scheduler are put in place by inserting the modules rtmod.ko and scheddvs.ko into the kernel. The rtmod module invokes the DVS scheduler to schedule its real time tasks and to manage the frequency, as shown in Figure 7.3.
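To illustrate how a task’s registered period and worst case execution time can drive a frequency decision, the sketch below implements the static EDF voltage scaling baseline from the RT-DVS literature: it selects the lowest frequency f for which the worst-case utilization U satisfies U ≤ f / f_max. This is the static EDF baseline, not the proposed UBFG algorithm, whose details are not given in this chapter. The task values and the assumption that each deadline equals its period are hypothetical; the frequencies are those of Table 7.1.

```python
from dataclasses import dataclass

# Available frequencies in MHz, taken from Table 7.1
FREQUENCIES_MHZ = [800, 1067, 1333, 1600, 1867]


@dataclass
class Task:
    wcet_ms: float    # worst case execution time at the maximum frequency
    period_ms: float  # period; the deadline is assumed equal to the period


def utilization(tasks):
    """Worst-case CPU utilization of the task set at the maximum frequency."""
    return sum(t.wcet_ms / t.period_ms for t in tasks)


def static_edf_frequency(tasks, frequencies=FREQUENCIES_MHZ):
    """Lowest frequency f with U <= f / f_max, or None if U exceeds 1."""
    f_max = max(frequencies)
    u = utilization(tasks)
    for f in sorted(frequencies):
        if u <= f / f_max:
            return f
    return None


if __name__ == "__main__":
    tasks = [Task(3, 8), Task(3, 10), Task(1, 14)]  # hypothetical task set
    print(f"U = {utilization(tasks):.3f}, frequency = {static_edf_frequency(tasks)} MHz")
```

For the hypothetical task set shown, the utilization is about 0.746, so 1333 MHz (71% of the maximum speed) is too slow and 1600 MHz is selected.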

The algorithm communicates its decision to the hardware through the CPUFreq driver component. This driver can be compiled as a module and inserted into the kernel at run time, or it can be compiled as part of the kernel itself. The GNU C compiler, gcc, is used both to compile the drivers as modules and to compile them as part of the kernel (Figure 7.4). The make xconfig command in Linux presents the user with a GUI based option list for including the driver as part of the kernel.

The CPUFreq core code is located in linux/kernel/cpufreq.c. This code offers a standardized interface for the CPUFreq architecture drivers as well as for notifiers. Notifiers are device drivers or other parts of the kernel that need to be informed of policy changes or of all frequency changes. CPUFreq driver components are specific to the processor and implement the decisions made by the CPUFreq governors. These governors implement policies regarding frequency and voltage scaling. The system user can change governors and their corresponding parameters at run time. The Linux 2.6.5 kernel currently has three governors: the powersave governor, which statically sets the processor to the lowest available frequency and voltage; the performance governor, which sets the processor to the highest available frequency and voltage; and the userspace governor, which allows the user to set the desired frequency and voltage through the /proc interface. After the decision of the DVS algorithm is received through the CPUFreq interface, the appropriate CPUFreq driver components are accessed to scale the voltage and frequency accordingly. Figures 7.5 to 7.7 show the experimental results for the DVS governors during idle time, under different workloads and during CPU intensive operation.
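The three governor policies can be summarized in a few lines. The sketch below is a simplified Python model rather than the kernel’s governor code: the function names are chosen for this illustration, and the rule of rounding a userspace request up to the next available frequency is an assumption made here, not documented behaviour.

```python
FREQUENCIES_MHZ = [800, 1067, 1333, 1600, 1867]  # operating points from Table 7.1


def powersave(frequencies=FREQUENCIES_MHZ):
    """Statically select the lowest available frequency."""
    return min(frequencies)


def performance(frequencies=FREQUENCIES_MHZ):
    """Statically select the highest available frequency."""
    return max(frequencies)


def userspace(requested_mhz, frequencies=FREQUENCIES_MHZ):
    """Select the frequency requested by the user.

    Assumption for this sketch: a request between two operating points is
    rounded up to the next available one, and a request above the maximum
    is clamped to the maximum.
    """
    candidates = [f for f in sorted(frequencies) if f >= requested_mhz]
    return candidates[0] if candidates else max(frequencies)


GOVERNORS = {"powersave": powersave, "performance": performance}


def select_frequency(governor, requested_mhz=None):
    """Dispatch on the governor name, as a user would when switching policy."""
    if governor == "userspace":
        if requested_mhz is None:
            raise ValueError("the userspace governor needs a requested frequency")
        return userspace(requested_mhz)
    return GOVERNORS[governor]()
```

A DVS algorithm that computes its own target frequency corresponds most closely to the userspace case, where the decision is made outside the governor and only passed down for the driver to apply.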


Figure 7.2 Modular flow diagrams - rtmod.c

Installation and removal:
INSMOD RTMOD.O (installation)
RMMOD RTMOD.O (uninstallation)

Module entry points:
INIT_MODULE: 1. Register with proc and provide the fops structure. 2. Return success/failure.
CLEANUP_MODULE: 1. Unregister with proc and return success/failure.
RTMOD_OPEN: 1. Get the process ID and initialize the RT task struct. 2. Return success/failure.
RTMOD_CLOSE: 1. Release the RT structures. 2. Return success/failure.
RTMOD_READ: 1. Prepare a message with the current statistics and pass it on to the user level.
RTMOD_WRITE: 1. Get the interval and worst case execution time and pass them to the scheduler.

User-level file operations (algorithm):
FOPEN(“”,””,PID)
FWRITE
FREAD
CLOSE (FP)

Transfers between user level and module: USER_WRITE, USER_READ


Figure 7.3 Modular flow diagrams - chgfreq.c

Installation and removal:
INSMOD CHGFREQ.O (installation)
RMMOD CHGFREQ.O (uninstallation)

Module entry points:
INIT_MODULE: 1. Register with proc and provide the fops structure. 2. Return success/failure.
CLEANUP_MODULE: 1. Unregister with proc and return success/failure.
CHGFREQ_OPEN: 1. Increment an open count. 2. Return success.
CHGFREQ_CLOSE: 1. Decrement the open count. 2. Return success.
CHGFREQ_READ: 1. Get the current frequency and voltage from the processor registers and prepare a message. 2. Copy the message to user space.
CHGFREQ_WRITE: 1. Copy the frequency and voltage setting as a message from user space. 2. Parse the message and set the respective registers with the proposed values.

User-level file operations (algorithm):
USER_RD: cat /proc/chgfreq <PROC FREQ : 500>
USER_WR: echo “500” > /proc/chgfreq
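The user-level side of this file interface can be sketched as below. The path /proc/chgfreq and the reply text PROC FREQ : 500 are taken from Figure 7.3; everything else, including the function names and the assumption that the value is written as a plain decimal number, is hypothetical and serves only to illustrate the read/write protocol.

```python
import re

CHGFREQ_PATH = "/proc/chgfreq"  # path as it appears in Figure 7.3


def write_frequency(mhz, path=CHGFREQ_PATH):
    """User-level equivalent of: echo "500" > /proc/chgfreq"""
    with open(path, "w") as f:
        f.write(f"{int(mhz)}\n")


def parse_frequency(message):
    """Extract the number from a reply such as 'PROC FREQ : 500'."""
    match = re.search(r"PROC FREQ\s*:\s*(\d+)", message)
    if match is None:
        raise ValueError(f"unrecognized reply: {message!r}")
    return int(match.group(1))


def read_frequency(path=CHGFREQ_PATH):
    """User-level equivalent of: cat /proc/chgfreq"""
    with open(path) as f:
        return parse_frequency(f.read())
```

Keeping the parsing separate from the file access allows the message format to be checked without the module loaded, for example by calling parse_frequency on a captured reply.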


Figure 7.4 GUI based options in configuration setup in Linux


Figure 7.5 Power save Governor


Figure 7.6 User space Governor


Figure 7.7 Performance Governor


7.6 OBSERVATIONS AND RESULTS

Experiments with different workloads were carried out to verify that the DVS algorithm achieves minimum power consumption and maintains effective system performance while the processor performance varies in practice. The next goal was to quantify the savings by measuring the energy used to complete one trial of the experiment. The comparative study was based on the battery usage for a set of tasks on the system without DVS, with look ahead RT-DVS, and with UBFG RT-DVS. The energy consumption is measured by running the laptop on battery and using the ACPI (Advanced Configuration and Power Interface) support in Linux to obtain a fairly accurate estimate of the remaining battery capacity (in mAh), as shown in Figure 7.8.

Figure 7.8 Battery usage measurement using ACPI in Linux
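The measurement procedure amounts to reading the remaining capacity before and after a trial and taking the difference. The sketch below shows one way this could be scripted. The file location /proc/acpi/battery/BAT0/state and the "remaining capacity: N mAh" line format are assumptions about the ACPI interface of a 2.6-era kernel, the sample readings in the usage note are invented for illustration, and the function names are hypothetical.

```python
import re

# Assumed location of the ACPI battery state file on a 2.6-era kernel.
BATTERY_STATE_PATH = "/proc/acpi/battery/BAT0/state"


def remaining_capacity_mah(state_text):
    """Extract the remaining capacity in mAh from the battery state text."""
    match = re.search(r"remaining capacity:\s*(\d+)\s*mAh", state_text)
    if match is None:
        raise ValueError("no 'remaining capacity' line in mAh found")
    return int(match.group(1))


def read_remaining_capacity(path=BATTERY_STATE_PATH):
    """Read the current remaining capacity from the ACPI state file."""
    with open(path) as f:
        return remaining_capacity_mah(f.read())


def battery_used_mah(before_text, after_text):
    """Capacity consumed by one trial: reading before minus reading after."""
    return remaining_capacity_mah(before_text) - remaining_capacity_mah(after_text)


def percent_saving(baseline_mah, candidate_mah):
    """Relative saving of a candidate algorithm over a baseline, in percent."""
    return 100.0 * (baseline_mah - candidate_mah) / baseline_mah
```

For example, with invented readings of 4000 mAh before a trial and 3700 mAh after it, battery_used_mah reports 300 mAh; if another algorithm used 270 mAh for the same task set, percent_saving(300, 270) gives a saving of 10%.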


The battery usage for tasks with and without DVS differed substantially, indicating that dynamic voltage scaling is a noteworthy technique for portable devices. The look ahead RT-DVS algorithm, which is considered the most aggressive of the RT-DVS algorithms, showed only a marginal difference in energy savings from the DVS setup. The utilization based frequency grading (UBFG) algorithm achieves more energy savings than the existing RT-DVS algorithms: the power measurements indicate savings of 8 to 14% over the most aggressive look ahead EDF algorithm. The comparison charts of battery usage measurements in Figure 7.9 show the potential energy savings of DVS technology and the improved performance of the UBFG algorithm.

[Bar chart “Battery measurement”: battery consumption in mAh (axis from 0 to 400) for the Static EDF, LA_EDF and Proposed algorithms.]

Figure 7.9 Comparison charts of battery usage measurements of DVS algorithms


Figure 7.10 shows the actual power consumption measured for the real time DVS algorithms while varying the worst-case CPU utilization for a set of three tasks, each of which consumes 80% of its worst-case computation time on each invocation. The measurements reflect the total system power, including constant energy overheads, not just the CPU energy dissipation. Even with this overhead, the proposed DVS mechanism shows a significant (~8-10%) reduction in power consumption relative to the existing algorithms, while still providing the deadline guarantees of a real time system.

[Chart “Real platform”: power in watts (axis from 0 to 30) against utilization (0.237, 0.337, 0.437, 0.597, 0.677, 0.77) for Static EDF, CC_EDF, LA_EDF and Proposed.]

Figure 7.10 Power consumptions on real platform


Figure 7.11 shows a simulation with parameters identical to those of these measurements. The simulation reflects only the processor’s energy consumption and does not include any energy overheads from the rest of the system. Except for the addition of constant overheads in the actual measurements, the results are nearly identical, which validates the simulation results. The simulation results shown earlier therefore hold in real systems, despite the simplifying assumptions in the simulator. The simulations are accurate and may be useful for predicting the performance of RT-DVS implementations.

[Chart “Simulated platform”: power in arbitrary units (axis from 0 to 5) against utilization (0.237, 0.337, 0.437, 0.597, 0.677, 0.77) for Static EDF, CC_EDF, LA_EDF and Proposed.]

Figure 7.11 Power consumptions on simulated platform


7.7 CONCLUSION

The voltage scaling capability of a processor supporting SpeedStep® technology is explored. The proposed DVS algorithm, which combines the strengths of the static EDF and look ahead EDF algorithms, is implemented as a modular system in the Linux kernel, and its performance is compared with that of the existing algorithms. The proposed algorithm achieves significant energy savings over the previously proposed algorithms while preserving deadline guarantees. Owing to the modularity of the implementation, additional algorithms can be implemented and validated using the system.
