CHAPTER 7
IMPLEMENTATION OF DYNAMIC VOLTAGE
SCALING IN LINUX SCHEDULER
7.1 INTRODUCTION
The proposed DVS algorithm is implemented on a DELL INSPIRON™ 6000
laptop, which has an Intel® Pentium® Mobile processor with a maximum
frequency of 1.86 GHz. The algorithm is implemented as extension modules to
the Fedora Core Linux kernel, version 2.6.5. The experimental results show that
the utilization based frequency grading (UBFG) algorithm achieves better
energy savings than the existing algorithms. This chapter explains the
implementation and the experimental setup in detail.
7.2 LINUX SCHEDULER
Multitasking kernels (like Linux) allow more than one process to
exist at any given time, and furthermore each process is allowed to run as if it
were the only process on the system. Processes do not need to be aware of any
other processes unless they are explicitly designed to be. This makes
programs easier to develop, maintain, and port. Though each CPU in a system
can execute only one thread within a process at a time, many threads from
many processes appear to be executing at the same time. This is because
threads are scheduled to run for very short periods of time and then other
threads are given a chance to run. A kernel’s scheduler enforces a thread
scheduling policy, including when, for how long, and in some cases where
(on Symmetric Multiprocessing (SMP) systems) threads can execute.
Normally the scheduler runs in its own thread, which is woken up by a timer
interrupt. Otherwise it is invoked via a system call or another kernel thread
that wishes to yield the CPU. A thread will be allowed to execute for a certain
amount of time, then a context switch to the scheduler thread will occur,
followed by another context switch to a thread of the scheduler’s choice. This
cycle continues, and in this way a certain policy for CPU usage is carried out.
An important goal for the Linux scheduler is efficiency: it must allow as much
real work as possible to be done while staying within the constraints of its
other requirements. For example, since context switching is expensive,
allowing tasks to run for longer periods of time increases efficiency. Also,
because the scheduler's code runs very often, its own speed is an important
factor in scheduling efficiency, so the code that makes scheduling decisions
should run as quickly as possible. Efficiency is sometimes sacrificed for other
goals such as interactivity, because interactivity essentially means more
frequent context switches. Once all other requirements have been met,
however, overall efficiency is the scheduler's most important goal.
The Linux 2.6 scheduler contains no algorithm that runs in worse than
O(1) time; every part of the scheduler is guaranteed to execute within a
constant amount of time regardless of how many tasks are on the system. This
allows the Linux kernel to handle a large number of tasks efficiently without
overhead growing as the number of tasks grows. Two key data structures allow
the scheduler to perform its duties in O(1) time, and its design revolves
around them: run queues and priority arrays.
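The priority-array idea can be illustrated with a small user-space sketch in C. It is a simplified model, not kernel code: the names (prio_array, enqueue, dequeue, highest_prio) are hypothetical, each priority level keeps only a task count instead of the kernel's list of tasks, and __builtin_ctzl assumes GCC or Clang. It shows why picking the next task costs O(1): the lookup scans a fixed-size bitmap of non-empty priority levels whose length does not depend on the number of tasks (the 2.6 kernel performs the equivalent lookup with sched_find_first_bit()).

```c
#include <assert.h>
#include <limits.h>
#include <string.h>

#define MAX_PRIO 140  /* 0..99 real-time, 100..139 normal, as in Linux 2.6 */
#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)
#define BITMAP_WORDS ((MAX_PRIO + BITS_PER_WORD - 1) / BITS_PER_WORD)

/* Simplified priority array: a bitmap of non-empty priority levels plus a
 * task count per level (the real kernel keeps a task list per level). */
struct prio_array {
    unsigned int nr_active;
    unsigned long bitmap[BITMAP_WORDS];
    unsigned int queue_len[MAX_PRIO];
};

static void prio_array_init(struct prio_array *a) { memset(a, 0, sizeof *a); }

/* Add one task at level prio and mark the level as non-empty. */
static void enqueue(struct prio_array *a, int prio) {
    assert(prio >= 0 && prio < MAX_PRIO);
    a->queue_len[prio]++;
    a->nr_active++;
    a->bitmap[prio / BITS_PER_WORD] |= 1UL << (prio % BITS_PER_WORD);
}

/* Remove one task from level prio; clear the bit when the level empties. */
static void dequeue(struct prio_array *a, int prio) {
    assert(prio >= 0 && prio < MAX_PRIO && a->queue_len[prio] > 0);
    a->nr_active--;
    if (--a->queue_len[prio] == 0)
        a->bitmap[prio / BITS_PER_WORD] &= ~(1UL << (prio % BITS_PER_WORD));
}

/* Highest-priority (numerically lowest) non-empty level, or -1 if empty.
 * The loop bound is the constant BITMAP_WORDS, so the cost does not depend
 * on how many tasks are runnable: this is the O(1) property. */
static int highest_prio(const struct prio_array *a) {
    for (size_t w = 0; w < BITMAP_WORDS; w++)
        if (a->bitmap[w])
            return (int)(w * BITS_PER_WORD + (size_t)__builtin_ctzl(a->bitmap[w]));
    return -1;
}
```

Enqueueing, dequeueing and selecting the next level each touch one bit and one counter, so the cost stays constant as the task count grows.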
The Linux scheduler is responsible for scheduling tasks for the
processor to execute. Even in a multiprocessor system, each processor
executes only one process at any instant, and the OS scheduler is the module
that decides which process uses the processor. Process switching, privilege
levels and task management in the Linux scheduler therefore had to be
analyzed in detail before the DVS algorithm could be incorporated.
7.3 LINUX KERNEL
Under Linux, a CPU runs in either user mode or kernel mode. A program
executing in user mode cannot directly access kernel data structures or
kernel code. Each CPU model provides
special instructions to switch over from user mode to kernel mode and vice-
versa. A program usually executes in user mode and switches to kernel mode
only when requesting a service provided by the kernel. The kernel interacts
with the input/output devices by means of device drivers. The device drivers
are included in the kernel. Each driver interacts with the remaining parts of
the kernel through a specific interface. A device driver can be written as a
module that can be dynamically loaded in the kernel without requiring the
system to be rebooted. It is also possible to dynamically unload a module that
is no longer needed.
To add new functionality to the Linux kernel, the new code can be
either compiled as a module or statically linked into the kernel. The kernel
has two key tasks to perform in managing modules. The first is to make sure
the rest of the kernel can reach the module's global symbols, such as the
entry point to its main function. A module, in turn, must know the addresses
of symbols in the kernel and in other modules; these references are resolved
once and for all when the module is linked. The second task is keeping track
of module usage, so that no module is unloaded while another module or
another part of the kernel is using it. A simple reference count tracks the
usage of each module.
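The two module-management tasks described above can be modeled in user space as follows. This is a hypothetical sketch, not the kernel's implementation: the symbol table, the sim_* objects and the *_sim functions are invented for illustration, loosely mirroring the roles of the kernel's exported-symbol table and its try_module_get()/module_put() reference counting. The error code returned on a busy unload is this sketch's own choice.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical exported-symbol table: name -> address. */
struct ksym { const char *name; void *addr; };

static int sim_jiffies;   /* stand-ins for kernel objects a module might use */
static int sim_cpu_khz;

static const struct ksym ksymtab[] = {
    { "jiffies", &sim_jiffies },
    { "cpu_khz", &sim_cpu_khz },
};

/* Task 1: resolve a symbol once, at link (insertion) time; NULL = unresolved. */
static void *resolve_symbol(const char *name) {
    for (size_t i = 0; i < sizeof ksymtab / sizeof ksymtab[0]; i++)
        if (strcmp(ksymtab[i].name, name) == 0)
            return ksymtab[i].addr;
    return NULL;
}

/* Task 2: a per-module reference count that blocks unloading while in use. */
struct module_sim { const char *name; int refcnt; int loaded; };

static void module_get_sim(struct module_sim *m) { assert(m->loaded); m->refcnt++; }
static void module_put_sim(struct module_sim *m) { assert(m->refcnt > 0); m->refcnt--; }

static int unload_module_sim(struct module_sim *m) {
    if (m->refcnt > 0)
        return -EBUSY;   /* still in use: refuse to unload */
    m->loaded = 0;
    return 0;
}
```

Resolving names to addresses once, at load time, means later calls cost nothing extra, while the reference count turns "is anyone still using this module?" into a single integer check.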
7.4 DVS PROCESSOR
As stated in Section 7.1, the proposed DVS algorithm is implemented on a
DELL INSPIRON™ 6000 laptop with an Intel® Pentium® Mobile processor
whose maximum frequency is 1.86 GHz. This processor adopts SpeedStep®
technology, under which it has five active states with frequencies ranging
from 800 MHz to 1.86 GHz, as listed in Table 7.1. The processor supports
frequency scaling: it can switch among its defined frequency and voltage
operating points on the fly, and once software requests a new operating point
the transition itself is carried out by the hardware. This allows very fast
switching to frequencies high enough to serve user needs and low enough to
save power.
Table 7.1 Voltage scaling capability of Intel Pentium Mobile processor
with SpeedStep® technology

Frequency (MHz)      Voltage (mV)
800 (idle time)      1036
1067                 1164
1333                 1276
1600                 1420
1867                 1484
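To connect Table 7.1 to frequency selection, the sketch below picks an operating point for a task set with worst-case utilization u (measured at the maximum frequency), using the static EDF rule that EDF remains feasible at frequency f when u ≤ f / f_max. It illustrates only the static EDF baseline, not the proposed UBFG algorithm; the function names are hypothetical, and the energy estimate assumes dynamic energy per cycle proportional to V², ignoring leakage and idle power.

```c
#include <assert.h>
#include <stddef.h>

/* Operating points from Table 7.1: frequency in MHz, core voltage in mV. */
struct op_point { unsigned int mhz; unsigned int mv; };

static const struct op_point op_table[] = {
    {  800, 1036 }, { 1067, 1164 }, { 1333, 1276 }, { 1600, 1420 }, { 1867, 1484 },
};
#define N_OPS (sizeof op_table / sizeof op_table[0])

/* Static-EDF-style choice: EDF stays feasible at frequency f if u <= f / f_max.
 * Return the index of the lowest such point (the table is sorted ascending). */
static size_t pick_static_op(double u) {
    assert(u >= 0.0);
    const double f_max = op_table[N_OPS - 1].mhz;
    for (size_t i = 0; i < N_OPS; i++)
        if (u <= op_table[i].mhz / f_max)
            return i;
    return N_OPS - 1;   /* u > 1: not schedulable even at full speed */
}

/* Dynamic energy per cycle scales with V^2, so relative to f_max it is (V / V_max)^2. */
static double relative_energy_per_cycle(size_t i) {
    double r = (double)op_table[i].mv / op_table[N_OPS - 1].mv;
    return r * r;
}
```

For the utilizations plotted in Figure 7.10, this rule selects 800 MHz at u = 0.237 and 1600 MHz at u = 0.77; at 800 MHz each cycle costs roughly 49% of the dynamic energy of a cycle at 1867 MHz.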
7.5 IMPLEMENTATION
The DVS algorithm is implemented as extension modules to Linux kernel
version 2.6.5. Although Linux is not a real-time operating system, it is easily
extended through modules and provides a robust development environment.
The high-level view of the software architecture for the DVS implementation
is shown in Figure 7.1.
Figure 7.1 Software architecture for RT-DVS implementation
[Figure labels: User Level, Kernel Level, RT Task Set, Non-RT DVS, Periodic RT Task Module, RT Scheduler with RT-DVS, Scheduler Hook, Speed Step Module, Linux Kernel]
The kernel-level code is implemented as Linux kernel modules, which
can be loaded and unloaded at run time using the insmod and rmmod
commands, respectively. The software implementation comprises three
modules: the real-time task module, the task abstraction module and the
algorithm module. The real-time task module and the task abstraction module
are inserted into the kernel after successful compilation. The real-time module
acts as a real-time abstraction layer that simulates real-time functionality for
the operating system and provides the user interface. It registers with the /proc
file system and provides a file interface for writing and reading setting values.
The registered intervals and worst-case execution times of the real-time tasks
are shown in Figure 7.2. The set of real-time tasks, with their worst-case
execution times, periods and deadlines, is invoked by running rtt.c. The
real-time module and the DVS scheduler are implemented as modules inserted
into the kernel, namely rtmod.ko and scheddvs.ko. The rtmod module invokes
the DVS scheduler to schedule its real-time tasks and manage the frequency,
as shown in Figure 7.3.
The algorithm communicates its decisions to the hardware through
the CPUFreq driver component. This driver can be compiled as a module and
inserted into the kernel at run time, or compiled as part of the kernel itself.
The GNU C compiler (gcc) is used in both cases. Running the make xconfig
command in the kernel source tree presents a GUI-based option list in which
the driver can be selected for inclusion in the kernel (Figure 7.4).
The CPUFreq core code is located in linux/kernel/cpufreq.c. It offers a
standardized interface to the CPUFreq architecture drivers as well as to
notifiers, which are device drivers or other parts of the kernel that need to be
informed of policy changes or of every frequency change. The CPUFreq
driver components are processor-specific and implement the decisions
determined by the CPUFreq governors, which in turn implement the policies
for frequency and voltage scaling. The user can change governors and their
parameters at run time. The Linux 2.6.5 kernel provides three governors: the
powersave governor, which statically sets the processor to the lowest available
frequency and voltage; the performance governor, which sets the processor to
the highest available frequency and voltage; and the userspace governor,
which allows the user to set the desired frequency and voltage through the
/proc interface. After receiving the decision from the DVS algorithm through
the CPUFreq interface, the appropriate CPUFreq driver components are
invoked to scale the voltage and frequency accordingly. Figures 7.5 to 7.7
show the experimental results for the DVS governors under idle,
varying-workload and CPU-intensive operation.
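The three governor policies can be summarized as a target-frequency function. This is a hypothetical user-space model, not the CPUFreq governor API: struct policy_sim, enum governor and governor_target are invented names. Frequencies are in kHz, the unit CPUFreq uses, with the limits taken from Table 7.1, and the userspace request is clamped to the policy limits in this sketch.

```c
#include <assert.h>

enum governor { GOV_POWERSAVE, GOV_PERFORMANCE, GOV_USERSPACE };

/* Hypothetical policy: hardware limits plus the value a user last requested. */
struct policy_sim {
    unsigned int min_khz, max_khz, user_khz;
    enum governor gov;
};

static unsigned int clamp_khz(unsigned int f, unsigned int lo, unsigned int hi) {
    return f < lo ? lo : (f > hi ? hi : f);
}

/* Target frequency each of the three governors would request. */
static unsigned int governor_target(const struct policy_sim *p) {
    assert(p->min_khz <= p->max_khz);
    switch (p->gov) {
    case GOV_POWERSAVE:   return p->min_khz;   /* always the lowest point */
    case GOV_PERFORMANCE: return p->max_khz;   /* always the highest point */
    case GOV_USERSPACE:   return clamp_khz(p->user_khz, p->min_khz, p->max_khz);
    }
    return p->max_khz;   /* unreachable for valid enum values */
}
```

Powersave and performance ignore the workload entirely, which is why a DVS algorithm needs a way, such as the userspace path, to request intermediate operating points.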
Figure 7.2 Modular flow diagrams - rtmod.c
FILE OPERATIONS - ALGO:
INIT_MODULE: 1. Register with proc and provide the fops structure. 2. Return success/failure.
CLEANUP_MODULE: 1. Unregister with proc and return success/failure.
RTMOD_OPEN: 1. Get the process ID and initialize the RT task struct. 2. Return success/failure.
RTMOD_CLOSE: 1. Release the RT structures. 2. Return success/failure.
RTMOD_READ: 1. Prepare a message with the current statistics and pass it on to the user level.
RTMOD_WRITE: 1. Get the interval and worst-case execution time and pass them to the scheduler.
User-level operations: FOPEN("", "", PID); USER_WRITE (FWRITE); USER_READ (FREAD); CLOSE(FP)
Installation: insmod rtmod.o; uninstallation: rmmod rtmod.o
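The RTMOD_WRITE step can be sketched in user space as a parser plus an admission check. The message format "<period_ms> <wcet_ms>", the names rt_set_sim and rtmod_write_sim, and the admission test (reject a task if total utilization would exceed 1, the EDF feasibility bound for periodic tasks whose deadlines equal their periods) are assumptions for illustration; the figure only states that the interval and worst-case execution time are passed to the scheduler. Floating point is used for brevity; kernel code would normally avoid it.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

/* Hypothetical message format written by user space: "<period_ms> <wcet_ms>". */
struct rt_task_sim { unsigned int period_ms, wcet_ms; };

#define MAX_TASKS 8

struct rt_set_sim {
    struct rt_task_sim task[MAX_TASKS];
    int n;
    double util;   /* sum of wcet / period over admitted tasks */
};

/* Parse one message; admit the task only if EDF stays feasible (sum C/T <= 1). */
static int rtmod_write_sim(struct rt_set_sim *s, const char *msg) {
    unsigned int period, wcet;
    if (sscanf(msg, "%u %u", &period, &wcet) != 2 ||
        period == 0 || wcet == 0 || wcet > period)
        return -EINVAL;               /* malformed or impossible task */
    double u = (double)wcet / period;
    if (s->n == MAX_TASKS || s->util + u > 1.0)
        return -EBUSY;                /* would overload the processor */
    s->task[s->n++] = (struct rt_task_sim){ period, wcet };
    s->util += u;
    return 0;
}
```

The accumulated utilization is also the input a DVS policy needs, since it bounds how far the frequency can be lowered.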
Figure 7.3 Modular flow diagrams - chgfreq.c
FILE OPERATIONS - ALGO:
INIT_MODULE: 1. Register with proc and provide the fops structure. 2. Return success/failure.
CLEANUP_MODULE: 1. Unregister with proc and return success/failure.
CHGFREQ_OPEN: 1. Increment an open count. 2. Return success.
CHGFREQ_CLOSE: 1. Decrement the open count. 2. Return success.
CHGFREQ_READ: 1. Get the current frequency and voltage from the processor registers and prepare a message. 2. Copy the message to user space.
CHGFREQ_WRITE: 1. Copy the frequency/voltage setting as a message from user space. 2. Parse the message and set the respective registers to the proposed values.
User read (USER_RD): cat /proc/chgfreq, returning <PROC FREQ : 500>
User write (USER_WR): echo "500" > /proc/chgfreq
Installation: insmod chgfreq.o; uninstallation: rmmod chgfreq.o
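Likewise, CHGFREQ_WRITE can be modeled as parsing the value written by echo and mapping it to a supported operating point. In this hypothetical sketch (chgfreq_write_sim is an invented name), the request is mapped to the lowest frequency in Table 7.1 at or above it, the same idea as CPUFreq's CPUFREQ_RELATION_L. It does not touch processor registers, which the real module would program at this point.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Supported frequencies in MHz, from Table 7.1 (ascending). */
static const unsigned int freq_mhz[] = { 800, 1067, 1333, 1600, 1867 };
#define N_FREQ (sizeof freq_mhz / sizeof freq_mhz[0])

/* Parse a decimal MHz value as written by echo (e.g. "500\n") and return the
 * lowest supported frequency at or above it, clamped to the maximum.
 * Returns -EINVAL for malformed or non-positive input. */
static long chgfreq_write_sim(const char *buf) {
    char *end;
    long req = strtol(buf, &end, 10);
    if (end == buf || req <= 0)
        return -EINVAL;
    while (*end == '\n' || *end == ' ')   /* echo appends a newline */
        end++;
    if (*end != '\0')
        return -EINVAL;
    for (size_t i = 0; i < N_FREQ; i++)
        if ((unsigned long)req <= freq_mhz[i])
            return freq_mhz[i];
    return freq_mhz[N_FREQ - 1];
}
```

In this sketch, a write of "500" maps to 800 MHz, the lowest state in Table 7.1; rounding up rather than down keeps a request from ever yielding less performance than asked for.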
Figure 7.4 GUI based options in configuration setup in Linux
Figure 7.5 Power save Governor
Figure 7.6 User space Governor
Figure 7.7 Performance Governor
7.6 OBSERVATIONS AND RESULTS
Experiments with different workloads were carried out to verify that the DVS algorithm minimizes power consumption while maintaining effective system performance under the varying processor demand seen in practice. The next goal was to quantify the savings by measuring the energy used to complete one trial of the experiment. The comparative study was based on battery usage for the same set of tasks under three setups: without DVS, with look-ahead RT-DVS, and with UBFG RT-DVS. Energy consumption is measured by running the laptop on battery and using ACPI (Advanced
Configuration and Power Interface) in Linux to obtain a fairly accurate reading of the remaining battery capacity (in mAh), as shown in Figure 7.8.
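The capacity readings behind Figure 7.8 can be extracted programmatically. The sketch below assumes the text format of the 2.6-era ACPI battery state file (/proc/acpi/battery/BAT0/state on many laptops; the battery directory name varies), in which a line such as "remaining capacity: 3456 mAh" reports the charge. The function names and any sample values are hypothetical, and batteries that report mWh instead of mAh are not handled.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Extract the "remaining capacity" value (mAh) from the text of an ACPI
 * battery state file; returns -1 if the line is absent or unparsable. */
static long remaining_capacity_mah(const char *state_text) {
    const char *key = "remaining capacity:";
    const char *p = strstr(state_text, key);
    long mah;
    if (p == NULL || sscanf(p + strlen(key), "%ld mAh", &mah) != 1)
        return -1;
    return mah;
}

/* Charge drawn during one trial = reading before minus reading after. */
static long consumed_mah(long before_mah, long after_mah) {
    return before_mah - after_mah;
}
```

Reading the file immediately before and after each trial and subtracting gives the per-trial consumption that is compared across setups.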
Figure 7.8 Battery usage measurement using ACPI in Linux
Battery usage for the task set differed substantially with and without
DVS, showing that dynamic voltage scaling is a worthwhile technology for
portable devices. The look-ahead RT-DVS algorithm, considered the most
aggressive of the existing RT-DVS algorithms, showed only a marginal
difference in energy savings from the static DVS setup. The utilization based
frequency grading algorithm achieves greater energy savings than the existing
RT-DVS algorithms: the power measurements indicate savings of 8 to 14%
over the most aggressive algorithm, look-ahead EDF. The comparison of
battery usage measurements in Figure 7.9 demonstrates the potential energy
savings of DVS technology and the improved performance of the UBFG
algorithm.
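The percentage savings quoted above can be written explicitly. Assuming, as the measurement method implies, that the charge drawn from the battery during one trial serves as the energy measure (a fair proxy only while the battery voltage stays roughly constant), and writing C_start and C_end for the remaining capacities read through ACPI before and after the trial (notation introduced here for illustration):

```latex
Q_{\mathrm{alg}} = C_{\mathrm{start}} - C_{\mathrm{end}} \quad [\mathrm{mAh}],
\qquad
S = \frac{Q_{\text{LA-EDF}} - Q_{\text{UBFG}}}{Q_{\text{LA-EDF}}} \times 100\,\%
```

A result of S in the 8 to 14% range therefore means UBFG drew 8 to 14% less charge than look-ahead EDF for the same task set.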
[Chart: Battery measurement; y-axis: battery consumption in mAh (0 to 400); bars: Static EDF, LA_EDF, Proposed]
Figure 7.9 Comparison charts of battery usage measurements of DVS algorithms
Figure 7.10 shows the actual power consumption measured for the real-time
DVS algorithms while varying the worst-case CPU utilization of a set of three
tasks, each of which uses 80% of its worst-case computation allocation per
invocation. The measurements reflect total system power, including constant
energy overheads, not just the CPU energy dissipation. Even with this
overhead, the proposed DVS mechanism shows a significant (about 8 to 10%)
reduction in power consumption compared with the existing algorithms, while
still providing the deadline guarantees of a real-time system.
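Why a whole-system measurement understates the processor-level savings can be made explicit. Writing P_0 for the constant platform overhead, P_cpu for the processor power under the existing algorithm and ΔP for the processor power saved by the proposed algorithm (notation introduced here for illustration), the measured relative reduction satisfies:

```latex
\frac{\Delta P}{P_0 + P_{\mathrm{cpu}}} \;<\; \frac{\Delta P}{P_{\mathrm{cpu}}}
\qquad \text{for } P_0 > 0,\ \Delta P > 0
```

A relative reduction observed on the total system therefore implies at least as large a relative reduction in processor power alone.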
[Chart: Real platform; x-axis: utilization (0.237, 0.337, 0.437, 0.597, 0.677, 0.77); y-axis: power in watts (0 to 30); series: Static EDF, CC_EDF, LA_EDF, Proposed]
Figure 7.10 Power consumption on the real platform
Figure 7.11 shows a simulation with the same parameters as these
measurements. The simulation reflects only the processor's energy
consumption and does not include any energy overheads from the rest of the
system. Apart from the constant overheads present in the actual
measurements, the results are nearly identical, which validates the simulation.
The simulation results shown earlier therefore hold on a real system despite
the simplifying assumptions in the simulator, and the simulator may be useful
for predicting the performance of RT-DVS implementations.
[Chart: Simulated platform; x-axis: utilization (0.237, 0.337, 0.437, 0.597, 0.677, 0.77); y-axis: power in arbitrary units (0 to 5); series: Static EDF, CC_EDF, LA_EDF, Proposed]
Figure 7.11 Power consumption on the simulated platform
7.7 CONCLUSION
This chapter explored the voltage scaling capability of a processor
supporting SpeedStep® technology. The proposed DVS algorithm, which
combines the strengths of the static EDF and look-ahead EDF algorithms, was
implemented as a modular system in the Linux kernel, and its performance was
compared with that of existing algorithms. The proposed algorithm achieves
significant energy savings compared with previously proposed algorithms
while preserving deadline guarantees. Because the implementation is
modular, additional algorithms can be implemented and validated using the
same system.