Adding a Scheduling Policy to the Linux Kernel

By Juan M. Banda, CS518 Advanced Operating Systems

Page 1: Adding a Scheduling Policy to the Linux Kernel

Adding a Scheduling Policy to the Linux Kernel

By Juan M. Banda

CS518 Advanced Operating Systems

Page 2: Adding a Scheduling Policy to the Linux Kernel

Presentation Outline

- Introduction
- Project Description / Challenges
- Background Information
- Project Steps
- Achievements
- References

Page 3: Adding a Scheduling Policy to the Linux Kernel

Introduction

What is Linux?
- An operating system for computers, comparable to Windows or Mac OS X
- Created starting in 1991 by Finnish programmer Linus Torvalds with the assistance of developers from around the globe
- Runs on a wide variety of hardware platforms, from huge mainframes to desktop PCs to cell phones
- Licensed under the GNU General Public License, version 2 (from the Free Software Foundation's GNU Project), which lets users modify and redistribute the software
- You can think of Linux as having two parts: a kernel, which is the basic interface between the hardware and other system software, and the software that runs on top of it, such as a graphical user interface (GUI) and application programs

Page 4: Adding a Scheduling Policy to the Linux Kernel

Project Description / Challenges

- Idea: Implement a new scheduling policy
- Purpose: The new policy should schedule processes in the background
- Problem 1: SCHED_IDLE already does this
- Modification: The policy should schedule processes at a lower priority than SCHED_IDLE
- Problem 2: The Kernel 2.6 scheduler is considerably different from the Kernel 2.4 scheduler

Page 5: Adding a Scheduling Policy to the Linux Kernel

Background Information

Kernel 2.4 scheduler major features:

- An O(n) scheduler: it goes through the entire global runqueue to determine the next task to run. This is an O(n) algorithm, where n is the number of processes, so the time taken is proportional to the number of active processes in the system
- A global runqueue: a single runqueue, protected by a single lock, was shared by all CPUs, so a CPU often had to wait for other CPUs to finish using it
- A global runqueue for all processors in a symmetric multiprocessing (SMP) system. This meant a task could be scheduled on any processor, which can be good for load balancing but bad for memory caches. For example, suppose a task executed on CPU-1 and its data was in that processor's cache. If the task got rescheduled to CPU-2, its data would need to be invalidated in CPU-1 and brought into CPU-2

This led to large performance hits during heavy workloads

Page 6: Adding a Scheduling Policy to the Linux Kernel

Background Information

Kernel 2.4 Scheduler Policies:

SCHED_FIFO - A First-In, First-Out real-time process

When the scheduler assigns the CPU to the process, it leaves the process descriptor in its current position in the runqueue list. If no other higher-priority real-time process is runnable, the process will continue to use the CPU as long as it wishes, even if other real-time processes having the same priority are runnable

Page 7: Adding a Scheduling Policy to the Linux Kernel

Background Information

SCHED_RR - A Round Robin real-time process
When the scheduler assigns the CPU to the process, it puts the process descriptor at the end of the runqueue list. This policy ensures a fair assignment of CPU time to all SCHED_RR real-time processes that have the same priority

SCHED_OTHER - A conventional, time-shared process
The policy field also encodes a SCHED_YIELD binary flag. This flag is set when the process invokes the sched_yield() system call (a way of voluntarily relinquishing the processor without the need to start an I/O operation or go to sleep). The scheduler puts the process descriptor at the bottom of the runqueue list

Page 8: Adding a Scheduling Policy to the Linux Kernel

Background Information

Kernel 2.6

- The 2.6 scheduler was designed and implemented by Ingo Molnar. His motivation in working on the new scheduler was to create a completely O(1) scheduler for wakeup, context-switch, and timer interrupt overhead
- One of the issues that triggered the need for a new scheduler was the use of Java virtual machines (JVMs). The Java programming model uses many threads of execution, which results in lots of overhead for scheduling in an O(n) scheduler
- Each CPU has a runqueue made up of 140 priority lists that are serviced in FIFO order. Tasks that are scheduled to execute are added to the end of their respective runqueue's priority list
- Each task has a time slice that determines how much time it's permitted to execute
- The first 100 priority lists of the runqueue are reserved for real-time tasks, and the last 40 are used for user tasks (MAX_RT_PRIO=100 and MAX_PRIO=140)

Page 9: Adding a Scheduling Policy to the Linux Kernel

Background Information

- In addition to the CPU's runqueue, which is called the active runqueue, there's also an expired runqueue
- When a task on the active runqueue uses all of its time slice, it's moved to the expired runqueue. During the move, its time slice is recalculated (and so is its priority)
- Once no tasks remain on the active runqueue at any priority, the pointers for the active and expired runqueues are swapped, thus making the expired runqueue the active one

Page 10: Adding a Scheduling Policy to the Linux Kernel

Background Information

O(1) Algorithm (constant-time algorithm)

- Choose the task on the highest priority list to execute
- To make this process more efficient, a bitmap is used to record which priority lists contain tasks
- On most architectures, a find-first-bit-set instruction is used to find the highest priority bit set in one of five 32-bit words (covering the 140 priorities)
- The time it takes to find a task to execute depends not on the number of active tasks but instead on the number of priorities
- This makes the 2.6 scheduler an O(1) scheduler, because the time to schedule is both fixed and deterministic regardless of the number of active tasks

Page 11: Adding a Scheduling Policy to the Linux Kernel

Background Information

SMP Support:
- Even though the prior scheduler worked in SMP systems, its big-lock architecture meant that while a CPU was choosing a task to dispatch, the runqueue was locked by that CPU, and others had to wait
- The 2.6 scheduler doesn't use a single lock for scheduling; instead, it has a lock on each runqueue. This allows all CPUs to schedule tasks without contention from other CPUs

Task preemption:
- This means a lower-priority task won't execute while a higher-priority task is ready to run. The scheduler preempts the lower-priority process, places the process back on its priority list, and then reschedules

Page 12: Adding a Scheduling Policy to the Linux Kernel

Background Information

Function name      Function description
schedule           The main scheduler function. Schedules the highest priority task for execution.
load_balance       Checks the CPU to see whether an imbalance exists, and attempts to move tasks if not balanced.
effective_prio     Returns the effective priority of a task (based on the static priority, but includes any rewards or penalties).
recalc_task_prio   Determines a task's bonus or penalty based on its idle time.
source_load        Conservatively calculates the load of the source CPU (from which a task could be migrated).
target_load        Liberally calculates the load of a target CPU (where a task has the potential to be migrated).
migration_thread   High-priority system thread that migrates tasks between CPUs.

Page 13: Adding a Scheduling Policy to the Linux Kernel

Background Information

Kernel 2.6 Scheduler Policies:

SCHED_NORMAL - A conventional, time-shared process (used to be called SCHED_OTHER), for normal tasks
- Each task is assigned a "nice" value
- PRIO = MAX_RT_PRIO + NICE + 20
- Each task is assigned a time slice
- Tasks at the same priority are round-robined
- Ensures priority + fairness

Page 14: Adding a Scheduling Policy to the Linux Kernel

Background Information

SCHED_FIFO - A First-In, First-Out real-time process
- Runs until it blocks or relinquishes the CPU voluntarily
- Priority levels are maintained
- Not preempted by tasks of equal or lower priority; only a higher-priority real-time task can preempt it

SCHED_RR - A Round Robin real-time process
- Assigned a timeslice and runs until the timeslice is exhausted
- Once all RR tasks of a given priority level exhaust their timeslices, their timeslices are refilled and they continue running
- Priority levels are maintained

Page 15: Adding a Scheduling Policy to the Linux Kernel

Background Information

SCHED_BATCH - for "batch" style execution of processes
- For computing-intensive tasks
- Timeslices are long and processes are round-robin scheduled
- Lowest priority tasks are batch-processed (nice +19)

SCHED_IDLE - for running very low priority background jobs
- The nice value has no influence on this policy
- Extremely low priority (lower than nice +19)

SCHED_ISO - To be implemented!!

Page 16: Adding a Scheduling Policy to the Linux Kernel

Background Information

Interactivity estimator
- Dynamically scales a task's priority based on its interactivity
- Interactive tasks receive a priority bonus [-5], hence a larger timeslice
- CPU-bound tasks receive a priority penalty [+5]
- Interactivity is estimated using a running sleep average
- Interactive tasks are I/O bound: they wait for events to occur
- Sleeping tasks are I/O bound or interactive!!
- The actual bonus/penalty is determined by comparing the sleep average against a constant maximum sleep average
- Does not apply to RT tasks

Page 17: Adding a Scheduling Policy to the Linux Kernel

Background Information

When a task finishes its timeslice:
- Its interactivity is estimated
- Interactive tasks can be inserted into the 'active' array again
- Otherwise, its priority is recalculated and it is inserted into the new priority level in the 'expired' array

Re-inserting interactive tasks:
- To avoid delays, interactive tasks may be re-inserted into the 'active' array after their timeslice has expired
- This is done only if tasks in the 'expired' array have run recently, to prevent their starvation
- The decision to re-insert depends on the task's priority level

Page 18: Adding a Scheduling Policy to the Linux Kernel

Background Information

Timeslice distribution:
- Priority is recalculated only after a timeslice expires
- Interactive tasks may become non-interactive during their LARGE timeslices, thus starving other processes
- To prevent this, timeslices are divided into chunks of 20ms
- A task of equal priority may preempt the running task every 20ms
- The preempted task is requeued and round-robined in its priority level
- Also, priority recalculation happens every 20ms

Page 19: Adding a Scheduling Policy to the Linux Kernel

Background Information

From /usr/src/linux-2.6.x/kernel/sched.c:

void schedule()
- The main scheduling function. Upon return, the highest priority process will be active

Data:
- struct runqueue: the main per-CPU runqueue data structure
- struct task_struct: the main per-process data structure

Page 20: Adding a Scheduling Policy to the Linux Kernel

Background Information

Process control methods:
- void set_user_nice(...): sets the nice value of task p to the given value
- int setscheduler(...): sets the scheduling policy and parameters for a given pid
- rt_task(p): returns true if task p is real-time, false if not
- yield(): places the current process at the end of the runqueue and calls schedule()

Page 21: Adding a Scheduling Policy to the Linux Kernel

Background Information

Benchmark
- Each individual test runs a multiple of 25 processes, then increments to the next multiple and reruns the benchmark. This continues until a maximum level, set by the tester, is reached

Page 22: Adding a Scheduling Policy to the Linux Kernel

Background Information

Now that we know all of this…

THEY CHANGED IT AGAIN!!!!!!!!!!!!!!!

Page 23: Adding a Scheduling Policy to the Linux Kernel

Background Information

Kernel 2.6.23 scheduler
- Called the Completely Fair Scheduler (CFS)
- Does not use the old priority-array runqueues; instead it uses a time-ordered rbtree to build a 'timeline' of future task execution, and thus has no 'array switch' artifacts for the SCHED_NORMAL policy (or SCHED_OTHER)
- Has no notion of 'timeslices' and has no heuristics whatsoever
- sched_rt.c implements SCHED_FIFO and SCHED_RR semantics in a simpler way than the vanilla scheduler does. It uses 100 runqueues (one for each of the 100 RT priority levels, instead of 140 in the vanilla scheduler) and it needs no expired array
- SCHED_BATCH is handled by the CFS scheduler module too

Page 24: Adding a Scheduling Policy to the Linux Kernel

Project Steps

To start, we need to figure out what version of the kernel we are currently running. We'll use the uname command for that:

$ uname -r
2.6.24-3-generic

Now we need to install the Linux source for our kernel; you can substitute the kernel number for whatever you are running. We also need to install the curses library and some other tools to help us compile:

$ sudo apt-get install linux-source-2.6.24 kernel-package libncurses5-dev fakeroot

If you are curious where the Linux source gets installed, you can use the dpkg command to list the files within a package:

$ dpkg -L linux-source-2.6.24

Page 25: Adding a Scheduling Policy to the Linux Kernel

Project Steps

To make things easier, we'll put ourselves in root mode by using sudo to open a new shell. There are other ways to do this, but I prefer this way:

$ sudo /bin/bash

Now change into the source location so that we can unpack the source. Note that you may need to install the bzip2 package (which provides bunzip2) if it's not installed:

$ cd /usr/src
$ bunzip2 linux-source-2.6.24.tar.bz2
$ tar xvf linux-source-2.6.24.tar
$ ln -s linux-source-2.6.24 linux

Page 26: Adding a Scheduling Policy to the Linux Kernel

Project Steps

Make a copy of your existing kernel configuration to use for the custom compile process:

$ cp /boot/config-`uname -r` /usr/src/linux/.config

The make-kpkg commands work on the kernel source in the current directory, so run them from /usr/src/linux. First we'll do a make clean, just to make sure everything is ready for the compile:

$ cd /usr/src/linux
$ make-kpkg clean

Next we'll actually compile the kernel. This will take a LONG FREAKING TIME, so go find something interesting to do:

$ fakeroot make-kpkg --initrd --append-to-version=-custom kernel_image kernel_headers

This process will create two .deb files in /usr/src that contain the kernel

Page 27: Adding a Scheduling Policy to the Linux Kernel

Project Steps

Please note that when you run these next commands, they will set the new kernel as the default kernel. This could break things! If your machine doesn't boot, you can hit Esc at the GRUB loading menu and select your old kernel. You can then disable the new kernel in /boot/grub/menu.lst or try to compile again:

$ dpkg -i linux-image-2.6.24.3-custom_2.6.24.3-custom-10.00.Custom_i386.deb
$ dpkg -i linux-headers-2.6.24.3-custom_2.6.24.3-custom-10.00.Custom_i386.deb

Now reboot your machine. If everything works, you should be running your new custom kernel. You can check this by using uname. Note that the exact number will be different on your machine:

$ uname -r
2.6.17.14-ubuntu1-custom

Page 28: Adding a Scheduling Policy to the Linux Kernel

Project Steps

Actual kernel files modified:
- sched.h
- sched.c

Auxiliary program modified:
- chrt.c

Page 29: Adding a Scheduling Policy to the Linux Kernel

Project Steps

Kernel file modifications:
- Added a new policy called SCHED_JUAN
- Given a static priority value lower than SCHED_IDLE
- Code? See the attached files

Page 30: Adding a Scheduling Policy to the Linux Kernel

Project Steps

Auxiliary Program:
- The chrt command is part of the util-linux package, a set of low-level system utilities that are necessary for a Linux system to function. It is installed by default under Ubuntu and almost all other Linux distributions
- You can get / set the scheduling attributes of running processes
- Compile: gcc chrtJ.c -o chrtJU
- Changed the chrt source to support SCHED_JUAN
- Code? See attached file (chrtJ.c)

Page 31: Adding a Scheduling Policy to the Linux Kernel

Achievements

Project Demo

Page 32: Adding a Scheduling Policy to the Linux Kernel

Project Steps

Is the policy useful?

Improvements?

Page 33: Adding a Scheduling Policy to the Linux Kernel

Questions?

Page 34: Adding a Scheduling Policy to the Linux Kernel

References

Kernel Design
http://aplawrence.com/Linux/linux26_features.html
http://www.linux.com/whatislinux/119700
http://www.ibm.com/developerworks/linux/library/l-scheduler/
http://lxr.linux.no/linux+v2.6.24/Documentation/sched-design.txt

Kernel Compiling Guide
http://www.howtogeek.com/howto/ubuntu/how-to-customize-your-ubuntu-kernel/

SCHED_IDLE Reference
https://kerneltrap.org/mailarchive/linux-kernel/2008/3/3/1051054

Chrt
http://www.cyberciti.biz/faq/howto-set-real-time-scheduling-priority-process/

Benchmark
http://devresources.linux-foundation.org/craiger/hackbench/