Disk Scheduling In Linux

It’s really quite complex!

http://www.stack.nl/~fidget/pguin/pguin.html

My Goals

• Teach a little bit of Computer Science.

• Show that the easy stuff is hard in real life.

• BTW, Operating systems are cool!

How A Disk Works

• A disk is a bunch of data blocks.

• A disk has a disk head.

• Data can only be accessed when the head is on that block.

• Moving the head takes FOREVER.

• Data transfer is fast (once the disk head gets there).

The Problem Statement

• Suppose we have multiple requests for disk blocks. Which should we access first?

P.S. Yes, order does matter … a lot.

Formal Problem Statement

• Input: A set of requests.

• Input: The current disk head location.

• Input: Algorithm state (direction of head travel, for example).

• Output: The next disk block to get.

P.S. Not the whole ordering, just the next one.
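The problem statement can be sketched as a function signature. This is only an illustration: the names `run` and `lowest_block_first` are made up for this sketch, block numbers stand in for head positions, and the requests are treated as a fixed batch (on a real system new ones keep arriving).

```python
def run(policy, requests, head):
    """Call a policy repeatedly until no requests remain.

    policy(pending, head, state) returns (next_block, new_state): it picks
    only the next block, never the whole ordering.
    """
    pending = list(requests)
    state = {}          # algorithm state, e.g. direction of travel
    order = []
    travel = 0          # total distance the head moved, in blocks
    while pending:
        block, state = policy(pending, head, state)
        pending.remove(block)
        travel += abs(block - head)
        head = block
        order.append(block)
    return order, travel


def lowest_block_first(pending, head, state):
    # Placeholder policy, used only to show the calling convention.
    return min(pending), state
```

Calling `run(lowest_block_first, [50, 10, 30], 20)` services the blocks in ascending order and reports the total head travel.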

The Goals

• Maximize throughput
  – Operations per second.

• Maximize fairness
  – Whatever that means.

• Avoid starvation
  – And very very long waits.

• Real Time concerns
  – These can be life threatening (and bank accounts too!).

What’s Not Done

• Every operating system assigns priorities to all sorts of things
  – Requests for RAM
  – Requests for the CPU
  – Requests for network access

• Few use disk request priority.

Why Talk About This Now?

• Because in the last year Linux has had three very different schedulers, and they’ve been tested against each other.

A Pessimal Algorithm

• Choose the disk request furthest from the current disk head.

• This is known to be as bad as any algorithm without idle periods.
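A minimal sketch of the furthest-first rule, assuming block numbers stand in for head positions and the requests are a fixed batch. The function names are invented for this illustration.

```python
def furthest_first(pending, head):
    # Pick the request with the largest distance from the head.
    return max(pending, key=lambda block: abs(block - head))


def total_travel(choose, requests, head):
    """Total head movement (in blocks) when `choose` picks each next request."""
    pending = list(requests)
    travel = 0
    while pending:
        block = choose(pending, head)
        pending.remove(block)
        travel += abs(block - head)
        head = block
    return travel
```

With requests at 10, 90, 20 and 80 and the head at 53, the head bounces from one end of the disk to the other on every step.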

An Optimal Algorithm

• Choose the disk request closest to the disk head.

• It’s Optimal!

Optimal Analyzed

• This is known to have the highest performance in operations per second.

• It’s unfair to requests toward the edges of the disk.

• It allows for starvation.
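The starvation problem is easy to reproduce in a toy simulation. The sketch below assumes a made-up workload in which a new request always arrives one block past the head; the far-away request at block 900 is never chosen.

```python
def closest_first(pending, head):
    # Pick the request with the smallest distance from the head.
    return min(pending, key=lambda block: abs(block - head))


def simulate_starvation(steps):
    head = 100
    pending = [900, 101]        # one far request, one near request
    served = []
    for _ in range(steps):
        block = closest_first(pending, head)
        pending.remove(block)
        served.append(block)
        head = block
        pending.append(head + 1)   # hypothetical: a new nearby request arrives
    return served, pending
```

However many steps are run (short of the head creeping all the way to block 900), the request for block 900 is still waiting at the end.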

First Come First Serve

• Serve the requests in their arrival order.
  – It’s fair.
  – It avoids starvation.
  – Its performance is medium lousy.
  – Some simple OSs use this.
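First come first serve is just a queue. A minimal sketch, assuming block numbers stand in for head positions (the name `fcfs_order` is invented here):

```python
from collections import deque


def fcfs_order(requests, head):
    """Service requests strictly in arrival order; return (order, head travel)."""
    queue = deque(requests)
    order = []
    travel = 0
    while queue:
        block = queue.popleft()      # oldest request first, wherever it is
        travel += abs(block - head)
        head = block
        order.append(block)
    return order, travel
```

The head position plays no part in the choice, which is why the policy is fair and also why the head ends up zigzagging.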

Elevator

• Move back and forth, solving requests as you go.

• Performance
  – Good.

• Fairness
  – Files near the middle of the disk get 2x the attention.
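A sketch of the elevator rule, assuming the head reverses as soon as nothing is left ahead of it rather than running to the physical edge of the disk. Function names are invented for this illustration; `direction` is the algorithm state (+1 for higher blocks, -1 for lower).

```python
def elevator_next(pending, head, direction):
    """Return (next_block, direction) for one step of the elevator."""
    ahead = [b for b in pending if (b - head) * direction >= 0]
    if not ahead:
        # Nothing left in this direction: turn around.
        direction = -direction
        ahead = pending
    block = min(ahead, key=lambda b: abs(b - head))
    return block, direction


def elevator_order(requests, head, direction=1):
    pending = list(requests)
    order = []
    while pending:
        block, direction = elevator_next(pending, head, direction)
        pending.remove(block)
        head = block
        order.append(block)
    return order
```

Starting at block 50 and heading upward, the requests 10, 90, 20, 80, 60 are serviced on the way up (60, 80, 90) and then on the way back down (20, 10).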

Cyclic Elevator

• Move toward the bottom, solving requests as you go.

• When you’ve solved the lowest request, seek all the way to the highest request.
  – Performance penalty occurs here.
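The cyclic elevator needs no direction state, since the sweep always runs the same way. A minimal sketch with invented function names, again treating block numbers as head positions:

```python
def cyclic_elevator_next(pending, head):
    """Sweep toward block 0; when nothing is left below, jump to the highest request."""
    below = [b for b in pending if b <= head]
    if below:
        return max(below)       # nearest request at or below the head
    return max(pending)         # the long seek back to the top


def cyclic_elevator_order(requests, head):
    pending = list(requests)
    order = []
    while pending:
        block = cyclic_elevator_next(pending, head)
        pending.remove(block)
        head = block
        order.append(block)
    return order
```

From block 50, the requests 10, 90, 20, 80, 60 come out as 20, 10, then the jump to 90, then 80, 60.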

Cyclic Elevator Analyzed

• It’s fair.

• It’s starvation-proof.

• It has very good performance.
  – Almost as good as elevator.

• It’s used in real life (and every textbook).

Deadline Scheduler

• Each request has a deadline.

• Service requests using cyclic elevator.

• When a deadline is threatened, skip directly to that request.

• For Real Time (which means xmms)

• Written by Jens Axboe.
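The three bullets above can be sketched as one selection function. This is an illustration of the idea only, not the kernel's code: requests are hypothetical `(block, deadline)` pairs, `now` is a clock in the same units, and a deadline counts as threatened once `now` has reached it.

```python
def deadline_next(pending, head, now):
    """Pick the next (block, deadline) request to service."""
    threatened = [r for r in pending if r[1] <= now]
    if threatened:
        # Skip directly to the request whose deadline is oldest.
        return min(threatened, key=lambda r: r[1])
    # Otherwise behave like the cyclic elevator.
    below = [r for r in pending if r[0] <= head]
    candidates = below if below else pending
    return max(candidates, key=lambda r: r[0])
```

With the head at block 50 and requests `(20, 100)`, `(10, 100)` and `(90, 5)`, the elevator choice at time 0 is block 20, but at time 5 the request for block 90 jumps the line.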

Deadline Analyzed

• Gives priority to Real Time processes.

• Fair otherwise.

• No starvation
  – Unless a real time process goes wild.

Anticipatory Scheduling (The Idea)

• Developed by several people.

• Coded by Nick Piggin.

• Assume that an I/O request will be closely followed by another nearby one.

Anticipatory Scheduling (The Algorithm)

• After servicing a request … WAIT.
  – Yes, this means do nothing even though there is work to be done.

• If a nearby request occurs soon, service it.

• If not … cyclic elevator.
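A sketch of the decision made once the wait is over. Everything here is an assumption for illustration: the wait itself is not modeled (the caller passes in whatever arrived during it), and the `nearby` threshold of 8 blocks is an invented number.

```python
def anticipatory_next(pending, head, arrivals_during_wait, nearby=8):
    """Choose the next block after the idle wait has elapsed.

    pending: blocks that were already queued before the wait.
    arrivals_during_wait: blocks requested while the disk sat idle.
    nearby: hypothetical distance (in blocks) that counts as "nearby".
    """
    close = [b for b in arrivals_during_wait if abs(b - head) <= nearby]
    if close:
        # The gamble paid off: service the nearby request.
        return min(close, key=lambda b: abs(b - head))
    # It did not: fall back to the cyclic elevator over everything queued.
    everything = list(pending) + list(arrivals_during_wait)
    below = [b for b in everything if b <= head]
    return max(below) if below else max(everything)
```

With the head at block 100 and blocks 900 and 10 already queued, an arrival at block 102 is serviced at once; an arrival at block 500 is not nearby, so the cyclic elevator picks block 10.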

Anticipatory Scheduling Analyzed

• Fair

• No support for real time.

• No starvation.

• Makes assumptions about how processes work in real life.
  – That’s the idleness.
  – They had better be right.

Benchmarking the Anticipatory Scheduler

Source: http://www.cs.rice.edu/~ssiyer/r/antsched/antsched.pdf

Completely Fair Queuing (also by Jens)

• Real Time needs always come first

• Otherwise, no user should be able to hog the disk drive.

• Priorities are OK.
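The first two bullets can be sketched as a real-time queue that always goes first plus a round-robin over everyone else. This is a toy model under stated assumptions, not the kernel's CFQ: the queue names, the `slots` parameter and the one-request-per-turn rule are all invented here, and priorities are not modeled.

```python
from collections import deque


def cfq_dispatch(rt_queue, queues, slots):
    """Move up to `slots` requests to the dispatch list.

    rt_queue: deque of real-time requests (always served first).
    queues: dict mapping a hypothetical owner name to a deque of requests.
    """
    dispatched = []
    names = deque(queues)                 # rotation order over the owners
    while len(dispatched) < slots:
        if rt_queue:
            dispatched.append(rt_queue.popleft())
            continue
        for _ in range(len(names)):
            name = names[0]
            names.rotate(-1)              # next owner gets the next turn
            if queues[name]:
                dispatched.append(queues[name].popleft())
                break
        else:
            break                         # every queue is empty
    return dispatched
```

With one real-time request, a "hog" queue holding four requests and an "editor" queue holding one, the editor's request is dispatched after only one of the hog's.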

The CFQ Picture

[Diagram labels: RT, Q1, Q2, …, Q20, 10 ms valve, Dispatcher, Disk Queue, Disk]

Yes, Gabe’s art is better.

Analyzing CFQ

• Complex!!!!!

• Has Real Time support (Jens likes that).

• Fair, and fights disk hogs!
  – A new kind of fairness!!

• No starvation is possible.
  – Real time craziness excepted.

• Allows for priorities
  – But no one knows how to assign them.

Benchmark #1

• time (find kernel-tree -type f | xargs cat > /dev/null)

Dead: 3 minutes 39 seconds
CFQ: 5 minutes 7 seconds
AS: 17 seconds

Benchmark #2

for i in 1 2 3 4 5 6
do
    time (find kernel-tree-$i -type f | xargs cat > /dev/null) &
done

Dead: 3m56.791s
CFQ: 5m50.233s
AS: 0m53.087s

Benchmark #3

time (cp 1-gig-file foo ; sync)

AS: 1:22.36
CFQ: 1:25.54
Dead: 1:11.03

Benchmark #4

• time ssh testbox xterm -e true

Old: 62 seconds
Dead: 14 seconds
CFQ: 11 seconds
AS: 12 seconds

Benchmark #5

• While “cat 512M-file > /dev/null”

• Measure “write-and-fsync -f -m 100 outfile”

Old: 6.4 seconds
Dead: 7.7 seconds
CFQ: 8.4 seconds
AS: 11.9 seconds

The Winner is …

• Andrew Morton said, "the anticipatory scheduler is wiping the others off the map, and 2.4 is a disaster."

What You Learned

• In Real Operating Systems …

  – Performance is never obvious.
  – No one uses the textbook algorithm.
  – Benchmarking is everything.

• Theory is useful, if it helps you benchmark better.

• Linux is cool, since we can see the development process.
