Queues Your Query Waits In - Percona

QUEUES YOUR QUERY WAITS IN JOSH SNYDER

Post on 20-May-2020

TRANSCRIPT

Page 1: Q U E U E S YOUR Q U E R Y W AITS IN - Percona · 946.4 / 1676.3 → 0.56 ms / OP. Device: r/s rMB/s avgrq-sz avgqu-sz r_await svctm %util ... data on a socket sleep() a timer futex()

QUEUES YOUR QUERY WAITS IN

JOSH SNYDER

SO MUCH QUEUING SO LITTLE TIME

JOSH SNYDER

ESCALATOR ETIQUETTE

ESCALATING UPHEAVAL

Why would they do such a thing?

            Latency                    Throughput
units       time                       time⁻¹
measures    smallest sliver of work    largest sample of work
            lim n → 1                  lim n → ∞

HYPOTHETICAL ESCALATOR ANALYSIS
Person standing requires 1 stair; 24 seconds
So: 24 stair-seconds
Person walking requires 12 seconds
To break even, walkers must be spaced ≤2 stairs apart

http://www.gizmodo.co.uk/2017/03/the-results-are-in-the-holborn-escalator-trial-proves-that-it-is-better-to-stand-on-the-escalator-well-sometimes/
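
The break-even figure can be checked with a few lines of Python. This is only a sketch using the slide's numbers; the function name and the "stair-seconds" cost model are illustrative assumptions, not part of the talk.

```python
def stair_seconds(stairs_occupied: float, ride_seconds: float) -> float:
    """Escalator capacity one rider consumes: stairs held times seconds held."""
    return stairs_occupied * ride_seconds


STANDING_RIDE_S = 24  # from the slide: a stander rides for 24 seconds
WALKING_RIDE_S = 12   # from the slide: a walker rides for 12 seconds

# A stander occupies one stair for the whole ride.
standing_cost = stair_seconds(1, STANDING_RIDE_S)

# A walker breaks even when (stairs per walker) * 12 s equals the stander's cost.
break_even_spacing = standing_cost / WALKING_RIDE_S

print(standing_cost)       # 24 stair-seconds
print(break_even_spacing)  # 2.0 stairs per walker
```

If walkers leave more than two stairs between each other, each walker consumes more escalator capacity than a stander does.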

WHAT'S TO COME
two "favorite" tools: iostat and loadavg
layers and layers of latency
managing multi-tenancy
load (un)balancing

IOSTAT
Presents disk I/O statistics
Reads /proc/diskstats (Linux)

IOSTAT: AN EXAMPLE

         reads             r_sects        r_ms       t_act
t=0      459583            96210660       2693900    599164
t=10     476346            100180364      2741888    608628
Δ        16763             3969704        47988      9464
/ 10     1676.3            396970.4       4798.8     946.4
human    1676.3 / second   190.83 MB/s    4.7988     .9464

r_await: average time each I/O waited
4798.8 ms spent
1676.3 reads
4798.8 / 1676.3 → 2.86 ms / op

avgrq-sz: mean sectors per I/O
396970.4 sectors
1676.3 reads
396970.4 / 1676.3 → 236.81 sectors / op

svctm: non-idle time (%util) / #OPS
946.4 ms / second (94.6 %util)
1676.3 reads
946.4 / 1676.3 → 0.56 ms / OP

Device:  r/s    rMB/s    avgrq-sz  avgqu-sz  r_await  svctm  %util
vda      1676   190.83   236.81    4.80      2.86     0.56   94.64
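
The per-column arithmetic from the preceding slides can be collected into one small function. This is a minimal sketch, not iostat's actual implementation: the `DiskSample` class and `derive` function are hypothetical names, the four counters are the ones shown in the example table, and the interval is assumed to contain at least one read.

```python
from dataclasses import dataclass


@dataclass
class DiskSample:
    reads: int    # reads completed
    r_sects: int  # sectors read
    r_ms: int     # milliseconds spent on reads
    t_act: int    # milliseconds the device was non-idle


def derive(before: DiskSample, after: DiskSample, interval_s: float) -> dict:
    """Turn two counter samples into iostat-style per-second statistics."""
    reads_per_s = (after.reads - before.reads) / interval_s
    sects_per_s = (after.r_sects - before.r_sects) / interval_s
    wait_ms_per_s = (after.r_ms - before.r_ms) / interval_s
    active_ms_per_s = (after.t_act - before.t_act) / interval_s
    return {
        "r/s": reads_per_s,
        "avgrq-sz": sects_per_s / reads_per_s,     # sectors per op
        "avgqu-sz": wait_ms_per_s / 1000,          # ms waited per ms elapsed
        "r_await": wait_ms_per_s / reads_per_s,    # ms per op
        "svctm": active_ms_per_s / reads_per_s,    # ms per op
        "%util": active_ms_per_s / 1000 * 100,
    }


t0 = DiskSample(459583, 96210660, 2693900, 599164)
t10 = DiskSample(476346, 100180364, 2741888, 608628)
for name, value in derive(t0, t10, 10).items():
    print(f"{name:9s} {value:.2f}")
```

Note that avgqu-sz falls out of the same counters as r_await: it is total wait time divided by elapsed time rather than by operation count.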

LOAD AVERAGE
Collected by the scheduler
Based on process states

PROCESS STATES
A process/thread/task is either:

runnable
    on a CPU
    starved of CPU
waiting for something

BEING RUNNABLE

    int i = 0;
    while(1) {
        i++;
    }

CPU STARVATION
Possible reasons:

no CPU is available
task is (temporarily) assigned to a CPU with other work to do
a bug in the scheduler (see "A Decade of Wasted Cores")

MEASURING CPU STARVATION

Formats documented in Documentation/scheduler/sched-stats.txt

    $ awk '/^cpu/ { printf "%s %.9fs\n", $1, $9 / 1e9 }' /proc/schedstat
    cpu0 508.505281125s
    cpu1 186.946423306s

    $ awk '{ printf "%.9fs\n", $2 / 1e9 }' /proc/$PID/schedstat
    3.181567463s
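
The per-task reading can also be taken from inside a program. The sketch below assumes the three-field layout of /proc/<pid>/schedstat (time on CPU, time waiting on a runqueue, timeslices run; the first two in nanoseconds). The function name and the sample line are made up for illustration.

```python
def run_delay_seconds(schedstat_text: str) -> float:
    """Return runqueue wait time in seconds from /proc/<pid>/schedstat text.

    The second whitespace-separated field is the time the task spent
    runnable but not running, in nanoseconds.
    """
    fields = schedstat_text.split()
    return int(fields[1]) / 1e9


# Made-up line in the kernel's format: on-CPU ns, runqueue-wait ns, timeslices.
sample = "51234567890 3181567463 4242"
print(f"{run_delay_seconds(sample):.9f}s")
```

On a Linux host, the text would come from something like `open(f"/proc/{pid}/schedstat").read()`. Sampling it twice and dividing the difference by the elapsed time gives the fraction of that interval the task spent starved of CPU.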

Resource-by-resource analysis
USE method

WAITING VOLUNTARILY

accept() a new network connection

recv() data on a socket

sleep() a timer

futex() a memory address (lock)

waitpid() a process

etc...

SLEEPING INVOLUNTARILY

    pkill -STOP mysqld

PROCESS STATES

RUNNING (R) misnomer: runnable process

UNINTERRUPTIBLE (D) waiting for disk

INTERRUPTIBLE (S) waiting for something else

STOPPED (T) (forced to) wait for SIGCONT

ZOMBIE (Z) waiting for parent to waitpid()

See include/linux/sched.h for gory details

LOAD AVERAGE
Instantaneous load:

TASK_RUNNING (R) + TASK_UNINTERRUPTIBLE (D)

sampled every 5 seconds
into an exponentially weighted moving average

See:
include/linux/sched.h
kernel/sched/loadavg.c
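
The averaging step can be sketched in floating point. The kernel does the same thing in fixed-point arithmetic, so treat this as an illustration of the idea rather than a copy of loadavg.c. The function name is an assumption.

```python
import math

SAMPLE_INTERVAL_S = 5  # the load is sampled every 5 seconds


def update_load(load: float, active_tasks: int, window_s: float) -> float:
    """Fold one sample of (R + D) tasks into an exponentially weighted average."""
    decay = math.exp(-SAMPLE_INTERVAL_S / window_s)
    return load * decay + active_tasks * (1 - decay)


# Start idle, then hold 4 runnable/uninterruptible tasks for one minute.
load1 = 0.0
for _ in range(60 // SAMPLE_INTERVAL_S):
    load1 = update_load(load1, 4, window_s=60)

print(round(load1, 2))  # 2.53
```

After a full minute at a steady load of 4, the "1-minute" average has only reached about 63% of it. The number is a smoothed history, not a measurement of the last 60 seconds.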

WHAT'S BETTER THAN A LOAD AVERAGE?
for CPU: runqueue latency
for disk:
    iostat avgqu-sz (per disk)
    delayacct_blkio_ticks (per task)

https://github.com/hashbrowncipher/taskstats_exporter

DELAYACCT_BLKIO_TICKS
How long a process spent in the D state, in hundredths of a second:

    $ awk '{ print $42 / 100 }' < /proc/$PID/stat
    19.68
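
Splitting /proc/<pid>/stat on whitespace miscounts fields when the command name (field 2, in parentheses) contains spaces. The sketch below splits after the last closing parenthesis instead. The function name and the sample line are hypothetical, and 100 ticks per second is an assumption that `os.sysconf("SC_CLK_TCK")` can confirm on a given host.

```python
def blkio_delay_seconds(stat_text: str, ticks_per_second: int = 100) -> float:
    """Return field 42 (delayacct_blkio_ticks) of /proc/<pid>/stat in seconds."""
    # Everything after the last ')' starts at field 3 (the state letter).
    after_comm = stat_text.rsplit(")", 1)[1].split()
    # Field N therefore sits at index N - 3, so field 42 is index 39.
    return int(after_comm[39]) / ticks_per_second


# Build a made-up stat line: pid, a comm containing a space, state, then
# fields 4 through 52 as zeros, with field 42 set to 1968 ticks.
rest = ["0"] * 49
rest[42 - 4] = "1968"
sample = "1234 (my sqld) S " + " ".join(rest)

print(blkio_delay_seconds(sample))  # 19.68
```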

WORKLOAD DEPENDENCE (1)
Workload:

    def random_reader():
        while True:
            do_random_read()

    threaded(random_reader, 500).start()

Resources:
≥32 cores
SSD with maximum performance at QD=16-32

WORKLOAD DEPENDENCE (2)
Compare this workload:

    def locked_random_reader(semaphore):
        while True:
            with semaphore:
                do_random_read()

    max_ios = 32
    semaphore = Semaphore(max_ios)
    threaded(lambda: locked_random_reader(semaphore), 500).start()
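
The slide's `threaded` and `do_random_read` helpers are not defined in the deck. A self-contained stand-in, with a sleep in place of the disk read and much smaller, made-up thread counts, shows the effect of the semaphore: the number of reads in flight never exceeds the cap, and the remaining threads queue on the lock instead of on the device.

```python
import threading
import time

MAX_IOS = 4           # hypothetical cap (the slide uses 32)
THREADS = 16          # hypothetical thread count (the slide uses 500)
READS_PER_THREAD = 5  # finite, so the example terminates

semaphore = threading.Semaphore(MAX_IOS)
counter_lock = threading.Lock()
in_flight = 0
peak_in_flight = 0


def fake_random_read() -> None:
    """Stand-in for do_random_read(): sleeps instead of touching a disk."""
    global in_flight, peak_in_flight
    with counter_lock:
        in_flight += 1
        peak_in_flight = max(peak_in_flight, in_flight)
    time.sleep(0.001)
    with counter_lock:
        in_flight -= 1


def locked_random_reader() -> None:
    for _ in range(READS_PER_THREAD):
        with semaphore:  # at most MAX_IOS threads get past this line at once
            fake_random_read()


workers = [threading.Thread(target=locked_random_reader) for _ in range(THREADS)]
for worker in workers:
    worker.start()
for worker in workers:
    worker.join()

print(f"peak reads in flight: {peak_in_flight} (cap {MAX_IOS})")
```

With real disk reads, threads blocked on the semaphore would typically be in interruptible sleep rather than waiting in the block layer, so the same demand would show up differently in load average and in per-task block I/O delay.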

WORKLOAD DEPENDENCE: LESSONS
Changes in workload will change both bad stats (load) and good ones (delayacct_blkio_ticks)
Locks are a form of queueing!

QUESTIONS?

SIMPLIFIED MYSQL EXAMPLE
1. Query packet arrives at NIC
2. Kernel adds packet to socket queue; wakes recv()'ing MySQL thread (S → R)
3. Buffer pool lookup: MISS! (waited for locks, R → S → R)
4. Read pages from disk (R → D → R)
5. Big result; send result to client (R → S → R)
6. Wait for client: recv() (R → S)
7. GOTO 1

"SIMPLIFIED" IS A KEY WORD
Did packet processing happen due to interrupt, or polling?
How hot are the CPU caches?
Query passed through bunches of MySQL events_stages
etc...

LATENCY ANALYSIS IS FRACTALLY COMPLEX!

CPUs (and everything else) are abstractions that hide complexity:

is it throttling?
how many cycles did I stall due to memory access?
how many cycles did I stall due to lack of resources in the processor?

BUT WE DO IT ANYWAY!
CPU time is still a useful metric, even though we take it with a grain of salt!

COLLECTING LATENCY INFORMATION
Two methods:
1. Timing
2. Sampling (Little's law!)

COLLECTING TIMINGS
T: accumulator
t: current time
T += t_end - t_start
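
A minimal sketch of such an accumulator follows. The class and method names are illustrative assumptions. It keeps an operation count alongside the running total so that a mean can be derived later.

```python
import time


class TimingAccumulator:
    """Running total of elapsed time (T) plus an operation count."""

    def __init__(self) -> None:
        self.total_ns = 0
        self.count = 0

    def record(self, start_ns: int, end_ns: int) -> None:
        self.total_ns += end_ns - start_ns  # T += t_end - t_start
        self.count += 1

    def mean_ns(self) -> float:
        return self.total_ns / self.count if self.count else 0.0


acc = TimingAccumulator()
for _ in range(3):
    t_start = time.monotonic_ns()
    sum(range(1000))  # stand-in for the work being timed
    t_end = time.monotonic_ns()
    acc.record(t_start, t_end)

print(acc.count, acc.total_ns, acc.mean_ns())
```

Exporting the two counters and letting the monitoring system compute rates from their deltas gives a per-interval mean without resetting anything in the process.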

INSIST ON VDSO TIMEKEEPING
vDSO: virtual dynamic shared object
vDSO timekeeping takes 25-45 ns in a tight loop
non-vDSO timekeeping takes ~4x longer

bad:
    $ strace -qq -e clock_gettime date > /dev/null
    clock_gettime(CLOCK_REALTIME, {946684800, 0}) = 0

good:
    $ strace -qq -e clock_gettime date > /dev/null

http://www.brendangregg.com/blog/2015-03-03/performance-tuning-linux-instances-on-ec2.html
https://www.slideshare.net/AmazonWebServices/cmp402-amazon-ec2-instances-deep-dive

LITTLE'S LAW
In a stable system: L = λW
L (mean dwelling customers)
λ (mean arrival rate)
W (mean dwell time)

LITTLE'S LAW APPLIED TO MYSQL
Over 1 second:

1000 query threads (~Threads_running sampled)
1e5 Questions (SHOW STATUS LIKE 'Questions')

L / λ = W
So: 1000 / (1e5 / sec) = 10 ms per query

Over the same period the application records 2300 outstanding queries (on average)

2300 / (1e5 / sec) = 23 ms per query

23 - 10 = 13 ms of unaccounted-for time (on average)
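
The same calculation in code, as a sketch. It assumes roughly 2300 outstanding queries on the application side, which is the count that yields 23 ms at 1e5 queries per second. The function name is illustrative.

```python
def mean_dwell_time_s(mean_in_system: float, arrivals_per_s: float) -> float:
    """Little's law rearranged: W = L / lambda."""
    return mean_in_system / arrivals_per_s


QUERIES_PER_S = 1e5  # Questions delta over one second

server_w = mean_dwell_time_s(1000, QUERIES_PER_S)  # Threads_running, sampled
client_w = mean_dwell_time_s(2300, QUERIES_PER_S)  # outstanding at the app (assumed)

print(f"server-side: {server_w * 1e3:.0f} ms per query")
print(f"client-side: {client_w * 1e3:.0f} ms per query")
print(f"unaccounted: {(client_w - server_w) * 1e3:.0f} ms per query")
```

The difference is time a query spends somewhere other than a running MySQL thread: the network, connection pools, client-side queues, and so on.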

PROBLEMS ABOUND
We now know an average about our queries in general.

INTERLUDE: WHY NOT HISTOGRAMS‽
A useful histogram requires ~16-276 counters.

An average requires 2 (+1 for variance).

We can track far more metrics as averages than we ever could as histograms.
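
One way to read "2 (+1 for variance)" is a count, a sum, and a sum of squares. The sketch below takes that interpretation, which is an assumption about what the slide means. The class name is hypothetical.

```python
class RunningStats:
    """Mean and variance from three monotonically increasing counters."""

    def __init__(self) -> None:
        self.count = 0       # counter 1
        self.total = 0.0     # counter 2
        self.total_sq = 0.0  # counter 3, only needed for variance

    def add(self, value: float) -> None:
        self.count += 1
        self.total += value
        self.total_sq += value * value

    def mean(self) -> float:
        return self.total / self.count

    def variance(self) -> float:
        # Population variance: E[x^2] - (E[x])^2
        return self.total_sq / self.count - self.mean() ** 2


stats = RunningStats()
for latency_ms in (1.0, 2.0, 3.0, 4.0):
    stats.add(latency_ms)

print(stats.mean(), stats.variance())  # 2.5 1.25
```

The sum-of-squares form is cheap to export and aggregate but loses precision when the variance is small relative to the mean. Welford's algorithm is the usual alternative when that matters.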

QUESTIONS?

A TALE OF TWO TENANTS
Service "fishpics" with N workers and two paths:

cache hit (1 ms)
cache miss (100-10000 ms)

AN "ANSWER" TO BAD AVERAGES
Track cache hit/miss time (mean + variance) separately
Track time-in-queue separately from work time

MULTI-TENANCY
if misses get too slow, hits will wait
two "tenants", one blocking the other

FAIRNESS
Queue time is useful, but not in isolation!

If a 100 ms RPC waits 3 ms: no big deal
If a 100 µs RPC waits 3 ms: alarm bells!

SLOWDOWN
S = (queued time) / (working time)
worthwhile whenever a human is waiting
cf. express lanes in grocery stores
OLTP vs. batch workloads
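
Applying the ratio to the two RPCs from the fairness slide shows why the same 3 ms wait reads so differently. A small sketch; the function name is an assumption.

```python
def slowdown(queued_s: float, working_s: float) -> float:
    """S = (queued time) / (working time)."""
    return queued_s / working_s


QUEUE_WAIT_S = 0.003  # both RPCs wait 3 ms

long_rpc = slowdown(QUEUE_WAIT_S, 0.100)    # 100 ms of work
short_rpc = slowdown(QUEUE_WAIT_S, 0.0001)  # 100 µs of work

print(f"100 ms RPC: S = {long_rpc:.2f}")
print(f"100 µs RPC: S = {short_rpc:.0f}")
```

The long RPC spends about 3% extra time queued, while the short one waits roughly thirty times longer than it works.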

CONCURRENCY LIMITING
fishpics service:

limit misses to 90% of workers
drop requests above 90%
single pool of servers; single deployment
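
A limiter of this shape can be sketched with a non-blocking semaphore. The class and method names are hypothetical, and how the real service detects a miss or reports a dropped request is not covered by the slides.

```python
import threading


class MissLimiter:
    """Admit cache-miss work only while it holds under a share of the workers."""

    def __init__(self, workers: int, miss_percent: int = 90) -> None:
        slots = max(1, workers * miss_percent // 100)
        self._slots = threading.BoundedSemaphore(slots)

    def try_start_miss(self) -> bool:
        """Return True if the miss may proceed; False means drop the request."""
        return self._slots.acquire(blocking=False)

    def finish_miss(self) -> None:
        self._slots.release()


limiter = MissLimiter(workers=10)
admitted = sum(limiter.try_start_miss() for _ in range(10))
print(admitted)  # 9: the tenth concurrent miss is dropped

limiter.finish_miss()            # one miss completes...
print(limiter.try_start_miss())  # True: ...and frees a slot
```

Dropping instead of queueing keeps the remaining 10% of workers available for cache hits, so slow misses cannot starve the fast path.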

ALL DATASTORES ARE MULTI-TENANT!
query threads compete with each other
batching and coalescing work → background work
background threads compete with query threads
backups are background work

EXAMPLE: MYSQL BACKUPS
Pipeline: read | compress | send
read from disk (ionice(1); CFQ IOPRIO_CLASS_IDLE)
compress (chrt(1); SCHED_IDLE)
send over network (prio qdisc; SO_PRIORITY)

CLASSFUL SCHEDULING
work is divided into classes
if high-class work exists, low-class work waits
nice(1) is NOT classful (timeslices)

EXAMPLE: CASSANDRA COMPACTION
Goal: maximal compaction; minimal disruption

don't pick a rate a priori!
SCHED_IDLE → possible starvation
solution: cpuset

HERACLES

From Google
Colocated batch and latency-sensitive tasks
Per-resource analysis

SO MUCH MORE!
token buckets
cgroups
qdiscs (codel)

QUESTIONS?

LOAD BALANCING
how are requests allocated to backends?
central queue minimizes queued time
    under unrealistic assumptions

See, in general, ch 24 of "Performance Modeling and Design of Computer Systems"

BAD LOAD BALANCING
random
round-robin (slightly better)

BETTER LOAD BALANCING
join-shortest-queue
least-work-left
TAGS (for "practical" workloads)

TAGS (PREPARE YOUR MIND)
throws away work!
unbalances load!
fairness over throughput

TAGS (KEY CONCEPTS)
Non-preemptible idempotent jobs
Large variance in job size
Unwilling (unable) to make predictions
Slowdown metric (covered earlier)
Server expansion requirement

TAGS
Allow jobs to run a limited amount of time
Kill and requeue jobs that run too long
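
The kill-and-requeue rule can be sketched for a single job. This assumes a chain of hosts with increasing time limits, and that a killed job restarts from scratch on the next host (the jobs are non-preemptible and idempotent). The cutoff values and function name are made up for illustration, and the sketch ignores queueing at each host.

```python
def run_with_tags(job_size: float, cutoffs: list) -> tuple:
    """Return (index of the host that finishes the job, total work spent on it).

    Each host runs the job from the beginning for at most its cutoff. A job
    that is still running at the cutoff is killed and requeued at the next host.
    """
    work_spent = 0.0
    for host, cutoff in enumerate(cutoffs):
        if job_size <= cutoff:
            return host, work_spent + job_size
        work_spent += cutoff  # ran up to the limit, then thrown away
    raise ValueError("job is larger than the final cutoff")


CUTOFFS = [1.0, 10.0, float("inf")]  # hypothetical per-host limits, in seconds

for size in (0.5, 5.0, 50.0):
    host, work = run_with_tags(size, CUTOFFS)
    print(f"job of {size:>4} s finishes on host {host}, costing {work} s of work")
```

Small jobs never share a host with large ones, which protects their slowdown. The price is the discarded work: the 50 s job costs 61 s in total.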

LOAD UNBALANCING

(FINAL) QUESTIONS?