Responding in a timely manner Martin Thompson - @mjpt777


Post on 29-Aug-2020


Page 1: Responding in a timely manner - GOTO Conference · System: 1000 TPS, mean RT 50µs What is the mean if you add in a 25ms GC pause per second? ~300µs

Responding in a timely manner

Martin Thompson - @mjpt777


Hard Real-time


Soft Real-time


Squidgy Real-time


The Unaware


1. How to Test and Measure

2. A little bit of Theory

3. A little bit of Practice

4. Common Pitfalls

5. Useful Algorithms and Techniques


Test & Measure

[Figure: Distributed Load Generation Agents, System Under Test, Observer]

Pro Tip: Set up a continuous performance testing environment


Pro Tip: Record Everything

Latency Histograms

[Figure: latency histogram with the Mode, Median and Mean marked]

System: 1000 TPS, mean RT 50µs

What is the mean if you add in a 25ms GC pause per second?

~300µs
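
A rough way to see why one short pause moves the mean so far is to replay a second of traffic. The sketch below is an illustrative model, not the calculation behind the slide. It assumes evenly spaced arrivals, a single pause at the start of the second, and that a request arriving during the pause waits only for the pause to end (queueing behind other delayed requests is ignored). The class and method names are made up for the example.

```java
final class GcPauseMean {
    /**
     * Mean response time in microseconds over one second of evenly spaced
     * requests, with a single pause that starts at time zero.
     */
    static double meanMicros(int tps, double serviceMicros, double pauseMicros) {
        double interArrivalMicros = 1_000_000.0 / tps;
        double totalMicros = 0.0;
        for (int i = 0; i < tps; i++) {
            double arrivalMicros = i * interArrivalMicros;
            // A request that arrives before the pause ends waits for the rest of it.
            double waitMicros = Math.max(0.0, pauseMicros - arrivalMicros);
            totalMicros += waitMicros + serviceMicros;
        }
        return totalMicros / tps;
    }

    public static void main(String[] args) {
        System.out.println(meanMicros(1000, 50.0, 0.0));      // no pause
        System.out.println(meanMicros(1000, 50.0, 25_000.0)); // one 25ms pause
    }
}
```

Under these assumptions only about 25 of the 1000 requests are delayed, yet the mean rises from 50µs to a few hundred microseconds, the same order as the ~300µs on the slide. The exact figure depends on how arrivals line up with the pause.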

Forget averages, it’s all about percentiles
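
To make the contrast concrete, here is a minimal sketch that reports the mean next to nearest-rank percentiles for the same samples. The nearest-rank definition is one of several in common use and is chosen here for simplicity. The class name and the sample data in the usage note are assumptions for illustration.

```java
import java.util.Arrays;

final class Percentiles {
    /** Nearest-rank percentile: the smallest sample with at least p% of samples at or below it. */
    static long percentile(long[] samples, double p) {
        if (samples.length == 0 || p <= 0.0 || p > 100.0) {
            throw new IllegalArgumentException("need samples and 0 < p <= 100");
        }
        long[] sorted = samples.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p * sorted.length / 100.0);
        return sorted[rank - 1];
    }

    /** Arithmetic mean, for comparison with the percentiles. */
    static double mean(long[] samples) {
        return Arrays.stream(samples).average().orElse(Double.NaN);
    }
}
```

For 98 samples of 50µs and 2 samples of 25,000µs, the mean is 549µs, which describes neither group, while the 50th percentile is 50µs and the 99th is 25,000µs.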

Coordinated Omission

Source: Gil Tene (Azul Systems)
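
One common reading of coordinated omission is that a load generator which waits for each response before sending the next request never records the delay suffered by requests that should have been sent during a stall. The sketch below is a simplified single-threaded simulation under that reading. It compares latency timed from the actual send with latency timed from the intended send on a fixed schedule. All names and numbers are assumptions for illustration.

```java
final class CoordinatedOmission {
    /**
     * Replays requests that are meant to start every intervalMicros but are sent
     * one at a time. Returns {naiveTotal, correctedTotal} in microseconds, where
     * naive times each request from its actual send and corrected times it from
     * its intended send.
     */
    static long[] totals(long intervalMicros, long[] serviceMicros) {
        long clock = 0;
        long naiveTotal = 0;
        long correctedTotal = 0;
        for (int i = 0; i < serviceMicros.length; i++) {
            long intendedStart = i * intervalMicros;
            // The generator cannot send until the previous response has arrived.
            long actualStart = Math.max(clock, intendedStart);
            long end = actualStart + serviceMicros[i];
            naiveTotal += end - actualStart;
            correctedTotal += end - intendedStart;
            clock = end;
        }
        return new long[] {naiveTotal, correctedTotal};
    }
}
```

With a 1000µs interval and service times of 50, 5000, 50 and 50µs, the naive total is 5150µs while the corrected total is 12,200µs: the two requests held up behind the stall look fast to the naive measurement.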


Pro Tip: Don’t deceive yourself


Theory

Queuing Theory

[Figure: Response Time (0.0 to 12.0) plotted against Utilisation (0.0 to 1.0)]


Queuing Theory

Kendall Notation

M/D/1

Queuing Theory

r = s(2 – ρ) / (2(1 – ρ))

r = mean response time

s = service time

ρ = utilisation

Note: ρ = λ * s, where λ = mean arrival rate
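
The formula can be evaluated directly to see how response time grows as utilisation approaches 1. This is a minimal sketch of the M/D/1 mean response time, taking utilisation as arrival rate multiplied by service time. The class and method names are illustrative.

```java
final class MD1 {
    /** Utilisation for a single server: arrival rate multiplied by service time. */
    static double utilisation(double arrivalRate, double serviceTime) {
        return arrivalRate * serviceTime;
    }

    /** Mean response time r = s(2 - rho) / (2(1 - rho)), in the units of serviceTime. */
    static double meanResponseTime(double serviceTime, double utilisation) {
        if (utilisation < 0.0 || utilisation >= 1.0) {
            throw new IllegalArgumentException("utilisation must be in [0, 1)");
        }
        return serviceTime * (2.0 - utilisation) / (2.0 * (1.0 - utilisation));
    }
}
```

With a service time of 1, the mean response time is 1.0 at zero utilisation, 1.5 at 50% and 2.5 at 75%, and it keeps climbing steeply beyond that.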


Pro Tip: Ensure that you have sufficient capacity


Queuing Theory

Little’s Law: L = λ * W

L = mean queue length

λ = mean arrival rate

W = mean time in system

Pro Tip: Bound queues to meet response time SLAs
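
Little's Law gives a simple way to size such a bound. If the system sustains a throughput of λ and requests must spend no more than W in the system, then on average no more than λ * W requests may be present. The sketch below assumes a steady state and treats the SLA as a limit on mean time in system. The names are illustrative.

```java
final class LittlesLaw {
    /** L = lambda * W: mean number of requests in the system. */
    static double meanInSystem(double arrivalRatePerSec, double meanTimeInSystemSec) {
        return arrivalRatePerSec * meanTimeInSystemSec;
    }

    /** Largest queue bound that keeps mean time in system within the SLA at the given throughput. */
    static int maxQueueLength(double throughputPerSec, double responseTimeSlaSec) {
        return (int) Math.floor(throughputPerSec * responseTimeSlaSec);
    }
}
```

For example, at 1000 requests per second and a 0.25 second SLA, a bound above 250 queued requests would allow requests to be accepted that cannot be answered in time.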

Can we go parallel to speed up?

Amdahl’s Law

[Figure: timelines for a Sequential Process (A then B), Parallel Process A (A run in parallel, then B) and Parallel Process B (A, then B run in parallel)]

Universal Scalability Law

C(N) = N / (1 + α(N – 1) + βN(N – 1))

C = capacity or throughput

N = number of processors

α = contention penalty

β = coherence penalty
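
The formula is easy to evaluate for different processor counts. With β = 0 it reduces to Amdahl's Law with a serial fraction of α, which is how the two can be compared. The sketch below is illustrative: the class name and the parameter values in the usage note (α = 0.05, β = 0.0001) are assumptions, not figures from the talk.

```java
final class Scalability {
    /** Universal Scalability Law: C(N) = N / (1 + alpha(N - 1) + beta * N * (N - 1)). */
    static double usl(int n, double alpha, double beta) {
        return n / (1.0 + alpha * (n - 1) + beta * n * (n - 1));
    }

    /** Amdahl's Law expressed as the USL with no coherence penalty. */
    static double amdahl(int n, double serialFraction) {
        return usl(n, serialFraction, 0.0);
    }
}
```

With α = 0.05, Amdahl's Law approaches but never reaches a speedup of 20 (about 19.6 at 1024 processors). Adding β = 0.0001 gives roughly 14 at 64 processors and falls back to under 7 at 1024, so adding processors eventually makes things worse.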

Universal Scalability Law

[Figure: Speedup (0 to 20) against Processors (1 to 1024, doubling at each step), comparing the Amdahl and USL curves]


What about the service time?


Order of Algorithms


Practice


Pitfalls

Modern Processors

P & C States???

Hyperthreading?

SMIs?

Non-Uniform Memory Architecture (NUMA)

[Figure: two sockets, each with cores C 1 to C n, per-core L1 and L2 caches, a shared L3, a memory controller (MC) with attached DRAM, and PCI-e 3 with 40X IO; the sockets are linked by QPI]

Registers/Buffers: <1ns
L1: ~4 cycles, ~1ns
L2: ~12 cycles, ~3ns
L3: ~40 cycles, ~15ns
L3 (dirty hit): ~60 cycles, ~20ns
QPI: ~40ns
DRAM: ~65ns

* Assumption: 3GHz Processor


Virtual Memory Management

Transparent Huge Pages

Page Flushing & IO Scheduling

vm.min_free_kbytes

Swap???


Safepoints in the JVM

Garbage Collection, De-optimisation, Biased Locking, Stack traces, etc.


Virtualization

System Calls

Notification

public class SomethingUseful
{
    // Lots of useful stuff

    // Monitor that a waiting thread blocks on via someObject.wait()
    private final Object someObject = new Object();

    public void handOffSomeWork()
    {
        // prepare for handoff

        // notify() must be called while holding the monitor of the object
        // being notified, otherwise it throws IllegalMonitorStateException
        synchronized (someObject)
        {
            someObject.notify();
        }
    }
}



Law of Leaky Abstractions

“All non-trivial abstractions,

to some extent, are leaky.”

- Joel Spolsky


Law of Leaky Abstractions

“The detail of underlying

complexity cannot be ignored.”


Mechanical Sympathy


Responding in the presence of failure


Algorithms & Techniques


Clean Room Experiments

• sufficient CPUs

• intel_idle.max_cstate=0

• cpufreq

• isolcpus

• numactl, cgroups, affinity

• “Washed” SSDs

• network buffer sizing

• jHiccup

• tune your stack!

• Mechanical Sympathy


Profiling

Pro Tip: Incorporate telemetry and histograms
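
As an illustration of what built-in telemetry can look like, here is a minimal sketch of a latency histogram cheap enough to leave on in production. It uses power-of-two buckets, so it reports only an upper bound for each percentile. That coarse resolution is a simplification chosen for brevity; a real system would more likely use a purpose-built histogram library. All names are illustrative.

```java
import java.util.concurrent.atomic.AtomicLongArray;

final class LatencyHistogram {
    // Bucket 0 holds values <= 0; bucket i (i >= 1) holds values in [2^(i-1), 2^i - 1].
    private final AtomicLongArray buckets = new AtomicLongArray(64);

    static int bucketIndex(long value) {
        return value <= 0 ? 0 : 64 - Long.numberOfLeadingZeros(value);
    }

    /** Records one sample; safe to call from many threads without locking. */
    void record(long value) {
        buckets.incrementAndGet(bucketIndex(value));
    }

    /** Upper bound of the bucket that contains the p-th percentile (nearest rank). */
    long upperBoundAtPercentile(double p) {
        long total = 0;
        for (int i = 0; i < buckets.length(); i++) {
            total += buckets.get(i);
        }
        if (total == 0) {
            return 0;
        }
        long target = (long) Math.ceil(p * total / 100.0);
        long seen = 0;
        for (int i = 0; i < buckets.length(); i++) {
            seen += buckets.get(i);
            if (seen >= target) {
                return i == 0 ? 0 : (1L << i) - 1;
            }
        }
        return Long.MAX_VALUE;
    }
}
```

Recording 98 samples of 50 and 2 samples of 25,000 gives an upper bound of 63 for the 50th percentile and 32,767 for the 99th, enough to spot a long tail at a glance.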

Smart Batching

[Figure: Latency against Load, comparing the Typical and Possible curves]

Smart Batching

[Figure: Producers feeding a Batcher]

<< Amortise Expensive Costs >>

Pro Tip: Amortise the Expensive Costs
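
One way to sketch the batching idea in Java: whenever the consumer is free, it takes everything that has queued up so far, up to a limit, and pays the expensive cost once for the whole batch rather than once per item. It never waits for a batch to fill. This is an illustrative sketch using java.util.concurrent, not the speaker's implementation. The class and method names are assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;

final class SmartBatcher {
    /**
     * Removes up to maxBatchSize items that are already waiting and returns them
     * as one batch. Returns an empty list immediately when nothing is queued.
     */
    static <T> List<T> drainBatch(BlockingQueue<T> queue, int maxBatchSize) {
        List<T> batch = new ArrayList<>(maxBatchSize);
        queue.drainTo(batch, maxBatchSize);
        return batch;
    }
}
```

Under light load each batch holds a single item, so latency stays low. Under heavy load the batches grow and the fixed cost per write, flush or send is shared across more items.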

Applying Backpressure

[Figure: Customers (IO, Network Stack), Gateway Services (Network Stack, Threads) and the Transaction Service (Network Stack, Threads, Storage)]
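
A bounded queue is one simple way to apply backpressure inside a service: when it is full, the caller is told immediately instead of the request being buffered without limit. The sketch below is a minimal illustration built on ArrayBlockingQueue. The class name and the choice to reject rather than block are assumptions.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

final class BoundedInbox<T> {
    private final BlockingQueue<T> queue;

    BoundedInbox(int capacity) {
        this.queue = new ArrayBlockingQueue<>(capacity);
    }

    /**
     * Tries to accept a request without blocking. A false result is the
     * backpressure signal: the caller should slow down, retry later or shed load.
     */
    boolean tryAccept(T request) {
        return queue.offer(request);
    }

    /** Takes the next request for processing, or returns null when idle. */
    T poll() {
        return queue.poll();
    }
}
```

With a capacity of 2, a third request is refused until a worker has taken one off the queue, which keeps the time a request can spend waiting bounded.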


Non-Blocking Design

“Get out of your own way!”

• Don’t hog any resource

• Always try to make progress

• Enables Smart Batching

Pro Tip: Beware of hogging resources in synchronous designs

Lock-Free Concurrent Algorithms

• Agree protocols of interaction

• Don’t get a 3rd party involved, i.e. the OS

• Keep to user-space

• Beat the “notify()” problem
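
As a small illustration of these points, the sketch below hands a single item between threads using only atomic operations on a shared reference, so neither side takes a lock or asks the OS to wake the other. It is a deliberately tiny example (one slot, busy-spinning consumer), not a general-purpose queue. The class and its methods are assumptions for illustration. Thread.onSpinWait() requires Java 9 or later.

```java
import java.util.Objects;
import java.util.concurrent.atomic.AtomicReference;

final class SpinMailbox<T> {
    private final AtomicReference<T> slot = new AtomicReference<>();

    /** Publishes an item if the slot is empty; returns false if it is occupied. */
    boolean tryPut(T item) {
        Objects.requireNonNull(item);
        return slot.compareAndSet(null, item);
    }

    /** Takes the item if one is present, otherwise returns null immediately. */
    T tryTake() {
        return slot.getAndSet(null);
    }

    /** Busy-spins in user space until an item arrives, trading CPU for wake-up latency. */
    T spinTake() {
        T item;
        while ((item = slot.getAndSet(null)) == null) {
            Thread.onSpinWait();
        }
        return item;
    }
}
```

The protocol here is the agreement that null means empty: the producer only ever swaps null for an item, and the consumer only ever swaps an item for null.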


Observable State Machines

Pro Tip: Observable state machines make monitoring easy

Cluster for Response and Resilience

[Figure: a Sequencer with two instances of Service A and a Service N]

Data Structures and O(?) Models

Is there a world beyond maps and lists?


In closing…


The Internet of Things (IoT)

“There will be X connected devices by 2020...”

Where X is 20 to 75 Billion

If you cannot control arrival rates...

...you have to think hard about improving service times!

...and/or you have to think hard about removing all contention!


Questions?

Blog: http://mechanical-sympathy.blogspot.com/

Twitter: @mjpt777

“It does not matter how intelligent you are, if you guess and that guess cannot be backed up by experimental evidence – then it is still a guess.”

- Richard Feynman