

Post on 05-Jul-2020


Page 1: Distributed Adrian Colyer Unevenly - QCon London · Do Less Testing! 12 Relative Improvement Cost Improvement Test Executions 40.58% Test Time 40.31% $1,567,608 Test Result Inspection

Unevenly Distributed

Adrian Colyer

@adriancolyer

Page 2

blog.acolyer.org

350

Foundations

Frontiers

Page 3

5 Reasons to <3 Papers

01 Thinking tools

02 Raise Expectations

03 Applied Lessons

04 The Great Conversation

05 Uneven Distribution

Page 4

Scalability - but at what COST?

Frank McSherry
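COST stands for the Configuration that Outperforms a Single Thread: how many cores a "scalable" system needs before it beats a competent single-threaded program on the same task. A minimal sketch of such a baseline, assuming the graph arrives as an edge list: connected components computed by union-find with path halving and union by size. The function name and input format are illustrative assumptions, not code from the paper.

```python
def connected_components(num_nodes, edges):
    """Single-threaded connected components using union-find."""
    parent = list(range(num_nodes))
    size = [1] * num_nodes

    def find(x):
        # Path halving: point every other node on the path at its grandparent.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru == rv:
            continue
        # Union by size: hang the smaller tree under the larger one.
        if size[ru] < size[rv]:
            ru, rv = rv, ru
        parent[rv] = ru
        size[ru] += size[rv]

    # Label every node with the root of its component.
    return [find(x) for x in range(num_nodes)]


if __name__ == "__main__":
    labels = connected_components(6, [(0, 1), (1, 2), (3, 4)])
    print(len(set(labels)))  # 3 components: {0, 1, 2}, {3, 4}, {5}
```

A tight loop like this touches each edge once with near-constant work per edge, which is why it sets a surprisingly high bar for distributed systems that pay coordination and communication overheads.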

Page 5

Page 6

But you have BIG Data!

Zipf Distribution

“Working sets are Zipf-distributed. We can therefore store in memory all but the very largest datasets.”
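The quoted claim rests on skew. Under a Zipf distribution with exponent s, the item of popularity rank i is accessed with probability proportional to 1/i^s, so a small set of hot items absorbs most accesses. A minimal sketch, assuming s = 1 by default and a hypothetical catalogue of equally sized items, computing the fraction of accesses served by the k hottest items:

```python
def zipf_coverage(n_items, k_hot, s=1.0):
    """Fraction of accesses hitting the k most popular of n Zipf-distributed items."""
    if not 0 <= k_hot <= n_items:
        raise ValueError("k_hot must be between 0 and n_items")
    # Unnormalised access weight of the item at each popularity rank.
    weights = [1.0 / (rank ** s) for rank in range(1, n_items + 1)]
    return sum(weights[:k_hot]) / sum(weights)


if __name__ == "__main__":
    n = 1_000_000
    for pct in (0.1, 1, 10):
        k = int(n * pct / 100)
        print(f"hottest {pct}% of items -> {zipf_coverage(n, k):.0%} of accesses")
```

For a million items with s = 1, the hottest 1% serve roughly two thirds of all accesses, which is the intuition behind keeping the working set in memory.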

Page 7

Musketeer

One for all?

Page 8

Approx Hadoop

32x!
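ApproxHadoop gets its speedups by processing only part of the input (it uses multi-stage sampling and task dropping) and reporting the result together with an error bound. The sketch below swaps in plain single-stage simple random sampling to show the core idea: estimate a sum from a sample and attach a normal-approximation confidence interval. The function name and parameters are hypothetical; this is not ApproxHadoop's estimator.

```python
import math
import random
import statistics


def approximate_sum(values, sample_fraction, z=1.96, seed=None):
    """Estimate sum(values) from a simple random sample.

    Returns (estimate, half_width); with the normal approximation and z = 1.96
    the true sum lies within estimate +/- half_width with roughly 95% confidence.
    """
    population = len(values)
    if population < 2:
        raise ValueError("need at least two values")
    # At least 2 samples (for a variance), at most the whole population.
    n = min(population, max(2, int(population * sample_fraction)))
    sample = random.Random(seed).sample(values, n)

    mean = statistics.fmean(sample)
    variance = statistics.variance(sample)  # n - 1 denominator
    # Finite population correction: the bound shrinks to 0 as n approaches population.
    fpc = 1.0 - n / population
    half_width = z * population * math.sqrt(fpc * variance / n)
    return population * mean, half_width


if __name__ == "__main__":
    rng = random.Random(0)
    data = [rng.randint(0, 1000) for _ in range(100_000)]
    estimate, err = approximate_sum(data, sample_fraction=0.01, seed=42)
    print(f"exact={sum(data)} approx={estimate:.0f} +/- {err:.0f}")
```

Reading 1% of the input does roughly 1% of the work, and the caller decides whether the reported interval is tight enough for the question being asked.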

Page 9

The Scalable Commutativity Rule

Improve your API Design
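The rule says that whenever interface operations commute, they can be implemented in a way that scales. One example the paper discusses is POSIX `open`, which must return the lowest unused file descriptor, so two `open` calls on different cores never commute. A single-threaded sketch contrasting that specification with a relaxed "any unused descriptor" one, assuming a hypothetical per-core allocator; it only illustrates the commutativity property, not real multicore scaling.

```python
class LowestFdTable:
    """POSIX-style spec: open() must return the lowest unused descriptor."""

    def __init__(self):
        self.in_use = set()  # shared by every core

    def open(self, core):
        fd = 0
        while fd in self.in_use:
            fd += 1
        self.in_use.add(fd)
        return fd


class AnyFdTable:
    """Relaxed spec: open() may return any unused descriptor.

    Core c hands out c, c + n, c + 2n, ... from its own counter, so opens on
    different cores read and write disjoint state.
    """

    def __init__(self, num_cores):
        self.num_cores = num_cores
        self.next_slot = [0] * num_cores  # one counter per core

    def open(self, core):
        fd = core + self.next_slot[core] * self.num_cores
        self.next_slot[core] += 1
        return fd


def run(table, order):
    """Issue one open() per core in the given order; return {core: fd}."""
    return {core: table.open(core) for core in order}


if __name__ == "__main__":
    # Results depend on order, so the two opens do not commute.
    print(run(LowestFdTable(), [0, 1]) == run(LowestFdTable(), [1, 0]))  # False
    # Same results in either order: the opens commute.
    print(run(AnyFdTable(2), [0, 1]) == run(AnyFdTable(2), [1, 0]))  # True
```

The API-design lesson is in the specification, not the code: once the interface stops promising "lowest", an implementation is free to avoid the shared state entirely.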

Page 10

Raising Your Expectations

Page 11

TLS

54 CVEs, Jan ’14 - Jan ’15

Error-prone languages

Lack of separation

Ambiguous and untestable spec

Surely we can do better?

Page 12

Do Less Testing!

                          Relative Improvement   Cost Improvement
Test Executions           40.58%
Test Time                 40.31%                 $1,567,608
Test Result Inspection    33.04%                 $61,533
Escaped Defects           0.20%                  ($11,971)
Total Cost Balance                               $1,617,170

Microsoft Windows 8.1
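The Total Cost Balance is the sum of the Cost Improvement column, with the escaped-defect cost in parentheses counted as negative. A quick check of that arithmetic, followed by a deliberately simple, hypothetical expected-cost rule for deciding when to skip a test (it illustrates the trade-off the table quantifies and is not the cost model behind these numbers):

```python
# Cost Improvement column from the Windows 8.1 table; parentheses mark a cost.
cost_improvement = {
    "Test Time": 1_567_608,
    "Test Result Inspection": 61_533,
    "Escaped Defects": -11_971,
}
total_cost_balance = sum(cost_improvement.values())
print(f"${total_cost_balance:,}")  # $1,617,170


def should_skip(run_cost, p_catches_defect, escaped_defect_cost):
    """Hypothetical expected-cost rule: skip a test when running (and inspecting)
    it costs more than the expected cost of the defect it might have caught."""
    return run_cost > p_catches_defect * escaped_defect_cost


print(should_skip(run_cost=5.0, p_catches_defect=0.001, escaped_defect_cost=1_000.0))  # True
```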

Page 13

Page 14

Lessons from the Field

Page 15

A Masterclass in Config Mgt at Facebook

Page 16

Machine Learning Systems: lessons from Google

01 Feature Management

02 Visualisation

03 Relative Metrics

04 Systematic Bias Correction

05 Alerts on Action Thresholds
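Systematic bias correction means adjusting a model whose average prediction has drifted away from the observed outcome rate. The simplest possible form, sketched here as an assumption rather than the approach from the talk, rescales every prediction by one global factor so the mean prediction matches the observed positive rate on recent data; production systems typically fit a richer calibration function. All names are illustrative.

```python
def calibration_factor(predictions, labels):
    """Observed positive rate divided by mean predicted probability."""
    if not predictions or len(predictions) != len(labels):
        raise ValueError("need equal-length, non-empty predictions and labels")
    mean_prediction = sum(predictions) / len(predictions)
    observed_rate = sum(labels) / len(labels)
    return observed_rate / mean_prediction


def correct_bias(predictions, factor):
    """Rescale every prediction by one global factor.

    Clipping at 1.0 keeps outputs valid probabilities, but it means the corrected
    mean can fall slightly short of the observed rate when clipping kicks in.
    """
    return [min(1.0, p * factor) for p in predictions]


if __name__ == "__main__":
    preds = [0.2, 0.4, 0.6, 0.8]  # mean prediction 0.5
    clicks = [0, 0, 0, 1]         # observed rate 0.25: the model over-predicts
    factor = calibration_factor(preds, clicks)
    print(factor, correct_bias(preds, factor))
```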

Page 17

The Great Conversation

And the Syntopicon

Page 18

Cross-Fertilization: Broad Exposure to Problems and their Solutions

Robotics, Security, Distributed Systems, Databases, Machine Learning, Programming Languages

And Many More: Operating Systems, Algorithms, Networking, Optimisation, SW Engineering, ...

Page 19

TPC-C - 1992

Page 20

TPC-C Published Record Holder

Date                 Mar 26th 2013
Database Manager     Oracle 11g r2 Enterprise Edition w. Partitioning
Performance (tpmC)   8,552,523 (8.5M)
Performance (tps)    142,542 (143K)
System Cost          $4,663,073
#Processors          8
#Cores               128
#Threads             1024

Page 21

Coordination Avoidance and I-Confluence Analysis

TPC-C
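I-confluence (invariant confluence) asks whether two replicas that each preserve an invariant locally can always be merged without violating it; if so, the operations can run without coordination. A brute-force sketch over a tiny, hypothetical state space, assuming an integer counter whose merge adds both replicas' changes to their common ancestor: increments preserve a non-negativity invariant under merge, while decrements do not, because two replicas can each take the last unit. This exhaustively checks a bounded space; it is not the paper's analysis.

```python
from itertools import product


def is_i_confluent(invariant, ops, start_states, max_ops=2):
    """Brute-force I-confluence check for a replicated integer counter.

    From each invariant-satisfying start state, two replicas diverge, each
    applying up to `max_ops` operations (every step must keep the invariant
    locally). Merging adds both replicas' deltas to the common ancestor.
    Returns False as soon as some merged state violates the invariant.
    """

    def reachable(base):
        states, frontier = {base}, {base}
        for _ in range(max_ops):
            frontier = {s + op for s in frontier for op in ops if invariant(s + op)}
            states |= frontier
        return states

    for base in start_states:
        if not invariant(base):
            continue
        branches = reachable(base)
        for a, b in product(branches, repeat=2):
            merged = base + (a - base) + (b - base)
            if not invariant(merged):
                return False
    return True


def non_negative(balance):
    return balance >= 0


if __name__ == "__main__":
    print(is_i_confluent(non_negative, ops=[+1], start_states=range(3)))  # True
    print(is_i_confluent(non_negative, ops=[-1], start_states=range(3)))  # False
```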

Page 22

Multi-Partition Transactions at Scale

Page 23

Turning your world Upside Down

Unevenly Distributed

Page 24

Human computers at Dryden by NACA (NASA) - Dryden Flight Research Center Photo Collection

http://www.dfrc.nasa.gov/Gallery/Photo/Places/HTML/E49-54.html. Licensed under Public Domain via Commons - https://commons.wikimedia.org/wiki/File:Human_computers_-_Dryden.jpg#/media/File:Human_computers_-_Dryden.jpg

Page 25

Computing on a Human Scale

Registers & L1-L3 (10ns) → File on desk (10s)

Main memory (70ns) → Office filing cabinet (1:10s)

HDD (10ms) → Trip to the warehouse (116d)

Page 26

Next Generation Hardware: All Change Please

Compute: HTM, Persistent Memory NI, FPGA, GPUs

Memory: NVDIMMs, Persistent Memory

Networking: 100GbE, RDMA

Storage: NVMe, Next-gen NVM

Page 27

Computing on a Human Scale

File on desk: 10s

Office filing cabinet: 1:10s

4x capacity fireproof local filing cabinets: 2-10m

Phone another office (RDMA): 23-40m

Next-gen warehouse: 3h20m

Trip to the warehouse: 116d
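The human-scale figures stretch every latency by a factor of 10^9, so 1 ns becomes 1 s: 70 ns of main-memory access becomes 1 min 10 s, and a 10 ms disk access becomes about 116 days. A small helper reproducing the conversion, using latencies from these slides and the numbers table on slide 28; the function name and formatting rules are illustrative assumptions.

```python
def human_scale(latency_seconds, stretch=1e9):
    """Stretch a latency so that 1 ns becomes 1 s, then format it for humans."""
    scaled = latency_seconds * stretch
    if scaled < 60:
        return f"{scaled:g}s"
    if scaled >= 86_400:
        return f"{scaled / 86_400:.0f}d"  # whole days
    hours, rem = divmod(round(scaled), 3600)
    minutes, seconds = divmod(rem, 60)
    parts = ((hours, "h"), (minutes, "m"), (seconds, "s"))
    return "".join(f"{value}{unit}" for value, unit in parts if value)


if __name__ == "__main__":
    for name, latency in [
        ("Registers & L1-L3", 10e-9),
        ("Main memory", 70e-9),
        ("1-sided RDMA", 1.4e-6),
        ("RPC in data center", 2.4e-6),
        ("NVRAM' NVMe", 12e-6),
        ("HDD", 10e-3),
    ]:
        print(f"{name:20} {human_scale(latency)}")
```

The RDMA and RPC rows land at 23m20s and 40m, the "phone another office" range, and NVMe at 3h20m, the "next-gen warehouse".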

Page 28

The New ~Numbers Everyone Should Know

                             Latency   Bandwidth   Capacity/IOPS
Register                     0.25ns
L1 cache                     1ns
L2 cache                     3ns                   8MB
L3 cache                     11ns                  45MB
DRAM                         62ns      120GB/s     6TB - 4 socket
NVRAM’ DIMM                  620ns     60GB/s      24TB - 4 socket
1-sided RDMA in Data Center  1.4us     100GbE      ~700K IOPS
RPC in Data Center           2.4us     100GbE      ~400K IOPS
NVRAM’ NVMe                  12us      6GB/s       16TB/disk, ~2M/600K
NVRAM’ NVMf                  90us      5GB/s       16TB/disk, ~700/600K

Page 29

Low Latency - RAMCloud

Reads: 5μs

Writes: 13.5μs

Transactions: 20μs

5-object Txns: 27μs

TPC-C (10 nodes): 35K tps

Page 30

No Compromises - FaRM

TPC-C (90 nodes): 4.5M tps, 99%ile 1.9ms

KV (per node): 6.3M qps, 41μs at peak throughput

Page 31

No Compromises

“This paper demonstrates that new software in modern data centers can eliminate the need to compromise. It describes the transaction, replication, and recovery protocols in FaRM, a main memory distributed computing platform. FaRM provides distributed ACID transactions with strict serializability, high availability, high throughput and low latency. These protocols were designed from first principles to leverage two hardware trends appearing in data centers: fast commodity networks with RDMA and an inexpensive approach to providing non-volatile DRAM.”
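FaRM commits transactions with optimistic concurrency control: reads record object versions, writes are buffered, and commit locks the write set, validates that the read set is unchanged, then installs the new versions. The sketch keeps only that single-machine skeleton, assuming in-process objects and Python locks; it omits everything that makes FaRM fast and fault tolerant (one-sided RDMA reads, replication to backups, non-volatile logging, recovery). `Store` and `Transaction` are hypothetical names.

```python
import threading


class Store:
    """In-memory objects: key -> (version, value), plus one lock per object."""

    def __init__(self):
        self.data = {}
        self.locks = {}

    def put_initial(self, key, value):
        self.data[key] = (0, value)
        self.locks[key] = threading.Lock()


class Transaction:
    def __init__(self, store):
        self.store = store
        self.read_versions = {}  # key -> version observed at first read
        self.writes = {}         # key -> new value, buffered until commit

    def read(self, key):
        if key in self.writes:
            return self.writes[key]
        version, value = self.store.data[key]
        self.read_versions.setdefault(key, version)
        return value

    def write(self, key, value):
        self.read(key)  # record the version being overwritten
        self.writes[key] = value

    def commit(self):
        store, locked = self.store, []
        try:
            # Lock: take write-set locks in a fixed order; abort if busy or stale.
            for key in sorted(self.writes):
                if not store.locks[key].acquire(blocking=False):
                    return False
                locked.append(key)
                if store.data[key][0] != self.read_versions[key]:
                    return False
            # Validate: read-only objects must be unlocked and unchanged.
            for key, version in self.read_versions.items():
                if key not in self.writes and (
                    store.locks[key].locked() or store.data[key][0] != version
                ):
                    return False
            # Install: publish new values with bumped versions.
            for key, value in self.writes.items():
                store.data[key] = (self.read_versions[key] + 1, value)
            return True
        finally:
            for key in locked:
                store.locks[key].release()


if __name__ == "__main__":
    store = Store()
    store.put_initial("a", 100)
    store.put_initial("b", 0)
    t1, t2 = Transaction(store), Transaction(store)
    t1.write("a", t1.read("a") - 10)
    t1.write("b", t1.read("b") + 10)
    t2.write("a", t2.read("a") - 50)
    print(t1.commit(), t2.commit())  # True False: t2 read a stale version of "a"
```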

Page 32

DrTM: The Doctor will see you now

5.5M tps on TPC-C, 6-node cluster.
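To put the headline TPC-C numbers from these slides side by side, divide each by its node count. This is raw arithmetic only: tpmC counts new-order transactions per minute, so the 2013 record is divided by 60, and the record holder is treated as a single 8-processor system (an assumption). The systems also differ in hardware and in how they report throughput, so read the per-node figures as rough orders of magnitude.

```python
# Headline TPC-C throughput figures as quoted on the slides: (tps, nodes).
results = {
    "TPC-C record, Oracle 11g (2013)": (8_552_523 / 60, 1),  # tpmC -> per second
    "RAMCloud": (35_000, 10),
    "FaRM": (4_500_000, 90),
    "DrTM": (5_500_000, 6),
}

for name, (tps, nodes) in results.items():
    print(f"{name:32} {tps:>12,.0f} tps  {tps / nodes:>10,.0f} tps/node")
```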

Page 33

Some things Change, Some stay the Same

Page 34

A Brave New World

Fast RDMA networks + Ample Persistent Memory + Hardware Transactions + Enhanced HW Cache Management + Super-fast Storage + On-board FPGAs + GPUs + … = ???

Page 35

5 Reasons to <3 Papers

01 Thinking tools

02 Raise Expectations

03 Applied Lessons

04 The Great Conversation

05 Uneven Distribution

Page 36

01 A new paper every weekday: published at http://blog.acolyer.org.

02 Delivered straight to your inbox: if you prefer an email-based subscription to read at your leisure.

03 Announced on Twitter: I’m @adriancolyer.

04 Go to a Papers We Love meetup: a repository of academic computer science papers and a community who loves reading them.

05 Share what you learn: anyone can take part in the great conversation.

Page 37

THANK YOU!

@adriancolyer