qcon nyc: distributed systems in practice, in theory

111
Aysylu Greenberg June 14, 2016 Distributed Systems in Practice, in Theory

Upload: aysylu-greenberg

Post on 19-Jan-2017

1.066 views

Category:

Software


3 download

TRANSCRIPT

Page 1: QCon NYC: Distributed systems in practice, in theory

Aysylu GreenbergJune 14, 2016

Distributed Systems in Practice,in Theory

Page 2: QCon NYC: Distributed systems in practice, in theory

How I got into reading papers as a

practitioner in industry

Page 3: QCon NYC: Distributed systems in practice, in theory

Computer Science ResearchInDistributed Systems Industry

Page 4: QCon NYC: Distributed systems in practice, in theory

Operating systems research

Page 5: QCon NYC: Distributed systems in practice, in theory

Operating systems research

Page 6: QCon NYC: Distributed systems in practice, in theory

Operating systems research

Concurrency

Page 7: QCon NYC: Distributed systems in practice, in theory

Operating systems research

Concurrency

Concurrency primitives: mutex & semaphore

Page 8: QCon NYC: Distributed systems in practice, in theory

Operating systems research

Concurrency

Concurrency primitives: mutex & semaphore

Processes execute at different speeds

Page 9: QCon NYC: Distributed systems in practice, in theory

Time in distributed systems

https://www.flickr.com/photos/national_archives_of_norway/6263353228

Page 10: QCon NYC: Distributed systems in practice, in theory

Time in distributed systems

Page 11: QCon NYC: Distributed systems in practice, in theory

Time in distributed systems

Pipelining

Page 12: QCon NYC: Distributed systems in practice, in theory

1980

Page 13: QCon NYC: Distributed systems in practice, in theory

1980

Page 14: QCon NYC: Distributed systems in practice, in theory

Internet

1980

Page 15: QCon NYC: Distributed systems in practice, in theory

Internet

Distributed consensus

1980

Page 16: QCon NYC: Distributed systems in practice, in theory

Internet

Distributed consensus

1980

Page 17: QCon NYC: Distributed systems in practice, in theory

Internet

Distributed consensus

1980

Page 18: QCon NYC: Distributed systems in practice, in theory

Paxos

Internet

Distributed consensus

1980

Page 19: QCon NYC: Distributed systems in practice, in theory

Reconsider large systems

Page 20: QCon NYC: Distributed systems in practice, in theory

Reconsider large systems

Shared infrastructure

...

Page 21: QCon NYC: Distributed systems in practice, in theory

CS Research is Timeless

Inform decisions

Mitigate technical risk

Page 22: QCon NYC: Distributed systems in practice, in theory

* 22

Aysylu Greenberg

@aysylu22

Page 23: QCon NYC: Distributed systems in practice, in theory

Papers We Love NYC

Page 24: QCon NYC: Distributed systems in practice, in theory

Papers We Love SF

Page 25: QCon NYC: Distributed systems in practice, in theory

* 25

Aysylu Greenberg

@aysylu22

Page 26: QCon NYC: Distributed systems in practice, in theory

Today

● Staged Event-Driven Architecture

Page 27: QCon NYC: Distributed systems in practice, in theory

Today

● Staged Event-Driven Architecture● Leases

Page 28: QCon NYC: Distributed systems in practice, in theory

Today

● Staged Event-Driven Architecture● Leases● Inaccurate Computations

Page 29: QCon NYC: Distributed systems in practice, in theory

Staged Event Driven

Architecture&

Deep Pipelines

2001

Page 30: QCon NYC: Distributed systems in practice, in theory

Hardware to Data Pipelines

Page 31: QCon NYC: Distributed systems in practice, in theory

Hardware to Data Pipelines

https://en.wikipedia.org/wiki/Graphics_pipeline

Page 32: QCon NYC: Distributed systems in practice, in theory
Page 33: QCon NYC: Distributed systems in practice, in theory

Staged Event Driven Architecture

Page 34: QCon NYC: Distributed systems in practice, in theory

Staged Event Driven Architecture

+ -

Page 35: QCon NYC: Distributed systems in practice, in theory

Single-machine pipeline

generalizes to distributed pipelines

Staged Event Driven Architecture

Page 36: QCon NYC: Distributed systems in practice, in theory

Search Indexing Pipelines

Page 37: QCon NYC: Distributed systems in practice, in theory

Search Indexing Pipelines

Page 38: QCon NYC: Distributed systems in practice, in theory

Search Indexing Pipelines

Page 39: QCon NYC: Distributed systems in practice, in theory

Search Indexing Pipelines

Page 40: QCon NYC: Distributed systems in practice, in theory

Search Indexing Pipelines

Page 41: QCon NYC: Distributed systems in practice, in theory

Search Indexing Pipelines

Page 42: QCon NYC: Distributed systems in practice, in theory

Search Indexing Pipelines

Page 43: QCon NYC: Distributed systems in practice, in theory

Search Indexing Pipelines

Page 44: QCon NYC: Distributed systems in practice, in theory

Search Indexing Pipelines

+ -

Page 45: QCon NYC: Distributed systems in practice, in theory

Leasesas Heart Beat in

Distributed Systems

1989

Page 46: QCon NYC: Distributed systems in practice, in theory
Page 47: QCon NYC: Distributed systems in practice, in theory

Leases

● Distributed locking

Page 48: QCon NYC: Distributed systems in practice, in theory

Leases

● Distributed locking● Lease term tradeoffs

○ short

Page 49: QCon NYC: Distributed systems in practice, in theory

Leases

● Distributed locking● Lease term tradeoffs

○ short vs long

Page 50: QCon NYC: Distributed systems in practice, in theory

Leases

● Distributed locking● Lease term tradeoffs

○ short vs long● Use of leases in modern applications

○ Leader election TTL (in etcd)

Page 51: QCon NYC: Distributed systems in practice, in theory

Leases

● Distributed locking● Lease term tradeoffs

○ short vs long● Use of leases in modern applications

○ Leader election TTL (in etcd)○ Liveness detection

Page 52: QCon NYC: Distributed systems in practice, in theory
Page 53: QCon NYC: Distributed systems in practice, in theory

Leases in Build System:Success Scenario

Page 54: QCon NYC: Distributed systems in practice, in theory

Build my project

Build System

Page 55: QCon NYC: Distributed systems in practice, in theory

Build my project

Build System

OK

Page 56: QCon NYC: Distributed systems in practice, in theory

Build my project

Build System

OK

Waiting for the results

Page 57: QCon NYC: Distributed systems in practice, in theory

Build my project

Build System

OK

Waiting for the results

Build is in progress

Page 58: QCon NYC: Distributed systems in practice, in theory

Build my project

Build System

OK

Waiting for the results

Build is in progress

Waiting for the results

Page 59: QCon NYC: Distributed systems in practice, in theory

Build my project

Build System

OK

Waiting for the results

Build is in progress

Waiting for the results

Build is finished

Page 60: QCon NYC: Distributed systems in practice, in theory

Leases in Build System:Failure Scenario

Page 61: QCon NYC: Distributed systems in practice, in theory

Leases in Build System

Page 62: QCon NYC: Distributed systems in practice, in theory

Leases in Build System

Page 63: QCon NYC: Distributed systems in practice, in theory

Leases in Build System

Page 64: QCon NYC: Distributed systems in practice, in theory

Leases in Build System

Page 65: QCon NYC: Distributed systems in practice, in theory

Leases in Build System

Page 66: QCon NYC: Distributed systems in practice, in theory

Leases in Build System

Page 67: QCon NYC: Distributed systems in practice, in theory

Using etcd leases for heartbeat$ curl http://server.com/v2/keys/foo -XPUT -d\

value=bar -d ttl=300

Page 68: QCon NYC: Distributed systems in practice, in theory

{ "action": "set", "node": { "createdIndex": 2, "expiration":"2016-06-14T16:15:00", "key": "/foo", "modifiedIndex": 2, "ttl": 300, "value": "bar" }}

Page 69: QCon NYC: Distributed systems in practice, in theory

Using etcd leases for heartbeat$ curl http://server.com/v2/keys/foo -XPUT -d \

value=bar -d ttl=300

… 3 minutes later...

Page 70: QCon NYC: Distributed systems in practice, in theory

Using etcd leases for heartbeat$ curl http://server.com/v2/keys/foo -XPUT -d \

value=bar -d ttl=300

$ curl \

http://server.com/v2/keys/foo?prevValue=bar \

-XPUT -d ttl=300 -d refresh=true -d \

prevExist=true

Page 71: QCon NYC: Distributed systems in practice, in theory

{ "action": "update", "node": { "createdIndex": 2, "expiration":"2016-06-14T16:18:00", "key": "/foo", "modifiedIndex": 3, "ttl": 300, "value": "bar" } "prevNode": {...}}

Page 72: QCon NYC: Distributed systems in practice, in theory

{ "action": "update", "node": { "createdIndex": 2, "expiration":"2016-06-14T16:18:00", "key": "/foo", "modifiedIndex": 3, "ttl": 300, "value": "bar" } "prevNode": {...}}

"prevNode": { "createdIndex": 2, "expiration":"2016-06-14T16:15:00", "key": "/foo", "modifiedIndex": 2, "ttl": 120, "value": "bar"}

Page 73: QCon NYC: Distributed systems in practice, in theory

Leases for heartbeat:How long should the lease term be?

Page 74: QCon NYC: Distributed systems in practice, in theory

Inaccurate Computations&Serving Search Results

Page 75: QCon NYC: Distributed systems in practice, in theory

From Accurate to "Good Enough"

Page 76: QCon NYC: Distributed systems in practice, in theory

[Trade off] Inaccuracy for Performance

Page 77: QCon NYC: Distributed systems in practice, in theory
Page 78: QCon NYC: Distributed systems in practice, in theory
Page 79: QCon NYC: Distributed systems in practice, in theory
Page 80: QCon NYC: Distributed systems in practice, in theory

[Trade off] Inaccuracy for Resilience

Page 81: QCon NYC: Distributed systems in practice, in theory
Page 82: QCon NYC: Distributed systems in practice, in theory
Page 83: QCon NYC: Distributed systems in practice, in theory

Reduce

Map

Input

Map

Input

Map

Input

Page 84: QCon NYC: Distributed systems in practice, in theory

Inaccuracy for Resilience

1. Task decomposition

Page 85: QCon NYC: Distributed systems in practice, in theory
Page 86: QCon NYC: Distributed systems in practice, in theory

Inaccuracy for Resilience

1. Task decomposition2. Baseline for correctness

Page 87: QCon NYC: Distributed systems in practice, in theory
Page 88: QCon NYC: Distributed systems in practice, in theory

Inaccuracy for Resilience

1. Task decomposition2. Baseline for correctness3. Criticality Testing

Page 89: QCon NYC: Distributed systems in practice, in theory
Page 90: QCon NYC: Distributed systems in practice, in theory
Page 91: QCon NYC: Distributed systems in practice, in theory
Page 92: QCon NYC: Distributed systems in practice, in theory
Page 93: QCon NYC: Distributed systems in practice, in theory
Page 94: QCon NYC: Distributed systems in practice, in theory

Inaccuracy for Resilience

1. Task decomposition2. Baseline for correctness3. Criticality Testing4. Distortion and timing models

Page 95: QCon NYC: Distributed systems in practice, in theory

Distortion Model

Page 96: QCon NYC: Distributed systems in practice, in theory

Timing Model

Page 97: QCon NYC: Distributed systems in practice, in theory

[In production]Inaccuracy for Performance & Resilience

Page 98: QCon NYC: Distributed systems in practice, in theory

Jeff Dean "Building Software Systems at Google and Lessons Learned", Stanford, 2010

Page 99: QCon NYC: Distributed systems in practice, in theory
Page 100: QCon NYC: Distributed systems in practice, in theory
Page 101: QCon NYC: Distributed systems in practice, in theory

[Designing with]Inaccuracy for Performance & Resilience

Page 102: QCon NYC: Distributed systems in practice, in theory

[Designing with]Inaccuracy for Performance & Resilience

simplified implementation

focus on observabilityapplicable to some problem domains

Page 103: QCon NYC: Distributed systems in practice, in theory

[Designing with]Inaccuracy for Performance & Resilience

fuzz testing

generative testing

simplified implementation

fault injection testing

focus on observabilityapplicable to some problem domains

Page 104: QCon NYC: Distributed systems in practice, in theory

References● T. Wurthinger, C. Wimmer et al. "One VM to Rule Them

All"● M. Rinard "Probabilistic Accuracy Bounds for Fault-

Tolerant Computations that Discard Tasks"● F. Corbato, M. Daggett, R. Daley "An Experimental Time-

Sharing System"● E. Dijkstra "Cooperating Sequential Processes"● L. Lamport "Time, Clocks, and the Ordering of Events in a

Distributed System"● http://blinkdb.org/

Page 105: QCon NYC: Distributed systems in practice, in theory

References● B. Oki, B. Liskov "Viewstamped Replication: A New Primary Copy

Method to Support Highly-Available Distributed Systems"● L. Lamport "The Part-Time Parliament"● M. Welsh, D. Culler, E. Brewer "SEDA: An Architecture for Well-

Conditioned, Scalable Internet Services"● C. Gray, D. Cheriton "Leases: An Efficient Fault-Tolerant

Mechanism for Distributed File Cache Consistency"● S. Agarwal, B. Mozafari et al. "BlinkDB: Queries with Bounded

Errors and Bounded Response Times on Very Large Data"

Page 106: QCon NYC: Distributed systems in practice, in theory

GratitudeInes SombraDavid GreenbergKaran ParikhMatt WelshErran Berger

Page 107: QCon NYC: Distributed systems in practice, in theory

Robust & scalable pipelines

Page 108: QCon NYC: Distributed systems in practice, in theory

Robust & scalable pipelinesLeases for sharing &

heartbeat

Page 109: QCon NYC: Distributed systems in practice, in theory

Robust & scalable pipelinesLeases for sharing &

heartbeatInaccuracy for resilience &

performance

Page 110: QCon NYC: Distributed systems in practice, in theory

Robust & scalable pipelinesLeases for sharing &

heartbeatInaccuracy for resilience &

performance

CS research is timeless:use it to mitigate risk

Page 111: QCon NYC: Distributed systems in practice, in theory

Aysylu GreenbergJune 14, 2016

Distributed Systems in Practice,in Theory

@aysylu22