distributed in space and time

40
Dr. Roland Kuhn Akka Tech Lead @rolandkuhn Distributed in Space and Time

Upload: roland-kuhn

Post on 02-Jul-2015

797 views

Category:

Software


1 download

DESCRIPTION

How do I create an application that is always available, even in the face of catastrophic hardware failures? How do I make it cope with large fluctuations in the number of active users or their behavior? The answers to both questions have one thing in common: the need to spread the application out, to distribute it. In this presentation we take a deep dive into the architecture of reactive applications (see http://reactivemanifesto.org/), starting from cheap and efficient best-effort messaging and building delivery guarantees and persistence on top—with examples using Akka. Understanding these mechanisms and how they relate to each other will allow you to build systems that strike the right balance between partition tolerance, consistency, availability, implementation complexity and operational cost.

TRANSCRIPT

Page 1: Distributed in Space and Time

Dr. Roland Kuhn Akka Tech Lead @rolandkuhn

Distributed in Space and Time

Page 2: Distributed in Space and Time

The Future is Bright

2

source: http://nature.desktopnexus.com/wallpaper/58502/

Page 3: Distributed in Space and Time

The Future is Bright

• Computing Resources: Effectively Unbounded • 18-core CPU packages, 1024 around the corner • on-demand provisioning in the cloud

• Demand for Web-Scale Apps: Ravenous • everyone has the tools • successful ideas are bought for Giga-$

3

Page 4: Distributed in Space and Time

How do I make a Large-Scale App?

• figure out a service to provide to everyone • this will require resources accordingly: • storage • computation • communication

• scalability implies no shared contention points • need to be elastic ➟ use the cloud

4

Page 5: Distributed in Space and Time

How do I make an Always-Available App?

• must not have single point of failure • all functions and resources are replicated • protect against data loss • protect against service downtime • regional content delivery

• need to distribute ➟ use the cloud

5

Page 6: Distributed in Space and Time

How do I make a Persistent App?

• runtime replication usually not sufficient • system of record needed

• one central database is not tenable • persist “locally” and replicate changes

6

Page 7: Distributed in Space and Time

How do I make a Persistent App?

7

Page 8: Distributed in Space and Time

How do I make a Persistent App?

8

Page 9: Distributed in Space and Time

How do I make a Persistent App?

• runtime replication usually not sufficient • system of record needed

• one central database is not tenable • persist “locally” and replicate changes • temporally distributed change records • spatially distributed change log • granularity can be one Aggregate Root (Event Sourcing) • extract business intelligence from history

• scale & distribute persistence ➟ in the cloud

9

Page 10: Distributed in Space and Time

As Noted Earlier: The Future is Bright

10

source: http://nature.desktopnexus.com/wallpaper/58502/

Page 11: Distributed in Space and Time

… or is it?

11

Photograph: Patrick Cromerford, 2011

Page 12: Distributed in Space and Time

Distribution: a Double-Edged Sword

• did that message make it over the network? • is that other machine running? • are our data centers in sync? • did we lose messages in that hardware crash?

12

Page 13: Distributed in Space and Time

The Essence of the Problem

A distributed system is one whose parts can fail independently of each other.

13

Page 14: Distributed in Space and Time

Fighting Back!

14

source: www.hear2heal.com (Kokopelli Rain Dance)

Page 15: Distributed in Space and Time

Transactional Storage Consensus

Message Delivery …

Page 16: Distributed in Space and Time

Transactional Storage

• Serializability simplifies reasoning about behavior • … but is at odds with distribution (CAP theorem) • Eventual Consistency is often sufficient

(Peter Bailis: Probabilistically Bounded Staleness) • Scalability can be regained by partitioning

(Pat Helland: Life beyond Distributed Transactions) • consistency has a cost • you need to choose how much is necessary

16

Page 17: Distributed in Space and Time

PAXOS and Friends

• problem: distributed consensus • hard solutions favor Consistency • 2PC, PAXOS, RAFT • if a node is unreachable then consensus cannot be reached

• soft solutions favor Availability • quorum, allow nodes to drop out • partitions can lead to more than one active set

• consistency always has a cost • you need to choose which one is appropriate

17

Page 18: Distributed in Space and Time

Fighting Message Loss

• resending is the only cure • sender stores the message and retries • recipient must send acknowledgement eventually

• fighting unreliable medium is sender’s obligation • sender’s buffer must be persistent • resending must begin anew after a crash

18

Page 19: Distributed in Space and Time

At-Least-Once Delivery with Akka

19

class ALOD(other: ActorPath) extends PersistentActor with AtLeastOnceDelivery { val IDs = Iterator from 1

def receiveCommand = { case Command(x) => persist(ImportantResult(x)) { ir => sender() ! Done(x) deliver(other, FollowUpCommand(x, IDs.next(), _)) } case rc: ReceiptConfirmed => persist(rc) { rc => confirmDelivery(rc.deliveryId) } }

def receiveRecover = { case ImportantResult(x) => deliver(other, FollowUpCommand(x, IDs.next(), _)) case ReceiptConfirmed(_, id) => confirmDelivery(id) }}

Page 20: Distributed in Space and Time

At-Least-Once Delivery with Akka

20

class ALOD(other: ActorPath) extends PersistentActor with AtLeastOnceDelivery { val IDs = Iterator from 1

def receiveCommand = { case Command(x) => persist(ImportantResult(x)) { ir => sender() ! Done(x) deliver(other, FollowUpCommand(x, IDs.next(), _)) } case rc: ReceiptConfirmed => persist(rc) { rc => confirmDelivery(rc.deliveryId) } }

def receiveRecover = { case ImportantResult(x) => deliver(other, FollowUpCommand(x, IDs.next(), _)) case ReceiptConfirmed(_, id) => confirmDelivery(id) }}

Page 21: Distributed in Space and Time

At-Least-Once Delivery with Akka

21

class ALOD(other: ActorPath) extends PersistentActor with AtLeastOnceDelivery { val IDs = Iterator from 1

def receiveCommand = { case Command(x) => persist(ImportantResult(x)) { ir => sender() ! Done(x) deliver(other, FollowUpCommand(x, IDs.next(), _)) } case rc: ReceiptConfirmed => persist(rc) { rc => confirmDelivery(rc.deliveryId) } }

def receiveRecover = { case ImportantResult(x) => deliver(other, FollowUpCommand(x, IDs.next(), _)) case ReceiptConfirmed(_, id) => confirmDelivery(id) }}

Page 22: Distributed in Space and Time

At-Least-Once Delivery with Akka

22

class ALOD(other: ActorPath) extends PersistentActor with AtLeastOnceDelivery { val IDs = Iterator from 1

def receiveCommand = { case Command(x) => persist(ImportantResult(x)) { ir => sender() ! Done(x) deliver(other, FollowUpCommand(x, IDs.next(), _)) } case rc: ReceiptConfirmed => persist(rc) { rc => confirmDelivery(rc.deliveryId) } }

def receiveRecover = { case ImportantResult(x) => deliver(other, FollowUpCommand(x, IDs.next(), _)) case ReceiptConfirmed(_, id) => confirmDelivery(id) }}

Page 23: Distributed in Space and Time

At-Least-Once Delivery with Akka

23

class ALOD(other: ActorPath) extends PersistentActor with AtLeastOnceDelivery { val IDs = Iterator from 1

def receiveCommand = { case Command(x) => persist(ImportantResult(x)) { ir => sender() ! Done(x) deliver(other, FollowUpCommand(x, IDs.next(), _)) } case rc: ReceiptConfirmed => persist(rc) { rc => confirmDelivery(rc.deliveryId) } }

def receiveRecover = { case ImportantResult(x) => deliver(other, FollowUpCommand(x, IDs.next(), _)) case ReceiptConfirmed(_, id) => confirmDelivery(id) }}

Page 24: Distributed in Space and Time

Mysterious Double-Bookings

• acknowledgements can be lost • sender will faithfully duplicate the message • therefore named at-least-once • recipient is responsible for dealing with this

24

Page 25: Distributed in Space and Time

Exactly-Once Delivery

• processing occurs only once for each message • all middleware does it* • EIP: Guaranteed Delivery just means persistence • JMS: configure session for AUTO_ACKNOWLEDGE • Google’s MillWheel

25

Page 26: Distributed in Space and Time

Exactly-Once Delivery with Akka

26

class PersistThenAct extends PersistentActor { var nextId = 1 def receiveCommand = { case FollowUpCommand(x, id, deliveryId) => val rc = ReceiptConfirmed(id, deliveryId) if (id == nextId) { persist(rc) { rc => sender() ! rc doTheAction() nextId += 1 } } else if (id < nextId) sender() ! rc // otherwise an older command was lost, so wait for the resend } def receiveRecover = { case ReceiptConfirmed(id, _) => nextId = id + 1 } private def doTheAction(): Unit = () // actually do the action}

Page 27: Distributed in Space and Time

Exactly-Once Delivery with Akka

27

class PersistThenAct extends PersistentActor { var nextId = 1 def receiveCommand = { case FollowUpCommand(x, id, deliveryId) => val rc = ReceiptConfirmed(id, deliveryId) if (id == nextId) { persist(rc) { rc => sender() ! rc doTheAction() nextId += 1 } } else if (id < nextId) sender() ! rc // otherwise an older command was lost, so wait for the resend } def receiveRecover = { case ReceiptConfirmed(id, _) => nextId = id + 1 } private def doTheAction(): Unit = () // actually do the action}

Page 28: Distributed in Space and Time

Exactly-Once Delivery with Akka

28

class PersistThenAct extends PersistentActor { var nextId = 1 def receiveCommand = { case FollowUpCommand(x, id, deliveryId) => val rc = ReceiptConfirmed(id, deliveryId) if (id == nextId) { persist(rc) { rc => sender() ! rc doTheAction() nextId += 1 } } else if (id < nextId) sender() ! rc // otherwise an older command was lost, so wait for the resend } def receiveRecover = { case ReceiptConfirmed(id, _) => nextId = id + 1 } private def doTheAction(): Unit = () // actually do the action}

Page 29: Distributed in Space and Time

Exactly-Once Delivery with Akka

29

class PersistThenAct extends PersistentActor { var nextId = 1 def receiveCommand = { case FollowUpCommand(x, id, deliveryId) => val rc = ReceiptConfirmed(id, deliveryId) if (id == nextId) { persist(rc) { rc => sender() ! rc doTheAction() nextId += 1 } } else if (id < nextId) sender() ! rc // otherwise an older command was lost, so wait for the resend } def receiveRecover = { case ReceiptConfirmed(id, _) => nextId = id + 1 } private def doTheAction(): Unit = () // actually do the action}

Page 30: Distributed in Space and Time

Exactly-Once Delivery with Akka

30

class PersistThenAct extends PersistentActor { var nextId = 1 def receiveCommand = { case FollowUpCommand(x, id, deliveryId) => val rc = ReceiptConfirmed(id, deliveryId) if (id == nextId) { persist(rc) { rc => sender() ! rc doTheAction() nextId += 1 } } else if (id < nextId) sender() ! rc // otherwise an older command was lost, so wait for the resend } def receiveRecover = { case ReceiptConfirmed(id, _) => nextId = id + 1 } private def doTheAction(): Unit = () // actually do the action}

Page 31: Distributed in Space and Time

Exactly-Once Delivery with Akka

31

class PersistThenAct extends PersistentActor { var nextId = 1 def receiveCommand = { case FollowUpCommand(x, id, deliveryId) => val rc = ReceiptConfirmed(id, deliveryId) if (id == nextId) { doTheAction() persist(rc) { rc => sender() ! rc nextId += 1 } } else if (id < nextId) sender() ! rc // otherwise an older command was lost, so wait for the resend } def receiveRecover = { case ReceiptConfirmed(id, _) => nextId = id + 1 } private def doTheAction(): Unit = () // actually do the action}

Page 32: Distributed in Space and Time

The Hard Truth

CPU can stop between any two instructions, therefore exactly-once semantics cannot be Guaranteed*.

*with a capital G, i.e. with 100% absolute coverage

32

Page 33: Distributed in Space and Time

Mitigating that Hard Truth

• probability of loss or duplication can be small • many systems are fine with almost-exactly-once • still need to decide on which side to err • persist-then-act ➟ almost-exactly-but-at-most-once • act-then-persist ➟ almost-exactly-but-at-least-once

33

Page 34: Distributed in Space and Time

We Can Do Better!

• an operation is idempotent if multiple application produces the same result as single application • (side-)effects must also be idempotent • unique transaction identifiers etc.

• this achieves effectively-exactly-once • can save persistence round-trip latency by

optimistic execution

34

Page 35: Distributed in Space and Time

Effectively-Exactly-Once with Akka

35

class Idempotent(other: ActorRef) extends PersistentActor { var nextId = 1 def receiveCommand = { case FollowUpCommand(x, id, deliveryId) => val rc = ReceiptConfirmed(id, deliveryId) if (id == nextId) { // optimistically execute if (isValid(x)) { updateState(x) other ! Command(x) // assuming idempotent receiver } else sender() ! Invalid(x) nextId += 1 // then take care of reliable delivery handling persistAsync(rc) { rc => sender() ! rc } } else if (id < nextId) sender() ! rc } ...}

Page 36: Distributed in Space and Time

The Future is Bright Again!

36

source: http://freewallpapersbackgrounds.com/

Page 37: Distributed in Space and Time

To Recapitulate:

37

Problem Choices

Transactional Storagemonolithic or distributed/partitioned, serializable or eventually consistent

Cluster Managementstrong consensus or convergent gossip, consistency or availability

Message Deliveryat-most-once, at-least-once, almost-exactly-once, effectively-exactly-once

… …

Page 38: Distributed in Space and Time

Conclusion: Consistency à la Carte

• not all parts of a system have the same needs • expend the effort only where necessary • idempotency does not come for free • confirmation handling cannot always be abstracted away

• incur the runtime overhead only where needed • persistence round-trips take time • consensus protocols take more time

• include required semantics in system design

38

Page 39: Distributed in Space and Time

Remember that the trade-off is always between

consistency and availability.

And latency.

Page 40: Distributed in Space and Time

©Typesafe 2014 – All Rights Reserved