dynamo: not just for datastores

56
Dynamo: not just datastores Susan Potter September 2011

Upload: susan-potter

Post on 06-May-2015

2.739 views

Category:

Technology


1 download

DESCRIPTION

Find out how to build decentralized, fault-tolerant, stateful application services using core concepts and techniques from the Amazon Dynamo paper using riak_core as a toolkit.

TRANSCRIPT

Page 1: Dynamo: Not Just For Datastores

Dynamo: not just datastores

Susan Potter

September 2011

Page 2: Dynamo: Not Just For Datastores

Overview: In a nutshell

Figure: "a highly available key-value storage system"

Page 3: Dynamo: Not Just For Datastores

Overview: Not for all apps

Page 4: Dynamo: Not Just For Datastores

Overview: Agenda

Distribution & consistency

t - Fault tolerance

N, R, W

Dynamo techniques

Riak abstractions

Building an application

Considerations

Oh, the possibilities!

Page 5: Dynamo: Not Just For Datastores

Distributed models & consistency

Are you on ACID?

Are your apps ACID?

Few apps requirestrong consistency

Embrace BASEfor high availability

Page 6: Dynamo: Not Just For Datastores

Distributed models & consistency

Are you on ACID?

Are your apps ACID?

Few apps requirestrong consistency

Embrace BASEfor high availability

Page 7: Dynamo: Not Just For Datastores

Distributed models & consistency

Are you on ACID?

Are your apps ACID?

Few apps requirestrong consistency

Embrace BASEfor high availability

Page 8: Dynamo: Not Just For Datastores

Distributed models & consistency

Are you on ACID?

Are your apps ACID?

Few apps requirestrong consistency

Embrace BASEfor high availability

Page 9: Dynamo: Not Just For Datastores

Distributed models & consistency

B A S EBasically Available Soft-state Eventual consistency

</RANT>

Page 10: Dynamo: Not Just For Datastores

Distributed models & consistency

B A S EBasically Available Soft-state Eventual consistency

</RANT>

Page 11: Dynamo: Not Just For Datastores

t-Fault tolerance: Two kinds of failure

Page 12: Dynamo: Not Just For Datastores

t-Fault tolerance: More on failure

Failstopfail in well understood / deterministic ways

t-Fault tolerance means t + 1 nodes

Byzantinefail in arbitrary non-deterministic ways

t-Fault tolerance means 2t + 1 nodes

Page 13: Dynamo: Not Just For Datastores

t-Fault tolerance: More on failure

Failstopfail in well understood / deterministic ways

t-Fault tolerance means t + 1 nodes

Byzantinefail in arbitrary non-deterministic ways

t-Fault tolerance means 2t + 1 nodes

Page 14: Dynamo: Not Just For Datastores

t-Fault tolerance: More on failure

Failstopfail in well understood / deterministic ways

t-Fault tolerance means t + 1 nodes

Byzantinefail in arbitrary non-deterministic ways

t-Fault tolerance means 2t + 1 nodes

Page 15: Dynamo: Not Just For Datastores

t-Fault tolerance: More on failure

Failstopfail in well understood / deterministic ways

t-Fault tolerance means t + 1 nodes

Byzantinefail in arbitrary non-deterministic ways

t-Fault tolerance means 2t + 1 nodes

Page 16: Dynamo: Not Just For Datastores

CAP Controls: N, R, W

N = number of replicasmust be 1 ≤ N ≤ nodes

R = number of responding read nodescan be set per request

W = number of responding write nodescan be set per request

Q = quorum "majority rules"set in stone: Q = N/2 + 1

Tunabilitywhen R = W = N =⇒ strong

when R + W > N =⇒ quorum where W = 1,R = N or W = N,R = 1 or W = R = Q

Page 17: Dynamo: Not Just For Datastores

CAP Controls: N, R, W

N = number of replicasmust be 1 ≤ N ≤ nodes

R = number of responding read nodescan be set per request

W = number of responding write nodescan be set per request

Q = quorum "majority rules"set in stone: Q = N/2 + 1

Tunabilitywhen R = W = N =⇒ strong

when R + W > N =⇒ quorum where W = 1,R = N or W = N,R = 1 or W = R = Q

Page 18: Dynamo: Not Just For Datastores

CAP Controls: N, R, W

N = number of replicasmust be 1 ≤ N ≤ nodes

R = number of responding read nodescan be set per request

W = number of responding write nodescan be set per request

Q = quorum "majority rules"set in stone: Q = N/2 + 1

Tunabilitywhen R = W = N =⇒ strong

when R + W > N =⇒ quorum where W = 1,R = N or W = N,R = 1 or W = R = Q

Page 19: Dynamo: Not Just For Datastores

CAP Controls: N, R, W

N = number of replicasmust be 1 ≤ N ≤ nodes

R = number of responding read nodescan be set per request

W = number of responding write nodescan be set per request

Q = quorum "majority rules"set in stone: Q = N/2 + 1

Tunabilitywhen R = W = N =⇒ strong

when R + W > N =⇒ quorum where W = 1,R = N or W = N,R = 1 or W = R = Q

Page 20: Dynamo: Not Just For Datastores

CAP Controls: N, R, W

N = number of replicasmust be 1 ≤ N ≤ nodes

R = number of responding read nodescan be set per request

W = number of responding write nodescan be set per request

Q = quorum "majority rules"set in stone: Q = N/2 + 1

Tunabilitywhen R = W = N =⇒ strong

when R + W > N =⇒ quorum where W = 1,R = N or W = N,R = 1 or W = R = Q

Page 21: Dynamo: Not Just For Datastores

Dynamo: Properties

Decentralizedno masters exist

Homogeneousnodes have same capabilities

No Global Stateno SPOFs for global state

Deterministic Replica Placementeach node can calculate where replicas should exist for key

Logical TimeNo reliance on physical time

Page 22: Dynamo: Not Just For Datastores

Dynamo: Properties

Decentralizedno masters exist

Homogeneousnodes have same capabilities

No Global Stateno SPOFs for global state

Deterministic Replica Placementeach node can calculate where replicas should exist for key

Logical TimeNo reliance on physical time

Page 23: Dynamo: Not Just For Datastores

Dynamo: Properties

Decentralizedno masters exist

Homogeneousnodes have same capabilities

No Global Stateno SPOFs for global state

Deterministic Replica Placementeach node can calculate where replicas should exist for key

Logical TimeNo reliance on physical time

Page 24: Dynamo: Not Just For Datastores

Dynamo: Properties

Decentralizedno masters exist

Homogeneousnodes have same capabilities

No Global Stateno SPOFs for global state

Deterministic Replica Placementeach node can calculate where replicas should exist for key

Logical TimeNo reliance on physical time

Page 25: Dynamo: Not Just For Datastores

Dynamo: Properties

Decentralizedno masters exist

Homogeneousnodes have same capabilities

No Global Stateno SPOFs for global state

Deterministic Replica Placementeach node can calculate where replicas should exist for key

Logical TimeNo reliance on physical time

Page 26: Dynamo: Not Just For Datastores

Dynamo: Techniques

Consistent Hashing

Vector Clocks

Gossip Protocol

Hinted Handoff

Page 27: Dynamo: Not Just For Datastores

Dynamo: Techniques

Consistent Hashing

Vector Clocks

Gossip Protocol

Hinted Handoff

Page 28: Dynamo: Not Just For Datastores

Dynamo: Consistent hashing

Figure: PrefsList for K is [C, D, E]

Page 29: Dynamo: Not Just For Datastores

Dynamo: Vector clocks

-type vclock() ::[{actor(), counter()}]

A occurred-before B if counters in A areless than or equal to those in B for each actor

Page 30: Dynamo: Not Just For Datastores

Dynamo: Gossip protocol

Figure: "You would never guess what I heard down the pub..."

Page 31: Dynamo: Not Just For Datastores

Dynamo: Hinted handoff

Figure: "Here, take the baton and take over from me. KTHXBAI"

Page 32: Dynamo: Not Just For Datastores

riak_core: Abstractions

Coordinatorenforces consistency requirements, performing anti-entropy, gen_fsm, coordinates vnodes

VNodesErlang process, vnode to hashring partition, delegated work for its partition, unit of replication

Watchersgen_event, listens for ring and service events, used to calculate fallback nodes

Ring Managerstores local node copy of ring (and cluster?) data

Ring Event Handlersnotified about ring (and cluster?) changes and broadcasts plus metadata changes

Page 33: Dynamo: Not Just For Datastores

riak_core: Abstractions

Coordinatorenforces consistency requirements, performing anti-entropy, gen_fsm, coordinates vnodes

VNodesErlang process, vnode to hashring partition, delegated work for its partition, unit of replication

Watchersgen_event, listens for ring and service events, used to calculate fallback nodes

Ring Managerstores local node copy of ring (and cluster?) data

Ring Event Handlersnotified about ring (and cluster?) changes and broadcasts plus metadata changes

Page 34: Dynamo: Not Just For Datastores

riak_core: Abstractions

Coordinatorenforces consistency requirements, performing anti-entropy, gen_fsm, coordinates vnodes

VNodesErlang process, vnode to hashring partition, delegated work for its partition, unit of replication

Watchersgen_event, listens for ring and service events, used to calculate fallback nodes

Ring Managerstores local node copy of ring (and cluster?) data

Ring Event Handlersnotified about ring (and cluster?) changes and broadcasts plus metadata changes

Page 35: Dynamo: Not Just For Datastores

riak_core: Abstractions

Coordinatorenforces consistency requirements, performing anti-entropy, gen_fsm, coordinates vnodes

VNodesErlang process, vnode to hashring partition, delegated work for its partition, unit of replication

Watchersgen_event, listens for ring and service events, used to calculate fallback nodes

Ring Managerstores local node copy of ring (and cluster?) data

Ring Event Handlersnotified about ring (and cluster?) changes and broadcasts plus metadata changes

Page 36: Dynamo: Not Just For Datastores

riak_core: Abstractions

Coordinatorenforces consistency requirements, performing anti-entropy, gen_fsm, coordinates vnodes

VNodesErlang process, vnode to hashring partition, delegated work for its partition, unit of replication

Watchersgen_event, listens for ring and service events, used to calculate fallback nodes

Ring Managerstores local node copy of ring (and cluster?) data

Ring Event Handlersnotified about ring (and cluster?) changes and broadcasts plus metadata changes

Page 37: Dynamo: Not Just For Datastores

riak_core: Coordinator

Figure:

Page 38: Dynamo: Not Just For Datastores

riak_core: Commands

-type riak_cmd() ::{verb(), key(), payload()}Implement handle_command/3 clause for each command in your

callback VNode module

Page 39: Dynamo: Not Just For Datastores

riak_core: VNodes

Figure: Sample handle_command/3 from RTS example app

Page 40: Dynamo: Not Just For Datastores

riak_core: VNodes

Figure: Sample handoff functions from RTS example app

Page 41: Dynamo: Not Just For Datastores

riak_core: Project Structure

rebar ;)also written by the Basho team, makes OTP building and deploying much less painless

dependenciesadd you_app, riak_core, riak_kv, etc. as dependencies to shell project

new in 1.0 stuffcluster vs ring membership, riak_pipe, etc.

Page 42: Dynamo: Not Just For Datastores

riak_core: Project Structure

rebar ;)also written by the Basho team, makes OTP building and deploying much less painless

dependenciesadd you_app, riak_core, riak_kv, etc. as dependencies to shell project

new in 1.0 stuffcluster vs ring membership, riak_pipe, etc.

Page 43: Dynamo: Not Just For Datastores

riak_core: Project Structure

rebar ;)also written by the Basho team, makes OTP building and deploying much less painless

dependenciesadd you_app, riak_core, riak_kv, etc. as dependencies to shell project

new in 1.0 stuffcluster vs ring membership, riak_pipe, etc.

Page 44: Dynamo: Not Just For Datastores

Considerations

Other:stuff()

Interface layerriak_core apps need to implement their own interface layer, e.g. HTTP, XMPP, AMQP, MsgPack, ProtoBuffers

Securing applicationriak_core gossip does not address identity/authZ/authN between nodes; relies on Erlang cookies

Distribution modelse.g. pipelined, laned vs tiered

Query/execution modelse.g. map-reduce (M/R), ASsociated SET (ASSET)

Page 45: Dynamo: Not Just For Datastores

Considerations

Other:stuff()

Interface layerriak_core apps need to implement their own interface layer, e.g. HTTP, XMPP, AMQP, MsgPack, ProtoBuffers

Securing applicationriak_core gossip does not address identity/authZ/authN between nodes; relies on Erlang cookies

Distribution modelse.g. pipelined, laned vs tiered

Query/execution modelse.g. map-reduce (M/R), ASsociated SET (ASSET)

Page 46: Dynamo: Not Just For Datastores

Considerations

Other:stuff()

Interface layerriak_core apps need to implement their own interface layer, e.g. HTTP, XMPP, AMQP, MsgPack, ProtoBuffers

Securing applicationriak_core gossip does not address identity/authZ/authN between nodes; relies on Erlang cookies

Distribution modelse.g. pipelined, laned vs tiered

Query/execution modelse.g. map-reduce (M/R), ASsociated SET (ASSET)

Page 47: Dynamo: Not Just For Datastores

Considerations

Other:stuff()

Interface layerriak_core apps need to implement their own interface layer, e.g. HTTP, XMPP, AMQP, MsgPack, ProtoBuffers

Securing applicationriak_core gossip does not address identity/authZ/authN between nodes; relies on Erlang cookies

Distribution modelse.g. pipelined, laned vs tiered

Query/execution modelse.g. map-reduce (M/R), ASsociated SET (ASSET)

Page 48: Dynamo: Not Just For Datastores

Oh, the possibilities!

What:next()

Concurrency modelse.g. actor, disruptor, evented/reactor, threading

Consistency modelse.g. vector-field, causal, FIFO

Computation optimizationse.g. General Purpose GPU programming, native interfacing (NIFs, JInterface, Scalang?)

Other optimizationse.g. Client discovery, Remote Direct Memory Access (RDMA)

Page 49: Dynamo: Not Just For Datastores

Oh, the possibilities!

What:next()

Concurrency modelse.g. actor, disruptor, evented/reactor, threading

Consistency modelse.g. vector-field, causal, FIFO

Computation optimizationse.g. General Purpose GPU programming, native interfacing (NIFs, JInterface, Scalang?)

Other optimizationse.g. Client discovery, Remote Direct Memory Access (RDMA)

Page 50: Dynamo: Not Just For Datastores

Oh, the possibilities!

What:next()

Concurrency modelse.g. actor, disruptor, evented/reactor, threading

Consistency modelse.g. vector-field, causal, FIFO

Computation optimizationse.g. General Purpose GPU programming, native interfacing (NIFs, JInterface, Scalang?)

Other optimizationse.g. Client discovery, Remote Direct Memory Access (RDMA)

Page 51: Dynamo: Not Just For Datastores

Oh, the possibilities!

What:next()

Concurrency modelse.g. actor, disruptor, evented/reactor, threading

Consistency modelse.g. vector-field, causal, FIFO

Computation optimizationse.g. General Purpose GPU programming, native interfacing (NIFs, JInterface, Scalang?)

Other optimizationse.g. Client discovery, Remote Direct Memory Access (RDMA)

Page 52: Dynamo: Not Just For Datastores

# finger $(whoami)

Login: susan Name: Susan PotterDirectory: /home/susan Shell: /bin/zshOn since Mon 29 Sep 1997 21:18 (GMT) on tty1 from :0Too much unread mail on [email protected] working at Assistly! Looking for smart developers!;)Plan:github: mbbx6spptwitter: @SusanPotter

Page 53: Dynamo: Not Just For Datastores

Slides & Material

https://github.com/mbbx6spp/riak_core-templates

Figure: http://susanpotter.net/talks/strange-loop/2011/dynamo-not-just-for-datastores/

Page 54: Dynamo: Not Just For Datastores

Examples

Rebar Templateshttps://github.com/mbbx6spp/riak_core-templates

Riak Pipeshttps://github.com/basho/riak_pipe

Riak Zabhttps://github.com/jtuple/riak_zab

Page 55: Dynamo: Not Just For Datastores

Questions?

Figure: http://www.flickr.com/photos/42682395@N04/

@SusanPotter

Page 56: Dynamo: Not Just For Datastores

Questions?

Figure: http://www.flickr.com/photos/42682395@N04/

@SusanPotter