The Inner Workings of Dynamo DB


DESCRIPTION

An introduction to the inner workings of Dynamo DB

TRANSCRIPT

Smokehouse Software | Jonathan Lau | jon@smokehousesoftware.com

THE INNER WORKINGS OF AMAZON DYNAMO

Jonathan Lau Nov 2013


MOTIVATION AND BIO

• Early stage companies

• Build bigger systems

• Specialize in backend systems


DISTRIBUTED / CENTRALIZED

           Distributed                                  Centralized
Data       Different data on each node                  One master copy
Replicas   Replicate a smaller data set on each node    Replicate the master copy into read slaves
Scaling    Data are sharded across nodes by default     Extra work to shard


WHAT ABOUT NOSQL?

High-performance solution != scaling


DYNAMO DESIGN CONSIDERATIONS

• Distributed key value store

• Incremental scalability - Scaling one node at a time

• Decentralized design - Gossip-based protocol for membership and failure detection

• Symmetry - All the nodes have the same functionality

• Heterogeneity - The system will be deployed in an environment with huge variance in hardware and system performance.


HIGH LEVEL CONCEPT

Distribute the data across N nodes arranged in a ring.

[Diagram: nodes A through H placed on a hash ring; put()/get() requests for key "K", which falls in the range [C, D), are routed to the node owning that range.]


DYNAMO’S CHALLENGES

• Data partitioning

• N-1 replicas

• High availability for writes

• Handling temporary failures

• Recovering from permanent failures

• Membership and failure detection


[Diagram: nodes A, B, C, D on the ring; a request for key K falls in the range [B, C).]

PARTITIONING

• A 128-bit MD5 hash of the key determines its position on the ring

• Consistent hashing for key partitioning

• Virtual nodes help improve the load distribution

• A request can hit any of the nodes on the key's preference list (the coordinator), as in the sketch below
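
As a concrete illustration, here is a minimal consistent-hashing sketch in Python. The HashRing class and its method names are hypothetical, invented for this example; Dynamo's real implementation differs, but the placement logic is the same idea.

```python
import bisect
import hashlib

class HashRing:
    """Toy consistent-hash ring: keys and nodes share a 128-bit MD5 space."""

    def __init__(self, nodes, vnodes_per_node=8):
        # Each physical node gets several virtual nodes (positions on
        # the ring) to smooth out the load distribution.
        self.ring = []  # sorted list of (position, physical node)
        for node in nodes:
            for v in range(vnodes_per_node):
                bisect.insort(self.ring, (self._hash(f"{node}#{v}"), node))

    @staticmethod
    def _hash(key):
        # 128-bit MD5, as in the Dynamo paper
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def coordinator(self, key):
        # The first virtual node clockwise from the key's hash position
        # belongs to the coordinator for that key.
        pos = self._hash(key)
        i = bisect.bisect(self.ring, (pos,))
        return self.ring[i % len(self.ring)][1]

ring = HashRing(["A", "B", "C", "D"])
print(ring.coordinator("K"))  # one of A-D, depending on where "K" hashes
```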


REPLICATION

• Replicas are stored on the N-1 successor nodes

• The nodes holding the replicas, together with the coordinator node, form the preference list (see the sketch below).
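
Extending the toy HashRing from the previous sketch (names still hypothetical), a preference list is the coordinator plus the next N-1 distinct physical nodes clockwise on the ring:

```python
import bisect

def preference_list(ring, key, n=3):
    # Walk clockwise from the key's position and collect the first n
    # distinct physical nodes: the coordinator plus its n-1 successors.
    # Assumes the ring has at least n physical nodes.
    pos = ring._hash(key)
    i = bisect.bisect(ring.ring, (pos,))
    nodes = []
    while len(nodes) < n:
        node = ring.ring[i % len(ring.ring)][1]
        if node not in nodes:  # skip further vnodes of the same host
            nodes.append(node)
        i += 1
    return nodes

print(preference_list(ring, "K"))  # e.g. ['C', 'D', 'A']
```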


AVAILABLE FOR WRITES

• Accept all writes, based on the version each write modifies

• Track modifications and base versions with vector clocks

• Every write is accepted together with its vector clock

• Conflicts are resolved by examining the vector clocks on the objects and reconciling during the read operation

• Consistency issues arise because of network or node failures

• The oldest vector clock entries are purged when the clock grows too large (see the sketch below)
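
A toy vector clock in Python may make the reconciliation rule concrete. This is a sketch, not Dynamo's data structures; the node names Sx/Sy/Sz echo the example in the paper.

```python
def increment(clock, node):
    # Record one more write coordinated by `node`.
    clock = dict(clock)
    clock[node] = clock.get(node, 0) + 1
    return clock

def descends(a, b):
    # a descends from b if a has seen every event b has seen.
    return all(a.get(node, 0) >= n for node, n in b.items())

def in_conflict(a, b):
    # Neither clock descends from the other: concurrent siblings that
    # must be reconciled during the read operation.
    return not descends(a, b) and not descends(b, a)

v1 = increment({}, "Sx")    # {'Sx': 1}
v2 = increment(v1, "Sy")    # {'Sx': 1, 'Sy': 1}
v3 = increment(v1, "Sz")    # {'Sx': 1, 'Sz': 1}
print(in_conflict(v2, v3))  # True: two divergent versions of the object
print(in_conflict(v1, v2))  # False: v2 simply supersedes v1
```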


HANDLING TEMPORARY FAILURES

• Trade-off between durability and availability

• Sloppy quorum - a write or read is only considered successful if the first N healthy nodes from the preference list respond.

• Hinted handoff - a write is picked up by another replica when the designated coordinator node is down. The write carries a hint naming the intended recipient, so that node's state can be reconstructed once it recovers (sketch below).
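
A rough sketch of the hinted-handoff idea; the node objects and their store() method are hypothetical stand-ins for replica RPCs:

```python
def sloppy_write(pref_list, n, key, value):
    # Try the N intended replica holders; when one is down, hand the
    # write to the next healthy stand-in further down the preference
    # list, tagged with a hint naming the intended recipient.
    intended, extras = list(pref_list[:n]), list(pref_list[n:])
    stored = 0
    for node in intended:
        try:
            node.store(key, value, hint=None)
            stored += 1
        except ConnectionError:
            for sub in extras:
                try:
                    sub.store(key, value, hint=node)  # delivered back later
                    extras.remove(sub)
                    stored += 1
                    break
                except ConnectionError:
                    continue
    return stored == n  # all N copies landed on some healthy node
```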


REPLICA SYNCHRONIZATION

• Dynamo uses Merkle trees to track hashes for the keys

• Passing only the root hash is enough to validate the synchronization state between replicas

• If a replica is deemed to be out of sync, the nodes can traverse down the tree to find the exact mismatched portion (see the sketch below).
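
A toy Merkle-tree root comparison, as a sketch: Dynamo keeps one tree per key range, and the helper below is invented for illustration.

```python
import hashlib

def md5_hex(data: bytes) -> str:
    return hashlib.md5(data).hexdigest()

def merkle_root(leaves):
    # Leaves are hashes of individual key/value pairs; each parent is
    # the hash of its children, so the root summarizes the whole range.
    level = list(leaves)
    while len(level) > 1:
        level = [md5_hex("".join(level[i:i + 2]).encode())
                 for i in range(0, len(level), 2)]
    return level[0]

a = [md5_hex(kv) for kv in [b"k1=v1", b"k2=v2", b"k3=v3", b"k4=v4"]]
b = [md5_hex(kv) for kv in [b"k1=v1", b"k2=v2", b"k3=STALE", b"k4=v4"]]

# Exchanging only the roots detects divergence; on a mismatch the
# replicas walk down the tree to isolate the out-of-sync keys.
print(merkle_root(a) == merkle_root(b))  # False: replicas differ
```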


NODE MEMBERSHIP

• Partition and placement information is propagated via a gossip protocol

• Each node is aware of the token ranges of its peers

• Seed nodes in the cluster speed up the propagation of membership and key-range information for the ring

• Nodes joined independently are not really aware of each other until an explicit membership change (join/remove) reconciles their views (see the gossip sketch below)
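
A toy gossip round; the Member class is hypothetical, and where real anti-entropy protocols reconcile by version, the sketch mimics that with a per-entry version number:

```python
import random

class Member:
    def __init__(self, name, view=None):
        self.name = name
        self.view = dict(view or {})  # node name -> (token range, version)

def gossip_round(members):
    # Each member picks a random peer; the pair merge their views of
    # the ring, keeping the higher-versioned entry for every node.
    # Membership info spreads through the cluster in a few rounds.
    for m in members:
        peer = random.choice([p for p in members if p is not m])
        merged = dict(m.view)
        for node, (tokens, ver) in peer.view.items():
            if node not in merged or merged[node][1] < ver:
                merged[node] = (tokens, ver)
        m.view = dict(merged)
        peer.view = dict(merged)
```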


GET() AND PUT()

What happens during a read or write request?


GET() AND PUT()

• get() and put() are routed through a generic load balancer or a partition-aware client library that routes traffic directly

• The top N nodes in the preference list for key K act as coordinators

• Requests basically go down the list, and bad nodes are skipped over

• Two configuration parameters: R and W, where R + W > N.


MORE ON GET() AND PUT()

When a write happens:

• the coordinator generates a new vector clock value

• it sends the new value, along with the vector clock, to the N highest-ranked reachable nodes

• if at least W-1 nodes respond (the coordinator's own copy makes W), the write is considered successful

When a read happens:

• the coordinator sends a read request to the N highest-ranked reachable nodes

• it waits for R nodes to return, then returns the result to the client (sketch below)
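
Putting R, W, and N together, a toy coordinator might look like this; node.store() and node.fetch() are invented stand-ins for replica RPCs:

```python
N, R, W = 3, 2, 2   # R + W > N: read and write quorums always overlap
assert R + W > N

def put(pref_nodes, key, value, clock):
    # Send the new value and vector clock to the N highest-ranked
    # reachable nodes; the write succeeds once W of them acknowledge.
    acks = 0
    for node in pref_nodes[:N]:
        try:
            node.store(key, value, clock)
            acks += 1
        except ConnectionError:
            continue  # unreachable node, keep going down the list
    return acks >= W

def get(pref_nodes, key):
    # Ask the N highest-ranked reachable nodes and wait for R replies;
    # all causally unrelated versions go back for reconciliation.
    # (A real coordinator would time out if fewer than R ever reply.)
    replies = []
    for node in pref_nodes[:N]:
        try:
            replies.append(node.fetch(key))
        except ConnectionError:
            continue
        if len(replies) == R:
            break
    return replies
```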


WHAT DOES IT ALL MEAN

How does all of this tie together?


WHAT DOES IT MEAN?

• Dynamo shards the data from day 1

• Replication and redundancy are baked in from day 1

• The configuration parameters W and R have a huge effect on the trade-off between availability and durability

• W + R > N

• Resolving consistency at read time allows a more controlled conflict-resolution strategy


HAPPY SCALING

Read the Dynamo design paper @

http://bit.ly/QeM8AC
