algebraic data types: semilattices

Post on 14-Dec-2014

DESCRIPTION

Introduction to the algebraic data type Semilattice and its application in distributed environments.

TRANSCRIPT

Algebraic data type: Semilattices

a.k.a. eventually consistent data structures

Bernhard Huemer IRIAN Solutions

@bhuemer

.. because distributed is the new normal

Why are we here?

Shamelessly stolen from: https://skillsmatter.com/skillscasts/4915-how-do-we-reconcile-eventually-consistent-data

Source: Wikimedia Commons

130 ms

E = mc²

Latency might be one reason why you want distribution

Shamelessly stolen from: https://skillsmatter.com/skillscasts/4915-how-do-we-reconcile-eventually-consistent-data

Scale-up vs scale-out

foo = foo + 1

foo = foo + 2

Race conditions
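A minimal sketch of the race above, assuming two clients that each read `foo`, add locally and write back without any locking. The bad interleaving is hard-coded rather than produced by real threads, and the names `readByA`/`readByB` are made up for illustration:

```scala
// Assumption: plain read-modify-write on a shared variable, no locks or compare-and-set.
var foo = 0

// Both clients read `foo` before either of them writes (the problematic interleaving).
val readByA = foo
val readByB = foo

foo = readByA + 1 // client A: foo = foo + 1
foo = readByB + 2 // client B: foo = foo + 2, overwriting A's write

println(foo) // 2, not 3: client A's update is lost
```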

Network partitions

Conflict resolution (1)

Not clinging to some total order will make your life easier

Conflict resolution (2)

Leave it to the user to resolve conflicts; often there’s something meaningful you can do (e.g. merge shopping carts)
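A rough sketch of such a merge, assuming the store hands back both conflicting versions of a cart (Riak, for example, can return such "siblings") and that the application decides the merged cart should contain every item seen on either side. `mergeCarts` and the item names are hypothetical:

```scala
// Two replicas accepted conflicting writes to the same cart (e.g. during a partition).
val cartOnReplica1 = Set("book", "coffee")
val cartOnReplica2 = Set("book", "headphones")

// Hypothetical application-level resolution: union of all sibling versions.
def mergeCarts(siblings: Seq[Set[String]]): Set[String] =
  siblings.foldLeft(Set.empty[String])((acc, cart) => acc union cart)

val merged = mergeCarts(Seq(cartOnReplica1, cartOnReplica2))
println(merged) // book, coffee, headphones (Set ordering is unspecified)
```

Plain union has a catch: an item removed on one replica comes back if the other sibling still contains it.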

G-Counters
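A minimal G-Counter sketch, assuming string replica ids; `GCounter` and its methods are illustrative names, not taken from any particular library. Each replica only increments its own entry and merging takes the per-replica maximum, so the `foo + 1` / `foo + 2` race no longer loses an update:

```scala
// Grow-only counter: one non-negative count per replica.
final case class GCounter(counts: Map[String, Long]) {
  // A replica only ever increments its own entry.
  def increment(replica: String, by: Long = 1L): GCounter = {
    require(by >= 0, "a G-Counter can only grow")
    GCounter(counts.updated(replica, counts.getOrElse(replica, 0L) + by))
  }

  // The counter's value is the sum over all replicas.
  def value: Long = counts.values.sum

  // Join: pointwise max, which is idempotent, commutative and associative.
  def merge(other: GCounter): GCounter = {
    val replicas = counts.keySet ++ other.counts.keySet
    GCounter(replicas.iterator.map { r =>
      r -> math.max(counts.getOrElse(r, 0L), other.counts.getOrElse(r, 0L))
    }.toMap)
  }
}

val emptyCounter = GCounter(Map.empty)

// Replica A adds 1 and replica B adds 2, concurrently.
val a = emptyCounter.increment("A", 1)
val b = emptyCounter.increment("B", 2)

val merged = a.merge(b)
println(merged.value)          // 3
println(merged.merge(b).value) // still 3: re-delivering b changes nothing
```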

Conflict resolution (3)

.. or this thing that Riak does for you

Algebraic data types

Source: http://en.wikipedia.org/wiki/Algebraic_structure

Algebra - the GoF design pattern collection for functional programmers

Rather than solving this problem over and over again, let’s find a more general solution

Semilattice

trait Semilattice[T] {
  def join(a: T, b: T): T
}

Monoid

trait Monoid[T] {
  def id: T
  def op(a: T, b: T): T
}

Semilattice laws: idempotency, commutativity, associativity

Monoid laws: identity, associativity
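The two traits above as compilable Scala, plus a few hypothetical instances (`setUnion`, `intMax`, `intSum`) and helpers that check the listed laws on concrete sample values. Checking a handful of values is only a smoke test, not a proof:

```scala
trait Semilattice[T] {
  def join(a: T, b: T): T
}

trait Monoid[T] {
  def id: T
  def op(a: T, b: T): T
}

// Set union is a semilattice.
def setUnion[A]: Semilattice[Set[A]] = new Semilattice[Set[A]] {
  def join(a: Set[A], b: Set[A]): Set[A] = a union b
}

// max on Int is a semilattice.
val intMax: Semilattice[Int] = new Semilattice[Int] {
  def join(a: Int, b: Int): Int = math.max(a, b)
}

// Integer addition is a monoid, but not a semilattice.
val intSum: Monoid[Int] = new Monoid[Int] {
  val id: Int = 0
  def op(a: Int, b: Int): Int = a + b
}

// Checks the semilattice laws for the given sample values only.
def semilatticeLawsHold[T](s: Semilattice[T], x: T, y: T, z: T): Boolean =
  s.join(x, x) == x &&                               // idempotency
  s.join(x, y) == s.join(y, x) &&                    // commutativity
  s.join(x, s.join(y, z)) == s.join(s.join(x, y), z) // associativity

// Checks the monoid laws for the given sample values only.
def monoidLawsHold[T](m: Monoid[T], x: T, y: T, z: T): Boolean =
  m.op(m.id, x) == x && m.op(x, m.id) == x &&        // identity
  m.op(x, m.op(y, z)) == m.op(m.op(x, y), z)         // associativity

println(semilatticeLawsHold(intMax, 1, 2, 3))                                 // true
println(semilatticeLawsHold(setUnion[String], Set("a"), Set("b"), Set("c"))) // true
println(monoidLawsHold(intSum, 1, 2, 3))                                      // true
println(intSum.op(1, 1) == 1) // false: addition is not idempotent
```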

Idempotency

List(a) ++ List(a) ≠ List(a)

Set(a) ++ Set(a) = Set(a)

1 + 1 ≠ 1

max(1, 1) = 1

• Familiar binary operations that form monoids (list concatenation, integer addition) need not be semilattices!

• Immutability isn’t enough: it is not the same thing as idempotency

• It doesn’t matter how many times you apply the operation
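The comparisons above as runnable Scala. The last part is an added illustration, not from the slides, of why idempotency matters when the same update is delivered twice (for example, a retried message):

```scala
val a = "a"

// Lists keep duplicates, so concatenating a list with itself changes it.
val listTwice = List(a) ++ List(a)
// Sets ignore duplicates, so joining a set with itself is a no-op.
val setTwice = Set(a) ++ Set(a)

println(listTwice == List(a)) // false: List(a, a)
println(setTwice == Set(a))   // true
println(1 + 1 == 1)           // false
println(math.max(1, 1) == 1)  // true

// The same update "5" delivered twice: max absorbs the duplicate, + counts it twice.
val delivered = List(5, 5)
val maxResult = delivered.foldLeft(0)((x, y) => math.max(x, y))
val sumResult = delivered.foldLeft(0)((x, y) => x + y)
println(maxResult) // 5
println(sumResult) // 10
```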

Commutativity

• The order in which you apply operations doesn’t matter any more

• If we notice dropped packets, just send them again

1 + 2 = 2 + 1

max(1, 2) = max(2, 1)

List(a) ++ List(b) ≠ List(b) ++ List(a)

Set(a) ++ Set(b) = Set(b) ++ Set(a)
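A small sketch of what commutativity buys you, assuming three updates that can reach a node in any order: folding them with set union gives the same result for every arrival order, whereas list concatenation does not. The update values are made up:

```scala
// Updates from three replicas, which may arrive in any order.
val updates = List(Set("x"), Set("y"), Set("z"))

// Apply the updates in every possible arrival order, joining with set union.
val setResults = updates.permutations
  .map(order => order.foldLeft(Set.empty[String])((acc, s) => acc union s))
  .toSet
println(setResults.size) // 1: every order converges to the same set

// The same experiment with list concatenation.
val listUpdates = List(List("x"), List("y"), List("z"))
val listResults = listUpdates.permutations
  .map(order => order.foldLeft(List.empty[String])((acc, l) => acc ++ l))
  .toSet
println(listResults.size) // 6: one distinct list per ordering
```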

Associativity (1)

• Allows you to split up and batch computations

• Each node needn’t receive all the atomic operands; intermediate results will do as well

1 + (2 + 3) = (1 + 2) + 3

max(1, max(2, 3)) = max(max(1, 2), 3)

List(a) ++ (List(b) ++ List(c)) = (List(a) ++ List(b)) ++ List(c)
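A sketch of the batching idea, assuming the operands are split into arbitrary consecutive batches (here `grouped(4)`, an arbitrary choice) that different nodes fold independently; only the intermediate results are joined at the end:

```scala
// Hypothetical setup: ten observed values, split into batches for different nodes.
val values = (1 to 10).toList

def join(a: Set[Int], b: Set[Int]): Set[Int] = a union b

// Batches: List(1, 2, 3, 4), List(5, 6, 7, 8), List(9, 10)
val batches = values.grouped(4).toList

// Each node folds only its own batch into an intermediate result ...
val partials = batches.map(batch => batch.map(v => Set(v)).foldLeft(Set.empty[Int])(join))

// ... and a final node joins the intermediate results instead of the raw operands.
val fromBatches = partials.foldLeft(Set.empty[Int])(join)
val allAtOnce = values.map(v => Set(v)).foldLeft(Set.empty[Int])(join)

println(fromBatches == allAtOnce) // true
```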

Associativity (2)

• Again, intermediate results are as good as atomic operands

• You never lose any information in the whole computation

red + blue = blue + red = purple

red + (blue + blue) = red + blue*

≠ (red + blue) + blue = purple + blue

* Simplistic version that assumes we lose information, for example about the volume of each colour (if you’re mixing paint)

avg(1, avg(2, 4)) = avg(1, 3) = 2

≠ avg(avg(1, 2), 4) = avg(1.5, 4) = 2.75
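The `avg` counterexample in Scala, followed by one common fix that is not from the slides: carry `(sum, count)` pairs, which lose no information and combine associatively, and divide only at the end. `Mean` and `single` are hypothetical names:

```scala
def avg(a: Double, b: Double): Double = (a + b) / 2

val right = avg(1, avg(2, 4)) // avg(1, 3)   = 2.0
val left  = avg(avg(1, 2), 4) // avg(1.5, 4) = 2.75
println(right == left) // false

// Keep (sum, count) instead of the average itself; divide only when reading the value.
final case class Mean(sum: Double, count: Long) {
  def combine(other: Mean): Mean = Mean(sum + other.sum, count + other.count)
  def value: Double = sum / count
}

def single(x: Double): Mean = Mean(x, 1)

// Exact here because the sums are small integers; floating-point addition
// is in general only approximately associative.
val m1 = single(1).combine(single(2).combine(single(4)))
val m2 = single(1).combine(single(2)).combine(single(4))
println(m1 == m2) // true
println(m1.value) // 2.333... (7 / 3)
```

Note that `Mean` is associative and commutative but not idempotent (combining a value with itself doubles the count), so it forms a monoid rather than a semilattice.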

G-Set
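A minimal G-Set (grow-only set) sketch, assuming an immutable Scala `Set` as the underlying storage; `GSet` is an illustrative name. Elements can only be added, and the join is plain set union:

```scala
final case class GSet[A](elements: Set[A]) {
  def add(a: A): GSet[A] = GSet(elements + a)
  def contains(a: A): Boolean = elements.contains(a)
  // Join is set union: idempotent, commutative and associative.
  def merge(other: GSet[A]): GSet[A] = GSet(elements union other.elements)
}

val replicaA = GSet(Set.empty[String]).add("apple")
val replicaB = GSet(Set.empty[String]).add("banana")

val merged = replicaA.merge(replicaB)
println(merged.contains("apple") && merged.contains("banana")) // true
println(merged == replicaB.merge(replicaA))                    // true: merge order doesn't matter
```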

2P-Set
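A minimal 2P-Set (two-phase set) sketch, assuming the common construction from two grow-only sets (additions plus removal tombstones), as described in the CRDT study listed under further reading. Names are illustrative, and ignoring the removal of an element this replica has never seen added is a choice made for this sketch:

```scala
// `added` records additions; `removed` holds tombstones. Both only ever grow.
final case class TwoPSet[A](added: Set[A], removed: Set[A]) {
  def add(a: A): TwoPSet[A] = copy(added = added + a)
  // In this sketch, removal is ignored unless the element has been added locally.
  def remove(a: A): TwoPSet[A] =
    if (added.contains(a)) copy(removed = removed + a) else this
  // An element is present if it was added and never removed.
  def contains(a: A): Boolean = added.contains(a) && !removed.contains(a)
  def elements: Set[A] = added diff removed
  // Join both components independently with union.
  def merge(other: TwoPSet[A]): TwoPSet[A] =
    TwoPSet(added union other.added, removed union other.removed)
}

def emptyTwoPSet[A]: TwoPSet[A] = TwoPSet(Set.empty[A], Set.empty[A])

val base = emptyTwoPSet[String].add("milk").add("eggs")
val replicaA = base.remove("milk") // replica A removes milk
val replicaB = base.add("bread")   // replica B concurrently adds bread

val merged = replicaA.merge(replicaB)
println(merged.elements) // eggs and bread (Set ordering unspecified); milk stays removed

// Trade-off: once removed, an element can never be re-added.
println(merged.add("milk").contains("milk")) // false
```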

OP-Set

Further reading (1)

• “Jonas Bonér - The Road to Akka Cluster, and Beyond”: https://skillsmatter.com/skillscasts/4543-the-road-to-akka-cluster-and-beyond

• “Noel Welsh - Reconciling eventually consistent data”: https://skillsmatter.com/skillscasts/4915-how-do-we-reconcile-eventually-consistent-data

• “Sean Cribbs - Eventually Consistent Data Structures”: https://vimeo.com/43903960

Further reading (2)

• “A comprehensive study of Convergent and Commutative Replicated Data Types”: http://hal.upmc.fr/docs/00/55/55/88/PDF/techreport.pdf

One more thing …
