counting is hard

Upload: russell-brown

Post on 16-Apr-2017


Page 1: Counting is Hard

Counting is hard (at scale)

Russell Brown @russelldb

Page 2: Counting is Hard
Page 3: Counting is Hard

Big data - More machines

•Availability

•Fault tolerance

•Low latency

Page 4: Counting is Hard

Distributed = CAP

<- Consistent ------ Available ->

Atomic - Identity - Idempotent - EC - Probabilistic

Page 5: Counting is Hard

Counting waits; Waiting counts

Joseph M. Hellerstein - “The Declarative Imperative”

•Consensus/coordination (count)

•Count all records (wait)

•Non-monotonicity means synchronisation (+/- on a counter?)
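One way to read the last bullet: a count that only grows can be merged without coordination, but a decrement makes the value non-monotonic and the same merge loses it. A minimal sketch, assuming per-replica counts merged by element-wise max (a technique the slide does not name; `merge`, `value`, and the replica names are hypothetical):

```python
def merge(a, b):
    """Merge two per-replica count maps by keeping the larger count per replica."""
    return {r: max(a.get(r, 0), b.get(r, 0)) for r in a.keys() | b.keys()}


def value(counts):
    """The counter's value is the sum of every replica's own count."""
    return sum(counts.values())


# Two replicas accept increments independently, with no coordination.
replica_a = {"a": 3}  # replica "a" has applied 3 increments
replica_b = {"b": 2}  # replica "b" has applied 2 increments
merged = merge(replica_a, replica_b)
total = value(merged)

# A decrement is non-monotonic: max() cannot tell "newer and smaller"
# apart from "older and stale", so the -1 is silently lost on merge.
before_decrement = {"a": 3}
after_decrement = {"a": 2}
lost = merge(before_decrement, after_decrement)
```

Here `total` is 5 regardless of merge order, while `lost` is still `{"a": 3}`: supporting the decrement needs either extra structure or synchronisation.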

Page 6: Counting is Hard

Stop! I Can’t Count

•Atomic sequences

•Fail when unavailable

•Registering users

•Checking out
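The "stop" choice can be sketched as a single sequence that either hands out the next number atomically or refuses outright. This is an illustration only, assuming one process and a hypothetical `available` flag standing in for a failed node or partition:

```python
import threading


class SequenceUnavailable(Exception):
    """Raised instead of guessing when the sequence cannot be reached."""


class AtomicSequence:
    def __init__(self, start=0):
        self._last = start
        self._lock = threading.Lock()
        self.available = True  # hypothetical stand-in for node health

    def next_id(self):
        # Fail rather than risk handing out a duplicate or a gap.
        if not self.available:
            raise SequenceUnavailable("sequence is unavailable")
        # The lock makes read-increment-return a single atomic step.
        with self._lock:
            self._last += 1
            return self._last
```

Callers such as user registration or checkout would have to surface the error, which is the price of never double-issuing a number.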

Page 7: Counting is Hard

I Just Can’t Stop

•Events you have no control over

•Web hit counts

•Traffic counts

•“Likes!”

•Reduce accuracy for availability

Page 8: Counting is Hard

No Means Yes, Maybe.

•Partial failures with sloppy quorum

•Re-try? D-Double count.

•Idempotency

•Space trade off

•Acks / TTL?
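The retry problem and the space trade-off can be sketched together. The following is a minimal illustration, not a production design: it assumes every increment carries a client-chosen operation id, and it bounds memory by forgetting the oldest ids (a size limit standing in for the TTL the slide asks about). `IdempotentCounter` and `max_remembered` are hypothetical names.

```python
from collections import OrderedDict


class IdempotentCounter:
    def __init__(self, max_remembered=1000):
        self.value = 0
        self._seen = OrderedDict()  # operation ids already applied, oldest first
        self._max_remembered = max_remembered

    def increment(self, op_id, amount=1):
        """Apply the increment once per op_id; return False for a duplicate."""
        if op_id in self._seen:
            return False  # a retry of something already counted
        self.value += amount
        self._seen[op_id] = True
        # Space trade-off: remembering every id forever is unbounded,
        # so the oldest id is dropped once the limit is exceeded.
        if len(self._seen) > self._max_remembered:
            self._seen.popitem(last=False)
        return True
```

A retry that arrives after its id has been forgotten is counted twice, which is exactly the trade the slide points at: more remembered ids means more space, fewer means more double counts.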

Page 9: Counting is Hard

Probably

•Probabilistic

•Sampling

•Estimates

•Great for cardinality, poor for identity

•Great for the scalezzzz
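To make "great for cardinality, poor for identity" concrete, here is a small sketch of one probabilistic technique, linear counting, chosen for brevity (the slide does not name a specific algorithm). Each item is hashed to a slot in a fixed-size table, and the number of distinct items is estimated from the fraction of slots still empty. `LinearCounter` and `num_bits` are hypothetical names.

```python
import hashlib
import math


class LinearCounter:
    def __init__(self, num_bits=8192):
        self.num_bits = num_bits
        # One byte per slot for readability; a real bitmap would pack bits.
        self.bits = bytearray(num_bits)

    def add(self, item):
        # Hash the item to one slot; repeats of the same item hit the same slot.
        digest = hashlib.sha256(str(item).encode("utf-8")).digest()
        index = int.from_bytes(digest[:8], "big") % self.num_bits
        self.bits[index] = 1

    def estimate(self):
        """Estimate the number of distinct items as -m * ln(empty / m)."""
        empty = self.num_bits - sum(self.bits)
        if empty == 0:
            return float("inf")  # table saturated; the estimate is meaningless
        return -self.num_bits * math.log(empty / self.num_bits)
```

The table never grows and duplicates cost nothing, but it cannot say which items it has seen, only roughly how many.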

Page 10: Counting is Hard

Implementations

•Ids -> Ticket servers, RDBMS, Redis

•Atomic counts -> RDBMS / ZooKeeper / App Engine

•Distributed Ids -> Snowflake / ZooKeeper / SQL Shards
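A Snowflake-style id avoids a shared counter by packing a timestamp, a worker id, and a per-millisecond sequence into one integer. The sketch below follows the commonly described 41/10/12-bit layout as an assumption; it is not Twitter's code, is not thread-safe, and `SnowflakeLikeGenerator`, `epoch_ms`, and `clock` are hypothetical names.

```python
import time


class SnowflakeLikeGenerator:
    WORKER_BITS = 10
    SEQUENCE_BITS = 12

    def __init__(self, worker_id, epoch_ms=0, clock=None):
        if not 0 <= worker_id < (1 << self.WORKER_BITS):
            raise ValueError("worker_id does not fit in WORKER_BITS")
        self.worker_id = worker_id
        self.epoch_ms = epoch_ms
        # The clock is injectable so the generator can be tested deterministically.
        self._clock = clock if clock is not None else (lambda: int(time.time() * 1000))
        self._last_ms = -1
        self._sequence = 0

    def next_id(self):
        now = self._clock()
        if now < self._last_ms:
            raise RuntimeError("clock moved backwards; refusing to issue ids")
        if now == self._last_ms:
            # Same millisecond: bump the sequence, wrapping at 2**SEQUENCE_BITS.
            self._sequence = (self._sequence + 1) & ((1 << self.SEQUENCE_BITS) - 1)
            if self._sequence == 0:
                # Sequence exhausted; spin until the clock reaches a new millisecond.
                while now <= self._last_ms:
                    now = self._clock()
        else:
            self._sequence = 0
        self._last_ms = now
        timestamp = now - self.epoch_ms
        return (
            (timestamp << (self.WORKER_BITS + self.SEQUENCE_BITS))
            | (self.worker_id << self.SEQUENCE_BITS)
            | self._sequence
        )
```

Ids from different workers never collide as long as worker ids are unique, and ids from one worker sort roughly by time. Assigning the worker ids is itself a small coordination problem, which is where something like ZooKeeper comes in.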

Page 11: Counting is Hard

Implementations

•EC Idempotent -> Bueller?

•EC -> Riak / Cassandra / NoSQL

•Estimates -> Stats / Sampling
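Sampling can be sketched in a few lines: record each event with a fixed probability and scale the sampled count back up. This is an illustration under stated assumptions, not a recommendation of any particular stats library; `SampledCounter`, `sample_rate`, and `rng` are hypothetical names.

```python
import random


class SampledCounter:
    def __init__(self, sample_rate, rng=None):
        if not 0 < sample_rate <= 1:
            raise ValueError("sample_rate must be in (0, 1]")
        self.sample_rate = sample_rate
        # An injectable random source keeps the example reproducible.
        self._rng = rng if rng is not None else random.Random()
        self._sampled = 0

    def record(self):
        # Only a fraction of events is actually written, which cuts write load.
        if self._rng.random() < self.sample_rate:
            self._sampled += 1

    def estimate(self):
        """Scale the sampled count up by 1 / sample_rate."""
        return self._sampled / self.sample_rate
```

At a 1% sample rate the counter does about a hundredth of the writes, and the estimate's relative error shrinks as the true count grows, which suits hit counts far better than it suits checkout totals.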

Page 12: Counting is Hard

Take Home?

•Know what you’re counting

•Know why

•Make your trade off based on use case

•BUT think ahead (big data got that way ‘cos it grew)