counting is hard
![Page 1: Counting is Hard](https://reader038.vdocument.in/reader038/viewer/2022100722/58f367521a28abb42a8b45ef/html5/thumbnails/1.jpg)
Counting is hard (at scale)
Russell Brown @russelldb
![Page 2: Counting is Hard](https://reader038.vdocument.in/reader038/viewer/2022100722/58f367521a28abb42a8b45ef/html5/thumbnails/2.jpg)
![Page 3: Counting is Hard](https://reader038.vdocument.in/reader038/viewer/2022100722/58f367521a28abb42a8b45ef/html5/thumbnails/3.jpg)
Big data - More machines
•Availability
•Fault tolerance
•Low latency
![Page 4: Counting is Hard](https://reader038.vdocument.in/reader038/viewer/2022100722/58f367521a28abb42a8b45ef/html5/thumbnails/4.jpg)
Distributed = CAP
<- Consistent ------ Available ->
Atomic - Identity - Idempotent - EC - Probabilistic
![Page 5: Counting is Hard](https://reader038.vdocument.in/reader038/viewer/2022100722/58f367521a28abb42a8b45ef/html5/thumbnails/5.jpg)
Counting waits; Waiting counts
Joseph M. Hellerstein, “The Declarative Imperative”
•Consensus/coordination (count)
•Count all records (wait)
•Non-monotonicity means synchronisation (+/- on a counter?)
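The last bullet can be illustrated with a grow-only counter. If each replica only ever increments its own slot, replicas can merge by taking the per-slot maximum, in any order and without coordination. This is a minimal sketch, not something shown in the talk: the class `GCounter` and the replica names are hypothetical. A counter that also decrements is no longer monotonic as a single number, so it needs extra state (for example, a second grow-only counter for decrements) or synchronisation.

```python
class GCounter:
    """Grow-only counter: one monotonically increasing slot per replica.

    Hypothetical illustration; names are not from the talk.
    """

    def __init__(self, replica_id):
        self.replica_id = replica_id
        self.slots = {}  # replica id -> increments observed from that replica

    def increment(self, amount=1):
        if amount < 0:
            raise ValueError("grow-only: a decrement would break monotonicity")
        self.slots[self.replica_id] = self.slots.get(self.replica_id, 0) + amount

    def merge(self, other):
        # Per-slot max is commutative, associative and idempotent,
        # so merges can arrive late, repeated, or out of order.
        for rid, count in other.slots.items():
            self.slots[rid] = max(self.slots.get(rid, 0), count)

    def value(self):
        return sum(self.slots.values())


a, b = GCounter("a"), GCounter("b")
a.increment(3)
b.increment(2)
a.merge(b)
a.merge(b)  # merging the same state twice changes nothing
b.merge(a)
```

After the merges both replicas report the same total without either having waited for the other.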
![Page 6: Counting is Hard](https://reader038.vdocument.in/reader038/viewer/2022100722/58f367521a28abb42a8b45ef/html5/thumbnails/6.jpg)
Stop! I Can’t Count
•Atomic sequences
•Fail when unavailable
•Registering users
•Checking out
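The "fail when unavailable" choice can be sketched as a single atomic sequence that refuses to answer rather than guess. This is an in-process stand-in under stated assumptions: `TicketServer`, `Unavailable` and the `available` flag are hypothetical names, and the flag is flipped by hand to simulate an outage.

```python
import threading


class Unavailable(Exception):
    """Raised instead of guessing when the sequence cannot be reached."""


class TicketServer:
    """Hypothetical single-node sequence; exact, but not always available."""

    def __init__(self):
        self._lock = threading.Lock()
        self._next = 1
        self.available = True  # toggled manually here to simulate an outage

    def next_id(self):
        if not self.available:
            raise Unavailable("sequence unreachable; refusing to hand out an ID")
        with self._lock:  # one writer at a time keeps the sequence gap-free
            ticket = self._next
            self._next += 1
            return ticket


server = TicketServer()
first = server.next_id()
second = server.next_id()

server.available = False
try:
    server.next_id()
    failed = False
except Unavailable:
    failed = True
```

For registering users or checking out, an error is usually preferable to two customers sharing an order number.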
![Page 7: Counting is Hard](https://reader038.vdocument.in/reader038/viewer/2022100722/58f367521a28abb42a8b45ef/html5/thumbnails/7.jpg)
I Just Can’t Stop
•Events you have no control over
•Web hit counts
•Traffic counts
•“Likes!”
•Reduce accuracy for availability
![Page 8: Counting is Hard](https://reader038.vdocument.in/reader038/viewer/2022100722/58f367521a28abb42a8b45ef/html5/thumbnails/8.jpg)
No Means Yes, Maybe.
•Partial failures with sloppy quorum
•Re-try? D-Double count.
•Idempotency
•Space trade-off
•Acks / TTL?
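The bullets above can be sketched as a counter that remembers which request IDs it has already applied. This is an illustrative sketch, assuming clients attach a unique ID to each increment and reuse it on retry; `IdempotentCounter`, `ttl_seconds` and the explicit `now` argument are hypothetical. The `seen` table is the space cost, and the TTL bounds that cost at the price of double counting very late retries.

```python
class IdempotentCounter:
    """Hypothetical counter that ignores retries of already-applied requests."""

    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self.total = 0
        self.seen = {}  # request id -> time it was first applied

    def increment(self, request_id, now):
        self._expire(now)
        if request_id in self.seen:
            return False  # a retry: already counted, do nothing
        self.seen[request_id] = now
        self.total += 1
        return True

    def _expire(self, now):
        # Forget old request IDs so memory does not grow without bound.
        stale = [rid for rid, t in self.seen.items() if now - t > self.ttl]
        for rid in stale:
            del self.seen[rid]


counter = IdempotentCounter(ttl_seconds=60)
applied = counter.increment("req-1", now=0)
retried = counter.increment("req-1", now=10)      # retry inside the TTL
late_retry = counter.increment("req-1", now=100)  # retry after the ID was forgotten
```

The late retry is counted again, which is exactly the trade-off the TTL introduces.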
![Page 9: Counting is Hard](https://reader038.vdocument.in/reader038/viewer/2022100722/58f367521a28abb42a8b45ef/html5/thumbnails/9.jpg)
Probably
•Probabilistic
•Sampling
•Estimates
•Great for cardinality, poor for identity
•Great for the scalezzzz
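One way to see "great for cardinality, poor for identity" is a k-minimum-values estimator: hash every item to a number in [0, 1), keep only the k smallest distinct hashes, and estimate the distinct count from how small the k-th one is. This is a sketch of one well-known estimator, not a technique named in the talk; `estimate_cardinality`, `_unit_hash` and the default `k` are assumptions.

```python
import hashlib
import heapq


def _unit_hash(item):
    """Map an item to a pseudo-uniform float in [0, 1)."""
    digest = hashlib.sha1(str(item).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") / 2**64


def estimate_cardinality(items, k=256):
    """Estimate the number of distinct items using the k smallest hashes."""
    kept = set()  # the k smallest distinct hash values seen so far
    heap = []     # the same values negated, so -heap[0] is the largest kept
    for item in items:
        h = _unit_hash(item)
        if h in kept:
            continue  # duplicate of something already tracked
        if len(kept) < k:
            kept.add(h)
            heapq.heappush(heap, -h)
        elif h < -heap[0]:
            evicted = -heapq.heappushpop(heap, -h)
            kept.discard(evicted)
            kept.add(h)
    if len(kept) < k:
        return len(kept)  # fewer than k distinct items: the count is exact
    return int((k - 1) / -heap[0])


# 30,000 events from 10,000 distinct (made-up) users.
stream = [f"user-{i % 10000}" for i in range(30000)]
estimate = estimate_cardinality(stream)
small = estimate_cardinality(["a", "b", "a"])
```

Memory stays bounded by `k` however long the stream is, but the structure holds only a handful of hashes, so it cannot say whether a particular user was seen.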
![Page 10: Counting is Hard](https://reader038.vdocument.in/reader038/viewer/2022100722/58f367521a28abb42a8b45ef/html5/thumbnails/10.jpg)
Implementations
•IDs -> Ticket servers, RDBMS, Redis
•Atomic counts -> RDBMS / ZooKeeper / App Engine
•Distributed IDs -> Snowflake / ZooKeeper / SQL shards
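A Snowflake-style generator avoids a central sequence by packing a timestamp, a worker number and a per-millisecond sequence into one integer. The sketch below follows the commonly described 41/10/12-bit layout. It is not Snowflake's actual code: `SnowflakeLikeGenerator`, `EPOCH_MS` and the way worker IDs are assigned are assumptions. Uniqueness still depends on each worker having a distinct ID, which is where something like ZooKeeper could come in.

```python
import threading
import time

EPOCH_MS = 1_600_000_000_000  # arbitrary fixed epoch, chosen for illustration


class SnowflakeLikeGenerator:
    """Hypothetical 64-bit ID: 41 bits time, 10 bits worker, 12 bits sequence."""

    def __init__(self, worker_id, clock=lambda: int(time.time() * 1000)):
        if not 0 <= worker_id < 1024:
            raise ValueError("worker_id must fit in 10 bits")
        self.worker_id = worker_id
        self.clock = clock
        self.last_ms = -1
        self.sequence = 0
        self.lock = threading.Lock()

    def next_id(self):
        with self.lock:
            now = self.clock()
            if now < self.last_ms:
                raise RuntimeError("clock moved backwards; refusing to issue IDs")
            if now == self.last_ms:
                self.sequence = (self.sequence + 1) & 0xFFF
                if self.sequence == 0:
                    # All 4096 sequence values used: spin until the next millisecond.
                    while now <= self.last_ms:
                        now = self.clock()
            else:
                self.sequence = 0
            self.last_ms = now
            return ((now - EPOCH_MS) << 22) | (self.worker_id << 12) | self.sequence


gen_a = SnowflakeLikeGenerator(worker_id=1)
gen_b = SnowflakeLikeGenerator(worker_id=2)
ids_a = [gen_a.next_id() for _ in range(1000)]
ids_b = [gen_b.next_id() for _ in range(1000)]
```

The two generators never talk to each other, yet their IDs do not collide, and each generator's IDs are roughly time-ordered.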
![Page 11: Counting is Hard](https://reader038.vdocument.in/reader038/viewer/2022100722/58f367521a28abb42a8b45ef/html5/thumbnails/11.jpg)
Implementations
•EC Idempotent -> Bueller?
•EC -> Riak / Cassandra / NoSQL
•Estimates -> Stats / Sampling
![Page 12: Counting is Hard](https://reader038.vdocument.in/reader038/viewer/2022100722/58f367521a28abb42a8b45ef/html5/thumbnails/12.jpg)
Take Home?
•Know what you’re counting
•Know why
•Make your trade-off based on use case
•BUT think ahead (big data got that way ‘cos it grew)