A tale of scalability
From one node to multiple DCs
Who?
2014 FIFA World Cup Brazil
- 450K simultaneous users (ARG vs SWE)
- 580 Gbps (ARG vs SWE)
- 1659 years watched (all games)
- 7 x 1 (GER vs BRA)
bbb, sportv, off, pfc, combate, gnews ...
Agenda
1. What this presentation is not about!
2. Basic glossary
3. The story
4. Questions
PS1: this presentation was not made by someone good with graphs/drawings/diagrams.
PS2: this reflects only my own personal opinion, not my employer's.
What this presentation is not about!
Byzantine faults, 2PC, Paxos, RAFT, threading, locks, leader election, ZAB, the consensus problem, CRDTs, CALM, the CAP theorem. To sum up: this is not a deep distributed systems presentation.
Basic glossary
Scalability: the ability of a system to grow to accommodate increased demand. (why?)
Basic glossary
Availability: the proportion of time a system is in a functioning condition. (why?)
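The definition above can be made concrete with a quick calculation: availability is uptime over total time, and redundant components multiply their *un*availabilities. A minimal sketch, assuming component failures are independent (a simplification):

```python
def availability(uptime_hours, downtime_hours):
    """Availability = uptime / (uptime + downtime)."""
    return uptime_hours / (uptime_hours + downtime_hours)

def parallel_availability(a, n):
    """n redundant copies, each with availability a: 1 - (1 - a)**n."""
    return 1 - (1 - a) ** n

single = availability(99.0, 1.0)            # 0.99 -> ~3.65 days of downtime/year
redundant = parallel_availability(single, 2)
print(round(single, 4), round(redundant, 4))  # 0.99 0.9999
```

Two independent 99% servers behind a failover give roughly "four nines", which is why the rest of the deck keeps adding redundant copies of each tier.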
Basic glossary
Fault tolerance: the ability to continue operating properly in the event of a failure. (why?)
Failover systems: software with automatic fault tolerance.
The story :: BananaApp
1st Solution :: 1 Server
[diagram: a single server running both the app and the database, serving users from BRA, US, CAN and the "Lost World"]
1st Solution :: problems
More users hit the app and it becomes slower :(
- CPU load was 1.3 (NOK) [top/htop, w]
- I/O utilization was high (NOK) [iostat, vmstat]
- RAM usage 45% (OK) [free -m]
- Disk space 5% (OK) [df -h]
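The NOK/OK call on CPU load above can be sketched in a few lines: compare the 1-minute load average (the same number top and w report) to the core count. The threshold logic is an illustration, not a hard rule:

```python
import os

def cpu_load_status(load1=None, cores=None):
    """Return 'NOK' when the 1-minute load average exceeds the core count."""
    if load1 is None:
        load1, _, _ = os.getloadavg()   # live value on Linux/macOS
    if cores is None:
        cores = os.cpu_count() or 1
    return "NOK" if load1 > cores else "OK"

# The slide's single-core box under load 1.3:
print(cpu_load_status(load1=1.3, cores=1))  # NOK
```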
2nd Solution :: 2 Servers
[diagram: app server and database now on separate servers, serving BRA, US, CAN and the "Lost World"]
2nd Solution :: good parts
- Distributed the load
- We can fine-tune each server separately
2nd Solution :: problems
More users hit the app and it becomes slower :(
- APP's CPU load was 1.3 (NOK)
- DB's CPU load was 0.3 (OK)
- I/O utilization was normal (OK)
- Introduced network latency / a new point of failure
2nd Solution :: new conceptsPoint of Failure (or single point of failure [SPoF]) is a part of a system that, if it fails, will stop the entire system from working.Examples: our database and app server
When a solution is free from SPoF we can say it's a failover system.
3rd Solution :: 4 Servers
[diagram: load balancer in front of app1 and app2, single database behind them]
3rd Solution :: new concepts
"Load balancer (LB) is a device/software that distributes network or application traffic across a number of servers (reals)." (F5)
1. How does it choose which server to send to? round-robin, least conn, weighted...
2. How does it know about a dead node? health check /page, tcp:80...
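The balancing strategies named above can be sketched in a few lines (server names are hypothetical; real LBs like NGINX and HAProxy implement these natively):

```python
import itertools

servers = ["srv1", "srv2", "srv3"]

# Round-robin: cycle through the servers in order.
rr = itertools.cycle(servers)

# Least connections: pick the server with the fewest active connections.
active = {"srv1": 12, "srv2": 3, "srv3": 7}
def least_conn():
    return min(active, key=active.get)

# Weighted round-robin: repeat each server proportionally to its weight.
weights = {"srv1": 3, "srv2": 1}
weighted_pool = [s for s, w in weights.items() for _ in range(w)]

print([next(rr) for _ in range(4)])  # ['srv1', 'srv2', 'srv3', 'srv1']
print(least_conn())                  # srv2
```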
3rd Solution :: LB examples

NGINX:
http {
    upstream myapp1 {
        server srv1.example.com;
        server srv2.example.com;
    }
    server {
        listen 80;
        location / {
            proxy_pass http://myapp1;
            health_check uri=/health;  # active health checks are an NGINX Plus feature
        }
    }
}

HAProxy:
listen appname 0.0.0.0:80
    mode http
    stats enable
    balance leastconn
    option httpclose
    option forwardfor
    option httpchk HEAD /health HTTP/1.1
    server srv1 srv1.example.com:80 check
    server srv2 srv2.example.com:80 check
3rd Solution :: problems
Users are getting signed out "randomly". This problem is also known as session persistence / session stickiness.
NGINX: sticky cookie srv_id expires=1h domain=.example.com path=/;  (sticky is an NGINX Plus feature)
HAProxy: cookie srv_id insert indirect nocache
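What the sticky-cookie directives above do can be simulated: the LB sets an srv_id cookie on the first response and routes every later request carrying that cookie to the same backend. A toy sketch with hypothetical server names:

```python
import itertools

servers = itertools.cycle(["srv1", "srv2"])

def route(cookies):
    """Route by the srv_id cookie when present, otherwise pick a server and pin it."""
    if "srv_id" not in cookies:
        cookies["srv_id"] = next(servers)  # the LB sets the cookie on first response
    return cookies["srv_id"]

alice, bob = {}, {}
print(route(alice), route(bob))    # first requests go to different backends: srv1 srv2
print(route(alice), route(alice))  # alice stays pinned to her backend: srv1 srv1
```

Without the cookie, a round-robin LB would bounce alice between backends and her in-memory session would "randomly" disappear.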
4th Solution :: 5 Servers
[diagram: load balancer in front of app1 and app2, with a database and a shared session store (memcached/redis ...)]
4th Solution :: +problems
We now have 3 SPoFs: the LB, memcached and the database.
5th Solution :: LB (float/virtual ip)
/etc/sysctl.conf:
net.ipv4.ip_nonlocal_bind = 1

/etc/ha.d/haresources:
lb1 192.168.0.10
5th Solution :: Database
Partition and Replication
[diagram: data set ABCD partitioned across four nodes; each node holds its own partition plus a replica of a neighbor's: B (1,2), C (2,3), A (3,0), D (0,1)]
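The diagram's idea, each node owning one partition and also holding a replica of its neighbor's, can be sketched as a ring with a replication factor. The hash placement below is an illustrative assumption; Cassandra and sharded MongoDB do this for real:

```python
import hashlib

NODES = ["A", "B", "C", "D"]
REPLICATION_FACTOR = 2

def owners(key):
    """Map a key to a primary node plus (RF - 1) neighbors on the ring."""
    start = int(hashlib.md5(key.encode()).hexdigest(), 16) % len(NODES)
    return [NODES[(start + i) % len(NODES)] for i in range(REPLICATION_FACTOR)]

for key in ["user:42", "user:1337"]:
    print(key, "->", owners(key))
```

Losing any single node leaves every key readable from its replica, which is how the DB stops being a SPoF.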
5th Solution :: mongo (master/bkp)
5th Solution :: cassandra (cluster)
5th Solution :: you got the idea
[diagram: lb1 and lb2 in front of app1..appn; DB tier partitioned/replicated across db1..db4; session tier on s1 and s2]
5th Solution :: + caching
[diagram: same topology as before, plus a caching tier (c1, c2, c3) between the apps and the data stores]
5th Solution :: Caching

NGINX:
http {
    proxy_cache_path /data/nginx/cache keys_zone=one:10m;
    upstream myapp1 {
        server srv1.example.com;
        server srv2.example.com;
    }
    server {
        listen 80;
        proxy_cache one;
        location / {
            proxy_cache_valid any 1m;
            proxy_pass http://myapp1;
        }
    }
}
5th :: + Microservice Architecture
[diagram: the application split into microservice APIs - Core (mongodb), Search (elasticsearch), Recommendation (spark/hadoop), Social (neo4j) - each with its own nodes (n1..) and caches (c1..)]
5th :: Single datacenter (still a SPoF)
6th solution :: multihoming
6th solution :: models for replication
master / backup
master / master
2PC
Paxos
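The simplest of the four, master/backup, can be sketched: writes go to the master and fan out to the backups; when the master dies, a backup is promoted. A toy in-memory sketch, not a real replication protocol:

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.data = {}

class Cluster:
    """Toy master/backup replication: writes fan out from master to backups."""
    def __init__(self, names):
        self.nodes = [Node(n) for n in names]

    @property
    def master(self):
        return self.nodes[0]

    def write(self, key, value):
        for node in self.nodes:   # master applies the write, then replicates it
            node.data[key] = value

    def fail_master(self):
        self.nodes.pop(0)         # promote the first backup to master

cluster = Cluster(["db1", "db2"])
cluster.write("user:42", "banana")
cluster.fail_master()             # db1 dies...
print(cluster.master.name, cluster.master.data["user:42"])  # db2 banana
```

Master/master, 2PC and Paxos all exist to remove weaknesses this model has (a write window during failover, split brain), at the cost of much more complexity.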
6th solution :: database
Cassandra can help you
6th :: DNS round robin
$ dig a www.youtube.com
6th solution :: anycast
Border Gateway Protocol (BGP) makes routing decisions based on paths, network policies, or rule-sets configured by a network administrator, and is involved in making core routing decisions.
DNS resolves www.example.com to 1.1.1.1.
Clients from Colorado will mostly be routed to Colorado's DC.
Clients from California will mostly be routed to California's DC.
[diagram: both DCs announce the same address, 1.1.1.1, via BGP]
6th solution :: subdomain per client
6th :: GSLB (Global Server Load Balancing)
6th :: GSLB multiples A records (BR)
$ dig a www.youtube.com
6th :: GSLB multiples A records (DE)
7th day: you shall rest
Summarizing

[diagram: three identical datacenters - BR-DKC101, US-DKC102, JP-DKC103 - each with redundant load balancers (lb1, lb2), app servers (app1..appn), a partitioned/replicated DB (db1..db4), session stores (s1, s2) and caching nodes (c1..c3)]
Bonus - Vagrant
Bonus - Docker (docker-compose)
Bonus - don’t blindly trust vendors
Link to this presentation
slideshare.net/leandro_moreira
References
● https://f5.com/glossary/load-balancer
● http://leandromoreira.com.br/2014/11/20/how-to-start-to-learn-high-scalability/
● http://nginx.org/en/docs/http/load_balancing.html
● https://www.digitalocean.com/community/tutorials/how-to-use-haproxy-to-set-up-http-load-balancing-on-an-ubuntu-vps
● https://academy.datastax.com/courses/
● http://en.wikipedia.org/wiki/Single_point_of_failure
● http://book.mixu.net/distsys/single-page.html
● https://www.howtoforge.com/high-availability-load-balancer-haproxy-heartbeat-debian-etch-p2
● http://docs.mongodb.org/manual/core/sharding-introduction/
● http://docs.mongodb.org/manual/core/replication-introduction/
● http://nginx.com/resources/admin-guide/caching/
● https://developers.google.com/web/fundamentals/performance/optimizing-content-efficiency/http-caching?hl=en
● http://martinfowler.com/articles/microservices.html
● http://www.netflix.com/WiMovie/70140358?trkid=12244757
● http://highscalability.com/blog/2009/8/24/how-google-serves-data-from-multiple-datacenters.html
● http://techblog.netflix.com/2011/11/benchmarking-cassandra-scalability-on.html
● http://docs.mongodb.org/manual/tutorial/deploy-shard-cluster/
● http://tech.3scale.net/2014/06/18/redis-sentinel-failover-no-downtime-the-hard-way/
● http://www.slideshare.net/gear6memcached/implementing-high-availability-services-for-memcached-1911077
● http://docs.couchbase.com/moxi-manual-1.8/
● http://static.googleusercontent.com/media/research.google.com/en//pubs/archive/35590.pdf
● http://static.googleusercontent.com/media/research.google.com/en//archive/chubby-osdi06.pdf
● http://the-paper-trail.org/blog/distributed-systems-theory-for-the-distributed-systems-engineer/
● http://nil.csail.mit.edu/6.824/2015/papers/paxos-simple.pdf
● http://the-paper-trail.org/blog/consensus-protocols-paxos/
● donkeykong.com
● http://backreference.org/2010/02/01/geolocation-aware-dns-with-bind/
● http://www.tenereillo.com/GSLBPageOfShame.htm
● https://aphyr.com/tags/Distributed-Systems
● http://pbs.cs.berkeley.edu/pbs-vldb2012.pdf
● http://static.googleusercontent.com/media/research.google.com/en//pubs/archive/36737.pdf
● http://blog.empathybox.com/post/19574936361/getting-real-about-distributed-system-reliability
● https://www.usenix.org/legacy/events/nsdi06/tech/full_papers/freedman/freedman.pdf