High-performance high-availability Plone
High Availability, High Performance
Plone
Guido [email protected]
www.cosent.nl
Social Knowledge Technology
Plone Worldwide
Resilience
Please wave, to improve my speech
Plone as usual
● Aspeli: über-buildout for a production Plone server
● Regebro: Plone-Buildout-Example
– nginx frontend
– varnish cache
– haproxy balancer
– 4x plone instance
– zeo backend
Plone as usual
● webserver :80
● caching
● balancing across Plone instances
● Plone instances
● ZEO backend
Meet the client
● High-profile internet technology NGO
● Slashdot traffic levels
– 0.4 million page views / peak day
– 4 million page views / month
– 40 million hits / month
● Mission-critical web presence
● 100% uptime previous 5 years
● Non-Plone sysadmins
● High security
No can do
SPOF
SPOF
WTF?
Architecture Goals
● Must convince “file-based 100% uptime” sysadmins
● No SPOF
– eliminate all Single Points Of Failure
● Automated failover
– no manual intervention
● Extreme performance
● Extreme resilience
– killall -9 Plone
Meet Paul Stevens
● My brother
● mod_wodan + DBmail
● Plone developer
● pjstevns on irc/github/etc
NFG Net Facilities Group
● premium hosting
● 24/7 MySQL HA
– since stone age
● www.nfg.nl
Plone as usual
3-tier
Plone as usual
Duplicate setup
Load Balancer
● Client provided hardware load balancer
● Alternative: Linux Virtual Server + HAproxy
– 2x HAproxy in active/passive config (this would be an EXTRA layer of HAproxy, not shown in diagram)
– use highly available “virtual” IP address
– monitor with Heartbeat or comparable
– fail over the virtual IP address with arping broadcasts
● Alternative: AWS
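The active/passive idea above can be sketched in a few lines of Python. This is only an illustration of the decision logic, not how Heartbeat or Linux Virtual Server implement it; the names `Node`, `MAX_MISSED`, `record_heartbeat` and `vip_holder` are hypothetical, and the threshold of three missed heartbeats is an assumption.

```python
from dataclasses import dataclass

MAX_MISSED = 3  # assumed threshold: consecutive missed heartbeats before failover


@dataclass
class Node:
    name: str
    missed: int = 0  # consecutive missed heartbeats


def record_heartbeat(node, alive):
    """Reset the counter on a successful heartbeat, otherwise count the miss."""
    node.missed = 0 if alive else node.missed + 1


def is_healthy(node):
    return node.missed < MAX_MISSED


def vip_holder(active, passive):
    """Return the node that should hold the virtual IP, or None if both are down."""
    if is_healthy(active):
        return active
    if is_healthy(passive):
        return passive
    return None
```

In a real deployment the node that takes over would also announce the virtual IP on the network (the arping broadcast mentioned above); that step is left out here.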
Load Balancer
Ensure physical separation
● Ensure redundancy across physical servers
– no use to fail over on same machine
– separate machines in separate data centers
● Gotcha: moving virtuals around
– Disable HA facilities of virtualization platform
– We'll do our own HA
Full cluster
Replacing ZEO
ZEO versus Relstorage
● ZEO
– ZEO protocol
– filestorage
– object pickles
● ZRS Replication
– $$$ at the time
– later opensourced
● No hot-failover
– slave → master reconfig
● Relstorage
– no ZEO protocol: Zope clients talk to the database directly
– MySQL or PostgreSQL
– object pickles: no alchemy!
● MySQL replication
– done that 24/7 since 2001
– widely used
● Hot failover
– multi-master
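"Object pickles: no alchemy" means the relational database only stores opaque pickled object state; there is no object-relational mapping. A minimal sketch of that idea, using `sqlite3` from the standard library as a stand-in for MySQL. The table name `pickles` and the helpers `store` and `load` are hypothetical and do not reflect RelStorage's actual schema or API.

```python
import pickle
import sqlite3

# In-memory SQLite as a stand-in for the MySQL server (assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pickles (oid INTEGER PRIMARY KEY, state BLOB)")


def store(oid, obj):
    """Write the pickled object state under its object id."""
    conn.execute(
        "INSERT OR REPLACE INTO pickles (oid, state) VALUES (?, ?)",
        (oid, pickle.dumps(obj)),
    )
    conn.commit()


def load(oid):
    """Read the pickle back and unpickle it; None if the oid is unknown."""
    row = conn.execute(
        "SELECT state FROM pickles WHERE oid = ?", (oid,)
    ).fetchone()
    return pickle.loads(row[0]) if row else None


store(1, {"title": "Front page", "tags": ["plone", "ha"]})
```

Because the database sees only blobs keyed by object id, ordinary MySQL replication can copy them without knowing anything about Plone.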
Relstorage on MySQL
Blobstorage
● Not shown in diagram
● Client provided Netapp Metrocluster NFS disks
– no need to care about replication and HA for those
● Alternatives:
– DRBD + NFS
– AWS Elastic Block Store (EBS)
– F-sniper + rsync + NFS
● Why not run database on that?
– disk replication + NFS + ZEO
– what can possibly go wrong?
Full cluster
Apache + Wodan
mod_wodan
● Caching module for Apache
– C
– Originally by ICS for nu.nl
– Now maintained by NFG
● Store response body + headers on disk
● BOFH attitude to caching policies
● Used in anger
● Alternative: stxnext.staticdeployment
Varnish ↔ Wodan
Varnish
● Proxy process
● RAM memory cache
– restart → empty cache
– expired → gone
● Plays nice
– request + response headers
– etag split-view
● purge API
– plone.app.caching
Wodan
● Apache module
● Persistent disk cache
– restart → full cache
– expired → keep fallback
● BOFH
– my way or the highway
– single cache file per page
● Cronjobs maintenance
– crawl sitemap
– delete removed pages
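The key difference in expiry behaviour ("expired → gone" versus "expired → keep fallback") can be sketched as follows. This is a toy model, not the code of Varnish or mod_wodan: the `Cache` class, its `keep_expired` flag and the explicit `now` argument are hypothetical, and persistence across restarts is not modelled.

```python
class Cache:
    def __init__(self, ttl, keep_expired):
        self.ttl = ttl                    # freshness lifetime in seconds
        self.keep_expired = keep_expired  # True: keep stale copies as fallback
        self.entries = {}                 # url -> (stored_at, body)

    def put(self, url, body, now):
        self.entries[url] = (now, body)

    def get(self, url, now, backend_up=True):
        entry = self.entries.get(url)
        if entry is None:
            return None
        stored_at, body = entry
        if now - stored_at <= self.ttl:
            return body                   # still fresh
        if self.keep_expired and not backend_up:
            return body                   # stale, but better than an error page
        if not self.keep_expired:
            del self.entries[url]         # expired means gone
        return None                       # caller should refetch from the backend


varnish_like = Cache(ttl=3600, keep_expired=False)  # pages 1 hour
wodan_like = Cache(ttl=60, keep_expired=True)       # pages 1 minute
```

With the backend down and every entry expired, `varnish_like.get(...)` returns `None` while `wodan_like.get(..., backend_up=False)` still returns the last stored body.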
Varnish plus Wodan
Varnish
● unload Plone
● plone.app.caching policies
– pages 1 hour
– resources longer
– purge on edit
● etag split-view
– per-user page versions
– cache authenticated
Wodan
● failsafe content delivery
● hard policy config
– pages 1 minute
– resources longer
– edit → 1-minute refresh
● Gotcha: anonymous only
– editors bypass Wodan
Failure Modes
Full cluster
MySQL failover
Multi Master MySQL
● multi-master
– cross replication: each slaves the other
– any can be master: hot failover and failback
● Gotcha: use only 1 master at a time
– Relstorage is not multi-master
– avoid replication errors
● mmm_agent server (not shown in diagram)
– monitors mysql health and replication
– manages virtual MySQL HA IP address (think: Heartbeat for MySQL)
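The "only 1 master at a time" rule amounts to a sticky writer role: the node holding the virtual IP keeps it while healthy, and the role moves only on failure. A minimal sketch of that rule; `assign_writer` is a hypothetical helper and not the algorithm the MMM agent actually uses.

```python
def assign_writer(current_writer, healthy):
    """Decide which MySQL master should hold the writer (virtual IP) role.

    healthy maps node name -> bool. The current writer is kept while it is
    healthy, so a recovered node does not automatically take the role back.
    """
    if current_writer is not None and healthy.get(current_writer, False):
        return current_writer
    for name in sorted(healthy):
        if healthy[name]:
            return name
    return None  # no healthy master: nothing can accept writes
```

For example, `assign_writer("db1", {"db1": False, "db2": True})` yields `"db2"`, and a later call with `"db2"` as current writer keeps it there even after `db1` recovers.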
Blade failure
Wodan only
Plone as usual
file-based content delivery
Readonly Rescue Mode
● File-based content delivery
– mod_wodan
– full cache of all pages + resources
– cached search results (Subject / tag cloud)
● AJAX-driven graceful degradation
– detect backend down via non-cached lightweight view (@@ipaddress, not a full page: minimal rendering overhead)
– disable interactive elements via CSS (search bar, personal tools → display:none)
● Gotcha: anonymous only
– down for authenticated users until manual reconfig
● Gotcha: ErrorDocument
– pre-cache a nice error page, but preserve the HTTP error status code
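The slides do the backend check from the browser via AJAX. The same probe logic, sketched in Python as it might run in a monitoring script: request the lightweight view and treat any error or timeout as "backend down". The function name `backend_is_up` and the 2-second default timeout are assumptions.

```python
import http.client
import urllib.error
import urllib.request


def backend_is_up(url, timeout=2.0):
    """Return True if the lightweight view answers with a 2xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # Connection refused, timeout, HTTP error status, broken response.
        return False
```

A caller would pass the URL of the non-cached view, for example a site URL ending in `/@@ipaddress`, and switch the page into read-only presentation when the result is `False`.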
No-downtime maintenance
Full cluster
cosent.nl/blog