Zalando: Bootstrapping a Cassandra Cluster in a Dynamic Cloud Environment
Bootstrapping Cassandra in a Dynamic Cloud Environment
Luis Mineiro (@voidmaze) and Thorbjörn Gruda (@ThorbyG)
Agenda
Who We Are and What We Do
How Do We Run a Tech Company
Cloud Infrastructure for Autonomous Teams
Cassandra in a Dynamic Cloud Environment
15 countries
3 fulfilment centers
15+ million active customers
€2.2+ billion revenue (2014)
130+ million visits per month
8,000+ employees
ONE OF EUROPE’S LARGEST ONLINE FASHION RETAILERS
Visit us: tech.zalando.com
Tech hubs in Berlin, Dublin, Dortmund and Helsinki
#CassandraSummit
Zalando Technology
Environment
Credits to our colleague Kolja Wilcke
Fashion Store Website
Fashion Store Apps
iOS, Android, Windows Mobile
Zalando Infrastructure
Based on https://flic.kr/p/bX5E4c
2 datacenters and thousands of production instances serving 15 countries
Radical Agility
What’s That?
Radical Agility is Zalando Tech's approach to running a technology organization, based on the three pillars of autonomy, mastery and purpose.
http://zln.do/ra-video
Compliance Innovation
STUPS: A Cloud Infrastructure for Autonomous Teams
What Is It?
“The STUPS platform is a set of tools and components to provide a convenient
and audit-compliant Platform-as-a-Service (PaaS) for multiple autonomous teams on top of Amazon Web Services (AWS)”
Learn more at https://stups.io
Relevant Requirements
All STUPS applications are Docker containers
We can only use audit-compliant AMIs
STUPS is dynamic: IP addresses in the VPC are assigned via DHCP only
Challenges Ahead
Docker Container
Plenty of options out there
None was created specifically for STUPS
They should be immutable
Audit-Compliant AMI
We can’t use DataStax’s AMIs
The Cloud Engineering team publishes audit-compliant AMIs: Taupage
Taupage has its own bootstrapping process
Dynamic Environment
Clusters require special nodes: seeds
EC2 instances get a random IP address
EC2 instances come and go
How can I help you?
I’d like to bootstrap Cassandra
YADC: Yet Another Docker Cassandra
• Kept the best options from the other recipes out there
• Added STUPS-specific options
and then…?
Run Container in Taupage
With this Cassandra container, we should be ready to run in our audit-compliant AMI, Taupage
You wish!
I’d like to discover seed nodes dynamically
Discovery
STUPS makes use of etcd to register EC2 instances
Applications can use etcd for discovery and distributed locking
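The etcd v2 keys API offers an atomic "create only if absent" write (`prevExist=false`), a common building block for the kind of distributed locking mentioned above. The Java sketch below (Java 11+) only builds such a request with `java.net.http` and does not send it. `EtcdLock.lockRequest`, the key `cassandra/my-new-cluster/lock` and the node value are hypothetical; the `/v2/keys/cassandra/...` layout is modelled on the seed-provider URL in this deck, not taken from STUPS.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of STUPS: builds (but does not send) an etcd v2
// request that creates a key only if it does not exist yet.
class EtcdLock {
    static HttpRequest lockRequest(String etcdBase, String key, String owner, int ttlSeconds) {
        // prevExist=false turns the PUT into an atomic create
        URI uri = URI.create(etcdBase + "/v2/keys/" + key + "?prevExist=false");
        // The TTL lets the claim expire if the owner dies without releasing it
        String body = "value=" + URLEncoder.encode(owner, StandardCharsets.UTF_8)
                + "&ttl=" + ttlSeconds;
        return HttpRequest.newBuilder(uri)
                .header("Content-Type", "application/x-www-form-urlencoded")
                .PUT(HttpRequest.BodyPublishers.ofString(body))
                .build();
    }

    public static void main(String[] args) {
        HttpRequest req = lockRequest("http://etcd.example.org",
                "cassandra/my-new-cluster/lock", "10.0.1.17", 30);
        System.out.println(req.method() + " " + req.uri());
    }
}
```

If two nodes send this request for the same key, etcd should accept only the first (201 Created) and reject the second with 412 Precondition Failed, which is what makes it usable as a lock.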
and then…?
Custom SeedProvider
We created the EtcdSeedProvider
seed_provider:
    - class_name: org.zalando.cassandra.locator.EtcdSeedProvider
      parameters:
          - url: http://example.org/v2/keys/cassandra/my-cluster-name/seeds
Dynamically updates list of seed nodes from etcd
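Cassandra obtains its seed addresses from the class configured under `seed_provider`. The Java sketch below is not the EtcdSeedProvider's code; it only illustrates the "dynamically updates" idea under two assumptions: the etcd response has already been reduced to a comma-separated string of IP addresses, and the fetch is passed in as a `Supplier<String>` so the sketch runs without a network. If a refresh fails, it keeps returning the last known seeds. `RefreshingSeeds` is a hypothetical name.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical sketch of a refreshing seed list; NOT the real EtcdSeedProvider.
// Assumption: the fetcher returns a comma-separated list of IP addresses.
class RefreshingSeeds {
    private final Supplier<String> fetcher;
    private List<InetAddress> lastKnown = Collections.emptyList();

    RefreshingSeeds(Supplier<String> fetcher) {
        this.fetcher = fetcher;
    }

    // Re-reads the seed list on every call; falls back to the last good list on error.
    synchronized List<InetAddress> getSeeds() {
        try {
            List<InetAddress> seeds = new ArrayList<>();
            for (String part : fetcher.get().split(",")) {
                String ip = part.trim();
                if (!ip.isEmpty()) {
                    // For an IP literal, getByName only parses it; no DNS lookup happens
                    seeds.add(InetAddress.getByName(ip));
                }
            }
            lastKnown = Collections.unmodifiableList(seeds);
        } catch (UnknownHostException | RuntimeException e) {
            // etcd unreachable or value malformed: keep serving the previous seeds
        }
        return lastKnown;
    }
}
```

Falling back to the last good list means a brief etcd outage does not leave a booting node without any seeds to gossip with.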
and then…?
Senza Template
We use the STUPS senza tool to drive AWS CloudFormation. Our template bootstraps a Cassandra cluster with one simple command:
$ senza create https://goo.gl/qyTy7b \
    my-new-cluster \
    etcd.example.org
STUPS Appliance
CloudFormation
Cluster details: name my-new-cluster, size 3
etcd.example.org
Auto Scaling Group
Min, Max, Desired = 3
Seed Node #1
Node #2
Node #3
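The diagram shows node #1 becoming the seed but not how that is decided. One plausible scheme, given that etcd is already used for locking, is for each booting node to try an atomic create of a seed-slot key; whoever wins becomes a seed. This Java sketch simulates that with `ConcurrentHashMap.putIfAbsent` as an in-memory stand-in for etcd's `prevExist=false`; `SeedElection`, the `seeds/<n>` key names and the number of seed slots are all hypothetical, not the appliance's actual logic.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical seed election: an in-memory map stands in for etcd, and
// putIfAbsent plays the role of an etcd v2 PUT with prevExist=false.
class SeedElection {
    private final Map<String, String> store = new ConcurrentHashMap<>();
    private final int seedSlots;

    SeedElection(int seedSlots) {
        this.seedSlots = seedSlots;
    }

    // Called by each booting node; returns true if the node is (or became) a seed.
    boolean tryBecomeSeed(String nodeIp) {
        if (store.containsValue(nodeIp)) {
            return true; // already holds a slot, e.g. after a container restart
        }
        for (int slot = 0; slot < seedSlots; slot++) {
            // Atomic create: only the first node to reach a free slot wins it
            if (store.putIfAbsent("seeds/" + slot, nodeIp) == null) {
                return true;
            }
        }
        return false; // all slots taken: join as a regular node
    }

    public static void main(String[] args) {
        SeedElection election = new SeedElection(1);
        for (String ip : new String[] {"10.0.1.10", "10.0.2.20", "10.0.3.30"}) {
            System.out.println(ip + (election.tryBecomeSeed(ip) ? " -> seed" : " -> node"));
        }
    }
}
```

With one seed slot, the first of the three nodes becomes the seed and the other two join as regular nodes, matching the diagram.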
STUPS OpsCenter
While bootstrapping, the cluster discovers OpsCenter and registers with it
What about operations?
Appliance Operations
First release focused only on initial bootstrapping
Some things Just Work™
Adding Nodes
Just adjust your Auto Scaling Group, or trigger CloudWatch ScaleUp actions
Useless piece of sh*t
Upcoming Features
Better distribution of seed nodes across availability zones (AZs)
Scheduled backups
HTTP Health check
JMX metrics and instrumentation over HTTP
Replacement of dead nodes
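Per-AZ seed distribution is listed as future work, so there is no implementation to show. As a hypothetical illustration of the goal only, this Java sketch picks one seed per availability zone, keeping the smallest IP string in each zone as a simple deterministic tie-breaker so that every node computing it from the same membership view gets the same answer. `AzSeedPicker`, `seedsPerZone` and the `eu-west-1*` zone names are assumptions.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical: choose one seed per availability zone from a membership view.
// Input maps node IP -> AZ; output maps AZ -> chosen seed IP (sorted by AZ).
class AzSeedPicker {
    static Map<String, String> seedsPerZone(Map<String, String> nodeToZone) {
        Map<String, String> seeds = new TreeMap<>();
        for (Map.Entry<String, String> e : nodeToZone.entrySet()) {
            // Keep the smallest IP string per zone so the choice does not depend
            // on iteration order
            seeds.merge(e.getValue(), e.getKey(),
                    (current, candidate) -> current.compareTo(candidate) <= 0 ? current : candidate);
        }
        return seeds;
    }

    public static void main(String[] args) {
        Map<String, String> members = Map.of(
                "10.0.1.10", "eu-west-1a",
                "10.0.1.11", "eu-west-1a",
                "10.0.2.20", "eu-west-1b",
                "10.0.3.30", "eu-west-1c");
        // prints {eu-west-1a=10.0.1.10, eu-west-1b=10.0.2.20, eu-west-1c=10.0.3.30}
        System.out.println(seedsPerZone(members));
    }
}
```

Spreading seeds this way means losing a single AZ should still leave at least one reachable seed for new nodes.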
All of this is open source
We Want to Hear From You
https://github.com/zalando/stups-cassandra
https://github.com/zalando/stups-opscenter
https://github.com/zalando/cassandra-etcd-seed-provider
Try it out
Help us find issues
Send us pull requests
Thank you for listening
Check out our blog: https://tech.zalando.com
Our many open source products: https://github.com/zalando
The STUPS stack: https://stups.io
Got more questions? You can reach us on twitter @ZalandoTech
We’re hiring!
Special thanks to Jessie Dude. No Continuum Transfunctioners were harmed during the production of these slides.