building a paas with docker and aws
©2015, Amazon Web Services, Inc. or its affiliates. All rights reserved
Empire: Building a PaaS with Docker and AWS
Agenda
• A little background about why we decided to build an internal PaaS.
• Introduction to Empire.
• How we’re leveraging Amazon EC2 Container Service (ECS) as the backend.
• Demo
• Q&A
Who am I
• Eric Holmes
• Infrastructure Engineer at Remind
• I like building things for other developers
• Work mostly with Go and Ruby
• You can find my open source stuff at https://github.com/ejholmes
What’s Remind?
• Remind is a messaging platform for teachers, students and parents.
• Chat/Announcements/Files
• ~25 million users. ~350,000 new users per day during back-to-school (BTS)
• ~5 million messages per day.
• ~50 employees. ~30 engineers.
Broke apart the monolith
• Sidekiq queues were IO bound and constantly backed up during BTS
• Message delivery workers were tightly coupled to the rest of the application. Difficult to scale out horizontally
• Database would need to be sharded
• Started breaking the monolith apart into loosely coupled services.
• Now have ~50 production services
Heroku
• Entirely hosted on Heroku
• Heroku has been awesome; never needed an ops team.
• Allowed us to focus on building product.
But we ran into issues...
• “Internal” micro-services need to be exposed publicly.
• Databases need to be opened up to all traffic.
• Little visibility into performance of hosts.
• No control over the routing layer.
What do we want?
• Want to use AWS services.
• Want to maintain operational simplicity.
• Support 12 factor apps. http://12factor.net/
• Maintain shared patterns for deployment. Faster iteration and build + release cycles
• No ops.
• Decrease our surface area and only expose a single app publicly.
• Robust and resilient to failure. Self-healing.
• If we can, continue to use containers as a unit of deployment.
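One of the 12 factor principles is that an app reads its configuration from the environment rather than from files baked into the image. A minimal sketch of that idea follows; the variable names `DATABASE_URL` and `PORT` and the default port are assumptions chosen for illustration, not something taken from the talk.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Config:
    database_url: str
    port: int


def load_config(environ=os.environ) -> Config:
    """Build the app's config from environment variables only."""
    # DATABASE_URL and PORT are illustrative names, not from the talk.
    try:
        database_url = environ["DATABASE_URL"]
    except KeyError:
        raise RuntimeError("DATABASE_URL must be set in the environment")
    # Fall back to an assumed default port when PORT is not provided.
    return Config(database_url=database_url, port=int(environ.get("PORT", "8080")))
```

Because the same image only differs by its environment, the identical container can run in development, staging, and production.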
Why containers?
• Fast to build*
• Let us isolate dependencies as a portable, easy-to-distribute package.
• Allow us to create better development environments with more dev/prod parity.
• Limit the number of moving parts when we deploy.
• Better resource utilization and cost management
We’re not the first company to want a PaaS
• Netflix - Asgard
• SoundCloud - Bazooka
• Every other company in our investor’s portfolio...
Something we can re-use?
• Flynn
  – Alpha
  – Undergoing many architectural changes
  – Custom load balancer
• Deis
  – More than it needed to be
  – Nobody using it successfully in production (that we knew of)
Empire was born
• Initially started as a management layer on top of CoreOS + fleet.
• Load balancing via nginx configured through confd + etcd.
• Unit of deployment was Docker containers
• Implemented a subset of the Heroku API
Therein lies the rub...
• Fleet initially worked well, until we started testing failure modes.
• Fleet had a lot of bugs
• etcd was fragile
• We needed resilience and stability
• We didn’t want to run and operate our own clustering.
Amazon EC2 Container Service (ECS) becomes GA
• Amazon ECS became GA while we were looking for an alternative scheduler.
• Looked promising to serve as the scheduling backend.
What is Amazon ECS?
• Pools hosts together as a single compute resource.
• Provides a set of APIs for placing tasks on machines
• Scheduler supports “services” for scaling tasks horizontally and maintaining desired state.
• Services integrate with ELB for connection draining, zero downtime, and healthchecks.
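The "maintaining desired state" idea can be pictured as a reconciliation step: compare how many tasks should be running with how many are, then start or stop tasks to close the gap. The sketch below is a toy model under that assumption. It is not how Amazon ECS is implemented, and the `Service` class, `reconcile` function, and task ID format are all made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Service:
    name: str
    desired_count: int
    running: list = field(default_factory=list)  # IDs of running tasks


def reconcile(service, next_id):
    """Return the actions needed to move `service` to its desired count."""
    actions = []
    diff = service.desired_count - len(service.running)
    if diff > 0:
        # Too few tasks: start enough new ones to reach the desired count.
        for i in range(diff):
            actions.append(("start", f"{service.name}-{next_id + i}"))
    elif diff < 0:
        # Too many tasks: stop the most recently listed ones.
        for task_id in service.running[diff:]:
            actions.append(("stop", task_id))
    return actions
```

If a host dies and takes a task with it, the running list shrinks and the next reconciliation pass emits a "start" action, which is the self-healing behavior described above.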
Amazon ECS for Empire
• Solid set of primitives to serve as the scheduling backend
• Managed service
• Failure modes behaved as we expected them to
• ELB integration allowed us to remove custom routing layer
• Service discovery via DNS
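With DNS-based service discovery, a client only needs to know another service's hostname and can look up its current addresses at call time. A minimal sketch of such a lookup using Python's standard library follows; the `resolve_service` helper and the injectable `resolver` parameter are illustrative assumptions, and the talk does not say which hostnames Empire assigns.

```python
import socket


def resolve_service(hostname, port, resolver=socket.getaddrinfo):
    """Return the sorted, de-duplicated IP addresses a hostname resolves to."""
    # getaddrinfo returns (family, type, proto, canonname, sockaddr) tuples;
    # sockaddr[0] is the IP address for both IPv4 and IPv6 results.
    infos = resolver(hostname, port, proto=socket.IPPROTO_TCP)
    return sorted({info[4][0] for info in infos})
```

Passing the resolver as a parameter keeps the function easy to exercise without a network, for example by handing it a stub that returns fixed addresses.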
What is Empire?
• Open source internal PaaS for micro-services
• A layer of usability on top of Amazon ECS for 12 factor apps
• Single binary. Minimal deps. Easy to run.
• Provides an API and CLI to create apps, deploy docker images, update configuration, run one off tasks etc.
• Allows you to use Procfiles to build multiple Amazon ECS services
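A Procfile lists one process type per line in the form `name: command`, so a single app can fan out into several services (for example a web process and a worker). The sketch below shows one way to parse that format and derive one service name per process. It is not Empire's actual code, and the `<app>--<process>` naming scheme is invented for this example.

```python
def parse_procfile(text):
    """Parse 'name: command' lines into a dict of process name -> command."""
    processes = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        # Split on the first colon only, so commands may contain colons.
        name, sep, command = line.partition(":")
        if not sep or not name.strip() or not command.strip():
            raise ValueError(f"invalid Procfile line: {line!r}")
        processes[name.strip()] = command.strip()
    return processes


def service_names(app, processes):
    # "<app>--<process>" is a naming scheme made up for this sketch.
    return [f"{app}--{name}" for name in processes]
```

For a hypothetical app named `acme` with `web` and `worker` entries, this yields two service names, one per process type, each of which could then be scaled independently.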
Is it ready for production?
• Running ~15 production services within Amazon ECS managed via Empire for a little over a month
• Empire is hands off after you’ve deployed. AWS services take over
• Moving directly onto EC2 showed huge performance improvements for services
What does Empire not do?
• Bring your own logging and metrics (soon?)
• It doesn’t handle building your Docker images
• Doesn’t handle the creation of attached resources like Databases
Thank you
• GH: @ejholmes
• Twitter: @vesirin
• https://github.com/remind101/empire
• https://github.com/ejholmes/empire-demo
• http://12factor.net/