docker pipelines
TRANSCRIPT
1
Dev/QA/Ops Friendly Docker Pipeline
Chris Mague / Shokunin
12/13/2016
2
Today's Talk
- The Goal
- The Problem
- The Stack
- The Process
- The Conclusion
3
Quote
“a problem well put is half solved.” ― John Dewey
4
The Goal
“We want to release more frequently”
5
The Goal – Restated as Solvable
Build a continuous delivery pipeline for the Trulia Mobile API that is usable for all stakeholders.
6
The Problem(s) – Dev Version
- my code works on the shared dev host, but not on prod
- no real visibility into what is happening in prod
- troubleshooting is difficult
- the Ops team is not helpful
7
The Problem(s) – QA Version
- code tested in QA doesn’t work in prod
- inability to test multiple builds at the same time
- no shared language to bridge the Dev/Ops teams
- the Ops team is not helpful
8
The Problem(s) – Ops Version
- Dev/Stage environments are inconsistent
- Prod environment is unreproducible
- Files are copied around in prod
- Incoming requests are difficult to parse
9
The Problems – Stated as Solvable
- Need to build a common language (culture)
- Need to build a reproducible platform in all environments (tech)
- Need to provide automation and visibility tools (tech/culture)
10
The Stack
11
Docker
- Build a reproducible/immutable(ish) platform
- Control application dependencies
- Automated build capabilities
- Low overhead compared to virtualization
- Stateless application
12
Step 1 / Base Image
- Packer instead of Dockerfiles
- Puppet to build container
- Build on Jenkins
- Vagrant option available
- Tagged with latest
- Pushed to our Docker registry
13
Step 2 / Develop Locally
- create separate run directories per environment
- modules per environment
- consul_shared
14
Local Terraform
- Sets up the Docker container
- Sources variables
- Calls the shared keys
- Uses the run_location
15
Run Location
- list of containers
- only mobileapi-base is not cached
16
Run Location
- Run supervisor
- expose port 80 as 8080
- link to dependencies
- set env vars
- mount volumes
17
Configuration
- done in consul
- consul template to json
- creates /etc/trulia/<APPNAME>.json
- separated by environment
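
A minimal sketch of the idea on this slide, not the pipeline's actual consul-template setup. It assumes settings are stored as flat key/value pairs under a hypothetical `<env>/<app>/` prefix, and a plain dict stands in for Consul. The key names and the `render_config` function are illustrative only.

```python
import json


def render_config(kv, env, app):
    """Collect the keys under '<env>/<app>/' and return them as a JSON document."""
    prefix = f"{env}/{app}/"
    settings = {
        key[len(prefix):]: value
        for key, value in kv.items()
        if key.startswith(prefix)
    }
    return json.dumps(settings, indent=2, sort_keys=True)


# A plain dict stands in for the Consul key/value store (hypothetical keys).
kv_store = {
    "qa/mobileapi/db_host": "db.qa.example.com",
    "qa/mobileapi/log_level": "debug",
    "prod/mobileapi/db_host": "db.prod.example.com",
}

# The rendered text is what would be written to /etc/trulia/<APPNAME>.json.
print(render_config(kv_store, "qa", "mobileapi"))
```

Keeping the environment in the key prefix means the same application code can read one file path in every environment while the contents differ.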
18
Running
19
Step 3 / Kickoff
20
An aside on Jenkins
- Configure with Puppet
- Install SCM Sync Plugin
- Vanilla as possible
- Configure with Puppet
21
${BUILD_NUMBER}
Jenkins provides several environment variables. The build number assigned when the software is packaged now becomes our shared key.
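
As a rough illustration of using the build number as a shared key, the sketch below reads `BUILD_NUMBER` (a variable Jenkins sets for each build) and derives an image reference from it. The registry host `registry.example.com` and the `image_tag` helper are assumptions for the example, not names from the talk.

```python
import os


def image_tag(app, registry="registry.example.com"):
    """Build an image reference tagged with the Jenkins build number."""
    build = os.environ.get("BUILD_NUMBER")
    if not build:
        raise RuntimeError("BUILD_NUMBER is not set; run this inside a Jenkins job")
    return f"{registry}/{app}:{build}"


if __name__ == "__main__":
    # Outside Jenkins, fall back to a made-up build number for the demo.
    os.environ.setdefault("BUILD_NUMBER", "12")
    print(image_tag("tcd-mobileapi"))
```

Because every artifact carries the same number, Dev, QA and Ops can all refer to "build 12" and mean the same package, container and deployment.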
22
Communication
QA to Dev - “tcd-mobileapi(container) build 12 failed to pass smoke tests can you please look at class foo”
QA to Ops - “tcd-mobileapi(container) build 12 is having trouble connecting to the user database”
Ops to Dev - “after we rolled out tcd-mobileapi(container) build 12 we noticed the app_v1_userlookup(KPI) time doubled”
23
Pipeline - Package Software
- Spin up a build container
- Mount the current directory
- Pull in dependencies
- Build a .deb with FPM
- Push to aptly
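
The packaging step could look roughly like the sketch below, which only assembles an FPM command line and does not run it. Using the build number as the package version, the package name `mobileapi`, and the `build` directory are assumptions made for the example.

```python
def fpm_command(name, build_number, source_dir):
    """Return the argv for packaging source_dir as a .deb with FPM."""
    return [
        "fpm",
        "-s", "dir",              # input type: a plain directory
        "-t", "deb",              # output type: Debian package
        "-n", name,               # package name
        "-v", str(build_number),  # package version (assumed: the build number)
        "-C", source_dir,         # change into this directory before packaging
        ".",                      # package everything found there
    ]


print(" ".join(fpm_command("mobileapi", 12, "build")))
```

Inside the build container the list could be passed to `subprocess.run(..., check=True)`, so a failed package build fails the Jenkins job.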
24
25
Pipeline – Build Deployable Container
- Take base container
- Install packaged software
- Tag with build number
- Upload to registry
26
Docker tags
Be SUPER careful with latest
When in doubt do not use
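
One way to enforce this advice is a small guard in the pipeline that refuses image references without an explicit, non-latest tag. This is a sketch under that assumption; `require_pinned_tag` is a hypothetical helper, not part of Docker or the talk's tooling.

```python
def require_pinned_tag(image):
    """Return the image reference unchanged if it has an explicit, non-latest tag."""
    name, sep, tag = image.rpartition(":")
    # No colon at all, or the last colon belongs to a registry port
    # (the part after it still contains a '/').
    if not sep or "/" in tag:
        raise ValueError(f"{image!r} has no tag, so Docker would default to 'latest'")
    if tag == "latest":
        raise ValueError(f"{image!r} uses the mutable 'latest' tag")
    return image


print(require_pinned_tag("registry.example.com:5000/tcd-mobileapi:12"))
```

Rejecting untagged references matters as much as rejecting `latest`, since Docker treats a missing tag as `latest`.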
27
Pipeline – Run in QATCD
- Spin up container in our QATCD Nomad cluster
- Run terraform to update all of the configurations in consul
- Set up credentials using Vault
- container is now available at http://tcd-mobileapi-10.qatcd.example.com
28
Pipeline – Deploy Test
- health checks are crucial
  - needed for monitoring
  - needed for LB
  - needed for consul
  - get hit like 20 times/second
- engineer came up with the idea of deploy tests
  - only hit occasionally
  - more detailed
  - more resource heavy
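
The split between the two kinds of check might be sketched as follows. The `app` dictionary and the individual checks are invented for illustration; the real deploy tests are application specific.

```python
def health_check(app):
    """Cheap liveness probe: safe to call many times per second."""
    return app["running"]


def deploy_test(app, checks):
    """Run every detailed check once after a deploy; return the names that failed."""
    failures = []
    for name, check in checks.items():
        try:
            ok = check(app)
        except Exception:
            ok = False  # a crashing check counts as a failure
        if not ok:
            failures.append(name)
    return failures


# Hypothetical application state and checks.
app = {"running": True, "db_reachable": True, "user_count": 0}
checks = {
    "database": lambda a: a["db_reachable"],
    "user lookup": lambda a: a["user_count"] > 0,
}

print(health_check(app))
print(deploy_test(app, checks))
```

The health check stays trivial because the load balancer, monitoring and Consul call it constantly, while the deploy test can afford to touch real dependencies because it runs only after a rollout.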
29
Pipeline – Smoke test
- Calls another Jenkins server
- Managed by the QA team
- Detailed application level test
30
Pipeline - Repointer
- allows for static hostnames for applications or external testers
- does some checking
31
Pipeline – Next Steps
1) Preprod environment
 - Push configuration LIVE
 - Run a single container with the newer version
 - Other tests run
 - Build number is put in a Jenkins form and push button
2) Release to Production
 - Put a build number in a Jenkins form
 - Only allowed if the build is on preprod
 - Containers are rolled out with sleep and concurrency set
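
The production gate and the paced rollout can be illustrated with the sketch below. It is an assumption about the shape of the logic, not the actual Jenkins job: `release`, `batches`, the `deploy` callback and the container names are all hypothetical, and each batch is handled sequentially here for simplicity where a real rollout would update a batch in parallel.

```python
import time


def batches(items, size):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def release(build, preprod_builds, containers, deploy, concurrency=2, pause=0.0):
    """Roll a build out in batches, but only if it has already run on preprod."""
    if build not in preprod_builds:
        raise PermissionError(f"build {build} has not run on preprod")
    for index, batch in enumerate(batches(containers, concurrency)):
        if index:
            time.sleep(pause)  # wait between batches, not before the first
        for container in batch:
            deploy(container, build)


deployed = []
release(12, {11, 12}, ["c1", "c2", "c3"], lambda c, b: deployed.append((c, b)))
print(deployed)
```

Limiting the batch size and sleeping between batches keeps most containers on the old build while the first few prove the new one under real traffic.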
32
33
Pipeline
Dev, QA and Ops teams keep an eye on KPIs and various dashboards
QED
34
Internals
35
Nomad
- Job scheduler
- Not limited to Docker
- Integrates with Consul
- Easy setup
- Sane configuration
36
Nomad Config
37
Traefik
- HAProxy restart issue
- Performant
- Easily templatable configuration
- Nice quick front end
38
39
Vault / Consul Template
- Easily generate config files from key/value store
- Feature flags are easily implemented
- Store and filter database credentials
40
Logging
- Big challenge
- All Apache/Nginx logs include APPNAME/BUILD_NUMBER information and are in JSON format
- Application logs are in JSON format and often include unique IDs
- Stacktraces are fingerprinted
- Logstash picks up from the Nomad alloc dirs
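
A small sketch of what such log records could look like. The field names and the fingerprinting rule (mask digits, then hash) are assumptions chosen for the example; the talk does not say how its fingerprints are computed.

```python
import hashlib
import json
import re


def fingerprint(stacktrace):
    """Hash a stack trace after masking digits, so line numbers and IDs do not change it."""
    normalized = re.sub(r"\d+", "N", stacktrace)
    return hashlib.sha1(normalized.encode("utf-8")).hexdigest()[:12]


def log_line(appname, build_number, message, stacktrace=None):
    """Return one JSON log record that carries the app name and build number."""
    record = {"appname": appname, "build_number": build_number, "message": message}
    if stacktrace is not None:
        record["stacktrace_fingerprint"] = fingerprint(stacktrace)
    return json.dumps(record, sort_keys=True)


trace = 'File "lookup.py", line 42, in find_user\nKeyError: 1001'
print(log_line("tcd-mobileapi", 12, "user lookup failed", trace))
```

With a stable fingerprint, the same error seen in different builds groups together, and the build number in every record shows which release introduced it.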
41
42
43
44
Stats / KPIs
- Data is pulled from the logs and sent to statsd→influxdb with a Grafana front end
- Host and container level stats are picked up via cAdvisor
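
As an illustration of pulling a KPI out of the logs, the sketch below turns one JSON log record into a statsd timing line (`name:value|ms`). The log field names (`appname`, `build_number`, `kpi`, `duration_ms`) and the metric naming scheme are assumptions; sending the line over UDP is left out.

```python
import json


def statsd_timing(log_line):
    """Convert one JSON log record into a statsd timing metric line."""
    record = json.loads(log_line)
    name = f"{record['appname']}.{record['build_number']}.{record['kpi']}"
    return f"{name}:{record['duration_ms']}|ms"


# Hypothetical log record for the KPI mentioned earlier in the talk.
line = json.dumps({
    "appname": "tcd-mobileapi",
    "build_number": 12,
    "kpi": "app_v1_userlookup",
    "duration_ms": 250,
})

print(statsd_timing(line))
```

Putting the build number in the metric name is one way to compare a KPI before and after a rollout on the same dashboard.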
45
46
47
Troubleshooting
- Devs have exec access to all containers through Vault SSH
- This is audited
- After completion of any activities the container is terminated
48
No silver bullets...
- Unit tests are slow
- Initial learning curve
- Docker on anything other than Linux is painful
- Apps need to be modified
- Less control for devs compared to old method
49
Improvements
- Better troubleshooting tools
- Shared docker host for apps with heavy upstream dependencies
- More local services to make development easier
- Better training/support for desktop Docker issues
- More code libraries to handle common app issues
50
Thanks
Kevin – AppDynamics
Sonal Joshi – Trulia Sr. Automation Engineer
Vincent Lam – Trulia Sr. Application Developer