making spinnaker go @ stitch fix
Making Spinnaker Go
@ Stitch Fix
Diana Tkachenko, Data Platform Engineer
Spinnaker Is Not Yet in Production
Let me tell you an awesome story of how to install and set up Spinnaker to make it work for you!
I. Our Infrastructure
II. Setting Up Spinnaker
III. Authentication on Spinnaker
PART I Our Infrastructure Pre-Spinnaker
100% of Infrastructure on AWS
3 Peered VPCs
Isolate environments into different VPCs:
● TEST
○ testing deployments before pushing to prod
● PROD
○ all production deployments
● INFRA
○ tools that both prod and test need to use
Diagram: peered prod, test, and infra VPCs; tools shown: jenkins, artifactory, spinnaker, flotilla
Deployment Pipeline
Immutable Server Pattern
● Package Code into RPMs
● Bake AMI from RPM
● Deploy
○ Set up Launch Config with AMI
○ Create ASG
○ Set up ELBs, Route53
Process Overview
Repeatable Deployment Process
● One-time setup, to create an application:
○ create spec: the definition of the application; the RPM is built from this recipe
○ create ELB and create Route53: app “scaffolding” on AWS; Route53 points to ELB
● Iterative process for deploying new versions:
○ make changes to code ⇒ build RPM ⇒ bake AMI ⇒ launch ASG ⇒ attach to ELB
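A minimal Python sketch of the split above: the scaffolding steps run once per application, and the release steps repeat for every version. The helper names (one_time_setup, release, plan) are hypothetical, and the functions only return an ordered plan; nothing here talks to AWS.

```python
from typing import List

def one_time_setup(app: str) -> List[str]:
    """Scaffolding created once per application (hypothetical helper)."""
    return [
        f"create spec for {app}",                 # definition of the application
        f"create ELB for {app}",
        f"create Route53 record for {app} -> ELB",
    ]

def release(app: str, version: str) -> List[str]:
    """Steps repeated for every new version (hypothetical helper)."""
    return [
        f"build RPM {app}-{version}",             # built from the spec "recipe"
        f"bake AMI from {app}-{version}",
        "launch ASG from the new AMI",
        f"attach ASG to the {app} ELB",
    ]

def plan(app: str, versions: List[str]) -> List[str]:
    """Scaffolding once, then one release per version."""
    steps = one_time_setup(app)
    for version in versions:
        steps += release(app, version)
    return steps

if __name__ == "__main__":
    for step in plan("sf-helloworld", ["0.0.1", "0.0.2"]):
        print(step)
```

Shipping two versions adds two release blocks but never repeats the ELB or Route53 setup, which is what makes the process repeatable.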
Step 1: Build RPM from Spec
We wrote simple tools to create the RPM:
● Create spec file from template
● Customize spec file
● Jenkins job to build RPM
The process appears complex:
● The spec file seems scary for users
● But it makes deployment easy down the line!
Name: sf-helloworld
Version: 0.0.1
Release: 1
Summary: YOUR SUMMARY HERE!
Group: Development/Libraries
License: stitchfix-internal
BuildArch: noarch
AutoReqProv: no
# BuildRequires: (list build-time dependencies here; rpmbuild rejects an empty tag)
Requires: sf-base, sf-aa, sf-nginx

%description
YOUR DESCRIPTION HERE!

%install
mkdir -p $RPM_BUILD_ROOT{/stitchfix,/etc/init.d}
cp -R %{_sourcedir} $RPM_BUILD_ROOT/stitchfix/%{base_name}
cp %{_topdir}/SCRIPTS/sf-%{base_name} $RPM_BUILD_ROOT/etc/init.d/sf-%{base_name}

%files
/stitchfix/%{base_name}
/etc/init.d/sf-%{base_name}

%post
ln -s /etc/nginx/sites-available/sf-app.conf /etc/nginx/sites-enabled/sf-app.conf
/usr/bin/pip-2.7 install -e /stitchfix/%{base_name}
chkconfig --add %{name}
chkconfig --levels 345 %{name} on
sf-helloworld.spec
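The "create spec file from template" step could look roughly like the sketch below. This is not the Stitch Fix tool. It assumes a hypothetical template in which only the header fields are parameterized, and it uses Python's string.Template, so any literal $ in a real spec body (such as $RPM_BUILD_ROOT) would have to be escaped as $$.

```python
from string import Template
from typing import List

# Hypothetical header template; the %install/%files/%post sections would follow.
SPEC_TEMPLATE = Template("""\
Name: $name
Version: $version
Release: 1
Summary: $summary
Group: Development/Libraries
License: stitchfix-internal
BuildArch: noarch
AutoReqProv: no
Requires: $requires
""")

def render_spec(name: str, version: str, summary: str, requires: List[str]) -> str:
    """Fill in the spec header; `requires` is a list of RPM package names."""
    return SPEC_TEMPLATE.substitute(
        name=name,
        version=version,
        summary=summary,
        requires=", ".join(requires),
    )

if __name__ == "__main__":
    print(render_spec("sf-helloworld", "0.0.1", "Hello world app",
                      ["sf-base", "sf-aa", "sf-nginx"]))
```

substitute() raises KeyError if a placeholder is left unfilled. That makes a half-customized spec fail loudly instead of reaching rpmbuild.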
Step 2: Bake AMI
● Used aminator (also from Netflix) to create AMIs
● Jenkins job for baking
How does AMI get baked?
1. Create volume from base AMI id
2. Attach and mount volume
3. Chroot into volume
4. Install RPM on volume
5. Create snapshot from volume
6. Register AMI from snapshot
Diagram: the EC2 instance (baking machine) gets the RPM from Artifactory (RPM repo) and installs it onto the volume
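The six bake steps can be written out as a dry-run plan. This sketch is not aminator's code. The AWS steps are named after the EC2 API actions (CreateVolume, AttachVolume, CreateSnapshot, RegisterImage) and are not executed. The device name, mount point, and base snapshot id are hypothetical. A real bake also bind-mounts /dev and /proc into the chroot and points yum at the RPM repo; both are omitted here.

```python
import shlex
from typing import List

def bake_plan(base_snapshot: str, rpm: str,
              device: str = "/dev/xvdf", mountpoint: str = "/mnt/bake") -> List[str]:
    """Return the bake steps in order as a dry-run plan (nothing is executed)."""
    return [
        f"EC2 CreateVolume from {base_snapshot}",          # 1. volume from the base AMI's root snapshot
        f"EC2 AttachVolume as {device}",                   # 2a. attach to the baking machine
        shlex.join(["mount", device, mountpoint]),         # 2b. mount it
        shlex.join(["chroot", mountpoint,                  # 3 + 4. install the RPM inside the chroot
                    "yum", "install", "-y", rpm]),
        shlex.join(["umount", mountpoint]),                # flush writes before snapshotting
        "EC2 CreateSnapshot of the volume",                # 5.
        "EC2 RegisterImage from the snapshot",             # 6.
    ]

if __name__ == "__main__":
    for step in bake_plan("snap-0123", "sf-helloworld"):
        print(step)
```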
Step 3: Deploy
Diagram: the RPM is baked into the AMI (immutable server); the AMI and the Launch Config are both used to create the ASG of EC2 instances behind the ELB; internet traffic reaches Route53, which routes traffic to the ELB
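The immutable-server idea behind Step 3 can be modeled in a few lines: each deploy creates a new Launch Config pointing at the new AMI and a new ASG attached to the same long-lived ELB, and running ASGs are never modified. The class and function names below are hypothetical. This models the relationships only and is not an AWS client.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass(frozen=True)
class LaunchConfig:
    name: str
    ami_id: str                 # the AMI the RPM was baked into

@dataclass(frozen=True)
class ASG:
    name: str
    launch_config: LaunchConfig
    elb: str                    # ELB the ASG is attached to

@dataclass
class App:
    name: str
    elb: str                    # created once; Route53 points here
    asgs: List[ASG] = field(default_factory=list)

def deploy(app: App, version: str, ami_id: str) -> ASG:
    """Immutable deploy: never touch a running ASG, always create a new one."""
    lc = LaunchConfig(name=f"{app.name}-{version}", ami_id=ami_id)
    asg = ASG(name=f"{app.name}-{version}", launch_config=lc, elb=app.elb)
    app.asgs.append(asg)        # older ASGs stay as they were until torn down
    return asg
```

Because the old ASG is left untouched, rolling back just means routing the ELB back to the previous group.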
Why Spinnaker?
80 Data Scientists
10 Platform Engineers
Our data scientists are responsible for:
● Building ETLs
● Deploying Dashboards and Services
We value self-service!
PART II Setting Up Spinnaker in Our Infrastructure
Key Differences from the Netflix Setup
1. Amazon Linux instead of Ubuntu
   a. Adding RPM support to Gradle
   b. System V instead of Upstart
2. Nginx instead of Apache
3. Secured Redis on AWS
4. No Cassandra in Existing Architecture
And how to handle them
Diff #1
You drew the short straw with Amazon Linux (Red Hat) instead of Ubuntu
Adding RPM Support to Gradle
Create the buildRpm block:
● add our rpm repo in /etc/yum.repos.d on bake machine
● add dependency rpms inside the block
● make sure to build all the other spinnaker rpms and push to your rpm repo
./gradlew buildRpm
// Ubuntu
buildDeb {
    requires('redis-server', '3.0.5', GREATER | EQUAL)
    requires('spinnaker-clouddriver')
    requires('spinnaker-deck')
    requires('spinnaker-echo')
    requires('spinnaker-front50')
    requires('spinnaker-gate')
    requires('spinnaker-igor')
    requires('spinnaker-orca')
    requires('spinnaker-rosco')
    requires('spinnaker-rush')
    requires('apache2')
}

// CentOS
buildRpm {
    requires('sf-nginx')
    requires('sf-base')
    requires('spinnaker-clouddriver')
    requires('spinnaker-deck')
    requires('spinnaker-echo')
    requires('spinnaker-front50')
    requires('spinnaker-gate')
    requires('spinnaker-igor')
    requires('spinnaker-orca')
    requires('spinnaker-rosco')
    requires('spinnaker-rush')
    os = LINUX // ⇐ YOU NEED THIS MAGIC LINE!
}
[spinnaker] build.gradle
Upstart on Amazon Linux
Different startup systems:
● We use System V (ancient)
○ service nginx start
○ startup scripts in /etc/init.d
○ chkconfig for starting on bootup
● Spinnaker uses upstart
○ initctl start spinnaker
○ conf files in /etc/init
Another Issue:
● Amazon Linux ships upstart 0.6.5, which is far older than the 1.4 on Ubuntu
description "rosco"
start on filesystem or runlevel [2345]

# not supported in old version
# so for amazon linux we remove these lines:
# setuid spinnaker
# setgid spinnaker

expect fork
stop on stopping spinnaker

env HOME=/home/spinnaker
# send stdout to the log first, then point stderr at the same file
exec /opt/rosco/bin/rosco > /var/log/spinnaker/rosco/rosco.log 2>&1 &
[rosco] /etc/init/rosco.conf
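If the upstream upstart files are reused at build time, removing the unsupported stanzas can be scripted rather than done by hand. The sketch below is a hypothetical filter. It drops any line whose first word is setuid or setgid, which are the two stanzas the slide says upstart 0.6.5 rejects, and it keeps comments and everything else untouched.

```python
import sys
from typing import Iterable

# Stanzas the slide says upstart 0.6.5 on Amazon Linux does not support.
UNSUPPORTED_STANZAS = ("setuid", "setgid")

def strip_unsupported(conf_text: str,
                      unsupported: Iterable[str] = UNSUPPORTED_STANZAS) -> str:
    """Return conf_text without lines whose first word is an unsupported stanza."""
    drop = frozenset(unsupported)
    kept = []
    for line in conf_text.splitlines():
        words = line.split()
        if words and words[0] in drop:
            continue                     # e.g. "setuid spinnaker"
        kept.append(line)                # comments ("# setuid ...") and the rest survive
    return "\n".join(kept) + "\n"

if __name__ == "__main__":
    sys.stdout.write(strip_unsupported(sys.stdin.read()))
```

Usage (file names hypothetical): python strip_upstart.py < rosco.conf > rosco.amzn.conf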
Diff #2
You’re hip and use Nginx instead of Apache
Namespace Gate and Rosco in Nginx
● include /etc/nginx/sites-enabled/*; in the main nginx conf
● on deploy, symlink /etc/nginx/sites-available/spinnaker.conf => /etc/nginx/sites-enabled/spinnaker.conf
[spinnaker] /etc/nginx/sites-available/spinnaker.conf
# all services on the same machine
server {
    listen 80;

    location / {
        root /opt/deck/html;
    }

    # namespacing gate
    location ~* ^/gate/ {
        rewrite ^/gate/(.*) /$1 break;
        proxy_pass http://localhost:8084;
    }

    # namespacing rosco
    location ~* ^/rosco/ {
        rewrite ^/rosco/(.*) /$1 break;
        proxy_pass http://localhost:8087;
    }
}
Diagram: spinnaker.<internal-domain>.com ⇒ ELB (HTTP 80 ⇒ HTTP 80) ⇒ nginx on port 80 on the EC2 instance, which routes:
● / => /opt/deck/html
● /gate/health => localhost:8084/health
● /rosco/health => localhost:8087/health
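To check the namespacing logic without running nginx, the sketch below mimics what the two location blocks do to a request path: match the /gate/ or /rosco/ prefix, strip it the way rewrite ^/gate/(.*) /$1 break; does, and proxy to the local port. Anything else falls through to the static deck files. This is a simplification: it handles lowercase prefixes only (nginx's ~* location match is case-insensitive) and ignores query strings.

```python
import re
from typing import Optional, Tuple

# Namespace prefix -> backend port, mirroring the two location blocks.
NAMESPACES = {"gate": 8084, "rosco": 8087}

def route(path: str) -> Tuple[str, Optional[str]]:
    """Return ("proxy", upstream URL) for namespaced paths, else ("static", None)."""
    for prefix, port in NAMESPACES.items():
        m = re.match(rf"^/{prefix}/(.*)", path)
        if m:
            # rewrite ^/<prefix>/(.*) /$1 break;  strips the namespace before proxy_pass
            return "proxy", f"http://localhost:{port}/{m.group(1)}"
    return "static", None                # served from root /opt/deck/html
```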
Diff #3
You happily use AWS ElastiCache for Redis, but find out Spinnaker angers it
AWS ElastiCache is Special
AWS Redis won’t let you issue CONFIG commands!
● Redis version has to be >= 2.8.0
● On the AWS ElastiCache console, add notify-keyspace-events=Egx to a new parameter group
○ this enables Redis keyevent notifications (E) for generic commands (g) and expired keys (x)
● In gate.yml, add
redis.configuration.secure=true
server:
  port: ${services.gate.port:8084}
  address: ${services.gate.host:localhost}
...
redis:
  connection: ${services.redis.connection}
  # add the following two lines if using aws redis
  configuration:
    secure: true
[spinnaker] /config/gate.yml
Diagram: AWS Redis 2.8.0 with the spinnaker parameter group setting notify-keyspace-events=Egx
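The Egx value is easy to misread, so here is a small decoder for the notify-keyspace-events flags. The flag meanings follow the Redis keyspace-notifications documentation, limited to the 2.8-era flag set. It also encodes the documented rule that nothing is published unless at least one channel type (K or E) and one event class are present.

```python
from typing import List

CHANNELS = {
    "K": "keyspace channel (__keyspace@<db>__)",
    "E": "keyevent channel (__keyevent@<db>__)",
}
CLASSES = {
    "g": "generic commands (DEL, EXPIRE, RENAME, ...)",
    "$": "string commands",
    "l": "list commands",
    "s": "set commands",
    "h": "hash commands",
    "z": "sorted set commands",
    "x": "expired events",
    "e": "evicted events",
}
ALIAS_A = "g$lshzxe"  # "A" is an alias for all of the event classes above

def decode(value: str) -> List[str]:
    """Explain each flag in a notify-keyspace-events value, in order."""
    meanings: List[str] = []
    for flag in value.replace("A", ALIAS_A):
        meaning = CHANNELS.get(flag) or CLASSES.get(flag)
        if meaning is None:
            raise ValueError(f"unknown flag {flag!r}")
        if meaning not in meanings:
            meanings.append(meaning)
    return meanings

def delivers_anything(value: str) -> bool:
    """Redis publishes only if a channel (K/E) and at least one event class are set."""
    flags = value.replace("A", ALIAS_A)
    return any(f in CHANNELS for f in flags) and any(f in CLASSES for f in flags)
```

For example, "gx" on its own enables nothing, because it names no channel.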
Diff #4
You’d like a quick Cassandra hack since you are Cassandra-less
Quick EBS Backed Cassandra Node
We didn’t want an entire cluster, just a fast setup, so we created a single-node Cassandra:
● EBS-backed store for Cassandra data
● Startup script remaps the Route53 entry on each deployment
○ Point straight to the EC2 instance, not an ELB
On redeploy or termination:
● The EBS volume detaches, so data is not lost
● cassandra.<internal-domain>.com is mapped to the new EC2 instance
Diagram: cassandra.<internal-domain>.com points to the Cassandra EC2 instance, whose data lives on an EBS volume mounted at /cassandra-storage
# change all store dirs to EBS
data_file_directories:
    - /cassandra-storage/data
commitlog_directory: /cassandra-storage/commitlog
saved_caches_directory: /cassandra-storage/saved_caches

# point all to private route53 entry
seed_provider:
    - class_name: org.apache.cassandra.locator.SimpleSeedProvider
      parameters:
          - seeds: "cassandra.<internal-domain>.com"
listen_address: cassandra.<internal-domain>.com
rpc_address: cassandra.<internal-domain>.com
/etc/cassandra/conf/cassandra.yaml
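The startup script's job of remapping Route53 on each deployment comes down to one UPSERT. The sketch below builds the ChangeBatch JSON that Route53's ChangeResourceRecordSets call accepts. It assumes an A record pointing at the instance's private IP and a 60-second TTL; neither detail is given in the slides. The resulting file could be passed to aws route53 change-resource-record-sets --hosted-zone-id <zone-id> --change-batch file://batch.json. The private IP itself is available from instance metadata at http://169.254.169.254/latest/meta-data/local-ipv4.

```python
import json

def upsert_change_batch(record_name: str, private_ip: str, ttl: int = 60) -> str:
    """Build a Route53 ChangeBatch pointing record_name straight at this instance.

    Assumes an A record to the private IP (not an ELB alias); TTL is an assumption.
    """
    batch = {
        "Comment": "Point the Cassandra name at the current EC2 instance",
        "Changes": [{
            "Action": "UPSERT",                    # create or overwrite in one call
            "ResourceRecordSet": {
                "Name": record_name,
                "Type": "A",
                "TTL": ttl,
                "ResourceRecords": [{"Value": private_ip}],
            },
        }],
    }
    return json.dumps(batch, indent=2)

if __name__ == "__main__":
    print(upsert_change_batch("cassandra.<internal-domain>.com", "10.0.1.5"))
```

UPSERT is the natural choice here: the same script works on the very first boot and on every redeploy after it.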
Overview: Spinnaker on AWS
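Diagram: Spinnaker on an ASG of EC2 instances behind an ELB
● Route53 CNAME spinnaker.<internal-domain>.com points to the load balancer
● Load balancer listener: HTTP 80 ⇒ HTTP 80
● Services on each EC2 instance (port):
○ nginx: 80 (serves deck)
○ clouddriver: 7002
○ front50: 8080
○ orca: 8083
○ gate: 8084
○ rush: 8085
○ rosco: 8087
○ igor: 8088
○ echo: 8089
● Deck, rosco, and gate are reached through nginx; gate calls everything else
● Backing stores: cassandra, redis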
PART III Auth on Spinnaker: Keep Calm
SSL + Auth on Spinnaker
● Where to Terminate SSL?
● Glory and the Beast of Self-Signed Certs
● Google OAuth 2.0 Redirects Mess up Nginx Rewrites
● Tomcat Ignores Client Certs for Client Auth
Get ready to read a lot of stack traces
SSL: Dilemma #1
Where to terminate SSL:
a. ELB
b. Nginx
c. Server
Nginx to Terminate SSL for Deck, Rosco
● Configure nginx with cert and key and turn ssl on
● Nginx now cannot start on bootup because the key needs a password
○ Add the password to a file and point nginx at it (ssl_password_file)
● Now our health check is messed up
○ Add a listener on port 5000 for an easy ELB health check
● Optional 80 => 443 redirect
● Notice how the gate rewrite is gone…
○ it has to do with OAuth redirects
server {
    listen 5000;
    location / {
        # default_type avoids a duplicate Content-Type header from add_header
        default_type text/plain;
        return 200 'POOOOOOOOP';
    }
}

# optional redirect here
server {
    listen 80;
    return 301 https://$host$request_uri;
}

server {
    listen 443 ssl;
    ssl_password_file /etc/keys/spinnaker.pass;
    ssl_certificate /opt/spinnaker/ssl/server.crt;
    ssl_certificate_key /opt/spinnaker/ssl/server.key;

    location / {
        root /opt/deck/html;
    }

    location ~* ^/rosco/ {
        rewrite ^/rosco/(.*) /$1 break;
        proxy_pass http://localhost:8087;
    }
}
[spinnaker] /etc/nginx/sites-available/spinnaker.conf
For Gate, Pass Through SSL Directly to Server
We want the ELB to just pass traffic through to gate without decrypting it:
● Bypass nginx for gate: ELB listener TCP 8084 ⇒ TCP 8084 for gate SSL
Gate is responsible for all types of authentication:
● Have a client certificate?
○ Authenticate the client certificate; this is why gate needs to terminate SSL
● No client certificate?
○ Send to Google OAuth
Diagram: ELB for spinnaker.<internal-domain>.com with listeners
● HTTP 80 ⇒ HTTP 80 and TCP 443 ⇒ TCP 443 to nginx on the EC2 instance (nginx redirects 80 ⇒ 443)
● TCP 8084 ⇒ TCP 8084 straight to gate
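The reasoning behind the listener layout fits in a small lookup table: a TCP listener lets the ELB relay encrypted bytes untouched, so only the process that terminates TLS can inspect a client certificate. The names in this sketch are hypothetical. It restates the ELB listener layout for spinnaker.<internal-domain>.com and does not configure anything.

```python
from typing import Dict, NamedTuple

class Listener(NamedTuple):
    elb_protocol: str    # "HTTP": ELB speaks HTTP itself; "TCP": ELB just relays bytes
    backend: str         # process on the EC2 instance that receives the connection
    tls_terminator: str  # who decrypts TLS; "none" for plain HTTP

# ELB port -> listener, following the ELB listener layout
LISTENERS: Dict[int, Listener] = {
    80: Listener("HTTP", "nginx:80 (301 redirect to 443)", "none"),
    443: Listener("TCP", "nginx:443", "nginx"),
    8084: Listener("TCP", "gate:8084", "gate"),
}

def can_see_client_cert(port: int, component: str) -> bool:
    """Only the component that terminates TLS can inspect a client certificate."""
    listener = LISTENERS[port]
    return listener.elb_protocol == "TCP" and listener.tls_terminator == component
```

Gate sees client certs only on 8084. That is why gate gets its own pass-through listener instead of sitting behind nginx on 443.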
SSL: Dilemma #2
Self-signed certs? Meet your new best friends, the Java TrustStores
Tomcat Needs CA to Be in Trust Store
Because we are using self-signed certs, our self-created CA has to be in the truststore:
● Add spinnaker cert to java keystore using keytool utility
● Add keystore/truststore file location to gate-local.yml config
server:
  ssl:
    enabled: true
    keyStore: /opt/spinnaker/ssl/keystore.jks
    keyStorePassword: poop
    keyAlias: server
    trustStore: /opt/spinnaker/ssl/keystore.jks
    trustStorePassword: poop
/opt/spinnaker/conf/gate-local.yml
But at some point I still had problems, so here’s a quick hack: add your CA to Java’s default CA file:
$JAVA_HOME/jre/lib/security/cacerts
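Both truststore steps are a keytool -importcert call, so a dry-run helper that prints the commands makes them easy to script. The CA file name (ca.crt) and alias (spinnaker-ca) are hypothetical, and so is the /usr/lib/jvm/java fallback for JAVA_HOME. "changeit" is the JDK's stock password for cacerts. The jre/lib/security path matches Java 8 and earlier; newer JDKs keep cacerts under lib/security.

```python
import os
import shlex
from typing import List

def keytool_import(cert_file: str, keystore: str, storepass: str,
                   alias: str = "spinnaker-ca") -> List[str]:
    """Build a keytool command that imports a CA cert as a trusted entry."""
    return ["keytool", "-importcert", "-noprompt", "-trustcacerts",
            "-alias", alias, "-file", cert_file,
            "-keystore", keystore, "-storepass", storepass]

def default_cacerts(java_home: str) -> str:
    """Path of the JVM's default CA file (Java 8 layout)."""
    return os.path.join(java_home, "jre", "lib", "security", "cacerts")

if __name__ == "__main__":
    # 1. the keystore/truststore that gate-local.yml points at
    print(shlex.join(keytool_import("ca.crt", "/opt/spinnaker/ssl/keystore.jks", "poop")))
    # 2. the quick hack: the JVM-wide default CA file
    java_home = os.environ.get("JAVA_HOME", "/usr/lib/jvm/java")
    print(shlex.join(keytool_import("ca.crt", default_cacerts(java_home), "changeit")))
```

Pass either list to subprocess.run(cmd, check=True) to actually perform the import.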
OAuth: Dilemma #3
Google OAuth 2.0 redirects trample all over your Nginx rewrites
Remove Namespacing for Gate & Bypass Nginx
● Set redirect_uri to our gate address: https://spinnaker.<internal-domain>.com:8084/login
● Gate can no longer be namespaced: on redirect, the /gate prefix in the path is lost because only $host is recorded
Diagram: OAuth 2.0 authorization code flow between the Web Browser (deck javascript), Spinnaker (gate), and the Google Auth Server, with redirect URI https://spinnaker.<internal-domain>.com:8084/login
1. User authorization request
2. User authorizes application
3. Auth code grant
4. Access token request
5. Access token grant
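The first leg of that flow is a browser redirect to Google carrying the redirect_uri, so that URI has to be the exact, unprefixed gate /login address registered with Google. The sketch below builds such an authorization URL. The endpoint is Google's OAuth 2.0 authorization endpoint; the client id, state value, and scope list are placeholders and assumptions, not gate's actual settings.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

GOOGLE_AUTH_ENDPOINT = "https://accounts.google.com/o/oauth2/v2/auth"

def authorization_url(client_id: str, redirect_uri: str, state: str) -> str:
    """Build the 'user authorization request' URL for the authorization code flow."""
    params = {
        "response_type": "code",        # ask for an auth code, exchanged for a token later
        "client_id": client_id,
        "redirect_uri": redirect_uri,   # must match the URI registered with Google exactly
        "scope": "openid email profile",  # assumed scopes
        "state": state,                 # echoed back to protect against CSRF
    }
    return f"{GOOGLE_AUTH_ENDPOINT}?{urlencode(params)}"

if __name__ == "__main__":
    url = authorization_url("YOUR_CLIENT_ID",
                            "https://spinnaker.example.com:8084/login", "xyz")
    print(parse_qs(urlsplit(url).query)["redirect_uri"])
```

Google sends the browser back to redirect_uri verbatim. Because the redirect is built from the host alone, a /gate/ path prefix that only nginx knew about never makes it into the address, so the callback lands on a path gate does not serve.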
Client Auth: Dilemma #4
Tomcat doesn’t seem to care about your client cert
Make Tomcat Request Client Cert for Client Auth
We need to enable scripts to post tasks to spinnaker with client authentication:
● Create certs for client
● Configure gate tomcat to validate client cert
Diagram: Beakhead (Spinnaker client) ⇒ Spinnaker Gate at spinnaker.<internal-domain>.com:8084
x509:
  enabled: true
  subjectPrincipalRegex: 'CN=(.*?)(?:,|$)'

server:
  ssl:
    clientAuth: want
    enabled: true
    keyStore: /opt/spinnaker/ssl/keystore.jks
    keyStorePassword: poop
    keyAlias: server
    trustStore: /opt/spinnaker/ssl/keystore.jks
    trustStorePassword: poop
/opt/spinnaker/conf/gate-local.yml
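Why subjectPrincipalRegex needs a terminator after the lazy group: Spring Security's X.509 support takes the user name from group 1 of a find() over the certificate's subject DN, and its own default pattern is CN=(.*?)(?:,|$). With a bare CN=(.*?), the lazy group is allowed to match nothing, so the extracted principal comes back empty. The sketch reproduces this with Python's re, which behaves the same way as Java's regex engine for these patterns. The DN is a hypothetical Beakhead client cert subject.

```python
import re

LAZY_ONLY = r"CN=(.*?)"          # lazy group with nothing after it
ANCHORED = r"CN=(.*?)(?:,|$)"    # stop at the next comma or at the end of the DN

def principal(regex: str, subject_dn: str) -> str:
    """Mimic find() + group(1), the way the principal is pulled out of the DN."""
    m = re.search(regex, subject_dn)
    return m.group(1) if m else ""

if __name__ == "__main__":
    dn = "CN=beakhead,OU=Data Platform,O=Stitch Fix"   # hypothetical subject DN
    print(repr(principal(LAZY_ONLY, dn)))   # ''
    print(repr(principal(ANCHORED, dn)))    # 'beakhead'
```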
POST /tasks: include client cert in request
● Layered authentication on gate
● Tomcat validates the cert: it has to recognize the cert authority from the truststore
● Returns response if authenticated
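A script such as Beakhead could make that POST /tasks call with nothing but the Python standard library, as sketched below. The client cert and key file names, the CA file, and the task payload are hypothetical; the real task JSON is not shown in the slides. create_default_context verifies the server hostname, so gate's cert must be issued for spinnaker.<internal-domain>.com.

```python
import http.client
import json
import ssl
from typing import Dict, Tuple

def build_task_request(task: Dict) -> Tuple[str, str, bytes, Dict[str, str]]:
    """Method, path, body, and headers for submitting a task to gate."""
    body = json.dumps(task).encode("utf-8")
    return "POST", "/tasks", body, {"Content-Type": "application/json"}

def client_context(ca_file: str, cert_file: str, key_file: str) -> ssl.SSLContext:
    """TLS context that trusts our self-signed CA and presents the client cert."""
    ctx = ssl.create_default_context(cafile=ca_file)            # verify gate against our CA
    ctx.load_cert_chain(certfile=cert_file, keyfile=key_file)   # offered when gate asks (clientAuth: want)
    return ctx

def post_task(host: str, task: Dict, ctx: ssl.SSLContext, port: int = 8084) -> Tuple[int, bytes]:
    """Send the task to gate over TLS with client auth; return status and body."""
    method, path, body, headers = build_task_request(task)
    conn = http.client.HTTPSConnection(host, port, context=ctx, timeout=30)
    try:
        conn.request(method, path, body=body, headers=headers)
        resp = conn.getresponse()
        return resp.status, resp.read()
    finally:
        conn.close()

if __name__ == "__main__":
    ctx = client_context("ca.crt", "beakhead.crt", "beakhead.key")   # hypothetical files
    task = {"application": "sf-helloworld", "description": "example", "job": []}  # hypothetical
    print(post_task("spinnaker.<internal-domain>.com", task, ctx))
```

If Tomcat can validate the cert against its truststore, the principal is taken from the cert's subject and the task is accepted without any OAuth round trip.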
PART IV Takeaways: What We Learned
Spinnaker is complex!
There are barriers to overcome if working with different infrastructure.
I learned a lot about SSL, OAuth 2.0 and Client Authentication.
Like a lot.
Thanks for Listening!
We are very much looking forward to having Spinnaker in production.
Find me on spinnaker slack
@dtkachenko
All pictures used in this presentation credit to Allie Brosh hyperboleandahalf.blogspot.com