test driven infrastructure development (2 - puppetconf 2013 edition)

Test driven Infrastructure development

Tomas Doran, bobtfish@bobtfish.net, @bobtfish

Puppetconf 2013

Today, I’m going to talk about the promised land! And by ‘repeatable’, I mean I need to be able to spin up an arbitrary set of servers for any environment I want, whenever I want - so _all_ the configuration of all the instances has to be dynamic!

• High availability!

• Automated testing of all infrastructure changes

• Entirely repeatable application environments

• High confidence in changes

• Continuous integration and deployment for infrastructure

So who the hell am I?

Infrastructure automation nut! Ex-backend web developer, ex-security, currently fixing puppet at Yelp!

Dev / Ops

The state of repeatability and testing in infrastructures is generally shocking. This leads to systems/operations teams being averse to change and conservative, which slows the business down! Why isn’t your infrastructure an agile software project?

Dev / Ops

• Developer viewpoint

• Grass IS greener

• Think of your infra as an agile software project...

• What workflow do I want?

The state of the art

I’m going to talk about how I think the generally accepted way of doing some things is fundamentally broken! But let’s start with a simple description of the issues I’m worrying about.

CM = state machine

Each change puppet makes (or attempts to make) is a state transition. Each circle represents the configuration state of the server: what is on disc, which services are running, etc.

Non deterministic

This is the key observation here - you don’t know which way puppet’s gonna jump :) In this case it doesn’t matter, as the two operations are orthogonal.

Convergent!

Convergence is when each run of puppet takes you nearer to 0 changes, but the next run still makes additional changes. The classic way to end up here is to miss a dependency in your code.

Convergent!

Of course, this doesn’t happen - the first step goes BANG, then mysql gets installed, which creates /etc/mysql. The second puppet run _then_ sets the config up.

err: /Stage[main]//File[/etc/mysql/my.cnf]/ensure: change from absent to file failed: Could not set 'file on ensure: No such file or directory - /etc/mysql/my.cnf.puppettmp_3706 at /home/tdoran/test.pp:4

Aaand this is what you get in your puppet logs.

Purple text of rage!

THE PURPLE TEXT OF RAGE

Convergent!

(Shamelessly stolen from https://www.usenix.org/legacy/publications/library/proceedings/lisa02/tech/full_papers/traugott/traugott.pdf)

Aaand your machine is convergent - i.e. it gets towards the desired state in a number of steps.

Fixable!

• before

• require

• subscribe

• notify

As I noted, this all happens because you missed a dependency. This is the easy case, where puppet can detect that and tell you! It’s also entirely possible for it to fail totally silently. It is, though, totally possible to write your puppet code well enough to need EXACTLY 1 puppet run to fully provision a server!
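To make the one-run versus two-run difference concrete, here is a toy Ruby simulation. It is not Puppet: the `Resource` struct, `puppet_run` and `runs_to_converge` are made-up names for illustration, and the only assumption carried over from the example above is that installing the mysql package creates /etc/mysql. Listing the package first stands in for what a `require` on the file resource would give you.

```ruby
require 'set'

# Toy model of a resource: a name, the directory that must already exist
# before it can be applied (or nil), and the directory it creates (or nil).
Resource = Struct.new(:name, :needs_dir, :creates_dir)

# Apply every not-yet-applied resource once, in the order given.
# Returns how many resources were changed during this run.
def puppet_run(resources, dirs, applied)
  changes = 0
  resources.each do |r|
    next if applied.include?(r.name)
    if r.needs_dir && !dirs.include?(r.needs_dir)
      warn "err: #{r.name}: No such file or directory"
      next
    end
    dirs << r.creates_dir if r.creates_dir
    applied << r.name
    changes += 1
  end
  changes
end

# Keep running until a run makes no changes; count the runs that did.
def runs_to_converge(resources)
  dirs = Set.new
  applied = Set.new
  runs = 0
  runs += 1 while puppet_run(resources, dirs, applied) > 0
  runs
end

my_cnf = Resource.new('File[/etc/mysql/my.cnf]', '/etc/mysql', nil)
mysql  = Resource.new('Package[mysql-server]', nil, '/etc/mysql')

# File attempted first: the missing-dependency case.
UNORDERED_RUNS = runs_to_converge([my_cnf, mysql])
# Package first: the order a declared dependency would enforce.
ORDERED_RUNS = runs_to_converge([mysql, my_cnf])

puts "without the dependency: #{UNORDERED_RUNS} runs"
puts "with the dependency:    #{ORDERED_RUNS} run"
```

In the unordered case the first run logs the error and only installs the package, so a second run is needed to write the config.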

What about an entire infrastructure?

The $64,000 question is....

A whole stack

Let’s start simple, but semi realistic. Gonna ignore databases. Gonna ignore monitoring. Gonna ignore the n[eo]twork.

Exported resources

Each layer of systems can publish data to the systems which depend on it (i.e. webs register, proxies find the webs and register themselves, lbs then find the proxies). Given you know the dependencies, you can get consistent runs by ordering them.
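As a rough sketch of "ordering the runs", the snippet below topologically sorts the tiers with Ruby's standard `TSort` module. The `RunOrder` class and the `TIERS` map are illustrative assumptions, not how Puppetroll or any real tool does it.

```ruby
require 'tsort'

# Wraps a hash of "tier => tiers it consumes exported data from"
# so that TSort can walk it.
class RunOrder
  include TSort

  def initialize(depends_on)
    @depends_on = depends_on
  end

  # TSort hook: enumerate every node in the graph.
  def tsort_each_node(&block)
    @depends_on.each_key(&block)
  end

  # TSort hook: enumerate the nodes this node depends on.
  def tsort_each_child(node, &block)
    @depends_on.fetch(node, []).each(&block)
  end
end

# Hypothetical three-tier stack: lbs need proxies, proxies need webs.
TIERS = {
  'lb'    => ['proxy'],
  'proxy' => ['web'],
  'web'   => []
}.freeze

# tsort returns dependencies before the nodes that need them,
# which is the order puppet has to run in.
PUPPET_RUN_ORDER = RunOrder.new(TIERS).tsort
puts PUPPET_RUN_ORDER.join(' -> ')
```

This only works because the graph is unidirectional. If two tiers depended on each other, `tsort` would raise `TSort::Cyclic`, which is exactly the co-dependence problem.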

Exported resources

• Inter machine dependencies

• Unidirectional!

• Known graph - webs, proxies, lbs

• Puppetroll (github.com/youdevise/puppetroll)

Exported resources

(Shameless ripoff of http://xkcd.com/1171/ )

Ordering dependent. Hard to test (in isolation). Slooow (have to run in order)

Co-dependence

And if we really are talking about entire infrastructures... Then maybe we need some of these.

Co-dependence

:( You _know_ that if everything is dynamically configured, you’re gonna have to do multiple puppet runs per server... Do we _really_ want to keep running puppet till it stops changing things?

The solution - an external model

Use your software model to generate a set of machines for an environment, and generate config for puppet to apply to each system to configure it. Add super secret special sauce (lots and lots of mcollective!)

• Represent system as a set of ruby classes

• DSL for describing environments

• Dependencies

• Domain knowledge

This is a simplified / minimal example jenkins environment - just 4 machines (2 web apps, 2 load balancers)
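The slide's code is not in this transcript, so here is a minimal sketch of what a model plus DSL of this kind could look like. Every name in it (`Environment`, `Machine`, `app_service`, `load_balancer`, the hostname scheme) is hypothetical and is not the actual stackbuilder syntax.

```ruby
# Hypothetical model of one machine: its name, its role, the service it
# belongs to, and the hostnames it depends on.
Machine = Struct.new(:hostname, :role, :service, :depends_on)

# Hypothetical environment model; the block passed to `new` is the DSL.
class Environment
  attr_reader :name, :machines

  def initialize(name, &block)
    @name = name
    @machines = []
    instance_eval(&block) if block
  end

  # DSL verb: declare `count` application servers for a service.
  def app_service(service, count: 2)
    count.times do |i|
      @machines << Machine.new(hostname(service, 'app', i), :app, service, [])
    end
  end

  # DSL verb: declare load balancers; the model (domain knowledge) wires
  # each one to every app server already declared for the same service.
  def load_balancer(service, count: 2)
    backends = @machines.select { |m| m.service == service && m.role == :app }
                        .map(&:hostname)
    count.times do |i|
      @machines << Machine.new(hostname(service, 'lb', i), :lb, service, backends)
    end
  end

  private

  # Illustrative naming scheme: <environment>-<service>-<role>-<number>.
  def hostname(service, role, index)
    format('%s-%s-%s-%03d', @name, service, role, index + 1)
  end
end

CI = Environment.new('ci') do
  app_service 'jenkins', count: 2
  load_balancer 'jenkins', count: 2
end

CI.machines.each do |m|
  puts "#{m.hostname} depends on #{m.depends_on.inspect}"
end
```

Because the dependencies live in the model, every machine's full configuration is known before any machine exists.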

ENC data!

Our external node classifier generates this for each of the 4 machines, which translates to puppet code run on the server. Note how every server gets all of its dependencies. There’s a companion data structure sent to the agent which actually provisions the virtual machines.
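The slide's ENC output is not in this transcript either. As a hedged illustration, the sketch below builds the kind of YAML document a Puppet external node classifier returns (top-level `classes` and `environment` keys). The class name `role::loadbalancer`, its `dependencies` parameter and the hostnames are assumptions for the example.

```ruby
require 'yaml'

# Build the ENC answer for one node. The role class name and its
# `dependencies` parameter are hypothetical; `classes` and `environment`
# are the standard top-level keys of ENC output.
def enc_for(role, dependencies)
  {
    'classes' => {
      "role::#{role}" => { 'dependencies' => dependencies }
    },
    'environment' => 'production'
  }
end

LB_ENC = enc_for('loadbalancer', %w[ci-jenkins-app-001 ci-jenkins-app-002])

# A real ENC is an executable that prints this YAML for the requested node.
puts LB_ENC.to_yaml
```

The point is that the load balancer is handed its backends up front, rather than discovering them from exported resources on a later run.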

The call tree looks something like this: model all the nodes and allocate all their IPs. Make calls to the KVM servers to provision the machines. The VMs start, boot, run puppet, send a cert to the puppetmaster and --waitforcert. Central provisioning asks ‘do we have a cert’, waits, then signs it. The puppetmaster looks up DNS and the ENC to compile the catalog. The catalog is shipped to the node, which runs puppet. Provisioning uses MCO to determine when puppet has finished. When all nodes are up, check nrpe is all green on all nodes, then run end to end app tests!
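Several of those steps boil down to "poll until something is true, then carry on". A minimal sketch of that pattern follows. `wait_until`, `provision` and `FakePuppetmaster` are hypothetical names, and the fake stands in for real calls to the puppetmaster or MCollective.

```ruby
# Poll the block until it returns truthy or the timeout (seconds) expires.
# Returns true on success, false on timeout.
def wait_until(timeout:, interval: 0.5)
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout
  loop do
    return true if yield
    return false if Process.clock_gettime(Process::CLOCK_MONOTONIC) >= deadline
    sleep interval
  end
end

# In-memory stand-in for the puppetmaster: the cert request "arrives"
# after a fixed number of polls.
class FakePuppetmaster
  attr_reader :signed

  def initialize(arrives_after)
    @arrives_after = arrives_after
    @polls = 0
    @signed = []
  end

  def cert_requested?(_hostname)
    @polls += 1
    @polls >= @arrives_after
  end

  def sign(hostname)
    @signed << hostname
  end
end

# "Do we have a cert?" Wait for it, then sign it.
def provision(hostname, master)
  requested = wait_until(timeout: 5, interval: 0.01) do
    master.cert_requested?(hostname)
  end
  return :no_cert_request unless requested

  master.sign(hostname)
  :signed
end

MASTER = FakePuppetmaster.new(3)
puts provision('ci-jenkins-app-001', MASTER)
```

The same helper would serve for "has puppet finished?" and "is nrpe all green?", each with its own check in the block.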

Automate all the things

Suddenly, I have massive power. I can write a small script to bring up a whole production-like environment, run tests against it, and tear it down. I can do this against the latest puppet changes, and only promote them to run on production servers when the tests pass!

BDD infrastructure

Behavior driven development - given I have a high level model of the systems comprising an infrastructure, I can then write equally high level tests to assert the behavior of that infrastructure

BDD infrastructure

For example...

• Given – the service has finished being provisioned

• And – all monitoring related to the service is passing

• When – we destroy a single member of the service

• Then – we expect all monitoring at the service level to be passing

• And – we expect all monitoring at the single machine level to be failing

Yes, I am suggesting regression testing your load balancer setup...
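Here is a sketch of how that scenario could read as executable Ruby. It runs against an in-memory `FakeService` rather than real machines and monitoring, and `FakeService`, `expect` and `run_scenario` are all hypothetical names, so treat it as the shape of the test rather than the real thing.

```ruby
# In-memory stand-in for a load-balanced service and its monitoring.
class FakeService
  def initialize(members)
    @alive = members.to_h { |m| [m, true] }
  end

  def members
    @alive.keys
  end

  def destroy(member)
    @alive[member] = false
  end

  # Machine-level check: is this one member up?
  def machine_check_passing?(member)
    @alive.fetch(member)
  end

  # Service-level check: passes while at least one member is still up.
  def service_check_passing?
    @alive.values.any?
  end
end

# Tiny assertion helper: raise on failure, report on success.
def expect(description, condition)
  raise "FAILED: #{description}" unless condition
  puts "ok - #{description}"
end

def run_scenario(service)
  victim = service.members.first

  # Given the service has finished being provisioned
  # And all monitoring related to the service is passing
  expect('service-level monitoring passes', service.service_check_passing?)
  expect('machine-level monitoring passes everywhere',
         service.members.all? { |m| service.machine_check_passing?(m) })

  # When we destroy a single member of the service
  service.destroy(victim)

  # Then we expect all monitoring at the service level to be passing
  expect('service-level monitoring still passes', service.service_check_passing?)

  # And we expect monitoring at the single machine level to be failing
  expect("machine-level monitoring fails for #{victim}",
         !service.machine_check_passing?(victim))
  true
end

run_scenario(FakeService.new(%w[web-001 web-002]))
```

With a single-member service the "Then" step raises, which is the regression this kind of test is meant to catch.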

Is this for real?

• Yes!

• We actually built this, the core parts are on GitHub

• Deployed real applications to production at TIM Group

•Continuous integration and deployment for infrastructure

This is my promised land!

Questions?

• https://devblog.timgroup.com/2013/06/14/exported-resources-considered-harmful/

• https://devblog.timgroup.com/2013/06/26/scenario-testing-infrastructures/

• https://github.com/youdevise/provisioning-tools

• https://github.com/youdevise/stackbuilder
