Post on 19-Jul-2015

Autobots @ realestate.com.au

Automate ALL THE THINGS (or at least the things that really matter)

Technical Lead @ realestate.com.au

Who am I?

@ggiesemann @geekle

Some newcomers mistake me for a sysadmin :(

Geoffrey Giesemann

(and what the hell am I talking about?)

What's the problem?

WTF is going on?!?

I hope my wife still recognises me...

(deployments are *hard*)

(just the usual suspects)

Why so hard?

(and who do we blame?)

How do we fix this?

deploybot

    with_down_in_load_balancer(server) do
      with_down_in_nagios(server) do
        puppet(server)
        raise "borked!" unless server.working?
      end
    end
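The slide shows only the call site, so here is a minimal sketch of how block helpers like these could be written. Everything below is an assumption for illustration: the `Server` struct, the log strings, and the choice to put a server back into rotation only when the block succeeds (so a "borked" server stays out of the load balancer). The real deploybot would talk to an actual load balancer, Nagios and Puppet.

```ruby
# Hypothetical stand-in for a real server; it records what was done to it.
Server = Struct.new(:name, :healthy, :log) do
  def working?
    healthy
  end
end

# Take the server out of the load balancer, run the block, and put it
# back only if the block finished without raising.
def with_down_in_load_balancer(server)
  server.log << "lb:disable"
  result = yield
  server.log << "lb:enable"
  result
end

# Same shape for monitoring: silence alerts while the block runs.
def with_down_in_nagios(server)
  server.log << "nagios:downtime"
  result = yield
  server.log << "nagios:resume"
  result
end

# Placeholder for a real Puppet run.
def puppet(server)
  server.log << "puppet:run"
end

def deploy(server)
  with_down_in_load_balancer(server) do
    with_down_in_nagios(server) do
      puppet(server)
      raise "borked!" unless server.working?
    end
  end
end

good = Server.new("aa01", true, [])
deploy(good)

bad = Server.new("aa02", false, [])
begin
  deploy(bad)
rescue RuntimeError => e
  bad.log << "failed: #{e.message}"
end
```

In this sketch the healthy server ends up re-enabled everywhere, while the failing one is left disabled in both the load balancer and Nagios for a human to look at. Using `ensure` instead would always re-enable the server, which is a different trade-off.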

schemabot

What's worked well?

Service Discovery!

    $ grep -r 'aa01' nagios/ | wc -l
    17
    $ grep -r 'aa01' puppet/ | wc -l
    4
    $ grep -r 'aa01' deploybot/ | wc -l
    0

> show lb vserver agentadmin-prod

agentadmin-prod (125.56.204.120:80) - HTTP Type: ADDRESS State: UP

... blah blah ...

Bound Service Groups:

1) Group Name: agentadmin

1) agentadmin (192.168.25.1: 80) - HTTP State: UP Weight: 1

Persistence Cookie Value : my_random_str=9999

2) agentadmin (192.168.25.2: 80) - HTTP State: UP Weight: 1

Persistence Cookie Value : my_random_str=9999

... etc etc ...

> show servicegroup agentadmin

agentadmin - HTTP

State: ENABLED Monitor Threshold : 0

Max Conn: 0 Max Req: 0 Max Bandwidth: 0 kbits

Monitor Name: http-diagnositic-warmup State: ENABLED Weight: 1

1) 192.168.25.1:80 State: UP Server Name: 192.168.25.1

Probes: 131205 Failed [Total: 2525 Current: 0]

Last response: Success - HTTP response code 200 received.

2) 192.168.25.2:80 State: UP Server Name: 192.168.25.2

Probes: 131322 Failed [Total: 2428 Current: 0]

Last response: Success - HTTP response code 200 received.
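One way to read the output above is that the load balancer already knows which servers belong to a service, so deploybot does not need hostnames hardcoded. The sketch below parses text shaped like the `show servicegroup` output into a list of members. How deploybot actually queries the load balancer is not shown in the slides; the `service_group_members` function and the regular expression are hypothetical, and the sample text is trimmed from the output above.

```ruby
# Sample text trimmed from the `show servicegroup agentadmin` output above.
OUTPUT = <<~TEXT
  agentadmin - HTTP
  State: ENABLED Monitor Threshold : 0
  1) 192.168.25.1:80 State: UP Server Name: 192.168.25.1
  Probes: 131205 Failed [Total: 2525 Current: 0]
  2) 192.168.25.2:80 State: UP Server Name: 192.168.25.2
  Probes: 131322 Failed [Total: 2428 Current: 0]
TEXT

# Matches member lines such as "1) 192.168.25.1:80 State: UP ...".
MEMBER = /^\s*\d+\)\s+(\d+\.\d+\.\d+\.\d+):(\d+)\s+State:\s+(\w+)/

# Returns one hash per member found in the output (hypothetical helper).
def service_group_members(output)
  output.scan(MEMBER).map do |host, port, state|
    { host: host, port: port.to_i, state: state }
  end
end

members = service_group_members(OUTPUT)
members.each { |m| puts "#{m[:host]}:#{m[:port]} is #{m[:state]}" }
```

Lines that do not start with a numbered entry, such as the group-level `State: ENABLED` line, are ignored by the pattern.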

Standardised Monitoring!

    $ curl http://aa01/diagnostic/status/nagios
    OK - the application is functioning correctly

    $ curl http://ar01/diagnostic/status/nagios
    OK - the application is functioning correctly

● custom app health checks
  ○ https://github.com/tribune/is_it_working
  ○ https://github.com/blythedunham/health_monitor

● monitor requests in munin
  ○ https://github.com/pka/rack-monitor
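To make the idea of a standard status URL concrete, here is a rough sketch of an endpoint that answers on `/diagnostic/status/nagios`. It does not use either of the gems linked above; it relies only on the Rack convention that an app is an object whose `call(env)` returns `[status, headers, body]`. The `DiagnosticStatus` class, the shape of the checks hash, and the `CRITICAL` failure message are assumptions; only the path and the `OK` line come from the slides.

```ruby
# A Rack-compatible app is any object that responds to call(env) and
# returns [status, headers, body]. No gems are needed to define one.
class DiagnosticStatus
  PATH = "/diagnostic/status/nagios"
  TEXT = { "content-type" => "text/plain" }.freeze

  # checks: hash of name => callable returning true when healthy (hypothetical).
  def initialize(checks)
    @checks = checks
  end

  def call(env)
    return [404, TEXT, ["not found\n"]] unless env["PATH_INFO"] == PATH

    # Collect the names of every check that did not return true.
    failed = @checks.reject { |_name, check| check.call }.keys
    if failed.empty?
      [200, TEXT, ["OK - the application is functioning correctly\n"]]
    else
      [500, TEXT, ["CRITICAL - failing checks: #{failed.join(', ')}\n"]]
    end
  end
end

healthy = DiagnosticStatus.new({ "database" => -> { true } })
broken  = DiagnosticStatus.new({ "database" => -> { true }, "cache" => -> { false } })

ok_response  = healthy.call({ "PATH_INFO" => DiagnosticStatus::PATH })
bad_response = broken.call({ "PATH_INFO" => DiagnosticStatus::PATH })
```

Because every app answers the same URL in the same format, one Nagios check and one load balancer monitor can cover all of them.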

Better Error Handling!

● Don't bother trying to handle errors - just make sure you can recover from them quickly!

● As long as you have some app servers alive things will work out!
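One possible reading of these two points is "fail fast, one server at a time": stop the rollout at the first failure so the servers not yet touched keep serving traffic. The sketch below illustrates that reading only; `rolling_deploy`, the result hash, and the server names other than `aa01` are hypothetical and not taken from deploybot.

```ruby
# Deploy to servers one at a time and stop at the first failure, so the
# untouched servers keep serving traffic. `deploy_one` is any callable
# that raises on failure (hypothetical).
def rolling_deploy(servers, deploy_one)
  done = []
  servers.each do |server|
    begin
      deploy_one.call(server)
      done << server
    rescue StandardError => e
      # No retries or clever recovery: report and leave the rest alone.
      return { deployed: done, failed: server, error: e.message,
               untouched: servers - done - [server] }
    end
  end
  { deployed: done, failed: nil, error: nil, untouched: [] }
end

# Simulated deploy step that breaks on one server.
flaky = ->(server) { raise "borked!" if server == "aa02" }
result = rolling_deploy(%w[aa01 aa02 aa03], flaky)
```

Here the rollout halts at the second server and the third is never touched, which keeps some capacity alive while someone fixes the failure.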

FIN

● http://markerguru.deviantart.com/
● http://www.quickmeme.com/
● http://www.doingitwrong.com/
● Everyone who was part of @rea_autobots

Thanks!
