Automating Life in the Cloud
Joshua Buss, Matthew Kemp & Cody Ray
"Add more features!"
"This widget is too slow!"
"No more downtime!"
"We're losing potential customers in Asia!"
Use Case 0: Scalability and Reliability
Designing for the Cloud
Focus on scaling applications horizontally.
Use Case 0: Scalability and Reliability
Scalability
Wikipedia definition:
SOA as an architecture relies on service-orientation as its fundamental design principle. If a service presents a simple interface that abstracts away its underlying complexity, users can access independent services without knowledge of the service's platform implementation.

Layman's terms:
A complex system is broken into simple components that are able to interact with each other (and possibly outside sources).
Use Case 0: Scalability and Reliability
Service Oriented Architecture
What is a Service in SOA?
An independent unit that's composable with other components.
Use Case 0: Scalability and Reliability

(Diagram: layered services: Presentation (web, API, etc.), Business Logic, Data Access, Data Stores.)
Use Case 0: Scalability and Reliability
Services at BrightTag
(Diagram: BrightTag services: tagserve, stathub, datahub, ui, and their databases.)
When should you split services up?
Use Case 0: Scalability and Reliability
Service Division of Labor
Keep failures self-contained.
Use Case 0: Scalability and Reliability
Design for Failure
Release It! by Michael Nygard is a great resource for stability patterns.
(Diagram: the full stack: stathub, datahub, tagserve, ui, and their databases.)
Use Case 0: Scalability and Reliability
Redundancy at BrightTag

Run a full stack in each region.
(Diagram: a complete stack of tagserve, stathub, datahub, ui, and databases duplicated in each region.)
Services communicate over HTTP.

This makes it possible to use standard tools and components without extra effort.
Use Case 0: Scalability and Reliability
Load Balancers
Changes need to be allowed, but compatibility needs to be maintained.
Use Case 0: Scalability and Reliability
Backwards Compatibility
Some data needs to be available in all regions, while keeping inter-region communication to a minimum.
Use Case 1: Inter-Region Communication
Cross-Region Data Replication
Google's BigTable data model on Amazon's Dynamo infrastructure.
Use Case 1: Inter-Region Communication
What is Cassandra?
Use Case 1: Inter-Region Communication
Cassandra Token Ring
East:
  cassandra01 [0-63]
  cassandra02 [64-127]
  cassandra03 [128-191]
  cassandra04 [192-255]
West:
  cassandra01 [1-64]
  cassandra02 [65-128]
  cassandra03 [129-192]
  cassandra04 [193-0]
Key hashes to 157?
Use Case 1: Inter-Region Communication
How Cassandra Writes
East:
  cassandra01 [0-63]
  cassandra02 [64-127]
  cassandra03 [128-191]
  cassandra04 [192-255]
West:
  cassandra01 [1-64]
  cassandra02 [65-128]
  cassandra03 [129-192]
  cassandra04 [193-0]
The write goes to cassandra03 in each region: 157 falls in [128-191] (East) and [129-192] (West).
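The range lookup above can be sketched in a few lines of Python. This is an illustration of the idea, not Cassandra's actual partitioner; the ring layout copies the slide's 0-255 keyspace.

```python
from bisect import bisect_right

# Ring layout copied from the slide: each node owns the token range
# starting at its lower bound, over a 0-255 keyspace.
EAST_RING = [(0, "cassandra01"), (64, "cassandra02"),
             (128, "cassandra03"), (192, "cassandra04")]

def owner(ring, token):
    """Return the node whose token range contains the given token."""
    starts = [start for start, _node in ring]
    return ring[bisect_right(starts, token) - 1][1]

print(owner(EAST_RING, 157))  # -> cassandra03
```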
Cross-region messaging over HTTPS with compression.
Use Case 1: Inter-Region Communication
Cross-Region Messaging (Hiveway)
(Diagram: messages flow in both directions between a local hiveway and a remote hiveway.)
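A minimal sketch of the compression side of that idea, assuming JSON message batches (the message shape and batching are illustrative; the slide only says HTTPS plus compression):

```python
import gzip
import json

def encode_batch(messages):
    """Serialize and gzip a batch of messages for the cross-region hop."""
    return gzip.compress(json.dumps(messages).encode("utf-8"))

def decode_batch(payload):
    """Inverse of encode_batch, run by the receiving hiveway."""
    return json.loads(gzip.decompress(payload).decode("utf-8"))

# Repetitive batches compress well, which is the point of the gzip hop.
batch = [{"key": "visitor:42", "event": "page_view"}] * 100
payload = encode_batch(batch)
assert decode_batch(payload) == batch
assert len(payload) < len(json.dumps(batch))
```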
Use Case 2: Zero Downtime Builds
Smooth Code Pushes
Easy migrations and upgrade path.
Can be more expensive.
Use Case 2: Zero Downtime Builds
Mirror Environment Cutover
More complicated migrations and upgrades.
Longer deploy window.
Usually cheaper.
Use Case 2: Zero Downtime Builds
Rolling Deploy
for region in regions:
    for app in apps:
        for server in region:
            if app on server:
                maintenance app
                scp new code to <deployment_tag> dir
                symlink app/current to app/<deployment_tag>
                restart app
                wait for healthy
Use Case 2: Zero Downtime Builds
Fabric Pseudocode
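The pseudocode above can be made concrete as a plan generator; in the real script each step would be a Fabric task run over SSH. The region/server layout below is invented for illustration.

```python
# Invented inventory for illustration; the real one comes from the
# cloud provider's API.
REGIONS = {
    "us-east-1": {"web01": ["tagserve", "ui"], "web02": ["tagserve"]},
    "eu-west-1": {"web03": ["tagserve"]},
}

def deploy_plan(regions, apps, tag):
    """Yield rolling-deploy steps, one server at a time, mirroring the
    nested loops in the pseudocode."""
    for region, servers in regions.items():
        for app in apps:
            for server, roles in servers.items():
                if app in roles:
                    yield (server, app, "maintenance")
                    yield (server, app, "push " + tag)
                    yield (server, app, "symlink current -> " + tag)
                    yield (server, app, "restart")
                    yield (server, app, "wait for healthy")

steps = list(deploy_plan(REGIONS, ["tagserve"], "r42"))
print(steps[0])  # -> ('web01', 'tagserve', 'maintenance')
```

Because only one server leaves rotation at a time, the load balancer keeps serving from the rest, which is what makes the deploy zero-downtime.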
Use Case 2: Zero Downtime Builds
Health Checks at BrightTag
Standardized health checks across services.

$ curl -si 'http://service/bthc'
HTTP/1.1 204 No Content

$ curl -si 'http://service/bthc?action=maint'
HTTP/1.1 500 Internal Server Error
Connection: close
Content-Length: 5

MAINT
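A stdlib-only sketch of a /bthc-style endpoint (BrightTag's real services differ; the in-memory maintenance flag is an assumption, and a production check would also probe the service's own dependencies):

```python
from wsgiref.util import setup_testing_defaults
from urllib.parse import parse_qs

STATE = {"maintenance": False}

def bthc_app(environ, start_response):
    """WSGI app: 204 when healthy, 500 MAINT when in maintenance."""
    qs = parse_qs(environ.get("QUERY_STRING", ""))
    action = qs.get("action", [None])[0]
    if action == "maint":
        STATE["maintenance"] = True
    elif action == "resume":
        STATE["maintenance"] = False
    if STATE["maintenance"]:
        body = b"MAINT"
        start_response("500 Internal Server Error",
                       [("Content-Length", str(len(body))),
                        ("Connection", "close")])
        return [body]
    start_response("204 No Content", [])
    return [b""]

def call(query):
    """Drive the WSGI app directly and return its status line."""
    environ = {}
    setup_testing_defaults(environ)
    environ["QUERY_STRING"] = query
    captured = {}
    def start_response(status, headers):
        captured["status"] = status
    list(bthc_app(environ, start_response))
    return captured["status"]

print(call(""))              # -> 204 No Content
print(call("action=maint"))  # -> 500 Internal Server Error
```

Any non-2xx answer is enough for a load balancer to pull the server out of rotation, which is what the deploy script relies on.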
Use Case 2: Zero Downtime Builds
Keeping an Eye on the Pulse

At-a-glance environment health.
Provide multiple modes of operation.
Use Case 2: Zero Downtime Builds
Runtime Controls
Use Case 3: Generating /etc/hosts
Connectivity
Use Case 3: Generating /etc/hosts
What is Zerg?
(Flask + libcloud = Zerg)
DRIVER_MAPPING = {
    "dev": {
        "office": get_driver(Provider.EUCALYPTUS)(
            DEV_ID, secret=DEV_KEY, host="openmaster",
            port=8773, secure=False, path="/services/Cloud")
    },
    "prod": {
        "us-east-1": get_driver(Provider.EC2_US_EAST)(PROD_ID, PROD_KEY),
        "eu-west-1": get_driver(Provider.EC2_EU_WEST)(PROD_ID, PROD_KEY)
    }
}
@app.route("/hosts/<env>/<region>")
def hosts(env, region):
    nodes = DRIVER_MAPPING[env][region].list_nodes()
    return str([node.extra['private_dns'] for node in nodes])
Use Case 3: Generating /etc/hosts
Flask and libcloud Working Together
@app.route("/etchosts/<env>/<region>")
def etchosts(env, region):
    driver = DRIVER_MAPPING[env][region]
    sorted_nodes = sorted((node.name, node.private_ips, node.public_ips)
                          for node in driver.list_nodes())
    hosts = [{'private_ip': private_ips[0], 'name': name, 'public_ip': public_ips[0]}
             for (name, private_ips, public_ips) in sorted_nodes]
    response = render_template('etc_hosts.txt', hosts=hosts)
    return Response(response, content_type='text/plain')
Template:

# The following lines are desirable for IPv6 capable hosts
::1 ip6-localhost ip6-loopback
{% for host in hosts %}
{{ "%-21s%-21s# External: %s"|format(host.private_ip, host.name, host.public_ip) }}
{%- endfor %}
Use Case 3: Generating /etc/hosts
The Zerg Code
$ curl -s 'http://zerg/etchosts/prod/eu-west-1'
# The following lines are desirable for IPv6 capable hosts
::1 ip6-localhost ip6-loopback
10.0.0.10 server01 # External: 123.123.123.123
10.0.0.11 server02 # External: 123.123.123.124
10.0.0.12 server03 # External: 123.123.123.125
10.0.0.13 server04 # External: 123.123.123.126
10.0.0.14 server05 # External: 123.123.123.127
10.0.0.15 server06 # External: 123.123.123.128
Use Case 3: Generating /etc/hosts
The Zerg HTTP Response
# Set variables
read -r -d '' STATIC_HOSTS << static_hosts
# The following lines are included by default
127.0.0.1 localhost
# DO NOT EDIT THIS COMMENT - everything after this line is managed by zerg!
static_hosts

cp /etc/hosts ${TMPDIR}/old_hosts
grep -B 5000000 '# DO NOT' ${TMPDIR}/old_hosts >> ${TMPDIR}/static_hosts
cp ${TMPDIR}/static_hosts ${TMPDIR}/new_hosts
wget -qO- "http://${ZERG_IP}/etchosts/${E}/${R}" >> ${TMPDIR}/new_hosts &&
if [[ $(diff ${TMPDIR}/new_hosts /etc/hosts | wc -l | awk '{print $1}') -lt 7 ||
      ${FORCE} == '--force' ]]; then
    cp ${TMPDIR}/new_hosts /etc/hosts
fi
Use Case 3: Generating /etc/hosts
The bash update_hosts.sh script
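The script's safety check (apply only small diffs unless forced) is a pattern worth keeping when automation rewrites a critical file. A Python sketch of the same guard, with an invented threshold in the spirit of the script's "fewer than 7 diff lines" rule:

```python
import difflib

def safe_to_apply(old_lines, new_lines, max_changes=6, force=False):
    """Refuse a large automated rewrite of a critical file unless forced."""
    changed = [line for line in difflib.ndiff(old_lines, new_lines)
               if line.startswith(("+ ", "- "))]
    return force or len(changed) <= max_changes

old = ["10.0.0.%d server%02d" % (10 + i, i + 1) for i in range(10)]
new = old + ["10.0.0.20 server11"]
print(safe_to_apply(old, new))             # one added host: True
print(safe_to_apply(old, []))              # file wiped out: False
print(safe_to_apply(old, [], force=True))  # explicit override: True
```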
Update timing is tricky to get right.
Too important to leave completely autonomous.
Use Case 4: Generating Load Balancer Configuration
Configuring Load Balanced Services
Need a rock-solid foundation to deploy onto.
Use Case 4: Generating Load Balancer Configuration
Consistency > *
Set environment per-instance: /etc/puppet/puppet.conf
Symlink /etc/puppet/environments/ on master to various git checkouts of the source:
$ cd /etc/puppet/environments
$ ln -s ~/src/puppet/prod_stable prod_stable
$ ln -s ~/src/puppet/dev_stable dev_stable
$ ln -s ~/src/puppet/dev_test dev_test
Use cron to keep all branches up-to-date
Use Case 4: Generating Load Balancer Configuration
Single Puppet Master
Each environment has its own branch.
Make a new branch for every new feature.
Merge into a test branch to test.
Merge into stable.
Use Case 4: Generating Load Balancer Configuration
Source Controlled Puppet Configs
"APP_DEFS": {
    "zerg": {"type": "http", "healthcheck": {"port": 19999, "resource": "/zerghealth"}},
    "awesome": {"type": "http", "healthcheck": {"port": 20000, "resource": "/ahc"}, "frontend": "10080"},
    "haproxy_awesome": {"type": "http", "healthcheck": {"port": 20001, "resource": "/"}},
    "foo": {"type": "http", "healthcheck": {"port": 20002, "resource": "/"}, "frontend": "10081"},
    "mashed_potatoes": {"type": "http", "healthcheck": {"port": 20003, "resource": "/"}, "frontend": "10082"},
    "haproxy_foo": {"type": "http", "healthcheck": {"port": 20004, "resource": "/hc"}},
    "thehardproblem": {"type": "http", "healthcheck": {"port": 20006, "resource": "/"}},
    "redis": {"type": "tcp", "healthcheck": {"port": 20007, "resource": "/rhc"}},
    "dataserver": {"type": "http", "healthcheck": {"port": 20008, "resource": "/"}, "frontend": "10083"},
    "itshards": {"type": "http", "healthcheck": {"port": 20009, "resource": "/"}},
    "devnull": {"type": "http", "healthcheck": {"port": 20010, "resource": "/hc"}}
}
Use Case 4: Load Balancer Configs
The App Definitions in Zerg
@app.route("/haproxy/<env>/<region>/<type>")
def haproxy(env, region, type):
    instances = get_region_manifest(region)
    apps = {}
    for app in APP_DEFS[env]:
        if 'frontend' in APP_DEFS[env][app]:
            app_object = {
                'servers': [],
                'backend_port': APP_DEFS[env][app]['healthcheck']['port'],
                'frontend_port': APP_DEFS[env][app]['frontend']
            }
            for server in instances:
                if app in instances[server]['roles']:
                    app_object['servers'].append(
                        {'name': server, 'details': instances[server]})
            apps[app] = app_object
    return render_template('haproxy_%s_%s_%s.txt' % (env, region, type),
                           vips=apps)
Use Case 4: Load Balancer Configs
The Zerg Code
global
    blah blah
defaults
    blah blah

frontend dataserver_vip
    bind *:{{ vips.dataserver.frontend_port }}
    default_backend dataserver

frontend mashed_potatoes_vip
    bind *:{{ vips.mashed_potatoes.frontend_port }}
    default_backend mashed_potatoes

backend dataserver
    balance roundrobin
    {%- for server in vips.dataserver.servers %}
    server {{ server['name'] }} {{ server.details['private ip'] }}:{{ vips.dataserver.backend_port }} check
    {%- endfor %}

backend mashed_potatoes
    balance roundrobin
    {%- for server in vips.mashed_potatoes.servers %}
    server {{ server['name'] }} {{ server.details['private ip'] }}:{{ vips.mashed_potatoes.backend_port }} check
    {%- endfor %}
Use Case 4: Load Balancer Configs
The Zerg Flask Template
$ curl -s http://zerg/haproxy/<env>/<region>/<type>
globals and defaults blah blah

frontend dataserver_vip
    bind *:10083
    default_backend dataserver

frontend mashed_potatoes_vip
    bind *:10082
    default_backend mashed_potatoes

backend dataserver
    blah blah options
    server dataserv01 10.0.0.28:20008 check
    server dataserv02 10.0.0.29:20008 check

backend mashed_potatoes
    blah blah options
    server taters01 10.0.0.30:20003 check
    server taters02 10.0.0.31:20003 check
Use Case 4: Load Balancer Configs
The Zerg HTTP Response
Use Case 4: Load Balancer Configs
The Config Workflow
Large changes to templates (human) -> Git (ops) -> Zerg (generation) -> Script (human) -> Git (puppet) -> Servers
$ ./update_haproxy.sh <env> <region> <service>
** Git is clean and in sync with origin.. now waiting for zerg http response..
[prod_stable 012345] [puppet] Haproxy Auto-Commit for <env> <region> <service>
 1 files changed, 2 insertions(+), 2 deletions(-)
** Template pulled and committed
** Here is the diff from origin to the new version:
diff --git a/modules/haproxy/templates/haproxy_<env>_<region>_<service>_cfg.erb b/modules/haproxy/templates/haproxy_<env>_<region>_<service>_cfg.erb
--- a/modules/haproxy/templates/haproxy_prod_us-east-1_tagserve_cfg.erb
+++ b/modules/haproxy/templates/haproxy_prod_us-east-1_tagserve_cfg.erb
- server oldandslow01 10.0.0.23:20003 check
- server oldandslow02 10.0.0.24:20003 check
+ server taters01 10.0.0.30:20003 check
+ server taters02 10.0.0.31:20003 check
** Do you want to push this change? (y/n) y
blah blah successful git push message
** Commit successfully pushed to origin
** All done!
Use Case 4: Load Balancer Configs
The bash update_haproxy.sh script
Alerting, Monitoring & Visualization

Use Case 5: Dashboards & Alerting
What's really going on?
Identify metrics that act as signals.
Add alerts after every incident.
Use Case 5: Dashboards & Alerting
What to monitor?
Use Case 5: Dashboards & Alerting
Metric Polling at BrightTag
(Diagram: in each region, mpoller instances poll tagserve, haproxy, datahub, redis, and cassandra, and feed the results into graphite/carbon.)
Storage of historical metrics allows for trending and comparisons.
Aggregation is performed on data retrieval via the webapp.
Use Case 5: Dashboards & Alerting
Graphite
Expose a "metrics" service per region.
Enables a flexible topology.
Use Case 5: Dashboards & Alerting
Branches and Leaves
Use Case 5: Dashboards & Alerting
Metric Aggregation at BrightTag
(Diagram: the dashboard queries a metrics service in each region; each metrics service aggregates tagserve, haproxy, datahub, redis, and cassandra data.)
Use Case 5: Dashboards & Alerting
Realtime Numbers Across Regions
Requests are farmed out to each metrics service.
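A sketch of that fan-out, with fake per-region responses standing in for HTTP calls to each region's metrics service (the endpoints and numbers are invented):

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical per-region endpoints; real code would GET these URLs.
REGION_ENDPOINTS = {
    "us-east-1": "http://metrics-use1/counter/requests",
    "eu-west-1": "http://metrics-euw1/counter/requests",
}

# Stand-in for HTTP responses from each region's metrics service.
FAKE_RESPONSES = {"us-east-1": 1200, "eu-west-1": 800}

def fetch_metric(region):
    # Real code would request REGION_ENDPOINTS[region] over HTTP here.
    return FAKE_RESPONSES[region]

def global_total(regions):
    """Query every region in parallel and aggregate at the caller."""
    with ThreadPoolExecutor(max_workers=len(regions)) as pool:
        return sum(pool.map(fetch_metric, regions))

print(global_total(list(REGION_ENDPOINTS)))  # -> 2000
```

Querying the regions in parallel keeps the dashboard's latency close to the slowest single region rather than the sum of all of them.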
Use Case 5: Dashboards & Alerting
Visualization

Different visualizations tell you different things.
Tattle allows us to alert on any metric in Graphite.
Alerting is done per region.
Use Case 5: Dashboards & Alerting
Alerting
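A sketch of the kind of check a Graphite-backed alerter like Tattle performs (the (value, timestamp) datapoint shape follows Graphite's render API; the threshold logic and series below are invented, not Tattle's actual rules):

```python
def breached(datapoints, threshold, min_hits=3):
    """Alert only when several recent points exceed the threshold, so a
    single noisy sample (or a None gap) does not page anyone."""
    values = [v for v, _ts in datapoints if v is not None]
    return sum(1 for v in values if v > threshold) >= min_hits

# Made-up series of (value, timestamp) pairs with one missing sample.
series = [(120, 1), (450, 2), (None, 3), (480, 4), (510, 5)]
print(breached(series, threshold=400))              # -> True
print(breached(series, threshold=400, min_hits=4))  # -> False
```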
Fabric is push, puppet is pull.

Businesses don't move as fast as infrastructure changes, but configs have to stay up to date all the time.
(/etc/hosts)  (systempoller.py)  (mashed_potatoes.env)  (dataserver.war)
puppet <=====================================> fabric
(real-time up-to-date)  (moderately up-to-date)  (weekly)
Deployment
Fabric vs Puppet
You have to go with what the cloud provider offers.
Not always ideal for every workload.
Designing for the Cloud
Virtual Machines
There Are No Silver Bullets
(but if you find one, let us know)
Questions?