design consequences of dev ops practices

NICTA Copyright 2012 From imagination to impact Design Consequences of DevOps Practices Len Bass

Upload: len-bass

Post on 27-Jan-2015


DESCRIPTION

This is a tutorial that I am giving at WICSA 2014 http://www.wicsa.net/ and SATURN 2014 https://www.sei.cmu.edu/saturn/2014/

TRANSCRIPT

Page 1: Design consequences of dev ops practices

NICTA Copyright 2012 From imagination to impact

Design Consequences of DevOps Practices

Len Bass

Page 2: Design consequences of dev ops practices


Introductions

• Me

• You

Page 3: Design consequences of dev ops practices

Overview of Tutorial

• DevOps practices when taken to the limit for

internet scale organizations => continuous

delivery

• Economics of deployment when there are many instances of services => rolling upgrade

Page 4: Design consequences of dev ops practices

Outline

• What is DevOps?

– Definitions

– Deriving architecturally significant requirement

• Architectural style elaboration

• Deployment

• Summary

Page 5: Design consequences of dev ops practices

What is DevOps?

• “DevOps is a software development method that stresses

communication, collaboration, and integration between software

developers and IT professionals” – Wikipedia

• From an architect’s or developer’s perspective, it means treating system administrators and operators as first-class stakeholders.

Page 6: Design consequences of dev ops practices

What is DevOps - 2

• DevOps is accompanied by a certain amount of

mysticism.

– “Be Self-Aware

– Be aware of a project’s maturity

– Be aware of others” http://architects.dzone.com/articles/zen-and-art-collaborative

• Similar to the early days of agile.

Page 7: Design consequences of dev ops practices

What problem is DevOps trying to solve?

• Poor communication between developers and

operations personnel

• Slow release schedule

• Limited capacity of operations staff

• Limited organizational insight into operations

Page 8: Design consequences of dev ops practices

Communication between developers and

operations staff

• Log messages

– What information is needed to do monitoring and

error diagnosis?

– Where is the best place to put particular types of

information?

• Release planning

– What is the scheduling for the next release?

– What capacity is needed for the next release?

– What are the infrastructure compatibility requirements

for the next release?

Page 9: Design consequences of dev ops practices

Release plan

1. Define and agree release and deployment plans with customers/stakeholders.

2. Ensure that each release package consists of a set of related assets and service components that are compatible with each other.

3. Ensure that integrity of a release package and its constituent components is maintained throughout the transition activities and recorded accurately in the configuration management system.

4. Ensure that all release and deployment packages can be tracked, installed, tested, verified, and/or uninstalled or backed out, if appropriate.

5. Ensure that change is managed during the release and deployment activities.

6. Record and manage deviations, risks, issues related to the new or changed service, and take necessary corrective action.

7. Ensure that there is knowledge transfer to enable the customers and users to optimise their use of the service to support their business activities.

8. Ensure that skills and knowledge are transferred to operations and support staff to enable them to effectively and efficiently deliver, support and maintain the service, according to required warranties and service levels.

*http://en.wikipedia.org/wiki/Deployment_Plan

Page 10: Design consequences of dev ops practices

Limited capacity of operations staff

• The number of physical servers that a single sys admin can administer varies with context, but some data points* are:

– As low as 10 per admin

– Norm of 30 per admin at small-medium businesses

• Depends on whether admin performs just

maintenance or whether admin is also involved

in other projects

*http://www.computerworld.com.au/article/352635/there_best_practice_server_system_administrator_ratio_/

Page 11: Design consequences of dev ops practices

Limited Organizational insight into

operations

• An organization has budgetary insight into

operations.

• The impact of various operational activities on

business value is difficult to discern.

• This is a long running complaint that goes under

the heading of “aligning IT with the

business”. There are differences in

– Objectives

– Culture

– Incentives

Page 12: Design consequences of dev ops practices

DevOps can also be a role

• DevOps practices rely on a high degree of

automation and standardization of tools

• Someone has to be responsible for these tools.

• Person filling this role is “DevOps Engineer”

Page 13: Design consequences of dev ops practices

My Take on DevOps

• DevOps is a set of practices intended to

– Reduce management overhead

– Speed up deployment

– Move some (formerly) IT responsibilities to developers

– Increase communication between developers and operations

– Reduce operations costs

• Are there architecturally significant requirements

in these practices?

Page 14: Design consequences of dev ops practices

Architecturally significant requirement

• Speed up deployment through minimizing

synchronous coordination among development

teams.

• Synchronous coordination such as a meeting adds time since it requires

– Ensuring that all parties are available

– Ensuring that all parties have the background to make the coordination productive.

– Following up on decisions made during the meeting.

Page 15: Design consequences of dev ops practices

Summary of this section

• DevOps is a collection of practices designed, among

other things, to reduce time to deploy new features.

• Reducing time to deploy new features can be

accomplished by reducing synchronous coordination

among development teams

– This is an architecturally significant requirement that we will carry

forward.

Page 16: Design consequences of dev ops practices

Questions

Page 17: Design consequences of dev ops practices

Outline

• What is DevOps?

• Architectural Style Elaboration

– Micro Service Oriented Architecture

– Categories of design decisions

– How micro SOA specifies or delegates the categories

of design decisions

• Deployment

• Summary

Page 18: Design consequences of dev ops practices

Deployment pipeline

• Developers commit code

• Code is compiled

• Binary is processed by a build and unit test tool which

builds the service

• Integration tests are run followed by performance tests.

• Result is a machine image (assuming virtualization)

• The service (its image) is deployed to production.

Page 19: Design consequences of dev ops practices

Continuous Deployment

• Deployment pipeline is triggered by commit of

code

• All gates from one phase to the next are

automatic.
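
The deployment pipeline with automatic gates can be sketched as an ordered list of stages, where a failing stage stops everything after it. This is a minimal illustration only: the stage names follow the slides, while `run_pipeline`, `example_stages` and the always-true checks are hypothetical placeholders, not the API of any real build tool.

```python
from typing import Callable, List, Tuple

# Each stage is a (name, check) pair; check returns True when the gate passes.
Stage = Tuple[str, Callable[[], bool]]


def run_pipeline(stages: List[Stage]) -> Tuple[bool, List[str]]:
    """Run stages in order and stop at the first gate that fails.

    Returns (deployed, names of the stages that passed).
    """
    passed: List[str] = []
    for name, check in stages:
        if not check():
            return False, passed  # gate closed: nothing reaches production
        passed.append(name)
    return True, passed


def example_stages(unit_tests_ok: bool = True) -> List[Stage]:
    # Hypothetical stand-ins for the real compile, test and deploy steps.
    return [
        ("compile", lambda: True),
        ("build and unit test", lambda: unit_tests_ok),
        ("integration tests", lambda: True),
        ("performance tests", lambda: True),
        ("bake machine image", lambda: True),
        ("deploy to production", lambda: True),
    ]
```

In continuous deployment, a commit would trigger `run_pipeline` with no human approval between stages.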

Page 20: Design consequences of dev ops practices

Requirements that drive the design in this

section

• Reduce synchronous communication among

development teams

– Continuous deployment

– Individual developers can commit to production (as

long as automated tests are passed)

• Scalability and performance

• Reliability

• A different ordering of requirements will produce

a different design

Page 21: Design consequences of dev ops practices

Architectural Style

• An architectural style (pattern) can specify many

decisions that might otherwise require

synchronous coordination among development

teams.

• The remainder of this section will justify why the

Micro Service Oriented Architecture style

satisfies our identified Architecturally Significant

Requirement.

Page 22: Design consequences of dev ops practices

Amazon design rules - 1

• All teams will henceforth expose their data and

functionality through service interfaces.

• Teams must communicate with each other

through these interfaces.

• There will be no other form of inter-process

communication allowed: no direct linking, no

direct reads of another team’s data store, no

shared-memory model, no back-doors

whatsoever. The only communication allowed is

via service interface calls over the network.

Page 23: Design consequences of dev ops practices

Amazon design rules - 2

• It doesn’t matter what technology they [services] use.

• All service interfaces, without exception, must be

designed from the ground up to be

externalizable.

• Amazon is optimizing for its workload with these

requirements

– Mainly searching and browsing and web page

delivery

– Some transactions but not the dominant portion of the

workload.

Page 24: Design consequences of dev ops practices

Micro service oriented architecture


• Each user request is

satisfied by some sequence

of services.

• Most services are not

externally available.

• Each service communicates

with other services through

service interfaces.

• Service depth may be 70,

e.g. LinkedIn

Page 25: Design consequences of dev ops practices


Relation of teams and services

• Each service is the responsibility of a single

development team

• Individual developers can deploy new version without

coordination with other developers.

• It is possible that a single development team is

responsible for multiple services

• Team size

• Coordination among team members

must be high bandwidth and low

overhead.

• Typically is done with small teams –

as in agile.

Page 26: Design consequences of dev ops practices

Design decisions

• Seven categories of design decisions*.

1. Allocation of responsibilities.

2. Coordination model.

3. Data model.

4. Management of resources.

5. Mapping among architectural elements.

6. Binding time decisions.

7. Choice of technology

*Software Architecture in Practice 3rd edition, Chap 4

Page 27: Design consequences of dev ops practices

Design decisions made or delegated by

choice of micro SOA

• Micro service oriented architecture either

specifies or delegates to the development team

five out of the seven categories of design

decisions.

1. Allocation of responsibilities.

2. Coordination model.

3. Data model.

4. Management of resources.

5. Mapping among architectural elements.

6. Binding time decisions.

7. Choice of technology

Page 28: Design consequences of dev ops practices

Roadmap for next several slides

• Micro service oriented architectural style will

either specify or allow delegation of five different

categories of design decisions.

• Each decision category will be discussed

separately.

Page 29: Design consequences of dev ops practices

Decision 1 – allocation of responsibilities

• This decision is not delegated to the team or

specified.

• Development teams must coordinate to divide

responsibilities for features that are to be added.

• Typically this happens at the beginning of each

iteration cycle.

Page 30: Design consequences of dev ops practices

Decision 2 - coordination model

• Elements of service interaction

– Services communicate asynchronously through

message passing

– Each service could (in principle) be deployed

anywhere on the net.

• Latency requirements will probably force particular

deployment location choices.

• Services must discover location of dependent services.

Page 31: Design consequences of dev ops practices

Service discovery


• When an instance of a

service is launched, it

registers with a

registry/load balancer

• When a client wishes

to utilize a service, it

gets the location of an

instance from the

registry/load balancer.

• Eureka is an open

source registry/load

balancer

[Figure: an instance of a service registers with the registry/load balancer; a client queries the registry and then invokes the instance.]

Page 32: Design consequences of dev ops practices


Subtleties of registry/load balancer

• When multiple instances of the same service

have registered, the load balancer can rotate

through them to equalize number of requests to

each instance.

• Each instance must renew its registration periodically (~90 seconds) so that the load balancer does not schedule messages to a failed instance.

• Registry can keep other information as well as

address of instance. For example, version

number of service instance.
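
The registry/load balancer behaviour described above (registration, periodic renewal, rotation through live instances, and extra information such as the version) can be sketched as follows. This is a toy illustration under stated assumptions: the class `Registry` and its methods are hypothetical and are not the Eureka API, and the current time is passed in explicitly as `now` so the example stays deterministic.

```python
from typing import Dict, Optional, Tuple


class Registry:
    """Toy registry/load balancer with lease renewal and round-robin lookup."""

    def __init__(self, lease_seconds: float = 90.0) -> None:
        self.lease_seconds = lease_seconds
        # service name -> {address: (version, time of last registration)}
        self._instances: Dict[str, Dict[str, Tuple[str, float]]] = {}
        self._next: Dict[str, int] = {}

    def register(self, service: str, address: str, version: str, now: float) -> None:
        self._instances.setdefault(service, {})[address] = (version, now)

    # Renewal is simply a re-registration with a fresh timestamp.
    renew = register

    def lookup(self, service: str, now: float) -> Optional[str]:
        """Return the address of a live instance, rotating through them."""
        live = sorted(
            address
            for address, (_, seen) in self._instances.get(service, {}).items()
            if now - seen <= self.lease_seconds  # expired leases are skipped
        )
        if not live:
            return None
        index = self._next.get(service, 0) % len(live)
        self._next[service] = index + 1
        return live[index]
```

An instance that stops renewing drops out of `lookup` once its lease expires, which is how a failed instance stops receiving messages.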

Page 33: Design consequences of dev ops practices

Decision 3 – Data model

• Schema based database system (relational).

Requires coordination.

– Development teams must coordinate when schema is

defined or modified.

– Schema definition happens once when the

architecture is defined. Schema modification should

be rare occurrence. Schema extensions (new fields or

tables) do not cause problems.

• NoSQL systems. Will still require coordination

over semantics of data.

– Data written by one service is typically read by others,

they must agree on semantics.

Page 34: Design consequences of dev ops practices

Decision 4 – Resource Management

• Each instance of a service can process a certain

workload.

– Could be expressed in terms of requests

– Could be expressed in terms of resource

requirements – e.g. CPU

• Each client instance will require resources from

the service to process its requests.

• Service Level Agreements (SLAs) are a means

for automating the resource assumptions of the

clients and the resource requirements of the

service.

Page 35: Design consequences of dev ops practices

Managing SLAs

• A requirement for each service is to provide an SLA for

its response time in terms of the workload asked of it.

– E.g. For a workload of Y requests per second, I will

provide a response within X seconds.

• A requirement for each client is to provide an estimate of

the requests it will make of each dependent service.

– E.g. for each request I receive, I will make Z requests

for your service per second.

• This combination will enable a run time determination of

the number of instances required for each service to

meet its SLA.
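
As a rough sketch of the run time determination mentioned above, the service SLA and the client estimate can be combined into an instance count. The function name and parameters are hypothetical, and the sketch assumes that Z means calls made to the service per request the client receives and that load spreads evenly over instances.

```python
import math


def required_instances(
    client_request_rate: float,
    calls_per_client_request: float,
    sla_capacity_per_instance: float,
) -> int:
    """Instances needed so that each one stays within its SLA workload.

    client_request_rate: requests per second arriving at the client.
    calls_per_client_request: Z, calls to the service per client request.
    sla_capacity_per_instance: Y, requests per second one instance can take
        while still responding within X seconds.
    """
    offered_load = client_request_rate * calls_per_client_request
    return max(1, math.ceil(offered_load / sla_capacity_per_instance))
```

For example, 100 client requests per second with Z = 3 and Y = 50 gives an offered load of 300 requests per second, so `required_instances(100, 3, 50)` returns 6.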

Page 36: Design consequences of dev ops practices

Provisioning new instances

• When the desired workload of a service is greater than

can be provided by the existing number of instances of

that service, new instances can be instantiated (at

runtime).

• Four possibilities for initiating new instance of a service:

1. Client. Client determines whether service is adequately provisioned for its needs based on service SLA and the service's current workload.

2. Service. Service determines whether it is adequately

provisioned based on number of requests it expects from

clients.

3. Registry/load balancer determines appropriate number of

instances of a service based on SLA and client instance

requests.

4. External entity can initiate creation of new instances

Page 37: Design consequences of dev ops practices

Responsibilities of development teams.

• SLA determination of a service is done by the

service development team prior to deployment

augmented by run time discovery.

• Determination of a client's requirements for a service is done by the client’s development team.

• Choice of which component has responsibility

for instantiating/deinstantiating instances of a

service is done as a portion of the architecture

definition.

Page 38: Design consequences of dev ops practices

Decision 5 – Mapping among architectural

elements

• Decisions about packaging modules into

processes and processes into a service are

delegated to the service development team.

• Decisions about deployment of a service will be

discussed in the next section.

Page 39: Design consequences of dev ops practices

Decision 6 – Binding time

• Configuration information binding time is

decided during the development of architecture

and the deployment pipeline.

• Other binding time decisions are delegated to

the service development team.

Page 40: Design consequences of dev ops practices

Decision 7 – Technology choices

• All technology choices are delegated to the

service development team.

Page 41: Design consequences of dev ops practices

Questions about Micro SOA

• /Q/ Isn’t it possible that different teams will implement the

same functionality, likely differently?

• /A/ Yes, but so what? Major duplications are avoided

through assignment of responsibilities to services. Minor

duplications are the price to be paid to avoid necessity

for synchronous coordination.

• /Q/ what about transactions?

• /A/ Micro SOA privileges flexibility above reliability and

performance. Transactions are recoverable through

logging of service interactions. This may introduce some

delays if failures occur.

Page 42: Design consequences of dev ops practices

Summary

• Synchronous coordination among development

teams is avoided by

– Using a micro SOA architecture

– Having the architecture specify the coordination

model and resource management techniques used by

the application.

– Delegating to the development team mapping,

binding time, and technology decisions.

– Having each service be the responsibility of a single

development team.

• Micro SOA privileges flexibility and development

team independence over performance and

reliability.

Page 43: Design consequences of dev ops practices

Questions

Page 44: Design consequences of dev ops practices

Outline

• What is DevOps?

• Overall Architectural Style

• Deployment

– Deployment strategies

– Maintaining Logical Consistency.

• Summary

Page 45: Design consequences of dev ops practices

Deployment Overview

Multiple instances of a service are executing

• Red is the service being replaced with a new version

• Blue are clients

• Green are dependent services

[Figure: service instances labelled VA, VB, VB, VB; UAT / staging / performance tests]

Page 46: Design consequences of dev ops practices


Deployment goal and constraints

• Goal of a deployment is to move from current

state (N instances of version A of a service) to a

new state (N instances of version B of a service)

• Constraints:

– Any development team can deploy their service at

any time. I.e. New version of a service can be

deployed either before or after a new version of a

client. (no synchronization among development

teams)

– It takes time to replace one instance of version A with

an instance of version B (order of minutes)

– Service to clients must be maintained while the new

version is being deployed.

Page 47: Design consequences of dev ops practices

Deployment strategies

• Two basic all or nothing strategies

– Big Flip – leave N instances with version A as they

are, allocate and provision N instances with version B

and then switch to version B and release instances

with version A.

– Rolling Upgrade – allocate one instance, provision it

with version B, release one version A instance.

Repeat N times.

• Other deployment topics

– Partial strategies (canary testing, A/B testing). We will discuss them later. For now we are discussing all or nothing deployment.

– Rollback

– Packaging services into machine images
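
The two all or nothing strategies above can be compared with a small simulation that tracks the peak number of instances in use. This is an illustrative sketch only: instances are modelled as version labels in a list, and the function names are hypothetical.

```python
from typing import List, Tuple


def big_flip(n: int) -> Tuple[List[str], int]:
    """Provision N new instances, switch over, then release the N old ones."""
    fleet = ["A"] * n
    fleet += ["B"] * n                      # both versions provisioned at once
    peak = len(fleet)
    fleet = [v for v in fleet if v == "B"]  # switch to B, release version A
    return fleet, peak


def rolling_upgrade(n: int) -> Tuple[List[str], int]:
    """Replace one instance at a time, N times."""
    fleet = ["A"] * n
    peak = len(fleet)
    for _ in range(n):
        fleet.append("B")                   # allocate and provision one B
        peak = max(peak, len(fleet))
        fleet.remove("A")                   # release one version A instance
    return fleet, peak
```

With `n = 4`, `big_flip` peaks at 8 instances and `rolling_upgrade` at 5. During the rolling loop the fleet holds both versions at once, which is the source of the consistency issues discussed later.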

Page 48: Design consequences of dev ops practices

Trade offs - Big Flip and Rolling Upgrade

• Big Flip

– Only one version available

to the client at any

particular time.

– Requires 2N instances

(additional costs)

• Rolling Upgrade

– Multiple versions are

available for service at the

same time

– Requires N+1 instances.

• Rolling upgrade is

commonly preferred.

[Figure: Rolling Upgrade in EC2. Boxes: Update Auto Scaling Group; Sort Instances; Remove & Deregister Old Instance from ELB; Confirm Upgrade Spec; Terminate Old Instance; Wait for ASG to Start New Instance; Register New Instance with ELB]

Page 49: Design consequences of dev ops practices

Types of failures during rolling upgrade

Rolling Upgrade Failure

• Provisioning: see references at end

• Logical failure: inconsistencies to be discussed

• Instance failure: handled by Auto Scaling Group in EC2

Page 50: Design consequences of dev ops practices

What are the problems with Rolling

Upgrade?

• Recall that any development team can deploy

their service at any time.

• Three concerns

– Maintaining consistency between different versions of

the same service when performing a rolling upgrade

– Maintaining consistency among different services

– Maintaining consistency between a service and

persistent data

Page 51: Design consequences of dev ops practices

Maintaining consistency between different

versions of the same service

• Key idea – differentiate between installing a new

version and activating a new version

• Involves “feature toggles” (described

momentarily)

• Sequence

– Develop version B with new code under control of

feature toggle

– Install each instance of version B with the new code

toggled off.

– When all of the instances of version A have been

replaced with instances of version B, activate new

code through toggling the feature.

Page 52: Design consequences of dev ops practices

Issues

• What is a feature toggle?

• How do I manage features that extend across

multiple services?

• How do I activate all relevant instances at once?

Page 53: Design consequences of dev ops practices

Feature toggle

• Place feature dependent new code inside of an

“if” statement where the code is executed if an

external variable is true. Removed code would

be the “else” portion.

• Used to allow developers to check in

uncompleted code. Uncompleted code is

toggled off.

• During deployment, until new code is activated,

it will not be executed.

• Removing feature toggles when a new feature

has been committed is important.
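
A feature toggle in its simplest form looks like the sketch below. The toggle name `new_pricing`, the `TOGGLES` dictionary and the pricing functions are hypothetical; in practice the value would come from external configuration rather than a module-level dictionary.

```python
from typing import Dict

# Hypothetical toggle store standing in for an external variable.
TOGGLES: Dict[str, bool] = {"new_pricing": False}


def price_old(amount: float) -> float:
    return amount


def price_new(amount: float) -> float:
    return round(amount * 0.9, 2)


def price(amount: float) -> float:
    if TOGGLES.get("new_pricing", False):
        return price_new(amount)  # new code, executed only once activated
    else:
        return price_old(amount)  # code to be removed along with the toggle
```

Version B can be installed everywhere with `new_pricing` off, and behaviour changes only when the value is flipped.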

Page 54: Design consequences of dev ops practices

Multi service features

• Most features will involve multiple services.

• Each service has some code under control of a

feature toggle.

• Activate feature when all instances of all

services involved in a feature have been

installed.

– Maintain a catalog with feature vs service version

number.

– A feature toggle manager determines when all old

instances of each version have been replaced. This

could be done using registry/load balancer.

– The feature manager activates the feature.
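
The activation check performed by a feature toggle manager can be sketched as a comparison between the feature catalog and what the registry reports. The function and its data shapes are assumptions made for illustration, and versions are compared by simple equality.

```python
from typing import Dict, Iterable


def ready_to_activate(
    required_versions: Dict[str, str],
    running_versions: Dict[str, Iterable[str]],
) -> bool:
    """True when every live instance of every involved service is upgraded.

    required_versions: catalog entry for one feature, service -> version.
    running_versions: registry view, service -> versions of live instances.
    """
    for service, wanted in required_versions.items():
        versions = list(running_versions.get(service, []))
        if not versions or any(v != wanted for v in versions):
            return False  # an old instance (or no instance) is still present
    return True
```

A manager would poll this check during a rolling upgrade and flip the toggle only once it returns `True`.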

Page 55: Design consequences of dev ops practices

Activating feature

• The feature toggle manager changes the value

of the feature toggle. Two possible techniques to

get new value to instances.

– Push. Broadcasting the new value will instruct each

instance to use new code. If a lag of several seconds

between the first service to be toggled and the last

can be tolerated, there is no problem. Otherwise

synchronizing value across network must be done.

– Pull. Querying the manager by each instance to get

latest value may cause performance problems.

• A coordination mechanism such as Zookeeper

will overcome both problems. I will discuss

Zookeeper if I have time at the end.

Page 56: Design consequences of dev ops practices

Maintaining consistency across versions

(summary)

• Install all instances before activating any new

code

• Use feature toggles to activate new code

• Use feature toggle manager to determine when

to activate new code

• Use Zookeeper to coordinate activation with low

overhead

Page 57: Design consequences of dev ops practices

Maintaining consistency among different

services

• Use case:

– Wish to deploy new version of service A without

coordinating with development team for clients of

service A.

• I.e. new version of service A should be backward compatible

in terms of its interfaces.

• May also require forward compatibility in certain

circumstances, e.g. rollback

Page 58: Design consequences of dev ops practices

Achieving Backwards Compatibility

• APIs can be extended but must always be

backward compatible.

• Leads to a translation layer

[Figure: clients call the External APIs (unchanging but with ability to extend or add new ones); a translation layer maps these to the Internal APIs (changes require changes to translation layer but do not propagate further)]
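
The translation layer can be illustrated with a small sketch in which external functions keep their signatures while the internal function is free to change. All names and the returned data are hypothetical.

```python
from typing import Dict


def _internal_lookup_v2(customer_id: int, include_address: bool) -> Dict[str, object]:
    """Internal API; its signature may change from version to version."""
    record: Dict[str, object] = {"id": customer_id, "name": "example"}
    if include_address:
        record["address"] = "unknown"
    return record


def get_customer(customer_id: int) -> Dict[str, object]:
    """External API, unchanged since the first release."""
    return _internal_lookup_v2(customer_id, include_address=False)


def get_customer_with_address(customer_id: int) -> Dict[str, object]:
    """External API added later; existing callers are unaffected."""
    return _internal_lookup_v2(customer_id, include_address=True)
```

If the internal function changes again, only the two external wrappers need updating; clients keep calling the same names with the same arguments.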

Page 59: Design consequences of dev ops practices


What about dependent services?

• Dependent services that are within your control

should maintain backward compatibility

• Dependent services not within your control (third

party software) cannot be forced to maintain

backward compatibility.

– Minimize impact of changes by localizing interactions

with third party software within a single module.

– Keeping services independent and packaging as

much as possible into a virtual machine means that

only third party software accessed through message

passing will cause problems.

Page 60: Design consequences of dev ops practices

Forward Compatibility

• Gracefully handle unknown calls and data base schema

information

– Suppose your service receives a method call it does

not recognize. It could be intended for a later version

where this method is supported.

– Suppose your service retrieves a data base table with

an unknown field. It could have been added to

support a later version.

• Forward compatibility allows a version of a service to be

upgraded or rolled back independently from its clients. It

involves both

– The service handling unrecognized information

– The client handling returns that indicate unrecognized

information.
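
Both halves of forward compatibility can be sketched together: a service that tolerates unknown calls and unknown fields, and a client that handles the "unrecognized" return. The method names, field names and reply format are hypothetical.

```python
from typing import Any, Dict

KNOWN_FIELDS = {"id", "name"}


def handle_call(method: str, payload: Dict[str, Any]) -> Dict[str, Any]:
    """Service side: answer known methods, report unknown ones without failing."""
    handlers = {"ping": lambda p: {"status": "ok"}}
    handler = handlers.get(method)
    if handler is None:
        # Possibly a method that only a later version supports.
        return {"status": "unrecognized", "method": method}
    return handler(payload)


def read_row(row: Dict[str, Any]) -> Dict[str, Any]:
    """Ignore fields that this version does not know about."""
    return {k: v for k, v in row.items() if k in KNOWN_FIELDS}


def call_with_fallback(method: str, fallback: str) -> Dict[str, Any]:
    """Client side: fall back when the service does not recognize a call."""
    reply = handle_call(method, {})
    if reply["status"] == "unrecognized":
        reply = handle_call(fallback, {})
    return reply
```

After a rollback, a client that already calls a newer method gets an "unrecognized" reply and falls back instead of failing.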

Page 61: Design consequences of dev ops practices


Maintaining consistency between a service

and persistent data

• Assume new version is correct – we will discuss the

situation where it is incorrect in a moment.

• Inconsistency in persistent data can come about

because data schema or semantics change.

• Effect can be minimized by the following practices (if

possible).

– Only extend schema – do not change semantics of

existing fields. This preserves backwards

compatibility.

– Treat schema modifications as features to be toggled.

This maintains consistency among various services

that access data.

Page 62: Design consequences of dev ops practices

I really must change the schema

• In this case, apply pattern for backward

compatibility of interfaces to schemas.

• Use features of database system (I am

assuming a relational DBMS) to restructure data

while maintaining access to not yet restructured

data.

Page 63: Design consequences of dev ops practices

Summary of consistency discussion so far.

• Feature toggles are used to maintain

consistency within instances of a service

• Backward compatibility pattern is used to maintain consistency between a service and its clients.

• Discouraging modification of schema will

maintain consistency between services and

persistent data.

– If schema must be modified, then synchronize

modifications with feature toggles.

Page 64: Design consequences of dev ops practices

Canary testing

• Canaries are a small number of instances of a new

version placed in production in order to perform live

testing in a production environment.

• Canaries are observed closely to determine whether the

new version introduces any logical or performance

problems. If not, roll out new version globally. If so, roll

back canaries.

• Named after canaries

in coal mines.

Page 65: Design consequences of dev ops practices

Implementation of canaries

• Designate a collection of instances as canaries. They do

not need to be aware of their designation.

• Designate a collection of customers as testing the

canaries. Can be, for example

– Organizationally based

– Geographically based

• Then

– Activate feature or version to be tested for canaries.

Can be done through feature activation

synchronization mechanism

– Route messages from canary customers to canaries.

Can be done through making registry/load balancer

canary aware.
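
Making the registry/load balancer canary aware can be sketched as a routing function that sends designated customers to the canary pool. The function, its arguments and the customer names are hypothetical; a real balancer would also apply its usual rotation and health checks.

```python
from typing import Dict, List, Set


def route(
    customer: str,
    canary_customers: Set[str],
    instances: Dict[str, List[str]],
) -> str:
    """Pick an instance for a request.

    instances maps "canary" and "stable" to lists of instance addresses.
    Canary customers go to canaries when any exist, everyone else to stable.
    """
    use_canary = customer in canary_customers and bool(instances.get("canary"))
    addresses = instances["canary" if use_canary else "stable"]
    # Deterministic spread of customers over the chosen pool.
    return addresses[sum(customer.encode()) % len(addresses)]
```

The canary instances do not need to know their designation; only the routing step does. The same mechanism serves A/B testing by treating the two audiences as two pools.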

Page 66: Design consequences of dev ops practices

A/B testing

• Suppose you wish to test user response to a

system variant. E.g. UI difference or marketing

effort. A is one variant and B is the other.

• You simultaneously make available both

variants to different audiences and compare the

responses.

• Implementation is the same as canary testing.

Page 67: Design consequences of dev ops practices

Rollback

• New versions of a service may be unacceptable

either for logical or performance reasons.

• Two options in this case

• Roll back (undo deployment)

• Roll forward (discontinue current deployment and

create a new release without the problem).

• Decision to rollback or roll forward is almost

never automated because there are multiple

factors to consider.

• Forward or backward recovery

• Consequences and severity of problem

• Importance of upgrade 67

Page 68: Design consequences of dev ops practices

States of upgrade.

• An upgrade can be in one of two states when an

error is detected.

– Installed (fully or partially) but new features not

activated

– Installed and new features activated.

Page 69: Design consequences of dev ops practices

Possibilities

• Initially we will discuss the situation where persistent data is correct. Later we will discuss incorrect persistent data.

• Installed but new features not activated

– Error must be in backward compatibility

– Halt deployment

– Roll back by reinstalling old version

– Roll forward by creating new version and installing

that

• Installed with new features activated

– Turn off new features

– If that is insufficient, we are in the prior case.
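The two cases above can be written down as a small lookup of recovery steps. This is only a sketch: `UpgradeState` and `recovery_steps` are hypothetical names, and the `roll_forward` and `deactivation_sufficient` arguments stand in for decisions that, as noted earlier, are normally made by a person.

```python
from enum import Enum


class UpgradeState(Enum):
    INSTALLED_NOT_ACTIVATED = 1
    INSTALLED_ACTIVATED = 2


def recovery_steps(state, roll_forward=False, deactivation_sufficient=True):
    """Return the ordered recovery steps for an upgrade found to be in error."""
    steps = []
    if state is UpgradeState.INSTALLED_ACTIVATED:
        steps.append("turn off new features")
        if deactivation_sufficient:
            return steps
        # Otherwise fall through: we are now in the "not activated" case.
    steps.append("halt deployment")
    if roll_forward:
        steps.append("create and install new version")
    else:
        steps.append("reinstall old version")
    return steps


print(recovery_steps(UpgradeState.INSTALLED_ACTIVATED))
print(recovery_steps(UpgradeState.INSTALLED_NOT_ACTIVATED, roll_forward=True))
```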

Page 70: Design consequences of dev ops practices

Persistent data

• Keep log of user requests (each with its own identification)

• Identification of incorrect persistent data

– Tag each data item with metadata that provides the service and version that wrote the data, and the user request that caused the data to be written

• Correction of incorrect persistent data (simplistic

version)

– Remove data written by incorrect version of a service

– Install correct version

– Replay user requests that caused incorrect data to be

written
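The simplistic correction procedure can be sketched with an in-memory store. All names here (`Request`, `Item`, `handle`, `correct`, the `pricing` service, and the version strings) are hypothetical, and a dictionary stands in for the persistent store.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    request_id: str  # each logged user request has its own identification
    key: str
    amount: int


@dataclass(frozen=True)
class Item:
    value: int
    service: str     # metadata: which service wrote the item
    version: str     # metadata: which version of that service
    request_id: str  # metadata: which user request caused the write


def handle(store, request, service, version, compute):
    """Process one request and tag the written item with its provenance."""
    store[request.key] = Item(compute(request.amount), service, version, request.request_id)


def correct(store, log, service, bad_version, good_version, good_compute):
    """Remove data written by the bad version, then replay the causing requests."""
    bad = {k: item for k, item in store.items()
           if item.service == service and item.version == bad_version}
    bad_ids = {item.request_id for item in bad.values()}
    for key in bad:
        del store[key]
    for request in log:  # replay in the original order
        if request.request_id in bad_ids:
            handle(store, request, service, good_version, good_compute)
    return bad_ids


log = [Request("r1", "a", 1), Request("r2", "b", 2)]
store = {}
handle(store, log[0], "pricing", "v1", lambda x: x * 2)
handle(store, log[1], "pricing", "v2", lambda x: x * 3)  # v2 is the faulty version
replayed = correct(store, log, "pricing", "v2", "v3", lambda x: x * 2)
print(replayed, store["b"])
```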

Page 71: Design consequences of dev ops practices

Persistent data correction problems

I will not present good solutions to these problems.

1. Replaying user requests may involve requesting

features that are not in the current version.

– Requests can be queued until they can be correctly

re-executed

– User can be informed of error (after the fact)

2. There may be domino effects from incorrect data, i.e. other calculations may be affected.

– Keep a pedigree for data items that allows determining which additional data items are incorrect. Remove them and regenerate them when the requests are replayed.

– Data that escaped the system, e.g. sent to another system or shown to a user, cannot be retrieved.
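Finding the additional incorrect items from a pedigree is a reachability problem. The sketch below assumes the pedigree is available as a mapping from each data item to the items it was derived from; `tainted_items` and the item names are hypothetical.

```python
from collections import deque


def tainted_items(pedigree, known_bad):
    """Return known_bad plus every item derived, directly or not, from it.

    pedigree maps each data item to the set of items it was computed from.
    """
    # Invert the pedigree: source item -> items derived from it.
    dependents = {}
    for item, sources in pedigree.items():
        for source in sources:
            dependents.setdefault(source, set()).add(item)

    tainted = set(known_bad)
    queue = deque(known_bad)
    while queue:  # breadth-first walk along the derivation edges
        current = queue.popleft()
        for derived in dependents.get(current, ()):
            if derived not in tainted:
                tainted.add(derived)
                queue.append(derived)
    return tainted


pedigree = {
    "invoice": {"price"},
    "report": {"invoice", "tax"},
    "tax": set(),
}
print(sorted(tainted_items(pedigree, {"price"})))
```

Items in the returned set are the ones to remove and regenerate; this does nothing for data that has already left the system.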

Page 72: Design consequences of dev ops practices

Summary of rollback options

• Can roll back or roll forward

• Rolling back without consideration of persistent

data is relatively straightforward.

• Managing erroneous persistent data is

complicated and will likely require manual

processing.

Page 73: Design consequences of dev ops practices

Packaging of services

• The last portion of the deployment pipeline is

packaging services into machine images for

installation.

• Two dimensions

– Flat vs deep service hierarchy

– One service per virtual machine vs many services per

virtual machine

Page 74: Design consequences of dev ops practices

Flat vs Deep Service Hierarchy

• Trading off independence of teams and

possibilities for reuse.

• Flat Service Hierarchy

– Limited dependence among services & limited

coordination needed among teams

– Difficult to reuse services

• Deep Service Hierarchy

– Provides possibility for reusing services

– Requires coordination among teams to discover

reuse possibilities. This can be done during

architecture definition.

Page 75: Design consequences of dev ops practices

Services per VM Image

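[Figure: two packaging options.]

– One service per VM: a service is developed and then embedded in its own VM image.

– Multiple services per VM: Service 1 and Service 2 are each developed separately and then both embedded in the same VM image.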

Page 76: Design consequences of dev ops practices

One Possible Race Condition with Multiple

Services per VM

[Figure: timeline. Image contents are written as "version of Service 1 | version of Service 2".]

Initial State: VM image with Version N of Service 1 and Version N of Service 2

• Developer 1

– Build new image with VN+1|VN

– Begin provisioning process with new image

• Developer 2

– Build new image with VN|VN+1

– Begin provisioning process with new image without new version of Service 1

Results in Version N+1 of Service 1 not being updated until next build of VM image

Could be prevented by VM image build tool
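One way a build tool could prevent this lost update is an optimistic check on the revision of the image a build started from. This is an assumed technique, not one named in the tutorial; `ImageRegistry`, `StaleBaseError`, and the service names are hypothetical.

```python
import threading


class StaleBaseError(Exception):
    """Raised when a build started from an image that is no longer current."""


class ImageRegistry:
    def __init__(self, service_versions):
        self._lock = threading.Lock()
        self._revision = 0
        self._versions = dict(service_versions)

    def head(self):
        """Return the current image revision and the service versions it contains."""
        with self._lock:
            return self._revision, dict(self._versions)

    def publish(self, base_revision, service, new_version):
        """Publish a new image only if it was built from the current one."""
        with self._lock:
            if base_revision != self._revision:
                raise StaleBaseError(
                    f"built from revision {base_revision}, current is {self._revision}"
                )
            self._versions[service] = new_version
            self._revision += 1
            return self._revision


registry = ImageRegistry({"service1": 1, "service2": 1})
base_dev1, _ = registry.head()
base_dev2, _ = registry.head()          # both developers start from the same image
registry.publish(base_dev1, "service1", 2)
try:
    registry.publish(base_dev2, "service2", 2)  # would silently drop service1 v2
except StaleBaseError:
    base_dev2, _ = registry.head()      # rebuild on top of the current image
    registry.publish(base_dev2, "service2", 2)
print(registry.head())
```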

Page 77: Design consequences of dev ops practices

Another Possible Race Condition with Multiple

Services per VM

[Figure: timeline. Image contents are written as "version of Service 1 | version of Service 2".]

Initial State: VM image with Version N of Service 1 and Version N of Service 2

• Developer 1

– Build new image with VN+1|VN

– Begin provisioning process with new image; overwrites image created by developer 2

• Developer 2

– Build new image with VN+1|VN+1

– Begin provisioning process with new image

Results in Version N+1 of Service 2 not being updated until next build of VM image

Could be prevented by provisioning tool

Page 78: Design consequences of dev ops practices

Trade offs

• One service per VM

– Message from one service to another must go

through inter VM communication mechanism – adds

latency

– No possibility of race condition

• Multiple Services per VM

– Inter VM communication requirements reduced –

reduces latency

– Adds possibility of race condition caused by

simultaneous deployment

Page 79: Design consequences of dev ops practices

Summary of Deployment

• Rolling upgrade is a common deployment strategy

• Introduces requirements for consistency among

– Different versions of the same service

– Different services

– Services and persistent data

• Other deployment considerations include

– Canary deployment

– A/B testing

– Rollback

Page 80: Design consequences of dev ops practices

Question

Page 81: Design consequences of dev ops practices

Zookeeper

• What purpose does Zookeeper serve?

• Use cases

– Leader election

– Group membership

– Distributed locks

– Synchronization

– Configuration

• In our case, we will use Zookeeper to manage

activating features

Page 82: Design consequences of dev ops practices

Distributed applications

• Zookeeper provides a guaranteed (mostly) consistent data structure for every instance of a distributed application.

– “Mostly” means within the eventual consistency lag (but this is small)

• Zookeeper deals with managing failure as well as consistency.

– Done using the Zab atomic broadcast protocol, which is similar to Paxos.

• Zookeeper guarantees that updates are linearly ordered and that requests from each client are processed in FIFO order

Page 83: Design consequences of dev ops practices

Model

• Zookeeper maintains a file-system-like data structure

– Hierarchical

– Data in every node (called znode)

– Amount of data in each node assumed small (< 1 MB)

– Intended for metadata

• Configuration

• Location

• Group

Page 84: Design consequences of dev ops practices

Zookeeper znode structure

/  <data>
    /b1  <data>
        /b1/c1  <data>
        /b1/c2  <data>
    /b2  <data>
        /b2/c1  <data>

Page 85: Design consequences of dev ops practices

API

Function        Type
create          write
delete          write
exists          read
get children    read
get data        read
set data        write
+ others

• All calls return atomic views of state: a call either succeeds or fails, and no partial state is returned. Writes are also atomic: they either succeed or fail, and a failed write has no side effects.
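To make the data model and the six calls concrete, here is a toy, single-process model of a znode tree. It is not the Zookeeper client API: `ZnodeTree` and `ZnodeError` are hypothetical names, and sessions, watches, and replication are left out.

```python
class ZnodeError(Exception):
    pass


class ZnodeTree:
    """In-memory stand-in for a znode hierarchy; every node carries data."""

    def __init__(self):
        self._data = {"/": b""}

    @staticmethod
    def _parent(path):
        head = path.rsplit("/", 1)[0]
        return head or "/"

    def _require(self, path):
        if path not in self._data:
            raise ZnodeError(f"{path} does not exist")

    def create(self, path, data=b""):  # write
        if path in self._data:
            raise ZnodeError(f"{path} already exists")
        self._require(self._parent(path))
        self._data[path] = data

    def delete(self, path):  # write
        if path == "/":
            raise ZnodeError("cannot delete the root")
        if self.get_children(path):
            raise ZnodeError(f"{path} has children")
        del self._data[path]

    def exists(self, path):  # read
        return path in self._data

    def get_children(self, path):  # read
        self._require(path)
        return sorted(p for p in self._data if p != "/" and self._parent(p) == path)

    def get_data(self, path):  # read
        self._require(path)
        return self._data[path]

    def set_data(self, path, data):  # write
        self._require(path)
        self._data[path] = data


tree = ZnodeTree()
for path in ["/b1", "/b1/c1", "/b1/c2", "/b2", "/b2/c1"]:
    tree.create(path, b"<data>")
print(tree.get_children("/"), tree.get_children("/b1"))
```

Each failing call raises before changing anything, which mirrors the "no side effects on failure" property described above.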

Page 86: Design consequences of dev ops practices

Use Case – leader election

• Many distributed applications have master

(leader)/slave structure

– One master, many slaves

– Master

• Sends work to slaves

• Monitors health of slaves and creates new ones as needed.

Page 87: Design consequences of dev ops practices

Using Zookeeper to elect master

• Suppose the master fails. Then a new master must be created or chosen.

• All candidates issue a “create” call with node name “master”.

• Only one of these create requests will succeed; the rest will fail. This is one of the consistency elements enforced by Zookeeper.

• The client that successfully creates the znode named “master” becomes the new master.
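The election relies only on "create" being atomic. The sketch below imitates that with a lock-protected dictionary and competing threads; `Namespace` and `elect` are hypothetical names and nothing here talks to a real Zookeeper ensemble.

```python
import threading


class Namespace:
    """Stand-in for the service: create() succeeds for exactly one caller per name."""

    def __init__(self):
        self._lock = threading.Lock()
        self._nodes = {}

    def create(self, name, owner):
        with self._lock:
            if name in self._nodes:
                return False  # someone else already created the node
            self._nodes[name] = owner
            return True

    def owner(self, name):
        with self._lock:
            return self._nodes.get(name)


def elect(namespace, candidates):
    """Let every candidate try to create "master"; return those that succeeded."""
    results = {}

    def run(candidate):
        results[candidate] = namespace.create("master", candidate)

    threads = [threading.Thread(target=run, args=(c,)) for c in candidates]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return [c for c, won in results.items() if won]


namespace = Namespace()
winners = elect(namespace, ["node-a", "node-b", "node-c", "node-d"])
print(winners, namespace.owner("master"))
```

Which candidate wins depends on thread scheduling; the point is only that exactly one does.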

Page 88: Design consequences of dev ops practices

Using Zookeeper to manage group membership

• App connects to Zookeeper

– Get list of Zookeeper servers

– Create session (automatic failover if a server fails)

• Known group name

– Create /group_name

– If it already exists, get a failure

• Client joins group by creating /group_name/my_id

• Client can list children of /group_name and get members of group.

• Watcher will inform client if group members fail or leave.

Page 89: Design consequences of dev ops practices

Using Zookeeper to manage distributed locks - 1

• Naïve solution

– All clients attempt to create /lockname

– Successful client has lock.

• Client will delete znode when finished with lock

• Znode will be deleted if client fails

– Unsuccessful clients will watch /lockname. If it is

deleted then they will attempt to create it.

– Repeat

Page 90: Design consequences of dev ops practices

Distributed locks – 2

• Problem with naïve solution is “herd effect”.

– If many clients all wake up and try to grab lock at

once there will be an impact on the system load.

• Better solution is for each client to watch

predecessor.

– Zookeeper enforces order (e.g. through sequentially numbered znodes created under /lockname)

– When the predecessor deletes its znode, the client acquires the lock.

– If predecessor fails, client is informed and will watch

predecessor’s predecessor. Etc.
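The predecessor-watching scheme can be modelled with sequence numbers. This is a single-process sketch under the assumption that each waiting client holds a sequentially numbered entry; `LockQueue` and its methods are hypothetical and no real watches are involved.

```python
class LockQueue:
    """Each client gets a sequence number and watches only its predecessor."""

    def __init__(self):
        self._next_seq = 0
        self._waiting = {}  # sequence number -> client name

    def enqueue(self, client):
        seq = self._next_seq
        self._next_seq += 1
        self._waiting[seq] = client
        return seq

    def remove(self, seq):
        """Called when a client releases the lock, or when it fails."""
        self._waiting.pop(seq, None)

    def predecessor(self, seq):
        """The one entry this client should watch, or None if it is first."""
        earlier = [s for s in self._waiting if s < seq]
        return max(earlier) if earlier else None

    def holds_lock(self, seq):
        return seq in self._waiting and self.predecessor(seq) is None


queue = LockQueue()
a, b, c = (queue.enqueue(name) for name in ("a", "b", "c"))
print(queue.holds_lock(a), queue.predecessor(c))  # a holds the lock; c watches b
queue.remove(b)                                   # b fails while waiting
print(queue.predecessor(c))                       # c now watches a
queue.remove(a)                                   # a releases the lock
print(queue.holds_lock(c))
```

A release or failure wakes only the single client watching that entry, which is what avoids the herd effect.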

Page 91: Design consequences of dev ops practices

Using Zookeeper for distributed synchronization

• Create new synchronization client.

– It creates synchronization node

– Other clients register on synchronization node at beginning of computation.

– At end of computation they remove themselves from synchronization node

• Synchronization client watches clients that have registered themselves. If one fails, it removes it from synchronization node.

• When synchronization node is empty, synchronization client deletes it and other clients (who are watching) can proceed.

Page 92: Design consequences of dev ops practices

Using Zookeeper for configuration

Each client records configuration information as

data in a child node it creates under a main

configuration node.

Checking configuration is a matter of getting data

from all of the children of the configuration node.

Page 93: Design consequences of dev ops practices

Using Zookeeper to synchronize activation

of features.

• Feature manager creates Znode containing

– <feature flag name, feature flag value>

– Written only when all services available.

• Service retrieves feature flag value from Znode

– If (Znode_read_value(feature flag name)) then

feature is active

else

feature is inactive

• Feature flag value guaranteed to be consistent across services.

• Latency is low (order of microseconds) since Zookeeper keeps data structures in memory.
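The activation protocol can be sketched with a dictionary standing in for the znode. `FlagStore`, `FeatureManager`, `handle_request`, and the service and flag names are hypothetical; a real deployment would read the flag through a Zookeeper client instead.

```python
class FlagStore:
    """Stand-in for the znode holding <feature flag name, feature flag value>."""

    def __init__(self):
        self._flags = {}

    def write(self, name, value):
        self._flags[name] = bool(value)

    def read(self, name):
        return self._flags.get(name, False)  # an unwritten flag counts as inactive


class FeatureManager:
    """Writes the flag only when all required services are available."""

    def __init__(self, store, required_services):
        self._store = store
        self._required = frozenset(required_services)
        self._available = set()

    def report_available(self, service):
        self._available.add(service)

    def activate(self, flag_name):
        if self._required <= self._available:
            self._store.write(flag_name, True)
            return True
        return False


def handle_request(store, flag_name):
    """What each service does on every request: read the flag, then branch."""
    return "new behaviour" if store.read(flag_name) else "old behaviour"


store = FlagStore()
manager = FeatureManager(store, {"cart", "checkout"})
manager.report_available("cart")
print(manager.activate("new_pricing"), handle_request(store, "new_pricing"))
manager.report_available("checkout")
print(manager.activate("new_pricing"), handle_request(store, "new_pricing"))
```

Turning a feature off again is a single `write(flag_name, False)`, which is what makes the "turn off new features" rollback step cheap.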

Page 94: Design consequences of dev ops practices

Summary of tutorial

• DevOps practices lead to a requirement to minimize inter-team coordination

• Continuous deployment has no human intervention from developer commit until deployment to production

• Micro SOA architectural style determines or delegates 5 of 7 design decision categories

• Deployment strategies raise issues of consistency. Separation of installation and activation enables turning features on or off.

• Zookeeper is one tool to manage synchronizing the activation of features.

Page 95: Design consequences of dev ops practices

NICTA Team

• Anna Liu

• Alan Fekete

• Min Fu

• Daniel Sun

• Hiroshi Wada

• Ingo Weber

• Xiwei Xu

• Liming Zhu

Page 96: Design consequences of dev ops practices

Readings

• http://www.slideshare.net/lenbass/what-is-dev-ops-for-review

• http://www.slideshare.net/lenbass/02-team-practcies-and-overall-architecture

• http://www.slideshare.net/lenbass/03-build-structure-and-testing

• NICTA research papers. https://ssrg.nicta.com.au/projects/cloud/