Dejan Vukmirović Belgrade, 2016 Automating the Quality

Post on 08-Jan-2017


Page 1: Automating the Quality

Dejan Vukmirović, Belgrade, 2016

Automating the Quality

Page 2: Automating the Quality

A bit of context…

Ticketmaster

Global ticket sales and distribution company. A cliché, but the global leader in its line of business. Large IT operation. Engineering HQs in Los Angeles and London. More than 150 platforms/products. Both legacy stuff and cutting-edge technologies.

Bakson

Belgrade-based IT company. Ticketmaster’s development centre. Currently around 50 people, only engineering. Mainly Java projects. Strong in the local Scala community.

Page 3: Automating the Quality

A bit of context…

Why emphasise the “quality” ?

Page 4: Automating the Quality

A bit of context…

Because of the business people

Each high-priority production bug (Business Disruption) can be directly linked to, and measured in, lost money.

Bug? Fans can’t purchase tickets.

Bug? Fans can’t enter the venue.

Page 5: Automating the Quality

A bit of context…

Because of the fans

Adele’s entire European tour sold out in two days, in less than 15 minutes per day.

Huge success. But…

Page 7: Automating the Quality

A bit of context…

How can DevOps help teams?

And how to move “there”?

Page 8: Automating the Quality

A bit of context…

DevOps Maturity Model

Company-wide initiative. Assessed by Gartner. 18 categories: “Deployment”, “Support”, etc.

Products are required to “move” through the matrix. Progress is constantly evaluated.

Last phase targets: Canary release, Chaos Monkey, etc.

Additional benefits: standardisation, guidance.

Page 9: Automating the Quality

Public API

HTTP service. Not RESTful. Close to 100 endpoints/actions.

2 years live in production.

Development + QA team size = 10 people

Page 10: Automating the Quality

Public API

Distributed architecture (microservices). Java stack.

Storages: relational, NoSQL, search engines…

Apigee as the management layer.

Each microservice has its own source code repository.

Page 11: Automating the Quality

Issues list

1. Long-lasting regression campaign: a week of testing once release development is completed.

2. Late integration with clients: code only goes to the shared environment after the entire release is developed/completed.

3. Non-standardised deploy procedures: a variety of tools, or even manual steps. Procedures differ from env to env.

4. Difficult to pinpoint the root cause of broken functionality: automated tests run on the entire release, and clients also test only the entire release build.

Page 12: Automating the Quality

The goal

Start automation on feature completion (code pushed to repository)

Run Unit Tests -> Do Static Code Analysis -> Build & Save Package -> Deploy -> Check Service Status -> Run Integration Tests -> Send Reports
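The flow above can be read as a fail-fast sequence of stages. The following is only a minimal sketch of that idea, assuming each stage can be modelled as a function that reports success or failure. The stage names come from the slide; `run_pipeline` and the lambda stubs are hypothetical and merely stand in for the real tools.

```python
from typing import Callable, List, Tuple

# Each stage is a (name, action) pair; the action returns True on success.
Stage = Tuple[str, Callable[[], bool]]


def run_pipeline(stages: List[Stage]) -> List[str]:
    """Run stages in order, stop at the first failure, and return a report."""
    report = []
    for name, action in stages:
        ok = action()
        report.append(f"{name}: {'OK' if ok else 'FAILED'}")
        if not ok:
            break  # later stages are skipped once one fails
    return report


# Hypothetical stand-ins for the real tools; none of them does real work.
stages = [
    ("Run Unit Tests", lambda: True),
    ("Do Static Code Analysis", lambda: True),
    ("Build & Save Package", lambda: True),
    ("Deploy", lambda: True),
    ("Check Service Status", lambda: False),  # simulate a negative healthcheck
    ("Run Integration Tests", lambda: True),
]

report = run_pipeline(stages)
print("\n".join(report))  # plays the role of the "Send Reports" step
```

In this sketch the integration tests never run against a service whose status check failed, which is the point of placing that check before them.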

Page 13: Automating the Quality

Tool - GitLab

Git repository management tool.

Many additional features: code review, continuous integration, deploy…

On premise or SaaS. Free and Commercial editions.

It is the first point in our flows: upon code push, webhooks trigger the next tool in the flow.

(Note: our first AWS-based service is using GitLab’s own CI, but that is still WiP.)

Page 14: Automating the Quality

Continuous Integration

Start automation on feature completion (code pushed to repository)

Run Unit Tests -> Do Static Code Analysis -> Build & Save Package -> Deploy -> Check Service Status -> Run Integration Tests -> Send Reports

Continuous Integration (CI) is a development practice that requires developers to integrate code into a shared repository several times a day. Each check-in is then verified by an automated build, allowing teams to detect problems early.

- Martin Fowler

Page 15: Automating the Quality

Tool - Jenkins

Automation server. Gets additional power from numerous plugins available.

Open source. Available only on premise.

Main unit is the “job”.

In TM, jobs can be created only through the code repository. Creation via GUI is disabled. Two configuration XML files are part of the application code.

Reasoning:
- (distributed) versioning
- easy to restore in case of issues with the Jenkins server
- easy to migrate between Jenkins instances

Page 16: Automating the Quality

• JENKINS Screenshot

• Demo ?

Page 17: Automating the Quality

Tool - SonarQube

Platform for continuous inspection of code quality.

More than 20 programming languages are covered.

Open source. Available only on premise.

Some TM teams fail the Jenkins job on code quality violations.

The API team reviews reports per Sprint/Release.

Using FindBugs as a plugin.

Page 20: Automating the Quality

Tool - Nexus

Artifact repository.

Free or Commercial. Available only on premise.

Supports multiple platforms out of the box (Java, NPM, Docker…).

The TM instance is locked against manual upload of artifacts. Only Jenkins instances can upload, through a predefined Release plugin.

Supports the release process: only “promoted” artifacts are available for Production deploy.

API team is reviewing reports per Sprint/Release.

Page 22: Automating the Quality

CI is completed

When to run integration tests?

Where to run integration tests?

Page 23: Automating the Quality

GitFlow

Branching model introduced by Vincent Driessen and popularised by Atlassian.

Feature/task is merged to “develop” on completion (as per the “Definition of Done”).

“Release” branch is created on demand.

“Release” is merged to “master” when ready for production.

This helps answer “When?”: on merge to “develop”.

Page 24: Automating the Quality

Where?

The major problem…

…is not developing tests…

…it’s not creating environments…

…it’s not even about automating the whole thing.

IT’S ALWAYS DATA.

Page 25: Automating the Quality

Data setup

Because you (usually) can’t control data in your dependencies.

Use mocks
- Easier to initially develop.
- Difficult to maintain: you have to track the evolution of dependencies.
- Allows easier setup of testing environments.

Permanent data sets
- Allows testing in a “real” environment.
- Difficult to initially develop.
- Easier to maintain, since the owners of your data will have to migrate it together with the rest of their data.

We decided to go with permanent sets! There is a creation tool available on TM backends.

Page 26: Automating the Quality

API environments

Environments (diagram): DEVs, QAs, TPI, CAP, Stage, Production(s)

Stage and Production have SLAs defined.

Mapping to Gitflow: “develop” -> TPI, “release” -> Stage, “master” -> Prod
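The branch-to-environment mapping can be expressed as a small lookup. This is a sketch only: the three mappings come from the slide, while `target_environment` is a hypothetical helper, and the assumption that release branches carry a "release" prefix (for example "release/1.2") is a common Gitflow convention rather than something stated in the talk.

```python
from typing import Optional

# Mapping taken from the slide.
BRANCH_TO_ENV = {"develop": "TPI", "release": "Stage", "master": "Prod"}


def target_environment(branch: str) -> Optional[str]:
    """Return the environment a branch deploys to, or None for other branches."""
    prefix = branch.split("/", 1)[0]
    # Assumption: release branches are named "release", "release/x" or "release-x".
    if prefix.startswith("release"):
        return BRANCH_TO_ENV["release"]
    return BRANCH_TO_ENV.get(branch)


print(target_environment("develop"))        # TPI
print(target_environment("release/1.2"))    # Stage
print(target_environment("feature/login"))  # None: feature branches are not deployed
```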

Page 27: Automating the Quality

Where?

Each service (should) have its own integration tests.

Test everything!

But for the API it is crucial that “everything works” on the Gateway.

Page 28: Automating the Quality

Tool - Rundeck

Tool for runbook automation and execution of arbitrary management tasks.

Open source. Available only on premise.

Is Rundeck even needed if you already use Jenkins?

- “Rundeck is made for Operations and knows about the details of your environments.”

- “Jenkins is fundamentally not a deployment tool, although it can be used like one.”

Page 29: Automating the Quality

QA framework

Separate project. Own source-code repo.

Implemented in Java. Maven project. Uses standard HTTP clients and Java testing libs (JUnit, TestNG).

Used for functional testing. Blackbox testing of our services (no DB access, log checks…).

Smoke suite: ~1,000 tests, ~5 mins to execute.

Regression suite: ~10,000 tests, ~35 mins to execute.

Every feature or bug we ever had is included in the regression suite. We are constantly supporting 2 API versions with test suites covering both.

Page 30: Automating the Quality

Implementation issues…

1. Limit Jenkins “job” to be executed only from “develop”
- Branching out for a new feature results in an identical copy of the Jenkins XML configs.
- Jenkins plugins have limited support for conditional execution in some phases.

2. QA “job” to be triggered only by the service’s “develop”
- Another set of conditionals/variables to be set/passed between jobs.

3. Know which services are involved in a feature
- The only way to cover all cases/features is to always deploy and test all services.

Page 31: Automating the Quality

Try with job chaining

Standard Jenkins "freestyle" jobs support simple sequential tasks execution.

Doesn’t work in our case.
- Git triggers would result in service restarts while test execution is active.

An additional idea was to introduce another branch so that the entire flow would not be triggered from “develop”.
- Additional work/thinking required from developers.
- Where to place the “signal” that would trigger the entire flow?

Page 32: Automating the Quality

Try with plugins

The “closest” to what we need was found in the “JobFanIn” plugin.

This plugin provides a watch on upstream projects to trigger downstream projects once all upstream projects are successfully built.

Doesn’t work in our case.
- Impossible to predict which services a feature will reside on.

Page 33: Automating the Quality

Step back. Rethink.

Do we really need to deploy and test everything, always?

Does this approach actually fit a microservices architecture?

LET’S SIMPLIFY. For each service, only deploy and test itself.

Yes, developers will need to do additional thinking when finishing a feature that spans multiple services.

Page 34: Automating the Quality

Testing agreement

On merge to develop (as by Gitflow).

Deploy to live environment - TPI.

Use permanent data sets.

Each microservice (and the gateway) will have an accompanying QA framework. Execute it upon service deploy.

If a feature spans multiple microservices, it is up to developers to sequence the testing.

Page 37: Automating the Quality

The goal

Start automation on feature completion (code pushed to repository)

Run Unit Tests -> Do Static Code Analysis -> Build & Save Package -> Deploy -> Check Service Status -> Run Integration Tests -> Send Reports

Page 38: Automating the Quality

Deploy validation

Via healthchecks: internally exposed HTTP endpoints that provide a summary of dependencies’ and internal statuses.

Every product must implement this TM standard.

Response must be quick. Healthcheck status is composed by a background job.

Healthchecks are used in monitoring, and by load balancers.

Rundeck/Jenkins will fail the job if the healthcheck is negative.
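One way to keep the response quick while still summarising dependencies is to let a background job refresh a cached status that the endpoint simply reads. The sketch below illustrates that pattern under stated assumptions: `HealthCheck`, `validate_deploy`, the probe names and the response shape are all hypothetical, and the TM healthcheck standard itself is not shown in the talk.

```python
import threading
from typing import Callable, Dict


class HealthCheck:
    """Caches dependency statuses so the endpoint can answer immediately."""

    def __init__(self, probes: Dict[str, Callable[[], bool]]):
        self._probes = probes
        self._lock = threading.Lock()
        # Report unhealthy until the first refresh has actually run.
        self._status = {name: False for name in probes}

    def refresh(self) -> None:
        """Run every probe; meant to be called periodically by a background job."""
        results = {}
        for name, probe in self._probes.items():
            try:
                results[name] = bool(probe())
            except Exception:
                results[name] = False  # a crashing probe counts as unhealthy
        with self._lock:
            self._status = results

    def summary(self) -> dict:
        """Cheap read of the cached state; what the HTTP endpoint would return."""
        with self._lock:
            deps = dict(self._status)
        return {"healthy": all(deps.values()), "dependencies": deps}


def validate_deploy(check: HealthCheck) -> int:
    """Exit code for a deploy-validation step: 0 when healthy, 1 otherwise."""
    return 0 if check.summary()["healthy"] else 1


# Hypothetical probes: one dependency is up, one is down.
check = HealthCheck({"database": lambda: True, "search": lambda: False})
check.refresh()
print(check.summary())
print(validate_deploy(check))  # non-zero, so the deploy job would be failed
```

Because `summary` never calls a dependency directly, a slow or hanging backend cannot slow down the monitoring system or the load balancer that polls the endpoint.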

Page 40: Automating the Quality

The Question!

Is what we are doing Continuous Delivery?

Or maybe Continuous Deployment?

Page 41: Automating the Quality

CD vs CD

Continuous Delivery is about keeping your application in a state where it is always able to deploy into production.

Continuous Deployment is actually deploying every change into production, every day or more frequently.

- Martin Fowler

Page 42: Automating the Quality

CD vs CD

Why not all the way to Production?

“We (API) are only the half-product.” - Vanja Radaković (Product Manager)

Even if all tests on the API pass, that doesn’t mean no functionality is broken on our clients.

We “sit” for a week in the Stage env, waiting for sign-off from major clients, between when a release is ready and when it is actually deployed to Production.

Environments (diagram): DEVs, QAs, TPI, Stage, Production(s)

Page 43: Automating the Quality

Automating the security

Veracode is a platform for application security scanning. Commercial. Available only as SaaS.

We have added a branch that (via GitLab and Jenkins) automatically uploads artifacts to Veracode.

Because the scan takes a long time, this is not included in the regular flow on feature completion.

There are company-wide defined policies.

We are reviewing status once per Sprint/Release.

Page 45: Automating the Quality

Performance testing (WiP)

Running on a dedicated environment. Same topology (num. of servers) and data size as in production.

Our production data is imported on demand.

All backend dependencies are mocked because provisioning their data is difficult. The TPI environment we use for functional testing contains inconsistent and not-big-enough data.

Mocks are based on our logs from production.

API mocking tool - WireMock.
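Building mocks from production logs can be sketched as a conversion from one logged request/response pair into a WireMock JSON stub mapping. The `request`/`response` structure with `method`, `urlPath`, `status`, `headers` and `jsonBody` follows WireMock's stub mapping format. The log record layout, its field names and the helper `to_wiremock_stub` are assumptions made for illustration, since the talk does not show the real payload log format.

```python
import json
from typing import Any, Dict


def to_wiremock_stub(log_entry: Dict[str, Any]) -> Dict[str, Any]:
    """Convert one recorded request/response pair into a WireMock stub mapping."""
    return {
        "request": {
            "method": log_entry["method"],
            "urlPath": log_entry["path"],
        },
        "response": {
            "status": log_entry["status"],
            "headers": {"Content-Type": "application/json"},
            "jsonBody": log_entry["response_body"],
        },
    }


# Hypothetical payload-log record; field names and values are made up.
entry = {
    "method": "GET",
    "path": "/events/123",
    "status": 200,
    "response_body": {"id": 123, "name": "Sample event"},
}

stub = to_wiremock_stub(entry)
print(json.dumps(stub, indent=2))  # one JSON file per stub in WireMock's "mappings" directory
```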

What if you need to mock something other than an HTTP API, like storage? Rethink your architecture.

Page 46: Automating the Quality

Tool - Gatling

Load testing framework. Open source.

Supports code written in Scala or Java.

Can be executed from command line.

Easy to integrate with Jenkins using the official plugin.

Page 49: Automating the Quality

Performance testing ideas

Automate in a way similar to security scanning - new branch.

Jenkins to build. Rundeck to deploy. Gatling to execute tests.

Bonus: attach APM tooling that would provide insights during testing. Currently evaluating New Relic and Ruxit.

Page 50: Automating the Quality

Logging

Company standards to separate logs:
• application log
• payload log (inbound/outbound)
• performance log

Only application logs are indexed. Others are available on servers for N days (depending on retention policy).

Unique “Correlation ID” that allows tracking of requests through multiple services and all types of log files.
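The correlation ID idea can be sketched as follows: reuse the ID that arrives with the request, create one if it is missing, write it into every log type, and forward it to the next service. The header name `X-Correlation-ID`, the log line format and the helper names are assumptions; the talk does not specify them.

```python
import io
import logging
import uuid

HEADER = "X-Correlation-ID"  # assumed header name, not taken from the talk


def ensure_correlation_id(headers: dict) -> str:
    """Reuse the caller's ID if present, otherwise start a new one."""
    return headers.get(HEADER) or str(uuid.uuid4())


def make_logger(name: str, stream) -> logging.Logger:
    """Create a logger whose every line carries the correlation ID."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.propagate = False
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter("%(name)s cid=%(cid)s %(message)s"))
    logger.addHandler(handler)
    return logger


def handle_request(headers: dict, app_log, perf_log) -> dict:
    """Log to two separate log types and return the headers to forward."""
    cid = ensure_correlation_id(headers)
    app_log.info("request received", extra={"cid": cid})
    perf_log.info("took_ms=12", extra={"cid": cid})  # fixed value, illustration only
    return {HEADER: cid}


out = io.StringIO()  # stands in for the separate log files
app_log = make_logger("application", out)
perf_log = make_logger("performance", out)
forwarded = handle_request({HEADER: "abc-123"}, app_log, perf_log)
print(out.getvalue())
```

Searching any of the log files for `cid=abc-123` then returns every line that belongs to the same request, regardless of which service or log type wrote it.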

Page 51: Automating the Quality

Tool - Splunk

Platform for operational intelligence. Much more than log aggregation (searching, monitoring and visualization).

On premise or SaaS. Free and Commercial editions.

Our dashboards: relationships between HTTP errors (not application errors) and clients.

Our alerting: on detected deviation/increase in volume of errors.
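Alerting on a deviation in error volume can be illustrated with a simple statistical check. This is a generic sketch and not Splunk's alerting configuration: the function `is_anomalous`, the threshold of three standard deviations and the sample counts are all assumptions chosen for the example.

```python
from statistics import mean, stdev
from typing import List


def is_anomalous(history: List[int], current: int, threshold: float = 3.0) -> bool:
    """Flag `current` when it exceeds the historical mean by `threshold` standard deviations."""
    if len(history) < 2:
        return False  # not enough data to estimate the spread
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        return current > mu  # flat history: any increase is a deviation
    return (current - mu) / sigma > threshold


errors_per_minute = [4, 6, 5, 7, 5, 6]  # made-up counts of HTTP errors

print(is_anomalous(errors_per_minute, 7))   # within the usual range
print(is_anomalous(errors_per_minute, 30))  # a clear increase in volume
```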

Page 54: Automating the Quality

Benefits we (Dev team) got

1. Automation on “feature completion”: less thinking for developers. Quicker test and feedback cycles.

2. Using the same tools for all environments: feeling very comfortable during production deploys.

3. Visibility of changes and metrics: being able to react quickly, or even take preemptive action.

4. Company initiatives as guidance: no need to “reinvent the wheel”. Shared knowledge. Contributing to solutions.

Page 55: Automating the Quality

Feel free to contact us: [email protected]

Thanks for listening!