Google, Quality & You

Upload: nelinger

Post on 08-Apr-2017

Page 1: Google, quality and you

Google, Quality & You

Page 2: Google, quality and you

Agenda

1. Why we care about test automation
2. Test sizes and test hermeticity
3. Deflake strategies
4. A tale of large tests

Page 3: Google, quality and you

Codebase - as of Jan 2015

Number of files: 1 billion
Number of source files: 9 million
Lines of code: 2 billion
Depth of history: 35 million commits
Size of contents: 86 terabytes
Commits per workday: 45 thousand

Source: https://www.youtube.com/watch?v=W71BTkUbdqE

Page 4: Google, quality and you

Codebase - as of Jan 2015

● 15 million lines in 250 thousand files are changed per week by humans. That is about the same number of lines of code as in the Linux kernel

Source: https://www.youtube.com/watch?v=W71BTkUbdqE

Page 5: Google, quality and you

Codebase - as of Jan 2015

● ⅔ of these 45k commits are done by robots

Source: https://www.youtube.com/watch?v=W71BTkUbdqE

Page 6: Google, quality and you

Codebase - as of Jan 2015

● 800k read requests per second at daily peak

Source: https://www.youtube.com/watch?v=W71BTkUbdqE

Page 7: Google, quality and you

Build System

● Most engineers build large parts of the codebase many times a day
● Everything is built from head

http://www.bazel.io/docs/be/c-cpp.html

cc_library(
    name = 'search',
    hdrs = ['search.h'],
    srcs = ['search.cc'],
    deps = ['//index:query'],
)

cc_test(
    name = 'search_test',
    srcs = ['search_test.cc'],
    deps = [':search'],
)

Page 8: Google, quality and you

Runtime dependencies

● Different push cycles
● Frequent push cycles

Page 9: Google, quality and you

Runtime dependencies

● Different push cycles
● Frequent push cycles

[Diagram labels: MyBinary, Library]

Page 10: Google, quality and you

Runtime dependencies

● Different push cycles
● Frequent push cycles

[Diagram labels: MyBinary, Library, OtherBinary, RPC]

Page 11: Google, quality and you
Page 12: Google, quality and you

What do you call a test that tests your application through its UI?

Page 13: Google, quality and you

What do you call a test that tests your application through its UI?

Ui test

Integration test

Functional test

Regression test

Black box test

Selenium/Webdriver test

E2E test

Release test

Validation test

...

Page 14: Google, quality and you

Just what, exactly, is an integration test? A unit test? How do we name these things?

Page 15: Google, quality and you

Different tests have (very) different properties, and it is important to have a common language to talk about tests (for example, when deciding how to schedule them)

Page 16: Google, quality and you

At Google, we like to make decisions based on data, rather than just relying on gut instinct or something that can’t be measured and assessed

Over time we’ve come to agree on a set of data-driven naming conventions for our tests. We call them “Small”, “Medium” and “Large” tests

Page 17: Google, quality and you

Small test

A unit test. Tests a class or a function

Specific logic conditions; heavy use of mocks, stubs and fakes

(Blazingly) fast; the test runner is expected to kill a small test quickly if it runs long

Opens no external ports

No sleep statements

Single threaded, no async flows, no race conditions

Run frequently - while editing code
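
As an illustration of the properties listed above, a small test might look like the following sketch. The names `FakeRateProvider`, `convert` and `ConvertTest` are hypothetical and not taken from the talk; the point is only that the dependency is replaced by an in-memory fake, so the test needs no ports, sleeps or threads.

```python
import unittest


class FakeRateProvider:
    """In-memory stand-in for a (hypothetical) remote exchange-rate service."""

    def __init__(self, rates):
        self._rates = rates

    def rate(self, currency):
        return self._rates[currency]


def convert(amount, currency, provider):
    """Unit under test: pure logic, no I/O, no threads, no sleeps."""
    if amount < 0:
        raise ValueError("amount must be non-negative")
    return round(amount * provider.rate(currency), 2)


class ConvertTest(unittest.TestCase):
    def setUp(self):
        # The fake is cheap to build, so every test gets a fresh one.
        self.provider = FakeRateProvider({"EUR": 0.5})

    def test_converts_with_rate(self):
        self.assertEqual(convert(10, "EUR", self.provider), 5.0)

    def test_rejects_negative_amount(self):
        with self.assertRaises(ValueError):
            convert(-1, "EUR", self.provider)


def run_small_tests():
    """Run the test case programmatically and return the unittest result."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(ConvertTest)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

Because nothing here touches the outside world, a test like this can be rerun on every edit.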

Page 18: Google, quality and you

Medium test

Interaction of one or more modules on a single machine

Less mocking

Slower (the test runner allows more time before killing the test)

Network access is limited to localhost

Permits sleep statements

Uses lightweight tools such as in-memory databases to improve performance

Multiple threads, async flows

(Aimed to be) run before submitting code, along with small tests
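
A medium test along these lines could be sketched as follows, assuming a hypothetical `UserStore` module backed by SQL. It uses Python's built-in `sqlite3` in-memory database and several threads on a single machine; none of these names come from the talk.

```python
import sqlite3
import threading
import unittest


class UserStore:
    """Hypothetical module under test, backed by a SQL database."""

    def __init__(self, conn):
        self._conn = conn
        self._lock = threading.Lock()  # serialize access to the shared connection
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS users (name TEXT PRIMARY KEY)")

    def add(self, name):
        with self._lock:
            self._conn.execute(
                "INSERT OR IGNORE INTO users VALUES (?)", (name,))
            self._conn.commit()

    def count(self):
        with self._lock:
            return self._conn.execute(
                "SELECT COUNT(*) FROM users").fetchone()[0]


class UserStoreTest(unittest.TestCase):
    def setUp(self):
        # In-memory database: real SQL, but no external service to start.
        self.conn = sqlite3.connect(":memory:", check_same_thread=False)
        self.store = UserStore(self.conn)

    def tearDown(self):
        self.conn.close()

    def test_concurrent_adds(self):
        threads = [
            threading.Thread(target=self.store.add, args=(f"user{i}",))
            for i in range(8)
        ]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
        self.assertEqual(self.store.count(), 8)


def run_medium_tests():
    """Run the test case programmatically and return the unittest result."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(UserStoreTest)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```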

Page 19: Google, quality and you

Large test

A system test, integration test, or end-to-end test that verifies that a complete application works and accounts for the behavior of external subsystems

Exercises any or all application subsystems and may make use of external resources such as databases, file systems, and network services

Slooooooooooooooooooooooooooooooow

External dependencies

Multiple threads, multiple processes, even multiple machines

Run as frequently as possible, but definitely cannot run in a presubmit queue

Page 20: Google, quality and you

Requirements common to all sizes

Each test must be independent from other tests; tests must be runnable in any order

Tests must not have any persistent side effects. They must leave their environment as it was before they started
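
One way to meet both requirements is sketched below, under the assumption that the test's only side effect is a file on disk. Each test gets its own scratch directory that is removed afterwards, so the two tests pass in either order. `ReportWriterTest` and `run_in_order` are illustrative names.

```python
import os
import tempfile
import unittest


class ReportWriterTest(unittest.TestCase):
    def setUp(self):
        # Fresh scratch directory per test; cleaned up even if the test fails.
        self._tmp = tempfile.TemporaryDirectory()
        self.addCleanup(self._tmp.cleanup)
        self.path = os.path.join(self._tmp.name, "report.txt")

    def test_write(self):
        with open(self.path, "w") as f:
            f.write("ok")
        self.assertTrue(os.path.exists(self.path))

    def test_starts_empty(self):
        # Holds whether or not test_write ran first: nothing is shared.
        self.assertFalse(os.path.exists(self.path))


def run_in_order(names):
    """Run the named test methods in exactly the given order."""
    suite = unittest.TestSuite(ReportWriterTest(name) for name in names)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

Calling `run_in_order(["test_write", "test_starts_empty"])` and then the reverse order is a quick check that no state leaks between the tests.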

Page 21: Google, quality and you

Test hermeticity

Hermetic tests can be run if you unplug the network cable

Non-hermetic tests are inherently flaky; don’t try to fix these

Larger tests tend to be less hermetic

Page 22: Google, quality and you

You should not think about small tests the same way you think about large tests

Page 23: Google, quality and you

The Test Pyramid

Page 24: Google, quality and you

Small tests

Write a lot of them

Maximize code coverage

Run as much as possible

Do invest in faking out dependencies

Sharding and parallelization

Block submit on failures

Do not allow flakes in

Page 25: Google, quality and you

Page 26: Google, quality and you

Flakiness @Google

We heavily rely on (small) tests

Literally, without these we wouldn’t be able to scale up with a single monolithic repository

We define a "flaky" test result as a test that exhibits both a passing and a failing result with the same code (at the same version)

Root causes: concurrency, relying on non-deterministic or undefined behaviors, flaky third party code, infrastructure problems, rendering, GPU and animations

Across our entire corpus of tests, we see a continual rate of about 1.5% of all test runs reporting a "flaky" result
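
The definition of a flaky result above translates directly into code. This is a minimal sketch that assumes run records are available as `(test_name, version, passed)` tuples; that record format is an assumption made for illustration.

```python
from collections import defaultdict


def flaky_results(runs):
    """Return the (test, version) pairs that both passed and failed.

    `runs` is an iterable of (test_name, version, passed) tuples.
    """
    outcomes = defaultdict(set)
    for test, version, passed in runs:
        outcomes[(test, version)].add(passed)
    # Flaky: the same test at the same version showed both outcomes.
    return {key for key, seen in outcomes.items() if seen == {True, False}}
```

A test that fails consistently at one version is simply broken, not flaky, and is not reported.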

Page 27: Google, quality and you

Flakiness @Google

Our continuous build systems understand when a test has transitioned from a passing state to failure

If we had no flakes, we could automatically, efficiently and reliably find the culprit change (binary search)

Once the culprit is found, we can automatically roll it back
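
The binary search for a culprit can be sketched as below. It assumes an ordered list of commits whose first entry is known to pass and whose last entry is known to fail, and a deterministic `passes_at` callback; with flaky tests that last assumption breaks, which is why flakes make culprit finding unreliable. The function and parameter names are illustrative.

```python
def find_culprit(commits, passes_at):
    """Return the first commit at which the test fails.

    `commits` is ordered oldest to newest; commits[0] is known to pass and
    commits[-1] is known to fail. `passes_at(commit)` must be deterministic.
    """
    lo, hi = 0, len(commits) - 1  # invariant: commits[lo] passes, commits[hi] fails
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if passes_at(commits[mid]):
            lo = mid
        else:
            hi = mid
    return commits[hi]
```

Each step halves the range, so the number of test runs grows with the logarithm of the number of commits between the last green and the first red build.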

Page 28: Google, quality and you

Deflake strategies

Mark a test as flaky (ok if passes 1 out of 3)

Each time a test fails, it is executed again in the background; if it passes, the original run is designated as flaky

The flake probability is the number of times the test failure was a flake over the total number of test passes for the particular test. It is the historic ratio of a test's flakes to its passes

We use this metric to automatically quarantine flaky tests

Another tool detects changes in the flakiness level of tests and works to identify the change that caused the test to change the level of flakiness

Another approach: new tests get quarantined by default
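
The strategies above can be put together in a short sketch. The rerun-on-failure rule and the flakes-to-passes ratio follow the description on this slide; the quarantine threshold of 0.25, the decision to count a passing rerun as a pass, and all names (`History`, `record_run`, `should_quarantine`, `passes_with_retries`) are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class History:
    passes: int = 0
    flakes: int = 0
    failures: int = 0

    def flake_probability(self):
        # Historic ratio of flakes to passes; 0.0 until the test has passed once.
        return self.flakes / self.passes if self.passes else 0.0


def record_run(history, run_test):
    """Run a test once; on failure, rerun it and classify the first result."""
    if run_test():
        history.passes += 1
        return "pass"
    # First attempt failed: rerun (in the background in the description above).
    if run_test():
        history.flakes += 1
        history.passes += 1  # assumption: the passing rerun counts as a pass
        return "flaky"
    history.failures += 1
    return "fail"


def should_quarantine(history, threshold=0.25):
    """Quarantine once the flake probability exceeds a (hypothetical) threshold."""
    return history.flake_probability() > threshold


def passes_with_retries(run_test, attempts=3):
    """'OK if it passes 1 out of 3': stop at the first passing attempt."""
    return any(run_test() for _ in range(attempts))
```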

Page 29: Google, quality and you

A tale of large tests

[Diagram labels: MyBinary, Library, OtherBinary, RPC]

Page 30: Google, quality and you

Page 31: Google, quality and you

Page 32: Google, quality and you

Page 33: Google, quality and you

A tale of large tests

[Diagram labels: Test, MyBinary, Library, RPC, Local OtherBinary]

Page 34: Google, quality and you

A tale of large tests

[Diagram labels: Test, MyBinary, Library, RPC, Real OtherBinary, Production]

Page 35: Google, quality and you

A tale of large tests

[Diagram labels: Production, PreProduction, MyBinary Candidate, Library, RPC, Test, (Per candidate), Real OtherBinary]

Page 36: Google, quality and you

A tale of large tests (last slide)

● Runs continuously against different configurations
● Statistical approach (probers)
● Distributed geographically
● Alerts with respect to configuration importance

[Diagram labels: Production, PreProduction, MyBinary Candidate, Library, RPC, Test, Real OtherBinary]
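
The slide does not say how the probers decide when to alert. One possible reading of "statistical approach" and "alerts with respect to configuration importance" is sketched below: keep a sliding window of probe results per configuration and alert when the failure rate exceeds a threshold that depends on how important the configuration is. The class, the window size and the thresholds are all hypothetical.

```python
from collections import deque


class Prober:
    """Tracks recent probe results per configuration (illustrative only)."""

    def __init__(self, window, thresholds):
        self._window = window          # number of recent results kept per config
        self._thresholds = thresholds  # importance -> max tolerated failure rate
        self._results = {}             # config -> deque of booleans

    def record(self, config, ok):
        window = self._results.setdefault(config, deque(maxlen=self._window))
        window.append(ok)

    def failure_rate(self, config):
        results = self._results.get(config)
        if not results:
            return 0.0
        return results.count(False) / len(results)

    def alerts(self, importance_of):
        """Return configs whose failure rate exceeds their importance threshold."""
        return sorted(
            config for config in self._results
            if self.failure_rate(config) > self._thresholds[importance_of[config]]
        )
```

With this shape, a single failed probe does not page anyone; only a sustained failure rate does, and more important configurations can be given a lower tolerance.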

Page 37: Google, quality and you