ship happens: a better firefox build and release pipeline

Kim Moir (kmoir), Mozilla Release Engineering

Ship Happens: A better Firefox build & release pipeline

“I am notorious for making impassioned speeches about things nobody cares about.”

― Mindy Kaling, Why Not Me?

https://www.goodreads.com/author/show/194416.Mindy_Kaling

https://www.goodreads.com/work/quotes/41897766

Today’s agenda

● Faster pipelines and what they mean for you

● How to try it yourself!

● Lessons learned and what’s next

Mozilla Releng live here

Release times

● 2013 - 11 hours

● 2017 - 4-5 hours

Continuous integration

[Pipeline diagram. Stages shown: land code, unit tests, decision graph, builds x N platforms, performance tests, sign builds]

Nightlies

[Pipeline diagram. Stages shown: land code, unit tests, decision graph, builds x N platforms, performance tests, sign builds, generate updates, L10n]

Release process using release promotion

[Pipeline diagram. Stages shown: use existing build artifacts, generate updates, L10n, unit tests, decision graph, sign builds, performance tests, repackage builds, move artifacts, refresh update db rules, update websites with release]

About:Taskcluster

● Taskcluster is a task execution framework that supports Mozilla’s continuous integration farm and release pipeline

● It is a set of components that manage task queuing, scheduling, execution and provisioning of resources

Why: In-tree and Decision Graph

● Build and test configs are all in tree

○ Good news: Developer autonomy

○ Bad news: Developer autonomy

● A decision graph generated on each push identifies failures more quickly

● Changes can be tested locally and on try
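
To make the idea of a graph of tasks concrete, here is a minimal sketch of how a dependency graph fixes the order in which tasks may run, and how it can be trimmed down to what a given push needs. The task labels, the edges between them and the helper names `execution_order` and `prune` are illustrative assumptions, not the real Firefox graph or Taskcluster's own code.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical graph: each task label maps to the labels it depends on.
GRAPH = {
    "build-linux64": set(),
    "build-win64": set(),
    "unit-tests-linux64": {"build-linux64"},
    "unit-tests-win64": {"build-win64"},
    "sign-linux64": {"build-linux64"},
}


def execution_order(graph):
    """Return one valid order in which every task runs after its dependencies."""
    return list(TopologicalSorter(graph).static_order())


def prune(graph, targets):
    """Keep only the target tasks plus everything they depend on."""
    keep, stack = set(), list(targets)
    while stack:
        label = stack.pop()
        if label not in keep:
            keep.add(label)
            stack.extend(graph[label])
    return {label: deps for label, deps in graph.items() if label in keep}


if __name__ == "__main__":
    print(execution_order(GRAPH))
    print(sorted(prune(GRAPH, ["unit-tests-win64"])))
```

Because the graph is plain data, a change to it can be checked before anything is scheduled, which is the property the bullets above rely on.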

Testing the graph locally

● Generates the full taskgraph.

○ ./mach taskgraph full > full.txt

● Generates an optimized taskgraph

○ ./mach taskgraph optimized > optimized.txt

● Generates a target taskgraph

○ ./mach taskgraph target -p parameters.yml > target.txt

● Generates a target taskgraph as JSON so you can inspect the contents of the graph

○ ./mach taskgraph target --json -p parameters.yml > target.json

● Taskcluster config files are under taskcluster/ in tree

○ Example: taskcluster/ci/build/macosx.yml defines mac builds (which actually run on Linux)
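
The output of the `--json` command above can be explored with a short script. This is a minimal sketch that assumes the JSON is an object mapping each task label to a record with a `kind` field and a `dependencies` mapping. Those field names, and the helper names `summarize` and `summarize_file`, are assumptions for illustration, so check them against your own output.

```python
import json
from collections import Counter


def summarize(graph):
    """Count tasks per kind and find the label with the most dependencies."""
    kinds = Counter(task.get("kind", "unknown") for task in graph.values())
    most_deps = max(
        graph,
        key=lambda label: len(graph[label].get("dependencies", {})),
        default=None,
    )
    return kinds, most_deps


def summarize_file(path):
    """Load a taskgraph JSON dump from disk and summarize it."""
    with open(path) as fh:
        return summarize(json.load(fh))
```

To use it, pass the file that the `--json` output was redirected to, then iterate over `kinds.most_common()` to see which kinds of task dominate the graph.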

Changing tests

● YAML files in taskcluster/ci/test/ define test groups by suite name, e.g. mochitest, reftest, talos

Why: Docker Containers

● Docker containers for test and build images (not all platforms)

○ Consistent environment to debug build and test failures via one-click loaners

○ More self-serve developer loaners

Why: More autoscaling

● Moved more platforms to AWS to enable autoscaling in response to bursty load

○ Moved macOS builds to Linux cross-compiles on AWS

○ Moved many Windows builds/tests to AWS
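
As a toy illustration of scaling a worker pool in response to bursty load, the sketch below sizes the pool from the number of waiting tasks. The sizing rule, the function name `desired_workers` and its parameters are hypothetical and are not Taskcluster's actual provisioning algorithm.

```python
import math


def desired_workers(pending_tasks, tasks_per_worker=1, min_workers=0, max_workers=100):
    """Return how many workers to run for the current backlog.

    The pool grows with the backlog but is clamped between min_workers
    and max_workers so a burst cannot start unlimited instances.
    """
    wanted = math.ceil(pending_tasks / tasks_per_worker)
    return max(min_workers, min(max_workers, wanted))
```

With no backlog the pool shrinks to `min_workers`, which is what makes on-demand capacity cheaper than a fixed pool sized for peak load.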

Why: More security

● Better security - Chain of Trust (CoT) between artifacts as they are built, signed and moved to AWS S3/CDNs for download on releases/nightlies

● CoT is the security model for releases

● Task execution is restricted by Taskcluster scopes, but that is only one type of authentication

● CoT allows us to trace requests back to the tree and verify each previous task in the chain.

● If CoT fails, the task is marked as invalid
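
The idea of verifying each previous task in the chain can be illustrated with a toy example. This is a simplified sketch, not Mozilla's actual Chain of Trust implementation: the `Task` record, the `verify_chain` function and the use of bare SHA-256 digests are assumptions made for illustration, and the real system does more than compare digests.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional


def sha256(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


@dataclass
class Task:
    name: str
    artifact: bytes                         # what this task produced
    upstream: Optional["Task"] = None       # task whose artifact was consumed
    upstream_digest: Optional[str] = None   # digest recorded for that input


def verify_chain(task: Task) -> bool:
    """Walk from the final task back to the root, checking each recorded digest."""
    while task.upstream is not None:
        if sha256(task.upstream.artifact) != task.upstream_digest:
            return False  # the input no longer matches what was recorded
        task = task.upstream
    return True


# Hypothetical three-step chain: build, then sign, then publish.
build = Task("build", b"firefox binary")
signing = Task("signing", b"signed binary", build, sha256(build.artifact))
publish = Task("publish", b"signed binary", signing, sha256(signing.artifact))

print(verify_chain(publish))   # True: every link matches
build.artifact = b"tampered"
print(verify_chain(publish))   # False: the build artifact changed after the fact
```

The check fails for the whole chain even though only the earliest artifact changed, which mirrors the rule that a failed CoT marks the task as invalid.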

Why+?

● Team learned new things - Docker, transforms, migration strategies, microservices, monitoring

● Future efficiencies - allow us to continue to scale

● Migrate off technologies that did not scale to our needs

● Re-evaluate existing jobs: Are they still needed? Could they be improved?

Timeline for migration

● Jan 20 - Linux Desktop and Android Firefox nightly builds from Taskcluster

● Mar 13 - Mobile beta in Taskcluster

● July 2 - Mac Nightlies in Taskcluster

● Aug 30 - Windows nightlies in Taskcluster

● Nov 14 - Shipped Firefox Quantum in Taskcluster

https://docs.taskcluster.net/

Approach to migration

● Incremental portions of pool

● Communication

● Checklist

● Monitor capacity and wait times

● Monitor state after migration

● Rollback plan

● Decommission old

● Migrate more

Strangler Application - Martin Fowler

56 was a rough release

● We had many automation changes

○ New compression format for updates

○ Watersheds for the win32->win64 migration for people on 64-bit hardware

○ Win32/Win64 on Taskcluster

Operation: Don’t F*ck up 57

● Implement missing release automation

● Fix our staging environment

● Smooth our merge day process

● Train team members on merges and staging releases

● Run staging releases and merges to iron out any issues before 57 releases

● Write tests to validate update rules for 57

● Spreadsheet to coordinate update rules with relman
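
Tests that validate update rules could look something like the sketch below. It assumes a hypothetical rule format, since the real update server's schema is not shown in these slides: `Rule`, `resolve_update` and the rule values are illustrative only. The first rule models a watershed, read here as a release that older clients must pass through before being offered the newest one.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Rule:
    priority: int                # higher-priority rules are checked first
    max_version: Optional[int]   # applies to clients below this version; None = all
    target: int                  # version the client is offered


def resolve_update(rules: List[Rule], client_version: int) -> Optional[int]:
    """Return the version offered to a client, or None if it is up to date."""
    for rule in sorted(rules, key=lambda r: r.priority, reverse=True):
        applies = rule.max_version is None or client_version < rule.max_version
        if applies and rule.target > client_version:
            return rule.target
    return None


RULES = [
    Rule(priority=100, max_version=56, target=56),   # hypothetical watershed
    Rule(priority=90, max_version=None, target=57),  # everyone else gets 57
]


def test_old_clients_stop_at_watershed():
    assert resolve_update(RULES, 52) == 56


def test_watershed_clients_reach_latest():
    assert resolve_update(RULES, 56) == 57


def test_current_clients_get_nothing():
    assert resolve_update(RULES, 57) is None
```

Written this way, the tests run under pytest and fail as soon as a rule change would let old clients skip the watershed.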

What have we learned?

● Incrementalism - change one thing, evaluate, then change another

● Expectations change. The faster we build, the faster other groups expect to be able to ship

● Staging environment is important to test new automation

● Communication

● Organizational changes

● Consider the operational side, not just landing code

Upcoming work

● In-tree release promotion for beta and release builds

● Release process optimizations: measure our end-to-end release times and common failure points, with the aim of providing more predictable and stable releases

● Staging releases on try

● More incremental fixes to make things faster

I embrace mistakes, they make you who you are

―Beyoncé

Questions?

Additional Reading

● Justin Wood’s (Callek’s) talks on transforms

https://gitpitch.com/Callek/slideshows/transforms_2017

● All your nightlies are belong to Taskcluster

https://atlee.ca/blog/posts/migration-status.html

● Nightly builds from Taskcluster

https://atlee.ca/blog/posts/nightly-builds-from-taskcluster.html

● 2016 retrospective https://atlee.ca/blog/posts/2016-releng-retrospective.html

● What's So Special About "In-Tree?"

http://code.v.igoro.us/posts/2016/08/whats-so-special-about-in-tree.html

Additional Reading

● Chris Cooper Nightlies in Taskcluster

http://coopcoopbware.tumblr.com/post/156133487075/nightlies-in-taskcluster-go-team

● Chris Cooper Mobile Betas in TC

http://coopcoopbware.tumblr.com/post/158362146735/shameless-self-release-promotion-firefox-530b1

● So you want to rewrite that - Camille Fournier, GOTO conference, Chicago, 2014

https://www.youtube.com/watch?v=PhYUvtifJXk
