parallelizing ci using docker swarm-mode · 2017. 12. 14. · solution: parallelize ci across...
TRANSCRIPT
Copyright©2017 NTT Corp. All Rights Reserved.
Akihiro Suda <[email protected]>
NTT Software Innovation Center
Parallelizing CI using Docker Swarm-Mode
2Copyright©2017 NTT Corp. All Rights Reserved.
https://github.com/AkihiroSuda
• Software Engineer at NTT Corporation
• Docker core maintainer
• Previous talks at FLOSS community
• FOSDEM 2016
• ApacheCon Core North America 2016
• ...
Who am I
3Copyright©2017 NTT Corp. All Rights Reserved.
A problem in Docker project: CI is slow
https://jenkins.dockerproject.org/job/Docker-PRs/buildTimeTrend
capture: March 3, 2017
120 min
4Copyright©2017 NTT Corp. All Rights Reserved.
How about other FLOSS projects?
https://builds.apache.org/view/All/job/Pr
eCommit-HDFS-Build/buildTimeTrendhttps://builds.apache.org/view/All/job/Pr
eCommit-YARN-Build/buildTimeTrend
https://amplab.cs.berkeley.edu/jenkins/j
ob/SparkPullRequestBuilder/buildTimeTr
end
capture: March 3, 2017200 min
260 min
240 min
https://grpc-
testing.appspot.com/job/gRPC_pull_req
uests_linux/buildTimeTrend
550 min
(for ease of visualization, picked up some projects that use Jenkins from various categories)
5Copyright©2017 NTT Corp. All Rights Reserved.
• Blocker for reviewing/merging patches
• Discourages developers from writing tests
• Discourages developers from enabling additional testing features (e.g. race detector)
Why slow CI matters?
Poor implementation quality &Slow development cycle
6Copyright©2017 NTT Corp. All Rights Reserved.
e.g.
• `go test –parallel N`
• `mvn –-threads N`
• `parallel`
• ...
But parallelization is not enough
• No isolation
• Concurrent test tasks may race for certain shared resources(e.g. files under `/tmp`, TCP port, ...)
• Poor scalability
• CPU/RAM resource limitation
• I/O parallelism limitation
Solution: Parallelize CI ?
7Copyright©2017 NTT Corp. All Rights Reserved.
Solution: Parallelize CI across Docker Swarm-mode
Distributeacross Swarm-mode
Parallelize & Isolateusing Docker containers
✓Isolation ✓Scalability
• Docker provides isolation
• Docker Swarm-mode provides scalability,plus allows you to manage multiple Docker nodes as if they were a single node
8Copyright©2017 NTT Corp. All Rights Reserved.
• For ideal isolation, each of the test functions should be encapsulated into independent containers
• But this is not optimal in actual due to setup/teardown code
Chunking
func TestMain(m *testing.M) {
setUp()
m.Run()
tearDown()
}
func TestFoo(t *testing.T)
func TestBar(t *testing.T)
redundantly executed
setUp()
testFoo.Run()
tearDown()
container
setUp()
testBar.Run()
tearDown()
container
9Copyright©2017 NTT Corp. All Rights Reserved.
• So we execute a chunk of multiple test functions sequentially in a single container
Chunking
setUp()
testFoo.Run()
tearDown()
container
setUp()
testBar.Run()
tearDown()
container
setUp()
testFoo.Run()
testBar.Run()
tearDown()
container
chunk
10Copyright©2017 NTT Corp. All Rights Reserved.
• makespans of the chunks are not uniform
• So we shuffle the chunks for "hope" of optimal scheduling
• No guarantee for optimal schedule
Shuffling
Test1
Test2
Test3
Test4
Test4 Test1
Test2
Test3
Test1
Test2Test3
Test4shufflelongest
makespannonuniformity
decrease inmakespan
11Copyright©2017 NTT Corp. All Rights Reserved.
• We use Funker (github.com/bfirsh/funker)
• FaaS-like architecture
• Loads are automatically balanced via Docker's built-in LB
• No explicit task queue
• No `docker exec` hack
• Could be easily portable to other container orchestrators
Implementation
masterBuilt-in
LB
Client
worker.2
worker.1
worker.3
Funker!
Docker Swarm-mode cluster(typically on cloud)
(typically on laptop)
12Copyright©2017 NTT Corp. All Rights Reserved.
Experimental result
• Evaluated my hack against the CI of Docker itself• Of course, this hack can be applicable to CI of other software as well
• Target: Docker 16.04-dev (git:7fb83eb7)• Contains 1,648 test functions
• Machine: Google Computing Enginen1-standard-4 instances (4 vCPUS, 15GB RAM)
1h22m7s(traditional testing)
4m10s
1 node
10 nodes
Cost: x10
Effectiveness: x20
Time: average of 5 runs
13Copyright©2017 NTT Corp. All Rights Reserved.
Detailed result
Even with a single node,
the result is significantly better than
the traditional testing (1h22m7s)
Nodes 10 30 50 70
1 15m3s N/A(more than 30m)
2 12m7s 10m12s 11m25s 13m57s
5 10m16s 6m18s 5m46s 6m28s
10 8m26s 4m31s 4m10s 4m20s
Number of containers running in parallel(chunk size is inversely proportional to this)
Fastest
configuration
14Copyright©2017 NTT Corp. All Rights Reserved.
PR (merged): https://github.com/docker/docker/pull/29775
The code is available!
$ cd $GOPATH/src/github.com/docker/docker
$ make build-integration-cli-on-swarm
$ ./hack/integration-cli-on-swarm/integration-cli-on-swarm ╲
-replicas 50 ╲
-shuffle ╲
-push-worker-image your-docker-registry.example.com/worker
15Copyright©2017 NTT Corp. All Rights Reserved.
• Yes, with just a few line of modifications
• Planning to split this hack into an independent generic test tool
Is it applicable to other software as well?
16Copyright©2017 NTT Corp. All Rights Reserved.
• "Shuffling" does not always result in optimal schedule
• Future work: learn the past execution history to optimize the schedule
Future work
17Copyright©2017 NTT Corp. All Rights Reserved.
• Docker Swarm-mode is effective for parallelizing CI jobs
• Some hacks for optimizing schedule
Recap
1h22m7s(traditional testing)
4m10s
1 node
10 nodes
Cost: x10
Effectiveness: x20