How to Reduce Database Load with Sparse Branches
John LoVerso, Software Archeologist
2
MathWorks
We are a 3500+ person company dedicated to accelerating the pace of engineering and science
We have ~90 products based upon our core platforms
• MATLAB – The Language of Technical Computing
• Simulink – Simulation and Model-Based Design
3
Technical Challenges
Unified code base from which full product family is released twice a year
Managing a code base of over 1 million files
• 5,000+ components, acyclic dependencies
Integrating changes from ~1500 developers on ~270 active branches (streams)
4
Multilevel Streams Hierarchy
//mw/main
//mw/integ2 //mw/integ1
//mw/product1 //mw/product5
5
Merge Down and Merge Up
Build and Test infrastructure blesses submitted changes
• Qualify changes already submitted to the branch
Merge down from the last blessed change-level
• Almost never merge from the tip
Cannot copy up; must merge
6
Componentized Streams
Our branches are built out of components
We use pairs of stream specs for each branch
For details, see
• Moving 1000 Users and 100 Branches Into Streams - from MERGE2014
• Outsmarting Merge Edge Cases in Component Based Development - from MERGE2016
7
Pair of Streams
Wide-open development stream specs
Stream: //mw/product
Parent: //mw/integ
Paths:
    share …
Virtual streams computed from component information
Stream: //mw/product~CTB
Parent: //mw/product
Paths:
    share matlab/toolbox/prodA/...
    share matlab/toolbox/prodB/...
8
Average Week
            Changes   Files
                      Min   Average   Max     Total
New Work    3328      1     8         52      25641
Merges      610       1     1890      39305   1153239
Changes submitted over one week in our main product depot
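The arithmetic behind this slide can be checked with a few lines of Python. This is only a sanity check on the figures quoted in the table; the variable names are illustrative and nothing here comes from the author's tooling.

```python
# Figures copied from the "Average Week" table above.
new_work = {"changes": 3328, "files": 25641}
merges = {"changes": 610, "files": 1153239}

total_files = new_work["files"] + merges["files"]
merge_share = 100 * merges["files"] / total_files

# Per-change averages, to compare against the "Average" column.
new_work_avg = new_work["files"] / new_work["changes"]
merge_avg = merges["files"] / merges["changes"]

print(f"files per new-work change: {new_work_avg:.1f}")
print(f"files per merge change:    {merge_avg:.1f}")
print(f"share of submitted files that come from merges: {merge_share:.1f}%")
```

On these numbers, merges are under a fifth of the changes but account for roughly 98% of the files submitted, which is the load the rest of the talk tries to remove.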
9
2015.1 Merge Meltdown
Many regressions in integration engine
Lost weeks of developer time
All because we merge all the time
10
What If We Didn’t Need to Merge?
At any time, the number of files on a branch with unique changes is small
The rest of the files are the same as on the parent branch
11
What is Sparse Branching?
Sparse branching, a.k.a. lightweight or just-in-time branching, is a strategy where files are only branched when modified and are otherwise just a reference to a file on the corresponding parent branch
Initial creation of a sparse branch is an O(1) operation akin to creating a clone or snapshot in a copy-on-write filesystem like ZFS
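As a rough illustration of the copy-on-write idea, here is a toy in-memory model in Python. It is a sketch under simplifying assumptions, not Perforce's data model: the `Branch` class and the `full_branch` and `sparse_branch` helpers are hypothetical, and the toy follows the parent's tip rather than pinning to a parent change level.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Branch:
    """Toy branch: 'files' holds only the files branched on this branch."""
    name: str
    parent: Optional["Branch"] = None
    files: Dict[str, str] = field(default_factory=dict)

    def read(self, path: str) -> str:
        # Walk up the ancestry until some branch holds its own copy.
        branch = self
        while branch is not None:
            if path in branch.files:
                return branch.files[path]
            branch = branch.parent
        raise KeyError(path)

    def write(self, path: str, content: str) -> None:
        # Copy-on-write: the file becomes local on first modification.
        self.files[path] = content


def full_branch(name: str, parent: Branch) -> Branch:
    # O(n): one record is copied for every file on the parent.
    return Branch(name, parent, dict(parent.files))


def sparse_branch(name: str, parent: Branch) -> Branch:
    # O(1): only a reference to the parent is recorded.
    return Branch(name, parent)


main = Branch("main", files={"a.c": "v1", "b.c": "v1"})
dev = sparse_branch("dev", main)
dev.write("a.c", "v2")
print(len(dev.files), dev.read("a.c"), dev.read("b.c"))  # 1 v2 v1
```

Only the modified file occupies space on the sparse branch; the untouched file is still read through the parent.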
12
Why not simply use Task Streams?
They are database expensive
• Order(n) rather than Order(1) to create
• They consume database space until deleted
They require unnecessary merging in order to be kept up-to-date with their parent
• Still exposes the user to the vagaries of complex merges
They have limitations
• You can only merge from their parent
• Can’t recreate them if they are destroyed
• No virtual stream support
13
Some Terms
Winked-in file
• a file mapped to a revision on an ancestor branch (lazy copy)
Active file
• a file with changes that have not yet propagated to the parent branch
Make concrete
• the act of branching a winked-in file in order to make it active
14
Our Approach to Sparse Branches
A sparse branch is defined by the stream Paths:
• Winked-in files use “import path@change” to map the paths from an ancestor branch
• Files are made active on the branch by inserting “share” lines for each file after populating it on the sparse branch
Moving ahead to a new parent change involves
• Advancing the change number in the “import path@change” directives
• Merging changes down from the parent into active files
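The Paths bookkeeping described on this slide can be sketched in Python as plain string manipulation. The function names `sparse_paths` and `advance` are hypothetical, the example writes the Perforce wildcard as "..." on both sides of each import, and only the first half of "moving ahead" is covered: merging down into active files is not modelled.

```python
from typing import Iterable, List


def sparse_paths(components: Iterable[str], parent: str, change: int,
                 active: Iterable[str] = ()) -> List[str]:
    """Build the Paths lines of a sparse stream.

    components: directories relative to the stream root
    parent:     depot path of the branch the files are winked in from
    change:     parent change level the imports are pinned to
    active:     files already made concrete on the sparse branch
    """
    lines = [f"import {c}/... {parent}/{c}/...@{change}" for c in components]
    # share lines are listed after the imports, as in the slides
    lines += [f"share {f}" for f in active]
    return lines


def advance(lines: List[str], new_change: int) -> List[str]:
    """Re-pin every import to a newer parent change; share lines are untouched."""
    out = []
    for line in lines:
        if line.startswith("import ") and "@" in line:
            line = line.rsplit("@", 1)[0] + f"@{new_change}"
        out.append(line)
    return out


paths = sparse_paths(["matlab/toolbox/prodA", "matlab/toolbox/prodB"],
                     "//mw/integ", 1000, ["matlab/toolbox/prodA/file1"])
print("\n".join(advance(paths, 1100)))
```

Keeping the change number in one place per import line is what makes "moving ahead" a spec edit rather than a bulk integration of unchanged files.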
15
Pair of Streams
Wide-open development stream specs
Stream: //mw/product
Parent: //mw/integ
Paths:
    share …
Virtual streams computed from component information
Stream: //mw/product~CTB
Parent: //mw/product
Paths:
    share matlab/toolbox/prodA/...
    share matlab/toolbox/prodB/...
16
Pair of Streams – Sparse Branch
Wide-open development stream specs
Stream: //mw/product
Parent: //mw/integ
Paths:
    share …
Virtual streams computed from component information
Stream: //mw/product~CTB
Parent: //mw/product
Paths:
    import matlab/toolbox/prodA/... //mw/integ/matlab/toolbox/prodA/…@1000
    import matlab/toolbox/prodB/... //mw/integ/matlab/toolbox/prodB/…@1000
17
Pair of Streams – Sparse Branch With Active Files
Wide-open development stream specs
Stream: //mw/product
Parent: //mw/integ
Paths:
    share …
Virtual streams computed from component information
Stream: //mw/product~CTB
Parent: //mw/product
Paths:
    import matlab/toolbox/prodA/... //mw/integ/matlab/toolbox/prodA/…@1000
    import matlab/toolbox/prodB/... //mw/integ/matlab/toolbox/prodB/…@1000
    share matlab/toolbox/prodA/file1
18
Multi-level Sparse Hierarchies
Multi-level sparse hierarchies
Stream: //mw/integ~CTB
Parent: //mw/integ
Paths:
    import matlab/toolbox/prodA/... //mw/main/matlab/toolbox/prodA/…@980
    import matlab/toolbox/prodB/... //mw/main/matlab/toolbox/prodB/…@980
    share matlab/toolbox/prodB/file2
Stream: //mw/product~CTB
Parent: //mw/product
Paths:
    import matlab/toolbox/prodA/... //mw/main/matlab/toolbox/prodA/…@980
    import matlab/toolbox/prodB/... //mw/main/matlab/toolbox/prodB/…@980
    import matlab/toolbox/prodB/file2 //mw/integ/matlab/toolbox/prodB/file2@1000
    share matlab/toolbox/prodA/file1
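One way to read the two specs on this slide is as a rule for deriving a child's Paths from its sparse parent: the parent's imports pass through unchanged, and each file that is active on the parent becomes an import pinned to a parent change. The Python sketch below encodes that reading. The rule is inferred from the example only and is not confirmed as the author's algorithm; `child_paths` is a hypothetical name, and the wildcard is written as "..." throughout.

```python
from typing import Iterable, List


def child_paths(parent_paths: Iterable[str], parent_depot: str,
                parent_change: int, child_active: Iterable[str] = ()) -> List[str]:
    """Derive the Paths of a sparse child from the Paths of its sparse parent."""
    lines = []
    for line in parent_paths:
        kind, rest = line.split(None, 1)
        if kind == "import":
            # Still winked in from the same ancestor at the same change.
            lines.append(line)
        elif kind == "share":
            # Active on the parent: wink it in from the parent itself.
            lines.append(f"import {rest} {parent_depot}/{rest}@{parent_change}")
    # Files made concrete on the child are shared last.
    lines += [f"share {f}" for f in child_active]
    return lines


integ = [
    "import matlab/toolbox/prodA/... //mw/main/matlab/toolbox/prodA/...@980",
    "import matlab/toolbox/prodB/... //mw/main/matlab/toolbox/prodB/...@980",
    "share matlab/toolbox/prodB/file2",
]
product = child_paths(integ, "//mw/integ", 1000, ["matlab/toolbox/prodA/file1"])
print("\n".join(product))
```

With the slide's values this prints the same four lines as the //mw/product~CTB spec: two imports from //mw/main at 980, file2 imported from //mw/integ at 1000, and file1 shared.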
20
Transparent to the User
Once a sparse branch is created, user commands should be entirely agnostic to the nature of the branch
• add/edit/delete/move/unshelve/merge should all just work
Updating to a newer change level of the parent is special
• merge + revert of newly branched files
21
But what about new work?
We have explored two approaches:
Branch On Edit
• Just-in-time branching of a file as soon as “p4 edit” happens
Branch On Submit
• Files are opened on the parent branch and are only branched at submit time
22
Branch On Edit
Has the benefit of being easier to implement
Broker wrapper to intercept operations that open files
• edit, delete, move, open, unshelve, merge
• Compute all the files about to be operated on
• Invoke ‘p4 populate’ to make concrete revisions on the sparse branch of winked-in files
• Invoke ‘p4 flush’ to switch have revision
• Let operation complete to affect opened revision
Trigger (change-content) to add files to active list on submit
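The broker steps listed above can be sketched as a small planning function in Python. This is a guess at the shape of the logic, not the author's broker code: `branch_on_edit_plan` and its `winked_in` lookup table are hypothetical, and the function only builds the `p4 populate` and `p4 flush` command lines without running them.

```python
from typing import Dict, List


def branch_on_edit_plan(targets: List[str], winked_in: Dict[str, str],
                        description: str = "Dynamically branched") -> List[List[str]]:
    """Return the p4 commands to run before the user's command proceeds.

    targets:   depot paths on the sparse branch the command is about to open
    winked_in: sparse-branch depot path -> pinned ancestor revision
    """
    commands = []
    for path in targets:
        source = winked_in.get(path)
        if source is None:
            continue  # already active (or new) on the sparse branch
        # Make the file concrete on the sparse branch ...
        commands.append(["p4", "populate", "-d", description, source, path])
        # ... then point the client's have record at the new revision.
        commands.append(["p4", "flush", path])
    return commands


makefile = "//pdb/jloverso/robotics/demo/matlab/toolbox/robotics/Makefile"
plan = branch_on_edit_plan(
    [makefile],
    {makefile: "//mw/robotics/matlab/toolbox/robotics/Makefile@1705498"},
)
for cmd in plan:
    print(" ".join(cmd))
```

A real wrapper would also need to work out which files each intercepted command touches and handle failures between the two steps; both are left out here.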
23
How It Looks
$ p4 stream -t sparse_v1 -P //mw/robotics@1705498 //pdb/jloverso/robotics/demo
Stream //pdb/jloverso/robotics/demo created.
$ p4 stream -o //pdb/jloverso/robotics/demo~CTB | tail -2
    import matlab/toolbox/robotics/... //mw/robotics/matlab/toolbox/robotics/...@1705498
$ p4 files //pdb/jloverso/robotics/demo/...
//pdb/jloverso/robotics/demo/... - no such file(s).
24
How It Looks - Sync
$ p4 client -s -S //pdb/jloverso/robotics/demo
Client jloverso.demo switched.
$ p4 sync
…
//mw/robotics/matlab/toolbox/robotics/Makefile#3 - /sandbox/jloverso.demo/matlab/toolbox/robotics/Makefile
//mw/robotics/matlab/toolbox/robotics/baseline.cpp#7 - /sandbox/jloverso.demo/matlab/toolbox/robotics/baseline.cpp
…
25
How It Looks - Edit
$ p4 have matlab/toolbox/robotics/Makefile
//mw/robotics/matlab/toolbox/robotics/Makefile#3 - /sandbox/jloverso.demo/matlab/toolbox/robotics/Makefile
$ p4 edit matlab/toolbox/robotics/Makefile
//pdb/jloverso/robotics/demo/matlab/toolbox/robotics/Makefile#1 - opened for edit
$ p4 have matlab/toolbox/robotics/Makefile
//pdb/jloverso/robotics/demo/matlab/toolbox/robotics/Makefile#1 - /sandbox/jloverso.demo/matlab/toolbox/robotics/Makefile
26
How It Looks – Dynamically Branched
$ p4 files //pdb/jloverso/robotics/demo/...
//pdb/jloverso/robotics/demo/matlab/toolbox/robotics/Makefile#1 - branch change 1722340 (text)
$ p4 filelog //pdb/jloverso/robotics/demo/matlab/toolbox/robotics/Makefile
//pdb/jloverso/robotics/demo/matlab/toolbox/robotics/Makefile
... #1 change 1722340 branch on 2016/03/31 by [email protected] (text) 'Dynamically branched'
... ... branch from //mw/robotics/matlab/toolbox/robotics/Makefile#1,#4
27
How It Looks - Submit
$ p4 submit -d "new work"
Submitting change 1722341.
Locking 1 files ...
edit //pdb/jloverso/robotics/demo/matlab/toolbox/robotics/Makefile#2
Change 1722341 submitted.
$ p4 stream -o //pdb/jloverso/robotics/demo~CTB | tail -3
    import matlab/toolbox/robotics/... //mw/robotics/matlab/toolbox/robotics/...@1705498
    share matlab/toolbox/robotics/Makefile
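The extra "share" line that appears after the submit is the visible effect of the change-content trigger. A minimal Python sketch of that bookkeeping follows; `add_active_files` is a hypothetical helper that only computes the new Paths lines, and how the real trigger reads the submitted files and writes the stream spec back is not shown in the slides.

```python
from typing import Iterable, List


def add_active_files(paths: List[str], stream_root: str,
                     submitted: Iterable[str]) -> List[str]:
    """Append a share line for each submitted file not yet on the active list."""
    out = list(paths)
    prefix = stream_root.rstrip("/") + "/"
    for depot_file in submitted:
        if not depot_file.startswith(prefix):
            continue  # not a file on this sparse branch
        share = "share " + depot_file[len(prefix):]
        if share not in out:  # stay idempotent across repeated submits
            out.append(share)
    return out


before = [
    "import matlab/toolbox/robotics/... "
    "//mw/robotics/matlab/toolbox/robotics/...@1705498",
]
after = add_active_files(
    before,
    "//pdb/jloverso/robotics/demo",
    ["//pdb/jloverso/robotics/demo/matlab/toolbox/robotics/Makefile"],
)
print("\n".join(after))
```

Running it a second time with the same file leaves the list unchanged, which matters because later edits to the Makefile submit against an already active file.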
28
Branch On Submit
Provides a truer version of copy-on-write semantics
Pending changes are fully discardable with no remnants or commit server impact
Requires the ability to re-base (reopen) the files in a pending change from one branch to another
• Can be done by creating journal entries in order to modify entries in db.have, db.working, and db.locks tables
• This is known in 2015.2 as “p4 sync -r”
29
Status
Branch-on-edit in production use for
• private developer branches
• “fixes” branches that are created on-the-fly for each change that is processed by our Build & Test automation
We have a prototype of branch-on-submit
• Some limitations; being worked on
Both versions support multilevel hierarchies
30
Future Plans
Our goal is to get all non-mainline branches to be sparse
• This is where we can truly reduce database sizes
Possibility of open-source release of broker and trigger logic
• Some internal dependencies need to be eliminated
Thank you!