How to Reduce Database Load with Sparse Branches

John LoVerso, Software Archeologist

Post on 12-Apr-2017


MathWorks

We are a 3500+ person company dedicated to accelerating the pace of engineering and science

We have ~90 products based upon our core platforms
• MATLAB – The Language of Technical Computing
• Simulink – Simulation and Model-Based Design

Technical Challenges

Unified code base from which full product family is released twice a year

Managing a code base of over 1 million files
• 5,000+ components, acyclic dependencies

Integrating changes from ~1500 developers on ~270 active branches (streams)

Multilevel Streams Hierarchy

//mw/main

//mw/integ2 //mw/integ1

//mw/product1 //mw/product5

Merge Down and Merge Up

Build and Test infrastructure blesses submitted changes
• Qualify changes already submitted to the branch

Merge down from the last blessed change-level
• Almost never merge from the tip

Cannot copy up; must merge
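The merge-down rule above amounts to picking the newest change on the parent that Build and Test has blessed, rather than the tip. A minimal Python sketch, assuming the blessed change numbers are available as a sorted list; the function name and the sample numbers are hypothetical and not part of the MathWorks tooling:

```python
from bisect import bisect_right

def last_blessed(blessed_changes, tip):
    """Return the newest blessed change number <= tip, or None if there is none.

    blessed_changes must be sorted ascending (a hypothetical input; the slides
    do not describe how Build and Test records its results).
    """
    i = bisect_right(blessed_changes, tip)
    return blessed_changes[i - 1] if i else None

# Merge down from change 980 even though the parent's tip is at 1000.
source_change = last_blessed([900, 950, 980], tip=1000)
print(source_change)
```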

Componentized Streams

Our branches are built out of components

We use pairs of stream specs for each branch

For details, see
• Moving 1000 Users and 100 Branches Into Streams - from MERGE2014
• Outsmarting Merge Edge Cases in Component Based Development - from MERGE2016

Pair of Streams

Wide-open development stream specs

Stream: //mw/product
Parent: //mw/integ
Paths:
    share ...

Virtual streams computed from component information

Stream: //mw/product~CTB
Parent: //mw/product
Paths:
    share matlab/toolbox/prodA/...
    share matlab/toolbox/prodB/...

Average Week

            Changes   Files
                      Min   Average   Max     Total
New Work    3328      1     8         52      25641
Merges      610       1     1890      39305   1153239

Changes submitted over one week in our main product depot
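To make the imbalance in these figures explicit, the short Python calculation below uses only the numbers from the table; the variable names are illustrative. It shows that merges are about 15% of the changes but account for roughly 98% of the files submitted that week.

```python
# Figures copied from the weekly table above.
new_work = {"changes": 3328, "files": 25641}
merges = {"changes": 610, "files": 1153239}

total_changes = new_work["changes"] + merges["changes"]
total_files = new_work["files"] + merges["files"]

# Share of the week's activity that is merge traffic.
merge_change_share = merges["changes"] / total_changes
merge_file_share = merges["files"] / total_files

print(f"merges: {merge_change_share:.1%} of changes, {merge_file_share:.1%} of files")
```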

2015.1 Merge Meltdown

Many regressions in integration engine

Lost weeks of developer time

All because we merge all the time

What If We Didn’t Need to Merge?

At any time, the number of files on a branch with unique changes is small

The rest of the files are the same as on the parent branch

What is Sparse Branching?

Sparse branching, a.k.a. lightweight or just-in-time branching, is a strategy where files are only branched when modified and are otherwise just a reference to a file on the corresponding parent branch

Initial creation of a sparse branch is an O(1) operation akin to creating a clone or snapshot in a copy-on-write filesystem like ZFS
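The copy-on-write idea can be illustrated with a toy in-memory model. This is only a sketch of the concept in Python, not how Perforce stores anything; the class and its methods are hypothetical, and it ignores pinning the parent to a specific change.

```python
class SparseBranch:
    """Toy model: a branch stores only the files changed on it."""

    def __init__(self, parent=None):
        # O(1) creation: record the parent, copy nothing.
        self.parent = parent
        self.own_files = {}

    def write(self, path, content):
        # The file is branched only at the moment it is modified.
        self.own_files[path] = content

    def read(self, path):
        # Fall through to ancestors for files never modified here.
        branch = self
        while branch is not None:
            if path in branch.own_files:
                return branch.own_files[path]
            branch = branch.parent
        raise KeyError(path)

main = SparseBranch()
main.write("Makefile", "v1")
main.write("baseline.cpp", "v1")

feature = SparseBranch(parent=main)  # no per-file work at creation
feature.write("Makefile", "v2")      # only this file is branched

print(feature.read("Makefile"), feature.read("baseline.cpp"), len(feature.own_files))
```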

Why not simply use Task Streams?

They are database expensive
• Order(n) rather than Order(1) to create
• They consume database space until deleted

They require unnecessary merging in order to be kept up-to-date with their parent
• Still exposes the user to the vagaries of complex merges

They have limitations
• You can only merge from their parent
• Can’t recreate them if they are destroyed
• No virtual stream support

Some Terms

Winked-in file
• a file mapped to a revision on an ancestor branch (lazy copy)

Active file
• a file with changes that have not yet propagated to the parent branch

Make concrete
• the act of branching a winked-in file in order to make it active

Our Approach to Sparse Branches

A sparse branch is defined by the stream Paths:
• Winked-in files use “import path@change” to map the paths from an ancestor branch
• Files are made active on the branch by inserting “share” lines for each file after populating it on the sparse branch

Moving ahead to a new parent change involves
• Advancing the change number in the “import path@change” directives
• Merging changes down from the parent into active files
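The first step of moving ahead, advancing the pinned change number, is a text rewrite of the Paths lines. A minimal Python sketch, assuming the Paths lines are already available as a list of strings; the helper name and the new change number 1200 are hypothetical, and the second step (merging into active files) is not covered here.

```python
import re

# Matches the trailing "@<change>" on a pinned import line.
_PIN = re.compile(r"@\d+$")

def advance_imports(paths, new_change):
    """Return a copy of the Paths lines with every pinned import moved to new_change.

    Lines that are not pinned imports (for example "share" lines) are left alone.
    """
    advanced = []
    for line in paths:
        if line.startswith("import ") and _PIN.search(line):
            line = _PIN.sub(f"@{new_change}", line)
        advanced.append(line)
    return advanced

paths = [
    "import matlab/toolbox/prodA/... //mw/integ/matlab/toolbox/prodA/...@1000",
    "share matlab/toolbox/prodA/file1",
]
for line in advance_imports(paths, 1200):
    print(line)
```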

Pair of Streams

Wide-open development stream specs

Stream: //mw/product
Parent: //mw/integ
Paths:
    share ...

Virtual streams computed from component information

Stream: //mw/product~CTB
Parent: //mw/product
Paths:
    share matlab/toolbox/prodA/...
    share matlab/toolbox/prodB/...

Pair of Streams – Sparse Branch

Wide-open development stream specs

Stream: //mw/product
Parent: //mw/integ
Paths:
    share ...

Virtual streams computed from component information

Stream: //mw/product~CTB
Parent: //mw/product
Paths:
    import matlab/toolbox/prodA/... //mw/integ/matlab/toolbox/prodA/...@1000
    import matlab/toolbox/prodB/... //mw/integ/matlab/toolbox/prodB/...@1000

Pair of Streams – Sparse Branch With Active Files

Wide-open development stream specs

Stream: //mw/product
Parent: //mw/integ
Paths:
    share ...

Virtual streams computed from component information

Stream: //mw/product~CTB
Parent: //mw/product
Paths:
    import matlab/toolbox/prodA/... //mw/integ/matlab/toolbox/prodA/...@1000
    import matlab/toolbox/prodB/... //mw/integ/matlab/toolbox/prodB/...@1000
    share matlab/toolbox/prodA/file1

Multi-level Sparse Hierarchies

Stream: //mw/integ~CTB
Parent: //mw/integ
Paths:
    import matlab/toolbox/prodA/... //mw/main/matlab/toolbox/prodA/...@980
    import matlab/toolbox/prodB/... //mw/main/matlab/toolbox/prodB/...@980
    share matlab/toolbox/prodB/file2

Stream: //mw/product~CTB
Parent: //mw/product
Paths:
    import matlab/toolbox/prodA/... //mw/main/matlab/toolbox/prodA/...@980
    import matlab/toolbox/prodB/... //mw/main/matlab/toolbox/prodB/...@980
    import matlab/toolbox/prodB/file2 //mw/integ/matlab/toolbox/prodB/file2@1000
    share matlab/toolbox/prodA/file1
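One way to read the //mw/product~CTB spec above is as an ordered list of rules where a later, more specific line overrides an earlier one. The Python sketch below resolves a stream-relative path under that assumption. The helper names, the "last match wins" rule as modeled here, and the file name other.m are illustrative assumptions, not the actual server logic.

```python
def parse_paths(stream_depot_path, lines):
    """Turn Paths lines into (pattern, source_pattern, change) entries."""
    entries = []
    for line in lines:
        fields = line.split()
        kind, pattern = fields[0], fields[1]
        if kind == "share":
            # Active files live on the branch itself and are not pinned.
            entries.append((pattern, f"{stream_depot_path}/{pattern}", None))
        elif kind == "import":
            source, _, change = fields[2].partition("@")
            entries.append((pattern, source, int(change) if change else None))
    return entries

def resolve(path, entries):
    """Resolve a stream-relative path; the last matching entry wins (assumed)."""
    for pattern, source, change in reversed(entries):
        if pattern.endswith("..."):
            prefix = pattern[:-3]
            if path.startswith(prefix):
                return source[:-3] + path[len(prefix):], change
        elif path == pattern:
            return source, change
    return None

entries = parse_paths("//mw/product", [
    "import matlab/toolbox/prodA/... //mw/main/matlab/toolbox/prodA/...@980",
    "import matlab/toolbox/prodB/... //mw/main/matlab/toolbox/prodB/...@980",
    "import matlab/toolbox/prodB/file2 //mw/integ/matlab/toolbox/prodB/file2@1000",
    "share matlab/toolbox/prodA/file1",
])

for name in ("matlab/toolbox/prodA/file1",
             "matlab/toolbox/prodB/file2",
             "matlab/toolbox/prodB/other.m"):
    print(name, "->", resolve(name, entries))
```

In this model file1 resolves to the product branch itself, file2 to the copy made active on //mw/integ at change 1000, and everything else to //mw/main at change 980.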

Transparent to the User

Once a sparse branch is created, user commands should be entirely agnostic to the nature of the branch
• add/edit/delete/move/unshelve/merge should all just work

Updating to a newer change level of the parent is special
• merge + revert of newly branched files

But what about new work?

We have explored two approaches:

Branch On Edit
• Just-in-time branching of a file as soon as “p4 edit” happens

Branch On Submit
• Files are opened on the parent branch and are only branched at submit time

Branch On Edit

Has the benefit of being easier to implement

Broker wrapper to intercept operations that open files
• edit, delete, move, open, unshelve, merge
• Compute all the files about to be operated on
• Invoke ‘p4 populate’ to make concrete revisions on the sparse branch of winked-in files
• Invoke ‘p4 flush’ to switch have revision
• Let operation complete to affect opened revision

Trigger (change-content) to add files to active list on submit
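The wrapper steps above can be sketched as a planning function that decides which commands to run before the user's operation proceeds. This is a hypothetical Python outline, not the MathWorks broker code: the function name, its arguments, and the way the active set is passed in are assumptions. It only builds argument lists and executes nothing.

```python
def plan_make_concrete(targets, active, branch, parent, pinned_change):
    """List the p4 commands a wrapper might run before an open operation.

    targets: stream-relative paths the user is about to open
    active:  set of stream-relative paths already concrete on the branch
    Returns argv lists only; nothing is executed here.
    """
    commands = []
    for path in targets:
        if path in active:
            continue  # already branched, so the operation can proceed as-is
        source = f"{parent}/{path}@{pinned_change}"
        target = f"{branch}/{path}"
        # Branch the winked-in revision onto the sparse branch.
        commands.append(["p4", "populate", "-d", "Dynamically branched", source, target])
        # Update the have list to the new revision without transferring content.
        commands.append(["p4", "flush", target])
    return commands

plan = plan_make_concrete(
    ["matlab/toolbox/robotics/Makefile"],
    active=set(),
    branch="//pdb/jloverso/robotics/demo",
    parent="//mw/robotics",
    pinned_change=1705498,
)
for argv in plan:
    print(" ".join(argv))
```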

How It Looks

$ p4 stream -t sparse_v1 -P //mw/robotics@1705498 //pdb/jloverso/robotics/demo
Stream //pdb/jloverso/robotics/demo created.

$ p4 stream -o //pdb/jloverso/robotics/demo~CTB | tail -2
    import matlab/toolbox/robotics/... //mw/robotics/matlab/toolbox/robotics/...@1705498

$ p4 files //pdb/jloverso/robotics/demo/...
//pdb/jloverso/robotics/demo/... - no such file(s).

How It Looks - Sync

$ p4 client -s -S //pdb/jloverso/robotics/demo
Client jloverso.demo switched.

$ p4 sync
…
//mw/robotics/matlab/toolbox/robotics/Makefile#3 - /sandbox/jloverso.demo/matlab/toolbox/robotics/Makefile
//mw/robotics/matlab/toolbox/robotics/baseline.cpp#7 - /sandbox/jloverso.demo/matlab/toolbox/robotics/baseline.cpp
…

How It Looks - Edit

$ p4 have matlab/toolbox/robotics/Makefile
//mw/robotics/matlab/toolbox/robotics/Makefile#3 - /sandbox/jloverso.demo/matlab/toolbox/robotics/Makefile

$ p4 edit matlab/toolbox/robotics/Makefile
//pdb/jloverso/robotics/demo/matlab/toolbox/robotics/Makefile#1 - opened for edit

$ p4 have matlab/toolbox/robotics/Makefile
//pdb/jloverso/robotics/demo/matlab/toolbox/robotics/Makefile#1 - /sandbox/jloverso.demo/matlab/toolbox/robotics/Makefile

How It Looks – Dynamically Branched

$ p4 files //pdb/jloverso/robotics/demo/...
//pdb/jloverso/robotics/demo/matlab/toolbox/robotics/Makefile#1 - branch change 1722340 (text)

$ p4 filelog //pdb/jloverso/robotics/demo/matlab/toolbox/robotics/Makefile
//pdb/jloverso/robotics/demo/matlab/toolbox/robotics/Makefile
... #1 change 1722340 branch on 2016/03/31 by [email protected] (text) 'Dynamically branched'
... ... branch from //mw/robotics/matlab/toolbox/robotics/Makefile#1,#4

How It Looks - Submit

$ p4 submit -d "new work"
Submitting change 1722341.
Locking 1 files ...
edit //pdb/jloverso/robotics/demo/matlab/toolbox/robotics/Makefile#2
Change 1722341 submitted.

$ p4 stream -o //pdb/jloverso/robotics/demo~CTB | tail -3
    import matlab/toolbox/robotics/... //mw/robotics/matlab/toolbox/robotics/...@1705498
    share matlab/toolbox/robotics/Makefile
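The new “share” line in the spec above is the visible effect of the submit trigger adding the file to the active list. The bookkeeping might look like the Python sketch below; the function name is hypothetical, the real trigger is not shown in the slides, and reading and saving the stream spec is left out.

```python
def add_active_files(paths, submitted):
    """Append a "share" line for each submitted file not already listed."""
    updated = list(paths)
    for path in submitted:
        line = f"share {path}"
        if line not in updated:
            updated.append(line)
    return updated

before = [
    "import matlab/toolbox/robotics/... //mw/robotics/matlab/toolbox/robotics/...@1705498",
]
after = add_active_files(before, ["matlab/toolbox/robotics/Makefile"])
for line in after:
    print(line)
```

Appending the share line after the import keeps the more specific rule last, and calling the function again with the same file leaves the list unchanged.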

Branch On Submit

Provides a truer version of copy-on-write semantics

Pending changes are fully discardable with no remnants or commit server impact

Requires the ability to re-base (reopen) the files in a pending change from one branch to another
• Can be done by creating journal entries in order to modify entries in db.have, db.working, and db.locks tables
• This is known in 2015.2 as “p4 sync -r”

Status

Branch-on-edit in production use for
• private developer branches
• “fixes” branches that are created on-the-fly for each change that is processed by our Build & Test automation

We have a prototype of branch-on-submit
• Some limitations; being worked on

Both versions support multilevel hierarchies

Future Plans

Our goal is to get all non-mainline branches to be sparse
• This is where we can truly reduce database sizes

Possibility of open-source release of broker and trigger logic
• Some internal dependencies need to be eliminated

Thank you!

[email protected]