Analyzing differences between W1 and GDLs using tree alignment Morten Rhiger (The IT-University of Copenhagen)

Post on 21-Dec-2015


Analyzing differences between W1 and GDLs using tree alignment

Morten Rhiger (The IT-University of Copenhagen)


Outline

1. The problem (the upgrade problem: migrating partner customizations from version N to version N+1)

2. Our solution (daisy-chaining procedures, à la AOP)

3. Other solutions (repositories with versioning, software merging, …)

4. Validating our solution (measuring the number of good customizations using a tree diff)

5. Numbers…


NAV lifecycle

[Figure: the NAV lifecycle. Boxes: W1 version 5.0; DE, GB, BE, and DK version 5.0; W1 version 2009; DK version 2009. Labels: "Partners customize", "Microsoft evolve", and a "Time" axis.]


The NAV upgrade problem

• There are no language features for controlling customization in NAV's C/AL
• Customizations are (destructive) source-code modifications
• There is no versioning in NAV
  – Some (clever) partners maintain repositories of their edits
  – www.mergetool.com
  – On the other hand, few (VAR) partners are IT professionals
• Consequently, partners face a serious problem when migrating their old customizations to the new version
  – Migration takes up to 30% of the effort required to implement the first derived version


Our solution

• Distinguish between
  – the location of a customization in the original version (a customization point), and
  – the modification a customization performs

• (Reminiscent of AOP)


NAV lifecycle

[Figure: the NAV lifecycle for W1 version 5.0, DK version 5.0, W1 version 2009, and DK version 2009. Annotations: "Modifying customizations, but leaving customization points unchanged."; "Moving around customization points."; "Plugging old customizations into (possibly moved) customization points (trivial)."]


Customization points where?

W1 version 5.0:

    PROCEDURE UpdateBalance();
    BEGIN
      GenJnlManagement.CalcBalance(…);
    END

DE version 5.0:

    PROCEDURE UpdateBalance();
    BEGIN
      GenJnlManagement.CalcBalance(…);
      TotalPayAmount := 0;
      TempGenJourLine.COPY(Rec);
    END

With the customization point marked:

    PROCEDURE UpdateBalance();
    BEGIN
      GenJnlManagement.CalcBalance(…);
      <<customization point>>
    END

W1 version 2009:

    PROCEDURE UpdateBalance();
    BEGIN
      <<customization point>>
      GenJnlManagement.CalcBalance(…);
    END

Legal?


Customization points where?

• When is it legal to move a customization point? Where can it be moved to? …
• Procedure calls are useful customization points (we hypothesize):
  – If a procedure call can be moved, so can customizations at that point
• We probably still need something more fine-grained (we also hypothesize)


Daisy-chaining procedures

• Daisy-chaining procedures and triggers (a proposal due to Lars)
• Reminiscent of aspect-oriented programming
• A property (Trigger) on a procedure or trigger controls what is (also) invoked when that procedure is called


Daisy-chaining procedures

• Existing procedure and trigger property:

    [Trigger("*")]
    PROCEDURE Foo(…) = …

• Adding code to execute at the end of Foo:

    PROCEDURE FooMorten() = …

• The “*” says that after Foo is invoked, all procedures with prefix Foo should also be invoked (in some unspecified order)
  – Resolved “late”
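The late, prefix-based resolution described on this slide can be illustrated with a small Python simulation. This is only a sketch under stated assumptions: the `procedures` and `triggers` dictionaries, the `invoke` helper, and the procedure `FooZed` are hypothetical stand-ins, not C/AL or NAV APIs. The slide leaves the invocation order unspecified; the sketch sorts names purely to be deterministic, and it does not chain further from the procedures it finds.

```python
from typing import Callable, Dict, List

# Hypothetical model: a table of procedures by name and a table of
# Trigger properties. Neither is a real NAV data structure.
log: List[str] = []
procedures: Dict[str, Callable[[], None]] = {
    "Foo": lambda: log.append("Foo"),
    "FooMorten": lambda: log.append("FooMorten"),
    "Bar": lambda: log.append("Bar"),
}
triggers: Dict[str, str] = {"Foo": "*"}


def invoke(name: str) -> None:
    """Run `name`, then every other procedure whose name starts with `name`
    if its Trigger property is "*". The lookup happens at call time."""
    procedures[name]()
    if triggers.get(name) == "*":
        # Sorting is an assumption made only to keep the sketch deterministic.
        for other in sorted(procedures):
            if other != name and other.startswith(name):
                procedures[other]()
```

Because the lookup happens inside `invoke`, a procedure registered after `Foo` was defined (for example `procedures["FooZed"] = ...`) is picked up on the next call, which is one reading of "resolved late".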


Customization points where?

W1 version 5.0:

    [Trigger("*")]
    PROCEDURE UpdateBalance();
    BEGIN
      GenJnlManagement.CalcBalance(…);
    END

DE version 5.0:

    PROCEDURE UpdateBalanceDE();
    BEGIN
      TotalPayAmount := 0;
      TempGenJourLine.COPY(Rec);
    END

W1 version 2009:

    [Trigger("*")]
    PROCEDURE UpdateBalance();
    BEGIN
      …
      GenJnlManagement.CalcBalance(…);
      …
    END

The late partner (here GDL) has decided that the new code should be invoked whenever UpdateBalance is invoked, after the original.

The early partner (here Microsoft) is free to modify the body of the procedure.

There is an understanding that the calls to UpdateBalance are the relevant customization point for the German customization.


Other Trigger properties

• Daisy chaining:

    [Trigger("*")]
    PROCEDURE Foo() = …

    [Trigger("*")]
    PROCEDURE FooMorten() = …

    PROCEDURE FooMortenMore() = …

Invoking Foo also invokes FooMorten and FooMortenMore (in that order).


Other Trigger properties

• “Hijacking” (or replacing) a procedure:

    [Trigger("Other")]
    PROCEDURE Foo() = …

    PROCEDURE Other() = …

Calls to Foo discard the body of Foo and execute Other instead.


Other Trigger properties

• Dynamic dispatch:

    [Trigger("=Dispatch")]
    PROCEDURE Foo() = …

    PROCEDURE Dispatch() = …

Calls to Foo invoke Dispatch to produce the string controlling the trigger. For example,

    PROCEDURE Dispatch() = RETURN "*";
or
    PROCEDURE Dispatch() = RETURN "Other";
or even
    PROCEDURE Dispatch() = RETURN "=NewDispath";
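To make the three forms of the Trigger property easier to compare, here is a hedged Python sketch of how a call could be resolved. The function `targets` and the dictionary-based procedure table are hypothetical and only model the behaviour stated on these slides: no property runs the procedure itself, "*" adds every procedure sharing the prefix, a plain name hijacks the call, and a leading "=" asks another procedure for the controlling string.

```python
from typing import Callable, Dict, List, Optional


def targets(name: str, spec: Optional[str],
            procedures: Dict[str, Callable[[], Optional[str]]]) -> List[str]:
    """Return the names that would run for a call to `name` whose Trigger
    property is `spec` (None means the procedure has no Trigger property)."""
    if spec is None:
        return [name]                      # ordinary call
    if spec == "*":
        # Daisy chaining; sorted order is an assumption of this sketch.
        chained = sorted(p for p in procedures
                         if p != name and p.startswith(name))
        return [name] + chained
    if spec.startswith("="):
        # Dynamic dispatch: the named procedure returns a new trigger string,
        # which is then interpreted in the same way (possibly another "=...").
        dispatcher = procedures[spec[1:]]
        return targets(name, dispatcher(), procedures)
    return [spec]                          # hijacking: body of `name` is discarded
```

For example, with a table containing `Foo`, `FooMorten`, `Other`, and a `Dispatch` that returns `"Other"`, `targets("Foo", "=Dispatch", table)` yields `["Other"]`, the same result as the static hijack `targets("Foo", "Other", table)`.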


Evaluating the proposal

• Benefits
  – Little or no new C/AL syntax required
  – A class of existing customizations can be handled without modifying the corresponding W1
• Drawbacks
  – Probably not flexible enough
  – Editing experience messed up


More fine-grained customizations

• Inserting new customization points (procedure calls) in W1:

    PROCEDURE Foo() =          PROCEDURE Foo() =
      A();                       A();
      B();                       <<customization>>
                                 B();
    -------------------------------------------------
    PROCEDURE Foo() =          PROCEDURE InFooMorten() =
      A();                       <<customization>>
      InFoo();
      B();

    [Trigger("*")]
    PROCEDURE InFoo();

• The need for a customization point must be passed back through the chain of developers


Goals …

• … to measure how well existing customizations fit the model
  – … to test-drive our analysis tool (currently)
    • A tree-diff engine discovering tree alignments
    • Test data:
      – W1 5.0 SP1, and
      – 39 GDLs of the same version: DK 5.0 SP1, …
  – … to make the tool available for other analyses (long term)


A “diff” for C/AL source code


Sequence-based diff for source code?

• Traditional sequence-based diff (e.g., UNIX diff) does not take program structure into account
• Valid for software merging (e.g., UNIX diff3)
• Not appropriate for identifying whole-statement modifications:

    IF X = 0 THEN BEGIN        IF X = 0 THEN BEGIN
      Foo();                     Foo();
      Bar();                   END ELSE BEGIN // add
    END;                         Bar();
                               END
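The point of the example can be reproduced with Python's standard `difflib`, used here as a stand-in for UNIX diff. The snippet is a sketch: the two line lists are a lightly normalized copy of the slide's code (the `// add` comment is dropped and the final `END` is written `END;` on both sides, which is an assumption made so that only one line differs).

```python
import difflib

# Old and new versions of the statement, one source line per list element.
old = ["IF X = 0 THEN BEGIN", "  Foo();", "  Bar();", "END;"]
new = ["IF X = 0 THEN BEGIN", "  Foo();", "END ELSE BEGIN", "  Bar();", "END;"]

# A sequence-based comparison over lines; autojunk is disabled so that the
# heuristic for long sequences cannot interfere with this tiny example.
matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
opcodes = matcher.get_opcodes()

# Lines the sequence diff reports as added.
inserted = [line
            for tag, i1, i2, j1, j2 in opcodes if tag == "insert"
            for line in new[j1:j2]]
```

The sequence diff reports a single inserted line and treats `Bar();` as unchanged, even though structurally `Bar();` has moved from the THEN branch into a new ELSE branch.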


Tree-based diff?

• Yes, but what is a tree-based diff?
  – Preserve depth?
  – Allow general movements?
  – Allow re-ordering siblings?
  – …
• We propose a tree alignment for ordered trees [Jiang, Wang, Zhang CPM’94] as an appropriate way to identify customizations.


Tree alignment

• A tree alignment A of two trees T, U is a tree whose nodes are pairs of the form

    (t, u)         (t, -)           (-, u)
    (copy node)    (delete node)    (insert node)

  where t, u are nodes from T, U, and that satisfies an erasure property: discarding the second components and removing “-” nodes and their paths gives the original T, and (vice versa) discarding the first components gives the original U.
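The erasure property can be checked mechanically. The following Python sketch assumes a hypothetical representation that is not taken from the talk: an alignment node `ANode` holds the two labels (with `None` playing the role of "-") and a list of children, and a plain tree is a `(label, children)` tuple. Removing a "-" node is modelled by splicing its children into its parent.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ANode:
    t: Optional[str]                 # label from T, or None for "-"
    u: Optional[str]                 # label from U, or None for "-"
    children: List["ANode"] = field(default_factory=list)


def erase(node: ANode, side: int) -> list:
    """Project an alignment onto T (side=0) or U (side=1).

    Returns a forest of (label, children) tuples: a gap node disappears and
    its projected children take its place under the parent."""
    label = node.t if side == 0 else node.u
    kids = [tree for child in node.children for tree in erase(child, side)]
    if label is None:
        return kids
    return [(label, kids)]


# T = a(b, c) and U = a(b, x(c)): U wraps c in a new node x.
alignment = ANode("a", "a", [
    ANode("b", "b"),
    ANode(None, "x", [ANode("c", "c")]),   # insert node (-, x)
])
```

Here `erase(alignment, 0)` gives back T and `erase(alignment, 1)` gives back U; note that `c` sits at different depths in the two trees, which such an alignment permits.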


Tree alignment

• A tree alignment for ordered trees
  – … does not preserve depth,
  – … does not allow re-ordering of siblings,
  – … does not allow general movements of subtrees
• From the alignment, an edit script (similar to the output of UNIX diff) can be generated
• Interactive examples…


Tree alignment algorithm

• Dynamic programming for sequence-based diff:

          a  b  a  c  c
       0  1  2  3  4  5
    a  1  0  1  2  3  4
    c  2  1  1  2  2  3
    b  3  2  1  2  3  3
    a  4  3  2  1  2  3
    a  5  4  3  2  2  3

  Copy: c → c+u    Delete: c → c+d    Add: c → c+a
  Minimum cost = edit script

• Dynamic programming for tree alignment
  – More “complicated”
  – More complex: O(|T|×|U|×(deg(T)+deg(U))²) time complexity
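The table on this slide can be reproduced with the standard dynamic-programming recurrence for sequence edit distance. The sketch below is written in Python under two assumptions that the slide does not state explicitly: the row string "acbaa" is the old sequence and the column string "abacc" the new one, and copying is free while update, delete, and add each cost 1 (these unit costs are the ones that reproduce the numbers shown).

```python
from typing import List, Sequence


def edit_table(old: Sequence[str], new: Sequence[str],
               update: int = 1, delete: int = 1, add: int = 1) -> List[List[int]]:
    """Fill the DP table d, where d[i][j] is the minimum cost of turning
    old[:i] into new[:j]."""
    rows, cols = len(old) + 1, len(new) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        d[i][0] = d[i - 1][0] + delete          # delete everything
    for j in range(1, cols):
        d[0][j] = d[0][j - 1] + add             # add everything
    for i in range(1, rows):
        for j in range(1, cols):
            same = old[i - 1] == new[j - 1]
            diagonal = d[i - 1][j - 1] + (0 if same else update)  # copy / update
            d[i][j] = min(diagonal,
                          d[i - 1][j] + delete,  # drop old[i-1]
                          d[i][j - 1] + add)     # insert new[j-1]
    return d


table = edit_table("acbaa", "abacc")
```

The bottom-right cell `table[-1][-1]` is the minimum cost; an edit script is obtained by tracing back from that cell along the choices that produced each minimum.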


Quantitative analyses of single versions (W1, DK, DE, …)


Sizes of code pieces

• Code piece = procedure or trigger
• Code pieces are uniquely identified by a code path, e.g.,
  – Table/317/FIELDS/0/OnValidate
  – Codeunit/530/CODE/ValidateEnumVal
  – Form/31/CONTROLS/4/Menu/MENUITEMS/2/OnPush
• Code size measured in AST nodes (≈ number of statements)


W1 5.0 SP1 code pieces


W1 5.0 SP1 code amount


W1 5.0 SP1 numbers

• 39,946 code pieces:
  – 45% have 3 statements or fewer,
  – another 30% have 4-10 statements,
  – yet another 16% have 11-30 statements.
• 357,713 statements:
  – 8% are in code with 3 statements or fewer,
  – another 17% are in code with 4-10 statements,
  – yet another 18% are in code with 11-20 statements.
• Roughly the same numbers for GDLs.
• The complexity of the tree alignment algorithm is under control (a W1-GDL diff takes 14-18 minutes on my laptop)


W1 5.0 SP1 numbers: Details

• Four code pieces have more than 1,000 statements:
  – Codeunit 80 “Sales-Post” PROPERTIES/OnRun (1462 statements, nontrivial)
  – Codeunit 90 “Purch.-Post” PROPERTIES/OnRun (1492 statements, nontrivial)
  – Report 83 “Change Global Dimensions” CODE/ChangeGlobalDim (1751, trivial code duplications)
  – Codeunit 406 “Setup Checklist Management” CODE/TransferContents (2033, trivial code duplications)


Quantitative analyses of differences between versions


Amount of customization


Amount of customization: Details

• Much variance:
  – 91 very mild customizations in IS 5.0 SP1
  – 2,593 customizations in TH 5.0 SP1
• Some agreement, too:
  – 2,593 customizations in all of APAC, ID, MY, PH, SG, and TH
  – Same for {GB, IE}, {NA-US, NA-USCA, NA-USCAMX}, and {DE, AT}
  – Not a coincidence: These versions differ only in language
  – (But gives a “proof of concept”)


Customization point usage (Hotspots)

[Figure: customization point usage, ordered from cold spots to hotspots. Labelled values: 4005, 2053, 1125. Annotation: "Probably false positives due to hotfix".]


Customization point usage (Hotspots, details)

• Many cold customization points used by only one (4000), two (2000), or three (1000) GDLs.
• A nontrivial number of customization points (42) used by all GDLs!
  – (Consistent renamings of, e.g., “.name” to “.Name”)
  – Probably a hotfix not captured in the repository
  – (But gives a “proof of concept”)


Hot objects

[Figure: objects ordered from cold objects to hot objects.]


Hot objects

    Object               Number of GDLs customizing object
    Codeunit/2           30
    Codeunit/11          31
    Codeunit/80          31
    Table/39             32
    Table/37             33
    Table/38             33
    Table/36             36
    Codeunit/12          37
    Table/81             37
    Codeunit/1           39
    Codeunit/424         39
    Codeunit/5054        39
    Codeunit/5300        39
    Codeunit/7152        39
    Codeunit/99008517    39
    Report/99008512      39


Classes of customization by version

[Figure: classes of customization by version. Annotation: "False positives (due to measuring)".]


Example customizations


Example modifications: Modification that should be avoided!

• Codeunit/80/CODE/FillInvPostingBuffer
  – W1 5.0 SP1:
      InvPostingBuffer[1]."Line Discount Amount" := "Line Discount Amount";
      InvPostingBuffer[1]."Inv. Discount Amount" := "Inv. Discount Amount";
  – TH 5.0 SP1:
      InvPostingBuffer[1]."Inv. Discount Amount" := "Inv. Discount Amount";
      InvPostingBuffer[1]."Line Discount Amount" := "Line Discount Amount";


Example modifications: Modification that could be avoided

• Table/4/CODE/InitRoundingPrecision
  – W1 5.0 SP1:
      "Unit-Amount Rounding Precision" := 0.00001
  – ES 5.0 SP1:
      "Unit-Amount Rounding Precision" := 0.000001


Example modifications

• Table/14/FIELDS/5703/OnValidate
  – W1 5.0 SP1:
      BEGIN
        Postcode.ValidateCity(City, "Post Code");
      END
  – ES 5.0 SP1:
      BEGIN
        Postcode.ValidateCity(City, "Post Code", County);
      END
  – APAC 5.0 SP1:
      BEGIN
        PostCodeCheck.ValidateCity(CurrFieldNo, DATABASE::Location, Rec.GETPOSITION, 0, Name, "Name 2", Contact, Address, "Address 2", City, "Post Code", County, "Country/Region Code");
      END
• Candidate for hijacking


Example modifications

• Codeunit 99000889/CODE/SetSalesHeader
  – W1 5.0 SP1:
      REPEAT SalesLine.NEXT = 0
      BEGIN
        "Entry No." := SalesLine."Line No."
        TransferFromSalesLine(SalesLine)
        SalesLine.CALCFIELDS("Reserved Qty. (Base)")
        ...
      END
  – APAC 5.0 SP1:
      REPEAT SalesLine.NEXT = 0
      BEGIN
        IF SalesLine."Build Kit" THEN
          TransferFromKitSalesLine(SalesLine,OrderPromisingLine)
        ELSE BEGIN
          "Entry No." := SalesLine."Line No."
          TransferFromSalesLine(SalesLine)
          "Source Sub Line No." := 0
          SalesLine.CALCFIELDS("Reserved Qty. (Base)")
          ...
        END
      END


Using tree alignment for C/AL source code


Identifying code modifications

• Operations that can be applied to the old document:
  – Here (and elsewhere): delete(L1), add(L2), update(L1, L2), copy(L1)
• These operations have costs
• An edit script is a sequence of operations transforming the old document into the new.
• An optimal edit script is one with least cost
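As an illustration of these definitions, the following Python sketch applies an edit script to an old document and sums its cost. It is a simplification under stated assumptions: documents are flat lists of lines rather than trees, operations are positional tuples (`("copy",)`, `("delete",)`, `("add", new_line)`, `("update", new_line)`) instead of the slide's delete(L1), add(L2), update(L1, L2), copy(L1), and the per-operation costs in the usage note are hypothetical.

```python
from typing import Dict, List, Tuple

Op = Tuple[str, ...]


def apply_script(old: List[str], script: List[Op]) -> List[str]:
    """Transform `old` by consuming it left to right according to `script`."""
    out: List[str] = []
    pos = 0                                  # next unread line of `old`
    for op in script:
        kind = op[0]
        if kind == "copy":
            out.append(old[pos]); pos += 1
        elif kind == "delete":
            pos += 1                         # skip the old line
        elif kind == "update":
            out.append(op[1]); pos += 1      # replace the old line
        elif kind == "add":
            out.append(op[1])                # new line, nothing consumed
        else:
            raise ValueError(f"unknown operation: {kind}")
    if pos != len(old):
        raise ValueError("script does not consume the whole old document")
    return out


def script_cost(script: List[Op], costs: Dict[str, float]) -> float:
    """Total cost of a script given one cost per operation kind."""
    return sum(costs[op[0]] for op in script)
```

For instance, applying `[("copy",), ("add", "C();"), ("update", "B2();")]` to `["A();", "B();"]` yields `["A();", "C();", "B2();"]`. With unit costs for add, delete, and update and a free copy, that script costs 2; an optimal edit script is any script with the least such total.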


Finding optimal edit scripts

• In revision control systems,
  – for merging code: UNIX diff (sequence based)
• In bioinformatics,
  – for globally aligning protein sequences (sequence based),
  – for comparing RNA secondary structure (tree based)
• In (semi-) structured data models,
  – for comparing XML documents, etc.


Assigning costs to edit operations

• Costs for deleting an old node, adding a new node, and updating an old node with the label of a new.
• Costs with the right properties give rise to a distance between two trees (in a certain metric space):

    D(x,y) ≥ 0
    D(x,y) = 0, only if x = y
    D(x,y) = D(y,x)
    D(x,z) ≤ D(x,y) + D(y,z)


Which edit costs give “best” edit scripts?

• High costs for updates, low costs for adds and deletes:
  – Pro: Doesn’t equate unrelated statements
  – Con: Fails to detect actual updates
• Low costs for updates, high costs for adds and deletes:
  – Pro: Detects actual updates
  – Con: Equates unrelated statements


High costs for updates

• codeunit 73/properties/OnRun from W1-5.0 SP1:

    ...
    PurchHeader.TESTFIELD(Status,PurchHeader.Status::Open);
    FromBOMComp.SETRANGE("Parent Item No.","No.");
    NoOfBOMComp := FromBOMComp.COUNT;
    IF NoOfBOMComp = 0 THEN ERROR(Text001, "No.");
    Selection := STRMENU(Text005,2);
    ...

• codeunit 73/properties/OnRun from TH-5.0 SP1:

    ...
    PurchHeader.TESTFIELD(Status,PurchHeader.Status::Open);
    Item.GET("No.");
    IF Item."Kit BOM No." = '' THEN ERROR(Text001, "No.");
    KitManagement.GetKitProdBOM(...);
    IF NoOfBOMComp = 0 THEN ERROR(Text001, "No.");
    Selection := STRMENU(Text005,2);
    ...

Aha! A smart way to achieve this

Thus, updating the IF should have low cost.

Hmm… no. These lines replaced these

Thus, updating the IF should have high cost.


Low costs for updates

• codeunit 73/properties/OnRun from W1-5.0 SP1:

    REPEAT
      ToPurchLine.INIT;
      NextLineNo := NextLineNo + LineSpacing;
      ToPurchLine."Line No." := NextLineNo;
      CASE FromBOMComp.Type OF
        FromBOMComp.Type::" ":
          ToPurchLine.Type := ToPurchLine.Type::" ";
        FromBOMComp.Type::Item:
          ...

• codeunit 73/properties/OnRun from TH-5.0 SP1:

    REPEAT
      ToPurchLine.INIT;
      NextLineNo := NextLineNo + LineSpacing;
      ToPurchLine."Line No." := NextLineNo;
      CASE TempProdBOMLine.Type OF
        TempProdBOMLine.Type::" ":
          ToPurchLine.Type := ToPurchLine.Type::" ";
        TempProdBOMLine.Type::Item:
          ...


Which edit costs give “best” edit scripts?

• Ideally, we would require

update(L1, L2) < delete(L1) + add(L2)

while still taking the content into account. (For example, updating an IF to a WHILE should have a very high cost.)


Which edit costs give “best” edit scripts?

• Currently,

    delete(L1)     = |L1| / 2
    add(L2)        = |L2| / 2
    update(L1, L2) = 2 × sift3(L1, L2, 5),  if L1.kind = L2.kind
                   = |L1| + |L2| - 1,       otherwise

• Somewhat arbitrary. (Is it a metric?) What works in one case might not work in another...
• sift3 is a linear-time approximate string distance “algorithm.” (A true sequence alignment would be a better alternative.)
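The cost functions above can be written down directly. The Python sketch below makes several assumptions that the slides do not confirm: a label is modelled by a hypothetical `Label` class with a `kind` and a `text`, |L| is taken to be the length of the label's text, and `sift3` follows a commonly circulated formulation of that heuristic (the exact variant used by the tool is not given).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Label:
    kind: str    # e.g. "IF", "WHILE", "CALL" (hypothetical kinds)
    text: str    # source text of the node; len(text) stands in for |L|


def sift3(s1: str, s2: str, max_offset: int) -> float:
    """Approximate string distance: scans both strings once and, on a
    mismatch, looks at most `max_offset` characters ahead to resynchronize."""
    if not s1:
        return float(len(s2))
    if not s2:
        return float(len(s1))
    c = offset1 = offset2 = common = 0
    while c + offset1 < len(s1) and c + offset2 < len(s2):
        if s1[c + offset1] == s2[c + offset2]:
            common += 1
        else:
            offset1 = offset2 = 0
            for i in range(max_offset):
                if c + i < len(s1) and s1[c + i] == s2[c]:
                    offset1 = i
                    break
                if c + i < len(s2) and s1[c] == s2[c + i]:
                    offset2 = i
                    break
        c += 1
    return (len(s1) + len(s2)) / 2 - common


def delete_cost(l1: Label) -> float:
    return len(l1.text) / 2


def add_cost(l2: Label) -> float:
    return len(l2.text) / 2


def update_cost(l1: Label, l2: Label) -> float:
    if l1.kind == l2.kind:
        return 2 * sift3(l1.text, l2.text, 5)
    return len(l1.text) + len(l2.text) - 1
```

Under these assumptions, updating a node to an identical node of the same kind costs 0, while updating across kinds (say, an IF to a WHILE) costs almost twice as much as a delete followed by an add, so the aligner will prefer delete plus add in that case.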


Which edit costs give “best” edit scripts?

• Example dump of

codeunit 73/properties/OnRun

for W1 5.0 SP1 vs. TH 5.0 SP1


Conclusions


Future work

• (Bugfixes…)
• Run on W1 version N versus W1 version N+1
• Run on partner-customized versions
• Input annotated programs, to facilitate more precise program analyses of diffs
• Save diffs to a database, to ease querying(?)
