Andrew G. West and Insup Lee
August 28, 2012
Towards Content-Driven Reputation for Collaborative Code Repositories
Big Concept
Apply reputation algorithms developed for wikis to collaborative code repositories:
1. Do the computed reputations accurately reflect user behavior? If so, how could such a system be useful in practice?
2. What do inaccuracies teach us about differences in the evolution of code vs. natural language content? Adaptation?
Motivations
Platform equivalence
• Purely collaborative
• Increasingly distributed; collaboration between unknown/untrusted parties
VehicleForge.mil [1]
• Crowdsourcing a next-generation military vehicle
• Trust implications!
CONTENT-DRIVEN REPUTATION
Content Driven Rep.
IDEA: Content that survives is good content. Good content is written/maintained by good authors.
Article Version History: V0 → V1
• V1 (author A1, initialization): “Mr. Franklin flew a kite”
V1: No reputation changes; no survival
Content Driven Rep.
IDEA: When a subsequent editor allows content to survive, it has his/her implicit approval (and vice versa)
Article Version History: V0 → V1 → V2 → V3 → V4 (authors A1 to A4)
• V1 (A1, initialization): “Mr. Franklin flew a kite”
• V2 (A2, damage): “Your mom flew a plane”
V2: Author A2 deletes most of A1’s content. Reputation of A1 is negatively impacted.
Content Driven Rep.
IDEA: Survival is examined at depth
Article Version History: V0 → V1 → V2 → V3 → V4 (authors A1 to A4)
• V1 (A1, initialization): “Mr. Franklin flew a kite”
• V2 (A2, damage): “Your mom flew a plane”
• V3 (A3, content restoration): “Mr. Franklin flew a kite”
V3: Author A3 reverts A2’s content. Editor A1 gains reputation as his content is restored; A2 loses rep.
Content Driven Rep.
IDEA: … and the process continues (depth=10)
Article Version History: V0 → V1 → V2 → V3 → V4 (authors A1 to A4)
• V1 (A1, initialization): “Mr. Franklin flew a kite”
• V2 (A2, damage): “Your mom flew a plane”
• V3 (A3, content restoration): “Mr. Franklin flew a kite”
• V4 (A4, content persistence): “Mr. Franklin flew a kite and …”
V4: Authors A1 and A3 accrue reputation, while A2 continues to receive reputation decrements.
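The four-version example can be replayed with a small sketch. This is not WikiTrust's actual algorithm: whitespace tokenization, the mapping from survival ratio to ∆, and the editor weight `1 + log1p(rep)` are all assumptions made for illustration, and the names `replay` and `introduced` are hypothetical.

```python
import math
from collections import Counter, defaultdict

def introduced(prev, curr):
    """Tokens present in curr but not in prev (multiset difference)."""
    return curr - prev

def replay(history, depth=10):
    """history: list of (author, text) in edit order.
    Returns (reputations, events), where each event is
    (version_number, editor, judged_author, delta)."""
    rep = defaultdict(float)
    events = []
    versions = [Counter()]  # V0 is taken to be the empty document (assumption)
    authors = [None]
    for editor, text in history:
        new = Counter(text.split())
        n = len(versions)  # index of the version being created
        # Judge up to `depth` earlier versions against the new one.
        for j in range(max(1, n - depth), n):
            judged = authors[j]
            if judged == editor:
                continue  # no self approval
            added = introduced(versions[j - 1], versions[j])
            total = sum(added.values())
            if total == 0:
                continue
            survival = sum((added & new).values()) / total
            # Assumed weighting: higher-reputation editors cast stronger votes.
            weight = 1.0 + math.log1p(max(rep[editor], 0.0))
            delta = (2.0 * survival - 1.0) * weight
            rep[judged] += delta
            events.append((n, editor, judged, round(delta, 3)))
        versions.append(new)
        authors.append(editor)
    return dict(rep), events

HISTORY = [
    ("A1", "Mr. Franklin flew a kite"),
    ("A2", "Your mom flew a plane"),
    ("A3", "Mr. Franklin flew a kite"),
    ("A4", "Mr. Franklin flew a kite and …"),
]

if __name__ == "__main__":
    reputations, log = replay(HISTORY)
    print(reputations)
    print(log)
```

Under these assumptions A1 and A3 finish with positive reputation and A2 with negative reputation, matching the direction of the changes described on the slides; the magnitudes carry no meaning.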
In Practice
Implemented as WikiTrust [2, 3]
• Token survival + edit distance captures novel content as well as maintenance actions
• Size of ∆ is: (1) proportional to degree of change, (2) weighted by the rep. of the editor
• Nice security properties
  – Implicit feedback
  – Symmetric evaluation
  – No self approval
WikiTrust Success
Live processing several language editions of Wikipedia; portable!
Implementation [4] works on any MediaWiki installation
[Figure, labeled “VANDALISM”]
REPRESENTING A REPOSITORY ON A WIKI PLATFORM
Repo. ↔ Wiki Model
Just replay history in a sequential fashion:
• Repository ↔ wiki
• Check-in ↔ edit
• File ↔ article
[Figure: numbered check-ins laid out across tags/, trunk/, and branches/, with a merge]
Repo. ↔ Wiki Model
Minor accommodations:
• Ignore tags
• Ignore branches (merge as a recommendation)
• Multi-file check-in
Replay in Practice
1. [svnsync] produces local copy (not a checkout)
2. [svn log] yields metadata script (see table)
3. Pipe file versions into wiki via API
   1. Log in user (create account if needed)
   2. Use [svn cat path@id] syntax to yield content
   3. Make edit to article “path”. Log out.

| ID | USR | COMMENT | MOD | PATH |
|---|---|---|---|---|
| 1 | U1 | Initial check-in. | A | /trunk/core/header.c |
| | | | A | /trunk/core/misc.c |
| 2 | U2 | Compilation error | M | /trunk/core/header.c |
| 3 | U1 | Don’t need this | D | /trunk/core/misc.c |
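The replay steps can be sketched as a planner that turns the metadata table into an ordered list of actions. This is a sketch under assumptions, not the authors' tooling: the `Change` record, the step names, the mirror URL, and the handling of deletions (blanking the article) are hypothetical, and no wiki API calls are made. Only the `svn cat TARGET@REV` command form is taken from the slide.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Change:
    rev: int
    user: str
    comment: str
    action: str  # "A" added, "M" modified, "D" deleted
    path: str

# Rows of the metadata table above.
LOG = [
    Change(1, "U1", "Initial check-in.", "A", "/trunk/core/header.c"),
    Change(1, "U1", "Initial check-in.", "A", "/trunk/core/misc.c"),
    Change(2, "U2", "Compilation error", "M", "/trunk/core/header.c"),
    Change(3, "U1", "Don't need this", "D", "/trunk/core/misc.c"),
]

def svn_cat_command(repo_url, change):
    """Command that would print the file content at the given revision."""
    return ["svn", "cat", f"{repo_url}{change.path}@{change.rev}"]

def plan_replay(log, repo_url):
    """Order the changes by revision and expand each into wiki actions."""
    steps = []
    for change in sorted(log, key=lambda c: c.rev):  # stable sort keeps file order
        steps.append(("login", change.user))
        if change.action == "D":
            # Assumption: a deleted file is mirrored by blanking the article.
            steps.append(("blank_article", change.path, change.comment))
        else:
            steps.append(("fetch", svn_cat_command(repo_url, change)))
            steps.append(("edit_article", change.path, change.comment))
        steps.append(("logout", change.user))
    return steps

if __name__ == "__main__":
    # "file:///mirror" stands in for the local svnsync copy (hypothetical URL).
    for step in plan_replay(LOG, "file:///mirror"):
        print(step)
```

Keeping the planner separate from execution makes the sequential ordering easy to inspect before any edit is pushed to the wiki.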
CASE STUDY INTRODUCTION
Mediawiki SVN
• Case study repository: Mediawiki SVN [5]
• http://hincapie.cis.upenn.edu/wiki_mediawiki/

| PROPERTY | ORIG | MOD |
|---|---|---|
| Authors | 326 | 271 |
| Check-ins | 91,808 | 53,715 |
| File versions | 585,629 | 117,432 |
| … in trunk/ | 420,613 | 117,432 |
| Unique paths | 138,741 | 7,521 |
| … to PHP file | 56,063 | 7,521 |

(per late 2011)

Further filtering:
• Only PHP files
• Core language
• No binary files
• Tokenization
• Toss out i18n files
Mediawiki SVN
Wiki database is given to WikiTrust implementation:
Revision #A by J had ∆+0.75 on reputation of X=12.05
Revision #B by K had ∆-42.00 on reputation of Y=0.5
Revision #B by K had ∆+16.75 on reputation of Z=1000.1
… … …
Recall: An edit can change up to 10 reputations!
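Output lines of this shape can be collected per user with a small parser. The line format is assumed to be exactly as printed on the slide (the real WikiTrust output may differ), and the function names are hypothetical. The value after `=` is stored as-is, since the slide does not say whether it is the reputation before or after the update.

```python
import re
from collections import defaultdict

# Accepts either the increment sign (∆) or the Greek capital delta (Δ).
LINE = re.compile(
    r"Revision #(?P<rev>\S+) by (?P<editor>\S+) had [∆Δ](?P<delta>[+-]\d+(?:\.\d+)?)"
    r" on reputation of (?P<user>[^=\s]+)=(?P<rep>\d+(?:\.\d+)?)"
)

def parse_events(lines):
    """Return (revision, editor, delta, user, reported_reputation) tuples."""
    events = []
    for line in lines:
        match = LINE.search(line)
        if match:
            events.append((
                match["rev"],
                match["editor"],
                float(match["delta"]),
                match["user"],
                float(match["rep"]),
            ))
    return events

def net_delta_by_user(events):
    """Sum the reputation changes received by each user."""
    totals = defaultdict(float)
    for _rev, _editor, delta, user, _rep in events:
        totals[user] += delta
    return dict(totals)

SAMPLE = [
    "Revision #A by J had ∆+0.75 on reputation of X=12.05",
    "Revision #B by K had ∆-42.00 on reputation of Y=0.5",
    "Revision #B by K had ∆+16.75 on reputation of Z=1000.1",
]

if __name__ == "__main__":
    print(net_delta_by_user(parse_events(SAMPLE)))
```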
General Results (1)
[Figure: Distribution of Final User Reputations]
• Reputations lie on [0, 20k]
• 0.0 is the initial rep.
• ≈15 users w/max. rep. Not always those w/most revs.
General Results (2)
[Figure: Distribution of Update ∆s, by Magnitude]
• Majority of updates are positive; evidence of a healthy community
• Most freq. update is 1-10 pt. increment
Example Reputations
EVALUATING REPUTATION ACCURACY
Evaluation Process
Find edits (E_x) where:
• Subsequent edit (E_(x+1)) resulted in non-trivial rep. loss for author
• Manually inspect comment, Bugzilla, diffs, and ask: “Would editor A_(x+1) consider the previous change CONSTRUCTIVE, or UNCONSTRUCTIVE?”
• Could be a subjective mess, but…
[Figure: E_x followed by E_(x+1), a non-trivial content removal. Was this removal the result of ineptitude by the prior editor?]
Classifying Rep. Loss (1)
Surprising number of obviously “bad” actions resulting in reverts. Editor calls out previous edit and/or editor explicitly:
“Password in plaintext! … DOESN'T WORK … don't put it in trunk!”
“massive breakage with incomplete experimental changes”
“revert … spewing giant red HTML all over everything”
“failed, possibly other problems. NEEDS PARSER TESTS”
“ten billion extra callouts …. clutter things up and trigger errors”
“… no apparent purpose … more complex and prone to breakage”
Classifying Rep. Loss (2)
Some cases are more ambiguous. The editor erred, but it's not immediately clear that there should be a significant penalty (NONFATAL):
Code showing no immediate errors:
• But reverted (or branched) for testing
Issues unrelated to functional code:
• Whitespace, comment/string changes
Evaluation Results
Per a conservative approach, anything not in the other two sets is CONSTRUCTIVE:

| UNCONSTRUCTIVE | NON-FATAL | CONSTRUCTIVE |
|---|---|---|
| 51% | 19% | 30% |

63% accuracy if we discount the “non-fatal” cases
70% accuracy if we interpret them as “unconstructive”
Interpret how you wish; purposely a naïve application
Concentrate on false-positives: Can the algorithm be improved?
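The two accuracy figures follow from the three percentages. The sketch below assumes that “discounting” the non-fatal cases means dropping them from the denominator, which is a reading of the slide rather than a stated definition; the function names are hypothetical.

```python
# Shares of inspected reputation-loss events, from the table above.
UNCONSTRUCTIVE, NON_FATAL, CONSTRUCTIVE = 51, 19, 30

def accuracy_discounting_non_fatal():
    """Ignore non-fatal cases entirely: 51 / (51 + 30)."""
    return UNCONSTRUCTIVE / (UNCONSTRUCTIVE + CONSTRUCTIVE)

def accuracy_non_fatal_as_unconstructive():
    """Count non-fatal cases as justified penalties: (51 + 19) / 100."""
    total = UNCONSTRUCTIVE + NON_FATAL + CONSTRUCTIVE
    return (UNCONSTRUCTIVE + NON_FATAL) / total

if __name__ == "__main__":
    print(round(100 * accuracy_discounting_non_fatal()))
    print(round(100 * accuracy_non_fatal_as_unconstructive()))
```

Rounded to whole percentages these give 63 and 70, the values quoted on the slide.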
IDENTIFYING & FIXING FALSE POSITIVES + EVALUATION
False Positives (1)
SVN does not handle RENAME elegantly:
[Figure: a rename of file.c to file_renamed.c is recorded as a DEL of file.c plus an ADD of file_renamed.c]
Consequences: Authors of [file.c] punished; provenance lost; renamer gets all credit.
Solutions: Detect via hash; simple wiki “move”
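Hash-based rename detection can be sketched as follows. The slide names only the idea (“detect via hash”); the choice of SHA-256, the input shape (dictionaries of path to file bytes for one check-in), and the function names are assumptions. A detected pair would then be replayed as a wiki “move” instead of a delete plus an add.

```python
import hashlib

def digest(content: bytes) -> str:
    """Content fingerprint used to match a deleted file to an added one."""
    return hashlib.sha256(content).hexdigest()

def detect_renames(deleted, added):
    """deleted / added: dict mapping path -> file bytes within one check-in.
    Returns (old_path, new_path) pairs whose content is byte-identical."""
    by_hash = {}
    for path, content in deleted.items():
        by_hash.setdefault(digest(content), []).append(path)
    renames = []
    for new_path, content in added.items():
        candidates = by_hash.get(digest(content))
        if candidates:
            # Pair each deleted file with at most one added file.
            renames.append((candidates.pop(0), new_path))
    return renames

if __name__ == "__main__":
    deleted = {"file.c": b"int main(void) { return 0; }\n"}
    added = {
        "file_renamed.c": b"int main(void) { return 0; }\n",
        "other.c": b"/* unrelated */\n",
    }
    print(detect_renames(deleted, added))
```

Exact hashing only catches pure renames; a rename combined with edits in the same check-in would need a similarity measure instead.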
False Positives (2.1)
INTER-DOCUMENT REORGANIZATION is problematic for WikiTrust
[Figure: func_c(){…} appears in file_1.c (alongside func_b(){…}) and in file_2.c; one file shows a --- ∆, the other a +++ ∆]
Entire code-base as one giant doc. (file1.c >> file2.c >> file3.c >> ...): global diff!
Solution: Examine all diff ∆; sub-string matching; replay history. Intra-doc reorg. is a non-issue!
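One way to approximate the “examine all diff ∆” idea is to collect the blocks removed and added across every file in a check-in and pair up identical blocks that changed files. This sketch uses exact block matching from Python's `difflib` as a stand-in for the sub-string matching named on the slide; the function names and input shape are assumptions.

```python
import difflib

def changed_blocks(old, new):
    """Return (removed, added): contiguous line blocks that differ between versions."""
    removed, added = [], []
    matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag in ("delete", "replace"):
            removed.append(tuple(old[i1:i2]))
        if tag in ("insert", "replace"):
            added.append(tuple(new[j1:j2]))
    return removed, added

def find_moves(before, after):
    """before / after: dict mapping path -> list of lines, for one check-in.
    Returns sorted (source_path, destination_path, block) triples for blocks
    removed from one file and added verbatim to a different file."""
    removed_at = {}
    added_at = []
    for path in set(before) | set(after):
        removed, added = changed_blocks(before.get(path, []), after.get(path, []))
        for block in removed:
            removed_at.setdefault(block, path)
        for block in added:
            added_at.append((block, path))
    moves = []
    for block, destination in added_at:
        source = removed_at.get(block)
        if source is not None and source != destination:
            moves.append((source, destination, block))
    return sorted(moves)

if __name__ == "__main__":
    before = {"file_1.c": ["func_b(){}", "func_c(){}"], "file_2.c": ["func_d(){}"]}
    after = {"file_1.c": ["func_b(){}"], "file_2.c": ["func_d(){}", "func_c(){}"]}
    print(find_moves(before, after))
```

A detected move lets the replay carry the block's authorship along instead of penalizing its original authors and crediting the mover.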
False Positives (2.2)
INTER-DOCUMENT REORGANIZATION is problematic for WikiTrust
Solution: Intra-document reorg. is non-issue!; Global diff; substring matching; replay history.
[Figure labels: “This is the content block being moved”; “This is the same block 3 edits ago”; block history A1 – V1, A2 – V2, A3 – V3; “Destination doc. history” A1, A2, A3]
False Positives (2.3)
INTER-DOCUMENT REORGANIZATION is problematic for WikiTrust
Solution: Intra-document reorg. is non-issue!; Global diff; substring matching; replay history.
TRANSCLUSION!
[Figure labels: “Old doc.” and “New doc.”, each with history A1, A2, A3; “text {{sect}} text”; “sec. txt.”]
False Positives (3)
REVERT CHAINS cause big penalties:
[Figure: V0 → V1 (+++ BIG CODE CHANGES) → V2 (“Revert: Needs testing first”) → V3 (+++ BIG CODE CHANGES, testing done); labels: “identical”, “nearly identical”]
Consequences: At V2, A1 loses reputation (a NONFATAL). At V3, A2 is wrongly punished.
Solution: Revert chains rare; manual inspection?
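Candidates for such manual inspection could be flagged automatically by looking for versions whose content exactly matches an earlier, non-adjacent version. This is a sketch of that idea only; it is not proposed on the slides, and the function name, the sample texts, and the author labels A0 and A3 are hypothetical.

```python
import hashlib

def find_reverts(versions):
    """versions: list of (author, text) in commit order.
    Returns (reverting_index, restored_index) pairs where a version is
    byte-identical to an earlier version other than its direct parent."""
    last_seen = {}
    reverts = []
    for index, (_author, text) in enumerate(versions):
        key = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if key in last_seen and last_seen[key] != index - 1:
            reverts.append((index, last_seen[key]))
        last_seen[key] = index
    return reverts

if __name__ == "__main__":
    chain = [
        ("A0", "base"),
        ("A1", "base + big code changes"),
        ("A2", "base"),                               # revert: needs testing first
        ("A3", "base + big code changes (tested)"),   # re-applied after testing
    ]
    print(find_reverts(chain))
```

Exact hashing flags the revert at V2 but not the nearly identical re-application at V3, so a similarity threshold would be needed to recognize the whole chain.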
False Positives (4)
• Initially 30 false positive cases
  – If “solutions” were implemented
  – This number would be just 10
  – Suggests accuracies of 80-90%
• And those 10 cases?
  – Benign code evolution
  – Feature requests; method deprecation; no fault
• Results similar for [ruby] and [httpd]
Better Evaluation
• POC evaluation lacking in many ways
  – Not enough examples. Subjective.
  – Says nothing about true negatives
• Bug attribution is extremely difficult
  – Corpus: “X erred at rev. Y with severity {L,M,H}”
  – If it could be automated; problem solved!
  – Work backwards from Bugzilla? Developers?
  – Reputation as a predictor of future loss events.
• Qualitative instead of quantitative measures
Other Optimization
• Lots of free variables, weights, ceilings

Canonical code:
// this is a loop
for(int i=0;i<10;i++) print(“Some text”);
→
for ( int i = 0 ; i < 10 ; i++ ){
print( “” );}

Tokenization:
for ( int i = 0; i < 10; i++ ){ print( “” );}
→
for ( int i = 0 ; i < 10 ; i++ ){
print( “” );}
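A rough sketch of the canonicalization and tokenization idea: drop comments, blank out string literals, and split the rest into tokens, so that edits touching only whitespace, comments, or string contents yield the same token stream. The regular expressions target C-style syntax and are assumptions, as is the function name; unlike the slide's example, this sketch does not insert missing braces.

```python
import re

# One left-to-right scan over comments and string literals, so that "//"
# inside a string or a quote inside a comment is not misread.
SKIP = re.compile(r'//[^\n]*|/\*.*?\*/|"(?:\\.|[^"\\])*"', re.DOTALL)
TOKEN = re.compile(r'""|[A-Za-z_]\w*|\d+|\+\+|--|[<>=!]=|\S')

def _blank(match):
    # String literals collapse to an empty literal; comments disappear.
    return '""' if match.group(0).startswith('"') else " "

def canonicalize(source):
    """Return the token list of source with comments and string contents removed."""
    return TOKEN.findall(SKIP.sub(_blank, source))

if __name__ == "__main__":
    original = '// this is a loop\nfor(int i=0;i<10;i++) print("Some text");'
    respaced = 'for ( int i = 0 ; i < 10 ; i++ ) print( "" );'
    print(canonicalize(original))
    print(canonicalize(original) == canonicalize(respaced))
```

Feeding token lists like these to the survival computation would keep purely cosmetic check-ins from moving anyone's reputation, which is the motivation the slide suggests.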
USE-CASES & CONCLUSIONS
Use-case: Small Projects
• Small/non-production proj.
  – Conflict, not just tokens!
• Undergraduate research
  – Who did all the work?
• Academic paper repositories
  – Automatic author order!
• Collaboration or conflict?
  – Graph of reputation events
[Figure: graph of reputation events among users A, B, C, and D, grouped into Faction #1 and Faction #2, with edges marked + or -]
Use-cases (2)
MEDIAWIKI
• Alert service/warnings (anti-vandal style)
• Expediting code review
• Permission granting/revocation
VEHICLEFORGE.MIL
• Access control for users/commits
• Wrap content-persistent reputation with metadata features for a stronger classifier [6]
• Robustness considerations (i.e., reach-ability)
Conclusions
• Despite high(er) barriers to entry, bad things still happen in production repositories!
• Content-persistence is a reasonably accurate way to identify these instances ex post facto
• False positives indicate code uniqueness:
  – 1. Non-functional aspects are non-trivial (WS, comments)
  – 2. Inter-document reorganization is common
  – 3. Quality-assurance is more than surface level
• Evaluation needs to be more rigorous
• A variety of use-cases if it becomes production-ready
References
[1] Lohr, Steve. “Pentagon Pushes Crowdsourced Manufacturing”. New York Times “Bits Blog”. April 5, 2012.
[2] Adler, B.T. and L. de Alfaro. “A Content-Driven Reputation System for Wikipedia”. In WWW 2007: Proc. of the 16th Intl. World Wide Web Conference.
[3] Adler, B.T., et al. “Measuring Author Contributions to Wikipedia”. In WikiSym 2008: Proc. of the 3rd Intl. Symposium on Wikis and Open Collaboration.
[4] WikiTrust online. http://www.wikitrust.net/
[5] Mediawiki SVN. http://svn.wikimedia.org/viewvc/mediawiki/ (note: this is an archive of that resource; Git is the currently used repository software)
[6] Adler, B.T. et al. “Wikipedia Vandalism Detection: Combining Natural Language, Metadata, and Reputation Features”. In CICLing 2011: Proc. of the 12th Intl. Conference on Intelligent Text Processing and Computational Linguistics.
[Ø] Mediawiki Developer Hub. http://www.mediawiki.org/wiki/Developer_hub