programming with “big code” - github pages · predicting program properties from “big...
TRANSCRIPT
![Page 1: Programming with “Big Code” - GitHub Pages · Predicting Program Properties from “Big Code”, POPL 2015 Fast and Precise Statistical Code Completion, ETH TR Statistical Feedback](https://reader036.vdocument.in/reader036/viewer/2022070711/5ecb78ee0746fe023043adfd/html5/thumbnails/1.jpg)
Programming with “Big Code”: Lessons, Techniques, Applications
Pavol Bielik, Veselin Raychev, Martin VechevDepartment of Computer ScienceETH Zurich
![Page 2: Programming with “Big Code” - GitHub Pages · Predicting Program Properties from “Big Code”, POPL 2015 Fast and Precise Statistical Code Completion, ETH TR Statistical Feedback](https://reader036.vdocument.in/reader036/viewer/2022070711/5ecb78ee0746fe023043adfd/html5/thumbnails/2.jpg)
Work @ ETH Zurich
Work on “Big Code” started a few years ago
Code Completion with Statistical Language Models, PLDI 2014Machine Translation for Programming Languages, Onward 2014Predicting Program Properties from “Big Code”, POPL 2015Fast and Precise Statistical Code Completion, ETH TRStatistical Feedback Generation for Programs, ETH TRProgramming with Big Code: Lessons, Techniques and Applications, SNAPL 2015
Prof.Martin Vechev
Prof.AndreasKrause
VeselinRaychev
PavolBielik
Svetoslav Karaivanov
ChristineZeller
PascalRoos
![Page 3: Programming with “Big Code” - GitHub Pages · Predicting Program Properties from “Big Code”, POPL 2015 Fast and Precise Statistical Code Completion, ETH TR Statistical Feedback](https://reader036.vdocument.in/reader036/viewer/2022070711/5ecb78ee0746fe023043adfd/html5/thumbnails/3.jpg)
Applications[PLDI 14]SLANG: Code Completion
Intent i = new Intent();
?ctx.sendBroadcast(i);
All of these benefit from the “Big Code” and lead to applications not possible with previous techniques
![Page 4: Programming with “Big Code” - GitHub Pages · Predicting Program Properties from “Big Code”, POPL 2015 Fast and Precise Statistical Code Completion, ETH TR Statistical Feedback](https://reader036.vdocument.in/reader036/viewer/2022070711/5ecb78ee0746fe023043adfd/html5/thumbnails/4.jpg)
Applications[PLDI 14]SLANG: Code Completion
Intent i = new Intent();
?ctx.sendBroadcast(i);
P( Java | C# )P( C# | Java )P( Java )
[Onward 14]Programming Language Translation
All of these benefit from the “Big Code” and lead to applications not possible with previous techniques
![Page 5: Programming with “Big Code” - GitHub Pages · Predicting Program Properties from “Big Code”, POPL 2015 Fast and Precise Statistical Code Completion, ETH TR Statistical Feedback](https://reader036.vdocument.in/reader036/viewer/2022070711/5ecb78ee0746fe023043adfd/html5/thumbnails/5.jpg)
...for x in range(a):
print a[x]
[submitted]Statistical Feedback Generation
Applications[PLDI 14]SLANG: Code Completion
Intent i = new Intent();
?ctx.sendBroadcast(i);
likely error
P( Java | C# )P( C# | Java )P( Java )
[Onward 14]Programming Language Translation
All of these benefit from the “Big Code” and lead to applications not possible with previous techniques
![Page 6: Programming with “Big Code” - GitHub Pages · Predicting Program Properties from “Big Code”, POPL 2015 Fast and Precise Statistical Code Completion, ETH TR Statistical Feedback](https://reader036.vdocument.in/reader036/viewer/2022070711/5ecb78ee0746fe023043adfd/html5/thumbnails/6.jpg)
[POPL 15]JSNice: DeobfuscationType Prediction
...for x in range(a):
print a[x]
[submitted]Statistical Feedback Generation
Applications[PLDI 14]SLANG: Code Completion
Intent i = new Intent();
?ctx.sendBroadcast(i);
likely error
P( Java | C# )P( C# | Java )P( Java )
[Onward 14]Programming Language Translation
All of these benefit from the “Big Code” and lead to applications not possible with previous techniques
![Page 7: Programming with “Big Code” - GitHub Pages · Predicting Program Properties from “Big Code”, POPL 2015 Fast and Precise Statistical Code Completion, ETH TR Statistical Feedback](https://reader036.vdocument.in/reader036/viewer/2022070711/5ecb78ee0746fe023043adfd/html5/thumbnails/7.jpg)
Probabilistic Programming Systems: Dimensions
Applications
Intermediate Representation
Analyze Program(PL)
Train Model(ML)
Query Model(ML)
![Page 8: Programming with “Big Code” - GitHub Pages · Predicting Program Properties from “Big Code”, POPL 2015 Fast and Precise Statistical Code Completion, ETH TR Statistical Feedback](https://reader036.vdocument.in/reader036/viewer/2022070711/5ecb78ee0746fe023043adfd/html5/thumbnails/8.jpg)
What is a generic metric for code?Applications
Intermediate Representation
Analyze Program(PL)
Train Model(ML)
Query Model(ML)
✔ Cross Entropy → ✗ Code Completion✔ BLEU Score → ✗ Program Translation
Probabilistic Programming Systems: Dimensions
Traditional metrics might not be indicative of client performance
![Page 9: Programming with “Big Code” - GitHub Pages · Predicting Program Properties from “Big Code”, POPL 2015 Fast and Precise Statistical Code Completion, ETH TR Statistical Feedback](https://reader036.vdocument.in/reader036/viewer/2022070711/5ecb78ee0746fe023043adfd/html5/thumbnails/9.jpg)
What is the best program representation?Applications
Intermediate Representation
Analyze Program(PL)
Train Model(ML)
Query Model(ML)
Probabilistic Programming Systems: Dimensions
![Page 10: Programming with “Big Code” - GitHub Pages · Predicting Program Properties from “Big Code”, POPL 2015 Fast and Precise Statistical Code Completion, ETH TR Statistical Feedback](https://reader036.vdocument.in/reader036/viewer/2022070711/5ecb78ee0746fe023043adfd/html5/thumbnails/10.jpg)
What is the best program representation?Applications
Intermediate Representation
Analyze Program(PL)
Train Model(ML)
Query Model(ML)
Probabilistic Programming Systems: Dimensions
Sequences
req → {<open, 0>, <send, 0>}source → {..., <open, 2>}
=
a +
x y
Trees
Graphical Models Feature Vectors
req → (0,0,1,1,0)source → (1,0,0,0,0)...
![Page 11: Programming with “Big Code” - GitHub Pages · Predicting Program Properties from “Big Code”, POPL 2015 Fast and Precise Statistical Code Completion, ETH TR Statistical Feedback](https://reader036.vdocument.in/reader036/viewer/2022070711/5ecb78ee0746fe023043adfd/html5/thumbnails/11.jpg)
What is the best program representation?Applications
Intermediate Representation
Analyze Program(PL)
Train Model(ML)
Query Model(ML)
Probabilistic Programming Systems: Dimensions
Choosing the right representation is crucial
Feedback Generation: Sequence representations
Allamanis et. al. [2013] 46.4%
Hsiao et. al. [2014] 50.8%
Incorporate semantic information 75.3%
Incorporate dataflow analysis 86.3%
![Page 12: Programming with “Big Code” - GitHub Pages · Predicting Program Properties from “Big Code”, POPL 2015 Fast and Precise Statistical Code Completion, ETH TR Statistical Feedback](https://reader036.vdocument.in/reader036/viewer/2022070711/5ecb78ee0746fe023043adfd/html5/thumbnails/12.jpg)
Applications
Intermediate Representation
Analyze Program(PL)
Train Model(ML)
Query Model(ML)
How to extract program representation?SLANG (APIs): alias and typestate analysisJSNice (Variable Names): scope and alias analysisFeedback Generation: alias, control-flow and typestate analysis
Probabilistic Programming Systems: Dimensions
req.open("GET", source, false);
req → {<open, 0>, <send, 0>}source → {..., <open, 2>}
![Page 13: Programming with “Big Code” - GitHub Pages · Predicting Program Properties from “Big Code”, POPL 2015 Fast and Precise Statistical Code Completion, ETH TR Statistical Feedback](https://reader036.vdocument.in/reader036/viewer/2022070711/5ecb78ee0746fe023043adfd/html5/thumbnails/13.jpg)
Applications
Intermediate Representation
Analyze Program(PL)
Train Model(ML)
Query Model(ML)
How to extract program representation?SLANG (APIs): alias and typestate analysisJSNice (Variable Names): scope and alias analysisFeedback Generation: alias, control-flow and typestate analysis
Design scalable yet precise enough algorithms
Probabilistic Programming Systems: Dimensions
1
0.5
0
no alias analysiswith alias analysis
1% 10% 100%
[Precision vs % of data used]
![Page 14: Programming with “Big Code” - GitHub Pages · Predicting Program Properties from “Big Code”, POPL 2015 Fast and Precise Statistical Code Completion, ETH TR Statistical Feedback](https://reader036.vdocument.in/reader036/viewer/2022070711/5ecb78ee0746fe023043adfd/html5/thumbnails/14.jpg)
Applications
Intermediate Representation
Analyze Program(PL)
Train Model(ML)
Query Model(ML)
What is the suitable probabilistic model?N-gram language model
Probabilistic context-free grammarsNeural networks
(Structured) Support vector machineConditional Random Fields
...
Probabilistic Programming Systems: Dimensions
![Page 15: Programming with “Big Code” - GitHub Pages · Predicting Program Properties from “Big Code”, POPL 2015 Fast and Precise Statistical Code Completion, ETH TR Statistical Feedback](https://reader036.vdocument.in/reader036/viewer/2022070711/5ecb78ee0746fe023043adfd/html5/thumbnails/15.jpg)
Applications
Intermediate Representation
Analyze Program(PL)
Train Model(ML)
Query Model(ML)
What is the suitable probabilistic model?N-gram language model
Probabilistic context-free grammarsNeural networks
(Structured) Support vector machineConditional Random Fields
...
Probabilistic Programming Systems: Dimensions
Baseline 25.3%
Independent 54.1%
Structured 63.4%
Structured prediction is critical
![Page 16: Programming with “Big Code” - GitHub Pages · Predicting Program Properties from “Big Code”, POPL 2015 Fast and Precise Statistical Code Completion, ETH TR Statistical Feedback](https://reader036.vdocument.in/reader036/viewer/2022070711/5ecb78ee0746fe023043adfd/html5/thumbnails/16.jpg)
Programming with “Big Code”Applications
Intermediate Representation
Analyze Program(PL)
Train Model(ML)
Query Model
Code completionDeobfuscation
Program synthesis
Feedback generation
Translation
alias analysis
typestate analysis
Graphical Models
N-gram language modelSVM Structured SVM
Neural Networks
Sequences (sentences)
Trees
Translation TableFeature Vectors
control-flow analysis
scope analysis
argmax P(y | x)
y ∈ Ω
![Page 17: Programming with “Big Code” - GitHub Pages · Predicting Program Properties from “Big Code”, POPL 2015 Fast and Precise Statistical Code Completion, ETH TR Statistical Feedback](https://reader036.vdocument.in/reader036/viewer/2022070711/5ecb78ee0746fe023043adfd/html5/thumbnails/17.jpg)
Programming with “Big Code”Applications
Intermediate Representation
Analyze Program(PL)
Train Model(ML)
Query Model
Code completionDeobfuscation
Program synthesis
Feedback generation
Translation
alias analysis
typestate analysis
Graphical Models
N-gram language modelSVM Structured SVM
Neural Networks
Sequences (sentences)
Trees
Translation TableFeature Vectors
control-flow analysis
scope analysis
argmax P(y | x)
y ∈ Ω
Greedy MAP Inference
http://www.nice2predict.org/http://www.srl.inf.ethz.ch/spas.php More information and tutorials at:
![Page 18: Programming with “Big Code” - GitHub Pages · Predicting Program Properties from “Big Code”, POPL 2015 Fast and Precise Statistical Code Completion, ETH TR Statistical Feedback](https://reader036.vdocument.in/reader036/viewer/2022070711/5ecb78ee0746fe023043adfd/html5/thumbnails/18.jpg)
General framework
http://www.nice2predict.org/
We have open-sourced our prediction engine and we are extending it with new capabilities
Upcoming PLDI’15 tutorial
![Page 19: Programming with “Big Code” - GitHub Pages · Predicting Program Properties from “Big Code”, POPL 2015 Fast and Precise Statistical Code Completion, ETH TR Statistical Feedback](https://reader036.vdocument.in/reader036/viewer/2022070711/5ecb78ee0746fe023043adfd/html5/thumbnails/19.jpg)
Programming with “Big Code”Applications
Intermediate Representation
Analyze Program(PL)
Train Model(ML)
Query Model
Code completionDeobfuscation
Program synthesis
Feedback generation
Translation
alias analysis
typestate analysis
Graphical Models
N-gram language modelSVM Structured SVM
Neural Networks
Sequences (sentences)
Trees
Translation TableFeature Vectors
control-flow analysis
scope analysis
argmax P(y | x)
y ∈ Ω
Greedy MAP Inference
http://www.nice2predict.org/http://www.srl.inf.ethz.ch/spas.php More information and tutorials at: