Slide 1
Syntactic and Functional Variability of a Million Code Submissions in a Machine Learning MOOC
Jonathan Huang, Chris Piech, Andy Nguyen, Leonidas Guibas
Slide 2
How can we efficiently grade 10,000 students?
A variety of assessments
Once you have completed the function, the next step in ex1.m will run computeCost once using θ initialized to zeros, and you will see the cost printed to the screen.
You should expect to see a cost of 32.07.
You should now submit “compute cost” for linear regression with one variable.
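For reference, a minimal sketch of what a vectorized cost computation along these lines could look like (this sketch is not taken from the assignment's starter code; only the computeCost signature is assumed):

function J = computeCost(X, y, theta)
  % J(theta) = 1/(2m) * sum((X*theta - y).^2), computed with matrix operations
  m = length(y);                 % number of training examples
  errors = X * theta - y;        % prediction errors
  J = (errors' * errors) / (2 * m);
end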
2.2.4 Gradient descent
Next, you will implement gradient descent in the file gradientDescent.m. The loop structure has been written for you, and you only need to supply the updates to θ within each iteration.
As you program, make sure you understand what you are trying to optimize and what is being updated. Keep in mind that the cost J(θ) is parameterized by the vector θ, not X and y. That is, we minimize the value of J(θ) by changing the values of the vector θ, not by changing X or y. Refer to the equations in this handout and to the video lectures if you are uncertain.
A good way to verify that gradient descent is working correctly is to look at the value of J(θ) and check that it is decreasing with each step. The starter code for gradientDescent.m calls computeCost on every iteration and prints the cost. Assuming you have implemented gradient descent and computeCost correctly, your value of J(θ) should never increase, and should converge to a steady value by the end of the algorithm.
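As a small illustration of that check (the learning rate, iteration count, and initial θ below are example values, not taken from this excerpt):

% After running gradient descent, the recorded costs should be non-increasing.
[theta, J_history] = gradientDescent(X, y, zeros(2, 1), 0.01, 1500);
assert(all(diff(J_history) <= 1e-9), "cost increased on some iteration");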
After you are finished, ex1.m will use your final parameters to plot the linear fit. The result should look something like Figure 2.
Your final values for θ will also be used to make predictions on profits in areas of 35,000 and 70,000 people. Note the way that the following lines in ex1.m use matrix multiplication, rather than explicit summation or looping, to calculate the predictions. This is an example of code vectorization in Octave.
You should now submit gradient descent for linear regression with one variable.
predict1 = [1, 3.5] * theta;
predict2 = [1, 7] * theta;
Prove that if p is prime, then √p is irrational.
Slide 3
Complex and informative feedback in MOOCs
Short Response!
• Multiple choice!
• Coding assignments!
• Easy to automate!
• Limited ability to ask expressive questions or require creativity! (for now anyway)

Long Response!
• Essay questions!
• Proofs!
• Requires crowdsourcing!
• Can assign complex assignments and provide complex feedback!
Slide 4
Binary feedback for coding questions
function [theta, J_history] = gradientDescent(X, y, theta, alpha, num_iters)
%GRADIENTDESCENT Performs gradient descent to learn theta
%   theta = GRADIENTDESCENT(X, y, theta, alpha, num_iters) updates theta by
%   taking num_iters gradient steps with learning rate alpha
m = length(y); % number of training examples
J_history = zeros(num_iters, 1);
for iter = 1:num_iters
    theta = theta - alpha*1/m*(X'*(X*theta - y));
    J_history(iter) = computeCost(X, y, theta);
end
Test Inputs → Test Outputs → Correct / Incorrect?
What would a human grader (TA) do?
Linear Regression submission (Homework 1) for Coursera’s ML class
Slide 5
Complex, Informative Feedback
function [theta, J_history] = gradientDescent(X, y, theta, alpha, num_iters)
m = length(y);
J_history = zeros(num_iters, 1);
for iter = 1:num_iters
    hypo = X*theta;
    newMat = hypo - y;
    trans1 = (X(:,1));
    trans1 = trans1';
    newMat1 = trans1 * newMat;
    temp1 = sum(newMat1);
    temp1 = (temp1 *alpha)/m;
    A = [temp1];
    theta(1) = theta(1) - A;
    trans2 = (X(:,2))';
    newMat2 = trans2*newMat;
    temp2 = sum(newMat2);
    temp2 = (temp2 *alpha)/m;
    B = [temp2];
    theta(2) = theta(2) - B;
    J_history(iter) = computeCost(X, y, theta);
end
theta(1) = theta(1)
theta(2) = theta(2);
Why??
Correctness: Good. Efficiency: Good. Style: Poor. Elegance: Poor.

Better: theta = theta - (alpha/m) * X'*(X*theta - y)
Slide 6
[Bar chart: number of unique users submitting to each of the 42 coding problems (ordered chronologically), on a scale of 0 to 30,000 users; optional problems are indicated.]
ML-class by the numbers
• 120,000 registered users
• 25,839 users who submitted code
• 10,405 users who submitted to every homework assignment
Topics: Linear regression, Logistic Regression, SVMs, Neural Nets, PCA, Nearest Neighbor…
Taught by Andrew Ng
How will Andrew give feedback to 25,000 students (~40,000 code submissions)?
Slide 7
Gradient descent for linear regression ~40,000 submissions
(the big picture)
Slide 8
Unit test output classes

Each student's gradientDescent submission is run on the same unit test inputs, and submissions are grouped by the unit test outputs they produce. Submissions from Student #1, Student #2, … each map to an output vector such as [.1, .3, 0], [.1, .4, 0], or [NaN].

A representative submission:

function [theta, J_history] = gradientDescent(X, y, theta, alpha, num_iters)
%GRADIENTDESCENT Performs gradient descent to learn theta
%   theta = GRADIENTDESCENT(X, y, theta, alpha, num_iters) updates theta by
%   taking num_iters gradient steps with learning rate alpha
m = length(y); % number of training examples
J_history = zeros(num_iters, 1);
for iter = 1:num_iters
    theta = theta - alpha*1/m*(X'*(X*theta - y));
    J_history(iter) = computeCost(X, y, theta);
end
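A minimal sketch of this grouping step (the cell array outputs and the mat2str-based serialization are illustrative assumptions):

% outputs{i} holds the unit test output vector produced by submission i.
keys_ = cellfun(@(o) mat2str(o, 4), outputs, 'UniformOutput', false);
[classes, ~, class_idx] = unique(keys_);        % distinct output classes
class_sizes = accumarray(class_idx(:), 1);      % number of submissions in each class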
Slide 9
Output-based feedback
Output class #1: [.1, .3, 0] → Instructor feedback: "Great job! Keep up the good work!"
Output class #2: [.1, .4, 0] → Instructor feedback: "Did you remember to preprocess the data before running the SVD?"
Output class #3: [NaN] → Instructor feedback: "You might have accidentally divided by zero when you tried to normalize"
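A minimal sketch of attaching such annotations to output classes (the containers.Map and mat2str usage is illustrative; this is not the course's actual grading code):

% Annotate a few output classes once; every submission in a class gets the same feedback.
feedback = containers.Map();
feedback(mat2str([.1, .3, 0])) = "Great job! Keep up the good work!";
feedback(mat2str([NaN])) = "You might have accidentally divided by zero when you tried to normalize";
key = mat2str(unit_test_output);   % unit_test_output: this submission's unit test outputs
if isKey(feedback, key)
  disp(feedback(key));
end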
How many output classes must Andrew Ng annotate?
Slide 10
Functional Variability

[Histogram: number of problems (of 42) versus the number of distinct unit test output classes over all submissions to a problem; counts range from 0 up to roughly 2,800 classes, with regularized logistic regression at the extreme (i.e., poor Andrew…).]

Problems with over 1000 ways to be wrong!

Of course, not all output classes are born equal…
Slide 11
How many students are covered?

[Plot: fraction of students covered by annotating the top-k unit test output classes, for k from 1 to about 46, shown for three problems: Return a 5x5 identity matrix, Linear Regression (Gradient Descent), and Regularized Logistic Regression.]
If Andrew gives feedback on 11 output classes, he “covers” 75% of the users
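A minimal sketch of how such a coverage curve can be computed from the per-class counts (class_sizes as in the earlier grouping sketch; names are illustrative):

counts = sort(class_sizes, 'descend');      % students per output class, largest first
coverage = cumsum(counts) / sum(counts);    % coverage(k) = fraction covered by the top-k classes
% e.g., coverage(11) would correspond to the ~75% figure quoted above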
Slide 12
Beyond output classes

function [J, grad] = lrCostFunction (theta, X, y, lambda)
m = length (y);
J = 0;
grad = zeros (size (theta));
for i = 1:m
  J = J + (-y (i) * log (sigmoid (X (i, :) * theta))) - (1 - y (i)) * log (1 - sigmoid (X (i, :) * theta));
endfor;
J = J / m;
for j = 2:size (theta)
  J = J + (lambda * (theta (j) ^ 2) / (2 * m));
endfor;
for j = 1:size (theta)
  for i = 1:m
    grad (j) = grad (j) + (sigmoid (X (i, :) * theta) - y (i)) * X (i, j);
  endfor;
  grad (j) = grad (j) / m;
endfor;
for j = 2:size (theta)
  grad (j) = grad (j) + (lambda * theta (j)) / m;
endfor;
grad = grad (:);

function [J, grad] = lrCostFunction (theta, X, y, lambda)
m = length (y);
J = 0;
grad = zeros (size (theta));
n = size (X, 2);
h = sigmoid (X * theta);
J = ((-y)' * log (h) - (1 - y)' * log (1 - h)) / m;
theta1 = [0; theta(2:size (theta), :)];
soma = sum (theta1' * theta1);
p = (lambda * soma) / (2 * m);
J = J + p;
grad = (X' * (h - y) + lambda * theta1) / m;
Two “correct” implementations of logistic regression
Output-based feedback ignores:
• vectorization
• multiple approaches with the same result
• coding style
• …
Slide 13
From function… to syntax!

function A = warmUpExercise()
% helloooo!
A = [];
A = eye(5);
endfunction

Abstract syntax tree:

FUNCTION:warmUpExercise
  STATEMENT_LIST
    ASSIGN
      TARGET
        IDENT:A
      SOURCE
        CONST:[]
    ASSIGN
      TARGET
        IDENT:A
      SOURCE
        IDX_EXPR
          IDENT:eye
          ARG_LIST
            CONST:5
    NO_OP:endfunction

ASTs ignore:
• Whitespace
• Comments
• Differences in variable names
• …
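A minimal sketch of comparing two such trees while ignoring identifier names (the struct representation with label and children fields is an illustrative assumption, not the actual tool's data structure):

function eq = ast_equal(a, b)
  % Collapse identifier labels (e.g., IDENT:A vs. IDENT:B) to a single token
  % so that variable naming differences do not matter.
  canon = @(s) regexprep(s, '^IDENT:.*$', 'IDENT');
  eq = strcmp(canon(a.label), canon(b.label)) && numel(a.children) == numel(b.children);
  for k = 1:numel(a.children)
    if ~eq
      return;
    end
    eq = ast_equal(a.children{k}, b.children{k});
  end
end

A real matcher would at least keep function names (e.g., IDENT:eye) distinct rather than collapsing them like renameable variables.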
Slide 14
Syntactic Variability

[Histogram: number of problems (of 42) versus the number of distinct abstract syntax trees over all submissions to a problem; counts range from 0 up to roughly 40,000 distinct ASTs, with regularized logistic regression again at the extreme.]

and again: not all ASTs are born equal…
Slide 15
How many students are covered?

[Plot: fraction of students covered by annotating the top-k abstract syntax trees, for k from 1 to about 31, shown for three problems: Return a 5x5 identity matrix, Linear Regression (Gradient Descent), and Linear Regression (Evaluate cost function).]
If Andrew gives feedback on 20 ASTs, he “covers” ~30% of the users.
Idea: If Alice and Bob have highly similar submissions (albeit not identical), the same feedback can often apply to both students!!
Slide 16
Core Subproblem: Code Correspondence
function [theta, J_history] = gradientDescent(X, y, theta, alpha, num_iters)
m = length(y);
J_history = zeros(num_iters, 1);
for iter = 1:num_iters
    h = X*theta;
    theta = theta - alpha*(1/m).*(X'*(h - y));
    J_history(iter) = computeCost(X, y, theta);
end

function [theta, J_history] = gradientDescent(X, y, theta, alpha, num_iters)
m = length(y);
J_history = zeros(num_iters, 1);
for iter = 1:num_iters
    temp1 = (alpha/m) * ( ( ( (theta' * X')' - y )' * X(:,1) ) );
    temp2 = (alpha/m) * ( ( ( (theta' * X')' - y )' * X(:,2) ) );
    theta(1) = theta(1) - temp1;
    theta(2) = theta(2) - temp2;
    fprintf('Theta: %f, %f\n', theta(1), theta(2))
    J_history(iter) = computeCost(X, y, theta);
end
Slide 17
Tree matching
FUNCTION (warmUpExercise)
  STATEMENT_LIST
    ASSIGN
      TARGET
        IDENT (A)
      SOURCE
        CONST ([])
    ASSIGN
      TARGET
        IDENT (A)
      SOURCE
        IDX_EXPR
          IDENT (eye)
          ARG_LIST
            CONST (5)
    NO_OP (endfunction)

function A = warmUpExercise()
A = eye(5);
endfunction

function A = warmUpExercise()
A = [];
A = eye(5);
endfunction
See [Shasha et al., '94] for an O(n^4) dynamic programming algorithm
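Edit distance between submissions comes up again later in the deck; as a rough, simplified stand-in for full tree matching (a flat, token-level Levenshtein distance over serialized AST node labels, not the Shasha et al. algorithm), one could compute:

function d = token_edit_distance(a, b)
  % a, b: cell arrays of AST node labels, e.g. {'ASSIGN', 'TARGET', 'IDENT:A', ...}
  n = numel(a);
  m = numel(b);
  D = zeros(n + 1, m + 1);
  D(:, 1) = (0:n)';
  D(1, :) = 0:m;
  for i = 1:n
    for j = 1:m
      cost = ~strcmp(a{i}, b{j});                  % 0 if the labels match, 1 otherwise
      D(i + 1, j + 1) = min([D(i, j + 1) + 1, ...  % deletion
                             D(i + 1, j) + 1, ...  % insertion
                             D(i, j) + cost]);     % substitution
    end
  end
  d = D(n + 1, m + 1);
end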
Slide 18
[Figure: the two matched ASTs rendered as a large node-link diagram, built from FUNCTION, STATEMENT_LIST, FOR, ASSIGN, TARGET/SOURCE, IDX_EXPR, BINARY_OP, POSTFIX, IDENT, and CONST nodes.]
Matched ASTs for two submitted Octave implementations.
Purple: tokens common to both implementations. Red/Blue: tokens distinct to one of the implementations.
Gradient descent for one-dimensional linear regression
Boilerplate code
Slide 19
Code webs
• Codewebs can act as a backbone for:
  – Clustering
  – Outlier identification
  – Automatic suggestions for alternative (better) implementations
  – Propagating complex and informative feedback/annotations
Graphs of code submissions, with nodes (implementations) connected to their nearest neighbors
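A minimal sketch of building such a nearest-neighbor graph from a precomputed pairwise distance matrix (the matrix D, the choice of k, and the function name are illustrative assumptions):

function A = build_code_web(D, k)
  % D: N x N pairwise distances between submissions (e.g., AST edit distances)
  % A: symmetric adjacency matrix linking each submission to its k nearest neighbors
  N = size(D, 1);
  A = false(N);
  for i = 1:N
    [~, order] = sort(D(i, :));
    order(order == i) = [];            % never link a submission to itself
    nbrs = order(1:min(k, N - 1));
    A(i, nbrs) = true;
    A(nbrs, i) = true;                 % keep the graph undirected
  end
end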
Slide 20
Labeled code webs!
Gradient descent for linear regression ~40,000 submissions
Slide 21
Interplay between functional and syntactic similarity

[Plot, log-scale y-axis: probability versus edit distance between pairs of submissions, comparing the distribution for pairs that agree on unit tests with the distribution for pairs that disagree on unit tests.]
Slide 22
Finding Prototypical Submissions
(we use affinity propagation [Frey, Dueck, 2007])
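For reference, a minimal sketch of the standard affinity propagation message-passing updates on a similarity matrix S (e.g., negative pairwise edit distances); the damping scheme, iteration count, and preferences on the diagonal of S are illustrative choices, not the talk's exact settings:

function exemplars = affinity_propagation(S, maxit, damping)
  % S: N x N similarities; the diagonal S(k,k) ("preferences") controls how
  % many exemplars (prototype submissions) emerge.
  N = size(S, 1);
  R = zeros(N);     % responsibilities
  A = zeros(N);     % availabilities
  for it = 1:maxit
    % Responsibilities: r(i,k) = s(i,k) - max over k' ~= k of [a(i,k') + s(i,k')]
    AS = A + S;
    [mx1, imx] = max(AS, [], 2);
    for i = 1:N
      AS(i, imx(i)) = -Inf;
    end
    mx2 = max(AS, [], 2);
    Rnew = S - repmat(mx1, 1, N);
    for i = 1:N
      Rnew(i, imx(i)) = S(i, imx(i)) - mx2(i);
    end
    R = damping * R + (1 - damping) * Rnew;
    % Availabilities: a(i,k) = min(0, r(k,k) + sum over i' not in {i,k} of max(0, r(i',k)))
    Rp = max(R, 0);
    for k = 1:N
      Rp(k, k) = R(k, k);
    end
    Anew = repmat(sum(Rp, 1), N, 1) - Rp;
    dA = diag(Anew);
    Anew = min(Anew, 0);
    for k = 1:N
      Anew(k, k) = dA(k);
    end
    A = damping * A + (1 - damping) * Anew;
  end
  exemplars = find(diag(R) + diag(A) > 0);   % indices of prototype submissions
end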
Slide 23
(Correct) Gradient Descent Prototypes
All three share the signature:

function [theta, J_history] = gradientDescent(X, y, theta, alpha, num_iters)

#1 Most common implementation, vectorized, one-line update!

m = length(y);
J_history = zeros(num_iters, 1);
for iter = 1:num_iters
    theta = theta - alpha*1/m*(X'*(X*theta - y));
    J_history(iter) = computeCost(X, y, theta);
end

#2 Vectorized, with temporary variable!

m = length(y);
J_history = zeros(num_iters, 1);
for iter = 1:num_iters,
    delta = X' * (X * theta - y);
    theta = theta - (alpha / m * delta);
    J_history(iter) = computeCost(X, y, theta);
end

#3 Unvectorized, synchronous update!

m = length(y);
J_history = zeros(num_iters, 1);
for iter = 1:num_iters
    t0 = theta(1,1) - ((alpha / m) * sum ((X * theta) - y));
    t1 = theta(2,1) - ((alpha / m) * sum (((X * theta) - y) .* X (:,2)));
    theta = [t0; t1];
    J_history(iter) = computeCost(X, y, theta);
end
Slide 24
Finding Outlier Submissions
function [theta, J_history] = gradientDescent(X, y, theta, alpha, num_iters)
m = length(y);
J_history = zeros(num_iters, 1);
for iter = 1:num_iters
    hypo = X*theta;
    newMat = hypo - y;
    trans1 = (X(:,1));
    trans1 = trans1';
    newMat1 = trans1 * newMat;
    temp1 = sum(newMat1);
    temp1 = (temp1 *alpha)/m;
    A = [temp1];
    theta(1) = theta(1) - A;
    trans2 = (X(:,2))';
    newMat2 = trans2*newMat;
    temp2 = sum(newMat2);
    temp2 = (temp2 *alpha)/m;
    B = [temp2];
    theta(2) = theta(2) - B;
    J_history(iter) = computeCost(X, y, theta);
end
theta(1) = theta(1);
theta(2) = theta(2);
end

function [theta, J_history] = gradientDescent(X, y, theta, alpha, num_iters)
m = length(y);
J_history = zeros(num_iters, 1);
for iter = 1:num_iters
    a = 0;
    h = zeros(m,1);
    h = X*theta;
    k = (h-y);
    for i = 1:m,
        a = k(i) + a;
    end;
    a = (alpha*a)/m;
    theta(1) = theta(1) - a;
    a = 0;
    x1 = X(:,2);
    for i = 1:m,
        a = k(i)*x1(i) + a;
    end;
    a = (alpha*a)/m;
    theta(2) = theta(2) - a;
    J_history(iter) = computeCost(X, y, theta);
end
end
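One simple way to surface such outliers (an illustrative heuristic, not necessarily the method used in the talk) is to flag submissions that sit far from every prototype:

% D_proto: N x P matrix of distances from each of N submissions to the P prototypes
nearest = min(D_proto, [], 2);        % distance from each submission to its closest prototype
cutoff = quantile(nearest, 0.99);     % e.g., keep the most unusual 1% as candidate outliers
outliers = find(nearest > cutoff);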
Slide 25
Finding Outlier Submissions
function [C, sigma] = dataset3Params(X, y, Xval, yval)
C = 1; sigma = 0.3;
mean(double(predictions ~= yval)) %
d = [0.01, 0.03, 0.1, 0.3, 1, 3, 10, 30];

i = 1; j = 1; global gl_sigma = d(j); gl_sigma = d(j);
function f = f(x1, x2)
  global gl_sigma; f = gaussianKernel(x1, x2, gl_sigma);
end
model = svmTrain(X, y, d(i), @f); pred = svmPredict(model, Xval); ans = mean(double(pred ~= yval));
if (i == 1 && j == 1) || ans < res
  res = ans; C = d(i); sigma = d(j);
end

i = 1; j = 2; global gl_sigma = d(j); gl_sigma = d(j);
function f = f(x1, x2)
  global gl_sigma; f = gaussianKernel(x1, x2, gl_sigma);
end
model = svmTrain(X, y, d(i), @f); pred = svmPredict(model, Xval); ans = mean(double(pred ~= yval));
if (i == 1 && j == 1) || ans < res
  res = ans; C = d(i); sigma = d(j);
end

i = 1; j = 3; global gl_sigma = d(j); gl_sigma = d(j);
function f = f(x1, x2)
  global gl_sigma; f = gaussianKernel(x1, x2, gl_sigma);
end
model = svmTrain(X, y, d(i), @f); pred = svmPredict(model, Xval); ans = mean(double(pred ~= yval));
if (i == 1 && j == 1) || ans < res
  res = ans; C = d(i); sigma = d(j);
end

i = 1; j = 4; global gl_sigma = d(j); gl_sigma = d(j);
function f = f(x1, x2)
  global gl_sigma; f = gaussianKernel(x1, x2, gl_sigma);
end
model = svmTrain(X, y, d(i), @f); pred = svmPredict(model, Xval); ans = mean(double(pred ~= yval));
if (i == 1 && j == 1) || ans < res
  res = ans; C = d(i); sigma = d(j);
end

i = 1; j = 5; global gl_sigma = d(j); gl_sigma = d(j);
function f = f(x1, x2)
  global gl_sigma; f = gaussianKernel(x1, x2, gl_sigma);
end
model = svmTrain(X, y, d(i), @f); pred = svmPredict(model, Xval); ans = mean(double(pred ~= yval));
if (i == 1 && j == 1) || ans < res
  res = ans; C = d(i); sigma = d(j);
end

i = 1; j = 6; global gl_sigma = d(j); gl_sigma = d(j);
function f = f(x1, x2)
  global gl_sigma; f = gaussianKernel(x1, x2, gl_sigma);
end
model = svmTrain(X, y, d(i), @f); pred = svmPredict(model, Xval); ans = mean(double(pred ~= yval));
if (i == 1 && j == 1) || ans < res
  res = ans; C = d(i); sigma = d(j);
end

i = 1; j = 7; global gl_sigma = d(j); gl_sigma = d(j);
function f = f(x1, x2)
  global gl_sigma; f = gaussianKernel(x1, x2, gl_sigma);
end
model = svmTrain(X, y, d(i), @f); pred = svmPredict(model, Xval); ans = mean(double(pred ~= yval));
if (i == 1 && j == 1) || ans < res
  res = ans; C = d(i); sigma = d(j);
end

i = 1; j = 8; global gl_sigma = d(j); gl_sigma = d(j);
function f = f(x1, x2)
  global gl_sigma; f = gaussianKernel(x1, x2, gl_sigma);
end
model = svmTrain(X, y, d(i), @f); pred = svmPredict(model, Xval); ans = mean(double(pred ~= yval));
if (i == 1 && j == 1) || ans < res
  res = ans; C = d(i); sigma = d(j);
end

i