performance evaluation of plagiarism detection method based on the intermediate language vedran...

Upload: gregory-hudson

Post on 18-Dec-2015


Plagiarism detection method

Method for detecting plagiarism in source code for .NET languages: C#, Visual Basic .NET, C++, …

Identify similar code fragments

Determine similarity between source files

Based on intermediate language

Plagiarism detection

First:

  1. using System.Text;
  2. namespace Test {
  3. class Math {
  4. public double GetMaximum(double[] Input) {
  5. double result = Input[0];
  6. foreach (double temp in Input) {
  7. if (temp>result)
  8. result = temp; }
  9. return result; } } }

Second:

  1. using System.Text;
  2. namespace Test {
  3. class Math {
  4. public double GetMaximum(double[] Input) {
  5. double result = Input[0];
  6. for (int i=0;i<Input.Length;i++) {
  7. if (Input[i]>result)
  8. result = Input[i]; }
  9. return result; } } }

Similarity = Number of overlapping lines / Total number of lines = 6 / 9 = 66.67%
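The line-based measure above can be reproduced with a few lines of Python. This is only a minimal sketch: the function name `line_similarity` is hypothetical, and it assumes that a line "overlaps" when the same text, ignoring surrounding whitespace, occurs anywhere in the other file.

```python
def line_similarity(first: str, second: str) -> float:
    """Fraction of non-empty lines of `first` that occur verbatim in `second`."""
    first_lines = [ln.strip() for ln in first.splitlines() if ln.strip()]
    second_lines = {ln.strip() for ln in second.splitlines() if ln.strip()}
    if not first_lines:
        return 0.0
    overlapping = sum(1 for ln in first_lines if ln in second_lines)
    return overlapping / len(first_lines)


# The two nine-line listings from the slide, one source line per text line.
FIRST = """\
using System.Text;
namespace Test {
class Math {
public double GetMaximum(double[] Input) {
double result = Input[0];
foreach (double temp in Input) {
if (temp>result)
result = temp; }
return result; } } }
"""

SECOND = """\
using System.Text;
namespace Test {
class Math {
public double GetMaximum(double[] Input) {
double result = Input[0];
for (int i=0;i<Input.Length;i++) {
if (Input[i]>result)
result = Input[i]; }
return result; } } }
"""

print(f"{line_similarity(FIRST, SECOND):.2%}")  # 6 of 9 lines match: 66.67%
```

Only lines 6 to 8 differ between the two listings, so six of the nine lines are counted as overlapping.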

But…

First:

  1. using System.Text;
  2. namespace Test {
  3. class Math {
  4. public double GetMaximum(double[] Input) {
  5. double result = Input[0];
  6. foreach (double temp in Input) {
  7. if (temp>result)
  8. result = temp; }
  9. return result; } } }

Second:

  1. using System;
  2. namespace OtherTest {
  3. class MyClass {
  4. public double ReturnMaximum(double[] Array) {
  5. double current = Array[0];
  6. for (int j=0;j<Array.Length;j++) {
  7. if (Array[j]>current)
  8. current = Array[j]; }
  9. return current; } } }

Similarity = Number of overlapping lines / Total number of lines = 0 / 9 = 0.00%

Problems

Modification of variable names, types, constants

Modification of class member definitions

Line and command reordering

Solution:

  Detailed analysis

  Complex preprocessing

  For each supported language
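To give a feel for the kind of per-language preprocessing listed above, the sketch below replaces identifiers and numeric constants with placeholders before lines are compared. It is an illustration only, not the method of any of the systems discussed: the function name `normalize_line` and the deliberately incomplete keyword set `CSHARP_KEYWORDS` are assumptions, and a real tool would need a proper lexer for each supported language.

```python
import re

# Hypothetical, incomplete keyword list: just enough for the slide's example.
CSHARP_KEYWORDS = {
    "using", "namespace", "class", "public", "double", "int",
    "foreach", "for", "in", "if", "return",
}

# Identifiers, integer literals, or any other single non-space character.
TOKEN = re.compile(r"[A-Za-z_]\w*|\d+|\S")


def normalize_line(line: str) -> str:
    """Replace identifiers with ID and integer literals with NUM."""
    normalized = []
    for token in TOKEN.findall(line):
        if token.isdigit():
            normalized.append("NUM")
        elif re.fullmatch(r"[A-Za-z_]\w*", token) and token not in CSHARP_KEYWORDS:
            normalized.append("ID")
        else:
            normalized.append(token)
    return " ".join(normalized)


print(normalize_line("double result = Input[0];"))   # double ID = ID [ NUM ] ;
print(normalize_line("double current = Array[0];"))  # double ID = ID [ NUM ] ;
```

Renamed variables now compare equal, but a `foreach` loop rewritten as a `for` loop still produces different token sequences, and the keyword list has to be maintained separately for every language.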

Our solution

Convert from source language to low-level language (Common Intermediate Language)

By using existing tools: compiler, disassembler

Tools exist for all .Net languages

Our solution

C# language

  using System.Text;

  namespace Test
  {
    class Math
    {
      public double GetMaximum(double[] Input)
      {
        double result = Input[0];
        foreach (double temp in Input)
        {
          if (temp>result)
            result = temp;
        }
        return result;
      }
    }
  }

C# compiler

Common Intermediate Language

  .method public hidebysig instance float64 GetMaximum(float64[] Input) cil managed
  {
    // Code size 61 (0x3d)
    .maxstack 2
    .locals init (float64 V_0, float64 V_1, float64 V_2, float64[] V_3, int32 V_4, bool V_5)
    IL_0000: nop
    IL_0001: ldarg.1
    IL_0002: ldc.i4.0
    IL_0003: ldelem.r8
    IL_0004: stloc.0
    IL_0005: nop
    IL_0006: ldarg.1
    IL_0007: stloc.3
    …..
    IL_0037: ldloc.0
    IL_0038: stloc.2
    IL_0039: br.s IL_003b
    IL_003b: ldloc.2
    IL_003c: ret
  } // end of method C::GetMaximum

Opcode sequence

  nop ldarg.1 ldc.i4.0 ldelem.r8 stloc.0 nop ldarg.1 stloc.3 … ldloc.0 stloc.2 br.s ldloc.2 ret
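The opcode-only sequence above drops the instruction labels and operands from the disassembled method. A minimal sketch of that reduction follows; the function name `extract_opcodes` and the regular expression are assumptions for illustration, and the slides do not say how ILMatch itself processes the disassembly.

```python
import re

# Matches disassembler lines such as "IL_0039: br.s IL_003b" and captures
# the opcode ("br.s"). Directives and comments do not match.
IL_INSTRUCTION = re.compile(r"^\s*IL_[0-9a-fA-F]+:\s+(\S+)")


def extract_opcodes(il_text: str) -> list[str]:
    """Return the opcodes of an IL listing in order, without labels or operands."""
    opcodes = []
    for line in il_text.splitlines():
        match = IL_INSTRUCTION.match(line)
        if match:
            opcodes.append(match.group(1))
    return opcodes


# Excerpt of the listing from the slide (the elided middle part is left out).
LISTING = """\
.maxstack 2
IL_0000: nop
IL_0001: ldarg.1
IL_0002: ldc.i4.0
IL_0003: ldelem.r8
IL_0004: stloc.0
IL_0039: br.s IL_003b
IL_003b: ldloc.2
IL_003c: ret
"""

print(" ".join(extract_opcodes(LISTING)))
# nop ldarg.1 ldc.i4.0 ldelem.r8 stloc.0 br.s ldloc.2 ret
```

Because variable and class names do not appear in the opcode sequence, renaming them in the source leaves this representation unchanged.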

Plagiarism detection system

Evaluate the performance

Analyze and compare behavior to the most commonly used plagiarism detection systems: MOSS, JPlag, CodeMatch

Tested systems

MOSS

  Developed in 1994

  Commonly used in computer science faculties

  Supports 26 programming languages

JPlag

  Developed in 1996

  Commonly used in education

  Supports C, C++, C# and Java

Tested systems

CodeMatch

  Developed in 2003

  Commercial software

  Supports 26 languages

ILMatch (our system)

  Developed in 2010

  Supports all .NET languages (currently 59 languages)

Testing

6 test categories

50 test cases covering common code modification techniques

Evaluation methods: precision, recall, F-measure
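For reference, the three measures can be computed from the number of correctly flagged, wrongly flagged and missed plagiarism cases. The sketch below assumes the balanced F-measure (F1), since the slides do not say which variant was used; the function name and the counts in the example are hypothetical and are not results from this evaluation.

```python
def precision_recall_f(true_positives: int, false_positives: int,
                       false_negatives: int) -> tuple[float, float, float]:
    """Return (precision, recall, balanced F-measure) for a detection run."""
    flagged = true_positives + false_positives   # everything the tool reported
    actual = true_positives + false_negatives    # everything it should have reported
    precision = true_positives / flagged if flagged else 0.0
    recall = true_positives / actual if actual else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Hypothetical counts: 8 plagiarised pairs found, 2 false alarms, 4 missed.
p, r, f = precision_recall_f(8, 2, 4)
print(f"precision={p:.2f} recall={r:.2f} F={f:.2f}")
# precision=0.80 recall=0.67 F=0.73
```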

Results

[Figure panels: MOSS, JPlag, CodeMatch, ILMatch. Annotation: Highest F-measures]

Positive

No impact

  User comments

  Code formatting

  Modification of variable and class names

  Modification of class members

  Changing data types

Some impact

  Replacing expressions and loops

  Rewriting code in a different language

Further work

Significant impact

  Reordering operands

  Reordering class members

  Adding redundant statements and variables

Improvements in comparison algorithm
