Presentation 7 Summary Cross Language Clone Analysis Team 2 November 22, 2010


Page 1: Presentation  7 Summary

Presentation 7 Summary

Cross Language Clone Analysis, Team 2

November 22, 2010

Page 2: Agenda

• Feasibility Study
• Release Plan
• Architecture
• Parsing
• CodeDOM
• Clone Analysis
• Testing
• Demonstration
• Team Collaboration
• Path Forward

Page 3: Our Team

Allen Tucker, Patricia Bradford, Greg Rodgers, Brian Bentley, Ashley Chafin

Page 4: Feasibility Study

Our evaluation of the project to determine the difficulty in carrying out the task.

Page 5: Task Summary

• Our customers: Dr. Etzkorn and Dr. Kraft
• Customer request:
◦ A tool that will abstract programs in C++, C#, Java, and (Python or VB) to the Dagstuhl Middle Metamodel, Microsoft CodeDOM, or something similar, and detect cross-language clones.
• Areas to note:
◦ the user interface
◦ easy comparisons of clones
◦ visualization of clones
◦ sub-clones
◦ clone detection for large bodies of code

Page 6: Task Summary (cont.)

To find clones across different programming languages, we must first convert the code from each language into a language-independent object model.

Some language-independent object models:
◦ Dagstuhl Middle Metamodel (DMM)
◦ Microsoft CodeDOM

Both models provide a language-independent object model for representing the structure of source code.

Page 7: Task Understanding

Three-step process:
• Step 1: Code Translation
• Step 2: Clone Detection
• Step 3: Visualization

Diagram: Source Files -> Translator -> Common Model -> Inspector -> Detected Clones -> UI -> Clone Visualization

Page 8: Benefits

• Fact: Modularity is a key characteristic of today’s software world.
• Why? It allows us to decompose software into separate concerns.
◦ Contributes to maintainability, reusability, testability, and reliability
• Clone detection allows us to find common code spread across large bodies of code.
◦ Identifies code that is a candidate for further modularization

Page 9: Features

• Clone detection software suite
◦ Identifies
◦ Tracks
◦ Manages software clones
• Multi-language support
◦ C++
◦ C#
◦ Java

Page 10: Features (cont.)

• Provides complete code coverage
• Multi-application support
◦ Stand-alone
◦ Plug-in based (Eclipse)
◦ Backend service (Ant task)
• Extensible
◦ Built on a plug-in framework
◦ New languages can be added
• Easy navigation between clones
• Persists clones for easy retrieval

Page 11: Risk Analysis

• The problem may prove more complex than initially estimated.
• The technology to be applied is either not well established or has yet to be developed.
• We may be unable to complete the defined project scope within the schedule.
• Volatile user requirements may lead to redefinition of the project objectives.

Page 12: Release Plan

Release Plan and User Stories

Page 13: Re-tooled User Stories

• Issued the original Release Plan on 9/15/20.
• Due to customer wants and needs, we had to re-tool our user stories.
• Dr. Etzkorn’s main concerns:
◦ Load source code and translate it to a language-independent model
◦ Analyze the translated source code for clones
• Results from the meeting:
◦ Created two new user stories (see next two slides)
◦ These two user stories have been pushed to the front of our card stack

Page 14: CS 666 Studio I User Stories

Phase I

Page 15: Source Code Load & Translate

Story ID: 017
Priority: 1
Estimate: 14 Days

As an analyst, I want to load and translate my source code projects so that I can analyze the source for clones.

Page 16: Source Code Analyze

Story ID: 018
Priority: 1
Estimate: 14 Days

As an analyst, I want to analyze my source code projects so that I can see the clones.

Page 17: Code Clone Highlights

Story ID: 002
Priority: 1
Estimate: 14 Days

As an analyst, I want the source code associated with clones highlighted within the source files so that clones are easy to identify.

Page 18: Current Tasks

Requirements & Models

Page 19: Current Tasks’ Requirements

• Requirements modeling for the first user story, “Source Code Load & Translate”:
◦ Load and parse C#, Java, and C++ source code.
◦ Translate the parsed C#, Java, and C++ source code to CodeDOM.
◦ Associate the CodeDOM with the original source code.
• Requirements modeling for the second user story, “Source Code Analyze”:
◦ Analyze the CodeDOM for clones.

Page 20: UML Model – Load & Parse

Page 21: UML Model – Translate

Page 22: UML Model – Associate

Page 23: UML Model – Analyze

Page 24: Architecture

Design and Architecture

Page 25: Key Architecture Points

• Multi-language support
• Configurable for different platforms
◦ Stand-alone application
◦ Plug-in
◦ Backend service
• Extensible

Page 26: Architecture

Diagram: applications (User Interface, Service, Eclipse Plug-in, Web Interface, etc.) sit on the API; the Core holds the Code Model and the Clone Detection Algorithms; the Language Support interface connects the Core to the C# Service, Java Service, and C++ Service.

Page 27: Core Unit

• Code Model
◦ Stores the code in a common format
• Application Programming Interface (API)
◦ Used to embed clone detection in applications
• Language Service Interface
◦ Communication layer between the core and the specific language services

Diagram: the Core (Code Model, Clone Detection Algorithms) between the API and the Language Service Interface

Page 28: App Configuration

Page 29: CRC Card Sampling

Class Responsibility Collaboration Cards

Page 30: Java Parser CRC

Class: Java Parser
Responsibilities:
◦ Parse Java source code
◦ Construct Java token tree
Collaborators:
◦ LALRParser (GOLD Parser)

Page 31: C# Parser CRC

Class: C# Parser
Responsibilities:
◦ Parse C# source code
◦ Construct C# token tree
Collaborators:
◦ LALRParser (GOLD Parser)

Page 32: Language Service CRC

Class: LanguageService
Responsibilities:
◦ Defines a standard interface for all language providers
Collaborators:
◦ ILanguageService
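To make the ILanguageService contract a little more concrete, here is a minimal sketch in Java, one of the project's target languages. It is a hypothetical rendering, not the team's actual interface: the project itself builds on .NET's System.CodeDom, and the method names `languageName`, `fileExtensions`, and `translate`, the `CodeUnit` record, and the toy `ToyJavaService` provider are all assumptions made for illustration.

```java
import java.util.List;

// Hypothetical, minimal stand-in for a CodeDOM compilation unit (not System.CodeDom).
record CodeUnit(String sourcePath, List<String> tokens) {}

// Hypothetical Java rendering of the ILanguageService contract from the CRC cards.
interface ILanguageService {
    // Human-readable language name, e.g. "Java".
    String languageName();

    // File extensions this provider claims, e.g. ".java".
    List<String> fileExtensions();

    // Translate one source file's text into the common model.
    CodeUnit translate(String sourcePath, String sourceText);
}

// Toy provider: a real JavaService would run the GOLD engine and convert its parse tree;
// here the "translation" is only a whitespace split so the example runs on its own.
class ToyJavaService implements ILanguageService {
    @Override
    public String languageName() { return "Java"; }

    @Override
    public List<String> fileExtensions() { return List.of(".java"); }

    @Override
    public CodeUnit translate(String sourcePath, String sourceText) {
        String trimmed = sourceText.trim();
        List<String> tokens = trimmed.isEmpty() ? List.of() : List.of(trimmed.split("\\s+"));
        return new CodeUnit(sourcePath, tokens);
    }
}
```

Keeping every language behind one interface is what lets the core stay language-agnostic; the whitespace split above only stands in for the real parse-and-convert step.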

Page 33: Java Service CRC

Class: JavaService
Responsibilities:
◦ Reads Java source code
◦ Understands Java grammar production rules
◦ Constructs CodeDOM compilation unit
Collaborators:
◦ Java Parser
◦ CloneDetection
◦ JavaCodeProvider
◦ ILanguageService

Page 34: Cs Service CRC

Class: CsService
Responsibilities:
◦ Reads C# source code
◦ Understands C# grammar production rules
◦ Constructs CodeDOM compilation unit
Collaborators:
◦ C# Parser
◦ CloneDetection
◦ CsCodeProvider
◦ ILanguageService

Page 35: CloneDetection CRC

Class: CloneDetection
Responsibilities:
◦ Loads and manages language services
◦ Controls parsing
◦ Establishes associations between CodeDOM compilation units and source code files
◦ Compares code segments
◦ Provides bookkeeping for code segments
Collaborators:
◦ ILanguageService
◦ CodeDomComparer
◦ CodeDomSummary
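The first CloneDetection responsibility, loading and managing language services, can be pictured as a registry keyed by file extension. This Java sketch is an assumption about how such a lookup might work, not the team's design; `LanguageProvider`, `LanguageRegistry`, and `providerFor` are hypothetical names, and the provider interface is a cut-down stand-in for ILanguageService.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

// Hypothetical, cut-down provider contract standing in for ILanguageService.
interface LanguageProvider {
    String languageName();
    List<String> fileExtensions();
}

// Sketch of "loads and manages language services": providers register the
// extensions they handle, and each source file is routed by its extension.
class LanguageRegistry {
    private final Map<String, LanguageProvider> byExtension = new HashMap<>();

    void register(LanguageProvider provider) {
        for (String ext : provider.fileExtensions()) {
            byExtension.put(ext.toLowerCase(Locale.ROOT), provider);
        }
    }

    // Returns the provider for a file name such as "Program.cs", if one is registered.
    Optional<LanguageProvider> providerFor(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0) {
            return Optional.empty();
        }
        String ext = fileName.substring(dot).toLowerCase(Locale.ROOT);
        return Optional.ofNullable(byExtension.get(ext));
    }
}
```

Routing by extension keeps the core unaware of individual languages, which fits the stated goal of adding new languages through plug-ins.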

Page 36: Parsing

Our struggles and our successes.

Page 37: Parsing Struggles & Successes

• We explored and conducted spikes on CSParser and CS CodeDOM Parser.
◦ Both had advantages and disadvantages.
◦ We concluded that neither would fit our needs.
• We explored and conducted a spike on the GOLD Parser.
◦ We ultimately chose the GOLD Parser because it best fit our needs.
• This gave us a way to manage multiple language grammars with one engine.

Page 38: GOLD Parsing System

GOLD Parsing and Populating CodeDOM

Page 39: How It Works (Block Structure)

Diagram: Grammar Builder -> Compiled Grammar Table (*.cgt) -> Engine; Source Code -> Engine -> Parsed Data

Page 40: How It Works (Process)

Diagram: Grammar Builder -> Compiled Grammar Table (*.cgt) -> Engine; Source Code -> Engine -> Parsed Data

Typical output from the engine: a long nested tree.

Page 41: Usage within CloneDigger

Diagram: Compiled Grammar Table (*.cgt) + Source Code -> Engine -> Parsed Data -> CodeDOM Conversion -> AST

CodeDOM Conversion
• We need to write a routine to move data from the parsed tree to CodeDOM.
• Parsed data trees from the parser are stored in a consistent data structure, but their shape is based on the rules defined within each grammar.
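The conversion routine has to turn grammar-shaped parse trees into a common model. A minimal sketch, assuming a hypothetical `ParseNode` type in which each inner node records the name of the production rule it was reduced by (this is not the GOLD engine's API), dispatches to one handler per rule. The rule names (`ClassDecl`, `Members`, `MethodDecl`) and the `ModelNode` target are illustrative stand-ins for real grammar rules and CodeDOM types.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical parse-tree node: inner nodes name the production rule they were
// reduced by; leaves (terminals) carry token text. Not the GOLD engine's API.
record ParseNode(String rule, String text, List<ParseNode> children) {
    static ParseNode leaf(String text) { return new ParseNode(null, text, List.of()); }
    static ParseNode node(String rule, ParseNode... kids) { return new ParseNode(rule, null, List.of(kids)); }
}

// Hypothetical common-model node standing in for a CodeDOM object.
record ModelNode(String kind, String name, List<ModelNode> children) {}

// Converts a parse tree by dispatching on the production rule, one handler per rule.
class TreeToModel {
    private final Map<String, Function<ParseNode, List<ModelNode>>> handlers = new HashMap<>();

    TreeToModel() {
        // ClassDecl ::= class Identifier '{' Members '}'
        handlers.put("ClassDecl", n -> List.of(new ModelNode("TypeDeclaration",
                n.children().get(1).text(), convert(n.children().get(3)))));
        // Members ::= MethodDecl Members | <empty>  (flattens the right-recursive list)
        handlers.put("Members", n -> {
            List<ModelNode> out = new ArrayList<>();
            for (ParseNode child : n.children()) {
                out.addAll(convert(child));
            }
            return out;
        });
        // MethodDecl ::= void Identifier '(' ')' '{' '}'
        handlers.put("MethodDecl", n -> List.of(new ModelNode("MemberMethod",
                n.children().get(1).text(), List.of())));
    }

    List<ModelNode> convert(ParseNode n) {
        if (n.rule() == null) {
            return List.of(); // bare terminals contribute nothing on their own
        }
        Function<ParseNode, List<ModelNode>> handler = handlers.get(n.rule());
        if (handler == null) {
            throw new IllegalStateException("No handler for rule: " + n.rule());
        }
        return handler.apply(n);
    }
}
```

Failing loudly on an unhandled rule makes coverage gaps visible immediately, which suits a per-rule bookkeeping approach.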

Page 42: Grammar Updates

Bookkeeping for parsing the multiple grammars.

Page 43: Grammar Updates

• The grammars we currently have for the GOLD Parser are outdated.
• Current GOLD grammars:
◦ C# version 2.0
◦ Java version 1.4
• Currently available language versions:
◦ C# version 4.0
◦ Java version 6

Page 44: Grammar Update Issues

• Grammars for C# and Java are very complex and require a lot of work to build.
• ANTLR and GOLD Parser grammars use completely different syntax.
• Positive note: other development has not been halted by the use of the older grammars.

Page 45: Our Bookkeeping

Bookkeeping for parsing the multiple grammars

Page 46: Compiled Grammar Table

• For Java, there are:
◦ 359 production rules
◦ 249 distinct symbols (terminal & non-terminal)
• For C#, there are:
◦ 415 production rules
◦ 279 distinct symbols (terminal & non-terminal)

Page 47: Production Rule Dependencies

Page 48: Our Grammar Bookkeeping

Since there are so many production rules, we came up with the following bookkeeping:
• A spreadsheet of the compiled grammar table (for each language) with each production rule indexed
◦ This spreadsheet covers:
    various aspects of the language
    what we have and have not handled from the parser
    what we have and have not implemented in CodeDOM
    percentage complete
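The "percentage complete" column can be derived directly from per-rule status flags. A minimal Java sketch, assuming a hypothetical `RuleStatus` row with one flag for the parser handler and one for the CodeDOM translation (the real spreadsheet's columns may differ):

```java
import java.util.List;
import java.util.function.Predicate;

// Hypothetical row of the grammar bookkeeping spreadsheet: one per indexed production rule.
record RuleStatus(int index, String production, boolean handledInParser, boolean implementedInCodeDom) {}

class GrammarBookkeeping {
    // Percentage of rows satisfying the given check, or 0 for an empty sheet.
    static double percent(List<RuleStatus> rows, Predicate<RuleStatus> done) {
        if (rows.isEmpty()) {
            return 0.0;
        }
        long count = rows.stream().filter(done).count();
        return 100.0 * count / rows.size();
    }

    static double parserPercent(List<RuleStatus> rows) {
        return percent(rows, RuleStatus::handledInParser);
    }

    static double codeDomPercent(List<RuleStatus> rows) {
        return percent(rows, RuleStatus::implementedInCodeDom);
    }
}
```

Tracking the two flags separately matches the slide's distinction between what is handled from the parser and what is implemented in CodeDOM.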

Page 49: Our Grammar Bookkeeping

Page 50: Parsing & CodeDOM Status

• Parsing handlers’ status:
◦ C# = 100% complete
◦ Java = 100% complete

Page 51: CodeDOM

Language Independent Object Model

Page 52: CodeDOM

• Document Object Model for source code
• API: System.CodeDom
• Only supports certain aspects of each language, since it is language agnostic
◦ Good enough
• What does it do?
◦ Programmatically constructs code
• What doesn’t it do?
◦ It does NOT parse

Page 53: CodeDOM Example

CodeCompileUnit
◦ CodeNamespace
    Imports
    Types
        Members
            Event
            Field
            Method
                Statements
                    Expression
            Property
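The containment hierarchy above can be mirrored with a few plain data types. This Java sketch is hypothetical: the type names echo System.CodeDom classes (CodeCompileUnit, CodeNamespace, CodeTypeDeclaration, CodeTypeMember) but are not that .NET API, and statements are kept as plain strings for brevity. `outline` walks the tree depth-first and returns one indented line per node.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical Java mirror of the CodeDOM containment hierarchy on the slide.
// Names echo System.CodeDom classes, but this is not that .NET API.
enum MemberKind { EVENT, FIELD, METHOD, PROPERTY }

record Member(MemberKind kind, String name, List<String> statements) {}

record TypeDecl(String name, List<Member> members) {}

record Namespace(String name, List<String> imports, List<TypeDecl> types) {}

record CompileUnit(List<Namespace> namespaces) {}

class CodeModelOutline {
    // Depth-first walk that returns one indented line per node, outermost first.
    static List<String> outline(CompileUnit unit) {
        List<String> lines = new ArrayList<>();
        lines.add("CompileUnit");
        for (Namespace ns : unit.namespaces()) {
            lines.add("  Namespace " + ns.name());
            for (String imp : ns.imports()) {
                lines.add("    Import " + imp);
            }
            for (TypeDecl type : ns.types()) {
                lines.add("    Type " + type.name());
                for (Member m : type.members()) {
                    lines.add("      " + m.kind() + " " + m.name());
                    for (String stmt : m.statements()) {
                        lines.add("        Statement " + stmt);
                    }
                }
            }
        }
        return lines;
    }
}
```

Because every language is translated into the same shape, a clone detector can walk this one structure regardless of whether the source was C#, Java, or C++.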

Page 54: Clone Analysis

Clones & Dr. Kraft’s Tool

Page 55: Clone Types

3 types of clones (definition of similarity):
◦ Type 1: An exact copy without modifications (except for whitespace and comments)
◦ Type 2: A syntactically identical copy; only variable, type, or function identifiers have been changed
◦ Type 3: A copy with further modifications; statements have been changed, reordered, added, or removed
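Type 1 and Type 2 clones can be found by comparing normalized token streams. The Java sketch below shows one simple way to normalize, not the team's or Dr. Kraft's method: for Type 1 it drops comments and whitespace, and for Type 2 it also replaces every non-keyword identifier with a placeholder. The small keyword set and the regex tokenizer are assumptions; a real implementation would take both from the language grammar.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CloneNormalizer {
    // Small illustrative keyword set (an assumption; a real tool would take it from the grammar).
    private static final Set<String> KEYWORDS = Set.of(
            "int", "double", "boolean", "void", "return", "if", "else",
            "for", "while", "class", "public", "private", "static", "new");

    // A token is an identifier/keyword, a number, or any other single non-space character.
    private static final Pattern TOKEN =
            Pattern.compile("[A-Za-z_][A-Za-z0-9_]*|\\d+(?:\\.\\d+)?|\\S");

    // Drop /* block */ and // line comments. Simplification: comment markers
    // inside string literals are not special-cased.
    static String stripComments(String src) {
        return src.replaceAll("(?s)/\\*.*?\\*/", " ").replaceAll("//[^\\n]*", " ");
    }

    // Type 1 view: the token sequence alone, so whitespace and comments no longer matter.
    static List<String> type1Tokens(String src) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(stripComments(src));
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    // Type 2 view: additionally replace every non-keyword identifier with "ID".
    static List<String> type2Tokens(String src) {
        List<String> tokens = new ArrayList<>();
        for (String t : type1Tokens(src)) {
            char first = t.charAt(0);
            boolean isIdentifier = Character.isLetter(first) || first == '_';
            tokens.add(isIdentifier && !KEYWORDS.contains(t) ? "ID" : t);
        }
        return tokens;
    }
}
```

For example, `int sum(int a, int b) { return a + b; }` and `int total(int x, int y) { return x + y; }` differ in their Type 1 views but share the same Type 2 view. Type 3 clones need a similarity measure rather than exact matching.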

Page 56: Clone Research

• Multi-language clone detection
◦ Cutting edge of research
• Preliminary research
◦ Dr. Kraft and students at UAB
    C# and VB
    Publication: Nicholas A. Kraft, Brandon W. Bonds, Randy K. Smith: Cross-language Clone Detection. SEKE 2008: 54-59
◦ Utilizes Mono parsers for C# and VB

Page 57: Dr. Kraft Clone Analysis

• Performs comparisons of code files
• For each file, a CodeDOM tree is tokenized
• Uses a Levenshtein distance calculation
◦ The minimum number of edits (insertions, deletions, or substitutions) needed to transform one sequence into the other
• Distances are calculated
◦ The distance determines the probability of a clone
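The Levenshtein calculation can be applied to token sequences rather than characters. Below is a minimal Java sketch of the standard dynamic-programming algorithm; `similarity` divides the distance by the longer sequence length to get a 0..1 score, which is one common normalization and an assumption here, not necessarily the one Dr. Kraft's tool uses.

```java
import java.util.List;

class TokenLevenshtein {
    // Classic dynamic-programming edit distance over token sequences: the minimum
    // number of insertions, deletions, or substitutions that turns a into b.
    // Only two rows of the table are kept, so memory is O(|b|).
    static int distance(List<String> a, List<String> b) {
        int[] prev = new int[b.size() + 1];
        int[] curr = new int[b.size() + 1];
        for (int j = 0; j <= b.size(); j++) {
            prev[j] = j; // turning an empty prefix into b's first j tokens
        }
        for (int i = 1; i <= a.size(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.size(); j++) {
                int cost = a.get(i - 1).equals(b.get(j - 1)) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.size()];
    }

    // One common way to turn a distance into a 0..1 similarity score (an assumption here).
    static double similarity(List<String> a, List<String> b) {
        int longest = Math.max(a.size(), b.size());
        return longest == 0 ? 1.0 : 1.0 - (double) distance(a, b) / longest;
    }
}
```

Each call costs O(n·m) time for sequences of n and m tokens, and comparing every pair of files multiplies that by the number of pairs, which is why a brute-force approach scales poorly on large code bases.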

Page 58: Dr. Kraft Application

Page 59: Limitations

• Only does file-to-file comparisons
◦ Does not detect clones within the same source file
• Can only detect Type 1 and some Type 2 clones
• Not very efficient (brute force)

Page 60: Enhancements

• Add support for same-file clone detection
• Add support for Type 3 clone detection
◦ Requires more research
• Provide a more efficient clone analysis algorithm
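One possible direction for both the same-file and the efficiency enhancements is to index fixed-size windows of normalized tokens instead of comparing files pairwise. The Java sketch below is an illustrative assumption, not a planned design: `WindowIndex`, `Location`, and the window size are hypothetical. Because the map is keyed by window content, repeated windows are found whether they sit in different files or in the same file.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Where a window of tokens starts: file name plus token offset.
record Location(String file, int startToken) {}

class WindowIndex {
    // Index every run of `windowSize` consecutive tokens in every file and keep only
    // windows seen at two or more locations. These are candidate clones; no
    // file-against-file comparison is needed, and same-file repeats are included.
    static Map<String, List<Location>> candidates(Map<String, List<String>> tokensByFile, int windowSize) {
        Map<String, List<Location>> index = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> e : tokensByFile.entrySet()) {
            List<String> toks = e.getValue();
            for (int i = 0; i + windowSize <= toks.size(); i++) {
                // Tokens never contain spaces, so a space-joined key is unambiguous.
                String key = String.join(" ", toks.subList(i, i + windowSize));
                index.computeIfAbsent(key, k -> new ArrayList<>()).add(new Location(e.getKey(), i));
            }
        }
        index.values().removeIf(locs -> locs.size() < 2);
        return index;
    }
}
```

Building the index does work proportional to the total number of tokens times the window size, instead of one edit-distance run per file pair. It only finds exact matches of normalized windows, so it would complement rather than replace distance-based scoring for Type 3 clones.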

Page 61: Testing

White Box & Black Box Testing

Page 62: Testing Our Project

• White box testing:
◦ Unit testing
• Black box testing:
◦ Production rule testing
    Allows us to test the robustness of our engine, because we can force production rule errors.
    Regression testing
    Automated
◦ Functional testing

Page 63: Unit Testing

Page 64: Production Rule Test Input File Example

Page 65: Functional Tests

Page 66: Metrics

Project Metrics

Page 67: SLOC For Our Project

As of Nov 8, 2010, SLOC:
◦ CS666_Client = 553 lines
◦ CS666_Core = 114 lines
◦ CS666_CppParser = 117 lines
◦ CS666_CsParser = 1678 lines
◦ CS666_JavaParser = 3350 lines
◦ CS666_LanguageSupport = 48 lines
◦ CS666_UnitTests = 3384 lines

Total = 9244 lines (including unit tests)
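As a quick check of the arithmetic, the per-project counts add up to the stated total, and subtracting the unit tests gives the size of the non-test code. A small Java sketch with the figures copied from the slide:

```java
import java.util.Map;

class SlocTotals {
    // Per-project SLOC counts as listed on the slide (as of Nov 8, 2010).
    static final Map<String, Integer> SLOC = Map.of(
            "CS666_Client", 553,
            "CS666_Core", 114,
            "CS666_CppParser", 117,
            "CS666_CsParser", 1678,
            "CS666_JavaParser", 3350,
            "CS666_LanguageSupport", 48,
            "CS666_UnitTests", 3384);

    // Sum of all projects, unit tests included.
    static int total() {
        return SLOC.values().stream().mapToInt(Integer::intValue).sum();
    }

    // Everything except the unit-test project.
    static int totalExcludingTests() {
        return total() - SLOC.get("CS666_UnitTests");
    }
}
```

Non-test code comes to 5860 lines, so the unit tests account for a little over a third of the 9244-line total.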

Page 68: Demonstration

Demonstration of our progress.

Page 69: Demonstration

These are the things we would like to show you today:
◦ GUI work
◦ Project setup
    Save project
    Load project
◦ Loading of source code
◦ Parsing of source code
◦ Translation of source code

Page 70: Team Collaboration

Team 2 & Team 3

Page 71: Team Collaboration

• Due to Team 3’s team size, we have taken responsibility for gathering and sharing grammars.
• Team 3 has responsibility for C++ parsing.
• Both teams will:
◦ Use the same grammars and engines
    We will both have limitations based on this. Example: the Java grammar is based on Java 1.4, so we are limited to Java 1.4.
◦ Test the same grammars and engines
    We will have two test beds.

Page 72: Team Collaboration

• Both teams met Monday (11-8-10) after class and performed the required pair programming.
• Current status:
◦ Team 2
    All project source code has been made available.
    We are researching and working to update the Java and C# grammars.
◦ Team 3
    Team 3 is working on C++ parsing.
    Looking into another parser, ELSA.

Page 73: Path Forward

Current Status & Path Forward for Next Semester

Page 74: Where we stand…

• Iteration 1: Parsing -> 85%
◦ Completed parsing for Java & C#
◦ No parsing for C++, but we have a foundation and design to start from.
• Iteration 2: Translation to CodeDOM -> 60%
◦ We have the foundation and design completed.
◦ Now it is a matter of turning the crank for the languages.
• Iteration 3: Clone Analysis -> 30%
◦ Ported the majority of Dr. Kraft’s student project code.
◦ Started focusing on the GUI.

Page 75: Task Understanding

Three-step process:
• Step 1: Code Translation
• Step 2: Clone Detection
• Step 3: Visualization

Diagram: Source Files -> Translator -> Common Model -> Inspector -> Detected Clones -> UI -> Clone Visualization

Page 76: Schedule

Page 77: Path Forward

• Our next step is to re-evaluate where we currently stand.
◦ Revisit the Release Plan
    Pull in Software Studio I work that was not completed.
◦ Revisit the user stories
◦ Start off strong with the unit tests that were not completed.