Presentation 7 Summary
Cross Language Clone Analysis
Team 2
November 22, 2010
Agenda
• Feasibility Study
• Release Plan
• Architecture
• Parsing
• CodeDOM
• Clone Analysis
• Testing
• Demonstration
• Team Collaboration
• Path Forward
2
Our Team
• Allen Tucker
• Patricia Bradford
• Greg Rodgers
• Brian Bentley
• Ashley Chafin
3
Feasibility Study
Our evaluation of the project to determine the difficulty in carrying out the task.
4
Task Summary
• Our Customers: Dr. Etzkorn and Dr. Kraft
• Customer Request:
◦ A tool that will abstract programs in C++, C#, Java, and (Python or VB) to the Dagstuhl Middle Metamodel, Microsoft CodeDOM, or something similar, and detect cross-language clones.
• Areas to Note:
◦ the user interface
◦ easy comparisons of clones
◦ visualization of clones
◦ sub-clones
◦ clone detection for large bodies of code
5
Task Summary (cont.)
• To find clones across different programming languages, we first have to convert the code from each language to a language-independent object model.
• Some language-independent object models:
◦ Dagstuhl Middle Metamodel (DMM)
◦ Microsoft CodeDOM
• Both of these models represent the structure of source code independently of the source language.
6
Task Understanding
Three Step Process
• Step 1: Code Translation (Source Files -> Translator -> Common Model)
• Step 2: Clone Detection (Common Model -> Inspector -> Detected Clones)
• Step 3: Visualization (Detected Clones -> UI -> Clone Visualization)
7
Benefits
• Fact: Modularity is a key characteristic in today’s software world.
• Why? It allows us to decompose software along a separation of concerns.
◦ Contributes to maintainability, reusability, testability, and reliability
• Clone detection allows us to find common code spread across large bodies of code.
◦ Identifies code that is a candidate for further modularization
8
Features
• Clone Detection Software Suite
◦ Identifies, tracks, and manages software clones
• Multi-language support
◦ C++
◦ C#
◦ Java
9
Features (cont.)
• Provides complete code coverage
• Multi-application support
◦ Stand-alone
◦ Plug-in based (Eclipse)
◦ Backend service (Ant task)
• Extendible
◦ Built on a plug-in framework
◦ Add new languages
• Easy to navigate between clones
• Persists clones for easy retrieval
10
Risk Analysis
• The complexity of the problem proves greater than initial estimates.
• The technology to be applied is not well established or has yet to be developed.
• We are unable to complete the defined project scope within the schedule.
• Volatile user requirements lead to redefinition of project objectives.
11
Release Plan
Release Plan and User Stories
12
Re-tooled User Stories
• Came out with the original Release Plan on 9/15/20
• Due to customer wants/needs, we had to re-tool our user stories.
• Dr. Etzkorn’s main concerns:
◦ Load source code and translate it to a language-independent model
◦ Analyze the translated source code for clones
• Results from meeting:
◦ Created two new user stories (see next two slides)
◦ These two user stories have been pushed to the front of our card stack
13
CS 666 Studio I User Stories
Phase I
Source Code Load & Translate
Story ID: 017
Priority: 1
Estimate: 14 Days
As an analyst I want to load and translate my source code projects so I can analyze the source for clones.
15
Source Code Analyze
Story ID: 018
Priority: 1
Estimate: 14 Days
As an analyst I want to analyze my source code projects so I can see the clones.
16
Code Clone Highlights
Story ID: 002
Priority: 1
Estimate: 14 Days
As an analyst I want the capability to have the source code associated with clones highlighted within source files so that they are easy to identify.
17
Current Tasks
Requirements & Models
18
Current Tasks’ Requirements
• Requirements modeling for the first user story, “Source Code Load & Translate”:
◦ Load & parse C#, Java, C++ source code.
◦ Translate the parsed C#, Java, C++ source code to CodeDOM.
◦ Associate the CodeDOM with the original source code.
• Requirements modeling for the second user story, “Source Code Analyze”:
◦ Analyze CodeDOM for clones.
19
UML Model – Load & Parse
20
UML Model – Translate
21
UML Model – Associate
22
UML Model – Analyze
23
Architecture
Design and Architecture
24
Key Architecture Points
• Multi-language support
• Configurable for different platforms
◦ Stand-alone application
◦ Plug-in
◦ Backend service
• Extendable
25
Architecture
[Architecture diagram. Language services: C# Service, Java Service, C++ Service. Core: Code Model, Clone Detection Algorithms, API, Language Support (Interface). Front ends: Application User Interface, Service, Eclipse Plug-in, Web Interface, etc.]
26
Core Unit
• Code Model
◦ Stores the code in a common format
• Application Programming Interface
◦ Used to embed clone detection in applications
• Language Service Interface
◦ Communication layer between the core and the specific language services
[Diagram labels: Core (Code Model, Clone Detection Algorithms), API, Language Service Interface]
27
App Configuration
28
CRC Card Sampling
Class Responsibility Collaboration Cards
29
Java Parser CRC
Class: Java Parser
Responsibilities:
◦ Parse Java source code
◦ Construct Java token tree
Collaborators:
◦ LALRParser (GOLD Parser)
30
C# Parser CRC
Class: Parser
Responsibilities:
◦ Parse C# source code
◦ Construct C# token tree
Collaborators:
◦ LALRParser (GOLD Parser)
31
Language Service CRC
Class: LanguageService
Responsibilities:
◦ Defines standard interface for all language providers
Collaborators:
◦ ILanguageService
32
Java Service CRC
Class: JavaService
Responsibilities:
◦ Reads Java source code
◦ Understands Java grammar production rules
◦ Constructs CodeDOM compilation unit
Collaborators:
◦ Java Parser
◦ CloneDetection
◦ JavaCodeProvider
◦ ILanguageService
33
Cs Service CRC
Class: CsService
Responsibilities:
◦ Reads C# source code
◦ Understands C# grammar production rules
◦ Constructs CodeDOM compilation unit
Collaborators:
◦ C# Parser
◦ CloneDetection
◦ CsCodeProvider
◦ ILanguageService
34
CloneDetection CRC
Class: CloneDetection
Responsibilities:
◦ Loads and manages language services
◦ Controls parsing
◦ Establishes associations between CodeDOM compilation units and source code files
◦ Compares code segments
◦ Provides bookkeeping for code segments
Collaborators:
◦ ILanguageService
◦ CodeDomComparer
◦ CodeDomSummary
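The CRC cards describe a core class that loads language services and routes source files to them through one shared interface. A minimal Python sketch of that idea follows. It is an illustration only: the project itself targets .NET CodeDOM, and every name here (`LanguageService`, `register`, `service_for`, `parse_file`, the extension lookup, the placeholder return values) is an assumption made for this example, not the team's API.

```python
from abc import ABC, abstractmethod
from pathlib import Path


class LanguageService(ABC):
    """Hypothetical Python analogue of the ILanguageService role."""

    # File extensions this service claims, e.g. (".java",).
    extensions = ()

    @abstractmethod
    def parse(self, source: str) -> list:
        """Turn source text into some language-neutral representation."""


class JavaService(LanguageService):
    extensions = (".java",)

    def parse(self, source: str) -> list:
        # Placeholder: a real service would drive the parser engine here.
        return ["java-unit", len(source.splitlines())]


class CsService(LanguageService):
    extensions = (".cs",)

    def parse(self, source: str) -> list:
        # Placeholder, same caveat as JavaService.parse.
        return ["cs-unit", len(source.splitlines())]


class CloneDetection:
    """Loads language services and routes each file to the right one."""

    def __init__(self):
        self._by_extension = {}

    def register(self, service: LanguageService) -> None:
        for ext in service.extensions:
            self._by_extension[ext.lower()] = service

    def service_for(self, filename: str) -> LanguageService:
        ext = Path(filename).suffix.lower()
        try:
            return self._by_extension[ext]
        except KeyError:
            raise LookupError(f"no language service registered for {ext!r}")

    def parse_file(self, filename: str, source: str) -> list:
        return self.service_for(filename).parse(source)
```

Supporting another language in this sketch means writing one more `LanguageService` subclass and registering it; the core class does not change.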
35
Parsing
Our struggles and our successes.
36
Parsing Struggles & Successes
• We explored and conducted spikes on CSParser and CS CodeDOM Parser.
◦ Both had advantages and disadvantages.
◦ We concluded that neither of them was going to fit our needs.
• We explored and conducted a spike on GOLD Parser.
◦ We ultimately chose the GOLD Parser because it best fit our needs.
◦ This gave us a way to manage multiple language grammars with one engine.
37
GOLD Parsing System
GOLD Parsing Populating CodeDOM
38
How It Works (Block Structure)
[Block diagram: the Builder compiles a Grammar into a Compiled Grammar Table (*.cgt); the Engine loads the table and turns Source Code into Parsed Data.]
39
How It Works (Process)
[Process diagram: Grammar -> Builder -> Compiled Grammar Table (*.cgt) -> Engine; Source Code -> Engine -> Parsed Data]
• Typical output from engine: a long nested tree
40
Usage within CloneDigger
[Diagram labels: Compiled Grammar Table (*.cgt), Engine, Source Code, Parsed Data, CodeDOM Conversion, AST]
CodeDOM Conversion
• Need to write a routine to move data from the parsed tree to CodeDOM
• Parsed data trees from the parser are stored in a consistent data structure, but are based on rules defined within the grammars
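Because the parsed tree has one consistent node structure but its shape follows the grammar's rules, one plausible way to write the conversion routine is a table of handlers keyed by production rule. The Python sketch below illustrates that approach with stand-in types. `ParseNode`, `ModelNode`, and the rule names `ClassDecl`, `MethodDecl`, `MemberList`, and `Identifier` are all hypothetical; none of them come from the GOLD engine or from System.CodeDom.

```python
from dataclasses import dataclass, field


@dataclass
class ParseNode:
    rule: str                 # production rule (or terminal) that produced the node
    text: str = ""            # token text, used by leaf nodes
    children: list = field(default_factory=list)


@dataclass
class ModelNode:
    kind: str                 # e.g. "type" or "method" in the common model
    name: str = ""
    children: list = field(default_factory=list)


HANDLERS = {}                 # rule name -> function(ParseNode) -> ModelNode


def handles(rule):
    """Register the decorated function as the handler for one rule."""
    def decorator(func):
        HANDLERS[rule] = func
        return func
    return decorator


def convert(node):
    """Return the list of model nodes produced by a parse node."""
    handler = HANDLERS.get(node.rule)
    if handler is not None:
        return [handler(node)]
    # No handler for this rule: look through it and convert its children.
    result = []
    for child in node.children:
        result.extend(convert(child))
    return result


@handles("ClassDecl")
def convert_class(node):
    members = []
    for child in node.children[1:]:
        members.extend(convert(child))
    return ModelNode("type", node.children[0].text, members)


@handles("MethodDecl")
def convert_method(node):
    return ModelNode("method", node.children[0].text)
```

Rules without a handler are passed through to their children, so wrapper rules need no code of their own and coverage can grow one rule at a time.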
41
Grammar Updates
Bookkeeping for parsing the multiple grammars.
42
Grammar Updates
• Currently the grammars we have for the GOLD Parser are outdated.
• Current GOLD grammars
◦ C# version 2.0
◦ Java version 1.4
• Currently available software versions
◦ C# version 4.0
◦ Java version 6
43
Grammar Update Issues
• Grammars for C# and Java are very complex and require a lot of work to build.
• ANTLR and GOLD Parser grammars use completely different syntax.
• Positive note: Other development is not halted by the use of older grammars.
44
Our Bookkeeping
Bookkeeping for parsing the multiple grammars
45
Compiled Grammar Table
• For Java, there are…
◦ 359 production rules
◦ 249 distinct symbols (terminal & non-terminal)
• For C#, there are…
◦ 415 production rules
◦ 279 distinct symbols (terminal & non-terminal)
46
Production Rule Dependencies
47
Our Grammar Bookkeeping
• Since there are so many production rules, we came up with the following bookkeeping:
• A spreadsheet of the compiled grammar table (for each language) with each production rule indexed.
◦ This spreadsheet covers various aspects of the language, what we have/have not handled from the parser, what we have/have not implemented into CodeDOM, and percentage complete.
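The percentage-complete columns of such a spreadsheet reduce to counting flagged rows. The Python sketch below shows the arithmetic on a four-row sample. The record layout and the sample rule texts are made up for illustration and are not taken from the team's spreadsheet or grammars.

```python
from dataclasses import dataclass


@dataclass
class RuleStatus:
    index: int          # row index of the production rule in the grammar table
    rule: str           # rule text (illustrative only)
    handled: bool       # a parse handler exists for this rule
    in_codedom: bool    # the rule is translated into the code model


def percent(rules, attribute):
    """Percentage of rules whose boolean `attribute` is set."""
    if not rules:
        return 0.0
    done = sum(1 for r in rules if getattr(r, attribute))
    return round(100.0 * done / len(rules), 1)


sample = [
    RuleStatus(0, "<CompilationUnit> ::= <TypeDecls>", True, True),
    RuleStatus(1, "<ClassDecl> ::= class Identifier <ClassBody>", True, True),
    RuleStatus(2, "<MethodDecl> ::= <MethodHeader> <Block>", True, False),
    RuleStatus(3, "<SwitchStmt> ::= switch <Expr> <SwitchBlock>", False, False),
]

print(percent(sample, "handled"))      # 75.0
print(percent(sample, "in_codedom"))   # 50.0
```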
48
Our Grammar Bookkeeping
49
Parsing & CodeDOM Status
• Parsing Handlers’ Status:
◦ C# = 100% complete
◦ Java = 100% complete
50
CodeDOM
Language Independent Object Model
51
CodeDOM
• Document Object Model for Source Code
• API - [System.CodeDom]
• Only supports certain aspects of the language since it’s language agnostic
◦ Good Enough
• What Does it Do?
◦ Programmatically Constructs Code
• What Doesn’t it Do?
◦ Does NOT parse
52
CodeDOM Example
• CodeCompileUnit
◦ CodeNamespace
    Imports
    Types
        Members
            Event
            Field
            Method
                Statements
                    Expression
            Property
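The containment hierarchy on this slide can be mirrored with a few plain record types. The Python sketch below is only an analogy for readers without the .NET API at hand: the class names `CompileUnit`, `Namespace`, `TypeDecl`, and `Member` and the `outline` helper are assumptions for this example and are not the System.CodeDom classes.

```python
from dataclasses import dataclass, field


@dataclass
class Member:
    kind: str                                        # "event", "field", "method", "property"
    name: str
    statements: list = field(default_factory=list)   # only meaningful for methods


@dataclass
class TypeDecl:
    name: str
    members: list = field(default_factory=list)


@dataclass
class Namespace:
    name: str
    imports: list = field(default_factory=list)
    types: list = field(default_factory=list)


@dataclass
class CompileUnit:
    namespaces: list = field(default_factory=list)


def outline(unit):
    """Flatten the containment tree into indented text lines."""
    lines = []
    for ns in unit.namespaces:
        lines.append(f"namespace {ns.name}")
        for imported in ns.imports:
            lines.append(f"  import {imported}")
        for type_decl in ns.types:
            lines.append(f"  type {type_decl.name}")
            for member in type_decl.members:
                lines.append(f"    {member.kind} {member.name}")
    return lines


unit = CompileUnit([
    Namespace("Demo", imports=["System"], types=[
        TypeDecl("Account", members=[
            Member("field", "balance"),
            Member("method", "Deposit", statements=["balance = balance + amount"]),
        ]),
    ]),
])
```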
53
Clone Analysis
Clones & Dr. Kraft’s Tool
54
Clone Types
• 3 Types of Clones (Definition of Similarity):
◦ Type 1: An exact copy without modifications (except for whitespace and comments)
◦ Type 2: A syntactically identical copy; only variable, type, or function identifiers have been changed
◦ Type 3: A copy with further modifications; statements have been changed, reordered, added, or removed
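The first two definitions can be made concrete with token comparison: two fragments are Type 1 clones if their tokens match once whitespace and comments are dropped, and Type 2 clones if they match once every identifier is replaced by a placeholder. The Python sketch below is a toy illustration on C-like snippets, not the project's detector. The regex tokenizer, the small `KEYWORDS` set, and the `clone_type` helper are assumptions for this example. It treats built-in type names as keywords, ignores string literals, does not check that renaming is consistent, and does not attempt Type 3.

```python
import re

# Words that are never treated as renamable identifiers (toy subset).
KEYWORDS = {"int", "return", "if", "else", "for", "while"}

# Identifier, integer literal, or any single punctuation character.
TOKEN = re.compile(r"[A-Za-z_]\w*|\d+|[^\sA-Za-z_\d]")


def strip_comments(code):
    code = re.sub(r"/\*.*?\*/", " ", code, flags=re.S)   # block comments
    return re.sub(r"//[^\n]*", " ", code)                 # line comments


def tokens(code):
    """Token list with whitespace and comments removed."""
    return TOKEN.findall(strip_comments(code))


def normalized(code):
    """Token list with every non-keyword identifier replaced by 'ID'."""
    result = []
    for tok in tokens(code):
        is_word = tok[0].isalpha() or tok[0] == "_"
        result.append("ID" if is_word and tok not in KEYWORDS else tok)
    return result


def clone_type(a, b):
    """Return 1 or 2 for Type 1 / Type 2 clones, or None otherwise."""
    if tokens(a) == tokens(b):
        return 1
    if normalized(a) == normalized(b):
        return 2
    return None


original = "int add(int x, int y) { return x + y; } // sum"
reformatted = "int add(int x,int y){\n  return x + y;\n}"
renamed = "int plus(int a, int b) { return a + b; }"
changed = "int add(int x, int y) { return x - y; }"

print(clone_type(original, reformatted))  # 1
print(clone_type(original, renamed))      # 2
print(clone_type(original, changed))      # None
```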
55
Clone Research
• Multi-Language Clone Detection
◦ Cutting edge of research
• Preliminary Research
◦ Dr. Kraft and students at UAB
◦ C# and VB
• Publication
◦ Nicholas A. Kraft, Brandon W. Bonds, Randy K. Smith: Cross-language Clone Detection. SEKE 2008: 54-59
◦ Utilizes Mono parsers (C#, VB)
56
Dr. Kraft Clone Analysis
• Performs comparisons of code files
• For each file, a CodeDOM tree is tokenized
• Uses Levenshtein distance calculation
◦ Minimum number of edits needed to transform one sequence into the other
• Distances calculated
◦ Distance determines probability of a clone
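For readers unfamiliar with the metric, here is a minimal Python sketch of Levenshtein distance over arbitrary sequences, such as lists of tokens. The slides do not say how a distance is turned into a clone probability, so the `similarity` function (one minus the distance divided by the longer length) is an assumption for illustration only, as are the sample token names.

```python
def levenshtein(a, b):
    """Minimum number of insertions, deletions, and substitutions
    needed to turn sequence a into sequence b."""
    if len(a) < len(b):
        a, b = b, a                      # keep the shorter sequence in the inner loop
    previous = list(range(len(b) + 1))   # distances from the empty prefix of a
    for i, item_a in enumerate(a, start=1):
        current = [i]
        for j, item_b in enumerate(b, start=1):
            cost = 0 if item_a == item_b else 1
            current.append(min(
                previous[j] + 1,         # delete item_a
                current[j - 1] + 1,      # insert item_b
                previous[j - 1] + cost,  # substitute, or match at no cost
            ))
        previous = current
    return previous[-1]


def similarity(a, b):
    """Assumed normalization: 1.0 for identical sequences, 0.0 for no overlap."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


left = ["method", "param", "return", "add"]
right = ["method", "param", "param", "return", "add"]
print(levenshtein(left, right))   # 1
```

The function keeps only two rows of the table, so memory grows with the shorter sequence, but the time for each pair of files is still proportional to the product of the two lengths.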
57
Dr. Kraft Application
58
Limitations
• Only does file-to-file comparisons
◦ Does not detect clones in the same source file
• Can only detect Type 1 and some Type 2 clones
• Not very efficient (brute force)
59
Enhancements
• Add support for same-file clone detection
• Add support for Type 3 clone detection
◦ Requires more research
• Provide a more efficient clone analysis algorithm
60
Testing
White Box & Black Box Testing
61
Testing Our Project
• White Box Testing:
◦ Unit Testing
• Black Box Testing:
◦ Production Rule Testing: automated regression testing that lets us test the robustness of our engine, because we can force rule production errors.
◦ Functional Testing
62
Unit Testing
63
Production Rule Test Input File Example
64
Functional Tests
65
Metrics
Project Metrics
66
SLOC For Our Project
• As of Nov 8, 2010
• SLOC:
◦ CS666_Client = 553 lines
◦ CS666_Core = 114 lines
◦ CS666_CppParser = 117 lines
◦ CS666_CsParser = 1678 lines
◦ CS666_JavaParser = 3350 lines
◦ CS666_LanguageSupport = 48 lines
◦ CS666_UnitTests = 3384 lines
• Total = 9244 lines (including unit tests)
67
Demonstration
Demonstration of our progress.
68
Demonstration
• These are the things we would like to show you today:
◦ GUI work
◦ Project setup: save project, load project
◦ Loading of source code
◦ Parsing of source code
◦ Translation of source code
69
Team Collaboration
Team 2 & Team 3
70
Team Collaboration
• Due to Team 3’s team size, we have taken responsibility for gathering & sharing grammars.
• Team 3 has responsibility for the C++ parsing.
• Both teams will…
◦ Use the same grammars & engines. We will both have limitations based on this. Ex: the Java grammar is based on 1.4 -> we are limited to using Java 1.4.
◦ Test the same grammars & engines. We will have two test beds.
71
Team Collaboration
• Both teams met Monday (11-8-10) after class and performed the required Pair Programming.
• Current Status:
◦ Team 2: All project source code has been made available. We are researching and working to update the Java and C# grammars.
◦ Team 3: Team 3 is working on C++ parsing and looking into another parser, ELSA.
72
Path Forward
Current Status & Path Forward for Next Semester
73
Where we stand…
• Iteration 1: Parsing -> 85%
◦ Completed parsing for Java & C#
◦ No parsing for C++, but we have a foundation and design to start from.
• Iteration 2: Translation to CodeDOM -> 60%
◦ We have the foundation and design completed.
◦ Now, it is a matter of turning the crank for the languages.
• Iteration 3: Clone Analysis -> 30%
◦ Ported the majority of Dr. Kraft’s student project code.
◦ Started focusing on the GUI
74
Task Understanding
Three Step Process
• Step 1: Code Translation (Source Files -> Translator -> Common Model)
• Step 2: Clone Detection (Common Model -> Inspector -> Detected Clones)
• Step 3: Visualization (Detected Clones -> UI -> Clone Visualization)
75
Schedule
76
Path Forward
• Our next step is to re-evaluate where we currently stand.
◦ Revisit Release Plan: pull in Software Studio I work that was not completed.
◦ Revisit User Stories
◦ Start off strong with the unit tests that were not completed.
77