Investigating a Semantic Metrics Suite for Object-Oriented Design
Dr. Letha Etzkorn (PI), Ms. Cara Stein, Dr. Glenn Cox, Dr. Sampson Gholston, Dr. Dawn Utley, Dr. Phil Farrington
The University of Alabama in Huntsville
A Semantic Metrics Suite for Object-Oriented Design
Standard software metrics have some problems!
They are implementation-dependent, since they are calculated strictly from the code.
They count code items; it is sometimes arguable whether the items counted accurately reflect the qualities the metrics are supposed to measure. Does the number of lines of code always reflect complexity?
Segment #1 and Segment #2 do the same thing but have very different LOC metrics:
Code Segment #1:
    test[--cnt]->test = val1[input_count++].counter + val2[tmp_count--]->mycount;
Code Segment #2:
    temp = val1[input_count].counter + val2[tmp_count]->mycount;
    input_count++;
    tmp_count--;
    --cnt;
    test[cnt]->test = temp;
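Because the comparison rests on the two segments really being equivalent, one way to check is to wrap each in a function and run both on the same data. This is a minimal sketch: the types Counter and Node, the Result struct, and the function names segment1 and segment2 are hypothetical, chosen only so that the slide's fragments compile; the statement bodies are the two segments as given.

```cpp
// Hypothetical types chosen only so that both slide fragments compile.
struct Counter { int counter; };
struct Node { int mycount; int test; };

// Final values of the three counters plus the value written through test[].
struct Result { int input_count; int tmp_count; int cnt; int written; };

// Code Segment #1: one statement.
Result segment1(Counter* val1, Node** val2, Node** test,
                int input_count, int tmp_count, int cnt) {
    test[--cnt]->test = val1[input_count++].counter + val2[tmp_count--]->mycount;
    return {input_count, tmp_count, cnt, test[cnt]->test};
}

// Code Segment #2: the same computation spread over five statements.
Result segment2(Counter* val1, Node** val2, Node** test,
                int input_count, int tmp_count, int cnt) {
    int temp = val1[input_count].counter + val2[tmp_count]->mycount;
    input_count++;
    tmp_count--;
    --cnt;
    test[cnt]->test = temp;
    return {input_count, tmp_count, cnt, test[cnt]->test};
}
```

Run on identical inputs, both functions leave the counters and the written value the same, yet a physical line count scores Segment #2 at five lines and Segment #1 at one.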
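Object-oriented software metrics have similar problems:
Including constructors or destructors in the calculation of the Lack of Cohesion in Methods (LCOM) metric can cause the metric to fail: these functions typically access most of a class's attributes, so they can make a class appear more cohesive than its other methods really are.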
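The effect can be seen with the Chidamber and Kemerer form of LCOM, in which P counts method pairs that share no attribute, Q counts pairs that share at least one, and LCOM = P - Q when P > Q (otherwise 0). The slide does not say which LCOM variant it means, so this choice is an assumption; representing each method only by its set of used attributes, and the function names, are hypothetical.

```cpp
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Each method is represented only by the set of instance attributes it uses.
using AttributeSet = std::set<std::string>;

// True if the two methods use at least one attribute in common.
bool sharesAttribute(const AttributeSet& a, const AttributeSet& b) {
    for (const auto& attr : a)
        if (b.count(attr) != 0) return true;
    return false;
}

// Chidamber-Kemerer style LCOM: P = pairs sharing no attribute,
// Q = pairs sharing at least one; result is P - Q if positive, else 0.
int lcom(const std::vector<AttributeSet>& methods) {
    int p = 0;
    int q = 0;
    for (std::size_t i = 0; i < methods.size(); ++i) {
        for (std::size_t j = i + 1; j < methods.size(); ++j) {
            if (sharesAttribute(methods[i], methods[j])) ++q;
            else ++p;
        }
    }
    return p > q ? p - q : 0;
}
```

For three methods that each use a different attribute, lcom returns 3. Adding a constructor that initializes all three attributes adds three sharing pairs and drives the value to 0, so the class looks perfectly cohesive even though its ordinary methods have nothing in common.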
These problems with software metrics all arise because the metrics are calculated from syntactic aspects of the code.
The solution is to define metrics based on semantic aspects of the code ("what the code means": the code design rather than the code implementation). This is where program understanding comes in!
Program understanding includes any activity that uses dynamic or static methods to reveal program properties.
Three kinds of program understanding approaches:
Algorithmic approaches: annotate programs with formal specifications
Knowledge-based approaches: a knowledge-base is mapped to program concepts
Graph parsing approaches: the program is turned into a flow graph, then matched against a knowledge-base of flow graphs
Two kinds of knowledge-based program understanding approaches:
Look at code only: Kozaczynski, Ning and Engberts; Harandi and Ning
Informal tokens only: Biggerstaff, Mitbander and Webster; Etzkorn and Davis
Etzkorn informal tokens approach:
Used natural language processing and information extraction techniques
Used informal tokens: comments and identifier names
Used a knowledge-base consisting of a hierarchical semantic network
Originally implemented in the PATRicia system (Program Analysis Tool for Reuse)
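As an illustration of how identifier names can serve as informal tokens, the sketch below splits an identifier into lower-case candidate keywords. It is an assumption for illustration only: the function splitIdentifier and its splitting rules are hypothetical, not taken from PATRicia. It breaks on underscores and other non-alphanumeric characters, and on lower-to-upper case transitions.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: splits an identifier into lower-case candidate keywords.
// Breaks on non-alphanumeric characters (e.g. '_') and on lower-to-upper case
// transitions, so "wxbItem" -> {"wxb", "item"}.
std::vector<std::string> splitIdentifier(const std::string& id) {
    std::vector<std::string> words;
    std::string current;
    auto flush = [&]() {
        if (!current.empty()) {
            words.push_back(current);
            current.clear();
        }
    };
    for (std::size_t i = 0; i < id.size(); ++i) {
        unsigned char c = static_cast<unsigned char>(id[i]);
        if (!std::isalnum(c)) {  // separator such as '_'
            flush();
            continue;
        }
        if (std::isupper(c) && i > 0 &&
            std::islower(static_cast<unsigned char>(id[i - 1]))) {
            flush();  // camelCase boundary
        }
        current.push_back(static_cast<char>(std::tolower(c)));
    }
    flush();
    return words;
}
```

A name such as mycount has no separator or case boundary and stays a single token, so a splitter like this would likely need to be paired with knowledge-base lookups to recover every keyword.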
The original purpose of the PATRicia system was to identify reusable components in existing object-oriented software.
This involved two questions:
Was the code useful in an area of interest?
Was the code “good enough” to use?
To answer: Was the code useful in an area of interest?
Developed the Etzkorn informal tokens program understanding approach
To answer: Was the code “good enough” to use?
Used object-oriented software metrics
The PATRicia system produced:
Two reports from the informal tokens section:
  A list of areas, with definitions, covered by a class or class hierarchy
  A description of the operation of a class in natural language
One report from the metrics section:
  Values of various OO metrics for classes and class hierarchies
Description of Operation Report:
Class wxbItem:
-- minimizes a window. HINT: A button that can be described by a color descriptor and a left descriptor can minimize a window.
-- focuses an <object> HINT: It is possible to focus an area.
-- tracks a mouse. HINT: It is possible to track a mouse that can own a button.
The effectiveness of the PATRicia system informal tokens approach has been demonstrated for GUI packages and mathematical software, using the information extraction-based metrics recall, precision, and overgeneration.
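The three information extraction metrics can be computed from counts obtained by comparing the system's output with a hand-built answer key. This sketch assumes the common MUC-style (Message Understanding Conference) definitions without partial credit: recall = correct / possible, precision = correct / actual, and overgeneration = spurious / actual, where actual is everything the system produced. The struct and function names are hypothetical.

```cpp
// Hypothetical tallies from scoring system output against an answer key.
struct ExtractionCounts {
    int correct;    // produced items that match the key
    int incorrect;  // produced items that fill a key slot with the wrong value
    int spurious;   // produced items with no counterpart in the key
    int possible;   // total items in the answer key
};

// Everything the system produced (partial matches are ignored in this sketch).
int actual(const ExtractionCounts& c) {
    return c.correct + c.incorrect + c.spurious;
}

double recall(const ExtractionCounts& c) {
    return c.possible == 0 ? 0.0 : static_cast<double>(c.correct) / c.possible;
}

double precision(const ExtractionCounts& c) {
    int a = actual(c);
    return a == 0 ? 0.0 : static_cast<double>(c.correct) / a;
}

double overgeneration(const ExtractionCounts& c) {
    int a = actual(c);
    return a == 0 ? 0.0 : static_cast<double>(c.spurious) / a;
}
```

For example, if the answer key lists 12 items and the system reports 8, of which 6 are correct and 2 are spurious, recall is 0.5, precision is 0.75, and overgeneration is 0.25.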
We have published 17 refereed articles to date from research related to the PATRicia system, in:
Knowledge-based journals and conferences
Natural language journals
Software metrics/software engineering journals
Using the PATRicia system, we:
Noticed various problems with traditional software metrics
Came to realize that knowledge-based program understanding could be used to develop new metrics independent of the syntax of the code
The knowledge-base in the PATRicia system is:
A weighted, hierarchical semantic network
At the lowest level, based on conceptual graphs
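The slides do not spell out how evidence flows through the weighted hierarchy, so the following is only a hypothetical sketch of one way a weighted, hierarchical network could infer concepts: a node fires when the weights of its incoming links from already-fired nodes sum to at least a threshold, and firing repeats until nothing changes, so keyword evidence propagates upward. The Link struct, the infer function, the threshold rule, and the example node names are all assumptions, not the PATRicia scheme.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical weighted link from a lower-level node to a higher-level node.
struct Link {
    std::string from;
    std::string to;
    double weight;
};

// Hypothetical inference rule: a node fires once the weights of its incoming
// links from already-fired nodes sum to at least 'threshold'. Passes repeat
// until no new node fires; nodes are only ever added, so the loop terminates.
std::set<std::string> infer(const std::set<std::string>& keywords,
                            const std::vector<Link>& links, double threshold) {
    std::set<std::string> fired = keywords;
    bool changed = true;
    while (changed) {
        changed = false;
        std::map<std::string, double> evidence;  // summed weight per target node
        for (const Link& l : links)
            if (fired.count(l.from) != 0) evidence[l.to] += l.weight;
        for (const auto& [node, total] : evidence)
            if (total >= threshold && fired.insert(node).second) changed = true;
    }
    return fired;
}
```

With links insert -> INSERT and text -> TEXT of weight 1.0, plus INSERT -> EDIT-TEXT and TEXT -> EDIT-TEXT of weight 0.5 each, the keywords insert and text fire INSERT and TEXT on the first pass and the higher-level EDIT-TEXT on the second, at a threshold of 1.0.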
What is a conceptual graph?
A knowledge representation technique
Often used in natural language understanding, especially in natural language generation
A conceptual graph example ("a cat is sitting on a mat"):
[CAT] -> (STAT) -> [SIT] -> (LOC) -> [MAT]
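A conceptual graph is bipartite: concept nodes connect only through conceptual relation nodes. A minimal way to hold the cat-and-mat example in code is a list of concepts plus a list of directed relation arcs between them. The types and the linearForm function are hypothetical, and the sketch covers only simple relations with one source concept and one target concept.

```cpp
#include <string>
#include <vector>

// Hypothetical encoding: each conceptual relation is a directed arc
// from one concept (by index) to another.
struct Relation {
    std::string name;  // e.g. "STAT", "LOC"
    int from;          // index of the source concept
    int to;            // index of the target concept
};

struct ConceptualGraph {
    std::vector<std::string> concepts;  // concept nodes, e.g. "CAT"
    std::vector<Relation> relations;    // relation nodes with their arcs
};

// Renders each relation in the linear form [A] -> (REL) -> [B].
std::vector<std::string> linearForm(const ConceptualGraph& g) {
    std::vector<std::string> out;
    for (const Relation& r : g.relations)
        out.push_back("[" + g.concepts[r.from] + "] -> (" + r.name + ") -> [" +
                      g.concepts[r.to] + "]");
    return out;
}
```

Built from concepts CAT, SIT, MAT and relations STAT (CAT to SIT) and LOC (SIT to MAT), linearForm yields "[CAT] -> (STAT) -> [SIT]" and "[SIT] -> (LOC) -> [MAT]".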
Figure (layers of the knowledge-base): an interface layer of keywords tagged with their part of speech (noun, adjective, verb, etc.), connected by "infer" links to a layer of conceptual graphs and the concepts in them (example labels: INSERT, OBJ, TEXT, LOC, CURSOR).
The metrics in the semantic metrics suite are defined in terms of a conceptual graph-based knowledge-base.
The PATRicia system knowledge-base is conceptual graph-based; however, it includes an inferencing scheme that is outside the conceptual graph definition.
Semantic complexity metrics measure domain complexity rather than implementation complexity:
Semantic Class Definition Entropy (SCDE)
Class Domain Complexity (CDC)
Relative Class Domain Complexity (RCDC)
Key Class Identification (KCI)
Class Interface Complexity
Class Domain Complexity (CDC):
$$CDC = \sum_{i=1}^{m} |\text{concept}_i + \text{conceptual relations}_i| \times \text{weight}_i$$
where i ranges over the m concepts recognized by the class, and |concept_i + conceptual relations_i| = 1 + the number of conceptual relations linking concept i to another concept recognized by the class. Concepts linking to concepts in another class are not included in the count. Only outgoing conceptual relations are counted (to avoid counting the same conceptual relation twice).
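The CDC sum can be sketched as follows, assuming each recognized concept arrives with its knowledge-base weight and a list of its outgoing conceptual relations, each flagged by whether its target concept is recognized in the same class. These types and the function name are hypothetical.

```cpp
#include <vector>

// Hypothetical: one outgoing conceptual relation of a recognized concept.
struct OutgoingRelation {
    bool targetInSameClass;  // false if the target concept belongs to another class
};

// Hypothetical: a concept recognized in the class, with its weight.
struct RecognizedConcept {
    double weight;
    std::vector<OutgoingRelation> outgoing;  // only outgoing relations are listed
};

// CDC = sum over concepts of (1 + same-class outgoing relations) * weight.
double classDomainComplexity(const std::vector<RecognizedConcept>& concepts) {
    double cdc = 0.0;
    for (const RecognizedConcept& c : concepts) {
        int size = 1;  // the concept itself
        for (const OutgoingRelation& r : c.outgoing)
            if (r.targetInSameClass) ++size;  // relations into other classes are skipped
        cdc += size * c.weight;
    }
    return cdc;
}
```

A concept of weight 2.0 with two outgoing relations to concepts in the same class and one to another class contributes (1 + 2) * 2.0 = 6.0; adding a weight-0.5 concept with no relations gives CDC = 6.5.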
Semantic Class Definition Entropy: based on information theory, in which entropy is a measure of the amount of information.
Validated and published in: Etzkorn, L., Gholston, S., and Hughes, W., “A Semantic Entropy Metric,” The Journal of Software Maintenance and Evolution, Vol. 14, 2002, pp. 293-310.
Semantic Class Definition Entropy (cont’d): amount of information in an alphabetic string:
$$I_i = -\log_2 P_i$$
where P_i is the probability of the i-th most frequently occurring domain-related concept or keyword:
$$P_i = \frac{f_i}{N_1}$$
f_i = number of occurrences of the i-th most frequently occurring domain-related concept or keyword
N_1 = total number of non-unique domain-related concepts or keywords
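The probability and information formulas can be applied to a list of domain-related keyword occurrences found in a class. This sketch assumes the keywords have already been extracted, with each occurrence listed once so that the list length is N_1; the KeywordInfo struct and keywordInformation function are hypothetical.

```cpp
#include <algorithm>
#include <cmath>
#include <map>
#include <string>
#include <vector>

// Hypothetical record for one unique domain-related keyword.
struct KeywordInfo {
    std::string keyword;
    int frequency;       // f_i
    double probability;  // P_i = f_i / N_1
    double bits;         // I_i = -log2(P_i)
};

// Ranks unique keywords from most to least frequent and computes P_i and I_i.
std::vector<KeywordInfo> keywordInformation(const std::vector<std::string>& occurrences) {
    std::map<std::string, int> freq;
    for (const auto& k : occurrences) ++freq[k];
    const double total = static_cast<double>(occurrences.size());  // N_1
    std::vector<KeywordInfo> info;
    for (const auto& [keyword, f] : freq) {
        double p = f / total;
        info.push_back({keyword, f, p, -std::log2(p)});
    }
    // i-th entry = i-th most frequent keyword (ties keep alphabetical order).
    std::stable_sort(info.begin(), info.end(),
                     [](const KeywordInfo& a, const KeywordInfo& b) {
                         return a.frequency > b.frequency;
                     });
    return info;
}
```

For the occurrences window, window, mouse, button (N_1 = 4), window has P = 0.5 and carries 1 bit, while mouse and button each have P = 0.25 and carry 2 bits: rarer keywords carry more information.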
Semantic Class Definition Entropy (cont’d): average amount of information contributed by each domain-related concept or keyword in a class definition:
$$H = -\sum_{i=1}^{n_1} P_i \log_2 P_i$$
n_1 = total number of unique domain-related concepts or keywords belonging to a class
P_i = f_i / N_1
Semantic Class Definition Entropy (SCDE):
$$SCDE = -\sum_{i=1}^{n_1} \frac{f_i}{N_1} \log_2 \frac{f_i}{N_1}$$
n_1 = total number of unique domain-related concepts or keywords belonging to a class
f_i = number of occurrences of the i-th most frequently occurring domain-related concept or keyword
N_1 = total number of non-unique domain-related concepts or keywords
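SCDE follows directly from the keyword frequencies. A minimal sketch, assuming the domain-related keyword occurrences for one class have already been extracted into a list (so the list length is N_1 and the number of distinct entries is n_1); the function name scde is hypothetical.

```cpp
#include <cmath>
#include <map>
#include <string>
#include <vector>

// SCDE = -sum over the n_1 unique keywords of (f_i / N_1) * log2(f_i / N_1).
double scde(const std::vector<std::string>& occurrences) {
    if (occurrences.empty()) return 0.0;
    std::map<std::string, int> freq;  // f_i for each unique keyword
    for (const auto& k : occurrences) ++freq[k];
    const double total = static_cast<double>(occurrences.size());  // N_1
    double entropy = 0.0;
    for (const auto& entry : freq) {
        double p = entry.second / total;  // f_i / N_1
        entropy -= p * std::log2(p);
    }
    return entropy;
}
```

For eight occurrences (window four times, mouse twice, button and color once each), SCDE = 0.5 * 1 + 0.25 * 2 + 2 * (0.125 * 3) = 1.75 bits; a class whose only keyword is repeated has SCDE = 0, and n_1 equally frequent keywords give the maximum, log2(n_1).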
The proposed semantic metrics were published in:
Etzkorn, Letha and Delugach, Harry, "Towards a Semantic Metrics Suite for Object-Oriented Design," Proceedings of the 34th International Conference on Technology of Object-Oriented Languages and Systems, TOOLS 34 (TOOLS USA), July 30-August 4, 2000, IEEE Computer Society Press, Los Alamitos, CA, 2000, pp. 71-80.
In this paper, the semantic metrics were validated theoretically using:
Weyuker’s criteria for metric definitions
Briand and Melo’s criteria for cohesion metrics
Under the NASA grant, we are currently extending the informal tokens portion of the PATRicia system to analyze semantic metrics.
The new tool is called SemMet.
The PATRicia system informal tokens analysis (and now SemMet) is written in:
C++
CLIPS expert system shell
Various lex parsers
We plan to validate the SemMet-based semantic metrics using:
GUI/mathematical packages data
MDP data (from Mike Chapman)
Advanced Engineering Environment from MSFC