Introduction to Measurement in Software Engineering

Alessandro Garcia

Departamento de Informática

October 2013


What is measurement?

Measurement is the process by which numbers (or symbols) are assigned to attributes of entities in the world according to clearly defined rules

A software metric defines how to measure an attribute of a software product, process or resource

Metrics should be valid theoretically and empirically

Framework for software measurement validation [Kitchenham et al., IEEE TSE, September 1995]


Basic Definitions

• Entities – Objects in the real world

– E.g. people

– Software-related entities: software products/artifacts, activities/processes and resources

• Attributes – Characteristics / features / properties of an entity

– E.g. height and weight of people

• Internal vs. external attributes of software entities

Example

Entity: Program

Attributes:

- Size

- Number of Defects

- Number of files

- etc.



Importance of Software Measurement

• Measurement helps us to understand

– Makes the quality of current activity visible

– E.g. average number of bugs found per module

• Measurement allows us to control

– Measures establish guidelines, e.g. thresholds

– Predict outcomes in change processes

• Measurement encourages us to improve

– When we hold our product up to a measuring stick, we can establish quality targets and aim to improve

– E.g. if the number of bugs is > 2 per module, we need to improve development practices
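
The "understand" and "control" bullets above can be sketched in a few lines of Java. This is only an illustration: the class name, the module names and the bug counts are hypothetical, and the threshold of 2 bugs per module is taken from the example on this slide.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: bug counts per module used to understand and control. */
final class BugThresholdCheck {

    /** Threshold from the slide's example: more than 2 bugs per module calls for action. */
    static final int MAX_BUGS_PER_MODULE = 2;

    /** Understand: average number of bugs found per module (0.0 for an empty map). */
    static double averageBugs(Map<String, Integer> bugsPerModule) {
        return bugsPerModule.values().stream()
                .mapToInt(Integer::intValue)
                .average()
                .orElse(0.0);
    }

    /** Control: names of the modules whose bug count exceeds the threshold. */
    static List<String> modulesOverThreshold(Map<String, Integer> bugsPerModule) {
        List<String> flagged = new ArrayList<>();
        for (Map.Entry<String, Integer> entry : bugsPerModule.entrySet()) {
            if (entry.getValue() > MAX_BUGS_PER_MODULE) {
                flagged.add(entry.getKey());
            }
        }
        return flagged;
    }

    public static void main(String[] args) {
        // Made-up data, for illustration only.
        Map<String, Integer> bugs = new LinkedHashMap<>();
        bugs.put("parser", 1);
        bugs.put("scheduler", 4);
        bugs.put("ui", 1);
        System.out.println("average bugs per module = " + averageBugs(bugs));
        System.out.println("modules over threshold  = " + modulesOverThreshold(bugs));
    }
}
```

The "improve" step is then a process decision: the flagged modules are the ones whose development practices deserve attention.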


Classifying software measures

• For each entity, we distinguish:

– Internal attributes

• Those that can be measured purely in terms of the product, process or resource itself

• Size, complexity measures, dependencies

– External attributes

• Those that can be measured in terms of how the product, process or resource relates to its environment

• Change proneness, error proneness, experienced failures, degree of reuse, etc.



Classifying software measures

• Processes are collections of software-related activities

– Associated with time, schedule

• Products are artifacts, deliverables or documents that result from a process activity

• Resources are entities required by a process activity


Examples of Product Metrics

• 9 metrics used

Attributes              Metrics

Size                    LOC    - Lines of Code

                        NOA    - Number of Attributes

                        WOC    - Weighted Operations per Component

Coupling                CBC    - Coupling between Components

                        DIT    - Depth of Inheritance

Cohesion                LCOO   - Lack of Cohesion in Operations

Separation of Concerns  CDC    - Concern Diffusion over Components

                        CDO    - Concern Diffusion over Operations

                        CDLOC  - Concern Diffusion over LOC



Metrics - Coupling: CBC (Coupling between Components)

• Viewpoint: Component

• It extends the CBO metric [CK94]

• Measures the number of components which are coupled to a given component

– Lower coupling -> easier to maintain and reuse
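
A minimal sketch of how a CBC-style count could be computed from a dependency map. The class name, the `uses` map and the component names in `main` are hypothetical. The sketch also assumes that coupling is counted in both directions and that self-references are ignored; the slide does not spell out these details.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of a CBC-style coupling count. */
final class CouplingSketch {

    /**
     * Counts the distinct components coupled to {@code component}.
     * Assumption: a component is coupled to every component it uses and to
     * every component that uses it; a self-reference is not counted.
     *
     * @param uses maps each component to the set of components it uses
     */
    static int cbc(String component, Map<String, Set<String>> uses) {
        // Outgoing couplings: components used by the given component.
        Set<String> coupled = new HashSet<>(uses.getOrDefault(component, Set.of()));
        // Incoming couplings: components that use the given component.
        for (Map.Entry<String, Set<String>> entry : uses.entrySet()) {
            if (entry.getValue().contains(component)) {
                coupled.add(entry.getKey());
            }
        }
        coupled.remove(component);
        return coupled.size();
    }

    public static void main(String[] args) {
        // Made-up dependency data, for illustration only.
        Map<String, Set<String>> uses = Map.of(
                "Point", Set.of("Subject", "Observer"),
                "Line", Set.of("Subject", "Observer", "Point"),
                "Screen", Set.of("Observer", "Point"));
        System.out.println("CBC(Point) = " + cbc("Point", uses));
    }
}
```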


Why a measurement validation framework?

• Usefulness in research

– support the definition of new valid metrics for your particular context

• E.g. our effort on defining metrics for “concerns” (entity) realized in the source code

– attributes: tangling, scattering, etc.

• Usefulness in software projects and research

– need to choose among several possible metrics

• E.g. LCOM, LCOM1, LCOM2, LCOM3

– check the validity of an existing metric before using it

• E.g. LCOM is not a valid metric



Metrics - Separation of Concerns

• Viewpoint: Concern

• Concerns investigated: roles of each design pattern

[Figure: UML class diagram showing Figure, FigureElement, Point (getX, getY, addObserver, removeObserver, notify, setX, setY), Line (getP1, getP2, addObserver, removeObserver, notify, setP1, setP2), Screen (updateDisplay), and the interfaces Observer (update) and Subject (addObserver, removeObserver, notify)]

Role “subject”

– CDC - # Components = 3

– CDO - # Operations = 13

– CDLOC - # Transition Points = 10

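import java.awt.Color; // assumption: the slide does not say which Color type is meant
import java.util.HashSet;
import java.util.Set;

// Point plays the "subject" role. Subject and Observer are the interfaces
// from the class diagram and are assumed to be defined elsewhere.
public class Point implements Subject {

    private Set<Observer> observers; // state that belongs to the "subject" role
    private int x;
    private int y;

    // The color argument is accepted but not stored, as on the original slide.
    public Point(int x, int y, Color color) {
        this.x = x;
        this.y = y;
        this.observers = new HashSet<>();
    }

    public int getX() { return x; }

    public int getY() { return y; }

    public void setX(int x) {
        this.x = x;
        notifyObservers(); // "subject" role code inside a plain setter
    }

    public void setY(int y) {
        this.y = y;
        notifyObservers(); // "subject" role code inside a plain setter
    }

    public void addObserver(Observer o) {
        this.observers.add(o);
    }

    public void removeObserver(Observer o) {
        this.observers.remove(o);
    }

    public void notifyObservers() {
        for (Observer o : observers) {
            o.update(this);
        }
    }
}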
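
A rough sketch of how the CDC and CDO values for the “subject” role might be reproduced. The mapping from components to operations below is an assumption read off the class diagram (it is not stated on the slide), and the class and method names are hypothetical. CDLOC is left out because it needs the transition points in the source text itself.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: concern diffusion counts for one concern. */
final class ConcernDiffusionSketch {

    /** CDC: number of components with at least one operation that contributes to the concern. */
    static int cdc(Map<String, List<String>> operationsByComponent) {
        int count = 0;
        for (List<String> operations : operationsByComponent.values()) {
            if (!operations.isEmpty()) {
                count++;
            }
        }
        return count;
    }

    /** CDO: total number of operations that contribute to the concern. */
    static int cdo(Map<String, List<String>> operationsByComponent) {
        int count = 0;
        for (List<String> operations : operationsByComponent.values()) {
            count += operations.size();
        }
        return count;
    }

    /** Assumed assignment of the "subject" role to operations (not given on the slide). */
    static Map<String, List<String>> subjectRole() {
        Map<String, List<String>> role = new LinkedHashMap<>();
        role.put("Subject", List.of("addObserver", "removeObserver", "notify"));
        role.put("Point", List.of("addObserver", "removeObserver", "notify", "setX", "setY"));
        role.put("Line", List.of("addObserver", "removeObserver", "notify", "setP1", "setP2"));
        return role;
    }

    public static void main(String[] args) {
        Map<String, List<String>> role = subjectRole();
        System.out.println("CDC = " + cdc(role));
        System.out.println("CDO = " + cdo(role));
    }
}
```

Under this assumed mapping the counts agree with the values on the slide (3 components, 13 operations).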




A measurement validation framework

• There is no unified set of validation criteria

• Framework for software measurement validation [Kitchenham et al., IEEE TSE, September 1995]

– This is the most cited one in the SE literature

– Authors tried to unify the validation criteria

– There are more recent references, but they extend the notion of validity for specific purposes

• i.e. their validity criteria are not mandatory

• Framework consists of:

– Structural model for metric definition and other theoretical criteria


Measurement: a mapping of the empirical world into a numerical one

A valid metric should have these elements of measurement

Unit: determines how we measure an attribute, e.g. program size -> LOC or lexical tokens



Types of Measurement Scales

Various scales of measurement exist:

– Nominal Scale

– Ordinal Scale

– Interval Scale

– Ratio Scale



The Nominal Scale (1/2)

Example: A religion nominal scale for people

[Figure: the people Joe, Rachel, Michelle, Christine, Michael, James, Clyde and Wendy grouped into the categories Catholic, Muslim, Jewish and Other]


The Nominal Scale (2/2)

• The simplest measurement scale

– classify elements into categories with regard to a certain attribute

• mapping to arbitrary “labels”

• There is no form of ranking

• Categories must be:

– Mutually exclusive

• E.g. fault categories/units: ‘data fault’, ‘algorithm fault’, ‘hardware fault’, ‘environment fault’, ‘other faults’
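
As an illustration, the fault categories above can be modelled as a Java enum. The class and method names are hypothetical. Because each fault carries exactly one enum value, mutual exclusivity holds by construction, and the only operations the sketch offers are the ones a nominal scale supports: counting per category and taking the mode.

```java
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of a nominal scale for fault categories. */
final class NominalScaleSketch {

    /** Labels only: the declaration order carries no ranking. */
    enum FaultCategory { DATA, ALGORITHM, HARDWARE, ENVIRONMENT, OTHER }

    /** Counting elements per category is meaningful on a nominal scale. */
    static Map<FaultCategory, Integer> frequencies(List<FaultCategory> faults) {
        Map<FaultCategory, Integer> counts = new EnumMap<>(FaultCategory.class);
        for (FaultCategory fault : faults) {
            counts.merge(fault, 1, Integer::sum);
        }
        return counts;
    }

    /**
     * The mode (most frequent category) is meaningful; a mean or a median is not.
     * Returns null for an empty sample; ties go to the category declared first.
     */
    static FaultCategory mode(List<FaultCategory> faults) {
        FaultCategory best = null;
        int bestCount = 0;
        for (Map.Entry<FaultCategory, Integer> entry : frequencies(faults).entrySet()) {
            if (entry.getValue() > bestCount) {
                best = entry.getKey();
                bestCount = entry.getValue();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Made-up observations, for illustration only.
        List<FaultCategory> observed = List.of(
                FaultCategory.DATA, FaultCategory.ALGORITHM, FaultCategory.DATA, FaultCategory.OTHER);
        System.out.println("frequencies = " + frequencies(observed));
        System.out.println("mode        = " + mode(observed));
    }
}
```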



The Ordinal Scale (1/2)

Example: A degree-classification ordinal scale

[Figure: the people Joe, Rachel, Michelle, Christine, Michael, James, Clyde and Wendy grouped into the ranked categories 1st Class, 2nd Class, 3rd Class and Failed]



The Ordinal Scale (2/2)

• Elements are classified into categories

• Categories are ranked

• The ranking is transitive: A > B & B > C => A > C

• Elements in one category can be said to be better (or worse) than elements in another category

• Elements in the same category are not rankable in any way

• As with the nominal scale, categories must be:

– Mutually exclusive




• Categories must be:

– Mutually exclusive: e.g. fault categories/units: ‘major’, ‘minor’, and ‘negligible’

• Important: categories need to be defined precisely, so that different data collectors use the terms consistently

– 2 - Major: fault resulting in failure

– 1 - Minor: fault resulting in misleading/unhelpful outputs to the user

– 0 - Negligible: fault that is masked and has no effect beyond its location scope, not leading to any user damage
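
The severity categories above could be encoded as an ordered Java enum, as in this sketch (class and method names are hypothetical). Declaring the constants from least to most severe makes `ordinal()` match the 0/1/2 codes of the slide and lets `compareTo` express the ranking. The sketch offers comparison and the median, but deliberately no arithmetic on the codes, since differences between ordinal categories carry no meaning.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical sketch of an ordinal scale for fault severity. */
final class OrdinalScaleSketch {

    /** Declared from least to most severe: NEGLIGIBLE = 0, MINOR = 1, MAJOR = 2. */
    enum Severity { NEGLIGIBLE, MINOR, MAJOR }

    /** Ranking is meaningful on an ordinal scale. */
    static boolean worseThan(Severity a, Severity b) {
        return a.compareTo(b) > 0;
    }

    /**
     * The median is meaningful on an ordinal scale. For an even-sized sample this
     * returns the lower of the two middle elements, so no averaging is needed.
     */
    static Severity median(List<Severity> sample) {
        if (sample.isEmpty()) {
            throw new IllegalArgumentException("sample must not be empty");
        }
        List<Severity> sorted = new ArrayList<>(sample);
        Collections.sort(sorted); // enums sort by declaration order
        return sorted.get((sorted.size() - 1) / 2);
    }

    public static void main(String[] args) {
        // Made-up observations, for illustration only.
        List<Severity> observed = List.of(
                Severity.MAJOR, Severity.NEGLIGIBLE, Severity.MINOR, Severity.MINOR, Severity.MAJOR);
        System.out.println("MAJOR worse than MINOR: " + worseThan(Severity.MAJOR, Severity.MINOR));
        System.out.println("median severity: " + median(observed));
    }
}
```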


The Ordinal Scale

• Numbers or labels are used to place objects in order

• But, there is no information regarding the differences (intervals) between points on the scale



The Interval Scale

• Interval scales tell us about the order of data points, and the size of the intervals in between data points

– an interval scale is a scale on which equal intervals between objects represent equal differences

– the interval differences are meaningful

• Addition and subtraction can be applied

– ratios are not supported, i.e. multiplication and division CANNOT be applied

[Figure: Temperature of Different CPUs. A temperature axis marked 0°C, 30°C (86°F), 60°C (140°F) and 120°C, with the labels CPU A, CPU B, CPU C and Product D]


The Interval Scale

• Fahrenheit Scale

– Interval relationships are meaningful

• a 10-degree difference has the same meaning anywhere along the scale

• example: the difference between 10 and 20 degrees is the same as between 80 and 90 degrees

– But we can’t say that 80 degrees is twice as hot as 40 degrees

• Ratio is not supported

– There is no ‘true’ zero, only an ‘arbitrary’ zero
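
A small worked example of why ratios fail on an interval scale, using the 30°C / 60°C readings (86°F / 140°F) from the CPU temperature figure. The class name is hypothetical. Equal differences stay equal after converting Celsius to Fahrenheit, but the ratio between two readings changes with the unit, so "twice as hot" is not a meaningful statement.

```java
/** Hypothetical sketch: differences survive a change of unit on an interval scale, ratios do not. */
final class IntervalScaleSketch {

    /** Standard Celsius to Fahrenheit conversion. */
    static double celsiusToFahrenheit(double celsius) {
        return celsius * 9.0 / 5.0 + 32.0;
    }

    public static void main(String[] args) {
        double lowC = 0.0;
        double midC = 30.0;
        double highC = 60.0;
        double lowF = celsiusToFahrenheit(lowC);
        double midF = celsiusToFahrenheit(midC);
        double highF = celsiusToFahrenheit(highC);

        // Equal intervals in Celsius are still equal intervals in Fahrenheit.
        System.out.println("Celsius differences:    " + (midC - lowC) + " and " + (highC - midC));
        System.out.println("Fahrenheit differences: " + (midF - lowF) + " and " + (highF - midF));

        // The ratio of the same two readings depends on the unit, so it carries no meaning.
        System.out.println("Celsius ratio:    " + (highC / midC));
        System.out.println("Fahrenheit ratio: " + (highF / midF));
    }
}
```

On a ratio scale such as time, the same experiment with seconds and minutes would give the same ratio in both units, because both share a true zero.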



Ratio Scale

• A ratio scale is an interval scale with:

– meaningful ratios

• multiplication and division can be applied

• E.g. we can say that 20 seconds is twice as long as 10 seconds

– a true zero point

• E.g. zero has the same meaning for time measured in seconds, minutes or hours

• The highest level of measurement available

• Physical scales of time and length are ratio scales


Measurement Scales Hierarchy

• Scales are hierarchical

• Each higher-level scale possesses all the properties of the lower ones

• A higher level of measurement can be reduced to a lower one, but not vice versa

[Figure: scale hierarchy, from Ratio (most powerful analysis possible) through Interval and Ordinal down to Nominal (least powerful analysis possible)]









Measurement Instrument…

• … may optionally be used to obtain the measured value of an attribute

• Examples:

– thermometer to measure temperature

– a software program to count the number of lines of code in a program

• The use of instruments helps to promote repeatability of measurement
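
A minimal sketch of such an instrument: a line counter for source files. The class name and the counting rule are assumptions. Here a line counts as code when it is neither blank nor a pure `//` comment; this is just one of several possible LOC conventions, and block comments are not handled. Fixing the rule in code is what makes the measurement repeatable.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

/** Hypothetical sketch of a measurement instrument for the LOC unit. */
final class LocCounter {

    /**
     * Counts lines of code under one assumed convention:
     * blank lines and lines holding only a "//" comment are skipped.
     */
    static int countLoc(List<String> lines) {
        int loc = 0;
        for (String line : lines) {
            String trimmed = line.trim();
            if (!trimmed.isEmpty() && !trimmed.startsWith("//")) {
                loc++;
            }
        }
        return loc;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java LocCounter <source-file>");
            return;
        }
        List<String> lines = Files.readAllLines(Paths.get(args[0]));
        System.out.println("LOC = " + countLoc(lines));
    }
}
```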



Measurement Protocols

• Goal: to make measurement, as far as possible, independent of the measurer and the environment

– Measure a specific attribute consistently and repeatably

• For example:

– A protocol for measuring the height of humans in meters

• The person must be standing, not bending over

• Measurement must start at the top of the head (not from the tip of up-stretched arms)

• The person must remove his/her shoes

• The person must stand on the soles of the feet (not on tiptoe)


This structural model is not appropriate for indirect measures, only for direct measures

Indirect metrics are those defined by equations formed from other metrics



Structural model for indirect measures

• The equation is based on an empirically observed association between attributes

• We formalise this association as a mathematical equation

• Example: when we use program size in an equation to predict project effort (e.g. COCOMO measures)

• The entities of direct measures can differ from the entities of the indirect measure
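
To make the COCOMO example concrete, the sketch below uses the Basic COCOMO form, effort = a × KLOC^b in person-months, with the published coefficients for "organic" projects (a = 2.4, b = 1.05). The slide names COCOMO only in general, so the choice of the Basic model and of the organic mode is an assumption, as are the class and method names. Note how the direct measure (size) belongs to the program, while the indirect measure (effort) belongs to the project.

```java
/** Hypothetical sketch of an indirect measure: effort predicted from program size. */
final class BasicCocomoSketch {

    /** Basic COCOMO coefficients for organic-mode projects (assumed mode). */
    static final double ORGANIC_A = 2.4;
    static final double ORGANIC_B = 1.05;

    /**
     * Indirect measure: effort in person-months as a function of a direct
     * measure, program size in thousands of lines of code (KLOC).
     */
    static double effortPersonMonths(double kloc, double a, double b) {
        if (kloc < 0) {
            throw new IllegalArgumentException("size must not be negative");
        }
        return a * Math.pow(kloc, b);
    }

    /** Convenience wrapper using the organic-mode coefficients. */
    static double organicEffort(double kloc) {
        return effortPersonMonths(kloc, ORGANIC_A, ORGANIC_B);
    }

    public static void main(String[] args) {
        // Made-up program sizes, for illustration only.
        for (double kloc : new double[] {1.0, 10.0, 50.0}) {
            System.out.printf("%.0f KLOC -> %.1f person-months%n", kloc, organicEffort(kloc));
        }
    }
}
```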



Other basic properties of valid measures

• Some theoretical criteria

– There must exist at least two entity instances for which the metric yields different attribute values

– Different entity instances can have the same attribute value

– A valid metric must obey the representation condition…

[there are many others discussed by Kitchenham et al.]



Representation condition

• The mapping should preserve real-world relations

– our observations in the real world must be reflected in the numerical values we obtain from the mathematical world

• A is taller than B iff M(A) > M(B)

• Binary empirical relation “taller than” is replaced by the numerical relation >

• So, “x is much taller than y” may mean M(x) > M(y) + 15
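
The condition "A is taller than B iff M(A) > M(B)" can be checked mechanically for a finite set of entities, as in this sketch. The class name, the people and the heights are hypothetical; the empirical relation is passed in as a predicate over pairs of entities, and the measure M as a function from entities to numbers.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.BiPredicate;
import java.util.function.ToDoubleFunction;

/** Hypothetical sketch: checking the representation condition on a finite sample. */
final class RepresentationConditionSketch {

    /**
     * Returns true when, for every ordered pair (a, b) of entities,
     * the empirical relation holds exactly when M(a) > M(b).
     */
    static <E> boolean preserves(List<E> entities,
                                 BiPredicate<E, E> empiricalRelation,
                                 ToDoubleFunction<E> measure) {
        for (E a : entities) {
            for (E b : entities) {
                boolean observed = empiricalRelation.test(a, b);
                boolean numerical = measure.applyAsDouble(a) > measure.applyAsDouble(b);
                if (observed != numerical) {
                    return false;
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Made-up observations: who is taller than whom in the real world.
        List<String> people = List.of("Ann", "Bob", "Cy");
        Set<String> tallerThan = Set.of("Ann>Bob", "Bob>Cy", "Ann>Cy");
        // Made-up measurement results in centimetres.
        Map<String, Double> heightCm = Map.of("Ann", 180.0, "Bob", 170.0, "Cy", 160.0);

        boolean ok = preserves(people,
                (a, b) -> tallerThan.contains(a + ">" + b),
                heightCm::get);
        System.out.println("representation condition holds: " + ok);
    }
}
```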


Software measurement validation

• Theoretical validation

– the measure must not violate any necessary properties of its elements in the structural model

• i.e. the metric measures what it purports to measure

• for example, a coupling metric correctly measures coupling

• Empirical validation

– the measured values of attributes are consistent with the values predicted

• i.e. the metric is associated with a relevant external attribute, such as maintainability or fault-proneness

• for example, a coupling measure is a consistent predictor of the fault proneness of modules
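
One simple way to look for the association mentioned under empirical validation is a correlation coefficient between the metric and the external attribute. The sketch below computes Pearson's r; the choice of this statistic is an assumption (the slides do not name one), and the class name and data are hypothetical. A high correlation on one sample is supporting evidence only, not a complete empirical validation.

```java
/** Hypothetical sketch: correlation between a metric and an external attribute. */
final class EmpiricalValidationSketch {

    /**
     * Pearson correlation coefficient of two equally long samples.
     * The result is NaN when either sample has zero variance.
     */
    static double pearson(double[] x, double[] y) {
        if (x.length != y.length || x.length < 2) {
            throw new IllegalArgumentException("need two samples of equal length >= 2");
        }
        int n = x.length;
        double meanX = 0.0;
        double meanY = 0.0;
        for (int i = 0; i < n; i++) {
            meanX += x[i];
            meanY += y[i];
        }
        meanX /= n;
        meanY /= n;

        double sumXY = 0.0;
        double sumXX = 0.0;
        double sumYY = 0.0;
        for (int i = 0; i < n; i++) {
            double dx = x[i] - meanX;
            double dy = y[i] - meanY;
            sumXY += dx * dy;
            sumXX += dx * dx;
            sumYY += dy * dy;
        }
        return sumXY / Math.sqrt(sumXX * sumYY);
    }

    public static void main(String[] args) {
        // Made-up data: coupling value and number of faults for five modules.
        double[] coupling = {2, 5, 7, 9, 12};
        double[] faults = {0, 1, 3, 4, 6};
        System.out.println("r = " + pearson(coupling, faults));
    }
}
```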
