Software Metrics

Static metrics - Function points and how to calculate them, FP and S/W science, Feature points, Cyclomatic complexity, Dataflow metric, Structural metric, Relative metric, Complexity over time

Dynamic metrics – Runtime complexity

How to use metrics (A Kind of Relative Complexity)

Relative Metric for Reusability

Also: Data flow metrics

Structure Metrics

Entropy - Based Metrics

STATIC METRICS

Data Organization metrics

Span between data references, e.g. the # of comparisons, # of calls, # of reads ... needed to fetch a data item.

Slicing: e.g. the length of the code related to an external output.

Data Binding: e.g. # of common-block variables.

Volume metrics

Halstead S/W science

# of source lines of code

Control metrics

Cyclomatic Complexity

Knots Count: # of control intersection points

Scope metric: the range of influence of each statement

Logical Complexity: e.g. # of binary decisions (absolute logical complexity); # of binary decisions / # of statements (comments excluded; relative logical complexity)

Average nesting level

MEBOW: on the flow chart, put a weight (1) on every branch and add up the weights; the result is close to the knot count.

Total number of call statements

Hybrid metrics

Combine control and volume characteristics.

Syntactic Complexity Family

H.F. Li; W.K. Cheung

Halstead's S/W Science

* Parameter definitions

** n1: number of distinct operators
- basic arithmetic and logic operators such as +, -, *, /, ( ), =, >, <, .OR., ...
- keywords (reserved words)
- subroutine names, procedure names, ...

** n2: number of distinct operands
- all variables
- all constants
- all .TRUE., .FALSE.

** N1: total number of operator occurrences

** N2: total number of operand occurrences

** n: # of vocabulary (n1 + n2)

** N: program length (N1 + N2)

** V: program volume, the # of total bits required in memory

** L: Program Level

◎ Mainly the descriptive (syntactic) power of the language.

** IC: Intelligence Content

◎ The part of a program that stays unchanged when it is translated from one language into another.

** D: Program Difficulty

◎ How hard it is to implement an algorithm. It is usually tied to L: the lower L is, the higher D is.

** E: Program Effort

◎ The effort needed to produce a program.

* Estimation formulas

** Estimation of Program Length (N)

$\hat{N} = n_1 \lg n_1 + n_2 \lg n_2$ (assembly view)

◎ $\hat{N}$ is mainly used to estimate L.O.C. (source code, comments excluded).

E.g. $\hat{N} = 17 \lg 17 + 15 \lg 15 = 128.09$, while $N = 113$.

<Experiment> on 255 programs:

usually, for N < 170, $\hat{N} > N$; for N > 200, $\hat{N} < N$.

Correlation: 0.94

LINE LABEL STMT NOE
1 0 0 $JOB WATFOR
2 0 0 C
3 0 0 C PROGRAM TO FIND THE ROOT OF THE EQUATION X ** X = 10
4 0 0 C INTERATION
5 0 0 C
6 0 0 C CHAN CHI HUNG
7 0 0 C EI
8 0 0 C
9 0 1 1 READ, X, E, O
10 0 2 1 IF ( X.LT.0 ) GOTO 20
11 0 3 2 I = 1
12 10 4 3 10 Z = X ** X * (ALOG (X) + 1 )
13 0 5 3 IF ( ABS(Z) .LT. D) GOTO 30
14 0 6 4 Y = X - ( X ** X - 10 ) / Z
15 0 7 4 IF ( Y.LT. O) GOTO 40
16 0 8 5 I = I + 1
17 0 9 5 IF ( ABS ( X - Y ) .LT .E ) GOTO 60
18 0 10 6 IF ( I .GT. 30 ) GOTO 50
19 0 11 7 X = Y
20 0 12 7 GOTO 10
21 20 13 8 20 PRINT, ‘Initial guess small than zero’
22 0 14 8 GOTO 70
23 30 15 9 30 PRINT, ‘Derivation of the function vanishes’
24 0 15 9 A, Newton - Raphson interation invalid’
25 0 16 9 GOTO 70
26 40 17 10 40 PRINT, ‘Invalid initial guess, next approx’
27 0 18 10 GOTO 70
28 50 19 11 50 PRINT, ‘Number of interations exceeds 30
29 0 19 11 A, guess’
30 0 20 11 GOTO 70
31 60 21 12 60 WRITE (6,80) Y
32 80 0 80 FORMAT (11X, ‘Root of equation = ‘, E15.6)
33 70 22 13 70 STOP
34 0 23 14 END
35 0 0 $DATA

n1 = 17 OPERATORS
----------------------------------------
READ     1
EOS     21
IF       5
( )     10
.LT.     4
GOTO    10
=        5
**       2
*        1
ALOG     1
+        2
ABS      2
-        3
/        1
.GT.     1
PRINT    4
WRITE    1
----------------------------------------
TOTAL N1 = 74

n2 = 15 OPERANDS
----------------------------------------
X        9
0        2
20       1
I        4
1        3
Z        3
D        1
30       2
Y        4
10       2
40       1
E        1
60       1
50       1
70       4
----------------------------------------
TOTAL N2 = 39

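If $\hat{N}$ is used to estimate L.O.C., the correlation is about 0.8.

E.g. LOC = 23, $\hat{N}$ = 128

The smaller the program, the less accurate the estimate; larger programs do better, but either way some adjustment factor must be applied.

* It is better to estimate the whole S/W than a single program, but then n1 and n2 are hard to obtain. Function points try to solve this problem ...

* Program volume estimation

$V = N \lg n$

Example: $V = 113 \lg 32 = 565$

- Smallest possible volume (potential volume):

$V^* = N^* \lg n^*$, with $n^* = n_1^* + n_2^*$

The potential volume can be used for size estimation early in the S/W life cycle.

So n* reflects the I/O and process characteristics estimated from the SS or SRS.

If all operators are assumed to be built in, n2* almost dominates V*, because n1* consists only of the program name and ( ), i.e. 2.

Executable code size

$n_2^* \log_2 n_2^*$: lower bound of $V^*$

(Example) $V^* = (2 + n_2^*) \log_2 (2 + n_2^*) = 17 \log_2 17 \approx 69.5$ (bytes)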

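◎ Estimating the Program Level

$L = \frac{V^*}{V}$

In the highest-level language everything to be operated on is built in, so L = 1. The lower L is, the lower the language level.

- How to estimate L

$\hat{L} = \frac{2}{n_1} \times \frac{n_2}{N_2} = \frac{2 n_2}{n_1 N_2}$

The estimate deliberately avoids N1, because N1 depends on the detailed program logic.

** Halstead found the correlation between L and $\hat{L}$ to be about 0.9.

** Others, estimating with FORTRAN programs, found a correlation between L and $\hat{L}$ of 0.531, which suggests that the skill of the engineers on a project also affects the value of L.

◎ Estimating the Program Difficulty D

$D = \frac{1}{L}$, $\hat{D} = \frac{1}{\hat{L}} = \frac{n_1 N_2}{2 n_2} = \frac{17 \times 39}{2 \times 15} = 22.1$

When the program is large, the influence of n1 on D is not significant. The main factor is that in a large program the data heavily affects how difficult the program is (the average number of uses per operand, $N_2 / n_2$), so OOD is a natural response and trend.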

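* Intelligence Content (IC)

$IC = L \times V = \frac{V^*}{V} \times V = V^*$

Estimate: $\hat{IC} = \hat{L} \times V = \frac{2 n_2}{n_1 N_2} \times N \log_2 n$

* Program Effort E

$E = \frac{V}{L} = D \times V$

Halstead held that implementation effort drops when the language level is high.

Estimate: $\hat{E} = \frac{V}{\hat{L}} = \frac{n_1 N_2 \, N \log_2 n}{2 n_2}$

Example: $\hat{E} = \frac{17 \times 39 \times 128.09 \times \log_2 32}{2 \times 15} = 14153.9$

S/W Science may overlook the human factor. For example, effort depends heavily on people's ability and experience!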
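The Halstead formulas can be checked against the FORTRAN example with a short Python sketch. The function name `halstead` and the dictionary keys are illustrative choices, not part of the slides; the inputs are the counts n1 = 17, n2 = 15, N1 = 74, N2 = 39 from the operator and operand tallies.

```python
import math


def halstead(n1: int, n2: int, N1: int, N2: int) -> dict:
    """Halstead S/W Science measures from operator/operand counts (illustrative helper)."""
    n = n1 + n2                                        # vocabulary
    N = N1 + N2                                        # observed program length
    N_hat = n1 * math.log2(n1) + n2 * math.log2(n2)    # estimated length
    V = N * math.log2(n)                               # volume
    L_hat = (2 * n2) / (n1 * N2)                       # estimated program level
    D_hat = 1 / L_hat                                  # estimated difficulty
    E_hat = V / L_hat                                  # estimated effort, equal to D_hat * V
    return {"n": n, "N": N, "N_hat": N_hat, "V": V,
            "L_hat": L_hat, "D_hat": D_hat, "E_hat": E_hat}


m = halstead(n1=17, n2=15, N1=74, N2=39)
for name, value in m.items():
    print(f"{name:6s} = {value:.2f}")
```

This sketch computes the effort from the measured length N = 113. Substituting the estimated length 128.09 for N in the volume term gives the larger figure 14153.9.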

Function Point

* What is F.P.?

Classify the capabilities in the SS or SRS into 5 function types and estimate the F.P. of each.

Adjust all the estimates according to processing complexity.

Compute the total F.P.

* The five function types

1. External Input Types

2. External Output Types

3. Logical Internal File Types

4. External Interface File Types

5. External Inquiry Types

* EXTERNAL INPUT TYPES (transactions input from users or other applications)

** Input data item → "G-value" from INS.
** User key-in → "User's name".
** An action that updates data in a logical internal file type → "Update access right table".

Note:
- The same input (same content) in different formats counts as 1, no matter how many times it appears.
- Inputs with the same format count as different inputs if their processing logic differs.

** External inputs fall into three complexity levels.
- Simple:
  ․ few single data items
  ․ few inputs that update logical internal files
  ․ few human-factor requirements
- Complex: the opposite of Simple.
- Average: not clearly Simple or Complex.

Note:
- Do not count extra inputs (such as those added for testing convenience).
- Do not count record file inputs; they belong to "external interface file".

* EXTERNAL OUTPUT TYPES (transactions output to users or other applications)

** Single output data item or message report
** Control → "Launch"

Note: same as for External Input Types.

** Three complexity levels
- Simple: data elements with one or two fields.
- Complex: the output serves as a reference for many or complex file-processing actions.
- Average: data elements with several fields.

Note:
** Do not count output files; they belong to External Interface File.
** Do not count external responses (responses to an external inquiry); they belong to External Inquiry Type, i.e. the data comes from the database.

* LOGICAL INTERNAL FILE TYPE

** From the user's point of view, a logically meaningful group of data files. These files may be generated, used, or maintained by the system, e.g. the access right table for a DBMS.

** Three complexity levels
- Simple: few record types, few data types, no special performance or recovery requirements.
- Complex: the opposite of Simple.
- Average: not clearly Simple or Complex.

* EXTERNAL INTERFACE FILE TYPE

** Data files passed between or shared by applications (different CSCIs); they must be counted in each of the applications! E.g. the access right table for an MIS. The complexity levels are defined exactly as for the Logical Internal File Type.

* EXTERNAL INQUIRY TYPE (single key search)

Input query → S/W function → query response

e.g. search key → search response

Note:
** The same query/response format counts as "one" no matter how many times it appears.
** Regardless of format, different processing logic means a different inquiry.
** Complexity level
- Query part: use the External Input method.
- Output response: use the External Output method.

Query part        Simple   Simple    Average   Average
Output response   Simple   Average   Simple    Average
Inquiry           Simple   Average   Average   Complex

Calculate Function Points

Albrecht proposed (based on IBM empirical data):

(# of inputs × 4) + (# of outputs × 5) + (# of inquiries × 4) + (# of files × 10) + (# of interfaces × 7) = Function Points (F.P.)

* To stand in for the # of operands or operators in S/W Science, F.P. needs a suitable adjustment.

Adjusted F.P. → AFP = PCA × F.P.

PCA: Processing Complexity Adjustment

◎ The 14 characteristics of PCA:

** Data communication (e.g. LAN, WAN)
** Multiple sites
** Performance
** Distributed functions (functions that work through synchronous or asynchronous mechanisms, e.g. handshaking)
** Heavily used configuration (the S/W runs on a very busy host)
** Transaction rate
** Online data entry
** Online update
** End-user efficiency (turnaround time)
** Installation ease
** Complex processing (the application domain is not simple, e.g. matrix operations, exception handling)
** Facilitate change (C.M. for capability)
** Reusability (using many components of S/W already in operation)

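◎ How to determine PCA

Give each of the 14 characteristics a value from 0 to 5 for its influence on processing complexity:

》 0: the characteristic is absent, or present without influence
》 1: slight (but present) influence
》 2: some influence
》 3: unsure whether it is 2 or 4
》 4: significant influence
》 5: severe influence (especially on throughput)

The PCA value should fall in the range 0.65 ~ 1.35.

PCA = (sum of the 14 characteristic values) × 0.01 + 0.65

AFP = FP × PCA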
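The Albrecht weights and the PCA rule translate directly into code. The Python sketch below is a minimal illustration; the function names and the sample counts and ratings are hypothetical and chosen only to exercise the formulas.

```python
# Albrecht's weights per function type, as listed on the slide.
WEIGHTS = {"inputs": 4, "outputs": 5, "inquiries": 4, "files": 10, "interfaces": 7}


def function_points(counts: dict) -> int:
    """Unadjusted F.P.: weighted sum of the five function-type counts."""
    return sum(WEIGHTS[kind] * counts.get(kind, 0) for kind in WEIGHTS)


def pca(ratings: list) -> float:
    """Processing Complexity Adjustment from 14 ratings, each between 0 and 5."""
    if len(ratings) != 14 or any(not 0 <= r <= 5 for r in ratings):
        raise ValueError("expected 14 ratings in the range 0..5")
    return sum(ratings) * 0.01 + 0.65


def adjusted_function_points(counts: dict, ratings: list) -> float:
    """AFP = FP x PCA."""
    return function_points(counts) * pca(ratings)


# Hypothetical project: the counts and ratings below are made up for illustration.
counts = {"inputs": 10, "outputs": 8, "inquiries": 5, "files": 4, "interfaces": 2}
ratings = [3] * 14
print(function_points(counts), pca(ratings), adjusted_function_points(counts, ratings))
```

With all ratings at 0 the adjustment is 0.65, and with all at 5 it is 1.35, which matches the stated PCA range.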

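FP and S/W Science

FP is derived from the SS or SRS and, after adjustment, stands in for the # of operands and operators in S/W Science, so it is only a potential count. Using it directly to estimate SLOC is usually far off, so it is used to estimate V* instead:

$V^* = (AFP + 2) \log_2 (AFP + 2)$

E.g. for PL/1, $SLOC \approx 6.3\,(AFP + 2) \log_2 (AFP + 2) + 4370$, with a correlation of 0.997.

* If AFP is used directly to estimate SLOC: PL/1 → 1 AFP ≈ 65 SLOC; COBOL → 1 AFP ≈ 100 SLOC.

* AFP also depends on the language and on the properties of the application. The original function point concept focuses on estimating data-intensive applications and is hard to apply to scientific applications or embedded S/W.

Modification → Feature Point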

Feature points aim to correct a weakness of function points: they cannot conveniently estimate real-time, embedded, military, and system S/W.

* Parameters that make up the feature point

# of algorithms × 3
# of inputs × 4
# of outputs × 5
# of inquiries × 4
# of data files × 7
# of interfaces × 7
(sum) = Feature pts.

* It also has a PCA. The range is 0.6 ~ 1.4.

Relationship between Function Point and Feature Point

S/W type                       Feat. Pt / Func. Pt
Non-procedural                 0.75
Batch                          0.9
Scientific / mathematical      1.05
System S/W                     1.1
Telecommunication              1.15
Process control                1.2
Embedded / real-time           1.25
Graphics / image processing    1.3
Robotics / automation          1.35
A.I.                           1.4

CYCLOMATIC COMPLEXITY (McCabe & Gilb)

* In short: # of different conditions

* How to compute:
1. Draw the program flow chart.
2. Cyclomatic complexity: V(G) = # of edges - # of nodes + 2

When the program contains no decision, # of edges = # of nodes - 1, so V(G) = -1 + 2 = 1. V(G) reflects the complexity of control.

* For a single-entry, single-exit program, V(G) = # of single binary decisions + 1. With k decisions, # of edges = (# of nodes - 1) + k, so V(G) = (# of nodes - 1) + k - # of nodes + 2 = k + 1 = # of decisions + 1.

* Algorithmic approach
- +1 for each IF, CASE, or other alternate-execution construct
- +1 for each iterative construct such as DO or DO-WHILE
- +(k - 2) for each CASE with k choices ← 2k edges - (k + 2) nodes = k - 2
- +1 for each AND or OR inside an IF

Any problems? Consider nested IFs.
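Both ways of obtaining V(G), the edge/node formula and the decision count, can be compared on a small flow graph. The Python sketch below assumes a hypothetical single-entry, single-exit graph (an IF/ELSE followed by a loop) stored as an adjacency list; the graph and the function names are illustrative only.

```python
def cyclomatic_from_graph(graph: dict) -> int:
    """V(G) = # of edges - # of nodes + 2 for a single-entry, single-exit flow graph."""
    nodes = set(graph)
    for successors in graph.values():
        nodes.update(successors)
    edges = sum(len(successors) for successors in graph.values())
    return edges - len(nodes) + 2


def cyclomatic_from_decisions(graph: dict) -> int:
    """V(G) = # of binary decisions + 1; a node with k successors counts as k - 1 decisions."""
    decisions = sum(len(s) - 1 for s in graph.values() if len(s) > 1)
    return decisions + 1


# Hypothetical flow graph: an IF/ELSE followed by a loop.
FLOW = {
    "entry": ["if"],
    "if": ["then", "else"],
    "then": ["loop"],
    "else": ["loop"],
    "loop": ["body", "exit"],
    "body": ["loop"],
    "exit": [],
}

print(cyclomatic_from_graph(FLOW), cyclomatic_from_decisions(FLOW))
```

The graph has 8 edges and 7 nodes, and two binary decisions, so both functions agree on 3.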

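Example

[Figure: program flow graph with Entry, nodes 1 to 14, and Exit]

Nodes 1, 3, 4, 5, 6 each contain one condition:
V(G) = 5 + 1 = 6
V(G) = 18 - 14 + 2 = 6

** The cyclomatic number can represent the testing needed for unit test.

KNOTS Metrics

* The crossing points of control flow

So the more branches there are, the more knots, and the higher the control complexity.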

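** Related to readability and maintainability. If a node holds more than one statement, such as Node 3, and a backward branch points into it, the branch lands on the first statement of Node 3 and therefore forms a knot with the node's out-branch.

[Figure: code fragment "IF (Cond's) THEN GO TO 20; State-1; State-2; State-3; GO TO 10", divided into Node 1 to Node 4, with statement labels 10 and 20]

Example

[Figure: flow graph with nodes 1 to 14, drawn with crossing branches]

Knots count = 23. This program has too many "GO TO"s!

Exer.

[Figure: flow graph with Entry, nodes 1 to 14, and Exit]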

SCOPE METRIC [Harrison, Magel, 1981]

Based on measuring the role a program node plays in the program logic.

* Selection node: a node in the program graph whose out-degree exceeds 1.

* Receiving node: any node that is not a selection node.

* Greatest Lower Bound node (GLB node)

Node 1 is a selection node. Its lower-bound nodes are:

8 → Scope 1 ← nobody else is inside the scope
2 → Scope 1 ← nobody else is inside the scope
3 → Scope 2, 1 ← 2 is inside the scope
not a lower bound → 4 → Scope 3, 7, 6, 5, 4 (X)
9 → Scope 3, 7, 6, 5, 4, 3, 2, 1
10: same as 9
...
13 → Scope 9, 3, 7, 6, 5, 4, 3, 2, 1; 10, 4, 3, ...; 12, ...; 11, ...; 8

Its scope is the greatest, so the GLB of Node 1 is 13.

[Figure: flow graph with nodes 1 to 13; the branches out of nodes 8 to 12 are annotated with 13]

Node 3 is a selection node.

GLB → 9 => 3, 7, 6, 5, 4
not → 4 => 3, 7, 6, 5, 4 (x), 3 (in fact 9, 10, 12, ... all qualify)
5 => same as 4
...
10 => same as 9
12 => same as 9
11 => same as 9
not → 13 => 9, 7, 6, 5, 4, 3; 10, 7, 6, 5, 4; 12, 7, 6, 5, 4; 11, 7, 6, 5, 4; 8, ... (because the path through 8 is out of scope)

Summary of the example

Node   Scope                         Metric   GLB
1      2,3,4,5,6,7,8,9,10,11,12      12       13
2                                    1        Receiving
3      4,5,6,7,3                     6        9
4      4,5,6,7,3                     6        10
5      4,5,6,7,3                     6        12
6      4,5,6,7,3                     6        11
7                                    1        Receiving
...
13                                   1
14                                   0        Terminal

Scope Complexity: 44

[Figure: flow graph with nodes 1 to 13; the branches out of nodes 8 to 12 are annotated with 13]

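* SCOPE RATIO

$\text{Scope ratio} = \left(1 - \frac{N}{\text{SCOPE}}\right) \times 100\%$, where N = # of nodes (the terminal node is not counted)

Example: (1 - 13 / 44) × 100% = 70.45%

The higher the complexity, the closer the ratio is to 100%.

- The scope metric corrects the cyclomatic number's neglect of node complexity.
- Scope can misjudge: a high scope may mean the program is large rather than that its logical complexity is high! The scope ratio balances out this kind of error.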

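The Syntactic Complexity Family is a kind of hybrid metric: it combines different metrics to suit the different kinds of programs in a software system.

Suppose software S is composed of programs P1, P2, ..., Pk. The syntactic complexity C(S) of S is

$C(S) = b \times \sum_{i=1}^{k} C(P_i)$

where b is a weight that may depend on the nesting level.

* Based on the decomposition criteria and the nature of the program, decide whether each decomposed component is proper or non-proper:

$C(P_i) = \begin{cases} V_1 & \text{if } P_i \text{ is proper} \\ V_2 & \text{if } P_i \text{ is non-proper} \end{cases}$

Example: Executable Statement Count (STMT)

Let b = 1 and

$C(P_i) = \begin{cases} 1 & \text{if } P_i \text{ is an executable statement} \\ 0 & \text{otherwise} \end{cases}$

$C(S) = b \times \sum_{i=1}^{k} C(P_i) = C(P_1) + \sum_{i=2}^{k} C(P_i)$

If P1 is an executable statement, then

$C(S) = 1 + \sum_{i=2}^{k} C(P_i) = \dots = STMT$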

# of Calls (CALL)

b = 1, and $C(P_i) = 1$ if P_i is a function call or procedure call, 0 otherwise.

$C(S) = \sum_{i=1}^{k} C(P_i) = CALL$

Cyclomatic Complexity (for single-entry, single-exit binary decisions)

b = 1, and $C(P_i) = 1$ if P_i is a segment, 0 otherwise.

A segment runs from an entry statement to a branch statement, or from an entry statement to a terminal statement.

Generalized Form

If P = S1; S2; ...; Sk is a sequence construct:
C(P) = b (C(S1) + C(S2) + ... + C(Sk))

If P is a nested IF construct:
C(IF B1 THEN S1) = b(B1) C(S1)
= b(B1) C(IF B2 THEN S2)
= b(B1) b(B2) C(S2)
= b(B1) b(B2) ... b(Bk) C(Sk)

Assume every Bi is a logic expression of W conditions (i.e. an expression with W - 1 ANDs or ORs). If b(Bi) depends on the number of conditions, then

$C(\text{Nested IF}) \approx W^k \, C(S_k) = W^k$

(Sk is a single expression, so C(Sk) = 1.)

Data-Flow Metric

Definition: Block (or segment, chunk)

Example:
... IF (...) THEN GOTO 20
S1: the only statement that gets control from another block.
S2: the block runs sequentially (no branches).
...
Sk: may be a branch or a common statement.
20 Sk+1: a control entry point, hence the start of another block.

Def: Variable definition of a statement. In a single statement X = f (a function call or an assignment, with X on the left of the equals sign), X gets a definition.

Def: Variable reference of a statement. In a single statement ... = f(X) (a function call, an assignment, or an output, with X on the right of the equals sign), X is referenced.

Def: Locally available variable of a block

If X has a definition in block B, then X is locally available in B.

Def: Locally exposed variable reference in a block

X is referenced in block B, but X's definition does not come from B.

◎ Reach

Block B: X = ...

... (X is not redefined along the way, i.e. X is not locally available along the path. The path may be empty, meaning C is the immediate successor of B.)

Block C: ... = f(X)

This means the definition of X in B reaches block C.

◎ Data-flow metric

※ Reach set R_i: all values brought in from outside (values count).

Let V be the set of all variables in the whole program.

$R_i = \{\, \text{definitions of } v \mid v \in V \text{ and the definition reaches block } i \,\}$

※ Locally exposed set: the variables used in the block whose values are defined outside the block (name count).

$V_i = \{\, v \mid v \in V \text{ and } v \text{ is locally exposed in block } i \,\}$

※ # of definitions of reach-set variables in block i

$DEF(v_j)$ = # of definitions of $v_j$, where $v_j \in V_i$ and the definitions $\in R_i$

◎ Discussion

※ A variable v in R_i may have many reaching definitions, because v can be defined in different blocks and delivered to block i at different times.

※ Every v in V_i is exposed at least once in block i.

Data flow complexity DF_i of block i:

$DF_i = \sum_{j=1}^{\|V_i\|} DEF(v_j)$

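◎ Illustration

[Figure: the definitions reaching Block i are added up]

If a program has S blocks, the data flow complexity of the program is

$DF = \sum_{i=1}^{S} DF_i$

□ What does data flow complexity become for a concurrent program? Think about it.

※ handshaking mechanism

※ mutual exclusion

□ This definition covers only the complexity of data flow inside a program. What about data flow complexity between programs? Think about it.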
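Once the reaching definitions are known, the block-level and program-level sums are simple to compute. The Python sketch below assumes the reaching-definition analysis has already been done and that each block is described by two hypothetical fields: `exposed` (its locally exposed variables) and `reach` (for each variable, the set of blocks whose definitions reach this block). The data is made up for illustration.

```python
def block_df(exposed: set, reach: dict) -> int:
    """DF_i: for each locally exposed variable, count the definitions that reach the block."""
    return sum(len(reach.get(var, ())) for var in exposed)


def program_df(blocks: list) -> int:
    """DF: sum of DF_i over all blocks of the program."""
    return sum(block_df(b["exposed"], b["reach"]) for b in blocks)


# Hypothetical program with two blocks; block names B0, B1 are placeholders.
BLOCKS = [
    {"exposed": {"x"}, "reach": {"x": {"B0"}}},
    # z reaches the second block but is not exposed there, so it is not counted.
    {"exposed": {"x", "y"}, "reach": {"x": {"B0", "B1"}, "y": {"B1"}, "z": {"B0"}}},
]

print([block_df(b["exposed"], b["reach"]) for b in BLOCKS], program_df(BLOCKS))
```

In this made-up case the first block contributes 1 and the second 3, so DF = 4.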

Information Flow Metric:

* Def (Global flow)

There is a global flow between process A and process B through a global data structure D when A puts data into D (or updates D) and B takes the data from D for use.

A → D → B

* Def (Local flow)

Process A and process B have a local flow if one of the following holds:

- A calls B (direct local flow).
- B calls A, and A returns a value to B; the return-value flow is an indirect local flow.
- C calls A and then calls B, only to obtain a value from A that C does not use itself but passes on to B; then A → B is an indirect data flow.

[Figure: A and B linked by a value return; A and B linked through C by a value passed through]

Fan-in: the fan-in of process A is

(local flows into A) + (flows that fetch data from global data structures)

Fan-out: the fan-out of process A is

(local flows out of A) + (flows that update global data structures)

◎ The relationship of a process (program) with the outside world is its fan-in and fan-out.

Information Flow Complexity

$\text{Length} \times (\text{Fan-in} \times \text{Fan-out})^2$

Length is the S/W size; the term $(\text{Fan-in} \times \text{Fan-out})^2$ dominates the information flow complexity.

◎ Why does it look like this?

※ To measure how much modification may occur when maintaining the program.

※ It relates to data flow.

※ It relates to the programmer.
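The information flow complexity formula is a one-liner, and a small example shows how strongly the squared term dominates. The Python sketch below uses hypothetical procedure names and measurements; only the formula itself comes from the slide.

```python
def information_flow_complexity(length: int, fan_in: int, fan_out: int) -> int:
    """Length x (fan-in x fan-out)^2."""
    return length * (fan_in * fan_out) ** 2


# Hypothetical procedures: (length in lines, fan-in, fan-out).
PROCEDURES = {
    "parse": (100, 3, 2),
    "report": (400, 1, 1),
}

for name, (length, fan_in, fan_out) in PROCEDURES.items():
    print(name, information_flow_complexity(length, fan_in, fan_out))
```

Here the shorter procedure scores 3600 against 400 for the longer one, because its fan-in and fan-out are larger.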

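※ (Fan-in × Fan-out): the I/O combinations that process A may produce.

※ Under a team-work organization, the interaction among programmers: (Fan-in × Fan-out) × (Fan-in × Fan-out) = (Fan-in × Fan-out)².

Think about $(\text{Fan-in} \times \text{Fan-out})^2$ with the plot in mind.

An experiment on the procedures in UNIX looks at what actually went wrong in the UNIX system.

[Figure: % of changed procedures (y axis, 10 to 100) versus complexity (x axis, logarithmic, $10^{1/2}$ to $10^4$), with the points $(10^{1/2}, 38)$ and $(10^4, 95)$ marked]

$\frac{\Delta Y}{\Delta X} = \frac{95 - 38}{10^4 - 10^{1/2}} = \frac{57}{9996.8} = 0.0057$, so $y = 0.0057\,x$.

With $(\text{Fan-in} \times \text{Fan-out})^2 = 10^{x/2}$:

$2 \log(\text{fan-in} \times \text{fan-out}) = x/2$

$4 \log(\text{fan-in} \times \text{fan-out}) = x$

$y = 0.0057 \times 4 \log(\text{fan-in} \times \text{fan-out}) = 0.0228 \log(\text{fan-in} \times \text{fan-out}) \approx$ # of procedures changed (the y coordinate)

The finding: the number of modifications may be related to the height of the tree formed by integration, so problems usually come from the integration interfaces.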

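Entropy-Based Measure

◎ Information (news, intelligence) means "finding out something you did not yet know".

◎ In other words: the information a message provides is the sum of the surprise that every symbol in the message gives you.

◎ Defining the amount of information by the degree of surprise makes it possible to measure the expressive power of a language.

So, for any symbol in a message, the degree of surprise it gives grows with the reciprocal of its probability of occurrence.

Let S_i be a symbol in the message,

P_i the probability that S_i occurs,

I_i the amount of information of S_i (usually in bits):

$I_i = \lg (1 / P_i) = -\lg P_i$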

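How big is the "shock" (entropy) caused when S_i appears?

$P_i (-\lg P_i) = -P_i \lg P_i$

◎ For a message with symbols S1, S2, ..., Sq, the total entropy is

$H = -\sum_{i=1}^{q} P_i \lg P_i$, with $\sum_{i=1}^{q} P_i = 1$

※ When only one symbol ever occurs, say with probability P_k, then P_k = 1 and H = 0, the minimum.

※ When P1 = P2 = ... = Pq, then P_i = 1/q and H = lg q, the maximum:

$-(P_1 \lg P_1 + \dots + P_q \lg P_q) = -\lg \left(P_1^{P_1} \cdots P_q^{P_q}\right) = -\lg (1/q)$

At the maximum, all symbols occur with the same probability.

Assume that the larger the entropy, the lower the complexity. The simplest example: a program uses three operators a total of 3 times, so each operator appears exactly once, and the entropy is -lg(1/3) = lg 3.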

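◎ Applying entropy to S/W

Define $P_i = f_i / N_1$, where

f_i: the number of uses of operator i

N1: the total number of uses of all operators

$H = -\sum_{i=1}^{n_1} \frac{f_i}{N_1} \lg \frac{f_i}{N_1}$

Used to assess the error span:

Error span = N / # of errors. The larger the better, since it means a low error density.

Experiments found: the larger the entropy, the larger the error span.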
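The operator entropy can be computed directly from a list of operator frequencies. The Python sketch below is a minimal illustration; the function name `operator_entropy` is an assumption, and the second frequency list is the operator tally of the FORTRAN example (total N1 = 74).

```python
import math


def operator_entropy(freqs: list) -> float:
    """H = -sum((f_i / N1) * lg(f_i / N1)) over the operators, with N1 = sum of f_i."""
    total = sum(freqs)
    return -sum((f / total) * math.log2(f / total) for f in freqs if f > 0)


# Three operators, each used once: the entropy equals lg 3.
print(operator_entropy([1, 1, 1]))

# Operator frequencies of the FORTRAN example (READ, EOS, IF, ( ), .LT., GOTO, ...).
FORTRAN_OPERATORS = [1, 21, 5, 10, 4, 10, 5, 2, 1, 1, 2, 2, 3, 1, 1, 4, 1]
print(operator_entropy(FORTRAN_OPERATORS))
```

Because a few operators such as EOS and GOTO account for most of the uses in the FORTRAN example, its entropy stays below the maximum lg 17 that 17 equally frequent operators would give.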

Entropy for S/W structure complexity [Structural Entropy]

View the program as blocks (segments, chunks) and apply the entropy measure to the chunks.

◎ Program flow graph

E.g. [Figure: two program flow graphs, G and G', each with the seven nodes a to g]

◎ 1st-order entropy measure:

Partition the 7 chunks a, ..., g into equivalence classes.

Def: two chunks are in the same class if the two chunks (nodes) have the same in/out degree.

So the 1st-order classes of G are: {a}, {b,c,e,f}, {d}, {g}

The 1st-order classes of G' are: {a}, {b,d,e}, {c}, {f}, {g}

◎ 1st-order entropy of G

- [(1/7)lg(1/7) + (4/7)lg(4/7) + (1/7)lg(1/7) + (1/7)lg(1/7)] = 1.666

◎ 1st-order entropy of G' = 2.128

◎ What does it mean?

G and G' both have cyclomatic complexity 3 (but different structures), while their 1st-order entropies differ: 1.666 for G and a larger 2.128 for G'. So entropy can reflect structural differences. G' has a nested IF or a similar statement, so in theory its (logic) complexity should be higher! With the emphasis on in/out degree, entropy turns out not to reflect logic complexity.

◎ 2nd-order structural entropy

Two chunks are equivalent iff they have the same parents and the same children. So the 2nd-order classes of G are {a}, {b,c}, {d}, {e,f}, {g}:

(1/7)lg(1/7) + (2/7)lg(2/7) + (1/7)lg(1/7) + (2/7)lg(2/7) + (1/7)lg(1/7)

= (3/7)lg(1/7) + (4/7)lg(2/7) = -lg7 + 4/7

G': {a}, {b}, {c}, {d,e}, {f}, {g}

(5/7)lg(1/7) + (2/7)lg(2/7) = -lg7 + 2/7

◎ G's 2nd-order entropy is larger → lower complexity
◎ G''s 2nd-order entropy is smaller → higher complexity

[Figure: the program flow graphs G and G', each with the seven nodes a to g]

Relative Metric

[Figure: individual metrics (Halstead S/W Science, Cyclomatic, Entropy, Hybrid, ...) are classified into factor domains (Size, Control, Information Content, Modularity, Data structure), which combine into the RELATIVE COMPLEXITY]

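◎ How to compute the relative complexity of S/W:

A S/W is composed of many S/W modules m1, m2, ..., mi, ...

Define the relative complexity of module i:

$\rho_i = \lambda_1 S_{i1} + \dots + \lambda_j S_{ij} + \dots$

※ S_ij is the measured value of module i in factor domain f_j. Factor domains are things like "Control, Size, ...".

※ λ_j is the weight of the measured value S_ij in factor domain f_j, called the "eigenvalue". It depends on the correlation between the originally chosen metric domain and the factor domain.

$\rho = \sum_i \rho_i$ is the relative complexity of the S/W.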

【 j 】×【 Corij 】= 【 j 】


Table 1. Factor pattern for Metric Analyzer (by correlations)

Metric        Control   Size     Information content   Modularity   Data structure
V(g)          0.951     0.114    0.181                 -0.041       -0.039
Statements    0.949     0.164    0.219                 -0.058       -0.036
N1            0.944     0.141    0.200                 -0.092       -0.072
OutCalls      0.933     0.141    0.021                 0.036        -0.004
MaxDepth      -0.027    0.971    -0.020                -0.040       -0.042
N2            0.244     0.946    0.065                 -0.058       -0.020
Size          0.371     0.908    0.034                 -0.034       -0.034
MaxOrder      0.062     0.084    0.919                 0.132        -0.040
MeanOrder     0.058     0.085    0.918                 0.133        -0.036
BW            0.248     -0.048   0.857                 -0.101       0.084
MaxLevel      0.089     -0.029   -0.161                0.764        -0.112
Outputs       -0.195    -0.118   0.168                 0.741        0.162
InCalls       0.001     0.024    -0.182                0.163        0.791
Inputs        -0.112    -0.106   0.247                 0.244        0.743
Eigenvalues   3.964     3.733    3.162                 1.370        1.240

MaxLevel: The maximum number of nested levels.

BandWidth (BW): A value based on McCabe's cyclomatic complexity, adjusted for the added complexity of nested Ifs instead of just the number of Ifs in the code.

MaxOrder: The count of the largest number of edges from a single node in the parse tree.

MaxDepth: The length of the longest branch in the tree generated by the parser.

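Complexity Over Time

$m_j^i$ denotes version i of module $m_j$.

Suppose the system consists of the modules m1, m2, ..., mn.

After the first system integration this becomes $\langle m'_1, m'_2, \dots, m'_n \rangle$.

The system is updated over time:

Version 1: $V^1 = \langle m'_1, m'_2, \dots, m'_n \rangle = \langle 1, \dots, 1 \rangle$

$V^2 = \langle 2, 2, 1, \dots, 1 \rangle$

$V^3 = \langle 2, 3, 1, \dots, 1 \rangle$

$V^4 = \langle 3, 3, 2, \dots, 2 \rangle$

...

For example, $V^3_1 = 2$.

* Relative complexity of version i:

$\rho^i = \sum_{j=1}^{n} \rho_j^{V^i_j}$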

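Dynamic Metric (run-time complexity)

Suppose $P_1^t, P_2^t, \dots, P_n^t$ are the probabilities that the modules m1, ..., mn of the S/W appear during a particular time t (a span of execution time).

This is very much in the spirit of the working set!

◎ Dynamic complexity

$\rho^t = \sum_{j=1}^{n} P_j^t \, \rho_j^{V^i_j}$ (taking the configuration into account)

It does not have to be relative complexity: any metric weighted by P_j can be treated as dynamic!

$\rho^t = \sum_{j=1}^{n} P_j^t \, \rho_j$

But why is relative complexity the better choice?

㊣ In the dynamic view, the modules used during a span of time keep changing, and each module measures differently in each factor domain. Yet each module leans toward the characteristics of one factor domain because of its function, so relative complexity shows these differences better.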

How to use metrics (A Kind of Relative Complexity)

◎ Metric classification tree for Type X errors

[Figure: a classification tree with the decision nodes Data Bindings (0-12, >12), Revisions (0-18, >18), System Type (real-time, non-real-time), Source Lines (0-150, >150), and Cyclomatic Complexity (0-3, 4-5, 6-10, >10); the leaves are labeled + or -]

Figure 1. Example hypothetical metric-classification tree. There is one metric at each diamond-shaped decision node. Each decision outcome corresponds to a range of possible metric values. Leaf nodes indicate if a module is likely to have some property, such as being error-prone or containing errors in a certain class (in the figure, "+" means likely to have errors of Type X and "-" means unlikely to have errors of Type X).

◎ How to build a Metric Classification Tree (MCT, here for interface errors)

1. Fix the object of analysis, collect empirical data, and settle the classification rule from it.

Table 1. Interface-error data

Module             A   B   C   D   E   F   G   H   I   J   K   L
Interface errors   3   2   10  1   2   9   1   3   6   2   3   0
Class              -   -   +   -   -   +   -   -   +   -   -   -

※ A cut-off can be derived from experience, e.g. a module counts as "high risk" (a positive) only when it has more than 5 interface errors.

2. Choose the baseline metrics and use them to compute initial values for each chosen module (taken from historical data). This yields data such as the table below. Module functions: File management (F), User interface (I), Process control (P).

Table 2. Raw training-set data

Metric             A   B   C   D   E   F   G   H   I   J   K   L
Module function    I   I   F   I   F   I   P   P   P   I   F   F
Data bindings      2   9   6   13  10  15  6   15  20  4   17  16
Design revisions   11  9   11  0   5   4   2   10  5   7   1   0
Class              -   -   +   -   -   +   -   -   +   -   -   -

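3. According to the degree of influence, divide the values of each metric into ranges so that they can be classified.

Table 3. Recoded training-set data

Metric             A     B     C     D     E     F     G     H     I     J     K     L
Module function    I     I     F     I     F     I     P     P     P     I     F     F
Data bindings      0-7   8-14  0-7   8-14  8-14  >=15  0-7   >=15  >=15  0-7   >=15  >=15
Design revisions   >=9   >=9   >=9   0-3   4-8   4-8   0-3   >=9   4-8   4-8   0-3   0-3
Class              -     -     +     -     -     +     -     -     +     -     -     -

Module function: File management (F); User interface (I); Process control (P)

Data bindings: 0 ≤ x ≤ 7; 8 ≤ x ≤ 14; x ≥ 15

Design revisions: 0 ≤ x ≤ 3; 4 ≤ x ≤ 8; x ≥ 9

4. Evaluate the chosen metrics to decide which metric goes on which node of the MCT. Selection starts from the root, so each time the metric that best separates the "object of analysis" must be chosen.

※ To evaluate the classifying power of a metric, define a "metric-selection function".

※ Once a metric is chosen, it splits the sample modules (i.e. A, B, C, ...) into several subsets according to the value ranges of Step 3.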

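※ Let p_i and n_i be the numbers of "positive" and "negative" modules (as classified in Table 1) in subset i. The metric-selection function is defined as:

$F(p_i, n_i) = -\frac{p_i}{p_i + n_i} \lg \frac{p_i}{p_i + n_i} - \frac{n_i}{p_i + n_i} \lg \frac{n_i}{p_i + n_i}$

※ F(p_i, n_i) reaches its maximum when p_i = n_i.

※ 0 ≤ F(p_i, n_i) ≤ 1

※ The larger F(p_i, n_i) is, the closer the subset's numbers of positive and negative modules are to being equal, i.e. the subset has not been classified well.

※ Since a metric splits the sample modules (i.e. A, B, C, ...) into several subsets (say V of them), the discriminating power of a metric can be evaluated as follows (the weights must be taken into account):

$E(m, M) = \sum_{i=1}^{V} w_i \times F(p_i, n_i)$

where m is the sample set of modules and M is the metric.

** $w_i = (p_i + n_i) \div |m|$ is the share of the sample modules that falls in subset i.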

** From the discussion above, the smaller the value of E(m, M), the stronger the discriminating power of the metric.

Figure 3. A partial tree using module function as the candidate metric. The metric-selection function E({A,B,..,L}, Module Function) returns 0.801. Positive target class instances are underlined.

[Tree: Module function splits the modules into {C, E, K, L} | {A, B, D, F, J} | {G, H, I}]

Figure 4. A partial tree using data bindings as the candidate metric. The metric-selection function E({A,B,..,L}, Data Bindings) returns 0.675. Positive target class instances are underlined.

[Tree: Data bindings splits the modules into {A, C, G, J} | {B, D, E} | {F, H, I, K, L}]

Figure 5. A partial tree using design revision as the candidate metric. The metric-selection function E({A,B,..,L}, Design Revisions) returns 0.603. Positive target class instances are underlined. This metric is selected and its leftmost child becomes a leaf node labeled "-".

[Tree: Design revisions splits the modules into {D, G, K, L} | {E, F, I, J} | {A, B, C, H}]

Figure 5 has the smallest E value, so the metric Design revisions has the strongest discriminating power!

           p   n   total   weight   F(p,n)   w×F(p,n)
Child 1    0   4   12      .333     0.0      0.0
Child 2    2   2   12      .333     1.0      .333
Child 3    1   3   12      .333     .811     .270
Sum                                          .603
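The metric-selection step can be reproduced from the training-set data. The Python sketch below recomputes E for the three candidate metrics using the raw values of Table 2, the value ranges of Step 3, and the "more than 5 interface errors" rule. The helper names `bucket`, `F`, and `E` are illustrative, not from the slides.

```python
import math
from collections import defaultdict

MODULES = "ABCDEFGHIJKL"
INTERFACE_ERRORS = [3, 2, 10, 1, 2, 9, 1, 3, 6, 2, 3, 0]
MODULE_FUNCTION = list("IIFIFIPPPIFF")
DATA_BINDINGS = [2, 9, 6, 13, 10, 15, 6, 15, 20, 4, 17, 16]
DESIGN_REVISIONS = [11, 9, 11, 0, 5, 4, 2, 10, 5, 7, 1, 0]

# A module is "positive" when it has more than 5 interface errors.
POSITIVE = {m for m, errors in zip(MODULES, INTERFACE_ERRORS) if errors > 5}


def bucket(value, upper_bounds):
    """Index of the value range; upper_bounds are the inclusive upper limits."""
    for index, upper in enumerate(upper_bounds):
        if value <= upper:
            return index
    return len(upper_bounds)


def F(p, n):
    """Metric-selection function for one subset with p positives and n negatives."""
    total = p + n
    return -sum((c / total) * math.log2(c / total) for c in (p, n) if c)


def E(sample, code_of):
    """Weighted sum of F over the subsets that the recoded metric induces on the sample."""
    subsets = defaultdict(list)
    for module in sample:
        subsets[code_of[module]].append(module)
    result = 0.0
    for members in subsets.values():
        p = sum(m in POSITIVE for m in members)
        result += len(members) / len(sample) * F(p, len(members) - p)
    return result


CODES = {
    "Module function": dict(zip(MODULES, MODULE_FUNCTION)),
    "Data bindings": {m: bucket(v, (7, 14)) for m, v in zip(MODULES, DATA_BINDINGS)},
    "Design revisions": {m: bucket(v, (3, 8)) for m, v in zip(MODULES, DESIGN_REVISIONS)},
}

for name, code in CODES.items():
    print(f"{name}: E = {E(MODULES, code):.3f}")
```

Up to rounding, the results agree with the values quoted in the captions of Figures 3 to 5 (0.801, 0.675, 0.603), and Design revisions has the smallest E.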

Figure 6. A partial tree using module function as the candidate metric. The metric-selection function E({E, F, I, J}, Module Function) returns 0.500. Positive target class instances are underlined.

[Tree: under Design revisions, Module function splits {E, F, I, J} into {E} | {F, J} | {I}; {A, B, C, H} is not yet expanded]

Figure 7. A partial tree using data bindings as the candidate metric. The metric-selection function E({E, F, I, J}, Data Bindings) returns 0. Positive target class instances are underlined. This metric is selected, yielding three leaf nodes labeled, from left to right, "-", "-", and "+".

[Tree: under Design revisions, Data bindings splits {E, F, I, J} into {J} | {E} | {F, I}; {A, B, C, H} is not yet expanded]

Figure 8. A partial tree using module function as the candidate metric. The metric-selection function E({A, B, C, H}, Module Function) returns 0. Positive target class instances are underlined.

[Tree: under Design revisions, Module function splits {A, B, C, H} into {C} | {A, B} | {H}]

Figure 9. A partial tree using data bindings as the candidate metric. The metric-selection function E({A, B, C, H}, Data Bindings) returns 0.500. Positive target class instances are underlined. For this example, the metric module function is selected (see Figure 8), and it produces three children labeled "+", "-", and "-".

[Tree: under Design revisions, Data bindings splits {A, B, C, H} into {A, C} | {B} | {H}]

Figure 10. The completed classification tree.

Design revisions
  0~3: -
  4~8: Data bindings
    0~7: -
    8~14: -
    >=15: +
  >=9: Module function
    F: +
    I: -
    P: -

Figure 11. Applying the classification tree on module N.

Example: Raw test-set data.

Metric             M    N    O
Design revisions   0    7    12
Module function    P    I    I
Data bindings      3    16   9

Recoded test-set data.

Metric             M     N      O
Design revisions   0-3   4-8    >=9
Module function    P     I      I
Data bindings      0-7   >=15   8-14
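Applying the completed tree to new modules amounts to a few nested comparisons. The Python sketch below hard-codes the tree of Figure 10 in a hypothetical `classify` function and runs it on the test-set modules M, N, and O.

```python
def classify(design_revisions: int, module_function: str, data_bindings: int) -> str:
    """Walk the completed classification tree; '+' means likely to have interface errors."""
    if design_revisions <= 3:                 # 0~3: leaf "-"
        return "-"
    if design_revisions <= 8:                 # 4~8: decide on data bindings
        return "+" if data_bindings >= 15 else "-"
    # >= 9: decide on module function (F: "+", I and P: "-")
    return "+" if module_function == "F" else "-"


# Raw test-set data: (design revisions, module function, data bindings).
TEST_SET = {"M": (0, "P", 3), "N": (7, "I", 16), "O": (12, "I", 9)}

for name, metrics in TEST_SET.items():
    print(name, classify(*metrics))
```

Module N follows the 4~8 branch and has 16 data bindings, so it is the only test module flagged as likely to have interface errors.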

* This figure summarizes the MCT construction process!

[Figure: Overview of the classification-tree methodology.]

Steps shown in the figure:
(A) Define target class
(B) Develop remedial plans
(C) List candidate metrics
(D) Retrieve historical data
(E) Calibration
(F) Generate trees
(G) Collect current project data
(H) Persistent storage manager
(I) Apply trees to current project
(J) Take corrective action

Phases: Data management and calibration; Classification-tree generation; Analysis and feedback.

Data flows between the steps: Target-class definition, Target-class criteria, Metric list, Parameters, Training-set data, Metric data, New metric data, Current project metric data, Trees, Targeted components, Remedial plans, Feedback.

REUSABILITY

[Figure: The basic reusability attributes model. Reusability is decomposed into Costs, Usefulness, and Quality. Attributes shown: Commonality of function (within a system, within a domain, overall), Variety of functions, Identification, Extraction, Qualification, Packaging, Retrieval, Integration, Modification, Use in new systems, Performance (time, space), Readability, Testability, Correctness, Ease of modification. Metrics attached to the attributes: C = Cyclomatic complexity, R = Regularity, RF = Reuse frequency, V = Volume.]
