presented paper

Using Machine Learning to Predict Project Effort: Empirical Case Studies in Data-starved Domains

Gary D. Boetticher

Department of Software Engineering

University of Houston - Clear Lake

What Customers Want

What Requirements Tell Us

Standish Group [Standish94]

• Exceeded planned budget by 90%

• Exceeded planned schedule by 222%

• More than 50% of the projects had less than 50% requirements

Underlying Problems

85% are at CMM 1 or 2 [CMU CMM95, Curtis93]

Scarcity of data

Consequences

Early life-cycle estimates can be off by a factor of 4 [Boehm81, Heemstra92]

Related Research: Economic Models

Early in Lifecycle

Late in Lifecycle

Top-Down COCOMO II COCOMO II

Bottom-Up Function Points

Why are Machine Learning algorithms not used more often for estimating early in the life cycle?

Related Research - 2

Early in Lifecycle

Late in Lifecycle

Bayesian Chulani

CBR Delany Basio, Finnie, Kadoda, Mukhopadhyay, Prietula

GA Cordero

Neural Network Boetticher, Srinivasan, Samson, Wittig

Neurofuzzy Hodgkinson

OSR Briand

Goal

Apply Machine Learning (Neural Network)

early in the software lifecycle

against Empirical Data

Neural Network

Data

• B2B Electronic Commerce Data
  – Delphi-based
  – 104 Vectors

• Fleet Management Software
  – Delphi-based
  – 433 Vectors

Experiment 1: Product-Based Fleet to B2B

                Vector   SLOC   Effort
Training Data   1        26     1
                :        :      :
                434      4398   245
Test Data       1        15     1
                :        :      :
                104      2796   160

Experiment 1: Product Results

Experiment   Actual Correct   % Correct pred(25)
1            11 out of 104    11%
2            10 out of 104    10%
3            11 out of 104    11%
4            7 out of 104     7%
5            12 out of 104    12%
6            2 out of 104     2%
7            8 out of 104     8%
8            10 out of 104    10%
9            14 out of 104    13%
10           10 out of 104    10%
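The pred(25) column can be read as the share of test vectors whose estimated effort falls within 25% of the actual effort, which is the conventional meaning of pred(l). A minimal sketch under that assumption follows; the function name `pred` and the sample values are hypothetical and not taken from the study data.

```python
def pred(actuals, estimates, level=25):
    """Return the fraction of estimates within `level` percent of the actual value."""
    if len(actuals) != len(estimates) or not actuals:
        raise ValueError("actuals and estimates must be non-empty and of equal length")
    tolerance = level / 100.0
    # Count an estimate as "correct" when its relative error is at most the tolerance.
    hits = sum(
        1 for a, e in zip(actuals, estimates)
        if abs(e - a) <= tolerance * abs(a)
    )
    return hits / len(actuals)


# Hypothetical efforts, for illustration only.
actual = [10, 20, 40, 80]
estimate = [12, 30, 41, 50]
score = pred(actual, estimate)  # two of the four estimates are within 25%
```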

Experiment 2: Project-Based Results Fleet to B2B

Experiment   Project Development Effort    Project
Number       Actual        Calculated      Accuracy
1            2083          1958            -6%
2            2083          1962            -6%
3            2083          1998            -4%
4            2083          2238            7%
5            2083          2110            1%
6            2083          3412            64%
7            2083          2555            23%
8            2083          2104            1%
9            2083          2083            0%
10           2083          1777            -15%
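The Project Accuracy column appears consistent with a signed relative error, (calculated - actual) / actual, where the calculated project effort is presumably the sum of the per-vector estimates. A short sketch under that assumption, using the Actual and Calculated values from the table above; the function name `project_accuracy` is hypothetical.

```python
def project_accuracy(actual_total, calculated_total):
    """Signed relative error of a project-level estimate (assumed definition)."""
    return (calculated_total - actual_total) / actual_total


ACTUAL = 2083
# Calculated project efforts for the ten runs, copied from the table.
calculated = [1958, 1962, 1998, 2238, 2110, 3412, 2555, 2104, 2083, 1777]

# Signed accuracy per run, rounded to whole percent.
percent = [round(100 * project_accuracy(ACTUAL, c)) for c in calculated]

# Share of runs whose project estimate lands within 25% of the actual effort.
within_25 = sum(1 for c in calculated if abs(project_accuracy(ACTUAL, c)) <= 0.25)
project_pred25 = within_25 / len(calculated)
```

Under this reading, nine of the ten runs fall within 25% of the actual project effort.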

Experiment 3: Product-Based B2B to Fleet

                Vector   SLOC   Effort
Training Data   1        26     1
                :        :      :
                104      2796   160
Test Data       1        15     1
                :        :      :
                434      4398   245

Extrapolation issue

Largest SLOCs divided by each other

4398 / 2796 = 1.57
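One way to read the extrapolation fix is as a single multiplicative factor: the largest SLOC in the test set divided by the largest SLOC in the training set, applied to each raw estimate. The scaled project efforts reported for Experiment 4 appear consistent with this reading, but the sketch below is an assumption rather than the author's stated procedure, and the name `scale_estimate` is hypothetical.

```python
TRAIN_MAX_SLOC = 2796  # largest SLOC in the B2B (training) data
TEST_MAX_SLOC = 4398   # largest SLOC in the Fleet (test) data

# Ratio of the largest SLOC values, as on the slide.
SCALE = TEST_MAX_SLOC / TRAIN_MAX_SLOC


def scale_estimate(raw_effort):
    """Stretch a raw estimate by the SLOC ratio (assumed scaling rule)."""
    return raw_effort * SCALE


# Two raw project-level estimates taken from the Experiment 4 results.
scaled_first = round(scale_estimate(9464))
scaled_eighth = round(scale_estimate(10855))
```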

Experiment 3: Product Results

Actual Correct    % Correct pred(25)   Actual Correct      % Correct pred(25)
(raw scores)      (raw scores)         (scaled)            (scaled)
(out of 434)                           (out of 434)
130               30%                  142                 33%
133               31%                  96                  22%
78                18%                  179                 41%
118               27%                  172                 40%
132               30%                  136                 31%
130               30%                  117                 27%
134               31%                  68                  16%
146               34%                  241                 56%
130               30%                  117                 27%
106               24%                  118                 43%

Experiment 4: Project-Based Results B2B to Fleet

Calc. Proj. Dev. Effort   Project Accuracy   Calc. Proj. Dev. Effort   Project Accuracy
(Raw Score)               (Raw Score)        (Scaled)                  (Scaled)
(out of 15949)                               (out of 15949)
9464                      -41%               14887                     -7%
8787                      -45%               13821                     -13%
9066                      -43%               14261                     -11%
9809                      -38%               15429                     -3%
9281                      -42%               14599                     -8%
8753                      -45%               13768                     -14%
8640                      -46%               13591                     -15%
10855                     -32%               17074                     7%
8915                      -44%               14022                     -12%
9299                      -42%               14627                     -8%

Results

Experiment                       Neural Network Average   Linear Regression
                                 Accuracy (Pred 25)       (Pred 25)
Fleet to B2B, Product            9%                       16%
Fleet to B2B, Project            90%                      0%
B2B to Fleet, Product (Scaled)   34%                      29%
B2B to Fleet, Project (Scaled)   100%                     100%
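The linear regression baseline can be illustrated with an ordinary least-squares fit of effort against SLOC. The slides do not say which regression form or inputs were used, so the single-predictor model, the function names, and the data points below are all assumptions for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(slope, intercept, x):
    """Estimate effort for a component of size x (SLOC)."""
    return slope * x + intercept


# Hypothetical training vectors (SLOC, effort), not taken from the study.
sloc = [20, 40, 60, 80]
effort = [2, 3, 4, 5]
slope, intercept = fit_line(sloc, effort)
estimate_100 = predict(slope, intercept, 100)
```

Per-component estimates from such a fit could then be scored with pred(25) or summed into a project-level estimate in the same way as the neural network outputs.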

Conclusions

• Bottom-up approach produced very good results on a project-basis

• Results comparable between the neural network and linear regression

• Scaling helped

• Estimation Approach is suitable for Prototype/Iterative Development

Future Directions

• Explore an extrapolation function

• Apply other ML algorithms

• Collect additional metrics

• Integrate with COCOMO II

• Conduct more experiments (additional data)
