Transfer Defect Learning
DESCRIPTION
JC's ICSE 2013 presentation.
TRANSCRIPT
![Page 1: Transfer defect learning](https://reader034.vdocument.in/reader034/viewer/2022052507/558bcf90d8b42aab0b8b47a0/html5/thumbnails/1.jpg)
Transfer Defect Learning
Jaechang Nam, The Hong Kong University of Science and Technology, China
Sinno Jialian Pan, Institute for Infocomm Research, Singapore
Sunghun Kim, The Hong Kong University of Science and Technology, China
![Page 2: Transfer defect learning](https://reader034.vdocument.in/reader034/viewer/2022052507/558bcf90d8b42aab0b8b47a0/html5/thumbnails/2.jpg)
Defect Prediction
• Hassan et al.@ICSE`09, Predicting Faults Using the Complexity of Code Changes
• D’Ambros et al.@MSR`10, An Extensive Comparison of Bug Prediction Approaches
• Rahman et al.@FSE`12, Recalling the “Imprecision” of Cross-project Defect Prediction
• Hata et al.@ICSE`12, Bug Prediction Based on Fine-grained Module Histories
• …
[Figure: Program → Prediction Model (Machine learning) → Future defects]
![Page 4: Transfer defect learning](https://reader034.vdocument.in/reader034/viewer/2022052507/558bcf90d8b42aab0b8b47a0/html5/thumbnails/4.jpg)
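Training prediction model

Training set

| M1 | M2 | … | M19 | M20 | Class |
|---|---|---|---|---|---|
| 11 | 5 | … | 53 | 78 | Buggy |
| … | … | … | … | … | … |
| 1 | 1 | … | 3 | 9 | Clean |

Test set

| M1 | M2 | … | M19 | M20 | Class |
|---|---|---|---|---|---|
| 2 | 1 | … | 2 | 8 | ? |
| … | … | … | … | … | … |
| 13 | 6 | … | 45 | 69 | ? |
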
![Page 5: Transfer defect learning](https://reader034.vdocument.in/reader034/viewer/2022052507/558bcf90d8b42aab0b8b47a0/html5/thumbnails/5.jpg)
Cross prediction model
Target project (Test set)
Source project (Training set)
![Page 7: Transfer defect learning](https://reader034.vdocument.in/reader034/viewer/2022052507/558bcf90d8b42aab0b8b47a0/html5/thumbnails/7.jpg)
Cross-project Defect Prediction
“Training data is often not available, either because a company is too small or it is the first release of a product”
Zimmermann et al.@FSE`09, Cross-project Defect Prediction
“For many new projects we may not have enough historical data to train prediction models.”
Rahman, Posnett, and Devanbu@FSE`12, Recalling the “Imprecision” of Cross-project Defect Prediction
![Page 8: Transfer defect learning](https://reader034.vdocument.in/reader034/viewer/2022052507/558bcf90d8b42aab0b8b47a0/html5/thumbnails/8.jpg)
Cross-project defect prediction
• Zimmermann et al.@FSE`09
– “We ran 622 cross-project predictions and found only 3.4% actually worked.”
[Figure: pie chart. Worked, 3.4%; Not worked, 96.6%]
![Page 9: Transfer defect learning](https://reader034.vdocument.in/reader034/viewer/2022052507/558bcf90d8b42aab0b8b47a0/html5/thumbnails/9.jpg)
Cross-company defect prediction
• Turhan and Menzies et al.@ESEJ`09
– “Within-company data models are still the best”
[Figure: bar chart of Avg. F-measure for Cross, Cross with a NN filter, and Within]
![Page 10: Transfer defect learning](https://reader034.vdocument.in/reader034/viewer/2022052507/558bcf90d8b42aab0b8b47a0/html5/thumbnails/10.jpg)
Cross-project defect prediction
• Rahman, Posnett, and Devanbu@FSE`12
[Figure: bar chart of Avg. F-measure for Cross and Within]
![Page 11: Transfer defect learning](https://reader034.vdocument.in/reader034/viewer/2022052507/558bcf90d8b42aab0b8b47a0/html5/thumbnails/11.jpg)
Cross prediction results
[Figure: bar chart of F-measure, Cross vs. Within, for Equinox, JDT, and Lucene]
![Page 12: Transfer defect learning](https://reader034.vdocument.in/reader034/viewer/2022052507/558bcf90d8b42aab0b8b47a0/html5/thumbnails/12.jpg)
Approaches of Transfer Defect Learning
• Normalization
• TCA
• TCA+
![Page 13: Transfer defect learning](https://reader034.vdocument.in/reader034/viewer/2022052507/558bcf90d8b42aab0b8b47a0/html5/thumbnails/13.jpg)
Approaches of Transfer Defect Learning
• Normalization
– Data preprocessing for training and test data
• TCA
– A state-of-the-art transfer learning algorithm
– Transfer Component Analysis
• TCA+
– Adapted TCA for cross-project defect prediction
– Decision rules to select a suitable data normalization option
![Page 14: Transfer defect learning](https://reader034.vdocument.in/reader034/viewer/2022052507/558bcf90d8b42aab0b8b47a0/html5/thumbnails/14.jpg)
Data Normalization
• Adjust all feature values to the same scale
– E.g., make mean = 0 and std = 1
• Known to help classification algorithms improve prediction performance [Han et al., 2012].
![Page 15: Transfer defect learning](https://reader034.vdocument.in/reader034/viewer/2022052507/558bcf90d8b42aab0b8b47a0/html5/thumbnails/15.jpg)
Normalization Options
• N1: Min-max Normalization (max=1, min=0) [Han et al., 2012]
• N2: Z-score Normalization (mean=0, std=1) [Han et al., 2012]
• N3: Z-score Normalization only using source mean and standard deviation
• N4: Z-score Normalization only using target mean and standard deviation
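The four options can be sketched for a single metric column as follows. This is a minimal illustration, not the authors' implementation: the function names, the sample values, the use of the population standard deviation, and the choice to apply N1 and N2 to source and target independently are all assumptions made for the example.

```python
from statistics import mean, pstdev

def min_max(column):
    """N1: rescale one feature column to the range [0, 1]."""
    lo, hi = min(column), max(column)
    span = (hi - lo) or 1.0  # avoid dividing by zero for a constant column
    return [(v - lo) / span for v in column]

def z_score(column, mu, sigma):
    """Shift by mu and scale by sigma (a zero sigma is treated as 1)."""
    sigma = sigma or 1.0
    return [(v - mu) / sigma for v in column]

def normalize(source_col, target_col, option):
    """Return (source, target) columns normalized with option N1..N4."""
    if option == "N1":  # assumption: each dataset is rescaled on its own
        return min_max(source_col), min_max(target_col)
    if option == "N2":  # assumption: each dataset uses its own mean and std
        return (z_score(source_col, mean(source_col), pstdev(source_col)),
                z_score(target_col, mean(target_col), pstdev(target_col)))
    if option == "N3":  # source statistics applied to both datasets
        mu, sigma = mean(source_col), pstdev(source_col)
    elif option == "N4":  # target statistics applied to both datasets
        mu, sigma = mean(target_col), pstdev(target_col)
    else:
        raise ValueError(f"unknown option: {option}")
    return z_score(source_col, mu, sigma), z_score(target_col, mu, sigma)

# Hypothetical values of one metric in a source and a target project.
source_m1 = [11, 1, 5, 3]
target_m1 = [2, 13, 6, 7]
n1_source, n1_target = normalize(source_m1, target_m1, "N1")
n3_source, n3_target = normalize(source_m1, target_m1, "N3")
n4_source, n4_target = normalize(source_m1, target_m1, "N4")
```

Under N3 only the source column ends up with mean 0; under N4 only the target column does, which is what distinguishes these two options from N2.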
![Page 19: Transfer defect learning](https://reader034.vdocument.in/reader034/viewer/2022052507/558bcf90d8b42aab0b8b47a0/html5/thumbnails/19.jpg)
Transfer Learning
[Figure: Traditional Machine Learning (ML) shown as two separate learning systems; Transfer Learning shown as two learning systems linked by knowledge transfer]
![Page 21: Transfer defect learning](https://reader034.vdocument.in/reader034/viewer/2022052507/558bcf90d8b42aab0b8b47a0/html5/thumbnails/21.jpg)
A Common Assumption in Traditional ML
Pan and Yang@TKDE`10, Survey on Transfer Learning
• Same distribution
[Figure build over pages 21 to 23; labels added on later pages: Cross Prediction, Transfer Learning]
![Page 24: Transfer defect learning](https://reader034.vdocument.in/reader034/viewer/2022052507/558bcf90d8b42aab0b8b47a0/html5/thumbnails/24.jpg)
Transfer Component Analysis
• Unsupervised transfer learning
– Target project labels are not known.
• Source and target must have the same feature space
• Makes the distributions of training and test datasets similar
Pan et al.@TNN`10, Domain Adaptation via Transfer Component Analysis
![Page 25: Transfer defect learning](https://reader034.vdocument.in/reader034/viewer/2022052507/558bcf90d8b42aab0b8b47a0/html5/thumbnails/25.jpg)
Transfer Component Analysis (cont.)
• Feature extraction approach
– Dimensionality reduction
– Projection
• Map original data into a lower-dimensional feature space
– C.f. Principal Component Analysis (PCA)
[Figure build over pages 25 to 30: data in a 2-dimensional feature space mapped to a 1-dimensional feature space]
![Page 31: Transfer defect learning](https://reader034.vdocument.in/reader034/viewer/2022052507/558bcf90d8b42aab0b8b47a0/html5/thumbnails/31.jpg)
Transfer Component Analysis (cont.)
[Figure legend: Target domain data, Source domain data]
Pan et al.@TNN`10, Domain Adaptation via Transfer Component Analysis
![Page 32: Transfer defect learning](https://reader034.vdocument.in/reader034/viewer/2022052507/558bcf90d8b42aab0b8b47a0/html5/thumbnails/32.jpg)
Transfer Component Analysis (cont.)
[Figure panels: PCA, TCA]
Pan et al.@TNN`10, Domain Adaptation via Transfer Component Analysis
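The idea of mapping source and target data into a shared lower-dimensional space can be sketched with NumPy. This is a simplified illustration of TCA under stated assumptions, not the authors' code: it uses a linear kernel, a trade-off parameter `mu` picked arbitrarily, and a hypothetical function name `tca`; the random input data is made up.

```python
import numpy as np

def tca(Xs, Xt, n_components=2, mu=1.0):
    """Project source (Xs) and target (Xt) rows into a shared latent space."""
    ns, nt = len(Xs), len(Xt)
    n = ns + nt
    X = np.vstack([Xs, Xt])
    K = X @ X.T  # linear kernel over all instances (assumption)

    # Coefficient matrix of the distance between source and target means.
    e = np.vstack([np.full((ns, 1), 1.0 / ns), np.full((nt, 1), -1.0 / nt)])
    L = e @ e.T

    # Centering matrix, used to preserve variance in the projected data.
    H = np.eye(n) - np.full((n, n), 1.0 / n)

    # Leading eigenvectors of (K L K + mu I)^-1 K H K give the projection.
    A = np.linalg.solve(K @ L @ K + mu * np.eye(n), K @ H @ K)
    eigvals, eigvecs = np.linalg.eig(A)
    order = np.argsort(-eigvals.real)[:n_components]
    W = eigvecs[:, order].real

    Z = K @ W  # embedded source and target instances
    return Z[:ns], Z[ns:]

# Made-up data: 30 source and 20 target instances with 5 shared metrics.
rng = np.random.default_rng(0)
Xs = rng.normal(0.0, 1.0, size=(30, 5))
Xt = rng.normal(2.0, 1.0, size=(20, 5))
Zs, Zt = tca(Xs, Xt, n_components=2)
```

A classifier would then be trained on `Zs` with the source labels and applied to `Zt`; no target labels are used at any point, which matches the unsupervised setting described earlier.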
![Page 33: Transfer defect learning](https://reader034.vdocument.in/reader034/viewer/2022052507/558bcf90d8b42aab0b8b47a0/html5/thumbnails/33.jpg)
Preliminary Results using TCA
[Figure build over pages 33 and 34: bar charts of F-measure for Safe → Apache and Apache → Safe, comparing Baseline, NoN, N1, N2, N3, and N4]
*Baseline: Cross-project defect prediction without TCA and normalization
Prediction performance of TCA varies according to different normalization options!
![Page 36: Transfer defect learning](https://reader034.vdocument.in/reader034/viewer/2022052507/558bcf90d8b42aab0b8b47a0/html5/thumbnails/36.jpg)
TCA+: Decision rules
• Find a suitable normalization for TCA
• Steps
– #1: Characterize a dataset
– #2: Measure similarity between source and target datasets
– #3: Decision rules
![Page 37: Transfer defect learning](https://reader034.vdocument.in/reader034/viewer/2022052507/558bcf90d8b42aab0b8b47a0/html5/thumbnails/37.jpg)
#1: Characterize a dataset
[Figure: instances of Dataset A and Dataset B drawn as numbered points, with pairwise distances labeled, e.g., d1,2, d1,3, d1,5, d2,6, d3,11]
$DIST_A = \{ d_{ij} : 1 \le i < n,\ 1 < j \le n,\ i < j \}$
![Page 38: Transfer defect learning](https://reader034.vdocument.in/reader034/viewer/2022052507/558bcf90d8b42aab0b8b47a0/html5/thumbnails/38.jpg)
#2: Measure Similarity between source and target
• Minimum (min) and maximum (max) values of DIST
• Mean and standard deviation (std) of DIST
• The number of instances
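Steps #1 and #2 can be sketched as computing every pairwise distance within a dataset and summarizing the result. This is a minimal illustration: Euclidean distance, the population standard deviation, the function name `characterize`, and the sample points are assumptions made for the example.

```python
from itertools import combinations
from math import dist
from statistics import mean, pstdev

def characterize(instances):
    """Summarize a dataset by the distances between all pairs of instances."""
    # One distance d_ij per unordered pair (i < j); Euclidean is an assumption.
    dists = [dist(a, b) for a, b in combinations(instances, 2)]
    return {
        "min": min(dists),
        "max": max(dists),
        "mean": mean(dists),
        "std": pstdev(dists),
        "num_instances": len(instances),
    }

# Hypothetical dataset with three instances and two metrics each.
dataset_a = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]
summary = characterize(dataset_a)
```

Computing the same summary for the source and the target dataset gives the five values that the similarity comparison works on.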
![Page 39: Transfer defect learning](https://reader034.vdocument.in/reader034/viewer/2022052507/558bcf90d8b42aab0b8b47a0/html5/thumbnails/39.jpg)
#3: Decision Rules
• Rule #1
– Mean and Std are same → NoN
• Rule #2
– Max and Min are different → N1 (max=1, min=0)
• Rule #3, #4
– Std and # of instances are different → N3 or N4 (src/tgt mean=0, std=1)
• Rule #5
– Default → N2 (mean=0, std=1)
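The rule order above can be sketched as a chain of checks. This is only an illustration of the slide's wording: the slide does not define "same" or "different", so the relative tolerance `tol` and the helper `similar` are hypothetical, and because the slide does not say when N3 applies rather than N4, the sketch returns both.

```python
def similar(a, b, tol=0.1):
    """Hypothetical test: a relative difference within tol counts as 'same'."""
    scale = max(abs(a), abs(b)) or 1.0
    return abs(a - b) / scale <= tol

def choose_normalization(src, tgt, tol=0.1):
    """Pick a normalization option from source and target characteristics."""
    if (similar(src["mean"], tgt["mean"], tol)
            and similar(src["std"], tgt["std"], tol)):
        return "NoN"       # Rule #1
    if (not similar(src["max"], tgt["max"], tol)
            and not similar(src["min"], tgt["min"], tol)):
        return "N1"        # Rule #2
    if (not similar(src["std"], tgt["std"], tol)
            and not similar(src["num_instances"], tgt["num_instances"], tol)):
        return "N3 or N4"  # Rules #3 and #4; direction not given on the slide
    return "N2"            # Rule #5 (default)

# Hypothetical characteristics of a source dataset and several targets.
src = {"min": 1, "max": 10, "mean": 5, "std": 2, "num_instances": 100}
same_as_src = dict(src)
wider_range = {"min": 5, "max": 50, "mean": 20, "std": 2, "num_instances": 100}
more_spread = {"min": 1, "max": 10, "mean": 8, "std": 6, "num_instances": 500}
shifted_mean = {"min": 1, "max": 10, "mean": 8, "std": 2, "num_instances": 100}
```

The rules are evaluated in order, so a pair of datasets that satisfies Rule #1 never reaches the later checks.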
![Page 40: Transfer defect learning](https://reader034.vdocument.in/reader034/viewer/2022052507/558bcf90d8b42aab0b8b47a0/html5/thumbnails/40.jpg)
EVALUATION
![Page 41: Transfer defect learning](https://reader034.vdocument.in/reader034/viewer/2022052507/558bcf90d8b42aab0b8b47a0/html5/thumbnails/41.jpg)
Experimental Setup
• 8 software subjects
• Machine learning algorithm
– Logistic regression

ReLink (Wu et al.@FSE`11)

| Projects | # of metrics (features) |
|---|---|
| Apache, Safe, ZXing | 26 (Source code) |

AEEEM (D’Ambros et al.@MSR`10)

| Projects | # of metrics (features) |
|---|---|
| Apache Lucene (LC), Equinox (EQ), Eclipse JDT, Eclipse PDE UI, Mylyn (ML) | 61 (Source code, Churn, Entropy, …) |

![Page 42: Transfer defect learning](https://reader034.vdocument.in/reader034/viewer/2022052507/558bcf90d8b42aab0b8b47a0/html5/thumbnails/42.jpg)
Experimental Design
Within-project defect prediction
Training set (50%)
Test set (50%)
![Page 43: Transfer defect learning](https://reader034.vdocument.in/reader034/viewer/2022052507/558bcf90d8b42aab0b8b47a0/html5/thumbnails/43.jpg)
Experimental Design
Cross-project defect prediction
Source project (Training set)
Target project (Test set)
![Page 44: Transfer defect learning](https://reader034.vdocument.in/reader034/viewer/2022052507/558bcf90d8b42aab0b8b47a0/html5/thumbnails/44.jpg)
Experimental Design
Cross-project defect prediction with TCA/TCA+
Source project (Training set)
Target project (Test set)
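Prediction performance in this deck is reported as F-measure. As a reminder of what that number means, the sketch below computes it as the harmonic mean of precision and recall for the buggy class; the function name `f_measure` and the label lists are made up for the example.

```python
def f_measure(actual, predicted, positive="Buggy"):
    """Harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(actual, predicted))
    tp = sum(a == positive and p == positive for a, p in pairs)
    fp = sum(a != positive and p == positive for a, p in pairs)
    fn = sum(a == positive and p != positive for a, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Hypothetical labels for five instances of a target project.
actual = ["Buggy", "Buggy", "Clean", "Clean", "Buggy"]
predicted = ["Buggy", "Clean", "Clean", "Buggy", "Buggy"]
score = f_measure(actual, predicted)
```

Here two buggy instances are found, one is missed, and one clean instance is flagged, so precision and recall are both 2/3 and so is the F-measure.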
![Page 45: Transfer defect learning](https://reader034.vdocument.in/reader034/viewer/2022052507/558bcf90d8b42aab0b8b47a0/html5/thumbnails/45.jpg)
RESULTS
![Page 46: Transfer defect learning](https://reader034.vdocument.in/reader034/viewer/2022052507/558bcf90d8b42aab0b8b47a0/html5/thumbnails/46.jpg)
ReLink Result
[Figure: bar charts of F-measure for Safe → Apache, Apache → Safe, and Safe → ZXing, comparing Baseline, TCA, TCA+, and Within]
*Baseline: Cross-project defect prediction without TCA/TCA+
![Page 47: Transfer defect learning](https://reader034.vdocument.in/reader034/viewer/2022052507/558bcf90d8b42aab0b8b47a0/html5/thumbnails/47.jpg)
ReLink Result F-measure

| Source → Target | Baseline | TCA | TCA+ | Within (Target → Target) |
|---|---|---|---|---|
| Safe → Apache | 0.52 | 0.64 | 0.64 | 0.64 |
| ZXing → Apache | 0.69 | 0.64 | 0.72 | 0.64 |
| Apache → Safe | 0.49 | 0.72 | 0.72 | 0.62 |
| ZXing → Safe | 0.59 | 0.70 | 0.64 | 0.62 |
| Apache → ZXing | 0.46 | 0.45 | 0.49 | 0.33 |
| Safe → ZXing | 0.10 | 0.42 | 0.53 | 0.33 |
| Average | 0.49 | 0.59 | 0.61 | 0.53 |

*Baseline: Cross-project defect prediction without TCA/TCA+
![Page 48: Transfer defect learning](https://reader034.vdocument.in/reader034/viewer/2022052507/558bcf90d8b42aab0b8b47a0/html5/thumbnails/48.jpg)
AEEEM Result
[Figure: bar charts of F-measure for JDT → EQ, PDE → LC, and PDE → ML, comparing Baseline, TCA, TCA+, and Within]
*Baseline: Cross-project defect prediction without TCA/TCA+
![Page 49: Transfer defect learning](https://reader034.vdocument.in/reader034/viewer/2022052507/558bcf90d8b42aab0b8b47a0/html5/thumbnails/49.jpg)
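AEEEM Result F-measure

| Source → Target | Baseline | TCA | TCA+ | Within |
|---|---|---|---|---|
| JDT → EQ | 0.31 | 0.59 | 0.60 | 0.58 |
| LC → EQ | 0.50 | 0.62 | 0.62 | 0.58 |
| ML → EQ | 0.24 | 0.56 | 0.56 | 0.58 |
| … | … | … | … | … |
| PDE → LC | 0.33 | 0.27 | 0.33 | 0.37 |
| EQ → ML | 0.19 | 0.62 | 0.62 | 0.30 |
| JDT → ML | 0.27 | 0.56 | 0.56 | 0.30 |
| LC → ML | 0.20 | 0.58 | 0.60 | 0.30 |
| PDE → ML | 0.27 | 0.48 | 0.54 | 0.30 |
| … | … | … | … | … |
| Average | 0.32 | 0.41 | 0.41 | 0.42 |
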
![Page 50: Transfer defect learning](https://reader034.vdocument.in/reader034/viewer/2022052507/558bcf90d8b42aab0b8b47a0/html5/thumbnails/50.jpg)
Threats to Validity
• Systems are open-source projects.
• Experimental results may not be generalizable.
• Decision rules in TCA+ may not be generalizable.
![Page 51: Transfer defect learning](https://reader034.vdocument.in/reader034/viewer/2022052507/558bcf90d8b42aab0b8b47a0/html5/thumbnails/51.jpg)
Future Work
• Transfer defect learning on different feature spaces
– e.g., ReLink → AEEEM, AEEEM → ReLink
• Local models using transfer learning
• Adapt transfer learning to other Software Engineering (SE) problems
– e.g., Knowledge from mailing lists → Bug triage problem
![Page 52: Transfer defect learning](https://reader034.vdocument.in/reader034/viewer/2022052507/558bcf90d8b42aab0b8b47a0/html5/thumbnails/52.jpg)
Conclusion
• TCA+
– TCA
• Make distributions of source and target similar
– Decision rules to improve TCA
– Significantly improved cross-project defect prediction performance
• Transfer Learning in SE
– Transfer learning may benefit other prediction and recommendation systems in SE domains.