a machine learning approach to predict software defects
Escalation Prediction on Defects Database
Guide: Dr. K. V. Subramaniam
Project author: Chetan Hireholi, 01FM14ESE006
Problem statement
Determine what led to an escalation by interpreting the defects corpus of the customer support cases.
Alert on escalations based on the nature of the defects, correlate escalations with the defects discovered by customers, and find the trigger point that leads to an escalation.
Data Source
Incident Database: contained the customer support cases.
CRs Database: internally used database that details the cases which were Change Requests (CRs).
Data Cleansing
The data in the Incidents and CRs Databases had many discrepancies (e.g. rows out of order, special characters in the date field, and multiple spellings of company names, viz. Boeing, Boeing Inc.).
Tools such as OpenRefine and Microsoft Excel helped remove these discrepancies.
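The project did this cleansing in OpenRefine and Excel. Purely as an illustration, the same kind of fix-up could be scripted as in the minimal Python sketch below. The alias table, the function names and the set of characters allowed in a date are assumptions made for the example, not details taken from the project.

```python
import re

# Hypothetical alias table: cleaned-up spellings mapped to one canonical name.
CANONICAL = {"boeing": "BOEING", "boeing inc": "BOEING"}


def normalize_company(name: str) -> str:
    """Collapse spelling variants of a company name to one canonical form."""
    key = re.sub(r"[^a-z0-9 ]", "", name.lower())   # drop punctuation
    key = re.sub(r"\s+", " ", key).strip()           # collapse whitespace
    return CANONICAL.get(key, key.upper())


def clean_date(raw: str) -> str:
    """Strip stray characters from a date field, keeping digits and separators."""
    return re.sub(r"[^0-9/\-: ]", "", raw).strip()
```

With this sketch, both "Boeing" and "Boeing Inc.," map to the same key, so later per-customer counts are not split across spellings.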
[Figure: escalation counts in the Incident Database. Green: 3831, Red: 125, Yellow: 329]
Understanding the workflow
Algorithms
[Figure: J48 Decision Tree, correctly vs. incorrectly classified instances; values shown on the chart: 20.779 and 70.22]
1. J48 Decision Tree:
a. Attributes selected: Escalation, Expectation, Modules, Severity.
2. Naïve Bayes (RED & YELLOW corpus):
a. Attributes selected: Escalation, Expectation, Modules, Severity.
b. Probability distribution for:
i. RED Escalation: 0.242 (24.2%)
ii. YELLOW Escalation: 0.758 (75.8%)
iii. When Escalation is RED, the Severity is most likely URGENT, with probability 0.449 (44.9%)
iv. When Escalation is YELLOW, the Severity is most likely HIGH, with probability 0.634 (63.4%)
3. Simple K-Means method:
a. Cluster 1 formed: YELLOW, Investigate Issue & Hotfix required, Installation, High
b. Cluster 2 formed: RED, Investigate Issue, Installation, High
Motivation to do Textual Analysis: The discussion between the client and the developer is captured in the ‘Comments’ attribute of the Incidents Database. Analyzing it can unearth additional information about the defects (viz. what triggered the escalation, the initial escalation of a defect, the nature of the client, etc.). This led to the use of R for text mining.
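The Naïve Bayes figures listed above are of two kinds: a class prior, P(Escalation), and a per-class likelihood, P(Severity | Escalation). As a rough sketch of how such tables are estimated from labelled rows, the Python below counts occurrences without any smoothing, which a real toolkit would likely add. The function name and the toy sample are hypothetical and do not reproduce the project's numbers.

```python
from collections import Counter


def naive_bayes_tables(rows):
    """rows: iterable of (escalation, severity) pairs.

    Returns (priors, likelihoods), where priors[c] = P(c) and
    likelihoods[c][s] = P(severity = s | escalation = c).
    """
    rows = list(rows)
    class_counts = Counter(esc for esc, _ in rows)
    pair_counts = Counter(rows)
    priors = {c: n / len(rows) for c, n in class_counts.items()}
    likelihoods = {c: {} for c in class_counts}
    for (c, s), n in pair_counts.items():
        likelihoods[c][s] = n / class_counts[c]
    return priors, likelihoods


# Hypothetical toy sample, not the project's data.
sample = [("RED", "URGENT"), ("RED", "HIGH"),
          ("YELLOW", "HIGH"), ("YELLOW", "HIGH"),
          ("YELLOW", "HIGH"), ("YELLOW", "MEDIUM")]
priors, likelihoods = naive_bayes_tables(sample)
```

The most likely severity for a class is then simply the entry with the largest value in `likelihoods[c]`.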
Text Mining using R
Why R over NLTK (Python)?
Easy to code, abundant packages
Faster pre-processing of the text
Mining the e-mail dump
Create corpus (RED, YELLOW & GREEN)
Pre-processing of the text (removing punctuation, stop words, numbers, noise)
Apply the ‘tm’ package for text mining the corpus
Extract graphs and word clouds of the trigger points that are causing escalations
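The project ran these steps in R with the ‘tm’ package. The sketch below is only a Python stand-in for the same flow (build a corpus, pre-process it, count terms), using the standard library. The stop-word list and the sample comments are invented for the example.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list; a real run would use a fuller one.
STOP_WORDS = {"the", "a", "an", "is", "to", "and", "of", "in", "on", "for", "this", "we"}


def preprocess(text):
    """Lower-case the text, strip punctuation and numbers, drop stop words."""
    text = text.lower()
    text = re.sub(r"[^a-z\s]", " ", text)   # removes punctuation and digits
    return [w for w in text.split() if w not in STOP_WORDS]


def term_frequencies(corpus):
    """Count term occurrences over a list of documents."""
    counts = Counter()
    for doc in corpus:
        counts.update(preprocess(doc))
    return counts


# Hypothetical comments standing in for one escalation corpus.
red_corpus = ["Please escalate: the agent install failed 3 times!",
              "Support, please check the install on node 12."]
freqs = term_frequencies(red_corpus)
```

`freqs.most_common(n)` would give the terms to plot as a frequency graph or word cloud.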
Results from Text mining
Final escalation state = GREEN; observations made prior to RED
Most frequently used
The affected module
Final escalation state = GREEN; observations made prior to YELLOW
Aiding words / prefix-postfix
Most frequently used
Words with highest frequency mined
Final escalation state = YELLOW; observations made prior to RED (only 4 cases)
Developer who is associated with the bug/incident
Final escalation state = RED; observations made prior to RED (incidents jumped to RED from the YELLOW state)
Most frequently used
The affected module
Observations made on the RED corpus (the whole RED escalated dump)
The term “escalation” used along with “please” and “support” indicates that the escalation is RED or will be converted to RED.
Observations made on the GREEN corpus (the whole GREEN escalated dump)
The use of “please” is infrequent, which in turn indicates that there are not many RED escalations in the incident history.
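Read as a rule of thumb, the RED-corpus observation above could be turned into a simple flag on a comment. The sketch below is an illustration only: the function name and the exact-word matching are assumptions, and it is not a classifier used in the project.

```python
import re

# Trigger words taken from the RED-corpus observation.
TRIGGER_WORDS = {"escalation", "please", "support"}


def looks_like_red(comment):
    """Return True when all three trigger words occur in the comment."""
    words = set(re.findall(r"[a-z]+", comment.lower()))
    return TRIGGER_WORDS <= words
```

Because it matches whole words only, variants such as "escalate" are not counted; stemming would be needed to catch those.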
Escalation count on the defect dump
[Figure: escalation counts on the defect dump (y-axis 0 to 4500). Green: 3831, Red: 125, Yellow: 329]
Other observations made on Incidents
For RED cases:
(Where SEVERITY is URGENT) Average number of days for a case to get escalated = 13.56 days
(Where SEVERITY is HIGH) Average number of days for a case to get escalated = 25.29 days
(Where SEVERITY is MEDIUM) Average number of days for a case to get escalated = 19.66 days
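Averages of this kind come from grouping the cases by severity and taking the mean number of days between opening and escalation. A minimal Python sketch follows, assuming each case carries a severity, an opening date and an escalation date. The field layout and the sample records are hypothetical.

```python
from collections import defaultdict
from datetime import date


def mean_days_to_escalation(cases):
    """cases: iterable of (severity, opened, escalated) with date objects.

    Returns {severity: mean number of days between opening and escalation}.
    """
    days = defaultdict(list)
    for severity, opened, escalated in cases:
        days[severity].append((escalated - opened).days)
    return {sev: sum(v) / len(v) for sev, v in days.items()}


# Hypothetical records, not taken from the Incident Database.
cases = [("URGENT", date(2015, 1, 1), date(2015, 1, 11)),
         ("URGENT", date(2015, 2, 1), date(2015, 2, 21)),
         ("HIGH", date(2015, 3, 1), date(2015, 3, 31))]
averages = mean_days_to_escalation(cases)
```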
Analyzing Incidents: Customers vs Escalations
RHEINENERGIE, HEWLETT PACKARD, DEUTSCHE BUNDESBANK: Highest number of RED escalations
[Figure: RED escalations per customer (y-axis 0 to 4.5). Highest count: 4, followed by two customers with 3; all remaining customers have 2 or 1. Axis labels visible: RHEINENERGIE, Hewlett Packard Ltd., CHOREGIE]
Analyzing Incidents: Modules vs Escalations
Total RED escalations: 125/6433. The chart below shows the number of RED escalations per module.
Ops - Monitor Agent (opcmona) & Installation: Highest number of RED escalations
[Figure: RED escalations per module (y-axis 0 to 35)]
Ops - Monitor Agent (opcmona): 31
Installation: 12
Perf - Collector: 9
Ops - Message Agent (opcmsga): 9
Ops - Trap Interceptor (opctrapi): 8
Ops - Logfile Encapsulator (opcle): 6
LCore - BBC: 5
Ops - Other: 5
Ops - Action Agent (opcacta): 4
Perf - Coda: 4
Ops - Agent Repository (agtrep): 4
LCore - XPL: 3
Documentation: 3
LCore - Deploy: 2
Cluster Awareness (ClAw): 2
Other: 2
Ops - Message Interceptor (opcmsgi): 2
Perf - Other: 2
Ops - OpsAgt: 2
Perf: 2
Unknown: 1
Perf - GlancePlus: 1
Collection Framework: 1
Perf - Alarm: 1
LCore - Control: 1
Perf - ARM: 1
LCore - Config: 1
LCore - Security: 1
Analyzing Incidents: S/w release vs Escalations
Row Labels: Count of ESCALATION
8.6: 28
11.14: 12
11.02: 12
11.03: 11
11: 11
11.11: 10
11.13: 9
8.60.501: 7
11.12: 6
11.04: 6
11.01: 6
11.1: 3
unknown: 3
8.53 patch: 1
Grand Total: 125
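As a quick sanity check, the per-release counts in the table above can be summed and compared with the grand total, and each release's share of the RED escalations can be derived. The Python sketch below uses the counts exactly as listed. The variable and function names are made up for the example.

```python
# Counts of RED escalations per software release, as listed in the table above.
release_counts = {
    "8.6": 28, "11.14": 12, "11.02": 12, "11.03": 11, "11": 11,
    "11.11": 10, "11.13": 9, "8.60.501": 7, "11.12": 6, "11.04": 6,
    "11.01": 6, "11.1": 3, "unknown": 3, "8.53 patch": 1,
}

total = sum(release_counts.values())


def share(release):
    """Percentage of all RED escalations attributed to one release."""
    return round(100 * release_counts[release] / total, 2)
```

The counts add up to the stated grand total of 125, and release 8.6 alone accounts for 22.4% of the RED escalations.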
Analyzing Incidents: OS vs Escalations
[Figure: RED escalations per operating system (y-axis 0 to 90). The OS field is blank for 83 cases; every other value (VMWare, Solaris, Windows, HP-UX, AIX, Linux variants, CentOS, Other) accounts for 4 or fewer]
Analyzing Incidents: Developer vs Escalations
prasad.m.k_hp.com: Handled high number of escalations
[Figure: RED escalations handled per developer (y-axis 0 to 35), 26 developers in total. prasad.m.k_hp.com: 29; vachan.b-s_hp.com: 15; every other developer: 10 or fewer]
Analyzing CR data
Escalations in CR
Count of ESCALATION by value:
N: 10219
Showstopper: 75
Y: 93
Grand Total: 10387
Note: For defects or CRs (QCCR), Showstopper is marked on defects that are must-fixes or that need an immediate fix for a release.
Analyzing CRs: Customers vs Escalations
TATA CONSULTANCY SERVICES LTD: Highest “Showstopper” escalations
Allegis, NORTHROP GRUMMAN, PepperWeed: Highest escalations
[Figure: CR escalations per customer, series Showstopper and Y (y-axis 0 to 2.5). Highest count: 2; all remaining bars: 1]
Analyzing CRs: Modules vs Escalations
Ops - Monitor Agent (opcmona) & Installation: Highest “Showstopper” escalations
Installation & Lcore - Other: Highest escalations
[Figure: CR escalations per module, series Showstopper and Y (y-axis 0 to 35)]
Analyzing CRs: S/w release vs Escalations
Release 11: Highest number of “Showstopper” and “Y” escalations
[Figure: CR escalations per software release, series Showstopper and Y (y-axis 0 to 70)]
Analyzing CRs: OS vs Escalations
Windows (version number not clear): Highest number of escalations, both “Showstopper” and “Y”
[Figure: CR escalations per operating system, series Showstopper and Y (y-axis 0 to 10)]
Note: Submitters of CRs tend to fill in the OS field as they see fit. Some choose the exact version where the issue was seen or reported; others choose only a high-level OS name. No strict rules were observed.
Analyzing CRs: Developer vs Escalations
swati.sinha_hp.com: Handled highest number of Showstopper Escalations
umesh.sharoff_hp.com : Handled highest number of Escalations
[Figure: CR escalations handled per developer, series Showstopper and Y (y-axis 0 to 10)]
Company behavior analysis: RHEINENERGIE (had the maximum RED escalations)
28 incident cases
Patterns observed:
◦ 6 RED escalations (6/28); 21.43% chance that a logged incident will be a RED escalation
◦ Most reported modules:
  ◦ Ops - Monitor Agent (opcmona) (7 nos.); 3 of them were RED escalated
  ◦ Installation (6 nos.)
  ◦ Perf - Collector (3 nos.)
◦ Average number of days to handle a single incident: 73.5 days
◦ Number of incidents that moved to CR: 15; 53.57% of the incidents moved to CRs
  ◦ All 6 RED escalations moved to CR
  ◦ 8 GREEN escalations moved to CR
  ◦ 1 YELLOW escalation moved to CR
Company behavior analysis: APPLE INC
27 incident cases
Patterns observed:
No RED escalations ever
Mostly contains GREEN escalations (19/27); 70.37% chance that an incident logged in will be a GREEN escalation
Most reported modules:
◦ Ops Monitor Agent (4 nos.)
◦ Perf Collector (3 nos.)
◦ Installation, Ops - Action Agent, Ops - Ops Agent, Perf Other (2 nos. each)
Average number of days a single incident handled: 463.777 days
Number of incidents which move to CR: 10; 37.03% of the incidents move to CRs
Company behavior analysis: BOEING
33 incident cases
Patterns observed:
1 RED escalation
Mostly contains GREEN escalations (31/33); 93.93% chance that an incident logged in will be a GREEN escalation
Most reported modules:
◦ Installation (7 nos.)
◦ Perf Collector, Other (5 nos.)
◦ Perf GlancePlus (4 nos.)
◦ Perf ARM (RED escalation); 3% chance that it will be a RED escalation
Average number of days a single incident handled: 399.322 days
Number of incidents which move to CR: 22; 66.66% of the incidents move to CRs
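The "chance" figures in the three company analyses are plain ratios of case counts. The sketch below recomputes them from the numerators and denominators quoted on the slides. It rounds to the nearest hundredth, so the last digit can differ from figures that were truncated. The helper name and dictionary keys are made up for the example.

```python
def pct(part, whole):
    """Percentage of `part` in `whole`, rounded to two decimals."""
    return round(100 * part / whole, 2)


# (numerator, denominator) pairs quoted in the three company analyses.
figures = {
    "RHEINENERGIE RED share": (6, 28),
    "RHEINENERGIE moved to CR": (15, 28),
    "APPLE GREEN share": (19, 27),
    "APPLE moved to CR": (10, 27),
    "BOEING GREEN share": (31, 33),
    "BOEING moved to CR": (22, 33),
}
percentages = {name: pct(*pair) for name, pair in figures.items()}
```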
Other observations made on Incidents
DIFFERENCE_INITIAL_CLOSED and DAYS_SUPPORT_TO_CPE do not match
[Figure: line chart of DIFFERENCE_INITIAL_CLOSED and DAYS_SUPPORT_TO_CPE per case (x-axis: case 1 to 121; y-axis: -400 to 500)]
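One way to locate the cases where DIFFERENCE_INITIAL_CLOSED and DAYS_SUPPORT_TO_CPE disagree is to compare the two fields row by row. The Python sketch below does that with an optional tolerance. The row layout, the function name and the sample values are assumptions chosen only to exercise the idea.

```python
def mismatches(rows, tolerance=0):
    """rows: iterable of (case_id, difference_initial_closed, days_support_to_cpe).

    Returns the rows where the two day counts differ by more than `tolerance`.
    """
    return [(cid, a, b) for cid, a, b in rows if abs(a - b) > tolerance]


# Hypothetical values, not taken from the Incident Database.
rows = [(1, 40, 40), (2, 120, -15), (3, 75, 80)]
```

Negative values, as seen on the chart's y-axis, are kept as they are, so a sign error in either field also shows up as a mismatch.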