© 2006 IBM Corporation
Performance Analysis with Real Case
IBM GTS Infrastructure Support Services
Kang, SeungRok
IBM Korea Global Technology Services
IBM Korea , Global Technology Service
2007 pSMA Seminar | Performance Analysis with Real Case | Confidential © 2006 IBM Corporation
Index
* Case 1 : Script vs C binary
* Case 2 : Real Memory and Paging Space
* Case 3 : CPU usage & Java Performance
* Case 4 : Disk I/O & Disk Wait Ratio
Performance Problem?
From my personal experience (your mileage may vary), what are the most common causes of performance problems?
* 50% poor disk layout and management: some disks are 90%+ busy while 50% are not used at all. Do you have a clear document that maps the files to the actual disks?
* 10% poor setup of RDBMS tuning parameters relating to memory use
* 10% single-threaded batch applications (and we have been using SMP for 7 years!!)
* 10% poorly written customer extensions to standard applications
* 5% system running with errors in the errpt log file (including CPU failures!!)
* 5% paging on large-RAM (>2 GB) systems where vmtune was not used to set minperm/maxperm
* 5% AIX problems already discovered and fixed, but AIX was not up to date
* 4% badly ported applications: not compiled with optimisation, or built on old AIX versions
* 1% genuine bugs in AIX
(Nigel Griffiths)
Conclusion : Performance analysis is continuous.
This class shows that performance management is a continuous process.
[Figure: service performance management over time as a repeating PDCA cycle (1. Plan, 2. Do, 3. Check, 4. Act), as in ITIL and ISO 9000.]
Conclusion : Performance analysis is teamwork play.
This class shows that performance management is teamwork. Without teamwork, you only get bad performance!
[Figure: system administrator, network administrator, application developer, and DBA working together.]
Case 1 : Shell vs C binary
* Shell: interpreted (script) language. Good point: easy to develop and easy to debug. Weak point: slow execution.
* C binary: compiled language. Good point: fast execution. Weak point: harder to develop and harder to debug.
Which one is better from a performance point of view? The software itself is the most important factor in performance.
Case 1 : Shell vs C binary
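top_c.c:

```c
#include <stdio.h>
#include <string.h>

/* Print the TOP lines of an nmon file, tagged with the most recent ZZZZ time. */
int main(int argc, char *argv[])
{
    FILE *fp;
    char Buffer[1024];
    char Head[16];
    char Data1[1024], Data2[1024], Data3[1024];
    char Date[1024] = "";
    char *tok;

    /* "rw" is not a valid fopen mode; the file is only read */
    if (argc < 2 || (fp = fopen(argv[1], "r")) == NULL) {
        fprintf(stderr, "usage: %s nmon_file\n", argv[0]);
        return 1;
    }
    while (fgets(Buffer, sizeof Buffer, fp) != NULL) {
        /* split "HEAD,DATA1,DATA2,rest"; skip lines with too few fields */
        if ((tok = strtok(Buffer, ",")) == NULL) continue;
        snprintf(Head, sizeof Head, "%s", tok);
        if ((tok = strtok(NULL, ",")) == NULL) continue;
        snprintf(Data1, sizeof Data1, "%s", tok);
        if ((tok = strtok(NULL, ",")) == NULL) continue;
        snprintf(Data2, sizeof Data2, "%s", tok);
        tok = strtok(NULL, "");          /* rest of the line, including '\n' */
        snprintf(Data3, sizeof Data3, "%s", tok ? tok : "\n");

        if (strcmp(Head, "ZZZZ") == 0)
            strcpy(Date, Data2);         /* remember the current snapshot time */
        if (strcmp(Head, "TOP") == 0)
            printf("%s,%s,%s,%s,%s", Head, Date, Data1, Data2, Data3);
    }
    fclose(fp);
    return 0;
}
```

top_sh.sh:

```ksh
#!/usr/bin/ksh
# Same job as top_c.c, written as a ksh read loop
export IFS=,
cat "$1" | while read HEAD DATA1 DATA2 DATA3
do
    # echo $HEAD
    if [[ $HEAD == "ZZZZ" ]]
    then
        ZZZDATE=$DATA2
    fi
    if [[ $HEAD == "TOP" ]]
    then
        # quoted so the commas inside DATA3 are not split by IFS
        echo "$HEAD,$ZZZDATE,$DATA1,$DATA2,$DATA3"
    fi
done
```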
Case 1 : Shell vs C binary (top_sh.sh vs top_c.c)
[Figure: nmon graphs for sj_open2, 2007-03-13, for one run around 10:44-10:45 and one around 10:56-10:58. CPU Total (User%, Sys%, Wait%), labeled CPU Performance; Paging (filesystem) fsin/fsout, labeled File Cache I/O Performance.]
Case 1 : Shell vs C binary (top_sh.sh vs top_c.c)

```
root@sj_open2:/srkang/case1> timex ./top_c dbserver.nmon > top_c_dbserver.out
real 0.20
user 0.16
sys  0.02
root@sj_open2:/srkang/case1>
root@sj_open2:/srkang/case1> timex ./top_sh.sh dbserver.nmon > top_sh_dbserver.out
real 46.92
user 18.51
sys  44.01
root@sj_open2:/srkang/case1>
```

Process tree during the shell run:

```
   PID    TTY  TIME CMD
520416  pts/0  0:00 ksh
544814  pts/0  0:00 \--timex
442610  pts/0  0:06 \--sh
479414  pts/0  0:01 \--cat
```

Process tree during the C run:

```
   PID    TTY  TIME CMD
520416  pts/0  0:00 ksh
544862  pts/0  0:00 \--timex
553036  pts/0  0:00 \--top_c_sleep
```
Case 1 : Shell vs Shell
* A ksh pattern-matching expression caused poor performance (a fix exists):

```ksh
#!/usr/bin/ksh
ON_LIST="gpfs164vg gpfs163vg gpfs162vg gpfs161vg gpfs160vg \
gpfs158vg gpfs157vg gpfs156vg gpfs155vg gpfs154vg gpfs153vg gpfs152vg \
gpfs151vg gpfs150vg gpfs149vg gpfs148vg gpfs147vg gpfs146vg gpfs145vg \
gpfs144vg gpfs143vg gpfs142vg gpfs141vg gpfs140vg gpfs139vg gpfs138vg \
gpfs137vg gpfs136vg gpfs135vg gpfs134vg gpfs133vg gpfs132vg gpfs131vg \
gpfs365vg gpfs364vg gpfs363vg gpfs362vg gpfs361vg gpfs360vg gpfs359vg"
LIST_OF_HDISKS_FOR_RG="vpath0,vpath8,vpath16,vpath24,vpath32,\
vpath40,vpath48,vpath56,vpath1,vpath9,vpath17,vpath25,vpath33,\
vpath41,vpath49,vpath57,vpath336,vpath337,vpath338,vpath339,vpath340,\
vpath341,vpath342,vpath343,vpath2,vpath10,vpath18,vpath26,vpath34,\
vpath45,vpath53,vpath61,vpath6,vpath14,vpath22,vpath30,vpath38,\
vpath46,vpath54,vpath62"
LIST_OF_VOLUME_GROUPS_FOR_RG="dbgmelmvg,dbgmelmvg,dbgmelmvg,\
dbgmelmvg,dbgmelmvg,dbgmelmvg,dbgmelmvg,dbgmelmvg,dbgmelmvg,\
dbgmelmvg,dbgmelmvg,dbgmelmvg,dbgmelmvg,dbgmelmvg,dbgmelmvg,\
dbgmelmvg,dbgmelmvg,dbgmelmvg,dbgmelmvg,dbgmelmvg,dbgmelmvg,\
dbgmelmvg,dbgmelmvg,dbgmelmvg,dbpegasusvg,dbpegasusvg,dbpegasusvg,\
dbpegasusvg,dbpegasusvg,dbpegasusvg,dbpegasusvg,dbpegasusvg,\
dbpegasusvg,dbpegasusvg,dbpegasusvg,dbpegasusvg,dbpegasusvg,\
dbpegasusvg,dbpegasusvg"

for disk in $(IFS=', ' set -- $LIST_OF_HDISKS_FOR_RG ; print $*)
do
    print $LIST_OF_VOLUME_GROUPS_FOR_RG |\
        IFS=', ' read vg LIST_OF_VOLUME_GROUPS_FOR_RG
    if [[ -n $vg && $ON_LIST = @(?(* )$vg?( *)) ]]
    #if [[ -n $vg && -n "$(print "$ON_LIST" | grep " $vg ")" ]]
    # If I use the above statement, the script works much faster.
    then
        continue
    else
        echo "would run make_disk_available $disk"
    fi
done
```

Pattern-match version compared with the grep version:

```
if [[ -n $vg && $ON_LIST = @(?(* )$vg?( *)) ]]
real 0m29.68s
user 0m29.43s
sys  0m0.25s

if [[ -n $vg && -n "$(print "$ON_LIST" | grep " $vg ")" ]]
real 0m0.68s
user 0m0.14s
sys  0m0.51s
```
Case 1 : Shell vs C binary
* One customer used a shell application to handle a big SAM file exported from a database. It split the SAM file into several files and delivered the data to another system by FTP. The job ran for several hours (5 to 6 hours).
After they rewrote this batch job as a C application, it finished in just one hour.
Shell scripts are easy to write, but they can make system performance much worse!
Case 2 : Real Memory & Paging Space
* Real memory: fast, but expensive and limited in size.
* Paging space (file system): inexpensive and can be made large enough to prevent a system hang, but slow.
* Allocation: memory can be pinned in real memory. Otherwise, a process cannot tell whether its memory is in real memory or in paging space.
Is a system always better just because it has a lot of memory?
Case 2 : Real Memory & Paging Space
[Figure: CPU Total (User%, Sys%, Wait%) for sj_open2, 2007-03-14, about 20:55 to 21:00.]
Performance with paging space page-in/page-out:

```
root@sj_open2:/srkang/case2> timex ./top_c dbserver150.nmon > top_c_dbserver150.out
real 11.11
user 3.30
sys  0.54
```

Performance without paging space page-in/page-out:

```
root@sj_open2:/srkang/case2> timex ./top_c dbserver150.nmon > top_c_dbserver150.out
real 3.85
user 3.29
sys  0.45
```
Case 2 : Real Memory & Paging Space
* One customer's database was corrupted, so they had to restore their data from backup tapes. The recovery had to be completed in a short time, so they dedicated the system to the restore. With the default settings, they estimated that the restore would take 5 to 6 hours in total. After they changed maxpgahead and strict_maxperm, the restore took just 2 hours.
The file cache in real memory matters just as much as computational memory.
* maxpgahead: the maximum number of pages read ahead for sequential file access (JFS); the JFS2 equivalent is j2_maxPageReadAhead
* minfree: page stealing starts when the number of free pages falls below minfree
* maxfree: page stealing stops when the number of free pages reaches maxfree
* maxfree = minfree + maxpgahead (at minimum)
Case 2 : Real Memory & Paging Space
* One customer ran a batch application that worked with an Oracle database, and they noticed that it ran more slowly as time went by. It looked like a memory performance issue, but the root cause was a kernel memory leak. Kernel memory stayed only in real memory, not in paging space, so as the leak grew, the application's memory was pushed out to paging space, and that made the application slow.
A problem elsewhere in the system can be the cause of a specific application's performance issue.
Case 2 : Real Memory & Paging Space
* One customer had a 2-tier system: a database server and a web server. One day they had a performance issue, and it was resolved after they restarted the web server. The root cause looked like a paging space problem on the database server, but when they restarted the web server, the process that was using the most paging space disappeared.
[Figure: nmon graphs for dbserver: CPU Total (User%, Sys%, Wait%) and Paging (pgspace) pgsin/pgsout.]
Don't look at only one system; look at all the related systems.
Case 2 : Real Memory & Paging Space
maxperm 80% (3/31) vs maxperm 30% (7/26)
[Figure: nmon graphs for IB_WEB1, 9:00 to 8:20 the next day, on 26/07/04 and on 03/31: Paging (pgspace) pgsin/pgsout and Memory Use (%comp, %file). The paging axis runs from 0 to 0.35 on 26/07/04 but from 0 to 35 on 03/31.]
* The graphs look similar, but the performance is not.
Quiz 1
```
root@sj_open2:/srkang> vmstat 1

System configuration: lcpu=4 mem=1840MB

kthr    memory              page                    faults           cpu
----- ----------- ------------------------ ------------------ -----------
 r  b    avm   fre  re  pi   po    fr     sr  cy  in     sy     cs us sy id wa
 0  0 229611  1024   0   0    7 12330 112280   0  18 109356 213304 13 33 55  0
 0 11 236371     0   0   1 1025  6880  45005   0 205  36424  63339  3 10 48 39
 0 22 238178     5   0   7  483  1792   3709   0 179   7873  16152  1  3 46 50
 0  7 251143   807   0  34 4681 12883  52835   0 434  44194  95733  6 19 40 35
 2  0 268925   939   0  49 2323 17801  29475   0 337  75478 143061  8 23 57 12
 0  2 290688   985   0  15 2484 21802  41972   0 270  79692 173846 10 29 47 14
 0  0 316751  1041   0   6    4 26125 101487   0  10 104434 209025 12 36 51  1
 2  0 341875   558   0  12 3750 25153  60894   0 341 104390 201942 12 34 53  1
 1  0 358023     0   0  12 5413 16235  22743   0 449  73764 140318  8 24 58 10
```

* Where is the bottleneck?
Case 3 : CPU usage & Java Performance

top_java.java:

```java
import java.io.*;
import java.util.*;

// Same job as top_c.c: print the TOP lines of an nmon file with the latest ZZZZ time
class top_java {
    public static void main(String[] args) {
        String Head = new String();
        String Data1 = new String();
        String Data2 = new String();
        String Data3 = new String();
        String Date = new String();

        try {
            BufferedReader infile = new BufferedReader(new FileReader(args[0]));
            String str;
            while ((str = infile.readLine()) != null) {
                Data3 = new String();
                StringTokenizer st = new StringTokenizer(str, ",");
                if (!st.hasMoreTokens()) {
                    continue;                       // skip empty lines
                }
                Head = new String(st.nextToken());
                if (st.hasMoreTokens()) {
                    Data1 = new String(st.nextToken());
                    if (st.hasMoreTokens()) {
                        Data2 = new String(st.nextToken());
                        // re-join the remaining fields, each prefixed with ","
                        while (st.hasMoreTokens()) {
                            Data3 = new String(Data3 + "," + st.nextToken());
                        }
                    }
                }
                if (Head.equals("ZZZZ")) {
                    Date = Data2;                   // remember the current snapshot time
                }
                if (Head.equals("TOP")) {
                    System.out.println(Head + "," + Date + "," + Data1 + "," + Data2 + Data3);
                }
            }
            infile.close();
        } catch (IOException e) {
            System.out.println("File Open Exception");
        }
    }
}
```
Case 3 : CPU usage & Java Performance
```
root@sj_open2:/srkang> timex top_c dbserver.nmon > dbserver.nmon_c.out
real 0.30
user 0.16
sys  0.02
root@sj_open2:/srkang> timex /usr/java14/jre/bin/java top_java dbserver.nmon > dbserver.nmon_java.out
real 2.15
user 1.53
sys  0.56

-rw-r--r--   1 root     system      5125362 Apr 01 23:21 dbserver.nmon_c.out
-rw-r--r--   1 root     system      5125362 Apr 01 23:26 dbserver.nmon_java.out
```
Case 3 : CPU usage & Java Performance
Java performance metrics:
* Application: service request response times (cross-JVM), service request call counts, class-level and method-level response times, class and method call counts, object allocations and deallocations, and so on
* Application server: thread pool metrics, database connection pool metrics, JCA connection pool metrics, entity bean and stateful session bean cache metrics, stateless session bean and message-driven bean pool metrics, JMS server metrics, and transaction metrics
* JVM: memory usage and garbage collection metrics
* Operating system/platform: CPU usage, physical memory usage, disk input/output metrics, and network connectivity metrics
Case 3 : CPU usage & Java Performance
* Unnecessary explicit garbage collections (System.gc() calls) made system performance bad. The IBM JVM option -Xdisableexplicitgc is recommended. Please use the free IBM JVM diagnostic tools: http://www-128.ibm.com/developerworks/java/jdk/diagnosis/141.html
Case 3 : CPU usage & Java Performance
* One customer tested a WAS server with a third-party load-testing program, simulating 600 users. The results are shown below.
[Figure: System TEST environment. nmon graphs for arisso3, 2005-11-10, 20:30 to 23:40: Memory (Real total, MB), CPU Total (User%, Sys%, Wait%), Memory Use (%comp, %file), and Paging (pgspace) pgsin/pgsout.]
Case 3 : CPU usage & Java Performance
* After upgrading the WAS application server, the system administrator expected its CPU usage to drop to half of what it was before.
[Figure: CPU Total (User%, Sys%, Wait%) for arisso2 on 2005-10-10 and on 2005-10-11, each from 20:30 to 19:50 the next day.]
* The new system's SPECjbb2000 score is only 20% higher than the old one's. Refer to the SPECjbb2000 results:
http://www-03.ibm.com/systems/p/benchmarks/jba.html
Case 3 : CPU usage & Java Performance
* After upgrading the Java application server, the system administrator expected CPU usage to be lower than before, but CPU usage went up. What is going on?
* The main application was developed in Java as a polling system. The new system polls faster than the old one, so CPU usage went up.
[Figure: nmon graphs for icescape01, 2007-12-02, 00:01 to 14:19: Processes (RunQueue, Swap-in) and CPU Total (User%, Sys%, Wait%).]
Case 3 : CPU usage & Java Performance
* This case is related to tpmC. The system hit a maximum CPU usage of 60%. After the system was upgraded, the maximum CPU usage was under 10% (excluding wait).
* The wait value is not included in CPU usage.
Case 3 : CPU usage & Java Performance
Configurations: 9-way, 16 GB, AIX 5.1, HACMP 5.1; 9-way, 16 GB, AIX 5.1; 12-way, 16 GB, AIX 5.1, HACMP 5.1
[Figure: CPU Total (User%, Sys%, Wait%), 9:00 to 20:40, for gamsa on 2005-12-08, gamsa on 2005-12-12, and gamsa2 on 2005-12-12.]
* High availability can also result in better performance.
Quiz 2
* Which one is most important?
[Figure: four nmon graphs, labeled ① to ④, starting 2004-05-01 17:24 and covering several days: Paging (pgspace) pgsin/pgsout; Paging (filesystem) fsin/fsout; CPU 1 (User%, Sys%, Wait%); Network I/O (KB/s) for en1, en4, and lo0 read and write.]
Case 4 : Disk I/O & Disk Wait Rate
* On one system, the wait% value was high. The CPUs were upgraded to faster ones, and after that the CPU wait% looked even higher. There was no actual performance issue, but the administrator was concerned about it.
Case 4 : Disk I/O & Disk Wait Rate
* FTP is sometimes used as a network performance test, but a normal put also involves disk I/O on both ends, so disk I/O settings can affect the result.

```
ftp> ha
Hash mark printing on (1024 bytes/hash mark).
ftp> put 100M
200 PORT command successful.
150 Opening data connection for 100M.
#####################################################################################################
226 Transfer complete.
104857600 bytes sent in 10.99 seconds (9321 Kbytes/s)
local: 100M remote: 100M
ftp> put 100M
200 PORT command successful.
150 Opening data connection for 100M.
#####################################################################################################
226 Transfer complete.
104857600 bytes sent in 8.846 seconds (1.158e+04 Kbytes/s)
local: 100M remote: 100M
ftp> put 100M
200 PORT command successful.
150 Opening data connection for 100M.
```

I/O pacing settings compared (maxpout/minpout):
* HIGH water mark for pending write I/Os per file [30], LOW water mark for pending write I/Os per file [20]
* HIGH water mark for pending write I/Os per file [0], LOW water mark for pending write I/Os per file [0]

FTP test without disk I/O:

```
ftp> put "| dd if=/dev/zero bs=32k count=10000" /dev/null
327680000 bytes sent in 27.65 seconds (1.157e+04 Kbytes/s)
local: | dd if=/dev/zero bs=32k count=10000 remote: /dev/null
```
Case 4 : Disk I/O & Disk Wait Rate
```
root@sj_open2:/> nfsstat -m
/ss from /ss:S80
 Flags: vers=3,proto=tcp,auth=unix,hard,intr,link,symlink,rsize=32768,wsize=32768,retrans=5
 All: srtt=0 (0ms), dev=0 (0ms), cur=0 (0ms)
root@sj_open2:/ss> timex dd if=/dev/zero of=./100M bs=1024k count=100
100+0 records in.
100+0 records out.

real 9.84
user 0.00
sys  0.36
root@sj_open2:/> mount -o vers=2 S80:/ss /ss
root@sj_open2:/> nfsstat -m
/ss from /ss:S80
 Flags: vers=2,proto=tcp,auth=unix,hard,intr,dynamic,rsize=8192,wsize=8192,retrans=5
 All: srtt=0 (0ms), dev=0 (0ms), cur=0 (0ms)
root@sj_open2:/> cd /ss
root@sj_open2:/ss> timex dd if=/dev/zero of=./100M bs=1024k count=100
100+0 records in.
100+0 records out.

real 97.64
user 0.00
sys  0.40
```

* NFS is widely used, but many users never verify how it is mounted. Use nfsstat -m to check the NFS version and rsize/wsize: here the NFS version 2 mount (rsize/wsize 8192) took about 10 times as long as the version 3 mount (rsize/wsize 32768).
Case 4 : Disk I/O & Disk Wait Rate
* On Server #1, disk busy was 100%, but the I/O rate was very low. At the same time, a full backup process was running on Server #2. The two systems accessed the same disks through a cache in the disk subsystem, and that effectively locked the disks for Server #1.
[Figure: Server #1 and Server #2 sharing the same disk subsystem.]
Look at the whole environment: a symptom on one system can be the result of activity on another.
Case 4 : Disk I/O & Disk Wait Rate
* One day, a system administrator replaced an HBA card that had been reported as having a hardware fault. The day after the replacement, the system had a performance problem.
[Figure: CPU Total (User%, Sys%, Wait%), 00:05 to 23:35, before the change (27/03/07) and after it (29/03/07).]
* It looked like an application query problem, but the application team did not agree. They believed the problem was caused by the change itself.
Q & A
Thank you very much!