© 2006 IBM Corporation

Performance Analysis with Real Case

IBM GTS Infrastructure Support Services

Kang, SeungRok

IBM Korea Global Technology Services

IBM Korea , Global Technology Service

2007 pSMA Seminar | Performance Analysis with Real Case | Confidential © 2006 IBM Corporation

Index

* Case 1 : Script vs C binary

* Case 2 : Real Memory and Paging Space

* Case 3 : CPU usage & Java Performance

* Case 4 : Disk I/O & Disk Wait Ratio

Performance Problem?

From my personal experience (your mileage may vary)

What are the most common performance problem causes?

* 50% poor disk layout + mgmt - some disks 90%+ busy while 50% not used at all - do you have a clear document that maps the files to actual disks?
* 10% poor setup of RDBMS tuning parameters relating to memory use
* 10% single threaded batch applications (and we have been using SMP for 7 years!!)
* 10% poorly written customer extensions to standard applications
* 5% system running with errors in the errpt log file (including CPU failures!!)
* 5% paging on large RAM (>2 GB) systems & vmtune not used to set min/maxperm
* 5% AIX problems already discovered and fixed but AIX was not up to date
* 4% badly ported app = not compiled with optimisation or on old AIX versions
* 1% genuine bugs in AIX

(Source: Nigel Griffiths)

Conclusion : Performance analysis is continuous.

This class shows that performance issues are continuous.

[Figure: service performance plotted against time, with a PDCA management cycle (1. Plan, 2. Do, 3. Check, 4. Act); labels: ITIL, ISO-9000]

Conclusion : Performance analysis is teamwork.

This class shows that performance analysis takes teamwork. Without teamwork, only bad performance!

[Figure: the roles involved: system administrator, network administrator, application developer, DBA]

Case 1 : Shell vs C binary

| | Shell | C binary |
|---|---|---|
| Good point | Easy to develop, easy to debug | Fast execution |
| Weak point | Slow execution | Difficult to develop, difficult to debug |
| Language character | Interpreted language, script language | Compiled language |

Which one is better from a performance point of view?
Software is the most important factor from a performance point of view.

Case 1 : Shell vs C binary

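top_c.c

    /* Print the TOP lines of an nmon file, each prefixed with the
       timestamp taken from the most recent ZZZZ line. */
    #include <stdio.h>
    #include <string.h>

    int main(int argc, char *argv[])
    {
        FILE *fp;
        char Buffer[1024];
        char Date[1024] = "";
        char *Head, *Data1, *Data2, *Data3;

        /* "r": the original mode string "rw" is not a valid fopen mode */
        if (argc < 2 || (fp = fopen(argv[1], "r")) == NULL) {
            fprintf(stderr, "usage: %s nmon_file\n", argv[0]);
            return 1;
        }
        while (fgets(Buffer, sizeof Buffer, fp) != NULL) {
            Head  = strtok(Buffer, ",");
            Data1 = strtok(NULL, ",");
            Data2 = strtok(NULL, ",");
            Data3 = strtok(NULL, "");   /* rest of the line, newline included */
            if (Head == NULL || Data1 == NULL || Data2 == NULL || Data3 == NULL)
                continue;               /* skip lines with fewer than four fields */
            if (strcmp(Head, "ZZZZ") == 0)
                strcpy(Date, Data2);
            if (strcmp(Head, "TOP") == 0)
                printf("%s,%s,%s,%s,%s", Head, Date, Data1, Data2, Data3);
        }
        fclose(fp);
        return 0;
    }

top_sh.sh

    #!/usr/bin/ksh
    # Same job as top_c.c, written as a ksh read loop.
    export IFS=,
    cat "$1" | while read HEAD DATA1 DATA2 DATA3
    do
    #   echo $HEAD
        if [[ $HEAD == "ZZZZ" ]]
        then
            ZZZDATE=$DATA2
        fi
        if [[ $HEAD == "TOP" ]]
        then
            # Quoted so that IFS=, does not split the fields again
            echo "$HEAD,$ZZZDATE,$DATA1,$DATA2,$DATA3"
        fi
    done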

Case 1 : Shell vs C binary

[Charts: nmon graphs for host sj_open2, 2007-03-13, one set per program (top_sh.sh and top_c.c), covering the windows 10:44 to 10:45 and 10:56 to 10:58. Each set has a "CPU Total" graph (User%, Sys%, Wait%; y-axis 0 to 100) and a "Paging (filesystem)" graph (fsin, fsout). Annotations: "CPU Performance" and "File Cache I/O Performance".]

Case 1 : Shell vs C binary

top_c.c

    root@sj_open2:/srkang/case1> timex ./top_c dbserver.nmon > top_c_dbserver.out

    real 0.20
    user 0.16
    sys  0.02

    root@sj_open2:/srkang/case1>

    PID    TTY   TIME CMD
    520416 pts/0 0:00 ksh
    544862 pts/0 0:00  \--timex
    553036 pts/0 0:00    \--top_c_sleep

top_sh.sh

    root@sj_open2:/srkang/case1> timex ./top_sh.sh dbserver.nmon > top_sh_dbserver.out

    real 46.92
    user 18.51
    sys  44.01

    root@sj_open2:/srkang/case1>

    PID    TTY   TIME CMD
    520416 pts/0 0:00 ksh
    544814 pts/0 0:00  \--timex
    442610 pts/0 0:06    \--sh
    479414 pts/0 0:01      \--cat
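Between the two versions measured here there is a common middle ground: keeping the job in a script but letting a single awk process do the per-line work. The following is only a sketch of that idea. It is not part of the original slides and was not timed there, and the function name `top_awk` is hypothetical.

```shell
# Hypothetical awk variant of top_sh.sh (not from the original slides).
# Reads the nmon file named in $1 and prints each TOP line with the
# time field of the most recent ZZZZ line inserted after the first field.
top_awk() {
    awk -F, '
        $1 == "ZZZZ" { stamp = $3 }                 # same field as DATA2 in top_sh.sh
        $1 == "TOP" && NF >= 4 {
            rest = $0
            sub(/^[^,]*,[^,]*,[^,]*,/, "", rest)    # drop the first three fields
            printf "%s,%s,%s,%s,%s\n", $1, stamp, $2, $3, rest
        }' "$1"
}

# Usage (assumes dbserver.nmon exists in the current directory):
#   top_awk dbserver.nmon > top_awk_dbserver.out
```

Whether this comes close to the C binary on a given system would have to be measured with timex, as was done for the other two versions.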

Case 1 : Shell vs Shell

* Some pattern (regular expression) matches perform badly; a fix exists.

    #!/usr/bin/ksh
    ON_LIST="gpfs164vg gpfs163vg gpfs162vg gpfs161vg gpfs160vg \
    gpfs158vg gpfs157vg gpfs156vg gpfs155vg gpfs154vg gpfs153vg gpfs152vg \
    gpfs151vg gpfs150vg gpfs149vg gpfs148vg gpfs147vg gpfs146vg gpfs145vg \
    gpfs144vg gpfs143vg gpfs142vg gpfs141vg gpfs140vg gpfs139vg gpfs138vg \
    gpfs137vg gpfs136vg gpfs135vg gpfs134vg gpfs133vg gpfs132vg gpfs131vg \
    gpfs365vg gpfs364vg gpfs363vg gpfs362vg gpfs361vg gpfs360vg gpfs359vg"
    LIST_OF_HDISKS_FOR_RG="vpath0,vpath8,vpath16,vpath24,vpath32,\
    vpath40,vpath48,vpath56,vpath1,vpath9,vpath17,vpath25,vpath33,\
    vpath41,vpath49,vpath57,vpath336,vpath337,vpath338,vpath339,vpath340,\
    vpath341,vpath342,vpath343,vpath2,vpath10,vpath18,vpath26,vpath34,\
    vpath45,vpath53,vpath61,vpath6,vpath14,vpath22,vpath30,vpath38,\
    vpath46,vpath54,vpath62"
    LIST_OF_VOLUME_GROUPS_FOR_RG="dbgmelmvg,dbgmelmvg,dbgmelmvg,\
    dbgmelmvg,dbgmelmvg,dbgmelmvg,dbgmelmvg,dbgmelmvg,dbgmelmvg,\
    dbgmelmvg,dbgmelmvg,dbgmelmvg,dbgmelmvg,dbgmelmvg,dbgmelmvg,\
    dbgmelmvg,dbgmelmvg,dbgmelmvg,dbgmelmvg,dbgmelmvg,dbgmelmvg,\
    dbgmelmvg,dbgmelmvg,dbgmelmvg,dbpegasusvg,dbpegasusvg,dbpegasusvg,\
    dbpegasusvg,dbpegasusvg,dbpegasusvg,dbpegasusvg,dbpegasusvg,\
    dbpegasusvg,dbpegasusvg,dbpegasusvg,dbpegasusvg,dbpegasusvg,\
    dbpegasusvg,dbpegasusvg"

    for disk in $(IFS=', ' set -- $LIST_OF_HDISKS_FOR_RG ; print $*)
    do
        print $LIST_OF_VOLUME_GROUPS_FOR_RG |\
            IFS=', ' read vg LIST_OF_VOLUME_GROUPS_FOR_RG
        if [[ -n $vg && $ON_LIST = @(?(* )$vg?( *)) ]]
        #if [[ -n $vg && -n "$(print "$ON_LIST" | grep " $vg ")" ]]
        # If i use the above statement, the script works much faster.
        then
            continue
        else
            echo "would run make_disk_available $disk"
        fi
    done

Timing with the ksh pattern match:

    if [[ -n $vg && $ON_LIST = @(?(* )$vg?( *)) ]]

    real 0m29.68s
    user 0m29.43s
    sys  0m0.25s

Timing with grep:

    if [[ -n $vg && -n "$(print "$ON_LIST" | grep " $vg ")" ]]

    real 0m0.68s
    user 0m0.14s
    sys  0m0.51s
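A third way to write the membership test is a plain `case` pattern, which stays inside the shell and does not start a grep process per iteration. The sketch below is an illustration only: it was not part of the original comparison and no timing for it is claimed, and the helper name `in_list` and the shortened sample list are assumptions. Padding the list with a space on each side also lets the first and last entries match, which a search for `" $vg "` in the unpadded list would miss.

```shell
# Hypothetical helper: succeed when word $2 is one of the
# space-separated words in list $1.
in_list() {
    case " $1 " in
        *" $2 "*) return 0 ;;
        *)        return 1 ;;
    esac
}

ON_LIST="gpfs164vg gpfs163vg gpfs162vg"   # shortened sample of the list above
for vg in gpfs164vg dbgmelmvg; do
    if in_list "$ON_LIST" "$vg"; then
        echo "$vg is on the list"
    else
        echo "$vg is not on the list"
    fi
done
```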

Case 1 : Shell vs C binary

* One customer used a shell application to process a large SAM file exported from a database. It split the SAM file into several files and delivered the data to another system by FTP. The job ran for several hours (5 to 6 hours).

After they rewrote the batch job as a C application, it finished in about an hour.

A shell script is easy to write, but it can make system performance worse!

Case 2 : Real Memory & Paging Space

| | Real Memory | Paging Space (File System) |
|---|---|---|
| Good point | Fast | Not expensive; big size to prevent system hang |
| Weak point | Expensive; small size | Slow |
| Allocation | Pinned memory | A process cannot know whether its memory address is in real memory or in paging space. |

Is a system good simply because it has a lot of memory?

Case 2 : Real Memory & Paging Space

[Chart: CPU Total, host sj_open2, 2007-03-14, 20:55 to 21:00; series User%, Sys%, Wait%; y-axis 0 to 100]

Performance with Paging Space In/Out

    root@sj_open2:/srkang/case2> timex ./top_c dbserver150.nmon > top_c_dbserver150.out

    real 11.11
    user 3.30
    sys  0.54

Performance without Paging Space In/Out

    root@sj_open2:/srkang/case2> timex ./top_c dbserver150.nmon > top_c_dbserver150.out

    real 3.85
    user 3.29
    sys  0.45

Case 2 : Real Memory & Paging Space

* One customer's database was corrupted, so they had to restore their data from backup tapes. The system had to be recovered quickly, so they wanted to dedicate it to the restore. With the default settings, they estimated a total restore time of 5 to 6 hours. After they changed maxpgahead and strict_maxperm, the restore took just 2 hours.

The file cache in real memory is as important as computational memory.

maxpgahead : page ahead
minfree : paging starts when free memory reaches minfree
maxfree : paging should stop
maxfree = minfree + maxpgahead
JFS2 : j2_maxPageReadAhead
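The relation maxfree = minfree + maxpgahead can be checked with simple arithmetic before any tunable is changed. The sketch below only computes the value; the helper name `maxfree_for` and the numbers 960 and 16 are illustrative assumptions, not recommendations and not the customer's settings.

```shell
# Hypothetical helper: apply the rule maxfree = minfree + maxpgahead.
# $1 = minfree (pages), $2 = maxpgahead (pages)
maxfree_for() {
    echo $(( $1 + $2 ))
}

# Illustrative values only: minfree=960, maxpgahead=16
maxfree_for 960 16
```

If maxpgahead is raised to speed up sequential reads, the same rule implies raising maxfree by the same amount.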

Case 2 : Real Memory & Paging Space

* One customer ran a batch application against an Oracle database and noticed that it ran more slowly as time went by. It looked like a memory performance issue, but the root cause of the symptom was a kernel memory leak. Kernel memory resides only in real memory, never in paging space, so the application's memory was pushed out to paging space, and that made the application slow.

A problem elsewhere can be the cause of a specific application's performance issue.

Case 2 : Real Memory & Paging Space

[Chart: CPU Total, host dbserver; series User%, Sys%, Wait%; y-axis 0 to 100]

* One customer had a 2-tier system consisting of a database system and a web server system. One day they had a performance issue, and it was resolved after they restarted the web server system. The root cause looked like a paging space problem on the database system, but when they restarted the web server, the process using the most paging space disappeared.

Don't look at only one system; look at all the related systems.

[Chart: Paging (pgspace), host dbserver; series pgsin, pgsout; y-axis 0 to 450]

Case 2 : Real Memory & Paging Space

maxperm 80% (3/31) and maxperm 30% (7/26)

[Charts: nmon graphs for host IB_WEB1 on 03/31 and on 26/07/04, each with a "Paging (pgspace)" graph (pgsin, pgsout) and a "Memory Use" graph (%comp, %file; y-axis 0 to 100). The paging y-axis runs 0 to 35 on 03/31 and 0 to 0.35 on 26/07/04.]

* The graphs look similar, but the performance is not.

Quiz 1

    root@sj_open2:/srkang> vmstat 1

    System configuration: lcpu=4 mem=1840MB

    kthr memory page faults cpu
    ----- ----------- ------------------------ ------------ -----------
     r  b    avm  fre  re  pi   po    fr     sr cy  in     sy     cs us sy id wa
     0  0 229611 1024   0   0    7 12330 112280  0  18 109356 213304 13 33 55  0
     0 11 236371    0   0   1 1025  6880  45005  0 205  36424  63339  3 10 48 39
     0 22 238178    5   0   7  483  1792   3709  0 179   7873  16152  1  3 46 50
     0  7 251143  807   0  34 4681 12883  52835  0 434  44194  95733  6 19 40 35
     2  0 268925  939   0  49 2323 17801  29475  0 337  75478 143061  8 23 57 12
     0  2 290688  985   0  15 2484 21802  41972  0 270  79692 173846 10 29 47 14
     0  0 316751 1041   0   6    4 26125 101487  0  10 104434 209025 12 36 51  1
     2  0 341875  558   0  12 3750 25153  60894  0 341 104390 201942 12 34 53  1
     1  0 358023    0   0  12 5413 16235  22743  0 449  73764 140318  8 24 58 10

* Where is the bottleneck?
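When reading many vmstat intervals, it can help to reduce the columns of interest to a one-line summary. The sketch below is one possible way to do that and is not from the original slides. The function name `vmstat_summary` is hypothetical, and the column positions assume the 17-column layout shown in this quiz (pi = column 6, po = column 7, us = 14, sy = 15, wa = 17). CPU busy is taken as us + sy, without the wait value.

```shell
# Hypothetical helper: average the paging and CPU columns of vmstat
# rows read from standard input (17-column layout assumed).
vmstat_summary() {
    awk '
        $1 ~ /^[0-9]+$/ && NF == 17 {      # data rows only; header lines are skipped
            n++
            pi   += $6
            po   += $7
            busy += $14 + $15              # us + sy, wait excluded
            wa   += $17
        }
        END {
            if (n > 0)
                printf "rows=%d avg_pi=%.1f avg_po=%.1f avg_busy=%.1f avg_wa=%.1f\n",
                       n, pi / n, po / n, busy / n, wa / n
        }'
}

# Usage (on a system with vmstat):
#   vmstat 1 10 | vmstat_summary
```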

Case 3 : CPU usage & Java Performance

top_java.java

    // Java version of top_c.c: print TOP lines with the latest ZZZZ timestamp.
    import java.io.*;
    import java.util.*;

    class top_java
    {
        public static void main(String[] args)
        {
            String Head = new String();
            String Data1 = new String();
            String Data2 = new String();
            String Data3 = new String();
            String Date = new String();

            try {
                BufferedReader infile = new BufferedReader(new FileReader(args[0]));
                String str;
                while ((str = infile.readLine()) != null) {
                    Data3 = new String();
                    StringTokenizer st = new StringTokenizer(str, ",");
                    Head = new String(st.nextToken());
                    if (st.hasMoreTokens()) {
                        Data1 = new String(st.nextToken());
                        if (st.hasMoreTokens()) {
                            Data2 = new String(st.nextToken());
                            // Collect the remaining fields, each prefixed with ","
                            while (st.hasMoreTokens()) {
                                Data3 = new String(Data3 + "," + st.nextToken());
                            }
                        }
                    }
                    if (Head.equals("ZZZZ")) {
                        Date = Data2;
                    }
                    if (Head.equals("TOP")) {
                        System.out.println(Head + "," + Date + "," + Data1 + "," + Data2 + Data3);
                    }
                }
                infile.close();
            } catch (IOException e) {
                System.out.println("File Open Exception");
            }
        }
    }

Case 3 : CPU usage & Java Performance

    root@sj_open2:/srkang> timex top_c dbserver.nmon > dbserver.nmon_c.out

    real 0.30
    user 0.16
    sys  0.02

    root@sj_open2:/srkang> timex /usr/java14/jre/bin/java top_java dbserver.nmon > dbserver.nmon_java.out

    real 2.15
    user 1.53
    sys  0.56

    -rw-r--r-- 1 root system 5125362 Apr 01 23:21 dbserver.nmon_c.out
    -rw-r--r-- 1 root system 5125362 Apr 01 23:26 dbserver.nmon_java.out

Case 3 : CPU usage & Java Performance

Java performance metrics

* Application: Service request response times (cross-JVM), service request call counts, class-level and method-level response times, class and method call counts, object allocations and deallocations, and so on
* Application server: Thread pool metrics, database connection pool metrics, JCA connection pool metrics, entity bean and stateful session bean cache metrics, stateless session bean and message-driven bean pool metrics, JMS server metrics, and transaction metrics
* JVM: Memory usage and garbage collection metrics
* Operating system/platform: CPU usage, physical memory usage, disk input/output metrics, and network connectivity metrics

Case 3 : CPU usage & Java Performance

* Unnecessary explicit garbage collection (System.gc() calls) degrades system performance. The option -Xdisableexplicitgc is recommended. Please use the free IBM JVM diagnostic tools. http://www-128.ibm.com/developerworks/java/jdk/diagnosis/141.html

Case 3 : CPU usage & Java Performance

* One customer tested a WAS server with a third-party test program. The results below are from a test with 600 users.

[Charts: nmon graphs for host arisso3, 2005-11-10, 20:30 to 23:40: "Memory" (Real total (MB); y-axis 0 to 5000), "CPU Total" (User%, Sys%, Wait%; y-axis 0 to 100), "Memory Use" (%comp, %file; y-axis 0 to 100), and "Paging (pgspace)" (pgsin, pgsout; y-axis 0 to 1200)]

System TEST environment

Case 3 : CPU usage & Java Performance

* After upgrading the WAS application server, the system administrator expected the system's CPU usage to be half of what it was before.

[Charts: CPU Total, host arisso2, 2005-10-10 and 2005-10-11, each from 20:30 to 19:50; series User%, Sys%, Wait%; y-axis 0 to 100]

* The new system's SPECjbb2000 value is 20% higher than the old one's. Refer to SPECjbb2000.

http://www-03.ibm.com/systems/p/benchmarks/jba.html

Case 3 : CPU usage & Java Performance

* After upgrading the Java application server, the system administrator expected the system's CPU usage to be lower than before, but CPU usage went up. What is going on?

* The main application was developed in Java as a polling system. The new system polls faster than the old one, so CPU usage goes up.

[Charts: nmon graphs for host icescape01, 2007-12-02, 00:01 to 14:19: "Processes" (RunQueue, Swap-in; y-axis 0 to 25) and "CPU Total" (User%, Sys%, Wait%; y-axis 0 to 100)]

Case 3 : CPU usage & Java Performance

* This case relates to tpmC. The system peaked at 60% CPU usage. After the upgrade, its peak CPU usage was under 10% (excluding the wait value).

* The wait value is not included in CPU usage.

Case 3 : CPU usage & Java Performance

9 way, 16G, AIX5.1, HACMP5.1

9 way, 16G, AIX5.1

12 way, 16G, AIX5.1, HACMP5.1

[Charts: CPU Total for host gamsa on 2005-12-08, host gamsa on 2005-12-12, and host gamsa2 on 2005-12-12, each from about 9:00 to 20:40; series User%, Sys%, Wait%; y-axis 0 to 100]

* High availability can also bring better performance.

Quiz 2

* Which one is the most important?

[Four charts, labeled ① ② ③ ④, from data starting 2004-05-01: "Paging (pgspace)" (pgsin, pgsout; y-axis 0 to 450), "Paging (filesystem)" (fsin, fsout; y-axis 0 to 4500), "CPU 1" (User%, Sys%, Wait%; y-axis 0 to 100), and "Network I/O (KB/s)" (en1-read, en4-read, lo0-read, en1-write, en4-write, lo0-write; y-axis 0 to 800)]

Case 4 : Disk I/O & Disk Wait Rate

* One system's wait% value was high. The customer upgraded its CPUs to ones with a higher clock rate. After that, the system's CPU wait% looked even higher. There was no performance issue, but the administrator was concerned about that value.

Case 4 : Disk I/O & Disk Wait Rate

* Sometimes, FTP is used as a network performance test.

    ftp> ha
    Hash mark printing on (1024 bytes/hash mark).
    ftp> put 100M
    200 PORT command successful.
    150 Opening data connection for 100M.
    #####################################################################################################
    226 Transfer complete.
    104857600 bytes sent in 10.99 seconds (9321 Kbytes/s)
    local: 100M remote: 100M
    ftp> put 100M
    200 PORT command successful.
    150 Opening data connection for 100M.
    #####################################################################################################
    226 Transfer complete.
    104857600 bytes sent in 8.846 seconds (1.158e+04 Kbytes/s)
    local: 100M remote: 100M
    ftp> put 100M
    200 PORT command successful.
    150 Opening data connection for 100M.

I/O pacing settings compared:

    HIGH water mark for pending write I/Os per file [30]
    LOW water mark for pending write I/Os per file  [20]

    HIGH water mark for pending write I/Os per file [0]
    LOW water mark for pending write I/Os per file  [0]

FTP test without I/O

    ftp> put "| dd if=/dev/zero bs=32k count=10000" /dev/null
    327680000 bytes sent in 27.65 seconds (1.157e+04 Kbytes/s)
    local: | dd if=/dev/zero bs=32k count=10000 remote: /dev/null
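The throughput figures that ftp prints can be reproduced from the byte count and the elapsed time, which is useful when comparing runs from logs that show only those two numbers. The sketch below assumes 1 Kbyte = 1024 bytes, which matches the figures in the transcript above; the helper name `throughput_kbs` is hypothetical.

```shell
# Hypothetical helper: throughput in Kbytes/s (1 Kbyte = 1024 bytes),
# rounded to the nearest whole number.
# $1 = bytes transferred, $2 = elapsed seconds
throughput_kbs() {
    awk -v bytes="$1" -v secs="$2" 'BEGIN { printf "%.0f\n", bytes / 1024 / secs }'
}

throughput_kbs 104857600 8.846    # second "put 100M" above; ftp reported 1.158e+04
throughput_kbs 327680000 27.65    # dd to /dev/null test; ftp reported 1.157e+04
```

The two runs come out nearly identical, which suggests that in the second disk-backed transfer the network rather than the disk was the limit.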

Case 4 : Disk I/O & Disk Wait Rate

    root@sj_open2:/> nfsstat -m
    /ss from /ss:S80
     Flags: vers=3,proto=tcp,auth=unix,hard,intr,link,symlink,rsize=32768,wsize=32768,retrans=5
     All: srtt=0 (0ms), dev=0 (0ms), cur=0 (0ms)

    root@sj_open2:/ss> timex dd if=/dev/zero of=./100M bs=1024k count=100
    100+0 records in.
    100+0 records out.

    real 9.84
    user 0.00
    sys  0.36

    root@sj_open2:/> mount -o vers=2 S80:/ss /ss

    root@sj_open2:/> nfsstat -m
    /ss from /ss:S80
     Flags: vers=2,proto=tcp,auth=unix,hard,intr,dynamic,rsize=8192,wsize=8192,retrans=5
     All: srtt=0 (0ms), dev=0 (0ms), cur=0 (0ms)

    root@sj_open2:/> cd /ss
    root@sj_open2:/ss> timex dd if=/dev/zero of=./100M bs=1024k count=100
    100+0 records in.
    100+0 records out.

    real 97.64
    user 0.00
    sys  0.40

* NFS is commonly used, but many users don't verify the NFS status. Use nfsstat.
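The two mounts above differ in vers, rsize, and wsize, and those values are easy to overlook in the long Flags line. The following sketch pulls a single option out of that line so it can be checked in a script. It is an illustration, not part of the original slides; the helper name `nfs_flag` is hypothetical, and it assumes the Flags line format shown in the nfsstat -m output above.

```shell
# Hypothetical helper: print the value of one mount option (for example
# vers, rsize, or wsize) from "nfsstat -m" output read on standard input.
nfs_flag() {
    sed -n 's/^.*Flags:[[:space:]]*//p' |
        tr ',' '\n' |
        awk -F= -v key="$1" '$1 == key { print $2 }'
}

# Usage (on a system with an NFS mount):
#   nfsstat -m | nfs_flag vers
#   nfsstat -m | nfs_flag wsize
```

With more than one NFS mount, the function prints one value per mount, in the order nfsstat lists them.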

Case 4 : Disk I/O & Disk Wait Rate

* On Server #1, the disk busy rate was 100%, but its I/O rate was very low. At that time, a full backup process was running on Server #2. The two systems access the same disks through a cache in the disk subsystem, and that caused disk locking on Server #1.

[Diagram: Server #1 and Server #2]

Look at the whole environment. A symptom may show only one side of the problem.

Case 4 : Disk I/O & Disk Wait Rate

* One day, the system administrator replaced a system's HBA card after a hardware fault was reported on it. The day after the change, the system had a performance problem.

[Charts: CPU Total on 27/03/07 and on 29/03/07, each from 0:05 to 23:35; series User%, Sys%, Wait%; y-axis 0 to 100; labeled "Before" and "After"]

* It looked like an application query problem, but the application team did not agree. They simply believed the problem came from the change (change management).

Q & A

Thank you very much!