iops ÿ g+ busy o io '! ' ¦ + x

10
iops busy IO iops IO size read/write IO IO iops iops iops OS 1iops IO busy IO 3 IO Method of guaranteeing IO performance that uses IO busy rate for each iops. Kazuichi Oe It is well known that maximum performance (iops) of a hard disk is influenced by access position and range on the volume, the distribution of IO size, the ratio of read/write and etc.. In order to guarantee the IO performance for the specified application, it is needed to monitor the current maximum IO performance dynamically. We propose the method of guaranteeing the IO performance by using the Rate of IO busy for each iops. The Rate of IO busy for each iops can be calculated from statistical information of the operating system. We have verified the effectiveness of this method by using the real samba workload, backup workload and the mixture of them 1. IO read/write seek IO OS OS IO RAID Fujitsu Laboratories Ltd. OS OS IO IO busy IO busy iops IO busy busy IO busy iops busy IO 2 3 4 IO busy IO コンピュータシステム・シンポジウ C om puter System Sym posium 2011 Information Processing Society of Japan 73 ComSys2011 2011/12/1

Upload: others

Post on 11-Nov-2021

7 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: iops ÿ G+ busy O IO '! ' ¦ + X

iops busy IO

iops IO size read/write IOIOiops

iops iopsOS 1iops IO busy

IO3

IO

Method of guaranteeing IO performancethat uses IO busy rate for each iops.

Kazuichi Oe†

It is well known that maximum performance (iops) of a hard disk is influenced by accessposition and range on the volume, the distribution of IO size, the ratio of read/write and etc..In order to guarantee the IO performance for the specified application, it is needed to monitorthe current maximum IO performance dynamically. We propose the method of guaranteeingthe IO performance by using the Rate of IO busy for each iops. The Rate of IO busy foreach iops can be calculated from statistical information of the operating system. We haveverified the effectiveness of this method by using the real samba workload, backup workloadand the mixture of them

1.

IO read/write

seek

IO

OS

OS IO

RAID

Fujitsu Laboratories Ltd.

OS

OS

IO

• IO busy IO busy iops

• IO busy busy

IO busy iops

busy

IO

2

3

4

IO busy IO

コンピュータシステム・シンポジウム C om puter System Sym posium

ⓒ 2011 Information Processing Society of Japan73

ComSys20112011/12/1

Page 2: iops ÿ G+ busy O IO '! ' ¦ + X

5 6

1 IO

iops

PC

PC

IO

2.

IO

PC

Facade controller

1) SLO

service-level-objective Facade controller

SLO request read/write

Facade controller

iops

Facade controller

SLO

IO queue

iops

VM OS IO

VM PC 1

2) PC

IO share IO share

PC IO PC

read/write PC

flow control flow

control PC read/write IO

share PC IO queue PC

PC IO queue

IO share IO

PC VM SFQ VM

IO queue

1) IO queue

VM

iops

OS IO

Linux Block IO Controller3) Linux

Block IO Controller IO queue CFQ

IO queue VM

IO

2) 1) IO

queue

IO queue

4) Linux XFS

SGI IRIX IO

8)

XFS

IO

IO

IO 5) 6)

3.

• IO size read/write

• IO

• IO IO

OS

IO size size

コンピュータシステム・シンポジウム C om puter System Sym posium

ⓒ 2011 Information Processing Society of Japan74

ComSys20112011/12/1

Page 3: iops ÿ G+ busy O IO '! ' ¦ + X

• IO busy iops

– IO busy IO busy iops

– IO busy busy

IO busy iops

• IO busy

• IO busy =100%

IO

IO

RAID

RAID 64KB stripe

size stripe size *

IO

IO RAID

1/( )

Samba

1000

Veritas VxFS 500GB

7)

NetVault

2

2TB 500GB A

500GB B

A B A B

(A+B) iops %util svctm

%util Linux IO busy

svctm 1IO

iostat svctm iops

iops = 1/svctm

1

http://www.bakbone.co.jp/products/netvault.html

1 IO size

Fig. 1 IO size distribution used to experiment

3 4

A

iops

B A

90% 89% A B

iops

A 70% 69% A

B

A+B

30%

3 4

A 239iops 65iops

3.7

1

Table 1 Storage device environment

PC FUJITSU PRIMERGY RX100S4

Celeron 3.06 GHz,Mem=2GB,CentOS 5.3

disk Newtech Evolution II 400GB

(SATA 2U)

6disk/2TB RAID5

(stripe size=64KB)

2

Table 2 Workload used to experiment

IO size 1 128KB

read:write 1 50:50

offset 2 0-30GB, 300-330GB

コンピュータシステム・シンポジウム C om puter System Sym posium

ⓒ 2011 Information Processing Society of Japan75

ComSys20112011/12/1

Page 4: iops ÿ G+ busy O IO '! ' ¦ + X

2 IO offset

Fig. 2 IO offset distribution used to experiment

3 IO offset svctm

Fig. 3 Relation between IO offset and svctm (file sharing)

iostat kernel block

device IO queue IO CPU

%util

%util=100 IO

%util iops

%util iops %util

%util iops

%util 100%

iops

%util busy

5 busy

A B

90% A+B 95%

B 80%

2.6.9-67

ll rw blk.c

4 IO offset svctm

Fig. 4 Relation between IO offset and svctm (backup)

5

Fig. 5 Summary of workload analysis result

A A+B 85%

busy

iops %util

iops IO

4. IO busy IO

IO IO busy

busy

busy IO busy iops

IO busy iops

1iops busy

1iops IO busy

IO

IO busy busy

コンピュータシステム・シンポジウム C om puter System Sym posium

ⓒ 2011 Information Processing Society of Japan76

ComSys20112011/12/1

Page 5: iops ÿ G+ busy O IO '! ' ¦ + X

IO busy IO

IO busy IO busy

6

busy

1iops IO busy

1iops IO busy

IO

iops IO busy

1iops busy

1iops busy

IO busy

IO busy

IO busy busy

IO busy busy

busy

OS

IO busy busy

6 IO

busy 100% IO busy

busy IO

busy busy IO busy

iops

IO busy 100% 4

busy 80-85%

2-3iops IO busy 100%

IO busy busy

IO busy =100%

IO busy busy

2

IO busy

• IO busy IO busy

• 1iops busy 1iops

busy

IO

3 80%-95%

95%

6

Fig. 6 Workload information obtained when operating it

IO

IO

A a iops

B A B

IO busy X% X IO

busy busy

iops IO busy

1iops IO busy

busy per iops = IO busy /iops

Linux iostat -x

busy per iops = %util/(r/s + w/s)

1iops IO busy

B

iops B A

iops

( 1 ) a iops IO busy

a busy = a * busy per iops

( 2 ) B IO busy

b busy = X - a busy

( 3 ) b util B

b iops

b = b busy / busy per iops

( 4 ) B b iops

コンピュータシステム・シンポジウム C om puter System Sym posium

ⓒ 2011 Information Processing Society of Japan77

ComSys20112011/12/1

Page 6: iops ÿ G+ busy O IO '! ' ¦ + X

(4) 5

IO1 IO

IO b 1

IO 1

IO IO

5

kernel driver

IO busy

A B IO busy 100%

IO busy

IO busy

100%

100%

IO busy 10

3 2100% IO busy

IO

busy New X X New X

New X X x

New X

New X = X - x

7 12 busy

New X 12 x

x

• IO busy =100 iops(=C)

• IO busy 12 iops

IO busy 3

IO busy 12

1 1 1iops

IO busy

12 =

2 2

IO busy 100%

B

18 thread

2 20 603

5 B IO busy 100%

IO busy 100%

IO kernel

A B IO

B kernel

IO IO busy

IO busy =100%

A B iops 20 % B

IO busy 100%

7 1

1iops IO busy =c

2 1iops IO busy

=d d 12 1iops

IO busy 10 d Ciops

x 1

10 IO busy =(c-d)*C

x = (c-d)*C*y (y=0 ∼ 1)

New X

New X = X - (c-d)*C*y (y=0 ∼ 1)

New X

New X

y

y 4

busy New X

New X busy

y

New X

5

y=0.5 0.5

New X

IO busy busy

)

IO busy IO busy

1iops busy

1iops busy

2

busy

IO busy

New X IO busy

IO busy IO busy

4 IO busy = busy

コンピュータシステム・シンポジウム C om puter System Sym posium

ⓒ 2011 Information Processing Society of Japan78

ComSys20112011/12/1

Page 7: iops ÿ G+ busy O IO '! ' ¦ + X

7 busy

Fig. 7 Method of updating proportion limit busy rate

20%

IO

3 busy

iops IO busy

busy

1iops IO busy

iops

1IO

3 svctmA+B

iops=80 svctm=5.68ms

iops=160 svctm=5.72 svctm

0.78% iops 80

160 IO

1iops

IO busy

1iops IO

busy iops

5.

3 1

8

• A IO thread

– thread

10thread IO

3 500GB

IO ( 9 )

– thread

8

Fig. 8 Outline of verification system

4 5 6

thread

• B IO thread

– thread

10thread IO

3 500GB

IO ( 9 )

– thread

4 5 6

thread

thread

• thread

– B

– IO busy =95%

A B

3 3

1 2 3 4

5 6

3 thread A,B

Table 3 Workload given to thread A and B

A B

1

2

3

20 iostat -x

コンピュータシステム・シンポジウム C om puter System Sym posium

ⓒ 2011 Information Processing Society of Japan79

ComSys20112011/12/1

Page 8: iops ÿ G+ busy O IO '! ' ¦ + X

9

Fig. 9 How to give load

1 10 4

11 %util=100

A 50iops

A

B iops 7

2 %util=100

B iops

11

%util=100 3 500

IO busy 63%

%util=100 3 700

thread A 50iops

1 IO busy

2

960 A 5iops B

4 1 iops

Table 4 Target iops of evaluation 1

A iops B iops ( )

5 50 5

50 50 10

5 50 10

5 2 iops

Table 5 Target iops of evaluation 2

A iops B iops ( )

10 200 5

100 200 10

50 200 10

6 3 iops

Table 6 Target iops of evaluation 3

A iops B iops ( )

5 30 5

30 30 10

5 30 10

4 X

10 1

Fig. 10 Outcome of an experiment of evaluation 1

11 1 %util=100

Fig. 11 Outcome of an experiment of evaluation 1

(frequency to which %util=100 is observed)

A B %util

50% IO busy

%util/iops 5iops

0.8 1.3

IO busy

2 12 5

13 %util=100

A 100iops

A iops

B

IO busy

87% A 100iops

1 IO busy

3 14 6

15 %util=100

コンピュータシステム・シンポジウム C om puter System Sym posium

ⓒ 2011 Information Processing Society of Japan80

ComSys20112011/12/1

Page 9: iops ÿ G+ busy O IO '! ' ¦ + X

12 2

Fig. 12 Outcome of an experiment of evaluation 2

13 2 %util=100

Fig. 13 Outcome of an experiment of evaluation 2

(frequency to which %util=100 is observed)

A 100iops

2

3 1 IO busy

1 2

3 A thread IO

1 IO busy

2

IO busy IO busy

IO busy

1iops IO busy

1iops IO busy

1

14 3

Fig. 14 Outcome of an experiment of evaluation 3

15 3 %util=100

Fig. 15 Outcome of an experiment of evaluation 3

(frequency to which %util=100 is observed)

6.

IO busy iops

busy

busy IO busy

iops

IO busy IO busy

1iops IO busy

IO busy busy

3

IO

IO busy

2

コンピュータシステム・シンポジウム C om puter System Sym posium

ⓒ 2011 Information Processing Society of Japan81

ComSys20112011/12/1

Page 10: iops ÿ G+ busy O IO '! ' ¦ + X

1 IO

busy

7.

IT

1) Christopher R.Lumb, Arif Merchant, and

Guillermo A.Alvarez. Facade: virtual storage

devices with performance guarantees. Proceed-

ings of FAST’03, March 2003.

2) Ajay Gulati, Irfan Ahmad, and Carl A. Wald-

spurger. PARDA: Proportional Allocation of

Resources for Distributed Storage Access. Pro-

ceedings of FAST’09, February 2009.

3) http://sourceforge.jp/projects/sfnet ioband/

4)

HOKKE-2009

5)

SPA

2005

6) Kyung D.Ryu, Jeffery K. Hollingsworth, and

Peter J. Keleher. Efficient Network and I/O

Throttling for Fine-Grain Cycle Steamling. In

Proceedings of the ACM/IEEE SC2001 Con-

ference (SC ‘01), November 2001

7) : iops

IO Busy IO

8 SAC-

SIS2010, May 2010.

8) SGI IRIX Admin: Disks and Filesystems(

), 007-2825-009JP

コンピュータシステム・シンポジウム C om puter System Sym posium

ⓒ 2011 Information Processing Society of Japan82

ComSys20112011/12/1