
BENCHMARK ON DELL 2950+MD1000

ATLAS Tier2/Tier3 workshop

Wenjing Wu

AGLT2 / University of Michigan

2008/05/27

DELL 2950+4 MD1000


CURRENT SETUP

2950 HARDWARE EQUIPMENT

• Chassis model: PowerEdge 2950
• 2 CPUs: quad-core Intel Xeon E5335 @ 2.00 GHz (Model 15, Stepping 11)
• Memory: 16 GB DDR2 SDRAM, 667 MHz
• NICs: Broadcom NetXtreme II BCM5708 Gigabit Ethernet; Myricom 10G-PCIE-8A-C
• RAID controllers:
  PERC 5/E adapter, version 5.1.1-0040 (slot 1, PCIe x8)
  PERC 5/E adapter, version 5.1.1-0040 (slot 2, PCIe x4)
  PERC 6/E adapter, firmware 6.0.2-0002 (slot 1, PCIe x8) (extra $700)
  PERC 6/E adapter, firmware 6.0.2-0002 (slot 2, PCIe x4) (extra $700)
• Storage enclosures: 4× MD1000 (each with 15 SATA-II 750 GB disks)

2950 SOFTWARE EQUIPMENT

• OS: Scientific Linux CERN SLC release 4.5 (Beryllium)
• Kernel version: 2.6.20-20UL3smp (current: 2.6.20-20UL5smp)
• BIOS version: 1.5.1 (current: 2.2.6)
• BMC version: 1.33 (current: 2.0.5)
• DRAC 5 version: 1.14 (current: 1.33)

BENCHMARK TOOLS

• Benchmark tool: iozone (iozone-3.279-1.el4.rf.x86_64)
• RAID configuration tool: omconfig (srvadmin-omacore-5.2.0-460.i386)
• Software RAID: mdadm (mdadm-2.6.1-4.x86_64)
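The iozone invocations below are illustrative of how runs like these are typically launched (the mount point, file names, and thread count are hypothetical, not taken from the slides):

```shell
# Sequential write (-i 0) then read (-i 1) with 512 KB records.
# With -t, the -s size is per thread: 4 threads x 8 GB = 32 GB total,
# twice the 16 GB of RAM, so the page cache cannot hide the disks.
# -e includes flush (fsync) time in the reported throughput.
iozone -i 0 -i 1 -r 512k -s 8g -t 4 -e \
    -F /raid/t1 /raid/t2 /raid/t3 /raid/t4
```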

METRICS OF BENCHMARK

Controller level (both PERC5/PERC6):
• RAID setup (r0, r5, r50, r6, r60)
• Read and write policy (ra, ara, nra, wb, wt, fwb)
• Threshold of both controllers
• Stripe size (8 KB, 16 KB, 32 KB, 64 KB, 128 KB, 256 KB, 512 KB, 1024 KB)

PERC5 supports a maximum stripe size of 128 KB; PERC6 supports up to 1024 KB.

Kernel tuning (2.6.20-20UL3smp):
• Read-ahead size
• Request queue length
• IO scheduler

File system tuning (xfs):
• Inode size
• su/sw size
• Internal/external log device

GENERAL PRINCIPLE FOR BENCHMARK

Many factors affect the benchmark results; to measure one factor, we fix the others at the best values we have found so far or anticipate.

We also need to benchmark different I/O patterns (sequential read/write, random read/write, mixed workloads).

In all, we need to benchmark all the options to find the best configuration for our Dell 2950.

CONTROLLER LEVEL

• RAID setup (r5, r50, r6, r60)
• Read and write policy (ra, ara, nra, wb, wt, fwb)
• Threshold of controller (PERC5/PERC6)
• Stripe size (8 KB, 16 KB, 32 KB, 64 KB, 128 KB, 256 KB, 512 KB, 1024 KB)

PERC5 supports a maximum stripe size of 128 KB; PERC6 supports up to 1024 KB.
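With OMSA's omconfig, a virtual disk for one of these test points can be created along these lines (a sketch: the controller ID and physical-disk IDs are hypothetical examples, and exact option spellings vary between srvadmin releases):

```shell
# Create a RAID-50 virtual disk with a 128 KB stripe, adaptive
# read-ahead, and write-back cache across the listed physical disks
# (IDs are examples in connector:enclosure:disk form).
omconfig storage controller action=createvdisk controller=0 \
    raid=r50 size=max stripesize=128kb \
    readpolicy=ara writepolicy=wb \
    pdisk=0:0:0,0:0:1,0:0:2,0:0:3,0:0:4,0:0:5
```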

PERC5 VS PERC6

System setup:
• Controller = perc6/perc5
• PCI slots = both PCI Express x4 and x8
• RAID = r60/r6/r50
• Stripe size = 128 KB
• read = ra, write = wb
• OS kernel = 2.6.20-20UL3smp
• readAhead size = 10240 blocks = 5 MB
• nr_queue = 128, queue_depth = 128
• IO_scheduler = deadline

File system options:
• su=0, sw=0
• isize=256, bsize=4096
• log=internal, bsize=4096

iozone options:
• File size = 32 GB (RAM size = 16 GB)
• Record size = 512 KB, multiple threads

Measure: perc5 vs perc6

READ

[Figure: "perc5E vs perc6E read" — performance (MB/s) vs. number of threads (1-12), y-axis 0-2000; series: p52r50-stripe128, p62r50-stripe128, p62r60-stripe128, p62r60-stripe512]

WRITE

[Figure: "perc5E vs perc6E write" — performance (MB/s) vs. number of threads (1-12), y-axis 0-1200; series: p52r50-stripe128, p62r50-stripe128, p62r60-stripe128, p62r60-stripe512]

RAID SETUP

System setup:
• Controller = perc5/perc6
• PCI slots = both PCI Express x4 and x8
• Stripe size = 128 KB
• OS kernel = 2.6.20-20UL3smp
• readAhead size = 10240 blocks = 5 MB
• nr_queue = 128, queue_depth = 128
• IO_scheduler = deadline

File system options:
• su=0, sw=0
• isize=256, bsize=4096
• log=internal, bsize=4096

iozone options:
• File size = 32 GB (RAM size = 16 GB)
• Record size = 512 KB, multiple threads

Measure: different RAID levels (r5, r50, r6, r60)

WRITE

[Figure: "write performance" — performance (MB/s) vs. number of threads (1-12), y-axis 0-1200; series: p5-4r5, p5-2r5, p5-2r50, p6-2r50, p6-r6, p6-2r60]

SOFT RAID ON PERC5

Soft RAID0 over 2× r5: the soft-RAID stripe size should match the hardware RAID5 stripe size (128 KB).

Soft RAID0 over 2× r50: the soft-RAID stripe size should match the hardware RAID5 stripe size (128 KB).
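The soft-RAID0-over-hardware-RAID layering can be set up with mdadm roughly as follows (the /dev/sdX names of the two hardware virtual disks are hypothetical); --chunk is given in KB and matches the 128 KB hardware stripe:

```shell
# Stripe a software RAID0 across the two PERC virtual disks;
# chunk size = hardware RAID5 stripe size (128 KB)
mdadm --create /dev/md0 --level=0 --raid-devices=2 --chunk=128 \
    /dev/sdb /dev/sdc
```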

WRITE

[Figure: "write performance" — performance (MB/s) vs. number of threads (1-12), y-axis 0-800; series: p5-2r5, p5-sr02r5, p5-2r50, p5-sr02r50]

READ

[Figure: "read performance" — performance (MB/s) vs. number of parallel threads (1-12), y-axis 0-2000; series: p5-4r5, p5-2r5, p5-2r50, p6-2r50, p6-r6, p6-2r60]

READ

[Figure: "read performance" — performance (MB/s) vs. number of parallel threads (1-12), y-axis 0-1800; series: p5-2r5, p5-sr02r5, p5-2r50, p5-sr02r50]

READ AND WRITE POLICY

System setup:
• Controller = perc5
• PCI slots = both PCI Express x4 and x8
• RAID = r50
• Stripe size = 128 KB
• OS kernel = 2.6.20-20UL3smp
• readAhead size = 10240 blocks = 5 MB
• nr_queue = 128, queue_depth = 128
• IO_scheduler = deadline

File system options:
• su=0, sw=0
• isize=256, bsize=4096
• log=internal, bsize=4096

iozone options:
• File size = 32 GB (RAM size = 16 GB)
• Different record sizes

Measure: different policies (ra, nra, ara, wb, wt, fwb)
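The cache policies on an existing virtual disk can be switched between runs with something like the following (a sketch; controller and vdisk IDs are hypothetical):

```shell
# ra/nra/ara select the read-ahead policy; wb/wt/fwb select
# write-back, write-through, or forced write-back caching
omconfig storage vdisk action=changepolicy controller=0 vdisk=0 \
    readpolicy=ra writepolicy=wb
```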

WRITE

[Figure: "write and read policies — Write" — throughput vs. record size (32-8192 kB), y-axis 0-450,000 (likely KB/s as reported by iozone); series: nra, fwb, ra, wt, ara, wb]

READ

[Figure: "write and read policies — Read" — throughput vs. record size (32-8192 kB), y-axis 740,000-775,000 (likely KB/s as reported by iozone); series: nra, fwb, ra, wt, ara, wb]

PERC5 THRESHOLD

System setup:
• Controller = perc5
• PCI slot = PCI Express x8
• RAID = r0
• Stripe size = 128 KB
• read = ra, write = wb
• OS kernel = 2.6.20-20UL3smp
• readAhead size = 10240 blocks = 5 MB
• nr_queue = 128, queue_depth = 128
• IO_scheduler = deadline

File system options:
• su=0, sw=0
• isize=256, bsize=4096
• log=internal, bsize=4096

iozone options:
• File size = 32 GB (RAM size = 16 GB)
• Record size = 512 KB

Measure: a single controller with different numbers of disks (4-30 disks).

PERC5 THRESHOLD

[Figure: "raid0 with numbers of disks" — throughput vs. number of disks (4, 8, 10, 15, 18, 20, 24, 30), y-axis 0-900,000 (likely KB/s as reported by iozone); series: read, write]

PERC6 THRESHOLD

System setup:
• Controller = perc6
• PCI slot = PCI Express x8
• RAID = r60
• Stripe size = 512 KB
• read = ra, write = wb
• OS kernel = 2.6.20-20UL3smp
• readAhead size = 10240 blocks = 5 MB
• nr_queue = 512, queue_depth = 128
• IO_scheduler = deadline

File system options:
• su=0, sw=0
• isize=256, bsize=4096
• log=internal, bsize=4096

iozone options:
• File size = 32 GB (RAM size = 16 GB)
• Record size = 512 KB

Measure: a single controller with different numbers of disks (8, 12, 24, 30, 45).

PERC6 THRESHOLD

[Figure: performance (MB/s) vs. number of disks (8, 12, 18, 24, 30, 45), y-axis 0-1000; series: write, re-write, read, re-read]

STRIPE SIZE

System setup:
• Controller = perc6
• PCI slots = both PCI Express x4 and x8
• RAID = r60
• Stripe size = (64, 128, 256, 512, 1024) KB
• read = ra, write = wb
• OS kernel = 2.6.20-20UL3smp
• readAhead size = 10240 blocks = 5 MB
• nr_queue = 512, queue_depth = 128
• IO_scheduler = deadline

File system options:
• su=0, sw=0
• isize=256, bsize=4096
• log=internal, bsize=4096

iozone options:
• File size = 32 GB (RAM size = 16 GB)
• Record size = 512 KB, multiple threads

Measure: different stripe sizes (64, 128, 256, 512, 1024) KB

R60 - STRIPE SIZE

[Figure: read throughput vs. number of threads (1-12), y-axis 0-2,000,000 (likely KB/s as reported by iozone); series: r6-128, r60-64, r60-128, r60-256, r60-512, r60-1024]

R60 - STRIPE SIZE

[Figure: write throughput vs. number of threads (1-12), y-axis 0-800,000 (likely KB/s as reported by iozone); series: r6-128, r60-64, r60-128, r60-256, r60-512, r60-1024]

KERNEL TUNING

• Read-ahead size
• Request queue length
• IO scheduler

READAHEAD SIZE

System setup:
• Controller = perc5
• PCI slots = both PCI Express x4 and x8
• RAID = r50
• Stripe size = 128 KB
• read = ra, write = wb
• OS kernel = 2.6.20-20UL3smp
• nr_queue = 128, queue_depth = 128
• IO_scheduler = deadline

File system options:
• su=0, sw=0
• isize=256, bsize=4096
• log=internal, bsize=4096

iozone options:
• File size = 32 GB (RAM size = 16 GB)
• Record size = 512 KB

Measure: different readAhead sizes
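The read-ahead size is counted in 512-byte blocks, so the 10240-block setting used elsewhere in these tests equals 5 MB. A typical way to set and verify it (device name hypothetical):

```shell
blockdev --setra 10240 /dev/sdb   # 10240 x 512 B = 5 MB read-ahead
blockdev --getra /dev/sdb         # verify the current setting
```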

READ

[Figure: "read performance with different readAhead size" — performance (MB/s) vs. readAhead size in blocks (128 to 18432), y-axis 0-900; series: read]

REQUEST QUEUE LENGTH

System setup:
• Controller = perc6
• PCI slots = both PCI Express x4 and x8
• RAID = r60
• Stripe size = 128 KB
• read = ra, write = wb
• OS kernel = 2.6.20-20UL3smp
• readAhead size = 10240 blocks = 5 MB
• queue_depth = 128
• IO_scheduler = deadline

File system options:
• su=0, sw=0
• isize=256, bsize=4096
• log=internal, bsize=4096

iozone options:
• File size = 32 GB (RAM size = 16 GB)
• Record size = 512 KB, multiple threads

Measure: different request queue lengths
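The request queue length is exposed per block device in sysfs (device name hypothetical):

```shell
cat /sys/block/sdb/queue/nr_requests         # current queue length
echo 512 > /sys/block/sdb/queue/nr_requests  # set a new length
```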

READ

[Figure: "diff nr_queue read" — throughput vs. number of threads (1-12), y-axis 0-1,400,000 (likely KB/s as reported by iozone); series: nr_queue = 32, 64, 128, 256, 512, 1024]

WRITE

[Figure: "diff nr_queue write" — throughput vs. number of threads (1-12), y-axis 0-1,000,000 (likely KB/s as reported by iozone); series: nr_queue = 32, 64, 128, 256, 512, 1024]

IO SCHEDULER

System setup:
• Controller = perc6
• PCI slots = both PCI Express x4 and x8
• RAID = r50
• Stripe size = 128 KB
• read = ra, write = wb
• OS kernel = 2.6.20-20UL3smp
• readAhead size = 10240 blocks = 5 MB
• nr_queue = 512, queue_depth = 128

File system options:
• su=0, sw=0
• isize=256, bsize=4096
• log=internal, bsize=4096

iozone options:
• File size = 32 GB (RAM size = 16 GB)
• Record size = 512 KB, multiple threads

Measure: different schedulers
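On 2.6 kernels the scheduler is switched per device through sysfs (device name hypothetical):

```shell
cat /sys/block/sdb/queue/scheduler       # active scheduler shown in brackets
echo deadline > /sys/block/sdb/queue/scheduler
```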

READ

[Figure: "diff IO scheduler read" — throughput vs. number of threads (1-12), y-axis 0-1,600,000 (likely KB/s as reported by iozone); series: anticipatory, cfq, deadline, noop]

WRITE

[Figure: "diff IO scheduler write" — throughput vs. number of threads (1-12), y-axis 0-1,000,000 (likely KB/s as reported by iozone); series: anticipatory, cfq, deadline, noop]

RANDOM READ

[Figure: "diff IO scheduler random read" — throughput vs. number of threads (1-12), y-axis 0-200,000 (likely KB/s as reported by iozone); series: anticipatory, cfq, deadline, noop]

FILESYSTEM TUNING

• Inode size
• su/sw size
• Internal/external log device

FILE SYSTEM

System setup:
• Controller = perc5
• RAID = r50
• PCI slots = both PCI Express x4 and x8
• Stripe size = 128 KB
• OS kernel = 2.6.20-20UL3smp
• readAhead size = 10240 blocks = 5 MB
• nr_queue = 128, queue_depth = 128
• IO_scheduler = deadline

File system options:
• su=0, sw=0
• isize=256, bsize=4096

dd options:
• File size = 10 GB (RAM size = 320 MB)
• Record size = 1 MB

Measure: internal or external log device for xfs
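An external xfs log lives on a separate device, given once at mkfs time and again at mount time. A minimal sketch (device names, log size, and mount point are hypothetical):

```shell
# Put the journal on a separate device so log writes do not
# compete with data I/O on the main array
mkfs.xfs -l logdev=/dev/sdc1,size=64m /dev/md0
mount -o logdev=/dev/sdc1 /dev/md0 /raid
```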

WRITE

[Figure: "xfs external vs internal log device write" — performance (MB/s) vs. number of threads (1-12), y-axis 0-800; series: ex-log-10240, in-log-10240]

READ

[Figure: "xfs external vs internal log device read" — performance (MB/s) vs. number of threads (1-12), y-axis 0-1800; series: ex-log-10240, in-log-10240]

XFS INODE SIZE

System setup:
• Controller = perc5
• RAID = r50
• PCI slots = both PCI Express x4 and x8
• Stripe size = 128 KB
• OS kernel = 2.6.20-20UL3smp
• readAhead size = 10240 blocks = 5 MB
• nr_queue = 128, queue_depth = 128
• IO_scheduler = deadline

File system options:
• su=0, sw=0
• bsize=4096
• Internal log (isize=256, bsize=4096)

dd options:
• File size = 10 GB (RAM size = 320 MB)
• Record size = 1 MB

Measure: xfs inode size
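The inode size is fixed at mkfs time; xfs_info reports what a filesystem was built with (device and mount point hypothetical):

```shell
mkfs.xfs -i size=512 /dev/sdb1   # isize in bytes; 256 was the default then
xfs_info /raid                   # reports isize, bsize, sunit/swidth
```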

XFS INODE SIZE

[Figure: "xfs inode size" — performance (MB/s) vs. inode size in bytes (256, 512, 1024, 2048, 3072, 4096), y-axis 0-800; series: write, read]

XFS SU/SW SIZE

System setup:
• Controller = perc5
• RAID = r50
• PCI slots = both PCI Express x4 and x8
• Stripe size = 128 KB
• OS kernel = 2.6.20-20UL3smp
• readAhead size = 10240 blocks = 5 MB
• nr_queue = 128, queue_depth = 128
• IO_scheduler = deadline

File system options:
• isize=256, bsize=4096
• Internal log (isize=256, bsize=4096)

iozone options:
• File size = 10 GB (RAM size = 320 MB)
• Record size = 1 MB

Measure: xfs su/sw size
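Here su is the hardware stripe unit and sw the number of data-bearing disks, so two 15-disk RAID5 spans striped together give sw = 2 x (15 - 1) = 28, matching the 128k/28 test point. A small sketch (the sw helper function and device name are hypothetical):

```shell
# sw <spans> <disks_per_span>: data disks across N RAID5 spans,
# each span losing one disk to parity
sw() { echo $(( $1 * ($2 - 1) )); }

# Build the mkfs command for a 128 KB stripe over 2x 15-disk RAID5
echo "mkfs.xfs -d su=128k,sw=$(sw 2 15) /dev/md0"
# prints: mkfs.xfs -d su=128k,sw=28 /dev/md0
```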

SU/SW SIZE

[Figure: "xfs su/sw size" — performance (MB/s) vs. su/sw size (128k/28, 128k/14, 64k/28, 64k/14, 32k/28, 32k/14, 0/0), y-axis 0-800; series: write, read]

OUR SETUP NOW

System setup:
• Controller = perc6
• RAID = r60
• PCI slots = both PCI Express x4 and x8
• Stripe size = 512 KB
• OS kernel = 2.6.20-20UL5smp

Kernel options:
• readAhead size = 10240 blocks = 5 MB
• nr_queue = 512, queue_depth = 128
• IO_scheduler = deadline

File system options:
• isize=256, bsize=4096
• Internal log (isize=256, bsize=4096)

OUR PERFORMANCE NOW

Single read = 670 MB/s; aggregate read = 1500 MB/s (threads >= 2). Even with 40 concurrent readers it still achieves 1200 MB/s.

Single write = 320 MB/s; aggregate write = 680 MB/s (threads >= 2).

This is not the best possible I/O: r60 with a 128 KB stripe size achieves 760 MB/s single-stream read, while its single-stream write is almost the same as ours. For a production system, however, we focus more on aggregate performance.

ONGOING PROJECT

CITI people at UM are working on disk-to-disk transfer over 10 GbE.

Deliverables:
• Monthly report on performance tests, server configurations, kernel tuning, and kernel bottlenecks
• Final report on performance tests, server configurations, kernel tuning, and kernel bottlenecks

UltraLight kernel

Deliverables:
• Tuned and tested UltraLight kernel with full feature set
• Current 10GbE NIC drivers
• Current storage drivers
• Tuned for WAN data movement
• Web100 patches
• Other patches for performance, security, and stability
• Release document and web page updates for the UltraLight kernel: http://www.ultralight.org/web-site/ultralight/workgroups/network/Kernel/kernel.html
• Recommend sustainable options for the UltraLight kernel in the near and intermediate term

ONGOING PROJECT (CONT.)

QoS experiments

Deliverable:
• Document throughput performance with and without QoS in the face of competing traffic

MORE INFORMATION

AGLT2 IO benchmark page:
https://hep.pa.msu.edu/twiki/bin/view/AGLT2/IOTestOnRaidSystems

References:
http://www.makarevitch.com/rant/3ware/
http://insights.oetiker.ch/linux/raidoptimization.html
