Klaus Bergmann 02/24/2003, Session # 2559 Linux for zSeries Performance Update


Page 1: Linux for zSeries Performance Update

Klaus Bergmann, 02/24/2003, Session # 2559

Linux for zSeries Performance Update

Page 2: Linux for zSeries Performance Update

The following are trademarks of the International Business Machines Corporation in the United States and/or other countries: Enterprise Storage Server, ESCON*, FICON, FICON Express, HiperSockets, IBM*, IBM logo*, IBM eServer, Netfinity*, S/390*, VM/ESA*, WebSphere*, z/VM, zSeries. (* Registered trademarks of IBM Corporation.)

The following are trademarks or registered trademarks of other companies:
Intel is a trademark of the Intel Corporation in the United States and other countries.
Java and all Java-related trademarks and logos are trademarks or registered trademarks of Sun Microsystems, Inc., in the United States and other countries.
Lotus, Notes, and Domino are trademarks or registered trademarks of Lotus Development Corporation.
Linux is a registered trademark of Linus Torvalds.
Microsoft, Windows and Windows NT are registered trademarks of Microsoft Corporation.
Penguin (Tux) compliments of Larry Ewing.
SET and Secure Electronic Transaction are trademarks owned by SET Secure Electronic Transaction LLC.
UNIX is a registered trademark of The Open Group in the United States and other countries.
All other products may be trademarks or registered trademarks of their respective companies.

Trademarks

Page 3: Linux for zSeries Performance Update

Agenda

Hardware
Scalability
Networking
Disk I/O
  FCP / SCSI
  ESS
  LVM

All measurement results are based on IBM internal benchmarks in a controlled environment; results may vary.

Page 4: Linux for zSeries Performance Update

[Diagram: z900 multichip module layouts, showing PU chips (PU0-PUF, PU8-PU13), 4 MB system data caches (SD0-SD7), memory bus adapters (MBA), storage controllers (SC0, SC1), and the clock chip (CLK)]

20-PU module (models 110-116, 1C1-1C9, 210-216, 2C1-2C9): 20 PU chips @ 1.3 / 1.09 ns; 3 SAPs, 1 spare; up to 16 CPs; up to 8 ICFs/IFLs

12-PU module (models 101..109, 100 (CF)): 12 PU chips @ 1.3 ns; 2 SAPs, 1 spare; up to 9 CPs; up to 8(9) ICFs/IFLs

Page 5: Linux for zSeries Performance Update

z900 system structure: optimized for maximum internal bandwidth

[Diagram: two 10-PU clusters, each with a shared 16 MB Level 2 cache and memory cards of up to 16 GB each; each PU carries 256K data and 256K instruction Level 1 caches. Relative access latencies: memory 100..150x, Level 2 cache 15..20x, Level 1 cache 1x]

The problem: memory access does not scale with the CPU cycle.

Solutions:
High bandwidth (=> throughput): 4 x 16 Byte @ 2.18 ns ==> 28 GB/sec
Cache hierarchies (latencies): Level 1 caches on the CPU chip, Level 2 cache shared by 10 PUs

Page 6: Linux for zSeries Performance Update

z900 system structure: optimized for maximum external bandwidth

[Diagram: the same two 10-PU clusters with shared 16 MB Level 2 caches and memory (up to 16 GB per card), plus four memory bus adapters (MBA), each driving 6 STIs at 1 GB/s (4 x 6 x 1 GB/s)]

Page 7: Linux for zSeries Performance Update

New functions since 04/2002:

z900 'Turbo' @ 1.09 nsec: 16 additional models 2C1..2C9, 210..216; ..20..% faster systems

Additional z900 functionality: support for FCP and SCSI on the FICON Express feature in Linux environments; LA since 6/2002, GA 1Q 2003 (with new distributions)

CIU (Customer Initiated Upgrade): web interface for processor or memory upgrades; LIC/microcode download and upgrade via RSF

OSA-Express (QDIO): IPv6, VLAN, SNMP, ...

Page 8: Linux for zSeries Performance Update

[Chart: Relative Performance z900 vs G6 (2064-1C1 = 250) — relative performance (0-3000) over 1 to 16 CPs, for G6 Turbo, z900 2-bus (2064-10x), z900 4-bus (2064-1Cx, 11x), and z900 4-bus (2064-2Cx, 21x)]

Page 9: Linux for zSeries Performance Update

FICON Express Connectivity Options - 2 Gbps Links

[Diagram: ESS 800, ESS F20, and FICON CUs attached directly, through DWDM, and through cascaded switches, with 1 Gbps and 2 Gbps autonegotiated links]

Direct attachment: IBM Enterprise Storage Server 800

DWDM and optical amplifiers: Cisco ONS 15540 ESP (LX, SX) and optical amplifiers (LX, SX); Nortel Optera Metro 5200 and 5300E* and optical amplifiers*; IBM 2029 Fiber Saver*

FICON switched and cascaded connectivity: McDATA ED-6064 (2032-064); INRANGE FC/9000-001/-128 (2042)

Advantages of 2 Gbps links: higher throughput, improved performance; extended opportunities for channel consolidation

Prerequisite: FICON Express cards (FC 2319, 2320)
Native FICON, FICON CTC, FICON Cascaded Directors, Fibre Channel
Link speeds are negotiated between server and device, transparent to application and user

* SX expected 2H 2003

Page 10: Linux for zSeries Performance Update

Our Hardware for Measurements

2064-216 (z900): 1.09 ns (917 MHz); 2 x 16 MB L2 cache (shared); 64 GB; LPAR; ESCON, FICON, HiperSockets, OSA-Express GbE

2105-F20 (Shark): 384 MB NVS; 16 GB cache; 128 x 36 GB disks at 10,000 RPM; FCP (1 Gbps); FICON (1 Gbps)

8687-3RX (8-way x440): 8-way Intel Pentium III Xeon 1.6 GHz; 8 x 512K L2 cache (private); hyperthreading; Summit chipset

Page 11: Linux for zSeries Performance Update

SuSE SLES7 versus SuSE SLES8

From kernel version 2.4.7 / 2.4.17 to version 2.4.19
From glibc version 2.2.4-31 to version 2.2.5-84
From gcc version 2.95.3 to version 3.2-31
Huge number of United Linux patches: 1.3 MLOC (including x, p, i changes)
New Linux scheduler
Async I/O
...

Page 12: Linux for zSeries Performance Update

Dbench File I/O - Scalability

[Charts: Dbench throughput in MB/s (0-1200) vs. number of processes (1-40):
SLES7, LPAR, 2064-216, ext2, 31-bit, for 1, 4, 8, and 16 CPUs
SLES8, LPAR, 2064-216, ext2, 31-bit, for 1, 4, 8, and 16 CPUs
Netfinity 8-way (Intel Pentium III, 700 MHz), ext2, kernel 2.4.14, for 1, 2, 4, and 8 CPUs
xSeries 8-way (Intel Pentium III Xeon, 1.6 GHz, hyperthreading, Summit chipset), ext2, kernel 2.4.20, for 1 (1), 1 (2), 2 (4), 4 (8), and 8 (16) CPUs; values in () are the Linux maxcpus parameter]

Page 13: Linux for zSeries Performance Update

Context Switching

[Charts: context switch latency in microseconds (0-1000) vs. number of processes (2-256), for working set sizes from 0K to 256K:
Context switch, z900, kernel 2.4.18
Context switch, x440, kernel 2.4.20]

Page 14: Linux for zSeries Performance Update

Networking

Netmarks Results

Page 15: Linux for zSeries Performance Update

Transaction workload - RR 200/1000

[Chart: transactions per second (0-9000, higher is better), SLES7 vs. SLES8, 1 connection, for connection types 31-bit HS 32k, 31-bit GbE 9000, and 31-bit GbE 1500]

Page 16: Linux for zSeries Performance Update

Network cost VM - transaction workload

[Chart: network cost in 2064-216 CPU microseconds per transaction (0-750, lower is better), SLES7 vs. SLES8, RR 200/1000 with 1 connection, for connection types 31-bit HS 32K, 31-bit GbE 9000, and 31-bit GbE 1500]

Page 17: Linux for zSeries Performance Update

Connect Request Response workload (CRR 64/8k - 10 connections)

[Chart: transactions per second (0-12000, higher is better), SLES7 vs. SLES8, for connection types GL 32k 64-bit VM, HS 32k 64-bit VM, HS 32k 64-bit LPAR, GL 32k 31-bit VM, HS 32k 31-bit VM, and HS 32k 31-bit LPAR]

Page 18: Linux for zSeries Performance Update

Disk I/O

Don't treat the ESS as a black box; understand its structure.
The default is close to the worst case: you ask for 16 disks, and your SysAdmin gives you addresses 5100-510F.
What's wrong with that?
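To see why consecutive addresses are a poor default, note that in this deck's sample setup the ranks sit at 0x400-aligned base addresses (5000, 5400, 5800, 5C00). A small sketch of that mapping (the addressing scheme is taken from the later Sample Setup slides, not a general ESS rule):

```shell
#!/bin/sh
# Illustrative sketch (assumes the sample layout from this deck, where
# ranks start at base addresses 5000, 5400, 5800, 5C00).
rank_of() {
    # rank index = (device number - 0x5000) / 0x400
    echo $(( (0x$1 - 0x5000) / 0x400 ))
}

# The "default" request: 16 consecutive addresses 5100..510F
for dev in 5100 5104 5108 510F; do
    echo "device $dev -> rank $(rank_of $dev)"
done
# Every one of them lands on rank 0: all 16 disks share one RAID-5
# rank and, behind it, one host adapter path.

# Spreading the same 16 disks across ranks instead:
for dev in 5000 5400 5800 5C00; do
    echo "device $dev -> rank $(rank_of $dev)"
done
```

With the spread-out addresses, four ranks (and four HA paths) work in parallel instead of one.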

Page 19: Linux for zSeries Performance Update

ESS Architecture

[Diagram: the FCP path from zSeries 2064 FCP CHPIDs through an FCP switch (2109) into the ESS 2105-F20: host adapters (I/O adapters) in 4 HA bays, two cluster processor complexes (each a 4-way SMP RISC system), device adapter (DA) pairs, and disk ranks on loops A and B]

CHPIDs
Host adapters: 16 HAs in 4 bays
Device adapter pairs, each supporting 2 loops
Disk ranks, each rank is one RAID-5 array

Page 20: Linux for zSeries Performance Update

Scenario      CHPIDs  Host Adapt.  Ranks  Disks  Bottleneck
single disk   1       1            1      1      1 HA
single rank   1       1            1      8      1 HA
single HA     1       1            4      8      1 HA
single CHPID  1       4            4      8      1 CHPID
two CHPIDs    2       4            4      8      4 HA
Max. Avail.   4       4            4      8      4 HA

Benchmark used for measuring: Iozone (http://www.iozone.org)
Multi-process sequential file system I/O, scaling from 1 to 8/16 processes
Each process writes and reads a 350 MB file on a separate disk
System: LPAR, 4 CPUs, 128 MB main memory, Linux 2.4.17

Scaling Tests
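The slides don't give the exact Iozone command line; a run of this shape could be scripted roughly as below. The mount points /mnt/diskN and the 64k record size are assumptions; the 350 MB per-process file and the process counts match the setup above. The script only prints the commands it would run:

```shell
#!/bin/sh
# Sketch of an Iozone scaling run (hypothetical paths, dry run).
# Each of N processes writes (-i 0) and then reads (-i 1) a 350 MB
# file, each file on a separately mounted disk.
for n in 1 4 8 16; do
    files=""
    i=1
    while [ $i -le $n ]; do
        files="$files /mnt/disk$i/iozone.tmp"
        i=$((i + 1))
    done
    # -s: file size per process, -r: record size, -t: process count,
    # -F: one target file per process
    echo "iozone -s 350m -r 64k -i 0 -i 1 -t $n -F$files"
done
```

Dropping the leading `echo` would actually execute the runs.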

Page 21: Linux for zSeries Performance Update

Hardware Setup

2064-216, 917 MHz, 256 MB LPAR
4 FICON Express channels used for FCP (IOCDS: type FCP)
6 FICON Express channels used for FICON (IOCDS: type FC)
2109-F16 FCP switch
ESS 2105-F20:
  16 GB cache, 4 FCP host adapters, 6 FICON host adapters
  4 device adapter pairs
  Only A-loops contain disks (36.4 GB, 10,000 RPM):
  4 ranks of FB (fixed block) disks used for FCP
  8 ranks of ECKD disks used for FICON measurements

Page 22: Linux for zSeries Performance Update

[Chart: Iozone maximum throughput, 31-bit, sequential file system I/O, in MB/s (0-300), for sequential write and sequential read in the scenarios single disk, single rank, single HA, single CHPID, two CHPIDs, and 4C 4W 4R, on the 2105-F20]

Results for sequential file system I/O:
1 HA limits throughput to 40 MB/s write and 65 MB/s read, regardless of the number of ranks
4 HAs limit throughput to 125 MB/s write and 240 MB/s read, but 4 CHPIDs are required to make use of them
The difference between 31 bit and 64 bit is small
The values are expected to increase further with more ranks, HAs, and CHPIDs

Page 23: Linux for zSeries Performance Update

Locating disks

Use the tool 'ESS Specialist' to generate an HTML report of all disks (Storage Allocation -> Tabular View -> print table).

For an ECKD disk the report shows: the logical control unit address (IOCDS: CUADD), the unit address (IOCDS: UNITADD), and the rank information.

For an FCP disk the report shows: the rank information and the 16-bit LUN (16-bit LUN * 0x1000000000000 = 64-bit FCP LUN).

FCP disks are not defined in the IOCDS!

Page 24: Linux for zSeries Performance Update

CHPIDs: C1, C2, C3, C4
Host adapters: HA1, HA2, HA3, HA4
Device numbers / LUNs:
rank1: 5000, 5001, 5002, 5003, ...
rank2: 5400, 5401, 5402, 5403, ...
rank3: 5800, 5801, 5802, 5803, ...
rank4: 5C00, 5C01, 5C02, 5C03, ...

For FCP use the paths:
C1 -> HA1 -> rank1
C2 -> HA2 -> rank2
C3 -> HA3 -> rank3
C4 -> HA4 -> rank4

FICON: define a path from each CHPID to each LCU

Sample Setup

Page 25: Linux for zSeries Performance Update

LVM: add the disks to one VG in the following order:
5000, 5400, 5800, 5C00, 5001, 5401, 5801, 5C01, ...
Use striping (32 KB or 64 KB)!

For Linux VM guests:
guest1: 5000, 5400, 5800, 5C00, ...
guest2: 5001, 5401, 5801, 5C01, ...
guest3: 5003, 5403, 5803, 5C03, ...
guest4: 5004, 5404, 5804, 5C04, ...

Sample Setup
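The interleaved ordering above can be generated rather than typed by hand; a sketch using the sample device numbers (four ranks, 0x400 apart), so that consecutive VG members always sit on different ranks, HAs, and CHPIDs:

```shell
#!/bin/sh
# Sketch: emit the rank-interleaved disk order from the sample setup.
order=""
for i in 0 1 2 3; do                      # disk index within each rank
    for base in 5000 5400 5800 5C00; do   # one base address per rank
        dev=$(printf '%04X' $(( 0x$base + i )))
        order="$order $dev"
    done
done
echo "$order"
# prints 5000 5400 5800 5C00 5001 5401 5801 5C01 5002 ... 5C03
```

The same order works for handing disks to VM guests: take every fourth entry per guest.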

Page 26: Linux for zSeries Performance Update

LVM: Logical Volume Manager

What is LVM?
LVM builds an abstraction layer which hides hardware details
Data is stored on a virtual disk managed by LVM
LVM is contained in SuSE SLES7 and SLES8
http://www.sistina.com/products_lvm.htm

Benefits of LVM
File and file system sizes are independent of the physical hardware size
Allows managing multiple devices as one entity (scalability!)
Offers much higher performance when configured appropriately
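The deck names the tool but not the commands; with the LVM userland shipped in SLES7/SLES8, a 16-disk striped volume of the kind used later might be created roughly as follows. The device names and the volume/group names are placeholders, and the script is a dry run that only prints the commands:

```shell
#!/bin/sh
# Dry-run sketch: print the LVM commands for a 16-disk striped volume.
# /dev/disk5000 etc. are placeholder names; substitute the block
# devices your disks actually appear as.
run() { echo "$@"; }

disks=""
for dev in 5000 5400 5800 5C00 5001 5401 5801 5C01 \
           5002 5402 5802 5C02 5003 5403 5803 5C03; do
    disks="$disks /dev/disk$dev"
    run pvcreate "/dev/disk$dev"   # label each disk as a physical volume
done
run vgcreate datavg $disks          # one volume group over all 16 PVs
# -i 16: stripe across all 16 PVs; -I 32: 32 KB stripe size (per slide)
run lvcreate -i 16 -I 32 -L 60G -n datalv datavg
```

Removing the `echo` inside `run` would execute the commands; the rank-interleaved disk order matters because LVM lays stripes out in the order the PVs were added.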

Page 27: Linux for zSeries Performance Update

LVM: operating system view

[Diagram: the LVM stack — physical disks and a physical array sit behind a RAID adapter / device driver and appear as physical volumes; the physical volumes form a volume group; the Logical Volume Manager carves logical volumes out of the volume group; a (journaled) filesystem or a raw logical volume sits on top]

Page 28: Linux for zSeries Performance Update

LVM: improving performance

[Diagram: a data stream striped across three physical volumes]

With LVM and striping, parallelism is achieved.
http://publib-b.boulder.ibm.com/Redbooks.nsf/RedbookAbstracts/tips0128.html?open
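Why striping parallelizes even a single sequential stream can be seen from the chunk-to-volume mapping; a sketch with a 32 KB stripe size over 4 physical volumes (numbers chosen to match the sample setup, not taken from the slides):

```shell
#!/bin/sh
# Illustrative sketch: which physical volume serves each chunk of a
# sequential stream, for 32 KB stripes across 4 PVs.
stripe_kb=32
n_pv=4
for offset_kb in 0 32 64 96 128 160; do
    pv=$(( (offset_kb / stripe_kb) % n_pv ))
    echo "offset ${offset_kb}KB -> PV $pv"
done
# Consecutive 32 KB chunks hit PV 0, 1, 2, 3, 0, 1, ... so one
# sequential writer keeps all four ranks busy at once.
```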

Page 29: Linux for zSeries Performance Update

A simple experiment (1)

System setup: SuSE SLES8, VM guest, 1 CPU, 256 MB RAM
Single disk setup: 50 GB SCSI disk
LVM setup: 4 CHPIDs, 4 WWPNs, 4 ranks, 16 disks, 16 stripes, 32 KB stripe size, 62 GB total disk size
FS block size: 4096 bytes
Write block size: 1024 bytes (4096 bytes)
Write command: dd if=/dev/zero of=/mnt/dummy.file bs=1024 count=43352000
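As a sanity check on the experiment's scale, the dd parameters imply the following file size; the elapsed time used in the throughput line is purely hypothetical:

```shell
#!/bin/sh
# Size of the file produced by:
#   dd if=/dev/zero of=/mnt/dummy.file bs=1024 count=43352000
bs=1024
count=43352000
bytes=$(( bs * count ))
echo "file size: $bytes bytes (~$(( bytes / 1000000000 )) GB)"
# -> file size: 44392448000 bytes (~44 GB)

# Throughput for a hypothetical elapsed time of 1000 seconds:
elapsed=1000
echo "throughput: $(( bytes / elapsed / 1000000 )) MB/s"
```

At roughly 44 GB, the file is large enough to swamp the 16 GB ESS cache and the 256 MB of guest memory, so the measured rates reflect the disks, not the caches.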

Page 30: Linux for zSeries Performance Update

A simple experiment (2): Datarates

[Chart: data rates in MB/s (0-150) for dd if=/dev/zero of=/mnt/dummy.file bs=1024 count=43352000, for the configurations: ext2, ext3, and reiser on a single disk under SLES7; ext2, ext3, and reiser on a single disk under SLES8; ext2, ext3, and reiser on LVM under SLES8; and ext2 on LVM with bs=4096 under SLES8]

Page 31: Linux for zSeries Performance Update

A simple experiment (3): Elapsed Times

[Chart: elapsed times in seconds, thousands (0-6), for dd if=/dev/zero of=/mnt/dummy.file bs=1024 count=43352000 across the same ext2, ext3, and reiser single-disk and LVM configurations on SLES7 and SLES8]

Page 32: Linux for zSeries Performance Update

A simple experiment (4): CPU Times

[Chart: CPU times in seconds, thousands (0-5), for dd if=/dev/zero of=/mnt/dummy.file bs=1024 count=43352000 across the same ext2, ext3, and reiser single-disk and LVM configurations on SLES7 and SLES8]

Page 33: Linux for zSeries Performance Update

Visit us

Linux for zSeries performance website:
http://www10.software.ibm.com/developerworks/opensource/linux390/whatsnew.shtml

Linux-VM performance website:
http://www.vm.ibm.com/perf/tips/linuxper.html

Page 34: Linux for zSeries Performance Update

Questions

Page 35: Linux for zSeries Performance Update

Thank you!