
Slide 1: System IO Network Evolution - Closing the Requirement Gaps

IBM Systems and Technology Group
Copyrighted, International Business Machines Corporation, 2006

Renato Recio, IBM Distinguished Engineer, Chief Architect, eServer IO

Slide 2: Agenda

- System IO network trends
  - Socket performance trends
  - Internal IO evolution
  - Cluster network and LAN evolution
- Requirement gaps
  - Management complexity
  - Acceleration


Slide 3: Basics – Server IO Network Types

- Processor IO Links connect the CPU to IO components over a short distance (5-10 inches).
- Local IO Networks (inches) and Remote IO Networks (a few meters, across chassis) aggregate external network links by using bridges and switches to attach IO adapters.
  - IO adapters attach through a memory-mapped IO semantic link (i.e., the PCI family).
  - Switches connect links of the same type within a network.
  - Bridges connect different link types at the network's edges.
- Cluster networks connect servers to other servers (typical range is a data center, but may span data centers).
- Storage networks connect servers to storage (similar range as cluster networks).
- Local Area Networks connect servers to a wide variety of computing devices, such as NAS servers, file servers, printers, and clients (typical range is a data center or local campus, but may span beyond that).

(Figure: high-end servers - processor chips and memory attached through local hubs and processor IO links to a remote IO network of remote bridges, IO slots, and adapters - and high volume servers - processor chips and memory attached through a host bridge to a local IO network with PCI-family IO slots, a local bridge, USB 2.0, and adapters - both connecting to storage, local area, and cluster networks.)
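To make the "memory-mapped IO semantic" in the bullets above concrete, here is a hedged C sketch (not from the slides) of how a user-space Linux program can map a PCI adapter's BAR through sysfs and touch a device register with ordinary loads and stores; the device address and register offset are illustrative assumptions.

    /* Minimal sketch of PCI memory-mapped IO from user space on Linux.
     * Assumptions: the device BDF below is hypothetical, resource0 is a
     * memory-mappable BAR, and REG_OFFSET is an illustrative register. */
    #include <fcntl.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #include <unistd.h>

    #define BAR_PATH   "/sys/bus/pci/devices/0000:03:00.0/resource0"
    #define BAR_SIZE   4096        /* map one page of the BAR */
    #define REG_OFFSET 0x10        /* hypothetical device register */

    int main(void)
    {
        int fd = open(BAR_PATH, O_RDWR | O_SYNC);
        if (fd < 0) { perror("open"); return 1; }

        /* Map the adapter's registers into the process address space. */
        volatile uint32_t *bar = mmap(NULL, BAR_SIZE, PROT_READ | PROT_WRITE,
                                      MAP_SHARED, fd, 0);
        if (bar == MAP_FAILED) { perror("mmap"); close(fd); return 1; }

        /* Loads and stores to this region are forwarded over the PCI-family
         * link as memory read/write transactions to the adapter. */
        uint32_t status = bar[REG_OFFSET / 4];
        printf("device register 0x%x = 0x%08x\n", REG_OFFSET, status);

        munmap((void *)bar, BAR_SIZE);
        close(fd);
        return 0;
    }

The same load/store semantic is what host bridges and remote bridges carry across the local and remote IO networks described above.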

Slide 4: Microprocessor Socket Performance Trends

- CMOS performance growth is slowing down.
  - Designs are being optimized for power efficiency.
- Instruction-level parallelism drives significant complexity and power inefficiency.
  - Diminishing returns: a highly non-linear benefit for added circuits and complexity.
- Multiple cores and multithreading are being used to achieve chip-level parallelism.
  - This takes advantage of continued growth in CMOS density, with improved power efficiency and performance.
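To put the chart's CGR labels in perspective (arithmetic added here, not on the slide), compounding 30% versus 15% per year over five years gives

\[
1.30^{5} \approx 3.7 \qquad \text{versus} \qquad 1.15^{5} \approx 2.0,
\]

so halving the growth rate cuts the transistor-performance multiplier accumulated over a five-year span by nearly half.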

(Charts: relative transistor performance, 1997-2012, on a log scale from 0.1 to 10, slowing from roughly a 30% CGR to a 15% CGR despite improvements in device structures, gate dielectrics, carrier mobility, interconnect dielectrics and conductivity, and silicon-on-insulator; and a chip-level parallelism roadmap plotting number of circuits or power usage over time, moving from one performance-optimized core exploiting instruction-level parallelism to power-efficiency-optimized multi-core, multi-threaded sockets with shared L2 cache - from 1 core / 1 thread up through 8-16 cores, i.e., from 4-8 to more than 16 logical processors.)


Slide 5: Microprocessor Socket IO Link Performance Trends

- High bandwidth application environments (HPC and BI) fuel demand for higher IO link bandwidths.
- Microprocessor IO link performance growth is not slowing down.
  - Roughly a 10x increase every 10-11 years.
  - The microprocessor IO link is keeping pace with the IO demands of high bandwidth applications running on the socket's multiple processors.
- High volume external networks (i.e., Ethernet) must be aggregated to meet system-level performance demand.

(Charts: microprocessor IO link bandwidth, Ethernet link bandwidth, and high bandwidth application requirements (ASCI Purple and BI), plotted in GB/s on a log scale from 0.01 to 100 over 1997-2012.)
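For reference, the quoted "10x every 10-11 years" corresponds to roughly 23-26% compound annual growth; this conversion is added here and is not on the slide:

\[
10^{1/10} \approx 1.26, \qquad 10^{1/11} \approx 1.23 .
\]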

Slide 6: Internal IO Evolution of High Volume Servers

- Adapter slots continue to be based on the PCI family: PCI-X DDR -> PCIe Gen 1 -> PCIe Gen 2.
- Higher bandwidth I/O slots may simplify the I/O topology in some scale-up environments.
  - For example, a single x8 PCIe Gen 2 slot (4 GB/s) supports four 10 GigE ports, reducing the number of servers that need an I/O expansion network.
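A quick check of that example (arithmetic added here, not on the slide): an x8 PCIe Gen 2 slot provides

\[
8 \times 0.5\ \text{GB/s} = 4\ \text{GB/s per direction}, \qquad 4 \times 1.25\ \text{GB/s} = 5\ \text{GB/s of 10 GigE line rate},
\]

so the slot covers the realistic aggregate throughput of four 10 GigE ports even though it sits slightly below their combined raw line rate.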

(Figure and charts: a high volume server with processor chips, memory, host bridge, local bridge, USB 2.0, and PCI-family IO slots; system-level total I/O bandwidth for 2-socket and 4-socket servers, 1997-2012, on a log scale from 0.1 to 100 GB/s; and I/O attachment slot performance for PCI/PCI-X and PCI Express Gen 1, Gen 2, and Gen 3, on a log scale from 0.01 to 100 GB/s.)


Slide 7: Internal IO Evolution of Large SMP Servers

- Large SMPs use remote IO networks to increase the number of PCI IO slots per system.
- These networks are typically proprietary and support advanced functions, such as multiple roots, multiple paths, automated switchover, and PCI transaction tunneling.
- In the future, PCIe may be used as a remote IO network.

(Figure and chart: system-level total I/O bandwidth, 1997-2012, on a log scale from 0.01 to 10,000 GB/s, spanning low-end, mid-range, and high-end servers; and the high-end server IO topology, with processor chips and memory attached through local hubs and processor IO links to a remote IO network of remote bridges, IO slots, and adapters.)

Slide 8: Cluster Network Evolution

- HPC cluster networks can be classified as:
  - Message passing, using:
    - a plain, vanilla IP/Ethernet network stack;
    - standard RDMA:
      - the InfiniBand RDMA stack, with or without proprietary extensions;
      - the iWARP RDMA stack, with or without proprietary extensions (has not entered the Top500 yet);
    - IHV custom stacks, with additional proprietary extensions (e.g., collectives, adaptive multipathing, ...).
  - Distributed memory.
- The plain, vanilla (no vendor flavors added) Ethernet family has grown significantly in HPC recently, to nearly 50% of the Top500.
  - 10 GigE has not made its debut yet; that is likely to occur this year.
- Given the above data, how much of the HPC market really cares about interconnect bandwidth?

(Charts: Top500 interconnect family share (0-100%), 1997-2005, for Ethernet, InfiniBand, Quadrics, Myrinet, IBM, HiPPI, Cray, and other interconnects; and interconnect performance (bandwidth and latency) versus function, ranging from the Ethernet family through the InfiniBand family (RDMA, collective ops, ...) to server-vendor-custom and IHV-custom interconnects supporting message passing and distributed memory.)


Slide 9: State of Cluster Networks in 2006 - Bandwidth and Latency

(Bar charts: 256-byte and 8-KByte message latencies, in microseconds, across a 2-hop fabric, broken into link, switch, and system stack components, for IB 1.0 4x, IB 1.0 12x, three proprietary interconnects, a 10 GE RNIC (4Q/06), and a 10 GE RNIC with a low-latency switch; the 256-byte chart spans roughly 0-16 microseconds and the 8-KByte chart roughly 0-25 microseconds.)
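As an aid to reading these stacked bars (a rough decomposition added here, not from the slide), the end-to-end latency of a message crossing a 2-hop fabric can be approximated as

\[
t_{\text{msg}} \approx t_{\text{system stack}} + 2\, t_{\text{switch}} + t_{\text{link}},
\]

where the link term includes serializing the message at the link rate; for an 8-KByte message on a 10 Gb/s link that is about $8192\ \text{bytes} / 1.25\ \text{GB/s} \approx 6.6$ microseconds, which is why link bandwidth matters far more for the 8-KByte bars than for the 256-byte bars.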

Slide 10: Projection of 2011 Cluster Networks - Bandwidth and Latency

(Bar charts: projected 256-byte and 8-KByte message latencies, in microseconds, across a 2-hop fabric, again broken into link, switch, and system stack components, for IB DDR 4x, IB DDR 12x, IB QDR 4x, IB QDR 12x, two proprietary interconnects, a 100 GE RNIC, and a 100 GE RNIC with a low-latency switch; the 256-byte chart spans roughly 0-6 microseconds and the 8-KByte chart roughly 0-10 microseconds.)


Slide 11: State of Standard Cluster and Local Area Networks in 2006 - Stacks

(Diagram: four standard stacks compared by API, hardware interface, wire protocol, and adapter functions:

- Traditional network stack: sockets over the OS network stack (TCP/UDP/IP) and NIC driver, on a commodity off-the-shelf (COTS) NIC with checksum, large send, and interrupt coalescing offloads, over Ethernet link and phy.
- Custom TCP acceleration stack: a sockets library and connection manager over a proprietary split of TCP/IP functions between host and adapter, on a custom NIC with checksum, large send, multiple queues, direct placement, and proprietary offloads, over Ethernet link and phy.
- Industry iWARP stack: sockets over SDP, MPI, and uDAPL over RDMA verbs (RNIC library, connection manager), running RDMA over TCP/IP (RDMA/DDP, MPA, TCP/IP) on an RDMA-enabled NIC (RNIC), over Ethernet link and phy.
- Industry IB stack: sockets over SDP, MPI, and uDAPL over RDMA verbs (HCA library, IP, connection manager), running RDMA over the IB transport on an InfiniBand HCA, over IB link and phy.)
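To make "RDMA verbs" concrete for readers who have not used them, here is a hedged C sketch of the verbs data path using the libibverbs API that underlies both the IB and iWARP stacks above. It is a sketch only: queue-pair connection setup (state transitions and the out-of-band exchange of the peer's address and rkey) and most error handling are omitted, and the buffer size and queue depths are arbitrary choices.

    /* Hedged sketch of the RDMA verbs data path (libibverbs). */
    #include <infiniband/verbs.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <stdlib.h>

    int main(void)
    {
        int num;
        struct ibv_device **devs = ibv_get_device_list(&num);
        if (!devs || num == 0) { fprintf(stderr, "no RDMA devices\n"); return 1; }

        struct ibv_context *ctx = ibv_open_device(devs[0]);
        struct ibv_pd *pd = ibv_alloc_pd(ctx);                  /* protection domain */
        struct ibv_cq *cq = ibv_create_cq(ctx, 16, NULL, NULL, 0);

        /* Register a buffer so the adapter can DMA it directly (zero copy). */
        char *buf = malloc(4096);
        struct ibv_mr *mr = ibv_reg_mr(pd, buf, 4096,
                                       IBV_ACCESS_LOCAL_WRITE |
                                       IBV_ACCESS_REMOTE_WRITE);

        struct ibv_qp_init_attr qpia = {
            .send_cq = cq, .recv_cq = cq, .qp_type = IBV_QPT_RC,
            .cap = { .max_send_wr = 16, .max_recv_wr = 16,
                     .max_send_sge = 1, .max_recv_sge = 1 },
        };
        struct ibv_qp *qp = ibv_create_qp(pd, &qpia);
        /* ... connect the QP to a peer and learn remote_addr/rkey here ... */

        /* Post an RDMA Write work request: the adapter moves the data and
         * places it in the peer's memory without involving the remote CPU. */
        struct ibv_sge sge = { .addr = (uintptr_t)buf, .length = 4096,
                               .lkey = mr->lkey };
        struct ibv_send_wr wr = { .sg_list = &sge, .num_sge = 1,
                                  .opcode = IBV_WR_RDMA_WRITE,
                                  .send_flags = IBV_SEND_SIGNALED };
        /* wr.wr.rdma.remote_addr and wr.wr.rdma.rkey come from the peer. */
        struct ibv_send_wr *bad = NULL;
        if (ibv_post_send(qp, &wr, &bad) == 0) {
            struct ibv_wc wc;
            while (ibv_poll_cq(cq, 1, &wc) == 0)   /* poll for the completion */
                ;
            printf("RDMA write status: %s\n", ibv_wc_status_str(wc.status));
        }

        ibv_destroy_qp(qp); ibv_dereg_mr(mr); ibv_destroy_cq(cq);
        ibv_dealloc_pd(pd); ibv_close_device(ctx); ibv_free_device_list(devs);
        free(buf);
        return 0;
    }

The point relevant to the next slide is that the adapter, not the kernel, moves the registered buffer, so the host pays no per-byte copy or per-packet interrupt cost on the data path.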

Slide 12: CPU Efficiency and Application Speed Up

- The industry is focusing on an RDMA-based network stack for data center communications.
  - It benefits long-lived connections.
  - It is not well suited for short-lived connections.
  - It requires proprietary extensions to fully optimize for HPC market needs (e.g., collectives, multipathing with adaptive routing, etc.).
- For short-lived connections, TCP/IP acceleration is better (not shown above).
  - The TCB is not handed off to the adapter.
  - The adapter has multiple queues and minimal state.
  - It utilizes a proprietary distribution of network stack functions.
- The benefit of network stack offload (IB or iONIC) depends on the ratio of application/middleware instructions to network stack instructions.

(Charts: send-and-receive-pair path lengths in CPU instructions (100 to 1,000,000) versus transfer size in bytes (1 to 100,000) for sockets over a NIC, sockets over SDP/RDMA, and uDAPL/IT API; and asynchronous application speedup multiples (1x to 10x) versus transfer size, parameterized by the number of application instructions, from 10 K to 10 M.)


Slide 13: System Level View of Network Stack Offload

XML and Java have very high (10:1 to 50:1) App::Net instruction ratios.
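Plugging those ratios into the offload bound from the previous slide (a worked instance added here, not on the slide): at a 10:1 application-to-network instruction ratio, eliminating the entire network stack yields at most

\[
\frac{10 + 1}{10} = 1.1\times, \qquad \text{and at } 50{:}1, \quad \frac{50 + 1}{50} \approx 1.02\times,
\]

which is why XML and Java overheads must shrink before network acceleration shows a system-level benefit in these tiers.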

(Diagram: a multi-tier enterprise work flow - a client tier (browser/user), a presentation tier (web and presentation servers with presentation data), an application tier (web application servers with application data, plus DB client/replication traffic), and a business function tier (OLTP and BI database servers and HPC, with business data). Annotations classify the links by traffic type - sockets, low-level (uDAPL, ICSC) support (most beneficial), and block or file IO (e.g., iSCSI/iSER, NFSeR) - and note that XML and Java overheads must be reduced for NC to help. A legend marks estimated benefit ranges on the links, from roughly 0% to 50%, with some marked TBD or uncertain.)

Slide 14: Agenda

- System IO network trends
  - Socket performance trends
  - Internal IO evolution
  - Cluster network and LAN evolution
- Requirement gaps
  - Management complexity
  - Acceleration


Slide 15: Enterprise Data Center Management

- The slide depicts a real customer's application work flow (page 1 of 2).
- It captures middleware, OS, and system interactions.
- But it still doesn't capture the full complexity of the environment, because it is missing:
  - layers of manageable components;
  - manageable attributes per layer; and
  - management system function variance.

Slide 16: Quantifying the Management Complexity Problem

- Expenses related to labor intensive tasks dominate IT budgets.
  - Significant opportunities exist in simplifying or eliminating these tasks.
- Networking contributes to most components of management spending.
- HPC data centers may have simpler work flows, network structures, and overall system homogeneity.
  - However, discussions with some HPC customers show that they share the same problem.

(Charts: a pie chart of the components of management spending (IDC survey, 2002-2004) - initial system and software deployment; planning for upgrades, expansion, and capacity; upgrades and patches; system monitoring; system maintenance; maintenance and tuning; migration; and other - with shares ranging from 7% to 19%; and a management and server spending outlook (source: IDC, 2004) plotting spending in $B from 1996 through 2008 for server spending and management spending.)


Slide 17: Network Contributions to the Management Complexity Problem

(Repeats the bullets and spending charts from Slide 16, and adds the following availability data.)

- Delivery network (actual): 99.2% availability, versus a data center target of 99.999% availability.
  - At 99.2%, users see about 345 bad minutes per month: blackouts account for 91 minutes (26%) and brownouts for 254 minutes (74%). At 99.999%, the budget is only about 26 bad seconds per month. (Source: Network World Application Performance Market Study, Aug 2003.)
- Network instability issues dominate user experience problems.
  - How do you know there is an application slowdown on your network? (Source: enterprise customer case study, Jan 2003.)
- A second pie chart highlights how network RAS (system maintenance, maintenance and tuning, and related categories) contributes to management spending.

Sources of the above data: Jim Rymarczyk, System Virtualization Strategy; and Donna Dillenberger, Policy Based Adaptive Networking for the Enterprise, AoT ODN Presentation, 7/05.
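The "bad minutes" figures follow from the availability numbers; the arithmetic below is added here for clarity and is not on the slide (it assumes a 30-day month of 43,200 minutes):

\[
(1 - 0.992) \times 43{,}200 \approx 345\ \text{minutes}, \qquad (1 - 0.99999) \times 43{,}200 \approx 0.43\ \text{minutes} \approx 26\ \text{seconds}.
\]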

Slide 18: Network Contributions to the Management Complexity Problem (continued)

(Repeats the content of Slide 17 and adds the following customer examples.)

Examples of how network RAS management impacts the management complexity problem:

- "Distributed networking creates a complex bowl of spaghetti that requires 10 people around the table to figure out what the source of a problem is. Our biggest need is a tool that lets them find the root cause of a problem." - Name, company, and system types kept confidential to protect the guilty.

- "I am interested in a network performance tool that allows me to determine which component in the system is causing a bottleneck, so I know who to call. Otherwise, the network vendor says it's the storage; the storage vendor says it's the server; and the server vendor points me back to the network!" - Renato Recio paraphrasing an IT manager (name, company, and system types kept confidential to protect the guilty).

- Customers want the ability to perform their business processes instead of tracking low-level elements. We need proactive tools that detect when a network component serving hundreds of users is about to fail, create an event that allows us to work around that component, and identify the business processes that may be impacted by the impending outage. - IT engineer (name, company, and system types kept confidential to protect the guilty).


Slide 19: Principles for Reducing Management Costs

Complexity reduction and management improvements (the slide groups the principles into these two columns):

- Simplify or automate management tasks (reduce the chance for human error).
- Use stateless components wherever possible (simplifies provisioning and workload management).
- Centralize resource management control points (avoid interdependent management points).
- Avoid interlocking dependencies (decouple hardware and software dependencies).
- Don't touch (reduce change frequency).
- Use standardized building blocks (reduce the number of special components).
- Don't over-manage (over-provision if cheaper).
- System and application consolidation (reduce component number, type, and location).

(Chart: management and server spending outlook, spending in $B, 1996-2008, repeated from Slide 16; source: IDC, 2004.)

Slide 20: Blades - One step in simplifying data center management

(Photos: examples of the cable mess problem versus the blade chassis solution.)

Complexity reduction:
- Cable reduction: 45 vs. 209 cables per rack.
  - Fewer failure points, misconnections, and bumped cables, and less time to make configuration changes.
- Flexible, standard building blocks: the chassis can house Intel, Power, and AMD CPU blades.
- "Appliance-like" deployment platform with integrated switches.

Management improvements:
- Simpler management/integration platform.
  - Simpler and faster to deploy than rack-optimized servers; faster function and capacity rollouts; cost savings for customer and manufacturer; centralized chassis management.
- Reliability / availability / serviceability: easier to replace or upgrade nodes.

Chassis characteristics: 14 blades per chassis, integrated switches, and common chassis-level management.


Slide 21: Fabric Convergence - Another step in simplifying data center management

- One link type reduces TCO, even if each network type requires a dedicated adapter.
  - Management can be simplified, because only one network type is being managed.
- Several standard technologies are necessary to enable network convergence:
  - RDMA-based infrastructure: APIs, session level protocols (e.g., NFSeR, SDP, iSER), and HCAs/RNICs.
  - iSCSI-based infrastructure: session level protocols (e.g., discovery) and iSCSI or iSER HBAs.
  - High performance network infrastructure: HCAs/RNICs (for IT, TCP/IP accelerators), low latency switches, high bandwidth links, and virtual lanes with lane-based buffer and congestion management.
  - An improved network management suite.

(Diagram: today's server uses a dedicated adapter for each of the storage, local area, and cluster networks; each network link is optimized to the needs of a specific network type and has unique hardware and management infrastructure. Case A, common but dedicated networks: each network type keeps its own adapter, but all run over a common link family. Case B, a common shared network: one or more of the storage, cluster, and LAN networks are merged onto a single adapter and fabric, over IB or Ethernet. An Ethernet link (especially with DCE) will be good enough to consolidate most network types, and IB can be used when performance is critical.)

Slide 22: Comparing IB vs. Ethernet as Convergence Fabrics

(Table: rates Option A, IP Ethernet with DCE (RNIC + TCP/IP acceleration), against Option B, IB (HCA), on these criteria: cost/price; bandwidth (depends on what you need); latency (both are destination routed); CPU efficiency (zcopy, zkernel); security (encryption, authorization, ...); QoS (service levels, management); standardization; fabric virtualization; scaling (pipelines, scale-up/out); distance; congestion management; and self-management. Legend: a "+" satisfies the requirement, an "=" partially satisfies it, and a "-" does not satisfy it.)


Slide 23: Option X - Multi-host PCIe IO Virtualization

- IO virtualization will impact SHV hosts, switches, and adapters.
  - The host impact adds proprietary functions to microprocessors.
  - The switch and adapter impacts add PCI standard functions.
- PCI IO virtualization will be the next major impact to PCIe architecture.
  - 16 companies (including HP, Sun, Intel, and Microsoft) are pursuing these standards in the PCI-SIG.
  - Two function sets are being pursued by the industry: single-root IOV and multi-root IOV.
- Single-root IOV enables sharing expensive PCI adapters in blade environments.
  - It uses standard, native IOV functions in the adapter with proprietary IOV enablers in the host.
  - A couple of adapter vendors will start with proprietary solutions, then migrate to the standard.
- Multi-root (MR) IOV adds standard, native MR IOV functions in the adapter and switch.

(Diagram and sample prices: today, each CPU blade has its own PCIe root with dedicated Ethernet and FC adapters feeding Ethernet and FC switches; with multi-root PCIe, the blades share adapters through an MR-capable PCIe switch. Sample prices for a 14-blade chassis today: 14 10 GigE adapters at $500 each plus a $4,500 Ethernet switch, and 14 4 Gb FC adapters at $250 each plus a $4,000 FC switch, for a total of $19,000. With MR PCIe sharing, only 4 adapters of each type are needed, for a total of roughly $10,500 (the switch cost may be much less than the external switches it replaces), making this a cost saving option in 2008-2010.)
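As a check on the sample prices (arithmetic added here, not part of the slide):

\[
14 \times \$500 + \$4{,}500 = \$11{,}500 \ (\text{10 GigE}), \qquad 14 \times \$250 + \$4{,}000 = \$7{,}500 \ (\text{4 Gb FC}),
\]

for a combined total of \$19,000 today; sharing through MR IOV cuts the adapters to four of each type (\$2,000 + \$1,000), which, together with the switching hardware, brings the quoted total down to \$10,500.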

Slide Owner: Holland/Recio

Slide 24: System Hardware Enhancements – Approaches

(Diagram: placement options for special-purpose hardware, ranging from microprocessor chip enhancements (core extensions, special processors, application-tailored cores, cache and memory-controller extensions, and memory subsystem extensions) to I/O subsystem enhancements and I/O-attached special-purpose hardware (I/O hub extensions and PCI-E special-purpose adapters) to a network-attached special-purpose server reached over the server network.)

Potential enhancement areas:
- XML performance
- Java performance
- Messaging overhead, latency, and bandwidth
- Virtualization enhancements
- Cryptography
- Trusted Platform Module


Slide 25

(Table and figure: maps server IO network roles - IO slots, IO expansion / remote IO, storage, LAN, and cluster - to network types, including PCI-X, PCI-E, Ethernet, FC, IB, SAS, and proprietary IB-based IO expansion links, over the same high-end and high-volume server IO topologies shown on Slide 3.)

Slide 26: Moore's Law - Commentary in 1965 Paper

(Excerpt reproduced from the original paper; source: Electronics, April 1965.)



Questions?

Thank You