Presentation: InfiniBand


TRANSCRIPT

  • Slide 1/32

  • Slide 2/32

    1.1 Conventional Bus Architecture

    [Diagram: CPU and system memory attach to a system controller (system-to-I/O bridge) over the system bus; the bridge drives the System I/O Bus (PCI) #1, and PCI-to-PCI bridges fan out to PCI Bus #2 and PCI Bus #3, which host SCSI, graphics, LAN, and other I/O controllers.]

    Some drawbacks of PCI:

    - PCI-to-PCI bridges are needed to attach more devices
    - shared bandwidth
    - poorly controlled termination
    - many pins per connection
    - biggest disadvantage: cannot connect devices outside the box (chassis)

  • Slide 3/32

    Paradigm Shift

    From supercomputers and mainframes to a bunch of interconnected Linux machines: much lower cost, but lower reliability, underutilization, higher complexity, and a storage bottleneck.

  • Slide 4/32

    Switched Fabric

    - point-to-point, switch-based interconnect
    - designed for fault tolerance
    - each link has exactly one device connected
    - provides scalability: aggregate bandwidth increases as additional switches are added

  • Slide 5/32

    Switched fabric architecture

    [Diagram: endnodes interconnected through multiple switches.]

    Designed for high bandwidth (2.5 up to 30 Gb/s), with fault tolerance and scalability.

    Pushed by industry leaders such as Sun, HP, IBM, Intel, Microsoft, and Dell.

    A switched fabric is a point-to-point interconnect: every link has exactly one device connected to it.

    Termination is well controlled and the same for every device.

    I/O performance is greater within a fabric.

  • Slide 6/32

    Contrasting the Different Architectures

    PCI is the bus standard designed to provide a low-cost interface, and it carries most I/O connections in a PC.

    Its bandwidth cannot keep up with the requirements that servers place on it.

    Today's servers need host cards such as SCSI (soon Ultra320 SCSI), Gigabit Ethernet, and clustering cards, and PCI cannot keep up with the I/O bandwidth these devices require.

    Feature               Fabric      Bus
    Topology              Switched    Shared bus
    Pin count             Low         High
    Number of endpoints   Many        Few
    Max signal length     Kilometers  Inches
    Reliability           Yes         No
    Scalable              Yes         No
    Fault tolerant        Yes         No

  • Slide 7/32

    What Defines InfiniBand? InfiniBand is a specification.

    A switched fabric architecture which enables:

    - Increased network bandwidth
    - Improved reliability / failover
    - Loss-less connectivity support
    - Shared resources
    - Lower CPU utilization

  • Slide 8/32

    InfiniBand Characteristics

    - Industry-standard specification
    - A system interconnect fabric architecture
    - Used between any combination of servers, communication equipment, storage devices, and embedded systems
    - Low latency, high-bandwidth interconnections
    - Low processing overhead
    - Carries multiple traffic types over a single connection

  • Slide 9/32

    IBA (simple)

    [Diagram: CPU, system controller, and system memory connect through a Host Channel Adapter (HCA) to an IB switch; the switch links to Target Channel Adapters (TCAs), each fronting an I/O controller.]

    Host Channel Adapter (HCA), Target Channel Adapter (TCA)

  • Slide 10/32

    InfiniBand: a layered hardware protocol

    1) Physical Layer

    - Defines both electrical and mechanical characteristics for the system
    - Includes cables and receptacles for fibre and copper media, and backplane connectors
    - Defines three link speeds: 1X, 4X, 12X
    - Each individual link is a 4-wire differential connection that provides a full-duplex connection at 2.5 Gb/s

  • Slide 11/32

    3.4.1 Physical Link

    IB Link   Signal Count   Signaling Rate [Gb/s]   Data Rate [Gb/s]   Full-Duplex Data Rate [Gb/s]
    1X        4              2.5                     2                  4
    4X        16             10                      8                  16
    12X       48             30                      24                 48

    Note: because the data is 8b/10b encoded, the actual data bandwidth of a 1X link is 2.0 Gb/s; since links are bi-directional, that gives 4 Gb/s full duplex. With multiple ports, the I/O bandwidth is additive.
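    As a quick sanity check of the numbers above, the small sketch below (an illustration added here, not part of the original slides) recomputes the table from the 2.5 Gb/s per-lane signaling rate and the 8b/10b encoding overhead:

      # Recompute the IB link-rate table from first principles (illustrative sketch).
      SIGNALING_RATE_GBPS = 2.5      # per lane, as defined by the physical layer
      ENCODING_EFFICIENCY = 8 / 10   # 8b/10b: 8 data bits carried per 10 signaled bits

      for width in (1, 4, 12):       # 1X, 4X, 12X link widths
          signaling = width * SIGNALING_RATE_GBPS       # raw signaling rate
          data_rate = signaling * ENCODING_EFFICIENCY   # usable data rate, one direction
          full_duplex = 2 * data_rate                   # both directions combined
          print(f"{width:2d}X: signaling {signaling:5.1f} Gb/s, "
                f"data {data_rate:5.1f} Gb/s, full duplex {full_duplex:5.1f} Gb/s")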

  • Slide 12/32

    2) Link Layer

    - Encompasses packet layout, point-to-point operations, and switching within a local subnet
    - Packets: two types, management packets and data packets
      1. Management packets: used for link configuration and maintenance
      2. Data packets: carry up to 4 KB of transaction payload
    - Switching: devices within a subnet have a 16-bit Local ID (LID) assigned by the subnet manager; this LID is used for addressing

  • Slide 13/32

    3) Network Layer

    - Handles routing of packets from one subnet to another (within a subnet, the network layer is not required)
    - Packets contain a Global Route Header (GRH)
    - The GRH contains the 128-bit IPv6 address

    4) Transport Layer

    - Responsible for in-order packet delivery, channel multiplexing, and transport services
    - Also handles transaction data segmentation when sending, and reassembly when receiving
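    The segmentation and reassembly step can be pictured with a minimal sketch (illustrative only; the function and field names are simplified stand-ins, not the real IBA structures):

      # Illustrative sketch: how a transport layer might segment a message into
      # MTU-sized packets and reassemble them in order (not the actual IBA code).
      MTU = 4096  # maximum payload per data packet (4 KB, per the link layer)

      def segment(message: bytes, mtu: int = MTU):
          """Split a message into (sequence_number, payload) packets."""
          return [(seq, message[off:off + mtu])
                  for seq, off in enumerate(range(0, len(message), mtu))]

      def reassemble(packets):
          """Rebuild the message, tolerating out-of-order arrival."""
          return b"".join(payload for _, payload in sorted(packets))

      msg = bytes(10_000)            # a 10,000-byte transaction payload
      pkts = segment(msg)            # -> 3 packets: 4096 + 4096 + 1808 bytes
      assert reassemble(reversed(pkts)) == msg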

  • Slide 14/32

    IBA Data Packet Format

    [Diagram: on the wire a packet is framed as Start Delimiter | Packet | End Delimiter | Idles. The packet itself is LRH | GRH | BTH | ETH | Payload | Immediate Data | ICRC | VCRC, with the headers contributed by the link layer (LRH), network layer (GRH), transport layer (BTH, ETH), and the upper layer (payload).]

    Local Routing Header (8 bytes), Global Routing Header (40 B), Base Transport Header (12 B), Extended Transport Header (4, 8, 16 or 28 B), Data (0-4 KB), Immediate Data (4 bytes), Invariant CRC (4 B), Variant CRC (2 B)
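    Using the header sizes listed above, a short sketch (added for illustration; it assumes a full-size inter-subnet packet, where the GRH is present and the ETH is 4 bytes) shows how little per-packet overhead the headers add at the 4 KB maximum payload:

      # Illustrative sketch: per-packet overhead of the IBA data packet headers
      # listed above (sizes in bytes, taken from the slide).
      HEADERS = {
          "LRH (Local Routing Header)":       8,
          "GRH (Global Routing Header)":     40,   # only present on inter-subnet packets
          "BTH (Base Transport Header)":     12,
          "ETH (Extended Transport Header)":  4,   # 4, 8, 16 or 28 depending on operation
          "Immediate Data":                   4,   # optional
          "ICRC (Invariant CRC)":             4,
          "VCRC (Variant CRC)":               2,
      }

      payload = 4096                          # maximum payload of a data packet
      overhead = sum(HEADERS.values())        # 74 bytes with the assumptions above
      print(f"headers: {overhead} B, payload: {payload} B, "
            f"efficiency: {payload / (payload + overhead):.1%}")   # ~98% at 4 KB payload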

  • Slide 15/32

    IBA Fabric

    [Diagram: several nodes attached to an IBA network.]

    At a high level, IBA is an interconnect for endnodes.

  • Slide 16/32

    IBA Network Components

    [Diagram: several IBA subnets interconnected by routers; endnodes hang off the subnets, some attached to more than one subnet.]

    An IBA network is subdivided into subnets interconnected by routers. Endnodes may attach to a single subnet or to more than one subnet.

  • Slide 17/32

    IBA Subnet Components

    [Diagram: endnodes, switches, a router, and a subnet manager forming one subnet.]

    An IBA subnet is composed, as shown, of endnodes, switches, routers, and a subnet manager. Each IB device may attach to a single switch, to more than one switch, and/or directly to another device.

  • Slide 18/32

    IBA Components

    - Links and repeaters
    - Channel adapters
    - Switches
    - Routers
    - Management structure

  • Slide 19/32

  • Slide 20/32

    Channel Adapter

    [Diagram: a channel adapter with transport logic, a DMA engine attached to memory, an SMA, multiple queue pairs (QPs), and several ports, each port with its own set of virtual lanes (VLs).]

    A CA has a DMA engine with special features that allow remote and local DMA operations. Each port has its own set of send and receive buffers.

    Buffering is channeled through virtual lanes (VLs), where each lane has its own flow control.

    The implemented Subnet Manager Agent (SMA) communicates with the subnet manager in the fabric.
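    A rough way to picture the queue pairs and per-VL flow control described above is the conceptual sketch below (added here as an illustration; the class and method names are invented for the sketch and do not correspond to the real verbs interface or hardware):

      # Conceptual sketch (not the verbs API): a queue pair with send/receive
      # queues, and per-virtual-lane credit-based flow control on a port.
      from collections import deque

      class QueuePair:
          def __init__(self, qp_num):
              self.qp_num = qp_num
              self.send_queue = deque()   # work requests waiting to go on the wire
              self.recv_queue = deque()   # buffers posted for incoming data

          def post_send(self, buf):
              self.send_queue.append(buf)

          def post_recv(self, buf):
              self.recv_queue.append(buf)

      class Port:
          def __init__(self, num_vls=4, credits_per_vl=8):
              # each virtual lane has its own flow-control credits
              self.vl_credits = {vl: credits_per_vl for vl in range(num_vls)}

          def transmit(self, qp: QueuePair, vl: int) -> bool:
              """Send one queued buffer on a VL if that lane still has credits."""
              if qp.send_queue and self.vl_credits[vl] > 0:
                  qp.send_queue.popleft()
                  self.vl_credits[vl] -= 1   # credit is returned by the receiver later
                  return True
              return False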

  • Slide 21/32

    Switches

    [Diagram: a switch as a packet relay between several ports, each with its virtual lanes.]

    IBA switches are the fundamental routing component for intra-subnet routing.

    Switches interconnect links by relaying packets between the links.

    Switches have two or more ports between which packets are relayed.

    The switch's forwarding decisions are driven by forwarding tables.

    Switches can be configured to forward either to a single location (unicast) or to multiple devices (multicast).
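    A minimal sketch of this forwarding behaviour (illustrative only; the LID values and table layout are made up for the example, and in practice the subnet manager programs the table):

      # Conceptual sketch of intra-subnet switching: the subnet manager programs a
      # forwarding table that maps a 16-bit destination LID to one or more ports.
      forwarding_table = {
          0x0001: [1],        # unicast: DLID 0x0001 leaves through port 1
          0x0002: [2],
          0xC000: [1, 2, 3],  # multicast: one LID forwarded to several ports
      }

      def relay(dest_lid: int, table=forwarding_table):
          """Return the output port(s) for a packet, as a switch would."""
          return table.get(dest_lid, [])   # unknown LIDs are simply dropped in this sketch

      assert relay(0x0002) == [2]
      assert relay(0xC000) == [1, 2, 3]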

  • Slide 22/32

    Routers

    [Diagram: a router as a GRH-based packet relay between several ports, each with its virtual lanes.]

    IBA routers are the routing component for inter-subnet routing.

    Each subnet is uniquely identified by a subnet ID.

    The router reads the Global Route Header (the IPv6-style network-layer address) to forward packets.

    Each router forwards the packet through the next subnet to another router until the packet reaches the target subnet.

    The last router sends the packet into the target subnet using the destination LID.

    The subnet manager configures routers with information about the subnet.
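    The hop-by-hop decision can be sketched as below (an illustration under the assumption that the upper 64 bits of the 128-bit GID carry the subnet prefix; the concrete values and the gid_to_lid mapping are hypothetical):

      # Conceptual sketch of inter-subnet routing: a router compares the subnet
      # prefix of the destination GID in the GRH with its own subnet, and either
      # forwards toward the next router or delivers locally by destination LID.
      def route(dest_gid: int, local_subnet_id: int, next_router_port: int,
                gid_to_lid: dict):
          subnet_prefix = dest_gid >> 64          # upper 64 bits of the 128-bit GID
          if subnet_prefix == local_subnet_id:
              # final hop: hand the packet to the subnet, addressed by its LID
              return ("deliver", gid_to_lid[dest_gid])
          return ("forward", next_router_port)    # keep moving toward the target subnet

      # hypothetical values for illustration only
      gid = (0xFE80_0000_0000_0001 << 64) | 0x0002_C903_0000_1234
      print(route(gid, local_subnet_id=0xFE80_0000_0000_0001,
                  next_router_port=3, gid_to_lid={gid: 0x0042}))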

  • Slide 23/32

    IBA Management

    IBA management provides a subnet manager (SM).

    The SM is an entity attached to a subnet, responsible for configuring and managing switches, routers, and CAs. An SM can be implemented in other devices, such as a CA or a switch.

    The SM:

    - configures each CA port with a range of LIDs, GIDs, and subnet IDs
    - configures each switch with its LIDs, the subnet ID, and its forwarding database
    - handles link failover
    - maintains the service databases for the subnet and provides a GUID-to-LID/GID resolution service
    - handles error reporting
    - provides other services to ensure a solid connection
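    The resolution service mentioned above can be pictured with a small sketch (illustrative only; the function names, the in-memory dictionary, and the example GUID/LID values are assumptions made for the example):

      # Conceptual sketch of the subnet manager's resolution service: a device's
      # permanent GUID maps to the LID and GID the SM assigned to it.
      subnet_db = {}   # GUID -> (LID, GID)

      def assign(guid: int, lid: int, subnet_prefix: int = 0xFE80_0000_0000_0000):
          """SM assigns a LID and derives the GID as subnet_prefix:GUID."""
          gid = (subnet_prefix << 64) | guid
          subnet_db[guid] = (lid, gid)

      def resolve(guid: int):
          """GUID-to-LID/GID resolution, as provided by the SM."""
          return subnet_db[guid]

      assign(guid=0x0002_C903_0000_ABCD, lid=0x0017)
      print(resolve(0x0002_C903_0000_ABCD))   # -> (0x0017, full 128-bit GID)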

  • Slide 24/32

    Road to IB

    2001: Venture funding; early product development; first silicon
    2002: Early pilots; first-generation beta products; 1x product, 4x prototype
    2003: Early adopters; commercial deployments of 1x and 4x; large vendors offer IB products; early native IB server/storage; application/OS support grows
    2004: Continued early adopters; first volume 1x, 4x, 12x; growing native IB for server/storage; application/OS support grows further
    2005: Rapid adoption of 1x, 4x, 12x; sizeable native IB for server/storage; rapid application/OS support growth
    2006: Rapid market adoption; close to 50% of servers with IB support; rapid application/OS support growth

  • Slide 25/32

    First Vendors of IBA Components

    System vendors: IBM, Intel, Dell, Sun, Microsoft, HP

    IB vendors: Mellanox, Voltaire, Banderacom, Infiniswitch, VIEO, JNI

  • Slide 26/32

    Real Deployments Today: Wall Street Bank with a 512-Node Grid

    [Diagram: core fabric of 2 96-port TS-270 switches; edge fabric of 23 24-port TS-120 switches; 512 server nodes; 2 TS-360 with Ethernet and Fibre Channel gateways connecting the grid I/O to the existing SAN and LAN.]

    Fibre Channel and GigE connectivity built seamlessly into the cluster.

    http://www1.us.dell.com/content/products/productdetails.aspx/pedge_1850?c=us&cs=555&l=en&s=biz
  • Slide 27/32

    NCSA (National Center for Supercomputing Applications)

    520 dual-CPU nodes (1,040 CPUs)

    [Diagram: core fabric of 6 72-port TS-270 switches; edge fabric of 29 24-port TS-120 switches; 174 uplink cables; 512 1 m cables; edge switches each serving 18 compute nodes.]

    Parallel MPI codes for commercial clients

    Point-to-point 5.2 µs MPI latency

    Deployed: November 2004

  • Slide 28/32

    D.E. Shaw Bio-Informatics

    [Diagram: fault-tolerant core fabric of 12 96-port TS-270 switches; edge fabric of 89 24-port TS-120 switches; 1,068 5 m/7 m/10 m/15 m uplink cables; 1,066 1 m cables; edge switches each serving 12 compute nodes.]

    1,066-node fully non-blocking, fault-tolerant IB cluster

  • Slide 29/32

    Advantages

    - Superior performance
    - Low latency
    - High efficiency
    - Fabric consolidation and low energy usage
    - Reliable, stable connections
    - Data integrity
    - Highly interoperable environment

  • Slide 30/32

    Drawbacks

    - Complex design
    - Few platforms support it as yet
    - Bleeding edge, for now, so users will need to perform extensive testing

  • Slide 31/32

  • Slide 32/32

    Q&A