CCIX: a new coherent multichip interconnect for accelerated use cases

David Koenen / Jeff Defilippi

Senior Product Manager

Arm

Tech Symposia

© 2017 Arm Limited


Interconnects for different scales

SoC interconnect

• Connectivity for on-chip processor, accelerator, IO and memory elements.

Server node interconnect - ‘scale-up’

• Simple multichip interconnect (typically PCIe) topology on a PCB motherboard with simple switches and expansion connectors.

Rack interconnect - ‘scale-out’

• Scale-out capabilities with complex topologies connecting thousands of server nodes and storage elements.


Key drivers for interconnect technology

• Decline of Moore’s law forcing more heterogeneous compute

• Big data analytics growing at 11.7% CAGR

• 5G wireless applications requiring 10x more bandwidth, 10x lower latency by 2021

• Increase in distributed data forcing more network intelligence at faster data rates (10GbE -> 100GbE -> 400GbE)

• Data bandwidth and sharing growth projected at 10x-50x increase vs present PCIe by 2021


CCIX™: Cache Coherent Interconnect for Accelerators

New class of interconnect for accelerated applications

The mission of the CCIX Consortium is to develop and promote adoption of an industry-standard specification that enables coherent interconnect technologies between general-purpose processors and acceleration devices for efficient heterogeneous computing.

https://www.ccixconsortium.com/


CCIX Consortium Inc

• Formed January 2016, incorporated in February 2017

• Complete ecosystem with 34 members and growing

• Hardware specification available for design starts for member companies

• CCIX pronounced: (c’ siks)

Membership tiers: Promoters, Contributors, Adopters

Members named on the slide: Arteris, Inc.; Guizhou Huaxintong Semiconductor Technology Co. Ltd.; INVECAS INC; Netronome; Phytium Technology Co., Ltd.; PLDA; Shanghai Zhaoxin Semiconductor Co., Ltd.; Silicon Laboratories Inc.; SmartDV Technologies India Private Ltd.


Applications benefiting from CCIX

4G and 5G base station

Data-center Search

Embedded Computing

High Performance (Super)Computing

In-memory database processing

Intelligent network acceleration

Machine / Deep Learning

Mobile Edge Computing

Video analytics


CCIX multichip connectivity

High performance, low latency

• CCIX defines 25GT/s (3x performance*)

• Examining 56GT/s (7x performance*) and beyond

• Enabling low latency via light transaction layer

Flexible, scalable interconnect topologies

• Flexible point-to-point, daisy chained and switched topologies

Seamless integration

• Runs on existing PCIe transport layer and management stack

• Supports all major instruction set architectures (ISA)

[Figure: CCIX multichip connectivity between Processor, Accelerator, Smart Network, Persistent Memory and Switch]
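The "3x" and "7x" figures above carry an asterisk whose footnote did not survive in this transcript. The following is a minimal sketch of the arithmetic under one possible reading. It assumes, and this is an assumption rather than something the slide states, that the baseline is a per-lane rate of 8 GT/s (the PCIe Gen3 rate). The name `ASSUMED_BASELINE_GT_S` is illustrative only.

```python
# Assumption: the slide's "*" baseline is 8 GT/s per lane (PCIe Gen3).
# The footnote is missing from the transcript, so this is a guess.
ASSUMED_BASELINE_GT_S = 8.0


def speedup(rate_gt_s, baseline_gt_s=ASSUMED_BASELINE_GT_S):
    """Ratio of a per-lane transfer rate to the baseline rate."""
    return rate_gt_s / baseline_gt_s


for rate in (25.0, 56.0):
    print(f"{rate:g} GT/s is {speedup(rate):.3f}x the assumed baseline")
```

Under that assumption the ratios come out close to the rounded "3x" and "7x" on the slide; a different baseline would give different ratios.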


System topology examples

[Figure: example systems built from Processor, Accelerator (Accel), Memory and Switch blocks joined by CCIX and PCIe links]

Direct attached, daisy chain, mesh and switched topologies
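One way to compare these topologies is to model each as a small graph and count the links a request crosses between a processor and the farthest accelerator. The sketch below is illustrative only: the device names (`cpu`, `acc0`, `sw`, ...) and the four-device layouts are hypothetical, and "mesh" is modelled here as a fully connected graph, which the slide does not specify.

```python
from collections import deque


def hops(links, src, dst):
    """Breadth-first search: number of links on the shortest path src -> dst."""
    adj = {}
    for a, b in links:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    seen, queue = {src}, deque([(src, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == dst:
            return dist
        for nxt in adj.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None  # dst is not reachable from src


# Hypothetical layouts; every name here is made up for illustration.
direct = [("cpu", "acc0")]
daisy_chain = [("cpu", "acc0"), ("acc0", "acc1"), ("acc1", "acc2")]
switched = [("cpu", "sw"), ("acc0", "sw"), ("acc1", "sw"), ("acc2", "sw")]
mesh = [("cpu", "acc0"), ("cpu", "acc1"), ("cpu", "acc2"),
        ("acc0", "acc1"), ("acc0", "acc2"), ("acc1", "acc2")]

print("direct:", hops(direct, "cpu", "acc0"))
print("daisy chain:", hops(daisy_chain, "cpu", "acc2"))
print("switched:", hops(switched, "cpu", "acc2"))
print("mesh:", hops(mesh, "cpu", "acc2"))
```

Hop count is only a rough proxy for latency; it ignores link width, switch delay and contention.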


DMA Engines: The problem with traditional accelerators

Operating System vendors are interested in the opportunity for workload-optimized accelerators

Traditional DMA approach requires a special (Linux) kernel driver for every unique accelerator

Requires skilled kernel developers (a driver for each accelerator); the failure mode is catastrophic (system crash/downtime)

Operating Systems used tomorrow have already been deployed. Updates are 9-12 months apart

Drivers must be in “upstream” Linux before we support them: a year+ turnaround for every accelerator

“Trilby”: a DMA Engine driven, FPGA-based workload accelerator built by Jon Masters for research into the barriers to adoption in the Enterprise. It uses the traditional approach of a kernel driver and Operating System hacks.


Coherent virtual memory eliminates data transfer overhead

Non-coherent system without Shared Virtual Memory (SVM): software must manage cache maintenance and data copying.

[Figure: Processor and Accelerator with repeated "Clean and copy data" steps between them]

Cache coherent system with Shared Virtual Memory (SVM): hardware-managed cache maintenance, shared address space with direct memory access.

[Figure: Accelerator and Processor sharing memory directly]
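The contrast between the two systems can be sketched as a toy software model. This is not a CCIX API: the functions `run_with_copies`, `run_shared` and `increment_all` are hypothetical names, and a Python `bytearray` merely stands in for memory. The point is only to show where the staging copies appear and disappear.

```python
def run_with_copies(host_data, kernel):
    """Non-coherent model: software copies data to and from the accelerator."""
    bytes_copied = 0
    device_buf = bytearray(host_data)   # "clean and copy": host -> device
    bytes_copied += len(device_buf)
    kernel(device_buf)                  # accelerator works on a private copy
    host_data[:] = device_buf           # copy the results: device -> host
    bytes_copied += len(device_buf)
    return bytes_copied


def run_shared(host_data, kernel):
    """Coherent SVM model: the accelerator works on the same buffer in place."""
    kernel(memoryview(host_data))       # no staging copies
    return 0


def increment_all(buf):
    """Stand-in accelerator kernel: add 1 to every byte (mod 256)."""
    for i in range(len(buf)):
        buf[i] = (buf[i] + 1) % 256


a = bytearray(range(8))
b = bytearray(range(8))
copied = run_with_copies(a, increment_all)
shared = run_shared(b, increment_all)
print("bytes copied:", copied, "vs", shared, "| same result:", a == b)
```

Both paths produce the same data; the model only counts the bytes that software has to move in the non-coherent case.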


CCIX acceleration functions that just work in the cloud

[Figure: virtualized software stack with four CCIX functions marked. From the bottom up: Physical Machine (e.g., processors, DRAM, caches, MMU, IOMMU, other resources and SoC devices) and Other External Devices (e.g., disks, NICs, FPGAs, GPUs, crypto, other accelerators, other devices); Firmware, Option ROMs, etc. (optional, system dependent); Virtual Machine Monitor (VMM)/Hypervisor (hyper-privileged); Virtual Machines 1 to 3 running Guest OS 1 to 3 (privileged); and non-privileged workloads including App_1, App_4, JVM_1, Container_1, Container_2 and VNF_1 to VNF_4]


CCIX layered architecture

• Protocol Layer – coherency protocol, memory read & write flows

• Link Layer – formats CCIX messages for target transport

• Transaction Layer – Adds optimized packets, manages credit based flow control

• Physical Layer – Dual mode PHY to support extended data rates

[Figure: layer stack, top to bottom: CCIX Protocol Layer; CCIX Link Layer; CCIX Transaction Layer (CCIX messages) alongside the PCIe Transaction Layer (PCIe packets); PCIe Data Link Layer; CCIX/PCIe Physical Layer (Tx, Rx)]
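The transaction layer bullet mentions credit-based flow control. As a rough illustration of that general technique (not of the CCIX or PCIe packet formats, which are not given here), the sketch below models a sender that may transmit only while it holds credits, and a receiver that returns one credit for each buffer slot it frees. The class `CreditedLink` and its method names are hypothetical.

```python
from collections import deque


class CreditedLink:
    """Toy credit-based flow control between one sender and one receiver."""

    def __init__(self, receiver_buffer_slots):
        self.credits = receiver_buffer_slots  # credits held by the sender
        self.rx_buffer = deque()              # packets waiting at the receiver

    def send(self, packet):
        """Send only if a credit is available; return True on success."""
        if self.credits == 0:
            return False                      # no credit: the sender must stall
        self.credits -= 1
        self.rx_buffer.append(packet)
        return True

    def receiver_drain(self):
        """Receiver consumes one packet and returns one credit to the sender."""
        if not self.rx_buffer:
            return None
        packet = self.rx_buffer.popleft()
        self.credits += 1
        return packet


link = CreditedLink(receiver_buffer_slots=2)
first_burst = [link.send(p) for p in ("req0", "req1", "req2")]
print("first burst accepted:", first_burst)
link.receiver_drain()                         # frees one slot, returns a credit
print("retry accepted:", link.send("req2"))
```

Because the sender never transmits without a credit, the receiver buffer cannot overflow and no packet has to be dropped and retried.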


CCIX example request to home data flows

[Figure: four example flows, each drawn with Req/Cache, Home, Memory and LA blocks:

• Accelerator shares processor memory

• Daisy chain to shared processor memory

• Shared processor and accelerator memory

• Shared memory with aggregation]


Improved efficiency with CCIX transaction layer

Reduced latency with lightweight transaction layer

Improved packet efficiency with optimized CCIX header


CCIX port aggregation to boost bandwidth and transactions

CCIX defines a hashing function to steer requests across multiple links

Aggregation effectively multiplies the bandwidth

Aggregation could also be used to increase the number of transactions (e.g. 50GT/s vs 25GT/s)

PCIe requires separate address spaces, so requests cannot be hashed across links

[Figure: two panels. CCIX with Port Aggregation: Req/Cache, Home, Memory and LA blocks joined by multiple links. PCIe with Aggregation: a Processor (Cache, Home, Memory) and an Accelerator (Cache, Mem0, Mem1, Home0) joined by PCIe links.]
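The slide says CCIX defines a hashing function to steer requests across links but does not give it. The sketch below therefore uses a made-up hash (an XOR-fold of the cache-line index) purely to illustrate the idea: every request in one shared address space maps deterministically to a link, and consecutive lines spread across the links. `pick_link` and the 64-byte line size are assumptions, not the CCIX definition.

```python
CACHE_LINE_BYTES = 64  # assumed line size for this sketch


def pick_link(address, num_links):
    """Hypothetical hash: XOR-fold the cache-line index, then take it modulo the link count."""
    line = address // CACHE_LINE_BYTES
    folded = 0
    while line:
        folded ^= line & 0xFF
        line >>= 8
    return folded % num_links


# Consecutive cache lines from one shared address space, spread over 2 links.
counts = [0, 0]
for addr in range(0, 64 * 1024, CACHE_LINE_BYTES):
    counts[pick_link(addr, 2)] += 1
print("requests per link:", counts)
```

Hashing on the line index rather than the byte address keeps all accesses to one cache line on the same link, which is what a coherence protocol would generally want.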


CCIX SoC integration example

[Figure: example CMN-600 mesh design. A CoreLink CMN-600 mesh with XP, CML and RNI nodes and four DMC-620 memory controllers, linked by CXS and AXI interfaces to 3rd party PCIe/CCIX IP comprising the PCIe Transaction Layer, CCIX Transaction Layer, Data Link Layer and a 16-lane PHY (up to 25Gbps)]
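For a sense of scale, the raw bandwidth of the 16-lane, 25Gbps PHY in this example can be worked out as below. The lane count and rate come from the figure; the 128b/130b encoding efficiency is an assumption (it is the line encoding PCIe uses from Gen3 onward, but the slide does not state it), and protocol and packet overheads are ignored. The function names are illustrative.

```python
def raw_bandwidth_gbps(lanes, per_lane_gbps):
    """Raw one-direction line rate: lane count multiplied by the per-lane rate."""
    return lanes * per_lane_gbps


def payload_gbytes_per_s(raw_gbps, encoding_efficiency):
    """Convert raw Gb/s to GB/s after line-encoding overhead."""
    return raw_gbps * encoding_efficiency / 8


raw = raw_bandwidth_gbps(lanes=16, per_lane_gbps=25)
# Assumption: 128b/130b line encoding; not stated on the slide.
usable = payload_gbytes_per_s(raw, 128 / 130)
print(f"{raw} Gb/s raw, about {usable:.1f} GB/s per direction after encoding")
```

Real throughput would be lower again once packet headers and flow-control traffic are counted.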


Arm CCIX demonstration vehicle

• Arm’s DynamIQ and CoreLink CMN-600 technology

• Cadence CCIX and PCIe controller and PHY IP

• TSMC 7nm process technology

• CCIX connectivity to Xilinx’s Virtex UltraScale+ FPGA

Xilinx, Arm, Cadence, and TSMC Announce World's First CCIX Silicon Demonstration Vehicle in 7nm Process Technology


Scale-up server node performance with CCIX

CCIX is a class of interconnect providing high performance and low latency for new accelerator use cases

Easy adoption and simplified development by leveraging today’s data center infrastructure

IP available from Arm and ecosystem to optimize CCIX SoC today

• Servers, FPGAs, GPUs, network/storage adapters, intelligent networks and custom ASICs

For more information go to:

https://developer.arm.com


Thank You! Danke! Merci! 谢谢! ありがとう! Gracias! Kiitos!

© Arm 2017

The Arm trademarks featured in this presentation are registered trademarks or trademarks of Arm Limited (or its subsidiaries) in the US and/or elsewhere. All rights reserved. All other marks featured may be trademarks of their respective owners.

www.arm.com/company/policies/trademarks