pcie gen 3.0 presentation @ 4th fpga camp

30
Copyright © 2011, PCI-SIG, All Rights Reserved 1 PCIe 3.0 Controller Case Study Trupti Gowda Field Application Engineer PLDA

Upload: fpga-central

Post on 18-Nov-2014

3.496 views

Category:

Technology


1 download

DESCRIPTION

PCIe Gen3 presentation by PLDA at 4th FPGA Camp in Santa Clara, CA. For more details visit http://www.fpgacentral.com/fpgacamp or http://www.fpgacentral.com

TRANSCRIPT

Page 1: PCIe Gen 3.0 Presentation @ 4th FPGA Camp

1Copyright © 2011, PCI-SIG, All Rights Reserved

PCIe 3.0 Controller Case StudyPCIe 3.0 Controller Case Study

Trupti GowdaField Application Engineer

PLDA

Trupti GowdaField Application Engineer

PLDA

Page 2: PCIe Gen 3.0 Presentation @ 4th FPGA Camp

PCI-SIG Developers Conference 2Copyright © 2011, PCI-SIG, All Rights Reserved

The challenge of Gen3 PCIe 2.0 introduced a x2.0 bandwidth

increase, and speed negotiation

Gen1• Initial

specification

Gen2• x2

bandwidth• Speed

negotiation

Gen3• …• …

Challenge mostly on electrical part, but existing PCIe logic could be adapted easily.

Page 3: PCIe Gen 3.0 Presentation @ 4th FPGA Camp

PCI-SIG Developers Conference 3Copyright © 2011, PCI-SIG, All Rights Reserved

The challenge of Gen3 PCIExpress 3.0 introduces x2.0 bandwidth

increase, and 128b/130b encoding

Gen1• Initial

specification

Gen2• x2

bandwidth• Speed

negotiation

Gen3• x2

bandwidth• 128b/130b

encoding

Major changes are required in PHY layer

Page 4: PCIe Gen 3.0 Presentation @ 4th FPGA Camp

PCI-SIG Developers Conference 4Copyright © 2011, PCI-SIG, All Rights Reserved

Why a new architecture ? 128b/130b encoding has a major impact in PHY

layer and many parts must be significantly modified:

Blocks completely changes the way packets & ordered sets are processed & transmitted.

Scrambling is different and affects data differentlyDifferent and new ordered sets (SKIP, SDS, ..)Equalization affects LTSSM & training sets

Page 5: PCIe Gen 3.0 Presentation @ 4th FPGA Camp

PCI-SIG Developers Conference 5Copyright © 2011, PCI-SIG, All Rights Reserved

Why a new architecture ?

Existing designs have often being defined for Gen1, adapted for Gen2, but technical choices are no longer optimal with another x2.0 throughput increase.

And also .. this is a good opportunity to use experience in PCIe to design a more powerful architecture.

Page 6: PCIe Gen 3.0 Presentation @ 4th FPGA Camp

PCI-SIG Developers Conference 6Copyright © 2011, PCI-SIG, All Rights Reserved

Selecting datapath

Datapath width is an important choice because it strongly affects :

Design effortDevice scalabilityEase of use for end-userPossible targeted FPGA devicesCost (gate count..)

Page 7: PCIe Gen 3.0 Presentation @ 4th FPGA Camp

PCI-SIG Developers Conference 7Copyright © 2011, PCI-SIG, All Rights Reserved

Selecting datapath Example datapaths for a x8 device :

PCIe 1.0

2.5 Gbps

64 bits 250 MHz

128 bits 125 MHz

256 bits 62 MHz

PCIe 2.0

5 Gbps

64 bits 500 MHz

128 bits 250 MHz

256 bits 125 MHz

PCIe 3.0

8 Gbps

64 bits1 GHz

128 bits 500 MHz

256 bits 250 MHz

Next

Generation

64 bits…

128 bits…

256 bits…

Page 8: PCIe Gen 3.0 Presentation @ 4th FPGA Camp

PCI-SIG Developers Conference 8Copyright © 2011, PCI-SIG, All Rights Reserved

Selecting datapath Benefits of a small datapath (32/64-bit) :

Easier to design, minimal data alignment and corner case issues.– Only one (32-bit) or a maximum of two (64-bits) packets

received on the same clock cycle

Small gate count

Easy to connect to common peripherals

Page 9: PCIe Gen 3.0 Presentation @ 4th FPGA Camp

PCI-SIG Developers Conference 9Copyright © 2011, PCI-SIG, All Rights Reserved

Selecting datapath Drawbacks of a small datapath (32/64-bit) :

More complex in some situations– 3..4 clock cycles (32-bit) or 2 clock cycles (64-bit) required to

get the complete header of a TLP. Header checking done over several clock cycles.

Not suitable for high-throughput devices

Requires a faster clock than a large datapath for the same throughput : more power consumed and more expensive FPGA device.

Page 10: PCIe Gen 3.0 Presentation @ 4th FPGA Camp

PCI-SIG Developers Conference 10Copyright © 2011, PCI-SIG, All Rights Reserved

Selecting datapath Benefits of a large datapath (128/256-bit) :

Requires a slower clock than a small datapath for the same throughput : less power consumed.

Can be implemented in slower and cheaper FPGA devices

Suitable for high-throughput devices (Gen2/Gen3)

Page 11: PCIe Gen 3.0 Presentation @ 4th FPGA Camp

PCI-SIG Developers Conference 11Copyright © 2011, PCI-SIG, All Rights Reserved

Selecting datapath Drawbacks of a large datapath (128/256-bit) :

Generally more complex to design : more data alignment & corner cases to handle.– For example : up to 4 DLLPs (256-bit) can be received on a

single clock cycle.

Larger gate count

More difficult to interface to legacy peripherals

Page 12: PCIe Gen 3.0 Presentation @ 4th FPGA Camp

PCI-SIG Developers Conference 12Copyright © 2011, PCI-SIG, All Rights Reserved

Datapath effect on device Note that datapath size can advertly affect a

device:For Example : datapath frequency can have a huge impact on power consumption…

Page 13: PCIe Gen 3.0 Presentation @ 4th FPGA Camp

PCI-SIG Developers Conference 13Copyright © 2011, PCI-SIG, All Rights Reserved

Datapath effect on device Example datapaths for a x4 gen1 device :

32-bit@ 250 Mhz

Gate x1.0Freq x1.0

64-bit@ 125 MHz

Gate x1.5Freq x0.5

128-bit@ 62.5 MHz

Gate x2.0Freq x0.25

256-bit@ 31.2 MHz

Gate x2.5Freq x0.12

Page 14: PCIe Gen 3.0 Presentation @ 4th FPGA Camp

PCI-SIG Developers Conference 14Copyright © 2011, PCI-SIG, All Rights Reserved

Datapath effect on device Power consumption is related to Gate x Freq2 :

32-bit@ 250 Mhz

Gate x1.0Freq x1.0

Power 100%

64-bit@ 125 MHz

Gate x1.5Freq x0.5

Power 37.5%

128-bit@ 62.5 MHz

Gate x2.0Freq x0.25

Power 12.5%

256-bit@ 31.2 MHz

Gate x2.5Freq x0.12Power 4%

x25 factor !

Page 15: PCIe Gen 3.0 Presentation @ 4th FPGA Camp

PCI-SIG Developers Conference 15Copyright © 2011, PCI-SIG, All Rights Reserved

Datapath effect on device So this shows that :

All parts of the device must be designed carefully to avoid performance degradation.

Datapath must be chosen according to number of lanes and PCIe generation in order to :– sustain throughput at a reasonable frequency– match gate count & power requirements

Page 16: PCIe Gen 3.0 Presentation @ 4th FPGA Camp

PCI-SIG Developers Conference 16Copyright © 2011, PCI-SIG, All Rights Reserved

Clock Domain Crossing PCIe logic typically runs at PIPE interface clock

frequency. Clock Domain Crossing (CDC) allows part of this logic to run at an application specific clock frequency.

Clock Domain Crossing location is an important choice because:There can be lots of signals to re-synchronize

– This can consume logic and add latency

There can be required relationships between PIPE clock and application clock.– This can limit usability of CDC

There can be side effects inside PCIe logic.

Page 17: PCIe Gen 3.0 Presentation @ 4th FPGA Camp

PCI-SIG Developers Conference 17Copyright © 2011, PCI-SIG, All Rights Reserved

Clock Domain Crossing Solution #1 : Change clock frequency between

Transaction layer & application:

PHYMACLayer

Data Link

Layer

Transaction

Layer

Application

Page 18: PCIe Gen 3.0 Presentation @ 4th FPGA Camp

PCI-SIG Developers Conference 18Copyright © 2011, PCI-SIG, All Rights Reserved

Clock Domain Crossing Benefits:

Simple to implement and there are no side-effects on PCIe logic.

Application clock is fully independent of PIPE clock and can be freely reduced to save power.

PHYMACLayer

Data Link

Layer

Transaction

Layer

Application

Page 19: PCIe Gen 3.0 Presentation @ 4th FPGA Camp

PCI-SIG Developers Conference 19Copyright © 2011, PCI-SIG, All Rights Reserved

Clock Domain Crossing Drawbacks :

All receive/transmit datapath must be buffered with FIFOs or memories (consumes gates & adds latency)

All PCIe logic always runs at full PIPE frequency so this solution allows no power savings on PCIe logic.

PHYMACLayer

Data Link

Layer

Transaction

Layer

Application

Page 20: PCIe Gen 3.0 Presentation @ 4th FPGA Camp

PCI-SIG Developers Conference 20Copyright © 2011, PCI-SIG, All Rights Reserved

Clock Domain Crossing Solution #2 : Change clock frequency between

Data Link & Transaction layers:

PHYMACLayer

Data Link

Layer

Transaction

Layer

Application

Page 21: PCIe Gen 3.0 Presentation @ 4th FPGA Camp

PCI-SIG Developers Conference 21Copyright © 2011, PCI-SIG, All Rights Reserved

Clock Domain Crossing Benefits:

Application clock is fully independent of PIPE clock and can be freely reduced to save power.

Receive/transmit datapaths can be buffer through receive/transmit buffers (saves gates).

PHYMACLayer

Data Link

Layer

Transaction

Layer

Application

ReceiveBuffer

TransmitBuffer

Page 22: PCIe Gen 3.0 Presentation @ 4th FPGA Camp

PCI-SIG Developers Conference 22Copyright © 2011, PCI-SIG, All Rights Reserved

Clock Domain Crossing Drawbacks :

A part of PCIe logic always runs at full PCIe speed : this limits power savings.

Potential side effects– Flow control overflow cannot be detected accurately,…

PHYMACLayer

Data Link

Layer

Transaction

Layer

Application

ReceiveBuffer

TransmitBuffer

Page 23: PCIe Gen 3.0 Presentation @ 4th FPGA Camp

PCI-SIG Developers Conference 23Copyright © 2011, PCI-SIG, All Rights Reserved

Clock Domain Crossing Solution #3 : Change clock frequency between

PHYMAC & Data Link layers:

PHYMACLayer

Data Link

Layer

Transaction

Layer

Application

Page 24: PCIe Gen 3.0 Presentation @ 4th FPGA Camp

PCI-SIG Developers Conference 24Copyright © 2011, PCI-SIG, All Rights Reserved

Clock Domain Crossing Benefit:

Most of PCIe logic runs at application frequency, this allows large power saving :– Example : with a PIPE 8-bit interface clock is 250MHz in

Gen1, however with a 32-bit datapath a 62.5MHz application clock is enough.

PHYMACLayer

Data Link

Layer

Transaction

Layer

Application

Page 25: PCIe Gen 3.0 Presentation @ 4th FPGA Camp

PCI-SIG Developers Conference 25Copyright © 2011, PCI-SIG, All Rights Reserved

Clock Domain Crossing Drawbacks :

All receive/transmit datapath must be buffered with FIFOs or memories (consumes gate & adds latency)

Application clock frequency must always be high enough to match data rate.– This is because there is no way to limit PCIe throughput

before receive buffer.

PHYMACLayer

Data Link

Layer

Transaction

Layer

Application

ReceiveBuffer

Page 26: PCIe Gen 3.0 Presentation @ 4th FPGA Camp

PCI-SIG Developers Conference 26Copyright © 2011, PCI-SIG, All Rights Reserved

Case Study Conclusion

Through these examples we can see that some key architecture choices directly impact PCIe device in terms of :

flexibility performance logic & power usage

Page 27: PCIe Gen 3.0 Presentation @ 4th FPGA Camp

PCI-SIG Developers Conference Copyright © 2010, PCI-SIG, All Rights Reserved 27

How PLDA can help your PCIe3.0 design?

PLDA has been designing PCIe since 2002, 65 man year effort, 20+ entries on PCISIG integrators list

PLDA has Soft IP for Gen3.0 - XpressRICH3 256b data-path for smaller clock frequencyUser clock - integrated Clock Domain Crossing to

support user-selected freq at the Application layer Ease of use on FPGA

Quick PCIe layer to easily adapt the PCIe Gen1/2/3 Soft/Hard IP to the AXI4 interface which is popular with FPGA vendors.

Coming soon Stratix IV based PCIe3.0 hardware!

Page 28: PCIe Gen 3.0 Presentation @ 4th FPGA Camp

PCI-SIG Developers Conference Copyright © 2010, PCI-SIG, All Rights Reserved 28

AXI4 Interconnect Support ARM has created the AMBA AXI4 bus which can be

integrated into the FPGA fabric. The AXI4 protocol defines a point to point interface

developed to address SoC performance challenges. • It supports multiple clock domains, larger burst lengths

up to 256 beats, and Quality of Service (QoS) signaling. AXI4-Lite is a subset of the AXI4 protocol used for control interface. The AXI4-Stream protocol is used to exchange data between masters and slaves and it does not have a defined or maximum burst or packet length.

PLDA is amongst the first in PCIe domain to standardise on the AMBA 4 specification as part of our interconnect strategy to support 'plug and play' FPGA design

Page 29: PCIe Gen 3.0 Presentation @ 4th FPGA Camp

PCI-SIG Developers Conference Copyright © 2010, PCI-SIG, All Rights Reserved 29

QuickPCIe Block Diagram

UNBUNB

PCIe

Block

QuickPCIe-xxx-top

AXI4-StreamAXI4-StreamAXI4-StreamAXI4 Stream(s)

QuickPCIe-top

AXI4-Lite slave

AXI4-StreamAXI4-StreamAXI4-StreamAXI4 Slave(s)

AXI4-Lite Master

AXI4-StreamAXI4-StreamAXI4-StreamAXI4 Master(s)

AXI4 Desc Master

QuickPCIe Layer

Internal Registers

Interconnect

Address Translator(s)Address Translator(s)Address Translator(s)Address Translator(s)

Address Translator(s)Address Translator(s)Address Translator(s)DMA Engine(s)

Page 30: PCIe Gen 3.0 Presentation @ 4th FPGA Camp

PCI-SIG Developers Conference Copyright © 2010, PCI-SIG, All Rights Reserved 30

Learn more about our soft IP controller and hardware solutions…..Visit our booth here at FPGA Camp!meet Sales Manager Kate Martin [email protected] (408)887-5981 

Thank You!