i ntel 5520 chipset: an i / o hub chipset for server ...€¢ crc protection on every 80 bits along...
TRANSCRIPT
1* Other names and brands may be claimed as the property of others. Copyr ight © 2009, I ntel Corporat ion. Hotchips 2009
I nte l ® 5 5 2 0 Chipset : An I / O Hub Chipset for Server , W orksta t ion, and High End Desktop
Debendra Das Sharm aPrincipal Engineer, Digital Enterprise Group
I ntel Corporat ion
2* Other names and brands may be claimed as the property of others. Copyr ight © 2009, I ntel Corporat ion. Hotchips 2009
Cont r ibutors• Nilesh Bhagat• Rob Blankenship• Celeste Brown• Sam Chiang• Ken Creta• Debendra Das Sharma• Hanh Hoang • Siva Gadey Prasad • S. Jayakr ishna• Michelle Jen• Daniel Joe• Chandra P Joshi • Lily Looi • Dean Mulla• Sridhar Muthrasanallur• K. Pat tabhiraman• Guru Rajamani • Bill Rash• Aquiles Saenz• Rajesh Sankaran• Miles Schwartz• Mark R Swanson • Pat r ick Tsui• Cyprian Woo• Robin Zhang• and many others..
3* Other names and brands may be claimed as the property of others. Copyr ight © 2009, I ntel Corporat ion. Hotchips 2009
Agenda
• Plat form Overview• Feature Set• Micro-architecture and Transact ion Flows• Perform ance • Chip Stat ist ics• Sum m ary
4* Other names and brands may be claimed as the property of others. Copyr ight © 2009, I ntel Corporat ion. Hotchips 2009
Server Chipset Evolut ion
FSB
CPU CPU
Mem & I/O Controller
PXBPXBSouthBridge
PCI2.1*M
emor
y
Expander Bus
FSB
CPU CPU
Mem & I/O Controller
SouthBridge
Mem
ory
PCIe to PCI Bridge P
CI/P
CI-
X*
PCIe 1.0a*
FSB
CPU CPU
Mem & I/O Controller
Enterprise SouthBridge
Mem
ory
PCIe 1.0a*
ESI
PCI/PCI-X*
PCIe 1.0a*Legacy I/Os
F B
D
CPU CPU
SouthBridge
ESI
PCIe 2.0*
5520 IOH
FSB
CPU CPU
ScalableNode Controller
I/O Hub SouthBridge
PCI/PCI-X*
Scalable Port
PCI-(X)Bridge
DMH
DMH
Mem
ory
450 GX(~95)
450 NX(~98)
E8870*(~01)
Plumas(~02)
E7520(~04)
E8500 (Twincastle*)(~05)
5000P (Blackford*)(~06)
Seaburg(~07)
Clarksboro(~07)
5520(~09)
Trends:- Increased integration => fewer chips- Increased bandwidth for coherent traffic: FSB -> multiple FSBs -> Links-Increased memory capacity and bandwidth (memory buses -> FBD ->Integrated memory controller) -Increased I/O connectivity and bandwidth: PCI bus /bridges to PCIe* links(FSB: Front Side Bus. *: Published in Hotchips)
5* Other names and brands may be claimed as the property of others. Copyr ight © 2009, I ntel Corporat ion. Hotchips 2009
Plat form Overview
• Platform Transition:– Front Side Bus to Intel® Quick Path
Interconnect Link
– Memory controller integrated to CPU
• Intel® 5520 Chipset is an I/O Hub– Bridge between QPI and I/O
– First server chipset with PCIe*2.0
– Flexible I/O with 36 PCIe* 2.0 Lanes (3 to 10 Root ports of different widths)
– One or two CPU socket connection
– Server and Workstation platforms
– Customized to High-End Desktop (X58)
– Multi-generational CPU upgrades
South Bridge
Intel® 5520
Chipset
QPI(x20)
QPI(x20)
QPI(x20)
12 USB 2.0
ESI(x4, 2.5G)
6 x1 PCIe*
6 SATA
Intel® HD AudioLPC
Other ports
JTAG, SMBus, etc
(2 CPU Sockets, Single IOH System)
CPU 0 CPU 1
x16 x8
x8x4
x4
x4x4
x16x8
x8
x4
x4x4
x4
x4 x2x2
36 P
CIe
* 2.
0 La
nes
6* Other names and brands may be claimed as the property of others. Copyr ight © 2009, I ntel Corporat ion. Hotchips 2009
Dual I OH Plat form• Increased I/O Connectivity
– 72 PCIe* 2.0 Lanes
– Maintains I/O flexibility
– Workstation and server
• Two IOHs coordinate their accesses to appear as a single IOH to the CPU(s)– Enhanced QPI protocol in the
IOH-IOH Link (e.g. Recall cache lines from other IOH to ensure forward progress)
– Dynamic adjustment of CPU HOME resources (trackers) to ensure performance
• One or two CPU sockets
PCIe* 2.0(36 Lanes)
South Bridge
Intel® 5520
Chipset
QPI(x20)
QPI(x20)
QPI (x20)
12 USB 2.0
ESI6 x1 PCIe*
6 SATA
Intel® HD AudioLPC
Other ports
CPU 0 CPU 1
QPI+ (x20)
PCIe* 2.0(36 Lanes)
Intel® 5520
Chipset
(2 Sockets, Dual IOH System)
JTAG/ SMBus etc..
7* Other names and brands may be claimed as the property of others. Copyr ight © 2009, I ntel Corporat ion. Hotchips 2009
I nte l ® Quick Path I nterconnect ( QPI )
• Two QPI links, each connect ing to a CPU socket (or I OH)• Different ial links with forwarded clock• Supports up to 17” channel length with two connectors• Each QPI is 20 bits wide, up to 6.4 GT/ s Data Rate• 51.2 GB/ s raw bandwidth and 31.5 GB/ s of sustained real data
t ransfer with two QPI Links at 6.4 GT/ s • CRC protect ion on every 80 bits along with Link level ret ry• Unordered fabric for performance• Snoopless I OH ensures CPU does not snoop I OH
– QPI connect ing I OH only used for I / O accesses– I OH does not rem ain in snoop path of CPU through upgrades
8* Other names and brands may be claimed as the property of others. Copyr ight © 2009, I ntel Corporat ion. Hotchips 2009
I nte l ® 5 5 2 0 I / O Vir tua lizat ion Features• First server chipset shipping with PCI e* I / O Virtualizat ion
– Mult iple VMM vendor (Xen, KVM, Cit r ix, Parallels, VMWare, etc) products enabled
• All inbound m em ory requests undergo VT-d ( I OMMU) address t ranslat ion and access pr ivilege check
• Translat ion based on the requestor’s BDF < Bus, Device, Funct ion> and page address
– Each BDF may belong to a different vir tual machine (VM)
• Context Ent ry (2- level) and a 4- level address t ranslat ion st ructure resident in m em ory
• Mult i- level caching st ructures inside I OH– Context ent ry cache, L1/ L2, L3 and I OTLB caches– I nvalidat ion: Register-based and Queue-based from memory
• Supports PCI -SI G defined ATS/ ACS for end-point caching and access r ights check
CPU CPU
SouthBridge
ESI PCIe 2.0*
5520 IOH
0 1 2
0 1 2
0 1 2
P1
P2
P3
[Assigned I/O for VMs:VM1: (P1,0), (P2,2)VM2: (P1, 1), (P2, 0), P3VM3: (P1, 2), (P2, 1)]
9* Other names and brands may be claimed as the property of others. Copyr ight © 2009, I ntel Corporat ion. Hotchips 2009
Reliabilit y, Ava ilabilit y, Serviceabilit y ( RAS)
• High speed interfaces (ESI , PCI e* , QPI ) are CRC protected with Link level ret ry
• Poison support throughout (PCI e* , ESI , QPI , internal paths)• I nternal data path most ly ECC/ CRC protected
– Configurat ion registers are parity protected
• Detailed error logs in each interface for each error type• Advanced Error Report ing St ructures
– Both MSI -based as well as interrupt pin based not ificat ion
• Hot -plug support on all PCI e* Links• Lane degradat ion and reversal support on PCI e*• Live Error Recovery for guaranteed error containment
– Can program each error type in to take the PCI e* link down – Exam ple: I f a device can not handle poison data, we can take
that Link down rather than propagate poison to the device
10* Other names and brands may be claimed as the property of others. Copyr ight © 2009, I ntel Corporat ion. Hotchips 2009
I nte l ® 5 5 2 0 Chipset Features• I sochronous support on ESI for Quality of Service
– Separate Virtual Channel for HD Audio ( latency & bandwidth)
– Separate Virtual Channel for USB (bandwidth guarantee)
• I OAPI C: I / O Advanced Programmable I nterrupt Cont roller– Converts legacy interrupts to Message Signaled I nterrupt
– Avoids interrupt sharing = > bet ter for perform ance
• QuickData Accelera t ion for CPU off- load– DMA Move Engine w/ CRC capability: 5GB/ s of bandwidth
– Eight funct ions for bet ter vir tualizat ion support
– Direct Cache Access to CPU cores from PCI e*
• I ntegra ted Manageabilit y Engine for System management – Em bedded m icrocont roller with encrypt ion engine
– Sideband paths to com ponents
– I nband PCI ports (serial port , DMA, em ulated I DE, HECI )
– Separate power well
11* Other names and brands may be claimed as the property of others. Copyr ight © 2009, I ntel Corporat ion. Hotchips 2009
I nte l ® 5 5 2 0 I OH Block Diagram
PCIe* x4 (2 x2)
PCIe* x16(2 x8, 4 x4)
ESI/PCIe* AFE
ESI PL + LL
ESITL Pkt Proc
PCIe* AFE
PCIe* PL + LL
PCIe* TL Pkt Processing
PCIe* AFE
PCIe* PL + LL
PCIe* TL Pkt Processing
MgmtEngine
(267 MHz)
CB3 @400Mhz
ESI/PCIe* Transaction Layer Inbound/Outbound Queues
Central Data Path Packet Processing/Routing VT-d
ESI (x4)
QPI AFE
QPI PL + LL
QPI Protocol Layer with cache
QPI AFE
QPI PL + LL
QPI Protocol Layer with cache
400 Mhz(QPI FreqDiv 16)
250 Mhz
QPI Link(x20, up to 6.4 GT/s)
QPI Link(x20, up to 6.4 GT/s)
JTAGCfg ringIO APIC
PCIe* x16(2 x8, 4 x4)
PCIe* AFE
PCIe* PL + LL
PCIe* TL Pkt Processing
Hot Plug
Hdr Bus + 16B Data Bus +Check bits + Control (Internal Datapath B/W: 51.2 GB/s) [PL: Physical Layer, LL: Link Layer,TL: Transaction Layer, Pkt: Packet,AFE: Analog Front End, QPI: Quick Path Interconnect Link]
12* Other names and brands may be claimed as the property of others. Copyr ight © 2009, I ntel Corporat ion. Hotchips 2009
Transact ion Flow Exam ple: DMA Read• 1. Mem ory Read (MRd) loaded to Queue
• 2. VTd t ranslat ion and access r ight check
• 3. Ordering check. Packet broken to Cache line(s) . Request sent to QPI 0 (hom e in CPU0)
• 4. QPI 0: Conflict check; Check t rackers; Consum e t racker; Send request to CPU0
• 5. QPI 1 sends snoop request to CPU 1
• 6. CPU 1 sends snoop response to CPU0
• 7. CPU 0 sends Data Return to I OH. QPI 0 releases the t racker on receipt of Data Return
• 8. Data loaded to outbound PCI e* queue
• 9. Data com plet ion sent out on PCI e*
CPU 0 CPU 1
IOH
ESI/ PCIe*
PCIe* Queues
CDP
QPI 0 QPI 1
1. MRd
2. VTd Access
3. Cache Line Request
4. Read Request(Traker consumed)
5. Snoop
6. Snoop Response
7. Data(Tracker Release)
9. Data Completion
8. DataCompletion
QPI QPI
QPI
13* Other names and brands may be claimed as the property of others. Copyr ight © 2009, I ntel Corporat ion. Hotchips 2009
DMA W rite: Request for Ow nership
• 1. Mem ory Write (MWr) loaded to queue
• 2. VTd t ranslat ion and access r ight check. Page Walk on a m iss.
• 3. Packet broken to Cache line(s) . Request for Ownership (RFO) sent to QPI 0 (hom e in CPU0) . No Ordering check to pipeline RFOs
• 4. QPI 0: Conflict check; Check t rackers; Consum e t racker; Send request to CPU0
• 5. QPI 1 sends snoop request to CPU 1
• 6. CPU 1 sends snoop response to CPU0
• 7. CPU 0 returns the (Exclusive) Ownership of the Cache Line (without Data) to I OH
CPU 0 CPU 1
IOH
ESI/ PCIe*
PCIe* Queues
CDP
QPI 0 QPI 1
1. MWr
2. VTd Access
3. Cache Line Request
4. Request for Ownership (tracker)
5. Snoop
6. Snoop Response
QPI QPI
QPI
14* Other names and brands may be claimed as the property of others. Copyr ight © 2009, I ntel Corporat ion. Hotchips 2009
DMA W rite: W r iteback Phase• 8. QPI 0: ownership not ificat ion to CDP
so that it can process DMA Write
• 9. CDP waits t ill the posted t ransact ion gets to the top of the posted queue, per PCI e* Ordering rules
• 10. CDP: Check with QPI to ownership st ill there; perform write if there; else request line again
• 11. QPI 0 perform s Writeback of Data and relinquishes ownership
• 12. CPU 0 sends com plet ion for the Writeback Transact ion. Tracker released for subsequent reuse
CPU 0 CPU 1
IOH
ESI/ PCIe*
PCIe* Queues
CDP
QPI 0 QPI 1
11. Data and Relinquish Ownership
QPI QPI
QPI
9. Wait tillCache line At Head of P Queue
10. Write Data(Complete Txn onPCIe*)
8. Cache ownershipnotification passednn to CDP
15* Other names and brands may be claimed as the property of others. Copyr ight © 2009, I ntel Corporat ion. Hotchips 2009
Perform ance
• Measured results on single I OH at launch. More than 2X previous generat ions due to QPI as well as PCI e 2.0*
• 100% Rd B/ W PCI e* lim ited• 100% Wr and 50-50 RW B/ W is t racker ent ry lim ited
– Writes occupy t racker ent ry longer since there are two round- t r ips on QPI Link– Bandwidth expected to scale with compute capability in subsequent CPU
generat ions due to more t racker ent r ies from CPU• Other details: I O Meter benchmark, 2.93 GHz 5500, QPI at 6.4 GT/ s, 1333MHz
RDI MM DDR3(6 x2 GB, 2 x8 channel) , Request Size: 4KB, Max payload: 256B
Configurat ion (Single I OH) 100% Rd(GB/ s)
100% Wr (GB/ s)
50-50 Rd/ Wr (GB/ s)
NUMA w/ 3 PCI e 2.0* cards (2 x16, 1 x4) 15.9 12.4 15.2
NUMA w/ 4 x8 PCI e 2.0* Cards 13.9 12.3 14.7
I nter leaved w/ 4 x8 PCI e 2.0* Cards 14.1 12.1 14.5
16* Other names and brands may be claimed as the property of others. Copyr ight © 2009, I ntel Corporat ion. Hotchips 2009
Pow er
• PCI e* and ESI : – Act ive State Power Managem ent puts idle link to low power L1 state– PCI * Power Managem ent m echanism s to allow system software to
m anage System Sleep states (ent ry as well as exit )
• QPI : Support for low power L1 state on idle link• Several power savings measures in the design (e.g., fine-grain
and coarse-grain clock gat ing)• System wide sleep state orchest rated with South Bridge• Power numbers:
– TDP: 27.1 W – All Links working on full speed (e.g., QPI at 6.4 GT/ s and PCI e* at 5 GT/ s) – All features and internal devices enabled – No act ive state power management benefit assumed– Accounts for worst possible combinat ion of process, voltage, temperature
– I dle power: 10 W ( through system low power state)
17* Other names and brands may be claimed as the property of others. Copyr ight © 2009, I ntel Corporat ion. Hotchips 2009
Chip Stat ist ics• 65 nm process
technology• Die Size: 13.6 mm
X 10.4 mm• ~ 100 M t ransistors
– 33x original Pent ium
• Package: FCBGA 37.5 x 37.5 mm, 1.067 pin pitch, 10 layer
• Signal Pins: 570; total pins on package: 1295
18* Other names and brands may be claimed as the property of others. Copyr ight © 2009, I ntel Corporat ion. Hotchips 2009
Sum m ary
• I ntel® 5520 is first QPI -based chipset with PCI e* 2.0
• Leadership features– I / O bandwidth with flexible I / O Connect ivity (36 or 72 PCI e
2.0* Lanes) for various segments– I / O Virtualizat ion– QuickData for I / O Accelerat ion– Manageabilit y– I sochrony for Quality of Service
• Designed to last m ult iple CPU generat ions on the sam e plat form to protect custom er investm ents