Intel Confidential
BDW + FPGA
Beta Release 5.0.3 Core Cache Interface (CCI-P)
Interface Specification
2-Sep-16 Document Version 1.0
Notices and Disclaimers
No license (express or implied, by estoppel or otherwise) to any intellectual property rights is granted by
this document.
Intel disclaims all express and implied warranties, including without limitation, the implied warranties of
merchantability, fitness for a particular purpose, and non-infringement, as well as any warranty arising
from course of performance, course of dealing, or usage in trade.
This document contains information on products, services and/or processes in development. All
information provided here is subject to change without notice. Contact your Intel representative to
obtain the latest forecast, schedule, specifications and roadmaps.
The products and services described may contain defects or errors known as errata which may cause
deviations from published specifications. Current characterized errata are available on request.
Intel, the Intel logo, and Xeon are trademarks of Intel Corporation in the U.S. and/or other countries.
*Other names and brands may be claimed as the property of others.
© 2016 Intel Corporation.
Updates
This document belongs to the group of documents provided for the BDW + FPGA product.
Identify the latest copy by the date printed in the footer on each page.
Questions and Feedback
Intel solicits and appreciates feedback. Provide input through Intel® Premier Support (IPS). Customers should ensure that they have IPS access by working with their Account Manager or FAE.
Revision History
Date | Version | Doc Modifications
24-Nov-14 | 0.15 | Edits to A2C interface
12-Feb-15 | 0.25 |
19-Jul-15 | 0.5 | Major edits to CCI-P section. Defined separate CFG Read/Write channel
6-Aug-15 | 0.55 | Define internal interfaces to IPs: CCI-U and csr sideband
3-Sep-15 | 0.56 | Formatting and minor editing
21-Sep-15 | 0.56 | Minor editing
28-Oct-15 | 0.6 | Draft version. Updates to CCI-U cfg header, and CCI-P control signals. Cfg channel renamed to MMIO and signals regrouped in CCI-P section
01-Dec-15 | 0.6 | Removed internal information. Reformat for external distribution.
15-Dec-15 | Pre-Alpha | Format for pre-alpha; add some clarification
20-Dec-15 | 0.6 | 0.6 version with internal information
30-Dec-15 | Pre-Alpha | External sections updated; called Pre-Alpha
24-Jan-16 | Pre-Alpha (CCI-P 0.7) | Edit and format
16-Mar-16 | 5.0.2 | Update for Beta.
11-Jun-16 | 5.0.2 v1.0 | Added Intel Confidential to footer.
23-Aug-16 | 5.0.3 |
Contents
Notices and Disclaimers
Updates
Questions and Feedback
Revision History
About this Document
Intended Audience
Conventions
Related Documentation
Glossary
1 Introduction
1.1 Xeon® Processor + FPGA Block Diagram
1.2 Development models
1.3 Memory hierarchy
2 CCI-P Interface
2.1 Features
2.2 Signaling information
2.3 Read from/Write to Main Memory
2.4 UMsg
2.5 MMIO Cycles to IO Memory
2.6 CCI-P Tx Signals
2.7 Tx Header Format
2.8 CCI-P Rx Signals
2.8.1 Rx Header and RxData Format
2.9 Multi-Cacheline Memory Requests
2.10 Additional Control Signals
Protocol Flow
2.10.1 Upstream Requests
2.10.2 Downstream Requests
2.11 Ordering Rules
2.11.1 Memory Requests
2.11.1.1 Write Fence usage
2.11.1.2 Memory Consistency Explained
2.11.1.2.1 Two Writes on Different VCs
2.11.1.2.2 Two Writes on the Same VC
2.11.1.2.3 Two Reads on Different VCs
2.11.1.2.4 Two Reads on the Same VC
2.11.1.2.5 Read-after-Write on Same VC
2.11.1.2.6 Read-after-Write on Different VCs
2.11.1.2.7 Write-after-Read on Same or Different VCs
2.11.1.2.8 Some example scenarios
2.11.2 MMIO Requests
2.12 Timing diagrams
2.13 Clock Frequency
2.14 CCI-P Guidance
3 AFU Requirements
3.1 Mandatory AFU CSR Definitions
3.2 AFU Discovery Flow
3.3 AFU_ID
3.3.1 How to Create an AFU_ID / GUID
3.3.2 How to Use an AFU_ID
4 Basic Building Blocks
5 Device Feature List
Code
Code 1: ccip_std_afu port map
Code 2: Tx interface structure inside ccip_if_pkg.sv
Code 3: Tx channel structures inside ccip_if_pkg.sv
Code 4: Rx interface structure inside ccip_if_pkg.sv
Code 5: Rx channel structure inside ccip_if_pkg.sv
Code 6: Set the Mandatory AFU Registers in the AFU
Code 7: AAL Reads the AFU ID
Figures
Figure 1: High-Level Block Diagram of Xeon® + FPGA Logic
Figure 2: Xeon+FPGA system memory hierarchy, 1 Processor topology
Figure 3: CCI-P Signals
Figure 4: UMsg initialization and usage flow
Figure 5: Multi-CL Memory Write Requests
Figure 6: Multi-CL Memory Write Responses
Figure 7: Multi-CL Memory Read Responses
Figure 8: Write Out of Order Commit
Figure 9: Use WrFence to Enforce Write Ordering
Figure 10: Two Writes on Same VC, Only One Outstanding
Figure 11: Read Re-Ordering to Same Address, Different VCs
Figure 12: Read Re-Ordering to Same Address, Same VC
Figure 13: Tx Channel 0 & 1 almost full threshold
Figure 14: Write Fence Behavior
Figure 15: C0 Rx Channel Interleaved between MMIO Requests and Memory Responses
Figure 16: Rd Response Timeout
Figure 17: AFU discovery flow
Figure 18: Example feature hierarchy
Figure 19: Device Feature Conceptual View
Tables
Table 1: CCI-P Features
Table 2: Comparison of Platform Capabilities
Table 3: AFU Memory Read paths
Table 4: CCI-P Features summary
Table 5: Tx Channel Signal Description
Table 6: Tx Header Field Definitions
Table 7: Tx Request Encodings & Mapping to Header Formats
Table 8: C0 Read Memory Request Header Format
Table 9: C1 Write Memory Request Header Format
Table 10: C1 Fence Header Format
Table 11: C2 MMIO Response Header Format
Table 12: Rx Channel Signal Description
Table 13: Rx Header Field Definitions
Table 14: AFU Rx Response Encodings and Channels Mapping
Table 15: C0 Memory Read Response Header Format
Table 16: MMIO Request Header Format
Table 17: C1 Memory Write Response Header Format
Table 18: UMsg Header Format
Table 19: WrFence Header Format
Table 20: Clock and Reset
Table 21: Protocol Flow for upstream requests from AFU to FIU
Table 22: CCI-P VL0 protocol flows
Table 23: Protocol Flow for Downstream Requests from CPU to AFU
Table 24: Ordering rules for upstream requests from AFU
Table 25: MMIO Ordering Rules
Table 26: Clock Frequency
Table 27: Recommended Choices for Memory Requests
Table 28: Register Attribute Definition
Table 29: Mandatory AFU CSRs
Table 30: Feature Header CSR Definition
Table 31: AFU_ID_L CSR Definition
Table 32: AFU_ID_H CSR Definition
Table 33: DFH_RSVD0 CSR Definition
Table 34: DFH_RSVD1 CSR Definition
Table 35: Differences between AFU, Private Features, and BBBs
Table 36: Device Feature Header CSR
Table 37: Next DFH Byte offset example
Table 38: Mandatory AFU DFH register map
Table 39: AFU_ID_L CSR definition
Table 40: AFU_ID_H CSR definition
Table 41: Next AFU CSR
Table 42: DFH_RSVD1 CSR Definition
Table 43: Mandatory BBB DFH Register Map
Table 44: BBB_ID_L CSR Definition
Table 45: BB_ID_H CSR Definition
About this Document
This document describes the Core Cache Interface (CCI-P) specification which is the interface between the Accelerated Function Unit (AFU) and the BDW + FPGA IP.
Intended Audience
The intended audience is system engineers, platform architects, and software developers. Users must design the HW AFU to be compliant with the CCI-P specification.
Conventions
Conventions used in this document include the following:
# preceding a command indicates the command is to be entered as root.
$ preceding a command indicates the command is to be entered as a user.
Filenames, commands, and keywords are printed in a distinct font. Long command lines are also printed in a distinct font. Although some very long command lines may wrap to the next line, the return is not considered part of the command; do not enter it.
<variable_name> indicates that the placeholder text between the angle brackets is to be replaced with an appropriate value. Do not enter the angle brackets.
Related Documentation
Item Description
BDW + FPGA Beta Release 5.0.3 Read This First
This document summarizes the available documentation and suggests how users might navigate through it.
BDW + FPGA Beta Release 5.0.3 Release Notes
This document lists the key features, limitations, changes from the previous release, and possible future changes.
BDW + FPGA Beta Release 5.0.3 Software Installation Guide
This document lists the software prerequisites needed by the AAL SDK and provides instructions on how to install the AAL SDK.
BDW + FPGA Beta Release 5.0.3 AFU Simulation Environment User’s Guide
This document provides instructions on how to use the Accelerated Function Unit (AFU) Simulation Environment (ASE).
BDW + FPGA Beta Release 5.0.3 Software Architecture Guide
This document presents the rationale behind AAL and the concepts upon which AAL is based.
BDW + FPGA Beta Release 5.0.3 Programmer’s Guide
This document shows how the concepts described in BDW + FPGA Beta Release 5.0.3 Software Architecture Guide can be implemented in code. It does not assume the existence of an AFU.
BDW + FPGA Beta Release 5.0.3 Core Cache Interface (CCI-P) Specification
This document describes the Core Cache Interface (CCI-P) specification which is the interface between the Accelerated Function Unit (AFU) and the FPGA Interface Unit.
BDW + FPGA Beta Release 5.0.3 How to Build, Load, and Debug a Bitstream
This document lists the steps to build, load, and debug a green bitstream with Quartus and the BDW + FPGA product.
BDW + FPGA Beta Release 5.0.3 Sample Programs Guide
This document describes how to run the sample programs provided with Release 5.0.3 of the BDW + FPGA Accelerator Abstraction Layer and the BDW + FPGA platform.
Arria 10 Avalon-ST Interface with SR-IOV PCIe Solutions User Guide
https://documentation.altera.com/#/00014789-AA$NT00089097
Intel Software Developers Manual
http://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-software-developer-manual-325462.pdf
Intel Virtualization Technology for Directed-IO
Glossary
Acronym | Expansion | Description
AAL | Accelerator Abstraction Layer | A set of runtime and software development tools that facilitate the deployment of systems consisting of a collection of non-uniform, asymmetric compute resources. The AALSDK is the AAL Software Development Kit.
AFU | Accelerated Function Unit | Hardware accelerator implemented in FPGA logic that accelerates or intends to accelerate an application kernel.
ALI | AFU Link Interface | The software interface between AAL and CCI-P.
ASE | AFU Simulation Environment | A co-development and simulation area available in Intel® QuickAssist AALSDK consisting of hardware and software.
CA | Caching Agent | Makes read and write requests to the coherent memory in the system. It is also responsible for servicing snoops generated by other agents in the system.
CCI-P | Core Cache Interface | Interface between the AFU and the FPGA Interface Unit (FIU).
CL | Cache Line | 64-byte cache line
DPI | Direct Programming Interface | A set of features in SystemVerilog that allows export/import of parameters to/from a C function.
FIU | FPGA Interface Unit | The Intel UPI & PCIe on FPGA together form the FIU sub-block.
FPGA | Field Programmable Gate Array | http://en.wikipedia.org/wiki/Fpga
PA | Physical Address | Physical address of the host machine
IPC | Inter-Process Communication | Refers to constructs in Linux, such as shared memory (/dev/shm) and message queues (/dev/mqueue); these are leveraged for ASE core functionality.
KiB | 1024 bytes | The term KiB is for 1024 bytes and KB for 1000 bytes. When referring to memory, KB is often used and KiB is implied. When referring to clock frequency, KHz is used, and here K is 1000.
Mdata | Message Tag Data | A user-defined field that is relayed from the Tx header to the Rx header. It may be used to tag requests with a transaction ID or channel ID.
Msg | Message | A control notification.
NLB | Native Loopback Adapter | Sample RTL
PAR | Place & Route | In this context, refers to a stage in building a bitstream. Placement decides where to place components on the FPGA, and routing determines how to connect the placed components.
RdLine_I | Read Line Invalid | Memory Read Request, with FPGA cache hint set to Invalid, i.e. do not cache it. The line will not be cached in the FPGA, but the request may cause FPGA cache pollution (see footnote 1).
RdLine_S | Read Line Shared | Memory Read Request, with FPGA cache hint set to Shared. An attempt will be made to keep it in FPGA cache in Shared state.
Rx | Receive | Receive or input from the AFU’s perspective
Tx | Transmit | Transmit or output from the AFU’s perspective
Upstream | Direction up to CPU | Logical direction towards the CPU. For example, an upstream port is a port going to the CPU.
UMsg | Unordered Message from CPU to AFU | An unordered notification with a 64-byte payload.
UMsgH | Unordered Message Hint from CPU to AFU | A hint to a subsequent UMsg. No data payload.
UPI | Intel® Ultra Path Interconnect | Intel’s proprietary coherent interconnect protocol between Intel cores or other IP.
WrLine_I | Write Line Invalid | Memory Write Request, with FPGA cache hint set to Invalid. FIU will write the data with no intention of keeping the data in FPGA cache.
Footnote 1 (RdLine_I): The cache tag is used to track the request status for all outstanding requests on UPI. Therefore, even though RdLine_I is marked Invalid upon completion, it consumes the cache tag temporarily to track the request status over UPI. This action may result in the eviction of a cache line, resulting in cache pollution. The advantage of using RdLine_I is that it is not tracked by the CPU directory; thus it will prevent snooping from the CPU.
WrLine_M | Write Line Modified | Memory Write Request, with FPGA cache hint set to Modified. FIU will write the data and leave it in the FPGA cache in Modified state.
WrPush_I | Write Push Invalid | Memory Write Request, with FPGA cache hint set to Invalid. FIU writes the data into the processor’s last level cache (LLC) with no intention of keeping the data in FPGA cache. The LLC it writes to is always the LLC associated with the processor where the DRAM address is homed.
1 Introduction
CCI-P is the hardware-side signaling interface between the Accelerated Function Unit (AFU) and the FPGA Interface Unit (FIU). This document defines the signaling interface. It specifies the access types, the request format and the memory model, and the mandatory AFU CSRs. It provides timing diagrams and AFU design guidelines.
CCI-P provides an abstraction of the physical links between the FPGA and CPU. An AFU sees a unified interface with four virtual channels and a unified address space. CCI-P supports data payloads of up to four cachelines (4 CL). Table 1 lists some key CCI-P features.
Table 1: CCI-P Features
Feature | CCI-P
Data transfer size | 64, 128, 256 B
Addressing Mode | Physical Addressing Mode
Addressing Width (CL aligned addresses) | 42 bits
Caching Hints | Yes
Virtual Channels | VA, VL0, VH0, VH1
Response Ordering | Out of order responses
MMIO Read & Write | Supported
FPGA to CPU Interrupt | Supported
Interface Clk frequency | 400 MHz
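
The address width and transfer sizes in Table 1 can be cross-checked with a little arithmetic. The Python sketch below is illustrative only: the function names are hypothetical and not part of CCI-P, and it assumes that the 42-bit address counts 64-byte cache lines, so the low 6 bits of a byte address are dropped.

```python
CL_BYTES = 64        # cache line size from the glossary (CL = 64 bytes)
CL_ADDR_BITS = 42    # CL-aligned address width from Table 1
CL_SHIFT = 6         # log2(64): byte-address bits removed by CL alignment


def byte_to_cl_addr(byte_addr: int) -> int:
    """Convert a cache-line-aligned byte address to a CL address (assumed 42 bits)."""
    if byte_addr < 0 or byte_addr % CL_BYTES != 0:
        raise ValueError("byte address must be non-negative and 64-byte aligned")
    cl_addr = byte_addr >> CL_SHIFT
    if cl_addr >= (1 << CL_ADDR_BITS):
        raise ValueError("address does not fit in 42 CL address bits")
    return cl_addr


def transfer_bytes(num_cl: int) -> int:
    """Payload size in bytes for a request of 1, 2, or 4 cache lines."""
    if num_cl not in (1, 2, 4):
        raise ValueError("Table 1 lists only 64, 128, and 256 byte transfers")
    return num_cl * CL_BYTES


def addressable_bytes() -> int:
    """Total bytes reachable with 42 CL address bits (assumption stated above)."""
    return (1 << CL_ADDR_BITS) * CL_BYTES
```

Under these assumptions, `byte_to_cl_addr(0x1000)` yields `0x40`, `transfer_bytes(4)` yields 256, and `addressable_bytes()` equals 2^48 bytes.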
CCI-P introduces two architectural concepts: Device Feature Lists (DFL) and Basic Building Blocks (BBBs).
DFL defines a structure for grouping like functionalities and enumerating them.
BBB defines an architecture for wrapping features into building blocks. You can incorporate these building blocks into your AFUs.
BBBs are source-visible reference designs; other than a few mandatory registers, there are no other requirements imposed on a BBB. For example, the Memory Properties Factory (MPF) is a BBB that translates virtual memory addresses to physical memory addresses for memory shared between the Xeon Processor and the FPGA. MPF also does read response ordering and provides data hazard resolution. Section 4 provides more information on BBBs.
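
Because CCI-P returns responses out of order (Table 1), an AFU that needs read data in request order must reorder it itself or rely on a block such as MPF. The Python sketch below models a generic reorder buffer; it is not MPF's actual implementation. It assumes each request carries a sequence tag in the user-defined Mdata field, which the glossary describes as relayed from the Tx header to the Rx header. The class name, method names, and tag count are hypothetical.

```python
from typing import Dict, List


class ReadReorderBuffer:
    """Release read responses in request order, using a tag carried in Mdata."""

    def __init__(self, num_tags: int = 16) -> None:
        self.num_tags = num_tags             # hypothetical size of the tag space
        self.next_issue = 0                  # tag assigned to the next request
        self.next_retire = 0                 # tag of the oldest unreleased request
        self.outstanding = 0                 # requests issued but not yet released
        self.pending: Dict[int, bytes] = {}  # responses that arrived early

    def issue(self) -> int:
        """Allocate the Mdata tag for a new read request."""
        if self.outstanding == self.num_tags:
            raise RuntimeError("no free tags; wait for responses")
        tag = self.next_issue
        self.next_issue = (self.next_issue + 1) % self.num_tags
        self.outstanding += 1
        return tag

    def receive(self, mdata: int, data: bytes) -> List[bytes]:
        """Store one response and return every response now releasable in order."""
        self.pending[mdata] = data
        released: List[bytes] = []
        while self.next_retire in self.pending:
            released.append(self.pending.pop(self.next_retire))
            self.next_retire = (self.next_retire + 1) % self.num_tags
            self.outstanding -= 1
        return released
```

In this model, a response for the oldest outstanding tag is released immediately, while a response that arrives early is held until all older tags have been released.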
1.1 Xeon® Processor + FPGA Block Diagram
FPGA logic (as shown in Figure 1) is divided into two parts: the Intel-provided FPGA Interface Unit (FIU) represented by the blue box (called the blue bitstream) and the user-developed AFU represented by the green box (called the green bitstream).
Note that although the FIU is called the blue bitstream, it is not actually a bitstream. A bitstream is a file that can be loaded onto an FPGA. The blue bitstream is not a file; it is the set of RTL files that make up the Intel IP. You must combine these RTL files with a green bitstream to get a bitstream that can be loaded onto the FPGA. Such a loadable bitstream is called a base bitstream or a full-chip bitstream, and it contains both blue and green parts.
The green bitstream is also not a base or full-chip bitstream; but you can replace the green part of a previously loaded base bitstream with another green part. This green part exists as a separate file.
The FIU implements all the key features required for deployment and manageability of FPGA in a Xeon datacenter. The FIU implements the interface protocols for links between the CPU and FPGA. The FIU also provides platform capabilities such as VT-d, security, error monitoring, performance monitoring, power and thermal management, partial reconfiguration of AFUs, etc.
Note the three physical links: PCIe0, PCIe1, and UPI. These physical links are presented as virtual channels on the CCI-P interface. Refer to Section 1.3 for more information about physical and virtual channels.
The SMBus interface running between the Xeon processor and the FPGA is SMBus-like; it does not follow the published SMBus specifications. It is used for out-of-band temperature monitoring, configuration during the bootstrap process, and platform debug purposes.
[Figure 1 block labels: Xeon; coherent interface (QPI 6.4G, UPI 9.2G); PCIe Gen3x8 EP0; PCIe Gen3x8 EP1; SMBus slave; Intel IP: FPGA Interface Unit (FIU) containing the FPGA Management Engine (FME: thermal monitor, power monitor, performance monitor, Partial Reconfiguration, global errors), IOMMU & Device TLB, cache controller, PR Unit, and Fabric; CCI-U 64B@200MHz and CCI-U 64B@250MHz internal links; CCI-P Port0 (SignalTap, UMsg, port reset, port errors) with Control Channel and Data Channel to AFU 0. Legend: BDX only blocks, SKX only blocks, optional/parameterized blocks.]
Figure 1: High-Level Block Diagram of Xeon® + FPGA Logic
Refer to Table 2 for a list of platform capabilities.
Unified Address space
Even though the FIU has three physical links going to the CPU, the AFU maintains a single view of the system address space. A write to address X directed over the coherent interface or PCIe goes to the same cacheline in system memory.
Intel Virtualization for Directed IO (VT-d) support
SKX+FPGA has hardware support for memory isolation.
Partial Reconfiguration (PR) of AFU
PR uses Altera FPGA technology to allow a user to reconfigure parts of the FPGA device dynamically, while the remainder of the FPGA continues to operate. Each CCI-P interface port supports one PR enabled AFU.
Remote Debug
The Xeon + FPGA product enables the PSG (formerly Altera) in-system debug tool. This debug tool is Remote Access SignalTap (RSTP) via the Xeon processor. The remote access capability obviates the need for physical access to the machine when debugging an FPGA design.
Table 2: Comparison of Platform Capabilities
Capability | BDW+FPGA | SKX+FPGA
Unified Address space | Yes | Yes
VT-d support for AFU | No | Yes
Partial Reconfiguration | Yes | Yes
Support for two AFUs | No | No
Remote Debug | Yes | Yes
FPGA Cache size | 64KiB direct mapped | 128KiB direct mapped
1.2 Development models
The two AFU development models supported are HDL design and OpenCL design.
1. HDL design
This is the traditional FPGA development flow, where users design an AFU in an HDL such as Verilog, SystemVerilog, or VHDL, adhering to the CCI-P interface specification. Users then compile their code (the RTL) through the Quartus tool chain to generate an AFU bitstream.
2. OpenCL design
The PSG OpenCL SDK is a framework for writing programs at a higher level of C-like abstraction. Users develop an AFU in OpenCL C and compile it along with the Xeon + FPGA BSP to generate an FPGA bitstream and a software executable. For best performance, the OpenCL code must be optimized for the Xeon + FPGA platform.
1.3 Memory hierarchy
This section explains the memory hierarchy in the Xeon + FPGA system. Refer to Figure 2. The green dotted box shows the multi-processor coherence domain. The FIU on the FPGA extends the coherence domain from the processor to the FPGA, encompassing a cache implemented on the FPGA (called the FPGA cache).
The FIU implements a cache controller and UPI Caching Agent (CA). The CA makes read and write requests to coherent system memory and services snoop requests to the FIU cache.
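[Figure 2 block labels: Processor (N Cores, Last Level Cache, PCIe RP, DDR to DRAM); FPGA (FIU with cache and VC steering, CCI-P, AFU); UPI and PCIe links between processor and FPGA; Multi-processor Coherence Domain. Legend: PCIe RP = PCIe root port; VC = Virtual Channel. Numbered markers 1, 2, and 3 appear in the figure.]
Figure 2: Xeon + FPGA system memory hierarchy, 1 Processor topology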
The CCI-P interface abstracts the physical links to the processor and provides simple load/store semantics to the AFU for accessing system memory.
The physical links are presented as virtual channels on the CCI-P interface. Each request can select the virtual channel. The virtual channels are called VL0, VH0, and VH1. There is a fourth, called VA (Virtual Auto), for which the FIU chooses one of the other three. Refer to Table 3. The response header identifies which VC was selected by the FIU.
For a single-processor system, the AFU sees a three-level memory hierarchy: (1) FIU cache, (2) processor Last Level Cache (LLC), and (3) DRAM.
The memory access latency increases as you go from (1) to (3).
Note that the AFU accesses 2nd and 3rd level memory along two independent paths, each with a different latency. Table 3 lists the different possible AFU Memory Read operations in increasing order of latency. Each row shows the request path, and the node that services the request is highlighted in GREEN.
Table 3: AFU Memory Read paths
Request | FPGA Cache | Virtual Channel | Processor LLC | DRAM
FPGA Cache Hit | Hit | | |
Processor Cache Hit | Miss | VL0 | Hit |
(same request) | | VH* | |
All Cache Miss | Miss | VL0 | Miss | Read
(same request) | | VH* | |
VH* means either VH0 or VH1.
If you are still developing experience with the CCI-P interface, choose the VA channel. This channel is optimized for maximum bandwidth and producer-consumer type data flows. Refer to Section 2.11 for ordering rules. When you choose VA, the FIU steers your request to a physical link based on the following:
Caching hint: Cacheable requests are biased towards the UPI link.
Data payload size: 64B requests are biased towards the UPI link. A cache line is 64 bytes. A multi-cacheline read/write is NOT split; it is guaranteed to be processed by a single physical link.
Link utilization: VA attempts to balance the load across the virtual channels.
The cache is along the VL0 data path. The VC steering decision is made before the cache lookup. You could incur a high memory latency if the requested cache line is cached in the FPGA and the request is steered to VH*. In this case, the processor has to snoop the FPGA cache in order to complete the VH* request.
2 CCI-P Interface
CCI-P provides access to two types of memory: main memory and IO memory.
Main Memory: Subsequent to this section, main memory is referred to simply as memory. This is the memory attached to the processor and exposed to the operating system. Requests from the AFU to main memory are called upstream requests.
IO Memory: IO memory is implemented within the IO device, which in our case is the AFU. How this memory is implemented and organized is up to the AFU. The AFU may choose flip-flops, M20Ks or MLABs.
The CCI-P interface defines a request format to access IO memory using Memory Mapped IO (MMIO) requests. Requests from the processor to IO memory are called downstream requests.
The AFU’s MMIO address space is 256KiB.
Figure 3 shows all CCI-P signals grouped into three Tx Channels, two Rx Channels and some additional control signals.
Tx/Rx: The flow direction is from the AFU point of view. Tx flows from AFU to FIU. Rx flows from FIU to AFU.
Channels: A grouping of signals that together completely define a request or response.
Figure 3 reflects the organization shown in the files ccip_std_afu.sv and ccip_if_pkg.sv.
Figure 3: CCI-P Signals
2.1 Features
Table 4 summarizes the features unique to the CCI-P interface for AFUs.
Table 4: CCI-P Features summary
Virtual Channels: Physical links are presented to the AFU as virtual channels. The AFU can select the virtual channel for each memory request.
    VL0: Low latency virtual channel (mapped to UPI).
    VH0: High latency virtual channel (mapped to PCIe0). Protocol efficiency is better for larger data payloads.
    VH1: High latency virtual channel (mapped to PCIe1). Protocol efficiency is better for larger data payloads.
    VA: Virtual Auto. The FIU auto-selects the link based on link utilization, request caching hint, and payload size. Latency: expect to see high variance. BW: expect to see high steady state BW.
Memory Request: AFU read/write to memory.
    Addressing Mode: Physical address
    Address Width: 42 bits (CL address)
    Data Lengths: 64B, 128B, 256B
    Byte Addressing: Not supported
FPGA Caching Hint: The AFU can ask the FIU to cache the CL in a specific state. For requests directed to VL0, the FIU attempts to cache the data in the requested state, given as a hint. Except for WrPush_I, cache hint requests on VH0/1 are ignored. Note that the caching hint is only a hint and provides no guarantee of final cache state. Ignoring a cache hint impacts performance but does not impact functionality.
    <request>_I: No intention to cache
    <request>_S: Desire to cache in S state
    <request>_M: Desire to cache in M state
MMIO Request: CPU read/write to AFU IO memory.
    MMIO Read payload: 4B, 8B
    MMIO Write payload: 4B, 8B, 64B. MMIO writes could be combined by the x86 Write Combining buffer.
UMsg: Unordered Message. This is a spin loop optimization. It is an improvement over the AFU polling an address location in main memory. When the CPU writes to the memory, the AFU receives a UMsg.
    UMsg data payload: 64B
    Number of UMsgs supported: 8 per AFU
2.2 Signaling information
All CCI-P signals must be synchronous to pClk.
All signals are active high, unless explicitly mentioned. Active low signals use a suffix _n.
We recommend using the CCI-P structures defined inside ccip_if_pkg.sv file. This is included in the RTL package.
All AFU output signals must be registered.
AFU output bits marked as RSVD are reserved and must be driven to 0.
AFU output bits marked as RSVD-DNC are don’t care bits. The AFU can drive either 0 or 1.
All AFU input signals must also be registered.
AFU input bits marked as RSVD must be treated as don’t care (X) by the AFU.
Code 1 shows the port map for the ccip_std_afu module. The AFU must be instantiated under this module. The subsequent sections explain the interface signals.
Code 1: ccip_std_afu port map
    module ccip_std_afu(
        // CCI-P Clocks and Resets
        input  logic        pClk,                 // 400MHz - CCI-P clock domain. Primary interface clock
        input  logic        pClkDiv2,             // 200MHz - CCI-P clock domain.
        input  logic        pClkDiv4,             // 100MHz - CCI-P clock domain.
        input  logic        uClk_usr,             // User clock domain.
        input  logic        uClk_usrDiv2,         // User clock domain. Half the programmed frequency
        input  logic        pck_cp2af_softReset,  // CCI-P ACTIVE HIGH Soft Reset
        input  logic [1:0]  pck_cp2af_pwrState,   // CCI-P AFU Power State
        input  logic        pck_cp2af_error,      // CCI-P Protocol Error Detected

        // Interface structures
        input  t_if_ccip_Rx pck_cp2af_sRx,        // CCI-P Rx Port
        output t_if_ccip_Tx pck_af2cp_sTx         // CCI-P Tx Port
    );
2.3 Read from/Write to Main Memory
The AFU makes a memory read request to the FIU over C0, using Tx signals, and receives the response over C0, using Rx signals.
The AFU drives the C0 valid signal to indicate that C0 Hdr contains a request. The c0_ReqMemHdr structure provides a convenient mapping from flat bit-vector to read request fields. The req_type signal provides a cache hint (RDLINE_I for Invalid or RDLINE_S for Shared). The mdata field is a user-defined request ID.
Then, the FIU responds over C0. The resp_type signal in the c0_RspMemHdr structure indicates response type (Memory Read or UMsg Received). The data field in C0 contains the data that were read. The mdata field in the c0_RspMemHdr structure contains the same value that went out with the request.
The AFU makes a memory write request to the FIU over C1, using Tx signals, and receives the response over C1, using Rx signals.
The AFU drives the C1 valid signal to indicate that C1 Hdr contains a request. The c1_ReqMemHdr structure provides a convenient mapping from flat bit-vector to write request fields. The req_type signal provides request type and cache hint.
Then, the FIU responds over C1 using Rx signals. The resp_type field in the c1_RspMemHdr structure indicates whether the response is for a memory write. The mdata field in the c1_RspMemHdr structure contains the same value that went out with the write request.
Write memory requests need explicit synchronization using WrFence.
2.4 UMsg
UMsg provides the same functionality as a spin loop from the AFU, without burning the CCI-P read bandwidth. Think of it as a spin loop optimization, where a monitoring agent inside the FPGA cache controller is monitoring snoops to cachelines allocated by the driver. When it sees a snoop to the cacheline, it reads the data back and sends a UMsg to the AFU.
UMsg flow makes use of the cache coherency protocol to implement a high speed unordered messaging path from CPU to AFU. This process consists of two stages as shown in Figure 4.
The first stage is initialization, where SW pins the UMsg Address Space (UMAS) and shares the UMAS start address with the FPGA cache controller. Once this is done, the FPGA cache controller reads each cache line in the UMAS and places it in the FPGA cache in the Shared state.
The second stage is actual usage, where the CPU writes to the UMAS. A CPU write to UMAS generates a snoop to the FPGA cache. The FPGA responds to the snoop and marks the line as invalid. The CPU write request completes, and the data become globally visible. A snoop in the UMAS address range triggers the monitoring agent (MA), which in turn sends out a read request to the CPU for the cache line (CL) and optionally sends out a UMsg with Hint (UMsgH) to the AFU. When the read request completes, a UMsg with 64B data is sent to the AFU.
Figure 4: UMsg initialization and usage flow
Functionally, UMsg is equivalent to a spin loop or a monitor and mwait instruction on a Xeon.
Some key characteristics of UMsgs:
1. Just as spin loops to different addresses in a multi-threaded application have no relative ordering guarantee, UMsgs to different addresses have no ordering guarantee between them.
2. Not every CPU write to a UMAS CL results in a corresponding UMsg. The AFU may miss an intermediate change in the value of a CL, but it is guaranteed to see the newest data in the CL. Again, it helps to think of this like a spin loop: if the producer thread updates the flag CL multiple times, the polling thread may miss an intermediate change in value, but it is guaranteed to see the newest value.
Here is an example usage. Software updates to a descriptor queue pointer may be mapped to a UMsg. The pointer is always expected to increment. UMsg guarantees that the AFU sees the final value of the pointer; it may miss intermediate updates to the pointer, which is acceptable.
3. UMsg uses the FPGA cache; as a result, it could cause cache pollution, a situation in which a program unnecessarily loads data into the cache and causes other needed data to be evicted, thus degrading performance.
4. Because the CPU may exhibit false snooping, UMsgH should be treated as a hint. That is, you can start speculative execution or pre-fetch based on UMsgH, but you should wait for UMsg before committing the results.
5. UMsg provides the same latency as an AFU read polling using RdLine_S, but it saves CCI-P channel bandwidth, which can be used for read traffic.
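The "newest value, possibly missing intermediate values" behavior in characteristic 2 can be pictured with a small software model. This is only an illustrative Python sketch, not RTL or driver code; the class `UMsgLine` and its methods are hypothetical names introduced here.

```python
class UMsgLine:
    """Toy model of one UMAS cache line as observed by the AFU (hypothetical)."""

    def __init__(self):
        self.value = None      # newest value written by the CPU
        self.pending = False   # True when a UMsg is owed to the AFU

    def cpu_write(self, value):
        # Several CPU writes may land before the AFU is notified;
        # only the newest value is retained.
        self.value = value
        self.pending = True

    def deliver_umsg(self):
        # Return the payload of the next UMsg, or None if nothing changed.
        if not self.pending:
            return None
        self.pending = False
        return self.value


line = UMsgLine()
for pointer in (1, 2, 3):      # descriptor queue pointer is incremented three times
    line.cpu_write(pointer)
latest = line.deliver_umsg()   # the AFU observes only the final pointer value
```

In this model the three writes collapse into one message carrying the last pointer value, which matches the descriptor queue example above.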
[Figure 4 labels: participants are CPU, Memory, FPGA QPI Agent, and AFU. Initialization: set up UMAS (pinned memory); inform FPGA of UMAS location. Usage: CPU writes to UMAS; the CPU write causes a snoop to the FPGA, and a UMsgH is sent to the AFU (for ultra low latency, the Snp itself is used as a UMsgH); the FPGA gets the read data, and a UMsg + 64B data is sent to the AFU (Snp + Read Data is sent as UMsg).]
2.5 MMIO Cycles to IO Memory
MMIO Write requests: posted. The AFU must not return a response.
MMIO Read requests: non-posted. The AFU must return a response.
Key points:
Read data widths supported = 4B, 8B
Write data widths supported = 4B, 8B
The AFU must support 8B MMIO accesses to IO memory and register file.
4B accesses are optional. They can be avoided by coordinating with the SW application developer.
The maximum number of outstanding MMIO read requests is 64.
MMIO read request timeout value = 512 pClk cycles
Maximum MMIO request rate = 1 request per 2 pClks
MMIO Reads to undefined AFU registers should still return a response.
The FIU makes an MMIO read request to the AFU over C0, using Rx signals. mmioRdValid indicates that C0 Hdr contains an MMIO read request. The c0_ReqMmioHdr structure provides a convenient mapping from flat bit-vector to MMIO read request fields – {address, length, tid}.
Then, the AFU drives a response over C2 using Tx signals. The C2 signal mmioRdValid indicates that the C2 Hdr and data fields contain the MMIO Read response. The c2_RspMmioHdr.tid field must match that provided in c0_ReqMmioHdr.tid; this is used to match the response against the request.
It is illegal to split an 8B MMIO Read request into two 4B MMIO Read responses.
The FIU makes an MMIO write request to the AFU over C0, using Rx signals. mmioWrValid indicates that the c0_ReqMmioHdr structure is an MMIO write request and contains the IO address to be written. The C0 data field contains the data to be written.
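The posted/non-posted rules above can be summarized in a short behavioral model. The following Python sketch is an illustration only, not AFU RTL; the class name `AfuMmioModel`, the dictionary-based register file, and the choice to return 0 for undefined registers are assumptions made here (the text only requires that a response be returned).

```python
class AfuMmioModel:
    """Software model of the AFU side of MMIO handling (illustrative only)."""

    def __init__(self):
        self.regs = {}  # MMIO address -> 64-bit register value

    def mmio_write(self, address, data):
        # Posted request: store the data and return no response.
        self.regs[address] = data & 0xFFFFFFFFFFFFFFFF
        return None

    def mmio_read(self, address, tid):
        # Non-posted request: always respond, echoing the request tid.
        # Returning 0 for an undefined register is an assumption of this sketch.
        data = self.regs.get(address, 0)
        return {"tid": tid, "data": data}


afu = AfuMmioModel()
afu.mmio_write(0x10, 0xABCD)
defined = afu.mmio_read(0x10, tid=5)       # register written above
undefined = afu.mmio_read(0x3FF0, tid=6)   # never written, still answered
```

Echoing the tid in every response is what lets the requester match responses to its outstanding reads.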
For generating 64B MMIO Writes to the AFU, refer to Section 11.3.1 in the Intel Software Developer’s Manual, Volume 3. The relevant excerpt is reproduced here:
11.3.1 Buffering of Write Combining Memory Locations
:
:
Once the processor has started to evict data from the WC buffer into system memory, it will make a bus-transaction style decision based on how much of the buffer contains valid data. If the buffer is full (for example, all bytes are valid), the processor will execute a burst-write transaction on the bus. This results in all 32 bytes (P6 family processors) or 64 bytes (Pentium 4 and more recent processor) being transmitted on the data bus in a single burst transaction. If one or more of the WC buffer’s bytes are invalid (for example, have not been written by software), the processor will transmit the data to memory using “partial write” transactions (one chunk at a time, where a “chunk” is 8 bytes).
This will result in a maximum of 4 partial write transactions (for P6 family processors) or 8 partial write transactions (for the Pentium 4 and more recent processors) for one WC buffer of data sent to memory.
2.6 CCI-P Tx Signals
Code 2: Tx interface structure inside ccip_if_pkg.sv
    typedef struct packed {
        t_if_ccip_c0_Tx c0;
        t_if_ccip_c1_Tx c1;
        t_if_ccip_c2_Tx c2;
    } t_if_ccip_Tx;
There are 3 Tx channels:
The C0 and C1 Tx channels are used for memory requests. They provide independent flow control. The C0 Tx channel is used for memory read requests; the C1 Tx channel is used for memory write requests.
The C2 Tx channel is used to return MMIO Read responses to the FIU. The CCI-P port guarantees to accept responses on C2; therefore it has no flow control.
Code 3: Tx channel structures inside ccip_if_pkg.sv
    // Channel 0 : Memory Reads
    typedef struct packed {
        t_ccip_c0_ReqMemHdr  hdr;          // Request Header
        logic                valid;        // Request Valid
    } t_if_ccip_c0_Tx;
    // corresponding AlmostFull inside t_if_ccip_Rx.c0TxAlmFull

    // Channel 1 : Memory Writes
    typedef struct packed {
        t_ccip_c1_ReqMemHdr  hdr;          // Request Header
        t_ccip_clData        data;         // Request Data
        logic                valid;        // Request Wr Valid
    } t_if_ccip_c1_Tx;
    // corresponding AlmostFull inside t_if_ccip_Rx.c1TxAlmFull

    // Channel 2 : MMIO Read response
    typedef struct packed {
        t_ccip_c2_RspMmioHdr hdr;          // Response Header
        logic                mmioRdValid;  // Response Read Valid
        t_ccip_mmioData      data;         // Response Data
    } t_if_ccip_c2_Tx;
Each Tx channel has a valid signal to qualify the corresponding hdr and data signals within the structure.
Table 5 describes the signals that make up the CCI-P Tx interface.
Table 5: Tx Channel Signal Description
Signal | Width | Direction | Description
pck_af2cp_sTx.c0.hdr | 74b | Output | Channel 0 request header. Refer to Table 6 Tx Header Field Definitions.
pck_af2cp_sTx.c0.valid | 1b | Output | When set to 1, it indicates channel 0 hdr is valid.
pck_cp2af_sRx.c0TxAlmFull | 1b | Input | When set to 1, Tx Channel 0 is almost full. After this signal is set, the AFU is allowed to send a maximum of 8 requests. When set to 0, the AFU can start sending requests immediately.
pck_af2cp_sTx.c1.hdr | 80b | Output | Channel 1 request header. Refer to Table 6 Tx Header Field Definitions.
pck_af2cp_sTx.c1.data | 512b | Output | Channel 1 data.
pck_af2cp_sTx.c1.valid | 1b | Output | When set to 1, it indicates channel 1 hdr and data are valid.
pck_cp2af_sRx.c1TxAlmFull | 1b | Input | When set to 1, Tx Channel 1 is almost full. After this signal is set, the AFU is allowed to send a maximum of 8 requests or data. When set to 0, the AFU can start sending requests immediately.
pck_af2cp_sTx.c2.hdr | 9b | Output | Channel 2 response header. Refer to Table 6 Tx Header Field Definitions.
pck_af2cp_sTx.c2.mmioRdValid | 1b | Output | When set to 1, it indicates Channel 2 hdr and data are valid.
pck_af2cp_sTx.c2.data | 64b | Output | MMIO Rd data bus, used to read AFU registers. For 32b reads, data must be driven on bits [31:0]. For 64b reads, the AFU must drive one 64b data response. The response cannot be split into two 32b responses.
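The almost-full rule in Table 5 (at most 8 further requests once c0TxAlmFull or c1TxAlmFull is set) can be modeled as a simple budget counter. The Python sketch below is a behavioral illustration under that reading of the table, not RTL; `TxThrottle` and `try_send` are hypothetical names.

```python
ALMOST_FULL_BUDGET = 8  # requests still allowed after almost-full is set (Table 5)


class TxThrottle:
    """Tracks how many requests an AFU may still issue on one Tx channel."""

    def __init__(self):
        self.sent_since_almfull = 0

    def try_send(self, alm_full):
        # With almost-full clear, the AFU can send immediately.
        if not alm_full:
            self.sent_since_almfull = 0
            return True
        # With almost-full set, at most 8 further requests are allowed.
        if self.sent_since_almfull < ALMOST_FULL_BUDGET:
            self.sent_since_almfull += 1
            return True
        return False


throttle = TxThrottle()
# Almost-full stays asserted for 10 cycles while the AFU wants to send every cycle.
grants = [throttle.try_send(alm_full=True) for _ in range(10)]
```

A real design would probably stop earlier than the full budget, since AFU inputs and outputs are registered and the almost-full indication is therefore seen with some delay.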
2.7 Tx Header Format
Table 6: Tx Header Field Definitions
Field | Description
mdata | Metadata: user-defined request ID that is returned unmodified from the request to the response hdr. For multi-CL writes on C1 Tx, mdata is only valid for the hdr when sop=1.
tid | Transaction ID: the AFU must return the tid from the MMIO Read request in the response hdr. It is used to match the response against the request.
vc_sel | Virtual channel selected: 2’h0 – VA; 2’h1 – VL0; 2’h2 – VH0; 2’h3 – VH1. All CLs that form a multi-CL write request are routed over the same VC.
req_type | Request types listed in Table 7.
sop | Start of packet for multi-CL memory write: 1’b1 – marks the first hdr (must write in increasing address order); 1’b0 – subsequent hdrs.
cl_len | Length for memory requests: 2’b00 – 64B; 2’b01 – 128B; 2’b11 – 256B.
address | 64B-aligned physical address, that is, byte_address>>6. The address must be self-aligned with respect to the cl_len field. For example, for cl_len=2’b01 the byte address must be divisible by 128B; similarly, for cl_len=2’b11 the byte address must be divisible by 256B.
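The address conversion and self-alignment rule in Table 6 amount to a little arithmetic, sketched here in Python. The helper names `cl_address` and `is_self_aligned` are hypothetical; the cl_len encodings are taken from the table.

```python
CL_BYTES = 64
CL_LEN_TO_LINES = {0b00: 1, 0b01: 2, 0b11: 4}  # cl_len encodings from Table 6


def cl_address(byte_address):
    """Convert a byte address to the 64B-aligned CL address (byte_address >> 6)."""
    if byte_address % CL_BYTES:
        raise ValueError("byte address is not 64B aligned")
    return byte_address >> 6


def is_self_aligned(byte_address, cl_len):
    """True if the request start address is a multiple of the request size."""
    lines = CL_LEN_TO_LINES[cl_len]
    return cl_address(byte_address) % lines == 0


first_line = cl_address(0x1000)  # byte address 0x1000 is CL address 0x40
```

For example, byte address 0x1080 (CL address 0x42) is acceptable for a 128B request but not for a 256B request.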
Table 7: Tx Request Encodings & Mapping to Header Formats
Request Type | Encoding | Data | Description | Hdr Format
t_if_ccip_c0_Tx: enum t_ccip_c0_req
eREQ_RDLINE_I | 4’h0 | No | Memory read request with no intention to cache. | C0 Memory Request Header. Refer to Table 8.
eREQ_RDLINE_S | 4’h1 | No | Memory read request with caching hint set to Shared. | C0 Memory Request Header. Refer to Table 8.
t_if_ccip_c1_Tx: enum t_ccip_c1_req
eREQ_WRLINE_I | 4’h0 | Yes | Memory write request with no intention of keeping the data in FPGA cache. | C1 Memory Request Hdr. Refer to Table 9.
eREQ_WRLINE_M | 4’h1 | Yes | Memory write request with caching hint set to Modified. | C1 Memory Request Hdr. Refer to Table 9.
eREQ_WRPush_I | 4’h2 | Yes | Memory write request with caching hint set to Invalid. The FIU writes the data into the processor’s last level cache (LLC) with no intention of keeping the data in FPGA cache. The LLC it writes to is always the LLC associated with the processor where the DRAM address is homed. | C1 Memory Request Hdr. Refer to Table 9.
eREQ_WRFENCE | 4’h4 | No | Memory write fence. This request doesn’t have a data payload. | Fence Hdr. Refer to Table 10.
t_if_ccip_c2_Tx: doesn’t have a request type field
MMIO Rd | N.A. | Yes | MMIO read response | MMIO Rd Response Hdr. Refer to Table 11.
All unused encodings are considered Reserved.
Table 8: C0 Read Memory Request Header Format
Structure: t_ccip_c0_ReqMemHdr
Bit # bits Field
[73:72] 2 vc_sel
[71:70] 2 RSVD
[69:68] 2 cl_len
[67:64] 4 req_type
[63:58] 6 RSVD
[57:16] 42 address
[15:0] 16 mdata
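The bit positions in Table 8 can be checked with a small packing routine. The Python sketch below treats the 74-bit header as an integer; it is a software illustration of the layout rather than the SystemVerilog structure itself, and `pack_c0_read_hdr` is a hypothetical helper name.

```python
def pack_c0_read_hdr(vc_sel, cl_len, req_type, address, mdata):
    """Pack fields into the 74-bit C0 read request header laid out in Table 8."""
    if not (0 <= vc_sel < 4 and 0 <= cl_len < 4 and 0 <= req_type < 16):
        raise ValueError("vc_sel, cl_len, or req_type out of range")
    if not (0 <= address < (1 << 42) and 0 <= mdata < (1 << 16)):
        raise ValueError("address or mdata out of range")
    return (
        (vc_sel << 72)      # [73:72]
        | (cl_len << 68)    # [69:68]; RSVD bits [71:70] stay 0
        | (req_type << 64)  # [67:64]
        | (address << 16)   # [57:16]; RSVD bits [63:58] stay 0
        | mdata             # [15:0]
    )


# One-CL read on VL0 (vc_sel=2'h1) with the Shared hint (eREQ_RDLINE_S = 4'h1).
hdr = pack_c0_read_hdr(vc_sel=0b01, cl_len=0b00, req_type=0x1,
                       address=0x40, mdata=0x0007)
```

Leaving the RSVD bits at 0 follows the signaling rule that reserved AFU output bits must be driven to 0.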
Table 9: C1 Write Memory Request Header Format
Structure: t_ccip_c1_ReqMemHdr
Bit | # bits | Field, SOP=1 | Field, SOP=0
[79:74] | 6 | RSVD | RSVD
[73:72] | 2 | vc_sel | RSVD-DNC
[71] | 1 | sop=1 | sop=0
[70] | 1 | RSVD | RSVD
[69:68] | 2 | cl_len | RSVD-DNC
[67:64] | 4 | req_type | req_type
[63:58] | 6 | RSVD | RSVD
[57:18] | 40 | address | RSVD-DNC
[17:16] | 2 | address | address
[15:0] | 16 | mdata | RSVD-DNC
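Table 9 distinguishes the first header of a write (SOP=1) from the following ones (SOP=0), where most fields become don't-care. The Python sketch below packs both variants as integers to make the difference concrete. It is an illustration, not the SystemVerilog structure; `pack_c1_write_hdr` is a hypothetical name, and driving RSVD-DNC fields to 0 is simply one permitted choice.

```python
def pack_c1_write_hdr(sop, req_type, address, vc_sel=0, cl_len=0, mdata=0):
    """Pack an 80-bit C1 write request header following Table 9."""
    hdr = (sop << 71) | (req_type << 64)   # [71] sop, [67:64] req_type
    if sop:
        # First header: every field is meaningful.
        hdr |= (vc_sel << 72) | (cl_len << 68) | (address << 16) | mdata
    else:
        # Subsequent headers: only address bits [17:16] are used;
        # the RSVD-DNC fields are left at 0 in this sketch.
        hdr |= (address & 0x3) << 16
    return hdr


# Two-CL write (cl_len=2'b01) on VH0 (vc_sel=2'h2) starting at CL address 0x100.
first = pack_c1_write_hdr(sop=1, req_type=0x0, address=0x100,
                          vc_sel=0b10, cl_len=0b01, mdata=0x12)
second = pack_c1_write_hdr(sop=0, req_type=0x0, address=0x101)
```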
Table 10: C1 Fence Header Format
Structure: t_ccip_c1_ReqFenceHdr
Bit # bits Field
[79:74] 6 RSVD
[73:72] 2 vc_sel
[71:68] 4 RSVD
[67:64] 4 req_type
[63:16] 48 RSVD
[15:0] 16 mdata
Table 11: C2 MMIO Response Header Format
Bit # bits Field
[8:0] 9 tid
2.8 CCI-P Rx Signals
Code 4: Rx interface structure inside ccip_if_pkg.sv
    typedef struct packed {
        logic           c0TxAlmFull;  // C0 Request Channel Almost Full
        logic           c1TxAlmFull;  // C1 Request Channel Almost Full
        t_if_ccip_c0_Rx c0;
        t_if_ccip_c1_Rx c1;
    } t_if_ccip_Rx;
There are 2 Rx channels.
Channel 0 interleaves memory responses, MMIO requests and UMsgs.
Channel 1 returns responses for AFU requests initiated on Tx Channel 1.
The c0TxAlmFull and c1TxAlmFull signals are inputs to the AFU. Although they are declared with the Rx signals structure, they logically belong to the Tx interface and so were described in the previous section.
Rx Channels have no flow control. The AFU must accept responses for memory requests it generated. The AFU must pre-allocate buffers before generating a memory request. The AFU must also accept MMIO requests.
Code 5: Rx channel structure inside ccip_if_pkg.sv
    // Channel 0: Memory Read response, MMIO Request
    typedef struct packed {
        t_ccip_c0_RspMemHdr hdr;          // Rd Response / MMIO / UMsg req Header
        t_ccip_clData       data;         // Rd Data / MMIO / UMsg req Data
        logic               rspValid;     // Rd Response / UMsg Valid
        logic               mmioRdValid;  // MMIO Read Valid
        logic               mmioWrValid;  // MMIO Write Valid
    } t_if_ccip_c0_Rx;

    // Channel 1: Memory Writes
    typedef struct packed {
        t_ccip_c1_RspMemHdr hdr;          // Response Header
        logic               rspValid;     // Response Valid
    } t_if_ccip_c1_Rx;
Rx Channel 0 has separate valid signals for memory responses and MMIO requests. Only one of those valid signals can be set in a cycle. MMIO requests further have two valid signals, one for MMIO Rd and the other for MMIO Wr. When either is set, the hdr must be interpreted as an MMIO hdr instead of a memory response header.
Table 12: Rx Channel Signal Description
Signal | Width | Direction | Description
pck_cp2af_sRx.c0.hdr | 28b | Input | Channel 0 response header or MMIO request header. Refer to Table 13 Rx Header Field Definitions.
pck_cp2af_sRx.c0.data | 512b | Input | Channel 0 data bus. Memory Read Response and UMsg: returns 64B data. MMIO Write Request: for a 32b write, data is driven on bits [31:0]; for a 64b write, data is driven on bits [63:0].
pck_cp2af_sRx.c0.rspValid | 1b | Input | When set to 1, it indicates hdr and data on Channel 0 are valid. The hdr must be interpreted as a memory response; decode the resp_type field.
pck_cp2af_sRx.c0.mmioRdValid | 1b | Input | When set to 1, it indicates an MMIO Rd request on Channel 0.
pck_cp2af_sRx.c0.mmioWrValid | 1b | Input | When set to 1, it indicates an MMIO Wr request on Channel 0.
pck_cp2af_sRx.c1.hdr | 28b | Input | Channel 1 response header. Refer to Table 13 Rx Header Field Definitions.
pck_cp2af_sRx.c1.rspValid | 1b | Input | When set to 1, it indicates hdr on Channel 1 is a valid response.
2.8.1 Rx Header and RxData Format
Table 13: Rx Header Field Definitions
Field | Description
mdata | Metadata: user-defined request ID, returned unmodified from memory request to response header. For a multi-CL memory response, the same mdata is returned for each CL.
vc_used | Virtual channel used: when using VA, this field identifies the virtual channel selected for the request by the FIU. For other VCs it returns the request VC.
format | When using multi-CL memory write requests, the FIU may return a single response for the entire payload or a response per CL in the payload. 1’b0 – Unpacked write response: returns a response per CL. Look up the cl_num field to identify the cache line. 1’b1 – Packed write response: returns a single response for the entire payload. The cl_num field gives the payload size, that is, 1 CL, 2 CLs, or 4 CLs.
cl_num | format=0: for a response with >1CL data payload, this field identifies the cl_num: 2’h0 – 1st CL (lowest address); 2’h1 – 2nd CL; 2’h3 – 4th CL (highest address). Responses may be returned out of order. format=1: this field identifies the data payload size: 2’h0 – 1 CL or 64B; 2’h1 – 2 CL or 128B; 2’h3 – 4 CL or 256B.
hit_miss | Cache hit/miss status. The AFU can use this to generate fine-grained hit/miss statistics for various modules. 1’h0 – Cache Miss; 1’h1 – Cache Hit.
MMIO Length | Length for MMIO requests: 2’h0 – 4B; 2’h1 – 8B.
MMIO Address | DWord-aligned MMIO address offset, that is, byte address>>2.
UMsg ID | Identifies the CL corresponding to the UMsg.
UMsg Type | Two types of UMsg are supported: 1’b1 – UMsgH (Hint) without data; 1’b0 – UMsg with data.
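Because every CL of a multi-CL read response carries the same mdata and responses may return out of order, an AFU typically has to regroup the data using mdata and cl_num. The Python sketch below shows one way to model that bookkeeping. It is illustrative only; the `ReadCollector` class is hypothetical, and it assumes cl_num counts the cache lines of a request upward from 0 at the lowest address.

```python
class ReadCollector:
    """Reassembles multi-CL read data from responses that arrive in any order."""

    def __init__(self):
        # mdata -> {"lines": expected CL count, "data": {cl_num: payload}}
        self.pending = {}

    def issue(self, mdata, lines):
        # Record an outstanding read request before any response can arrive.
        self.pending[mdata] = {"lines": lines, "data": {}}

    def on_response(self, mdata, cl_num, payload):
        # Returns the payloads in address order once every CL has arrived,
        # otherwise None.
        entry = self.pending[mdata]
        entry["data"][cl_num] = payload
        if len(entry["data"]) < entry["lines"]:
            return None
        del self.pending[mdata]
        return [entry["data"][n] for n in sorted(entry["data"])]


collector = ReadCollector()
collector.issue(mdata=9, lines=2)
partial = collector.on_response(9, cl_num=1, payload=b"hi")   # second CL first
complete = collector.on_response(9, cl_num=0, payload=b"lo")  # then the first CL
```

Recording the request before issuing it mirrors the requirement that the AFU pre-allocate buffers, since Rx channels have no flow control.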
Table 14: AFU Rx Response Encodings and Channels Mapping
Response Type | Encoding | Data Payload | Header Format
t_if_ccip_c0_Rx: enum t_ccip_c0_rsp
eRSP_RDLINE | 4’h0 | Yes | Memory Response Header. Refer to Table 15. Qualified with c0.rspValid.
MMIO Read | N.A. | No | MMIO Request Header. Refer to Table 16.
MMIO Write | N.A. | Yes | MMIO Request Header. Refer to Table 16.
eRSP_UMSG | 4’h4 | Yes/No | UMsg Response Header. Refer to Table 18. Qualified with c0.rspValid.
t_if_ccip_c1_Rx: enum t_ccip_c1_rsp
eRSP_WRLINE | 4’h0 | No | Memory Response Header. Refer to Table 17.
eRSP_WRFENCE | 4’h4 | No | WrFence Response Header. Refer to Table 19.
Table 15: C0 Memory Read Response Header Format
Structure: t_ccip_c0_RspMemHdr
Bit # bits Field
[27:26] 2 vc_used
[25] 1 RSVD
[24] 1 hit_miss
[23:22] 2 RSVD
[21:20] 2 cl_num
[19:16] 4 resp_type
[15:0] 16 mdata
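As a cross-check of the bit positions in Table 15, the following Python sketch splits a 28-bit C0 read response header, held as an integer, into its fields. The function name `decode_c0_rsp_hdr` is hypothetical, and the example header value is made up for illustration.

```python
def decode_c0_rsp_hdr(hdr):
    """Split a 28-bit C0 memory read response header into the Table 15 fields."""
    return {
        "vc_used":   (hdr >> 26) & 0x3,   # [27:26]
        "hit_miss":  (hdr >> 24) & 0x1,   # [24]
        "cl_num":    (hdr >> 20) & 0x3,   # [21:20]
        "resp_type": (hdr >> 16) & 0xF,   # [19:16]
        "mdata":     hdr & 0xFFFF,        # [15:0]
    }


# Example: response on VL0 (2'h1), cache hit, second CL, eRSP_RDLINE, mdata 0xBEEF.
example_hdr = (0b01 << 26) | (1 << 24) | (0b01 << 20) | (0x0 << 16) | 0xBEEF
fields = decode_c0_rsp_hdr(example_hdr)
```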
Table 16: MMIO Request Header Format
Bit # bits Field
[27:12] 16 address
[11:10] 2 length
[9] 1 RSVD
[8:0] 9 TID
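The MMIO request header in Table 16 can be decoded the same way, and its 16-bit DWord address accounts for the 256KiB MMIO address space stated earlier. The Python sketch below is illustrative; `decode_mmio_req_hdr` is a hypothetical name, and only the two length encodings listed in Table 13 are handled.

```python
MMIO_LENGTH_BYTES = {0: 4, 1: 8}  # length encodings listed in Table 13


def decode_mmio_req_hdr(hdr):
    """Split a 28-bit MMIO request header into the Table 16 fields."""
    dword_address = (hdr >> 12) & 0xFFFF           # [27:12]
    return {
        "byte_address": dword_address << 2,        # address is DWord aligned
        "length_bytes": MMIO_LENGTH_BYTES[(hdr >> 10) & 0x3],  # [11:10]
        "tid": hdr & 0x1FF,                        # [8:0]
    }


# 16 address bits of 4-byte DWords cover the whole MMIO space.
MMIO_SPACE_BYTES = (1 << 16) * 4

# Example: 8B access to byte address 0x10 with tid 0x1A.
request = decode_mmio_req_hdr((0x0004 << 12) | (1 << 10) | 0x1A)
```

Other length encodings raise a KeyError in this sketch, because their meaning is not given in the tables above.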
Table 17: C1 Memory Write Response Header Format Structure: t_ccip_c1_RspMemHdr
Bit # bits Field
[27:26] 2 vc_used
[25] 1 RSVD
[24] 1 hit_miss
[23] 1 format
[22] 1 RSVD
[21:20] 2 cl_num
[19:16] 4 resp_type
[15:0] 16 mdata
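The format and cl_num fields of Table 17 determine how many cache lines a single write response acknowledges. The Python sketch below models this; it is an illustration only. The names are hypothetical, each outstanding write is assumed to use a unique mdata tag, and the caller is assumed to pass only eRSP_WRLINE headers.

```python
PACKED_PAYLOAD_CLS = {0b00: 1, 0b01: 2, 0b11: 4}  # cl_num encoding when format = 1


def decode_c1_rsp_mem_hdr(hdr: int) -> dict:
    """Split a 28-bit header value into the Table 17 fields."""
    if not 0 <= hdr < (1 << 28):
        raise ValueError("header must fit in 28 bits")
    return {
        "vc_used": (hdr >> 26) & 0x3,
        "hit_miss": (hdr >> 24) & 0x1,
        "format": (hdr >> 23) & 0x1,
        "cl_num": (hdr >> 20) & 0x3,
        "resp_type": (hdr >> 16) & 0xF,
        "mdata": hdr & 0xFFFF,
    }


def cls_acknowledged(fields: dict) -> int:
    """Number of cache lines covered by one write response."""
    if fields["format"] == 1:  # packed: cl_num carries the payload size
        if fields["cl_num"] not in PACKED_PAYLOAD_CLS:
            raise ValueError("cl_num is not a listed payload size")
        return PACKED_PAYLOAD_CLS[fields["cl_num"]]
    return 1                   # unpacked: one response per CL


class WriteTracker:
    """Counts acknowledged CLs per mdata tag until the whole write has completed."""

    def __init__(self):
        self._remaining = {}

    def issue(self, mdata: int, num_cls: int) -> None:
        self._remaining[mdata] = num_cls

    def on_response(self, hdr: int) -> bool:
        """Return True when the write tagged with this mdata is fully acknowledged."""
        fields = decode_c1_rsp_mem_hdr(hdr)
        left = self._remaining[fields["mdata"]] - cls_acknowledged(fields)
        if left < 0:
            raise ValueError("more CLs acknowledged than were issued")
        if left == 0:
            del self._remaining[fields["mdata"]]
            return True
        self._remaining[fields["mdata"]] = left
        return False
```

Counting acknowledged CLs rather than responses lets the same tracker handle both packed and unpacked responses without knowing in advance which form the FIU returns.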
Table 18: UMsg Header Format
Bit # bits Field
[27:20] 8 RSVD
[19:16] 4 resp_type
[15] 1 UMsg Type
[14:3] 12 RSVD
[2:0] 3 UMsg ID
Table 19: WrFence Header Format Structure: t_ccip_c1_RspFenceHdr
Bit # bits Field
[27:20] 8 RSVD
[19:16] 4 resp_type
[15:0] 16 mdata
2.9 Multi-Cacheline Memory Requests
To achieve highest link efficiency, pack the memory requests into large transfer sizes. Use the multi-CL requests for this. Listed below are the characteristics of multi-CL memory requests:
VH0, VH1 and VA attain highest memory BW when using a data payload of 4CLs.
Memory write request should always begin with the lowest address first. SOP=1 in the c1_ReqMemHdr marks the first CL. All subsequent headers in the multi-CL request must drive the corresponding CL address.
An N CL memory write request takes N cycles on Channel 1. It is legal to have bubbles between the cycles that form a multi-CL request, but it cannot be interleaved with another request. It is illegal to start a new request without completing the entire data payload for a multi-CL write request.
FIU guarantees to complete the multi-CL VA requests on a single VC.
The memory request address must be self-aligned. A 2CL request should start on a 2CL boundary. Its CL address must be divisible by 2. A 4CL request should be aligned on a 4CL boundary. Its CL address must be divisible by 4.
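The alignment and header-sequencing rules above can be sketched as a small Python model. This is an illustration under assumptions: the function names are hypothetical, the cl_len encoding (2'b00, 2'b01, 2'b11 for 1, 2, and 4 CLs) is assumed to mirror the payload-size encoding used elsewhere in this specification, and repeating cl_len and mdata on every beat is a modeling choice rather than a stated requirement.

```python
CL_LEN_ENCODING = {1: 0b00, 2: 0b01, 4: 0b11}  # assumed cl_len encoding


def is_self_aligned(cl_addr: int, num_cls: int) -> bool:
    """A 2CL request starts on a 2CL boundary, a 4CL request on a 4CL boundary."""
    return num_cls in CL_LEN_ENCODING and cl_addr % num_cls == 0


def multi_cl_write_headers(cl_addr: int, num_cls: int, mdata: int) -> list:
    """Return one header per beat: lowest address first, sop=1 only on the first CL."""
    if not is_self_aligned(cl_addr, num_cls):
        raise ValueError("request is not 1, 2, or 4 CLs, or is not self-aligned")
    return [
        {
            "sop": 1 if beat == 0 else 0,
            "cl_len": CL_LEN_ENCODING[num_cls],
            "address": cl_addr + beat,  # each header drives its own CL address
            "mdata": mdata,
        }
        for beat in range(num_cls)
    ]


for header in multi_cl_write_headers(0x4100, 4, 0x10):
    print(header)
```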
Figure 5 is an example of a multi-CL Memory Write request.
[Waveform omitted. Signals shown: pClk, pck_af2cp_sTx.c1.valid, pck_af2cp_sTx.c1.data (D0 to D8), and the header fields pck_af2cp_sTx.c1.hdr.sop, cl_len, addr[41:2], addr[1:0], req_type, vc_sel, and mdata. The waveform carries four write requests with cl_len = ‘h3, ‘h1, ‘h0, ‘h1; addr[41:2] = ‘h1040, ‘h1041, ‘h1043, ‘h1044; req_type = WrLine_I, WrLine_M, WrLine_M, WrLine_I; vc_sel = VA, VH0, VL0, VH1; and mdata = ‘h10, ‘h11, ‘h12, ‘h13.]
Figure 5: Multi-CL Memory Write Requests
Figure 6 is an example of Memory Write Response cycles. For an unpacked response, the individual CLs may return out of order.
[Waveform omitted. Signals shown: pClk, pck_cp2af_sRx.c1.valid, and the header fields pck_cp2af_sRx.c1.hdr.hit_miss, cl_num, resp_type (WrLine), vc_used, mdata, and format.]
Figure 6: Multi-CL Memory Write Responses
Figure 7 is an example of a Memory Read Response cycle. The read response can be re-ordered within itself; that is, there is no guaranteed ordering between individual CLs of a multi-CL Read. All CLs within a multi-CL response have the same mdata and the same vc_used. Individual CLs of a multi-CL Read are identified using the cl_num field.
[Waveform omitted. Signals shown: pClk, pck_cp2af_sRx.c0.valid, and the header fields pck_cp2af_sRx.c0.hdr.hit_miss, cl_num, resp_type (RdLine), vc_used, and mdata.]
Figure 7: Multi-CL Memory Read Responses
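Because the CLs of a multi-CL read may return in any order, an AFU typically needs to reassemble them using mdata and cl_num. The following Python sketch models one way to do this. The class and method names are hypothetical, and each outstanding read is assumed to use a unique mdata tag.

```python
class ReadReassembler:
    """Collects the CLs of multi-CL reads and releases them in address order."""

    def __init__(self):
        self._pending = {}  # mdata -> {"num_cls": n, "lines": {cl_num: data}}

    def expect(self, mdata: int, num_cls: int) -> None:
        """Record an issued read of num_cls cache lines tagged with mdata."""
        if mdata in self._pending:
            raise ValueError("mdata tag is already in use")
        self._pending[mdata] = {"num_cls": num_cls, "lines": {}}

    def on_response(self, mdata: int, cl_num: int, data):
        """Store one CL; return the full payload (lowest address first) when complete."""
        entry = self._pending[mdata]
        entry["lines"][cl_num] = data
        if len(entry["lines"]) < entry["num_cls"]:
            return None
        del self._pending[mdata]
        return [entry["lines"][i] for i in range(entry["num_cls"])]


reassembler = ReadReassembler()
reassembler.expect(mdata=0x10, num_cls=4)
for cl_num, data in [(1, "CL1"), (0, "CL0"), (3, "CL3"), (2, "CL2")]:
    payload = reassembler.on_response(0x10, cl_num, data)
print(payload)  # the payload is only available after the last CL arrives
```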
2.10 Additional Control Signals
Unless otherwise mentioned, all signals are active high.
Table 20: Clock and Reset
Signal Width Direction Description
pck_cp2af_softReset 1b Input Synchronous ACTIVE HIGH soft reset.
When set to 1, AFU must reset all logic. Minimum Reset pulse width is 256 pClk cycles. All outstanding CCI-P requests will be flushed before de-asserting soft reset.
A soft reset will not reset the FIU.
pClk 1b Input Primary interface clock. All CCI-P interface signals are synchronous to this clock. Clock frequency is listed in Section 2.13.
pClkDiv2 1b Input Synchronous and in phase with pClk. 0.5x clock frequency.
pClkDiv4 1b Input Synchronous and in phase with pClk. 0.25x clock frequency.
uClk_usr 1b Input The user defined clock is not synchronous with the pClk.
AFU must synchronize the signals to pClk domain before driving the CCI-P interface.
Default frequency is 312.5 MHz.
Quartus partial reconfiguration flow does not allow PLLs to be instantiated in the reconfigurable region (that is, the AFU). The AFU load utility will program the user defined clock frequency before de-asserting pck_cp2af_softReset.
uClk_usrDiv2 1b Input Synchronous with uClk_usr and 0.5x the frequency.
pck_cp2af_pwrState 2b Input Indicates the current AFU power state request. In response, the AFU must attempt to reduce its power consumption. If sufficient power reduction is not achieved, the AFU may be reset.
2’h0 – AP0 - Normal operation mode
2’h1 – AP1 - Request for 50% power reduction
2’h2 – Reserved, illegal
2’h3 – AP2 - Request for 90% power reduction
When pck_cp2af_pwrState is set to AP1, the FIU starts throttling the memory request path to achieve a 50% throughput reduction. The AFU is also expected to reduce its power utilization by 50% by throttling back accesses to FPGA internal memory resources and its compute engines. Similarly, upon transition to AP2, the FIU throttles the memory request paths to achieve a 90% throughput reduction relative to the normal state, and the AFU in turn is expected to reduce its power utilization by 90%.
pck_cp2af_error 1b Input A CCI-P protocol error has been detected and logged in the PORT Error register. This register is visible to the AFU.
It can be used as a trigger for signal taps.
When such an error is detected, the CCI-P interface stops accepting new requests and sets AlmFull to 1.
There is no expectation to complete outstanding requests.
The AFU is not reset.
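The pck_cp2af_pwrState encoding described above can be captured in a lookup table. The Python sketch below is illustrative only: the function names are hypothetical, and scaling a per-window request budget by the requested reduction is one assumed way for an AFU to throttle itself, not a mechanism defined by this specification.

```python
# Encoding of pck_cp2af_pwrState: value -> (state name, requested reduction in percent)
POWER_STATES = {0b00: ("AP0", 0), 0b01: ("AP1", 50), 0b11: ("AP2", 90)}


def decode_pwr_state(bits: int):
    """Return (name, reduction percent); 2'h2 is reserved and treated as an error."""
    if bits not in POWER_STATES:
        raise ValueError("reserved or out-of-range power state encoding")
    return POWER_STATES[bits]


def throttled_request_budget(normal_requests_per_window: int, bits: int) -> int:
    """Scale a hypothetical per-window request budget by the requested reduction."""
    _name, reduction = decode_pwr_state(bits)
    return normal_requests_per_window * (100 - reduction) // 100


print(decode_pwr_state(0b01), throttled_request_budget(200, 0b01))
```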
Protocol Flow
2.10.1 Upstream Requests
Table 21: Protocol Flow for upstream requests from AFU to FIU
Type Tx Request Tx Data Rx Response Rx Data
Memory Write WrLine_I, WrLine_M, WrPush_I Yes WrLine No
Memory Read RdLine_I, RdLine_S No RdLine Yes
Special Messages WrFence No WrFence No
Tx Data (column 3): identifies whether the request expects a Tx data payload.
Rx Data (column 5): identifies whether the response returns a data payload.
Table 22: CCI-P VL0 protocol flows
Columns: CCI-P Request; FPGA Cache Hit/Miss; FPGA Cache State; then Phase 1, Phase 2, and Phase 3, each with UPI Cycle, FPGA Cache Next State, and CCI-P Response.
WrLine_I Hit M None M WrLine WbMtoI I
Hit S InvItoE
Miss S, I
WrLine_M Hit M None M WrLine N.A.
Hit S InvItoE
Miss S, I
WrLine_I Miss M WbMtoI I InvItoE M WrLine WbMtoI I
WrLine_M
WrPush_I WbPushMtoI I
WrPush_I Hit M None M WrLine WbPushMtoI I
Hit S, I InvItoE
Miss S, I
RdLine_S Hit S, M None No Change RdLine N.A.
Miss S, I RdCode S RdLine
RdLine_I Hit S, M None No Change RdLine N.A.
Miss S, I RdCur I RdLine
RdLine_I Miss M WbMtoI I RdCur I RdLine
RdLine_S RdCode S
WrLine_I Requires special handling, because it must first write to the CL and then evict it from the cache. The eviction forms Phase 2 of the request.
RdLine_I Recommended as the default read type.
RdLine_S Use sparingly, only for cases where you have identified highly referenced CLs.
RdCode Updates the CPU directory and lets the FPGA cache the line in Shared state.
RdCur Does NOT update the CPU directory; the FPGA will not cache this line. A future access to this line from the CPU will not snoop the FPGA.
2.10.2 Downstream Requests
Table 23: Protocol Flow for Downstream Requests from CPU to AFU
Rx Request Rx Data Tx Response Tx Data
MMIO Read No MMIO Read Data Yes
MMIO Write Yes None N.A.
UMsg Yes None N.A.
UMsgH No None N.A.
2.11 Ordering Rules
2.11.1 Memory Requests
The CCI-P memory consistency model is different from the PCIe consistency model. CCI-P implements a “relaxed” memory consistency model.
It relaxes ordering requirements for requests to:
Same address
Different addresses
Table 24 below defines the ordering relationship between two memory requests on CCI-P. The same rules apply for requests to the “same” address or “different” addresses. The table entries are defined as follows:
Yes The second (row) request is allowed to pass the first (column) request.
No The second (row) request is not allowed to pass the first (column) request.
Table 24 Ordering rules for upstream requests from AFU
Row bypass column? (col 2) Read (col 3) Write (col 4) WrFence
(row 2) Read Yes Yes Yes
(row 3) Write Yes Yes No
(row 4) WrFence Yes No No
Interpret Table 24 as follows.
1. The Read (2nd row) can bypass an earlier Read (2nd column), a write (3rd column), and a WrFence (4th column).
2. The Write (3rd row) can bypass an earlier Read (2nd column) and a write (3rd column). It cannot bypass a WrFence (4th column).
3. The WrFence (4th row) can bypass an earlier Read (2nd column). It cannot bypass a Write (3rd column) and a WrFence (4th column).
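Table 24 can also be written as a lookup keyed by (later request, earlier request). The Python sketch below is a direct transcription of the table. The dictionary and function names are hypothetical.

```python
# CAN_BYPASS[(later, earlier)] is True when the later (row) request
# is allowed to pass the earlier (column) request, per Table 24.
CAN_BYPASS = {
    ("Read", "Read"): True,
    ("Read", "Write"): True,
    ("Read", "WrFence"): True,
    ("Write", "Read"): True,
    ("Write", "Write"): True,
    ("Write", "WrFence"): False,
    ("WrFence", "Read"): True,
    ("WrFence", "Write"): False,
    ("WrFence", "WrFence"): False,
}


def can_bypass(later: str, earlier: str) -> bool:
    """Return True if `later` may be re-ordered ahead of `earlier`."""
    return CAN_BYPASS[(later, earlier)]


for later in ("Read", "Write", "WrFence"):
    print(later, [can_bypass(later, earlier) for earlier in ("Read", "Write", "WrFence")])
```

A table like this is convenient in a reference model or scoreboard that needs to decide whether an observed re-ordering is legal.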
Intra-VC Write observability Upon receiving a memory write response, the write has reached a local observability point.
- All future reads from the AFU to the same VC will get the new data.
- All future writes on the same VC will replace the data.
Inter-VC Write observability A memory write response does NOT mean the data are globally observable across the VCs. A subsequent read on a different VC may return old data. Use a WrFence VA to synchronize across VCs. A WrFence VA does a broadcast.
- It goes beyond waiting for write responses. It pushes all earlier writes to the global observability point.
- Upon receiving a WrFence response, all future reads from the AFU get the new data.
2.11.1.1 Write Fence usage
To enforce ordering between memory writes, use a WrFence. Because using a WrFence is an expensive operation, restrict its use to synchronization points.
WrFence guarantees that all writes preceding the fence are committed to memory before any writes following the Write Fence are processed.
A WrFence will not be re-ordered with other memory writes or WrFence requests.
WrFence provides no ordering assurances with respect to Read requests.
A WrFence does NOT block the reads. In other words, memory reads can bypass a WrFence. This is shown as item 1 in the interpretation of Table 24.
The WrFence request has a vc_sel field. This allows you to select which of the virtual channels the WrFence is applied to. For example, if you move the data block on VL0, you only need to serialize with respect to other write requests on VL0; that is, you must use WrFence with VL0. Similarly, if you use memory writes with VA, then use WrFence with VA.
A WrFence request returns a response. The response is delivered to the AFU over C1 and identified by the resp_type field. Recall that a read can bypass a WrFence. However, if you want to ensure that you read the latest written data, you can issue a WrFence and then wait for the WrFence response before driving a read.
2.11.1.2 Memory Consistency Explained
CCI-P can re-order requests to the same and different addresses. It does not implement logic to identify data hazards for requests to same address.
2.11.1.2.1 Two Writes on Different VCs
Example 1: Figure 8 shows that two writes on different VCs may be committed to memory in a different order.
AFU: VH1: Write1 X, Data=A; VL0: Write2 X, Data=B
Processor: Read1 X, Data=B; Read2 X, Data=A
Figure 8: Write Out of Order Commit
AFU writes to X twice, Data=A over VH1 and Data=B over VL0. The processor polls on X and may see updates to X in reverse order; that is, the CPU may see Data=B, followed by Data=A. In summary, the write order seen by the processor may be different from the order in which AFU completed the writes.
To enforce write ordering, the AFU must explicitly identify the ordering boundary and add a WrFence between the Writes.
Example 2: Figure 9 shows the use of WrFence to enforce Write ordering.
AFU: VH1: Write1 X, Data=A; VA: WrFence; VL0: Write2 X, Data=B
Processor: Read1 X, Data=A; Read2 X, Data=B
Figure 9: Use WrFence to Enforce Write Ordering
This time AFU adds a VA WrFence between the two writes. The WrFence ensures that the processor sees the writes before the WrFence followed by the writes after the WrFence. Hence, the processor sees Data=A and then Data=B. VA WrFence was used here, because the Writes to be serialized were sent on different VCs.
2.11.1.2.2 Two Writes on the Same VC
Memory may see two writes to the same VC in a different order from their execution, unless the second write request was generated after the first write response was received.
Example 1: Figure 10 shows two writes on the same VC when the second write is issued after the first write response is received.
AFU: VH1: Write1 X, Data=A; Resp 1; VH1: Write2 X, Data=B; Resp 2
Processor: Read1 X, Data=A; Read2 X, Data=B
Figure 10: Two Writes on Same VC, Only One Outstanding
The AFU writes to X twice on the same VC, but it only sends the second write after the first write response is received. This ensures that the first write was sent out on the link before the next one goes out. CCI-P guarantees that the Processor sees these writes in the right order: Data A, followed by Data B.
You may also use a WrFence instead to enforce ordering between writes to the same VC. Note, however, that WrFence has stronger semantics: it stalls processing of all writes after the fence until all previous writes have completed.
2.11.1.2.3 Two Reads on Different VCs
Two reads on different VCs may complete out of order; the last read response may return old data.
Example 1: Figure 11 shows how reads from the same address over different VCs may result in re-ordering.
Processor: Store X=1; Store X=2
AFU requests: VH1: Read1 X; VL0: Read2 X
AFU responses: VL0: Resp2 X, Data=2; VH1: Resp1 X, Data=1
Figure 11: Read Re-Ordering to Same Address, Different VCs
The Processor writes X=1 and then X=2. The AFU reads X twice over different VCs; in Figure 11, Read1 is sent on VH1 and Read2 on VL0. CCI-P may re-order the responses and return data out of order. The AFU may see X=2, followed by X=1. This is different from the Processor write order.
2.11.1.2.4 Two Reads on the Same VC
Reads to the same VC may complete out of order; the last read response will always return the “new” data.
Note, however, that VA reads behave like two reads on different VCs.
Example 1: Figure 12 shows how reads from the same address over the same VC may result in re-ordering. However, the AFU sees updates in the same order in which they were written.
Processor: Store X=1; Store X=2
AFU requests: VL0: Read1 X; VL0: Read2 X
AFU responses: VL0: Resp2 X, Data=1; VL0: Resp1 X, Data=2
Figure 12: Read Re-Ordering to Same Address, Same VC
The Processor writes X=1 and then X=2. The AFU reads X twice over the same VC; in Figure 12, both Read1 and Read2 are sent on VL0. CCI-P may still re-order the Read responses, but it guarantees to return the newest data last; that is, the AFU will see updates to X in the order in which the Processor writes to it.
When using VA, CCI-P may return data out of order, because VA request may get directed on VL0, VH0 or VH1.
2.11.1.2.5 Read-after-Write on Same VC
CCI-P does not order read and write requests to even the same address. The AFU must explicitly resolve such dependencies. To do this, the AFU has two requirements:
1. The AFU must use the same VC for write and read requests. Do not use VA.
2. The AFU must send the read request only after the write response is received.
2.11.1.2.6 Read-after-Write on Different VCs
The AFU cannot resolve a read-after-write dependency when different VCs are used.
2.11.1.2.7 Write-after-Read on Same or Different VCs
CCI-P does not order write after read requests even when they are to the same address. The AFU must explicitly resolve such dependencies. The AFU must send the write request only after read response is received.
2.11.1.2.8 Some example scenarios:
1. More than one outstanding read/write requests to an address results in non-deterministic behavior.
Example 1 Two writes to same address X can be completed out of order. The final value at address X is non-deterministic. To enforce ordering add a WrFence between the write requests.
Example 2 Two reads from same address X, may be completed out of order. This is not a data hazard, but an AFU developer should make no ordering assumptions.
Example 3 A write to address X followed by a read from address X. The result is non-deterministic; that is, the read will return either the new data (data after the write) or the old data (data before the write) at address X.
Example 4 A read from address X followed by a write to address X. The result is non-deterministic; that is, the read will return either the new data (data after the write) or the old data (data before the write) at address X.
Use the read responses to resolve read dependencies.
Use a write Fence to implement a write memory barrier.
2. Read/write requests to different addresses may be completed out of order.
Example 1 AFU writes the data to address Z and then wants to notify the SW thread by updating a value of flag at address X.
To implement this, the AFU must use a write fence between write to Z and write to X. The write fence will ensure that Z is globally visible before write to X is processed.
Example 2 AFU reads data starting from address Z and then wants to notify the SW thread by updating the value of flag at address X.
To implement this, the AFU must perform the read from Z, wait for the read response and then perform the write to X.
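The data-then-flag notification pattern of Example 1 (writes to Z, a write fence, then the write to X) can be sketched as a request sequence. The Python model below is illustrative. The function names and tuple layout are hypothetical, WrLine_M is used only because it is one of the write request types defined in this specification, and the checker ignores which VC the fence is issued on.

```python
def notify_sequence(data_cl_addrs, flag_cl_addr, vc="VA"):
    """Build the request order: data writes, then a WrFence, then the flag write."""
    requests = [("WrLine_M", addr, vc) for addr in data_cl_addrs]
    requests.append(("WrFence", None, vc))
    requests.append(("WrLine_M", flag_cl_addr, vc))
    return requests


def flag_is_fenced(requests, flag_cl_addr) -> bool:
    """True if no data write sits between the last WrFence and the flag write."""
    unfenced_data_write = False
    for kind, addr, _vc in requests:
        if kind == "WrFence":
            unfenced_data_write = False
        elif kind.startswith("Wr"):
            if addr == flag_cl_addr:
                return not unfenced_data_write
            unfenced_data_write = True
    return False  # the flag was never written


sequence = notify_sequence([0x100, 0x101], flag_cl_addr=0x200)
for request in sequence:
    print(request)
```

The checker is the kind of assertion that could run in a testbench to catch a flag write that is not protected by a fence.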
2.11.2 MMIO Requests
MMIO memory is exposed as pre-fetchable memory to the OS. This means that accesses to MMIO region should have no read side-effects. This is same as the 64b pre-fetchable BAR as defined in the PCIe Specification.
MMIO Read cycles follow the UC (uncacheable) ordering rules. Refer to the Intel Software Developer's Manual for more information on UC ordering rules.
MMIO Write cycles may follow either WC (write combining) or UC ordering rules.
Table 25: MMIO Ordering Rules
Request Memory Attribute Payload size Memory Ordering Comments
MMIO Write UC 4B or 8B, or 64B using AVX Strongly ordered Common case - AAL behavior
MMIO Write WC 4B or 8B or 64B Weakly ordered Special case
MMIO Read UC 4B or 8B Strongly ordered Common case - AAL behavior
MMIO Read WC 4B or 8B Weakly ordered Special case - streaming read (MOVNTDQA) can cause wider reads. NOT supported
MMIO requests within the FIU maintain the ordering set forth by the CPU.
MMIO read responses within the FIU are not ordered w.r.t. memory read or write requests. The AFU must resolve ordering dependencies w.r.t. memory requests before returning the MMIO Read response.
2.12 Timing diagrams
This section provides the timing diagrams for CCI-P interface signals.
[Timing diagram omitted. Signals shown: pClk, pck_af2cp_sTx.c1.hdr (H0 to H9), pck_cp2af_sRx.c1.TxAlmostFull, pck_af2cp_sTx.c1.valid, and pck_af2cp_sTx.c1.data (D0 to D9). Annotation: up to 8 valid cycles.]
Figure 13: Tx Channel 0 & 1 almost full threshold
[Timing diagram omitted. Signals shown: pClk, pck_af2cp_sTx.c1.hdr (Wr0, Wr1, Wr2, WrF, Wr3, Wr4), pck_af2cp_sTx.c1.valid, pck_af2cp_sTx.c1.data (D0 to D4), and the pck_cp2af_sRx.c1 responses (Wr1, Wr2, Wr0, WrF, followed by the responses for Wr3 and Wr4) with pck_cp2af_sRx.c1.valid. WrF denotes the Write Fence, which is marked as the write barrier.]
Figure 14: Write Fence Behavior
WrFence is inserted between WrLine requests. A WrFence response returns on the Rx channel. Note that in Figure 14, all the writes generated before the write fence get response completions before the writes after the write fence are completed.
WrFence only fences the writes on the selected VC. Choose VA if you want to fence across all VCs.
[Timing diagram omitted. Signals shown: pClk, pck_cp2af_sRx.c0.hdr, pck_cp2af_sRx.c0.mmioWrValid, pck_cp2af_sRx.c0.mmioRdValid, pck_cp2af_sRx.c0.rspValid, and pck_cp2af_sRx.c0.data[63:0]. The header carries interleaved MMIO Rd Requests (Rd0 to Rd2), MMIO Wr Requests (Wr0 to Wr4), and a Memory Rd Response (Rsp0), distinguished by a color legend.]
Figure 15: C0 Rx Channel Interleaved between MMIO Requests and Memory Responses
[Timing diagram omitted. Signals shown: pClk, pck_cp2af_sRx.c0.hdr (Req), pck_cp2af_sRx.c0.mmioRdValid, pck_af2cp_sTx.c2.hdr (Rsp), pck_af2cp_sTx.c2.mmioRdValid, and pck_af2cp_sTx.c2.data (Data). Annotation: maximum response latency from request to response is 512 pClk cycles.]
Figure 16: Rd Response Timeout
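The 512-cycle MMIO read response limit shown in Figure 16 lends itself to a simple latency check in simulation. The Python sketch below is a hypothetical model: the class and method names are invented for illustration, and a response arriving exactly 512 pClk cycles after the request is assumed to be in time.

```python
class MmioReadTracker:
    """Tracks outstanding MMIO reads by TID and checks the response latency limit."""

    def __init__(self, limit_cycles: int = 512):
        self.limit_cycles = limit_cycles
        self._issued = {}  # tid -> pClk cycle at which the request was seen

    def on_request(self, tid: int, cycle: int) -> None:
        self._issued[tid] = cycle

    def on_response(self, tid: int, cycle: int) -> bool:
        """Return True if the response met the latency limit."""
        return cycle - self._issued.pop(tid) <= self.limit_cycles

    def overdue(self, now: int) -> list:
        """TIDs whose response is already later than the limit at cycle `now`."""
        return sorted(t for t, start in self._issued.items()
                      if now - start > self.limit_cycles)


tracker = MmioReadTracker()
tracker.on_request(tid=5, cycle=100)
print(tracker.overdue(now=700))
```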
2.13 Clock Frequency
Table 26: Clock Frequency
CPU pClk (MHz) “Interface Clock” pClkDiv2 (MHz) pClkDiv4 (MHz)
SKL+FPGA 400 200 100
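The divided clocks in Table 26 and the minimum soft reset pulse width from Table 20 (256 pClk cycles) follow from simple arithmetic. The Python sketch below works it through. The function names are hypothetical and the 400 MHz value is the one listed in Table 26.

```python
def derived_clocks(pclk_mhz: float) -> dict:
    """pClkDiv2 and pClkDiv4 run at 0.5x and 0.25x the pClk frequency."""
    return {"pClk": pclk_mhz, "pClkDiv2": pclk_mhz / 2, "pClkDiv4": pclk_mhz / 4}


def min_soft_reset_ns(pclk_mhz: float, cycles: int = 256) -> float:
    """Duration of the minimum soft reset pulse (256 pClk cycles) in nanoseconds."""
    return cycles * 1000.0 / pclk_mhz


print(derived_clocks(400))     # frequencies in MHz
print(min_soft_reset_ns(400))  # 256 cycles at 2.5 ns per cycle
```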
2.14 CCI-P Guidance
This section suggests techniques and settings that are useful when you are just beginning to use the BDW + FPGA system.
The CCI-P interface provides several advanced features for fine grained control of FPGA caching states and virtual channels. When used correctly, you can get optimal performance through the interface; if used incorrectly, you may see significant degradation in performance.
Table 27 lists some suggested parameters for request fields.
Table 27 Recommended Choices for Memory Requests
Field Recommended Option
vc_sel For producer-consumer type flows: VA
vc_sel For latency-sensitive flows: VL0
vc_sel For data-dependent flows: use 1 VC, except VA
Length For maximum bandwidth: 2’b11 – 256B
Request Type Memory Reads: RdLine_I
Request Type Memory Writes: WrLine_M
CPU-to-FPGA notification Use MMIO Write for control notification only.
FPGA-to-CPU notification Implement a polling loop on the SW thread, reading an MMIO register in the AFU.
When setting the size of the request buffers in the AFU, follow this guidance:
64 outstanding requests each on VH0 and VH1 for a total of 128 requests.
Typically 128 outstanding requests, with a maximum of 256 outstanding requests, on VL0.
Total number of outstanding requests on VA is 128 + 128 (or 256) = 256 (or 384) outstanding requests.
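The buffer sizing guidance above reduces to a few sums. The Python sketch below reproduces them. The function names are hypothetical, and the byte estimate assumes every outstanding request is a read of the given number of 64B cache lines, which is an illustration rather than a requirement.

```python
CL_BYTES = 64  # one cache line is 64 bytes


def va_outstanding(vh0: int = 64, vh1: int = 64, vl0: int = 128) -> int:
    """Outstanding requests on VA are the sum over VH0, VH1, and VL0."""
    return vh0 + vh1 + vl0


def read_data_buffer_bytes(outstanding_requests: int, cls_per_request: int = 4) -> int:
    """Hypothetical worst-case read data buffer size for the given request depth."""
    return outstanding_requests * cls_per_request * CL_BYTES


print(va_outstanding())          # typical VL0 depth of 128
print(va_outstanding(vl0=256))   # maximum VL0 depth of 256
print(read_data_buffer_bytes(va_outstanding()))
```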
In some cases UMsg may give better performance for CPU-to-FPGA notification. However, using UMsgs is an advanced technique that introduces additional complexity. It is best used by experienced users.
3 AFU Requirements
This section defines the AFU initialization flow upon power on, and mandatory AFU CSRs.
3.1 Mandatory AFU CSR Definitions
The following requirements are defined for software access to AFU CSRs.
1. Software is expected to access 64-bit CSRs as aligned quad words. For example, to modify a field (for example, a bit or byte) in a 64-bit CSR, the entire quad word is read, the appropriate field(s) are modified, and the entire quad word is written back.
2. Similarly for AFUs supporting 32-bit CSRs, software is expected to access them as aligned double words.
3. Locked operations to AFU CSRs are not supported. Software must not issue locked operations to access AFU CSRs.
Each CCI-P-compliant AFU is required to implement the mandatory registers defined in Table 29. If you do not implement these registers, or if you implement them incorrectly, AFU discovery could fail or other unexpected behavior may occur.
Table 28: Register Attribute Definition
Attribute Expansion Description
RO Read Only The bit is set by hardware only. Software can only read this bit. Writes do not have any effect.
Rsvd Reserved Reserved for future definition. AFU must set them to 0s. SW must ignore these fields.
Table 29 shows both byte and DWORD offsets for the mandatory AFU CSRs. The base address is set by the platform and need not be specified by the AFU.
Table 29: Mandatory AFU CSRs
DWORD Address Offset (CCI-P) Byte Address Offset (AAL) Width Attr Name
0x0000 0x0000 64b RO DEV_FEATURE_HDR (DFH)
0x0002 0x0008 64b RO AFU_ID_L Refer to Table 31
0x0004 0x0010 64b RO AFU_ID_H Refer to Table 32
0x0006 0x0018 64b Rsvd DFH_RSVD0 Refer to Table 33
0x0008 0x0020 64b Rsvd DFH_RSVD1 Refer to Table 34.
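The two offset columns of Table 29 differ by a factor of four: the DWORD offset seen on CCI-P is the byte offset used by software shifted right by two. The Python sketch below checks this relationship for the mandatory CSRs. The dictionary and function names are hypothetical.

```python
# Byte offsets of the mandatory AFU CSRs, as listed in Table 29.
MANDATORY_CSR_BYTE_OFFSETS = {
    "DEV_FEATURE_HDR": 0x0000,
    "AFU_ID_L": 0x0008,
    "AFU_ID_H": 0x0010,
    "DFH_RSVD0": 0x0018,
    "DFH_RSVD1": 0x0020,
}


def byte_to_dword_offset(byte_offset: int) -> int:
    """Convert a software byte offset to the DWORD offset carried on CCI-P."""
    if byte_offset % 4:
        raise ValueError("byte offset must be DWORD aligned")
    return byte_offset >> 2


def dword_to_byte_offset(dword_offset: int) -> int:
    """Convert a CCI-P DWORD offset back to a byte offset."""
    return dword_offset << 2


for name, offset in MANDATORY_CSR_BYTE_OFFSETS.items():
    print(f"{name}: byte 0x{offset:04x} -> DWORD 0x{byte_to_dword_offset(offset):04x}")
```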
Code 6 shows how the AFU might set the mandatory AFU CSRs. You must define your own AFU ID. Note that the AFU uses DWORD addresses. Code 7 shows how an AAL program might read the AFU ID. The AAL software and the AFU RTL must reference the same AFU ID.
Code 6: Set the Mandatory AFU Registers in the AFU
    t_ccip_c0_ReqMmioHdr mmioHdr;
    :
    :
    case (mmioHdr.address)
      // AFU header
      16'h0000 : af2cp_sTxPort.c2.data <= {   // DFH
                   4'b0001,   // Feature Type = AFU
                   8'b0,      // Reserved
                   4'b0,      // AFU Minor Revision = 0
                   7'b0,      // Reserved
                   1'b1,      // End of DFH list = 1
                   24'b0,     // Next DFH offset = 0
                   4'b0,      // AFU Major version = 0
                   12'b0      // Feature ID = 0
                 };
      16'h0002 : af2cp_sTxPort.c2.data <= 64'ha12e_bb32_8f7d_d35c; // AFU_ID_L (arbitrary example)
      16'h0004 : af2cp_sTxPort.c2.data <= 64'ha455_783a_3e90_43b9; // AFU_ID_H (arbitrary example)
      16'h0006 : af2cp_sTxPort.c2.data <= 64'h0; // Next AFU
      16'h0008 : af2cp_sTxPort.c2.data <= 64'h0; // Reserved
Code 7: AAL Reads the AFU ID
    btUnsigned32bitInt AFUID_H, AFUID_L;
    :
    :
    IALIMMIO *m_pALIMMIOService; //< Pointer to MMIO Service
    :
    :
    :
    // the AFUID to be passed to the Resource Manager. It will be used to locate the appropriate device.
    ConfigRecord.Add(keyRegAFU_ID,"A455783A-3E90-43B9-A12E-BB328F7DD35C");
    :
    m_pALIMMIOService->mmioRead32(0x0008, &AFUID_L);
    printf("Read AFUID_L= 0x%08x\n", AFUID_L);
    m_pALIMMIOService->mmioRead32(0x0010, &AFUID_H);
    printf("Read AFUID_H= 0x%08x\n", AFUID_H);
Table 30: Feature Header CSR Definition
Register Name Device Feature Header (DFH)
Address Offset 0x0
Bit Attr Default Description
63:60 RO 0x1 Type: AFU
59:52 Rsvd 0x0 Reserved
51:48 RO 0x0 AFU Minor version # User defined value
47:41 Rsvd 0x0 Reserved
40 RO N.A. End of List
1’b0 There is another feature header beyond this (see “Next DFH Byte Offset”)
1’b1 This is the last feature header for this AFU
39:16 RO 0x0 Byte offset to the Next Device Feature Header; that is, offset from the current address.
Example:
Feature 0 @ Address 0x0, Next Feature offset = 0x100
Feature 1 @ Address 0x100, Next Feature offset = 0x100
Feature 2 @ Address 0x200, Next Feature offset = N.A. for last feature
15:12 RO N.A. AFU Major version # User defined value
11:0 RO 0x070 CCI-P version # Use the CCIP_VERSION_NUMBER parameter from ccip_if_pkg.sv
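The DFH layout in Table 30 and the Next DFH offset chaining can be modeled in software. The Python sketch below packs and unpacks the 64-bit DFH value and walks a feature list. All names are hypothetical, read_csr64 stands in for whatever 64-bit CSR read the platform provides, and the defaults follow the Default column of Table 30 where one is given.

```python
# (field name, width in bits, bit position of the LSB), per Table 30
DFH_FIELDS = (
    ("feature_type", 4, 60),
    ("minor", 4, 48),
    ("end_of_list", 1, 40),
    ("next_offset", 24, 16),
    ("major", 4, 12),
    ("ccip_version", 12, 0),
)


def pack_dfh(feature_type=0x1, minor=0, end_of_list=1, next_offset=0,
             major=0, ccip_version=0x070) -> int:
    """Assemble the 64-bit DFH value; reserved bits stay 0."""
    values = dict(feature_type=feature_type, minor=minor, end_of_list=end_of_list,
                  next_offset=next_offset, major=major, ccip_version=ccip_version)
    word = 0
    for name, width, shift in DFH_FIELDS:
        if not 0 <= values[name] < (1 << width):
            raise ValueError(f"{name} does not fit in {width} bits")
        word |= values[name] << shift
    return word


def unpack_dfh(word: int) -> dict:
    """Extract the Table 30 fields from a 64-bit DFH value."""
    return {name: (word >> shift) & ((1 << width) - 1)
            for name, width, shift in DFH_FIELDS}


def walk_dfh(read_csr64, start: int = 0) -> list:
    """Follow Next DFH byte offsets until End of List; return the header addresses."""
    addresses, addr = [], start
    while True:
        dfh = unpack_dfh(read_csr64(addr))
        addresses.append(addr)
        if dfh["end_of_list"]:
            return addresses
        if dfh["next_offset"] == 0:
            raise ValueError("Next DFH offset is 0 but End of List is not set")
        addr += dfh["next_offset"]  # offset is relative to the current header


print(hex(pack_dfh()))
```

The walk reproduces the three-feature example in Table 30 when the headers at 0x0 and 0x100 each carry a Next DFH offset of 0x100 and the header at 0x200 sets End of List.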
Table 31: AFU_ID_L CSR Definition
Register Name AFU_ID_L
Address Offset 0x8
Bit Attr Default Description
63:0 RO 0h Lower 64-bits of the AFU_ID GUID. Refer to Section 3.3.2
Table 32: AFU_ID_H CSR Definition
Register Name AFU_ID_H
Address Offset 0x10
Bit Attr Default Description
63:0 RO 0h Upper 64-bits of the AFU_ID GUID. Refer to Section 3.3.2
Table 33: DFH_RSVD0 CSR Definition
Register Name DFH_RSVD0
Address Offset 0x18
Bit Attr Default Description
63:0 Rsvd 0x0 Reserved for future definition.
Table 34: DFH_RSVD1 CSR Definition
Register Name DFH_RSVD1
Address Offset 0x20
Bit Attr Default Description
63:0 Rsvd 0x0 Reserved for future definition.
3.2 AFU Discovery Flow
A CCI-P compliant AFU must implement the mandatory AFU CSRs. Figure 17 shows the initial transactions immediately after pck_cp2af_softReset is de-asserted. The AFU must accept MMIO Read cycles immediately after soft reset is de-asserted.
[Sequence diagram omitted. Participants: Driver, FIU, User AFU, and User Application. Labels shown, in order:
Install Driver.
De-assert Port Reset (pck_cp2af_softReset=0); AFU out of Reset.
MMIO Rd to FIU CSR; Response with Reset status. The Driver checks if all old requests are drained.
Enumerate AFU DFH: MMIO Rd (DFH) / MMIO Rd (0x0); Rsp(DFH type=AFU).
Read AFU_ID: MMIO Rd (AFU_ID_L) / MMIO Rd (0x8); Rsp(AFU_ID_L). MMIO Rd (AFU_ID_H) / MMIO Rd (0x10); Rsp(AFU_ID_H).
Publish AFU resource / Allocate AFU.
Driver hands over AFU control to Application.]
Figure 17: AFU discovery flow
3.3 AFU_ID
The purpose of an AFU_ID is to precisely identify the architectural interface of an AFU. This interface is the contract that the AFU makes with the software.
Multiple instantiations of an AFU can have the same AFU_ID value, but if the architectural interface of the AFU changes, then it needs a new AFU_ID.
The architectural interface of an AFU comprises the syntax and semantics of the AFU design, consisting of the AFU’s functionality, its CSR definitions, the protocol expected by the AFU when manipulating its CSRs, and all implicit or explicit assumptions or guarantees about its buffers.
The AAL framework and the application software use the AFU_ID to ensure that they are matched to the correct AFU; that is, that they are obeying the same architectural interface.
Technically, the AFU_ID is a 128 bit GUID, and can be generated using standard GUID creation tools (see below).
3.3.1 How to Create an AFU_ID / GUID
Linux Use the command uuidgen.
$ uuidgen
1ad7bb9f-1371-4b3c-ab68-aaaa657f130b
Microsoft 1. Use PowerShell. From a PowerShell console, enter
> [guid]::NewGuid()
cf545c57-9e6a-46cf-8601-32d3be765f4a
2. Or execute that PowerShell command from a standard Windows CMD shell:
> powershell -Command "[guid]::NewGuid()"
63ae3df9-1204-49ff-9144-45d23e27a4d3
3.3.2 How to Use an AFU_ID
Assume that you get a GUID in the standard 8-4-4-4-12 format, such as:
    00112233-4455-6677-8899-aabbccddeeff
The examples in this section use the GUID A455783A-3E90-43B9-A12E-BB328F7DD35C.
In the RTL, use this format (an underscore every four hex digits):
    16'h0002 : af2cp_sTxPort.c2.data <= 64'ha12e_bb32_8f7d_d35c; // AFU_ID_L
    16'h0004 : af2cp_sTxPort.c2.data <= 64'ha455_783a_3e90_43b9; // AFU_ID_H
In the software (for example, when constructing a configuration record), use this format (note the location of the dashes):
    ConfigRecord.Add(keyRegAFU_ID,"A455783A-3E90-43B9-A12E-BB328F7DD35C");
These two AFU_ID values match each other: AFU_ID_H holds the upper 64 bits of the GUID (its first 16 hex digits) and AFU_ID_L holds the lower 64 bits. This allows the AAL runtime framework to match the hardware AFU with the software application using the AFU_ID.
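The conversion between the dashed software format and the two RTL constants is mechanical and easy to get wrong by hand. The following is a small Python sketch of that conversion using the standard uuid module; the helper names `split_afu_id` and `verilog_literal` are made up for this illustration and are not part of any Intel tool.

```python
import uuid


def split_afu_id(guid_text):
    """Return (AFU_ID_H, AFU_ID_L): the upper and lower 64 bits of a GUID string."""
    guid = uuid.UUID(guid_text)
    afu_id_h = guid.int >> 64
    afu_id_l = guid.int & 0xFFFFFFFFFFFFFFFF
    return afu_id_h, afu_id_l


def verilog_literal(value):
    """Format a 64-bit value as a Verilog literal with an underscore every four hex digits."""
    digits = f"{value:016x}"
    groups = [digits[i:i + 4] for i in range(0, 16, 4)]
    return "64'h" + "_".join(groups)


afu_id_h, afu_id_l = split_afu_id("A455783A-3E90-43B9-A12E-BB328F7DD35C")
print("AFU_ID_L:", verilog_literal(afu_id_l))  # 64'ha12e_bb32_8f7d_d35c
print("AFU_ID_H:", verilog_literal(afu_id_h))  # 64'ha455_783a_3e90_43b9
```

Generating both constants from the single GUID string used by the software keeps the RTL and the configuration record from drifting apart.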
4 Basic Building Blocks
Basic Building Blocks (BBBs) are Intel-provided reference designs that users can instantiate in their AFU. There are two types of BBBs: software-visible (exposes a register interface and requires software interaction) and software-invisible (does not require software interaction). In both cases, it is your responsibility to integrate the hardware and software into your AFU.
Examples:
SW visible: Shared Virtual Memory (SVM) using the VTP feature of the Memory Properties Factory (MPF).
SW invisible: Reorder buffer; asynchronous CCI-P interface.
An example of a BBB is the Memory Properties Factory (MPF). MPF is an optional, parameterized basic building block that implements shared virtual memory in a proprietary manner. MPF is a collection of shims that transform CCI to CCI, each adding some property. VTP (Virtual to Physical) is the address translation shim. Other MPF properties include read response sorting and order guarantees within a cache line.
BBBs have mandatory registers defined in the next section.
5 Device Feature List
This section defines a feature list structure that creates a linked list of feature headers within MMIO space, thus providing an extensible way of adding features. The software can walk through the feature headers to enumerate the following:
AFUs
Basic Building Blocks (BBBs)
Private features
Table 35: Differences between AFU, Private Features, and BBBs

AFU
  - Must implement mandatory AFU registers, including AFU ID.
  - An AFU will be compliant to the CCI-P interface and connected directly to the CCI-P Port.
  - It is a primary unit of allocation, PR and reset from SW PoV.

Private Feature
  - These are a linked list of features within the AFU, which provides a way of organizing functions within an AFU. It is the AFU developer's responsibility to enumerate and manage them.
  - They are not required to implement a GUID.

BBB
  - BBBs are special features within the AFU, which are meant to be reusable building blocks (design once, reuse many times). SW visible BBBs typically come with a corresponding software service to enumerate and configure the BBB, and possibly provide a higher-level SW interface to the BBB.
  - BBBs do not have strong HW interface requirements like an AFU, but they must have strong architectural semantics from SW PoV.
  - They must implement a GUID.
A feature region (sometimes referred to simply as a “feature”) is a group of related CSRs. For example, two different features of a DMA engine are queue management and QoS functions. You can group queue management and QoS functions into two different feature regions.
BBBs and private features are always children of an AFU and as such must be contained within an AFU.
Figure 18 shows an example of a feature hierarchy and the relationship between the AFU, BBBs, and private features.
[Figure 18 shows a linked list of four feature headers: DFH Type=AFU (mandatory, Address: 0x0, EOL=0), then Private Feature 1 (EOL=0), then BBB Feature 2 (EOL=0), then Private Feature 3 (EOL=1).]
Figure 18: Example feature hierarchy
A Device Feature Header (DFH) register (shown in Table 36) marks the start of the feature region.
Table 36: Device Feature Header CSR
Register Name: Device Feature Header

Bit     Description
63:60   Feature Type
          4'h1 - AFU
          4'h2 - BBB
          4'h3 - Private Features
59:52   Reserved
51:48   AFU DFH: AFU Minor version #. User defined value.
        BBB or Private Feature DFH: Reserved
47:41   Reserved
40      End of List
          1'b0 - There is another feature header beyond this (see "Next DFH Byte offset")
          1'b1 - This is the last feature header for this AFU
39:16   Next DFH Byte offset
          Next DFH Address = Current DFH Address + Next DFH Byte offset
          Refer to the example in Table 37.
15:12   AFU DFH: AFU Major Version #. User defined.
        BBB or Private Feature DFH: Feature Revision. User defined.
11:0    AFU DFH: CCI-P Version #. Refer to Table 38 for the AFU DFH register map.
        BBB or Private Feature DFH: Feature ID. Contains user defined ID to identify features within an AFU.
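As an illustration of the bit layout in Table 36, the following Python sketch splits a 64-bit DFH value into its fields. The function name `decode_dfh`, the dictionary keys, and the example register value are all made up for this sketch; only the bit positions come from the table.

```python
def decode_dfh(dfh):
    """Split a 64-bit DFH value into the bit fields listed in Table 36."""
    return {
        "feature_type":  (dfh >> 60) & 0xF,       # bits 63:60
        "afu_minor":     (dfh >> 48) & 0xF,       # bits 51:48 (AFU minor version; reserved otherwise)
        "eol":           (dfh >> 40) & 0x1,       # bit 40
        "next_offset":   (dfh >> 16) & 0xFFFFFF,  # bits 39:16
        "major_or_rev":  (dfh >> 12) & 0xF,       # bits 15:12 (AFU major version or feature revision)
        "version_or_id": dfh & 0xFFF,             # bits 11:0 (CCI-P version or feature ID)
    }


# Made-up example value: Type=AFU, EOL=0, next DFH 0x100 bytes ahead, major version 1.
example_dfh = (0x1 << 60) | (0x0 << 40) | (0x100 << 16) | (0x1 << 12)
fields = decode_dfh(example_dfh)
print(hex(example_dfh), fields)
```

The two shared fields (bits 15:12 and 11:0) are left with neutral names because their meaning depends on the Feature Type decoded from bits 63:60.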
Table 37: Next DFH Byte offset example

Feature                          DFH Address   EOL    Next DFH Byte offset
0                                0x0           0x0    0x100
1                                0x100         0x0    0x180
2 - Last feature                 0x280         0x1    0x80
Unallocated MMIO space, no DFH   0x300         N.A.   N.A.
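The traversal implied by Table 37 can be sketched as a short loop: read the DFH, stop if End of List is set, otherwise add the Next DFH Byte offset to the current address. The Python below is an illustrative sketch only. `read_dfh` is a hypothetical callable standing in for an MMIO read, `make_dfh` is a made-up helper, and the register contents are built from the Table 37 values.

```python
def make_dfh(eol, next_offset):
    """Build a DFH value holding only the EOL (bit 40) and Next DFH Byte offset (bits 39:16) fields."""
    return (eol << 40) | (next_offset << 16)


def walk_dfh_list(read_dfh, start=0x0):
    """Return the address of every DFH in the list, following Next DFH Byte offset until EOL=1."""
    addresses = []
    addr = start
    while True:
        dfh = read_dfh(addr)
        addresses.append(addr)
        if (dfh >> 40) & 0x1:             # End of List: no further feature headers
            return addresses
        offset = (dfh >> 16) & 0xFFFFFF
        if offset == 0:                   # guard against looping forever on a malformed list
            raise ValueError(f"DFH at {addr:#x} has EOL=0 but a zero offset")
        addr += offset


# MMIO contents taken from Table 37.
table_37 = {
    0x000: make_dfh(eol=0, next_offset=0x100),
    0x100: make_dfh(eol=0, next_offset=0x180),
    0x280: make_dfh(eol=1, next_offset=0x80),
}
print([hex(a) for a in walk_dfh_list(table_37.__getitem__)])  # ['0x0', '0x100', '0x280']
```

Note that the last feature still carries an offset (0x80), but the walker never follows it because EOL is checked first; 0x300 is unallocated MMIO space.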
Table 38: Mandatory AFU DFH register map

Byte address offset w.r.t. DFH   Register Name
0x0000                           DFH Type=AFU
0x0008                           AFU_ID_L
0x0010                           AFU_ID_H
0x0018                           Next AFU
0x0020                           Reserved
Table 39 AFU_ID_L CSR definition
Register Name AFU_ID_L
Bit Attr Description
63:0 RO Lower 64-bits of the AFU_ID GUID
Table 40 AFU_ID_H CSR definition
Register Name AFU_ID_H
Bit Attr Description
63:0 RO Upper 64-bits of the AFU_ID GUID
Table 41: Next AFU CSR
Register Name: Next AFU

Bit     Attr   Description
63:24   Rsvd   Reserved
23:0    RO     Next AFU DFH Byte offset
               Next AFU DFH address = current address + offset
               A value of 0 implies it is the last AFU in the list.

Example:
AFU 0 @ Address 0x0     Next AFU offset = 0x100
AFU 1 @ Address 0x100   Next AFU offset = 0x100
AFU 2 @ Address 0x200   Next AFU offset = 0x0 (indicates end of AFU list)
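The AFU list is walked differently from the feature list: there is no End of List bit, and a Next AFU offset of 0 terminates the list. A minimal Python sketch of that traversal follows, assuming a hypothetical `mmio_read64` callable and the Next AFU register at byte offset 0x18 from each AFU's DFH (per Table 38). The register contents reproduce the example above.

```python
NEXT_AFU_OFFSET = 0x18   # byte offset of the Next AFU CSR relative to the AFU's DFH


def walk_afu_list(mmio_read64, first_afu=0x0):
    """Return the DFH address of every AFU, following Next AFU until the offset is 0."""
    afus = []
    addr = first_afu
    while True:
        afus.append(addr)
        offset = mmio_read64(addr + NEXT_AFU_OFFSET) & 0xFFFFFF   # bits 23:0
        if offset == 0:                                           # last AFU in the list
            return afus
        addr += offset


# Next AFU register values from the example: AFUs at 0x0, 0x100 and 0x200.
next_afu_regs = {
    0x000 + NEXT_AFU_OFFSET: 0x100,
    0x100 + NEXT_AFU_OFFSET: 0x100,
    0x200 + NEXT_AFU_OFFSET: 0x0,
}
print([hex(a) for a in walk_afu_list(next_afu_regs.__getitem__)])  # ['0x0', '0x100', '0x200']
```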
Table 42: DFH_RSVD1 CSR Definition
Register Name DFH_RSVD1
Bit Attr Description
63:0 Rsvd Reserved
A DFH with Type=BBB must be followed by the mandatory BBB registers.
Table 43: Mandatory BBB DFH Register Map
Byte Address offset w.r.t DFH Register Name
0x0000 DFH Type=BBB
0x0008 BBB_ID_L
0x0010 BBB_ID_H
The mandatory BBB registers are defined below.
Table 44: BBB_ID_L CSR Definition
Register Name BBB_ID_L
Bit Attr Description
63:0 RO Lower 64-bits of the BBB_ID GUID
Table 45: BBB_ID_H CSR Definition
Register Name BBB_ID_H
Bit Attr Description
63:0 RO Upper 64-bits of the BBB_ID GUID
The BBB_ID is a GUID, similar in concept to an AFU_ID. It is defined so that each BBB has a unique identifier from the SW PoV; this allows the AAL to identify the SW service associated with the BBB RTL.
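One way software could use the BBB_ID is sketched below in Python: combine the two 64-bit registers into a GUID and look it up in a table of known services. This is only an illustration of the idea, not the AAL mechanism. The registry, the service name, and the GUID value (an arbitrary example GUID) are all assumptions made for the sketch.

```python
import uuid


def bbb_guid(bbb_id_h, bbb_id_l):
    """Combine BBB_ID_H (upper 64 bits) and BBB_ID_L (lower 64 bits) into a GUID."""
    return uuid.UUID(int=(bbb_id_h << 64) | bbb_id_l)


# Hypothetical registry mapping a BBB GUID to the name of its software service.
KNOWN_BBB_SERVICES = {
    uuid.UUID("63ae3df9-1204-49ff-9144-45d23e27a4d3"): "example_bbb_service",
}

# Register values as they would be read from BBB_ID_H and BBB_ID_L (made-up example).
guid = bbb_guid(0x63AE3DF9120449FF, 0x914445D23E27A4D3)
service = KNOWN_BBB_SERVICES.get(guid, "no matching service")
print(guid, "->", service)
```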
Figure 19 shows how a logical feature hierarchy (shown on left-hand side) can be expressed using DFH registers defined in this section.
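[Figure 19 has two panels, Logical View and Register Map. The recoverable content is:]
Logical View: AFU DFH (mandatory, Type=AFU, EOL=0), then Private Feature 1 (EOL=0), then BBB Feature 2 (EOL=0), then Private Feature 3 (EOL=1).
Register Map (each register is 64 bits wide, bits 63 to 0):
  AFU: DFH (Type=AFU, Reserved, AFU minor #, Next DFH Byte offset, AFU major #, CCI-P version #), AFU_ID_L, AFU_ID_H, Reserved, Reserved, Feature CSRs
  Private Feature 1: DFH (Type=Priv, Reserved, Next DFH Byte offset, Feature Rev, Feature ID), Feature CSRs
  BBB Feature 2: DFH (Type=BBB, Reserved, Next DFH Byte offset, Feature Rev, Feature ID), GUID_L, GUID_H, Feature CSRs
  Private Feature 3: DFH (Type=Priv, Reserved, Next DFH Byte offset, Feature Rev, Feature ID), Feature CSRs
Traversal: at each DFH, if EOL != 1, the Next DFH Byte offset is added to the current DFH address to give the next feature address (Feature 1 Addr, Feature 2 Addr, Feature 3 Addr). If EOL == 1, the end of the feature list has been reached.
Figure 19: Device Feature Conceptual View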