p51: high performance networking - university of cambridge
TRANSCRIPT
![Page 2: P51: High Performance Networking - University of Cambridge](https://reader034.vdocument.in/reader034/viewer/2022042721/626794b2184e946f1432c92f/html5/thumbnails/2.jpg)
Practical Assignment
![Page 3: P51: High Performance Networking - University of Cambridge](https://reader034.vdocument.in/reader034/viewer/2022042721/626794b2184e946f1432c92f/html5/thumbnails/3.jpg)
Practical Assignment
• Goal: Design a high performance networked application
• Who: 2 students in each team
• What:
• Start from a reference design (e.g. Reference Switch)
• Pick an application
• Implement in a network device and provide line-rate performance
• When:
• Due date: First day of Easter term, 23/4/2019Via Moodle
![Page 4: P51: High Performance Networking - University of Cambridge](https://reader034.vdocument.in/reader034/viewer/2022042721/626794b2184e946f1432c92f/html5/thumbnails/4.jpg)
Practical Assignment
• Submission contents:
• Source code and bit files
• Any scripts used for testing + outputs
• Documentation
• Architecture
• Performance profiles
• Design decisions
• Evaluation plan and evaluation results
• Full details will be provided later
![Page 5: P51: High Performance Networking - University of Cambridge](https://reader034.vdocument.in/reader034/viewer/2022042721/626794b2184e946f1432c92f/html5/thumbnails/5.jpg)
Project Selection
• Project ideas appear on Moodle• Pick an application – prefer a topic you are familiar with.• Pick a platform:
• NetFPGA – default, provided by the course’s team• Otherwise – bring your own platform, subject to approval
Access to the platform required for the assessment• Pick a starting point – Choose an easy starting point!
• Default – NetFPGA Reference Switch• Can be any network application
• Pick a design flow:• Verilog – stable, more support and existing code.• P4 – less hardware knowledge required, easier to implement.
![Page 6: P51: High Performance Networking - University of Cambridge](https://reader034.vdocument.in/reader034/viewer/2022042721/626794b2184e946f1432c92f/html5/thumbnails/6.jpg)
Project – For Next Week
• Choose your partner
• Choose your project
• Submit a project proposal via Moodle by 29/1/2019 16:00
• Use the template on the course’s website
• Indicate if you are interested in publishing your work
• Projects will be discussed in the next lab (30/1/2019)
• The project proposal can be updated following 2:1 discussion (next week)
![Page 7: P51: High Performance Networking - University of Cambridge](https://reader034.vdocument.in/reader034/viewer/2022042721/626794b2184e946f1432c92f/html5/thumbnails/7.jpg)
Section I: The NetFPGA platform
![Page 8: P51: High Performance Networking - University of Cambridge](https://reader034.vdocument.in/reader034/viewer/2022042721/626794b2184e946f1432c92f/html5/thumbnails/8.jpg)
NetFPGA = Networked FPGA
A line-rate, flexible, open networking platform for teaching and research
![Page 9: P51: High Performance Networking - University of Cambridge](https://reader034.vdocument.in/reader034/viewer/2022042721/626794b2184e946f1432c92f/html5/thumbnails/9.jpg)
NetFPGA Family of Boards
NetFPGA-1G (2006)
NetFPGA-1G-CML (2014)
NetFPGA-10G (2010)
NetFPGA SUME (2014)
![Page 10: P51: High Performance Networking - University of Cambridge](https://reader034.vdocument.in/reader034/viewer/2022042721/626794b2184e946f1432c92f/html5/thumbnails/10.jpg)
NetFPGA consists of…
Four elements:
• NetFPGA board
• Tools + reference designs
• Contributed projects
• Community
![Page 11: P51: High Performance Networking - University of Cambridge](https://reader034.vdocument.in/reader034/viewer/2022042721/626794b2184e946f1432c92f/html5/thumbnails/11.jpg)
FPGA
Memory
10GbE
10GbE
10GbE
10GbE
NetFPGA Board
PCI-Express
CPU Memory
PC with NetFPGA
NetworkingSoftwarerunning on a standard PC
A hardware acceleratorbuilt with Field Programmable Gate Arraydriving 1/10/100Gb/snetwork links
![Page 12: P51: High Performance Networking - University of Cambridge](https://reader034.vdocument.in/reader034/viewer/2022042721/626794b2184e946f1432c92f/html5/thumbnails/12.jpg)
Tools + Reference Designs
Tools:• Compile designs• Verify designs• Interact with hardware
Reference designs:• Router (HW)• Switch (HW)• Network Interface Card (HW)• Router Kit (SW)• SCONE (SW)
![Page 13: P51: High Performance Networking - University of Cambridge](https://reader034.vdocument.in/reader034/viewer/2022042721/626794b2184e946f1432c92f/html5/thumbnails/13.jpg)
Community
Wiki
• Documentation
• User’s Guide “so you just got your first NetFPGA”
• Developer’s Guide “so you want to build a …”• Encourage users to contribute
Mailing list
• Announcements
• Support by users for users
![Page 14: P51: High Performance Networking - University of Cambridge](https://reader034.vdocument.in/reader034/viewer/2022042721/626794b2184e946f1432c92f/html5/thumbnails/14.jpg)
International Community
Over 1,200 users, using over 3500 cards at
200 universities in over 47 countries
![Page 15: P51: High Performance Networking - University of Cambridge](https://reader034.vdocument.in/reader034/viewer/2022042721/626794b2184e946f1432c92f/html5/thumbnails/15.jpg)
NetFPGA SUME Community (since Feb 2015)
Over 600 users, 300 universities in 60 countries and 6 continents
![Page 16: P51: High Performance Networking - University of Cambridge](https://reader034.vdocument.in/reader034/viewer/2022042721/626794b2184e946f1432c92f/html5/thumbnails/16.jpg)
NetFPGA’s Defining Characteristics
• Line-Rate– Processes back-to-back packets
• Without dropping packets • At full rate
– Operating on packet headers • For switching, routing, and firewall rules
– And packet payloads• For content processing and intrusion prevention
• Open-source Hardware– Similar to open-source software
• Full source code available • BSD-Style License for SUME, LGPL 2.1 for 10G
– But harder, because • Hardware modules must meet timing• Verilog & VHDL Components have more complex interfaces • Hardware designers need high confidence in specification of modules
![Page 17: P51: High Performance Networking - University of Cambridge](https://reader034.vdocument.in/reader034/viewer/2022042721/626794b2184e946f1432c92f/html5/thumbnails/17.jpg)
Test-Driven Design
• Regression tests• Have repeatable results • Define the supported features• Provide clear expectation on functionality
• Example: Internet Router• Drops packets with bad IP checksum• Performs Longest Prefix Matching on destination address• Forwards IPv4 packets of length 64-1500 bytes• Generates ICMP message for packets with TTL <= 1• Defines how to handle packets with IP options or non IPv4
… and dozens more … Every feature is defined by a regression test
![Page 18: P51: High Performance Networking - University of Cambridge](https://reader034.vdocument.in/reader034/viewer/2022042721/626794b2184e946f1432c92f/html5/thumbnails/18.jpg)
Section II: Hardware Overview
![Page 19: P51: High Performance Networking - University of Cambridge](https://reader034.vdocument.in/reader034/viewer/2022042721/626794b2184e946f1432c92f/html5/thumbnails/19.jpg)
NetFPGA-SUME
![Page 20: P51: High Performance Networking - University of Cambridge](https://reader034.vdocument.in/reader034/viewer/2022042721/626794b2184e946f1432c92f/html5/thumbnails/20.jpg)
NetFPGA-SUME
High Level Block Diagram
![Page 21: P51: High Performance Networking - University of Cambridge](https://reader034.vdocument.in/reader034/viewer/2022042721/626794b2184e946f1432c92f/html5/thumbnails/21.jpg)
Xilinx Virtex 7 690T
• Optimized for high-performance applications
• 690K Logic Cells
• 52Mb RAM
• 3 PCIe Gen. 3 Hard cores
![Page 22: P51: High Performance Networking - University of Cambridge](https://reader034.vdocument.in/reader034/viewer/2022042721/626794b2184e946f1432c92f/html5/thumbnails/22.jpg)
Memory Interfaces
• DRAM:2 x DDR3 SoDIMM1866MT/s, 4GB
• SRAM:3 x 9MB QDRII+, 500MHz
![Page 23: P51: High Performance Networking - University of Cambridge](https://reader034.vdocument.in/reader034/viewer/2022042721/626794b2184e946f1432c92f/html5/thumbnails/23.jpg)
Host Interface
• PCIe Gen. 3
• x8 (only)
• Hardcore IP
![Page 24: P51: High Performance Networking - University of Cambridge](https://reader034.vdocument.in/reader034/viewer/2022042721/626794b2184e946f1432c92f/html5/thumbnails/24.jpg)
Front Panel Ports
• 4 SFP+ Cages
• Directly connected tothe FPGA
• Supports 10GBase-R transceivers (default)
• Also Supports 1000Base-X transceivers and direct attach cables
![Page 25: P51: High Performance Networking - University of Cambridge](https://reader034.vdocument.in/reader034/viewer/2022042721/626794b2184e946f1432c92f/html5/thumbnails/25.jpg)
Expansion Interfaces
• FMC HPC connector
• VITA-57 Standard
• Supports Fabric Mezzanine Cards (FMC)
• 10 x 12.5Gbps serial links
• QTH-DP
• 8 x 12.5Gbps serial links
![Page 26: P51: High Performance Networking - University of Cambridge](https://reader034.vdocument.in/reader034/viewer/2022042721/626794b2184e946f1432c92f/html5/thumbnails/26.jpg)
Storage
• 128MB FLASH
• 2 x SATA connectors
• Micro-SD slot
• Enable standalone operation
![Page 27: P51: High Performance Networking - University of Cambridge](https://reader034.vdocument.in/reader034/viewer/2022042721/626794b2184e946f1432c92f/html5/thumbnails/27.jpg)
Beyond Hardware
• NetFPGA Board• Xilinx Vivado based IDE• Reference designs using AXI4• Software (embedded and PC)• Public Repository• Public WikiReference Designs AXI4 IPs
Xilinx Vivado
MicroBlaze SW PC SW
GitHub, User Community
![Page 28: P51: High Performance Networking - University of Cambridge](https://reader034.vdocument.in/reader034/viewer/2022042721/626794b2184e946f1432c92f/html5/thumbnails/28.jpg)
Section III: Life of a Packet
![Page 29: P51: High Performance Networking - University of Cambridge](https://reader034.vdocument.in/reader034/viewer/2022042721/626794b2184e946f1432c92f/html5/thumbnails/29.jpg)
Reference Switch Pipeline
• Five stages:• Input port• Input arbitration• Forwarding decision and
packet modification• Output queuing• Output port
• Packet-based module interface
• Pluggable design
10GERxQ
10GERxQ
10GERxQ
10GERxQ DMA
Input Arbiter
Output Port Lookup
Output Queues
10GETx
10GETx
10GETx
10GETx DMA
![Page 30: P51: High Performance Networking - University of Cambridge](https://reader034.vdocument.in/reader034/viewer/2022042721/626794b2184e946f1432c92f/html5/thumbnails/30.jpg)
Full System Components
Software
PCIe Bus
NetFPGA
AXI Lite
user data path
Registers
nf0 nf1 nf2 nf3 ioctl
MACTxQ
MACRxQ
Ports
CPURxQ
CPUTxQ
MACTxQ
MACRxQ
MACTxQ
MACRxQ
10GETx
10GERx
![Page 31: P51: High Performance Networking - University of Cambridge](https://reader034.vdocument.in/reader034/viewer/2022042721/626794b2184e946f1432c92f/html5/thumbnails/31.jpg)
00:0a:..:0X00:0a:..:0Y
Life of a Packet through the Hardware
Port 1 Port 2
![Page 32: P51: High Performance Networking - University of Cambridge](https://reader034.vdocument.in/reader034/viewer/2022042721/626794b2184e946f1432c92f/html5/thumbnails/32.jpg)
10GE Rx Queue
10GE Rx
Queue
![Page 33: P51: High Performance Networking - University of Cambridge](https://reader034.vdocument.in/reader034/viewer/2022042721/626794b2184e946f1432c92f/html5/thumbnails/33.jpg)
10GE Rx Queue
10GE Rx
Queue
Eth Hdr:Dst MAC, Src MAC
Payload
Length, Srcport, Dst port, User defined
0
TUSER TDATA
![Page 34: P51: High Performance Networking - University of Cambridge](https://reader034.vdocument.in/reader034/viewer/2022042721/626794b2184e946f1432c92f/html5/thumbnails/34.jpg)
Input Arbiter
Input Arbiter
Rx 0
Rx 1
…
Rx 4
Pkt
Pkt
Pkt
![Page 35: P51: High Performance Networking - University of Cambridge](https://reader034.vdocument.in/reader034/viewer/2022042721/626794b2184e946f1432c92f/html5/thumbnails/35.jpg)
Output Port Lookup
Output Port
Lookup
![Page 36: P51: High Performance Networking - University of Cambridge](https://reader034.vdocument.in/reader034/viewer/2022042721/626794b2184e946f1432c92f/html5/thumbnails/36.jpg)
Output Port
Lookup
Eth Hdr: Dst MAC= nextHop , Src MAC = port
4Payload
Length, Srcport, Dst port, User defined
0
Output Port Lookup
1- Parse header: SrcMAC, Dst
MAC, Src port
2 - Lookup next hop
MAC& output port
3- Learn SrcMAC & Src
port
4- Update output port in TUSER
TUSER TDATA
![Page 37: P51: High Performance Networking - University of Cambridge](https://reader034.vdocument.in/reader034/viewer/2022042721/626794b2184e946f1432c92f/html5/thumbnails/37.jpg)
Output Queues
Output Queues
OQ0
OQ2
OQ4
![Page 38: P51: High Performance Networking - University of Cambridge](https://reader034.vdocument.in/reader034/viewer/2022042721/626794b2184e946f1432c92f/html5/thumbnails/38.jpg)
10GE Port Tx
10GE Port Tx
![Page 39: P51: High Performance Networking - University of Cambridge](https://reader034.vdocument.in/reader034/viewer/2022042721/626794b2184e946f1432c92f/html5/thumbnails/39.jpg)
MAC Tx Queue
MAC Tx Queue
Eth Hdr: Dst MAC , SrcMAC
Payload
Length, Srcport, Dst port, User defined
0
![Page 40: P51: High Performance Networking - University of Cambridge](https://reader034.vdocument.in/reader034/viewer/2022042721/626794b2184e946f1432c92f/html5/thumbnails/40.jpg)
P4NetFPGA Compilation Overview
41
P4 Program
Xilinx P416 Compiler
Xilinx SDNet Tools
SimpleSumeSwitch Architecture
NetFPGA Reference Switch
10GERxQ
10GERxQ
10GERxQ
10GERxQ DMA
Input Arbiter
Output Port Lookup
Output Queues
10GETx
10GETx
10GETx
10GETx DMA
SimpleSumeSwitch
![Page 41: P51: High Performance Networking - University of Cambridge](https://reader034.vdocument.in/reader034/viewer/2022042721/626794b2184e946f1432c92f/html5/thumbnails/41.jpg)
NetFPGA-Host Interaction
• Linux driver interfaces with hardware
• Packet interface via standard Linux network stack
• Register reads/writes via ioctl system call with wrapper functions:
• rwaxi(int address, unsigned *data);
eg:
rwaxi(0x7d4000000, &val);
![Page 42: P51: High Performance Networking - University of Cambridge](https://reader034.vdocument.in/reader034/viewer/2022042721/626794b2184e946f1432c92f/html5/thumbnails/42.jpg)
NetFPGA-Host Interaction
NetFPGA to host packet transfer
PCIe
Bus
2. Interrupt notifies driver of packet arrival
3. Driver sets up and initiates DMA transfer
1. Packet arrives –forwarding table sends to DMA queue
![Page 43: P51: High Performance Networking - University of Cambridge](https://reader034.vdocument.in/reader034/viewer/2022042721/626794b2184e946f1432c92f/html5/thumbnails/43.jpg)
NetFPGA-Host Interaction
NetFPGA to host packet transfer (cont.)
PCIe
Bus4. NetFPGA transfers packet via DMA
5. Interrupt signals completion of DMA
6. Driver passes packet to network stack
![Page 44: P51: High Performance Networking - University of Cambridge](https://reader034.vdocument.in/reader034/viewer/2022042721/626794b2184e946f1432c92f/html5/thumbnails/44.jpg)
NetFPGA-Host Interaction
Host to NetFPGA packet transfers
PCIe
Bus3. Interrupt signals completion of DMA
1. Software sends packet via network socketsPacket delivered to driver
2. Driver sets up and initiates DMA transfer
![Page 45: P51: High Performance Networking - University of Cambridge](https://reader034.vdocument.in/reader034/viewer/2022042721/626794b2184e946f1432c92f/html5/thumbnails/45.jpg)
NetFPGA-Host Interaction
Register access
PCIe
Bus
1. Software makes ioctlcall on network socket
ioctl passed to driver
2. Driver performs PCIememory read/write
![Page 46: P51: High Performance Networking - University of Cambridge](https://reader034.vdocument.in/reader034/viewer/2022042721/626794b2184e946f1432c92f/html5/thumbnails/46.jpg)
Section IV: Today’s Lab Session
![Page 47: P51: High Performance Networking - University of Cambridge](https://reader034.vdocument.in/reader034/viewer/2022042721/626794b2184e946f1432c92f/html5/thumbnails/47.jpg)
Today: Getting to know the NetFPGA Platform
• Starting point: experimenting with existing projects
• Then: Learning how to modify projects
• Follow the instructions in the handout
• 1-2 people per machine (2-3 people per pair of machines)