Melbourne Tutorial – November 4-5, 2010 1
NetFPGA Workshop Day 1
Presented by: Glen Gibb (Stanford University)
Hosted by: Gavin Buskes at Melbourne University
September 15 - 16, 2010
http://NetFPGA.org
Tutorial Outline
• Background
  – Introduction
  – The NetFPGA Platform
• The Stanford Base Reference Router
  – Motivation: Basic IP review
  – Demo 1: Reference Router running on the NetFPGA
• The Enhanced Reference Router
  – Motivation: Understanding buffer size requirements in a router
  – Demo 2: Observing and controlling the queue size
• How does the NetFPGA work
  – Utilities
  – Reference designs
  – Inside the NetFPGA hardware
• The Life of a Packet Through the NetFPGA
  – Hardware datapath
  – Interface to software: exceptions and host I/O
• Exercise: Drop Nth Packet
• Concluding Remarks
  – Using NetFPGA for research and teaching
Section I: Motivation
What is the NetFPGA? A line-rate, flexible, open networking platform for teaching and research
NetFPGA consists of four elements:
• NetFPGA board
• Tools + reference designs
• Contributed projects
• Community
NetFPGA Board
[Diagram: PC with NetFPGA – the board carries an FPGA, memory, and four 1GE ports, and connects over PCI to the host CPU and memory]
Networking software runs on a standard PC.
A hardware accelerator built with a Field Programmable Gate Array drives the Gigabit network links.
Tools + Reference Designs
Tools:
• Compile designs
• Verify designs
• Interact with hardware
Reference designs:
• Router (HW)
• Switch (HW)
• Network Interface Card (HW)
• Router Kit (SW)
• SCONE (SW)
Example Contributed Projects
  Project              Contributor
  OpenFlow switch      Stanford University
  Packet generator     Stanford University
  NetFlow Probe        Brno University
  NetThreads           University of Toronto
  zFilter (Sp)router   Ericsson
  Traffic Monitor      University of Catania
  DFA                  UMass Lowell
More projects: http://netfpga.org/foswiki/NetFPGA/OneGig/ProjectTable
Community
Wiki
• Documentation (slowly growing)
• Users are encouraged to contribute
Forums
• Support by users, for users
• Active community – tens to hundreds of posts per week
NetFPGA’s Defining Characteristics
• Line-rate
  – Processes back-to-back packets
    • Without dropping packets
    • At the full rate of Gigabit Ethernet links
  – Operating on packet headers
    • For switching, routing, and firewall rules
  – And on packet payloads
    • For content processing and intrusion prevention
• Open-source hardware
  – Similar to open-source software
    • Full source code available
    • BSD-style license
  – But harder, because
    • Hardware modules must meet timing
    • Verilog & VHDL components have more complex interfaces
    • Hardware designers need high confidence in the specification of modules
Test-Driven Design
• Regression tests
  – Have repeatable results
  – Define the supported features
  – Provide clear expectations of functionality
• Example: Internet router
  – Drops packets with a bad IP checksum
  – Performs longest prefix matching on the destination address
  – Forwards IPv4 packets of length 64–1500 bytes
  – Generates ICMP messages for packets with TTL <= 1
  – Defines how packets with IP options or non-IPv4 packets are handled
  … and dozens more …
Every feature is defined by a regression test.
Who, How, Why
Who uses the NetFPGA?
– Teachers
– Students
– Researchers
How do they use the NetFPGA?
– To run the Router Kit
– To build modular reference designs
  • IPv4 router
  • 4-port NIC
  • Ethernet switch, …
Why do they use the NetFPGA?
– To measure the performance of Internet systems
– To prototype new networking systems
What you will learn
• Overall picture of NetFPGA
• How the reference designs work
• How you can work on a project
  – NetFPGA design flow
  – Directory structure, library modules, and projects
  – How to utilize contributed projects
  – Interface/registers
  – How to verify a design (simulation and regression tests)
  – Things to do when you get stuck
AND… you can start your own projects!
Section II: Demo Basic Use
Basic Uses of NetFPGA
• Recap of the Internet Protocol and routing
• Demonstrate
  – How you can use the NetFPGA as a router
  – Routing in action
What is IP?
• IP (Internet Protocol)
  – Protocol used for communicating data across packet-switched networks
  – Divides data into a number of packets (IP packets)
• IP packet
  – Header (IP header) including:
    • Source IP address
    • Destination IP address
IP Header
[Diagram: packets on the wire – each is a header (Hdr) followed by data]
IPv4 header layout, 20 bytes before options, shown as 32-bit rows:
  Ver | HLen | T.Service | Total Packet Length
  Fragment ID | Flags | Fragment Offset
  TTL | Protocol | Header Checksum
  Source Address
  Destination Address
  Options (if any)
  Data
IP Address
• Used to uniquely identify a device (such as a computer) among all other devices on a network
• Two parts:
  – Identifier of a particular network on the Internet
  – Identifier of a particular device within that network
All packets, except those destined for the same network, first go to their gateway (router) and are carried to the destination via routers.
Basic Operation of an IP Router
[Diagram: hosts A, B, C and D, E, F connected through routers R1–R5; a packet travels from A toward D]
Forwarding table (for example):
  Destination | Next Hop
  D           | R3
  E           | R3
  F           | R5
What does a router do?
[Diagram: the same topology and forwarding table; the router reads the destination address from the 20-byte IP header, finds the next hop in the table, and forwards the packet]
What does a router do?
[Diagram: the topology again – the packet is forwarded hop by hop from router to router until it reaches its destination]
Basic Components of an IP Router
[Diagram: the control plane – routing protocols, routing table, management & CLI – runs in software; the datapath, performing per-packet processing with the forwarding table and switching, runs in hardware]
Per-packet processing in an IP Router
1. Accept a packet arriving on an incoming link.
2. Look up the packet's destination address in the forwarding table to identify the outgoing port(s).
3. Manipulate the IP header: e.g., decrement the TTL, update the header checksum.
4. Buffer the packet in the output queue.
5. Transmit the packet onto the outgoing link.
Generic Datapath Architecture
[Diagram: header processing – look up the IP address in the forwarding table (IP address → next hop) and update the header – followed by queueing the packet in buffer memory]
CIDR and Longest Prefix Matches
The IP address space is broken into line segments. Each line segment is described by a prefix. A prefix is of the form x/y, where x indicates the prefix of all addresses in the line segment and y indicates the length of the prefix in bits.
e.g. The prefix 128.9/16 represents the line segment containing addresses in the range 128.9.0.0 … 128.9.255.255.
[Diagram: the address line from 0 to 2^32−1 with segments 65/8, 128.9/16 (2^16 addresses, starting at 128.9.0.0), and 142.12/19; the address 128.9.16.14 lies inside 128.9/16]
Classless Interdomain Routing (CIDR)
[Diagram: nested prefixes on the address line – 128.9/16 contains 128.9.16/20 and 128.9.176/20, and 128.9.16/20 contains 128.9.19/24 and 128.9.25/24; the address 128.9.16.14 matches both 128.9/16 and 128.9.16/20]
Most specific route = “longest matching prefix”
Techniques for LPM in hardware
• Linear search
  – Slow
• Direct lookup
  – Currently requires too much memory
  – Updating a prefix leads to many changes
• Tries
  – Deterministic lookup time
  – Easily pipelined, but require multiple memories/references
• TCAM (Ternary CAM)
  – Simple and widely used, but lower density than RAM and higher power consumption
  – Gradually being replaced by algorithmic methods
An IP Router on NetFPGA
[Diagram: the router split across the hardware/software boundary – routing protocols, the routing table, management & CLI, and exception processing run as Linux user-level processes; the forwarding table and switching run in Verilog on the NetFPGA PCI board]
NetFPGA Router
Function
– 4 Gigabit Ethernet ports
Fully programmable
– FPGA hardware
Low cost
Open-source FPGA hardware
– Verilog base design
Open-source software
– Drivers in C and C++
Demo 1
Reference Router running on the NetFPGA
Hardware Setup for Demo #1
[Diagram: a video server (dual-CPU PC with a PCI-e NIC) feeds a chain of NetFPGA Internet routers – each a dual-CPU PC with a NetFPGA board in a PCI slot and four GE ports – ending at a video display; the same PCs also run the CAD tools]
The server delivers streaming HD video through a chain of NetFPGA routers.
Topology
[Diagram: ring of NetFPGA routers; link subnets are labeled .1.x through .30.x, with host links .2.1, .5.1, …, .29.1; the video server and video client sit on the ring, with the shortest path between them highlighted]
Working IP Router
• Objectives
  – Become familiar with the Stanford Reference Router
  – Observe PW-OSPF re-routing traffic around a failure
Step 1 – Observe the Routing Tables
The router is already configured and running on your machines.
The routing table has converged to the routing decisions with the minimum number of hops.
Next, break a link …
Step 2 - Dynamic Re-routing
[Diagram: ring of ten NetFPGA routers, numbered 0–9; eth1 of each host PC has an address 192.168.X.Y, and the link subnets run from 192.168.3.* through 192.168.30.*]
Any PC can stream traffic through multiple NetFPGA routers in the ring topology to any other PC.
Example: to stream video from server 4.1, type: ./play 192.168.4.1
Step 3 - Dynamic Re-routing
Break the link between the video server and the video client.
The routers re-route traffic around the broken link and the video continues playing.
[Diagram: the ring topology with the broken link marked and traffic taking the path around the other side of the ring]
Section III: Demo Advanced Use
Advanced Uses of NetFPGA
• Introduction to TCP and buffer sizes
• Demonstrate
  – NetFPGA used for real-time measurement
  – The TCP sawtooth in real time
Buffer Requirements in a Router
Buffer size matters:
– Small queues reduce delay
– Large buffers are expensive
Theoretical tools predict requirements:
– Queuing theory
– Large deviation theory
– Mean field theory
Yet there is no direct answer:
– Flows have a closed-loop nature
– It is unclear whether the focus should be on the equilibrium state or the transient state
Rule-of-thumb
• Universally applied rule-of-thumb:
  – A router needs a buffer of size B = 2T × C
  – 2T is the two-way propagation delay (or just 250ms)
  – C is the capacity of the bottleneck link
• Context
  – Mandated in backbone and edge routers
  – Appears in RFPs and IETF architectural guidelines
  – Already known by the inventors of TCP [Van Jacobson, 1988]
  – Has major consequences for router design
[Diagram: source → router → destination; the bottleneck link has capacity C and the round trip takes 2T]
The Story So Far
  Buffer size rule      # packets at 10Gb/s
  2T×C                  1,000,000
  2T×C / √n  (1)        10,000
  O(log W)   (2)        20
(1) Assume: large number of desynchronized flows; 100% utilization
(2) Assume: large number of desynchronized flows; <100% utilization
Exploring Buffer Sizes
• Need to reduce the buffer size and measure occupancy
• Not possible in commercial routers
• So, we will use the NetFPGA instead
Objective:
– Use the NetFPGA to understand how large a buffer we need for a single TCP flow.
Why 2T×C for a Single TCP Flow?
Rule for adjusting the congestion window W:
– If an ACK is received: W ← W + 1/W
– If a packet is lost: W ← W/2
Only W packets may be outstanding.
Animations: http://guido.appenzeller.net/anims/
Time Evolution of a Single TCP Flow
[Plots: time evolution of a single TCP flow through a router, with buffer < 2T×C and with buffer = 2T×C]
Demo 2
Buffer Sizing Experiments using the NetFPGA Router
Hardware Setup for Demo #2
[Diagram: a video server (dual-CPU PC with a PCI-e NIC) connects by GE to a video client – a dual-CPU PC with a NIC and a NetFPGA Internet router in a PCI slot, with four GE ports]
The server delivers streaming HD video to the adjacent client.
Topology
• eth1 connects your host to your NetFPGA Router
• nf2c2 routes to nf2c1 (your adjacent server)
• eth2 serves web and video traffic to your neighbor
• nf2c0 & nf2c3 (the network ring) are unused
[Diagram: neighboring hosts paired through their NetFPGA routers; link subnets are labeled .1.x through .30.x]
This configuration allows you to modify and test your router without affecting others.
Enhanced Router
Objectives
– Observe the router with new modules
– New modules: rate limiting, event capture
Execution
– Run the event capture router
– Look at the routing tables
– Explore the Details pane
– Start a TCP transfer, look at queue occupancy
– Change the rate, look at queue occupancy
Step 1 - Run the Pre-made Enhanced Router
Start a terminal and cd to “netfpga/projects/tutorial_router/sw/”
Type “./tut_adv_router_gui.pl”
A familiar GUI should start.
Step 2 - Explore the Enhanced Router
Click on the Details tab.
A pipeline similar to the one seen previously is shown, with some additions.
Enhanced Router Pipeline
Two modules added:
1. Event Capture – to capture output queue events (writes, reads, drops)
2. Rate Limiter – to create a bottleneck
[Diagram: the reference pipeline – MAC/CPU RxQs → Input Arbiter → Output Port Lookup → Output Queues → MAC/CPU TxQs – with the Rate Limiter and Event Capture modules inserted after the Output Queues]
Step 3 - Decrease the Link Rate
To create a bottleneck and show the TCP “sawtooth,” the link rate is decreased.
In the Details tab, click the “Rate Limit” module.
Check Enabled.
Set the link rate to 1.953Mbps.
Step 4 – Decrease the Queue Size
Go back to the Details panel and click on “Output Queues.”
Select the “Output Queue 2” tab.
Change the “output queue size in packets” slider to 16.
Step 5 - Start Event Capture
Click on the Event Capture module under the Details tab
This should start the configuration page
Step 6 - Configure Event Capture
Check “Send to local host” to receive events on the local host.
Check “Monitor Queue 2” to monitor the output queue of MAC port 1.
Check “Enable Capture” to start event capture.
Step 7 - Start a TCP Transfer
We will use iperf to run a large TCP transfer and look at the queue evolution.
Start a terminal and cd to “netfpga/projects/tutorial_router/sw”
Type “./iperf.sh”
Step 8 - Look at Event Capture Results
Click on the Event Capture module under the Details tab.
The sawtooth pattern should now be visible.
Queue Occupancy Charts
Leave the control windows open
Observe the TCP/IP sawtooth
Section IV: How does the NetFPGA Work
Integrated Circuit Technology
Full-custom design
– Complementary Metal Oxide Semiconductor (CMOS)
Semi-custom ASIC design
– Gate array
– Standard cell
Programmable logic devices
– Programmable Array Logic
– Field Programmable Gate Arrays
Processors
Look-Up Tables
Combinatorial logic is stored in Look-Up Tables (LUTs)
– Also called Function Generators (FGs)
– Capacity is limited only by the number of inputs, not by complexity
– Delay through the LUT is constant
[Diagram: a 4-input LUT implementing Z = f(A, B, C, D), with its truth table. Diagram from: Xilinx, Inc.]
Xilinx CLB Structure
Each slice has four outputs
– Two registered outputs, two non-registered outputs
– Two BUFTs associated with each CLB, accessible by all 16 CLB outputs
Carry logic runs vertically
– Signals run upward
– Two independent carry chains per CLB
[Diagram: a slice containing two LUT + carry chains feeding D flip-flops with CE, PRE, and CLR. Diagram from: Xilinx, Inc.]
Field Programmable Gate Arrays
CLB
– Primitive element of the FPGA
Routing module
– Global routing
– Local interconnect
Macro blocks
– Block memories
– Microprocessor
I/O block
NetFPGA Package
• Utilities – Simulation – Synthesis – Registers
• Verilog Libraries (shared modules)
• Projects (reference and contributed)
Simulation and Synthesis
• Simulation (nf_run_test.pl)
  – Allows simulation from the command line or GUI
  – Uses backend libraries (Perl and Python) to create packets for simulation
• Synthesis (make)
  – Run in the project's synth directory
  – Automatically includes Xilinx Coregen components from the shared libraries
  – Includes all Xilinx Coregen components (.xco) from the project's synth directory
Shared Verilog Libraries (modules)
• Located at netfpga/lib/verilog
• Specify shared libraries in project.xml
  – Any project can use any module
• A local module in a project's src directory overrides a shared library
  – If arp_reply is found both in the shared library and in the project's src directory, only the project's src version is used
Register System
• Project XML (project.xml)
  – Found in the project/include directory
  – Specifies shared libraries and the location of registers in the pipeline
• Each module with registers has an XML file
  – Specifies the register names and widths
• Register files are automatically created using nf_register_gen.pl
  – Perl header files
  – C header files
  – Verilog file defining registers
Reference Projects
• Easily extend and add modules
• Currently – Reference NIC – Reference Router – Reference Switch – Router Kit – Router Buffer Sizing
Full System Components
[Diagram: software (the nf2c0–nf2c3 network interfaces and ioctl) talks over the PCI bus to the NetFPGA; on the board, nf2_reg_grp and the user data path sit between four CPU RxQ/TxQ pairs and four MAC RxQ/TxQ pairs, with the MACs connected to Ethernet]
Reference Router Pipeline
• Five stages
  – Input
  – Input arbitration
  – Routing decision and packet modification
  – Output queuing
  – Output
• Packet-based module interface
• Pluggable design
[Diagram: MAC/CPU RxQs → Input Arbiter → Output Port Lookup → Output Queues → MAC/CPU TxQs]
Section V: Life of a Packet Through Hardware
Life of a Packet through the Hardware
[Diagram: a packet travels from a host on 192.168.1.x at port 0 to a host on 192.168.2.y at port 2]
Inter-Module Communication
Packets are passed between modules as 64-bit data words, each qualified by an 8-bit ctrl word, and are preceded by one or more “module headers.”
– Module headers contain information such as packet length, input port, and output port, each tagged by a distinct non-zero ctrl value
– Packet data words carry ctrl = 0
– The last word of the packet carries a non-zero ctrl value (e.g. 0x10)
[Diagram: module header x, …, last module header y, then the Eth header, IP header, and payload words, ending with the last word of the packet]
Inter-Module Communication
[Diagram: adjacent modules connect with a simple FIFO-like interface – data and ctrl flow downstream, qualified by a wr strobe, while the downstream module asserts rdy when it can accept a word]
MAC Rx Queue
[Diagram: the packet enters through the MAC Rx queue, which prepends a module header]
Rx queue output:
– Module header (ctrl 0xff): pkt length, input port = 0
– Eth Hdr: Dst MAC = port 0, Ethertype = IP
– IP Hdr: IP Dst: 192.168.2.3, TTL: 64, Csum: 0x3ab4
– Data
Input Arbiter
[Diagram: packets waiting in the Rx queues are selected one at a time by the input arbiter and sent on to the Output Port Lookup]
Output Port Lookup
Steps:
1. Check that the input port matches the Dst MAC
2. Check TTL and checksum
3. Look up the next-hop IP & output port (LPM)
4. Look up the next-hop MAC address (ARP)
5. Add the output port to the module header
6. Modify the MAC Dst and Src addresses
7. Decrement the TTL and update the checksum
Before:
– Module header (ctrl 0xff): pkt length, input port = 0
– EthHdr: Dst MAC = 0, Src MAC = x, Ethertype = IP
– IP Hdr: IP Dst: 192.168.2.3, TTL: 64, Csum: 0x3ab4
After:
– Module header: pkt length, input port = 0, output port = 4
– EthHdr: Dst MAC = nextHop, Src MAC = port 4, Ethertype = IP
– IP Hdr: IP Dst: 192.168.2.3, TTL: 63, Csum: 0x3ac2
Output Queues
[Diagram: the packet is written into output queue 4 (OQ4); OQ0 … OQ7 shown]
MAC Tx Queue
[Diagram: the Tx queue removes the module headers and hands the packet to the MAC for transmission]
– EthHdr: Dst MAC = nextHop, Src MAC = port 4, Ethertype = IP
– IP Hdr: IP Dst: 192.168.2.3, TTL: 63, Csum: 0x3ac2
– Data
Exception Packet
• Example: TTL = 0 or TTL = 1
• The packet has to be sent to the CPU, which will generate an ICMP packet as a response
• The difference starts at the Output Port Lookup stage
Exception Packet Path
[Diagram: the packet enters through a MAC RxQ and traverses the user data path, but is steered into a CPU TxQ and crosses the PCI bus to software via the nf2cX interfaces]
Output Port Lookup (exception path)
1. Check that the input port matches the Dst MAC
2. Check TTL and checksum – EXCEPTION!
3. Add the output port to the module header
Before:
– Module header (ctrl 0xff): pkt length, input port = 0
– EthHdr: Dst MAC = 0, Src MAC = x, Ethertype = IP
– IP Hdr: IP Dst: 192.168.2.3, TTL: 1, Csum: 0x3ab4
After:
– Module header: pkt length, input port = 0, output port = 1
(the packet is otherwise unmodified)
Output Queues
[Diagram: the exception packet is written into output queue 1 (OQ1), the CPU queue for port 0; OQ0, OQ2 … OQ7 shown]
CPU Tx Queue
[Diagram: the CPU Tx queue holds the packet until the host reads it across the PCI bus]
– Module header (ctrl 0xff): pkt length, input port = 0, output port = 1
– EthHdr: Dst MAC = 0, Src MAC = x, Ethertype = IP
– IP Hdr: IP Dst: 192.168.2.3, TTL: 1, Csum: 0x3ab4
– Data
ICMP Packet
• The ICMP reply generated by the CPU arrives at the CPU Rx queue from the PCI bus
• It follows the same path as a packet from the MAC until it reaches the Output Port Lookup
• The OPL module sees that the packet is from CPU Rx Queue 1 and sets the output port directly to 0
• The packet then continues on the same path as a non-exception packet, through the Output Queues and then MAC Tx queue 0
ICMP Packet Path
[Diagram: the ICMP reply crosses the PCI bus from software into CPU RxQ 1, passes through the user data path, and leaves through MAC TxQ 0]
NetFPGA-Host Interaction
• The Linux driver interfaces with the hardware
  – Packet interface via the standard Linux network stack
  – Register reads/writes via the ioctl system call, with wrapper functions:
    • readReg(nf2device *dev, int address, unsigned *rd_data);
    • writeReg(nf2device *dev, int address, unsigned *wr_data);
    e.g.: readReg(&nf2, OQ_NUM_PKTS_STORED_0, &val);
NetFPGA-Host Interaction
NetFPGA to host packet transfer:
1. Packet arrives – the forwarding table sends it to a CPU queue
2. Interrupt notifies the driver of the packet arrival
3. Driver sets up and initiates a DMA transfer
NetFPGA to host packet transfer (cont.):
4. NetFPGA transfers the packet via DMA
5. Interrupt signals completion of the DMA
6. Driver passes the packet to the network stack
Host to NetFPGA packet transfer:
1. Software sends a packet via a network socket; the packet is delivered to the driver
2. Driver sets up and initiates a DMA transfer
3. Interrupt signals completion of the DMA
Register access:
1. Software makes an ioctl call on a network socket; the ioctl is passed to the driver
2. Driver performs a PCI memory read/write
• The packet transfers shown use the DMA interface
• Alternative: use programmed I/O to transfer packets via register reads/writes
  – Slower, but eliminates the need to deal with network sockets
Section VI: Exercise
Drop 1 in N Packets
Objectives
– Add a counter and FSM to the code
– Synthesize and test the router
Execution
– Open drop_nth_packet.v
– Insert the counter code
– Synthesize
– After synthesis, test the new system
New Reference Router Pipeline
One module added:
1. Drop Nth Packet – drops every Nth packet from the reference router pipeline
[Diagram: the enhanced pipeline with the Drop Nth Packet module inserted alongside the Output Queues, Rate Limiter, and Event Capture]
Step 1 - Open the Source
We will modify the Verilog source code to add a counter to the drop_nth_packet module.
Open a terminal.
Type “xemacs netfpga/projects/tutorial_router/src/drop_nth_packet.v”
Step 2 - Add the Counter to the Module
Add the counter using the following signals:
• counter
  – 16-bit output signal that you should increment on each packet pulse
• rst_counter
  – reset signal (a pulse input)
• inc_counter
  – increment signal (a pulse input)
Search for “insert counter” (ctrl+s insert counter, Enter).
Insert the counter and save (ctrl+x ctrl+s).
Step 3 - Build the Hardware
Start a terminal and cd to “netfpga/projects/tutorial_router/synth”
Run “make clean”
Start synthesis with “make”
Step 4 – Test your Router
You can watch the number of received and sent packets to see the module drop every Nth packet. Ping a local machine (e.g. 192.168.7.1) and watch for missing pings.
To run your router:
1. Enter the directory by typing: cd netfpga/projects/tutorial_router/sw
2. Run the router by typing: ./tut_adv_router_gui.pl --use_bin ../../../bitfiles/tutorial_router.bit
To set the value of N (which packet to drop), type: regwrite 0x2000704 N – replace N with a number (such as 100)
To enable packet dropping, type: regwrite 0x2000700 0x1
To disable packet dropping, type: regwrite 0x2000700 0x0
Step 5 – Measurements
• Determine the iperf TCP throughput to your neighbor's server for each of several values of N
  – Similar to Demo 2, Step 8:
    • cd netfpga/projects/tutorial_router/sw
    • ./iperf.sh
  – Ping 192.168.x.2 (where x is your neighbor's server)
  – TCP throughput with:
    • Drop circuit disabled – TCP throughput = ________ Mbps
    • Drop one in N = 1,000 packets – TCP throughput = ________ Mbps
    • Drop one in N = 100 packets – TCP throughput = ________ Mbps
    • Drop one in N = 10 packets – TCP throughput = ________ Mbps
• Explain why TCP's throughput is so low given that only a tiny fraction of packets are lost
Section VII: Concluding Remarks
NetFPGAs are used:
• To run laboratory courses on network routing
  – Professors teach courses (Stanford, Cambridge, Rice, ...)
• To teach students how to build real Internet routers
  – Train students to build routers (Cisco, Juniper, Huawei, ...)
• To research new features in the network
  – Build network services for data centers (Google, UCSD, ...)
• To prototype systems with live traffic
  – That measure buffers (while maintaining throughput, ...)
• To help hardware vendors understand device requirements
  – Use of hardware (Xilinx, Micron, Cypress, Broadcom, ...)
Usage #1: Running the Router Kit
User-space development, 4x1GE line-rate forwarding.
[Diagram: routing protocols (OSPF, BGP, “My Protocol”) and the kernel routing table run on the host CPU; the Router Kit “mirrors” the routing table into the FPGA's forwarding table, which forwards between the four 1GE ports using the packet buffer]
Usage #2: Enhancing Modular Reference Designs
[Diagram: on the host, the NetFPGA driver, an extensible Java GUI front panel, and PW-OSPF; on the FPGA, Verilog modules interconnected by FIFO interfaces – L2 Parse, L3 Parse, IP Lookup, input/output queue management – plus “My Block”]
Verilog EDA tools (Xilinx, Mentor, etc.): 1. Design 2. Simulate 3. Synthesize 4. Download
Usage #3: Creating New Systems
[Diagram: the FPGA carries “My Design” with the four 1GE ports; even the 1GE MAC is soft and replaceable. The host runs the NetFPGA driver]
Verilog EDA tools (Xilinx, Mentor, etc.): 1. Design 2. Simulate 3. Synthesize 4. Download
NetFPGA Platform Major Components
– Interfaces
  • 4 Gigabit Ethernet ports
  • PCI host interface
– Memories
  • 36 Mbit static RAM
  • 512 Mbit DDR2 dynamic RAM
– FPGA resources
  • Block RAMs
  • Configurable Logic Blocks (CLBs)
  • Memory-mapped registers
NetFPGA Cube Systems
• PCs assembled from parts
  – Stanford University
  – Cambridge University
• Pre-built systems available
  – Accent Technology Inc.
• Details are in the Guide: http://netfpga.org/static/guide.html
Rackmount NetFPGA Servers
The NetFPGA inserts into a PCI or PCI-X slot.
2U server (Dell 2950) – thanks to Brian Cashman for providing the machine
1U server (Accent Technology Inc.)
Stanford NetFPGA Cluster
Statistics
• Rack of 40 1U PCs with NetFPGAs
• Managed power, console, and LANs
• Provides 4*40 = 160 Gbps of full line-rate processing bandwidth
Acknowledgments
NetFPGA Team at University of Cambridge (Past and Present):
Andrew Moore, David Miller, Martin Zadnik
NetFPGA Team at Stanford University (Past and Present):
Nick McKeown, Glen Gibb, Jad Naous, David Erickson, G. Adam Covington, John W. Lockwood, Jianying Luo, Brandon Heller, Paul Hartke, Neda Beheshti, Sara Bolouki, James Zeng, Jonathan Ellithorpe, Sachidanandan Sambandan
All community members (including but not limited to):
Paul Rodman (Google), Kumar Sanghvi, Wojciech A. Koszek (Xilinx/FreeBSD), Yashar Ganjali (University of Toronto), Martin Labrecque (University of Toronto), Jeff Shafer (Rice University), Eric Keller (Princeton), Tatsuya Yabe (NEC/Stanford), Bilal Anwer (Georgia Tech)
Special thanks to our partners:
Patrick Lysaght, Veena Kumar, Paul Hartke, Anna Acevedo – Xilinx University Program (XUP)
Past NetFPGA tutorials presented at: SIGMETRICS
See: http://NetFPGA.org/tutorials/
Thanks to our Sponsors:
• Support for the NetFPGA project has been provided by the following companies and institutions.
Disclaimer: Any opinions, findings, conclusions, or recommendations expressed in these materials do not necessarily reflect the views of the National Science Foundation or of any other sponsors supporting this project.