Post on 21-Dec-2015
Shyamal Pandya
Implementation of Network Processor Packet Filtering and Parameterization for Higher Performance Network Processors
Implementation of IXP1200 Network Processor Packet Filtering Software and
Parameterization for Higher Performance Network Processors
Shyamal H. Pandya
Agenda
• Introduction and Goal of the Thesis
• Brief description of IXP1200 Network Processor and the ENP-2505 ESB
• Software Environment
• Packet Filter Design
• Implementation
• Tests, Results and Parameterization
• Conclusion
Introduction
• Network Processors
– A class of programmable processors designed for network applications
– Flexible and efficient alternative to ASICs and General Purpose Processors
– Employ several architectural features to achieve their design goals:
• A number of processing elements
• Intelligent and fast memory units and buses
• Instruction set architecture specifically tailored for packet processing operations
– Examples: Intel IXP1200, IBM PowerNP series, Vitesse IQ2200
IXP1200
• Belongs to the IXP family of Network Processors from Intel (IXP1200, IXP2400, IXP2800)
• Major Components
– Intel StrongARM core processor
– Six programmable RISC microengines
• 4 hardware contexts per microengine
• instruction set tailored to suit network applications
– Memory Units
• 32-bit SRAM unit supporting up to 8 MB
• 64-bit SDRAM unit supporting up to 256 MB
• 8 KB of 32-bit Scratchpad Memory
Goal
• Network Processors are targeted towards network applications, e.g. routers, VoIP, intrusion detection, packet filtering.
• These applications are characterized by the need to process packets at extremely fast rates to keep up with the speed of network traffic.
• Goal: to investigate the programmability of the IXP1200 through the design and implementation of a packet filter.
• Linux IP Tables - the Linux packet filtering framework, chosen as the basis of our packet filter.
• Parameterization - based on the experience with the packet filter implementation on the IXP1200, the architectural enhancements of the IXP2400, a higher performance network processor of the same family, are analyzed to estimate their benefits.
IXP1200 in more Detail
IXP1200 in Operation
ENP-2505
• ESB based on IXP1200
• Pluggable in a PCI slot of a host computer
• Supports four 10/100 Mbps Ethernet ports
• 8 MB SRAM, 256 MB SDRAM
• StrongARM core processor and Microengines operate at 232 MHz
• 8 MB of flash memory that holds a RAM disk.
ENP-2505 and Host Setup
Programming Model
• The ACE framework - A software framework to design applications that consist of isolated software components performing well-defined tasks
– An ACE encapsulates the tasks or modules performing independent packet processing functions
– One or more input targets and one or more output targets
– Packets arrive at the input targets, are processed within the ACE and are transmitted through one of its output targets
– An ACE can be bound to another by binding its output target to the other’s input target
– An application is comprised of several ACEs bound to each other
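The binding idea described above can be pictured with a small Python model. This is only an illustrative sketch, not the ACE API from the SDK: the `ACE` class, its `bind` and `input` methods, and the target names `"next"` and `"tx"` are all hypothetical.

```python
from typing import Callable, Dict, List, Optional


class ACE:
    """Toy stand-in for an ACE: one input target, named output targets."""

    def __init__(self, name: str, process: Callable[[bytes], Optional[str]]):
        self.name = name
        # process() returns the name of the output target to use, or None to drop
        self.process = process
        self.outputs: Dict[str, "ACE"] = {}
        self.received: List[bytes] = []

    def bind(self, target: str, downstream: "ACE") -> None:
        """Bind one of this ACE's output targets to another ACE's input target."""
        self.outputs[target] = downstream

    def input(self, packet: bytes) -> None:
        """Packets arrive here, are processed, and leave through an output target."""
        self.received.append(packet)
        target = self.process(packet)
        if target is not None and target in self.outputs:
            self.outputs[target].input(packet)


# An application is a set of ACEs bound to each other.
ingress = ACE("ingress", lambda pkt: "next")
forwarder = ACE("forwarder", lambda pkt: "tx" if pkt else None)  # drops empty packets
egress = ACE("egress", lambda pkt: None)

ingress.bind("next", forwarder)
forwarder.bind("tx", egress)

ingress.input(b"\x45\x00")
ingress.input(b"")
```

Each component only knows its own output targets, so the chain can be rewired by changing bindings rather than component code.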
Example ACE Application (Packet Forwarder)
MicroACE
• An extension to the ACE model: part of the ACE implemented on core processor, other part on the microengines
• Microblock performs fast path packet processing
• Core component a conventional ACE, manages the microblock
• MicroACE model can be exploited to divide the tasks between the microengines and the core processor
Forwarding Application using MicroACEs
Packet Filter Design
• IP Tables
– Packet filtering Infrastructure for the Linux OS
– A set of modules that maintain tables of rules
– A rule contains a specification, in terms of values that header fields must match, and a target (ACCEPT/DROP)
– Tables correspond to the kind of manipulation a packet undergoes - e.g. filter table, NAT table etc.
– A table contains a number of chains, each chain to be traversed at a particular point in the packet's path, e.g. INPUT, OUTPUT, FORWARD
– Extensibility - each rule has at a minimum specs for IP header matching. More examination can be specified by adding match structures, e.g. the tcp_match structure has specifications for matching a packet's TCP header.
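The table / chain / rule / match hierarchy described above could be modelled roughly as follows. This is a simplified sketch and not the actual IP Tables structures: the `Rule` and `Table` classes, the dictionary-based packet, and the `tcp_match` helper signature are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Packet = Dict[str, object]  # hypothetical packet model: header field -> value


@dataclass
class Rule:
    ip_spec: Dict[str, object]   # IP header values that must match
    target: str                  # "ACCEPT" or "DROP"
    matches: List[Callable[[Packet], bool]] = field(default_factory=list)


@dataclass
class Table:
    name: str
    chains: Dict[str, List[Rule]] = field(default_factory=dict)


def tcp_match(dport: int) -> Callable[[Packet], bool]:
    """Extension match: checks a TCP header field in addition to the IP specs."""
    def match(pkt: Packet) -> bool:
        return pkt.get("proto") == "tcp" and pkt.get("dport") == dport
    return match


filter_table = Table("filter", {"INPUT": [], "OUTPUT": [], "FORWARD": []})
filter_table.chains["FORWARD"].append(
    Rule(ip_spec={"src": "10.0.0.1"}, target="DROP", matches=[tcp_match(23)])
)
filter_table.chains["FORWARD"].append(Rule(ip_spec={}, target="ACCEPT"))
```

Keeping extension matches as a list on the rule mirrors the extensibility point: a rule with an empty list checks only the IP header.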
Packet Filter Design - Data Structures
Packet Filter Design - Algorithm
• For each rule in the chain of interest
– match packet IP header against the specs in the rule. If the match succeeds, look for other match structures in the rule.
– match the packet against each match structure found in the rule. If the packet satisfies all matches, the packet has successfully matched the rule.
– For a successful match, look at the target of the rule
• if the target is ACCEPT, let the packet pass
• if the target is DROP, drop the packet and free its resources
– For unsuccessful match, go to the next rule and repeat the process
• last rule matches all packets. Target specified is default policy
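The traversal above can be sketched in a few lines of Python. This is a host-side illustration only, not the microcode: rules are plain dictionaries, packets are dictionaries of header fields, and the names `ip_header_matches`, `filter_packet` and `tcp_dport` are hypothetical.

```python
def ip_header_matches(spec, pkt):
    """True if every IP header field named in the rule's spec matches the packet."""
    return all(pkt.get(key) == value for key, value in spec.items())


def filter_packet(chain, pkt):
    """Walk the chain in order and return the target of the first matching rule."""
    for rule in chain:
        if not ip_header_matches(rule["ip"], pkt):
            continue  # unsuccessful match: go to the next rule
        if all(match(pkt) for match in rule.get("matches", [])):
            return rule["target"]  # "ACCEPT" or "DROP"
    # Not reached if the chain ends with a catch-all policy rule.
    raise ValueError("chain has no catch-all policy rule")


def tcp_dport(port):
    """Extension match on the TCP destination port."""
    return lambda pkt: pkt.get("proto") == "tcp" and pkt.get("dport") == port


forward_chain = [
    {"ip": {"src": "10.0.0.1"}, "matches": [tcp_dport(23)], "target": "DROP"},
    {"ip": {}, "target": "ACCEPT"},  # last rule: matches everything, default policy
]

verdict = filter_packet(forward_chain, {"src": "10.0.0.1", "proto": "tcp", "dport": 23})
```

Because the last rule has an empty spec and no extension matches, every packet receives a verdict and the default policy needs no special case.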
Implementation
• Task Division between the core processor and the microengines
– Data Plane (Microengines): Ingress, Filtering, Forwarding, Egress.
– Control Plane (Core): Filter table, route table management.
– Management Plane (Core): User Interface, Deployment
• Chains - INPUT, OUTPUT, FORWARD
– INPUT and OUTPUT chains are traversed infrequently
– FORWARD chain is used most frequently, hence implemented on microengines
• Software Components
– Ingress, Egress, Forwarder MicroACEs and Stack ACE. Provided as part of SDK.
– PacketFilter MicroACE - Designed and Implemented as part of the thesis.
Implementation
Application Design in terms of MicroACEs
Implementation
• User Interface - iptables command
– used to manipulate filter table by adding, deleting, inserting, replacing rules
– an executable and libraries implement the user interface
– Algorithm
• parse the command line, validate all the options and arguments
• obtain a local copy of the filter table by making a cross-call to the PacketFilter core component
• modify the local copy according to the command
• make a cross-call to the PacketFilter core component to replace old filter table with the new one, passing the modified filter table as argument
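The four steps above amount to a read-modify-replace cycle. The sketch below models it in Python under several assumptions: `PacketFilterCore` is a hypothetical in-process stand-in for the core component, `get_table` is an invented name for the "obtain a copy" cross-call (only `do_replace` is named in the slides), and the parser accepts just one simplified command form.

```python
import copy


class PacketFilterCore:
    """Hypothetical stand-in for the PacketFilter core component."""

    def __init__(self):
        # FORWARD chain starts with only the catch-all policy rule.
        self._table = {"FORWARD": [{"src": None, "target": "ACCEPT"}]}

    def get_table(self):
        """Models the cross-call that hands the caller a local copy."""
        return copy.deepcopy(self._table)

    def do_replace(self, new_table):
        """Models the cross-call that swaps in a complete new filter table."""
        self._table = copy.deepcopy(new_table)


def run_command(core, argv):
    # Step 1: parse the command line and validate options and arguments.
    if len(argv) != 6 or argv[0] != "-A" or argv[2] != "-s" or argv[4] != "-j":
        raise ValueError("usage: -A <chain> -s <source> -j <target>")
    chain, source, target = argv[1], argv[3], argv[5]
    if target not in ("ACCEPT", "DROP"):
        raise ValueError("unknown target: " + target)
    # Step 2: obtain a local copy of the filter table.
    table = core.get_table()
    if chain not in table:
        raise ValueError("unknown chain: " + chain)
    # Step 3: modify the local copy (append just before the policy rule).
    rules = table[chain]
    rules.insert(len(rules) - 1, {"src": source, "target": target})
    # Step 4: replace the old table with the modified one.
    core.do_replace(table)


core = PacketFilterCore()
run_command(core, ["-A", "FORWARD", "-s", "10.0.0.1", "-j", "DROP"])
```

Validating before the first cross-call means a malformed command never touches the table, and replacing the table as a whole avoids exposing a half-edited rule set.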
Implementation
• PacketFilter Core Component
– Initialization
• Control Data Structures, filter table allocation in SRAM, patching filter table address to microcode
– Cross-call Interface
• function do_replace, used by user interface to replace the current filter table with a new filter table in the SRAM
Implementation
• Microcode - Each microengine can run more than one microblock
• Flow of control is governed by a dispatch loop running on each enabled microengine
• Microblock partitioning across microengines
Implementation
• Dispatch Loops - Microengine 0
– Initialize the Ingress and PacketFilter Microblocks
– In an infinite loop, do the following
• Call Ingress Microblock
• If a packet has arrived, call the PacketFilter Microblock, else if there is an exception, queue the packet for Ingress core component, else continue from beginning of the loop
• If PacketFilter microblock returns ACCEPT, queue the packet for Microengine 2, running the Forwarder
• If PacketFilter microblock returns DROP, drop the packet
– Every SA_CONSUME_NUM times around the loop, poll the Core to ME packet queue for packets from core components. If there is a packet, determine its source (Ingress core or PacketFilter Core) and call the corresponding microblock
– SA_CONSUME_NUM - tunable parameter to control frequency of memory accesses w.r.t. Core to ME packet queue
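The control flow of the Microengine 0 dispatch loop can be modelled in Python as below. This is a behavioural sketch, not microcode: the loop is bounded by an `iterations` argument instead of running forever, the value 4 for `SA_CONSUME_NUM` is an arbitrary assumption, and the callables and queue names are hypothetical.

```python
from collections import deque

SA_CONSUME_NUM = 4  # assumed value; the slides only say it is tunable


def dispatch_loop_me0(ingress, packet_filter, core_to_me, iterations):
    """Bounded model of the loop; the real one runs forever."""
    out = {"to_forwarder": [], "to_ingress_core": [], "dropped": [], "from_core": []}
    for i in range(1, iterations + 1):
        status, pkt = ingress()                    # call the Ingress microblock
        if status == "packet":
            if packet_filter(pkt) == "ACCEPT":     # call the PacketFilter microblock
                out["to_forwarder"].append(pkt)    # queue for Microengine 2
            else:
                out["dropped"].append(pkt)
        elif status == "exception":
            out["to_ingress_core"].append(pkt)     # queue for the Ingress core component
        # Poll the Core-to-ME queue only every SA_CONSUME_NUM trips, so that the
        # fast path pays for that memory access only occasionally.
        if i % SA_CONSUME_NUM == 0 and core_to_me:
            out["from_core"].append(core_to_me.popleft())
    return out


arrivals = deque([("packet", "p1"), ("none", None), ("exception", "p2"), ("packet", "p3")])
result = dispatch_loop_me0(
    ingress=lambda: arrivals.popleft() if arrivals else ("none", None),
    packet_filter=lambda pkt: "DROP" if pkt == "p3" else "ACCEPT",
    core_to_me=deque(["c1", "c2"]),
    iterations=8,
)
```

Raising `SA_CONSUME_NUM` lowers the polling overhead on the fast path at the cost of higher latency for packets coming from the core components.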
Implementation
• Dispatch Loops - Microengine 2
– Initialize the Forwarder Microblock
– In an infinite loop, do the following
• Poll the packet queue from Microengine 0 to see if there is a packet.
• If packet available, call the Forwarder microblock, else continue from the beginning
• If Forwarder microblock returns success, queue the packet for microengine 5 to be scheduled for output, else if it returns an exception, queue the packet for the core component, else drop the packet
– Poll the Core to ME packet buffer every SA_CONSUME_NUM times, and if there is a packet from the core component, call the microblock
Implementation
• Dispatch Loops - Microengine 5
– Initialize Egress microblock
– 4 output queues, contain packets for each output port
– Context 0 polls the 4 output queues in a round-robin manner
– Contexts 1-3 fill up the TFIFO with data from the current packet to be transmitted
• PacketFilter Microblock macros
– PacketFilter() - main macro
– ip_packet_match() - called from PacketFilter()
– ipt_tcp_match() - TCP extension to core packet filtering code
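The round-robin polling that Context 0 performs over the four output queues on Microengine 5 can be illustrated with a short Python generator. It models only the polling order, not the TFIFO filling done by Contexts 1-3, and it stops once every queue is empty, whereas the real loop never terminates. The function name and queue contents are hypothetical.

```python
from collections import deque


def round_robin_poll(queues, start=0):
    """Yield (port, packet) pairs, visiting the per-port queues in turn."""
    n = len(queues)
    port = start
    idle = 0  # consecutive empty queues seen
    while idle < n:
        if queues[port]:
            idle = 0
            yield port, queues[port].popleft()
        else:
            idle += 1
        port = (port + 1) % n


# One queue per output port; port 1 has nothing to send.
output_queues = [deque(["a1", "a2"]), deque(), deque(["c1"]), deque(["d1"])]
schedule = list(round_robin_poll(output_queues))
```

Visiting the ports in a fixed rotation keeps one busy port from starving the others.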
Implementation - Microengine Re-tasking
• Triggered when the first rule specifying TCP match specs is added to the table
• Implementation
– Core component sends inter-thread signals to all threads of microengine 0
– Each time around the dispatch loop, each thread checks for a signal
– If signal is present, the thread stops its execution and sends interrupt to the StrongARM
– Interrupt Handler - when an interrupt is received from each of the 4 threads of microengine 0, it wakes up the process sleeping on the interrupt (PacketFilter core component)
– The core component disables microengine 0, reloads it with a new image containing ipt_tcp_match() macro and enables the microengine
• Above design makes sure that microengines are not interrupted while processing a packet thus preventing packet loss
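The handshake above can be traced with a small single-threaded Python simulation. It is a model of the sequence only, under stated assumptions: real inter-thread signals and StrongARM interrupts are replaced by flags and method calls, and the class, method and image names are hypothetical.

```python
class RetaskSimulation:
    NUM_THREADS = 4  # hardware contexts on microengine 0

    def __init__(self):
        self.signal_pending = [False] * self.NUM_THREADS
        self.thread_running = [True] * self.NUM_THREADS
        self.interrupts_seen = 0
        self.image = "base"
        self.packets_processed = 0

    def core_request_retask(self):
        """Core component signals every thread of microengine 0."""
        self.signal_pending = [True] * self.NUM_THREADS

    def thread_loop_iteration(self, tid):
        """One trip around the dispatch loop for thread `tid`."""
        if not self.thread_running[tid]:
            return
        # The signal is checked only between packets, never in the middle of one.
        if self.signal_pending[tid]:
            self.signal_pending[tid] = False
            self.thread_running[tid] = False
            self.interrupt_handler()
            return
        self.packets_processed += 1  # packet handled to completion

    def interrupt_handler(self):
        """Wakes the core component once all four threads have stopped."""
        self.interrupts_seen += 1
        if self.interrupts_seen == self.NUM_THREADS:
            self.core_reload()

    def core_reload(self):
        """Disable the microengine, load the new image, enable it again."""
        self.image = "base+ipt_tcp_match"
        self.interrupts_seen = 0
        self.thread_running = [True] * self.NUM_THREADS


sim = RetaskSimulation()
for tid in range(4):
    sim.thread_loop_iteration(tid)   # normal operation: 4 packets
sim.core_request_retask()
for tid in range(4):
    sim.thread_loop_iteration(tid)   # each thread sees the signal and stops
for tid in range(4):
    sim.thread_loop_iteration(tid)   # new image running: 4 more packets
```

Because a thread only looks for the signal at the top of its loop, the reload cannot begin while any packet is half processed, which is the property the design relies on to avoid packet loss.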
Tests and Results
• Test setup
• Packets sent from host machine to the notebook
• Libnet library used to build packets
• Host machine runs tcpdump and the Windows laptop runs Ethereal
Tests and Results
• Experiment 1 - Code size
• Experiment 2 - Packet filtering operations
– various commands to add, delete rules from the filter table
– packet filtering operations performed correctly from observations of packet transmission and reception from tcpdump and ethereal
Tests and Results
• Experiment 3 - performance penalty due to task partitioning across microengines
• Experiment 4 - Microengine Re-tasking
– command to add a TCP match specs rule to the filter table
– Microengine 0 was re-tasked successfully and packet filtering operations continued
Parameterization
• IXP2400 Network Processor
– Higher performance network processor of same family, with significant architectural enhancements
• Microstore (4 KB vs. 16 KB)
– 1K instructions limit - split tasks across 2 microengines
– 4K instructions: split not necessary, performance penalty avoided
Parameterization
– Total number of microwords for Ingress+PacketFilter+Forwarder = 1156
– extra instruction store space can be used for other components: UDP match, limit match, NAT, connection tracking
• Number of Microengines and Contexts
– IXP1200 serves 8 ports with 16 contexts for input and 8 contexts for output to forward packets
– Number of contexts per microengine is doubled, so each microengine can serve 4 ports for the input process (2 contexts per port as in IXP1200)
– with 5 microengines for input and 3 for output, the number of ports serviced could be 20
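The figures above can be checked with a few lines of arithmetic. The sketch assumes that "1K" and "4K" mean 1024 and 4096 instructions and that the output side needs one context per port, as on the IXP1200 (8 contexts for 8 ports); the variable names are illustrative.

```python
# Instruction store: does the combined microcode fit on one microengine?
IXP1200_STORE = 1024      # "1K instructions", assumed to mean 1024
IXP2400_STORE = 4096      # "4K instructions", assumed to mean 4096
TOTAL_MICROWORDS = 1156   # Ingress + PacketFilter + Forwarder

needs_split_on_ixp1200 = TOTAL_MICROWORDS > IXP1200_STORE
fits_on_ixp2400 = TOTAL_MICROWORDS <= IXP2400_STORE
spare_on_ixp2400 = IXP2400_STORE - TOTAL_MICROWORDS

# Ports: contexts per microengine double from 4 to 8.
CONTEXTS_PER_ME = 2 * 4
INPUT_CONTEXTS_PER_PORT = 16 // 8    # IXP1200: 16 input contexts serve 8 ports
OUTPUT_CONTEXTS_PER_PORT = 8 // 8    # IXP1200: 8 output contexts serve 8 ports

ports_per_input_me = CONTEXTS_PER_ME // INPUT_CONTEXTS_PER_PORT
input_ports = 5 * ports_per_input_me                       # 5 microengines on input
output_ports = (3 * CONTEXTS_PER_ME) // OUTPUT_CONTEXTS_PER_PORT  # 3 on output
ports_serviced = min(input_ports, output_ports)
```

Under these assumptions the input side is the limiting factor: five input microengines cover 20 ports, while three output microengines provide 24 contexts.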
Parameterization
• Next neighbor register set
– data sharing very fast, avoiding memory accesses
– IXP1200 - task partitioning between microengines requires packet queues; inter-microengine data communication means SRAM accesses, hence a performance penalty
– IXP2400 - packet queues avoided, buffer handles shared through next neighbor registers. Performance penalty avoided.
• Memory
– ENP-2505 has 48 MB DRAM and 3 MB SRAM accessible to microengines
– SRAM could accommodate 9K rules of average size. Thus memory was enough for PacketFilter application
– Increase in memory in IXP2400 could benefit simultaneous execution of many memory hungry applications
Conclusions
• Successfully implemented Packet filter core code and TCP header match extension
• Had to split filtering and forwarding across 2 microengines due to instruction store size limits
• MicroACE software framework was ideal for the design of the packet filter
• Microengine re-tasking complicated by the lack of smooth interface to microengine signals and interrupt handling
• Future work: investigating simultaneous operation of more than one application, more IP Tables extensions to the packet filter.
• Future work: incorporating interface to inter-thread signals and call-backs to MicroACE Framework