Posted on 21-Dec-2015
Embedded Transport Acceleration
Intel Xeon Processor as a Packet Processing Engine
Abhishek Mitra
Professor: Dr. Bhuyan
Why do we need a PPE / TOE (TCP Offload Engine)?
• The problem is TCP termination:
  – It involves reconstructing a coherent data stream from many independent packets
  – It is a compute-intensive task
  – It requires roughly 10 times the performance of TCP routing
  – A 400-MHz MIPS CPU consumes all of its cycles trying to terminate a Fast Ethernet (100 Mbps) channel
  – A 200-MHz IXP1200 has similar TCP performance
Packet Processing Engine
• Computing and memory resources
  – Necessary for communication processing
  – Scalable (throughput)
  – Extensible (newer protocols and applications)
  – Programmable (changing standards)
• Intel Xeon is extensible and programmable
  – Future: multiple cores on a single chip
• This is the key motivation for researching ETA
ETA S/W Architecture
• Partitioning the server into host and PPE
  – Host
    • General-purpose OS and application processes
  – PPE
    • Processes all communication-centric tasks
  – Interface
    • Asynchronous queues in cache-coherent, shared host memory
ETA S/W Stack
[Figure: ETA software stack; labels: "Native TCP/IP" and "Accelerated"]
ETA host-engine interface
• A set of queuing structures called DTIs (Direct Transport Interfaces)
  – Based on InfiniBand and the VI Architecture
  – DTIs also support TCP connection commands
  – Buffer pools buffer TCP streams
  – Parent DTIs listen for new TCP connections
  – When the ETA host accepts a new connection, a child DTI is created to service the TCP session
DTI Structure
Direct Transport Interface
• Send queue, receive queue [host to PPE, and vice versa]
• Event queue [PPE posts event notices to the host]
• Doorbells [host writes signals directly to the ETA PPE]
• Data buffers [the ETA PPE buffers data when:]
  – Source/target buffers are not pre-conditioned
  – The PPE receives TCP segments without receive descriptors on the receive queue
  – TCP segments arrive out of order
ETA PPE SW
• The ETA architecture is independent of the PPE implementation
  – The PPE can be a fixed device, a specialized engine, or a CPU
  – An ETA-aware PPE must support the DTI structures
  – It executes packet processing functions on behalf of the host (TCP/IP termination)
  – It supports an interface to the network
The Prototype
• Dual-processor system (Xeon CPUs)
  – Host: CPU0
  – PPE: CPU1
• Establishes and terminates TCP/IP sessions on behalf of the host
  – No special hardware was developed
  – Standard tools were used
  – Gigabit Ethernet cards with modified drivers
  – Shared-memory interface between host and PPE
SW Environment
• Linux kernel 2.4
• The PPE SW is a loadable kernel module
  – Supports DTIs
  – Has affinity for one processor (CPU1)
  – Never yields control of the processor, so CPU1 is dedicated to use as the PPE
  – The PPE polls NIC descriptors in shared memory
  – DTI structures reside in shared host memory
  – The host CPU and the PPE communicate via doorbells
Hardware Platform
• The prototype can run on any multiprocessor Linux kernel
• One server with five Ethernet links
• Five clients: COTS (commercial off-the-shelf) servers running Linux and TTCP
ETA Test Environment
Measurement and Analysis
• Comparison between ETA and a standard Linux dual-processor server
  – ETA leaves more than 80% of the CPU idle
  – Tx throughput increases considerably
  – Receive performance is lower, because ETA uses a memory-to-memory copy from the packet buffer to the destination buffer
Performance with HT
HT (Hyper-Threading) results in a ~50% increase in Tx performance
Receive performance is lower, because ETA uses a memory-to-memory copy from the packet buffer to the destination buffer
ETA HT NoCopy: a test path without the data copy, showing enhanced Rx performance
Related Work
• TOEs have been developed
  – Devices attached to the server's I/O subsystem
  – Use separate, specialized processing and memory resources
• ETA instead uses the server's own processing and memory resources