ip routing processing with graphic processors author: shuai mu, xinya zhang, nairen zhang, jiaxin...
TRANSCRIPT
1
IP Routing Processing with Graphic Processors
Author:Shuai Mu , Xinya Zhang , Nairen Zhang , Jiaxin Lu ,
Yangdong Steve Deng, Shu Zhang
Publisher:IEEE Conference On DATE, pp.93-98, 2010
Presenter:Ye-Zhi Chen
Date:2011/8/24
2
Introduction
Internet traffic will grow at an accelerated rate , routers , therefore ,
have to deliver increasing processing capacity accordingly.
Throughput and Programmability
Hardware - Higher Throughput , but Lower Programmability
Software – Higher Programmability , but Lower Throughput
Netowrk Processors(NPs) – The lack of mature programming
models and software development tools and incompatibility of
architectures
GPU – With high-performance computing and its programming is
accessible with CUDA.
3
CUDA
CUDA Program – composed of codes running on both CPU and
GPU.
Kernel - The function called by CPU but executed on GPU
Block – threads inside a block could exchange data through the
shared memory and synchronize with one another.
4
Network Intrusion Detection
Signature matching - checks if network payloads contain pre-
supplied signatures to at line rates. 60%
Two Algorithms
Bloom filter –
1. hash table
2. space-efficient data structure
3. Errors – hash conflicts
Aho-Corisick (AC) –
4. DFA
5
Implementation
Input
6
Implementation
Transfer packet :
Individually : simplest way but time-consuming
Batch : batch many small transfers into a larger one.
Paged-locked memory :be mapped into the address
space of the device
Store Bloom vector and transition table in GPU’s texture
memory
Divide each packet into smaller chunks , and every two
neighboring chunks have an overlapped content with a
length equal to the largest match texts.
7
Rounting Table Lookup
Longest prefix match(LPM)
Radix tree
Portable Routing Table
trie
8
Result
CPU : 0.6 Gbit/sGPU : • Kernel only : 19 Gbit/s• Transfer : 3.4 Gbit/s• Paged-locked : 17 Gbit/s
9
Result
CPU : 0.6 Gbit/sGPU : • Kernel only : 3.6 Gbit/s• Transfer : 2.3 Gbit/s• Paged-locked : 3.2 Gbit/s
10
Result
The GPU performance of DFA improves rapidly and approaches a peak throughput of 9.2Gbit/s, which is more than 15 times faster than CPU
11
Result