Augustus: a CCN router for programmable networks
ACM ICN 2016, Kyoto
Davide Kirchner1∗, Raihana Ferdous2∗, Renato Lo Cigno3,
Leonardo Maccari3, Massimo Gallo4, Diego Perino5∗, and Lorenzo Saino6
September 27, 2016
1Google Inc., Dublin, Ireland; 2Create-Net, Trento, Italy; 3DISI – University of Trento, Italy; 4Bell Labs – Nokia, Paris, France; 5Telefonica Research, Spain; 6Fastly, London, UK
∗This work was done while D. Kirchner and R. Ferdous were at the University of Trento, and D. Perino and L. Saino at Bell Labs.
Outline
1. Introduction
2. The Augustus CCN router
3. Performance evaluation
4. Conclusions and lessons learned
Introduction
Objectives
The main goal is to explore the possibilities offered by modern
general-purpose hardware in the context of information-centric
networking:
• Implement a CCN data plane forwarder fully in software
• Run on a commodity x86_64 machine
• Performance-oriented, open-source and extensible
• Analyze the performance in a worst-case scenario
Why a software router? Flexibility:
• Quicker development/deployment cycle and (re)configuration
• Hardware can be dynamically allocated to network functions
Tools
• Off-the-shelf high-performance hardware
• High-speed packet I/O libraries [Int, Riz12]
• Software routing frameworks built on top [BSM15, KJL+15]
Forwarding flow
• Focus on the Content Centric Networking approach [JST+09]
• Interests hold full content name
• Similar to CCNx (vs NDN)
• CS and PIT: exact match
• Longest-prefix match at FIB
Example: get /com/updates/sw/v4.2.5.tar.gz
Router R2:
• Forwarding Information Base (FIB): /com/updates → eth0
• Pending Interest Table (PIT): /com/updates/sw/v4.2.5.tar.gz → {eth1}
• Content Store (CS): /com/updates/sw/v4.2.5.tar.gz → (data…)
[Figure: topology with endpoints A, B, C and routers R1, R2, R3; R2 has interfaces eth0, eth1, eth2]
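The interest path on this slide (CS exact match, then PIT, then FIB longest-prefix match) can be sketched as follows. The tables here are toy linear scans, purely for illustration; Augustus's actual data structures are the optimized hash tables described later in the deck.

```c
#include <stdbool.h>
#include <string.h>

#define MAX_ENTRIES 64

/* Toy in-memory tables standing in for Augustus's CS, PIT and FIB;
 * the real router uses optimized, cache-aligned hash tables. */
static const char *pit_names[MAX_ENTRIES];
static int pit_n;
static const char *cs_names[MAX_ENTRIES];
static int cs_n;
static struct { const char *prefix; int face; } fib[MAX_ENTRIES];
static int fib_n;

typedef enum { FWD_REPLY_FROM_CS, FWD_AGGREGATED, FWD_TO_FACE, FWD_DROP } fwd_action;

/* CS: exact match on the full content name. */
static bool cs_lookup(const char *name) {
    for (int i = 0; i < cs_n; i++)
        if (strcmp(cs_names[i], name) == 0)
            return true;
    return false;
}

/* PIT: exact match; returns true if an entry already existed
 * (interest aggregation), otherwise records the new pending name. */
static bool pit_add_or_aggregate(const char *name) {
    for (int i = 0; i < pit_n; i++)
        if (strcmp(pit_names[i], name) == 0)
            return true;
    pit_names[pit_n++] = name;
    return false;
}

/* FIB: longest-prefix match; for brevity this toy version compares raw
 * string prefixes and ignores '/'-component boundaries. */
static int fib_lpm(const char *name) {
    int best = -1;
    size_t best_len = 0;
    for (int i = 0; i < fib_n; i++) {
        size_t len = strlen(fib[i].prefix);
        if (len > best_len && strncmp(name, fib[i].prefix, len) == 0) {
            best = fib[i].face;
            best_len = len;
        }
    }
    return best;
}

/* Interest pipeline from the slide: CS -> PIT -> FIB. */
fwd_action forward_interest(const char *name, int *out_face) {
    if (cs_lookup(name))
        return FWD_REPLY_FROM_CS;   /* cache hit: answer locally */
    if (pit_add_or_aggregate(name))
        return FWD_AGGREGATED;      /* already pending: suppress */
    int face = fib_lpm(name);
    if (face < 0)
        return FWD_DROP;            /* no route */
    *out_face = face;
    return FWD_TO_FACE;
}
```

With a FIB entry /com/updates → face 0 as on the slide, a first interest for /com/updates/sw/v4.2.5.tar.gz is forwarded on that face, and a second interest for the same name is aggregated in the PIT.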
The Augustus CCN router
Design principles
• Exploit parallelism at all possible levels:
• Hardware multi-queue at NIC
• DRAM memory channels
• Multiple cores on chip
• Multiple NUMA sockets
• Data structures designed to match the x86 cache system
• Shared read-only FIB, duplicated in all NUMA sockets
• Sharded, thread-private CS and PIT
• Exploit NIC’s Receive Side Scaling capabilities to dispatch incoming
packets to threads
• Zero-copy packet processing
• Based on DPDK for fast packet I/O [Int]
• Explored two trade-offs: max performance or more flexibility
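The sharded, thread-private CS/PIT design boils down to mapping each name deterministically to one thread's private tables. A minimal software sketch of that mapping, using FNV-1a as an illustrative stand-in for the NIC's RSS hash (in the running router, the NIC hashes packet fields in hardware to steer packets to per-thread queues):

```c
#include <stdint.h>
#include <stddef.h>

/* FNV-1a hash: an illustrative choice, not necessarily what Augustus
 * or the NIC's RSS function actually computes. */
static uint32_t fnv1a(const void *data, size_t len) {
    const uint8_t *p = (const uint8_t *)data;
    uint32_t h = 2166136261u;
    for (size_t i = 0; i < len; i++) {
        h ^= p[i];
        h *= 16777619u;
    }
    return h;
}

/* Map a content name to one of the thread-private CS/PIT shards. */
static unsigned shard_for_name(const char *name, size_t len, unsigned n_shards) {
    return fnv1a(name, len) % n_shards;
}
```

Because the mapping is deterministic, all packets for a given name land on the same thread, so that thread's PIT and CS shards need no locking.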
Design - standalone
Low-level standalone C implementation:
• Based on low-level optimized APIs
• Pushes the platform to its limits
• Architecture based on Caesar [PVL+14]
Design - modular
• Based on (Fast)Click [KMC+00, BSM15]
• Easy to extend and experiment with
• Same optimized data structures
• Can be deployed alongside other routing components
[Figure: Click pipeline — FromDPDKDevice(n) → InputMux → CheckICNHeader → ICN_CS → ICN_PIT → ICN_FIB → OutputDemux → ToDPDKDevice(n), with separate output ports for Interest (I) and Data (D) packets on hit/miss, plus a Discard path]
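As a rough illustration, the elements in the diagram could be wired in a Click-style configuration such as the following. This is a sketch, not the shipped configuration: the chain is linearized, whereas the real elements emit packets on different ports depending on hit or miss:

```
// Hypothetical wiring, one chain per DPDK device n.
// Real configs split I(hit)/I(miss)/D(hit) across element ports.
FromDPDKDevice(0)
    -> InputMux
    -> CheckICNHeader
    -> ICN_CS
    -> ICN_PIT
    -> ICN_FIB
    -> OutputDemux
    -> ToDPDKDevice(0);
```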
Performance evaluation
Experimental setup
• Two twin machines, each with two 10Gbps Ethernet ports
• Measurements expressed in data packets per second
• Work in slight overload conditions
Worst-case assumptions:
• Every interest packet has a unique name: no CS hits, no PIT aggregation
• Minimal-sized packets, to stress the forwarding engine
[Figure: testbed — a traffic generator and sink (interest generator + echo server) connected to the Augustus router via eth0/eth1; interests flow toward the echo server and data packets flow back]
Threads and core mapping
Threads are pinned to processing cores
Test servers: 2 sockets × 8 cores × 2 (hyperthreading)
[Figure: cache topology — one shared L3 per socket; each physical core has private L2, L1-D and L1-I caches, shared by its two hyperthread siblings (e.g. CPUs 0/16, 2/18, …, 15/31)]
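Pinning a thread to one logical CPU, as done for the processing threads above, can be sketched on Linux as follows. The API use here is an illustrative glibc sketch, not the router's actual code:

```c
#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>

/* Pin the calling thread to one logical CPU; returns 0 on success.
 * On the test servers, siblings such as CPUs 0/16 share a physical core,
 * so the core id chosen decides whether two threads are hyperthreaded. */
int pin_to_core(int core) {
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(core, &set);
    return pthread_setaffinity_np(pthread_self(), sizeof(set), &set);
}
```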
Standalone performance
[Figure: data throughput (Mpps) and L3 cache miss ratio vs. number of threads (1–32), single socket vs. dual socket vs. hyperthreading]
• 2 threads: large gap between hyperthreaded and physical cores
• Best performance: 4 threads (dual socket), 8 threads (single/dual)
Click module performance
[Figure: data throughput (Mpps) and L3 cache miss ratio vs. number of threads (1–32), single socket vs. dual socket vs. hyperthreading]
• 1 thread: same cache miss ratio, half the performance
• Best performance: 16 threads
FIB size scaling
[Figure: data throughput (Mpps) and cache miss ratio vs. number of FIB buckets (2^12–2^26), for Standalone (8, 4, 1 threads) and Click module (16, 1 threads)]
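The longest-prefix match whose cost this experiment probes is commonly implemented as repeated exact-match lookups, dropping one name component per iteration. A sketch under that assumption, where the toy fib_exact below is a linear scan standing in for a probe into the hash buckets varied on the x-axis:

```c
#include <stdbool.h>
#include <string.h>

/* Illustrative FIB entries, not a real table. */
static struct { const char *prefix; int face; } fib_tab[] = {
    { "/com/updates", 0 },
    { "/com", 1 },
};

/* Exact-match probe on a name prefix of given length; a stand-in for
 * a hash-table bucket lookup in the real FIB. */
static bool fib_exact(const char *name, size_t len, int *face) {
    for (size_t i = 0; i < sizeof fib_tab / sizeof fib_tab[0]; i++) {
        if (strlen(fib_tab[i].prefix) == len &&
            strncmp(fib_tab[i].prefix, name, len) == 0) {
            *face = fib_tab[i].face;
            return true;
        }
    }
    return false;
}

/* Component-wise longest-prefix match: try the full name, then strip
 * one '/'-delimited component at a time and retry. Returns -1 if no
 * prefix matches. */
int fib_lpm(const char *name) {
    size_t len = strlen(name);
    int face;
    while (len > 0) {
        if (fib_exact(name, len, &face))
            return face;
        while (len > 0 && name[len - 1] != '/')
            len--;              /* strip the last component */
        if (len > 0)
            len--;              /* and its leading '/' */
    }
    return -1;
}
```

The number of probes is bounded by the number of name components, so lookup cost grows with name depth rather than with FIB size, while the bucket count governs collision rates and cache behavior.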
Conclusions and lessons learned
Conclusions and lessons learned
We presented Augustus, a CCN software router that:
• Forwards more than 10 million data packets per second, supports a FIB with up to 2^26 entries, and saturates a 10 Gbit/s link with Ethernet payloads as small as 87 bytes
• Was tested with a thorough, worst-case-oriented performance evaluation
• Runs either as a stand-alone system, achieving the best performance, or as a set of elements in the Click modular router framework
• Is open source and can be used in software-based networks for fast, incremental ICN deployment
Lessons learned:
• Manual configuration is needed for best performance
• Abstraction hides critical low-level properties
• Zero-copy is complex to achieve in a modular framework
Augustus: a CCN router for programmable networks
ACM ICN 2016, Kyoto
September 27, 2016
Thanks for your attention! [email protected]
Bibliography
References I
[BSM15] Tom Barbette, Cyril Soldani, and Laurent Mathy.
Fast userspace packet processing.
In Proceedings of the Eleventh ACM/IEEE Symposium on Architectures
for Networking and Communications Systems, ANCS ’15, pages 5–16,
Washington, DC, USA, 2015. IEEE Computer Society.
[Int] Intel®.
DPDK: Data Plane Development Kit.
http://dpdk.org.
[JST+09] Van Jacobson, Diana K. Smetters, James D. Thornton, Michael F. Plass,
Nicholas H. Briggs, and Rebecca L. Braynard.
Networking named content.
In Proceedings of the 5th International Conference on Emerging
Networking Experiments and Technologies, CoNEXT ’09, pages 1–12,
New York, NY, USA, 2009. ACM.
References II
[KJL+15] Joongi Kim, Keon Jang, Keunhong Lee, Sangwook Ma, Junhyun Shim,
and Sue Moon.
Nba (network balancing act): A high-performance packet processing
framework for heterogeneous processors.
In Proceedings of the Tenth European Conference on Computer Systems,
EuroSys ’15, pages 22:1–22:14, New York, NY, USA, 2015. ACM.
[KMC+00] Eddie Kohler, Robert Morris, Benjie Chen, John Jannotti, and M. Frans
Kaashoek.
The Click modular router.
ACM Trans. Comput. Syst., 18(3):263–297, August 2000.
[PVL+14] Diego Perino, Matteo Varvello, Leonardo Linguaglossa, Rafael Laufer, and
Roger Boislaigue.
Caesar: A Content Router for High-speed Forwarding on Content
Names.
In Proceedings of the Tenth ACM/IEEE Symposium on Architectures for
Networking and Communications Systems, ANCS ’14, pages 137–148,
New York, NY, USA, 2014. ACM.
References III
[Riz12] Luigi Rizzo.
netmap: A novel framework for fast packet I/O.
In 21st USENIX Security Symposium (USENIX Security 12), pages
101–112, Bellevue, WA, August 2012. USENIX Association.