
Michio Honda (NEC Laboratories Europe)

With acknowledgments to Lars Eggert and Douglas Santry

IIJ-II Seminar

December 26th, Tokyo, Japan

Part 1: PASTE: Network Stacks Must Integrate with NVMM Abstractions

Part 2: Report on ACM HotNets 2016

*work done


https://www.hpe.com/us/en/servers/persistent-memory.html

Motivation

• Non-Volatile Main Memories (NVMMs)
  • Persistent
  • Byte-addressable
  • Low latency
    • 10s-1000s of ns

• Shift from block- to byte-granularity persistency
  • OS abstractions
    • Direct access to mmap()-ed files
  • Data structures
    • Filesystems and databases

What are the implications for networking?


Case Study: Write-Ahead Logging

• Persist client’s request prior to acknowledgment
  • Durably store data into a log file to hide from the client the overhead of updating the primary database (e.g., B-tree)

• 1KB commit
  • 2030 us
  • Networking takes 40 us

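[Figure: write-ahead logging data path, steps (1)-(5). Components: client, NIC, network stack (DRAM), App, storage stack, SSD/disk. The App receives the request with read() and persists it with write()/fsync() or memcpy()/msync().]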
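The conventional path on this slide can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the code measured in the talk: a socketpair stands in for the client connection, a temporary file stands in for the log file, and the names `log_request`, `demo`, `LOG_SIZE`, and `REQ_SIZE` are hypothetical. Python's `mmap.flush()` plays the role of msync().

```python
import mmap
import os
import socket
import tempfile

LOG_SIZE = 4096   # hypothetical log capacity in bytes
REQ_SIZE = 1024   # the 1KB commit from the slide


def log_request(conn, log, offset):
    """Persist one fixed-size request into the mmap-ed log, then acknowledge it."""
    data = conn.recv(REQ_SIZE, socket.MSG_WAITALL)  # read(): kernel buffer -> app buffer
    log[offset:offset + len(data)] = data           # memcpy(): app buffer -> log file
    log.flush()                                     # msync(): push the log to the backing file
    conn.sendall(b"ACK")                            # acknowledge only after persisting
    return offset + len(data)


def demo():
    client, server = socket.socketpair()            # stands in for a TCP connection
    fd, path = tempfile.mkstemp()                   # stands in for a file on NVMM or disk
    try:
        os.ftruncate(fd, LOG_SIZE)
        with mmap.mmap(fd, LOG_SIZE) as log:
            client.sendall(b"x" * REQ_SIZE)
            end = log_request(server, log, 0)
            ack = client.recv(3)
            logged = bytes(log[:end])
        return end, ack, logged
    finally:
        client.close()
        server.close()
        os.close(fd)
        os.unlink(path)
```

Note that the request body is copied twice before the acknowledgment goes out: once by the receive call and once into the log.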


Case Study: Write-Ahead Logging

• Persist client’s request prior to acknowledgment
  • Durably store data into a log file to hide from the client the overhead of updating the primary database (e.g., B-tree)

• 1KB commit
  • 2000 → 42 us
  • Networking takes 40 us
    • This 2 us is not small

[Figure: write-ahead logging data path, steps (1)-(5). Components: client, NIC, network stack (DRAM), App, storage stack, NVMM (emulated using a reserved region of DRAM). The App receives the request with read() and persists it with write()/fsync() or memcpy()/msync().]
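To make the shift concrete, the arithmetic behind these figures can be worked through as below. The inputs are the numbers on the slides (2030 us total with SSD/disk, 42 us total with NVMM, 40 us of networking in both cases); the helper name `breakdown` is hypothetical.

```python
def breakdown(total_us, networking_us):
    """Split a commit latency into networking time and everything else."""
    persistence_us = total_us - networking_us
    return {
        "persistence_us": persistence_us,
        "networking_share": networking_us / total_us,
    }


ssd = breakdown(total_us=2030, networking_us=40)   # figures from the SSD/disk slide
nvmm = breakdown(total_us=42, networking_us=40)    # figures from the NVMM slide

print(f"SSD/disk: networking is {ssd['networking_share']:.1%} of the commit")
print(f"NVMM:     networking is {nvmm['networking_share']:.1%} of the commit")
```

With SSD/disk, networking is about 2 % of the commit time; with NVMM it is about 95 %, so whatever remains in the network path now dominates.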


Case Study: Write-Ahead Logging

• Parallel requests are serialized on each core

33 % throughput decrease, 50 % latency increase


Data Copies Matter

• Cache misses
  • Copy to tmp buffer (e.g., read()) is cheap
  • Logging always happens to a different destination

[Figure: kernel buffer, copied by read() into the app buffer, copied by memcpy() into the log file (mmap()-ed)]

Configuration                                      Overall cache misses   Largest contributor
Networking only                                    0.0004 %               net_rx_action() (84 %)
Networking + NVMM (read() + memcpy() + msync())    4.4121 %               memcpy() (98 %)
Networking + NVMM (read() + msync())               8.3451 %               sys_read() (99 %)

We must avoid data copy for logging!
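For reference, the read() + msync() configuration in the table corresponds to receiving straight into the mmap()-ed log and skipping the temporary buffer. A rough Python sketch, assuming a socketpair in place of the client connection and a temporary file in place of the log; `log_request_direct`, `demo`, `LOG_SIZE`, and `REQ_SIZE` are hypothetical names:

```python
import mmap
import os
import socket
import tempfile

LOG_SIZE = 4096   # hypothetical log capacity in bytes
REQ_SIZE = 1024   # hypothetical fixed request size


def log_request_direct(conn, log, offset, size):
    """Receive a request directly into the mapped log, with no temporary buffer."""
    with memoryview(log)[offset:offset + size] as dst:
        n = conn.recv_into(dst, size, socket.MSG_WAITALL)  # kernel buffer -> log file
    log.flush()                                            # msync()
    conn.sendall(b"ACK")
    return offset + n


def demo():
    client, server = socket.socketpair()
    fd, path = tempfile.mkstemp()
    try:
        os.ftruncate(fd, LOG_SIZE)
        with mmap.mmap(fd, LOG_SIZE) as log:
            client.sendall(b"y" * REQ_SIZE)
            end = log_request_direct(server, log, 0, REQ_SIZE)
            ack = client.recv(3)
            logged = bytes(log[:end])
        return end, ack, logged
    finally:
        client.close()
        server.close()
        os.close(fd)
        os.unlink(path)
```

This removes the user-space memcpy(), but the kernel still copies the payload to a fresh destination on every request, which matches the table attributing the misses to sys_read() in this configuration.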


Packet Store (PASTE) Overview

• Static packet buffers on a named NVMM region
  • DMA to NVMM

• Zero-copy APIs
  • Fast logging

[Figure: PASTE data path, steps (1)-(4). Components: client, NIC, network stack, App, storage stack, NVMM. Packet buffers live in /mnt/nvmm/pktbufs; the App writes metadata only (e.g., buffer index) to /mnt/pmem/appmd.]


Fast Logging with PASTE

[Figure: logging with PASTE across the kernel/user boundary]

• /mnt/nvmm/pktbufs
  • packet buffers (static)
  • NIC ring

• /mnt/nvmm/myapp_metadata
  • metadata header: /mnt/nvmm/pktbufs, buf_ofs: 123
  • metadata entries:

    buf_idx   off   len
    0         100   1135
    1         100   932
    3         100   1024

• Kernel: TCP/IP input and output

• User: Application, via the netmap API and mmap()
  (1) Read data (zero copy)
  (2) Write metadata entry
  (3) Flush (buffer and metadata)
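The three steps can be illustrated with a small Python sketch. It does not use the netmap API: two temporary files stand in for /mnt/nvmm/pktbufs and /mnt/nvmm/myapp_metadata, the "DMA" is simulated by a plain write, the metadata header is omitted, and the entry layout (three little-endian 32-bit integers) as well as `BUF_SIZE`, `NUM_BUFS`, `mapped_tempfile`, `log_packet`, and `read_logged` are assumptions made for illustration.

```python
import contextlib
import mmap
import os
import struct
import tempfile

BUF_SIZE = 2048                 # hypothetical size of one static packet buffer
NUM_BUFS = 4                    # hypothetical number of packet buffers
ENTRY = struct.Struct("<III")   # buf_idx, off, len (layout is an assumption)


@contextlib.contextmanager
def mapped_tempfile(size):
    """Map a temporary file; stands in for a file on an NVMM-backed filesystem."""
    fd, path = tempfile.mkstemp()
    try:
        os.ftruncate(fd, size)
        with mmap.mmap(fd, size) as region:
            yield region
    finally:
        os.close(fd)
        os.unlink(path)


def log_packet(pktbufs, metadata, slot, buf_idx, off, length):
    """Log a request by recording where it lives instead of copying it."""
    ENTRY.pack_into(metadata, slot * ENTRY.size, buf_idx, off, length)  # (2)
    pktbufs.flush()     # (3) flush the packet buffers first ...
    metadata.flush()    # ... then the metadata that points at them


def read_logged(pktbufs, metadata, slot):
    """Follow a metadata entry back to the payload in the packet buffers."""
    buf_idx, off, length = ENTRY.unpack_from(metadata, slot * ENTRY.size)
    start = buf_idx * BUF_SIZE + off
    return bytes(pktbufs[start:start + length])


def demo():
    with mapped_tempfile(BUF_SIZE * NUM_BUFS) as pktbufs, \
         mapped_tempfile(ENTRY.size * NUM_BUFS) as metadata:
        payload = b"SET key value"
        # Simulate a packet that already sits in buffer 3 with its payload at offset 100.
        start = 3 * BUF_SIZE + 100
        pktbufs[start:start + len(payload)] = payload
        log_packet(pktbufs, metadata, slot=0, buf_idx=3, off=100, length=len(payload))
        return read_logged(pktbufs, metadata, slot=0)
```

Only the 12-byte entry is written by the application; the payload stays where it was received. Flushing the buffer before the metadata is a choice made in this sketch so that an entry never points at data that has not been flushed.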


Buffer states: Unread, Read or written


Buffer state: Flushed

• DMA is performed to L3 cache (DDIO)


Idempotent request

• Unnecessary data is not flushed to DIMM


Implementation

• Extension to netmap memory allocator
  • Claim packet buffers from a given file backed by NVMM
    • e.g., pkt-gen -i eth1@/mnt/pmem/bufs -f rx

• Server app using the netmap API can easily implement logging

2016-9-12 © 2016 NetApp, Inc. All rights reserved. 21


Preliminary Results

• Implementation
  • Extend the netmap framework
  • Stackmap for TCP/IP

10-88 % throughput increase, 9-46 % latency reduction


Related Work

• Enhanced network stacks (not NVMM-aware)
  • MegaPipe (OSDI’12), Stackmap (ATC’16), Fastsocket (ASPLOS’16)
  • IX and Arrakis (OSDI’14), mTCP (NSDI’14), Sandstorm (SIGCOMM’14), MICA (NSDI’14)

• NVMM filesystems (not networking-aware)
  • BPFS (SOSP’09), NOVA (FAST’16)

• NVMM databases (not networking-aware)
  • NVWAL (ASPLOS’16), REWIND (VLDB’15), NV-Tree (FAST’15)


Conclusion

• Implications
  • Network stacks are now a bottleneck for durably storing data
  • Improving network and storage stacks in isolation is not enough
  • We need a new stack design

PASTE: Fast logging with named packet buffers on NVMM and a zero-copy API


HotNets 2016 Reports

• ACM Workshop on Hot Topics in Networks
  • Focus on new ideas and future directions in networking

• November 9-10 2016 @ Atlanta

• ~90 attendees (Invitation only)
  • 1 author per paper
  • Invited people in the community
  • Lottery


Submission and Reviews

• 30 papers accepted (out of 108 submitted)
  • 3-6 reviews per paper
  • 48 papers reached PC discussion

• We submitted 2 papers
  • 1 rejected (received 4 reviews)
    • 3 weak rejects + 1 weak accept
  • 1 accepted (received 6 reviews)
    • 1 accept + 4 weak accepts + 1 weak reject


Workshop Format

• Format was awesome
  • Very friendly, like a university internal workshop
  • Young people can be active
  • Senior people (incl. TPC members) comment and/or raise discussion


Workshop Topics

• ISPs
  • Frontier networks, traffic engineering using MPTCP, monitoring

• Resource allocation
  • ML for cluster scheduling etc., blockchain for the Internet (BGP, DNS)

• Container networking
  • RDMA-based interfaces

• Social networks and clouds
  • SaaS, recommendation etc.

• Datacenters
  • Topology, deadlocks in RDMA networks, debugging

• Mobile
  • Low-energy consumption network stack, network personalization

• Network monitoring and analysis
  • Using programmable switches

• Wireless (MIT)

• NF modeling verification

• DDoS
