
Page 1: 05 - NVM SSDs - pages.cs.wisc.edu

NVM-based SSDs
CS 839 - Persistence

Page 2: 05 - NVM SSDs - pages.cs.wisc.edu

Learning outcomes

• Understand the software overheads in different layers of storage access when devices run at the speed of memory, not flash or disk
• Understand where software can be optimized for reducing latency

Page 3: 05 - NVM SSDs - pages.cs.wisc.edu

Questions from reviews

• Could FusionIO or RAID be done on top of PCM? Would you want to use PCM devices in a RAID-like array?
• How to evaluate power efficiency with an FPGA prototype?
  • Use power models from the CACTI tool
• Why compare against SSD & disk?
• How specific is this to PCM?
  • No flash translation layer!
  • Parallelism for 4 KB accesses
    • Can read from multiple PCM DIMMs in parallel and hit the latency target
• Unfair evaluation! Weak workloads – had no file system
• Confusions:
  • PCIe protocol & DMA

Page 4: 05 - NVM SSDs - pages.cs.wisc.edu

Background story

• Faster persistent memory was raising interest (2009); initially investigated as a DRAM replacement
• A natural use case is faster SSDs
• Both Intel and the Non-Volatile Systems Lab (NVSL) at UCSD built prototype devices based on DRAM to identify software overheads well before PCM/3D XPoint became available

Page 5: 05 - NVM SSDs - pages.cs.wisc.edu

OS/HW I/O path

• Each layer adds latency
  • SATA/SCSI: the HBA layer adds ~25 usec
    • Its goal is to aggregate multiple slow devices – not needed here
  • 6 PIOs needed for I/O submission
• Outcome: the NVMe interface (sketched below)
  • Driver talks to PCIe directly
  • Single PIO for request submission
  • Completions pushed to memory; no PIO needed to read them
  • Interrupts steered to the core that submitted the request
  • Multiple request queues for multi-core scaling
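To make the single-write submission and memory-resident completions concrete, here is a minimal C sketch in the spirit of NVMe queue pairs. The structures and field layouts are simplified stand-ins, not the real NVMe spec: the host fills a submission-queue entry in ordinary memory, performs one MMIO doorbell write, and later observes the completion entry that the device writes back into host memory.

#include <stdint.h>

/* Illustrative, simplified command/completion layouts (not the real NVMe spec). */
struct sq_entry { uint8_t opcode; uint16_t tag; uint64_t lba; uint32_t len; uint64_t buf; };
struct cq_entry { uint16_t tag; uint16_t status; uint8_t phase; };

struct queue_pair {
    struct sq_entry          *sq;          /* submission queue in host memory          */
    volatile struct cq_entry *cq;          /* completion queue, written by the device  */
    volatile uint32_t        *sq_doorbell; /* device register: one MMIO write per submit */
    uint32_t sq_tail, cq_head, depth;
    uint8_t  phase;
};

/* Submit: fill the SQ entry in ordinary memory, then a single doorbell write. */
static void submit(struct queue_pair *qp, const struct sq_entry *cmd)
{
    qp->sq[qp->sq_tail] = *cmd;
    qp->sq_tail = (qp->sq_tail + 1) % qp->depth;
    __atomic_thread_fence(__ATOMIC_RELEASE);   /* command visible before doorbell */
    *qp->sq_doorbell = qp->sq_tail;            /* the one PIO/MMIO write          */
}

/* Complete: the device pushes completions into host memory; no PIO read needed. */
static int poll_completion(struct queue_pair *qp, uint16_t *tag_out)
{
    volatile struct cq_entry *e = &qp->cq[qp->cq_head];
    if (e->phase != qp->phase)                 /* nothing new yet */
        return 0;
    *tag_out = e->tag;
    qp->cq_head = (qp->cq_head + 1) % qp->depth;
    if (qp->cq_head == 0)
        qp->phase ^= 1;                        /* phase bit flips on each wrap */
    return 1;
}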

Page 6: 05 - NVM SSDs - pages.cs.wisc.edu

Benefit of NVMe interface

Software Dominates!

Page 7: 05 - NVM SSDs - pages.cs.wisc.edu

Where does time go with NVMe?

Page 8: 05 - NVM SSDs - pages.cs.wisc.edu

I/O scheduler

• What is the goal of an I/O scheduler?
• Why is it valuable for disks & flash SSDs?
• Is it valuable for NVM SSDs?

Page 9: 05 - NVM SSDs - pages.cs.wisc.edu

I/O scheduler

• What is the goal of an I/O scheduler?
  • Optimize the order of requests to maximize performance
  • Implement prioritization/fairness rules
• Why is it valuable for disks & flash SSDs?
  • Strong benefit from sequential I/O, so reordering helps
  • Useful when there are lots of requests queued
• How does it work?
  • A separate scheduler thread takes enqueued data from the rest of the kernel and submits I/O requests
  • Adds ~2 usec of overhead
• Is it valuable for NVM SSDs?
  • Not as much – the no-op scheduler does best because it does the least (selecting it is sketched below)
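For reference, the block-layer scheduler can be switched per device through sysfs. A small, hedged C sketch of selecting the no-op scheduler follows; the device name is a placeholder, and the string to write is "noop" on older kernels or "none" on blk-mq kernels.

#include <stdio.h>

/* Select the no-op I/O scheduler for a block device via sysfs.
 * "noop" applies to older kernels; blk-mq kernels use "none". */
static int set_noop_scheduler(const char *dev)   /* e.g. "sdb" (placeholder) */
{
    char path[256];
    snprintf(path, sizeof(path), "/sys/block/%s/queue/scheduler", dev);

    FILE *f = fopen(path, "w");
    if (!f) {
        perror("open scheduler file");
        return -1;
    }
    fputs("none\n", f);   /* or "noop" on pre-blk-mq kernels */
    fclose(f);
    return 0;
}

int main(void)
{
    return set_noop_scheduler("sdb");   /* hypothetical device name */
}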

Page 10: 05 - NVM SSDs - pages.cs.wisc.edu

Issuing/completing requests

• Early versions required multiple PIO writes to submit a request (like SATA)
  • Result: need to acquire a lock to prevent races on multicore
  • Does not scale to lots of cores
• Solution: make request submission a single atomic operation
  • Pack everything into 64 bits (see the packing sketch at the end of this slide):
    • 8-bit tag to match responses to requests
    • 8-bit command
    • 16-bit length
    • 32-bit storage address
  • Remove the memory address of the buffer – attach it to the channel instead!
• Allow multi-threaded interrupt handling
  • Old approach: read status fields, then clear the interrupt; requires a lock to atomically read status & clear the interrupt
  • New approach: the interrupt is automatically cleared when status is read; the next update is guaranteed to raise a new interrupt & update status
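The 64-bit packing described above can be expressed with plain shifts and masks. In this sketch the field order simply follows the bullet list, not a documented register layout.

#include <stdint.h>

/* Pack an 8-bit tag, 8-bit command, 16-bit length, and 32-bit storage address
 * into one 64-bit word, so a request can be issued with a single PIO write. */
static inline uint64_t pack_request(uint8_t tag, uint8_t cmd,
                                    uint16_t len, uint32_t addr)
{
    return ((uint64_t)tag  << 56) |
           ((uint64_t)cmd  << 48) |
           ((uint64_t)len  << 32) |
            (uint64_t)addr;
}

static inline uint8_t  req_tag (uint64_t r) { return (uint8_t)(r >> 56); }
static inline uint8_t  req_cmd (uint64_t r) { return (uint8_t)(r >> 48); }
static inline uint16_t req_len (uint64_t r) { return (uint16_t)(r >> 32); }
static inline uint32_t req_addr(uint64_t r) { return (uint32_t)r; }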

Page 11: 05 - NVM SSDs - pages.cs.wisc.edu

Avoiding interrupts

• Interrupts allow doing other useful work during I/O
• What happens if I/O is fast?

Page 12: 05 - NVM SSDs - pages.cs.wisc.edu

Avoiding interrupts

• Interrupts allow doing other useful work during I/O
• What happens if I/O is fast?

For a 4KiB transfer, Ta = 4.9, Td = 4.1, Tb = 4.1, and Tu = 2.7

Page 13: 05 - NVM SSDs - pages.cs.wisc.edu

Polling is much faster than interrupts

• HW is faster: no interrupt generation, just a PIO to check status
• The execution time used during async I/O (Tb above) is shorter than the extra time added to context switch & handle interrupts
• Result: net loss of performance from interrupts (a polling sketch follows)
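A minimal sketch of the spin-polling alternative: the submitting thread repeatedly reads a completion word that the device writes into host memory instead of sleeping until an interrupt arrives. The status location and the "done" encoding are hypothetical.

#include <stdint.h>

#define REQ_DONE 1u   /* hypothetical "request complete" status value */

/* Busy-wait on a completion word the device writes into host memory.
 * For multi-microsecond I/O this burns a few microseconds of CPU, but avoids
 * the interrupt + context-switch cost, which the slide says is larger. */
static uint32_t poll_for_completion(volatile uint32_t *status)
{
    while (*status != REQ_DONE) {
#if defined(__x86_64__) || defined(__i386__)
        __builtin_ia32_pause();   /* be polite to the sibling hyperthread */
#endif
    }
    return *status;
}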

Page 14: 05 - NVM SSDs - pages.cs.wisc.edu

When are interrupts beneficial? Why?

Page 15: 05 - NVM SSDs - pages.cs.wisc.edu

Removing copies

• Standard I/O copies data from user-mode to kernel buffers
• Can we get rid of this copy?

Page 16: 05 - NVM SSDs - pages.cs.wisc.edu

Removing copies

• Standard I/O copies data from user-mode to kernel buffers
• Can we get rid of this copy?
  • Memory locations used for I/O must be pinned → user code either does the copy, or makes an expensive syscall to pin (see the sketch after this list)
  • Must pass memory locations to the SSD → adds I/O operations
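A rough C sketch of the "pin once, reuse" idea: allocate a page-aligned buffer, pin it up front, and reuse it for every request, so there is neither a per-I/O copy nor a per-I/O pinning syscall. mlock() is a simplification here (a real driver also registers the pages for DMA), and how the region is handed to the device is omitted.

#include <stdlib.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

/* Allocate a page-aligned I/O buffer and pin it so its pages stay resident
 * while the device transfers data into or out of it. */
static void *alloc_pinned_buffer(size_t size)
{
    long page = sysconf(_SC_PAGESIZE);
    void *buf = NULL;

    if (posix_memalign(&buf, (size_t)page, size) != 0)
        return NULL;
    if (mlock(buf, size) != 0) {        /* pin once, up front */
        perror("mlock");
        free(buf);
        return NULL;
    }
    return buf;   /* reuse this buffer for all subsequent requests */
}

int main(void)
{
    void *buf = alloc_pinned_buffer(1 << 20);   /* 1 MiB, for illustration */
    if (!buf)
        return 1;
    /* ... hand this buffer to the user-space driver for copy-free I/O ... */
    munlock(buf, 1 << 20);
    free(buf);
    return 0;
}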

Page 17: 05 - NVM SSDs - pages.cs.wisc.edu

Overall results

Page 18: 05 - NVM SSDs - pages.cs.wisc.edu

Internal SSD scheduling

• Issue: a mix of short and large requests
  • e.g., 4 KB vs 2 MB
• If run in order (FIFO), short requests wait for long ones to complete, hurting latency
• Solution: borrow from CPU schedulers
  • Round-robin within the queue: serve 4 KB of each request, then put it back in the queue (sketched below)
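A toy sketch of that round-robin slicing, assuming a simple FIFO of outstanding requests; each pass transfers at most 4 KB of a request and re-queues the remainder. All structures and the do_transfer() stub are illustrative.

#include <stdint.h>
#include <stddef.h>

#define SLICE 4096u   /* serve at most 4 KB of a request per pass */

struct request {
    uint64_t addr;        /* next device address to transfer */
    size_t   remaining;   /* bytes left in this request      */
    struct request *next;
};

struct queue { struct request *head, *tail; };

static void enqueue(struct queue *q, struct request *r)
{
    r->next = NULL;
    if (q->tail) q->tail->next = r; else q->head = r;
    q->tail = r;
}

static struct request *dequeue(struct queue *q)
{
    struct request *r = q->head;
    if (r) {
        q->head = r->next;
        if (!q->head) q->tail = NULL;
    }
    return r;
}

/* Device-specific data movement, omitted in this sketch. */
static void do_transfer(uint64_t addr, size_t len) { (void)addr; (void)len; }

/* Round-robin slicing: a short request finishes after one pass instead of
 * waiting behind a 2 MB transfer that would otherwise hold the device. */
static void schedule_pass(struct queue *q)
{
    struct request *r;
    while ((r = dequeue(q)) != NULL) {
        size_t chunk = r->remaining < SLICE ? r->remaining : SLICE;
        do_transfer(r->addr, chunk);
        r->addr      += chunk;
        r->remaining -= chunk;
        if (r->remaining > 0)
            enqueue(q, r);   /* rest of the request goes to the back of the queue */
        /* else: request complete; completion would be signaled here */
    }
}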

Page 19: 05 - NVM SSDs - pages.cs.wisc.edu

How does Moneta compare against a fast flash SSD?

• Notes:
  • Not that much faster for some workloads
  • Why?
  • Generally

Page 20: 05 - NVM SSDs - pages.cs.wisc.edu

Real Optane SSD

• Much lower tail latency
• Much more stable latency at higher IOPS

Page 21: 05 - NVM SSDs - pages.cs.wisc.edu

What is missing from Moneta?

Page 22: 05 - NVM SSDs - pages.cs.wisc.edu

What is missing from Moneta?

• Still have to enter the kernel and move data from user to kernel buffers
• Still have a slow file system – adds 50-60% latency on top of the hardware

• Only have DRAM simulation, not PCM

Page 23: 05 - NVM SSDs - pages.cs.wisc.edu

Onyx – first PCM SSD

• Bought PCM chips, built their own DIMMs, built an SSD
• Performance not much better than an SSD

Page 24: 05 - NVM SSDs - pages.cs.wisc.edu

Start-gap wear leveling
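The figure for this slide did not survive the transcript. As a reminder of the technique (Start-Gap wear leveling, as proposed by Qureshi et al.): the device stores N logical lines in N+1 physical lines, keeps one always-empty "gap" line that it rotates through the array every ψ writes, and tracks completed rotations with a "start" offset, so remapping is pure arithmetic with no per-line table. A hedged toy sketch of that mapping:

#include <stdint.h>

/* Toy Start-Gap wear leveling: N logical lines stored in N+1 physical lines.
 * The gap slowly rotates through the array; "start" advances once per full
 * rotation, so writes spread over all physical lines over time. */
#define N    8          /* number of logical lines (tiny, for illustration) */
#define PSI  100        /* move the gap once every PSI writes               */

static uint64_t mem[N + 1];          /* N data lines + 1 gap line */
static unsigned start_reg = 0;       /* 0 .. N-1 */
static unsigned gap_reg   = N;       /* 0 .. N (gap starts at the last line) */
static unsigned write_count = 0;

/* Logical -> physical line mapping: pure arithmetic, no translation table. */
static unsigned map_line(unsigned la)
{
    unsigned pa = (la + start_reg) % N;
    if (pa >= gap_reg)
        pa += 1;                     /* skip over the gap line */
    return pa;
}

/* Move the gap by one line; after a full rotation, advance the start. */
static void gap_move(void)
{
    if (gap_reg == 0) {
        mem[0] = mem[N];             /* gap wraps from line 0 back to line N */
        gap_reg = N;
        start_reg = (start_reg + 1) % N;
    } else {
        mem[gap_reg] = mem[gap_reg - 1];
        gap_reg -= 1;
    }
}

static void write_line(unsigned la, uint64_t value)
{
    mem[map_line(la)] = value;
    if (++write_count % PSI == 0)    /* amortize the extra copy over PSI writes */
        gap_move();
}

static uint64_t read_line(unsigned la)
{
    return mem[map_line(la)];
}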

Page 25: 05 - NVM SSDs - pages.cs.wisc.edu

User-mode access to Moneta SSD

• What is needed to let applications access Moneta directly?

Page 26: 05 - NVM SSDs - pages.cs.wisc.edu

Goal for Moneta-Direct

Page 27: 05 - NVM SSDs - pages.cs.wisc.edu

User-mode access to Moneta SSD

• What is needed to let applications access Moneta directly?

• User-space driver
• Virtualization / many channels (many processes)
• Protection

Page 28: 05 - NVM SSDs - pages.cs.wisc.edu

User-space driver

• What is it for?

• How to access it?

Page 29: 05 - NVM SSDs - pages.cs.wisc.edu

User-space driver

• What is it for?
  • Knows the protocol for talking to the SSD
  • Issues requests, waits for responses
• Function:
  • Call the kernel to do open/close
  • Load information about the file into user space – where its blocks are located
  • Implement read/write/sync in user space
• How to access it?
  • Idea 1: add a new API
  • Idea 2: relink programs against a new implementation of the I/O syscalls
  • Idea 3: use LD_PRELOAD to dynamically link programs against the new implementation (sketched below)
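Idea 3 uses standard dynamic-linker interposition. Below is a hedged sketch of an LD_PRELOAD shim that intercepts read(); the fd_is_on_moneta() and userspace_read() helpers are hypothetical placeholders for the user-space driver, not a real API.

/* Build: gcc -shared -fPIC -o shim.so shim.c -ldl
 * Run:   LD_PRELOAD=./shim.so ./app */
#define _GNU_SOURCE
#include <dlfcn.h>
#include <stdbool.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical hooks into a user-space driver -- placeholders, not a real API. */
static bool fd_is_on_moneta(int fd) { (void)fd; return false; }
static ssize_t userspace_read(int fd, void *buf, size_t n)
{ (void)fd; (void)buf; (void)n; return -1; }

static ssize_t (*real_read)(int, void *, size_t);

/* Intercept read(): file descriptors on the fast SSD could bypass the kernel;
 * everything else falls through to the libc implementation. */
ssize_t read(int fd, void *buf, size_t count)
{
    if (!real_read)
        real_read = (ssize_t (*)(int, void *, size_t))dlsym(RTLD_NEXT, "read");

    if (fd_is_on_moneta(fd))
        return userspace_read(fd, buf, count);

    return real_read(fd, buf, count);
}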

Page 30: 05 - NVM SSDs - pages.cs.wisc.edu

Virtualization – many channels

• Each process needs a separate connection to the SSD
  • Submits its own requests; should not see requests of other processes
  • The interface to the SSD includes privileged state – control over the whole device
• Channels: a safe user-mode connection to the device (an attach sketch follows below)
  • Supports 1000+ registers
  • Implemented as a memory-mapped region with a command register, …
• How to limit what addresses can be used for DMA?
  • If unrestricted, the device could be made to access any physical memory
  • Solution: the kernel provides a DMA buffer for each channel; the channel can only DMA to/from that buffer
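One plausible shape for the user-space side of a channel, assuming (purely for illustration) that a kernel driver exposes each channel's register page and its private DMA buffer as mmap-able regions of a /dev node. The device path, offsets, and sizes below are invented.

#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

#define REG_PAGE_SIZE 4096u        /* invented: one page of channel registers */
#define DMA_BUF_SIZE  (1u << 20)   /* invented: per-channel DMA buffer size   */

struct channel {
    volatile uint64_t *regs;   /* this channel's command/status registers     */
    uint8_t           *dma;    /* the only memory the device may DMA to/from  */
};

/* Attach to one virtual channel: the kernel driver picks a free channel,
 * maps only that channel's registers into this process, and pre-pins the
 * channel's DMA buffer. */
static int channel_open(struct channel *ch, const char *devpath)
{
    int fd = open(devpath, O_RDWR);   /* e.g. "/dev/moneta_channel" (invented) */
    if (fd < 0) {
        perror("open");
        return -1;
    }

    void *regs = mmap(NULL, REG_PAGE_SIZE, PROT_READ | PROT_WRITE,
                      MAP_SHARED, fd, 0);
    void *dma  = mmap(NULL, DMA_BUF_SIZE, PROT_READ | PROT_WRITE,
                      MAP_SHARED, fd, REG_PAGE_SIZE);   /* invented offset scheme */
    close(fd);

    if (regs == MAP_FAILED || dma == MAP_FAILED)
        return -1;

    ch->regs = (volatile uint64_t *)regs;
    ch->dma  = (uint8_t *)dma;
    return 0;
}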

Page 31: 05 - NVM SSDs - pages.cs.wisc.edu

Enforcing protection

• What do we need to enforce for user-level access to submit I/O requests?

Page 32: 05 - NVM SSDs - pages.cs.wisc.edu

Enforcing protection

• What do we need to enforce for user-level access to submit I/O requests?

• Only access files you have opened → only access blocks of files you have opened
• Solution: give the SSD information about which blocks are accessible
  • Get extents (file offset, range of blocks) from the file system
  • Provide them to the SSD for a channel
  • The SSD caches some set of extents (a lookup sketch follows)
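The check itself is a simple range lookup once extents are available; here is a toy sketch, with invented types, of the test the SSD (or the library, for its cached copy) would apply to each request.

#include <stdbool.h>
#include <stdint.h>
#include <stddef.h>

/* One extent of a file that this channel is allowed to touch (invented type). */
struct extent {
    uint64_t file_off;     /* starting file offset, in blocks */
    uint64_t len;          /* number of blocks                */
    uint64_t disk_block;   /* starting block on the device    */
};

/* Check whether [off, off+len) falls inside a permitted extent; if so,
 * translate it to a device block. Returns false on a miss or a violation. */
static bool extent_lookup(const struct extent *tbl, size_t n,
                          uint64_t off, uint64_t len, uint64_t *disk_out)
{
    for (size_t i = 0; i < n; i++) {
        if (off >= tbl[i].file_off &&
            off + len <= tbl[i].file_off + tbl[i].len) {
            *disk_out = tbl[i].disk_block + (off - tbl[i].file_off);
            return true;   /* permitted: the request may proceed */
        }
    }
    return false;          /* miss: fail, or refill from the kernel */
}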

Page 33: 05 - NVM SSDs - pages.cs.wisc.edu

What happens during a request?

• Allocate a tag; get a 16 KB region of the DMA buffer attached to the tag (the whole path is sketched after this list)
• Look up the offset in the library's extent map; if not present (soft miss), call the kernel to fetch it
• Submit the request to the SSD
• Poll for completion
• The SSD looks up the block in its extent map to see if it is accessible
  • If not accessible, fail
  • If not present, fail and make the user-space driver call the kernel to provide the info to the SSD
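Pulling the steps together, a hypothetical sketch of the user-space request path; every helper it calls is a trivial stub standing in for the steps named above, not a real API.

#include <stdbool.h>
#include <stdint.h>

/* Trivial stubs so the sketch compiles; a real driver implements these. */
static int  alloc_tag(void) { return 0; }           /* also reserves 16 KB of DMA buffer */
static void free_tag(int tag) { (void)tag; }
static bool extent_lookup_cached(uint64_t off, uint64_t len, uint64_t *blk)
{ (void)off; (void)len; *blk = 0; return true; }
static void extent_fetch_from_kernel(uint64_t off) { (void)off; }   /* soft-miss path */
static void submit_to_ssd(int tag, uint64_t blk, uint64_t len)
{ (void)tag; (void)blk; (void)len; }
static int  poll_status(int tag) { (void)tag; return 0; }           /* 0 = ok */

/* One read, Moneta-Direct style: translate in user space, submit, spin. */
static int direct_read(uint64_t file_off, uint64_t len)
{
    uint64_t blk;
    int tag = alloc_tag();

    if (!extent_lookup_cached(file_off, len, &blk)) {
        extent_fetch_from_kernel(file_off);              /* refill the extent cache */
        if (!extent_lookup_cached(file_off, len, &blk)) {
            free_tag(tag);
            return -1;                                   /* block not permitted */
        }
    }

    submit_to_ssd(tag, blk, len);   /* the SSD re-checks its own extent table */
    int rc = poll_status(tag);      /* poll for completion, no interrupt */
    free_tag(tag);
    return rc;
}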

Page 34: 05 - NVM SSDs - pages.cs.wisc.edu

Moneta-Direct results

Page 35: 05 - NVM SSDs - pages.cs.wisc.edu

Summary

• With a fast SSD, HW and SW overheads dominate performance

• HW optimizations:
  • Simpler, single-PIO submission interface
  • Atomic interfaces for scalability
  • Scheduling inside the device
  • Move OS functionality into the device – permission checks

• SW optimizations:
  • Remove unnecessary layers
  • Use polling for short requests
  • Move I/O submission to user space