disksim with ssd_extension
Analyzed the source code of the disk simulator Disksim and its SSD extension from Microsoft.
Disksim with SSD Extension: A Developer's Perspective
Jiannan Ouyang, PhD CS@PITT
2011/04/07
Outline
Overview
Disksim implementation
SSD extension
Disksim
Disksim: an open-source disk simulator, originally developed at UMich and enhanced at CMU.
Disksim features
Various device models, including disk, simpledisk, and memsmodel
Controller models: simple, smart (with cache)
Trace synthesis and support for different trace file formats
DIXtrac: automatic disk characterization
ssdmodel
Developed by Microsoft.
Not modeled on any specific SSD device
For an idealized SSD that is parameterized by the properties of NAND flash chips
Cache is NOT natively supported
Source Dir
src/ disksim source (disksim_*.c/h)
ssdmodel/ ssd extension source (ssd_*.c/h)
diskmodel/ disk model: layout and mechanics
memsmodel/ MEMS device model
libparam/ parameter processing lib
...
Disksim source: src/
File                 Component          Entry function
disksim_main*        main entry         main()
disksim_iodriver*    I/O driver         iodriver_send_event_down_path()
disksim_bus*         bus                bus_deliver_event()
disksim_controller*  controller         controller_event_arrive()
disksim_diskctlr*    disk controller    disk_event_arrive()
...
Disksim Control Path
Event-based system: there are various types of events (I/O, interrupt, timer, ...). All events are stored in a global queue in time order; addtointq() and removefromintq() are used to access this queue.
Equivalent code:
while ((curr = getnextevent()) != NULL) {
    switch (curr->type) {
    case IO_REQUEST_ARRIVE:
        iodriver_request(curr);
        break;
    }
}
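The loop above relies on a time-ordered global queue. Below is a minimal sketch of that idea in C, assuming a singly linked list kept sorted by event time. add_to_queue(), get_next_event() and run_simulation() are hypothetical, simplified stand-ins for addtointq(), getnextevent() and the main loop, not disksim's actual implementation, and the two event types are a trimmed-down subset.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical subset of event types; disksim defines many more. */
enum event_type { IO_REQUEST_ARRIVE, IO_INTERRUPT_COMPLETE };

typedef struct event {
    double time;              /* simulated time at which the event fires */
    enum event_type type;
    struct event *next;
} event;

static event *intq = NULL;    /* global event queue, sorted by time */
static double simtime = 0.0;  /* current simulated time */

/* Simplified stand-in for addtointq(): insert in time order.
   Events with equal times keep their insertion (FIFO) order. */
static void add_to_queue(event *e)
{
    assert(e != NULL);
    event **pp = &intq;
    while (*pp != NULL && (*pp)->time <= e->time)
        pp = &(*pp)->next;
    e->next = *pp;
    *pp = e;
}

/* Simplified stand-in for getnextevent(): pop the earliest event
   and advance the simulated clock to its time. */
static event *get_next_event(void)
{
    event *e = intq;
    if (e != NULL) {
        intq = e->next;
        simtime = e->time;
    }
    return e;
}

static int requests_seen, interrupts_seen;

/* Main loop: dispatch each event on its type until the queue drains. */
static void run_simulation(void)
{
    event *curr;
    while ((curr = get_next_event()) != NULL) {
        switch (curr->type) {
        case IO_REQUEST_ARRIVE:
            requests_seen++;      /* disksim would call into the I/O driver here */
            break;
        case IO_INTERRUPT_COMPLETE:
            interrupts_seen++;
            break;
        }
    }
}
```

Inserting events at times 5, 1 and 3 and then calling run_simulation() processes them in the order 1, 3, 5, leaving simtime at 5. A sorted list keeps the sketch short; with many pending events a heap would make insertion cheaper.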
Example
src/disksim_iosim.c, io_internal_event():
    case IO_ACCESS_ARRIVE:
        iodriver_schedule(0, curr);
        break;
src/disksim_iodriver.c, iodriver_schedule():
    iodriver_send_event_down_path(curr);
src/disksim_iodriver.c, iodriver_send_event_down_path():
    bus_deliver_event(busno.byte[0], slotno.byte[0], curr);
Example (cont.)
src/disksim_bus.c, bus_deliver_event():
    case CONTROLLER:
        controller_event_arrive(devno, curr);
        break;
    case DEVICE:
        ASSERT(devno == curr->devno);
        device_event_arrive(curr);
        break;
This call chain is how a single event is simulated: it is passed from the I/O driver over the bus to a controller or directly to a device.
Disksim & Device Interface
INLINE void device_event_arrive (ioreq_event *curr)
{
    ASSERT1 ((curr->devno >= 0) && (curr->devno < numdevices), "curr->devno", curr->devno);
    return disksim->deviceinfo->devices[curr->devno]->event_arrive(curr);
}
Function pointer! By tracing the program dynamically with gdb, we found that:
For disk, it jumps to disk_event_arrive()
For ssd, it jumps to ssd_event_arrive()
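To illustrate this kind of function-pointer dispatch, here is a hypothetical, stripped-down C version: each entry in a device table carries its own event_arrive handler, so the generic device_event_arrive() never needs to know whether the device is a disk or an SSD. The struct names and fields are assumptions for illustration, not disksim's real definitions.

```c
#include <assert.h>

/* Hypothetical, trimmed-down request type. */
typedef struct ioreq_event {
    int devno;
} ioreq_event;

/* Each device model fills in its own event handler. */
typedef struct device_header {
    const char *name;
    void (*event_arrive)(ioreq_event *curr);
} device_header;

static int disk_calls, ssd_calls;

/* Stand-ins for the real handlers: they only count invocations. */
static void disk_event_arrive(ioreq_event *curr) { (void)curr; disk_calls++; }
static void ssd_event_arrive(ioreq_event *curr)  { (void)curr; ssd_calls++; }

static device_header disk0 = { "disk", disk_event_arrive };
static device_header ssd0  = { "ssd",  ssd_event_arrive };

/* Device table indexed by devno, playing the role of
   disksim->deviceinfo->devices[]. */
static device_header *devices[] = { &disk0, &ssd0 };
static const int numdevices = sizeof devices / sizeof devices[0];

/* Generic entry point: the call goes through the function pointer. */
static void device_event_arrive(ioreq_event *curr)
{
    assert(curr->devno >= 0 && curr->devno < numdevices);
    devices[curr->devno]->event_arrive(curr);
}
```

A request with devno 0 reaches disk_event_arrive() and one with devno 1 reaches ssd_event_arrive(); adding a new device model only means adding a table entry with its own handler.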
event_arrive: disk vs. ssd
disk_event_arrive():
    case IO_ACCESS_ARRIVE:                  disk_request_arrive(curr);
    case DEVICE_OVERHEAD_COMPLETE:          disk_request_arrive(curr);
    case DEVICE_BUFFER_SEEKDONE:            disk_buffer_seekdone(currdisk, curr);
    case DEVICE_BUFFER_SECTOR_DONE:         disk_buffer_sector_done(currdisk, curr);
    case DEVICE_GOTO_REMAPPED_SECTOR:       disk_goto_remapped_sector(currdisk, curr);
    case DEVICE_GOT_REMAPPED_SECTOR:        disk_got_remapped_sector(currdisk, curr);
    case DEVICE_PREPARE_FOR_DATA_TRANSFER:  disk_prepare_for_data_transfer(curr);
    case DEVICE_DATA_TRANSFER_COMPLETE:     disk_reconnection_or_transfer_complete(curr);
    case IO_INTERRUPT_COMPLETE:             disk_interrupt_complete(curr);
ssd_event_arrive():
    case DEVICE_OVERHEAD_COMPLETE:          ssd_request_arrive(curr);
    case DEVICE_ACCESS_COMPLETE:            ssd_access_complete(curr);
    case DEVICE_DATA_TRANSFER_COMPLETE:     ssd_bustransfer_complete(curr);
    case IO_INTERRUPT_COMPLETE:             ssd_interrupt_complete(curr);
    case SSD_CLEAN_GANG:                    ssd_clean_gang_complete(curr);
    case SSD_CLEAN_ELEMENT:                 ssd_clean_element_complete(curr);
"buffer" events are cache-related. "remapped sector" events seem to be related to data layout (not confirmed).
"clean" events are related to garbage collection and wear leveling. "Gang" and "element" specify the unit of allocation and reclamation.
ssdmodel features
Adds an auxiliary level of parallel elements, each with a closed queue, to represent flash elements or gangs
Adds logic to serialize request completions from these parallel elements
For each element, maintains data structures representing the SSD logical block map, cleaning state, and wear-leveling state
Introduces a delay when a request is processed
Parameters include background cleaning, gang size, gang organization, interleaving, and overprovisioning
Flash Package Internals
Flash Chip Performance
1. Latency
   bus <-> data register: 100 us
   media -> register (read): 25 us
   register -> media (write): 200 us
   erase: 1.5 ms
2. Two-plane commands can be executed on plane pairs 0&1 or 2&3
3. Background copy is supported within the same plane
4. Bandwidth and interleaving
   src plane -> dest plane: 4-page copy (100 us per page)
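Putting the latencies above together gives rough per-page costs. The sketch below assumes a fully serial model (no overlap between the media access and the bus transfer, no cache register), so it is only a back-of-the-envelope estimate: a page read costs 25 + 100 = 125 us and a page write 100 + 200 = 300 us. The macro and function names are hypothetical.

```c
#include <assert.h>

/* Latencies from the slide, in microseconds. */
#define XFER_US   100.0   /* bus <-> data register, one page */
#define READ_US    25.0   /* media -> register */
#define PROG_US   200.0   /* register -> media (program) */
#define ERASE_US 1500.0   /* block erase */

/* Assumption: a single page operation is fully serial, with no
   overlap between the media access and the bus transfer. */
static double page_read_us(void)  { return READ_US + XFER_US; }
static double page_write_us(void) { return XFER_US + PROG_US; }

/* Time to read n pages back to back on one plane (no pipelining). */
static double serial_read_us(int n)
{
    assert(n >= 0);
    return n * page_read_us();
}
```

Under this model, reading 4 pages takes 500 us, and one erase (1.5 ms) costs as much as five page writes, which is why cleaning is expensive.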
SSD Simulation
Logical block map: allocation pool
Cleaning: greedy or wear-leveling aware
Parallelism and interconnect density: ganging, interleaving, background cleaning
Persistence: saving mapping information per block in DRAM
Interconnection - Ganging
A gang of flash packages can be used in synchrony to optimize a multi-page request. Ganging allows multiple packages to be used in parallel while sharing one request queue.
A request queue can be associated with each gang, or with each element (full interconnection mode).
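A minimal sketch of the ganging arithmetic, assuming the pages of one request are spread round-robin over the gang members and that all members finish each round together (the "synchrony" above). gang_rounds(), gang_service_us() and gang_member() are hypothetical helpers, not ssdmodel functions, and a constant per-page cost is an added simplification.

```c
#include <assert.h>

/* Number of lockstep rounds needed when a gang of gang_size packages
   serves `pages` pages, one page per package per round. */
static int gang_rounds(int pages, int gang_size)
{
    assert(pages >= 0 && gang_size > 0);
    return (pages + gang_size - 1) / gang_size;   /* ceil(pages / gang_size) */
}

/* Service time if every round costs page_us (assumed constant). */
static double gang_service_us(int pages, int gang_size, double page_us)
{
    return gang_rounds(pages, gang_size) * page_us;
}

/* Which package of the gang handles page i of the request. */
static int gang_member(int page_index, int gang_size)
{
    assert(page_index >= 0 && gang_size > 0);
    return page_index % gang_size;
}
```

For example, an 8-page write on a 4-package gang needs 2 rounds (600 us at 300 us per page) instead of 8 rounds on a single package.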
Logical Block Map
The allocation pool is the abstraction used to reason about how an SSD allocates flash blocks to service write requests
An allocation pool can be a flash package or a gang
Static: a portion of each LBA constitutes a fixed mapping to a specific allocation pool
Dynamic: the non-static portion of an LBA is the lookup key for a mapping within a pool
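The static/dynamic split can be sketched as follows. Using the LBA modulo the number of pools as the static portion, a per-pool array as the dynamic map, and log-style allocation of a fresh page on every write are all assumptions made for illustration; ssdmodel's own data structures may differ, and every name here is hypothetical.

```c
#include <assert.h>

#define NUM_POOLS      4    /* hypothetical: e.g. 4 packages or gangs */
#define KEYS_PER_POOL 16    /* hypothetical per-pool logical space */
#define UNMAPPED      -1

/* Per-pool dynamic map: key -> physical page inside that pool. */
typedef struct pool {
    int map[KEYS_PER_POOL];
    int next_free_page;     /* writes are appended, log-style */
} pool;

static pool pools[NUM_POOLS];

static void pools_init(void)
{
    for (int p = 0; p < NUM_POOLS; p++) {
        for (int k = 0; k < KEYS_PER_POOL; k++)
            pools[p].map[k] = UNMAPPED;
        pools[p].next_free_page = 0;
    }
}

/* Static part (assumed): the LBA modulo the number of pools. */
static int lba_pool(int lba) { return lba % NUM_POOLS; }

/* Dynamic part: the remaining bits are the lookup key within the pool. */
static int lba_key(int lba) { return lba / NUM_POOLS; }

/* A write always goes to a fresh page in the LBA's fixed pool; the
   previously mapped page (if any) becomes superseded. */
static int map_write(int lba)
{
    assert(lba >= 0 && lba < NUM_POOLS * KEYS_PER_POOL);
    pool *p = &pools[lba_pool(lba)];
    int page = p->next_free_page++;
    p->map[lba_key(lba)] = page;
    return page;
}

static int map_lookup(int lba)
{
    assert(lba >= 0 && lba < NUM_POOLS * KEYS_PER_POOL);
    return pools[lba_pool(lba)].map[lba_key(lba)];
}
```

Rewriting LBA 9 moves it to a new page in the same pool (pool 1); the old page becomes a superseded page that cleaning must later reclaim.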
Garbage Collection (Cleaning)
active block: the block in a pool that is available to hold incoming writes
superseded page: out-of-date page
cleaning efficiency: superseded pages / total pages in a block
a pure greedy approach: chooses blocks to clean based on their potential cleaning efficiency
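The greedy policy reduces to picking the block with the highest cleaning efficiency. This sketch uses a hypothetical block_info record and, as an added assumption, skips the active block since it is still receiving writes; it is not ssd_clean_blocks_greedy() itself.

```c
#include <assert.h>

/* Hypothetical per-block bookkeeping for one allocation pool. */
typedef struct block_info {
    int superseded_pages;   /* out-of-date pages */
    int total_pages;        /* pages per block */
    int is_active;          /* currently receiving writes (assumed never cleaned) */
} block_info;

/* Cleaning efficiency = superseded pages / total pages. */
static double cleaning_efficiency(const block_info *b)
{
    assert(b->total_pages > 0);
    return (double)b->superseded_pages / b->total_pages;
}

/* Pure greedy victim selection: the non-active block with the highest
   efficiency. Returns -1 if no block has any superseded pages. */
static int pick_greedy_victim(const block_info *blocks, int n)
{
    int best = -1;
    double best_eff = 0.0;
    for (int i = 0; i < n; i++) {
        if (blocks[i].is_active)
            continue;
        double eff = cleaning_efficiency(&blocks[i]);
        if (eff > best_eff) {
            best_eff = eff;
            best = i;
        }
    }
    return best;
}
```

Cleaning the chosen block means copying out only its still-valid pages, so a higher efficiency means fewer copies per reclaimed page.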
Wear-Leveling
average remaining lifetime (ARL) of a block
age variance: say, 20% of the ARL
retirement age: say, 85% of the ARL
Wear-aware garbage collection:
1. If the block's remaining lifetime is below the retirement age, migrate cold data into this block from a migration-candidate queue, and recycle the head block of the queue. Populate the queue with new blocks containing cold data.
2. Otherwise, if the block's remaining lifetime falls more than the age variance below the ARL, restrict recycling of the block with a probability that increases linearly as the remaining lifetime drops to 0 (at 80% of the average, the probability of recycling is 1; at 0% of the average, it is 0).
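The throttling rule ("Otherwise ...") can be written as a probability function. This sketch implements only that rule, not the retirement/migration step, using the slide's 20% age variance: recycling is unrestricted down to 80% of the ARL, and below that its probability falls linearly to 0. The random draw u is passed in so the caller chooses the random source; all names are hypothetical.

```c
#include <assert.h>

#define AGE_VARIANCE 0.20   /* slide example: 20% of the ARL */

/* Probability that a block may be recycled, given its remaining lifetime
   and the average remaining lifetime (ARL). At or above (1 - variance)
   of the ARL the block is unrestricted; below that, the probability
   falls linearly to 0 as the remaining lifetime reaches 0. */
static double recycle_probability(double remaining, double arl)
{
    assert(arl > 0.0 && remaining >= 0.0);
    double threshold = (1.0 - AGE_VARIANCE) * arl;   /* 80% of the ARL */
    if (remaining >= threshold)
        return 1.0;
    return remaining / threshold;
}

/* u is a uniform random draw in [0, 1), e.g. rand() / (RAND_MAX + 1.0). */
static int may_recycle(double remaining, double arl, double u)
{
    assert(u >= 0.0 && u < 1.0);
    return u < recycle_probability(remaining, arl);
}
```

For a block at 40% of the ARL, the probability of recycling is 0.4 / 0.8 = 0.5, so roughly half of the cleaning attempts on it are allowed.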
Source: ssdmodel/
ssdmodel is small; all of its C files are listed below:
File          Role                                   Key function
ssd.c         main module                            ssd_event_arrive()
ssd_clean.c   garbage collection and wear leveling   ssd_clean_blocks_greedy()
ssd_gang.c    flash packages organized as gangs      ssd_activate_gang()
ssd_timing.c  timing model                           ssd_compute_access_time()
ssd_utils.c   utilities
ssd_init.c    initialization
Example
Event sequence for one request:
ssd_request_arrive -> ssd_interrupt_complete (reconnect) -> ssd_bustransfer_complete -> ssd_access_complete -> ssd_interrupt_complete (completion)
ssd_bustransfer_complete() -> ssd_media_access_request()
ssdmodel/ssd.c, ssd_media_access_request():
    case SSD_ALLOC_POOL_PLANE:
    case SSD_ALLOC_POOL_CHIP:
        ssd_media_access_request_element(curr);
        break;
    case SSD_ALLOC_POOL_GANG:
#if SYNC_GANG
        ssd_media_access_request_gang_sync(curr);
#else
        ssd_media_access_request_gang(curr);
#endif
        break;
Example (cont.)
ssd_media_access_request_element()
  -> ssd_activate_element()
  -> ssd_invoke_element_cleaning()
  -> ssd_compute_access_time(currdisk, elem_num, read_reqs, read_total);
     add the completion events to the global event queue
  -> ssd_compute_access_time(currdisk, elem_num, write_reqs, write_total);
     add the completion events to the global event queue
Parallel processing, sequential completion: a batch of requests is processed in parallel, but the ACCESS_COMPLETE events are generated sequentially.
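One way to read "parallel processing, sequential completion" is sketched below: all elements start their batches at the same simulated time, but within an element each request's ACCESS_COMPLETE is scheduled at the running sum of service times, so the completions come out one after another. This interpretation and the helper names schedule_batch() and batch_finish_time() are assumptions, not ssdmodel code.

```c
#include <assert.h>

/* For one element, compute a completion time for each request in its
   batch. Requests are serviced one after another, so each completion is
   the running sum of service times: the events come out sequentially. */
static void schedule_batch(double start, const double *service_us, int n,
                           double *complete_at)
{
    assert(n >= 0);
    double t = start;
    for (int i = 0; i < n; i++) {
        t += service_us[i];
        complete_at[i] = t;   /* would become an ACCESS_COMPLETE event */
    }
}

/* Elements work in parallel from the same start time, so the whole
   batch finishes when the busiest element finishes. */
static double batch_finish_time(double start, const double *element_busy_us,
                                int num_elements)
{
    double finish = start;
    for (int e = 0; e < num_elements; e++)
        if (start + element_busy_us[e] > finish)
            finish = start + element_busy_us[e];
    return finish;
}
```

With service times of 125, 125 and 300 us starting at t = 1000 us, one element's completions land at 1125, 1250 and 1550 us, while other elements run their own batches over the same interval.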
Thanks
Q & A ?
Block striping
// blocks can be concatenated (chained) from each plane
//
// plane 0     plane 1     plane 2     plane 3
// ------------------------------------------
// blk 0       blk 2048    blk 4096    blk 6144
// blk 1       blk 2049    blk 4097    blk 6145
// ...         ...
// blk 2047    blk 4095    blk 6143    blk 8191
// blocks can be striped across all the planes
//
// plane 0     plane 1     plane 2     plane 3
// ------------------------------------------
// blk 0       blk 1       blk 2       blk 3
// blk 4       blk 5       blk 6       blk 7
// ...         ...
// blk 8188    blk 8189    blk 8190    blk 8191
//
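The two layouts above correspond to two simple address computations, sketched here for the 4-plane, 2048-blocks-per-plane example. map_concatenated(), map_striped() and plane_addr are hypothetical names used only for illustration.

```c
#include <assert.h>

#define PLANES           4
#define BLOCKS_PER_PLANE 2048   /* 8192 blocks in total, as in the layout */

typedef struct plane_addr {
    int plane;   /* which plane holds the block */
    int row;     /* block index inside that plane */
} plane_addr;

/* Concatenated (chained): fill plane 0 first, then plane 1, and so on. */
static plane_addr map_concatenated(int blk)
{
    assert(blk >= 0 && blk < PLANES * BLOCKS_PER_PLANE);
    plane_addr a = { blk / BLOCKS_PER_PLANE, blk % BLOCKS_PER_PLANE };
    return a;
}

/* Striped: consecutive blocks rotate across the planes. */
static plane_addr map_striped(int blk)
{
    assert(blk >= 0 && blk < PLANES * BLOCKS_PER_PLANE);
    plane_addr a = { blk % PLANES, blk / PLANES };
    return a;
}
```

With striping, block 8189 lands in plane 1, row 2047, and consecutive blocks fall on different planes; that is one plausible reason to prefer striping when multi-plane operations can proceed in parallel.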