A Design of User-Level Distributed Shared Memory
Zhi Zhai, Feng Shen
Computer Science and Engineering, University of Notre Dame
Oct. 27, 2009
Progress Report
Outline
• Part I: General Ideas
• Part II: Related Work
• Part III: Implementation
  – Client/Server Processes
  – C/S Page Tables
  – Page Fault Handler
  – Consistency Mechanism
• Part IV: Accomplished Work
• Part V: Future Work
General Ideas
DSM Characteristics:
• Physically: distributed memory
• Logically: a single shared address space
[Figure: processes P1, P2, P3, ..., Pn-1 above a software DSM layer, with local memories M1, M2, M3, ..., Mn-1 below it]
Structure of DSM
[Figure labels: CPU, Memory, Bus, DSM Hardware, Network]
+ Simpler abstraction
+ Possibly better performance: a larger memory space means no need to page to disk
+ Simplified process migration: a process can easily be moved to a different machine, since all machines share the same address space
- Long-latency shared memory accesses can be a bottleneck
Related Work
Models and Main Features:
• IVY (Yale) - Divided space: shared and private space
• Mirage (UCLA) - Time interval d: avoids page thrashing
• TreadMarks (Rice) - Lazy release consistency: improves efficiency
• SAM (Stanford)
Sample Operation
[Figure: message sequence labeled connect, connect, Get Addr., Fetch Page]
Implementation
• Server Process and Client Process
• Server Page Table and Client Page Table
• Page Fault Handler
• Consistency Mechanism
Client/Server Process
• Server Process
  – Listening Thread
    • Listens for new incoming requests
  – Page Table Thread
    • Page table management
    • Processes requests from Client 1, Client 2, ..., Client n
• Client Process
[Figure labels: C/S Process, C/S Communication]
C/S Page Table
• Server Page Table
  – Client data
    • Client IDs, IP addresses
  – Page info for all pages
    • Sets the number of pages/frames
    • Owners / prot bits / frame mappings (all)
  – Does not track the underlying storage on each client
C/S Page Table
• Client Page Table
  – Storage info
    • Pointer to the physical memory
    • Address space of the virtual memory
  – Page info for local pages
    • Self-owned pages
    • Cached pages
    • Owners / prot bits / frame mappings (local)
  – Does not track pages that have not been visited
Page Fault Handler
• Fetch the IP address and frame #
• Clone the demanded page
• Update prot bits
• Execute operations A, B, C, ...
• Write back dirty pages (on writes)
• Restore prot bits
Consistency Mechanism
• Single Writer / Multiple Readers
  – Snapshot; one writer allowed at a time
• Page Modification
  – Twofold role of local frames
    • Two reads should return the same value
  – Occurrences
    • Write operations
    • Page replacement
  – Block modifications to pages that are in use
    • Variable: use_counts (> 0 ? wait or redo : modify OK)
    • fcntl lock (modifications suspended)
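The blocking rule above can be sketched roughly as follows. This is a minimal illustration, not the authors' code: it assumes a per-page `use_counts` table and a single lock file guarded by `fcntl.lockf` (POSIX only). The names `try_modify` and `LOCK_PATH` are hypothetical.

```python
import fcntl
import os
import tempfile

# Hypothetical lock file; the slides only state that an fcntl lock is used.
LOCK_PATH = os.path.join(tempfile.gettempdir(), "dsm_page.lock")

def try_modify(use_counts, frames, page, data):
    """Modify `page` only if no client is using it (use_counts[page] == 0).

    Returns True if the modification was applied, False if the caller
    should wait or redo the request later.
    """
    with open(LOCK_PATH, "w") as lock_file:
        # Exclusive advisory lock: other processes locking this file wait here.
        fcntl.lockf(lock_file, fcntl.LOCK_EX)
        try:
            if use_counts.get(page, 0) > 0:
                return False        # page in use: wait or redo
            frames[page] = data     # modify OK
            return True
        finally:
            fcntl.lockf(lock_file, fcntl.LOCK_UN)

use_counts = {0: 2, 1: 0}           # page 0 is in use by two clients
frames = {0: b"old0", 1: b"old1"}
blocked = try_modify(use_counts, frames, 0, b"new0")
applied = try_modify(use_counts, frames, 1, b"new1")
```

Here page 0 is left untouched because its count is positive, while page 1 is updated.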
Accomplished Work
• Server: ready (bima.helios.nd.edu)
• Clients: communicating with the server
  – Client 1 (chopin.helios.nd.edu)
  – Client 2 (mozart.helios.nd.edu)
Future Work
• Page Fault Handler Implementation
• Testing Plan
  – Inspired by JUMP (Univ. of Hong Kong)
    • Similar mechanism: file locks to keep consistency
    • Source code available
    • Relatively new: 2001
  – Compare the performance of the same application on:
    • DSM vs. a single machine
    • Different settings
    • Different DSM systems
Appendix
Algorithms
Implementation
• Central Server Algorithm
• Migration Algorithm
• Read-Replication Algorithm
• Full-Replication Algorithm
                 Non-Replicated    Replicated
Non-Migrated     Central           Full Replication
Migrated         Migration         Read Replication
Consistency Model
• Strict Consistency
• Causal Consistency
• Weak Consistency
• Release Consistency
Granularity
• Granularity: size of the shared memory unit
• Large page size:
  + Less overhead
  - Greater chance of contention when many processes access the same page
• Smaller page sizes:
  + Less apt to cause contention (reduced likelihood of false sharing)
  - Higher overhead
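The trade-off can be made concrete with a little arithmetic. The sketch below is illustrative only: the addresses, page sizes, and helper names (`page_of`, `falsely_shared`, `pages_needed`) are assumptions chosen for the example.

```python
def page_of(addr, page_size):
    """Page number that contains byte address `addr`."""
    return addr // page_size

def falsely_shared(addr_a, addr_b, page_size):
    """True if two distinct variables land on the same page."""
    return addr_a != addr_b and page_of(addr_a, page_size) == page_of(addr_b, page_size)

def pages_needed(nbytes, page_size):
    """Number of page transfers needed to move `nbytes` (ceiling division)."""
    return -(-nbytes // page_size)

# Hypothetical addresses of two unrelated variables used by different clients.
x_addr, y_addr = 1000, 3000

shared_with_large_pages = falsely_shared(x_addr, y_addr, 4096)
shared_with_small_pages = falsely_shared(x_addr, y_addr, 1024)

# Moving 64 KiB costs fewer transfers (less per-page overhead) with large pages.
transfers_large = pages_needed(64 * 1024, 4096)
transfers_small = pages_needed(64 * 1024, 1024)
```

With 4096-byte pages both variables fall on page 0 and contend with each other; with 1024-byte pages they fall on different pages, at the cost of four times as many transfers for the same amount of data.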
C/S Page Table
• Server Page Table Entries
  – npages, nframes
    • nframes: set by the server owner
    • npages: decided by the number of clients connected
  – client_addresses
    • Accessed when a page fault occurs
  – Full_page_mappings
    • Records the frame # in which each page is located
  – Full_page_bits
    • Indicates the usage status of each page
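One way to picture these entries is the sketch below. It rests on assumptions that the slides do not state: each client contributes `nframes` frames (so `npages = nframes × clients`), each mapping stores the owner together with the frame number, and the port number 9000 is made up. `add_client` and `locate` are hypothetical helpers.

```python
from dataclasses import dataclass, field

PROT_NONE = 0  # illustrative value, not taken from <sys/mman.h>

@dataclass
class ServerPageTable:
    nframes: int                                            # set by the server owner
    client_addresses: dict = field(default_factory=dict)    # client id -> (host, port)
    full_page_mappings: dict = field(default_factory=dict)  # page -> (owner id, frame #)
    full_page_bits: dict = field(default_factory=dict)      # page -> usage status

    @property
    def npages(self):
        # Assumption: every connected client contributes nframes pages.
        return self.nframes * len(self.client_addresses)

    def add_client(self, address):
        """Register a client and the pages it owns; return its id."""
        client_id = len(self.client_addresses)
        self.client_addresses[client_id] = address
        for frame in range(self.nframes):
            page = client_id * self.nframes + frame
            self.full_page_mappings[page] = (client_id, frame)
            self.full_page_bits[page] = PROT_NONE
        return client_id

    def locate(self, page):
        """On a page fault: return the owner's address and the frame #."""
        owner, frame = self.full_page_mappings[page]
        return self.client_addresses[owner], frame

table = ServerPageTable(nframes=4)
table.add_client(("chopin.helios.nd.edu", 9000))
table.add_client(("mozart.helios.nd.edu", 9000))
owner_address, owner_frame = table.locate(5)
```

Under these assumptions, page 5 belongs to the second client and sits in its frame 1.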
C/S Page Table
• Client Page Table Entries
  – Client_id (int)
    • Assigned by the server
  – nframes, npages (int)
    • Constant values configured on the server
  – Physmem
    • Actual allocated physical address / frame space
    • PROT_READ|PROT_WRITE
  – Virtmem
    • Virtual memory address range
    • PROT_NONE, MAP_NORESERVE
C/S Page Table
  – Local_Page_mapping
    • Page # -> the frame # it is loaded in
  – local_page_bits (PROT_NONE for unknown pages)
    • PROT_NONE, PROT_WRITE, PROT_READ, PROT_READ|PROT_WRITE
  – local_page_owners
    • Separates the pages owned by the local client from the pages loaded from other clients
  – Use_status (int)
    • Indicates whether the owned pages are being cached or written by other clients
  – Page_fault_handler
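The client-side entries could be grouped along these lines. This is a simplified sketch under stated assumptions: a `bytearray` stands in for the mapped physical memory, the virtual range and the fault handler are omitted, and `PAGE_SIZE`, the protection constants, and the `map_page` and `bits` helpers are illustrative.

```python
from dataclasses import dataclass, field

PAGE_SIZE = 4096                             # assumed page size
PROT_NONE, PROT_READ, PROT_WRITE = 0, 1, 2   # illustrative values

@dataclass
class ClientPageTable:
    client_id: int                  # assigned by the server
    nframes: int                    # configured on the server
    npages: int                     # configured on the server
    physmem: bytearray = field(init=False)                  # stand-in for the frame space
    local_page_mapping: dict = field(default_factory=dict)  # page # -> frame #
    local_page_bits: dict = field(default_factory=dict)     # page # -> prot bits
    local_page_owners: dict = field(default_factory=dict)   # page # -> owner client id
    use_status: dict = field(default_factory=dict)          # page # -> remote users

    def __post_init__(self):
        self.physmem = bytearray(self.nframes * PAGE_SIZE)

    def bits(self, page):
        """Unknown (never visited) pages default to PROT_NONE."""
        return self.local_page_bits.get(page, PROT_NONE)

    def map_page(self, page, frame, owner, bits):
        """Record that `page` now lives in local `frame`."""
        self.local_page_mapping[page] = frame
        self.local_page_owners[page] = owner
        self.local_page_bits[page] = bits

cpt = ClientPageTable(client_id=1, nframes=2, npages=4)
before = cpt.bits(3)                                   # page 3 not visited yet
cpt.map_page(3, frame=0, owner=2, bits=PROT_READ)      # cached from client 2
after = cpt.bits(3)
```

Only pages that have been mapped appear in the dictionaries, which mirrors the point that the client table does not track unvisited pages.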
Page fault handler
• Reading Attempts
  – Send PAGE_REQ to the server
  – Fetch the corresponding client address and the frame #
  – Modify the server page table
    • page_bits -> PROT_READ
  – Clone the page from the page owner to local frame x
  – Modify the local page table
    • Page_mapping -> frame x
    • Page_owner -> remote client ID
    • Page_bits -> PROT_READ
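The read path above can be walked through in a small in-process simulation. It is only a sketch: network messages (PAGE_REQ and the page transfer) are replaced by direct dictionary access, the page size is shrunk to 8 bytes for readability, and `make_client` and `handle_read_fault` are hypothetical names.

```python
PROT_NONE, PROT_READ = 0, 1   # illustrative values
PAGE_SIZE = 8                 # tiny page size, for illustration only

def make_client(client_id, nframes):
    return {"id": client_id, "physmem": bytearray(nframes * PAGE_SIZE),
            "page_mapping": {}, "page_owner": {}, "page_bits": {}}

def handle_read_fault(server, clients, reader_id, page, local_frame):
    # 1. PAGE_REQ: ask the server which client owns the page, and in which frame.
    owner_id, owner_frame = server["full_page_mappings"][page]
    # 2. Server page table: page_bits -> PROT_READ.
    server["full_page_bits"][page] = PROT_READ
    # 3. Clone the page from the owner's frame into the local frame.
    owner, reader = clients[owner_id], clients[reader_id]
    start = owner_frame * PAGE_SIZE
    data = bytes(owner["physmem"][start:start + PAGE_SIZE])
    dest = local_frame * PAGE_SIZE
    reader["physmem"][dest:dest + PAGE_SIZE] = data
    # 4. Local page table: mapping, owner, and protection bits.
    reader["page_mapping"][page] = local_frame
    reader["page_owner"][page] = owner_id
    reader["page_bits"][page] = PROT_READ
    return data

server = {"full_page_mappings": {3: (2, 1)}, "full_page_bits": {3: PROT_NONE}}
clients = {1: make_client(1, 2), 2: make_client(2, 2)}
clients[2]["physmem"][8:16] = b"DSMpage3"   # page 3 lives in client 2, frame 1

data = handle_read_fault(server, clients, reader_id=1, page=3, local_frame=0)
```

After the call, client 1 holds a read-only copy of page 3 in its frame 0 and records client 2 as the owner.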
Memory Consistency
• Writing
  – Snapshot
    • Client processes that have cloned the page to local memory will not see the change until they are notified after the write completes
  – Only one concurrent writer
    • The server page table bits are set to PROT_READ|PROT_WRITE; write requests to the same page are delayed until the writing program exits
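The one-writer rule might be enforced on the server roughly as sketched below. The class name `WriteArbiter`, the FIFO ordering of delayed requests, and the constant values are assumptions; the slides only say that later write requests are delayed.

```python
from collections import deque

PROT_READ, PROT_WRITE = 1, 2  # illustrative values

class WriteArbiter:
    """Server-side bookkeeping: at most one writer per page."""

    def __init__(self):
        self.page_bits = {}   # page -> prot bits in the server page table
        self.waiting = {}     # page -> queue of delayed writer ids

    def request_write(self, page, client_id):
        """Grant the write, or delay it if the page already has a writer."""
        if self.page_bits.get(page, 0) & PROT_WRITE:
            self.waiting.setdefault(page, deque()).append(client_id)
            return False
        self.page_bits[page] = PROT_READ | PROT_WRITE
        return True

    def finish_write(self, page):
        """Current writer exits; hand the page to the next delayed writer, if any."""
        self.page_bits[page] = PROT_READ
        queue = self.waiting.get(page)
        if queue:
            next_writer = queue.popleft()
            self.page_bits[page] = PROT_READ | PROT_WRITE
            return next_writer
        return None

arbiter = WriteArbiter()
first = arbiter.request_write(7, client_id=1)    # granted
second = arbiter.request_write(7, client_id=2)   # delayed
resumed = arbiter.finish_write(7)                # client 2 now writes
idle = arbiter.finish_write(7)                   # nobody left waiting
```

Notifying the readers that hold a snapshot of the page would happen when a write finishes; that step is left out of this sketch.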
Memory Consistency
• Page Replacement
  – Two consecutive reads of a page should return the same value if no write is requested
  – A local frame that is being read or written by other clients will not be replaced
  – Use_status
    • == 0: no other clients are using this page
    • > 0: the number of clients that are reading or writing this page
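A replacement routine that honors this rule only needs to skip frames whose page has a non-zero `Use_status`. The sketch below assumes a first-fit scan over a frame-to-page map; the scan order and the name `pick_victim` are illustrative, since the slides do not name a replacement policy.

```python
def pick_victim(frame_to_page, use_status):
    """Return the first frame whose page no other client is using, else None."""
    for frame, page in frame_to_page.items():
        if use_status.get(page, 0) == 0:
            return frame
    return None

frame_to_page = {0: 10, 1: 11, 2: 12}
use_status = {10: 2, 11: 0, 12: 1}     # pages 10 and 12 are in use remotely

victim = pick_victim(frame_to_page, use_status)

use_status[11] = 1                     # now every local frame is in use
no_victim = pick_victim(frame_to_page, use_status)
```

When every frame is in use the function returns `None`, and the caller would have to wait or retry rather than evict a page that another client is reading or writing.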