![Page 1: Distributed Shared Memory: A Survey of Issues and Algorithms B,. Nitzberg and V. Lo University of Oregon](https://reader033.vdocument.in/reader033/viewer/2022051216/56649e705503460f94b6e156/html5/thumbnails/1.jpg)
Distributed Shared Memory: A Survey of Issues and Algorithms
B. Nitzberg and V. Lo
University of Oregon
![Page 2: Distributed Shared Memory: A Survey of Issues and Algorithms B,. Nitzberg and V. Lo University of Oregon](https://reader033.vdocument.in/reader033/viewer/2022051216/56649e705503460f94b6e156/html5/thumbnails/2.jpg)
INTRODUCTION
• Distributed shared memory is a software abstraction allowing a set of workstations connected by a LAN to share a single paged virtual address space
![Page 3: Distributed Shared Memory: A Survey of Issues and Algorithms B,. Nitzberg and V. Lo University of Oregon](https://reader033.vdocument.in/reader033/viewer/2022051216/56649e705503460f94b6e156/html5/thumbnails/3.jpg)
Why bother with DSM?
• Key idea is to build fast parallel computers that are
– Cheaper than shared memory multiprocessor architectures
– As convenient to use
![Page 4: Distributed Shared Memory: A Survey of Issues and Algorithms B,. Nitzberg and V. Lo University of Oregon](https://reader033.vdocument.in/reader033/viewer/2022051216/56649e705503460f94b6e156/html5/thumbnails/4.jpg)
Conventional parallel architecture
[Figure: several CPUs, each with its own cache, connected to a single shared memory]
![Page 5: Distributed Shared Memory: A Survey of Issues and Algorithms B,. Nitzberg and V. Lo University of Oregon](https://reader033.vdocument.in/reader033/viewer/2022051216/56649e705503460f94b6e156/html5/thumbnails/5.jpg)
Today’s architecture
• Clusters of workstations are much more cost-effective
– No need to develop complex bus and cache structures
– Can use off-the-shelf networking hardware
  • Gigabit Ethernet
  • Myrinet (1.5 Gb/s)
– Can quickly integrate newest microprocessors
![Page 6: Distributed Shared Memory: A Survey of Issues and Algorithms B,. Nitzberg and V. Lo University of Oregon](https://reader033.vdocument.in/reader033/viewer/2022051216/56649e705503460f94b6e156/html5/thumbnails/6.jpg)
Limitations of cluster approach
• Communication within a cluster of workstations is through message passing
– Much harder to program than concurrent access to a shared memory
• Many big programs were written for shared memory architectures
– Converting them to a message passing architecture is a nightmare
![Page 7: Distributed Shared Memory: A Survey of Issues and Algorithms B,. Nitzberg and V. Lo University of Oregon](https://reader033.vdocument.in/reader033/viewer/2022051216/56649e705503460f94b6e156/html5/thumbnails/7.jpg)
Distributed shared memory
[Figure: the main memories of the workstations form one shared global address space (DSM)]
![Page 8: Distributed Shared Memory: A Survey of Issues and Algorithms B,. Nitzberg and V. Lo University of Oregon](https://reader033.vdocument.in/reader033/viewer/2022051216/56649e705503460f94b6e156/html5/thumbnails/8.jpg)
Distributed shared memory
• DSM makes a cluster of workstations look like a shared memory parallel computer
– Easier to write new programs
– Easier to port existing programs
• Key problem is that DSM only provides the illusion of having a shared memory architecture
– Data must still move back and forth among the workstations
![Page 9: Distributed Shared Memory: A Survey of Issues and Algorithms B,. Nitzberg and V. Lo University of Oregon](https://reader033.vdocument.in/reader033/viewer/2022051216/56649e705503460f94b6e156/html5/thumbnails/9.jpg)
Basic approaches
• Hardware implementations:
– Use extensions of traditional hardware caching architecture
• Operating system/library implementations:
– Use virtual memory mechanisms
• Compiler implementations:
– Compiler handles all shared accesses
![Page 10: Distributed Shared Memory: A Survey of Issues and Algorithms B,. Nitzberg and V. Lo University of Oregon](https://reader033.vdocument.in/reader033/viewer/2022051216/56649e705503460f94b6e156/html5/thumbnails/10.jpg)
Design Issues (I)
1. Structure and granularity
– Big units are more efficient
  • Virtual memory pages
– Can have false sharing whenever a page contains different variables that are accessed at the same time by different processors
![Page 11: Distributed Shared Memory: A Survey of Issues and Algorithms B,. Nitzberg and V. Lo University of Oregon](https://reader033.vdocument.in/reader033/viewer/2022051216/56649e705503460f94b6e156/html5/thumbnails/11.jpg)
False Sharing
[Figure: one processor accesses x while another accesses y; x and y sit on the same page]
The page containing x and y will move back and forth between the main memories of the workstations.
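The ping-pong effect can be illustrated with a minimal sketch. It assumes a single-copy policy in which a page migrates to whichever node touches it; the page size, the node names, and the helper `count_page_transfers` are hypothetical and not taken from the slides.

```python
PAGE_SIZE = 4096  # assumed page size in bytes

def page_of(address):
    """Return the number of the page that holds a byte address."""
    return address // PAGE_SIZE

def count_page_transfers(accesses, placement):
    """Count page moves under a single-copy, migrate-on-access policy.

    accesses: list of (node, variable) pairs in program order.
    placement: dict mapping a variable name to its byte address.
    """
    owner = {}      # page number -> node currently holding the page
    transfers = 0
    for node, var in accesses:
        page = page_of(placement[var])
        if page in owner and owner[page] != node:
            transfers += 1   # the page must move to the accessing node
        owner[page] = node
    return transfers

# Node A only touches x and node B only touches y, in alternation.
accesses = [("A", "x"), ("B", "y")] * 4

same_page = {"x": 0, "y": 8}                # x and y share page 0
separate_pages = {"x": 0, "y": PAGE_SIZE}   # y padded onto page 1

print(count_page_transfers(accesses, same_page))
print(count_page_transfers(accesses, separate_pages))
```

With both variables on one page, every access after the first forces a transfer even though the two nodes never touch the same variable. Placing y on its own page removes the transfers entirely.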
![Page 12: Distributed Shared Memory: A Survey of Issues and Algorithms B,. Nitzberg and V. Lo University of Oregon](https://reader033.vdocument.in/reader033/viewer/2022051216/56649e705503460f94b6e156/html5/thumbnails/12.jpg)
Design Issues (II)
1. Structure and granularity (cont'd)
– Shared objects can also be
  • Objects from a distributed object-oriented system
  • Data types from an extant language
![Page 13: Distributed Shared Memory: A Survey of Issues and Algorithms B,. Nitzberg and V. Lo University of Oregon](https://reader033.vdocument.in/reader033/viewer/2022051216/56649e705503460f94b6e156/html5/thumbnails/13.jpg)
Design Issues (III)
2. Coherence semantics
– Strict consistency is not possible
– Various authors have proposed weaker consistency models
  • Cheaper to implement
  • Harder to use in a correct fashion
![Page 14: Distributed Shared Memory: A Survey of Issues and Algorithms B,. Nitzberg and V. Lo University of Oregon](https://reader033.vdocument.in/reader033/viewer/2022051216/56649e705503460f94b6e156/html5/thumbnails/14.jpg)
Design Issues (IV)
3. Scalability
– Possibly very high but limited by
  • Central bottlenecks
  • Global knowledge operations and storage
![Page 15: Distributed Shared Memory: A Survey of Issues and Algorithms B,. Nitzberg and V. Lo University of Oregon](https://reader033.vdocument.in/reader033/viewer/2022051216/56649e705503460f94b6e156/html5/thumbnails/15.jpg)
Design Issues (V)
4. Heterogeneity
– Possible but complex to implement
![Page 16: Distributed Shared Memory: A Survey of Issues and Algorithms B,. Nitzberg and V. Lo University of Oregon](https://reader033.vdocument.in/reader033/viewer/2022051216/56649e705503460f94b6e156/html5/thumbnails/16.jpg)
Portability Issues
• Portability of programs
– Some DSMs allow programs written for a multiprocessor architecture to run on a cluster of workstations without any modifications (dusty decks)
– More efficient DSMs require more changes
• Portability of DSM
– Some DSMs require specific OS features
Not in paper
![Page 17: Distributed Shared Memory: A Survey of Issues and Algorithms B,. Nitzberg and V. Lo University of Oregon](https://reader033.vdocument.in/reader033/viewer/2022051216/56649e705503460f94b6e156/html5/thumbnails/17.jpg)
Implementation Issues (I)
1. Data Location and Access:
• Keep data at a single centralized location
• Let data migrate (better), but must have a way to locate them
  – Centralized server (bottleneck)
  – Have a "home" node associated with each piece of data
    • Will keep track of its location
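The home-node idea can be sketched as follows. This is a simplified illustration, not the mechanism of any particular DSM: the class `HomeDirectory`, the modulo mapping from page to home node, and the method names are all assumptions made for the example.

```python
class HomeDirectory:
    """Each page has a fixed home node that tracks the page's current owner."""

    def __init__(self, num_nodes):
        self.num_nodes = num_nodes
        # One table per node: pages it is home for -> their current owner.
        self.tables = [dict() for _ in range(num_nodes)]

    def home_of(self, page):
        # Assumed static mapping; it spreads lookups over all nodes
        # instead of sending every request to one central server.
        return page % self.num_nodes

    def locate(self, page):
        """Ask the home node where the page currently lives."""
        home = self.home_of(page)
        # A page that has never migrated still lives at its home node.
        return self.tables[home].get(page, home)

    def migrate(self, page, new_owner):
        """Move the page and record its new location at the home node."""
        self.tables[self.home_of(page)][page] = new_owner


directory = HomeDirectory(num_nodes=4)
print(directory.locate(6))   # page 6 starts at its home node
directory.migrate(6, 0)
print(directory.locate(6))   # the home node now points to node 0
```

Because the home of a page never changes, any node can find the page with one request to a known place, while the lookup load is divided among the nodes.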
![Page 18: Distributed Shared Memory: A Survey of Issues and Algorithms B,. Nitzberg and V. Lo University of Oregon](https://reader033.vdocument.in/reader033/viewer/2022051216/56649e705503460f94b6e156/html5/thumbnails/18.jpg)
Implementation Issues (II)
1. Data Location and Access (cont'd):
• Can either
  – Maintain a single copy of each piece of data
  – Replicate it on demand
• With replication, must either
  – Propagate updates to all replicas
  – Use an invalidation protocol
![Page 19: Distributed Shared Memory: A Survey of Issues and Algorithms B,. Nitzberg and V. Lo University of Oregon](https://reader033.vdocument.in/reader033/viewer/2022051216/56649e705503460f94b6e156/html5/thumbnails/19.jpg)
Invalidation protocol
• Before update: X = 0, X = 0, X = 0 (three valid copies)
• At update time: X = 5, X = 0 (INVALID), X = 0 (INVALID)
![Page 20: Distributed Shared Memory: A Survey of Issues and Algorithms B,. Nitzberg and V. Lo University of Oregon](https://reader033.vdocument.in/reader033/viewer/2022051216/56649e705503460f94b6e156/html5/thumbnails/20.jpg)
Main advantage
• Locality of updates:
– A page that is being modified has a high likelihood of being modified again
• Invalidation mechanism minimizes consistency overhead
– One single invalidation replaces many updates
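A small simulation makes the message-count argument concrete. It is a sketch under two assumptions: the writer already holds a valid copy, and no other node re-reads the data between writes. The names `Replica` and `write_invalidate` are hypothetical.

```python
class Replica:
    """One node's copy of a shared datum."""

    def __init__(self, value):
        self.value = value
        self.valid = True


def write_invalidate(copies, writer, value):
    """Apply a write at copies[writer]; return the invalidations sent."""
    sent = 0
    for index, copy in enumerate(copies):
        if index != writer and copy.valid:
            copy.valid = False   # remote copy keeps its stale value, marked INVALID
            sent += 1
    copies[writer].value = value
    return sent


copies = [Replica(0) for _ in range(3)]      # X = 0 on three nodes
first = write_invalidate(copies, 0, 5)       # invalidates the two other copies
second = write_invalidate(copies, 0, 6)      # no valid remote copies remain
print(first, second)
```

Only the first write costs messages. An update protocol would instead send the new value to both other replicas on every write.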
![Page 21: Distributed Shared Memory: A Survey of Issues and Algorithms B,. Nitzberg and V. Lo University of Oregon](https://reader033.vdocument.in/reader033/viewer/2022051216/56649e705503460f94b6e156/html5/thumbnails/21.jpg)
A realization: Munin
• Developed at Rice University
• Based on software objects (variables)
• Used the processor virtual memory to detect access to the shared objects
• Included several techniques for reducing consistency-related communication
• Only ran on top of the V kernel
![Page 22: Distributed Shared Memory: A Survey of Issues and Algorithms B,. Nitzberg and V. Lo University of Oregon](https://reader033.vdocument.in/reader033/viewer/2022051216/56649e705503460f94b6e156/html5/thumbnails/22.jpg)
Munin main strengths
• Excellent performance
• Portability of programs
– Allowed programs written for a multiprocessor architecture to run on a cluster of workstations with a minimum number of changes (dusty decks)
![Page 23: Distributed Shared Memory: A Survey of Issues and Algorithms B,. Nitzberg and V. Lo University of Oregon](https://reader033.vdocument.in/reader033/viewer/2022051216/56649e705503460f94b6e156/html5/thumbnails/23.jpg)
Munin main weakness
• Very poor portability of Munin itself
– Depended on some features of the V kernel
• Not maintained since the late 80's
![Page 24: Distributed Shared Memory: A Survey of Issues and Algorithms B,. Nitzberg and V. Lo University of Oregon](https://reader033.vdocument.in/reader033/viewer/2022051216/56649e705503460f94b6e156/html5/thumbnails/24.jpg)
Consistency model
• Munin uses software release consistency
– Only requires the memory to be consistent at specific synchronization points
![Page 25: Distributed Shared Memory: A Survey of Issues and Algorithms B,. Nitzberg and V. Lo University of Oregon](https://reader033.vdocument.in/reader033/viewer/2022051216/56649e705503460f94b6e156/html5/thumbnails/25.jpg)
SW release consistency (I)
• Well-written parallel programs use locks to achieve mutual exclusion when they access shared variables
– P(&mutex) and V(&mutex)
– lock(&csect) and unlock(&csect)
– acquire( ) and release( )
• Unprotected accesses can produce unpredictable results
![Page 26: Distributed Shared Memory: A Survey of Issues and Algorithms B,. Nitzberg and V. Lo University of Oregon](https://reader033.vdocument.in/reader033/viewer/2022051216/56649e705503460f94b6e156/html5/thumbnails/26.jpg)
SW release consistency (II)
• SW release consistency only guarantees the correctness of operations performed within an acquire/release pair
• No need to export the new values of shared variables until the release
• Must guarantee that a workstation has received the most recent values of all shared variables when it completes an acquire
![Page 27: Distributed Shared Memory: A Survey of Issues and Algorithms B,. Nitzberg and V. Lo University of Oregon](https://reader033.vdocument.in/reader033/viewer/2022051216/56649e705503460f94b6e156/html5/thumbnails/27.jpg)
SW release consistency (III)
First processor:
shared int x;
acquire( );
x = 1;
release( ); // export x=1

Second processor:
shared int x;
acquire( ); // wait for new value of x
x++;
release( ); // export x=2
![Page 28: Distributed Shared Memory: A Survey of Issues and Algorithms B,. Nitzberg and V. Lo University of Oregon](https://reader033.vdocument.in/reader033/viewer/2022051216/56649e705503460f94b6e156/html5/thumbnails/28.jpg)
SW release consistency (IV)
• Must still decide how to release updated values
– Munin uses eager release:
  • New values of shared variables are propagated at release time
![Page 29: Distributed Shared Memory: A Survey of Issues and Algorithms B,. Nitzberg and V. Lo University of Oregon](https://reader033.vdocument.in/reader033/viewer/2022051216/56649e705503460f94b6e156/html5/thumbnails/29.jpg)
SW release consistency (V)
Eager release
[Figure: each release forwards the update to the two other processors.]
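Eager release can be sketched as a write buffer that is flushed to every other node at release time. This is a deliberately reduced model: lock acquisition and ordering are assumed to be handled elsewhere, and the names `Node`, `pending`, and `make_cluster` are hypothetical.

```python
class Node:
    """A workstation holding a local copy of the shared variables."""

    def __init__(self):
        self.memory = {}    # local copy of the shared variables
        self.pending = {}   # writes made since the last release
        self.peers = []     # the other nodes in the cluster

    def write(self, name, value):
        self.memory[name] = value
        self.pending[name] = value   # buffered, not yet visible elsewhere

    def read(self, name):
        return self.memory.get(name)

    def release(self):
        """Eager release: push all buffered writes to every peer."""
        messages = 0
        for peer in self.peers:
            peer.memory.update(self.pending)
            messages += 1
        self.pending = {}
        return messages


def make_cluster(size):
    nodes = [Node() for _ in range(size)]
    for node in nodes:
        node.peers = [other for other in nodes if other is not node]
    return nodes


a, b, c = make_cluster(3)
a.write("x", 1)
print(b.read("x"))              # still unset on b before the release
print(a.release())              # one message per peer
b.write("x", b.read("x") + 1)   # b now sees x = 1 and increments it
b.release()
print(c.read("x"))
```

Every release costs one message per peer, even for peers that never acquire the lock afterwards. That cost is what lazy release schemes try to avoid.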
![Page 30: Distributed Shared Memory: A Survey of Issues and Algorithms B,. Nitzberg and V. Lo University of Oregon](https://reader033.vdocument.in/reader033/viewer/2022051216/56649e705503460f94b6e156/html5/thumbnails/30.jpg)
Multiple write protocol
• Designed to fight false sharing
• Uses a copy-on-write mechanism
• Whenever a process is granted access to write-shared data, the page containing these data is marked copy-on-write
• First attempt to modify the contents of the page will result in the creation of a copy of the page before it is modified (the twin).
![Page 31: Distributed Shared Memory: A Survey of Issues and Algorithms B,. Nitzberg and V. Lo University of Oregon](https://reader033.vdocument.in/reader033/viewer/2022051216/56649e705503460f94b6e156/html5/thumbnails/31.jpg)
Creating a twin
Not in paper
![Page 32: Distributed Shared Memory: A Survey of Issues and Algorithms B,. Nitzberg and V. Lo University of Oregon](https://reader033.vdocument.in/reader033/viewer/2022051216/56649e705503460f94b6e156/html5/thumbnails/32.jpg)
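Example (not in paper)
[Figure: before the first write access, the page holds x = 1 and y = 2. The first write access creates a twin holding x = 1 and y = 2. Afterwards the page holds x = 3 and y = 2. Comparing the page with its twin shows that the new value of x is 3.]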
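The twin-and-compare step can be sketched as below. The page is modeled as a dictionary of variables rather than raw bytes, and the names `SharedPage`, `diff`, and `merge` are hypothetical; the merge of two writers' changes is an illustration of why the scheme helps with false sharing, not a description of Munin's actual code.

```python
class SharedPage:
    """A write-shared page that keeps a twin to detect local changes."""

    def __init__(self, contents):
        self.data = dict(contents)  # working copy of the page
        self.twin = None            # created on the first write only

    def write(self, name, value):
        if self.twin is None:
            self.twin = dict(self.data)   # first write: save the twin
        self.data[name] = value

    def diff(self):
        """Return only the entries that differ from the twin."""
        if self.twin is None:
            return {}
        return {key: value for key, value in self.data.items()
                if self.twin.get(key) != value}


def merge(base, *diffs):
    """Apply the diffs of several writers to a common base page."""
    merged = dict(base)
    for changes in diffs:
        merged.update(changes)
    return merged


original = {"x": 1, "y": 2}
first = SharedPage(original)
second = SharedPage(original)
first.write("x", 3)    # one writer changes x
second.write("y", 7)   # another writer changes y on the same page

print(first.diff())
print(merge(original, first.diff(), second.diff()))
```

Since each writer ships only its own changes, two nodes that modify different variables on the same page do not overwrite each other's work.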
![Page 33: Distributed Shared Memory: A Survey of Issues and Algorithms B,. Nitzberg and V. Lo University of Oregon](https://reader033.vdocument.in/reader033/viewer/2022051216/56649e705503460f94b6e156/html5/thumbnails/33.jpg)
Other DSM Implementations (I)
• Software release consistency with lazy release (TreadMarks)
– Faster and designed to be portable
• Sequentially consistent software DSM (IVY):
– Sends messages to other copies at each write
– Much slower
![Page 34: Distributed Shared Memory: A Survey of Issues and Algorithms B,. Nitzberg and V. Lo University of Oregon](https://reader033.vdocument.in/reader033/viewer/2022051216/56649e705503460f94b6e156/html5/thumbnails/34.jpg)
Other DSM Implementations (II)
• Entry consistency (Midway):
– Requires each variable to be associated with a synchronization object (typically a lock)
– Acquire/release operations on a given synchronization object only involve the variables associated with that object
– Requires less data traffic
– Does not handle dusty decks well
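The association between a lock and its variables can be sketched as follows. Mutual exclusion itself is left out so that only the data movement is visible, and the names `GuardedLock`, `guarded`, and `latest` are hypothetical rather than Midway's interface.

```python
class GuardedLock:
    """A synchronization object bound to the variables it protects."""

    def __init__(self, guarded):
        self.guarded = set(guarded)
        self.latest = {}   # last released values of the guarded variables

    def acquire(self, local_memory):
        """Bring only the guarded variables up to date; return how many moved."""
        local_memory.update(self.latest)
        return len(self.latest)

    def release(self, local_memory):
        """Publish only the guarded variables, ignoring everything else."""
        self.latest = {name: local_memory[name]
                       for name in self.guarded if name in local_memory}


lock_x = GuardedLock(["x"])
writer_memory = {"x": 1, "y": 10}   # y is not guarded by lock_x
lock_x.release(writer_memory)

reader_memory = {}
moved = lock_x.acquire(reader_memory)
print(moved, reader_memory)
```

Only x travels with the lock, which is where the reduced traffic comes from. It also shows why unmodified programs fit poorly: the programmer has to state which variables each lock guards.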
![Page 35: Distributed Shared Memory: A Survey of Issues and Algorithms B,. Nitzberg and V. Lo University of Oregon](https://reader033.vdocument.in/reader033/viewer/2022051216/56649e705503460f94b6e156/html5/thumbnails/35.jpg)
Other DSM Implementations (III)
• Structured DSM systems (Linda):
– Offer the programmer a shared tuple space accessed using specific synchronized methods
– Require a very different programming style
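The tuple-space style can be illustrated with a toy, single-process sketch. Linda's `out`, `rd`, and `in` operations are modeled here with two simplifications that are assumptions of this example: `in` is renamed `take` because `in` is a Python keyword, and `rd` and `take` return `None` instead of blocking when nothing matches. `None` in a pattern acts as a wildcard.

```python
class TupleSpace:
    """A toy, non-blocking tuple space."""

    def __init__(self):
        self.tuples = []

    def out(self, *fields):
        """Add a tuple to the space."""
        self.tuples.append(tuple(fields))

    def _matches(self, pattern, candidate):
        return len(pattern) == len(candidate) and all(
            wanted is None or wanted == actual
            for wanted, actual in zip(pattern, candidate))

    def rd(self, *pattern):
        """Return the first matching tuple without removing it."""
        for candidate in self.tuples:
            if self._matches(pattern, candidate):
                return candidate
        return None

    def take(self, *pattern):
        """Remove and return the first matching tuple."""
        found = self.rd(*pattern)
        if found is not None:
            self.tuples.remove(found)
        return found


space = TupleSpace()
space.out("task", 1)
space.out("task", 2)
print(space.rd("task", None))   # peek at the first task
print(space.take("task", 2))    # claim a specific task
print(space.take("task", 2))    # already claimed
```

Processes coordinate by depositing and withdrawing tuples rather than by reading and writing addresses, which is why existing shared memory programs need to be restructured to use it.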
![Page 36: Distributed Shared Memory: A Survey of Issues and Algorithms B,. Nitzberg and V. Lo University of Oregon](https://reader033.vdocument.in/reader033/viewer/2022051216/56649e705503460f94b6e156/html5/thumbnails/36.jpg)
TODAY'S IMPACT
• Very low:
– According to W. Zwaenepoel, the truth is that computer clusters are "only suitable for coarse-grained parallel computation" and this is "[a] fortiori true for DSM"
– DSM competed with the OpenMP model, and the OpenMP model won