Replica Control for Peer-to-Peer Storage Systems
P2P
• Peer-to-peer (P2P) has emerged as an important paradigm for sharing resources at the edges of the Internet.
• The most widely exploited resource is storage, as typified by P2P music file sharing:
  – Napster
  – Gnutella
• Following the great success of P2P file sharing, a natural next step is to develop wide-area P2P storage systems that aggregate storage across the Internet.
Replica Control Protocol
• Replication
  – to maintain multiple copies of some critical data to increase availability
  – to reduce read access times
• Replica Control Protocol
  – to avoid inconsistent updates
  – to guarantee a consistent view of the replicated data
Resiliency Requirement
• Need data replication
  – Even if some nodes fail, the computation can progress
  – Consistency requirement
  – Failures may partition the network
  – Rejoining partitions requires consistency control algorithms
One-Copy Equivalence Consistency Criterion

• The set of replicas must behave as if there were only a single copy. Conditions to ensure one-copy equivalence:
  – no two write operations can proceed at the same time
  – no read operation can proceed at the same time as a write operation
  – a read operation always returns the replica written by the last write operation
Replica Control Methods
• Optimistic
  – Proceed with computation on the available subgroup
  – Optimistically restore consistency when partitions rejoin later
• Pessimistic
  – Restrict computations under worst-case assumptions
  – Approaches:
    • Primary site
    • Voting
Optimistic Approach
• Version vector for file f
  – an N-element vector, where N is the number of nodes on which f is stored
  – the i-th element represents the number of updates done by node i
• A vector V dominates V′ if
  – every element of V >= the corresponding element of V′
• V and V′ conflict if neither dominates the other
Optimistic (cont’d)
• Consistency resolution
  – If V dominates V′, the copies are inconsistent; the inconsistency can be resolved by copying V’s replica over V′’s
  – If V and V′ conflict, the inconsistency cannot be resolved automatically
• Version vectors can resolve only update conflicts; they cannot resolve read-write conflicts (see the sketch below)
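A minimal sketch (not from the slides; plain Python lists stand in for the per-node update counts) of how dominance and conflicts between version vectors can be detected:

```python
# A minimal sketch of version-vector comparison, assuming vectors are
# equal-length lists of per-node update counts.

def dominates(v, w):
    """True if every element of v is >= the corresponding element of w."""
    return all(a >= b for a, b in zip(v, w))

def compare(v, w):
    """Classify the relationship between two version vectors."""
    if dominates(v, w) and dominates(w, v):
        return "equal"          # same history; replicas already consistent
    if dominates(v, w):
        return "v-dominates"    # resolvable: copy v's replica over w's
    if dominates(w, v):
        return "w-dominates"    # resolvable: copy w's replica over v's
    return "conflict"           # neither dominates; cannot be resolved

# Example: nodes 0 and 2 updated concurrently -> unresolvable conflict
print(compare([2, 1, 0], [1, 1, 1]))  # "conflict"
print(compare([2, 1, 1], [1, 1, 1]))  # "v-dominates"
```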
Primary Site Approach
• Data are replicated on at least k+1 nodes (for k-resilience)
• One node acts as the primary site (PS)
  – Any read request is served by the PS
  – Any write request is copied to all other back-up sites
  – Any write request arriving at a back-up site is forwarded to the PS (see the sketch below)
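A minimal sketch of this routing, assuming an in-process registry in place of real network transport; all names are hypothetical:

```python
# A minimal sketch of primary-site request routing; the registry and
# all names are hypothetical stand-ins for real network transport.

NODES = {}  # node_id -> Node

def forward(node_id, op, *args):
    node = NODES[node_id]
    if op == "read":
        return node.read(*args)
    if op == "write":
        return node.write(*args)
    if op == "apply":                # PS pushing a write to a back-up
        node.data[args[0]] = args[1]

class Node:
    def __init__(self, node_id, primary, backups):
        self.node_id, self.primary, self.backups = node_id, primary, backups
        self.data = {}
        NODES[node_id] = self

    def read(self, key):
        # Any read request is served by the PS.
        if self.node_id != self.primary:
            return forward(self.primary, "read", key)
        return self.data.get(key)

    def write(self, key, value):
        # A write arriving at a back-up is forwarded to the PS.
        if self.node_id != self.primary:
            return forward(self.primary, "write", key, value)
        # The PS applies the write, then copies it to every back-up.
        self.data[key] = value
        for b in self.backups:
            forward(b, "apply", key, value)

ps = Node("ps", primary="ps", backups=["b1", "b2"])
b1, b2 = Node("b1", "ps", []), Node("b2", "ps", [])
b1.write("x", 1)       # forwarded to the PS, then copied to back-ups
print(b2.read("x"))    # served via the PS -> 1
```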
PS Failure Handling
• If a back-up fails, there is no interruption in service
• If the PS fails, there are two possibilities
  – If the network is not partitioned
    • Choose another node in the set as the primary
    • If checkpointing has been active, only a restart from the previous checkpoint is needed
  – If the network is partitioned
    • Only the partition containing the PS can progress
    • The other partitions stop updates on the data
    • It is necessary to distinguish between site failures and network partitions
Witnesses
• Witness: a small entity that maintains enough information to identify the replicas that contain the most recent version of the data
  – this information could be a timestamp recording the time of the latest update
  – the timestamp can be replaced by a version number: an integer incremented each time the data are updated
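A minimal sketch (hypothetical names, assuming the version-number variant) of how a witness can single out up-to-date replicas:

```python
# A minimal sketch of a version-number witness; names are hypothetical.

class Witness:
    """Stores only a version number, never the data itself."""
    def __init__(self):
        self.version = 0

    def record_write(self):
        self.version += 1          # incremented on every update
        return self.version

    def current_replicas(self, replica_versions):
        # replica_versions: dict of replica id -> version it holds.
        # Returns the replicas holding the most recent version.
        return [r for r, v in replica_versions.items() if v == self.version]

w = Witness()
w.record_write(); w.record_write()                    # two updates
print(w.current_replicas({"A": 2, "B": 1, "C": 2}))   # ['A', 'C']
```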
Voting Approach
• V votes are distributed to the n replicas such that (see the check below)
  – Vw + Vr > V (a write quorum always intersects a read quorum)
  – Vw + Vw > V (any two write quorums intersect)
• Obtain Vr or more votes to read
• Obtain Vw or more votes to write
• Quorum systems are more general than voting
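For example, with V = 5 votes over five replicas, Vr = 3 and Vw = 3 satisfy both constraints. A minimal check (the helper is hypothetical, not from the slides):

```python
# A minimal sketch checking Gifford-style voting constraints;
# the helper name is hypothetical.

def valid_quorum(V, Vr, Vw):
    """True if the read/write vote thresholds guarantee intersection."""
    read_write_intersect = Vr + Vw > V   # every read sees the last write
    write_write_intersect = 2 * Vw > V   # no two concurrent writes
    return read_write_intersect and write_write_intersect

print(valid_quorum(5, 3, 3))  # True
print(valid_quorum(5, 2, 3))  # False: a read quorum could miss a write
```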
Quorum Systems
• Trees
• Grid-based (array-based)
• Torus
• Hierarchical
• Multi-column
and so on…
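As one concrete instance, a sketch of the classic grid-based construction, assuming a read quorum takes one node per column and a write quorum takes one full column plus one node from each other column (the 3x3 layout is an invented example, not from the slides):

```python
# A sketch of the classic grid quorum construction.

def read_quorum(grid):
    # One node from each column (here: the top node, for simplicity).
    return {col[0] for col in zip(*grid)}

def write_quorum(grid, full_col=0):
    cols = list(zip(*grid))
    q = set(cols[full_col])                               # one full column
    q |= {col[0] for i, col in enumerate(cols) if i != full_col}
    return q

grid = [["a", "b", "c"],
        ["d", "e", "f"],
        ["g", "h", "i"]]

r, w = read_quorum(grid), write_quorum(grid)
print(r & w)   # non-empty: every read quorum meets every write quorum
```

Any two write quorums also intersect, since each contains a full column and one node from every column.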
Classification of P2P Storage Sys.
• Unstructured
  – “Replication Strategies for Highly Available Peer-to-peer Storage”
  – “Replication Strategies in Unstructured Peer-to-peer Networks”
• Structured
  – Read-only: CFS, PAST
  – Read/Write (mutable): LAR, Ivy, Oasis, Om, Eliot, Sigma (for a mutual exclusion primitive)
Ivy
• Stores a set of logs with the aid of distributed hash tables.
• Ivy keeps, for each participant, a log storing all of that participant’s updates, and maintains data consistency optimistically by performing conflict resolution among all logs (i.e., in a best-effort manner).
• The logs must be kept indefinitely, and a participant must scan all the logs related to a file to look up the up-to-date file data (see the sketch below). Thus, Ivy is suitable only for small groups of participants.
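A minimal sketch of why reads scale poorly with the number of participants; the log-record format here is invented, not Ivy’s actual format:

```python
# A minimal sketch of per-participant update logs; the record format
# (file, sequence number, content) is a hypothetical simplification.

logs = {
    "alice": [("f", 1, "v1"), ("f", 3, "v3")],
    "bob":   [("f", 2, "v2")],
}

def read_file(name):
    # A reader must scan ALL participants' logs and merge their
    # updates to find the latest version of the file.
    updates = [rec for log in logs.values() for rec in log if rec[0] == name]
    return max(updates, key=lambda rec: rec[1])[2] if updates else None

print(read_file("f"))  # 'v3' -- found only after scanning every log
```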
Eliot
• Eliot relies on a reliable, fault-tolerant, immutable P2P storage substrate (Charles) to store data blocks, and uses an auxiliary metadata service (MS) to store mutable metadata.
• It supports NFS-like consistency semantics; however, the traffic between the MS and the client is high under these semantics.
• It also supports AFS open-close consistency semantics; however, these semantics may cause the problem of lost updates.
• The MS is provided by a conventional replicated database, which may not fit dynamic P2P environments.
Oasis
• Oasis is based on Gifford’s weighted-voting quorum concept and allows dynamic quorum membership.
• It spreads versioned metadata along with data replicas over the P2P network.
• To complete an operation on a data object, a client must first find the metadata related to the object and determine the total number of votes, the votes required for read/write operations, the replica list, and so on, to form a quorum accordingly (see the sketch below).
• One drawback of Oasis is that if a node happens to use stale metadata, data consistency may be violated.
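A minimal sketch of the versioned metadata a client must locate before forming a quorum; the field names are hypothetical, not Oasis’s actual format:

```python
# A minimal sketch of Oasis-style versioned metadata; all field names
# are hypothetical.

from dataclasses import dataclass, field

@dataclass
class ObjectMetadata:
    version: int                 # a stale copy is the cited weakness
    total_votes: int             # V
    read_votes: int              # Vr
    write_votes: int             # Vw
    replicas: list = field(default_factory=list)   # current replica list

meta = ObjectMetadata(version=7, total_votes=5, read_votes=3,
                      write_votes=3, replicas=["n1", "n2", "n3"])
# A client uses these fields to assemble a read or write quorum; if
# meta.version is stale, the assembled quorum may fail to intersect
# the current one, violating consistency.
```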
Om
• Om is based on the concepts of automatic replica regeneration and replica membership reconfiguration.
• Consistency is maintained by two quorum systems: a read-one-write-all quorum system for accessing replicas, and a witness-modeled quorum system for reconfiguration.
• Om allows replica regeneration from a single replica. However, a write in Om is always first forwarded to the primary copy, which serializes all writes and uses a two-phase procedure to propagate each write to all secondary replicas (see the sketch below).
• The drawbacks of Om are that (1) the primary replica may become a bottleneck; (2) the overhead incurred by the two-phase procedure may be too high; (3) reconfiguration by the witness model has some probability of violating consistency.
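A minimal sketch (all names hypothetical) of a primary that serializes writes and propagates each one with a two-phase prepare/commit exchange, illustrating where the bottleneck and the extra round trips come from:

```python
# A minimal sketch of primary-serialized two-phase write propagation,
# as described for Om; all names are hypothetical.

class Primary:
    def __init__(self, secondaries):
        self.secondaries = secondaries
        self.seq = 0                      # serializes all writes

    def write(self, key, value):
        self.seq += 1
        # Phase 1: every secondary buffers (prepares) the write.
        for s in self.secondaries:
            s.prepare(self.seq, key, value)
        # Phase 2: commit everywhere once all secondaries prepared.
        for s in self.secondaries:
            s.commit(self.seq)

class Secondary:
    def __init__(self):
        self.pending, self.data = {}, {}

    def prepare(self, seq, key, value):
        self.pending[seq] = (key, value)

    def commit(self, seq):
        key, value = self.pending.pop(seq)
        self.data[key] = value

secs = [Secondary(), Secondary()]
Primary(secs).write("x", 42)   # two message rounds per write
print(secs[0].data)            # {'x': 42}
```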
Sigma
• The Sigma protocol collects state from all replicas to achieve mutual exclusion.
• The basic idea of the Sigma protocol is as follows. A node u wishing to be the winner of the mutual exclusion sends a timestamped request to each of the n (n = 3k+1) replicas and waits for replies. On receiving a request from u, a node v puts u’s request into a local queue ordered by timestamp, takes as the winner the node whose request is at the front of the queue, and replies with that winner’s ID to u.
Sigma (cont’d)

• When the number of replies received by u exceeds m (m = 2k+1), u acts according to the following conditions:
  (1) if more than m replies take u as the winner, then u is the winner;
  (2) if more than m replies take w (w ≠ u) as the winner, then w is the winner and u just keeps waiting;
  (3) if no node is regarded as the winner by more than m replies, then u sends a YIELD message to cancel its request temporarily and then re-inserts it again.
• In this manner, one node can eventually be elected as the winner even when the communication delay variance is large.
• A drawback of the Sigma protocol is that a node must send requests to all replicas and gather favorable replies from a large portion (2/3) of the nodes to win the mutual exclusion, which incurs large overhead (see the sketch below). Moreover, the overhead becomes even larger under high contention.
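A minimal sketch of this reply-counting logic; the message passing, the yield handling, and all names are simplified, hypothetical stand-ins for the real protocol:

```python
# A minimal sketch of Sigma-style voting for mutual exclusion.

import collections

K = 1
N = 3 * K + 1          # total replicas
M = 2 * K + 1          # replies needed before deciding

class Replica:
    def __init__(self):
        self.queue = []                 # (timestamp, node_id), kept sorted

    def request(self, ts, node_id):
        self.queue.append((ts, node_id))
        self.queue.sort()               # order requests by timestamp
        return self.queue[0][1]         # reply with current winner's ID

    def yield_request(self, ts, node_id):
        self.queue.remove((ts, node_id))   # temporarily cancel the request

def try_acquire(replicas, ts, node_id):
    # Send a timestamped request to ALL n replicas, collect replies.
    replies = [r.request(ts, node_id) for r in replicas]
    votes = collections.Counter(replies[:M])   # decide after m replies
    winner, count = votes.most_common(1)[0]
    if count >= M and winner == node_id:
        return "acquired"
    if count >= M:
        return f"wait: {winner} holds the lock"
    # No node named winner by enough replies: YIELD and retry later.
    for r in replicas:
        r.yield_request(ts, node_id)
    return "yield-and-retry"

replicas = [Replica() for _ in range(N)]
print(try_acquire(replicas, ts=1, node_id="u"))  # 'acquired'
print(try_acquire(replicas, ts=2, node_id="v"))  # 'wait: u holds the lock'
```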
MUREX comes to the rescue!