StarFish: highly-available block storage
DESCRIPTION
StarFish: highly-available block storage. Eran Gabber, Jeff Fellin, Michael Flaster, Fengrui Gu, Bruce Hillyer, Wee Teck Ng, Banu Özden, Elizabeth Shriver. 2003 USENIX Annual Technical Conference. Presenter: D00922019 林敬棋.
![Page 1: StarFish : highly-available block storage](https://reader034.vdocument.in/reader034/viewer/2022051518/568163a8550346895dd4b6a9/html5/thumbnails/1.jpg)
StarFish: highly-available block storage
Eran Gabber, Jeff Fellin, Michael Flaster, Fengrui Gu, Bruce Hillyer, Wee Teck Ng, Banu Özden, Elizabeth Shriver
2003 USENIX Annual Technical Conference
Presenter: D00922019 林敬棋
![Page 2: StarFish : highly-available block storage](https://reader034.vdocument.in/reader034/viewer/2022051518/568163a8550346895dd4b6a9/html5/thumbnails/2.jpg)
Introduction
Important data need to be protected.
◦Making replicas.
Replication on remote sites
◦Reduce the amount of data lost in a failure.
◦Decrease the time required to recover from catastrophic site failure.
![Page 3: StarFish : highly-available block storage](https://reader034.vdocument.in/reader034/viewer/2022051518/568163a8550346895dd4b6a9/html5/thumbnails/3.jpg)
StarFish
A highly-available, geographically-dispersed block storage system.
◦Does not require expensive dedicated communication lines to all replicas to achieve high availability.
◦Achieves good performance even during recovery from a replica failure.
◦Single-owner access semantics.
![Page 4: StarFish : highly-available block storage](https://reader034.vdocument.in/reader034/viewer/2022051518/568163a8550346895dd4b6a9/html5/thumbnails/4.jpg)
Architecture
StarFish consists of
◦One Host Element (HE): provides storage virtualization and a read cache.
◦N Storage Elements (SEs), with write quorum size Q: synchronous updates to a quorum of Q SEs, and asynchronous updates to the rest.
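The quorum write described above can be sketched as follows. This is a minimal illustration, not the StarFish implementation: the `StorageElement` class, the `quorum_write` function, and the delay values are all hypothetical, and each SE is modeled as an in-memory dictionary with a simulated network delay. The write returns as soon as Q SEs acknowledge, while the remaining writes complete in the background.

```python
import concurrent.futures as cf
import time


class StorageElement:
    """Hypothetical in-memory stand-in for an SE; delay_s models network latency."""

    def __init__(self, name, delay_s):
        self.name = name
        self.delay_s = delay_s
        self.blocks = {}

    def write(self, block_no, data):
        time.sleep(self.delay_s)       # simulated network + disk delay
        self.blocks[block_no] = data
        return self.name               # acknowledgement


def quorum_write(pool, ses, q, block_no, data):
    """Send the write to every SE, but return once Q of them have acknowledged."""
    futures = [pool.submit(se.write, block_no, data) for se in ses]
    acked = []
    for done in cf.as_completed(futures):
        acked.append(done.result())
        if len(acked) == q:
            break                      # quorum reached; the rest finish asynchronously
    return acked, futures


# Illustrative values only: three SEs and a write quorum of two.
ses = [StorageElement("local", 0.0),
       StorageElement("near", 0.01),
       StorageElement("far", 0.05)]
with cf.ThreadPoolExecutor(max_workers=len(ses)) as pool:
    acked, pending = quorum_write(pool, ses, q=2, block_no=7, data=b"payload")
# Leaving the with-block waits for the remaining background write to finish.
```

With Q smaller than N, the caller's write latency is set by the Q-th fastest SE rather than by the slowest one.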
![Page 5: StarFish : highly-available block storage](https://reader034.vdocument.in/reader034/viewer/2022051518/568163a8550346895dd4b6a9/html5/thumbnails/5.jpg)
Recommended Setup
N = 3, Q = 2
MAN: Metropolitan Area Network
WAN: Wide Area Network
![Page 6: StarFish : highly-available block storage](https://reader034.vdocument.in/reader034/viewer/2022051518/568163a8550346895dd4b6a9/html5/thumbnails/6.jpg)
Another Deployment
![Page 7: StarFish : highly-available block storage](https://reader034.vdocument.in/reader034/viewer/2022051518/568163a8550346895dd4b6a9/html5/thumbnails/7.jpg)
SE Recovery
Write log
◦HE keeps a circular buffer of recent writes.
◦Each SE maintains a circular buffer of recent writes on a log disk.
Three types of recovery
◦Quick recovery
◦Replay recovery
◦Full recovery
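One way to picture how the two write logs relate to the three recovery types is the sketch below. The slide does not state the selection criteria, so the rule used here is an assumption: quick recovery is chosen when the HE's buffer still holds every write the SE missed, replay recovery when only a peer SE's log disk does, and full recovery otherwise. The names `WriteLog` and `choose_recovery`, the sequence numbers, and the buffer sizes are hypothetical.

```python
from collections import deque


class WriteLog:
    """Circular buffer of (sequence_number, block_no, data) for recent writes."""

    def __init__(self, capacity):
        # A deque with maxlen silently drops the oldest entry when full.
        self.entries = deque(maxlen=capacity)

    def append(self, seq, block_no, data):
        self.entries.append((seq, block_no, data))

    def covers(self, first_missing_seq):
        """True if the oldest retained write is no newer than first_missing_seq.

        Assumes sequence numbers are contiguous and appended in order.
        """
        return bool(self.entries) and self.entries[0][0] <= first_missing_seq


def choose_recovery(he_log, peer_se_log, last_applied_seq):
    """Pick a recovery type for an SE that last applied write last_applied_seq."""
    missing_from = last_applied_seq + 1
    if he_log.covers(missing_from):
        return "quick"    # assumed: replay from the HE's buffer
    if peer_se_log.covers(missing_from):
        return "replay"   # assumed: replay from a peer SE's log disk
    return "full"         # assumed: copy the whole volume


# Hypothetical sizes: a small HE buffer and a larger SE log.
he_log, se_log = WriteLog(capacity=4), WriteLog(capacity=16)
for seq in range(1, 21):
    he_log.append(seq, block_no=seq % 8, data=b"x")
    se_log.append(seq, block_no=seq % 8, data=b"x")

choices = [choose_recovery(he_log, se_log, last) for last in (18, 9, 2)]
```

The longer an SE has been unavailable, the further back its first missing write lies, and the more expensive the recovery type it falls through to.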
![Page 8: StarFish : highly-available block storage](https://reader034.vdocument.in/reader034/viewer/2022051518/568163a8550346895dd4b6a9/html5/thumbnails/8.jpg)
Availability and Reliability
Assume that the failure and recovery processes of the network links and SEs are i.i.d. Poisson processes with combined mean failure and recovery rates of λ and μ per second.
Similarly, the HE has Poisson-distributed failure and recovery processes with rates λ_he and μ_he.
![Page 9: StarFish : highly-available block storage](https://reader034.vdocument.in/reader034/viewer/2022051518/568163a8550346895dd4b6a9/html5/thumbnails/9.jpg)
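Availability
The steady-state probability that at least Q SEs are available.
Derived from the standard machine repairman model.

$$A(Q,N) = \sum_{i=0}^{N-Q} \binom{N}{i} \frac{\rho^{i}}{(1+\rho)^{N}}, \qquad 0 \le \rho \le 1,\; 1 \le Q \le N$$

where ρ = λ/μ is the load of an SE (and, analogously, ρ_he = λ_he/μ_he for the HE).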
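As a numerical check of the availability formula, the sketch below evaluates A(Q, N) as the sum over i = 0..N-Q of C(N, i) ρ^i / (1 + ρ)^N, taking ρ = λ/μ. The function name `availability` and the example value of ρ (an SE that is up 99% of the time, so ρ = 0.01/0.99) are illustrative choices, not taken from the slides.

```python
from math import comb


def availability(q, n, rho):
    """A(Q, N): probability that at most N - Q of the N SEs are down.

    Each SE is down with probability rho / (1 + rho), independently.
    """
    if not 1 <= q <= n:
        raise ValueError("require 1 <= Q <= N")
    return sum(comb(n, i) * rho**i for i in range(n - q + 1)) / (1 + rho) ** n


rho = 0.01 / 0.99                    # load of an SE that is available 99% of the time
a_single = availability(1, 1, rho)   # one SE, quorum of one
a_2_of_3 = availability(2, 3, rho)   # N = 3, Q = 2
```

For a single SE the formula reduces to 1/(1 + ρ), which is the steady-state availability of one component.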
![Page 10: StarFish : highly-available block storage](https://reader034.vdocument.in/reader034/viewer/2022051518/568163a8550346895dd4b6a9/html5/thumbnails/10.jpg)
Machine Repairman Model
![Page 11: StarFish : highly-available block storage](https://reader034.vdocument.in/reader034/viewer/2022051518/568163a8550346895dd4b6a9/html5/thumbnails/11.jpg)
Availability (cont.)
![Page 12: StarFish : highly-available block storage](https://reader034.vdocument.in/reader034/viewer/2022051518/568163a8550346895dd4b6a9/html5/thumbnails/12.jpg)
Availability (cont.)
X★9: X is the number of 9s in an availability measure.
Achieves a much higher availability when N = 2Q + 1.
For fixed N, availability decreases with larger quorum size.
◦Increasing quorum size trades off availability for reliability.
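The "number of 9s" measure and the effect of quorum size can be illustrated with a short calculation. This sketch assumes the availability formula A(Q, N) = Σ_{i=0}^{N-Q} C(N, i) ρ^i / (1 + ρ)^N with ρ = λ/μ, and uses -log10(1 - A) as a fractional count of nines. The helper names and the choice of ρ (an SE that is up 99% of the time) are illustrative.

```python
from math import comb, log10


def availability(q, n, rho):
    """A(Q, N): probability that at least Q of N SEs are available."""
    return sum(comb(n, i) * rho**i for i in range(n - q + 1)) / (1 + rho) ** n


def nines(a):
    """Fractional number of 9s in an availability value, e.g. 0.999 gives about 3."""
    return -log10(1.0 - a)


rho = 0.01 / 0.99  # illustrative load: each SE is available 99% of the time
# For fixed N = 3, tabulate the number of 9s for each quorum size.
table = {q: nines(availability(q, 3, rho)) for q in (1, 2, 3)}
```

Under these assumptions the count of nines falls as Q grows from 1 to 3, which is the trade-off the slide describes: a larger quorum demands more SEs to be up at once.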
![Page 13: StarFish : highly-available block storage](https://reader034.vdocument.in/reader034/viewer/2022051518/568163a8550346895dd4b6a9/html5/thumbnails/13.jpg)
Reliability
The probability of no data loss.
The reliability increases with larger Q.
Two approaches
◦Make Q > floor(N/2) and require that at least Q SEs are available. This reduces availability and performance.
◦Read-only consistency
![Page 14: StarFish : highly-available block storage](https://reader034.vdocument.in/reader034/viewer/2022051518/568163a8550346895dd4b6a9/html5/thumbnails/14.jpg)
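Read-only Consistency
Available in read-only mode during failure.
◦Read-only mode obviates the need for Q SEs to be available to handle updates.
◦Increases availability.

$$A_{ReadOnly}(Q,N) = \sum_{i=0}^{N-1} \binom{N}{i} \frac{\rho^{i}}{(1+\rho)^{N}(1+\rho_{he})} + \sum_{i=0}^{Q-1} \binom{Q}{i} \frac{\rho^{i}\,\rho_{he}}{(1+\rho)^{Q}(1+\rho_{he})}$$

$$A_{ReadOnly}(Q,N) = \frac{A(1,N)}{1+\rho_{he}} + \frac{\rho_{he}\,A(1,Q)}{1+\rho_{he}}$$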
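The read-only availability can be evaluated numerically in its compact form, A_ReadOnly(Q, N) = A(1, N)/(1 + ρ_he) + ρ_he A(1, Q)/(1 + ρ_he), where A(Q, N) = Σ_{i=0}^{N-Q} C(N, i) ρ^i / (1 + ρ)^N. The sketch below is a plain transcription of those two expressions. The function names and the example loads (99% availability for each SE and, in the second case, for the HE) are assumptions made for illustration.

```python
from math import comb


def availability(q, n, rho):
    """A(Q, N): probability that at least Q of N SEs are available."""
    return sum(comb(n, i) * rho**i for i in range(n - q + 1)) / (1 + rho) ** n


def availability_read_only(q, n, rho, rho_he):
    """A_ReadOnly(Q, N) in its compact form.

    HE up:   at least one of the N SEs must be available.
    HE down: at least one of the Q SEs in the write quorum must be available.
    """
    he_up = 1.0 / (1.0 + rho_he)
    he_down = rho_he / (1.0 + rho_he)
    return availability(1, n, rho) * he_up + availability(1, q, rho) * he_down


rho = 0.01 / 0.99  # illustrative: each SE is available 99% of the time

# With a perfectly reliable HE (rho_he = 0) the result does not depend on Q.
perfect_he = [availability_read_only(q, 3, rho, 0.0) for q in (1, 2, 3)]

# With an HE that is itself only 99% available, a larger Q helps.
flaky_he = [availability_read_only(q, 3, rho, 0.01 / 0.99) for q in (1, 2, 3)]
```

Comparing the two lists shows where Q enters: only the HE-down term depends on it, so its influence grows with ρ_he.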
![Page 15: StarFish : highly-available block storage](https://reader034.vdocument.in/reader034/viewer/2022051518/568163a8550346895dd4b6a9/html5/thumbnails/15.jpg)
Availability with Read-only Consistency
![Page 16: StarFish : highly-available block storage](https://reader034.vdocument.in/reader034/viewer/2022051518/568163a8550346895dd4b6a9/html5/thumbnails/16.jpg)
Observations
If ρ_he = 0, availability is independent of Q.
◦Can always recover from HE.
If ρ_he increases, availability increases with Q.
Largest increase occurs from Q = 1 to Q = 2, and is bounded by 3/16 when ρ = 1.
◦Diminishing gain after Q = 2.
◦Suggests Q = 2 in practical systems.
![Page 17: StarFish : highly-available block storage](https://reader034.vdocument.in/reader034/viewer/2022051518/568163a8550346895dd4b6a9/html5/thumbnails/17.jpg)
Implementation
![Page 18: StarFish : highly-available block storage](https://reader034.vdocument.in/reader034/viewer/2022051518/568163a8550346895dd4b6a9/html5/thumbnails/18.jpg)
Performance Measurements
Compares StarFish with a direct-attached RAID unit.
![Page 19: StarFish : highly-available block storage](https://reader034.vdocument.in/reader034/viewer/2022051518/568163a8550346895dd4b6a9/html5/thumbnails/19.jpg)
Settings
Different network delays
◦1, 2, 4, 8, 23, 36, 65 ms
Different bandwidth limitations
◦31, 51, 62, 93, 124 Mb/s
Benchmark:
◦Micro-benchmark: read hit, read miss, write
◦PostMark
![Page 20: StarFish : highly-available block storage](https://reader034.vdocument.in/reader034/viewer/2022051518/568163a8550346895dd4b6a9/html5/thumbnails/20.jpg)
Effects of network delays and HE cache size
Near SE delay: 4 ms; Far SE delay: 8 ms
No cache miss if HE cache size = 400 MB
![Page 21: StarFish : highly-available block storage](https://reader034.vdocument.in/reader034/viewer/2022051518/568163a8550346895dd4b6a9/html5/thumbnails/21.jpg)
Observation
Large HE cache improves performance.
◦HE can respond to more read requests without communicating with an SE. It does not change write requests.
◦Especially beneficial when the local SE has significant delays.
With Q = 2 and a 400 MB cache, performance is not influenced by the delay to the local SE.
◦It depends on the near SE.
![Page 22: StarFish : highly-available block storage](https://reader034.vdocument.in/reader034/viewer/2022051518/568163a8550346895dd4b6a9/html5/thumbnails/22.jpg)
Normal Operation and placement of the far SE
1-8: 1, 2, 4, 8 ms; 4-12: 4, 8, 12 ms; 23-65: 23, 36, 65 ms; 31-124: 31, 51, 62, 93, 124 Mbps
Local SE delay: 0 ms
N = 3
![Page 23: StarFish : highly-available block storage](https://reader034.vdocument.in/reader034/viewer/2022051518/568163a8550346895dd4b6a9/html5/thumbnails/23.jpg)
Normal Operation and placement of the far SE (cont.)
N = 3, 8 threads
![Page 24: StarFish : highly-available block storage](https://reader034.vdocument.in/reader034/viewer/2022051518/568163a8550346895dd4b6a9/html5/thumbnails/24.jpg)
Normal Operation and placement of the far SE (cont.)
![Page 25: StarFish : highly-available block storage](https://reader034.vdocument.in/reader034/viewer/2022051518/568163a8550346895dd4b6a9/html5/thumbnails/25.jpg)
Observation
Performance is influenced mostly by two parameters
◦Write quorum size
◦Delay to the SE
StarFish can provide adequate performance when one of the SEs is placed in a remote location.
◦At least 85% of the performance of a direct-attached RAID.
![Page 26: StarFish : highly-available block storage](https://reader034.vdocument.in/reader034/viewer/2022051518/568163a8550346895dd4b6a9/html5/thumbnails/26.jpg)
Recovery
Performance degrades more during full recovery.
![Page 27: StarFish : highly-available block storage](https://reader034.vdocument.in/reader034/viewer/2022051518/568163a8550346895dd4b6a9/html5/thumbnails/27.jpg)
Conclusion
The StarFish system reveals significant benefits from a third copy of the data at an intermediate distance.
A StarFish system with 3 replicas, a write quorum size of 2, and read-only consistency yields better than 99.9999% availability assuming individual Storage Element availability of 99%.