Cloud Computing
Towards Elastic Transactional Cloud Storage with Range Query Support
Saarbrücken, November 30th, 2010
Martin Ites
Outline
• Motivation
• Related work
• System architecture of ecStore
  ▫ distributed storage layer (BATON)
  ▫ replication layer (self-tuning range histogram)
  ▫ transaction layer
• Performance study
• Conclusion
Motivation
• Cloud computing should be usable as a utility
• Cloud storage has to be adjusted dynamically
• Minimal startup costs
• Pay-per-use model
• Elastic scaling on demand
  ▫ Allow users to scale up and down on the fly
• Can only be achieved when storage nodes can easily be added to or removed from the system
Related work
• Replication in distributed and peer-to-peer systems
  ▫ The primary copy of the data is responsible for handling both read and write requests from clients
  ▫ Only support operations on a single data item
  ▫ Data residing on a storage node is replicated on the successor node
  ▫ Pessimistic replication technique
Related work
• Distributed and parallel databases
  ▫ B+-tree, optimistic scheme, two-phase commit protocol
  ▫ Online load balancing in range-partitioned systems using data migration, and a self-tuning approach to reorganize the data in a shared-nothing system
  ▫ Traditional parallel database technologies are not a perfect fit for scalable storage
Related work
• Cloud data and transaction services
  ▫ Data management system on top of Amazon S3, based on the client-server model
  ▫ System with a loosely coupled transactional component and data component
  ▫ Storage nodes are organized in a ring-based distributed hash table (DHT), and each data item is asynchronously replicated on the successor storage nodes
Weaknesses of cloud storage services
• Guarantees on consistency (data updates)
• No range query support
• Data migration to balance the storage load
• No support for transactional semantics across multiple keys
ecStore (elastic cloud storage system)
• Scalable storage system within the cloud cluster
• The architecture follows a stratum (layered) design
• Organizes storage nodes as a balanced tree-structured overlay and assigns a data range to each storage node
• Data objects are distributed and replicated in a cluster of commodity computer nodes
Architecture of ecStore
• Automated data partitioning and replication
• Load balancing
• Efficient range query
• Transactional access
Distributed storage layer
• Distributed data structure
  ▫ Declusters data objects across storage nodes
  ▫ Facilitates parallelism to improve performance
• DHT-based structure (distributed hash table)
  ▫ BATON (BAlanced Tree Overlay Network)
BATON
• Tree-based structure
  ▫ To realize a scalable range-partitioned system
• Supports efficient range query processing
• Automatically repartitions and redistributes the data when storage nodes are added to or removed from the system
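As a rough illustration of range partitioning (not BATON's tree routing, which the slides do not detail), the sketch below keeps a flat sorted list of range boundaries. `RangePartition`, `owner`, `nodes_for_range`, and `add_node` are hypothetical names chosen for this example.

```python
from bisect import bisect_right


class RangePartition:
    """Hypothetical layout: node i owns keys in [bounds[i], bounds[i + 1])."""

    def __init__(self, bounds):
        # Sorted lower bounds, one per storage node.
        self.bounds = list(bounds)

    def owner(self, key):
        """Index of the node whose range contains key."""
        return max(bisect_right(self.bounds, key) - 1, 0)

    def nodes_for_range(self, lo, hi):
        """Node indices covering the closed interval [lo, hi]."""
        return list(range(self.owner(lo), self.owner(hi) + 1))

    def add_node(self, split_key):
        """Split the range containing split_key; the new node takes the upper part."""
        if split_key in self.bounds:
            raise ValueError("split_key is already a range boundary")
        self.bounds.insert(bisect_right(self.bounds, split_key), split_key)
```

Because neighbouring ranges sit on neighbouring nodes, a range query only touches a contiguous run of nodes, which is the property a hash-based placement would lose.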
Replication layer
• BATON does not provide replication and transaction support
• Extend BATON to efficiently support load-adaptive replication for large-scale data
  ▫ Two-tier partial replication strategy
    ▫ Data availability
    ▫ Load balancing function
• Tune the replication process based on data popularity
  ▫ Self-tuning range histogram
Replication in BATON
• BATON is range-based rather than hash-based
• “Where to replicate a certain data object?”
• Approaches
  ▫ Straightforward approach
  ▫ Replication based on data range
  ▫ Shift key value scheme (ecStore)
Replication in BATON
• Straightforward approach
  ▫ Replicate data on the surrounding nodes
  ▫ Replicas are identified by the location of the primary copy
  ▫ It is complicated to identify the surrounding links of a failed node
Replication in BATON
• Replication based on data range
  ▫ If the key of a data item belongs to a certain range
    ▫ Hash the range value
    ▫ Use the output to determine the identity of the storage node where the replica can be stored
  ▫ Hashing breaks the order of replicated data
Replication in BATON
• Shift key value scheme (ecStore)
  ▫ The replicas are stored in the same BATON structure as the primary copy, but under their virtual keys
  ▫ Well distributed across the storage nodes in the cluster
  ▫ Shifts the initial key to multiple virtual keys
  ▫ Preserves the order of replicated data
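The slides do not give the shift function itself. The sketch below assumes evenly spaced offsets over a bounded key domain with wrap-around; `KEY_SPACE` and `virtual_keys` are hypothetical names used only for illustration.

```python
KEY_SPACE = 1000  # assumed size of the key domain [0, KEY_SPACE)


def virtual_keys(key, replicas, key_space=KEY_SPACE):
    """Shift key by evenly spaced offsets, wrapping around the key domain."""
    step = key_space // (replicas + 1)
    return [(key + i * step) % key_space for i in range(1, replicas + 1)]
```

With this assumed function, keys 10 and 20 map to virtual keys `[343, 676]` and `[353, 686]`: each replica lands in a different part of the key space, and neighbouring keys stay neighbours within each replica set, so a range query can still be answered from replicas.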
Two-tier partial replication
• “Which data should be replicated?”
• Approaches
  ▫ Straightforward approach
  ▫ Data migration
  ▫ Two-tier replication mechanism (ecStore)
Two-tier partial replication
• Straightforward approach
  ▫ Replicate all data objects with the same replication level K
  ▫ If K is large, the storage consumed and the overhead of keeping the replicas consistent can be considerably high
• Data migration
  ▫ Migrating hot data from one overloaded node to another node only shuffles the hotspot through the system
Two-tier partial replication
• Two-tier replication mechanism (ecStore)
  ▫ Provides both data availability and load balancing
  ▫ Each data object is associated with two kinds of replicas: secondary and slave replicas
Two-tier partial replication
• First tier
  ▫ Replication with a small level K for all data objects
• Second tier
  ▫ Popular data objects are associated with additional replicas, called slave replicas
  ▫ Facilitates load balancing for frequently accessed objects
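The two tiers can be pictured as a small policy function. This is only a sketch: the threshold, the per-slave capacity, and the cap on slave replicas are made-up parameters, since the slides do not state how many slave replicas a popular object receives.

```python
def replica_count(accesses, k=2, hot_threshold=100, per_slave=100, max_slaves=4):
    """Return (secondary, slave) replica counts for one data object.

    All parameters are hypothetical. Every object gets k secondary replicas
    (first tier); only objects accessed more often than hot_threshold get
    slave replicas (second tier).
    """
    if accesses <= hot_threshold:
        return k, 0
    # Ceiling division: one slave per started block of per_slave extra accesses.
    extra = (accesses - hot_threshold + per_slave - 1) // per_slave
    return k, min(extra, max_slaves)
```

A cold object with 50 accesses keeps `(2, 0)` replicas, while an object with 250 accesses gets `(2, 2)` under these assumed settings.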
BATON-Two-tier partial replication
Self-tuning range histogram
• Only a small number of replicas
  ▫ Histogram maintenance cost is minimal
• The histogram approximately estimates the access frequency of a data range
• When the load balancing process is triggered, the storage node replicates its most popular data ranges to other, more lightly loaded nodes
• The load information is piggybacked on queries
Self-tuning range histogram
• Dynamically restructuring the histogram
  ▫ Splitting/merging the buckets
• Total number of buckets is kept constant
  ▫ Merge consecutive buckets with similar frequency into a bucket with a larger data range
  ▫ Split the bucket with high access frequency into buckets with smaller data ranges
• Only replicate the data ranges maintained by small buckets
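One split/merge step might look like the sketch below. It makes several simplifying assumptions not taken from the slides: buckets are `(lo, hi, count)` tuples over integer keys, the hottest bucket is split at its midpoint with its count divided evenly, and similarity is judged on raw counts rather than per-key density. `restructure` is a hypothetical name.

```python
def restructure(buckets):
    """One split/merge step over (lo, hi, count) buckets sorted by lo.

    The number of buckets stays constant: the hottest bucket is split in
    half, and the adjacent pair with the closest counts elsewhere is merged.
    """
    hot = max(range(len(buckets)), key=lambda i: buckets[i][2])
    # Candidate pairs (i, i + 1) that do not touch the hot bucket.
    pairs = [i for i in range(len(buckets) - 1) if hot not in (i, i + 1)]
    lo, hi, count = buckets[hot]
    if not pairs or hi - lo < 2:
        return list(buckets)  # nothing to merge, or range too small to split
    m = min(pairs, key=lambda i: abs(buckets[i][2] - buckets[i + 1][2]))
    out = []
    i = 0
    while i < len(buckets):
        if i == hot:
            mid = (lo + hi) // 2
            out += [(lo, mid, count // 2), (mid, hi, count - count // 2)]
        elif i == m:
            a, b = buckets[i], buckets[i + 1]
            out.append((a[0], b[1], a[2] + b[2]))
            i += 1  # skip the bucket that was merged in
        else:
            out.append(buckets[i])
        i += 1
    return out
```

After repeated steps, hot ranges end up in narrow buckets and cold ranges in wide ones, so "small bucket" becomes a cheap proxy for "popular range".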
Self-tuning range histogram
• Reduce the cost of maintaining unnecessary replicas
• When a slave replica no longer benefits load balancing
  ▫ Discard the slave replica of that data range
Replica consistency management
• Cloud storage has to provide 24x7 data availability
• Updating all copies synchronously is not suitable
• Pessimistic replication technique
  ▫ An update needs to be reflected on all replicas before taking effect
• Optimistic replication method (ecStore)
  ▫ The primary copy is always updated immediately
Replica consistency management
• Write-ahead logging scheme
• Guarantees that updates to the primary copy are durable and eventually propagated to the secondary copies
• Adaptive read consistency by using the quorum model for read operations
• A write request updates the primary copy first and is asynchronously propagated to the replicas
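The interplay of these points can be sketched in a single-process toy model. `ReplicatedObject` and its methods are hypothetical names, asynchronous propagation is simulated by an explicit `propagate` call, and the quorum is simply the set of secondaries the caller chooses to ask.

```python
class ReplicatedObject:
    """Toy model: one primary copy plus several secondary copies."""

    def __init__(self, replicas):
        self.log = []                      # write-ahead log of (version, value)
        self.primary = (0, None)           # (version, value)
        self.secondaries = [(0, None)] * replicas

    def write(self, value):
        version = self.primary[0] + 1
        self.log.append((version, value))  # log first, then apply
        self.primary = (version, value)    # primary is updated immediately

    def propagate(self, replica):
        """Stand-in for asynchronous refresh of one secondary from the log."""
        if self.log:
            self.secondaries[replica] = self.log[-1]

    def read(self, replica_ids):
        """Quorum-style read: ask the given secondaries, return the freshest copy."""
        return max((self.secondaries[i] for i in replica_ids), key=lambda c: c[0])
```

Asking more replicas raises the chance of seeing the latest version at the cost of extra messages, which is the trade-off an adaptive read quorum exposes.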
Replica consistency management
• Adopts the notion of BASE (Basically Available, Soft state, Eventually consistent)
• Does not need to implement the two-phase commit protocol for refresh transactions
Transaction management layer
• Multi-versioning
  ▫ Enhances the performance of read-dominant applications
  ▫ Benefits read-only transactions
• Optimistic concurrency control
  ▫ Advantageous for applications where users access mutually exclusive data
  ▫ Protects the system from locking overheads
• Commit protocol and recovery control
  ▫ Guarantees the data durability requirement
  ▫ Atomicity and durability
Transaction management layer
• Data in the cloud
  ▫ Perform operations on a recent snapshot of the data
  ▫ Independence between concurrent transactions
  ▫ Hybrid scheme of multi-version and optimistic concurrency control
    ▫ Isolation and consistency for large-scale databases
Transaction management layer
• Multi-version optimistic concurrency scheme
  ▫ Startup timestamp, commit timestamp
  ▫ Read-only transactions run against a consistent snapshot of the database
    ▫ Can commit without the validation phase
  ▫ Update transactions use version numbers
    ▫ To check for write-write/write-read conflicts
  ▫ An update transaction can only commit if the version of the object is the same as in the read phase
  ▫ Snapshot isolation property
    ▫ Not serializable in all executions
    ▫ Read-before-write conflicts are not checked
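The version check for update transactions can be sketched as follows. This is a minimal single-node illustration with hypothetical names (`VersionedStore`, `commit`); it keeps only the latest version per key, so the snapshot reads of read-only transactions are left out.

```python
class VersionedStore:
    def __init__(self):
        self.data = {}  # key -> (version, value)

    def read(self, key):
        return self.data.get(key, (0, None))

    def commit(self, read_versions, writes):
        """Validate, then apply.

        read_versions maps key -> version seen in the read phase.
        Only keys in the write set are validated, so a key that was
        merely read may have changed in the meantime.
        """
        for key in writes:
            if self.read(key)[0] != read_versions.get(key, 0):
                return False  # another transaction committed a newer version
        for key, value in writes.items():
            self.data[key] = (self.read(key)[0] + 1, value)
        return True
```

If two transactions both read version 0 of `"x"` and then try to write it, the first `commit` succeeds and the second fails validation and must retry. Because only the write set is validated, executions of this sketch are not guaranteed to be serializable, in line with the snapshot isolation caveat above.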
Transaction management layer
• Commit protocol
  ▫ Read-only transactions
    ▫ Consistent snapshot of the database, no commit
  ▫ Update transactions
    ▫ The log and commit records are stored on a local dedicated disk and also replicated over the storage nodes in the system
Transaction management layer
• Recovery control
  ▫ A storage node can safely leave the system
    ▫ No recovery process is needed
  ▫ Unsafe departure
    ▫ Short-term failure (software bugs …)
      ▫ Check its local log store
    ▫ Long-term failure (hardware crashes …)
      ▫ Another healthy node takes over the range index previously managed by the failed node
      ▫ The new responsible node will recover the data
      ▫ The new responsible node will check the transaction logs
  ▫ Redo operations by forwarding the log records
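The redo step can be illustrated by replaying log records onto the recovered state. The record layout `(sequence, key, value)` and the function name `redo` are assumptions made for this sketch; the slides do not describe the log format.

```python
def redo(state, log, last_applied):
    """Replay log records with a sequence number greater than last_applied.

    state is a dict of key -> value as recovered from replicas; log is an
    iterable of (sequence, key, value) records. Returns the new state and
    the highest sequence number applied.
    """
    recovered = dict(state)
    for seq, key, value in sorted(log, key=lambda record: record[0]):
        if seq > last_applied:
            recovered[key] = value
            last_applied = seq
    return recovered, last_applied
```

Skipping records at or below `last_applied` makes the replay safe to repeat, which matters when the node that takes over cannot be sure how far the failed node got.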
Performance study
• The optimistic replication method outperforms the pessimistic replication method
Performance study
• Results show that the proposed load-adaptive replication method can effectively balance the system load distribution under skewed workloads
Conclusion
• ecStore
  ▫ Underlying BATON distributed index
    ▫ Load-adaptive replication
    ▫ Multi-version optimistic concurrency control
Thanks for your attention! Questions?
Martin Ites