Building an Elastic Main-Memory Database: E-Store
![Page 2: Building an Elastic Main-Memory Database E-Store · o Constraints: each tuple/block must be assigned to exactly one partition, and each partition must have total load less than the](https://reader034.vdocument.in/reader034/viewer/2022042416/5f31dc5682721c257d591a79/html5/thumbnails/2.jpg)
Collaboration Between Many
Rebecca Taft, Vaibhav Arora, Essam Mansour, Marco Serafini, Jennie Duggan, Aaron J. Elmore, Ashraf Aboulnaga, Andy Pavlo, Amr El Abbadi, Divy Agrawal, Michael Stonebraker
E-Store @ VLDB 2015, Squall @ SIGMOD 2015, E-Store++ @ ????
![Page 3: Building an Elastic Main-Memory Database E-Store · o Constraints: each tuple/block must be assigned to exactly one partition, and each partition must have total load less than the](https://reader034.vdocument.in/reader034/viewer/2022042416/5f31dc5682721c257d591a79/html5/thumbnails/3.jpg)
Databases are Great
Developer ease via ACID
Turing Award winning great
![Page 4: Building an Elastic Main-Memory Database E-Store · o Constraints: each tuple/block must be assigned to exactly one partition, and each partition must have total load less than the](https://reader034.vdocument.in/reader034/viewer/2022042416/5f31dc5682721c257d591a79/html5/thumbnails/4.jpg)
But they are Rigid and Complex
![Page 5: Building an Elastic Main-Memory Database E-Store · o Constraints: each tuple/block must be assigned to exactly one partition, and each partition must have total load less than the](https://reader034.vdocument.in/reader034/viewer/2022042416/5f31dc5682721c257d591a79/html5/thumbnails/5.jpg)
Growth…
[Chart: Average Millions of Active Users per quarter, Q3 09 to Q3 12; y-axis 0 to 350]
Rapid growth of some web services led to design of new “web-scale” databases…
![Page 6: Building an Elastic Main-Memory Database E-Store · o Constraints: each tuple/block must be assigned to exactly one partition, and each partition must have total load less than the](https://reader034.vdocument.in/reader034/viewer/2022042416/5f31dc5682721c257d591a79/html5/thumbnails/6.jpg)
Rise of NoSQL
Scaling is needed
Chisel away at functionality
◦ No transactions
◦ No secondary indexes
◦ Minimal recovery
◦ Mixed Consistency
Not always suitable…
![Page 7: Building an Elastic Main-Memory Database E-Store · o Constraints: each tuple/block must be assigned to exactly one partition, and each partition must have total load less than the](https://reader034.vdocument.in/reader034/viewer/2022042416/5f31dc5682721c257d591a79/html5/thumbnails/7.jpg)
Workloads Fluctuate
[Figure: Demand over Time; y-axis: Resources]
Slide Credits: Berkeley RAD Lab
![Page 8: Building an Elastic Main-Memory Database E-Store · o Constraints: each tuple/block must be assigned to exactly one partition, and each partition must have total load less than the](https://reader034.vdocument.in/reader034/viewer/2022042416/5f31dc5682721c257d591a79/html5/thumbnails/8.jpg)
Peak Provisioning
[Figure: Demand and Capacity over Time; y-axis: Resources; label: Unused resources]
Slide Credits: Berkeley RAD Lab
![Page 9: Building an Elastic Main-Memory Database E-Store · o Constraints: each tuple/block must be assigned to exactly one partition, and each partition must have total load less than the](https://reader034.vdocument.in/reader034/viewer/2022042416/5f31dc5682721c257d591a79/html5/thumbnails/9.jpg)
Peak Provisioning isn’t Perfect
[Figure: x-axis: Time; y-axis: Resources]
Slide Credits: Berkeley RAD Lab
![Page 10: Building an Elastic Main-Memory Database E-Store · o Constraints: each tuple/block must be assigned to exactly one partition, and each partition must have total load less than the](https://reader034.vdocument.in/reader034/viewer/2022042416/5f31dc5682721c257d591a79/html5/thumbnails/10.jpg)
Growth is not always sustained
[Chart: Average Millions of Active Users per quarter, Q3 09 to Q4 14; y-axis 0 to 350]
http://www.statista.com/statistics/273569/monthly-active-users-of-zynga-games/
![Page 11: Building an Elastic Main-Memory Database E-Store · o Constraints: each tuple/block must be assigned to exactly one partition, and each partition must have total load less than the](https://reader034.vdocument.in/reader034/viewer/2022042416/5f31dc5682721c257d591a79/html5/thumbnails/11.jpg)
Need Elasticity
ELASTICITY > SCALABILITY
![Page 12: Building an Elastic Main-Memory Database E-Store · o Constraints: each tuple/block must be assigned to exactly one partition, and each partition must have total load less than the](https://reader034.vdocument.in/reader034/viewer/2022042416/5f31dc5682721c257d591a79/html5/thumbnails/12.jpg)
The Promise of Elasticity
[Figure: Demand and Capacity over Time; y-axis: Resources; label: Unused resources]
Slide Credits: Berkeley RAD Lab
![Page 13: Building an Elastic Main-Memory Database E-Store · o Constraints: each tuple/block must be assigned to exactly one partition, and each partition must have total load less than the](https://reader034.vdocument.in/reader034/viewer/2022042416/5f31dc5682721c257d591a79/html5/thumbnails/13.jpg)
Primary use-cases for elasticity
◦ Database-as-a-Service with elastic placement of non-correlated tenants, often low utilization per tenant.
◦ High-throughput transactional systems (OLTP)
![Page 14: Building an Elastic Main-Memory Database E-Store · o Constraints: each tuple/block must be assigned to exactly one partition, and each partition must have total load less than the](https://reader034.vdocument.in/reader034/viewer/2022042416/5f31dc5682721c257d591a79/html5/thumbnails/14.jpg)
No Need to Weaken the Database!
![Page 15: Building an Elastic Main-Memory Database E-Store · o Constraints: each tuple/block must be assigned to exactly one partition, and each partition must have total load less than the](https://reader034.vdocument.in/reader034/viewer/2022042416/5f31dc5682721c257d591a79/html5/thumbnails/15.jpg)
High Throughput = Main Memory
Cost per GB for RAM is dropping.
Network memory is faster than local disk.
Much faster than disk based DBs.
![Page 16: Building an Elastic Main-Memory Database E-Store · o Constraints: each tuple/block must be assigned to exactly one partition, and each partition must have total load less than the](https://reader034.vdocument.in/reader034/viewer/2022042416/5f31dc5682721c257d591a79/html5/thumbnails/16.jpg)
Approaches for “NewSQL” main-memory*
◦ Highly concurrent, latch-free data structures
◦ Partitioning into single-threaded executors
*Excuse the generalization
![Page 17: Building an Elastic Main-Memory Database E-Store · o Constraints: each tuple/block must be assigned to exactly one partition, and each partition must have total load less than the](https://reader034.vdocument.in/reader034/viewer/2022042416/5f31dc5682721c257d591a79/html5/thumbnails/17.jpg)
Client Application
Procedure Name Input Parameters
Slide Credits: Andy Pavlo
![Page 18: Building an Elastic Main-Memory Database E-Store · o Constraints: each tuple/block must be assigned to exactly one partition, and each partition must have total load less than the](https://reader034.vdocument.in/reader034/viewer/2022042416/5f31dc5682721c257d591a79/html5/thumbnails/18.jpg)
Database Partitioning
[Figure: TPC-C Schema (WAREHOUSE, DISTRICT, CUSTOMER, ORDERS, ORDER_ITEM, STOCK, ITEM) and the Schema Tree derived from it (WAREHOUSE, DISTRICT, CUSTOMER, ORDERS, ORDER_ITEM, STOCK), with ITEM marked Replicated]
Slide Credits: Andy Pavlo
![Page 19: Building an Elastic Main-Memory Database E-Store · o Constraints: each tuple/block must be assigned to exactly one partition, and each partition must have total load less than the](https://reader034.vdocument.in/reader034/viewer/2022042416/5f31dc5682721c257d591a79/html5/thumbnails/19.jpg)
Database Partitioning
[Figure: the Schema Tree (WAREHOUSE, DISTRICT, CUSTOMER, ORDERS, ORDER_ITEM, STOCK) divided across Partitions P1 to P5, with a copy of the Replicated ITEM table shown at each partition]
Slide Credits: Andy Pavlo
![Page 20: Building an Elastic Main-Memory Database E-Store · o Constraints: each tuple/block must be assigned to exactly one partition, and each partition must have total load less than the](https://reader034.vdocument.in/reader034/viewer/2022042416/5f31dc5682721c257d591a79/html5/thumbnails/20.jpg)
The Problem: Workload Skew
Many OLTP applications suffer from variable load and high skew:
Extreme Skew: 40-60% of NYSE trading volume is on 40 individual stocks
Time Variation: Load “follows the sun”
Seasonal Variation: Ski resorts have high load in the winter months
Load Spikes: First and last 10 minutes of trading day have 10X the average volume
Hockey Stick Effect: A new application goes “viral”
![Page 21: Building an Elastic Main-Memory Database E-Store · o Constraints: each tuple/block must be assigned to exactly one partition, and each partition must have total load less than the](https://reader034.vdocument.in/reader034/viewer/2022042416/5f31dc5682721c257d591a79/html5/thumbnails/21.jpg)
The Problem: Workload Skew
No Skew: Uniform data access
Low Skew: 2/3 of queries access top 1/3 of data
High Skew: Few very hot items
![Page 22: Building an Elastic Main-Memory Database E-Store · o Constraints: each tuple/block must be assigned to exactly one partition, and each partition must have total load less than the](https://reader034.vdocument.in/reader034/viewer/2022042416/5f31dc5682721c257d591a79/html5/thumbnails/22.jpg)
The Problem: Workload Skew
High skew increases latency by 10X and decreases throughput by 4X
Partitioned shared-nothing systems are especially susceptible
![Page 23: Building an Elastic Main-Memory Database E-Store · o Constraints: each tuple/block must be assigned to exactly one partition, and each partition must have total load less than the](https://reader034.vdocument.in/reader034/viewer/2022042416/5f31dc5682721c257d591a79/html5/thumbnails/23.jpg)
The Problem: Workload Skew
Possible solutions:
o Provision resources for peak load (Very expensive and brittle!)
o Limit load on system (Poor performance!)
o Enable system to elastically scale in or out to dynamically adapt to changes in load
![Page 24: Building an Elastic Main-Memory Database E-Store · o Constraints: each tuple/block must be assigned to exactly one partition, and each partition must have total load less than the](https://reader034.vdocument.in/reader034/viewer/2022042416/5f31dc5682721c257d591a79/html5/thumbnails/24.jpg)
Elastic Scaling
Before (three partitions, Key Value):
◦ 1 kjfhlaks, 2 ewoiej, 3 fsakjfkf, 4 fkjwei
◦ 5 wekjxmn, 6 fmeixtm, 7 ewjkdm, 8 weiuqw
◦ 9 mznjku, 10 ewrenx, 11 qieucm, 12 roiwrio
After (four partitions, Key Value):
◦ 1 kjfhlaks, 2 ewoiej, 3 fsakjfkf, 4 fkjwei
◦ 5 wekjxmn, 6 fmeixtm, 7 ewjkdm, 8 weiuqw
◦ 9 mznjku, 10 ewrenx
◦ 11 qieucm, 12 roiwrio
![Page 25: Building an Elastic Main-Memory Database E-Store · o Constraints: each tuple/block must be assigned to exactly one partition, and each partition must have total load less than the](https://reader034.vdocument.in/reader034/viewer/2022042416/5f31dc5682721c257d591a79/html5/thumbnails/25.jpg)
Load Balancing
Before (three partitions, Key Value):
◦ 1 kjfhlaks, 2 ewoiej, 3 fsakjfkf, 4 fkjwei
◦ 5 wekjxmn, 6 fmeixtm, 7 ewjkdm, 8 weiuqw
◦ 9 mznjku, 10 ewrenx, 11 qieucm, 12 roiwrio
After (same three partitions, Key Value):
◦ 1 kjfhlaks, 2 ewoiej, 3 fsakjfkf, 4 fkjwei, 10 ewrenx, 11 qieucm
◦ 5 wekjxmn, 6 fmeixtm, 7 ewjkdm, 8 weiuqw, 12 roiwrio
◦ 9 mznjku
![Page 26: Building an Elastic Main-Memory Database E-Store · o Constraints: each tuple/block must be assigned to exactly one partition, and each partition must have total load less than the](https://reader034.vdocument.in/reader034/viewer/2022042416/5f31dc5682721c257d591a79/html5/thumbnails/26.jpg)
Two-Tiered Partitioning
What if only a few specific tuples are very hot? Deal with them separately!
Two tiers:
1. Individual hot tuples, mapped explicitly to partitions
2. Large blocks of colder tuples, hash- or range-partitioned at coarse granularity
Possible implementations:
o Fine-grained range partitioning
o Consistent hashing with virtual nodes
o Lookup table combined with any standard partitioning scheme
Existing systems are “one-tiered” and partition data only at coarse granularity
o Unable to handle cases of extreme skew
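A minimal sketch of the third implementation option (a lookup table in front of a standard range-partitioning scheme). The class name `TwoTieredRouter`, its fields, and the example plan are hypothetical illustrations, not E-Store's actual code:

```python
from bisect import bisect_right


class TwoTieredRouter:
    """Route a key to a partition: hot-tuple lookup table first, coarse ranges second."""

    def __init__(self, range_starts, range_partitions, hot_tuples=None):
        # Tier 2: range_starts[i] is the first key of the i-th cold block
        # (sorted ascending); range_partitions[i] owns that block.
        self.range_starts = list(range_starts)
        self.range_partitions = list(range_partitions)
        # Tier 1: individual hot keys mapped explicitly to partitions.
        self.hot_tuples = dict(hot_tuples or {})

    def partition_for(self, key):
        # Explicit hot-tuple placement overrides the coarse-grained plan.
        if key in self.hot_tuples:
            return self.hot_tuples[key]
        idx = bisect_right(self.range_starts, key) - 1
        if idx < 0:
            raise KeyError(f"key {key} is below the first range")
        return self.range_partitions[idx]


# Hypothetical plan: three cold ranges, with hot keys 0 and 1 placed elsewhere.
router = TwoTieredRouter(
    range_starts=[0, 100_000, 200_000],
    range_partitions=[0, 1, 2],
    hot_tuples={0: 2, 1: 1},
)
```

The lookup table stays small because only the few hot tuples get explicit entries; every other key falls through to the coarse ranges.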
![Page 27: Building an Elastic Main-Memory Database E-Store · o Constraints: each tuple/block must be assigned to exactly one partition, and each partition must have total load less than the](https://reader034.vdocument.in/reader034/viewer/2022042416/5f31dc5682721c257d591a79/html5/thumbnails/27.jpg)
E-Store
End-to-end system which extends H-Store (a distributed, shared-nothing, main memory DBMS) with automatic, adaptive, two-tiered elastic partitioning
![Page 28: Building an Elastic Main-Memory Database E-Store · o Constraints: each tuple/block must be assigned to exactly one partition, and each partition must have total load less than the](https://reader034.vdocument.in/reader034/viewer/2022042416/5f31dc5682721c257d591a79/html5/thumbnails/28.jpg)
E-Store
[Figure: control loop]
1. Normal operation, high level monitoring. On "Load imbalance detected", go to 2.
2. Tuple level monitoring (E-Monitor). Output: hot tuples, partition-level access counts.
3. Tuple placement planning (E-Planner). Output: new partition plan.
4. Online reconfiguration (Squall). On "Reconfiguration complete", return to 1.
![Page 29: Building an Elastic Main-Memory Database E-Store · o Constraints: each tuple/block must be assigned to exactly one partition, and each partition must have total load less than the](https://reader034.vdocument.in/reader034/viewer/2022042416/5f31dc5682721c257d591a79/html5/thumbnails/29.jpg)
E-Monitor: High-Level Monitoring
High level system statistics collected every ~1 minute
o CPU indicates system load, used to determine whether to add or remove nodes, or re-shuffle the data
o Accurate in H-Store since partition executors are pinned to specific cores
o Cheap to collect
o When a load imbalance (or overload/underload) is detected, detailed monitoring is triggered
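A rough sketch of how per-partition CPU readings might be turned into a decision. The function name and all three thresholds are hypothetical; the slides do not give E-Store's actual values:

```python
from statistics import mean


def high_level_decision(cpu_by_partition, high=0.90, low=0.50, imbalance=0.20):
    """Classify one round of CPU readings (fractions in [0, 1]).

    Returns 'scale_out', 'rebalance', 'scale_in', or 'ok'.
    The thresholds are illustrative assumptions only.
    """
    readings = list(cpu_by_partition.values())
    avg = mean(readings)
    hottest = max(readings)
    if avg > high:
        return "scale_out"      # overload: add nodes
    if hottest - avg > imbalance:
        return "rebalance"      # skew: re-shuffle the data
    if avg < low:
        return "scale_in"       # underload: remove nodes
    return "ok"


# Any result other than 'ok' would trigger detailed tuple-level monitoring.
decision = high_level_decision({0: 0.95, 1: 0.10, 2: 0.10})
```

Checking imbalance before underload matters: a cluster with one saturated partition and a low average should be rebalanced, not shrunk.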
![Page 30: Building an Elastic Main-Memory Database E-Store · o Constraints: each tuple/block must be assigned to exactly one partition, and each partition must have total load less than the](https://reader034.vdocument.in/reader034/viewer/2022042416/5f31dc5682721c257d591a79/html5/thumbnails/30.jpg)
E-Monitor: Tuple-Level Monitoring
Tuple-level statistics collected in case of load imbalance
o Finds the top 1% of tuples accessed per partition (read or written) during a 10 second window
o Finds total access count per block of cold tuples
Can be used to determine workload distribution, using tuple access count as a proxy for system load
o Reasonable assumption for main-memory DBMS w/ OLTP workload
Minor performance degradation during collection
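The two statistics above could be computed from one window's access log roughly as follows. This is a sketch: the function name, the block size, and the reading of "top 1%" as 1% of the distinct keys seen in the window are all assumptions:

```python
from collections import Counter


def summarize_window(accessed_keys, block_size=1000, hot_percent=1):
    """Summarize one monitoring window on one partition.

    accessed_keys: iterable of integer keys read or written in the window.
    Returns (hot, block_counts): hot is a list of (key, accesses) for the
    hottest keys; block_counts maps block id (key // block_size) to the
    total accesses of the remaining, colder keys.
    """
    counts = Counter(accessed_keys)
    # Assumption: "top 1%" means 1% of the distinct keys seen, at least one.
    n_hot = max(1, len(counts) * hot_percent // 100) if counts else 0
    hot = counts.most_common(n_hot)
    hot_keys = {key for key, _ in hot}

    block_counts = Counter()
    for key, accesses in counts.items():
        if key not in hot_keys:
            block_counts[key // block_size] += accesses
    return hot, dict(block_counts)


# Hypothetical window: key 7 is accessed 51 times, 99 other keys once each.
hot, blocks = summarize_window([7] * 50 + list(range(100)), block_size=10)
```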
![Page 31: Building an Elastic Main-Memory Database E-Store · o Constraints: each tuple/block must be assigned to exactly one partition, and each partition must have total load less than the](https://reader034.vdocument.in/reader034/viewer/2022042416/5f31dc5682721c257d591a79/html5/thumbnails/31.jpg)
E-Monitor: Tuple-Level Monitoring
[Figure: Sample output]
![Page 32: Building an Elastic Main-Memory Database E-Store · o Constraints: each tuple/block must be assigned to exactly one partition, and each partition must have total load less than the](https://reader034.vdocument.in/reader034/viewer/2022042416/5f31dc5682721c257d591a79/html5/thumbnails/32.jpg)
E-Planner
Given current partitioning of data, system statistics and hot tuples/partitions from E-Monitor, E-Planner determines:
o Whether to add or remove nodes
o How to balance load
Optimization problem: minimize data movement (migration is not free) while balancing system load.
We tested five different data placement algorithms:
o One-tiered bin packing (ILP – computationally intensive!)
o Two-tiered bin packing (ILP – computationally intensive!)
o First Fit (global repartitioning to balance load)
o Greedy (only move hot tuples)
o Greedy Extended (move hot tuples first, then cold blocks until load is balanced)
![Page 33: Building an Elastic Main-Memory Database E-Store · o Constraints: each tuple/block must be assigned to exactly one partition, and each partition must have total load less than the](https://reader034.vdocument.in/reader034/viewer/2022042416/5f31dc5682721c257d591a79/html5/thumbnails/33.jpg)
E-Planner: Greedy Extended Algorithm
Current YCSB partition plan
"usertable": {
0: [0-100000)
1: [100000-200000)
2: [200000-300000)
}
Hot tuples Accesses
0 20,000
1 12,000
2 5,000
Range Accesses
3-1000 5,000
1000-2000 3,000
2000-3000 2,000
… …
New YCSB partition plan
"usertable": {
0: [1000-100000)
1: [1-2),[100000-200000)
2: [200000-300000),[0-1), [2-1000)
}
?
![Page 34: Building an Elastic Main-Memory Database E-Store · o Constraints: each tuple/block must be assigned to exactly one partition, and each partition must have total load less than the](https://reader034.vdocument.in/reader034/viewer/2022042416/5f31dc5682721c257d591a79/html5/thumbnails/34.jpg)
E-Planner: Greedy Extended Algorithm
Partition Keys Total Cost (tuple accesses)
0 [0-100000) 77,000
1 [100000-200000) 23,000
2 [200000-300000) 5,000
Target cost per partition: 35,000
Hot tuples Accesses
0 20,000
1 12,000
2 5,000
Range Accesses
3-1000 5,000
1000-2000 3,000
2000-3000 2,000
… …
![Page 36: Building an Elastic Main-Memory Database E-Store · o Constraints: each tuple/block must be assigned to exactly one partition, and each partition must have total load less than the](https://reader034.vdocument.in/reader034/viewer/2022042416/5f31dc5682721c257d591a79/html5/thumbnails/36.jpg)
E-Planner: Greedy Extended Algorithm
Partition Keys Total Cost (tuple accesses)
0 [1-100000) 57,000
1 [100000-200000) 23,000
2 [200000-300000),[0-1) 25,000
Target cost per partition: 35,000
Hot tuples Accesses
0 20,000
1 12,000
2 5,000
Range Accesses
3-1000 5,000
1000-2000 3,000
2000-3000 2,000
… …
![Page 39: Building an Elastic Main-Memory Database E-Store · o Constraints: each tuple/block must be assigned to exactly one partition, and each partition must have total load less than the](https://reader034.vdocument.in/reader034/viewer/2022042416/5f31dc5682721c257d591a79/html5/thumbnails/39.jpg)
E-Planner: Greedy Extended Algorithm
Partition Keys Total Cost (tuple accesses)
0 [2-100000) 45,000
1 [100000-200000),[1-2) 35,000
2 [200000-300000),[0-1) 25,000
Target cost per partition: 35,000
Hot tuples Accesses
0 20,000
1 12,000
2 5,000
Range Accesses
3-1000 5,000
1000-2000 3,000
2000-3000 2,000
… …
![Page 42: Building an Elastic Main-Memory Database E-Store · o Constraints: each tuple/block must be assigned to exactly one partition, and each partition must have total load less than the](https://reader034.vdocument.in/reader034/viewer/2022042416/5f31dc5682721c257d591a79/html5/thumbnails/42.jpg)
E-Planner: Greedy Extended Algorithm
Partition Keys Total Cost (tuple accesses)
0 [3-100000) 40,000
1 [100000-200000),[1-2) 35,000
2 [200000-300000),[0-1),[2-3) 30,000
Target cost per partition: 35,000
Hot tuples Accesses
0 20,000
1 12,000
2 5,000
Range Accesses
3-1000 5,000
1000-2000 3,000
2000-3000 2,000
… …
![Page 45: Building an Elastic Main-Memory Database E-Store · o Constraints: each tuple/block must be assigned to exactly one partition, and each partition must have total load less than the](https://reader034.vdocument.in/reader034/viewer/2022042416/5f31dc5682721c257d591a79/html5/thumbnails/45.jpg)
E-Planner: Greedy Extended Algorithm
Partition Keys Total Cost (tuple accesses)
0 [1000-100000) 35,000
1 [100000-200000),[1-2) 35,000
2 [200000-300000),[0-1),[2-1000) 35,000
Hot tuples Accesses
0 20,000
1 12,000
2 5,000
Range Accesses
3-1000 5,000
1000-2000 3,000
2000-3000 2,000
… …
Target cost per partition: 35,000
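The walk-through on these slides can be reproduced with a short sketch. The `Item` type, the function signature, and the rule of skipping a move that would push the receiving partition above the target are assumptions made for illustration; only the numbers come from the slides:

```python
from dataclasses import dataclass


@dataclass
class Item:
    keys: tuple      # half-open key range [lo, hi)
    owner: int       # partition currently holding the range
    accesses: int    # observed access count


def greedy_extended(loads, hot_tuples, cold_blocks, target):
    """Move hot tuples first, then cold blocks, until sources reach the target."""
    loads = dict(loads)
    moves = []
    candidates = (sorted(hot_tuples, key=lambda i: -i.accesses)
                  + sorted(cold_blocks, key=lambda i: -i.accesses))
    for item in candidates:
        src = item.owner
        if loads[src] <= target:
            continue                      # source is already balanced
        dst = min(loads, key=loads.get)   # currently least-loaded partition
        # Assumption: never overshoot the target on the receiving side.
        if dst == src or loads[dst] + item.accesses > target:
            continue
        loads[src] -= item.accesses
        loads[dst] += item.accesses
        moves.append((item.keys, src, dst))
    return loads, moves


# Numbers taken from the slides: all hot tuples start on partition 0.
loads, moves = greedy_extended(
    loads={0: 77_000, 1: 23_000, 2: 5_000},
    hot_tuples=[Item((0, 1), 0, 20_000), Item((1, 2), 0, 12_000),
                Item((2, 3), 0, 5_000)],
    cold_blocks=[Item((3, 1000), 0, 5_000), Item((1000, 2000), 0, 3_000),
                 Item((2000, 3000), 0, 2_000)],
    target=35_000,
)
```

With this input the sketch moves [0-1) to partition 2, [1-2) to partition 1, then [2-3) and [3-1000) to partition 2, which matches the new partition plan shown on the slides.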
![Page 47: Building an Elastic Main-Memory Database E-Store · o Constraints: each tuple/block must be assigned to exactly one partition, and each partition must have total load less than the](https://reader034.vdocument.in/reader034/viewer/2022042416/5f31dc5682721c257d591a79/html5/thumbnails/47.jpg)
E-Planner: Other Heuristic Algorithms
Greedy
o Like Greedy Extended, but the algorithm stops after all hot tuples have been moved
o If there are not many hot tuples (e.g. low skew), may not sufficiently balance the workload
First Fit
o First packs hot tuples onto partitions, filling one partition at a time
o Then packs blocks of cold tuples, filling the remaining partitions one at a time
o Results in a balanced workload, but does not attempt to limit the amount of data movement
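A sketch of First Fit as described above: items are packed in order, hot tuples first, and the planner moves on to the next partition once the current one cannot take the next item. The function name, the capacity parameter, and the example data are hypothetical:

```python
def first_fit(hot_tuples, cold_blocks, num_partitions, capacity):
    """Pack (name, accesses) items onto partitions, one partition at a time.

    Ignores where the data currently lives, so it balances load without
    trying to limit data movement. Returns (plan, loads).
    """
    plan = [[] for _ in range(num_partitions)]
    loads = [0] * num_partitions
    p = 0
    for name, accesses in list(hot_tuples) + list(cold_blocks):
        # Advance once the current partition would exceed its capacity;
        # the last partition absorbs whatever is left.
        while p < num_partitions - 1 and loads[p] + accesses > capacity:
            p += 1
        plan[p].append(name)
        loads[p] += accesses
    return plan, loads


# Hypothetical workload totalling 100,000 accesses over three partitions.
plan, loads = first_fit(
    hot_tuples=[("t0", 20_000), ("t1", 12_000), ("t2", 5_000)],
    cold_blocks=[("b0", 5_000), ("b1", 3_000), ("b2", 2_000),
                 ("b3", 20_000), ("b4", 33_000)],
    num_partitions=3,
    capacity=35_000,
)
```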
![Page 48: Building an Elastic Main-Memory Database E-Store · o Constraints: each tuple/block must be assigned to exactly one partition, and each partition must have total load less than the](https://reader034.vdocument.in/reader034/viewer/2022042416/5f31dc5682721c257d591a79/html5/thumbnails/48.jpg)
E-Planner: Optimal Algorithms
Two-Tiered Bin Packer
o Uses Integer Linear Programming (ILP) to optimally pack hot tuples and cold blocks onto partitions
o Constraints: each tuple/block must be assigned to exactly one partition, and each partition must have total load less than the average + 5%
o Optimization Goal: minimize the amount of data moved in order to satisfy the constraints
One-Tiered Bin Packer
o Like Two-Tiered Bin Packer, but can only pack blocks of tuples, not individual tuples
o Both are computationally intensive, but show one- and two-tiered approaches in best light
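One way to write the stated constraints and goal as an ILP. All symbols are assumptions introduced here: $x_{ij} \in \{0,1\}$ is 1 when item $i$ (a hot tuple or cold block) is assigned to partition $j$, $\ell_i$ is the item's load, $s_i$ its data size, $p(i)$ its current partition, $n$ the number of partitions, and $\bar{L}$ the average load per partition. The slide says "less than"; the sketch uses the non-strict form that ILP solvers expect.

```latex
\begin{aligned}
\min_{x}\quad & \sum_{i} \sum_{j \neq p(i)} s_i \, x_{ij} \\
\text{s.t.}\quad & \sum_{j=1}^{n} x_{ij} = 1 \quad \forall i \\
& \sum_{i} \ell_i \, x_{ij} \le 1.05\,\bar{L} \quad \forall j,
  \qquad \bar{L} = \frac{1}{n} \sum_{i} \ell_i \\
& x_{ij} \in \{0, 1\}
\end{aligned}
```

In the one-tiered variant the index $i$ would range over blocks only, so a single very hot tuple cannot be separated from its block.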
![Page 49: Building an Elastic Main-Memory Database E-Store · o Constraints: each tuple/block must be assigned to exactly one partition, and each partition must have total load less than the](https://reader034.vdocument.in/reader034/viewer/2022042416/5f31dc5682721c257d591a79/html5/thumbnails/49.jpg)
Squall
Given plan from E-Planner, Squall physically moves the data while the system is live
For immediate benefit, moves data from hottest partitions to coldest partitions first
More on this in a bit…
![Page 50: Building an Elastic Main-Memory Database E-Store · o Constraints: each tuple/block must be assigned to exactly one partition, and each partition must have total load less than the](https://reader034.vdocument.in/reader034/viewer/2022042416/5f31dc5682721c257d591a79/html5/thumbnails/50.jpg)
Results – Two-Tiered vs. One-Tiered
[Figures: YCSB High skew; YCSB Low skew]
![Page 51: Building an Elastic Main-Memory Database E-Store · o Constraints: each tuple/block must be assigned to exactly one partition, and each partition must have total load less than the](https://reader034.vdocument.in/reader034/viewer/2022042416/5f31dc5682721c257d591a79/html5/thumbnails/51.jpg)
Results – Heuristic Planners
[Figures: YCSB High skew; YCSB Low skew]
![Page 52: Building an Elastic Main-Memory Database E-Store · o Constraints: each tuple/block must be assigned to exactly one partition, and each partition must have total load less than the](https://reader034.vdocument.in/reader034/viewer/2022042416/5f31dc5682721c257d591a79/html5/thumbnails/52.jpg)
Results
[Figures: YCSB High skew; YCSB Low skew]
![Page 53: Building an Elastic Main-Memory Database E-Store · o Constraints: each tuple/block must be assigned to exactly one partition, and each partition must have total load less than the](https://reader034.vdocument.in/reader034/viewer/2022042416/5f31dc5682721c257d591a79/html5/thumbnails/53.jpg)
But What About… Distributed Transactions???
Current E-Store does not take them into account when planning data movement
OK when most transactions access a single partitioning key – tends to be the case for “tree schemas” such as YCSB, Voter, and TPC-C
E-Store++ will address the general case
o More later…
![Page 54: Building an Elastic Main-Memory Database E-Store · o Constraints: each tuple/block must be assigned to exactly one partition, and each partition must have total load less than the](https://reader034.vdocument.in/reader034/viewer/2022042416/5f31dc5682721c257d591a79/html5/thumbnails/54.jpg)
Squall
Fine-Grained Live Reconfiguration for Partitioned Main Memory Databases
![Page 55: Building an Elastic Main-Memory Database E-Store · o Constraints: each tuple/block must be assigned to exactly one partition, and each partition must have total load less than the](https://reader034.vdocument.in/reader034/viewer/2022042416/5f31dc5682721c257d591a79/html5/thumbnails/55.jpg)
The Problem
Need to migrate tuples between partitions to reflect the updated partitioning.
Would like to do this without bringing the system offline:
◦ Live Reconfiguration
Similar to live migration of an entire database between servers.
![Page 56: Building an Elastic Main-Memory Database E-Store · o Constraints: each tuple/block must be assigned to exactly one partition, and each partition must have total load less than the](https://reader034.vdocument.in/reader034/viewer/2022042416/5f31dc5682721c257d591a79/html5/thumbnails/56.jpg)
Existing Solutions are Not Ideal
Predicated on disk-based solutions with traditional concurrency and recovery.
Zephyr: Relies on concurrency (2PL) and disk pages.
ProRea: Relies on concurrency (SI and OCC) and disk pages.
Albatross: Relies on replication and shared disk storage. Also introduces strain on source.
Slacker: Replication middleware based.
![Page 57: Building an Elastic Main-Memory Database E-Store · o Constraints: each tuple/block must be assigned to exactly one partition, and each partition must have total load less than the](https://reader034.vdocument.in/reader034/viewer/2022042416/5f31dc5682721c257d591a79/html5/thumbnails/57.jpg)
Not Your Parent’s Migration
More than a single source and destination
◦ Want lightweight coordination
Single threaded execution model
◦ Either doing work or migration
Presence of distributed transactions and replication
[Figure: Migrating 2 warehouses in TPC-C in E-Store with a Zephyr-like migration]
![Page 58: Building an Elastic Main-Memory Database E-Store · o Constraints: each tuple/block must be assigned to exactly one partition, and each partition must have total load less than the](https://reader034.vdocument.in/reader034/viewer/2022042416/5f31dc5682721c257d591a79/html5/thumbnails/58.jpg)
Squall
Given plan from E-Planner, Squall physically moves the data while the system is live
Conforms to H-Store single-threaded execution model
o While data is moving, transactions are blocked
To avoid performance degradation, Squall moves small chunks of data at a time, interleaved with regular transaction execution
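A toy sketch of the interleaving idea: a single-threaded loop alternates between one small migration chunk and a few regular transactions, so no transaction waits behind more than one chunk. The function name, chunk size, and transaction budget are hypothetical and are not Squall's actual parameters:

```python
from collections import deque


def run_executor(transactions, migrating_keys, chunk_size=2, txns_per_chunk=2):
    """Simulate a single-threaded partition executor during reconfiguration.

    Returns an event log of ("migrate", [keys]) and ("txn", id) entries.
    """
    txns = deque(transactions)
    pending = deque(migrating_keys)
    log = []
    while txns or pending:
        if pending:
            # Move one small chunk; transactions are blocked meanwhile.
            n = min(chunk_size, len(pending))
            log.append(("migrate", [pending.popleft() for _ in range(n)]))
        # Then let a few regular transactions run before the next chunk.
        for _ in range(min(txns_per_chunk, len(txns))):
            log.append(("txn", txns.popleft()))
    return log


log = run_executor(["t1", "t2", "t3", "t4", "t5"], [10, 11, 12, 13, 14])
```

Smaller chunks shorten each blocking period but lengthen the total reconfiguration, which is the trade-off behind sizing the reconfiguration granule.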
![Page 59: Building an Elastic Main-Memory Database E-Store · o Constraints: each tuple/block must be assigned to exactly one partition, and each partition must have total load less than the](https://reader034.vdocument.in/reader034/viewer/2022042416/5f31dc5682721c257d591a79/html5/thumbnails/59.jpg)
Squall Steps
[Figure: a Client sends Reconfiguration (New Plan, Leader ID) to Partitions 1 to 4, which hold the warehouse, district, customer and stock tables for warehouse IDs 0 to 10, shown before and after. Partition labels: "Incoming: 2, Outgoing: 5", "Outgoing: 2", "Incoming: 5". Pull requests: "Pull W_ID=2", "Pull W_ID>5"]
1. Identify migrating data
2. Live reactive pulls for required data
3. Periodic lazy/async pulls for large chunks
![Page 60: Building an Elastic Main-Memory Database E-Store · o Constraints: each tuple/block must be assigned to exactly one partition, and each partition must have total load less than the](https://reader034.vdocument.in/reader034/viewer/2022042416/5f31dc5682721c257d591a79/html5/thumbnails/60.jpg)
Keys to Performance
Redirect or pull only if needed.
Properly size reconfiguration granule.
Split large reconfigurations to limit demands on single partition.
Tune what gets pulled.
Sometimes pull a little extra.
Migrating 2 warehouses in TPC-C in E-Store with a Zephyr-like migration
![Page 61: Building an Elastic Main-Memory Database E-Store · o Constraints: each tuple/block must be assigned to exactly one partition, and each partition must have total load less than the](https://reader034.vdocument.in/reader034/viewer/2022042416/5f31dc5682721c257d591a79/html5/thumbnails/61.jpg)
Redirect and Pull Only When Needed
![Page 62: Building an Elastic Main-Memory Database E-Store · o Constraints: each tuple/block must be assigned to exactly one partition, and each partition must have total load less than the](https://reader034.vdocument.in/reader034/viewer/2022042416/5f31dc5682721c257d591a79/html5/thumbnails/62.jpg)
Data Migration
When a query arrives, it must be trapped to check whether its data is potentially moving: check the key map first, then the ranges list.
If either the source or the destination partition is local, check its map and keep the query local if possible.
If neither partition is local, forward to the destination.
If the data is not moving, process the transaction.
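The routing rules above can be sketched as follows. This is only an illustration, not Squall's code: `MovingRange`, `find_move`, and `route` are hypothetical names, and the half-open integer key ranges are an assumption.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class MovingRange:
    lo: int           # inclusive lower key bound (assumed representation)
    hi: int           # exclusive upper key bound
    source: int       # partition that currently owns the data
    destination: int  # partition that will own it after migration

def find_move(key, key_map, ranges):
    """Check the exact-key map first, then scan the list of moving ranges."""
    if key in key_map:
        return key_map[key]
    for r in ranges:
        if r.lo <= key < r.hi:
            return r
    return None

def route(key, local_partition, key_map, ranges):
    """Return the partition where the transaction should be processed."""
    move = find_move(key, key_map, ranges)
    if move is None:
        return local_partition       # data is not moving: process as usual
    if local_partition in (move.source, move.destination):
        return local_partition       # keep the query local if possible
    return move.destination          # neither side is local: forward

# Hypothetical plan: W_ID 2 moves 1 -> 3, W_IDs 6..10 move 1 -> 4.
key_map = {2: MovingRange(2, 3, source=1, destination=3)}
ranges = [MovingRange(6, 11, source=1, destination=4)]
```

With this plan, `route(7, 2, key_map, ranges)` forwards to partition 4, while `route(7, 1, key_map, ranges)` stays on partition 1, where the local tracking structures decide whether the data is still present.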
![Page 63: Building an Elastic Main-Memory Database E-Store · o Constraints: each tuple/block must be assigned to exactly one partition, and each partition must have total load less than the](https://reader034.vdocument.in/reader034/viewer/2022042416/5f31dc5682721c257d591a79/html5/thumbnails/63.jpg)
Trap for Data Movement
If a txn requires incoming data, block execution and schedule a data pull.
◦ Can block only the dependent nodes in the query plan
◦ Upon receipt, mark and dirty the tracking structures, then unblock.
If a txn requires lost data (data that has already left this partition), restart it as a distributed transaction or forward the request.
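A minimal sketch of the trap logic, assuming per-key tracking (the slides do not show the actual tracking structures, so `PartitionTracker` and its fields are hypothetical):

```python
from collections import defaultdict

class PartitionTracker:
    """Per-partition migration state; key-level granularity is an assumption."""

    def __init__(self, incoming, outgoing):
        self.incoming = set(incoming)      # keys this partition will receive
        self.outgoing = set(outgoing)      # keys this partition will give up
        self.arrived = set()               # incoming keys already received
        self.departed = set()              # outgoing keys already extracted
        self.blocked = defaultdict(list)   # key -> txn ids waiting on it
        self.pull_requests = []            # reactive pulls to schedule

    def trap(self, txn_id, key):
        if key in self.incoming and key not in self.arrived:
            if key not in self.blocked:    # schedule one pull per missing key
                self.pull_requests.append(key)
            self.blocked[key].append(txn_id)
            return "blocked"
        if key in self.departed:
            return "forward"               # or restart as a distributed txn
        return "execute"

    def on_data_received(self, key):
        """Mark the key as present and return the txns to unblock."""
        self.arrived.add(key)
        return self.blocked.pop(key, [])

    def on_data_extracted(self, key):
        """Mark the key as no longer held by this partition."""
        self.departed.add(key)
```

The destination blocks only the transactions that touch a missing key, and the source keeps executing locally until the key has actually been extracted.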
![Page 64: Building an Elastic Main-Memory Database E-Store · o Constraints: each tuple/block must be assigned to exactly one partition, and each partition must have total load less than the](https://reader034.vdocument.in/reader034/viewer/2022042416/5f31dc5682721c257d591a79/html5/thumbnails/64.jpg)
Data Pull Requests
Live data pulls are scheduled at destination as high-priority transactions.
Current transaction finishes before extraction.
Timeout detection is needed.
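One way to picture this scheduling is a priority queue in which a pull request jumps ahead of queued transactions but never preempts the one already running. The sketch below is illustrative only: `TxnQueue`, the priority constants, and `pull_timed_out` are hypothetical, and the timeout rule is an assumed simple deadline check.

```python
import heapq
import itertools

PULL, REGULAR = 0, 1   # lower number = higher priority

class TxnQueue:
    def __init__(self):
        self._heap = []
        self._seq = itertools.count()   # FIFO tie-break within a priority

    def submit(self, name, priority=REGULAR):
        heapq.heappush(self._heap, (priority, next(self._seq), name))

    def run_next(self):
        """Pop the next unit of work; the caller runs it to completion."""
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[2]

def pull_timed_out(sent_at_s, now_s, timeout_s):
    """Assumed rule: a pull is late once it has waited longer than timeout_s."""
    return now_s - sent_at_s > timeout_s

q = TxnQueue()
q.submit("txn-a")
q.submit("txn-b")
running = q.run_next()          # txn-a starts and is not interrupted
q.submit("pull W_ID=2", PULL)   # arrives while txn-a is running
order = [running, q.run_next(), q.run_next()]
```

Here `order` ends up as txn-a, then the pull, then txn-b: the current transaction finishes before extraction begins.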
![Page 65: Building an Elastic Main-Memory Database E-Store · o Constraints: each tuple/block must be assigned to exactly one partition, and each partition must have total load less than the](https://reader034.vdocument.in/reader034/viewer/2022042416/5f31dc5682721c257d591a79/html5/thumbnails/65.jpg)
Chunk Data for Asynchronous Pulls
![Page 66: Building an Elastic Main-Memory Database E-Store · o Constraints: each tuple/block must be assigned to exactly one partition, and each partition must have total load less than the](https://reader034.vdocument.in/reader034/viewer/2022042416/5f31dc5682721c257d591a79/html5/thumbnails/66.jpg)
Why Chunk?
Unknown amount of data when not partitioned by clustered index.
◦ Example: customers by W_ID in TPC-C
Time spent extracting is time not spent on txns.
Want a mechanism to support partial extraction while maintaining consistency.
![Page 67: Building an Elastic Main-Memory Database E-Store · o Constraints: each tuple/block must be assigned to exactly one partition, and each partition must have total load less than the](https://reader034.vdocument.in/reader034/viewer/2022042416/5f31dc5682721c257d591a79/html5/thumbnails/67.jpg)
Async Pulls
Periodically pull chunks of cold data
These pulls are answered lazily
Execution is interwoven with extracting and sending data (dirty the range though!)
![Page 68: Building an Elastic Main-Memory Database E-Store · o Constraints: each tuple/block must be assigned to exactly one partition, and each partition must have total load less than the](https://reader034.vdocument.in/reader034/viewer/2022042416/5f31dc5682721c257d591a79/html5/thumbnails/68.jpg)
Mitigating Async Pulls
[Figure: Partition 1 and Partition 2, each with a Txn Queue; labels "Idle Clock" and "Pull Async Data"]
![Page 69: Building an Elastic Main-Memory Database E-Store · o Constraints: each tuple/block must be assigned to exactly one partition, and each partition must have total load less than the](https://reader034.vdocument.in/reader034/viewer/2022042416/5f31dc5682721c257d591a79/html5/thumbnails/69.jpg)
New Transactions Take Precedence
[Figure: Partition 1 and Partition 2, each with a Txn Queue]
![Page 70: Building an Elastic Main-Memory Database E-Store · o Constraints: each tuple/block must be assigned to exactly one partition, and each partition must have total load less than the](https://reader034.vdocument.in/reader034/viewer/2022042416/5f31dc5682721c257d591a79/html5/thumbnails/70.jpg)
Extract up to Chunk Limit
[Figure: Partition 1 and Partition 2, each with a Txn Queue]
Important to note data is partially migrated!
![Page 71: Building an Elastic Main-Memory Database E-Store · o Constraints: each tuple/block must be assigned to exactly one partition, and each partition must have total load less than the](https://reader034.vdocument.in/reader034/viewer/2022042416/5f31dc5682721c257d591a79/html5/thumbnails/71.jpg)
Repeat Until Complete
[Figure: Partition 1 and Partition 2, each with a Txn Queue]
Repeat chunking until complete. New transactions still take precedence.
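The chunking loop on the last few slides might look roughly like this. It is a simplified, single-process sketch: `extract_chunk` and `migrate_range` are hypothetical names, rows are modeled as `(key, size_bytes)` pairs, and transaction arrivals are supplied as a list per round.

```python
def extract_chunk(rows, start, chunk_limit_bytes):
    """Take rows from `start` until the chunk limit is reached.

    Always takes at least one row so that the migration makes progress.
    Returns (chunk, next_start).
    """
    chunk, used, i = [], 0, start
    while i < len(rows):
        size = rows[i][1]
        if chunk and used + size > chunk_limit_bytes:
            break
        chunk.append(rows[i])
        used += size
        i += 1
    return chunk, i

def migrate_range(rows, chunk_limit_bytes, arrivals):
    """Interleave queued transactions with chunk extraction.

    arrivals[i] lists the transactions that arrive before round i;
    they always run before the next chunk is extracted.
    """
    log, pos, round_no = [], 0, 0
    while pos < len(rows):
        pending = arrivals[round_no] if round_no < len(arrivals) else []
        for txn in pending:
            log.append(("txn", txn))   # new transactions take precedence
        chunk, pos = extract_chunk(rows, pos, chunk_limit_bytes)
        log.append(("chunk", [key for key, _ in chunk]))  # partially migrated
        round_no += 1
    return log

rows = [(1, 4), (2, 4), (3, 4), (4, 4), (5, 4)]
log = migrate_range(rows, chunk_limit_bytes=8, arrivals=[["a"], [], ["b", "c"]])
```

Between any two chunks the range is only partially migrated, which is why the tracking structures must record exactly what has already moved.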
![Page 72: Building an Elastic Main-Memory Database E-Store · o Constraints: each tuple/block must be assigned to exactly one partition, and each partition must have total load less than the](https://reader034.vdocument.in/reader034/viewer/2022042416/5f31dc5682721c257d591a79/html5/thumbnails/72.jpg)
Sizing Chunks
Static analysis to set chunk sizes; future work to dynamically set sizing and scheduling.
Impact of chunk size on a 10% reconfiguration during a YCSB workload.
![Page 73: Building an Elastic Main-Memory Database E-Store · o Constraints: each tuple/block must be assigned to exactly one partition, and each partition must have total load less than the](https://reader034.vdocument.in/reader034/viewer/2022042416/5f31dc5682721c257d591a79/html5/thumbnails/73.jpg)
Space Async Pulls
Introduce delay at destination between new async pull requests.
Impact of the delay between async pulls on a 10% reconfiguration during a YCSB workload with an 8 MB chunk size.
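To make the effect of spacing concrete, the sketch below computes when each async pull would be issued and what share of the source's time goes to extraction. The function names and the numbers are hypothetical, and it assumes the destination waits `delay_s` after a chunk finishes before sending the next request.

```python
def schedule_async_pulls(num_chunks, extract_time_s, delay_s):
    """Return the issue time of each async pull request, starting at t = 0."""
    issue_times = []
    t = 0.0
    for _ in range(num_chunks):
        issue_times.append(t)
        t += extract_time_s + delay_s   # extraction, then the enforced gap
    return issue_times

def extraction_share(extract_time_s, delay_s):
    """Fraction of each cycle the source spends extracting instead of on txns."""
    return extract_time_s / (extract_time_s + delay_s)

times = schedule_async_pulls(num_chunks=3, extract_time_s=0.5, delay_s=1.5)
share = extraction_share(extract_time_s=0.5, delay_s=1.5)
```

A longer delay lowers the extraction share and so leaves more time for transactions, at the cost of a longer overall migration.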
![Page 74: Building an Elastic Main-Memory Database E-Store · o Constraints: each tuple/block must be assigned to exactly one partition, and each partition must have total load less than the](https://reader034.vdocument.in/reader034/viewer/2022042416/5f31dc5682721c257d591a79/html5/thumbnails/74.jpg)
Splitting Reconfigurations
Split by pairs of source and destination
Example: if partition 1 is migrating W_IDs 2 and 3 to partitions 3 and 7, execute this as two reconfigurations.
If migrating large objects, split them and use distributed transactions.
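Splitting by source and destination pair amounts to grouping the planned moves by `(source, destination)`. In this sketch `split_by_pair` is a hypothetical helper, and the example reads the slide as W_ID 2 going to partition 3 and W_ID 3 going to partition 7, which is an assumption.

```python
from collections import defaultdict

def split_by_pair(moves):
    """Group (key, source, destination) moves into one sub-plan per pair."""
    groups = defaultdict(list)
    for key, source, destination in moves:
        groups[(source, destination)].append(key)
    # Dicts preserve insertion order, so sub-plans keep first-seen order.
    return [
        {"source": s, "destination": d, "keys": keys}
        for (s, d), keys in groups.items()
    ]

# Assumed reading of the slide's example.
sub_plans = split_by_pair([(2, 1, 3), (3, 1, 7)])
```

Each resulting sub-plan involves exactly one source and one destination, which limits how much any single reconfiguration demands of one partition.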
![Page 75: Building an Elastic Main-Memory Database E-Store · o Constraints: each tuple/block must be assigned to exactly one partition, and each partition must have total load less than the](https://reader034.vdocument.in/reader034/viewer/2022042416/5f31dc5682721c257d591a79/html5/thumbnails/75.jpg)
Splitting into Sub-Plans
Set a cap on the number of sub-plan splits; split on source-destination pairs and on the ability to decompose migrating objects.
![Page 76: Building an Elastic Main-Memory Database E-Store · o Constraints: each tuple/block must be assigned to exactly one partition, and each partition must have total load less than the](https://reader034.vdocument.in/reader034/viewer/2022042416/5f31dc5682721c257d591a79/html5/thumbnails/76.jpg)
All About Trade-offs
Trading off time to complete the migration against performance degradation.
Future work to consider automating this trade-off based on service level objectives.
![Page 77: Building an Elastic Main-Memory Database E-Store · o Constraints: each tuple/block must be assigned to exactly one partition, and each partition must have total load less than the](https://reader034.vdocument.in/reader034/viewer/2022042416/5f31dc5682721c257d591a79/html5/thumbnails/77.jpg)
Results Highlight
TPC-C load balancing hotspot warehouses
![Page 78: Building an Elastic Main-Memory Database E-Store · o Constraints: each tuple/block must be assigned to exactly one partition, and each partition must have total load less than the](https://reader034.vdocument.in/reader034/viewer/2022042416/5f31dc5682721c257d591a79/html5/thumbnails/78.jpg)
YCSB Latency
YCSB cluster consolidation 4 to 3 nodes
YCSB data shuffle 10% pairwise
![Page 79: Building an Elastic Main-Memory Database E-Store · o Constraints: each tuple/block must be assigned to exactly one partition, and each partition must have total load less than the](https://reader034.vdocument.in/reader034/viewer/2022042416/5f31dc5682721c257d591a79/html5/thumbnails/79.jpg)
I Fell Asleep… What Happened
Skew happens; two-tiered partitioning for greedy load-balancing and elastic growth helps.
If you must either work or migrate, be careful to break up the migrations and don’t be too needy on any one partition.
We are thinking hard about skewed workloads that aren’t trivial to partition.
Questions?