automated, elastic resource provisioning for nosql clusters using tiramola ioannis konstantinou...
TRANSCRIPT
Automated, Elastic Resource Provisioning for NoSQL clusters
Using TIRAMOLA
Ioannis Konstantinou
CSLAB, National Technical University of Athens, Greece
14/05/2013
2 Automated, Elastic Resource Provisioning for NoSQL Clusters Using TIRAMOLA - I. Konstantinou
Motivation – the story(1) ‘Big-data’ opts for highly scalable distributed solutions
(Web) analytics, science, business Store + analyze everything, no matter the size Traditional databases not up to the task
Applications suffer from highly variable and unpredictable workloads Social networks, internet gaming, web sites, etc Over-provisioning is costly, under-provisioning leads to
outages Cloud computing can help!!!!
Elastic, pay-as-you go resource provisioning Suitable for distributed scalable applications Can we take advantage of elasticity in an automated, fine-tuned application agnostic manner?
14/05/2013
3 Automated, Elastic Resource Provisioning for NoSQL Clusters Using TIRAMOLA - I. Konstantinou
Motivation – the story (2) NoSQL
Non-relational Horizontal scalable Distributed Open source
And often: schema-free, easily replicated, simple API, eventually
consistent /(not ACID), big-data-friendly, etc
Many, many, implementations… Currently around 150 see http://nosql-database.org/
14/05/2013
4 Automated, Elastic Resource Provisioning for NoSQL Clusters Using TIRAMOLA - I. Konstantinou
NoSQLs and elasticity Column family
Hbase, Cassandra, … Document store
CouchDB, mongoDB, … Key-Value store
Riak, Dynamo, Voldemort, … Many offer elasticity+sharding:
Expand/contract resources according to demand Shared-nothing architecture allows that Pay-as-you-go, robustness, performance Manual and simple threshold-based elastic actions are
suboptimal Important! See Apr 2011 Amazon outage (foursquare, reddit,
…)
14/05/2013
thus…(end of the story) PaaS and NoSQLs are (or should be) inherently
elastic How efficiently do they implement elasticity?
NoSQLs over an IaaS platform EC2, Eucalyptus, OpenStack,…
We need a study that registers qualitative + quantitative results
Can we automate the elasticity procedure? For any NoSQL, any user-given policy, any metric of
interest? Can we bundle everything together? The Tiramola system
Automated, Elastic Resource Provisioning for NoSQL Clusters Using TIRAMOLA - I. Konstantinou
5
14/05/2013
6
Contributions Tiramola, a generic modular system able to
manage any NoSQL engine User-defined policies Automatic resource provisioning using IaaS clouds
Module for NoSQL cluster monitoring Real time reporting of client, application and general
purpose metrics Decision making module for automatic elasticity
Implementation as a Markov Decision Process Continuously identifies the best scaling action
according to observed system state
Automated, Elastic Resource Provisioning for NoSQL Clusters Using TIRAMOLA - I. Konstantinou
14/05/2013
7 Automated, Elastic Resource Provisioning for NoSQL Clusters Using TIRAMOLA - I. Konstantinou
Contribution side-effects Coding + infrastructure
Open source python code (GFOSS + google code) http://tiramola.googlecode.com
Using cloud-based client tools, platform-agnostic Euca2ools guarantee execution in numerous cloud
platforms YCSB clients Cassandra, Hbase implementation
almost Voldemort, Riak
14/05/2013
8 Automated, Elastic Resource Provisioning for NoSQL Clusters Using TIRAMOLA - I. Konstantinou
Related work Elastic Nimbus [13], AutoScale [14], SmartSla [15], iBalloon
[17], Kingfisher [18] Do not handle NoSQL systems Do not address dynamic cluster resizes
[19] elastic HDFS Specific HDFS implementation Not fine-grained policy support, No support for auto-learning
Cloudy [20], Nefeli [21], Microeconomics [23] [20] Provides only support for scaling [21] requires software installation at the cloud vendor [22] classic microeconomics for profit maximization, no support
for fine-grained policy definitions Systems like Autoscaling [24], RightScale [26], Scalr [27]
Commercial: vendor-lock in Limited metric support, primitive decision making policies
14/05/2013
9 Automated, Elastic Resource Provisioning for NoSQL Clusters Using TIRAMOLA - I. Konstantinou
Tiramola architecture overview
Collect Metrics
Decision Making
Monitoring Adjustresources
Virtual NoSQL Cluster
ManageNoSQL nodes
User policies
TIRAMOLA
Cluster Coordinator
Cloud Management
Orchestrate Cluster
Resize ActionGet freshmetrics
Use
rs
Load
Add/delete VMs
Cloud Provider
14/05/2013
10 Automated, Elastic Resource Provisioning for NoSQL Clusters Using TIRAMOLA - I. Konstantinou
Tiramola monitoring module
Collect Metrics
Decision Making
Monitoring Adjustresources
Virtual NoSQL Cluster
ManageNoSQL nodes
User policies
TIRAMOLA
Cluster Coordinator
Cloud Management
Orchestrate Cluster
Resize ActionGet freshmetrics
Use
rs
Load
Add/delete VMs
Cloud Provider
Ganglia tool Suitable for
clusters Easy to
install/maintain Low overhead-
UDP Metrics
General purpose like CPU, RAM
App-specific and user metrics: gmetric spoofing
14/05/2013
11 Automated, Elastic Resource Provisioning for NoSQL Clusters Using TIRAMOLA - I. Konstantinou
Tiramola cloud management module
Collect Metrics
Decision Making
Monitoring Adjustresources
Virtual NoSQL Cluster
ManageNoSQL nodes
User policies
TIRAMOLA
Cluster Coordinator
Cloud Management
Orchestrate Cluster
Resize ActionGet freshmetrics
Use
rs
Load
Add/delete VMs
Cloud Provider
Translates resize actions Contacts IaaS provider Acquires or releases
cloud resources Euca2tools
EC2 compliant Support for any cloud
management Precooked VM images
Ready to launch AMIs Contain all necessary
NoSQL libraries
14/05/2013
12 Automated, Elastic Resource Provisioning for NoSQL Clusters Using TIRAMOLA - I. Konstantinou
Tiramola cluster coordinator module
Collect Metrics
Decision Making
Monitoring Adjustresources
Virtual NoSQL Cluster
ManageNoSQL nodes
User policies
TIRAMOLA
Cluster Coordinator
Cloud Management
Orchestrate Cluster
Resize ActionGet freshmetrics
Use
rs
Load
Add/delete VMs
Cloud Provider
Operates when cloud management finishes resource (de-)allocation
Orchestrates NoSQL When resources change
Implements specific NoSQL resizing actions Remote execution of
shell scripts Injection of new config
files Current support for:
HBase, Cassandra, Riak
14/05/2013
13 Automated, Elastic Resource Provisioning for NoSQL Clusters Using TIRAMOLA - I. Konstantinou
Tiramola core: Decision-Making module Formulation as an MDP:
{S, A, {Piαj}, γ, Riαj} Identify Exp. Sum of rew.
State = # running VMs Actions = {add-n, rem-n, no-op} P: Transition probabilities going from state i to j Reward Function R – sets the policy for the resize
Immediate gain for going to state s R(s) = f(gains, costs)
Thus optim. Value f: (Bellman’s equation) System of equations, optimal policy greedy w.r.t. V
ssrEsV tk
tkk
1)(
)(max)()( *
'
* sVsRsVss a
14/05/2013
14 Automated, Elastic Resource Provisioning for NoSQL Clusters Using TIRAMOLA - I. Konstantinou
Large Degree of freedom MDP allows for optimal solution without
previous knowledge The system learns from previous experience, by
“exploring” permissible states Learning is real-time and continues during
execution Decision making is auto-tuned, reacting to
environment and reward changes. Generic applicable
Arbitrary reward functions allow for virtually any optimization policy
14/05/2013
15 Automated, Elastic Resource Provisioning for NoSQL Clusters Using TIRAMOLA - I. Konstantinou
Estimate R(s) for possible transitions How to know exact R(s) for all s, without making the
transition? Assume “deterministic”, reliable cluster behavior Similar input ⇨ similar performance metrics
Idea: cluster measurements around current load
r(s)=f(latency, VMs) Add 2 extra dimensions: #VMs, load
Find, for all permissible transitions, what latency would be
14/05/2013
16 Automated, Elastic Resource Provisioning for NoSQL Clusters Using TIRAMOLA - I. Konstantinou
Architectural considerations Robustness
Daemon process that checkpoints and can be restarted
State is provided from the IaaS Cloud and the Monitoring module.
Applicable timeouts (not realtime systems!) Modularity
Different interchangeable components APIs that utilize primitives (NoSQL and Policies)
Expandability Speed (irrelevant in most cases) Written in Python
14/05/2013
17 Automated, Elastic Resource Provisioning for NoSQL Clusters Using TIRAMOLA - I. Konstantinou
Platform Setup Cloud Cluster
Private OpenStack cactus cluster A total of 16 server VMs
Similar to an Amazon EC2 large instance 4-core processor, 8GB RAM, 50GB disk space
A total of 20 client VMs 2-core processor, 4GB RAM, 50GB disk space
Storage QCOW image: 1.6GB compressed, 4.3GB
uncompressed VM root fs instead of EBS (Reddit outage)
14/05/2013
18 Automated, Elastic Resource Provisioning for NoSQL Clusters Using TIRAMOLA - I. Konstantinou
Clients, Data and Workloads Hbase (v. 0.20.6), Cassandra (v. 0.7.0 beta)
Hadoop 0.2.20 8 initial nodes (VMs)
Ganglia 3.1.2 YCSB tool
Database: 20M objects – 20GB raw (Cass ~60GB, Hbase ~90GB)
Loads: UNI_R, UNI_U, UNI_RMW, ZIPF_R Default: uniform read, 10%-50% range
Both client (YCSB) and cluster (Ganglia) metrics reported
14/05/2013
19 Automated, Elastic Resource Provisioning for NoSQL Clusters Using TIRAMOLA - I. Konstantinou
1. Metrics affected
Max throughput HBase μmax = 80Kreqs/sec @ λ=80Kreqs/sec
Cassandra μmax = 13Kreqs/sec @ λ=20Kreqs/sec
Max total cluster CPU usage HBase CPUmax=55%
Cassandra CPUmax=76%
Further λ increase has no effect Systems are fully utilized and requests are queued
Stress systems by increasing client request λ
UNI_R workload Identify critical
operation points 8-node cluster, no
resize
14/05/2013
20 Automated, Elastic Resource Provisioning for NoSQL Clusters Using TIRAMOLA - I. Konstantinou
The setup λstart=180K reqs/sec (way over critical point) At time t=370sec:
Double the cluster size (add extra 8 nodes) Triple the cluster size (add extra 16 nodes) 4 different experiments
READ+8, READ+16, UPDATE+8 and UPDATE+16
Measure client, cluster-side metrics query latency, throughput and total cluster usage Vs time
14/05/2013
21 Automated, Elastic Resource Provisioning for NoSQL Clusters Using TIRAMOLA - I. Konstantinou
2. HBase cluster resize For READ, throughput is
(roughly) doubled and tripled More servers handle more
requests Data is not transferred,
but it is cached
Update is not affected I/O bound operation Updates will be handled by the initial 8 nodes Only new regions due to compaction will be
served by new nodes
14/05/2013
22 Automated, Elastic Resource Provisioning for NoSQL Clusters Using TIRAMOLA - I. Konstantinou
Evaluating Tiramola – setup Initial state: S4, 1 to 16 VMs cluster size allowed Sinusoid-like READ loads – vary peak, periodicity
Alter YCSB clients Reward functions (proof of concept)
r1(s) = -C∙|VMs| r2(s) = B ∙ thr r3(s) = B ∙ thr - C ∙ |VM|2
r4(s) = B ∙ thr - C ∙ |VMs| - A ∙ lat. Monitor every min, decide every 10 min, 5 min
backoff Training Period for initial data points
Not necessary, but allows quicker “correct” decisions
14/05/2013
23 Automated, Elastic Resource Provisioning for NoSQL Clusters Using TIRAMOLA - I. Konstantinou
Evaluating Tiramola – 1 r1, r2
r3, r4
14/05/2013
24 Automated, Elastic Resource Provisioning for NoSQL Clusters Using TIRAMOLA - I. Konstantinou
Evaluating Tiramola – 2 Different amplitude
Different periodicity
14/05/2013
25 Automated, Elastic Resource Provisioning for NoSQL Clusters Using TIRAMOLA - I. Konstantinou
Evaluating Tiramola – 3 Initial training set close to applied load
Initial training set very different from applied load
Min training
Min training
Max training
Max training
14/05/2013
26 Automated, Elastic Resource Provisioning for NoSQL Clusters Using TIRAMOLA - I. Konstantinou
Questions? “TIRAMOLA: Elastic NoSQL Provisioning
through A Cloud Management Platform” – SIGMOD 2012(Demo Track)
“Automatic Scaling of Selective SPARQL Joins Using the TIRAMOLA System” – SWIM 2012
“On the Elasticity of NoSQL Databases over Cloud Management Platforms” – CIKM 2011
“Elastic NoSQL databases over the Cloud” – ΕΛ/ΛΑΚ 2011
CELAR: “Automatic, multi-grained elasticity-provisioning for the Cloud” – FP7
http://www.celarcloud.eu/ http://tiramola.googlecode.com