Moshe Shadmon, ScaleDB: Scaling MySQL in the Cloud
Shared Disk Advantages
• Start small, grow incrementally
• Scalable AND highly available
• Add capacity on demand with zero downtime
• Simplicity
– No need to partition data
– No need for master-slave
The Virtualized Cloud Database
[Diagram: a Shared Nothing server (MySQL Server, OSS DBMS, Storage Engine, Local Disk) compared with a Shared Disk cluster (several VMs, each running an OSS DBMS with ScaleDB, over Shared Storage).]
ScaleDB As the Storage Engine
ScaleDB’s Internal Architecture
[Diagram: the ScaleDB Storage Engine beneath the MySQL Server.]
• MySQL Database Management Level: MySQL Server
• Storage Engine Level: ScaleDB Storage Engine
– ScaleDB Node: ScaleDB API, Transaction Manager, Index Manager, Data Manager, Local Lock Manager, Log Manager, Recovery Manager, Storage Manager, Buffer Manager, Local Sync Coordinator, Threads Manager
– ScaleDB Cluster Manager: Global Recovery Manager, Global Sync Manager, Global Lock Manager
– ScaleDB Storage System: Cache & Storage Devices
Deploying ScaleDB
[Diagram: an Application Layer on top; a Database Layer (physical or VM nodes) of Node 1, Node 2, … Node N, each running a DBMS with ScaleDB and coordinated by the ScaleDB Cluster Manager; a Storage Layer of Shared Storage underneath.]
The Storage Engine
• Pluggable Storage Engine
– Transactional storage engine
– Supports MySQL Storage Engine API
– Reads/Writes done via network to shared storage
– Maintains a local cache
– Local Lock Manager – manages locking at the node level
– Connector to Cluster Manager – synchronizes operations at the cluster level
The Cluster Manager
• Distributed Lock Manager – manages cluster level locks
– Locks can be held over any type of resource:
• DBMS, Table, Partition, File, Block, Row etc.
– Supports multiple lock modes:
• Read, Read/Write, exclusive etc.
– Synchronizes state using messaging
• Local Lock Manager – manages locks at the node level
– Maintains locks at the node level
– Synchronizes state using shared memory
• Identifies node failures and manages recovery
The Cluster Manager
• Distributed Lock Manager
– Synchronizes conflicting processes between nodes in the cluster
• Example: 2 nodes need to update the same resource at the same time.
– The challenge:
• Requests are made over the network, which can be expensive:
– Internal operations may take nanoseconds; network operations take milliseconds
– The solution:
• Requests are sent only when conflicts occur
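One way to picture "requests are sent only when conflicts occur" is a node that remembers the grants it already holds and goes to the Cluster Manager only when it lacks one. The sketch below is an illustration under that assumption, not ScaleDB's actual protocol; the class names `ClusterManager` and `LocalLockManager`, the `messages` counter, and the resource name `row:42` are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ClusterManager:
    """Hypothetical global lock manager; each acquire() stands in for a network round trip."""
    owners: dict = field(default_factory=dict)   # resource -> node holding the grant
    messages: int = 0                            # simulated network requests

    def acquire(self, node, resource):
        self.messages += 1
        previous = self.owners.get(resource)
        if previous is not None and previous is not node:
            previous.revoke(resource)            # conflict: take the grant back
        self.owners[resource] = node


class LocalLockManager:
    """Hypothetical per-node lock manager that caches the grants it holds."""

    def __init__(self, name, cluster):
        self.name = name
        self.cluster = cluster
        self.grants = set()

    def lock(self, resource):
        if resource in self.grants:
            return "local"                       # already granted: no network traffic
        self.cluster.acquire(self, resource)
        self.grants.add(resource)
        return "network"

    def revoke(self, resource):
        self.grants.discard(resource)


cluster = ClusterManager()
node1 = LocalLockManager("node1", cluster)
node2 = LocalLockManager("node2", cluster)

results = [node1.lock("row:42") for _ in range(3)]   # only the first call uses the network
results.append(node2.lock("row:42"))                 # conflict: the grant moves to node2
results.append(node1.lock("row:42"))                 # node1 has to ask again
```

Five lock calls cost three simulated network requests here; the repeated calls on the same node are answered from the local grant set.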
The Storage
• Independent storage nodes
– Accessible via network
– Each node has a Cache Layer and a Persistent Layer
– Database nodes can force the write to disk based on transactional requirements
– Data can be distributed over multiple storage nodes
– Each Storage Node can be mirrored
– Each Storage Node may have a Hot Backup Node
The Storage Node
[Diagram: a Storage Node made up of an Interface to Storage, an LRU-based Cache, and Disks.]
– Manages the data in cache and flushes to disk when required.
– Supports the storage engine calls for Read, Write, etc.
– Supports pushed calls from the storage engine such as Count Rows, Search, etc.
– Each node is a Linux machine. No need for Network File System (NFS).
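The cache behaviour described above (LRU cache, flush to disk when required, forced writes from database nodes) can be sketched roughly as follows. This is a minimal illustration, not the ScaleDB implementation: the `StorageNode` class, its `force` parameter, and the use of a plain dict as the "disk" are assumptions made for the example.

```python
from collections import OrderedDict


class StorageNode:
    """Hypothetical storage node: an LRU write-back cache in front of a 'disk' dict."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.cache = OrderedDict()   # block_id -> (data, dirty), least recently used first
        self.disk = {}               # stands in for the persistent layer

    def _touch(self, block_id, data, dirty):
        self.cache[block_id] = (data, dirty)
        self.cache.move_to_end(block_id)             # mark as most recently used
        while len(self.cache) > self.capacity:
            old_id, (old_data, old_dirty) = self.cache.popitem(last=False)
            if old_dirty:
                self.disk[old_id] = old_data         # flush a dirty block on eviction

    def write(self, block_id, data, force=False):
        # force=True models a database node demanding the write reach disk now.
        self._touch(block_id, data, dirty=not force)
        if force:
            self.disk[block_id] = data

    def read(self, block_id):
        if block_id in self.cache:
            data, dirty = self.cache[block_id]
            self._touch(block_id, data, dirty)
            return data
        data = self.disk[block_id]                   # cache miss: go to disk
        self._touch(block_id, data, dirty=False)
        return data


node = StorageNode(capacity=2)
node.write("a", b"1")
node.write("b", b"2")
node.read("a")                       # "a" becomes the most recently used block
node.write("c", b"3")                # evicts "b" and flushes it to disk
node.write("d", b"4", force=True)    # evicts "a"; "d" is written through immediately
```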
Scaling the Storage Tier
[Diagram: a Database Layer (physical or VM nodes) of Node 1, Node 2, … Node N, each running a DBMS with ScaleDB and coordinated by the ScaleDB Cluster Manager; a Storage Layer of Shared Storage nodes, each with a Cache and reached over TCP/UDP. Local Cache and Global Cache are also labelled.]
Global Cache
• Guarantees cache coherency
• Manages caching of shared data
• Minimizes access time to data which is not in local cache and would otherwise be read from disk
• Implements fast direct memory access over high-speed interconnects for all data blocks and types
• Uses an efficient and scalable messaging protocol
HA of the Storage Tier
[Diagram: a Database Layer (physical or VM nodes) of Node 1, Node 2, … Node N, each running a DBMS with ScaleDB and coordinated by the ScaleDB Cluster Manager; a Storage Layer of Shared Storage with Mirrored Storage and Hot Backup.]
Scaling the Storage Tier
[Diagram: a Database Layer (physical or VM nodes) of Node 1, Node 2, … Node N, each running a DBMS with ScaleDB and coordinated by the ScaleDB Cluster Manager; storage split into Partition 1, Partition 2, … Partition Q, each with Partitioned Storage, a Partitioned Mirror, and a Partitioned Hot Backup.]
Scaling the Storage Tier
[Diagram: Node N in the Database Layer (physical or VM nodes), running MySQL with ScaleDB and a Local Cache, coordinated by the ScaleDB Cluster Manager and connected to Main and Mirror storage nodes, each with its own Cache and Storage.]
• Read
– From Local Cache
– From Main Or Mirror
• Get From Cache
• Get From Storage
• Write
– To local cache
– At end of transaction
• multicast to main and mirror
• optional acknowledgement:
– after receive
– after write
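The read and write paths listed above can be walked through in a small simulation. This is a sketch under stated assumptions rather than ScaleDB code: `StorageReplica`, `DatabaseNode`, the `ack` setting, and the in-memory dicts are hypothetical, and the "multicast" is modelled as a simple loop over the main and mirror nodes.

```python
class StorageReplica:
    """Hypothetical storage node (main or mirror) with a cache in front of storage."""

    def __init__(self):
        self.cache = {}
        self.storage = {}

    def receive(self, block_id, data):
        self.cache[block_id] = data                     # "after receive" ack point
        return 1

    def persist(self, block_id):
        self.storage[block_id] = self.cache[block_id]   # "after write" ack point
        return 1

    def get(self, block_id):
        if block_id in self.cache:                      # Get From Cache
            return self.cache[block_id]
        return self.storage[block_id]                   # Get From Storage


class DatabaseNode:
    """Hypothetical database node with a local cache and two storage replicas."""

    def __init__(self, main, mirror, ack="after_write"):
        self.local_cache = {}
        self.pending = {}                  # blocks written by the open transaction
        self.replicas = [main, mirror]
        self.ack = ack                     # "after_receive" or "after_write"

    def read(self, block_id):
        if block_id in self.local_cache:   # From Local Cache
            return self.local_cache[block_id]
        for replica in self.replicas:      # From Main Or Mirror
            try:
                data = replica.get(block_id)
            except KeyError:
                continue
            self.local_cache[block_id] = data
            return data
        raise KeyError(block_id)

    def write(self, block_id, data):
        self.local_cache[block_id] = data  # writes stay local until commit
        self.pending[block_id] = data

    def commit(self):
        acks = 0
        for block_id, data in self.pending.items():
            for replica in self.replicas:  # stands in for the multicast
                ack = replica.receive(block_id, data)
                if self.ack == "after_write":
                    ack = replica.persist(block_id)
                acks += ack
        self.pending.clear()
        return acks
```

With `ack="after_receive"` the commit returns as soon as both replicas hold the block in cache; in this sketch nothing flushes it to storage afterwards, which a real storage node would have to do.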
Traditional Query Processing
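[Diagram: the question "What were yesterday's sales?" reaches the DBMS Server, which asks the Storage Array to "Get the Sales table". The array retrieves the entire Sales table and the DBMS Server processes the table data.]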
ScaleDB Query Processing
[Diagram: the question "What were yesterday's sales?" reaches the DBMS Server, which sends "Get October 15 Sales" to each of the Storage Nodes.]
Scaling the Storage Tier
• Advantages
– Parallel processing:
• I/O calls are executed simultaneously on multiple Storage Nodes.
• Logic pushed to storage layer:
“SELECT customer_name FROM calls WHERE amount > 200”
• Traditional approach – return all rows to the database
• ScaleDB storage – return selected rows to the database
– Leverage cache on multiple storage nodes
– Storage layer can be expanded without downtime
– Data is Mirrored
– Support for Hot-Backup
– Low cost
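To make the contrast concrete, the query above can be mimicked with in-memory lists standing in for storage nodes. This is only a sketch: the sample rows, the function names, and the use of a thread pool to model parallel I/O are assumptions for illustration, not ScaleDB's interface.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical rows of the `calls` table, spread over three storage nodes.
STORAGE_NODES = [
    [{"customer_name": "Ann", "amount": 250}, {"customer_name": "Bob", "amount": 90}],
    [{"customer_name": "Cy", "amount": 310}],
    [{"customer_name": "Dee", "amount": 120}, {"customer_name": "Eve", "amount": 205}],
]


def scan_node(rows, predicate, column):
    """Runs on a storage node: filter locally and return only the selected column."""
    return [row[column] for row in rows if predicate(row)]


def pushed_down_query(nodes, predicate, column):
    """Send the same filter to every node in parallel and merge the results."""
    with ThreadPoolExecutor(max_workers=len(nodes)) as pool:
        parts = pool.map(lambda rows: scan_node(rows, predicate, column), nodes)
        return [value for part in parts for value in part]


def traditional_query(nodes, predicate, column):
    """Baseline: ship every row to the database, then filter there."""
    all_rows = [row for rows in nodes for row in rows]
    selected = [row[column] for row in all_rows if predicate(row)]
    return selected, len(all_rows)


def over_200(row):
    return row["amount"] > 200


names = pushed_down_query(STORAGE_NODES, over_200, "customer_name")
same_names, rows_shipped = traditional_query(STORAGE_NODES, over_200, "customer_name")
```

Both functions produce the same three names, but the traditional path moves all five rows to the database while the pushed-down path moves only the three matching values.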
High Availability
• Failure of a node
– Detected by the Cluster Manager
• A surviving node is requested to undo uncommitted transactions
• Failure of the Cluster Manager
– Detected by the Standby Cluster Manager
• Requests all nodes to undo uncommitted transactions
• Failure of a Storage Node
– Continue with a mirrored storage – or –
– Use the Storage Node Log to recover
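"Undo uncommitted transactions" can be illustrated with a generic before-image log. The slides do not describe ScaleDB's log format, so the entry layout below (`op`, `txn`, `key`, `before`) and the function name `undo_uncommitted` are assumptions made purely for this sketch.

```python
def undo_uncommitted(data, log):
    """Roll back, newest first, every write whose transaction never committed."""
    committed = {entry["txn"] for entry in log if entry["op"] == "commit"}
    for entry in reversed(log):
        if entry["op"] == "write" and entry["txn"] not in committed:
            if entry["before"] is None:
                data.pop(entry["key"], None)         # the key did not exist before
            else:
                data[entry["key"]] = entry["before"]  # restore the old value
    return data


# Hypothetical log left behind by a failed node: txn 1 committed, txn 2 did not.
log = [
    {"op": "write", "txn": 1, "key": "x", "before": None},
    {"op": "commit", "txn": 1},
    {"op": "write", "txn": 2, "key": "x", "before": 1},
    {"op": "write", "txn": 2, "key": "y", "before": None},
]
state = undo_uncommitted({"x": 5, "y": 7}, log)
```

Walking the log backwards matters when one transaction wrote the same key more than once: the oldest before-image is applied last and wins.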
Performance / Tuning
• Contention occurs when 2 or more nodes want the same resource at the same time
• Types of Contention:
– Read/Read contention – is never a problem because of the shared disk system
– Read/Write contention – reader is requested to release the block and grant is provided to writer
– Write/Read or Write/Write –
• Writer sends block to the global cache layer,
• Buffer invalidate message is sent to the other nodes
• Requestor receives the grant
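The three Write/Write steps above can be traced in a toy model. It is a sketch under assumptions, not ScaleDB's protocol: the `Cluster` and `Node` classes, the dict used as the global cache, and the integer block contents are all hypothetical.

```python
class Cluster:
    """Hypothetical coordinator that resolves write contention on a block."""

    def __init__(self):
        self.global_cache = {}   # block_id -> latest data handed over by a writer
        self.writer = {}         # block_id -> node holding the write grant
        self.nodes = []

    def request_write(self, requestor, block_id):
        holder = self.writer.get(block_id)
        if holder is not None and holder is not requestor:
            # 1. The current writer sends its block to the global cache layer.
            self.global_cache[block_id] = holder.buffers[block_id]
        # 2. A buffer-invalidate message goes to the other nodes.
        for node in self.nodes:
            if node is not requestor:
                node.buffers.pop(block_id, None)
        # 3. The requestor receives the grant and picks up the latest block.
        self.writer[block_id] = requestor
        requestor.buffers[block_id] = self.global_cache.get(block_id)


class Node:
    def __init__(self, name, cluster):
        self.name = name
        self.buffers = {}
        self.cluster = cluster
        cluster.nodes.append(self)

    def update(self, block_id, change):
        if self.cluster.writer.get(block_id) is not self:
            self.cluster.request_write(self, block_id)   # contention path
        self.buffers[block_id] = change(self.buffers[block_id])


def add(amount):
    return lambda value: (value or 0) + amount


cluster = Cluster()
a, b = Node("a", cluster), Node("b", cluster)
a.update("blk", add(1))    # a gets the grant; block value becomes 1
a.update("blk", add(1))    # a still holds the grant; value becomes 2, no messages
b.update("blk", add(10))   # a hands over 2, is invalidated; b ends with 12
```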
Performance / Tuning
• Fast Network between the nodes
– 2 logical networks:
• Between the database nodes and the Cluster Manager
• Between the database nodes and the storage
– Optimize Socket Receive Buffers (256 KB – 1 MB)
• Partition requests to maintain locality of data
– Send requests that update/query the same data to the same node
• By Database
• By Table
• By Table with PK
– Logic can change dynamically to adapt to changes
• Changes in data distribution
• Changes in user behaviors
• Additional DBMS nodes
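Routing requests for the same data to the same node could look roughly like the sketch below, which hashes the most specific key available (database, table, or table with PK). The `route` function, the node names, and the choice of SHA-256 with a modulo are assumptions for illustration; the slides do not say how ScaleDB deployments route requests.

```python
import hashlib


def route(nodes, database, table=None, pk=None):
    """Pick a node from the most specific key given: database, table, or table + PK."""
    key = "/".join(str(part) for part in (database, table, pk) if part is not None)
    # A stable hash is used because Python's built-in hash() of str varies per process.
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return nodes[int.from_bytes(digest[:8], "big") % len(nodes)]


nodes = ["db-node-1", "db-node-2", "db-node-3"]
by_database = route(nodes, "shop")
by_table = route(nodes, "shop", "orders")
by_row = route(nodes, "shop", "orders", 42)
```

A plain modulo moves many keys when a DBMS node is added; a consistent-hashing ring would be one way to soften that when the routing logic has to adapt to additional nodes.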
ScaleDB: Elastic/Enterprise Database
Function                                              SimpleDB       RDS  ScaleDB
Transactions                                          No             Yes  Yes
Joins                                                 No             Yes  Yes
Data Consistency                                      No (Eventual)  Yes  Yes
SQL Support                                           No             Yes  Yes
ACID Compliant                                        No             Yes  Yes
Supports MySQL applications without modification      No             Yes  Yes
Dynamic Elasticity (w/o interruption)                 Yes            No   Yes
High-Availability                                     Yes            No   Yes
Eliminates Partitioning                               Yes            No   Yes
Eliminates possible 5-minute data loss upon failure   Yes            No   Yes
Value Proposition
• Runs on low-cost cloud infrastructures (e.g. Amazon)
• High-availability, no single point of failure
• Dramatically easier set-up & maintenance
– No partitioning/repartitioning
– No slave and replication headaches
– Simplified tuning
• Scales up/down without interrupting your application
• Lower TCO