SQLFire at Strata 2012
DESCRIPTION
SQLFire is VMware's in-memory distributed NewSQL database. I delivered this presentation together with Jags, the product architect. We covered the design choices SQLFire makes to achieve extreme scalability, as well as the connection between big data and fast data. The deck looks a little different in presenter mode, so for best results download it and enjoy.
TRANSCRIPT
![Page 1: SQLFire at Strata 2012](https://reader035.vdocument.in/reader035/viewer/2022062616/54b6bcf84a7959fa048b45ca/html5/thumbnails/1.jpg)
SQLFire
Fast meets scalable in VMware's NewSQL database.
Strata 2012
Jags Ramnarayan – Chief Architect, SQLFire
Carter Shanklin – Product Manager, SQLFire
![Page 2: SQLFire at Strata 2012](https://reader035.vdocument.in/reader035/viewer/2022062616/54b6bcf84a7959fa048b45ca/html5/thumbnails/2.jpg)
Sponsor Sessions Suck
• We Promise To:
– Keep it relevant.
– Keep it technical.
– Keep it entertaining.
![Page 3: SQLFire at Strata 2012](https://reader035.vdocument.in/reader035/viewer/2022062616/54b6bcf84a7959fa048b45ca/html5/thumbnails/3.jpg)
Speed Matters
Users demand fast applications and fast websites.
The database is the hardest thing to scale.
![Page 4: SQLFire at Strata 2012](https://reader035.vdocument.in/reader035/viewer/2022062616/54b6bcf84a7959fa048b45ca/html5/thumbnails/4.jpg)
SQLFire: Speed, Scale, SQL
Speed
• In-memory for maximum speed and minimum latency.
Scale
• Horizontally scalable.
• Add or remove nodes at any time for more capacity or availability.
SQL
• Familiar SQL interface.
• SQL 92 compliant.
• JDBC and ADO.NET interfaces.
![Page 5: SQLFire at Strata 2012](https://reader035.vdocument.in/reader035/viewer/2022062616/54b6bcf84a7959fa048b45ca/html5/thumbnails/5.jpg)
How does SQLFire get scale and speed?
• Horizontal scaleout + Dynamic partitioning
– Appears to app as single database
• Tunable consistency
– Including asynchronous global distribution
• In-memory architecture
– “Memory is the new disk, disk the new tape”
![Page 6: SQLFire at Strata 2012](https://reader035.vdocument.in/reader035/viewer/2022062616/54b6bcf84a7959fa048b45ca/html5/thumbnails/6.jpg)
![Page 7: SQLFire at Strata 2012](https://reader035.vdocument.in/reader035/viewer/2022062616/54b6bcf84a7959fa048b45ca/html5/thumbnails/7.jpg)
Diverging needs for online and analytics
Online Layer
Analytics Layer
User Concurrency
Update Rate
Query Richness
Data Volume
![Page 8: SQLFire at Strata 2012](https://reader035.vdocument.in/reader035/viewer/2022062616/54b6bcf84a7959fa048b45ca/html5/thumbnails/8.jpg)
![Page 9: SQLFire at Strata 2012](https://reader035.vdocument.in/reader035/viewer/2022062616/54b6bcf84a7959fa048b45ca/html5/thumbnails/9.jpg)
![Page 10: SQLFire at Strata 2012](https://reader035.vdocument.in/reader035/viewer/2022062616/54b6bcf84a7959fa048b45ca/html5/thumbnails/10.jpg)
![Page 11: SQLFire at Strata 2012](https://reader035.vdocument.in/reader035/viewer/2022062616/54b6bcf84a7959fa048b45ca/html5/thumbnails/11.jpg)
![Page 12: SQLFire at Strata 2012](https://reader035.vdocument.in/reader035/viewer/2022062616/54b6bcf84a7959fa048b45ca/html5/thumbnails/12.jpg)
![Page 13: SQLFire at Strata 2012](https://reader035.vdocument.in/reader035/viewer/2022062616/54b6bcf84a7959fa048b45ca/html5/thumbnails/13.jpg)
SQLFire: What does it really look like?
![Page 14: SQLFire at Strata 2012](https://reader035.vdocument.in/reader035/viewer/2022062616/54b6bcf84a7959fa048b45ca/html5/thumbnails/14.jpg)
SQLFire Tables Are Replicated By Default.
CREATE TABLE sales
(product_id int, store_id int,
price float);
Diagram: the sales table (items numbered 1 to 10) with a replica on SQLFire Node 1 and a replica on SQLFire Node 2.
Best for small and frequently accessed data.
![Page 15: SQLFire at Strata 2012](https://reader035.vdocument.in/reader035/viewer/2022062616/54b6bcf84a7959fa048b45ca/html5/thumbnails/15.jpg)
Partitioned Tables Are Split Among Members.
CREATE TABLE sales
(product_id int, store_id int,
price float)
PARTITION BY
COLUMN (product_id);
Diagram: the sales table (items numbered 1 to 10) split into Partition 1 and Partition 2 across SQLFire Node 1 and SQLFire Node 2.
Best for large data sets.
![Page 16: SQLFire at Strata 2012](https://reader035.vdocument.in/reader035/viewer/2022062616/54b6bcf84a7959fa048b45ca/html5/thumbnails/16.jpg)
Types Of Partitioning In SQLFire.

| Type | Purpose | Example |
| --- | --- | --- |
| Hash Partitioning (Default) | Built-in hashing algorithm splits data at random across available servers. | PARTITION BY COLUMN (customer_id); |
| List | Manually divide data across servers based on discrete criteria. | PARTITION BY LIST (home_state) (VALUES ('CA', 'WA'), VALUES ('TX', 'OK')); |
| Range | Manually divide data across servers based on continuous criteria. | PARTITION BY RANGE (date) (VALUES BETWEEN '2008-01-01' AND '2008-12-31', VALUES BETWEEN '2009-01-01' AND '2009-12-30'); |
| Expression | Fully dynamic division of data based on function execution. Can use UDFs. | PARTITION BY (MONTH(date)); |
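
The four partitioning types above can be pictured as four small routing functions that turn a row value into a bucket number. This is only a sketch in Python, not SQLFire's implementation: `NUM_SERVERS`, the CRC32 hash, the inclusive range bounds, and the catch-all bucket for unmatched values are all assumptions made for illustration.

```python
import zlib
from datetime import date

NUM_SERVERS = 4  # assumed cluster size, for illustration only


def hash_partition(customer_id: int) -> int:
    """PARTITION BY COLUMN: a stable hash of the key picks the bucket."""
    return zlib.crc32(str(customer_id).encode()) % NUM_SERVERS


def list_partition(home_state: str) -> int:
    """PARTITION BY LIST: each explicit value list maps to one bucket."""
    lists = [("CA", "WA"), ("TX", "OK")]
    for bucket, values in enumerate(lists):
        if home_state in values:
            return bucket
    return len(lists)  # assumption: unlisted values share a catch-all bucket


def range_partition(d: date) -> int:
    """PARTITION BY RANGE: each continuous range maps to one bucket."""
    # Assumption: both bounds are treated as inclusive in this sketch.
    ranges = [
        (date(2008, 1, 1), date(2008, 12, 31)),
        (date(2009, 1, 1), date(2009, 12, 30)),
    ]
    for bucket, (low, high) in enumerate(ranges):
        if low <= d <= high:
            return bucket
    return len(ranges)  # assumption: out-of-range values share a catch-all bucket


def expression_partition(d: date) -> int:
    """PARTITION BY (MONTH(date)): hash the expression result, not the raw column."""
    return zlib.crc32(str(d.month).encode()) % NUM_SERVERS
```

With expression partitioning, rows from March 2008 and March 2009 land in the same bucket, because only the month feeds the hash.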
![Page 17: SQLFire at Strata 2012](https://reader035.vdocument.in/reader035/viewer/2022062616/54b6bcf84a7959fa048b45ca/html5/thumbnails/17.jpg)
How does it scale for queries?
Partitioned Table: PK queries per second (1kb Rows), # Clients = 2*N

| Number Of Servers (N) | 2 | 4 | 6 | 8 | 10 |
| --- | --- | --- | --- | --- | --- |
| PK queries per second | 200k | 420k | 604k | 790k | 1M |
![Page 18: SQLFire at Strata 2012](https://reader035.vdocument.in/reader035/viewer/2022062616/54b6bcf84a7959fa048b45ca/html5/thumbnails/18.jpg)
How does it scale for updates?
Partitioned Table: Updates Per Second (3 columns), # Clients = 2*N
85% < 1 ms latency

| Number Of Servers (N) | 2 | 4 | 6 | 8 | 10 |
| --- | --- | --- | --- | --- | --- |
| Updates per second | 220k | 490k | 750k | 950k | 1.3M |
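
One way to read the query and update charts above is to divide each total by the number of servers and compare it with perfect linear scaling from the two-server baseline. The sketch below does that arithmetic with the figures from the slides. The slide values are rounded (for example "1M" and "1.3M"), so the ratios are approximate.

```python
servers = [2, 4, 6, 8, 10]
queries_per_sec = [200_000, 420_000, 604_000, 790_000, 1_000_000]
updates_per_sec = [220_000, 490_000, 750_000, 950_000, 1_300_000]


def per_server(totals, counts):
    """Average throughput contributed by each server."""
    return [total / n for total, n in zip(totals, counts)]


def scaling_efficiency(totals, counts):
    """Throughput relative to perfect linear scaling from the smallest cluster."""
    base = totals[0] / counts[0]
    return [total / (base * n) for total, n in zip(totals, counts)]


query_efficiency = scaling_efficiency(queries_per_sec, servers)
update_efficiency = scaling_efficiency(updates_per_sec, servers)
```

Per-server query throughput stays close to 100k across all five cluster sizes, which is what "linear scaling" means in practice: an efficiency near 1.0 at every point.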
![Page 19: SQLFire at Strata 2012](https://reader035.vdocument.in/reader035/viewer/2022062616/54b6bcf84a7959fa048b45ca/html5/thumbnails/19.jpg)
Redundancy Increases Availability.
CREATE TABLE sales
(product_id int, store_id int,
price float)
PARTITION BY
COLUMN (product_id)
REDUNDANCY 1;
Diagram: SQLFire Node 1 holds Partition 1 plus a redundant copy of Partition 2 (Partition 2*); SQLFire Node 2 holds Partition 2 plus a redundant copy of Partition 1 (Partition 1*).
All data is available if Node 1 fails.
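
The effect of REDUNDANCY 1 can be sketched as a placement table in which every bucket has a primary owner and one extra copy on a different node. The round-robin placement below is an assumption chosen for simplicity, not SQLFire's actual placement algorithm, and the node names are hypothetical.

```python
def place_buckets(num_buckets, nodes, redundancy):
    """Assign each bucket a primary node plus `redundancy` copies on other nodes."""
    if redundancy >= len(nodes):
        raise ValueError("need more nodes than redundant copies")
    placement = {}
    for bucket in range(num_buckets):
        # owners[0] is the primary; the rest hold redundant copies.
        owners = [nodes[(bucket + i) % len(nodes)] for i in range(redundancy + 1)]
        placement[bucket] = owners
    return placement


def available_buckets(placement, failed_node):
    """Buckets that still have at least one live copy after a node fails."""
    return [
        bucket
        for bucket, owners in placement.items()
        if any(node != failed_node for node in owners)
    ]


with_copy = place_buckets(2, ["node1", "node2"], redundancy=1)
without_copy = place_buckets(2, ["node1", "node2"], redundancy=0)
```

With one redundant copy, both buckets survive the loss of `node1`. With no redundancy, the bucket whose only copy lived on `node1` is lost.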
![Page 20: SQLFire at Strata 2012](https://reader035.vdocument.in/reader035/viewer/2022062616/54b6bcf84a7959fa048b45ca/html5/thumbnails/20.jpg)
Partitioning and redundancy
Redundancy = 2 (but tunable)
Single owner for any row at point in time
Replication can be “rack aware”
Replication is synchronous but done in parallel
![Page 21: SQLFire at Strata 2012](https://reader035.vdocument.in/reader035/viewer/2022062616/54b6bcf84a7959fa048b45ca/html5/thumbnails/21.jpg)
SQLFire: Derp-Proof Database
• Instant failover at protocol level
• Apps retain their connections
• Data remains available
Was that cord supposed to be in the wall?
![Page 22: SQLFire at Strata 2012](https://reader035.vdocument.in/reader035/viewer/2022062616/54b6bcf84a7959fa048b45ca/html5/thumbnails/22.jpg)
Linearly scaling joins
Select * from Customer c, Sales s
where c.cust_id = s.cust_id
and c.cust_id = 'xxx';
• With Hash partitioning the join logic executes everywhere
• Distributed joins are expensive and inhibit scaling
– joins across distributed nodes could involve distributed locks and potentially a lot of intermediate data transfer across nodes
![Page 23: SQLFire at Strata 2012](https://reader035.vdocument.in/reader035/viewer/2022062616/54b6bcf84a7959fa048b45ca/html5/thumbnails/23.jpg)
Partition Aware DB Design
Designer thinks about how data maps to partitions
– The main idea is to:
1) Minimize excessive data distribution by keeping the most frequently accessed and joined data collocated on partitions.
2) Collocate the transaction working set on partitions so that complex 2-phase commit/Paxos commit is eliminated or minimized.
Read Pat Helland’s “Life beyond Distributed Transactions” and the Google Megastore paper.
![Page 24: SQLFire at Strata 2012](https://reader035.vdocument.in/reader035/viewer/2022062616/54b6bcf84a7959fa048b45ca/html5/thumbnails/24.jpg)
Collocate Data For Fast Joins.
CREATE TABLE sales
(product_id int, store_id int,
price float)
PARTITION BY
COLUMN (product_id)
COLOCATE WITH customers;
Diagram: SQLFire Node 1 holds Customer 1 (C1) and Customer 1 Sales; SQLFire Node 2 holds Customer 2 (C2) and Customer 2 Sales.
Related data placed on the same node.
SQLFire can join tables without network hops.
![Page 25: SQLFire at Strata 2012](https://reader035.vdocument.in/reader035/viewer/2022062616/54b6bcf84a7959fa048b45ca/html5/thumbnails/25.jpg)
Collocate Data For Fast Joins.
Select * from Customer c, Sales s
where c.cust_id = s.cust_id and c.cust_id = 'C1';
Query pruned to node 1
Diagram: SQLFire Node 1 holds Customer 1 (C1) and Customer 1 Sales; SQLFire Node 2 holds Customer 2 (C2) and Customer 2 Sales.
Related data placed on the same node.
SQLFire can join tables without network hops.
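
Pruning follows from collocation: if customers and their sales are routed by the same key, a join restricted to one customer only needs the single node that owns that key. The sketch below models this with in-memory dictionaries. The node names, the CRC32 routing, and the sample rows are illustrative assumptions.

```python
import zlib

NODES = ["node1", "node2"]  # hypothetical member names


def node_for(cust_id: str) -> str:
    """Both tables route on cust_id, so related rows share a node."""
    return NODES[zlib.crc32(cust_id.encode()) % len(NODES)]


# node -> table name -> rows stored on that node
storage = {node: {"customer": [], "sales": []} for node in NODES}


def insert(table: str, row: dict) -> None:
    storage[node_for(row["cust_id"])][table].append(row)


def join_for_customer(cust_id: str):
    """Join customer to sales for one key, touching only the owning node."""
    node = node_for(cust_id)  # pruning: no other node is consulted
    local = storage[node]
    rows = [
        (c, s)
        for c in local["customer"]
        for s in local["sales"]
        if c["cust_id"] == cust_id and s["cust_id"] == cust_id
    ]
    return node, rows


insert("customer", {"cust_id": "C1", "name": "Customer 1"})
insert("customer", {"cust_id": "C2", "name": "Customer 2"})
insert("sales", {"cust_id": "C1", "price": 9.5})
insert("sales", {"cust_id": "C1", "price": 3.0})
insert("sales", {"cust_id": "C2", "price": 7.0})

owner, joined = join_for_customer("C1")
```

The join result is complete even though only one node's data was read, because no sales row for `C1` can live anywhere else.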
![Page 26: SQLFire at Strata 2012](https://reader035.vdocument.in/reader035/viewer/2022062616/54b6bcf84a7959fa048b45ca/html5/thumbnails/26.jpg)
Collocate Data For Fast Joins.
SELECT sum(value) AS total FROM sales s, customer c
WHERE s.cust_id = c.cust_id AND c.state = 'CA'
GROUP BY c.cust_id ORDER BY total
Parallel scatter-gather
In parallel, each node does hash join, aggregation locally
Diagram: SQLFire Node 1 holds Customer 1 (C1) and Customer 1 Sales; SQLFire Node 2 holds Customer 2 (C2) and Customer 2 Sales.
Related data placed on the same node.
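
The scatter-gather pattern for the aggregate query above can be sketched as a local step that each node runs on its own data, followed by a merge step on the coordinator. The per-node data and the thread pool standing in for remote members are assumptions for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical per-node data; with collocation, a customer's sales share its node.
node_data = {
    "node1": {
        "customer": [{"cust_id": "C1", "state": "CA"}],
        "sales": [{"cust_id": "C1", "value": 10.0}, {"cust_id": "C1", "value": 5.0}],
    },
    "node2": {
        "customer": [{"cust_id": "C2", "state": "CA"}, {"cust_id": "C3", "state": "TX"}],
        "sales": [{"cust_id": "C2", "value": 7.0}, {"cust_id": "C3", "value": 99.0}],
    },
}


def local_aggregate(tables):
    """Runs on one node: hash join customer to sales, filter, sum per customer."""
    ca_customers = {c["cust_id"] for c in tables["customer"] if c["state"] == "CA"}
    totals = {}
    for sale in tables["sales"]:
        if sale["cust_id"] in ca_customers:
            totals[sale["cust_id"]] = totals.get(sale["cust_id"], 0.0) + sale["value"]
    return totals


def scatter_gather(nodes):
    """Scatter the local step to every node, then merge and order the results."""
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(local_aggregate, nodes.values()))
    merged = {}
    for partial in partials:
        for cust_id, total in partial.items():
            merged[cust_id] = merged.get(cust_id, 0.0) + total
    return sorted(merged.items(), key=lambda item: item[1])  # ORDER BY total


result = scatter_gather(node_data)
```

Only the small per-customer totals cross the network in this model. The raw sales rows never leave the node that stores them.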
![Page 27: SQLFire at Strata 2012](https://reader035.vdocument.in/reader035/viewer/2022062616/54b6bcf84a7959fa048b45ca/html5/thumbnails/27.jpg)
Dynamic Data Colocation
Redundancy = 2
Single master for any entity group
Dynamic entity group formation
Based on foreign key relationships
![Page 28: SQLFire at Strata 2012](https://reader035.vdocument.in/reader035/viewer/2022062616/54b6bcf84a7959fa048b45ca/html5/thumbnails/28.jpg)
Data-Aware Stored Procs
• Procedure execution routed to the data
• Full scaled-out execution
• Highly available
• Use pure Java to access/store data
• Demo later on
Like Map/Reduce But Different
![Page 29: SQLFire at Strata 2012](https://reader035.vdocument.in/reader035/viewer/2022062616/54b6bcf84a7959fa048b45ca/html5/thumbnails/29.jpg)
Scaling Stored Procedures
CALL maxSales(arguments)
ON TABLE sales
WHERE (Location IN ('CA', 'WA', 'OR'))
WITH RESULT PROCESSOR
maxSalesReducer
Diagram: each node runs maxSales on local data; maxSalesReducer combines the per-node results.
SQLFire uses data-aware routing to route processing to the data.
Result Processors give map/reduce functionality.
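
The procedure plus result processor pairing maps onto a familiar map/reduce shape: the procedure runs next to each partition and returns a partial answer, and the result processor merges the partials. The sketch below is in Python rather than the Java the deck describes. `max_sales`, `max_sales_reducer`, and the sample rows are hypothetical stand-ins for the `maxSales` and `maxSalesReducer` names on the slide.

```python
# Hypothetical partitions of the sales table, keyed by node name.
partitions = {
    "node1": [{"location": "CA", "amount": 120}, {"location": "TX", "amount": 900}],
    "node2": [{"location": "WA", "amount": 340}, {"location": "OR", "amount": 75}],
}


def max_sales(local_rows, locations):
    """'Map' step: runs on one node and returns its local maximum, or None."""
    amounts = [row["amount"] for row in local_rows if row["location"] in locations]
    return max(amounts) if amounts else None


def max_sales_reducer(partials):
    """'Reduce' step: the result processor merges the per-node answers."""
    present = [p for p in partials if p is not None]
    return max(present) if present else None


def call_on_table(data, locations):
    """Model of CALL ... ON TABLE ... WHERE ... WITH RESULT PROCESSOR."""
    return max_sales_reducer(max_sales(rows, locations) for rows in data.values())


west_coast_max = call_on_table(partitions, {"CA", "WA", "OR"})
```

The TX row with the largest amount is ignored because the WHERE clause filters it out before the local maximum is taken.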
![Page 30: SQLFire at Strata 2012](https://reader035.vdocument.in/reader035/viewer/2022062616/54b6bcf84a7959fa048b45ca/html5/thumbnails/30.jpg)
Scalability: Consistency
With Transactions And Without
- Row updates always atomic and isolated
- FIFO consistency
- Distributed transactions with 1-phase commit
- Coordinator per node
- Eager locking + Fail fast
Assumes:
Most transactions small in space and time
Write-write conflicts rare
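
"Eager locking + fail fast" can be illustrated with a single-process model: a transaction takes the row lock at write time, and a second writer on the same row gets an immediate conflict error instead of waiting. Because every lock is already held when commit starts, commit can be a single step. This is a simplified sketch under those assumptions, not SQLFire's distributed protocol. All class names are hypothetical.

```python
class ConflictError(Exception):
    """Raised immediately when a row is already locked by another transaction."""


class LockTable:
    def __init__(self):
        self._owners = {}  # row key -> id of the transaction holding the lock

    def acquire(self, txn_id, key):
        owner = self._owners.setdefault(key, txn_id)
        if owner != txn_id:
            raise ConflictError(f"row {key!r} is locked by {owner}")  # fail fast

    def release_all(self, txn_id):
        self._owners = {k: o for k, o in self._owners.items() if o != txn_id}


class Transaction:
    def __init__(self, txn_id, locks, store):
        self.txn_id, self.locks, self.store = txn_id, locks, store
        self.pending = {}

    def update(self, key, value):
        self.locks.acquire(self.txn_id, key)  # eager: lock when writing
        self.pending[key] = value

    def commit(self):
        self.store.update(self.pending)  # single phase: locks are already held
        self.locks.release_all(self.txn_id)

    def rollback(self):
        self.pending.clear()
        self.locks.release_all(self.txn_id)
```

The design pays off under the slide's stated assumptions: when transactions are short and write-write conflicts are rare, the occasional immediate failure is cheaper than a blocking protocol.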
![Page 31: SQLFire at Strata 2012](https://reader035.vdocument.in/reader035/viewer/2022062616/54b6bcf84a7959fa048b45ca/html5/thumbnails/31.jpg)
Scalability: High performance persistence
• Parallel log structured storage
• Each partition writes in parallel
• Backups write to disk also
– Increase reliability against h/w loss
Diagram (shown twice, once per member): Memory Tables, Append only Operation logs (Record1, Record2, Record3), OS Buffers, LOG Compressor.
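
An append-only operation log can be sketched as a file that only ever grows at the end, with the in-memory table rebuilt by replaying it. The `compact` method below is one possible reading of the "LOG Compressor" box: it rewrites the log so that only the latest record per key remains. The JSON line format and the class name are assumptions for illustration.

```python
import json
import os
import tempfile


class OperationLog:
    def __init__(self, path):
        self.path = path

    def append(self, key, value):
        """Writes go to the end of the file; existing records are never edited."""
        with open(self.path, "a", encoding="utf-8") as log:
            log.write(json.dumps({"key": key, "value": value}) + "\n")

    def replay(self):
        """Rebuild the in-memory table; later records win over earlier ones."""
        table = {}
        with open(self.path, encoding="utf-8") as log:
            for line in log:
                record = json.loads(line)
                table[record["key"]] = record["value"]
        return table

    def compact(self):
        """Rewrite the log keeping only the latest record per key."""
        latest = self.replay()
        tmp_path = self.path + ".compact"
        with open(tmp_path, "w", encoding="utf-8") as out:
            for key, value in latest.items():
                out.write(json.dumps({"key": key, "value": value}) + "\n")
        os.replace(tmp_path, self.path)  # swap in the compacted file


with tempfile.TemporaryDirectory() as directory:
    log = OperationLog(os.path.join(directory, "sales.log"))
    log.append("Record1", 1)
    log.append("Record2", 2)
    log.append("Record1", 3)  # supersedes the first Record1 entry
    recovered = log.replay()
```

Sequential appends avoid random disk seeks, which is why this layout suits a store whose working set lives in memory.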
![Page 32: SQLFire at Strata 2012](https://reader035.vdocument.in/reader035/viewer/2022062616/54b6bcf84a7959fa048b45ca/html5/thumbnails/32.jpg)
Demos!
![Page 33: SQLFire at Strata 2012](https://reader035.vdocument.in/reader035/viewer/2022062616/54b6bcf84a7959fa048b45ca/html5/thumbnails/33.jpg)
Demo: Distributed Procedures
• Autocorrelation of time series
• All pure Java scaled-out
• Tolerant of node failures
• Using SuanShu Java library
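
For readers unfamiliar with the computation in this demo, the lag-k autocorrelation of a series is the covariance between the series and a copy of itself shifted by k steps, divided by the series variance. The demo itself used the SuanShu Java library. The plain Python function below is a stand-in using the standard sample formula, not the demo's code.

```python
def autocorrelation(series, lag):
    """Sample autocorrelation of `series` at the given non-negative lag."""
    n = len(series)
    if not 0 <= lag < n:
        raise ValueError("lag must satisfy 0 <= lag < len(series)")
    mean = sum(series) / n
    denominator = sum((x - mean) ** 2 for x in series)
    if denominator == 0:
        raise ValueError("autocorrelation is undefined for a constant series")
    numerator = sum(
        (series[t] - mean) * (series[t + lag] - mean) for t in range(n - lag)
    )
    return numerator / denominator
```

Each time series can be processed independently, which is what makes the computation a natural fit for procedures that run on the node holding the data.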
![Page 34: SQLFire at Strata 2012](https://reader035.vdocument.in/reader035/viewer/2022062616/54b6bcf84a7959fa048b45ca/html5/thumbnails/34.jpg)
Demo: Caching
• Read-only or…
• Read-through / Write-behind
• Cache analytics results
• Skip the ETL
![Page 35: SQLFire at Strata 2012](https://reader035.vdocument.in/reader035/viewer/2022062616/54b6bcf84a7959fa048b45ca/html5/thumbnails/35.jpg)
Try SQLFire Today!
Download: http://vmware.com/go/sqlfire
Free for developer use to 3 nodes.
Forum: http://vmware.com/vmtn/appplatform/vfabric_sqlfire
Got questions? Get answers.
:sigh: Just Google it
Twitter: @vFabricSQLFire, @cshanklin, @jagsr
I need more followers to get a promotion.
![Page 36: SQLFire at Strata 2012](https://reader035.vdocument.in/reader035/viewer/2022062616/54b6bcf84a7959fa048b45ca/html5/thumbnails/36.jpg)
Demo Details
![Page 37: SQLFire at Strata 2012](https://reader035.vdocument.in/reader035/viewer/2022062616/54b6bcf84a7959fa048b45ca/html5/thumbnails/37.jpg)
Scaling Stored Procs (1)
Insert Timeseries
Ubuntu (database)
![Page 38: SQLFire at Strata 2012](https://reader035.vdocument.in/reader035/viewer/2022062616/54b6bcf84a7959fa048b45ca/html5/thumbnails/38.jpg)
Scaling Stored Procs (2)
Insert Timeseries
Compute Autocorrelations
Complete
Ubuntu (database)
![Page 39: SQLFire at Strata 2012](https://reader035.vdocument.in/reader035/viewer/2022062616/54b6bcf84a7959fa048b45ca/html5/thumbnails/39.jpg)
Scaling Stored Procs (3)
Insert Timeseries
Diagram: three Ubuntu (database) nodes, each running Compute Autocorrelations through to Complete, with Rebalance steps between nodes.
All using standard SQL APIs
![Page 40: SQLFire at Strata 2012](https://reader035.vdocument.in/reader035/viewer/2022062616/54b6bcf84a7959fa048b45ca/html5/thumbnails/40.jpg)
Caching Analytics (1)
Continuous Batch Processing
![Page 41: SQLFire at Strata 2012](https://reader035.vdocument.in/reader035/viewer/2022062616/54b6bcf84a7959fa048b45ca/html5/thumbnails/41.jpg)
Caching Analytics (2)
Low latency
Ubuntu (database)
Continuous Batch Processing
In-memory caching
JDBC row loader
![Page 42: SQLFire at Strata 2012](https://reader035.vdocument.in/reader035/viewer/2022062616/54b6bcf84a7959fa048b45ca/html5/thumbnails/42.jpg)
Caching Analytics (3)
Low latency
Ubuntu (database)
Continuous Batch Processing
In-memory caching
Scalable + Tunable Cache Policies
![Page 43: SQLFire at Strata 2012](https://reader035.vdocument.in/reader035/viewer/2022062616/54b6bcf84a7959fa048b45ca/html5/thumbnails/43.jpg)
Caching Policies
• LRU Count
– Overflow to disk or destroy.
• Time To Live
– Counter ticks as soon as the row is loaded.
• Idle Time
– Destroy rows when they are not accessed for a while.
• Specified in CREATE TABLE syntax.
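
The three policies differ in which clock or counter they watch. The sketch below models them in one small class: LRU count caps the number of rows, time to live is measured from when the row was loaded, and idle time is measured from the last access. It only models the "destroy" action, not overflow to disk. The class and parameter names are hypothetical, and time is passed in explicitly to keep the example deterministic.

```python
from collections import OrderedDict


class RowCache:
    def __init__(self, max_rows=None, ttl=None, idle=None):
        self.max_rows, self.ttl, self.idle = max_rows, ttl, idle
        self._rows = OrderedDict()  # key -> (value, loaded_at, last_access)

    def put(self, key, value, now):
        self._rows.pop(key, None)
        self._rows[key] = (value, now, now)
        if self.max_rows is not None:
            while len(self._rows) > self.max_rows:
                self._rows.popitem(last=False)  # LRU count: destroy the oldest-used row

    def get(self, key, now):
        entry = self._rows.get(key)
        if entry is None:
            return None
        value, loaded_at, last_access = entry
        # Time to live counts from load time and ignores later accesses.
        ttl_expired = self.ttl is not None and now - loaded_at >= self.ttl
        # Idle time counts from the most recent access.
        idle_expired = self.idle is not None and now - last_access >= self.idle
        if ttl_expired or idle_expired:
            del self._rows[key]
            return None
        self._rows[key] = (value, loaded_at, now)
        self._rows.move_to_end(key)  # mark as most recently used
        return value
```

A row that is read constantly never expires under an idle-time policy, but it still expires under time to live once its load-time counter runs out.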