Download - Иван Глушков (Echo)
![Page 2: Иван Глушков (Echo)](https://reader034.vdocument.in/reader034/viewer/2022052410/554a5dd4b4c90531228b5381/html5/thumbnails/2.jpg)
About myself• MIPT • MCST, Elbrus compiler project • Echo, real-time social platform
(PaaS)
![Page 3: Иван Глушков (Echo)](https://reader034.vdocument.in/reader034/viewer/2022052410/554a5dd4b4c90531228b5381/html5/thumbnails/3.jpg)
History of SQL• Relational Model in 1970:
disk-oriented, row, sql • “One size fits all” doesn’t
work: – Column-oriented for OLAP. – Key-Value storages,
Document storages
![Page 4: Иван Глушков (Echo)](https://reader034.vdocument.in/reader034/viewer/2022052410/554a5dd4b4c90531228b5381/html5/thumbnails/4.jpg)
Startups lifecycle• Start: no money, no users,
open source • Middle: more users, storage
optimization • Final: plenty of users, storage
failure
![Page 5: Иван Глушков (Echo)](https://reader034.vdocument.in/reader034/viewer/2022052410/554a5dd4b4c90531228b5381/html5/thumbnails/5.jpg)
Sharding in application• Additional logic in application • Difficulties with cross-sharding
transactions • More servers to maintain • More components — higher
prob for breakdown
![Page 6: Иван Глушков (Echo)](https://reader034.vdocument.in/reader034/viewer/2022052410/554a5dd4b4c90531228b5381/html5/thumbnails/6.jpg)
New requirements• Huge and growing data sets
9M msg per hour in Facebook 50M msg per day in Twitter
• Generated by devices • High concurrency • Data model with some relations • Transactional integrity
![Page 7: Иван Глушков (Echo)](https://reader034.vdocument.in/reader034/viewer/2022052410/554a5dd4b4c90531228b5381/html5/thumbnails/7.jpg)
Architecture change
Client Side
Server Side
Cloud Storage
Client Side
Server Side
Database
![Page 8: Иван Глушков (Echo)](https://reader034.vdocument.in/reader034/viewer/2022052410/554a5dd4b4c90531228b5381/html5/thumbnails/8.jpg)
Architecture change
• ‘P’ in CAP is not discrete • ACID vs BASE • Managing partitions: detection,
limitations in operations, recovery
![Page 9: Иван Глушков (Echo)](https://reader034.vdocument.in/reader034/viewer/2022052410/554a5dd4b4c90531228b5381/html5/thumbnails/9.jpg)
NoSQL• CAP: first ‘A’, then ‘C’ • Horizontal scaling • Custom API • Schemaless • Types: Key-Value, Document,
Graph, …
![Page 10: Иван Глушков (Echo)](https://reader034.vdocument.in/reader034/viewer/2022052410/554a5dd4b4c90531228b5381/html5/thumbnails/10.jpg)
NewSQL: definition
“A DBMS that delivers the scalability !and flexibility promised by NoSQL!while retaining the support for SQL !queries and/or ACID, or to improve !performance for appropriate workloads.”!
451 Group
![Page 11: Иван Глушков (Echo)](https://reader034.vdocument.in/reader034/viewer/2022052410/554a5dd4b4c90531228b5381/html5/thumbnails/11.jpg)
NewSQL: definition
❖ SQL as the primary interface!❖ ACID support for transactions!❖ Non-locking concurrency control!❖ High per-node performance!❖ Parallel, shared nothing architecture!
Michael Stonebraker
![Page 12: Иван Глушков (Echo)](https://reader034.vdocument.in/reader034/viewer/2022052410/554a5dd4b4c90531228b5381/html5/thumbnails/12.jpg)
Shared nothing• Each node is independent and
self-sufficient • No shared memory or disk • Scale infinitely • Data partitioning • Slow multi-shards requests
![Page 13: Иван Глушков (Echo)](https://reader034.vdocument.in/reader034/viewer/2022052410/554a5dd4b4c90531228b5381/html5/thumbnails/13.jpg)
Column-oriented DBMS• Store content by column • Efficient in hard disk access • Good for sparse data • Higher data compression • More reads/writes for large
records with a lot of fields
![Page 14: Иван Глушков (Echo)](https://reader034.vdocument.in/reader034/viewer/2022052410/554a5dd4b4c90531228b5381/html5/thumbnails/14.jpg)
• Useful work is only 12%
• By Stonebraker and research group
Traditional DBMS
12%
10%
11%
18% 20%
29%
![Page 15: Иван Глушков (Echo)](https://reader034.vdocument.in/reader034/viewer/2022052410/554a5dd4b4c90531228b5381/html5/thumbnails/15.jpg)
In-memory storage• High throughput • Low latency • No Buffer Management • If serialized, no Locking or
Latching
![Page 16: Иван Глушков (Echo)](https://reader034.vdocument.in/reader034/viewer/2022052410/554a5dd4b4c90531228b5381/html5/thumbnails/16.jpg)
In-memory storage• Amazon price reduction !!!!
• Current price for 1TB: $14K
![Page 17: Иван Глушков (Echo)](https://reader034.vdocument.in/reader034/viewer/2022052410/554a5dd4b4c90531228b5381/html5/thumbnails/17.jpg)
NewSQL: categories• New approaches: VoltDB,
Clustrix, NuoDB • New storage engines: TokuDB,
ScaleDB • Transparent clustering:
ScaleBase, dbShards
![Page 18: Иван Глушков (Echo)](https://reader034.vdocument.in/reader034/viewer/2022052410/554a5dd4b4c90531228b5381/html5/thumbnails/18.jpg)
NuoDB• Multi-tier architecture: – Administrative: managing,
stats, cli, web-ui – Transactional: ACID except
‘D’, cache – Storage: key-value store
![Page 19: Иван Глушков (Echo)](https://reader034.vdocument.in/reader034/viewer/2022052410/554a5dd4b4c90531228b5381/html5/thumbnails/19.jpg)
NuoDB• Everything is an ‘Atom’ • P2P, encrypted sessions • MVCC + Append-only storage
![Page 20: Иван Глушков (Echo)](https://reader034.vdocument.in/reader034/viewer/2022052410/554a5dd4b4c90531228b5381/html5/thumbnails/20.jpg)
YCSB• Yahoo Cloud Serving
Benchmark • Key-value:
insert/read/update/scan • Measures – Performance – Scaling
![Page 21: Иван Глушков (Echo)](https://reader034.vdocument.in/reader034/viewer/2022052410/554a5dd4b4c90531228b5381/html5/thumbnails/21.jpg)
NuoDB: YCSBThroughput, tps/nodes
0
275 000
550 000
825 000
1 100 000
1 2 4 8 16 24
Update latency, s
0
25
50
1 2 4 8 16 24
Read latency, s
0
1.5
3
1 2 4 8 16 24
Hosts: 32GB, Xeon 8 cores, 1TB HDD, 1Gb LAN5% updates, 95% reads
![Page 22: Иван Глушков (Echo)](https://reader034.vdocument.in/reader034/viewer/2022052410/554a5dd4b4c90531228b5381/html5/thumbnails/22.jpg)
VoltDB• In-memory storage • Stored procedure interface • Serializing all data access • Horizontal partitioning • Replication (“K-safety”) • Snapshots + Command Logging
![Page 23: Иван Глушков (Echo)](https://reader034.vdocument.in/reader034/viewer/2022052410/554a5dd4b4c90531228b5381/html5/thumbnails/23.jpg)
VoltDB• Open-source • Partitioning and Replication
control • Java + C++
![Page 24: Иван Глушков (Echo)](https://reader034.vdocument.in/reader034/viewer/2022052410/554a5dd4b4c90531228b5381/html5/thumbnails/24.jpg)
VoltDB: kv bench
90%reads, 10%writes 3 nodes: 64GB, dual 2.93GHz intel 6 core processors
![Page 25: Иван Глушков (Echo)](https://reader034.vdocument.in/reader034/viewer/2022052410/554a5dd4b4c90531228b5381/html5/thumbnails/25.jpg)
VoltDB: sql bench
26 SQL statements "per transaction
![Page 26: Иван Глушков (Echo)](https://reader034.vdocument.in/reader034/viewer/2022052410/554a5dd4b4c90531228b5381/html5/thumbnails/26.jpg)
• Shared data • No single point
of failure
ScaleDBApplication
MySQL MySQL MySQL…
Mirrored"Storage! Mirrored"
Storage!
… Mirrored"Storage!
![Page 27: Иван Глушков (Echo)](https://reader034.vdocument.in/reader034/viewer/2022052410/554a5dd4b4c90531228b5381/html5/thumbnails/27.jpg)
ClustrixDB• “Query fragment”: • read/write/execute funs • perform synchronisation • send rows to query
fragments • Partition, replication: “slices”
![Page 28: Иван Глушков (Echo)](https://reader034.vdocument.in/reader034/viewer/2022052410/554a5dd4b4c90531228b5381/html5/thumbnails/28.jpg)
ClustrixDB• “Move query to the data” • Dynamic and transparent data
layout • Linear scale
![Page 29: Иван Глушков (Echo)](https://reader034.vdocument.in/reader034/viewer/2022052410/554a5dd4b4c90531228b5381/html5/thumbnails/29.jpg)
TPC-C
• OLTP benchmark • 9 types of tables • “New-order transaction”
![Page 30: Иван Глушков (Echo)](https://reader034.vdocument.in/reader034/viewer/2022052410/554a5dd4b4c90531228b5381/html5/thumbnails/30.jpg)
ClustrixDB• ~ 400GB • linear
scale
![Page 31: Иван Глушков (Echo)](https://reader034.vdocument.in/reader034/viewer/2022052410/554a5dd4b4c90531228b5381/html5/thumbnails/31.jpg)
ClustrixDB: examples• 30M users, 10M logins per day • 4.4B transactions per day • 1.08/4.69 Petabytes per month
writes/reads • 42 nodes, 336 cores, 2TB
memory, 46TB SSD
![Page 32: Иван Глушков (Echo)](https://reader034.vdocument.in/reader034/viewer/2022052410/554a5dd4b4c90531228b5381/html5/thumbnails/32.jpg)
Conclusions• NewSQL is an established trend
with a number of options • Hard to pick one because
they're not on a common scale • No silver bullet • Need more efficient ways to
store and process it