dynomite: a highly available, distributed and scalable dynamo layer--ioannis papapanagiotou,...
TRANSCRIPT
![Page 1: Dynomite: A Highly Available, Distributed and Scalable Dynamo Layer--Ioannis Papapanagiotou, Shailesh Birari, Jason Cacciatore, Netflix](https://reader035.vdocument.in/reader035/viewer/2022062822/587c04f81a28ab7c668b7549/html5/thumbnails/1.jpg)
Cloud Database Engineering
Making Non-Distributed Databases, Distributed
- Shailesh Birari
- Ioannis Papapanagiotou, PhD
![Page 2: Dynomite: A Highly Available, Distributed and Scalable Dynamo Layer--Ioannis Papapanagiotou, Shailesh Birari, Jason Cacciatore, Netflix](https://reader035.vdocument.in/reader035/viewer/2022062822/587c04f81a28ab7c668b7549/html5/thumbnails/2.jpg)
Dynomite Ecosystem
● Dynomite
● Dynomite-manager
● Dyno client
![Page 3: Dynomite: A Highly Available, Distributed and Scalable Dynamo Layer--Ioannis Papapanagiotou, Shailesh Birari, Jason Cacciatore, Netflix](https://reader035.vdocument.in/reader035/viewer/2022062822/587c04f81a28ab7c668b7549/html5/thumbnails/3.jpg)
Cloud Database Engineering (CDE) Team
● Develop and operate data stores in AWS
  - Cassandra, Dynomite, Elasticsearch, RDS, S3
● Ensure availability, scalability, durability and latency SLAs
● Database expertise, client libraries, tools and best practices
![Page 4: Dynomite: A Highly Available, Distributed and Scalable Dynamo Layer--Ioannis Papapanagiotou, Shailesh Birari, Jason Cacciatore, Netflix](https://reader035.vdocument.in/reader035/viewer/2022062822/587c04f81a28ab7c668b7549/html5/thumbnails/4.jpg)
Problems & Observations
● Cassandra not a speed demon (reads)
● Needed a data store:
  o Scalable & highly available
  o High throughput, low latency
  o Active-active multi datacenter replication
● Usage of Redis increasing:
  o Netflix use case is active-active, highly available
  o Does not have bi-directional replication
  o Cannot withstand a Monkey attack
![Page 5: Dynomite: A Highly Available, Distributed and Scalable Dynamo Layer--Ioannis Papapanagiotou, Shailesh Birari, Jason Cacciatore, Netflix](https://reader035.vdocument.in/reader035/viewer/2022062822/587c04f81a28ab7c668b7549/html5/thumbnails/5.jpg)
What is Dynomite?
● A framework that makes non-distributed data stores, distributed.
  o Can be used with many key-value storage engines like Redis, Memcached, LMDB, etc.
  o Focus: performance, cross-datacenter active-active replication and high availability
  o Features: node warmup (cold bootstrapping), tunable consistency, S3 backups/restores
![Page 6: Dynomite: A Highly Available, Distributed and Scalable Dynamo Layer--Ioannis Papapanagiotou, Shailesh Birari, Jason Cacciatore, Netflix](https://reader035.vdocument.in/reader035/viewer/2022062822/587c04f81a28ab7c668b7549/html5/thumbnails/6.jpg)
Dynomite @ Netflix
● Running around 1.5 years in PROD
● ~1000 customer facing nodes
● 1M OPS at peak
● Largest cluster: 6TB
● Quarterly upgrades in PROD
![Page 7: Dynomite: A Highly Available, Distributed and Scalable Dynamo Layer--Ioannis Papapanagiotou, Shailesh Birari, Jason Cacciatore, Netflix](https://reader035.vdocument.in/reader035/viewer/2022062822/587c04f81a28ab7c668b7549/html5/thumbnails/7.jpg)
Dynomite Overview
● Layer on top of a non-distributed key value data store
  ○ Peer-peer, Shared Nothing
  ○ Auto Sharding
  ○ Multi-datacenter
  ○ Linear scale
  ○ Replication (Encrypted)
  ○ Gossiping
![Page 8: Dynomite: A Highly Available, Distributed and Scalable Dynamo Layer--Ioannis Papapanagiotou, Shailesh Birari, Jason Cacciatore, Netflix](https://reader035.vdocument.in/reader035/viewer/2022062822/587c04f81a28ab7c668b7549/html5/thumbnails/8.jpg)
Topology
● Each rack contains one copy of data, partitioned across multiple nodes in that rack
● Multiple Racks == Higher Availability (HA)
![Page 9: Dynomite: A Highly Available, Distributed and Scalable Dynamo Layer--Ioannis Papapanagiotou, Shailesh Birari, Jason Cacciatore, Netflix](https://reader035.vdocument.in/reader035/viewer/2022062822/587c04f81a28ab7c668b7549/html5/thumbnails/9.jpg)
Replication
● A client can connect to any node on the Dynomite cluster when sending requests.
  o If the node owns the data,
    ▪ data are written to the local data store and asynchronously replicated.
  o If the node does not own the data,
    ▪ the node acts as a coordinator: it sends the data to the owning node in the same rack and replicates to the corresponding nodes in other racks and DCs.
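The owner/coordinator decision on this slide can be sketched as follows. This is a minimal illustration, not Dynomite's actual code: the function name `plan_write`, the `owners_by_rack` mapping, and the node names are all hypothetical, and the mapping is assumed to already hold, for one key, the node in each rack that owns the key's token.

```python
def plan_write(receiving_node, local_rack, owners_by_rack):
    """Decide how a node handles a write for one key.

    owners_by_rack (hypothetical input) maps each rack name to the node
    in that rack that owns the key's token.
    """
    local_owner = owners_by_rack[local_rack]
    # Every other rack (same DC or remote DC) holds its own copy of the data.
    replicas = [node for rack, node in owners_by_rack.items() if rack != local_rack]
    # A node that does not own the token only forwards the request.
    role = "owner" if receiving_node == local_owner else "coordinator"
    return {
        "role": role,
        "local_write": local_owner,   # written in the local rack
        "async_replicas": replicas,   # replicated asynchronously
    }


owners = {"rack-a": "a2", "rack-b": "b2", "rack-c": "c2"}
print(plan_write("a1", "rack-a", owners))
```

In both roles the data lands on the same set of nodes; only the extra hop inside the local rack differs.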
![Page 10: Dynomite: A Highly Available, Distributed and Scalable Dynamo Layer--Ioannis Papapanagiotou, Shailesh Birari, Jason Cacciatore, Netflix](https://reader035.vdocument.in/reader035/viewer/2022062822/587c04f81a28ab7c668b7549/html5/thumbnails/10.jpg)
The Dynomite Ecosystem
RESP = Redis Serialization Protocol
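For readers unfamiliar with RESP, here is a small sketch of how a Redis command is framed on the wire: an array header followed by one length-prefixed bulk string per argument. The helper name `encode_resp_command` is made up for this illustration and is not part of Dynomite or Dyno.

```python
def encode_resp_command(*args: str) -> bytes:
    """Encode a command as a RESP array of bulk strings."""
    # Array header: '*' followed by the number of elements.
    parts = [b"*%d\r\n" % len(args)]
    for arg in args:
        data = arg.encode("utf-8")
        # Bulk string: '$' + byte length, then the payload, each CRLF-terminated.
        parts.append(b"$%d\r\n%s\r\n" % (len(data), data))
    return b"".join(parts)


print(encode_resp_command("GET", "foo"))
# b'*2\r\n$3\r\nGET\r\n$3\r\nfoo\r\n'
```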
![Page 11: Dynomite: A Highly Available, Distributed and Scalable Dynamo Layer--Ioannis Papapanagiotou, Shailesh Birari, Jason Cacciatore, Netflix](https://reader035.vdocument.in/reader035/viewer/2022062822/587c04f81a28ab7c668b7549/html5/thumbnails/11.jpg)
Consistency
● DC_ONE
  o Reads and writes are propagated synchronously only to the node in local rack and asynchronously replicated to other racks and data centers
● DC_QUORUM
  o Reads and writes are propagated synchronously to quorum number of nodes in the local region and asynchronously to the rest
● Consistency can be configured dynamically for read or write operations separately (cluster-wide)
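As a rough illustration of the two levels, the sketch below computes how many synchronous acknowledgements a request would wait for. It assumes "quorum" means a simple majority of the replicas in the local region; the slides do not spell out the formula, and the function names are hypothetical.

```python
def quorum_size(replicas: int) -> int:
    """Simple majority (assumed definition of quorum)."""
    return replicas // 2 + 1


def sync_acks_required(level: str, local_replicas: int) -> int:
    """Number of local-region nodes that must respond synchronously."""
    if level == "DC_ONE":
        return 1  # only the node in the local rack
    if level == "DC_QUORUM":
        return quorum_size(local_replicas)
    raise ValueError(f"unknown consistency level: {level}")


def request_succeeds(level: str, acks_received: int, local_replicas: int) -> bool:
    """True once enough synchronous responses have arrived."""
    return acks_received >= sync_acks_required(level, local_replicas)


print(sync_acks_required("DC_QUORUM", 3))  # 2
```

With three racks in a region, DC_QUORUM under this assumption tolerates one slow or failed rack, while DC_ONE returns as soon as the local rack responds.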
![Page 12: Dynomite: A Highly Available, Distributed and Scalable Dynamo Layer--Ioannis Papapanagiotou, Shailesh Birari, Jason Cacciatore, Netflix](https://reader035.vdocument.in/reader035/viewer/2022062822/587c04f81a28ab7c668b7549/html5/thumbnails/12.jpg)
Performance Setup
● Instance Type:
  ○ Dynomite: r3.2xlarge (1Gbps)
  ○ Pappy/Dyno: m2.2xlarge instances (typical of an app @ Netflix)
● Replication factor: 3
  ○ Deployed Dynomite in 3 zones in us-east-1
  ○ Every zone had the same number of servers
● Demo app used a simple key/value workload
  ○ Redis: GET and SET
● Payload
  ○ Size: 1024 Bytes
  ○ 80%/20% reads over writes
![Page 13: Dynomite: A Highly Available, Distributed and Scalable Dynamo Layer--Ioannis Papapanagiotou, Shailesh Birari, Jason Cacciatore, Netflix](https://reader035.vdocument.in/reader035/viewer/2022062822/587c04f81a28ab7c668b7549/html5/thumbnails/13.jpg)
Performance (Dynomite Speed)
● Throughput scales linearly with number of nodes.
● Dynomite can reach >1 million client requests with ~24 nodes.
![Page 14: Dynomite: A Highly Available, Distributed and Scalable Dynamo Layer--Ioannis Papapanagiotou, Shailesh Birari, Jason Cacciatore, Netflix](https://reader035.vdocument.in/reader035/viewer/2022062822/587c04f81a28ab7c668b7549/html5/thumbnails/14.jpg)
Performance (Latency - average/P50)
● Dynomite’s latency on average is 0.16ms.
● Client side latency is 0.6ms and does not increase as the cluster scales up/down
![Page 15: Dynomite: A Highly Available, Distributed and Scalable Dynamo Layer--Ioannis Papapanagiotou, Shailesh Birari, Jason Cacciatore, Netflix](https://reader035.vdocument.in/reader035/viewer/2022062822/587c04f81a28ab7c668b7549/html5/thumbnails/15.jpg)
Performance (Latency - P99)
● The major contributor to latency at P99 is the network.
● Dynomite contributes <10%
![Page 16: Dynomite: A Highly Available, Distributed and Scalable Dynamo Layer--Ioannis Papapanagiotou, Shailesh Birari, Jason Cacciatore, Netflix](https://reader035.vdocument.in/reader035/viewer/2022062822/587c04f81a28ab7c668b7549/html5/thumbnails/16.jpg)
Dynomite-manager
● Token management for multi-region deployments
● Support AWS environment
● Automated security group update in multi-region environment
● Monitoring of Dynomite and the underlying storage engine
● Node cold bootstrap (warm up)
● S3 backups and restores
● REST API
![Page 17: Dynomite: A Highly Available, Distributed and Scalable Dynamo Layer--Ioannis Papapanagiotou, Shailesh Birari, Jason Cacciatore, Netflix](https://reader035.vdocument.in/reader035/viewer/2022062822/587c04f81a28ab7c668b7549/html5/thumbnails/17.jpg)
Dynomite-manager: warm up
1. Dynomite-manager identifies which node has the same token in the same DC
2. Sets Redis to “Slave” mode of that node
3. Checks for peer syncing
   a. difference between master and slave offset
4. Once master and slave are in sync, Dynomite is set to allow writes only
5. Dynomite is set back to normal state
6. Checks for health of the node - Done!
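The warm-up steps above can be sketched as a small state progression driven by the replication offsets. This is only an illustration under stated assumptions: the state names, the `tolerance` parameter, and the idea of passing in pre-polled `(master_offset, replica_offset)` samples are all invented here and do not reflect Dynomite-manager's real implementation.

```python
def in_sync(master_offset: int, replica_offset: int, tolerance: int = 0) -> bool:
    """Step 3a: compare master and replica replication offsets."""
    return master_offset - replica_offset <= tolerance


def warm_up(offset_samples, tolerance: int = 0):
    """Return the states a warming node passes through.

    offset_samples: iterable of (master_offset, replica_offset) pairs,
    standing in for repeated polls of the two Redis instances.
    """
    # Steps 1-2: a peer with the same token was found and Redis replicates from it.
    states = ["REPLICATING"]
    for master_offset, replica_offset in offset_samples:
        if in_sync(master_offset, replica_offset, tolerance):
            # Steps 4-5: accept writes only, then return to normal operation.
            states += ["WRITES_ONLY", "NORMAL"]
            return states
    # The samples ran out before the replica caught up.
    states.append("TIMED_OUT")
    return states


print(warm_up([(100, 10), (120, 90), (130, 130)]))
# ['REPLICATING', 'WRITES_ONLY', 'NORMAL']
```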
![Page 18: Dynomite: A Highly Available, Distributed and Scalable Dynamo Layer--Ioannis Papapanagiotou, Shailesh Birari, Jason Cacciatore, Netflix](https://reader035.vdocument.in/reader035/viewer/2022062822/587c04f81a28ab7c668b7549/html5/thumbnails/18.jpg)
Warm up (node terminated)
![Page 19: Dynomite: A Highly Available, Distributed and Scalable Dynamo Layer--Ioannis Papapanagiotou, Shailesh Birari, Jason Cacciatore, Netflix](https://reader035.vdocument.in/reader035/viewer/2022062822/587c04f81a28ab7c668b7549/html5/thumbnails/19.jpg)
Warm up (auto-scale)
![Page 20: Dynomite: A Highly Available, Distributed and Scalable Dynamo Layer--Ioannis Papapanagiotou, Shailesh Birari, Jason Cacciatore, Netflix](https://reader035.vdocument.in/reader035/viewer/2022062822/587c04f81a28ab7c668b7549/html5/thumbnails/20.jpg)
Warm up (node with same token)
![Page 21: Dynomite: A Highly Available, Distributed and Scalable Dynamo Layer--Ioannis Papapanagiotou, Shailesh Birari, Jason Cacciatore, Netflix](https://reader035.vdocument.in/reader035/viewer/2022062822/587c04f81a28ab7c668b7549/html5/thumbnails/21.jpg)
Warm up (Redis replication)
![Page 22: Dynomite: A Highly Available, Distributed and Scalable Dynamo Layer--Ioannis Papapanagiotou, Shailesh Birari, Jason Cacciatore, Netflix](https://reader035.vdocument.in/reader035/viewer/2022062822/587c04f81a28ab7c668b7549/html5/thumbnails/22.jpg)
Warm up (Streaming data)
![Page 23: Dynomite: A Highly Available, Distributed and Scalable Dynamo Layer--Ioannis Papapanagiotou, Shailesh Birari, Jason Cacciatore, Netflix](https://reader035.vdocument.in/reader035/viewer/2022062822/587c04f81a28ab7c668b7549/html5/thumbnails/23.jpg)
Warm up (Nodes in sync)
![Page 24: Dynomite: A Highly Available, Distributed and Scalable Dynamo Layer--Ioannis Papapanagiotou, Shailesh Birari, Jason Cacciatore, Netflix](https://reader035.vdocument.in/reader035/viewer/2022062822/587c04f81a28ab7c668b7549/html5/thumbnails/24.jpg)
Dynomite: S3 backups/restores
● Why?
  o Disaster recovery
  o Data corruption
● How?
  o Redis dumps data on the instance drive
  o Dynomite-manager sends data to S3 buckets
● Data per node are not large so no need for incrementals.
● Use case:
  o Clusters that use Dynomite as a storage layer
  o Not enabled in clusters that have short TTL or use Dynomite as a cache
![Page 25: Dynomite: A Highly Available, Distributed and Scalable Dynamo Layer--Ioannis Papapanagiotou, Shailesh Birari, Jason Cacciatore, Netflix](https://reader035.vdocument.in/reader035/viewer/2022062822/587c04f81a28ab7c668b7549/html5/thumbnails/25.jpg)
Dynomite S3 backups (operation)
1. Perform backup
   a. Dynomite-manager performs it on a pre-defined interval
   b. Dynomite-manager REST call:
      i. curl http://localhost:8080/REST/v1/admin/s3backup
2. Perform a Redis BGREWRITEAOF or BGSAVE.
   a. Check the size of the persisted file. If the size is zero, which means that there was an issue with Redis or there is no data, we do not perform the S3 backup
3. S3 backup key: backup/region/clustername-ASG/token/date
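A minimal sketch of steps 2a and 3: skip the upload when the dump is empty, and build the object key in the layout shown on the slide. The function names, the example values, and the date format (`YYYYMMDDhhmm`) are assumptions made for illustration; the slide only gives the key layout.

```python
from datetime import datetime


def should_upload(dump_size_bytes: int) -> bool:
    """Step 2a: a zero-byte dump means Redis had a problem or holds no data."""
    return dump_size_bytes > 0


def backup_key(region: str, cluster_asg: str, token: str, when: datetime) -> str:
    """Step 3: backup/region/clustername-ASG/token/date.

    The timestamp format is an assumption, not taken from the slides.
    """
    return f"backup/{region}/{cluster_asg}/{token}/{when:%Y%m%d%H%M}"


# Hypothetical values; a caller could obtain the size with os.path.getsize().
if should_upload(dump_size_bytes=4096):
    print(backup_key("us-east-1", "dyno_demo-v001", "1383429731",
                     datetime(2016, 1, 2, 3, 4)))
```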
![Page 26: Dynomite: A Highly Available, Distributed and Scalable Dynamo Layer--Ioannis Papapanagiotou, Shailesh Birari, Jason Cacciatore, Netflix](https://reader035.vdocument.in/reader035/viewer/2022062822/587c04f81a28ab7c668b7549/html5/thumbnails/26.jpg)
Dynomite S3 restores
1. Perform restore:
   a. Dynomite-manager performs it once it starts, if the configuration is enabled
   b. Dynomite-manager REST call:
      i. curl http://localhost:8080/REST/v1/admin/s3backup
2. Stop Dynomite process:
   a. We perform this to notify Discovery that Dynomite is not accessible
   b. Stop Redis process
3. Restore the data from a specific date
   a. provided in the configuration
4. Start Redis process and check if the data has been loaded.
5. Start Dynomite and check if process is up
![Page 27: Dynomite: A Highly Available, Distributed and Scalable Dynamo Layer--Ioannis Papapanagiotou, Shailesh Birari, Jason Cacciatore, Netflix](https://reader035.vdocument.in/reader035/viewer/2022062822/587c04f81a28ab7c668b7549/html5/thumbnails/27.jpg)
Dyno Client - Java API
● Connection Pooling
● Load Balancing
● Effective failover
● Pipelining
● Scatter/Gather
● Metrics, e.g. Netflix Insights
![Page 28: Dynomite: A Highly Available, Distributed and Scalable Dynamo Layer--Ioannis Papapanagiotou, Shailesh Birari, Jason Cacciatore, Netflix](https://reader035.vdocument.in/reader035/viewer/2022062822/587c04f81a28ab7c668b7549/html5/thumbnails/28.jpg)
Dyno Load Balancing
● Dyno client employs token aware load balancing.
● Dyno client is aware of the cluster topology of Dynomite within the region, and can write to the specific node that owns a key using consistent hashing.
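Token-aware selection can be illustrated with a small consistent-hashing ring. This sketch is written in Python for brevity (Dyno itself is a Java client) and rests on several assumptions: MD5 stands in for whatever hash Dyno really uses, the ring size is arbitrary, and a key is assigned to the first host whose token is greater than or equal to the key's token, wrapping around at the end. The class and host names are hypothetical.

```python
import bisect
import hashlib


class TokenRing:
    """Maps keys to hosts within one rack by token (illustrative only)."""

    def __init__(self, token_to_host):
        self._tokens = sorted(token_to_host)
        self._hosts = [token_to_host[t] for t in self._tokens]

    @staticmethod
    def token_for(key: str, ring_size: int = 2**32) -> int:
        # MD5 is a stand-in hash chosen for this sketch.
        digest = hashlib.md5(key.encode("utf-8")).digest()
        return int.from_bytes(digest[:4], "big") % ring_size

    def host_for_token(self, token: int) -> str:
        # First host token >= key token; past the last token, wrap to the first host.
        idx = bisect.bisect_left(self._tokens, token)
        return self._hosts[idx % len(self._hosts)]

    def host_for_key(self, key: str) -> str:
        return self.host_for_token(self.token_for(key))


ring = TokenRing({100: "host-1", 200: "host-2", 300: "host-3"})
print(ring.host_for_token(150))  # host-2
```

Sending the request straight to the owning host avoids the extra coordinator hop inside the rack.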
![Page 29: Dynomite: A Highly Available, Distributed and Scalable Dynamo Layer--Ioannis Papapanagiotou, Shailesh Birari, Jason Cacciatore, Netflix](https://reader035.vdocument.in/reader035/viewer/2022062822/587c04f81a28ab7c668b7549/html5/thumbnails/29.jpg)
Dyno Failover
● Dyno will route requests to different racks in failure scenarios.
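One way to picture this failover is below: prefer the token owner in the local rack and, if it is down, try the owner of the same token in another rack. The function `pick_host`, the `is_up` health callback, and the alphabetical fallback order are assumptions for this sketch; the slide does not describe how Dyno orders the remaining racks.

```python
def pick_host(owners_by_rack, local_rack, is_up):
    """Return the first healthy owner of a token, starting with the local rack.

    owners_by_rack (hypothetical) maps rack name -> host owning the token.
    is_up is a caller-supplied health check.
    """
    # Local rack first, then the remaining racks in a fixed (sorted) order.
    candidates = [local_rack] + sorted(r for r in owners_by_rack if r != local_rack)
    for rack in candidates:
        host = owners_by_rack[rack]
        if is_up(host):
            return host
    raise RuntimeError("no healthy replica for this token")


owners = {"rack-a": "a1", "rack-b": "b1", "rack-c": "c1"}
down = {"a1"}
print(pick_host(owners, "rack-a", lambda host: host not in down))  # b1
```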
![Page 30: Dynomite: A Highly Available, Distributed and Scalable Dynamo Layer--Ioannis Papapanagiotou, Shailesh Birari, Jason Cacciatore, Netflix](https://reader035.vdocument.in/reader035/viewer/2022062822/587c04f81a28ab7c668b7549/html5/thumbnails/30.jpg)
Roadmap
● Multi-threaded support for Dynomite
● Data reconciliation & repair v2
● Dynomite-spark connector
● Investigation for persistent stores
● Async Dyno Client
● Others….
![Page 31: Dynomite: A Highly Available, Distributed and Scalable Dynamo Layer--Ioannis Papapanagiotou, Shailesh Birari, Jason Cacciatore, Netflix](https://reader035.vdocument.in/reader035/viewer/2022062822/587c04f81a28ab7c668b7549/html5/thumbnails/31.jpg)
More information
● Netflix OSS:
  o https://github.com/Netflix/dynomite
  o https://github.com/Netflix/dyno
● Contact:
  o email: {sbirari, ipapapanagiotou}@netflix.com
![Page 32: Dynomite: A Highly Available, Distributed and Scalable Dynamo Layer--Ioannis Papapanagiotou, Shailesh Birari, Jason Cacciatore, Netflix](https://reader035.vdocument.in/reader035/viewer/2022062822/587c04f81a28ab7c668b7549/html5/thumbnails/32.jpg)