RedisConf17 - Using Redis at Scale @ Twitter
Nighthawk
Distributed caching with Redis @ Twitter
Rashmi Ramesh (@rashmi_ur)
Agenda
What is Nighthawk?
How does it work?
Scaling out
High availability
Current challenges
Nighthawk - cache-as-a-service
Runs Redis at its core
> 10M QPS
Largest cluster runs ~3K Redis nodes
> 10TB of data
Who uses Nighthawk?
Some of our biggest customers:
Analytics services - Ads, Video
Ad serving
Ad Exchange
Direct Messaging
Mobile app conversion tracking
Design Goals
Scalable: scale vertically and horizontally
Elastic: add / remove instances without violating SLA
High throughput and low latencies
High availability in the event of machine failures
Topology agnostic client
Nighthawk Architecture
[Diagram: Client → Proxy/Routing layer → cache backends (Backend 0 … Backend N), each running Redis nodes (Redis 0 … Redis N), with a Topology / Cluster manager alongside. Each cache backend is a Mesos container holding the Redis nodes plus a topology watcher and announcer.]
Topology
Proxy/Router
Replica 1 -> Redis1 (dc, host, port1, capacity)
Replica 2 -> Redis2 (dc, host, port2, capacity)
Replica 3 -> Redis3 (dc, host, port3, capacity)
Cluster manager
Manages topology membership and changes:
- (Re)balances replicas
- Reacts to topology changes, e.g. a dead node
- Replicated cache: ensures 2 replicas of the same partition are on separate failure domains
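As a concrete illustration of that placement rule, here is a hypothetical Python sketch (not Twitter's implementation): the RedisNode fields mirror the (dc, host, port, capacity) descriptors from the topology slide, with an assumed rack field added for the failure-domain check, and place_replicas picks the least-loaded nodes for a partition while keeping its two replicas on different hosts and racks.

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass(frozen=True)
class RedisNode:
    dc: str
    host: str
    rack: str       # assumed failure-domain attribute, not shown on the slide
    port: int
    capacity: int   # max partitions this node can hold

def place_replicas(partition: int,
                   nodes: List[RedisNode],
                   load: Dict[RedisNode, int],
                   rf: int = 2) -> List[RedisNode]:
    """Pick `rf` nodes for a partition, least-loaded first, never reusing
    a host or rack that already holds a replica of this partition."""
    placement: List[RedisNode] = []
    used_hosts, used_racks = set(), set()
    for node in sorted(nodes, key=lambda n: load.get(n, 0)):
        if load.get(node, 0) >= node.capacity:
            continue
        if node.host in used_hosts or node.rack in used_racks:
            continue  # keep replicas in separate failure domains
        placement.append(node)
        used_hosts.add(node.host)
        used_racks.add(node.rack)
        if len(placement) == rf:
            break
    if len(placement) < rf:
        raise RuntimeError(f"not enough failure domains for partition {partition}")
    for node in placement:
        load[node] = load.get(node, 0) + 1
    return placement
```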
Redis databases for partitions
Partition -> Redis DB
Granular key remapping
Logical data isolation
Enumerating a partition: SCAN on its Redis DB
Deleting a partition: FLUSHDB
Enables replica rehydration
[Diagram: keys K1–K4 hash into Partition X and Partition Y, each stored in its own Redis DB (1 and 2).]
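A rough sketch of the "one Redis DB per partition" idea, under stated assumptions: the redis-py client, a stable CRC32-based partitioning function, and a Redis instance configured with enough numbered databases. The actual partitioning function and client library are not shown in the talk.

```python
import zlib
import redis  # redis-py, assumed client library for this sketch

NUM_PARTITIONS = 100  # per the scaling slides; needs `databases 100` (or more) in redis.conf

def partition_for(key: str) -> int:
    # Hypothetical, stable partitioning function; not the one Nighthawk uses.
    return zlib.crc32(key.encode()) % NUM_PARTITIONS

def db_for_partition(host: str, port: int, partition: int) -> redis.Redis:
    # Each partition gets its own numbered Redis DB on whichever node hosts it,
    # which gives logical data isolation between partitions.
    return redis.Redis(host=host, port=port, db=partition)

def enumerate_partition(conn: redis.Redis):
    # Enumerating a partition is a SCAN over its dedicated DB.
    yield from conn.scan_iter(count=1000)

def drop_partition(conn: redis.Redis) -> None:
    # Deleting a partition (e.g. after it has moved elsewhere) is a single FLUSHDB.
    conn.flushdb()
```

Granular key remapping and replica rehydration then become per-DB operations (scan one DB and rewrite its keys elsewhere) rather than whole-instance scans.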
Scaling
Scaling out with Client/Proxy-managed partitioning
Key count: 1.5M keys
[Diagram: the client hashes keys directly across 3 cache nodes, ~500K keys each.]
Scaling out with Client/Proxy-managed partitioning
Key count: 1.5M keys
Remapped keys: 600K
[Diagram: after 2 nodes are added, the client re-hashes keys across 5 nodes (~300K each); the ~600K remapped keys must be refilled from persistent storage.]
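The 600K figure is roughly what an even redistribution predicts: when 1.5M keys go from 3 owners to 5, about 2/5 of them change owner. A minimal consistent-hash-ring sketch (hypothetical; Nighthawk's actual client/proxy hashing is not specified in the talk) shows the effect on a scaled-down key set:

```python
import bisect
import hashlib

def _h(s: str) -> int:
    # Stable hash for both ring points and keys.
    return int(hashlib.md5(s.encode()).hexdigest(), 16)

class HashRing:
    """Minimal consistent-hash ring, for illustration only."""
    def __init__(self, nodes, vnodes=100):
        self.ring = sorted((_h(f"{n}#{v}"), n) for n in nodes for v in range(vnodes))
        self.points = [p for p, _ in self.ring]

    def node_for(self, key: str) -> str:
        i = bisect.bisect(self.points, _h(key)) % len(self.ring)
        return self.ring[i][1]

keys = [f"key:{i}" for i in range(150_000)]            # scaled-down sample of 1.5M
before = HashRing([f"node{i}" for i in range(3)])      # 3 cache nodes
after = HashRing([f"node{i}" for i in range(5)])       # 2 nodes added
moved = sum(1 for k in keys if before.node_for(k) != after.node_for(k))
print(f"{moved / len(keys):.0%} of keys change owner") # roughly 40%, i.e. ~600K of 1.5M
```

Every remapped key is a cache miss that has to be refilled from persistent storage, which is what makes this scheme expensive to scale.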
Scaling out with Cluster manager
Key count: 1.5M keys
Partition count: 100
Keys/partition: 15K
[Diagram: Client → Proxy → 3 cache backends (~500K keys each), with the topology and cluster manager coordinating placement and persistent storage behind the cache.]
Scaling out with Cluster manager
Key count: 1.5M keys
Partition count: 100
Keys/partition: 15K
[Diagram: a new backend is added; the cluster manager moves one partition (15K keys) off an existing backend (now 485K) onto the new backend.]
Scaling out with Cluster manager
Key count: 1.5M keys
Partition count: 100
Keys/partition: 15K
[Diagram: another partition (15K keys) moves to a second new backend; the backends now hold 485K, 485K, 500K, 15K and 15K keys.]
Scaling out with Cluster manager - post balancing
Key count: 1.5M keys
Partition count: 100
[Diagram: after balancing, the cluster manager has spread partitions across the backends (250K, 250K, 250K, 250K and 500K keys), with the client still talking only to the proxy.]
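One way to picture the cluster-manager approach, as a hypothetical sketch rather than Nighthawk's actual balancer: keys hash to one of 100 fixed partitions, partitions (not keys) are assigned to backends, and adding a backend only moves whole 15K-key partitions while clients keep talking to the same proxy.

```python
from collections import defaultdict
from typing import Dict, List

def rebalance(assignment: Dict[int, str], backends: List[str]) -> Dict[int, str]:
    """Greedily move partitions from the most-loaded to the least-loaded
    backend until partition counts are as even as possible."""
    new_assignment = dict(assignment)
    loads = defaultdict(list)
    for partition, backend in new_assignment.items():
        loads[backend].append(partition)
    for backend in backends:
        loads.setdefault(backend, [])   # a freshly added backend starts empty
    while True:
        heaviest = max(loads, key=lambda b: len(loads[b]))
        lightest = min(loads, key=lambda b: len(loads[b]))
        if len(loads[heaviest]) - len(loads[lightest]) <= 1:
            return new_assignment
        partition = loads[heaviest].pop()     # move one whole partition (~15K keys)
        loads[lightest].append(partition)
        new_assignment[partition] = lightest

# 100 partitions initially spread over 3 backends; adding "b3" triggers a
# rebalance that moves whole partitions only, invisible to the thin client.
assignment = {p: f"b{p % 3}" for p in range(100)}
assignment = rebalance(assignment, ["b0", "b1", "b2", "b3"])
```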
Advantages over Client-managed partitioning
- Thin client - simple and oblivious to topology
- Clients, proxy layer and backends scale independently
- Pluggable custom load balancing logic through cluster manager
- No cluster downtime during scaling out/up/back
High Availability
High Availability with Replication
Synchronous, best effort
RF = 2, intra-DC
Supports idempotent operations only: get, put, remove, count, scan
Copies of a partition are never on the same host or rack
Passive warming for failed/restarted replicas
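A minimal sketch of the "synchronous, best effort" write path under RF = 2, using redis-py and assumed names (this is not the actual proxy code): the write is issued to both replicas of the partition inline with the request, and a failure on one replica is logged and tolerated. Restricting the API to idempotent operations is what makes this safe, since a missed write can later be repaired simply by re-applying it.

```python
import logging
from typing import List

import redis  # redis-py, assumed client library for this sketch

def replicated_put(replicas: List[redis.Redis], key: str, value: str) -> int:
    """Write to every replica of the key's partition; return the number of acks.
    Best effort: a failed replica write is logged rather than retried in order."""
    acks = 0
    for conn in replicas:
        try:
            conn.set(key, value)
            acks += 1
        except redis.RedisError:
            logging.warning("best-effort replica write failed for key %s", key)
    if acks == 0:
        raise redis.RedisError(f"write failed on all replicas for key {key}")
    return acks
```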
High Availability with Replication
[Diagram: partition 5 is replicated across two pools. In Pool A, Backend 0 (partitions 2, 5, 9) is SERVING; in Pool B, Backend N (partitions 12, 5, 10) has FAILED and its replacement Backend N* is WARMING. "Get key in partition 5" requests are routed to the SERVING replica, while "set key in partition 5" writes continue to flow, passively warming the replacement replica.]
Current challenges
Remember this?
The most retweeted Tweet of 2014!
Hot key symptom
Significantly high QPS to a single cache server
Hot Key Mitigation
Server-side diagnostics:
Sampling a small % of requests and logging
Post-processing the logs to identify high-frequency keys
Client-side solution:
Client-side hot key detection and caching
Better to have:
Redis tracks the hot keys
Protocol support to send feedback to the client when a key is hot
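The sampling diagnostic can be sketched in a few lines (hypothetical code, not the production tooling): observe a small percentage of requests, count key frequencies, and treat the most frequent sampled keys as hot-key candidates.

```python
import random
from collections import Counter

SAMPLE_RATE = 0.01  # assumed: log ~1% of requests

class HotKeySampler:
    """Sample a small fraction of requests and surface high-frequency keys."""

    def __init__(self, sample_rate: float = SAMPLE_RATE):
        self.sample_rate = sample_rate
        self.counts = Counter()

    def observe(self, key: str) -> None:
        # Called on the request path; cheap because most requests are skipped.
        if random.random() < self.sample_rate:
            self.counts[key] += 1

    def hot_keys(self, top_n: int = 10):
        # The "post-processing" step: the top sampled keys are hot-key suspects.
        return self.counts.most_common(top_n)
```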
Active warming of replicas
[Diagram: in Pool A, Backend A (partitions 2, 5, 9) is SERVING; in Pool B, Backend B* (partitions 12, 5, 10) is WARMING. In addition to the normal write path through the proxy/routing layer and the topology/cluster manager, a Bootstrapper issues writes into the warming replica to bring it up to date actively.]
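Because each partition lives in its own Redis DB, a Bootstrapper could in principle rehydrate a warming replica by scanning the partition's DB on the serving replica and copying it across while writes keep flowing to both pools. A hypothetical redis-py sketch of that copy step (TTL handling omitted):

```python
import redis  # redis-py, assumed client library for this sketch

def rehydrate_partition(serving: redis.Redis, warming: redis.Redis) -> None:
    """Copy one partition (one Redis DB) from the serving replica into the
    warming replica using DUMP/RESTORE."""
    for key in serving.scan_iter(count=1000):
        payload = serving.dump(key)          # serialized value, any Redis type
        if payload is not None:              # key may have expired mid-scan
            warming.restore(key, 0, payload, replace=True)
```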
Questions?