rethinking topology in cassandra -...

59
Eric Evans [email protected] @jericevans ApacheCon Europe November 7, 2012 Rethinking Topology in Cassandra Wednesday, November 7, 12

Upload: others

Post on 17-Jun-2020

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Rethinking Topology in Cassandra - ApacheConarchive.apachecon.com/...Cloud/...rethinking-topology-in-cassandra.pdf · Rethinking Topology in Cassandra Wednesday, November 7, 12. DHT

Eric [email protected]

@jericevans

ApacheCon EuropeNovember 7, 2012

Rethinking Topology in Cassandra

Wednesday, November 7, 12

Page 2: Rethinking Topology in Cassandra - ApacheConarchive.apachecon.com/...Cloud/...rethinking-topology-in-cassandra.pdf · Rethinking Topology in Cassandra Wednesday, November 7, 12. DHT

DHT 101

Wednesday, November 7, 12

Page 3: Rethinking Topology in Cassandra - ApacheConarchive.apachecon.com/...Cloud/...rethinking-topology-in-cassandra.pdf · Rethinking Topology in Cassandra Wednesday, November 7, 12. DHT

DHT 101partitioning

AZ

Wednesday, November 7, 12

Page 4: Rethinking Topology in Cassandra - ApacheConarchive.apachecon.com/...Cloud/...rethinking-topology-in-cassandra.pdf · Rethinking Topology in Cassandra Wednesday, November 7, 12. DHT

DHT 101partitioning

A

B

C

Y

Z

Wednesday, November 7, 12

Page 5: Rethinking Topology in Cassandra - ApacheConarchive.apachecon.com/...Cloud/...rethinking-topology-in-cassandra.pdf · Rethinking Topology in Cassandra Wednesday, November 7, 12. DHT

DHT 101partitioning

A

B

C

Y Key = Aaa

Z

Wednesday, November 7, 12

Page 6: Rethinking Topology in Cassandra - ApacheConarchive.apachecon.com/...Cloud/...rethinking-topology-in-cassandra.pdf · Rethinking Topology in Cassandra Wednesday, November 7, 12. DHT

DHT 101replica placement

A

B

C

Y Key = Aaa

Z

Wednesday, November 7, 12

Page 7: Rethinking Topology in Cassandra - ApacheConarchive.apachecon.com/...Cloud/...rethinking-topology-in-cassandra.pdf · Rethinking Topology in Cassandra Wednesday, November 7, 12. DHT

DHT 101consistency

Consistency

Availability

Partition tolerance

Wednesday, November 7, 12

Page 8: Rethinking Topology in Cassandra - ApacheConarchive.apachecon.com/...Cloud/...rethinking-topology-in-cassandra.pdf · Rethinking Topology in Cassandra Wednesday, November 7, 12. DHT

A

?

?

W

DHT 101scenario: consistency level = one

Wednesday, November 7, 12

Page 9: Rethinking Topology in Cassandra - ApacheConarchive.apachecon.com/...Cloud/...rethinking-topology-in-cassandra.pdf · Rethinking Topology in Cassandra Wednesday, November 7, 12. DHT

A

?

?

R

DHT 101scenario: consistency level = all

Wednesday, November 7, 12

Page 10: Rethinking Topology in Cassandra - ApacheConarchive.apachecon.com/...Cloud/...rethinking-topology-in-cassandra.pdf · Rethinking Topology in Cassandra Wednesday, November 7, 12. DHT

DHT 101scenario: quorum write

A

B

?

R+W > N

W

Wednesday, November 7, 12

Page 11: Rethinking Topology in Cassandra - ApacheConarchive.apachecon.com/...Cloud/...rethinking-topology-in-cassandra.pdf · Rethinking Topology in Cassandra Wednesday, November 7, 12. DHT

DHT 101scenario: quorum read

?

B

C

R+W > N

R

Wednesday, November 7, 12

Page 12: Rethinking Topology in Cassandra - ApacheConarchive.apachecon.com/...Cloud/...rethinking-topology-in-cassandra.pdf · Rethinking Topology in Cassandra Wednesday, November 7, 12. DHT

Awesome, yes?

Wednesday, November 7, 12

Page 13: Rethinking Topology in Cassandra - ApacheConarchive.apachecon.com/...Cloud/...rethinking-topology-in-cassandra.pdf · Rethinking Topology in Cassandra Wednesday, November 7, 12. DHT

Well...

Wednesday, November 7, 12

Page 14: Rethinking Topology in Cassandra - ApacheConarchive.apachecon.com/...Cloud/...rethinking-topology-in-cassandra.pdf · Rethinking Topology in Cassandra Wednesday, November 7, 12. DHT

Problem:Poor load distribution

Wednesday, November 7, 12

Page 15: Rethinking Topology in Cassandra - ApacheConarchive.apachecon.com/...Cloud/...rethinking-topology-in-cassandra.pdf · Rethinking Topology in Cassandra Wednesday, November 7, 12. DHT

Distributing Load

A

B

C

Y

Z

M

Wednesday, November 7, 12

Page 16: Rethinking Topology in Cassandra - ApacheConarchive.apachecon.com/...Cloud/...rethinking-topology-in-cassandra.pdf · Rethinking Topology in Cassandra Wednesday, November 7, 12. DHT

Distributing Load

A

B

C

Y

Z

M

Wednesday, November 7, 12

Page 17: Rethinking Topology in Cassandra - ApacheConarchive.apachecon.com/...Cloud/...rethinking-topology-in-cassandra.pdf · Rethinking Topology in Cassandra Wednesday, November 7, 12. DHT

Distributing Load

A

B

C

Y

Z

M

Wednesday, November 7, 12

Page 18: Rethinking Topology in Cassandra - ApacheConarchive.apachecon.com/...Cloud/...rethinking-topology-in-cassandra.pdf · Rethinking Topology in Cassandra Wednesday, November 7, 12. DHT

Distributing Load

A

B

C

Y

Z

M

Wednesday, November 7, 12

Page 19: Rethinking Topology in Cassandra - ApacheConarchive.apachecon.com/...Cloud/...rethinking-topology-in-cassandra.pdf · Rethinking Topology in Cassandra Wednesday, November 7, 12. DHT

Distributing Load

AZ

Y B

CM

Wednesday, November 7, 12

Page 20: Rethinking Topology in Cassandra - ApacheConarchive.apachecon.com/...Cloud/...rethinking-topology-in-cassandra.pdf · Rethinking Topology in Cassandra Wednesday, November 7, 12. DHT

Distributing Load

A

B

C

Y

Z A1

M

Wednesday, November 7, 12

Page 21: Rethinking Topology in Cassandra - ApacheConarchive.apachecon.com/...Cloud/...rethinking-topology-in-cassandra.pdf · Rethinking Topology in Cassandra Wednesday, November 7, 12. DHT

Distributing Load

A

B

C

Y

Z A1

M

Wednesday, November 7, 12

Page 22: Rethinking Topology in Cassandra - ApacheConarchive.apachecon.com/...Cloud/...rethinking-topology-in-cassandra.pdf · Rethinking Topology in Cassandra Wednesday, November 7, 12. DHT

Distributing Load

AZ

Y B

C

A1

M

Wednesday, November 7, 12

Page 23: Rethinking Topology in Cassandra - ApacheConarchive.apachecon.com/...Cloud/...rethinking-topology-in-cassandra.pdf · Rethinking Topology in Cassandra Wednesday, November 7, 12. DHT

Problem:Poor data distribution

Wednesday, November 7, 12

Page 24: Rethinking Topology in Cassandra - ApacheConarchive.apachecon.com/...Cloud/...rethinking-topology-in-cassandra.pdf · Rethinking Topology in Cassandra Wednesday, November 7, 12. DHT

Distributing DataA

B

CD

Wednesday, November 7, 12

Page 25: Rethinking Topology in Cassandra - ApacheConarchive.apachecon.com/...Cloud/...rethinking-topology-in-cassandra.pdf · Rethinking Topology in Cassandra Wednesday, November 7, 12. DHT

Distributing DataA

B

CD

E

Wednesday, November 7, 12

Page 26: Rethinking Topology in Cassandra - ApacheConarchive.apachecon.com/...Cloud/...rethinking-topology-in-cassandra.pdf · Rethinking Topology in Cassandra Wednesday, November 7, 12. DHT

Distributing Data

A

B

CD

EA

C

B

D

Wednesday, November 7, 12

Page 27: Rethinking Topology in Cassandra - ApacheConarchive.apachecon.com/...Cloud/...rethinking-topology-in-cassandra.pdf · Rethinking Topology in Cassandra Wednesday, November 7, 12. DHT

Distributing Data

A

B

CD

EA

C

B

D

Wednesday, November 7, 12

Page 28: Rethinking Topology in Cassandra - ApacheConarchive.apachecon.com/...Cloud/...rethinking-topology-in-cassandra.pdf · Rethinking Topology in Cassandra Wednesday, November 7, 12. DHT

Distributing DataA

B

CD

H E

FG

Wednesday, November 7, 12

Page 29: Rethinking Topology in Cassandra - ApacheConarchive.apachecon.com/...Cloud/...rethinking-topology-in-cassandra.pdf · Rethinking Topology in Cassandra Wednesday, November 7, 12. DHT

Distributing DataA

B

CD

H E

FG

Wednesday, November 7, 12

Page 30: Rethinking Topology in Cassandra - ApacheConarchive.apachecon.com/...Cloud/...rethinking-topology-in-cassandra.pdf · Rethinking Topology in Cassandra Wednesday, November 7, 12. DHT

Virtual Nodes

Wednesday, November 7, 12

Page 31: Rethinking Topology in Cassandra - ApacheConarchive.apachecon.com/...Cloud/...rethinking-topology-in-cassandra.pdf · Rethinking Topology in Cassandra Wednesday, November 7, 12. DHT

In a nutshell...

host

host

host

Wednesday, November 7, 12

Page 32: Rethinking Topology in Cassandra - ApacheConarchive.apachecon.com/...Cloud/...rethinking-topology-in-cassandra.pdf · Rethinking Topology in Cassandra Wednesday, November 7, 12. DHT

Benefits

• Operationally simpler (no token management)

• Better distribution of load

• Concurrent streaming involving all hosts

• Smaller partitions mean greater reliability

• Supports heterogenous hardware

Wednesday, November 7, 12

Page 33: Rethinking Topology in Cassandra - ApacheConarchive.apachecon.com/...Cloud/...rethinking-topology-in-cassandra.pdf · Rethinking Topology in Cassandra Wednesday, November 7, 12. DHT

Strategies

• Automatic sharding

• Fixed partition assignment

• Random token assignment

Wednesday, November 7, 12

Page 34: Rethinking Topology in Cassandra - ApacheConarchive.apachecon.com/...Cloud/...rethinking-topology-in-cassandra.pdf · Rethinking Topology in Cassandra Wednesday, November 7, 12. DHT

StrategyAutomatic Sharding

• Partitions are split when data exceeds a threshold

• Newly created partitions are relocated to a host with lower data load

• Similar to sharding performed by Bigtable, or Mongo auto-sharding

Wednesday, November 7, 12

Page 35: Rethinking Topology in Cassandra - ApacheConarchive.apachecon.com/...Cloud/...rethinking-topology-in-cassandra.pdf · Rethinking Topology in Cassandra Wednesday, November 7, 12. DHT

StrategyFixed Partition Assignment

• Namespace divided into Q evenly-sized partitions

• Q/N partitions assigned per host (where N is the number of hosts)

• Joining hosts “steal” partitions evenly from existing hosts.

• Used by Dynamo and Voldemort (described in Dynamo paper as “strategy 3”)

Wednesday, November 7, 12

Page 36: Rethinking Topology in Cassandra - ApacheConarchive.apachecon.com/...Cloud/...rethinking-topology-in-cassandra.pdf · Rethinking Topology in Cassandra Wednesday, November 7, 12. DHT

StrategyRandom Token Assignment

• Each host assigned T random tokens

• T random tokens generated for joining hosts; New tokens divide existing ranges

• Similar to libketama; Identical to Classic Cassandra when T=1

Wednesday, November 7, 12

Page 37: Rethinking Topology in Cassandra - ApacheConarchive.apachecon.com/...Cloud/...rethinking-topology-in-cassandra.pdf · Rethinking Topology in Cassandra Wednesday, November 7, 12. DHT

Considerations

1. Number of partitions

2. Partition size

3. How 1 changes with more nodes and data

4. How 2 changes with more nodes and data

Wednesday, November 7, 12

Page 38: Rethinking Topology in Cassandra - ApacheConarchive.apachecon.com/...Cloud/...rethinking-topology-in-cassandra.pdf · Rethinking Topology in Cassandra Wednesday, November 7, 12. DHT

Evaluating

Strategy No. Partitions Partition size

Random O(N) O(B/N)

Fixed O(1) O(B)

Auto-sharding O(B) O(1)

B ~ total data size, N ~ number of hosts

Wednesday, November 7, 12

Page 39: Rethinking Topology in Cassandra - ApacheConarchive.apachecon.com/...Cloud/...rethinking-topology-in-cassandra.pdf · Rethinking Topology in Cassandra Wednesday, November 7, 12. DHT

Evaluating

• Automatic sharding

• partition size constant (great)

• number of partitions scales linearly with data size (bad)

• Fixed partition assignment

• Random token assignment

Wednesday, November 7, 12

Page 40: Rethinking Topology in Cassandra - ApacheConarchive.apachecon.com/...Cloud/...rethinking-topology-in-cassandra.pdf · Rethinking Topology in Cassandra Wednesday, November 7, 12. DHT

Evaluating

• Automatic sharding

• Fixed partition assignment

• Number of partitions is constant (good)

• Partition size scales linearly with data size (bad)

• Higher operational complexity (bad)

• Random token assignment

Wednesday, November 7, 12

Page 41: Rethinking Topology in Cassandra - ApacheConarchive.apachecon.com/...Cloud/...rethinking-topology-in-cassandra.pdf · Rethinking Topology in Cassandra Wednesday, November 7, 12. DHT

Evaluating

• Automatic sharding

• Fixed partition assignment

• Random token assignment

• Number of partitions scales linearly with number of hosts (good ok)

• Partition size increases with more data; decreases with more hosts (good)

Wednesday, November 7, 12

Page 42: Rethinking Topology in Cassandra - ApacheConarchive.apachecon.com/...Cloud/...rethinking-topology-in-cassandra.pdf · Rethinking Topology in Cassandra Wednesday, November 7, 12. DHT

Evaluating

• Automatic sharding

• Fixed partition assignment

• Random token assignment

Wednesday, November 7, 12

Page 43: Rethinking Topology in Cassandra - ApacheConarchive.apachecon.com/...Cloud/...rethinking-topology-in-cassandra.pdf · Rethinking Topology in Cassandra Wednesday, November 7, 12. DHT

Cassandra

Wednesday, November 7, 12

Page 44: Rethinking Topology in Cassandra - ApacheConarchive.apachecon.com/...Cloud/...rethinking-topology-in-cassandra.pdf · Rethinking Topology in Cassandra Wednesday, November 7, 12. DHT

Configurationconf/cassandra.yaml

# Comma separated list of tokens,# (new installs only).initial_token:<token>,<token>,<token>

or

# Number of tokens to generate.num_tokens: 256

Wednesday, November 7, 12

Page 45: Rethinking Topology in Cassandra - ApacheConarchive.apachecon.com/...Cloud/...rethinking-topology-in-cassandra.pdf · Rethinking Topology in Cassandra Wednesday, November 7, 12. DHT

Configurationnodetool info

Token : (invoke with -T/--tokens to see all 256 tokens)ID : 64090651-6034-41d5-bfc6-ddd24957f164Gossip active : trueThrift active : trueLoad : 92.69 KBGeneration No : 1351030018Uptime (seconds): 45Heap Memory (MB): 95.16 / 1956.00Data Center : datacenter1Rack : rack1Exceptions : 0Key Cache : size 240 (bytes), capacity 101711872 (bytes ...Row Cache : size 0 (bytes), capacity 0 (bytes), 0 hits, ...

Wednesday, November 7, 12

Page 46: Rethinking Topology in Cassandra - ApacheConarchive.apachecon.com/...Cloud/...rethinking-topology-in-cassandra.pdf · Rethinking Topology in Cassandra Wednesday, November 7, 12. DHT

Configurationnodetool ring

Datacenter: datacenter1==========Replicas: 2

Address Rack Status State Load Owns Token 9022770486425350384127.0.0.1 rack1 Up Normal 97.24 KB 66.03% -9182469192098976078127.0.0.1 rack1 Up Normal 97.24 KB 66.03% -9054823614314102214127.0.0.1 rack1 Up Normal 97.24 KB 66.03% -8970752544645156769127.0.0.1 rack1 Up Normal 97.24 KB 66.03% -8927190060345427739127.0.0.1 rack1 Up Normal 97.24 KB 66.03% -8880475677109843259127.0.0.1 rack1 Up Normal 97.24 KB 66.03% -8817876497520861779127.0.0.1 rack1 Up Normal 97.24 KB 66.03% -8810512134942064901127.0.0.1 rack1 Up Normal 97.24 KB 66.03% -8661764562509480261127.0.0.1 rack1 Up Normal 97.24 KB 66.03% -8641550925069186492127.0.0.1 rack1 Up Normal 97.24 KB 66.03% -8636224350654790732......

Wednesday, November 7, 12

Page 47: Rethinking Topology in Cassandra - ApacheConarchive.apachecon.com/...Cloud/...rethinking-topology-in-cassandra.pdf · Rethinking Topology in Cassandra Wednesday, November 7, 12. DHT

Configurationnodetool status

Datacenter: datacenter1=======================Status=Up/Down|/ State=Normal/Leaving/Joining/Moving-- Address Load Tokens Owns Host ID RackUN 10.0.0.1 97.2 KB 256 66.0% 64090651-6034-41d5-bfc6-ddd24957f164 rack1UN 10.0.0.2 92.7 KB 256 66.2% b3c3b03c-9202-4e7b-811a-9de89656ec4c rack1UN 10.0.0.3 92.6 KB 256 67.7% e4eef159-cb77-4627-84c4-14efbc868082 rack1

Wednesday, November 7, 12

Page 48: Rethinking Topology in Cassandra - ApacheConarchive.apachecon.com/...Cloud/...rethinking-topology-in-cassandra.pdf · Rethinking Topology in Cassandra Wednesday, November 7, 12. DHT

Configurationnodetool status

Datacenter: datacenter1=======================Status=Up/Down|/ State=Normal/Leaving/Joining/Moving-- Address Load Tokens Owns Host ID RackUN 10.0.0.1 97.2 KB 256 66.0% 64090651-6034-41d5-bfc6-ddd24957f164 rack1UN 10.0.0.2 92.7 KB 256 66.2% b3c3b03c-9202-4e7b-811a-9de89656ec4c rack1UN 10.0.0.3 92.6 KB 256 67.7% e4eef159-cb77-4627-84c4-14efbc868082 rack1

Wednesday, November 7, 12

Page 49: Rethinking Topology in Cassandra - ApacheConarchive.apachecon.com/...Cloud/...rethinking-topology-in-cassandra.pdf · Rethinking Topology in Cassandra Wednesday, November 7, 12. DHT

Configurationnodetool status

Datacenter: datacenter1=======================Status=Up/Down|/ State=Normal/Leaving/Joining/Moving-- Address Load Tokens Owns Host ID RackUN 10.0.0.1 97.2 KB 256 66.0% 64090651-6034-41d5-bfc6-ddd24957f164 rack1UN 10.0.0.2 92.7 KB 256 66.2% b3c3b03c-9202-4e7b-811a-9de89656ec4c rack1UN 10.0.0.3 92.6 KB 256 67.7% e4eef159-cb77-4627-84c4-14efbc868082 rack1

Wednesday, November 7, 12

Page 50: Rethinking Topology in Cassandra - ApacheConarchive.apachecon.com/...Cloud/...rethinking-topology-in-cassandra.pdf · Rethinking Topology in Cassandra Wednesday, November 7, 12. DHT

MigrationA

C B

Wednesday, November 7, 12

Page 51: Rethinking Topology in Cassandra - ApacheConarchive.apachecon.com/...Cloud/...rethinking-topology-in-cassandra.pdf · Rethinking Topology in Cassandra Wednesday, November 7, 12. DHT

Migrationedit conf/cassandra.yaml and restart

# Number of tokens to generate.num_tokens: 256

Wednesday, November 7, 12

Page 52: Rethinking Topology in Cassandra - ApacheConarchive.apachecon.com/...Cloud/...rethinking-topology-in-cassandra.pdf · Rethinking Topology in Cassandra Wednesday, November 7, 12. DHT

Migrationconvert to T contiguous tokens in existing ranges

AAAAAAAAAAAA

AA A A A A A A A A AC

AA

AA

AA

AAAA

AB

Wednesday, November 7, 12

Page 53: Rethinking Topology in Cassandra - ApacheConarchive.apachecon.com/...Cloud/...rethinking-topology-in-cassandra.pdf · Rethinking Topology in Cassandra Wednesday, November 7, 12. DHT

Migrationshuffle

AAAAAAAAAAAA

AA A A A A A A A A AC

AA

AA

AA

AAAA

AB

Wednesday, November 7, 12

Page 54: Rethinking Topology in Cassandra - ApacheConarchive.apachecon.com/...Cloud/...rethinking-topology-in-cassandra.pdf · Rethinking Topology in Cassandra Wednesday, November 7, 12. DHT

Shuffle

• Range transfers are queued on each host

• Hosts initiate transfer of ranges to self

• Pay attention to the logs!

Wednesday, November 7, 12

Page 55: Rethinking Topology in Cassandra - ApacheConarchive.apachecon.com/...Cloud/...rethinking-topology-in-cassandra.pdf · Rethinking Topology in Cassandra Wednesday, November 7, 12. DHT

Shufflebin/shuffle

Usage: shuffle [options] <sub-command>

Sub-commands: create Initialize a new shuffle operation ls List pending relocations clear Clear pending relocations en[able] Enable shuffling dis[able] Disable shuffling

Options: -dc, --only-dc Apply only to named DC (create only) -tp, --thrift-port Thrift port number (Default: 9160) -p, --port JMX port number (Default: 7199) -tf, --thrift-framed Enable framed transport for Thrift (Default: false) -en, --and-enable Immediately enable shuffling (create only) -H, --help Print help information -h, --host JMX hostname or IP address (Default: localhost) -th, --thrift-host Thrift hostname or IP address (Default: JMX host)

Wednesday, November 7, 12

Page 56: Rethinking Topology in Cassandra - ApacheConarchive.apachecon.com/...Cloud/...rethinking-topology-in-cassandra.pdf · Rethinking Topology in Cassandra Wednesday, November 7, 12. DHT

Performance

Wednesday, November 7, 12

Page 57: Rethinking Topology in Cassandra - ApacheConarchive.apachecon.com/...Cloud/...rethinking-topology-in-cassandra.pdf · Rethinking Topology in Cassandra Wednesday, November 7, 12. DHT

removenode

0

100

200

300

400

Acunu Reflex / Cassandra 1.2 Cassandra 1.1

Wednesday, November 7, 12

Page 58: Rethinking Topology in Cassandra - ApacheConarchive.apachecon.com/...Cloud/...rethinking-topology-in-cassandra.pdf · Rethinking Topology in Cassandra Wednesday, November 7, 12. DHT

bootstrap

0

125

250

375

500

Acunu Reflex / Cassandra 1.2 Cassandra 1.1

Wednesday, November 7, 12

Page 59: Rethinking Topology in Cassandra - ApacheConarchive.apachecon.com/...Cloud/...rethinking-topology-in-cassandra.pdf · Rethinking Topology in Cassandra Wednesday, November 7, 12. DHT

The End

• Giuseppe DeCandia, Deniz Hastorun, Madan Jampani, Gunavardhan Kakulapati, Avinash Lakshman, Alex Pilchin, Swaminathan Sivasubramanian, Peter Vosshall and Werner Vogels “Dynamo: Amazon’s Highly Available Key-value Store” Web.

• Low, Richard. “Improving Cassandra's uptime with virtual nodes” Web.

• Overton, Sam. “Virtual Nodes Strategies.” Web.

• Overton, Sam. “Virtual Nodes: Performance Results.” Web.

• Jones, Richard. "libketama - a consistent hashing algo for memcache clients” Web.

Wednesday, November 7, 12