Cassandra - From Tarball to Production

TRANSCRIPT

Page 1: Cassandra-From Tarball to Production

Cassandra - From Tarball to Production

Cassandra Users Group - March 2015
@rkuris Ron Kuris &lt;[email protected]&gt;

@Lookout Lookout Mobile Security

Page 2: Cassandra-From Tarball to Production

Why talk about this?
You are about to deploy Cassandra
You are looking for “best practices”
You don’t want:
... to scour through the documentation
... to do something known not to work well
... to forget to cover some important step

Page 3: Cassandra-From Tarball to Production

What we won’t cover
● Cassandra: how does it work?
● How do I design my schema?
● What’s new in Cassandra X.Y?

Page 4: Cassandra-From Tarball to Production

So many things to do
Monitoring ● Snitch ● DC/Rack Settings ● Time Sync
Seeds/Autoscaling ● Full/Incremental Backups
AWS Instance Selection ● AWS AMI (Image) Selection
Disk - SSD? ● Disk Space - 2x?
Periodic Repairs ● Replication Strategy ● Compaction Strategy
SSL/VPC/VPN ● Authorization + Authentication
OS Conf - Users ● OS Conf - Limits ● OS Conf - Perms ● OS Conf - FSType ● OS Conf - Logs ● OS Conf - Path
C* Start/Stop ● Use case evaluation

Page 5: Cassandra-From Tarball to Production

Chef to the rescue! (kinda)
Chef community cookbook available:
https://github.com/michaelklishin/cassandra-chef-cookbook
● Installs Java
● Creates a “cassandra” user/group
● Downloads/extracts the tarball
● Fixes up ownership
● Builds the C* configuration files
● Sets the ulimits for file handles, processes, memory locking
● Sets up an init script
● Sets up data directories

Page 6: Cassandra-From Tarball to Production

Chef Cookbook Coverage
Monitoring ● Snitch ● DC/Rack Settings ● Time Sync
Seeds/Autoscaling ● Full/Incremental Backups
Disk - SSD? ● Disk - How much?
AWS Instance Type ● AWS AMI (Image) Selection
Periodic Repairs ● Replication Strategy ● Compaction Strategy
SSL/VPC/VPN ● Authorization + Authentication
OS Conf - Users ● OS Conf - Limits ● OS Conf - Perms ● OS Conf - FSType ● OS Conf - Logs ● OS Conf - Path
C* Start/Stop ● Use case evaluation

Page 7: Cassandra-From Tarball to Production

Monitoring
Is every node answering queries?
Are nodes talking to each other?
Are any nodes running slowly?
Push UDP! (statsd)
http://hackers.lookout.com/2015/01/cassandra-monitoring/
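
A minimal sketch of what a UDP push to statsd looks like; the host, port, and metric name here are placeholders, not Lookout's actual setup:

echo "cassandra.node.up:1|g" | nc -u -w1 statsd.example.com 8125   # fire-and-forget gauge over UDP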

Page 8: Cassandra-From Tarball to Production

Monitoring - Synthetic Stuff
Health checks, bad and good
● ‘nodetool status’ exit code
  ○ Might return 0 even if the node is not accepting requests
  ○ Slow, cross-node reads
● cqlsh -u sysmon -p password < /dev/null
  ○ Verifies this node is OK, not the cluster
  ○ Depends on the auth query for a read
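
A sketch of the “good” check as a script a monitoring agent could run; it assumes a ‘sysmon’ account exists and its password is in $SYSMON_PASSWORD:

if cqlsh -u sysmon -p "$SYSMON_PASSWORD" localhost < /dev/null > /dev/null 2>&1; then
    echo "cassandra OK"              # login worked, this node answers CQL
else
    echo "cassandra FAILED"; exit 1
fi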

Page 9: Cassandra-From Tarball to Production

What about OpsCenter?
We chose not to use it
Want consistent interface for all monitoring
GUI vs command-line argument
Didn’t see good auditing capabilities
Didn’t interface well with our chef solution

Page 10: Cassandra-From Tarball to Production

Snitch
Use the right snitch!
● AWS? Ec2MultiRegionSnitch
● Google? GoogleCloudSnitch
● GossipingPropertyFileSnitch
NOT
● SimpleSnitch (the default)
Community cookbook: set it!
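
With a tarball install this is a one-line change in cassandra.yaml (pick one; these are the actual snitch class names):

endpoint_snitch: Ec2MultiRegionSnitch            # AWS, multi-region
# endpoint_snitch: GoogleCloudSnitch             # Google Compute Engine
# endpoint_snitch: GossipingPropertyFileSnitch   # bare metal / everything else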

Page 11: Cassandra-From Tarball to Production

What is RF?
Replication Factor is how many copies of the data are kept
The value is hashed to determine the primary host
Additional copies always go to the next node(s)


Page 12: Cassandra-From Tarball to Production

What is CL?
Consistency Level -- it’s not RF!
Describes how many nodes must respond before an operation is considered COMPLETE
CL_ONE - only one node responds
CL_QUORUM - (RF/2)+1 nodes respond (integer division, i.e. round down)
CL_ALL - all RF nodes respond
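
To make the rounding concrete, quorum size is floor(RF/2)+1:

for rf in 1 2 3 4 5; do echo "RF=$rf -> quorum of $(( rf / 2 + 1 ))"; done
# RF=3 needs 2 responses, RF=5 needs 3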

Page 13: Cassandra-From Tarball to Production

DC/Rack Settings
You might need to set these
EC2 snitches set the DC to ‘us-west’ and ‘us-east’ instead of ‘us-west-1’
Maybe you’re not in Amazon
Rack == Availability Zone?
Hard: renaming a DC or adding racks
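
If you use GossipingPropertyFileSnitch, the DC and rack come from cassandra-rackdc.properties on each node; the names below are only examples:

# conf/cassandra-rackdc.properties
dc=us-west-1
rack=us-west-1a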

Page 14: Cassandra-From Tarball to Production

Renaming DCs
Clients “remember” which DC they talk to
Renaming a single DC causes all clients to fail
Better to spin up a new DC than to rename the old one
EC2 snitches default the DC name to part of the region name
Example: us-west (not us-west-1 or us-west-2)

Page 15: Cassandra-From Tarball to Production

Adding a rack
Start with a 6-node cluster, all in rack R1
Replication factor 3
Add 1 node in R2 and rebalance
ALL the data ends up on the R2 node?
Good idea to keep racks balanced

Page 16: Cassandra-From Tarball to Production

I don’t have time for this
Clusters must have synchronized time
You will get lots of drift with: [0-3].amazon.pool.ntp.org
Community cookbook doesn’t cover anything here

Page 17: Cassandra-From Tarball to Production

Better make time for this
C* serializes write operations by timestamps
Clocks on virtual machines drift!
It’s the relative difference among clocks that matters
C* nodes should synchronize with each other
Solution: use a pair of peered NTP servers (stratum 2 or 3) and a small set of known upstream providers
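
A sketch of /etc/ntp.conf for one of the two internal NTP servers; hostnames are placeholders, and the C* nodes would then sync only against these two peers:

server 0.pool.ntp.org iburst
server 1.pool.ntp.org iburst
server 2.pool.ntp.org iburst
peer ntp2.internal.example.com    # the other internal NTP server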

Page 18: Cassandra-From Tarball to Production

From a small seed…
Seeds are used by new nodes to find the cluster
Every new node should use the same seeds
Seed nodes learn about topology changes faster
Each seed node must be listed in the config file
Multiple seeds per datacenter are recommended
Tricky to configure on AWS
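
In cassandra.yaml the seed list is a comma-separated string that should be identical on every node; the addresses below are placeholders:

seed_provider:
    - class_name: org.apache.cassandra.locator.SimpleSeedProvider
      parameters:
          - seeds: "10.0.1.10,10.0.2.10"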

Page 19: Cassandra-From Tarball to Production

Backups - Full + Incremental
Nothing in the cookbooks for this
C* makes it “easy”: snapshot, then copy
Snapshots might require a lot more space
Remove the snapshot after copying it
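
A hedged sketch of one full backup pass; the keyspace name, data path, and S3 bucket are assumptions:

nodetool snapshot -t nightly my_keyspace                  # hard-links SSTables, cheap and fast
tar czf /tmp/my_keyspace-nightly.tgz \
    /var/lib/cassandra/data/my_keyspace/*/snapshots/nightly
aws s3 cp /tmp/my_keyspace-nightly.tgz s3://my-backups/$(hostname)/
nodetool clearsnapshot my_keyspace                        # free the space once the copy is safe

Incremental backups are a separate cassandra.yaml flag (incremental_backups: true); you still have to ship and prune the hard-linked files yourself.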

Page 20: Cassandra-From Tarball to Production

Disk selectionSSD Rotational

EphemeralEBS

Low latency Any size instance Any size instance

Recommended Not cheap Less expensive

Great random r/w perf Good write performance No node rebuilds

No network use for disk No network use for disk

Page 21: Cassandra-From Tarball to Production

AWS Instance Selection
We moved to EC2
c3.2xlarge (15GiB mem, 160GB disk)?
i2.xlarge (30GiB mem, 800GB disk)
Max recommended storage per node is 1TB
Use instance types that support HVM
Some previous generation instance types, such as T1, C1, M1, and M2, do not support Linux HVM AMIs. Some current generation instance types, such as T2, I2, R3, G2, and C4, do not support PV AMIs.

Page 22: Cassandra-From Tarball to Production

How much can I use?
Snapshots take space (kind of)
Best practice: keep disks half full!
An 800GB disk becomes 400GB
Snapshots during repairs?
Lots of uses for snapshots!

Page 23: Cassandra-From Tarball to Production

Periodic Repairs
Buried in the docs:
“As a best practice, you should schedule repairs weekly”
http://www.datastax.com/documentation/cassandra/2.0/cassandra/operations/ops_repair_nodes_c.html
● “-pr” (yes)
● “-par” (maybe)
● “--in-local-dc” (no)
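
For example, a weekly primary-range repair from cron; the schedule, install path, and log file are assumptions:

# /etc/cron.d/cassandra-repair
0 3 * * 0   cassandra   /opt/cassandra/bin/nodetool repair -pr >> /var/log/cassandra/repair.log 2>&1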

Page 24: Cassandra-From Tarball to Production

Repair Tips
Raise gc_grace_seconds (tombstones?)
Run on one node at a time
Schedule for low-usage hours
Use “-par” if you have dead time (faster)
Tune with: nodetool setcompactionthroughput
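
For example, to open the throttle during a quiet window and then restore the 16 MB/s default:

nodetool setcompactionthroughput 0     # unthrottle while the repair runs
nodetool setcompactionthroughput 16    # back to the default afterwards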

Page 25: Cassandra-From Tarball to Production

I thought I deleted that
Compaction removes “old” tombstones
10-day default grace period (gc_grace_seconds)
After that, deletes will not be propagated!
Run ‘nodetool repair’ at least every 10 days
Once a week is perfect (3 days of slack)
Node down >7 days? ‘nodetool removenode’ it!

Page 26: Cassandra-From Tarball to Production

Changing RF within a DC?
Easy to decrease RF
Increasing RF usually can’t be done without impact
Reads with CL_ONE might fail until repair completes!


Page 27: Cassandra-From Tarball to Production

Replication Strategy
How many replicas should we have?
What happens if some data is lost?
Are you write-heavy or read-heavy?
Quorum considerations: odd is better!
RF=1? RF=3? RF=5?
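
With NetworkTopologyStrategy the replica count is set per DC; a sketch, with the keyspace, DC name, and credentials as placeholders:

echo "ALTER KEYSPACE my_ks WITH replication =
  {'class': 'NetworkTopologyStrategy', 'us-west': 3};" | cqlsh -u admin -p secret

If this raises RF, run ‘nodetool repair’ on the affected nodes afterwards.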

Page 28: Cassandra-From Tarball to Production

Compaction Strategy
Solved by using a good C* design
SizeTiered or Leveled?
● Leveled has better guarantees for read times
● SizeTiered may require 10 (or more) reads!
● Leveled uses less disk space
● Leveled tombstone collection is slower
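
Switching is a per-table setting; a sketch with a made-up keyspace/table name:

echo "ALTER TABLE my_ks.events WITH
  compaction = {'class': 'LeveledCompactionStrategy', 'sstable_size_in_mb': 160};" | cqlsh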

Page 29: Cassandra-From Tarball to Production

Auth*
Cookbooks default to OFF
Turn authenticator and authorizer on
The ‘cassandra’ user is super special:
● Requires QUORUM (cross-DC) for signon
● LOCAL_ONE for all other users!
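
Turning auth on is two cassandra.yaml lines, then replacing the default superuser; the user names and passwords below are placeholders:

# cassandra.yaml
authenticator: PasswordAuthenticator
authorizer: CassandraAuthorizer

# then, once the cluster is up:
echo "CREATE USER admin WITH PASSWORD 'use-something-long' SUPERUSER;
ALTER USER cassandra WITH PASSWORD 'random-and-never-used';" | cqlsh -u cassandra -p cassandra

It is also common to raise the replication factor of the system_auth keyspace so logins do not depend on a single node.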

Page 30: Cassandra-From Tarball to Production

Users
OS users vs Cassandra users: 1 to 1?
Shared credentials for apps?
Nothing logs the user taking the action!
The ‘cassandra’ user is created by the cookbook
All processes run as ‘cassandra’

Page 31: Cassandra-From Tarball to Production

Limits
Chef helps here! Startup:
ulimit -l unlimited   # mem lock
ulimit -n 48000       # fds

/etc/security/limits.d:
cassandra - nofile 48000
cassandra - nproc unlimited
cassandra - memlock unlimited

Page 32: Cassandra-From Tarball to Production

Filesystem Type
Officially supported: ext4 or XFS
XFS is slightly faster
Interesting options:
● ext4 without journal
● ext2
● zfs
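
A sketch of preparing a data volume as XFS; the device and mount point are assumptions:

mkfs.xfs /dev/xvdb                               # example device - this erases it!
mount -o noatime /dev/xvdb /var/lib/cassandra    # noatime skips access-time writes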

Page 33: Cassandra-From Tarball to Production

Logs
To consolidate or not to consolidate?
Push or pull? Usually push!
FOSS: syslogd, syslog-ng, logstash/kibana, heka, banana
Others: Splunk, SumoLogic, Loggly, Stackify

Page 34: Cassandra-From Tarball to Production

Shutdown
Nice init script with the cookbook; the steps are:
● nodetool disablethrift (no more clients)
● nodetool disablegossip (stop talking to the cluster)
● nodetool drain (flush all memtables)
● kill the JVM
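
The same sequence by hand, if you ever need it without the init script (the service name depends on your install):

nodetool disablethrift     # stop accepting client connections
nodetool disablegossip     # leave the ring
nodetool drain             # flush memtables, stop accepting writes
service cassandra stop     # or kill the JVM pid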

Page 35: Cassandra-From Tarball to Production

Quick performance wins
● Disable assertions - cookbook property
● No swap space (or vm.swappiness=1)
● Tune concurrent_reads / concurrent_writes (cassandra.yaml)
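
For the swap item, a sketch (needs root; persisting the sysctl is optional):

swapoff -a                                       # simplest: no swap at all
sysctl -w vm.swappiness=1                        # or keep swap but almost never use it
echo 'vm.swappiness = 1' >> /etc/sysctl.conf     # persist across reboots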

Page 36: Cassandra-From Tarball to Production

Stay in touch

Ron Kuris @rkuris &lt;[email protected]&gt;
Lookout Mobile Security @lookout