Cassandra From Her Nephew's POV - meetupfiles.meetup.com/11129032/cassandra_oc_jug.pdf
TRANSCRIPT
Cassandra From Her Nephew's POV
A presentation about using Cassandra with Astyanax
By Mike Epstein, Principal Engineer @ callfire.com
A little bit about the presenter
- I am currently a principal engineer at callfire.com
- I am just a regular developer who has been privileged enough to work with several different NoSQL technologies at several different companies
- I AM NOT A Cassandra expert
A little bit about the presenter
- Why am I giving a talk about Cassandra if I am not an expert? Because I work with Cassandra daily and have worked with other NoSQL technologies in the past. So I have a regular developer's perspective that I am going to share with you.
A little bit about the presentation
- This presentation is from the POV of a software developer using Cassandra, so it is not about:
- operational Cassandra
- writing a Cassandra client
- developing/contributing to Cassandra
Those are great, interesting topics, but not for this presentation (I would love to hear a talk about the last one, BTW). None of those things are my primary view. As a developer I want to work with Cassandra, and this talk is about exactly that: working with Cassandra.
Brought to you by:
This presentation and myself are brought to you by CallFire. So, a little about CallFire (and this is mostly relevant since it leads into why we use Cassandra). CallFire is a telecommunications company that allows small to medium sized organizations to use voice/text broadcast, IVR, a hosted call center solution, and text based marketing, among other things. Those call center calls and voice mails are optionally recorded, meaning we have to save those files somewhere.
Stats
- small company, ~70 employees and growing
- grew signups 154% last year
- 318,522 accounts
- 63,179,955 campaigns
- 1,889,341,296 calls/texts ever sent or received
- 12.4M calls in one day
Yeah, those are the exact numbers as of sometime Monday, I believe. So we have a lot of data to keep on all those accounts, campaigns, and calls.
Why did we choose Cassandra?
- Sound file storage
- Ease of operation
In short: sound file storage, and at the time the company was really, really small. Why is it good for sound file storage?
Cassandra is Really Really fast*
source: http://planetcassandra.org/what-is-apache-cassandra/
*If you are using it right, and "using it right" fits what you need
This slide is from planetcassandra.org, which is a DataStax community site. DataStax is the commercial entity behind Cassandra support and Cassandra distribution. Take all benchmarks with a grain of salt, especially in an ultra competitive market like NoSQL is today. Things are skewed and enhancements are made constantly; HBase and Cassandra are always trading places in benchmarks. But the very nature of Cassandra makes it especially fast for writes. And since these sound files are assigned an ID, they are easily stored and retrieved by key. Perfect use case for Cassandra: heavy write, seldom read, both by key. And Cassandra provides its own storage.
Cassandra is a stand-alone application
Getting started with Cassandra is very easy: you download it, unpack it, and run it.
Cassandra only depends on having the JVM installed. Every time I have talked to someone who uses Cassandra, this was always brought up as a big reason why they started with Cassandra. Easy to get up and running: no dedicated storage, no ZooKeeper, no need for a huge grid or master and slave. Just unpack and run. The value of this cannot be overstated.
What is Cassandra and how does it work in brief?
tl;dr
- It is a column store
- It stores data by key
The NoSQL Pantheon
- Key-Value Stores
  - Redis
  - Riak
- Document Stores
  - MongoDB
  - CouchDB
- Graph Databases
  - Neo4J
- Column Stores
  - HBase
  - Cassandra
Image source: http://xaxor.com/travel/12647-pantheon-rome-italy.html
Welcome to the NoSQL Pantheon. Where does Cassandra fall in the NoSQL Pantheon? Key-value stores are just flat distributed maps: you put a value in by key, you get it out by key. Document stores take in structured data and allow for limited querying and indexing. Graph databases store data nodes connected by edges. Finally, column stores.
What is a column store?
In general a column store is a data sink that stores data in a series of nested maps.
A logical view of Cassandra
Map[KeySpaceName,
  Map[ColumnFamilyName,
    Map[RowKey,
      Map[ColumnKey, Value]]]]
Forget everything you may have heard about tables, columns and rows. This is what Cassandra looks like functionally to a developer. Always keep that in the back of your mind when doing your data modeling.
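To make that logical view concrete, here is a minimal sketch of the same structure using plain Java maps (the keyspace, column family, row key, and column names here are made up for illustration; real Cassandra obviously does far more than this):

```java
import java.util.HashMap;
import java.util.Map;

public class LogicalView {
    public static void main(String[] args) {
        // Keyspace -> ColumnFamily -> RowKey -> ColumnKey -> Value
        Map<String, Map<String, Map<String, Map<String, String>>>> cluster = new HashMap<>();

        cluster
            .computeIfAbsent("CallFire", ks -> new HashMap<>())     // keyspace
            .computeIfAbsent("CallSounds", cf -> new HashMap<>())   // column family
            .computeIfAbsent("sound-42", row -> new HashMap<>())    // row key
            .put("data", "...audio bytes...");                      // column -> value

        // Retrieval is just walking the maps back down by key.
        String value = cluster.get("CallFire").get("CallSounds").get("sound-42").get("data");
        System.out.println(value);
    }
}
```

Because each row is its own inner map, two rows in the same column family can hold completely different columns, which is what makes sparse data cheap.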
Keyspace
Column Family
Rows
Columns
Values
This diagram is based on a slide from the DataStax Cassandra Tech Day LA, which I attended recently
source: https://github.com/msabramo/pycon2012-notes/blob/master/apache_cassandra_and_python.rst
Here is an example of a Cassandra keyspace. Choose a column family; inside the column family you have rows; in the rows you have key-value pairs.
Advantages of this structure
- You can group related data together in the same column family
- Each row has multiple values, each with their own key
- This Map structure is flexible, so each row can have different columns (great for sparse data)
How does Cassandra Work?
- Cassandra operates as a peer-to-peer cluster of nodes
- each node is a stand-alone database responsible for certain rows of data
- rows are assigned to nodes deterministically using a hash function (actually you can configure this to be ordered also)
A short mention of rebalancing: if you tell Cassandra to add or remove a node from the cluster, the nodes are reassigned rows and the existing rows are moved to match the new assignments, so that row location is always deterministic.
How is Cassandra a cluster then?
If each node is a stand-alone database, how is Cassandra a cluster?
- Coordination
- Replication
- Failover
Coordination
When a client connects to a Cassandra cluster it only connects to one node. That node then “coordinates” with the other nodes in the cluster to store or retrieve the row requested by the client. This node is known as the “coordinator node”.
So what gets coordinated? replication!
Replication
The coordinator node will handle replication of writes and reads to the responsible nodes.
The ${replication_factor} is the number of copies of a particular row that Cassandra will keep in the cluster. The ${replication_factor} is set when creating the keyspace.
As mentioned, the keyspace holds your column families, so all column families in that keyspace will have the same replication factor. Thus a different replication factor would be a good reason to have different keyspaces. At CallFire we use one keyspace for everything. 3 is a pretty commonly used number for ${replication_factor}.
Replication - Write
The coordinator node will simultaneously send the write request to the node that “owns” that row and to ${replication_factor} - 1 other nodes moving around the cluster, waiting for ${write_consistency_level} acknowledgements before returning success to the client.
“Owns” doesn't mean much in Cassandra; the owning node is just the first node to start counting replicas from. Consistency level is a very cool feature of Cassandra that lets you decide per op how many nodes must respond to a request, trading consistency for performance. So if you write with a consistency level less than the replication factor, you can save time.
Replication - Read
The coordinator node will simultaneously send the read request to the node that “owns” that row and to ${replication_factor} - 1 other nodes moving around the cluster, waiting for ${read_consistency_level} acknowledgements before returning the row to the client.
The same goes for read consistency: do I need to wait for every node to respond or not?
Failover
As you can see from the way writing and reading to the cluster works, failover between nodes is built in! As long as you don't require a consistency level of ${replication_factor}, you get some amount of fault tolerance based on your ${replication_factor} and ${read/write_consistency_level}.
In our production cluster we run with ${replication_factor} = 3 and usually do ${read/write_consistency_level} = 2, which allows for one of the nodes with that row to be down. It also speeds up reads and writes since we are only waiting for a majority, not all, meaning we never wait for the slowest node.
- ${replication_factor} = # of copies
- ${write_consistency_level} = # of nodes that must acknowledge a write operation
- ${read_consistency_level} = # of nodes that must acknowledge a read operation
- ${write_consistency_level} = ${replication_factor} ➔ consistency
- ${read_consistency_level} = ${replication_factor} ➔ consistency
- If needed, tune consistency for performance
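The consistency-level bullets above boil down to simple arithmetic: whenever the number of nodes acknowledging a write plus the number acknowledging a read exceeds ${replication_factor}, at least one replica saw both, so reads observe the latest write. A tiny sketch of that rule (the method name is mine, not Cassandra's):

```java
public class ConsistencyMath {
    // Reads and writes are guaranteed to overlap on at least one
    // replica exactly when W + R > RF.
    static boolean stronglyConsistent(int rf, int w, int r) {
        return w + r > rf;
    }

    public static void main(String[] args) {
        int rf = 3;
        System.out.println(stronglyConsistent(rf, 2, 2)); // quorum writes + quorum reads
        System.out.println(stronglyConsistent(rf, 1, 1)); // fast, but only eventually consistent
    }
}
```

This is why the CallFire setup of ${replication_factor} = 3 with level 2 on both sides stays consistent while tolerating one dead replica.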
Write/Read row with key: ${row_key}
[Diagram: a ring of eight nodes with tokens 10, 20, 30, 40, 50, 60, 70, 80; a Client connects to one (coordinator) node]
${replication_factor} = 3
token(hash(${row_key})) = 43
This picture (also improved from the Cassandra Tech Day LA) rather succinctly and colorfully summarizes Cassandra read/write ops at the cluster level:
1. our client wants to do some op (read/write) on a row with key ${row_key}
2. our client connects to a node and gives it the row with key ${row_key} and the op it wants performed
3. the node then calculates where in the cluster this row should be stored based on the ${row_key} and ${replication_factor}, with the hash determining which node is the owner, then walks forward through the cluster nodes until ${replication_factor} nodes hold the data. Again, this can be done while the client waits or in the background.
In this way Cassandra always knows which nodes should be participating in an op. So the white nodes in our example above are not involved at all. And the blue “coordinator” node was chosen by the client somehow; the owning node or one of the replicas easily could have been the coordinating node as well. One quick thing to mention is that after ${replication_factor} nodes go down you start to lose data, but because of the tunable consistency level, individual ops could still succeed.
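The ring walk described above can be sketched in a few lines. This is a toy model using the token values from the diagram, not Cassandra's actual replica placement code (which also has to worry about data centers and racks); the helper name is invented:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

public class RingWalk {
    // Hypothetical helper: the owner is the first node whose token is >= the
    // row's token; replicas are found by walking the ring forward from there.
    static List<Integer> replicasFor(int rowToken, TreeSet<Integer> nodeTokens, int replicationFactor) {
        Integer owner = nodeTokens.ceiling(rowToken);
        if (owner == null) owner = nodeTokens.first(); // wrap around the ring
        List<Integer> replicas = new ArrayList<>();
        Integer current = owner;
        while (replicas.size() < replicationFactor) {
            replicas.add(current);
            current = nodeTokens.higher(current);
            if (current == null) current = nodeTokens.first(); // wrap again
        }
        return replicas;
    }

    public static void main(String[] args) {
        TreeSet<Integer> ring = new TreeSet<>(List.of(10, 20, 30, 40, 50, 60, 70, 80));
        // token(hash(${row_key})) = 43 -> node 50 "owns" the row,
        // and with ${replication_factor} = 3 the walk adds 60 and 70.
        System.out.println(replicasFor(43, ring, 3)); // [50, 60, 70]
    }
}
```

Because the walk is a pure function of the token and the ring, every node can compute the same answer, which is why row location is always deterministic.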
The sum of its parts or more?
So now that we have talked about the Cassandra cluster as a whole, let's talk a little about the individual nodes. This is kind of a detail when working with Cassandra as a developer, since you will never speak directly to a node; you will always go through a client which talks to a node, and that coordinator node actually issues the reads and writes to the other nodes. So all of this technically falls under the heading of implementation details: as long as Cassandra still stores data in columns, replicates, fails over, is fast, etc., the rest of what I am saying could change and your application would still be humming along just fine. That said, this section explains some fundamentals of Cassandra that could help with selecting Cassandra for your use case.
Nodes
- A Node is an instance of Cassandra running in its own JVM
- Nodes speak directly to each other using peer-to-peer protocols
- Snitch is used to get the topology of the cluster
- Gossip is used to get the state of each node in the cluster
We mentioned before that a node in Cassandra is a self contained database, so let's talk a little about how the individual nodes work. If the topology changes (node added or removed) or if a node goes down, the other nodes find out directly. This causes a lot of chatter, especially as cluster size goes up, so Cassandra has its own specialized protocols.
Nodes - Write
1. Write request arrives at the node
2. The node writes simultaneously to memory (the memtable) and to a commit log (append, fsync)
3. Send ack back to client
- Compaction
  - Merge sstables
  - Evict tombstones
  - Rebuild indexes
[Diagram: memory holds the memtable; persistent storage holds the sstables and the commit log; the three steps above are numbered on the write path]
This is another diagram that I refactored from the Cassandra Tech Day LA (read slide). That's it! That is why Cassandra is super fast with writes: write to memory, write to the commit log, done! Writes to the commit log are append only and fsynced, so no data is ever lost, and the sequential nature makes it very fast also. The red arrows are unrelated to our particular row write but are on the write path. Those steps are: memtables are flushed to disk when they get full (this threshold is configurable), and compaction happens periodically to clean up the SSTables on disk.
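The write path above can be caricatured in a few lines of Java. This is a deliberately tiny model (the class name and the flush threshold are invented), just to show why appending to a log plus updating an in-memory table makes writes fast:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

public class WritePath {
    // Append-only commit log (durability) and in-memory memtable (fast lookups).
    final List<String> commitLog = new ArrayList<>();
    final TreeMap<String, String> memtable = new TreeMap<>();
    final List<TreeMap<String, String>> sstables = new ArrayList<>();
    final int flushThreshold = 2; // tiny threshold for illustration

    void write(String rowKey, String value) {
        commitLog.add(rowKey + "=" + value); // 1. append (fsynced in real Cassandra)
        memtable.put(rowKey, value);         // 2. update the in-memory structure
        // 3. ack goes back to the client here; no random disk I/O on the write path
        if (memtable.size() >= flushThreshold) {
            sstables.add(new TreeMap<>(memtable)); // flush to an immutable sstable
            memtable.clear();
        }
    }

    public static void main(String[] args) {
        WritePath node = new WritePath();
        node.write("sound-1", "a");
        node.write("sound-2", "b"); // triggers a flush
        node.write("sound-3", "c");
        System.out.println(node.sstables.size() + " sstable(s), memtable holds " + node.memtable.size());
    }
}
```

Compaction would later merge those immutable sstables, evict tombstones, and rebuild indexes, as the slide lists.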
Nodes - Read
1. Read request arrives at the node
2. The node tries to read as fast as possible. The stack in the diagram shows the many places the row could be, both in memory and on disk.
3. Send ack back to client
[Diagram: memory holds the row cache, key cache, memtable, bloom filter, and OS cache; persistent storage holds the sstables; the three steps above are numbered on the read path]
There is both more and less happening on the read side. There are a number of optimizations in Cassandra to store rows in memory for quick retrieval, but it can still fall through to disk.
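A toy model of that read path: check the memtable first, then fall through the sstables from newest to oldest. In real Cassandra a per-sstable bloom filter lets the node skip files that definitely do not hold the key; here that optimization is only a comment:

```java
import java.util.List;
import java.util.Map;

public class ReadPath {
    // Check the in-memory memtable first, then sstables from newest to oldest.
    static String read(String key, Map<String, String> memtable, List<Map<String, String>> sstables) {
        if (memtable.containsKey(key)) return memtable.get(key);
        for (int i = sstables.size() - 1; i >= 0; i--) {
            // Real Cassandra consults a bloom filter here to skip sstables
            // that definitely do not contain the key.
            if (sstables.get(i).containsKey(key)) return sstables.get(i).get(key);
        }
        return null; // row not on this node
    }

    public static void main(String[] args) {
        Map<String, String> memtable = Map.of("sound-3", "c");
        List<Map<String, String>> sstables = List.of(
            Map.of("sound-1", "a"),
            Map.of("sound-1", "a2") // a newer sstable shadows the older value
        );
        System.out.println(read("sound-3", memtable, sstables)); // served from memory
        System.out.println(read("sound-1", memtable, sstables)); // newest sstable wins
    }
}
```

The row cache, key cache, and OS cache in the diagram are further shortcuts layered in front of this fall-through.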
When accessing Cassandra you have choices!
So at this point, if I were listening to this talk I would be thinking “OK, great, that is a lot about Cassandra, but I still can't work with it.” So let's get to some specifics about using Cassandra to store your data from your application.
Java Cassandra Clients
Up till now I have been purposely vague about connecting to Cassandra, only talking about clients. The reason for that is there are tons of Java clients. But first let's talk about …
Thrift vs CQL
There are two main ways to access Cassandra: Thrift and CQL (the others are Hadoop, Solr, and the internal StorageProxy API; none of these are general purpose). Short history lesson, I promise it is relevant. Cassandra was originally developed by Facebook for inbox search before being handed over to the Apache foundation. For its interface Facebook chose another Facebook technology, Thrift, which is a language agnostic RPC framework. Thrift is now an Apache project also. The Thrift interface to Cassandra will be supported indefinitely but is not going to get any new features. CQL (Cassandra Query Language) is a SQL-like language used to query Cassandra. It is still relatively new. New Cassandra users are being shepherded toward CQL even though there are many mature Thrift clients still in active development. CQL is considered higher level than Thrift, but this is debatable given the breadth of Thrift clients available.
Cassandra Java Clients- CQL: DataStax Java Driver
- Firebrand
- PlayORM
- Astyanax
- Hector
- Kundera
- Easy-Cassandra
source: https://wiki.apache.org/cassandra/ClientOptions
All of these are in active development, with commits in the last 6 months or less. This is just the list from the Apache site; there are probably others. All of them but the DataStax Java Driver use Thrift underneath. You can of course use Thrift directly yourself also. It is also possible to use some of these clients together in the same application, depending on your usage.
Also cribbed from the DataStax slide deck. Netflix is one of the biggest users of Cassandra and they document their usage quite extensively.
Astyanax by Netflix- De facto Java Cassandra driver pre-CQL
- Very well documented
- Very much in active development still
- Widely used
- OSS
- Feature rich
Netflix’s biggest contribution to Cassandra is Astyanax
We use Astyanax at CallFire
Sound File Storage
We have 11TB of sound files stored in Cassandra
Let me preface this by saying this was the original use of Cassandra at CallFire; this project was started 3 years 7 months ago and hasn't been touched in 11 months. Since then we have stored 81.5M sound files for a total of 11TB of sounds, which with a replication factor of 3 uses 33TB of Cassandra storage. That's a lot of sounds. Most of these sounds are never played, but you need to store all of them since you don't know which ones will be needed in the future. Some clients have auditing requirements around their calls as well, making sound storage a hard requirement for them.
Astyanax Setup - using google-guice

public class AstyanaxModule extends AbstractModule {
    public static final String KEYSPACE = "CallFire";
    private static final Logger LOG = LoggerFactory.getLogger(AstyanaxModule.class);
    private final Properties astyanaxProperties = PropertiesUtil.loadOrFail("astyanax.properties");

    @Override
    public void configure() {
        Names.bindProperties(binder(), astyanaxProperties);
        Multibinder.newSetBinder(binder(), Service.class).addBinding().to(AstyanaxService.class);
        // bind a property for a single node
        String contactNodes = astyanaxProperties.getProperty("astyanax.nodes");
        int commaIdx = contactNodes.indexOf(',');
        // substring's end index is exclusive, so commaIdx (not commaIdx - 1) keeps the full first node
        String contactNode = commaIdx > 0 ? contactNodes.substring(0, commaIdx) : contactNodes;
        bind(String.class).annotatedWith(Names.named("astyanax.node")).toInstance(contactNode);
    }

    @Provides
    @Singleton
    public AstyanaxContext<Keyspace> getCluster(
            @Named("astyanax.nodes") String contactNodes,
            @Named("astyanax.port") Integer thriftPort) {
        LOG.info("Astyanax Nodes: {}:{}", contactNodes, thriftPort);
        return new AstyanaxContext.Builder()
            .forCluster("CallFire")
            .forKeyspace(KEYSPACE)
            .withAstyanaxConfiguration(new AstyanaxConfigurationImpl()
                .setDiscoveryType(NodeDiscoveryType.RING_DESCRIBE))
            .withConnectionPoolConfiguration(new ConnectionPoolConfigurationImpl("ConnectionPool")
                .setPort(thriftPort)
                .setMaxConnsPerHost(1)
                .setSeeds(contactNodes))
            .withConnectionPoolMonitor(new CountingConnectionPoolMonitor())
            .buildKeyspace(ThriftFamilyFactory.getInstance());
    }

    @Provides
    @Singleton
    public Keyspace getKeyspace(AstyanaxContext<Keyspace> context) {
        return context.getClient();
    }
}
So here we set up our Astyanax beans; in particular we need the keyspace, which is a property of AstyanaxContext. The only information we provide Astyanax is a comma delimited list of seed nodes from which it can discover the other hosts in the cluster, the port Cassandra is listening on for Thrift, and the cluster and keyspace names (both “CallFire” in this case). The rest is pretty standard: set up a connection pool and use ring describe to discover other nodes, which is just asking a node to run the describe command and give you back the cluster info.
Astyanax Setup - using google-guice
astyanax.nodes=localhost
astyanax.port=9160
Just for completeness, here is an example of the contents of the properties file.
public class CallSoundDataStore implements SoundDataStore<CallSound> {
    private static final Logger LOG = LoggerFactory.getLogger(AbstractSoundDataStore.class);
    private static final String COLUMN_NAME = "data";
    private final ColumnFamily<String, String> columnFamily;
    @Inject private Keyspace keyspace;

    public CallSoundDataStore() {
        this.columnFamily = new ColumnFamily<>("CallSounds",
            StringSerializer.get(), StringSerializer.get());
    }

    private String getRowId(CallSound sound) {
        return sound.getId().toString();
    }

    @Override
    public InputStream loadData(CallSound sound) throws IOException {
        try {
            Column<String> column = keyspace.prepareQuery(columnFamily)
                .getRow(getRowId(sound))
                .execute()
                .getResult()
                .getColumnByName(COLUMN_NAME);
            if (column == null) {
                throw new IOException("sound " + sound + " is missing sound data");
            }
            LOG.debug("read {} bytes for sound {}", column.getByteArrayValue().length, sound);
            return new ByteArrayInputStream(column.getByteArrayValue());
        } catch (ConnectionException e) {
            throw new IOException("exception reading sound " + sound, e);
        }
    }

    @Override
    public void saveData(CallSound sound, final InputStream inStream) throws IOException {
        try {
            int size = sound.getLengthInBytes() != null ? sound.getLengthInBytes().intValue() : 0;
            MutationBatch batch = keyspace.prepareMutationBatch();
            batch.withRow(columnFamily, getRowId(sound))
                .putColumn(COLUMN_NAME, StreamUtil.toByteArray(inStream, size));
            batch.execute();
        } catch (ConnectionException e) {
            throw new IOException("exception saving sound " + sound, e);
        }
    }
}
Some blank lines were removed so this can fit OK on one slide. Note we did not set an explicit consistency level; the default in Astyanax is 2. You can of course explicitly set the consistency level by using setConsistencyLevel() after prepareQuery(). I didn't include CallSound since it is mostly metadata, like the Account and Call associated with the CallSound; the relevant bits are seen here, like the id and the length. As you can see, it is a pretty straightforward save and load by key.
Hazelcast Backing Store
Hazelcast in Brief
- Distributed In-Memory Data Grid
- Exposed as a Map and a Queue (among other things)
- Can be “backed” by a “store”
- Does read/write-through and read/write-behind
Cassandra as a Backing Store
- Greatly expands the usefulness of Hazelcast, since not everything has to be in memory, with a very small penalty (in other words, Cassandra is really, really fast)
- Adds great fault tolerance to Hazelcast since you have real persistent storage behind it
- Transparent to the application; it is just talking to a Map
This was an idea I brought over from my last gig, where we used Hazelcast and backed it by HBase (shameless plug: see my other presentation on HBase at http://www.mylife.com/eng-blog/mylife-with-hbase-or-hbase-three-flavors/ ).
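A sketch of the read/write-through idea, using a hand-rolled stand-in for Hazelcast's MapStore contract and a plain map standing in for both the Hazelcast IMap and Cassandra (all names here are invented for illustration; the real contract lives in Hazelcast's MapStore/MapLoader interfaces, which also include bulk methods like storeAll and loadAll):

```java
import java.util.HashMap;
import java.util.Map;

public class BackingStoreSketch {
    // Minimal stand-in for Hazelcast's MapStore contract.
    interface SimpleMapStore<K, V> {
        void store(K key, V value); // write-through to the backing database
        V load(K key);              // read-through on a cache miss
        void delete(K key);
    }

    // Hypothetical store backed by an in-memory map standing in for Cassandra.
    static class FakeCassandraMapStore implements SimpleMapStore<String, String> {
        private final Map<String, String> cassandra = new HashMap<>();
        public void store(String key, String value) { cassandra.put(key, value); }
        public String load(String key) { return cassandra.get(key); }
        public void delete(String key) { cassandra.remove(key); }
    }

    public static void main(String[] args) {
        SimpleMapStore<String, String> store = new FakeCassandraMapStore();
        Map<String, String> cache = new HashMap<>(); // stands in for the Hazelcast IMap

        // write-through: every cache put also hits the backing store
        cache.put("gateway:1", "brand:42");
        store.store("gateway:1", "brand:42");

        // read-through: a miss in the cache falls back to the store
        cache.clear(); // simulate eviction; the data survives in the store
        String value = cache.containsKey("gateway:1") ? cache.get("gateway:1") : store.load("gateway:1");
        System.out.println(value);
    }
}
```

With real Hazelcast the read-through fallback is wired in by configuration, so the application code just sees a Map, which is exactly the transparency claimed above.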
To Eclipse!
This code is a little bit more complicated, so it is harder to graft into a slide deck; let us switch over to Eclipse. Code to have open in Eclipse:
- DeliveryReportDaoImpl
- GatewayKeyMapStore, used in DeliveryReportDaoImpl
  // Maps gateway:gatewayId -> brand:messageId
  private IMap<String, String> gatewayKeyMap;
- first with DeliveryReportMapStore
- AbstractCassandraMapStore
- MapStore
- MapLoader
- AbstractCassandraCRUD