Cassandra Day NY 2014: Apache Cassandra & Python for The New York Times ⨍aбrik Platform
Cassandra Python driver: benchmarking concurrency for NYT ⨍aбrik
A Global Mesh with a Memory
Message-based: WebSocket, AMQP, SockJS
If in doubt:
• Resend
• Reconnect
• Reread
Idempotent:
• Replicating
• Racy
• Resolving
Classes of service:
• Gold: replicate/race
• Silver: prioritize
• Bronze: queueable
Millions of users
Message: an event with data
CREATE TABLE source_data (
    hash_key int,          -- real ones are more complex
    message_id timeuuid,
    body blob,             -- whatever
    metadata text,         -- JSON
    PRIMARY KEY (hash_key, message_id)
);
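Because the primary key is (hash_key, message_id), writing the same message twice lands on the same row, which is what makes "resend if in doubt" safe. A minimal sketch of that idea, using an in-memory dict as a stand-in for the table (the `FakeTable` class and the sample values are hypothetical and not part of the talk or the driver):

```python
import uuid


class FakeTable:
    """In-memory stand-in for source_data, keyed like its PRIMARY KEY."""

    def __init__(self):
        self.rows = {}

    def insert(self, hash_key, message_id, body, metadata):
        # Like a CQL INSERT, this behaves as an upsert: writing the same
        # key again overwrites the row instead of creating a duplicate.
        self.rows[(hash_key, message_id)] = (body, metadata)


table = FakeTable()
message_id = uuid.uuid1()  # time-based UUID, matching the timeuuid column

# Send once, then resend the identical message (e.g. after a lost ack).
for _ in range(2):
    table.insert(42, message_id, b"payload", '{"app_id": "demo"}')

print(len(table.rows))  # 1
```

The resend costs an extra write but leaves the stored data unchanged, so the sender never has to know whether the first attempt succeeded.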
[Diagram: Push path, labeled 1-10 KB and Ack; Pull path, labeled 1 KB and 10-150 KB]
Synchronous: C* Thrift or CQL Native
Concurrent: Degree = 3
(using the Libev event loop)
Asynchronous: CQL Native only
More Concurrency
Can also try:
• DC Aware
• Token Aware
• Subprocessing
Build one
def build_message(self):
    message = {
        "message_id": str(uuid.uuid1()),
        "hash_key": randint(0, self._hash_key_range),  # int(e ** 8)
        "app_id": self._app_id,
        "timestamp": datetime.utcnow().isoformat() + 'Z',
        "content_type": "application/binary",
        "body": os.urandom(randint(1, self._body_range))  # int(e ** 9)
    }
    return message  # push_message() uses the returned dict
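The inline comments suggest that the two ranges are powers of e. A quick worked check of what those values come to (the variable names here are illustrative only):

```python
import math

hash_key_range = int(math.e ** 8)  # upper bound for the random hash_key
body_range = int(math.e ** 9)      # upper bound for the body size in bytes

print(hash_key_range)  # 2980
print(body_range)      # 8103
```

A body of up to 8103 bytes is consistent with the 1-10 KB message size shown for the push path.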
Kick-off
def push_message(self):
    if self._submitted_count.next() < self._message_count:
        message = self.build_message()
        self.submit_query(message)
def push_initial_data(self):
    self._start_time = time()

    try:
        with self._lock:
            for i in range(0, min(CONCURRENCY, self._message_count)):
                self.push_message()
    # ... (the rest of the try block is not shown on the slide)
Put it in the pipeline
def submit_query(self, message):
    body = message.pop('body')

    substitution_args = (
        json.dumps(message, **JSON_DUMPS_ARGS),
        body,
        message['hash_key'],
        uuid.UUID(message['message_id'])
    )

    future = self._cql_session.execute_async(
        self._query,
        substitution_args
    )

    future.add_callback(self.push_or_finish)
    future.add_errback(self.note_error)
Maintain concurrency or finish
def push_or_finish(self, _):
    try:
        if (
            self._unfinished and
            self._confirmed_count.next() < self._message_count
        ):
            with self._lock:
                self.push_message()
        else:
            self.finish()
    # ... (the rest of the try block is not shown on the slide)
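The pattern in these slides (start CONCURRENCY writes, then let each completion callback submit the next one) can be tried without a cluster. The following is a rough sketch under stated assumptions: a `ThreadPoolExecutor` stands in for `execute_async`, `_fake_write` stands in for the database write, and the `Pusher` class, its counters, and the constants are hypothetical rather than the benchmark's actual code.

```python
import itertools
import threading
from concurrent.futures import ThreadPoolExecutor

CONCURRENCY = 4
MESSAGE_COUNT = 25


class Pusher:
    def __init__(self, executor):
        self._executor = executor
        # Re-entrant lock: a callback may fire in the submitting thread
        # if the future has already completed when it is attached.
        self._lock = threading.RLock()
        self._submitted = itertools.count()
        self.confirmed = 0
        self._in_flight = 0
        self.max_in_flight = 0
        self.done = threading.Event()

    def _fake_write(self):
        return None  # stand-in for the asynchronous database write

    def push_message(self):
        # Caller holds self._lock.
        if next(self._submitted) < MESSAGE_COUNT:
            self._in_flight += 1
            self.max_in_flight = max(self.max_in_flight, self._in_flight)
            future = self._executor.submit(self._fake_write)
            future.add_done_callback(self.push_or_finish)

    def push_or_finish(self, _future):
        with self._lock:
            self._in_flight -= 1
            self.confirmed += 1
            if self.confirmed < MESSAGE_COUNT:
                self.push_message()  # keep the pipeline full
            else:
                self.done.set()      # every message is confirmed

    def run(self):
        with self._lock:
            for _ in range(min(CONCURRENCY, MESSAGE_COUNT)):
                self.push_message()
        self.done.wait(timeout=10)


with ThreadPoolExecutor(max_workers=CONCURRENCY) as executor:
    pusher = Pusher(executor)
    pusher.run()

print(pusher.confirmed)  # 25
print(pusher.max_in_flight <= CONCURRENCY)  # True
```

Each completed write triggers at most one new write, so the number of requests in flight never exceeds the initial batch size, and no dedicated worker thread has to poll for free slots.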
[Diagram: Push path, labeled 1-10 KB and Ack]
Push some messages
usage: bm_push.py [-h] [-c [CQL_HOST [CQL_HOST ...]]] [-d LOCAL_DC]
                  [--remote-dc-hosts REMOTE_DC_HOSTS] [-p PREFETCH_COUNT]
                  [-w WORKER_COUNT] [-a] [-t]
                  [-n {ONE,TWO,THREE,QUORUM,ALL,LOCAL_QUORUM,EACH_QUORUM,SERIAL,LOCAL_SERIAL,LOCAL_ONE}]
                  [-r] [-j] [-l {CRITICAL,ERROR,WARNING,INFO,DEBUG,NOTSET}]
Push messages from a RabbitMQ queue into a Cassandra table.
Push messages many times
usage: run_push.py [-h] [-c [CQL_HOST [CQL_HOST ...]]] [-i ITERATIONS]
                   [-d LOCAL_DC] [-w [worker_count [worker_count ...]]]
                   [-p [prefetch_count [prefetch_count ...]]]
                   [-n [level [level ...]]] [-a] [-t] [-m MESSAGE_EXPONENT]
                   [-b BODY_EXPONENT]
                   [-l {CRITICAL,ERROR,WARNING,INFO,DEBUG,NOTSET}]
Run multiple test cases based upon the product of worker_counts, prefetch_counts, and consistency_levels. Each test case may be run with up to 4 variations reflecting the use or not of the dc_aware and token_aware policies. The results are output to stdout as a JSON object.
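How such a test matrix might be enumerated can be sketched with `itertools.product`. The sample values and dictionary keys below are assumptions for illustration, not the actual internals of run_push.py:

```python
from itertools import product

# Hypothetical inputs, as they might arrive from -w, -p and -n.
worker_counts = [1, 2]
prefetch_counts = [10, 100]
consistency_levels = ["ONE", "LOCAL_QUORUM"]

# Up to 4 policy variations: each of dc_aware and token_aware on or off.
policy_variations = list(product([False, True], repeat=2))

cases = [
    {
        "worker_count": workers,
        "prefetch_count": prefetch,
        "consistency_level": level,
        "dc_aware": dc_aware,
        "token_aware": token_aware,
    }
    for workers, prefetch, level in product(
        worker_counts, prefetch_counts, consistency_levels
    )
    for dc_aware, token_aware in policy_variations
]

print(len(cases))  # 32
```

With two values for each of the three options and all four policy variations enabled, the run covers 2 × 2 × 2 × 4 = 32 cases, which is why the number of values per option is worth keeping small.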
[Diagram: Pull path, labeled 1 KB and 10-150 KB]