Cassandra Python Driver: Benchmarking concurrency for the NYT ⨍aбrik Platform

Systems architect Michael Laing explains how Apache Cassandra is used with Python in the NY Times ⨍aбrik messaging platform. This talk was presented at Cassandra Day New York in June 2014. Here's the video.

Transcript

  1. A Global Mesh with a Memory

     Message-based: WebSocket, AMQP, SockJS

     If in doubt:
     • Resend
     • Reconnect
     • Reread

     Idempotent:
     • Replicating
     • Racy
     • Resolving

     Classes of service:
     • Gold: replicate/race
     • Silver: prioritize
     • Bronze: queueable

     Millions of users
  2. Message: an event with data

     CREATE TABLE source_data (
         hash_key int,          -- real ones are more complex
         message_id timeuuid,
         body blob,             -- whatever
         metadata text,         -- JSON
         PRIMARY KEY (hash_key, message_id)
     );
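The primary key on slide 2 fits the "if in doubt: resend" rule on slide 1: a CQL INSERT is an upsert, so writing the same (hash_key, message_id) pair again replaces the row instead of duplicating it. Below is a minimal in-memory sketch of that property. It uses a plain dict rather than the Cassandra driver, and `FakeSourceData` and its `insert` method are hypothetical names introduced only for illustration.

```python
import uuid


class FakeSourceData:
    """In-memory stand-in for the source_data table (hypothetical, not driver code)."""

    def __init__(self):
        # Maps the primary key (hash_key, message_id) to (body, metadata).
        self._rows = {}

    def insert(self, hash_key, message_id, body, metadata):
        # Like a CQL INSERT, this is an upsert: the same key always
        # addresses the same row, so a repeated write is harmless.
        self._rows[(hash_key, message_id)] = (body, metadata)

    def __len__(self):
        return len(self._rows)


table = FakeSourceData()
message_id = uuid.uuid1()  # time-based UUID, the kind a timeuuid column holds

# "If in doubt: resend" -- deliver the same message three times.
for _ in range(3):
    table.insert(42, message_id, b"payload", '{"app_id": "demo"}')

row_count = len(table)  # still a single row
```

Because resending cannot create duplicates, a sender that is unsure whether a write landed can simply write again.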
  3. Client, Rabbit, Cassandra

     Concurrent: degree = 3 (using the libev event loop)
     Asynchronous: CQL native only
  4. Client, Rabbit, Cassandra: more concurrency

     Can also try:
     • DC Aware
     • Token Aware
     • Subprocessing
  5. Build one

     def build_message(self):
         message = {
             "message_id": str(uuid.uuid1()),
             "hash_key": randint(0, self._hash_key_range),  # int(e ** 8)
             "app_id": self._app_id,
             "timestamp": datetime.utcnow().isoformat() + 'Z',
             "content_type": "application/binary",
             "body": os.urandom(randint(1, self._body_range))  # int(e ** 9)
         }
  6. Kick-off

     def push_message(self):
         if self._submitted_count.next() < self._message_count:
             message = self.build_message()
             self.submit_query(message)

     def push_initial_data(self):
         self._start_time = time()
         try:
             with self._lock:
                 for i in range(
                     0, min(CONCURRENCY, self._message_count)
                 ):
                     self.push_message()
  7. Put it in the pipeline

     def submit_query(self, message):
         body = message.pop('body')
         substitution_args = (
             json.dumps(message, **JSON_DUMPS_ARGS),
             body,
             message['hash_key'],
             uuid.UUID(message['message_id'])
         )
         future = self._cql_session.execute_async(
             self._query, substitution_args
         )
         future.add_callback(self.push_or_finish)
         future.add_errback(self.note_error)
  8. Maintain concurrency or finish

     def push_or_finish(self, _):
         try:
             if (
                 self._unfinished and
                 self._confirmed_count.next() < self._message_count
             ):
                 with self._lock:
                     self.push_message()
             else:
                 self.finish()
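Slides 6 to 8 describe a self-sustaining pipeline: start with CONCURRENCY writes in flight, and let each completion callback submit the next one until every message is confirmed. The sketch below shows that pattern with the standard library only. It is not the talk's code: a `ThreadPoolExecutor` and `Future.add_done_callback` stand in for the driver's `execute_async` and `add_callback`, and `Pusher`, `_fake_write`, and `max_in_flight` are hypothetical names added for illustration.

```python
import itertools
import threading
from concurrent.futures import ThreadPoolExecutor

CONCURRENCY = 3
MESSAGE_COUNT = 20


class Pusher:
    def __init__(self, executor, message_count):
        self._executor = executor
        self._message_count = message_count
        self._submitted = itertools.count()
        self._lock = threading.Lock()
        self._in_flight = 0
        self.confirmed = 0
        self.max_in_flight = 0  # instrumentation only
        self.done = threading.Event()

    def _fake_write(self, n):
        # Stand-in for an asynchronous Cassandra write (assumption).
        return n

    def push_message(self):
        with self._lock:
            n = next(self._submitted)
            if n >= self._message_count:
                return  # nothing left to submit
            self._in_flight += 1
            self.max_in_flight = max(self.max_in_flight, self._in_flight)
        # Submit outside the lock: a callback on an already finished
        # future runs immediately and needs to take the lock itself.
        future = self._executor.submit(self._fake_write, n)
        future.add_done_callback(self.push_or_finish)

    def push_initial_data(self):
        # Prime the pipeline with at most CONCURRENCY writes.
        for _ in range(min(CONCURRENCY, self._message_count)):
            self.push_message()

    def push_or_finish(self, _future):
        with self._lock:
            self._in_flight -= 1
            self.confirmed += 1
            finished = self.confirmed >= self._message_count
        if finished:
            self.done.set()
        else:
            self.push_message()  # keep the pipeline full


with ThreadPoolExecutor(max_workers=CONCURRENCY) as executor:
    pusher = Pusher(executor, MESSAGE_COUNT)
    pusher.push_initial_data()
    finished_in_time = pusher.done.wait(timeout=10)
```

Each callback retires one write before it submits another, so the number of writes in flight never exceeds CONCURRENCY, and no thread has to block waiting on a result.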
  9. Push some messages

     usage: bm_push.py [-h] [-c [CQL_HOST [CQL_HOST ...]]] [-d LOCAL_DC]
                       [--remote-dc-hosts REMOTE_DC_HOSTS]
                       [-p PREFETCH_COUNT] [-w WORKER_COUNT] [-a] [-t]
                       [-n {ONE,TWO,THREE,QUORUM,ALL,LOCAL_QUORUM,EACH_QUORUM,SERIAL,LOCAL_SERIAL,LOCAL_ONE}]
                       [-r] [-j]
                       [-l {CRITICAL,ERROR,WARNING,INFO,DEBUG,NOTSET}]

     Push messages from a RabbitMQ queue into a Cassandra table.
  10. Push messages many times

      usage: run_push.py [-h] [-c [CQL_HOST [CQL_HOST ...]]] [-i ITERATIONS]
                         [-d LOCAL_DC]
                         [-w [worker_count [worker_count ...]]]
                         [-p [prefetch_count [prefetch_count ...]]]
                         [-n [level [level ...]]] [-a] [-t]
                         [-m MESSAGE_EXPONENT] [-b BODY_EXPONENT]
                         [-l {CRITICAL,ERROR,WARNING,INFO,DEBUG,NOTSET}]

      Run multiple test cases based upon the product of worker_counts,
      prefetch_counts, and consistency_levels. Each test case may be run
      with up to 4 variations reflecting the use or not of the dc_aware
      and token_aware policies. The results are output to stdout as a
      JSON object.
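The test matrix described on slide 10 can be sketched with `itertools.product`. This is an illustration under stated assumptions, not the code of run_push.py: the function name `build_test_cases`, the dictionary keys, and the reading that each enabled policy flag adds an "off" and an "on" variation are all guesses made for the example.

```python
import itertools
import json


def build_test_cases(worker_counts, prefetch_counts, consistency_levels,
                     dc_aware=False, token_aware=False):
    """Return one dict per test case (hypothetical helper)."""
    # Assumption: enabling a policy flag tests both without and with it,
    # which gives up to 2 x 2 = 4 variations per base case.
    dc_options = (False, True) if dc_aware else (False,)
    token_options = (False, True) if token_aware else (False,)
    return [
        {
            "worker_count": workers,
            "prefetch_count": prefetch,
            "consistency_level": level,
            "dc_aware": dc,
            "token_aware": token,
        }
        for workers, prefetch, level, dc, token in itertools.product(
            worker_counts, prefetch_counts, consistency_levels,
            dc_options, token_options,
        )
    ]


cases = build_test_cases(
    worker_counts=[1, 2],
    prefetch_counts=[10, 100],
    consistency_levels=["ONE", "LOCAL_QUORUM"],
    dc_aware=True,
    token_aware=True,
)
# Emit the plan as JSON on stdout, mirroring how results are reported.
print(json.dumps(cases[:2], indent=2))
```

With two values for each of the three parameters and both policy flags enabled, this yields 2 × 2 × 2 × 4 = 32 test cases.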