Cassandra for Pythonistas

Talk given at PyCon APAC 2013 on Cassandra drivers for Python with a focus on cassandra-driver.

Sébastien Béal

September 14, 2013

Transcript

  1. Other Features
    • Partitioner
    • Data replication:
      ‣ Simple Strategy (1 datacenter)
      ‣ Network Topology Strategy
    • Compaction
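How the partitioner and Simple Strategy work together can be sketched with a toy token ring. This is a simplified model, not Cassandra's code: it assumes MD5-based tokens in the spirit of the RandomPartitioner (Cassandra 1.2+ defaults to the Murmur3Partitioner), and the node names and tokens are hypothetical. The first replica goes to the node owning the key's token; the remaining replicas go to the next nodes clockwise.

```python
import bisect
import hashlib


def token(key):
    # Toy token: the MD5 digest of the key read as an integer
    # (loosely modeled on the RandomPartitioner).
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


def replicas(key, ring, replication_factor):
    """Return the nodes holding `key` under SimpleStrategy-style placement.

    `ring` is a list of (token, node) pairs sorted by token. A node owns the
    range ending at its token, so the first replica is the first node whose
    token is >= the key's token, wrapping around past the last node.
    Further replicas are the following nodes clockwise.
    """
    tokens = [t for t, _ in ring]
    start = bisect.bisect_left(tokens, token(key)) % len(ring)
    count = min(replication_factor, len(ring))
    return [ring[(start + i) % len(ring)][1] for i in range(count)]


if __name__ == "__main__":
    # Hypothetical 4-node ring with evenly spaced tokens over [0, 2**128).
    ring = [(i * 2**128 // 4, "node%d" % i) for i in range(4)]
    print(replicas("seb", ring, 3))
```

Network Topology Strategy refines this walk by choosing replicas per datacenter, which this sketch does not model.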
  2. Data Model
    [Diagram: a keyspace contains column families; each column family contains rows; a row holds either columns or super columns, and each super column holds columns.]
  3. Data Model
    column family = {row key: {column name: value}}
    super column family = {row key: {super column name: {column name: value}}}
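The nested-map notation on this slide maps directly onto Python dicts. The keys and values below are hypothetical sample data; the point is only the shape: a column family is two levels of mapping, and a super column family adds a third.

```python
# Column family: row key -> {column name: value}
users = {
    "seb": {"gender": "M", "birth_year": 1984},
    "alice": {"gender": "F"},  # rows do not have to share the same columns
}

# Super column family: row key -> {super column name: {column name: value}}
addresses = {
    "seb": {
        "home": {"city": "Springfield", "zip": "12345"},
        "work": {"city": "Shelbyville"},
    },
}

# Reading a value means walking the levels of nesting.
print(users["seb"]["birth_year"])
print(addresses["seb"]["home"]["city"])
```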
  4. Communication
    • Thrift
    • Cassandra Query Language (CQL)
      ‣ CQL 2
      ‣ CQL 3 (Cassandra 1.2.x)
      ‣ CQL 3.1 (Cassandra 2.0+)
  5. Python Packages
    • Pycassa (Thrift)
    • Telephus (Thrift, Twisted)
    • Silverberg (CQL, Twisted)
    • cassandra-dbapi2 (CQL, PEP 249)
    • cassandra-driver (CQL 3, libev)
  6. cassandra-driver
    • Released in August 2013
    • Designed for CQL
    • Replacement for Pycassa
    Still in beta!
  7. CQL
    • “Denormalized SQL”
      ‣ No joins
      ‣ No sub-queries
      ‣ No aggregation
      ‣ Limited ORDER BY (clustering columns only)
  8. Keyspace

        from cassandra.cluster import Cluster

        cluster = Cluster()  # contact point defaults to 127.0.0.1
        session = cluster.connect()
        # KEYSPACE is a reserved CQL keyword, so the keyspace is named "demo" here
        session.execute("CREATE KEYSPACE demo WITH REPLICATION = "
                        "{'class': 'SimpleStrategy', 'replication_factor': 1};")
        session.set_keyspace("demo")
  9. Column Family

        # The primary key must name an existing column: username, not user_name
        session.execute("CREATE TABLE users ("
                        "username varchar, "
                        "gender varchar, "
                        "session_token varchar, "
                        "birth_year bigint, "
                        "PRIMARY KEY (username));")
  10. Prepared Statement

        query = ("INSERT INTO users (username, gender, birth_year) "
                 "VALUES (?, ?, ?)")
        prepared = session.prepare(query)
        session.execute(prepared.bind(('seb', 'M', 1984)))
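  11. Prepared Statement

        from cassandra.query import ValueSequence

        users = ('alice', 'bob', 'seb')
        # A non-prepared query string takes %s placeholders, not ?;
        # ValueSequence renders the tuple as an IN list
        query = "SELECT * FROM users WHERE username IN %s"
        session.execute(query, parameters=[ValueSequence(users)])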
  12. Decoder

        session.execute("SELECT * FROM users")
        # [Row(username=u'seb', birth_year=1984, gender=u'M', session_token=None)]

        # Later driver releases moved the row factories to cassandra.query
        from cassandra.decoder import ordered_dict_factory

        session.row_factory = ordered_dict_factory
        session.execute("SELECT * FROM users")
        # [OrderedDict([(u'username', u'seb'), (u'birth_year', 1984),
        #               (u'gender', u'M'), (u'session_token', None)])]
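A row factory is just a function. Like the driver's built-in factories such as `ordered_dict_factory`, it receives the result's column names and the raw row tuples, and whatever it returns is what the query hands back. The sketch below is a hypothetical factory (`plain_dict_factory` is not a driver name) that returns one plain dict per row; it can be tested without a cluster by calling it directly.

```python
def plain_dict_factory(colnames, rows):
    """Hypothetical row factory: turn each row tuple into a plain dict.

    Follows the (colnames, rows) calling convention of the driver's
    built-in row factories.
    """
    return [dict(zip(colnames, row)) for row in rows]


if __name__ == "__main__":
    colnames = ["username", "birth_year", "gender", "session_token"]
    rows = [("seb", 1984, "M", None)]
    print(plain_dict_factory(colnames, rows))
```

With a live session it would be installed the same way as on the slide: `session.row_factory = plain_dict_factory`.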
  13. Async Calls

        future = session.execute_async("SELECT * FROM users")

        def print_results(results):
            for row in results:
                print("Results: %s" % row)

        def print_error(exc):
            print("Operation failed: %s" % exc)

        # Callbacks run on the driver's event loop thread, so keep them short
        future.add_callbacks(print_results, print_error)
        # Results: Row(username=u'seb', birth_year=1984, gender=u'M', session_token=None)
  14. Lessons Learned
    • CQL vs. Thrift / C* vocabulary
    • Row size limit: row sharding
    • OpsCenter for monitoring
  15. Time Series Data

        CREATE TABLE temperature (
            sensor_id varchar,
            ts timestamp,
            temperature float,
            PRIMARY KEY (sensor_id, ts)
        );

    compound primary key: (partition key, clustering key)
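The compound primary key on this slide decides the storage layout: `sensor_id` picks the partition, and `ts` orders rows inside it. A minimal pure-Python model of that layout, assuming simplified semantics (the later tuple in the input wins on a duplicate key, standing in for Cassandra's write-timestamp rule); `group_by_partition` and the sample readings are hypothetical.

```python
from collections import defaultdict
from datetime import datetime


def group_by_partition(readings):
    """Lay out (sensor_id, ts, temperature) tuples like the temperature table.

    One partition per sensor_id; inside it, rows are sorted by the
    clustering key ts (ascending, the default clustering order).
    """
    partitions = defaultdict(dict)
    for sensor_id, ts, temperature in readings:
        # Same (sensor_id, ts) primary key: the later write replaces the earlier one.
        partitions[sensor_id][ts] = temperature
    return {
        sensor_id: sorted(rows.items())
        for sensor_id, rows in partitions.items()
    }


if __name__ == "__main__":
    readings = [
        ("s1", datetime(2013, 9, 14, 12), 21.5),
        ("s1", datetime(2013, 9, 14, 9), 18.0),
        ("s2", datetime(2013, 9, 14, 10), 19.0),
    ]
    print(group_by_partition(readings))
```

Because every reading of a sensor lands in a single partition, that partition grows without bound, which is the row size limit mentioned under Lessons Learned.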
  16. Time Series Data

        CREATE TABLE temperature_by_day (
            sensor_id varchar,
            date text,
            ts timestamp,
            temperature float,
            PRIMARY KEY ((sensor_id, date), ts)
        ) WITH CLUSTERING ORDER BY (ts DESC);

    composite partition key: (sensor_id, date); reverse clustering order: ts DESC
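With the composite partition key, the client has to compute `(sensor_id, date)` before reading or writing, and a time range that spans several days needs one query per day bucket. A sketch under stated assumptions: the slide only says `date` is text, so the `'YYYY-MM-DD'` format and naive UTC datetimes are assumptions, and `day_bucket`, `buckets_between`, and `LATEST_QUERY` are hypothetical names. The query uses the driver's `%s` placeholder style for simple statements.

```python
from datetime import datetime, timedelta


def day_bucket(sensor_id, ts):
    """Composite partition key (sensor_id, date) for temperature_by_day.

    Assumes ts is a naive UTC datetime and date is stored as 'YYYY-MM-DD'.
    """
    return (sensor_id, ts.strftime("%Y-%m-%d"))


def buckets_between(sensor_id, start, end):
    """All partition keys to query for readings between start and end, inclusive."""
    keys = []
    day = start.date()
    while day <= end.date():
        keys.append((sensor_id, day.strftime("%Y-%m-%d")))
        day += timedelta(days=1)
    return keys


# Latest readings for one bucket: CLUSTERING ORDER BY (ts DESC) already
# returns the newest rows first, so LIMIT keeps the most recent ones.
LATEST_QUERY = (
    "SELECT ts, temperature FROM temperature_by_day "
    "WHERE sensor_id = %s AND date = %s LIMIT 10"
)

if __name__ == "__main__":
    print(day_bucket("sensor-1", datetime(2013, 9, 14, 8, 30)))
    print(buckets_between("sensor-1", datetime(2013, 9, 13, 22), datetime(2013, 9, 15, 1)))
```

With a live session, each key would be passed as the parameters, e.g. `session.execute(LATEST_QUERY, day_bucket("sensor-1", some_datetime))`. Bucketing by day caps how large any one partition can grow.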