Store JSON in Cassandra the Hard Way

Keen IO stores events in the same format that our customers send them in—JSON. Yet, Keen uses Cassandra, a distributed database without any JSON primitives, and Keen gives customers the ability to query over arbitrary (even nested) JSON dimensions. How can this be???

Josh Dzielak

July 23, 2014

Transcript

  5. Use static column families?

    • Possible if the schema is known in advance
    • Still might not be fast to scan 1M+ / 1B+ rows
    • Not possible with Keen: the schema is implicitly defined by customers and not enforced at write time
    • Keen has 500,000+ user-defined collections today, stored in 1 Cassandra column family

    CREATE TABLE requests (
      KEY uuid PRIMARY KEY,
      timestamp text,
      url text,
      response_status int,
      …
    )
  6. CF: events

    • Row key (UTF8Type): the user’s collection name (with a Project ID prefix, omitted for brevity)
    • Column name (Composite(UTF8Type, TimeUUID)): a unique UUID for the event, plus a TimeUUID

    Row ‘requests’:
    cef7-be80:TimeUUID() → { "status" : 200 }
    a87b-472c:TimeUUID() → { "status" : 400 }
    . . .
  8. CF: events

    Good
    • Simple write path, can filter by timeframe

    Bad
    • Row growth is unbounded; hot spots for big collections
    • Can only filter on time by > or <, not both
    • Little gain from parallelizing because the row is the same
    • JSON deserialization in queries is expensive
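The cost of the last point can be sketched with a small in-memory model. This is only an illustration, not Cassandra client code: the `row` list, the integer timestamps, and the `count_where` helper are all hypothetical stand-ins for one wide row of the `events` CF.

```python
import json
from bisect import bisect_left

# Hypothetical in-memory stand-in for one wide row of the `events` CF:
# column names are (timestamp, event_id) pairs kept in sorted order,
# column values are the raw JSON strings the customer sent.
row = sorted([
    ((1405900800, "cef7-be80"), '{"status": 200}'),
    ((1405900860, "a87b-472c"), '{"status": 400}'),
    ((1405900920, "9f45-bf97"), '{"status": 200}'),
])

def count_where(row, start_ts, prop, value):
    """Count events at or after start_ts whose JSON has prop == value."""
    names = [name for name, _ in row]
    first = bisect_left(names, (start_ts, ""))  # one-sided time slice
    matches = 0
    for _, raw in row[first:]:
        event = json.loads(raw)  # every event in the slice is parsed in full
        if event.get(prop) == value:
            matches += 1
    return matches

result = count_where(row, 1405900860, "status", 200)
```

Even though the query only needs `status`, each event in the time slice is deserialized completely, and all of the work happens against a single row.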
  13. • Turned ‘requests’ into ‘requests-[0…n]’
    • No more unbounded row growth
    • Obvious, effective way to parallelize queries
    • Required another CF, an ‘index’, to store pointers to these ‘buckets’

    CF: events
    requests-0: cef7-be80:TimeUUID() → { "status" : 200 }, a87b-472c:TimeUUID() → { "status" : 304 }, . . .
    requests-1: 9f45-bf97:TimeUUID() → { "status" : 417 }, 76ab-dca6:TimeUUID() → { "status" : 200 }, . . .
    requests-n: . . .

    CF: index
    requests: 0 → { "start" : "2014-07-21" … }, 1 → { "start" : "2014-07-22" … }, . . .
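A rough sketch of what a bucketed write path could look like, using plain dictionaries in place of the two column families. The `BUCKET_SIZE` threshold, the rollover rule, and the `write_event` helper are assumptions made for illustration; the slides do not say how Keen decides when to start a new bucket.

```python
import json

BUCKET_SIZE = 3  # hypothetical threshold, kept tiny so the rollover is visible

events_cf = {}  # row key such as "requests-0" -> list of JSON strings
index_cf = {}   # row key such as "requests" -> {bucket number: metadata}

def write_event(collection, event, day):
    """Append an event to the current bucket, rolling over when it is full."""
    buckets = index_cf.setdefault(collection, {0: {"start": day}})
    current = max(buckets)
    if len(events_cf.get(f"{collection}-{current}", [])) >= BUCKET_SIZE:
        current += 1
        buckets[current] = {"start": day}  # second write, to the index CF
    row_key = f"{collection}-{current}"
    events_cf.setdefault(row_key, []).append(json.dumps(event))
    return row_key

keys = [write_event("requests", {"status": s}, "2014-07-21")
        for s in (200, 304, 417, 200)]
```

Here the first three events land in `requests-0` and the fourth opens `requests-1`. Note that a rollover touches both structures, so in a real cluster the two writes have to be kept consistent with each other.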
  18. Concerns

    • Write path has become more complex; the position of the bucket index must be kept in another CF and rolled over
    • Keeping consistent state across more than 1 CF (solution: atomic batch)
    • Still have to deserialize JSON at query time
    • Still not fast enough
  21. The Epiphany

    Don’t let the physical data model dictate the logical data model. Define an ideal logical model first. Then find a way to implement it using a physical data store.

    This was not obvious coming from the relational or document store worlds. It felt dirty. But it doesn’t have to be.
  22. CF: events

    • Row key (UTF8Type): the user’s collection name with a ‘bucket’ sequence number
    • Column name (UTF8Type): the dotted.name of the property
    • Column value (BytesType): a Kryo-serialized, compressed Object[] containing ~5000 property values

    requests-0: timestamp → ["2014-07-21…", "2014-07-21…"], response.status → [200, 400], url → ["keen.io", "keen.io"]
    requests-1: timestamp → ["2014-07-21…", "2014-07-21…"], response.status → [200, 400], url → ["keen.io", "keen.io"]
  23. Notes

    The order of properties in the Object[] arrays is *very* important! timestamp[5], response.status[5] and url[5] must be properties from the same event.
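The alignment rule can be sketched as a pivot from a bucket of events into one array per dotted property name. This is a minimal illustration under several assumptions: `pickle` plus `zlib` stand in for Kryo serialization and compression (Kryo is a Java library), padding absent properties with `None` is a guess at one way to keep positions aligned, and the sample events and helper names are invented.

```python
import pickle
import zlib

def flatten(event, prefix=""):
    """Turn nested dicts into {"dotted.name": value} pairs."""
    flat = {}
    for key, value in event.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, name + "."))
        else:
            flat[name] = value
    return flat

def to_columns(events):
    """Pivot a bucket of events into one aligned array per property."""
    columns = {}
    for i, event in enumerate(events):
        for name, value in flatten(event).items():
            # A property first seen at event i is back-filled with None
            # so that position i always refers to event i.
            columns.setdefault(name, [None] * i).append(value)
        for values in columns.values():
            if len(values) == i:  # property absent from this event
                values.append(None)
    return columns

def encode(values):
    # pickle + zlib are stand-ins for Kryo serialization + compression
    return zlib.compress(pickle.dumps(values))

def decode(blob):
    return pickle.loads(zlib.decompress(blob))

# Invented sample events for one bucket
bucket = [
    {"timestamp": "2014-07-21T00:00:00Z", "url": "keen.io", "response": {"status": 200}},
    {"timestamp": "2014-07-21T00:00:05Z", "url": "keen.io", "response": {"status": 400}},
]
row = {name: encode(values) for name, values in to_columns(bucket).items()}

# A query on one property decodes only that column.
statuses = decode(row["response.status"])
ok_count = sum(1 for s in statuses if s == 200)
```

A count over `response.status` never touches the `timestamp` or `url` columns, which is where the columnar access gain comes from.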
  26. The Good

    Super-fast queries due to:
    • columnar access
    • columnar compression
    • Kryo deserialization (fast!)
    • parallelism via bucketing

    Tradeoffs
    • Application code is more complicated.
    • Writes and reads must understand this structure.

    Worth it!
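As a sketch of how bucketing and columnar access might combine at query time, the example below fans a count out over buckets and sums the partial results. The `buckets` dictionary holds already-decoded columns and the thread pool is just one possible way to parallelize; both are assumptions, not a description of Keen's query engine.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical decoded buckets: row key -> {property name: aligned values}
buckets = {
    "requests-0": {"response.status": [200, 400], "url": ["keen.io", "keen.io"]},
    "requests-1": {"response.status": [200, 200], "url": ["keen.io", "keen.io"]},
}

def count_in_bucket(row_key, prop, value):
    """Scan a single column of a single bucket."""
    column = buckets[row_key].get(prop, [])
    return sum(1 for v in column if v == value)

def count(prop, value):
    """Run the per-bucket scans concurrently and add up the partial counts."""
    with ThreadPoolExecutor() as pool:
        partials = pool.map(lambda key: count_in_bucket(key, prop, value), buckets)
        return sum(partials)

total = count("response.status", 200)
```

Each bucket is an independent unit of work, so adding buckets adds parallelism without changing the query logic.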
  27. Lessons Learned

    • Set performance targets early on
    • Design a logical data model first with the characteristics you need (columnar, compressible, partition-able), then figure out how to project it onto physical storage.
    • Don’t be afraid to try crazy stuff that would feel unnatural in the relational or document worlds
  28. Doomstones

    A machine with corrupted RAM propagated future-dated tombstones around the ring. This made data ‘invisible’ for a number of row keys.
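Why a future-dated tombstone hides data can be shown with a simplified model of timestamp reconciliation, in which the cell with the highest timestamp wins. The `Cell` class, the `reconcile` function, and the timestamps are hypothetical and leave out everything else Cassandra does when merging cells.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Cell:
    value: Optional[str]  # None marks a tombstone (a delete)
    timestamp: int        # write time in microseconds

def reconcile(cells):
    """Return the visible value: the cell with the highest timestamp wins."""
    winner = max(cells, key=lambda c: c.timestamp)
    return winner.value

now = 1_406_000_000_000_000
one_year = 365 * 24 * 3600 * 1_000_000
cells = [
    Cell("event data", now),
    Cell(None, now + one_year),       # future-dated tombstone
    Cell("rewritten data", now + 1),  # later write with a correct timestamp
]
visible = reconcile(cells)
```

In this model `visible` is `None`: rewriting the data with a correct timestamp does not bring it back, because the tombstone still carries the highest timestamp.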
  29. Serialization Incompatibility

    A code change that altered the serialization format made it into the write path before the read path.
    Result: KryoException
  30. Clock Drift

    When ring clocks are not in sync, “Last Write Wins” becomes “Any Write Wins”.
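The effect can be reproduced with a toy model in which each write is stamped by the clock of the node that handles it. The skew values, the `stamp` helper, and the millisecond timestamps are invented for illustration.

```python
def last_write_wins(writes):
    """writes: list of (timestamp, value); the highest timestamp is kept."""
    return max(writes, key=lambda w: w[0])[1]

def stamp(real_time_ms, clock_skew_ms):
    """Timestamp a node would assign, given how far its clock is off."""
    return real_time_ms + clock_skew_ms

# Node A's clock runs 500 ms fast; node B's clock is accurate.
first = (stamp(1000, 500), "written first via node A")
second = (stamp(1200, 0), "written second via node B")
survivor = last_write_wins([first, second])
```

The write that happened first in real time survives, because node A's fast clock gave it the larger timestamp.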