Store JSON in Cassandra the Hard Way

Keen IO stores events in the same format its customers send them in: JSON. Yet Keen uses Cassandra, a distributed database without any JSON primitives, and still gives customers the ability to query over arbitrary (even nested) JSON dimensions. How is this possible?

Josh Dzielak

July 23, 2014

Transcript

  1. Cassandra does not have a
    JSON data type

  2. Cassandra does not have a
    JSON data type
    What do we do?

  3. Example JSON Event
    Models an HTTP request
    {
      "id": "c645-abd3",
      "timestamp": "2013-08-06T12:30:00-07:00",
      "url": "https://keen.io",
      "method": "GET",
      "response": {
        "status": 200
      }
    }

  4. Use static column families?

  5. Use static column families?
    CREATE TABLE requests (
      key uuid PRIMARY KEY,
      timestamp text,
      url text,
      response_status int
    );

  6. Use static column families?
    • Possible if schema is known in advance
    CREATE TABLE requests (
      key uuid PRIMARY KEY,
      timestamp text,
      url text,
      response_status int
    );

  7. Use static column families?
    • Possible if schema is known in advance
    • Still might not be fast to scan 1M+ / 1B+ rows
    CREATE TABLE requests (
      key uuid PRIMARY KEY,
      timestamp text,
      url text,
      response_status int
    );

  8. Use static column families?
    • Possible if schema is known in advance
    • Still might not be fast to scan 1M+ / 1B+ rows
    • Not possible with Keen: the schema is implicitly defined by
    customers and not enforced at write time
    CREATE TABLE requests (
      key uuid PRIMARY KEY,
      timestamp text,
      url text,
      response_status int
    );

  9. Use static column families?
    • Possible if schema is known in advance
    • Still might not be fast to scan 1M+ / 1B+ rows
    • Not possible with Keen: the schema is implicitly defined by
    customers and not enforced at write time
    • Keen has 500,000+ user-defined collections today,
    stored in 1 Cassandra column family
    CREATE TABLE requests (
      key uuid PRIMARY KEY,
      timestamp text,
      url text,
      response_status int
    );

  10. 3 Perfectly Plausible Attempts
    to Store JSON in Cassandra

  11. 3 Perfectly Plausible Attempts
    to Store JSON in Cassandra
    And The 1 That Worked

  12. CF: events
    requests
    cef7-be80:TimeUUID() a87b-472c:TimeUUID() . . .
    { “status” : 200 } { “status” : 400 } . . .
    Row Key: UTF8Type
    User’s collection name (w/ Project ID prefix, omitted for brevity)
    Column Name: Composite(UTF8Type, TimeUUID)
    A unique UUID for the event, plus a TimeUUID

  13. Good: Simple write path, can filter by timeframe
    CF: events
    requests
    cef7-be80:TimeUUID() a87b-472c:TimeUUID() . . .
    { “status” : 200 } { “status” : 400 } . . .

  14. Good: Simple write path, can filter by timeframe
    Bad:
    • Row growth is unbounded; hot spots for big collections
    • Can only filter on time by > or <, not both
    • Little gain from parallelizing because every query hits the same row
    • JSON deserialization in queries is expensive
    CF: events
    requests
    cef7-be80:TimeUUID() a87b-472c:TimeUUID() . . .
    { “status” : 200 } { “status” : 400 } . . .

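A minimal sketch of this first attempt, with a plain Python dict standing in for the `events` column family. The names `events_cf`, `write_event` and `count_where` are hypothetical, and nothing here talks to Cassandra. It only illustrates why the write path is simple while every query has to parse each event's JSON out of one ever-growing row.

```python
import json
import uuid

# Hypothetical in-memory stand-in for the "events" column family:
# row key -> {column name -> JSON blob}.
events_cf = {}

def write_event(collection, event):
    """Append one event as a new column in the collection's single row."""
    # The column name pairs a random event ID with a time-based UUID.
    column = (str(uuid.uuid4()), uuid.uuid1())
    events_cf.setdefault(collection, {})[column] = json.dumps(event)

def count_where(collection, path, expected):
    """Count events whose nested property at `path` equals `expected`."""
    matches = 0
    for blob in events_cf.get(collection, {}).values():
        value = json.loads(blob)  # every event is fully parsed on every query
        for key in path.split("."):
            value = value.get(key) if isinstance(value, dict) else None
        if value == expected:
            matches += 1
    return matches

for status in (200, 400, 200):
    write_event("requests", {"url": "https://keen.io", "response": {"status": status}})

matching = count_where("requests", "response.status", 200)
```

All three events land in the single `requests` row, which is the unbounded growth and hot-spot problem listed on the slide.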

  15. requests-0
    cef7-be80:TimeUUID() a87b-472c:TimeUUID() . . .
    { “status” : 200 } { “status” : 304 } . . .
    requests-1
    9f45-bf97:TimeUUID() 76ab-dca6:TimeUUID() . . .
    { “status” : 417 } { “status” : 200 } . . .
    requests-n . . . . . . . . .
    CF: events

  16. • Turned ‘requests’ into ‘requests-[0…n]’
    requests-0
    cef7-be80:TimeUUID() a87b-472c:TimeUUID() . . .
    { “status” : 200 } { “status” : 304 } . . .
    requests-1
    9f45-bf97:TimeUUID() 76ab-dca6:TimeUUID() . . .
    { “status” : 417 } { “status” : 200 } . . .
    requests-n . . . . . . . . .
    CF: events

  17. • Turned ‘requests’ into ‘requests-[0…n]’
    • No more unbounded row growth
    requests-0
    cef7-be80:TimeUUID() a87b-472c:TimeUUID() . . .
    { “status” : 200 } { “status” : 304 } . . .
    requests-1
    9f45-bf97:TimeUUID() 76ab-dca6:TimeUUID() . . .
    { “status” : 417 } { “status” : 200 } . . .
    requests-n . . . . . . . . .
    CF: events

  18. • Turned ‘requests’ into ‘requests-[0…n]’
    • No more unbounded row growth
    • Obvious, effective way to parallelize queries
    requests-0
    cef7-be80:TimeUUID() a87b-472c:TimeUUID() . . .
    { “status” : 200 } { “status” : 304 } . . .
    requests-1
    9f45-bf97:TimeUUID() 76ab-dca6:TimeUUID() . . .
    { “status” : 417 } { “status” : 200 } . . .
    requests-n . . . . . . . . .
    CF: events

  19. • Turned ‘requests’ into ‘requests-[0…n]’
    • No more unbounded row growth
    • Obvious, effective way to parallelize queries
    • Required another CF, an ‘index’, to store pointers to these ‘buckets’
    CF: events
    requests-0
    cef7-be80:TimeUUID() a87b-472c:TimeUUID() . . .
    { “status” : 200 } { “status” : 304 } . . .
    requests-1
    9f45-bf97:TimeUUID() 76ab-dca6:TimeUUID() . . .
    { “status” : 417 } { “status” : 200 } . . .
    requests-n . . . . . . . . .
    CF: index
    requests
    0 1 . . .
    { “start” : “2014-07-21” … } { “start” : “2014-07-22” … } . . .

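The bucketing scheme can be sketched roughly as follows. This is an in-memory illustration only: `BUCKET_SIZE`, the rollover rule and the shape of the index entries are assumptions made for the demo, not details given in the talk.

```python
import json

BUCKET_SIZE = 2  # hypothetical cap on events per bucket, kept tiny for the demo

events_cf = {}   # row key such as "requests-0" -> list of JSON blobs
index_cf = {}    # row key such as "requests" -> {bucket number -> metadata}

def write_event(collection, event):
    """Write to the newest bucket, rolling over to a new one when it is full."""
    buckets = index_cf.setdefault(collection, {})
    current = max(buckets) if buckets else None
    if current is None or len(events_cf[f"{collection}-{current}"]) >= BUCKET_SIZE:
        current = 0 if current is None else current + 1
        buckets[current] = {"start": event["timestamp"]}
        events_cf[f"{collection}-{current}"] = []
    events_cf[f"{collection}-{current}"].append(json.dumps(event))

def bucket_keys(collection):
    """Row keys a query can fan out over, one task per bucket."""
    return [f"{collection}-{n}" for n in sorted(index_cf.get(collection, {}))]

for day, status in [("2014-07-21", 200), ("2014-07-21", 304), ("2014-07-22", 417)]:
    write_event("requests", {"timestamp": day, "response": {"status": status}})
```

Each write now touches two structures, the bucket row and the index row, which is where the extra write-path complexity comes from.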

  20. Concerns
    CF: events
    requests-0
    cef7-be80:TimeUUID() a87b-472c:TimeUUID() . . .
    { “status” : 200 } { “status” : 304 } . . .
    requests-1
    9f45-bf97:TimeUUID() 76ab-dca6:TimeUUID() . . .
    { “status” : 417 } { “status” : 200 } . . .
    requests-n . . . . . . . . .

  21. Concerns
    • Write path has become more complex; position of
    bucket index must be kept in another CF and rolled over
    CF: events
    requests-0
    cef7-be80:TimeUUID() a87b-472c:TimeUUID() . . .
    { “status” : 200 } { “status” : 304 } . . .
    requests-1
    9f45-bf97:TimeUUID() 76ab-dca6:TimeUUID() . . .
    { “status” : 417 } { “status” : 200 } . . .
    requests-n . . . . . . . . .

  22. Concerns
    • Write path has become more complex; position of
    bucket index must be kept in another CF and rolled over
    • Keeping consistent state for more than 1 CF (solution: atomic batch)
    CF: events
    requests-0
    cef7-be80:TimeUUID() a87b-472c:TimeUUID() . . .
    { “status” : 200 } { “status” : 304 } . . .
    requests-1
    9f45-bf97:TimeUUID() 76ab-dca6:TimeUUID() . . .
    { “status” : 417 } { “status” : 200 } . . .
    requests-n . . . . . . . . .

  23. Concerns
    • Write path has become more complex; position of
    bucket index must be kept in another CF and rolled over
    • Keeping consistent state for more than 1 CF (solution: atomic batch)
    • Still have to deserialize JSON at query time
    CF: events
    requests-0
    cef7-be80:TimeUUID() a87b-472c:TimeUUID() . . .
    { “status” : 200 } { “status” : 304 } . . .
    requests-1
    9f45-bf97:TimeUUID() 76ab-dca6:TimeUUID() . . .
    { “status” : 417 } { “status” : 200 } . . .
    requests-n . . . . . . . . .

  24. Concerns
    • Write path has become more complex; position of
    bucket index must be kept in another CF and rolled over
    • Keeping consistent state for more than 1 CF (solution: atomic batch)
    • Still have to deserialize JSON at query time
    • Still not fast enough
    CF: events
    requests-0
    cef7-be80:TimeUUID() a87b-472c:TimeUUID() . . .
    { “status” : 200 } { “status” : 304 } . . .
    requests-1
    9f45-bf97:TimeUUID() 76ab-dca6:TimeUUID() . . .
    { “status” : 417 } { “status” : 200 } . . .
    requests-n . . . . . . . . .

  25. We are here. Why?

  26. Our assumptions about Cassandra
    were not valid.

  27. Our assumptions about Cassandra
    were not valid.
    Namely, that it would do most of the work for us.

  28. - Dieter Rams

  29. The Epiphany

  30. The Epiphany
    Don’t let the physical data model dictate
    the logical data model.

  31. The Epiphany
    Don’t let the physical data model dictate
    the logical data model.
    Define an ideal logical model first. Then find a way
    to implement it using a physical data store.

  32. The Epiphany
    Don’t let the physical data model dictate
    the logical data model.
    Define an ideal logical model first. Then find a way
    to implement it using a physical data store.
    This was not obvious coming from the relational
    or document store worlds.

  33. The Epiphany
    Don’t let the physical data model dictate
    the logical data model.
    Define an ideal logical model first. Then find a way
    to implement it using a physical data store.
    This was not obvious coming from the relational
    or document store worlds.
    It felt dirty. But it doesn’t have to be.

  34. The Epiphany
    Don’t store JSON as JSON!

  35. CF: events
    requests-0
    timestamp response.status url
    [“2014-07-21…”, “2014-07-21…”] [200, 400] [“keen.io”, “keen.io”]
    requests-1
    timestamp response.status url
    [“2014-07-21…”, “2014-07-21…”] [200, 400] [“keen.io”, “keen.io”]
    Row Key: UTF8Type
    User’s collection name with a ‘bucket’ sequence number
    Column Name: UTF8Type
    The dotted.name of the property
    Column Value: BytesType
    A Kryo-serialized, compressed Object[] containing ~5000 property values

  36. CF: events
    requests-0
    timestamp response.status url
    [“2014-07-21…”, “2014-07-21…”] [200, 400] [“keen.io”, “keen.io”]
    requests-1
    timestamp response.status url
    [“2014-07-21…”, “2014-07-21…”] [200, 400] [“keen.io”, “keen.io”]
    Notes
    Order of properties in the Object[] arrays is *very* important!
    timestamp[5], response.status[5] and url[5] must be
    properties from the same event

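A rough sketch of the pivot from events to per-property arrays. Two loud assumptions: `pickle` plus `zlib` stand in for Kryo serialization and compression (Kryo is a Java library), and padding missing properties with `None` is one guessed way of keeping positions aligned, not something the slides specify. The helper names are hypothetical.

```python
import pickle
import zlib

def flatten(event, prefix=""):
    """Turn nested dicts into {"response.status": 200}-style dotted names."""
    flat = {}
    for key, value in event.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, name + "."))
        else:
            flat[name] = value
    return flat

def build_bucket(events):
    """Pivot events into one array per property; index i is always event i."""
    rows = [flatten(event) for event in events]
    names = sorted({name for row in rows for name in row})
    # Pad missing properties with None so every array has the same length.
    return {name: [row.get(name) for row in rows] for name in names}

def encode(values):
    # pickle + zlib stand in for Kryo serialization plus compression.
    return zlib.compress(pickle.dumps(values))

def decode(blob):
    return pickle.loads(zlib.decompress(blob))

events = [
    {"timestamp": "2014-07-21T10:00:00Z", "url": "keen.io", "response": {"status": 200}},
    {"timestamp": "2014-07-21T10:00:05Z", "url": "keen.io", "response": {"status": 400}},
]
bucket = {name: encode(values) for name, values in build_bucket(events).items()}
statuses = decode(bucket["response.status"])
```

Because every array is built from the same ordered list of events, position `i` in `timestamp`, `response.status` and `url` always refers to the same event.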

  37. The Good
    Super-fast queries due to:
    • columnar access
    • columnar compression
    • Kryo deserialization (fast!)
    • parallelism via bucketing

  38. The Good
    Super-fast queries due to:
    • columnar access
    • columnar compression
    • Kryo deserialization (fast!)
    • parallelism via bucketing
    Tradeoffs
    • Application code is more complicated.
    • Writes and reads must understand this structure.

  39. The Good
    Super-fast queries due to:
    • columnar access
    • columnar compression
    • Kryo deserialization (fast!)
    • parallelism via bucketing
    Tradeoffs
    • Application code is more complicated.
    • Writes and reads must understand this structure.
    Worth it!

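To illustrate how columnar access and bucketing combine at query time, here is a hedged sketch of a count query. It decodes only the one property it needs from each bucket and fans out one task per bucket. The in-memory `events_cf` dict, the helper names, and the use of `pickle`/`zlib` in place of Kryo and compression are all assumptions for illustration.

```python
import pickle
import zlib
from concurrent.futures import ThreadPoolExecutor

def encode(values):
    # pickle + zlib stand in for Kryo serialization plus compression.
    return zlib.compress(pickle.dumps(values))

# Hypothetical stand-in for the "events" CF: row key -> {property -> blob}.
events_cf = {
    "requests-0": {"response.status": encode([200, 400]), "url": encode(["keen.io", "keen.io"])},
    "requests-1": {"response.status": encode([200, 200]), "url": encode(["keen.io", "keen.io"])},
}

def count_in_bucket(row_key, prop, expected):
    """Decode a single column of a single bucket and count matching values."""
    values = pickle.loads(zlib.decompress(events_cf[row_key][prop]))
    return sum(1 for value in values if value == expected)

def count_where(row_keys, prop, expected):
    """Fan out one task per bucket, then sum the partial counts."""
    with ThreadPoolExecutor() as pool:
        partials = pool.map(lambda key: count_in_bucket(key, prop, expected), row_keys)
        return sum(partials)

total = count_where(["requests-0", "requests-1"], "response.status", 200)
```

The `url` column is never read or decoded, which is the point of the columnar layout. The tradeoff named on the slide also shows up here: the query code has to know about buckets and per-property blobs.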

  40. Lessons Learned
    • Set performance targets early on
    • Design a logical data model first with the
    characteristics you need (columnar,
    compressible, partition-able), then figure out
    how to project it onto physical storage.
    • Don’t be afraid to try crazy stuff that would feel
    unnatural in the relational or document worlds

  41. 3 Awesome Things That
    Happened In Production
    That Were Not
    Awesome At The Time

  42. Doomstones
    A machine with corrupted RAM propagated
    future-dated tombstones around the ring. This
    made data ‘invisible’ for a number of row keys.

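Why a future-dated tombstone hides data can be shown with a toy model of timestamp-based reconciliation, where the cell with the highest timestamp wins and a delete is just another timestamped cell. The `Cell` class and `reconcile` function are hypothetical simplifications, not Cassandra code.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Cell:
    timestamp: int          # microseconds since the epoch, set by the writer
    value: Optional[str]    # None marks a tombstone (a delete)

def reconcile(cells):
    """Return the visible value: the cell with the highest timestamp wins."""
    winner = max(cells, key=lambda cell: cell.timestamp)
    return winner.value

now = 1_406_073_600_000_000             # 2014-07-23 00:00:00 UTC, in microseconds
one_year = 365 * 24 * 3600 * 1_000_000

cells = [
    Cell(timestamp=now, value="event data"),
    Cell(timestamp=now + one_year, value=None),   # tombstone dated in the future
    Cell(timestamp=now + 60_000_000, value="rewritten event data"),
]
visible = reconcile(cells)
```

Even the rewrite made a minute later stays invisible, because its timestamp is still lower than the tombstone's.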

  43. Serialization Incompatibility
    A code change that altered the serialization format
    made it into the write path before the read path.
    Result: KryoException

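One common way to make this kind of failure explicit is to tag each blob with a format version. The sketch here is a general illustration, not a description of what Keen did: the version byte, both formats and all function names are hypothetical, and JSON stands in for Kryo.

```python
import json
import struct

def encode_v1(values):
    # Format 1: a version byte followed by a bare JSON array.
    return struct.pack("B", 1) + json.dumps(values).encode("utf-8")

def encode_v2(values):
    # Format 2: the values are wrapped in an envelope that records their count.
    payload = {"n": len(values), "values": values}
    return struct.pack("B", 2) + json.dumps(payload).encode("utf-8")

DECODERS = {
    1: lambda body: json.loads(body),
    2: lambda body: json.loads(body)["values"],
}

def decode(blob, decoders=DECODERS):
    """Dispatch on the leading version byte; fail loudly on unknown formats."""
    version = blob[0]
    if version not in decoders:
        raise ValueError(f"unsupported serialization version {version}")
    return decoders[version](blob[1:])

old_reader = {1: DECODERS[1]}   # a read path deployed before the format change
new_values = decode(encode_v2([200, 400]))
try:
    decode(encode_v2([200, 400]), decoders=old_reader)
    old_reader_failed = False
except ValueError:
    old_reader_failed = True
```

An old reader still fails on new data, but with a clear error. The safer rollout order is the reverse of the incident: ship readers that understand the new format first, then switch the writers.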

  44. Clock Drift
    When ring clocks are not in sync, “Last Write
    Wins” becomes “Any Write Wins”.

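A toy illustration of the effect, assuming conflicts are resolved purely by comparing write timestamps. The node names, the drift value and the `last_write_wins` helper are made up for the example.

```python
def last_write_wins(writes):
    """Pick the surviving value by write timestamp; real-time order is unknown."""
    return max(writes, key=lambda write: write["timestamp"])["value"]

# Hypothetical drift: node B's clock runs 5 seconds behind node A's.
DRIFT_B = -5.0

true_time_a, true_time_b = 100.0, 102.0   # B's write really happens 2 s after A's
writes = [
    {"node": "A", "timestamp": true_time_a, "value": "first"},
    {"node": "B", "timestamp": true_time_b + DRIFT_B, "value": "second"},
]
survivor = last_write_wins(writes)
```

The write that actually happened last ("second") is discarded, because node B stamped it with a time earlier than node A's write.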

  45. Thanks!
    Questions?
    Talk at my face
    or email me at
    [email protected]