1, 2, 3, 4 Add Another Data Store (And Other Rhymes)

simplereach
September 05, 2012

Eric Lubow's presentation from the 2012 Cassandra Summit.

Transcript

  Slide 1

    1,2,3,4 Add Another Data Store (And Other Rhymes) Eric Lubow

    @elubow elubow@simplereach.com #cassandra12

  Slide 2: Overview
    • SimpleReach
    • Definitions and Data Stores
    • Evolution to Polyglottany
    • Tie It Together
    • Questions

  Slide 4: Size
    • 100M events recorded per day, and growing
    • 500M pageviews per month, and growing
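
The slide's volumes are easier to reason about as rates. Here is a back-of-the-envelope sketch, assuming traffic is spread evenly over time and a month is 30 days (neither assumption is stated on the slide). Real traffic is bursty, so peak rates will sit above these averages.

```python
# Average rates implied by the "Size" slide.
# Assumptions (not from the talk): uniform traffic, 30-day month.
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

events_per_day = 100_000_000       # "100m events recorded per day"
pageviews_per_month = 500_000_000  # "500m pageviews per month"

events_per_second = events_per_day / SECONDS_PER_DAY
pageviews_per_second = pageviews_per_month / (30 * SECONDS_PER_DAY)

print(f"events/s:    {events_per_second:,.0f}")     # events/s:    1,157
print(f"pageviews/s: {pageviews_per_second:,.0f}")  # pageviews/s: 193
```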

  Slide 5: Polyglot Persistence
    "Polyglot Persistence, like polyglot programming, is all about choosing the right persistence option for the task at hand."
    http://www.sleberknight.com/blog/sleberkn/entry/polyglot_persistence

  Slide 7: Why?
    • Heavier read loads vs. heavier write loads
    • Data relationships may be less important
    • Different aspects of a system have different requirements

  Slide 16: Cassandra
    • Large data volume ingestion
    • Really fast writes to many locations (eventual consistency)
    • Query by column groups within rows
    • Range queries in Hive (partial column family scans)
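
"Query by column groups within rows" relies on Cassandra's wide-row model: columns within a row are stored sorted by name, so a contiguous group of columns can be read as one slice. The sketch below is a toy, in-memory Python model of that idea, assuming tuple-valued composite column names such as (day, metric). The `WideRow` class and its methods are hypothetical and are not part of any Cassandra driver.

```python
from bisect import bisect_left, insort
from typing import Any


class WideRow:
    """Toy, in-memory model of one Cassandra-style wide row (not a driver).

    Column names are tuples standing in for composite columns, e.g.
    (day, metric). Names are kept sorted, so any contiguous range of
    columns, including all columns sharing a prefix, is a cheap slice.
    """

    def __init__(self) -> None:
        self._names: list[tuple] = []        # sorted column names
        self._values: dict[tuple, Any] = {}  # column name -> value

    def insert(self, name: tuple, value: Any) -> None:
        # Writes are upserts: writing an existing name overwrites its value.
        if name not in self._values:
            insort(self._names, name)
        self._values[name] = value

    def slice(self, start: tuple, finish: tuple) -> list[tuple[tuple, Any]]:
        """Columns with start <= name < finish, in sorted order."""
        lo = bisect_left(self._names, start)
        hi = bisect_left(self._names, finish)
        return [(n, self._values[n]) for n in self._names[lo:hi]]

    def group(self, prefix: tuple) -> list[tuple[tuple, Any]]:
        """All columns whose name starts with `prefix` (one column group)."""
        # A tuple sorts before every longer tuple that extends it, so the
        # group is the contiguous run beginning at bisect_left(prefix).
        out = []
        for n in self._names[bisect_left(self._names, prefix):]:
            if n[:len(prefix)] != prefix:
                break
            out.append((n, self._values[n]))
        return out


# Hypothetical example: one row per article, one column per (day, metric).
row = WideRow()
row.insert(("2012-09-04", "pageviews"), 290)
row.insert(("2012-09-05", "pageviews"), 340)
row.insert(("2012-09-05", "clicks"), 12)
row.insert(("2012-09-06", "pageviews"), 410)

print(row.group(("2012-09-05",)))
# [(('2012-09-05', 'clicks'), 12), (('2012-09-05', 'pageviews'), 340)]
```

Keeping names sorted at write time is what makes both a prefix group and a day range a single contiguous read rather than a scan of the whole row.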

  Slide 17: MongoDB
    • Fast atomic increments (Node.js speaks JSON natively)
    • Sharding for faster distributed increments
    • Solid ORM for Rails (Mongoid)
    • Fast access for pub/sub of durable, persisted documents
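
The slide credits MongoDB with fast atomic increments and notes that sharding spreads those increments out. In MongoDB itself, an atomic increment is expressed with the `$inc` update operator. As a hedged illustration of the sharding half only, the pure-Python sketch below partitions counters across a fixed number of shards, each guarded by its own lock. `ShardedCounters` and its crc32-based routing are assumptions made for this example and do not reflect how MongoDB actually chooses or balances shards.

```python
import threading
import zlib
from collections import Counter


class ShardedCounters:
    """Toy model of counters partitioned across shards (not MongoDB).

    Each key is routed to one shard by a stable hash, and each shard has its
    own lock, so increments on keys that land on different shards never
    wait on each other.
    """

    def __init__(self, num_shards: int = 4) -> None:
        self._shards = [Counter() for _ in range(num_shards)]
        self._locks = [threading.Lock() for _ in range(num_shards)]

    def _shard_for(self, key: str) -> int:
        # crc32 is stable across processes, unlike Python's salted hash().
        return zlib.crc32(key.encode("utf-8")) % len(self._shards)

    def incr(self, key: str, amount: int = 1) -> None:
        i = self._shard_for(key)
        with self._locks[i]:  # per-shard lock makes the read-modify-write atomic
            self._shards[i][key] += amount

    def get(self, key: str) -> int:
        i = self._shard_for(key)
        with self._locks[i]:
            return self._shards[i][key]  # Counter returns 0 for unseen keys


counters = ShardedCounters(num_shards=4)


def worker() -> None:
    for _ in range(1_000):
        counters.incr("article:42:pageviews")
        counters.incr("article:7:pageviews")


threads = [threading.Thread(target=worker) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counters.get("article:42:pageviews"))  # 8000
```

Because of CPython's global interpreter lock, this toy shows the partitioning and locking layout rather than a real throughput gain; in a sharded cluster the gain comes from different shards living on different servers.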

  Slide 18: Redis
    • Supports hundreds of thousands of transactions per second
    • Great caching engine
    • Supports useful data types such as sorted sets
    • Pays the serialization/deserialization (SerDe) cost on each access
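
Sorted sets are the Redis type the slide singles out; a typical use is a "top N" list in which each member carries a score. To show the semantics without a running server, the sketch below is a toy Python class that keeps members ordered by score, with method names that only loosely echo ZADD, ZSCORE and ZREVRANGE. The class, its signatures and the `article:N` keys are hypothetical; against a real server you would use a Redis client library instead.

```python
from bisect import bisect_left, insort
from typing import Optional


class ToySortedSet:
    """Toy, in-memory stand-in for a Redis sorted set (not a Redis client).

    Each member has exactly one score; members are kept ordered by
    (score, member), so equal scores fall back to member-name order.
    """

    def __init__(self) -> None:
        self._scores: dict[str, float] = {}
        self._ordered: list[tuple[float, str]] = []  # ascending (score, member)

    def zadd(self, member: str, score: float) -> None:
        old = self._scores.get(member)
        if old is not None:
            # Drop the member's old position before re-inserting it.
            del self._ordered[bisect_left(self._ordered, (old, member))]
        self._scores[member] = score
        insort(self._ordered, (score, member))

    def zscore(self, member: str) -> Optional[float]:
        return self._scores.get(member)

    def zrevrange(self, start: int, stop: int) -> list[str]:
        """Members from highest to lowest score; `stop` is inclusive and
        negative indexes count from the end, as in Redis."""
        desc = [member for _, member in reversed(self._ordered)]
        n = len(desc)
        if start < 0:
            start = max(start + n, 0)
        if stop < 0:
            stop += n
        if stop < start:
            return []
        return desc[start:stop + 1]


trending = ToySortedSet()
trending.zadd("article:1", 120)
trending.zadd("article:2", 340)
trending.zadd("article:3", 95)
trending.zadd("article:1", 400)  # re-scoring moves the member
print(trending.zrevrange(0, 1))  # ['article:1', 'article:2']
```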

  Slide 19: InfiniDB and Infobright
    • Column stores for ad-hoc analytics queries in SQL
    • Databases built for business intelligence
    • Heavy compression of data
    • Pre-aggregated data (extents / Knowledge Grid)
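
InfiniDB's extents and Infobright's Knowledge Grid both keep metadata about blocks of column data so the engine can avoid reading blocks that cannot affect a query. The sketch below is a simplified, hypothetical Python model of that general idea (per-block min/max plus a stored sum); it is not how either product is actually implemented, and compression is left out.

```python
from dataclasses import dataclass


@dataclass
class Extent:
    """A block of one column's values plus metadata kept alongside it."""
    values: list[int]
    min_value: int
    max_value: int
    total: int  # pre-aggregated SUM of the block


def build_extents(column: list[int], size: int) -> list[Extent]:
    extents = []
    for i in range(0, len(column), size):
        chunk = column[i:i + size]
        extents.append(Extent(chunk, min(chunk), max(chunk), sum(chunk)))
    return extents


def sum_between(extents: list[Extent], lo: int, hi: int) -> tuple[int, int]:
    """SUM(col) WHERE lo <= col <= hi, consulting metadata before data.

    Returns (sum, number of extents whose values actually had to be read).
    """
    total, scanned = 0, 0
    for e in extents:
        if e.max_value < lo or e.min_value > hi:
            continue                  # nothing in this block can match
        if lo <= e.min_value and e.max_value <= hi:
            total += e.total          # everything matches: use the stored sum
            continue
        scanned += 1                  # partial overlap: read the values
        total += sum(v for v in e.values if lo <= v <= hi)
    return total, scanned


column = list(range(1, 101))          # e.g. an append-ordered, timestamp-like column
extents = build_extents(column, size=10)
print(sum_between(extents, 25, 54))   # (1185, 2): 10 extents, only 2 read
```

Pruning like this pays off when values are clustered, as in an append-ordered column; on randomly ordered data most blocks overlap any range and little can be skipped.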

  Slide 20: Ruby, Node.js, Python
    • Polyglottany doesn’t only apply to data stores
    • Each language has its own benefits for each data storage layer
    • Each language has its own individual benefits
    • JSON, APIs, performance

  Slide 22: Cons
    • Redis: can only utilize a single core
    • MySQL column stores: DELETEs and UPDATEs are very expensive
    • Cassandra: no B-tree indexes
    • Mongo: queries slow down as the shard count increases; indexes must fit in memory
    • Python: whitespace; community
    • Ruby: not high-performance enough for our standards
    • JavaScript (Node.js): bad for CPU- or IO-intensive workloads

  Slide 23: Tying It Together
    • Built in the cloud
    • Service-oriented architecture (internal API)
    • Built Helenus (a Node.js driver for Cassandra)
    • Data accuracy checks: visual and programmatic
    • Built a framework for testing out storage engines
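
The slide does not say how the programmatic accuracy checks work. One plausible shape, offered purely as an assumption, is to pull the same aggregate (for example, per-article pageview counts) from two stores and flag keys that disagree by more than a tolerance, which matters when writes fan out to several stores with eventual consistency. The function name, its inputs and the 1% default tolerance are all hypothetical.

```python
def accuracy_check(
    primary: dict[str, int],
    secondary: dict[str, int],
    tolerance: float = 0.01,
) -> dict[str, tuple[int, int]]:
    """Hypothetical cross-store check: return keys whose values differ by
    more than `tolerance`, relative to the larger of the two values.

    A key missing from one store counts as 0 there.
    """
    mismatches = {}
    for key in sorted(primary.keys() | secondary.keys()):
        a = primary.get(key, 0)
        b = secondary.get(key, 0)
        baseline = max(abs(a), abs(b))
        if baseline == 0:
            continue  # both zero: nothing to compare
        if abs(a - b) / baseline > tolerance:
            mismatches[key] = (a, b)
    return mismatches


# Illustrative numbers only, not data from the talk.
store_a_counts = {"article:1": 1000, "article:2": 500, "article:3": 42}
store_b_counts = {"article:1": 995, "article:2": 450}

print(accuracy_check(store_a_counts, store_b_counts))
# {'article:2': (500, 450), 'article:3': (42, 0)}
```

A relative tolerance lets small in-flight differences pass while still catching a store that is missing data outright.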

  Slide 24: Service Architecture
    (Diagram labels: Internal API, Analytics, Real-time)
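
The diagram places an internal API in front of analytics and real-time services. A minimal sketch of that shape, under the assumption that callers ask the API for a kind of data and the API dispatches to whichever service owns it, might look like this; `InternalAPI`, the route names and the stand-in service functions are hypothetical, not SimpleReach's code.

```python
from typing import Callable

Handler = Callable[[str], dict]


class InternalAPI:
    """Hypothetical internal API facade: callers name the kind of data they
    want, and the API decides which backing service answers."""

    def __init__(self) -> None:
        self._routes: dict[str, Handler] = {}

    def register(self, kind: str, handler: Handler) -> None:
        self._routes[kind] = handler

    def get(self, kind: str, key: str) -> dict:
        try:
            handler = self._routes[kind]
        except KeyError:
            raise ValueError(f"no service registered for {kind!r}") from None
        return handler(key)


# Stand-ins for the two services on the slide; each would sit on its own store.
def realtime_service(key: str) -> dict:
    return {"source": "real-time", "key": key}


def analytics_service(key: str) -> dict:
    return {"source": "analytics", "key": key}


api = InternalAPI()
api.register("realtime", realtime_service)
api.register("analytics", analytics_service)
print(api.get("realtime", "article:42"))  # {'source': 'real-time', 'key': 'article:42'}
```

With this shape, callers depend only on the API, so the store behind a service can be changed without touching them.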

  Slide 25: Helenus
    • Built a Node.js driver for Cassandra
    • https://github.com/simplereach/helenus
    • CQL 2/3, composite columns, Thrift interface
    • More about Node.js and Cassandra

  Slide 26: Points To Consider
    • Data consistency: the same in all data stores
    • How important is data durability?
    • Managing many servers (Chef, AWS, CSSH)
    • Managing, learning, and tuning many different applications

  Slide 27: Summary
    • Polyglottany is not a sin
    • Know your data read/write patterns
    • Know the tools available to you
    • Know your compromises

  Slide 29
    Questions are guaranteed in life. Answers aren’t.
    Thank you.