How Keen IO uses Storm

Josh Dzielak
October 18, 2013

This presentation discusses Storm, the distributed computation system, and how it’s used at Keen IO. Redstorm, which makes it possible to build Storm topologies in Ruby, is also discussed. First given at #sfrails in October 2013.

Transcript

  1. About me

    * Full stack developer spanning 2 millennia
    * Helped found & build Togetherville (Disney). Ruby 1.8.7 and Rails 2.3.8 FTW!
    * Author of mongoid_alize, four & keen-gem
    * Mentor at HackBright & HackReactor
    * Currently VP Engineering at Keen IO
  2. I know I need analytics, but I’d rather spend time building features

    I use APIs: Sendgrid, Iron.io, Twilio, Heroku, Pusher. Would I like Keen IO? YES. You probably would!
  3. Tech @ Keen IO

    * Tornado API server
    * Flask on the web
    * Official & community SDKs. Ruby is very popular!
  4. Tech @ Keen IO: our old backend

    Pros:
    * Fast writes
    * Easy to set up
    * Develop features quickly

    Cons/what we outgrew:
    * Ad-hoc query performance
    * Operational ease
    * Aggregation features
  5. What is Storm?

    Pop quiz! Storm is:
    a) A project with 7,000+ followers on GitHub
    b) Low-latency distributed computation system
    c) WNBA team in Seattle
    d) Capable of streaming map-reduce
    e) All of the above
  6. Storm Primitives

    * SPOUT: pulls from data sources
    * BOLT: does some processing
    * TUPLE: what’s on the wire, for example Username = dzello, Level = 99, Date = 2013-10-17
  7. Storm, Deployed

    [Diagram: ExampleTopology running on Host 1 to Host 3 and Worker 1 to Worker 4, with spouts and bolts distributed among the workers and a data source feeding the spouts.]
  8. Common Storm Myths

    Myth: Clouds don’t like Storms.
    Storm deploys to any cloud: https://github.com/nathanmarz/storm-deploy
  9. Storm at Keen IO

    Storm is the primary logical layer for storing events and performing queries. Cassandra distributes the data & Storm distributes the computation. Because Storm and Cassandra scale linearly, we can perform writes and queries with low latency and high throughput, all while remaining fault tolerant.
  10. How fast is this?

    The Write Topology

    Storm Nodes | Cassandra Nodes | Events/Sec
    3           | 6               | 50,000+

    The Query Topology

    Query Type        | Collection Size (events) | Mean Response Time
    Full Count        | 100M                     | >100ms
    Average w/ groups | 100M                     | 300ms
    Sum over a field  | 600M                     | 800ms
  11. The Write Topology

    Diagram components, in the order they appear: Tornado API, Kafka, Kafka Spout (x3), EventPartitioner Bolt (x2), PartitionEvent Bolt, PersistEvent Bolt (x3), Cassandra, plus Zookeeper.

    Diagram callouts: "fault-tolerance starts here", "splits the work", "enforces exactly-once semantics", "keeps the data", "Zookeeper keeps the peace".
  12. The Query Topology

    Diagram components, in the order they appear: Tornado API, Storm DRPC Server, DRPC Spout (x3), EventPartitioner Bolt (x2), IndexExpander Bolt, PersistEvent Bolt (x2), BucketReducer Bolt, Aggregation Bolt, Cassandra, plus Zookeeper.

    Diagram callouts: "emits matching buckets", "reduces each bucket", "returns response", "keeping the data", "Zookeeper keeping the peace".
  13. Haz Storm for Ruby?

    REDSTORM: https://github.com/colinsurprenant/redstorm

    * Elegant JRuby bindings for Storm.
    * Includes batteries: CLI scripts to package jars, work with Storm locally and deploy to a cluster.
    * Very easy way to get familiar with Storm.
    * Simple Twitter streaming example: https://github.com/dzello/ontweet
  14. Thanks #sfrails!

    More resources for Storm & distributed systems:
    * http://www.michael-noll.com/blog/2012/10/16/understanding-the-parallelism-of-a-storm-topology/
    * https://speakerdeck.com/dzello/distributed-systems-are-everywhere-where-the-full-stack-is-headed
    * http://storm-project.net/
    * https://github.com/colinsurprenant/redstorm/wiki/Ruby-DSL-Documentation
  15. Coming to defrag? (November 4th - 6th)

    Check out my talk: One Billion Per Second: The Rise of Designer Data Architectures
    http://defragcon.com/2013/agenda/
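
As a rough illustration of the primitives on slide 6 and the partition-then-reduce flow suggested by the query topology on slide 12, here is a minimal sketch in plain Python. It does not use Storm or its API, and it is not Keen IO's implementation. The function names echo the slide labels (spout, EventPartitioner, BucketReducer, Aggregation), but their behavior, the CRC32-based bucketing, and all sample rows except the `dzello` tuple are assumptions made for the example.

```python
from collections import defaultdict, namedtuple
from zlib import crc32

# A tuple is a named list of values passed between components (slide 6).
Event = namedtuple("Event", ["username", "level", "date"])


def spout(source):
    """Spout: pull records from a data source and emit them as tuples."""
    for record in source:
        yield Event(*record)


def partition_bolt(events, num_buckets):
    """Bolt: route each tuple to a bucket using a stable hash of its key.

    Hashing on username with CRC32 is an assumption for this sketch.
    """
    buckets = defaultdict(list)
    for event in events:
        bucket_id = crc32(event.username.encode("utf-8")) % num_buckets
        buckets[bucket_id].append(event)
    return buckets


def bucket_reducer_bolt(bucket):
    """Bolt: reduce one bucket to a partial (sum, count) over `level`."""
    return sum(e.level for e in bucket), len(bucket)


def aggregation_bolt(partials):
    """Bolt: merge the per-bucket partials into the final answer."""
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    average = total / count if count else None
    return {"sum": total, "count": count, "average": average}


def run_query(source, num_buckets=3):
    """Wire the pieces together: spout -> partition -> reduce -> aggregate."""
    buckets = partition_bolt(spout(source), num_buckets)
    partials = [bucket_reducer_bolt(b) for b in buckets.values()]
    return aggregation_bolt(partials)


# Made-up sample data; only the first row appears on the slides.
SAMPLE = [
    ("dzello", 99, "2013-10-17"),
    ("alice", 10, "2013-10-17"),
    ("bob", 41, "2013-10-18"),
]

if __name__ == "__main__":
    print(run_query(SAMPLE))
```

Because each bucket is reduced to a small partial result before merging, the final answer does not depend on how many buckets there are. That independence is what lets this kind of work be spread across many workers in a real topology.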