How Keen IO uses Storm

Josh Dzielak
October 18, 2013

This presentation discusses Storm, the distributed computation system, and how it’s used at Keen IO. Redstorm, which makes it possible to build Storm topologies in Ruby, is also discussed. First given at #sfrails in October 2013.

Transcript

  1. How Keen IO uses Storm
    Josh Dzielak, @dzello, 10/18/2013, #sfrails
  2. About me
    * Full stack developer spanning 2 millennia
    * Helped found & build Togetherville (Disney). Ruby 1.8.7 and Rails 2.3.8 FTW!
    * Author of mongoid_alize, four & keen-gem.
    * Mentor at HackBright & HackReactor
    * Currently VP Engineering at Keen IO
  3. An analytics API for the modern developer. http://keen.io
    But is it for me?
  4. I know I need analytics, but I’d rather spend time building features.
    I use APIs: Sendgrid, Iron.io, Twilio, Heroku, Pusher.
    Would I like Keen IO? YES. You probably would!
  5. Tech @ Keen IO
    * Tornado API server
    * Flask on the web
    * Official & community SDKs. Ruby is very popular!
  6. Tech @ Keen IO: Our old backend
    Pros:
    * Fast writes
    * Easy to set up
    * Develop features quickly
    Cons/what we outgrew:
    * Ad-hoc query performance
    * Operational ease
    * Aggregation features
  7. Tech @ Keen IO: Our new backend
    STORM
  8. Heard of Storm? NO
  9. What is Storm?
    Pop quiz! Storm is:
    a) A project with 7,000+ followers on GitHub
    b) Low-latency distributed computation system
    c) WNBA team in Seattle
    d) Capable of streaming map-reduce
    e) All of the above
  10. Storm Primitives
    * SPOUT: pulls from data sources
    * BOLT: does some processing
    * TUPLE: what’s on the wire

    Username | Level | Date
    dzello   | 99    | 2013-10-17
  11. Storm, Deployed
    [Diagram: ExampleTopology running on Host 1, Host 2 and Host 3 across Worker 1 to Worker 4. Each worker runs several Spouts and Bolts, with the Spouts attached to a Data Source.]
  12. Common Storm Myths
    Myth: Clouds don’t like Storms.
    Storm deploys to any cloud. https://github.com/nathanmarz/storm-deploy
  13. Storm at Keen IO
    The primary logical layer for storing events and performing queries. Cassandra distributes the data & Storm distributes the computation. Because Storm and Cassandra scale linearly, we can perform writes and queries with low latency and high throughput, all while remaining fault tolerant.
  14. How fast is this?
    The Write Topology

    Storm Nodes | Cassandra Nodes | Events/Sec
    3           | 6               | 50,000+

    The Query Topology

    Query Type         | Collection Size (events) | Mean Response Time
    Full Count         | 100M                     | >100ms
    Average w/ groups  | 100M                     | 300ms
    Sum over a field   | 600M                     | 800ms
  15. The Write Topology
    [Diagram. Components: Tornado API, Kafka, Kafka Spouts, Zookeeper, EventPartitioner Bolts, PartitionEvent Bolt, PersistEvent Bolts, Cassandra. Labels on the diagram: "keeps the peace" (Zookeeper), "fault-tolerance starts here", "enforces exactly-once semantics", "splits the work", "keeps the data".]
  16. The Query Topology
    [Diagram. Components: Tornado API, Storm DRPC Server, DRPC Spouts, Zookeeper, EventPartitioner Bolts, IndexExpander Bolt, PersistEvent Bolts, BucketReducer Bolt, Aggregation Bolt, Cassandra. Labels on the diagram: "keeping the peace" (Zookeeper), "keeping the data", "emits matching buckets", "reduces each bucket", "returns response".]
  17. Haz Storm for Ruby? REDSTORM
    https://github.com/colinsurprenant/redstorm
    Elegant JRuby bindings for Storm. Includes batteries: CLI scripts to package jars, work with Storm locally and deploy to a cluster. Very easy way to get familiar with Storm.
    Simple Twitter streaming example: https://github.com/dzello/ontweet
  18. Hello, redstorm
  19. Thanks #sfrails! More resources for Storm & distributed systems
    * http://www.michael-noll.com/blog/2012/10/16/understanding-the-parallelism-of-a-storm-topology/
    * https://speakerdeck.com/dzello/distributed-systems-are-everywhere-where-the-full-stack-is-headed
    * http://storm-project.net/
    * https://github.com/colinsurprenant/redstorm/wiki/Ruby-DSL-Documentation
  20. Coming to defrag? (November 4th - 6th)
    Check out my talk:
    One Billion Per Second
    The Rise of Designer Data Architectures
    http://defragcon.com/2013/agenda/
  21. Thanks! Questions?
    Talk at my face or email me at josh@keen.io
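
The primitives on slide 10 (spout, bolt, tuple) and the bucket-then-aggregate flow labelled on slide 16 can be sketched in plain Python. This is only an illustration under stated assumptions: it does not use Storm's or Redstorm's API, it runs in a single process, and every name in it (`Event`, `event_spout`, `partition_bolt`, `bucket_reducer_bolt`, `aggregation_bolt`, `run_query`, bucketing by date) is hypothetical rather than taken from Keen IO's code. Only the first sample row comes from the slides; the other two are made up.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, NamedTuple


class Event(NamedTuple):
    """A tuple "on the wire", mirroring the Username / Level / Date example."""
    username: str
    level: int
    date: str


# First row is from slide 10; the other two rows are invented for illustration.
SAMPLE_EVENTS = [
    {"username": "dzello", "level": 99, "date": "2013-10-17"},
    {"username": "alice", "level": 1, "date": "2013-10-17"},
    {"username": "bob", "level": 50, "date": "2013-10-18"},
]


def event_spout(source: Iterable[dict]) -> Iterator[Event]:
    """Spout: pulls raw records from a data source and emits tuples."""
    for record in source:
        yield Event(record["username"], record["level"], record["date"])


def partition_bolt(events: Iterable[Event]) -> Dict[str, List[Event]]:
    """Bolt: splits the work into buckets (assumed here: one bucket per date)."""
    buckets: Dict[str, List[Event]] = defaultdict(list)
    for event in events:
        buckets[event.date].append(event)
    return dict(buckets)


def bucket_reducer_bolt(bucket: List[Event]) -> Dict[str, int]:
    """Bolt: reduces one bucket to a partial count and sum."""
    return {"count": len(bucket), "sum": sum(e.level for e in bucket)}


def aggregation_bolt(partials: Iterable[Dict[str, int]]) -> Dict[str, float]:
    """Bolt: merges the per-bucket partials into the final response."""
    count = 0
    total = 0
    for partial in partials:
        count += partial["count"]
        total += partial["sum"]
    average = total / count if count else 0.0
    return {"count": count, "sum": total, "average": average}


def run_query(source: Iterable[dict]) -> Dict[str, float]:
    """Wires spout and bolts together, in-process, in the order sketched above."""
    buckets = partition_bolt(event_spout(source))
    partials = [bucket_reducer_bolt(bucket) for bucket in buckets.values()]
    return aggregation_bolt(partials)
```

Calling `run_query(SAMPLE_EVENTS)` returns a count, sum and average over the `level` field. Because each bucket is reduced independently before the final merge, the per-bucket step is the part a real topology could spread across workers.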