Slide 1

How Keen IO uses Storm
Josh Dzielak @dzello
10/18/2013 #sfrails

Slide 2

About me
* Full-stack developer spanning 2 millennia
* Helped found & build Togetherville (Disney). Ruby 1.8.7 and Rails 2.3.8 FTW!
* Author of mongoid_alize, four & keen-gem
* Mentor at HackBright & HackReactor
* Currently VP Engineering at Keen IO

Slide 3

An analytics API for the modern developer.
http://keen.io
But is it for me?

Slide 4

Would I like Keen IO?
* "I know I need analytics, but I’d rather spend time building features."
* "I use APIs: SendGrid, Iron.io, Twilio, Heroku, Pusher."
YES! You probably would!

Slide 5

Tech @ Keen IO
* Tornado API server
* Flask on the web
* Official & community SDKs (Ruby is very popular!)

Slide 6

Tech @ Keen IO: our old backend
Pros:
* Fast writes
* Easy to set up
* Develop features quickly
Cons / what we outgrew:
* Ad-hoc query performance
* Operational ease
* Aggregation features

Slide 7

Tech @ Keen IO Our new backend STORM

Slide 8

Heard of Storm? NO

Slide 9

Pop quiz! Storm is:
a) A project with 7,000+ followers on GitHub
b) A low-latency distributed computation system
c) A WNBA team in Seattle
d) Capable of streaming map-reduce
e) All of the above

Slide 10

Storm Primitives
* SPOUT: pulls from data sources
* BOLT: does some processing
* TUPLE: what’s on the wire, e.g.

  Username | Level | Date
  dzello   | 99    | 2013-10-17
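A minimal plain-Ruby sketch of these three primitives wired together (class names here are illustrative; real Storm spouts and bolts implement Java interfaces):

```ruby
# A tuple is just a named set of values on the wire.
Tuple = Struct.new(:username, :level, :date)

# A spout pulls from a data source and emits tuples.
class Spout
  def initialize(source)
    @source = source
  end

  def each_tuple
    @source.each { |row| yield Tuple.new(*row) }
  end
end

# A bolt receives tuples and does some processing.
class Bolt
  def process(tuple)
    "#{tuple.username} reached level #{tuple.level} on #{tuple.date}"
  end
end

source = [["dzello", 99, "2013-10-17"]]
bolt = Bolt.new
results = []
Spout.new(source).each_tuple { |t| results << bolt.process(t) }
# results => ["dzello reached level 99 on 2013-10-17"]
```

In real Storm the spout and bolt run on different machines and the tuples travel over the network, but the shape of the data flow is the same.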

Slide 11

Storm, Deployed
[Diagram: an ExampleTopology deployed across 3 hosts running 4 workers; each worker runs a mix of Spout and Bolt instances, and the Spouts read from the Data Source]

Slide 12

Common Storm Myths
Myth: Clouds don’t like Storms.
Fact: Storm deploys to any cloud.
https://github.com/nathanmarz/storm-deploy

Slide 13

Storm at Keen IO
Storm is the primary logical layer for storing events and performing queries. Cassandra distributes the data and Storm distributes the computation. Because both scale linearly, we can perform writes and queries with low latency and high throughput, all while remaining fault-tolerant.

Slide 14

How fast is this?

The Write Topology
Storm Nodes | Cassandra Nodes | Events/Sec
3           | 6               | 50,000+

The Query Topology
Query Type        | Collection Size (events) | Mean Response Time
Full Count        | 100M                     | >100ms
Average w/ groups | 100M                     | 300ms
Sum over a field  | 600M                     | 800ms

Slide 15

The Write Topology
Tornado API → Kafka → Kafka Spouts → EventPartitioner Bolts → PersistEvent Bolts → Cassandra
* Kafka: fault-tolerance starts here
* EventPartitioner Bolts: split the work
* PersistEvent Bolts: enforce exactly-once semantics
* Cassandra: keeps the data
* Zookeeper: keeps the peace
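The EventPartitioner step amounts to hashing an event's key so the same collection's events always land on the same downstream bolt. A plain-Ruby sketch of that idea, using CRC32 as a stand-in for whatever hash the real bolt uses (the key layout here is an assumption, not Keen's actual schema):

```ruby
require 'zlib'

# Hypothetical sketch of the EventPartitioner bolt's core logic:
# route each incoming event to a stable partition so events for the
# same project/collection always reach the same PersistEvent bolt.
NUM_PARTITIONS = 4

def partition_for(event)
  key = "#{event[:project_id]}/#{event[:collection]}"
  Zlib.crc32(key) % NUM_PARTITIONS
end

event = { project_id: "p1", collection: "pageviews", body: { path: "/" } }
first  = partition_for(event)
second = partition_for(event)
# The same event key always maps to the same partition.
```

Stable routing like this is what makes exactly-once persistence tractable: each partition's writer sees a deterministic slice of the stream.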

Slide 16

The Query Topology
Tornado API → Storm DRPC Server → DRPC Spouts → IndexExpander Bolts → BucketReducer Bolts → Aggregation Bolt → response
* IndexExpander Bolts: emit matching buckets
* BucketReducer Bolts: reduce each bucket
* Aggregation Bolt: returns the response
* Cassandra: keeps the data
* Zookeeper: keeps the peace
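The BucketReducer/Aggregation steps are a streaming map-reduce: each bucket is reduced independently (and in parallel), then a final bolt combines the partial results. A plain-Ruby sketch of a count query in that shape (the bucket layout and names are assumptions, not Keen's actual schema):

```ruby
# Hypothetical sketch: a count query over time-bucketed events.
# Each "bucket" holds the events for one time slice.
buckets = {
  "2013-10-17T00" => [{ user: "dzello" }, { user: "keen" }],
  "2013-10-17T01" => [{ user: "dzello" }],
}

# BucketReducer: reduce each bucket independently (parallelizable).
partials = buckets.map { |_slice, events| events.size }

# Aggregation: combine the partial results into the final answer.
total = partials.sum
# total => 3
```

Counts compose by simple addition; averages and other aggregates need slightly richer partials (e.g. sum and count per bucket), but the two-stage shape is the same.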

Slide 17

Haz Storm for Ruby? REDSTORM
https://github.com/colinsurprenant/redstorm
Elegant JRuby bindings for Storm. Batteries included: CLI scripts to package jars, work with Storm locally, and deploy to a cluster. A very easy way to get familiar with Storm.
Simple Twitter streaming example: https://github.com/dzello/ontweet

Slide 18

Hello, redstorm
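A minimal plain-Ruby sketch of the word-count pattern a RedStorm hello-world expresses (plain Ruby rather than the RedStorm DSL, so it runs anywhere; the class names are illustrative, since the real DSL subclasses RedStorm's spout/bolt base classes and runs on JRuby):

```ruby
# Word-count in the spout → split bolt → count bolt shape.
class SentenceSpout
  SENTENCES = ["the cow jumped over the moon", "the moon was full"]

  def each_sentence(&block)
    SENTENCES.each(&block)
  end
end

class SplitBolt
  def process(sentence)
    sentence.split(" ")
  end
end

class CountBolt
  def initialize
    @counts = Hash.new(0)
  end

  def process(word)
    @counts[word] += 1
  end

  attr_reader :counts
end

spout = SentenceSpout.new
split = SplitBolt.new
count = CountBolt.new
spout.each_sentence { |s| split.process(s).each { |w| count.process(w) } }
# count.counts["the"] => 3, count.counts["moon"] => 2
```

See the RedStorm Ruby DSL wiki (linked on the resources slide) for the real topology syntax.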

Slide 19

Thanks #sfrails!
More resources for Storm & distributed systems:
http://www.michael-noll.com/blog/2012/10/16/understanding-the-parallelism-of-a-storm-topology/
https://speakerdeck.com/dzello/distributed-systems-are-everywhere-where-the-full-stack-is-headed
http://storm-project.net/
https://github.com/colinsurprenant/redstorm/wiki/Ruby-DSL-Documentation

Slide 20

Coming to Defrag? (November 4th - 6th)
Check out my talk: "One Billion Per Second: The Rise of Designer Data Architectures"
http://defragcon.com/2013/agenda/

Slide 21

Thanks! Questions? Talk at my face or email me at [email protected]