How Keen IO uses Storm

Josh Dzielak
October 18, 2013

This presentation discusses Storm, the distributed computation system, and how it’s used at Keen IO. Redstorm, which makes it possible to build Storm topologies in Ruby, is also discussed. First given at #sfrails in October 2013.

Transcript

  1. How Keen IO uses Storm
    Josh Dzielak
    @dzello
    10/18/2013
    #sfrails

  2. About me
    * Full stack developer spanning 2 millennia
    * Helped found & build Togetherville (Disney)
    Ruby 1.8.7 and Rails 2.3.8 FTW!
    * Author of mongoid_alize, four & keen-gem.
    * Mentor at HackBright & HackReactor
    * Currently VP Engineering at Keen IO

  3. An analytics API for the modern developer.
    http://keen.io
    But is it for me?

  4. I know I need analytics, but I’d rather spend time building features
    I use APIs: Sendgrid, Iron.io, Twilio, Heroku, Pusher
    Would I like Keen IO?
    YES. You probably would!

  5. Tech @ Keen IO
    Tornado API Server
    Flask on the web
    Official & community SDKs
    Ruby is very popular!

  6. Tech @ Keen IO
    Our old backend
    Pros:
    * Fast writes
    * Easy to set up
    * Develop features quickly
    Cons/what we outgrew:
    * Ad-hoc query performance
    * Operational ease
    * Aggregation features

  7. Tech @ Keen IO
    Our new backend
    STORM

  8. Heard of Storm?
    NO

  9. What is Storm?
    Pop quiz! Storm is:
    a) A project with 7,000+ followers on GitHub
    b) Low-latency distributed computation system
    c) WNBA team in Seattle
    d) Capable of streaming map-reduce
    e) All of the above

  10. Storm Primitives
    SPOUT: pulls from data sources
    BOLT: does some processing
    TUPLE: what’s on the wire
    Example tuple:
    Username | Level | Date
    dzello   | 99    | 2013-10-17
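
The three primitives above can be sketched in plain Python. This is only an illustration of the roles, not Storm's API: the names `PlayerTuple`, `player_spout` and `level_up_bolt` are hypothetical, and the "level up" processing step is an assumption made so the bolt has something to do. The tuple fields mirror the slide's example.

```python
from collections import namedtuple

# A tuple is a named list of values; the fields mirror the slide's example.
PlayerTuple = namedtuple("PlayerTuple", ["username", "level", "date"])


def player_spout(rows):
    """Spout (hypothetical): pulls rows from a data source and emits tuples."""
    for username, level, date in rows:
        yield PlayerTuple(username, level, date)


def level_up_bolt(tuples):
    """Bolt (hypothetical): does some processing and emits new tuples."""
    for t in tuples:
        # Assumed processing step for illustration: bump the level by one.
        yield t._replace(level=t.level + 1)


# A list stands in for the external data source.
source = [("dzello", 99, "2013-10-17")]
results = list(level_up_bolt(player_spout(source)))
```

In a real topology the spout and bolt would run as many parallel tasks across workers, with Storm moving the tuples between them.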

  11. Storm, Deployed
    [Diagram: ExampleTopology deployed across Host 1, Host 2 and Host 3, running Workers 1-4; Spouts pull from a Data Source and feed Bolts spread across the workers]

  12. Common Storm Myths
    Myth: Clouds don’t like Storms.
    Storm deploys to any cloud.
    https://github.com/nathanmarz/storm-deploy

  13. Storm at Keen IO
    The primary logical layer
    for storing events and performing queries.
    Cassandra distributes the data &
    Storm distributes the computation.
    Because Storm and Cassandra scale
    linearly, we can perform writes and
    queries with low latency and high throughput,
    all while remaining fault tolerant.

  14. How fast is this?
    The Write Topology
    Storm Nodes | Cassandra Nodes | Events/Sec
    3           | 6               | 50,000+
    The Query Topology
    Query Type        | Collection Size (events) | Mean Response Time
    Full Count        | 100M                     | >100ms
    Average w/ groups | 100M                     | 300ms
    Sum over a field  | 600M                     | 800ms

  15. The Write Topology
    [Diagram] Tornado API -> Kafka -> Kafka Spout (x3) -> EventPartitioner Bolt (x3) -> PersistEvent Bolt (x3) -> Cassandra
    Zookeeper keeps the peace
    Labels on the diagram: "fault-tolerance starts here", "splits the work", "enforces exactly-once semantics", "keeps the data"
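
A minimal sketch of the write path, in plain Python rather than Storm's API. The slide does not say how the work is split or how exactly-once semantics are enforced, so both mechanisms here are assumptions: events are routed by a hash of a hypothetical `id` field, and each write is an idempotent upsert keyed by that id, so a replayed event overwrites itself instead of creating a duplicate. A list stands in for Kafka and a dict stands in for Cassandra.

```python
import zlib

NUM_PERSIST_BOLTS = 3  # the diagram shows three PersistEvent bolts


def partition_for(event_id, num_partitions=NUM_PERSIST_BOLTS):
    """Assumed partitioning rule: a stable hash of the event id picks a bolt."""
    return zlib.crc32(event_id.encode("utf-8")) % num_partitions


class PersistBolt:
    """Hypothetical stand-in for a PersistEvent bolt."""

    def __init__(self, store):
        self.store = store  # a dict standing in for Cassandra

    def execute(self, event):
        # Idempotent upsert keyed by event id: replays do not duplicate data.
        self.store[event["id"]] = event


def run_write_topology(log, store):
    """Read every event from the log and route it to one persist bolt."""
    bolts = [PersistBolt(store) for _ in range(NUM_PERSIST_BOLTS)]
    for event in log:
        bolts[partition_for(event["id"])].execute(event)


log = [
    {"id": "e1", "collection": "logins", "user": "dzello"},
    {"id": "e2", "collection": "logins", "user": "someone"},
    {"id": "e1", "collection": "logins", "user": "dzello"},  # replayed event
]
store = {}
run_write_topology(log, store)
```

Because the same id always maps to the same bolt and the same key, replaying the log after a failure leaves the store unchanged.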

  16. The Query Topology
    [Diagram] Tornado API -> Storm DRPC Server -> DRPC Spout (x3) -> IndexExpander Bolt -> BucketReducer Bolt -> Aggregation Bolt
    Cassandra keeping the data
    Zookeeper keeping the peace
    Labels on the diagram: "emits matching buckets", "reduces each bucket", "returns response"
    Also shown: EventPartitioner Bolt (x2), PersistEvent Bolt (x2)
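
The query flow reads like a streaming map-reduce: find the matching buckets, reduce each bucket independently, then combine the partial results. The sketch below shows that shape for an average, in plain Python rather than Storm's API. The function names, the `(collection, day)` bucket key and the `price` field are all hypothetical; the slide does not describe how Keen IO actually lays out its buckets.

```python
def index_expander(query, buckets):
    """Emit the keys of the buckets that match the query's collection."""
    return [key for key in buckets if key[0] == query["collection"]]


def bucket_reducer(events, field):
    """Reduce one bucket to a partial result: (sum, count)."""
    values = [e[field] for e in events]
    return (sum(values), len(values))


def aggregate(partials):
    """Combine the per-bucket partials into the final average."""
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count if count else None


# A dict stands in for Cassandra; keys are assumed to be (collection, day).
buckets = {
    ("purchases", "2013-10-16"): [{"price": 10}, {"price": 20}],
    ("purchases", "2013-10-17"): [{"price": 30}],
    ("logins", "2013-10-17"): [{"price": 0}],
}
query = {"collection": "purchases", "field": "price"}

keys = index_expander(query, buckets)
partials = [bucket_reducer(buckets[k], query["field"]) for k in keys]
average = aggregate(partials)
```

Carrying `(sum, count)` pairs rather than per-bucket averages is what lets each bucket be reduced on a different worker and still combine into a correct overall average.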

  17. Haz Storm for Ruby?
    REDSTORM
    https://github.com/colinsurprenant/redstorm
    Elegant JRuby bindings for Storm.
    Includes batteries:
    CLI scripts to package jars, work
    with Storm locally and deploy to a cluster.
    Very easy way to get familiar with Storm.
    Simple Twitter streaming example –
    https://github.com/dzello/ontweet

  18. Hello, redstorm

  19. Thanks #sfrails!
    More resources for Storm & distributed systems
    http://www.michael-noll.com/blog/2012/10/16/understanding-the-parallelism-of-a-storm-topology/
    https://speakerdeck.com/dzello/distributed-systems-are-everywhere-where-the-full-stack-is-headed
    http://storm-project.net/
    https://github.com/colinsurprenant/redstorm/wiki/Ruby-DSL-Documentation

  20. Coming to defrag?
    (November 4th - 6th)
    Check out my talk:
    One Billion Per Second
    The Rise of Designer Data Architectures
    http://defragcon.com/2013/agenda/

  21. Thanks!
    Questions?
    Talk at my face
    or email me at
    [email protected]
