Building a real time analytics engine in JRuby

David Dahl

March 02, 2013

  1. whoami
     ‣ Senior developer at Burt
     ‣ Analytics for online advertising
     ‣ Ruby lovers since 2009
     ‣ AWS
  2. Getting started
     ‣ Writing everything to MySQL, querying for every report
       - Broke down on first major campaign
     ‣ Precalculate all the things!
     ‣ Every operation in one application
       - Extremely scary to deploy
     ‣ Still sticking to MRI
  3. Stuck
     ‣ Separate and buffer with RabbitMQ
       - EventMachine
     ‣ Store stuff with MongoDB
       - Blocking operations
     ‣ Bad things
  4. Java?
     ‣ Threading
     ‣ “Enterprise”
     ‣ Lots of libraries
     Think about creating something → search the Java ecosystem → discover someone has already made it for you → Profit!
  5. Moving to JRuby
     ‣ Threads!
     ‣ A real GC
     ‣ JIT
     ‣ Every Java, Scala, Ruby lib ever made
     ‣ Wrapping Java libraries is fun!
     ‣ Bonus: not hating yourself
  6. “100%” uptime
     ‣ We can “never” be down!
     ‣ But we can pause
     ‣ Don’t want to fail on errors
     ‣ But it’s OK to die
  7. Buffering
     ‣ Split into isolated services
     ‣ Add a buffering pipeline between them
       - We LOVE RabbitMQ
     ‣ Ack and persist in a “transaction”
     ‣ Figure out which you want:
       - at most once
       - at least once
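The at-most-once vs. at-least-once choice comes down to when you ack relative to processing. A minimal in-memory sketch of the two guarantees — `FakeQueue` and the flaky consumer are illustrative only, not the RabbitMQ API:

```ruby
class FakeQueue
  def initialize(messages)
    @pending = messages.dup
  end

  # At most once: remove ("ack") the message BEFORE processing it,
  # so a failure during processing loses the message.
  def consume_at_most_once
    while (msg = @pending.shift)
      begin
        yield msg
      rescue StandardError
        # already acked -- the message is simply dropped
      end
    end
  end

  # At least once: process first, ack only on success,
  # so a failure causes the message to be delivered again.
  def consume_at_least_once
    until @pending.empty?
      begin
        yield @pending.first
        @pending.shift # ack only after the handler succeeded
      rescue StandardError
        # message stays queued and will be redelivered
      end
    end
  end
end

# A consumer that crashes on the first attempt at each message:
attempts = Hash.new(0)
seen = []
flaky = ->(msg) { attempts[msg] += 1; raise 'boom' if attempts[msg] == 1; seen << msg }

FakeQueue.new([:a, :b]).consume_at_most_once(&flaky)
seen # => [] -- both messages lost on their first (failed) attempt

attempts.clear
FakeQueue.new([:a, :b]).consume_at_least_once(&flaky)
seen # => [:a, :b] -- redelivered after failure, then processed
```

With real RabbitMQ the same trade-off is made by acking before vs. after your handler runs; at-least-once additionally requires your processing to be idempotent, since redelivery means duplicates.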
  8. Databases
     ‣ Pick the right tool for the job
     ‣ MongoDB everywhere = bad
     ‣ Cassandra
     ‣ Redis
     ‣ NoDB - keep it streaming!
  9. # Unbounded queue from java.util.concurrent
     # (JavaConcurrent is assumed to alias that package)
     queue = JavaConcurrent::LinkedBlockingQueue.new

     # With timeout
     queue.offer(data, 60, JavaConcurrent::TimeUnit::SECONDS)
     queue.poll(60, JavaConcurrent::TimeUnit::SECONDS)

     # Blocking
     queue.put(data)
     queue.take
 10. # Bounded queue: capacity 100, which gives you back pressure
     queue = JavaConcurrent::ArrayBlockingQueue.new(100)

     # With timeout
     queue.offer(data, 60, JavaConcurrent::TimeUnit::SECONDS)
     queue.poll(60, JavaConcurrent::TimeUnit::SECONDS)

     # Blocking
     queue.put(data)
     queue.take
 11. Easy mode
     ‣ Thread safety is hard
     ‣ Use j.u.c (java.util.concurrent)
     ‣ Avoid shared mutable state if possible
     ‣ Back pressure
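The back-pressure idea above isn't JRuby-specific: Ruby's own SizedQueue is a bounded, thread-safe queue whose push blocks when the queue is full, just like ArrayBlockingQueue#put. A minimal sketch (not from the talk):

```ruby
queue = SizedQueue.new(2) # bounded: push blocks once two items are waiting
results = []

consumer = Thread.new do
  3.times { results << queue.pop } # pop blocks while the queue is empty
end

# The third push waits until the consumer pops -- that pause IS back pressure:
# a slow consumer automatically slows the producer down.
3.times { |i| queue.push(i) }
consumer.join

results # => [0, 1, 2]
```

The bound is the whole point: an unbounded queue just hides a slow consumer until memory runs out, while a bounded one surfaces the problem upstream.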
 12. module WordCount
       class Mapper
         def map(key, value, context)
           # ...
         end
       end

       class Reducer
         def reduce(key, values, context) # a reducer receives all values for a key
           # ...
         end
       end
     end
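The elided map and reduce bodies follow the classic word-count dataflow. A plain-Ruby sketch of that flow, with made-up input and no Hadoop dependency:

```ruby
lines = ['to be or not', 'to be']

# map phase: emit a (word, 1) pair for every word
pairs = lines.flat_map { |line| line.split.map { |word| [word, 1] } }

# shuffle + reduce phase: group the pairs by word, then sum each group's counts
counts = pairs.group_by(&:first)
              .transform_values { |ps| ps.sum { |_, n| n } }

counts # => {"to"=>2, "be"=>2, "or"=>1, "not"=>1}
```

In the Hadoop version the mapper emits the pairs via `context.write`, the framework does the grouping, and the reducer does the summing — but the shape of the computation is exactly this.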
 13. Rubydoop.configure do |input_path, output_path|
       job 'word_count' do
         input input_path
         output output_path
         mapper WordCount::Mapper
         reducer WordCount::Reducer
         output_key Hadoop::Io::Text
         output_value Hadoop::Io::IntWritable
       end
     end