Building a real-time analytics engine in JRuby

David Dahl

March 02, 2013

Transcript

  1. Building a real-time analytics engine in JRuby
    David Dahl (@effata)
  2. whoami
    ‣ Senior developer at Burt
    ‣ Analytics for online advertising
    ‣ Ruby lovers since 2009
    ‣ AWS
  6. Getting started
    ‣ Writing everything to MySQL, querying for every report
      - Broke down on the first major campaign
    ‣ Precalculate all the things!
    ‣ Every operation in one application
      - Extremely scary to deploy
    ‣ Still sticking to MRI
  8. Stuck
    ‣ Separate and buffer with RabbitMQ
      - EventMachine
    ‣ Store stuff with MongoDB
      - Blocking operations
    ‣ Bad things
  9. Java?
    ‣ Threading
    ‣ “Enterprise”
    ‣ Lots of libraries
    Think about creating something → Java ecosystem → discover someone has already made it for you → profit!
  10. Moving to JRuby
    ‣ Threads!
    ‣ A real GC
    ‣ JIT
    ‣ Every Java, Scala, and Ruby library ever made
    ‣ Wrapping Java libraries is fun!
    ‣ Bonus: not hating yourself
  11. Challenges

  12. “100%” uptime
    ‣ We can “never” be down!
    ‣ But we can pause
    ‣ Don’t want to fail on errors
    ‣ But it’s OK to die
  13. Buffering
    ‣ Split into isolated services
    ‣ Add a buffering pipeline between them
      - We LOVE RabbitMQ
    ‣ Ack and persist in a “transaction”
    ‣ Figure out whether you want
      - at most once
      - at least once
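The two delivery guarantees differ in one thing: whether you ack before or after you persist. The following is a minimal in-memory sketch. It assumes a hypothetical `FakeBroker` that redelivers any message the consumer has not acked, which is roughly how RabbitMQ treats unacked deliveries when a consumer dies. It is not the RabbitMQ or HotBunnies API. The "crash" is simulated once, at a chosen message.

```ruby
# Hypothetical stand-in for a broker: a message leaves the queue only once acked.
class FakeBroker
  def initialize(messages)
    @pending = messages.dup
  end

  # Keep delivering the head of the queue until the consumer acks it.
  def consume
    until @pending.empty?
      message = @pending.first
      acked = false
      begin
        yield message, -> { acked = true }
      rescue RuntimeError
        # The consumer "died": anything not yet acked will be redelivered.
      end
      @pending.shift if acked
    end
  end
end

# Ack first, then persist: a crash while persisting loses the message.
def at_most_once(messages, crash_on:)
  store = []
  crashed = false
  FakeBroker.new(messages).consume do |msg, ack|
    ack.call
    if msg == crash_on && !crashed
      crashed = true
      raise "crash while persisting #{msg}"
    end
    store << msg
  end
  store
end

# Persist first, then ack: a crash before the ack causes a duplicate.
def at_least_once(messages, crash_on:)
  store = []
  crashed = false
  FakeBroker.new(messages).consume do |msg, ack|
    store << msg
    if msg == crash_on && !crashed
      crashed = true
      raise "crash before acking #{msg}"
    end
    ack.call
  end
  store
end

p at_most_once(%w[a b c], crash_on: "b")  # => ["a", "c"]
p at_least_once(%w[a b c], crash_on: "b") # => ["a", "b", "b", "c"]
```

With at-least-once delivery, the storage side must tolerate duplicates, for example by making writes idempotent.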
  14. Databases
    ‣ Pick the right tool for the job
    ‣ MongoDB everywhere = bad
    ‣ Cassandra
    ‣ Redis
    ‣ NoDB
      - keep it streaming!
  15. java.util.concurrent

  16. Shortcut

  17. Executors: better than calling Thread.new yourself

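  18. require 'java'
      java_import java.util.concurrent.Executors
      # Fixed pool of 16 worker threads; extra tasks wait in the pool's queue
      thread_pool = Executors.new_fixed_thread_pool(16)
      stuff.each do |item|
        thread_pool.submit do
          crunch_stuff(item)
        end
      end
      # Stop accepting new tasks; already-submitted ones still run
      thread_pool.shutdown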
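On MRI, where `java.util.concurrent` is not available, you can approximate the same idea with a fixed set of threads draining a shared `Queue`. The following is a minimal sketch built around a hypothetical `FixedPool` class, which is not part of Ruby's standard library. MRI's global VM lock lets only one thread run Ruby code at a time, so there this mainly helps with I/O-bound work. On JRuby, the executor's threads can run in parallel.

```ruby
# Hypothetical fixed-size pool: N worker threads pull jobs from one shared queue.
class FixedPool
  STOP = Object.new # sentinel that tells a worker to exit

  def initialize(size)
    @jobs = Queue.new
    @workers = Array.new(size) do
      Thread.new do
        loop do
          job = @jobs.pop
          break if job.equal?(STOP)
          begin
            job.call
          rescue StandardError => e
            warn "job failed: #{e.message}" # keep the worker alive
          end
        end
      end
    end
  end

  # Enqueue a block; whichever worker is free next runs it.
  def submit(&block)
    @jobs << block
    self
  end

  # Queued jobs run first (FIFO), then each worker receives STOP and exits.
  def shutdown
    @workers.size.times { @jobs << STOP }
    @workers.each(&:join)
  end
end

results = Queue.new
pool = FixedPool.new(4)
(1..10).each { |n| pool.submit { results << n * n } }
pool.shutdown

squares = []
squares << results.pop until results.empty?
p squares.sort # => [1, 4, 9, 16, 25, 36, 49, 64, 81, 100]
```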
  19. Blocking queues: the producer/consumer pattern made easy. Don’t forget back pressure!

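  20. require 'java'
      java_import java.util.concurrent.LinkedBlockingQueue
      java_import java.util.concurrent.TimeUnit
      queue = LinkedBlockingQueue.new # effectively unbounded by default
      # With timeout: offer returns false, poll returns nil if time runs out
      queue.offer(data, 60, TimeUnit::SECONDS)
      queue.poll(60, TimeUnit::SECONDS)
      # Blocking: wait as long as it takes
      queue.put(data)
      queue.take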
  21. Back pressure
    [Diagram: Data processing, Queue, Storage, Timer, State]

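  22. require 'java'
      java_import java.util.concurrent.ArrayBlockingQueue
      java_import java.util.concurrent.TimeUnit
      # Bounded to 100 items: producers block (or time out) while it is full
      queue = ArrayBlockingQueue.new(100)
      # With timeout
      queue.offer(data, 60, TimeUnit::SECONDS)
      queue.poll(60, TimeUnit::SECONDS)
      # Blocking
      queue.put(data)
      queue.take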
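Ruby's standard library has a direct counterpart to a bounded blocking queue: `SizedQueue`, whose `push` blocks while the queue is full. The following is a minimal sketch under that substitution. It does not use the `ArrayBlockingQueue` API. The capacity of 3, the 20 items, and the sleep that simulates slow storage are arbitrary choices. It shows a fast producer being held back to the consumer's pace instead of piling items up in memory.

```ruby
CAPACITY = 3
queue = SizedQueue.new(CAPACITY)
max_seen = 0
done = Object.new # sentinel marking the end of the stream

producer = Thread.new do
  20.times do |i|
    queue.push(i) # blocks while CAPACITY items are already waiting
    max_seen = [max_seen, queue.size].max
  end
  queue.push(done)
end

consumed = []
consumer = Thread.new do
  loop do
    item = queue.pop
    break if item.equal?(done)
    sleep 0.001 # simulate slow storage
    consumed << item
  end
end

[producer, consumer].each(&:join)
p consumed == (0...20).to_a # => true
p max_seen <= CAPACITY      # => true
```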
  23. More awesomeness
    ‣ java.util.concurrent
      - AtomicBoolean / AtomicInteger / AtomicLong
      - ConcurrentHashMap
      - CountDownLatch / Semaphore
    ‣ Google Guava
    ‣ LMAX Disruptor
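A `CountDownLatch` lets one thread wait until N other threads have signalled that they are done. The following is a minimal sketch of those semantics on MRI, built from `Mutex` and `ConditionVariable`. The `Latch` class and its method names are hypothetical. On JRuby you would use the real `CountDownLatch`, and the concurrent-ruby gem also ships a `Concurrent::CountDownLatch`.

```ruby
# Hypothetical minimal latch: waiters block until the count reaches zero.
class Latch
  def initialize(count)
    @count = count
    @mutex = Mutex.new
    @zero = ConditionVariable.new
  end

  # Decrement the count; wake every waiter once it reaches zero.
  def count_down
    @mutex.synchronize do
      next if @count.zero?
      @count -= 1
      @zero.broadcast if @count.zero?
    end
  end

  # Block the caller until the count reaches zero.
  def await
    @mutex.synchronize do
      @zero.wait(@mutex) until @count.zero? # loop guards against spurious wakeups
    end
  end

  def count
    @mutex.synchronize { @count }
  end
end

latch = Latch.new(3)
finished = Queue.new
workers = 3.times.map do |i|
  Thread.new do
    finished << i # pretend to do some work
    latch.count_down
  end
end

latch.await     # returns only after all three workers have counted down
p finished.size # => 3
p latch.count   # => 0
workers.each(&:join)
```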
  24. Easy mode
    ‣ Thread safety is hard
    ‣ Use j.u.c
    ‣ Avoid shared mutable state if possible
    ‣ Back pressure
  25. Actors: another layer of abstraction

  26. Akka: a concurrency library in Scala, most famous for its actor implementation
  27. Mikka: a small Ruby wrapper around Akka

  28. require 'mikka'
      class SomeActor < Mikka::Actor
        def receive(message)
          # do the thing
        end
      end
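The core of the actor idea is a mailbox plus a single thread that handles one message at a time. Only that thread touches the actor's state, so the state needs no locks. The following is a minimal plain-Ruby sketch of that model. `TinyActor`, `tell`, and `Counter` are hypothetical names, not Mikka's or Akka's API, though `tell` mirrors Akka's term for a fire-and-forget send.

```ruby
# Hypothetical minimal actor: one mailbox, one thread, one message at a time.
class TinyActor
  def initialize
    @mailbox = Queue.new
    @thread = Thread.new do
      loop do
        message = @mailbox.pop
        break if message == :stop
        receive(message)
      end
    end
  end

  # Fire-and-forget send: enqueue and return immediately.
  def tell(message)
    @mailbox << message
    self
  end

  # Let the actor drain its mailbox, then stop and wait for it.
  def stop
    @mailbox << :stop
    @thread.join
  end
end

class Counter < TinyActor
  attr_reader :total

  def initialize
    @total = 0 # set before super starts the actor's thread
    super
  end

  def receive(message)
    @total += message # only ever runs on the actor's own thread
  end
end

counter = Counter.new
senders = 4.times.map { Thread.new { 250.times { counter.tell(1) } } }
senders.each(&:join)
counter.stop
p counter.total # => 1000
```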
  29. Storm: github.com/colinsurprenant/redstorm

  30. We broke it, but YOU should definitely try it out!

  31. Hadoop: github.com/iconara/rubydoop

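  32. module WordCount
        class Mapper
          # Called once per input record (e.g. one line of text)
          def map(key, value, context)
            # ...
          end
        end
        class Reducer
          # Called once per key, with all the values emitted for that key
          def reduce(key, values, context)
            # ...
          end
        end
      end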
  33. require 'rubydoop'
      Rubydoop.configure do |input_path, output_path|
        job 'word_count' do
          input input_path
          output output_path
          mapper WordCount::Mapper
          reducer WordCount::Reducer
          output_key Hadoop::Io::Text
          output_value Hadoop::Io::IntWritable
        end
      end
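The mapper and reducer bodies are elided on the slide. The data flow Hadoop runs between them is map, group by key, then reduce. The following is a minimal single-process sketch of that flow for word count. It needs no Hadoop. The method names and signatures are hypothetical and are not Rubydoop's, which pass a `context` and use Hadoop writable types.

```ruby
# Map: emit a [word, 1] pair for every word in a line.
def map_line(line)
  line.downcase.scan(/\w+/).map { |word| [word, 1] }
end

# Reduce: sum all the counts emitted for one word.
def reduce_word(word, counts)
  [word, counts.sum]
end

def word_count(lines)
  pairs = lines.flat_map { |line| map_line(line) }
  grouped = pairs.group_by(&:first) # the "shuffle": collect pairs by key
  grouped.map { |word, word_pairs| reduce_word(word, word_pairs.map(&:last)) }.to_h
end

counts = word_count(["the quick fox", "The lazy dog", "the end"])
p counts["the"] # => 3
p counts["fox"] # => 1
```

In Hadoop, the grouping step happens across machines, and each reducer call sees one key with all of its values.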
  34. Other cool stuff
    ‣ Hotbunnies
    ‣ Eurydice
    ‣ Bundesstrasse
    ‣ Multimeter
  35. Thank you!
    @effata
    david@burtcorp.com