Building a real time analytics engine in JRuby

David Dahl
@effata

whoami
‣ Senior developer at Burt
‣ Analytics for online advertising
‣ Ruby lovers since 2009
‣ AWS

Getting started
‣ Writing everything to MySQL, querying for every report
  - Broke down on the first major campaign
‣ Precalculate all the things!
‣ Every operation in one application
  - Extremely scary to deploy
‣ Still sticking to MRI
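A minimal sketch of what "precalculate all the things" can look like: aggregate when an event arrives, so a report reads a stored total instead of querying every raw row. The event shape (`:campaign` and `:type` keys) and the `Counters` class are hypothetical, and the sketch is single-threaded and in-memory; a real system would keep such counters somewhere that survives restarts.

```ruby
# Hypothetical pre-aggregation: counters are updated on the write path,
# so a report is a lookup rather than a scan over raw events.
# Single-threaded, in-memory illustration only.
class Counters
  def initialize
    # [campaign, event type] => running count
    @counts = Hash.new(0)
  end

  # Called once per incoming event: constant work per event.
  def record(event)
    @counts[[event[:campaign], event[:type]]] += 1
  end

  # Report side: read the precomputed total (0 if never seen).
  def count(campaign, type)
    @counts[[campaign, type]]
  end
end

counters = Counters.new
[
  { campaign: 'summer', type: :impression },
  { campaign: 'summer', type: :impression },
  { campaign: 'summer', type: :click }
].each { |event| counters.record(event) }

puts counters.count('summer', :impression) # 2
puts counters.count('summer', :click)      # 1
```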

Stuck
‣ Separate and buffer with RabbitMQ
  - EventMachine
‣ Store stuff with MongoDB
  - Blocking operations
‣ Bad things

Java?
‣ Threading
‣ “Enterprise”
‣ Lots of libraries

Think about creating something
Java ecosystem
Discover someone has made it for you already
Profit!

Moving to JRuby
‣ Threads!
‣ A real GC
‣ JIT
‣ Every Java, Scala and Ruby library ever made
‣ Wrapping Java libraries is fun!
‣ Bonus: Not hating yourself

Challenges

“100%” uptime
‣ We can “never” be down!
‣ But we can pause
‣ Don’t want to fail on errors
‣ But it’s OK to die

Buffering
‣ Split into isolated services
‣ Add a buffering pipeline between them
  - We LOVE RabbitMQ
‣ Ack and persist in a “transaction”
‣ Figure out whether you want
  - at most once
  - at least once
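The choice between at most once and at least once comes down to the order of ack and persist when a consumer can die between the two steps. The sketch below simulates this with a hypothetical in-memory `ToyBroker` that redelivers unacknowledged messages when the consumer dies, standing in for RabbitMQ; it is not the RabbitMQ client API.

```ruby
# Hypothetical in-memory stand-in for a broker: unacked messages are
# put back in the queue when the consumer "dies".
class ToyBroker
  def initialize(messages)
    @ready = messages.dup
    @unacked = []
  end

  def deliver
    msg = @ready.shift
    @unacked << msg if msg
    msg
  end

  def ack(msg)
    @unacked.delete(msg)
  end

  # Simulated consumer crash: everything not yet acked is redelivered.
  def consumer_died!
    @ready.unshift(*@unacked)
    @unacked.clear
  end

  def empty?
    @ready.empty?
  end
end

# At most once: ack first, then persist. A crash in between loses the message.
def at_most_once(broker, store, crash_on:)
  until broker.empty?
    msg = broker.deliver
    broker.ack(msg)
    if msg == crash_on
      crash_on = nil          # crash only once
      broker.consumer_died!   # nothing to redeliver: msg was already acked
      next
    end
    store << msg
  end
  store
end

# At least once: persist first, then ack. A crash in between duplicates it.
def at_least_once(broker, store, crash_on:)
  until broker.empty?
    msg = broker.deliver
    store << msg
    if msg == crash_on
      crash_on = nil
      broker.consumer_died!   # msg is persisted but unacked, so it comes back
      next
    end
    broker.ack(msg)
  end
  store
end

p at_most_once(ToyBroker.new([1, 2, 3]), [], crash_on: 2)  # [1, 3]
p at_least_once(ToyBroker.new([1, 2, 3]), [], crash_on: 2) # [1, 2, 2, 3]
```

Acking first loses message 2 when the consumer dies; persisting first stores it twice. With at least once, the downstream writes need to tolerate duplicates.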

Databases
‣ Pick the right tool for the job
‣ MongoDB everywhere = bad
‣ Cassandra
‣ Redis
‣ NoDB
  - keep it streaming!

java.util.concurrent

Shortcut

Executors
Better than doing Thread.new

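require 'java'
java_import 'java.util.concurrent.Executors'
java_import 'java.util.concurrent.TimeUnit'

thread_pool = Executors.new_fixed_thread_pool(16)
stuff.each do |item|
  thread_pool.submit do
    crunch_stuff(item)
  end
end
thread_pool.shutdown                                  # accept no new jobs
thread_pool.await_termination(60, TimeUnit::SECONDS)  # wait for queued jobs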
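On MRI, where `java.util.concurrent` is not available, the idea behind `new_fixed_thread_pool` can be sketched with the standard library: a fixed number of worker threads pull jobs from a shared `Queue` instead of one `Thread.new` per item. `FixedPool` is a hypothetical illustration, not a replacement for the Java executor on JRuby.

```ruby
# Hypothetical fixed-size pool: `size` threads share one job queue.
class FixedPool
  STOP = Object.new # sentinel that tells one worker to exit

  def initialize(size)
    @jobs = Queue.new
    @workers = Array.new(size) do
      Thread.new do
        loop do
          job = @jobs.pop          # blocks until a job is available
          break if job.equal?(STOP)
          job.call
        end
      end
    end
  end

  def submit(&job)
    @jobs << job
  end

  # Jobs queued before shutdown still run: every STOP sits behind them.
  def shutdown
    @workers.size.times { @jobs << STOP }
    @workers.each(&:join)
  end
end

pool = FixedPool.new(4)
results = Queue.new
(1..10).each { |i| pool.submit { results << i * i } }
pool.shutdown
squares = Array.new(results.size) { results.pop }.sort
p squares # [1, 4, 9, 16, 25, 36, 49, 64, 81, 100]
```

The thread count stays at 4 no matter how many items are submitted, which is the main advantage over spawning a thread per item.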

Blocking queues
Producer/consumer pattern made easy
Don’t forget back pressure!

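queue = JavaConcurrent::LinkedBlockingQueue.new  # capacity defaults to Integer.MAX_VALUE

# With timeout
queue.offer(data, 60, JavaConcurrent::TimeUnit::SECONDS)  # false if still full after 60 s
queue.poll(60, JavaConcurrent::TimeUnit::SECONDS)         # nil if still empty after 60 s

# Blocking
queue.put(data)  # waits for free space
queue.take       # waits for an element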

Back pressure
[Diagram: Storage, Timer, Data processing, Queue, State]

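queue = JavaConcurrent::ArrayBlockingQueue.new(100)  # bounded: at most 100 elements

# With timeout
queue.offer(data, 60, JavaConcurrent::TimeUnit::SECONDS)
queue.poll(60, JavaConcurrent::TimeUnit::SECONDS)

# Blocking
queue.put(data)  # blocks while the queue is full: this is the back pressure
queue.take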
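Back pressure falls out of a bounded blocking queue: when the consumer is slow, the producer blocks instead of the backlog growing. The same producer/consumer effect can be shown in plain Ruby with the standard library's `SizedQueue`, used here as a stand-in for `ArrayBlockingQueue`; `run_pipeline` and the 1 ms sleep simulating slow storage are hypothetical.

```ruby
# Producer/consumer over a bounded queue. Once the queue holds `capacity`
# items, the producer's push blocks until the consumer takes one.
def run_pipeline(items, capacity:)
  queue = SizedQueue.new(capacity)
  done = Object.new # sentinel marking the end of the stream
  processed = []
  max_seen = 0

  consumer = Thread.new do
    loop do
      item = queue.pop
      break if item.equal?(done)
      sleep 0.001 # simulate slow processing or storage
      processed << item
    end
  end

  items.each do |item|
    queue.push(item) # blocks while the queue is full
    max_seen = [max_seen, queue.size].max
  end
  queue.push(done)
  consumer.join

  [processed, max_seen]
end

processed, max_seen = run_pipeline((1..50).to_a, capacity: 5)
puts processed.size # 50
puts max_seen <= 5  # true
```

`max_seen` never exceeds the capacity, however fast the producer is. With an unbounded queue (Ruby's `Queue`, or `LinkedBlockingQueue` without a capacity) nothing would slow the producer down and the backlog could keep growing.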

More awesomeness
‣ java.util.concurrent
  - AtomicBoolean / AtomicInteger / AtomicLong
  - ConcurrentHashMap
  - CountDownLatch / Semaphore
‣ Google Guava
‣ LMAX Disruptor
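To show what a `CountDownLatch` does, here is a minimal plain-Ruby latch built from `Mutex` and `ConditionVariable`: `await` blocks until `count_down` has been called the given number of times. `Latch` is a hypothetical illustration; on JRuby the real `java.util.concurrent.CountDownLatch` is the one to use.

```ruby
# Minimal count-down latch: threads call count_down, and await blocks
# until the count reaches zero.
class Latch
  def initialize(count)
    @count = count
    @mutex = Mutex.new
    @cond = ConditionVariable.new
  end

  def count_down
    @mutex.synchronize do
      @count -= 1 if @count > 0
      @cond.broadcast if @count.zero? # wake every waiter
    end
  end

  # Blocks until count_down has been called `count` times.
  def await
    @mutex.synchronize do
      @cond.wait(@mutex) while @count > 0
    end
  end
end

latch = Latch.new(3)
finished = Queue.new
3.times do |i|
  Thread.new do
    finished << i      # the "work"
    latch.count_down   # signal completion
  end
end
latch.await
puts finished.size # 3
```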

Easy mode
‣ Thread safety is hard
‣ Use java.util.concurrent
‣ Avoid shared mutable state if possible
‣ Back pressure
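One way to follow "avoid shared mutable state": give each thread its own slice of the input and its own result, and combine the results only after the threads finish. No counter is shared, so no lock is needed. `parallel_sum` is a hypothetical example in plain Ruby.

```ruby
# Each thread works on its own slice and returns its own result, so no
# state is shared between threads and no Mutex is needed.
def parallel_sum(numbers, workers: 4)
  # At least 1, so each_slice never receives 0 (which would raise).
  slice_size = [(numbers.size / workers.to_f).ceil, 1].max
  threads = numbers.each_slice(slice_size).map do |slice|
    Thread.new { slice.sum } # reads only its own slice
  end
  threads.sum(&:value) # Thread#value waits for the thread, returns its result
end

puts parallel_sum((1..100).to_a)            # 5050
puts parallel_sum((1..10).to_a, workers: 3) # 55
```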

Actors
Another layer of abstraction

Akka
Concurrency library in Scala
Most famous for its actor implementation

Mikka
Small Ruby wrapper around Akka

require 'mikka'

class SomeActor < Mikka::Actor
  def receive(message)
    # do the thing
  end
end
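To make the actor idea concrete without Akka, a toy actor in plain Ruby: each actor owns a mailbox (`Queue`) drained by a single thread, so `receive` handles one message at a time and the actor's state is never touched concurrently. `ToyActor` and `CounterActor` are hypothetical and are not the Mikka API; Akka adds supervision, scheduling, and much more.

```ruby
# A toy actor: one mailbox, one thread draining it, messages handled in order.
class ToyActor
  STOP = Object.new # sentinel that shuts the actor down

  def initialize
    @mailbox = Queue.new
    @thread = Thread.new do
      loop do
        message = @mailbox.pop
        break if message.equal?(STOP)
        receive(message)
      end
    end
  end

  # Asynchronous send: enqueue the message and return immediately.
  def tell(message)
    @mailbox << message
    self
  end

  # Processes everything already in the mailbox, then stops the thread.
  def stop
    @mailbox << STOP
    @thread.join
  end

  def receive(_message)
    raise NotImplementedError, "#{self.class} must implement receive"
  end
end

class CounterActor < ToyActor
  attr_reader :total

  def initialize
    @total = 0 # set before super starts the mailbox thread
    super
  end

  # Only ever called from the actor's own thread.
  def receive(number)
    @total += number
  end
end

counter = CounterActor.new
10.times { |i| counter.tell(i) }
counter.stop
puts counter.total # 45
```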

Storm
github.com/colinsurprenant/redstorm

We broke it
But YOU should definitely try it out!

Hadoop
github.com/iconara/rubydoop

require 'rubydoop'

module WordCount
  class Mapper
    def map(key, value, context)
      # ...
    end
  end

  class Reducer
    # Receives every value emitted for one key, hence "values"
    def reduce(key, values, context)
      # ...
    end
  end
end

Rubydoop.configure do |input_path, output_path|
  job 'word_count' do
    input input_path
    output output_path

    mapper WordCount::Mapper
    reducer WordCount::Reducer

    output_key Hadoop::Io::Text
    output_value Hadoop::Io::IntWritable
  end
end
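What the `Mapper` and `Reducer` are responsible for can be shown without Hadoop: the sketch below runs word count's map, shuffle, and reduce steps in one Ruby process. The method names are hypothetical and this is not the Rubydoop API; in a real job Hadoop performs the shuffle (grouping by key) between the two phases and spreads the work across machines.

```ruby
# Map: emit a [word, 1] pair for every word in every line.
def map_phase(lines)
  lines.flat_map do |line|
    line.downcase.scan(/\w+/).map { |word| [word, 1] }
  end
end

# Shuffle: group the emitted values by key (Hadoop does this step for you).
def shuffle_phase(pairs)
  pairs.group_by(&:first).transform_values { |kvs| kvs.map(&:last) }
end

# Reduce: combine all values for one key into a single count.
def reduce_phase(grouped)
  grouped.transform_values(&:sum)
end

counts = reduce_phase(shuffle_phase(map_phase(['the cat', 'The hat'])))
puts counts['the'] # 2
puts counts['cat'] # 1
```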

Other cool stuff
‣ Hotbunnies
‣ Eurydice
‣ Bundesstrasse
‣ Multimeter

Thank you
@effata
