Slide 1

Slide 1 text

Building a real-time analytics engine in JRuby
David Dahl, @effata

Slide 2

Slide 2 text

whoami
‣ Senior developer at Burt
‣ Analytics for online advertising
‣ Ruby lovers since 2009
‣ AWS

Slide 3

Slide 3 text

No content

Slide 4

Slide 4 text

No content

Slide 5

Slide 5 text

No content

Slide 6

Slide 6 text

Getting started
‣ Writing everything to MySQL, querying for every report
  - Broke down on first major campaign
‣ Precalculate all the things!
‣ Every operation in one application
  - Extremely scary to deploy
‣ Still sticking to MRI
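"Precalculate all the things" means doing the aggregation when an event arrives, so serving a report becomes a lookup instead of a query over every raw row. Here is a minimal sketch in plain Ruby. It assumes events are hashes with hypothetical `:campaign` and `:type` keys, and it uses an in-memory Hash in place of whatever store actually holds the counters. `Precalculator` is an illustrative name, not code from the talk.

```ruby
# Hypothetical sketch: update aggregates at ingest time so reports are lookups.
class Precalculator
  def initialize
    # counts[[campaign, event_type]] => running total (0 if never seen)
    @counts = Hash.new(0)
  end

  # Called once per incoming event; constant work per event.
  def record(event)
    @counts[[event[:campaign], event[:type]]] += 1
  end

  # Serving a report reads a precomputed number instead of scanning raw rows.
  def report(campaign, type)
    @counts[[campaign, type]]
  end
end

calc = Precalculator.new
[
  { campaign: 'spring', type: :impression },
  { campaign: 'spring', type: :impression },
  { campaign: 'spring', type: :click }
].each { |event| calc.record(event) }

puts calc.report('spring', :impression) # prints 2
```

The cost moves from query time to write time. This pays off when the same numbers are read far more often than the raw events are written.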

Slide 7

Slide 7 text

No content

Slide 8

Slide 8 text

Stuck
‣ Separate and buffer with RabbitMQ
  - EventMachine
‣ Store stuff with MongoDB
  - Blocking operations
‣ Bad things

Slide 9

Slide 9 text

Java?
‣ Threading
‣ “Enterprise”
‣ Lots of libraries
Think about creating something → Java ecosystem → Discover someone has already made it for you → Profit!

Slide 10

Slide 10 text

Moving to JRuby
‣ Threads!
‣ A real GC
‣ JIT
‣ Every Java, Scala, Ruby lib ever made
‣ Wrapping Java libraries is fun!
‣ Bonus: Not hating yourself

Slide 11

Slide 11 text

Challenges

Slide 12

Slide 12 text

“100%” uptime
‣ We can “never” be down!
‣ But we can pause
‣ Don’t want to fail on errors
‣ But it’s ok to die

Slide 13

Slide 13 text

Buffering
‣ Split into isolated services
‣ Add a buffering pipeline between them
  - We LOVE RabbitMQ
‣ Ack and persist in a “transaction”
‣ Figure out which delivery guarantee you want
  - at most once
  - at least once
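Which guarantee you get comes down to the order of the two steps in that "transaction". The sketch below is a toy simulation, not RabbitMQ client code. `FakeBroker` and `consume` are hypothetical. The broker redelivers anything that has not been acked, and `crash_on` simulates the consumer dying between the two steps for one message.

```ruby
# Toy broker: keeps every message that has not been acked yet, so anything
# unacked when the consumer "crashes" is delivered again.
class FakeBroker
  def initialize(messages)
    @pending = messages.dup
  end

  def next_message
    @pending.first
  end

  def ack(message)
    @pending.delete(message)
  end

  def empty?
    @pending.empty?
  end
end

# Consumes everything; ack_first picks the step order. crash_on names one
# message for which the consumer dies between the two steps (only once).
def consume(messages, ack_first:, crash_on: nil)
  broker = FakeBroker.new(messages)
  store = [] # stand-in for the database
  crashed = false
  steps = ack_first ? %i[ack persist] : %i[persist ack]

  until broker.empty?
    msg = broker.next_message
    steps.each_with_index do |step, i|
      if i == 1 && msg == crash_on && !crashed
        crashed = true
        break # simulated crash; the next pass of the loop plays the restart
      end
      if step == :ack
        broker.ack(msg)
      else
        store << msg
      end
    end
  end
  store
end

p consume([1, 2, 3], ack_first: true, crash_on: 2)  # prints [1, 3]: message 2 lost (at most once)
p consume([1, 2, 3], ack_first: false, crash_on: 2) # prints [1, 2, 2, 3]: message 2 duplicated (at least once)
```

If you persist before acking, you get at least once. In that case the store must tolerate duplicates, for example through idempotent writes.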

Slide 14

Slide 14 text

Databases
‣ Pick the right tool for the job
‣ MongoDB everywhere = bad
‣ Cassandra
‣ Redis
‣ NoDB
  - keep it streaming!

Slide 15

Slide 15 text

java.util.concurrent

Slide 16

Slide 16 text

Shortcut

Slide 17

Slide 17 text

Executors
Better than doing Thread.new

Slide 18

Slide 18 text

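require 'java'
java_import java.util.concurrent.Executors

thread_pool = Executors.new_fixed_thread_pool(16)
stuff.each do |item|
  thread_pool.submit do
    crunch_stuff(item)
  end
end
# Stop accepting work, then wait for the queued tasks to finish
thread_pool.shutdown
thread_pool.await_termination(1, java.util.concurrent.TimeUnit::HOURS)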
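MRI has no `Executors`, but the idea is easy to see in plain Ruby. A fixed number of long-lived workers pull jobs from one queue, instead of starting a new thread per item. `FixedPool` is a hypothetical teaching class that only borrows the `submit`/`shutdown` names. It is not the java.util.concurrent API, and on JRuby the real `Executors` pool is the better choice.

```ruby
# N long-lived workers share one job queue, so submitting 10_000 jobs
# still uses only N threads.
class FixedPool
  STOP = Object.new # sentinel telling a worker to exit

  def initialize(size)
    @jobs = Queue.new
    @workers = Array.new(size) do
      Thread.new do
        # Queue#pop blocks until a job (or STOP) is available.
        until (job = @jobs.pop).equal?(STOP)
          begin
            job.call
          rescue StandardError => e
            warn "job failed: #{e.message}" # keep the worker alive
          end
        end
      end
    end
  end

  def submit(&block)
    @jobs << block
  end

  # Jobs already queued run first, because the STOPs go in behind them.
  def shutdown
    @workers.size.times { @jobs << STOP }
    @workers.each(&:join)
  end
end

results = Queue.new
pool = FixedPool.new(4)
(1..100).each { |i| pool.submit { results << i * i } }
pool.shutdown
puts results.size # prints 100
```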

Slide 19

Slide 19 text

Blocking queues
Producer/consumer pattern made easy
Don’t forget back pressure!

Slide 20

Slide 20 text

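require 'java'
java_import java.util.concurrent.LinkedBlockingQueue
java_import java.util.concurrent.TimeUnit

# Unbounded by default: put/offer never wait, so there is no back pressure
queue = LinkedBlockingQueue.new
# With timeout (poll returns nil if nothing arrives in time)
queue.offer(data, 60, TimeUnit::SECONDS)
queue.poll(60, TimeUnit::SECONDS)
# Blocking
queue.put(data)
queue.take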
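The same producer/consumer shape can be written with Ruby's built-in `Queue`, which runs on MRI and JRuby alike. `Queue#pop` blocks like `take`. `Queue#push` never blocks, like `put` on an unbounded `LinkedBlockingQueue`. The `done` marker and the doubling "work" are assumptions made for the example.

```ruby
queue = Queue.new
done = Object.new # end-of-stream marker

producer = Thread.new do
  10.times { |i| queue.push(i) }
  queue.push(done)
end

consumed = []
consumer = Thread.new do
  # pop blocks until the producer has pushed something
  until (item = queue.pop).equal?(done)
    consumed << item * 2 # stand-in for real processing
  end
end

[producer, consumer].each(&:join)
p consumed # prints [0, 2, 4, 6, 8, 10, 12, 14, 16, 18]
```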

Slide 21

Slide 21 text

Back pressure
[Diagram: Storage, Timer, Data processing, Queue, State]

Slide 22

Slide 22 text

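require 'java'
java_import java.util.concurrent.ArrayBlockingQueue
java_import java.util.concurrent.TimeUnit

# Bounded to 100 items: producers are held back when consumers fall behind
queue = ArrayBlockingQueue.new(100)
# With timeout: offer returns false if the queue stays full for 60 s
queue.offer(data, 60, TimeUnit::SECONDS)
queue.poll(60, TimeUnit::SECONDS)
# Blocking: put waits while full, take waits while empty
queue.put(data)
queue.take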
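Ruby's `SizedQueue` is the stdlib analogue of a bounded `ArrayBlockingQueue`. Its `push` blocks while the queue is full, and that is exactly what throttles a fast producer. The sketch below uses a hypothetical capacity of 5 and a `sleep` standing in for slow processing. It records the largest backlog the consumer ever sees.

```ruby
capacity = 5
queue = SizedQueue.new(capacity)
done = Object.new # end-of-stream marker

producer = Thread.new do
  50.times { |i| queue.push(i) } # blocks whenever `capacity` items are waiting
  queue.push(done)
end

max_backlog = 0 # written only by the consumer thread
consumer = Thread.new do
  count = 0
  loop do
    max_backlog = [max_backlog, queue.size].max
    break if queue.pop.equal?(done)
    count += 1
    sleep 0.001 # hypothetical slow processing step
  end
  count # becomes the thread's value
end

producer.join
processed = consumer.value
puts "processed #{processed}, backlog never above #{max_backlog}"
```

With an unbounded queue, the same producer would finish immediately and leave the whole backlog in memory.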

Slide 23

Slide 23 text

More awesomeness
‣ java.util.concurrent
  - Atomic(Boolean/Integer/Long)
  - ConcurrentHashMap
  - CountDownLatch / Semaphore
‣ Google Guava
‣ LMAX Disruptor
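To show what `CountDownLatch` does, here is a minimal pure-Ruby version built on `Mutex` and `ConditionVariable`. It sketches the technique only and is not a port of the Java class: it has no timeouts and no interruption. On JRuby you would simply use `java.util.concurrent.CountDownLatch`.

```ruby
# Minimal latch: await blocks until count_down has been called `count` times.
class CountDownLatch
  def initialize(count)
    @count = count
    @mutex = Mutex.new
    @zero = ConditionVariable.new
  end

  def count_down
    @mutex.synchronize do
      @count -= 1 if @count > 0
      @zero.broadcast if @count.zero?
    end
  end

  def await
    @mutex.synchronize do
      # Re-check in a loop to guard against spurious wakeups.
      @zero.wait(@mutex) while @count > 0
    end
  end

  def count
    @mutex.synchronize { @count }
  end
end

latch = CountDownLatch.new(3)
results = Queue.new
3.times do |i|
  Thread.new do
    results << i
    latch.count_down
  end
end
latch.await # returns only after all three workers have counted down
puts results.size # prints 3
```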

Slide 24

Slide 24 text

Easy mode
‣ Thread safety is hard
‣ Use j.u.c
‣ Avoid shared mutable state if possible
‣ Back pressure
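One way to avoid shared mutable state is to give each thread its own local result and combine the results only after the threads finish. The sketch below uses made-up input words and an arbitrary split into slices of three. No locks are needed, because no two threads ever touch the same Hash.

```ruby
words = %w[ad click ad view ad click view ad] # made-up input
slices = words.each_slice(3).to_a             # arbitrary partitioning

workers = slices.map do |slice|
  Thread.new do
    # Hash created inside the thread: nobody else can see or mutate it.
    slice.each_with_object(Hash.new(0)) { |word, counts| counts[word] += 1 }
  end
end

# Thread#value joins the thread and returns its block's result.
totals = workers.map(&:value).reduce({}) do |acc, partial|
  acc.merge(partial) { |_word, a, b| a + b }
end

puts totals['ad'] # prints 4
```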

Slide 25

Slide 25 text

Actors
Another layer of abstraction

Slide 26

Slide 26 text

Akka
Concurrency library in Scala
Most famous for its actor implementation

Slide 27

Slide 27 text

Mikka
Small Ruby wrapper around Akka

Slide 28

Slide 28 text

require 'mikka'

class SomeActor < Mikka::Actor
  def receive(message)
    # do the thing
  end
end
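Underneath, an actor is a mailbox plus a single thread that owns the actor's state and handles one message at a time. The sketch below shows only that idea in plain Ruby. `TinyActor` and `Counter` are hypothetical and are not Mikka's or Akka's API. There is no supervision, no dispatcher and no remoting.

```ruby
# One thread drains the mailbox, so receive never runs concurrently with itself.
class TinyActor
  STOP = Object.new # sentinel that ends the mailbox loop

  def initialize
    @mailbox = Queue.new
    @thread = Thread.new do
      until (message = @mailbox.pop).equal?(STOP)
        receive(message)
      end
    end
  end

  # Asynchronous send: enqueue and return immediately.
  def <<(message)
    @mailbox << message
    self
  end

  # Processes everything already in the mailbox, then ends the thread.
  def stop
    @mailbox << STOP
    @thread.join
  end
end

class Counter < TinyActor
  attr_reader :count

  def initialize
    @count = 0 # set before super starts the mailbox thread
    super
  end

  # Only the actor's own thread touches @count.
  def receive(message)
    @count += message
  end
end

counter = Counter.new
10.times { counter << 1 }
counter.stop
puts counter.count # prints 10
```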

Slide 29

Slide 29 text

Storm
github.com/colinsurprenant/redstorm

Slide 30

Slide 30 text

We broke it
But YOU should definitely try it out!

Slide 31

Slide 31 text

Hadoop
github.com/iconara/rubydoop

Slide 32

Slide 32 text

module WordCount
  class Mapper
    def map(key, value, context)
      # ...
    end
  end

  class Reducer
    # values: all values emitted for this key
    def reduce(key, values, context)
      # ...
    end
  end
end

Slide 33

Slide 33 text

require 'rubydoop'

Rubydoop.configure do |input_path, output_path|
  job 'word_count' do
    input input_path
    output output_path

    mapper WordCount::Mapper
    reducer WordCount::Reducer

    output_key Hadoop::Io::Text
    output_value Hadoop::Io::IntWritable
  end
end
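To see what the framework does around `map` and `reduce`, you can simulate the word count locally in plain Ruby. This is not Rubydoop. `LocalContext`, `LocalMapper`, `LocalReducer` and `run_local` are hypothetical. Keys and values are plain strings and integers rather than `Hadoop::Io` types, and the tokenization is an assumption. The `group_by` step stands in for Hadoop's shuffle.

```ruby
# Hypothetical stand-in for Hadoop's context object: it just collects pairs.
class LocalContext
  attr_reader :pairs

  def initialize
    @pairs = []
  end

  def write(key, value)
    @pairs << [key, value]
  end
end

class LocalMapper
  # key: line number (unused); value: one line of text
  def map(_key, value, context)
    value.downcase.scan(/\w+/).each { |word| context.write(word, 1) }
  end
end

class LocalReducer
  # values: every count the mappers emitted for this word
  def reduce(key, values, context)
    context.write(key, values.sum)
  end
end

def run_local(lines)
  mapper = LocalMapper.new
  reducer = LocalReducer.new

  map_ctx = LocalContext.new
  lines.each_with_index { |line, i| mapper.map(i, line, map_ctx) }

  # "Shuffle": group mapper output by key, as Hadoop does between the phases.
  grouped = map_ctx.pairs.group_by(&:first).transform_values { |pairs| pairs.map(&:last) }

  reduce_ctx = LocalContext.new
  grouped.each { |word, counts| reducer.reduce(word, counts, reduce_ctx) }
  reduce_ctx.pairs.to_h
end

counts = run_local(['the cat', 'The dog and the cat'])
puts counts['the'] # prints 3
```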

Slide 34

Slide 34 text

Other cool stuff
‣ Hotbunnies
‣ Eurydice
‣ Bundesstrasse
‣ Multimeter

Slide 35

Slide 35 text

Thank you
@effata
david@burtcorp.com