Building a real-time analytics engine in JRuby
David Dahl
@effata
Slide 2
Slide 2 text
whoami
‣ Senior developer at Burt
‣ Analytics for online advertising
‣ Ruby lovers since 2009
‣ AWS
Slide 6
Slide 6 text
Getting started
‣ Writing everything to MySQL, querying for every report
  - Broke down on first major campaign
‣ Precalculate all the things!
‣ Every operation in one application
  - Extremely scary to deploy
‣ Still sticking to MRI
Slide 8
Slide 8 text
Stuck
‣ Separate and buffer with RabbitMQ
  - EventMachine
‣ Store stuff with MongoDB
  - Blocking operations
‣ Bad things
Slide 9
Slide 9 text
Java?
‣ Threading
‣ “Enterprise”
‣ Lots of libraries
Think about creating something
Java ecosystem
Discover someone has made it for you already
Profit!
Slide 10
Slide 10 text
Moving to JRuby
‣ Threads!
‣ A real GC
‣ JIT
‣ Every Java, Scala, Ruby lib ever made
‣ Wrapping Java libraries is fun!
‣ Bonus: Not hating yourself
Slide 11
Slide 11 text
Challenges
Slide 12
Slide 12 text
“100%” uptime
‣ We can “never” be down!
‣ But we can pause
‣ Don’t want to fail on errors
‣ But it’s ok to die
Slide 13
Slide 13 text
Buffering
‣ Split into isolated services
‣ Add a buffering pipeline between
  - We LOVE RabbitMQ
‣ Ack and persist in a “transaction”
‣ Figure out if you want
  - at most once
  - at least once
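The difference between the two guarantees comes down to whether a message is acked before or after it is persisted. A minimal in-memory sketch in plain Ruby, with no RabbitMQ client involved: `FakeBroker`, `consume`, `run`, and the `store` array are hypothetical stand-ins made up for illustration, and the "crash" is simulated with an exception.

```ruby
# Hypothetical in-memory stand-in for a broker that redelivers unacked messages.
class FakeBroker
  def initialize(messages)
    @pending = messages.dup # everything not yet acked
  end

  # Next unacked message, or nil when nothing is left.
  def peek
    @pending.first
  end

  # Acking removes the message for good.
  def ack(message)
    @pending.delete(message)
  end
end

# :at_least_once -> persist, then ack (a crash in between means a redelivery)
# :at_most_once  -> ack, then persist (a crash in between means a lost message)
def consume(broker, store, mode, crash_on: nil)
  while (message = broker.peek)
    if mode == :at_least_once
      store << message
      raise "crash" if message == crash_on
      broker.ack(message)
    else
      broker.ack(message)
      raise "crash" if message == crash_on
      store << message
    end
  end
end

# Process three messages, die once while handling message 2, then restart.
def run(mode)
  broker = FakeBroker.new([1, 2, 3])
  store = []
  begin
    consume(broker, store, mode, crash_on: 2)
  rescue RuntimeError
    # the service died; a restarted instance continues below
  end
  consume(broker, store, mode)
  store
end

run(:at_least_once) # => [1, 2, 2, 3]  (message 2 stored twice)
run(:at_most_once)  # => [1, 3]        (message 2 lost)
```

With at-least-once, the persistence side has to tolerate duplicates; with at-most-once, it has to tolerate gaps.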
Slide 14
Slide 14 text
Databases
‣ Pick the right tool for the job
‣ MongoDB everywhere = bad
‣ Cassandra
‣ Redis
‣ NoDB
  - keep it streaming!
Slide 15
Slide 15 text
java.util.concurrent
Slide 16
Slide 16 text
Shortcut
Slide 17
Slide 17 text
Executors
Better than doing Thread.new
Slide 18
Slide 18 text
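require 'java'
java_import java.util.concurrent.Executors

# A fixed pool of 16 worker threads
thread_pool = Executors.new_fixed_thread_pool(16)

stuff.each do |item|
  # Each item is crunched on one of the pool's threads
  thread_pool.submit do
    crunch_stuff(item)
  end
end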
Slide 19
Slide 19 text
Blocking queues
Producer/consumer pattern made easy
Don’t forget back pressure!
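A minimal sketch of the producer/consumer pattern with back pressure, using Ruby's built-in `SizedQueue` so that it runs on both JRuby and MRI. On JRuby, `java.util.concurrent.ArrayBlockingQueue` with `put` and `take` fills the same role. The `run_pipeline` method, the capacity of 4, and the doubling "work" are assumptions made up for illustration.

```ruby
# Hypothetical helper: pushes items through a bounded queue to one consumer.
def run_pipeline(items, capacity: 4)
  queue = SizedQueue.new(capacity) # push blocks once `capacity` items are waiting
  done = Object.new                # sentinel telling the consumer to stop
  results = []

  consumer = Thread.new do
    loop do
      item = queue.pop             # blocks while the queue is empty
      break if item.equal?(done)
      results << item * 2          # stand-in for real work
    end
  end

  producer = Thread.new do
    # The producer stalls here whenever the consumer falls behind:
    # that stall is the back pressure.
    items.each { |item| queue.push(item) }
    queue.push(done)
  end

  [producer, consumer].each(&:join)
  results
end

run_pipeline((1..10).to_a)
```

An unbounded queue would let a fast producer fill up memory while the consumer lags; the bound moves the waiting to the producer instead.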