Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Building a real time analytics engine in JRuby
Search
David Dahl
March 02, 2013
Programming
1
510
Building a real time analytics engine in JRuby
David Dahl
March 02, 2013
Tweet
Share
More Decks by David Dahl
See All by David Dahl
Nosql - getting over the bad parts
effata
1
110
Other Decks in Programming
See All in Programming
AIコーディングAgentとの向き合い方
eycjur
0
280
How Android Uses Data Structures Behind The Scenes
l2hyunwoo
0
480
Swift Updates - Learn Languages 2025
koher
2
510
AIでLINEスタンプを作ってみた
eycjur
1
230
Oracle Database Technology Night 92 Database Connection control FAN-AC
oracle4engineer
PRO
1
470
Rancher と Terraform
fufuhu
2
550
Kiroで始めるAI-DLC
kaonash
2
620
The Past, Present, and Future of Enterprise Java
ivargrimstad
0
410
AI時代のUIはどこへ行く?
yusukebe
18
9.1k
スケールする組織の実現に向けた インナーソース育成術 - ISGT2025
teamlab
PRO
1
160
そのAPI、誰のため? Androidライブラリ設計における利用者目線の実践テクニック
mkeeda
2
1.8k
Amazon RDS 向けに提供されている MCP Server と仕組みを調べてみた/jawsug-okayama-2025-aurora-mcp
takahashiikki
1
110
Featured
See All Featured
Mobile First: as difficult as doing things right
swwweet
224
9.9k
Become a Pro
speakerdeck
PRO
29
5.5k
Cheating the UX When There Is Nothing More to Optimize - PixelPioneers
stephaniewalter
285
14k
Optimizing for Happiness
mojombo
379
70k
ReactJS: Keep Simple. Everything can be a component!
pedronauck
667
120k
A better future with KSS
kneath
239
17k
Done Done
chrislema
185
16k
Statistics for Hackers
jakevdp
799
220k
The Cost Of JavaScript in 2023
addyosmani
53
8.9k
RailsConf & Balkan Ruby 2019: The Past, Present, and Future of Rails at GitHub
eileencodes
139
34k
Code Review Best Practice
trishagee
71
19k
How to train your dragon (web standard)
notwaldorf
96
6.2k
Transcript
Building a real time analytics engine in JRuby David Dahl
@effata
whoami ‣ Senior developer at Burt ‣ Analytics for online
advertising ‣ Ruby lovers since 2009 ‣ AWS
None
None
None
Getting started ‣ Writing everything to mysql, querying for every
report - Broke down on first major campaign ‣ Precalculate all the things! ‣ Every operation in one application - Extremely scary to deploy ‣ Still sticking to MRI
None
Stuck ‣ Separate and buffer with RabbitMQ - Eventmachine ‣
Store stuff with MongoDB - Blocking operations ‣ Bad things
Java? ‣ Threading ‣ “Enterprise” ‣ Lots of libraries Think
about creating something Java ecosystem Discover someone has made it for you already Profit!
Moving to JRuby ‣ Threads! ‣ A real GC ‣
JIT ‣ Every Java, Scala, Ruby lib ever made ‣ Wrapping java libraries is fun! ‣ Bonus: Not hating yourself
Challenges
“100%” uptime ‣ We can “never” be down! ‣ But
we can pause ‣ Don’t want to fail on errors ‣ But it’s ok to die
Buffering ‣ Split into isolated services ‣ Add a buffering
pipeline between - We LOVE RabbitMQ ‣ Ack and persist in a “transaction” ‣ Figure out if you want - at most once - at least once
Databases ‣ Pick the right tool for the job ‣
MongoDB everywhere = bad ‣ Cassandra ‣ Redis ‣ NoDB - keep it streaming!
Java.util.concurrent
Shortcut
Executors Better than doing Thread.new
thread_pool = ! Executors.new_fixed_thread_pool(16) stuff.each do |item| thread_pool.submit do crunch_stuff(item)
end end
Blocking queues Producer/consumer pattern made easy Don’t forget back pressure!
queue = ! JavaConcurrent::LinkedBlockingQueue.new # With timeout queue.offer(data, 60, Java::TimeUnit::SECONDS)
queue.poll(60, Java::TimeUnit::SECONDS) # Blocking queue.put(data) queue.take
Back pressure Storage Timer Data processing Queue State
queue = ! JavaConcurrent::ArrayBlockingQueue.new(100) # With timeout queue.offer(data, 60, Java::TimeUnit::SECONDS)
queue.poll(60, Java::TimeUnit::SECONDS) # Blocking queue.put(data) queue.take
More awesomeness ‣ Java.util.concurrent - Atomic(Boolean/Integer/Long) - ConcurrentHashMap - CountDownLatch
/ Semaphore ‣ Google Guava ‣ LMAX Disruptor
Easy mode ‣ Thread safety is hard ‣ Use j.u.c
‣ Avoid shared mutual state if possible ‣ Back pressure
Actors Another layer of abstractions
Akka Concurrency library in Scala Most famous for its actor
implementation
Mikka Small ruby wrapper around Akka
class SomeActor < Mikka::Actor def receive(message) # do the thing
end end
Storm github.com/colinsurprenant/redstorm
We broke it But YOU should definitely try it out!
Hadoop github.com/iconara/rubydoop
module WordCount class Mapper def map(key, value, context) # ...
end end class Reducer def reduce(key, value, context) # ... end end end
Rubydoop.configure do |input_path, output_path| job 'word_count' do input input_path output
output_path mapper WordCount::Mapper reducer WordCount::Reducer output_key Hadoop::Io::Text output_value Hadoop::Io::IntWritable end end
Other cool stuff ‣ Hotbunnies ‣ Eurydice ‣ Bundesstrasse ‣
Multimeter
Thank you @effata
[email protected]