Building a real time analytics engine in JRuby
David Dahl
March 02, 2013
Transcript
Building a real time analytics engine in JRuby
David Dahl
@effata
whoami
‣ Senior developer at Burt
‣ Analytics for online advertising
‣ Ruby lovers since 2009
‣ AWS
Getting started
‣ Writing everything to MySQL, querying for every report
  - Broke down on first major campaign
‣ Precalculate all the things!
‣ Every operation in one application
  - Extremely scary to deploy
‣ Still sticking to MRI
Stuck
‣ Separate and buffer with RabbitMQ
  - EventMachine
‣ Store stuff with MongoDB
  - Blocking operations
‣ Bad things
Java?
‣ Threading
‣ “Enterprise”
‣ Lots of libraries
[Diagram labels: Think about creating something; Java ecosystem; Discover someone has made it for you already; Profit!]
Moving to JRuby
‣ Threads!
‣ A real GC
‣ JIT
‣ Every Java, Scala, Ruby lib ever made
‣ Wrapping Java libraries is fun!
‣ Bonus: Not hating yourself
Challenges
“100%” uptime
‣ We can “never” be down!
‣ But we can pause
‣ Don’t want to fail on errors
‣ But it’s ok to die
Buffering
‣ Split into isolated services
‣ Add a buffering pipeline between
  - We LOVE RabbitMQ
‣ Ack and persist in a “transaction”
‣ Figure out if you want
  - at most once
  - at least once
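The choice between at most once and at least once comes down to whether a message is acknowledged before or after it has been persisted. A minimal sketch in plain Ruby, assuming a hypothetical in-memory `FakeBroker` and `FlakyStore` that stand in for RabbitMQ and a database (none of these names come from the talk or from any RabbitMQ client):

```ruby
# Hypothetical in-memory stand-in for a broker; not a RabbitMQ client API.
class FakeBroker
  def initialize(messages)
    @pending = messages.dup
  end

  # Returns the next unacknowledged message without removing it.
  def peek
    @pending.first
  end

  # Acknowledging removes the message; unacked messages are delivered again.
  def ack
    @pending.shift
  end

  def empty?
    @pending.empty?
  end
end

# Hypothetical store that fails once on a chosen message,
# simulating a crash during a write.
class FlakyStore
  attr_reader :rows

  def initialize(fail_once_on)
    @fail_once_on = fail_once_on
    @rows = []
  end

  def persist(message)
    if message == @fail_once_on
      @fail_once_on = nil
      raise IOError, "write failed"
    end
    @rows << message
  end
end

# At least once: persist first, ack only after success.
# A failed write leaves the message on the broker for another attempt.
def consume_at_least_once(broker, store)
  until broker.empty?
    message = broker.peek
    begin
      store.persist(message)
      broker.ack
    rescue IOError
      # Not acked, so the loop sees the same message again.
    end
  end
end

# At most once: ack first, then persist. A failed write loses the message.
def consume_at_most_once(broker, store)
  until broker.empty?
    message = broker.peek
    broker.ack
    begin
      store.persist(message)
    rescue IOError
      # Already acked, so the message is gone.
    end
  end
end

safe = FlakyStore.new(:b)
consume_at_least_once(FakeBroker.new([:a, :b, :c]), safe)

lossy = FlakyStore.new(:b)
consume_at_most_once(FakeBroker.new([:a, :b, :c]), lossy)
```

Here `safe.rows` ends up with all three messages while `lossy.rows` is missing `:b`. The price of at least once is that a crash between the write and the ack delivers the same message twice, so the write should be idempotent.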
Databases
‣ Pick the right tool for the job
‣ MongoDB everywhere = bad
‣ Cassandra
‣ Redis
‣ NoDB
  - keep it streaming!
java.util.concurrent
Shortcut
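Executors
Better than doing Thread.new
    require 'java'
    java_import java.util.concurrent.Executors

    # stuff and crunch_stuff are the slide's placeholders for your data and work
    thread_pool = Executors.new_fixed_thread_pool(16)
    stuff.each do |item|
      thread_pool.submit do
        crunch_stuff(item)
      end
    end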
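Blocking queues
Producer/consumer pattern made easy
Don’t forget back pressure!
    # LinkedBlockingQueue and TimeUnit live in java.util.concurrent;
    # the JavaConcurrent and Java:: prefixes are kept as written on the slide.
    queue = JavaConcurrent::LinkedBlockingQueue.new
    # With timeout
    queue.offer(data, 60, Java::TimeUnit::SECONDS)
    queue.poll(60, Java::TimeUnit::SECONDS)
    # Blocking
    queue.put(data)
    queue.take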
Back pressure
[Diagram labels: Storage, Timer, Data processing, Queue, State]
    # Bounded queue: at most 100 items are held at any time
    queue = JavaConcurrent::ArrayBlockingQueue.new(100)
    # With timeout
    queue.offer(data, 60, Java::TimeUnit::SECONDS)
    queue.poll(60, Java::TimeUnit::SECONDS)
    # Blocking
    queue.put(data)
    queue.take
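The bounded queue is what creates back pressure: once it is full, producers either wait or are turned away until a consumer catches up. A minimal sketch of the same idea using Ruby's built-in `SizedQueue` in place of `ArrayBlockingQueue`, so that it also runs outside JRuby; the capacity of 2 and the event names are illustrative assumptions:

```ruby
# SizedQueue is Ruby's standard bounded queue, swapped in here for
# java.util.concurrent.ArrayBlockingQueue purely for portability.
queue = SizedQueue.new(2)

queue.push(:event_1)
queue.push(:event_2)

# The queue is now full. A non-blocking push raises ThreadError
# instead of waiting, which lets the producer shed or retry the item.
rejected = false
begin
  queue.push(:event_3, true)
rescue ThreadError
  rejected = true
end

# A blocking push waits until a consumer makes room.
producer = Thread.new { queue.push(:event_3) }

consumed = [queue.pop] # frees one slot, so the producer can proceed
producer.join
consumed << queue.pop << queue.pop
```

With an unbounded queue (`Queue.new`, or `LinkedBlockingQueue` without a capacity) neither push would ever be refused, and a slow consumer would show up as growing memory use instead.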
More awesomeness
‣ java.util.concurrent
  - Atomic(Boolean/Integer/Long)
  - ConcurrentHashMap
  - CountDownLatch / Semaphore
‣ Google Guava
‣ LMAX Disruptor
Easy mode
‣ Thread safety is hard
‣ Use j.u.c
‣ Avoid shared mutable state if possible
‣ Back pressure
Actors
Another layer of abstractions
Akka
Concurrency library in Scala
Most famous for its actor implementation
Mikka
Small Ruby wrapper around Akka
    class SomeActor < Mikka::Actor
      def receive(message)
        # do the thing
      end
    end
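For readers new to the model: an actor owns its state and handles one message at a time from a mailbox, so no locks are needed around that state. A minimal sketch in plain Ruby, with a thread and a `Queue` as the mailbox. This is not how Mikka or Akka are implemented, and `CounterActor`, `tell`, and `stop` are hypothetical names chosen for the illustration:

```ruby
# Illustration of the actor idea only; not the Mikka or Akka API.
class CounterActor
  attr_reader :total

  def initialize
    @total = 0
    @mailbox = Queue.new
    # A single thread drains the mailbox, so @total is only ever
    # modified from that one thread.
    @worker = Thread.new do
      while (message = @mailbox.pop) != :stop
        receive(message)
      end
    end
  end

  # Asynchronous send: enqueue the message and return immediately.
  def tell(message)
    @mailbox << message
  end

  # Handles everything already queued, then shuts the worker down.
  def stop
    @mailbox << :stop
    @worker.join
  end

  private

  def receive(message)
    @total += message
  end
end

actor = CounterActor.new
senders = 4.times.map do
  Thread.new { 100.times { actor.tell(1) } }
end
senders.each(&:join)
actor.stop
```

Four threads send concurrently, yet `actor.total` is 400 after `stop` because the increments are serialized through the mailbox.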
Storm
github.com/colinsurprenant/redstorm
We broke it
But YOU should definitely try it out!
Hadoop
github.com/iconara/rubydoop
    module WordCount
      class Mapper
        def map(key, value, context)
          # ...
        end
      end
      class Reducer
        def reduce(key, value, context)
          # ...
        end
      end
    end
    Rubydoop.configure do |input_path, output_path|
      job 'word_count' do
        input input_path
        output output_path
        mapper WordCount::Mapper
        reducer WordCount::Reducer
        output_key Hadoop::Io::Text
        output_value Hadoop::Io::IntWritable
      end
    end
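The mapper and reducer bodies are elided on the slide. To show what the two phases of a word count compute, here is a sketch in plain Ruby with no Hadoop involved; `map_line`, `shuffle`, and `reduce_word` are hypothetical helpers, not part of the Rubydoop or Hadoop API:

```ruby
# Map phase: emit a [word, 1] pair for every word in a line.
def map_line(line)
  line.downcase.scan(/\w+/).map { |word| [word, 1] }
end

# Shuffle phase (handled by the framework in Hadoop): group values by key.
def shuffle(pairs)
  pairs.group_by(&:first).transform_values { |kvs| kvs.map(&:last) }
end

# Reduce phase: sum the counts collected for one word.
def reduce_word(word, counts)
  [word, counts.sum]
end

lines = ["to be or not to be", "To stream or to store"]
pairs = lines.flat_map { |line| map_line(line) }
counts = shuffle(pairs).map { |word, ones| reduce_word(word, ones) }.to_h
```

In a real job the map and reduce steps run on different machines, which is why each one only sees a key and its values and writes its results to a context object.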
Other cool stuff
‣ Hotbunnies
‣ Eurydice
‣ Bundesstrasse
‣ Multimeter
Thank you
@effata