Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Storm: the Hadoop of Realtime Stream Processing
Search
Gabriel Grant
March 25, 2012
Programming
2
1.3k
Storm: the Hadoop of Realtime Stream Processing
Twitter's new scalable, fault-tolerant, and simple(ish) stream programming system... with Python!
Gabriel Grant
March 25, 2012
Tweet
Share
More Decks by Gabriel Grant
See All by Gabriel Grant
Painting Rainbows: Building Bridges in the Cloud
gabrielgrant
1
230
Other Decks in Programming
See All in Programming
「接続」—パフォーマンスチューニングの最後の一手 〜点と点を結ぶ、その一瞬のために〜
kentaroutakeda
2
480
PHPのバージョンアップ時にも役立ったAST(2026年版)
matsuo_atsushi
0
130
grapheme_strrev関数が採択されました(あと雑感)
youkidearitai
PRO
1
230
Go Conference mini in Sendai 2026 : Goに新機能を提案し実装されるまでのフロー徹底解説
yamatoya
0
610
Claude Codeセッション現状確認 2026福岡 / fukuoka-aicoding-00-beacon
monochromegane
4
440
クライアントワークでSREをするということ。あるいは事業会社におけるSREと同じこと・違うこと
nnaka2992
1
350
Claude Code の Skill で複雑な既存仕様をすっきり整理しよう
yuichirokato
1
410
AI駆動開発の本音 〜Claude Code並列開発で見えたエンジニアの新しい役割〜
hisuzuya
4
520
Swift ConcurrencyでよりSwiftyに
yuukiw00w
0
270
エンジニアの「手元の自動化」を加速するn8n 2026.02.27
symy2co
0
160
AI 開発合宿を通して得た学び
niftycorp
PRO
0
150
AI時代のシステム設計:ドメインモデルで変更しやすさを守る設計戦略
masuda220
PRO
5
1.1k
Featured
See All Featured
JAMstack: Web Apps at Ludicrous Speed - All Things Open 2022
reverentgeek
1
390
What's in a price? How to price your products and services
michaelherold
247
13k
実際に使うSQLの書き方 徹底解説 / pgcon21j-tutorial
soudai
PRO
199
73k
Facilitating Awesome Meetings
lara
57
6.8k
The Language of Interfaces
destraynor
162
26k
Measuring Dark Social's Impact On Conversion and Attribution
stephenakadiri
1
160
Building Better People: How to give real-time feedback that sticks.
wjessup
370
20k
Navigating the moral maze — ethical principles for Al-driven product design
skipperchong
2
290
Fantastic passwords and where to find them - at NoRuKo
philnash
52
3.6k
How to optimise 3,500 product descriptions for ecommerce in one day using ChatGPT
katarinadahlin
PRO
1
3.5k
Statistics for Hackers
jakevdp
799
230k
Exploring the relationship between traditional SERPs and Gen AI search
raygrieselhuber
PRO
2
3.7k
Transcript
STORM Keeping it Real(time) Since 2011
HELLO.
dotCloud.com
DATA
DATA
MEGA-DATA
VERSION ONE
VERSION TWO
VERSION TWO
VERSION THREE
JOY
VERSION FOUR?
ENTER, STORM
REAL-TIME COMPUTATION
DISTRIBUTED RPC & STREAM PROCESSING
HISTORY
STREAM PROCESSING
STORM:REAL-TIME HADOOP:BATCH
WOW
HIGH VOLUME
CONTINUOUS
CONTINUOUS
FAULT TOLERANT
DOESN'T
PERSIST
PROCESS BATCHES RELIABLY
PROTECT AGAINST HUMAN ERROR
PROTECT AGAINST HUMAN ERROR
THREE CORE ELEMENTS
SPOUTS
STREAMS
BOLTS
TOPOLOGIES
TASKS
TASKS
OUTPUT ROUTING?
STREAM GROUPINGS
SHUFFLE GROUPING
FIELDS GROUPING
ALL GROUPING
GLOBAL GROUPING
DOWN 'N DIRTY
GATEWAYS
GATEWAYS
REAL-TIME GEOCODE BUCKETED CLIENT UPDATE
THE TOPOLOGY
THE TOPOLOGY
CODE TIME: START ECLIPSE
WAIT, WHAT?!
MULTILANG API
I'VE GOT YOU COVERED
UMBRELLA: IT PROTECTS YOU FROM STORM
THE TOPOLOGY
I'VE GOT YOU COVERED class RedisSpout(JVMSpout): class Default(Stream): fields =
'message' jvm_class = 'yieldbot.storm.spout'
I'VE GOT YOU COVERED class LogParserBolt(AutoAckBolt): class Default(Stream): fields =
'ip_address' def execute(self, input): ip_address = parse_log(input.message) self.emit(ip_address)
I'VE GOT YOU COVERED class GeolocatorBolt(AutoAckBolt): class Default(Stream): fields =
'lat', 'long' def __init__(self, *args, **kwargs): self.geoip = pygeoip.GeoIP('GeoLiteCity.dat') super(GeolocatorBolt, self) \ .__init__(*args, **kwargs) def execute(self, input): record = self.geoip.record_by_addr(input.ip) lat = record['latitude'] long_ = record['longitude'] self.emit((lat, long_))
I'VE GOT YOU COVERED class WSPuserBolt(Bolt): def __init__(self, *args, **kwargs):
self.batcher = TimeBatcher() self.pusher = zerorpc.Client(timeout=None) url = os.environ['WSPUSHER_ZERORPC_URL'] self.wspusher.connect(url) super(WSPusherBolt, self).__init__(*args, **kwargs def execute(self, input): t = time() batch = self.pop_batch(t) if batch: self.wspusher.push_list(batch) data = input.lat, input.long self.batcher.push_item(t, data)
I'VE GOT YOU COVERED class GeocoderTopology(Topology): # components redis =
RedisSpout(1) parser = LogParserBolt(3) geolocator = GeolocatorBolt(2) pusher = WSPuserBolt(4) # plumbing parser.inputs.append(ShuffleGrouping(redis)) geolocator.inputs.append(ShuffleGrouping(parser)) pusher.inputs.append( FieldsGrouping(geolocator, 'lat', 'long'))
INSIDE THE MACHINE
THREE COMPONENTS
NIMBUS
ZOOKEEPER CLUSTER
WORKER NODES
DETAILS
DEPLOYMENT
EC2?
DOTCLOUD!
$ git clone \ https://github.com/gabrielgrant/storm-on-dotcloud.git $ dotcloud push mystorm storm-on-dotcloud
… $ dotcloud scale worker=3
TESTING
JAVA
CLOJURE
ANT MAVEN
LINEINGEN
SCALING
WHEN
HOW
THE FUTURE: EASY & AUTO
THANKS!
GABRIEL GRANT @gabrielmgrant gabrielgrant.ca