Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Storm: the Hadoop of Realtime Stream Processing
Search
Gabriel Grant
March 25, 2012
Programming
3
1.3k
Storm: the Hadoop of Realtime Stream Processing
Twitter's new scalable, fault-tolerant, and simple(ish) stream programming system... with Python!
Gabriel Grant
March 25, 2012
Tweet
Share
More Decks by Gabriel Grant
See All by Gabriel Grant
Painting Rainbows: Building Bridges in the Cloud
gabrielgrant
1
220
Other Decks in Programming
See All in Programming
英語 × の私が、生成AIの力を借りて、OSSに初コントリビュートした話
personabb
0
170
国漢文混用体からHolloまで
minhee
1
140
タイムゾーンの奥地は思ったよりも闇深いかもしれない
suguruooki
1
430
Firebase Dynamic Linksの代替手段を自作する / Create your own Firebase Dynamic Links alternative
kubode
0
220
Chrome Extension Techniques from Hell
moznion
1
150
メモリウォールを超えて:キャッシュメモリ技術の進歩
kawayu
0
1.7k
リアルタイムレイトレーシング + ニューラルレンダリング簡単紹介 / Real-Time Ray Tracing & Neural Rendering: A Quick Introduction (2025)
shocker_0x15
1
280
MCP世界への招待: AIエンジニアが創る次世代エージェント連携の世界
gunta
4
860
SLI/SLOの設定を進めるその前に アラート品質の改善に取り組んだ話
tanden
3
800
「影響が少ない」を自分の目でみてみる
o0h
PRO
2
830
複数ドメインに散らばってしまった画像…! 運用中のPHPアプリに後からCDNを導入する…!
suguruooki
0
460
Java 24まとめ / Java 24 summary
kishida
3
410
Featured
See All Featured
Six Lessons from altMBA
skipperchong
27
3.7k
Music & Morning Musume
bryan
46
6.4k
Designing for humans not robots
tammielis
252
25k
KATA
mclloyd
29
14k
A designer walks into a library…
pauljervisheath
205
24k
Agile that works and the tools we love
rasmusluckow
328
21k
Building a Modern Day E-commerce SEO Strategy
aleyda
39
7.2k
Gamification - CAS2011
davidbonilla
81
5.2k
[RailsConf 2023 Opening Keynote] The Magic of Rails
eileencodes
28
9.4k
Git: the NoSQL Database
bkeepers
PRO
430
65k
Raft: Consensus for Rubyists
vanstee
137
6.9k
YesSQL, Process and Tooling at Scale
rocio
172
14k
Transcript
STORM Keeping it Real(time) Since 2011
HELLO.
dotCloud.com
DATA
DATA
MEGA-DATA
VERSION ONE
VERSION TWO
VERSION TWO
VERSION THREE
JOY
VERSION FOUR?
ENTER, STORM
REAL-TIME COMPUTATION
DISTRIBUTED RPC & STREAM PROCESSING
HISTORY
STREAM PROCESSING
STORM:REAL-TIME HADOOP:BATCH
WOW
HIGH VOLUME
CONTINUOUS
CONTINUOUS
FAULT TOLERANT
DOESN'T
PERSIST
PROCESS BATCHES RELIABLY
PROTECT AGAINST HUMAN ERROR
PROTECT AGAINST HUMAN ERROR
THREE CORE ELEMENTS
SPOUTS
STREAMS
BOLTS
TOPOLOGIES
TASKS
TASKS
OUTPUT ROUTING?
STREAM GROUPINGS
SHUFFLE GROUPING
FIELDS GROUPING
ALL GROUPING
GLOBAL GROUPING
DOWN 'N DIRTY
GATEWAYS
GATEWAYS
REAL-TIME GEOCODE BUCKETED CLIENT UPDATE
THE TOPOLOGY
THE TOPOLOGY
CODE TIME: START ECLIPSE
WAIT, WHAT?!
MULTILANG API
I'VE GOT YOU COVERED
UMBRELLA: IT PROTECTS YOU FROM STORM
THE TOPOLOGY
I'VE GOT YOU COVERED class RedisSpout(JVMSpout): class Default(Stream): fields =
'message' jvm_class = 'yieldbot.storm.spout'
I'VE GOT YOU COVERED class LogParserBolt(AutoAckBolt): class Default(Stream): fields =
'ip_address' def execute(self, input): ip_address = parse_log(input.message) self.emit(ip_address)
I'VE GOT YOU COVERED class GeolocatorBolt(AutoAckBolt): class Default(Stream): fields =
'lat', 'long' def __init__(self, *args, **kwargs): self.geoip = pygeoip.GeoIP('GeoLiteCity.dat') super(GeolocatorBolt, self) \ .__init__(*args, **kwargs) def execute(self, input): record = self.geoip.record_by_addr(input.ip) lat = record['latitude'] long_ = record['longitude'] self.emit((lat, long_))
I'VE GOT YOU COVERED class WSPuserBolt(Bolt): def __init__(self, *args, **kwargs):
self.batcher = TimeBatcher() self.pusher = zerorpc.Client(timeout=None) url = os.environ['WSPUSHER_ZERORPC_URL'] self.wspusher.connect(url) super(WSPusherBolt, self).__init__(*args, **kwargs def execute(self, input): t = time() batch = self.pop_batch(t) if batch: self.wspusher.push_list(batch) data = input.lat, input.long self.batcher.push_item(t, data)
I'VE GOT YOU COVERED class GeocoderTopology(Topology): # components redis =
RedisSpout(1) parser = LogParserBolt(3) geolocator = GeolocatorBolt(2) pusher = WSPuserBolt(4) # plumbing parser.inputs.append(ShuffleGrouping(redis)) geolocator.inputs.append(ShuffleGrouping(parser)) pusher.inputs.append( FieldsGrouping(geolocator, 'lat', 'long'))
INSIDE THE MACHINE
THREE COMPONENTS
NIMBUS
ZOOKEEPER CLUSTER
WORKER NODES
DETAILS
DEPLOYMENT
EC2?
DOTCLOUD!
$ git clone \ https://github.com/gabrielgrant/storm-on-dotcloud.git $ dotcloud push mystorm storm-on-dotcloud
… $ dotcloud scale worker=3
TESTING
JAVA
CLOJURE
ANT MAVEN
LINEINGEN
SCALING
WHEN
HOW
THE FUTURE: EASY & AUTO
THANKS!
GABRIEL GRANT @gabrielmgrant gabrielgrant.ca