Upgrade to PRO for Only $50/Year—Limited-Time Offer! 🔥
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Storm: the Hadoop of Realtime Stream Processing
Search
Gabriel Grant
March 25, 2012
Programming
2
1.3k
Storm: the Hadoop of Realtime Stream Processing
Twitter's new scalable, fault-tolerant, and simple(ish) stream programming system... with Python!
Gabriel Grant
March 25, 2012
Tweet
Share
More Decks by Gabriel Grant
See All by Gabriel Grant
Painting Rainbows: Building Bridges in the Cloud
gabrielgrant
1
230
Other Decks in Programming
See All in Programming
ハイパーメディア駆動アプリケーションとIslandアーキテクチャ: htmxによるWebアプリケーション開発と動的UIの局所的適用
nowaki28
0
420
ID管理機能開発の裏側 高速にSaaS連携を実現したチームのAI活用編
atzzcokek
0
230
著者と進める!『AIと個人開発したくなったらまずCursorで要件定義だ!』
yasunacoffee
0
140
これだけで丸わかり!LangChain v1.0 アップデートまとめ
os1ma
6
1.9k
DevFest Android in Korea 2025 - 개발자 커뮤니티를 통해 얻는 가치
wisemuji
0
150
MAP, Jigsaw, Code Golf 振り返り会 by 関東Kaggler会|Jigsaw 15th Solution
hasibirok0
0
250
Go コードベースの構成と AI コンテキスト定義
andpad
0
130
【CA.ai #3】Google ADKを活用したAI Agent開発と運用知見
harappa80
0
320
Socio-Technical Evolution: Growing an Architecture and Its Organization for Fast Flow
cer
PRO
0
350
ELYZA_Findy AI Engineering Summit登壇資料_AIコーディング時代に「ちゃんと」やること_toB LLMプロダクト開発舞台裏_20251216
elyza
0
160
Rubyで鍛える仕組み化プロヂュース力
muryoimpl
0
140
Integrating WordPress and Symfony
alexandresalome
0
160
Featured
See All Featured
Chrome DevTools: State of the Union 2024 - Debugging React & Beyond
addyosmani
9
1k
A Modern Web Designer's Workflow
chriscoyier
698
190k
Six Lessons from altMBA
skipperchong
29
4.1k
Cheating the UX When There Is Nothing More to Optimize - PixelPioneers
stephaniewalter
286
14k
Fashionably flexible responsive web design (full day workshop)
malarkey
407
66k
Building an army of robots
kneath
306
46k
The World Runs on Bad Software
bkeepers
PRO
72
12k
A better future with KSS
kneath
240
18k
How to Create Impact in a Changing Tech Landscape [PerfNow 2023]
tammyeverts
55
3.1k
Intergalactic Javascript Robots from Outer Space
tanoku
273
27k
The Success of Rails: Ensuring Growth for the Next 100 Years
eileencodes
47
7.9k
[RailsConf 2023 Opening Keynote] The Magic of Rails
eileencodes
31
9.8k
Transcript
STORM Keeping it Real(time) Since 2011
HELLO.
dotCloud.com
DATA
DATA
MEGA-DATA
VERSION ONE
VERSION TWO
VERSION TWO
VERSION THREE
JOY
VERSION FOUR?
ENTER, STORM
REAL-TIME COMPUTATION
DISTRIBUTED RPC & STREAM PROCESSING
HISTORY
STREAM PROCESSING
STORM:REAL-TIME HADOOP:BATCH
WOW
HIGH VOLUME
CONTINUOUS
CONTINUOUS
FAULT TOLERANT
DOESN'T
PERSIST
PROCESS BATCHES RELIABLY
PROTECT AGAINST HUMAN ERROR
PROTECT AGAINST HUMAN ERROR
THREE CORE ELEMENTS
SPOUTS
STREAMS
BOLTS
TOPOLOGIES
TASKS
TASKS
OUTPUT ROUTING?
STREAM GROUPINGS
SHUFFLE GROUPING
FIELDS GROUPING
ALL GROUPING
GLOBAL GROUPING
DOWN 'N DIRTY
GATEWAYS
GATEWAYS
REAL-TIME GEOCODE BUCKETED CLIENT UPDATE
THE TOPOLOGY
THE TOPOLOGY
CODE TIME: START ECLIPSE
WAIT, WHAT?!
MULTILANG API
I'VE GOT YOU COVERED
UMBRELLA: IT PROTECTS YOU FROM STORM
THE TOPOLOGY
I'VE GOT YOU COVERED class RedisSpout(JVMSpout): class Default(Stream): fields =
'message' jvm_class = 'yieldbot.storm.spout'
I'VE GOT YOU COVERED class LogParserBolt(AutoAckBolt): class Default(Stream): fields =
'ip_address' def execute(self, input): ip_address = parse_log(input.message) self.emit(ip_address)
I'VE GOT YOU COVERED class GeolocatorBolt(AutoAckBolt): class Default(Stream): fields =
'lat', 'long' def __init__(self, *args, **kwargs): self.geoip = pygeoip.GeoIP('GeoLiteCity.dat') super(GeolocatorBolt, self) \ .__init__(*args, **kwargs) def execute(self, input): record = self.geoip.record_by_addr(input.ip) lat = record['latitude'] long_ = record['longitude'] self.emit((lat, long_))
I'VE GOT YOU COVERED class WSPuserBolt(Bolt): def __init__(self, *args, **kwargs):
self.batcher = TimeBatcher() self.pusher = zerorpc.Client(timeout=None) url = os.environ['WSPUSHER_ZERORPC_URL'] self.wspusher.connect(url) super(WSPusherBolt, self).__init__(*args, **kwargs def execute(self, input): t = time() batch = self.pop_batch(t) if batch: self.wspusher.push_list(batch) data = input.lat, input.long self.batcher.push_item(t, data)
I'VE GOT YOU COVERED class GeocoderTopology(Topology): # components redis =
RedisSpout(1) parser = LogParserBolt(3) geolocator = GeolocatorBolt(2) pusher = WSPuserBolt(4) # plumbing parser.inputs.append(ShuffleGrouping(redis)) geolocator.inputs.append(ShuffleGrouping(parser)) pusher.inputs.append( FieldsGrouping(geolocator, 'lat', 'long'))
INSIDE THE MACHINE
THREE COMPONENTS
NIMBUS
ZOOKEEPER CLUSTER
WORKER NODES
DETAILS
DEPLOYMENT
EC2?
DOTCLOUD!
$ git clone \ https://github.com/gabrielgrant/storm-on-dotcloud.git $ dotcloud push mystorm storm-on-dotcloud
… $ dotcloud scale worker=3
TESTING
JAVA
CLOJURE
ANT MAVEN
LINEINGEN
SCALING
WHEN
HOW
THE FUTURE: EASY & AUTO
THANKS!
GABRIEL GRANT @gabrielmgrant gabrielgrant.ca