Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Storm: the Hadoop of Realtime Stream Processing
Search
Gabriel Grant
March 25, 2012
Programming
3
1.3k
Storm: the Hadoop of Realtime Stream Processing
Twitter's new scalable, fault-tolerant, and simple(ish) stream programming system... with Python!
Gabriel Grant
March 25, 2012
Tweet
Share
More Decks by Gabriel Grant
See All by Gabriel Grant
Painting Rainbows: Building Bridges in the Cloud
gabrielgrant
1
220
Other Decks in Programming
See All in Programming
Ruby Parser progress report 2025
yui_knk
1
450
AIと私たちの学習の変化を考える - Claude Codeの学習モードを例に
azukiazusa1
10
4.3k
JSONataを使ってみよう Step Functionsが楽しくなる実践テクニック #devio2025
dafujii
1
540
Introducing ReActionView: A new ActionView-compatible ERB Engine @ Rails World 2025, Amsterdam
marcoroth
0
700
Navigating Dependency Injection with Metro
zacsweers
3
1k
楽して成果を出すためのセルフリソース管理
clipnote
0
160
実用的なGOCACHEPROG実装をするために / golang.tokyo #40
mazrean
1
290
Laravel Boost 超入門
fire_arlo
3
220
為你自己學 Python - 冷知識篇
eddie
1
350
Android 16 × Jetpack Composeで縦書きテキストエディタを作ろう / Vertical Text Editor with Compose on Android 16
cc4966
2
250
OSS開発者という働き方
andpad
5
1.7k
意外と簡単!?フロントエンドでパスキー認証を実現する WebAuthn
teamlab
PRO
2
770
Featured
See All Featured
A Tale of Four Properties
chriscoyier
160
23k
Evolution of real-time – Irina Nazarova, EuRuKo, 2024
irinanazarova
8
920
Fight the Zombie Pattern Library - RWD Summit 2016
marcelosomers
234
17k
Automating Front-end Workflow
addyosmani
1370
200k
Exploring the Power of Turbo Streams & Action Cable | RailsConf2023
kevinliebholz
34
6k
Practical Tips for Bootstrapping Information Extraction Pipelines
honnibal
PRO
23
1.4k
The Success of Rails: Ensuring Growth for the Next 100 Years
eileencodes
46
7.6k
[Rails World 2023 - Day 1 Closing Keynote] - The Magic of Rails
eileencodes
36
2.5k
Art, The Web, and Tiny UX
lynnandtonic
303
21k
Optimizing for Happiness
mojombo
379
70k
Cheating the UX When There Is Nothing More to Optimize - PixelPioneers
stephaniewalter
285
13k
Save Time (by Creating Custom Rails Generators)
garrettdimon
PRO
32
1.6k
Transcript
STORM Keeping it Real(time) Since 2011
HELLO.
dotCloud.com
DATA
DATA
MEGA-DATA
VERSION ONE
VERSION TWO
VERSION TWO
VERSION THREE
JOY
VERSION FOUR?
ENTER, STORM
REAL-TIME COMPUTATION
DISTRIBUTED RPC & STREAM PROCESSING
HISTORY
STREAM PROCESSING
STORM:REAL-TIME HADOOP:BATCH
WOW
HIGH VOLUME
CONTINUOUS
CONTINUOUS
FAULT TOLERANT
DOESN'T
PERSIST
PROCESS BATCHES RELIABLY
PROTECT AGAINST HUMAN ERROR
PROTECT AGAINST HUMAN ERROR
THREE CORE ELEMENTS
SPOUTS
STREAMS
BOLTS
TOPOLOGIES
TASKS
TASKS
OUTPUT ROUTING?
STREAM GROUPINGS
SHUFFLE GROUPING
FIELDS GROUPING
ALL GROUPING
GLOBAL GROUPING
DOWN 'N DIRTY
GATEWAYS
GATEWAYS
REAL-TIME GEOCODE BUCKETED CLIENT UPDATE
THE TOPOLOGY
THE TOPOLOGY
CODE TIME: START ECLIPSE
WAIT, WHAT?!
MULTILANG API
I'VE GOT YOU COVERED
UMBRELLA: IT PROTECTS YOU FROM STORM
THE TOPOLOGY
I'VE GOT YOU COVERED class RedisSpout(JVMSpout): class Default(Stream): fields =
'message' jvm_class = 'yieldbot.storm.spout'
I'VE GOT YOU COVERED class LogParserBolt(AutoAckBolt): class Default(Stream): fields =
'ip_address' def execute(self, input): ip_address = parse_log(input.message) self.emit(ip_address)
I'VE GOT YOU COVERED class GeolocatorBolt(AutoAckBolt): class Default(Stream): fields =
'lat', 'long' def __init__(self, *args, **kwargs): self.geoip = pygeoip.GeoIP('GeoLiteCity.dat') super(GeolocatorBolt, self) \ .__init__(*args, **kwargs) def execute(self, input): record = self.geoip.record_by_addr(input.ip) lat = record['latitude'] long_ = record['longitude'] self.emit((lat, long_))
I'VE GOT YOU COVERED class WSPuserBolt(Bolt): def __init__(self, *args, **kwargs):
self.batcher = TimeBatcher() self.pusher = zerorpc.Client(timeout=None) url = os.environ['WSPUSHER_ZERORPC_URL'] self.wspusher.connect(url) super(WSPusherBolt, self).__init__(*args, **kwargs def execute(self, input): t = time() batch = self.pop_batch(t) if batch: self.wspusher.push_list(batch) data = input.lat, input.long self.batcher.push_item(t, data)
I'VE GOT YOU COVERED class GeocoderTopology(Topology): # components redis =
RedisSpout(1) parser = LogParserBolt(3) geolocator = GeolocatorBolt(2) pusher = WSPuserBolt(4) # plumbing parser.inputs.append(ShuffleGrouping(redis)) geolocator.inputs.append(ShuffleGrouping(parser)) pusher.inputs.append( FieldsGrouping(geolocator, 'lat', 'long'))
INSIDE THE MACHINE
THREE COMPONENTS
NIMBUS
ZOOKEEPER CLUSTER
WORKER NODES
DETAILS
DEPLOYMENT
EC2?
DOTCLOUD!
$ git clone \ https://github.com/gabrielgrant/storm-on-dotcloud.git $ dotcloud push mystorm storm-on-dotcloud
… $ dotcloud scale worker=3
TESTING
JAVA
CLOJURE
ANT MAVEN
LINEINGEN
SCALING
WHEN
HOW
THE FUTURE: EASY & AUTO
THANKS!
GABRIEL GRANT @gabrielmgrant gabrielgrant.ca