Storm: the Hadoop of Realtime Stream Processing
Gabriel Grant
March 25, 2012
Programming
Twitter's new scalable, fault-tolerant, and simple(ish) stream programming system... with Python!
Transcript
STORM Keeping it Real(time) Since 2011
HELLO.
dotCloud.com
DATA
MEGA-DATA
VERSION ONE
VERSION TWO
VERSION THREE
JOY
VERSION FOUR?
ENTER, STORM
REAL-TIME COMPUTATION
DISTRIBUTED RPC & STREAM PROCESSING
HISTORY
STREAM PROCESSING
STORM : REAL-TIME :: HADOOP : BATCH
WOW
HIGH VOLUME
CONTINUOUS
FAULT TOLERANT
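Storm's fault tolerance rests on knowing when every tuple derived from a spout tuple has been acknowledged. The sketch below models the XOR trick Storm's acker tasks use for this: each tuple id is XORed into a per-root value once when emitted and once when acked, so the value returns to zero exactly when the whole tuple tree is done (barring a vanishingly unlikely collision of random 64-bit ids). The class and method names are hypothetical; real Storm does this inside dedicated acker bolts and replays the spout tuple on timeout, which is omitted here.

```python
import random


class AckTracker:
    """Toy model of XOR-based tuple-tree tracking (a sketch, not Storm's code)."""

    def __init__(self):
        # root (spout) tuple id -> running XOR of every id emitted or acked
        self.pending = {}

    def _xor_in(self, root_id, tuple_id):
        value = self.pending.get(root_id, 0) ^ tuple_id
        self.pending[root_id] = value
        return value

    def emitted(self, root_id, tuple_id):
        # Each new tuple in the tree contributes its id once...
        self._xor_in(root_id, tuple_id)

    def acked(self, root_id, tuple_id):
        # ...and once more when acked, so the pair cancels out.
        # Returns True once every tuple in the tree has been acked.
        if self._xor_in(root_id, tuple_id) == 0:
            del self.pending[root_id]
            return True
        return False


if __name__ == "__main__":
    tracker = AckTracker()
    root = random.getrandbits(64)
    spout_tuple, child_a, child_b = (random.getrandbits(64) for _ in range(3))
    tracker.emitted(root, spout_tuple)
    tracker.emitted(root, child_a)   # a bolt emits two children anchored to the spout tuple
    tracker.emitted(root, child_b)
    tracker.acked(root, spout_tuple)
    tracker.acked(root, child_a)
    print(tracker.acked(root, child_b))
```

Because only a single integer per spout tuple is stored, memory use stays constant no matter how large the tuple tree grows.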
DOESN'T PERSIST
PROCESS BATCHES RELIABLY
PROTECT AGAINST HUMAN ERROR
THREE CORE ELEMENTS
SPOUTS
STREAMS
BOLTS
TOPOLOGIES
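To make the vocabulary concrete: a spout is a source of tuples, a stream is the sequence of tuples flowing along an edge, a bolt consumes tuples and may emit new ones, and a topology is the graph wiring them together. The following is a single-process toy model in plain Python; all class and function names are hypothetical and are not Storm's or umbrella's API. Real Storm runs each component as many parallel tasks spread across machines.

```python
from typing import Callable, Iterable, List


class Component:
    """Anything that can emit tuples to subscribed bolts."""

    def __init__(self, name: str):
        self.name = name
        self.subscribers: List["Bolt"] = []

    def emit(self, tup: tuple) -> None:
        # Deliver synchronously to every subscriber (real Storm sends over the network).
        for bolt in self.subscribers:
            bolt.execute(tup)


class Spout(Component):
    """A source of tuples; here just an iterable."""

    def __init__(self, name: str, source: Iterable[tuple]):
        super().__init__(name)
        self.source = source

    def run(self) -> None:
        for tup in self.source:
            self.emit(tup)


class Bolt(Component):
    """Consumes one tuple and emits whatever `fn` returns for it."""

    def __init__(self, name: str, fn: Callable[[tuple], Iterable[tuple]]):
        super().__init__(name)
        self.fn = fn

    def execute(self, tup: tuple) -> None:
        for out in self.fn(tup):
            self.emit(out)


def connect(upstream: Component, downstream: Bolt) -> None:
    """Add an edge (a stream) to the topology graph."""
    upstream.subscribers.append(downstream)


if __name__ == "__main__":
    results = []
    sentences = Spout("sentences", [("the cow",), ("the dog",)])
    splitter = Bolt("splitter", lambda t: [(w,) for w in t[0].split()])
    sink = Bolt("sink", lambda t: results.append(t) or [])  # collects, emits nothing
    connect(sentences, splitter)
    connect(splitter, sink)
    sentences.run()
    print(results)  # [('the',), ('cow',), ('the',), ('dog',)]
```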
TASKS
OUTPUT ROUTING?
STREAM GROUPINGS
SHUFFLE GROUPING
FIELDS GROUPING
ALL GROUPING
GLOBAL GROUPING
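Each grouping answers one question: given a tuple and a bolt with N tasks, which task(s) receive it? The sketch below returns a list of task indices (0 to N-1) for each of the four groupings; the function and class names are hypothetical. The hash used for fields grouping is an assumption (Storm computes its own hash in Java), but the property it illustrates is the point: equal field values always land on the same task.

```python
import random
import zlib
from typing import List, Sequence


class ShuffleGrouping:
    """Random but balanced: each task gets one tuple per round of num_tasks tuples."""

    def __init__(self, num_tasks: int, seed=None):
        self.num_tasks = num_tasks
        self.rng = random.Random(seed)
        self.round: List[int] = []

    def choose(self) -> List[int]:
        if not self.round:
            # Start a new round: a fresh random permutation of all tasks.
            self.round = list(range(self.num_tasks))
            self.rng.shuffle(self.round)
        return [self.round.pop()]


def fields_grouping(values: Sequence, num_tasks: int) -> List[int]:
    # crc32 of the repr is used because Python's built-in hash() of str
    # is randomized per process, which would break cross-process routing.
    key = repr(tuple(values)).encode("utf-8")
    return [zlib.crc32(key) % num_tasks]


def all_grouping(num_tasks: int) -> List[int]:
    # Every task receives a copy of every tuple.
    return list(range(num_tasks))


def global_grouping(num_tasks: int) -> List[int]:
    # The whole stream goes to a single task (Storm picks the lowest task id).
    return [0]
```

In the geocoder topology on the later slides, a fields grouping on `'lat', 'long'` means tuples with the same coordinates always reach the same pusher task.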
DOWN 'N DIRTY
GATEWAYS
REAL-TIME GEOCODE BUCKETED CLIENT UPDATE
THE TOPOLOGY
CODE TIME: START ECLIPSE
WAIT, WHAT?!
MULTILANG API
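The multilang API is what lets a non-JVM process act as a spout or bolt: the JVM worker talks to a subprocess over stdin/stdout, exchanging JSON messages that are each terminated by a line containing only `end`. Below is a minimal sketch of that framing plus a toy bolt handler, assuming the message fields described in Storm's multilang documentation (`id`, `tuple`, `command`, `anchors`). The initial handshake and other control messages are omitted, and `handle_tuple` is a hypothetical example bolt, not part of any library.

```python
import io
import json


def read_message(stream):
    """Read one multilang message: JSON text followed by a line 'end'."""
    lines = []
    while True:
        line = stream.readline()
        if not line:
            raise EOFError("stream closed before 'end'")
        line = line.rstrip("\n")
        if line == "end":
            break
        lines.append(line)
    return json.loads("\n".join(lines))


def write_message(stream, msg):
    """Write one message and its 'end' terminator, then flush."""
    stream.write(json.dumps(msg) + "\nend\n")
    stream.flush()


def handle_tuple(tup_msg, out):
    """Hypothetical bolt: emit the first field upper-cased, anchored, then ack."""
    word = tup_msg["tuple"][0]
    write_message(out, {"command": "emit",
                        "anchors": [tup_msg["id"]],
                        "tuple": [word.upper()]})
    write_message(out, {"command": "ack", "id": tup_msg["id"]})


if __name__ == "__main__":
    # In a real bolt these would be sys.stdin and sys.stdout.
    demo_in = io.StringIO('{"id": "7", "tuple": ["hello"]}\nend\n')
    demo_out = io.StringIO()
    handle_tuple(read_message(demo_in), demo_out)
    print(demo_out.getvalue())
```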
I'VE GOT YOU COVERED
UMBRELLA: IT PROTECTS YOU FROM STORM
THE TOPOLOGY
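```python
# JVMSpout, Stream, AutoAckBolt, Bolt, Topology and the grouping classes
# come from umbrella (the import line is not shown on the slides).
class RedisSpout(JVMSpout):
    class Default(Stream):
        fields = 'message'

    jvm_class = 'yieldbot.storm.spout'
```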
```python
class LogParserBolt(AutoAckBolt):
    class Default(Stream):
        fields = 'ip_address'

    def execute(self, input):
        # parse_log() is a helper that is not shown on the slides
        ip_address = parse_log(input.message)
        self.emit(ip_address)
```
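The slides never show `parse_log`. One plausible version, assuming the Redis messages are web-server access-log lines in Common Log Format (client address as the first field), is sketched below; the log format and the choice to return `None` for unparseable lines are both assumptions.

```python
import ipaddress


def parse_log(line):
    """Hypothetical helper: return the client IP from an access-log line.

    Assumes Common Log Format, where the client address is the first
    whitespace-separated field. Returns None if that field is not an IP.
    """
    first = line.split(None, 1)[0] if line.strip() else ""
    try:
        return str(ipaddress.ip_address(first))
    except ValueError:
        return None
```

With this version, a bolt would likely want to skip emitting when the result is `None` rather than pass it downstream to the geolocator.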
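```python
import pygeoip


class GeolocatorBolt(AutoAckBolt):
    class Default(Stream):
        fields = 'lat', 'long'

    def __init__(self, *args, **kwargs):
        # Load the GeoLite City database once per bolt instance
        self.geoip = pygeoip.GeoIP('GeoLiteCity.dat')
        super(GeolocatorBolt, self).__init__(*args, **kwargs)

    def execute(self, input):
        # The upstream stream declares its field as 'ip_address'
        record = self.geoip.record_by_addr(input.ip_address)
        if record is None:
            return  # address not in the database: emit nothing
        lat = record['latitude']
        long_ = record['longitude']  # trailing underscore avoids shadowing Python 2's long
        self.emit((lat, long_))
```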
```python
import os
from time import time

import zerorpc


class WSPusherBolt(Bolt):
    def __init__(self, *args, **kwargs):
        # TimeBatcher is a helper that is not shown on the slides
        self.batcher = TimeBatcher()
        # zerorpc client for the websocket-pusher service
        self.wspusher = zerorpc.Client(timeout=None)
        url = os.environ['WSPUSHER_ZERORPC_URL']
        self.wspusher.connect(url)
        super(WSPusherBolt, self).__init__(*args, **kwargs)

    def execute(self, input):
        t = time()
        # Flush whatever has accumulated once the batch window has elapsed
        batch = self.batcher.pop_batch(t)
        if batch:
            self.wspusher.push_list(batch)
        data = input.lat, input.long
        self.batcher.push_item(t, data)
```

```python
class GeocoderTopology(Topology):
    # components
    redis = RedisSpout(1)
    parser = LogParserBolt(3)
    geolocator = GeolocatorBolt(2)
    pusher = WSPusherBolt(4)

    # plumbing
    parser.inputs.append(ShuffleGrouping(redis))
    geolocator.inputs.append(ShuffleGrouping(parser))
    pusher.inputs.append(
        FieldsGrouping(geolocator, 'lat', 'long'))
```
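`TimeBatcher` is also not shown on the slides. Judging only from how the pusher bolt calls it (`push_item(t, data)` and `pop_batch(t)`), a hypothetical implementation might buffer items and release them once a fixed interval has passed since the last flush. The one-second interval and the empty-list return value are assumptions.

```python
class TimeBatcher:
    """Hypothetical stand-in for the TimeBatcher used by the pusher bolt."""

    def __init__(self, interval=1.0):
        self.interval = interval  # seconds between flushes (assumed default)
        self.items = []
        self.last_flush = None

    def push_item(self, t, item):
        # The first item starts the batching clock.
        if self.last_flush is None:
            self.last_flush = t
        self.items.append(item)

    def pop_batch(self, t):
        """Return buffered items if `interval` seconds have passed, else []."""
        if self.last_flush is None or t - self.last_flush < self.interval:
            return []
        batch, self.items = self.items, []
        self.last_flush = t
        return batch
```

Note that with the bolt as written on the slide, a batch is only flushed when a new tuple arrives, so on a quiet stream the last few points wait in the buffer until traffic resumes.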
INSIDE THE MACHINE
THREE COMPONENTS
NIMBUS
ZOOKEEPER CLUSTER
WORKER NODES
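Nimbus (the master) decides which worker process runs which tasks, the ZooKeeper cluster holds that coordination state, and supervisors on the worker nodes start and monitor the workers. As a rough, hypothetical illustration of the scheduling step only, the sketch spreads the task counts from the geocoder topology across three workers round-robin; it ignores executors, rebalancing, and the ZooKeeper bookkeeping that real Nimbus performs, and the slot names are illustrative.

```python
from collections import defaultdict
from itertools import cycle


def assign_tasks(component_tasks, worker_slots):
    """Spread every task across worker slots round-robin.

    component_tasks: dict of component name -> number of tasks
    worker_slots: list of slot identifiers, e.g. "node1:6700"
    Returns a dict of slot -> list of (component, task_index).
    """
    assignment = defaultdict(list)
    slots = cycle(worker_slots)
    for component, n in component_tasks.items():
        for i in range(n):
            assignment[next(slots)].append((component, i))
    return dict(assignment)


if __name__ == "__main__":
    plan = assign_tasks(
        {"redis": 1, "parser": 3, "geolocator": 2, "pusher": 4},
        ["node1:6700", "node2:6700", "node3:6700"],
    )
    for slot, tasks in plan.items():
        print(slot, tasks)
```

Ten tasks over three workers yields a 4/3/3 split, which is why adding workers is a natural scaling knob.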
DETAILS
DEPLOYMENT
EC2?
DOTCLOUD!
```shell
$ git clone \
    https://github.com/gabrielgrant/storm-on-dotcloud.git
$ dotcloud push mystorm storm-on-dotcloud
…
$ dotcloud scale worker=3
```
TESTING
JAVA
CLOJURE
ANT, MAVEN
LEININGEN
SCALING
WHEN
HOW
THE FUTURE: EASY & AUTO
THANKS!
GABRIEL GRANT @gabrielmgrant gabrielgrant.ca