Storm: the Hadoop of Realtime Stream Processing
Gabriel Grant
March 25, 2012
Twitter's new scalable, fault-tolerant, and simple(ish) stream programming system... with Python!
Transcript
STORM Keeping it Real(time) Since 2011
HELLO.
dotCloud.com
DATA
MEGA-DATA
VERSION ONE
VERSION TWO
VERSION THREE
JOY
VERSION FOUR?
ENTER, STORM
REAL-TIME COMPUTATION
DISTRIBUTED RPC & STREAM PROCESSING
HISTORY
STREAM PROCESSING
STORM : REAL-TIME :: HADOOP : BATCH
WOW
HIGH VOLUME
CONTINUOUS
FAULT TOLERANT
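Storm's fault tolerance rests on tracking every tuple tree that a spout emits. As Storm's documentation describes it, an "acker" keeps one 64-bit value per spout tuple and XORs in each tuple id once when the tuple is created and once when it is acked, so the value returns to zero only when the whole tree has been processed. The toy below is a simplified, single-process sketch of that bookkeeping; `AckTracker`, `spout_emit` and `ack` are hypothetical names, not Storm's API, and timeouts and replay are left out.

```python
import random
from functools import reduce
from operator import xor


class AckTracker:
    """Toy, in-memory model of Storm-style XOR ack tracking (hypothetical API)."""

    def __init__(self):
        # spout tuple (root) id -> running XOR of outstanding tuple ids
        self.pending = {}

    def spout_emit(self, root_id):
        # The root tuple is the first outstanding tuple in its tree.
        self.pending[root_id] = root_id

    def ack(self, root_id, tuple_id, child_ids=()):
        # A bolt acks its input and declares its anchored children in one step,
        # so the value cannot reach zero while children are still unreported.
        self.pending[root_id] ^= reduce(xor, child_ids, tuple_id)
        if self.pending[root_id] == 0:
            del self.pending[root_id]  # whole tree processed
            return True
        return False


# Example: one log line flows spout -> parser -> geolocator.
tracker = AckTracker()
root = random.getrandbits(64)
tracker.spout_emit(root)
child = random.getrandbits(64)
parser_done = tracker.ack(root, root, [child])  # parser acks root, emits child
geo_done = tracker.ack(root, child)             # geolocator acks the child
```

In Storm itself, a tree that has not reached zero before the message timeout is reported as failed, and the spout can then replay the original tuple.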
DOESN'T PERSIST
PROCESS BATCHES RELIABLY
PROTECT AGAINST HUMAN ERROR
THREE CORE ELEMENTS
SPOUTS
STREAMS
BOLTS
TOPOLOGIES
TASKS
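To make the vocabulary concrete: a spout is a source of tuples, a stream is an unbounded sequence of tuples, a bolt consumes tuples and may emit new ones, and a topology wires spouts and bolts together. The sketch below is a single-process illustration in plain Python that uses a word-count example rather than the talk's geocoder. It is not Storm's or Umbrella's API (all names are hypothetical), and it runs every component as one task, whereas Storm runs each component as many parallel tasks.

```python
from typing import Callable, Iterable, Iterator

Values = tuple  # a Storm-style tuple is an ordered list of field values
BoltFn = Callable[[Values], Iterable[Values]]


def sentence_spout() -> Iterator[Values]:
    # Spout: the source of a stream. A fixed list stands in for a queue.
    for sentence in ["the quick brown fox", "the lazy dog"]:
        yield (sentence,)


def split_bolt(values: Values) -> Iterator[Values]:
    # Bolt: consumes one tuple and may emit zero or more new ones.
    (sentence,) = values
    for word in sentence.split():
        yield (word,)


class CountBolt:
    # A stateful bolt: keeps a running count per word.
    def __init__(self) -> None:
        self.counts: dict[str, int] = {}

    def execute(self, values: Values) -> Iterator[Values]:
        (word,) = values
        self.counts[word] = self.counts.get(word, 0) + 1
        yield (word, self.counts[word])


def connect(stream: Iterable[Values], bolt: BoltFn) -> Iterator[Values]:
    # A stream edge: feed every upstream tuple into the bolt.
    for values in stream:
        yield from bolt(values)


def run_topology(spout: Callable[[], Iterator[Values]], bolts: list) -> list:
    # Topology: spout -> bolt -> bolt ..., wired here as a simple chain.
    stream: Iterable[Values] = spout()
    for bolt in bolts:
        stream = connect(stream, bolt)
    return list(stream)


counter = CountBolt()
results = run_topology(sentence_spout, [split_bolt, counter.execute])
```

Because the stages are generators, each tuple travels through the whole chain before the next one is read, which loosely mirrors the tuple-at-a-time model of a real topology.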
OUTPUT ROUTING?
STREAM GROUPINGS
SHUFFLE GROUPING
FIELDS GROUPING
ALL GROUPING
GLOBAL GROUPING
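Each grouping answers one question: which of the consuming bolt's tasks receives a given tuple? A rough sketch of the four policies follows. The function names are hypothetical; a uniform random pick stands in for Storm's shuffle grouping, which aims to give every task an equal share, and Storm computes its own hash of the grouped values, so these are illustrations of the routing rules rather than reproductions of Storm's code. `zlib.crc32` is used instead of Python's `hash()` because string hashing is randomized per process.

```python
import random
import zlib


def shuffle_grouping(values, n_tasks, rng=random):
    # Any task will do; spread tuples evenly across them.
    return [rng.randrange(n_tasks)]


def fields_grouping(values, n_tasks, field_indices):
    # Same grouped values -> same task, every time (stable across processes).
    key = repr(tuple(values[i] for i in field_indices)).encode("utf-8")
    return [zlib.crc32(key) % n_tasks]


def all_grouping(values, n_tasks):
    # Every task receives its own copy of the tuple.
    return list(range(n_tasks))


def global_grouping(values, n_tasks):
    # The whole stream goes to one task (Storm picks the lowest task id).
    return [0]


# E.g. a 4-task bolt grouped on (lat, long), like WSPusherBolt(4) in the talk.
point = (40.7128, -74.006)
pusher_task = fields_grouping(point, 4, [0, 1])
```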
DOWN 'N DIRTY
GATEWAYS
REAL-TIME GEOCODE BUCKETED CLIENT UPDATE
THE TOPOLOGY
CODE TIME: START ECLIPSE
WAIT, WHAT?!
MULTILANG API
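Storm's multilang protocol is what lets non-JVM code act as a spout or bolt: the component runs as a subprocess and exchanges JSON messages with Storm over stdin and stdout, each message followed by a line containing only `end`. The sketch below covers just that framing plus the `emit` and `ack` commands; the initial handshake, heartbeats and the other commands are omitted, and the helper names are hypothetical rather than part of any Storm library.

```python
import io
import json


def read_message(stream):
    """Read one multilang message: JSON lines up to a line reading 'end'."""
    lines = []
    while True:
        line = stream.readline()
        if not line:
            raise EOFError("input closed before 'end'")
        line = line.rstrip("\n")
        if line == "end":
            return json.loads("\n".join(lines))
        lines.append(line)


def send_message(stream, msg):
    """Write one JSON message followed by the 'end' terminator line."""
    stream.write(json.dumps(msg) + "\nend\n")
    stream.flush()


def emit(stream, values, anchors=()):
    # Anchoring the output to the input tuple id keeps it in the same ack tree.
    send_message(stream, {"command": "emit", "tuple": list(values),
                          "anchors": list(anchors)})


def ack(stream, tuple_id):
    send_message(stream, {"command": "ack", "id": tuple_id})


# Simulate Storm handing the bolt one tuple, then the bolt's reply.
incoming = io.StringIO(
    '{"id": "42", "comp": "parser", "stream": "default",'
    ' "task": 3, "tuple": ["1.2.3.4"]}\nend\n'
)
tup = read_message(incoming)
outgoing = io.StringIO()
emit(outgoing, [37.4, -122.1], anchors=[tup["id"]])
ack(outgoing, tup["id"])
```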
I'VE GOT YOU COVERED
UMBRELLA: IT PROTECTS YOU FROM STORM
THE TOPOLOGY
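I'VE GOT YOU COVERED
    # JVMSpout, Stream, AutoAckBolt, Bolt, Topology, ShuffleGrouping and
    # FieldsGrouping are presumably provided by Umbrella; its import path
    # is not shown on the slides.
    class RedisSpout(JVMSpout):
        class Default(Stream):
            fields = ('message',)
        jvm_class = 'yieldbot.storm.spout'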
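I'VE GOT YOU COVERED
    class LogParserBolt(AutoAckBolt):
        class Default(Stream):
            fields = ('ip_address',)

        def execute(self, input):
            # parse_log() is a helper that is not shown on the slides
            ip_address = parse_log(input.message)
            # emit a one-field tuple, matching the declared fields
            self.emit((ip_address,))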
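`parse_log` is not shown on the slides, and neither is the format of the log lines read from Redis. Assuming the Common Log Format, where the client address is the first field, a hypothetical version could look like this:

```python
import ipaddress


def parse_log(message: str) -> str:
    """Hypothetical parse_log: return the client IP from a log line.

    Assumes Common Log Format, e.g.
    127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /a.gif HTTP/1.0" 200 2326
    """
    fields = message.split(None, 1)
    if not fields:
        raise ValueError("empty log line")
    # ip_address() raises ValueError for anything that is not an IPv4/IPv6
    # address, so malformed lines fail here instead of inside the geolocator.
    return str(ipaddress.ip_address(fields[0]))
```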
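I'VE GOT YOU COVERED
    import pygeoip

    class GeolocatorBolt(AutoAckBolt):
        class Default(Stream):
            fields = 'lat', 'long'

        def __init__(self, *args, **kwargs):
            # Load the GeoLite City database once per bolt instance
            self.geoip = pygeoip.GeoIP('GeoLiteCity.dat')
            super(GeolocatorBolt, self).__init__(*args, **kwargs)

        def execute(self, input):
            # The parser's output field is 'ip_address', not 'ip'
            record = self.geoip.record_by_addr(input.ip_address)
            if not record:
                return  # address not in the database; nothing to emit
            lat = record['latitude']
            long_ = record['longitude']
            self.emit((lat, long_))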
I'VE GOT YOU COVERED
    import os
    from time import time

    import zerorpc

    class WSPusherBolt(Bolt):
        def __init__(self, *args, **kwargs):
            # TimeBatcher is a helper that is not shown on the slides
            self.batcher = TimeBatcher()
            self.wspusher = zerorpc.Client(timeout=None)
            url = os.environ['WSPUSHER_ZERORPC_URL']
            self.wspusher.connect(url)
            super(WSPusherBolt, self).__init__(*args, **kwargs)

        def execute(self, input):
            t = time()
            # Flush whatever has accumulated, then queue the new point
            batch = self.batcher.pop_batch(t)
            if batch:
                self.wspusher.push_list(batch)
            data = input.lat, input.long
            self.batcher.push_item(t, data)
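`TimeBatcher` is not shown on the slides either. Judging only from how the pusher bolt calls it, `push_item(t, item)` queues a point and `pop_batch(t)` hands back whatever has accumulated once enough time has passed. The version below is a hypothetical reconstruction under that assumption, with a made-up `interval` parameter and an empty list meaning "nothing to push yet".

```python
class TimeBatcher:
    """Hypothetical time-based batcher matching the calls in WSPusherBolt."""

    def __init__(self, interval=1.0):
        self.interval = interval  # seconds a batch may stay open (assumed)
        self.items = []
        self.batch_start = None   # time of the first item in the open batch

    def push_item(self, t, item):
        if self.batch_start is None:
            self.batch_start = t
        self.items.append(item)

    def pop_batch(self, t):
        # Nothing queued yet, or the batch is still young: nothing to push.
        if self.batch_start is None or t - self.batch_start < self.interval:
            return []
        batch, self.items, self.batch_start = self.items, [], None
        return batch


batcher = TimeBatcher(interval=1.0)
batcher.push_item(0.0, (40.7, -74.0))
batcher.push_item(0.5, (51.5, -0.1))
early = batcher.pop_batch(0.9)
flushed = batcher.pop_batch(1.2)
```

Because the bolt only calls `pop_batch` when a new tuple arrives, a batch can stay open longer than `interval` when traffic pauses; a timer-driven flush would avoid that at the cost of extra machinery.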
I'VE GOT YOU COVERED
    class GeocoderTopology(Topology):
        # components
        redis = RedisSpout(1)
        parser = LogParserBolt(3)
        geolocator = GeolocatorBolt(2)
        pusher = WSPusherBolt(4)

        # plumbing
        parser.inputs.append(ShuffleGrouping(redis))
        geolocator.inputs.append(ShuffleGrouping(parser))
        pusher.inputs.append(
            FieldsGrouping(geolocator, 'lat', 'long'))
INSIDE THE MACHINE
THREE COMPONENTS
NIMBUS
ZOOKEEPER CLUSTER
WORKER NODES
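In this architecture Nimbus decides which worker runs which task and publishes that assignment through ZooKeeper, where the supervisor on each worker node picks it up. The toy below only illustrates the idea of an even spread. It treats the numbers passed to each component in `GeocoderTopology` as parallelism hints (an assumption) and uses three workers, as in the `dotcloud scale worker=3` step; Storm's real scheduler works in terms of worker slots and executors, so this round-robin is a simplification, not its algorithm.

```python
from itertools import cycle

# Parallelism hints as declared in GeocoderTopology (interpretation assumed).
PARALLELISM = {"redis": 1, "parser": 3, "geolocator": 2, "pusher": 4}


def assign_tasks(parallelism, workers):
    """Spread every task over the workers round-robin (toy scheduler)."""
    tasks = [(name, i) for name, count in parallelism.items() for i in range(count)]
    assignment = {worker: [] for worker in workers}
    for task, worker in zip(tasks, cycle(workers)):
        assignment[worker].append(task)
    return assignment


plan = assign_tasks(PARALLELISM, ["worker-1", "worker-2", "worker-3"])
for worker, tasks in plan.items():
    print(worker, tasks)
```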
DETAILS
DEPLOYMENT
EC2?
DOTCLOUD!
    $ git clone \
        https://github.com/gabrielgrant/storm-on-dotcloud.git
    $ dotcloud push mystorm storm-on-dotcloud
    …
    $ dotcloud scale worker=3
TESTING
JAVA
CLOJURE
ANT, MAVEN
LEININGEN
SCALING
WHEN
HOW
THE FUTURE: EASY & AUTO
THANKS!
GABRIEL GRANT @gabrielmgrant gabrielgrant.ca