Storm: the Hadoop of Realtime Stream Processing
Gabriel Grant
March 25, 2012
Programming
Twitter's new scalable, fault-tolerant, and simple(ish) stream programming system... with Python!
Transcript
STORM: Keeping it Real(time) Since 2011
HELLO.
dotCloud.com
DATA
DATA
MEGA-DATA
VERSION ONE
VERSION TWO
VERSION TWO
VERSION THREE
JOY
VERSION FOUR?
ENTER, STORM
REAL-TIME COMPUTATION
DISTRIBUTED RPC & STREAM PROCESSING
HISTORY
STREAM PROCESSING
STORM : REAL-TIME :: HADOOP : BATCH
WOW
HIGH VOLUME
CONTINUOUS
CONTINUOUS
FAULT TOLERANT
DOESN'T
PERSIST
PROCESS BATCHES RELIABLY
PROTECT AGAINST HUMAN ERROR
PROTECT AGAINST HUMAN ERROR
THREE CORE ELEMENTS
SPOUTS
STREAMS
BOLTS
TOPOLOGIES
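The four terms above can be pictured with plain Python generators. This is only a conceptual sketch, not Storm's API: the function names (`sentence_spout`, `split_bolt`, `count_bolt`, `run_topology`) and the sample sentences are made up for illustration, and a real topology runs its components in parallel across a cluster rather than in one process.

```python
from collections import Counter


def sentence_spout():
    """Spout: a source that emits a stream of tuples."""
    for line in ["storm is fast", "storm is simple"]:
        yield (line,)


def split_bolt(stream):
    """Bolt: consumes tuples from a stream and emits new ones (one per word)."""
    for (sentence,) in stream:
        for word in sentence.split():
            yield (word,)


def count_bolt(stream):
    """Bolt: keeps running state and emits the updated count for each word."""
    counts = Counter()
    for (word,) in stream:
        counts[word] += 1
        yield (word, counts[word])


def run_topology():
    """Topology: the wiring that connects the spout and bolts into a graph."""
    return list(count_bolt(split_bolt(sentence_spout())))


if __name__ == "__main__":
    for word, count in run_topology():
        print(word, count)
```

Each stage only sees the tuples of the stream it is wired to, which is what lets a system like Storm place the stages on different machines.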
TASKS
TASKS
OUTPUT ROUTING?
STREAM GROUPINGS
SHUFFLE GROUPING
FIELDS GROUPING
ALL GROUPING
GLOBAL GROUPING
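One way to read the four groupings above is as functions that decide which task indexes of the receiving bolt get a given tuple. The sketch below is a simplification under stated assumptions: the function names and the CRC-based hash are illustrative choices, not how Storm implements routing, and Storm's shuffle grouping also balances load more carefully than a plain random pick.

```python
import random
import zlib


def shuffle_grouping(tup, num_tasks, rng=random):
    """Pick one task at random so load spreads across tasks."""
    return [rng.randrange(num_tasks)]


def fields_grouping(tup, num_tasks, field_indexes):
    """Hash the selected fields so equal values always reach the same task."""
    key = repr(tuple(tup[i] for i in field_indexes)).encode("utf-8")
    return [zlib.crc32(key) % num_tasks]


def all_grouping(tup, num_tasks):
    """Replicate the tuple to every task."""
    return list(range(num_tasks))


def global_grouping(tup, num_tasks):
    """Send the whole stream to a single task (here, the lowest index)."""
    return [0]


if __name__ == "__main__":
    point = ("1.2.3.4", 45.5, -73.6)
    print(shuffle_grouping(point, 4))
    print(fields_grouping(point, 4, [1, 2]))
    print(all_grouping(point, 4))
    print(global_grouping(point, 4))
```

Fields grouping is the one to reach for when a bolt keeps per-key state, since every tuple with the same key lands on the same task.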
DOWN 'N DIRTY
GATEWAYS
GATEWAYS
REAL-TIME GEOCODE BUCKETED CLIENT UPDATE
THE TOPOLOGY
THE TOPOLOGY
CODE TIME: START ECLIPSE
WAIT, WHAT?!
MULTILANG API
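Storm's multilang protocol lets non-JVM components exchange JSON messages with the worker over stdin/stdout, each message terminated by a line containing only `end`. A minimal sketch of that framing follows; the helper names (`read_message`, `send_message`, `emit`) are hypothetical, and the initial handshake, acks, and error handling are left out.

```python
import io
import json


def read_message(stream):
    """Read lines until a line containing only 'end', then parse them as JSON."""
    lines = []
    for line in stream:
        line = line.rstrip("\n")
        if line == "end":
            break
        lines.append(line)
    return json.loads("\n".join(lines))


def send_message(stream, msg):
    """Write one JSON message followed by the 'end' terminator line."""
    stream.write(json.dumps(msg))
    stream.write("\nend\n")
    stream.flush()


def emit(stream, values):
    """Send an 'emit' command carrying one tuple of values."""
    send_message(stream, {"command": "emit", "tuple": list(values)})


if __name__ == "__main__":
    # Round-trip through an in-memory buffer instead of real stdin/stdout.
    buf = io.StringIO()
    emit(buf, ("1.2.3.4",))
    buf.seek(0)
    print(read_message(buf))
```

In a real component the streams would be `sys.stdin` and `sys.stdout`, which is why a wrapper library is attractive: it hides this plumbing behind ordinary classes.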
I'VE GOT YOU COVERED
UMBRELLA: IT PROTECTS YOU FROM STORM
THE TOPOLOGY
I'VE GOT YOU COVERED
    class RedisSpout(JVMSpout):
        class Default(Stream):
            fields = 'message'
        jvm_class = 'yieldbot.storm.spout'
I'VE GOT YOU COVERED
    class LogParserBolt(AutoAckBolt):
        class Default(Stream):
            fields = 'ip_address'

        def execute(self, input):
            ip_address = parse_log(input.message)
            self.emit(ip_address)
I'VE GOT YOU COVERED
    import pygeoip

    class GeolocatorBolt(AutoAckBolt):
        class Default(Stream):
            fields = 'lat', 'long'

        def __init__(self, *args, **kwargs):
            # Load the GeoLite City database once per bolt instance.
            self.geoip = pygeoip.GeoIP('GeoLiteCity.dat')
            super(GeolocatorBolt, self).__init__(*args, **kwargs)

        def execute(self, input):
            # 'ip_address' is the field declared by LogParserBolt.
            record = self.geoip.record_by_addr(input.ip_address)
            lat = record['latitude']
            long_ = record['longitude']
            self.emit((lat, long_))
I'VE GOT YOU COVERED
    import os
    from time import time

    import zerorpc

    class WSPusherBolt(Bolt):
        def __init__(self, *args, **kwargs):
            self.batcher = TimeBatcher()
            self.wspusher = zerorpc.Client(timeout=None)
            url = os.environ['WSPUSHER_ZERORPC_URL']
            self.wspusher.connect(url)
            super(WSPusherBolt, self).__init__(*args, **kwargs)

        def execute(self, input):
            t = time()
            # pop_batch is assumed to live on the batcher, like push_item.
            batch = self.batcher.pop_batch(t)
            if batch:
                self.wspusher.push_list(batch)
            data = input.lat, input.long
            self.batcher.push_item(t, data)
I'VE GOT YOU COVERED
    class GeocoderTopology(Topology):
        # components
        redis = RedisSpout(1)
        parser = LogParserBolt(3)
        geolocator = GeolocatorBolt(2)
        pusher = WSPusherBolt(4)

        # plumbing
        parser.inputs.append(ShuffleGrouping(redis))
        geolocator.inputs.append(ShuffleGrouping(parser))
        pusher.inputs.append(
            FieldsGrouping(geolocator, 'lat', 'long'))
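The websocket pusher bolt relies on a `TimeBatcher` helper that the slides never show. The sketch below is a guess at what such a helper might look like, inferred only from the `push_item(t, data)` and `pop_batch(t)` calls in the slide; the `interval` parameter and the windowing rule are assumptions, not the author's implementation.

```python
class TimeBatcher:
    """Hypothetical helper: buffers items and releases them once per time window."""

    def __init__(self, interval=1.0):
        self.interval = interval   # assumed window length, in seconds
        self.items = []
        self.window_start = None   # timestamp of the first buffered item

    def push_item(self, t, item):
        """Buffer an item, opening a new window if none is open."""
        if self.window_start is None:
            self.window_start = t
        self.items.append(item)

    def pop_batch(self, t):
        """Return the buffered items if the window has elapsed, else None."""
        if self.window_start is None or t - self.window_start < self.interval:
            return None
        batch, self.items = self.items, []
        self.window_start = None
        return batch


if __name__ == "__main__":
    batcher = TimeBatcher(interval=1.0)
    batcher.push_item(0.0, (45.5, -73.6))
    batcher.push_item(0.4, (40.7, -74.0))
    print(batcher.pop_batch(0.5))
    print(batcher.pop_batch(1.2))
```

Batching this way means the bolt makes one RPC call per window instead of one per tuple, which matters when the downstream websocket service is the bottleneck.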
INSIDE THE MACHINE
THREE COMPONENTS
NIMBUS
ZOOKEEPER CLUSTER
WORKER NODES
DETAILS
DEPLOYMENT
EC2?
DOTCLOUD!
    $ git clone \
        https://github.com/gabrielgrant/storm-on-dotcloud.git
    $ dotcloud push mystorm storm-on-dotcloud
    …
    $ dotcloud scale worker=3
TESTING
JAVA
CLOJURE
ANT MAVEN
LEININGEN
SCALING
WHEN
HOW
THE FUTURE: EASY & AUTO
THANKS!
GABRIEL GRANT @gabrielmgrant gabrielgrant.ca