Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Storm: the Hadoop of Realtime Stream Processing
Search
Gabriel Grant
March 25, 2012
Programming
3
1.3k
Storm: the Hadoop of Realtime Stream Processing
Twitter's new scalable, fault-tolerant, and simple(ish) stream programming system... with Python!
Gabriel Grant
March 25, 2012
Tweet
Share
More Decks by Gabriel Grant
See All by Gabriel Grant
Painting Rainbows: Building Bridges in the Cloud
gabrielgrant
1
210
Other Decks in Programming
See All in Programming
Some Quick Ideas To Improve Your Tests ( #jassttokyo )
teyamagu
PRO
2
2.3k
生成 AI の中身を覗いてみよう〜基礎から医療現場での応用まで〜
soh9834
2
760
【KMC春合宿2024】実装視点で見るNeural Radiance Fields
runningoutrate
0
150
WebComponentsで フレームワークを1ページに共存させる
webuilder240
0
150
「コンパイル時のユニットテスト」導入するとユニットテストを 書かなくてよくなるのか?
tomohisa
9
2.2k
PHP 8.3で追加されたjson_validate()を徹底的に深掘りしてみよう
mashirou1234
1
720
PHPでOfficeファイルを取り扱う! PHP Officeライブラリを プロダクトに組み込んだ話
hirobe1999
0
840
Deno に Web 標準 API を実装する / Implementing Web Standard API to Deno
petamoriken
0
350
Material 3で Material 2ぽい見た目にする
numeroanddev
2
250
イベントストーミングによるオブジェクトモデリング・オブジェクト指向プログラミングの適用・開発プロセスの変遷・アーキテクチャの変革 / Object modeling with Event Storming.
nrslib
12
3k
もうすぐ新年度、Babylon.jsがお勧めな3個の理由
hideg
0
160
Learning PHP and Static Analysis with PHP Parser
inouehi
1
250
Featured
See All Featured
YesSQL, Process and Tooling at Scale
rocio
160
13k
The Straight Up "How To Draw Better" Workshop
denniskardys
227
130k
Six Lessons from altMBA
skipperchong
19
2.9k
Bash Introduction
62gerente
604
210k
The Success of Rails: Ensuring Growth for the Next 100 Years
eileencodes
28
5.9k
Ruby is Unlike a Banana
tanoku
95
10k
Clear Off the Table
cherdarchuk
82
310k
Learning to Love Humans: Emotional Interface Design
aarron
266
39k
The Power of CSS Pseudo Elements
geoffreycrofte
58
4.9k
RailsConf & Balkan Ruby 2019: The Past, Present, and Future of Rails at GitHub
eileencodes
124
32k
Dealing with People You Can't Stand - Big Design 2015
cassininazir
355
22k
A Philosophy of Restraint
colly
195
15k
Transcript
STORM Keeping it Real(time) Since 2011
HELLO.
dotCloud.com
DATA
DATA
MEGA-DATA
VERSION ONE
VERSION TWO
VERSION TWO
VERSION THREE
JOY
VERSION FOUR?
ENTER, STORM
REAL-TIME COMPUTATION
DISTRIBUTED RPC & STREAM PROCESSING
HISTORY
STREAM PROCESSING
STORM:REAL-TIME HADOOP:BATCH
WOW
HIGH VOLUME
CONTINUOUS
CONTINUOUS
FAULT TOLERANT
DOESN'T
PERSIST
PROCESS BATCHES RELIABLY
PROTECT AGAINST HUMAN ERROR
PROTECT AGAINST HUMAN ERROR
THREE CORE ELEMENTS
SPOUTS
STREAMS
BOLTS
TOPOLOGIES
TASKS
TASKS
OUTPUT ROUTING?
STREAM GROUPINGS
SHUFFLE GROUPING
FIELDS GROUPING
ALL GROUPING
GLOBAL GROUPING
DOWN 'N DIRTY
GATEWAYS
GATEWAYS
REAL-TIME GEOCODE BUCKETED CLIENT UPDATE
THE TOPOLOGY
THE TOPOLOGY
CODE TIME: START ECLIPSE
WAIT, WHAT?!
MULTILANG API
I'VE GOT YOU COVERED
UMBRELLA: IT PROTECTS YOU FROM STORM
THE TOPOLOGY
I'VE GOT YOU COVERED class RedisSpout(JVMSpout): class Default(Stream): fields =
'message' jvm_class = 'yieldbot.storm.spout'
I'VE GOT YOU COVERED class LogParserBolt(AutoAckBolt): class Default(Stream): fields =
'ip_address' def execute(self, input): ip_address = parse_log(input.message) self.emit(ip_address)
I'VE GOT YOU COVERED class GeolocatorBolt(AutoAckBolt): class Default(Stream): fields =
'lat', 'long' def __init__(self, *args, **kwargs): self.geoip = pygeoip.GeoIP('GeoLiteCity.dat') super(GeolocatorBolt, self) \ .__init__(*args, **kwargs) def execute(self, input): record = self.geoip.record_by_addr(input.ip) lat = record['latitude'] long_ = record['longitude'] self.emit((lat, long_))
I'VE GOT YOU COVERED class WSPuserBolt(Bolt): def __init__(self, *args, **kwargs):
self.batcher = TimeBatcher() self.pusher = zerorpc.Client(timeout=None) url = os.environ['WSPUSHER_ZERORPC_URL'] self.wspusher.connect(url) super(WSPusherBolt, self).__init__(*args, **kwargs def execute(self, input): t = time() batch = self.pop_batch(t) if batch: self.wspusher.push_list(batch) data = input.lat, input.long self.batcher.push_item(t, data)
I'VE GOT YOU COVERED class GeocoderTopology(Topology): # components redis =
RedisSpout(1) parser = LogParserBolt(3) geolocator = GeolocatorBolt(2) pusher = WSPuserBolt(4) # plumbing parser.inputs.append(ShuffleGrouping(redis)) geolocator.inputs.append(ShuffleGrouping(parser)) pusher.inputs.append( FieldsGrouping(geolocator, 'lat', 'long'))
INSIDE THE MACHINE
THREE COMPONENTS
NIMBUS
ZOOKEEPER CLUSTER
WORKER NODES
DETAILS
DEPLOYMENT
EC2?
DOTCLOUD!
$ git clone \ https://github.com/gabrielgrant/storm-on-dotcloud.git $ dotcloud push mystorm storm-on-dotcloud
… $ dotcloud scale worker=3
TESTING
JAVA
CLOJURE
ANT MAVEN
LINEINGEN
SCALING
WHEN
HOW
THE FUTURE: EASY & AUTO
THANKS!
GABRIEL GRANT @gabrielmgrant gabrielgrant.ca