Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Storm: the Hadoop of Realtime Stream Processing
Search
Gabriel Grant
March 25, 2012
Programming
3
1.3k
Storm: the Hadoop of Realtime Stream Processing
Twitter's new scalable, fault-tolerant, and simple(ish) stream programming system... with Python!
Gabriel Grant
March 25, 2012
Tweet
Share
More Decks by Gabriel Grant
See All by Gabriel Grant
Painting Rainbows: Building Bridges in the Cloud
gabrielgrant
1
210
Other Decks in Programming
See All in Programming
Build Apps for iOS, Android & Desktop in 100% Kotlin With Compose Multiplatform (mDevCamp 2024)
zsmb
0
440
Fragment Composition of GraphQL
quramy
13
1.4k
効率化に挑戦してみたらモバイル開発が少し快適になった話
ryunakayama
0
140
dbtのドメイン分割による データ基盤の改善とDigdagとの連携
sakama
0
440
Anthropic Cookbook のおすすめレシピ
schroneko
7
1.1k
Introducing Kotlin Multiplatform in an existing mobile app - Workshop Edition | AndroidMakers Paris
prof18
0
150
Three ways to use AI on Android: The Good, the Bad and the Ugly
marxallski
0
110
サイコロで理解する統計的仮説検定の考え方
tatamiya
4
1k
CDKコントリビュートの最初の壁を越えよう! -簡単issueの見つけ方-
badmintoncryer
3
210
FigmaとPHPで作る1ミリたりとも表示崩れしない最強の帳票印刷ソリューション
ttskch
43
19k
Behind VS Code Extensions for JavaScript / TypeScript Linnting and Formatting
unvalley
6
1.2k
Polars入門
daikikatsuragawa
1
170
Featured
See All Featured
We Have a Design System, Now What?
morganepeng
43
6.8k
Helping Users Find Their Own Way: Creating Modern Search Experiences
danielanewman
21
1.9k
Building Flexible Design Systems
yeseniaperezcruz
320
37k
A designer walks into a library…
pauljervisheath
201
23k
Teambox: Starting and Learning
jrom
128
8.4k
Web development in the modern age
philhawksworth
203
10k
A Philosophy of Restraint
colly
197
16k
Scaling GitHub
holman
457
140k
Why You Should Never Use an ORM
jnunemaker
PRO
51
8.7k
What the flash - Photography Introduction
edds
64
11k
ParisWeb 2013: Learning to Love: Crash Course in Emotional UX Design
dotmariusz
104
6.6k
Adopting Sorbet at Scale
ufuk
69
8.6k
Transcript
STORM Keeping it Real(time) Since 2011
HELLO.
dotCloud.com
DATA
DATA
MEGA-DATA
VERSION ONE
VERSION TWO
VERSION TWO
VERSION THREE
JOY
VERSION FOUR?
ENTER, STORM
REAL-TIME COMPUTATION
DISTRIBUTED RPC & STREAM PROCESSING
HISTORY
STREAM PROCESSING
STORM:REAL-TIME HADOOP:BATCH
WOW
HIGH VOLUME
CONTINUOUS
CONTINUOUS
FAULT TOLERANT
DOESN'T
PERSIST
PROCESS BATCHES RELIABLY
PROTECT AGAINST HUMAN ERROR
PROTECT AGAINST HUMAN ERROR
THREE CORE ELEMENTS
SPOUTS
STREAMS
BOLTS
TOPOLOGIES
TASKS
TASKS
OUTPUT ROUTING?
STREAM GROUPINGS
SHUFFLE GROUPING
FIELDS GROUPING
ALL GROUPING
GLOBAL GROUPING
DOWN 'N DIRTY
GATEWAYS
GATEWAYS
REAL-TIME GEOCODE BUCKETED CLIENT UPDATE
THE TOPOLOGY
THE TOPOLOGY
CODE TIME: START ECLIPSE
WAIT, WHAT?!
MULTILANG API
I'VE GOT YOU COVERED
UMBRELLA: IT PROTECTS YOU FROM STORM
THE TOPOLOGY
I'VE GOT YOU COVERED class RedisSpout(JVMSpout): class Default(Stream): fields =
'message' jvm_class = 'yieldbot.storm.spout'
I'VE GOT YOU COVERED class LogParserBolt(AutoAckBolt): class Default(Stream): fields =
'ip_address' def execute(self, input): ip_address = parse_log(input.message) self.emit(ip_address)
I'VE GOT YOU COVERED class GeolocatorBolt(AutoAckBolt): class Default(Stream): fields =
'lat', 'long' def __init__(self, *args, **kwargs): self.geoip = pygeoip.GeoIP('GeoLiteCity.dat') super(GeolocatorBolt, self) \ .__init__(*args, **kwargs) def execute(self, input): record = self.geoip.record_by_addr(input.ip) lat = record['latitude'] long_ = record['longitude'] self.emit((lat, long_))
I'VE GOT YOU COVERED class WSPuserBolt(Bolt): def __init__(self, *args, **kwargs):
self.batcher = TimeBatcher() self.pusher = zerorpc.Client(timeout=None) url = os.environ['WSPUSHER_ZERORPC_URL'] self.wspusher.connect(url) super(WSPusherBolt, self).__init__(*args, **kwargs def execute(self, input): t = time() batch = self.pop_batch(t) if batch: self.wspusher.push_list(batch) data = input.lat, input.long self.batcher.push_item(t, data)
I'VE GOT YOU COVERED class GeocoderTopology(Topology): # components redis =
RedisSpout(1) parser = LogParserBolt(3) geolocator = GeolocatorBolt(2) pusher = WSPuserBolt(4) # plumbing parser.inputs.append(ShuffleGrouping(redis)) geolocator.inputs.append(ShuffleGrouping(parser)) pusher.inputs.append( FieldsGrouping(geolocator, 'lat', 'long'))
INSIDE THE MACHINE
THREE COMPONENTS
NIMBUS
ZOOKEEPER CLUSTER
WORKER NODES
DETAILS
DEPLOYMENT
EC2?
DOTCLOUD!
$ git clone \ https://github.com/gabrielgrant/storm-on-dotcloud.git $ dotcloud push mystorm storm-on-dotcloud
… $ dotcloud scale worker=3
TESTING
JAVA
CLOJURE
ANT MAVEN
LINEINGEN
SCALING
WHEN
HOW
THE FUTURE: EASY & AUTO
THANKS!
GABRIEL GRANT @gabrielmgrant gabrielgrant.ca