Storm: the Hadoop of Realtime Stream Processing
Gabriel Grant
March 25, 2012
Programming
Twitter's new scalable, fault-tolerant, and simple(ish) stream programming system... with Python!
Transcript
STORM Keeping it Real(time) Since 2011
HELLO.
dotCloud.com
DATA
DATA
MEGA-DATA
VERSION ONE
VERSION TWO
VERSION TWO
VERSION THREE
JOY
VERSION FOUR?
ENTER, STORM
REAL-TIME COMPUTATION
DISTRIBUTED RPC & STREAM PROCESSING
HISTORY
STREAM PROCESSING
STORM : REAL-TIME :: HADOOP : BATCH
WOW
HIGH VOLUME
CONTINUOUS
CONTINUOUS
FAULT TOLERANT
DOESN'T
PERSIST
PROCESS BATCHES RELIABLY
PROTECT AGAINST HUMAN ERROR
PROTECT AGAINST HUMAN ERROR
THREE CORE ELEMENTS
SPOUTS
STREAMS
BOLTS
TOPOLOGIES
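The relationship between spouts, streams, bolts, and topologies can be illustrated with a toy, single-process sketch. This is not Storm's API: the function names and the log-line format are made up for illustration, and Python generators stand in for streams of tuples.

```python
def log_spout(lines):
    """Spout: a source that emits one tuple per raw log line."""
    for line in lines:
        yield (line,)


def parse_bolt(stream):
    """Bolt: consumes a stream and emits a new one (the first field of each line)."""
    for (line,) in stream:
        yield (line.split()[0],)


def count_bolt(stream):
    """Bolt: keeps running state and emits an updated count per key."""
    counts = {}
    for (key,) in stream:
        counts[key] = counts.get(key, 0) + 1
        yield (key, counts[key])


def run_topology(lines):
    """Topology: the wiring that connects the spout to the bolts."""
    return list(count_bolt(parse_bolt(log_spout(lines))))
```

In a real cluster each component would run as many parallel tasks on different machines; the sketch only shows how data flows from one component to the next.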
TASKS
TASKS
OUTPUT ROUTING?
STREAM GROUPINGS
SHUFFLE GROUPING
FIELDS GROUPING
ALL GROUPING
GLOBAL GROUPING
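The four groupings above answer one question: given a tuple and a bolt with N tasks, which task(s) receive it? A minimal sketch of that decision follows. It is not Storm's implementation; the function names and the CRC32 hash are assumptions chosen only to make the routing rules concrete.

```python
import random
import zlib


def shuffle_grouping(num_tasks, rng=random):
    """Pick one task at random, so load spreads evenly over time."""
    return [rng.randrange(num_tasks)]


def fields_grouping(values, num_tasks):
    """Hash the grouping field values so equal values always reach the same task."""
    key = repr(tuple(values)).encode("utf-8")
    return [zlib.crc32(key) % num_tasks]


def all_grouping(num_tasks):
    """Replicate the tuple to every task."""
    return list(range(num_tasks))


def global_grouping(num_tasks):
    """Send the whole stream to a single task (here, the lowest index)."""
    return [0]
```

Fields grouping is the one to reach for when a bolt keeps per-key state (counts, batches), since every tuple with the same key lands on the same task.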
DOWN 'N DIRTY
GATEWAYS
GATEWAYS
REAL-TIME GEOCODE BUCKETED CLIENT UPDATE
THE TOPOLOGY
THE TOPOLOGY
CODE TIME: START ECLIPSE
WAIT, WHAT?!
MULTILANG API
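Storm's multilang support lets non-JVM components talk to their parent worker over stdin/stdout, exchanging JSON messages that are each terminated by a line containing only "end". The sketch below shows just that framing, under the assumption that in-memory streams stand in for the real pipes; the helper names are hypothetical and the initial handshake, acks, and heartbeats are left out.

```python
import io
import json


def write_message(msg, out):
    """Serialize one message as JSON followed by a line containing only 'end'."""
    out.write(json.dumps(msg))
    out.write("\nend\n")
    out.flush()


def read_message(inp):
    """Read lines until the 'end' delimiter and decode the collected JSON."""
    lines = []
    for line in inp:
        if line.rstrip("\n") == "end":
            return json.loads("".join(lines))
        lines.append(line)
    raise EOFError("stream closed before 'end' delimiter")


# Round-trip an emit command through an in-memory buffer.
buf = io.StringIO()
write_message({"command": "emit", "tuple": ["1.2.3.4"]}, buf)
buf.seek(0)
decoded = read_message(buf)
```

A Python wrapper library can hide this plumbing behind ordinary classes, so component authors never touch the wire format.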
I'VE GOT YOU COVERED
UMBRELLA: IT PROTECTS YOU FROM STORM
THE TOPOLOGY
I'VE GOT YOU COVERED

    class RedisSpout(JVMSpout):
        class Default(Stream):
            fields = 'message'
        jvm_class = 'yieldbot.storm.spout'
I'VE GOT YOU COVERED

    class LogParserBolt(AutoAckBolt):
        class Default(Stream):
            fields = 'ip_address'

        def execute(self, input):
            ip_address = parse_log(input.message)
            self.emit(ip_address)
I'VE GOT YOU COVERED

    import pygeoip

    class GeolocatorBolt(AutoAckBolt):
        class Default(Stream):
            fields = 'lat', 'long'

        def __init__(self, *args, **kwargs):
            # load the GeoIP city database once per task
            self.geoip = pygeoip.GeoIP('GeoLiteCity.dat')
            super(GeolocatorBolt, self) \
                .__init__(*args, **kwargs)

        def execute(self, input):
            # read the 'ip_address' field declared by LogParserBolt
            record = self.geoip.record_by_addr(input.ip_address)
            lat = record['latitude']
            long_ = record['longitude']
            self.emit((lat, long_))
I'VE GOT YOU COVERED

    import os
    from time import time

    import zerorpc

    class WSPuserBolt(Bolt):
        def __init__(self, *args, **kwargs):
            self.batcher = TimeBatcher()
            # no timeout: pushes may wait on the websocket gateway
            self.wspusher = zerorpc.Client(timeout=None)
            url = os.environ['WSPUSHER_ZERORPC_URL']
            self.wspusher.connect(url)
            super(WSPuserBolt, self).__init__(*args, **kwargs)

        def execute(self, input):
            t = time()
            # flush any completed batch before queueing the new point
            batch = self.pop_batch(t)
            if batch:
                self.wspusher.push_list(batch)
            data = input.lat, input.long
            self.batcher.push_item(t, data)
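The slides do not show TimeBatcher itself. One way such a helper could work is sketched below, purely as an assumption: items are grouped into fixed-width time windows, and a batch is handed back once the clock moves past the window being filled. The `interval` parameter and the placement of `pop_batch` on the batcher are hypothetical.

```python
class TimeBatcher(object):
    """Hypothetical batcher: groups items into fixed-width time windows."""

    def __init__(self, interval=1.0):
        self.interval = interval  # window width in seconds (assumed)
        self.window = None        # index of the window currently being filled
        self.items = []

    def _index(self, t):
        """Map a timestamp to the index of the window that contains it."""
        return int(t // self.interval)

    def pop_batch(self, t):
        """Return the finished batch if t is past the current window, else None."""
        if self.window is None or self._index(t) == self.window:
            return None
        batch, self.items = self.items, []
        self.window = None
        return batch

    def push_item(self, t, item):
        """Add an item, opening a new window if none is in progress."""
        if self.window is None:
            self.window = self._index(t)
        self.items.append(item)
```

Batching this way trades a little latency (at most one window) for far fewer pushes to the websocket gateway.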
I'VE GOT YOU COVERED

    class GeocoderTopology(Topology):
        # components
        redis = RedisSpout(1)
        parser = LogParserBolt(3)
        geolocator = GeolocatorBolt(2)
        pusher = WSPuserBolt(4)

        # plumbing
        parser.inputs.append(ShuffleGrouping(redis))
        geolocator.inputs.append(ShuffleGrouping(parser))
        pusher.inputs.append(
            FieldsGrouping(geolocator, 'lat', 'long'))
INSIDE THE MACHINE
THREE COMPONENTS
NIMBUS
ZOOKEEPER CLUSTER
WORKER NODES
DETAILS
DEPLOYMENT
EC2?
DOTCLOUD!
    $ git clone \
        https://github.com/gabrielgrant/storm-on-dotcloud.git
    $ dotcloud push mystorm storm-on-dotcloud
    …
    $ dotcloud scale worker=3
TESTING
JAVA
CLOJURE
ANT MAVEN
LEININGEN
SCALING
WHEN
HOW
THE FUTURE: EASY & AUTO
THANKS!
GABRIEL GRANT @gabrielmgrant gabrielgrant.ca