Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Timothy
Search
abhinay
June 28, 2012
Programming
2
1.5k
Timothy
Write Hadoop Jobs in NodeJS by Antonio Garrote and Abhinay Mehta
abhinay
June 28, 2012
Tweet
Share
Other Decks in Programming
See All in Programming
dbtのドメイン分割による データ基盤の改善とDigdagとの連携
sakama
0
440
Hanami and htmx
bkuhlmann
0
220
Ruby Function Composition
bkuhlmann
1
340
CDKコントリビュートの最初の壁を越えよう! -簡単issueの見つけ方-
badmintoncryer
3
200
Amazon SQSコンシューマー疎結合への旅 - 出張! #DevelopersIO IT技術ブログの中の人が語る勉強会 #3
quiver
0
300
SIMD Parallel Programming with the Vector API
josepaumard
0
230
0→1と1→10の狭間で Javaという技術選定を振り返る/Reflecting on the Decision to Choose Java Between Scaling from 0 to 1 and 1 to 10
jaguar_imo
2
400
Implementing Design Systems in Swift
seyfoyun
1
450
初心者のためのRubyKaigi入門/RubyKaigi Introduction
a_matsuda
8
1.4k
AppRouter Panel Talk
yosuke_furukawa
PRO
1
460
Code Reviews
bkuhlmann
4
900
Snowflakeで眠ったデータを起こそう!
estie
0
140
Featured
See All Featured
Bash Introduction
62gerente
605
210k
JavaScript: Past, Present, and Future - NDC Porto 2020
reverentgeek
41
4.4k
[RailsConf 2023] Rails as a piece of cake
palkan
28
4k
[RailsConf 2023 Opening Keynote] The Magic of Rails
eileencodes
13
8.3k
Building Better People: How to give real-time feedback that sticks.
wjessup
356
18k
Being A Developer After 40
akosma
66
580k
Let's Do A Bunch of Simple Stuff to Make Websites Faster
chriscoyier
501
140k
Gamification - CAS2011
davidbonilla
77
4.6k
It's Worth the Effort
3n
180
27k
"I'm Feeling Lucky" - Building Great Search Experiences for Today's Users (#IAC19)
danielanewman
222
21k
RailsConf & Balkan Ruby 2019: The Past, Present, and Future of Rails at GitHub
eileencodes
126
32k
VelocityConf: Rendering Performance Case Studies
addyosmani
321
23k
Transcript
Timothy https://github.com/forward/timothy Antonio Garrote Abhinay Mehta
Hadoop MapReduce in Node.js
Hadoop • Distributed processing of large data • Derived from
Google MapReduce and GFS • Fast becoming the de facto standard • Large ecosystem • Java • Master/Slave setup
Hadoop Architecture HDFS MapReduce Output Input
• Open Source • Uses Hadoop Streaming API • No
binaries • NPM support Timothy
$ npm install timothy
require('timothy') .configure({ config: './hadoop.xml', input: '/tmp/loremipsum.txt', output: '/tmp/wordcount/', name: 'Timothy
Word Count Example' }) .map(function(line){ line.split(" ").forEach(function(word) { emit(word, 1); }); }) .reduce(function(word,counts){ emit(word, counts.length); }) .run(function(err){ .. }); Word Count
require('timothy') .map(function(line){ line.split(" ").forEach(function(word) { emit(word, 1); }); }) .reduce(function(word,counts){
emit(word, counts.length); }) .runLocal("/local/input/path"); Local Runner
require('timothy') .map(function(line){ line.split(" ").forEach(function(word) { emit(word, 1); }); }) .reduce(function(word,counts){
emit(word, counts.length); }) .runLocal("/local/input/path"); Local Runner
Dependencies require('timothy') .configure({ ... }) .dependencies({'string' : '0.2.1-2'}) .map(function(line){ var
S = require('string'); line.split(" ").forEach(function(word) { if (S(word).isAlphaNumeric()) { emit(word, 1); } }); }) ...
Dependencies require('timothy') .configure({ ... }) .dependencies({'string' : '0.2.1-2'}) .map(function(line){ var
S = require('string'); line.split(" ").forEach(function(word) { if (S(word).isAlphaNumeric()) { emit(word, 1); } }); }) ...
Setup require('timothy') .configure({ ... }) .dependencies({'string' : '0.2.1-2'}) .map(function(line){ line.split("
").forEach(function(word) { if (S(word).isAlphaNumeric()) { emit(word, 1); } }); }) ... .setup(function() { S = require('string'); })
Setup require('timothy') .configure({ ... }) .dependencies({'string' : '0.2.1-2'}) .map(function(line){ line.split("
").forEach(function(word) { if (S(word).isAlphaNumeric()) { emit(word, 1); } }); }) ... .setup(function() { S = require('string'); })
require('timothy') .configure({ ... }) .map(function(line){ emit(line, 1); }) .reduce(function(line,counts){ emit(line,
counts.length); }) .map(function(line, count){ emit(line[0], count); }) .reduce(function(letter, counts){ var sum = counts.reduce(function(a,i) { return a+i; }); emit(letter, sum); }) .run(); Method Chaining
• Update Job Status • Create and update counters •
Pass env vars to jobs • More examples on github page Other features
Motivation • Big data is now a thing • Lower
the barrier to entry • Benefits of NodeJS on Hadoop • Development Speed
Limitations • Setup method cannot block • Lack support for
lexical scoping • NodeJS needs to be pre-installed on slaves • Probably more we haven’t thought of yet!
Improvements • Bundling local JS scripts • JSON for intermediary
data format • JVM support
Thank you! https://github.com/forward/timothy