Timothy
abhinay
June 28, 2012
Programming
Write Hadoop Jobs in NodeJS by Antonio Garrote and Abhinay Mehta
Transcript
Timothy
https://github.com/forward/timothy
Antonio Garrote, Abhinay Mehta
Hadoop MapReduce in Node.js
Hadoop
• Distributed processing of large data
• Derived from Google MapReduce and GFS
• Fast becoming the de facto standard
• Large ecosystem
• Java
• Master/Slave setup
Hadoop Architecture
[Diagram with the labels: Input, HDFS, MapReduce, Output]
Timothy
• Open Source
• Uses Hadoop Streaming API
• No binaries
• NPM support
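To make the "Uses Hadoop Streaming API" point concrete: Hadoop Streaming runs any executable as a mapper or reducer, feeds it input lines on stdin, and reads tab-separated key/value lines back from stdout. The following is a minimal sketch of such a mapper in plain Node.js. It is not Timothy's source code, and the names `mapLine` and `runMapper` are hypothetical.

```javascript
// Sketch of a Hadoop Streaming style mapper in plain Node.js.
// NOT Timothy's implementation; mapLine and runMapper are hypothetical names.
const readline = require('readline');

// Turn one input line into zero or more "key<TAB>value" records.
function mapLine(line) {
  return line
    .split(/\s+/)
    .filter(function (word) { return word.length > 0; })
    .map(function (word) { return word + '\t1'; });
}

// Read lines from `input` and write the mapped records to `output`.
// Under Hadoop Streaming these would be process.stdin and process.stdout.
function runMapper(input, output) {
  const rl = readline.createInterface({ input: input, terminal: false });
  rl.on('line', function (line) {
    mapLine(line).forEach(function (record) {
      output.write(record + '\n');
    });
  });
  return rl;
}

// In-memory demonstration, no Hadoop required.
console.log(mapLine('lorem ipsum lorem').join('\n'));
```

A script like this would call `runMapper(process.stdin, process.stdout)` when launched by Hadoop Streaming. A library such as Timothy presumably generates this kind of wrapper around the user's `map` function, though the slides do not show how.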
$ npm install timothy
Word Count
    require('timothy')
        .configure({
            config: './hadoop.xml',
            input:  '/tmp/loremipsum.txt',
            output: '/tmp/wordcount/',
            name:   'Timothy Word Count Example'
        })
        // map: emit a (word, 1) pair for every word on the line
        .map(function(line){
            line.split(" ").forEach(function(word) {
                emit(word, 1);
            });
        })
        // reduce: every value is 1, so the number of values is the word count
        .reduce(function(word,counts){
            emit(word, counts.length);
        })
        .run(function(err){ .. });
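To see what the word count job computes without a cluster, the map, group-by-key, and reduce steps can be imitated in memory. This is a rough sketch under stated assumptions: `simulateJob` is a hypothetical helper, not part of Timothy, and `emit` is passed in as an argument here, whereas the slide code calls it as a global.

```javascript
// Hypothetical in-memory stand-in for a MapReduce run; not Timothy's implementation.
function simulateJob(lines, mapFn, reduceFn) {
  const groups = new Map(); // key -> list of values emitted by the map step

  // Map step: collect every emitted (key, value) pair, grouped by key.
  lines.forEach(function (line) {
    mapFn(line, function emit(key, value) {
      if (!groups.has(key)) { groups.set(key, []); }
      groups.get(key).push(value);
    });
  });

  // Reduce step: call the reducer once per key, in sorted key order.
  const results = {};
  Array.from(groups.keys()).sort().forEach(function (key) {
    reduceFn(key, groups.get(key), function emit(k, v) { results[k] = v; });
  });
  return results;
}

const wordCounts = simulateJob(
  ['a b a', 'b a'],
  function (line, emit) {
    line.split(' ').forEach(function (word) { emit(word, 1); });
  },
  function (word, counts, emit) { emit(word, counts.length); }
);

console.log(wordCounts); // { a: 3, b: 2 }
```

The reducer receives every value emitted for a key as one array, which is why `counts.length` is enough when each emitted value is 1.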
Local Runner
    require('timothy')
        .map(function(line){
            line.split(" ").forEach(function(word) {
                emit(word, 1);
            });
        })
        .reduce(function(word,counts){
            emit(word, counts.length);
        })
        .runLocal("/local/input/path");
Dependencies
    require('timothy')
        .configure({ ... })
        .dependencies({'string' : '0.2.1-2'})
        .map(function(line){
            var S = require('string');
            line.split(" ").forEach(function(word) {
                if (S(word).isAlphaNumeric()) { emit(word, 1); }
            });
        })
        ...
Setup
    require('timothy')
        .configure({ ... })
        .dependencies({'string' : '0.2.1-2'})
        .map(function(line){
            line.split(" ").forEach(function(word) {
                if (S(word).isAlphaNumeric()) { emit(word, 1); }
            });
        })
        ...
        .setup(function() { S = require('string'); })
Method Chaining
    require('timothy')
        .configure({ ... })
        // first job: count identical lines
        .map(function(line){ emit(line, 1); })
        .reduce(function(line,counts){ emit(line, counts.length); })
        // second job: sum those counts by the first letter of each line
        .map(function(line, count){ emit(line[0], count); })
        .reduce(function(letter, counts){
            var sum = counts.reduce(function(a,i) { return a+i; });
            emit(letter, sum);
        })
        .run();
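The chained job above feeds the output of the first map/reduce pair into the second. The data flow can be traced in memory with a small sketch. `runStage` is a hypothetical helper, not Timothy's API, and `emit` is again passed as an argument rather than used as a global.

```javascript
// Hypothetical helper: run one map/reduce stage over [key, value] records.
// Not Timothy's implementation, only an illustration of the data flow.
function runStage(records, mapFn, reduceFn) {
  const groups = new Map();
  records.forEach(function (rec) {
    mapFn(rec[0], rec[1], function emit(k, v) {
      if (!groups.has(k)) { groups.set(k, []); }
      groups.get(k).push(v);
    });
  });
  const out = [];
  Array.from(groups.keys()).sort().forEach(function (k) {
    reduceFn(k, groups.get(k), function emit(k2, v2) { out.push([k2, v2]); });
  });
  return out;
}

// Raw input lines have no value yet, so the second slot is null.
const input = ['apple', 'avocado', 'banana', 'apple'].map(function (line) {
  return [line, null];
});

// Stage 1: count identical lines.
const stage1 = runStage(input,
  function (line, unused, emit) { emit(line, 1); },
  function (line, counts, emit) { emit(line, counts.length); });

// Stage 2: sum the stage 1 counts by first letter.
const stage2 = runStage(stage1,
  function (line, count, emit) { emit(line[0], count); },
  function (letter, counts, emit) {
    emit(letter, counts.reduce(function (a, i) { return a + i; }, 0));
  });

console.log(stage1); // [ [ 'apple', 2 ], [ 'avocado', 1 ], [ 'banana', 1 ] ]
console.log(stage2); // [ [ 'a', 3 ], [ 'b', 1 ] ]
```

Note that the second mapper receives both a key and a value, matching the `function(line, count)` signature on the slide.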
Other features
• Update Job Status
• Create and update counters
• Pass env vars to jobs
• More examples on github page
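The slides do not show Timothy's own calls for status and counter updates. At the Hadoop Streaming level, a task reports them by writing specially formatted lines to stderr: `reporter:status:<message>` and `reporter:counter:<group>,<counter>,<amount>`. A minimal sketch of that underlying mechanism, with hypothetical helper names and made-up group and counter names, might look like this:

```javascript
// Sketch of the Hadoop Streaming reporter protocol, written to stderr.
// statusLine, counterLine and report are hypothetical names, not Timothy's API.
function statusLine(message) {
  return 'reporter:status:' + message;
}

function counterLine(group, counter, amount) {
  return 'reporter:counter:' + group + ',' + counter + ',' + amount;
}

function report(line) {
  process.stderr.write(line + '\n');
}

// Example: set the task status, then increment a counter by 1.
// 'WordCount' and 'EmptyLines' are illustrative names only.
report(statusLine('processing input'));
report(counterLine('WordCount', 'EmptyLines', 1));
```

Timothy presumably wraps this protocol in friendlier functions; the examples on the project's GitHub page are the place to check the actual names.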
Motivation
• Big data is now a thing
• Lower the barrier to entry
• Benefits of NodeJS on Hadoop
• Development Speed
Limitations
• Setup method cannot block
• Lacks support for lexical scoping
• NodeJS needs to be pre-installed on slaves
• Probably more we haven’t thought of yet!
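One plausible reason for the lexical scoping limitation is that map and reduce functions have to reach the Hadoop nodes as source text, which drops whatever variables they closed over. The slides do not confirm that this is how Timothy ships functions, so the sketch below is an assumption, and `shipAndRebuild` is a hypothetical stand-in for that step.

```javascript
// Assumption: user functions are serialized to source and rebuilt on another machine.
// shipAndRebuild is a hypothetical stand-in; it is not Timothy's code.
function shipAndRebuild(fn) {
  const source = fn.toString();
  // new Function evaluates the source in the global scope, without the original closure.
  return new Function('return (' + source + ')')();
}

// A mapper that depends on a variable from its enclosing scope.
function makeClosureMapper() {
  const separator = ' ';
  return function (line) { return line.split(separator); };
}

// A mapper that defines everything it needs inside its own body.
function selfContainedMapper(line) {
  const separator = ' ';
  return line.split(separator);
}

const rebuiltClosure = shipAndRebuild(makeClosureMapper());
const rebuiltSelfContained = shipAndRebuild(selfContainedMapper);

let failure = null;
try {
  rebuiltClosure('a b'); // `separator` no longer exists after the round trip
} catch (err) {
  failure = err;
}

console.log(failure instanceof ReferenceError); // true
console.log(rebuiltSelfContained('a b'));       // [ 'a', 'b' ]
```

This would also explain why the Dependencies and Setup slides call `require('string')` inside the job's own functions instead of at the top of the script.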
Improvements
• Bundling local JS scripts
• JSON for intermediary data format
• JVM support
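As an illustration of the "JSON for intermediary data format" idea: plain tab-separated text turns every value into a string, while JSON would let numbers, arrays, and objects survive the trip between map and reduce. The sketch below shows one possible encoding. It is not a format the slides specify, and `encodeRecord` and `decodeRecord` are hypothetical names.

```javascript
// One possible JSON-based record format: JSON key, a tab, JSON value, on a single line.
// Hypothetical sketch; the slides do not define the actual format.
function encodeRecord(key, value) {
  // JSON.stringify escapes tabs and newlines inside strings, so the only
  // literal tab in the record is the separator.
  return JSON.stringify(key) + '\t' + JSON.stringify(value);
}

function decodeRecord(line) {
  const tab = line.indexOf('\t');
  return {
    key: JSON.parse(line.slice(0, tab)),
    value: JSON.parse(line.slice(tab + 1))
  };
}

const encoded = encodeRecord('lorem\tipsum', { count: 2, tags: ['x'] });
const decoded = decodeRecord(encoded);

console.log(encoded);
console.log(decoded.value.count + 1); // 3, the count is still a number
```

Keeping the key and value separated by a single tab means the records would still fit the key/value line convention that Hadoop Streaming uses by default.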
Thank you!
https://github.com/forward/timothy