Timothy
Write Hadoop Jobs in NodeJS by Antonio Garrote and Abhinay Mehta
abhinay
June 28, 2012
Transcript
Timothy
https://github.com/forward/timothy
Antonio Garrote, Abhinay Mehta
Hadoop MapReduce in Node.js
Hadoop
• Distributed processing of large data
• Derived from Google MapReduce and GFS
• Fast becoming the de facto standard
• Large ecosystem
• Java
• Master/Slave setup
Hadoop Architecture
(diagram: Input, HDFS, MapReduce, Output)
Timothy
• Open Source
• Uses Hadoop Streaming API
• No binaries
• NPM support
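For a rough sketch of the layer Timothy wraps: a Hadoop Streaming mapper is just a process that reads input records from stdin, one per line, and writes tab-separated key/value pairs to stdout. A hand-written Node.js word-count mapper might look like this (illustrative only; the file name and splitting logic are assumptions, not Timothy's generated code):

// wordcount-mapper.js (illustrative): a hand-written Hadoop Streaming mapper.
// Hadoop Streaming feeds records to the process on stdin and expects
// "key<TAB>value" lines on stdout.
function emitWords(line) {
  line.split(" ").forEach(function (word) {
    if (word.length > 0) {
      process.stdout.write(word + "\t1\n");
    }
  });
}

var buffer = "";
process.stdin.setEncoding("utf8");
process.stdin.on("data", function (chunk) {
  buffer += chunk;
  var lines = buffer.split("\n");
  buffer = lines.pop(); // keep any trailing partial line for the next chunk
  lines.forEach(emitWords);
});
process.stdin.on("end", function () {
  if (buffer.length > 0) emitWords(buffer);
});

Timothy's map/reduce callbacks take the place of scripts like this, with no compiled binaries involved.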
$ npm install timothy
Word Count

require('timothy')
  .configure({
    config: './hadoop.xml',
    input:  '/tmp/loremipsum.txt',
    output: '/tmp/wordcount/',
    name:   'Timothy Word Count Example'
  })
  .map(function(line) {
    line.split(" ").forEach(function(word) {
      emit(word, 1);
    });
  })
  .reduce(function(word, counts) {
    emit(word, counts.length);
  })
  .run(function(err) { ... });
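As the example implies, every emit(word, 1) from the map phase is grouped by key before the reduce phase, so the counts argument is the array of all values emitted for that word and counts.length is the word's total count.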
Local Runner

require('timothy')
  .map(function(line) {
    line.split(" ").forEach(function(word) {
      emit(word, 1);
    });
  })
  .reduce(function(word, counts) {
    emit(word, counts.length);
  })
  .runLocal("/local/input/path");
Dependencies

require('timothy')
  .configure({ ... })
  .dependencies({ 'string': '0.2.1-2' })
  .map(function(line) {
    var S = require('string');
    line.split(" ").forEach(function(word) {
      if (S(word).isAlphaNumeric()) {
        emit(word, 1);
      }
    });
  })
  ...
Setup

require('timothy')
  .configure({ ... })
  .dependencies({ 'string': '0.2.1-2' })
  .map(function(line) {
    line.split(" ").forEach(function(word) {
      if (S(word).isAlphaNumeric()) {
        emit(word, 1);
      }
    });
  })
  ...
  .setup(function() {
    S = require('string');
  })
Method Chaining

require('timothy')
  .configure({ ... })
  .map(function(line) {
    emit(line, 1);
  })
  .reduce(function(line, counts) {
    emit(line, counts.length);
  })
  .map(function(line, count) {
    emit(line[0], count);
  })
  .reduce(function(letter, counts) {
    var sum = counts.reduce(function(a, i) { return a + i; });
    emit(letter, sum);
  })
  .run();
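Chained this way, the job first counts how many times each distinct line appears, then re-keys each line by its first character and sums the counts per letter; each map/reduce pair presumably executes as its own MapReduce pass, with the output of one feeding the input of the next.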
Other Features
• Update job status
• Create and update counters
• Pass environment variables to jobs
• More examples on the GitHub page
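Status updates and counters in Hadoop Streaming are conventionally sent by writing specially formatted "reporter:" lines to stderr, which is presumably the mechanism Timothy relies on; a hand-rolled streaming script could emit them directly. A sketch (the helper and counter names are illustrative, not Timothy's own API):

// Minimal helpers for job feedback in a raw Hadoop Streaming script.
// Hadoop Streaming scans stderr for lines in the "reporter:" protocol.
function updateStatus(message) {
  process.stderr.write("reporter:status:" + message + "\n");
}

function incrementCounter(group, counter, amount) {
  process.stderr.write("reporter:counter:" + group + "," + counter + "," + amount + "\n");
}

// Example usage:
updateStatus("processing input split");
incrementCounter("WordCount", "MalformedLines", 1);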
Motivation
• Big data is now a thing
• Lower the barrier to entry
• Benefits of NodeJS on Hadoop
• Development speed
Limitations
• Setup method cannot block
• Lack of support for lexical scoping
• NodeJS needs to be pre-installed on slaves
• Probably more we haven't thought of yet!
Improvements
• Bundling local JS scripts
• JSON for intermediary data format
• JVM support
Thank you! https://github.com/forward/timothy