Timothy
Write Hadoop Jobs in NodeJS by Antonio Garrote and Abhinay Mehta
abhinay · June 28, 2012 · Programming
Transcript
Timothy
https://github.com/forward/timothy
Antonio Garrote
Abhinay Mehta
Hadoop MapReduce in Node.js
Hadoop
• Distributed processing of large data
• Derived from Google MapReduce and GFS
• Fast becoming the de facto standard
• Large ecosystem
• Java
• Master/Slave setup
Hadoop Architecture
(diagram: Input, HDFS, MapReduce, Output)
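The MapReduce model behind that architecture can be sketched in a few lines of plain Node.js, with no Hadoop or Timothy involved: a map phase emits key/value pairs, a shuffle groups the values by key, and a reduce phase folds each group. The `mapReduce` function and the convention of passing `emit` as an argument are assumptions made for this illustration; on a cluster the same phases run in parallel across machines, with data read from and written to HDFS.

```javascript
// Minimal in-memory model of MapReduce (illustrative only; no Hadoop involved).
function mapReduce(records, mapFn, reduceFn) {
  // Map phase: each record may emit any number of (key, value) pairs.
  const groups = new Map();
  for (const record of records) {
    mapFn(record, (key, value) => {
      // Shuffle: collect every value emitted for the same key.
      if (!groups.has(key)) groups.set(key, []);
      groups.get(key).push(value);
    });
  }
  // Reduce phase: visit keys in sorted order, as Hadoop does.
  const output = [];
  for (const key of [...groups.keys()].sort()) {
    reduceFn(key, groups.get(key), (k, v) => output.push([k, v]));
  }
  return output;
}

// Word count expressed against the sketch.
const result = mapReduce(
  ["the cat", "the dog"],
  (line, emit) => line.split(" ").forEach((word) => emit(word, 1)),
  (word, counts, emit) => emit(word, counts.length)
);
console.log(JSON.stringify(result)); // [["cat",1],["dog",1],["the",2]]
```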
Timothy
• Open source
• Uses the Hadoop Streaming API
• No binaries
• NPM support
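Hadoop Streaming, which Timothy builds on, runs an arbitrary executable as the mapper or reducer and talks to it over plain text: the mapper writes one line per pair to stdout, with key and value separated by a tab, Hadoop sorts those lines by key, and the reducer reads them back from stdin with equal keys on adjacent lines. The two helpers below sketch that contract; `toStreamingLines` and `groupSortedLines` are hypothetical names, not part of Timothy's API.

```javascript
// Serialize (key, value) pairs the way a Streaming mapper writes them to stdout.
function toStreamingLines(pairs) {
  return pairs.map(([key, value]) => `${key}\t${value}`);
}

// Group sorted reducer input: consecutive lines with the same key form one group.
// By default Streaming treats the text up to the first tab as the key.
function groupSortedLines(lines) {
  const groups = [];
  for (const line of lines) {
    const tab = line.indexOf("\t");
    const key = tab === -1 ? line : line.slice(0, tab);
    const value = tab === -1 ? "" : line.slice(tab + 1);
    const last = groups[groups.length - 1];
    if (last && last.key === key) {
      last.values.push(value);
    } else {
      groups.push({ key, values: [value] });
    }
  }
  return groups;
}

const mapped = toStreamingLines([["the", 1], ["cat", 1], ["the", 1]]);
const shuffled = [...mapped].sort(); // stands in for Hadoop's sort phase
console.log(JSON.stringify(groupSortedLines(shuffled)));
// [{"key":"cat","values":["1"]},{"key":"the","values":["1","1"]}]
```

Values come back as strings, so a reducer that adds them up has to convert them first.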
$ npm install timothy
Word Count

require('timothy')
  .configure({
    config: './hadoop.xml',
    input: '/tmp/loremipsum.txt',
    output: '/tmp/wordcount/',
    name: 'Timothy Word Count Example'
  })
  // map: emit (word, 1) for every word in the line
  .map(function(line) {
    line.split(" ").forEach(function(word) {
      emit(word, 1);
    });
  })
  // reduce: the number of 1s collected for a word is its count
  .reduce(function(word, counts) {
    emit(word, counts.length);
  })
  .run(function(err) {
    /* .. */
  });
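The word-count mapper splits each line on a single space, so runs of spaces yield empty-string "words" and tabs are not treated as separators. If the input is not already normalized, a variant such as the one below, offered as a suggestion rather than what the slides use, splits on any run of whitespace and drops empty tokens.

```javascript
// Tokenizer used in the slides: split on a single space character.
const naiveTokens = (line) => line.split(" ");

// Defensive variant: split on any run of whitespace and drop empty strings.
const tokens = (line) => line.split(/\s+/).filter((word) => word.length > 0);

const sample = "  lorem  ipsum\tdolor ";
console.log(JSON.stringify(naiveTokens(sample))); // ["","","lorem","","ipsum\tdolor",""]
console.log(JSON.stringify(tokens(sample)));      // ["lorem","ipsum","dolor"]
```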
Local Runner

require('timothy')
  .map(function(line) {
    line.split(" ").forEach(function(word) {
      emit(word, 1);
    });
  })
  .reduce(function(word, counts) {
    emit(word, counts.length);
  })
  .runLocal("/local/input/path");
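The slides do not show how `runLocal` works internally, so the following is only a hypothetical stand-in (`runLocalSketch`) for what a local runner has to do: read the input file line by line, call the map function with a global `emit` in place (matching the style of the slide code), sort and group the emitted pairs by key, then call the reduce function once per key.

```javascript
const fs = require("fs");
const os = require("os");
const path = require("path");

// Hypothetical stand-in for a local runner; not Timothy's implementation.
// Like the slide code, map and reduce call a global `emit` function.
function runLocalSketch(inputPath, map, reduce) {
  const lines = fs
    .readFileSync(inputPath, "utf8")
    .split("\n")
    .filter((line) => line !== "");
  const groups = new Map();
  const output = [];
  try {
    // Map phase: group every emitted value under its key.
    globalThis.emit = (key, value) => {
      if (!groups.has(key)) groups.set(key, []);
      groups.get(key).push(value);
    };
    lines.forEach((line) => map(line));

    // Reduce phase: visit keys in sorted order and capture reducer output.
    globalThis.emit = (key, value) => output.push(`${key}\t${value}`);
    [...groups.keys()].sort().forEach((key) => reduce(key, groups.get(key)));
  } finally {
    delete globalThis.emit; // do not leak the helper into other code
  }
  return output;
}

// Usage: word count over a small temporary file.
const input = path.join(os.tmpdir(), "timothy-local-sketch.txt");
fs.writeFileSync(input, "to be or\nnot to be\n");
const out = runLocalSketch(
  input,
  function (line) {
    line.split(" ").forEach(function (word) { emit(word, 1); });
  },
  function (word, counts) { emit(word, counts.length); }
);
console.log(out.join("\n"));
```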
Dependencies

require('timothy')
  .configure({ /* ... */ })
  .dependencies({ 'string': '0.2.1-2' })
  .map(function(line) {
    var S = require('string');
    line.split(" ").forEach(function(word) {
      // keep only alphanumeric words
      if (S(word).isAlphaNumeric()) {
        emit(word, 1);
      }
    });
  })
  // ...
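The `isAlphaNumeric` check is the only thing this job needs from the `string` package. For checks this small, a regular expression avoids shipping a dependency at all; the sketch assumes "alphanumeric" means ASCII letters and digits, which may be stricter than the `string` package's own definition.

```javascript
// Dependency-free stand-in for S(word).isAlphaNumeric(), assuming
// "alphanumeric" means ASCII letters and digits only.
// (The `string` package's own definition may accept more characters.)
const isAlphaNumeric = (word) => /^[A-Za-z0-9]+$/.test(word);

const kept = "hello, world 42 foo-bar".split(" ").filter(isAlphaNumeric);
console.log(JSON.stringify(kept)); // ["world","42"]
```

The `.dependencies` mechanism remains the better fit once a job relies on more than a one-line helper.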
Setup

require('timothy')
  .configure({ /* ... */ })
  .dependencies({ 'string': '0.2.1-2' })
  .map(function(line) {
    line.split(" ").forEach(function(word) {
      if (S(word).isAlphaNumeric()) {
        emit(word, 1);
      }
    });
  })
  // ...
  .setup(function() {
    // assign to a global so the map function can see it
    S = require('string');
  });
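Moving `require('string')` into `.setup` means the library is presumably loaded once before the mapper starts processing lines, rather than on every call to the map function, and the result is shared through the global `S`. The hypothetical `runTaskSketch` below illustrates only that call order; how often Timothy actually invokes `setup` is an assumption here.

```javascript
// Hypothetical task runner showing only the call order: setup once, then map per line.
function runTaskSketch(lines, setup, map) {
  setup(); // one-time initialization, e.g. loading a library into a global
  lines.forEach((line) => map(line));
}

let setupCalls = 0;
const emitted = [];
runTaskSketch(
  ["lorem ipsum", "dolor"],
  function () {
    setupCalls += 1;
    globalThis.S = (text) => text.toUpperCase(); // stands in for require('string')
  },
  function (line) {
    emitted.push(S(line)); // reads the global assigned in setup
  }
);
console.log(setupCalls, JSON.stringify(emitted)); // 1 ["LOREM IPSUM","DOLOR"]
```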
Method Chaining

require('timothy')
  .configure({ /* ... */ })
  // first job: count identical lines
  .map(function(line) {
    emit(line, 1);
  })
  .reduce(function(line, counts) {
    emit(line, counts.length);
  })
  // second job: sum those counts by first letter
  .map(function(line, count) {
    emit(line[0], count);
  })
  .reduce(function(letter, counts) {
    var sum = counts.reduce(function(a, i) { return a + i; });
    emit(letter, sum);
  })
  .run();
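With chaining, the first job's reducer output becomes the second job's input, so the second mapper receives a `(key, value)` pair instead of a raw line. The in-memory sketch below mimics that hand-off for the two stages on the slide: count identical lines, then sum the counts by first letter. `runStage` and `runChain` are hypothetical names, and unlike the slide code they pass `emit` explicitly instead of using a global.

```javascript
// Run one map/reduce stage in memory; map receives (key, value, emit).
function runStage(pairs, map, reduce) {
  const groups = new Map();
  const collect = (k, v) => {
    if (!groups.has(k)) groups.set(k, []);
    groups.get(k).push(v);
  };
  pairs.forEach(([k, v]) => map(k, v, collect));
  const out = [];
  const emitOut = (k, v) => out.push([k, v]);
  [...groups.keys()].sort().forEach((k) => reduce(k, groups.get(k), emitOut));
  return out;
}

// Hypothetical chain runner: each stage's output feeds the next stage's map.
function runChain(lines, stages) {
  let pairs = lines.map((line) => [line, undefined]); // first map sees raw lines
  for (const { map, reduce } of stages) pairs = runStage(pairs, map, reduce);
  return pairs;
}

const result = runChain(["apple", "avocado", "apple", "banana"], [
  {
    map: (line, _unused, emit) => emit(line, 1),
    reduce: (line, counts, emit) => emit(line, counts.length),
  },
  {
    map: (line, count, emit) => emit(line[0], count),
    reduce: (letter, counts, emit) => emit(letter, counts.reduce((a, i) => a + i)),
  },
]);
console.log(JSON.stringify(result)); // [["a",3],["b",1]]
```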
Other features
• Update job status
• Create and update counters
• Pass environment variables to jobs
• More examples on the GitHub page
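Under Hadoop Streaming, a task updates its status and counters by writing specially formatted lines to stderr: `reporter:status:` followed by a message, and `reporter:counter:` followed by the group, counter name and amount separated by commas. Environment variables are passed with Streaming's `-cmdenv NAME=VALUE` option and show up in Node as `process.env`. Timothy's own helpers for these features are not shown in the slides, so the functions below are hypothetical wrappers around the Streaming conventions, and `MIN_LENGTH` is an invented variable name.

```javascript
// Hypothetical helpers for Hadoop Streaming's stderr reporter protocol.
function statusLine(message) {
  return `reporter:status:${message}`;
}

function counterLine(group, counter, amount) {
  return `reporter:counter:${group},${counter},${amount}`;
}

// In a real task these lines go to stderr, where Hadoop picks them up.
function report(line) {
  process.stderr.write(line + "\n");
}

report(statusLine("processed first split"));
report(counterLine("WordCount", "SkippedWords", 1));

// A value passed with `-cmdenv MIN_LENGTH=3` would be read like this.
const minLength = Number(process.env.MIN_LENGTH || "1");
```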
Motivation
• Big data is now a thing
• Lower the barrier to entry
• Benefits of NodeJS on Hadoop
• Development speed
Limitations
• Setup method cannot block
• Lack of support for lexical scoping
• NodeJS needs to be pre-installed on the slaves
• Probably more we haven't thought of yet!
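The lexical-scoping limitation is what happens when a function has to travel as source text: rebuilding it from `Function.prototype.toString()` keeps the code but discards the variables it closed over. The snippet below demonstrates the effect in a single process; that Timothy ships functions exactly this way is an assumption, not something the slides state.

```javascript
// Build a filter that closes over `minLength`.
function makeFilter(minLength) {
  return function keepLong(word) {
    return word.length >= minLength;
  };
}

const keepLong = makeFilter(3);
console.log(keepLong("hadoop")); // true (the closure sees minLength)

// Simulate shipping the function to another process as source text.
const source = keepLong.toString();
const revived = new Function(`return (${source});`)();

let lostBinding = false;
try {
  revived("hadoop");
} catch (err) {
  lostBinding = err instanceof ReferenceError; // minLength is not defined
}
console.log(lostBinding); // true
```

This is why the slides share state through globals assigned in `.setup` (such as `S`) rather than through closures.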
Improvements
• Bundling local JS scripts
• JSON as the intermediate data format
• JVM support
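JSON as the intermediate format would let structured values survive the trip between map and reduce, where the plain Streaming text protocol delivers every value as a string. A minimal sketch of such an encoding, assuming one line per pair made of the key, a tab, and the JSON-encoded value, and keys without tab or newline characters, could look like this; `encodePair` and `decodePair` are hypothetical names.

```javascript
// Encode a (key, value) pair as one Streaming line: key, a tab, then JSON.
// Assumes keys contain no tab or newline characters.
function encodePair(key, value) {
  return `${key}\t${JSON.stringify(value)}`;
}

// Decode a line produced by encodePair back into [key, value].
function decodePair(line) {
  const tab = line.indexOf("\t");
  return [line.slice(0, tab), JSON.parse(line.slice(tab + 1))];
}

const line = encodePair("lorem", { count: 2, docs: ["a.txt", "b.txt"] });
const [, value] = decodePair(line);
console.log(value.count + 1); // 3 (a number, not the string "21")
```

Because `JSON.stringify` escapes tabs and newlines inside strings, the value part never breaks the one-pair-per-line framing.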
Thank you! https://github.com/forward/timothy