Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Timothy
Search
Sponsored
·
SiteGround - Reliable hosting with speed, security, and support you can count on.
→
abhinay
June 28, 2012
Programming
2
1.5k
Timothy
Write Hadoop Jobs in NodeJS by Antonio Garrote and Abhinay Mehta
abhinay
June 28, 2012
Tweet
Share
Other Decks in Programming
See All in Programming
AgentCoreとHuman in the Loop
har1101
5
220
コマンドとリード間の連携に対する脅威分析フレームワーク
pandayumi
1
440
OCaml 5でモダンな並列プログラミングを Enjoyしよう!
haochenx
0
120
フルサイクルエンジニアリングをAI Agentで全自動化したい 〜構想と現在地〜
kamina_zzz
0
400
AIと一緒にレガシーに向き合ってみた
nyafunta9858
0
160
ZJIT: The Ruby 4 JIT Compiler / Ruby Release 30th Anniversary Party
k0kubun
1
390
そのAIレビュー、レビューしてますか? / Are you reviewing those AI reviews?
rkaga
6
4.5k
Architectural Extensions
denyspoltorak
0
270
20260127_試行錯誤の結晶を1冊に。著者が解説 先輩データサイエンティストからの指南書 / author's_commentary_ds_instructions_guide
nash_efp
0
890
例外処理とどう使い分ける?Result型を使ったエラー設計 #burikaigi
kajitack
16
6k
Rust 製のコードエディタ “Zed” を使ってみた
nearme_tech
PRO
0
140
dchart: charts from deck markup
ajstarks
3
990
Featured
See All Featured
Chasing Engaging Ingredients in Design
codingconduct
0
110
The Web Performance Landscape in 2024 [PerfNow 2024]
tammyeverts
12
1k
Color Theory Basics | Prateek | Gurzu
gurzu
0
190
Paper Plane
katiecoart
PRO
0
46k
Learning to Love Humans: Emotional Interface Design
aarron
275
41k
Winning Ecommerce Organic Search in an AI Era - #searchnstuff2025
aleyda
0
1.8k
Java REST API Framework Comparison - PWX 2021
mraible
34
9.1k
We Have a Design System, Now What?
morganepeng
54
8k
Money Talks: Using Revenue to Get Sh*t Done
nikkihalliwell
0
150
Measuring & Analyzing Core Web Vitals
bluesmoon
9
750
It's Worth the Effort
3n
188
29k
How Fast Is Fast Enough? [PerfNow 2025]
tammyeverts
3
450
Transcript
Timothy https://github.com/forward/timothy Antonio Garrote Abhinay Mehta
Hadoop MapReduce in Node.js
Hadoop • Distributed processing of large data • Derived from
Google MapReduce and GFS • Fast becoming the de facto standard • Large ecosystem • Java • Master/Slave setup
Hadoop Architecture HDFS MapReduce Output Input
• Open Source • Uses Hadoop Streaming API • No
binaries • NPM support Timothy
$ npm install timothy
require('timothy') .configure({ config: './hadoop.xml', input: '/tmp/loremipsum.txt', output: '/tmp/wordcount/', name: 'Timothy
Word Count Example' }) .map(function(line){ line.split(" ").forEach(function(word) { emit(word, 1); }); }) .reduce(function(word,counts){ emit(word, counts.length); }) .run(function(err){ .. }); Word Count
require('timothy') .map(function(line){ line.split(" ").forEach(function(word) { emit(word, 1); }); }) .reduce(function(word,counts){
emit(word, counts.length); }) .runLocal("/local/input/path"); Local Runner
require('timothy') .map(function(line){ line.split(" ").forEach(function(word) { emit(word, 1); }); }) .reduce(function(word,counts){
emit(word, counts.length); }) .runLocal("/local/input/path"); Local Runner
Dependencies require('timothy') .configure({ ... }) .dependencies({'string' : '0.2.1-2'}) .map(function(line){ var
S = require('string'); line.split(" ").forEach(function(word) { if (S(word).isAlphaNumeric()) { emit(word, 1); } }); }) ...
Dependencies require('timothy') .configure({ ... }) .dependencies({'string' : '0.2.1-2'}) .map(function(line){ var
S = require('string'); line.split(" ").forEach(function(word) { if (S(word).isAlphaNumeric()) { emit(word, 1); } }); }) ...
Setup require('timothy') .configure({ ... }) .dependencies({'string' : '0.2.1-2'}) .map(function(line){ line.split("
").forEach(function(word) { if (S(word).isAlphaNumeric()) { emit(word, 1); } }); }) ... .setup(function() { S = require('string'); })
Setup require('timothy') .configure({ ... }) .dependencies({'string' : '0.2.1-2'}) .map(function(line){ line.split("
").forEach(function(word) { if (S(word).isAlphaNumeric()) { emit(word, 1); } }); }) ... .setup(function() { S = require('string'); })
require('timothy') .configure({ ... }) .map(function(line){ emit(line, 1); }) .reduce(function(line,counts){ emit(line,
counts.length); }) .map(function(line, count){ emit(line[0], count); }) .reduce(function(letter, counts){ var sum = counts.reduce(function(a,i) { return a+i; }); emit(letter, sum); }) .run(); Method Chaining
• Update Job Status • Create and update counters •
Pass env vars to jobs • More examples on github page Other features
Motivation • Big data is now a thing • Lower
the barrier to entry • Benefits of NodeJS on Hadoop • Development Speed
Limitations • Setup method cannot block • Lack support for
lexical scoping • NodeJS needs to be pre-installed on slaves • Probably more we haven’t thought of yet!
Improvements • Bundling local JS scripts • JSON for intermediary
data format • JVM support
Thank you! https://github.com/forward/timothy