Timothy
abhinay
June 28, 2012
Programming
Write Hadoop Jobs in NodeJS by Antonio Garrote and Abhinay Mehta
Transcript
Timothy
https://github.com/forward/timothy
Antonio Garrote, Abhinay Mehta
Hadoop MapReduce in Node.js
Hadoop
• Distributed processing of large datasets
• Derived from Google MapReduce and GFS
• Fast becoming the de facto standard for big data processing
• Large ecosystem
• Java
• Master/slave setup
Hadoop Architecture [diagram: Input, HDFS, MapReduce, Output]
Timothy
• Open source
• Uses the Hadoop Streaming API
• No binaries
• NPM support
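The slides do not show Timothy's internals, but the Hadoop Streaming contract it builds on is simple: mapper and reducer are ordinary processes that read lines on standard input and write `key<TAB>value` lines on standard output, and Hadoop sorts the mapper output by key before the reducer reads it, so the reducer has to notice where one key's run of lines ends. The sketch below mimics that contract for a word count in plain Node.js, with no Timothy involved. `streamingMap`, `shuffleSort` and `streamingReduce` are hypothetical names, and `shuffleSort` merely stands in for the sort Hadoop performs between the phases.

```javascript
// Hypothetical sketch of the Hadoop Streaming line protocol for a word count.
// Records are "key<TAB>value" text lines; nothing here is Timothy's own code.

// Mapper: each input line produces zero or more "word\t1" lines.
function streamingMap(inputLines) {
  const out = [];
  for (const line of inputLines) {
    for (const word of line.split(" ")) {
      if (word !== "") out.push(word + "\t1");
    }
  }
  return out;
}

function keyOf(record) {
  return record.slice(0, record.indexOf("\t"));
}

// Stand-in for Hadoop's shuffle/sort: reducer input arrives sorted by key.
function shuffleSort(mappedLines) {
  return mappedLines.slice().sort((a, b) => {
    const ka = keyOf(a);
    const kb = keyOf(b);
    return ka < kb ? -1 : ka > kb ? 1 : 0;
  });
}

// Reducer: lines with the same key are adjacent, so sum until the key changes.
function streamingReduce(sortedLines) {
  const out = [];
  let currentKey = null;
  let total = 0;
  for (const line of sortedLines) {
    const key = keyOf(line);
    const value = Number(line.slice(line.indexOf("\t") + 1)); // values arrive as text
    if (key !== currentKey) {
      if (currentKey !== null) out.push(currentKey + "\t" + total);
      currentKey = key;
      total = 0;
    }
    total += value;
  }
  if (currentKey !== null) out.push(currentKey + "\t" + total);
  return out;
}

const result = streamingReduce(shuffleSort(streamingMap(["to be or", "not to be"])));
console.log(result.join("\n"));
```

In a real Streaming job, each side would instead read `process.stdin` line by line (for example with Node's `readline` module) and print its lines to `process.stdout`; the pure functions just make the data flow easy to inspect.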
$ npm install timothy
Word Count
require('timothy')
  .configure({
    config: './hadoop.xml',
    input: '/tmp/loremipsum.txt',
    output: '/tmp/wordcount/',
    name: 'Timothy Word Count Example'
  })
  .map(function(line){
    line.split(" ").forEach(function(word) {
      emit(word, 1);
    });
  })
  .reduce(function(word, counts){
    emit(word, counts.length);
  })
  .run(function(err){ /* ... */ });
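Why does `counts.length` give the word count? Each `emit(word, 1)` in the map phase adds a 1 to that word's group, and reduce receives the whole group as `counts`. The harness below is a hypothetical in-memory model of that behavior, not Timothy's implementation. Assuming, as the slide code suggests, that `emit` is a free (global) function inside map and reduce, `runInMemory` installs a temporary global `emit` so the slide's functions run unchanged, and it visits keys in sorted order as Hadoop's shuffle would.

```javascript
// Hypothetical in-memory harness (not part of Timothy): runs a map and a
// reduce function over an array of lines, providing the global emit() that
// the slide's functions call.
function runInMemory(lines, mapFn, reduceFn) {
  const groups = new Map(); // key -> array of values emitted for that key
  const results = [];
  const previousEmit = globalThis.emit;
  try {
    // Map phase: collect every emitted value under its key.
    globalThis.emit = (key, value) => {
      if (!groups.has(key)) groups.set(key, []);
      groups.get(key).push(value);
    };
    lines.forEach((line) => mapFn(line));

    // Reduce phase: call reduceFn once per key, in sorted key order.
    globalThis.emit = (key, value) => results.push([key, value]);
    [...groups.keys()].sort().forEach((key) => reduceFn(key, groups.get(key)));
  } finally {
    globalThis.emit = previousEmit;
  }
  return results;
}

// The map and reduce functions from the Word Count slide, unchanged.
const wordCounts = runInMemory(
  ["the quick fox", "the lazy dog"],
  function (line) {
    line.split(" ").forEach(function (word) {
      emit(word, 1);
    });
  },
  function (word, counts) {
    emit(word, counts.length);
  }
);
console.log(wordCounts);
```

Here the group for `the` is `[1, 1]`, so reduce emits `["the", 2]`; every other word has a single-element group.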
Local Runner
require('timothy')
  .map(function(line){
    line.split(" ").forEach(function(word) {
      emit(word, 1);
    });
  })
  .reduce(function(word, counts){
    emit(word, counts.length);
  })
  .runLocal("/local/input/path");
Dependencies
require('timothy')
  .configure({ ... })
  .dependencies({'string' : '0.2.1-2'})
  .map(function(line){
    var S = require('string');
    line.split(" ").forEach(function(word) {
      if (S(word).isAlphaNumeric()) {
        emit(word, 1);
      }
    });
  })
  ...
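To see which words the Dependencies slide's map function keeps, the sketch below replaces `S(word).isAlphaNumeric()` from the `string` package with a small regular expression. This is an approximation, assuming ASCII letters and digits only; the package's own check may treat accented letters or empty strings differently. `isAlphaNumeric` and `alphaNumericWords` are hypothetical helpers, the latter returning the words the mapper would call `emit(word, 1)` for.

```javascript
// Approximation of S(word).isAlphaNumeric(): one or more ASCII letters or
// digits. This is an assumption, not the string package's implementation.
function isAlphaNumeric(word) {
  return /^[A-Za-z0-9]+$/.test(word);
}

// Hypothetical helper: the words the slide's map function would emit.
function alphaNumericWords(line) {
  return line.split(" ").filter(isAlphaNumeric);
}

console.log(alphaNumericWords("Hello, world 42 times!"));
```

Because the line is split only on spaces, words with attached punctuation (`Hello,` and `times!` above) are dropped entirely rather than cleaned up, which is worth keeping in mind when reading the resulting counts.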
Setup
require('timothy')
  .configure({ ... })
  .dependencies({'string' : '0.2.1-2'})
  .map(function(line){
    line.split(" ").forEach(function(word) {
      if (S(word).isAlphaNumeric()) {
        emit(word, 1);
      }
    });
  })
  ...
  .setup(function() {
    S = require('string'); // no var: S becomes a global that map can see
  })
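The Setup slide moves `require('string')` out of map and into `.setup()`. Because `S = require('string')` has no `var`, it assigns a global, which is how the map function can see `S` afterwards. The sketch below is a hypothetical model of that arrangement, assuming setup runs once per task before any line is mapped (the slides do not say exactly when Timothy calls it). `runTask` is an illustrative name, `emit` is passed as an argument here rather than being a global, and a small inline object stands in for the `string` package.

```javascript
// Hypothetical model of a task with a setup hook (not Timothy's code):
// setup runs once, then map runs for every input line.
function runTask(setupFn, mapFn, lines) {
  setupFn();
  const emitted = [];
  for (const line of lines) {
    mapFn(line, (key, value) => emitted.push([key, value]));
  }
  return emitted;
}

let setupCalls = 0;
let helper; // assigned by setup, read by map (like the slide's global S)

const emitted = runTask(
  function setup() {
    setupCalls += 1;
    // Stand-in for S = require('string'): build the helper once.
    helper = { isAlphaNumeric: (word) => /^[A-Za-z0-9]+$/.test(word) };
  },
  function map(line, emit) {
    line.split(" ").forEach((word) => {
      if (helper.isAlphaNumeric(word)) emit(word, 1);
    });
  },
  ["a b!", "c d", "e"]
);
console.log(setupCalls, emitted.length);
```

Setup runs a single time for all three lines, and map emits four pairs (`b!` is filtered out).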
Method Chaining
require('timothy')
  .configure({ ... })
  // Job 1: count identical lines
  .map(function(line){ emit(line, 1); })
  .reduce(function(line, counts){ emit(line, counts.length); })
  // Job 2: total those counts by each line's first character
  .map(function(line, count){ emit(line[0], count); })
  .reduce(function(letter, counts){
    var sum = counts.reduce(function(a, i) { return a + i; });
    emit(letter, sum);
  })
  .run();
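The Method Chaining slide defines two jobs in one chain: the first counts identical lines, and the second groups those counts by each line's first character and sums them. The sketch below models the chaining in memory, assuming, as the second map's `(line, count)` signature suggests, that each `[key, value]` output of one job becomes the arguments of the next job's map. `runJob` and `chain` are hypothetical helpers; the four job functions are the ones from the slide, run with a temporary global `emit`.

```javascript
// Hypothetical in-memory model of one MapReduce job (not Timothy's code).
// Each record is an array of arguments for mapFn; the output is a list of
// [key, value] pairs, so it can feed the next job's map directly.
function runJob(records, mapFn, reduceFn) {
  const groups = new Map();
  const output = [];
  const saved = globalThis.emit;
  try {
    globalThis.emit = (k, v) => {
      if (!groups.has(k)) groups.set(k, []);
      groups.get(k).push(v);
    };
    records.forEach((args) => mapFn(...args));
    globalThis.emit = (k, v) => output.push([k, v]);
    [...groups.keys()].sort().forEach((k) => reduceFn(k, groups.get(k)));
  } finally {
    globalThis.emit = saved;
  }
  return output;
}

// Run the jobs in order, feeding each job's output into the next one.
function chain(lines, jobs) {
  return jobs.reduce(
    (records, [mapFn, reduceFn]) => runJob(records, mapFn, reduceFn),
    lines.map((line) => [line])
  );
}

const byLetter = chain(
  ["apple", "avocado", "apple", "banana"],
  [
    [
      function (line) { emit(line, 1); },
      function (line, counts) { emit(line, counts.length); },
    ],
    [
      function (line, count) { emit(line[0], count); },
      function (letter, counts) {
        var sum = counts.reduce(function (a, i) { return a + i; });
        emit(letter, sum);
      },
    ],
  ]
);
console.log(byLetter);
```

Job 1 yields `apple 2`, `avocado 1`, `banana 1`; job 2 turns those into `a 3` and `b 1`. In this model the counts stay numbers. Over Hadoop Streaming, values travel between jobs as text, so whether `a + i` adds or concatenates depends on how Timothy decodes them, which the slides do not show.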
Other features
• Update job status
• Create and update counters
• Pass environment variables to jobs
• More examples on the GitHub page
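The slides list these features without showing the Timothy calls behind them. At the Hadoop Streaming level, which Timothy uses, a task updates its status and counters by writing specially formatted lines to standard error (`reporter:status:<message>` and `reporter:counter:<group>,<counter>,<amount>`), and environment variables can be handed to tasks with the streaming `-cmdenv NAME=value` option. The sketch below only builds those stderr lines and reads an environment variable; `counterLine`, `statusLine` and `readJobSetting` are hypothetical helpers, not Timothy's API.

```javascript
// Lines Hadoop Streaming recognizes on a task's stderr:
//   reporter:counter:<group>,<counter>,<amount>
//   reporter:status:<message>
// The helper names are hypothetical; Timothy's own API is not shown here.
function counterLine(group, counter, amount) {
  return `reporter:counter:${group},${counter},${amount}`;
}

function statusLine(message) {
  return `reporter:status:${message}`;
}

// Variables passed with -cmdenv NAME=value appear in process.env inside the
// task; fall back to a default when the variable is not set.
function readJobSetting(name, fallback) {
  return process.env[name] !== undefined ? process.env[name] : fallback;
}

// Inside a streaming task these lines would go to stderr, for example:
//   process.stderr.write(counterLine("WordCount", "SkippedWords", 1) + "\n");
console.log(counterLine("WordCount", "SkippedWords", 1));
console.log(statusLine("processed 1000 lines"));
```

Writing to stderr rather than stdout matters: anything a task prints on stdout is treated as output records.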
Motivation
• Big data is now a thing
• Lower the barrier to entry
• Benefits of NodeJS on Hadoop
• Development speed
Limitations
• Setup method cannot block
• Lack of support for lexical scoping
• NodeJS needs to be pre-installed on slaves
• Probably more we haven’t thought of yet!
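One likely reason for the lexical-scoping limitation, assuming Timothy ships map and reduce functions to the cluster as source text (the slides do not say how it packages them): a function's source does not carry the variables it closes over. The sketch rebuilds a function from its `toString()` output with the `Function` constructor, which compiles code in the global scope, and shows the captured variable disappearing. `rehydrate`, `makeMapper` and `tryMapper` are hypothetical names. This is also why the Setup slide assigns `S` as a global instead of relying on an enclosing scope.

```javascript
// Rebuild a function from its source text, as if it had been shipped to
// another machine. Code compiled by the Function constructor runs in the
// global scope, so it cannot see the original function's closure.
function rehydrate(fn) {
  return new Function("return (" + fn.toString() + ");")();
}

function makeMapper() {
  const separator = " "; // captured by the closure below
  return function (line) {
    return line.split(separator);
  };
}

const localMapper = makeMapper();
const shippedMapper = rehydrate(localMapper);

// Returns the mapper's output, or the name of the error it throws.
function tryMapper(mapper, line) {
  try {
    return mapper(line);
  } catch (e) {
    return e.name;
  }
}

console.log(tryMapper(localMapper, "a b")); // [ 'a', 'b' ]
console.log(tryMapper(shippedMapper, "a b")); // ReferenceError
```

The rebuilt function has identical source, yet `separator` is no longer in scope, so the call fails with a `ReferenceError`.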
Improvements
• Bundling local JS scripts
• JSON for the intermediate data format
• JVM support
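The slides name JSON for the intermediate data format as a planned improvement without giving details. One way such an encoding could look, sketched here purely as an assumption: keep Streaming's `key<TAB>value` line layout but write the key and the value as JSON, so that numbers, arrays and objects survive the trip between map and reduce instead of arriving as plain text. `JSON.stringify` escapes tab and newline characters inside strings, so the first literal tab on a line always separates key from value. `encodeRecord` and `decodeRecord` are hypothetical names.

```javascript
// Hypothetical JSON encoding for intermediate key/value records.
function encodeRecord(key, value) {
  // JSON.stringify escapes control characters such as \t and \n, so the
  // encoded key and value never contain a literal tab or newline.
  return JSON.stringify(key) + "\t" + JSON.stringify(value);
}

function decodeRecord(line) {
  const tab = line.indexOf("\t");
  return [JSON.parse(line.slice(0, tab)), JSON.parse(line.slice(tab + 1))];
}

const line = encodeRecord("tab\there", { count: 2, tags: ["x"] });
const [key, value] = decodeRecord(line);
console.log(key === "tab\there", value.count + 1); // true 3
```

With values decoded as JSON, a reducer like the Method Chaining slide's `a + i` would add numbers rather than concatenate strings.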
Thank you! https://github.com/forward/timothy