Timothy
abhinay
June 28, 2012
Write Hadoop Jobs in NodeJS by Antonio Garrote and Abhinay Mehta
Transcript
Timothy https://github.com/forward/timothy Antonio Garrote Abhinay Mehta
Hadoop MapReduce in Node.js
Hadoop
• Distributed processing of large data
• Derived from Google MapReduce and GFS
• Fast becoming the de facto standard
• Large ecosystem
• Java
• Master/Slave setup
Hadoop Architecture (diagram labels: Input, HDFS, MapReduce, Output)
Timothy
• Open Source
• Uses Hadoop Streaming API
• No binaries
• NPM support
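The Hadoop Streaming API mentioned above lets any executable act as a mapper or reducer: records arrive as lines on stdin, results leave as tab-separated key/value lines on stdout, and the reducer's input arrives sorted by key. A minimal sketch of that contract in plain Node.js might look like the following. The names `mapLine` and `reduceSortedLines` are hypothetical and are not part of Timothy, whose generated scripts are not shown in the slides.

```javascript
// Sketch of the Hadoop Streaming line protocol (not Timothy's actual code).

// Mapper side: turn one input line into "key\tvalue" output lines.
function mapLine(line) {
  return line
    .split(/\s+/)
    .filter(function (word) { return word.length > 0; })
    .map(function (word) { return word + "\t1"; });
}

// Reducer side: Streaming delivers lines sorted by key, so a reducer
// only has to notice a key change to know the previous group is complete.
function reduceSortedLines(sortedLines) {
  var out = [];
  var currentKey = null;
  var sum = 0;
  sortedLines.forEach(function (line) {
    var tab = line.indexOf("\t");
    var key = line.slice(0, tab);
    var value = Number(line.slice(tab + 1));
    if (currentKey !== null && key !== currentKey) {
      out.push(currentKey + "\t" + sum);
      sum = 0;
    }
    currentKey = key;
    sum += value;
  });
  if (currentKey !== null) out.push(currentKey + "\t" + sum);
  return out;
}

// Stand-in for Hadoop's shuffle: sort the mapper output by key.
var mapped = mapLine("b a b").sort();
var reduced = reduceSortedLines(mapped);
console.log(reduced); // [ 'a\t1', 'b\t2' ]
```

In a real streaming task the same functions would be fed from `process.stdin` and their results written to `process.stdout`. The in-memory arrays here only keep the sketch self-contained.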
$ npm install timothy
Word Count
require('timothy')
  .configure({
    config: './hadoop.xml',
    input: '/tmp/loremipsum.txt',
    output: '/tmp/wordcount/',
    name: 'Timothy Word Count Example'
  })
  .map(function(line){
    line.split(" ").forEach(function(word) { emit(word, 1); });
  })
  .reduce(function(word, counts){
    emit(word, counts.length);
  })
  .run(function(err){ .. });
Local Runner
require('timothy')
  .map(function(line){
    line.split(" ").forEach(function(word) { emit(word, 1); });
  })
  .reduce(function(word, counts){
    emit(word, counts.length);
  })
  .runLocal("/local/input/path");
Dependencies
require('timothy')
  .configure({ ... })
  .dependencies({'string' : '0.2.1-2'})
  .map(function(line){
    var S = require('string');
    line.split(" ").forEach(function(word) {
      if (S(word).isAlphaNumeric()) { emit(word, 1); }
    });
  })
  ...
Setup
require('timothy')
  .configure({ ... })
  .dependencies({'string' : '0.2.1-2'})
  .map(function(line){
    line.split(" ").forEach(function(word) {
      if (S(word).isAlphaNumeric()) { emit(word, 1); }
    });
  })
  ...
  .setup(function() { S = require('string'); })
Method Chaining
require('timothy')
  .configure({ ... })
  .map(function(line){ emit(line, 1); })
  .reduce(function(line, counts){ emit(line, counts.length); })
  .map(function(line, count){ emit(line[0], count); })
  .reduce(function(letter, counts){
    var sum = counts.reduce(function(a, i) { return a + i; });
    emit(letter, sum);
  })
  .run();
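To make the data flow of the chained job concrete, here is a minimal in-memory sketch in plain JavaScript. It assumes that each key and value emitted by a reduce stage become the arguments of the next map stage, which is what the `function(line, count)` signature on the slide suggests. `runStage` and the explicit `emit` parameter are hypothetical helpers for illustration only. On the slides `emit` is called as a free function inside the job, and nothing here runs on Hadoop.

```javascript
// In-memory stand-in for one map/shuffle/reduce stage (illustrative only).
// records: array of [key, value] pairs; for raw text lines the value is undefined.
function runStage(records, mapFn, reduceFn) {
  var groups = Object.create(null);     // key -> array of emitted values
  records.forEach(function (rec) {
    mapFn(rec[0], rec[1], function emit(k, v) {
      (groups[k] = groups[k] || []).push(v);
    });
  });
  var output = [];
  // Visit keys in sorted order, as a reducer would after the shuffle.
  Object.keys(groups).sort().forEach(function (k) {
    reduceFn(k, groups[k], function emit(key, value) {
      output.push([key, value]);
    });
  });
  return output;
}

var input = ["apple", "avocado", "banana", "apple"].map(function (line) {
  return [line, undefined];
});

// Stage 1: count identical lines.
var lineCounts = runStage(
  input,
  function (line, _unused, emit) { emit(line, 1); },
  function (line, counts, emit) { emit(line, counts.length); }
);

// Stage 2: sum those counts per first letter.
var letterTotals = runStage(
  lineCounts,
  function (line, count, emit) { emit(line[0], count); },
  function (letter, counts, emit) {
    emit(letter, counts.reduce(function (a, i) { return a + i; }));
  }
);

console.log(lineCounts);   // [ [ 'apple', 2 ], [ 'avocado', 1 ], [ 'banana', 1 ] ]
console.log(letterTotals); // [ [ 'a', 3 ], [ 'b', 1 ] ]
```

The first stage can use `counts.length` because every emitted value is 1. The second stage has to sum, because its values are already counts.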
Other features
• Update Job Status
• Create and update counters
• Pass env vars to jobs
• More examples on github page
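Timothy's own calls for status and counter updates are not shown on the slide. Underneath, Hadoop Streaming accepts such updates as specially formatted lines written to stderr: `reporter:counter:<group>,<counter>,<amount>` and `reporter:status:<message>`. The sketch below only formats those lines. `counterLine` and `statusLine` are hypothetical helper names rather than Timothy functions, and the group and counter names are made up for the example.

```javascript
// Format the stderr lines that Hadoop Streaming interprets as
// counter and status updates (helper names are illustrative).
function counterLine(group, counter, amount) {
  return "reporter:counter:" + group + "," + counter + "," + amount;
}

function statusLine(message) {
  return "reporter:status:" + message;
}

var exampleCounter = counterLine("WordCount", "SkippedWords", 1);
var exampleStatus = statusLine("processing input");

// Inside a streaming task these would go to stderr, for example:
// process.stderr.write(exampleCounter + "\n");
console.log(exampleCounter); // reporter:counter:WordCount,SkippedWords,1
console.log(exampleStatus);  // reporter:status:processing input
```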
Motivation
• Big data is now a thing
• Lower the barrier to entry
• Benefits of NodeJS on Hadoop
• Development Speed
Limitations
• Setup method cannot block
• Lacks support for lexical scoping
• NodeJS needs to be pre-installed on slaves
• Probably more we haven’t thought of yet!
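The lexical-scoping limitation is what one would expect if the map and reduce functions are shipped to the cluster as source text rather than as live closures. That mechanism is an assumption here, since the slides do not describe it. The sketch below reproduces the effect in plain JavaScript: rebuilding a function from its `toString()` output drops any variables it captured from the enclosing scope. `shipAsSource` is a hypothetical helper.

```javascript
// Rebuild a function from its source text, the way a job runner might
// after copying that text to another machine (assumed mechanism).
function shipAsSource(fn) {
  return new Function("return (" + fn.toString() + ");")();
}

function makeFilter() {
  var stopWord = "the";                 // captured by the closure below
  return function (word) { return word !== stopWord; };
}

var local = makeFilter();
var shipped = shipAsSource(local);

var localResult = local("the");         // false: the closure still sees stopWord

var shippedError = null;
try {
  shipped("the");                       // stopWord no longer exists in scope
} catch (e) {
  shippedError = e;
}

console.log(localResult);               // false
console.log(shippedError.name);         // ReferenceError
```

If Timothy does work this way, it would also explain why the Setup slide assigns `S` without `var`: a global created in `setup` stays visible to a function rebuilt from source, whereas a captured local does not.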
Improvements
• Bundling local JS scripts
• JSON for intermediary data format
• JVM support
Thank you! https://github.com/forward/timothy