Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Timothy
Search
abhinay
June 28, 2012
Programming
2
1.5k
Timothy
Write Hadoop Jobs in NodeJS by Antonio Garrote and Abhinay Mehta
abhinay
June 28, 2012
Tweet
Share
Other Decks in Programming
See All in Programming
組込みだけじゃない!TinyGo で始める無料クラウド開発入門
otakakot
2
370
iOSでSVG画像を扱う
kishikawakatsumi
0
160
React Nativeならぬ"Vue Native"が実現するかも?_新世代マルチプラットフォーム開発フレームワークのLynxとLynxのVue.js対応を追ってみよう_Vue Lynx
yut0naga1_fa
2
580
Ktorで簡単AIアプリケーション
tsukakei
0
100
AI駆動で0→1をやって見えた光と伸びしろ
passion0102
1
820
Migration to Signals, Resource API, and NgRx Signal Store
manfredsteyer
PRO
0
110
あなたとKaigi on Rails / Kaigi on Rails + You
shimoju
0
180
Foundation Modelsを実装日本語学習アプリを作ってみた!
hypebeans
1
120
AI Agent 時代的開發者生存指南
eddie
4
2.1k
The Past, Present, and Future of Enterprise Java
ivargrimstad
0
450
はじめてのDSPy - 言語モデルを『プロンプト』ではなく『プログラミング』するための仕組み
masahiro_nishimi
4
14k
Flutterで分数(Fraction)を表示する方法
koukimiura
0
140
Featured
See All Featured
Connecting the Dots Between Site Speed, User Experience & Your Business [WebExpo 2025]
tammyeverts
10
610
Stop Working from a Prison Cell
hatefulcrawdad
272
21k
How To Stay Up To Date on Web Technology
chriscoyier
791
250k
Into the Great Unknown - MozCon
thekraken
40
2.1k
Reflections from 52 weeks, 52 projects
jeffersonlam
353
21k
The Web Performance Landscape in 2024 [PerfNow 2024]
tammyeverts
10
890
How to Ace a Technical Interview
jacobian
280
24k
ピンチをチャンスに:未来をつくるプロダクトロードマップ #pmconf2020
aki_iinuma
127
54k
RailsConf 2023
tenderlove
30
1.3k
The Invisible Side of Design
smashingmag
302
51k
Music & Morning Musume
bryan
46
6.9k
Java REST API Framework Comparison - PWX 2021
mraible
34
8.9k
Transcript
Timothy https://github.com/forward/timothy Antonio Garrote Abhinay Mehta
Hadoop MapReduce in Node.js
Hadoop • Distributed processing of large data • Derived from
Google MapReduce and GFS • Fast becoming the de facto standard • Large ecosystem • Java • Master/Slave setup
Hadoop Architecture HDFS MapReduce Output Input
• Open Source • Uses Hadoop Streaming API • No
binaries • NPM support Timothy
$ npm install timothy
require('timothy') .configure({ config: './hadoop.xml', input: '/tmp/loremipsum.txt', output: '/tmp/wordcount/', name: 'Timothy
Word Count Example' }) .map(function(line){ line.split(" ").forEach(function(word) { emit(word, 1); }); }) .reduce(function(word,counts){ emit(word, counts.length); }) .run(function(err){ .. }); Word Count
require('timothy') .map(function(line){ line.split(" ").forEach(function(word) { emit(word, 1); }); }) .reduce(function(word,counts){
emit(word, counts.length); }) .runLocal("/local/input/path"); Local Runner
require('timothy') .map(function(line){ line.split(" ").forEach(function(word) { emit(word, 1); }); }) .reduce(function(word,counts){
emit(word, counts.length); }) .runLocal("/local/input/path"); Local Runner
Dependencies require('timothy') .configure({ ... }) .dependencies({'string' : '0.2.1-2'}) .map(function(line){ var
S = require('string'); line.split(" ").forEach(function(word) { if (S(word).isAlphaNumeric()) { emit(word, 1); } }); }) ...
Dependencies require('timothy') .configure({ ... }) .dependencies({'string' : '0.2.1-2'}) .map(function(line){ var
S = require('string'); line.split(" ").forEach(function(word) { if (S(word).isAlphaNumeric()) { emit(word, 1); } }); }) ...
Setup require('timothy') .configure({ ... }) .dependencies({'string' : '0.2.1-2'}) .map(function(line){ line.split("
").forEach(function(word) { if (S(word).isAlphaNumeric()) { emit(word, 1); } }); }) ... .setup(function() { S = require('string'); })
Setup require('timothy') .configure({ ... }) .dependencies({'string' : '0.2.1-2'}) .map(function(line){ line.split("
").forEach(function(word) { if (S(word).isAlphaNumeric()) { emit(word, 1); } }); }) ... .setup(function() { S = require('string'); })
require('timothy') .configure({ ... }) .map(function(line){ emit(line, 1); }) .reduce(function(line,counts){ emit(line,
counts.length); }) .map(function(line, count){ emit(line[0], count); }) .reduce(function(letter, counts){ var sum = counts.reduce(function(a,i) { return a+i; }); emit(letter, sum); }) .run(); Method Chaining
• Update Job Status • Create and update counters •
Pass env vars to jobs • More examples on github page Other features
Motivation • Big data is now a thing • Lower
the barrier to entry • Benefits of NodeJS on Hadoop • Development Speed
Limitations • Setup method cannot block • Lack support for
lexical scoping • NodeJS needs to be pre-installed on slaves • Probably more we haven’t thought of yet!
Improvements • Bundling local JS scripts • JSON for intermediary
data format • JVM support
Thank you! https://github.com/forward/timothy