Timothy

abhinay
June 28, 2012

Write Hadoop Jobs in NodeJS by Antonio Garrote and Abhinay Mehta

Transcript

  1. Hadoop
    • Distributed processing of large data
    • Derived from Google MapReduce and GFS
    • Fast becoming the de facto standard
    • Large ecosystem
    • Java
    • Master/Slave setup
  2. Timothy
    • Open Source
    • Uses Hadoop Streaming API
    • No binaries
    • NPM support
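
Hadoop Streaming runs any executable as a mapper or reducer: input records arrive as lines on stdin, and the program writes tab-separated key/value lines to stdout. The sketch below shows that contract in plain Node, without Timothy. `mapLines` is a hypothetical helper name, and how Timothy implements this internally is not shown in the slides.

```javascript
// Hypothetical helper (not part of Timothy): turns input lines into the
// tab-separated "key<TAB>value" records a streaming mapper writes to stdout.
function mapLines(lines) {
  var records = [];
  lines.forEach(function (line) {
    line.split(/\s+/).forEach(function (word) {
      if (word.length > 0) {
        records.push(word + "\t" + 1);
      }
    });
  });
  return records;
}

// In a real streaming mapper the lines would be read from process.stdin;
// a fixed array is used here so the sketch runs on its own.
console.log(mapLines(["lorem ipsum", "lorem"]).join("\n"));
```

Keeping the record formatting in a pure function makes it easy to test without a cluster.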
  3. Word Count
    require('timothy')
      .configure({
        config: './hadoop.xml',
        input: '/tmp/loremipsum.txt',
        output: '/tmp/wordcount/',
        name: 'Timothy Word Count Example'
      })
      .map(function(line){
        line.split(" ").forEach(function(word) {
          emit(word, 1);
        });
      })
      .reduce(function(word, counts){
        emit(word, counts.length);
      })
      .run(function(err){ .. });
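
To see why `counts.length` yields the word count, here is a minimal local simulation of the map, shuffle, and reduce steps. `runLocal` is a hypothetical name and not part of Timothy. Unlike the slide, where `emit` is available globally, this sketch passes `emit` to the callbacks as an extra argument.

```javascript
// Hypothetical in-memory runner: map every line, group values by key
// (the "shuffle"), then reduce each group.
function runLocal(lines, mapFn, reduceFn) {
  var groups = Object.create(null); // key -> list of emitted values
  var output = {};

  lines.forEach(function (line) {
    mapFn(line, function (key, value) {
      if (!groups[key]) {
        groups[key] = [];
      }
      groups[key].push(value);
    });
  });

  Object.keys(groups).sort().forEach(function (key) {
    reduceFn(key, groups[key], function (k, v) {
      output[k] = v;
    });
  });
  return output;
}

var wordCounts = runLocal(
  ["lorem ipsum dolor", "ipsum lorem", "lorem"],
  function (line, emit) {
    line.split(" ").forEach(function (word) {
      emit(word, 1);
    });
  },
  function (word, counts, emit) {
    // Every mapped value is 1, so the number of values is the count.
    emit(word, counts.length);
  }
);

console.log(wordCounts);
```

Using `counts.length` only works because each emitted value is exactly 1; summing the values would be the more general reducer.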
  4. Dependencies
    require('timothy')
      .configure({ ... })
      .dependencies({'string' : '0.2.1-2'})
      .map(function(line){
        var S = require('string');
        line.split(" ").forEach(function(word) {
          if (S(word).isAlphaNumeric()) {
            emit(word, 1);
          }
        });
      })
      ...
  6. Setup
    require('timothy')
      .configure({ ... })
      .dependencies({'string' : '0.2.1-2'})
      .map(function(line){
        line.split(" ").forEach(function(word) {
          if (S(word).isAlphaNumeric()) {
            emit(word, 1);
          }
        });
      })
      ...
      .setup(function() {
        S = require('string');
      })
  8. Method Chaining
    require('timothy')
      .configure({ ... })
      .map(function(line){
        emit(line, 1);
      })
      .reduce(function(line, counts){
        emit(line, counts.length);
      })
      .map(function(line, count){
        emit(line[0], count);
      })
      .reduce(function(letter, counts){
        var sum = counts.reduce(function(a, i) { return a + i; });
        emit(letter, sum);
      })
      .run();
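
The chained job feeds the output of the first map/reduce pair into the second. A rough local sketch of that data flow follows. `runStage` is a hypothetical helper, `emit` is passed as an argument rather than being global, and the sample input is made up for illustration.

```javascript
// Hypothetical helper: runs one map/reduce stage over [key, value] pairs
// and returns the reduced output as [key, value] pairs.
function runStage(pairs, mapFn, reduceFn) {
  var groups = Object.create(null);
  var out = [];
  pairs.forEach(function (pair) {
    mapFn(pair[0], pair[1], function (key, value) {
      if (!groups[key]) {
        groups[key] = [];
      }
      groups[key].push(value);
    });
  });
  Object.keys(groups).sort().forEach(function (key) {
    reduceFn(key, groups[key], function (k, v) {
      out.push([k, v]);
    });
  });
  return out;
}

// Illustrative input: one line per record, no value yet.
var input = [["apple", null], ["avocado", null], ["apple", null], ["banana", null]];

// Stage 1: count identical lines.
var lineCounts = runStage(
  input,
  function (line, unused, emit) { emit(line, 1); },
  function (line, counts, emit) { emit(line, counts.length); }
);

// Stage 2: sum those counts per first letter.
var letterTotals = runStage(
  lineCounts,
  function (line, count, emit) { emit(line[0], count); },
  function (letter, counts, emit) {
    emit(letter, counts.reduce(function (a, i) { return a + i; }, 0));
  }
);

console.log(letterTotals);
```

The second reducer sums numbers here. On a real cluster, values that pass between stages as text may need converting with `Number()` first; the slides do not say how Timothy handles this.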
  9. Other features
    • Update job status
    • Create and update counters
    • Pass env vars to jobs
    • More examples on the GitHub page
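
In Hadoop Streaming, a task reports status and counters by writing specially formatted lines to stderr. The sketch below builds those lines by hand. The helper names are hypothetical, and whether Timothy uses exactly this mechanism is an assumption, since the slides only list the features.

```javascript
// Hypothetical helpers that format the Hadoop Streaming reporter lines.
function counterLine(group, counter, amount) {
  return "reporter:counter:" + group + "," + counter + "," + amount;
}

function statusLine(message) {
  return "reporter:status:" + message;
}

// A streaming task would write these to stderr while it processes records.
// The group and counter names are made up for illustration.
process.stderr.write(counterLine("WordCount", "SkippedWords", 1) + "\n");
process.stderr.write(statusLine("processing input") + "\n");
```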
  10. Motivation
    • Big data is now a thing
    • Lower the barrier to entry
    • Benefits of NodeJS on Hadoop
    • Development Speed
  11. Limitations
    • Setup method cannot block
    • Lacks support for lexical scoping
    • NodeJS needs to be pre-installed on slaves
    • Probably more we haven’t thought of yet!
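
The lexical scoping limitation is easiest to see in a small experiment. Assuming map and reduce functions are shipped to the cluster as source text and rebuilt there (an assumption, as the slides do not describe the mechanism), any variable captured from an enclosing scope is lost. All names below are illustrative.

```javascript
function makeMapper() {
  var separator = " "; // captured by the closure below
  return function (line) {
    return line.split(separator);
  };
}

var mapper = makeMapper();
console.log(mapper("a b")); // works: the closure still sees `separator`

// Simulate shipping the function as text and rebuilding it elsewhere.
var rebuilt = new Function("return (" + mapper.toString() + ");")();

var outcome;
try {
  rebuilt("a b");
  outcome = "ok";
} catch (e) {
  // `separator` no longer exists in the rebuilt function's scope.
  outcome = e.name;
}
console.log(outcome);
```

The rebuilt function fails with a `ReferenceError`, which is why values a job needs have to be created inside the function itself or in a setup step rather than captured from outside.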