Timothy

abhinay
June 28, 2012

Write Hadoop Jobs in NodeJS by Antonio Garrote and Abhinay Mehta

Transcript

  1. Hadoop
     • Distributed processing of large data
     • Derived from Google MapReduce and GFS
     • Fast becoming the de facto standard
     • Large ecosystem
     • Java
     • Master/Slave setup
  2. Timothy
     • Open Source
     • Uses Hadoop Streaming API
     • No binaries
     • NPM support
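
For context on the Hadoop Streaming bullet: Streaming runs an ordinary executable as the mapper or reducer, passes it input records as lines of text on stdin, and reads tab-separated key/value lines from its stdout. The sketch below shows the mapper logic one would otherwise write by hand for a word count. It is only an illustration of the protocol, not the code Timothy generates, and `mapLine` and `mapText` are hypothetical names.

```javascript
// Hand-written Hadoop Streaming mapper logic (a sketch, not Timothy's output).
// Streaming expects one "key<TAB>value" record per output line.
function mapLine(line) {
  return line
    .split(/\s+/)                                        // split on whitespace
    .filter(function (word) { return word.length > 0; }) // drop empty tokens
    .map(function (word) { return word + '\t1'; });      // emit "word<TAB>1"
}

// Hypothetical helper: apply the mapper to a chunk of input text.
function mapText(text) {
  var records = [];
  text.split('\n').forEach(function (line) {
    mapLine(line).forEach(function (record) { records.push(record); });
  });
  return records;
}
```

In a real streaming script, each record returned by `mapLine` would be written to `process.stdout` as lines arrive on `process.stdin`.
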
  3. Word Count

       require('timothy')
         .configure({
           config: './hadoop.xml',
           input: '/tmp/loremipsum.txt',
           output: '/tmp/wordcount/',
           name: 'Timothy Word Count Example'
         })
         .map(function(line){
           line.split(" ").forEach(function(word) {
             emit(word, 1);
           });
         })
         .reduce(function(word,counts){
           emit(word, counts.length);
         })
         .run(function(err){ .. });
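
To see why the reducer can simply emit `counts.length`, it helps to run the same map and reduce functions through a small in-memory stand-in for the shuffle step. This is a minimal sketch: `runLocally` is a hypothetical helper, not part of Timothy, and `emit` is passed in as an argument here, whereas the slide uses it as a global.

```javascript
// Local stand-in for the map -> shuffle -> reduce flow (hypothetical helper).
function runLocally(lines, mapFn, reduceFn) {
  var groups = {};   // shuffle: key -> list of emitted values
  var results = {};  // reducer output: key -> value

  // Map phase: collect every emitted pair under its key.
  lines.forEach(function (line) {
    mapFn(line, function emit(key, value) {
      if (!Object.prototype.hasOwnProperty.call(groups, key)) {
        groups[key] = [];
      }
      groups[key].push(value);
    });
  });

  // Reduce phase: each key arrives together with all of its values.
  Object.keys(groups).forEach(function (key) {
    reduceFn(key, groups[key], function emit(k, v) { results[k] = v; });
  });
  return results;
}

var wordCounts = runLocally(
  ['the quick fox', 'the lazy dog'],
  function (line, emit) {
    line.split(' ').forEach(function (word) { emit(word, 1); });
  },
  function (word, counts, emit) { emit(word, counts.length); }
);
console.log(wordCounts);
```

Because every mapped value is `1`, the number of values collected for a word equals its count, so taking the array length is equivalent to summing.
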
  4. Dependencies

       require('timothy')
         .configure({ ... })
         .dependencies({'string' : '0.2.1-2'})
         .map(function(line){
           var S = require('string');
           line.split(" ").forEach(function(word) {
             if (S(word).isAlphaNumeric()) {
               emit(word, 1);
             }
           });
         })
         ...
  6. Setup

       require('timothy')
         .configure({ ... })
         .dependencies({'string' : '0.2.1-2'})
         .map(function(line){
           line.split(" ").forEach(function(word) {
             if (S(word).isAlphaNumeric()) {
               emit(word, 1);
             }
           });
         })
         ...
         .setup(function() {
           S = require('string');
         })
  8. Method Chaining

       require('timothy')
         .configure({ ... })
         .map(function(line){
           emit(line, 1);
         })
         .reduce(function(line,counts){
           emit(line, counts.length);
         })
         .map(function(line, count){
           emit(line[0], count);
         })
         .reduce(function(letter, counts){
           var sum = counts.reduce(function(a,i) { return a+i; });
           emit(letter, sum);
         })
         .run();
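
The chained job above feeds the output of the first reduce into the second map. A rough way to follow the data through both stages is to run them in memory. In the sketch below, `runStage` is a hypothetical helper rather than Timothy's API, the sample input is made up, and `emit` is again passed as an argument instead of being a global.

```javascript
// Hypothetical local runner for one map/reduce stage over [key, value] records.
function runStage(records, mapFn, reduceFn) {
  var groups = Object.create(null); // key -> values, with no inherited keys
  records.forEach(function (record) {
    mapFn(record[0], record[1], function emit(key, value) {
      (groups[key] = groups[key] || []).push(value);
    });
  });
  var output = [];
  Object.keys(groups).forEach(function (key) {
    reduceFn(key, groups[key], function emit(k, v) { output.push([k, v]); });
  });
  return output;
}

// Made-up input: one line per record, with no value yet.
var input = ['apple', 'avocado', 'apple', 'banana'].map(function (line) {
  return [line, null];
});

// Stage 1: count identical lines.
var lineCounts = runStage(
  input,
  function (line, unused, emit) { emit(line, 1); },
  function (line, counts, emit) { emit(line, counts.length); }
);

// Stage 2: total those counts per first letter.
var letterTotals = runStage(
  lineCounts,
  function (line, count, emit) { emit(line[0], count); },
  function (letter, counts, emit) {
    emit(letter, counts.reduce(function (a, i) { return a + i; }));
  }
);
console.log(letterTotals);
```

Calling `reduce` without an initial value is safe here because a reducer is only invoked for keys that have at least one value.
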
  9. Other features
     • Update Job Status
     • Create and update counters
     • Pass env vars to jobs
     • More examples on github page
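
The slides do not show Timothy's own calls for status and counter updates, so the sketch below illustrates only the underlying Hadoop Streaming convention: a task reports by writing specially formatted lines to stderr. The function names, the `WordCount` group, and the `SkippedWords` counter are hypothetical.

```javascript
// Hadoop Streaming reads these line formats from a task's stderr:
//   reporter:counter:<group>,<counter>,<amount>
//   reporter:status:<message>
function counterLine(group, counter, amount) {
  return 'reporter:counter:' + group + ',' + counter + ',' + amount;
}

function statusLine(message) {
  return 'reporter:status:' + message;
}

// Hypothetical helper: write a reporter line to stderr, or to any
// stream-like object passed in (useful for testing).
function report(line, stream) {
  (stream || process.stderr).write(line + '\n');
}

// Example use inside a mapper:
//   report(counterLine('WordCount', 'SkippedWords', 1));
//   report(statusLine('processing chunk 3'));
```
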
  10. Motivation
      • Big data is now a thing
      • Lower the barrier to entry
      • Benefits of NodeJS on Hadoop
      • Development Speed
  11. Limitations
      • Setup method cannot block
      • Lacks support for lexical scoping
      • NodeJS needs to be pre-installed on slaves
      • Probably more we haven’t thought of yet!
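
The lexical scoping limitation can be illustrated with plain JavaScript. The slides do not say how the map and reduce functions reach the cluster, so this sketch assumes they are shipped as source text and rebuilt in another process. Under that assumption, any variable captured from an enclosing scope is lost. All names below are hypothetical.

```javascript
// A function that closes over a variable from its enclosing scope.
function makeFilter() {
  var capturedStopWord = 'the'; // exists only inside this closure
  return function (word) { return word !== capturedStopWord; };
}

var local = makeFilter();
var localResult = local('the'); // the closure is intact here

// Simulate shipping the function elsewhere as source text and rebuilding it.
var rebuilt = new Function('return (' + local.toString() + ');')();

var remoteError = null;
try {
  rebuilt('the'); // capturedStopWord no longer exists in this scope
} catch (err) {
  remoteError = err;
}

// Workaround: keep everything the function needs inside its own body.
var selfContained = function (word) {
  var stopWord = 'the';
  return word !== stopWord;
};
var rebuiltSelfContained =
  new Function('return (' + selfContained.toString() + ');')();
var workaroundResult = rebuiltSelfContained('the');
```

The rebuilt closure fails with a `ReferenceError`, while the self-contained version behaves the same before and after the round trip.
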