Building a Serverless Data Pipeline

Lorna Mitchell
September 19, 2017

Signal conference talk about Data, Serverless and a StackOverflow Dashboard

Transcript

  1. Pipeline To Shift Data
    Bringing data from StackOverflow into the dashboard my advocate team uses
    @lornajane
  2. Why Go Serverless?
    • Costs nothing when idle
    • Small application, simple architecture
    • Bursty usage since it runs from a cron
    • No real-time requirement
    • Easily within free tier
  3. CouchDB
    Cluster Of Unreliable Commodity Hardware
    • Modern, scalable document database
    • HTTP API
    • JSON data format
    • Best replication on the planet (probably)
  4. Serverless Platforms
    • Amazon Lambda
    • IBM Cloud Functions (aka OpenWhisk)
    • Twilio Functions
    • Azure Functions
    • Google Cloud Functions
    • Iron Functions
    • ... and more every week
  5. Hello World in JS
    All the platforms are slightly different, this is for OpenWhisk
        exports.main = function(args) {
            return({"message": "Hello, World!"});
        };
    Function must return an object or a Promise
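The slide says an action may return a Promise instead of a plain object. A minimal sketch of that form follows; the `name` parameter and the `setTimeout` delay are illustrative assumptions standing in for real asynchronous work, and are not from the talk.

```javascript
// Sketch of an asynchronous OpenWhisk-style action.
// `args.name` is a hypothetical parameter used only for illustration.
function main(args) {
  return new Promise(function (resolve) {
    // setTimeout stands in for real async work such as an HTTP call
    setTimeout(function () {
      var name = (args && args.name) || 'World';
      resolve({ message: 'Hello, ' + name + '!' });
    }, 10);
  });
}

// When deploying from a zip, export the function as on the slide:
// exports.main = main;
```

The platform waits for the Promise to settle, so the result object only needs to exist once the asynchronous work has finished.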
  6. OpenWhisk Vocabulary
    • trigger: an event, such as an incoming HTTP request
    • rule: map a trigger to an action
    • action: a function, optionally with parameters
    • package: collect actions and parameters together
    • sequence: more than one action in a row
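To make the "sequence" idea concrete, here is a local simulation in which each action's result becomes the next action's input. This is only a sketch of the concept, not how the platform implements sequences; `runSequence`, `addGreeting` and `shout` are hypothetical names.

```javascript
// Local simulation of a sequence: run actions in order, feeding each
// result into the next action. Actions may return an object or a Promise.
function runSequence(actions, args) {
  return actions.reduce(function (previous, action) {
    return previous.then(action);
  }, Promise.resolve(args));
}

// Two hypothetical actions: one synchronous, one returning a Promise
function addGreeting(args) {
  return { text: 'Hello, ' + args.name };
}

function shout(args) {
  return Promise.resolve({ text: args.text.toUpperCase() + '!' });
}
```

Calling `runSequence([addGreeting, shout], { name: 'world' })` resolves once both actions have run in order.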
  7. Working With Actions
    Deploy code:
        zip hello.zip index.js
        wsk action update --kind nodejs:6 demo/hello1 hello.zip
    Then run it:
        wsk action invoke --blocking demo/hello1
  8. Web-Enabled Actions
    Deploy code:
        zip hello.zip index.js
        wsk action update --kind nodejs:6 --web true demo/hello1 hello.zip
    Then curl it:
        curl https://openwhisk.ng.bluemix.net/api/v1/web/.../hello1.json
  9. OfflineFirst Applications
    This app is OfflineFirst:
    • Client side JS
    • Client side copy of DB using PouchDB
    • Background sync to serverside CouchDB
  10. Designing for Serverless
    • Modular, testable functions
    • Each with a single purpose
    • Observe data hygiene
  11. Pipeline Actions
    Sequence socron
    • collector makes an API call, passes on data
    • invoker fires many actions: one for each item
    Sequence qhandler
    • storer inserts or updates the record
    • notifier sends a webhook to slack or a bot
  12. Triggering the Pipeline
    It's a cron job, so we create a periodic trigger
        wsk trigger delete couchdbish
        wsk trigger create couchdbish --feed /whisk.system/alarms/alarm \
          --param cron "*/5 * * * *" \
          --param trigger_payload "{\"tags\": [\"cloudant\",\"ibm-cloudant
    Use a rule to link this to our socron sequence
        wsk rule update couchdbrule couchdbish stackoverflow/socron
        wsk rule enable couchdbrule
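The cron expression `*/5 * * * *` means "every fifth minute". As a rough illustration of how the minute field reads, this sketch expands it into the minutes it matches; `expandMinuteField` is a hypothetical helper that handles only the `*` and `*/n` forms, not full cron syntax.

```javascript
// Expand a cron minute field of the form "*" or "*/n" into matching minutes.
// Real cron syntax also allows lists and ranges; those are rejected here.
function expandMinuteField(field) {
  var step = 1;
  if (field !== '*') {
    var match = /^\*\/(\d+)$/.exec(field);
    if (!match) { throw new Error('Unsupported field: ' + field); }
    step = parseInt(match[1], 10);
    if (step < 1) { throw new Error('Step must be at least 1'); }
  }
  var minutes = [];
  for (var m = 0; m < 60; m += step) { minutes.push(m); }
  return minutes;
}

// The minute field is the first of the five space-separated fields
var minuteField = '*/5 * * * *'.split(' ')[0];
console.log(expandMinuteField(minuteField).join(','));
```

For the trigger on the slide this gives twelve firings an hour, at minutes 0, 5, 10 and so on up to 55.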
  13. Collector
        var request = require('request');
        function main(message) {
          return new Promise(function(resolve, reject) {
            var tagged = message.tags.join(';');
            var r = { method: 'get', url: https://api.stackexchange.com
            request(r, function(err, response, body) {
              if (err) { return reject(err); }
              if (response.statusCode != 200) { throw(new Error('status
              resolve({ items: body.items });
            });
          });
        }
        module.exports = main;
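The collector code is cut off on the slide, so here is a hedged sketch of the same shape with the HTTP client passed in, which lets it run without network access. The `httpGet(url, callback)` signature, the `fakeGet` stub and the placeholder URL are all assumptions; the real endpoint is truncated in the transcript. The sketch also rejects on a non-200 status, since a `throw` inside the request callback would not reach the surrounding Promise.

```javascript
// Sketch of a collector with an injected HTTP client (an assumption, so the
// function can be exercised with a stub). The URL is a placeholder.
function collect(message, httpGet) {
  return new Promise(function (resolve, reject) {
    var tagged = message.tags.join(';');
    var url = 'https://example.invalid/questions?tagged=' +
      encodeURIComponent(tagged);
    httpGet(url, function (err, response, body) {
      if (err) { return reject(err); }
      if (response.statusCode !== 200) {
        // reject rather than throw: this callback runs outside the
        // Promise executor, so a throw would not reject the Promise
        return reject(new Error('status ' + response.statusCode));
      }
      resolve({ items: body.items });
    });
  });
}

// Stub standing in for an HTTP library; always answers with one item
function fakeGet(url, callback) {
  callback(null, { statusCode: 200 }, { items: [{ question_id: 1 }] });
}
```

Passing the client in keeps the action small and testable, in line with the "modular, testable functions" point earlier in the deck.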
  14. Invoker
        function main(args) {
          return new Promise(function(resolve, reject) {
            var openwhisk = require('openwhisk');
            var ow = openwhisk();
            var actions = args.items.map(function (item) {
              return ow.actions.invoke(
                {actionName: "stackoverflow/qhandler", params: {questio
            });
            return Promise.all(actions).then(function (results) {
              return resolve({payload: "All OK: " + results.length + "
            });
          });
        }
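The fan-out pattern in the invoker (one invocation per item, gathered with `Promise.all`) can be sketched as below. The client is passed in so a stub can replace the real `openwhisk` library; the `question` parameter name and the wording of the payload message are assumptions, because both are cut off on the slide.

```javascript
// Sketch of the fan-out: invoke one action per item and wait for all of them.
// `ow` is any object exposing actions.invoke(options) that returns a Promise.
function invokeAll(args, ow) {
  var invocations = args.items.map(function (item) {
    return ow.actions.invoke({
      actionName: 'stackoverflow/qhandler',
      params: { question: item } // parameter name assumed, truncated on slide
    });
  });
  // Promise.all already returns a Promise, so no extra wrapper is needed
  return Promise.all(invocations).then(function (results) {
    return { payload: 'All OK: ' + results.length + ' invoked' };
  });
}

// Stub client for local runs; it only records what it was asked to invoke
var invoked = [];
var fakeClient = {
  actions: {
    invoke: function (options) {
      invoked.push(options);
      return Promise.resolve({ activationId: 'fake-' + invoked.length });
    }
  }
};
```

If any single invocation rejects, `Promise.all` rejects as a whole, so the action reports failure rather than a partial success.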
  15. Storer
        function main(message) {
          var cloudant = require('cloudant')({url: message.cloudantURL,
          var db = cloudant.db.use(message.dbname);
          var id = message.question.question_id.toString();
          return getDoc(db, id).then(function(data) {
            if (data === null) { // so insert
              message.question.tags = message.question.tags.sort(); //
              var obj = { _id: id, type: 'question', owner: null, statu
              return db.insert(obj).then(function(data) {
                return obj; // pass on the new object to the next actio
              });
            } else { ... }
          });
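The storer calls a `getDoc` helper that is not shown on the slide, and its update branch is elided. The sketch below illustrates one possible insert-or-update flow against an in-memory stand-in for the database. Everything here is an assumption made for illustration: `createMemoryDb`, the body of `getDoc`, the `'new'` and `'updated'` status values, and the shape of the update.

```javascript
// In-memory stand-in for a promise-based document database (hypothetical)
function createMemoryDb() {
  var docs = Object.create(null);
  return {
    get: function (id) {
      if (!(id in docs)) {
        var err = new Error('missing');
        err.statusCode = 404;
        return Promise.reject(err);
      }
      return Promise.resolve(docs[id]);
    },
    insert: function (doc) {
      docs[doc._id] = doc;
      return Promise.resolve({ ok: true, id: doc._id });
    }
  };
}

// Resolve to the stored document, or to null when it does not exist yet
function getDoc(db, id) {
  return db.get(id).catch(function (err) {
    if (err.statusCode === 404) { return null; }
    throw err;
  });
}

// Insert a new record, or overwrite the question on an existing one
function store(db, question) {
  var id = question.question_id.toString();
  return getDoc(db, id).then(function (existing) {
    var doc = existing === null
      ? { _id: id, type: 'question', status: 'new', question: question }
      : Object.assign({}, existing, { status: 'updated', question: question });
    // pass the stored object on to the next action in the sequence
    return db.insert(doc).then(function () { return doc; });
  });
}
```

Treating "not found" as `null` keeps the insert and update paths in one Promise chain, while any other database error still propagates.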
  16. Notifier
        function main(data) {
          return new Promise(function(resolve, reject) {
            var request = require('request');
            if(data.status == 'new') {
              var event = { type: "new-question", data: data };
              request({
                url: hardcoded_hubot_url, method: "POST", headers: {"Co
              }, function (err, response, body) {
                if(err) { reject ({payload: "Failed"});
                } else { resolve( {payload: "Notified"} ); }
              });
            } else { resolve( {payload: "Complete"} ); }
          });
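The notifier only sends a webhook for new questions and otherwise completes quietly. A sketch of that branching follows, with the sending function injected so it can be stubbed; `post(event, callback)` and `fakePost` are hypothetical stand-ins for the HTTP request on the slide, whose options are truncated.

```javascript
// Sketch of the notifier logic with the webhook sender passed in.
// `post(event, callback)` is an assumed interface, not the request library.
function notify(data, post) {
  return new Promise(function (resolve, reject) {
    if (data.status !== 'new') {
      // nothing to announce for records that already existed
      return resolve({ payload: 'Complete' });
    }
    var event = { type: 'new-question', data: data };
    post(event, function (err) {
      if (err) { return reject({ payload: 'Failed' }); }
      resolve({ payload: 'Notified' });
    });
  });
}

// Stub sender that records events instead of making an HTTP call
var sent = [];
function fakePost(event, callback) {
  sent.push(event);
  callback(null);
}
```

With the stub, `notify({ status: 'new' }, fakePost)` resolves to the "Notified" payload and leaves one event in `sent`; any other status resolves to "Complete" without sending.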
  17. Resources
    • IBM Cloud Functions: https://www.ibm.com/cloud-computing/bluemix/openwhisk
    • Code: https://github.com/ibm-watson-data-lab/soingest
    • My blog: https://lornajane.net
    • CouchDB: http://couchdb.apache.org/
    • PouchDB: https://pouchdb.com/
    • Hubot: http://hubot.github.com