Build A Serverless Data Pipeline

Short version of the serverless data pipeline talk for ServerlessCPH in Copenhagen

Lorna Mitchell

May 16, 2018

Transcript

  1. Pipeline To Shift Data

    Bringing data from StackOverflow into the dashboard my advocate team uses @lornajane
  2. Why Go Serverless?

    • Costs nothing when idle
    • Small application, simple architecture
    • Bursty usage since it runs from a cron
    • No real-time requirement
    • Easily within free tier
  3. Apache CouchDB

    • Modern, robust, scalable document database
    • HTTP API
    • JSON data format
    • Best replication on the planet (probably)
  4. OfflineFirst Applications

    This app is OfflineFirst:
    • Client side JS
    • Client side copy of DB using PouchDB
    • Background sync to serverside CouchDB
  5. Start with Security

    Need an API key or user credentials for the bx wsk tool
    Web actions: we know how to secure HTTP connections, so do it!
    • Auth standards e.g. JWT
    • Security in transmission: use HTTPS
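
To make the "Auth standards e.g. JWT" point concrete, here is a minimal sketch of a web action that checks an HS256-signed bearer token using only Node's built-in crypto module. It is not code from the talk. The `jwtSecret` parameter, the `signHS256` helper and the response bodies are assumptions for illustration; the sketch relies on OpenWhisk web actions exposing request headers as `__ow_headers`. A maintained JWT library would be the safer choice in practice.

```javascript
var crypto = require('crypto');

// Encode a string or Buffer as unpadded base64url, the encoding JWTs use.
function base64url(input) {
  return Buffer.from(input).toString('base64')
    .replace(/=+$/, '').replace(/\+/g, '-').replace(/\//g, '_');
}

// HMAC-SHA256 signature of `data`, returned as base64url text.
function hmac(data, secret) {
  return base64url(crypto.createHmac('sha256', secret).update(data).digest());
}

// Hypothetical helper for trying the sketch out: issues an HS256 token.
function signHS256(payload, secret) {
  var head = base64url(JSON.stringify({ alg: 'HS256', typ: 'JWT' }));
  var body = base64url(JSON.stringify(payload));
  return head + '.' + body + '.' + hmac(head + '.' + body, secret);
}

// Returns the token's claims, or null if the token is malformed,
// wrongly signed, not HS256, or past its `exp` time.
function verifyHS256(token, secret) {
  if (!secret) { return null; }
  var parts = String(token).split('.');
  if (parts.length !== 3) { return null; }
  var expected = Buffer.from(hmac(parts[0] + '.' + parts[1], secret));
  var actual = Buffer.from(parts[2]);
  if (expected.length !== actual.length ||
      !crypto.timingSafeEqual(expected, actual)) { return null; }
  try {
    var header = JSON.parse(Buffer.from(parts[0], 'base64').toString('utf8'));
    var claims = JSON.parse(Buffer.from(parts[1], 'base64').toString('utf8'));
    if (header.alg !== 'HS256') { return null; }
    if (typeof claims.exp === 'number' && claims.exp * 1000 < Date.now()) {
      return null;
    }
    return claims;
  } catch (e) {
    return null;
  }
}

// Web action entry point. `jwtSecret` is an assumed default parameter.
function main(params) {
  var auth = (params.__ow_headers || {}).authorization || '';
  var claims = verifyHS256(auth.replace(/^Bearer /, ''), params.jwtSecret);
  if (claims === null) {
    return { statusCode: 401, body: { error: 'unauthorized' } };
  }
  return { statusCode: 200, body: { user: claims.sub } };
}
```

The signature is compared with `crypto.timingSafeEqual` rather than `===` so the comparison takes the same time wherever the first mismatch occurs.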
  6. Logging Considerations

    • Standard, configurable logging setup
    • Use a trace_id to link requests between services
    • Aggregate logs to a central place, ensure search functionality
    • Collect metrics (invocations, execution time, error rates)
    • Display metrics on a dashboard
    • Have appropriate, configurable alerting
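
One way the trace_id idea could look inside an action is sketched below: each log line is a JSON object carrying the same trace_id, and the action passes the id on in its result so the next action can reuse it. The names `getTraceId` and `makeLogger`, the field names, and the `collector` service label are all assumptions for illustration, not part of the talk's code.

```javascript
var crypto = require('crypto');

// Reuse the caller's trace_id if one was passed in, otherwise start a new trace.
function getTraceId(params) {
  return (params && params.trace_id) || crypto.randomBytes(8).toString('hex');
}

// Build a logger that writes one JSON object per line.
// `write` defaults to console.log; a different sink can be passed in.
function makeLogger(service, traceId, write) {
  write = write || console.log;
  return function log(level, msg, fields) {
    var entry = Object.assign({}, fields, {
      ts: new Date().toISOString(),
      level: level,
      service: service,
      trace_id: traceId,
      msg: msg
    });
    write(JSON.stringify(entry));
    return entry;
  };
}

// Hypothetical action showing where the logger would sit.
function main(params) {
  var traceId = getTraceId(params);
  var log = makeLogger('collector', traceId);
  var started = Date.now();
  log('info', 'invocation started');
  // ... the action's real work would go here ...
  log('info', 'invocation finished', { duration_ms: Date.now() - started });
  // Pass the trace_id on so the next action can log under the same id.
  return { trace_id: traceId };
}
```

Because every line is JSON with a shared trace_id, a central log store can filter on that field to reconstruct one request's path across actions.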
  7. Pipeline Actions

    Sequence socron
    • collector makes an API call, passes on data
    • invoker fires many actions: one for each item
    Sequence qhandler
    • storer inserts or updates the record
    • notifier sends a webhook to Slack or a bot
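
The data flow through the two sequences can be sketched locally, without OpenWhisk, by chaining plain functions so that each result becomes the next input. Everything below is a stand-in written for illustration: the stub actions return made-up data and make no network or database calls, and `runSequence` only imitates how a sequence passes results along.

```javascript
// Run a list of action functions as a sequence: each result feeds the next.
function runSequence(actions, params) {
  return actions.reduce(function (chain, action) {
    return chain.then(action);
  }, Promise.resolve(params));
}

// Stub actions (hypothetical data, no real API or database calls).
function collector(params) {
  return { items: [{ question_id: 1 }, { question_id: 2 }] };
}
function storer(params) {
  return Object.assign({ status: 'new' }, params.question);
}
function notifier(doc) {
  return { payload: doc.status === 'new' ? 'Notified' : 'Complete' };
}

// Sequence qhandler: storer, then notifier.
function qhandler(params) {
  return runSequence([storer, notifier], params);
}

// invoker fans out: one qhandler run per collected item.
function invoker(params) {
  return Promise.all(params.items.map(function (item) {
    return qhandler({ question: item });
  })).then(function (results) {
    return { payload: 'All OK: ' + results.length };
  });
}

// Sequence socron: collector, then invoker.
function socron(params) {
  return runSequence([collector, invoker], params);
}
```

Calling `socron({ tags: ['example'] })` returns a promise for the invoker's summary. In the deployed pipeline the fan-out happens through separate action invocations rather than in-process calls.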
  8. Collector

    var request = require('request');
    function main(message) {
      return new Promise(function(resolve, reject) {
        var tagged = message.tags.join(';');
        var r = { method: 'get', url: https://api.stackexchange.com …
        request(r, function(err, response, body) {
          if (err) { return reject(err); }
          // reject instead of throw: a throw inside this callback would never settle the promise
          if (response.statusCode != 200) { return reject(new Error('status …
          resolve({ items: body.items });
        });
      });
    }
    module.exports = main;
  9. Invoker

    function main(args) {
      return new Promise(function(resolve, reject) {
        var openwhisk = require('openwhisk');
        var ow = openwhisk();
        var actions = args.items.map(function (item) {
          return ow.actions.invoke(
            {actionName: "stackoverflow/qhandler", params: {questio …
        });
        return Promise.all(actions).then(function (results) {
          return resolve({payload: "All OK: " + results.length + " …
        });
      });
    }
  10. Storer

    function main(message) {
      var cloudant = require('cloudant')({url: message.cloudantURL, …
      var db = cloudant.db.use(message.dbname);
      var id = message.question.question_id.toString();
      return getDoc(db, id).then(function(data) {
        if (data === null) { // so insert
          message.question.tags = message.question.tags.sort(); // …
          var obj = { _id: id, type: 'question', owner: null, statu …
          return db.insert(obj).then(function(data) {
            return obj; // pass on the new object to the next actio …
          });
        } else { ... }
      });
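
The storer calls `getDoc(db, id)` and treats a `null` result as "not stored yet", but the slide does not show that helper. One possible shape is sketched below. It assumes that `db.get(id)` returns a promise and that a missing document rejects with an error whose `statusCode` is 404; both are assumptions about how the client library is configured, and this is not the author's implementation.

```javascript
// Hypothetical helper: resolves with the document, or null if it does not exist.
// Assumes db.get(id) returns a promise that rejects with err.statusCode === 404
// when the document is missing.
function getDoc(db, id) {
  return db.get(id).catch(function (err) {
    if (err && err.statusCode === 404) {
      return null; // not found: the caller treats null as "insert"
    }
    throw err; // anything else is a real failure and should propagate
  });
}
```

Mapping only the 404 case to `null` keeps genuine failures, such as bad credentials or timeouts, from being mistaken for a new question.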
  11. Notifier

    function main(data) {
      return new Promise(function(resolve, reject) {
        var request = require('request');
        if(data.status == 'new') {
          var event = { type: "new-question", data: data };
          request({
            url: hardcoded_hubot_url, method: "POST", headers: {"Co …
          }, function (err, response, body) {
            if(err) { reject ({payload: "Failed"});
            } else { resolve( {payload: "Notified"} ); }
          });
        } else { resolve( {payload: "Complete"} ); }
      });
  12. Deployment

    • IBM Cloud Deployments or TravisCI
    • Deploy on commit (optionally just what has changed)
    • Recreate triggers and rules if appropriate
    • Use environment variables for secrets
    • Install bx command to use at deploy time
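
As a sketch of the "environment variables for secrets" point, a deploy script could read the storer's settings from the CI environment and fail early if any are missing. The variable names `CLOUDANT_URL` and `CLOUDANT_DBNAME` and both function names are assumptions for illustration; the output keys `cloudantURL` and `dbname` match the parameters the storer reads.

```javascript
// Copy the named variables out of `env`, failing loudly if any are missing.
function requireEnv(env, names) {
  var missing = names.filter(function (name) { return !env[name]; });
  if (missing.length > 0) {
    throw new Error('Missing environment variables: ' + missing.join(', '));
  }
  var values = {};
  names.forEach(function (name) { values[name] = env[name]; });
  return values;
}

// Build the storer's default parameters from the environment.
// The variable names here are hypothetical; use whatever your CI defines.
function storerParams(env) {
  var secrets = requireEnv(env, ['CLOUDANT_URL', 'CLOUDANT_DBNAME']);
  return { cloudantURL: secrets.CLOUDANT_URL, dbname: secrets.CLOUDANT_DBNAME };
}
```

A deploy step might call `storerParams(process.env)` and hand the result to the CLI as the action's default parameters, which keeps credentials out of the repository.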
  13. Resources

    • Cloud Functions: https://console.bluemix.net/openwhisk/
    • Code: https://github.com/ibm-watson-data-lab/soingest
    • My blog: https://lornajane.net/
    • OpenWhisk: https://openwhisk.org/
    • CouchDB: https://couchdb.apache.org/
    • Offline First: https://offlinefirst.org/