Build A Serverless Data Pipeline

Short version of the serverless data pipeline talk for ServerlessCPH in Copenhagen

Lorna Mitchell

May 16, 2018

Transcript

  1. Build A Serverless Data Pipeline
    Lorna Mitchell, IBM
    https://lornajane.net/resources

  2. Stackoverflow Dashboard @lornajane

  3. Pipeline To Shift Data
    Bringing data from StackOverflow into the dashboard my advocate team uses
  4. Why Go Serverless?
    • Costs nothing when idle
    • Small application, simple architecture
    • Bursty usage since it runs from a cron
    • No real-time requirement
    • Easily within free tier
  5. An Aside About Databases

  6. Document Databases
    Store collections of schemaless documents, in JSON

  7. Apache CouchDB
    • Modern, robust, scalable document database
    • HTTP API
    • JSON data format
    • Best replication on the planet (probably)
  8. OfflineFirst Applications
    This app is OfflineFirst:
    • Client side JS
    • Client side copy of DB using PouchDB
    • Background sync to serverside CouchDB
  9. Build the Data Pipeline

  10. Serverless Functions
    • independent
    • single purpose
    • testable
    • scalable
  11. Start with Security
    Need an API key or user creds for bx wsk tool
    Web actions: we know how to secure HTTP connections, so do it!
    • Auth standards e.g. JWT
    • Security in transmission: use HTTPS
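As an illustration of the "Auth standards e.g. JWT" point, here is a minimal sketch of a web action that checks an HS256-signed JWT using only Node's built-in `crypto` module. It is not taken from the talk: the function names, the `jwtSecret` parameter, and the response bodies are assumptions made for this example, and a maintained JWT library would normally be the better choice in production.

```javascript
const crypto = require('crypto');

// Encode a Buffer as base64url (URL-safe alphabet, no padding).
function toBase64Url(buffer) {
  return buffer.toString('base64')
    .replace(/\+/g, '-').replace(/\//g, '_').replace(/=+$/, '');
}

// Decode a base64url string into a Buffer.
function fromBase64Url(text) {
  return Buffer.from(text.replace(/-/g, '+').replace(/_/g, '/'), 'base64');
}

// Return the claims if the token is a valid, unexpired HS256 JWT, else null.
function verifyJwtHs256(token, secret, nowSeconds) {
  if (!secret) { return null; }
  const parts = String(token).split('.');
  if (parts.length !== 3) { return null; }
  let header, claims;
  try {
    header = JSON.parse(fromBase64Url(parts[0]).toString('utf8'));
    claims = JSON.parse(fromBase64Url(parts[1]).toString('utf8'));
  } catch (err) {
    return null;
  }
  // Only accept the algorithm we expect; never trust "alg" blindly.
  if (!header || header.alg !== 'HS256' || !claims) { return null; }
  const expected = crypto.createHmac('sha256', secret)
    .update(parts[0] + '.' + parts[1]).digest();
  const given = fromBase64Url(parts[2]);
  if (given.length !== expected.length ||
      !crypto.timingSafeEqual(given, expected)) { return null; }
  if (typeof claims.exp === 'number' && claims.exp <= nowSeconds) { return null; }
  return claims;
}

// Hypothetical web action: reads the Authorization header that OpenWhisk
// passes to web actions in __ow_headers; jwtSecret is an assumed bound parameter.
function main(params) {
  const auth = (params.__ow_headers || {}).authorization || '';
  const token = auth.replace(/^Bearer /, '');
  const claims = verifyJwtHs256(token, params.jwtSecret, Math.floor(Date.now() / 1000));
  if (!claims) {
    return { statusCode: 401, body: { error: 'unauthorized' } };
  }
  return { statusCode: 200, body: { user: claims.sub } };
}
```

Binding the secret as a default parameter at deploy time keeps it out of the source code, which fits the later advice to use environment variables for secrets.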
  12. Logging Considerations
    • Standard, configurable logging setup
    • Use a trace_id to link requests between services
    • Aggregate logs to a central place, ensure search functionality
    • Collect metrics (invocations, execution time, error rates)
    • Display metrics on a dashboard
    • Have appropriate, configurable alerting
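The trace_id idea could look something like the sketch below: each action logs one JSON object per line, tagged with a trace_id that is reused if the caller supplied one and generated otherwise, then passed on in the result. The `makeLogger` helper, the field names other than `trace_id`, and the `collector` service label are assumptions for this example, not part of the talk.

```javascript
const crypto = require('crypto');

// Build a logger that writes one JSON line per entry, always carrying
// the service name and trace_id so a log aggregator can correlate them.
function makeLogger(service, traceId, write) {
  const sink = write || console.log;
  return function log(level, message, fields) {
    const entry = Object.assign({}, fields || {}, {
      ts: new Date().toISOString(),
      level: level,
      service: service,
      trace_id: traceId,
      message: message
    });
    sink(JSON.stringify(entry));
    return entry;
  };
}

// Hypothetical action: reuse the incoming trace_id, or start a new trace.
function main(params) {
  const traceId = params.trace_id || crypto.randomBytes(8).toString('hex');
  const log = makeLogger('collector', traceId);
  log('info', 'action started', { tags: params.tags || [] });
  // Pass trace_id on so the next action in the pipeline logs with the same id.
  return { trace_id: traceId, items: [] };
}
```

Because every line is self-describing JSON, searching the central log store for one trace_id should return the entries from every action that handled that request.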
  13. Pipeline Actions
    Sequence socron
    • collector makes an API call, passes on data
    • invoker fires many actions: one for each item
    Sequence qhandler
    • storer inserts or updates the record
    • notifier sends a webhook to Slack or a bot
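In a sequence, the result of one action becomes the parameters of the next. The platform does this wiring itself, so the following is only a local simulation that can be handy when testing actions together. The `sequence` helper and the two stand-in actions are hypothetical and exist purely for illustration.

```javascript
// Compose actions the way a sequence does: each action receives the
// previous action's result as its parameters. Actions may return a
// plain object or a Promise, as real actions can.
function sequence(actions) {
  return function run(params) {
    return actions.reduce(function (chain, action) {
      return chain.then(action);
    }, Promise.resolve(params));
  };
}

// Hypothetical stand-in for the collector: returns one item per tag.
function fakeCollector(message) {
  return {
    items: message.tags.map(function (tag, index) {
      return { question_id: index + 1, tags: [tag] };
    })
  };
}

// Hypothetical stand-in for the invoker: just reports how many items it saw.
function fakeInvoker(args) {
  return { payload: 'All OK: ' + args.items.length + ' invoked' };
}

// Local equivalent of the socron sequence: collector, then invoker.
const socron = sequence([fakeCollector, fakeInvoker]);
```

Calling `socron({ tags: ['couchdb', 'openwhisk'] })` returns a Promise for the invoker's result, which makes it straightforward to assert on the output of the whole chain in a unit test.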
  14. Pipeline Actions

  15. Collector
    var request = require('request');
    function main(message) {
      return new Promise(function(resolve, reject) {
        var tagged = message.tags.join(';');
        var r = { method: 'get', url: https://api.stackexchange.com …
        request(r, function(err, response, body) {
          if (err) { return reject(err); }
          if (response.statusCode != 200) { throw(new Error('status …
          resolve({ items: body.items });
        });
      });
    }
    module.exports = main;
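The collector code on the slide is cut off at the slide edge, so here is a self-contained sketch of the same idea rather than a reconstruction of the original. It assumes Node 18+ (for the global `fetch`), and the endpoint path and query parameters are my assumptions based on the public Stack Exchange API, not the URL from the slide. The optional `fetchImpl` argument is a hypothetical hook so the action can be tested without network access. A non-200 response rejects the returned Promise instead of throwing inside a callback.

```javascript
// Build the request URL. The path and parameters are assumptions for this
// sketch; "tagged" takes a semicolon-delimited list of tags.
function buildUrl(tags) {
  const query = new URLSearchParams({
    order: 'desc',
    sort: 'creation',
    tagged: tags.join(';'),
    site: 'stackoverflow'
  });
  return 'https://api.stackexchange.com/2.3/questions?' + query.toString();
}

// Fetch questions for message.tags and pass the items on to the next action.
// fetchImpl is a hypothetical test hook; by default the global fetch is used.
function main(message, fetchImpl) {
  const doFetch = fetchImpl || fetch;
  return doFetch(buildUrl(message.tags))
    .then(function (response) {
      if (response.status !== 200) {
        // Throwing inside .then() rejects the Promise the action returns.
        throw new Error('status was ' + response.status);
      }
      return response.json();
    })
    .then(function (body) {
      return { items: body.items };
    });
}

module.exports = main;
```

Returning the Promise chain directly avoids wrapping everything in `new Promise`, so any error along the way fails the activation cleanly.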
  16. Invoker
    function main(args) {
      return new Promise(function(resolve, reject) {
        var openwhisk = require('openwhisk');
        var ow = openwhisk();
        var actions = args.items.map(function (item) {
          return ow.actions.invoke(
            {actionName: "stackoverflow/qhandler", params: {questio …
        });
        return Promise.all(actions).then(function (results) {
          return resolve({payload: "All OK: " + results.length + " …
        });
      });
    }
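The fan-out step can be sketched as follows: one invocation of the handler per collected item, with the action resolving once every invocation has been accepted. This is an approximation, not the slide's exact code. The optional `client` argument is a hypothetical hook for testing, the `question` parameter name is inferred from the storer slide, and the `name` option follows my understanding of the current `openwhisk` npm client (the slide uses `actionName`), so check it against the client version in use.

```javascript
// Fire one qhandler invocation per item and report how many were started.
// client is a hypothetical test hook; when omitted, the real OpenWhisk
// client is loaded, which picks up its credentials from the action's
// environment when running on the platform.
function main(args, client) {
  const ow = client || require('openwhisk')();
  const invocations = args.items.map(function (item) {
    return ow.actions.invoke({
      name: 'stackoverflow/qhandler',
      params: { question: item }
    });
  });
  // Promise.all rejects as soon as any single invocation fails.
  return Promise.all(invocations).then(function (results) {
    return { payload: 'All OK: ' + results.length + ' invoked' };
  });
}

module.exports = main;
```

Because the invocations are non-blocking by default, the invoker finishes quickly and each question is then processed by its own activation.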
  17. Storer
    function main(message) {
      var cloudant = require('cloudant')({url: message.cloudantURL, …
      var db = cloudant.db.use(message.dbname);
      var id = message.question.question_id.toString();
      return getDoc(db, id).then(function(data) {
        if (data === null) { // so insert
          message.question.tags = message.question.tags.sort(); // …
          var obj = { _id: id, type: 'question', owner: null, statu …
          return db.insert(obj).then(function(data) {
            return obj; // pass on the new object to the next actio …
          });
        } else { ... }
      });
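The storer relies on a `getDoc` helper that is not shown on the slide. One plausible shape for it, assuming the database handle's `get` returns a Promise and that a missing document surfaces as an error with `statusCode` 404, is sketched below. Both points are assumptions about the client configuration, and this is not the author's implementation.

```javascript
// Resolve with the stored document, or with null when it does not exist.
// Any error other than "not found" is passed on to the caller.
function getDoc(db, id) {
  return db.get(id).catch(function (err) {
    if (err && err.statusCode === 404) {
      return null; // tells the storer to insert a new record
    }
    throw err;
  });
}
```

Mapping "not found" to `null` lets the storer branch on `data === null` as on the slide, while real failures such as authentication or network errors still reject and fail the activation.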
  18. Notifier
    function main(data) {
      return new Promise(function(resolve, reject) {
        var request = require('request');
        if(data.status == 'new') {
          var event = { type: "new-question", data: data };
          request({
            url: hardcoded_hubot_url, method: "POST", headers: {"Co …
          }, function (err, response, body) {
            if(err) { reject ({payload: "Failed"});
            } else { resolve( {payload: "Notified"} ); }
          });
        } else { resolve( {payload: "Complete"} ); }
      });
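A notifier along these lines could also be written without the `request` package. The sketch below assumes Node 18+ for the global `fetch`, takes the webhook address from a hypothetical `webhookUrl` parameter instead of a hardcoded URL, and sends a JSON `Content-Type` header, which is my guess at the header that is cut off on the slide. The `fetchImpl` argument is a hypothetical hook for testing.

```javascript
// Send a "new-question" event to a webhook, but only for new questions.
// webhookUrl (on data) and fetchImpl are assumptions made for this sketch.
function main(data, fetchImpl) {
  if (data.status !== 'new') {
    return Promise.resolve({ payload: 'Complete' });
  }
  const doFetch = fetchImpl || fetch;
  const event = { type: 'new-question', data: data };
  return doFetch(data.webhookUrl, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(event)
  }).then(function (response) {
    if (!response.ok) {
      throw new Error('webhook returned ' + response.status);
    }
    return { payload: 'Notified' };
  });
}

module.exports = main;
```

Unlike the slide version, this treats a non-2xx response from the webhook as a failure, so a misconfigured bot URL shows up in the activation's error rate.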
  19. ... and breathe!

  20. Deployment
    • IBM Cloud Deployments or TravisCI
    • Deploy on commit (optionally just what has changed)
    • Recreate triggers and rules if appropriate
    • Use environment variables for secrets
    • Install bx command to use at deploy time
  21. Serverless And Data

  22. Resources
    • Cloud Functions: https://console.bluemix.net/openwhisk/
    • Code: https://github.com/ibm-watson-data-lab/soingest
    • My blog: https://lornajane.net/
    • OpenWhisk: https://openwhisk.org/
    • CouchDB: https://couchdb.apache.org/
    • Offline First: https://offlinefirst.org/