Slide 1

Slide 1 text

Build A Serverless Data Pipeline
Lorna Mitchell, IBM

Slide 2

Slide 2 text

Stack Overflow Dashboard
@lornajane

Slide 3

Slide 3 text

Pipeline To Shift Data
Bringing data from Stack Overflow into the dashboard my advocate team uses

Slide 4

Slide 4 text

Why Go Serverless?
• Costs nothing when idle
• Small application, simple architecture
• Bursty usage since it runs from a cron
• No real-time requirement
• Easily within free tier

Slide 5

Slide 5 text

An Aside About Databases

Slide 6

Slide 6 text

Document Databases
Store collections of schemaless documents, in JSON

Slide 7

Slide 7 text

CouchDB
Cluster Of Unreliable Commodity Hardware
• Modern, scalable document database
• HTTP API
• JSON data format
• Best replication on the planet (probably)

Slide 8

Slide 8 text

Writing Serverless Functions

Slide 9

Slide 9 text

Serverless Platforms
• Amazon Lambda
• IBM Cloud Functions (aka OpenWhisk)
• Twilio Functions
• Azure Functions
• Google Cloud Functions
• Iron Functions
• ... and more every week

Slide 10

Slide 10 text

Hello World in JS
All the platforms are slightly different; this is for OpenWhisk
    exports.main = function(args) {
        return({"message": "Hello, World!"});
    };
Function must return an object or a Promise
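The slide notes that an action may return a Promise instead of a plain object. A minimal sketch of that asynchronous form follows; the `name` parameter and the `setTimeout` delay are illustrative assumptions, not part of the original slide.

```javascript
// Hypothetical asynchronous variant of the hello-world action.
function main(args) {
    // "name" is an assumed input parameter, defaulting to "World"
    var name = (args && args.name) || "World";
    return new Promise(function (resolve) {
        // setTimeout stands in for real asynchronous work (an API call, a DB query)
        setTimeout(function () {
            resolve({ message: "Hello, " + name + "!" });
        }, 10);
    });
}
exports.main = main;
```

The platform waits for the Promise to settle, so the resolved object plays the same role as the object returned directly in the synchronous version.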

Slide 11

Slide 11 text

OpenWhisk Vocabulary
• trigger: an event, such as an incoming HTTP request
• rule: map a trigger to an action
• action: a function, optionally with parameters
• package: collect actions and parameters together
• sequence: more than one action in a row

Slide 12

Slide 12 text

Working With Actions
Deploy code:
    zip hello.zip index.js
    wsk action update --kind nodejs:6 demo/hello1 hello.zip
Then run it:
    wsk action invoke --blocking demo/hello1

Slide 13

Slide 13 text

Web-Enabled Actions
Deploy code:
    zip hello.zip index.js
    wsk action update --kind nodejs:6 --web true demo/hello1 hello.zip
Then curl it:
    curl https://openwhisk.ng.bluemix.net/api/v1/web/.../hello1.json

Slide 14

Slide 14 text

Build the Data Pipeline

Slide 15

Slide 15 text

OfflineFirst Applications
This app is OfflineFirst:
• Client side JS
• Client side copy of DB using PouchDB
• Background sync to serverside CouchDB

Slide 16

Slide 16 text

Designing for Serverless
• Modular, testable functions
• Each with a single purpose
• Observe data hygiene
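One way to read "modular, testable functions" is that an action is just a function from an input object to an output object, so it can be checked locally with a plain function call. The sketch below is an illustration only: `normaliseQuestion` is a hypothetical action, and the field names assume the Stack Exchange question shape.

```javascript
// Hypothetical single-purpose action: tidy one question record.
function normaliseQuestion(params) {
    var question = params.question;
    return {
        question: {
            question_id: question.question_id,
            title: question.title,
            // copy before sorting so the caller's array is not mutated
            tags: question.tags.slice().sort()
        }
    };
}
exports.main = normaliseQuestion;

// Because the action is a plain function, a test is just a call plus assertions.
var input = { question: { question_id: 42, title: "Example", tags: ["couchdb", "cloudant"] } };
var output = normaliseQuestion(input);
require('assert').deepStrictEqual(output.question.tags, ["cloudant", "couchdb"]);
require('assert').deepStrictEqual(input.question.tags, ["couchdb", "cloudant"]); // input untouched
```

Returning a fresh object rather than modifying the input is one small form of data hygiene: each action hands on only what the next one needs.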

Slide 17

Slide 17 text

Pipeline Actions

Slide 18

Slide 18 text

Pipeline Actions
Sequence socron
• collector makes an API call, passes on data
• invoker fires many actions: one for each item
Sequence qhandler
• storer inserts or updates the record
• notifier sends a webhook to Slack or a bot
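In a sequence, the result of each action becomes the input of the next. That hand-off can be imitated locally with a Promise chain, as in the rough sketch below. `runSequence` and the stand-in `storer` and `notifier` functions are hypothetical and exist only to show the data flow; they are not the talk's implementations.

```javascript
// Run actions one after another, feeding each result into the next,
// which mirrors how a sequence passes data along.
function runSequence(actions, params) {
    return actions.reduce(function (previous, action) {
        return previous.then(action);
    }, Promise.resolve(params));
}

// Hypothetical stand-ins for the storer and notifier actions.
function storer(message) {
    return { _id: String(message.question.question_id), status: "new" };
}
function notifier(data) {
    return Promise.resolve({ payload: data.status === "new" ? "Notified" : "Complete" });
}

runSequence([storer, notifier], { question: { question_id: 42 } })
    .then(function (result) { console.log(result); });
```

Actions may return either a plain object or a Promise; `then` handles both, which is why the two stand-ins can be mixed freely.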

Slide 19

Slide 19 text

Health Warning: large quantity of code coming up ...

Slide 20

Slide 20 text

Triggering the Pipeline
It's a cron job, so we create a periodic trigger
    wsk trigger delete couchdbish
    wsk trigger create couchdbish --feed /whisk.system/alarms/alarm \
      --param cron "*/5 * * * *" \
      --param trigger_payload "{\"tags\": [\"cloudant\",\"ibm-cloudant
Use a rule to link this to our socron sequence
    wsk rule update couchdbrule couchdbish stackoverflow/socron
    wsk rule enable couchdbrule

Slide 21

Slide 21 text

Collector
    var request = require('request');
    function main(message) {
      return new Promise(function(resolve, reject) {
        var tagged = message.tags.join(';');
        var r = { method: 'get', url: https://api.stackexchange.com
        request(r, function(err, response, body) {
          if (err) { return reject(err); }
          if (response.statusCode != 200) { return reject(new Error('status
          resolve({ items: body.items });
        });
      });
    }
    module.exports = main;
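Inside a callback, an error has to reach the Promise through `reject`; a `throw` there would escape the Promise entirely. The sketch below illustrates that pattern under some loud assumptions: the HTTP client is passed in as an argument so the code runs without network access, the URL is a placeholder (the real one is cut off on the slide), and the error message text is invented.

```javascript
// collect() takes the HTTP client as an argument so it can be exercised
// offline; in the talk's action this role is played by the request library.
function collect(request, message) {
    return new Promise(function (resolve, reject) {
        var tagged = message.tags.join(';');
        // Placeholder URL: an assumption, not the endpoint used in the talk.
        var r = {
            method: 'get',
            url: 'https://example.invalid/questions?tagged=' + encodeURIComponent(tagged),
            json: true
        };
        request(r, function (err, response, body) {
            if (err) { return reject(err); }
            if (response.statusCode != 200) {
                // reject, not throw: a throw in this callback would not settle the Promise
                return reject(new Error('status ' + response.statusCode));
            }
            resolve({ items: body.items });
        });
    });
}

// Stub client that always answers with a 500, to exercise the error path.
function failingRequest(options, callback) {
    callback(null, { statusCode: 500 }, {});
}

collect(failingRequest, { tags: ['cloudant', 'couchdb'] })
    .catch(function (err) { console.log(err.message); });
```

Passing the client in also keeps the function easy to test, since a stub can simulate success, HTTP errors, and network failures.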

Slide 22

Slide 22 text

Invoker
    function main(args) {
      return new Promise(function(resolve, reject) {
        var openwhisk = require('openwhisk');
        var ow = openwhisk();
        var actions = args.items.map(function (item) {
          return ow.actions.invoke(
            {actionName: "stackoverflow/qhandler", params: {questio
        });
        return Promise.all(actions).then(function (results) {
          return resolve({payload: "All OK: " + results.length + "
        });
      });
    }
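The invoker is a fan-out: one invocation per item, gathered with `Promise.all`. A self-contained sketch of that shape follows. Here `invoke` is injected so the code runs without the `openwhisk` package, the `question` parameter name is a guess at the truncated slide text, and the payload wording is invented.

```javascript
// invokeAll() starts one invocation per item and waits for all of them.
// "invoke" stands in for ow.actions.invoke from the talk's code.
function invokeAll(invoke, items) {
    var actions = items.map(function (item) {
        // "question" as the parameter name is an assumption
        return invoke({ actionName: "stackoverflow/qhandler", params: { question: item } });
    });
    // Returning the Promise.all chain means a failed invocation rejects the result.
    return Promise.all(actions).then(function (results) {
        return { payload: "Invoked " + results.length + " actions" };
    });
}

// Stub invoker that returns an activation-like object.
function fakeInvoke(options) {
    return Promise.resolve({ activationId: "fake-" + options.params.question.question_id });
}

invokeAll(fakeInvoke, [{ question_id: 1 }, { question_id: 2 }])
    .then(function (result) { console.log(result.payload); });
```

Returning the chain directly, rather than wrapping it in a new Promise, lets any rejected invocation propagate to the caller without extra wiring.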

Slide 23

Slide 23 text

Storer
    function main(message) {
      var cloudant = require('cloudant')({url: message.cloudantURL,
      var db = cloudant.db.use(message.dbname);
      var id = message.question.question_id.toString();
      return getDoc(db, id).then(function(data) {
        if (data === null) { // so insert
          message.question.tags = message.question.tags.sort(); //
          var obj = { _id: id, type: 'question', owner: null, statu
          return db.insert(obj).then(function(data) {
            return obj; // pass on the new object to the next actio
          });
        } else { ... }
      });
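The storer's insert-or-update decision can be tried out against an in-memory stand-in for the database. Everything in this sketch is an assumption made for illustration: `makeFakeDb` is not the Cloudant client's API, and the `'updated'` status and the update branch are guesses, since the slide elides that branch.

```javascript
// In-memory stand-in for a database handle (hypothetical, not the Cloudant API).
function makeFakeDb() {
    var docs = {};
    return {
        get: function (id) {
            var found = Object.prototype.hasOwnProperty.call(docs, id);
            return Promise.resolve(found ? docs[id] : null);
        },
        insert: function (doc) {
            docs[doc._id] = doc;
            return Promise.resolve({ ok: true, id: doc._id });
        }
    };
}

// Insert the question if it is unseen, otherwise overwrite the stored copy.
function store(db, question) {
    var id = question.question_id.toString();
    return db.get(id).then(function (existing) {
        if (existing === null) {
            var obj = { _id: id, type: 'question', status: 'new', question: question };
            return db.insert(obj).then(function () { return obj; });
        }
        // Assumed update behaviour: keep existing fields, refresh the question.
        var updated = Object.assign({}, existing, { status: 'updated', question: question });
        return db.insert(updated).then(function () { return updated; });
    });
}
```

A caller would use it as `store(makeFakeDb(), { question_id: 42, tags: [] })`, which resolves to the stored document so the next action in the sequence can inspect its `status`.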

Slide 24

Slide 24 text

Notifier
    function main(data) {
      return new Promise(function(resolve, reject) {
        var request = require('request');
        if(data.status == 'new') {
          var event = { type: "new-question", data: data };
          request({
            url: hardcoded_hubot_url, method: "POST", headers: {"Co
          }, function (err, response, body) {
            if(err) { reject ({payload: "Failed"});
            } else { resolve( {payload: "Notified"} ); }
          });
        } else { resolve( {payload: "Complete"} ); }
      });

Slide 25

Slide 25 text

And breathe!

Slide 26

Slide 26 text

Hubot
http://hubot.github.com
A customisable, life embetterment robot

Slide 27

Slide 27 text

Serverless And Data

Slide 28

Slide 28 text

Resources
• IBM Cloud Functions: https://www.ibm.com/cloud-computing/bluemix/openwhisk
• Code: https://github.com/ibm-watson-data-lab/soingest
• My blog: https://lornajane.net
• CouchDB: http://couchdb.apache.org/
• PouchDB: https://pouchdb.com/
• Hubot: http://hubot.github.com