RTDB Technology

chicagozer
November 16, 2013

The technology and architecture behind RTDB, a real-time database.


Transcript

  1. Reboot…

    •  Start with a requirement that analytics should be delivered in real time.
    •  Introduce a web-based “subscriber” modeled after message push.
    •  But instead of delivering individual messages, deliver the aggregated analytic results.
    •  Highly useful for executive dashboards, retail analytics, mobile real-time updates and many more.
  2. A node.js database solution

    •  RTDB is a real-time JSON document database implemented entirely in node.js.
    •  Documents are inserted via a REST API.
    •  Analytics are made available via map/reduce.
    •  When data changes, queries are updated in real time and subscribers are updated immediately using HTML5 server-sent events.
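The push step described on the slide above can be sketched with Node's built-in `http` module. This is a minimal illustration, not RTDB's own code: the `/stream` path, the `formatSseEvent` and `publish` helpers, and the payload shape are all assumptions made for the example.

```javascript
const http = require('http');

// Hypothetical helper: serialize a query result as one server-sent event frame.
// An SSE frame is a "data: <payload>" line followed by a blank line.
function formatSseEvent(result) {
  return 'data: ' + JSON.stringify(result) + '\n\n';
}

// Hypothetical subscriber registry: one open response per connected client.
const subscribers = new Set();

// Push the latest aggregated result to every connected subscriber.
function publish(result) {
  const frame = formatSseEvent(result);
  subscribers.forEach(function (res) { res.write(frame); });
}

const server = http.createServer(function (req, res) {
  if (req.url !== '/stream') { // '/stream' is an assumed path
    res.writeHead(404);
    return res.end();
  }
  res.writeHead(200, {
    'Content-Type': 'text/event-stream',
    'Cache-Control': 'no-cache',
    'Connection': 'keep-alive'
  });
  subscribers.add(res);
  req.on('close', function () { subscribers.delete(res); });
});

// server.listen(8080); // uncomment to serve
```

A browser would subscribe with `new EventSource('/stream')` and receive each frame in its `onmessage` handler.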
  3. Comparison

    SQL/Document DB          RTDB
    Relational/JSON          JSON
    SQL or Map/Reduce        Map/Reduce
    DB-specific driver       REST
    Polling                  Push
    Dynamic/Prepared         Prepared
    Separate BI/Analytics    Integrated analytics
  4. RTDB implementation

    [Diagram]
    •  Components: Publisher, Express, Event Emitter, Client Subscriber, Configurable File System
    •  Links: JSON over REST; JSON via HTML5 Server-Sent Events
  5. Map/Reduce Pipeline

    •  Map: organize by “key”
    •  Reduce: rollup/aggregate
    •  Finalize: sorting/filter
    •  Personalize: apply user-specific filter/function

    Note: RTDB will perform an “incremental” map/reduce when possible. This is very powerful. Add one record to a million-record collection? No need to “remap” the entire dataset.
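One way to picture the “incremental” map/reduce in the note above: keep one reduced value per key, map only the newly added record, and fold its output into the stored value. The deck does not show how RTDB does this, so `IncrementalView` and its `add` method are hypothetical, and the approach only works when the reduce function can be re-applied to its own output (as a sum can).

```javascript
// Hypothetical incremental view: keeps one reduced value per key.
function IncrementalView(map, reduce) {
  this.map = map;       // function (item, emit)
  this.reduce = reduce; // function (values) -> single value
  this.totals = {};     // key -> current reduced value
}

// Map only the new document and fold its output into the stored totals.
IncrementalView.prototype.add = function (item) {
  var self = this;
  this.map(item, function emit(key, value) {
    if (Object.prototype.hasOwnProperty.call(self.totals, key)) {
      // Re-reduce: combine the previous result with the new value.
      self.totals[key] = self.reduce([self.totals[key], value]);
    } else {
      self.totals[key] = self.reduce([value]);
    }
  });
};

// Count songs per artist (sample documents are made up for illustration).
var view = new IncrementalView(
  function (item, emit) { emit(item.artist, 1); },
  function (values) { return values.reduce(function (a, b) { return a + b; }); }
);

view.add({ artist: 'Neil Diamond', song: 'Forever in Blue Jeans' });
view.add({ artist: 'Neil Diamond', song: 'Sweet Caroline' });
view.add({ artist: 'Barbra Streisand', song: 'Evergreen' });
console.log(view.totals); // { 'Neil Diamond': 2, 'Barbra Streisand': 1 }
```

Each `add` touches only the keys emitted for the new document, so the cost does not depend on how many records are already in the collection.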
  6. Map

        emit(item.artist, 1);

    Reduce

        emit(values.reduce(function (a, b) { return a + b; }));

    Finalize

        reduction.sort(function (a, b) { return b[1] - a[1]; });
        if (reduction.length > 30) reduction.length = 30;
        emit(reduction);

    Sample document

        { artist: "Neil Diamond", song: "Forever in Blue Jeans", album: "You Don't Bring me flowers", year: 1978 }

    Result

        [['Neil Diamond', 1]]
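To see the three stage bodies from the slide above work together, they can be wrapped in a small harness. The stage bodies are taken from the slide; everything around them (`runPipeline`, the way `emit` is passed in, and the two extra sample documents) is an assumption for illustration, not RTDB's actual runtime.

```javascript
// Stage bodies adapted from the slide; the harness around them is hypothetical.
function map(item, emit) {
  emit(item.artist, 1);
}
function reduce(values, emit) {
  emit(values.reduce(function (a, b) { return a + b; }));
}
function finalize(reduction, emit) {
  reduction.sort(function (a, b) { return b[1] - a[1]; });
  if (reduction.length > 30) reduction.length = 30;
  emit(reduction);
}

function runPipeline(docs) {
  // Map: group emitted values by key.
  var groups = Object.create(null);
  docs.forEach(function (item) {
    map(item, function (key, value) {
      (groups[key] = groups[key] || []).push(value);
    });
  });
  // Reduce: collapse each group to a single [key, value] pair.
  var reduction = Object.keys(groups).map(function (key) {
    var result;
    reduce(groups[key], function (value) { result = value; });
    return [key, result];
  });
  // Finalize: sort descending by count and keep at most 30 rows.
  var output;
  finalize(reduction, function (value) { output = value; });
  return output;
}

var docs = [
  { artist: 'Neil Diamond', song: 'Forever in Blue Jeans', year: 1978 },
  { artist: 'Barbra Streisand', song: 'Evergreen' },  // made-up extra sample
  { artist: 'Neil Diamond', song: 'Sweet Caroline' }  // made-up extra sample
];
console.log(runPipeline(docs)); // [ [ 'Neil Diamond', 2 ], [ 'Barbra Streisand', 1 ] ]
```

With only the single sample document from the slide, the same harness returns `[['Neil Diamond', 1]]`, matching the result shown there.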
  7. Why node.js?

    •  node.js was the initial choice for prototyping Server-Sent Events.
    •  As the prototype matured into an application, the application never outgrew node.js.
    •  node.js is interpreted, so the map/reduce scripts can be changed at run time.
    •  Very fast; very stable.
    •  Scalable I/O. Used heavily for files and subscribers.
    •  Async model is very well suited to the internal architecture.
    •  NPM provides an excellent set of API building blocks for rapid assembly.
    •  Extremely PaaS friendly (Heroku, Amazon Elastic Beanstalk, Modulus, Cloudnode and more).
  8. APIs leveraged via NPM

    •  Express: used for REST API and web admin
    •  Jade: templates for web admin
    •  AWS-SDK: access to the S3 file system
    •  Async: concurrency framework
    •  Symmetry: JSON delta processing
    •  Winston: logging

    Note: NPM offers several overlapping APIs. You are free to choose the best fit for your needs.
  9. Configurable File System (CFS)

    •  RTDB can use a variety of backing stores; even other databases.
    •  Loads all JavaScript modules in the specified directory via “require”

        var cfslist = fs.readdirSync('./cfs');
        var cfsTypes = {};
        cfslist.forEach(function(file) {
          var cfs = require('./cfs/' + file);
          cfsTypes[cfs.name] = cfs;
        });
        self.cfs = new cfsTypes[self.globalSettings.cfs]();
        self.cfs.init(self.globalSettings.cfsinit);
  10. Configurable File System

    •  Small set of required methods:

        function name() - return a unique name for this provider
        function init(parms) - initialize with params from settings.json
        function exists(dir, callback) - does this exist?
        function get(key, callback) - return object by key
        function del(key, callback) - delete object by key
        function put(prefix, item, callback, expires) - put object
        function list(prefix, callback) - list objects
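As a rough illustration of the method set listed above, here is a hypothetical in-memory provider. The deck gives only the method names and parameters, so several details are assumptions: callbacks use the node-style `(err, result)` shape, `put` stores the item under `prefix + item._id`, and `expires` is accepted but ignored.

```javascript
// Hypothetical in-memory provider. Callback shape (err, result) and the
// prefix + item._id key layout are assumptions, not taken from RTDB.
function MemoryCfs() {
  this.store = {};
}

MemoryCfs.prototype.name = function () { return 'memory'; };

MemoryCfs.prototype.init = function (parms) {
  this.parms = parms || {};
  this.store = {};
};

// True if any stored key starts with the given directory prefix.
MemoryCfs.prototype.exists = function (dir, callback) {
  var found = Object.keys(this.store).some(function (key) {
    return key.indexOf(dir) === 0;
  });
  callback(null, found);
};

MemoryCfs.prototype.get = function (key, callback) {
  if (!Object.prototype.hasOwnProperty.call(this.store, key)) {
    return callback(new Error('not found: ' + key));
  }
  callback(null, this.store[key]);
};

MemoryCfs.prototype.del = function (key, callback) {
  delete this.store[key];
  callback(null);
};

// `expires` is accepted for signature compatibility but ignored in this sketch.
MemoryCfs.prototype.put = function (prefix, item, callback, expires) {
  this.store[prefix + item._id] = item;
  callback(null);
};

MemoryCfs.prototype.list = function (prefix, callback) {
  var keys = Object.keys(this.store).filter(function (key) {
    return key.indexOf(prefix) === 0;
  });
  callback(null, keys);
};

var cfs = new MemoryCfs();
cfs.init({});
cfs.put('songs/', { _id: 'abc', artist: 'Neil Diamond' }, function () {
  cfs.list('songs/', function (err, keys) {
    console.log(keys); // [ 'songs/abc' ]
  });
});
```

A real provider (local disk, S3, another database) would call back asynchronously; the synchronous callbacks here just keep the sketch short.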
  11. JSON simplifies everything.

    Here is all the code for inserting documents.

        app.post('/db/collections/:id/documents', function(req, res) {
          var c = database.collectionAt(req.params.id);
          var docs = [];
          if (!Array.isArray(req.body))
            docs.push(req.body);
          else
            docs = req.body;
          c.put(docs, function(err) {
            if (!err)
              res.send(201);
            else
              res.send(500, err);
          });
        });

    Note: REST and JSON make it very easy to interact with the database using command-line tools such as curl.
  12. Node.js events are the real-time glue

    •  Create emitter

        _emitter = new events.EventEmitter();

    •  Register reduce function

        _emitter.once('change', doReduce);

    •  When there is work…

        _emitter.emit('change');

    •  Use “once” versus “on” to manage flow.
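One plausible reading of the “once” versus “on” point above: because `once` removes the listener when it fires, a burst of changes triggers a single reduce pass, and the listener is re-armed only when that pass finishes. The sketch below shows the idea; `pending`, `insert`, `batchSizes`, and the `setImmediate` deferral are assumptions for illustration, not RTDB's actual code.

```javascript
const events = require('events');

const emitter = new events.EventEmitter();
let pending = [];        // changes waiting for the next reduce pass
const batchSizes = [];   // size of each pass, kept only for illustration

function doReduce() {
  // 'once' has already removed this listener, so any 'change' emitted
  // before the pass below runs finds no listener and is simply dropped.
  setImmediate(function () {
    const batch = pending;
    pending = [];
    batchSizes.push(batch.length);    // a real reduce over `batch` would go here
    emitter.once('change', doReduce); // re-arm for the next burst of changes
  });
}

// Hypothetical insert path: queue the document, then signal that there is work.
function insert(doc) {
  pending.push(doc);
  emitter.emit('change');
}

emitter.once('change', doReduce);
insert({ artist: 'Neil Diamond' });
insert({ artist: 'Neil Diamond' });
insert({ artist: 'Barbra Streisand' });

setImmediate(function () {
  console.log(batchSizes); // [ 3 ]
});
```

With `on` instead of `once`, `doReduce` would be invoked once per insert, scheduling three passes for the same burst.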
  13. Async framework expedites concurrent programming

    •  async.each: process in parallel
    •  async.eachSeries: process sequentially
        •  Load collections in priority order
    •  async.eachLimit: parallel, but with a limit
        •  Load files from file system without running out of system resources
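To make the “parallel, but with a limit” idea above concrete without pulling in a dependency, here is a plain-JavaScript stand-in. It is not the async library's implementation; it only mirrors the `(items, limit, worker, done)` argument order of `async.eachLimit`. The `loadFile` worker and the file names are made up, and a timer simulates the file read.

```javascript
// Stand-in for a limit-bounded iterator; NOT the async library's implementation.
function eachLimit(items, limit, worker, done) {
  var next = 0, running = 0, finished = 0, failed = false;
  if (items.length === 0) return done(null);

  function launch() {
    // Start workers until the limit is reached or no items remain.
    while (running < limit && next < items.length) {
      running += 1;
      worker(items[next++], function (err) {
        running -= 1;
        finished += 1;
        if (failed) return;                       // an error was already reported
        if (err) { failed = true; return done(err); }
        if (finished === items.length) return done(null);
        launch();                                 // a slot freed up; start the next item
      });
    }
  }
  launch();
}

// Hypothetical worker: simulates reading a file and tracks peak concurrency.
var active = 0, peak = 0, loaded = [];
function loadFile(name, callback) {
  active += 1;
  peak = Math.max(peak, active);
  setTimeout(function () {
    active -= 1;
    loaded.push(name);
    callback(null);
  }, 5);
}

var files = ['a.json', 'b.json', 'c.json', 'd.json', 'e.json'];
eachLimit(files, 2, loadFile, function (err) {
  console.log('loaded', loaded.length, 'files; peak concurrency', peak);
});
```

Setting the limit to 1 gives the sequential behaviour of `eachSeries`; setting it to `files.length` gives the unbounded behaviour of `each`.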
  14. Symmetry for wire protocol performance

        a = { x: 3, y: 5, z: 1 };
        b = { x: 3, y: 8, z: 1 };
        Symmetry.diff(a, b)
        // => { t: 'o', s: { y: 8 } }

        obj = { x: 3, y: 5, z: 1 };
        diff = { t: 'o', s: { y: 8 } };
        Symmetry.patch(obj, diff);
        obj
        // => { x: 3, y: 8, z: 1 }

    Example from https://github.com/Two-Screen/symmetry

    Symmetry will “delta” the JSON result set and can significantly reduce the bytes transferred.
  15. GUIDs

    •  Every persistent object gets a GUID.
    •  Easy to share data between implementations.

        var uuid = require('node-uuid');
        this._id = uuid.v4();
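The same idea can be sketched without a third-party module on recent Node.js versions, where the built-in `crypto.randomUUID()` returns a random (version 4) UUID. This swaps in a different API from the `node-uuid` call on the slide above, and the `Collection` constructor is a hypothetical stand-in for any persistent object.

```javascript
const crypto = require('crypto');

// Hypothetical persistent object that assigns its own identifier on creation.
function Collection(name) {
  this.name = name;
  this._id = crypto.randomUUID(); // random (version 4) UUID string
}

var songs = new Collection('songs');
var albums = new Collection('albums');
console.log(songs._id !== albums._id); // true
```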
  16. For more info (I would love to hear from you.)

    •  [email protected]
    •  Twitter @rheosoft
    •  http://facebook.com/rheosoft