This is the story of how Errorception was launched, and how it has had to overcome scaling challenges with Node.js. The result is what I think is a fantastic way of building large, complex apps with Node.
BACKSTORY
Code was crap, no tests, no thought towards scaling
Single, monolithic Node.js app talking to MongoDB
It worked… mostly
Single machine, 512 MB RAM
#1 on HN for a couple of hours…
What could go wrong with that?
NODE WAS THE PROBLEM!
Well, actually, it was my code
Lesson 1: Never do anything CPU intensive in Node
Re-evaluate even small loops in tight code paths
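When a loop in a hot path cannot simply be removed, one common mitigation (not something described in the talk) is to split the work into small batches and hand control back to the event loop between them. The sketch below assumes a hypothetical helper name, `processInChunks`, and an arbitrary default batch size; it keeps the process responsive but does not make the work any cheaper.

```javascript
'use strict';

// Hypothetical helper: run `handleItem` over `items` in small batches,
// yielding to the event loop between batches so pending I/O callbacks
// (for example, incoming HTTP requests) are not starved.
function processInChunks(items, handleItem, chunkSize = 1000) {
  return new Promise((resolve, reject) => {
    let index = 0;

    function runChunk() {
      try {
        const end = Math.min(index + chunkSize, items.length);
        for (; index < end; index++) {
          handleItem(items[index], index);
        }
      } catch (err) {
        reject(err);
        return;
      }
      if (index < items.length) {
        setImmediate(runChunk); // let other queued callbacks run first
      } else {
        resolve(index); // number of items processed
      }
    }

    runChunk();
  });
}
```

The first batch runs synchronously; every later batch is scheduled with `setImmediate`, so I/O callbacks get a turn in between. For genuinely heavy computation, moving the work into a separate process is closer to the lesson on the slide.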
SOME MONTHS LATER…
Big advertising company decides to use Errorception
Ads go on the Yahoo! homepage
Yahoo's traffic hits Errorception's server
Server: 768 MB RAM machine
THE DEPLOYMENT PROBLEM
Deployment needs a restart
Sometimes, deployments need downtime for DB migrations
When the app is down, errors aren't collected
Also, an app that's down looks bad
But minimizing deployments is not a good idea
SOLUTIONS
Break up the application into multiple pieces
Each piece as small as necessary
Deploy each piece independently, version them independently
Get them to talk to each other through some message passing system
QUEUES
Queues let each part of the application prepare tasks for other parts
If a component dies, the queue feeding it will fill up
However, this lets us kill parts of the app at will
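The queue behaviour above can be sketched with an in-memory stand-in. This is an illustration only: the `Queue` and `Stage` classes are hypothetical names, and the real system uses Redis rather than a JavaScript array. The point it shows is that a stopped stage loses nothing; its input queue simply grows until the stage comes back.

```javascript
'use strict';

// In-memory stand-in for a queue (the talk uses Redis; this is only a sketch).
class Queue {
  constructor() { this.items = []; }
  push(item) { this.items.push(item); }
  pop() { return this.items.shift(); } // undefined when empty
  get length() { return this.items.length; }
}

// Hypothetical pipeline stage: reads from `input`, transforms each task,
// and writes the result to `output` for the next stage.
class Stage {
  constructor(input, output, transform) {
    this.input = input;
    this.output = output;
    this.transform = transform;
    this.running = true;
  }
  stop() { this.running = false; }  // e.g. killed for a deployment
  start() { this.running = true; }  // e.g. new version brought up
  // Process whatever is waiting; does nothing while the stage is stopped.
  drain() {
    let processed = 0;
    while (this.running && this.input.length > 0) {
      this.output.push(this.transform(this.input.pop()));
      processed++;
    }
    return processed;
  }
}
```

Stopping a stage, pushing tasks, and starting it again leaves every task processed once the stage drains its backlog, which is what makes independent deployments of each piece tolerable.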
ERRORCEPTION ISN'T ONE APP
The UI server deals with serving HTTP (ExpressJS)
A super lightweight (90 LOC) pure-Node HTTP server collects errors:
    Uses Node's cluster module to split the task across processes
    Collects errors and simply dumps them into a Redis queue
3 micro-apps process the errors from queue to queue in Redis
Finally, a single small app writes to MongoDB
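A collector of this shape might look roughly like the sketch below. It is not the actual Errorception code: the function names, the use of a POST body, and the 202 response are assumptions, and `enqueue` is a hypothetical callback standing in for the push onto a Redis queue.

```javascript
'use strict';
const http = require('http');
const cluster = require('cluster');
const os = require('os');

// Build the collector. `enqueue` is a hypothetical callback; in a setup like
// the one described, it would push the raw payload onto a Redis queue.
function createCollector(enqueue) {
  return http.createServer((req, res) => {
    const chunks = [];
    req.on('data', (chunk) => chunks.push(chunk));
    req.on('end', () => {
      // No parsing or validation here: hand the payload off and return.
      enqueue(Buffer.concat(chunks).toString('utf8'));
      res.writeHead(202); // assumed status code
      res.end();
    });
  });
}

// Fork one worker per CPU; each worker runs its own collector and the
// cluster module shares the listening port between them.
// `cluster.isPrimary` needs Node 16+; older versions call it `isMaster`.
function startCluster(port, enqueue) {
  if (cluster.isPrimary) {
    for (let i = 0; i < os.cpus().length; i++) cluster.fork();
    cluster.on('exit', () => cluster.fork()); // replace workers that die
  } else {
    createCollector(enqueue).listen(port);
  }
}
```

Keeping the request handler this thin follows the earlier lesson: the collector does no CPU-heavy work itself, and everything expensive happens in the apps further down the queue.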
THIS WAS EXHAUSTING
Frequent errors and downtimes
Restarts every couple of hours due to memory leaks
Even wrote a script to restart applications every couple of hours!
SHARING CODE WITH SYMLINKS
The shared code is now a module, rather than a service
The module folder is simply symlinked into every app's node_modules folder
Still has drawbacks, but works
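The symlinking step could be scripted along these lines. This is a minimal sketch: `linkSharedModule` and every path passed to it are hypothetical, and the talk does not say how the links are actually created.

```javascript
'use strict';
const fs = require('fs');
const path = require('path');

// Link a shared module folder into an app's node_modules directory so that
// the app can load it by name. All paths are hypothetical.
function linkSharedModule(sharedDir, appDir) {
  const nodeModules = path.join(appDir, 'node_modules');
  fs.mkdirSync(nodeModules, { recursive: true });

  const linkPath = path.join(nodeModules, path.basename(sharedDir));
  // lstat (not stat) so that an existing link is detected even if it dangles.
  if (!fs.lstatSync(linkPath, { throwIfNoEntry: false })) {
    fs.symlinkSync(path.resolve(sharedDir), linkPath, 'dir');
  }
  return linkPath;
}
```

Run once per app at deploy time, for example `linkSharedModule('/srv/shared/my-lib', '/srv/apps/collector')`, after which the app loads the code with `require('my-lib')`. One drawback of this approach is that every app sees the same copy, so the shared module cannot be versioned per app.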
CURRENT STACK
ExpressJS for the website
Pure, straight-up Node for the error catching server
Redis everywhere
Mongoose on top of MongoDB
forever as a process watcher
24 Node processes
Still one primary machine and one failover