Stabilizing npm

How the npm, Inc. team stabilized the registry in early 2014. Video of the presentation: https://www.youtube.com/watch?v=3ivx2RsZ1yA

C J Silverio

July 24, 2014

Transcript

  1. February 2014 » company founded & funded » 100% hosted on Joyent » several skimdbs load-balanced by Fastly » hand-built CouchDB + SpiderMonkey » automation by bash » Twitter tells us when we're down
  2. This is when I arrive. (funding means you can hire!) » PagerDuty account: first thing I did » Nagios all hooked up & monitoring basic host health » we have maybe 10 hosts total driving the registry
  3. Stabilization stage 1: reactive » monitor everything more deeply » methodically identify & monitor causes of outages » react quickly to fix problems » Twitter is no longer telling us when we're down
  4. Stabilization stage 2: proactive » our second devops person: Ben Coe » recurring problems fixed in the apps » monitoring checks self-heal » redundancy everywhere » automation! » our night shift is bored!
  5. major changes » 100% on AWS » Ubuntu Trusty » 70/30 split between us-west-2 & us-east-1 » 100% automated with Ansible » 52 running instances, variable
  6. the stack » Fastly CDN for Varnish cache & geolocality » nginx to serve static files » pound to terminate TLS » CouchDB for package metadata & app logic » Nagios + PagerDuty for monitoring » InfluxDB + Grafana for metrics » Tarsnap for backups
  7. weak points » single points of failure: Fastly, write primary » still looking for an off-AWS backup » expensive to run: too many CouchDBs » too entangled with CouchDB » complex in odd places: the skimworker, for example
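
Slide 4 mentions monitoring checks that self-heal, but the deck does not show how they were built. The following is a minimal sketch of the general idea, assuming a check that probes a service, tries one remediation step on failure, and then reports a Nagios-style exit code (0 OK, 1 WARNING, 2 CRITICAL). The names `self_healing_check`, `probe`, and `remediate` are hypothetical and do not come from the talk.

```python
from typing import Callable, Tuple

# Conventional Nagios plugin exit codes.
OK, WARNING, CRITICAL = 0, 1, 2


def self_healing_check(
    probe: Callable[[], bool],
    remediate: Callable[[], None],
    max_attempts: int = 1,
) -> Tuple[int, str]:
    """Probe a service; if it is unhealthy, try to fix it before alerting.

    probe      -- hypothetical callable returning True when the service is healthy
    remediate  -- hypothetical callable that attempts a fix (e.g. restarts a process)
    Returns (exit_code, message) in the shape a Nagios plugin would report.
    """
    if probe():
        return OK, "OK: service healthy"

    for attempt in range(1, max_attempts + 1):
        try:
            remediate()
        except Exception as exc:
            # The fix itself failed, so a human needs to look at this.
            return CRITICAL, f"CRITICAL: remediation failed: {exc}"
        if probe():
            # Recovered without a human, but leave a visible trace.
            return WARNING, f"WARNING: recovered after {attempt} remediation attempt(s)"

    return CRITICAL, "CRITICAL: still failing after remediation"


if __name__ == "__main__":
    # Toy stand-in for a real service: starts down, comes up when "restarted".
    service = {"up": False}
    code, message = self_healing_check(
        probe=lambda: service["up"],
        remediate=lambda: service.update(up=True),
    )
    print(message)
    raise SystemExit(code)
```

Reporting WARNING rather than OK after a successful recovery is one possible design choice: nobody gets paged, but a recurring problem stays visible so it can later be fixed in the app itself.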