Slide 1

Slide 1 text

stabilizing the registry

Slide 2

Slide 2 text

C J Silverio
devops at npmjs.com
@ceejbot

Slide 3

Slide 3 text

side project
» 100% CouchDB
» donated hosting: IrisCouch

Slide 4

Slide 4 text

No content

Slide 5

Slide 5 text

No content

Slide 6

Slide 6 text

December 2013

Slide 7

Slide 7 text

January 2014

Slide 8

Slide 8 text

February 2014
» company founded & funded
» 100% hosted on Joyent
» several skimdbs load-balanced by Fastly
» hand-built CouchDB + SpiderMonkey
» automation by bash
» Twitter tells us when we're down

Slide 9

Slide 9 text

This is when I arrive. (funding means you can hire!)
» PagerDuty account: first thing I did
» Nagios all hooked up & monitoring basic host health
» we have maybe 10 hosts total driving the registry

Slide 10

Slide 10 text

Funding also means attention from bounty-hunters.

Slide 11

Slide 11 text

security audit

Slide 12

Slide 12 text

Stabilization stage 1: reactive
» monitor everything more deeply
» methodically identify & monitor causes of outages
» react quickly to fix problems
» Twitter is no longer telling us when we're down

Slide 13

Slide 13 text

Stabilization stage 2: proactive
» our second devops person: Ben Coe
» recurring problems fixed in the apps
» monitoring checks self-heal
» redundancy everywhere
» automation!
» our night shift is bored!
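
The "monitoring checks self-heal" idea above could be sketched roughly as follows. This is a minimal illustration, not the checks npm actually ran: the function name `self_healing_check` and the `check`, `heal`, and `page` callables are hypothetical stand-ins for a health probe, a remediation step (for example restarting a service), and a pager notification.

```python
from typing import Callable


def self_healing_check(
    check: Callable[[], bool],
    heal: Callable[[], None],
    page: Callable[[str], None],
    name: str,
    max_heal_attempts: int = 1,
) -> str:
    """Run a health check; on failure try to heal, and page only if healing fails.

    All three callables are hypothetical placeholders supplied by the caller.
    """
    if check():
        return "ok"

    # The check failed: attempt the automated fix, then re-check.
    for _ in range(max_heal_attempts):
        heal()
        if check():
            return "healed"

    # Automated healing did not work, so a human gets woken up.
    page(f"{name}: still failing after {max_heal_attempts} heal attempt(s)")
    return "paged"
```

The design choice this illustrates is that a human is paged only after the automated fix has been tried and the check still fails, which is one way a night shift ends up bored.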

Slide 14

Slide 14 text

June 2013 Superficially similar.

Slide 15

Slide 15 text

major changes
» 100% on AWS
» Ubuntu Trusty
» 70/30 split between us-west-2 & us-east-1
» 100% automated with Ansible
» 52 running instances (the count varies)
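
The slides do not say how the 70/30 regional split is implemented. As one hedged illustration of a weighted split, the sketch below maps a number in [0, 1) onto regions in proportion to their weights. The `REGION_WEIGHTS` table and the `pick_region` function are assumptions made for this example only.

```python
from typing import Dict

# Assumed weights mirroring the 70/30 split mentioned on the slide.
REGION_WEIGHTS: Dict[str, int] = {"us-west-2": 70, "us-east-1": 30}


def pick_region(weights: Dict[str, int], roll: float) -> str:
    """Map a roll in [0, 1) to a region, proportionally to its weight."""
    if not 0.0 <= roll < 1.0:
        raise ValueError("roll must be in [0, 1)")
    if not weights:
        raise ValueError("weights must not be empty")

    threshold = roll * sum(weights.values())
    cumulative = 0
    region = ""
    for region, weight in weights.items():
        cumulative += weight
        if threshold < cumulative:
            return region
    # Only reachable through floating-point rounding; fall back to the last region.
    return region
```

In use, `roll` would come from something like `random.random()`, so that over many requests about 70% land in us-west-2 and about 30% in us-east-1.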

Slide 16

Slide 16 text

the stack
» Fastly CDN for Varnish cache & geolocality
» nginx to serve static files
» pound to terminate TLS
» CouchDB for package metadata & app logic
» Nagios + PagerDuty for monitoring
» InfluxDB + Grafana for metrics
» Tarsnap for backups

Slide 17

Slide 17 text

No content

Slide 18

Slide 18 text

No content

Slide 19

Slide 19 text

weak points
» single points of failure: Fastly, write primary
» still looking for an off-AWS backup
» expensive to run: too many CouchDBs
» too entangled with CouchDB
» complex in odd places: the skimworker, for example

Slide 20

Slide 20 text

I now praise CouchDB

Slide 21

Slide 21 text

my next goal: make it cheap

Slide 22

Slide 22 text

by next week
» haproxy
» 50-50 region balance
» cheaper by far

Slide 23

Slide 23 text

my long-term goal: npm as reliable utility