Slide 1
Slide 1 text
scaling the registry
Slide 2
Slide 2 text
C J Silverio; devops at npmjs.com; @ceejbot
Slide 3
Slide 3 text
What we did; lessons learned; generalizations
Slide 4
Slide 4 text
No content
Slide 5
Slide 5 text
Jacques Marneweck; Benjamin Coe; Laurie Voss
Slide 6
Slide 6 text
No content
Slide 7
Slide 7 text
January 2013: 20K packages; 0.5 million downloads/day
Slide 8
Slide 8 text
January 2014: 60K packages; 8 million downloads/day
Slide 9
Slide 9 text
November 2014: > 100K packages; 28 million downloads/day peak
Slide 10
Slide 10 text
side project; 100% CouchDB; donated hosting; IrisCouch
Slide 11
Slide 11 text
October 2013
Slide 12
Slide 12 text
General lesson #1: put a cache on it
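A minimal sketch of "put a cache on it", assuming a read-through cache with a fixed time-to-live. The `TTLCache` class, its `fetch` callback, and the injectable `clock` are hypothetical names for illustration; the slides do not say how the registry's cache was built.

```python
import time
from typing import Callable, Dict, Tuple


class TTLCache:
    """Read-through cache: serve a stored value until it is older than ttl seconds."""

    def __init__(
        self,
        fetch: Callable[[str], bytes],
        ttl: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch    # slow origin lookup (hypothetical: the database)
        self._ttl = ttl        # seconds a cached entry stays fresh
        self._clock = clock    # injectable so the cache can be tested without sleeping
        self._store: Dict[str, Tuple[float, bytes]] = {}
        self.hits = 0
        self.misses = 0

    def get(self, key: str) -> bytes:
        now = self._clock()
        entry = self._store.get(key)
        if entry is not None and now - entry[0] < self._ttl:
            self.hits += 1
            return entry[1]
        # Missing or stale: go to the origin once and remember the answer.
        self.misses += 1
        value = self._fetch(key)
        self._store[key] = (now, value)
        return value
```

Every hit is a request the origin never sees, which is the point of the lesson.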
Slide 13
Slide 13 text
Re-architecture 1: move tarballs out of poor CouchDB
Slide 14
Slide 14 text
No content
Slide 15
Slide 15 text
February 2014: company founded
Slide 16
Slide 16 text
hosted on Joyent/SmartOS; hand-built CouchDB + SpiderMonkey; bash scripts to deploy
Slide 17
Slide 17 text
Twitter tells us when we're down
Slide 18
Slide 18 text
Re-architecture 2: many CouchDBs
Slide 19
Slide 19 text
General lesson #2: understand your db deeply
Slide 20
Slide 20 text
Monitoring & alerts
Slide 21
Slide 21 text
General lesson #3: add monitoring after every outage
Slide 22
Slide 22 text
1: reactive; monitor deeply; fix things quickly
Slide 23
Slide 23 text
2: proactive; self-healing; monitoring (also things don't break)
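One way to picture the step from reactive to proactive is a check that tries to repair before it pages anyone. This is a sketch under assumptions: `heal`, `check`, `restart`, and `max_restarts` are hypothetical names, and the slides do not describe the actual self-healing mechanism.

```python
from typing import Callable


def heal(
    check: Callable[[], bool],
    restart: Callable[[], None],
    max_restarts: int = 3,
) -> str:
    """Return "ok" if healthy, "healed" if a restart fixed it, "alert" otherwise."""
    if check():
        return "ok"
    for _ in range(max_restarts):
        restart()          # attempt the automatic repair
        if check():
            return "healed"
    return "alert"         # self-healing failed: a human needs to look
```

Only the "alert" outcome needs a person; the other two can simply be counted.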
Slide 24
Slide 24 text
June 2014: superficially similar.
Slide 25
Slide 25 text
AWS / Ubuntu; 70/30 west/east split; 52 running instances, variable
Slide 26
Slide 26 text
No content
Slide 27
Slide 27 text
50/50 AWS region split; haproxy to load balance; no AWS-specific magic
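The even split can be illustrated with plain round-robin, one of the balancing strategies haproxy offers. This is a toy sketch, not the registry's haproxy configuration: the `assign` function and the backend hostnames are made up for the example.

```python
from collections import Counter
from itertools import cycle
from typing import List, Sequence, Tuple


def assign(requests: Sequence[str], backends: Sequence[str]) -> List[Tuple[str, str]]:
    """Pair each request with a backend, rotating through the backends in order."""
    return list(zip(requests, cycle(backends)))


# Hypothetical backends, one per region.
backends = ["west.example.internal", "east.example.internal"]
requests = [f"GET /package-{i}" for i in range(10)]

# Count how many requests each backend received.
split = Counter(backend for _, backend in assign(requests, backends))
```

With two backends and an even number of requests, `split` gives each region the same share, and nothing in the logic depends on the cloud provider.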
Slide 28
Slide 28 text
Fastly: geoloc + cache; haproxy / CouchDB; nginx + a filesystem
Slide 29
Slide 29 text
behind the scenes: Ansible / Nagios; InfluxDB + Grafana
Slide 30
Slide 30 text
General lesson #4: metrics for everything
Slide 31
Slide 31 text
memory & cpu use; request latency; event counts
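Request latency and event counts are cheap to record in-process. A minimal sketch, assuming an in-memory collector: the `Metrics` class and its method names are hypothetical, and shipping the numbers to a time-series store is left out.

```python
import time
from collections import defaultdict
from contextlib import contextmanager
from typing import DefaultDict, Iterator, List


class Metrics:
    """In-memory collector for event counts and latency samples."""

    def __init__(self) -> None:
        self.counts: DefaultDict[str, int] = defaultdict(int)
        self.latencies: DefaultDict[str, List[float]] = defaultdict(list)

    def count(self, name: str, n: int = 1) -> None:
        """Record n occurrences of an event."""
        self.counts[name] += n

    @contextmanager
    def timer(self, name: str) -> Iterator[None]:
        """Time the enclosed block and store the duration in seconds."""
        start = time.perf_counter()
        try:
            yield
        finally:
            # Recorded even if the block raises, so failures are timed too.
            self.latencies[name].append(time.perf_counter() - start)


metrics = Metrics()
with metrics.timer("request.latency"):
    metrics.count("request.served")
```

Memory and CPU use would come from the operating system rather than from the application, so they are not shown here.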
Slide 32
Slide 32 text
metrics == visibility
Slide 33
Slide 33 text
metrics drive monitoring
Slide 34
Slide 34 text
General lesson #5: automate
Slide 35
Slide 35 text
no special snowflakes; every instance can be replaced
Slide 36
Slide 36 text
General lesson #6: the goal is to be BORING
Slide 37
Slide 37 text
if operations are boring, you can do the dev
Slide 38
Slide 38 text
Goal: to be the most boring part of your node experience
Slide 39
Slide 39 text
npm client <3; npm install -g npm@latest
Slide 40
Slide 40 text
npm loves you