Slide 1
Slide 1 text
scaling the registry
Slide 2
Slide 2 text
C J Silverio, devops at npmjs.com, @ceejbot
Slide 3
Slide 3 text
What we did, lessons learned, generalizations
Slide 5
Slide 5 text
Jacques Marneweck, Benjamin Coe, Laurie Voss
Slide 7
Slide 7 text
January 2013: 20K packages, 0.5 million downloads/day
Slide 8
Slide 8 text
January 2014: 60K packages, 8 million downloads/day
Slide 9
Slide 9 text
November 2014: > 100K packages, 28 million downloads/day at peak
Slide 10
Slide 10 text
side project, 100% CouchDB, donated hosting, IrisCouch
Slide 11
Slide 11 text
October 2013
Slide 12
Slide 12 text
General lesson #1: Put a cache on it
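One way to picture "put a cache on it" is a read-through cache sitting in front of a slow origin. The sketch below is only an illustration, not the registry's actual setup: `ReadThroughCache`, `fetch`, and the TTL value are hypothetical names and parameters chosen for the example.

```python
import time
from typing import Callable, Dict, Tuple


class ReadThroughCache:
    """Serve repeated reads from memory; go to the origin only on a miss or expiry."""

    def __init__(
        self,
        fetch: Callable[[str], bytes],   # hypothetical origin lookup, e.g. a database read
        ttl_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[float, bytes]] = {}  # key -> (expires_at, value)
        self.hits = 0
        self.misses = 0

    def get(self, key: str) -> bytes:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            self.hits += 1
            return entry[1]
        # Miss or expired entry: ask the origin and remember the answer.
        self.misses += 1
        value = self._fetch(key)
        self._entries[key] = (now + self._ttl, value)
        return value
```

With a cache like this in front of it, the origin sees one request per key per TTL window instead of one per client request.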
Slide 13
Slide 13 text
Re-architecture 1: move tarballs out of poor CouchDB
Slide 15
Slide 15 text
February 2014: company founded
Slide 16
Slide 16 text
hosted on Joyent/SmartOS, hand-built CouchDB + SpiderMonkey, bash scripts to deploy
Slide 17
Slide 17 text
Twitter tells us when we're down
Slide 18
Slide 18 text
Re-architecture 2: many CouchDBs
Slide 19
Slide 19 text
General lesson #2: understand your db deeply
Slide 20
Slide 20 text
Monitoring & alerts
Slide 21
Slide 21 text
General lesson #3: Add monitoring after every outage
Slide 22
Slide 22 text
1: reactive. Monitor deeply, fix things quickly.
Slide 23
Slide 23 text
2: proactive. Self-healing monitoring (also, things don't break).
Slide 24
Slide 24 text
June 2014: superficially similar.
Slide 25
Slide 25 text
AWS / Ubuntu, 70/30 west/east split, 52 running instances (variable)
Slide 26
Slide 27 text
50/50 AWS region split, haproxy to load balance, no AWS-specific magic
Slide 28
Slide 28 text
Fastly: geoloc + cache; haproxy / CouchDB; nginx + a filesystem
Slide 29
Slide 29 text
behind the scenes: ansible / nagios, InfluxDB + Grafana
Slide 30
Slide 30 text
General lesson #4: metrics for everything
Slide 31
Slide 31 text
memory & cpu use, request latency, event counts
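Two of the metric kinds listed here, event counts and request latency, can be sketched with a very small in-process recorder. This is an illustrative sketch only: the `Metrics` class and the metric names are hypothetical. Shipping the numbers to a time-series store and sampling memory or CPU use are left out.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class Metrics:
    """Collect event counts and latency samples in memory."""

    def __init__(self):
        self.counters = defaultdict(int)    # metric name -> running count
        self.latencies = defaultdict(list)  # metric name -> durations in seconds

    def count(self, name, n=1):
        self.counters[name] += n

    @contextmanager
    def timed(self, name):
        # Record how long the wrapped block took, even if it raises.
        start = time.perf_counter()
        try:
            yield
        finally:
            self.latencies[name].append(time.perf_counter() - start)


metrics = Metrics()
metrics.count("requests")
with metrics.timed("request_latency"):
    sum(range(1000))  # stand-in for real request handling
```

A reporter could then periodically flush `counters` and `latencies` to whatever dashboarding stack is in use.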
Slide 32
Slide 32 text
metrics == visibility
Slide 33
Slide 33 text
metrics drive monitoring
Slide 34
Slide 34 text
General lesson #5: automate
Slide 35
Slide 35 text
no special snowflakes; every instance can be replaced
Slide 36
Slide 36 text
General lesson #6: the goal is to be BORING
Slide 37
Slide 37 text
if operations are boring, you can do the dev
Slide 38
Slide 38 text
Goal: to be the most boring part of your node experience
Slide 39
Slide 39 text
npm client <3: npm install -g npm@latest
Slide 40
Slide 40 text
npm loves you