Slide 1
Slide 1 text
stabilizing the registry
Slide 2
Slide 2 text
C J Silverio, director of engineering, npm (@ceejbot)
Slide 3
Slide 3 text
This is the story of a plucky package registry named npm
Slide 4
Slide 4 text
scaling problem manifesting itself as a stability problem
Slide 5
Slide 5 text
"scaling": capacity to meet growing demands
Slide 6
Slide 6 text
"At scale": huge demand & lots of data
Slide 7
Slide 7 text
"stability": not falling over under normal demand
Slide 8
Slide 8 text
What's normal demand?
Slide 9
Slide 9 text
129K packages; 239 GB of package tarballs; 40 million package downloads/day; 1500 req/sec, peak 3200
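As a rough back-of-the-envelope check on these figures, the arithmetic can be sketched as below. It assumes decimal gigabytes and an even spread of downloads over the day, neither of which the slide states.

```python
# Back-of-the-envelope arithmetic on the figures from the slide.
# Assumptions (not from the talk): 1 GB = 1000 MB, and downloads are
# spread evenly across a 24-hour day.

SECONDS_PER_DAY = 24 * 60 * 60

packages = 129_000
tarball_storage_gb = 239
downloads_per_day = 40_000_000
typical_rps = 1500
peak_rps = 3200

avg_download_rps = downloads_per_day / SECONDS_PER_DAY
peak_to_typical = peak_rps / typical_rps
avg_mb_per_package = tarball_storage_gb * 1000 / packages

print(f"average downloads/sec: {avg_download_rps:.0f}")
print(f"peak vs typical request rate: {peak_to_typical:.2f}x")
print(f"average tarball storage per package: {avg_mb_per_package:.2f} MB")
```

Under those assumptions the average comes to about 463 downloads per second, and peak traffic is a little over twice the typical request rate.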
Slide 10
Slide 10 text
"Legacy": anything you've put into production
Slide 11
Slide 11 text
this is the story of a legacy system becoming more flexible
Slide 12
Slide 12 text
No content
Slide 13
Slide 13 text
January 2013: 20K packages, 0.5 million dls/day
Slide 14
Slide 14 text
Oct 2013: 44K packages, 108 million dls/month (3.6 million dls/day)
Slide 15
Slide 15 text
No content
Slide 16
Slide 16 text
our plucky little registry had to change
Slide 17
Slide 17 text
step 1: CDN. Put Fastly.com in front of the registry
Slide 18
Slide 18 text
cache rules everything around me
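To illustrate why a cache in front of the registry takes so much load off the origin, here is a minimal sketch of a time-to-live cache. It is not how Fastly or Varnish work internally; `TTLCache`, `fake_origin`, and the 60-second TTL are hypothetical names and values chosen for the example.

```python
import time
from typing import Callable, Dict, Tuple


class TTLCache:
    """Serve repeated reads from memory; go to the origin only on a miss."""

    def __init__(self, fetch_origin: Callable[[str], bytes], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch_origin = fetch_origin
        self._ttl = ttl_seconds
        self._clock = clock
        # path -> (expiry time, cached body)
        self._store: Dict[str, Tuple[float, bytes]] = {}
        self.hits = 0
        self.misses = 0

    def get(self, path: str) -> bytes:
        now = self._clock()
        entry = self._store.get(path)
        if entry is not None and entry[0] > now:
            self.hits += 1
            return entry[1]
        # Missing or expired: fetch from the origin and remember the result.
        self.misses += 1
        body = self._fetch_origin(path)
        self._store[path] = (now + self._ttl, body)
        return body


origin_calls = []


def fake_origin(path: str) -> bytes:
    """Hypothetical stand-in for the registry origin; records each request."""
    origin_calls.append(path)
    return f"body of {path}".encode()


cache = TTLCache(fake_origin, ttl_seconds=60)
for _ in range(1000):
    cache.get("/express")

print(f"origin requests: {len(origin_calls)}, cache hits: {cache.hits}")
```

A thousand requests for the same path reach the origin once; everything else is answered from the cache until the entry expires.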
Slide 19
Slide 19 text
step 2: tarballs. get them out of couchdb
Slide 20
Slide 20 text
tarballs are huge! couch runs better without them. base64 decoding is work.
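A small sketch of what that base64 cost looks like, assuming the tarball bytes travel as base64 text (for example, embedded in a JSON document). The 3 MB random payload is a made-up stand-in for a real tarball.

```python
import base64
import os

raw = os.urandom(3_000_000)        # stand-in for a tarball's bytes
encoded = base64.b64encode(raw)    # what a text-embedded attachment would carry

overhead = len(encoded) / len(raw)
print(f"raw: {len(raw)} bytes, base64: {len(encoded)} bytes ({overhead:.2f}x)")

# Every read of the payload has to undo the encoding before serving it.
decoded = base64.b64decode(encoded)
assert decoded == raw
```

Base64 turns every 3 bytes into 4 characters, so the encoded form is about a third larger, and each request pays CPU time to decode it. Serving the raw file from a filesystem avoids both costs.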
Slide 21
Slide 21 text
No content
Slide 22
Slide 22 text
January 2014: 60K packages, 6+ million dls/day
Slide 23
Slide 23 text
step 3: visibility. are things going wrong? what's going wrong?
Slide 24
Slide 24 text
reactive monitoring: monitor deeply, fix things quickly
Slide 25
Slide 25 text
proactive monitoring: self-healing (also things don't break)
Slide 26
Slide 26 text
monitoring is unit testing. Add monitoring after every outage
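One way to read "monitoring is unit testing" is that each check is a small assertion about the running system, and each outage adds a regression check. The sketch below is an illustration of that idea only; the `check` decorator, the `state` dictionary, and the thresholds are all hypothetical.

```python
from typing import Callable, Dict, List

# Registry of named checks; each returns True when the system looks healthy.
CHECKS: Dict[str, Callable[[], bool]] = {}


def check(name: str):
    """Register a monitoring check, much like declaring a test case."""
    def register(fn: Callable[[], bool]) -> Callable[[], bool]:
        CHECKS[name] = fn
        return fn
    return register


# Hypothetical system state; a real check would query the live service.
state = {"replication_lag_seconds": 4, "disk_used_fraction": 0.97}


@check("replication lag under 30s")
def replication_is_current() -> bool:
    return state["replication_lag_seconds"] < 30


# Added after a (hypothetical) outage caused by a full disk.
@check("disk below 90% full")
def disk_has_headroom() -> bool:
    return state["disk_used_fraction"] < 0.90


def run_checks() -> List[str]:
    """Return the names of failing checks, like a test runner's failure list."""
    return [name for name, fn in CHECKS.items() if not fn()]


print("failing checks:", run_checks())
```

As with a regression test, the disk check exists so that the same failure is caught early the next time instead of being rediscovered during an outage.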
Slide 27
Slide 27 text
visibility is a prerequisite but not a solution
Slide 28
Slide 28 text
act on what monitoring and metrics reveal
Slide 29
Slide 29 text
step 4: redundancy. several CouchDBs! reads, writes, & replication
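A minimal sketch of splitting reads from writes across several database instances. This is an assumption about the general pattern, not npm's actual proxy configuration; the `Router` class and the host names are hypothetical, and replication from the write host to the read hosts is assumed to happen elsewhere.

```python
import itertools
from typing import Iterator, List

READ_METHODS = {"GET", "HEAD"}


class Router:
    """Send reads to replicas in round-robin order and writes to one primary."""

    def __init__(self, primary: str, replicas: List[str]) -> None:
        if not replicas:
            raise ValueError("at least one read replica is required")
        self._primary = primary
        self._replicas: Iterator[str] = itertools.cycle(replicas)

    def backend_for(self, method: str) -> str:
        if method.upper() in READ_METHODS:
            return next(self._replicas)
        # Anything that can modify data goes to the single write primary.
        return self._primary


router = Router("couch-write-1", ["couch-read-1", "couch-read-2"])
for method in ["GET", "GET", "PUT", "GET"]:
    print(method, "->", router.backend_for(method))
```

Because each host has one job, a failing read replica can be dropped from the rotation without affecting publishes, and read capacity can be added by starting another replica.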
Slide 30
Slide 30 text
fewer responsibilities for each piece; isolates errors
Slide 31
Slide 31 text
step 5: automation. ansible. no server is special
Slide 32
Slide 32 text
June 2014: Superficially similar.
Slide 33
Slide 33 text
June 2014: 80K packages, 10 million dls/day
Slide 34
Slide 34 text
step 6: simplification. now that it's not on fire we can modify at leisure
Slide 35
Slide 35 text
No content
Slide 36
Slide 36 text
Nov 2014: 105K packages, 28 million dls/day peak
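Putting the dated figures from the slides side by side gives a sense of the growth rate. The sketch below uses only the numbers quoted in the talk, with two simplifications flagged in the comments.

```python
# (label, packages, downloads per day) as quoted on the slides.
# "6+ million" is taken as 6 million, and the Nov 2014 figure is a daily peak,
# so it is not strictly comparable with the earlier daily figures.
timeline = [
    ("Jan 2013", 20_000, 500_000),
    ("Oct 2013", 44_000, 3_600_000),
    ("Jan 2014", 60_000, 6_000_000),
    ("Jun 2014", 80_000, 10_000_000),
    ("Nov 2014", 105_000, 28_000_000),
]

first_label, first_pkgs, first_dls = timeline[0]
for label, pkgs, dls in timeline:
    print(f"{label}: packages x{pkgs / first_pkgs:.2f}, "
          f"downloads x{dls / first_dls:.1f}, "
          f"{dls / pkgs:.0f} downloads/package/day")
```

Over this period the package count grows a little more than fivefold while daily downloads grow more than fiftyfold (using the peak figure for November 2014), so demand per package rises sharply as well.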
Slide 37
Slide 37 text
50/50 AWS region split; no AWS-specific magic; Ubuntu 14.04 Trusty
Slide 38
Slide 38 text
Fastly: geoloc + varnish; haproxy + CouchDB; nginx + a filesystem
Slide 39
Slide 39 text
where's the node?
Slide 40
Slide 40 text
registry 2: electric boogaloo, with 500% more node
Slide 41
Slide 41 text
No content
Slide 42
Slide 42 text
haproxy + node services; couchdb ➜ postgres; redis for caching; nginx + filesystem
Slide 43
Slide 43 text
more complicated; more flexible & redundant; more scaling dials to turn
Slide 44
Slide 44 text
excited about postgres: ad-hoc queries are fun
Slide 45
Slide 45 text
scaling node is exactly like scaling everything else
Slide 46
Slide 46 text
Understand the system; get visibility; cool down hot spots; add redundancy
Slide 47
Slide 47 text
npm client <3: npm install -g npm@latest
Slide 48
Slide 48 text
npm loves you