Stabilizing npm

How the npm, Inc team stabilized the registry early in 2014. Video from the presentation is here: https://www.youtube.com/watch?v=3ivx2RsZ1yA

C J Silverio

July 24, 2014

Transcript

  1. stabilizing
    the registry

  2. C J Silverio
    devops at npmjs.com
    @ceejbot

  3. side project
    100% couchdb
    donated hosting
    IrisCouch

  4. (no transcript text)

  5. (no transcript text)

  6. December 2013

  7. January 2014

  8. February 2014
    » company founded & funded
    » 100% hosted on Joyent
    » several skimdbs load-balanced by Fastly
    » hand-built CouchDB + Spidermonkey
    » automation by bash
    » Twitter tells us when we're down

  9. This is when I arrive.
    (funding means you can hire!)
    » PagerDuty account: first thing I did
    » Nagios all hooked up & monitoring basic host health
    » we have maybe 10 hosts total driving the registry

  10. Funding also means attention
    from bounty-hunters.

  11. security audit

  12. Stabilization stage 1
    reactive
    » monitor everything more deeply
    » methodically identify & monitor causes of outages
    » react quickly to fix problems
    » Twitter is no longer telling us when we're down
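
The kind of check this stage relies on can be sketched as follows. Nagios plugins report state through their exit code (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN); everything else here, including the function names and the latency thresholds, is a hypothetical illustration and not npm's actual monitoring code.

```python
from enum import IntEnum


class NagiosStatus(IntEnum):
    """Exit codes that Nagios plugins use to report state."""
    OK = 0
    WARNING = 1
    CRITICAL = 2
    UNKNOWN = 3


def classify_response(status_code, elapsed_seconds, warn_after=1.0, crit_after=5.0):
    """Map one HTTP probe result to a Nagios status.

    status_code is None when the probe could not connect at all.
    The thresholds are hypothetical defaults, not values from the talk.
    """
    if status_code is None:
        return NagiosStatus.CRITICAL
    if status_code >= 500 or elapsed_seconds >= crit_after:
        return NagiosStatus.CRITICAL
    if status_code >= 400 or elapsed_seconds >= warn_after:
        return NagiosStatus.WARNING
    return NagiosStatus.OK


def format_plugin_output(service, status, detail):
    """Build the single status line a plugin prints before exiting."""
    return f"{service} {status.name} - {detail}"
```

A real plugin would perform the HTTP probe, print the formatted line, and exit with `int(status)` so that Nagios (and, through it, PagerDuty) can act on the result.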

  13. Stabilization stage 2
    proactive
    » our second devops person: Ben Coe
    » recurring problems fixed in the apps
    » monitoring checks self-heal
    » redundancy everywhere
    » automation!
    » our night shift is bored!
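
One way to read "monitoring checks self-heal" is a check that tries an automatic fix before it pages anyone. The sketch below assumes that reading; `run_self_healing_check` and its three callables are hypothetical names, and the slide does not say how the real checks were built.

```python
def run_self_healing_check(check, remediate, page, max_attempts=1):
    """Run check(); on failure try remediate() before paging a human.

    check      -- callable returning True when the service is healthy
    remediate  -- callable that attempts a fix (for example, a restart)
    page       -- callable taking a message, used only if healing fails
    Returns "ok", "healed", or "paged".
    """
    if check():
        return "ok"
    for _ in range(max_attempts):
        remediate()
        if check():
            # The fix worked, so nobody needs to be woken up.
            return "healed"
    page("service still unhealthy after automatic remediation")
    return "paged"
```

Passing the check, the fix, and the pager in as callables keeps the escalation policy separate from any one service, so the same wrapper could sit in front of many checks.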

  14. June 2014
    Superficially
    similar.

  15. major changes
    100% on AWS
    Ubuntu Trusty
    70/30 split between us-west-2 & us-east-1
    100% automated with ansible
    52 running instances, variable
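
As a rough illustration of what a 70/30 split means in practice, here is a small weighted-selection sketch. The slide does not say whether the split applies to traffic or to instances, so both helpers are assumptions, and the names `REGION_WEIGHTS`, `pick_region`, and `expected_instances` are hypothetical.

```python
import random

# Weights taken from the slide; how the split is enforced is not stated.
REGION_WEIGHTS = {"us-west-2": 70, "us-east-1": 30}


def pick_region(rng, weights=REGION_WEIGHTS):
    """Choose one region at random, in proportion to its weight."""
    regions = list(weights)
    return rng.choices(regions, weights=[weights[r] for r in regions], k=1)[0]


def expected_instances(total, weights=REGION_WEIGHTS):
    """Split an instance count across regions by weight.

    Each share is rounded independently, so for some totals the
    shares may not add up exactly to the total.
    """
    scale = sum(weights.values())
    return {region: round(total * w / scale) for region, w in weights.items()}
```

If the split applied to instances, `expected_instances(52)` would put 36 in us-west-2 and 16 in us-east-1. Passing equal weights to either helper models an even balance between regions.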

  16. the stack
    » Fastly CDN for Varnish cache & geolocality
    » nginx to serve static files
    » pound to terminate TLS
    » CouchDB for package metadata & app logic
    » nagios + PagerDuty for monitoring
    » InfluxDB + Grafana for metrics
    » Tarsnap for backups

  17. (no transcript text)

  18. (no transcript text)

  19. weak points
    » single points of failure: Fastly, write primary
    » still looking for an off-AWS backup
    » expensive to run: too many couchdbs
    » too entangled with couchdb
    » complex in odd places: the skimworker, for example

  20. I now praise
    CouchDB

  21. my next goal:
    make it cheap

  22. by next week
    haproxy
    50-50 region balance
    cheaper by far

  23. my long-term goal:
    npm as reliable
    utility
