
Nerdy Pontification Mealtime

Lunchtime talk to Groupon about npm present and future, and building websites in node.

Laurie

May 07, 2014

Transcript

  1. Laurie Voss, CTO of npm Inc, @seldo. Was a front-end developer. Started building server apps in Java and PHP. Slid down the stack to being a MySQL DBA. Ended up an ops guy. Think about the full stack, a lot.
  2. What are we gonna talk about? 1. npm-the-service: architecture. 2. npm-the-ecosystem: what's coming up. 3. How should we build websites in node?
  3. Growth. node is growing fast. npm is growing even faster. This is probably because of a complexity law. New versions of npm will let us know if we're right. Growing insanely.
  4. Not Web Scale. registry.npmjs.org = 1500rps; www.npmjs.org = 40rps. But we are actually still pretty small as web things go. Peak is somewhere around 1500rps for the registry: not trivial, but not crazy. 40rps for the website: barely there, by the scale of consumer web products.
  5. Dev Scale. 500,000 unique users/month ...who even knew that many people had heard of Node? But the world of dev products is pretty different. npmjs.org sees 500,000 unique users/month; for a developer product, that's amazing. Who even knew there were 500,000 people who used node? 1.3MM uniques YTD, but nobody trusts web analytics that far.
  6. Architecture of npm. Architecture is a function of your load pattern. Ours is simple: ludicrously high reads, tiny numbers of writes. We list each write on the homepage! This is easy! But we had a TON of downtime before becoming an inc; that's part of why we became an inc.
  7. This is the architecture today. What did we do? First: split out the binaries from the database. Not a scaling problem most people have! Never put binaries in your database. Requirements of metadata and binaries scale very differently. Most requests to the registry are 304s: the client caches the binary, but always checks the JSON. Client says "I've got 3.1.4, got anything better?" Server says "no". Very fast and simple. Lots more JSON requests than binary requests. Binaries now replicated independently to two different data stores. Now we have only metadata in the db: easier to manage, better suited for, you know, being a database.
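The "got anything better?" exchange can be sketched as an HTTP conditional request. The slide only says that most responses are 304s; the use of an ETag compared against If-None-Match is an assumption here, and `respondToMetadataRequest` and the sample document are hypothetical names for illustration, not the registry's code.

```javascript
// Hypothetical helper: decide how to answer a request for package metadata.
// `clientEtag` is what the client sent in If-None-Match (undefined on a first request).
// `current` is the server's current copy of the package document.
function respondToMetadataRequest(clientEtag, current) {
  if (clientEtag !== undefined && clientEtag === current.etag) {
    // The client's cached JSON is still current: answer with no body.
    return { status: 304, body: null };
  }
  // First request, or the document changed: send the full JSON and its ETag.
  return { status: 200, etag: current.etag, body: current.json };
}

// Made-up document standing in for a package's metadata.
const doc = { etag: '"rev-42"', json: '{"dist-tags":{"latest":"3.1.4"}}' };

const first = respondToMetadataRequest(undefined, doc);
const again = respondToMetadataRequest(first.etag, doc);
console.log(first.status, again.status); // prints: 200 304
```

The 304 path carries no body, which is why a registry dominated by these checks can stay fast even at high request rates.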
  8. Aside: CouchDB. https://www.flickr.com/photos/revjim/118147356/ Couch is a really neat b-tree-on-disk algorithm with a proof of concept client wrapped around it. It is not a very good production database. We do not recommend it for anything ever. Except it IS really simple to replicate, and keep those replicas consistent. That's the thing most databases really suck at! It also has changes feeds, which are pretty great.
  9. What we did: cacheability. No more publish --force! (blame past-isaacs) Shrinkwrap works now! We stopped allowing publish -f. We blame past-isaacs for ever letting this happen. It allowed you to replace an existing binary with a new one(!), a deeply terrible idea that was abused. Confused the fuck out of devops people. Made shrinkwrap functionally useless. "Check in your node_modules" became best practice.
  10. Versions are immutable. https://www.flickr.com/photos/ndecam/5803816395/ Shrinkwrap now works the way it's supposed to. You can still unpublish a version; shrinkwrap will then fail, but that's probably what you want to happen if you're depending on a bad version. It makes managing binaries much simpler. It allows us to cache the heck out of everything forever.
  11. Scaled reads horizontally. Tons of read-only replicas for metadata. JSON responses are only cached for 1 second, but when you have 1000+rps that takes db hits down by three orders of magnitude. Makes it nearly trivial, as long as your cache servers work.
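To see why a one-second cache is enough, here is a minimal sketch of a TTL cache in front of a slow lookup. `makeMicroCache`, `fetchFromDb` and the injected clock are hypothetical names for illustration; the real setup uses separate cache servers rather than an in-process map.

```javascript
// Hypothetical micro-cache: remembers each value for `ttlMs` milliseconds.
// `fetchFromDb` and `now` are injected so the sketch runs without a real database or clock.
function makeMicroCache(fetchFromDb, ttlMs, now) {
  const entries = new Map(); // key -> { value, expiresAt }
  return function get(key) {
    const hit = entries.get(key);
    if (hit !== undefined && hit.expiresAt > now()) {
      return hit.value; // still fresh: no database hit
    }
    const value = fetchFromDb(key);
    entries.set(key, { value, expiresAt: now() + ttlMs });
    return value;
  };
}

// Simulate 1000 requests for the same document, one per millisecond.
let dbHits = 0;
let clock = 0;
const getDoc = makeMicroCache(
  (key) => { dbHits += 1; return { name: key }; },
  1000,
  () => clock
);
for (let i = 0; i < 1000; i += 1) {
  clock = i;
  getDoc('express');
}
console.log(dbHits); // prints: 1
```

One database hit for a thousand requests is the "three orders of magnitude" from the slide, under the simplifying assumption that all requests are for the same document.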
  12. Fastly is our CDN. Hundreds of edge servers. Globally distributed. DNS lookups: dark magic. They are better at it than we are. When you're small, that's a good decision to make. But exporting functions is fraught with danger.
  13. Caching is tricky. Cache servers that are remote add a hop. Cache hits are way faster. Cache misses are 1 hop slower. Think carefully about that tradeoff and tune cache to reflect that.
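The tradeoff can be put into numbers with a rough model. This is a sketch under stated assumptions: a hit costs only the trip to the cache, a miss costs that trip plus the trip to the origin, and the latency figures are made up for illustration.

```javascript
// Expected latency in milliseconds when every request goes through a remote cache.
// hitRate is in [0, 1]; cacheMs is the round trip to the cache; originMs is the origin's response time.
function expectedLatencyMs(hitRate, cacheMs, originMs) {
  const hitCost = cacheMs;             // answered at the cache
  const missCost = cacheMs + originMs; // extra hop, then the origin anyway
  return hitRate * hitCost + (1 - hitRate) * missCost;
}

// With a 10ms cache and a 100ms origin (illustrative values only):
console.log(expectedLatencyMs(0, 10, 100));   // 110: worse than going direct (100)
console.log(expectedLatencyMs(0.5, 10, 100)); // 60: better than going direct
```

Under this model the cache pays off only when the hit rate exceeds cacheMs / originMs, which is one way to read "tune cache to reflect that".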
  14. What we did: scaled vertically. Single giant write master w/ fast(ish) promotion. Multi-master for concurrent writes is incredibly hard; you need to modify your application to do it properly. We can only modify *new* versions of npm. Old versions of npm don't do atomic writes and never will. So we have a single master, and we can promote a replica to master manually, but fast. You don't want master failover to be automatic because it's expensive.
  15. PSA: update your npm with: sudo npm install npm -g
    It's faster, and it fucks up less often. DON'T USE BREW. DON'T USE DEFAULT APT-GET. New version reports what version it is! Will allow us to tell you about bugs, suggest upgrades. Already finding and fixing bugs this way.
  16. What's coming up: Private modules. First paid feature of npm, Inc. "Wait, don't we already have these?" You sort of do: you have a local copy of npm. You can publish local modules and share them with each other, without having to share them with the whole world. Private modules turn this hack into a first-class feature. Prevent collisions with public modules by namespacing.
  17. What private modules look like.
    On the command line: npm install @groupon/mymodule
    In javascript: require('@groupon/bob')
    In package.json: "dependencies": { "@groupon/mymodule": "1.0.0" }
    This all works in node already. All you need is a new version of npm. Backwards-compatible with your production env.
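As a small illustration of the "@scope/name" form, the sketch below splits a package name into its scope and module name and builds the folder it would occupy under node_modules. `parsePackageName` and `installPath` are hypothetical helpers, not part of npm, and the folder layout is an assumption based on the next slide's "groups code in node_modules".

```javascript
const path = require('path');

// Hypothetical parser for names like "@groupon/mymodule"; unscoped names have a null scope.
function parsePackageName(spec) {
  const match = /^@([^/]+)\/([^/]+)$/.exec(spec);
  if (match) {
    return { scope: match[1], name: match[2] };
  }
  return { scope: null, name: spec };
}

// Assumed layout: scoped packages are grouped in a folder named after the scope.
function installPath(spec) {
  const { scope, name } = parsePackageName(spec);
  return scope === null
    ? path.posix.join('node_modules', name)
    : path.posix.join('node_modules', '@' + scope, name);
}

console.log(installPath('@groupon/mymodule')); // prints: node_modules/@groupon/mymodule
console.log(installPath('express'));           // prints: node_modules/express
```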
  18. What's @groupon? Creates new namespaces. Groups code in node_modules. Looks kinda like Twitter @names (super cool). Doesn't conflict with GitHub repos (oops). Orgs can own modules. Modules can be public, private or in-between.
  19. Global modules. This is all current modules. Global modules can *only* depend on other global modules. No existing module will need to change, but it will make people think carefully before "promoting" a global module. Git repos are already in dependencies of existing modules; we can't tell if they're public. Will grandfather them in, but future globals should use public packages.
  20. Owned modules: npmjs.org/@myorg/mypackage
    Can be public: allows new users to publish descriptively-named packages without colliding; lets teams organize all of their related packages.
    Can be private: allows teams to share code without sharing with the world; avoids the split-brain annoyance of having some modules as git repos.
    Can be semi-private: grant others access to modules in your team, maybe as a result of a subscription? Still under discussion and debate.
  21. Why private modules? https://www.flickr.com/photos/schill/4813392151/ You already have them, so you probably don't need to be sold. The node philosophy is "lots of small, simple modules". This works great when your code is public. When it's private, sharing small modules between projects becomes painful. Leads to monolithic projects. Bad! Private modules allow you to decompose without worrying. Which brings us to a larger point about websites, and how they should work in node.
  22. We're at a turning point. Developers are arriving at Node from Ruby and other places. First question: "how do I build a website?" The answer is "ehhhh.... express?" We have an opportunity to define "the right way". The old way was MVC. Do we want to do it again?
  23. MVC: the good parts. A brilliant, industry-changing pattern. Instead of having to train every new hire on the architecture of your web app, just say "It's an MVC app" and suddenly their first month is done. Massive improvements in developer productivity. Came coupled with a couple of significantly less sensible ideas, like ORM: http://seldo.com/weblog/2011/08/11/orm_is_an_antipattern But has some problems when used at scale. Not tech scale, but team scale problems.
  24. MVC: the bad parts. https://www.flickr.com/photos/ecstaticist/2589723846/ Having all your data model code in one convenient place is great, but leads to a "just one more thing" approach to design. Your application slowly accumulates into a single giant monolith. Becomes hard to maintain, slow to release. Scales, but expensively: each server has everything in memory, which means only a few workers per server. You want to scale your front door to 1000rps, but you're also paying to scale your login page, your admin screen, your API... Model changes become painful. Splitting up your API becomes painful.
  25. SOA: everybody's doing it. Tons of small services. Loosely coupled. Failure tolerant. Automatic recovery. ...but it's not all roses. SOA splits your application into lots of small services. Each can be developed, deployed and scaled independently. Strongly-defined interfaces allow rapid development without breakage. Mechanisms for "backpressure" are essential. Must be tolerant of failure, and recover automatically: queue-on-failure, backoff-and-retry. There's lots to love, and most big web companies are going this way.
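"Backoff-and-retry" can be sketched in a few lines. This is a minimal illustration, assuming exponential backoff with a cap; the talk does not say which policy any particular service uses, and `backoffDelayMs`, `retryWithBackoff` and their parameters are hypothetical names.

```javascript
// Delay before retrying after the given failed attempt (0-based): doubles each time, up to a cap.
function backoffDelayMs(attempt, baseMs, maxMs) {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

function defaultSleep(ms) {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

// Retry an async operation, waiting longer after each failure.
// `sleep` is injectable so callers can avoid real waiting in tests.
async function retryWithBackoff(operation, options = {}) {
  const { retries = 5, baseMs = 100, maxMs = 5000, sleep = defaultSleep } = options;
  let lastError;
  for (let attempt = 0; attempt <= retries; attempt += 1) {
    try {
      return await operation();
    } catch (err) {
      lastError = err;
      if (attempt < retries) {
        await sleep(backoffDelayMs(attempt, baseMs, maxMs));
      }
    }
  }
  throw lastError; // every attempt failed
}

const delays = [0, 1, 2, 3, 4, 5, 6].map((n) => backoffDelayMs(n, 100, 5000));
console.log(delays.join(', ')); // prints: 100, 200, 400, 800, 1600, 3200, 5000
```

A caller would wrap a call to a downstream service, for example `retryWithBackoff(() => callService())` where `callService` is whatever async function talks to that service. Production versions usually add random jitter so that many clients do not retry in lockstep.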
  26. Sharing is hard. https://www.flickr.com/photos/ryanr/142455033/ Assume you've turned your data models into APIs. Is your entire website still a single application? If not, what about the common parts of your website? Are your templates common to all of your services? Do you have a "template service"? Yahoo really does have a "header service". So does Groupon. But the bigger problem with SOA is that it's harder.
  27. Teach me how to website. https://www.flickr.com/photos/photophilde/2941549171 A big part of the popularity of MVC is that it became "how you build websites". But if we've decided that MVC doesn't ultimately scale to large applications, what do we do instead? The status quo is that everybody builds sites in MVC and then painfully migrates to SOA later. Is there a better way?
  28. SOA from the start. Can we teach new developers to build using SOA from the get go? Is this even a good idea, or is it premature optimization? Projects like seneca try to bridge the gap: everything's conceptually a service, but it starts out as a single app. You could also use Docker to run your "services" all on a single machine easily. But this isn't solving the complexity problem.
  29. What is the "Rails of SOA"? ...do we even want one? What does the design pattern of an SOA-from-the-start look like? What tooling are we lacking? Is it just a matter of documentation? What projects haven't I heard about? What do you think? Opinions welcome! "Not a question, more of a comment" guy: now is your time to shine!
  30. BONUS SLIDE: npm commands! (that the team wishes you knew about)
    npm init
    npm install --save, --save-dev
    npm link
    npm info <package> <field>
    npm install express@visionmedia/express#remove-connect
    npm i -S