We are actually still pretty small, as web things go.
Peak is somewhere around 1,500 rps for the registry: not trivial, but not crazy.
The website sees about 40 rps: barely there, by the scale of consumer web products.
But the world of dev products is pretty different: npmjs.org sees 500,000 unique users a month.
For a developer product, that's amazing. Who even knew there were 500,000 people who used Node?
1.3MM uniques year-to-date, but nobody trusts web analytics that far.
Our pattern is simple: ludicrously high reads, tiny numbers of writes. We list every write on the homepage! This should be easy!
But we had a TON of downtime before becoming an Inc; it's part of why we became an Inc.
We split the binaries out of the database. Not a scaling problem most people have, but: never put binaries in your database. The requirements of metadata and binaries scale very differently.
Most requests to the registry are 304s: the client caches the binary, but always checks the JSON. The client says "I've got 3.1.4, got anything better?" and the server says "no". Very fast and simple, and it means lots more JSON requests than binary requests.
The binaries are now replicated independently to two different data stores, and the database holds only metadata: easier to manage, and better suited for, you know, being a database.
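For the curious, that 304 dance is just a conditional GET against the package JSON. A minimal sketch, with a made-up package name and etag (real npm clients do this internally):

```js
// Check the registry for fresher metadata using a saved etag.
var https = require('https');

var cachedEtag = '"abc123"'; // etag saved from a previous response

https.get({
  host: 'registry.npmjs.org',
  path: '/some-package',
  headers: { 'If-None-Match': cachedEtag }
}, function (res) {
  if (res.statusCode === 304) {
    console.log('cache is fresh, use the local copy'); // no body sent
  } else {
    console.log('new metadata, update the cache');     // read res body here
  }
  res.resume(); // drain anything remaining so the socket is freed
});
```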
CouchDB is essentially a disk algorithm with a proof-of-concept client wrapped around it. It is not a very good production database, and we do not recommend it for anything, ever.
Except: it IS really simple to replicate, and to keep those replicas consistent, which is the thing most databases really suck at!
It also has changes feeds, which are pretty great.
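A sketch of what consuming a changes feed looks like with the `follow` module; the public replica URL here is an assumption, substitute your own:

```js
// Tail a CouchDB changes feed, getting a callback per document write.
var follow = require('follow');

follow({ db: 'https://skimdb.npmjs.com/registry', since: 'now' },
  function (error, change) {
    if (error) throw error;
    // each change names a package whose document was just written
    console.log('changed:', change.id, 'at seq', change.seq);
  });
```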
Shrinkwrap works now! We stopped allowing publish -f (we blame past-Isaac for ever letting this happen).
It allowed you to replace an existing binary with a new one(!), a deeply terrible idea that was abused. It confused the fuck out of devops people, made shrinkwrap functionally useless, and is why "check in your node_modules" became best practice.
Now shrinkwrap works the way it's supposed to. You can still unpublish a version, and shrinkwrap will then fail, but that's probably what you want to happen if you're depending on a bad version.
It makes managing binaries much simpler, and it allows us to cache the heck out of everything, forever. (A sketch of what shrinkwrap pins is below.)
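Running `npm shrinkwrap` writes an npm-shrinkwrap.json that pins your whole dependency tree to exact versions. Roughly what one looks like, with illustrative names and versions:

```json
{
  "name": "my-app",
  "version": "1.0.0",
  "dependencies": {
    "request": {
      "version": "2.36.0",
      "dependencies": {
        "qs": {
          "version": "0.6.6"
        }
      }
    }
  }
}
```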
Responses are only cached for 1 second, but when you're serving 1,000+ rps, that takes db hits down by three orders of magnitude: from 1,000 a second to roughly 1 a second. It makes the load nearly trivial, as long as your cache servers work.
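A toy sketch of 1-second micro-caching, just to show the shape of the idea; in production this is a caching proxy in front of the app, not application code, and the names here are illustrative:

```js
// Wrap a handler so identical URLs hit the db at most ~once per second.
var cache = {};  // url -> { body, time }
var TTL = 1000;  // one second, in milliseconds

function microCache(handler) {
  return function (req, res) {
    var hit = cache[req.url];
    if (hit && Date.now() - hit.time < TTL) {
      return res.end(hit.body); // served without touching the db
    }
    handler(req, function (body) {
      cache[req.url] = { body: body, time: Date.now() };
      res.end(body);
    });
  };
}
```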
We outsource DNS lookups and similar dark magic; they are better at it than we are. When you're small, that's a good decision to make, but exporting functions to a third party is fraught with danger.
Fast(ish) promotion: multi-master for concurrent writes is incredibly hard, and you need to modify your application to do it properly. We can only modify *new* versions of npm; old versions don't do atomic writes and never will.
So we have a single master, and we can promote a replica to master manually, but fast. You don't want master failover to be automatic, because failing over is expensive.
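A sketch of what keeping a promotable replica looks like with CouchDB's _replicate API; the hostnames are made up, and this is not our actual ops tooling:

```js
// Ask the standby to continuously pull the registry from the master.
var http = require('http');

var req = http.request({
  host: 'standby', port: 5984,
  path: '/_replicate', method: 'POST',
  headers: { 'Content-Type': 'application/json' }
});
req.end(JSON.stringify({
  source: 'http://master:5984/registry', // pull from the master
  target: 'registry',                    // into the local copy
  continuous: true                       // keep following forever
}));
// "Promotion" is then: stop writes to the old master and repoint the
// app's write URL at the standby. Manual, but fast.
```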
The new npm client is faster, and it fucks up less often. DON'T USE BREW. DON'T USE DEFAULT APT-GET.
The new version reports what version it is! That will allow us to tell you about bugs and suggest upgrades; we're already finding and fixing bugs this way.
Inc "Wait, don't we already have these?" You sort of do: you have a local copy of npm You can publish local modules and share them with each other Without having to share them with the whole world Private modules turn this hack into a first-class feature Prevent collisions with public modules by namespacing
On the command line: npm install @groupon/mymodule
In JavaScript: require('@groupon/bob')
In package.json: "dependencies": { "@groupon/mymodule": "1.0.0" }
This all works in node already; all you need is a new version of npm, and it's backwards-compatible with your production env.
Global modules can *only* depend on other global modules. No existing module will need to change, but it will make people think carefully before "promoting" a module to global.
Git repos are already in the dependencies of existing modules, and we can't tell if those are public, so we will grandfather them in; but future globals should use public packages.
Namespaces let you publish descriptively-named packages without colliding, and let teams organize all of their related packages.
They can be private: teams can share code without sharing it with the world, avoiding the split-brain annoyance of having some modules as git repos.
They can be semi-private: grant others access to your team's modules, maybe as a result of a subscription? Still under discussion and debate.
You probably don't need to be sold on this. The node philosophy is "lots of small, simple modules", and it works great when your code is public. When it's private, sharing small modules between projects becomes painful, which leads to monolithic projects. Bad!
Private modules allow you to decompose without worrying. Which brings us to a larger point about websites, and how they should work in node.
Lots of people are coming to node from Ruby and other places. Their first question: "how do I build a website?" The answer is "ehhhh.... express?"
We have an opportunity to define "the right way". The old way was MVC. Do we want to do it again?
MVC saved you from having to train every new hire on the architecture of your web app: just say "it's an MVC app" and suddenly their first month is done. Massive improvements in developer productivity.
It came coupled with a couple of significantly less sensible ideas, like ORM (http://seldo.com/weblog/2011/08/11/orm_is_an_antipattern).
But it has some problems when used at scale: not tech-scale problems, but team-scale problems.
Having all your code in one convenient place is great, but it leads to a "just one more thing" approach to design. Your application slowly accumulates into a single giant monolith that becomes hard to maintain and slow to release.
It scales, but expensively: each server has everything in memory, which means only a few workers per server. You want to scale your front door to 1,000 rps, but you're also paying to scale your login page, your admin screen, your API...
Model changes become painful. Splitting up your API becomes painful.
SOA splits your application into lots of small services. Each can be developed, deployed, and scaled independently, and strongly-defined interfaces allow rapid development without breakage.
...but it's not all roses. Mechanisms for "backpressure" are essential, and services must be tolerant of failure and recover automatically: queue-on-failure, backoff-and-retry (sketched below).
There's lots to love, and most big web companies are going this way.
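A sketch of what backoff-and-retry means in practice; `send` here is a stand-in for whatever calls the downstream service:

```js
// Retry failed deliveries with exponential backoff, capped at 30s.
function send(task, cb) {
  // pretend network call; replace with a real service client
  cb(Math.random() < 0.5 ? new Error('downstream unavailable') : null);
}

function sendWithRetry(task, attempt) {
  attempt = attempt || 0;
  send(task, function (err) {
    if (!err) return; // delivered
    var delay = Math.min(1000 * Math.pow(2, attempt), 30000);
    setTimeout(function () {
      sendWithRetry(task, attempt + 1); // back off, then try again
    }, delay);
  });
}

sendWithRetry({ id: 1 });
```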
SOA forces you to decide how to split your website up into APIs. Is your entire website still a single application? If not, what about the common parts of your website? Are your templates common to all of your services? Do you have a "template service"? Yahoo really does have a "header service", and so does Groupon.
But the bigger problem with SOA is that it's harder.
Part of the popularity of MVC is that it became "how you build websites". But if we've decided that MVC doesn't ultimately scale to large applications, what do we do instead?
The status quo is that everybody builds sites in MVC and then painfully migrates to SOA later. Is there a better way?
What if you could build using SOA from the get-go? Is this even a good idea, or is it premature optimization?
Projects like seneca try to bridge the gap: everything's conceptually a service, but it starts out as a single app (see the sketch below). You could also use Docker to run all of your "services" on a single machine easily, but that isn't solving the complexity problem.
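The seneca style, as I understand it, looks roughly like this: every capability is an action you call through the framework, even while it's all one process.

```js
// Define a "service" as a pattern-matched action, then call it.
var seneca = require('seneca')();

seneca.add({ role: 'math', cmd: 'sum' }, function (msg, respond) {
  respond(null, { answer: msg.left + msg.right });
});

seneca.act({ role: 'math', cmd: 'sum', left: 1, right: 2 },
  function (err, result) {
    if (err) return console.error(err);
    console.log(result.answer); // 3
  });
```

The idea is that the same actions can later be split over the network (seneca has listen()/client() for this) without rewriting the callers.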
Is there one? What does the design pattern of SOA-from-the-start look like? What tooling are we lacking? Is it just a matter of documentation? What projects haven't I heard about?
What do you think? Opinions welcome! "Not a question, more of a comment" guy: now is your time to shine!