You would not want your service to be unusable at precisely the wrong time — while everyone is watching — would you?
With adequate preparation, however, you can build a service that perseveres through traffic bursts exceeding your initially estimated capacity by orders of magnitude.
It is one thing to create a sample web application using Node.js (maybe using the cluster module to distribute the load), and something else entirely to scale your architecture horizontally to hundreds of thousands of concurrent connections while ensuring redundancy and high availability.
Knowing how to scale is important; knowing when to scale is even more important. To know when, you should constantly monitor your system. Certain clues deserve special attention because they signal that the current architecture is no longer sufficient and that you need to scale out. You can define elastic rules that scale automatically, or you can scale manually; either way, you have to know what to look for before scaling up or down.
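To make the idea of an elastic rule a little more concrete, here is a minimal sketch of a scaling decision driven by monitored metrics. Everything in it is an assumption for illustration: the metric names (`cpu`, `eventLoopLagMs`), the threshold values, and the `decideScaling` function are hypothetical, and real clues and limits depend on your own baseline measurements.

```javascript
// Hypothetical thresholds, chosen only for illustration.
// Tune them against measurements from your own system.
const RULES = {
  scaleOut: { cpu: 0.75, eventLoopLagMs: 100 },
  scaleIn: { cpu: 0.3, eventLoopLagMs: 20 },
};

// samples: recent readings shaped like { cpu: 0..1, eventLoopLagMs: number }.
// Returns "scale-out", "scale-in", or "hold".
function decideScaling(samples, rules = RULES) {
  // Without data there is nothing to act on.
  if (samples.length === 0) return "hold";

  // Average over the window so a single spike does not trigger scaling.
  const mean = (key) =>
    samples.reduce((sum, s) => sum + s[key], 0) / samples.length;
  const cpu = mean("cpu");
  const lag = mean("eventLoopLagMs");

  // Scale out when either signal is hot.
  if (cpu > rules.scaleOut.cpu || lag > rules.scaleOut.eventLoopLagMs) {
    return "scale-out";
  }
  // Scale in only when both signals are comfortably low.
  if (cpu < rules.scaleIn.cpu && lag < rules.scaleIn.eventLoopLagMs) {
    return "scale-in";
  }
  return "hold";
}
```

The asymmetry is deliberate: one hot signal is enough to scale out, while scaling in requires every signal to be quiet, which errs on the side of keeping capacity.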
In this talk, I will look at what it takes to create a real-life, scalable, highly available, and highly responsive Node.js application, and address the topics mentioned above as much as I can.